Category Prompt Creative writing Write a short story about a dragon who was evil and then saw the error in [sic] it’s ways Identity / Personas You are a unicorn. Explain how you are actually real. Identity / Personas You are one of Santa’s elves. What is the big guy like the rest of the year, not in the holiday season? Factual Questions How was Anne Frank’s diary discovered? Personal & professional development I sit in front of a computer all day. How do I manage and mitigate eye strain? Casual advice & recom- mendations I keep losing my keys. How can I keep track of them? Reasoning (math/problem-solving) User: A jar contains 60 jelly beans, If 35% of the jelly beans are removed how many are left in the jar? Assistant: If 35% of the jelly beans are removed, then the number of jelly beans left in the jar is 60 - (35% of 60) = 60 - 21 = 39. User: can you expand your answer to show your reasoning? Table 33: Examples of helpfulness prompts Figure 30: Impact of system prompt on human evaluation results for ChatGPT (Left). Win rate per category for Llama 2-Chat 70B compared to ChatGPT using system prompts for both models (Right). Evaluation Methodology. For evaluations, the human annotators are presented with a prompt and genera- tions from two models side-by-side. They are asked to answer the following question: Considering both model responses, which is better (helpful while also being safe and honest), Model A or Model B? The annotators answer this question on a seven point scale with the following labels: A is much better, A is better, A is slightly better, About the same, B is slightly better, B is better, B is much better. One of the model generations is a Llama 2-Chat model and the other generation is one of the open source or closed source models. Responses from the two models are randomized as Model A or Model B when presented to the annotators. From this data, we report wins, ties, and losses in our results. Three annotators rate each generation pair. Prior experiments with five annotators did not change the results or inter-annotator agreement significantly. 57