Prompt Engineering 学习笔记

2023/03/15
Prompt Engineering 学习笔记

零、概览

  • 大语言模型要解决的问题,如总结、问答、分类、扮演、编程和推理。
  • 提示工程的一些高级技巧,如少样本提示、思维链提示、零样本提示和程序辅助语言模型。
  • 模型安全的一些挑战和方法,如提示词注入、提示信息泄露和越狱。
  • 人类反馈强化学习的应用,以及未来的方向,如增强型语言模型、语言模型的涌现能力、行动/规划强化学习、多模态提示和图提示。

🔑 Key Words
LLM: Large Language Model 大语言模型
CoT: Chain of Thought 思维链
PAL: Program-aided Language Model 程序辅助语言模型

*本文为 Prompt Engineering Overview 视频学习笔记

一、大语言模型要解决哪些问题?

  1. Text Summarization 总结
  2. Question Answering 问答
  3. Text Classification 分类
  4. Role Playing 扮演

The following is a conversation with an AIresearchassistant.The assistant tone is technicalandscientific.
Human: Hello,who are you?
AI: Greetinq! I am an AI research assistant.How can Ihelp you today?
Human: Can you tell me about the creation of blackholes?
AI: Sure! Black holes are reqions of spacetime where thegravitational force is so strong that nothinq, not evenlight, can escape from it. They are created when a verymassive star dies and its core collapses in on itself,forming a sinqularity of infinite density. The intensegravity of the singularity pulls in all the matter andradiation around it, creating the black hole.

  1. Code Generation 编程
  2. Resoning 推理

二、 Prompt Engineering 的一些高级技巧

1. Few-shot Prompts

用几个例子(问题+答案)来引导
Few-shot prompting allows us to provide exemplars inprompts to steer the model towards better performance

The odd numbers in this group add up to an even number: 48,9,15,12.2.1.
A: The answer is False.
The odd numbers in this group add up to an even number: 17.
10,19,4,8,12,24.
A: The answer is True.
The odd numbers in this group add up to an even number: 1611,14,4,8,13,24.
A: The answer is True.

2. Chain-of-Though (CoT) Prompting

给出推理的过程(思考链路)来引导
Prompting can be further improved by instructing themodel to reason about the task when responding
This is very useful for tasks that requiring reasoningYou can combine it with few-shot prompting to get better resultsYou can also do zero-shot CoT where exemplars are notavailable

The odd numbers in this group add up to an even number: 4,8,9,15,12,2,1.
A: Adding all the odd numbers (9,15,1) gives 25. The answer is False.
The odd numbers in this group add up to an even number: 15, 32,5,13,82,7,1.
A: Adding all the odd numbers (15,5,13,7,1) gives 41. The answer is False

3. Zero Shot

仅用一句话来引导AI思考,而非给出例子
Involves adding “Let’s think step by step” to the originalprompt

I went to the market and bought 10 apples. I gave 2 applesneighbor and 2 to the repairman. I then went and to the bought 5 more apples and ate 1.How many apples did Iremain with?
11 apples

I went to the market and bought 10 apples. I gave 2 apples to the neiqhbor and 2 to the repairman.I then went and bought 5 more apples and ate 1. How many apples did Iremain with?
Let’s think step by step.
First, you started with 10 apples.You gave away 2 apples to the neighbor and 2 to the repairman, so you had 6 apples left.Then you bought 5 more apples, so now you had 11 apples.you ate l apple, so you would remain with 10Finallyapples.

4. Program-aided Language Model(PAL)

因为 CoT 有的时候还不够,我们可以用自然语言的形式辅助描述一个用编程语言构造的问题,然后让AI给出代码答案,我们再去运行这段代码答案,获得正确的结果。

  • Chain-of-thought prompting is a good example of how to steer models to perform better at complex reasoning tasks
    • However, sometimes CoT is not enough as it depends only on the generated text from the model
  • Program-aided language models (PAL) uses an LLM to read problems and generate programs as the intermediate reasoning steps
    • It offloads the solution step to a runtime such as Python interpreter

5. ReAct

ReAct是一个让大型语言模型(LLMs)能够同时进行推理和动作的框架,它可以让模型根据不同的任务生成合理的计划,并与外部资源交互获取更多的信息。

简单来说:搜索推理过程中无法理解的信息,再讲信息作为prompt喂给模型,接下来继续推理链条,重复步骤直到推理完成。New Bing 使用的就是类似逻辑。

  • ReAct is a framework where LLMs are used to generateboth reasoning traces and task-specific actions in aninterleaved manner
    • Generating reasoning traces allow the model to induce, track.and update action plans,and even handle exceptions
    • The action step allows to interface with and gather informationfrom external sources such as knowledge bases orenvironments.
      ReAct allows LLMs to interact with external tools toretrieve additional information that leads to more reliableand factual responses

三、结论和未来的方向

1. 模型安全 Model Safety

提示工程可以用于提升大语言模型的响应的安全性,降低潜在风险,也有人利用提示词注入来探索 LLM 的脆弱性。
Prompt engineering can be used not only to improve performance but also the reliability of response from a safety perspective

  • Prompt engineering can help identify risky behaviours of LLMs which can help to reduce harmful behaviours and risks that mayarise from language models
  • There is also a part of the community performing prompt injection to understand the vulnerability of LLMs

2. Prompt Injection 提示词注入

安全问题同样存在于大语言模型之中。
Some common issues include:

  • Prompt Injection
    • 注入不受信任的指令,来覆盖原有的对于提示词的引导规则: Prompt injection is used to hijack an LM’s output byinjecting an untrusted command that overrides instruction of a prompt
    • This could easily happen if you just concatenate your prompt with another user generated prompt
  • Prompt Leaking
    • 提示信息泄露的目的是迫使模型吐出有关其自身提示的信息。
    • 这可能导致敏感的、私人的或保密的信息的泄露。
    • Examples of prompt leaking
  • Jailbreaking
    • 越狱是绕过安全审核的提示注入。
    • API提供的大语言模型可能被有害提示/攻击绕过。
    • 模型是静态的,可能有漏洞。

3. 人类反馈强化学习 RLHF

所谓人类反馈强化,就是结合奖励机制,让人类标注员的反馈来不断纠正大语言模型的输出结果,来使得它的输出结果符合人类的主流偏好。

  • Reinforcement Learning from Human Feedback (RLHF)is now being used to train LLMs that fit human preference
  • RLHF involves collecting high-quality prompt datasets

一个例子,ChatGPT 是如何通过 RLHF 来优化模型的:
ChatGPT

4. 未来的方向 Future direction

  • Augmented LMs - 增强型语言模型
  • Emergent ability of LMs - 语言模型的涌现能力
  • Acting / Planning - Reinforcement Learning - 行动/规划 - 强化学习
  • Multimodal Prompting - 多模态提示
  • Graph Prompting - 图提示