Figure 5: An illustration of different training methods in Retrieval-Augmented Large Language Models (RA-LLMs). Existing RA-LLM approaches can be categorized into two classes: training-free approaches usually leverage the retrieved information directly at inference time by integrating the retrieved knowledge into the prompt, while training-based approaches fine-tune the retriever and the generator to enhance generation performance. Based on the training strategies, training-based methods can be further categorized into three groups: independent training, where the retriever and generator components are trained independently; sequential training, where they are trained sequentially; and joint training, where they are trained jointly.

Training-free RA-LLM approaches fall into two categories: 1) Prompt Engineering-based Methods integrate the retrieved knowledge into the original prompt directly, and 2) Retrieval-Guided Token Generation Methods retrieve information to calibrate the token generation process.

4.1.1 Prompt Engineering-based Methods. As the generation performance of LLMs depends heavily on the input query, numerous training-free RAG approaches employ external knowledge by refining the original prompt [57, 63, 81]. Specifically, the retrieved texts usually serve as contextual information and are combined with the original prompt to guide the generation of LLMs [54, 57, 63, 65, 81, 112, 158]. For example, In-Context RALM [117] keeps the LLM parameters unchanged and directly prepends the retrieved documents to the original prompt to augment the generation process. IRCoT [149] interleaves chain-of-thought (CoT) generation and knowledge retrieval steps, enabling the retrieval of more relevant information for subsequent reasoning steps than standard retrieval, which relies solely on the question as the query. Instead of retrieving knowledge from a large corpus, GENREAD [182] first prompts an LLM to generate contextual documents based on the query, and then generates answers conditioned on the generated context and the question. SKR [159] proposes guiding LLMs to determine whether they can answer a given question with their internal knowledge, enabling flexible utilization of both internal and external knowledge by calling the retriever selectively. TOC [65] first retrieves relevant knowledge for an ambiguous question and recursively constructs a tree structure by clarifying it into multiple disambiguated questions, which are then aggregated to generate long-form answers.

4.1.2 Retrieval-Guided Token Generation Methods. In addition to being integrated into the original prompt directly, the auxiliary information can be employed to adjust the token generation process. For example, kNN-LM [62] first retrieves the most relevant contexts from the datastore based on the given query and computes a distribution over next tokens from these neighbors, weighted by their distances. The output distribution is then calibrated by interpolating this neighbor distribution with the original model's output distribution.
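To make this calibration concrete, below is a minimal sketch of kNN-LM-style interpolation. It is an illustration under simplifying assumptions rather than the exact implementation of [62]: the L2 distance, the softmax-style weighting, and the hyperparameters k and lam are placeholder choices, and the datastore is a plain array of (context representation, next token) pairs.

```python
import numpy as np

def knn_lm_distribution(query_hidden, keys, next_tokens, p_lm, k=4, lam=0.25):
    """Calibrate an LM's next-token distribution with a kNN neighbor distribution.

    query_hidden: (d,) hidden state of the current context
    keys:         (n, d) array of stored context representations
    next_tokens:  (n,) array with the token id that followed each stored context
    p_lm:         (V,) the LM's original next-token distribution
    k, lam:       placeholder hyperparameters (neighbors, interpolation weight)
    """
    # Retrieve the k nearest datastore entries by L2 distance.
    dists = np.linalg.norm(keys - query_hidden, axis=1)
    nn = np.argsort(dists)[:k]

    # Turn distances into normalized weights (closer neighbors get more
    # mass) and aggregate the weights per vocabulary token.
    weights = np.exp(-dists[nn])
    weights /= weights.sum()
    p_knn = np.zeros_like(p_lm)
    for w, tok in zip(weights, next_tokens[nn]):
        p_knn[tok] += w

    # Calibrate: interpolate the neighbor and model distributions.
    return lam * p_knn + (1.0 - lam) * p_lm

# Toy usage with random data: the result is still a valid distribution.
rng = np.random.default_rng(0)
keys = rng.normal(size=(200, 16))
next_tokens = rng.integers(0, 50, size=200)
p = knn_lm_distribution(keys[3], keys, next_tokens, p_lm=np.full(50, 1 / 50))
assert np.isclose(p.sum(), 1.0)
```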
REST [49] replaces the parametric draft model with a non-parametric retrieval datastore and retrieves relevant draft tokens based on the current context for speculative decoding [9, 71, 145].
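The sketch below illustrates the core idea of retrieval-based drafting for speculative decoding. It is heavily simplified relative to REST, which indexes its datastore for efficient suffix matching and drafts token trees; here the datastore is a plain list of token sequences scanned linearly, retrieve_draft and speculative_step are hypothetical names, and lm_next_token stands in for the target LLM's greedy next-token function.

```python
def retrieve_draft(context, datastore, max_draft=4):
    """Propose draft tokens by matching the longest suffix of the current
    context against stored token sequences and copying what follows."""
    for start in range(len(context)):                # try the longest suffix first
        suffix = tuple(context[start:])
        for seq in datastore:
            for i in range(len(seq) - len(suffix)):  # match must have a continuation
                if tuple(seq[i:i + len(suffix)]) == suffix:
                    return list(seq[i + len(suffix):i + len(suffix) + max_draft])
    return []

def speculative_step(context, datastore, lm_next_token):
    """One decoding step: verify retrieved draft tokens with the target LM,
    keeping the longest prefix the LM itself would have generated."""
    accepted = []
    for tok in retrieve_draft(context, datastore):
        if lm_next_token(context + accepted) == tok:
            accepted.append(tok)                     # drafted token verified "for free"
        else:
            break
    accepted.append(lm_next_token(context + accepted))  # always emit one LM token
    return accepted

# Toy usage: the "LM" deterministically continues token n with n + 1, so the
# drafted tokens 8 and 9 are both accepted and one new token is generated.
datastore = [[5, 6, 7, 8, 9], [1, 2, 3, 4]]
print(speculative_step([5, 6, 7], datastore, lambda ctx: ctx[-1] + 1))  # [8, 9, 10]
```

Because several drafted tokens can be verified in a single pass over the target model, each accepted draft token saves one full LLM decoding step, which is the source of the speed-up in speculative decoding.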