Haystack Tutorial: Basic QA Pipeline with RAG
Note: This is a condensed copy of a tutorial at haystack.deepset.ai.
Before you install
I habitually run pip-upgrade before I install any Python package. In my ~/.bash_profile I have the following alias:
#| eval: false
alias pip-upgrade="pip list -o | cut -f1 -d' ' | tr ' ' '\n' | awk '{if(NR>=3)print}' | cut -d' ' -f1 | xargs -n1 pip install -U"
You are free to do this differently or not at all, depending on your Python installation.
Install Haystack
The following should NOT be run in this document. Run it in a terminal. You may render this document multiple times but you should only install Haystack once.
#| eval: false
pip install haystack-ai
pip install "datasets>=2.6.1"
pip install "sentence-transformers>=3.0.0"
Enable Telemetry
The following reports back to deepset.ai that you are running this tutorial. Comment it out if you don’t want to report this.
from haystack.telemetry import tutorial_running
tutorial_running(27)
Fetching and Indexing Documents
Initialize the Document Store
from haystack.document_stores.in_memory import InMemoryDocumentStore
document_store = InMemoryDocumentStore()
Fetch the Data
from datasets import load_dataset
from haystack import Document
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
docs = [Document(content=doc["content"], meta=doc["meta"]) for doc in dataset]
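Each Document now carries the passage text plus its metadata. A quick sanity check, just indexing into the docs list built above:
# Peek at the first converted Document: its metadata and the start of its text
print(docs[0].meta)
print(docs[0].content[:200])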
Initialize a Document Embedder
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()
Write Documents to the DocumentStore
docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])
151
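The 151 printed above is the number of documents written. As a cross-check, the store can report its size directly via the document store's count_documents method:
# Should print 151, matching the return value of write_documents above
print(document_store.count_documents())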
Building the RAG Pipeline
Initialize the Text Embedder
from haystack.components.embedders import SentenceTransformersTextEmbedder
text_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
Initialize the Retriever
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
retriever = InMemoryEmbeddingRetriever(document_store)
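The retriever returns its top_k most similar documents per query (10 by default, as far as I know). A hedged sketch of capping that, assuming the top_k constructor parameter:
# Optional: return only the 5 most similar documents per query
retriever = InMemoryEmbeddingRetriever(document_store=document_store, top_k=5)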
Define a Template Prompt
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
template = [
    ChatMessage.from_user(
        """
        Given the following information, answer the question.
        Context:
        {% for document in documents %}
        {{ document.content }}
        {% endfor %}
        Question: {{question}}
        Answer:
        """
    )
]
prompt_builder = ChatPromptBuilder(template=template)
This emits a warning: ChatPromptBuilder has 2 prompt variables, but `required_variables` is not set. By default, all prompt variables are treated as optional, which may lead to unintended behavior in multi-branch pipelines. To avoid unexpected execution, ensure that variables intended to be required are explicitly set in `required_variables`.
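One way to address that warning is to declare the variables explicitly; a sketch, following the warning's own suggestion to set required_variables:
# Mark both template variables as required so a missing input fails fast
prompt_builder = ChatPromptBuilder(template=template, required_variables=["documents", "question"])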
Initialize a ChatGenerator
Note that the following code will only work if you have an OPENAI_API_KEY. Mine is defined in the file ~/.Renviron in the following manner.
#| eval: false
OPENAI_API_KEY="sk1093847bunchofnumbersandletters"
You must have an OPENAI_API_KEY defined somewhere for the following to work. One way is to set it in the environment instead of in ~/.Renviron as above.
import os
from getpass import getpass
from haystack.components.generators.chat import OpenAIChatGenerator

# Prompt for the key only if it is not already set in the environment
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key: ")
chat_generator = OpenAIChatGenerator(model="gpt-4o-mini")
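By default, OpenAIChatGenerator resolves the key from the OPENAI_API_KEY environment variable. Making that explicit is possible with Haystack's Secret helper; a sketch, assuming Secret.from_env_var from haystack.utils:
from haystack.utils import Secret

# Equivalent to the default: resolve the API key from the environment at run time
chat_generator = OpenAIChatGenerator(model="gpt-4o-mini", api_key=Secret.from_env_var("OPENAI_API_KEY"))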
Build the pipeline
from haystack import Pipeline
basic_rag_pipeline = Pipeline()
# Add components to your pipeline
basic_rag_pipeline.add_component("text_embedder", text_embedder)
basic_rag_pipeline.add_component("retriever", retriever)
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", chat_generator)

# Now, connect the components to each other
basic_rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
basic_rag_pipeline.connect("retriever", "prompt_builder")
basic_rag_pipeline.connect("prompt_builder.prompt", "llm.messages")
<haystack.core.pipeline.pipeline.Pipeline object at 0x35fbf79e0>
🚅 Components
- text_embedder: SentenceTransformersTextEmbedder
- retriever: InMemoryEmbeddingRetriever
- prompt_builder: ChatPromptBuilder
- llm: OpenAIChatGenerator
🛤️ Connections
- text_embedder.embedding -> retriever.query_embedding (list[float])
- retriever.documents -> prompt_builder.documents (list[Document])
- prompt_builder.prompt -> llm.messages (list[ChatMessage])
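The connected pipeline can also be rendered as a diagram; a sketch, assuming Pipeline.draw takes an output path (it renders the graph via the mermaid.ink web service, so it needs network access):
from pathlib import Path

# Write a PNG diagram of the pipeline graph to disk
basic_rag_pipeline.draw(path=Path("basic_rag_pipeline.png"))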
Ask a question
question = "What does Rhodes Statue look like?"
response = basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})
print(response["llm"]["replies"][0].text)
The Colossus of Rhodes, which was a statue of the Greek sun-god Helios, stood approximately 70 cubits (about 33 meters or 108 feet) tall. While no definitive visual representations exist, descriptions suggest that the statue was constructed with a framework of iron tie bars covered with brass plates to form its skin. The head of the statue would have featured curly hair with evenly spaced spikes that may have resembled flames, drawn from contemporary depictions on Rhodian coins.
The statue was positioned on a 15-meter (49-foot) white marble pedestal near the entrance of Rhodes harbor. Although the exact pose of the Colossus is debated, it is thought that it may have stood with one hand shielding its eyes, as seen in reliefs of Helios from that time, similar to how a person might shield their eyes from the sun. The statue was celebrated for its grandeur, although specifics about its intricate details remain speculative due to a lack of surviving evidence.
Additional questions to ask
examples = [
"Where is Gardens of Babylon?",
"Why did people build Great Pyramid of Giza?",
"What does Rhodes Statue look like?",
"Why did people visit the Temple of Artemis?",
"What is the importance of Colossus of Rhodes?",
"What happened to the Tomb of Mausolus?",
"How did Colossus of Rhodes collapse?",
]
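Any of these can be run through the same pipeline; a minimal loop reusing the run call from above:
# Run each example question through the pipeline and print the answer
for question in examples:
    response = basic_rag_pipeline.run(
        {"text_embedder": {"text": question}, "prompt_builder": {"question": question}}
    )
    print(f"Q: {question}")
    print(f"A: {response['llm']['replies'][0].text}\n")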