Haystack Tutorial: Basic QA Pipeline with RAG

Author

Mick McQuaid

Published

October 21, 2025

Note: This is a condensed copy of a tutorial from the Haystack documentation.

Before you install

I habitually run pip-upgrade before I install any Python package. In my ~/.bash_profile I have the following alias:

#| eval: false
alias pip-upgrade="pip list -o | cut -f1 -d' ' | tr ' ' '\n' | awk '{if(NR>=3)print}' | cut -d' ' -f1 | xargs -n1 pip install -U"

You are free to do this differently or not at all, depending on your Python installation.
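For reference, the shell alias just extracts package names from the output of pip list -o, skipping the two header lines, and feeds them to pip install -U. A rough Python equivalent of the parsing step, run against hypothetical sample output (the column layout can vary by pip version):

```python
# Hypothetical sample output in the format produced by `pip list -o`.
sample = """Package    Version Latest Type
---------- ------- ------ -----
requests   2.31.0  2.32.3 wheel
numpy      1.26.0  2.0.1  wheel"""

# Skip the two header lines and take the first column (the package name),
# mirroring the awk and cut stages of the shell alias.
outdated = [line.split()[0] for line in sample.splitlines()[2:]]
print(outdated)  # → ['requests', 'numpy']
```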

Install Haystack

The following should NOT be run in this document. Run it in a terminal. You may render this document multiple times but you should only install Haystack once.

#| eval: false
pip install haystack-ai
pip install "datasets>=2.6.1"
pip install "sentence-transformers>=3.0.0"

Enable Telemetry

The following reports back to deepset.ai that you are running this tutorial. Comment it out if you don’t want to report this.

from haystack.telemetry import tutorial_running
tutorial_running(27)

Fetching and Indexing Documents

Initialize the Document Store

from haystack.document_stores.in_memory import InMemoryDocumentStore
document_store = InMemoryDocumentStore()

Fetch the Data

from datasets import load_dataset
from haystack import Document

dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
docs = [Document(content=doc["content"], meta=doc["meta"]) for doc in dataset]

Initialize a Document Embedder

from haystack.components.embedders import SentenceTransformersDocumentEmbedder
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()

Write Documents to the DocumentStore

docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])

Building the RAG Pipeline

Initialize the Text Embedder

from haystack.components.embedders import SentenceTransformersTextEmbedder
text_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")

Initialize the Retriever

from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
retriever = InMemoryEmbeddingRetriever(document_store)
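The retriever ranks stored documents by similarity between the query embedding and each document embedding (dot product or cosine, depending on how the store is configured). A toy sketch of the idea, with made-up 3-dimensional vectors standing in for real embeddings:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up embeddings: two "documents" and a "query".
doc_vectors = {"doc_a": [1.0, 0.0, 0.0], "doc_b": [0.0, 1.0, 0.0]}
query = [0.9, 0.1, 0.0]

# Rank documents by similarity to the query, best first.
ranked = sorted(doc_vectors, key=lambda d: cosine(query, doc_vectors[d]), reverse=True)
print(ranked)  # → ['doc_a', 'doc_b']
```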

Define a Template Prompt

from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage

template = [
    ChatMessage.from_user(
        """
Given the following information, answer the question.

Context:
{% for document in documents %}
    {{ document.content }}
{% endfor %}

Question: {{question}}
Answer:
"""
    )
]

prompt_builder = ChatPromptBuilder(template=template)
ChatPromptBuilder has 2 prompt variables, but `required_variables` is not set. By default, all prompt variables are treated as optional, which may lead to unintended behavior in multi-branch pipelines. To avoid unexpected execution, ensure that variables intended to be required are explicitly set in `required_variables`.
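At run time, the builder renders the Jinja template by looping over the retrieved documents and substituting the question. The effect is roughly this plain-Python equivalent (the document contents here are hypothetical stand-ins):

```python
# Hypothetical stand-ins for retrieved documents.
documents = ["The Colossus stood at the harbor.", "It was a statue of Helios."]
question = "What does Rhodes Statue look like?"

# Roughly what the Jinja template produces.
prompt = "Given the following information, answer the question.\n\nContext:\n"
for content in documents:
    prompt += f"    {content}\n"
prompt += f"\nQuestion: {question}\nAnswer:\n"
print(prompt)
```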

Initialize a ChatGenerator

Note that the following code will only work if you have an OPENAI_API_KEY. Mine is defined in the file ~/.Renviron in the following manner.

#| eval: false
OPENAI_API_KEY="sk1093847bunchofnumbersandletters"

However you do it, an OPENAI_API_KEY must be defined for the following to work; one alternative to the ~/.Renviron file above is to set it directly in the environment.

import os
from getpass import getpass
from haystack.components.generators.chat import OpenAIChatGenerator

# Prompt for the key if it is not already in the environment.
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key: ")

chat_generator = OpenAIChatGenerator(model="gpt-4o-mini")

Build the pipeline

from haystack import Pipeline

basic_rag_pipeline = Pipeline()
# Add components to your pipeline
basic_rag_pipeline.add_component("text_embedder", text_embedder)
basic_rag_pipeline.add_component("retriever", retriever)
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", chat_generator)
# Now, connect the components to each other
basic_rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
basic_rag_pipeline.connect("retriever", "prompt_builder")
basic_rag_pipeline.connect("prompt_builder.prompt", "llm.messages")
<haystack.core.pipeline.pipeline.Pipeline object at 0x35fbf79e0>
🚅 Components
  - text_embedder: SentenceTransformersTextEmbedder
  - retriever: InMemoryEmbeddingRetriever
  - prompt_builder: ChatPromptBuilder
  - llm: OpenAIChatGenerator
🛤️ Connections
  - text_embedder.embedding -> retriever.query_embedding (list[float])
  - retriever.documents -> prompt_builder.documents (list[Document])
  - prompt_builder.prompt -> llm.messages (list[ChatMessage])

Ask a question

question = "What does Rhodes Statue look like?"

response = basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})

print(response["llm"]["replies"][0].text)
The Colossus of Rhodes, which was a statue of the Greek sun-god Helios, stood approximately 70 cubits (about 33 meters or 108 feet) tall. While no definitive visual representations exist, descriptions suggest that the statue was constructed with a framework of iron tie bars covered with brass plates to form its skin. The head of the statue would have featured curly hair with evenly spaced spikes that may have resembled flames, drawn from contemporary depictions on Rhodian coins.

The statue was positioned on a 15-meter (49-foot) white marble pedestal near the entrance of Rhodes harbor. Although the exact pose of the Colossus is debated, it is thought that it may have stood with one hand shielding its eyes, as seen in reliefs of Helios from that time, similar to how a person might shield their eyes from the sun. The statue was celebrated for its grandeur, although specifics about its intricate details remain speculative due to a lack of surviving evidence.

Additional questions to ask

examples = [
    "Where is Gardens of Babylon?",
    "Why did people build Great Pyramid of Giza?",
    "What does Rhodes Statue look like?",
    "Why did people visit the Temple of Artemis?",
    "What is the importance of Colossus of Rhodes?",
    "What happened to the Tomb of Mausolus?",
    "How did Colossus of Rhodes collapse?",
]