Haystack Tutorial: Basic QA Pipeline with RAG
Note: This is a condensed copy of a tutorial at haystack
Before you install
I habitually run pip-upgrade before I install any Python package. In my ~/.bash_profile I have the following alias:
#| eval: false
alias pip-upgrade="pip list -o | cut -f1 -d' ' | tr ' ' '\n' | awk '{if(NR>=3)print}' | cut -d' ' -f1 | xargs -n1 pip install -U"
You are free to do this differently or not at all, depending on your Python installation.
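If you would rather stay in Python, here is a rough equivalent of that alias. This is only a sketch built on pip's JSON output format; run it in the same environment you intend to install into.
#| eval: false
# Rough Python equivalent of the pip-upgrade alias (sketch):
import json, subprocess, sys

outdated = json.loads(subprocess.run(
    [sys.executable, "-m", "pip", "list", "--outdated", "--format=json"],
    capture_output=True, text=True, check=True,
).stdout)
for pkg in outdated:
    subprocess.run([sys.executable, "-m", "pip", "install", "-U", pkg["name"]], check=True)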
Install Haystack
The following should NOT be run in this document. Run it in a terminal. You may render this document multiple times, but you should only install Haystack once.
#| eval: false
pip install haystack-ai
pip install "datasets>=2.6.1"
pip install "sentence-transformers>=3.0.0"
Enable Telemetry
The following reports back to deepset.ai that you are running this tutorial. Comment it out if you don't want to report this.
from haystack.telemetry import tutorial_running
tutorial_running(27)
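If you prefer to opt out globally rather than commenting out code, Haystack also honors an environment variable. Per the Haystack docs, setting it before haystack is imported disables all telemetry:
#| eval: false
# Opt out of all Haystack telemetry; must be set before haystack is imported.
import os
os.environ["HAYSTACK_TELEMETRY_ENABLED"] = "False"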
Fetching and Indexing Documents
Initialize the Document Store
from haystack.document_stores.in_memory import InMemoryDocumentStore
document_store = InMemoryDocumentStore()
Fetch the Data
from datasets import load_dataset
from haystack import Document
dataset = load_dataset("bilgeyucel/seven-wonders", split="train")
docs = [Document(content=doc["content"], meta=doc["meta"]) for doc in dataset]
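As a quick sanity check on the conversion, you can inspect the first Document; each one carries the article text in content and its metadata in meta.
#| eval: false
# Peek at one converted Document:
print(docs[0].meta)
print(docs[0].content[:100])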
Initialize a Document Embedder
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
doc_embedder.warm_up()  # downloads the model (if needed) and loads it into memory
Write Documents to the DocumentStore
docs_with_embeddings = doc_embedder.run(docs)
document_store.write_documents(docs_with_embeddings["documents"])  # returns the number of documents written
151
Building the RAG Pipeline
Initialize the Text Embedder
from haystack.components.embedders import SentenceTransformersTextEmbedder
text_embedder = SentenceTransformersTextEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
Initialize the Retriever
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
retriever = InMemoryEmbeddingRetriever(document_store)
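You can also exercise the retriever on its own before wiring up the full pipeline. The following is just a sketch: the query string is made up, and warm_up() loads the embedding model that the pipeline would otherwise load on its first run.
#| eval: false
# Standalone retrieval check (sketch): embed a query, fetch the closest documents.
text_embedder.warm_up()
query_embedding = text_embedder.run("Where is the Colossus?")["embedding"]
result = retriever.run(query_embedding=query_embedding, top_k=3)
for d in result["documents"]:
    print(round(d.score, 3), d.content[:80])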
Define a Template Prompt
from haystack.components.builders import ChatPromptBuilder
from haystack.dataclasses import ChatMessage
template = [
    ChatMessage.from_user("""
Given the following information, answer the question.

Context:
{% for document in documents %}
{{ document.content }}
{% endfor %}

Question: {{question}}
Answer:
""")
]

prompt_builder = ChatPromptBuilder(template=template)
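To see what the builder actually renders, you can run it by itself with a toy document. A sketch only; the document text here is made up.
#| eval: false
# Preview the rendered prompt with an illustrative toy input:
preview = prompt_builder.run(
    documents=[Document(content="The Colossus of Rhodes stood beside the harbor.")],
    question="Where did the Colossus stand?",
)
print(preview["prompt"][0].text)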
Initialize a ChatGenerator
Note that the following code will only work if you have an OPENAI_API_KEY. Mine is defined in the file ~/.Renviron in the following manner.
#| eval: false
OPENAI_API_KEY="sk1093847bunchofnumbersandletters"
You must have an OPENAI_API_KEY defined somehow for the following to work.
import os
from getpass import getpass
from haystack.components.generators.chat import OpenAIChatGenerator

# Prompt for the key if it isn't already set in the environment:
if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass("Enter OpenAI API key:")

chat_generator = OpenAIChatGenerator(model="gpt-4o-mini")
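As a quick smoke test you can call the generator directly. Note this makes a real (small) API call, so I leave it unevaluated here.
#| eval: false
# One-off generator check (requires OPENAI_API_KEY; costs a small API call):
reply = chat_generator.run(messages=[ChatMessage.from_user("Say hello in five words.")])
print(reply["replies"][0].text)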
Build the pipeline
from haystack import Pipeline

basic_rag_pipeline = Pipeline()

# Add components to your pipeline
basic_rag_pipeline.add_component("text_embedder", text_embedder)
basic_rag_pipeline.add_component("retriever", retriever)
basic_rag_pipeline.add_component("prompt_builder", prompt_builder)
basic_rag_pipeline.add_component("llm", chat_generator)

# Now, connect the components to each other
basic_rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
basic_rag_pipeline.connect("retriever", "prompt_builder")
basic_rag_pipeline.connect("prompt_builder.prompt", "llm.messages")
<haystack.core.pipeline.pipeline.Pipeline object at 0x31ffd4a40>
🚅 Components
- text_embedder: SentenceTransformersTextEmbedder
- retriever: InMemoryEmbeddingRetriever
- prompt_builder: ChatPromptBuilder
- llm: OpenAIChatGenerator
🛤️ Connections
- text_embedder.embedding -> retriever.query_embedding (List[float])
- retriever.documents -> prompt_builder.documents (List[Document])
- prompt_builder.prompt -> llm.messages (List[ChatMessage])
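The pipeline graph can also be visualized. Assuming a notebook environment, show() renders the graph inline and draw() writes it to a file; note that in Haystack 2.x both render via an external Mermaid service.
#| eval: false
# Optional visualization (sketch; contacts an external Mermaid rendering service):
from pathlib import Path
basic_rag_pipeline.show()                            # inline, in a notebook
# basic_rag_pipeline.draw(Path("rag_pipeline.png"))  # or save to a file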
Ask a question
= "What does Rhodes Statue look like?"
question
= basic_rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"question": question}})
response
print(response["llm"]["replies"][0].text)
The Colossus of Rhodes was a statue of the Greek sun-god Helios, standing approximately 70 cubits (about 32 meters or 105 feet) tall. It was constructed with iron tie bars and brass plates that formed its skin, while the interior was filled with stone blocks. The head of the statue was characterized by curly hair with evenly spaced spikes of bronze or silver flame radiating from it, reflecting a standard rendering of the time.
While the specific visual depiction of the statue has not survived, contemporary descriptions suggest that it was an impressive and grand figure. The statue likely stood on a 15-meter-high (49-foot) white marble pedestal near the entrance of the harbor, giving it even more height and prominence. The pose of the statue is debated, but it may have been depicted shielding its eyes, similar to other representations of Helios found in ancient art. Overall, the statue was a monumental representation of Helios, symbolizing triumph and freedom over their enemies.
Additional questions to ask
examples = [
    "Where is Gardens of Babylon?",
    "Why did people build Great Pyramid of Giza?",
    "What does Rhodes Statue look like?",
    "Why did people visit the Temple of Artemis?",
    "What is the importance of Colossus of Rhodes?",
    "What happened to the Tomb of Mausolus?",
    "How did Colossus of Rhodes collapse?",
]
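To try them all, a simple loop over the list works; keep in mind each question costs one embedding pass and one API call.
#| eval: false
# Ask every example question in turn (one API call each):
for q in examples:
    resp = basic_rag_pipeline.run({"text_embedder": {"text": q}, "prompt_builder": {"question": q}})
    print(f"Q: {q}\nA: {resp['llm']['replies'][0].text}\n")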