The following content was retrieved from https://kazuhira-r.hatenablog.com/entry/2025/03/03/001723.


Trying the LangChain Tutorial, RAG Part 1 (without LangSmith, with LangGraph)

What did I want to do with this entry?

In the entry below, I tried the LangChain tutorial RAG Part 1 without LangSmith and without LangGraph.

Trying the LangChain Tutorial, RAG Part 1 (without LangSmith or LangGraph) - CLOVER🍀

Looking at the tutorials that follow, though, I felt that while LangSmith can wait, adopting LangGraph would make the later material much easier to work through, so I decided to bring in LangGraph and try the tutorial once more.

Since this is just the previous entry rewritten to use LangGraph, I will keep things brief this time.

The target tutorial

As in the previous entry, this is LangChain's RAG Part 1 tutorial.

Build a Retrieval Augmented Generation (RAG) App: Part 1 | 🦜️🔗 LangChain

LangGraph

Let's also take a quick look at LangGraph.

LangGraph

The GitHub repository is here.

GitHub - langchain-ai/langgraph: Build resilient language agents as graphs.

LangGraph is described as a library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows.

LangGraph is a library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows.

It appears to have the following characteristics:

  • State management (conversation history, etc.)
  • Expressing LLM-powered workflows as graphs (essentially DAGs) made up of nodes and edges
    • Support for conditional branches and loops
  • Creating checkpoints and resuming from them

The building blocks of the workflow are tasks such as having the LLM generate an answer or running a search; in the previous entry, instead of using LangGraph, I connected them with LCEL.

So this time, each task will be expressed as a node, and the nodes will be connected and executed as a graph.
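The state-merging idea behind this can be sketched in plain Python (this is an illustration of the concept, not LangGraph itself — `run_sequence` and the stub nodes are hypothetical names of mine): each node receives the current state and returns a partial update, which is merged into the state before the next node runs.

```python
from typing import Any, Callable

State = dict[str, Any]
Node = Callable[[State], State]


def run_sequence(nodes: list[Node], state: State) -> State:
    """Run nodes in order, merging each node's partial update into the state."""
    for node in nodes:
        state = {**state, **node(state)}
    return state


# Stand-ins for the real retrieval and generation steps.
def retrieve(state: State) -> State:
    return {"context": f"docs matching: {state['question']}"}


def generate(state: State) -> State:
    return {"answer": f"answer built from ({state['context']})"}


result = run_sequence([retrieve, generate], {"question": "What is Task Decomposition?"})
print(result["answer"])
```

The final state carries everything that was ever returned, which matches what we will see in the graph outputs below.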

Now, let's work through the RAG Part 1 tutorial again, this time with LangGraph. As before, Ollama runs the models and Qdrant serves as the vector database.

Environment

Here is the environment for this entry.

$ python3 --version
Python 3.12.3


$ uv --version
uv 0.6.3

Ollama.

$ bin/ollama serve
$ bin/ollama --version
ollama version is 0.5.12

Qdrant is assumed to be running at 172.17.0.2.

$ ./qdrant --version
qdrant 1.13.4

Preparation

Creating the project.

$ uv init --vcs none langchain-tutorial-rag-part1-with-langgraph
$ cd langchain-tutorial-rag-part1-with-langgraph
$ rm main.py

Installing the dependencies.

$ uv add langchain-community langchain-ollama langchain-qdrant beautifulsoup4 langgraph

Adding mypy and Ruff.

$ uv add --dev mypy ruff

The list of installed dependencies.

$ uv pip list
Package                  Version
------------------------ ---------
aiohappyeyeballs         2.4.6
aiohttp                  3.11.13
aiosignal                1.3.2
annotated-types          0.7.0
anyio                    4.8.0
attrs                    25.1.0
beautifulsoup4           4.13.3
certifi                  2025.1.31
charset-normalizer       3.4.1
dataclasses-json         0.6.7
frozenlist               1.5.0
greenlet                 3.1.1
grpcio                   1.70.0
grpcio-tools             1.70.0
h11                      0.14.0
h2                       4.2.0
hpack                    4.1.0
httpcore                 1.0.7
httpx                    0.28.1
httpx-sse                0.4.0
hyperframe               6.1.0
idna                     3.10
jsonpatch                1.33
jsonpointer              3.0.0
langchain                0.3.19
langchain-community      0.3.18
langchain-core           0.3.40
langchain-ollama         0.2.3
langchain-qdrant         0.2.0
langchain-text-splitters 0.3.6
langgraph                0.3.2
langgraph-checkpoint     2.0.16
langgraph-prebuilt       0.1.1
langgraph-sdk            0.1.53
langsmith                0.3.11
marshmallow              3.26.1
msgpack                  1.1.0
multidict                6.1.0
mypy                     1.15.0
mypy-extensions          1.0.0
numpy                    2.2.3
ollama                   0.4.7
orjson                   3.10.15
packaging                24.2
portalocker              2.10.1
propcache                0.3.0
protobuf                 5.29.3
pydantic                 2.10.6
pydantic-core            2.27.2
pydantic-settings        2.8.1
python-dotenv            1.0.1
pyyaml                   6.0.2
qdrant-client            1.13.2
requests                 2.32.3
requests-toolbelt        1.0.0
ruff                     0.9.9
setuptools               75.8.2
sniffio                  1.3.1
soupsieve                2.6
sqlalchemy               2.0.38
tenacity                 9.0.0
typing-extensions        4.12.2
typing-inspect           0.9.0
urllib3                  2.3.0
yarl                     1.18.3
zstandard                0.23.0

pyproject.toml

[project]
name = "langchain-tutorial-rag-part1-with-langgraph"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "beautifulsoup4>=4.13.3",
    "langchain-community>=0.3.18",
    "langchain-ollama>=0.2.3",
    "langchain-qdrant>=0.2.0",
    "langgraph>=0.3.2",
]

[dependency-groups]
dev = [
    "mypy>=1.15.0",
    "ruff>=0.9.9",
]

[tool.mypy]
strict = true
disallow_any_unimported = true
#disallow_any_expr = true
disallow_any_explicit = true
warn_unreachable = true
pretty = true

Trying the LangChain tutorial RAG Part 1

Now, let's go through LangChain's RAG Part 1 tutorial once more.

Trying the LangChain Tutorial, RAG Part 1 (without LangSmith or LangGraph) - CLOVER🍀

Essentially, I will take what I wrote in the entry above and rewrite the parts that can be replaced with LangGraph.

Registering documents in the vector store

This part is exactly the same as in the previous entry.

Build a Retrieval Augmented Generation (RAG) App: Part 1 / Indexing

hello_load_documents.py

import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")
client.delete_collection(collection_name="tutorial_collection")
client.create_collection(
    collection_name="tutorial_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={
        "parse_only": bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    },
)

docs = loader.load()

assert len(docs) == 1
print(f"total characters: {len(docs[0].page_content)}")

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

print(f"splits = {len(all_splits)}")

document_ids = vector_store.add_documents(all_splits)
print(document_ids[:3])

Run it to vectorize and register the documents.

$ uv run hello_load_documents.py
USER_AGENT environment variable not set, consider setting it to identify your requests.
total characters: 43130
splits = 66
['61b67748c5c44c5bbd1f5d64f4b3eb39', '7d300a7ab0974170ba8316326387e965', '48db001c559c4be4884ee35b16e59ecf']

Performing retrieval and asking the Chat model (generation)

Next, we perform retrieval and ask the Chat model a question (generation).

Build a Retrieval Augmented Generation (RAG) App: Part 1 / Retrieval and Generation

Here is the source code I created.

hello_rag.py

from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
from langchain_ollama import OllamaEmbeddings
from langchain_ollama import ChatOllama
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from langgraph.graph import START, StateGraph
from typing_extensions import List, TypedDict

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

prompt = PromptTemplate.from_template("""You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context} """)

llm = ChatOllama(model="llama3.2:3b", temperature=0, base_url="http://localhost:11434")


class State(TypedDict):
    question: str
    context: List[Document]
    answer: str


def retrieve(state: State) -> dict[str, List[Document]]:
    print(f"retrieve state = {state}")
    print()
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State) -> dict[str, str]:
    print(f"generate state = {state}")
    print()
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

print(f"graph = {graph.get_graph()}")
print()

response = graph.invoke({"question": "What is Task Decomposition?"})
print(f"answer = {response['answer']}")

print()

print(f"response = {response}")

Let's look at the LangGraph-specific parts.

First, here is the state, which appears to be used to carry each node's results through the graph.

class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

Here are the functions used for retrieval and generation; each receives the State class defined above.

def retrieve(state: State) -> dict[str, List[Document]]:
    print(f"retrieve state = {state}")
    print()
    retrieved_docs = vector_store.similarity_search(state["question"])
    return {"context": retrieved_docs}


def generate(state: State) -> dict[str, str]:
    print(f"generate state = {state}")
    print()
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}

These perform the search, and then embed the retrieved results into the prompt's context to ask the Chat model the question, respectively.

Building a graph with these two functions as nodes.

graph_builder = StateGraph(State).add_sequence([retrieve, generate])
graph_builder.add_edge(START, "retrieve")
graph = graph_builder.compile()

Finally, we give the graph its input and execute it to get the result.

response = graph.invoke({"question": "What is Task Decomposition?"})
print(f"answer = {response['answer']}")

Since print statements are placed at each key point, let's run it and look at the intermediate results.

$ uv run hello_rag.py

First, the part that prints the graph.

print(f"graph = {graph.get_graph()}")
print()

This is just the structure printed as-is, but you can get a rough idea of the nodes and edges.

graph = Graph(nodes={'__start__': Node(id='__start__', name='__start__', data=<class 'langchain_core.utils.pydantic.LangGraphInput'>, metadata=None), 'retrieve': Node(id='retrieve', name='retrieve', data=retrieve(tags=None, recurse=True, explode_args=False, func_accepts_config=False, func_accepts={}), metadata=None), 'generate': Node(id='generate', name='generate', data=generate(tags=None, recurse=True, explode_args=False, func_accepts_config=False, func_accepts={}), metadata=None)}, edges=[Edge(source='__start__', target='retrieve', data=None, conditional=False), Edge(source='retrieve', target='generate', data=None, conditional=False)])

The tutorial uses the following code to render the graph visually.

from IPython.display import Image, display

display(Image(graph.get_graph().draw_mermaid_png()))

Let's look at the state passed to each node.

First, the retrieve function.

def retrieve(state: State) -> dict[str, List[Document]]:
    print(f"retrieve state = {state}")
    print()

Here, the state arrives containing only question.

retrieve state = {'question': 'What is Task Decomposition?'}

This is what we passed in when executing the graph.

response = graph.invoke({"question": "What is Task Decomposition?"})

This means the input has been mapped onto the State class.

class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

Next, the generate function.

def generate(state: State) -> dict[str, str]:
    print(f"generate state = {state}")
    print()

Here the state has grown: in addition to question, it now includes context.

generate state = {'question': 'What is Task Decomposition?', 'context': [Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '48db001c-559c-4be4-884e-e35b16e59ecf', '_collection_name': 'tutorial_collection'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '01c9c08e-5614-4ce6-8cde-ab599f8e164c', '_collection_name': 'tutorial_collection'}, page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." 
for writing a novel, or (3) with human inputs.'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '85acbb24-1dae-480a-85ea-7268a252fa60', '_collection_name': 'tutorial_collection'}, page_content='Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.\n\n\nChallenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': 'e72575f9-f306-48ee-b127-172aeb551355', '_collection_name': 'tutorial_collection'}, page_content='Fig. 3. Illustration of the Reflexion framework. (Image source: Shinn & Labash, 2023)\nThe heuristic function determines when the trajectory is inefficient or contains hallucination and should be stopped. Inefficient planning refers to trajectories that take too long without success. Hallucination is defined as encountering a sequence of consecutive identical actions that lead to the same observation in the environment.\nSelf-reflection is created by showing two-shot examples to LLM and each example is a pair of (failed trajectory, ideal reflection for guiding future changes in the plan). Then reflections are added into the agent’s working memory, up to three, to be used as context for querying LLM.')]}

This is the value the retrieve function returned, added to the state.

    return {"context": retrieved_docs}

Let's look at the answer obtained by running the graph. As you can probably guess by now, it refers to the value the generate function returned.

response = graph.invoke({"question": "What is Task Decomposition?"})
print(f"answer = {response['answer']}")

The result.

answer = Task Decomposition is a technique where a complicated task is broken down into smaller and simpler steps, allowing an agent or model to plan ahead and tackle complex problems more effectively. This process can be achieved through various methods, including Chain of Thought (CoT) and Tree of Thoughts, which utilize prompting techniques to guide the model's thinking process. Task decomposition enables models to shed light into their own thought processes and improve performance on complex tasks.

So what does the overall output look like?

print(f"response = {response}")

It contains all of question, context, and answer.

response = {'question': 'What is Task Decomposition?', 'context': [Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '48db001c-559c-4be4-884e-e35b16e59ecf', '_collection_name': 'tutorial_collection'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '01c9c08e-5614-4ce6-8cde-ab599f8e164c', '_collection_name': 'tutorial_collection'}, page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." 
for writing a novel, or (3) with human inputs.'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '85acbb24-1dae-480a-85ea-7268a252fa60', '_collection_name': 'tutorial_collection'}, page_content='Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.\n\n\nChallenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': 'e72575f9-f306-48ee-b127-172aeb551355', '_collection_name': 'tutorial_collection'}, page_content='Fig. 3. Illustration of the Reflexion framework. (Image source: Shinn & Labash, 2023)\nThe heuristic function determines when the trajectory is inefficient or contains hallucination and should be stopped. Inefficient planning refers to trajectories that take too long without success. Hallucination is defined as encountering a sequence of consecutive identical actions that lead to the same observation in the environment.\nSelf-reflection is created by showing two-shot examples to LLM and each example is a pair of (failed trajectory, ideal reflection for guiding future changes in the plan). 
Then reflections are added into the agent’s working memory, up to three, to be used as context for querying LLM.')], 'answer': "Task Decomposition is a technique where a complicated task is broken down into smaller and simpler steps, allowing an agent or model to plan ahead and tackle complex problems more effectively. This process can be achieved through various methods, including Chain of Thought (CoT) and Tree of Thoughts, which utilize prompting techniques to guide the model's thinking process. Task decomposition enables models to shed light into their own thought processes and improve performance on complex tasks."}

In other words, we end up with a state in which all of the State class's properties have been set.

class State(TypedDict):
    question: str
    context: List[Document]
    answer: str

Query analysis

The last step is query analysis.

Build a Retrieval Augmented Generation (RAG) App: Part 1 / Query analysis

We add a section to each loaded document as metadata.

hello_load_documents.py

import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")
client.delete_collection(collection_name="tutorial_collection")
client.create_collection(
    collection_name="tutorial_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={
        "parse_only": bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    },
)

docs = loader.load()

assert len(docs) == 1
print(f"total characters: {len(docs[0].page_content)}")

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

total_documents = len(all_splits)
third = total_documents // 3

for i, document in enumerate(all_splits):
    if i < third:
        document.metadata["section"] = "beginning"
    elif i < 2 * third:
        document.metadata["section"] = "middle"
    else:
        document.metadata["section"] = "end"

print(f"splits = {len(all_splits)}")

_ = vector_store.add_documents(all_splits)

Load the documents ahead of time.

$ uv run hello_load_documents.py

And here is the updated RAG code.

hello_rag.py

from langchain_core.documents import Document
from langchain_core.prompts import PromptTemplate
from langchain_ollama import OllamaEmbeddings
from langchain_ollama import ChatOllama
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import FieldCondition, Filter, MatchValue
from langgraph.graph import START, StateGraph
from typing import Literal
from typing_extensions import Annotated, List, TypedDict

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

prompt = PromptTemplate.from_template("""You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context} """)

llm = ChatOllama(model="llama3.2:3b", temperature=0, base_url="http://localhost:11434")


class Search(TypedDict):
    """Search query."""

    query: Annotated[str, ..., "Search query to run."]
    section: Annotated[
        Literal["beginning", "middle", "end"],
        ...,
        "Section to query.",
    ]


class State(TypedDict):
    question: str
    query: Search
    context: List[Document]
    answer: str


def analyze_query(state: State) -> dict[str, dict]:
    print(f"analyze query state = {state}")
    print()
    structured_llm = llm.with_structured_output(Search)

    # query = structured_llm.invoke(state["question"])
    # return {"query": query}

    _ = structured_llm.invoke(state["question"])
    return {"query": {"query": "Task Decomposition", "section": "end"}}


def retrieve(state: State) -> dict[str, List[Document]]:
    print(f"retrieve state = {state}")
    print()

    query = state["query"]
    retrieved_docs = vector_store.similarity_search(
        query["query"],
        filter=Filter(
            must=[
                FieldCondition(
                    key="metadata.section",
                    match=MatchValue(value=query["section"]),
                )
            ]
        ),
    )
    return {"context": retrieved_docs}


def generate(state: State) -> dict[str, str]:
    print(f"generate state = {state}")
    print()
    docs_content = "\n\n".join(doc.page_content for doc in state["context"])
    messages = prompt.invoke({"question": state["question"], "context": docs_content})
    response = llm.invoke(messages)
    return {"answer": response.content}


graph_builder = StateGraph(State).add_sequence([analyze_query, retrieve, generate])
graph_builder.add_edge(START, "analyze_query")
graph = graph_builder.compile()

print(f"graph = {graph.get_graph()}")
print()

response = graph.invoke({"question": "What is Task Decomposition?"})
print(f"answer = {response['answer']}")

print()

print(f"response = {response}")

A class that the result is mapped to when we ask the model to generate a query from the question.

class Search(TypedDict):
    """Search query."""

    query: Annotated[str, ..., "Search query to run."]
    section: Annotated[
        Literal["beginning", "middle", "end"],
        ...,
        "Section to query.",
    ]
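As an aside, the descriptions attached via Annotated are ordinary typing metadata, which — as I understand it — is what with_structured_output() draws on (along with the field types) to build the schema sent to the model. A small sketch of inspecting that metadata with the standard library:

```python
from typing import Annotated, Literal, TypedDict, get_type_hints


class Search(TypedDict):
    """Search query."""

    query: Annotated[str, ..., "Search query to run."]
    section: Annotated[
        Literal["beginning", "middle", "end"],
        ...,
        "Section to query.",
    ]


# include_extras=True keeps the Annotated wrappers instead of stripping them.
hints = get_type_hints(Search, include_extras=True)
for name, hint in hints.items():
    default, description = hint.__metadata__
    print(f"{name}: {description}")
```

Each field's metadata tuple holds the default placeholder (`...`) and the description string.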

This is added to State.

class State(TypedDict):
    question: str
    query: Search
    context: List[Document]
    answer: str

The function that generates a query from the question.

def analyze_query(state: State) -> dict[str, dict]:
    print(f"analyze query state = {state}")
    print()
    structured_llm = llm.with_structured_output(Search)

    # query = structured_llm.invoke(state["question"])
    # return {"query": query}

    _ = structured_llm.invoke(state["question"])
    return {"query": {"query": "Task Decomposition", "section": "end"}}

That said, the model I am using still failed to recognize the section correctly, so I hardcoded the query…

If you look at the intermediate state, this is what you get…

retrieve state = {'question': 'What is Task Decomposition?', 'query': {'query': 'Task Decomposition', 'section': 'definition'}}

generate state = {'question': 'What is Task Decomposition?', 'query': {'query': 'Task Decomposition', 'section': 'definition'}, 'context': []}
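Rather than hardcoding the whole query, one alternative (a hypothetical helper of mine, not from the tutorial) would be to keep the model-generated query text and only repair the section field, falling back to a default when the model returns something outside the allowed values:

```python
from typing import Literal, get_args

Section = Literal["beginning", "middle", "end"]
VALID_SECTIONS = get_args(Section)


def normalize_section(section: str, default: str = "end") -> str:
    """Return the section if it is one of the allowed values, else the default."""
    return section if section in VALID_SECTIONS else default


print(normalize_section("definition"))  # end ('definition' is what the model returned)
print(normalize_section("middle"))      # middle
```

With something like this, analyze_query could pass the model's query text through unchanged and still guarantee a valid filter value.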

The retrieve function is changed to search with a filter.

def retrieve(state: State) -> dict[str, List[Document]]:
    print(f"retrieve state = {state}")
    print()

    query = state["query"]
    retrieved_docs = vector_store.similarity_search(
        query["query"],
        filter=Filter(
            must=[
                FieldCondition(
                    key="metadata.section",
                    match=MatchValue(value=query["section"]),
                )
            ]
        ),
    )
    return {"context": retrieved_docs}

The generate function is unchanged.

In the graph, analyze_query is now the starting point.

graph_builder = StateGraph(State).add_sequence([analyze_query, retrieve, generate])
graph_builder.add_edge(START, "analyze_query")
graph = graph_builder.compile()

Run it.

$ uv run hello_rag.py

Let's follow the intermediate results.

The graph.

graph = Graph(nodes={'__start__': Node(id='__start__', name='__start__', data=<class 'langchain_core.utils.pydantic.LangGraphInput'>, metadata=None), 'analyze_query': Node(id='analyze_query', name='analyze_query', data=analyze_query(tags=None, recurse=True, explode_args=False, func_accepts_config=False, func_accepts={}), metadata=None), 'retrieve': Node(id='retrieve', name='retrieve', data=retrieve(tags=None, recurse=True, explode_args=False, func_accepts_config=False, func_accepts={}), metadata=None), 'generate': Node(id='generate', name='generate', data=generate(tags=None, recurse=True, explode_args=False, func_accepts_config=False, func_accepts={}), metadata=None)}, edges=[Edge(source='__start__', target='analyze_query', data=None, conditional=False), Edge(source='analyze_query', target='retrieve', data=None, conditional=False), Edge(source='retrieve', target='generate', data=None, conditional=False)])

The state passed to the analyze_query function.

analyze query state = {'question': 'What is Task Decomposition?'}

The state passed to the retrieve function.

retrieve state = {'question': 'What is Task Decomposition?', 'query': {'query': 'Task Decomposition', 'section': 'end'}}

The state passed to the generate function.

generate state = {'question': 'What is Task Decomposition?', 'query': {'query': 'Task Decomposition', 'section': 'end'}, 'context': [Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'section': 'end', '_id': '1427dcb6-a342-4edb-a678-627dc5b0b6a0', '_collection_name': 'tutorial_collection'}, page_content='Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.\n\n\nChallenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'section': 'end', '_id': '64521312-8177-4b3b-a450-3698ee5c51ec', '_collection_name': 'tutorial_collection'}, page_content='Here are a sample conversation for task clarification sent to OpenAI ChatCompletion endpoint used by GPT-Engineer. The user inputs are wrapped in {{user input text}}.\n[\n  {\n    "role": "system",\n    "content": "You will read instructions and not carry them out, only seek to clarify them.\\nSpecifically you will first summarise a list of super short bullets of areas that need clarification.\\nThen you will pick one clarifying question, and wait for an answer from the user.\\n"\n  },\n  {\n    "role": "user",\n    "content": "We are writing {{a Super Mario game in python. MVC components split in separate files. 
Keyboard control.}}\\n"\n  },\n  {\n    "role": "assistant",'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'section': 'end', '_id': '50c30ebe-acf3-453e-9601-a27a176b5916', '_collection_name': 'tutorial_collection'}, page_content='Or\n@article{weng2023agent,\n  title   = "LLM-powered Autonomous Agents",\n  author  = "Weng, Lilian",\n  journal = "lilianweng.github.io",\n  year    = "2023",\n  month   = "Jun",\n  url     = "https://lilianweng.github.io/posts/2023-06-23-agent/"\n}\nReferences#\n[1] Wei et al. “Chain of thought prompting elicits reasoning in large language models.” NeurIPS 2022\n[2] Yao et al. “Tree of Thoughts: Dliberate Problem Solving with Large Language Models.” arXiv preprint arXiv:2305.10601 (2023).\n[3] Liu et al. “Chain of Hindsight Aligns Language Models with Feedback\n“ arXiv preprint arXiv:2302.02676 (2023).\n[4] Liu et al. “LLM+P: Empowering Large Language Models with Optimal Planning Proficiency” arXiv preprint arXiv:2304.11477 (2023).\n[5] Yao et al. “ReAct: Synergizing reasoning and acting in language models.” ICLR 2023.\n[6] Google Blog. “Announcing ScaNN: Efficient Vector Similarity Search” July 28, 2020.\n[7] https://chat.openai.com/share/46ff149e-a4c7-4dd7-a800-fc4a642ea389'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'section': 'end', '_id': '5f8fd766-f45c-4f48-b4e4-7be2a710787c', '_collection_name': 'tutorial_collection'}, page_content='[6] Google Blog. “Announcing ScaNN: Efficient Vector Similarity Search” July 28, 2020.\n[7] https://chat.openai.com/share/46ff149e-a4c7-4dd7-a800-fc4a642ea389\n[8] Shinn & Labash. “Reflexion: an autonomous agent with dynamic memory and self-reflection” arXiv preprint arXiv:2303.11366 (2023).\n[9] Laskin et al. “In-context Reinforcement Learning with Algorithm Distillation” ICLR 2023.\n[10] Karpas et al. 
“MRKL Systems A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning.” arXiv preprint arXiv:2205.00445 (2022).\n[11] Nakano et al. “Webgpt: Browser-assisted question-answering with human feedback.” arXiv preprint arXiv:2112.09332 (2021).\n[12] Parisi et al. “TALM: Tool Augmented Language Models”\n[13] Schick et al. “Toolformer: Language Models Can Teach Themselves to Use Tools.” arXiv preprint arXiv:2302.04761 (2023).\n[14] Weaviate Blog. Why is Vector Search so fast? Sep 13, 2022.')]}

Result.

answer = Task decomposition refers to breaking down a complex task into smaller, manageable sub-tasks that can be solved individually. This approach helps in organizing and prioritizing tasks, making it easier to manage and complete them efficiently. It involves identifying the individual components of a larger task and allocating resources accordingly.

response = {'question': 'What is Task Decomposition?', 'query': {'query': 'Task Decomposition', 'section': 'end'}, 'context': [Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'section': 'end', '_id': '1427dcb6-a342-4edb-a678-627dc5b0b6a0', '_collection_name': 'tutorial_collection'}, page_content='Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.\n\n\nChallenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'section': 'end', '_id': '64521312-8177-4b3b-a450-3698ee5c51ec', '_collection_name': 'tutorial_collection'}, page_content='Here are a sample conversation for task clarification sent to OpenAI ChatCompletion endpoint used by GPT-Engineer. The user inputs are wrapped in {{user input text}}.\n[\n  {\n    "role": "system",\n    "content": "You will read instructions and not carry them out, only seek to clarify them.\\nSpecifically you will first summarise a list of super short bullets of areas that need clarification.\\nThen you will pick one clarifying question, and wait for an answer from the user.\\n"\n  },\n  {\n    "role": "user",\n    "content": "We are writing {{a Super Mario game in python. MVC components split in separate files. 
Keyboard control.}}\\n"\n  },\n  {\n    "role": "assistant",'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'section': 'end', '_id': '50c30ebe-acf3-453e-9601-a27a176b5916', '_collection_name': 'tutorial_collection'}, page_content='Or\n@article{weng2023agent,\n  title   = "LLM-powered Autonomous Agents",\n  author  = "Weng, Lilian",\n  journal = "lilianweng.github.io",\n  year    = "2023",\n  month   = "Jun",\n  url     = "https://lilianweng.github.io/posts/2023-06-23-agent/"\n}\nReferences#\n[1] Wei et al. “Chain of thought prompting elicits reasoning in large language models.” NeurIPS 2022\n[2] Yao et al. “Tree of Thoughts: Dliberate Problem Solving with Large Language Models.” arXiv preprint arXiv:2305.10601 (2023).\n[3] Liu et al. “Chain of Hindsight Aligns Language Models with Feedback\n“ arXiv preprint arXiv:2302.02676 (2023).\n[4] Liu et al. “LLM+P: Empowering Large Language Models with Optimal Planning Proficiency” arXiv preprint arXiv:2304.11477 (2023).\n[5] Yao et al. “ReAct: Synergizing reasoning and acting in language models.” ICLR 2023.\n[6] Google Blog. “Announcing ScaNN: Efficient Vector Similarity Search” July 28, 2020.\n[7] https://chat.openai.com/share/46ff149e-a4c7-4dd7-a800-fc4a642ea389'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'section': 'end', '_id': '5f8fd766-f45c-4f48-b4e4-7be2a710787c', '_collection_name': 'tutorial_collection'}, page_content='[6] Google Blog. “Announcing ScaNN: Efficient Vector Similarity Search” July 28, 2020.\n[7] https://chat.openai.com/share/46ff149e-a4c7-4dd7-a800-fc4a642ea389\n[8] Shinn & Labash. “Reflexion: an autonomous agent with dynamic memory and self-reflection” arXiv preprint arXiv:2303.11366 (2023).\n[9] Laskin et al. “In-context Reinforcement Learning with Algorithm Distillation” ICLR 2023.\n[10] Karpas et al. 
“MRKL Systems A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning.” arXiv preprint arXiv:2205.00445 (2022).\n[11] Nakano et al. “Webgpt: Browser-assisted question-answering with human feedback.” arXiv preprint arXiv:2112.09332 (2021).\n[12] Parisi et al. “TALM: Tool Augmented Language Models”\n[13] Schick et al. “Toolformer: Language Models Can Teach Themselves to Use Tools.” arXiv preprint arXiv:2302.04761 (2023).\n[14] Weaviate Blog. Why is Vector Search so fast? Sep 13, 2022.')], 'answer': 'Task decomposition refers to breaking down a complex task into smaller, manageable sub-tasks that can be solved individually. This approach helps in organizing and prioritizing tasks, making it easier to manage and complete them efficiently. It involves identifying the individual components of a larger task and allocating resources accordingly.'}

Looks good.
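The final state above shows the LangGraph pattern at work: each node returns a dict of new keys, and the graph merges them into the shared state. As a minimal stand-in sketch (plain Python only, with hypothetical `retrieve`/`generate` functions in place of the tutorial's vector-store and LLM calls), the accumulation looks like this:

```python
from typing import List, TypedDict


class State(TypedDict, total=False):
    question: str
    context: List[str]
    answer: str


def retrieve(state: State) -> dict:
    # Stand-in for vector-store retrieval: returns only the new keys to merge.
    docs = ["Task decomposition breaks a complex task into smaller sub-tasks."]
    return {"context": docs}


def generate(state: State) -> dict:
    # Stand-in for the LLM call: composes an answer from question and context.
    return {"answer": f"Answer to '{state['question']}' using {len(state['context'])} documents"}


def invoke(state: State) -> State:
    # LangGraph's compiled graph merges each node's returned dict into the state;
    # here the same merge is done by hand for a linear retrieve -> generate flow.
    for node in (retrieve, generate):
        state = {**state, **node(state)}
    return state


response = invoke({"question": "What is Task Decomposition?"})
print(response["answer"])
```

In the actual tutorial code this merge is performed by `StateGraph` itself when the compiled graph's `invoke` runs, which is why the printed `response` contains the question, context, and answer together.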

Conclusion

I revisited LangChain's RAG Part 1 tutorial, which I had tried before, this time adding LangGraph.

Having once gone through the effort of wiring everything together with LCEL, the behavior was much easier to understand after rewriting it with LangGraph.

It is convenient, too. I still think it is best to get comfortable with LangChain itself first, but it makes sense to bring in LangGraph as the material calls for it.



