https://kazuhira-r.hatenablog.com/entry/2025/03/01/114720

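Fine-tuning is one way to inject data into a language model, but in practice it tends to increase hallucinations.
Continually training a language model is costly and hard to maintain.
The model may answer using data that the user is not authorized to see.
It must be possible to verify which data an answer was based on.
The knowledge supplied through fine-tuning is orders of magnitude smaller than the data used to train the foundation model, so the model may forget what it originally learned or fail to pick up the nuances of the custom data.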

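RAG is a mechanism that works as follows.

A query is given.
The query is used to search a knowledge base.
The search results are embedded as context in the prompt for the language model.
The language model uses the given context to generate an answer to the query.

In this way, combining a language model with targeted information retrieval through RAG brings you closer to building
a more capable and more reliable AI system.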
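The four steps above can be sketched in a few lines of plain Python. This is only a toy illustration, not LangChain code: the knowledge base, the word-overlap ranking, and the names `KNOWLEDGE_BASE`, `retrieve`, and `build_prompt` are all assumptions made for the example, and the final call to a language model is left out.

```python
# Toy illustration of the RAG flow; every name here is hypothetical.
KNOWLEDGE_BASE = [
    "Task decomposition breaks a complex task into smaller steps.",
    "Qdrant is a vector database.",
    "Ollama runs language models locally.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of words."""
    return {word.strip(".,?!").lower() for word in text.split()}


def retrieve(query: str, k: int = 1) -> list[str]:
    """Step 2: rank knowledge-base entries by word overlap with the query."""
    query_words = tokenize(query)
    ranked = sorted(
        KNOWLEDGE_BASE,
        key=lambda entry: len(query_words & tokenize(entry)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Step 3: embed the search results in the prompt as context."""
    joined = "\n".join(context)
    return f"Question: {query}\nContext: {joined}"


query = "What is task decomposition?"  # step 1: a query is given
context = retrieve(query)              # step 2: search the knowledge base
prompt = build_prompt(query, context)  # step 3: build the prompt
print(prompt)                          # step 4 would send this prompt to a model
```

A real setup replaces the word-overlap ranking with embedding similarity in a vector store, which is what the rest of this post does.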

That's enough documentation reading. This time I'll try RAG Part 1 with Ollama as the chat model and Qdrant as the
vector store.

Environment

Here is the environment used this time.

$ python3 --version
Python 3.12.3


$ uv --version
uv 0.6.3

Ollama.

$ bin/ollama serve
$ bin/ollama --version
ollama version is 0.5.12

Qdrant is assumed to be running at 172.17.0.2.

$ ./qdrant --version
qdrant 1.13.4

Preparation

First, create the project.

$ uv init --vcs none langchain-tutorial-rag-part1
$ cd langchain-tutorial-rag-part1
$ rm main.py

Install the dependencies needed this time.

$ uv add langchain-text-splitters langchain-community langchain-core langchain-ollama langchain-qdrant beautifulsoup4

I'll add mypy and Ruff as well.

$ uv add --dev mypy ruff

The list of installed dependencies.

$ uv pip list
Package                  Version
------------------------ ---------
aiohappyeyeballs         2.4.6
aiohttp                  3.11.13
aiosignal                1.3.2
annotated-types          0.7.0
anyio                    4.8.0
attrs                    25.1.0
beautifulsoup4           4.13.3
certifi                  2025.1.31
charset-normalizer       3.4.1
dataclasses-json         0.6.7
frozenlist               1.5.0
greenlet                 3.1.1
grpcio                   1.70.0
grpcio-tools             1.70.0
h11                      0.14.0
h2                       4.2.0
hpack                    4.1.0
httpcore                 1.0.7
httpx                    0.28.1
httpx-sse                0.4.0
hyperframe               6.1.0
idna                     3.10
jsonpatch                1.33
jsonpointer              3.0.0
langchain                0.3.19
langchain-community      0.3.18
langchain-core           0.3.40
langchain-ollama         0.2.3
langchain-qdrant         0.2.0
langchain-text-splitters 0.3.6
langsmith                0.3.11
marshmallow              3.26.1
multidict                6.1.0
mypy                     1.15.0
mypy-extensions          1.0.0
numpy                    2.2.3
ollama                   0.4.7
orjson                   3.10.15
packaging                24.2
portalocker              2.10.1
propcache                0.3.0
protobuf                 5.29.3
pydantic                 2.10.6
pydantic-core            2.27.2
pydantic-settings        2.8.0
python-dotenv            1.0.1
pyyaml                   6.0.2
qdrant-client            1.13.2
requests                 2.32.3
requests-toolbelt        1.0.0
ruff                     0.9.7
setuptools               75.8.1
sniffio                  1.3.1
soupsieve                2.6
sqlalchemy               2.0.38
tenacity                 9.0.0
typing-extensions        4.12.2
typing-inspect           0.9.0
urllib3                  2.3.0
yarl                     1.18.3
zstandard                0.23.0

pyproject.toml

[project]
name = "langchain-tutorial-rag-part1"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = [
    "beautifulsoup4>=4.13.3",
    "langchain-community>=0.3.18",
    "langchain-core>=0.3.40",
    "langchain-ollama>=0.2.3",
    "langchain-qdrant>=0.2.0",
    "langchain-text-splitters>=0.3.6",
]

[dependency-groups]
dev = [
    "mypy>=1.15.0",
    "ruff>=0.9.7",
]

[tool.mypy]
strict = true
disallow_any_unimported = true
#disallow_any_expr = true
disallow_any_explicit = true
warn_unreachable = true
pretty = true

Trying RAG Part 1 from the LangChain tutorial

Now let's work through RAG Part 1 from the LangChain tutorial.

Build a Retrieval Augmented Generation (RAG) App: Part 1 | 🦜️🔗 LangChain

I'll mainly follow this section.

Build a Retrieval Augmented Generation (RAG) App: Part 1 / Detailed walkthrough

I'll split this into three parts: loading the documents and registering them in the vector store, retrieval and
asking the chat model (generation), and query analysis.

Registering documents in the vector store

The first part goes as far as registering the documents in the vector store.

Build a Retrieval Augmented Generation (RAG) App: Part 1 / Indexing

Here is the source code I wrote.

hello_load_documents.py

import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")
client.delete_collection(collection_name="tutorial_collection")
client.create_collection(
    collection_name="tutorial_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={
        "parse_only": bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    },
)

docs = loader.load()

assert len(docs) == 1
print(f"total characters: {len(docs[0].page_content)}")

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

print(f"splits = {len(all_splits)}")

document_ids = vector_store.add_documents(all_splits)
print(document_ids[:3])

Let's go through it piece by piece.

Creating the embedding model. I use Ollama with the all-minilm:l6-v2 model.

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

I use Qdrant as the vector store. This recreates the Qdrant collection and creates the instance used as the
vector store. The vectors have 384 dimensions.

client = QdrantClient("http://172.17.0.2:6333")
client.delete_collection(collection_name="tutorial_collection")
client.create_collection(
    collection_name="tutorial_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)
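The collection above is configured with `Distance.COSINE`, so similarity is measured by the angle between vectors rather than their length. A minimal sketch of that measure, using hand-written two-dimensional vectors in place of the 384-dimensional embeddings (the function name `cosine_similarity` is my own and not part of any library used here):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Same direction, different length: similarity is close to 1.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
# Perpendicular vectors: similarity is 0.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)
```

Qdrant computes this internally; the sketch only shows what the configured distance means.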

This time the document is loaded from the web.

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={
        "parse_only": bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    },
)

docs = loader.load()

Build a Retrieval Augmented Generation (RAG) App: Part 1 / Indexing / Loading documents

The target is this page.

LLM Powered Autonomous Agents | Lil'Log

From it, the elements with the classes post-content, post-title, and post-header are extracted.

Let's check the number of documents and the character count.

assert len(docs) == 1
print(f"total characters: {len(docs[0].page_content)}")

Run it.

$ uv run hello_load_documents.py

The result. It is one character off from the tutorial's result, but that's fine.

total characters: 43130

Next is text splitting. At over 40,000 characters the text is too long to fit in the context, so it is split
into chunks.

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

print(f"splits = {len(all_splits)}")

It was split into 66 chunks.

splits = 66
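To make `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch of fixed-size splitting with overlap. It is not how RecursiveCharacterTextSplitter works internally: the real splitter prefers to cut at separators such as paragraph breaks, line breaks, and spaces, so its chunk boundaries differ. The function name `split_with_overlap` and the tiny sizes are assumptions for illustration.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size that share chunk_overlap characters."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start : start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(text, chunk_size=10, chunk_overlap=4)
print(chunks)
# ['abcdefghij', 'ghijklmnop', 'mnopqrstuv', 'stuvwxyz']
```

Each chunk starts with the last four characters of the previous one, which is the role the 200-character overlap plays in the real run: text near a boundary appears in both neighbouring chunks.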

Finally, save them to the vector store.

document_ids = vector_store.add_documents(all_splits)

Build a Retrieval Augmented Generation (RAG) App: Part 1 / Indexing / Storing documents

Let's look at three of the document IDs.

print(document_ids[:3])

The result.

['46c19bbc995041819d27934b18f1e415', 'f5f88815e166442096e86908c24a2874', '5e76e6503a2d4097937f37ae6d1aad61']

66 points were registered in the Qdrant collection as well.

That completes registration in the vector store.

Retrieval and asking the chat model (generation)

Next comes retrieval and asking the chat model (generation).

Build a Retrieval Augmented Generation (RAG) App: Part 1 / Retrieval and Generation

Here is the source code I wrote.

hello_rag.py

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_ollama import OllamaEmbeddings
from langchain_ollama import ChatOllama
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

prompt = PromptTemplate.from_template("""You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question} 
Context: {context} """)

example_messages = prompt.invoke(
    {"context": "(context goes here)", "question": "(question goes here)"}
).to_messages()

assert len(example_messages) == 1
print(example_messages[0].content)

print()

retriever = vector_store.as_retriever()

llm = ChatOllama(model="llama3.2:3b", temperature=0, base_url="http://localhost:11434")

chain = (
    {"question": RunnablePassthrough(), "context": retriever}
    | prompt
    | llm
    | StrOutputParser()
)

# chain = {
#     "question": RunnablePassthrough(),
#     "context": retriever,
# } | RunnablePassthrough.assign(answer=prompt | llm | StrOutputParser())

output = chain.invoke("What is Task Decomposition?")
print(output)

Since I'm not using LangGraph this time, the code differs somewhat from the tutorial's source.

First, create the Ollama embedding model and the Qdrant vector store.

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

The tutorial appears to use this prompt.

https://smith.langchain.com/hub/rlm/rag-prompt

This time I copied its text and used it directly.

prompt = PromptTemplate.from_template("""You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question} 
Context: {context} """)

Let's invoke the prompt with some values filled in.

example_messages = prompt.invoke(
    {"context": "(context goes here)", "question": "(question goes here)"}
).to_messages()

assert len(example_messages) == 1
print(example_messages[0].content)

print()

Run it.

$ uv run hello_rag.py

This is the result.

You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: (question goes here)
Context: (context goes here)

Get a retriever from the vector store and create the chat model as well. I chose llama3.2:3b as the chat model.

retriever = vector_store.as_retriever()

llm = ChatOllama(model="llama3.2:3b", temperature=0, base_url="http://localhost:11434")

From here the tutorial ties things together with LangGraph, but this time I decided to connect them with LCEL.

chain = (
    {"question": RunnablePassthrough(), "context": retriever}
    | prompt
    | llm
    | StrOutputParser()
)

LCEL lets you chain Runnables and run them in an optimized way.

LangChain Expression Language (LCEL) | 🦜️🔗 LangChain

Everything chained here is a Runnable.

How to chain runnables | 🦜️🔗 LangChain

The last element is an output parser.

Output parsers | 🦜️🔗 LangChain

Here I use StrOutputParser, which treats the model's response as plain text.

How to parse text from message objects | 🦜️🔗 LangChain

Then invoke the chain.

output = chain.invoke("What is Task Decomposition?")
print(output)

I skipped over it earlier, but RunnablePassthrough is specified so that the argument given here is passed through
as question.

    {"question": RunnablePassthrough(), "context": retriever}

How to pass through arguments from one step to the next | 🦜️🔗 LangChain

And context receives the value obtained from the retriever.
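The behaviour of the dictionary step can be pictured with a plain-Python analogy: the same input is handed to every value in the dictionary, and the results are collected under the same keys. This is a sketch of the idea only, not LangChain's implementation; `run_parallel`, `passthrough`, and `fake_retriever` are hypothetical names.

```python
from typing import Callable


def passthrough(value: str) -> str:
    """Return the input unchanged, like RunnablePassthrough."""
    return value


def fake_retriever(query: str) -> list[str]:
    """Hypothetical stand-in for the vector store retriever."""
    return [f"document matching '{query}'"]


def run_parallel(
    steps: dict[str, Callable[[str], object]], value: str
) -> dict[str, object]:
    """Call every step with the same input and collect results by key."""
    return {key: step(value) for key, step in steps.items()}


result = run_parallel(
    {"question": passthrough, "context": fake_retriever},
    "What is Task Decomposition?",
)
print(result)
```

The resulting dictionary has exactly the two keys that the prompt template expects, `question` and `context`.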

Running this yields the following text.

Task Decomposition is a technique used to break down complex tasks into smaller and simpler steps. It involves instructing a model to "think step by step" to utilize more test-time computation and transform big tasks into multiple manageable tasks. This technique can be done with simple prompting or using task-specific instructions.

It differs from the result shown in the tutorial, but it says roughly the same thing.

To see the intermediate values along the way, use RunnablePassthrough.assign as follows.

#chain = (
#    {"question": RunnablePassthrough(), "context": retriever}
#    | prompt
#    | llm
#    | StrOutputParser()
#)

chain = {
    "question": RunnablePassthrough(),
    "context": retriever,
} | RunnablePassthrough.assign(answer=prompt | llm | StrOutputParser())

The result then looks like this, and the intermediate information is visible too.

{'question': 'What is Task Decomposition?', 'context': [Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '5e76e650-3a2d-4097-937f-37ae6d1aad61', '_collection_name': 'tutorial_collection'}, page_content='Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': 'b70846e0-c3fe-4dd9-9824-68bf58117f06', '_collection_name': 'tutorial_collection'}, page_content='Tree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.\nTask decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are the subgoals for achieving XYZ?", (2) by using task-specific instructions; e.g. "Write a story outline." 
for writing a novel, or (3) with human inputs.'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': '5fd575fd-5836-4873-b416-f3e2ebccf7d8', '_collection_name': 'tutorial_collection'}, page_content='Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.\n\n\nChallenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.'), Document(metadata={'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', '_id': 'c26f2f03-00a6-40e3-b80b-bbd9b09c599f', '_collection_name': 'tutorial_collection'}, page_content='Fig. 3. Illustration of the Reflexion framework. (Image source: Shinn & Labash, 2023)\nThe heuristic function determines when the trajectory is inefficient or contains hallucination and should be stopped. Inefficient planning refers to trajectories that take too long without success. Hallucination is defined as encountering a sequence of consecutive identical actions that lead to the same observation in the environment.\nSelf-reflection is created by showing two-shot examples to LLM and each example is a pair of (failed trajectory, ideal reflection for guiding future changes in the plan). 
Then reflections are added into the agent’s working memory, up to three, to be used as context for querying LLM.')], 'answer': 'Task Decomposition is a technique used to break down complex tasks into smaller and simpler steps. It involves instructing a model to "think step by step" to decompose hard tasks into manageable subtasks. This technique can enhance model performance on complex tasks by transforming big tasks into multiple smaller tasks.'}
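The output above keeps question and context and adds answer next to them. As a rough plain-Python analogy (not LangChain's implementation), an assign step computes new keys from the current dictionary and merges them into a copy of it. The names `assign` and `fake_answer` are hypothetical.

```python
from typing import Any, Callable

State = dict[str, Any]


def assign(state: State, **steps: Callable[[State], Any]) -> State:
    """Return a copy of state with one extra key per step."""
    extra = {key: step(state) for key, step in steps.items()}
    return {**state, **extra}


def fake_answer(state: State) -> str:
    # Hypothetical stand-in for prompt | llm | StrOutputParser().
    return f"answer based on {len(state['context'])} document(s)"


state = {"question": "What is Task Decomposition?", "context": ["doc 1", "doc 2"]}
result = assign(state, answer=fake_answer)
print(result)
```

Because the original keys are kept, the retrieved documents can be inspected alongside the generated answer.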

Query analysis

Last is query analysis.

Build a Retrieval Augmented Generation (RAG) App: Part 1 / Query analysis

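Concretely, it does the following two things.

Add metadata to the documents and use it as a filter at search time.
Rather than using the user's question verbatim, have the model convert it into a search query.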

So the source code written so far needs to change.

First, the part that loads the documents into the vector store.

hello_load_documents.py

import bs4
from langchain_community.document_loaders import WebBaseLoader
from langchain_ollama import OllamaEmbeddings
from langchain_qdrant import QdrantVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from qdrant_client import QdrantClient
from qdrant_client.http.models import Distance, VectorParams

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")
client.delete_collection(collection_name="tutorial_collection")
client.create_collection(
    collection_name="tutorial_collection",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

loader = WebBaseLoader(
    web_paths=("https://lilianweng.github.io/posts/2023-06-23-agent/",),
    bs_kwargs={
        "parse_only": bs4.SoupStrainer(
            class_=("post-content", "post-title", "post-header")
        )
    },
)

docs = loader.load()

assert len(docs) == 1
print(f"total characters: {len(docs[0].page_content)}")

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
all_splits = text_splitter.split_documents(docs)

print(f"splits = {len(all_splits)}")

total_documents = len(all_splits)
third = total_documents // 3

for i, document in enumerate(all_splits):
    if i < third:
        document.metadata["section"] = "beginning"
    elif i < 2 * third:
        document.metadata["section"] = "middle"
    else:
        document.metadata["section"] = "end"

print(all_splits[0].metadata)

document_ids = vector_store.add_documents(all_splits)
print(document_ids[:3])

I added the following, which attaches a section to each chunk as metadata.

total_documents = len(all_splits)
third = total_documents // 3

for i, document in enumerate(all_splits):
    if i < third:
        document.metadata["section"] = "beginning"
    elif i < 2 * third:
        document.metadata["section"] = "middle"
    else:
        document.metadata["section"] = "end"

print(all_splits[0].metadata)
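As a quick check of the labelling logic, the sketch below applies the same rule to 66 chunk indices and counts the labels. It assumes the boundary is the integer third of the total (`total // 3`), as in the tutorial; `label_section` is a hypothetical helper, not code from this post.

```python
from collections import Counter


def label_section(index: int, total: int) -> str:
    """Label a chunk by which third of the document it falls in."""
    third = total // 3
    if index < third:
        return "beginning"
    if index < 2 * third:
        return "middle"
    return "end"


labels = [label_section(i, 66) for i in range(66)]
counts = Counter(labels)
print(counts)
```

With 66 chunks each section gets 22 chunks. When the total is not divisible by three, the remainder ends up in the "end" section.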

The result looks like this.

$ uv run hello_load_documents.py
USER_AGENT environment variable not set, consider setting it to identify your requests.
total characters: 43130
splits = 66
{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'section': 'beginning'}
['de5ba0fcd43049708638f1627c6df237', '77edfef86c194f13877ac0f3e850892e', '4654208ada764f749305d5e6757fac16']

You can confirm here that the metadata was added.

{'source': 'https://lilianweng.github.io/posts/2023-06-23-agent/', 'section': 'beginning'}

Next, the chat model infers the section from the question, and the search uses that information.

However, this did not work well this time.

Here is the modified source code.

hello_rag.py

from typing import Literal
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough, RunnableLambda
from langchain_ollama import OllamaEmbeddings
from langchain_ollama import ChatOllama
from langchain_qdrant import QdrantVectorStore
from qdrant_client import QdrantClient
from qdrant_client.http.models import FieldCondition, Filter, MatchValue
from typing_extensions import Annotated, TypedDict

embeddings = OllamaEmbeddings(
    model="all-minilm:l6-v2", base_url="http://localhost:11434"
)

client = QdrantClient("http://172.17.0.2:6333")

vector_store = QdrantVectorStore(
    client=client, collection_name="tutorial_collection", embedding=embeddings
)

prompt = PromptTemplate.from_template("""You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context} """)

example_messages = prompt.invoke(
    {"context": "(context goes here)", "question": "(question goes here)"}
).to_messages()

assert len(example_messages) == 1
print(example_messages[0].content)

print()


class Search(TypedDict):
    """Search query."""

    query: Annotated[str, ..., "Search query to run."]
    section: Annotated[
        Literal["beginning", "middle", "end"],
        ...,
        "Section to query.",
    ]


#retriever = vector_store.as_retriever()

def retriever(query):
    retrieved_docs = vector_store.similarity_search(
        query["query"],
        filter=Filter(
            must=[
                FieldCondition(
                    key="metadata.section",
                    match=MatchValue(value=query["section"]),
                )
            ]
        ),
    )
    return "\n\n".join(doc.page_content for doc in retrieved_docs)


llm = ChatOllama(model="llama3.2:3b", temperature=0, base_url="http://localhost:11434")
structured_llm = llm.with_structured_output(Search)

print(structured_llm.invoke("What is Task Decomposition?"))

def provide_query_section(s):
    return {"query": "Task Decomposition", "section": "end"}

chain = (
    {
        "question": RunnablePassthrough(),
        "context": RunnableLambda(provide_query_section) | retriever,
    }
    | prompt
    | llm
    | StrOutputParser()
)

#chain = {
#   "question": RunnablePassthrough(),
#   "context": RunnableLambda(provide_query_section) | retriever,
#} | RunnablePassthrough.assign(answer=prompt | llm | StrOutputParser())

output = chain.invoke("What is Task Decomposition?")
print(output)

Using this class,

class Search(TypedDict):
    """Search query."""

    query: Annotated[str, ..., "Search query to run."]
    section: Annotated[
        Literal["beginning", "middle", "end"],
        ...,
        "Section to query.",
    ]

the model is asked to infer which section to choose from the question, but...

structured_llm = llm.with_structured_output(Search)

print(structured_llm.invoke("What is Task Decomposition?"))

with llama3.2:3b, the model used here, it returns a section that does not exist, which defeats the purpose.

{'query': 'Task Decomposition', 'section': 'Definition'}
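One way to detect this kind of output is to check the returned section against the allowed values before using it as a filter. The sketch below is a hypothetical guard that is not part of this post's code or the tutorial; `ALLOWED_SECTIONS` and `is_valid_search` are names made up for illustration.

```python
ALLOWED_SECTIONS = ("beginning", "middle", "end")


def is_valid_search(search: dict[str, str]) -> bool:
    """Check that the model picked one of the sections stored as metadata."""
    return search.get("section") in ALLOWED_SECTIONS


# The structured output that llama3.2:3b returned in this run.
model_output = {"query": "Task Decomposition", "section": "Definition"}
print(is_valid_search(model_output))
# False
```

A filter on a section that was never stored would match no points, so rejecting such output early makes the failure visible.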

So this time I'll proceed on the assumption that the chat model chose the following result, taken from the tutorial.

def provide_query_section(s):
    return {"query": "Task Decomposition", "section": "end"}

Then the vector store is searched with this content. section is used for filtering.

def retriever(query):
    retrieved_docs = vector_store.similarity_search(
        query["query"],
        filter=Filter(
            must=[
                FieldCondition(
                    key="metadata.section",
                    match=MatchValue(value=query["section"]),
                )
            ]
        ),
    )
    return "\n\n".join(doc.page_content for doc in retrieved_docs)

How filter is used apparently differs from one vector store to another. I didn't notice this, used the tutorial's
code as is, and got badly stuck when it didn't work.
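The tutorial's in-memory vector store takes a Python callable as the filter, whereas Qdrant takes a Filter object whose conditions name a field in each point's payload. The key is "metadata.section" because, as far as I understand langchain-qdrant's defaults, each chunk is stored with its text under page_content and its metadata under a nested metadata key. The sketch below only imitates that lookup in plain Python; it is not Qdrant's implementation, and `get_by_path` and the sample payloads are assumptions.

```python
from typing import Any


def get_by_path(payload: dict[str, Any], dotted_key: str) -> Any:
    """Follow a dotted key such as 'metadata.section' through nested dicts."""
    current: Any = payload
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


# Hypothetical payloads shaped like the assumed langchain-qdrant layout.
points = [
    {"page_content": "intro text", "metadata": {"section": "beginning"}},
    {"page_content": "closing text", "metadata": {"section": "end"}},
]

matches = [
    point["page_content"]
    for point in points
    if get_by_path(point, "metadata.section") == "end"
]
print(matches)
# ['closing text']
```

Only points that pass the condition take part in the similarity search, which is why a wrong key or value silently returns nothing.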

The LCEL chain is changed as follows.

chain = (
    {
        "question": RunnablePassthrough(),
        "context": RunnableLambda(provide_query_section) | retriever,
    }
    | prompt
    | llm
    | StrOutputParser()
)

The result.

Task decomposition refers to breaking down a complex task into smaller, manageable sub-tasks that can be solved individually. This approach helps in organizing and prioritizing tasks, making it easier to manage and complete them efficiently. It involves identifying the individual components of a larger task and allocating resources accordingly.

With this variant,

chain = {
   "question": RunnablePassthrough(),
   "context": RunnableLambda(provide_query_section) | retriever,
} | RunnablePassthrough.assign(answer=prompt | llm | StrOutputParser())

it looks like this.

{'question': 'What is Task Decomposition?', 'context': 'Finite context length: The restricted context capacity limits the inclusion of historical information, detailed instructions, API call context, and responses. The design of the system has to work with this limited communication bandwidth, while mechanisms like self-reflection to learn from past mistakes would benefit a lot from long or infinite context windows. Although vector stores and retrieval can provide access to a larger knowledge pool, their representation power is not as powerful as full attention.\n\n\nChallenges in long-term planning and task decomposition: Planning over a lengthy history and effectively exploring the solution space remain challenging. LLMs struggle to adjust plans when faced with unexpected errors, making them less robust compared to humans who learn from trial and error.\n\nHere are a sample conversation for task clarification sent to OpenAI ChatCompletion endpoint used by GPT-Engineer. The user inputs are wrapped in {{user input text}}.\n[\n  {\n    "role": "system",\n    "content": "You will read instructions and not carry them out, only seek to clarify them.\\nSpecifically you will first summarise a list of super short bullets of areas that need clarification.\\nThen you will pick one clarifying question, and wait for an answer from the user.\\n"\n  },\n  {\n    "role": "user",\n    "content": "We are writing {{a Super Mario game in python. MVC components split in separate files. Keyboard control.}}\\n"\n  },\n  {\n    "role": "assistant",\n\nOr\n@article{weng2023agent,\n  title   = "LLM-powered Autonomous Agents",\n  author  = "Weng, Lilian",\n  journal = "lilianweng.github.io",\n  year    = "2023",\n  month   = "Jun",\n  url     = "https://lilianweng.github.io/posts/2023-06-23-agent/"\n}\nReferences#\n[1] Wei et al. “Chain of thought prompting elicits reasoning in large language models.” NeurIPS 2022\n[2] Yao et al. 
“Tree of Thoughts: Dliberate Problem Solving with Large Language Models.” arXiv preprint arXiv:2305.10601 (2023).\n[3] Liu et al. “Chain of Hindsight Aligns Language Models with Feedback\n“ arXiv preprint arXiv:2302.02676 (2023).\n[4] Liu et al. “LLM+P: Empowering Large Language Models with Optimal Planning Proficiency” arXiv preprint arXiv:2304.11477 (2023).\n[5] Yao et al. “ReAct: Synergizing reasoning and acting in language models.” ICLR 2023.\n[6] Google Blog. “Announcing ScaNN: Efficient Vector Similarity Search” July 28, 2020.\n[7] https://chat.openai.com/share/46ff149e-a4c7-4dd7-a800-fc4a642ea389\n\n[6] Google Blog. “Announcing ScaNN: Efficient Vector Similarity Search” July 28, 2020.\n[7] https://chat.openai.com/share/46ff149e-a4c7-4dd7-a800-fc4a642ea389\n[8] Shinn & Labash. “Reflexion: an autonomous agent with dynamic memory and self-reflection” arXiv preprint arXiv:2303.11366 (2023).\n[9] Laskin et al. “In-context Reinforcement Learning with Algorithm Distillation” ICLR 2023.\n[10] Karpas et al. “MRKL Systems A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning.” arXiv preprint arXiv:2205.00445 (2022).\n[11] Nakano et al. “Webgpt: Browser-assisted question-answering with human feedback.” arXiv preprint arXiv:2112.09332 (2021).\n[12] Parisi et al. “TALM: Tool Augmented Language Models”\n[13] Schick et al. “Toolformer: Language Models Can Teach Themselves to Use Tools.” arXiv preprint arXiv:2302.04761 (2023).\n[14] Weaviate Blog. Why is Vector Search so fast? Sep 13, 2022.', 'answer': 'Task decomposition refers to breaking down a complex task into smaller, manageable sub-tasks that can be solved individually. This approach helps in organizing and prioritizing tasks, making it easier to manage and complete them efficiently. It involves identifying the individual components of a larger task and allocating resources accordingly.'}

It ended up being a somewhat forced way to run it, but it deepened my understanding of LCEL and related pieces.

Conclusion

I tried RAG Part 1 from the LangChain tutorial.

Dropping LangGraph and LangSmith made it look quite different, and the final query analysis did not produce good
results, but I'm glad my understanding of LCEL and related pieces improved.

I got to use output parsers too.

I'd like to move on to RAG Part 2, but I'm starting to feel I should use LangGraph after all,
since it seems to appear in Part 2 as well.