https://touch-sp.hatenablog.com/entry/2025/04/25/155150

Introduction

QAT stands for Quantization-Aware Training.
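To make the idea a little more concrete, here is a minimal sketch of the "fake quantization" step that quantization-aware training is generally built around: during training, weights are rounded to a low-bit grid and mapped back to floats, so the model learns to tolerate the rounding error. This is a generic symmetric 4-bit scheme written for illustration. The function name `fake_quantize` is hypothetical, and this is not the actual recipe used to train Gemma 3 nor the exact q4_0 layout.

```python
def fake_quantize(weights: list[float], bits: int = 4) -> list[float]:
    """Round weights to a symmetric signed integer grid, then map them back to floats."""
    qmax = 2 ** (bits - 1) - 1  # largest integer level, e.g. 7 for 4 bits
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)  # all zeros: nothing to quantize
    scale = max_abs / qmax  # one shared scale for the whole group of weights
    # Quantize (round and clamp), then immediately dequantize (multiply by scale)
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


original = [-1.0, 0.1, 0.5, 1.0]
simulated = fake_quantize(original)
for before, after in zip(original, simulated):
    print(f"{before:+.4f} -> {after:+.4f}")
```

In a real training loop this rounding would sit inside the forward pass of a deep learning framework, with gradients passed through the rounding step as if it were the identity.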

This reportedly keeps output quality high while substantially reducing VRAM usage.
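As a rough back-of-the-envelope check on the VRAM claim, the sketch below estimates the memory needed for the weights alone. It assumes about 27 billion parameters, 16 bits per weight for an unquantized bf16 model, and about 4.5 bits per weight for q4_0 (my understanding of the llama.cpp layout: blocks of 32 four-bit values plus one 16-bit scale). The real file may keep some tensors at higher precision, and the KV cache and activations need extra memory, so treat the numbers as an approximation (roughly 50 GiB versus 14 GiB).

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights in GiB (weights only, no KV cache)."""
    return n_params * bits_per_weight / 8 / 1024**3


N_PARAMS = 27e9  # assumption: roughly 27 billion parameters

bf16_gib = weight_memory_gib(N_PARAMS, 16)   # unquantized 16-bit weights
q4_0_gib = weight_memory_gib(N_PARAMS, 4.5)  # assumption: ~4.5 bits per weight for q4_0

print(f"bf16: {bf16_gib:.1f} GiB")
print(f"q4_0: {q4_0_gib:.1f} GiB")
```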

This time I tried "gemma-3-27b-it-qat-q4_0-gguf" from LM Studio. If you search for it inside LM Studio, it should turn up right away.

OCR and LangChain

I previously wrote an article about doing OCR with LangChain.
touch-sp.hatenablog.com
This time I used that approach from Gradio.

Python script

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain.chat_models import init_chat_model
import base64
import gradio as gr

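# Connect to the OpenAI-compatible server that LM Studio runs locally (the API key is a dummy value)
model = init_chat_model(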
    model="openai:gemma-3-27b-it-qat",
    api_key="EMPTY",
    base_url="http://localhost:1234/v1",
    temperature=0
)

# Read an image file and return its base64-encoded contents, keyed for the prompt template
def encode_image(image_path: str) -> dict[str, str]:
    with open(image_path, "rb") as image_file:
        return {"base64_image": base64.b64encode(image_file.read()).decode('utf-8')}

def extract_text(filepath: str):

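    # Build the prompt template.
    # The prompt strings are kept in Japanese, as in the original script:
    #   system: "You are an excellent AI assistant."
    #   human:  "Please extract the text from the image. Reply with the extracted text only."
    prompt_template = ChatPromptTemplate.from_messages(
        [
            ("system", "あなたは優秀なAIアシスタントです。"),
            ("human", [
                {"type": "text", "text": "画像からテキストを抽出して下さい。回答は抽出したテキストのみとして下さい。"},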
                {
                    "type": "image_url",
                    "image_url": {"url": "data:image/jpeg;base64,{base64_image}"}
                }
            ])
        ]
    )

    chain = (
        encode_image
        | prompt_template
        | model
        | StrOutputParser()
    )

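    # Stream the response and yield the accumulated text so the Gradio textbox updates as chunks arrive
    result = ""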
    for chunk in chain.stream(filepath):
        result += chunk
        yield result

with gr.Blocks() as demo:
    gr.Interface(
        fn=extract_text,
        inputs=gr.Image(type="filepath"),
        outputs=gr.Textbox(lines=10, max_lines=40, show_copy_button=True),
        flagging_mode="never"
    )

demo.launch(share=False)
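
The script above always labels the image as image/jpeg in the data URL, while an uploaded file may be a PNG or another format. As a minimal sketch, the hypothetical helper `to_data_url` below guesses the MIME type from the file extension with the standard library and falls back to image/jpeg when it cannot tell. Whether the mismatch actually matters depends on the server, so this is an optional refinement rather than a required fix.

```python
import base64
import mimetypes


def to_data_url(image_path: str, default_mime: str = "image/jpeg") -> str:
    """Return the file at image_path as a base64 data URL with a guessed MIME type."""
    mime, _ = mimetypes.guess_type(image_path)  # guess from the file extension only
    if mime is None or not mime.startswith("image/"):
        mime = default_mime  # fall back when the extension is unknown or not an image
    with open(image_path, "rb") as image_file:
        payload = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime};base64,{payload}"
```

To use it, `encode_image` could return something like `{"image_url": to_data_url(image_path)}` and the template could take `"{image_url}"` as the whole URL. The variable name `image_url` here is an assumption for illustration, not part of the original script.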
