I use these models without really understanding how they work, so here are my notes.

Reference URL

Transformer overview

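The Transformer consists of an Encoder and a Decoder. It drops the RNN layers that earlier encoder-decoder models used internally and is built around Attention layers instead, which improved "speed, accuracy, and versatility". Without recurrence, every position in a sequence can be processed in parallel rather than one word at a time.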
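To make "Attention instead of RNN" concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. The function names (`softmax`, `attention`) and the toy vectors are my own assumptions for illustration. Real models also apply learned projection matrices and multiple heads, which are omitted here.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dim).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
                  for k in keys]
        weights = softmax(scores)
        # Weighted average of the value vectors.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs

# Toy 2-dimensional vectors for three "words" (made-up numbers).
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Self-attention: queries, keys, and values all come from the same sequence.
out = attention(x, x, x)
```

Each output row depends only on the full input `x`, not on the previous output row, which is why the loop over queries could run in parallel. An RNN cannot do that because each step needs the previous step's state.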

Examples of Transformer-based LLMs

LLM	Based on	Typical uses
BERT (by Google)	Encoder	Text classification, text summarization
GPT (by OpenAI)	Decoder	Text generation, question answering

Transformer architecture

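The Encoder and the Decoder can each be used on their own.
The Decoder's (unmasked) Attention layer exists to process the Encoder's output, so it is not needed when only the Decoder is used.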

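    [Encoder]                 [Decoder]
                               Output
                            Probabilities
                                  ↑
┌──────────────┐          ┌───────┴──────┐
│  Attention   ├─────────→│  Attention   │
└──────────────┘          └──────────────┘
        ↑                         ↑
        │                 ┌───────┴──────┐
        │                 │    Masked    │
        │                 │  Attention   │
        │                 └──────────────┘
        │                         ↑
┌───────┴──────┐          ┌───────┴──────┐
│  Positional  │          │  Positional  │
│   encoding   │          │   encoding   │
└──────────────┘          └──────────────┘
        ↑                         ↑
      Inputs                   Outputs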
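The "positional encoding" boxes in the diagram exist because Attention by itself ignores word order. Below is a rough sketch of one common scheme, the sinusoidal encoding. I am assuming this scheme for illustration, since these notes do not say which one a given model uses, and some models learn position vectors instead. The function name `positional_encoding` is hypothetical.

```python
import math

def positional_encoding(num_positions, dim):
    """Sinusoidal position vectors: sin on even indices, cos on odd indices."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(dim):
            # Each pair of dimensions (2k, 2k+1) shares one wavelength.
            angle = pos / (10000 ** ((i - i % 2) / dim))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

# One 6-dimensional position vector for each of 4 positions.
pe = positional_encoding(num_positions=4, dim=6)
```

Each row of `pe` would be added to the embedding of the word at that position, so two identical words at different positions end up with different input vectors.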

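Component	Role
Encoder	Converts the meaning of the input text into numeric vectors (one fixed-size vector per token)
Decoder	Generates the translated text from those meaning vectors
Attention layer	Weights the words in a sentence by how important they are
Masked Attention layer	Attention in which each position can only see earlier positions, with later words hidden (searching the web is probably the quickest way to get the details)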
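As a rough illustration of the Masked Attention row, the sketch below computes attention weights where word i can only look at words 0..i. The function name `causal_attention_weights` and the score values are made up for this example. Real implementations usually set the hidden scores to negative infinity before the softmax, which gives the same result as dropping them.

```python
import math

def causal_attention_weights(scores):
    """Row i may only attend to columns 0..i; later columns get weight 0."""
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]  # drop scores for future positions
        peak = max(visible)     # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Pad with zeros so every row has the full length again.
        padding = [0.0] * (len(row) - i - 1)
        weights.append([e / total for e in exps] + padding)
    return weights

# Made-up raw attention scores for a 3-word sequence.
scores = [[0.5, 2.0, 1.0],
          [1.0, 1.0, 3.0],
          [0.2, 0.4, 0.6]]
masked = causal_attention_weights(scores)
```

The first word gets all of its weight from itself, and the high score 3.0 in the second row is ignored because it belongs to a later word. This keeps the Decoder from peeking at words it has not generated yet.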