https://daisuke20240310.hatenablog.com/entry/codeql

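Last time, I looked at the actual contents of the alert queries used by CodeQL, the tool developed by GitHub for detecting security vulnerabilities (its queries and libraries are open source).

Queries fall broadly into two kinds: alert queries and path queries. This time, I'll read through how path queries are implemented and try to understand them.

UseAfterFree.ql, the query for detecting Use After Free (UAF) that was my original goal, turned out to be a path query, so I'll start with it. I'd also like to look at other path queries.

Let's get started.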

Introduction

Here is the list of my "Security" articles. Please take a look if you're interested.

Security article list

The official CodeQL site is below.

codeql.github.com

The official documentation for the CodeQL CLI (GitHub Docs) is below. Helpfully, it is also available in Japanese.

docs.github.com

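According to the license page linked from this page, CodeQL may be used on open-source code, such as repositories published on GitHub, but using it on software that is not public, such as a private GitHub repository, appears to require a commercial license.

I hadn't fully understood this before, but besides the GitHub Docs above, there is a second set of official CodeQL documentation, shown below. The GitHub Docs pages (the ones available in Japanese) link to it, so I had vaguely noticed that a separate site existed, but I never introduced it properly.

I don't know why the documentation is split in two, but how to write queries and the syntax of the query language are covered only on this second site.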

codeql.github.com

I'll use it as a reference while reading through actual queries.

UseAfterFree.ql

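This is UseAfterFree.ql (qlpacks/codeql/cpp-queries/1.4.2/Critical/UseAfterFree.ql), the path query that is supposed to detect UAF.

It also comes with a Markdown description file (UseAfterFree.md), but that only explains UAF itself and contains nothing that helps in reading the query.

Here is UseAfterFree.ql.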

/**
 * @name Potential use after free
 * @description An allocated memory block is used after it has been freed. Behavior in such cases is undefined and can cause memory corruption.
 * @kind path-problem
 * @precision high
 * @id cpp/use-after-free
 * @problem.severity warning
 * @security-severity 9.3
 * @tags reliability
 *       security
 *       external/cwe/cwe-416
 */

import cpp
import semmle.code.cpp.dataflow.new.DataFlow
import semmle.code.cpp.ir.IR
import semmle.code.cpp.security.flowafterfree.FlowAfterFree
import semmle.code.cpp.security.flowafterfree.UseAfterFree
import UseAfterFreeTrace::PathGraph

module UseAfterFreeParam implements FlowFromFreeParamSig {
  predicate isSink = isUse/2;

  predicate isExcluded = isExcludedMmFreePageFromMdl/2;

  predicate sourceSinkIsRelated = defaultSourceSinkIsRelated/2;
}

import UseAfterFreeParam

module UseAfterFreeTrace = FlowFromFree<UseAfterFreeParam>;

from UseAfterFreeTrace::PathNode source, UseAfterFreeTrace::PathNode sink, DeallocationExpr dealloc
where
  UseAfterFreeTrace::flowPath(source, sink) and
  isFree(source.getNode(), _, _, dealloc)
select sink.getNode(), source, sink, "Memory may have been previously freed by $@.", dealloc,
  dealloc.toString()

To prepare for reading this, I added a lot of material on the basics of path queries to the following article.

daisuke20240310.hatenablog.com

Here, I'll work through the parts not covered there.

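Start with the line predicate isSink = isUse/2;. The /2 is the arity: it selects the predicate named isUse that takes two arguments. The line as a whole defines isSink as an alias (another name) for that predicate. My first guess was that isUse belonged to FlowFromFreeParamSig, but as the library code later in this article shows, FlowFromFreeParamSig is a signature module that declares isSink, and isUse is an ordinary predicate defined in UseAfterFree.qll.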

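This source file alone tells me almost nothing, so I'll look inside the libraries it imports.

import semmle.code.cpp.security.flowafterfree.FlowAfterFree and import semmle.code.cpp.security.flowafterfree.UseAfterFree look like the important ones, so I'll locate them first.

First, I searched the directory where I extracted the release binary.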

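In hindsight this was not a great way to search (in a grep pattern the trailing * repeats the preceding e rather than acting as a wildcard), but it did find them. The same file (UseAfterFree.qll) appears to exist in several places.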

$ grep -rI 'FlowAfterFree*' .
./qlpacks/codeql/cpp-all/5.1.0/semmle/code/cpp/security/flowafterfree/UseAfterFree.qll:private import semmle.code.cpp.security.flowafterfree.FlowAfterFree
./qlpacks/codeql/cpp-examples/0.0.0/.codeql/libraries/codeql/cpp-all/5.1.0/semmle/code/cpp/security/flowafterfree/UseAfterFree.qll:private import semmle.code.cpp.security.flowafterfree.FlowAfterFree
./qlpacks/codeql/cpp-queries/1.4.2/Critical/UseAfterFree.ql:import semmle.code.cpp.security.flowafterfree.FlowAfterFree
./qlpacks/codeql/cpp-queries/1.4.2/Critical/DoubleFree.ql:import semmle.code.cpp.security.flowafterfree.FlowAfterFree
./qlpacks/codeql/cpp-queries/1.4.2/.codeql/libraries/codeql/cpp-all/5.1.0/semmle/code/cpp/security/flowafterfree/UseAfterFree.qll:private import semmle.code.cpp.security.flowafterfree.FlowAfterFree

FlowAfterFree.qll was in the same directory as UseAfterFree.qll. These are the two files being imported. It also looks like I first need to read qlpacks/codeql/cpp-queries/1.4.2/.codeql/libraries/codeql/cpp-all/5.1.0/semmle/code/cpp/dataflow/new/DataFlow.qll and qlpacks/codeql/cpp-queries/1.4.2/.codeql/libraries/codeql/cpp-all/5.1.0/semmle/code/cpp/ir/IR.qll.
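Judging from the grep output above, a dotted import name seems to correspond directly to a relative file path ending in .qll. The following is a minimal Python sketch of a lookup based on that observation. import_to_relpath and find_library are hypothetical helper names of mine, not part of CodeQL, and the CodeQL CLI's real resolution rules may involve more than this.

```python
from pathlib import Path


def import_to_relpath(import_name: str) -> Path:
    """Turn a dotted import name into the relative .qll path it appears to map to."""
    return Path(*import_name.split(".")).with_suffix(".qll")


def find_library(root: Path, import_name: str) -> list[Path]:
    """Return every file under root whose trailing path components match the import."""
    rel = import_to_relpath(import_name)
    return sorted(
        p for p in root.rglob(rel.name)
        if p.parts[-len(rel.parts):] == rel.parts
    )
```

For example, find_library(Path("qlpacks"), "semmle.code.cpp.security.flowafterfree.FlowAfterFree") would list each copy of FlowAfterFree.qll whose directory structure matches the import, without matching unrelated lines inside other files the way my grep did.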

That looks like a high bar, but let's read through them.

DataFlow.qll

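It appears to do nothing but import other files.

In summary, the comment at the top of the file says the following.

It provides a library for local (intra-procedural) and global (inter-procedural) data flow analysis.

It decides whether data can flow from a source to a sink.

Unlike the library in semmle.code.cpp.dataflow, this library uses the IR (Intermediate Representation) library, which represents the program's semantics more precisely.

The other data flow library uses the more syntax-oriented AST (abstract syntax tree).

As a result, this library gives more precise results than the AST-based one in many cases.

Unless configured otherwise, flow means that the exact value of the source may reach the sink.

To track flow where the exact value of the source is not preserved, import semmle.code.cpp.dataflow.new.TaintTracking.

To use global (inter-procedural) data flow, extend the DataFlow::Configuration class (see the documentation on that class for details).

To use local (intra-procedural) data flow between expressions, call DataFlow::localExprFlow.

For more general local data flow, call DataFlow::localFlow or DataFlow::localFlowStep with arguments of type DataFlow::Node.

Here is the part of DataFlow.qll outside that comment.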

import cpp

/**
 * Provides classes for performing local (intra-procedural) and
 * global (inter-procedural) data flow analyses.
 */
module DataFlow {
  private import semmle.code.cpp.ir.dataflow.internal.DataFlowImplSpecific
  private import codeql.dataflow.DataFlow
  import DataFlowMake<Location, CppDataFlow>
  import Public
}

IR.qll

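It is almost entirely comments; the only effective line is import implementation.aliased_ssa.IR.

The comment says the following.

It provides the intermediate representation (IR) of C/C++ code.

The IR represents the semantics of the program and depends very little on the syntax that was used to write it.

For example, in C++ the three statements i += 1;, i++, and ++i all have the same semantic effect, but in the abstract syntax tree (AST) each is represented by a different type of Expr node.

In the IR, by contrast, all three statements are broken down into the following sequence of primitive operations: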

r1(int*) = VariableAddress[i] // Compute the address of variable i
r2(int) = Load &:r1, m0 // Load the value of i
r3(int) = Constant[1] // An integer constant with the value 1
r4(int) = Add r2, r3 // Add 1 to the value of i
r5(int) = Store &r1, r4 // Store the new value back into the variable i

This lets IR-based analysis focus on the primitive operations themselves, without being distracted by the many ways an operation can be written in source code.
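To convince myself that the five-step sequence really is just "load, add 1, store", here is a toy Python model. It is not CodeQL's IR: the tuple encoding and the names lower and run are made up for illustration, an address is modeled as a plain variable name, and the memory operand (m0) is dropped.

```python
def lower(statement: str) -> list[tuple]:
    """Lower any of the three increment statements to the same toy instruction list."""
    if statement not in {"i += 1;", "i++;", "++i;"}:
        raise ValueError(f"unsupported statement: {statement}")
    return [
        ("r1", "VariableAddress", "i"),  # address of variable i
        ("r2", "Load", "r1"),            # load the value of i
        ("r3", "Constant", 1),           # integer constant 1
        ("r4", "Add", "r2", "r3"),       # add 1 to the value of i
        ("r5", "Store", "r1", "r4"),     # store the new value back into i
    ]


def run(program: list[tuple], memory: dict) -> dict:
    """Execute the toy instructions and return an updated copy of memory."""
    memory = dict(memory)
    regs = {}
    for result, opcode, *args in program:
        if opcode == "VariableAddress":
            regs[result] = args[0]  # an "address" is just the variable name here
        elif opcode == "Load":
            regs[result] = memory[regs[args[0]]]
        elif opcode == "Constant":
            regs[result] = args[0]
        elif opcode == "Add":
            regs[result] = regs[args[0]] + regs[args[1]]
        elif opcode == "Store":
            memory[regs[args[0]]] = regs[args[1]]
    return memory
```

Because lower returns the same list for all three spellings, anything written against the instruction list handles them uniformly, which is the point the comment makes about IR-based analysis.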

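The main classes in the IR are:

IRFunction - Contains the IR for an entire function definition, including all of the Instructions, IRBlocks, and IRVariables in that function.

Instruction - Represents a single operation in the IR. An instruction specifies the operation performed, the Operands that supply its inputs, and the type of its result. Control flows from an Instruction to one of its successor Instructions.

Operand - Represents an input value of an Instruction. In the IR every input is represented explicitly as an Operand, even if that input was implicit in the source code. An Operand links to the Instruction that uses its value (the use) and to the Instruction that produced its value (the definition).

IRVariable - Represents a variable accessed by the IR of a particular function. An IRVariable is created for every variable the function accesses directly. In addition, temporary storage that is not explicitly declared in the source code (for example, a function's return value) is also represented as an IRVariable.

IRBlock - A basic block in the function's control flow graph. An IRBlock holds a consecutive sequence of instructions; control flow can enter only at the first instruction of the block and leave only from the last.

IRType - Represents the type of a value accessed in the IR. Unlike the AST's Type class, IRType is language-neutral. For example, in C++ unsigned int, char32_t, and wchar_t might all be represented as the IRType uint4, a four-byte unsigned integer.

Most queries should operate on the aliased SSA form of the IR, so that is what is exposed as the "IR".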

FlowAfterFree.qll

The file is long, so I'll look only at the parts that seem to be used.

First is FlowFromFreeParamSig. To be honest, I don't understand this at all.

/**
 * The signature for a module that is used to specify the inputs to the `FlowFromFree` module.
 */
signature module FlowFromFreeParamSig {
  /**
   * Holds if `n.asExpr() = e` and `n` is a sink in the `FlowFromFreeConfig` module.
   */
  predicate isSink(DataFlow::Node n, Expr e);

  /**
   * Holds if `dealloc` is a deallocation expression and `e` is an expression such
   * that `isFree(_, e)` holds for some `isFree` predicate satisfying `isSinkSig`,
   * and this source-sink pair should be excluded from the analysis.
   */
  bindingset[dealloc, e]
  predicate isExcluded(DeallocationExpr dealloc, Expr e);

  /**
   * Holds if `sink` should be considered a `sink` when the source of flow is `source`.
   */
  bindingset[source, sink]
  default predicate sourceSinkIsRelated(DataFlow::Node source, DataFlow::Node sink) { any() }
}
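One way I found to think about a signature module is as a list of "slots" that a parameterized module such as FlowFromFree expects to be filled in. The Python sketch below is only an analogy under that reading: FreeFlowParams and flow_from_free are hypothetical names, nodes are plain strings, and the real FlowFromFree does a data flow analysis rather than filtering a list of pairs.

```python
from typing import Callable, NamedTuple


class FreeFlowParams(NamedTuple):
    """Hypothetical stand-in for a module that implements FlowFromFreeParamSig."""
    is_sink: Callable[[str], bool]
    is_excluded: Callable[[str, str], bool]
    # Mirrors the `default` predicate: every pair is related unless overridden.
    source_sink_is_related: Callable[[str, str], bool] = lambda source, sink: True


def flow_from_free(params: FreeFlowParams, candidate_pairs):
    """Hypothetical stand-in for FlowFromFree<P>: keep pairs the parameters accept."""
    return [
        (source, sink)
        for source, sink in candidate_pairs
        if params.is_sink(sink)
        and not params.is_excluded(source, sink)
        and params.source_sink_is_related(source, sink)
    ]


def is_use(node: str) -> bool:
    """Toy sink test: treat any node spelled 'deref ...' as a dereference."""
    return node.startswith("deref")


# Filling the is_sink slot with is_use plays the role of `predicate isSink = isUse/2;`.
use_after_free_params = FreeFlowParams(is_sink=is_use, is_excluded=lambda d, e: False)
pairs = [("free(p)", "deref p"), ("free(p)", "print q")]
result = flow_from_free(use_after_free_params, pairs)
```

Here result keeps only ("free(p)", "deref p"). Seen this way, UseAfterFree.ql mostly chooses which predicates go into the three slots and leaves the analysis itself to the library.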

Next is isFree.

/**
 * Holds if `outgoing` is a dataflow node that represents the pointer passed to
 * `dealloc` after the call returns (i.e., the post-update node associated with
 * the argument to `dealloc`), and `incoming` is the corresponding argument
 * node going into `dealloc` (i.e., the pre-update node of `outgoing`).
 */
predicate isFree(DataFlow::Node outgoing, DataFlow::Node incoming, Expr e, DeallocationExpr dealloc) {
  exists(Expr conv |
    e = conv.getUnconverted() and
    conv = dealloc.getFreedExpr().getFullyConverted() and
    incoming = outgoing.(DataFlow::PostUpdateNode).getPreUpdateNode() and
    conv = incoming.asConvertedExpr()
  ) and
  // Ignore realloc functions
  not exists(dealloc.(FunctionCall).getTarget().(AllocationFunction).getReallocPtrArg())
}

Next is isExcludedMmFreePageFromMdl.

/**
 * `dealloc1` is a deallocation expression, `e` is an expression that dereferences a
 * pointer, and the `(dealloc1, e)` pair should be excluded by the `FlowFromFree` library.
 *
 * Note that `e` is not necessarily the expression deallocated by `dealloc1`. It will
 * be bound to the second deallocation as identified by the `FlowFromFree` library.
 *
 * From https://learn.microsoft.com/en-us/windows-hardware/drivers/ddi/wdm/nf-wdm-mmfreepagesfrommdl:
 * "After calling MmFreePagesFromMdl, the caller must also call ExFreePool
 * to release the memory that was allocated for the MDL structure."
 */
bindingset[dealloc1, e]
predicate isExcludedMmFreePageFromMdl(DeallocationExpr dealloc1, Expr e) {
  exists(DeallocationExpr dealloc2 | isFree(_, _, e, dealloc2) |
    dealloc1.(FunctionCall).getTarget().hasGlobalName("MmFreePagesFromMdl") and
    isExFreePoolCall(dealloc2, _)
  )
}

Finally, defaultSourceSinkIsRelated.

/**
 * Holds if either `source` strictly dominates `sink`, or `sink` strictly
 * post-dominates `source`.
 */
bindingset[source, sink]
predicate defaultSourceSinkIsRelated(DataFlow::Node source, DataFlow::Node sink) {
  exists(IRBlock b1, int i1, IRBlock b2, int i2 |
    source.hasIndexInBlock(b1, i1) and
    sink.hasIndexInBlock(b2, i2)
  |
    strictlyDominates(b1, i1, b2, i2)
    or
    strictlyPostDominates(b2, i2, b1, i1)
  )
}
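"Dominates" and "post-dominates" were new terms for me, so here is a small Python sketch of the idea on a hand-written control flow graph. The definitions of strictlyDominates and strictlyPostDominates are not shown in this article, so the position rule used here (compare indices inside one block, otherwise fall back to block-level dominance) is my assumption, and all function names are hypothetical.

```python
def dominators(cfg: dict, entry: str) -> dict:
    """Iterative dominator sets: dom[b] = {b} plus the intersection over b's predecessors."""
    blocks = set(cfg)
    preds = {b: [p for p in cfg if b in cfg[p]] for b in blocks}
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            incoming = [dom[p] for p in preds[b]]
            new = {b} | (set.intersection(*incoming) if incoming else set())
            if new != dom[b]:
                dom[b], changed = new, True
    return dom


def reverse(cfg: dict) -> dict:
    """Flip every edge, so dominators on the result are post-dominators."""
    rev = {b: [] for b in cfg}
    for b, successors in cfg.items():
        for s in successors:
            rev[s].append(b)
    return rev


def source_sink_is_related(cfg, entry, exit_, source, sink) -> bool:
    """source and sink are (block, index) positions, as in hasIndexInBlock."""
    (b1, i1), (b2, i2) = source, sink
    dom = dominators(cfg, entry)
    pdom = dominators(reverse(cfg), exit_)
    dominates = i1 < i2 if b1 == b2 else b1 in dom[b2]
    post_dominates = i2 > i1 if b1 == b2 else b2 in pdom[b1]
    return dominates or post_dominates


# Two branches (a, b) join at c, which then splits again into d and e.
cfg = {"entry": ["a", "b"], "a": ["c"], "b": ["c"],
       "c": ["d", "e"], "d": ["exit"], "e": ["exit"], "exit": []}
```

With this graph, a free in c and a use in d are related (c lies on every path to d), and so are a free in a and a use in c (every path from a passes through c). A free in a and a use in d are not related even though a path exists between them, because neither condition holds. My reading is that the default keeps only pairs where the free and the use are tied together on every path, which would fit the query's high precision setting.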

UseAfterFree.qll

Here too, I'll look only at the parts that are used.

This is isUse.

/**
 * Holds if `n` represents the expression `e`, and `e` is a pointer that is
 * guaranteed to be dereferenced (either because it's an operand of a
 * dereference operation, or because it's an argument to a function that
 * always dereferences the parameter).
 */
predicate isUse(DataFlow::Node n, Expr e) {
  isUse0(e) and n.asExpr() = e
  or
  exists(DataFlowCall call, InitializeParameterInstruction init |
    n.asOperand().getDef().getUnconvertedResultExpression() = e and
    pragma[only_bind_into](init) = ParameterSinks::getAnAlwaysDereferencedParameter() and
    viableParamArg(call, DataFlow::instructionNode(init), n) and
    pragma[only_bind_out](init.getEnclosingFunction()) =
      pragma[only_bind_out](call.asCallInstruction().getStaticCallTarget())
  )
}
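Putting isFree (the sources) and isUse (the sinks) back into the from/where/select clause of UseAfterFree.ql, the overall shape of the path query can be modeled as a reachability search over a graph. The Python sketch below is a deliberately simplified model: the graph, the node labels, and the names flow_path and use_after_free_alerts are all hypothetical, and the real library computes flow over IR data flow nodes rather than strings.

```python
from collections import deque


def flow_path(edges: dict, source: str, sink: str):
    """Breadth-first search; returns the node list from source to sink, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in edges.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def use_after_free_alerts(edges: dict, frees: dict, uses: list) -> list:
    """Report (sink, path, message) for each dereference reachable from a freed pointer."""
    alerts = []
    for source, dealloc in frees.items():
        for sink in uses:
            path = flow_path(edges, source, sink)
            if path is not None:
                message = f"Memory may have been previously freed by {dealloc}."
                alerts.append((sink, path, message))
    return alerts


# Toy flow graph: p is freed, copied into q, and q is dereferenced; r is unrelated.
edges = {"p after free(p)": ["q = p"], "q = p": ["*q"], "r = malloc(4)": ["*r"]}
frees = {"p after free(p)": "call to free"}  # source node -> deallocation text
uses = ["*q", "*r"]                          # sink nodes
alerts = use_after_free_alerts(edges, frees, uses)
```

Only *q is reported, together with the three-node path that leads to it. The path plays the role of the source and sink PathNodes in the select clause, and the message is the one from the query with $@ filled in by the deallocation.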