Writing out stack traces with torch.profiler._KinetoProfile.export_stacks() in PyTorch Profiler

2024-04-02

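torch.profiler._KinetoProfile.export_stacks() is a method that writes the stack traces recorded by PyTorch Profiler to a file, weighted by a chosen time metric. Because torch.profiler.profile inherits from _KinetoProfile, you normally call it on the profiler object itself (prof.export_stacks(...)). The resulting file is meant for flame graph style visualization, which helps locate performance bottlenecks and trace them back to the lines of code that caused them.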

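Usage

export_stacks() is called on the profiler object and takes the following arguments.

path: path of the file to write the stack traces to
metric: the metric used to weight each stack. The default is self_cpu_time_total; self_cuda_time_total is the other commonly supported value. Only one metric can be given per call.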

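Example

import os
import torch
from torch.profiler import profile, ProfilerActivity

# Placeholder model and input (assumptions for illustration); requires a CUDA device.
model = torch.nn.Linear(128, 64).cuda()
inputs = torch.randn(32, 128, device="cuda")
os.makedirs("/tmp/profiler", exist_ok=True)  # the output directory must exist

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA], with_stack=True) as prof:
    model(inputs)

prof.export_stacks("/tmp/profiler/stacks_cpu.txt", "self_cpu_time_total")
prof.export_stacks("/tmp/profiler/stacks_cuda.txt", "self_cuda_time_total")

The code above profiles both CPU and CUDA activity, then writes the same recorded stacks twice: once weighted by self CPU time (stacks_cpu.txt) and once weighted by self CUDA time (stacks_cuda.txt).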

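The output file is plain text rather than JSON. It uses the collapsed (folded) stack format consumed by flame graph tools, with one line per recorded stack. Each line contains the following.

The call stack, written as frames separated by semicolons, outermost caller first
A space, then the integer value of the selected metric for that stack (time in microseconds)

Event names, start times, and end times are not part of this file. If you need them, use the Chrome trace export instead.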
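To make the file layout concrete, here is a minimal sketch of reading such a file back with the standard library only. It assumes each line has the form `frame;frame;...;frame value`, which is how current PyTorch releases appear to write it. The sample lines, file names, and the helper names `parse_folded` and `hottest_leaf_frames` are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def parse_folded(lines: Iterable[str]) -> List[Tuple[List[str], int]]:
    """Split each 'a;b;c 123' line into (['a', 'b', 'c'], 123)."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        # The metric value is the text after the last space.
        stack, _, value = line.rpartition(" ")
        records.append((stack.split(";"), int(value)))
    return records


def hottest_leaf_frames(records, top=3):
    """Sum the metric per innermost frame and return the largest totals."""
    totals = Counter()
    for frames, value in records:
        totals[frames[-1]] += value
    return totals.most_common(top)


# Hypothetical file content, not real profiler output.
sample = [
    "train.py(30):_main;model.py(12):_forward;linear.py(114):_forward 120",
    "train.py(30):_main;model.py(14):_forward;activation.py(101):_forward 45",
    "train.py(30):_main;model.py(12):_forward;linear.py(114):_forward 80",
]
records = parse_folded(sample)
top = hottest_leaf_frames(records)
print(top)
```

With a real file you would pass `open("/tmp/profiler/stacks_cpu.txt")` instead of `sample`. Summing by the innermost frame is only one possible view; grouping by any frame in the stack works the same way.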

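Reading a stack trace

A stack trace is the chain of calls that was active when the event was recorded. Read it from left to right: the first frame is the outermost caller and the last frame is where the time was actually spent. A Python frame typically carries the following information.

File name
Line number
Function name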
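As a rough illustration of pulling those three fields out of a frame, the sketch below assumes Python frames look like `file.py(42): function`, with the space replaced by an underscore in the exported file. That layout is an assumption based on typical profiler output, and frames that come from C++ or built-in functions may not match it, so the hypothetical helper `parse_frame` returns None for anything it does not recognize.

```python
import re
from typing import NamedTuple, Optional


class Frame(NamedTuple):
    filename: str
    lineno: int
    function: str


# 'path(line): func' or 'path(line):_func' (assumed frame layout).
_FRAME_RE = re.compile(r"^(?P<file>.+)\((?P<line>\d+)\):[ _](?P<func>.+)$")


def parse_frame(text: str) -> Optional[Frame]:
    """Return the parts of one frame, or None if the layout is unfamiliar."""
    match = _FRAME_RE.match(text.strip())
    if match is None:
        return None
    return Frame(match.group("file"), int(match.group("line")), match.group("func"))


print(parse_frame("/home/user/train.py(42): forward"))
print(parse_frame("train.py(7):_main"))
print(parse_frame("aten::add"))  # not a Python frame, so None
```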

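Notes

export_stacks() can only be used when profiling was run with with_stack=True.
The output file is not read by Kineto itself. It is intended for flame graph tools that accept collapsed stacks, such as FlameGraph (flamegraph.pl) or speedscope.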

Applications

Identifying performance bottlenecks
Debugging problems in your code
Comparing the performance of different code paths
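For the comparison case, one possible approach is to export a stacks file before and after a change and diff the per-stack totals. The sketch below uses only the standard library and assumes the `frames value` line layout described for the exported file. The stack names and numbers are invented, and `totals_by_stack` and `diff_profiles` are hypothetical helper names.

```python
from collections import Counter
from typing import Dict, Iterable


def totals_by_stack(lines: Iterable[str]) -> Counter:
    """Sum the metric value for each distinct stack string."""
    totals = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        stack, _, value = line.rpartition(" ")
        totals[stack] += int(value)
    return totals


def diff_profiles(before: Iterable[str], after: Iterable[str]) -> Dict[str, int]:
    """Return after-minus-before for every stack whose total changed."""
    b, a = totals_by_stack(before), totals_by_stack(after)
    return {s: a[s] - b[s] for s in sorted(set(a) | set(b)) if a[s] != b[s]}


# Hypothetical contents of two exported files.
before = ["main;forward;linear 100", "main;forward;relu 40"]
after = ["main;forward;linear 60", "main;forward;relu 40", "main;forward;dropout 5"]

diff = diff_profiles(before, after)
for stack, delta in diff.items():
    print(f"{delta:+d} us  {stack}")
```

Negative numbers mean the stack became cheaper after the change. Single runs are noisy, so in practice it is worth averaging over several profiled iterations before reading much into small differences.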

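Sample code for torch.profiler._KinetoProfile.export_stacks()

Profile both CPU and CUDA and write a stacks file for each

import os
import torch
from torch.profiler import profile, ProfilerActivity

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(128, 64).to(device)   # placeholder model
inputs = torch.randn(32, 128, device=device)  # placeholder input
activities = [ProfilerActivity.CPU] + ([ProfilerActivity.CUDA] if device == "cuda" else [])
os.makedirs("/tmp/profiler", exist_ok=True)

with profile(activities=activities, with_stack=True) as prof:
    model(inputs)

prof.export_stacks("/tmp/profiler/stacks_cpu.txt", "self_cpu_time_total")
if device == "cuda":
    prof.export_stacks("/tmp/profiler/stacks_cuda.txt", "self_cuda_time_total")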

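Write the stack traces of specific events only

Individual events do not have an export_stacks() method, and the profiler object has no kineto_profile.events attribute. To export only some events, iterate over prof.events() and write the matching stacks yourself.

import os
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(128, 64)  # placeholder model
inputs = torch.randn(32, 128)     # placeholder input
os.makedirs("/tmp/profiler", exist_ok=True)

with profile(activities=[ProfilerActivity.CPU], with_stack=True) as prof:
    model(inputs)

# Keep only events with a given name; "aten::addmm" is just an example.
with open("/tmp/profiler/stacks_event.txt", "w") as f:
    for evt in prof.events():
        if evt.name == "aten::addmm" and evt.stack:
            # Outermost frame first, spaces replaced so the line stays parseable.
            frames = ";".join(s.replace(" ", "_") for s in reversed(evt.stack))
            f.write(f"{frames} {int(evt.self_cpu_time_total)}\n")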

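Specify the metric used to weight the stacks

import os
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(128, 64)  # placeholder model
inputs = torch.randn(32, 128)     # placeholder input
os.makedirs("/tmp/profiler", exist_ok=True)

with profile(activities=[ProfilerActivity.CPU], with_stack=True) as prof:
    model(inputs)

# metric takes exactly one name; a comma-separated list is rejected.
prof.export_stacks("/tmp/profiler/stacks.txt", metric="self_cpu_time_total")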

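Getting JSON output

export_stacks() has no output_format argument and always writes collapsed-stack text. When you need JSON, export the Chrome trace instead.

import os
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(128, 64)  # placeholder model
inputs = torch.randn(32, 128)     # placeholder input
os.makedirs("/tmp/profiler", exist_ok=True)

with profile(activities=[ProfilerActivity.CPU], with_stack=True) as prof:
    model(inputs)

# Writes a JSON trace with event names, timestamps, and durations.
prof.export_chrome_trace("/tmp/profiler/trace.json")

The code above writes the profiling result as Chrome trace JSON, which can be opened in chrome://tracing or Perfetto.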

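Visualize the stack traces as a flame graph

flamegraph.pl --title "CPU time" --countname "us." /tmp/profiler/stacks.txt > /tmp/profiler/stacks.svg

The command above turns stacks.txt into an SVG flame graph. flamegraph.pl is part of Brendan Gregg's FlameGraph repository and has to be downloaded separately; there is no kineto_profiler command for this purpose.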

The samples above cover the basic use of PyTorch Profiler.
See the PyTorch Profiler documentation for details.

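Alternatives to torch.profiler._KinetoProfile.export_stacks()

Use key_averages() grouped by stack

There is no module-level torch.profiler.export_stacks() function. A built-in alternative is key_averages(group_by_stack_n=...), which groups operators by their top stack frames and summarizes them as a table.

import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(128, 64)  # placeholder model
inputs = torch.randn(32, 128)     # placeholder input

with profile(activities=[ProfilerActivity.CPU], with_stack=True) as prof:
    model(inputs)

# Group by the top 5 stack frames and show the 10 most expensive rows.
print(prof.key_averages(group_by_stack_n=5).table(sort_by="self_cpu_time_total", row_limit=10))

The code above prints a per-operator summary with source locations to the console instead of writing a stacks file.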

Retrieve the stack traces with your own code

import os
import torch
from torch.profiler import profile, ProfilerActivity

model = torch.nn.Linear(128, 64)  # placeholder model
inputs = torch.randn(32, 128)     # placeholder input
os.makedirs("/tmp/profiler", exist_ok=True)

with profile(activities=[ProfilerActivity.CPU], with_stack=True) as prof:
    model(inputs)

# Write one block per event, using the stack recorded by the profiler.
with open("/tmp/profiler/stacks.txt", "w") as f:
    for event in prof.events():
        if not event.stack:
            continue  # no source-level stack was recorded for this event
        f.write(f"Event: {event.name} ({event.self_cpu_time_total} us)\n")
        for frame in event.stack:
            f.write(f"    {frame}\n")

The code above reads the stack that the profiler attached to each event (event.stack) and writes it to a file in a custom, human-readable layout.
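If the custom output should still open in a flame graph tool, the records can be serialized back into the collapsed layout. The following is a minimal sketch of that step, not the library's own implementation. It assumes each record is a pair of (frames listed innermost first, metric value), which matches how event.stack appears to be ordered, and `to_folded` is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple

# Characters that would break the 'a;b;c value' layout are replaced by '_'.
_SANITIZE = str.maketrans(" ;\t\n", "____")


def to_folded(records: Iterable[Tuple[List[str], int]]) -> List[str]:
    """Turn (innermost-first frames, value) pairs into collapsed-stack lines."""
    lines = []
    for stack, value in records:
        if not stack or int(value) <= 0:
            continue  # nothing to draw for empty stacks or zero cost
        # Flame graph tools expect the outermost caller first.
        frames = [frame.translate(_SANITIZE) for frame in reversed(stack)]
        lines.append(";".join(frames) + " " + str(int(value)))
    return lines


# Hypothetical records, shaped like (event.stack, event.self_cpu_time_total).
records = [
    (["linear.py(114): forward", "model.py(12): forward", "train.py(30): main"], 120),
    ([], 50),                      # skipped: no stack recorded
    (["train.py(30): main"], 0),   # skipped: zero cost
]
lines = to_folded(records)
print("\n".join(lines))
```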

General-purpose profiling and flame graph tools are not tied to PyTorch Profiler; they can also analyze the performance of programs written with other frameworks and languages.

torch.profiler._KinetoProfile.export_stacks() is a convenient way to write the stack traces collected by PyTorch Profiler to a file. It is not the only option, though, and knowing the alternatives makes it easier to choose the approach that fits your needs.
