https://vengineer.hatenablog.com/entry/71825304

@Vengineer's Ramblings (Vengineerの戯言): Twitter
Welcome to the world of SystemVerilog: it all started with the release of SystemC v0.9

Last week (1/23), AWS open-sourced its "Neo-AI project".

AWS launches open source Neo-AI project to accelerate ML deployments on edge devices

The release consists of three components:

　・Neo-AI DLR
　・Neo-AI TVM
　・Neo-AI Treelite

Neo-AI DLR looks like a front end that sits on top of two back ends, Neo-AI TVM and Neo-AI Treelite.

Neo-AI TVM appears to be based on dmlc's TVM, and Neo-AI Treelite on Treelite.

Neo-AI is presumably the source code of Amazon SageMaker Neo, which was announced at re:Invent in "Amazon SageMaker Neo – Train Your Machine Learning Models Once, Run Them Anywhere".

The announcement says: "Start with a machine learning model built with MXNet, TensorFlow, PyTorch, or XGBoost and trained with Amazon SageMaker. Then choose your target hardware platform from Intel, Nvidia, Arm, Cadence, Qualcomm, and Xilinx."

However, the released source code appears to support only Intel, NVIDIA, and Arm.

Looking at the Neo-AI DLR installation page, the following packages are available:

　・dlr-1.0-py2.py3-armv7l
　・dlr-1.0-py2.py3-cuda90-aarch64
　・dlr-1.0-py2.py3-opencl-x86_64

The post "SageMaker Neo – Train Your Machine Learning Models Once, Run Them Anywhere" includes a section titled

　・Compiling the model for the Raspberry Pi

so that is probably the Arm (armv7l) package.

For NVIDIA, Neo-AI TVM adds TensorRT support that upstream TVM does not have, so that is probably what this package is for.

To go with this TensorRT support for subgraphs, GraphRuntime::SetupOpExecs() has also been modified:

void GraphRuntime::SetupOpExecs() {
  op_execs_.resize(this->GetNumOfNodes());
  // setup the array and requirements.
  for (uint32_t nid = 0; nid < this->GetNumOfNodes(); ++nid) {
    const auto& inode = nodes_[nid];
    if (inode.op_type == "null") continue;
    std::vector<DLTensor> args;
    for (const auto& e : inode.inputs) {
      args.push_back(*(data_entry_[this->entry_id(e)].operator->()));
    }
    for (uint32_t index = 0; index < inode.param.num_outputs; ++index) {
      uint32_t eid = this->entry_id(nid, index);
      args.push_back(*(data_entry_[eid].operator->()));
    }
    if (inode.op_type == "tvm_op") {
      op_execs_[nid] = CreateTVMOp(inode.param, args, inode.inputs.size());
    } else if (inode.op_type == "_tensorrt_subgraph_op") {
#ifdef TVM_GRAPH_RUNTIME_TENSORRT
      CHECK_EQ(inode.subgraphs.size(), 1U) << "Only supports one subgraph per node";
      CHECK_EQ(inode.subgraphs[0].arg_nodes.size(), inode.inputs.size());
      op_execs_[nid] = tensorrt_exec_manager_.CreateExec(
          inode.name, inode.subgraphs[0], args);
#else
      LOG(FATAL) << "TensorRT NOT enabled for operator " << inode.op_type;
#endif  // TVM_GRAPH_RUNTIME_TENSORRT
    } else {
      LOG(FATAL) << "Unknown op type " << inode.op_type << " in graph runtime";
    }
  }
}

A new branch for inode.op_type == "_tensorrt_subgraph_op" has been added.

The last one, opencl-x86_64, is probably for Intel's OpenCL (perhaps targeting integrated GPUs?).

Will support for Xilinx, Cadence, and Qualcomm also be open-sourced?