masahi's comments

masahi · on Dec 1, 2021

TVM, similar to IREE, also has a good support for Vulkan. It compiles a tensor DSL written in Python into SPIR-V. AMD is using it in production for running deep learning models on their APU.

https://github.com/apache/tvm

masahi · on May 22, 2018

The TVM results on resnet50 and mobilenet seem a bit off. On GTX 1070 Ti, with an input of size (1, 3, 224, 224)

TVM result

Resnet50 : 100 inference/sec (0.009983 sec per each run)

Mobilenet: 450 inference/sec (0.002220 sec per each run)

PlaidML result

Resnet50 : 107 inference/sec (0.009302 sec per each run)

Mobilenet: 473 inference/sec (0.002112 sec per each run)

My benchmark script for tvm is here https://gist.github.com/masahi/a386c2ce5b5f8c2d9f7af5e09a8d8...

b33pr · on May 22, 2018

Thank you so much for pointing this out. We'll get updated numbers out soon. How did you benchmark plaid, out of curiosity? The error which I correct here (https://github.com/brianretford/nnvm-rocm/blob/master/mxnet_...) was caused by a desire to roughly approximate how keras does things, and plaidbench w/ keras is the easiest way for us to evaluate things, though it definitely adds in a lot of overhead. My script roughly matches the numbers I get out of your script, though I will say that I think the TVM time_evaluator should be calling Sync on the inside of its loop, to be fair (which I patched it to do to compare against your methodology). It doesn't make a huge difference, but it does exist.

If I just pull the overall kernel runtime from our logs, I get ~525 inferences/sec.

masahi · on May 22, 2018

for plaid, I used

plaidbench keras mobilenet

plaidbench keras resnet50

time_evaluator is what tvm/nnvm folks use for benchmark. See their benchmark script here https://github.com/dmlc/nnvm/blob/master/examples/benchmark/...

hedgehog · on May 22, 2018

Thanks, I've shared this with our team to have a look. There was some subtlety to timing the right thing and it's possible we missed something.