Get started with ORT for C#
Contents
- Install the NuGet Packages with the .NET CLI
- Import the libraries
- Create method for inference
- Reuse input/output tensor buffers
- Running on GPU (Optional)
- Supported Versions
- Builds
- API Reference
- Samples
- Learn More
Install the NuGet Packages with the .NET CLI
dotnet add package Microsoft.ML.OnnxRuntime --version 1.16.0
dotnet add package System.Numerics.Tensors --version 0.1.0
Import the libraries
using Microsoft.ML.OnnxRuntime;
using System.Numerics.Tensors;
Create method for inference
This Azure Function example uses ORT with C# to run inference on an NLP model created with scikit-learn.
public static async Task<IActionResult> Run(
    [HttpTrigger(AuthorizationLevel.Function, "get", "post", Route = null)] HttpRequest req,
    ILogger log, ExecutionContext context)
{
    log.LogInformation("C# HTTP trigger function processed a request.");

    // Read the review text from the query string, falling back to the JSON request body.
    string review = req.Query["review"];
    string requestBody = await new StreamReader(req.Body).ReadToEndAsync();
    dynamic data = JsonConvert.DeserializeObject(requestBody);
    // data is null when the request body is empty, hence the null-conditional access.
    review ??= data?.review;
    Debug.Assert(!string.IsNullOrEmpty(review), "Expecting a string with a content");

    // Path to the model used to create the inference session.
    const string modelPath = "./model.onnx";

    // Create an InferenceSession from the model path.
    // Creating a session and loading the model is expensive, so the session
    // should be cached and reused rather than created on every request.
    using var session = new InferenceSession(modelPath);

    // Create the input tensor (NLP example): a 1x1 string tensor holding the review.
    using var inputOrtValue = OrtValue.CreateTensorWithEmptyStrings(OrtAllocator.DefaultInstance, new long[] { 1, 1 });
    inputOrtValue.StringTensorSetElementAt(review, 0);

    // Create input data for session. Request all outputs in this case.
    var inputs = new Dictionary<string, OrtValue>
    {
        { "input", inputOrtValue }
    };

    using var runOptions = new RunOptions();

    // The output of interest is a sequence of maps; only the first map in that sequence is needed.
    using var outputs = session.Run(runOptions, inputs, session.OutputNames);
    Debug.Assert(outputs.Count > 0, "Expecting some output");

    // We want the last output, which is the sequence of maps
    var lastOutput = outputs[outputs.Count - 1];

    // Optional code to check the output type
    {
        var outputTypeInfo = lastOutput.GetTypeInfo();
        Debug.Assert(outputTypeInfo.OnnxType == OnnxValueType.ONNX_TYPE_SEQUENCE, "Expecting a sequence");
        var sequenceTypeInfo = outputTypeInfo.SequenceTypeInfo;
        Debug.Assert(sequenceTypeInfo.ElementType.OnnxType == OnnxValueType.ONNX_TYPE_MAP, "Expecting a sequence of maps");
    }

    var elementsNum = lastOutput.GetValueCount();
    Debug.Assert(elementsNum > 0, "Expecting a non empty sequence");

    // Get the first map in sequence
    using var firstMap = lastOutput.GetValue(0, OrtAllocator.DefaultInstance);

    // Optional code just checking
    {
        // Maps always have two elements, keys and values
        // We are expecting this to be a map of strings to floats
        var mapTypeInfo = firstMap.GetTypeInfo().MapTypeInfo;
        Debug.Assert(mapTypeInfo.KeyType == TensorElementType.String, "Expecting keys to be strings");
        Debug.Assert(mapTypeInfo.ValueType.OnnxType == OnnxValueType.ONNX_TYPE_TENSOR, "Values are in the tensor");
        Debug.Assert(mapTypeInfo.ValueType.TensorTypeAndShapeInfo.ElementDataType == TensorElementType.Float, "Result map value is float");
    }

    var inferenceResult = new Dictionary<string, float>();
    // Use the visitor to read the map keys and values.
    // Keys and values are stored with the same number of corresponding entries
    // (string -> float).
    firstMap.ProcessMap((keys, values) => {
        // Access native buffer directly
        var valuesSpan = values.GetTensorDataAsSpan<float>();
        var entryCount = (int)keys.GetTensorTypeAndShape().ElementCount;
        inferenceResult.EnsureCapacity(entryCount);
        for (int i = 0; i < entryCount; ++i)
        {
            inferenceResult.Add(keys.GetStringElement(i), valuesSpan[i]);
        }
    }, OrtAllocator.DefaultInstance);

    // Return the inference result as json.
    return new JsonResult(inferenceResult);
}
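The comments in the function above note that sessions are expensive to create and should be cached. One way to do that is sketched below. `SessionCache` is a hypothetical helper, not part of ONNX Runtime, and the sketch assumes a single model file at the same `./model.onnx` path.

```csharp
using System;
using Microsoft.ML.OnnxRuntime;

// Hypothetical helper (not part of ONNX Runtime): one shared session per process.
public static class SessionCache
{
    // Assumption: the same model path as in the Azure Function example.
    private const string ModelPath = "./model.onnx";

    // Lazy<T> uses thread-safe initialization by default, so the model is loaded at most once.
    private static readonly Lazy<InferenceSession> LazySession =
        new Lazy<InferenceSession>(() => new InferenceSession(ModelPath));

    // True once the model has been loaded.
    public static bool IsLoaded => LazySession.IsValueCreated;

    // The shared session; only the first access pays the model-loading cost.
    public static InferenceSession Session => LazySession.Value;
}
```

With such a helper, `using var session = new InferenceSession(modelPath);` would become `var session = SessionCache.Session;`. The `using` is dropped because the shared session has to outlive the individual request. This relies on concurrent `Run` calls on a single session being safe, which is worth confirming against the ONNX Runtime documentation for the version in use.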
Reuse input/output tensor buffers
In some scenarios, you may want to reuse input/output tensors. This often happens when you chain two models (i.e., feed one model's output as input to another), or when you want to speed up repeated inference runs.
Chaining: Feed model A’s output(s) as input(s) to model B
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML.OnnxRuntime.Tensors;
using Microsoft.ML.OnnxRuntime;
namespace Samples
{
    class FeedModelAToModelB
    {
        static void Program()
        {
            const string modelAPath = "./modelA.onnx";
            const string modelBPath = "./modelB.onnx";
            using InferenceSession session1 = new InferenceSession(modelAPath);
            using InferenceSession session2 = new InferenceSession(modelBPath);
            // Illustration only
            float[] inputData = { 1, 2, 3, 4 };
            long[] inputShape = { 1, 4 };
            using var inputOrtValue = OrtValue.CreateTensorValueFromMemory(inputData, inputShape);
            // Create input data for session. Request all outputs in this case.
            var inputs1 = new Dictionary<string, OrtValue>
            {
                { "input", inputOrtValue }
            };
            using var runOptions = new RunOptions();
            // session1 inference
            using (var outputs1 = session1.Run(runOptions, inputs1, session1.OutputNames))
            {
                // get intermediate value
                var outputToFeed = outputs1.First();
                // The OrtValue is passed on as-is; only the name it is bound to changes.
                // create input list for session2
                var inputs2 = new Dictionary<string, OrtValue>
                {
                    { "inputNameForModelB", outputToFeed }
                };
                // session2 inference
                using (var results = session2.Run(runOptions, inputs2, session2.OutputNames))
                {
                    // manipulate the results
                }
            }
        }
    }
}
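The input names used above (`"input"`, `"inputNameForModelB"`) are placeholders that must match the names stored in each model. A minimal sketch for listing a model's actual input names, element types, and shapes follows. `ModelInspector`, `FormatShape`, and `DescribeInputs` are hypothetical names introduced for illustration; the sketch assumes the `InputMetadata` property of `InferenceSession`.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML.OnnxRuntime;

// Hypothetical helper (not part of ONNX Runtime) for inspecting model inputs.
public static class ModelInspector
{
    // Formats a shape such as { 1, 4 } as "[1, 4]".
    public static string FormatShape(int[] dimensions) =>
        "[" + string.Join(", ", dimensions) + "]";

    // Returns one description line per model input.
    public static List<string> DescribeInputs(InferenceSession session)
    {
        var lines = new List<string>();
        foreach (KeyValuePair<string, NodeMetadata> input in session.InputMetadata)
        {
            NodeMetadata meta = input.Value;
            // Shape and element type are only meaningful for tensor inputs.
            lines.Add(meta.IsTensor
                ? $"{input.Key}: {meta.ElementDataType} {FormatShape(meta.Dimensions)}"
                : $"{input.Key}: {meta.OnnxValueType}");
        }
        return lines;
    }
}
```

Calling `ModelInspector.DescribeInputs(session2)` before wiring up the chain shows which key the dictionary passed to `session2.Run` needs to use.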
Multiple inference runs with fixed sized input(s) and output(s)
If the model has fixed-size inputs and outputs of numeric tensors, prefer OrtValue and its API to speed up inference and minimize data transfer. The OrtValue class makes it possible to reuse the underlying buffers for the input and output tensors: it pins the managed buffers and uses them directly for inference. It also provides direct access to the native buffers for outputs. You can also preallocate an OrtValue for outputs or create it on top of existing buffers. This avoids some overhead, which can be beneficial for smaller models where that overhead is a noticeable share of the overall running time.
Keep in mind that the OrtValue class, like many other classes in the ONNX Runtime C# API, is IDisposable. It must be disposed properly, either to unpin the managed buffers or to release the native buffers, to avoid memory leaks.
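The buffer reuse described above can be sketched as follows. The model layout is an assumption made for illustration: one float input named `"input"` with shape [1, 4] and one float output named `"output"` with shape [1, 2]. `FixedBufferInference`, `FillInput`, and `RunRepeatedly` are hypothetical names. The sketch uses the `Run` overload that accepts preallocated output OrtValues.

```csharp
using System;
using Microsoft.ML.OnnxRuntime;

// Hypothetical sample class (not part of ONNX Runtime).
public static class FixedBufferInference
{
    // Overwrites the input buffer in place; stands in for real per-run data.
    public static void FillInput(float[] buffer, int run)
    {
        for (int i = 0; i < buffer.Length; ++i)
        {
            buffer[i] = run + i;
        }
    }

    public static void RunRepeatedly(string modelPath, int runCount)
    {
        // Assumed shapes: input [1, 4] and output [1, 2], both float.
        float[] inputData = new float[4];
        float[] outputData = new float[2];

        using var session = new InferenceSession(modelPath);
        using var runOptions = new RunOptions();

        // Both OrtValues are created once, on top of the managed arrays,
        // and are disposed (unpinned) when the method exits.
        using var inputOrtValue = OrtValue.CreateTensorValueFromMemory(inputData, new long[] { 1, 4 });
        using var outputOrtValue = OrtValue.CreateTensorValueFromMemory(outputData, new long[] { 1, 2 });

        string[] inputNames = { "input" };    // assumed input name
        string[] outputNames = { "output" };  // assumed output name
        OrtValue[] inputValues = { inputOrtValue };
        OrtValue[] outputValues = { outputOrtValue };

        for (int run = 0; run < runCount; ++run)
        {
            // Write new data into the same array; no new OrtValue is needed.
            FillInput(inputData, run);
            session.Run(runOptions, inputNames, inputValues, outputNames, outputValues);
            // Results are written into outputData by the run.
            Console.WriteLine($"run {run}: first output = {outputData[0]}");
        }
    }
}
```

Because the OrtValue instances are created once outside the loop, each iteration only rewrites the input array and reads the output array. The preallocated output must match the element type and shape the model produces.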
Running on GPU (Optional)
If using the GPU package, simply use the appropriate SessionOptions when creating an InferenceSession.
int gpuDeviceId = 0; // The GPU device ID to execute on
using var gpuSessionOptions = SessionOptions.MakeSessionOptionWithCudaProvider(gpuDeviceId);
using var session = new InferenceSession("model.onnx", gpuSessionOptions);
ONNX Runtime C# API
ONNX Runtime provides a C# .NET binding for running inference on ONNX models on any of the .NET Standard platforms.
Supported Versions
.NET Standard 1.1
Builds
Artifact | Description | Supported Platforms |
---|---|---|
Microsoft.ML.OnnxRuntime | CPU (Release) | Windows, Linux, Mac, X64, X86 (Windows-only), ARM64 (Windows-only)…more details: compatibility |
Microsoft.ML.OnnxRuntime.Gpu | GPU - CUDA (Release) | Windows, Linux, Mac, X64…more details: compatibility |
Microsoft.ML.OnnxRuntime.DirectML | GPU - DirectML (Release) | Windows 10 1709+ |
onnxruntime | CPU, GPU (Dev), CPU (On-Device Training) | Same as Release versions |
Microsoft.ML.OnnxRuntime.Training | CPU On-Device Training (Release) | Windows, Linux, Mac, X64, X86 (Windows-only), ARM64 (Windows-only)…more details: compatibility |