MLflow Plugins
As a framework-agnostic tool for machine learning, the MLflow Python API provides developer APIs for writing plugins that integrate with different ML frameworks and backends.
Plugins provide a powerful mechanism for customizing the behavior of the MLflow Python client and integrating third-party tools, allowing you to:
Integrate with third-party storage solutions for experiment data, artifacts, and models
Integrate with third-party authentication providers, e.g. read HTTP authentication credentials from a special file
Use the MLflow client to communicate with other REST APIs, e.g. your organization’s existing experiment-tracking APIs
Automatically capture additional metadata as run tags, e.g. the git repository associated with a run
The MLflow Python API supports several types of plugins:
Tracking Store: override tracking backend logic, e.g. to log to a third-party storage solution
ArtifactRepository: override artifact logging logic, e.g. to log to a third-party storage solution
Run context providers: specify context tags to be set on runs created via the mlflow.start_run() fluent API
Model Registry Store: override model registry backend logic, e.g. to log to a third-party storage solution
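Each plugin type corresponds to a setuptools entry-point group (the group names appear in the setup.py example later on this page). As a small sketch, on Python 3.10+ you can list the MLflow plugin entry points installed in your environment like this:
from importlib.metadata import entry_points

# Entry-point groups used by MLflow plugins (group filtering with
# importlib.metadata requires Python 3.10+).
for group in (
    "mlflow.tracking_store",
    "mlflow.artifact_repository",
    "mlflow.run_context_provider",
    "mlflow.model_registry_store",
):
    for ep in entry_points(group=group):
        print(f"{group}: {ep.name} -> {ep.value}")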
Using an MLflow Plugin
MLflow plugins are Python packages that you can install from PyPI or conda. This example installs a Tracking Store plugin from source and uses it within an example script.
Install the Plugin
To get started, clone MLflow and install this example plugin:
git clone https://github.com/mlflow/mlflow
cd mlflow
pip install -e tests/resources/mlflow-test-plugin
Run Code Using the Plugin
This plugin defines a custom Tracking Store for tracking URIs with the file-plugin scheme. The plugin implementation delegates to MLflow’s built-in file-based run storage. To use the plugin, you can run any code that uses MLflow, setting the tracking URI to one with a file-plugin:// scheme:
MLFLOW_TRACKING_URI=file-plugin:$(pwd)/mlruns python examples/quickstart/mlflow_tracking.py
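The quickstart script drives MLflow’s fluent tracking API; a minimal sketch of the kind of logging it performs (illustrative only — the real script lives at examples/quickstart/mlflow_tracking.py) looks like this:
from random import randint, random

from mlflow import log_metric, log_param

# Because MLFLOW_TRACKING_URI uses the file-plugin scheme, these fluent calls
# are routed through the plugin's custom Tracking Store rather than the default one.
log_param("param1", randint(0, 100))
log_metric("foo", random())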
Launch the MLflow UI:
cd ..
mlflow server --backend-store-uri ./mlflow/mlruns
View results at http://localhost:5000. You should see a newly created run with a param named “param1” and a metric named “foo”.
Writing Your Own MLflow Plugins
Defining a Plugin
You define an MLflow plugin as a standalone Python package that can be distributed for installation via PyPI or conda. See https://github.com/mlflow/mlflow/tree/branch-1.5/tests/resources/mlflow-test-plugin for an example package that implements all available plugin types.
The example package contains a setup.py that declares a number of entry points:
setup(
    name="mflow-test-plugin",
    # Require MLflow as a dependency of the plugin, so that plugin users can simply install
    # the plugin and then immediately use it with MLflow
    install_requires=["mlflow"],
    ...
    entry_points={
        # Define a Tracking Store plugin for tracking URIs with scheme 'file-plugin'
        "mlflow.tracking_store": "file-plugin=mlflow_test_plugin:PluginFileStore",
        # Define an ArtifactRepository plugin for artifact URIs with scheme 'file-plugin'
        "mlflow.artifact_repository": "file-plugin=mlflow_test_plugin:PluginLocalArtifactRepository",
        # Define a RunContextProvider plugin. The entry point name for run context providers
        # is not used, and so is set to the string "unused" here
        "mlflow.run_context_provider": "unused=mlflow_test_plugin:PluginRunContextProvider",
        # Define a Model Registry Store plugin for tracking URIs with scheme 'file-plugin'
        "mlflow.model_registry_store": "file-plugin=mlflow_test_plugin:PluginRegistrySqlAlchemyStore",
    },
)
Each element of this entry_points dictionary specifies a single plugin. You can choose to implement one or more plugin types in your package; you need not implement them all. The type of plugin defined by each entry point and its corresponding reference implementation in MLflow are described below. You can work from the reference implementations when writing your own plugin:
Description | Entry-point group | Entry-point name and value | Reference implementation
---|---|---|---
Plugins for overriding definitions of tracking APIs such as log_metric and log_param, e.g. to log runs to a third-party storage solution. | mlflow.tracking_store | The entry point name (e.g. file-plugin) is the tracking URI scheme the plugin registers; the entry point value (e.g. mlflow_test_plugin:PluginFileStore) is the custom Tracking Store class. Users who install the example plugin and set a tracking URI of the form file-plugin://<path> will use PluginFileStore. | FileStore
Plugins for defining artifact read/write APIs such as log_artifact and download_artifacts against new artifact storage backends. | mlflow.artifact_repository | The entry point name (e.g. file-plugin) is the artifact URI scheme the plugin registers; the entry point value (e.g. mlflow_test_plugin:PluginLocalArtifactRepository) is the custom ArtifactRepository class. Users who install the example plugin and log to a run whose artifact URI is of the form file-plugin://<path> will use PluginLocalArtifactRepository. | LocalArtifactRepository
Plugins for specifying custom context tags at run creation time, e.g. tags identifying the git repository associated with a run. | mlflow.run_context_provider | The entry point name is unused. The entry point value (e.g. mlflow_test_plugin:PluginRunContextProvider) is the custom RunContextProvider class. | GitRunContext
Plugins for overriding definitions of Model Registry APIs such as register_model. Note: the Model Registry is in beta (as of MLflow 1.5); Model Registry APIs are not guaranteed to be stable, and Model Registry plugins may break in the future. | mlflow.model_registry_store | The entry point name (e.g. file-plugin) is the tracking URI scheme the plugin registers; the entry point value (e.g. mlflow_test_plugin:PluginRegistrySqlAlchemyStore) is the custom Model Registry store class. Users who install the example plugin and set a tracking URI of the form file-plugin://<path> will use PluginRegistrySqlAlchemyStore. | SqlAlchemyStore (Model Registry)
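For a concrete feel of what implementing one of these looks like, here is a rough sketch of a Tracking Store plugin class in the spirit of the example plugin’s PluginFileStore, which (as noted earlier) delegates to MLflow’s built-in file-based run storage; the FileStore import path reflects MLflow 1.x, and the URI handling is deliberately simplified:
from mlflow.store.tracking.file_store import FileStore


class PluginFileStore(FileStore):
    """Tracking Store registered for the custom 'file-plugin' URI scheme."""

    def __init__(self, store_uri=None, artifact_uri=None):
        # MLflow passes the full tracking URI; a real plugin should parse the URI
        # properly, but stripping the scheme prefix is enough for this sketch.
        path = store_uri.replace("file-plugin:", "", 1) if store_uri else None
        super().__init__(path, artifact_uri)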
Testing Your Plugin
We recommend testing your plugin to ensure that it follows the contract expected by MLflow. For example, a Tracking Store plugin should contain tests verifying the correctness of its log_metric, log_param, and other method implementations. See also the tests for MLflow’s reference implementations for examples.
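As a sketch of such a test for the example Tracking Store plugin (assuming the plugin package is installed and pytest is used; the file-plugin scheme and logged values mirror the example above):
import mlflow
from mlflow.tracking import MlflowClient


def test_plugin_logs_params_and_metrics(tmp_path):
    # Route the fluent API through the plugin by using its URI scheme.
    mlflow.set_tracking_uri(f"file-plugin:{tmp_path}")
    with mlflow.start_run() as run:
        mlflow.log_param("param1", 5)
        mlflow.log_metric("foo", 1.0)

    fetched = MlflowClient().get_run(run.info.run_id)
    assert fetched.data.params["param1"] == "5"  # params are stored as strings
    assert fetched.data.metrics["foo"] == 1.0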
Distributing Your Plugin
Assuming you’ve structured your plugin similarly to the example plugin, you can distribute it via PyPI.
Congrats, you’ve now written and distributed your own MLflow plugin!
Community Plugins
SQL Server Plugin
The mlflow-dbstore plugin allows MLflow to use a relational database as an artifact store. So far, it has only been tested with SQL Server.
You can install MLflow with the SQL Server plugin via:
pip install mlflow[sqlserver]
and then use MLflow as normal. The SQL Server artifact store support will be provided automatically.
The plugin implements all of the MLflow artifact store APIs. To use SQL Server as an artifact store, a database URI must be provided, as shown in the example below:
import mlflow
from mlflow.tracking import MlflowClient

db_uri = "mssql+pyodbc://username:password@host:port/database?driver=ODBC+Driver+17+for+SQL+Server"
client = MlflowClient()
client.create_experiment(exp_name, artifact_location=db_uri)
mlflow.set_experiment(exp_name)
mlflow.onnx.log_model(onnx, "model")
The first time an artifact is logged to the artifact store, the plugin automatically creates an artifacts table in the database specified by the database URI and stores the artifact there as a BLOB. Subsequent logged artifacts are stored in the same table. In the example above, the log_model operation creates three entries in the database table, storing the ONNX model, the MLmodel file, and the conda.yaml file associated with the model.
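To read the model back, the usual MLflow model-loading APIs work unchanged, with the plugin resolving the database-backed artifact location; a small sketch (run_id here is assumed to be the ID of the run that logged the model):
import mlflow.onnx

# run_id is assumed to be the ID of the run that logged the model above.
onnx_model = mlflow.onnx.load_model(f"runs:/{run_id}/model")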