mlflow

The mlflow module provides a high-level “fluent” API for starting and managing MLflow runs. For example:

import mlflow
mlflow.start_run()
mlflow.log_param("my", "param")
mlflow.log_metric("score", 100)
mlflow.end_run()
You can also use syntax like this:

with mlflow.start_run() as run:
    ...

which automatically terminates the run at the end of the block.
The fluent tracking API is not currently threadsafe. Any concurrent callers to the tracking API must implement mutual exclusion manually.
For a lower level API, see the mlflow.tracking
module.
- class mlflow.ActiveRun(run)
Wrapper around mlflow.entities.Run to enable using Python “with” syntax.
- mlflow.log_param(key, value)
Log a parameter under the current run. If no run is active, this method will create a new active run.
- Parameters
key – Parameter name (string)
value – Parameter value (string, but will be string-ified if not)
- mlflow.log_params(params)
Log a batch of params for the current run. If no run is active, this method will create a new active run.
- Parameters
params – Dictionary of param_name: String -> value: (String, but will be string-ified if not)
- Returns
None
- mlflow.log_metric(key, value, step=None)
Log a metric under the current run. If no run is active, this method will create a new active run.
- Parameters
key – Metric name (string).
value – Metric value (float). Note that some special values such as +/- Infinity may be replaced by other values depending on the store. For example, the SQLAlchemy store replaces +/- Inf with max/min float values.
step – Metric step (int). Defaults to zero if unspecified.
- mlflow.log_metrics(metrics, step=None)
Log multiple metrics for the current run. If no run is active, this method will create a new active run.
- Parameters
metrics – Dictionary of metric_name: String -> value: Float. Note that some special values such as +/- Infinity may be replaced by other values depending on the store. For example, the SQLAlchemy store replaces +/- Inf with max/min float values.
step – A single integer step at which to log the specified metrics. If unspecified, each metric is logged at step zero.
- Returns
None
- mlflow.set_tag(key, value)
Set a tag under the current run. If no run is active, this method will create a new active run.
- Parameters
key – Tag name (string)
value – Tag value (string, but will be string-ified if not)
- mlflow.set_tags(tags)
Log a batch of tags for the current run. If no run is active, this method will create a new active run.
- Parameters
tags – Dictionary of tag_name: String -> value: (String, but will be string-ified if not)
- Returns
None
- mlflow.delete_tag(key)
Delete a tag from a run. This is irreversible. If no run is active, this method will create a new active run.
- Parameters
key – Name of the tag
- mlflow.log_artifacts(local_dir, artifact_path=None)
Log all the contents of a local directory as artifacts of the run. If no run is active, this method will create a new active run.
- Parameters
local_dir – Path to the directory of files to write.
artifact_path – If provided, the directory in artifact_uri to write to.
- mlflow.log_artifact(local_path, artifact_path=None)
Log a local file or directory as an artifact of the currently active run. If no run is active, this method will create a new active run.
- Parameters
local_path – Path to the file to write.
artifact_path – If provided, the directory in artifact_uri to write to.
- mlflow.active_run()
Get the currently active Run, or None if no such run exists.
Note: You cannot access currently-active run attributes (parameters, metrics, etc.) through the run returned by mlflow.active_run. In order to access such attributes, use the mlflow.tracking.MlflowClient as follows:

client = mlflow.tracking.MlflowClient()
data = client.get_run(mlflow.active_run().info.run_id).data
- mlflow.start_run(run_id=None, experiment_id=None, run_name=None, nested=False)
Start a new MLflow run, setting it as the active run under which metrics and parameters will be logged. The return value can be used as a context manager within a with block; otherwise, you must call end_run() to terminate the current run.
If you pass a run_id or the MLFLOW_RUN_ID environment variable is set, start_run attempts to resume a run with the specified run ID, and other parameters are ignored. run_id takes precedence over MLFLOW_RUN_ID.
MLflow sets a variety of default tags on the run, as defined in MLflow system tags.
- Parameters
run_id – If specified, get the run with the specified UUID and log parameters and metrics under that run. The run’s end time is unset and its status is set to running, but the run’s other attributes (source_version, source_type, etc.) are not changed.
experiment_id – ID of the experiment under which to create the current run (applicable only when run_id is not specified). If the experiment_id argument is unspecified, MLflow looks for a valid experiment in the following order: one activated using set_experiment, the MLFLOW_EXPERIMENT_NAME environment variable, the MLFLOW_EXPERIMENT_ID environment variable, or the default experiment as defined by the tracking server.
run_name – Name of the new run (stored as an mlflow.runName tag). Used only when run_id is unspecified.
nested – Controls whether the run is nested in a parent run. True creates a nested run.
- Returns
mlflow.ActiveRun object that acts as a context manager wrapping the run’s state.
- mlflow.end_run(status='FINISHED')
End an active MLflow run (if there is one).
- mlflow.search_runs(experiment_ids=None, filter_string='', run_view_type=1, max_results=100000, order_by=None)
Get a pandas DataFrame of runs that fit the search criteria.
- Parameters
experiment_ids – List of experiment IDs. None will default to the active experiment.
filter_string – Filter query string; defaults to searching all runs.
run_view_type – One of the enum values ACTIVE_ONLY, DELETED_ONLY, or ALL runs defined in mlflow.entities.ViewType.
max_results – The maximum number of runs to put in the DataFrame. Default is 100,000 to avoid causing out-of-memory issues on the user’s machine.
order_by – List of columns to order by (e.g., “metrics.rmse”). The order_by column can contain an optional DESC or ASC value. The default is ASC. The default ordering is to sort by start_time DESC, then run_id.
- Returns
A pandas.DataFrame of runs, where each metric, parameter, and tag is expanded into its own column, named metrics.*, params.*, and tags.* respectively. For runs that don’t have a particular metric, parameter, or tag, the value will be (NumPy) NaN, None, or None, respectively.
- mlflow.get_artifact_uri(artifact_path=None)
Get the absolute URI of the specified artifact in the currently active run. If the path is not specified, the artifact root URI of the currently active run will be returned; calls to log_artifact and log_artifacts write artifact(s) to subdirectories of the artifact root URI. If no run is active, this method will create a new active run.
- Parameters
artifact_path – The run-relative artifact path for which to obtain an absolute URI. For example, “path/to/artifact”. If unspecified, the artifact root URI for the currently active run will be returned.
- Returns
An absolute URI referring to the specified artifact or the currently active run’s artifact root. For example, if an artifact path is provided and the currently active run uses an S3-backed store, this may be a URI of the form s3://<bucket_name>/path/to/artifact/root/path/to/artifact. If an artifact path is not provided and the currently active run uses an S3-backed store, this may be a URI of the form s3://<bucket_name>/path/to/artifact/root.
- mlflow.get_tracking_uri()
Get the current tracking URI. This may not correspond to the tracking URI of the currently active run, since the tracking URI can be updated via set_tracking_uri.
- Returns
The tracking URI.
- mlflow.set_tracking_uri(uri)
Set the tracking server URI. This does not affect the currently active run (if one exists), but takes effect for successive runs.
- Parameters
uri – One of:
An empty string, or a local file path, prefixed with file:/. Data is stored locally at the provided file (or ./mlruns if empty).
An HTTP URI like https://my-tracking-server:5000.
A Databricks workspace, provided as the string “databricks” or, to use a Databricks CLI profile, “databricks://<profileName>”.
- mlflow.get_experiment(experiment_id)
Retrieve an experiment by experiment_id from the backend store.
- Parameters
experiment_id – The experiment ID returned from create_experiment.
- Returns
An mlflow.entities.Experiment object.
- mlflow.get_experiment_by_name(name)
Retrieve an experiment by experiment name from the backend store.
- Parameters
name – The experiment name.
- Returns
An mlflow.entities.Experiment object.
- mlflow.create_experiment(name, artifact_location=None)
Create an experiment.
- Parameters
name – The experiment name. Must be unique.
artifact_location – The location to store run artifacts. If not provided, the server picks an appropriate default.
- Returns
Integer ID of the created experiment.
- mlflow.set_experiment(experiment_name)
Set the given experiment as the active experiment. If the experiment does not exist, create an experiment with the provided name.
- Parameters
experiment_name – Name of the experiment to be activated.
- mlflow.delete_experiment(experiment_id)
Delete an experiment from the backend store.
- Parameters
experiment_id – The experiment ID returned from create_experiment.
- mlflow.get_run(run_id)
Fetch the run from the backend store. The resulting Run contains a collection of run metadata – RunInfo – as well as a collection of run parameters, tags, and metrics – RunData. In the case where multiple metrics with the same key are logged for the run, the RunData contains the most recently logged value at the largest step for each metric.
- Parameters
run_id – Unique identifier for the run.
- Returns
A single mlflow.entities.Run object, if the run exists. Otherwise, raises an exception.
- mlflow.delete_run(run_id)
Delete a run with the given ID.
- Parameters
run_id – Unique identifier for the run to delete.
- mlflow.run(uri, entry_point='main', version=None, parameters=None, docker_args=None, experiment_name=None, experiment_id=None, backend=None, backend_config=None, use_conda=True, storage_dir=None, synchronous=True, run_id=None)
Run an MLflow project. The project can be local or stored at a Git URI.
You can run the project locally or remotely on Databricks.
For information on using this method in chained workflows, see Building Multistep Workflows.
- Raises
mlflow.exceptions.ExecutionException – If a run launched in blocking mode is unsuccessful.
- Parameters
uri – URI of the project to run. A local filesystem path or a Git repository URI (e.g. https://github.com/mlflow/mlflow-example) pointing to a project directory containing an MLproject file.
entry_point – Entry point to run within the project. If no entry point with the specified name is found, runs the project file entry_point as a script, using “python” to run .py files and the default shell (specified by the environment variable $SHELL) to run .sh files.
version – For Git-based projects, either a commit hash or a branch name.
experiment_name – Name of the experiment under which to launch the run.
experiment_id – ID of the experiment under which to launch the run.
backend – Execution backend for the run: “local”, “databricks”, or “kubernetes” (experimental). If running against Databricks, will run against a Databricks workspace determined as follows: if a Databricks tracking URI of the form databricks://profile has been set (e.g. by setting the MLFLOW_TRACKING_URI environment variable), will run against the workspace specified by <profile>. Otherwise, runs against the workspace specified by the default Databricks CLI profile.
backend_config – A dictionary, or a path to a JSON file (must end in ‘.json’), which will be passed as config to the backend. The exact content which should be provided is different for each execution backend and is documented at https://www.mlflow.org/docs/latest/projects.html.
use_conda – If True (the default), create a new Conda environment for the run and install project dependencies within that environment. Otherwise, run the project in the current environment without installing any project dependencies.
storage_dir – Used only if backend is “local”. MLflow downloads artifacts from distributed URIs passed to parameters of type path to subdirectories of storage_dir.
synchronous – Whether to block while waiting for a run to complete. Defaults to True. Note that if synchronous is False and backend is “local”, this method will return, but the current process will block when exiting until the local run completes. If the current process is interrupted, any asynchronous runs launched via this method will be terminated. If synchronous is True and the run fails, the current process will error out as well.
run_id – Note: this argument is used internally by the MLflow project APIs and should not be specified. If specified, the run ID will be used instead of creating a new run.
- Returns
mlflow.projects.SubmittedRun object exposing information (e.g. run ID) about the launched run.
- mlflow.register_model(model_uri, name)
Note: Experimental: This method may change or be removed in a future release without warning.
Create a new model version in the model registry for the model files specified by model_uri. Note that this method assumes the model registry backend URI is the same as that of the tracking backend.
- Parameters
model_uri – URI referring to the MLmodel directory. Use a runs:/ URI if you want to record the run ID with the model in the model registry. models:/ URIs are currently not supported.
name – Name of the registered model under which to create a new model version. If a registered model with the given name does not exist, it will be created automatically.
- Returns
Single mlflow.entities.model_registry.ModelVersion object created by the backend.