mlflow.pyfunc
The mlflow.pyfunc module defines a generic filesystem format for Python models and provides utilities for saving to and loading from this format. The format is self-contained in the sense that it includes all the information necessary for anyone to load and use the model. Dependencies are either stored directly with the model or referenced via a Conda environment.
The mlflow.pyfunc module also defines utilities for creating custom pyfunc models using frameworks and inference logic that may not be natively included in MLflow. See Creating custom Pyfunc models.
Filesystem format
The Pyfunc format is defined as a directory structure containing all required data, code, and configuration:
./dst-path/
    ./MLmodel: configuration
    <code>: code packaged with the model (specified in the MLmodel file)
    <data>: data packaged with the model (specified in the MLmodel file)
    <env>: Conda environment definition (specified in the MLmodel file)
The directory structure may contain additional contents that can be referenced by the MLmodel
configuration.
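To make the layout concrete, here is a minimal stdlib-only sketch that writes a directory in this shape. The helper `write_pyfunc_layout`, the loader module `scale_loader`, and the file names are hypothetical and were chosen for illustration. In practice `save_model()` and `log_model()` produce the directory for you, and the MLmodel files they write may carry additional fields.

```python
import pickle
import tempfile
from pathlib import Path

# Source of a hypothetical loader module, stored under <code>/
LOADER_SOURCE = '''\
import pickle


class _ScaleModel:
    def __init__(self, factor):
        self.factor = factor

    def predict(self, model_input):
        # model_input is expected to be a pandas.DataFrame
        return model_input * self.factor


def _load_pyfunc(path):
    with open(path, "rb") as f:
        return _ScaleModel(pickle.load(f)["factor"])
'''


def write_pyfunc_layout(dst: Path, model_params: dict) -> Path:
    """Write a minimal pyfunc-style directory (hypothetical helper)."""
    # <code>: Python sources packaged with the model
    (dst / "code").mkdir(parents=True, exist_ok=True)
    (dst / "code" / "scale_loader.py").write_text(LOADER_SOURCE)

    # <data>: the serialized model data
    (dst / "data").mkdir(exist_ok=True)
    with open(dst / "data" / "model.pkl", "wb") as f:
        pickle.dump(model_params, f)

    # <env>: Conda environment definition
    (dst / "mlflow_env.yml").write_text(
        "name: mlflow-env\nchannels:\n  - defaults\ndependencies:\n  - python=3.7.0\n"
    )

    # MLmodel: configuration; all paths are relative to dst
    (dst / "MLmodel").write_text(
        "python_function:\n"
        "  loader_module: scale_loader\n"
        "  code: code\n"
        "  data: data/model.pkl\n"
        "  env: mlflow_env.yml\n"
    )
    return dst


root = write_pyfunc_layout(Path(tempfile.mkdtemp()) / "dst-path", {"factor": 2.0})
for p in sorted(root.rglob("*")):
    print(p.relative_to(root).as_posix())
```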
MLModel configuration
A Python model contains an MLmodel file in python_function format in its root with the following parameters:
- loader_module [required]:
Python module that can load the model. Expected as a module identifier, e.g. mlflow.sklearn; it will be imported using importlib.import_module. The imported module must contain a function with the following signature:
    _load_pyfunc(path: string) -> <pyfunc model>
The path argument is specified by the data parameter and may refer to a file or directory.
- code [optional]:
Relative path to a directory containing the code packaged with this model. All files and directories inside this directory are added to the Python path prior to importing the model loader.
- data [optional]:
Relative path to a file or directory containing model data. The path is passed to the model loader.
- env [optional]:
Relative path to an exported Conda environment. If present, this environment should be activated prior to running the model.
The configuration may optionally include any additional parameters needed to interpret the serialized model in pyfunc format.
Example
tree example/sklearn_iris/mlruns/run1/outputs/linear-lr
├── MLmodel
├── code
│   └── sklearn_iris.py
├── data
│   └── model.pkl
└── mlflow_env.yml

cat example/sklearn_iris/mlruns/run1/outputs/linear-lr/MLmodel
python_function:
  code: code
  data: data/model.pkl
  loader_module: mlflow.sklearn
  env: mlflow_env.yml
  main: sklearn_iris
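The loading convention described under MLModel configuration can be sketched with the standard library alone. This is an illustration under stated assumptions, not MLflow's implementation. `read_pyfunc_conf` is a hypothetical helper that only understands flat `key: value` lines under `python_function:`, whereas real MLmodel files are YAML and are parsed as such. `load_by_convention` is also hypothetical and adds only the `code` directory itself to `sys.path`.

```python
import importlib
import os
import sys


def read_pyfunc_conf(mlmodel_path):
    """Parse the flat python_function block of an MLmodel file.

    Simplified stand-in for a YAML parser: handles only 'key: value' lines
    indented under 'python_function:'.
    """
    conf, in_block = {}, False
    with open(mlmodel_path) as f:
        for line in f:
            if not line.strip():
                continue
            if not line[0].isspace():
                # A top-level key starts a new section
                in_block = line.strip() == "python_function:"
                continue
            if in_block:
                key, _, value = line.strip().partition(":")
                conf[key.strip()] = value.strip()
    return conf


def load_by_convention(model_root):
    """Load a pyfunc-style model directory following the MLmodel conventions."""
    conf = read_pyfunc_conf(os.path.join(model_root, "MLmodel"))
    # code [optional]: make packaged code importable before importing the loader
    if "code" in conf:
        sys.path.insert(0, os.path.join(model_root, conf["code"]))
    # loader_module [required]: imported by name via importlib
    loader = importlib.import_module(conf["loader_module"])
    # data [optional]: the path handed to _load_pyfunc
    data_path = os.path.join(model_root, conf.get("data", ""))
    return loader._load_pyfunc(data_path)
```

Pointed at the `linear-lr` directory above, `load_by_convention()` would put `code/` on `sys.path`, import `mlflow.sklearn`, and call its `_load_pyfunc()` with the path of `data/model.pkl` inside that directory.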
Inference API
The convention for pyfunc models is to have a predict method or function with the following signature:
    predict(model_input: pandas.DataFrame) -> [numpy.ndarray | pandas.Series | pandas.DataFrame]
This convention is relied on by other MLflow components.
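A minimal function that follows this convention might look like the sketch below. The column names and weights are hypothetical. The type hints are written as strings so that defining the function does not require importing pandas.

```python
def predict(model_input: "pandas.DataFrame") -> "pandas.Series":
    """Toy model following the pyfunc convention: DataFrame in, Series out.

    The column names and weights are hypothetical.
    """
    # Column arithmetic on a DataFrame yields a pandas.Series with one value per row
    return 0.5 * model_input["sepal_length"] + 0.25 * model_input["petal_length"]
```

Because the input is a whole DataFrame, the same function scores a single row or a batch and returns one value per row.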
Creating custom Pyfunc models
MLflow’s persistence modules provide convenience functions for creating models with the pyfunc flavor in a variety of machine learning frameworks (scikit-learn, Keras, PyTorch, and more); however, they do not cover every use case. For example, you may want to create an MLflow model with the pyfunc flavor using a framework that MLflow does not natively support. Alternatively, you may want to build an MLflow model that executes custom logic when evaluating queries, such as preprocessing and postprocessing routines. Therefore, mlflow.pyfunc provides utilities for creating pyfunc models from arbitrary code and model data.
The save_model() and log_model() methods are designed to support multiple workflows for creating custom pyfunc models that incorporate custom inference logic and any artifacts that the logic may require.
An artifact is a file or directory, such as a serialized model or a CSV. For example, a serialized TensorFlow graph is an artifact. An MLflow model directory is also an artifact.
Workflows
save_model() and log_model() support the following workflows:
1. Programmatically defining a new MLflow model, including its attributes and artifacts.
Given a set of artifact URIs, save_model() and log_model() can automatically download artifacts from their URIs and create an MLflow model directory.
In this case, you must define a Python class which inherits from PythonModel, defining predict() and, optionally, load_context(). An instance of this class is specified via the python_model parameter; it is automatically serialized and deserialized as a Python class, including all of its attributes.

2. Interpreting pre-existing data as an MLflow model.
If you already have a directory containing model data, save_model() and log_model() can import the data as an MLflow model. The data_path parameter specifies the local filesystem path to the directory containing model data.
In this case, you must provide a Python module, called a loader module. The loader module defines a _load_pyfunc() function that performs the following tasks:
  - Load data from the specified data_path. For example, this process may include deserializing pickled Python objects or models or parsing CSV files.
  - Construct and return a pyfunc-compatible model wrapper. As in the first use case, this wrapper must define a predict() method that is used to evaluate queries. predict() must adhere to the Inference API.
The loader_module parameter specifies the name of your loader module.
For an example loader module implementation, refer to the loader module implementation in mlflow.keras.
Which workflow is right for my use case?
We consider the first workflow to be more user-friendly and generally recommend it for the following reasons:
- It automatically resolves and collects specified model artifacts.
- It automatically serializes and deserializes the python_model instance and all of its attributes, reducing the amount of user logic that is required to load the model.
- You can create models using logic that is defined in the __main__ scope. This allows custom models to be constructed in interactive environments, such as notebooks and the Python REPL.
You may prefer the second, lower-level workflow for the following reasons:
- Inference logic is always persisted as code, rather than a Python object. This makes logic easier to inspect and modify later.
- If you have already collected all of your model data in a single location, the second workflow allows it to be saved in MLflow format directly, without enumerating constituent artifacts.
mlflow.pyfunc.add_to_model(model, loader_module, data=None, code=None, env=None, **kwargs)
Add a pyfunc spec to the model configuration.
Defines the pyfunc configuration schema. Callers can use this to create a valid pyfunc model flavor out of an existing directory structure. For example, other model flavors can use this to specify how to use their output as a pyfunc.
Note: All paths are relative to the exported model root directory.
- Parameters
model – Existing model.
loader_module – The module to be used to load the model.
data – Path to the model data.
code – Path to the code dependencies.
env – Path to the Conda environment definition.
kwargs – Additional key-value pairs to include in the pyfunc flavor specification. Values must be YAML-serializable.
- Returns
Updated model configuration.
mlflow.pyfunc.load_model(model_uri, suppress_warnings=True)
Load a model stored in Python function format.
- Parameters
model_uri –
The location, in URI format, of the MLflow model. For example:
/Users/me/path/to/local/model
relative/path/to/local/model
s3://my_bucket/path/to/model
runs:/<mlflow_run_id>/run-relative/path/to/model
models:/<model_name>/<model_version>
models:/<model_name>/<stage>
For more information about supported URI schemes, see Referencing Artifacts.
suppress_warnings – If True, non-fatal warning messages associated with the model loading process will be suppressed. If False, these warning messages will be emitted.
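The URI forms above can be assembled with plain string formatting. The helpers `runs_uri` and `registry_uri` below are hypothetical conveniences, not part of mlflow.pyfunc, and the run ID, model name, and stage are made-up values. `urllib.parse.urlparse` is used only to show which scheme each form carries; an empty scheme means a plain local path.

```python
from urllib.parse import urlparse


def runs_uri(run_id: str, artifact_path: str) -> str:
    """URI for a model logged under a run: runs:/<run_id>/<artifact_path>."""
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


def registry_uri(name: str, version_or_stage) -> str:
    """URI for a registered model version or stage: models:/<name>/<version|stage>."""
    return f"models:/{name}/{version_or_stage}"


for uri in [
    "relative/path/to/local/model",
    "s3://my_bucket/path/to/model",
    runs_uri("0a1b2c", "model"),
    registry_uri("iris-classifier", 3),
    registry_uri("iris-classifier", "Production"),
]:
    # An empty scheme means a plain local path
    print(uri, "->", urlparse(uri).scheme or "local path")
```

Any of these strings can be passed as model_uri, e.g. `mlflow.pyfunc.load_model(runs_uri(run_id, "model"))` for a model logged with `artifact_path="model"`.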
mlflow.pyfunc.load_pyfunc(model_uri, suppress_warnings=False)
Warning: mlflow.pyfunc.load_pyfunc is deprecated since 1.0. This method will be removed in a future release. Use mlflow.pyfunc.load_model instead.
Load a model stored in Python function format.
- Parameters
model_uri –
The location, in URI format, of the MLflow model. For example:
/Users/me/path/to/local/model
relative/path/to/local/model
s3://my_bucket/path/to/model
runs:/<mlflow_run_id>/run-relative/path/to/model
models:/<model_name>/<model_version>
models:/<model_name>/<stage>
For more information about supported URI schemes, see Referencing Artifacts.
suppress_warnings – If True, non-fatal warning messages associated with the model loading process will be suppressed. If False, these warning messages will be emitted.
mlflow.pyfunc.log_model(artifact_path, loader_module=None, data_path=None, code_path=None, conda_env=None, python_model=None, artifacts=None, registered_model_name=None)
Log a Pyfunc model with custom inference logic and optional data dependencies as an MLflow artifact for the current run.
For information about the workflows that this method supports, see "Workflows" and "Which workflow is right for my use case?". The parameters for the second workflow (loader_module, data_path) and the parameters for the first workflow (python_model, artifacts) cannot be specified together.
- Parameters
artifact_path – The run-relative artifact path to which to log the Python model.
loader_module –
The name of the Python module that is used to load the model from data_path. This module must define a function with the prototype _load_pyfunc(data_path). If not None, this module and its dependencies must be included in one of the following locations:
  - The MLflow library.
  - Package(s) listed in the model's Conda environment, specified by the conda_env parameter.
  - One or more of the files specified by the code_path parameter.
data_path – Path to a file or directory containing model data.
code_path – A list of local filesystem paths to Python file dependencies (or directories containing file dependencies). These files are prepended to the system path before the model is loaded.
conda_env –
Either a dictionary representation of a Conda environment or the path to a Conda environment YAML file. This describes the environment this model should be run in. If python_model is not None, the Conda environment must at least specify the dependencies contained in get_default_conda_env(). If None, the default get_default_conda_env() environment is added to the model. The following is an example dictionary representation of a Conda environment:
    {
        'name': 'mlflow-env',
        'channels': ['defaults'],
        'dependencies': [
            'python=3.7.0',
            'cloudpickle==0.5.8'
        ]
    }
python_model –
An instance of a subclass of PythonModel. This class is serialized using the CloudPickle library. Any dependencies of the class should be included in one of the following locations:
  - The MLflow library.
  - Package(s) listed in the model's Conda environment, specified by the conda_env parameter.
  - One or more of the files specified by the code_path parameter.
Note: If the class is imported from another module, as opposed to being defined in the __main__ scope, the defining module should also be included in one of the listed locations.
artifacts –
A dictionary containing <name, artifact_uri> entries. Remote artifact URIs are resolved to absolute filesystem paths, producing a dictionary of <name, absolute_path> entries. python_model can reference these resolved entries as the artifacts property of the context parameter in PythonModel.load_context() and PythonModel.predict(). For example, consider the following artifacts dictionary:
    {
        "my_file": "s3://my-bucket/path/to/my/file"
    }
In this case, the "my_file" artifact is downloaded from S3. The python_model can then refer to "my_file" as an absolute filesystem path via context.artifacts["my_file"].
If None, no artifacts are added to the model.
registered_model_name – Note: Experimental: This argument may change or be removed in a future release without warning. If given, create a model version under registered_model_name, also creating a registered model if one with the given name does not exist.
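The rule that the two workflows' parameters cannot be mixed can be expressed as a small pre-flight check. `select_workflow` is a hypothetical helper, not an MLflow function, and the requirement that one of `loader_module` or `python_model` be given is an assumption of this sketch. MLflow performs its own validation and may word its errors differently.

```python
def select_workflow(loader_module=None, data_path=None, python_model=None, artifacts=None):
    """Hypothetical check: return which workflow the arguments select."""
    loader_args = loader_module is not None or data_path is not None
    python_model_args = python_model is not None or artifacts is not None
    # Documented rule: the two parameter sets cannot be specified together
    if loader_args and python_model_args:
        raise ValueError(
            "Specify either loader_module/data_path or python_model/artifacts, not both."
        )
    if python_model is not None:
        return "python_model"
    if loader_module is not None:
        return "loader_module"
    # Assumption of this sketch: one of the two entry points must be given
    raise ValueError("Specify either loader_module or python_model.")
```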
mlflow.pyfunc.save_model(path, loader_module=None, data_path=None, code_path=None, conda_env=None, mlflow_model=Model(), python_model=None, artifacts=None)
Save a Pyfunc model with custom inference logic and optional data dependencies to a path on the local filesystem.
For information about the workflows that this method supports, see "Workflows" and "Which workflow is right for my use case?". The parameters for the second workflow (loader_module, data_path) and the parameters for the first workflow (python_model, artifacts) cannot be specified together.
- Parameters
path – The path to which to save the Python model.
loader_module –
The name of the Python module that is used to load the model from data_path. This module must define a function with the prototype _load_pyfunc(data_path). If not None, this module and its dependencies must be included in one of the following locations:
  - The MLflow library.
  - Package(s) listed in the model's Conda environment, specified by the conda_env parameter.
  - One or more of the files specified by the code_path parameter.
data_path – Path to a file or directory containing model data.
code_path – A list of local filesystem paths to Python file dependencies (or directories containing file dependencies). These files are prepended to the system path before the model is loaded.
conda_env –
Either a dictionary representation of a Conda environment or the path to a Conda environment YAML file. This describes the environment this model should be run in. If python_model is not None, the Conda environment must at least specify the dependencies contained in get_default_conda_env(). If None, the default get_default_conda_env() environment is added to the model. The following is an example dictionary representation of a Conda environment:
    {
        'name': 'mlflow-env',
        'channels': ['defaults'],
        'dependencies': [
            'python=3.7.0',
            'cloudpickle==0.5.8'
        ]
    }
mlflow_model – mlflow.models.Model configuration to which to add the python_function flavor.
python_model –
An instance of a subclass of PythonModel. This class is serialized using the CloudPickle library. Any dependencies of the class should be included in one of the following locations:
  - The MLflow library.
  - Package(s) listed in the model's Conda environment, specified by the conda_env parameter.
  - One or more of the files specified by the code_path parameter.
Note: If the class is imported from another module, as opposed to being defined in the __main__ scope, the defining module should also be included in one of the listed locations.
artifacts –
A dictionary containing <name, artifact_uri> entries. Remote artifact URIs are resolved to absolute filesystem paths, producing a dictionary of <name, absolute_path> entries. python_model can reference these resolved entries as the artifacts property of the context parameter in PythonModel.load_context() and PythonModel.predict(). For example, consider the following artifacts dictionary:
    {
        "my_file": "s3://my-bucket/path/to/my/file"
    }
In this case, the "my_file" artifact is downloaded from S3. The python_model can then refer to "my_file" as an absolute filesystem path via context.artifacts["my_file"].
If None, no artifacts are added to the model.
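The resolution step described for `artifacts` can be approximated for local files. In this sketch, `resolve_artifacts` is hypothetical: it copies plain paths and `file://` URIs into a destination directory and returns `<name, absolute_path>` entries. Unlike MLflow, it does not download remote URIs such as `s3://` ones, and its on-disk layout is not necessarily MLflow's. A `types.SimpleNamespace` stands in for `PythonModelContext`.

```python
import os
import shutil
import tempfile
from types import SimpleNamespace
from urllib.parse import urlparse


def resolve_artifacts(artifacts, dst_dir):
    """Copy local artifacts into dst_dir; return {name: absolute_path}.

    Simplified sketch: only plain local paths and file:// URIs are handled,
    whereas MLflow also downloads remote URIs such as s3://...
    """
    resolved = {}
    for name, uri in artifacts.items():
        parsed = urlparse(uri)
        if parsed.scheme not in ("", "file"):
            raise NotImplementedError(f"remote scheme not handled in this sketch: {parsed.scheme}")
        src = parsed.path if parsed.scheme == "file" else uri
        target_dir = os.path.join(dst_dir, name)
        os.makedirs(target_dir, exist_ok=True)
        target = os.path.join(target_dir, os.path.basename(os.path.normpath(src)))
        if os.path.isdir(src):
            shutil.copytree(src, target)
        else:
            shutil.copy(src, target)
        resolved[name] = os.path.abspath(target)
    return resolved


# Demo: a local CSV artifact, read back through a stand-in context object
src_dir = tempfile.mkdtemp()
csv_path = os.path.join(src_dir, "labels.csv")
with open(csv_path, "w") as f:
    f.write("label,name\n0,setosa\n1,versicolor\n")

context = SimpleNamespace(artifacts=resolve_artifacts({"labels": csv_path}, tempfile.mkdtemp()))
with open(context.artifacts["labels"]) as f:
    print(f.read().splitlines()[1])  # prints 0,setosa
```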
mlflow.pyfunc.spark_udf(spark, model_uri, result_type='double')
A Spark UDF that can be used to invoke the Python function formatted model.
Arguments passed to the UDF are forwarded to the model as a DataFrame where the column names are ordinals (0, 1, …). On some versions of Spark, it is also possible to wrap the input in a struct. In that case, the data will be passed as a DataFrame with column names given by the struct definition (e.g. when invoked as my_udf(struct('x', 'y')), the model will get the data as a pandas DataFrame with 2 columns 'x' and 'y').
The predictions are filtered to contain only the columns that can be represented as the result_type. If the result_type is string or array of strings, all predictions are converted to string. If the result type is not an array type, the leftmost column with matching type is returned.
    import mlflow.pyfunc
    # Assumes an existing SparkSession `spark` and a Spark DataFrame `df` with "name" and "age" columns
    predict = mlflow.pyfunc.spark_udf(spark, "/my/local/model")
    df.withColumn("prediction", predict("name", "age")).show()
- Parameters
spark – A SparkSession object.
model_uri –
The location, in URI format, of the MLflow model with the mlflow.pyfunc flavor. For example:
/Users/me/path/to/local/model
relative/path/to/local/model
s3://my_bucket/path/to/model
runs:/<mlflow_run_id>/run-relative/path/to/model
models:/<model_name>/<model_version>
models:/<model_name>/<stage>
For more information about supported URI schemes, see Referencing Artifacts.
result_type –
The return type of the user-defined function. The value can be either a pyspark.sql.types.DataType object or a DDL-formatted type string. Only a primitive type or an array (pyspark.sql.types.ArrayType) of a primitive type is allowed. The following classes of result type are supported:
  - "int" or pyspark.sql.types.IntegerType: The leftmost integer that can fit in an int32, or an exception if there is none.
  - "long" or pyspark.sql.types.LongType: The leftmost long integer that can fit in an int64, or an exception if there is none.
  - ArrayType(IntegerType|LongType): All integer columns that can fit into the requested size.
  - "float" or pyspark.sql.types.FloatType: The leftmost numeric result cast to float32, or an exception if there is none.
  - "double" or pyspark.sql.types.DoubleType: The leftmost numeric result cast to double, or an exception if there is none.
  - ArrayType(FloatType|DoubleType): All numeric columns cast to the requested type, or an exception if there are no numeric columns.
  - "string" or pyspark.sql.types.StringType: The leftmost column converted to string.
  - ArrayType(StringType): All columns converted to string.
- Returns
Spark UDF that applies the model's predict method to the data and returns a type specified by result_type, which by default is a double.
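To make the "leftmost column" rule concrete, the sketch below applies the documented "int" rule to plain Python lists. It is an illustration only, not the UDF's implementation. `leftmost_int32_column` is hypothetical, and excluding booleans is a choice made for this sketch because Python's `bool` subclasses `int`.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def leftmost_int32_column(columns):
    """Return the name of the leftmost column whose values all fit in int32.

    `columns` is an ordered mapping {column_name: list_of_values}, mimicking
    in plain Python the documented selection rule for result_type="int".
    """
    for name, values in columns.items():
        if all(
            # bools are excluded in this sketch even though bool subclasses int
            isinstance(v, int) and not isinstance(v, bool) and INT32_MIN <= v <= INT32_MAX
            for v in values
        ):
            return name
    # Mirrors "an exception if there is none"
    raise ValueError("no column can be represented as int32")


predictions = {"name": ["setosa", "virginica"], "big": [2**40, 1], "label": [0, 2]}
print(leftmost_int32_column(predictions))  # prints label
```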
mlflow.pyfunc.get_default_conda_env()
- Returns
The default Conda environment for MLflow Models produced by calls to save_model() and log_model() when a user-defined subclass of PythonModel is provided.
class mlflow.pyfunc.PythonModelContext
A collection of artifacts that a PythonModel can use when performing inference. PythonModelContext objects are created implicitly by the save_model() and log_model() persistence methods, using the contents specified by the artifacts parameter of these methods.
class mlflow.pyfunc.PythonModel
Represents a generic Python model that evaluates inputs and produces API-compatible outputs. By subclassing PythonModel, users can create customized MLflow models with the "python_function" ("pyfunc") flavor, leveraging custom inference logic and artifact dependencies.
load_context(context)
Loads artifacts from the specified PythonModelContext that can be used by predict() when evaluating inputs. When loading an MLflow model with load_model() (or the deprecated load_pyfunc()), this method is called as soon as the PythonModel is constructed.
The same PythonModelContext will also be available during calls to predict(), but it may be more efficient to override this method and load artifacts from the context at model load time.
- Parameters
context – A PythonModelContext instance containing artifacts that the model can use to perform inference.
abstract predict(context, model_input)
Evaluates a pyfunc-compatible input and produces a pyfunc-compatible output. For more information about the pyfunc input/output API, see the Inference API.
- Parameters
context – A PythonModelContext instance containing artifacts that the model can use to perform inference.
model_input – A pyfunc-compatible input for the model to evaluate.
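A minimal PythonModel subclass that uses both hooks might look like this. The class name, the `labels` artifact, and the CSV format are hypothetical. The `try`/`except` falls back to a bare stand-in base class only so that the sketch runs without MLflow installed. In the demo, a `SimpleNamespace` plays the role of PythonModelContext.

```python
import os
import tempfile
from types import SimpleNamespace

try:
    from mlflow.pyfunc import PythonModel
except ImportError:
    # Minimal stand-in so this sketch also runs where MLflow is not installed
    class PythonModel:
        def load_context(self, context):
            pass


class LabelNameModel(PythonModel):
    """Maps integer class labels to names using a CSV artifact (hypothetical)."""

    def load_context(self, context):
        # Runs once when the model is loaded: read the artifact into memory
        self.names = {}
        with open(context.artifacts["labels"]) as f:
            next(f)  # skip the header row
            for line in f:
                label, name = line.strip().split(",")
                self.names[int(label)] = name

    def predict(self, context, model_input):
        # model_input is expected to be a pandas.DataFrame with a "label" column;
        # Series.map returns a pandas.Series of names
        return model_input["label"].map(self.names)


# Demo with a stand-in context exposing .artifacts, as PythonModelContext does
csv_path = os.path.join(tempfile.mkdtemp(), "labels.csv")
with open(csv_path, "w") as f:
    f.write("label,name\n0,setosa\n1,versicolor\n")
model = LabelNameModel()
model.load_context(SimpleNamespace(artifacts={"labels": csv_path}))
print(model.names)
```

With MLflow installed, such a model could be saved with `mlflow.pyfunc.save_model(path="label_model", python_model=LabelNameModel(), artifacts={"labels": "labels.csv"})` and reloaded with `mlflow.pyfunc.load_model("label_model")`. The loaded model's `predict()` is then called with just the input DataFrame.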