tf.distribute.cluster_resolver.KubernetesClusterResolver

ClusterResolver for Kubernetes.

Inherits From: ClusterResolver

tf.distribute.cluster_resolver.KubernetesClusterResolver(
    job_to_label_mapping=None, tf_server_port=8470, rpc_layer='grpc',
    override_client=None
)

This is an implementation of a cluster resolver for Kubernetes. Given the Kubernetes namespace and label selector for pods, it retrieves the pod IP addresses of all running pods that match the selector and returns a ClusterSpec based on that information.
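As a rough illustration of the lookup described above, the following sketch builds a job-to-address mapping from a list of pod records. It does not call Kubernetes or TensorFlow: the FAKE_PODS data and the list_pods and build_cluster_dict helpers are hypothetical stand-ins, the exact-match label check is a simplification of real Kubernetes label selectors, and sorting by pod name is an assumption made here to keep task indices stable. The port 8470 matches the default tf_server_port in the signature above.

```python
# Hypothetical stand-in for the pod list a Kubernetes API client would return.
FAKE_PODS = [
    {"name": "worker-1", "labels": {"job-name": "tensorflow"}, "phase": "Running", "ip": "10.0.0.2"},
    {"name": "worker-0", "labels": {"job-name": "tensorflow"}, "phase": "Running", "ip": "10.0.0.1"},
    {"name": "other-0", "labels": {"job-name": "web"}, "phase": "Running", "ip": "10.0.0.9"},
]


def list_pods(selector, pods):
    """Return pods whose labels satisfy a single 'key=value' selector."""
    key, value = selector.split("=", 1)
    return [p for p in pods if p["labels"].get(key) == value]


def build_cluster_dict(job_to_label_mapping, pods, tf_server_port=8470):
    """Map each job name to a list of 'ip:port' strings for running pods."""
    cluster = {}
    for job, selectors in job_to_label_mapping.items():
        addresses = []
        for selector in selectors:
            matching = [p for p in list_pods(selector, pods) if p["phase"] == "Running"]
            # Assumption: sort by pod name so task indices are stable across calls.
            for pod in sorted(matching, key=lambda p: p["name"]):
                addresses.append(f"{pod['ip']}:{tf_server_port}")
        cluster[job] = addresses
    return cluster


cluster = build_cluster_dict({"worker": ["job-name=tensorflow"]}, FAKE_PODS)
print(cluster)  # {'worker': ['10.0.0.1:8470', '10.0.0.2:8470']}
```

A dictionary of this shape (job name mapped to a list of addresses) is the form that tf.train.ClusterSpec accepts as its constructor argument.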

Args:

Attributes:

Raises:

Methods

cluster_spec

cluster_spec()

Returns a ClusterSpec object based on the latest info from Kubernetes.

We retrieve the information from the Kubernetes master every time this method is called.

Returns:

A ClusterSpec containing host information returned from Kubernetes.

Raises:

master

master(
    task_type=None, task_id=None, rpc_layer=None
)

Returns the master address to use when creating a session.

Either set the task_type and task_id object properties before calling this function, or pass task_type and task_id as parameters. If you do both, the function parameters override the object properties.

Args:

Returns:

The name or URL of the session master.
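The override rule for master() can be sketched with a small stand-in class. ResolverSketch is hypothetical and is not the TensorFlow implementation: the "rpc_layer://host:port" URL format and the ValueError raised when no task is specified are assumptions made for this example only.

```python
class ResolverSketch:
    """Hypothetical stand-in that mimics only the override rule."""

    def __init__(self, cluster, rpc_layer="grpc"):
        self.cluster = cluster      # job name -> list of "host:port" strings
        self.rpc_layer = rpc_layer
        self.task_type = None       # object properties, set by the caller
        self.task_id = None

    def master(self, task_type=None, task_id=None, rpc_layer=None):
        # Function parameters take precedence over object properties.
        job = task_type if task_type is not None else self.task_type
        index = task_id if task_id is not None else self.task_id
        layer = rpc_layer if rpc_layer is not None else self.rpc_layer
        if job is None or index is None:
            # Assumption for this sketch: fail loudly when no task is specified.
            raise ValueError("task_type and task_id must be set or passed in")
        return f"{layer}://{self.cluster[job][index]}"


resolver = ResolverSketch({"worker": ["10.0.0.1:8470", "10.0.0.2:8470"]})
resolver.task_type, resolver.task_id = "worker", 0

from_properties = resolver.master()           # uses the object properties
from_parameters = resolver.master(task_id=1)  # the parameter overrides the property
print(from_properties)   # grpc://10.0.0.1:8470
print(from_parameters)   # grpc://10.0.0.2:8470
```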

num_accelerators

num_accelerators(
    task_type=None, task_id=None, config_proto=None
)

Returns the number of accelerator cores per worker.

This returns the number of accelerator cores (such as GPUs and TPUs) available per worker.

Callers can optionally specify task_type and task_id to query the number of accelerators on a specific TensorFlow process. This supports heterogeneous environments, where the number of accelerator cores per host differs.

Args:

Returns:

A map of accelerator types to number of cores.
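To make the shape of the returned map concrete, the sketch below counts accelerators for one task from a list of device name strings. The DEVICES list and the count_accelerators helper are hypothetical; the real method queries a running TensorFlow process rather than parsing a static list, so treat this only as an illustration of the result format in a heterogeneous setup.

```python
from collections import Counter

# Hypothetical device names, written in TensorFlow's device string format.
DEVICES = [
    "/job:worker/replica:0/task:0/device:CPU:0",
    "/job:worker/replica:0/task:0/device:GPU:0",
    "/job:worker/replica:0/task:0/device:GPU:1",
    "/job:worker/replica:0/task:1/device:CPU:0",
    "/job:worker/replica:0/task:1/device:GPU:0",
]


def count_accelerators(devices, task_type, task_id, accelerator_types=("GPU", "TPU")):
    """Return a map of accelerator type to count for a single task."""
    job_prefix = f"/job:{task_type}/"
    task_marker = f"/task:{task_id}/"
    counts = Counter()
    for name in devices:
        # Skip devices that belong to a different job or task.
        if not name.startswith(job_prefix) or task_marker not in name:
            continue
        # Extract the device type, e.g. "GPU" from ".../device:GPU:1".
        device_type = name.split("/device:")[1].split(":")[0]
        if device_type in accelerator_types:
            counts[device_type] += 1
    return dict(counts)


task0 = count_accelerators(DEVICES, "worker", 0)
task1 = count_accelerators(DEVICES, "worker", 1)
print(task0)  # {'GPU': 2}
print(task1)  # {'GPU': 1}
```

Here the two worker tasks report different GPU counts, which is the heterogeneous case that the task_type and task_id parameters are meant to handle.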