tf.distribute.cluster_resolver.TPUClusterResolver

Cluster Resolver for Google Cloud TPUs.

Inherits From: ClusterResolver

tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu=None, zone=None, project=None, job_name='worker', coordinator_name=None,
    coordinator_address=None, credentials='default', service=None,
    discovery_url=None
)

This is an implementation of cluster resolvers for the Google Cloud TPU service. As Cloud TPUs are in alpha, you will need to specify an API definition file for this to consume, in addition to a list of Cloud TPUs in your Google Cloud Platform project.

TPUClusterResolver supports the following distinct environments:

- Google Compute Engine
- Google Kubernetes Engine
- Google internal
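As a quick sketch of typical usage (assuming a TF 2.x install; the gRPC address below is hypothetical), a resolver can be constructed directly from a TPU worker's address, which skips the Cloud TPU API lookup entirely:

```python
import tensorflow as tf

# Hypothetical gRPC address of a Cloud TPU worker. When an explicit
# grpc:// endpoint is passed, no Cloud TPU API lookup is performed and
# master() returns the address verbatim.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(
    tpu='grpc://10.240.1.2:8470')

print(resolver.master())  # grpc://10.240.1.2:8470
```

Passing a TPU name instead of an address would cause the resolver to look the endpoint up via the Cloud TPU APIs, which is where the project, zone, and credential arguments come into play.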

Args:

tpu: A string corresponding to the TPU to use. It can be the TPU name or the gRPC address of a TPU worker.
zone: Zone where the TPUs are located.
project: Name of the GCP project containing the Cloud TPUs.
job_name: Name of the TensorFlow job the TPUs belong to.
coordinator_name: The name to use for the coordinator. Set to None if the coordinator should not be included in the computed ClusterSpec.
coordinator_address: The address of the coordinator (typically an ip:port pair).
credentials: GCE credentials. If None, the default credentials from oauth2client are used.
service: The GCE API object returned by the googleapiclient.discovery function.
discovery_url: A URL template pointing to the location of the discovery service.

Attributes:

environment: Returns the current environment which TensorFlow is running in.
task_id: Returns the task id this ClusterResolver instance represents.
task_type: Returns the task type this ClusterResolver instance represents.

Methods

cluster_spec

cluster_spec()

Returns a ClusterSpec object based on the latest TPU information.

We retrieve the information from the GCE APIs every time this method is called.

Returns:

A ClusterSpec containing host information returned from Cloud TPUs, or None.

Raises:

RuntimeError: If the provided TPU is not healthy.
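For reference, tf.train.ClusterSpec is the type this method returns; one equivalent to the result for a single TPU worker could be built by hand like this (the address is hypothetical, and a TF 2.x install is assumed):

```python
import tensorflow as tf

# A ClusterSpec like the one cluster_spec() might return for one TPU
# host; the 'worker' job name matches the default job_name argument.
spec = tf.train.ClusterSpec({'worker': ['10.240.1.2:8470']})
print(spec.as_dict())  # {'worker': ['10.240.1.2:8470']}
```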
get_job_name

get_job_name()

get_master

get_master()

master

master(
    task_type=None, task_id=None, rpc_layer=None
)

Get the Master string to be used for the session.

In the normal case, this returns the grpc path (grpc://1.2.3.4:8470) of the first instance in the ClusterSpec returned by the cluster_spec function.

If a non-TPU name is used when constructing a TPUClusterResolver, that address is returned instead (e.g. if the tpu argument was 'grpc://10.240.1.2:8470' when this TPUClusterResolver was constructed, 'grpc://10.240.1.2:8470' will be returned).

Args:

task_type: (Optional, string) The type of the TensorFlow task of the master.
task_id: (Optional, integer) The index of the TensorFlow task of the master.
rpc_layer: (Optional, string) The RPC protocol TensorFlow servers use to communicate with each other.

Returns:

string, the connection string to use when creating a session.

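The selection logic described above can be sketched in plain Python (this helper is illustrative only, not part of TensorFlow, and the addresses are hypothetical):

```python
def pick_master(tpu_arg, cluster_spec_hosts):
    """Illustrative sketch of master()'s behavior: a grpc:// value given
    at construction is returned verbatim; otherwise the first host in the
    resolved ClusterSpec is used, prefixed with the grpc scheme."""
    if tpu_arg.startswith('grpc://'):
        return tpu_arg
    return 'grpc://' + cluster_spec_hosts[0]

print(pick_master('grpc://10.240.1.2:8470', ['1.2.3.4:8470']))  # grpc://10.240.1.2:8470
print(pick_master('my-tpu', ['1.2.3.4:8470']))                  # grpc://1.2.3.4:8470
```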
num_accelerators

num_accelerators(
    task_type=None, task_id=None, config_proto=None
)

Returns the number of TPU cores per worker.

Connects to the master, lists all the devices present on the master, and counts them. Also verifies that the device count per host in the cluster is the same before returning the number of TPU cores per host.

Args:

task_type: Unused.
task_id: Unused.
config_proto: Used to create a connection to a TPU master in order to retrieve the system metadata.

Raises:

RuntimeError: If we cannot talk to a TPU worker after retrying, or if the number of TPU devices per host is different.
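The per-host verification step can be sketched in plain Python, using TensorFlow's '/job:.../task:N/device:TPU:M' device-name convention (this helper is illustrative only, not the actual implementation):

```python
from collections import Counter

def tpu_cores_per_host(device_names):
    """Count '/device:TPU:N' entries per host and require every host to
    report the same number before returning it (illustrative sketch)."""
    counts = Counter()
    for name in device_names:
        if '/device:TPU:' in name:  # excludes TPU_SYSTEM devices
            host = name.split('/device:')[0]  # e.g. '/job:worker/replica:0/task:1'
            counts[host] += 1
    per_host = set(counts.values())
    if len(per_host) != 1:
        raise RuntimeError('TPU core count differs across hosts: %r' % dict(counts))
    return per_host.pop()

# Two hypothetical hosts, eight TPU cores each.
devices = ['/job:worker/replica:0/task:%d/device:TPU:%d' % (t, i)
           for t in range(2) for i in range(8)]
print(tpu_cores_per_host(devices))  # 8
```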