TensorFlow is a cross-platform, open-source library for machine learning and deep learning. Its architecture is efficient, scalable, and flexible enough to run anywhere from large data centers to mobile phones. This tutorial explains the TensorFlow architecture and its components in detail, with a focus on the components used for serving models.

Architecture of TensorFlow

All computation in TensorFlow follows a directed graph of nodes and edges: nodes represent operations (functions), and edges represent the data flowing into and out of those operations. For serving trained models, the TensorFlow Serving system adds several internal components that complete the architecture. These are:
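The node-and-edge idea can be illustrated without TensorFlow itself. The following is a minimal sketch in plain Python, not TensorFlow's actual graph implementation; the `Node` class and the `evaluate` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    """A graph node: a named operation plus the names of its input nodes."""
    name: str
    op: Callable[..., float]
    inputs: List[str] = field(default_factory=list)  # incoming edges


def evaluate(graph: Dict[str, Node], target: str,
             cache: Optional[Dict[str, float]] = None) -> float:
    """Evaluate `target` after evaluating every node that feeds into it."""
    if cache is None:
        cache = {}
    if target not in cache:
        node = graph[target]
        # Follow each incoming edge to obtain the input values.
        args = [evaluate(graph, name, cache) for name in node.inputs]
        cache[target] = node.op(*args)
    return cache[target]


# Graph for (a + b) * b with constants a = 2 and b = 3.
graph = {
    "a": Node("a", lambda: 2.0),
    "b": Node("b", lambda: 3.0),
    "sum": Node("sum", lambda x, y: x + y, ["a", "b"]),
    "product": Node("product", lambda x, y: x * y, ["sum", "b"]),
}

result = evaluate(graph, "product")
print(result)  # (2 + 3) * 3 = 15.0
```

Each node is computed once and cached, so a value that feeds several downstream nodes (here `b`) is not recomputed.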

  1. Servables
    1. Servable Versions
    2. Servable Stream
    3. Models
  2. Loaders
  3. Sources
  4. Managers
  5. Core
  6. Batcher


Servables

Servables are the central abstraction in TensorFlow Serving. They are the underlying objects that clients use to perform computation. The size and granularity of a servable are flexible: a single servable may be anything from a lookup table to a tuple of inference models. Servables can be of any type and interface, which leaves room for future improvements such as:

  1. Asynchronous modes of operation
  2. Streaming results
  3. Experimental APIs

Servable Versions

Over the lifetime of a single server instance, TensorFlow Serving can handle one or more versions of a servable. This allows new algorithm configurations, weights, and other data to be loaded over time. Versions also allow more than one version of a servable to be loaded concurrently, which supports gradual rollout and experimentation.

Servable Stream

A servable stream is the sequence of versions of a servable, sorted by increasing version number.
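As a rough illustration of versions and streams, the sketch below models a servable as a (name, version) pair and builds the stream for one name by sorting on the version number. `ServableId` and `servable_stream` are hypothetical names chosen for this example, not the TensorFlow Serving API.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServableId:
    """Identifies one version of one servable."""
    name: str
    version: int


def servable_stream(ids: List[ServableId], name: str) -> List[ServableId]:
    """Return all versions of servable `name`, in increasing version order."""
    return sorted((s for s in ids if s.name == name), key=lambda s: s.version)


# Versions as they might be discovered, in no particular order.
discovered = [
    ServableId("ranker", 3),
    ServableId("ranker", 1),
    ServableId("spam_filter", 7),
    ServableId("ranker", 2),
]

stream = servable_stream(discovered, "ranker")
versions = [s.version for s in stream]
latest = stream[-1]
print(versions)  # [1, 2, 3]
print(latest)    # ServableId(name='ranker', version=3)
```

Because the stream is ordered, the newest version is always the last element, which is convenient for a gradual rollout that serves an old and a new version side by side.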


Models

TensorFlow Serving represents a model as one or more servables working together. A machine-learned model usually includes one or more algorithms (along with their learned weights) and lookup tables. A servable can also correspond to just a part of a model. A composite model can be represented as either of the following:

  1. single composite servable
  2. multiple independent servables


Loaders

Loaders manage a servable's life cycle. The Loader API provides common infrastructure that is independent of the specific learning algorithms, product use cases, or data involved. Specifically, Loaders standardize the APIs for loading and unloading a servable.
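The idea of a uniform load/unload interface can be sketched as an abstract base class. This is an illustrative Python analogy only: the real Loader API is part of the C++ TensorFlow Serving library, and the class and method names below (`Loader`, `LookupTableLoader`, `servable`) are assumptions made for this example.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional


class Loader(ABC):
    """Uniform life-cycle interface, independent of what is being served."""

    @abstractmethod
    def load(self) -> None:
        """Bring the servable into memory."""

    @abstractmethod
    def unload(self) -> None:
        """Release the servable's resources."""

    @abstractmethod
    def servable(self) -> object:
        """Return the loaded servable, or raise if it is not loaded."""


class LookupTableLoader(Loader):
    """A loader whose servable is a plain lookup table."""

    def __init__(self, entries: Dict[str, int]) -> None:
        self._entries = entries
        self._table: Optional[Dict[str, int]] = None

    def load(self) -> None:
        self._table = dict(self._entries)

    def unload(self) -> None:
        self._table = None

    def servable(self) -> Dict[str, int]:
        if self._table is None:
            raise RuntimeError("servable is not loaded")
        return self._table


loader = LookupTableLoader({"cat": 0, "dog": 1})
loader.load()
label = loader.servable()["dog"]
loader.unload()
print(label)  # 1
```

Code that drives the life cycle only needs the `Loader` interface, so the same infrastructure could manage a lookup table, a model, or any other servable type.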


Sources

Sources are plug-in modules that find and provide servables. They can identify and locate servables in arbitrary storage systems. Each Source provides zero or more servable streams. For each servable stream, a Source supplies one Loader instance for every version it makes available to be loaded.
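A simplified sketch of what a source does: given a set of storage paths, group them into servable streams and supply one loader per version. The path layout (`<base>/<servable name>/<version number>`), the `discover` function, and the use of a plain callable as a stand-in for a Loader are all assumptions made for this illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath
from typing import Callable, Dict, List, Tuple

# A zero-argument callable stands in for a Loader in this sketch.
LoaderStandIn = Callable[[], str]


def discover(paths: List[str]) -> Dict[str, List[Tuple[int, LoaderStandIn]]]:
    """Group paths into streams: name -> [(version, loader), ...] by version."""
    streams: Dict[str, List[Tuple[int, LoaderStandIn]]] = defaultdict(list)
    for path in paths:
        p = PurePosixPath(path)
        if not p.name.isdigit():
            continue  # not a version directory, ignore it
        # Bind `path` as a default argument so each loader keeps its own path.
        loader = lambda bound=path: f"loaded {bound}"
        streams[p.parent.name].append((int(p.name), loader))
    return {name: sorted(items, key=lambda item: item[0])
            for name, items in streams.items()}


paths = [
    "/models/ranker/2",
    "/models/ranker/1",
    "/models/spam_filter/7",
    "/models/ranker/README",
]

streams = discover(paths)
ranker_versions = [version for version, _ in streams["ranker"]]
first_loaded = streams["ranker"][0][1]()
print(ranker_versions)  # [1, 2]
print(first_loaded)     # loaded /models/ranker/1
```

The source does not load anything itself. It only reports which versions exist and hands over something that can load each one.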


Managers

Managers handle the full life cycle of servables, which covers three operations:

  1. Loading Servables
  2. Serving Servables
  3. Unloading Servables

Managers listen to Sources and keep track of all versions. A Manager tries to fulfill the Sources' requests, but it may refuse to load a requested version if, for example, the required resources are not available. Managers may also postpone an unload, for example to wait until a newer version has finished loading.
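The refusal behaviour described above can be sketched with a toy manager that tracks a memory budget. This is a hypothetical illustration, not the TensorFlow Serving Manager API: the `Manager` class, its method names, and the use of megabytes as the only resource are assumptions.

```python
from typing import Dict, List, Tuple


class Manager:
    """Toy manager: loads a version only if it fits in the memory budget."""

    def __init__(self, memory_budget_mb: int) -> None:
        self._free_mb = memory_budget_mb
        self._loaded: Dict[Tuple[str, int], int] = {}  # (name, version) -> cost

    def load(self, name: str, version: int, cost_mb: int) -> bool:
        """Try to load a version; return False if resources are insufficient."""
        key = (name, version)
        if key in self._loaded:
            return True  # already being served
        if cost_mb > self._free_mb:
            return False  # refuse: not enough memory left
        self._free_mb -= cost_mb
        self._loaded[key] = cost_mb
        return True

    def unload(self, name: str, version: int) -> None:
        """Unload a version and return its memory to the budget."""
        self._free_mb += self._loaded.pop((name, version), 0)

    def serving_versions(self, name: str) -> List[int]:
        """List the versions of `name` that are currently loaded."""
        return sorted(v for (n, v) in self._loaded if n == name)


manager = Manager(memory_budget_mb=100)
first = manager.load("ranker", 1, cost_mb=60)   # fits in the budget
second = manager.load("ranker", 2, cost_mb=60)  # refused: only 40 MB free
manager.unload("ranker", 1)
third = manager.load("ranker", 2, cost_mb=60)   # fits after the unload
print(first, second, third)                # True False True
print(manager.serving_versions("ranker"))  # [2]
```

In this toy example the old version has to be unloaded before the new one fits. With a larger budget, a manager could load the new version first and postpone the unload so that at least one version stays available.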


Core

TensorFlow Serving Core uses the standard TensorFlow Serving APIs to manage the underlying aspects of servables, such as their life cycle and metrics.


Batcher

TensorFlow Serving can also combine multiple requests into a single request, a technique known as batching. Batching can significantly reduce the cost of performing inference, especially when hardware accelerators such as GPUs are involved. TensorFlow Serving includes a request batching widget that lets clients batch their type-specific inferences across requests into batch requests that the underlying algorithms can process more efficiently.
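The effect of batching can be sketched in a few lines of plain Python. This is a simplified illustration rather than the TensorFlow Serving batching implementation: `run_batched`, `double_all`, and `max_batch_size` are names assumed for this example, and a real batcher also has to decide how long to wait for a batch to fill, which this sketch ignores.

```python
from typing import Callable, List, Sequence


def run_batched(requests: Sequence[int],
                model_fn: Callable[[List[int]], List[int]],
                max_batch_size: int) -> List[int]:
    """Group requests into batches and call the model once per batch."""
    results: List[int] = []
    for start in range(0, len(requests), max_batch_size):
        batch = list(requests[start:start + max_batch_size])
        results.extend(model_fn(batch))  # one model call for the whole batch
    return results


calls: List[int] = []  # records the size of every batch the model receives


def double_all(batch: List[int]) -> List[int]:
    """Stand-in for a model: doubles every input in the batch."""
    calls.append(len(batch))
    return [2 * x for x in batch]


outputs = run_batched([1, 2, 3, 4, 5], double_all, max_batch_size=2)
print(outputs)  # [2, 4, 6, 8, 10]
print(calls)    # [2, 2, 1]
```

Five requests result in three model calls instead of five. On an accelerator, where each call carries a fixed overhead, fewer and larger calls are usually cheaper.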
