From 650c90a19c817ce8054c9e69845eb72ed80cb443 Mon Sep 17 00:00:00 2001 From: amandarichardsonn <30413257+amandarichardsonn@users.noreply.github.com> Date: Thu, 9 Jun 2022 01:00:10 -0400 Subject: [PATCH 01/26] Partial Fortran Update On hold --- doc/sr_fortran_walkthrough.rst | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/doc/sr_fortran_walkthrough.rst b/doc/sr_fortran_walkthrough.rst index e9fd1543a..d81fad9d2 100644 --- a/doc/sr_fortran_walkthrough.rst +++ b/doc/sr_fortran_walkthrough.rst @@ -6,18 +6,19 @@ Fortran .. _fortran_client_examples: -In this section, examples are presented using the SmartRedis Fortran -API to interact with the RedisAI tensor, model, and script -data types. Additionally, an example of utilizing the -SmartRedis ``DataSet`` API is also provided. +Below are examples that use the SmartRedis Fortran API to +interact with the RedisAI tensor, model, and script data types. +Additionally, this section demonstrates how to utilize the SmartRedis ``DataSet`` API. + + .. note:: The Fortran API examples rely on the ``SSDB`` environment - variable being set to the address and port of the Redis database. + variable set to the address and port of the Redis database. .. note:: - The Fortran API examples are written - to connect to a clustered database or clustered SmartSim Orchestrator. + The Fortran API examples + connect to a clustered database or clustered SmartSim Orchestrator. Update the ``Client`` constructor ``cluster`` flag to `.false.` to connect to a single shard (single compute host) database. @@ -25,10 +26,9 @@ SmartRedis ``DataSet`` API is also provided. Tensors ======= -The SmartRedis Fortran client is used to communicate between -a Fortran client and the Redis database. In this example, -the client will be used to send an array to the database -and then unpack the data into another Fortran array. +The SmartRedis Fortran client communicates between a Fortran +client and the Redis database. 
In this example, the client sends an array +to the database and then unpacks the data into another Fortran array. This example will go step-by-step through the program and then present the entirety of the example code at the end. @@ -68,7 +68,7 @@ if using a clustered database or ``.false.`` otherwise. After the SmartRedis client has been initialized, a Fortran array of any dimension and shape -and with a type of either 8, 16, 32, 64 bit +and with a type of either 8, 16, 32, 64-bit ``integer`` or 32 or 64-bit ``real`` can be put into the database using the type-bound procedure ``put_tensor``. @@ -77,7 +77,7 @@ data, the array ``send_array_real_64`` will be filled with random numbers and stored in the database using ``put_tensor``. This subroutine requires the user to specify a string used as the -'key' (here: ``send_array``) identifying the tensor +'key' (here: ``send_array``) to identify the tensor in the database, the array to be stored, and the shape of the array. @@ -383,4 +383,4 @@ Python Pre-Processing: .. 
literalinclude:: ../smartredis/examples/common/mnist_data/data_processing_script.txt :linenos: :language: Python - :lines: 15-20 \ No newline at end of file + :lines: 15-20 From f93a6d11ae54a203690aa5bd955a3fdd9d2352a6 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Wed, 6 Dec 2023 13:54:40 -0600 Subject: [PATCH 02/26] creating a temp file that holds orch info --- doc/orch_hold_file.rst | 0 1 file changed, 0 insertions(+), 0 deletions(-) create mode 100644 doc/orch_hold_file.rst diff --git a/doc/orch_hold_file.rst b/doc/orch_hold_file.rst new file mode 100644 index 000000000..e69de29bb From 1bcbb910e1cbe654dda3ef633566acc752dbb612 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Wed, 6 Dec 2023 17:29:54 -0600 Subject: [PATCH 03/26] updates to std orch example --- doc/orch_hold_file.rst | 310 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 310 insertions(+) diff --git a/doc/orch_hold_file.rst b/doc/orch_hold_file.rst index e69de29bb..6ac7da5c9 100644 --- a/doc/orch_hold_file.rst +++ b/doc/orch_hold_file.rst @@ -0,0 +1,310 @@ +************ +Orchestrator +************ + +======== +Overview +======== +The ``Orchestrator`` is an in-memory database that is launched prior to all other +entities within an ``Experiment``. The ``Orchestrator`` can be used to store and retrieve +data during the course of an experiment and across multiple entities. In order to +stream data into or receive data from the ``Orchestrator``, one of the SmartSim clients +(SmartRedis) has to be used within a Model. + +.. |orchestrator| image:: images/Orchestrator.png + :width: 700 + :alt: Alternative text + +|orchestrator| + +Combined with the SmartRedis clients, the ``Orchestrator`` is capable of hosting and executing +AI models written in Python on CPU or GPU. The ``Orchestrator`` supports models written with +TensorFlow, Pytorch, TensorFlow-Lite, or models saved in an ONNX format (e.g. sci-kit learn). 
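The SmartRedis clients referenced above move tensors in and out of the ``Orchestrator`` with simple put/get semantics. As a rough, standalone sketch of that round-trip contract (a plain Python dict stands in for the database here, and the ``put_tensor``/``get_tensor`` helpers below are hypothetical local stand-ins for the real ``Client.put_tensor``/``Client.get_tensor`` calls, which require a running ``Orchestrator``):

```python
import numpy as np

# A plain dict stands in for the Orchestrator's key-value store in this sketch.
fake_db = {}

def put_tensor(name, array):
    # Store raw bytes plus the metadata needed to reconstruct the array
    fake_db[name] = (array.tobytes(), array.dtype, array.shape)

def get_tensor(name):
    raw, dtype, shape = fake_db[name]
    return np.frombuffer(raw, dtype=dtype).reshape(shape)

send_array = np.array([[1.0, 2.0], [3.0, 4.0]])
put_tensor("send_array", send_array)
received = get_tensor("send_array")
assert np.array_equal(send_array, received)
```

The real clients perform this serialization internally; applications only ever see named NumPy arrays going in and coming back out.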
+
+======================
+Clustered Orchestrator
+======================
+The ``Orchestrator`` supports single node and distributed memory settings. This means
+that a single compute host can be used for the database or multiple by specifying
+``db_nodes`` to be greater than 1.
+
+.. |cluster-orc| image:: images/clustered-orc-diagram.png
+  :width: 700
+  :alt: Alternative text
+
+|cluster-orc|
+
+
+With a clustered ``Orchestrator``, the memory of multiple compute hosts can be used together
+to store data. In addition, the CPU(s) or GPU(s) where the ``Orchestrator`` is running can
+be used to execute AI models and TorchScript code on data stored within it.
+
+Users do not need to know how the data is stored in a clustered configuration and
+can address the cluster with the SmartRedis clients like a single block of memory
+using simple put/get semantics in SmartRedis. SmartRedis will ensure that data
+is evenly distributed amongst all nodes in the cluster.
+
+The cluster deployment is optimal for high data throughput scenarios such as
+online analysis, training, and processing.
+
+Example
+-------
+This example provides a demonstration on automating the deployment of
+a standard Orchestrator and connecting a SmartRedis Client from
+within the driver script.
+
+The Application Script
+----------------------
+
+To begin writing the application script, import the necessary packages:
+
+.. code-block:: python
+
+   from smartredis import Client, log_data
+   from smartredis import *
+   import numpy as np
+
+Initialize the Client
+^^^^^^^^^^^^^^^^^^^^^
+To establish a connection with the standard database,
+we need to initialize a new SmartRedis client.
+Since the standard database we launch in the driver script
+is multi-sharded, specify `cluster` as `True`:
+
+.. code-block:: python
+
+   # Initialize a Client
+   standard_db_client = Client(cluster=True)
+
+Retrieve Data
+^^^^^^^^^^^^^
+To confirm a successful connection to the database, we will retrieve the tensor
+that we store in the Python driver script.
+Use the ``Client.get_tensor()`` method to
+retrieve the tensor by specifying the name `tensor_1` we
+used during ``Client.put_tensor()`` in the driver script:
+
+.. code-block:: python
+
+   # Retrieve tensor from driver script
+   value_1 = standard_db_client.get_tensor("tensor_1")
+   # Log tensor
+   standard_db_client.log_data(LLInfo, f"The multi-sharded db tensor is: {value_1}")
+
+Later, when you run the experiment driver script, the following output will appear in ``model.out``
+located in ``tester/model/``::
+
+  Default@17-11-48:The multi-sharded db tensor is: [1 2 3 4]
+
+Store Data
+^^^^^^^^^^
+Next, create a NumPy tensor and send it to the standard database
+using ``Client.put_tensor(name, data)``; we will retrieve it in the driver script:
+
+.. code-block:: python
+
+   # Create NumPy array
+   array_2 = np.array([5, 6, 7, 8])
+   # Use SmartRedis client to place tensor in the multi-sharded db
+   standard_db_client.put_tensor("tensor_2", array_2)
+
+The Experiment Driver Script
+----------------------------
+To run the previous application, we must define workflow stages within an experiment.
+Defining workflow stages requires the utilization of functions associated
+with the ``Experiment`` object. The Experiment object is intended to be instantiated
+once and utilized throughout the workflow runtime.
+In this example, we instantiate an ``Experiment`` object with the name ``tester``.
+We set up the SmartSim ``logger`` to output information from the Experiment.
+
+..
code-block:: python
+
+   import numpy as np
+   from smartredis import Client
+   from smartsim import Experiment
+   from smartsim.log import get_logger
+   import sys
+
+   exe_ex = sys.executable
+   logger = get_logger("Example Experiment Log")
+   # Initialize the Experiment
+   exp = Experiment("tester", launcher="auto")
+
+Launch Standard Orchestrator
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In the context of this ``Experiment``, it's essential to create and launch
+the database as a preliminary step before any other components since
+the application script requests tensors from the launched database.
+
+We aim to demonstrate the standard Orchestrator automation capabilities of SmartSim,
+so we create a single database in the workflow: a multi-sharded database.
+
+Step 1: Initialize Orchestrator
+"""""""""""""""""""""""""""""""
+To create a database, utilize the ``Experiment.create_database()`` function.
+
+.. code-block:: python
+
+   # Initialize a multi sharded database
+   standard_db = exp.create_database(port=6379, db_nodes=3, interface="ib0")
+   exp.generate(standard_db, overwrite=True)
+
+Step 2: Start Databases
+"""""""""""""""""""""""
+Next, to launch the database,
+pass the database instance to ``Experiment.start()``.
+
+.. code-block:: python
+
+   # Launch the multi sharded database
+   exp.start(standard_db)
+
+The ``Experiment.start()`` function launches the ``Orchestrator`` for use within the workflow. In other words, the function
+deploys the database on the allocated compute resources.
+
+Create Client Connections to Orchestrator
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The SmartRedis ``Client`` object contains functions that manipulate, send, and receive
+data within the database. Each database has a single, dedicated SmartRedis ``Client``.
+Begin by initializing a SmartRedis ``Client`` object per launched database.
+
+To create a designated SmartRedis ``Client``, you need to specify the address of the target
+running database.
You can easily retrieve this address using the ``Orchestrator.get_address()`` function.
+
+.. code-block:: python
+
+   # Initialize SmartRedis client for multi sharded database
+   driver_client_standard_db = Client(cluster=True, address=standard_db.get_address()[0])
+
+Store Data Using Clients
+^^^^^^^^^^^^^^^^^^^^^^^^
+In the application script, we retrieved a NumPy tensor.
+To support the app's functionality, we will create a
+NumPy array in the Python driver script and send it to the database. To
+accomplish this, we use the ``Client.put_tensor()`` function with the
+database client instance.
+
+.. code-block:: python
+
+   # Create NumPy array
+   array_1 = np.array([1, 2, 3, 4])
+   # Use the SmartRedis client to place the tensor in the standard database
+   driver_client_standard_db.put_tensor("tensor_1", array_1)
+
+Initialize a Model
+^^^^^^^^^^^^^^^^^^
+In the next stage of the experiment, we
+launch the application script
+by configuring and creating
+a SmartSim ``Model``.
+
+Step 1: Configure
+"""""""""""""""""
+You can specify the run settings of a model.
+In this experiment, we invoke the Python interpreter to run
+the Python script defined in section: The Application Script.
+To configure this into a ``Model``, we use the ``Experiment.create_run_settings()`` function.
+The function returns a ``RunSettings`` object.
+When initializing the RunSettings object,
+we specify the path to the application file,
+`application_script.py`, for
+``exe_args``, and the run command for ``exe``.
+
+.. code-block:: python
+
+   # Initialize a RunSettings object
+   model_settings = exp.create_run_settings(exe=exe_ex, exe_args="/lus/scratch/richaama/standard_orch_model.py")
+   model_settings.set_nodes(1)
+
+Step 2: Initialize
+""""""""""""""""""
+Next, create a ``Model`` instance using ``Experiment.create_model()``.
+Pass the ``model_settings`` object as an argument
+to the ``create_model()`` function and assign it to the variable ``model``.
+
+.. code-block:: python
+
+   # Initialize the Model
+   model = exp.create_model("model", model_settings)
+
+Step 3: Start
+"""""""""""""
+Next, launch the model instance using the ``Experiment.start()`` function.
+
+.. code-block:: python
+
+   # Launch the Model
+   exp.start(model, block=True, summary=True)
+
+Retrieve Data Using Clients
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+.. code-block:: python
+
+   # Poll for the tensor placed by the Model
+   value_2 = driver_client_standard_db.poll_key("tensor_2", 100, 100)
+   # Validate that the tensor exists
+   logger.info(f"The tensor exists: {value_2}")
+
+Cleanup Experiment
+^^^^^^^^^^^^^^^^^^
+.. code-block:: python
+
+   # Cleanup the database
+   exp.stop(standard_db)
+   logger.info(exp.summary())
+
+How to Run the Example
+----------------------
+Source Code
+-----------
+.. sourcecode::
+
+======================
+Colocated Orchestrator
+======================
+A co-located Orchestrator is a special type of Orchestrator that is deployed on
+the same compute hosts as a ``Model`` instance defined by the user. In this
+deployment, the database is *not* connected together in a cluster and each
+shard of the database is addressed individually by the processes running
+on that compute host.
+
+.. |colo-orc| image:: images/co-located-orc-diagram.png
+  :width: 700
+  :alt: Alternative text
+
+
+|colo-orc|
+
+This deployment is designed for highly performant online inference scenarios where
+distributed processes (likely MPI processes) are performing inference with
+data local to each process.
+
+This method is deemed ``locality based inference`` since data is local to each
+process and the ``Orchestrator`` is deployed locally on each compute host where
+the distributed application is running.
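With locality-based inference, every process on a host shares that host's database shard, so concurrently running ranks conventionally namespace their tensor keys to avoid overwriting one another's data. A minimal sketch of such a naming scheme (plain Python, not a SmartRedis API; SmartSim and SmartRedis also provide their own key-prefixing mechanisms for entities):

```python
# Hypothetical naming helper: not a SmartRedis API, just a convention sketch.
def rank_key(rank: int, tensor_name: str) -> str:
    """Namespace a tensor key by process rank so co-resident processes don't collide."""
    return f"rank_{rank}.{tensor_name}"

# Each of four ranks on a node writes its own copy of "inference_input"
keys = [rank_key(rank, "inference_input") for rank in range(4)]
assert keys[0] == "rank_0.inference_input"
assert len(set(keys)) == 4  # every process gets a distinct key
```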
+ +Example +------- +The Application Script +---------------------- +Initialize the Clients +^^^^^^^^^^^^^^^^^^^^^^ +Retrieve Data and Store Using SmartRedis Client Objects +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +The Experiment Driver Script +---------------------------- +Initialize a Colocated Model +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +Step 1: Configure +""""""""""""""""" +Step 2: Initialize +"""""""""""""""""" +Step 2: Colocate +"""""""""""""""" +Step 3: Start +""""""""""""" +Cleanup Experiment +^^^^^^^^^^^^^^^^^^ +How to Run the Example +---------------------- +Source Code +----------- + +====================== +Multiple Orchestrators +====================== + +Example +------- + +Source Code +----------- \ No newline at end of file From 5e255b8cd2c1c69aea6c50b9b5c4572d6d062714 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Wed, 6 Dec 2023 17:35:38 -0600 Subject: [PATCH 04/26] edits to colo example --- doc/orch_hold_file.rst | 59 ++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 57 insertions(+), 2 deletions(-) diff --git a/doc/orch_hold_file.rst b/doc/orch_hold_file.rst index 6ac7da5c9..1e6505b8d 100644 --- a/doc/orch_hold_file.rst +++ b/doc/orch_hold_file.rst @@ -276,24 +276,79 @@ Example ------- The Application Script ---------------------- +.. code-block:: python + + from smartredis import ConfigOptions, Client, log_data + from smartredis import * + import numpy as np + Initialize the Clients ^^^^^^^^^^^^^^^^^^^^^^ -Retrieve Data and Store Using SmartRedis Client Objects -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +.. code-block:: python + # Initialize a Client + colo_client = Client(cluster=False) + +Store Data +^^^^^^^^^^ +.. code-block:: python + # Create NumPy array + array_1 = np.array([1, 2, 3, 4]) + # Use SmartRedis client to place tensor in single sharded db + colo_client.put_tensor("tensor_1", array_1) +Retrieve Data +^^^^^^^^^^^^^ +.. 
code-block:: python + # Retrieve tensor from driver script + value_1 = colo_client.get_tensor("tensor_1") + # Log tensor + colo_client.log_data(LLInfo, f"The colocated db tensor is: {value_1}") + The Experiment Driver Script ---------------------------- +.. code-block:: python + import numpy as np + from smartredis import Client + from smartsim import Experiment + from smartsim.log import get_logger + import sys + + exe_ex = sys.executable + logger = get_logger("Example Experiment Log") + # Initialize the Experiment + exp = Experiment("tester", launcher="auto") + Initialize a Colocated Model ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Step 1: Configure """"""""""""""""" +.. code-block:: python + # Initialize a RunSettings object + model_settings = exp.create_run_settings(exe=exe_ex, exe_args="/lus/scratch/richaama/clustered_model.py") + # Configure RunSettings object + model_settings.set_nodes(1) + Step 2: Initialize """""""""""""""""" +.. code-block:: python + # Initialize a SmartSim Model + model = exp.create_model("colo_model", model_settings) Step 2: Colocate """""""""""""""" +.. code-block:: python + # Colocate the Model + model.colocate_db_tcp() + Step 3: Start """"""""""""" +.. code-block:: python + # Launch the colocated Model + exp.start(model, block=True, summary=True) Cleanup Experiment ^^^^^^^^^^^^^^^^^^ +.. 
code-block:: python + + logger.info(exp.summary()) + How to Run the Example ---------------------- Source Code From 787ae5063b46d40ffa0ce7b12ba9f4617b1606ef Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Wed, 13 Dec 2023 13:07:50 -0600 Subject: [PATCH 05/26] pushing updates --- doc/orchestrator.rst | 517 ++++++++++++++++++++++++++++++++++++------- 1 file changed, 437 insertions(+), 80 deletions(-) diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index 320f03a10..92f17a3b9 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -2,7 +2,9 @@ Orchestrator ************ - +======== +Overview +======== The ``Orchestrator`` is an in-memory database that is launched prior to all other entities within an ``Experiment``. The ``Orchestrator`` can be used to store and retrieve data during the course of an experiment and across multiple entities. In order to @@ -19,13 +21,18 @@ Combined with the SmartRedis clients, the ``Orchestrator`` is capable of hosting AI models written in Python on CPU or GPU. The ``Orchestrator`` supports models written with TensorFlow, Pytorch, TensorFlow-Lite, or models saved in an ONNX format (e.g. sci-kit learn). - -Cluster Orchestrator -==================== - -The ``Orchestrator`` supports single node and distributed memory settings. This means -that a single compute host can be used for the database or multiple by specifying -``db_nodes`` to be greater than 1. +====================== +Clustered Orchestrator +====================== +-------- +Overview +-------- +A clustered Orchestrator is a type of deployment where the application and database +are launched on separate compute nodes. A clustered Orchestrator may be single-sharded +(allocated one database node) or multi-sharded (allocated multiple database nodes). +When initializing an ``Orchestrator`` within an Experiment, you may set +the argument `db_nodes` to be 1 or greater than 2. 
This parameter controls the number
+of database nodes your in-memory database will span across.

.. |cluster-orc| image:: images/clustered-orc-diagram.png
  :width: 700
  :alt: Alternative text

|cluster-orc|

-
-With a clustered ``Orchestrator``, multiple compute hosts memory can be used together
-to store data. As well, the CPU or GPU(s) where the ``Orchestrator`` is running can
-be used to execute the AI models, and Torchscript code on data stored within it.
+Clustered Orchestrators support data communication across multiple simulations.
+Given that a clustered database is standalone, meaning the database compute node
+is separate from the application compute node, the database node does not tear
+down when a SmartSim ``Model`` finishes, unlike a colocated Orchestrator.
+With standalone database deployment, SmartSim can run AI models and TorchScript
+code on the CPU(s) or GPU(s) with existing data in the ``Orchestrator``.
+Produced data can then be requested by another application.

Users do not need to know how the data is stored in a clustered configuration and
can address the cluster with the SmartRedis clients like a single block of memory
-using simple put/get semantics in SmartRedis. SmartRedis will ensure that data
-is evenly distributed amoungst all nodes in the cluster.
+using simple put/get semantics in SmartRedis.

The cluster deployment is optimal for high data throughput scenarios such as
online analysis, training and processing.

+-------
+Example
+-------
+This example provides a demonstration on automating the deployment of
+a standard Orchestrator. Once the standard database is started,
+we demonstrate connecting a client from within the driver script.
+
+The Application Script
+======================
+To begin writing the application script, import the necessary packages:
+
+..
code-block:: python
+
+   from smartredis import Client, log_data
+   from smartredis import *
+   import numpy as np
+
+Initialize the Client
+---------------------
+To establish a connection with the standard database,
+we need to initialize a new SmartRedis client.
+Since the standard database we launch in the driver script
+is multi-sharded, we specify `cluster` as `True`:
+
+.. code-block:: python
+
+   # Initialize a Client
+   standard_db_client = Client(cluster=True)
+
+Retrieve Data
+-------------
+To confirm a successful connection to the database, we retrieve the tensor
+we store in the Python driver script.
+Use the ``Client.get_tensor()`` method to
+retrieve the tensor by specifying the name `tensor_1` we
+used during ``Client.put_tensor()`` in the driver script:
+
+.. code-block:: python
+
+   # Retrieve tensor from Orchestrator
+   value_1 = standard_db_client.get_tensor("tensor_1")
+   # Log tensor
+   standard_db_client.log_data(LLInfo, f"The multi-sharded db tensor is: {value_1}")
+
+Later, when you run the experiment driver script, the following output will appear in ``model.out``
+located in ``getting-started/tutorial_model/``::
+
+  Default@17-11-48:The multi-sharded db tensor is: [1 2 3 4]
+
+Store Data
+----------
+Next, create a NumPy tensor to send to the standard database using
+``Client.put_tensor(name, data)``:
+
+.. code-block:: python
+
+   # Create a NumPy array
+   array_2 = np.array([5, 6, 7, 8])
+   # Use SmartRedis client to place tensor in multi-sharded db
+   standard_db_client.put_tensor("tensor_2", array_2)
+
+We will retrieve `"tensor_2"` in the Python driver script.
+
+The Experiment Driver Script
+============================
+To run the previous application, we must define a Model and Orchestrator within an
+experiment. Defining workflow stages requires the utilization of functions associated
+with the ``Experiment`` object. The Experiment object is intended to be instantiated
+once and utilized throughout the workflow runtime.
+In this example, we instantiate an ``Experiment`` object with the name ``getting-started``.
+We set up the SmartSim ``logger`` to output information from the Experiment:
+
+.. code-block:: python
+
+   import numpy as np
+   from smartredis import Client
+   from smartsim import Experiment
+   from smartsim.log import get_logger
+   import sys
+
+   exe_ex = sys.executable
+   logger = get_logger("Example Experiment Log")
+   # Initialize the Experiment
+   exp = Experiment("getting-started", launcher="auto")
+
+Launch Standard Orchestrator
+----------------------------
+In the context of this ``Experiment``, it's essential to create and launch
+the database as a preliminary step before any other components since
+the application script requests and sends tensors to and from the launched database.
+
+We aim to demonstrate the standard Orchestrator automation capabilities of SmartSim, so we
+create a single database in the workflow: a multi-sharded database.
+
+Step 1: Initialize Orchestrator
+'''''''''''''''''''''''''''''''
+To create a standard database, utilize the ``Experiment.create_database()`` function.
+
+.. code-block:: python
+
+   # Initialize a multi sharded database
+   standard_db = exp.create_database(db_nodes=3)
+   exp.generate(standard_db)
+
+Step 2: Start Databases
+'''''''''''''''''''''''
+Next, to launch the database,
+pass the database instance to ``Experiment.start()``.
+
+.. code-block:: python
+
+   # Launch the multi sharded database
+   exp.start(standard_db)
+
+The ``Experiment.start()`` function launches the ``Orchestrator`` for use within the workflow.
+In other words, the function deploys the database on the allocated compute resources.
+
+Create Client Connections to Orchestrator
+-----------------------------------------
+The SmartRedis ``Client`` object contains functions that manipulate, send, and receive
+data within the database. Each database can have a single, dedicated SmartRedis ``Client``.
+Begin by initializing a SmartRedis ``Client`` object for the standard database.
+
+When creating a client connection from within a driver script,
+you need to specify the address of the database you would like to connect to.
+You can easily retrieve this address using the ``Orchestrator.get_address()`` function:
+
+.. code-block:: python
+
+   # Initialize a SmartRedis client for multi sharded database
+   driver_client_standard_db = Client(cluster=True, address=standard_db.get_address()[0])
+
+Store Data Using Clients
+------------------------
+In the application script, we retrieved a NumPy tensor.
+To support the app's functionality, we create a
+NumPy array in the Python driver script to send to the database. To
+accomplish this, we use ``Client.put_tensor()``:
+
+.. code-block:: python
+
+   # Create NumPy array
+   array_1 = np.array([1, 2, 3, 4])
+   # Use the SmartRedis client to place the tensor in the standard database
+   driver_client_standard_db.put_tensor("tensor_1", array_1)
+
+Initialize a Model
+------------------
+In the next stage of the experiment, we
+launch the application script by configuring and creating
+a SmartSim ``Model``.
+
+Step 1: Configure
+'''''''''''''''''
+You can specify the run settings of a model.
+In this experiment, we invoke the Python interpreter to run
+the Python script defined in section: The Application Script.
+To configure this into a ``Model``, we use the ``Experiment.create_run_settings()`` function.
+The function returns a ``RunSettings`` object.
+When initializing the ``RunSettings`` object,
+we specify the path to the application file,
+`application_script.py`, for
+``exe_args``, and the run command for ``exe``.
+
+.. code-block:: python
+
+   # Initialize a RunSettings object
+   model_settings = exp.create_run_settings(exe=exe_ex, exe_args="/lus/scratch/richaama/standard_orch_model.py")
+   model_settings.set_nodes(1)
+
+Step 2: Initialize
+''''''''''''''''''
+Next, create a ``Model`` instance using ``Experiment.create_model()``.
+Pass the ``model_settings`` object as an argument
+to the ``create_model()`` function and assign it to the variable ``model``:
+
+.. code-block:: python
+
+   # Initialize the Model
+   model = exp.create_model("model", model_settings)
+
+Step 3: Start
+'''''''''''''
+Next, launch the model instance using the ``Experiment.start()`` function.
+
+.. code-block:: python
+
+   # Launch the Model
+   exp.start(model, block=True, summary=True)
+
+.. note::
+   We specify `block=True` to ``exp.start()`` because our experiment
+   requires that the ``Model`` finish before the experiment continues.
+   This is because we will request tensors from the database that
+   are placed by the Model we launched.
+
+Poll Data Using Clients
+-----------------------
+Next, check if the tensor exists in the standard database using ``Client.poll_tensor()``.
+This function queries for data in the database. The function requires the tensor name (`name`),
+how many milliseconds to wait in between queries (`poll_frequency_ms`),
+and the total number of times to query (`num_tries`):
+
+.. code-block:: python
+
+   # Poll for the tensor placed by the Model
+   value_2 = driver_client_standard_db.poll_tensor("tensor_2", 100, 100)
+   # Validate that the tensor exists
+   logger.info(f"The tensor exists: {value_2}")
+
+The output will be as follows::
+
+  test
+
+Cleanup Experiment
+------------------
+Finally, use the ``Experiment.stop()`` function to stop the database instance. Print the
+workflow summary with ``Experiment.summary()``:
+
+..
code-block:: python
+
+    # Cleanup the database
+    exp.stop(standard_db)
+    logger.info(exp.summary())
+
+When you run the experiment, the following output will appear::
+
+    test
+
+======================
+Colocated Orchestrator
+======================
+During colocated deployment, the application and database are deployed on the same
+compute node. In this deployment, the database is *not* connected together in a
+cluster, and each shard of the database is addressed individually by the processes
+running on that compute host.
 
 .. |colo-orc| image:: images/co-located-orc-diagram.png
    :width: 700
@@ -71,84 +297,215 @@
 This method is deemed ``locality based inference`` since data is local to each
 process and the ``Orchestrator`` is deployed locally on each compute host where
 the distributed application is running.
 
-To create a co-located model, first, create a ``Model`` instance and then call
-the ``Model.colocated_db`` function.
-
-.. currentmodule:: smartsim.entity.model
-
-.. automethod:: Model.colocate_db
-   :noindex:
-
-Here is an example of creating a simple model that is co-located with an
-``Orchestrator`` deployment
-
+Example
+-------
+This example demonstrates how to automate the deployment of
+a colocated Orchestrator within an Experiment.
+
+The example consists of two script files:
+
+* The Application Script
+* The Experiment Driver Script
+
+**The Application Script Overview:**
+The example application script is a Python file that contains
+instructions to create and connect a SmartRedis
+client to the colocated Orchestrator.
+Since a colocated Orchestrator is launched when the Model
+is started by the experiment, you may only connect
+a SmartRedis client to a colocated database from within
+the associated colocated Model script.
+
+**The Application Script Contents:**
+
+1. Connecting a SmartRedis client within the application to send and retrieve a tensor
+   from the colocated database.
+
+**The Experiment Driver Script Overview:**
+The experiment driver script launches and manages
+the example entities with the ``Experiment`` API.
+In the driver script, we use the ``Experiment``
+to create and launch a colocated ``Model`` instance that
+launches a colocated Orchestrator and runs the application
+script.
+
+**The Experiment Driver Script Contents:**
+
+1. Launching the application script with a colocated database.
+
+The Application Script
+----------------------
+A SmartRedis client connects to and interacts with
+a launched Orchestrator.
+In this section, we write an application script
+that we will use as an executable argument
+for the colocated Model. We demonstrate
+how to connect a SmartRedis
+client to the active colocated database.
+Using the created client, we send a tensor
+to the database, then retrieve it.
+
+.. note::
+    You must run the Python driver script to launch the Orchestrator within the
+    application script. Otherwise, there will be no database to connect the
+    client to.
+
+To begin writing the application script, provide the imports:
+
+.. code-block:: python
 
-   from smartsim import Experiment
-   exp = Experiment("colo-test", launcher="auto")
+    from smartredis import ConfigOptions, Client, log_data
+    from smartredis import *
+    import numpy as np
 
-   colo_settings = exp.create_run_settings(exe="./some_mpi_app")
+Initialize the Client
+^^^^^^^^^^^^^^^^^^^^^
+To establish a connection with the colocated database,
+initialize a new SmartRedis client and specify `cluster=False`,
+since our database is single-sharded:
+
+.. code-block:: python
 
-   colo_model = exp.create_model("colocated_model", colo_settings)
-   colo_model.colocate_db(
-           port=6780,              # database port
-           db_cpus=1,              # cpus given to the database on each node
-           debug=False             # include debug information (will be slower)
-           limit_app_cpus=False,   # don't overscubscribe app with database cpus
-           ifname=network_interface # specify network interface to use (i.e. "ib0")
-   )
-   exp.start(colo_model)
+    # Initialize a Client
+    colo_client = Client(cluster=False)
 
+.. note::
+    Since there is only one database launched in the Experiment
+    (the colocated database), specifying a database address
+    is not required when initializing the client.
+    SmartRedis will handle the connection.
 
-By default, SmartSim will attempt to make sure that the database and the application
-do not fight over resources by taking over the affinity mapping process locally on
-each node. This can be disabled by setting ``limit_app_cpus`` to ``False``.
+Store Data
+^^^^^^^^^^
+Next, using the SmartRedis client instance, we create and store
+a NumPy tensor using ``Client.put_tensor()``:
+
+.. code-block:: python
 
+    # Create NumPy array
+    array_1 = np.array([1, 2, 3, 4])
+    # Store the NumPy tensor
+    colo_client.put_tensor("tensor_1", array_1)
 
-Redis
-=====
+Retrieve Data
+^^^^^^^^^^^^^
+Next, retrieve the tensor using ``Client.get_tensor()``:
+
+.. code-block:: python
 
-.. _Redis: https://github.com/redis/redis
-.. _RedisAI: https://github.com/RedisAI/RedisAI
+    # Retrieve tensor from driver script
+    value_1 = colo_client.get_tensor("tensor_1")
+    # Log tensor
+    colo_client.log_data(LLInfo, f"The colocated db tensor is: {value_1}")
+
+When the Experiment completes, you can find the following log message in `colo_model.out`::
+
+    Default@21-48-01:The colocated db tensor is: [1 2 3 4]
+
+The Experiment Driver Script
+----------------------------
+To run the application, specify a Model workload from
+within the workflow (Experiment).
+Defining workflow stages requires the use of functions associated
+with the ``Experiment`` object.
+In this example, we instantiate an ``Experiment`` object with the name ``getting-started``.
+We set up the SmartSim ``logger`` to output information from the Experiment.
+
+.. code-block:: python
 
-The ``Orchestrator`` is built on `Redis`_. Largely, the job of the ``Orchestrator`` is to
-create a Python reference to a Redis deployment so that users can launch, monitor
-and stop a Redis deployment on workstations and HPC systems.
+    import numpy as np
+    from smartredis import Client
+    from smartsim import Experiment
+    from smartsim.log import get_logger
+    import sys
+
+    exe_ex = sys.executable
+    logger = get_logger("Example Experiment Log")
+    # Initialize the Experiment
+    exp = Experiment("getting-started", launcher="auto")
+
+Initialize a Colocated Model
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In the next stage of the experiment, we
+create and launch a colocated ``Model`` that
+runs the application script with a database
+on the same compute node.
+
+Step 1: Configure
+"""""""""""""""""
+In this experiment, we invoke the Python interpreter to run
+the Python script defined in section: The Application Script.
+To configure this into a ``Model``, we use the ``Experiment.create_run_settings()`` function.
+The function returns a ``RunSettings`` object.
+A ``RunSettings`` object allows you to configure
+the run settings of a SmartSim entity.
+We initialize a RunSettings object and
+specify the path to the application file,
+`application_script.py`, as the
+``exe_args`` argument, and the run command as ``exe``.
+
+.. note::
+    Change the `exe_args` argument to the path of the application script
+    on your file system to run the example.
+
+Use the ``RunSettings`` helper functions to
+configure the distribution of computational tasks (``RunSettings.set_nodes()``). In this
+example, we specify to SmartSim that we intend the Model to run on a single compute node.
-Redis was chosen for the Orchestrator because it resides in-memory, can be distributed on-node
-as well as across nodes, and provides low-latency data access to many clients in parallel. The
-Redis ecosystem was a primary driver as the Redis module system provides APIs for languages,
-libraries, and techniques used in Data Science. In particular, the ``Orchestrator``
-relies on `RedisAI`_ to provide access to Machine Learning runtimes.
+.. code-block:: python
 
-At its core, Redis is a key-value store. This means that put/get semantics are used to send
-messages to and from the database. SmartRedis clients use a specific hashing algorithm, CRC16, to ensure
-that data is evenly distributed amongst all database nodes. Notably, a user is not required to
-know where (which database node) data or Datasets (see Dataset API) are stored as the
-SmartRedis clients will infer their location for the user.
+    # Initialize a RunSettings object
+    model_settings = exp.create_run_settings(exe=exe_ex, exe_args="/lus/scratch/richaama/clustered_model.py")
+    # Configure RunSettings object
+    model_settings.set_nodes(1)
 
+Step 2: Initialize
+""""""""""""""""""
+Next, create a ``Model`` instance using ``Experiment.create_model()``.
+Pass the ``model_settings`` object as an argument
+to the ``create_model()`` function and assign it to the variable ``model``.
+
+.. code-block:: python
 
-KeyDB
-=====
 
+    # Initialize a SmartSim Model
+    model = exp.create_model("colo_model", model_settings)
 
-.. _KeyDB: https://github.com/EQ-Alpha/KeyDB
 
+Step 3: Colocate
+""""""""""""""""
+To colocate the model, use the ``Model.colocate_db_uds()`` function.
+This function will colocate an Orchestrator instance with this Model over
+a Unix domain socket connection.
+
+.. code-block:: python
 
-`KeyDB`_ is a multi-threaded fork of Redis that can be swapped in as the database for
-the ``Orchestrator`` in SmartSim. KeyDB can be swapped in for Redis by setting
-the ``REDIS_PATH`` environment variable to point to the ``keydb-server`` binary.
 
+    # Colocate the Model
+    model.colocate_db_uds()
 
-A full example of configuring KeyDB to run in SmartSim is shown below
+Step 4: Start
+"""""""""""""
+Next, launch the colocated model instance using the ``Experiment.start()`` function.
+
+.. code-block:: python
 
-.. code-block:: bash
 
+    # Launch the colocated Model
+    exp.start(model, block=True, summary=True)
 
-   # build KeyDB
-   # see https://github.com/EQ-Alpha/KeyDB
 
-   # get KeyDB configuration file
-   wget https://github.com/CrayLabs/SmartSim/blob/d3d252b611c9ce9d9429ba6eeb71c15471a78f08/smartsim/_core/config/keydb.conf
+Cleanup Experiment
+^^^^^^^^^^^^^^^^^^
 
-   export REDIS_PATH=/path/to/keydb-server
-   export REDIS_CONF=/path/to/keydb.conf
+.. code-block:: python
 
-   # run smartsim workload
+    logger.info(exp.summary())
+
+When you run the experiment, the following output will appear::
+
+    |    | Name   | Entity-Type   | JobID     | RunID   | Time    | Status    | Returncode   |
+    |----|--------|---------------|-----------|---------|---------|-----------|--------------|
+    | 0  | model  | Model         | 1592652.0 | 0       | 10.1039 | Completed | 0            |
+
+======================
+Multiple Orchestrators
+======================
+SmartSim supports automating the deployment of multiple Orchestrators
+from within an Experiment. Communication with a specific database via a
+SmartRedis Client is made possible by the `db_identifier` argument, which is
+required when initializing an Orchestrator or
+colocated Model during a multi-database Experiment. When initializing a SmartRedis
+client during the Experiment, first create a ``ConfigOptions`` object
+with the `db_identifier` used when the database was created, then pass the
+object to the Client() init call.
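To make the `db_identifier` idea concrete, here is a minimal, standalone sketch: each identifier scopes the lookup of that database's connection settings, which is conceptually what ``ConfigOptions.create_from_environment()`` does for a SmartRedis client. The environment-variable names and the `address_for` helper below are illustrative stand-ins, not the exact scheme SmartRedis uses.

```python
import os

# Illustrative stand-in settings; in a real Experiment, SmartSim exports
# the connection information for each launched database automatically.
os.environ["EXAMPLE_COLO_DB_ADDRESS"] = "127.0.0.1:6780"
os.environ["EXAMPLE_STANDARD_DB_ADDRESS"] = "127.0.0.1:6379"

def address_for(db_identifier: str) -> str:
    """Resolve the address registered under a given database identifier."""
    return os.environ[f"EXAMPLE_{db_identifier.upper()}_ADDRESS"]

# Each client is tied to exactly one database via its identifier
print(address_for("colo_db"))      # 127.0.0.1:6780
print(address_for("standard_db"))  # 127.0.0.1:6379
```

The point of the indirection is that application code never hard-codes an address; it only names the database it wants, and the environment set up by the driver script supplies the rest.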
\ No newline at end of file From fd6b3be1273fe4a7ac7b8fa1df5f753a9ccb6ae8 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Wed, 13 Dec 2023 13:09:57 -0600 Subject: [PATCH 06/26] pushing changes --- doc/orch_hold_file.rst | 365 ----------------------------------------- 1 file changed, 365 deletions(-) delete mode 100644 doc/orch_hold_file.rst diff --git a/doc/orch_hold_file.rst b/doc/orch_hold_file.rst deleted file mode 100644 index 1e6505b8d..000000000 --- a/doc/orch_hold_file.rst +++ /dev/null @@ -1,365 +0,0 @@ -************ -Orchestrator -************ - -======== -Overview -======== -The ``Orchestrator`` is an in-memory database that is launched prior to all other -entities within an ``Experiment``. The ``Orchestrator`` can be used to store and retrieve -data during the course of an experiment and across multiple entities. In order to -stream data into or receive data from the ``Orchestrator``, one of the SmartSim clients -(SmartRedis) has to be used within a Model. - -.. |orchestrator| image:: images/Orchestrator.png - :width: 700 - :alt: Alternative text - -|orchestrator| - -Combined with the SmartRedis clients, the ``Orchestrator`` is capable of hosting and executing -AI models written in Python on CPU or GPU. The ``Orchestrator`` supports models written with -TensorFlow, Pytorch, TensorFlow-Lite, or models saved in an ONNX format (e.g. sci-kit learn). - -====================== -Clustered Orchestrator -====================== -The ``Orchestrator`` supports single node and distributed memory settings. This means -that a single compute host can be used for the database or multiple by specifying -``db_nodes`` to be greater than 1. - -.. |cluster-orc| image:: images/clustered-orc-diagram.png - :width: 700 - :alt: Alternative text - -|cluster-orc| - - -With a clustered ``Orchestrator``, multiple compute hosts memory can be used together -to store data. 
As well, the CPU or GPU(s) where the ``Orchestrator`` is running can -be used to execute the AI models, and Torchscript code on data stored within it. - -Users do not need to know how the data is stored in a clustered configuration and -can address the cluster with the SmartRedis clients like a single block of memory -using simple put/get semantics in SmartRedis. SmartRedis will ensure that data -is evenly distributed amoungst all nodes in the cluster. - -The cluster deployment is optimal for high data throughput scenarios such as -online analysis, training and processing. - -Example -------- -This example provides a demonstration on automating the deployment of -a standard Orchestrator, connecting a SmartRedis Client from -within the - -The Application Script ----------------------- - -To begin writing the application script, import the necessary packages: -.. code-block:: python - - from smartredis import Client, log_data - from smartredis import * - import numpy as np - -Initialize the Client -^^^^^^^^^^^^^^^^^^^^^ -To establish a connection with the standard database, -we need to initialize a new SmartRedis client. -Since the standard database we launch in the driver script -multi-sharded, specify `cluster` as `True`: - -.. code-block:: python - - # Initialize a Client - standard_db_client = Client(cluster=True) - -Retrieve Data -^^^^^^^^^^^^^ -To confirm a successful connection to the database, we will retrieve the tensor -that we store in the python driver script. -Use the ``Client.get_tensor()`` method to -retrieve the tensor by specifying the name `tensor_1` we -used during ``Client.put_tensor()`` in the driver script: -.. 
code-block:: python - - # Retrieve tensor from driver script - value_1 = standard_db_client.get_tensor("tensor_1") - # Log tensor - standard_db_client.log_data(LLInfo, f"The single sharded db tensor is: {value_1}") - -Later, when you run the experiment driver script the following output will appear in ``model.out`` -located in ``getting-started-multidb/tutorial_model/``:: - Default@17-11-48:The single sharded db tensor is: [1 2 3 4] - -Store Data -^^^^^^^^^^ -Next, create a NumPy tensor to send to the standard database to retrieve -in the driver script by using ``Client.put_tensor(name, data)``: -.. code-block:: python - - # Create NumPy array - array_2 = np.array([5, 6, 7, 8]) - # Use SmartRedis client to place tensor in single sharded db - standard_db_client.put_tensor("tensor_2", array_2) - -The Experiment Driver Script ----------------------------- -To run the previous application, we must define workflow stages within a workload. -Defining workflow stages requires the utilization of functions associated -with the ``Experiment`` object. The Experiment object is intended to be instantiated -once and utilized throughout the workflow runtime. -In this example, we instantiate an ``Experiment`` object with the name ``getting-started-multidb``. -We setup the SmartSim ``logger`` to output information from the Experiment. - -.. code-block:: python - - import numpy as np - from smartredis import Client - from smartsim import Experiment - from smartsim.log import get_logger - import sys - - exe_ex = sys.executable - logger = get_logger("Example Experiment Log") - # Initialize the Experiment - exp = Experiment("tester", launcher="auto") - -Launch Standard Orchestrator -^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -In the context of this ``Experiment``, it's essential to create and launch -the databases as a preliminary step before any other components since -the application script requests tensors from the launched databases. 
- -We aim to showcase the multi-database automation capabilities of SmartSim, so we -create two databases in the workflow: a single-sharded database and a -multi-sharded database. -Step 1: Initialize Orchestrator -""""""""""""""""""""""""""""""" -To create an database, utilize the ``Experiment.create_database()`` function. -.. code-block:: python - - # Initialize a multi sharded database - standard_db = exp.create_database(port=6379, db_nodes=3, interface="ib0") - exp.generate(standard_db, overwrite=True) - -Step 2: Start Databases -""""""""""""""""""""""" -Next, to launch the databases, -pass the database instances to ``Experiment.start()``. -.. code-block:: python - - # Launch the multi sharded database - exp.start(standard_db) - -The ``Experiment.start()`` function launches the ``Orchestrators`` for use within the workflow. In other words, the function -deploys the databases on the allocated compute resources. - -Create Client Connections to Orchestrator -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -The SmartRedis ``Client`` object contains functions that manipulate, send, and receive -data within the database. Each database has a single, dedicated SmartRedis ``Client``. -Begin by initializing a SmartRedis ``Client`` object per launched database. - -To create a designated SmartRedis ``Client``, you need to specify the address of the target -running database. You can easily retrieve this address using the ``Orchestrator.get_address()`` function. - -.. code-block:: python - - # Initialize SmartRedis client for multi sharded database - driver_client_standard_db = Client(cluster=True, address=standard_db.get_address()[0]) - -Store Data Using Clients -^^^^^^^^^^^^^^^^^^^^^^^^ -In the application script, we retrieved two NumPy tensors. -To support the apps functionality, we will create two -NumPy arrays in the python driver script and send them to the a database. 
To -accomplish this, we use the ``Client.put_tensor()`` function with the respective -database client instances. -.. code-block:: python - - # Create NumPy array - array_1 = np.array([1, 2, 3, 4]) - # Use single shard db SmartRedis client to place tensor in single sharded db - driver_client_standard_db.put_tensor("tensor_1", array_1) - -Initialize a Model -^^^^^^^^^^^^^^^^^^ -In the next stage of the experiment, we -launch the application script with a co-located database -by configuring and creating -a SmartSim colocated ``Model``. - -Step 1: Configure -""""""""""""""""" -You can specify the run settings of a model. -In this experiment, we invoke the Python interpreter to run -the python script defined in section: :ref:`The Application Script`. -To configure this into a ``Model``, we use the ``Experiment.create_run_settings()`` function. -The function returns a ``RunSettings`` object. -When initializing the RunSettings object, -we specify the path to the application file, -`application_script.py`, for -``exe_args``, and the run command for ``exe``. -.. code-block:: python - - # Initialize a RunSettings object - model_settings = exp.create_run_settings(exe=exe_ex, exe_args="/lus/scratch/richaama/standard_orch_model.py") - model_settings.set_nodes(1) - -Step 2: Initialize -"""""""""""""""""" -Next, create a ``Model`` instance using the ``Experiment.create_model()``. -Pass the ``model_settings`` object as an argument -to the ``create_model()`` function and assign to the variable ``model``. -.. code-block:: python - - # Initialize the Model - model = exp.create_model("model", model_settings) - -Step 3: Start -""""""""""""" -Next, launch the colocated model instance using the ``Experiment.start()`` function. -.. code-block:: python - - # Launch the Model - exp.start(model, block=True, summary=True) - -Retrieve Data Using Clients -^^^^^^^^^^^^^^^^^^^^^^^^^^^ -.. 
code-block:: python - - # Retrieve the tensors placed by the Model - value_2 = driver_client_standard_db.poll_key("tensor_2", 100, 100) - # Validate that the tensor exists - logger.info(f"The tensor is {value_2}") - -Cleanup Experiment -^^^^^^^^^^^^^^^^^^ -.. code-block:: python - - # Cleanup the database - exp.stop(standard_db) - logger.info(exp.summary()) - -How to Run the Example ----------------------- -Source Code ------------ -.. sourcecode:: -====================== -Colocated Orchestrator -====================== -A co-located Orchestrator is a special type of Orchestrator that is deployed on -the same compute hosts an a ``Model`` instance defined by the user. In this -deployment, the database is *not* connected together in a cluster and each -shard of the database is addressed individually by the processes running -on that compute host. - -.. |colo-orc| image:: images/co-located-orc-diagram.png - :width: 700 - :alt: Alternative text - - -|colo-orc| - -This deployment is designed for highly performant online inference scenarios where -a distributed process (likely MPI processes) are performing inference with -data local to each process. - -This method is deemed ``locality based inference`` since data is local to each -process and the ``Orchestrator`` is deployed locally on each compute host where -the distributed application is running. - -Example -------- -The Application Script ----------------------- -.. code-block:: python - - from smartredis import ConfigOptions, Client, log_data - from smartredis import * - import numpy as np - -Initialize the Clients -^^^^^^^^^^^^^^^^^^^^^^ -.. code-block:: python - # Initialize a Client - colo_client = Client(cluster=False) - -Store Data -^^^^^^^^^^ -.. code-block:: python - # Create NumPy array - array_1 = np.array([1, 2, 3, 4]) - # Use SmartRedis client to place tensor in single sharded db - colo_client.put_tensor("tensor_1", array_1) -Retrieve Data -^^^^^^^^^^^^^ -.. 
code-block:: python - # Retrieve tensor from driver script - value_1 = colo_client.get_tensor("tensor_1") - # Log tensor - colo_client.log_data(LLInfo, f"The colocated db tensor is: {value_1}") - -The Experiment Driver Script ----------------------------- -.. code-block:: python - import numpy as np - from smartredis import Client - from smartsim import Experiment - from smartsim.log import get_logger - import sys - - exe_ex = sys.executable - logger = get_logger("Example Experiment Log") - # Initialize the Experiment - exp = Experiment("tester", launcher="auto") - -Initialize a Colocated Model -^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Step 1: Configure -""""""""""""""""" -.. code-block:: python - # Initialize a RunSettings object - model_settings = exp.create_run_settings(exe=exe_ex, exe_args="/lus/scratch/richaama/clustered_model.py") - # Configure RunSettings object - model_settings.set_nodes(1) - -Step 2: Initialize -"""""""""""""""""" -.. code-block:: python - # Initialize a SmartSim Model - model = exp.create_model("colo_model", model_settings) -Step 2: Colocate -"""""""""""""""" -.. code-block:: python - # Colocate the Model - model.colocate_db_tcp() - -Step 3: Start -""""""""""""" -.. code-block:: python - # Launch the colocated Model - exp.start(model, block=True, summary=True) -Cleanup Experiment -^^^^^^^^^^^^^^^^^^ -.. 
code-block:: python - - logger.info(exp.summary()) - -How to Run the Example ----------------------- -Source Code ------------ - -====================== -Multiple Orchestrators -====================== - -Example -------- - -Source Code ------------ \ No newline at end of file From 66a4cd5e5d626cf9362256172ffe411213848721 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Wed, 13 Dec 2023 13:17:14 -0600 Subject: [PATCH 07/26] pushing changes to orch --- doc/orchestrator.rst | 478 ++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 477 insertions(+), 1 deletion(-) diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index 92f17a3b9..77a8d52f3 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -277,6 +277,9 @@ When you run the experiment, the following output will appear:: ====================== Colocated Orchestrator ====================== +-------- +Overview +-------- During colocated deployment, the application and database are deployed on the same compute node.In this deployment, the database is *not* connected together in a cluster and each shard of the database is addressed individually by the processes @@ -297,6 +300,7 @@ This method is deemed ``locality based inference`` since data is local to each process and the ``Orchestrator`` is deployed locally on each compute host where the distributed application is running. +------- Example ------- This example demonstrates using SmartSim functions and classes to @@ -508,4 +512,476 @@ Communication with the database via a SmartRedis Client is possible from the colocated Model during a multi database Experiment. When initializing a SmartRedis client during the Experiment, first create a ``ConfigOptions`` object with the `db_identifier` argument created during before passing object to the Client() -init call. \ No newline at end of file +init call. 
+
+Multiple Orchestrator Example
+=============================
+SmartSim offers functionality to automate the deployment of multiple
+databases, supporting workloads that require multiple
+``Orchestrators`` for an ``Experiment``. For instance, a workload may consist of a
+simulation with high inference performance demands (necessitating a co-located deployment),
+along with an analysis and
+visualization workflow connected to the simulation (requiring a standard orchestrator).
+In the following example, we simulate a simple version of this use case.
+
+The example consists of two script files:
+
+* The :ref:`Application Script`
+* The :ref:`Experiment Driver Script`
+
+**The Application Script Overview:**
+In this example, the application script is a Python file that
+contains instructions to complete computational
+tasks. Applications are not limited to Python
+and can also be written in C, C++ and Fortran.
+This script specifies creating a Python SmartRedis client for each
+standard orchestrator and a colocated orchestrator. We use the
+clients to request data from both standard databases, then
+transfer the data to the colocated database. The application
+file is launched by the experiment driver script
+through a ``Model`` stage.
+
+**The Application Script Contents:**
+
+1. Connecting SmartRedis clients within the application to retrieve tensors
+   from the standard databases to store in a colocated database. Details in section:
+   :ref:`Initialize the Clients`.
+
+**The Experiment Driver Script Overview:**
+The experiment driver script holds the stages of the workflow
+and manages their execution through the ``Experiment`` API.
+We initialize an Experiment
+at the beginning of the Python file and use the ``Experiment`` to
+iteratively create, configure and launch computational kernels
+on the system through the `slurm` launcher.
+In the driver script, we use the ``Experiment`` to create and launch a ``Model`` instance that
+runs the application.
+ +**The Experiment Driver Script Contents:** + +1. Launching two standard Orchestrators with unique identifiers. Details in section: + :ref:`Launch Multiple Orchestrators`. +2. Launching the application script with a co-located database. Details in section: + :ref:`Initialize a Colocated Model`. +3. Connecting SmartRedis clients within the driver script to send tensors to standard Orchestrators + for retrieval within the application. Details in section: + :ref:`Create Client Connections to Orchestrators`. + +Setup and run instructions can be found :ref:`here` + +The Application Script +---------------------- +Applications interact with the databases +through a SmartRedis client. +In this section, we write an application script +to demonstrate how to connect SmartRedis +clients in the context of multiple +launched databases. Using the clients, we retrieve tensors +from two databases launched in the driver script, then store +the tensors in the colocated database. + +.. note:: + The Experiment must be started to use the Orchestrators within the + application script. Otherwise, it will fail to connect. + Find the instructions on how to launch :ref:`here` + +To begin, import the necessary packages: + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py + :language: python + :linenos: + :lines: 1-3 + +Initialize the Clients +^^^^^^^^^^^^^^^^^^^^^^ +To establish a connection with each database, +we need to initialize a new SmartRedis client for each +``Orchestrator``. + +Step 1: Initialize ConfigOptions +"""""""""""""""""""""""""""""""" +Since we are launching multiple databases within the experiment, +the SmartRedis ``ConfigOptions`` object is required when initializing +a client in the application. +We use the ``ConfigOptions.create_from_environment()`` +function to create three instances of ``ConfigOptions``, +with one instance associated with each launched ``Orchestrator``. 
+Most importantly, to associate each launched Orchestrator to a ConfigOptions object, +the ``create_from_environment()`` function requires specifying the unique database identifier +argument named `db_identifier`. + +For the single-sharded database: + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py + :language: python + :linenos: + :lines: 5-6 + +For the multi-sharded database: + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py + :language: python + :linenos: + :lines: 10-11 + +For the colocated database: + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py + :language: python + :linenos: + :lines: 15-16 + +Step 2: Initialize the Client Connections +""""""""""""""""""""""""""""""""""""""""" +Now that we have three ``ConfigOptions`` objects, we have the +tools necessary to initialize three SmartRedis clients and +establish a connection with the three databases. +We use the SmartRedis ``Client`` API to create the client instances by passing in +the ``ConfigOptions`` objects and assigning a `logger_name` argument. + +Single-sharded database: + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py + :language: python + :linenos: + :lines: 7-8 + +Multi-sharded database: + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py + :language: python + :linenos: + :lines: 12-13 + +Colocated database: + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py + :language: python + :linenos: + :lines: 17-18 + +Retrieve Data and Store Using SmartRedis Client Objects +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +To confirm a successful connection to each database, we will retrieve the tensors +that we plan to store in the python driver script. After retrieving, we +store both tensors in the colocated database. 
+The ``Client.get_tensor()`` method allows +retrieval of a tensor. It requires the `name` of the tensor assigned +when sent to the database via ``Client.put_tensor()``. + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py + :language: python + :linenos: + :lines: 20-26 + +Later, when you run the experiment driver script the following output will appear in ``tutorial_model.out`` +located in ``getting-started-multidb/tutorial_model/``:: + + Model: single shard logger@00-00-00:The single sharded db tensor is: [1 2 3 4] + Model: multi shard logger@00-00-00:The multi sharded db tensor is: [5 6 7 8] + +This output showcases that we have established a connection with multiple Orchestrators. + +Next, take the tensors retrieved from the standard deployment databases and +store them in the colocated database using ``Client.put_tensor(name, data)``. + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py + :language: python + :linenos: + :lines: 28-30 + +Next, check if the tensors exist in the colocated database using ``Client.poll_tensor()``. +This function queries for data in the database. The function requires the tensor name (`name`), +how many milliseconds to wait in between queries (`poll_frequency_ms`), +and the total number of times to query (`num_tries`): + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py + :language: python + :linenos: + :lines: 32-37 + +The output will be as follows:: + + Model: colo logger@00-00-00:The colocated db has tensor_1: True + Model: colo logger@00-00-00:The colocated db has tensor_2: True + +The Experiment Driver Script +---------------------------- +To run the previous application, we must define workflow stages within a workload. +Defining workflow stages requires the utilization of functions associated +with the ``Experiment`` object. 
The Experiment object is intended to be instantiated
+once and utilized throughout the workflow runtime.
+In this example, we instantiate an ``Experiment`` object with the name ``getting-started-multidb``.
+We set up the SmartSim ``logger`` to output information from the Experiment.
+
+.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
+  :language: python
+  :linenos:
+  :lines: 1-10
+
+Launch Multiple Orchestrators
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+In the context of this ``Experiment``, it's essential to create and launch
+the databases as a preliminary step before any other components since
+the application script requests tensors from the launched databases.
+
+We aim to showcase the multi-database automation capabilities of SmartSim, so we
+create two databases in the workflow: a single-sharded database and a
+multi-sharded database.
+
+Step 1: Initialize Orchestrators
+""""""""""""""""""""""""""""""""
+To create a database, utilize the ``Experiment.create_database()`` function.
+The function requires specifying a unique
+database identifier argument named `db_identifier` to launch multiple databases.
+This step is necessary to connect to databases outside of the driver script.
+We will use the `db_identifier` names we specified in the application script.
+
+For the single-sharded database:
+
+.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
+  :language: python
+  :linenos:
+  :lines: 12-14
+
+For the multi-sharded database:
+
+.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
+  :language: python
+  :linenos:
+  :lines: 16-18
+
+.. note::
+  Calling ``exp.generate()`` will create two subfolders
+  (one for each Orchestrator created in the previous step)
+  whose names are based on the db_identifier of that Orchestrator.
+  In this example, the Experiment folder is
+  named ``getting-started-multidb/``. Within this folder, two Orchestrator subfolders will
+  be created, namely ``single_shard_db_identifier/`` and ``multi_shard_db_identifier/``.
+
+Step 2: Start Databases
+"""""""""""""""""""""""
+Next, to launch the databases,
+pass the database instances to ``Experiment.start()``.
+
+.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
+  :language: python
+  :linenos:
+  :lines: 20-21
+
+The ``Experiment.start()`` function launches the ``Orchestrators`` for use within the workflow. In other words, the function
+deploys the databases on the allocated compute resources.
+
+.. note::
+  By setting `summary=True`, SmartSim will print a summary of the
+  experiment before it is launched. After printing the experiment summary,
+  the experiment is paused for 10 seconds, giving the user time to
+  briefly scan the summary contents. If we set `summary=False`, then the experiment
+  would be launched immediately with no summary.
+
+Create Client Connections to Orchestrators
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+The SmartRedis ``Client`` object contains functions that manipulate, send, and receive
+data within the database. Each database has a single, dedicated SmartRedis ``Client``.
+Begin by initializing a SmartRedis ``Client`` object per launched database.
+
+To create a designated SmartRedis ``Client``, you need to specify the address of the target
+running database. You can easily retrieve this address using the ``Orchestrator.get_address()`` function.
+
+For the single-sharded database:
+
+.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
+  :language: python
+  :linenos:
+  :lines: 23-24
+
+For the multi-sharded database:
+
+.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
+  :language: python
+  :linenos:
+  :lines: 25-26
+
+Store Data Using Clients
+^^^^^^^^^^^^^^^^^^^^^^^^
+In the application script, we retrieved two NumPy tensors.
+To support the app's functionality, we will create two +NumPy arrays in the Python driver script and send them to the databases. To +accomplish this, we use the ``Client.put_tensor()`` function with the respective +database client instances. + +For the single-sharded database: + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py + :language: python + :linenos: + :lines: 28-31 + +For the multi-sharded database: + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py + :language: python + :linenos: + :lines: 33-36 + +Let's check to make sure the database tensors do not exist in the incorrect databases: + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py + :language: python + :linenos: + :lines: 38-42 + +When you run the experiment, the following output will appear:: + + 00:00:00 system.host.com SmartSim[#####] INFO The multi shard array key exists in the incorrect database: False + 00:00:00 system.host.com SmartSim[#####] INFO The single shard array key exists in the incorrect database: False + +Initialize a Colocated Model +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +In the next stage of the experiment, we +launch the application script with a co-located database +by configuring and creating +a SmartSim colocated ``Model``. + +Step 1: Configure +""""""""""""""""" +You can specify the run settings of a model. +In this experiment, we invoke the Python interpreter to run +the Python script defined in section: :ref:`The Application Script`. +To configure this into a ``Model``, we use the ``Experiment.create_run_settings()`` function. +The function returns a ``RunSettings`` object. +When initializing the RunSettings object, +we specify the path to the application file, +`application_script.py`, for +``exe_args``, and the run command for ``exe``. + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py + :language: python + :linenos: + :lines: 44-45 + +..
note:: + You will have to change the `exe_args` argument to the path of the application script + on your machine to run the example. + +With the ``RunSettings`` instance, +configure the distribution of computational tasks (``RunSettings.set_nodes()``) and the number of instances +the script is executed on each node (``RunSettings.set_tasks_per_node()``). In this +example, we specify to SmartSim that we intend to execute the script once on a single node. + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py + :language: python + :linenos: + :lines: 46-48 + +Step 2: Initialize +"""""""""""""""""" +Next, create a ``Model`` instance using ``Experiment.create_model()``. +Pass the ``model_settings`` object as an argument +to the ``create_model()`` function and assign it to the variable ``model``. + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py + :language: python + :linenos: + :lines: 49-50 + +Step 3: Colocate +"""""""""""""""" +To colocate the model, use the ``Model.colocate_db_uds()`` function to +colocate an Orchestrator instance with this Model over +a Unix domain socket connection. + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py + :language: python + :linenos: + :lines: 51-52 + +This method will initialize settings which add an unsharded +database to this Model instance. Only this Model will be able +to communicate with this colocated database, by using the Unix domain socket. + +Step 4: Start +""""""""""""" +Next, launch the colocated model instance using the ``Experiment.start()`` function. + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py + :language: python + :linenos: + :lines: 53-54 + +.. note:: + We set `block=True`, + so that ``Experiment.start()`` waits until the last Model has finished + before returning: it will act like a job monitor, letting us know + if processes run, complete, or fail.
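The configure, initialize, colocate, and start sequence described above can be sketched with plain-Python stand-ins. The classes and method names below are hypothetical mocks that only mirror the call order; they are not the SmartSim API:

```python
# Minimal stand-in sketch of the driver-script sequence:
# configure run settings, create a model, colocate a database, start.
# These classes are illustrative mocks, not SmartSim itself.

class MockRunSettings:
    def __init__(self, exe, exe_args):
        self.exe = exe
        self.exe_args = exe_args        # path to the application script
        self.nodes = None
        self.tasks_per_node = None

    def set_nodes(self, n):
        self.nodes = n

    def set_tasks_per_node(self, n):
        self.tasks_per_node = n


class MockModel:
    def __init__(self, name, run_settings):
        self.name = name
        self.run_settings = run_settings
        self.colocated = False

    def colocate_db_uds(self):
        # Record that an unsharded database should share this model's node
        self.colocated = True


# The exe_args path below is illustrative only
settings = MockRunSettings(exe="python", exe_args="/path/to/application_script.py")
settings.set_nodes(1)           # run on a single node
settings.set_tasks_per_node(1)  # execute the script once per node

model = MockModel("colo_model", settings)
model.colocate_db_uds()
print(model.colocated)  # True once colocation settings are attached
```

The point of the sketch is only the ordering: run settings are fixed before the model is created, and colocation is attached before the model is started.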
+ +Cleanup Experiment +^^^^^^^^^^^^^^^^^^ +Finally, use the ``Experiment.stop()`` function to stop the database instances. Print the +workflow summary with ``Experiment.summary()``. + +.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py + :language: python + :linenos: + :lines: 56-59 + +When you run the experiment, the following output will appear:: + + 00:00:00 system.host.com SmartSim[#####]INFO + | | Name | Entity-Type | JobID | RunID | Time | Status | Returncode | + |----|------------------------------|---------------|-------------|---------|---------|-----------|--------------| + | 0 | colo_model | Model | 1556529.5 | 0 | 1.7437 | Completed | 0 | + | 1 | single_shard_db_identifier_0 | DBNode | 1556529.3 | 0 | 68.8732 | Cancelled | 0 | + | 2 | multi_shard_db_identifier_0 | DBNode | 1556529.4+2 | 0 | 45.5139 | Cancelled | 0 | + +How to Run the Example +---------------------- +Below are the steps to run the experiment. Find the +:ref:`experiment source code` +and :ref:`application source code` +below in the respective subsections. + +.. note:: + The example assumes that you have already installed and built + SmartSim and SmartRedis. Please refer to Section :ref:`Basic Installation` + for further details. For simplicity, we assume that you are + running on a SLURM-based HPC platform. Refer to the steps below + for more details. + +Step 1 : Set up your directory tree + Your directory tree should look similar to below:: + + SmartSim/ + SmartRedis/ + Multi-db-example/ + application_script.py + experiment_script.py + + You can find the application and experiment source code in subsections below. + +Step 2 : Install and Build SmartSim + This example assumes you have installed SmartSim and SmartRedis in your + Python environment. We also assume that you have built SmartSim with + the necessary modules for the machine you are running on.
+ +Step 3 : Change the `exe_args` file path + When configuring the colocated model in `experiment_script.py`, + we pass the file path of `application_script.py` to the `exe_args` argument + on line 33 in :ref:`experiment_script.py`. + Edit this argument to the file path of your `application_script.py` + +Step 4 : Run the Experiment + Finally, run the experiment with ``python experiment_script.py``. + + +Application Source Code +^^^^^^^^^^^^^^^^^^^^^^^ +.. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py + :language: python + :linenos: + +Experiment Source Code +^^^^^^^^^^^^^^^^^^^^^^ +.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py + :language: python + :linenos: \ No newline at end of file From 2c527fd72b086a1131a8563818ec6f79c3eb5372 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Mon, 18 Dec 2023 22:27:32 -0600 Subject: [PATCH 08/26] update to fortran --- doc/sr_fortran_walkthrough.rst | 28 ++++++++++++++-------------- 1 file changed, 14 insertions(+), 14 deletions(-) diff --git a/doc/sr_fortran_walkthrough.rst b/doc/sr_fortran_walkthrough.rst index d90b8e986..f01545db8 100644 --- a/doc/sr_fortran_walkthrough.rst +++ b/doc/sr_fortran_walkthrough.rst @@ -7,19 +7,18 @@ Fortran -Below are examples that use the SmartRedis Fortran API to -interact with the RedisAI tensor, model, and script data types. -Additionally, this section demonstrates how to utilize the SmartRedis ``DataSet`` API. - - +In this section, examples are presented using the SmartRedis Fortran +API to interact with the RedisAI tensor, model, and script +data types. Additionally, an example of utilizing the +SmartRedis ``DataSet`` API is also provided. .. note:: The Fortran API examples rely on the ``SSDB`` environment - variable set to the address and port of the Redis database. + variable being set to the address and port of the Redis database. .. 
note:: - The Fortran API examples - connect to a clustered database or clustered SmartSim Orchestrator. + The Fortran API examples are written + to connect to a clustered database or clustered SmartSim Orchestrator. Update the ``Client`` constructor ``cluster`` flag to `.false.` to connect to a single shard (single compute host) database. @@ -39,9 +38,10 @@ the text of the error message emitted within the C++ code. Tensors ======= -The SmartRedis Fortran client communicates between a Fortran -client and the Redis database. In this example, the client sends an array -to the database and then unpacks the data into another Fortran array. +The SmartRedis Fortran client is used to communicate between +a Fortran client and the Redis database. In this example, +the client will be used to send an array to the database +and then unpack the data into another Fortran array. This example will go step-by-step through the program and then present the entirety of the example code at the end. @@ -83,7 +83,7 @@ if using a clustered database or ``.false.`` otherwise. After the SmartRedis client has been initialized, a Fortran array of any dimension and shape -and with a type of either 8, 16, 32, 64-bit +and with a type of either 8, 16, 32, 64 bit ``integer`` or 32 or 64-bit ``real`` can be put into the database using the type-bound procedure ``put_tensor``. @@ -92,7 +92,7 @@ data, the array ``send_array_real_64`` will be filled with random numbers and stored in the database using ``put_tensor``. This subroutine requires the user to specify a string used as the -'key' (here: ``send_array``) to identify the tensor +'key' (here: ``send_array``) identifying the tensor in the database, the array to be stored, and the shape of the array. @@ -405,4 +405,4 @@ Python Pre-Processing: .. 
literalinclude:: ../smartredis/examples/common/mnist_data/data_processing_script.txt :linenos: :language: Python - :lines: 15-20 + :lines: 15-20 \ No newline at end of file From ce593107fce0c4f21006a7d5364be033fdbf2a29 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Tue, 19 Dec 2023 13:27:43 -0600 Subject: [PATCH 09/26] pushing nit picks --- doc/orchestrator.rst | 63 +++++++++++++++++++++++++++----------------- 1 file changed, 39 insertions(+), 24 deletions(-) diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index 77a8d52f3..92cf8d39c 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -5,34 +5,49 @@ Orchestrator ======== Overview ======== -The ``Orchestrator`` is an in-memory database that is launched prior to all other -entities within an ``Experiment``. The ``Orchestrator`` can be used to store and retrieve -data during the course of an experiment and across multiple entities. In order to -stream data into or receive data from the ``Orchestrator``, one of the SmartSim clients +The SmartSim ``Orchestrator`` is an in-memory database that is used to store and retrieve +data during the course of an experiment. Orchestrators can be used to 1) store and retrieve +data across multiple entities or 2) store and retrieve with a single ``Model``. +The two options refer to the two types of database deployments: clustered deployment +and colocated deployment. During clustered deployment, an orchestrator is allocated +its own compute resources. It does not resources with any other SmartSim entity. +Colocated orchestrators share compute resources with a SmartSim ``Model``. +In order to stream data into or receive data from the ``Orchestrator``, one of the SmartSim clients (SmartRedis) has to be used within a Model. +Orchestrators support a wide variety of AI-enabled workloads via ``Model`` objects +that can be instructed to load TF, TF-lite, PT, or ONNX machine learning models, +as well as TensorFlow scripts and functions to the database at runtime. 
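The load-then-execute pattern described above, where the database holds both data and executable models so that evaluation happens next to the data, can be sketched with plain Python. The dictionary and function below are illustrative stand-ins, not RedisAI or the SmartSim API:

```python
# Toy sketch of the feature-store idea: the store holds both numerical
# data and executable "models", so inference runs where the data lives.
# All names here are illustrative stand-ins.
store = {}

# An application stores a tensor and a "model" (here, a plain function
# standing in for a TorchScript or ONNX model loaded at runtime)
store["input"] = [1.0, 2.0, 3.0]
store["double_model"] = lambda xs: [2.0 * x for x in xs]

# Any connected client can later execute the stored model on stored data
result = store["double_model"](store["input"])
print(result)  # [2.0, 4.0, 6.0]
```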
+ +Below is a diagram demonstrating an orchestrator as a general feature store +capable of storing numerical data (tensors and datasets), AI models, and scripts (TorchScript). +Combined with the SmartRedis clients, the ``Orchestrator`` is capable of hosting and executing +AI models written in Python on CPU or GPU. The ``Orchestrator`` supports models written with +TensorFlow, Pytorch, TensorFlow-Lite, or models saved in an ONNX format (e.g. sci-kit learn). + .. |orchestrator| image:: images/Orchestrator.png :width: 700 :alt: Alternative text |orchestrator| -Combined with the SmartRedis clients, the ``Orchestrator`` is capable of hosting and executing -AI models written in Python on CPU or GPU. The ``Orchestrator`` supports models written with -TensorFlow, Pytorch, TensorFlow-Lite, or models saved in an ONNX format (e.g. sci-kit learn). - ====================== Clustered Orchestrator ====================== -------- Overview -------- -A clustered Orchestrator is a type of deployment where the application and database -are launched on separate compute nodes. A clustered Orchestrator may be single-sharded -(allocated one database node) or multi-sharded (allocated multiple database nodes). -When initializing an ``Orchestrator`` within an Experiment, you may set -the argument `db_nodes` to be 1 or greater than 2. This parameter controls the number -of database nodes your in-memory database will span across. +A clustered Orchestrator +A clustered Orchestrator is deployed on separate compute resources than +a Model. + +A clustered Orchestrator may be single-sharded +(on a single compute node) or multi-sharded (spread across multiple compute nodes). +When initializing a standalone ``Orchestrator`` using ``Experiment.create_orchestrator()``, set +the init parameter `db_nodes` to be 1 or greater than 2. This parameter controls the number +of nodes the in-memory database spans across. + +After initializing a .. 
|cluster-orc| image:: images/clustered-orc-diagram.png :width: 700 @@ -43,7 +58,7 @@ of database nodes your in-memory database will span across. Clustered Orchestrators support data communication across multiple simulations. Given that a clustered database is standalone, meaning the database compute node is separate from the application compute node, the database node does not tear -down after the finish of a SmartSim Model, unlike a colocated orchestrator. +down after the finish of a SmartSim Model. With standalone database deployment, SmartSim can run AI models, and Torchscript code on the CPU(s) or GPU(s) with existing data in the ``Orchestrator``. Produced data can then requested by another application. @@ -340,7 +355,7 @@ script. 1. Launching the application script with a co-located database. The Application Script ----------------------- +====================== A SmartRedis client connects and interacts with a launched Orchestrator. In this section, we write an application script @@ -364,7 +379,7 @@ To begin writing the application script, provide the imports: import numpy as np Initialize the Clients -^^^^^^^^^^^^^^^^^^^^^^ +---------------------- To establish a connection with the colocated database, initialize a new SmartRedis client and specify `cluster=False` since our database is single-sharded: @@ -380,7 +395,7 @@ since our database is single-sharded: SmartRedis will handle the connection. Store Data -^^^^^^^^^^ +---------- Next, using the SmartRedis client instance, we create and store a NumPy tensor using ``Client.put_tensor()``: .. code-block:: python @@ -391,7 +406,7 @@ a NumPy tensor using ``Client.put_tensor()``: colo_client.put_tensor("tensor_1", array_1) Retrieve Data -^^^^^^^^^^^^^ +------------- Next, retrieve the tensor using ``Client.get_tensor()``: .. 
code-block:: python @@ -404,7 +419,7 @@ When the Experiment completes, you can find the following log message in `colo_m Default@21-48-01:The colocated db tensor is: [1 2 3 4] The Experiment Driver Script ----------------------------- +============================ To run the application, specify a Model workload from within the workflow (Experiment). Defining workflow stages requires the utilization of functions associated @@ -425,7 +440,7 @@ We setup the SmartSim ``logger`` to output information from the Experiment. exp = Experiment("getting-started", launcher="auto") Initialize a Colocated Model -^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +---------------------------- In the next stage of the experiment, we create and launch a colocated ``Model`` that runs the application script with a database @@ -490,7 +505,7 @@ Next, launch the colocated model instance using the ``Experiment.start()`` funct test Cleanup Experiment -^^^^^^^^^^^^^^^^^^ +------------------ .. code-block:: python @@ -526,8 +541,8 @@ In the following example, we simulate a simple version of this use case. 
The example is comprised of two script files: -* The :ref:`Application Script` -* The :ref:`Experiment Driver Script` +* The Application Script +* The Experiment Driver Script **The Application Script Overview:** In this example, the application script is a python file that From e1b80d87409905bbb44dbdde0d65aa3d79f1db61 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Thu, 21 Dec 2023 10:55:32 -0600 Subject: [PATCH 10/26] orch deployment --- doc/orchestrator.rst | 28 ++++++++++------------------ 1 file changed, 10 insertions(+), 18 deletions(-) diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index 92cf8d39c..70924c983 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -1,7 +1,6 @@ ************ Orchestrator ************ - ======== Overview ======== @@ -37,17 +36,13 @@ Clustered Orchestrator -------- Overview -------- -A clustered Orchestrator -A clustered Orchestrator is deployed on separate compute resources than -a Model. - -A clustered Orchestrator may be single-sharded -(on a single compute node) or multi-sharded (spread across multiple compute nodes). -When initializing a standalone ``Orchestrator`` using ``Experiment.create_orchestrator()``, set -the init parameter `db_nodes` to be 1 or greater than 2. This parameter controls the number -of nodes the in-memory database spans across. -After initializing a +In a clustered orchestrator deployment, the database is initiated on a separate compute host +from the model's compute resources. Unlike a colocated orchestrator, a clustered orchestrator avoids +sharing compute resources with the model. It can be configured as either single-sharded or multi-sharded. +Data communication is initiated within the Model through a SmartRedis client. The client establishes a +connection with the database using a specified database address and travels off-node to reach the database +compute node. .. 
|cluster-orc| image:: images/clustered-orc-diagram.png :width: 700 @@ -295,10 +290,9 @@ Colocated Orchestrator -------- Overview -------- -During colocated deployment, the application and database are deployed on the same -compute node.In this deployment, the database is *not* connected together in a -cluster and each shard of the database is addressed individually by the processes -running on that compute host. +In a colocated orchestrator deployment, the database and model coexist on shared compute resources. +The orchestrator is non-clustered and each application compute node hosts an instance of the database. +Processes on the compute host individually address the database. .. |colo-orc| image:: images/co-located-orc-diagram.png :width: 700 @@ -318,8 +312,6 @@ the distributed application is running. ------- Example ------- -This example demonstrates using SmartSim functions and classes to - This example provides a demonstration on automating the deployment of a colocated Orchestrator within an Experiment. @@ -442,7 +434,7 @@ We setup the SmartSim ``logger`` to output information from the Experiment. Initialize a Colocated Model ---------------------------- In the next stage of the experiment, we -create and launch a colocated ``Model`` that +create and launch a colocated ``Model`` that runs the application script with a database on the same compute node. 
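A SmartRedis client reaches a database through an address of the form `host:port`, such as the value carried by the ``SSDB`` environment variable noted in the Fortran examples. A minimal sketch of resolving such an address (the address value below is illustrative):

```python
import os

# Sketch: resolve a database address of the form "host:port", similar to
# what a SmartRedis client receives via the SSDB environment variable.
# The address value here is illustrative, not a real deployment.
os.environ["SSDB"] = "127.0.0.1:6379"

def parse_ssdb(value):
    # Split on the last ":" so the host part is kept intact
    host, _, port = value.rpartition(":")
    return host, int(port)

host, port = parse_ssdb(os.environ["SSDB"])
print(host, port)
```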
From 6f16a71171c794216bf16341e7a6f445fc8eae23 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Tue, 2 Jan 2024 10:21:41 -0600 Subject: [PATCH 11/26] pushing edits --- doc/orchestrator.rst | 108 +++++++++++++++++++++++++++---------------- 1 file changed, 67 insertions(+), 41 deletions(-) diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index 70924c983..13f203a06 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -4,19 +4,19 @@ Orchestrator ======== Overview ======== -The SmartSim ``Orchestrator`` is an in-memory database that is used to store and retrieve -data during the course of an experiment. Orchestrators can be used to 1) store and retrieve -data across multiple entities or 2) store and retrieve with a single ``Model``. -The two options refer to the two types of database deployments: clustered deployment -and colocated deployment. During clustered deployment, an orchestrator is allocated -its own compute resources. It does not resources with any other SmartSim entity. -Colocated orchestrators share compute resources with a SmartSim ``Model``. -In order to stream data into or receive data from the ``Orchestrator``, one of the SmartSim clients -(SmartRedis) has to be used within a Model. - -Orchestrators support a wide variety of AI-enabled workloads via ``Model`` objects -that can be instructed to load TF, TF-lite, PT, or ONNX machine learning models, -as well as TensorFlow scripts and functions to the database at runtime. +The orchestrator is an in-memory database with features built for +AI-enabled workflows including online training, low-latency inference, cross-application data +exchange, online interactive visualization, online data analysis, computational +steering, and more. The ``Orchestrator`` can be thought of as a general +feature store capable of storing numerical data, ML models, and scripts. +The orchestrator is capable of performing inference and script evaluation using data in the feature store. 
+Any SmartSim ``Model`` or ``Ensemble`` model can connect to the +``Orchestrator`` via the :ref:`SmartRedis` +client library to transmit data, execute ML models, and execute scripts. + +SmartSim offers two types of orchestrator deployments: :ref:`clustered deployment` and +:ref:`colocated deployment`. Continue to the respective deployment sections for +information on deployment types. Below is a diagram demonstrating an orchestrator as a general feature store capable of storing numerical data (tensors and datasets), AI models, and scripts (TorchScript). @@ -31,18 +31,24 @@ TensorFlow, Pytorch, TensorFlow-Lite, or models saved in an ONNX format (e.g. sc |orchestrator| ====================== -Clustered Orchestrator +Clustered Deployment ====================== -------- Overview -------- - -In a clustered orchestrator deployment, the database is initiated on a separate compute host -from the model's compute resources. Unlike a colocated orchestrator, a clustered orchestrator avoids -sharing compute resources with the model. It can be configured as either single-sharded or multi-sharded. -Data communication is initiated within the Model through a SmartRedis client. The client establishes a -connection with the database using a specified database address and travels off-node to reach the database -compute node. +During clustered deployment, a SmartSim ``Orchestrator`` (the database) runs on separate +compute node(s) from the model node(s). A clustered orchestrator can be deployed on a single +node or sharded (distributed) over multiple nodes. +With a sharded orchestrator, available hardware for inference and script +evaluation increases and overall memory for data storage increases. + +Communication between a clustered Orchestrator and Model +is initialized in the application script via a SmartRedis client. 
Users do not need to know +how the data is stored in a clustered configuration and +can address the cluster with the SmartRedis clients like a single block of memory +using simple put/get semantics in SmartRedis. The client establishes a +connection using the database address and travels off the model compute node to +the database compute node. .. |cluster-orc| image:: images/clustered-orc-diagram.png :width: 700 :alt: Alternative text |cluster-orc| +A clustered database is optimal for high data throughput scenarios +such as online analysis, training and processing. Clustered Orchestrators support data communication across multiple simulations. -Given that a clustered database is standalone, meaning the database compute node -is separate from the application compute node, the database node does not tear -down after the finish of a SmartSim Model. -With standalone database deployment, SmartSim can run AI models, and Torchscript +With clustered database deployment, SmartSim can run AI models, and Torchscript code on the CPU(s) or GPU(s) with existing data in the ``Orchestrator``. -Produced data can then requested by another application. +Data produced by these processes and stored in the clustered database is available for +consumption by other applications. -Users do not need to know how the data is stored in a clustered configuration and -can address the cluster with the SmartRedis clients like a single block of memory -using simple put/get semantics in SmartRedis. - -The cluster deployment is optimal for high data throughput scenarios such as -online analysis, training and processing. ------- Example ------- -This example provides a demonstration on automating the deployment of -a standard Orchestrator. Once the standard database is started, -we demonstrate connecting a client from within the driver script. +We provide a demonstration on automating the deployment of +a standard Orchestrator from within a Python driver script. Once the standard database is started, +we demonstrate connecting a client from within the application script to transmit and poll data.
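The "single block of memory" put/get semantics can be illustrated with a toy shard router. This is a mock for illustration only: real Redis clusters route keys with CRC16 hash slots, and SmartRedis hides that routing from the user entirely:

```python
import zlib

class MockShardedStore:
    """Toy model of a sharded store: callers use flat put/get keys
    while the store routes each key to a shard internally."""

    def __init__(self, n_shards):
        self.shards = [dict() for _ in range(n_shards)]

    def _shard_for(self, key):
        # Deterministic routing; real Redis uses CRC16 hash slots instead
        return self.shards[zlib.crc32(key.encode()) % len(self.shards)]

    def put_tensor(self, key, value):
        self._shard_for(key)[key] = value

    def get_tensor(self, key):
        return self._shard_for(key)[key]

store = MockShardedStore(n_shards=3)
store.put_tensor("tensor_1", [1, 2, 3, 4])
# The caller never sees which shard holds the key
print(store.get_tensor("tensor_1"))  # [1, 2, 3, 4]
```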
+ +The example is comprised of two script files: + +* The Application Script +* The Experiment Driver Script + +**The Application Script Overview:** +The example application script is a Python file that contains +instructions to create and connect a SmartRedis +client to the standard Orchestrator to transmit and retrieve +tensors. + +**The Application Script Contents:** + +1. Connecting a SmartRedis client within the application to send and retrieve a tensor + from the standard database. + +**The Experiment Driver Script Overview:** +The experiment driver script launches and manages +the example entities with the ``Experiment`` API. +In the driver script, we use the ``Experiment`` +to create and launch a ``Model`` instance +to communicate with a launched standard ``Orchestrator``. + +**The Experiment Driver Script Contents:** + +1. Launching a ``Model`` and ``Orchestrator``. The Application Script ====================== @@ -284,14 +309,15 @@ workflow summary with ``Experiment.summary()``: When you run the experiment, the following output will appear:: test -====================== -Colocated Orchestrator -====================== +==================== +Colocated Deployment +==================== -------- Overview -------- -In a colocated orchestrator deployment, the database and model coexist on shared compute resources. -The orchestrator is non-clustered and each application compute node hosts an instance of the database. +During colocated deployment, a SmartSim ``Orchestrator`` (the database) is launched on +the ``Model`` compute node(s). +The orchestrator is non-clustered and each ``Model`` compute node hosts an instance of the database. Processes on the compute host individually address the database. .. |colo-orc| image:: images/co-located-orc-diagram.png @@ -344,7 +370,7 @@ script. **The Experiment Driver Script Contents:** -1. Launching the application script with a co-located database. +1. Launching the application script with a co-located model. 
The Application Script ====================== From faaecaf9787630f828a1a452e871d3f48a373bd7 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Wed, 3 Jan 2024 16:09:28 -0600 Subject: [PATCH 12/26] clustered notes updated from chris comments --- doc/orchestrator.rst | 182 +++++++++++++++++++++---------------------- 1 file changed, 87 insertions(+), 95 deletions(-) diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index 13f203a06..8e4c53a4e 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -4,24 +4,14 @@ Orchestrator ======== Overview ======== -The orchestrator is an in-memory database with features built for +The ``Orchestrator`` is an in-memory database with features built for AI-enabled workflows including online training, low-latency inference, cross-application data -exchange, online interactive visualization, online data analysis, computational -steering, and more. The ``Orchestrator`` can be thought of as a general -feature store capable of storing numerical data, ML models, and scripts. -The orchestrator is capable of performing inference and script evaluation using data in the feature store. -Any SmartSim ``Model`` or ``Ensemble`` model can connect to the -``Orchestrator`` via the :ref:`SmartRedis` -client library to transmit data, execute ML models, and execute scripts. - -SmartSim offers two types of orchestrator deployments: :ref:`clustered deployment` and -:ref:`colocated deployment`. Continue to the respective deployment sections for -information on deployment types. - -Below is a diagram demonstrating an orchestrator as a general feature store -capable of storing numerical data (tensors and datasets), AI models, and scripts (TorchScript). +exchange, online interactive visualization, online data analysis, computational steering, and more. + +An ``Orchestrator`` can be thought of as a general feature store +capable of storing numerical data (Tensors and Datasets), AI Models, and scripts (TorchScripts). 
Combined with the SmartRedis clients, the ``Orchestrator`` is capable of hosting and executing -AI models written in Python on CPU or GPU. The ``Orchestrator`` supports models written with +AI models written in Python on CPU or GPU. The ``Orchestrator`` supports AI Models written with TensorFlow, Pytorch, TensorFlow-Lite, or models saved in an ONNX format (e.g. sci-kit learn). .. |orchestrator| image:: images/Orchestrator.png @@ -30,6 +20,23 @@ TensorFlow, Pytorch, TensorFlow-Lite, or models saved in an ONNX format (e.g. sc |orchestrator| +SmartSim ``Models`` or ``Ensemble`` models can be instructed to connect to an ``Orchestrator`` +via the :ref:`SmartRedis` client library from within a Python driver script or +an application script. + +SmartSim offers two types of orchestrator deployment: :ref:`clustered deployment` and +:ref:`colocated deployment`. During clustered deployment, the ``Orchestrator`` is launched +on separate compute resources than a ``Model``. Clustered deployment is well-suited for throughput +scenarios. In colocated deployment, an ``Orchestrator`` shares compute resources with a ``Model``. Colocated +deployment is well-suited for inference scenarios. + +SmartSim allows users to launch multiple orchestrators during the course of an experiment of +either deployment type. If a workflow requires a multiple database environment, a +`db_identifier` argument must be specified during database initialization. Users can connect to +orchestrators in a parallel database workflow by specifying the respective `db_identifier` argument +when initializing a SmartRedis client object. The client can then be used to transmit data, +execute ML models, and execute scripts on the linked database. + ====================== Clustered Deployment ====================== @@ -47,8 +54,9 @@ is initialized in the application script via a SmartRedis client. 
Users do not need to know how the data is stored in a clustered configuration and can address the cluster with the SmartRedis clients like a single block of memory using simple put/get semantics in SmartRedis. The client establishes a -connection using the database address and travels off the model compute node to -the database compute node. +connection using the database address detected by SmartSim or provided by the user. In multiple +database experiments, users provide the `db_identifier` used to create the clustered +database when creating a client. .. |cluster-orc| image:: images/clustered-orc-diagram.png :width: 700 @@ -61,45 +69,32 @@ such as online analysis, training and processing. Clustered Orchestrators support data communication across multiple simulations. With clustered database deployment, SmartSim can run AI models, and Torchscript code on the CPU(s) or GPU(s) with existing data in the ``Orchestrator``. -Produced data can then requested by another application. +Data produced by by these processes and stored in the clustered database is available for +consumption by other applications. ------- Example ------- -We provide a demonstration on automating the deployment of -a standard Orchestrator from within a Python driver script. Once the standard database is started, -we demonstrate connecting a client from within the application script to transmit and poll data. +In the following example, we provide a demonstration on automating the deployment of +a clustered Orchestrator using SmartSim from within a Python driver script. Once the standard database is launched, +we demonstrate connecting a client to the database from within the application script to transmit and poll data. 
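Polling means repeatedly checking the database for a key until a producer has written it. A minimal sketch of that retry loop, with a dict standing in for the orchestrator:

```python
import time

def poll_key(store, key, num_tries, delay_s):
    """Return True once `key` appears in `store`, checking up to
    `num_tries` times with `delay_s` seconds between checks."""
    for _ in range(num_tries):
        if key in store:
            return True
        time.sleep(delay_s)
    return False

database = {}                      # stand-in for the orchestrator
database["tensor_1"] = [1, 2, 3]   # a producer writes the tensor
print(poll_key(database, "tensor_1", num_tries=5, delay_s=0.01))  # True
```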
The example comprises two script files:

-* The Application Script
-* The Experiment Driver Script
-
-**The Application Script Overview:**
-The example application script is a Python file that contains
-instructions to create and connect a SmartRedis
-client to the standard Orchestrator to transmit and retrieve
-tensors.
-
-**The Application Script Contents:**
-
-1. Connecting a SmartRedis client within the application to send and retrieve a tensor
-   from the standard database.
-
-**The Experiment Driver Script Overview:**
-The experiment driver script launches and manages
-the example entities with the ``Experiment`` API.
-In the driver script, we use the ``Experiment``
-to create and launch a ``Model`` instance
-to communicate with a launched standard ``Orchestrator``.
-
-**The Experiment Driver Script Contents:**
-
-1. Launching a ``Model`` and ``Orchestrator``.
+- The Application Script
+    The application script is a Python file that contains instructions to create a SmartRedis
+    client connection to the standard Orchestrator launched in the driver script. From within the
+    application script, the client sends and retrieves data.
+- The Experiment Driver Script
+    The experiment driver script launches and manages SmartSim entities. In the driver script, we use the ``Experiment``
+    API to create and launch a standard ``Orchestrator``. We create a client connection and store a tensor for use within
+    the application. We then initialize a ``Model`` object with the
+    application script as an executable argument. Once the database has launched, we launch the ``Model``.
+    We then retrieve the tensors stored by the ``Model`` from within the driver script. Lastly, we tear down the database.

The Application Script
======================
-To begin writing the application script, import the necessary packages:
+To begin writing the application script, import the necessary SmartRedis packages:

.. 
code-block:: python

@@ -109,10 +104,8 @@ To begin writing the application script, import the necessary packages:

Initialize the Client
---------------------
-To establish a connection with the standard database,
-we need to initialize a new SmartRedis client.
-Since the standard database we launch in the driver script
-multi-sharded, we specify the `cluster` as `True`:
+To establish a connection with the standard database, we need to initialize a new SmartRedis client.
+Since the standard database we launch in the driver script is sharded, we specify the `cluster` as `True`:

.. code-block:: python

@@ -121,11 +114,10 @@ multi-sharded, we specify the `cluster` as `True`:

Retrieve Data
-------------
-To confirm a successful connection to the database, we retrieve the tensor
-we store in the Python driver script.
-Use the ``Client.get_tensor()`` method to
-retrieve the tensor by specifying the name `tensor_1` we
+To confirm a successful connection to the database, we retrieve the tensor we stored in the Python driver script.
+Use the ``Client.get_tensor()`` method to retrieve the tensor by specifying the name `tensor_1` we
used during ``Client.put_tensor()`` in the driver script:
+
.. code-block:: python

   # Retrieve tensor from Orchestrator
@@ -142,6 +134,7 @@
Store Data
----------
Next, create a NumPy tensor to send to the standard database using ``Client.put_tensor(name, data)``:
+
.. code-block:: python

   # Create a NumPy array
@@ -153,9 +146,9 @@ We will retrieve `"tensor_2"` in the Python driver script.

The Experiment Driver Script
============================
-To run the previous application, we must define a Model and Orchestrator within an
-experiment. Defining workflow stages requires the utilization of functions associated
-with the ``Experiment`` object. The Experiment object is intended to be instantiated
+To run the previous application script, we define a ``Model`` and ``Orchestrator`` within a
+Python driver script. 
Defining workflow stages (``Model`` and ``Orchestrator``) requires the utilization of functions associated +with the ``Experiment`` object. The ``Experiment`` object is intended to be instantiated once and utilized throughout the workflow runtime. In this example, we instantiate an ``Experiment`` object with the name ``getting-started``. We setup the SmartSim ``logger`` to output information from the Experiment: @@ -176,11 +169,11 @@ We setup the SmartSim ``logger`` to output information from the Experiment: Launch Standard Orchestrator ---------------------------- In the context of this ``Experiment``, it's essential to create and launch -the databases as a preliminary step before any other components since -the application script requests and sends tensors from a launched databases. +the databases as a preliminary step before any other workflow components. This is because +the application script requests and sends tensors to and from a launched database. We aim to demonstrate the standard orchestrator automation capabilities of SmartSim, so we -create a single database in the workflow: a multi-sharded database. +create a clustered database in the workflow. Step 1: Initialize Orchestrator ''''''''''''''''''''''''''''''' @@ -193,8 +186,7 @@ To create a standard database, utilize the ``Experiment.create_database()`` func Step 2: Start Databases ''''''''''''''''''''''' -Next, to launch the database, -pass the database instance to ``Experiment.start()``. +Next, to launch the database, pass the database instance to ``Experiment.start()``. .. code-block:: python # Launch the multi sharded database @@ -203,15 +195,15 @@ pass the database instance to ``Experiment.start()``. The ``Experiment.start()`` function launches the ``Orchestrator`` for use within the workflow. In other words, the function deploys the database on the allocated compute resources. 
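The ordering constraint above (the database must be running before anything that uses it, and is stopped last) can be sketched with hypothetical stand-ins for ``Experiment.start()`` and ``Experiment.stop()``; this is an illustration of the ordering only, not the SmartSim API:

```python
# Hypothetical stand-ins: track which entities are currently running to
# make the launch/teardown ordering explicit.
running = []

def start(entity):
    running.append(entity)

def stop(entity):
    running.remove(entity)

start("orchestrator")      # the database is launched first
assert "orchestrator" in running
start("model")             # the application can now reach the database
stop("model")
stop("orchestrator")       # the database is torn down last
assert running == []
```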
-Create Client Connections to Orchestrator ------------------------------------------ -The SmartRedis ``Client`` object contains functions that manipulate, send, and receive -data within the database. Each database can have a single, dedicated SmartRedis ``Client``. +Create a Client Connection to the Orchestrator +---------------------------------------------- +The SmartRedis ``Client`` object contains functions that manipulate, send, and retrieve +data on the database. Each database can have a single, dedicated SmartRedis ``Client`` connection. Begin by initializing a SmartRedis ``Client`` object for the standard database. -When creating a client connect from within a driver script, -you need to specify the address of the database you would like to connect to. -You can easily retrieve this address using the ``Orchestrator.get_address()`` function: +When creating a client connection from within a driver script, +specify the address of the database you would like to connect to. +You can easily retrieve the database address using the ``Orchestrator.get_address()`` function: .. code-block:: python @@ -220,38 +212,35 @@ You can easily retrieve this address using the ``Orchestrator.get_address()`` fu Store Data Using Clients ------------------------ -In the application script, we retrieved a NumPy tensor. -To support the apps functionality, we create a -NumPy array in the python driver script to send to the a database. To -accomplish this, we use ``Client.put_tensor()``: +In the application script, we retrieved a NumPy tensor stored from within the driver script. +To support the application functionality, we create a +NumPy array in the experiment workflow to send to the database. To +send a tensor to the database, use the function ``Client.put_tensor()``: .. 
code-block:: python # Create NumPy array array_1 = np.array([1, 2, 3, 4]) - # Use multi shard db SmartRedis client to place tensor standard database + # Use the SmartRedis client to place tensor in the standard database driver_client_standard_db.put_tensor("tensor_1", array_1) Initialize a Model ------------------ -In the next stage of the experiment, we -launch the application script by configuring and creating -a SmartSim ``Model``. +In the next stage of the experiment, we execute the application script by configuring and creating +a SmartSim ``Model`` and specifying the application script name during ``Model`` creation. Step 1: Configure ''''''''''''''''' -You can specify the run settings of a model. -In this experiment, we invoke the Python interpreter to run -the python script defined in section: The Application Script. -To configure this into a ``Model``, we use the ``Experiment.create_run_settings()`` function. -The function returns a ``RunSettings`` object. -When initializing the ``RunSettings`` object, -we specify the path to the application file, -`application_script.py`, for -``exe_args``, and the run command for ``exe``. +In the example experiment, we invoke the Python interpreter to run +the python application script defined in section: The Application Script. +We use ``Experiment.create_run_settings()`` to create a configuration object that will define the +operation of a ``Model``. The function returns a ``RunSettings`` object. +When initializing the ``RunSettings`` object, we specify the path to the application file, +`application_script.py`, to ``exe_args``, and the run command to ``exe``. + .. 
code-block:: python # Initialize a RunSettings object - model_settings = exp.create_run_settings(exe=exe_ex, exe_args="/lus/scratch/richaama/standard_orch_model.py") + model_settings = exp.create_run_settings(exe=exe_ex, exe_args="application_script.py") model_settings.set_nodes(1) Step 2: Initialize @@ -259,6 +248,7 @@ Step 2: Initialize Next, create a ``Model`` instance using the ``Experiment.create_model()``. Pass the ``model_settings`` object as an argument to the ``create_model()`` function and assign to the variable ``model``: + .. code-block:: python # Initialize the Model @@ -266,7 +256,8 @@ to the ``create_model()`` function and assign to the variable ``model``: Step 3: Start ''''''''''''' -Next, launch the model instance using the ``Experiment.start()`` function. +Next, launch the model instance using the ``Experiment.start()`` function: + .. code-block:: python # Launch the Model @@ -280,20 +271,21 @@ Next, launch the model instance using the ``Experiment.start()`` function. Poll Data Using Clients ----------------------- -Next, check if the tensor exist in the standard database using ``Client.poll_tensor()``. +Next, check if the tensor exists in the standard database using ``Client.poll_tensor()``. This function queries for data in the database. The function requires the tensor name (`name`), how many milliseconds to wait in between queries (`poll_frequency_ms`), -and the total number of times to query (`num_tries`): +and the total number of times to query (`num_tries`). Check if the data exists in the database by +polling every 100 milliseconds until 10 attempts are completed: + .. 
code-block:: python

   # Poll for the tensor placed by the Model
-   value_2 = driver_client_standard_db.poll_key("tensor_2", 100, 100)
+   value_2 = driver_client_standard_db.poll_tensor("tensor_2", 100, 10)
   # Validate that the tensor exists
   logger.info(f"The tensor exists: {value_2}")

The output will be as follows::
-
-    test
+    noted to run and replace asap

Cleanup Experiment
------------------
@@ -307,7 +299,7 @@ workflow summary with ``Experiment.summary()``:

   logger.info(exp.summary())

When you run the experiment, the following output will appear::
-    test
+    again noted to fill in, oops

====================
Colocated Deployment
====================
--------

From d8bd98b755ccb338426aac2d28db73447582c66b Mon Sep 17 00:00:00 2001
From: Amanda Richardson
Date: Wed, 3 Jan 2024 18:11:53 -0600
Subject: [PATCH 13/26] going through chris comments

---
 doc/orchestrator.rst | 98 +++++++++++++++++++-------------------
 1 file changed, 42 insertions(+), 56 deletions(-)

diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst
index 8e4c53a4e..b469d2c9a 100644
--- a/doc/orchestrator.rst
+++ b/doc/orchestrator.rst
@@ -69,7 +69,7 @@ such as online analysis, training and processing.
Clustered Orchestrators support data communication across multiple simulations.
With clustered database deployment, SmartSim can run AI models, and Torchscript
code on the CPU(s) or GPU(s) with existing data in the ``Orchestrator``.
-Data produced by by these processes and stored in the clustered database is available for
+Data produced by these processes and stored in the clustered database is available for
consumption by other applications.

@@ -312,6 +312,13 @@ the ``Model`` compute node(s). The orchestrator is non-clustered and
each ``Model`` compute node hosts an instance of the database.
Processes on the compute host individually address the database.
+Communication between a colocated Orchestrator and Model
+is initialized in the application script via a SmartRedis client. 
Since a colocated Orchestrator is launched when the Model
+is started by the experiment, you may only connect a SmartRedis client to a colocated database from within
+the associated colocated Model script. The client establishes a connection using the database address detected
+by SmartSim or provided by the user. In multiple database experiments, users provide the `db_identifier` used to create the colocated
+Model when creating a client connection.
+
.. |colo-orc| image:: images/co-located-orc-diagram.png
   :width: 700
   :alt: Alternative text
@@ -319,9 +326,14 @@ Processes on the compute host individually address the database.

|colo-orc|

-This deployment is designed for highly performant online inference scenarios where
+Colocated deployment is designed for highly performant online inference scenarios where
distributed processes (likely MPI processes) are performing inference with
-data local to each process.
+data local to each process. Data produced by these processes and stored in the colocated database
+can be transferred via a SmartRedis client to a standard database to become available for consumption
+by other applications. A tradeoff of colocated deployment is limited ability to scale to large workloads.
+Colocated deployment instead benefits small and medium simulations with low-latency requirements.
+By hosting the database and simulation on the same compute node, communication time is reduced, which
+contributes to quicker processing speeds.

This method is deemed ``locality based inference`` since data is local to
each process and the ``Orchestrator`` is deployed locally on each compute host where
@@ -330,58 +342,26 @@ the distributed application is running.

-------
Example
-------
-This example provides a demonstration on automating the deployment of
-a colocated Orchestrator within an Experiment.
+In the following example, we demonstrate automating the deployment of
+a colocated Orchestrator using SmartSim from within a Python driver script. 
Once the colocated database is launched, +we demonstrate connecting a client to the database from within the application script to transmit and poll data. The example is comprised of two script files: -* The Application Script -* The Experiment Driver Script - -**The Application Script Overview:** -The example application script is a Python file that contains -instructions to create and connect a SmartRedis -client to the colocated Orchestrator. -Since a colocated Orchestrator is launched when the Model -is started by the experiment, you may only connect -a SmartRedis client to a colocated database from within -the associated colocated Model script. - -**The Application Script Contents:** - -1. Connecting a SmartRedis client within the application to send and retrieve a tensor - from the colocated database. - -**The Experiment Driver Script Overview:** -The experiment driver script launches and manages -the example entities with the ``Experiment`` API. -In the driver script, we use the ``Experiment`` -to create and launch a colocated ``Model`` instance -launches a colocated Orchestrator and runs the application -script. - -**The Experiment Driver Script Contents:** - -1. Launching the application script with a co-located model. +- The Application Script + The example application script is a Python file that contains + instructions to create and connect a SmartRedis + client to the colocated Orchestrator. +- The Experiment Driver Script + The experiment driver script launches and manages + the example entities with the ``Experiment`` API. + In the driver script, we use the ``Experiment`` + to create and launch a colocated ``Model``. The Application Script ====================== -A SmartRedis client connects and interacts with -a launched Orchestrator. -In this section, we write an application script -that we will use as an executable argument -for the colocated Model. We demonstrate -how to connect a SmartRedis -client to the active colocated database. 
-Using the created client, we send a tensor -from the database, then retrieve. - -.. note:: - You must run the Python driver script to launch the Orchestrator within the - application script. Otherwise, there will be no database to connect the - client to. - To begin writing the application script, provide the imports: + .. code-block:: python from smartredis import ConfigOptions, Client, log_data @@ -400,14 +380,20 @@ since our database is single-sharded: .. note:: Since there is only one database launched in the Experiment - (the colocated database), specifying a a datbase address + (the colocated database), specifying a a database address is not required when initializing the client. SmartRedis will handle the connection. +.. note:: + To create a client connection to the colocated database, the colocated Model must be launched + from within the driver script. You must execute the Python driver script, otherwise, there will + be no database to connect the client to. + Store Data ---------- -Next, using the SmartRedis client instance, we create and store -a NumPy tensor using ``Client.put_tensor()``: +Next, using the SmartRedis client instance, we create and store a NumPy tensor using +``Client.put_tensor()``: + .. code-block:: python # Create NumPy array @@ -496,9 +482,10 @@ to the ``create_model()`` function and assign to the variable ``model``. Step 2: Colocate """""""""""""""" -To colocate the model, use the ``Model.colocate_db_uds()`` function. +To colocate the model, use the ``Model.colocate_db_tcp()`` function. This function will colocate an Orchestrator instance with this Model over a Unix domain socket connection. + .. code-block:: python # Colocate the Model @@ -531,10 +518,9 @@ When you run the experiment, the following output will appear:: Multiple Orchestrators ====================== SmartSim supports automating the deployment of multiple Orchestrators -from within an Experiment. 
Data communication for all
-Communication with the database via a SmartRedis Client is possible from the
-`db_identifier` argument that required when initializing an Orchestrator or
-colocated Model during a multi database Experiment. When initializing a SmartRedis
+from within an Experiment. Communication with the database via a SmartRedis client is possible with the
+`db_identifier` argument that is required when initializing an Orchestrator or
+colocated Model during a multiple database experiment. When initializing a SmartRedis
client during the Experiment, first create a ``ConfigOptions`` object with the
`db_identifier` specified during database initialization, then pass the ``ConfigOptions`` object to the ``Client()`` init call.

From 5ea295460e85bfd919918659efabf915b34ef5de Mon Sep 17 00:00:00 2001
From: Amanda Richardson
Date: Wed, 3 Jan 2024 18:12:53 -0600
Subject: [PATCH 14/26] spacing error

---
 doc/orchestrator.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst
index b469d2c9a..5463ecab7 100644
--- a/doc/orchestrator.rst
+++ b/doc/orchestrator.rst
@@ -373,6 +373,7 @@ Initialize the Clients
To establish a connection with the colocated database, initialize
a new SmartRedis client and specify `cluster=False`
since our database is single-sharded:
+
.. code-block:: python

   # Initialize a Client
@@ -404,6 +405,7 @@ Next, using the SmartRedis client instance, we create and store a NumPy tensor u
Retrieve Data
-------------
Next, retrieve the tensor using ``Client.get_tensor()``:
+
.. code-block:: python

   # Retrieve tensor from driver script
@@ -422,6 +424,7 @@ Defining workflow stages requires the utilization of functions associated
with the ``Experiment`` object. In this example, we instantiate an ``Experiment``
object with the name ``getting-started``.
We set up the SmartSim ``logger`` to output information from the Experiment.
+
.. 
code-block:: python import numpy as np From 369130ce4df4cba7e9911c4e55c604c75c68e17c Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Tue, 9 Jan 2024 17:55:11 -0600 Subject: [PATCH 15/26] changes made to orch --- doc/orchestrator.rst | 56 +++++++++++++++++++++++++++++--------------- 1 file changed, 37 insertions(+), 19 deletions(-) diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index 5463ecab7..94091862c 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -10,9 +10,8 @@ exchange, online interactive visualization, online data analysis, computational An ``Orchestrator`` can be thought of as a general feature store capable of storing numerical data (Tensors and Datasets), AI Models, and scripts (TorchScripts). -Combined with the SmartRedis clients, the ``Orchestrator`` is capable of hosting and executing -AI models written in Python on CPU or GPU. The ``Orchestrator`` supports AI Models written with -TensorFlow, Pytorch, TensorFlow-Lite, or models saved in an ONNX format (e.g. sci-kit learn). +In addition to storing data, the ``Orchestrator`` is capable of executing ML models and TorchScripts +on the stored data using CPUs or GPUs. .. |orchestrator| image:: images/Orchestrator.png :width: 700 @@ -20,14 +19,28 @@ TensorFlow, Pytorch, TensorFlow-Lite, or models saved in an ONNX format (e.g. sc |orchestrator| -SmartSim ``Models`` or ``Ensemble`` models can be instructed to connect to an ``Orchestrator`` -via the :ref:`SmartRedis` client library from within a Python driver script or -an application script. +Users can establish a connection to the ``Orchestrator`` from within SmartSim ``Model`` executable code, ``Ensemble`` +model executable code, or driver scripts using the :ref:`SmartRedis` client library. -SmartSim offers two types of orchestrator deployment: :ref:`clustered deployment` and -:ref:`colocated deployment`. During clustered deployment, the ``Orchestrator`` is launched -on separate compute resources than a ``Model``. 
Clustered deployment is well-suited for throughput -scenarios. In colocated deployment, an ``Orchestrator`` shares compute resources with a ``Model``. Colocated +SmartSim offers two types of ``Orchestrator`` deployments: + +- :ref:`clustered deployment` + A clustered ``Orchestrator`` is ideal for systems that have heterogeneous node types + (i.e. a mix of CPU-only and GPU-enabled compute nodes) where + ML model and TorchScript evaluation is more efficiently performed off-node for a ``Model``. This + deployment is also ideal for workflows relying on data exchange between multiple + applications (e.g. online analysis, visualization, computational steering, or + producer/consumer application couplings). Clustered deployment is also optimal for + high data throughput scenarios such as online analysis, training and processing and + databases that require a large amount of hardware. + +- :ref:`colocated deployment`. + A colocated ``Orchestrator`` is ideal when the data and hardware accelerator are located on the same compute node. + This setup helps reduce latency in ML inference and TorchScript evaluation by eliminating off-node communication. + +During clustered deployment, the ``Orchestrator`` is launched +on separate compute nodes than a ``Model``. Clustered deployment is well-suited for throughput +scenarios. In colocated deployment, an ``Orchestrator`` shares compute nodes with a ``Model``. Colocated deployment is well-suited for inference scenarios. SmartSim allows users to launch multiple orchestrators during the course of an experiment of @@ -37,6 +50,7 @@ orchestrators in a parallel database workflow by specifying the respective `db_i when initializing a SmartRedis client object. The client can then be used to transmit data, execute ML models, and execute scripts on the linked database. +.. 
_clustered_orch_doc: ====================== Clustered Deployment ====================== @@ -44,19 +58,22 @@ Clustered Deployment Overview -------- During clustered deployment, a SmartSim ``Orchestrator`` (the database) runs on separate -compute node(s) from the model node(s). A clustered orchestrator can be deployed on a single -node or sharded (distributed) over multiple nodes. -With a sharded orchestrator, available hardware for inference and script -evaluation increases and overall memory for data storage increases. +compute node(s) from the ``Model`` node(s). A clustered ``Orchestrator`` can be deployed on a single +node or sharded (distributed) over multiple nodes. With a sharded ``Orchestrator``, available hardware +for inference and script evaluation increases and overall memory for data storage increases. Communication between a clustered Orchestrator and Model -is initialized in the application script via a SmartRedis client. +is initialized in the ``Model`` application script via a SmartRedis client. Users do not need to know how the data is stored in a clustered configuration and can address the cluster with the SmartRedis clients like a single block of memory -using simple put/get semantics in SmartRedis. The client establishes a -connection using the database address detected by SmartSim or provided by the user. In multiple -database experiments, users provide the `db_identifier` used to create the clustered -database when creating a client. +using simple put/get semantics in SmartRedis. The client can establish a connection +with an ``Orchestrator`` through **three** processes: + +- SmartSim establishes a connection using the database address provided by SmartSim through ``Model`` environment configuration + at runtime. +- A user provides the database address in the Client constructor. +- In multiple database experiments, a user provides the `db_identifier` used to create the clustered + database when creating a client. .. 
|cluster-orc| image:: images/clustered-orc-diagram.png :width: 700 @@ -302,6 +319,7 @@ When you run the experiment, the following output will appear:: again noted to fill in, oops ==================== +.. _colocated_orch_doc: Colocated Deployment ==================== -------- From 7d1b191038802680ba78528ecb9abfde281e15bc Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Tue, 9 Jan 2024 23:23:14 -0600 Subject: [PATCH 16/26] pushing half way point for matts comments --- doc/orchestrator.rst | 221 +++++++++++++++++++++++-------------------- 1 file changed, 120 insertions(+), 101 deletions(-) diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index 94091862c..6a2d083f2 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -9,15 +9,15 @@ AI-enabled workflows including online training, low-latency inference, cross-app exchange, online interactive visualization, online data analysis, computational steering, and more. An ``Orchestrator`` can be thought of as a general feature store -capable of storing numerical data (Tensors and Datasets), AI Models, and scripts (TorchScripts). -In addition to storing data, the ``Orchestrator`` is capable of executing ML models and TorchScripts -on the stored data using CPUs or GPUs. +capable of storing numerical data (Tensors and Datasets), AI Models (TF, TF-lite, PyTorch, or ONNX), +and scripts (TorchScripts). In addition to storing data, the ``Orchestrator`` is capable of +executing ML models and TorchScripts on the stored data using CPUs or GPUs. -.. |orchestrator| image:: images/Orchestrator.png - :width: 700 - :alt: Alternative text +.. figure:: images/Experiment.png -|orchestrator| + Sample experiment showing a user application leveraging + machine learning infrastructure launched by SmartSim and connected + to online analysis and visualization via the in-memory database. 
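The feature-store behavior described above (tensors and scripts stored by key, with scripts executed against stored data) can be illustrated with a toy stand-in; the dictionaries below are hypothetical and do not reflect the actual RedisAI machinery:

```python
import numpy as np

# Hypothetical key-addressed stores for tensors and "scripts".
tensors = {}
scripts = {}

tensors["send_array"] = np.arange(6.0).reshape(2, 3)
scripts["normalize"] = lambda t: t / t.max()

# "Run the script on stored data" and keep the result under a new key,
# mimicking how results remain in the store for other consumers.
tensors["normalized"] = scripts["normalize"](tensors["send_array"])
assert float(tensors["normalized"].max()) == 1.0
```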
Users can establish a connection to the ``Orchestrator`` from within SmartSim ``Model`` executable code, ``Ensemble`` model executable code, or driver scripts using the :ref:`SmartRedis` client library. @@ -25,23 +25,18 @@ model executable code, or driver scripts using the :ref:`SmartRedis` - A clustered ``Orchestrator`` is ideal for systems that have heterogeneous node types - (i.e. a mix of CPU-only and GPU-enabled compute nodes) where - ML model and TorchScript evaluation is more efficiently performed off-node for a ``Model``. This - deployment is also ideal for workflows relying on data exchange between multiple - applications (e.g. online analysis, visualization, computational steering, or - producer/consumer application couplings). Clustered deployment is also optimal for - high data throughput scenarios such as online analysis, training and processing and - databases that require a large amount of hardware. - -- :ref:`colocated deployment`. - A colocated ``Orchestrator`` is ideal when the data and hardware accelerator are located on the same compute node. - This setup helps reduce latency in ML inference and TorchScript evaluation by eliminating off-node communication. - -During clustered deployment, the ``Orchestrator`` is launched -on separate compute nodes than a ``Model``. Clustered deployment is well-suited for throughput -scenarios. In colocated deployment, an ``Orchestrator`` shares compute nodes with a ``Model``. Colocated -deployment is well-suited for inference scenarios. + A clustered ``Orchestrator`` is ideal for systems that have heterogeneous node types + (i.e. a mix of CPU-only and GPU-enabled compute nodes) where + ML model and TorchScript evaluation is more efficiently performed off-node for a ``Model``. This + deployment is also ideal for workflows relying on data exchange between multiple + applications (e.g. online analysis, visualization, computational steering, or + producer/consumer application couplings). 
Clustered deployment is also optimal for + high data throughput scenarios such as online analysis, training and processing and + databases that require a large amount of hardware. + +- :ref:`colocated deployment` + A colocated ``Orchestrator`` is ideal when the data and hardware accelerator are located on the same compute node. + This setup helps reduce latency in ML inference and TorchScript evaluation by eliminating off-node communication. SmartSim allows users to launch multiple orchestrators during the course of an experiment of either deployment type. If a workflow requires a multiple database environment, a @@ -57,60 +52,76 @@ Clustered Deployment -------- Overview -------- -During clustered deployment, a SmartSim ``Orchestrator`` (the database) runs on separate +During clustered ``Orchestrator`` deployment, a SmartSim ``Orchestrator`` (the database) runs on separate compute node(s) from the ``Model`` node(s). A clustered ``Orchestrator`` can be deployed on a single -node or sharded (distributed) over multiple nodes. With a sharded ``Orchestrator``, available hardware -for inference and script evaluation increases and overall memory for data storage increases. +node (standalone) or sharded (distributed) over multiple nodes. With a sharded ``Orchestrator``, users can +scale the number of database nodes for inference and script evaluation, leading to an +increased in-memory capacity for data storage in large-scale workflows. Standalone +``Orchestrators`` are effective for small-scale workflows and offer lower latency since +single-node ``Orchestrators`` don't involve communication between nodes. -Communication between a clustered Orchestrator and Model -is initialized in the ``Model`` application script via a SmartRedis client. -Users do not need to know how the data is stored in a clustered configuration and -can address the cluster with the SmartRedis clients like a single block of memory -using simple put/get semantics in SmartRedis. 
The client can establish a connection -with an ``Orchestrator`` through **three** processes: +In high data throughput scenarios, such as online analysis, training, and processing, a clustered ``Orchestrator`` +is optimal. The data produced by processes performed in a ``Model`` and stored in the clustered ``Orchestrator`` becomes +available for consumption by other ``Models``. -- SmartSim establishes a connection using the database address provided by SmartSim through ``Model`` environment configuration - at runtime. -- A user provides the database address in the Client constructor. -- In multiple database experiments, a user provides the `db_identifier` used to create the clustered - database when creating a client. +Communication between a clustered ``Orchestrator`` and ``Model`` +is facilitated by a SmartRedis ``Client`` and initialized in a ``Model``. The following image illustrates +communication between a clustered ``Orchestrator`` and a +``Model``. In the diagram, the application is running on multiple compute nodes, +separate from the ``Orchestrator`` compute nodes. Communication is established between the +``Model`` application and the sharded ``Orchestrator`` using the :ref:`SmartRedis Client` Client. -.. |cluster-orc| image:: images/clustered-orc-diagram.png - :width: 700 - :alt: Alternative text +.. figure:: images/clustered_orchestrator-1.png + + Sample Clustered ``Orchestrator`` Deployment + +.. note:: + Users do not need to know how the data is stored in a clustered configuration and + can address the cluster with the SmartRedis clients like a single block of memory + using simple put/get semantics in SmartRedis. + +A SmartRedis ``Client`` can establish a connection with an ``Orchestrator`` through **four** processes: -|cluster-orc| +- In an experiment with a single deployed ``Orchestrator``, users can rely on SmartSim + to detect the database address through the ``Model`` environment configuration + at runtime. 
+- User's can provide the database address in the Client constructor within the ``Model`` application script. +- User's can provide the database address in the Client constructor within the driver script via ``Orchestrator.get_address()``. +- In an experiment with multiple ``Orchestrator`` deployments, a user can connect to an ``Orchestrator`` by + specifying the `db_identifier` used when initializing the designated ``Orchestrator``. -A clustered database is optimal for high data throughput scenarios -such as online analysis, training and processing. -Clustered Orchestrators support data communication across multiple simulations. -With clustered database deployment, SmartSim can run AI models, and Torchscript -code on the CPU(s) or GPU(s) with existing data in the ``Orchestrator``. -Data produced by these processes and stored in the clustered database is available for -consumption by other applications. +.. note:: + The Model application code can remain unchanged as ``Orchestrator`` connection options are varied. ------- Example ------- -In the following example, we provide a demonstration on automating the deployment of -a clustered Orchestrator using SmartSim from within a Python driver script. Once the standard database is launched, -we demonstrate connecting a client to the database from within the application script to transmit and poll data. +In the following example, we demonstrate deploying a clustered ``Orchestrator``. +Once the clustered database is launched from the driver script, we walk through +connecting a SmartRedis ``Client`` to the database from within the application +script to transmit data then poll for the existence of the data. The example is comprised of two script files: -- The Application Script +- :ref:`Application Script` The application script is a Python file that contains instructions to create SmartRedis client connection to the standard Orchestrator launched in the driver script. 
From within the - application script, the client sends and retrieve data. -- The Experiment Driver Script - The experiment driver script launches and manages SmartSim entities. In the driver script, we use the Experiment - API to create and launch a standard ``orchestrator``. We create a client connection and store a tensor for use within - the application. We then initialize a ``Model`` object with the - application script as an executable argument. Once the database has launched, we launch the ``Model``. - We then retrieve the tensors stored by the ``Model`` from within the driver script. Lastly, we tear down the database. - -The Application Script -====================== + application script, the client sends and retrieves data. +- :ref:`Experiment Driver Script` + The experiment driver script is responsible for launching and managing SmartSim entities. Within this script, + we use the Experiment API to create and launch a standard ``Orchestrator``. To demonstrate the capability of + ``Model`` applications to access database data sent from other sources, we employ the SmartRedis ``Client`` in + the driver script to store a tensor in the ``Orchestrator``, which is later retrieved by the ``Model``. + Subsequently, we initialize a ``Model`` object with the application script as an executable argument, + launch the ``Orchestrator``, and then launch the ``Model``. + + To further demonstrate the ability of workflow components to access data from + other entities, we then retrieve the tensors stored by the ``Model`` using a SmartRedis client in + the driver script. Lastly, we tear down the ``Orchestrator``. + +.. _clustered_orch_app_script: +Application Script +================== To begin writing the application script, import the necessary SmartRedis packages: .. 
code-block:: python @@ -119,18 +130,18 @@ To begin writing the application script, import the necessary SmartRedis package from smartredis import * import numpy as np -Initialize the Client +Client Initialization --------------------- -To establish a connection with the standard database, we need to initialize a new SmartRedis client. -Since the standard database we launch in the driver script is sharded, we specify the `cluster` as `True`: +To establish a connection with the ``Orchestrator``, we need to initialize a new SmartRedis client. +Since the ``Orchestrator`` we launch in the driver script is sharded, we specify the `cluster` as `True`: .. code-block:: python # Initialize a Client standard_db_client = Client(cluster=True) -Retrieve Data -------------- +Data Retrieval +-------------- To confirm a successful connection to the database, we retrieve the tensor we store in the Python driver script. Use the ``Client.get_tensor()`` method to retrieve the tensor by specifying the name `tensor_1` we used during ``Client.put_tensor()`` in the driver script: @@ -147,8 +158,8 @@ located in ``getting-started/tutorial_model/``:: Default@17-11-48:The single sharded db tensor is: [1 2 3 4] -Store Data ----------- +Data Storage +------------ Next, create a NumPy tensor to send to the standard database using ``Client.put_tensor(name, data)``: @@ -161,14 +172,16 @@ Next, create a NumPy tensor to send to the standard database using We will retrieve `"tensor_2"` in the Python driver script. -The Experiment Driver Script -============================ +.. _clustered_orch_driver_script: +Experiment Driver Script +======================== To run the previous application script, we define a ``Model`` and ``Orchestrator`` within an -Python driver script. Defining workflow stages (``Model`` and ``Orchestrator``) requires the utilization of functions associated -with the ``Experiment`` object. The ``Experiment`` object is intended to be instantiated +Python driver script. 
Configuring and launching workflow entities (``Model`` and ``Orchestrator``) requires the utilization of +``Experiment`` class methods. The ``Experiment`` object is intended to be instantiated once and utilized throughout the workflow runtime. -In this example, we instantiate an ``Experiment`` object with the name ``getting-started``. -We setup the SmartSim ``logger`` to output information from the Experiment: + +In this example, we instantiate an ``Experiment`` object with the name `getting-started`, and we +setup the SmartSim `logger` to output information from the ``Experiment`` at runtime: .. code-block:: python @@ -178,7 +191,9 @@ We setup the SmartSim ``logger`` to output information from the Experiment: from smartsim.log import get_logger import sys + # returns the executable binary for the Python interpreter exe_ex = sys.executable + # Initialize the logger logger = get_logger("Example Experiment Log") # Initialize the Experiment exp = Experiment("getting-started", launcher="auto") @@ -186,40 +201,41 @@ We setup the SmartSim ``logger`` to output information from the Experiment: Launch Standard Orchestrator ---------------------------- In the context of this ``Experiment``, it's essential to create and launch -the databases as a preliminary step before any other workflow components. This is because +the databases as a preliminary step before any other workflow entities. This is because the application script requests and sends tensors to and from a launched database. -We aim to demonstrate the standard orchestrator automation capabilities of SmartSim, so we -create a clustered database in the workflow. +In this subsection, we demonstrate the ability of SmartSim to launch a clustered ``Orchestrator``. Step 1: Initialize Orchestrator ''''''''''''''''''''''''''''''' -To create a standard database, utilize the ``Experiment.create_database()`` function. +To create a clustered database, utilize the ``Experiment.create_database()`` function. + .. 
code-block:: python - # Initialize a multi sharded database + # Initialize a multi-sharded database standard_db = exp.create_database(db_nodes=3) exp.generate(standard_db) Step 2: Start Databases ''''''''''''''''''''''' Next, to launch the database, pass the database instance to ``Experiment.start()``. + .. code-block:: python # Launch the multi sharded database exp.start(standard_db) The ``Experiment.start()`` function launches the ``Orchestrator`` for use within the workflow. -In other words, the function deploys the database on the allocated compute resources. +In other words, the function deploys the ``Orchestrator`` on the allocated compute resources. -Create a Client Connection to the Orchestrator ----------------------------------------------- +Client Initialization +--------------------- The SmartRedis ``Client`` object contains functions that manipulate, send, and retrieve -data on the database. Each database can have a single, dedicated SmartRedis ``Client`` connection. -Begin by initializing a SmartRedis ``Client`` object for the standard database. +data on the database. Begin by initializing a SmartRedis ``Client`` object for the standard database. -When creating a client connection from within a driver script, -specify the address of the database you would like to connect to. +SmartRedis clients in driver scripts do not have the ability to use a `db-identifier` or +rely on automatic configurations to connect to ``Orchestrators``. Therefore, when creating a client +connection from within a driver script, specify the address of the database you would like to connect to. You can easily retrieve the database address using the ``Orchestrator.get_address()`` function: .. 
code-block:: python @@ -227,12 +243,13 @@ You can easily retrieve the database address using the ``Orchestrator.get_addres # Initialize a SmartRedis client for multi sharded database driver_client_standard_db = Client(cluster=True, address=standard_db.get_address()[0]) -Store Data Using Clients ------------------------- +Data Storage +------------ In the application script, we retrieved a NumPy tensor stored from within the driver script. To support the application functionality, we create a NumPy array in the experiment workflow to send to the database. To send a tensor to the database, use the function ``Client.put_tensor()``: + .. code-block:: python # Create NumPy array @@ -240,19 +257,20 @@ send a tensor to the database, use the function ``Client.put_tensor()``: # Use the SmartRedis client to place tensor in the standard database driver_client_standard_db.put_tensor("tensor_1", array_1) -Initialize a Model ------------------- +Model Initialization +-------------------- In the next stage of the experiment, we execute the application script by configuring and creating a SmartSim ``Model`` and specifying the application script name during ``Model`` creation. Step 1: Configure ''''''''''''''''' -In the example experiment, we invoke the Python interpreter to run -the python application script defined in section: The Application Script. -We use ``Experiment.create_run_settings()`` to create a configuration object that will define the -operation of a ``Model``. The function returns a ``RunSettings`` object. -When initializing the ``RunSettings`` object, we specify the path to the application file, -`application_script.py`, to ``exe_args``, and the run command to ``exe``. +In this example experiment, the ``Model`` application is a Python script as defined in section: +:ref:`Application Script`. Before creating the ``Model`` object for this application, we must use +Experiment.create_run_settings() to create a RunSettings object that defines how to execute +the Model. 
To launch the Python script in this example, we specify the path to the application +file application_script.py as the exe_args parameter and the executable exe_ex (the Python +executable on this system) as exe parameter. The Experiment.create_run_settings() function +will return a RunSettings object that can then be used to initialize the Model object. .. code-block:: python @@ -260,9 +278,9 @@ When initializing the ``RunSettings`` object, we specify the path to the applica model_settings = exp.create_run_settings(exe=exe_ex, exe_args="application_script.py") model_settings.set_nodes(1) -Step 2: Initialize -'''''''''''''''''' -Next, create a ``Model`` instance using the ``Experiment.create_model()``. +Step 2: Initialization +'''''''''''''''''''''' +Next, create a ``Model`` instance using the ``Experiment.create_model()`` factory method. Pass the ``model_settings`` object as an argument to the ``create_model()`` function and assign to the variable ``model``: @@ -319,6 +337,7 @@ When you run the experiment, the following output will appear:: again noted to fill in, oops ==================== + .. 
_colocated_orch_doc: Colocated Deployment ==================== From 410cb8b4651a18bbfc2f45f5cdb1dc07fc55eea5 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Tue, 9 Jan 2024 23:58:57 -0600 Subject: [PATCH 17/26] push comments, pause --- doc/orchestrator.rst | 115 ++++++++++++++++++++++--------------------- 1 file changed, 60 insertions(+), 55 deletions(-) diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index 6a2d083f2..de497476d 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -198,8 +198,8 @@ setup the SmartSim `logger` to output information from the ``Experiment`` at run # Initialize the Experiment exp = Experiment("getting-started", launcher="auto") -Launch Standard Orchestrator ----------------------------- +Orchestrator Deployment +----------------------- In the context of this ``Experiment``, it's essential to create and launch the databases as a preliminary step before any other workflow entities. This is because the application script requests and sends tensors to and from a launched database. @@ -257,8 +257,8 @@ send a tensor to the database, use the function ``Client.put_tensor()``: # Use the SmartRedis client to place tensor in the standard database driver_client_standard_db.put_tensor("tensor_1", array_1) -Model Initialization --------------------- +Standard Model Initialization +----------------------------- In the next stage of the experiment, we execute the application script by configuring and creating a SmartSim ``Model`` and specifying the application script name during ``Model`` creation. @@ -272,6 +272,10 @@ file application_script.py as the exe_args parameter and the executable exe_ex ( executable on this system) as exe parameter. The Experiment.create_run_settings() function will return a RunSettings object that can then be used to initialize the Model object. +.. note:: + Change the `exe_args` argument to the path of the application script + on your file system to run the example. + .. 
code-block:: python # Initialize a RunSettings object @@ -281,8 +285,8 @@ will return a RunSettings object that can then be used to initialize the Model o Step 2: Initialization '''''''''''''''''''''' Next, create a ``Model`` instance using the ``Experiment.create_model()`` factory method. -Pass the ``model_settings`` object as an argument -to the ``create_model()`` function and assign to the variable ``model``: +Pass the ``model_settings`` object as an argument to the ``create_model()`` function and +store the returned ``Model`` object to the variable `model`: .. code-block:: python @@ -320,10 +324,10 @@ polling every 100 milliseconds until 10 attempts are completed: logger.info(f"The tensor is {value_2}") The output will be as follows:: - noted to run and replace asap + 23:45:46 osprey.us.cray.com SmartSim[87400] INFO The tensor is True -Cleanup Experiment ------------------- +Cleanup +------- Finally, use the ``Experiment.stop()`` function to stop the database instances. Print the workflow summary with ``Experiment.summary()``: @@ -334,11 +338,13 @@ workflow summary with ``Experiment.summary()``: logger.info(exp.summary()) When you run the experiment, the following output will appear:: - again noted to fill in, oops - -==================== + | | Name | Entity-Type | JobID | RunID | Time | Status | Returncode | + |----|----------------|---------------|-------------|---------|---------|-----------|--------------| + | 0 | model | Model | 1658679.3 | 0 | 1.3342 | Completed | 0 | + | 1 | orchestrator_0 | DBNode | 1658679.2+2 | 0 | 42.8742 | Cancelled | 0 | .. _colocated_orch_doc: +==================== Colocated Deployment ==================== -------- @@ -346,22 +352,23 @@ Overview -------- During colocated deployment, a SmartSim ``Orchestrator`` (the database) is launched on the ``Model`` compute node(s). -The orchestrator is non-clustered and each ``Model`` compute node hosts an instance of the database. 
+The ``Orchestrator`` is non-clustered and each ``Model`` compute node hosts an instance of the database. Processes on the compute host individually address the database. -Communication between a colocated Orchestrator and Model +Communication between a colocated ``Orchestrator`` and ``Model`` is initialized in the application script via a SmartRedis client. Since a colocated Orchestrator is launched when the Model is started by the experiment, you may only connect a SmartRedis client to a colocated database from within -the associated colocated Model script. The client establishes a connection using the database address detected -by SmartSim or provided by the user. In multiple database experiments, users provide the `db_identifier` used to create the colocated -Model when creating a client connection. +the associated colocated ``Model`` application. The client establishes a connection using the database address detected +by SmartSim or provided by the user. In multiple database experiments, users provide the `db_identifier` that was specified +during ``Model`` initialization when creating a client connection. -.. |colo-orc| image:: images/co-located-orc-diagram.png - :width: 700 - :alt: Alternative text +Below is an image illustrating communication within a colocated model spanning multiple compute nodes. +As demonstrated in the diagram, each process of the application creates its own SmartRedis client +connection to the orchestrator running on the same host. +.. figure:: images/colocated_orchestrator-1.png -|colo-orc| + Sample Colocated Orchestrator Deployment Colocated deployment is designed for highly performant online inference scenarios where distributed processes (likely MPI processes) are performing inference with @@ -372,10 +379,6 @@ Colocated deployment rather benefits small/medium simulations with low latency r By hosting the database and simulation on the same compute node, communication time is reduced, which contributes to quicker processing speeds.
-This method is deemed ``locality based inference`` since data is local to each -process and the ``Orchestrator`` is deployed locally on each compute host where -the distributed application is running. - ------- Example ------- @@ -385,19 +388,20 @@ we demonstrate connecting a client to the database from within the application s The example is comprised of two script files: -- The Application Script +- :ref:`Application Script` The example application script is a Python file that contains instructions to create and connect a SmartRedis client to the colocated Orchestrator. -- The Experiment Driver Script +- :ref:`Experiment Driver Script` The experiment driver script launches and manages the example entities with the ``Experiment`` API. - In the driver script, we use the ``Experiment`` + In the driver script, we use the ``Experiment`` API to create and launch a colocated ``Model``. -The Application Script -====================== -To begin writing the application script, provide the imports: +.. _colocated_orch_app_script: +Application Script +================== +To begin writing the application script, import the necessary SmartRedis packages: .. code-block:: python @@ -405,8 +409,8 @@ To begin writing the application script, provide the imports: from smartredis import * import numpy as np -Initialize the Clients ----------------------- +Client Initialization +--------------------- To establish a connection with the colocated database, initialize a new SmartRedis client and specify `cluster=False` since our database is single-sharded: @@ -420,15 +424,15 @@ since our database is single-sharded: Since there is only one database launched in the Experiment (the colocated database), specifying a a database address is not required when initializing the client. - SmartRedis will handle the connection. + SmartRedis will handle the connection configuration. .. 
note:: To create a client connection to the colocated database, the colocated Model must be launched from within the driver script. You must execute the Python driver script; otherwise, there will be no database to connect the client to. -Store Data ---------- +Data Storage +------------ Next, using the SmartRedis client instance, we create and store a NumPy tensor using ``Client.put_tensor()``: .. code-block:: python # Create a NumPy array array_1 = np.array([1, 2, 3, 4]) # Store the NumPy tensor colo_client.put_tensor("tensor_1", array_1) -Retrieve Data ------------- +Data Retrieval +-------------- Next, retrieve the tensor using ``Client.get_tensor()``: .. code-block:: python # Retrieve tensor from driver script value_1 = colo_client.get_tensor("tensor_1") # Log tensor colo_client.log_data(LLInfo, f"The colocated db tensor is: {value_1}") When the Experiment completes, you can find the following log message in `colo_model.out`:: Default@21-48-01:The colocated db tensor is: [1 2 3 4] -The Experiment Driver Script -============================ +.. _colocated_orch_driver_script: +Experiment Driver Script +======================== To run the application, specify a Model workload from within the workflow (Experiment). Defining workflow stages requires the utilization of functions associated @@ -470,20 +475,22 @@ We setup the SmartSim ``logger`` to output information from the Experiment. from smartsim.log import get_logger import sys + # returns the executable binary for the Python interpreter exe_ex = sys.executable + # Initialize a logger object logger = get_logger("Example Experiment Log") # Initialize the Experiment exp = Experiment("getting-started", launcher="auto") -Initialize a Colocated Model ---------------------------- +Colocated Model Initialization +------------------------------ In the next stage of the experiment, we create and launch a colocated ``Model`` that -runs the application script with a database +runs the application script with an ``Orchestrator`` on the same compute node.
Step 1: Configure -""""""""""""""""" +''''''''''''''''' In this experiment, we invoke the Python interpreter to run the Python script defined in section: The Application Script. To configure this into a ``Model``, we use the ``Experiment.create_run_settings()`` function. @@ -511,17 +518,17 @@ example, we specify to SmartSim that we intend the Model to run on a single comp model_settings.set_nodes(1) Step 2: Initialize -"""""""""""""""""" +'''''''''''''''''' Next, create a ``Model`` instance using the ``Experiment.create_model()`` function. -Pass the ``model_settings`` object as an argument -to the ``create_model()`` function and assign to the variable ``model``. +Pass the ``model_settings`` object as an argument to the ``create_model()`` +function and assign the returned ``Model`` object to the variable `model`. .. code-block:: python # Initialize a SmartSim Model model = exp.create_model("colo_model", model_settings) -Step 2: Colocate -"""""""""""""""" +Step 3: Colocate +'''''''''''''''' To colocate the model, use the ``Model.colocate_db_tcp()`` function. This function will colocate an Orchestrator instance with this Model over a TCP/IP connection. .. code-block:: python # Colocate the Model model.colocate_db_tcp() -Step 3: Start -""""""""""""" +Step 4: Start +''''''''''''' Next, launch the colocated model instance using the ``Experiment.start()`` function. .. code-block:: python # Launch the colocated Model exp.start(model, block=True, summary=True) -test - -Cleanup Experiment ------------------- +Cleanup +------- ..
code-block:: python From fdc38eb69ad42ef684e19fe13f82a0eb63bc5605 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Wed, 10 Jan 2024 10:45:49 -0600 Subject: [PATCH 18/26] final edits made --- doc/orchestrator.rst | 268 ++++++++++++++++++++++++++----------------- 1 file changed, 160 insertions(+), 108 deletions(-) diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index de497476d..09fad895a 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -13,7 +13,7 @@ capable of storing numerical data (Tensors and Datasets), AI Models (TF, TF-lite and scripts (TorchScripts). In addition to storing data, the ``Orchestrator`` is capable of executing ML models and TorchScripts on the stored data using CPUs or GPUs. -.. figure:: images/Experiment.png +.. figure:: images/smartsim-arch.png Sample experiment showing a user application leveraging machine learning infrastructure launched by SmartSim and connected @@ -24,25 +24,24 @@ model executable code, or driver scripts using the :ref:`SmartRedis` +- :ref:`Clustered Deployment` A clustered ``Orchestrator`` is ideal for systems that have heterogeneous node types (i.e. a mix of CPU-only and GPU-enabled compute nodes) where ML model and TorchScript evaluation is more efficiently performed off-node for a ``Model``. This deployment is also ideal for workflows relying on data exchange between multiple applications (e.g. online analysis, visualization, computational steering, or producer/consumer application couplings). Clustered deployment is also optimal for - high data throughput scenarios such as online analysis, training and processing and - databases that require a large amount of hardware. + high data throughput scenarios with databases that require a large amount of hardware. -- :ref:`colocated deployment` +- :ref:`Colocated Deployment` A colocated ``Orchestrator`` is ideal when the data and hardware accelerator are located on the same compute node. 
This setup helps reduce latency in ML inference and TorchScript evaluation by eliminating off-node communication. -SmartSim allows users to launch multiple orchestrators during the course of an experiment of +SmartSim allows users to launch **multiple orchestrators** during the course of an experiment of either deployment type. If a workflow requires a multiple database environment, a `db_identifier` argument must be specified during database initialization. Users can connect to orchestrators in a parallel database workflow by specifying the respective `db_identifier` argument -when initializing a SmartRedis client object. The client can then be used to transmit data, +within a ``ConfigOptions`` object to pass in to the SmartRedis ``Client`` constructor. The client can then be used to transmit data, execute ML models, and execute scripts on the linked database. .. _clustered_orch_doc: @@ -55,17 +54,30 @@ Overview During clustered ``Orchestrator`` deployment, a SmartSim ``Orchestrator`` (the database) runs on separate compute node(s) from the ``Model`` node(s). A clustered ``Orchestrator`` can be deployed on a single node (standalone) or sharded (distributed) over multiple nodes. With a sharded ``Orchestrator``, users can -scale the number of database nodes for inference and script evaluation, leading to an +scale the number of database nodes for inference and script evaluation, contributing to an increased in-memory capacity for data storage in large-scale workflows. Standalone ``Orchestrators`` are effective for small-scale workflows and offer lower latency since single-node ``Orchestrators`` don't involve communication between nodes. -In high data throughput scenarios, such as online analysis, training, and processing, a clustered ``Orchestrator`` -is optimal. The data produced by processes performed in a ``Model`` and stored in the clustered ``Orchestrator`` becomes -available for consumption by other ``Models``. 
- Communication between a clustered ``Orchestrator`` and ``Model`` -is facilitated by a SmartRedis ``Client`` and initialized in a ``Model``. The following image illustrates +is facilitated by a SmartRedis ``Client`` and initialized in a ``Model`` application script. + +A SmartRedis ``Client`` can establish a connection with an ``Orchestrator`` through **four** processes: + +- In an experiment with a single deployed ``Orchestrator``, users can rely on SmartSim + to detect the database address through the ``Model`` environment configuration + at runtime. +- Users can provide the database address in the ``Client`` constructor within the ``Model`` application script. +- Users can provide the database address in the ``Client`` constructor within the driver script. Users + can access the address of an ``Orchestrator`` through ``Orchestrator.get_address()``. +- In an experiment with multiple ``Orchestrator`` deployments, a user can connect to an ``Orchestrator`` by + first specifying the `db_identifier` to the ``ConfigOptions`` constructor. A user should then pass the ``ConfigOptions`` + instance to the ``Client`` constructor. + +.. note:: + The ``Model`` application code can remain unchanged as ``Orchestrator`` connection options are varied. + +The following image illustrates communication between a clustered ``Orchestrator`` and a ``Model``. In the diagram, the application is running on multiple compute nodes, separate from the ``Orchestrator`` compute nodes. Communication is established between the @@ -80,39 +92,39 @@ separate from the ``Orchestrator`` compute nodes. Communication is established b can address the cluster with the SmartRedis clients like a single block of memory using simple put/get semantics in SmartRedis.
-A SmartRedis ``Client`` can establish a connection with an ``Orchestrator`` through **four** processes: - -- In an experiment with a single deployed ``Orchestrator``, users can rely on SmartSim - to detect the database address through the ``Model`` environment configuration - at runtime. -- User's can provide the database address in the Client constructor within the ``Model`` application script. -- User's can provide the database address in the Client constructor within the driver script via ``Orchestrator.get_address()``. -- In an experiment with multiple ``Orchestrator`` deployments, a user can connect to an ``Orchestrator`` by - specifying the `db_identifier` used when initializing the designated ``Orchestrator``. +In high data throughput scenarios, such as online analysis, training, and processing, a clustered ``Orchestrator`` +is optimal. The data produced by processes performed in a ``Model`` and stored in the clustered ``Orchestrator`` becomes +available for consumption by other ``Models``. -.. note:: - The Model application code can remain unchanged as ``Orchestrator`` connection options are varied. +If a workflow requires an application to leverage multiple clustered deployments, +multiple clients can be instantiated within an application, +with each client connected to a unique deployment. This is accomplished through the use of the +`db-identifier` and ``ConfigOptions`` object specified at ``Orchestrator`` initialization time. ------- Example ------- In the following example, we demonstrate deploying a clustered ``Orchestrator``. -Once the clustered database is launched from the driver script, we walk through +Once the clustered ``Orchestrator`` is launched from the driver script, we walk through connecting a SmartRedis ``Client`` to the database from within the application script to transmit data then poll for the existence of the data. 
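The "transmit data then poll" flow described above reduces to a bounded retry loop. The sketch below is plain Python and deliberately independent of SmartRedis: `check` is a hypothetical stand-in for a data-existence query such as the client's tensor-polling call, so nothing here should be read as the SmartRedis API itself.

```python
# Hedged sketch: a generic bounded-poll helper in plain Python. The
# `check` callable is a hypothetical stand-in for a "does the data
# exist yet?" query against the database.
import time

def poll_until_found(check, num_tries=10, interval_s=0.1):
    """Call check() up to num_tries times, sleeping interval_s between
    attempts; return True as soon as check() succeeds, else False."""
    for attempt in range(num_tries):
        if check():
            return True
        if attempt < num_tries - 1:
            time.sleep(interval_s)
    return False

# Toy usage: the "data" only appears on the third check
state = {"calls": 0}
def fake_check():
    state["calls"] += 1
    return state["calls"] >= 3

print(poll_until_found(fake_check, num_tries=10, interval_s=0.0))  # True
```

The same shape (attempt count plus wait interval) is what the driver script's polling step expresses when it retries every 100 milliseconds for 10 attempts.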
The example is comprised of two script files: - :ref:`Application Script` - The application script is a Python file that contains instructions to create SmartRedis - client connection to the standard Orchestrator launched in the driver script. From within the - application script, the client sends and retrieves data. + The application script is a Python file that contains instructions to create a SmartRedis + client connection to the standard ``Orchestrator`` launched in the driver script. + To demonstrate the ability of workflow components to access data from + other entities, we then retrieve the tensors stored in the driver script using a SmartRedis client in + the application script. + + We then instruct the client to send and retrieve data from within the application script. - :ref:`Experiment Driver Script` The experiment driver script is responsible for launching and managing SmartSim entities. Within this script, we use the Experiment API to create and launch a standard ``Orchestrator``. To demonstrate the capability of ``Model`` applications to access database data sent from other sources, we employ the SmartRedis ``Client`` in the driver script to store a tensor in the ``Orchestrator``, which is later retrieved by the ``Model``. - Subsequently, we initialize a ``Model`` object with the application script as an executable argument, + To employ the application script, we initialize a ``Model`` object with the application script as an executable argument, launch the ``Orchestrator``, and then launch the ``Model``. To further demonstrate the ability of workflow components to access data from @@ -133,18 +145,30 @@ To begin writing the application script, import the necessary SmartRedis package Client Initialization --------------------- To establish a connection with the ``Orchestrator``, we need to initialize a new SmartRedis client. 
-Since the ``Orchestrator`` we launch in the driver script is sharded, we specify the `cluster` as `True`: +Since the ``Orchestrator`` we launch in the driver script is sharded, we specify the +constructor argument `cluster` as `True`: .. code-block:: python # Initialize a Client standard_db_client = Client(cluster=True) +.. note:: + Since there is only one database launched in the Experiment + (the standard database), specifying a a database address + is not required when initializing the client. + SmartRedis will handle the connection configuration. + +.. note:: + To create a client connection to the clustered database, the standard ``Orchestrator`` must be launched + from within the driver script. You must execute the Python driver script, otherwise, there will + be no database to connect the client to. + Data Retrieval -------------- To confirm a successful connection to the database, we retrieve the tensor we store in the Python driver script. Use the ``Client.get_tensor()`` method to retrieve the tensor by specifying the name `tensor_1` we -used during ``Client.put_tensor()`` in the driver script: +used in the driver script to ``Client.put_tensor()``: .. code-block:: python @@ -153,7 +177,7 @@ used during ``Client.put_tensor()`` in the driver script: # Log tensor standard_db_client.log_data(LLInfo, f"The single sharded db tensor is: {value_1}") -Later, when you run the experiment driver script the following output will appear in ``model.out`` +Later, when you run the driver script the following output will appear in ``model.out`` located in ``getting-started/tutorial_model/``:: Default@17-11-48:The single sharded db tensor is: [1 2 3 4] @@ -175,7 +199,7 @@ We will retrieve `"tensor_2"` in the Python driver script. .. 
_clustered_orch_driver_script: Experiment Driver Script ======================== -To run the previous application script, we define a ``Model`` and ``Orchestrator`` within an +To run the previous application script, we define a ``Model`` and ``Orchestrator`` within a Python driver script. Configuring and launching workflow entities (``Model`` and ``Orchestrator``) requires the utilization of ``Experiment`` class methods. The ``Experiment`` object is intended to be instantiated once and utilized throughout the workflow runtime. @@ -201,10 +225,10 @@ setup the SmartSim `logger` to output information from the ``Experiment`` at run Orchestrator Deployment ----------------------- In the context of this ``Experiment``, it's essential to create and launch -the databases as a preliminary step before any other workflow entities. This is because +the database as a preliminary step before any other workflow entities. This is because the application script requests and sends tensors to and from a launched database. -In this subsection, we demonstrate the ability of SmartSim to launch a clustered ``Orchestrator``. +In the next stage of the experiment, we create and launch a standard ``Orchestrator``. Step 1: Initialize Orchestrator ''''''''''''''''''''''''''''''' @@ -214,7 +238,6 @@ To create a clustered database, utilize the ``Experiment.create_database()`` fun # Initialize a multi-sharded database standard_db = exp.create_database(db_nodes=3) - exp.generate(standard_db) Step 2: Start Databases ''''''''''''''''''''''' @@ -248,7 +271,7 @@ Data Storage In the application script, we retrieved a NumPy tensor stored from within the driver script. To support the application functionality, we create a NumPy array in the experiment workflow to send to the database. To -send a tensor to the database, use the function ``Client.put_tensor()``: +send a tensor to the database, use the function ``Client.put_tensor(name, data)``: .. 
code-block:: python

@@ -265,37 +288,42 @@ a SmartSim ``Model`` and specifying the application script name during ``Model``
Step 1: Configure
'''''''''''''''''
In this example experiment, the ``Model`` application is a Python script as defined in section:
-:ref:`Application Script`. Before creating the ``Model`` object for this application, we must use
-Experiment.create_run_settings() to create a RunSettings object that defines how to execute
-the Model. To launch the Python script in this example, we specify the path to the application
-file application_script.py as the exe_args parameter and the executable exe_ex (the Python
-executable on this system) as exe parameter. The Experiment.create_run_settings() function
-will return a RunSettings object that can then be used to initialize the Model object.
+:ref:`Application Script`. Before initializing the ``Model`` object, we must use
+``Experiment.create_run_settings()`` to create a ``RunSettings`` object that defines how to execute
+the ``Model``. To launch the Python script in this example workflow, we specify the path to the application
+file `application_script.py` as the `exe_args` parameter and the executable `exe_ex` (the Python
+executable on this system) as the `exe` parameter. The ``Experiment.create_run_settings()`` function
+will return a ``RunSettings`` object that can then be used to initialize the ``Model`` object.

.. note::
   Change the `exe_args` argument to the path of the application script
   on your file system to run the example.

+Use the ``RunSettings`` helper functions to
+configure the distribution of computational tasks (``RunSettings.set_nodes()``). In this
+example, we specify to SmartSim that we intend the Model to run on a single compute node.
+
..
code-block:: python # Initialize a RunSettings object - model_settings = exp.create_run_settings(exe=exe_ex, exe_args="application_script.py") + model_settings = exp.create_run_settings(exe=exe_ex, exe_args="/path/to/application_script.py") model_settings.set_nodes(1) Step 2: Initialization '''''''''''''''''''''' Next, create a ``Model`` instance using the ``Experiment.create_model()`` factory method. Pass the ``model_settings`` object as an argument to the ``create_model()`` function and -store the returned ``Model`` object to the variable `model`: +assign the returned ``Model`` instance to the variable `model`: .. code-block:: python # Initialize the Model model = exp.create_model("model", model_settings) + exp.generate(standard_db, model) Step 3: Start ''''''''''''' -Next, launch the model instance using the ``Experiment.start()`` function: +Next, launch the `model` instance using the ``Experiment.start()`` function: .. code-block:: python @@ -306,7 +334,7 @@ Next, launch the model instance using the ``Experiment.start()`` function: We specify `block=True` to ``exp.start()`` because our experiment requires that the ``Model`` finish before the experiment continues. This is because we will request tensors from the database that - are inputted by the Model we launched. + are inputted by the ``Model`` we launched. Poll Data Using Clients ----------------------- @@ -314,7 +342,7 @@ Next, check if the tensor exists in the standard database using ``Client.poll_te This function queries for data in the database. The function requires the tensor name (`name`), how many milliseconds to wait in between queries (`poll_frequency_ms`), and the total number of times to query (`num_tries`). Check if the data exists in the database by -polling every 100 milliseconds until 10 attempts are completed: +polling every 100 milliseconds until 10 attempts have completed: .. 
code-block:: python

@@ -323,7 +351,8 @@ polling every 100 milliseconds until 10 attempts are completed:
   # Validate that the tensor exists
   logger.info(f"The tensor is {value_2}")

-The output will be as follows::
+When you execute the driver script, the output will be as follows::
+
  23:45:46 osprey.us.cray.com SmartSim[87400] INFO The tensor is True

Cleanup
@@ -338,6 +367,7 @@ workflow summary with ``Experiment.summary()``:
   logger.info(exp.summary())

When you run the experiment, the following output will appear::
+
 | | Name | Entity-Type | JobID | RunID | Time | Status | Returncode |
 |----|----------------|---------------|-------------|---------|---------|-----------|--------------|
 | 0 | model | Model | 1658679.3 | 0 | 1.3342 | Completed | 0 |

@@ -350,48 +380,61 @@ Colocated Deployment
--------
Overview
--------
-During colocated deployment, a SmartSim ``Orchestrator`` (the database) is launched on
-the ``Model`` compute node(s).
-The ``Orchestrator`` is non-clustered and each ``Model`` compute node hosts an instance of the database.
-Processes on the compute host individually address the database.
+During colocated ``Orchestrator`` deployment, a SmartSim ``Orchestrator`` (the database) runs on
+the ``Model``'s compute node(s). A colocated ``Orchestrator`` can only be deployed on a single node
+and cannot be sharded (distributed) over multiple nodes. The database on each application node is
+utilized by SmartRedis clients on the same node. With a colocated ``Orchestrator``, latency is reduced
+in ML inference and TorchScript evaluation by eliminating off-node communication. A colocated ``Orchestrator``
+is ideal when the data and hardware accelerator are located on the same compute node.

Communication between a colocated ``Orchestrator`` and ``Model``
-is initialized in the application script via a SmartRedis client. 
Since a colocated Orchestrator is launched when the Model -is started by the experiment, you may only connect a SmartRedis client to a colocated database from within -the associated colocated ``Model`` application. The client establishes a connection using the database address detected -by SmartSim or provided by the user. In multiple database experiments, users provide the `db_identifier` that was specified -during ``Model`` initialization when creating a client connection. +is initiated in the application script through a SmartRedis client. Since a colocated ``Orchestrator`` is launched when the ``Model`` +is started by the experiment, connecting a SmartRedis ``Client`` to a colocated database is only possible from within +the associated ``Model`` application. + +The client can establish a connection with a colocated ``Orchestrator`` through **three** processes: + +- In an experiment with a single deployed ``Orchestrator``, users can rely on SmartSim + to detect the database address through the ``Model`` environment configuration + at runtime. +- Users can provide the database address in the ``Client`` constructor within the ``Model`` application script. +- In an experiment with multiple ``Orchestrator`` deployments, a user can connect to an ``Orchestrator`` by + first specifying the `db_identifier` to the ``ConfigOptions`` constructor. A user should then pass the ``ConfigOptions`` + instance to the ``Client`` constructor. -Below is an image illustrating communication within a colocated model spanning multiple compute nodes. +Below is an image illustrating communication within a colocated ``Model`` spanning multiple compute nodes. As demonstrated in the diagram, each process of the application creates its own SmartRedis client connection to the orchestrator running on the same host. .. 
figure:: images/colocated_orchestrator-1.png

-  Sample Colocated Orchestrator Deployment
+  Sample Colocated ``Orchestrator`` Deployment

-Colocated deployment is designed for highly performant online inference scenarios where
-a distributed process (likely MPI processes) are performing inference with
-data local to each process. Data produced by these processes and stored in the colocated database
-can be transferred via a SmartRedis client to a standard database to become available for consumption
-by other applications. A tradeoff of colocated deployment is the ability to scale to a large workload.
-Colocated deployment rather benefits small/medium simulations with low latency requirements.
-By hosting the database and simulation on the same compute node, communication time is reduced which
-contributes to quicker processing speeds.
+Colocated deployment is ideal for highly performant online inference scenarios where
+a distributed application (likely an MPI application) is performing inference with
+data local to each process. With colocated deployment, data does not need to travel
+off-node to be used to evaluate an ML model, and the results of the ML model evaluation
+are stored on-node.
+
+If a workflow requires an application to leverage both colocated
+deployment and clustered deployment, multiple clients can be instantiated within an application,
+with each client connected to a unique deployment. This is accomplished through the use of the
+`db_identifier` specified at ``Orchestrator`` initialization time.

-------
Example
-------
-In the following example, we provide a demonstration on automating the deployment of
-a colocated Orchestrator using SmartSim from within a Python driver script. Once the colocated database is launched,
-we demonstrate connecting a client to the database from within the application script to transmit and poll data.
+In the following example, we demonstrate deploying a colocated ``Orchestrator``. 
+Once the database is launched, we walk through connecting a SmartRedis ``Client``
+from within the application script to transmit data then poll for the existence of the data
+on the database.

The example is comprised of two script files:

- :ref:`Application Script`
-   The example application script is a Python file that contains
-   instructions to create and connect a SmartRedis
-   client to the colocated Orchestrator.
+   The application script is a Python script that connects a SmartRedis
+   client to the colocated ``Orchestrator``. From within the application script,
+   the client is utilized to both send and retrieve data.
- :ref:`Experiment Driver Script`
   The experiment driver script launches and manages
   the example entities with the ``Experiment`` API.
@@ -411,9 +454,9 @@ To begin writing the application script, import the necessary SmartRedis package

Client Initialization
---------------------
-To establish a connection with the colocated database,
-initialize a new SmartRedis client and specify `cluster=False`
-since our database is single-sharded:
+To establish a connection with the colocated ``Orchestrator``, we need to initialize a
+new SmartRedis client and specify `cluster=False` since colocated deployments are always
+single-sharded:

.. code-block:: python

   # Initialize a Client
   colo_client = Client(cluster=False)

@@ -433,8 +476,8 @@ since our database is single-sharded:

Data Storage
------------
-Next, using the SmartRedis client instance, we create and store a NumPy tensor using
-``Client.put_tensor()``:
+Next, using the SmartRedis client instance, we create and store a NumPy tensor through
+``Client.put_tensor(name, data)``:

.. code-block:: python

   # Create a NumPy array
   array_1 = np.array([1, 2, 3, 4])

   # Store the NumPy tensor
   colo_client.put_tensor("tensor_1", array_1)

+We will retrieve `"tensor_1"` in the following section.
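The example's polling step later uses ``Client.poll_tensor(name, poll_frequency_ms, num_tries)``, whose parameters are described in the driver-script section. Its retry semantics can be sketched in plain Python against a dictionary standing in for the database — an illustrative stand-in (``poll_key`` is a hypothetical helper), not SmartRedis code:

```python
import time

def poll_key(store, name, poll_frequency_ms, num_tries):
    """Return True as soon as `name` exists in `store`, checking up to
    `num_tries` times and sleeping `poll_frequency_ms` milliseconds
    between checks. Stand-in for Client.poll_tensor semantics."""
    for _ in range(num_tries):
        if name in store:
            return True
        time.sleep(poll_frequency_ms / 1000.0)
    return False

database = {}                          # stand-in for the colocated database
database["tensor_1"] = [1, 2, 3, 4]
print(poll_key(database, "tensor_1", poll_frequency_ms=100, num_tries=10))  # -> True
```

Because the key already exists, the sketch returns on the first check; a missing key would cost roughly ``poll_frequency_ms * num_tries`` milliseconds before returning ``False``.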
+

Data Retrieval
--------------
-Next, retrieve the tensor using ``Client.get_tensor()``:
+To confirm a successful connection to the database, we retrieve the tensor we stored.
+Use the ``Client.get_tensor()`` method to retrieve the tensor by specifying the name
+`"tensor_1"`:

.. code-block:: python

   # Retrieve tensor
   value_1 = colo_client.get_tensor("tensor_1")
   # Log tensor
   colo_client.log_data(LLInfo, f"The colocated db tensor is: {value_1}")

When the Experiment completes, you can find the following log message in `colo_model.out`::
+
 Default@21-48-01:The colocated db tensor is: [1 2 3 4]

.. _colocated_orch_driver_script:

Experiment Driver Script
========================
-To run the application, specify a Model workload from
-within the workflow (Experiment).
-Defining workflow stages requires the utilization of functions associated
-with the ``Experiment`` object.
-In this example, we instantiate an ``Experiment`` object with the name ``getting-started``.
-We setup the SmartSim ``logger`` to output information from the Experiment.
+To run the previous application script, a ``Model`` object must be configured and launched within the
+Experiment driver script. Configuring and launching workflow entities (``Model``)
+requires the utilization of ``Experiment`` class methods. The ``Experiment`` object is intended to
+be instantiated once and utilized throughout the workflow runtime.
+
+In this example, we instantiate an ``Experiment`` object with the `name` `"getting-started"`,
+and set up the SmartSim `logger` to output information from the ``Experiment`` at runtime:

.. code-block:: python

@@ -491,16 +540,13 @@ on the same compute node.

Step 1: Configure
'''''''''''''''''
-In this experiment, we invoke the Python interpreter to run
-the python script defined in section: The Application Script.
-To configure this into a ``Model``, we use the ``Experiment.create_run_settings()`` function.
-The function returns a ``RunSettings`` object. 
-A ``RunSettings`` allows you to configure
-the run settings of a SmartSim entity.
-We initialize a RunSettings object and
-specify the path to the application file,
-`application_script.py`, to the argument
-``exe_args``, and the run command to ``exe``.
+In this example experiment, the ``Model`` application is a Python script as defined in section:
+:ref:`Application Script`. Before initializing the ``Model`` object, we must use
+``Experiment.create_run_settings()`` to create a ``RunSettings`` object that defines how to execute
+the ``Model``. To launch the Python script in this example workflow, we specify the path to the application
+file `application_script.py` as the `exe_args` parameter and the executable `exe_ex` (the Python
+executable on this system) as the `exe` parameter. The ``Experiment.create_run_settings()`` function
+will return a ``RunSettings`` object that can then be used to initialize the ``Model`` object.

.. note::
   Change the `exe_args` argument to the path of the application script
   on your file system to run the example.

@@ -513,15 +559,16 @@ example, we specify to SmartSim that we intend the Model to run on a single comp

.. code-block:: python

   # Initialize a RunSettings object
-  model_settings = exp.create_run_settings(exe=exe_ex, exe_args="/lus/scratch/richaama/clustered_model.py")
+  model_settings = exp.create_run_settings(exe=exe_ex, exe_args="/path/to/clustered_model.py")
   # Configure RunSettings object
   model_settings.set_nodes(1)

Step 2: Initialize
''''''''''''''''''
-Next, create a ``Model`` instance using the ``Experiment.create_model()``.
-Pass the ``model_settings`` object as an argument to the ``create_model()``
-function and assign to the variable ``Model`` object to the variable `model`.
+Next, create a ``Model`` instance using the ``Experiment.create_model()`` factory method.
+Pass the ``model_settings`` object as an argument to the method and
+assign the returned ``Model`` instance to the variable `model`:
+
..
code-block:: python # Initialize a SmartSim Model @@ -529,18 +576,19 @@ function and assign to the variable ``Model`` object to the variable `model`. Step 3: Colocate '''''''''''''''' -To colocate the model, use the ``Model.colocate_db_tcp()`` function. -This function will colocate an Orchestrator instance with this Model over +To colocate the model, use the ``Model.colocate_db_uds()`` function. +This function will colocate an ``Orchestrator`` instance with this ``Model`` over a Unix domain socket connection. .. code-block:: python # Colocate the Model - model.colocate_db_tcp() + model.colocate_db_uds() Step 4: Start ''''''''''''' Next, launch the colocated model instance using the ``Experiment.start()`` function. + .. code-block:: python # Launch the colocated Model @@ -548,6 +596,9 @@ Next, launch the colocated model instance using the ``Experiment.start()`` funct Cleanup ------- +.. note:: + Since the colocated ``Orchestrator`` is automatically torn down by SmartSim once the colocated ``Model`` + has finished, we do not need to `stop` the ``Orchestrator``. .. code-block:: python @@ -564,12 +615,13 @@ Multiple Orchestrators ====================== SmartSim supports automating the deployment of multiple Orchestrators from within an Experiment. Communication with the database via a SmartRedis client is possible with the -`db_identifier` argument that is required when initializing an Orchestrator or -colocated Model during a multiple database experiment. When initializing a SmartRedis -client during the Experiment, first create a ``ConfigOptions`` object -with the `db_identifier` argument created during before passing object to the Client() -init call. +`db_identifier` argument that is required when initializing an ``Orchestrator`` or +colocated ``Model`` during a multiple database experiment. 
When initializing a SmartRedis +client during the Experiment, create a ``ConfigOptions`` object to specify the `db_identifier` +argument used when creating the ``Orchestrator``. Pass the ``ConfigOptions`` object to +the Client() init call. +.. _mutli_orch: Multiple Orchestrator Example ============================= SmartSim offers functionality to automate the deployment of multiple From fb9225ddd8bc9bb12a0b02360497a0a23cbbcfe2 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Thu, 11 Jan 2024 18:56:09 -0600 Subject: [PATCH 19/26] some comments addressed --- doc/orchestrator.rst | 102 +++++++++++++++++++++++++------------------ 1 file changed, 59 insertions(+), 43 deletions(-) diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index 09fad895a..97dddd743 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -60,22 +60,34 @@ increased in-memory capacity for data storage in large-scale workflows. Standalo single-node ``Orchestrators`` don't involve communication between nodes. Communication between a clustered ``Orchestrator`` and ``Model`` -is facilitated by a SmartRedis ``Client`` and initialized in a ``Model`` application script. +is facilitated by a SmartRedis ``Client`` and initialized in a ``Model`` application. -A SmartRedis ``Client`` can establish a connection with an ``Orchestrator`` through **four** processes: +When connecting to a clustered ``Orchestrator`` from within a ``Model`` application, the user has +several options when using the SmartRedis ``Client``: - In an experiment with a single deployed ``Orchestrator``, users can rely on SmartSim - to detect the database address through the ``Model`` environment configuration - at runtime. -- User's can provide the database address in the ``Client`` constructor within the ``Model`` application script. -- User's can provide the database address in the ``Client`` constructor within the driver script. Users - can access an ``Orchestrators`` address through ``Orchestrator.get_address()``. 
-- In an experiment with multiple ``Orchestrator`` deployments, a user can connect to an ``Orchestrator`` by
-  first specifying the `db_identifier` to the ``ConfigOptions`` constructor. A user should then pass the ``ConfigOptions``
-  instance to the ``Client`` constructor.
+  to detect the ``Orchestrator`` address through runtime configuration of the ``Model`` environment.
+  A default ``Client`` constructor, with no user-specified parameters, is sufficient to
+  connect to the ``Orchestrator``. The only exception is for the Python `client`, which requires
+  the `cluster` constructor parameter to differentiate between a multi-node clustered deployment
+  and a single-node clustered deployment.
+- In an experiment with multiple ``Orchestrator`` deployments, users can connect to a specific ``Orchestrator`` by
+  first specifying the `db_identifier` in the ``ConfigOptions`` constructor. Subsequently, users should pass the
+  ``ConfigOptions`` instance to the ``Client`` constructor.
+- Users can specify or override automatically configured connection options by providing the
+  database address in the ``ConfigOptions`` object. Subsequently, users should pass the ``ConfigOptions``
+  instance to the ``Client`` constructor.
+
+If connecting to a clustered ``Orchestrator`` from a SmartSim driver script, the user must specify
+the address of the ``Orchestrator`` via the ``Client`` constructor. SmartSim does not automatically
+configure the environment of the driver script to connect to an ``Orchestrator``. Users
+can access an ``Orchestrator``'s address through ``Orchestrator.get_address()``.
+
 .. note::
-   The Model application code can remain unchanged as ``Orchestrator`` connection options are varied.
+   In ``Model`` applications, it is advisable to **avoid** specifying addresses directly to the ``Client`` constructor.
+   Utilizing the SmartSim environment configuration for SmartRedis `client` connections
+   allows the ``Model`` application code to remain unchanged even as ``Orchestrator`` deployment
+   options vary.
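The precedence implied above — an explicitly supplied address overriding the environment-based configuration SmartSim provides (the ``SSDB`` variable that SmartRedis clients read at runtime) — can be sketched with a small helper. ``resolve_db_address`` is a hypothetical illustration, not part of the SmartSim or SmartRedis API:

```python
import os

def resolve_db_address(explicit_address=None):
    """Illustrative helper (not SmartSim API): prefer an explicitly
    provided address; otherwise fall back to the SSDB environment
    variable that SmartRedis clients read at runtime."""
    if explicit_address is not None:
        return explicit_address
    address = os.environ.get("SSDB")
    if address is None:
        raise RuntimeError("No database address: set SSDB or pass one explicitly")
    return address

os.environ["SSDB"] = "10.0.0.5:6379"        # what SmartSim would export to a Model
print(resolve_db_address())                  # -> 10.0.0.5:6379
print(resolve_db_address("127.0.0.1:6380"))  # explicit override wins -> 127.0.0.1:6380
```

This is why ``Model`` application code can stay unchanged across deployments: relying on the environment keeps the address out of the source entirely.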
The following image illustrates communication between a clustered ``Orchestrator`` and a @@ -92,9 +104,9 @@ separate from the ``Orchestrator`` compute nodes. Communication is established b can address the cluster with the SmartRedis clients like a single block of memory using simple put/get semantics in SmartRedis. -In high data throughput scenarios, such as online analysis, training, and processing, a clustered ``Orchestrator`` -is optimal. The data produced by processes performed in a ``Model`` and stored in the clustered ``Orchestrator`` becomes -available for consumption by other ``Models``. +In scenarios with high data throughput, such as online analysis, training, and processing, a clustered ``Orchestrator`` +is optimal. The data produced by multiple processes in a ``Model`` is stored in the clustered +``Orchestrator`` and is available for consumption by other ``Models``. If a workflow requires an application to leverage multiple clustered deployments, multiple clients can be instantiated within an application, @@ -106,7 +118,7 @@ Example ------- In the following example, we demonstrate deploying a clustered ``Orchestrator``. Once the clustered ``Orchestrator`` is launched from the driver script, we walk through -connecting a SmartRedis ``Client`` to the database from within the application +connecting a SmartRedis ``Client`` to the database from within the ``Model`` script to transmit data then poll for the existence of the data. The example is comprised of two script files: @@ -115,10 +127,8 @@ The example is comprised of two script files: The application script is a Python file that contains instructions to create a SmartRedis client connection to the standard ``Orchestrator`` launched in the driver script. To demonstrate the ability of workflow components to access data from - other entities, we then retrieve the tensors stored in the driver script using a SmartRedis client in - the application script. 
- - We then instruct the client to send and retrieve data from within the application script. + other entities, we then retrieve the tensors set by the driver script using a SmartRedis client in + the application script. We then instruct the client to send and retrieve data from within the application script. - :ref:`Experiment Driver Script` The experiment driver script is responsible for launching and managing SmartSim entities. Within this script, we use the Experiment API to create and launch a standard ``Orchestrator``. To demonstrate the capability of @@ -146,7 +156,12 @@ Client Initialization --------------------- To establish a connection with the ``Orchestrator``, we need to initialize a new SmartRedis client. Since the ``Orchestrator`` we launch in the driver script is sharded, we specify the -constructor argument `cluster` as `True`: +constructor argument `cluster` as `True`. + +.. note:: + Note that the C/C++/Fortran SmartRedis clients are capable of reading cluster configurations + from the ``Model`` environment and the `cluster` constructor argument does not need to be specified + in those client languages. .. code-block:: python @@ -155,7 +170,7 @@ constructor argument `cluster` as `True`: .. note:: Since there is only one database launched in the Experiment - (the standard database), specifying a a database address + (the standard database), specifying a database address is not required when initializing the client. SmartRedis will handle the connection configuration. @@ -166,9 +181,9 @@ constructor argument `cluster` as `True`: Data Retrieval -------------- -To confirm a successful connection to the database, we retrieve the tensor we store in the Python driver script. +To confirm a successful connection to the database, we retrieve the tensor we set from the Python driver script. 
Use the ``Client.get_tensor()`` method to retrieve the tensor by specifying the name `tensor_1` we
-used in the driver script to ``Client.put_tensor()``:
+used in the driver script as input to ``Client.put_tensor()``:

.. code-block:: python

@@ -622,8 +637,9 @@ argument used when creating the ``Orchestrator``. Pass the ``ConfigOptions`` obj
 the Client() init call.

 .. _mutli_orch:
+-----------------------------
 Multiple Orchestrator Example
-=============================
+-----------------------------
 SmartSim offers functionality to automate the deployment of multiple
 databases, supporting workloads that require multiple
 ``Orchestrators`` for an ``Experiment``. For instance, a workload may consist of a
@@ -678,7 +694,7 @@ runs the application. Setup and run instructions can be found :ref:`here`

 The Application Script
-----------------------
+======================
 Applications interact with the databases
 through a SmartRedis client.

 In this section, we write an application script
@@ -701,13 +717,13 @@ To begin, import the necessary packages:
   :lines: 1-3

 Initialize the Clients
-^^^^^^^^^^^^^^^^^^^^^^
+----------------------
 To establish a connection with each database,
 we need to initialize a new SmartRedis client for each
 ``Orchestrator``.

 Step 1: Initialize ConfigOptions
-""""""""""""""""""""""""""""""""
+''''''''''''''''''''''''''''''''
 Since we are launching multiple databases within the experiment, the SmartRedis
 ``ConfigOptions`` object is required when initializing a client in the application.

@@ -740,7 +756,7 @@ For the colocated database:
   :lines: 15-16

 Step 2: Initialize the Client Connections
-"""""""""""""""""""""""""""""""""""""""""
+'''''''''''''''''''''''''''''''''''''''''
 Now that we have three ``ConfigOptions`` objects, we have the
 tools necessary to initialize three SmartRedis clients and
 establish a connection with the three databases.
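Since each of the three clients is bound to exactly one database, data written through one client is invisible to the others. That isolation can be pictured with a stand-in mapping of `db_identifier` values to independent stores — plain Python for illustration only, not the SmartRedis ``ConfigOptions``/``Client`` API (the `colo_db_identifier` name below is hypothetical; the other two follow the driver script):

```python
# Stand-in illustration (NOT SmartRedis): one store per db_identifier,
# mirroring how each ConfigOptions-bound Client talks to exactly one database.
stores = {
    "single_shard_db_identifier": {},
    "multi_shard_db_identifier": {},
    "colo_db_identifier": {},          # hypothetical identifier for the colocated db
}

def client_for(db_identifier):
    """Return the store bound to this identifier, analogous to building a
    Client from ConfigOptions carrying the same db_identifier."""
    return stores[db_identifier]

client_for("single_shard_db_identifier")["tensor_1"] = [1, 2, 3, 4]
# The tensor exists only in the deployment it was written to:
print("tensor_1" in client_for("single_shard_db_identifier"))  # True
print("tensor_1" in client_for("multi_shard_db_identifier"))   # False
```

This is the behavior the experiment output below demonstrates when it reports that a key exists in one database but not the other.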
@@ -769,7 +785,7 @@ Colocated database:
    :lines: 17-18
 
 Retrieve Data and Store Using SmartRedis Client Objects
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+-------------------------------------------------------
 
 To confirm a successful connection to each database, we will retrieve the tensors
 that we plan to store in the Python driver script. After retrieving, we
 store both tensors in the colocated database.
@@ -814,7 +830,7 @@ The output will be as follows::
     Model: colo logger@00-00-00:The colocated db has tensor_2: True
 
 The Experiment Driver Script
-----------------------------
+============================
 To run the previous application, we must define workflow stages within a workload.
 Defining workflow stages requires the utilization of functions associated
 with the ``Experiment`` object. The Experiment object is intended to be instantiated
@@ -828,7 +844,7 @@ We setup the SmartSim ``logger`` to output information from the Experiment.
    :lines: 1-10
 
 Launch Multiple Orchestrators
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+-----------------------------
 In the context of this ``Experiment``, it's essential to create and launch
 the databases as a preliminary step before any other components since
 the application script requests tensors from the launched databases.
@@ -838,7 +854,7 @@ create two databases in the workflow: a single-sharded database and a
 multi-sharded database.
 
 Step 1: Initialize Orchestrators
-""""""""""""""""""""""""""""""""
+''''''''''''''''''''''''''''''''
 To create a database, utilize the ``Experiment.create_database()`` function.
 The function requires specifying a unique database
 identifier argument named `db_identifier` to launch multiple databases.
@@ -868,7 +884,7 @@ For the multi-sharded database:
    be created, namely ``single_shard_db_identifier/`` and ``multi_shard_db_identifier/``.
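The single-sharded versus multi-sharded distinction above comes down to how keys are spread across shards: a clustered Redis deployment assigns every key to one of 16384 hash slots using a CRC16 checksum, and the slots are divided among the shards. SmartRedis clients perform this routing internally, which is why application code can address a sharded database like a single block of memory. The sketch below shows only the underlying slot computation (Redis `{hash tag}` handling omitted); it is background illustration, not part of the SmartSim API:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC-16/XMODEM (polynomial 0x1021, zero initial value), the checksum
    Redis cluster uses when assigning keys to hash slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str, num_slots: int = 16384) -> int:
    """Map a key to one of the cluster's hash slots. Real clients also
    honor {hash tag} substrings, omitted here for brevity."""
    return crc16_xmodem(key.encode()) % num_slots


# Every key lands in a deterministic slot, so any client can route it
slot = hash_slot("tensor_1")
```

Because the slot is a pure function of the key, every client computes the same shard for the same tensor name without coordination.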
Step 2: Start Databases
-"""""""""""""""""""""""
+'''''''''''''''''''''''
 Next, to launch the databases, pass the database
 instances to ``Experiment.start()``.
@@ -888,7 +904,7 @@ deploys the databases on the allocated compute resources.
    would be launched immediately with no summary.
 
 Create Client Connections to Orchestrators
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+------------------------------------------
 The SmartRedis ``Client`` object contains functions that manipulate, send, and receive
 data within the database. Each database has a single, dedicated SmartRedis ``Client``.
 Begin by initializing a SmartRedis ``Client`` object per launched database.
@@ -911,7 +927,7 @@ For the multi-sharded database:
    :lines: 25-26
 
 Store Data Using Clients
-^^^^^^^^^^^^^^^^^^^^^^^^
+------------------------
 In the application script, we retrieved two NumPy tensors. To support the app's
 functionality, we will create two NumPy arrays in the Python driver script and send them to a database. To
@@ -945,14 +961,14 @@ When you run the experiment, the following output will appear::
     00:00:00 system.host.com SmartSim[#####] INFO The single shard array key exists in the incorrect database: False
 
 Initialize a Colocated Model
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+----------------------------
 In the next stage of the experiment, we
 launch the application script with a co-located database by
 configuring and creating
 a SmartSim colocated ``Model``.
 
 Step 1: Configure
-"""""""""""""""""
+'''''''''''''''''
 You can specify the run settings of a model.
 In this experiment, we invoke the Python interpreter to run
 the Python script defined in section: :ref:`The Application Script`.
@@ -983,7 +999,7 @@ example, we specify to SmartSim that we intend to execute the script once on a s
    :lines: 46-48
 
 Step 2: Initialize
-""""""""""""""""""
+''''''''''''''''''
 Next, create a ``Model`` instance using ``Experiment.create_model()``.
Pass the ``model_settings`` object as an argument to the ``create_model()`` function and assign to the variable ``model``. @@ -994,7 +1010,7 @@ to the ``create_model()`` function and assign to the variable ``model``. :lines: 49-50 Step 2: Colocate -"""""""""""""""" +'''''''''''''''' To colocate the model, use the ``Model.colocate_db_uds()`` function to Colocate an Orchestrator instance with this Model over a Unix domain socket connection. @@ -1009,7 +1025,7 @@ database to this Model instance. Only this Model will be able to communicate with this colocated database by using the loopback TCP interface. Step 3: Start -""""""""""""" +''''''''''''' Next, launch the colocated model instance using the ``Experiment.start()`` function. .. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py @@ -1024,7 +1040,7 @@ Next, launch the colocated model instance using the ``Experiment.start()`` funct if processes run, complete, or fail. Cleanup Experiment -^^^^^^^^^^^^^^^^^^ +------------------ Finally, use the ``Experiment.stop()`` function to stop the database instances. Print the workflow summary with ``Experiment.summary()``. @@ -1043,7 +1059,7 @@ When you run the experiment, the following output will appear:: | 2 | multi_shard_db_identifier_0 | DBNode | 1556529.4+2 | 0 | 45.5139 | Cancelled | 0 | How to Run the Example ----------------------- +====================== Below are the steps to run the experiment. Find the :ref:`experiment source code` and :ref:`application source code` @@ -1083,13 +1099,13 @@ Step 4 : Run the Experiment Application Source Code -^^^^^^^^^^^^^^^^^^^^^^^ +----------------------- .. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py :language: python :linenos: Experiment Source Code -^^^^^^^^^^^^^^^^^^^^^^ +---------------------- .. 
literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py :language: python :linenos: \ No newline at end of file From 8d7440a5023801cce6c9970b9aea3c402f275310 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Thu, 11 Jan 2024 19:38:14 -0600 Subject: [PATCH 20/26] going through namings --- doc/orchestrator.rst | 266 +++++++++++++++++++++---------------------- 1 file changed, 132 insertions(+), 134 deletions(-) diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index 97dddd743..2e8661ab6 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -17,20 +17,20 @@ executing ML models and TorchScripts on the stored data using CPUs or GPUs. Sample experiment showing a user application leveraging machine learning infrastructure launched by SmartSim and connected - to online analysis and visualization via the in-memory database. + to online analysis and visualization via the ``Orchestrator``. Users can establish a connection to the ``Orchestrator`` from within SmartSim ``Model`` executable code, ``Ensemble`` model executable code, or driver scripts using the :ref:`SmartRedis` client library. SmartSim offers two types of ``Orchestrator`` deployments: -- :ref:`Clustered Deployment` - A clustered ``Orchestrator`` is ideal for systems that have heterogeneous node types +- :ref:`Standalone Deployment` + A standalone ``Orchestrator`` is ideal for systems that have heterogeneous node types (i.e. a mix of CPU-only and GPU-enabled compute nodes) where ML model and TorchScript evaluation is more efficiently performed off-node for a ``Model``. This deployment is also ideal for workflows relying on data exchange between multiple applications (e.g. online analysis, visualization, computational steering, or - producer/consumer application couplings). Clustered deployment is also optimal for + producer/consumer application couplings). 
Standalone deployment is also optimal for high data throughput scenarios with databases that require a large amount of hardware. - :ref:`Colocated Deployment` @@ -38,47 +38,47 @@ SmartSim offers two types of ``Orchestrator`` deployments: This setup helps reduce latency in ML inference and TorchScript evaluation by eliminating off-node communication. SmartSim allows users to launch **multiple orchestrators** during the course of an experiment of -either deployment type. If a workflow requires a multiple database environment, a -`db_identifier` argument must be specified during database initialization. Users can connect to -orchestrators in a parallel database workflow by specifying the respective `db_identifier` argument +either deployment type. If a workflow requires a multiple ``Orchestrator`` environment, a +`db_identifier` argument must be specified during ``Orchestrator`` initialization. Users can connect to +``Orchestrators`` in a parallel database workflow by specifying the respective `db_identifier` argument within a ``ConfigOptions`` object to pass in to the SmartRedis ``Client`` constructor. The client can then be used to transmit data, -execute ML models, and execute scripts on the linked database. +execute ML models, and execute scripts on the linked ``Orchestrator``. -.. _clustered_orch_doc: -====================== -Clustered Deployment -====================== +.. _standalone_orch_doc: +===================== +Standalone Deployment +===================== -------- Overview -------- -During clustered ``Orchestrator`` deployment, a SmartSim ``Orchestrator`` (the database) runs on separate -compute node(s) from the ``Model`` node(s). A clustered ``Orchestrator`` can be deployed on a single +During standalone ``Orchestrator`` deployment, a SmartSim ``Orchestrator`` (the database) runs on separate +compute node(s) from the ``Model`` node(s). 
A standalone ``Orchestrator`` can be deployed on a single node (standalone) or sharded (distributed) over multiple nodes. With a sharded ``Orchestrator``, users can scale the number of database nodes for inference and script evaluation, contributing to an increased in-memory capacity for data storage in large-scale workflows. Standalone ``Orchestrators`` are effective for small-scale workflows and offer lower latency since single-node ``Orchestrators`` don't involve communication between nodes. -Communication between a clustered ``Orchestrator`` and ``Model`` +Communication between a standalone ``Orchestrator`` and ``Model`` is facilitated by a SmartRedis ``Client`` and initialized in a ``Model`` application. -When connecting to a clustered ``Orchestrator`` from within a ``Model`` application, the user has +When connecting to a standalone ``Orchestrator`` from within a ``Model`` application, the user has several options when using the SmartRedis ``Client``: - In an experiment with a single deployed ``Orchestrator``, users can rely on SmartSim to detect the ``Orchestrator`` address through runtime configuration of the ``Model`` environment. A default ``Client`` constructor, with no user-specified parameters, is sufficient to connect to the ``Orchestrator``. The only exception is for the Python `client`, which requires - the `cluster` constructor parameter to differentiate between a multi-node clustered deployment - and a single-node clustered deployment. + the `cluster` constructor parameter to differentiate between a multi-node standalone deployment + and a single-node standalone deployment. - In an experiment with multiple ``Orchestrator`` deployments, users can connect to a specific ``Orchestrator`` by first specifying the `db_identifier` in the ``ConfigOptions`` constructor. Subsequently, users should pass the ``ConfigOptions`` instance to the ``Client`` constructor. 
- Users can specify or override automatically configured connection options by providing the
-    database address in the ``ConfigOptions`` object. Subsequently, users should pass the ``ConfigOptions``
+    ``Orchestrator`` address in the ``ConfigOptions`` object. Subsequently, users should pass the ``ConfigOptions``
     instance to the ``Client`` constructor.
 
-If connecting to a clustered ``Orchestrator`` from a SmartSim driver script, the user must specify
+If connecting to a standalone ``Orchestrator`` from a SmartSim driver script, the user must specify
 the address of the ``Orchestrator`` via the ``Client`` constructor. SmartSim does not automatically
 configure the environment of the driver script to connect to an ``Orchestrator``. Users
 can access an ``Orchestrator``'s address through ``Orchestrator.get_address()``.
@@ -90,25 +90,25 @@ can access an ``Orchestrators`` address through ``Orchestrator.get_address()``.
    options vary.
 
 The following image illustrates
-communication between a clustered ``Orchestrator`` and a
+communication between a standalone ``Orchestrator`` and a
 ``Model``. In the diagram, the application is running on multiple compute nodes, separate from
 the ``Orchestrator`` compute nodes. Communication is established between the
 ``Model`` application and the sharded ``Orchestrator`` using the :ref:`SmartRedis Client`.
 
 .. figure:: images/clustered_orchestrator-1.png
 
-    Sample Clustered ``Orchestrator`` Deployment
+    Sample Standalone ``Orchestrator`` Deployment
 
 .. note::
-    Users do not need to know how the data is stored in a clustered configuration and
+    Users do not need to know how the data is stored in a standalone configuration and
     can address the cluster with the SmartRedis clients like a single block of memory
     using simple put/get semantics in SmartRedis.
 
-In scenarios with high data throughput, such as online analysis, training, and processing, a clustered ``Orchestrator``
-is optimal. 
The data produced by multiple processes in a ``Model`` is stored in the clustered +In scenarios with high data throughput, such as online analysis, training, and processing, a standalone ``Orchestrator`` +is optimal. The data produced by multiple processes in a ``Model`` is stored in the standalone ``Orchestrator`` and is available for consumption by other ``Models``. -If a workflow requires an application to leverage multiple clustered deployments, +If a workflow requires an application to leverage multiple standalone deployments, multiple clients can be instantiated within an application, with each client connected to a unique deployment. This is accomplished through the use of the `db-identifier` and ``ConfigOptions`` object specified at ``Orchestrator`` initialization time. @@ -116,23 +116,23 @@ with each client connected to a unique deployment. This is accomplished through ------- Example ------- -In the following example, we demonstrate deploying a clustered ``Orchestrator``. -Once the clustered ``Orchestrator`` is launched from the driver script, we walk through -connecting a SmartRedis ``Client`` to the database from within the ``Model`` +In the following example, we demonstrate deploying a standalone ``Orchestrator``. +Once the standalone ``Orchestrator`` is launched from the driver script, we walk through +connecting a SmartRedis ``Client`` to the ``Orchestrator`` from within the ``Model`` script to transmit data then poll for the existence of the data. The example is comprised of two script files: -- :ref:`Application Script` +- :ref:`Application Script` The application script is a Python file that contains instructions to create a SmartRedis - client connection to the standard ``Orchestrator`` launched in the driver script. + client connection to the standalone ``Orchestrator`` launched in the driver script. 
To demonstrate the ability of workflow components to access data from other entities, we then retrieve the tensors set by the driver script using a SmartRedis client in the application script. We then instruct the client to send and retrieve data from within the application script. -- :ref:`Experiment Driver Script` +- :ref:`Experiment Driver Script` The experiment driver script is responsible for launching and managing SmartSim entities. Within this script, - we use the Experiment API to create and launch a standard ``Orchestrator``. To demonstrate the capability of - ``Model`` applications to access database data sent from other sources, we employ the SmartRedis ``Client`` in + we use the Experiment API to create and launch a standalone ``Orchestrator``. To demonstrate the capability of + ``Model`` applications to access ``Orchestrator`` data sent from other sources, we employ the SmartRedis ``Client`` in the driver script to store a tensor in the ``Orchestrator``, which is later retrieved by the ``Model``. To employ the application script, we initialize a ``Model`` object with the application script as an executable argument, launch the ``Orchestrator``, and then launch the ``Model``. @@ -141,7 +141,7 @@ The example is comprised of two script files: other entities, we then retrieve the tensors stored by the ``Model`` using a SmartRedis client in the driver script. Lastly, we tear down the ``Orchestrator``. -.. _clustered_orch_app_script: +.. _standalone_orch_app_script: Application Script ================== To begin writing the application script, import the necessary SmartRedis packages: @@ -166,31 +166,31 @@ constructor argument `cluster` as `True`. .. code-block:: python # Initialize a Client - standard_db_client = Client(cluster=True) + application_client = Client(cluster=True) .. 
note::
-    Since there is only one database launched in the Experiment
-    (the standard database), specifying a database address
+    Since there is only one ``Orchestrator`` launched in the Experiment
+    (the standalone ``Orchestrator``), specifying an ``Orchestrator`` address
     is not required when initializing the client. SmartRedis will
     handle the connection configuration.
 
 .. note::
-    To create a client connection to the clustered database, the standard ``Orchestrator`` must be launched
+    To create a client connection to the standalone ``Orchestrator``, the standalone ``Orchestrator`` must be launched
     from within the driver script. You must execute the Python driver script, otherwise, there will
-    be no database to connect the client to.
+    be no ``Orchestrator`` to connect the client to.
 
 Data Retrieval
 --------------
-To confirm a successful connection to the database, we retrieve the tensor we set from the Python driver script.
+To confirm a successful connection to the ``Orchestrator``, we retrieve the tensor we set from the Python driver script.
 Use the ``Client.get_tensor()`` method to retrieve the tensor by specifying the name `tensor_1` we
 used in the driver script as input to ``Client.put_tensor()``:
 
 ..
code-block:: python # Retrieve tensor from Orchestrator - value_1 = standard_db_client.get_tensor("tensor_1") + driver_script_tensor = application_client.get_tensor("tensor_1") # Log tensor - standard_db_client.log_data(LLInfo, f"The single sharded db tensor is: {value_1}") + application_client.log_data(LLInfo, f"The single sharded db tensor is: {driver_script_tensor}") Later, when you run the driver script the following output will appear in ``model.out`` located in ``getting-started/tutorial_model/``:: @@ -199,19 +199,19 @@ located in ``getting-started/tutorial_model/``:: Data Storage ------------ -Next, create a NumPy tensor to send to the standard database using +Next, create a NumPy tensor to send to the standalone ``Orchestrator`` using ``Client.put_tensor(name, data)``: .. code-block:: python # Create a NumPy array - array_2 = np.array([5, 6, 7, 8]) + local_array = np.array([5, 6, 7, 8]) # Use SmartRedis client to place tensor in multi-sharded db - standard_db_client.put_tensor("tensor_2", array_2) + application_client.put_tensor("tensor_2", local_array) We will retrieve `"tensor_2"` in the Python driver script. -.. _clustered_orch_driver_script: +.. 
_standalone_orch_driver_script: Experiment Driver Script ======================== To run the previous application script, we define a ``Model`` and ``Orchestrator`` within a @@ -230,7 +230,7 @@ setup the SmartSim `logger` to output information from the ``Experiment`` at run from smartsim.log import get_logger import sys - # returns the executable binary for the Python interpreter + # Returns the executable binary for the Python interpreter exe_ex = sys.executable # Initialize the logger logger = get_logger("Example Experiment Log") @@ -240,28 +240,28 @@ setup the SmartSim `logger` to output information from the ``Experiment`` at run Orchestrator Deployment ----------------------- In the context of this ``Experiment``, it's essential to create and launch -the database as a preliminary step before any other workflow entities. This is because -the application script requests and sends tensors to and from a launched database. +the ``Orchestrator`` as a preliminary step before any other workflow entities. This is because +in this example the application script requests and sends tensors to and from a launched ``Orchestrator``. -In the next stage of the experiment, we create and launch a standard ``Orchestrator``. +In the next stage of the experiment, we create and launch a standalone ``Orchestrator``. -Step 1: Initialize Orchestrator -''''''''''''''''''''''''''''''' -To create a clustered database, utilize the ``Experiment.create_database()`` function. +Step 1: Initialize +'''''''''''''''''' +To create a standalone ``Orchestrator``, utilize the ``Experiment.create_database()`` function. .. code-block:: python # Initialize a multi-sharded database - standard_db = exp.create_database(db_nodes=3) + standalone_orchestrator = exp.create_database(db_nodes=3) -Step 2: Start Databases -''''''''''''''''''''''' -Next, to launch the database, pass the database instance to ``Experiment.start()``. 
+Step 2: Start +''''''''''''' +Next, to launch the ``Orchestrator``, pass the ``Orchestrator`` instance to ``Experiment.start()``. .. code-block:: python - # Launch the multi sharded database - exp.start(standard_db) + # Launch the multi sharded orchestrator + exp.start(standalone_orchestrator) The ``Experiment.start()`` function launches the ``Orchestrator`` for use within the workflow. In other words, the function deploys the ``Orchestrator`` on the allocated compute resources. @@ -269,41 +269,41 @@ In other words, the function deploys the ``Orchestrator`` on the allocated compu Client Initialization --------------------- The SmartRedis ``Client`` object contains functions that manipulate, send, and retrieve -data on the database. Begin by initializing a SmartRedis ``Client`` object for the standard database. +data on the ``Orchestrator``. Begin by initializing a SmartRedis ``Client`` object for the standalone ``Orchestrator``. SmartRedis clients in driver scripts do not have the ability to use a `db-identifier` or rely on automatic configurations to connect to ``Orchestrators``. Therefore, when creating a client -connection from within a driver script, specify the address of the database you would like to connect to. -You can easily retrieve the database address using the ``Orchestrator.get_address()`` function: +connection from within a driver script, specify the address of the ``Orchestrator`` you would like to connect to. +You can easily retrieve the ``Orchestrator`` address using the ``Orchestrator.get_address()`` function: .. 
code-block:: python - # Initialize a SmartRedis client for multi sharded database - driver_client_standard_db = Client(cluster=True, address=standard_db.get_address()[0]) + # Initialize a SmartRedis client for multi sharded orchestrator + driver_client = Client(cluster=True, address=standalone_orchestrator.get_address()[0]) Data Storage ------------ In the application script, we retrieved a NumPy tensor stored from within the driver script. To support the application functionality, we create a -NumPy array in the experiment workflow to send to the database. To -send a tensor to the database, use the function ``Client.put_tensor(name, data)``: +NumPy array in the experiment workflow to send to the ``Orchestrator``. To +send a tensor to the ``Orchestrator``, use the function ``Client.put_tensor(name, data)``: .. code-block:: python # Create NumPy array - array_1 = np.array([1, 2, 3, 4]) - # Use the SmartRedis client to place tensor in the standard database - driver_client_standard_db.put_tensor("tensor_1", array_1) + local_array = np.array([1, 2, 3, 4]) + # Use the SmartRedis client to place tensor in the standalone orchestrator + driver_client.put_tensor("tensor_1", local_array) -Standard Model Initialization ------------------------------ +Model Initialization +-------------------- In the next stage of the experiment, we execute the application script by configuring and creating a SmartSim ``Model`` and specifying the application script name during ``Model`` creation. Step 1: Configure ''''''''''''''''' In this example experiment, the ``Model`` application is a Python script as defined in section: -:ref:`Application Script`. Before initializing the ``Model`` object, we must use +:ref:`Application Script`. Before initializing the ``Model`` object, we must use ``Experiment.create_run_settings()`` to create a ``RunSettings`` object that defines how to execute the ``Model``. 
To launch the Python script in this example workflow, we specify the path to the application file `application_script.py` as the `exe_args` parameter and the executable `exe_ex` (the Python @@ -324,8 +324,8 @@ example, we specify to SmartSim that we intend the Model to run on a single comp model_settings = exp.create_run_settings(exe=exe_ex, exe_args="/path/to/application_script.py") model_settings.set_nodes(1) -Step 2: Initialization -'''''''''''''''''''''' +Step 2: Initialize +'''''''''''''''''' Next, create a ``Model`` instance using the ``Experiment.create_model()`` factory method. Pass the ``model_settings`` object as an argument to the ``create_model()`` function and assign the returned ``Model`` instance to the variable `model`: @@ -334,7 +334,6 @@ assign the returned ``Model`` instance to the variable `model`: # Initialize the Model model = exp.create_model("model", model_settings) - exp.generate(standard_db, model) Step 3: Start ''''''''''''' @@ -348,23 +347,23 @@ Next, launch the `model` instance using the ``Experiment.start()`` function: .. note:: We specify `block=True` to ``exp.start()`` because our experiment requires that the ``Model`` finish before the experiment continues. - This is because we will request tensors from the database that + This is because we will request tensors from the ``Orchestrator`` that are inputted by the ``Model`` we launched. -Poll Data Using Clients ------------------------ -Next, check if the tensor exists in the standard database using ``Client.poll_tensor()``. -This function queries for data in the database. The function requires the tensor name (`name`), +Data Polling +------------ +Next, check if the tensor exists in the standalone ``Orchestrator`` using ``Client.poll_tensor()``. +This function queries for data in the ``Orchestrator``. The function requires the tensor name (`name`), how many milliseconds to wait in between queries (`poll_frequency_ms`), -and the total number of times to query (`num_tries`). 
Check if the data exists in the ``Orchestrator`` by
 polling every 100 milliseconds until 10 attempts have completed:
 
 .. code-block:: python
 
     # Poll for the tensor placed by the Model
-    value_2 = driver_client_standard_db.poll_key("tensor_2", 100, 10)
+    app_tensor = driver_client.poll_tensor("tensor_2", 100, 10)
     # Validate that the tensor exists
-    logger.info(f"The tensor is {value_2}")
+    logger.info(f"The tensor is {app_tensor}")
 
 When you execute the driver script, the output will be as follows::
 
@@ -372,13 +371,13 @@ When you execute the driver script, the output will be as follows::
 
 Cleanup
 -------
-Finally, use the ``Experiment.stop()`` function to stop the database instances. Print the
+Finally, use the ``Experiment.stop()`` function to stop the ``Orchestrator`` instances. Print the
 workflow summary with ``Experiment.summary()``:
 
 .. code-block:: python
 
-    # Cleanup the database
-    exp.stop(standard_db)
+    # Cleanup the Orchestrator
+    exp.stop(standalone_orchestrator)
     logger.info(exp.summary())
 
 When you run the experiment, the following output will appear::
 
@@ -396,14 +395,14 @@ Colocated Deployment
 Overview
 --------
 During colocated ``Orchestrator`` deployment, a SmartSim ``Orchestrator`` (the database) runs on
-the ``Models`` compute node(s). A colocated ``Orchestrator`` can only be deployed on a single node
-and cannot be sharded (distributed) over multiple nodes. The database on each application node is
+the ``Model``'s compute node(s). Colocated ``Orchestrators`` can only be deployed as isolated instances
+on each compute node and cannot be clustered over multiple nodes. The database on each application node is
 utilized by SmartRedis clients on the same node. With a colocated ``Orchestrator``, latency is reduced
 in ML inference and TorchScript evaluation by eliminating off-node communication.
A colocated ``Orchestrator`` is ideal when the data and hardware accelerator are located on the same compute node. Communication between a colocated ``Orchestrator`` and ``Model`` -is initiated in the application script through a SmartRedis client. Since a colocated ``Orchestrator`` is launched when the ``Model`` +is initiated in the application through a SmartRedis client. Since a colocated ``Orchestrator`` is launched when the ``Model`` is started by the experiment, connecting a SmartRedis ``Client`` to a colocated database is only possible from within the associated ``Model`` application. @@ -432,7 +431,7 @@ off-node to be used to evaluate a ML model, and the results of the ML model eval are stored on-node. If a workflow requires an application to both leverage colocated -deployment and clustered deployment, multiple clients can be instantiated within an application, +deployment and standalone deployment, multiple clients can be instantiated within an application, with each client connected to a unique deployment. This is accomplished through the use of the `db-identifier` specified at Orchestrator initialization time. @@ -440,9 +439,9 @@ with each client connected to a unique deployment. This is accomplished through Example ------- In the following example, we demonstrate deploying a colocated ``Orchestrator``. -Once the database is launched, we walk through connecting a SmartRedis ``Client`` +Once the ``Orchestrator`` is launched, we walk through connecting a SmartRedis ``Client`` from within the application script to transmit data then poll for the existence of the data -on the database. +on the ``Orchestrator``. The example is comprised of two script files: @@ -479,15 +478,15 @@ single-sharded: colo_client = Client(cluster=False) .. 
note::
-    Since there is only one database launched in the Experiment
-    (the colocated database), specifying a a database address
+    Since there is only one ``Orchestrator`` launched in the Experiment
+    (the colocated ``Orchestrator``), specifying an ``Orchestrator`` address
     is not required when initializing the client. SmartRedis will
     handle the connection configuration.
 
 .. note::
-    To create a client connection to the colocated database, the colocated Model must be launched
+    To create a client connection to the colocated ``Orchestrator``, the colocated ``Model`` must be launched
     from within the driver script. You must execute the Python driver script, otherwise, there will
-    be no database to connect the client to.
+    be no ``Orchestrator`` to connect the client to.
 
 Data Storage
 ------------
@@ -497,24 +496,24 @@ Next, using the SmartRedis client instance, we create and store a NumPy tensor t
 
 .. code-block:: python
 
     # Create NumPy array
-    array_1 = np.array([1, 2, 3, 4])
+    local_array = np.array([1, 2, 3, 4])
     # Store the NumPy tensor
-    colo_client.put_tensor("tensor_1", array_1)
+    colo_client.put_tensor("tensor_1", local_array)
 
 We will retrieve `"tensor_1"` in the following section.
 
 Data Retrieval
 --------------
-To confirm a successful connection to the database, we retrieve the tensor we stored.
+To confirm a successful connection to the ``Orchestrator``, we retrieve the tensor we stored.
 Use the ``Client.get_tensor()`` method to retrieve the tensor by specifying the name `"tensor_1"`:
 
 ..
code-block:: python # Retrieve tensor from driver script - value_1 = colo_client.get_tensor("tensor_1") + local_tensor = colo_client.get_tensor("tensor_1") # Log tensor - colo_client.log_data(LLInfo, f"The colocated db tensor is: {value_1}") + colo_client.log_data(LLInfo, f"The colocated db tensor is: {local_tensor}") When the Experiment completes, you can find the following log message in `colo_model.out`:: @@ -539,7 +538,7 @@ and set up the SmartSim `logger` to output information from the ``Experiment`` a from smartsim.log import get_logger import sys - # returns the executable binary for the Python interpreter + # Returns the executable binary for the Python interpreter exe_ex = sys.executable # Initialize a logger object logger = get_logger("Example Experiment Log") @@ -629,9 +628,9 @@ When you run the experiment, the following output will appear:: Multiple Orchestrators ====================== SmartSim supports automating the deployment of multiple Orchestrators -from within an Experiment. Communication with the database via a SmartRedis client is possible with the +from within an Experiment. Communication with the ``Orchestrator`` via a SmartRedis client is possible with the `db_identifier` argument that is required when initializing an ``Orchestrator`` or -colocated ``Model`` during a multiple database experiment. When initializing a SmartRedis +colocated ``Model`` during a multiple Orchestrator experiment. When initializing a SmartRedis client during the Experiment, create a ``ConfigOptions`` object to specify the `db_identifier` argument used when creating the ``Orchestrator``. Pass the ``ConfigOptions`` object to the Client() init call. @@ -645,7 +644,7 @@ databases, supporting workloads that require multiple ``Orchestrators`` for a ``Experiment``. 
For instance, a workload may consist of a simulation with high inference performance demands (necessitating a co-located deployment), along with an analysis and -visualization workflow connected to the simulation (requiring a standard orchestrator). +visualization workflow connected to the simulation (requiring a standalone orchestrator). In the following example, we simulate a simple version of this use case. The example is comprised of two script files: @@ -659,16 +658,16 @@ contains instructions to complete computational tasks. Applications are not limited to Python and can also be written in C, C++ and Fortran. This script specifies creating a Python SmartRedis client for each -standard orchestrator and a colocated orchestrator. We use the -clients to request data from both standard databases, then -transfer the data to the colocated database. The application +standalone orchestrator and a colocated orchestrator. We use the +clients to request data from both standalone ``Orchestrators``, then +transfer the data to the colocated ``Orchestrator``. The application file is launched by the experiment driver script through a ``Model`` stage. **The Application Script Contents:** 1. Connecting SmartRedis clients within the application to retrieve tensors - from the standard databases to store in a colocated database. Details in section: + from the standalone ``Orchestrators`` to store in a colocated ``Orchestrator``. Details in section: :ref:`Initialize the Clients`. **The Experiment Driver Script Overview:** @@ -683,11 +682,11 @@ runs the application. **The Experiment Driver Script Contents:** -1. Launching two standard Orchestrators with unique identifiers. Details in section: +1. Launching two standalone Orchestrators with unique identifiers. Details in section: :ref:`Launch Multiple Orchestrators`. -2. Launching the application script with a co-located database. Details in section: +2. Launching the application script with a co-located ``Orchestrator``. 
Details in section: :ref:`Initialize a Colocated Model`. -3. Connecting SmartRedis clients within the driver script to send tensors to standard Orchestrators +3. Connecting SmartRedis clients within the driver script to send tensors to standalone Orchestrators for retrieval within the application. Details in section: :ref:`Create Client Connections to Orchestrators`. @@ -695,14 +694,14 @@ Setup and run instructions can be found :ref:`here` The Application Script ====================== -Applications interact with the databases +Applications interact with the ``Orchestrators`` through a SmartRedis client. In this section, we write an application script to demonstrate how to connect SmartRedis clients in the context of multiple -launched databases. Using the clients, we retrieve tensors -from two databases launched in the driver script, then store -the tensors in the colocated database. +launched ``Orchestrators``. Using the clients, we retrieve tensors +from two ``Orchestrators`` launched in the driver script, then store +the tensors in the colocated ``Orchestrator``. .. note:: The Experiment must be started to use the Orchestrators within the @@ -718,37 +717,36 @@ To begin, import the necessary packages: Initialize the Clients ---------------------- -To establish a connection with each database, -we need to initialize a new SmartRedis client for each -``Orchestrator``. +To establish a connection with each ``Orchestrator``, +we need to initialize a new SmartRedis client for each. Step 1: Initialize ConfigOptions '''''''''''''''''''''''''''''''' -Since we are launching multiple databases within the experiment, +Since we are launching multiple ``Orchestrators`` within the experiment, the SmartRedis ``ConfigOptions`` object is required when initializing a client in the application. We use the ``ConfigOptions.create_from_environment()`` function to create three instances of ``ConfigOptions``, with one instance associated with each launched ``Orchestrator``.
Most importantly, to associate each launched Orchestrator to a ConfigOptions object, -the ``create_from_environment()`` function requires specifying the unique database identifier +the ``create_from_environment()`` function requires specifying the unique ``Orchestrator`` identifier argument named `db_identifier`. -For the single-sharded database: +For the single-sharded ``Orchestrator``: .. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py :language: python :linenos: :lines: 5-6 -For the multi-sharded database: +For the multi-sharded ``Orchestrator``: .. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py :language: python :linenos: :lines: 10-11 -For the colocated database: +For the colocated ``Orchestrator``: .. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py :language: python @@ -759,25 +757,25 @@ Step 2: Initialize the Client Connections ''''''''''''''''''''''''''''''''''''''''' Now that we have three ``ConfigOptions`` objects, we have the tools necessary to initialize three SmartRedis clients and -establish a connection with the three databases. +establish a connection with the three ``Orchestrators``. We use the SmartRedis ``Client`` API to create the client instances by passing in the ``ConfigOptions`` objects and assigning a `logger_name` argument. -Single-sharded database: +Single-sharded ``Orchestrator``: .. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py :language: python :linenos: :lines: 7-8 -Multi-sharded database: +Multi-sharded ``Orchestrator``: .. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py :language: python :linenos: :lines: 12-13 -Colocated database: +Colocated ``Orchestrator``: .. 
literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py :language: python @@ -786,12 +784,12 @@ Colocated database: Retrieve Data and Store Using SmartRedis Client Objects ------------------------------------------------------- -To confirm a successful connection to each database, we will retrieve the tensors +To confirm a successful connection to each ``Orchestrator``, we will retrieve the tensors that we plan to store in the python driver script. After retrieving, we -store both tensors in the colocated database. +store both tensors in the colocated ``Orchestrator``. The ``Client.get_tensor()`` method allows retrieval of a tensor. It requires the `name` of the tensor assigned -when sent to the database via ``Client.put_tensor()``. +when sent to the ``Orchestrator`` via ``Client.put_tensor()``. .. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py :language: python @@ -806,8 +804,8 @@ located in ``getting-started-multidb/tutorial_model/``:: This output showcases that we have established a connection with multiple Orchestrators. -Next, take the tensors retrieved from the standard deployment databases and -store them in the colocated database using ``Client.put_tensor(name, data)``. +Next, take the tensors retrieved from the standalone ``Orchestrators`` and +store them in the colocated ``Orchestrator`` using ``Client.put_tensor(name, data)``. ..
literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py :language: python From 1be6241649090016ec81f3da1a10b7e3b5a0d44b Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Fri, 12 Jan 2024 15:06:59 -0600 Subject: [PATCH 21/26] address final comments --- doc/installation_instructions/basic.rst | 1 + doc/orchestrator.rst | 423 +++++++++++++----------- 2 files changed, 225 insertions(+), 199 deletions(-) diff --git a/doc/installation_instructions/basic.rst b/doc/installation_instructions/basic.rst index 3874eb961..556bb3d11 100644 --- a/doc/installation_instructions/basic.rst +++ b/doc/installation_instructions/basic.rst @@ -1,3 +1,4 @@ +.. _basic_install_SS: ****************** Basic Installation ****************** diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index 2e8661ab6..718c8e492 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -4,45 +4,45 @@ Orchestrator ======== Overview ======== -The ``Orchestrator`` is an in-memory database with features built for +The orchestrator is an in-memory database with features built for AI-enabled workflows including online training, low-latency inference, cross-application data exchange, online interactive visualization, online data analysis, computational steering, and more. -An ``Orchestrator`` can be thought of as a general feature store +An orchestrator can be thought of as a general feature store capable of storing numerical data (Tensors and Datasets), AI Models (TF, TF-lite, PyTorch, or ONNX), -and scripts (TorchScripts). In addition to storing data, the ``Orchestrator`` is capable of +and scripts (TorchScripts). In addition to storing data, the orchestrator is capable of executing ML models and TorchScripts on the stored data using CPUs or GPUs. .. 
figure:: images/smartsim-arch.png Sample experiment showing a user application leveraging machine learning infrastructure launched by SmartSim and connected - to online analysis and visualization via the ``Orchestrator``. + to online analysis and visualization via the orchestrator. -Users can establish a connection to the ``Orchestrator`` from within SmartSim ``Model`` executable code, ``Ensemble`` +Users can establish a connection to the orchestrator from within SmartSim ``Model`` executable code, ``Ensemble`` model executable code, or driver scripts using the :ref:`SmartRedis` client library. -SmartSim offers two types of ``Orchestrator`` deployments: +SmartSim offers **two** types of orchestrator deployments: - :ref:`Standalone Deployment` - A standalone ``Orchestrator`` is ideal for systems that have heterogeneous node types + A standalone orchestrator is ideal for systems that have heterogeneous node types (i.e. a mix of CPU-only and GPU-enabled compute nodes) where - ML model and TorchScript evaluation is more efficiently performed off-node for a ``Model``. This + ML model and TorchScript evaluation is more efficiently performed off-node for a model. This deployment is also ideal for workflows relying on data exchange between multiple applications (e.g. online analysis, visualization, computational steering, or producer/consumer application couplings). Standalone deployment is also optimal for high data throughput scenarios with databases that require a large amount of hardware. - :ref:`Colocated Deployment` - A colocated ``Orchestrator`` is ideal when the data and hardware accelerator are located on the same compute node. + A colocated orchestrator is ideal when the data and hardware accelerator are located on the same compute node. This setup helps reduce latency in ML inference and TorchScript evaluation by eliminating off-node communication. 
-SmartSim allows users to launch **multiple orchestrators** during the course of an experiment of -either deployment type. If a workflow requires a multiple ``Orchestrator`` environment, a +SmartSim allows users to launch :ref:`multiple orchestrators` during the course of an experiment of +either orchestrator deployment type. If a workflow requires a multiple orchestrator environment, a `db_identifier` argument must be specified during ``Orchestrator`` initialization. Users can connect to -``Orchestrators`` in a parallel database workflow by specifying the respective `db_identifier` argument +orchestrators in a parallel database workflow by specifying the respective `db_identifier` argument within a ``ConfigOptions`` object to pass in to the SmartRedis ``Client`` constructor. The client can then be used to transmit data, -execute ML models, and execute scripts on the linked ``Orchestrator``. +execute ML models, and execute scripts on the linked orchestrator. .. _standalone_orch_doc: ===================== @@ -51,62 +51,62 @@ Standalone Deployment -------- Overview -------- -During standalone ``Orchestrator`` deployment, a SmartSim ``Orchestrator`` (the database) runs on separate -compute node(s) from the ``Model`` node(s). A standalone ``Orchestrator`` can be deployed on a single -node (standalone) or sharded (distributed) over multiple nodes. With a sharded ``Orchestrator``, users can +During standalone orchestrator deployment, a SmartSim orchestrator (the database) runs on separate +compute node(s) from the model node(s). A standalone orchestrator can be deployed on a single +node (standalone) or sharded (distributed) over multiple nodes. With a sharded orchestrator, users can scale the number of database nodes for inference and script evaluation, contributing to an increased in-memory capacity for data storage in large-scale workflows. 
Standalone -``Orchestrators`` are effective for small-scale workflows and offer lower latency since -single-node ``Orchestrators`` don't involve communication between nodes. +orchestrators are effective for small-scale workflows and offer lower latency since +single-node orchestrators don't involve communication between nodes. -Communication between a standalone ``Orchestrator`` and ``Model`` -is facilitated by a SmartRedis ``Client`` and initialized in a ``Model`` application. +Communication between a standalone orchestrator and a SmartSim model +is facilitated by a SmartRedis client and initialized in a SmartSim model application. -When connecting to a standalone ``Orchestrator`` from within a ``Model`` application, the user has -several options when using the SmartRedis ``Client``: +When connecting to a standalone orchestrator from within a model application, the user has +several options when using the SmartRedis client: -- In an experiment with a single deployed ``Orchestrator``, users can rely on SmartSim - to detect the ``Orchestrator`` address through runtime configuration of the ``Model`` environment. +- In an experiment with a single deployed orchestrator, users can rely on SmartSim + to detect the orchestrator address through runtime configuration of the SmartSim model environment. A default ``Client`` constructor, with no user-specified parameters, is sufficient to - connect to the ``Orchestrator``. The only exception is for the Python `client`, which requires + connect to the orchestrator. The only exception is for the Python `client`, which requires the `cluster` constructor parameter to differentiate between a multi-node standalone deployment and a single-node standalone deployment.
-- In an experiment with multiple ``Orchestrator`` deployments, users can connect to a specific ``Orchestrator`` by +- In an experiment with multiple orchestrator deployments, users can connect to a specific orchestrator by first specifying the `db_identifier` in the ``ConfigOptions`` constructor. Subsequently, users should pass the ``ConfigOptions`` instance to the ``Client`` constructor. - Users can specify or override automatically configured connection options by providing the - ``Orchestrator`` address in the ``ConfigOptions`` object. Subsequently, users should pass the ``ConfigOptions`` + orchestrator address in the ``ConfigOptions`` object. Subsequently, users should pass the ``ConfigOptions`` instance to the ``Client`` constructor. -If connecting to a standalone ``Orchestrator`` from a SmartSim driver script, the user must specify -the address of the ``Orchestrator`` via the ``Client`` constructor. SmartSim does not automatically -configure the environment of the driver script to connect to an ``Orchestrator``. Users -can access an ``Orchestrators`` address through ``Orchestrator.get_address()``. +If connecting to a standalone orchestrator from a SmartSim driver script, the user must specify +the address of the orchestrator via the ``Client`` constructor. SmartSim does not automatically +configure the environment of the driver script to connect to an orchestrator. Users +can access an orchestrator's address through ``Orchestrator.get_address()``. .. note:: - In ``Model`` applications, it is advisable to **avoid** specifying addresses directly to the ``Client`` constructor. - Utilizing the SmartSim environment configuration for SmartRedis `client` connections - allows the ``Model`` application code to remain unchanged even as ``Orchestrator`` deployment + In SmartSim model applications, it is advisable to **avoid** specifying addresses directly to the ``Client`` constructor.
+ Utilizing the SmartSim environment configuration for SmartRedis client connections + allows the SmartSim model application code to remain unchanged even as orchestrator deployment options vary. The following image illustrates -communication between a standalone ``Orchestrator`` and a -``Model``. In the diagram, the application is running on multiple compute nodes, -separate from the ``Orchestrator`` compute nodes. Communication is established between the -``Model`` application and the sharded ``Orchestrator`` using the :ref:`SmartRedis Client` Client. +communication between a standalone orchestrator and a +SmartSim model. In the diagram, the application is running on multiple compute nodes, +separate from the orchestrator compute nodes. Communication is established between the +SmartSim model application and the sharded orchestrator using the :ref:`SmartRedis client`. .. figure:: images/clustered_orchestrator-1.png - Sample Standalone ``Orchestrator`` Deployment + Sample Standalone Orchestrator Deployment .. note:: Users do not need to know how the data is stored in a standalone configuration and can address the cluster with the SmartRedis clients like a single block of memory using simple put/get semantics in SmartRedis. -In scenarios with high data throughput, such as online analysis, training, and processing, a standalone ``Orchestrator`` -is optimal. The data produced by multiple processes in a ``Model`` is stored in the standalone -``Orchestrator`` and is available for consumption by other ``Models``. +In scenarios with high data throughput, such as online analysis, training, and processing, a standalone orchestrator +is optimal. The data produced by multiple processes in a SmartSim model is stored in the standalone +orchestrator and is available for consumption by other SmartSim models.
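The put/get semantics noted above can be sketched with a small in-memory stand-in. ``ToyStore`` below is hypothetical plain Python that only mimics the shape of the SmartRedis ``Client`` tensor methods; it is not the actual client:

```python
import time

class ToyStore:
    """Hypothetical in-memory stand-in mimicking SmartRedis put/get/poll semantics."""

    def __init__(self):
        self._data = {}

    def put_tensor(self, name, data):
        # Store data under a string key, like Client.put_tensor(name, data)
        self._data[name] = data

    def get_tensor(self, name):
        # Retrieve data by key, like Client.get_tensor(name)
        return self._data[name]

    def poll_tensor(self, name, poll_frequency_ms, num_tries):
        # Query repeatedly until the key appears or attempts run out
        for _ in range(num_tries):
            if name in self._data:
                return True
            time.sleep(poll_frequency_ms / 1000.0)
        return False

# One producer stores a tensor; a consumer polls for it, then retrieves it
store = ToyStore()
store.put_tensor("tensor_1", [1, 2, 3, 4])
found = store.poll_tensor("tensor_1", poll_frequency_ms=10, num_tries=10)
print(found, store.get_tensor("tensor_1"))
```

In a real deployment the store lives in the Redis-backed orchestrator, so a producer and a consumer on different compute nodes can exchange data through the same keys.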
If a workflow requires an application to leverage multiple standalone deployments, multiple clients can be instantiated within an application, @@ -116,24 +116,24 @@ with each client connected to a unique deployment. This is accomplished through ------- Example ------- -In the following example, we demonstrate deploying a standalone ``Orchestrator``. -Once the standalone ``Orchestrator`` is launched from the driver script, we walk through -connecting a SmartRedis ``Client`` to the ``Orchestrator`` from within the ``Model`` +In the following example, we demonstrate deploying a standalone orchestrator on an HPC System. +Once the standalone orchestrator is launched from the driver script, we walk through +connecting a SmartRedis client to the orchestrator from within the SmartSim model script to transmit data then poll for the existence of the data. The example is comprised of two script files: - :ref:`Application Script` The application script is a Python file that contains instructions to create a SmartRedis - client connection to the standalone ``Orchestrator`` launched in the driver script. + client connection to the standalone orchestrator launched in the driver script. To demonstrate the ability of workflow components to access data from other entities, we then retrieve the tensors set by the driver script using a SmartRedis client in the application script. We then instruct the client to send and retrieve data from within the application script. - :ref:`Experiment Driver Script` The experiment driver script is responsible for launching and managing SmartSim entities. Within this script, - we use the Experiment API to create and launch a standalone ``Orchestrator``. To demonstrate the capability of - ``Model`` applications to access ``Orchestrator`` data sent from other sources, we employ the SmartRedis ``Client`` in - the driver script to store a tensor in the ``Orchestrator``, which is later retrieved by the ``Model``. 
+ we use the Experiment API to create and launch a standalone orchestrator. To demonstrate the capability of + SmartSim model applications to access orchestrator data sent from other sources, we employ the SmartRedis ``Client`` in + the driver script to store a tensor in the orchestrator, which is later retrieved by the SmartSim model. To employ the application script, we initialize a ``Model`` object with the application script as an executable argument, launch the ``Orchestrator``, and then launch the ``Model``. @@ -154,13 +154,13 @@ To begin writing the application script, import the necessary SmartRedis package Client Initialization --------------------- -To establish a connection with the ``Orchestrator``, we need to initialize a new SmartRedis client. +To establish a connection with the orchestrator, we need to initialize a new SmartRedis client. Since the ``Orchestrator`` we launch in the driver script is sharded, we specify the constructor argument `cluster` as `True`. .. note:: Note that the C/C++/Fortran SmartRedis clients are capable of reading cluster configurations - from the ``Model`` environment and the `cluster` constructor argument does not need to be specified + from the SmartSim model environment and the `cluster` constructor argument does not need to be specified in those client languages. .. code-block:: python @@ -169,25 +169,25 @@ constructor argument `cluster` as `True`. application_client = Client(cluster=True) .. note:: - Since there is only one ``Orchestrator`` launched in the Experiment - (the standalone ``Orchestrator``), specifying a ``Orchestrator`` address - is not required when initializing the client. + Since there is only one orchestrator launched in the experiment + (the standalone orchestrator), specifying an orchestrator address + is not required when initializing the SmartRedis client. SmartRedis will handle the connection configuration. ..
note:: - To create a client connection to the standalone ``Orchestrator``, the standalone ``Orchestrator`` must be launched + To create a SmartRedis client connection to the standalone orchestrator, the standalone orchestrator must be launched from within the driver script. You must execute the Python driver script, otherwise, there will - be no ``Orchestrator`` to connect the client to. + be no orchestrator to connect the client to. Data Retrieval -------------- -To confirm a successful connection to the ``Orchestrator``, we retrieve the tensor we set from the Python driver script. +To confirm a successful connection to the orchestrator, we retrieve the tensor we set from the Python driver script. Use the ``Client.get_tensor()`` method to retrieve the tensor by specifying the name `tensor_1` we used in the driver script as input to ``Client.put_tensor()``: .. code-block:: python - # Retrieve tensor from Orchestrator + # Retrieve tensor from orchestrator driver_script_tensor = application_client.get_tensor("tensor_1") # Log tensor application_client.log_data(LLInfo, f"The single sharded db tensor is: {driver_script_tensor}") @@ -199,7 +199,7 @@ located in ``getting-started/tutorial_model/``:: Data Storage ------------ -Next, create a NumPy tensor to send to the standalone ``Orchestrator`` using +Next, create a NumPy tensor to send to the standalone orchestrator using ``Client.put_tensor(name, data)``: .. code-block:: python @@ -219,8 +219,11 @@ Python driver script. Configuring and launching workflow entities (``Model`` and ``Experiment`` class methods. The ``Experiment`` object is intended to be instantiated once and utilized throughout the workflow runtime. -In this example, we instantiate an ``Experiment`` object with the name `getting-started`, and we -setup the SmartSim `logger` to output information from the ``Experiment`` at runtime: +In this example, we instantiate an ``Experiment`` object with the name `getting-started` +and the `launcher` set to `auto`. 
When using `launcher=auto`, SmartSim attempts to find a launcher on the machine. +In this case, since we are running the example on a Slurm-based machine, +SmartSim will automatically set the launcher to `slurm`. +We set up the SmartSim `logger` to output information from the ``Experiment`` at runtime: .. code-block:: python @@ -239,42 +242,42 @@ setup the SmartSim `logger` to output information from the ``Experiment`` at run Orchestrator Deployment ----------------------- -In the context of this ``Experiment``, it's essential to create and launch -the ``Orchestrator`` as a preliminary step before any other workflow entities. This is because -in this example the application script requests and sends tensors to and from a launched ``Orchestrator``. +In the context of this experiment, it's essential to create and launch +the orchestrator as a preliminary step before any other workflow entities. This is because +in this example the application script requests and sends tensors to and from a launched orchestrator. -In the next stage of the experiment, we create and launch a standalone ``Orchestrator``. +In the next stage of the experiment, we create and launch a standalone orchestrator. Step 1: Initialize '''''''''''''''''' -To create a standalone ``Orchestrator``, utilize the ``Experiment.create_database()`` function. +To create a standalone orchestrator, utilize the ``Experiment.create_database()`` function. .. code-block:: python - # Initialize a multi-sharded database + # Initialize a multi-sharded Orchestrator standalone_orchestrator = exp.create_database(db_nodes=3) Step 2: Start ''''''''''''' -Next, to launch the ``Orchestrator``, pass the ``Orchestrator`` instance to ``Experiment.start()``. +Next, to launch the orchestrator, pass the ``Orchestrator`` instance to ``Experiment.start()``. ..
code-block:: python - # Launch the multi sharded orchestrator + # Launch the multi-sharded Orchestrator exp.start(standalone_orchestrator) -The ``Experiment.start()`` function launches the ``Orchestrator`` for use within the workflow. +The ``Experiment.start()`` function launches the orchestrator for use within the workflow. In other words, the function deploys the ``Orchestrator`` on the allocated compute resources. Client Initialization --------------------- The SmartRedis ``Client`` object contains functions that manipulate, send, and retrieve -data on the ``Orchestrator``. Begin by initializing a SmartRedis ``Client`` object for the standalone ``Orchestrator``. +data on the orchestrator. Begin by initializing a SmartRedis ``Client`` object for the standalone orchestrator. SmartRedis clients in driver scripts do not have the ability to use a `db-identifier` or -rely on automatic configurations to connect to ``Orchestrators``. Therefore, when creating a client -connection from within a driver script, specify the address of the ``Orchestrator`` you would like to connect to. -You can easily retrieve the ``Orchestrator`` address using the ``Orchestrator.get_address()`` function: +rely on automatic configurations to connect to orchestrators. Therefore, when creating a SmartRedis client +connection from within a driver script, specify the address of the orchestrator you would like to connect to. +You can easily retrieve the orchestrator address using the ``Orchestrator.get_address()`` function: .. code-block:: python @@ -285,8 +288,8 @@ Data Storage ------------ In the application script, we retrieved a NumPy tensor stored from within the driver script. To support the application functionality, we create a -NumPy array in the experiment workflow to send to the ``Orchestrator``.
To + send a tensor to the orchestrator, use the function ``Client.put_tensor(name, data)``: .. code-block:: python @@ -302,7 +305,7 @@ a SmartSim ``Model`` and specifying the application script name during ``Model`` Step 1: Configure ''''''''''''''''' -In this example experiment, the ``Model`` application is a Python script as defined in section: +In this example experiment, the model application is a Python script as defined in section: :ref:`Application Script`. Before initializing the ``Model`` object, we must use ``Experiment.create_run_settings()`` to create a ``RunSettings`` object that defines how to execute the ``Model``. To launch the Python script in this example workflow, we specify the path to the application @@ -347,15 +350,15 @@ Next, launch the `model` instance using the ``Experiment.start()`` function: .. note:: We specify `block=True` to ``exp.start()`` because our experiment requires that the ``Model`` finish before the experiment continues. - This is because we will request tensors from the ``Orchestrator`` that + This is because we will request tensors from the orchestrator that are inputted by the ``Model`` we launched. Data Polling ------------ -Next, check if the tensor exists in the standalone ``Orchestrator`` using ``Client.poll_tensor()``. -This function queries for data in the ``Orchestrator``. The function requires the tensor name (`name`), +Next, check if the tensor exists in the standalone orchestrator using ``Client.poll_tensor()``. +This function queries for data in the orchestrator. The function requires the tensor name (`name`), how many milliseconds to wait in between queries (`poll_frequency_ms`), -and the total number of times to query (`num_tries`). Check if the data exists in the ``Orchestrator`` by +and the total number of times to query (`num_tries`). Check if the data exists in the orchestrator by polling every 100 milliseconds until 10 attempts have completed: ..
code-block:: python @@ -371,7 +374,7 @@ When you execute the driver script, the output will be as follows:: Cleanup ------- -Finally, use the ``Experiment.stop()`` function to stop the ``Orchestrator`` instances. Print the +Finally, use the ``Experiment.stop()`` function to stop the ``Orchestrator`` instance. Print the workflow summary with ``Experiment.summary()``: .. code-block:: python @@ -394,35 +397,40 @@ Colocated Deployment -------- Overview -------- -During colocated ``Orchestrator`` deployment, a SmartSim ``Orchestrator`` (the database) runs on -the ``Models`` compute node(s). Colocated ``Orchestrators`` can only be deployed as isolated instances -on each compute node and cannot be clustered over multiple nodes. The database on each application node is -utilized by SmartRedis clients on the same node. With a colocated ``Orchestrator``, latency is reduced -in ML inference and TorchScript evaluation by eliminating off-node communication. A colocated ``Orchestrator`` +During colocated orchestrator deployment, a SmartSim orchestrator (the database) runs on +the model's compute node(s). Colocated orchestrators can only be deployed as isolated instances +on each compute node and cannot be clustered over multiple nodes. The orchestrator on each application node is +utilized by SmartRedis clients on the same node. With a colocated orchestrator, latency is reduced +in ML inference and TorchScript evaluation by eliminating off-node communication. A colocated orchestrator is ideal when the data and hardware accelerator are located on the same compute node. -Communication between a colocated ``Orchestrator`` and ``Model`` -is initiated in the application through a SmartRedis client. Since a colocated ``Orchestrator`` is launched when the ``Model`` -is started by the experiment, connecting a SmartRedis ``Client`` to a colocated database is only possible from within -the associated ``Model`` application.
+Communication between a colocated orchestrator and a SmartSim model +is initiated in the application through a SmartRedis client. Since a colocated orchestrator is launched when the SmartSim model +is started by the experiment, connecting a SmartRedis client to a colocated orchestrator is only possible from within +the associated SmartSim model application. -The client can establish a connection with a colocated ``Orchestrator`` through **three** processes: +There are **three** methods for connecting the SmartRedis client to the colocated orchestrator: -- In an experiment with a single deployed ``Orchestrator``, users can rely on SmartSim - to detect the database address through the ``Model`` environment configuration - at runtime. -- Users can provide the database address in the ``Client`` constructor within the ``Model`` application script. -- In an experiment with multiple ``Orchestrator`` deployments, a user can connect to an ``Orchestrator`` by - first specifying the `db_identifier` to the ``ConfigOptions`` constructor. A user should then pass the ``ConfigOptions`` + +- In an experiment with a single deployed orchestrator, users can rely on SmartSim + to detect the orchestrator address through runtime configuration of the SmartSim model environment. + A default ``Client`` constructor, with no user-specified parameters, is sufficient to + connect to the orchestrator. The only exception is for the Python `client`, which requires + the `cluster=False` constructor parameter for the colocated orchestrator. +- In an experiment with multiple orchestrator deployments, users can connect to a specific orchestrator by + first specifying the `db_identifier` in the ``ConfigOptions`` constructor. Subsequently, users should pass the + ``ConfigOptions`` instance to the ``Client`` constructor. +- Users can specify or override automatically configured connection options by providing the + orchestrator address in the ``ConfigOptions`` object.
Subsequently, users should pass the ``ConfigOptions`` instance to the ``Client`` constructor. -Below is an image illustrating communication within a colocated ``Model`` spanning multiple compute nodes. +Below is an image illustrating communication within a colocated SmartSim model spanning multiple compute nodes. As demonstrated in the diagram, each process of the application creates its own SmartRedis client connection to the orchestrator running on the same host. .. figure:: images/colocated_orchestrator-1.png - Sample Colocated ``Orchestrator`` Deployment + Sample Colocated Orchestrator Deployment Colocated deployment is ideal for highly performant online inference scenarios where a distributed application (likely an MPI application) is performing inference with @@ -433,27 +441,27 @@ are stored on-node. If a workflow requires an application to both leverage colocated deployment and standalone deployment, multiple clients can be instantiated within an application, with each client connected to a unique deployment. This is accomplished through the use of the -`db-identifier` specified at Orchestrator initialization time. +`db-identifier` specified at ``Orchestrator`` initialization time. ------- Example ------- -In the following example, we demonstrate deploying a colocated ``Orchestrator``. -Once the ``Orchestrator`` is launched, we walk through connecting a SmartRedis ``Client`` +In the following example, we demonstrate deploying a colocated orchestrator on an HPC System. +Once the orchestrator is launched, we walk through connecting a SmartRedis client from within the application script to transmit data then poll for the existence of the data -on the ``Orchestrator``. +on the orchestrator. The example is comprised of two script files: - :ref:`Application Script` The application script is a Python script that connects a SmartRedis - client to the colocated ``Orchestrator``. From within the application script, + client to the colocated orchestrator. 
From within the application script,
  the client is utilized to both send and retrieve data.
- :ref:`Experiment Driver Script`
  The experiment driver script launches and manages
-  the example entities with the ``Experiment`` API.
-  In the driver script, we use the ``Experiment`` API
-  to create and launch a colocated ``Model``.
+  the example entities with the Experiment API.
+  In the driver script, we use the Experiment API
+  to create and launch a colocated model.

.. _colocated_orch_app_script:

Application Script
==================
To begin writing the application script, import the necessary SmartRedis packages:

@@ -468,9 +476,14 @@ To begin writing the application script, import the necessary SmartRedis package

 Client Initialization
 ---------------------
-To establish a connection with the colocated ``Orchestrator``, we need to initialize a
-new SmartRedis client and specify `cluster=False` since colocated deployments are always
-single-sharded:
+To establish a connection with the colocated orchestrator, we need to initialize a
+new SmartRedis client and specify `cluster=False` since colocated deployments are
+always single-sharded and never clustered.
+
+.. note::
+   The C/C++/Fortran SmartRedis clients are capable of reading cluster configurations
+   from the model environment, so the `cluster` constructor argument does not need to be specified
+   in those client languages.

 .. code-block:: python

@@ -478,13 +491,13 @@ single-sharded:

     colo_client = Client(cluster=False)

 .. note::
-   Since there is only one ``Orchestrator`` launched in the Experiment
-   (the colocated ``Orchestrator``), specifying a orchestrator address
+   Since there is only one orchestrator launched in the Experiment
+   (the colocated orchestrator), specifying an orchestrator address
    is not required when initializing the client. SmartRedis will handle
    the connection configuration.

..
note:: - To create a client connection to the colocated ``Orchestrator``, the colocated Model must be launched + To create a client connection to the colocated orchestrator, the colocated model must be launched from within the driver script. You must execute the Python driver script, otherwise, there will be no orchestrator to connect the client to. @@ -504,7 +517,7 @@ We will retrieve `“tensor_1”` in the following section. Data Retrieval -------------- -To confirm a successful connection to the ``Orchestrator``, we retrieve the tensor we stored. +To confirm a successful connection to the orchestrator, we retrieve the tensor we stored. Use the ``Client.get_tensor()`` method to retrieve the tensor by specifying the name `“tensor_1”`: @@ -527,8 +540,11 @@ Experiment driver script. Configuring and launching workflow entities (``Model`` requires the utilization of ``Experiment`` class methods. The ``Experiment`` object is intended to be instantiated once and utilized throughout the workflow runtime. -In this example, we instantiate an ``Experiment`` object with the `name` `"getting-started"`, -and set up the SmartSim `logger` to output information from the ``Experiment`` at runtime: +In this example, we instantiate an ``Experiment`` object with the name `getting-started` +and the `launcher` set to `auto`. When using `launcher=auto`, SmartSim attempts to find a launcher on the machine. +In this case, since we are running the example on a Slurm-based machine, +SmartSim will automatically set the launcher to `slurm`. We set up the SmartSim `logger` +to output information from the experiment at runtime: .. 
code-block:: python

@@ -548,13 +564,13 @@ and set up the SmartSim `logger` to output information from the ``Experiment`` a

 Colocated Model Initialization
 ------------------------------
 In the next stage of the experiment, we
-create and launch a colocated ``Model`` that
-runs the application script with a ``Orchestrator``
+create and launch a colocated model that
+runs the application script with an orchestrator
 on the same compute node.

 Step 1: Configure
 '''''''''''''''''
-In this example experiment, the ``Model`` application is a Python script as defined in section:
+In this example experiment, the model application is a Python script as defined in section:
 :ref:`Application Script`. Before initializing the ``Model`` object, we must use
 ``Experiment.create_run_settings()`` to create a ``RunSettings`` object that defines how to execute
 the ``Model``. To launch the Python script in this example workflow, we specify the path to the application

@@ -601,7 +617,7 @@ a Unix domain socket connection.

 Step 4: Start
 '''''''''''''
-Next, launch the colocated model instance using the ``Experiment.start()`` function.
+Next, launch the colocated ``Model`` instance using the ``Experiment.start()`` function.

 .. code-block:: python

@@ -611,8 +627,8 @@ Next, launch the colocated model instance using the ``Experiment.start()`` funct

 Cleanup
 -------
 .. note::
-   Since the colocated ``Orchestrator`` is automatically torn down by SmartSim once the colocated ``Model``
-   has finished, we do not need to `stop` the ``Orchestrator``.
+   Since the colocated orchestrator is automatically torn down by SmartSim once the colocated model
+   has finished, we do not need to `stop` the orchestrator.

 .. code-block:: python

@@ -624,16 +640,17 @@ When you run the experiment, the following output will appear::

 |----|--------|---------------|-----------|---------|---------|-----------|--------------|
 | 0  | model  | Model         | 1592652.0 | 0       | 10.1039 | Completed | 0            |

+..
_mutli_orch_doc:

======================
Multiple Orchestrators
======================
-SmartSim supports automating the deployment of multiple Orchestrators
+SmartSim supports automating the deployment of multiple orchestrators
 from within an Experiment. Communication with the ``Orchestrator`` via a SmartRedis client is possible with the
-`db_identifier` argument that is required when initializing an ``Orchestrator`` or
-colocated ``Model`` during a multiple Orchestrator experiment. When initializing a SmartRedis
-client during the Experiment, create a ``ConfigOptions`` object to specify the `db_identifier`
+`db_identifier` argument that is required when initializing an orchestrator or
+colocated model during a multiple orchestrator experiment. When initializing a SmartRedis
+client during the experiment, create a ``ConfigOptions`` object to specify the `db_identifier`
 argument used when creating the ``Orchestrator``. Pass the ``ConfigOptions`` object to
-the Client() init call.
+the ``Client()`` init call.

.. _mutli_orch:

-----------------------------
Multiple Orchestrator Example
-----------------------------
SmartSim offers functionality to automate the deployment of multiple
databases, supporting workloads that require multiple
-``Orchestrators`` for a ``Experiment``. For instance, a workload may consist of a
+orchestrators for an ``Experiment``. For instance, a workload may consist of a
 simulation with high inference performance demands (necessitating a co-located deployment),
-along with an analysis and
-visualization workflow connected to the simulation (requiring a standalone orchestrator).
-In the following example, we simulate a simple version of this use case.
+along with an analysis and visualization workflow connected to the simulation
+(requiring a standalone orchestrator). In the following example, we simulate a
+simple version of this use case.

The example is comprised of two script files:

@@ -659,20 +676,20 @@ tasks.
Applications are not limited to Python and can also be written in C, C++ and Fortran. This script specifies creating a Python SmartRedis client for each standalone orchestrator and a colocated orchestrator. We use the -clients to request data from both standalone ``Orchestrators``, then -transfer the data to the colocated ``Orchestrator``. The application +clients to request data from both standalone orchestrators, then +transfer the data to the colocated orchestrator. The application file is launched by the experiment driver script through a ``Model`` stage. **The Application Script Contents:** 1. Connecting SmartRedis clients within the application to retrieve tensors - from the standalone ``Orchestrators`` to store in a colocated ``Orchestrator``. Details in section: - :ref:`Initialize the Clients`. + from the standalone orchestrators to store in a colocated orchestrator. Details in section: + :ref:`Initialize the Clients`. **The Experiment Driver Script Overview:** The experiment driver script holds the stages of the workflow -and manages their execution through the ``Experiment`` API. +and manages their execution through the Experiment API. We initialize an Experiment at the beginning of the Python file and use the ``Experiment`` to iteratively create, configure and launch computational kernels @@ -682,31 +699,32 @@ runs the application. **The Experiment Driver Script Contents:** -1. Launching two standalone Orchestrators with unique identifiers. Details in section: - :ref:`Launch Multiple Orchestrators`. -2. Launching the application script with a co-located ``Orchestrator``. Details in section: - :ref:`Initialize a Colocated Model`. -3. Connecting SmartRedis clients within the driver script to send tensors to standalone Orchestrators +1. Launching two standalone orchestrators with unique identifiers. Details in section: + :ref:`Launch Multiple Orchestrators`. +2. Launching the application script with a colocated orchestrator. 
Details in section:
-   :ref:`Create Client Connections to Orchestrators`.
+   :ref:`Create Client Connections to Orchestrators`.

-Setup and run instructions can be found :ref:`here`
+Setup and run instructions can be found :ref:`here`.

+.. _app_script_multi_db:

The Application Script
======================
-Applications interact with the ``Orchestrators``
+Applications interact with the orchestrators
 through a SmartRedis client.
 In this section, we write an application script
 to demonstrate how to connect SmartRedis
 clients in the context of multiple
-launched ``Orchestrators``. Using the clients, we retrieve tensors
-from two ``Orchestrators`` launched in the driver script, then store
-the tensors in the colocated ``Orchestrators``.
+launched orchestrators. Using the clients, we retrieve tensors
+from two orchestrators launched in the driver script, then store
+the tensors in the colocated orchestrator.

 .. note::
-   The Experiment must be started to use the Orchestrators within the
+   The Experiment must be started to use the orchestrators within the
    application script.
    Otherwise, it will fail to connect.
-   Find the instructions on how to launch :ref:`here`
+   Find the instructions on how to launch :ref:`here`.

To begin, import the necessary packages:

.. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py
   :language: python
   :linenos:
   :lines: 1-3

+.. _init_model_client:

Initialize the Clients
----------------------
-To establish a connection with each ``Orchestrator``,
+To establish a connection with each orchestrator,
 we need to initialize a new SmartRedis client for each.
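The per-orchestrator client setup described in this section can be sketched in plain Python. This is a stand-in illustration only: the actual SmartRedis calls are shown as comments because they require a launched experiment, and the placeholder strings merely mark where real ``Client`` objects would go.

```python
# db_identifier values used for the two standalone orchestrators in this example.
db_identifiers = ["single_shard_db_identifier", "multi_shard_db_identifier"]

clients = {}
for db_id in db_identifiers:
    # With smartredis available inside a launched experiment, this would be:
    #   options = ConfigOptions.create_from_environment(db_id)
    #   clients[db_id] = Client(options)
    clients[db_id] = f"client-for-{db_id}"  # placeholder standing in for a Client

# Every launched orchestrator gets exactly one dedicated client object.
print(sorted(clients))
```

Note that the identifiers must be unique: SmartRedis resolves each client's connection from the environment using its `db_identifier`, so two orchestrators sharing a name could not be told apart.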
Step 1: Initialize ConfigOptions '''''''''''''''''''''''''''''''' -Since we are launching multiple ``Orchestrators`` within the experiment, +Since we are launching multiple orchestrators within the experiment, the SmartRedis ``ConfigOptions`` object is required when initializing a client in the application. We use the ``ConfigOptions.create_from_environment()`` function to create three instances of ``ConfigOptions``, -with one instance associated with each launched ``Orchestrator``. -Most importantly, to associate each launched Orchestrator to a ConfigOptions object, -the ``create_from_environment()`` function requires specifying the unique ``Orchestrator`` identifier +with one instance associated with each launched orchestrator. +Most importantly, to associate each launched orchestrator to a ``ConfigOptions`` object, +the ``create_from_environment()`` function requires specifying the unique orchestrator identifier argument named `db_identifier`. -For the single-sharded ``Orchestrator``: +For the single-sharded orchestrator: .. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py :language: python :linenos: :lines: 5-6 -For the multi-sharded ``Orchestrator``: +For the multi-sharded orchestrator: .. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py :language: python :linenos: :lines: 10-11 -For the colocated ``Orchestrator``: +For the colocated orchestrator: .. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py :language: python @@ -812,8 +831,8 @@ store them in the colocated ``Orchestrator`` using ``Client.put_tensor(name, da :linenos: :lines: 28-30 -Next, check if the tensors exist in the colocated database using ``Client.poll_tensor()``. -This function queries for data in the database. The function requires the tensor name (`name`), +Next, check if the tensors exist in the colocated ``Orchestrator`` using ``Client.poll_tensor()``. 
+This function queries for data in the ``Orchestrator``. The function requires the tensor name (`name`), how many milliseconds to wait in between queries (`poll_frequency_ms`), and the total number of times to query (`num_tries`): @@ -841,32 +860,33 @@ We setup the SmartSim ``logger`` to output information from the Experiment. :linenos: :lines: 1-10 +.. _launch_multiple_orch: Launch Multiple Orchestrators ----------------------------- In the context of this ``Experiment``, it's essential to create and launch -the databases as a preliminary step before any other components since -the application script requests tensors from the launched databases. +the orchestrators as a preliminary step before any other components since +the application script requests tensors from the launched orchestrators. We aim to showcase the multi-database automation capabilities of SmartSim, so we -create two databases in the workflow: a single-sharded database and a -multi-sharded database. +create two orchestrators in the workflow: a single-sharded ``Orchestrator`` and a +multi-sharded ``Orchestrator``. Step 1: Initialize Orchestrators '''''''''''''''''''''''''''''''' -To create an database, utilize the ``Experiment.create_database()`` function. +To create an orchestrator, utilize the ``Experiment.create_database()`` function. The function requires specifying a unique -database identifier argument named `db_identifier` to launch multiple databases. +database identifier argument named `db_identifier` to launch multiple orchestrators. This step is necessary to connect to databases outside of the driver script. We will use the `db_identifier` names we specified in the application script. -For the single-sharded database: +For the single-sharded orchestrator: .. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py :language: python :linenos: :lines: 12-14 -For the multi-sharded database: +For the multi-sharded orchestrator: .. 
literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py :language: python @@ -875,24 +895,24 @@ For the multi-sharded database: .. note:: Calling ``exp.generate()`` will create two subfolders - (one for each Orchestrator created in the previous step) - whose names are based on the db_identifier of that Orchestrator. + (one for each orchestrator created in the previous step) + whose names are based on the `db_identifier` of that orchestrator. In this example, the Experiment folder is - named ``getting-started-multidb/``. Within this folder, two Orchestrator subfolders will + named ``getting-started-multidb/``. Within this folder, two orchestrator subfolders will be created, namely ``single_shard_db_identifier/`` and ``multi_shard_db_identifier/``. -Step 2: Start Databases -''''''''''''''''''''''' -Next, to launch the databases, -pass the database instances to ``Experiment.start()``. +Step 2: Start +''''''''''''' +Next, to launch the orchestrators, +pass the ``Orchestrator`` instances to ``Experiment.start()``. .. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py :language: python :linenos: :lines: 20-21 -The ``Experiment.start()`` function launches the ``Orchestrators`` for use within the workflow. In other words, the function -deploys the databases on the allocated compute resources. +The ``Experiment.start()`` function launches the orchestrators for use within the workflow. In other words, the function +deploys the orchestrators on the allocated compute resources. .. note:: By setting `summary=True`, SmartSim will print a summary of the @@ -901,23 +921,24 @@ deploys the databases on the allocated compute resources. briefly scan the summary contents. If we set `summary=False`, then the experiment would be launched immediately with no summary. +.. 
_client_connect_orch:

Create Client Connections to Orchestrators
------------------------------------------
The SmartRedis ``Client`` object contains functions that manipulate, send, and receive
-data within the database. Each database has a single, dedicated SmartRedis ``Client``.
-Begin by initializing a SmartRedis ``Client`` object per launched database.
+data within the orchestrator. Each orchestrator has a single, dedicated SmartRedis ``Client``.
+Begin by initializing a SmartRedis ``Client`` object per launched orchestrator.

-To create a designated SmartRedis ``Client``, you need to specify the address of the target
-running database. You can easily retrieve this address using the ``Orchestrator.get_address()`` function.
+To create a designated SmartRedis client, you need to specify the address of the target
+running orchestrator. You can easily retrieve this address using the ``Orchestrator.get_address()`` function.

-For the single-sharded database:
+For the single-sharded orchestrator:

.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
   :language: python
   :linenos:
   :lines: 23-24

-For the multi-sharded database:
+For the multi-sharded orchestrator:

.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
   :language: python
   :linenos:

@@ -928,25 +949,25 @@ Store Data Using Clients
 ------------------------
 In the application script, we retrieved two NumPy tensors.
 To support the app's functionality, we will create two
-NumPy arrays in the python driver script and send them to the a database. To
+NumPy arrays in the Python driver script and send them to an orchestrator. To
 accomplish this, we use the ``Client.put_tensor()`` function with the respective
-database client instances.
+orchestrator client instances.

-For the single-sharded database:
+For the single-sharded orchestrator:

..
literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
   :language: python
   :linenos:
   :lines: 28-31

-For the multi-sharded database:
+For the multi-sharded orchestrator:

.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
   :language: python
   :linenos:
   :lines: 33-36

-Lets check to make sure the database tensors do not exist in the incorrect databases:
+Let's check to make sure the tensors do not exist in the incorrect orchestrators:

.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
   :language: python
   :linenos:

@@ -958,10 +979,11 @@ When you run the experiment, the following output will appear::

    00:00:00 system.host.com SmartSim[#####] INFO The multi shard array key exists in the incorrect database: False
    00:00:00 system.host.com SmartSim[#####] INFO The single shard array key exists in the incorrect database: False

+.. _init_colocated_model:

Initialize a Colocated Model
----------------------------
In the next stage of the experiment, we
-launch the application script with a co-located database
+launch the application script with a colocated orchestrator
 by configuring and creating
 a SmartSim colocated ``Model``.

Step 1: Configure
'''''''''''''''''
You can specify the run settings of a model.
In this experiment, we invoke the Python interpreter to run
-the python script defined in section: :ref:`The Application Script`.
+the Python script defined in section: :ref:`The Application Script`.
-To configure this into a ``Model``, we use the ``Experiment.create_run_settings()`` function.
+To configure this into a SmartSim model, we use the ``Experiment.create_run_settings()`` function.
The function returns a ``RunSettings`` object.
When initializing the ``RunSettings`` object,
we specify the path to the application file,
`exe_args`, and the run command, `exe`.

@@ -1010,7 +1032,7 @@ to the ``create_model()`` function and assign to the variable ``model``.
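Conceptually, the `exe`/`exe_args` pair described above just specifies the command line that the launcher will run for the model. The following is a rough stand-in, not SmartSim's launcher code: the script path is hypothetical, and the commented lines only sketch the driver-script calls.

```python
import sys

# Values analogous to those given to Experiment.create_run_settings():
exe = sys.executable                # run command: the Python interpreter
exe_args = "application_script.py"  # hypothetical path to the application file

# Sketch of the driver-script calls (requires SmartSim and a live Experiment):
#   settings = exp.create_run_settings(exe=exe, exe_args=exe_args)
#   model = exp.create_model("colo_model", settings)  # "colo_model" is an assumed name

# The launcher ultimately executes something equivalent to:
launch_cmd = [exe, exe_args]
print(launch_cmd[1])
```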
Step 2: Colocate
''''''''''''''''
To colocate the model, use the ``Model.colocate_db_uds()`` function to
-Colocate an Orchestrator instance with this Model over
+colocate an ``Orchestrator`` instance with this ``Model`` over
 a Unix domain socket connection.

.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
   :language: python
   :linenos:
   :lines: 51-52

This method will initialize settings which add an unsharded
-database to this Model instance. Only this Model will be able
-to communicate with this colocated database by using the loopback TCP interface.
+orchestrator to this ``Model`` instance. Only this ``Model`` will be able
+to communicate with this colocated orchestrator by using the loopback TCP interface.

Step 3: Start
'''''''''''''
Next, launch the colocated model instance using the ``Experiment.start()`` function.

@@ -1039,7 +1061,7 @@ Next, launch the colocated model instance using the ``Experiment.start()`` funct

 Cleanup Experiment
 ------------------
-Finally, use the ``Experiment.stop()`` function to stop the database instances. Print the
+Finally, use the ``Experiment.stop()`` function to stop the ``Orchestrator`` instances. Print the
 workflow summary with ``Experiment.summary()``.

.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
   :language: python
   :linenos:
   :lines: 55-59

When you run the experiment, the following output will appear::

 | 1  | single_shard_db_identifier_0 | DBNode | 1556529.3   | 0 | 68.8732 | Cancelled | 0 |
 | 2  | multi_shard_db_identifier_0  | DBNode | 1556529.4+2 | 0 | 45.5139 | Cancelled | 0 |

+.. _run_ex_instruct:

How to Run the Example
======================
Below are the steps to run the experiment. Find the
-:ref:`experiment source code`
-and :ref:`application source code`
+:ref:`experiment source code`
+and :ref:`application source code`
below in the respective subsections.

.. note::
   The example assumes that you have already installed and built
   SmartSim and SmartRedis.
Please refer to Section :ref:`Basic Installation` for further details. For simplicity, we assume that you are running on a SLURM-based HPC-platform. Refer to the steps below for more details. @@ -1089,19 +1112,21 @@ Step 2 : Install and Build SmartSim Step 3 : Change the `exe_args` file path When configuring the colocated model in `experiment_script.py`, we pass the file path of `application_script.py` to the `exe_args` argument - on line 33 in :ref:`experiment_script.py`. + on line 33 in :ref:`experiment_script.py`. Edit this argument to the file path of your `application_script.py` Step 4 : Run the Experiment Finally, run the experiment with ``python experiment_script.py``. +.. _multi_app_source_code: Application Source Code ----------------------- .. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py :language: python :linenos: +.. _multi_exp_source_code: Experiment Source Code ---------------------- .. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py From aa8c386b79d4e44b8fa5a98fa195566c1144d363 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Fri, 12 Jan 2024 15:08:29 -0600 Subject: [PATCH 22/26] caps --- doc/orchestrator.rst | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index 718c8e492..2a37204cb 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -11,7 +11,7 @@ exchange, online interactive visualization, online data analysis, computational An orchestrator can be thought of as a general feature store capable of storing numerical data (Tensors and Datasets), AI Models (TF, TF-lite, PyTorch, or ONNX), and scripts (TorchScripts). In addition to storing data, the orchestrator is capable of -executing ML models and TorchScripts on the stored data using CPUs or GPUs. +executing ML Models and TorchScripts on the stored data using CPUs or GPUs. .. 
figure:: images/smartsim-arch.png

@@ -27,7 +27,7 @@ SmartSim offers **two** types of orchestrator deployments:

 - :ref:`Standalone Deployment`
   A standalone orchestrator is ideal for systems that have heterogeneous node types
   (i.e. a mix of CPU-only and GPU-enabled compute nodes) where
-  ML model and TorchScript evaluation is more efficiently performed off-node for a model. This
+  ML Model and TorchScript evaluation is more efficiently performed off-node for an ML Model. This
   deployment is also ideal for workflows relying on data exchange between multiple
   applications (e.g. online analysis, visualization, computational steering, or
   producer/consumer application couplings). Standalone deployment is also optimal for
   high data throughput scenarios with databases that require a large amount of hardware.

@@ -42,7 +42,7 @@ either orchestrator deployment type. If a workflow requires a multiple orchestra
 `db_identifier` argument must be specified during ``Orchestrator`` initialization. Users can connect to
 orchestrators in a parallel database workflow by specifying the respective `db_identifier` argument
 within a ``ConfigOptions`` object to pass in to the SmartRedis ``Client`` constructor. The client can then be used to transmit data,
-execute ML models, and execute scripts on the linked orchestrator.
+execute ML Models, and execute scripts on the linked orchestrator.

.. _standalone_orch_doc:

=====================
Standalone Deployment
=====================
--------
Overview
--------
During standalone orchestrator deployment, a SmartSim orchestrator (the database) runs on separate
-compute node(s) from the model node(s). A standalone orchestrator can be deployed on a single
+compute node(s) from the SmartSim model node(s). A standalone orchestrator can be deployed on a single
 node (standalone) or sharded (distributed) over multiple nodes. With a sharded orchestrator, users can
 scale the number of database nodes for inference and script evaluation, contributing to an
 increased in-memory capacity for data storage in large-scale workflows.
Standalone From 138b7e8924d37b9f6338f0802274c03c980a0b75 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Wed, 17 Jan 2024 16:10:21 -0600 Subject: [PATCH 23/26] updating some orch comments --- doc/orchestrator.rst | 64 +++++++++++++++++++++----------------------- 1 file changed, 30 insertions(+), 34 deletions(-) diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index 2a37204cb..33b14da22 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -4,11 +4,11 @@ Orchestrator ======== Overview ======== -The orchestrator is an in-memory database with features built for +The Orchestrator is an in-memory database with features built for AI-enabled workflows including online training, low-latency inference, cross-application data exchange, online interactive visualization, online data analysis, computational steering, and more. -An orchestrator can be thought of as a general feature store +An Orchestrator can be thought of as a general feature store capable of storing numerical data (Tensors and Datasets), AI Models (TF, TF-lite, PyTorch, or ONNX), and scripts (TorchScripts). In addition to storing data, the orchestrator is capable of executing ML Models and TorchScripts on the stored data using CPUs or GPUs. @@ -17,32 +17,31 @@ executing ML Models and TorchScripts on the stored data using CPUs or GPUs. Sample experiment showing a user application leveraging machine learning infrastructure launched by SmartSim and connected - to online analysis and visualization via the orchestrator. + to online analysis and visualization via the Orchestrator. -Users can establish a connection to the orchestrator from within SmartSim ``Model`` executable code, ``Ensemble`` +Users can establish a connection to the ``Orchestrator`` from within SmartSim ``Model`` executable code, ``Ensemble`` model executable code, or driver scripts using the :ref:`SmartRedis` client library. 
-SmartSim offers **two** types of orchestrator deployments: +SmartSim offers **two** types of ``Orchestrator`` deployments: - :ref:`Standalone Deployment` - A standalone orchestrator is ideal for systems that have heterogeneous node types + A standalone ``Orchestrator`` is ideal for systems that have heterogeneous node types (i.e. a mix of CPU-only and GPU-enabled compute nodes) where - ML Model and TorchScript evaluation is more efficiently performed off-node for a ML Model. This + ML Model and TorchScript evaluation is more efficiently performed off-node. This deployment is also ideal for workflows relying on data exchange between multiple applications (e.g. online analysis, visualization, computational steering, or producer/consumer application couplings). Standalone deployment is also optimal for - high data throughput scenarios with databases that require a large amount of hardware. + high data throughput scenarios where ``Orchestrators`` require large amounts of compute resources. - :ref:`Colocated Deployment` - A colocated orchestrator is ideal when the data and hardware accelerator are located on the same compute node. + A colocated ``Orchestrator`` is ideal when the data and hardware accelerator are located on the same compute node. This setup helps reduce latency in ML inference and TorchScript evaluation by eliminating off-node communication. -SmartSim allows users to launch :ref:`multiple orchestrators` during the course of an experiment of -either orchestrator deployment type. If a workflow requires a multiple orchestrator environment, a +SmartSim allows users to launch :ref:`multiple Orchestrators` of either type during +the course of an experiment. If a workflow requires a multiple ``Orchestrator`` environment, a `db_identifier` argument must be specified during ``Orchestrator`` initialization. 
Users can connect to -orchestrators in a parallel database workflow by specifying the respective `db_identifier` argument -within a ``ConfigOptions`` object to pass in to the SmartRedis ``Client`` constructor. The client can then be used to transmit data, -execute ML Models, and execute scripts on the linked orchestrator. +``Orchestrators`` in a multiple ``Orchestrator`` workflow by specifying the respective `db_identifier` argument +within a ``ConfigOptions`` object that is passed into the SmartRedis ``Client`` constructor. .. _standalone_orch_doc: ===================== @@ -54,14 +53,11 @@ Overview During standalone orchestrator deployment, a SmartSim orchestrator (the database) runs on separate compute node(s) from the SmartSim model node(s). A standalone orchestrator can be deployed on a single node (standalone) or sharded (distributed) over multiple nodes. With a sharded orchestrator, users can -scale the number of database nodes for inference and script evaluation, contributing to an +scale the number of database nodes for inference and script evaluation, enabling increased in-memory capacity for data storage in large-scale workflows. Standalone -orchestrators are effective for small-scale workflows and offer lower latency since +orchestrators are effective for small-scale workflows and offer lower latency for some API calls because single-node orchestrators don't involve communication between nodes. -Communication between a standalone orchestrator and SmartSim model -is facilitated by a SmartRedis client and initialized in a SmartSim model application. - When connecting to a standalone orchestrator from within a model application, the user has several options when using the SmartRedis client: @@ -158,19 +154,19 @@ To establish a connection with the orchestrator, we need to initialize a new Sma Since the ``Orchestrator`` we launch in the driver script is sharded, we specify the constructor argument `cluster` as `True`. -.. 
note:: - Note that the C/C++/Fortran SmartRedis clients are capable of reading cluster configurations - from the SmartSim model environment and the `cluster` constructor argument does not need to be specified - in those client languages. - .. code-block:: python # Initialize a Client application_client = Client(cluster=True) +.. note:: + Note that the C/C++/Fortran SmartRedis clients are capable of reading cluster configurations + from the SmartSim model environment and the `cluster` constructor argument does not need to be specified + in those client languages. + .. note:: Since there is only one orchestrator launched in the experiment - (the standalone orchestrator), specifying a orchestrator address + (the standalone orchestrator), specifying an orchestrator address is not required when initializing the SmartRedis client. SmartRedis will handle the connection configuration. @@ -192,8 +188,8 @@ used in the driver script as input to ``Client.put_tensor()``: # Log tensor application_client.log_data(LLInfo, f"The single sharded db tensor is: {driver_script_tensor}") -Later, when you run the driver script the following output will appear in ``model.out`` -located in ``getting-started/tutorial_model/``:: +After the Model is launched by the driver script, the following output will appear in +`getting-started/tutorial_model/model.out`:: Default@17-11-48:The single sharded db tensor is: [1 2 3 4] @@ -288,7 +284,7 @@ Data Storage ------------ In the application script, we retrieved a NumPy tensor stored from within the driver script. To support the application functionality, we create a -NumPy array in the experiment workflow to send to the orchestrator. To +NumPy array in the experiment driver script to send to the orchestrator. To send a tensor to the orchestrator, use the function ``Client.put_tensor(name, data)``: .. 
code-block:: python @@ -355,7 +351,7 @@ Next, launch the `model` instance using the ``Experiment.start()`` function: Data Polling ------------ -Next, check if the tensor exists in the standalone orchestrator` using ``Client.poll_tensor()``. +Next, check if the tensor exists in the standalone orchestrator using ``Client.poll_tensor()``. This function queries for data in the orchestrator. The function requires the tensor name (`name`), how many milliseconds to wait in between queries (`poll_frequency_ms`), and the total number of times to query (`num_tries`). Check if the data exists in the orchestrator by @@ -480,16 +476,16 @@ To establish a connection with the colocated orchestrator, we need to initialize new SmartRedis `client` and specify `cluster=False` since colocated deployments are never clustered but single-sharded. -.. note:: - Note that the C/C++/Fortran SmartRedis clients are capable of reading cluster configurations - from the model environment and the `cluster` constructor argument does not need to be specified - in those client languages. - .. code-block:: python # Initialize a Client colo_client = Client(cluster=False) +.. note:: + Note that the C/C++/Fortran SmartRedis clients are capable of reading cluster configurations + from the model environment and the `cluster` constructor argument does not need to be specified + in those client languages. + .. 
note:: Since there is only one orchestrator launched in the Experiment (the colocated orchestrator), specifying a orchestrator address From c8802aebb0d652fe164d8d55b4a1cae3ab4f724d Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Thu, 18 Jan 2024 11:07:36 -0600 Subject: [PATCH 24/26] changing the orch styling --- doc/orchestrator.rst | 326 +++++++++++++++++++++---------------------- 1 file changed, 163 insertions(+), 163 deletions(-) diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index 33b14da22..ac076d051 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -4,23 +4,23 @@ Orchestrator ======== Overview ======== -The Orchestrator is an in-memory database with features built for +The ``Orchestrator`` is an in-memory database with features built for AI-enabled workflows including online training, low-latency inference, cross-application data exchange, online interactive visualization, online data analysis, computational steering, and more. -An Orchestrator can be thought of as a general feature store +An ``Orchestrator`` can be thought of as a general feature store capable of storing numerical data (Tensors and Datasets), AI Models (TF, TF-lite, PyTorch, or ONNX), -and scripts (TorchScripts). In addition to storing data, the orchestrator is capable of +and scripts (TorchScripts). In addition to storing data, the ``Orchestrator`` is capable of executing ML Models and TorchScripts on the stored data using CPUs or GPUs. .. figure:: images/smartsim-arch.png Sample experiment showing a user application leveraging machine learning infrastructure launched by SmartSim and connected - to online analysis and visualization via the Orchestrator. + to online analysis and visualization via the ``Orchestrator``. Users can establish a connection to the ``Orchestrator`` from within SmartSim ``Model`` executable code, ``Ensemble`` -model executable code, or driver scripts using the :ref:`SmartRedis` client library. 
+``Model`` executable code, or driver scripts using the :ref:`SmartRedis` client library. SmartSim offers **two** types of ``Orchestrator`` deployments: @@ -50,59 +50,59 @@ Standalone Deployment -------- Overview -------- -During standalone orchestrator deployment, a SmartSim orchestrator (the database) runs on separate -compute node(s) from the SmartSim model node(s). A standalone orchestrator can be deployed on a single -node (standalone) or sharded (distributed) over multiple nodes. With a sharded orchestrator, users can +During standalone ``Orchestrator`` deployment, a SmartSim ``Orchestrator`` (the database) runs on separate +compute node(s) from the SmartSim ``Model`` node(s). A standalone ``Orchestrator`` can be deployed on a single +node (standalone) or sharded (distributed) over multiple nodes. With a sharded ``Orchestrator``, users can scale the number of database nodes for inference and script evaluation, enabling increased in-memory capacity for data storage in large-scale workflows. Standalone -orchestrators are effective for small-scale workflows and offer lower latency for some API calls because -single-node orchestrators don't involve communication between nodes. +``Orchestrators`` are effective for small-scale workflows and offer lower latency for some API calls because +single-node ``Orchestrators`` don't require communication between ``Orchestrator`` nodes. -When connecting to a standalone orchestrator from within a model application, the user has +When connecting to a standalone ``Orchestrator`` from within a ``Model`` application, the user has several options when using the SmartRedis client: -- In an experiment with a single deployed orchestrator, users can rely on SmartSim - to detect the orchestrator address through runtime configuration of the SmartSim model environment. 
+- In an experiment with a single deployed ``Orchestrator``, users can rely on SmartSim + to detect the ``Orchestrator`` address through runtime configuration of the SmartSim ``Model`` environment. A default ``Client`` constructor, with no user-specified parameters, is sufficient to - connect to the orchestrator. The only exception is for the Python `client`, which requires + connect to the ``Orchestrator``. The only exception is for the Python `client`, which requires the `cluster` constructor parameter to differentiate between a multi-node standalone deployment and a single-node standalone deployment. -- In an experiment with multiple ``Orchestrator`` deployments, users can connect to a specific ``Orchestrator`` by first specifying the `db_identifier` in the ``ConfigOptions`` constructor. Subsequently, users should pass the ``ConfigOptions`` instance to the ``Client`` constructor. - Users can specify or override automatically configured connection options by providing the - orchestrator address in the ``ConfigOptions`` object. Subsequently, users should pass the ``ConfigOptions`` + ``Orchestrator`` address in the ``ConfigOptions`` object. Subsequently, users should pass the ``ConfigOptions`` instance to the ``Client`` constructor. -If connecting to a standalone ``Orchestrator`` from a SmartSim driver script, the user must specify -the address of the ``Orchestrator`` via the ``Client`` constructor. SmartSim does not automatically -configure the environment of the driver script to connect to an ``Orchestrator``. Users +If connecting to a standalone ``Orchestrator`` from a SmartSim driver script, the user must specify +the address of the ``Orchestrator`` via the ``Client`` constructor. SmartSim does not automatically +configure the environment of the driver script to connect to an ``Orchestrator``. Users can access an ``Orchestrator`` address through ``Orchestrator.get_address()``. .. 
note:: - In SmartSim model applications, it is advisable to **avoid** specifying addresses directly to the ``Client`` constructor. + In SmartSim ``Model`` applications, it is advisable to **avoid** specifying addresses directly to the ``Client`` constructor. Utilizing the SmartSim environment configuration for SmartRedis client connections - allows the SmartSim model application code to remain unchanged even as orchestrator deployment + allows the SmartSim ``Model`` application code to remain unchanged even as ``Orchestrator`` deployment options vary. The following image illustrates -communication between a standalone orchestrator and a -SmartSim model. In the diagram, the application is running on multiple compute nodes, -separate from the orchestrator compute nodes. Communication is established between the -SmartSim model application and the sharded orchestrator using the :ref:`SmartRedis client`. +communication between a standalone ``Orchestrator`` and a +SmartSim ``Model``. In the diagram, the application is running on multiple compute nodes, +separate from the ``Orchestrator`` compute nodes. Communication is established between the +SmartSim ``Model`` application and the sharded ``Orchestrator`` using the :ref:`SmartRedis client`. .. figure:: images/clustered_orchestrator-1.png - Sample Standalone orchestrator Deployment + Sample Standalone ``Orchestrator`` Deployment .. note:: Users do not need to know how the data is stored in a standalone configuration and can address the cluster with the SmartRedis clients like a single block of memory using simple put/get semantics in SmartRedis. -In scenarios with high data throughput, such as online analysis, training, and processing, a standalone orchestrator -is optimal. The data produced by multiple processes in a SmartSim model is stored in the standalone -orchestrator and is available for consumption by other SmartSim models. 
+In scenarios with high data throughput, such as online analysis, training, and processing, a standalone ``Orchestrator`` +is optimal. The data produced by multiple processes in a SmartSim ``Model`` is stored in the standalone +``Orchestrator`` and is available for consumption by other SmartSim ``Models``. If a workflow requires an application to leverage multiple standalone deployments, multiple clients can be instantiated within an application, @@ -112,24 +112,24 @@ with each client connected to a unique deployment. This is accomplished through ------- Example ------- -In the following example, we demonstrate deploying a standalone orchestrator on an HPC System. -Once the standalone orchestrator is launched from the driver script, we walk through -connecting a SmartRedis client to the orchestrator from within the SmartSim model -script to transmit data then poll for the existence of the data. +In the following example, we demonstrate deploying a standalone ``Orchestrator`` on an HPC System. +Once the standalone ``Orchestrator`` is launched from the driver script, we walk through +connecting a SmartRedis client to the ``Orchestrator`` from within the SmartSim ``Model`` +script to transmit and poll for data. The example is comprised of two script files: - :ref:`Application Script` The application script is a Python file that contains instructions to create a SmartRedis - client connection to the standalone orchestrator launched in the driver script. + client connection to the standalone ``Orchestrator`` launched in the driver script. To demonstrate the ability of workflow components to access data from other entities, we then retrieve the tensors set by the driver script using a SmartRedis client in the application script. We then instruct the client to send and retrieve data from within the application script. - :ref:`Experiment Driver Script` The experiment driver script is responsible for launching and managing SmartSim entities.
Within this script, - we use the Experiment API to create and launch a standalone orchestrator. To demonstrate the capability of - SmartSim model applications to access orchestrator data sent from other sources, we employ the SmartRedis ``Client`` in - the driver script to store a tensor in the orchestrator, which is later retrieved by the SmartSim model. + we use the Experiment API to create and launch a standalone ``Orchestrator``. To demonstrate the capability of + SmartSim ``Model`` applications to access ``Orchestrator`` data sent from other sources, we employ the SmartRedis ``Client`` in + the driver script to store a tensor in the ``Orchestrator``, which is later retrieved by the SmartSim ``Model``. To employ the application script, we initialize a ``Model`` object with the application script as an executable argument, launch the ``Orchestrator``, and then launch the ``Model``. @@ -150,7 +150,7 @@ To begin writing the application script, import the necessary SmartRedis package Client Initialization --------------------- -To establish a connection with the orchestrator, we need to initialize a new SmartRedis client. +To establish a connection with the ``Orchestrator``, we need to initialize a new SmartRedis client. Since the ``Orchestrator`` we launch in the driver script is sharded, we specify the constructor argument `cluster` as `True`. @@ -161,41 +161,41 @@ constructor argument `cluster` as `True`. .. note:: Note that the C/C++/Fortran SmartRedis clients are capable of reading cluster configurations - from the SmartSim model environment and the `cluster` constructor argument does not need to be specified + from the SmartSim ``Model`` environment and the `cluster` constructor argument does not need to be specified in those client languages. .. 
note:: - Since there is only one orchestrator launched in the experiment - (the standalone orchestrator), specifying an orchestrator address + Since there is only one ``Orchestrator`` launched in the experiment + (the standalone ``Orchestrator``), specifying an ``Orchestrator`` address is not required when initializing the SmartRedis client. SmartRedis will handle the connection configuration. .. note:: - To create a SmartRedis client connection to the standalone orchestrator, the standalone orchestrator must be launched + To create a SmartRedis client connection to the standalone ``Orchestrator``, the standalone ``Orchestrator`` must be launched from within the driver script. You must execute the Python driver script, otherwise, there will - be no orchestrator to connect the client to. + be no ``Orchestrator`` to connect the client to. Data Retrieval -------------- -To confirm a successful connection to the orchestrator, we retrieve the tensor we set from the Python driver script. +To confirm a successful connection to the ``Orchestrator``, we retrieve the tensor we set from the Python driver script. Use the ``Client.get_tensor()`` method to retrieve the tensor by specifying the name `tensor_1` we used in the driver script as input to ``Client.put_tensor()``: .. 
code-block:: python - # Retrieve tensor from orchestrator + # Retrieve tensor from Orchestrator driver_script_tensor = application_client.get_tensor("tensor_1") # Log tensor application_client.log_data(LLInfo, f"The single sharded db tensor is: {driver_script_tensor}") -After the Model is launched by the driver script, the following output will appear in +After the ``Model`` is launched by the driver script, the following output will appear in `getting-started/tutorial_model/model.out`:: Default@17-11-48:The single sharded db tensor is: [1 2 3 4] Data Storage ------------ -Next, create a NumPy tensor to send to the standalone orchestrator using +Next, create a NumPy tensor to send to the standalone ``Orchestrator`` using ``Client.put_tensor(name, data)``: .. code-block:: python @@ -239,14 +239,14 @@ We setup the SmartSim `logger` to output information from the ``Experiment`` at Orchestrator Deployment ----------------------- In the context of this experiment, it's essential to create and launch -the orchestrator as a preliminary step before any other workflow entities. This is because -in this example the application script requests and sends tensors to and from a launched orchestrator. +the ``Orchestrator`` as a preliminary step before any other workflow entities. This is because +in this example the application script requests and sends tensors to and from a launched ``Orchestrator``. -In the next stage of the experiment, we create and launch a standalone orchestrator. +In the next stage of the experiment, we create and launch a standalone ``Orchestrator``. Step 1: Initialize '''''''''''''''''' -To create a standalone orchestrator, utilize the ``Experiment.create_database()`` function. +To create a standalone ``Orchestrator``, utilize the ``Experiment.create_database()`` function. .. 
code-block:: python @@ -255,43 +255,43 @@ To create a standalone orchestrator, utilize the ``Experiment.create_database()` Step 2: Start ''''''''''''' -Next, to launch the orchestrator, pass the ``Orchestrator`` instance to ``Experiment.start()``. +Next, to launch the ``Orchestrator``, pass the ``Orchestrator`` instance to ``Experiment.start()``. .. code-block:: python # Launch the multi sharded Orchestrator exp.start(standalone_orchestrator) -The ``Experiment.start()`` function launches the orchestrator for use within the workflow. +The ``Experiment.start()`` function launches the ``Orchestrator`` for use within the workflow. In other words, the function deploys the ``Orchestrator`` on the allocated compute resources. Client Initialization --------------------- The SmartRedis ``Client`` object contains functions that manipulate, send, and retrieve -data on the orchestrator. Begin by initializing a SmartRedis ``Client`` object for the standalone orchestrator. +data on the ``Orchestrator``. Begin by initializing a SmartRedis ``Client`` object for the standalone ``Orchestrator``. SmartRedis clients in driver scripts do not have the ability to use a `db-identifier` or -rely on automatic configurations to connect to orchestrators. Therefore, when creating a SmartRedis client -connection from within a driver script, specify the address of the orchestrator you would like to connect to. -You can easily retrieve the orchestrator address using the ``Orchestrator.get_address()`` function: +rely on automatic configurations to connect to ``Orchestrators``. Therefore, when creating a SmartRedis client +connection from within a driver script, specify the address of the ``Orchestrator`` you would like to connect to. +You can easily retrieve the ``Orchestrator`` address using the ``Orchestrator.get_address()`` function: .. 
code-block:: python - # Initialize a SmartRedis client for multi sharded orchestrator + # Initialize a SmartRedis client for multi sharded Orchestrator driver_client = Client(cluster=True, address=standalone_orchestrator.get_address()[0]) Data Storage ------------ In the application script, we retrieved a NumPy tensor stored from within the driver script. To support the application functionality, we create a -NumPy array in the experiment driver script to send to the orchestrator. To -send a tensor to the orchestrator, use the function ``Client.put_tensor(name, data)``: +NumPy array in the experiment driver script to send to the ``Orchestrator``. To +send a tensor to the ``Orchestrator``, use the function ``Client.put_tensor(name, data)``: .. code-block:: python # Create NumPy array local_array = np.array([1, 2, 3, 4]) - # Use the SmartRedis client to place tensor in the standalone orchestrator + # Use the SmartRedis client to place tensor in the standalone Orchestrator driver_client.put_tensor("tensor_1", local_array) Model Initialization @@ -301,7 +301,7 @@ a SmartSim ``Model`` and specifying the application script name during ``Model`` Step 1: Configure ''''''''''''''''' -In this example experiment, the model application is a Python script as defined in section: +In this example experiment, the ``Model`` application is a Python script as defined in section: :ref:`Application Script`. Before initializing the ``Model`` object, we must use ``Experiment.create_run_settings()`` to create a ``RunSettings`` object that defines how to execute the ``Model``. To launch the Python script in this example workflow, we specify the path to the application @@ -315,7 +315,7 @@ will return a ``RunSettings`` object that can then be used to initialize the ``M Use the ``RunSettings`` helper functions to configure the distribution of computational tasks (``RunSettings.set_nodes()``). 
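The ``Client.poll_tensor()`` check described in this example reduces to a bounded retry loop: re-query every `poll_frequency_ms` milliseconds, up to `num_tries` attempts. Below is a plain-Python stand-in of that behavior over a dict, not the real SmartRedis call (which queries the database over the network):

```python
import time

def toy_poll_tensor(store, name, poll_frequency_ms, num_tries):
    """Stand-in for Client.poll_tensor: re-check for a key until it
    appears or the attempts are exhausted (illustration only)."""
    for _ in range(num_tries):
        if name in store:
            return True
        # Wait between queries, as the real client does.
        time.sleep(poll_frequency_ms / 1000.0)
    return False

store = {"tensor_2": [5, 6, 7, 8]}
print(toy_poll_tensor(store, "tensor_2", 100, 10))  # -> True
print(toy_poll_tensor(store, "missing", 1, 3))      # -> False
```

The bounded retry is what lets a driver script wait on data produced by a still-running ``Model`` without blocking forever.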
In this -example, we specify to SmartSim that we intend the Model to run on a single compute node. +example, we specify to SmartSim that we intend the ``Model`` to run on a single compute node. .. code-block:: python @@ -346,15 +346,15 @@ Next, launch the `model` instance using the ``Experiment.start()`` function: .. note:: We specify `block=True` to ``exp.start()`` because our experiment requires that the ``Model`` finish before the experiment continues. - This is because we will request tensors from the orchestrator that + This is because we will request tensors from the ``Orchestrator`` that are inputted by the ``Model`` we launched. Data Polling ------------ -Next, check if the tensor exists in the standalone orchestrator using ``Client.poll_tensor()``. -This function queries for data in the orchestrator. The function requires the tensor name (`name`), +Next, check if the tensor exists in the standalone ``Orchestrator`` using ``Client.poll_tensor()``. +This function queries for data in the ``Orchestrator``. The function requires the tensor name (`name`), how many milliseconds to wait in between queries (`poll_frequency_ms`), -and the total number of times to query (`num_tries`). Check if the data exists in the orchestrator by +and the total number of times to query (`num_tries`). Check if the data exists in the ``Orchestrator`` by polling every 100 milliseconds until 10 attempts have completed: .. code-block:: python @@ -393,40 +393,40 @@ Colocated Deployment -------- Overview -------- -During colocated orchestrator deployment, a SmartSim orchestrator (the database) runs on -the models compute node(s). Colocated orchestrators can only be deployed as isolated instances -on each compute node and cannot be clustered over multiple nodes. The orchestrator on each application node is -utilized by SmartRedis clients on the same node. With a colocated orchestrator, latency is reduced -in ML inference and TorchScript evaluation by eliminating off-node communication. 
A colocated orchestrator +During colocated ``Orchestrator`` deployment, a SmartSim ``Orchestrator`` (the database) runs on +the ``Models`` compute node(s). Colocated ``Orchestrators`` can only be deployed as isolated instances +on each compute node and cannot be clustered over multiple nodes. The ``Orchestrator`` on each application node is +utilized by SmartRedis clients on the same node. With a colocated ``Orchestrator``, latency is reduced +in ML inference and TorchScript evaluation by eliminating off-node communication. A colocated ``Orchestrator`` is ideal when the data and hardware accelerator are located on the same compute node. -Communication between a colocated orchestrator and SmartSim model -is initiated in the application through a SmartRedis client. Since a colocated orchestrator is launched when the SmartSim model -is started by the experiment, connecting a SmartRedis client to a colocated orchestrator is only possible from within -the associated SmartSim model application. +Communication between a colocated ``Orchestrator`` and SmartSim ``Model`` +is initiated in the application through a SmartRedis client. Since a colocated ``Orchestrator`` is launched when the SmartSim ``Model`` +is started by the experiment, connecting a SmartRedis client to a colocated ``Orchestrator`` is only possible from within +the associated SmartSim ``Model`` application. -There are **three** methods for connecting the SmartRedis client to the colocated orchestrator: +There are **three** methods for connecting the SmartRedis client to the colocated ``Orchestrator``: -- In an experiment with a single deployed orchestrator, users can rely on SmartSim - to detect the orchestrator address through runtime configuration of the SmartSim model environment. +- In an experiment with a single deployed ``Orchestrator``, users can rely on SmartSim + to detect the ``Orchestrator`` address through runtime configuration of the SmartSim ``Model`` environment. 
A default ``Client`` constructor, with no user-specified parameters, is sufficient to - connect to the orchestrator. The only exception is for the Python `client`, which requires - the `cluster=False` constructor parameter for the colocated orchestrator. -- In an experiment with multiple orchestrator deployments, users can connect to a specific orchestrator by + connect to the ``Orchestrator``. The only exception is for the Python `client`, which requires + the `cluster=False` constructor parameter for the colocated ``Orchestrator``. +- In an experiment with multiple ``Orchestrator`` deployments, users can connect to a specific ``Orchestrator`` by first specifying the `db_identifier` in the ``ConfigOptions`` constructor. Subsequently, users should pass the ``ConfigOptions`` instance to the ``Client`` constructor. - Users can specify or override automatically configured connection options by providing the - orchestrator address in the ``ConfigOptions`` object. Subsequently, users should pass the ``ConfigOptions`` + ``Orchestrator`` address in the ``ConfigOptions`` object. Subsequently, users should pass the ``ConfigOptions`` instance to the ``Client`` constructor. -Below is an image illustrating communication within a colocated SmartSim model spanning multiple compute nodes. +Below is an image illustrating communication within a colocated SmartSim ``Model`` spanning multiple compute nodes. As demonstrated in the diagram, each process of the application creates its own SmartRedis client -connection to the orchestrator running on the same host. +connection to the ``Orchestrator`` running on the same host. .. 
figure:: images/colocated_orchestrator-1.png - Sample Colocated Orchestrator Deployment + Sample Colocated ``Orchestrator`` Deployment Colocated deployment is ideal for highly performant online inference scenarios where a distributed application (likely an MPI application) is performing inference with @@ -442,22 +442,22 @@ with each client connected to a unique deployment. This is accomplished through ------- Example ------- -In the following example, we demonstrate deploying a colocated orchestrator on an HPC System. -Once the orchestrator is launched, we walk through connecting a SmartRedis client +In the following example, we demonstrate deploying a colocated ``Orchestrator`` on an HPC System. +Once the ``Orchestrator`` is launched, we walk through connecting a SmartRedis client from within the application script to transmit data then poll for the existence of the data -on the orchestrator. +on the ``Orchestrator``. The example is comprised of two script files: - :ref:`Application Script` The application script is a Python script that connects a SmartRedis - client to the colocated orchestrator. From within the application script, + client to the colocated ``Orchestrator``. From within the application script, the client is utilized to both send and retrieve data. - :ref:`Experiment Driver Script` The experiment driver script launches and manages the example entities with the Experiment API. In the driver script, we use the Experiment API - to create and launch a colocated model. + to create and launch a colocated ``Model``. .. 
_colocated_orch_app_script: Application Script ------------------ To begin writing the application script, import the necessary SmartRedis package: .. code-block:: python @@ -472,7 +472,7 @@ Client Initialization --------------------- -To establish a connection with the colocated orchestrator, we need to initialize a +To establish a connection with the colocated ``Orchestrator``, we need to initialize a new SmartRedis `client` and specify `cluster=False` since colocated deployments are never clustered but single-sharded. @@ -483,19 +483,19 @@ clustered but single-sharded. .. note:: Note that the C/C++/Fortran SmartRedis clients are capable of reading cluster configurations - from the model environment and the `cluster` constructor argument does not need to be specified + from the ``Model`` environment and the `cluster` constructor argument does not need to be specified in those client languages. .. note:: - Since there is only one orchestrator launched in the Experiment - (the colocated orchestrator), specifying a orchestrator address + Since there is only one ``Orchestrator`` launched in the Experiment + (the colocated ``Orchestrator``), specifying an ``Orchestrator`` address is not required when initializing the client. SmartRedis will handle the connection configuration. .. note:: - To create a client connection to the colocated orchestrator, the colocated model must be launched + To create a client connection to the colocated ``Orchestrator``, the colocated ``Model`` must be launched from within the driver script. You must execute the Python driver script, otherwise, there will - be no orchestrator to connect the client to. + be no ``Orchestrator`` to connect the client to. Data Storage ------------ @@ -513,7 +513,7 @@ We will retrieve `“tensor_1”` in the following section. Data Retrieval -------------- -To confirm a successful connection to the orchestrator, we retrieve the tensor we stored. +To confirm a successful connection to the ``Orchestrator``, we retrieve the tensor we stored. 
Use the ``Client.get_tensor()`` method to retrieve the tensor by specifying the name `“tensor_1”`:

@@ -560,13 +560,13 @@ to output information from the experiment at runtime:

Colocated Model Initialization
------------------------------
In the next stage of the experiment, we
-create and launch a colocated model that
-runs the application script with a orchestrator
+create and launch a colocated ``Model`` that
+runs the application script with an ``Orchestrator``
on the same compute node.

Step 1: Configure
'''''''''''''''''
-In this example experiment, the model application is a Python script as defined in section:
+In this example experiment, the ``Model`` application is a Python script as defined in section:
:ref:`Application Script`. Before initializing the ``Model`` object, we must use
``Experiment.create_run_settings()`` to create a ``RunSettings`` object that defines how to execute
the ``Model``. To launch the Python script in this example workflow, we specify the path to the application

@@ -580,7 +580,7 @@ will return a ``RunSettings`` object that can then be used to initialize the ``M

Use the ``RunSettings`` helper functions to
configure the distribution of computational tasks (``RunSettings.set_nodes()``). In this
-example, we specify to SmartSim that we intend the Model to run on a single compute node.
+example, we specify to SmartSim that we intend the ``Model`` to run on a single compute node.

.. code-block:: python

@@ -602,7 +602,7 @@ assign the returned ``Model`` instance to the variable `model`:

Step 3: Colocate
''''''''''''''''
-To colocate the model, use the ``Model.colocate_db_uds()`` function.
+To colocate the `model`, use the ``Model.colocate_db_uds()`` function.
This function will colocate an ``Orchestrator`` instance with this ``Model`` over
a Unix domain socket connection.

@@ -623,8 +623,8 @@ Next, launch the colocated ``Model`` instance using the ``Experiment.start()`` f

Cleanup
-------
.. 
note::
-    Since the colocated orchestrator is automatically torn down by SmartSim once the colocated model
-    has finished, we do not need to `stop` the orchestrator.
+    Since the colocated ``Orchestrator`` is automatically torn down by SmartSim once the colocated ``Model``
+    has finished, we do not need to `stop` the ``Orchestrator``.

.. code-block:: python

@@ -640,10 +640,10 @@ When you run the experiment, the following output will appear::

======================
Multiple Orchestrators
======================
-SmartSim supports automating the deployment of multiple orchestrators
+SmartSim supports automating the deployment of multiple ``Orchestrators``
from within an Experiment. Communication with the ``Orchestrator`` via a SmartRedis client is possible with the
-`db_identifier` argument that is required when initializing an orchestrator or
-colocated model during a multiple orchestrator experiment. When initializing a SmartRedis
+`db_identifier` argument that is required when initializing an ``Orchestrator`` or
+colocated ``Model`` during a multiple ``Orchestrator`` experiment. When initializing a SmartRedis
client during the experiment, create a ``ConfigOptions`` object to specify the `db_identifier`
argument used when creating the ``Orchestrator``. Pass the ``ConfigOptions`` object
to the ``Client()`` init call.

@@ -654,10 +654,10 @@ Multiple Orchestrator Example
-----------------------------
SmartSim offers functionality to automate the deployment of multiple
databases, supporting workloads that require multiple
-orchestrators for a ``Experiment``. For instance, a workload may consist of a
+``Orchestrators`` for an ``Experiment``. For instance, a workload may consist of a
simulation with high inference performance demands (necessitating a co-located deployment),
along with an analysis and visualization workflow connected to the simulation
-(requiring a standalone orchestrator). 
In the following example, we simulate a simple version of this use case. The example is comprised of two script files: @@ -671,16 +671,16 @@ contains instructions to complete computational tasks. Applications are not limited to Python and can also be written in C, C++ and Fortran. This script specifies creating a Python SmartRedis client for each -standalone orchestrator and a colocated orchestrator. We use the -clients to request data from both standalone orchestrators, then -transfer the data to the colocated orchestrator. The application +standalone ``Orchestrator`` and a colocated ``Orchestrator``. We use the +clients to request data from both standalone ``Orchestrators``, then +transfer the data to the colocated ``Orchestrator``. The application file is launched by the experiment driver script through a ``Model`` stage. **The Application Script Contents:** 1. Connecting SmartRedis clients within the application to retrieve tensors - from the standalone orchestrators to store in a colocated orchestrator. Details in section: + from the standalone ``Orchestrators`` to store in a colocated ``Orchestrator``. Details in section: :ref:`Initialize the Clients`. **The Experiment Driver Script Overview:** @@ -695,11 +695,11 @@ runs the application. **The Experiment Driver Script Contents:** -1. Launching two standalone orchestrators with unique identifiers. Details in section: +1. Launching two standalone ``Orchestrators`` with unique identifiers. Details in section: :ref:`Launch Multiple Orchestrators`. -2. Launching the application script with a colocated orchestrator. Details in section: +2. Launching the application script with a colocated ``Orchestrator``. Details in section: :ref:`Initialize a Colocated Model`. -3. Connecting SmartRedis clients within the driver script to send tensors to standalone orchestrators +3. Connecting SmartRedis clients within the driver script to send tensors to standalone ``Orchestrators`` for retrieval within the application. 
Details in section:
   :ref:`Create Client Connections to Orchestrators`.

@@ -708,17 +708,17 @@ Setup and run instructions can be found :ref:`here`

.. _app_script_multi_db:
The Application Script
======================
-Applications interact with the orchestrators
+Applications interact with the ``Orchestrators``
through a SmartRedis client.
In this section, we write an application script
to demonstrate how to connect SmartRedis
clients in the context of multiple
-launched orchestrators. Using the clients, we retrieve tensors
-from two orchestrators launched in the driver script, then store
-the tensors in the colocated orchestrators.
+launched ``Orchestrators``. Using the clients, we retrieve tensors
+from two ``Orchestrators`` launched in the driver script, then store
+the tensors in the colocated ``Orchestrator``.

.. note::
-    The Experiment must be started to use the orchestrators within the
+    The Experiment must be started to use the ``Orchestrators`` within the
    application script. Otherwise, it will fail to connect. Find the
    instructions on how to launch :ref:`here`

@@ -732,36 +732,36 @@ To begin, import the necessary packages:

.. _init_model_client:
Initialize the Clients
----------------------
-To establish a connection with each orchestrators,
+To establish a connection with each ``Orchestrator``,
we need to initialize a new SmartRedis client for each.

Step 1: Initialize ConfigOptions
''''''''''''''''''''''''''''''''
-Since we are launching multiple orchestrators within the experiment,
+Since we are launching multiple ``Orchestrators`` within the experiment,
the SmartRedis ``ConfigOptions`` object is required when initializing
a client in the application.
We use the ``ConfigOptions.create_from_environment()``
function to create three instances of ``ConfigOptions``,
-with one instance associated with each launched orchestrator. 
-Most importantly, to associate each launched orchestrator to a ``ConfigOptions`` object, -the ``create_from_environment()`` function requires specifying the unique orchestrator identifier +with one instance associated with each launched ``Orchestrator``. +Most importantly, to associate each launched ``Orchestrator`` to a ``ConfigOptions`` object, +the ``create_from_environment()`` function requires specifying the unique ``Orchestrator`` identifier argument named `db_identifier`. -For the single-sharded orchestrator: +For the single-sharded ``Orchestrator``: .. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py :language: python :linenos: :lines: 5-6 -For the multi-sharded orchestrator: +For the multi-sharded ``Orchestrator``: .. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py :language: python :linenos: :lines: 10-11 -For the colocated orchestrator: +For the colocated ``Orchestrator``: .. literalinclude:: ../tutorials/getting_started/multi_db_example/application_script.py :language: python @@ -817,7 +817,7 @@ located in ``getting-started-multidb/tutorial_model/``:: Model: single shard logger@00-00-00:The single sharded db tensor is: [1 2 3 4] Model: multi shard logger@00-00-00:The multi sharded db tensor is: [5 6 7 8] -This output showcases that we have established a connection with multiple Orchestrators. +This output showcases that we have established a connection with multiple ``Orchestrators``. Next, take the tensors retrieved from the standalone deployment ``Orchestrators`` and store them in the colocated ``Orchestrator`` using ``Client.put_tensor(name, data)``. @@ -860,29 +860,29 @@ We setup the SmartSim ``logger`` to output information from the Experiment. 
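The retrieve-and-forward pattern described above (pull tensors from the two standalone ``Orchestrators``, then ``Client.put_tensor()`` them onto the colocated ``Orchestrator``) can be sketched without a running database. The `StubClient` below is hypothetical: a plain dictionary standing in for an ``Orchestrator``, exposing only the ``get_tensor``/``put_tensor`` shape of the SmartRedis ``Client`` so the data flow is visible.

```python
# Illustration only: dict-backed stand-ins for SmartRedis clients, so the
# retrieve-and-forward flow can be shown without a running Orchestrator.
class StubClient:
    """Hypothetical stand-in exposing get_tensor/put_tensor like SmartRedis."""

    def __init__(self, store=None):
        self._store = store if store is not None else {}

    def put_tensor(self, name, data):
        self._store[name] = data

    def get_tensor(self, name):
        return self._store[name]


# One "client" per launched Orchestrator, mirroring the application script
single_shard_client = StubClient({"tensor_1": [1, 2, 3, 4]})
multi_shard_client = StubClient({"tensor_2": [5, 6, 7, 8]})
colocated_client = StubClient()

# Retrieve from each standalone Orchestrator, then store on the colocated one
for src, name in [(single_shard_client, "tensor_1"),
                  (multi_shard_client, "tensor_2")]:
    colocated_client.put_tensor(name, src.get_tensor(name))

print(colocated_client.get_tensor("tensor_1"))  # [1, 2, 3, 4]
```

In the real script each `StubClient` is replaced by a SmartRedis ``Client`` built from the matching ``ConfigOptions`` instance, but the movement of data is the same.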
Launch Multiple Orchestrators ----------------------------- In the context of this ``Experiment``, it's essential to create and launch -the orchestrators as a preliminary step before any other components since -the application script requests tensors from the launched orchestrators. +the ``Orchestrators`` as a preliminary step before any other components since +the application script requests tensors from the launched ``Orchestrators``. We aim to showcase the multi-database automation capabilities of SmartSim, so we -create two orchestrators in the workflow: a single-sharded ``Orchestrator`` and a +create two ``Orchestrators`` in the workflow: a single-sharded ``Orchestrator`` and a multi-sharded ``Orchestrator``. Step 1: Initialize Orchestrators '''''''''''''''''''''''''''''''' -To create an orchestrator, utilize the ``Experiment.create_database()`` function. +To create an ``Orchestrator``, utilize the ``Experiment.create_database()`` function. The function requires specifying a unique -database identifier argument named `db_identifier` to launch multiple orchestrators. +database identifier argument named `db_identifier` to launch multiple ``Orchestrators``. This step is necessary to connect to databases outside of the driver script. We will use the `db_identifier` names we specified in the application script. -For the single-sharded orchestrator: +For the single-sharded ``Orchestrator``: .. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py :language: python :linenos: :lines: 12-14 -For the multi-sharded orchestrator: +For the multi-sharded ``Orchestrator``: .. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py :language: python @@ -891,15 +891,15 @@ For the multi-sharded orchestrator: .. note:: Calling ``exp.generate()`` will create two subfolders - (one for each orchestrator created in the previous step) - whose names are based on the `db_identifier` of that orchestrator. 
+ (one for each ``Orchestrator`` created in the previous step) + whose names are based on the `db_identifier` of that ``Orchestrator``. In this example, the Experiment folder is - named ``getting-started-multidb/``. Within this folder, two orchestrator subfolders will + named ``getting-started-multidb/``. Within this folder, two ``Orchestrator`` subfolders will be created, namely ``single_shard_db_identifier/`` and ``multi_shard_db_identifier/``. Step 2: Start ''''''''''''' -Next, to launch the orchestrators, +Next, to launch the ``Orchestrators``, pass the ``Orchestrator`` instances to ``Experiment.start()``. .. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py @@ -907,8 +907,8 @@ pass the ``Orchestrator`` instances to ``Experiment.start()``. :linenos: :lines: 20-21 -The ``Experiment.start()`` function launches the orchestrators for use within the workflow. In other words, the function -deploys the orchestrators on the allocated compute resources. +The ``Experiment.start()`` function launches the ``Orchestrators`` for use within the workflow. In other words, the function +deploys the ``Orchestrators`` on the allocated compute resources. .. note:: By setting `summary=True`, SmartSim will print a summary of the @@ -921,20 +921,20 @@ deploys the orchestrators on the allocated compute resources. Create Client Connections to Orchestrators ------------------------------------------ The SmartRedis ``Client`` object contains functions that manipulate, send, and receive -data within the orchestrator. Each orchestrator has a single, dedicated SmartRedis ``Client``. -Begin by initializing a SmartRedis ``Client`` object per launched orchestrator. +data within the ``Orchestrator``. Each ``Orchestrator`` has a single, dedicated SmartRedis ``Client``. +Begin by initializing a SmartRedis ``Client`` object per launched ``Orchestrator``. To create a designated SmartRedis client, you need to specify the address of the target -running orchestrator. 
You can easily retrieve this address using the ``Orchestrator.get_address()`` function.
+running ``Orchestrator``. You can easily retrieve this address using the ``Orchestrator.get_address()`` function.

-For the single-sharded orchestrator:
+For the single-sharded ``Orchestrator``:

.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
   :language: python
   :linenos:
   :lines: 23-24

-For the multi-sharded orchestrator:
+For the multi-sharded ``Orchestrator``:

.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
   :language: python
   :linenos:
   :lines: 25-26

Store Data Using Clients
------------------------
In the application script, we retrieved two NumPy tensors.
To support the app's functionality, we will create two
-NumPy arrays in the python driver script and send them to the a orchestrator. To
+NumPy arrays in the Python driver script and send them to the ``Orchestrators``. To
accomplish this, we use the ``Client.put_tensor()`` function with the respective
-orchestrator `client` instances.
+``Orchestrator`` `client` instances.

-For the single-sharded orchestrator:
+For the single-sharded ``Orchestrator``:

.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
   :language: python
   :linenos:
   :lines: 28-31

-For the multi-sharded orchestrator:
+For the multi-sharded ``Orchestrator``:

.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
   :language: python
   :linenos:
   :lines: 33-36

-Lets check to make sure the database tensors do not exist in the incorrect orchestrators:
+Let's check to make sure the database tensors do not exist in the incorrect ``Orchestrators``:

.. 
literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
   :language: python
   :linenos:
   :lines: 38-42

When you run the experiment, the following output will appear::

@@ -979,16 +979,16 @@ When you run the experiment, the following output will appear::

Initialize a Colocated Model
----------------------------
In the next stage of the experiment, we
-launch the application script with a co-located orchestrator
+launch the application script with a co-located ``Orchestrator``
by configuring and creating a SmartSim colocated ``Model``.

Step 1: Configure
'''''''''''''''''
-You can specify the run settings of a model.
+You can specify the run settings of a ``Model``.
In this experiment, we invoke the Python interpreter to run
the Python script defined in section: :ref:`The Application Script`.
-To configure this into a SmartSim model, we use the ``Experiment.create_run_settings()`` function.
+To configure this into a SmartSim ``Model``, we use the ``Experiment.create_run_settings()`` function.
The function returns a ``RunSettings`` object.
When initializing the RunSettings object,
we specify the path to the application file,

@@ -1027,8 +1027,8 @@ to the ``create_model()`` function and assign to the variable ``model``.

Step 2: Colocate
''''''''''''''''
-To colocate the model, use the ``Model.colocate_db_uds()`` function to
-Colocate an ``Orchestrator`` instance with this Model over
+To colocate the ``Model``, use the ``Model.colocate_db_uds()`` function to
+colocate an ``Orchestrator`` instance with this ``Model`` over
a Unix domain socket connection.

.. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py
   :language: python
   :linenos:
   :lines: 51-52

This method will initialize settings which add an unsharded
-orchestrator to this ``Model`` instance. Only this Model will be able
-to communicate with this colocated orchestrator by using the loopback TCP interface.
+``Orchestrator`` to this ``Model`` instance. 
Only this ``Model`` will be able +to communicate with this colocated ``Orchestrator`` by using the loopback TCP interface. Step 3: Start ''''''''''''' -Next, launch the colocated model instance using the ``Experiment.start()`` function. +Next, launch the colocated ``Model`` instance using the ``Experiment.start()`` function. .. literalinclude:: ../tutorials/getting_started/multi_db_example/multidb_driver.py :language: python @@ -1051,7 +1051,7 @@ Next, launch the colocated model instance using the ``Experiment.start()`` funct .. note:: We set `block=True`, - so that ``Experiment.start()`` waits until the last Model has finished + so that ``Experiment.start()`` waits until the last ``Model`` has finished before returning: it will act like a job monitor, letting us know if processes run, complete, or fail. From 7ef65f7e973ae7fc19d4588348cb2e7a83ba1cf6 Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Thu, 18 Jan 2024 12:51:09 -0600 Subject: [PATCH 25/26] pushing edits --- doc/orchestrator.rst | 224 +++++++++++++++++++++---------------- doc/sr_advanced_topics.rst | 2 +- 2 files changed, 130 insertions(+), 96 deletions(-) diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index ac076d051..c3b806094 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -11,16 +11,16 @@ exchange, online interactive visualization, online data analysis, computational An ``Orchestrator`` can be thought of as a general feature store capable of storing numerical data (Tensors and Datasets), AI Models (TF, TF-lite, PyTorch, or ONNX), and scripts (TorchScripts). In addition to storing data, the ``Orchestrator`` is capable of -executing ML Models and TorchScripts on the stored data using CPUs or GPUs. +executing AI Models and TorchScripts on the stored data using CPUs or GPUs. .. 
figure:: images/smartsim-arch.png Sample experiment showing a user application leveraging machine learning infrastructure launched by SmartSim and connected - to online analysis and visualization via the ``Orchestrator``. + to an online analysis and visualization simulation via the ``Orchestrator``. Users can establish a connection to the ``Orchestrator`` from within SmartSim ``Model`` executable code, ``Ensemble`` -``Model`` executable code, or driver scripts using the :ref:`SmartRedis` client library. +``Model`` executable code, or driver scripts using the :ref:`SmartRedis` ``Client`` library. SmartSim offers **two** types of ``Orchestrator`` deployments: @@ -41,7 +41,7 @@ SmartSim allows users to launch :ref:`multiple Orchestrators` of the course of an experiment. If a workflow requires a multiple ``Orchestrator`` environment, a `db_identifier` argument must be specified during ``Orchestrator`` initialization. Users can connect to ``Orchestrators`` in a multiple ``Orchestrator`` workflow by specifying the respective `db_identifier` argument -within a ``ConfigOptions`` object that is passed into the SmartRedis ``Client`` constructor. +within a :ref:`ConfigOptions` object that is passed into the SmartRedis ``Client`` constructor. .. _standalone_orch_doc: ===================== @@ -52,22 +52,24 @@ Overview -------- During standalone ``Orchestrator`` deployment, a SmartSim ``Orchestrator`` (the database) runs on separate compute node(s) from the SmartSim ``Model`` node(s). A standalone ``Orchestrator`` can be deployed on a single -node (standalone) or sharded (distributed) over multiple nodes. With a sharded ``Orchestrator``, users can +node (single-sharded) or sharded (distributed) over multiple nodes. With a sharded ``Orchestrator``, users can scale the number of database nodes for inference and script evaluation, enabling -increased in-memory capacity for data storage in large-scale workflows. 
Standalone -``Orchestrators`` are effective for small-scale workflows and offer lower latency for some API calls because -single-node ``Orchestrators`` don't require communication between ``Orchestrator`` nodes. +increased in-memory capacity for data storage in large-scale workflows. Single-node +``Orchestrators`` are effective for small-scale workflows and offer lower latency for ``Client`` API calls +that involve data appending or processing (e.g. ``Client.append_to_list()``, ``Client.run_model()``, etc). +This efficiency stems from the localization of all data on the compute node +and no requirement for communication between other ``Orchestrator`` nodes. When connecting to a standalone ``Orchestrator`` from within a ``Model`` application, the user has -several options when using the SmartRedis client: +several options when using the SmartRedis ``Client``: - In an experiment with a single deployed ``Orchestrator``, users can rely on SmartSim to detect the ``Orchestrator`` address through runtime configuration of the SmartSim ``Model`` environment. A default ``Client`` constructor, with no user-specified parameters, is sufficient to - connect to the ``Orchestrator``. The only exception is for the Python `client`, which requires + connect to the ``Orchestrator``. The only exception is for the Python ``client``, which requires the `cluster` constructor parameter to differentiate between a multi-node standalone deployment and a single-node standalone deployment. -- In an experiment with multiple ``Orchestrator``, users can connect to a specific ``Orchestrator`` by +- In an experiment with multiple ``Orchestrators``, users can connect to a specific ``Orchestrator`` by first specifying the `db_identifier` in the ``ConfigOptions`` constructor. Subsequently, users should pass the ``ConfigOptions`` instance to the ``Client`` constructor. 
- Users can specify or override automatically configured connection options by providing the

@@ -77,7 +79,7 @@ several options when using the SmartRedis client:

If connecting to a standalone ``Orchestrator`` from a SmartSim driver script, the user must specify
the address of the ``Orchestrator`` via the ``Client`` constructor. SmartSim does not automatically
configure the environment of the driver script to connect to an ``Orchestrator``. Users
-can access an orchestrators address through ``Orchestrator.get_address()``.
+can access an ``Orchestrator``'s address through ``Orchestrator.get_address()``.

.. note::
  In SmartSim ``Model`` applications, it is advisable to **avoid** specifying addresses directly to the ``Client`` constructor.

@@ -102,39 +104,39 @@ SmartSim ``Model`` application and the sharded ``Orchestrator`` using the :ref:`

In scenarios with high data throughput, such as online analysis, training, and processing, a standalone
``Orchestrator`` is optimal. The data produced
by multiple processes in a SmartSim ``Model`` is stored in the standalone
-orchest``Orchestrator``rator and is available for consumption by other SmartSim ``Models``.
+``Orchestrator`` and is available for consumption by other SmartSim ``Models``.

If a workflow requires an application to leverage multiple standalone deployments,
-multiple clients can be instantiated within an application,
-with each client connected to a unique deployment. This is accomplished through the use of the
-`db-identifier` and ``ConfigOptions`` object specified at ``Orchestrator`` initialization time.
+multiple ``Clients`` can be instantiated within an application,
+with each ``Client`` connected to a unique deployment. This is accomplished through the use of the
+`db-identifier` and :ref:`ConfigOptions` object specified at ``Orchestrator`` initialization time.

-------
Example
-------
In the following example, we demonstrate deploying a standalone ``Orchestrator`` on an HPC System. 
Once the standalone ``Orchestrator`` is launched from the driver script, we walk through -connecting a SmartRedis client to the ``Orchestrator`` from within the SmartSim ``Model`` +connecting a SmartRedis ``Client`` to the ``Orchestrator`` from within the SmartSim ``Model`` script to transmit and poll for data. The example is comprised of two script files: - :ref:`Application Script` The application script is a Python file that contains instructions to create a SmartRedis - client connection to the standalone ``Orchestrator`` launched in the driver script. + ``Client`` connection to the standalone ``Orchestrator`` launched in the driver script. To demonstrate the ability of workflow components to access data from - other entities, we then retrieve the tensors set by the driver script using a SmartRedis client in - the application script. We then instruct the client to send and retrieve data from within the application script. + other entities, we then retrieve the tensors set by the driver script using a SmartRedis ``Client`` in + the application script. We then instruct the ``Client`` to send and retrieve data from within the application script. - :ref:`Experiment Driver Script` - The experiment driver script is responsible for launching and managing SmartSim entities. Within this script, - we use the Experiment API to create and launch a standalone ``Orchestrator``. To demonstrate the capability of + The ``Experiment`` driver script is responsible for launching and managing SmartSim entities. Within this script, + we use the ``Experiment`` API to create and launch a standalone ``Orchestrator``. To demonstrate the capability of SmartSim ``Model`` applications to access ``Orchestrator`` data sent from other sources, we employ the SmartRedis ``Client`` in the driver script to store a tensor in the ``Orchestrator``, which is later retrieved by the SmartSim ``Model``. 
To employ the application script, we initialize a ``Model`` object with the application script as an executable argument, launch the ``Orchestrator``, and then launch the ``Model``. To further demonstrate the ability of workflow components to access data from - other entities, we then retrieve the tensors stored by the ``Model`` using a SmartRedis client in + other entities, we then retrieve the tensors stored by the ``Model`` using a SmartRedis ``Client`` in the driver script. Lastly, we tear down the ``Orchestrator``. .. _standalone_orch_app_script: @@ -150,7 +152,7 @@ To begin writing the application script, import the necessary SmartRedis package Client Initialization --------------------- -To establish a connection with the ``Orchestrator``, we need to initialize a new SmartRedis client. +To establish a connection with the ``Orchestrator``, we need to initialize a new SmartRedis ``Client``. Since the ``Orchestrator`` we launch in the driver script is sharded, we specify the constructor argument `cluster` as `True`. @@ -160,7 +162,7 @@ constructor argument `cluster` as `True`. application_client = Client(cluster=True) .. note:: - Note that the C/C++/Fortran SmartRedis clients are capable of reading cluster configurations + Note that the C/C++/Fortran SmartRedis ``Clients`` are capable of reading cluster configurations from the SmartSim ``Model`` environment and the `cluster` constructor argument does not need to be specified in those client languages. @@ -173,7 +175,7 @@ constructor argument `cluster` as `True`. .. note:: To create a SmartRedis client connection to the standalone ``Orchestrator``, the standalone ``Orchestrator`` must be launched from within the driver script. You must execute the Python driver script, otherwise, there will - be no ``Orchestrator`` to connect the client to. + be no ``Orchestrator`` to connect the ``Client`` to. Data Retrieval -------------- @@ -183,15 +185,15 @@ used in the driver script as input to ``Client.put_tensor()``: .. 
code-block:: python

-    # Retrieve tensor from Orchestrator
+    # Retrieve the driver script tensor from Orchestrator
    driver_script_tensor = application_client.get_tensor("tensor_1")
    # Log tensor
-    application_client.log_data(LLInfo, f"The single sharded db tensor is: {driver_script_tensor}")
+    application_client.log_data(LLInfo, f"The multi-sharded db tensor is: {driver_script_tensor}")

After the ``Model`` is launched by the driver script, the following output will appear in
`getting-started/tutorial_model/model.out`::

-    Default@17-11-48:The single sharded db tensor is: [1 2 3 4]
+    Default@17-11-48:The multi-sharded db tensor is: [1 2 3 4]

Data Storage
------------
@@ -219,7 +221,7 @@ In this example, we instantiate an ``Experiment`` object with the name `getting-
and the `launcher` set to `auto`. When using `launcher=auto`, SmartSim attempts to find a launcher on the machine.
In this case, since we are running the example on a Slurm-based machine,
SmartSim will automatically set the launcher to `slurm`.
-We setup the SmartSim `logger` to output information from the ``Experiment`` at runtime:
+We also set up the SmartSim `logger` to output information from the ``Experiment`` at runtime:

.. code-block:: python

@@ -236,16 +238,10 @@ We setup the SmartSim `logger` to output information from the ``Experiment`` at

    # Initialize the Experiment
    exp = Experiment("getting-started", launcher="auto")

-Orchestrator Deployment
------------------------
-In the context of this experiment, it's essential to create and launch
-the ``Orchestrator`` as a preliminary step before any other workflow entities. This is because
-in this example the application script requests and sends tensors to and from a launched ``Orchestrator``.
-
-In the next stage of the experiment, we create and launch a standalone ``Orchestrator``.
+Orchestrator Initialization
+---------------------------
+In the next stage of the experiment, we create a standalone ``Orchestrator``. 
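To make the effect of `db_nodes=3` concrete: a multi-sharded ``Orchestrator`` routes each key deterministically to one shard. Redis Cluster (which backs sharded deployments) really maps each key to one of 16384 hash slots via CRC16 and assigns slot ranges to shards; the sketch below is a simplified stand-in (Python's `zlib.crc32` and a naive slot-to-shard mapping, not SmartSim or Redis internals) that shows only the idea of deterministic key-to-shard placement.

```python
# Simplified illustration (NOT Redis/SmartSim internals) of how keys spread
# across a multi-sharded Orchestrator created with db_nodes=3.
import zlib

DB_NODES = 3          # mirrors exp.create_database(db_nodes=3)
HASH_SLOTS = 16384    # Redis Cluster's slot count

def shard_for(key: str) -> int:
    # Stand-in for CRC16 slotting: hash the key into a slot, then map the
    # slot to one of the shards (real clusters assign slot *ranges* to shards).
    slot = zlib.crc32(key.encode()) % HASH_SLOTS
    return slot % DB_NODES

# Each tensor name lands on exactly one shard, every time
placement = {key: shard_for(key) for key in ("tensor_1", "tensor_2", "tensor_3")}
print(placement)
```

Because routing is deterministic, any ``Client`` connected to the cluster finds a tensor on the same shard that stored it, which is why the sharded deployment scales capacity without changing application code.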
-Step 1: Initialize -'''''''''''''''''' To create a standalone ``Orchestrator``, utilize the ``Experiment.create_database()`` function. .. code-block:: python @@ -253,31 +249,19 @@ To create a standalone ``Orchestrator``, utilize the ``Experiment.create_databas # Initialize a multi-sharded Orchestrator standalone_orchestrator = exp.create_database(db_nodes=3) -Step 2: Start -''''''''''''' -Next, to launch the ``Orchestrator``, pass the ``Orchestrator`` instance to ``Experiment.start()``. - -.. code-block:: python - - # Launch the multi sharded Orchestrator - exp.start(standalone_orchestrator) - -The ``Experiment.start()`` function launches the ``Orchestrator`` for use within the workflow. -In other words, the function deploys the ``Orchestrator`` on the allocated compute resources. - Client Initialization --------------------- The SmartRedis ``Client`` object contains functions that manipulate, send, and retrieve data on the ``Orchestrator``. Begin by initializing a SmartRedis ``Client`` object for the standalone ``Orchestrator``. -SmartRedis clients in driver scripts do not have the ability to use a `db-identifier` or +SmartRedis ``Clients`` in driver scripts do not have the ability to use a `db-identifier` or rely on automatic configurations to connect to ``Orchestrators``. Therefore, when creating a SmartRedis client connection from within a driver script, specify the address of the ``Orchestrator`` you would like to connect to. You can easily retrieve the ``Orchestrator`` address using the ``Orchestrator.get_address()`` function: .. 
code-block:: python

-    # Initialize a SmartRedis client for multi sharded Orchestrator
+    # Initialize a SmartRedis client for multi-sharded Orchestrator
     driver_client = Client(cluster=True, address=standalone_orchestrator.get_address()[0])

 Data Storage
@@ -296,8 +280,8 @@ send a tensor to the ``Orchestrator``, use the function ``Client.put_tensor(name

 Model Initialization
 --------------------
-In the next stage of the experiment, we execute the application script by configuring and creating
-a SmartSim ``Model`` and specifying the application script name during ``Model`` creation.
+In the next stage of the experiment, we configure and create
+a SmartSim ``Model`` and specify the application script path during ``Model`` creation.

 Step 1: Configure
 '''''''''''''''''
@@ -334,8 +318,49 @@ assign the returned ``Model`` instance to the variable `model`:

     # Initialize the Model
     model = exp.create_model("model", model_settings)

-Step 3: Start
-'''''''''''''
+File Generation
+---------------
+To create an isolated output directory for the ``Orchestrator`` and ``Model``, invoke ``Experiment.generate()`` via the
+``Experiment`` instance `exp` with `standalone_orchestrator` and `model` as input parameters:
+
+.. code-block:: python
+
+    # Create the output directory
+    exp.generate(standalone_orchestrator, model)
+
+.. note::
+    Invoking ``Experiment.generate(standalone_orchestrator, model)`` will create two directories:
+    `standalone_orchestrator/` and `model/`. Each of these directories will store
+    two output files: a `.out` file and a `.err` file.
+
+.. note::
+    It is important to invoke ``Experiment.generate()`` with all ``Experiment`` entity instances as input
+    as well as before any entities have been launched. This will ensure that the output files are
+    organized in the main ``experiment-name/`` folder. In this example, the ``Experiment`` folder is named
+    `getting-started/`.
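Taken together, the driver's ``Client.put_tensor`` call and the application's ``get_tensor`` call amount to a named key/value round trip through the ``Orchestrator``. Since the real calls require a launched database, the sketch below mimics only that semantics with an invented in-memory stand-in (``MockOrchestrator`` is not a SmartSim or SmartRedis class):

```python
import numpy as np

# Invented stand-in for a launched Orchestrator: a dict keyed by tensor name.
# It mirrors only the put_tensor/get_tensor behavior used in this example;
# the real SmartRedis calls require a running database.
class MockOrchestrator:
    def __init__(self):
        self._tensors = {}

    def put_tensor(self, name, data):
        # Store a copy so later mutation of the source array is not visible
        self._tensors[name] = np.array(data, copy=True)

    def get_tensor(self, name):
        return self._tensors[name]

db = MockOrchestrator()
db.put_tensor("tensor_1", np.array([1, 2, 3, 4]))  # driver script side
retrieved = db.get_tensor("tensor_1")              # application side
assert np.array_equal(retrieved, [1, 2, 3, 4])
```

The key name (`"tensor_1"`) is the only contract shared between the driver and the application: whichever process stores a tensor under that name makes it retrievable by any connected client.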
+ +Entity Deployment +----------------- +In the next stage of the experiment, we launch the ``Orchestrator``, then launch the ``Model``. + +Step 1: Start Orchestrator +'''''''''''''''''''''''''' +In the context of this experiment, it's essential to create and launch +the ``Orchestrator`` as a preliminary step before any other workflow entities. This is because +in this example the application script requests and sends tensors to and from a launched ``Orchestrator``. + +To launch the ``Orchestrator``, pass the ``Orchestrator`` instance to ``Experiment.start()``. + +.. code-block:: python + + # Launch the multi-sharded Orchestrator + exp.start(standalone_orchestrator) + +The ``Experiment.start()`` function launches the ``Orchestrator`` for use within the workflow. +In other words, the function deploys the ``Orchestrator`` on the allocated compute resources. + +Step 2: Start Model +''''''''''''''''''' Next, launch the `model` instance using the ``Experiment.start()`` function: .. code-block:: python @@ -359,14 +384,14 @@ polling every 100 milliseconds until 10 attempts have completed: .. code-block:: python - # Retrieve the tensors placed by the Model + # Poll the tensors placed by the Model app_tensor = driver_client.poll_key("tensor_2", 100, 10) # Validate that the tensor exists - logger.info(f"The tensor is {app_tensor}") + logger.info(f"The tensor exists: {app_tensor}") When you execute the driver script, the output will be as follows:: - 23:45:46 osprey.us.cray.com SmartSim[87400] INFO The tensor is True + 23:45:46 system.host.com SmartSim[87400] INFO The tensor exists: True Cleanup ------- @@ -396,13 +421,13 @@ Overview During colocated ``Orchestrator`` deployment, a SmartSim ``Orchestrator`` (the database) runs on the ``Models`` compute node(s). Colocated ``Orchestrators`` can only be deployed as isolated instances on each compute node and cannot be clustered over multiple nodes. 
The ``Orchestrator`` on each application node is -utilized by SmartRedis clients on the same node. With a colocated ``Orchestrator``, latency is reduced +utilized by SmartRedis ``Clients`` on the same node. With a colocated ``Orchestrator``, latency is reduced in ML inference and TorchScript evaluation by eliminating off-node communication. A colocated ``Orchestrator`` is ideal when the data and hardware accelerator are located on the same compute node. Communication between a colocated ``Orchestrator`` and SmartSim ``Model`` -is initiated in the application through a SmartRedis client. Since a colocated ``Orchestrator`` is launched when the SmartSim ``Model`` -is started by the experiment, connecting a SmartRedis client to a colocated ``Orchestrator`` is only possible from within +is initiated in the application through a SmartRedis ``Client``. Since a colocated ``Orchestrator`` is launched when the SmartSim ``Model`` +is started by the experiment, connecting a SmartRedis ``Client`` to a colocated ``Orchestrator`` is only possible from within the associated SmartSim ``Model`` application. There are **three** methods for connecting the SmartRedis client to the colocated ``Orchestrator``: @@ -411,9 +436,9 @@ There are **three** methods for connecting the SmartRedis client to the colocate - In an experiment with a single deployed ``Orchestrator``, users can rely on SmartSim to detect the ``Orchestrator`` address through runtime configuration of the SmartSim ``Model`` environment. A default ``Client`` constructor, with no user-specified parameters, is sufficient to - connect to the ``Orchestrator``. The only exception is for the Python `client`, which requires + connect to the ``Orchestrator``. The only exception is for the Python ``Client``, which requires the `cluster=False` constructor parameter for the colocated ``Orchestrator``. 
-- In an experiment with multiple ``Orchestrator`` deployments, users can connect to a specific ``Orchestrator`` by +- In an experiment with multiple ``Orchestrators``, users can connect to a specific ``Orchestrator`` by first specifying the `db_identifier` in the ``ConfigOptions`` constructor. Subsequently, users should pass the ``ConfigOptions`` instance to the ``Client`` constructor. - Users can specify or override automatically configured connection options by providing the @@ -421,7 +446,7 @@ There are **three** methods for connecting the SmartRedis client to the colocate instance to the ``Client`` constructor. Below is an image illustrating communication within a colocated SmartSim ``Model`` spanning multiple compute nodes. -As demonstrated in the diagram, each process of the application creates its own SmartRedis client +As demonstrated in the diagram, each process of the application creates its own SmartRedis ``Client`` connection to the ``Orchestrator`` running on the same host. .. figure:: images/colocated_orchestrator-1.png @@ -435,24 +460,23 @@ off-node to be used to evaluate a ML model, and the results of the ML model eval are stored on-node. If a workflow requires an application to both leverage colocated -deployment and standalone deployment, multiple clients can be instantiated within an application, -with each client connected to a unique deployment. This is accomplished through the use of the +deployment and standalone deployment, multiple ``Clients`` can be instantiated within an application, +with each ``Client`` connected to a unique deployment. This is accomplished through the use of the `db-identifier` specified at ``Orchestrator`` initialization time. ------- Example ------- In the following example, we demonstrate deploying a colocated ``Orchestrator`` on an HPC System. 
-Once the ``Orchestrator`` is launched, we walk through connecting a SmartRedis client -from within the application script to transmit data then poll for the existence of the data -on the ``Orchestrator``. +Once the ``Orchestrator`` is launched, we walk through connecting a SmartRedis ``Client`` +from within the application script to transmit and poll for data on the ``Orchestrator``. The example is comprised of two script files: - :ref:`Application Script` The application script is a Python script that connects a SmartRedis - client to the colocated ``Orchestrator``. From within the application script, - the client is utilized to both send and retrieve data. + ``Client`` to the colocated ``Orchestrator``. From within the application script, + the ``Client`` is utilized to both send and retrieve data. - :ref:`Experiment Driver Script` The experiment driver script launches and manages the example entities with the Experiment API. @@ -473,8 +497,8 @@ To begin writing the application script, import the necessary SmartRedis package Client Initialization --------------------- To establish a connection with the colocated ``Orchestrator``, we need to initialize a -new SmartRedis `client` and specify `cluster=False` since colocated deployments are never -clustered but single-sharded. +new SmartRedis ``Client`` and specify `cluster=False` since colocated deployments are never +clustered but only single-sharded. .. code-block:: python @@ -482,9 +506,9 @@ clustered but single-sharded. colo_client = Client(cluster=False) .. note:: - Note that the C/C++/Fortran SmartRedis clients are capable of reading cluster configurations + Note that the C/C++/Fortran SmartRedis ``Clients`` are capable of reading cluster configurations from the ``Model`` environment and the `cluster` constructor argument does not need to be specified - in those client languages. + in those ``Client`` languages. .. 
note:: Since there is only one ``Orchestrator`` launched in the Experiment @@ -493,13 +517,13 @@ clustered but single-sharded. SmartRedis will handle the connection configuration. .. note:: - To create a client connection to the colocated ``Orchestrator``, the colocated ``Model`` must be launched + To create a ``Client`` connection to the colocated ``Orchestrator``, the colocated ``Model`` must be launched from within the driver script. You must execute the Python driver script, otherwise, there will - be no ``Orchestrator`` to connect the client to. + be no ``Orchestrator`` to connect the ``Client`` to. Data Storage ------------ -Next, using the SmartRedis client instance, we create and store a NumPy tensor through +Next, using the SmartRedis ``Client`` instance, we create and store a NumPy tensor through ``Client.put_tensor(name, data)``: .. code-block:: python @@ -611,7 +635,17 @@ a Unix domain socket connection. # Colocate the Model model.colocate_db_uds() -Step 4: Start +Step 4: Generate Files +'''''''''''''''''''''' +Next, generate the ``Experiment`` entity output files by passing the ``Model`` instance to +``Experiment.generate()``: + +.. code-block:: python + + # Generate output files + exp.generate(model) + +Step 5: Start ''''''''''''' Next, launch the colocated ``Model`` instance using the ``Experiment.start()`` function. @@ -641,10 +675,10 @@ When you run the experiment, the following output will appear:: Multiple Orchestrators ====================== SmartSim supports automating the deployment of multiple ``Orchestrators`` -from within an Experiment. Communication with the ``Orchestrator`` via a SmartRedis client is possible with the +from within an Experiment. Communication with the ``Orchestrator`` via a SmartRedis ``Client`` is possible with the `db_identifier` argument that is required when initializing an ``Orchestrator`` or colocated ``Model`` during a multiple ``Orchestrator`` experiment. 
When initializing a SmartRedis -client during the experiment, create a ``ConfigOptions`` object to specify the `db_identifier` +``Client`` during the experiment, create a ``ConfigOptions`` object to specify the `db_identifier` argument used when creating the ``Orchestrator``. Pass the ``ConfigOptions`` object to the ``Client()`` init call. @@ -670,16 +704,16 @@ In this example, the application script is a python file that contains instructions to complete computational tasks. Applications are not limited to Python and can also be written in C, C++ and Fortran. -This script specifies creating a Python SmartRedis client for each +This script specifies creating a Python SmartRedis ``Client`` for each standalone ``Orchestrator`` and a colocated ``Orchestrator``. We use the -clients to request data from both standalone ``Orchestrators``, then +``Clients`` to request data from both standalone ``Orchestrators``, then transfer the data to the colocated ``Orchestrator``. The application file is launched by the experiment driver script through a ``Model`` stage. **The Application Script Contents:** -1. Connecting SmartRedis clients within the application to retrieve tensors +1. Connecting SmartRedis ``Clients`` within the application to retrieve tensors from the standalone ``Orchestrators`` to store in a colocated ``Orchestrator``. Details in section: :ref:`Initialize the Clients`. @@ -699,7 +733,7 @@ runs the application. :ref:`Launch Multiple Orchestrators`. 2. Launching the application script with a colocated ``Orchestrator``. Details in section: :ref:`Initialize a Colocated Model`. -3. Connecting SmartRedis clients within the driver script to send tensors to standalone ``Orchestrators`` +3. Connecting SmartRedis ``Clients`` within the driver script to send tensors to standalone ``Orchestrators`` for retrieval within the application. Details in section: :ref:`Create Client Connections to Orchestrators`. 
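The `db_identifier` mechanism described above can be pictured as a name-to-address lookup: each identifier names exactly one launched ``Orchestrator``. The sketch below is a fabricated stand-in — the identifiers and addresses are invented, and the real resolution is performed by ``ConfigOptions.create_from_environment()`` against the ``Model`` environment rather than a Python dict:

```python
# Invented illustration of db_identifier-based routing. Each identifier maps
# to the address list of one launched Orchestrator; all names and addresses
# here are fabricated for the sketch.
orchestrator_addresses = {
    "single_shard_db_identifier": ["10.0.0.10:6780"],
    "multi_shard_db_identifier": ["10.0.0.11:6780", "10.0.0.12:6780", "10.0.0.13:6780"],
    "colo_db_identifier": ["127.0.0.1:6780"],
}

def connect(db_identifier: str):
    """Return the address list a Client configured with this identifier would use."""
    if db_identifier not in orchestrator_addresses:
        raise KeyError(f"No Orchestrator launched with identifier {db_identifier!r}")
    return orchestrator_addresses[db_identifier]

# A multi-sharded Orchestrator resolves to one address per shard.
assert len(connect("multi_shard_db_identifier")) == 3
```

This is why the `db_identifier` passed at ``Orchestrator`` creation time must match the one used when building ``ConfigOptions`` in the application: the identifier is the lookup key.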
@@ -709,11 +743,11 @@ Setup and run instructions can be found :ref:`here` The Application Script ====================== Applications interact with the ``Orchestrators`` -through a SmartRedis client. +through a SmartRedis ``Client``. In this section, we write an application script to demonstrate how to connect SmartRedis -clients in the context of multiple -launched ``Orchestrators``. Using the clients, we retrieve tensors +``Clients`` in the context of multiple +launched ``Orchestrators``. Using the ``Clients``, we retrieve tensors from two ``Orchestrators`` launched in the driver script, then store the tensors in the colocated ``Orchestrators``. @@ -733,13 +767,13 @@ To begin, import the necessary packages: Initialize the Clients ---------------------- To establish a connection with each ``Orchestrators``, -we need to initialize a new SmartRedis client for each. +we need to initialize a new SmartRedis ``Client`` for each. Step 1: Initialize ConfigOptions '''''''''''''''''''''''''''''''' Since we are launching multiple ``Orchestrators`` within the experiment, the SmartRedis ``ConfigOptions`` object is required when initializing -a client in the application. +a ``Client`` in the application. We use the ``ConfigOptions.create_from_environment()`` function to create three instances of ``ConfigOptions``, with one instance associated with each launched ``Orchestrator``. @@ -771,9 +805,9 @@ For the colocated ``Orchestrator``: Step 2: Initialize the Client Connections ''''''''''''''''''''''''''''''''''''''''' Now that we have three ``ConfigOptions`` objects, we have the -tools necessary to initialize three SmartRedis clients and +tools necessary to initialize three SmartRedis ``Clients`` and establish a connection with the three ``Orchestrators``. 
-We use the SmartRedis ``Client`` API to create the client instances by passing in +We use the SmartRedis ``Client`` API to create the ``Client`` instances by passing in the ``ConfigOptions`` objects and assigning a `logger_name` argument. Single-sharded ``Orchestrator``: @@ -924,7 +958,7 @@ The SmartRedis ``Client`` object contains functions that manipulate, send, and r data within the ``Orchestrator``. Each ``Orchestrator`` has a single, dedicated SmartRedis ``Client``. Begin by initializing a SmartRedis ``Client`` object per launched ``Orchestrator``. -To create a designated SmartRedis client, you need to specify the address of the target +To create a designated SmartRedis ``Client``, you need to specify the address of the target running ``Orchestrator``. You can easily retrieve this address using the ``Orchestrator.get_address()`` function. For the single-sharded ``Orchestrator``: diff --git a/doc/sr_advanced_topics.rst b/doc/sr_advanced_topics.rst index 30da2c578..84ae8e959 100644 --- a/doc/sr_advanced_topics.rst +++ b/doc/sr_advanced_topics.rst @@ -1,2 +1,2 @@ - +.. _learn_config_otpions: .. include:: ../smartredis/doc/advanced_topics.rst \ No newline at end of file From 4badd0e0a0a42fa86f4edce1cba84d12d145849c Mon Sep 17 00:00:00 2001 From: Amanda Richardson Date: Thu, 18 Jan 2024 19:10:12 -0600 Subject: [PATCH 26/26] pushing matts comments --- doc/orchestrator.rst | 20 +++++++++----------- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/doc/orchestrator.rst b/doc/orchestrator.rst index c3b806094..307f8042c 100644 --- a/doc/orchestrator.rst +++ b/doc/orchestrator.rst @@ -63,7 +63,7 @@ and no requirement for communication between other ``Orchestrator`` nodes. 
When connecting to a standalone ``Orchestrator`` from within a ``Model`` application, the user has several options when using the SmartRedis ``Client``: -- In an experiment with a single deployed ``Orchestrator``, users can rely on SmartSim +- In an experiment with a single deployed ``Orchestrator``, users can rely on SmartRedis to detect the ``Orchestrator`` address through runtime configuration of the SmartSim ``Model`` environment. A default ``Client`` constructor, with no user-specified parameters, is sufficient to connect to the ``Orchestrator``. The only exception is for the Python ``client``, which requires @@ -146,8 +146,7 @@ To begin writing the application script, import the necessary SmartRedis package .. code-block:: python - from smartredis import Client, log_data - from smartredis import * + from smartredis import Client, log_data, LLInfo import numpy as np Client Initialization @@ -168,7 +167,7 @@ constructor argument `cluster` as `True`. .. note:: Since there is only one ``Orchestrator`` launched in the experiment - (the standalone ``Orchestrator``), specifying an ``Orchestrator`` address + (the standalone ``Orchestrator``), specifying an ``Orchestrator`` `db_identifier` is not required when initializing the SmartRedis client. SmartRedis will handle the connection configuration. @@ -369,8 +368,8 @@ Next, launch the `model` instance using the ``Experiment.start()`` function: exp.start(model, block=True, summary=True) .. note:: - We specify `block=True` to ``exp.start()`` because our experiment - requires that the ``Model`` finish before the experiment continues. + We specify `block=True` to ``exp.start()`` because our ``Experiment`` driver script + requires that the ``Model`` finish before the ``Experiment`` continues. This is because we will request tensors from the ``Orchestrator`` that are inputted by the ``Model`` we launched. @@ -433,7 +432,7 @@ the associated SmartSim ``Model`` application. 
There are **three** methods for connecting the SmartRedis client to the colocated ``Orchestrator``:

-- In an experiment with a single deployed ``Orchestrator``, users can rely on SmartSim
+- In an experiment with a single deployed ``Orchestrator``, users can rely on SmartRedis
   to detect the ``Orchestrator`` address through runtime configuration of the SmartSim ``Model``
   environment. A default ``Client`` constructor, with no user-specified parameters, is sufficient to
   connect to the ``Orchestrator``. The only exception is for the Python ``Client``, which requires
@@ -490,8 +489,7 @@ To begin writing the application script, import the necessary SmartRedis package

 .. code-block:: python

-    from smartredis import ConfigOptions, Client, log_data
-    from smartredis import *
+    from smartredis import ConfigOptions, Client, log_data, LLInfo
     import numpy as np

 Client Initialization
@@ -512,7 +510,7 @@ clustered but only single-sharded.

 .. note::
     Since there is only one ``Orchestrator`` launched in the Experiment
-    (the colocated ``Orchestrator``), specifying a ``Orchestrator`` address
+    (the colocated ``Orchestrator``), specifying an ``Orchestrator`` `db_identifier`
     is not required when initializing the client. SmartRedis will handle
     the connection configuration.
@@ -637,7 +635,7 @@ a Unix domain socket connection.

 Step 4: Generate Files
 ''''''''''''''''''''''
-Next, generate the ``Experiment`` entity output files by passing the ``Model`` instance to
+Next, generate the ``Experiment`` entity directories by passing the ``Model`` instance to
 ``Experiment.generate()``:

 .. code-block:: python