Ship deployment general instructions
Sohambutala committed Oct 19, 2024
1 parent 091585c commit 8ab8b2b
deployment_config/ship_deployment_instructions.md (102 additions, 0 deletions)
# Deployment Instructions

## Setting up the Environment

### Python Environment

1. Create and Activate Environment

Start by creating a new conda, venv, or mamba environment and activate it.
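
For example, with conda (the environment name and Python version below are only an illustration):

```bash
# Create and activate a fresh environment; name and Python version are examples
conda create -n echodataflow-env python=3.10
conda activate echodataflow-env
```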

2. Clone Echodataflow

Clone the echodataflow repository to your local machine:
```bash
git clone https://github.com/OSOceanAcoustics/echodataflow
```
3. Install Echodataflow in Editable Mode

Navigate to the cloned repository and install the package in editable mode:
```bash
cd echodataflow
pip install -e .
```

### Echodataflow configuration
1. Connect to Prefect Account

Ensure your machine is connected to a Prefect account, which could be either a local Prefect server or a [Prefect Cloud account](https://docs.prefect.io/3.0/manage/cloud/connect-to-cloud).
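
For example, you can either log in to Prefect Cloud or point the CLI at a local Prefect server; the URL below is the default local API address and should be adjusted if your server runs elsewhere:

```bash
# Option A: authenticate against Prefect Cloud (prompts for an API key / browser login)
prefect cloud login

# Option B: use a local Prefect server (started in another terminal with `prefect server start`)
# and point this environment's CLI at its API
prefect config set PREFECT_API_URL="http://127.0.0.1:4200/api"
```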
2. Initialize Echodataflow

In your environment, run the following command to set up the basic configuration for Echodataflow:

```bash
echodataflow init
```
3. Create YAML Configuration Files

Set up the YAML configuration files for each service to be deployed. Reference the [sample config files](https://drive.google.com/drive/u/2/folders/1C2Hs3-SxWbYaE3xTo7RRqAg4I7fzponW) for guidance and check the [documentation](https://echodataflow.readthedocs.io/en/latest/configuration/datastore.html) for additional information.
4. Add Required YAML Files

Place the following YAML files in a directory. These files are required for the current deployment on the Lasker or Shimada; if your use case is different, feel free to modify the files accordingly:

```bash
df_Sv_pipeline
datastore.yaml
pipeline.yaml
datastore_MVBS.yaml
pipeline_MVBS.yaml
datastore_prediction.yaml
pipeline_prediction.yaml
```

## Deploying the flows

1. Run Initial Scripts

In the extensions folder of your environment, run `file_monitor.py` and `file_downloader.py`:

```bash
python path/extensions/file_monitor.py
python path/extensions/file_downloader.py
```
Wait for the message "Your flow is being served and polling for scheduled runs!" to confirm that deployments have been created in your Prefect account.

2. Configure File Monitoring and File Transfer

Configure the source for file monitoring and set up the `rclone` command for file transfer.
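
As a minimal sketch, the transfer step might use an `rclone copy` command along these lines; the source directory, remote name, and include pattern are placeholders and must match your own rclone configuration (see `rclone config`):

```bash
# Illustrative only: copy newly arrived .raw files from the ship's data directory
# to a configured rclone remote (remote name and paths are placeholders)
rclone copy /data/echosounder/raw ship_remote:echodataflow/raw --include "*.raw"
```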

3. Run Main Deployment Script

Run `main.py` from the deployment folder to create additional deployments:

```bash
python deployment/main.py
```
4. View and Edit Deployments in Prefect UI

Go to the Prefect UI and check the Deployments tab to view the created deployments. You can duplicate deployments, modify schedules, and update datastore and pipeline configuration files directly from the UI.

5. Duplicate Deployments for Different Flows

Create separate deployments for the Sv, MVBS, and prediction flows by duplicating the existing ones and customizing the schedule and configurations as needed.

6. Add Path to Configuration Files

Update the deployment to include the correct paths to the YAML configuration files. If you're using S3 to manage config files, be sure to add the appropriate [block configuration](https://echodataflow.readthedocs.io/en/latest/configuration/blocks.html).

## Creating Work Pools and Work Queues

### Create Work Pools

In the Prefect UI, navigate to the Work Pools tab and create new pools. These pools can be distributed logically, such as one pool per service or per instance.
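
Work pools can also be created from the CLI instead of the UI; for example (the pool name and type below are illustrative):

```bash
# Create a process-type work pool; the name is an example
prefect work-pool create "echodataflow-pool" --type process
```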

### Create Work Queues

Similarly, create work queues and assign them to the work pools, distributing the queues so that load is balanced across the available workers.
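
For example, one queue per flow could be created from the CLI and attached to the pool above; all names here are illustrative:

```bash
# Create one queue per flow inside the pool created above; names are examples
prefect work-queue create "sv-queue" --pool "echodataflow-pool"
prefect work-queue create "mvbs-queue" --pool "echodataflow-pool"
prefect work-queue create "prediction-queue" --pool "echodataflow-pool"
```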

## Spinning Up Workers

### Start Workers

Once the work pools and queues are set up, you can start the workers. Prefect will provide commands for each pool and queue, which you can run to spin up workers on the instance.

### Run Workers in Parallel

Each worker command should be executed in a separate terminal session. This will allow multiple workers to run in parallel, processing tasks across different flows simultaneously.
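
For example, each of the following commands would run in its own terminal; the pool and queue names are placeholders and should match the pools and queues you created above:

```bash
# Run each command in a separate terminal session (names are the examples used above)
prefect worker start --pool "echodataflow-pool" --work-queue "sv-queue"          # terminal 1
prefect worker start --pool "echodataflow-pool" --work-queue "mvbs-queue"        # terminal 2
prefect worker start --pool "echodataflow-pool" --work-queue "prediction-queue"  # terminal 3
```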

1 comment on commit 8ab8b2b


@valentina-s (Collaborator) commented on 8ab8b2b, Oct 22, 2024


Details to add:

  • details on the package versions
  • also, can you specify the hash of the version you are running
  • add NASC and panel runs
  • add config files for NASC
  • If intermediate files are created during some of the deployments (such as eshader/database.sql), could you add details on where they are and whether the user should change the path for the next steps if they are on a different machine
  • For the Pools and Queues: can you take a snapshot of where you set them, and can you add at which stage you run these steps
  • Spinning Up Workers is not very clear: can you add more details and snapshots
  • There was a step to add a rules file, right? Can you add the file you used for the last run and instructions on how to set it
  • Automations: those need to be set up separately, right?
  • Details on the scheduling/setup of the deployments: the steps that need to be set up in Prefect
  • Is there a way to export the deployment parameter configurations?
  • path to the model
  • Cluster setup?

Also, move to the merging branch (subrefactor).
