feat: complete minimal workflow as template #8

Closed
wants to merge 2 commits into from
12 changes: 7 additions & 5 deletions .github/workflows/conventional-prs.yml
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
name: PR
name: Lint PR
on:
pull_request_target:
types:
@@ -7,12 +7,14 @@ on:
- edited
- synchronize

permissions:
pull-requests: read

jobs:
title-format:
main:
name: Validate PR title
runs-on: ubuntu-latest
steps:
- uses: amannn/action-semantic-pull-request@v3.4.0
- uses: amannn/action-semantic-pull-request@v5
env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
with:
validateSingleCommit: true
52 changes: 27 additions & 25 deletions .github/workflows/main.yml
@@ -2,16 +2,16 @@ name: Tests

on:
push:
branches: [ main ]
branches: [main, dev]
pull_request:
branches: [ main ]

branches: [main, dev]

jobs:
Formatting:
runs-on: ubuntu-latest
if: ${{ github.actor != 'github-actions[bot]' }}
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4
- name: Formatting
uses: github/super-linter@v4
env:
@@ -22,33 +22,35 @@ jobs:

Linting:
runs-on: ubuntu-latest
if: ${{ github.actor != 'github-actions[bot]' }}
steps:
- uses: actions/checkout@v2
- name: Lint workflow
uses: snakemake/snakemake-github-action@v1.24.0
with:
directory: .
snakefile: workflow/Snakefile
args: "--lint"
- uses: actions/checkout@v2
- name: Lint workflow
uses: snakemake/snakemake-github-action@v1.25.1
with:
directory: .
snakefile: workflow/Snakefile
args: "--lint"

Testing:
runs-on: ubuntu-latest
needs:
if: ${{ github.actor != 'github-actions[bot]' }}
needs:
- Linting
- Formatting
steps:
- uses: actions/checkout@v2
- uses: actions/checkout@v4

- name: Test workflow
uses: snakemake/snakemake-github-action@v1.24.0
with:
directory: .test
snakefile: workflow/Snakefile
args: "--use-conda --show-failed-logs --cores 3 --conda-cleanup-pkgs cache --all-temp"
- name: Test workflow
uses: snakemake/snakemake-github-action@v1.25.1
with:
directory: .test
snakefile: workflow/Snakefile
args: "--use-conda --show-failed-logs --cores 3 --conda-cleanup-pkgs cache --all-temp"

- name: Test report
uses: snakemake/snakemake-github-action@v1.24.0
with:
directory: .test
snakefile: workflow/Snakefile
args: "--report report.zip"
- name: Test report
uses: snakemake/snakemake-github-action@v1.25.1
with:
directory: .test
snakefile: workflow/Snakefile
args: "--report report.zip"
3 changes: 1 addition & 2 deletions .github/workflows/release-please.yml
@@ -9,9 +9,8 @@ jobs:
release-please:
runs-on: ubuntu-latest
steps:

- uses: GoogleCloudPlatform/release-please-action@v2
id: release
with:
release-type: go # just keep a changelog, no version anywhere outside of git tags
package-name: <repo>
package-name: <repo>
1 change: 1 addition & 0 deletions .gitignore
@@ -3,3 +3,4 @@ resources/**
logs/**
.snakemake
.snakemake/**
.test/results/*
34 changes: 34 additions & 0 deletions .test/config/config.yml
@@ -0,0 +1,34 @@
samplesheet: "config/samples.tsv"

get_genome:
database: "ncbi"
assembly: "GCF_000006785.2"
fasta: Null
gff: Null
gff_source_type:
[
"RefSeq": "gene",
"RefSeq": "pseudogene",
"RefSeq": "CDS",
"Protein Homology": "CDS",
]

simulate_reads:
read_length: 100
read_number: 100000
random_freq: 0.01

cutadapt:
threep_adapter: "-a ATCGTAGATCGG"
fivep_adapter: "-A GATGGCGATAGG"
default: ["-q 10 ", "-m 25 ", "-M 100", "--overlap=5"]

multiqc:
config: "config/multiqc_config.yml"

report:
export_figures: True
export_dir: "figures/"
figure_width: 875
figure_height: 500
figure_resolution: 125
2 changes: 2 additions & 0 deletions .test/config/multiqc_config.yml
@@ -0,0 +1,2 @@
remove_sections:
- samtools-stats
3 changes: 3 additions & 0 deletions .test/config/samples.tsv
@@ -0,0 +1,3 @@
sample condition replicate read1 read2
sample1 wild_type 1 sample1.bwa.read1.fastq.gz sample1.bwa.read2.fastq.gz
sample2 wild_type 2 sample2.bwa.read1.fastq.gz sample2.bwa.read2.fastq.gz
98 changes: 93 additions & 5 deletions README.md
@@ -1,21 +1,109 @@
# Snakemake workflow: `<name>`

[![Snakemake](https://img.shields.io/badge/snakemake-≥6.3.0-brightgreen.svg)](https://snakemake.github.io)
[![GitHub actions status](https://github.com/<owner>/<repo>/workflows/Tests/badge.svg?branch=main)](https://github.com/<owner>/<repo>/actions?query=branch%3Amain+workflow%3ATests)

[![Snakemake](https://img.shields.io/badge/snakemake-≥8.0.0-brightgreen.svg)](https://snakemake.github.io)
[![GitHub actions status](https://github.com/MPUSP/snakemake-workflow-template/actions/workflows/main.yml/badge.svg?branch=main)](https://github.com/MPUSP/snakemake-workflow-template/actions/workflows/main.yml)
[![run with conda](http://img.shields.io/badge/run%20with-conda-3EB049?labelColor=000000&logo=anaconda)](https://docs.conda.io/en/latest/)
[![run with singularity](https://img.shields.io/badge/run%20with-singularity-1D355C.svg?labelColor=000000)](https://sylabs.io/docs/)
[![workflow catalog](https://img.shields.io/badge/Snakemake%20workflow%20catalog-darkgreen)](https://snakemake.github.io/snakemake-workflow-catalog)

A Snakemake workflow for `<description>`

- [Snakemake workflow: `<name>`](#snakemake-workflow-name)
- [Usage](#usage)
- [Workflow overview](#workflow-overview)
- [Running the workflow](#running-the-workflow)
- [Input data](#input-data)
- [Execution](#execution)
- [Parameters](#parameters)
- [Authors](#authors)
- [References](#references)
- [TODO](#todo)

## Usage

The usage of this workflow is described in the [Snakemake Workflow Catalog](https://snakemake.github.io/snakemake-workflow-catalog/?usage=<owner>%2F<repo>).

If you use this workflow in a paper, don't forget to give credits to the authors by citing the URL of this (original) <repo>sitory and its DOI (see above).
If you use this workflow in a paper, please credit the authors by citing the URL of this repository or its DOI.

## Workflow overview

This workflow is a best-practice workflow for `<detailed description>`.
The workflow is built using [snakemake](https://snakemake.readthedocs.io/en/stable/) and consists of the following steps:

1. Parse sample sheet containing sample metadata (`python`)
2. Simulate short read sequencing data on the fly (`dwgsim`)
3. Check quality of input read data (`FastQC`)
4. Trim adapters from input data (`cutadapt`)
5. Collect statistics from tool output (`MultiQC`)

## Running the workflow

### Input data

This template workflow contains artificial sequencing data in `*.fastq.gz` format.
The test data is located in `.test/data`. Input files are listed in a mandatory sample sheet, whose location is set in the `config.yml` file (default: `.test/config/samples.tsv`). The sample sheet has the following layout:

| sample | condition | replicate | data_folder | fq1 |
| -------- | --------- | --------- | ----------- | ------------------------ |
| RPF-RTP1 | RPF-RTP | 1 | data | RPF-RTP1_R1_001.fastq.gz |
| RPF-RTP2 | RPF-RTP | 2 | data | RPF-RTP2_R1_001.fastq.gz |
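
A minimal sketch of how a sample sheet with this layout could be parsed, assuming a tab-separated file with a header row. The helper name `load_samplesheet` and the default set of required columns are illustrative assumptions and are not taken from the workflow code.

```python
import csv
import io


def load_samplesheet(text, required=("sample", "condition", "replicate")):
    """Parse a tab-separated sample sheet and return a list of row dicts.

    Hypothetical helper: the name and the default `required` columns are
    assumptions made for illustration.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing = [col for col in required if col not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"sample sheet lacks required columns: {missing}")
    rows = []
    for row in reader:
        if "replicate" in row:
            # replicates are expected to be consecutive integers
            row["replicate"] = int(row["replicate"])
        rows.append(row)
    return rows


# Example input mirroring the table above (tabs instead of pipes)
sheet = (
    "sample\tcondition\treplicate\tdata_folder\tfq1\n"
    "RPF-RTP1\tRPF-RTP\t1\tdata\tRPF-RTP1_R1_001.fastq.gz\n"
    "RPF-RTP2\tRPF-RTP\t2\tdata\tRPF-RTP2_R1_001.fastq.gz\n"
)
samples = load_samplesheet(sheet)
print([s["sample"] for s in samples])
```

Failing early on missing columns keeps a malformed sheet from surfacing later as an obscure rule error.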

### Execution

To run the workflow from the command line, change the working directory to the workflow folder.

```bash
cd path/to/snakemake-workflow-name
```

Adjust options in the default config file `config/config.yml`.
Before running the entire workflow, you can perform a dry run using:

```bash
snakemake --dry-run
```

To run the complete workflow with test files using **conda**, execute the following command. Specifying the number of compute cores is mandatory.

```bash
snakemake --cores 10 --sdm conda --directory .test
```

To run the workflow with **singularity** / **apptainer**, use:

```bash
snakemake --cores 10 --sdm conda apptainer --directory .test
```

### Parameters

This table lists all parameters that can be used to run the workflow.

| parameter | type | details | default |
| ---------------------- | ---- | ------------------------------------------- | -------------------------------------------- |
| **samplesheet** | | | |
| path | str | path to samplesheet, mandatory | "config/samples.tsv" |
| **cutadapt** | | | |
| fivep_adapter | str | sequence of the 5' adapter | Null |
| threep_adapter | str | sequence of the 3' adapter | `ATCGTAGATCGGAAGAGCACACGTCTGAA` |
| default | list | additional options passed to `cutadapt` | [`-q 10 `, `-m 22 `, `-M 52`, `--overlap=3`] |
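
As a rough illustration of how the `cutadapt` parameters might end up on a command line, the sketch below flattens the config section into an argument list. It assumes the value format used in `config/config.yml` in this PR, where the adapter entries already carry their `-a`/`-A` flags. The helper `build_cutadapt_args` is hypothetical, and the actual rule in the workflow may combine the values differently.

```python
import shlex


def build_cutadapt_args(cutadapt_cfg):
    """Flatten the `cutadapt` config section into a list of CLI arguments.

    Hypothetical helper; the real rule in the workflow may combine these
    values differently.
    """
    parts = [
        cutadapt_cfg.get("threep_adapter") or "",  # may be null/None in the config
        cutadapt_cfg.get("fivep_adapter") or "",
        *cutadapt_cfg.get("default", []),
    ]
    # shlex.split drops the stray trailing spaces in entries such as "-q 10 "
    return shlex.split(" ".join(parts))


# Values copied from the cutadapt section of config/config.yml
config = {
    "threep_adapter": "-a ATCGTAGATCGG",
    "fivep_adapter": "-A GATGGCGATAGG",
    "default": ["-q 10 ", "-m 25 ", "-M 100", "--overlap=5"],
}
args = build_cutadapt_args(config)
print(" ".join(["cutadapt", *args]))
# prints: cutadapt -a ATCGTAGATCGG -A GATGGCGATAGG -q 10 -m 25 -M 100 --overlap=5
```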

## Authors

- Firstname Lastname
- Affiliation
- ORCID profile
- home page

## References

> Köster, J., Mölder, F., Jablonski, K. P., Letcher, B., Hall, M. B., Tomkins-Tinch, C. H., Sochat, V., Forster, J., Lee, S., Twardziok, S. O., Kanitz, A., Wilm, A., Holtgrewe, M., Rahmann, S., & Nahnsen, S. *Sustainable data analysis with Snakemake*. F1000Research, 10:33, 10, 33, **2021**. https://doi.org/10.12688/f1000research.29032.2.

# TODO
## TODO

* Replace `<owner>` and `<repo>` everywhere in the template (also under .github/workflows) with the correct `<repo>` name and owning user or organization.
* Replace `<name>` with the workflow name (can be the same as `<repo>`).
* Replace `<description>` with a description of what the workflow does.
* Update the workflow description, parameters, running options, authors and references in the `README.md`
* Update the `README.md` badges. Add or remove badges for `conda`/`singularity`/`apptainer` usage depending on the workflow's capability
* The workflow will appear in the snakemake-workflow-catalog once it has been made public. Then the link under "Usage" will point to the usage instructions if `<owner>` and `<repo>` were correctly set.
34 changes: 34 additions & 0 deletions config/config.yml
@@ -0,0 +1,34 @@
samplesheet: ".test/config/samples.tsv"

get_genome:
database: "ncbi"
assembly: "GCF_000006785.2"
fasta: Null
gff: Null
gff_source_type:
[
"RefSeq": "gene",
"RefSeq": "pseudogene",
"RefSeq": "CDS",
"Protein Homology": "CDS",
]

simulate_reads:
read_length: 100
read_number: 100000
random_freq: 0.01

cutadapt:
threep_adapter: "-a ATCGTAGATCGG"
fivep_adapter: "-A GATGGCGATAGG"
default: ["-q 10 ", "-m 25 ", "-M 100", "--overlap=5"]

multiqc:
config: "config/multiqc_config.yml"

report:
export_figures: True
export_dir: "figures/"
figure_width: 875
figure_height: 500
figure_resolution: 125
2 changes: 2 additions & 0 deletions config/multiqc_config.yml
@@ -0,0 +1,2 @@
remove_sections:
- samtools-stats
44 changes: 44 additions & 0 deletions config/schemas/config.schema.yml
@@ -0,0 +1,44 @@
$schema: "http://json-schema.org/draft-07/schema#"
description: configuration options for the workflow
properties:
samplesheet:
type: string
description: path to the sample sheet

get_genome:
properties:
database:
type: ["string", "null"]
assembly:
type: ["string", "null"]
fasta:
type: ["string", "null"]
gff:
type: ["string", "null"]
gff_source_type:
type: array

simulate_reads:
properties:
read_length:
type: number
read_number:
type: number
random_freq:
type: number

cutadapt:
properties:
threep_adapter:
type: string
fivep_adapter:
type: string
default:
type: array

multiqc:
properties:
config:
type: string

required: ["samplesheet", "get_genome", "simulate_reads", "cutadapt", "multiqc"]
25 changes: 25 additions & 0 deletions config/schemas/samples.schema.yml
@@ -0,0 +1,25 @@
$schema: "http://json-schema.org/draft-07/schema#"
description: an entry in the sample sheet
properties:
sample:
type: string
description: sample name/identifier
condition:
type: string
description: sample condition that will be compared during differential analysis
replicate:
type: number
default: 1
description: consecutive numbers representing multiple replicates of one condition
read1:
type: string
description: names of fastq.gz files, read 1
read2:
type: string
description: names of fastq.gz files, read 2 (optional)

required:
- sample
- condition
- replicate
- read1
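
To make the effect of this schema concrete, here is a small standard-library sketch that checks one sample row against the `required` list and the declared types. It is an illustration only: the dict `SCHEMA` is a hand-copied subset of the YAML above, and `check_row` is a hypothetical helper. In a Snakemake workflow this kind of check is normally delegated to `snakemake.utils.validate` rather than written by hand.

```python
# Hand-copied subset of samples.schema.yml (assumption: kept in sync manually)
SCHEMA = {
    "properties": {
        "sample": {"type": "string"},
        "condition": {"type": "string"},
        "replicate": {"type": "number"},
        "read1": {"type": "string"},
        "read2": {"type": "string"},
    },
    "required": ["sample", "condition", "replicate", "read1"],
}

# Mapping from JSON Schema type names to Python types
TYPE_MAP = {"string": str, "number": (int, float)}


def check_row(row, schema=SCHEMA):
    """Return a list of human-readable problems found in one sample row."""
    errors = []
    for key in schema["required"]:
        if key not in row:
            errors.append(f"missing required field: {key}")
    for key, value in row.items():
        spec = schema["properties"].get(key)
        if spec is None:
            continue  # columns not described by the schema are ignored here
        expected = TYPE_MAP[spec["type"]]
        # bool is a subclass of int in Python but is not a JSON Schema number
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"field {key} should be of type {spec['type']}")
    return errors


good = {
    "sample": "sample1",
    "condition": "wild_type",
    "replicate": 1,
    "read1": "sample1.bwa.read1.fastq.gz",
}
print(check_row(good))
```

Because `read2` is not in `required`, single-end rows such as `good` pass without it.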