In this repo we show how to use genomic language models on the AWS cloud.
A genomic language model is a type of machine learning model, typically based on architectures like transformers or recurrent neural networks (RNNs), that is trained to understand and generate sequences of DNA, RNA, or other biological sequences. Just as language models are trained on human text to predict and generate natural language, genomic language models are trained on nucleotide sequences (like those composed of the bases A, T, C, and G) to capture the underlying patterns and structures in genomic data.
We show here, how to work with four genomic language models---HyenaDNA, Evo, Evo 2, and Caduceus---and a sc-RNAseq foundational model, Geneformer.
See here our HyenaDNA project.
See here our Evo project.
See here our Evo 2 project.
See here our Geneformer project.
See here our Caduceus project.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.