Simulate a misassembly for a given FASTA file.
Install Rust.
Compile misasim.
cargo build --release
Usage: misasim [OPTIONS] <COMMAND>
Commands:
misjoin Simulate a misjoin in a sequence
false-duplication Simulate a falsely duplicated sequence
gap Simulate a gap in a sequence
break Simulate a break in a sequence
inversion Simulate an inversion in a sequence
help Print this message or the help of the given subcommand(s)
Options:
-i, --infile <INFILE> Input sequence file. Uncompressed or bgzipped
-r, --inbedfile <INBEDFILE> Input bed file. Each region should map to a sequence from infile
-o, --outfile <OUTFILE> Output sequence file
-b, --outbedfile <OUTBEDFILE> Output BED file with misassemblies
-s, --seed <SEED> Seed to use for the random number generator
--randomize-length Randomize length
-g, --group-by <GROUP_BY> Group by regex pattern. ex. "^.*?_(?<hap>.*?)$" with group by haplotype
-h, --help Print help
Generate a misjoin at a random position.
./target/release/misasim misjoin \
-i test/data/HG002_chr10_cens.fa.gz
./target/release/misasim misjoin \
-i test/data/HG002_chr10_cens.fa.gz \
-n 12
Generate a false-duplication at a random position with a length of 5000 bp.
./target/release/misasim false-duplication \
-i test/data/HG002_chr10_cens.fa.gz \
-l 5000
Generate a false-duplication of 5000 bp at a random position, duplicated at most four times.
./target/release/misasim false-duplication \
-i test/data/HG002_chr10_cens.fa.gz \
-l 5000 \
--max-duplications 4
Generate a misjoin with a length of 5000 bp at a random position within the regions specified.
./target/release/misasim misjoin \
-i test/data/HG002.fa.gz \
-r test/data/region.bed \
-l 5000
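The help text says each region in the BED file should map to a sequence from the input file. A minimal sketch of checking that before running misasim is shown below. It is not part of misasim: the helper names `fasta_lengths` and `check_bed` are hypothetical, the toy data stands in for the real files, and matching on the first whitespace-delimited token of the FASTA header is an assumption about how sequence names are compared.

```python
import io


def fasta_lengths(handle):
    """Return {sequence name: length} for a FASTA text handle."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Assumption: the name is the first whitespace-delimited header token.
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def check_bed(bed_handle, lengths):
    """Return a list of problems found in a BED handle; empty means OK."""
    problems = []
    for lineno, line in enumerate(bed_handle, start=1):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        if len(fields) < 3:
            problems.append(f"line {lineno}: expected at least 3 columns")
            continue
        # BED coordinates are 0-based and half-open: [start, end).
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        if chrom not in lengths:
            problems.append(f"line {lineno}: {chrom} is not in the FASTA")
        elif not 0 <= start < end <= lengths[chrom]:
            problems.append(f"line {lineno}: {chrom}:{start}-{end} is out of bounds")
    return problems


# Toy data standing in for the real input files.
fasta = io.StringIO(">chr10_MATERNAL\nACGTACGTAC\n>chr10_PATERNAL\nACGTAC\n")
bed = io.StringIO("chr10_MATERNAL\t2\t8\nchr10_PATERNAL\t0\t9\nchr11\t0\t5\n")
problems = check_bed(bed, fasta_lengths(fasta))
for problem in problems:
    print(problem)
```

For real files, the handles could be opened with `open(path)` or, for a bgzipped FASTA, `gzip.open(path, "rt")`, since bgzip output is gzip-compatible.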
Generate a misjoin at a random position within the regions specified, grouped by chromosome name.
# Either chr10_MATERNAL or chr10_PATERNAL would get a misjoin.
./target/release/misasim misjoin \
-i test/data/HG002.fa.gz \
-r test/data/region.bed \
-g '^(?<chr>.*?)_.*?$' # '^(.*?)_.*?$' would also work.
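To see which sequences a group-by pattern would place together, the capture can be tried on a few names first. The sketch below is an illustration only, not misasim's implementation. It assumes the group key is the first capture group, and that names which do not match form their own group. Python's `re` module writes named groups as `(?P<name>...)`, so the patterns are adapted accordingly. The `group_names` helper and the sample names are hypothetical.

```python
import re
from collections import defaultdict

# Same idea as the -g patterns, written with Python's named-group syntax.
BY_CHROM = re.compile(r"^(?P<chr>.*?)_.*?$")
BY_HAP = re.compile(r"^.*?_(?P<hap>.*?)$")


def group_names(names, pattern):
    """Group sequence names by the first capture group of `pattern`."""
    groups = defaultdict(list)
    for name in names:
        match = pattern.match(name)
        # Assumption: a name that does not match is grouped by itself.
        key = match.group(1) if match else name
        groups[key].append(name)
    return dict(groups)


names = ["chr10_MATERNAL", "chr10_PATERNAL", "chr11_MATERNAL"]
by_chrom = group_names(names, BY_CHROM)
by_hap = group_names(names, BY_HAP)
print(by_chrom)
print(by_hap)
```

Grouping by chromosome puts `chr10_MATERNAL` and `chr10_PATERNAL` under the same key, which matches the comment above that only one of the two would receive the misjoin.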