Add accession to sequence type mapping #77

Open
@jsstevenson

A while back, this function was added to VRS-Python to help with HGVS translation:

def extract_sequence_type(alias: str) -> str | None:
    """Provide a convenient way to extract the sequence type from an accession by matching its prefix to a known set of prefixes.

    Args:
        alias (str): The accession string.

    Returns:
        str or None: The sequence type associated with the accession string, or None if no matching prefix is found.

    """
    prefix_dict = {
        "refseq:NM_": "c",
        "refseq:NC_012920": "m",
        "refseq:NG_": "g",
        "refseq:NC_00": "g",
        "refseq:NW_": "g",
        "refseq:NT_": "g",
        "refseq:NR_": "n",
        "refseq:NP_": "p",
        "refseq:XM_": "c",
        "refseq:XR_": "n",
        "refseq:XP_": "p",
        "GRCh": "g",
    }

    for prefix, seq_type in prefix_dict.items():
        if alias.startswith(prefix):
            return seq_type
    return None

I don't really know the context or whether something here already fulfills this need, but it struck me as a bioutils-esque task and I figured I'd throw out the idea of moving it here.
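For discussion, here is a rough sketch of how the same lookup might look as a standalone helper. The names `infer_sequence_type` and `_PREFIX_TO_SEQ_TYPE` are hypothetical and are not an existing bioutils API; the prefix table is copied unchanged from the function above. The current prefixes do not overlap, but checking the longest prefixes first would keep results unambiguous if a broader prefix were ever added.

```python
from __future__ import annotations

# Prefix table copied from the VRS-Python function quoted above.
# The module-level name is hypothetical, chosen only for this sketch.
_PREFIX_TO_SEQ_TYPE: dict[str, str] = {
    "refseq:NM_": "c",
    "refseq:NC_012920": "m",
    "refseq:NG_": "g",
    "refseq:NC_00": "g",
    "refseq:NW_": "g",
    "refseq:NT_": "g",
    "refseq:NR_": "n",
    "refseq:NP_": "p",
    "refseq:XM_": "c",
    "refseq:XR_": "n",
    "refseq:XP_": "p",
    "GRCh": "g",
}

# Longest prefixes first, so a more specific prefix would always be tried
# before a shorter one that also matches.
_PREFIXES_LONGEST_FIRST = sorted(_PREFIX_TO_SEQ_TYPE, key=len, reverse=True)


def infer_sequence_type(alias: str) -> str | None:
    """Return the sequence type for an accession, or None if no prefix matches."""
    for prefix in _PREFIXES_LONGEST_FIRST:
        if alias.startswith(prefix):
            return _PREFIX_TO_SEQ_TYPE[prefix]
    return None


if __name__ == "__main__":
    for example in ("refseq:NM_000551.3", "refseq:NC_012920.1", "GRCh38:chr1"):
        print(example, infer_sequence_type(example))
```

Behavior should match the original for every alias the current table covers; the only difference is that the result no longer depends on the insertion order of the dict.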
