Allow commented lines in fragment files. #83
0.2.0 Release - Gibson Les Paul
This reverts commit 800a47a.
Allow commented lines in fragment files as CellRanger-ATAC/CellRangerARC puts some commented lines at the start of the file.
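For illustration only, here is a minimal Python sketch of the behavior this PR describes: skipping `#`-prefixed lines before parsing the integer columns. It is not the gtars code; the function name `parse_fragments`, the made-up header lines, and the treatment of the fifth column as optional are all assumptions.

```python
import io


def parse_fragments(handle):
    """Yield (chrom, start, end, barcode, count) tuples, skipping comment lines."""
    for line in handle:
        # Skip '#'-prefixed comment lines (and blank lines) instead of
        # trying to parse them, which would fail on the integer columns.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, barcode = fields[0], fields[3]
        start, end = int(fields[1]), int(fields[2])
        # Assumption: the fifth column (read support) is optional; default to 1.
        count = int(fields[4]) if len(fields) > 4 else 1
        yield chrom, start, end, barcode, count


# Hypothetical input: the two header lines below are invented for the example.
example = io.StringIO(
    "# id=example\n"
    "# description=made-up header\n"
    "chr1\t100\t200\tAAAC-1\t2\n"
    "chr1\t150\t300\tTTTG-1\t1\n"
)
records = list(parse_fragments(example))
```

Without the `startswith("#")` check, the first header line would reach `int(fields[1])` and raise, which matches the "failed to parse integer" symptom reported in this thread.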
Hey @ghuls, thanks for the PR! I changed the base to A follow-up question: are you using the fragment scoring code, or did you just stumble on the code and see that you could add this line?
I was trying out the code, but it crashed on my fragments file (failed to parse integer). We have code that does something similar in pycisTopic: https://github.com/aertslab/pycisTopic/blob/polars_1xx/src/pycisTopic/fragments.py#L1132-L1337, but that computes the counts per cell barcode (and returns a sparse matrix instead).
Gotcha! That's really interesting. This is similar, I think, but it does counts by pseudobulk, not by cell barcode (unless you set each barcode as its own pseudobulk 😀). I'm interested in identifying cell-type-specific peaks, so that's why I was doing this by pseudobulk, but it truly was just a stopgap for me to move forward in my analysis and get the output I needed. The implementation I wrote uses binary interval search (BITS). I'm using the rust-lapper crate for that. It's very fast, and it can be even faster if you parallelize it intelligently. The Anyways... if you see value here or potential improvements, let me know! Thanks for the PR!
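As a rough illustration of the counting idea mentioned above, here is a small Python sketch of binary interval search applied per pseudobulk. It is not the gtars implementation and does not use or mirror the rust-lapper API. The class name `IntervalCounter`, the half-open `[start, end)` convention, the single-chromosome simplification, and the example data are all assumptions.

```python
from bisect import bisect_left, bisect_right


class IntervalCounter:
    """Count overlaps with two binary searches over sorted starts and ends."""

    def __init__(self, intervals):
        # Intervals are assumed half-open: [start, end).
        self.starts = sorted(start for start, _ in intervals)
        self.ends = sorted(end for _, end in intervals)

    def count(self, q_start, q_end):
        n = len(self.starts)
        # Intervals entirely left of the query: end <= q_start.
        left = bisect_right(self.ends, q_start)
        # Intervals entirely right of the query: start >= q_end.
        right = n - bisect_left(self.starts, q_end)
        # Everything else must overlap the query.
        return n - left - right


# Hypothetical fragments on one chromosome, grouped by pseudobulk.
fragments_by_pseudobulk = {
    "T_cell": [(100, 200), (150, 300), (500, 600)],
    "B_cell": [(250, 350)],
}
counters = {
    name: IntervalCounter(frags)
    for name, frags in fragments_by_pseudobulk.items()
}

peak = (180, 260)
counts = {name: counter.count(*peak) for name, counter in counters.items()}
```

Each query costs two binary searches regardless of how many fragments overlap the peak, and the per-pseudobulk counters are independent, so they could be queried in parallel.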
Ah, I was not sure what gtars was using for intersecting regions (whether it was using its own intersection implementation or another Rust crate). In
We are actually in the process of moving away from BITS/
Interestingly, I do something super similar. It's by pseudobulk, but only after processing with SnapATAC2. Looking through the
How does it compare with
The pseudobulks are not known in advance. First we create a binary matrix for all cell barcodes over an initial set of consensus regions. This binary matrix is then used in topic modeling, whose output is used to cluster the cells (this works better than clustering on just the sparse binary/count matrix). From this clustering you can create your pseudobulks, and you can refine your consensus regions by combining the consensus regions made per pseudobulk (assuming some cell types are only a small percentage of all your cells, so their regions would be missed if you only took full-bulk consensus regions).
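The first and last steps of that workflow can be sketched in plain Python. This is an illustration under stated assumptions, not pycisTopic code: the function names `binarize` and `pseudobulk_counts` are hypothetical, the cluster labels are hard-coded stand-ins for what topic modeling plus clustering would produce, and the overlap test is a naive scan chosen for readability.

```python
from collections import defaultdict


def binarize(fragments, regions):
    """Map each barcode to the set of region indices it has a fragment in."""
    accessible = defaultdict(set)
    for start, end, barcode in fragments:
        for idx, (r_start, r_end) in enumerate(regions):
            # Half-open overlap test; a naive scan, fine for a small example.
            if start < r_end and end > r_start:
                accessible[barcode].add(idx)
    return accessible


def pseudobulk_counts(accessible, clusters, n_regions):
    """Count, per cluster, how many barcodes are accessible in each region."""
    counts = defaultdict(lambda: [0] * n_regions)
    for barcode, region_indices in accessible.items():
        label = clusters.get(barcode)
        if label is None:
            continue  # barcode was not assigned to any cluster
        for idx in region_indices:
            counts[label][idx] += 1
    return dict(counts)


# Hypothetical consensus regions and fragments on a single chromosome.
regions = [(100, 200), (400, 500)]
fragments = [
    (120, 180, "bc1"),
    (130, 150, "bc1"),  # second hit in the same region: still binary 1
    (450, 470, "bc2"),
    (90, 110, "bc3"),
]
# Assumption: these labels stand in for the topic-model-based clustering.
clusters = {"bc1": "A", "bc2": "A", "bc3": "B"}

accessible = binarize(fragments, regions)
per_cluster = pseudobulk_counts(accessible, clusters, len(regions))
```

The per-barcode sets play the role of the binary matrix rows, and collapsing them by cluster label gives the pseudobulk profile from which per-pseudobulk consensus regions could then be derived.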