Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Full fragment coverage mode for proper pairs #246

Merged
merged 12 commits into from
Jan 17, 2025

Conversation

LudvigOlsen
Copy link
Contributor

@LudvigOlsen LudvigOlsen commented Jan 15, 2025

Whereas the current version of mosdepth only calculates the coverage in the positions of the observed reads, the actual biological coverage in a paired-end sequencing would surely include positions in-between the reads (when 2x read size < fragment length). This PR adds a fragment-based coverage mode (--fragment-mode) where the entire fragment between paired-end reads is counted. It is a separate mode to fast-mode and the normal mode (I don't really understand the cigar operations part of the code, so correct me if these need to be supported here as well).

  • Proper Pairs Only: Reads must be both properly paired and not supplementary to be considered.
  • Avoid Double-Counting: We count the fragment only for the “first” read by checking if rec.start < rec.matepos.
  • Tie-Breaker for Same Start Positions: If rec.start == rec.matepos, we use a hash set (seen_ids) to ensure each read name (qname) is counted only once.
  • Coverage Updating: Coverage increments at rec.start and decrements at rec.start + abs(rec.isize), preserving a start/end approach for efficient calculation with cumsum() later.

This was the cleanest initial approach, I could come up with. I haven't tested it yet, but wanted to start the conversation.

@brentp
Copy link
Owner

brentp commented Jan 15, 2025

i haven't looked at code yet, just your comment, but for pos == matepos you don't need a hash, can just always use read1 (from flag).
in fact, you can do this always, even when pos != matepos

@LudvigOlsen
Copy link
Contributor Author

LudvigOlsen commented Jan 15, 2025

Good point. I've simplified it to only count read1 but use the lowest start coordinate of rec.start and rec.matepos.

@LudvigOlsen
Copy link
Contributor Author

LudvigOlsen commented Jan 15, 2025

I've added a functional test and a small bam file (8 simulated reads) that has 2 fragments with isize > 2x read size and 2 fragments with isize < 2x read size. It seems to work as expected, unless I've mixed up some 0/1-indexing somewhere. Locally, it fails at the d4 test step but I think that's an installation thing. And the CI checks pass here. So, I think it's ready for you to have a look at. No rush though!

Here's an overview of the bam file:

billede

And the mosdepth output:

billede

Edit: There's an issue with how I generated the bam file (the unpaired reads have proper-pair flags even though they're not paired, so it counts them). I will fix the bam file tomorrow.

@LudvigOlsen
Copy link
Contributor Author

Now the file should be more correct:

Bam file with 2 unpaired reads, 2 long fragments (> 2x read size), and two short fragments:

billede

Fragment-mode output:

billede

From the command:

mosdepth <prefix> full-fragment-pairs.bam --fragment-mode

@LudvigOlsen
Copy link
Contributor Author

Is it too strict to only include the proper pairs? We obviously need a filter like that to ensure the combined reads represent an actual fragment. Wondering whether there is a less strict set of conditions for that though?

Copy link
Owner

@brentp brentp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks great! just minor changes.
thank you!

mosdepth.nim Outdated Show resolved Hide resolved
mosdepth.nim Outdated Show resolved Hide resolved
mosdepth.nim Show resolved Hide resolved
mosdepth.nim Outdated Show resolved Hide resolved
@brentp
Copy link
Owner

brentp commented Jan 16, 2025

Is it too strict to only include the proper pairs? We obviously need a filter like that to ensure the combined reads represent an actual fragment. Wondering whether there is a less strict set of conditions for that though?

hah, I just had the same thought while reviewing. open to brainstorming, but seems reasonable.

@brentp brentp merged commit b54ce45 into brentp:master Jan 17, 2025
4 checks passed
@brentp
Copy link
Owner

brentp commented Jan 17, 2025

thanks! i made a new release here: https://github.com/brentp/mosdepth/releases/tag/v0.3.11

@LudvigOlsen
Copy link
Contributor Author

Perfect!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants