Full fragment coverage mode for proper pairs #246

LudvigOlsen · 2025-01-15T19:39:30Z

Whereas the current version of mosdepth only calculates coverage at the positions of the observed reads, the actual biological coverage in paired-end sequencing also includes the positions between the two reads (when 2x read size < fragment length). This PR adds a fragment-based coverage mode (--fragment-mode) in which the entire fragment spanned by a read pair is counted. It is a separate mode from fast-mode and the normal mode (I don't really understand the CIGAR operations part of the code, so correct me if these need to be supported here as well).

Proper Pairs Only: Reads must be both properly paired and not supplementary to be considered.
Avoid Double-Counting: We count the fragment only for the “first” read by checking if rec.start < rec.matepos.
Tie-Breaker for Same Start Positions: If rec.start == rec.matepos, we use a hash set (seen_ids) to ensure each read name (qname) is counted only once.
Coverage Updating: Coverage increments at rec.start and decrements at rec.start + abs(rec.isize), preserving a start/end approach for efficient calculation with cumsum() later.
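
For readers who want to see the start/end bookkeeping spelled out, here is a minimal Python sketch of the four rules above. It is not the Nim code from this PR: the `Rec` type, the `fragment_depth` function, and the 0-based coordinates are assumptions made for illustration, and only the field names `start`, `matepos`, `isize`, and `qname` and the `seen_ids` set mirror the description.

```python
from itertools import accumulate
from typing import NamedTuple


class Rec(NamedTuple):
    """Hypothetical stand-in for an alignment record (not the hts-nim type)."""
    qname: str
    start: int           # leftmost aligned position, assumed 0-based here
    matepos: int         # leftmost aligned position of the mate
    isize: int           # signed template length
    proper_pair: bool
    supplementary: bool


def fragment_depth(recs, chrom_len):
    """Per-base fragment depth using increments at starts and decrements at ends."""
    diff = [0] * (chrom_len + 1)
    seen_ids = set()
    for rec in recs:
        # Proper pairs only, and no supplementary alignments.
        if not rec.proper_pair or rec.supplementary:
            continue
        # Count the fragment once, from the leftmost read of the pair.
        if rec.start > rec.matepos:
            continue
        # Tie-breaker: both mates start at the same position.
        if rec.start == rec.matepos:
            if rec.qname in seen_ids:
                continue
            seen_ids.add(rec.qname)
        end = min(rec.start + abs(rec.isize), chrom_len)
        diff[rec.start] += 1
        diff[end] -= 1
    # The cumulative sum turns the start/end marks into per-base depth.
    return list(accumulate(diff))[:chrom_len]
```

Because each fragment touches only two array cells, the cost per pair stays constant regardless of fragment length; the cumulative sum is paid once per chromosome.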

This was the cleanest initial approach I could come up with. I haven't tested it yet, but wanted to start the conversation.

brentp · 2025-01-15T21:18:20Z

i haven't looked at code yet, just your comment, but for pos == matepos you don't need a hash, can just always use read1 (from flag).
in fact, you can do this always, even when pos != matepos

LudvigOlsen · 2025-01-15T21:32:17Z

Good point. I've simplified it to count only read1, using the lower of rec.start and rec.matepos as the fragment start.
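
A rough Python sketch of that simplified rule, assuming records arrive as `(flag, start, matepos, isize)` tuples with 0-based positions. The function name `fragment_depth_read1` and the tuple layout are hypothetical; the flag bits are the standard SAM values. This is an illustration, not the merged Nim implementation.

```python
from itertools import accumulate

# Standard SAM flag bits.
PROPER_PAIR = 0x2
READ1 = 0x40
SUPPLEMENTARY = 0x800


def fragment_depth_read1(recs, chrom_len):
    """Count each fragment once, keyed on read1, with no name bookkeeping."""
    diff = [0] * (chrom_len + 1)
    for flag, start, matepos, isize in recs:
        # Same pair filter as before: proper pair and not supplementary.
        if not (flag & PROPER_PAIR) or (flag & SUPPLEMENTARY):
            continue
        # Only read1 contributes, so each pair is counted exactly once.
        if not (flag & READ1):
            continue
        # read1 may be the rightmost mate, so take the lower coordinate.
        frag_start = min(start, matepos)
        frag_end = min(frag_start + abs(isize), chrom_len)
        diff[frag_start] += 1
        diff[frag_end] -= 1
    return list(accumulate(diff))[:chrom_len]
```

Keying on the read1 flag removes the need for any per-read-name state, which is why the hash set can be dropped even when both mates share a start position.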

LudvigOlsen · 2025-01-15T22:27:38Z

I've added a functional test and a small bam file (8 simulated reads) that has 2 fragments with isize > 2x read size and 2 fragments with isize < 2x read size. It seems to work as expected, unless I've mixed up some 0/1-indexing somewhere. Locally, it fails at the d4 test step, but I think that's an installation issue, and the CI checks pass here. So, I think it's ready for you to have a look at. No rush though!

Here's an overview of the bam file:

And the mosdepth output:

Edit: There's an issue with how I generated the bam file (the unpaired reads have proper-pair flags even though they're not paired, so it counts them). I will fix the bam file tomorrow.

LudvigOlsen · 2025-01-16T11:44:43Z

Now the file should be more correct:

Bam file with 2 unpaired reads, 2 long fragments (> 2x read size), and 2 short fragments (< 2x read size):

Fragment-mode output:

From the command:

mosdepth <prefix> full-fragment-pairs.bam --fragment-mode

LudvigOlsen · 2025-01-16T15:22:36Z

Is it too strict to only include the proper pairs? We obviously need a filter like that to ensure the combined reads represent an actual fragment. Wondering whether there is a less strict set of conditions for that though?

brentp

looks great! just minor changes.
thank you!

brentp · 2025-01-16T15:28:47Z

> Is it too strict to only include the proper pairs? We obviously need a filter like that to ensure the combined reads represent an actual fragment. Wondering whether there is a less strict set of conditions for that though?

hah, I just had the same thought while reviewing. open to brainstorming, but seems reasonable.

brentp · 2025-01-17T15:51:12Z

thanks! i made a new release here: https://github.com/brentp/mosdepth/releases/tag/v0.3.11

LudvigOlsen · 2025-01-17T15:54:45Z

Perfect!

Ludvig added 2 commits January 15, 2025 20:19

Add initial idea for full fragment coverage mode

7aeb6a7

Adds bam file to test fragment-mode

0fbe49b

Simplify read filtering in fragment-mode

fc681c9

Ludvig added 4 commits January 15, 2025 22:56

Passes the fragment-mode arg to coverage()

01948b1

Removes unnecessary 0-indexing correction (start is 1-indexed)

5fe25b4

Adds regression test for fragment mode

e9a630c

Uses tabs instead of spaces in expected test output

5416860

LudvigGenomeDK added 4 commits January 16, 2025 00:44

Creates new bam file for testing fragment-mode

2d950e1

Regenerates test bam files for fragment-mode

1dfeb59

Updates test expectation

8e98db3

Fixes whitespace in expectation

24c44f8

brentp requested changes Jan 16, 2025

Removes seen_ids hash set. Improves arg string

a835419

brentp approved these changes Jan 17, 2025

brentp merged commit b54ce45 into brentp:master Jan 17, 2025
4 checks passed
