1. Purpose of Automerge
Automerge was added to FPC v7.2 to allow automatic end-merging
of contigs. End-merging without the Auto setting is discussed in the
FPC tutorial, and the reader is assumed to be familiar with that material.
Automatic merges are
not screened by a human, and therefore automerge should not be viewed
as part of the finishing stage of a project. Rather, it enables a different
way of building projects, one intended to produce a higher-quality
build, especially in the presence of contamination.
The idea is to build
the project at a stringent enough cutoff so that the initial contigs
are valid, and then to gradually automerge the contigs together
at successively lower cutoffs. An important point is that the automerge
is set to require at least 2 different overlapping end-clone pairs
for each merge, so that one contaminated end-clone by itself cannot
cause a false merge.
In FPC v7.2, automerge had to be run in steps at successively less stringent
cutoffs. FPC v8.0 removes this necessity, allowing the automerge to be
performed just once at the final cutoff value. The merges which are found
are carried out automatically in order of the cutoff at which they would
first occur, thereby reproducing the effect of the multi-step process.
2. Running Automerge
We will go through
a simple example demonstrating automerge. Change to the demo/auto
directory, and open up auto.fpc by typing 'fpc auto.fpc'. The project
has been built with a cutoff of 1e-80, creating 13 contigs, as shown
below.
In a real project,
the DQer should now be run to break up possible bad contigs, but
since there are few Qs in this demo we will skip this.
Now we will play with
merging, first without the Auto option. Without the Auto option,
the merges are only suggested, not carried out.
The functions we will
be working with are located on the Main Analysis window. The first
thing to do is to set the FromEnd parameter correctly. FromEnd tells
how close to the contig end a clone must be in order to count as
an end-clone. Its units are CB units and typically it is set to
1/2 the number of bands in an average clone. For HICF, this means
about 50, so enter 50 in the text box.
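As a rough illustration of this rule of thumb, the Python sketch below derives a FromEnd value from a list of clone band counts. The function name and the sample band counts are hypothetical; in this tutorial the value is simply typed into the text box.

```python
def suggest_from_end(band_counts):
    """Return half the average number of bands per clone, rounded to an integer."""
    if not band_counts:
        raise ValueError("need at least one clone")
    average = sum(band_counts) / len(band_counts)
    return round(average / 2)


# Hypothetical band counts for a handful of HICF clones (average 100 bands).
example_bands = [108, 74, 100, 91, 114, 133, 80]
print(suggest_from_end(example_bands))
```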
Now set the cutoff
to 1e-79, and press the 'Ends-->Ends' button. The project window
pops up, showing no merges; however, the terminal window shows the
following:
Ends --> Ends (Cutoff 1e-79 FromEnd 50)
Ctg4 L c0007P08 165b Ctg6 L c0430J16 124b Match 95 2e-80
This shows that a clone
on the L side of contig 4 matched one on the L end of contig 6,
but this did not lead to a merge because the Match parameter is
set to 2, requiring 2 matches. Change Match to 1 and re-run 'Ends-->Ends';
now the project window pops up, showing the merge between contig
4 and 6.
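The match lines printed to the terminal have a regular layout, so they are easy to pick apart in a script. The Python sketch below parses one such line. The field names are our own labels, not FPC's, and we assume that the trailing "b" marks a band count and that the number after "Match" is a match count; neither is stated in the printout itself.

```python
import re
from typing import NamedTuple


class EndMatch(NamedTuple):
    ctg_a: int
    side_a: str    # 'L', 'R' or 'B'
    clone_a: str
    bands_a: int   # assumed: number of bands in the clone
    ctg_b: int
    side_b: str
    clone_b: str
    bands_b: int
    match: int     # assumed: the value printed after "Match"
    score: float   # the cutoff-style score, e.g. 2e-80


LINE_RE = re.compile(
    r"Ctg(\d+)\s+([LRB])\s+(\S+)\s+(\d+)b\s+"
    r"Ctg(\d+)\s+([LRB])\s+(\S+)\s+(\d+)b\s+"
    r"Match\s+(\d+)\s+(\S+)"
)


def parse_match_line(line):
    """Return an EndMatch for a clone-pair line, or None for any other line."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    g = m.groups()
    return EndMatch(int(g[0]), g[1], g[2], int(g[3]),
                    int(g[4]), g[5], g[6], int(g[7]),
                    int(g[8]), float(g[9]))


hit = parse_match_line("Ctg4 L c0007P08 165b Ctg6 L c0430J16 124b Match 95 2e-80")
print(hit.ctg_a, hit.side_a, hit.ctg_b, hit.side_b, hit.score)
```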
Now check the Auto
button, and run 'Ends-->Ends' one more time. This time the project
window pops up, showing that contig 6 has been merged to contig
4. Contig 6 is now gone:
Click on the first
line to open up the display for Contig 4. You may need to zoom a
bit to spread out the contig. Observe that there are two 'End-merge'
remarks in the remarks track. These record the best-overlapping
clone pair at each merge site, so the merge sites and some indication
of their quality can be viewed in the future.
The highlighted clones
in this figure are the ones which were merged from contig 6. To
highlight these clones, choose Show Additions from the "Highlight"
menu.
Double-click on one of the highlighted clones to bring up its clone
window, and observe that it has an entry 'Old ctg6' (image below).
Now we will experiment
a bit with the Match parameter. Set Match to 3 and cutoff to 1e-70,
and run 'Ends-->Ends'. No merges are found, but the terminal window
shows 4 matches between contigs 2 and 3 (plus some other lines):
Ctg2 L c0395A07 110b Ctg3 R b0029L03 98b Match 75 2e-76
Ctg2 L c0395A07 110b Ctg3 R c0467H22 133b Match 82 3e-75
Ctg2 L c0478J17 121b Ctg3 R b0029L03 98b Match 76 4e-75
Ctg2 L c0478J17 121b Ctg3 R c0467H22 133b Match 86 8e-76
Why does this not trigger
a merge? The reason is that there are only two clones involved on
each side. To meet the requirement of Match=3, there must be at
least 3 different clones involved on each side.
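The Match requirement can be sketched in a few lines of Python. This is only our reading of the rule as described here (count the different clones on each side), not FPC's actual code, and the function name is hypothetical.

```python
def enough_clones(pairs, match):
    """True if at least `match` different clones are involved on each side.

    pairs: (clone_on_first_contig, clone_on_second_contig) tuples.
    """
    pairs = list(pairs)
    side_a = {a for a, _ in pairs}
    side_b = {b for _, b in pairs}
    return len(side_a) >= match and len(side_b) >= match


# The four Ctg2 L / Ctg3 R lines from the printout above.
ctg2L_ctg3R = [
    ("c0395A07", "b0029L03"),
    ("c0395A07", "c0467H22"),
    ("c0478J17", "b0029L03"),
    ("c0478J17", "c0467H22"),
]
print(enough_clones(ctg2L_ctg3R, 3))  # False: only 2 clones per side
print(enough_clones(ctg2L_ctg3R, 2))  # True
```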
In small contigs, some clones can be simultaneously near the left
and right end, so that they would be both 'L' and 'R'. These clones
are denoted as 'B' in the printout, as is seen
in some of the other lines:
Ctg8 R c0068P14 104b Ctg12 B c0002M08 121b Match 78 5e-75
Ctg8 R c0068P14 104b Ctg12 B c0455E19 113b Match 76 8e-74
This shows that a clone
on the R end of contig 8 overlapped two clones which are
both L and R in contig 12.
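To make the L, R and B labels concrete, here is a small Python sketch of how an end label could be assigned from clone and contig coordinates in CB units. The exact test FPC applies is not given in this tutorial, so the distance rule, the function name and the coordinates are all assumptions.

```python
def end_label(clone_left, clone_right, ctg_left, ctg_right, from_end):
    """Label a clone 'L', 'R', 'B' (both) or None (not an end-clone).

    Assumed rule: a clone is near an end if its own end lies within
    from_end CB units of that contig end.
    """
    near_left = (clone_left - ctg_left) <= from_end
    near_right = (ctg_right - clone_right) <= from_end
    if near_left and near_right:
        return "B"
    if near_left:
        return "L"
    if near_right:
        return "R"
    return None


# Hypothetical coordinates: a long contig (0..1000) and a short one (0..120).
print(end_label(20, 130, 0, 1000, 50))   # L
print(end_label(880, 990, 0, 1000, 50))  # R
print(end_label(400, 520, 0, 1000, 50))  # None
print(end_label(10, 110, 0, 120, 50))    # B
```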
Lastly we will see how multiple merges are handled. There are two points
to understand:
- Merges are performed in order of the cutoff at which they would occur
- Each end of each contig is only allowed to merge once
Set the cutoff to 1e-68,
and Match to 2, and run 'Ends-->Ends' with Auto checked. Two merges
are performed, and the following prints to the terminal window:
>> Ends->Ends: Cutoff 1.0e-68 Auto 1 Match 2 FromEnd 50
Ctg1 L c0570J22 108b Ctg2 R b0618A02 74b Match 65 1e-74
Ctg1 R c0352C11 100b Ctg5 R c0064C24 91b Match 72 4e-78
Ctg2 L c0037N21 114b Ctg3 R c0467H22 133b Match 80 7e-70
Ctg2 L c0395A07 110b Ctg3 R b0029L03 98b Match 75 2e-76
Ctg2 L c0395A07 110b Ctg3 R c0467H22 133b Match 82 3e-75
Ctg2 L c0477J20 119b Ctg3 R b0029L03 98b Match 73 2e-70
Ctg2 L c0477J20 119b Ctg3 R c0467H22 133b Match 82 1e-70
Ctg2 L c0478J17 121b Ctg3 R b0029L03 98b Match 76 4e-75
Ctg2 L c0478J17 121b Ctg3 R c0467H22 133b Match 86 8e-76
Ctg2 L c0553D05 89b Ctg3 R c0467H22 133b Match 71 4e-69
Match: 2L 3R cutoff:8e-76
Ctg4 L c0007P08 165b Ctg6 L c0430J16 124b Match 95 2e-80
Ctg4 R c0360G04 107b Ctg10 B b0110H13 98b Match 71 3e-70
Ctg5 L c0359H07 161b Ctg9 L c0477C19 110b Match 83 2e-70
Ctg5 R b0625N15 62b Ctg11 B b0390K21 77b Match 54 2e-69
Ctg5 R b0625N15 62b Ctg11 B b0452K02 69b Match 53 8e-70
Ctg5 R c0064C24 91b Ctg11 B b0349K22 74b Match 61 6e-71
Ctg5 R c0064C24 91b Ctg11 B b0390K21 77b Match 64 7e-75
Ctg5 R c0064C24 91b Ctg11 B b0452K02 69b Match 62 3e-77
Match: 5R 11R cutoff:8e-70
Match: 5R 11L cutoff:8e-70
Ctg7 R b0040K19 76b Ctg8 R c0430C01 120b Match 69 2e-78
Ctg7 R b0635F03 82b Ctg8 L c0462D11 120b Match 68 2e-71
Ctg7 R c0317E09 118b Ctg8 L c0462D11 120b Match 85 2e-79
Ctg8 R c0068P14 104b Ctg12 B c0002M08 121b Match 78 5e-75
Ctg8 R c0068P14 104b Ctg12 B c0455E19 113b Match 76 8e-74
Ctg9 R b0617N03 84b Ctg10 B b0325O12 106b Match 66 9e-70
Merge contig 3R to 2L (Original:2L 3R) Score:8e-76
Merge contig 11R to 5R (Original:5R 11R) Score:8e-70
Flip contig 11
Ignoring merge 5R to 11L because of previous merge of 5R to 11
Complete merge of 2 contigs.
Calculate merges: Real time 0.330s User time 0.340s Sys time 0.000s
Perform merges: Real time 0.000s User time 0.000s Sys time 0.000s
We will go through this step by step. First, a match is found between 2L and 3R.
The best clone pair matches at 2e-76, but Match=2 requires two different clones on
each side; the second pair of distinct clones (c0478J17 and c0467H22) matches at 8e-76,
so that is the cutoff at which the merge would first occur. Next, a match is found
between contigs 5R and 11. The clones from contig 11 are B clones, meaning they are both
L and R, so the matches are recorded for both 11L and 11R. These matches first occur at
cutoff 8e-70, the score at which a second clone from 5R (b0625N15) becomes involved.
Next, the merges are performed in order of cutoff. 3R to 2L has the lowest (most
stringent) cutoff and is done first. Next, 11R to 5R is done (11L could equally well
have been chosen). Each end of each contig can only be used once, so 5R is not allowed
to be used again; hence, the merge of 5R to 11L is ignored, as would be any other merge
of a contig to 5R.
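The two rules can be reproduced with a short Python sketch. It is a reconstruction inferred from the printout above, not FPC's source code; the function names are hypothetical, and ties are broken by input order, which happens to match the choice of 11R over 11L.

```python
def merge_cutoff(pairs, match):
    """Smallest cutoff at which `match` different clones are involved on each side.

    pairs: list of (clone_a, clone_b, score) for one pair of contig ends.
    Returns None if the requirement is never met.
    """
    seen_a, seen_b = set(), set()
    for clone_a, clone_b, score in sorted(pairs, key=lambda p: p[2]):
        seen_a.add(clone_a)
        seen_b.add(clone_b)
        if len(seen_a) >= match and len(seen_b) >= match:
            return score
    return None


def plan_merges(candidates):
    """Accept candidate merges in order of cutoff, using each contig end once.

    candidates: list of (end_a, end_b, cutoff), e.g. ("5R", "11R", 8e-70).
    Ties keep their input order because sorted() is stable.
    """
    used, accepted, ignored = set(), [], []
    for end_a, end_b, cutoff in sorted(candidates, key=lambda c: c[2]):
        if end_a in used or end_b in used:
            ignored.append((end_a, end_b))
        else:
            used.update((end_a, end_b))
            accepted.append((end_a, end_b))
    return accepted, ignored


# Clone pairs and scores copied from the printout above.
ctg2L_ctg3R = [
    ("c0037N21", "c0467H22", 7e-70), ("c0395A07", "b0029L03", 2e-76),
    ("c0395A07", "c0467H22", 3e-75), ("c0477J20", "b0029L03", 2e-70),
    ("c0477J20", "c0467H22", 1e-70), ("c0478J17", "b0029L03", 4e-75),
    ("c0478J17", "c0467H22", 8e-76), ("c0553D05", "c0467H22", 4e-69),
]
ctg5R_ctg11 = [
    ("b0625N15", "b0390K21", 2e-69), ("b0625N15", "b0452K02", 8e-70),
    ("c0064C24", "b0349K22", 6e-71), ("c0064C24", "b0390K21", 7e-75),
    ("c0064C24", "b0452K02", 3e-77),
]

c23 = merge_cutoff(ctg2L_ctg3R, 2)
c511 = merge_cutoff(ctg5R_ctg11, 2)
# Contig 11's clones are B clones, so the match is recorded for both ends.
candidates = [("2L", "3R", c23), ("5R", "11R", c511), ("5R", "11L", c511)]
accepted, ignored = plan_merges(candidates)
print(c23, c511)
print(accepted)
print(ignored)
```

With the demo data this gives cutoffs of 8e-76 and 8e-70, accepts 2L/3R and 5R/11R, and ignores 5R/11L, in agreement with the "Match:" and "Merge contig" lines of the printout.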
Finally, notice the comments in parentheses, e.g. "(Original:2L 3R)".
These give the original contig numbers involved in the merge, whereas the numbers
in the "Merge contig" statement take all previous merges into account. For example, if
Ctg7L is merged to Ctg1R, and then Ctg21L is merged to Ctg7R, the latter will be reported as
"Merge contig 21L to 1R (Original:21L 7R)".
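The bookkeeping behind these two sets of numbers can be sketched in Python as follows. This is a simplified illustration under our own assumptions: the function and variable names are hypothetical, contig flips (which could change an end letter) are ignored, and the order inside "Original:" follows the example in the text.

```python
def report_merge(moved, target, absorbed_into):
    """Build a 'Merge contig' line and record the merge.

    moved, target: (contig_number, end) for the original contigs.
    absorbed_into: dict mapping an absorbed contig to the contig it joined.
    """
    def current(ctg):
        # Follow earlier merges until we reach a contig that still exists.
        while ctg in absorbed_into:
            ctg = absorbed_into[ctg]
        return ctg

    cur_moved, cur_target = current(moved[0]), current(target[0])
    line = (f"Merge contig {cur_moved}{moved[1]} to {cur_target}{target[1]} "
            f"(Original:{moved[0]}{moved[1]} {target[0]}{target[1]})")
    absorbed_into[cur_moved] = cur_target
    return line


absorbed = {}
print(report_merge((7, "L"), (1, "R"), absorbed))
print(report_merge((21, "L"), (7, "R"), absorbed))
```

The second call reports contig 7 under its current number, 1, while the "Original" comment keeps the number 7.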
Looking at the project page (see below), we see that the "Qs" column has "~" signs for both
contigs which were merged. E.g., for contig 5, the column reads "~ 1". Contigs that
are automerged are simply joined with an overlap equal to the FromEnd setting, and
their CB maps are not recomputed. FPC therefore does not know how many Qs
are in those contigs, and the number reported is simply the sum of the Qs
for the contigs which went into the merge. The "~" indicates that the Q number
was obtained in this way and not by recomputing CB maps. There is good
reason not to recompute CB maps, since end-merging is frequently done at
less stringent cutoffs which could result in false-positive overlaps within
a contig, leading to a bad CB map. However, if clone order is crucial, e.g.
for selection of a minimal tiling path, then the CB map must be recomputed,
provided an appropriate cutoff can be found.