The University of Arizona
Automerge  

1. Purpose of Automerge

Automerge was added to FPC v7.2 to allow automatic end-merging of contigs. End-merging without the Auto setting is discussed in the FPC tutorial, and the reader is assumed to be familiar with that material.

Automatic merges are not screened by a human, and therefore automerge should not be viewed as part of the finishing stage of a project. Rather, it enables a different way of building projects, one intended to give a higher-quality build, especially in the presence of contamination.

The idea is to build the project at a stringent enough cutoff so that the initial contigs are valid, and then to gradually automerge the contigs together at successively lower cutoffs. An important point is that the automerge is set to require at least 2 different overlapping end-clone pairs for each merge, so that one contaminated end-clone by itself cannot cause a false merge.

In FPC v7.2, automerge had to be run in steps at successively less stringent cutoffs. FPC v8.0 removes this necessity, allowing the automerge to be performed just once at the final cutoff value. The merges that are found are carried out automatically in order of the cutoff at which they would first occur, thereby reproducing the effect of the multi-step process.

2. Running Automerge

We will go through a simple example demonstrating automerge. Change to the demo/auto directory and open auto.fpc by typing 'fpc auto.fpc'. The project has been built with a cutoff of 1e-80, creating 13 contigs, as shown below.

In a real project, the DQer should now be run to break up possible bad contigs, but since there are few Qs in this demo we will skip this.

Now we will play with merging, first without the Auto option. Without the Auto option, the merges are only suggested, not carried out.

The functions we will be working with are located on the Main Analysis window. The first thing to do is to set the FromEnd parameter correctly. FromEnd tells how close to the contig end a clone must be in order to count as an end-clone. Its units are CB units and typically it is set to 1/2 the number of bands in an average clone. For HICF, this means about 50, so enter 50 in the text box.
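
The rule of thumb for FromEnd can be written out as a small calculation. This is only a sketch: the function name and the band counts are hypothetical, chosen so that the average clone has about 100 bands, as suggested for HICF above.

```python
from statistics import mean

def suggested_from_end(band_counts):
    """Half the average number of bands per clone, rounded to a whole number."""
    return round(mean(band_counts) / 2)

# Hypothetical band counts for a handful of clones (not taken from the demo project).
bands = [96, 104, 100, 98, 102]
print(suggested_from_end(bands))  # 50
```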

Now set the cutoff to 1e-79, and press the 'Ends-->Ends' button. The project window pops up, showing no merges; however, the terminal window shows the following:


Ends --> Ends (Cutoff 1e-79 FromEnd 50)
Ctg4    L        c0007P08 165b  Ctg6    L        c0430J16 124b  Match  95 2e-80 

This shows that a clone on the L side of contig 4 matched one on the L end of contig 6, but this did not lead to a merge because the Match parameter is set to 2, requiring 2 matches. Change Match to 1 and re-run 'Ends-->Ends'; now the project window pops up, showing the merge between contig 4 and 6.
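
If the terminal output is saved to a file, lines like the one above can be picked apart with a short script. The following is a minimal sketch, not part of FPC: the field names are assumptions, and in particular the number after 'Match' is simply stored as printed without interpreting it.

```python
from typing import NamedTuple, Optional

class EndMatch(NamedTuple):
    ctg1: int      # first contig number
    end1: str      # 'L', 'R' or 'B'
    clone1: str
    bands1: int    # number before the trailing 'b'
    ctg2: int
    end2: str
    clone2: str
    bands2: int
    match: int     # number printed after 'Match'
    score: float   # last column, e.g. 2e-80

def parse_end_match(line: str) -> Optional[EndMatch]:
    """Parse one match line; return None for headers and other lines."""
    tok = line.split()
    if len(tok) != 11 or tok[8] != "Match":
        return None
    if not (tok[0].startswith("Ctg") and tok[4].startswith("Ctg")):
        return None
    return EndMatch(int(tok[0][3:]), tok[1], tok[2], int(tok[3].rstrip("b")),
                    int(tok[4][3:]), tok[5], tok[6], int(tok[7].rstrip("b")),
                    int(tok[9]), float(tok[10]))

line = "Ctg4    L        c0007P08 165b  Ctg6    L        c0430J16 124b  Match  95 2e-80"
m = parse_end_match(line)
print(m.ctg1, m.end1, m.ctg2, m.end2, m.score)
```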

Now check the Auto button, and run 'Ends-->Ends' one more time. This time the project window pops up, showing that contig 6 has been merged to contig 4. Contig 6 is now gone:

Click on the first line to open up the display for Contig 4. You may need to zoom a bit to spread out the contig. Observe that there are two 'End-merge' remarks in the remarks track. These record the best-overlapping clone pair at each merge site, so the merge sites and some indication of their quality can be viewed in the future.

The highlighted clones in this figure are the ones which were merged from contig 6. To highlight these clones, choose Show Additions from the "Highlight" menu in the resulting window. Double-click on one of the highlighted clones to bring up its clone window, and observe that it has an entry 'Old ctg6' (image below).

Now we will experiment a bit with the Match parameter. Set Match to 3 and cutoff to 1e-70, and run 'Ends-->Ends'. No merges are found, but the terminal window shows 4 matches between contigs 2 and 3 (plus some other lines):


Ctg2    L        c0395A07 110b  Ctg3    R        b0029L03  98b  Match  75 2e-76 
Ctg2    L        c0395A07 110b  Ctg3    R        c0467H22 133b  Match  82 3e-75 
Ctg2    L        c0478J17 121b  Ctg3    R        b0029L03  98b  Match  76 4e-75 
Ctg2    L        c0478J17 121b  Ctg3    R        c0467H22 133b  Match  86 8e-76 

Why does this not trigger a merge? The reason is that there are only two clones involved on each side. To meet the requirement of Match=3, there must be at least 3 different clones involved on each side.
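
The requirement can be checked by counting the distinct clones on each side. The sketch below follows the explanation above; it is an illustration of the rule as described here, not FPC's own code, and the function name is made up.

```python
def meets_match(pairs, match):
    """pairs: (clone on first contig, clone on second contig) for each printed match."""
    side_a = {a for a, _ in pairs}
    side_b = {b for _, b in pairs}
    return len(side_a) >= match and len(side_b) >= match

# The four matches printed between Ctg2 L and Ctg3 R.
pairs = [
    ("c0395A07", "b0029L03"),
    ("c0395A07", "c0467H22"),
    ("c0478J17", "b0029L03"),
    ("c0478J17", "c0467H22"),
]
print(meets_match(pairs, 3))  # False: only two distinct clones on each side
print(meets_match(pairs, 2))  # True
```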

In small contigs, some clones can be simultaneously near the left and right end, so that they would be both 'L' and 'R'. These clones are denoted as 'B' in the printout, as is seen in some of the other lines:


Ctg8    B        c0068P14 104b  Ctg12   B        c0002M08 121b  Match  78 5e-75 
Ctg8    B        c0068P14 104b  Ctg12   B        c0455E19 113b  Match  76 8e-74 

This shows that a clone on the R end of contig 8 overlapped two clones which are both L and R in contig 12.
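
The L, R and B labels can be illustrated with a small function. This is a sketch under an assumption: the page does not say exactly how FPC measures the distance to the contig end, so here a clone counts as an end-clone when its nearer edge lies within FromEnd CB units of that end. All coordinates are hypothetical.

```python
from typing import Optional

def end_label(clone_left: int, clone_right: int,
              ctg_left: int, ctg_right: int, from_end: int) -> Optional[str]:
    """Return 'L', 'R', 'B' (both) or None for a clone, in CB units."""
    near_left = clone_left - ctg_left <= from_end     # assumed distance rule
    near_right = ctg_right - clone_right <= from_end  # assumed distance rule
    if near_left and near_right:
        return "B"
    if near_left:
        return "L"
    if near_right:
        return "R"
    return None

print(end_label(10, 110, 0, 400, 50))   # L
print(end_label(300, 400, 0, 400, 50))  # R
print(end_label(150, 250, 0, 400, 50))  # None
print(end_label(5, 55, 0, 60, 50))      # B: small contig, clone is near both ends
```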

Lastly we will see how multiple merges are handled. There are two points to understand:

  • Merges are performed in order of the cutoff at which they would occur
  • Each end of each contig is only allowed to merge once

Set the cutoff to 1e-68 and Match to 2, and run 'Ends-->Ends' with Auto checked. Two merges are performed, and the following prints to the terminal window:

>> Ends->Ends: Cutoff 1.0e-68 Auto 1 Match 2  FromEnd 50
Ctg1    L        c0570J22 108b  Ctg2    R        b0618A02  74b  Match  65 1e-74
Ctg1    R        c0352C11 100b  Ctg5    R        c0064C24  91b  Match  72 4e-78
Ctg2    L        c0037N21 114b  Ctg3    R        c0467H22 133b  Match  80 7e-70
Ctg2    L        c0395A07 110b  Ctg3    R        b0029L03  98b  Match  75 2e-76
Ctg2    L        c0395A07 110b  Ctg3    R        c0467H22 133b  Match  82 3e-75
Ctg2    L        c0477J20 119b  Ctg3    R        b0029L03  98b  Match  73 2e-70
Ctg2    L        c0477J20 119b  Ctg3    R        c0467H22 133b  Match  82 1e-70
Ctg2    L        c0478J17 121b  Ctg3    R        b0029L03  98b  Match  76 4e-75
Ctg2    L        c0478J17 121b  Ctg3    R        c0467H22 133b  Match  86 8e-76
Ctg2    L        c0553D05  89b  Ctg3    R        c0467H22 133b  Match  71 4e-69
Match: 2L 3R cutoff:8e-76
Ctg4    L        c0007P08 165b  Ctg6    L        c0430J16 124b  Match  95 2e-80
Ctg4    R        c0360G04 107b  Ctg10   B        b0110H13  98b  Match  71 3e-70
Ctg5    L        c0359H07 161b  Ctg9    L        c0477C19 110b  Match  83 2e-70
Ctg5    R        b0625N15  62b  Ctg11   B        b0390K21  77b  Match  54 2e-69
Ctg5    R        b0625N15  62b  Ctg11   B        b0452K02  69b  Match  53 8e-70
Ctg5    R        c0064C24  91b  Ctg11   B        b0349K22  74b  Match  61 6e-71
Ctg5    R        c0064C24  91b  Ctg11   B        b0390K21  77b  Match  64 7e-75
Ctg5    R        c0064C24  91b  Ctg11   B        b0452K02  69b  Match  62 3e-77
Match: 5R 11R cutoff:8e-70
Match: 5R 11L cutoff:8e-70
Ctg7    R        b0040K19  76b  Ctg8    R        c0430C01 120b  Match  69 2e-78
Ctg7    R        b0635F03  82b  Ctg8    L        c0462D11 120b  Match  68 2e-71
Ctg7    R        c0317E09 118b  Ctg8    L        c0462D11 120b  Match  85 2e-79
Ctg8    R        c0068P14 104b  Ctg12   B        c0002M08 121b  Match  78 5e-75
Ctg8    R        c0068P14 104b  Ctg12   B        c0455E19 113b  Match  76 8e-74
Ctg9    R        b0617N03  84b  Ctg10   B        b0325O12 106b  Match  66 9e-70
 
Merge contig 3R to 2L (Original:2L 3R) Score:8e-76
 
Merge contig 11R to 5R (Original:5R 11R) Score:8e-70
Flip contig 11
 
Ignoring merge 5R to 11L because of previous merge of 5R to 11
 
Complete merge of 2 contigs.
Calculate merges: Real time 0.330s   User time 0.340s   Sys time 0.000s
Perform   merges: Real time 0.000s   User time 0.000s   Sys time 0.000s

We will go through this step by step. First, a match is found between 2L and 3R, which would first occur at a cutoff of 8e-76. Next, a match is found between contigs 5R and 11. The clones from contig 11 are B clones, meaning they are both L and R, so the matches are recorded for both 11L and 11R. These matches first occur at a cutoff of 8e-70.
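
One way to see where the 8e-76 and 8e-70 values come from is to add the printed matches in order of score until Match distinct clones are present on both sides. This rule is an inference: it reproduces the two "Match:" lines in the printout, but it is not taken from the FPC source, and the function name is hypothetical.

```python
def first_merge_cutoff(matches, match=2):
    """matches: (clone_a, clone_b, score). Return the score at which `match`
    distinct clones are first present on both sides, or None if never."""
    side_a, side_b = set(), set()
    for clone_a, clone_b, score in sorted(matches, key=lambda m: m[2]):
        side_a.add(clone_a)
        side_b.add(clone_b)
        if len(side_a) >= match and len(side_b) >= match:
            return score
    return None

# Ctg2 L against Ctg3 R, copied from the printout.
ctg2L_ctg3R = [
    ("c0037N21", "c0467H22", 7e-70), ("c0395A07", "b0029L03", 2e-76),
    ("c0395A07", "c0467H22", 3e-75), ("c0477J20", "b0029L03", 2e-70),
    ("c0477J20", "c0467H22", 1e-70), ("c0478J17", "b0029L03", 4e-75),
    ("c0478J17", "c0467H22", 8e-76), ("c0553D05", "c0467H22", 4e-69),
]
# Ctg5 R against Ctg11 B, copied from the printout.
ctg5R_ctg11 = [
    ("b0625N15", "b0390K21", 2e-69), ("b0625N15", "b0452K02", 8e-70),
    ("c0064C24", "b0349K22", 6e-71), ("c0064C24", "b0390K21", 7e-75),
    ("c0064C24", "b0452K02", 3e-77),
]
print(first_merge_cutoff(ctg2L_ctg3R))  # 8e-76
print(first_merge_cutoff(ctg5R_ctg11))  # 8e-70
```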

Next, the merges are performed, in order of cutoff. 3R to 2L has the lowest (most stringent) cutoff and is done first. Next, 11R to 5R is done (11L could equally well be chosen). Each end of each contig can only be used once, so 5R is not allowed to be used again; hence, the merge of 5R to 11L is ignored, as would be any other merge of a contig to 5R.
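
The two rules (order by cutoff, each contig end used once) amount to a simple greedy selection. The sketch below illustrates that idea with the three candidate merges from the printout; it is not FPC's implementation, and it ignores contig renumbering and flipping.

```python
def plan_merges(candidates):
    """candidates: (cutoff, end_a, end_b), where an end is (contig, 'L' or 'R').
    Returns (performed, ignored) lists of (end_a, end_b)."""
    used, performed, ignored = set(), [], []
    # sorted() is stable, so candidates with equal cutoffs keep their input order.
    for cutoff, end_a, end_b in sorted(candidates, key=lambda c: c[0]):
        if end_a in used or end_b in used:
            ignored.append((end_a, end_b))
            continue
        used.update((end_a, end_b))
        performed.append((end_a, end_b))
    return performed, ignored

candidates = [
    (8e-76, (2, "L"), (3, "R")),
    (8e-70, (5, "R"), (11, "R")),
    (8e-70, (5, "R"), (11, "L")),
]
performed, ignored = plan_merges(candidates)
print(performed)  # [((2, 'L'), (3, 'R')), ((5, 'R'), (11, 'R'))]
print(ignored)    # [((5, 'R'), (11, 'L'))]
```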

Finally, notice the comments in parentheses, e.g. "(Original:2L 3R)". These tell the original contig numbers involved in the merge, whereas the numbers in the "Merge contig" statement include all previous merges. For example, if Ctg7L is merged to Ctg1R, and then Ctg21L is merged to Ctg7R, the latter will be reported as "Merge contig 21L to 1R (Original:21L 7R)".
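
The difference between the two sets of numbers can be reproduced by remembering which contig each merged contig was absorbed into. This is a hypothetical sketch: it assumes end letters are unchanged by a merge (no flips), and the order of the ends inside the parentheses in a real printout may differ.

```python
def report_merge(src, dst, absorbed_into):
    """src, dst: (contig, end) using the original contig numbers.
    absorbed_into maps a merged-away contig to the contig that absorbed it."""
    def current(ctg):
        while ctg in absorbed_into:
            ctg = absorbed_into[ctg]
        return ctg

    cur_src, cur_dst = current(src[0]), current(dst[0])
    if cur_src == cur_dst:
        raise ValueError("both ends already belong to the same contig")
    absorbed_into[cur_src] = cur_dst
    return (f"Merge contig {cur_src}{src[1]} to {cur_dst}{dst[1]} "
            f"(Original:{src[0]}{src[1]} {dst[0]}{dst[1]})")

absorbed = {}
print(report_merge((7, "L"), (1, "R"), absorbed))
print(report_merge((21, "L"), (7, "R"), absorbed))
# Second line: Merge contig 21L to 1R (Original:21L 7R)
```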

Looking at the project page (see below), we see that the "Qs" column has "~" signs for both contigs which were merged. E.g., for contig 5, the column reads "~ 1". Contigs that are automerged are simply joined with an overlap equal to the FromEnd setting, and their CB maps are not recomputed. FPC therefore does not know how many Qs are in those contigs, and the number reported is simply the sum of the Qs for the contigs which went into the merge. The "~" indicates that the Q number was obtained in this way and not by recomputing CB maps. There is good reason not to recompute CB maps: end-merging is frequently done at less stringent cutoffs, which could result in false-positive overlaps within a contig and hence a bad CB map. However, if clone order is crucial, e.g. for selection of a minimal tiling path, then the CB map should be recomputed, provided an appropriate cutoff can be found.
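
The "~" entry is just a sum with a marker in front. As a tiny illustration (the function name and the per-contig Q counts are hypothetical; only the "~ 1" result is taken from the demo):

```python
def merged_q_label(q_counts):
    """Q entry for an automerged contig: the summed Qs, flagged as not recomputed."""
    return f"~ {sum(q_counts)}"

# Hypothetical split: one Q in one of the merged contigs, none in the other.
print(merged_q_label([1, 0]))  # ~ 1
```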

Email Comments To: fpc@agcol.arizona.edu

Last Modified Thursday February 14, 2008 10:28 AM and 07 seconds