Written by Fred Engler     Aug 2003
Updated by Will Nelson     June 2004
Updated by Martin Pokorny, Jingmei Yang, Will Nelson & Cari Soderlund   April 2006
Updated by Jingmei Yang, Will Nelson & Cari Soderlund   Aug 2006
Updated by Will Nelson & Cari Soderlund   Aug 2007
Contents
A. Introduction
B. MTP using Fingerprints
    1. Finding overlapping clone pairs using fingerprints (Step 1)
    2. Viewing overlapping clone pairs in the contig display (Step 3)
    3. Picking MTP clones (Step 2)
    4. Viewing MTP clones in the contig display
C. MTP using Fingerprints and draft sequence alignments to BES
D. Saving MTP results
E. Mandatory clones
F. Split BSS Contigs
G. HICF Contigs
Also see: MTP simulation results
A. Introduction
Selecting an MTP (Minimal Tiling Path) is the task of
picking a set of minimally overlapping clones that span an
entire contig. Due to inexact coordinates in the CB
(Consensus Bands) map, one cannot pick overlapping clones
based solely on their position on the map. There are two
methods using two different input sources for picking MTP
clones: 1) the fingerprint method, in which overlaps are
determined by looking at the clone fingerprints and their
map position, and 2) the BSS-draft method, in which
sequence comparison between draft sequence and BESs (BAC
End Sequences) via BSS is used. The first method involves
analyzing the fingerprints of a pair of overlapping clones
for shared restriction fragments (bands), and verifying
the integrity of the fingerprints of the potentially
overlapping pair by matching bands with a spanning and two
flanking clones. In the second method, map overlap is
confirmed by a draft sequence contig matching two BESs of
the overlapping clone pair. The first method uses
information that is already present in the physical map,
but the overlaps can be inexact. The second method
requires sequence information, but gives very exact
overlaps. The 'Select MTP' function can use fingerprints,
BES draft sequence comparison results, or both as input
for picking MTP clones. When both are used, precedence is
given to overlaps verified by the BSS-draft method. There
are two steps in automatically picking MTP clones: 1)
finding a set of overlapping clone pairs, and 2) picking a
contiguous path of overlapping clone pairs through a
contig. The following sections guide you through an
example of using the automatic MTP picking function of
FPC. Before you begin, please download the files used in
this demo by clicking here.
Uncompress the files:
tar xzvf mtpdemo.tar.gz
Change directory to the mtpdemo directory:
cd mtpdemo
Next, start FPC with the demo file by typing
fpc mtpdemo
on the command line. Open the window for selecting MTP
clones by clicking on the 'MTP' button from the Main Menu.
The following window appears:
B. MTP using Fingerprints
1. Finding overlapping clone pairs using fingerprints
This step locates all pairs of overlapping clones that
satisfy the criteria given in the top part of the window.
No sequence information is used in this step. The
fingerprint bands and the map positions of clones are used
in the analysis.
Make sure the 'Use fingerprints' option is turned 'on' (the
circle is filled). Leave all parameters at their default
values. An explanation of the parameters is given in the
online help, which may be read by clicking on the 'Help'
button at the top of the window.
Click on the 'Find overlapping
pairs' button. This starts the computation process. You
will see on the standard output the progress being made
through the contigs, as shown below:
********** Find overlapping pairs ***********
Read 114098 bands from .cor file. Band range(720 3298).
Find Fingerprints Pairs
Clone pairs for ctg1 (clones 314)...3837 pairs
Clone pairs for ctg2 (clones 221)...2880 pairs
Clone pairs for ctg3 (clones 92)...1016 pairs
.
.
.
Clone pairs for ctg40 (clones 5)...1 pairs
Clone pairs for ctg41 (clones 2)...0 pairs
Contigs with zero pairs 2
Identified 33456 fingerprint pairs
// All contigs Min FPC overlap 0 Max FPC overlap 20
// Use Fingerprints: Min Shared Bands 6
********** Finish overlapping pairs ***********
When the computation is complete, the button will turn gray.
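If you redirect the standard output to a file, the per-contig pair counts can be tallied with a short script. The following is a minimal Python sketch, assuming the log lines have exactly the form shown above. The function name parse_pair_counts is hypothetical and is not part of FPC.

```python
import re

# Matches lines such as: "Clone pairs for ctg1 (clones 314)...3837 pairs"
PAIR_LINE = re.compile(r"Clone pairs for ctg(\d+) \(clones (\d+)\)\.\.\.(\d+) pairs")

def parse_pair_counts(log_text):
    """Return {contig number: (clone count, pair count)} from FPC output text."""
    counts = {}
    for line in log_text.splitlines():
        match = PAIR_LINE.search(line)
        if match:
            ctg, clones, pairs = (int(group) for group in match.groups())
            counts[ctg] = (clones, pairs)
    return counts

# Three lines copied from the output shown above.
sample = """\
Clone pairs for ctg1 (clones 314)...3837 pairs
Clone pairs for ctg40 (clones 5)...1 pairs
Clone pairs for ctg41 (clones 2)...0 pairs
"""
counts = parse_pair_counts(sample)
zero_pair_contigs = sorted(c for c, (_, pairs) in counts.items() if pairs == 0)
print(counts[1])           # (314, 3837)
print(zero_pair_contigs)   # [41]
```

Contigs listed in zero_pair_contigs are the ones reported under "Contigs with zero pairs".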
2. Viewing overlapping clone pairs in the contig display
We will come back to STEP 2, but first we will look at the
results from STEP 1 by going to STEP 3 (which shows results from
both steps).
To see the overlapping pairs in the contig display, select
a contig via the 'Contig' text box, and click the 'Next'
button beside 'Step through pairs'. The selected contig
will open, and the first pair, along with the spanning and
flanking clones, will be highlighted. As you continue to
click on 'Next', each pair will in turn be
highlighted:
If the 'Show fingerprints (fp only)' option is turned on,
the Fingerprint window will also open, showing a series of
fingerprints:
A total of five fingerprints will be shown, and five
clones are highlighted. In the contig display, the clones
highlighted in blue indicate the clone pair. The pale blue
clone spanning the overlap of the pair (called the
spanner) verifies the shared bands of the pair. Extending
to the left and right of the pair are two clones
highlighted in gray. These clones confirm bands in the
pair that are not confirmed by the spanning clone. The
Fingerprint window is used to show how bands are shared.
The following color scheme is used:
Cyan
-- band is shared by both clones in the pair and the spanning clone.
Green
-- band is shared only by the left clone in the pair and spanning
clone.
Blue
-- band is shared only by the right clone in the pair and spanning
clone.
Violet
-- band is shared by a clone in the pair and its flanking clone, but
not by the spanner.
Red
-- band in a pair or spanning clone that is unconfirmed; a mismatch.
In the standard output, information on the shared bands
and unmatched bands, along with the length of the pair
clones, is given:
Fingerprint pair:
L-flank Left Spanner Right R-flank
z2598 z2597 z2602 z2612 z2611
122880 167936 (length)
6 10 12 20 9 (shared)
- 2 1 0 - (mismatch)
The numbers displayed on the terminal for each pair
correspond to the colors of bands in the Fingerprint window
as follows:
- The number of cyan bands is the number in the "Spanner" column and "(shared)" row (e.g. 12).
- The number of green bands is the number in the "Left" column and "(shared)" row (e.g. 10).
- The number of blue bands is the number in the "Right" column and "(shared)" row (e.g. 20).
- The number of violet bands in the fingerprint of the left clone of the pair is the
number in the "L-flank" column and the "(shared)" row (e.g. 6).
- The number of violet bands in the fingerprint of the right clone of the pair is the
number in the "R-flank" column and the "(shared)" row (e.g. 9).
- The number of red bands is the sum of the numbers in the "(mismatch)" row
(e.g. 2, 1, 0 for Left, Spanner, Right, respectively).
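The color scheme can be read as a small decision rule for each band of a pair clone. The sketch below is only an illustration of that rule in Python. It assumes that two bands match when they differ by no more than a fixed tolerance; the function names, the tolerance value, and the band values are all hypothetical, and FPC's own matching may differ.

```python
def has_match(band, fingerprint, tol):
    """True if any band in `fingerprint` is within `tol` of `band`."""
    return any(abs(band - other) <= tol for other in fingerprint)

def band_color(band, side, partner, spanner, flank, tol):
    """Color for one band of a pair clone.

    side    -- "left" or "right": which clone of the pair the band belongs to
    partner -- fingerprint of the other clone in the pair
    spanner -- fingerprint of the spanning clone
    flank   -- fingerprint of this clone's flanking clone
    """
    in_spanner = has_match(band, spanner, tol)
    if in_spanner and has_match(band, partner, tol):
        return "cyan"                                   # pair + spanner
    if in_spanner:
        return "green" if side == "left" else "blue"    # one clone + spanner
    if has_match(band, flank, tol):
        return "violet"                                 # confirmed by flanker only
    return "red"                                        # unconfirmed

# Made-up fingerprints, for illustration only.
left = [100, 200, 300, 400]
right = [200, 500, 600]
spanner = [200, 300, 500]
left_flank = [400]

colors = [band_color(b, "left", right, spanner, left_flank, tol=2) for b in left]
print(colors)   # ['red', 'cyan', 'green', 'violet']
```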
You may get output as follows:
Fingerprint pair: olap 49152
L-flank Left Spanner Right R-flank
z2598 z2597 z2602 z2612 z2611
122880 167936 (length)
6 10 12 20 9 (shared)
- 2 1 0 - (mismatch)
which indicates that no valid spanner and flankers could be found, though the pair
does qualify based on the user input.
- You can step through the pairs by repeatedly clicking on
the 'Next' and 'Previous' buttons beside 'Step through
pairs'.
- You may select the clone from which to begin
stepping through the pairs by clicking on 'Pick start',
followed by clicking on the clone of interest.
- You may wish to hide everything except the step buttons.
To do this, click on the 'Mini' button. Only those options
essential in stepping through pairs are shown:
To revert to the full-sized window, click on the 'Full' button.
3. Picking MTP clones
Now we will go back to STEP 2 to automatically select the MTP.
Using the
shortest paths algorithm, a minimal path of clones is
picked through a contig based on the amount of clone
overlap (shared bands) and clone size. To run this, click
on the 'Pick MTP clones' button in STEP 2. The following text will
be displayed:
************ Starting PickMTP ************
Building graphs completed.
Finding MTP completed.
Average MTP clone size: 143959
Contig totals: (in CB units)
Contig Ctg len # MTP overlap # of gaps gap length %covered
------- ------- ----- ------- --------- ---------- --------
ctg1 493 14 102 0 0 94%
ctg2 303 8 62 0 0 94%
ctg3 159 4 23 0 0 92%
ctg4 137 3 16 0 0 79%
ctg5 185 5 30 0 0 94%
ctg6 538 17 133 0 0 97%
ctg7 298 11 76 0 0 94%
ctg8 872 26 184 0 0 97%
ctg9 257 7 48 0 0 89%
ctg10 100 3 16 0 0 95%
...
ctg40 37 1 0 0 0 64%
ctg41 41 1 0 0 0 70%
Clone overlap (base pairs):
Positive:
20000- 30000- 40000- 50000- 60000- 70000- 80000-
29999 39999 49999 59999 69999 79999 89999
80 31 21 7 5 2 3
Total positive overlap: 5177344
Average positive overlap: 34747
Number of positive clone overlaps: 149
Clones picked:190
BSS pairs: 0 (0%)
Fingerprint pairs: 149 (98%)
Single MTP clones: 3
Mandatory clone pairs: 0 (0%)
Expressway junctions: 0 (0%)
Number of clones in MTP: 190
Number of mandatory clones: 0
Total gap span: 0 kb
Total MTP span: 24019 kb
Percent of map covered: 92%
// All contigs Prefer large Mandatory:
************ Finished PickMTP ************
The 'Pick MTP clones' button will turn gray once the
process completes.
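To make the idea of picking a path with a shortest-path search concrete, here is a minimal Python sketch using Dijkstra's algorithm over a graph of overlapping pairs. It is not FPC's implementation: the cost function (overlap plus a fixed per-clone penalty), the function name pick_path, and the clone names A to D are assumptions made for illustration, and FPC's actual weighting of overlap and clone size is not described here.

```python
import heapq

def pick_path(pairs, start, end, per_clone_cost=5):
    """Find the cheapest left-to-right path of overlapping clones.

    pairs -- {(left_clone, right_clone): overlap}
    The cost of each step is overlap + per_clone_cost (hypothetical weights),
    so the search favors small overlaps and few clones.
    """
    graph = {}
    for (left, right), overlap in pairs.items():
        graph.setdefault(left, []).append((right, overlap + per_clone_cost))

    best = {start: 0}
    previous = {}
    queue = [(0, start)]
    while queue:
        cost, clone = heapq.heappop(queue)
        if clone == end:
            break
        if cost > best.get(clone, float("inf")):
            continue  # stale queue entry
        for neighbor, step in graph.get(clone, []):
            new_cost = cost + step
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                previous[neighbor] = clone
                heapq.heappush(queue, (new_cost, neighbor))

    if end not in best:
        return None  # no contiguous path: the contig would need a junction
    path = [end]
    while path[-1] != start:
        path.append(previous[path[-1]])
    return path[::-1], best[end]

# Hypothetical pairs with overlaps in CB units.
pairs = {("A", "B"): 10, ("B", "C"): 12, ("A", "C"): 30,
         ("C", "D"): 8, ("B", "D"): 40}
print(pick_path(pairs, "A", "D"))   # (['A', 'B', 'C', 'D'], 45)
```

When no path exists between the two end clones, the function returns None, which corresponds to a contig where the MTP cannot be one contiguous path.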
In the table displayed on the standard output, the "Ctg
len" column displays the length of the contig, and the
"overlap" column displays the total overlap of the clones
in the MTP of the given contig, both lengths being given
in CB units.
A gap is counted wherever there is a break in the MTP.
Pairs with a negative overlap (which can happen with BSS
pairs) are not counted as gaps, nor are any uncovered
segments of the contig beyond the ends of the MTP. Thus
"# of gaps" displays the number of such breaks in the MTP,
and "gap length" is the total length of those gaps in CB
units. Gaps should usually be 0 if fingerprint data is
used, because there is almost always at least one viable
path through the contig; however, if only BSS data is
used, or if the parameters are set very stringently, then
gaps may appear. We emphasize that these gaps are relative
to the FPC contig; typically, additional gaps will be found
when sequencing is performed, because the FPC contig embodies
only partial information about the underlying sequence.
Finally, "%covered" displays the percentage of the contig
that lies between the ends of the MTP on that contig
(excluding gaps).
Note that all of the lengths given in the MTP report to
the standard output are based on the numbers of bands and
the average band size in the FPC project, and are
therefore only approximations of the actual overlaps,
gaps, etc.
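As an illustration of these band-based estimates, the clone lengths printed in this demo are all whole multiples of 4096. For the fingerprint pair shown earlier, the left clone z2597 has 12 + 10 + 6 + 2 = 30 bands and a reported length of 122880 = 30 x 4096. The Python sketch below reproduces the printed figures under the assumption that the demo project's average band size is 4096 bp. That value is inferred from the output in this tutorial and is not a documented FPC constant; the helper name is hypothetical.

```python
# Assumption: inferred from the demo output, not a documented FPC constant.
AVG_BAND_SIZE_BP = 4096

def bands_to_bp(n_bands, avg_band_size=AVG_BAND_SIZE_BP):
    """Estimate a length in base pairs from a number of bands."""
    return n_bands * avg_band_size

# Left clone z2597: 12 cyan + 10 green + 6 violet + 2 red bands
print(bands_to_bp(12 + 10 + 6 + 2))   # 122880
# Right clone z2612: 12 cyan + 20 blue + 9 violet + 0 red bands
print(bands_to_bp(12 + 20 + 9 + 0))   # 167936
# Bands shared by both clones and the spanner (the "olap" value)
print(bands_to_bp(12))                # 49152
```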
Note:
If you change the pair parameters, you will need to first rerun the
'Find overlapping pairs', and then rerun
the 'Pick MTP clones' function.
4. Viewing MTP clones in the contig display
The MTP clones are viewed in the contig display the same
way as the pairs, using the 'Next' and 'Previous' buttons
beside 'Step through MTP'. Sometimes a complete
path cannot be found through a contig, and the MTP is then
made up of several contiguous paths. Each contiguous path
is called an "expressway". Look at the information in
the standard output to see how the clones make up the
expressways. For example,
Clones (1, 2) in expressway of 14:
Fingerprint pair:
L-flank Left Spanner Right R-flank
z2598 z2602 z2610 z2628 z2629
176128 126976 (length)
8 28 7 6 18 (shared)
- 0 0 0 - (mismatch)
tells us that we are looking at the 1st and 2nd
clones in a contiguous path of 14 clones. Click on the
'All' button to see all picked clones highlighted in blue.
Look at the text output to see where junctions occur.
Whenever a path or expressway does not span the entire
contig, a new expressway must start near the last clone of
the previous expressway. The process of choosing a clone
to begin the new expressway can only be based on overlaps
of clones on the CB map, and should therefore be verified
by a person. FPC will display messages to the standard
output signifying such regions of "weak overlap" whenever
they occur in the MTP (see the section on picking an MTP using BSS
pairs, below, for an example).
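The relationship between expressways and junctions can be sketched as follows. This Python fragment is an illustration only: it assumes you already have the MTP clones of one contig in left-to-right order and the set of adjacent pairs that were verified. The function name and the clone names c1 to c5 are hypothetical.

```python
def split_into_expressways(ordered_clones, verified_pairs):
    """Group consecutive MTP clones into expressways.

    A new expressway starts whenever two neighboring clones are not a
    verified overlapping pair; that boundary is a junction ("weak overlap").
    """
    expressways = []
    for clone in ordered_clones:
        if expressways and (expressways[-1][-1], clone) in verified_pairs:
            expressways[-1].append(clone)
        else:
            expressways.append([clone])
    return expressways

clones = ["c1", "c2", "c3", "c4", "c5"]
verified = {("c1", "c2"), ("c2", "c3"), ("c4", "c5")}
result = split_into_expressways(clones, verified)
print(result)            # [['c1', 'c2', 'c3'], ['c4', 'c5']]
print(len(result) - 1)   # 1 junction, between c3 and c4
```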
C. Adding overlap data from draft sequence alignments to BES
A much more accurate estimate of the overlap of two clones can
be obtained if a BES from each clone hits a particular
draft sequence contig. If your species has both draft sequence and BES,
it is recommended to use this data in addition to the
fingerprint overlap data.
To use the draft information, you must first use the BSS tool to
make the BSS file of alignments. This is straightforward (see the
BSS documentation). For this demo, a BSS file "Dseq.Dbes.bss"
has been produced by
performing a BSS search with query Seq/DSeq.seq and
database BES/DBes.bes. The query
file DSeq.seq simulates Whole Genome Shotgun Sequence contigs, and
the database DBes.bes contains actual BAC End Sequences. MegaBLAST was
used as the search engine, with an E-value cutoff of
1e-100.
NB: The MTP function was developed with very short draft sequences in mind,
for example the result of a 1x sequence survey project.
Beginning in FPC V9.1, it is also possible
to use long reference sequences, such as sequenced chromosomes from a closely
related species. If your project does involve small survey sequences,
you may wish to change the Multiple Contig Ratio parameter back
to its former default value of 3. This parameter is on the Advanced Settings dialog.
Finding overlapping clone pairs using fingerprints and BSS:
To incorporate the BSS alignments, turn on the
'Use BSS results' option. Click on the 'Load...' button,
and select the "Dseq.Dbes.bss" on the right. Click 'OK' (or
double-click "Dseq.Dbes.bss"). Click on the 'Find overlapping pairs' button, and
say yes to removing the old pairs. This starts the process
of finding overlapping pairs based on the sequence
comparison results between draft genomic sequence and BAC
End Sequences. The following is printed to the standard
output:
********** Find overlapping pairs ***********
Hit rejections:
0 singleton
2564 min ID and min score
24327 only min ID
2778 only min score
0 not in best contig
0 seqCtg hit too many ctgs (0 seqCtgs)
Total good hits: 10213
Pair rejections:
13808 same orientation
0 too much sequence overlap
6845 below minimum fpc overlap
453 above maximum fpc overlap
34124 different contigs
Total good pairs: 2197
write /home/will/demo/mtpdemo/BSS_results/mtp_pairs.bss
Find Fingerprints Pairs
Clone pairs for ctg1 (clones 314)...3958 pairs
Clone pairs for ctg2 (clones 221)...2951 pairs
Clone pairs for ctg3 (clones 92)...1042 pairs
....
Clone pairs for ctg40 (clones 5)...1 pairs
Clone pairs for ctg41 (clones 2)...0 pairs
Contigs with zero pairs 2
Identified 34669 fingerprint pairs
// All contigs Min FPC overlap 0 Max FPC overlap 20
// Use Fingerprints: Min Shared Bands 6
// Use BSS: Score 400 Identity 97 File /home/will/demo/mtpdemo/BSS_results/Dseq.Dbes.bss
// Advanced: Max Seq overlap 50000 Mult contig ratio 0 Allow neg overlaps Allow mult BES hits
********** Finish overlapping pairs ***********
You can step through these pairs in the same way as with
the fingerprint-based pairs. Only the pair will be
highlighted, as spanners and flankers are not used for
this process.
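The "Hit rejections" categories in the output above follow from the two BSS thresholds reported in the log (Score 400, Identity 97). The Python sketch below shows one way to reproduce that bookkeeping. It is an illustration only: the function name is hypothetical, and it assumes that a hit exactly at a threshold is accepted, which is not confirmed by the output.

```python
from collections import Counter

def classify_hit(score, identity, min_score=400, min_identity=97):
    """Return the rejection category for one BES hit, or 'good'.

    Assumption: values equal to a threshold pass.
    """
    low_score = score < min_score
    low_identity = identity < min_identity
    if low_score and low_identity:
        return "min ID and min score"
    if low_identity:
        return "only min ID"
    if low_score:
        return "only min score"
    return "good"

# Made-up (score, percent identity) values, for illustration only.
hits = [(350, 95), (500, 95), (350, 99), (400, 97), (820, 100)]
tally = Counter(classify_hit(score, identity) for score, identity in hits)
print(tally["good"])          # 2
print(tally["only min ID"])   # 1
```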
If you want to view the BSS alignment data for just those overlaps
used for the MTP, select BSS from the Main Menu. You will see
a file called mtp_pairs.bss, which was written during the MTP "find overlapping
pairs" process; these are the BSS results that were selected for input to the MTP algorithm.
Picking MTP clones with fingerprints and BES overlaps:
To see the effect of the more-precise overlap data, click again on the
'Pick MTP clones' button, to create a new MTP including the new data.
The MTP data is again printed to the console, as follows:
************ Starting PickMTP ************
Building graphs completed.
Finding MTP completed.
Average MTP clone size: 142157
Contig totals: (in CB units)
Contig Ctg len # MTP overlap # of gaps gap length %covered
------- ------- ----- ------- --------- ---------- --------
ctg1 493 14 41 0 0 95%
ctg2 303 8 20 0 0 97%
ctg3 159 4 10 0 0 94%
ctg4 137 4 9 0 0 97%
ctg5 185 5 9 0 0 98%
ctg6 538 17 74 0 0 98%
ctg7 298 10 21 0 0 94%
ctg8 872 27 82 0 0 97%
ctg9 257 7 29 0 0 91%
ctg10 100 3 12 0 0 97%
..
ctg40 37 1 0 0 0 64%
ctg41 41 1 0 0 0 70%
Clone overlap (base pairs):
Positive:
0- 10000- 20000- 30000- 40000- 50000- 60000- 70000- 80000-
9999 19999 29999 39999 49999 59999 69999 79999 89999
48 4 18 16 13 3 7 3 3
Total positive overlap: 2890786
Average positive overlap: 25137
Number of positive clone overlaps: 115
Negative:
0-9999 10000-19999
34 1
Total negative overlap: 44125
Average negative overlap: 1260
Number of negative clone overlaps (spanned by draft): 35
Clones picked:191
BSS pairs: 87 (56%)
Fingerprint pairs: 63 (41%)
Single MTP clones: 3
Mandatory clone pairs: 0 (0%)
Expressway junctions: 0 (0%)
Number of clones in MTP: 191
Number of mandatory clones: 0
Total gap span: 0 kb
Total MTP span: 24274 kb
Percent of map covered: 93%
// All contigs Prefer large Mandatory:
************ Finished PickMTP ************
To look at just the overall
statistics:
Total positive overlap: 2890786
Average positive overlap: 25137
Number of positive clone overlaps: 115
Comparing with the previous run, we see that total overlaps have
dropped from 5.2 Mb to 2.9 Mb, a reduction of 44%.
The average positive overlap between clones
has been reduced from 35kb to 25kb, a major improvement.
These basepair figures are also more accurate now, since some overlaps are
known exactly from the draft alignments, while before they all were estimated
from the fingerprint band overlaps.
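The comparison can be checked directly from the two printouts. The short Python sketch below uses only the totals and counts reported above; the variable names are arbitrary.

```python
# Figures copied from the two PickMTP reports above.
fp_only = {"total": 5177344, "count": 149}      # fingerprints only
fp_and_bss = {"total": 2890786, "count": 115}   # fingerprints plus BSS

reduction = 1 - fp_and_bss["total"] / fp_only["total"]
print(f"{reduction:.0%}")                             # 44%
print(fp_only["total"] // fp_only["count"])           # 34747
print(fp_and_bss["total"] // fp_and_bss["count"])     # 25137
```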
Notice also that the printout now has a "negative overlap" section which was not present before:
Total negative overlap: 44125
Average negative overlap: 1260
Number of negative clone overlaps (spanned by draft): 35
Negative overlaps arise because a draft sequence contig may match the ends of two clones
that do not overlap each other. Since these clones must be very close together,
you may want them to be used in
the MTP. If you do not, turn on 'Only positive overlaps' in the Advanced Settings dialog.
If you now step through the MTP pairs as before, you will see
the draft-based pairs distinguished from the fingerprint pairs, e.g.:
Clones (1, 2) in expressway of 2:
BSS-draft based pair:
Left Right Seq Olap FPC Olap
z2603 z2627 2824 0
180224 106496 (length)
D. Saving MTP results
You may wish to save the results of running MTP, for which
purpose FPC provides two options. The first option, "Set
MTP clone status to TILE", will change the status of all
clones in the MTP to TILE. This option may be useful to
indicate in FPC those clones that are part of the MTP.
They should be highlighted in red on the contig display. If they
are not, pull down in white space on the contig display and select
'Edit track properties'; make a filter of Status=Tile, and set Color
to blue; see
Ctgdemo for more information.
Having done this, your contig displays will show the MTP clones, as
well as clone remarks with prefix "MTP:" indicating the expressway start and stop locations
and the overlap between each clone and the previous clone in its
expressway:
The second option, Save for the "File of MTP clones", will produce
a text file of the clones in the MTP. The clones in this
file are given by contig, where the clone names are listed
together with the clone lengths, and the (estimated)
overlap of each pair. This option may be useful when you
need a list of the MTP clones outside of FPC.
You may also Save the "File of Pairs" and later load the pairs using the
"Or use existing Pairs File:" option in STEP 1.
E. Mandatory clones
If some clones have already been sequenced or are in the pipeline for sequencing, these clones
should be included in the MTP; hence, they are called "Mandatory clones" in STEP 2.
Select the "Mandatory clone"
button and a menu will appear that lists: Tile, Sent, Ready, Shotgun, Finished, SD. As shown in this
tutorial, you can manually set a clone to one of these statuses. The status of Tile implies the clone
has been selected for sequencing but not sent. The status of SD represents a Simulated Digest clone
created from the sequence; these can be created with our FSD/ESD package. All the other statuses can
be used as desired by your laboratory. Generally, the original clone will have a sequencing status
other than SD. You do not want to have both the original and the SD clone in
the MTP, as that would be redundant. We provide the choice of selecting zero or more statuses to be included.
By default, all of the statuses are unchecked, and you should check those for which you
have clones that you want included in the MTP.
Important: If your intention is to pick an entirely new MTP, make sure that
you are not including mandatory clones. The best way to do this is to unselect all the
options in the Mandatory Clone dialog. You can verify that no clones were mandatory
by saving the MTP to a file (as just described) and checking that no mandatory clones are indicated
in the file.
F. Split BSS Contigs
In the BSS window, there is an option to "Split BSS output by contig".
This can be useful when draft sequence is being blasted against the BESs,
because there may be a tremendous amount of output, and you may wish to
study it per contig without loading a very large BSS result file.
If you have generated a split-contig BSS output, then for the MTP you can
either load just a single contig file, or you can enter the directory
that contains the contig files, in which case all contigs will be loaded.
If you want to try this, first bring up the BSS window and make a split-contig
version of the Dseq.Dbes.bss result file we have been working with:
- For Query, select the Seq directory (click Browse, double click Seq, select OK).
- For Database, select the BES directory (click Browse, double click BES, select OK).
- Enter the subdirectory 'test'. Select "Split BSS output by contig".
- Select 'Start search'.
- You will see the word 'test' in the BSS results window when it is done.
Then in the MTP window, select Load for the BSS file, select test and then Dseq.Dbes.bss (this is a directory). Run STEP 1 and STEP 2 as before.
G. HICF Contigs
If you are using HICF fingerprints instead of agarose:
- Select Configure on the Main Menu.
- Select HICF.
- Save the .fpc file.
You will note that there are different default values for HICF. Also, the option to "Use Sizes"
is gone, as it is not relevant for HICF. Everything else is the same.