BSS Manual
1.1. Overview
BSS (Blast Some Sequence) organizes sequence searches against target sequences
that are located on an FPC map. The target sequences can
be BAC-end sequences (BES) or sequenced clones that are associated with
clones in the FPC map. The queries can be arbitrary sequences. The
results of the searches can be added to the FPC map as
either electronic markers or remarks on the clones whose associated
sequences were hit by a query. The search may be filtered based on various attributes,
such as a given contig, clones at the end of contigs, or by scores.
There are many uses for these capabilities. Adding
electronic marker hits can either confirm existing
placements or discover new ones, which helps to anchor
contigs. Other information, such as hits from a repeat
database or whole genome shotgun (WGS) contigs,
can be placed on the map. Also, BSS
searches can aid in merging contigs, finding a minimal
tiling path, and finding the next clone for sequencing.
BSS can perform searches using either BLAST,
MegaBLAST, or BLAT. In all cases, BSS will parse the
resulting output file, generating a new and more
readable "BSS file" which summarizes the search results.
We provide both this manual and a tutorial for BSS; both
are short and should be read carefully. This manual refers to the
bssdemo files, so it is worth downloading them so that you can follow the manual
while viewing the FPC BSS windows on your terminal.
Summary of the BSS windows: The "BSS window" is launched from the FPC main window
and allows you to specify your input files and parameters.
At the bottom of this window is a label "BSS results" with a scrollable
text panel listing all your BSS output files (we will refer to this
as the "output file panel"). Clicking a file in the output file panel
opens the results in a new window, the "BSS Results window", which shows all hits.
From the BSS Results window,
you can filter your hits, save them, or add them to the FPC project.
1.2. Getting the executable and bssdemo files
You can get the BSS demo files here. To untar,
type tar xvf bssdemoV2.tar
This will create
a directory called bssdemo that contains the bssdemo files.
FPC V8.9 (or greater), which contains the new BSS, may be downloaded from http://www.agcol.arizona.edu/software/fpc/.
Follow FPC's instructions for installation.
In order to use BSS, at least one of the search
programs must be installed.
BLAST is available from ftp://ncbi.nlm.nih.gov/
by navigating to the blast/executables directory. The
executable "blastall" needs to be in your path. If you
are unsure about this, ask your system administrator to
install it. MegaBLAST is included with BLAST and does
not need a separate installation.
BLAT is available from http://www.cse.ucsc.edu/~kent.
The executable is called "blat" and must be in your path
for BSS to find it. Again, ask your system administrator
for help if you cannot run BLAT.
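A quick way to see which search programs are reachable is to look them up on your PATH. The sketch below is not part of FPC: the function name find_search_programs is an illustrative assumption, and only the executable names "blastall" and "blat" come from this manual.

```python
import shutil


def find_search_programs(names=("blastall", "blat")):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_search_programs().items():
        print(f"{name}: {path if path else 'not found on PATH'}")
```

If both entries report "not found on PATH", BSS will have no search program to run.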
To run the demo, change to the demo directory and start FPC, i.e.
cd bssdemo/demo
fpc bssdemo
Then select the BSS button on the main window, and the "BSS window" will be
shown.
1.3. Input search files
Two sets of sequence files must be specified for each
search: a query set and a database set. The query files
are searched for matches to the database files, while the database
sequences are located on the FPC map by matching their names
with the FPC clone names. All files must be
in fasta format, i.e. one or more sequences where each
sequence has a description line, which is a ">" followed by
the sequence name. For example,
>demo.1
aaaccctgct
cctgctctcc
>demo.2
cctgcatg
The demo.1 and demo.2 are the sequence names.
The sequences can be any length, but each line must contain fewer than 1000 bases.
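The format rules above can be checked before starting a search. The following is a minimal sketch, not BSS's own parser: the function name read_fasta is hypothetical, and taking the first word after ">" as the sequence name is an assumption.

```python
def read_fasta(lines, max_line_len=1000):
    """Parse fasta text lines into a dict {sequence name: sequence}.

    Raises ValueError if a description line has no name, if sequence data
    appears before the first '>' line, or if a sequence line has
    max_line_len or more characters.
    """
    records = {}
    current = None
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError(f"line {lineno}: description line has no name")
            current = fields[0]  # assumption: the name is the first word
            records[current] = []
        elif current is None:
            raise ValueError(f"line {lineno}: sequence data before any '>' line")
        elif len(line) >= max_line_len:
            raise ValueError(f"line {lineno}: {len(line)} bases on one line")
        else:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}
```

Applied to the example above, this yields the two names demo.1 and demo.2, with the two lines of demo.1 joined into one 20-base sequence.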
Four types of sequence are commonly used with BSS:
- Markers
These are sequences with biological meaning which one
wishes to locate on the FPC map. Sometimes they are the sequences of markers which
have already been assigned to some clones experimentally.
- Draft Sequence
A common sequencing strategy is to use whole-genome draft sequence in conjunction
with an FPC map. BSS may then be used to align the draft sequence to the map to identify
assembly problems. An additional "sequence track" feature will be added to FPC v9.0 to
enhance this functionality.
- Sequenced Clone
A fasta file containing one sequenced clone, possibly having
more than one sequence contig. If you would like BSS to recognize the sequence contig numbers,
then the sequence names must have one of two formats, either "clone_name.N" or "clone_name.ContigN".
For example, a sequence file for clone a0089K24 may have sequenced contigs
named a0089K24.1 and a0089K24.2.
The sequence contigs also must appear in the file in the same order as their
numbering, i.e. a0089K24.1 comes before a0089K24.2, etc. (there can be gaps
in the numbering, however).
If the sequenced clone is a clone in the FPC map, then it may
be used as the database for a BSS search, provided the sequence name matches
the FPC clone name. For example if your clone is called a0089K24, then your
sequence name should be a0089K24 (or a0089K24.1, etc. if it has sequence contigs).
More precisely the sequence name, with the .N or .ContigN suffix removed, must
match to the beginning of the FPC clone name. Hence, a0089K24.1 matches
clone a0089K24sd1, which is important for users of our FSD helper application; however,
a0089K24.1 would not match FPC clone ZMMBa0089K24.
- BES
A file of BESs that are associated with clones in the FPC file,
to be used as the database for a BSS search.
Again, this requires that the sequence names in the BES file match to the clone names in
the FPC project; otherwise, BSS does not know where the BES belongs.
Specifically, the FPC clone name must be contained in the BES name. For example,
clone a0089K24 could match to BES ZMMBa0089K24.r and ZMMBa0089K24.f.
You may use any prefixes and suffixes on the BES names; however, BSS will
operate much faster, and print fewer warning messages, if the prefix and suffix
always have the same length (which is good informatics practice anyway).
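The two name-matching rules described above (for sequenced clones and for BES) can be sketched as follows. This is an illustration of the stated rules, not BSS's actual code; the function names are hypothetical, and case-sensitive comparison is an assumption.

```python
import re

# Matches a trailing ".N" or ".ContigN" sequence-contig suffix.
_CONTIG_SUFFIX = re.compile(r"\.(?:Contig)?\d+$")


def strip_contig_suffix(seq_name):
    """Remove a trailing .N or .ContigN suffix, if present."""
    return _CONTIG_SUFFIX.sub("", seq_name)


def sequenced_clone_matches(seq_name, clone_name):
    """Sequenced clone rule: the sequence name, minus its contig suffix,
    must match the beginning of the FPC clone name."""
    return clone_name.startswith(strip_contig_suffix(seq_name))


def bes_matches(bes_name, clone_name):
    """BES rule: the FPC clone name must be contained in the BES name."""
    return clone_name in bes_name
```

With these definitions, a0089K24.1 matches clones a0089K24 and a0089K24sd1 but not ZMMBa0089K24, while BES ZMMBa0089K24.r matches clone a0089K24.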
To recap, either Sequenced Clones or BES may be used for the database, as long
as the sequence names match the FPC clone names. Also, one or more
query files can be searched against one or more database files.
If using more than one database file, they must all be in the same directory.
When using a directory of multiple database
files, the results are grouped together; consequently,
each sequence name across all database files must be unique.
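Because results from a directory of database files are grouped together, it can help to check for repeated sequence names before searching. The sketch below is one way to do that under stated assumptions: the function names are hypothetical, and the sequence name is taken to be the first word after ">".

```python
from collections import defaultdict
from pathlib import Path


def names_in(lines):
    """Return the sequence names found on the '>' lines of fasta text."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def find_duplicates(names_by_file):
    """Given {file name: [sequence names]}, return {name: [files]} for
    every sequence name that occurs more than once."""
    seen = defaultdict(list)
    for file_name, names in names_by_file.items():
        for name in names:
            seen[name].append(file_name)
    return {name: files for name, files in seen.items() if len(files) > 1}


def check_directory(directory):
    """Scan every regular file in a database directory for duplicate names."""
    names_by_file = {}
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            with open(path) as handle:
                names_by_file[path.name] = names_in(handle)
    return find_duplicates(names_by_file)
```

An empty result from check_directory("BES") would mean every sequence name in that directory is unique.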
It is recommended that each fasta description line have one simple name. If a file has a Genbank
header, BSS will use the accession number, except in the case of MegaBLAST,
because MegaBLAST parses these headers specially. A pop-up window will
inform you of which field is to be used and ask if this is acceptable.
1.4. Directory Setup
The sequence files for the BSS can be stored in any directory on
the computer, but it is probably easiest to copy the
configuration used in this tutorial. The tutorial bssdemoV2 has the files:
./bssdemo.fpc
./BES/
lib1.bes
lib2.bes
./Seq/
AP001551.seq
a0089K24.seq
./Mark/
00.est
j.est
1.5. Naming the BSS result files
FPC creates a sub-directory under the FPC project directory called "BSS_results".
All result files are written into this directory and all files in this
directory are in the output file panel.
The naming convention is as follows:
(1) Suffixes are removed, that is, a period and anything following is removed.
(2) The query name, the database name, and the suffix "bss" are joined with periods.
For example:
QUERY         DATABASE        RESULT FILE(s)
00_ests.mrk   a0089K24.seq    00_ests.a0089K24.bss
00_ests.mrk   /Seq/*          00_ests.Seq.bss
/Mark/*       /Seq/*          00_ests.Seq.bss, j_ests.Seq.bss
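The naming convention can be expressed as a short function. This is a sketch inferred from the rules and examples above, not FPC's code; the names base_name and result_file_name are hypothetical, and treating a directory as its last path component is an assumption drawn from the /Seq/* example.

```python
def base_name(path):
    """Last path component with the first period and everything after it removed.

    A directory given as "/Seq/*" or "./Seq/" is reduced to "Seq".
    """
    cleaned = path.rstrip("*").rstrip("/")
    last = cleaned.rsplit("/", 1)[-1]
    return last.split(".", 1)[0]


def result_file_name(query, database):
    """Join the query name, database name, and 'bss' with periods."""
    return f"{base_name(query)}.{base_name(database)}.bss"
```

For a query directory, one result file is produced per query file, so the last example row corresponds to calling result_file_name once for each file under /Mark/.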
Everything under "BSS_results" is managed by the BSS software.
To delete one or more files in BSS_results, use the File Delete option
on the "BSS window",
where you can delete either all files in BSS_results or only the
selected one.
1.6. Running BSS
1. Select your query file or directory.
2. Select your database file or directory.
3. Select BLAST, MegaBLAST, or BLAT.
4. Change the Search parameters if desired. Note that BLAST requires an E-value whereas
BLAT requires a score.
5. Output options:
- Enter BSS subdirectory if desired.
The subdirectory will be created and your result files put in that subdirectory.
Once the search is complete, the subdirectory will be listed in the output file panel.
Double-click the subdirectory name, and all files in the subdirectory will be shown.
- Select 'Split BSS output by contig' if desired;
this creates a sub-directory of BSS files, one for each contig. Note: you can also
filter a BSS file by contig, which is generally preferred. You would only want
the split-contig option if your BLAST results are too large to handle in one
BSS output file.
6. Start Search. Your output files will be shown in the BSS results panel.
Double-click one to view it.
1.7. BSS Results Window
This has three tables. The following description uses results from
the bssdemo, with the query file Mark/00.est against
the database /Seq.
Query Hit summary table (first table in upper right corner):
Sequence      Hits/#Ctgs   Best Ctg/#Hits
001-132-E04   2/0          0/2
002-101-F05   2/1          1/2
002-147-C01   3/2          1/2
All sequence names in the file are listed. The Hits/#Ctgs gives the total number of
hits for a sequence and the total number of contigs that contain a clone that it hit.
The Best Ctg is the one with the most hits, followed by the number of hits.
In this example, the first sequence had 2 hits that were all to singletons (Ctg0), the second had
2 hits to Ctg1, and the third had 3 hits to 2 contigs, where the best was Ctg1 with
2 hits.
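The two summary columns can be derived from a plain list of hits. The sketch below reproduces the three example rows under assumptions that go beyond the text: singletons (Ctg0) are not counted in #Ctgs but can still be the best contig, ties are broken in favor of the lower contig number, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def query_hit_summary(hits):
    """hits: iterable of (query name, contig number) pairs; contig 0 = singleton.

    Returns {query: (total hits, number of contigs hit, best contig, hits to best)}.
    """
    per_query = defaultdict(Counter)
    for query, contig in hits:
        per_query[query][contig] += 1
    summary = {}
    for query, counts in per_query.items():
        total = sum(counts.values())
        n_ctgs = sum(1 for ctg in counts if ctg != 0)  # singletons not counted
        # Most hits wins; on a tie the lower contig number wins (assumption).
        best_ctg, best_hits = max(counts.items(), key=lambda item: (item[1], -item[0]))
        summary[query] = (total, n_ctgs, best_ctg, best_hits)
    return summary


def format_row(query, row):
    """Format one row as 'Sequence Hits/#Ctgs BestCtg/#Hits'."""
    total, n_ctgs, best_ctg, best_hits = row
    return f"{query} {total}/{n_ctgs} {best_ctg}/{best_hits}"
```

For instance, a sequence with two hits to Ctg1 and one hit to another contig is summarized as 3/2 and 1/2, as in the third row of the example.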
Contig Hit summary table (second table in upper right corner):
Contig   CloneHits
1        5
3        2
In this example, Ctg1 had a total of 5 clones that were hit by a sequence, and Ctg3
had 2 clones that were hit. You can double click a contig name in this
table and it will be displayed.
Hit table (large table at the bottom of the window):
Target    RC  Clone     Contig  Query        Score  EValue  Identity  Match  Query_len, etc
a0089K24  y   a0089K24  3       001-132-E04  751    0.0     95%       14%    3296
etc
All hits are shown. The columns are as follows:
- The target (database) sequence name.
- Whether the hit was reverse complemented. This is useful
when manually selecting the next clone for sequencing.
- The FPC clone name.
- The contig for the FPC clone.
- The query sequence name.
- The next 3 fields are the BLAST Score, EValue, and Identity.
For BLAT alignments, EValue is replaced by Intron, which gives the largest gap
in the target site of the alignment (since BLAT joins together multiple BLAST HSPs).
- The Match field is the percentage of the query length which was matched; this is
useful information in screening for complete alignments of a query.
- The remaining fields are the Query length, start and end, and the Target (Database)
length, start and end.
Double click an entry in the table and the alignment will be printed to the
terminal window from which you launched FPC.
1.8. Save, Filter, Add to FPC
The BSS Results window has three menus:
The File menu has:
- Save BSS - you can filter the hits (see Analysis) and then save the results as a BSS file to view later. It will overwrite the current file. Changes to the Columns (see Columns) are NOT saved.
- Save BSS as - same as above but you supply the file name.
- Save for spreadsheet - save the Hit table to a file. What is shown is what is saved, i.e. if you have filtered the hits or removed columns, the saved file reflects those changes.
The Analysis menu has:
- Filter Hits - brings up a window of attributes you can filter on. You can apply multiple filters
in succession, and undo them (see Tutorial).
The filters are:
String:            {Select type} [enter substring]          Type is Query, Target, RC, or Clone
Numeric:           {Select type} {<, =, >} [enter number]   Type is Contig, Score, Evalue, etc.
Min hits per ctg:  [enter number]
Max ctgs hit:      [enter number]
Max total hits:    [enter number]
Ctg ends only:     FromEnd [enter number]                   FromEnd is as on Analysis windows.
- View Keyset of hit clones - brings up an FPC keyset window of the clones listed in the Hits table.
You can open a contig, pull-down Highlight, then select Keyset; all the clones in the keyset that
are in the contig will be highlighted. In this way, you can see the distribution of your
hits.
- Add hits to FPC - brings up a window with the following options:
Add as Marker
Add as Remarks
Add as FP remarks
Only the sequences that are in the Hit Table are added, i.e. you can filter out sequences
you do not want first.
The sequence name is added with a prefix which you can specify (the default is "BSS:"). This allows you
to easily locate the new markers or remarks using the FPC search functions. If added as a Marker, the
type is eMrk.
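The Filter Hits options can be thought of as successive passes over the Hit table, each one working on the output of the previous one. The sketch below illustrates two of them (a Numeric filter and Max ctgs hit); it is not FPC's implementation. Representing hits as dictionaries keyed by column name, the function names, and not counting singletons (Ctg0) as a contig are all assumptions.

```python
from collections import defaultdict

# The three comparison operators offered by the Numeric filter.
_OPS = {
    "<": lambda a, b: a < b,
    "=": lambda a, b: a == b,
    ">": lambda a, b: a > b,
}


def filter_numeric(hits, column, op, value):
    """Keep hits whose numeric column compares true against value."""
    compare = _OPS[op]
    return [hit for hit in hits if compare(hit[column], value)]


def filter_max_ctgs_hit(hits, max_ctgs):
    """Keep hits of queries that hit at most max_ctgs distinct contigs."""
    contigs = defaultdict(set)
    for hit in hits:
        if hit["Contig"] != 0:  # assumption: singletons are not contigs
            contigs[hit["Query"]].add(hit["Contig"])
    return [hit for hit in hits if len(contigs[hit["Query"]]) <= max_ctgs]


# Hypothetical hits: q1 hits one contig, q2 hits three.
hits = [
    {"Query": "q1", "Contig": 1, "Match": 98},
    {"Query": "q1", "Contig": 1, "Match": 60},
    {"Query": "q2", "Contig": 1, "Match": 99},
    {"Query": "q2", "Contig": 2, "Match": 97},
    {"Query": "q2", "Contig": 3, "Match": 96},
]
step1 = filter_max_ctgs_hit(hits, 2)             # drops all q2 hits
step2 = filter_numeric(step1, "Match", ">", 94)  # keeps Match of 95 or more
```

Here step2 contains only the q1 hit with Match 98, which is the kind of hit one would then add to FPC as a marker.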
The Columns menu lists all the columns. Selecting any of them toggles it
on or off.
Note that the "SeqCtg" column is hidden by default. Toggling this column on
allows you to see and sort on the sequence contig numbers. These may be derived
from the fasta names, as previously discussed, or (if the names
don't follow the numbering convention), they just give the order of the sequence in its
fasta file.
1.9. Summary of Usages
Query | Database | Function | Options
Markers or WGS contigs [1] | one or more BES files | Locate | Filter [2], Add as marker or remarks
Markers or WGS contigs | directory of sequenced clones | Locate | Filter [2], Add as marker or remarks
Sequenced clone from end of FPC contig | BES | Find neighbor contig | Filter: Ctg ends only
Sequenced clone in FPC contig | BES | Find next clone for sequencing | Filter: Numeric {Contig}
WGS contigs | BES or Sequenced clones | Use with FPC MTP [3] |

[1] WGS files can be very big, so the results can be very big and
hard to view all at once. Use the "Split by contig" option for these, though
if you have many contigs, there will be many files...
[2] Many of the filters are for markers so as to only add the 'good' hits.
For example, you may only want to add markers that do not hit more than 2 contigs
(Max ctgs hit).
Or you may filter on the 'Match' to make sure that at least 95% of the query matches
the target. You may apply multiple filters as each filter is applied to the results
of the previous filter.
[3] FPC V9 (coming soon) will have a sequence track, in which this file can be added.