BSS Manual  

BSS Tutorial

Follow along the demonstration below to gain a firm understanding of the BSS function.
Actions to be taken by the reader are highlighted in red.

2.1. Searching

Make sure you are in the "bssdemo/demo" directory, and then launch FPC on the bssdemo file by running "fpc bssdemo.fpc". Click the "BSS" button on the main FPC window, and you will see the BSS window:

Our goal in this search will be to anchor some EST markers to the FPC map electronically, by aligning their sequences to BES. Hence our "Query" sequences are the marker sequences, and our "Database" sequences are the BES sequences. Click the Browse button on the very top (Query) line, and the file/directory browse window pops up. Double-click on the "Mark" directory to select it, and then click "OK". The browse window closes and the Query entry field reads "./Mark/*", indicating that all files in the Mark directory will be used. (In the browse window we could also have selected one particular file, and it is also possible to type the file path directly into the Query entry field.)

Next, select the BES database by repeating the previous three steps for the Database entry field: click "Browse", double-click "BES", click "OK".

"Search tool" is set to MegaBLAST, and we will leave it this way, although in fact Blat would be a more appropriate tool for this search, since Blat alignments can bridge introns. We will see that MegaBLAST finds two alignments for some of the queries, which really are a single alignment divided by an intron.

The "Search parameters" field sets parameters which are passed to the alignment program. The most important for MegaBLAST is the E-value, which has its own entry window. Any other parameters you wish to set should be entered into the "Other" window. We will not change these settings. For a complete list of the options available to the selected program, click the "Options" button. This causes the program to print a list of its options to the terminal window (the window from which you launched FPC).

Finally, we will choose a subdirectory to store these search results and keep them separate from other searches we may perform later. Enter "Mark_to_BES" in the Subdirectory field of the "BSS output options" section.

Now we are ready to search! Your BSS window should now look like this:

Click the "Start search" button and the search is executed. While the search is proceeding (a very short time in this case), the BSS and other FPC windows are not usable.

As a search is executed, various outputs are printed to the terminal window. For our search, they are as follows:

Collect sequence lengths from file:BES/lib2.bes
Matched target sequence OSJNBa0002A22r to clone a0002A22
Prefix length:5
45 sequences (out of 45) could be matched to a clone

Collect sequence lengths from file:BES/lib1.bes
24 sequences (out of 24) could be matched to a clone
Collect sequence lengths from file:Mark/j_ests.mrk
execute:mkdir BES/formatdb
Formatting BLAST databases
lib2.bes needs to be formatted
formatdb -i BES/lib2.bes -p F -o T -n BES/formatdb/lib2.bes
execute:formatdb -i BES/lib2.bes -p F -o T -n BES/formatdb/lib2.bes
lib1.bes needs to be formatted
formatdb -i BES/lib1.bes -p F -o T -n BES/formatdb/lib1.bes
execute:formatdb -i BES/lib1.bes -p F -o T -n BES/formatdb/lib1.bes
execute:megablast -V F -d BES/formatdb/bss -i Mark/j_ests.mrk -o ./BSS_results/Mark_to_BES/j_ests.BES.bss.megablast -e 1e-100 -F F -D 3
parsing ./BSS_results/Mark_to_BES/j_ests.BES.bss.megablast
write ./BSS_results/Mark_to_BES/j_ests.BES.bss
Collect sequence lengths from file:Mark/00_ests.mrk
Formatting BLAST databases
execute:megablast -V F -d BES/formatdb/bss -i Mark/00_ests.mrk -o ./BSS_results/Mark_to_BES/00_ests.BES.bss.megablast -e 1e-100 -F F -D 3
parsing ./BSS_results/Mark_to_BES/00_ests.BES.bss.megablast
write ./BSS_results/Mark_to_BES/00_ests.BES.bss
Number of files processed: 2; Total retained hits:12
On average, each file had 6 hits.




The output illustrates three steps which are common to all BSS searches:

Collect sequence lengths and names
All sequence files in the query and database set are examined and checked for duplicate sequence names. All of the search programs can have problems when names are duplicated, so to prevent this BSS will not proceed if duplicate names exist.

Match sequences to clones
The purpose of BSS searches is to locate your query sequences on the FPC map. To accomplish this, BSS matches sequence names to FPC clone names. When it succeeds, you will see output as above:

Matched target sequence OSJNBa0002A22r to clone a0002A22
Prefix length:5
45 sequences (out of 45) could be matched to a clone
This indicates that BES database sequence OSJNBa0002A22r was matched to FPC clone a0002A22, and that a prefix of five characters ("OSJNB") was found on the BES sequence name, as compared to the clone name. The BSS program uses these prefix/suffix lengths to speed up future matching.
In this case, all 45 BES sequences could be matched to a clone. Usually some do not match, since BES sequencing can succeed while fingerprinting reactions fail on a given clone. If fewer than 50% of the sequences can be matched, BSS will provide an alert.

Important: For the BES name matching to succeed, no clone name may be contained in another clone name, e.g. C01 and C010A. In this case, BSS cannot tell whether a BES such as C010A.r belongs with clone C01 (using a 4-character suffix) or clone C010A (using a 2-character suffix). It will choose one at random, which may be incorrect.

Parse the alignment results
BSS reads the output files from the search program and converts the results to a "BSS file", which is easier to read and contains extra information such as the contig where the match was located. These files are stored in the BSS_results directory, or in the subdirectory which was specified (in our case, the "Mark_to_BES" subdirectory).
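The clone-matching step described above can be illustrated with a minimal Python sketch. It is not the BSS implementation; the function name match_to_clone and the simple substring search are assumptions made for illustration. It reproduces the prefix length reported in the terminal output and shows why overlapping clone names are ambiguous.

```python
def match_to_clone(seq_name, clone_names):
    """Return (clone, prefix_len, suffix_len) for every clone name found inside seq_name."""
    matches = []
    for clone in clone_names:
        pos = seq_name.find(clone)
        if pos != -1:
            prefix_len = pos
            suffix_len = len(seq_name) - pos - len(clone)
            matches.append((clone, prefix_len, suffix_len))
    return matches


# "a0002A23" is a made-up second clone name, included only to show a non-match.
hits = match_to_clone("OSJNBa0002A22r", ["a0002A22", "a0002A23"])
print(hits)  # [('a0002A22', 5, 1)]

# Overlapping clone names: both clones match, so the assignment is ambiguous.
ambiguous = match_to_clone("C010A.r", ["C01", "C010A"])
print(ambiguous)  # [('C01', 0, 4), ('C010A', 0, 2)]
```

A sequence name that yields more than one match, as in the second call, is exactly the case the "Important" note warns about.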

One additional operation is done only for BLAST or MegaBLAST:
Format the BLAST databases
BLAST and MegaBLAST require the database sequences to be formatted beforehand with a program called "formatdb". BSS checks for the formatted databases, and if they are not found or are out of date, it runs formatdb to create them. Blat does not require this step.
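As a rough sketch of the formatting check, the Python below rebuilds the formatdb command line shown in the terminal output and decides whether a database needs (re)formatting. The freshness test is an assumption, not the documented BSS logic: it compares file modification times and looks for the ".nsq" file that formatdb writes for nucleotide databases. The function names are hypothetical.

```python
import os


def formatdb_command(fasta_path, db_base):
    """Build the formatdb argument list as it appears in the tutorial's terminal output."""
    return ["formatdb", "-i", fasta_path, "-p", "F", "-o", "T", "-n", db_base]


def needs_formatting(fasta_path, db_base):
    """Assumed rule: format if the .nsq file is missing or older than the FASTA file."""
    seq_file = db_base + ".nsq"
    if not os.path.exists(seq_file):
        return True
    return os.path.getmtime(seq_file) < os.path.getmtime(fasta_path)


cmd = formatdb_command("BES/lib2.bes", "BES/formatdb/lib2.bes")
print(" ".join(cmd))
```

The command is only printed here; running it would require formatdb to be installed and the BES directory to exist.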

The search step is now complete. In the next section, we will learn how to work with the results.

2.2. Filtering and adding to FPC

The "BSS Results" section at the bottom of the BSS window should now list the subdirectory "Mark_to_BES" which we chose for the results:

Double-clicking on this directory causes its contents to be listed, namely the two BSS output files resulting from our search:

There are two output files, one for each of the marker query files in the "Mark" directory. (BSS always generates a separate output file for each input query file). Clicking on a file selects it, and it may then be deleted using the "File" menu at the top of the window. Double-clicking on a file opens it into its own display. Note that you can widen the "Output file" column to see the whole filenames.

In our search, the first file, named "00_ests.BES.bss", has no hits, while the second, "j_ests.BES.bss", has 12. Double-click on "j_ests.BES.bss", and you should see a new window, the Results window for this search:

We see from the Query Hits summary table (upper right, top table) that each marker had four hits, and that the hits for each marker fell in exactly one contig (second column). The best contig for each was contig 1, and each had four hits to contig 1 (third column).

From the Contig Hits summary table (upper right, second table), we see that contig 1 received 12 hits from our search. We can easily visualize these hits within FPC displays, as follows:
First, we open an FPC "keyset" of the clones which had hits (see the FPC documentation for more information on keysets). To open the keyset, choose "View keyset of hit clones" from the Analysis menu. You should see an FPC window containing 3 clones.
Next, click on the contig 1 line (the only line) of the Contig Hits summary table. The FPC contig display of contig 1 will open.
Finally, on the FPC contig display, choose "Select Keyset" from the Highlight menu. The three clones which had hits in our search are highlighted. Your contig display should look as follows:

Now close the contig display and keyset and we will explore the analysis tools available through the BSS Results view.

The simplest tool is sorting. Clicking on any column header of any one of the three tables causes the rows of the table to be sorted according to that column. For example, click on the Score column of the Hits table; the hit entries are now sorted by score, lowest to highest. Probably we would prefer to sort them highest-to-lowest, and this is accomplished by holding down the shift key and clicking on the Score column. A shift-click always sorts in the reverse order to a click. The sorting is alphabetic or numeric, depending on the type of data in the column.

Multiple sorts are accomplished by simply clicking several columns in succession. For example, now shift-click the Score column and then click the Query column. Now the hits are ordered by query name, and secondarily by score, so for each query the hits are in high-to-low order of score.
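The ordering described above is what a stable sort produces when applied once per click. The sketch below illustrates this in Python; whether BSS is implemented this way is an assumption, and the query names and scores are made-up example data.

```python
# Made-up hits; only the column names mirror the Hits table.
hits = [
    {"query": "Q2", "score": 300},
    {"query": "Q1", "score": 500},
    {"query": "Q2", "score": 700},
    {"query": "Q1", "score": 650},
]


def click_sort(rows, column, reverse=False):
    """One click on a column header; reverse=True models a shift-click."""
    # Python's sorted() is stable, so rows that tie keep their previous order.
    return sorted(rows, key=lambda row: row[column], reverse=reverse)


rows = click_sort(hits, "score", reverse=True)  # shift-click Score
rows = click_sort(rows, "query")                # click Query

for row in rows:
    print(row["query"], row["score"])
# Q1 650
# Q1 500
# Q2 700
# Q2 300
```

Because the second sort keeps ties in their existing order, the earlier Score sort survives within each query.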

In addition to sorting columns, columns may also be hidden using the Columns menu at the top of the BSS window. The menu contains a checkbox for each column, allowing the column to be shown or hidden. If you choose "Save for spreadsheet" from the File menu, then hidden columns are not saved to the spreadsheet table; however, if you save the BSS file and reopen it, hidden columns are restored. Hiding columns does not remove any data permanently from the BSS File.

Next we will look at the filtering tool. Select "Filter hits" from the Analysis menu, and the Filter dialog will open:

The dialog contains six different types of filter, which you can apply one after another and then undo if desired. To select a filter type, press the button to the left of its line. We will filter first by sequence name, so select the "String" filter option and then choose "Query" on its drop-down menu of column choices. Enter "J03303*" in the String text box. This signifies to keep only hits for which the Query column value starts with "J03303". (If we leave off the "*", then it will keep only hits with exact name J03303). Your Filter dialog should now look like:

Press the "Apply Filter" button, and the Hits table in the Results window changes to show only the four hits for Sequence J033030D20. Also, the Filter history table at the bottom of the Filter dialog acquires an entry showing the filter which was applied.
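The effect of the trailing "*" can be sketched with Python's fnmatch module. This is an illustration only: BSS's own pattern rules may differ (fnmatch also gives "?" and "[...]" special meaning), and the scores and the second query name are made up.

```python
from fnmatch import fnmatchcase


def string_filter(hits, column, pattern):
    """Keep only hits whose value in `column` matches the wildcard pattern."""
    return [hit for hit in hits if fnmatchcase(hit[column], pattern)]


hits = [
    {"Query": "J033030D20", "Score": 650},  # scores are illustrative
    {"Query": "J033030D20", "Score": 480},
    {"Query": "J013000A01", "Score": 700},  # hypothetical second marker
]

with_star = string_filter(hits, "Query", "J03303*")
exact_only = string_filter(hits, "Query", "J03303")

print(len(with_star))   # 2
print(len(exact_only))  # 0
```

Without the "*", the pattern must equal the whole name, so no hit is kept in this example.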

Press the "Undo Last Filter" button, to undo the last filter; now the Results window shows all the hits once again. Press the "Apply Filter" button to reapply the filter.

Now we will add another filter, to find only the hits which are to the end of a contig (this would be useful e.g. for finding evidence for contig merges). Select the "Ctg ends only" option on the Filter dialog, and press "Apply Filter". Now the Hits table shows only two hits, and they are both to clone b0068E10 on the left end of contig 1 (indicated by the "1L" in the Contig column). Note that the "Ctg ends" filter has a numeric parameter "FromEnd"; this tells how close to the end of the contig, in CB units, the hit clone must be to qualify as being at an end (see FPC documentation for explanation of CB units).

These two hits also illustrate how MegaBLAST (and BLAST) divide hits which span introns. Looking at the Targ_start and Targ_end columns (the last two columns), we see that the first hit ends at basepair 406 on the BES, while the second hit starts at 547. Given that the query is an EST sequence, the gap of 141 basepairs is very likely to be an intron. Using Blat, these two hits would have been reported as one, with a value of 141 in the "Intron" column.
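The gap arithmetic can be written out as a small Python sketch. Only the two coordinates quoted above are used; the dictionary keys and the function name are assumptions chosen for illustration.

```python
def target_gap(first_hit, second_hit):
    """Distance on the target between the end of one hit and the start of the next."""
    return second_hit["targ_start"] - first_hit["targ_end"]


first = {"targ_end": 406}     # end of the first MegaBLAST hit on the BES
second = {"targ_start": 547}  # start of the second MegaBLAST hit on the BES

gap = target_gap(first, second)
print(gap)  # 141
```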

Press the "Undo Last Filter" button to undo the "Ctg ends" filter. The Hits table now shows four hits once again. Let us find all hits with score greater than 600. Select the "Numeric" filter option and then choose "Score" on its first drop-down menu (of column choices). Choose ">" in the second drop-down menu (of comparison options). Enter "600" in the text box. Your Filter dialog should now look like:

Press the "Apply Filter" button. Now the Hits table shows two hits, namely the hits of J033030D20 having score greater than 600. Both filters we applied are listed in the filter history list on the Filter dialog, and we could undo them by pressing "Undo Last Filter" twice. However, we will instead add these hits to the FPC map.
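The apply/undo behaviour of the filter history can be modelled as a stack, as in the Python sketch below. The class FilterStack is hypothetical and does not reflect how BSS stores its filters; the hit data is made up, apart from the marker name J033030D20.

```python
class FilterStack:
    """Keeps every intermediate result so the last filter can be undone."""

    def __init__(self, hits):
        self._history = [("all hits", list(hits))]

    def apply(self, label, predicate):
        current = self._history[-1][1]
        self._history.append((label, [h for h in current if predicate(h)]))

    def undo_last(self):
        # Never pop the unfiltered starting state.
        if len(self._history) > 1:
            self._history.pop()

    @property
    def hits(self):
        return self._history[-1][1]

    @property
    def labels(self):
        return [label for label, _ in self._history[1:]]


# Illustrative scores; "J013000A01" is a made-up second marker.
all_hits = [
    {"Query": "J033030D20", "Score": 812},
    {"Query": "J033030D20", "Score": 655},
    {"Query": "J033030D20", "Score": 402},
    {"Query": "J033030D20", "Score": 377},
    {"Query": "J013000A01", "Score": 900},
    {"Query": "J013000A01", "Score": 310},
]

stack = FilterStack(all_hits)
stack.apply("Query = J03303*", lambda h: h["Query"].startswith("J03303"))
stack.apply("Score > 600", lambda h: h["Score"] > 600)
print(stack.labels, len(stack.hits))  # ['Query = J03303*', 'Score > 600'] 2

stack.undo_last()
stack.undo_last()
print(len(stack.hits))  # 6
```

Each filter works on the result of the previous one, which is why undoing twice restores the full Hits table.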

Adding hits to the FPC map as markers, remarks, or "FP remarks" allows us to store and visualize the results of BSS alignments. The different categories have different purposes, as follows:

Markers
A marker generally indicates a gene or other genomic sequence property (such as an SSR) which can be identified and genetically mapped. Markers within FPC have a number of defined types; those added through BSS have type "eMRK".

Remarks
Remarks are used for any other kind of annotation which one wants to be visible to every user of the FPC map.

FP Remarks
FP Remarks are remarks intended for a smaller audience, e.g. remarks concerning the assembly of the FPC map. Typically these remarks are not shown by default (although all users can see them if they wish).

Choose the "Add Hits to FPC" entry from the Analysis menu. The Add Hits dialog comes up:

The "Prefix" entry allows you to choose a prefix to be added to the sequence name. The two together become the marker name, remark, or fp remark. This allows you to locate the added markers or remarks at a later time using the FPC search functions.

We will use the default Prefix and the Markers category, so simply press the "Add" button. Now bring up the contig 1 FPC display again, and the new marker is visible:

 

Email Comments To: fpc@agcol.arizona.edu