BSS Manual  

1.1. Overview

BSS (Blast Some Sequence) organizes sequence searches against target sequences that are located on an FPC map. The target sequences can be BAC-end sequences (BES) or sequenced clones that are associated with clones in the FPC map. The queries can be arbitrary sequences. The results of the searches can be added to the FPC map as either electronic markers or remarks on the clones whose associated sequences were hit by a query. The search may be filtered based on various attributes, such as a given contig, clones at the end of contigs, or by scores.

There are many uses for these capabilities. Adding electronic marker hits can either confirm existing placements or discover new ones, which helps to anchor contigs. Other information, such as hits from a repeat database or whole genome shotgun (WGS) contigs, can be placed on the map. Also, BSS searches can aid in merging contigs, finding a minimal tiling path, and finding the next clone for sequencing.

BSS can perform searches using BLAST, MegaBLAST, or BLAT. In all cases, BSS parses the resulting output file and generates a new, more readable "BSS file" that summarizes the search results.

We provide both this manual and a tutorial for BSS; both are short and should be read carefully. This manual refers to the bssdemo, so it is worth downloading it so that you can follow the manual while viewing the FPC BSS windows on your screen.

Summary of the BSS windows: The "BSS window" is launched from the FPC main window and lets you specify your input files and parameters. At the bottom of this window is a label "BSS results" with a scrollable text panel listing all your BSS output files (we will refer to this as the "output file panel"). Clicking a file in the output file panel opens the results data in a new window, the "BSS Results window", which shows all hits. From the BSS Results window, you can filter your hits, save them, or add them to the FPC project.

1.2. Getting the executable and bssdemo files

You can get the BSS demo files here. To untar, type
 tar xvf bssdemoV2.tar 
This will create a directory called bssdemo that contains the bssdemo files.

FPC V8.9 (or later), which contains the new BSS, may be downloaded from http://www.agcol.arizona.edu/software/fpc/. Follow FPC's instructions for installation.

In order to use BSS, at least one of the search programs must be installed. BLAST is available from ftp://ncbi.nlm.nih.gov/ by navigating to the blast/executables directory. The executable "blastall" needs to be in your path. If you are unsure about this, ask your system administrator to install it. MegaBLAST is included with BLAST and does not need a separate installation.

BLAT is available from http://www.cse.ucsc.edu/~kent. The executable is called "blat" and must be in your path for BSS to find it. Again, ask your system administrator for help if you cannot run BLAT.
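
Before starting FPC, it can be useful to confirm that the search programs are actually on your path. The following is a minimal Python sketch of such a check; it is not part of BSS, and the function name find_search_programs is hypothetical. Only the executable names "blastall" and "blat" come from the text above.

```python
import shutil

# Executable names taken from the manual text; BSS expects them on the PATH.
SEARCH_PROGRAMS = {"BLAST": "blastall", "BLAT": "blat"}


def find_search_programs(programs=SEARCH_PROGRAMS):
    """Map each program label to the full path of its executable, or None."""
    return {label: shutil.which(exe) for label, exe in programs.items()}


if __name__ == "__main__":
    for label, path in find_search_programs().items():
        print(f"{label}: {path if path else 'not found on PATH'}")
```

If a program is reported as not found, ask your system administrator to install it or to add its directory to your path.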

To run the demo, change to the demo directory and start FPC:

cd bssdemo/demo
fpc bssdemo
Then select the BSS button on the main window, and the "BSS window" will be shown.

1.3. Input search files

Two sets of sequence files must be specified for each search: a query set and a database set. The query files are searched for matches to the database files, while the database sequences are located on the FPC map by matching their names with the FPC clone names. All files must be in fasta format, i.e. one or more sequences where each sequence has a description line, which is a ">" followed by the sequence name. For example,
>demo.1
aaaccctgct
cctgctctcc
>demo.2
cctgcatg
Here demo.1 and demo.2 are the sequence names. The sequences can be any length, but each line must contain fewer than 1000 bases.
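
The format rules above can be checked before running a search. The following Python sketch is illustrative only and is not how BSS itself reads files; the function name read_fasta_names is hypothetical, and taking the first whitespace-delimited word after ">" as the sequence name is an assumption.

```python
def read_fasta_names(lines, max_line_len=1000):
    """Return the sequence names found in fasta-formatted lines.

    Raises ValueError if sequence data precedes the first description
    line, if a description line has no name, or if a sequence line has
    max_line_len or more characters.
    """
    names = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith(">"):
            # Assumption: the name is the first word after ">".
            fields = line[1:].split()
            if not fields:
                raise ValueError(f"line {number}: description line has no name")
            names.append(fields[0])
        elif line:
            if not names:
                raise ValueError(f"line {number}: sequence before first '>' line")
            if len(line) >= max_line_len:
                raise ValueError(f"line {number}: {len(line)} bases on one line")
    return names
```

For the example file shown above, this returns the two names demo.1 and demo.2.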

Four types of sequence are commonly used with BSS:

  1. Markers
     These are sequences with biological meaning which one wishes to locate on the FPC map. Sometimes they are the sequences of markers which have already been assigned to some clones experimentally.

  2. Draft Sequence
     A common sequencing strategy is to use whole-genome draft sequence in conjunction with an FPC map. BSS may then be used to align the draft sequence to the map to identify assembly problems. An additional "sequence track" feature will be added to FPC v9.0 to enhance this functionality.

  3. Sequenced Clone
     A fasta file containing one sequenced clone, possibly having more than one sequence contig. If you would like BSS to recognize the sequence contig numbers, then the sequence names must have one of two formats, either "clone_name.N" or "clone_name.ContigN". For example, a sequence file for clone a0089K24 may have sequence contigs named a0089K24.1 and a0089K24.2. The sequence contigs must also appear in the file in the same order as their numbering, i.e. a0089K24.1 comes before a0089K24.2, etc. (there can be gaps in the numbering, however).

     If the sequenced clone is a clone in the FPC map, then it may be used as the database for a BSS search, provided the sequence name matches the FPC clone name. For example, if your clone is called a0089K24, then your sequence name should be a0089K24 (or a0089K24.1, etc. if it has sequence contigs). More precisely, the sequence name, with the .N or .ContigN suffix removed, must match the beginning of the FPC clone name. Hence, a0089K24.1 matches clone a0089K24sd1, which is important for users of our FSD helper application; however, a0089K24.1 would not match FPC clone ZMMBa0089K24.

  4. BES
     A file of BESs that are associated with clones in the FPC file, to be used as the database for a BSS search. Again, this requires that the sequence names in the BES file match the clone names in the FPC project; otherwise, BSS does not know where the BES belongs. Specifically, the FPC clone name must be contained in the BES name. For example, clone a0089K24 could match BES ZMMBa0089K24.r and ZMMBa0089K24.f. You may use any prefixes and suffixes on the BES names; however, BSS will operate much faster, and print fewer warning messages, if the prefix and suffix always have the same length (which is good informatics practice anyway).
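
The two name-matching rules described above (for sequenced clones and for BES) can be sketched in Python as follows. This illustrates the rules as stated in the text and is not the BSS source code; the function names are hypothetical, and case-sensitive comparison is an assumption.

```python
import re

# Suffix forms described in the text: "clone_name.N" or "clone_name.ContigN".
_CONTIG_SUFFIX = re.compile(r"\.(?:Contig)?(\d+)$")


def split_sequence_name(seq_name):
    """Return (base_name, contig_number), with None if there is no suffix."""
    found = _CONTIG_SUFFIX.search(seq_name)
    if found:
        return seq_name[:found.start()], int(found.group(1))
    return seq_name, None


def sequenced_clone_matches(seq_name, fpc_clone):
    """Sequenced clone rule: the name without its contig suffix must
    match the beginning of the FPC clone name."""
    base, _ = split_sequence_name(seq_name)
    return fpc_clone.startswith(base)


def bes_matches(bes_name, fpc_clone):
    """BES rule: the FPC clone name must be contained in the BES name."""
    return fpc_clone in bes_name
```

With the names used in the text, sequenced_clone_matches("a0089K24.1", "a0089K24sd1") is true, sequenced_clone_matches("a0089K24.1", "ZMMBa0089K24") is false, and bes_matches("ZMMBa0089K24.r", "a0089K24") is true.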

To recap, either Sequenced Clones or BES may be used for the database, as long as the sequence names match the FPC clone names. Also, one or more query files can be searched against one or more database files. If using more than one database file, they must all be in the same directory. When using a directory of multiple database files, the results are grouped together; consequently, each sequence name across all database files must be unique.
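
Because sequence names must be unique across all database files in a directory, it may help to check for duplicates before searching. The Python sketch below is one way to do this; it is not part of BSS, the function name duplicate_sequence_names is hypothetical, and it assumes that the sequence name is the first word after ">" and that every regular file in the directory is a fasta file.

```python
from collections import defaultdict
from pathlib import Path


def duplicate_sequence_names(directory):
    """Map each repeated sequence name to the files in which it occurs.

    Each occurrence contributes one entry, so a name repeated inside a
    single file is listed with that file twice.
    """
    seen = defaultdict(list)
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue
        with path.open() as handle:
            for line in handle:
                if line.startswith(">"):
                    fields = line[1:].split()
                    if fields:
                        seen[fields[0]].append(path.name)
    return {name: files for name, files in seen.items() if len(files) > 1}
```

An empty result means that every sequence name in the directory is unique.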

It is recommended that each fasta description line have one simple name. If a file has a GenBank header, BSS will use the accession number, except in the case of MegaBLAST, because MegaBLAST parses these headers specially. A pop-up window will inform you which field is to be used and ask if this is acceptable.

1.4. Directory Setup

The sequence files for the BSS can be stored in any directory on the computer, but it is probably easiest to copy the configuration used in this tutorial. The tutorial bssdemoV2 has the files:
./bssdemo.fpc
./BES/
      lib1.bes
      lib2.bes
./Seq/
      AP001551.seq
      a0089K24.seq
./Mark/
      00.est
      j.est

1.5. Naming the BSS result files

FPC creates a sub-directory under the FPC project directory called "BSS_results". All result files are written into this directory, and all files in this directory are listed in the output file panel.

The naming convention is as follows: (1) Suffixes are removed, that is, a period and anything following is removed. (2) The query name, database name, and suffix ".bss" are concatenated together. For example:

        QUERY           DATABASE        RESULT FILE(s)
        00_ests.mrk     a0089K24.seq    00_ests.a0089K24.bss
        00_ests.mrk     /Seq/*          00_ests.Seq.bss
        /Mark/*         /Seq/*          00_ests.Seq.bss, j_ests.Seq.bss
Everything under "BSS_results" is managed by the BSS software. To delete one or more files in BSS_results, use the File Delete option on the "BSS window", where you can delete either all files in BSS_results or only the selected one.
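
The naming convention above can be sketched in Python as follows. This is an illustration and not the BSS implementation; the function names are hypothetical, and two points are assumptions: that the suffix starts at the first period in a name, and that a directory is written as "dir/*" as in the table.

```python
from pathlib import PurePosixPath


def strip_suffix(name):
    """Remove the first period and everything after it (an assumption)."""
    return name.split(".", 1)[0]


def source_label(path):
    """Label for a file, or for a directory written as 'dir/*'."""
    p = PurePosixPath(path)
    if p.name == "*":
        return strip_suffix(p.parent.name)
    return strip_suffix(p.name)


def result_file_name(query_file, database):
    """Concatenate the query label, the database label, and '.bss'."""
    return f"{source_label(query_file)}.{source_label(database)}.bss"
```

For example, result_file_name("00_ests.mrk", "/Seq/*") gives 00_ests.Seq.bss. When the query is a directory, the function would be called once per query file, which yields one result file per query file as in the last row of the table.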

1.6. Running BSS

1. Select your query file or directory.

2. Select your database file or directory.

3. Select BLAST, MegaBLAST, or BLAT.

4. Change the Search parameters if desired. Note that BLAST requires an E-value whereas BLAT requires a score.

5. Output options:

  1. Enter BSS subdirectory if desired. The subdirectory will be created and your result files put in that subdirectory. Once the search is complete, the subdirectory will be listed in the output file panel. Double-click the subdirectory name, and all files in the subdirectory will be shown.
  2. Select 'Split BSS output by contig' if desired; this creates a sub-directory of BSS files, one for each contig. Note: you can also filter a BSS file by contig, which is generally preferred. You would only want the split-contig option if your BLAST results are too large to handle in one BSS output file.

6. Start Search. Your output files will be shown in the BSS results panel. Double-click one to view it.

1.7. BSS Results Window

The BSS Results window has three tables. The following description uses results from the bssdemo, with the query file Mark/00.est against the database /Seq.

Query Hit summary table (first table in upper right corner):

Sequence	Hits/#Ctgs	Best Ctg/#Hits
001-132-E04	2/0		0/2
002-101-F05	2/1		1/2
002-147-C01	3/2		1/2
All sequence names in the file are listed. The Hits/#Ctgs gives the total number of hits for a sequence and the total number of contigs that contain a clone that it hit. The Best Ctg is the one with the most hits, followed by the number of hits. In this example, the first sequence had 2 hits that were all to singletons (Ctg0), the second had 2 hits to Ctg1, and the third had 3 hits to 2 contigs, where the best was Ctg1 with 2 hits.
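
The summary columns can be reproduced from a plain list of hits. The Python sketch below shows one reading of the description above and is not the BSS code; the function name query_hit_summary is hypothetical. It assumes that Ctg0 (singletons) is not counted in #Ctgs, as the first example row suggests, and that ties for the best contig go to the lower contig number.

```python
from collections import Counter, defaultdict


def query_hit_summary(hits):
    """Summarize hits given as (query_name, contig_number) pairs.

    Returns {query: (total_hits, n_contigs, best_ctg, best_ctg_hits)}.
    """
    per_query = defaultdict(Counter)
    for query, contig in hits:
        per_query[query][contig] += 1

    summary = {}
    for query, counts in per_query.items():
        total = sum(counts.values())
        # Ctg0 holds singletons, so it is not counted as a contig.
        n_ctgs = sum(1 for ctg in counts if ctg != 0)
        # Most hits wins; ties go to the lower contig number (assumption).
        best_ctg, best_hits = max(counts.items(), key=lambda kv: (kv[1], -kv[0]))
        summary[query] = (total, n_ctgs, best_ctg, best_hits)
    return summary
```

Given two hits to Ctg0 for 001-132-E04, the function returns (2, 0, 0, 2), which corresponds to the table entries 2/0 and 0/2.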

Contig Hit summary table (second table in upper right corner):

Contig  CloneHits
1       5
3       2
In this example, Ctg1 had a total of 5 clones that were hit by a sequence, and Ctg3 had 2 clones that were hit. Double-click a contig name in this table to display the contig.

Hit table (large table at the bottom of the window):

Target    RC  Clone     Contig  Query        Score  EValue  Identity  Match  Query_len  etc.
a0089K24  y   a0089K24  3       001-132-E04  751    0.0     95%       14%    3296
etc.
All hits are shown. The columns are as follows:
  1. The target (database) sequence name
  2. Whether the hit was reverse complemented. This is useful when manually selecting the next clone for sequencing.
  3. The FPC clone name.
  4. The contig for the FPC clone.
  5. The query sequence name.
  6. The next 3 fields are the BLAST Score, EValue, and Identity. For BLAT alignments, EValue is replaced by Intron, which gives the largest gap in the target site of the alignment (since BLAT joins together multiple BLAST HSPs).
  7. The Match field is the percentage of the query length which was matched; this is useful information in screening for complete alignments of a query.
  8. The remaining fields are the Query length, start and end, and the Target (Database) length, start and end.
Double-click an entry in the table and the alignment will be printed to the terminal window from which you launched FPC.
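
As a rough illustration of the Match field, the Python sketch below computes the percentage of the query covered by an alignment from the Query start, end, and length columns. The exact formula BSS uses is not given here, so treating the matched length as the span from start to end is an assumption, and the function name match_percent is hypothetical.

```python
def match_percent(query_start, query_end, query_len):
    """Percentage of the query length covered by the aligned span.

    Assumes 1-based inclusive coordinates; start and end may be given
    in either order.
    """
    if query_len <= 0:
        raise ValueError("query_len must be positive")
    low, high = sorted((query_start, query_end))
    return 100.0 * (high - low + 1) / query_len
```

For example, an alignment covering query positions 1 to 100 of a 1000-base query gives 10.0.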

1.8. Save, Filter, Add to FPC

The BSS Results window has three menus:

The File menu has:

  • Save BSS - you can filter the hits (see Analysis) and then save the results as a BSS file to view later. It will overwrite the current file. Changes to the Columns (see Columns) are NOT saved.

  • Save BSS as - same as above but you supply the file name.

  • Save for spreadsheet - save the Hit table to a file. If you change the Hit table, what is shown is what is saved, i.e. any filtering of hits or removal of columns is reflected in the saved file.

The Analysis menu has:

  • Filter Hits - brings up a window of attributes you can filter on. You can apply multiple filters in succession, and undo them (see Tutorial). The filters are:
    String: {Select type} [enter substring]           Type is Query, Target, RC, or Clone
    Numeric: {Select type} {<, =, >} [enter number]   Type is Contig, Score, EValue, etc.
    Min hits per ctg: [enter number]
    Max ctgs hit: [enter number]
    Max total hits: [enter number]
    Ctg ends only: FromEnd [enter number]             FromEnd is as on the Analysis windows.
    
  • View Keyset of hit clones - brings up an FPC keyset window of the clones listed in the Hits table. You can open a contig, pull-down Highlight, then select Keyset; all the clones in the keyset that are in the contig will be highlighted. In this way, you can see the distribution of your hits.

  • Add hits to FPC - brings up a window with the following options:
         Add as Marker
         Add as Remarks
         Add as FP remarks
    
    Only the sequences that are in the Hit Table are added, i.e. you can filter out sequences you do not want first. The sequence name is added with a prefix which you can specify (the default is "BSS:"). This allows you to easily locate the new markers or remarks using the FPC search functions. If added as a Marker, the type is eMrk.
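
To make the behavior of successive filters under Filter Hits concrete, the Python sketch below models three of them on a list of hit records. It is an illustration only, not the BSS code: the function names are hypothetical, each hit is assumed to be a dictionary keyed by column name, and "Max ctgs hit" is read as keeping only the queries that hit at most the given number of contigs (with Ctg0 not counted).

```python
import operator
from collections import defaultdict

# The three comparison choices offered by the Numeric filter.
_OPS = {"<": operator.lt, "=": operator.eq, ">": operator.gt}


def string_filter(hits, field, substring):
    """Keep hits whose field contains the substring."""
    return [h for h in hits if substring in str(h[field])]


def numeric_filter(hits, field, op, value):
    """Keep hits whose field compares true against value."""
    compare = _OPS[op]
    return [h for h in hits if compare(h[field], value)]


def max_ctgs_hit(hits, limit):
    """Keep hits of queries that hit at most `limit` contigs."""
    contigs = defaultdict(set)
    for h in hits:
        if h["Contig"] != 0:  # Ctg0 (singletons) is not counted
            contigs[h["Query"]].add(h["Contig"])
    return [h for h in hits if len(contigs[h["Query"]]) <= limit]
```

Each function takes the output of the previous one, which mirrors applying multiple filters in succession: for example, numeric_filter(max_ctgs_hit(hits, 2), "Match", ">", 94).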

The Columns menu lists all the columns; selecting any of them toggles it on or off.

Note that the "SeqCtg" column is hidden by default. Toggling this column on allows you to see and sort on the sequence contig numbers. These may be derived from the fasta names, as previously discussed, or (if the names do not follow the numbering convention) they simply give the order of the sequence in its fasta file.

1.9. Summary of Usages

Query                                   Database                       Function                        Options
Markers or WGS contigs [1]              One or more BES files          Locate                          Filter [2], Add as marker or remarks
Markers or WGS contigs                  Directory of sequenced clones  Locate                          Filter [2], Add as marker or remarks
Sequenced clone from end of FPC contig  BES                            Find neighbor contig            Filter: Ctg ends only
Sequenced clone in FPC contig           BES                            Find next clone for sequencing  Filter: Numeric {Contig}
WGS contigs                             BES or Sequenced clones        Use with FPC MTP [3]

[1] WGS files can be very big, so the results can be very big and hard to view all at once. Use the 'Split BSS output by contig' option for these, though if you have many contigs, there will be many files.

[2] Many of the filters are intended for markers, so that only the 'good' hits are added. For example, you may only want to add markers that do not hit more than 2 contigs (Max ctgs hit). Or you may filter on 'Match' to make sure that at least 95% of the query matches the target. You may apply multiple filters, as each filter is applied to the results of the previous filter.

[3] FPC V9 (coming soon) will have a sequence track, to which this file can be added.


Email Comments To: fpc@agcol.arizona.edu