The University of Arizona
Extract Sequence Data  

Written by Jamie Hatfield     01/2002

Contents

Description
Command Line Options
Sample Usage

Description

ESD reads an incremental update file downloaded from GenBank (ftp://ncbi.nlm.nih.gov/genbank/daily-nc/; the filename has the form ncMMDD.flat.gz). GenBank posts one of these files every morning at around 7 a.m., covering the previous day's additions. ESD scans the file for entries belonging to the organism that you specify on the command line after the filename. For each matching entry, it saves the sequence data to a file named by either the sequence's accession number or the clone name.
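
The scan described above can be pictured with a short Python sketch. This is not ESD's source code, only an illustration under stated assumptions: entries in the flat file end with a `//` line, the organism appears on an `ORGANISM` line, the accession number on an `ACCESSION` line, the clone name in a `/clone="..."` qualifier, and the sequence follows the `ORIGIN` line. The function names are hypothetical, the organism is matched exactly, and ESD's actual matching rules and output file format may differ.

```python
import re


def iter_entries(lines):
    """Yield each flat-file entry as a list of lines; '//' closes an entry."""
    entry = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):
            if entry:
                yield entry
            entry = []
        else:
            entry.append(line)


def parse_entry(entry):
    """Collect organism, accession, clone name and sequence from one entry."""
    info = {"organism": None, "accession": None, "clone": None}
    bases = []
    in_origin = False
    for line in entry:
        if in_origin:
            # Sequence lines look like "        1 gatcctccat atacaacggt".
            bases.append(re.sub(r"[\d\s]", "", line))
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("ACCESSION"):
            fields = line.split()
            if len(fields) > 1:
                info["accession"] = fields[1]
        elif line.lstrip().startswith("ORGANISM"):
            fields = line.split(None, 1)
            if len(fields) > 1:
                info["organism"] = fields[1].strip()
        elif info["clone"] is None:
            match = re.search(r'/clone="([^"]+)"', line)
            if match:
                info["clone"] = match.group(1)
    info["sequence"] = "".join(bases)
    return info


def select_sequences(lines, organism, naming):
    """Map output name -> sequence for entries of the given organism.

    naming is 'a' (accession number) or 'c' (clone name), mirroring the
    last argument of the esd command line. Exact organism match is an
    assumption of this sketch.
    """
    if naming not in ("a", "c"):
        raise ValueError("naming must be 'a' or 'c'")
    key = "accession" if naming == "a" else "clone"
    selected = {}
    for entry in iter_entries(lines):
        info = parse_entry(entry)
        if info["organism"] == organism and info[key]:
            selected[info[key]] = info["sequence"]
    return selected
```

Passing an open file works because the functions only iterate over lines, for example `select_sequences(open('ncMMDD.flat'), 'Oryza sativa', 'a')`. Entries that lack the requested name (for instance, no clone qualifier when naming by clone) are skipped in this sketch; how ESD handles that case is not documented here.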

Command Line Options

                esd <file> '<organism>' {c | a}

        <file> is the uncompressed daily update file to extract sequences from
        '<organism>' is the official organism name
              (e.g., 'Oryza sativa')
              (Don't forget the single quotes around the organism name;
               without them the shell splits the name at the space!)
        a - name sequence files by the accession number
        c - name sequence files by the clone name
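
When esd is driven from a script rather than typed at a shell prompt, the same three arguments can be passed as a list. The following is a minimal sketch, assuming an `esd` executable is on the PATH; the helper names `build_esd_command` and `run_esd` are hypothetical. Because the arguments are handed over as a list and no shell is involved, the organism name arrives as one argument without any quoting.

```python
import subprocess


def build_esd_command(flat_file, organism, naming):
    """Return the argument list for: esd <file> '<organism>' {c | a}."""
    if naming not in ("a", "c"):
        raise ValueError("naming must be 'a' (accession) or 'c' (clone)")
    # The organism stays a single list element, so its embedded space
    # cannot be split the way an unquoted shell word would be.
    return ["esd", flat_file, organism, naming]


def run_esd(flat_file, organism, naming):
    """Run esd and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_esd_command(flat_file, organism, naming), check=True)
```

For example, `build_esd_command("ncMMDD.flat", "Oryza sativa", "a")` yields the same invocation as the command line shown above.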

Sample Usage

    ## Get the daily incremental update
    cd fsd
    ftp ncbi.nlm.nih.gov
       user anonymous 
       cd /genbank/daily-nc
       binary
       get ncMMDD.flat.gz

    ## Extract the sequence files for your organism
    gunzip ncMMDD.flat.gz
    esd ncMMDD.flat 'Organism' a

    cd ..

    ## Perform a simulated digest.  See fsd documentation for more options
    fsd b f . d fsd c 180000 80

    ## Input the simulated digest clones into fpc
    fpc  -batch updcor

    ## Update the remarks for those clones
    fpc  -batch mergerm fsd/remark.ace
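
For anyone automating the first two steps, the MMDD part of the filename and the decompression can be handled with the Python standard library. This is a sketch under assumptions: the helper names are hypothetical, and whether MMDD refers to the posting date or to the day the additions were made should be checked against the FTP directory listing before relying on it.

```python
import datetime
import gzip
import shutil


def daily_update_name(day):
    """Build the ncMMDD.flat.gz filename for a given date."""
    return f"nc{day:%m%d}.flat.gz"


def decompress(gz_path):
    """Write the decompressed flat file next to gz_path and return its path."""
    if not gz_path.endswith(".gz"):
        raise ValueError("expected a .gz file")
    out_path = gz_path[:-3]
    # Stream the data so that large daily updates need not fit in memory.
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path
```

Unlike gunzip, this `decompress` leaves the original .gz file in place. The returned path is the file to hand to esd.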

 

 

Email Comments To: fpc@agcol.arizona.edu

 

Last Modified Thursday February 14, 2008 10:31 AM and 41 seconds