Package GenBank

Code to work with GenBank formatted files.

Rather than using Bio.GenBank, you are now encouraged to use Bio.SeqIO with
the "genbank" or "embl" format names to parse GenBank or EMBL files into
SeqRecord and SeqFeature objects (see the Biopython tutorial for details).
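
As a rough illustration of the Bio.SeqIO route, the sketch below parses a small
synthetic record held in a string. The record text and the TEST0001 accession
are invented for this example; only SeqIO.read and the "genbank" format name
come from Biopython.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical, hand-made record used only for illustration.
# The LOCUS line is built with format() so its fixed columns line up.
LOCUS_LINE = "LOCUS       {:<16} {:>11} bp    {:<6}  {:<8} {} {}".format(
    "TEST0001", 24, "DNA", "linear", "SYN", "01-JAN-2000"
)
EXAMPLE_RECORD = LOCUS_LINE + """
DEFINITION  Synthetic example record.
ACCESSION   TEST0001
VERSION     TEST0001.1
KEYWORDS    .
SOURCE      synthetic construct
  ORGANISM  synthetic construct
            other sequences; artificial sequences.
FEATURES             Location/Qualifiers
     source          1..24
                     /organism="synthetic construct"
     gene            1..24
                     /gene="demo"
ORIGIN
        1 atggccattg taatgggccg ctga
//
"""

# SeqIO.read expects exactly one record in the handle.
record = SeqIO.read(StringIO(EXAMPLE_RECORD), "genbank")
print(record.name, len(record.seq))
for feature in record.features:
    # Each feature is a SeqFeature with a parsed location object.
    print(feature.type, feature.location)
```

For a file on disk holding several records, SeqIO.parse(filename, "genbank")
would be the usual choice instead of SeqIO.read.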

Also, rather than using Bio.GenBank to search or download files from the NCBI,
you are now encouraged to use Bio.Entrez instead (again, see the Biopython
tutorial for details).

Currently the ONLY reason to use Bio.GenBank directly is for the RecordParser,
which turns a GenBank file into GenBank-specific Record objects.  This is a
much closer representation of the raw file contents than the SeqRecord
alternative from the FeatureParser (used in Bio.SeqIO).
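
A minimal sketch of the RecordParser route follows, assuming a synthetic record
invented for this example (the TEST0001 accession and its contents are not real
data). It shows that the resulting Record keeps locations and qualifier values
as plain strings taken from the file.

```python
from io import StringIO

from Bio import GenBank

# Hypothetical record text, invented for this sketch.
# The LOCUS line is built with format() so its fixed columns line up.
LOCUS_LINE = "LOCUS       {:<16} {:>11} bp    {:<6}  {:<8} {} {}".format(
    "TEST0001", 24, "DNA", "linear", "SYN", "01-JAN-2000"
)
EXAMPLE_RECORD = LOCUS_LINE + """
DEFINITION  Synthetic example record.
ACCESSION   TEST0001
VERSION     TEST0001.1
KEYWORDS    .
SOURCE      synthetic construct
  ORGANISM  synthetic construct
            other sequences; artificial sequences.
FEATURES             Location/Qualifiers
     gene            1..24
                     /gene="demo"
ORIGIN
        1 atggccattg taatgggccg ctga
//
"""

parser = GenBank.RecordParser()
record = parser.parse(StringIO(EXAMPLE_RECORD))

print(record.locus, record.size, record.accession)
for feature in record.features:
    # The location is kept as the text from the file, not a location object.
    print(feature.key, feature.location)
    for qualifier in feature.qualifiers:
        print("  ", qualifier.key, qualifier.value)
```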

Classes:
Iterator              Iterate through a file of GenBank entries
ErrorFeatureParser    Catch errors caused during parsing.
FeatureParser         Parse GenBank data in SeqRecord and SeqFeature objects.
RecordParser          Parse GenBank data into a Record object.
NCBIDictionary        Access GenBank using a dictionary interface (DEPRECATED).

_BaseGenBankConsumer  A base class for GenBank consumers that implements
                      some helpful functions that are in common between
                      consumers.
_FeatureConsumer      Create SeqFeature objects from info generated by
                      the Scanner
_RecordConsumer       Create a GenBank record object from Scanner info.
_PrintingConsumer     A debugging consumer.

ParserFailureError    Exception indicating a failure in the parser (i.e.
                      scanner or consumer)
LocationParserError   Exception indicating a problem with the spark based
                      location parser.

Functions:
search_for            Do a query against GenBank (DEPRECATED).
download_many         Download many GenBank records (DEPRECATED).

17-MAR-2009: added wgs, wgs_scafld for GenBank whole genome shotgun master records.
These are GenBank files that summarize the content of a project, and provide lists of
scaffold and contig files in the project. These will be in annotations['wgs'] and
annotations['wgs_scafld']. These GenBank files do not have sequences. See
http://groups.google.com/group/bionet.molbio.genbank/browse_thread/thread/51fb88bf39e7dc36

http://is.gd/nNgk
for more details of this format, and an example.
Added by Ying Huang & Iddo Friedberg

Classes
  Iterator
Iterator interface to move over a file of GenBank entries one at a time.
  ParserFailureError
Failure caused by some kind of problem in the parser.
  LocationParserError
Could not properly parse out a location from a GenBank file.
  FeatureParser
Parse GenBank files into Seq + Feature objects.
  RecordParser
Parse GenBank files into Record objects.
  _BaseGenBankConsumer
Abstract GenBank consumer providing useful general functions.
  _FeatureConsumer
Create a SeqRecord object with Features to return.
  _RecordConsumer
Create a GenBank Record object from scanner generated information.
  NCBIDictionary
Access GenBank using a read-only dictionary interface (DEPRECATED).
Functions
 
search_for(search, database='nucleotide', reldate=None, mindate=None, maxdate=None, start_id=0, max_ids=50000000)
Do an online search at the NCBI, returns a list of IDs (DEPRECATED).
 
download_many(ids, database='nucleotide')
Download multiple NCBI GenBank records, returned as a handle (DEPRECATED).
Variables
  GENBANK_INDENT = 12
  GENBANK_SPACER = ' '
  FEATURE_KEY_INDENT = 5
  FEATURE_QUALIFIER_INDENT = 21
  FEATURE_KEY_SPACER = ' '
  FEATURE_QUALIFIER_SPACER = ' '
  __package__ = 'Bio.GenBank'
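
These indent values correspond to the fixed-column layout of GenBank flat
files: header keywords sit in the first 12 columns, feature keys start after 5
spaces, and locations and qualifiers start after 21. The sketch below shows how
such widths can be used to slice lines. The helper names split_header_line and
split_feature_line are hypothetical and are not part of Bio.GenBank.

```python
GENBANK_INDENT = 12
FEATURE_KEY_INDENT = 5
FEATURE_QUALIFIER_INDENT = 21


def split_header_line(line):
    """Return (keyword, value) for a header line such as 'ACCESSION   X'."""
    return line[:GENBANK_INDENT].strip(), line[GENBANK_INDENT:].strip()


def split_feature_line(line):
    """Return (key, rest) for a feature table line.

    The key is an empty string for qualifier and continuation lines,
    which leave the key columns blank.
    """
    key = line[FEATURE_KEY_INDENT:FEATURE_QUALIFIER_INDENT].strip()
    return key, line[FEATURE_QUALIFIER_INDENT:].strip()


# Build sample lines with ljust() so the column widths are explicit.
header = "ACCESSION".ljust(GENBANK_INDENT) + "TEST0001"
feature = ("     " + "gene").ljust(FEATURE_QUALIFIER_INDENT) + "1..24"
qualifier = " " * FEATURE_QUALIFIER_INDENT + '/gene="demo"'

print(split_header_line(header))
print(split_feature_line(feature))
print(split_feature_line(qualifier))
```
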
Function Details

search_for(search, database='nucleotide', reldate=None, mindate=None, maxdate=None, start_id=0, max_ids=50000000)

Do an online search at the NCBI, returns a list of IDs (DEPRECATED).

This function is deprecated and will be removed in a future release of Biopython. Please use Bio.Entrez instead as described in the tutorial.

Search GenBank and return a list of the GenBank identifiers (GIs) that match the criteria. search is the search string used to query the database. Valid values for database are 'nucleotide', 'protein', 'popset' and 'genome'. reldate is the number of days prior to the current date to which the search is restricted. mindate and maxdate bound the search by date, e.g. 2002/12/20. start_id is the position in the result list at which to begin retrieval. max_ids specifies the maximum number of IDs to retrieve.
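
Since this function is deprecated, an equivalent search via Bio.Entrez might look like the sketch below. The helpers esearch_params and search_ids are hypothetical names introduced here; the mapping of start_id and max_ids onto the ESearch parameters retstart and retmax is an assumption about how the arguments correspond, and the default of 20 results is chosen only for this example.

```python
from Bio import Entrez


def esearch_params(search, database="nucleotide", reldate=None,
                   mindate=None, maxdate=None, start_id=0, max_ids=20):
    """Translate search_for-style arguments into ESearch parameter names."""
    params = {"db": database, "term": search,
              "retstart": start_id, "retmax": max_ids}
    if reldate is not None:
        params["reldate"] = reldate
    if mindate is not None and maxdate is not None:
        # ESearch expects mindate and maxdate to be given together.
        params["mindate"] = mindate
        params["maxdate"] = maxdate
    return params


def search_ids(search, email, **options):
    """Run the query online and return the list of matching IDs."""
    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.esearch(**esearch_params(search, **options))
    try:
        result = Entrez.read(handle)
    finally:
        handle.close()
    return result["IdList"]
```

A call such as search_ids("Opuntia[ORGN]", "you@example.org", max_ids=5) needs network access; the e-mail address shown is a placeholder.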

download_many(ids, database='nucleotide')

Download multiple NCBI GenBank records, returned as a handle (DEPRECATED).

This function is deprecated and will be removed in a future release of Biopython. Please use Bio.Entrez instead as described in the tutorial.

Download many records from GenBank. ids is a list of GIs or accessions.
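
A comparable download through Bio.Entrez could be sketched as follows. The helpers efetch_params and fetch_records are hypothetical names introduced here, and the sketch only considers the nucleotide database with the "gb" return type; other databases may need a different rettype.

```python
from Bio import Entrez, SeqIO


def efetch_params(ids, database="nucleotide"):
    """Build EFetch parameters for GenBank-format text output."""
    return {
        "db": database,
        "id": ",".join(str(i) for i in ids),  # comma-separated identifiers
        "rettype": "gb",
        "retmode": "text",
    }


def fetch_records(ids, email, database="nucleotide"):
    """Download the records online and parse them into SeqRecord objects."""
    Entrez.email = email  # NCBI asks callers to identify themselves
    handle = Entrez.efetch(**efetch_params(ids, database))
    try:
        return list(SeqIO.parse(handle, "genbank"))
    finally:
        handle.close()
```

Calling fetch_records requires network access. To work with the raw handle instead, as download_many did, the SeqIO.parse step can be left out.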