Package Martel

Martel is a 'regular expressions on steroids' parser generator (DEPRECATED).

A goal of the Biopython project is to reduce the amount of effort needed to do computational biology. A large part of that work turns out to be parsing file formats, which led to the development of Martel. Martel is a parser generator that takes a regular expression as the format description and creates a parser that returns the parse tree through the SAX API common in XML processing.
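The idea of turning a regular expression into SAX events can be sketched with the standard library alone. This is not Martel's implementation: `Recorder` and `emit_flat_groups` are hypothetical names, the sketch uses Python's `re` module, and it only handles named groups that are not nested.

```python
import re
from xml.sax.handler import ContentHandler
from xml.sax.xmlreader import AttributesImpl


class Recorder(ContentHandler):
    """Hypothetical handler that records the SAX events it receives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElement(self, name, attrs):
        self.events.append(("start", name))

    def characters(self, content):
        self.events.append(("chars", content))

    def endElement(self, name):
        self.events.append(("end", name))


def emit_flat_groups(pattern, text, handler):
    """Report each named group of a match as a SAX element.

    Assumption: groups do not nest, so sorting by start offset is enough.
    """
    match = re.match(pattern, text)
    if match is None:
        raise ValueError("text does not match the format description")
    # Named groups that took part in the match, ordered by position.
    spans = sorted(
        (match.span(name), name)
        for name in match.groupdict()
        if match.group(name) is not None
    )
    pos = 0
    handler.startDocument()
    for (start, end), name in spans:
        if start > pos:
            # Text between groups is reported as plain characters.
            handler.characters(text[pos:start])
        handler.startElement(name, AttributesImpl({}))
        handler.characters(text[start:end])
        handler.endElement(name)
        pos = end
    if pos < match.end():
        handler.characters(text[pos:match.end()])
    handler.endDocument()


recorder = Recorder()
emit_flat_groups(r"ID\s+(?P<entry_id>\w+)\n", "ID   P12345\n", recorder)
```

After the call, `recorder.events` holds the event stream a downstream SAX consumer would see: the group name becomes the element name and the matched text becomes its character data.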

While intended to be both fast and relatively easy to understand, Martel struggled with some very large records (e.g. GenBank files for whole genomes or chromosomes), and in practice debugging the Martel format specifications for evolving file formats like GenBank proved non-trivial.

Andrew Dalke is no longer maintaining Martel or Bio.Mindy, and these modules are now deprecated. They are no longer used in any of the current Biopython parsers, and are likely to be removed in a future release of Biopython.


Version: 1.50

Functions

Str1(s)
(s) -> match the literal string

Str(*args)
(s1, s2, ...) -> match s1 or s2 or ...

Any(s)
(s) -> match any character in s

AnyBut(s)
s -> match any character not in s

Seq(*args)
exp1, exp2, ... -> match exp1 followed by exp2 followed by ...

Alt(*args)
exp1, exp2, ... -> match exp1 or (if that fails) match exp2 or ...

Opt(expr)
expr -> match 'expr' 1 or 0 times

Rep(expr)
expr -> match 'expr' as many times as possible, even 0 times

Rep1(expr)
expr -> match 'expr' as many times as possible, but at least once

Case(expr)

Bol()

Eol()

Empty()

Eof()

MaxRepeat(expr, min_count, max_count=65535)
expr, min_count, max_count = 65535 -> match between min_count and max_count times

RepN(expr, count)
expr, count -> match the expression 'count' number of times

Group(name, expr, attrs=None)
name, expr -> use 'name' to describe a successful match of the expression

_fix_newlines(s)

Re(pattern, fix_newlines=0)
pattern -> the expression tree for the regexp pattern string

Assert(expression)

AssertNot(expression)
 
_group(name, exp, attrs)

Digits(name=None, attrs=None)
match one or more decimal digits

Integer(name=None, attrs=None)
match an integer (digits w/ optional leading + or - sign)

Float(name=None, attrs=None)
match floating point numbers like 6, 6., -.1, 2.3, +4E-5, ...

Word(name=None, attrs=None)
match a 'word'

Spaces(name=None, attrs=None)
match one or more whitespace (except newline)

Unprintable(name=None, attrs=None)
match an unprintable character (characters not in string.printable)

Punctuation(name=None, attrs=None)
match a punctuation character (characters in string.punctuation)

ToEol(name=None, attrs=None)
match everything up to and including the end of line

UntilEol(name=None, attrs=None)
match everything up to but not including the end of line

SkipLinesUntil(expr)
read and ignore lines up to, but excluding, the line matching expr

SkipLinesTo(expr)
read and ignore lines up to, and including, the line matching expr

ToSep(name=None, sep=None, attrs=None)
match all characters up to the given separator(s)

UntilSep(name=None, sep=None, attrs=None)
match all characters up to the given separator(s)

DelimitedFields(name=None, sep=None, attrs=None)
match 0 or more fields separated by the given separator(s)

select_names(expression, names)

replace_groups(expr, replacements)

SimpleRecordFilter(expr, make_reader, reader_args=())

Variables
  __warningregistry__ = {('Martel and those parts of Biopython d...

Function Details

Seq(*args)

exp1, exp2, ... -> match exp1 followed by exp2 followed by ...

Alt(*args)

exp1, exp2, ... -> match exp1 or (if that fails) match exp2 or ...
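The sequencing and ordered-alternation behaviour described for Seq and Alt can be illustrated with Python's `re` module. The helpers `seq` and `alt` below are hypothetical stand-ins that build pattern strings; they are not Martel's functions, which return expression trees.

```python
import re


def seq(*patterns):
    """Hypothetical helper: match each pattern in turn."""
    return "".join("(?:%s)" % p for p in patterns)


def alt(*patterns):
    """Hypothetical helper: try each pattern in order, left to right."""
    return "(?:%s)" % "|".join("(?:%s)" % p for p in patterns)


greeting = re.compile(seq(alt("hello", "hi"), " ", alt("world", "there")))
matched_greeting = greeting.match("hi there")

# Alternation is ordered: the first alternative that succeeds wins,
# even when a later alternative would match more text.
first_wins = re.match(alt("ab", "abc"), "abc").group()
```

Because alternation is ordered, longer alternatives usually need to be listed before their own prefixes.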

MaxRepeat(expr, min_count, max_count=65535)

expr, min_count, max_count = 65535 -> match between min_count and max_count times

If max_count == 65535 (which is Expression.MAXREPEAT) then there is no upper limit.

RepN(expr, count)

expr, count -> match the expression 'count' number of times

This option is handy for named group repeats since you don't have to use the name twice, once for the min_count field and once for the max_count field.
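As a rough analogue of MaxRepeat and RepN, the following sketch builds bounded repeats with Python's `re` quantifiers. `max_repeat` and `rep_n` are hypothetical helper names, and the sketch only covers integer counts; it treats 65535 as "no upper limit", mirroring the rule stated above.

```python
import re

NO_UPPER_LIMIT = 65535  # the sentinel value described above


def max_repeat(pattern, min_count, max_count=NO_UPPER_LIMIT):
    """Hypothetical helper: repeat pattern between min_count and max_count times."""
    if max_count == NO_UPPER_LIMIT:
        quantifier = "{%d,}" % min_count
    else:
        quantifier = "{%d,%d}" % (min_count, max_count)
    return "(?:%s)%s" % (pattern, quantifier)


def rep_n(pattern, count):
    """Hypothetical helper: repeat pattern exactly count times."""
    return "(?:%s){%d}" % (pattern, count)


two_to_four = re.compile(max_repeat("ab", 2, 4))
at_least_two = re.compile(max_repeat("ab", 2))
three_digits = re.compile(rep_n(r"\d", 3))
```

Here `rep_n(p, n)` is simply the case where the minimum and maximum counts are equal, which is why a dedicated function saves repeating the count.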

Digits(name=None, attrs=None)

match one or more decimal digits

This is the same as (?P<name?attrs>\d+).

If 'name' is not None, the matching text will be put inside a group of the given name. You can optionally include group attributes.
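For comparison, the named and unnamed forms can be written with Python's `re` module as follows. The group name `count` is only an illustration, and the `?attrs` part of the pattern above has no counterpart here, since `re` accepts only plain identifiers as group names.

```python
import re

# Named form: the matched digits are available under the group name.
named_digits = re.compile(r"(?P<count>\d+)")
# Unnamed form: the digits are matched but not captured under a name.
plain_digits = re.compile(r"\d+")

named_match = named_digits.match("42 sequences")
plain_match = plain_digits.match("42 sequences")
```

`named_match.group("count")` returns the digits, while `plain_match.groupdict()` is empty because no group was named.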

Integer(name=None, attrs=None)

match an integer (digits w/ optional leading + or - sign)

If 'name' is not None, the matching text will be put inside a group of the given name. You can optionally include group attributes.

Float(name=None, attrs=None)

match floating point numbers like 6, 6., -.1, 2.3, +4E-5, ...

If 'name' is not None, the matching text will be put inside of a group of the given name. You can optionally include group attributes.
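The exact patterns behind Integer and Float are not shown on this page. The sketch below uses Python's `re` module with patterns written for this illustration that accept the listed examples; Martel's own definitions may differ.

```python
import re

# Assumed patterns, chosen to accept the examples given in the text.
INTEGER = re.compile(r"[+-]?\d+")
FLOAT = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")

float_examples = ["6", "6.", "-.1", "2.3", "+4E-5"]
all_floats_accepted = all(FLOAT.fullmatch(s) for s in float_examples)
```

The two alternatives in `FLOAT` cover digits with an optional fraction and a bare fraction such as `.1`; a lone `.` matches neither.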

Word(name=None, attrs=None)

match a 'word'

A 'word' is defined as '\w+', and \w is [a-zA-Z0-9_].

If 'name' is not None, the matching text will be put inside of a group of the given name. You can optionally include group attributes.

In other words, this is the short way to write (?P<name>\w+).
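When reproducing this with Python 3's `re` module, note that `\w` matches Unicode word characters by default. A minimal sketch, assuming the ASCII-only definition given above is the one wanted, passes `re.ASCII`; the group name `token` is illustrative.

```python
import re

# re.ASCII restricts \w to [a-zA-Z0-9_], matching the definition above.
word = re.compile(r"(?P<token>\w+)", re.ASCII)

word_match = word.match("LOCUS_1 rest of line")

# Without re.ASCII, accented letters count as word characters.
unicode_ok = re.fullmatch(r"\w+", "na\u00efve") is not None
ascii_ok = re.fullmatch(r"\w+", "na\u00efve", re.ASCII) is not None
```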

Spaces(name=None, attrs=None)

match one or more whitespace (except newline)

"Spaces" is defined as [\t\v\f\r ]+, which is *not* the same as '\s+'. (It's missing the '\n', which is useful since you almost never mean for whitespace to go beyond the newline.)

If 'name' is not None, the matching text will be put inside of a group of the given name. You can optionally include group attributes.
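The difference from `\s+` is easiest to see on text that spans a line break. This sketch uses Python's `re` module with the character class quoted above.

```python
import re

spaces = re.compile(r"[\t\v\f\r ]+")   # the definition quoted above
any_whitespace = re.compile(r"\s+")    # also matches "\n"

text = "  \t\n  next line"
spaces_matched = spaces.match(text).group()
whitespace_matched = any_whitespace.match(text).group()
```

`spaces` stops at the newline, so a format description built from it cannot silently run on into the next line.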

Unprintable(name=None, attrs=None)

match an unprintable character (characters not in string.printable)

If 'name' is not None, the matching text will be put inside of a group of the given name. You can optionally include group attributes.

Punctuation(name=None, attrs=None)

match a punctuation character (characters in string.punctuation)

If 'name' is not None, the matching text will be put inside of a group of the given name. You can optionally include group attributes.
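Both character sets come straight from Python's `string` module, so equivalent single-character patterns can be sketched with `re` as below. This is one possible construction, not necessarily how Martel builds them.

```python
import re
import string

# re.escape keeps characters such as "]", "^", "-" and "\" literal
# when they are placed inside a character class.
punctuation = re.compile("[%s]" % re.escape(string.punctuation))
unprintable = re.compile("[^%s]" % re.escape(string.printable))

punct_hits = [c for c in "a-b]c\\d^e!" if punctuation.match(c)]
```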

ToEol(name=None, attrs=None)

match everything up to and including the end of line

If 'name' is not None, the matching text, except for the newline, will be put inside a group of the given name. You can optionally include group attributes.

UntilEol(name=None, attrs=None)

match everything up to but not including the end of line

If 'name' is not None, the matching text, except for the newline, will be put inside a group of the given name. You can optionally include group attributes.
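The contrast between ToEol and UntilEol is whether the line ending is consumed. A sketch with Python's `re` module follows; the patterns and the group name `line` are assumptions made for illustration, including the set of line endings recognised.

```python
import re

# Assumed line endings: "\r\n", "\r" or "\n".
to_eol = re.compile(r"(?P<line>[^\r\n]*)(?:\r\n|\r|\n)")
until_eol = re.compile(r"(?P<line>[^\r\n]*)")

text = "LOCUS AB000001\nDEFINITION"
first_line = "LOCUS AB000001"

to_match = to_eol.match(text)
until_match = until_eol.match(text)
```

Both groups hold the same text without the newline. The difference is the end offset: `to_match.end()` is past the newline, while `until_match.end()` points at it.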

ToSep(name=None, sep=None, attrs=None)

match all characters up to the given separator(s)

This is useful for parsing space, tab, colon, or other character delimited fields. There is no default separator character.

If 'name' is not None, the matching text, except for the separator, will be put inside a group of the given name. You can optionally include group attributes. The separator character will also be consumed.

Neither "\r" nor "\n" may be used as a separator.

UntilSep(name=None, sep=None, attrs=None)

match all characters up to the given separator(s)

This is useful for parsing space, tab, colon, or other character delimited fields. There is no default separator.

If 'name' is not None, the matching text, except for the separator, will be put inside a group of the given name. You can optionally include group attributes. The separator character will not be consumed.

Neither "\r" nor "\n" may be used as a separator.
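ToSep and UntilSep differ only in whether the separator is consumed. The sketch below expresses that with Python's `re` module. `to_sep` and `until_sep` are hypothetical helpers that return pattern strings, and stopping the field at a line ending is an assumption of this sketch.

```python
import re


def to_sep(name, sep):
    """Hypothetical helper: capture up to a separator and consume it."""
    chars = re.escape(sep)
    # Assumption: a field never runs past the end of the line.
    return r"(?P<%s>[^%s\r\n]*)[%s]" % (name, chars, chars)


def until_sep(name, sep):
    """Hypothetical helper: capture up to a separator without consuming it."""
    chars = re.escape(sep)
    # The lookahead checks for the separator but leaves it unread.
    return r"(?P<%s>[^%s\r\n]*)(?=[%s])" % (name, chars, chars)


text = "gene:BRCA1"
consumed = re.match(to_sep("key", ":"), text)
not_consumed = re.match(until_sep("key", ":"), text)
```

Both matches capture `gene`; `consumed.end()` lands after the colon, while `not_consumed.end()` lands on it, so the next expression in a sequence has to deal with the separator itself.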

DelimitedFields(name=None, sep=None, attrs=None)

match 0 or more fields separated by the given separator(s)

This is useful for parsing space, tab, colon, or other character delimited fields. There is no default separator.

If 'name' is not None, the delimited text, excluding the separator, will be put inside groups of the given name. You can optionally include group attributes. The separator character is consumed, but not accessible using a group.

Neither "\r" nor "\n" may be used as a separator. The line as a whole is not included in a group.
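Python's `re` module keeps only the last capture of a repeated group, so a one-pattern equivalent of DelimitedFields is not available there. As an illustration of the behaviour described above, the hypothetical helper `delimited_fields` splits a single line instead; it is a sketch, not Martel's implementation.

```python
import re


def delimited_fields(line, sep):
    """Hypothetical helper: split one line into its separator-delimited fields."""
    if not sep:
        raise ValueError("a separator is required; there is no default")
    if "\r" in sep or "\n" in sep:
        raise ValueError("neither '\\r' nor '\\n' may be used as a separator")
    # Drop the line ending, then split on any of the separator characters.
    body = line.rstrip("\r\n")
    return re.split("[%s]" % re.escape(sep), body)


tab_fields = delimited_fields("P12345\tkinase\t412\n", "\t")
mixed_fields = delimited_fields("a,b;c", ",;")
```

Each returned string corresponds to one group of the given name; the separators themselves are dropped, matching the note that they are consumed but not accessible.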


Variables Details

__warningregistry__

Value:
{('Martel and those parts of Biopython depending on it directly, such as Bio.Mindy, are now deprecated, and will be removed in a future release of Biopython.  If you want to continue to use this code, please get in contact with the Biopython developers via the mailing lists to avoid its permanent removal from Biopython',
  <type 'exceptions.DeprecationWarning'>,
  30): 1}