Package Bio :: Module Seq :: Class UnknownSeq

Class UnknownSeq

object --+    
         |    
       Seq --+
             |
            UnknownSeq

A read-only sequence object of known length but unknown contents.

If you have an unknown sequence, you can represent this with a normal Seq object, for example:

>>> from Bio.Seq import Seq
>>> my_seq = Seq("N"*5)
>>> my_seq
Seq('NNNNN', Alphabet())
>>> len(my_seq)
5
>>> print(my_seq)
NNNNN

However, this is rather wasteful of memory (especially for large sequences), which is where this class is most useful:

>>> from Bio.Seq import UnknownSeq
>>> unk_five = UnknownSeq(5)
>>> unk_five
UnknownSeq(5, alphabet = Alphabet(), character = '?')
>>> len(unk_five)
5
>>> print(unk_five)
?????
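
The memory saving can be illustrated without Biopython. The following is a minimal sketch, assuming the saving comes from storing only a length and a placeholder character; the class name CompactUnknown is hypothetical and is not part of Bio.Seq:

```python
import sys


class CompactUnknown:
    """Hypothetical stand-in: stores only a length and a placeholder character."""

    __slots__ = ("length", "character")

    def __init__(self, length, character="?"):
        self.length = length
        self.character = character

    def __len__(self):
        return self.length

    def __str__(self):
        # The full string is built only when it is explicitly requested.
        return self.character * self.length


big = CompactUnknown(10_000_000)
full = "?" * 10_000_000

compact_size = sys.getsizeof(big)   # size of the small object itself
full_size = sys.getsizeof(full)     # size of the fully expanded string

print(len(big) == len(full))        # both report the same length
print(compact_size < full_size)     # the compact object is far smaller
```

The compact object's size does not grow with the stated length, whereas the expanded string needs at least one byte per character.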

You can add unknown sequences together, provided their alphabets and characters are compatible, and get another memory-saving UnknownSeq:

>>> unk_four = UnknownSeq(4)
>>> unk_four
UnknownSeq(4, alphabet = Alphabet(), character = '?')
>>> unk_four + unk_five
UnknownSeq(9, alphabet = Alphabet(), character = '?')

If the alphabet or characters don't match up, the addition gives an ordinary Seq object:

>>> unk_nnnn = UnknownSeq(4, character = "N")
>>> unk_nnnn
UnknownSeq(4, alphabet = Alphabet(), character = 'N')
>>> unk_nnnn + unk_four
Seq('NNNN????', Alphabet())

Combining with a real Seq gives a new Seq object:

>>> known_seq = Seq("ACGT")
>>> unk_four + known_seq
Seq('????ACGT', Alphabet())
>>> known_seq + unk_four
Seq('ACGT????', Alphabet())
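
The addition rules shown above can be sketched in plain Python. This is an illustration only, assuming that compatibility is decided by the placeholder character alone (alphabets are ignored here) and using plain strings in place of Seq objects; the class name UnknownRun is hypothetical:

```python
class UnknownRun:
    """Hypothetical stand-in for a run of `length` copies of `character`."""

    def __init__(self, length, character="?"):
        self.length = length
        self.character = character

    def __str__(self):
        return self.character * self.length

    def __add__(self, other):
        # Same placeholder character: stay compact and just sum the lengths.
        if isinstance(other, UnknownRun) and other.character == self.character:
            return UnknownRun(self.length + other.length, self.character)
        # Otherwise fall back to an ordinary, fully expanded string.
        return str(self) + str(other)

    def __radd__(self, other):
        # Reached for `known + unknown`, where the left operand is a plain string.
        return str(other) + str(self)


same = UnknownRun(4) + UnknownRun(5)          # stays compact, length 9
mixed = UnknownRun(4, "N") + UnknownRun(4)    # characters differ, so expanded
left = UnknownRun(4) + "ACGT"                 # unknown + known
right = "ACGT" + UnknownRun(4)                # known + unknown, via __radd__
```

Only the first case keeps the compact representation; every other combination produces expanded text, mirroring the Seq results in the examples above.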

Instance Methods

__init__(self, length, alphabet=Alphabet(), character=None)
Create a new UnknownSeq object.

__len__(self)
Returns the stated length of the unknown sequence.

__str__(self)
Returns the unknown sequence as a full string of the given length.

__repr__(self)
Returns a (truncated) representation of the sequence for debugging.

__add__(self, other)
Add another sequence or string to this sequence.

__radd__(self, other)

__getitem__(self, index)

count(self, sub, start=0, end=2147483647)
Non-overlapping count method, like that of a Python string.

complement(self)
The complement of an unknown nucleotide equals itself.

reverse_complement(self)
The reverse complement of an unknown nucleotide equals itself.

transcribe(self)
Returns unknown RNA sequence from an unknown DNA sequence.

back_transcribe(self)
Returns unknown DNA sequence from an unknown RNA sequence.

translate(self, **kwargs)
Translate an unknown nucleotide sequence into an unknown protein.

Inherited from Seq: __contains__, endswith, find, lstrip, rfind, rsplit, rstrip, split, startswith, strip, tomutable, tostring

Inherited from Seq (private): _get_seq_str_and_check_alphabet, _set_data

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __subclasshook__

Properties

Inherited from Seq: data

Inherited from object: __class__

Method Details

__init__(self, length, alphabet=Alphabet(), character=None)
(Constructor)

Create a new UnknownSeq object.

If character is omitted, it is determined from the alphabet: "N" for nucleotides, "X" for proteins, and "?" otherwise.

Overrides: object.__init__
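
The default-character rule can be written as a small lookup. This is a sketch only: the real constructor inspects the alphabet object, whereas the function name default_character and the string labels "nucleotide" and "protein" below are hypothetical stand-ins for that check:

```python
def default_character(kind):
    """Pick the placeholder character for a sequence kind.

    `kind` is a hypothetical label standing in for an alphabet object.
    """
    if kind == "nucleotide":
        return "N"
    if kind == "protein":
        return "X"
    # Generic or unrecognised alphabets fall back to a question mark.
    return "?"


nucleotide_char = default_character("nucleotide")
protein_char = default_character("protein")
generic_char = default_character("generic")
```

An explicitly supplied character would bypass this lookup entirely.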

__len__(self)
(Length operator)

Returns the stated length of the unknown sequence.

Overrides: Seq.__len__

__str__(self)
(Informal representation operator)

Returns the unknown sequence as a full string of the given length.

Overrides: object.__str__

__repr__(self)
(Representation operator)

Returns a (truncated) representation of the sequence for debugging.

Overrides: object.__repr__
(inherited documentation)

__add__(self, other)
(Addition operator)

Add another sequence or string to this sequence.

Overrides: Seq.__add__
(inherited documentation)

__radd__(self, other)
(Right-side addition operator)

Overrides: Seq.__radd__

__getitem__(self, index)
(Indexing operator)

Overrides: Seq.__getitem__

count(self, sub, start=0, end=2147483647)

Non-overlapping count method, like that of a Python string.

This behaves like the Python string (and Seq object) method of the same name, which does a non-overlapping count.

Returns an integer, the number of occurrences of substring argument sub in the (sub)sequence given by [start:end]. Optional arguments start and end are interpreted as in slice notation.

Arguments:

  • sub - a string or another Seq object to look for
  • start - optional integer, slice start
  • end - optional integer, slice end

>>> "NNNN".count("N")
4
>>> Seq("NNNN").count("N")
4
>>> UnknownSeq(4, character="N").count("N")
4
>>> UnknownSeq(4, character="N").count("A")
0
>>> UnknownSeq(4, character="N").count("AA")
0

However, note that because Python strings and Seq objects (and MutableSeq objects) do a non-overlapping search, this may not give the answer you expect:

>>> UnknownSeq(4, character="N").count("NN")
2
>>> UnknownSeq(4, character="N").count("NNN")
1

Overrides: Seq.count
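
For a run made of a single repeated character, the non-overlapping count reduces to integer division. The sketch below shows that arithmetic under stated assumptions; the function names are hypothetical, this is not the library's implementation, and unlike str.count it rejects an empty sub:

```python
def count_in_run(length, character, sub, start=0, end=None):
    """Non-overlapping count of `sub` in a run of `length` copies of `character`."""
    if not sub:
        raise ValueError("sub must be non-empty in this sketch")
    # Any other character in `sub` means it cannot occur in the run.
    if set(sub) != {character}:
        return 0
    # Resolve start/end the way slice notation does (negative values included).
    lo, hi, _ = slice(start, end).indices(length)
    span = max(0, hi - lo)
    return span // len(sub)


def overlapping_count_in_run(length, character, sub):
    """Overlapping count over the whole run, shown only for contrast."""
    if not sub or set(sub) != {character}:
        return 0
    return max(0, length - len(sub) + 1)


print(count_in_run(4, "N", "NN"))              # 2, same as "NNNN".count("NN")
print(overlapping_count_in_run(4, "N", "NN"))  # 3 possible start positions
```

The difference between 2 and 3 here is the surprise the note above warns about: a reader expecting every possible match position gets fewer hits from a non-overlapping search.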

complement(self)

The complement of an unknown nucleotide equals itself.

>>> my_nuc = UnknownSeq(8)
>>> my_nuc
UnknownSeq(8, alphabet = Alphabet(), character = '?')
>>> print(my_nuc)
????????
>>> my_nuc.complement()
UnknownSeq(8, alphabet = Alphabet(), character = '?')
>>> print(my_nuc.complement())
????????

Overrides: Seq.complement

reverse_complement(self)

The reverse complement of an unknown nucleotide equals itself.

>>> my_nuc = UnknownSeq(10)
>>> my_nuc
UnknownSeq(10, alphabet = Alphabet(), character = '?')
>>> print(my_nuc)
??????????
>>> my_nuc.reverse_complement()
UnknownSeq(10, alphabet = Alphabet(), character = '?')
>>> print(my_nuc.reverse_complement())
??????????

Overrides: Seq.reverse_complement

transcribe(self)

Returns unknown RNA sequence from an unknown DNA sequence.

>>> my_dna = UnknownSeq(10, character="N")
>>> my_dna
UnknownSeq(10, alphabet = Alphabet(), character = 'N')
>>> print(my_dna)
NNNNNNNNNN
>>> my_rna = my_dna.transcribe()
>>> my_rna
UnknownSeq(10, alphabet = RNAAlphabet(), character = 'N')
>>> print(my_rna)
NNNNNNNNNN

Overrides: Seq.transcribe

back_transcribe(self)

Returns unknown DNA sequence from an unknown RNA sequence.

>>> my_rna = UnknownSeq(20, character="N")
>>> my_rna
UnknownSeq(20, alphabet = Alphabet(), character = 'N')
>>> print(my_rna)
NNNNNNNNNNNNNNNNNNNN
>>> my_dna = my_rna.back_transcribe()
>>> my_dna
UnknownSeq(20, alphabet = DNAAlphabet(), character = 'N')
>>> print(my_dna)
NNNNNNNNNNNNNNNNNNNN

Overrides: Seq.back_transcribe

translate(self, **kwargs)

Translate an unknown nucleotide sequence into an unknown protein.

For example:

>>> my_seq = UnknownSeq(11, character="N")
>>> print(my_seq)
NNNNNNNNNNN
>>> my_protein = my_seq.translate()
>>> my_protein
UnknownSeq(3, alphabet = ProteinAlphabet(), character = 'X')
>>> print(my_protein)
XXX

In comparison, using a normal Seq object:

>>> my_seq = Seq("NNNNNNNNNNN")
>>> print(my_seq)
NNNNNNNNNNN
>>> my_protein = my_seq.translate()
>>> my_protein
Seq('XXX', ExtendedIUPACProtein())
>>> print(my_protein)
XXX

Overrides: Seq.translate
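
The length arithmetic in the example above (11 nucleotides giving 3 residues) can be sketched as follows. This assumes one residue per complete codon of three nucleotides, with any trailing partial codon dropped; the function name translated_length is hypothetical:

```python
def translated_length(nucleotide_length):
    """Number of residues from a nucleotide run of the given length."""
    # Each codon is three nucleotides; a trailing partial codon is dropped.
    return nucleotide_length // 3


# 11 unknown nucleotides: three full codons plus two leftover bases.
protein = "X" * translated_length(11)
print(protein)
```

With 11 nucleotides this gives three "X" characters, matching the XXX output shown for both UnknownSeq and Seq.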