etandem
Input sequences are converted into ACGT or N (so ambiguity codes are ignored).
The score is +1 for a match, -1 for a mismatch.
The first copy of a repeat is not scored.
The highest score is kept for each start position and repeat size.
The lowest score to be reported is set by the threshold score, which can be set on the command line with the -threshold qualifier (default 20). For a perfect repeat, the score is the length of the repeat excluding the first copy. To allow mismatches, reduce the threshold score a little. Each mismatch scores -1 instead of +1, so it scores 2 less than a perfect match over the same number of bases.
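The scoring rules above can be illustrated with a short sketch. This is not the etandem source: it assumes the consensus at each offset is the most common base in that column of copies, scores every base after the first copy +1 or -1 against that consensus, takes the whole region as given (no search over start positions), and treats N like any other base (the -mismatch and -uniform options are not modelled). The helper names `to_acgtn` and `repeat_score` are hypothetical.

```python
from collections import Counter


def to_acgtn(seq: str) -> str:
    """Map every character that is not A, C, G or T to N (ambiguity codes are dropped)."""
    return "".join(base if base in "ACGT" else "N" for base in seq.upper())


def repeat_score(region: str, size: int) -> int:
    """Score `region` as a tandem repeat with period `size` (simplified sketch).

    Assumptions: the consensus at each offset is the most common base in that
    column of copies; every base after the first copy scores +1 if it matches
    the consensus and -1 otherwise. The first copy itself is not scored.
    N is treated like any other base (-mismatch is not modelled).
    """
    region = to_acgtn(region)
    if len(region) < size:
        raise ValueError("region must contain at least one full copy")
    consensus = [Counter(region[offset::size]).most_common(1)[0][0]
                 for offset in range(size)]
    return sum(1 if region[i] == consensus[i % size] else -1
               for i in range(size, len(region)))


perfect = "TAACCC" * 5                              # 30 bases, 5 copies
one_mismatch = perfect[:14] + "G" + perfect[15:]    # one base changed in copy 3
print(repeat_score(perfect, 6))        # 24: length 30 minus the first copy
print(repeat_score(one_mismatch, 6))   # 22: the mismatch costs 2
print(repeat_score(perfect, 6) >= 20)  # True: reported at the default threshold
```

With the default threshold of 20 and a repeat size of 6, a perfect repeat therefore needs at least 26 bases before it is reported under this model, and each mismatch raises that requirement by 2 bases.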
Running with a wide range of repeat sizes is inefficient. That is why equicktandem was written: to give a rapid estimate of the major repeat sizes.
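To illustrate why a quick pre-scan helps, the sketch below computes a simple period profile: for each candidate period d it counts positions where a base equals the base d positions earlier. This is a hypothetical helper, not equicktandem's algorithm; it only shows how a cheap pass can point at the dominant repeat size before running a detailed search over a narrow range. The test sequence is made up.

```python
def period_profile(seq: str, min_size: int, max_size: int) -> dict[int, int]:
    """Count, for each candidate period d, positions where seq[i] == seq[i - d].

    Hypothetical pre-scan: a long tandem repeat with period d yields many
    self-matches at shift d. N never counts as a match.
    """
    seq = seq.upper()
    return {
        d: sum(1 for i in range(d, len(seq)) if seq[i] == seq[i - d] != "N")
        for d in range(min_size, max_size + 1)
    }


toy = "GATTACA" + "TAACCC" * 8 + "GATTACA"   # made-up test sequence
profile = period_profile(toy, 2, 10)
print(max(profile, key=profile.get))        # 6
```

Multiples of the true period (12, 18, ...) also produce many self-matches, so a real pre-scan would need to prefer the smallest strong period; the range here stops at 10 to keep the example simple.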
The example input sequence, tembl:hhtetra, is a tandem repeat region from human herpesvirus 7.
```
% etandem
Looks for tandem repeats in a nucleotide sequence
Input sequence: tembl:hhtetra
Minimum repeat size [10]: 6
Maximum repeat size [6]:
Output report [hhtetra.tan]:
```
```
   Mandatory qualifiers:
  [-sequence]          sequence   Sequence USA
   -minrepeat          integer    Minimum repeat size
   -maxrepeat          integer    Maximum repeat size
  [-outfile]           report     Output report file name

   Optional qualifiers: (none)
   Advanced qualifiers:
   -threshold          integer    Threshold score
   -mismatch           boolean    Allow N as a mismatch
   -uniform            boolean    Allow uniform consensus
   -origfile           outfile    Output file name

   General qualifiers:
  -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
```
| Mandatory qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| [-sequence] (Parameter 1) | Sequence USA | Readable sequence | Required |
| -minrepeat | Minimum repeat size | Integer, 2 or higher | 10 |
| -maxrepeat | Maximum repeat size | Integer, same as -minrepeat or higher | Same as -minrepeat |
| [-outfile] (Parameter 2) | Output report file name | Report output file | |

| Optional qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| (none) | | | |

| Advanced qualifiers | Description | Allowed values | Default |
|---|---|---|---|
| -threshold | Threshold score | Any integer value | 20 |
| -mismatch | Allow N as a mismatch | Boolean value Yes/No | No |
| -uniform | Allow uniform consensus | Boolean value Yes/No | No |
| -origfile | Output file name | Output file | <sequence>.etandem |
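When etandem is driven from a script, the qualifiers in the table above can be assembled into an argument list. The sketch below is a hypothetical helper (`etandem_command` is not part of EMBOSS) that applies the documented ranges and defaults. It also appends `-auto`, assumed here to be the standard EMBOSS general qualifier that turns off prompting; it does not appear in the help output above.

```python
import shlex
from typing import List, Optional


def etandem_command(sequence: str, minrepeat: int = 10,
                    maxrepeat: Optional[int] = None,
                    outfile: Optional[str] = None,
                    threshold: int = 20,
                    rformat: Optional[str] = None) -> List[str]:
    """Build an etandem argument list from the documented qualifiers (hypothetical helper)."""
    if minrepeat < 2:
        raise ValueError("-minrepeat must be 2 or higher")
    if maxrepeat is None:
        maxrepeat = minrepeat            # documented default: same as -minrepeat
    if maxrepeat < minrepeat:
        raise ValueError("-maxrepeat must be -minrepeat or higher")
    args = ["etandem", "-sequence", sequence,
            "-minrepeat", str(minrepeat), "-maxrepeat", str(maxrepeat),
            "-threshold", str(threshold)]
    if outfile is not None:
        args += ["-outfile", outfile]
    if rformat is not None:
        args += ["-rformat", rformat]    # report format name, e.g. "gff"
    args.append("-auto")                 # assumption: EMBOSS qualifier that disables prompts
    return args


cmd = etandem_command("tembl:hhtetra", minrepeat=6, outfile="hhtetra.tan")
print(shlex.join(cmd))
# etandem -sequence tembl:hhtetra -minrepeat 6 -maxrepeat 6 -threshold 20 -outfile hhtetra.tan -auto
```

To actually run it, pass `cmd` to `subprocess.run(cmd, check=True)`; this needs EMBOSS on the PATH, and the `tembl:` database name depends on the local EMBOSS configuration.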
```
ID   HHTETRA    standard; DNA; VRL; 1272 BP.
XX
AC   L46634; L46689;
XX
SV   L46634.1
XX
DT   06-NOV-1995 (Rel. 45, Created)
DT   04-MAR-2000 (Rel. 63, Last updated, Version 3)
XX
DE   Human herpesvirus 7 (clone ED132'1.2) telomeric repeat region.
XX
KW   telomeric repeat.
XX
OS   Human herpesvirus 7
OC   Viruses; dsDNA viruses, no RNA stage; Herpesviridae; Betaherpesvirinae.
XX
RN   [1]
RP   1-1272
RX   MEDLINE; 96079055.
RA   Secchiero P., Nicholas J., Deng H., Xiaopeng T., van Loon N., Ruvolo V.R.,
RA   Berneman Z.N., Reitz M.S. Jr., Dewhurst S.;
RT   "Identification of human telomeric repeat motifs at the genome termini of
RT   human herpesvirus 7: structural analysis and heterogeneity";
RL   J. Virol. 69(12):8041-8045(1995).
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..1272
FT                   /db_xref="taxon:10372"
FT                   /organism="Human herpesvirus 7"
FT                   /strain="JI"
FT                   /clone="ED132'1.2"
FT   repeat_region   207..928
FT                   /note="long and complex repeat region composed of various
FT                   direct repeats, including TAACCC (TRS), degenerate copies
FT                   of TRS motifs and a 14-bp repeat, TAGGGCTGCGGCCC"
FT   misc_signal     938..998
FT                   /note="pac2 motif"
FT   misc_feature    1009
FT                   /note="right genome terminus (...ACA)"
XX
SQ   Sequence 1272 BP; 346 A; 455 C; 222 G; 249 T; 0 other;
     aagcttaaac tgaggtcaca cacgacttta attacggcaa cgcaacagct gtaagctgca        60
     ggaaagatac gatcgtaagc aaatgtagtc ctacaatcaa gcgaggttgt agacgttacc       120
     tacaatgaac tacacctcta agcataacct gtcgggcaca gtgagacacg cagccgtaaa       180
     ttcaaaactc aacccaaacc gaagtctaag tctcacccta atcgtaacag taaccctaca       240
     actctaatcc tagtccgtaa ccgtaacccc aatcctagcc cttagcccta accctagccc       300
     taaccctagc tctaacctta gctctaactc tgaccctagg cctaacccta agcctaaccc       360
     taaccgtagc tctaagttta accctaaccc taaccctaac catgaccctg accctaaccc       420
     tagggctgcg gccctaaccc tagccctaac cctaacccta atcctaatcc tagccctaac       480
     cctagggctg cggccctaac cctagcccta accctaaccc taaccctagg gctgcggccc       540
     taaccctaac cctagggctg cggcccgaac cctaacccta accctaaccc taaccctagg       600
     gctgcggccc taaccctaac cctagggctg cggccctaac cctaacccta gggctgcggc       660
     ccgaacccta accctaaccc taaccctagg gctgcggccc taaccctaac cctagggctg       720
     cggccctaac cctaacccta actctagggc tgcggcccta accctaaccc taaccctaac       780
     cctagggctg cggcccgaac cctagcccta accctaaccc tgaccctgac cctaacccta       840
     accctaaccc taaccctaac cctaacccta accctaaccc taaccctaac cctaacccta       900
     accctaaccc taaccctaac cctaaccccg cccccactgg cagccaatgt cttgtaatgc       960
     cttcaaggca ctttttctgc gagccgcgcg cagcactcag tgaaaaacaa gtttgtgcac      1020
     gagaaagacg ctgccaaacc gcagctgcag catgaaggct gagtgcacaa ttttggcttt      1080
     agtcccataa aggcgcggct tcccgtagag tagaaaaccg cagcgcggcg cacagagcga      1140
     aggcagcggc tttcagactg tttgccaagc gcagtctgca tcttaccaat gatgatcgca      1200
     agcaagaaaa atgttctttc ttagcatatg cgtggttaat cctgttgtgg tcatcactaa      1260
     gttttcaagc tt                                                          1272
//
```
The output is a standard EMBOSS report file.
The results can be output in one of several styles by using the command-line qualifier -rformat xxx, where 'xxx' is the name of the required format. The available format names are: embl, genbank, gff, pir, swiss, trace, listfile, dbmotif, diffseq, excel, feattable, motif, regions, seqtable, simple, srs, table, tagseq.
See http://www.uk.embnet.org/Software/EMBOSS/Themes/ReportFormats.html for further information on report formats.
By default etandem writes a 'table' report file.
```
########################################
# Program: etandem
# Rundate: Thu Nov 07 14:33:07 2002
# Report_format: table
# Report_file: hhtetra.tan
########################################

#=======================================
#
# Sequence: HHTETRA     from: 1   to: 1272
# HitCount: 5
#
# Threshold: 20
# Minrepeat: 6
# Maxrepeat: 6
# Mismatch: No
# Uniform: No
#
#=======================================

  Start     End   Score    Size   Count Identity Consensus
    793     936     120       6      24     93.8 acccta
    283     420      90       6      23     84.8 taaccc
    432     485      38       6       9     90.7 ccctaa
    494     529      26       6       6     94.4 ccctaa
    568     597      24       6       5    100.0 aaccct

#---------------------------------------
#---------------------------------------
```
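The columns of the table report can be read back and checked against the scoring rules. The sketch below parses the five hit rows shown above and tests three relationships that these rows happen to satisfy: Count = length / Size; Score = (length - Size) - 2 × mismatches, which follows from +1/-1 scoring with the first copy unscored; and Identity = 100 × (length - mismatches) / length. The identity formula is inferred from this example, not taken from the etandem source. The `Hit`, `parse_rows` and `implied_mismatches` names are hypothetical.

```python
from dataclasses import dataclass

# The five hit rows from the example hhtetra.tan report above.
REPORT_ROWS = """\
    793     936     120       6      24     93.8 acccta
    283     420      90       6      23     84.8 taaccc
    432     485      38       6       9     90.7 ccctaa
    494     529      26       6       6     94.4 ccctaa
    568     597      24       6       5    100.0 aaccct
"""


@dataclass
class Hit:
    start: int
    end: int
    score: int
    size: int
    count: int
    identity: float
    consensus: str


def parse_rows(text: str) -> list[Hit]:
    """Parse rows of 'Start End Score Size Count Identity Consensus'; skip other lines."""
    hits = []
    for line in text.splitlines():
        f = line.split()
        if len(f) != 7:
            continue
        hits.append(Hit(int(f[0]), int(f[1]), int(f[2]), int(f[3]),
                        int(f[4]), float(f[5]), f[6]))
    return hits


def implied_mismatches(hit: Hit) -> int:
    """Mismatches implied by score = (length - size) - 2 * mismatches."""
    length = hit.end - hit.start + 1
    diff = (length - hit.size) - hit.score
    if diff % 2:
        raise ValueError("score does not fit the +1/-1 model")
    return diff // 2


for hit in parse_rows(REPORT_ROWS):
    length = hit.end - hit.start + 1
    mismatches = implied_mismatches(hit)
    identity = 100.0 * (length - mismatches) / length
    count_ok = length % hit.size == 0 and length // hit.size == hit.count
    identity_ok = abs(identity - hit.identity) <= 0.05 + 1e-9  # printed to 1 decimal
    print(hit.start, hit.end, count_ok, mismatches, identity_ok)
# implied mismatches per row: 9, 21, 5, 2, 0; both checks are True for every row
```

Because `parse_rows` skips lines that do not have exactly seven fields, the same function can be pointed at a whole table report file, although it assumes no header or comment line happens to split into seven fields.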
Program name | Description |
---|---|
einverted | Finds DNA inverted repeats |
equicktandem | Finds tandem repeats |
palindrome | Looks for inverted repeats in a nucleotide sequence |
This application was modified for inclusion in EMBOSS by Peter Rice (pmr@sanger.ac.uk), Informatics Division, The Sanger Centre, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SA, UK.