plotcon
The similarity is calculated by moving a window of a specified length along the aligned sequences. Within the window, the similarity of any one position is taken to be the average of all the possible pairwise scores of the bases or residues at that position. The pairwise scores are taken from the specified similarity matrix. The average of the position similarities within the window is plotted.
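The windowing described above can be sketched in Python. This is a minimal, unweighted illustration, not plotcon's source: the toy score table `TOY_MATRIX` and the helper names are hypothetical, and scoring gaps or unknown pairs as 0 is an assumption.

```python
from itertools import combinations

# Hypothetical toy substitution scores; a real run would use a full matrix such as EBLOSUM62.
TOY_MATRIX = {("A", "A"): 4, ("A", "G"): 0, ("G", "G"): 6}


def pair_score(a: str, b: str, matrix: dict) -> float:
    """Symmetric lookup; pairs missing from the table (e.g. gaps) score 0 (an assumption)."""
    return matrix.get((a, b), matrix.get((b, a), 0))


def column_similarity(column: str, matrix: dict) -> float:
    """Average of all pairwise scores at one position (needs at least two sequences)."""
    pairs = list(combinations(column, 2))
    return sum(pair_score(a, b, matrix) for a, b in pairs) / len(pairs)


def windowed_similarity(seqs: list[str], winsize: int, matrix: dict) -> list[float]:
    """Slide a window of `winsize` columns and average the position similarities inside it."""
    columns = ["".join(col) for col in zip(*seqs)]
    col_scores = [column_similarity(c, matrix) for c in columns]
    return [
        sum(col_scores[i:i + winsize]) / winsize
        for i in range(len(col_scores) - winsize + 1)
    ]


# Three short aligned sequences, window of 2 columns.
profile = windowed_similarity(["AAGG", "AAGA", "AGGG"], 2, TOY_MATRIX)
```

Each value in `profile` corresponds to one window position; a larger `winsize` averages over more columns and so gives a smoother curve, matching the behaviour described for `-winsize`.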
The plot is useful for identifying the regions of an alignment where its quality is good (well conserved) or poor.
The average similarity is calculated by:

$$\text{Av. Sim.} = \frac{\sum \left( M_{ij}\,w_i + M_{ji}\,w_j \right)}{(N_{seq} \cdot W_{size}) \cdot ((N_{seq} - 1) \cdot W_{size})}$$

where:

$\sum$ - sum over column × window size
$w$ - sequence weighting
$M$ - matrix comparison table
$i, j$ - with respect to residue $i$ or $j$
$N_{seq}$ - number of sequences in the alignment
$W_{size}$ - window size
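The average-similarity formula can be evaluated literally for a single window as sketched below. The formula does not say exactly which pairs the sum runs over, so this sketch assumes one reading: every column in the window and every unordered sequence pair (i, j) with i < j, with the second weight taken as w_j. The function name, the toy table `TOY` and the weights are hypothetical, and the result is not guaranteed to match plotcon's own output.

```python
from itertools import combinations


def window_av_sim(window_cols, weights, matrix):
    """Evaluate Av. Sim. for one window.

    window_cols: alignment columns (strings, one residue per sequence)
    weights:     per-sequence weights w, in the same order as the column strings
    matrix:      nested dict; matrix[a][b] is the comparison score M for a vs b
    """
    nseq = len(weights)
    wsize = len(window_cols)
    total = 0.0
    for col in window_cols:
        # Assumed reading of the sum: every column in the window, every pair i < j.
        for i, j in combinations(range(nseq), 2):
            a, b = col[i], col[j]
            total += matrix[a][b] * weights[i] + matrix[b][a] * weights[j]
    # Denominator exactly as given: (Nseq*Wsize)*((Nseq-1)*Wsize)
    return total / ((nseq * wsize) * ((nseq - 1) * wsize))


# Hypothetical two-letter score table and weights, for illustration only.
TOY = {"A": {"A": 4, "G": 0}, "G": {"A": 0, "G": 6}}
example = window_av_sim(["AA", "AG"], [1.0, 0.5], TOY)
```

Writing both M_ij*w_i and M_ji*w_j keeps the formula correct even for an asymmetric comparison table, since each residue of the pair is scored from its own side with its own sequence weight.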
% plotcon -sformat msf ../../data/globins.msf -graph cps
Plots the quality of conservation of a sequence alignment
Window size [4]:
Created plotcon.ps
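To run the same example non-interactively from a script, one option is to build the command line in Python and pass every prompted value explicitly, since EMBOSS does not prompt for qualifiers already given on the command line. This sketch uses only the qualifiers shown on this page; the helper name `plotcon_command` is hypothetical.

```python
def plotcon_command(msf_path: str, winsize: int = 4, graph: str = "cps") -> list[str]:
    """Build a plotcon argument list using only qualifiers documented on this page.

    Supplying -winsize and -graph explicitly avoids the interactive prompts.
    """
    return [
        "plotcon",
        "-sformat", "msf",
        msf_path,
        "-winsize", str(winsize),
        "-graph", graph,
    ]
```

Assuming `plotcon` is on your `PATH`, `subprocess.run(plotcon_command("../../data/globins.msf"), check=True)` would then reproduce the run above without the `Window size [4]:` prompt.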
Mandatory qualifiers (* if not always prompted):
  [-msf]      seqset    File containing a sequence alignment
   -winsize   integer   Number of columns to average alignment quality over. The larger this value is, the smoother the plot will be.
*  -graph     xygraph   Graph type
*  -outfile   outfile   Display as data

Optional qualifiers:
   -scorefile matrix    This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation.

Advanced qualifiers:
   -data      boolean   Output the match data to a file instead of plotting it

General qualifiers:
   -help      boolean   Report command line options. More information on associated and general qualifiers can be found with -help -verbose
Mandatory qualifiers | Description | Allowed values | Default |
---|---|---|---|
[-msf] (Parameter 1) | File containing a sequence alignment | Readable sequences | Required |
-winsize | Number of columns to average alignment quality over. The larger this value is, the smoother the plot will be. | Any integer value | 4 |
-graph | Graph type | EMBOSS has a list of known devices, including postscript, ps, hpgl, hp7470, hp7580, meta, colourps, cps, xwindows, x11, tektronics, tekt, tek4107t, tek, none, null, text, data, xterm, png | EMBOSS_GRAPHICS value, or x11 |
-outfile | Display as data | Output file | <sequence>.plotcon |
Optional qualifiers | Description | Allowed values | Default |
-scorefile | This is the scoring matrix file used when comparing sequences. By default it is the file 'EBLOSUM62' (for proteins) or the file 'EDNAFULL' (for nucleic sequences). These files are found in the 'data' directory of the EMBOSS installation. | Comparison matrix file in EMBOSS data path | EBLOSUM62 for protein, EDNAFULL for DNA |
Advanced qualifiers | Description | Allowed values | Default |
-data | Output the match data to a file instead of plotting it | Boolean value Yes/No | No |
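If you want to inspect or reuse a `-scorefile` matrix outside EMBOSS, a small parser is enough. This sketch assumes the plain-text layout of EMBOSS comparison matrix files such as EBLOSUM62 ('#' comment lines, a header row of residue codes, then one row per residue starting with its code); verify against a real copy fetched with `embossdata -fetch -file EBLOSUM62`. The function name and the 3x3 sample are illustrative only.

```python
def parse_emboss_matrix(text: str) -> dict[str, dict[str, int]]:
    """Parse a comparison matrix laid out as: '#' comments, a header row of
    residue codes, then one row per residue beginning with its code."""
    columns = None
    matrix: dict[str, dict[str, int]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if columns is None:
            columns = fields  # first non-comment line holds the column codes
            continue
        row, scores = fields[0], fields[1:]
        matrix[row] = {col: int(s) for col, s in zip(columns, scores)}
    return matrix


# Small excerpt in that layout (illustrative, not a complete matrix).
SAMPLE = """\
# comment lines start with '#'
   A  R  N
A  4 -1 -2
R -1  5  0
N -2  0  6
"""
m = parse_emboss_matrix(SAMPLE)
```

The nested-dict result can be passed directly as the `matrix` argument of a nested-lookup scoring function, e.g. `m["A"]["R"]`.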
!!AA_MULTIPLE_ALIGNMENT 1.0
 ../data/globins.msf MSF: 164 Type: P 25/06/01 CompCheck: 4278 ..

  Name: HBB_HUMAN Len: 164 Check: 6914 Weight: 0.14
  Name: HBB_HORSE Len: 164 Check: 6007 Weight: 0.15
  Name: HBA_HUMAN Len: 164 Check: 3921 Weight: 0.15
  Name: HBA_HORSE Len: 164 Check: 4770 Weight: 0.19
  Name: MYG_PHYCA Len: 164 Check: 7930 Weight: 0.23
  Name: GLB5_PETMA Len: 164 Check: 1857 Weight: 0.21
  Name: LGB2_LUPLU Len: 164 Check: 2879 Weight: 0.10

//

           1                                                50
HBB_HUMAN  ~~~~~~~~VHLTPEEKSAVTALWGKVN.VDEVGGEALGR.LLVVYPWTQR
HBB_HORSE  ~~~~~~~~VQLSGEEKAAVLALWDKVN.EEEVGGEALGR.LLVVYPWTQR
HBA_HUMAN  ~~~~~~~~~~~~~~VLSPADKTNVKAA.WGKVGAHAGEYGAEALERMFLS
HBA_HORSE  ~~~~~~~~~~~~~~VLSAADKTNVKAA.WSKVGGHAGEYGAEALERMFLG
MYG_PHYCA  ~~~~~~~VLSEGEWQLVLHVWAKVEAD.VAGHGQDILIR.LFKSHPETLE
GLB5_PETMA PIVDTGSVAPLSAAEKTKIRSAWAPVYSTYETSGVDILVKFFTSTPAAQE
LGB2_LUPLU ~~~~~~~~GALTESQAALVKSSWEEFNANIPKHTHRFFILVLEIAPAAKD

           51                                              100
HBB_HUMAN  FFESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLDNLKGTFATLSE
HBB_HORSE  FFDSFGDLSNPGAVMGNPKVKAHGKKVLHSFGEGVHHLDNLKGTFAALSE
HBA_HUMAN  FPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSD
HBA_HORSE  FPTTKTYFPHFDLSHGSAQVKAHGKKVGDALTLAVGHLDDLPGALSNLSD
MYG_PHYCA  KFDRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKGHHEAELKPLAQ
GLB5_PETMA FFPKFKGLTTADQLKKSADVRWHAERIINAVNDAVASMDDTEKMSMKLRD
LGB2_LUPLU LFSFLKGTSEVPQNNPELQAHAGKVFKLVYEAAIQLQVTGVVVTDATLKN

           101                                             150
HBB_HUMAN  LHCDKLH..VDPENFRLLGNVLVCVLAHHFGKEFTPPVQAAYQKVVAGVA
HBB_HORSE  LHCDKLH..VDPENFRLLGNVLVVVLARHFGKDFTPELQASYQKVVAGVA
HBA_HUMAN  LHAHKLR..VDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVS
HBA_HORSE  LHAHKLR..VDPVNFKLLSHCLLSTLAVHLPNDFTPAVHASLDKFLSSVS
MYG_PHYCA  SHATKHK..IPIKYLEFISEAIIHVLHSRHPGDFGADAQGAMNKALELFR
GLB5_PETMA LSGKHAK..SFQVDPQYFKVLAAVIADTVAAGDAGFEKLMSMICILLRSA
LGB2_LUPLU LGSVHVSKGVADAHFPVVKEAILKTIKEVVGAKWSEELNSAWTIAYDELA

           151          164
HBB_HUMAN  NALAHKYH~~~~~~
HBB_HORSE  NALAHKYH~~~~~~
HBA_HUMAN  TVLTSKYR~~~~~~
HBA_HORSE  TVLTSKYR~~~~~~
MYG_PHYCA  KDIAAKYKELGYQG
GLB5_PETMA Y~~~~~~~~~~~~~
LGB2_LUPLU IVIKKEMNDAA~~~
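The MSF input above carries both the aligned sequences and a per-sequence Weight, which is the kind of value the w term in the average-similarity formula refers to. A minimal reader might look like the sketch below. It is a hypothetical helper, not part of EMBOSS: it ignores checksums, assumes sequence names contain no spaces, and keeps the '.' and '~' gap characters as they appear. The sample is an excerpt (two sequences, first block) of the globins.msf file shown above.

```python
import re


def parse_msf(text: str) -> tuple[dict[str, float], dict[str, str]]:
    """Return (weights, sequences) from MSF text.

    Weights come from the 'Name: ... Weight:' header lines; each sequence is the
    concatenation of its rows after the '//' separator.
    """
    header, _, body = text.partition("//")
    weights = {
        m.group(1): float(m.group(2))
        for m in re.finditer(r"Name:\s*(\S+).*?Weight:\s*([\d.]+)", header)
    }
    seqs = {name: "" for name in weights}
    for line in body.splitlines():
        fields = line.split()
        # Column-number lines (e.g. "1 50") start with a token that is not a name.
        if fields and fields[0] in seqs:
            seqs[fields[0]] += "".join(fields[1:])
    return weights, seqs


SAMPLE = """\
!!AA_MULTIPLE_ALIGNMENT 1.0
  Name: HBB_HUMAN Len: 164 Check: 6914 Weight: 0.14
  Name: HBB_HORSE Len: 164 Check: 6007 Weight: 0.15
//
           1                                                50
HBB_HUMAN  ~~~~~~~~VHLTPEEKSAVTALWGKVN.VDEVGGEALGR.LLVVYPWTQR
HBB_HORSE  ~~~~~~~~VQLSGEEKAAVLALWDKVN.EEEVGGEALGR.LLVVYPWTQR
"""
weights, seqs = parse_msf(SAMPLE)
```

Run on the full file, every sequence should come back with the 164 columns given by `Len:`.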
EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.
To see the available EMBOSS data files, run:
% embossdata -showall
To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:
% embossdata -fetch -file Exxx.dat
Users can provide their own data files in their own directories. Project specific files can be put in the current directory, or for tidier directory listings in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata".
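The directories are searched in the following order:

. (your current directory)
.embossdata (under your current directory)
~/ (your home directory)
~/.embossdata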
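The lookup of user-supplied data files can be modelled as a first-match search over an ordered list of directories. This sketch assumes the order: current directory, ./.embossdata, home directory, ~/.embossdata, and then, as an assumed final fallback, the directory named by EMBOSS_DATA. The function names are hypothetical and this is not EMBOSS's own code.

```python
import os
from pathlib import Path
from typing import Optional


def search_dirs(cwd: Path, home: Path, emboss_data: Optional[str]) -> list:
    """Candidate directories, in the order they are searched."""
    dirs = [cwd, cwd / ".embossdata", home, home / ".embossdata"]
    if emboss_data:  # assumed final fallback: the distributed data directory
        dirs.append(Path(emboss_data))
    return dirs


def find_data_file(name: str, dirs: list) -> Optional[Path]:
    """Return the first existing copy of `name`, or None if no directory has it."""
    for d in dirs:
        candidate = d / name
        if candidate.is_file():
            return candidate
    return None


# Example: a local copy of the default protein matrix wins over the shared one.
found = find_data_file(
    "EBLOSUM62",
    search_dirs(Path.cwd(), Path.home(), os.environ.get("EMBOSS_DATA")),
)
```

Because the search stops at the first match, a modified file fetched into the current directory with `embossdata -fetch` takes precedence over the distributed copy.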
Program name | Description |
---|---|
emma | Multiple alignment program - interface to ClustalW program |
infoalign | Information on a multiple sequence alignment |
prettyplot | Displays aligned sequences, with colouring and boxing |
showalign | Displays a multiple sequence alignment |
tranalign | Align nucleic coding regions given the aligned proteins |