textsearch

 

Function

Search sequence documentation text. SRS and Entrez are faster!

Description

This is a small utility that searches for words in the description text of a sequence and, for each match, lists the sequence's name and/or description. NB: it only searches the description line of the annotation, not the full annotation.

Usage

Here is a sample session with textsearch

Search for 'lactose':


% textsearch tsw:* 'lactose' 
Search sequence documentation text. SRS and Entrez are faster!
Output sequence details to a file [100k_rat.textsearch]: 

Example 2

Search for 'lactose' or 'permease' in E.coli proteins:


% textsearch tsw:*_ecoli 'lactose | permease' 
Search sequence documentation text. SRS and Entrez are faster!
Output sequence details to a file [laci_ecoli.textsearch]: 

Example 3

Output a search for 'lacz' formatted with HTML to a file:


% textsearch tembl:* 'lacz' -html -outfile embl.lacz.html 
Search sequence documentation text. SRS and Entrez are faster!

Command line arguments

   Mandatory qualifiers:
  [-sequence]          seqall     Sequence database USA
  [-pattern]           string     The search pattern is a regular expression.
                                  Use a | to indicate OR.
                                  For example:
                                  human|mouse
                                  will find text with either 'human' OR
                                  'mouse' in the text
  [-outfile]           outfile    If you enter the name of a file here then
                                  this program will write the sequence details
                                  into that file.

   Optional qualifiers:
   -casesensitive      boolean    Do a case-sensitive search
   -html               boolean    Format output as an HTML table

   Advanced qualifiers:
   -only               boolean    This is a way of shortening the command line
                                  if you only want a few things to be
                                  displayed. Instead of specifying:
                                  '-nohead -nousa -noacc -nodesc'
                                  to get only the name output, you can specify
                                  '-only -name'
   -heading            boolean    Display column headings
   -usa                boolean    Display the USA of the sequence
   -accession          boolean    Display 'accession' column
   -name               boolean    Display 'name' column
   -description        boolean    Display 'description' column

   General qualifiers:
  -help                boolean    Report command line options. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
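
The matching rule described for -pattern and -casesensitive can be sketched in Python. This is only an illustration, not the textsearch implementation: it assumes that Python's 're' module behaves like the program's regular expressions for simple patterns such as 'human|mouse'. The function name 'matches' is hypothetical, and the two descriptions are taken from the example output in this document.

```python
import re


def matches(pattern: str, description: str, case_sensitive: bool = False) -> bool:
    """Return True if the regular expression is found anywhere in the description.

    The search is case-insensitive unless case_sensitive is True, mirroring
    the documented default of -casesensitive (No).
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(pattern, description, flags) is not None


# Description lines copied from the example output in this document.
descriptions = {
    "LACI_ECOLI": "LACTOSE OPERON REPRESSOR.",
    "LACY_ECOLI": "LACTOSE PERMEASE (LACTOSE-PROTON SYMPORT).",
}

# '|' means OR: a description matches if either alternative is present.
hits = [name for name, text in descriptions.items()
        if matches("lactose|permease", text)]
print(hits)
```

With a case-sensitive search, the lower-case pattern 'lactose' would not match these upper-case description lines.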


Mandatory qualifiers

  [-sequence] (Parameter 1)
      Description:     Sequence database USA
      Allowed values:  Readable sequence(s)
      Default:         Required

  [-pattern] (Parameter 2)
      Description:     The search pattern is a regular expression. Use a | to indicate OR. For example: human|mouse will find text with either 'human' OR 'mouse' in the text
      Allowed values:  Any string is accepted
      Default:         An empty string is accepted

  [-outfile] (Parameter 3)
      Description:     If you enter the name of a file here then this program will write the sequence details into that file.
      Allowed values:  Output file
      Default:         <sequence>.textsearch

Optional qualifiers

  -casesensitive
      Description:     Do a case-sensitive search
      Allowed values:  Boolean value Yes/No
      Default:         No

  -html
      Description:     Format output as an HTML table
      Allowed values:  Boolean value Yes/No
      Default:         No

Advanced qualifiers

  -only
      Description:     This is a way of shortening the command line if you only want a few things to be displayed. Instead of specifying: '-nohead -nousa -noacc -nodesc' to get only the name output, you can specify '-only -name'
      Allowed values:  Boolean value Yes/No
      Default:         No

  -heading
      Description:     Display column headings
      Allowed values:  Boolean value Yes/No
      Default:         @(!$(only))

  -usa
      Description:     Display the USA of the sequence
      Allowed values:  Boolean value Yes/No
      Default:         @(!$(only))

  -accession
      Description:     Display 'accession' column
      Allowed values:  Boolean value Yes/No
      Default:         @(!$(only))

  -name
      Description:     Display 'name' column
      Allowed values:  Boolean value Yes/No
      Default:         @(!$(only))

  -description
      Description:     Display 'description' column
      Allowed values:  Boolean value Yes/No
      Default:         @(!$(only))
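
The default '@(!$(only))' means that each display qualifier is on unless -only is given, and that a qualifier set explicitly overrides that default. The following Python sketch shows how the displayed items could be resolved under that reading. It is an illustration only: 'resolve_display' and 'COLUMNS' are hypothetical names, and the program's own option handling is not shown in this document.

```python
COLUMNS = ("heading", "usa", "accession", "name", "description")


def resolve_display(only: bool = False, **explicit: bool) -> dict:
    """Resolve which items are displayed.

    Each display qualifier defaults to `not only` (the '@(!$(only))' default);
    a qualifier passed explicitly overrides that default.
    """
    unknown = set(explicit) - set(COLUMNS)
    if unknown:
        raise ValueError(f"unknown qualifier(s): {sorted(unknown)}")
    return {col: explicit.get(col, not only) for col in COLUMNS}


# Equivalent of '-only -name': everything off except the name column.
print(resolve_display(only=True, name=True))

# Equivalent of '-nodesc': everything on except the description column.
print(resolve_display(description=False))
```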

Input file format

textsearch reads one or more normal sequence USAs.

Input files for usage example

'tsw:*' matches all sequence entries in the example protein database 'tsw'

Input files for usage example 3

'tembl:*' matches all sequence entries in the example nucleic acid database 'tembl'

Output file format

Output files for usage example

File: 100k_rat.textsearch

# Search for: lactose
tsw-id:LACI_ECOLI LACI_ECOLI    P03023	LACTOSE OPERON REPRESSOR.
tsw-id:LACY_ECOLI LACY_ECOLI    P02920	LACTOSE PERMEASE (LACTOSE-PROTON SYMPORT).

Output files for usage example 2

File: laci_ecoli.textsearch

# Search for: lactose | permease
tsw-id:LACI_ECOLI LACI_ECOLI    P03023	LACTOSE OPERON REPRESSOR.
tsw-id:LACY_ECOLI LACY_ECOLI    P02920	LACTOSE PERMEASE (LACTOSE-PROTON SYMPORT).

Output files for usage example 3

File: embl.lacz.html

Search for: lacz
tembl-id:ECLAC ECLAC    J01636  E.coli lactose operon with lacI, lacZ, lacY and lacA genes.
tembl-id:ECLACZ ECLACZ    V00296  E. coli gene lacZ coding for beta-galactosidase (EC 3.2.1.23).

The first column is the USA of each sequence, followed by its name and accession number. The remaining text is the description line of the sequence.

When the -html qualifier is specified, the output is wrapped in HTML tags, ready for inclusion in a Web page. Note that tags such as <HTML>, <BODY>, </BODY> and </HTML> are not output by this program, as the table is expected to form only part of the contents of a web page. The rest of the web page must be supplied by the user.
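
Because the -html output is only a fragment, it has to be embedded in a page before a browser can display it on its own. The Python sketch below shows one way to do that. The function 'wrap_fragment' is hypothetical, and the fragment used here is a made-up stand-in: the exact markup written by textsearch is not reproduced in this document and may differ.

```python
from html import escape


def wrap_fragment(fragment: str, title: str = "textsearch results") -> str:
    """Embed an HTML fragment (such as -html output) in a complete page."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        f"<head><title>{escape(title)}</title></head>\n"
        "<body>\n"
        f"{fragment.strip()}\n"
        "</body>\n"
        "</html>\n"
    )


# Hypothetical stand-in for the contents of embl.lacz.html.
fragment = "<table><tr><td>tembl-id:ECLAC</td><td>ECLAC</td></tr></table>"
page = wrap_fragment(fragment, title="Search for: lacz")
print(page)
```

In practice the fragment would be read from the file named with -outfile, for example embl.lacz.html in usage example 3.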

The lines of output information are guaranteed not to have trailing whitespace. So if '-nodesc' is used, there will not be any whitespace after the ID name.
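
The plain-text output is easy to read from a script. The Python sketch below parses the default layout shown in the example output files, assuming four whitespace-separated fields (USA, name, accession, description) and that lines starting with '#' are comments. The names 'Hit' and 'parse_hits' are hypothetical, and the sketch does not handle output where columns have been switched off with qualifiers such as '-nodesc'.

```python
from typing import Iterator, NamedTuple


class Hit(NamedTuple):
    usa: str
    name: str
    accession: str
    description: str


def parse_hits(text: str) -> Iterator[Hit]:
    """Yield one Hit per result line, skipping blank and '#' comment lines."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        # Split on runs of whitespace at most three times, so the
        # description keeps its internal spaces.
        usa, name, accession, description = line.split(None, 3)
        yield Hit(usa, name, accession, description)


# Lines copied from the example output file 100k_rat.textsearch.
sample = (
    "# Search for: lactose\n"
    "tsw-id:LACI_ECOLI LACI_ECOLI    P03023\tLACTOSE OPERON REPRESSOR.\n"
    "tsw-id:LACY_ECOLI LACY_ECOLI    P02920\tLACTOSE PERMEASE (LACTOSE-PROTON SYMPORT).\n"
)

for hit in parse_hits(sample):
    print(hit.name, hit.accession, hit.description, sep=" | ")
```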

Data files

None.

Notes

This is a rather slow way to search for text in databases. If you are searching for text in public databases, you should consider using either Entrez (http://www.ncbi.nlm.nih.gov/Entrez/) or SRS (for example http://srs.hgmp.mrc.ac.uk/ or http://www.sanger.ac.uk/srs6/).

References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.
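
Since the exit status is always 0, a calling script cannot use it to tell whether anything was found, and has to inspect the output instead. The Python sketch below counts result lines in plain-text output, assuming that lines starting with '#' are comments, as in the example output files. The function 'count_hits' and the 'nosuchword' output are hypothetical, and the sketch does not apply to -html output.

```python
def count_hits(output_text: str) -> int:
    """Count result lines, ignoring blank lines and '#' comment lines."""
    return sum(
        1
        for line in output_text.splitlines()
        if line.strip() and not line.startswith("#")
    )


# Hypothetical output of a search that found nothing: assumed to contain
# only the comment line.
no_match = "# Search for: nosuchword\n"

# Lines copied from the example output file 100k_rat.textsearch.
two_matches = (
    "# Search for: lactose\n"
    "tsw-id:LACI_ECOLI LACI_ECOLI    P03023\tLACTOSE OPERON REPRESSOR.\n"
    "tsw-id:LACY_ECOLI LACY_ECOLI    P02920\tLACTOSE PERMEASE (LACTOSE-PROTON SYMPORT).\n"
)

print(count_hits(no_match), count_hits(two_matches))
```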

Known bugs

None noted.

See also

Program name  Description
abiview       Reads ABI file and displays the trace
cirdna        Draws circular maps of DNA constructs
infoalign     Information on a multiple sequence alignment
infoseq       Displays some simple information about sequences
lindna        Draws linear maps of DNA constructs
pepnet        Displays proteins as a helical net
pepwheel      Shows protein sequences as helices
prettyplot    Displays aligned sequences, with colouring and boxing
prettyseq     Output sequence with translated ranges
remap         Display a sequence with restriction cut sites, translation etc
seealso       Finds programs sharing group names
showalign     Displays a multiple sequence alignment
showdb        Displays information on the currently available databases
showfeat      Show features of a sequence
showseq       Display a sequence with features, translation etc
sixpack       Display a DNA sequence with 6-frame translation and ORFs
tfm           Displays a program's help documentation manual
whichdb       Search all databases for an entry
wossname      Finds programs by keywords in their one-line documentation

Author(s)

This application was written by Gary Williams (gwilliam@hgmp.mrc.ac.uk)

History

Finished.

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments