README 

The script oligoselect.py batch-processes genes/ORFs and finds oligos
of a user-specified length.

More documentation on this program and on fasta.py, generated by
HappyDoc, is in the doc directory.


THE QUICK START

Use run_oligoselect to run oligoselect.py. All of the parameters are
set in run_oligoselect, so to change a run, just edit the parameters
in that file. (Remember to use a different output file name for each
run.)

The script is run from the command line as follows:

  ./oligoselect.py input_file output_file oligo_length blast_database gc_min gc_max percent_match num_matches
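The eight positional arguments map to typed values in the order shown above. The sketch below is a minimal illustration of that mapping, not oligoselect.py's own argument handling: OligoParams and parse_params are hypothetical names, and the sanity checks (positive length, gc_min not above gc_max, percent_match between 0 and 100) are assumptions rather than documented behavior.

```python
from typing import List, NamedTuple


class OligoParams(NamedTuple):
    """Hypothetical container for oligoselect.py's eight positional arguments."""
    input_file: str
    output_file: str
    oligo_length: int
    blast_database: str
    gc_min: float
    gc_max: float
    percent_match: float
    num_matches: int


def parse_params(argv: List[str]) -> OligoParams:
    """Parse argv (including the script name at argv[0]) in the documented order."""
    if len(argv) != 9:
        raise SystemExit(
            "usage: oligoselect.py input_file output_file oligo_length "
            "blast_database gc_min gc_max percent_match num_matches"
        )
    params = OligoParams(
        input_file=argv[1],
        output_file=argv[2],
        oligo_length=int(argv[3]),
        blast_database=argv[4],
        gc_min=float(argv[5]),
        gc_max=float(argv[6]),
        percent_match=float(argv[7]),
        num_matches=int(argv[8]),
    )
    # Assumed sanity checks; the real script may not perform them.
    if params.oligo_length <= 0:
        raise ValueError("oligo_length must be positive")
    if params.gc_min > params.gc_max:
        raise ValueError("gc_min must not exceed gc_max")
    if not 0 <= params.percent_match <= 100:
        raise ValueError("percent_match must be between 0 and 100")
    return params
```

Called as parse_params(sys.argv), this rejects a wrong argument count with a usage message before any BLAST work starts.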

Information on each of these parameters:

input_file = the file of sequences you want to find oligos for; the
file format is described below.

output_file = the file to write the results to

oligo_length = how long you want your oligos to be, in base pairs

blast_database = the database the oligos are BLASTed against to make
sure they are unique. This database must already be formatted for
BLAST (see below).

gc_min = the minimum GC content you want your oligos to have

gc_max = the maximum GC content you want your oligos to have

percent_match = the maximum percent match allowed for an oligo to
count as unique. In the BLAST analysis, the script finds the number of
base pairs that match between the oligo and its second-best hit (the
best hit is the sequence itself); e.g., 35 bp matching out of a 50 bp
oligo is a percent_match of 70. The recommended value is 80.

num_matches = the number of matches for each gene that you want returned
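The percent_match arithmetic described above can be written out directly. This is a minimal sketch under stated assumptions: percent_match and is_unique are hypothetical helper names, the number of matching base pairs is assumed to come already parsed from the BLAST hit, and treating a value exactly at the cutoff as unique (<=) is an assumption, since the README does not say whether the cutoff is inclusive.

```python
def percent_match(matching_bp: int, oligo_length: int) -> float:
    """Percent of the oligo's bases that match its second-best BLAST hit."""
    if oligo_length <= 0:
        raise ValueError("oligo_length must be positive")
    return 100.0 * matching_bp / oligo_length


def is_unique(second_hit_matching_bp: int, oligo_length: int,
              max_percent_match: float = 80.0) -> bool:
    """Assumed rule: unique if the second-best hit is at or below the cutoff.

    The best hit is taken to be the oligo's own gene and is not passed in.
    """
    return percent_match(second_hit_matching_bp, oligo_length) <= max_percent_match
```

With the README's example, percent_match(35, 50) is 70.0, which passes the recommended cutoff of 80.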




AN EXAMPLE 

As distributed, run_oligoselect uses the example input file testgene
with the database s_putrefaciens. Run this first on your system to make
sure everything is working. It will probably take a few minutes to get
a result; this is normal, and you should expect your own runs to take
at least this long.

There is a bit more information in the EXAMPLE file.


INFORMATION ABOUT WHAT OLIGOSELECT.PY DOES

This script batch-processes genes/ORFs and finds oligos of a
user-specified length, so it can find oligos for many genes/ORFs at
once. It checks that each oligo has the specified GC content, that it
is unique in the genome database of interest, and that it is not
self-annealing. Among the oligos that meet these criteria, it prefers
those closest to the 5' end. Currently it searches the region that
starts 50 bp from the 5' end and ends 50 bp from the 3' end; these
limits can be modified in the script.
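The scan described above (GC filter, self-annealing filter, 50 bp margins, 5' preference) can be sketched as follows. This is an illustration, not the script's actual algorithm: the function names are hypothetical, GC content is assumed to be a percentage (the README does not state the units), the BLAST uniqueness check is left out, and self_anneals uses a deliberately crude stand-in rule (any 8 bp stretch complementary to some stretch of the same oligo) rather than a thermodynamic model.

```python
from typing import Iterator, Tuple

# Translation table for complementing DNA bases (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in seq."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def self_anneals(seq: str, stem: int = 8) -> bool:
    """Crude, hypothetical rule: True if any stem-length stretch of seq is
    complementary to some stretch of the same oligo (palindromes included)."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    return any(seq[i:i + stem] in rc for i in range(len(seq) - stem + 1))


def candidate_oligos(sequence: str, oligo_length: int, gc_min: float,
                     gc_max: float, margin: int = 50) -> Iterator[Tuple[int, str]]:
    """Yield (start, oligo) pairs that pass the GC and self-annealing filters.

    Only oligos lying at least `margin` bp from both ends are considered, and
    the scan runs 5' to 3', so the first results are closest to the 5' end.
    """
    sequence = sequence.upper()
    last_start = len(sequence) - margin - oligo_length
    for start in range(margin, last_start + 1):
        oligo = sequence[start:start + oligo_length]
        if gc_min <= gc_content(oligo) <= gc_max and not self_anneals(oligo):
            yield start, oligo
```

Because candidate_oligos is a generator that scans 5' to 3', taking the first num_matches results (e.g. with itertools.islice) gives the oligos nearest the 5' end; each would still need the BLAST uniqueness check.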

To use this script, you must 1) run it on a machine that can perform
batch BLAST queries (the BLAST tools can be downloaded from NCBI), and
2) install the database the oligos will be BLASTed against. To do this,
download the database from NCBI or elsewhere to the server where this
script and the BLAST analyses will run, then format it with BLAST's
formatdb tool, following its instructions or the formatdb_README here.
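For reference, the two BLAST steps can be driven from Python as below. This sketch assumes the legacy NCBI toolkit programs formatdb and blastall are on the PATH (newer BLAST+ releases replace them with makeblastdb and blastn, which take different flags). The helper names are hypothetical, and oligoselect.py may build its BLAST calls differently.

```python
import shutil
import subprocess
from typing import List


def formatdb_command(fasta_path: str) -> List[str]:
    """Legacy formatdb call: nucleotide database (-p F), parse SeqIds (-o T)."""
    return ["formatdb", "-i", fasta_path, "-p", "F", "-o", "T"]


def blastall_command(query_fasta: str, database: str, out_path: str) -> List[str]:
    """Legacy blastall call: blastn against `database`, tabular output (-m 8)."""
    return ["blastall", "-p", "blastn", "-d", database, "-i", query_fasta,
            "-o", out_path, "-m", "8"]


def run(cmd: List[str]) -> None:
    """Run cmd if its program is installed; raise FileNotFoundError otherwise."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    subprocess.run(cmd, check=True)
```

For example, run(formatdb_command("s_putrefaciens")) formats a FASTA file named s_putrefaciens once, before any oligoselect runs against it.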


INPUT FILE FORMAT

The input file should be tab-delimited, with the following columns:
1: gene or ORF name
2: accession number
3: start site of the gene or ORF
4: end site
5: sequence
If your data are in some other format, just change the column
assignments in the input file section of the script.
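A reader for the five-column input format might look like the sketch below. GeneRecord and read_genes are hypothetical names; the sketch assumes there is no header line, that start and end sites are integers, and that blank lines can be skipped. Any extra columns beyond the fifth are ignored.

```python
from typing import Iterator, NamedTuple, TextIO


class GeneRecord(NamedTuple):
    name: str
    accession: str
    start: int
    end: int
    sequence: str


def read_genes(handle: TextIO) -> Iterator[GeneRecord]:
    """Yield one GeneRecord per non-blank tab-delimited line (columns 1-5)."""
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 5:
            raise ValueError(f"expected 5 tab-delimited columns: {line!r}")
        name, accession, start, end, sequence = fields[:5]
        yield GeneRecord(name, accession, int(start), int(end),
                         sequence.strip().upper())
```

Usage: with open("testgene") as f: genes = list(read_genes(f)).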
 

OUTPUT FILE FORMAT
1: gene name
2: GC content
3: percent match
4: oligo sequence

This can be changed in the output file format section of the script.
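Writing one line per oligo with the four output columns could look like the sketch below. The tab delimiter and one-decimal formatting are assumptions for illustration (the README does not specify the delimiter), and write_results is a hypothetical name.

```python
from typing import Iterable, TextIO, Tuple


def write_results(handle: TextIO,
                  rows: Iterable[Tuple[str, float, float, str]]) -> None:
    """Write gene name, GC content, percent match and oligo sequence,
    one tab-delimited line per oligo."""
    for gene, gc, pct, oligo in rows:
        handle.write(f"{gene}\t{gc:.1f}\t{pct:.1f}\t{oligo}\n")
```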


Oligoselect was created by Tracy K. Teal on September 19, 2002.
Copyright: California Institute of Technology