README

The script oligoselect.py batch processes genes/ORFs and finds oligos of a
length determined by the user.

There is more documentation on this program and fasta.py in the doc
directory. (Generated by HappyDoc)

THE QUICK START

Use run_oligoselect to run the program oligoselect.py. run_oligoselect
specifies all the parameters; just change the parameters in that file.
(Remember to change the output file name for different runs.)

The input for this script is in the form:

./oligoselect.py input_file output_file oligo_length blast_database gc_min gc_max percent_match num_matches

Information on each of these parameters:

input_file = the file of sequences you want to find oligos for; the file
    format is discussed below.
output_file = the file to write the results to.
oligo_length = how long you want your oligos to be.
blast_database = the database you want to blast the oligos against to make
    sure they are unique. This database must already be blast formatted.
gc_min = the minimum GC content you want your oligos to have.
gc_max = the maximum GC content you want your oligos to have.
percent_match = the percent matching allowed for uniqueness. In the blast
    analysis, the number of base pairs that match between the oligo and the
    second best hit is found (the first best hit is the sequence itself),
    e.g. 35 bp out of a 50 bp oligo would be a percent_match of 70. The
    recommended value is 80.
num_matches = the number of matches for each gene that you want returned.

AN EXAMPLE

Right now run_oligoselect uses an example input file, testgene, for the
database s_putrefaciens. Run this first on your system to make sure
everything is working OK. It will probably take a few minutes for you to get
a result. This is normal, and it is the minimum you should expect for your
own runs. There is a bit more information in the EXAMPLE file.

INFORMATION ABOUT WHAT OLIGOSELECT.PY DOES

This script batch processes genes/ORFs and finds oligos of a length
determined by the user.
It can be used to find oligos for many genes/ORFs at once. It checks that
the oligos have the specified GC content and that they are unique in the
genome database of interest, and it screens them for self-annealing. It
looks at the region towards the 5' end, preferentially choosing oligos that
match the criteria and are closer to the 5' end. Right now it looks at the
region 50 base pairs from the 5' end and 50 bp from the 3' end. This can be
modified in the script.

To use this script the user must:

1) be able to run it on a machine that can do batch blast queries. These
   tools can be downloaded from NCBI.
2) load the database that the oligos need to be blasted against. This can be
   done by downloading the database from NCBI or elsewhere to the server
   where this script will be run and the blast analyses done. Then follow
   blast's 'formatdb' instructions or the formatdb_README here.

INPUT FILE FORMAT

The input file should be tab delimited with the following columns:

1: gene or ORF name
2: accession number
3: start site of the gene or ORF
4: end site
5: sequence

If you have data in some other format, just change the columns in the input
file section of the script.

OUTPUT FILE FORMAT

1: gene name
2: GC content
3: percent match
4: oligo sequence

This can be changed in the output file format section of the script.

Oligoselect was created by Tracy K. Teal on September 19, 2002.
Copyright: California Institute of Technology
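
A SKETCH OF THE INPUT FORMAT AND FILTER CRITERIA

The Python sketch below illustrates the five-column input format and the GC
content and percent_match criteria described in this README. It is not taken
from oligoselect.py: the function names are hypothetical, it assumes gc_min
and gc_max are given as percentages, and it assumes the thresholds are
inclusive. The blast step is not shown; the number of bases matching the
second best hit is simply passed in.

```python
def parse_input_line(line):
    """Split one tab-delimited input line into its five columns."""
    name, accession, start, end, sequence = line.rstrip("\n").split("\t")
    return {
        "name": name,
        "accession": accession,
        "start": int(start),
        "end": int(end),
        "sequence": sequence,
    }


def gc_content(oligo):
    """Return the GC content of an oligo as a percentage (0-100)."""
    if not oligo:
        raise ValueError("oligo must not be empty")
    oligo = oligo.upper()
    return 100.0 * (oligo.count("G") + oligo.count("C")) / len(oligo)


def percent_match(matching_bases, oligo_length):
    """Percent of the oligo that matches the second best blast hit."""
    return 100.0 * matching_bases / oligo_length


def passes_filters(oligo, second_hit_matches, gc_min, gc_max, max_percent_match):
    """Apply the GC window and the uniqueness cutoff (assumed inclusive)."""
    gc = gc_content(oligo)
    match = percent_match(second_hit_matches, len(oligo))
    return gc_min <= gc <= gc_max and match <= max_percent_match


if __name__ == "__main__":
    record = parse_input_line("geneA\tACC0001\t10\t200\tATGGCCAATT\n")
    print(record["name"], record["start"], record["end"])
    print(percent_match(35, 50))  # 70.0, the example given above
```

With the recommended percent_match of 80, an oligo whose second best hit
matches 35 of 50 bases (70 percent) would pass the uniqueness cutoff in this
sketch, provided its GC content also falls between gc_min and gc_max.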