;
; THE BEST WAY to read this file is to use a plain text editor
; that does not wrap long lines across several lines on screen.
; For example, on Microsoft(R) Windows(TM) operating systems use
; the program "Notepad" and disable the menu option
; "Format/Word Wrap".
;
; ==================================================================
;
; This is an exemplar parameter file for the program:
;
;   ***** BEsTRF: Best Estimated Terminal Restriction Fragment *****
;
; Purpose of the program:
;
;   BEsTRF provides an in-depth and controlled environment for
;   up-to-date exploration of Primer-Enzyme-Gene section combinations
;   used in T-RFLP.
;   A user-defined sequence database can be processed, and the
;   resolution of user-specified sets of Primers and restriction
;   endonucleases can be analyzed on either forward or reverse
;   terminal fragments.
;
; Suggested citation:
;
;   B. Stres, J.M. Tiedje, B. Murovec, BEsTRF: a tool for optimal
;   resolution of terminal restriction fragment length polymorphism
;   analysis based on user defined primer-enzyme-sequence databases,
;   Submitted for publication in journal Bioinformatics
;
;   Please check our web site
;
;     http://lie.fe.uni-lj.si/bestrf
;
;   for reference updates.
;
; ==================================================================
;
; DISCLAIMER: this program is developed with the best effort to make
;             it error-free and working as described.
;             However, there are no guarantees.
;
;             The developers do not take any responsibility
;             whatsoever if the use or misuse of the program causes
;             or leads to personal injuries, data loss, material
;             damage or any other undesired consequences.
;
;             Under no conditions should the developers be charged
;             or prosecuted by anyone using the program for any
;             reason whatsoever.
;
; ==================================================================
;
; This is an exemplar file with user's specified parameters.
;
; A value for each parameter is specified on a separate line
; with the following syntax:
;
;   PARAMETER_NAME   VALUE   COMMENT
;
; There can be arbitrarily many spaces and tabs between
; PARAMETER_NAME and VALUE as well as between VALUE and
; COMMENT.
;
; A semicolon as the first non-blank character on a line declares
; the whole line a comment, so the line is ignored entirely.
; This is the case with all lines so far in this file,
; and that is why this file can contain such elaborate instructions
; without interfering with the meaningful contents of the file.
;
; NOTE: currently, all settings in this file are commented out
;       in order not to introduce any spurious setup. However,
;       keep in mind that you must remove the leading semicolon from
;       the lines that you want to influence the execution of the
;       program.
;
; Empty lines (possibly with spaces and tabs) are allowed
; and ignored too, which may be of stylistic value.
;
; The order in which parameters are specified is not important.
;
; ------------------------------------------------------------------
;
; As an example, let us set the parameter "DNA_File_Names" to the value
; "file_with_my_DNAs.fa". To do so, you would write the following
; line somewhere in the parameter file
; (without the leading semicolon, of course).
;
;   DNA_File_Names   file_with_my_DNAs.fa
;
; In this example "DNA_File_Names" represents PARAMETER_NAME
; and "file_with_my_DNAs.fa" constitutes the VALUE field.
; Note that there are no quotes in the contents of the real line.
;
; You can always put an arbitrary comment after the parameter's value
; for annotation purposes or for making all sorts of notes.
; Therefore, the previous line could as well be written as:
;
;   DNA_File_Names   file_with_my_DNAs.fa   DNAs from my experiments
;
; or maybe
;
;   DNA_File_Names   file_with_my_DNAs.fa   the selection of the best
;
; The program also knows about so-called multi-value parameters.
; These are parameters to which you can assign several values.
; You can do this by providing several lines with the same
; parameter name but each time with a new value.
;
; For example, the previously mentioned parameter "DNA_File_Names"
; is of a multi-value type (that is why its name is plural),
; so you can specify several DNA files to be processed like this:
;
;   DNA_File_Names   file_with_my_DNAs.fa
;   DNA_File_Names   another_one_of_my_files.fa
;   DNA_File_Names   a_file_from_my_neighbour.fa
;   DNA_File_Names   downloaded_DNAs.fa
;   DNA_File_Names   etc_for_as_long_as_needed.fa
;
; Alternatively, you can specify several values on the same line by
; surrounding all of them in double quotes. Individual values must be
; separated by spaces or tabs. Therefore, the following example
; achieves the same functionality as the previous one:
;
;   DNA_File_Names   "file_with_my_DNAs.fa another_one_of_my_files.fa"
;   DNA_File_Names   a_file_from_my_neighbour.fa
;   DNA_File_Names   "downloaded_DNAs.fa etc_for_as_long_as_needed.fa"
;
; As you can see, you can arbitrarily intermix both writing styles.
;
; ------------------------------------------------------------------
;
; Some parameters have default values, which are assumed if
; you do not define the parameter at all. Please see the descriptions
; that follow.
;
; GOOD LUCK !
;
; ==================================================================
;
;*******************************************************************
;
; I. Input FASTA sequences
;
;*******************************************************************
;
; 1. The parameter "DNA_File_Names" specifies file names with DNA
;    patterns in FASTA format (aligned or unaligned).
;    Note the plural in the name of the parameter. You can specify
;    several files to perform a united analysis on their contents,
;    i.e. all specified files are regarded as one big file with
;    concatenated contents.
;
;    NOTE: this parameter is required. If you do not specify at least
;          one file name, the program execution will be aborted.
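;
;    A minimal Python sketch of how parameter lines such as the
;    "DNA_File_Names" lines could be read, following the syntax rules
;    given at the top of this file. This is NOT the BEsTRF parser:
;    the function names are hypothetical, and the handling of details
;    that the text leaves open (for example an unterminated quote) is
;    an assumption.
;

```python
def parse_parameter_line(line):
    """Return (name, [values]) for a setting line, or None for a
    comment or blank line."""
    stripped = line.strip()
    if not stripped or stripped.startswith(";"):
        return None  # comment or empty line: ignored
    parts = stripped.split(None, 1)
    name = parts[0]
    rest = parts[1] if len(parts) > 1 else ""
    if rest.startswith('"'):
        # Quoted form: several values separated by spaces or tabs.
        end = rest.find('"', 1)
        # Assumption: an unterminated quote runs to the end of the line.
        quoted = rest[1:end] if end != -1 else rest[1:]
        values = quoted.split()
    else:
        # Unquoted form: the first token is the value, the rest is a comment.
        values = rest.split(None, 1)[:1]
    return name, values


def parse_parameters(text):
    """Collect all values per parameter name, in order of appearance."""
    params = {}
    for line in text.splitlines():
        parsed = parse_parameter_line(line)
        if parsed is not None:
            name, values = parsed
            params.setdefault(name, []).extend(values)
    return params


example = '''
; a comment line
DNA_File_Names "file_with_my_DNAs.fa another_one_of_my_files.fa"
DNA_File_Names a_file_from_my_neighbour.fa   DNAs from my neighbour
'''
print(parse_parameters(example))
```

;    Because values are accumulated per name, repeated lines and quoted
;    lists can be intermixed, as described for multi-value parameters.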
;
;    Examples of plausible settings are as follows:
;DNA_File_Names "file1.fa file2.fa"   ; enter name(s) of your file(s)
;DNA_File_Names "phantom1.txt"   ; for debugging purposes
;DNA_File_Names "rdp_download_31seqs.fa phantom1.txt"   ; debugging
;DNA_File_Names rdp_download_138807seqs.fa
;DNA_File_Names rdp_download_129580seqs.fa
;DNA_File_Names "rpobseq2183.txt"
;DNA_File_Names "my_new_seqs.txt"
;
; 2. The parameter "Max_DNA_Degeneration" with
;    permissible values of 4, 3, 2 or 1 specifies the maximal
;    tolerated degeneration of DNA sites in the input file(s).
;
;    When the value of the parameter equals 4, any level of
;    degeneracy is tolerated. I.e. the FASTA code "N" specifies the
;    possibility of all four nucleotides A, T, C and G, hence
;    the number 4 indicates that such a level of degeneracy is tolerated.
;
;    When the value of the parameter equals 3, degenerated sites
;    that specify no more than 3 possible nucleotides are tolerated.
;    Therefore, DNA that contains the code N is rejected and does not
;    take part in the analysis. Other codes are tolerated.
;
;    Similarly, when the value of the parameter equals 2,
;    DNA is permitted to contain only sites that specify no more
;    than 2 possible nucleotides. Therefore, codes N, D, V, H
;    and B are ruled out, but the ambiguous codes Y, R, W, S, K, M
;    are allowed to be present.
;
;    Finally, when the value of the parameter equals 1,
;    only the DNA codes A, C, T and G are allowed to exist.
;    Otherwise, the DNA is rejected.
;
;    NOTE: the value of 0 is also allowed and internally
;          silently changed to 1.
;
;
;    Default value: 4 (any kind of degeneracy is tolerated).
;
;Max_DNA_Degeneration 4   ; set the value to 4, 3, 2 or 1 (or 0)
;*******************************************************************
;
; II. Primers discovery specification
;
;*******************************************************************
;
; 4.
;    Multi-value parameters "Forward_Primer_Dictionary" and
;    "Reverse_Primer_Dictionary" specify dictionary files with
;    forward and reverse Primers' definitions, respectively.
;
;    The contents of each dictionary file have to be arranged
;    into two columns with space(s) and/or tab(s) in between.
;    In each row the first column contains the Primer's name, whereas
;    the second one specifies the Primer's associated FASTA pattern.
;    FASTA blanks are not allowed when specifying Primers.
;
;    Please see the exemplar files "ForwardPrimers.txt"
;    and "ReversePrimers.txt" for a template.
;
;    NOTE: reverse Primers must be specified as a reverse complement
;          of the matching DNA sequence. For example, the Primer pattern
;          5’ AAACR 3’ matches the DNA pattern 5’ YGTTT 3’.
;
;    NOTE: if you intend to specify Primers of interest by hand
;          by setting the parameters "Forward_Primers" and
;          "Reverse_Primers" (described shortly), you do not need to
;          (but you can) provide dictionary file(s). In this case
;          leave one or both of the parameters
;          "Forward_Primer_Dictionary" and "Reverse_Primer_Dictionary"
;          unspecified (e.g. comment out the appropriate lines by
;          putting a semicolon at their beginning).
;
;Forward_Primer_Dictionary F_my_new_primers.txt
;Reverse_Primer_Dictionary R_my_new_primers.txt
;
; 5. Multi-value parameters "Forward_Primers" and "Reverse_Primers"
;    specify all desired combinations of forward/reverse Primer pairs
;    by Primers' names (e.g. "968f") or in FASTA format
;    (FASTA blanks are not allowed). In the former case, the Primer
;    must be in the appropriate dictionary so that its FASTA pattern
;    can be deduced from the name. In the latter case, use FASTA format
;    to specify the Primer.
;
;    You can specify as many patterns as needed, separated with spaces
;    or tabs. Use several "Forward_Primers" and "Reverse_Primers"
;    prefixed lines if needed.
;
;    NOTE: reverse Primers must be specified as a reverse complement
;          of matching DNA sequences.
;          For example, the pattern
;          5’ AAACR 3’ matches the DNA pattern 5’ YGTTT 3’.
;
;Forward_Primers "968f 341f"
;Forward_Primers "341f TTAC CGT 519f AGAGTTTGATCMTGGCTCAG"   ;debugging
;Forward_Primers "AATAC"   ;debugging
;Reverse_Primers "1406r GTGTGTRC 1387r ACGGG CCGTCAATTCCTTTRAGTTT"   ;debugging
;Reverse_Primers "1492r 1406r"
;Reverse_Primers "YRTAC"   ;debugging
;
; 6. The parameter "Max_Primer_Degeneration" with
;    permissible values of 4, 3, 2 or 1 specifies the maximal
;    tolerated degeneration of DNA sites to which Primers
;    can bind.
;
;    When the value of the parameter equals 4, a Primer can
;    bind to any degenerated site that it matches.
;    I.e. the ambiguity code "N" specifies the possibility of all
;    four nucleotides A, T, C and G, hence the number 4
;    indicates that such a level of ambiguity is tolerated.
;
;    When the value of the parameter equals 3, a Primer can bind
;    to degenerated sites that specify no more than 3 possible
;    nucleotides. Therefore, the code N is ruled out, but other
;    ambiguous codes are checked for a match.
;
;    Similarly, when the value of the parameter equals 2,
;    a Primer can bind to degenerated sites that specify no more
;    than 2 possible nucleotides. Therefore, codes N, D, V, H
;    and B are ruled out, but the ambiguous codes Y, R, W, S, K, M
;    are checked for a match.
;
;    Finally, when the value of the parameter equals 1,
;    only the DNA codes A, C, T and G are allowed to be matched.
;    I.e. a Primer cannot bind to any degenerated DNA sites.
;
;    NOTE: Primer patterns can always specify any non-selectivity.
;          I.e. the Primer pattern AAN specifies that the third site
;          can bind to any DNA site that is not more ambiguous
;          than the parameter "Max_Primer_Degeneration" allows.
;
;
;    NOTE: the value of 0 is also allowed and internally
;          silently changed to 1.
;
;
;    Default value: 4 (any level of DNA degeneracy is tolerated).
;
;Max_Primer_Degeneration 4   ; set the value to 4, 3, 2 or 1 (or 0)
;
; 7.
;    If the parameter "Strict_Primer_Match" is set to "Yes",
;    then a match with a degenerated DNA site is allowed if
;    and only if Primer binding choices are a (sub)set of
;    degenerated DNA choices, i.e. there is certainty of a match.
;    When the parameter is set to "No", a match is declared whenever
;    there is at least a possibility of a match.
;
;    For example: the Primer site specifies R, which denotes
;                 binding to either A or G (Purine).
;                 The DNA site contains the ambiguous code W,
;                 which denotes the presence of either
;                 A or T (weak).
;
;                 If the parameter "Strict_Primer_Match"
;                 is set to "Yes", Primer site R does not bind
;                 to DNA site W, since there is a possibility
;                 of a mismatch if the DNA happens to contain T.
;
;                 However, when the parameter "Strict_Primer_Match"
;                 is set to "No", these two sites do bind with
;                 each other since there is a possibility of a match
;                 in the case that the DNA site contains A.
;
;
;    Default value: No (strict match is not required).
;
;Strict_Primer_Match No   ; set the value to Yes or No
;
; 8. If the parameter "Primer_Matches_Aligned_Sites_Only" is set
;    to "Yes", then a match with unaligned DNA sites is not
;    allowed.
;
;    For example: Primer site A normally binds to DNA sites
;                 A, a, N, n, R, r, W, w, M, m, D, d, V, v, H and h.
;
;                 However, if the parameter
;                 "Primer_Matches_Aligned_Sites_Only" is set to "Yes"
;                 then Primer site A binds only to DNA sites
;                 A, N, R, W, M, D, V and H (capital letters only).
;
;
;    Default value: No (aligned as well as unaligned DNA sites can bind).
;
;Primer_Matches_Aligned_Sites_Only no   ; set the value to Yes or No
;
; 9. The parameter "Primer_Mismatches" specifies the number of mismatches
;    that are tolerated when searching for a Primer.
;
;    For example: Primer pattern "ATT" matches DNA pattern "ATT".
;                 The same Primer also matches DNA pattern "ATC"
;                 if "Primer_Mismatches" is set to at least one.
;                 The Primer also matches DNA pattern "AGC" if
;                 "Primer_Mismatches" is set to at least two, etc.
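;
;    The site-by-site rules of items 6 to 9 can be sketched in Python
;    as follows. This is an illustration only, not the program's code:
;    the names "site_matches" and "count_mismatches" are hypothetical,
;    and the strict test reads "certainty of a match" as "every
;    nucleotide the DNA code allows is accepted by the Primer code",
;    which is an assumption about the intended behaviour.
;

```python
# IUPAC nucleotide codes and the nucleotides each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def site_matches(primer_code, dna_code, strict=False,
                 aligned_only=False, max_degeneration=4):
    """Decide whether one Primer site binds to one DNA site."""
    if aligned_only and not dna_code.isupper():
        return False  # lower case marks an unaligned DNA site
    primer_set = set(IUPAC[primer_code.upper()])
    dna_set = set(IUPAC[dna_code.upper()])
    # A setting of 0 is treated as 1, as the text describes.
    if len(dna_set) > max(1, max_degeneration):
        return False  # DNA site is more ambiguous than tolerated
    if strict:
        # Assumption: certainty means the DNA choices are all bindable.
        return dna_set <= primer_set
    return bool(primer_set & dna_set)  # a match is at least possible


def count_mismatches(primer, dna_window, **options):
    """Side-by-side comparison: number of sites that do not bind."""
    return sum(not site_matches(p, d, **options)
               for p, d in zip(primer, dna_window))


print(site_matches("R", "W"), site_matches("R", "W", strict=True))
print(count_mismatches("ATT", "AGC"))
```

;    A Primer would then bind at a location when "count_mismatches"
;    does not exceed the value of "Primer_Mismatches".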
;
;    To perform an exact match, set the parameter to zero.
;
;    Default value: 0 (no mismatches are tolerable).
;
;Primer_Mismatches 0   ; the value of tolerable mismatches
;
; 10. When the parameter "Primer_Levenshtein" is set to
;     "Yes", then detection of Primer binding uses a more elaborate
;     criterion (Levenshtein distance) instead of a pure comparison of
;     respective Primer and DNA sites. This way the algorithm also takes
;     into account fictitious insertions and deletions into or out of
;     the Primer when calculating (mis)matches between the two.
;
;     For example: the Primer pattern equals AATCC, whereas the DNA
;                  pattern is AAGTCCGCC and "Primer_Mismatches" is set
;                  to 1 (this way one mismatch is tolerable).
;
;                  Without insertions the Primer does not match
;                  the DNA, since more than one of its sites
;                  do not match:
;
;                  Primer:    AATCC       AATCC      AATCC
;                  DNA:       AAGTCCGCC   AATCCGCC   AATCCGCC
;                  -----------------------------------------
;                  mismatch:  XX          xx x       xxxx
;
;                  However, when insertions and deletions are taken
;                  into account, the Primer does bind successfully:
;
;                  Primer:    AA-TCC       ; "-" denotes insertion
;                  DNA:       AAGTCCGCC
;                  -------------------
;                  mismatch:    x          ; one mismatch is tolerable
;
;     NOTE: computation is much more intensive and therefore SLOW
;           when Levenshtein distance is utilized.
;
;     NOTE: when Levenshtein distance is used, the parameter "Primer_Mismatches"
;           serves as a general criterion that takes into account
;           weights (costs) of code mismatches, insertions, etc.
;           Please see the next set of parameters.
;
;     NOTE: if the parameter "Primer_Mismatches" is set to 0,
;           insertions are silently disabled internally,
;           regardless of the value of the parameter "Primer_Levenshtein".
;
;     Default value: no (use simple side-by-side comparison).
;
;Primer_Levenshtein no   ; set the value to "Yes" or "No"
;
; 11. The parameters "Primer_Insertion_Cost",
;     "Primer_Deletion_Cost" and "Primer_Wrong_Code_Cost"
;     fine-tune the calculation of Levenshtein distance between
;     Primer and DNA.
;     When using simple side-by-side comparisons
;     (Levenshtein distance is not utilized) these parameters are ignored.
;
;     The parameter "Primer_Insertion_Cost" specifies how much
;     each fictitious insertion into the Primer sequence contributes
;     to the mismatch value.
;
;     The parameter "Primer_Deletion_Cost" specifies how much
;     each fictitious deletion from the Primer sequence contributes
;     to the mismatch value.
;
;     The parameter "Primer_Wrong_Code_Cost" specifies how much
;     a mismatch of a Primer code and a DNA code contributes
;     to the mismatch value.
;
;     The actual mismatch value is calculated as a sum of all these
;     contributions over the whole pattern (i.e. sum over all Primer sites).
;     If the actual mismatch value is not greater than the value of the
;     parameter "Primer_Mismatches", then the Primer does bind to DNA at
;     the location in question, otherwise it does not.
;
;     A proper selection of these parameters can adapt the Primer
;     discovery algorithm to virtually any desired scenario of
;     usage.
;
;     For example: when all three parameters are set to 1,
;                  each insertion, deletion and mismatch contributes
;                  an equal amount (1) to the mismatch value.
;                  If "Primer_Mismatches" is set to 2, then the
;                  Primer binds to DNA if: [1] there are no more
;                  than two mismatches between the two, [2]
;                  at most one fictitious insertion or deletion
;                  results in no more than one additional mismatch,
;                  [3] two insertions or deletions result in no mismatches.
;
;                  In contrast, when "Primer_Wrong_Code_Cost" is set
;                  to 3 and "Primer_Mismatches" is still set to 2, then
;                  up to two (insertions + deletions) are allowed
;                  which result in no mismatches between DNA and Primer codes.
;                  Any mismatch between the pattern codes results in the
;                  Primer not being able to bind to DNA at the spot,
;                  since any code mismatch results in a mismatch value of
;                  at least 3, but a value of at most 2 is allowed.
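;
;     A minimal Python sketch of a weighted distance of the kind that
;     items 10 and 11 describe. It is an assumption-laden illustration,
;     not the program's algorithm: the function names are hypothetical,
;     codes are compared as plain letters (no ambiguity codes), and the
;     Primer is aligned against the DNA starting at its first site only.
;

```python
def side_by_side_mismatches(primer, dna):
    """Plain comparison at the first DNA position (no insertions/deletions)."""
    return sum(p != d for p, d in zip(primer, dna))


def weighted_distance(primer, dna, insertion_cost=1, deletion_cost=1,
                      wrong_code_cost=1):
    """Cheapest way to align the whole Primer with some prefix of dna."""
    # previous[j]: cost of aligning the Primer sites processed so far
    # with the first j DNA sites.
    previous = [j * insertion_cost for j in range(len(dna) + 1)]
    for p in primer:
        current = [previous[0] + deletion_cost]
        for j, d in enumerate(dna, start=1):
            current.append(min(
                previous[j - 1] + (0 if p == d else wrong_code_cost),
                previous[j] + deletion_cost,      # Primer site dropped
                current[j - 1] + insertion_cost,  # gap inserted into Primer
            ))
        previous = current
    return min(previous)


def binds(primer, dna, primer_mismatches, **costs):
    """Bind when the mismatch value does not exceed Primer_Mismatches."""
    return weighted_distance(primer, dna, **costs) <= primer_mismatches


print(side_by_side_mismatches("AATCC", "AAGTCCGCC"))
print(weighted_distance("AATCC", "AAGTCCGCC"))
```

;     With the Primer AATCC and the DNA AAGTCCGCC from item 10, the
;     side-by-side count at the first position exceeds one, while a
;     single insertion (AA-TCC) brings the weighted value down to one.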
;
;                  Similarly, it is possible to allow insertions into the
;                  Primer pattern but not deletions by setting the parameter
;                  "Primer_Deletion_Cost" to a greater value than the
;                  parameter "Primer_Mismatches".
;
;     As the above examples demonstrate, different tuning values can
;     result in radically different binding scenarios.
;
;     NOTE: generally, the value of "Primer_Wrong_Code_Cost" should NOT
;           (but it can) exceed the sum of "Primer_Insertion_Cost" and
;           "Primer_Deletion_Cost", since otherwise the behaviour of
;           the algorithm may differ from what is intended. Namely,
;           if Primer and DNA codes do not match at a certain spot,
;           the cost of the wrong code would NOT be "Primer_Wrong_Code_Cost"
;           (as expected) but the sum of "Primer_Insertion_Cost" and
;           "Primer_Deletion_Cost", since it is cheaper to delete the
;           offending code and insert a new one than to replace it directly.
;
;     Default value: 1 (for all three discussed parameters).
;
;Primer_Insertion_Cost 2   ; set the value to a desired cost of insertion into Primer sequence
;Primer_Deletion_Cost 2   ; set the value to a desired cost of deletion from Primer sequence
;Primer_Wrong_Code_Cost 1   ; set the value to a desired cost of mismatch with DNA code
;
; 12. The parameter "Primer_Look_Ahead" specifies the number of DNA
;     sites over which the Primer binding algorithm searches for a better
;     binding after a binding place has already been discovered. This
;     behaviour is intended to discover the best place for Primer binding
;     among several possible ones.
;
;     For example: the Primer pattern equals AATC, whereas the DNA pattern
;                  is GGAATCCGCC and "Primer_Mismatches" is set to 3
;                  (three mismatches are tolerable). For the sake of
;                  simplicity Levenshtein distance is not utilized.
;
;                  The following sketch reveals the amount of Primer
;                  mismatch at several DNA positions.
;
;                  Primer:    AATC          AATC          AATC
;                  DNA:       GGAATCCGCC   GGAATCCGCC   GGAATCCGCC
;                  ----------------------------------------------
;                  mismatch:  xxxx          x xx
;
;                  The first presented position results in 4 mismatches,
;                  so the Primer does not bind here. In the second
;                  position there are 3 mismatches, therefore according
;                  to the selected criterion ("Primer_Mismatches" is set to 3)
;                  the Primer can bind here. However, the question is whether
;                  this is the best binding position. By looking ahead
;                  one more place, we see that an even better match is possible
;                  and the third presented position is a natural binding place.
;
;     As the example reveals, with proper use of the parameter
;     "Primer_Look_Ahead" Primers are not stuck in the first discovered binding
;     position, which may not be the optimal choice. Instead, after a binding
;     position has been discovered, the algorithm tests the specified number
;     of further locations in the hope that a more natural
;     binding place will be revealed.
;
;     A positive value of the parameter specifies the number of DNA sites to be
;     looked ahead.
;
;     A value of 0 disables look ahead altogether.
;
;     A negative value means that as many sites as the length of the actual
;     Primer minus the absolute value of "Primer_Look_Ahead" plus 1 are checked.
;     If "Primer_Look_Ahead" is set to -1, as many DNA sites
;     as the Primer length are subjected to look ahead. The value of -2 means
;     one less DNA site than the Primer length, etc.
;
;     NOTE: look ahead costs time. If five DNA sites are looked ahead,
;           then the binding discovery algorithm is run five times instead
;           of once, which may be especially noticeable when using
;           fictitious insertions.
;
;     NOTE: if the parameter "Primer_Mismatches" is set to zero, look ahead
;           is silently disabled internally.
;
;     Default value: 0 (look ahead is disabled).
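;
;     The look-ahead rule of item 12 can be sketched in Python as shown
;     here. This is a hedged illustration, not the program's code: the
;     function names are hypothetical, codes are compared as plain
;     letters, and keeping the earlier position on a tie is an assumption.
;

```python
def mismatches_at(primer, dna, pos):
    """Side-by-side mismatches with the Primer placed at DNA position pos."""
    window = dna[pos:pos + len(primer)]
    return sum(p != d for p, d in zip(primer, window))


def find_binding(primer, dna, primer_mismatches, look_ahead=0):
    """Return (position, mismatches) of the chosen binding place, or None."""
    if look_ahead < 0:
        # -1 -> Primer length, -2 -> Primer length minus one, etc.
        look_ahead = max(0, len(primer) + look_ahead + 1)
    if primer_mismatches == 0:
        look_ahead = 0  # look ahead is disabled for exact matching
    last = len(dna) - len(primer)
    for pos in range(last + 1):
        best = mismatches_at(primer, dna, pos)
        if best > primer_mismatches:
            continue  # the Primer does not bind here
        best_pos = pos
        # A binding place is found: test the following positions too.
        for ahead in range(pos + 1, min(pos + look_ahead, last) + 1):
            score = mismatches_at(primer, dna, ahead)
            if score < best:  # assumption: ties keep the earlier position
                best_pos, best = ahead, score
        return best_pos, best
    return None


print(find_binding("AATC", "GGAATCCGCC", 3))
print(find_binding("AATC", "GGAATCCGCC", 3, look_ahead=1))
```

;     For the Primer AATC and the DNA GGAATCCGCC, the first call stops at
;     the first tolerable position, while the second call looks one site
;     further and finds the exact match.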
;
;Primer_Look_Ahead 2   ; the absolute number of DNA sites that are subjected to look ahead
;Primer_Look_Ahead -1   ; look ahead is equal to Primer length
;Primer_Look_Ahead 2000
;
;
; 13. The value "Max_Degeneration_Between_Primers" specifies the maximal
;     allowed number of degenerated codes between the two Primers.
;     If the actual number of degenerated codes is greater than the
;     specified value, the sequence is rejected (does not contribute
;     to the result of the analysis). To skip this test, set the parameter to "No".
;
;     Default value: No (degeneration test is skipped).
;
;Max_Degeneration_Between_Primers 4   ; allowed degenerated sites or "No"
;Max_Degeneration_Between_Primers no   ; allowed degenerated sites or "No"
;
; 14. When the value of the parameter "Revert_DNA_When_No_Match" is "Yes",
;     the DNA sequence is reverted (replaced by its reverse complement)
;     if none of the forward/reverse Primer pairs are found or if all
;     found fragments are rejected due to excess degeneracy. The reverted
;     sequence is then tried again as if it were the original one.
;
;     For example, the original sequence ACCCTG is reverted into CAGGGT
;     and subjected to the same analysis as the original one.
;
;     Default value: Yes (DNA is reverted if no Primers are found).
;
;Revert_DNA_When_No_Match no   ; set the value to Yes or No
;*******************************************************************
;
; III. Enzymes discovery specification
;
;*******************************************************************
;
; 15. Multi-value parameter "Enzyme_Dictionary" specifies dictionary
;     file(s) with Enzyme names and their FASTA patterns. The contents
;     of each dictionary file have to be arranged into two columns with
;     space(s) and/or tab(s) in between. The first column of each row
;     contains the Enzyme name and the second one the Enzyme's associated
;     FASTA pattern (FASTA blanks are not allowed). The DNA cutting
;     point is specified with the character "^", which can appear only
;     once in the pattern.
;     If the cutting point is not specified,
;     then DNA is cut before the Enzyme.
;
;     As most Enzymes used in the generation of Terminal Restriction
;     Fragments are four cutters (have four-nucleotide recognition
;     sites), the use of other more exotic Enzymes was not anticipated
;     and therefore was not set as a priority when developing the
;     program. Nevertheless, all available Enzymes can be used.
;
;     For Enzymes which cleave away from their recognition sequence,
;     the cutting points currently CANNOT be indicated in parentheses,
;     like GACGC(5/10). However, it is possible to specify the cutting
;     point before or after the Enzyme by prepending or appending
;     a sufficient number of nucleotide ambiguity codes "N" to the
;     recognition sequence.
;
;     For example, the pattern "GACGC(5/10)" must be specified as
;     "GACGCNNNNN^". The program correctly eliminates any leading
;     and trailing "N" codes from the recognition sequence so that
;     such Enzymes can be recognized immediately before the start
;     of the reverse Primer (or immediately after the forward Primer
;     in the case of leading "N" codes).
;
;     The Enzyme cutting point is allowed to be located in the sequence
;     of the Primer binding site, but if it is located before the
;     forward Primer or after the reverse Primer, the Enzyme is treated
;     as not found.
;
;     Please see the exemplar file "Enzymes.txt" for a template.
;
;     NOTE: if you intend to specify Enzymes of interest by hand
;           by setting the parameter "Enzymes" (described shortly),
;           you do not need to (but you can) provide a dictionary file.
;           In this case leave the parameter "Enzyme_Dictionary"
;           unspecified (i.e. comment out the appropriate line by
;           putting a semicolon at its beginning).
;
;Enzyme_Dictionary "Dict1.txt Dict2.txt Dict3.txt"
;Enzyme_Dictionary "Enzymes.txt"
;Enzyme_Dictionary "4cutters.txt"
;
; 16. Multi-value parameter "Enzymes" specifies Enzymes by their
;     names (e.g. AccII) or by their FASTA patterns.
;     In the former case,
;     the Enzyme must be in a dictionary so that its pattern can be
;     deduced from the name. In the latter case, use FASTA format to
;     specify Enzymes; please read the description of the parameter
;     "Enzyme_Dictionary" above.
;
;     You can specify as many Enzymes as needed, separated with spaces
;     and tabs. Use several "Enzymes" prefixed lines if needed.
;     Do not forget double quotes.
;
;     NOTE: if you do not specify this parameter, then all Enzymes
;           in the file "Enzyme_Dictionary" are subjected to the
;           analysis. To achieve this functionality, simply comment
;           out all "Enzymes" lines.
;
;
;Enzymes "AbsI CC^GG AbaI GCTA"   ; intermix of names and patterns
;Enzymes "ATC^G GGCTAA GG^CAA AccI G^GTACC"
;Enzymes "CCAA AccB1I AccII"
;Enzymes "AccB1I AccII AbaI"
;
; 17. The parameters:
;
;       "Max_Enzyme_Degeneration", "Strict_Enzyme_Match",
;       "Enzyme_Matches_Aligned_Sites_Only", "Enzyme_Mismatches"
;       and "Enzyme_Look_Ahead",
;
;     have the same role when discovering restriction sites
;     as the parameters:
;
;       "Max_Primer_Degeneration", "Strict_Primer_Match",
;       "Primer_Matches_Aligned_Sites_Only", "Primer_Mismatches"
;       and "Primer_Look_Ahead",
;
;     respectively, have in the Primer discovery algorithm.
;Max_Enzyme_Degeneration 4
;Strict_Enzyme_Match no
;Enzyme_Matches_Aligned_Sites_Only no
;Enzyme_Mismatches 0
;Enzyme_Look_Ahead 4
;
; 18. The value "Max_Fragment_Degeneration" specifies the maximal
;     allowed number of degenerated codes in the fragment determined
;     by the Primer and the Enzyme's restriction site (i.e. from the
;     beginning of the forward Primer to the cut point or from the cut
;     point to the end of the reverse Primer). If the actual number of
;     degenerated codes is greater than the specified value, the fragment
;     is rejected (does not contribute to the result). To skip this test,
;     set the parameter to "No".
;
;     Default value: No (degeneration test is skipped).
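;
;     The degeneration test of item 18 can be sketched in Python as
;     follows. The names are hypothetical, positions are assumed to be
;     0-based with an exclusive end, and counting every letter other
;     than A, C, G and T as one degenerated code is an assumption.
;

```python
def count_degenerate(fragment):
    """Number of sites whose code is not one of A, C, G or T."""
    return sum(1 for code in fragment.upper()
               if code.isalpha() and code not in "ACGT")


def fragment_accepted(fragment, max_fragment_degeneration=None):
    """None stands for the setting "No" (the test is skipped)."""
    if max_fragment_degeneration is None:
        return True
    return count_degenerate(fragment) <= max_fragment_degeneration


def terminal_fragments(dna, forward_start, cut, reverse_end):
    """Forward fragment: forward Primer start up to the cut point.
    Reverse fragment: cut point up to the end of the reverse Primer."""
    return dna[forward_start:cut], dna[cut:reverse_end]


# Made-up sequence and positions, for illustration only.
forward, reverse = terminal_fragments("AANGGCCRYTT", 0, 6, 11)
print(forward, count_degenerate(forward))
print(reverse, count_degenerate(reverse))
```

;     With a limit of 1, the forward fragment of this made-up example
;     would be kept and the reverse fragment rejected.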
;
;Max_Fragment_Degeneration 8   ; allowed degeneracies or "No"
;Max_Fragment_Degeneration no   ; allowed degeneracies or "No"
;
; 19. The "Yes" value of the parameter "Treat_No_Enzyme_As_Full_Hit"
;     selects that, in cases where the Enzyme cannot be located,
;     the whole DNA fragment between the beginning of the forward Primer
;     and the end of the reverse Primer is treated as the resulting Enzyme
;     fragment (in both forward and reverse reports). When the value
;     of the parameter is "No", an Enzyme that is not found does not
;     contribute to the report.
;
;     Default value: Yes (no Enzyme is treated as the whole fragment).
;
;Treat_No_Enzyme_As_Full_Hit yes   ; set the value to Yes or No
;*******************************************************************
;
; IV. Output specification
;
;*******************************************************************
;
; 20. Parameter "Output_Directory_Root" specifies the first part of the
;     directory/folder name for storing outputs of the analysis.
;
;     Default value: none, which means that results are stored in the
;                    current BEsTRF directory/folder, which is usually
;                    the location from which the program is executed.
;
;     Further, an affirmative (true) value of the parameter
;     "Output_Directory_Append_Date" instructs that the current system
;     date and time are appended to the root specified by the parameter
;     "Output_Directory_Root".
;
;     Default value: true (yes, date/time is appended to the name).
;
;     Note: since the default value of the parameter "Output_Directory_Root"
;           is none and the default value of "Output_Directory_Append_Date"
;           is true, the results are by default stored in a newly created
;           directory/folder with a name derived from the current time.
;Output_Directory_Root genetic_engineering
;Output_Directory_Root my_new_folder_
;Output_Directory_Append_Date yes
;
; 21. Parameter "Primer_Output_File_Name" specifies the file name for
;     storing the Primers-only analysis results.
;     The results of the
;     report are sorted according to the number of DNA sequence
;     matches (primary sort key). The results with equal primary sort
;     key are further sorted according to the number of different
;     fragment lengths that the Primer pairs produce.
;
;     Parameters "Forward_Output_File_Name" and
;     "Reverse_Output_File_Name" specify file names for storing the
;     forward and reverse Primer/Enzyme fragment analysis results,
;     respectively. The results of the report are sorted according to
;     the number of different fragment lengths (primary sort key).
;     The results with equal primary sort key are further sorted
;     according to the number of DNA sequences recognized by
;     Primer pairs.
;
;     Parameters "Forward_Alt_Output_File_Name" and
;     "Reverse_Alt_Output_File_Name" specify file names for storing
;     exactly the same report as is written to the files
;     "Forward_Output_File_Name" and "Reverse_Output_File_Name",
;     respectively, except that this time the contents are sorted
;     according to the number of DNA sequences recognized by Primer
;     pairs (primary sort key). The results with equal primary sort
;     key are further sorted according to the number of different
;     fragment lengths. I.e. the sort criteria are swapped in
;     comparison to the "non-ALT" version of the report.
;
;     Parameters "Forward_Enzyme_Fragments_File_Name" and
;     "Reverse_Enzyme_Fragments_File_Name" specify file names for
;     storing the analysis of Enzyme fragment lengths according to
;     the utilized Primer pair.
;
;     All outputs are in plain ASCII (text) format suitable for
;     importing into spreadsheet programs.
;
;     If you do not want to generate some of these reports,
;     comment out their respective parameters.
;
;     NOTE: any file(s) with specified name(s) that already exist(s)
;           will be erased.
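;
;     The two sort orders of item 21 can be sketched in Python as shown
;     here. The record type, the function names and the sample numbers
;     are made up for illustration, and sorting in descending order
;     (best first) is an assumption, since the text names only the keys.
;

```python
from collections import namedtuple

# Hypothetical record for one Primer pair / Enzyme combination.
Result = namedtuple("Result", "name distinct_lengths matched_sequences")


def sort_main(results):
    """Primary key: distinct fragment lengths; secondary: matched sequences."""
    return sorted(results, key=lambda r: (-r.distinct_lengths,
                                          -r.matched_sequences))


def sort_alt(results):
    """The "Alt" reports swap the two keys."""
    return sorted(results, key=lambda r: (-r.matched_sequences,
                                          -r.distinct_lengths))


# Made-up numbers, for illustration only.
results = [
    Result("968f/1406r/AccII", 12, 300),
    Result("341f/1492r/AbaI", 12, 450),
    Result("341f/1406r/AccI", 20, 100),
]
print([r.name for r in sort_main(results)])
print([r.name for r in sort_alt(results)])
```

;     Both reports contain the same rows; only the ordering differs.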
;
Primer_Output_File_Name "primer_results.txt"
Forward_Output_File_Name "enzyme_forward_results1.txt"
Forward_Alt_Output_File_Name "enzyme_forward_results2.txt"
Reverse_Output_File_Name "enzyme_reverse_results1.txt"
Reverse_Alt_Output_File_Name "enzyme_reverse_results2.txt"
Forward_Enzyme_Fragments_File_Name "fwd_fragments.txt"
Reverse_Enzyme_Fragments_File_Name "rev_fragments.txt"
;
; 22. If the parameter "Table_Header" is set to "Yes", then table
;     headers are added to the reports for easier navigation in
;     spreadsheet programs.
;
;     Default value: Yes (table headers are added to the reports).
;
Table_Header yes   ; set the value to Yes or No
;
; 23. Parameter "Number_of_Best_Primer_Histograms" determines
;     how many histograms of resulting fragment lengths of the best
;     Primer pairs should be included in the report
;     "Primer_Output_File_Name".
;
;     If the report should include all histograms,
;     set the parameter to "All".
;
;     If the report should include no histograms,
;     set the parameter to "None".
;
;     Default value: All (include all histograms in reports).
;
;Number_of_Best_Primer_Histograms 7
;Number_of_Best_Primer_Histograms all
;Number_of_Best_Primer_Histograms none
;
; 24. Parameter "Minimal_Primer_Histogram_Entries" determines
;     how many different fragment lengths a Primer histogram must
;     contain at least to be included in the report. For example,
;     if a Primer pair generates only one fragment length and you do
;     not want to examine such histograms, set the parameter to
;     two, etc.
;
;     If the report should include all histograms with at least one
;     entry, set the parameter to "All".
;
;     Default value: All (include all histograms in reports).
;
;Minimal_Primer_Histogram_Entries 3
;Minimal_Primer_Histogram_Entries all
;
; 25. Parameter "Number_of_Best_Results" determines how many of
;     the best Forward_Primer/Reverse_Primer/Enzyme combinations
;     should be included in the reports "Forward_Output_File_Name"
;     and "Reverse_Output_File_Name" and their "Alt" counterparts.
;
;     If the reports should include all combinations,
;     set the parameter to "All".
;
;     Default value: All (include all results in reports).
;
;Number_of_Best_Results 12
;Number_of_Best_Results all
;
; 26. Parameter "Number_of_Best_Histograms" determines how many
;     histograms of the resulting fragment lengths of the best
;     Forward_Primer/Reverse_Primer/Enzyme combinations should be
;     included in the reports "Forward_Output_File_Name" and
;     "Reverse_Output_File_Name" as well as their "Alt" counterparts.
;
;     If the reports should include all histograms,
;     set the parameter to "All".
;
;     If the reports should include no histograms,
;     set the parameter to "None".
;
;     Default value: All (include all histograms in reports).
;
;Number_of_Best_Histograms 15
;Number_of_Best_Histograms all
;Number_of_Best_Histograms none
;
; 27. Parameter "Minimal_Histogram_Entries" determines the minimum
;     number of different fragment lengths a histogram must contain
;     to be included in the reports "Forward_Output_File_Name" and
;     "Reverse_Output_File_Name" as well as their "Alt" counterparts.
;     For example, if a Primer/Enzyme combination generates only one
;     fragment length and you do not want to examine such histograms,
;     set the parameter to 2, etc.
;
;     If the reports should include all histograms with at least one
;     entry, set the parameter to "All".
;
;     Default value: All (include all histograms in reports).
;
;Minimal_Histogram_Entries 3
;Minimal_Histogram_Entries all
;
; 28. Parameters "Rejected_No_Primers" and "Rejected_No_Enzymes"
;     specify file names for storing DNA sequences that do not match
;     any forward/reverse Primer pair or for which no Enzymes were
;     discovered, respectively. You can use these files to inspect
;     undetected and untreated DNA sequences by importing the files
;     into other programs.
;
;     If you do not want this (these) output(s) to be generated,
;     leave the respective parameter(s) unspecified,
;     i.e. comment out the appropriate line(s).
;
;     NOTE: any file(s) with specified name(s) that already exist(s)
;     will be erased.
;
Rejected_No_Primers   rejected_NP.txt
Rejected_No_Enzymes   rejected_NE.txt
;
; 29. Parameter "Accepted_File_Name" specifies the file name for
;     storing DNA sequences that match at least one forward/reverse
;     Primer pair and in addition contain at least one recognized
;     Enzyme sequence (unless the parameter
;     "Accepted_Requires_Enzyme_Match" relaxes this demand). You can
;     use this file to inspect good DNA sequences by importing the
;     file into other programs.
;
;     The purpose is to generate new sequence datasets from the
;     imported sequence database using the best Primer combination,
;     export them and use them for comparative phylogenetic analyses
;     in other programs. Additionally, a Primer set spanning the
;     largest part of the gene of interest can be used to create a
;     new sequence dataset that serves as a new input database for
;     the comparative analysis of Primer-Enzyme-Gene region sets.
;     Keep in mind that not all gene regions are covered equally
;     well by the sequences deposited in public databases.
;
;     If you do not want this output to be generated, leave the
;     parameter unspecified (comment out the appropriate line).
;
;     NOTE: any file with specified name that already exists
;     will be erased.
;
Accepted_File_Name accepted.txt
;
; 30. Parameter "Accepted_Requires_Enzyme_Match" determines whether
;     a DNA sequence must match at least one Enzyme, in addition to
;     matching at least one Primer pair, to be exported into the
;     file specified by the parameter "Accepted_File_Name".
;
;     If this parameter is set to false, then sequences need only
;     match Primers to be included in the file specified by the
;     parameter "Accepted_File_Name".
;
;     Default value: True (DNA must match with Primer pair AND Enzyme).
;
Accepted_Requires_Enzyme_Match false
;
; 31.
;     A "true" value of parameter "Group_Accepted_Into_Files"
;     organizes DNA patterns into files according to forward/reverse
;     Primer pair match. One DNA pattern can appear in several
;     files if it matches several Primer pairs.
;
;     If you set this parameter to true, you must also provide
;     parameter "Accepted_File_Name" (see above), so that
;     the program can extract the file name root from it.
;
;     Default value: False.
;
Group_Accepted_Into_Files true
;
; 32. A "Yes" value of parameters "Drop_Blanks_From_Rejected" and
;     "Drop_Blanks_From_Accepted" results in dropping FASTA blank
;     characters "-" from DNA patterns that are exported to the files
;     "Rejected_No_XXXXXXX" and "Accepted_File_Name", respectively.
;
;     For example: instead of outputting "AA-C----G----T---A"
;     the actual output is "AACGTA".
;
;     Default value: Yes (FASTA blanks are dropped from output).
;
Drop_Blanks_From_Rejected no   ; set the value to Yes or No
Drop_Blanks_From_Accepted no   ; set the value to Yes or No
;
; 33. A "Yes" value of parameter "Use_Locale_Settings" instructs
;     BEsTRF to query operating system settings for determining
;     the locale format for outputting numbers into report files
;     (decimal separator like decimal point or decimal comma,
;     thousands separator, etc.).
;
;     Specifying a "No" value instructs usage of the default
;     English number format.
;
;     Default value: No (do not use locale specific settings).
;
;Use_Locale_Settings yes
;
; 34. A "Yes" value of parameter "TRF_Phylogeny_Report" instructs
;     creation of a separate file for each different combination
;     of Primer pair / Enzyme / T-RF length and storing there the
;     sequences that generate the associated T-RF lengths.
;
;     A separate report is created for forward and reverse fragments.
;     Further, there are two sets of files for both reports, where one
;     set contains T-RFs only, whereas the other set contains
;     fragments that correspond to an associated Primer pair.
;
;     Note.
;     This report substantially slows down BEsTRF since each
;     sequence is stored into a file individually upon processing.
;     Try to generate this report with as few Primer pairs and Enzymes
;     as possible. Further, drop FASTA blanks from this report
;     if aligned sequences are not required for further use
;     (see the next parameter, 35).
;
;     Default value: No (TRF_Phylogeny_Report is not produced).
;
;TRF_Phylogeny_Report yes   ; set the value to Yes or No
;
; 35. A "Yes" value of parameter "Drop_Blanks_From_TRF_Phylogeny_Report"
;     results in dropping FASTA blank characters "-" from DNA patterns
;     that are exported to the TRF_Phylogeny_Report.
;
;     For example: instead of outputting "AA-C----G----T---A"
;     the actual output is "AACGTA".
;
;     Default value: Yes (FASTA blanks are dropped from output).
;
;Drop_Blanks_From_TRF_Phylogeny_Report no   ; set the value to Yes or No
;*******************************************************************
;
; V. Running the application
;
;*******************************************************************
;
; How to run the program in MS(R) Windows(TM) and Linux(R)?
;
; A. Download the appropriate ZIP archive.
;
; B. Unpack the ZIP archive into the folder/directory
;    of your choice.
;
; C. Have your FASTA sequences at hand (on your local disk).
;
; D. Prepare a parameter file (like this one) according to your
;    specifications and desires.
;
;
;
; ***** Windows specific *****
;
; E. Double click on the EXE file to start the program.
;
; F. When the program asks you for the name of the file with
;    parameters (the one that you have prepared in step "D"),
;    properly satisfy its curiosity.
;
; G. Grab a cup of coffee, some lunch, or go on vacation,
;    depending on the imposed workload.
;
; H. Use the results of the analysis in whatever way you like.
;    Examine the directory log_of_analysis, which contains the
;    progress report and other files that reveal the level of success.
;
;
;
; ***** Linux specific *****
;
; In Linux you usually do not have the possibility of the first
; approach that we described for Windows because graphical
; user interfaces do not let you double click on an executable
; to run it. Instead, you must resort to the analogue of the
; second approach.
;
; E. Open a "Terminal Window" or "Linux Console" in the directory
;    with the program executable. Do this by selecting the
;    appropriate choice in a menu of your GUI "Linux explorer"
;    (like Dolphin if you are using the KDE(R) desktop environment).
;    Note that in Linux it is usually possible by default to open
;    the terminal window directly in the directory of
;    your choice.
;
; F. Start the program by entering the following command:
;
;       ./BEsTRF_version your_parameter_file.txt
;
;    ...note the sequence of characters "./" at the beginning
;    of the line, which you must not forget.
;
;    Alternatively, you can copy the program executable into some
;    directory on the PATH (usually, a suitable choice
;    is "/usr/local/bin"), so that you can execute it from any
;    directory and without the annoying sequence "./".
;
; G. Grab a cup of coffee, some lunch, or go on vacation,
;    depending on the imposed workload.
;
; H. Use the results of the analysis in whatever way you like.
;    Examine the directory log_of_analysis, which contains the
;    progress report and other files that reveal the level of success.
;
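; For illustration only, the alternative mentioned in Linux step "F"
; (copying the executable into a directory on the PATH) might look
; roughly like the two commands below. This is a sketch, not a
; prescribed procedure: "BEsTRF_version" stands for the actual name
; of the executable in your archive, and the use of "sudo" assumes
; that you have administrator rights on your system.
;
;       sudo cp ./BEsTRF_version /usr/local/bin/
;       BEsTRF_version your_parameter_file.txt
;
; Run the second command from the directory that holds your
; parameter file, or give the full path to that file instead.
;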