File: ReferenceManual.html

HTMLtools is a Java program that automates the batch conversion of tab-delimited spreadsheet-type text files to HTML Web-page files. A variety of flexible options make the Web page presentations more useful, and the program can also be used for editing large tables. This is described in more detail throughout this reference manual.
Additional command subsets were developed for specialized conversions (JTVconvert, GenBatchScripts, and TestsIntersections) and may be ignored for routine Web page generation in other domains where they don't apply.
The JTVconvert commands can re-map data array names in Java TreeView mAdb data set files to more user-friendly experiment names, as well as generate HTML Web pages to launch these converted JTV files for each JTV data set.
The GenBatchScripts commands may be used to generate HTMLtools batch scripts for subsequent processing, given a list of data test-results files to convert and a tab-delimited tests descriptions file. They use a table (prepared with Excel or some other source) that describes this data and can then extract and insert information from various mapping tables into the generated Web pages.
The TestsIntersections commands synthesize a tests intersection summary table and Web page, as well as generate some summary statistics. They use the same data used in the GenBatchScripts commands.
The Converter GUI mode starts a graphical user interface (GUI) where the user can specify either individual parameter scripts or a batch-file list of scripts to be executed in the background, with the results shown in the user interface, including the ability to view the generated HTML files through a pop-up Web browser.
The Search GUI mode starts a database-search graphical user interface (GUI) to generate a "flipped table" (see examples in Figure S.14 and Figure S.16) on a subset of the data from a pre-computed edited table database. The user can specify filters on the rows and columns (data and sample subsets), and presentation options, to generate an HTML file viewed through a pop-up Web browser.
Note: this software is released as an
OPEN SOURCE project, HTMLtools, on
http://HTMLtools.SourceForge.net/ with a small non-proprietary sample data
set that is bundled along with the program. This data set has already been
published on the public-access NLM/NCBI GEO Web site. The original data set was
proprietary, created for the Group STAT Project (GSP) along
with the original conversion program, CvtTabDelim2HTML, to support the NIH
Jak-Stat Prospector Web site that is part of the Trans-NIH Jak-Stat
Initiative (http://jak-stat.nih.gov/) and will be
accessible when it is opened up to the public in the future. Note: the initial release only includes GSP data that has been released to the public in NCBI's GEO database (currently one data set). As more GSP GEO data is released to the public, we will include some of it in the demo database to illustrate more of the features of HTMLtools.
1. Gather sets of laboratory experiments in multiple laboratories relating to the Jak-Stat gene pathway.
2. Create Affymetrix microarray data (resulting in .CEL data files).
3. Create an inventory of the relevant data and annotation of the data in the GSP-Inventory.xls spreadsheets consisting of: 1) arrays grouped by experiment EG001, EG002, ... EG00n; 2) a top-level spreadsheet ExperimentGroups describing all EG experiments.
4. Consolidate the data in mAdb (Microarray DataBase system, mAdb.nci.nih.gov). Data is uploaded to each EGxxx subproject and normalized by pooled RMA or MAS5.
4.1 Perform t-test or fold-change tests on subsets of the data that make sense to compare, saving the results (+ and - changes separately) as gene subsets.
4.2 Export tab-delimited (Excel) mAdb Retrieval Reports (MRR) for each gene set for 1) just the arrays used in the test; 2) all samples in the database.
4.3 Compute and export the hierarchical clustered heat maps as Java TreeView (JTV) .zip tab-delimited data sets for external viewing for 1) just the arrays used in the test; 2) all samples in the database.
5. Convert the MRR and JTV tab-delimited data to HTML Web pages using the HTMLtools tools.
6. Merge links to this generated data with Web pages on the Jak-Stat Prospector Web server (and upload to the server).

Figure 1. shows an example of a data analysis processing pipeline to convert laboratory microarray data to Web pages that can be used in the Jak-Stat Prospector Web site. Steps 4.1 and 4.2 could be run for a set of experiments as a batch job. Similarly, the set of files exported from mAdb could be batch processed with HTMLtools. Note that although the HTMLtools converter was developed for this project, the command structure is flexible enough that it could easily be used with other types of data.
Section 7 describes creating and running the scripts for the batchScripts/ directory for creating Web pages.
1. Read the mAdb-TestsToDo.txt table that specifies all of the tests to be performed on subsets of the mAdb GSP database. For each test, these include: test name, samples being compared, test thresholds, test name and related annotation, tissue name, and the relative directory for the data (used for both the input InputTree/ and output Analyses/ generated directory trees).
2. Create lists of related tests by grouping tests with the same tissue name.
3. Read additional mapping table files (the ExperimentGroups.map, EGMAP.map, and CellTypeTissue.map tables) to use in generating the summary Web pages.
4. Create summary Web pages for each tissue type with links to the Web pages for the analyses we will generate, and save them in the Summary/ directory.
5. Generate all of the 'params .map' batch scripts, several for each test, and save them in the ParamScripts/ directory (see Figure 3 for details). Then copy all support files (the above mapping tables), JTVjars/, data.Table/ and other files required when running the converter to generate Web pages.
6. Generate a buildWebPages.doit file listing the params .map files to be processed with a subsequent batch run using the HTMLtools converter, and a Windows .BAT file, buildWebPages.bat.
7. Start the buildWebPages.bat batch job, which generates the Web pages in the Summary/, Analyses/ and JTV/ directory trees.
8. Copy the generated Web pages to the Web server.

Figure 2. shows an example of the batch script generation pipeline from a table describing a list of tests that were run as a batch job on another analysis system. In this case, the analysis system is mAdb, and it uses the same test "todo" file, data.Table/mAdb-TestsToDo.txt, to specify the tests as is used here with the GenBatchScripts processing. The mAdb data analysis and the tab-delimited Excel data it generates are shown in steps 4.1 and 4.2 (see Figure 1). In the GenBatchScripts processing, we first create a batchScripts/ directory and then fill it with the various types of data described in this figure.
Tests (MRR & JTV) input -> Converter output

Tests for samples:
  {testName}+FC.txt -> {testName}+FC.html, {testName}+FC-keep.html
  {testName}-FC.txt -> {testName}-FC.html, {testName}-FC-keep.html
AND of above tests for ALL samples:
  {testName}+FC-ALL.txt -> {testName}+FC-ALL.html
  {testName}-FC-ALL.txt -> {testName}-FC-ALL.html
JTV for test samples:
  {testName}+FC-JTV.zip -> {testName}+FC-JTV/, {testName}+FC-JTV.zip, {testName}+FC-JTV.html
  {testName}-FC-JTV.zip -> {testName}-FC-JTV/, {testName}-FC-JTV.zip, {testName}-FC-JTV.html
JTV above for AND of above tests for ALL samples:
  {testName}+FC-ALL-JTV.zip -> {testName}+FC-ALL-JTV/, {testName}+FC-ALL-JTV.zip, {testName}+FC-ALL-JTV.html
  {testName}-FC-ALL-JTV.zip -> {testName}-FC-ALL-JTV/, {testName}-FC-ALL-JTV.zip, {testName}-FC-ALL-JTV.html

Figure 3. shows the set of 8 mAdb results files and the 18 converter-generated HTML and JTV files produced for each test testName in the mAdb-TestsToDo list. For example, if the test is "EG3.1-test-2", then in the above figure replace {testName} with EG3.1-test-2, etc. The "+FC" indicates a positive fold-change, and the "-FC" a negative fold-change. The files with "-keep" are gene lists with no expression data. The GenBatchScripts option for the converter generates parameter .map batch scripts for each of these converted files.
1. Edit the GSP-Inventory Excel workbook to annotate the set of Affymetrix .CEL files, where we assign the next free Experiment Group EGnnn, simple GSP ID, GSP ID, etc.
2. Upload the Affymetrix .CEL file data to the GSP mAdb database and normalize the new samples using the pooled RMA data for the base GSP database.
3. Add the new tests to do in the mAdb-TestsToDo.xls Excel workbook and upload the new test list to mAdb.
4. Run the batch tests in mAdb, resulting in Excel and JTV data sets that are exported for conversion to Web pages.
5. Process these data using the HTMLtools converter into HTML pages and converted data for the Web server.
6. Upload these Web pages and data to the NIDDK Jak-Stat Prospector staging area for the jak-stat.nih.gov server.

Figure 4. shows the top-level procedure used for adding new Affymetrix .CEL file data sets to the GSP database and Jak-Stat Prospector Web server. In addition, if new gene identifications are made to some of the Affymetrix probes (Feature IDs), running steps 4) through 6) can update these identifiers.
Java TreeView (JTV) Documentation

Java TreeView is an open-source (jTreeView.sourceforge.net) Java applet that mAdb uses to view heatmaps of gene sets. We also use Java TreeView for looking at data snapshots we have taken of the mAdb data. Java TreeView may be downloaded to run as either a standalone application or a Java applet from http://jTreeView.sourceforge.net/. The 2004 journal paper by Alok J. Saldanha gives an overview of Java TreeView: "Java Treeview -- extensible visualization of microarray data", Bioinformatics 2004 20(17):3246-3248. The Java TreeView documentation Web page includes links to examples, an FAQ, a user guide, and Alok J. Saldanha's dissertation describing additional aspects of Java TreeView. NOTE: The Java TreeView applet has been shown to work on Mac OSX, XP and Win2K.
java -Xmx256M -classpath .;.\HTMLtools.jar \ HTMLtools -inputDir:dataXXX -outputDir:html (etc.)
java -Xmx256M -classpath .;.\HTMLtools.jar \ HTMLtools dataXXX/paramsXXX.map

or (in Unix, MacOS-X, or Cygwin):

java -Xmx256M -classpath .;./HTMLtools.jar \ HTMLtools dataXXX/paramsXXX.map

or

java -Xmx256M -classpath . -jar HTMLtools.jar \ dataXXX/paramsXXX.map

where the dataXXX/ directory and the paramsXXX.map file are replaced by your data directory and params map file. The generated HTML files will then be in the html/ directory, or whatever output directory is specified by the -outputDirectory switch in the paramsXXX.map file. This command line tells Java to run the program with 256 Mbytes of memory. For very large files, you may need to increase this memory size. For very large data sets, even that may cause problems and you may not be able to convert them, since in the default mode the Table is loaded into memory before being edited. Some commands, such as -fastEditFile, are designed to work with very large files and process them as a buffered I/O pipeline, and so do not load the Table into memory.
java -Xmx256M -classpath .;.\HTMLtools.jar \ HTMLtools -batchProcessing:batchList.doit
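A batch ".doit" file is simply a plain-text list of params .map script paths, one per line, which are run in order. A minimal sketch (the script names here are hypothetical):

   dataXXX/params-tables.map
   dataXXX/params-summary.map
   dataXXX/params-jtv.map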
java -Xmx256M -classpath .;.\HTMLtools.jar \ HTMLtools data.GBS:params-genBatchScripts.map
It is invoked from the command line as:

java -Xmx512M -classpath .;.\HTMLtools.jar HTMLtools -gui

or using the M.S. Windows script (similar for Mac and Linux): cvtTxt2HTML-GUI.bat. This is illustrated in the following screen shots.
Figure G.1 This shows the Initial graphical user interface. Using the File menu, the user should select either a batch ".doit" file or a parameter ".map" script file. The File menu is shown in Figure G.2
Figure G.2 This shows the File menu, from which the user should select either a batch ".doit" file or a parameter ".map" script file.
Figure G.3 This shows the GUI after selecting the script to process. The user then presses the Process button to start processing. The next Figure G.4 shows the program during processing.
Figure G.4 This shows the GUI during processing. The output from the converter is shown in the Report window in the middle of the GUI. This can be saved into a .txt file or cleared if desired. The next Figure G.5 shows the program after processing is finished.
Figure G.5 This shows the GUI after processing. The output from the converter is shown in the Report window in the middle of the GUI. This can be saved into a .txt file or cleared if desired. The next Figure G.6 shows the list of generated HTML files that may be viewed.
Figure G.6 This shows the GUI's list of generated HTML files to choose from for viewing. Selecting one of them will pop up a Web browser with that file (see Figure G.7).
Figure G.7 This shows the popup Web browser window for the selected generated HTML file you chose to view.
However, if you must explicitly run the Java interpreter, you can do it on the command line (invoked various ways on different operating systems) by typing
java -Xmx256M -classpath . -jar searchGui.jar

or
java -Xmx256M -classpath .;.\HTMLtools.jar HTMLtools -searchGui

This line was put into a Windows .BAT file (SearchGUI.bat) that can be run by clicking on the batch file. Notice that the -Xmx256M specification can be increased or decreased to change the amount of memory used. The default memory may vary on different computers, so you can use the script to force the program to start with more or less memory if you run into problems.
You also need to select the set of samples to use by selecting one or more Experiment Groups (see the Jak-Stat Prospector Web site for details on Experiment Groups). In the 2. Select one or more 'Sample Experiment Groups' window, selecting ALL is the default and will select all 18 arrays. You can click on individual Experiment Groups. To select a range, click on the first one that starts the range and then hold the SHIFT key and click the end of the range. To select non-adjacent Experiment Groups, hold the CONTROL key as you select different groups. Pressing the Reset button will clear these two windows.
The File menu offers some additional data input options. You do not need to use any of these menu options to use the program. However, they can be useful for customizing your search results.
You can save the text output generated during processing that is shown in the 3. Processing Report Log scrollable text area at the bottom of the window. Several File menu commands are used with this, including Clear Report-Log and Save Report-Log As. Note that the Clear report and Save report as commands are also available as buttons at the bottom of the window.
You must specify a list of data search terms in the upper window 1. Enter list of Gene, Well ID or Probe ID.... The simplest way to specify these terms is to either cut and paste or type them into the window. To help demonstrate and simplify specifying the search terms, there are two commands in the File menu for setting the list: Set demo term-list data, which enters a short list (Stat5a Stat5b 1438470_at 1441476_at 1446085_at), and Import user term-list data from a file. The file can be a list of Genes, Feature IDs (probes), Well IDs, or any combination. Several example files are provided, including data.search/LitRefGeneList.txt, data.search/testGeneList.txt, and data.search/testFeatureIDList.txt. The first is a tab-delimited file with all 3 fields; the latter two examples just have lists of Genes or Feature IDs.
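For example, a minimal user term-list file could contain one search term per line (a hypothetical file mixing gene names and Feature IDs):

   Stat5a
   Stat5b
   1438470_at
   1441476_at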
After you finish a search, you can do another one. The File menu option Reset converter, or the Reset button at the bottom of the window, will reset the search specification and make the Process button available.
Figure S.2 This shows the menu options in the File menu.
This menu offers additional processing options described above.
Figure S.3 This shows the popup file browser for specifying a list
of gene/probes in a .txt file using the
(File | Import user term-list data from a file) menu option.
If the testGeneList.txt file was selected, the next figure shows the
new term-list.
Figure S.4 This shows the new term-list specified from importing
the gene list from a file (previous figure).
The View menu offers some additional data input
options. The Verbose reporting check box in this menu
may be enabled if you want to see the details of the search and
table generation as they progress in the Report-Log window.
When the search results table is being generated, you can modify
its presentation using other View options:
Sort descending by column data in generated table
(see Figure S.6 for more details);
Show data heat-map in View HTML, which shows the generated results
table as a colored heatmap (see Figure S.14 for
an example) and is the default; and finally Set data precision for generated
HTML, to adjust the number of digits presented in the generated table
(0 sets it to no fraction, whereas the default -1 shows the full precision
available in the data).
Figure S.5 This shows the menu options in the View menu.
This menu offers additional processing options described above.
Figure S.6 This shows the pop-up query that lets you define the sort name,
which specifies the generated table's gene or gene probe ID column to be used for
the sort. This will then use the gene expression data for the gene
probe you specified to sort the sample rows for the entire table. The default
is not to sort the data, but to use the sample order of the samples in the
expression groups you have specified. This pop-up window is invoked from
(View menu | Sort descending by column data in generated table).
Figure S.7 This shows the dialog box for (View menu | Set data precision
for generated HTML). The default is -1, which prints all digits available
in the generated HTML table. Setting it to 0 removes all fractions (as used in
this example).
Figure S.8 This shows the menu options in the List menu.
You may list some of the data matching the gene/probe search terms or
EG sample search terms prior to doing the search. The first option is
to list all 45K gene/probe IDs. The second menu option lets you specify
gene/probe search terms using either the exact gene names or
substrings; all matching genes/probes will be reported. The third menu
option lets you specify EG sample search terms using selected
EG groups from the list. In addition, this is filtered by a list of substrings
which can be qualified as all being required (AND) or any being required (OR)
if the EG sample search terms are specified. All lists are reported in
the bottom scrollable Report window.
Figure S.9 This shows the results from
(List menu | List matching genes in database) in the Report
window.
The genes/probes matching the substring terms "stat" and "jak"
in the 45K probe database are listed in the scrollable
Processing Report log at the bottom of the window.
Figure S.10 This shows the results from
(List menu | List matching EG samples in database) in the Report
window using the OR condition.
The Expression Group (EG) samples matching the substring terms
".treated" or ".untreated" in the 151-sample database are listed
in the scrollable Processing Report log at the bottom
of the window. It searches within the EG sample groups you have
selected. In this example, we have selected "All samples", but any
other subset could be used. Also, we specified an OR
condition to select samples where either of the search terms
is present.
Figure S.11 This shows the results from
(List menu | List matching EG samples in database) in the Report
window using the AND condition.
The Expression Group (EG) samples matching the substring terms
".stat" and ".GH" in the 18-sample database are listed
in the scrollable Processing Report log at the bottom
of the window. It searches within the EG sample groups you have
selected. In this example, we have selected "All samples", but any
other subset could be used. Also, we specified an AND
condition to select samples where both search terms are present.
Figure S.12 This shows the Search window after the search has been
specified and the Process button has been made available, but before
processing has started. Pressing the Process button will start processing.
This will typically take 7 to 10 seconds, so be
patient. Note that the View HTML button is disabled and will
be enabled after processing is completed.
Figure S.13 This shows the Search window after processing
is finished and the View HTML button is made available. Pressing
it will pop up a local Web browser with the data shown in the next
figure. Note that the Process button is now disabled and will
remain so until you reset the converter using the Reset button.
Figure S.14 This shows the generated table Web page created
by the above search and viewed when the View HTML button
was pressed. The colored cells reflect the quantiles that the data belong
to and are based on (max, min, mean, stddev) statistics computed over
the entire database. The data was sorted by the third probe (Stat5b/1422103_a_at)
and the numeric data was listed with no fractions to make it easier to "eyeball"
the data.
Figure S.16 This shows the report generated that includes the intensity data
followed by the fold-change and statistics for that data generated using the data
in the previous figure.
There are three additional subsets of specialized commands that are described
separately: the 3.1 GenBatchScripts
commands, the 3.2 Tests-Intersection commands,
and the 3.3 Java TreeView commands.
Section 7 describes creating and running the scripts for the
batchScripts/ directory for creating Web
pages.
The parameters that specify the data used in generating the output files include
several directories in the batchScripts/
directory:
Additional data files are used when the -genBatchScripts command is run
including:
There may be multiple instances of the -genCopySupportFile, -genParamTemplate,
-genSummaryTemplate switches.
One could experiment with these parameter files adding or removing various
options such as -dropColumn, -reorderColumn, -sortTable, etc.
Adding fold-change statistics to the generated HTML report
The following procedure was used to compare the fold change of the Stat5 subsets for the
specified genes (the demo set of genes/probes was used) for the two sets of
samples, EG003.1 (Stat5KO+GH) and EG003.2 (Stat5KO-GH), called classes A and B
here and in the SearchGUI menus and report.
Procedure
Both HTML and .txt files are generated. Note that the
fold-change results are appended to the regular table, and the class A and class
B samples have those identifiers prefixed to their sample names. The
fold-change report is in the second half of the report, with the statistics reported
being computed on the column data for each gene/probe. Note: sorting can't be
enabled when generating the fold-change report data since it would cause problems with
the reporting format.
Figure S.15 This shows the menu options in the View menu after
the (View | Report Fold Change of 2 sample subsets) option was enabled.
Note the two new commands that are activated: Assign EG samples to Class A and
Assign EG samples to Class B.
e.g., set 2. filter sample search term to ".stat", select EG003.1 in the scrollable
list, then select
(View menu | Assign EG samples to Class A) to define class A samples
e.g., set 2. filter sample search term to ".stat", select EG003.2 in the scrollable
list, then select
(View menu | Assign EG samples to Class B) to define class B samples.
SearchGUI Help
There are several Web pages that contain the documentation in the Help menu.
Figure S.17 This shows the menu options in the Help menu.
This menu offers links to several documentation Web pages. This
document is the first entry, Documentation on using the Search GUI.
3. COMMAND LINE SWITCHES
Command line switches are case-sensitive and of the form '-switchName:a1,a2,...,an'
where 'switchName' is at least the minimum number of characters of the switch
shown below, and 'a1', 'a2', etc. are the comma-separated switch arguments
with no spaces between the commas and the arguments. Use double quotes
around arguments that contain spaces. Tabs are not allowed and all switches must
be on the same line, unless either the switches are in a parameter file, in
which case they are on separate lines, or the command line is entered
using line continuation characters for the operating system (e.g., '\'
in Unix, etc.). Switches with additional arguments require the comma-separated
arguments after the ':'. We denote the arguments as being within '{'...'}'
brackets. Note that you do not include the '{' or '}' brackets in the
actual switches - they just denote that it is an argument.
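For example, assuming a hypothetical input directory dataXXX/ and a column named 'Gene Name', a command line using these conventions might look like:

   java -Xmx256M -classpath .;.\HTMLtools.jar HTMLtools -inputDirectory:dataXXX -outputDirectory:html -sortRowsByColumn:"Gene Name",D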
There may be multiple instances of some of the switch commands
including: -files, -hrefData, -dropColumn, -keepColumn, -reorderColumn,
-sortTableByColumn, -mapDollarsigns, -mapQuestionmarks, -copyFile,
-copyTree, -genCopySupportFile, -genParamTemplate, -genSummaryTemplate,
-genCopyfile, -genTreeCopyData, -dirIndexHtml.
{parameter command file}
[this argument does not start with '-' and is thus
assumed to be a parameter command file. It will then
get all of the command switches from this file if
present. Examples of command file contents are in
the EXAMPLES section below. By convention, we name
these command text files 'paramXXX.map' with a '.map'
file extension and keep them in the same directory
that we specify with -inputDirectory. We refer to these
files throughout this document as "params .map" files.
The .map file extension is used for tab-delimited text
files that we do not want to convert. We only convert
tab-delimited text files with .txt file extensions.]
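As a sketch of this convention, a minimal paramsXXX.map file simply lists one switch per line (all directory and file names here are placeholders):

   -inputDirectory:dataXXX
   -outputDirectory:html
   -files:results1.txt,results2.txt
   -hdrLines:1
   -addProlog:prolog.html
   -addEpilogue:epilogue.html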
-addE:{opt. epilogue file name}
['-addEpilogue:{opt epilogue filename}' add an
epilogue HTML file in inputDir or user directory
(common epilogue for all conversions). If the
keyword $$DATE$$ or $$INPUTFILENAME$$ is in the
file, it will substitute today's date or file name
respectively. $$FILE_ZIP_EXTENSION$$ will substitute
the file name with a ".zip" extension. Default name is
'epilogue.html'. Default is to not add an epilogue to
the HTML output.]
-addO:{postfix name}
['-addOutfilePostfix:{postfix name}' add a postfix
name to the output file before the .html. E.g., for
an output file 'abc.html', with a postfix name of
'-xyz', the new name is 'abc-xyz.html'. This can
be useful if you are mapping the same input file
by several different param.map files and saving
them all in the same html/ output directory.]
-addP:{opt. prolog file name}
['-addProlog:{opt. prolog file name}' to add a prolog
HTML file in inputDir or user directory (common prolog
for all conversions). If the keyword $$DATE$$ or
$$INPUTFILENAME$$ is in the file, it will
substitute today's date or file name respectively.
$$FILE_ZIP_EXTENSION$$ will substitute the file name
with a ".zip" extension. Default name is 'prolog.html'.
The default is to not add a prolog to the HTML output.]
-addRow
['-addRowNumbers' to preface each row with sequential
row numbers. Default is to not add row numbers.]
-addT
['-addTableName' to add TABLE name to HTML. Default
is to not add the name.]
-allowH
['-allowHdrDups' to allow duplicate column fields
in the header. Default is to not allow duplicates.]
-alt:{color name}
['-alternateRowBackgroundColor:{c}' alternate the
background row cell colors in the <TABLE>.
Default is no color changes.]
-batchP:{file of param specs, opt. new working dir}
['-batchProcess:{file of param specs, opt. new working dir}'
batch process a list of param.map type files specified
in a file. If the {opt. new working dir} value is
specified, it will change the current working directory
of HTMLtools when running -batchProcess so
that you can specify it run in a particular environment.
No other switches should be used with this as they will
be ignored. If errors occur in any of the batch jobs,
the errors are logged in the HTMLtools.log file
and it aborts that particular job and continues on
to do the next job in the batch list. Default
is no batch processing.]
-concat:{concatenatedDataFile,opt."noHTML"}
['-concatTables:{concatenatedDataFile,opt."noHTML"}' to
create a new tab-delimited {concatenatedDataFile} (e.g.,
".txt" or ".map" file) and a .html output file using the
base address (without the ".txt" or ".map" file extensions)
of the {concatenatedDataFile} and if the "noHTML" option
is not specified. The data is from the set of concatenated
input text files data if-and-only-if they have exactly the
same column header names. The -outputDir specifies where
the files are saved. The input files are not converted
to HTML files. Default is to not concatenate the
input files. The -makeMapFile switch can be used
along with the concat switch to make a map file with fewer
columns.]
-copyFile:{srcFile,destFile}
['-copyFile:{srcFile,destFile}' to copy an input source
file {srcFile} to a destination file {destFile}.
There can be multiple instances of this option. Default is
to not copy the file.]
-copyTree:{sourceTreeDir,destDir}
['-copyTree:{sourceTreeDir,destDir}' to copy an input
source tree subdirectory to a destination subdirectory.
There can be multiple instances of this option. Default is
to not copy tree data.]
-dataP:{nbr digits precision}
['-dataPrecisionHTMLtable:{nbr digits precision}' sets the
precision to use in numeric data for a generated HTML file.
The table must be a numeric data table (such as generated
using the '-flipTableByIndexMap' option). If the value is < 0,
then use the full precision of the data (as supplied in the input
string data). If {nbr digits precision} >= 0, then clip digits
as required.]
-dirIndexHtml:{dir,'O'verride or 'N'ooverride}
['-dirIndexHtml:{dir,'O'verride or 'N'ooverride}' to create
"index.html" files of all of the files in the specified directories
in the list of directories specified with multiple copies
of this switch. It is useful when copying a set of directories
on a Web server that does not show the contents of the directory
if there is no index.html file. In addition, if the corresponding
flag is 'O'verride, then overwrite the "index.html" file if it
already exists in that directory; otherwise don't regenerate the
"index.html" file. Do this recursively on each directory.
Default is no index.html file generation. Multiple copies
of the switch are allowed.]
-dropColumn:{column header name}
['-dropColumn:{column header name}' to specify a
column to drop from the output TABLE. There can be
multiple instances of this switch.]
-exportB:{opt. big size threshold}
['-exportBigCellsToHTMLfile:{opt. size for big}'
to save the contents of big cells as separate
HTML files with a prefix
'big-R<r>C<c>-<outputFileName>'.
So for a (r,c) of (4,5) and a file name 'xyz.html',
the generated name would be 'big-R4C5-xyz.html'.
The big size threshold defaults to 200. Default is no
exporting of big cells.]
-extractR:{colName,rowNbr,resourceTblFile,htmlStyle}
['-extractRow:{colName,rowNbr,resourceTblFile,
htmlStyle}' to get and lookup a keyword in the
table being processed at (colName,rowNbr) and then
to search a resourceTblFile for that keyword. If
it is found, then it will extract the header row and
the data row from the resource file and create
HTML of htmlStyle to insert into the epilogue.
If $$EXTRACT_ROW$$ is in the epilogue, then
replace it with the generated HTML else insert
the HTML at the front of the epilogue. The
htmlStyles may be DL, OL, UL and TABLE. Default
is no row extraction.]
-fastE:{outTblFile}
['-fastEditFile:{opt. output file}' to process the input
file data line by line without buffering the data in a
Table structure, remapping each
line on the fly using -mapHdrNames,
{-dropColumns or -keepColumns} followed by
-reorderColumns. Data is written immediately to an
output stream so it can handle huge files. Because
it is sequential, it can't do a -sortRowsByColumnData.
This would generally be used to generate a tab-delimited
.txt file that can be randomly accessed. HTML table
generation is disabled. It is used instead of
'-saveEditedTable2File:{outTblFile,opt. "noHTML"}'
and overrides the -saveEditedTable2File options.
Default is not to do a fast edit.]
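A sketch of a fast-edit params .map script that trims and reorders the columns of a very large file without loading it into memory (the column and file names are hypothetical):

   -inputDirectory:dataXXX
   -files:hugeTable.txt
   -dropColumn:Comments
   -reorderColumn:GeneName,1
   -fastEditFile:hugeTable-edited.txt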
-files:{f1,f2,...,fn}
['-files:{f1,f2,...,fn}' to specify a list of files
here rather than all of the files in the
inputDir. You can have multiple instances of this
switch.]
-flipA:{flipAclass}
['-flipAclass:{flipAclass}' to specify the list of EG samples used
in class A if reporting the fold-change data in the flipped Table report.
Default is no list of EG samples is specified.]
-flipB:{flipBclass}
['-flipBclass:{flipBclass}' to specify the list of EG samples used
in class B if reporting the fold-change data in the flipped Table report.
Default is no list of EG samples is specified.]
-flipC:{flipColumnFile,flipColumnName} or -flipC:{*LIST*,flipColumnName,v1,v2,...vn}
['-flipColumnName:{flipColumnFile,flipColumnName}'
to specify the source Table column name to use in
filtering which row data to use in the
'-flipTableByIndexMap' operation. An alternative specification
is '-flipColumnName:{*LIST*,flipColumnName,v1,v2,...vn}'
where the values are listed explicitly. Multiple instances
of this '-flipColumnName' switch are used to specify
the header entries by '{flipColumnName}' of the new
flipped table. If the {flipColumnFile} files exist,
they are used to filter the {flipDataFile} row entries.
Only the rows of the original Table that match one
of the {column-data-list} entries will be transposed.
Default is to transpose all rows unless the filter
files are specified.]
-flipD:{flipDirectory}
['-flipDirectory:{flipDirectory}' to specify the directory to
save the generated flipped Table. Default is the data.search
directory.]
-flipE:{flipExcludeColumnName}
['-flipExcludeColumnName:{flipExcludeColumnName}' to specify the
column names from the source Table to exclude from the final flipped
Table using the '-flipTableByIndexMap' operation. Multiple instances
of this switch are allowed. Default is to include all data Table
columns unless the filter is specified.]
-flipO:{colHdrName1,colHdrName2,...,colHdrNameN}
['-flipOrderHdrColNames:{colHdrName1,colHdrName2,...,colHdrNameN}'
to specify the list of columns in the source Table that will be
used to create the flipped Table multi-line header entries.
This option must be specified when using the '-flipTableByIndexMap'
operation.]
-flipRowF:{flipRowFilterNamesfile} or
-flipRowF:{*LIST*,name1,name2,...,nameK}
['-flipRowFilterNamesFile:{flipRowNamesFile}' or the alternate
'-flipRowFilterNamesFile:{*LIST*,name1,name2,...,nameK}' switch specifies
the source Table column names to use in filtering which source sample
columns' data will be used as rows in the final flipped Table using the
'-flipTableByIndexMap' operation. If the "*LIST*" name is used instead of the file
name, then the rest of the switch specifies the row names explicitly. Only the
columns of the original Table that partially match one of the
{flipRowNamesFile} entries will be transposed. Default is to transpose
all data Table columns unless the filter is specified.]
-flipRowGSP:{list of filter substrings}
['-flipRowGSPIDfilters:{list of filter substrings}' is an optional
list of substring filters used to filter Experiment Group sample name
rows in the flipped table computation when using the
'-flipTableByIndexMap:{flipDataFile,flipIndexMapFile}' switch. It matches
case-independent substrings in the GSP ID names for the samples where
if more than one substring is specified, then they must all be found
for that sample to be used (e.g., ".Stat .GH" requires a ".Stat" and
a ".GH" to be present). Default is no filtering.]
-flipSa:{flipSaveOutputFile}
['-flipSaveOutputFile:{flipSaveOutputFile}' is the alternate
output (HTML and TXT) file name to use when generating the
flipped Table using the
'-flipTableByIndexMap:{flipDataFile,flipIndexMapFile,(opt)maxRows}'
switch. Default is to generate the output file name from the
input file base name, adding a postfix using the
'-addOutfilePostfix:{postfix name}' or "-flipped" default
postfix. If the switch is not specified, it will use the base
input file name. (See Example 14
for an example of its usage.) ]
-flipT:{flipDataFile,flipIndexMapFile,(opt)maxRows}
['-flipTableByIndexMap:{flipDataFile,flipIndexMapFile,(opt)maxRows}'
to generate a transposed file using random access file indexing to
create a multi-line header (1 line for each column name in the
list) using the list of columns previously specified with the
-flipColTableList and -flipRowTableList filters. It uses the index-map
created with '-makeIndexMapFile:{colName1,colName2,...,colNameN}'
command. It analyzes the index map Table and then uses
all columns before the ("StartByte", "EndByte") columns
to define the flipped Table header. See the '-flipColTableList' and
-flipRowTableList to restrict which flipped column data to use.
See the '-flipRowTableList' to restrict which flipped row data to
use. Default is to not flip the Table.]
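As an illustration of how the flip switches combine (assuming the index map was created in an earlier run with '-makeIndexMapFile'; all file, column, and value names here are hypothetical), a flip params .map script might contain:

   -flipTableByIndexMap:allSamples.txt,allSamples.idx
   -flipOrderHdrColNames:Gene,FeatureID
   -flipColumnName:*LIST*,Gene,Stat5a,Stat5b
   -flipRowFilterNamesFile:*LIST*,EG003.1,EG003.2
   -sortFlipTableByColumnName:Stat5a
   -showDataHeatmapFlipTable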
-flipU:{TRUE | FALSE}
['-flipUseExactColumnNameMatch:{TRUE | FALSE}' to specify the exact
match filter flag. If an exact match, then match '-flipColumnName:{names}'
exactly; otherwise look for substring matches. Ignore case in
both instances. This option may be specified when using the
'-flipTableByIndexMap' operation. The default if no
flipUseExactColumnNameMatch is specified is "AND".]
-font:{-1,-2,-3,-4,+1,+2,+3,+4}
['-fontSizeHtml:{font Size modifier}' to change
the <TABLE> FONT SIZE in the HTML file.]
-gui
['-gui' to invoke the graphical user interface version
of the converter. See
Using the Graphical User Interface (GUI) to run the converter.]
-hdrL:{n}
['-hdrLines:{n}' the number of lines to include in the header.
The last header line is the one searched for mapping column URLs.
Default is 1 line.]
-hdrM:{oldHdrColName,newHdrColName}
['-hdrMapName:{oldHdrColName,newHdrColName}' to map
an old header column name {oldHdrColName} to a new
name {newHdrColName}. There may be multiple instances
of this switch. Default is to not do any mappings.]
-joinT:{joinTableFile}
['-joinTableFile:{joinTableFile}' adds the contents of
the {joinTableFile} file to the table being processed.
This allows us to add fields that can be used for
sorting the new table by the {joinTableFile} data
if it is defined. This switch can not be used with
the -fastEditFile option. Default is not to join
any tables.]
-keepColumn:{colName}
['-keepColumn:{colName}' specifies which columns
to keep in multiple instances of the switch.
Then, when the Table is processed, it drops all
columns not listed. It may be used as an
alternative to -dropColumn as the Table may have
unknown column names. Default is not active.]
-help (or '?')
[print instructions to see the README.txt file.]
-hrefD:{colName,Url,mapToken}
['-hrefData:{colHdrName,Url,(optional)mapToken}' to
get the mapping of column header name and the Url to use
as a base link to use for making a URL for Table data
in that column. It makes the URL by appending the data
in cells in that column to the Url. ([TODO] If
the optional mapToken is specified, then replace the cell
contents for the occurrence of the mapToken in Url.)
There can be multiple instances of this switch. See
the following switch '-hrefHeaderRow' to change the mapping
from Table data to header rows. ]
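For example (the column name and base URL here are hypothetical), the following turns each value in a 'GeneSymbol' column into a link by appending the cell data to the base URL:

   -hrefData:GeneSymbol,http://example.org/gene?name=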
-hrefHeaderRow
['-hrefHeaderRowMapping' is used with the above switch
'-hrefData:{colHdrName,Url,(optional)mapToken}' to map
the data in the header row(s) instead of the data in
the Table data columns. It searches the first column of
the header rows to find the colHdrName to determine
the row to be mapped to that colHdrName. Unlike the
-hrefData option, the colHdrName can be embedded within
a string. The default is not to map the header rows.]
-inputD:{input directory}
['-inputDirectory:{input dir}' where the input
tab-delimited table .txt files to be converted are
found. By convention, we name other text files that
we may need, and want to keep in the inputDirectory
but do not want to convert to HTML, with a '.map'
file extension. Examples of non-data files include
'paramXXX.map', 'prolog.html', 'epilogue.html', etc.
Default directory is 'data/'.]
-limitM:{maxNbrRows,(opt.)sortFirstByColName,(opt.)'A'scending or 'D'escending}
['-limitMaxTableRows:{maxNbrRows,(opt.)sortFirstByColName,
(opt.)'A'scending or 'D'escending}' to limit the number of
rows of a table to {maxNbrRows}. If the {sortFirstByColName}
is specified, then sort the table first before limiting the
number of rows. Default is not to limit rows.]
-log:{new log file name}
['-logName:{new log file name}' to log all
information about the processing to the console and
then to save this output in a log file. The new file
must end in ".log". Default is to use the
"HTMLtools.log" file name.]
-makeI:{colName1,colName2,...,colNameN}
['-makeIndexMapFile:{colName1,colName2,...,colNameN}' to
make an index map Table file (same name as the input file
but with an .idx file extension) of the input file (or the
file output from -saveEditedTable2File after the input
table has been edited). The index file will contain the
specified columns in the column-list followed by the
StartByte, EndByte for data in the input table with those
column values. This file can then be used to quickly
index a huge input file probably using a Hash table of
the selected column names instances to lookup the
(start,end) file byte pointers to random access the
large file. The software to use the index file is not
part of HTMLtools at this time.
The default is not to make an index map file.]
-makeM:{makeMapTblFileName,orderedCommaColumnList}
['-makeMapFile:{makeMapTblFileName,orderedCommaColList}'
used with the -concatTables command to also make a map
file at the same time. This switch is only used
with -concatTables. Default is no map is made.]
-makeP
['-makePrefaceHTML' to make a separate preface
HTML file from the input text preceding the table
data. The file has the same name, but has a
"preface-" added to the front of the file name. The
first generated HTML file is then linked from the
second generated file. Default is no preface file.]
-makeS
['-makeStatisticsIndexMapFile' to make a 'Statistics Index Map'
table file with the same base file name as the index map (.idx)
but with a .sidx file extension. It is invoked after the
IndexMap file is created (using the '-makeIndexMapFile' switch).
Therefore, it must be specified in a subsequent command line
(if using batch). Default is not to make a Statistics Index Map.]
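For example, assuming hypothetical script names, a first script containing '-makeIndexMapFile:Gene,FeatureID' would create the .idx file, and a second script containing '-makeStatisticsIndexMapFile' would then create the .sidx file; the batch list would run them in order:

   dataXXX/params-make-index.map
   dataXXX/params-make-stats.map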
-mapD:{$$keyword$$,toString}
['-mapDollarsigns:{$$keyword$$,toString}' to
map cell data of the form '$${keyword}$$' to
{toString}. The preface, epilogue as well as the
table cell data is checked to see if any keywords
should be mapped. There may be multiple instances
of this switch. Default is to not do any mappings.]
-mapH:{mapHdrNamesFile,fromHdrName,toHdrName}
['-mapHdrNames:{mapHdrNamesFile,fromHdrName,toHdrName}'
to map header names. E.g., map long to short header
names, or map obscure to well-defined header names.
The map file (specified with a relative path) is
tab-delimited and must contain both the {fromHdrName}
and {toHdrName} entries. Default is no mapping.]
-mapO
['-mapOptionsList' to map ;; delimited strings
to inactive <OPTION> pull-down option lists.
Default is no mapping to option lists.]
-mapQ:{??keyword??,toString}
['-mapQuestionmarks:{??keyword??,toString}' to
map cell data of the form '??{keyword}??' to
{toString}. If the toString is BOLD_RED,
BOLD_GREEN, or BOLD_BLUE, then just map the
whole ??{keyword}?? string to bold red (green,
or blue). The preface, epilogue as well as the
table cell data is checked to see if any keywords
should be mapped. There may be multiple instances
of this switch. Default is to not do any mappings.]
-noB
['-noBorder' to set no border for tables. The
default is there is a 'BORDER=1' in the TABLE.]
-noHeader
['-noHeader' set no header for tables. The
default is there is a header in input file.]
-noHTML
['-noHTML' set to not generate HTML if it would
normally do so. This switch disallows generation of
HTML when doing input file processing if that
operation also allows HTML generation. This is useful
when editing large input files to generate
index maps or saved files. The default is to allow
the generation of HTML.]
-outputD:{output directory}
['-outputDirectory:{output directory}' to set the
output directory. The default directory is 'html/'.]
-reorderC:{colName,newColNbr}
['-reorderColumn:{colName,newColNbr}' to reorder
this column to the new column number. You may
specify multiple new columns (they must be
different). Those columns not specified are moved
toward the right. This is done after the list of
dropped columns has been processed. There can be
multiple instances of this switch. Default is not
to reorder columns.]
-reorderR
['-reorderRemainingColumnsAlphabeticly' if doing
a set of -reorderColumn operations, sorts the remaining
columns (those used but not explicitly specified) alphabetically.
Default is not to sort the remaining columns.]
-rmvT
['-rmvTrailingBlankRowsAndColumns' in the table.
Default is not to remove trailing blank lines or
trailing blank columns.]
-saveE:{outTblFile,opt. "HTML"}
['-saveEditedTable2File:{outTblFile,opt. "HTML"}'
to make a Table file from the modified input
table stream. It is created after the Table is
edited by -dropColumns, -keepColumns,
-reorderColumns, -sortRowsByColumn. If the outTblFile
is not specified (i.e., ":,") then the input file name
with the name from the input file with the postfile
name from the '-addOutfilePostfix:{postfix name}' is
used. If the "HTML" option is set, it also outputs the
HTML when doing this operation. Note that the switch
should not be used with '-fastEditFile:{opt. output file}'
which can be used for converting very large files
without generating the HTML file. Default is not to save
the Table.]
-searchGui
['-searchGui' to invoke the graphical user interface for the
database search engine to generate a flip table. See
Search Database GUI generating specialized reports.
Also see Example-17
for examples of the default parameter file used as the
basis of the flip table generated. Default is no search GUI.]
-shrinkB:{opt. size for big,opt. font size decrement}
['-shrinkBigCells:{opt. size for big,opt. font size
decrement}' in the Table with more than the big
threshold number of characters/cell by decreasing
the font size to -5 (or the opt. font size
decrement) for those cells. The big size threshold
defaults to 25 characters. Setting the threshold to
1 forces all cells to shrink. Default is not to
shrink cells.]
-showDataHeatmapFlipTable
['-showDataHeatmapFlipTable' used to generate colored heat-map
data cells in a HTML conversion for a flip table using the
'-flipTableByIndexMap' option. It uses the global statistics
on the (digital) data in the Statistics Index Map .sidx file
if it exists to normalize the data and generate a cell color
background range in 7 quantiles of colors: dark green,
medium green, light green, white, light red, medium red, dark red.
Default is not to generate the colored heatmap.]
-sortFlip:{col data name}
['-sortFlipTableByColumnName:{col data name}' specifies the name
of the field in the flip table to use in sorting by column data in
descending order in the generated table. It is used with the
'-flipTableByIndexMap' option. Note this name can be any of the
flipped header column values (multi-header data names). When doing
the sort it matches the specified name with any of the header
rows to find the column to use for the sort. Default is not to
sort the generated flip table.]
-sortR:{colName,'A'scending or 'D'escending}
['-sortRowsByColumn:{colName,'A'scending or 'D'escending}' to sort
the rows by the specified column.
You can specify 'A'scending or 'D'escending. This is done after
any columns have been dropped or reordered. If the column is not
found, don't sort - just
continue. You can have multiple instances of this switch. If the
first column name is not found, it looks for the second, etc.,
and only skips the sort if no column names are found. Default
is not to sort the table.]
-startT:{keyword}
['-startTableAtKeywordLine:{keyword}' specifies the start
of the last line of a Table header by a keyword that
is part of any of the fields in that line. This is
useful when reading a file with complex preface info
with possibly multiple blank lines. It can be used
with the '-hdrLines' switch to specify multiple
header lines. Default no keyword search.]
-tableD:{tablesDirectory}
['-tableDir:{tablesDirectory}' to set the various mapping
tables directory. These tables are used during various
conversion procedures. They include both the .txt and
the .map file (same file, but with different extensions).
Examples include: EGMAP.map(.txt), ExperimentGroups.map(.txt),
mAdbArraySummary.map(.txt). The default directory is
'data.Table/'.]
-useOnly
['-useOnlyLastHeaderLine' to reduce the number of header
lines to 1 even if there are more than 1 header line.
Default is to use all of the header lines.]
3.1 GenBatchScripts COMMANDS Extension
These commands are used to create batch scripts for subsequent use by
HTMLtools. This set of commands is called the GenBatchScript commands.
The GenBatchScript process is described in
Section 1.1. The -genBatchScript command is only used to generate these
batch scripts in a set of structured trees suitable for copying directly to a
Web server. It uses a test-ToDo-list.txt Table to specify a list of tests, column
"Test-name", a column "Relative directory" where the data is to be saved and some
documentation columns "Page label", "Page description", and "Tissue name"
that are used for helping generate the Summary HTML Web pages and
params .map files used in the subsequent conversion of the .txt Table
data files to HTML documentation. See Example 15
for an example of a params .map file using the GenBatchScripts commands.
-genBatch:{batchDir,paramScriptsDir,inputTreeDir,summaryDir,analysisDir}
['-genBatchScripts:{batchDir,paramScriptsDir,inputTreeDir,
outputTreeDir,analysisTreeDir,JTVDir}' to generate a set
of scripts to batch convert a set of tab-delimited Table
test data files specified by the -genTestFile:{testToDoFile}
Table in the {batchDir} directory. It generates a set of
parameter .map files in the {paramScriptsDir} directory. It
also generates a set of summary HTML Web pages in {summaryDir}
that describe the data, one page for each type of tissue,
and (pre) generates links to data that will be generated in
the {analysisTreeDir} when the batch script is subsequently
run. These new params .map files can then be run by a converter
batch file called buildWebPages.doit started with a
Windows buildWebPages.bat BAT file to start the batch
job (both files are in the batchDir directory along with a
copy of HTMLtools.jar). The buildWebPages.bat file
could easily be edited to run on MacOS-X or Linux. The paths
created in the {inputTreeDir}, and {analysisTreeDir} base paths
use the "Relative Directory" data in the {testToDoFile} within
those directories. This generated batch .doit script will
process a data set to generate a set of HTML pages and
converted database .txt files defined by the {testToDoFile}
Table database. Default is no batch script generation.
Additional switches required with -genBatchScripts are:
-genTestFile, -genMapEGdetails, and -genMapEGintroduction]
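A sketch of the core of a params-genBatchScripts.map script showing how the required switches fit together (the directory and file names here are placeholders, not required values):

   -genBatchScripts:batchScripts,ParamScripts,InputTree,Summary,Analyses
   -genTestFile:mAdb-TestsToDo.txt
   -genMapEGdetails:EGMAP.map
   -genMapIntroduction:CellTypeTissue.map
   -tableDir:data.Table
   -outputDir:batchScripts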
-genC:{support file}
['-genCopySupportFile:{support file}' to specify a list
of support files to copy to the output batchDir (e.g.,
'-outputDir:batchScripts'). The support files are
specified with a list created using multiple instances of
-genCopySupportFile:{support file}. Default is no support
files to copy.]
-genMapEGd:{EGdetailsMapFile}
['-genMapEGdetails:{EGdetailsMapFile}' specifies the
'details' Table used when the -genBatchScripts switch is
invoked. This is required when the -genBatchScripts switch
is used.]
-genMapIntro:{introductionMapFile}
['-genMapIntroduction:{introductionMapFile}' specifies the
'Introduction' Table used when the -genBatchScripts switch
is invoked. This is required when the -genBatchScripts
switch is used.]
-genP:{name,paramTemplateFileName}
['-genParamTemplate:{name,paramTemplateFileName}' to
specify a list of parameter map templates that are used
for dynamically mapping the test-ToDo-list data into the
(param-MRR, param-MRR-keep, param-JTV), etc., scripts. The
following keywords are mapped if they appear in
any of these templates: $$TISSUE$$, "$$TEST_NAME$$",
"$$MRR_FILE$$", $$DESCRIPTION$$, $$PROLOG$$, $$EPILOG$$,
$$DATE$$. Multiple unique instances are allowed. The
default is no parameter templates.]
-genS:{orderNbr,templateFileName}
['-genSummaryTemplate:{orderNbr,templateFileName}' to define
a list of Summary Templates that are used for dynamically mapping the
test-ToDo-list data into the (summaryProlog, summaryExperimental,
summaryAnalysis, summaryFurtherAnalysis, summaryEpilogue),
etc., sections. Multiple -genSummaryTemplate:{orderNbr,
templateFileName} instances can be used to generalize
the currently hardwired templates. These are then mapped into the
following keywords that may appear in any of these templates:
$$TISSUE$$, $$LIST_EXPR_GROUPS$$, $$DESCRIPTION$$,
$$ANALYSIS$$, $$FURTHERANALYSIS$$, $$DATE$$. The
$$INTRODUCTION$$ is extracted from the {"CellTypeTissue.map"}.
Default is no templates being defined. Multiple instances
are allowed, where they are concatenated by the orderNbr
associated with each template.]
-genTest:{testToDoFile}
['-genTestFile:{testToDoFile}' specifies the tests to do
when the -genBatchScripts switch is invoked. This is
required when the -genBatchScripts switch is used.]
-genTree:{sourceTreeDir,destDir}
['-genTreeCopyData:{sourceTreeDir,destDir}' to copy an
input data tree to the batch scripts subdirectory.
There can be multiple instances of this option.
Default is to not copy tree data.]
3.2 Tests-Intersection COMMANDS Extension
These Tests-Intersections subset of commands are only used to create
Tests-Intersection tables from mAdb Retrieval Reports (MRR) containing fold-change
data from the Tests-ToDo database used with GenBatchScripts. The primary command to
invoke this is the makeTestsIntersectionTbl switch. These Tests-Intersection commands
can be used with the regular HTML or table editing commands such as
the '-noHTML' and/or '-saveTable' switches. If HTML is generated, then the
'-addProlog', '-addEpilogue', '-mapQuestion', '-mapDollar', '-sortByColumn',
'-limitMaxTableRows', etc. switches may also be used. See Example 13 for
an example of generating a Tests-Intersection tab-delimited table and HTML
Web page.
-addFCranges
['-addFCrangesForTestsIntersectionTable' may be used when
generating a table Tests-Intersection Table using the
'-makeTestsIntersectionTbl:{testsToDoFile}'. This switch
does a simple fold-change (FC) row analysis after the
Tests-Intersection Table is created by adding ("Min FC"
"Max FC" "FC Range") data for each row. Because this
extends the table, you can sort by any of these fields.]
-addRange
['-addRangeOfMeansToTItable' to add the ("Range Mean A",
"Range Mean B" and "FC counts %") computations to an expanded
Tests-Intersection Table. Default is to not add these
fields.]
-filterData:{dataTableField,d1,d2,...,dn}
['-filterDataTestIntersection:{dataTableField,d1,d2,...,dn}' that
is used with the '-makeTestsIntersectionTbl:{testsToDoFile}' to
filter the MRR rows using the specified MRR {dataTableField}
and use it if it matches any of {d1,d2,...,dn} substrings.
The default is not to filter the Tests-Intersection Table.]
-filterTest:{testTableField,d1,d2,...,dn}
['-filterTestTestIntersection:{testTableField,d1,d2,...,dn}' that
is used with the '-makeTestsIntersectionTbl:{testsToDoFile}' to
filter the Tests-ToDo table rows using the specified {testTableField}
and use it if it matches any of {d1,d2,...,dn} substrings.
The default is not to filter the Tests-Intersection Table.]
-makeT:{testsToDoFile,testsInputTreeDir}
['-makeTestsIntersectionTbl:{testsToDoFile}'
generates a Tests-Intersection Table that contains
data from the individual tests in the tests input data tree
specified by the tests in the {testsToDoFile} (located in the
-tableDir directory), which specifies the relative data file tree.
The tree is found in the -inputDir directory. The data files in
the tree are used as input data. The computed table is
organized by rows of +FC genes/Feature-IDs and -FC
genes/Feature-IDs. The data from the {testsToDoFile} is used
to get additional information for each test as follows.
This switch is used with the '-noHTML' and/or '-saveTable'
switches. If HTML is generated, then the '-addProlog' and
'-addEpilogue', '-mapQuestion', and '-mapDollar' can be used.
You can filter the MRR rows using the
'-filterDataTestIntersection:{dataTableField,d1,d2,...,dn}' and
the {testsToDoFile} test data using the
'-filterTestTestIntersection:{testTableField,d1,d2,...,dn}'.
The default is not to make the Tests-Intersection Table. You
can do a simple FC row analysis by adding ("Min FC" "Max FC"
"FC Range") for each row using the
'-addFCrangesForTestsIntersectionTable' switch.]
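The fold-change row analysis added by '-addFCrangesForTestsIntersectionTable' amounts to a
per-row minimum, maximum, and range over the fold-change values. The following minimal Java
sketch illustrates that arithmetic only; it is not the actual HTMLtools implementation, and
it assumes "FC Range" is simply the maximum minus the minimum:

// Illustrative sketch of the per-row ("Min FC", "Max FC", "FC Range") analysis;
// not the actual HTMLtools code.
public class FCRangeSketch {
    // Returns {minFC, maxFC, rangeFC} for one row of fold-change values.
    static double[] fcRange(double[] foldChanges) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double fc : foldChanges) {
            if (fc < min) min = fc;
            if (fc > max) max = fc;
        }
        return new double[] { min, max, max - min };   // assumed definition of "FC Range"
    }

    public static void main(String[] args) {
        double[] row = { 1.8, -2.4, 3.1 };             // hypothetical FC values for one gene
        double[] r = fcRange(row);
        System.out.printf("Min FC=%.2f  Max FC=%.2f  FC Range=%.2f%n", r[0], r[1], r[2]);
    }
}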
3.3 Java TreeView COMMANDS Extension
These commands are only used to convert Java TreeView (JTV) mAdb heatmap data
files for use on the Jak Stat Prospector Web site. In addition to mapping the
sample names from mAdb names to GSP ID names, it reorders the data so that the
"gene - gene description" appears first rather than the "WID: #" in the "NAME"
field. It also changes the contents of the "YORF" field data to the
"gene - gene description" data so that when mousing over a heatmap cell in
the zoom window, the upper left-hand corner displays the "gene - gene description"
for the row and the GSP ID sample for the column.
Reorder:
"WID:... || xxxxxx_at || MAP:... || gene -- geneDescr. || RID:..."
to
"gene -- geneDescr. || xxxxxx_at || WID:... || MAP:... || RID:..."
You cannot mix tab-delimited-file-to-HTML conversions with JTV conversions in
the same params .map file.
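The NAME field reordering shown above is a simple reshuffling of the ' || '-separated fields
so that the "gene -- geneDescr." entry comes first. The following minimal Java sketch
illustrates that string manipulation only; it is not the actual HTMLtools code, and it assumes
the fields always arrive in the five-field order shown above (the example values are hypothetical):

// Illustrative sketch of the NAME field reordering described above; not the
// actual HTMLtools implementation.
public class JTVNameReorderSketch {
    static String reorderName(String name) {
        String[] f = name.split(" \\|\\| ");   // {WID:..., xxxxxx_at, MAP:..., gene -- geneDescr., RID:...}
        if (f.length < 5) return name;         // leave unexpected formats unchanged
        return String.join(" || ", f[3], f[1], f[0], f[2], f[4]);
    }

    public static void main(String[] args) {
        // hypothetical example NAME string
        String in = "WID:12345 || 1418507_s_at || MAP:... || Socs2 -- suppressor of cytokine signaling 2 || RID:...";
        System.out.println(reorderName(in));   // the "gene -- geneDescr." field is now first
    }
}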
-jtvB:{button name for JTV activation button}
['-jtvButtonName:{button name for JTV activation button}'
that may be used with '-jtvHTMLgenerate' to label the
button to activate Java TreeView. The default is
"Press the button to activate JTV".]
-jtvC:{JTV jars directory}
['-jtvCopyJTVjars:{JTV jars directory}' to copy the
JTV jar files and plugins to the jtvOutputDir.
The default is no copying of the .jar files.]
-jtvD:{description text for prologue}
['-jtvDescription:{description text for prologue}'
that may be used with '-jtvHTMLgenerate' to insert
additional text into the prolog where it replaces
$$DATA_DESCRIPTION$$. The default is no description.]
-jtvFiles:{f1,f2,...,fn}
['-jtvFiles:{f1,f2,...,fn}' to specify a list of files
to process rather than all of the files in the
jtvInputDir. You can have multiple instances of this
switch.]
-jtvH:
['-jtvHTMLgenerate' to generate a HTML file to invoke
the JTV applet for each JTV specification in the
jtvInputDir. It puts the HTML file in the jtvOutputDir.
Some of the non-JTV HTML modification switches are
operable including: '-addEpilogue', '-addOutfilePostfix',
'-addProlog', '-mapQuestionmarks'. The default is to not
generate JTV HTML.]
-jtvI:{input JTV directory}
['-jtvInputDir:{input JTV directory}' to set the
input directory of JTV sub directories. This contains
the zipped or unzipped JTV files downloaded from mAdb.
Each zip file contains 3 files with (.atr,.cdt,.gtr)
extensions. Default directory is 'JTVinput/'.]
-jtvO:{output JTV directory}
['-jtvOutputDir:{output JTV directory}' to set the
output directory of JTV sub directories. The converted
JTV directory and a corresponding HTML file are saved
there. Default directory is 'JTVoutput/'.]
-jtvN:{mAdbArraySummary,mapHdrNamesFile,fromHdrName,toHdrName}
['-jtvMapping:{mAdbArraySummaryFile,mapHdrNamesFile,
fromHdrName,toHdrName}' to convert a list of sub
directories of JTV file sets by reading the three files
from each of the subdirectories in the jtvInputDir
directory. The {mAdbArraySummaryFile} and {mapHdrNamesFile}
are specified with a relative path. It maps the .cdt file
in each sub directory to use the {toHdrName} column of
the equivalent {mapHdrNamesFile} map Table instead of
the "EID:'mAdb ID'" as generated by mAdb. The mapping
between "mAdb ID" and short array names is done using
the {fromHdrName} column of the jtv_mAdbArraySummaryFile
Table map. It then writes out the JTV subset to a created
sub directory in jtvOutputDir that has the same base
name as the input JTV subdirectory being processed.
See the optional switches: '-jtvInputDir:{jtvInputDir}'
and '-jtvOutputDir:{jtvOutputSubDir}' to set the
directories to other than the defaults ("JTVinput" and
"JTVoutput"). The values for {fromHdrName} and {toHdrName}
should be column names in the {mapHdrNamesFile}.]
-jtvR [TODO]
['-jtvReZipConvertedFiles' to reZip the converted files
in the output JTV directory in a file with the same name.
Default is not to zip the converted files.]
-jtvTableDir:{tablesDirectory}
['-jtvTableDir:{tablesDirectory}' to set the various mapping
tables directory. These tables are used during various
conversion procedures. They include both the .txt and
the .map file (same file, but with different extensions).
Examples include: EGMAP.map(.txt), ExperimentGroups.map(.txt),
mAdbArraySummary.map(.txt). Note: this switch is used when
processing JTV files, but may also be set with the
'-tableDir:{tablesDirectory}' switch. The default directory
is 'data.Table/'.]
4. EXAMPLES
We demonstrate running the program with a set of examples (and a few sub
examples). The first eight (1 through 8) are for converting tab-delimited .txt
files to .html files. Example 9 illustrates remapping sample labels for a Java
TreeView conversion. Example 10 shows how these examples can be run by specifying
a list of parameter .map files using a batch command. Example 11 illustrates
editing a very large .txt file into another .txt file using the fast edit command.
Example 12 illustrates URL mapping the header data in a transposed table.
Example 13 generates a Tests-Intersection .txt and .html table from the tests data
also used in Example 15. Example 14 illustrates generating a flipped table with
hyperlinked multi-line headers with data filtered by row and column name filters.
Example 15 illustrates generating a set of batch jobs to
convert data described in a table file, generating summary Web pages and a set of params
.map files in a tree structure. Example 16 is used for preparing a database
and Index Map files for use in the GUI-based database search shown
in Example 17.
Example 1.
The program with no arguments uses the defaults described above. It will look
for tab-delimited .txt files in the default input directory (data/) and save
the generated HTML files in the output directory (html/). It also looks for
the default template files prolog.html and epilogue.html in the current directory.
================================================================
HTMLtools
Example 2.
The defaults for Example 1 are shown explicitly here. [The '\' indicates
line continuations in Unix for ease of reading here, but they should all be
on the same line when the command is issued from the command line unless
line continuation characters are used for your particular operating system.]
================================================================
HTMLtools -addProlog:prolog.html -addEpilogue:epilogue.html \
-inputDir:data -outputDir:html -tableDir:data.Table
Example 3.
This gets the arguments from the default data/params.map file if
it exists. These params .map files are generally kept in the same
directory as the .txt input files to be converted. We could use any other
file extension except .txt (since we are converting all .txt files found in
the data directory). So by convention we use the .map file extension instead.
================================================================
HTMLtools data/params.map
Example 4.
This uses a simple spreadsheet with row background colors
alternated between the prolog background color and white; big
cells have their font shrunk, and trailing blank rows are removed.
The switches are in file 'params-GSPI-EG.map'. The
'-extractRow:"Experiment Group ID (1),1,data.Table/ExperimentGroups.map,DL"'
switch tells it to extract the row whose data in column
"Experiment Group ID (1)" matches the column of the same name
in the ExperimentGroups.map file (row 1) and generate a
<DL> list in the epilogue. This lets you use data from
a meta-database table to document each of the individual tables being
converted.
E.g., GSP-Inventory.xls EG samples
data saved from Excel worksheets. Note: you must double quote arguments that use
spaces.
===============================================================
HTMLtools data.GSPI-EG/params-GSPI-EG.map
where: data.GSPI-EG/params-GSPI-EG.map
contains:
#File:params-GSPI-EG.map
#"Revised: 3-30-2009"
#
-addPrologue:data.GSPI-ExpGrp/prolog.html
-addEpilogue:data.GSPI-ExpGrp/epilogue.html
-addRowNumbers
-addTableName:"GSP Experiment Group Samples"
-inputDir:data.GSPI-EG
-outputDir:html/GSP/GSP-Inventory/HTML
-tablesDir:data.Table
#
-extractRow:"Experiment Group ID (1),1,data.Table/ExperimentGroups.map,DL"
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-rmvTrailingBlankRowsAndColumns
#
#"----------- End --------- "
Example 4.1
Example to generate a concatenated .txt file from a
set of simple spreadsheets. Row background colors are alternated,
big cells have their font shrunk, and trailing blank rows are removed.
E.g., build a single table from the set of EG001.txt to EG0nn.txt files
(each with single-row headers) saved from the GSP-Inventory.xls
Excel worksheets. They are concatenated to the file "EGMAP.txt".
The switches are in file 'params-GSPI-EG-concat.map'.
Note: you must double quote arguments that use spaces.
==================================================================
HTMLtools data.GSPI-EG/params-GSPI-EG-concat.map
where: data.GSPI-EG/params-GSPI-EG-concat.map
contains:
#File:params-GSPI-EG-concatTXT.map
#"Revised: 3-29-2009"
#
-addPrologue:data.GSPI-EG/prolog.html
-addEpilogue:data.GSPI-EG/epilogue.html
-addRowNumbers
-addTableName:"GSP Inventory Concatenated List of all EG Samples"
-inputDir:data.GSPI-EG
-outputDir:data.Table
-tablesDir:data.Table
#
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-rmvTrailingBlankRowsAndColumns
#
#"Save the concatenated data in the following file."
#-concatTables:EGALLDataSet.txt
-concatTables:EGMAP.txt,noHTML
#
#"----------- End --------- "
Example 4.2
This example extends Example 4.1 to generate a HTML file of a concatenated
set of .txt files. They are concatenated to the "EGMAP.html" file.
Note the use of
the 'noTXT' argument in the '-concatTables' switch. Row background colors
are alternated, big cells have their font shrunk, and trailing blank rows are
removed. The switches are in file
'data.GSPI-EG/params-GSPI-EG-concatHTML.map'. Note: you must double quote
arguments that use spaces.
==================================================================
HTMLtools data.GSPI-EG/params-GSPI-EG-concatHTML.map
where: data.GSPI-EG/params-GSPI-EG-concatHTML.map
contains:
#File:params-GSPI-EG-concatHTML.map
#"Revised: 3-29-2009"
#
-addPrologue:data.GSPI-ExpGrp/prolog.html
-addEpilogue:data.GSPI-ExpGrp/epilogue.html
-addRowNumbers
-addTableName:"GSP Inventory Concatenated List of all EG Samples"
-inputDir:data.GSPI-EG
-outputDir:html/GSP/GSP-Inventory/HTML
-tablesDir:data.Table
#
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-rmvTrailingBlankRowsAndColumns
#
#"Save the concatenated data in the following file."
#-concatTables:EGALLDataSet.txt
-concatTables:EGMAP.txt,noTXT
#
#"----------- End --------- "
Example 4.3
This example (which extends Example 4.1) generates
a map file from the concatenated .txt file built from a set of simple spreadsheets,
using the '-concatTables' switch. The generated file is saved in file
data.Table/EGMAP.map. Alternatively, a .map file could be specified with
the '-mapHdrNames' switch to restrict the columns that appear in the
generated .map file. There is no HTML file generated. The switches are in file
'data.Maps/params-Maps-EGMAP-map.map'. Note: you must double
quote arguments that use spaces.
==================================================================
HTMLtools data.Maps/params-Maps-EGMAP-map.map
where: data.Maps/params-Maps-EGMAP-map.map
contains:
#File:params-Maps-EGMAP-map.map
#"Revised: 3-30-2009"
#"Generate the EGMAP.map file, but no HTML file."
#
-addRowNumbers
-addTableName:"Concatenation of all GSP Experiment Groups tables."
-inputDir:data.Table
-outputDir:data.Table
-tablesDir:data.Table
-files:"EGMAP.txt"
#
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-rmvTrailingBlankRowsAndColumns
#
-concatTables:EGMAP.map,noHTML
#
#"---------- end ---------"
Example 5.
This uses a simple spreadsheet, mapping some of the
cells to colored bold fonts. Row background colors are alternated, big
cells have their font shrunk, and trailing blank rows are removed.
The switches are in file 'params-GSPI-ExpGrp.map'.
E.g., GSP-Inventory.xls
'ExperimentGroups' sheet describing samples data saved from Excel
worksheets. Note: you must double quote arguments that use spaces.
==================================================================
HTMLtools data.GSPI-ExpGrp/params-GSPI-ExpGrp.map
where: data.GSPI-ExpGrp/params-GSPI-ExpGrp.map
contains:
#File:params-GSPI-ExpGrp.map
#"Revised: 3-30-2009"
#
-addPrologue:data.GSPI-ExpGrp/prolog.html
-addEpilogue:data.GSPI-ExpGrp/epilogue.html
-addRowNumbers
-addSubTitleFromInputFile
-addTableName:"GSP Experiment Groups Details"
-files:"ExperimentGroups.txt"
-inputDir:data.Table
-outputDir:html/GSP/GSP-Inventory/HTML
-tablesDir:data.Table
#
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-rmvTrailingBlankRowsAndColumns
#
-mapOptionsLists
-mapQuestionmarks:WHO,BOLD_RED
-mapQuestionmarks:WHAT,BOLD_RED
-mapQuestionmarks:WHEN,BOLD_RED
#
#"----------- End --------- "
Example 5.1
This example is an extension of Example 5., except
that it uses the '-exportBigCellsToHTMLfile:200' to export large cells with more
than 200 characters to separate small HTML files and generate hyperlinks
to those small files in the affected cells. This makes the spreadsheet more
readable when there are some cells that have a large number of characters.
The switches are in file 'params-GSPI-ExpGrp-exportBigCells.map'.
E.g., GSP-Inventory.xls 'ExperimentGroup'
samples data saved from Excel worksheets. Note: you must double quote arguments
that use spaces.
==================================================================
HTMLtools data.GSPI-ExpGrp/params-GSPI-ExpGrp-exportBigCells.map
where: data.GSPI-ExpGrp/params-GSPI-ExpGrp-exportBigCells.map
contains:
#File:params-GSPI-ExpGrp-exportBigCells.map
#"Revised: 3-30-2009"
#
-addOutfilePostfix:"-BRC"
-addPrologue:data.GSPI-ExpGrp/prolog.html
-addEpilogue:data.GSPI-ExpGrp/epilogue.html
-addRowNumbers
-addSubTitleFromInputFile
-addTableName:"GSP Experiment Groups Details"
-inputDir:data.Table
-outputDir:html/GSP/GSP-Inventory/HTML
-tablesDir:data.Table
-files:"ExperimentGroups.txt"
#
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-mapOptionsLists
-mapQuestionmarks:WHO,BOLD_RED
-mapQuestionmarks:WHAT,BOLD_RED
-mapQuestionmarks:WHEN,BOLD_RED
-rmvTrailingBlankRowsAndColumns
#
-exportBigCellsToHTMLfile:200
#
#"----------- End --------- "
Example 6.
Example using a 2 sub-table spreadsheet with the first and
second tables being separated by a blank line. This example also
allows 2-line headers, dropping some of the columns, and mapping
some of the column cell data to URLs and lists of ';;' separated
items in cells to be mapped to non-active <OPTION> lists. It
creates a preface HTML file from the first part of the input file
and links to it in the second Table file. The '-mapHdrNames' switch
is used to map the long data names to shorter distinct names
specified in a mapping table. After dropping columns, it reorders
columns using the '-reorderColumn' switch (ignoring ones that
don't exist). It sorts the rows by a particular column using
'-sortRowsByColumn', where it uses the p-Value column if it exists,
else it sorts by the Difference data if it exists, etc.
E.g., mAdb Microarray Retrieval Reports (MRR) from the Excel download,
where MRR is a 'mAdb Microarray Retrieval Report'. The switches are
in file 'params-MRR.map'. Note: you must double quote arguments
that use spaces.
===============================================================
HTMLtools data.MRR/params-MRR.map
where: data.MRR/params-MRR.map
contains:
#File:params-MRR.map
#"Revised: 5-28-2009"
#
-addPrologue:data.MRR/prolog.html
-addEpilogue:data.MRR/epilogue.html
-addRowNumbers
-addTableName:"mAdb Microarray Retrieval Report"
-inputDir:data.MRR
-outputDir:html/data.MRR
-tablesDir:data.Table
#
#"Limit the number of rows to the highest 500 fold-change values"
-limitMaxTableRows:"500,A-B Mean Difference,Descending"
#
-allowHdrDups
-alternateRowBackgroundColor:white
-rmvTrailingBlankRowsAndColumns
#-shrinkBigCells:25,-5
-shrinkBigCells:1,-5
-hdrLines:2
-hasEmptyLineBeforeTable
-makePrefaceHTML
-mapOptionsLists
#
#"Map header names. Select from field='Affy .CEL file (16)'"
#" to field= 'GSP ID (9)' or 'Simple GSP ID (10)'"
-mapHdrNames:"data.Table/EGMAP.map,Affy .CEL file (16),GSP ID (9)"
#
#"Drop some of the columns"
-dropColumnColumn:"mgB36 Chr:Start-Stop"
-dropColumn:"mgB36 Cytoband"
-dropColumn:Annotation_Src
-dropColumn:UniGene
-dropColumn:RefSeq
-dropColumn:Refseqs_Hit
-dropColumn:geneIDS_Hit
-dropColumn:"Entrez GeneID"
-dropColumn:"Locus Tag"
-dropColumn:"BioCarta Pathways"
-dropColumn:"KEGG Pathways"
#-dropColumn:"Gene Ontology Terms" (remove # if want to drop)
-dropColumn:"GO Tier2 Component"
-dropColumn:"GO Tier3 Component"
-dropColumn:"GO Tier2 Function"
-dropColumn:"GO Tier3 Function"
-dropColumn:"GO Tier2 Process"
-dropColumn:"GO Tier3 Process"
#"The following was added 5/28/09"
-dropColumn:"Map"
-dropColumn:"mgB37_Probe Chr:Start-Stop"
-dropColumn:"mgB37_Probe Cytoband"
-dropColumn:"mgB37_RefSeq Chr:Start-Stop"
-dropColumn:"mgB37_RefSeq Cytoband"
#
#"Reorder columns to left side of Table"
-reorderColumn:Gene,1
-reorderColumn:"A-B Mean Difference",2
-reorderColumn:Difference,3
-reorderColumn:"A Mean",4
-reorderColumn:"B Mean",5
-reorderColumn:"A-B p-Value",6
-reorderColumn:"p-Value",7
-reorderColumn:"Well ID",8
-reorderColumn:"Feature ID",9
-reorderColumn:"Description",10
#
#"Sort rows by column - use whichever comes first"
-sortRowsByColumn:"A-B Mean Difference",Descending
-sortRowsByColumn:Difference,Descending
-sortRowsByColumn:p-Value,Ascending
-sortRowsByColumn:Gene,Ascending
#
#"These map mAdb Feature Report data to Bioinformatics databases"
-hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A
-hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene=
-hrefData:"Entrez GeneID",http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=retrieve&dopt=graphics&list_uids=
-hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset=
#
#"----------- End --------- "
Example 7.
Similar to Example 6, but remaps the array names
to the shorter 'Simple GSP ID' names in this example.
E.g., mAdb Microarray Retrieval Reports (MRR) from the Excel download,
reordered. The switches are in file 'params-MRR-Short_GSP_ID.map'.
Note: you must double quote arguments that use spaces.
===============================================================
HTMLtools data.MRR/params-MRR-Short_GSP_ID.map
where: data.MRR/params-MRR-Short_GSP_ID.map
contains:
#File:params-MRR-Short_GSP_ID.map
#"Revised: 5-28-2009"
#
-addOutfilePostfix:"-Short_GSP_ID"
-addPrologue:data.MRR/prolog.html
-addEpilogue:data.MRR/epilogue.html
-addRowNumbers
-addTableName:"mAdb Microarray Retrieval Report"
-inputDir:data.MRR
-outputDir:html/data.MRR
-outputDir:html
-tablesDir:data.Table
#
#"Limit the number of rows to the highest 500 fold-change values"
-limitMaxTableRows:"500,A-B Mean Difference,Descending"
#
-allowHdrDups
-alternateRowBackgroundColor:white
-rmvTrailingBlankRowsAndColumns
#-shrinkBigCells:25,-5
-shrinkBigCells:1,-5
-hdrLines:2
-hasEmptyLineBeforeTable
-makePrefaceHTML
-mapOptionsLists
#
#"Map header names. Select from field='Affy .CEL file (16)'"
#" to field= 'GSP ID (9)' or 'Simple GSP ID (10)'"
-mapHdrNames:"data.Table/EGMAP.map,Affy .CEL file (16),Simple GSP ID (10)"
#
#"Drop some of the columns"
-dropColumnColumn:"mgB36 Chr:Start-Stop"
-dropColumn:"mgB36 Cytoband"
-dropColumn:Annotation_Src
-dropColumn:UniGene
-dropColumn:RefSeq
-dropColumn:Refseqs_Hit
-dropColumn:geneIDS_Hit
-dropColumn:"Entrez GeneID"
-dropColumn:"Locus Tag"
-dropColumn:"BioCarta Pathways"
-dropColumn:"KEGG Pathways"
#
#-dropColumn:"Gene Ontology Terms" (remove # if want to drop)
-dropColumn:"GO Tier2 Component"
-dropColumn:"GO Tier3 Component"
-dropColumn:"GO Tier2 Function"
-dropColumn:"GO Tier3 Function"
-dropColumn:"GO Tier2 Process"
-dropColumn:"GO Tier3 Process"
#"The following was added 5/28/09"
-dropColumn:"Map"
-dropColumn:"mgB37_Probe Chr:Start-Stop"
-dropColumn:"mgB37_Probe Cytoband"
-dropColumn:"mgB37_RefSeq Chr:Start-Stop"
-dropColumn:"mgB37_RefSeq Cytoband"
#
#"Reorder columns to left side of Table"
-reorderColumn:Gene,1
-reorderColumn:"A-B Mean Difference",2
-reorderColumn:Difference,3
-reorderColumn:"A Mean",4
-reorderColumn:"B Mean",5
-reorderColumn:"A-B p-Value",6
-reorderColumn:"p-Value",7
-reorderColumn:"Well ID",8
-reorderColumn:"Feature ID",9
-reorderColumn:"Description",10
#
#"Sort rows by column - use whichever comes first"
-sortRowsByColumn:"A-B Mean Difference",Descending
-sortRowsByColumn:Difference,Descending
-sortRowsByColumn:p-Value,Ascending
-sortRowsByColumn:Gene,Ascending
#
#"These map mAdb Feature Report data to Bioinformatics databases"
-hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A
-hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene=
-hrefData:"Entrez GeneID",http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=retrieve&dopt=graphics&list_uids=
-hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset=
#
#"----------- End --------- "
Example 8.
An alternative way to specify columns is using a list of '-keepColumn' switches.
This is the same as Example 7, except that the columns to keep are
specified with '-keepColumn' switches. This is useful if your data contains
many columns that you don't want and you don't know all of their names.
E.g., mAdb Microarray Retrieval Reports (MRR) from the Excel download,
reordered. The switches are in file 'params-MRR-keep.map'.
Note: you must double quote arguments that use spaces.
===============================================================
HTMLtools data.MRR/params-MRR-keep.map
where: data.MRR/params-MRR-keep.map
contains:
#File:params-MRR-keep.map
#"Revised: 4-9-2009"
#
-addOutfilePostfix:"-keep"
-addPrologue:data.MRR/prolog.html
-addEpilogue:data.MRR/epilogue.html
-addRowNumbers
-addTableName:"mAdb Microarray Retrieval Report"
-inputDir:data.MRR
-outputDir:html/data.MRR
-tablesDir:data.Table
#
-allowHdrDups
-alternateRowBackgroundColor:white
-rmvTrailingBlankRowsAndColumns
#-shrinkBigCells:25,-5
-shrinkBigCells:1,-5
-hdrLines:2
-hasEmptyLineBeforeTable
-makePrefaceHTML
-mapOptionsLists
#
#"Specify columns to keep, the rest are dropped"
-keepColumn:Gene
-keepColumn:p-Value
-keepColumn:Difference
-keepColumn:"A-B p-Value"
-keepColumn:"A-B Mean Difference"
-keepColumn:"A Mean"
-keepColumn:"B Mean"
-keepColumn:"Well ID"
-keepColumn:"Feature ID"
-keepColumn:Description
-keepColumn:"Gene Ontology Terms"
#
#"Reorder columns to left side of Table"
-reorderColumn:Gene,1
-reorderColumn:"A-B Mean Difference",2
-reorderColumn:Difference,3
-reorderColumn:"A Mean",4
-reorderColumn:"B Mean",5
-reorderColumn:"A-B p-Value",6
-reorderColumn:"p-Value",7
-reorderColumn:"Well ID",8
-reorderColumn:"Feature ID",9
-reorderColumn:"Description",10
#
#"Sort rows by column - use whichever comes first"
-sortRowsByColumn:"A-B Mean Difference",Descending
-sortRowsByColumn:Difference,Descending
-sortRowsByColumn:p-Value,Ascending
-sortRowsByColumn:Gene,Ascending
#
#"These map mAdb Feature Report data to Bioinformatics databases"
-hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A
-hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene=
-hrefData:"Entrez GeneID",http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=retrieve&dopt=graphics&list_uids=
-hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset=
#
#"----------- End --------- "
Example 9.
Convert all mAdb generated Java TreeView .zip files or (unpacked)
directories in the JTV input directory. It is invoked by the switch
'-jtvNamesMap:{mAdbArraySummaryFile,mapHdrFile,fromHdrName,toHdrName}'.
The JTV input (output) directories are specified with the
-jtvInputDir (-jtvOutputDir) switches.
The JTV input directory contains saved mAdb JTV files that
are unzipped. The zip files could be saved with 'save names'
indicating the data analysis conditions. The processing converts
a list of files from each of the sub directories in the
-jtvInputDir directory. The conversion maps the .cdt file in each
sub directory to use the {toHdrName} column data of the equivalent
{mapHdrFile} map Table instead of the "EID:'mAdb ID'" as generated by
mAdb. The mapping between "mAdb ID" and short array names is done using
the jtv_mAdbArraySummaryFile Table map. It then writes out the
JTV subset to a created sub directory in jtvOutputDir that
has the same base name as the input JTV sub directory being
processed. This is run using the switches '-jtvNamesMap',
'-jtvInputDir' and '-jtvOutputDir'. After converting the JTV files,
it generates corresponding Web pages to invoke the JTV applets using
the '-jtvHTMLgenerate', '-jtvDescription', '-jtvButtonName', '-addProlog', and
'-addEpilogue' switches. When doing '-jtvHTMLgenerate', the '-jtvCopyJTVjars' switch
copies the Java TreeView .jar files and plugins to the jtvOutputDir
directory. It also rezips the output directory since the
'-jtvReZipConvertedFiles' switch was specified. Note: you must double
quote arguments that use spaces.
E.g., convert a set of Java TreeView zip files downloaded from mAdb.
The switches are in file 'params-JTV.map'.
===============================================================
HTMLtools JTVinput/params-JTV.map
where: params-JTV.map
contains:
#File:params-JTV.map
#"Revised: 3-30-2009"
#
#"(1) Convert array names in JTV data sets to mapped array names."
-jtvNamesMap:"data.Table/mAdbArraySummary.map,data.Table/EGMAP.map,Affy .CEL file (16),GSP ID (9)"
#
-jtvInputDir:JTVinput
-jtvOutputDir:JTVoutput
-jtvTableDir:data.Table
#
#"(2) Generate HTML web pages to invoke the converted JTV data."
-jtvHTMLgenerate
-jtvDescription:"Sample description paragraph on mouse muscle GH/Stat-null Controlled (+) genes [i.e. experiment $$INFILENAME$$]."
-jtvButtonName:"Mouse Muscle: $$INFILENAME$$"
-addProlog:JTVinput/prolog.html
-addEpilogue:JTVinput/epilogue.html
-jtvCopyJTVjars:JTVjars
#
# [3] Rezip the converted files
-jtvReZipConvertedFiles
#
#"------------ End -----------"
Example 10.
Batch process a list of HTMLtools params .map files specified in the
batch input file (batchList.doit in this example). The batch processing
is started as shown below with the '-batchProcess' switch. The previous
examples show some of the params .map files that could be used in the list.
Note: you may not nest '-batchProcess' commands (it is not recursive). The
list may only contain comments ('#' prefixed lines) or params .map file names.
Note: you must double quote arguments that use spaces.
E.g., Execute the list of parameter .map files listed in the batch file called
batchList.doit.
===============================================================
HTMLtools -batchProcess:batchList.doit
where: batchList.doit
contains:
#File:batchList.doit
#"Revised: 6-23-2009"
#"Preprocess the data for the NIDDK/mAdb GSP Jak-Stat Prosector Database"
#
#"(1) Doing GSP-InventoryExperiment Groups conversions and generating HTML pages"
data.GSPI-EG/params-GSPI-EG.map
data.GSPI-EG/params-GSPI-EG-concatTXT.map
data.GSPI-EG/params-GSPI-EG-concatHTML.map
data.GSPI-ExpGrp/params-GSPI-ExpGrp.map
data.GSPI-ExpGrp/params-GSPI-ExpGrp-exportBigCells.map
#
#"(2) Doing mAdb Retrieval Report conversions and generating HTML pages"
data.MRR/params-MRR.map
data.MRR/params-MRR-keep.map
#
#"(3) Doing JTV array name conversions and generating HTML pages"
###JTVinput/params-JTV.map
JTVinput/params-JTV-jtvReZip.map
#
#"(4) Convert Mapping .txt files to HTML"
data.Maps/params-Maps-EGMAP-html.map
data.Maps/params-Maps-ExperimentGroups-html.map
data.Maps/params-Maps-mAdbArraySummary-html.map
#
#"(4.1) Convert Mapping .txt files to .map files"
data.Maps/params-Maps-EGMAP-map.map
data.Maps/params-Maps-ExperimentGroups-map.map
data.Maps/params-Maps-mAdbArraySummary-map.map
#
#"(5) Doing mAdb Retrieval Report Gene List mappings and generating HTML pages"
data.MRR-GL-examples/params-MRR-GL-orig.map
data.MRR-GL-examples/params-MRR-GL-Review.map
data.MRR-GL-examples/params-MRR-GL-GeneList.map
#
#"(6) Doing mAdb and HTML conversion tests TODO generating HTML pages"
data.mAdb-TestsToDo/params-mAdb-TestsToDo.map
#
#"(7) Convert MRR all arrays to edited DB file."
#" This is normally not done each time."
#data.MRR-all/params-MRR-all-18-RMA-fast.map
#data.MRR-all/params-MRR-all-18-MAS5-fast.map
#
#"(7.1) Convert MRR Literature data for all arrays."
#" This is normally not done each time."
data.MRR-Literature/params-MRR.map
data.MRR-Literature/params-MRR-keep.map
data.MRR-Literature/params-JTV-jtvReZip.map
#
#"(8) Generate a Tests-Intersection .txt table and also the HTML for it."
#" from the mAdb-TestsToDo.txt data."
data.TestsIntersection/params-TI-HTML-all.map
data.TestsIntersection/params-TestsIntersection-ALL.map
data.TestsIntersection/params-TestsIntersection-ALL-filter.map
data.TestsIntersection/params-TestsIntersection-ALL-filter-LIT.map
#
#"(9) Flip several types of samples - not currently used in html/GSP"
#"(9) Flip several types of samples"
#"(9.1) Create Data file for Flip Tables."
#" Create edited Tables with Index-Maps."
#" This is normally not done each time."
data.MRR-flip/params-MRR-all-fastSave.map
data.MRR-flip/params-MRR-all-fastMakeIndex.map
data.MRR-flip/params-MRR-all-fastSave+MakeIndex.map
data.MRR-flip/params-MRR-LitRev-fastSave.map
data.MRR-flip/params-MRR-LitRev-fastMakeIndex.map
data.MRR-flip/params-MRR-LitRev-fastSave+MakeIndex.map
data.MRR-flip/params-MRR-EG3.2-Test1-fastSave.map
data.MRR-flip/params-MRR-EG3.2-Test1-fastMakeIndex.map
#
#"(9.2) Flip Tables with and without filtering saving"
#" the flipped .txt file and .html file."
data.MRR-flip/params-MRR-flipGID-all-GeneList.map
data.MRR-flip/params-MRR-flipGID-all-FeatureID.map
data.MRR-flip/params-MRR-flipGID-all-GeneList+FeatureID.map
data.MRR-flip/params-MRR-flipGID-all-GeneList-RowNames.map
data.MRR-flip/params-MRR-flipGID-all-GeneList+FeatureID-RowNames.map
data.MRR-flip/params-MRR-flipGID-LitRev.map
data.MRR-flip/params-MRR-flipGID-EG3.2-test1.map
#
#"(10) Run the GenBatchScripts to create the batch scripts data"
#" This is normally not done each time."
data.GBS/params-genBatchScripts.map
#
#
#"------------ End -------------"
Example 11.
Edit a very large .txt file (EG1+EG3.1+EG3.2-MSE430_2-18Arrays-RMA-Grouped.txt) into
another .txt file (EGALLDataSet.txt) using the fast edit command. Note that
-sortRowsByColumn is not available when doing a fast-edit of a large file.
E.g., Edit (-dropColumn, -reorderColumn, -mapHdrNames, -reorderRemainingColumnsAlphabeticly).
The switches are in file 'params-MRR-all-fast.map'.
Note: you must double quote arguments that use spaces.
===============================================================
HTMLtools data.MRR-all/params-MRR-all-fast.map
where: data.MRR-all/params-MRR-all-fast.map
contains:
#File:params-MRR-all-fast.map
#"Revised 6-20-2009"
#"Convert MRR all arrays file to edited DB 'EGALLDataSet.txt' file."
#
-inputDir:data.MRR-all
-outputDir:data.Table
-outputDir:data.Table
-tableDir:data.Table
#
-files:"EG1+EG3.1+EG3.2-MSE430_2-18Arrays-RMA-Grouped.txt"
#
#
#"Save the edited table as a .txt file"
-saveEditedTable2File:EGALLDataSet.txt,noHTML
#
#"Do a fast edit of the .txt file and don't generate HTML file"
-fastEditFile
#-noHTML
#
#-addOutfilePostfix:"-edit"
#
-allowHdrDups
-rmvTrailingBlankRowsAndColumns
-hdrLines:2
-useOnlyLastHeaderLine
-hasEmptyLineBeforeTable
#
#"Map header names. Select from field='Affy .CEL file (16)'"
#" to field= 'GSP ID (9)' or 'Simple GSP ID (10)'"
-mapHdrNames:"data.Table/EGMAP.map,Affy .CEL file (16),GSP ID (9)"
#
#"Drop more columns for simplest file"
-dropColumn:Difference
-dropColumn:p-Value
-dropColumn:Description
#"Drop some of the columns"
-dropColumnColumn:"mgB36 Chr:Start-Stop"
-dropColumn:"mgB36 Cytoband"
-dropColumn:Annotation_Src
-dropColumn:UniGene
-dropColumn:RefSeq
-dropColumn:Refseqs_Hit
-dropColumn:geneIDS_Hit
-dropColumn:"Entrez GeneID"
-dropColumn:"Locus Tag"
-dropColumn:"BioCarta Pathways"
-dropColumn:"KEGG Pathways"
-dropColumn:"Gene Ontology Terms" (remove # if want to drop)
-dropColumn:"GO Tier2 Component"
-dropColumn:"GO Tier3 Component"
-dropColumn:"GO Tier2 Function"
-dropColumn:"GO Tier3 Function"
-dropColumn:"GO Tier2 Process"
-dropColumn:"GO Tier3 Process"
#"The following was added 5/28/09"
-dropColumn:"Map"
-dropColumn:"mgB37_Probe Chr:Start-Stop"
-dropColumn:"mgB37_Probe Cytoband"
-dropColumn:"mgB37_RefSeq Chr:Start-Stop"
-dropColumn:"mgB37_RefSeq Cytoband"
#
#"Sort the remaining columns alphabetically"
-reorderRemainingColumnsAlphabeticly
#
#"Reorder columns to left side of Table"
-reorderColumn:Gene,1
-reorderColumn:"A-B Mean Difference",2
-reorderColumn:Difference,3
-reorderColumn:"A Mean",4
-reorderColumn:"B Mean",5
-reorderColumn:"A-B p-Value",6
-reorderColumn:"p-Value",7
-reorderColumn:"Well ID",8
-reorderColumn:"Feature ID",9
-reorderColumn:"Description",10
#
#"----------- End --------- "
Example 12.
Do URL mapping of multi-row header data in a transposed table. The Table
was transposed in a separate operation using Excel. ([TODO] we may
add a transpose function to the converter in the future.)
E.g., Make hyperlinks in header rows rather than the table data.
The switches are in file 'params-MRR-GL-GeneList.map'.
===============================================================
HTMLtools data.MRR-GL-examples/params-MRR-GL-GeneList.map
where: data.MRR-GL-examples/params-MRR-GL-GeneList.map
contains:
#File:params-MRR-GL-GeneList.map
#"Revised: 3-30-2009"
#
-addPrologue:data.MRR-GL-examples/prolog.html
-addEpilogue:data.MRR-GL-examples/epilogue.html
-addRowNumbers
-addTableName:"GSP Genes mentioned in Hennighausen & Robinson Review (2008)"
-inputDir:data.MRR-GL-examples
-outputDir:html/GSP/Search/example/
-tablesDir:data.Table
-files:GeneListTbl-all-A+G.txt,GeneListTbl-all-EG1+EG3.txt,GeneListTbl-all-Stat5ab+Socs2.txt
#
-allowHdrDups
-alternateRowBackgroundColor:white
-rmvTrailingBlankRowsAndColumns
#-shrinkBigCells:25,-5
-shrinkBigCells:1,-5
#
#"Map all 3 header lines in the Table"
-hdrLines:3
#
#"This does multiple header-row data mapping."
-hrefHeaderRowMapping
#"These map mAdb Feature Report data to Bioinformatics databases"
-hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A
-hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene=
-hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset=
#
#"------------ End -----------"
Example 13.
Generates a Tests-Intersection .txt and .html table from the tests data also
used in Example 15. This table of the tests intersections shows test results
of gene fold-change data. It is computed for all samples and all tests in the
mAdb-TestsToDo.txt list that
was saved from the mAdb-TestsToDo.xls
file. Only genes that have passed any of the tests are included, even if the
gene had only passed one test. The results are sorted by gene name. These
results are available as a tab-delimited (Excel-compatible) file TestsIntersection-ALL.txt and
also as a HTML Web page. See the
Tests-Intersection commands.
E.g., Create a Tests-Intersection .txt and .html table.
The switches are in file 'params-TestsIntersection-ALL.map'.
===============================================================
HTMLtools data.TestsIntersection/params-TestsIntersection-ALL.map
where: data.TestsIntersection/params-TestsIntersection-ALL.map
contains:
#File:params-TestsIntersection-ALL.map
#"Revised 5-06-2009"
#
#"[1] Master script to create a Tests Intersection Table file for all tests"
#"in the mAdb-TestsToDo.txt file that we have data."
#
-inputDir:data.GBS
-outputDir:html/GSP/TestsIntersection
#
#"Limit the number of rows to the highest 500 fold-change values"
-limitMaxTableRows:"500,Range FC,Descending"
#
#"The tablesDir subdir. where mapping and other reference Tables are copied"
#"to the batchScripts directory."
-tablesDir:data.Table
#
-allowHdrDups
-rmvTrailingBlankRowsAndColumns
-hdrLines:2
-hasEmptyLineBeforeTable
#
-makeTestsIntersectionTable:"mAdb-TestsToDo.txt"
#
#"Add FC range computations and expand the TI table with"
#"fields ('Max FC', 'Min FC', 'Range FC')."
-addFCrangesForTestsIntersectionTable
#
#"Add the ('Range A Mean', 'Range B Mean', 'FC counts %') computations to"
#"an expanded TestsIntersectionTable table."
-addRangeOfMeansToTItable
#
#"Save the edited table as a .txt file"
-saveEditedTable2File:"TestsIntersection-ALL.txt,HTML"
-mapDollarsigns:$$EXCEL-FILE$$,"TestsIntersection-ALL.txt"
#
#"The mAdb-TestsToDo.txt Tables are in"
#"the '-tablesDir:data.Table' subdirectory."
#
#"[2] Now after the Tests-Intersection .txt table is saved, generate the HTML file."
#"Note: Converter removes -hasEmptyLineBeforeTable and sets -hdrLines:5 switches."
#
-addPrologue:data.TestsIntersection/prolog-TI.html
-addEpilogue:data.TestsIntersection/epilogue-TI.html
#
#
-addRowNumbers
-addTableName:"Intersection of All GSP Fold-Change Tests for Genes in any test"
-mapDollarsigns:$$TITLE$$,"All GSP Tests for Genes in any test"
-allowHdrDups
#
-alternateRowBackgroundColor:white
#-shrinkBigCells:25,-5
-shrinkBigCells:1,-5
#
###-sortRowsByColumn:Gene,Ascending
-sortRowsByColumn:"Range FC",Descending
#
#"These map mAdb Feature Report data to Bioinformatics databases"
-hrefData:"Well ID,http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A"
-hrefData:"Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene="
-hrefData:"Feature ID,https://www.affymetrix.com/LinkServlet?probeset="
#
#"------------- End ---------------"
Example 14.
Create flipped tables with hyperlinked multi-line headers, with data
filtered by row and column name filters. This process is broken into
two scripts: Example 14.1 to create an edited
table file and then an Index-Map table file from the edited table file.
Then, Example 14.2 to generate a flipped
table saved as .txt and .html files using the edited table and its
Index-Map file.
Example 14.1
Create an edited table file and then an Index-Map table file from the
edited table file. See Example 14.2
for the second part, which uses these files to create a flipped table.
E.g., Create an edited table file and its Index-Map file for subsequent flipped table processing.
The switches are in file 'params-MRR-all-fastSave+MakeIndex.map'.
===============================================================
HTMLtools data.flip/params-MRR-all-fastSave+MakeIndex.map
where: data.flip/params-MRR-all-fastSave+MakeIndex.map
contains:
#File:params-MRR-all-fastSave+MakeIndex.map
#"This saves the edited Table and then makes an Index-Map file"
#"of the saved edited table."
#"Revised 6-23-2009"
#
-inputDir:data.MRR-all
-outputDir:data.MRR-flip
-tablesDir:data.Table
#
-files:"Review-LH-18Arrays-54-pathway-Genes-in-JakStat.txt"
#
#"Save the edited table as a .txt file"
-saveEditedTable2File:EGALLDataSet.txt,noHTML
#
#"Make an EGALLDataSet.idx index file of the .txt file"
-makeIndexMapFile:"Gene,Well ID,Feature ID"
#
-allowHdrDups
-rmvTrailingBlankRowsAndColumns
-hdrLines:2
-useOnlyLastHeaderLine
-hasEmptyLineBeforeTable
#
#"Do a fast edit of the .txt file"
-fastEditFile
-noHTML
#
#"Map header names. Select from field='Affy .CEL file (16)'"
#" to field= 'GSP ID (9)' or 'Simple GSP ID (10)'"
-mapHdrNames:"data.Table/EGMAP.map,Affy .CEL file (16),GSP ID (9)"
#
#"Drop more columns for simplest file"
-dropColumn:Difference
-dropColumn:p-Value
-dropColumn:Description
#"Drop some of the columns"
-dropColumnColumn:"mgB36 Chr:Start-Stop"
-dropColumn:"mgB36 Cytoband"
-dropColumn:Annotation_Src
-dropColumn:UniGene
-dropColumn:RefSeq
-dropColumn:Refseqs_Hit
-dropColumn:geneIDS_Hit
-dropColumn:"Entrez GeneID"
-dropColumn:"Locus Tag"
-dropColumn:"BioCarta Pathways"
-dropColumn:"KEGG Pathways"
-dropColumn:"Gene Ontology Terms" (remove # if want to drop)
-dropColumn:"GO Tier2 Component"
-dropColumn:"GO Tier3 Component"
-dropColumn:"GO Tier2 Function"
-dropColumn:"GO Tier3 Function"
-dropColumn:"GO Tier2 Process"
-dropColumn:"GO Tier3 Process"
#"The following was added 5/28/09"
-dropColumn:"Map"
-dropColumn:"mgB37_Probe Chr:Start-Stop"
-dropColumn:"mgB37_Probe Cytoband"
-dropColumn:"mgB37_RefSeq Chr:Start-Stop"
-dropColumn:"mgB37_RefSeq Cytoband"
#
#"Sort the rest of the columns alphabetically"
-reorderRemainingColumnsAlphabeticly
#
#"Reorder columns to left side of Table"
-reorderColumn:Gene,1
-reorderColumn:"A-B Mean Difference",2
-reorderColumn:Difference,3
-reorderColumn:"A Mean",4
-reorderColumn:"B Mean",5
-reorderColumn:"A-B p-Value",6
-reorderColumn:"p-Value",7
-reorderColumn:"Well ID",8
-reorderColumn:"Feature ID",9
-reorderColumn:"Description",10
#
#"----------- End ----------- "
Example 14.2
Create a flipped table tab-delimited .txt file and .html file from the
edited table file and index-map table files created in
Example 14.1.
E.g., Create a flipped table saved as .txt and .html files.
The switches are in file 'params-MRR-flipGID-all-GeneListNames+FeatureIDnames-RowNames.map'.
===============================================================
HTMLtools data.flip/params-MRR-flipGID-all-GeneListNames+FeatureIDnames-RowNames.map
where: data.flip/params-MRR-flipGID-all-GeneListNames+FeatureIDnames-RowNames.map
contains:
#File:params-MRR-flipGID-all-GeneListNames+FeatureIDnames-RowNames.map
#"Revised 7-18-2009"
#
-addPrologue:data.MRR-flip/prolog.html
-addEpilogue:data.MRR-flip/epilogue.html
-addRowNumbers
-addTableName:"Flipped 18 GSP Mouse MOE403_2 arrays Filtered by Feature_ID List"
#
-inputDir:data.MRR-flip
-outputDir:html/data.flip
-tablesDir:data.Table
#
-addOutfilePostfix:"-GeneList+FeatureID-RowNames"
#
-flipTableByIndexMap:"EGALLDataSet.txt,EGALLDataSet.idx"
-flipColumnName:"*LIST*,Gene,Socs1,Socs2,Socs3,Stat1,Stat2,Stat3,Stat4,Stat5a,Stat5b"
-flipColumnName:"*LIST*,Well ID,"
-flipColumnName:"*LIST*,Feature ID,1418507_s_at,1449109_at, 1438470_at,1441476_at"
-flipRowNames:"*LIST*,EG001,EG003.1,EG003.2"
-flipOrder:"Gene,Well ID,Feature ID"
#
-allowHdrDups
-alternateRowBackgroundColor:white
-rmvTrailingBlankRowsAndColumns
#
#-shrinkBigCells:25,-5
-shrinkBigCells:1,-5
#
#"This does multiple header-row data mapping."
-hrefHeaderRowMapping
#
#"These map mAdb Feature Report data to Bioinformatics databases"
-hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A
-hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene=
-hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset=
#
#"--------- End ---------"
Example 15.
Create a set of batch jobs to convert data described in a table file,
generating summary Web pages and a set of params .map files in a tree structure
in directory batchScripts/. (See Section
1.1 Generating Batch Scripts
for more details on GenBatchScripts processing.)
The data for this script is in data.GBS. It uses
a list of data.GBS/mAdb-TestsToDo.txt
tests that are used to generate a set of HTML files in the Analyses and JTV
directories. The file was created by saving the manually
edited mAdb-TestsToDo.xls as a
tab-delimited file. It also uses a set of params .map templates prefixed with
paramsTemplate, a set of summary templates prefixed with
summaryTemplate, and prolog and epilogue templates for the MRR and JTV
HTML Web pages that will be generated.
The 'inputTree' of tab-delimited and .zip file data for these batch analyses
is in the data.GBS/CellTissue/
directory. The GenBatchScripts processing will generate the
batchScripts/ tree (described in the following
params-genBatchScripts.map script). After GenBatchScripts processing is finished,
the user would run the Windows BAT file batchScripts/buildWebPages.bat on
the batchScripts/buildWebPages.doit
batch list, both just created. The params .map files referenced in the .doit file
are in batchScripts/ParamScripts.
When run, it saves the generated HTML Web pages and converted
JTV files in batchScripts/Summary,
batchScripts/Analyses, and
batchScripts/JTV. See the
GenBatchScripts commands.
E.g., Create batch scripts for subsequent file conversion processing.
The switches are in file 'params-genBatchScripts.map'.
===============================================================
HTMLtools data.GBS/params-genBatchScripts.map
where: data.GBS/params-genBatchScripts.map
contains:
#File:params-genBatchScripts.map
#"Revised 4-26-2009"
#
#"Master script to generate params .map files and buildWebPages.doit"
#"file for all tests in the mAdb-TestsToDo.txt file. It generates an"
#"environment in batchScripts/ to enable running all of the scripts as"
#"a Window's batch job buildWebPages.bat on the buildWebPages.doit batch"
#"input list Windows batch startup file."
#
#"The templates (.html, .param) and .map files are in the same directory"
#"as this master batch generation script."
#
#"Map: mAdb-TestsToDo.txt - test Table to drive the batch scripts generation."
#
#"Map: CellTypeTissue.map - maps 'Introduction' field for 'Tissues'."
#"Map: ExperimentGroups.map - maps 'Details' field for 'Expression Groups'."
#"Map: EGMAP.map - maps 'Affy .CEL file' name to 'Simple GSP ID' or 'GSP ID'."
#
#"Arg: batchScripts - where all files and the following subdirectories are saved"
#"Arg: ParamScripts - subdir. where generated params*.map files are copied"
#"Arg: inputTree - subdir. where mAdb generated .txt MRR and JTV data are copied"
#"Arg: Summary - subdir. where generated text HTML top level Web pages are saved"
#"Arg: Analyses - subdir. where generated text HTML & edited .txt files are saved"
#"Arg: JTV - subdir. where generated JTVtext HTML & edited JTV files are saved"
#"Arg: JTVjars - subdir. where the JTV runtime jar files are copied"
#
-inputDir:data.GBS
-outputDir:batchScripts
#
#"The tablesDir subdir. where mapping and other reference Tables are copied"
#"to the batchScripts directory."
-tablesDir:data.Table
#
-genBatchScripts:"batchScripts,ParamScripts,InputTree,Summary,Analyses,JTV"
-rmvTrailingBlankRowsAndColumns
#
#"The following maps and Tables are in the '-tablesDir:data.Table' subdirectory."
-genMapHdrNames:"EGMAP.map"
-genMapEGdetails:"ExperimentGroups.map"
-genMapIntroduction:"CellTypeTissue.map"
-genTestFiles:"mAdb-TestsToDo.txt"
#
#"Create Tests-Intersection (TI) HTML links in summary file & params .map files."
-genTestsIntersection
#
#"List of CellType/Tissue summary templates for generating the Summary pages"
-genSummaryTemplate:1,summaryTemplateProlog.html
-genSummaryTemplate:2,summaryTemplateExperimental.html
-genSummaryTemplate:3,summaryTemplateAnalysis.html
-genSummaryTemplate:4,summaryTemplateFurtherAnalysis.html
-genSummaryTemplate:5,summaryTemplateEpilogue.html
#
#"List of params .map templates for generating batch params .map files."
-genParamTemplate:MRR,paramsTemplate-MRR.map
-genParamTemplate:MRR-keep,paramsTemplate-MRR-keep.map
###-genParamTemplate:JTV,paramsTemplate-JTV.map
-genParamTemplate:JTV,paramsTemplate-JTV-jtvReZip.map
-genParamTemplate:MRR-saveFile,paramsTemplate-MRR-saveFile.map
-genParamTemplate:TI,paramsTemplate-TI.map
#
#"List of support files to be copied to support -batchProcess of the .doit file."
-genCopySupportFile:"../HTMLtools.jar"
-genCopySupportFile:"../ReferenceManual.html"
-genCopySupportFile:prologMRR.html
-genCopySupportFile:prologJTV.html
-genCopySupportFile:prologTI.html
-genCopySupportFile:epilogueMRR.html
-genCopySupportFile:epilogueJTV.html
-genCopySupportFile:epilogueTI.html
#
#"List of JTV support files to be copied to support -batchProcess of the .doit file."
#-genCopySupportFile:JTVjars/TreeViewApplet.jar
#-genCopySupportFile:JTVjars/nanoxml-2.2.2.jar
#-genCopySupportFile:JTVjars/plugins/Dendrogram.jar
#-genCopySupportFile:JTVjars/plugins/Karyoscope.jar
#-genCopySupportFile:JTVjars/plugins/Scatterplot.jar
#-genCopySupportFile:JTVjars/plugins/Treeanno.jar
#
#"Copy tree data to top level batch scripts subdirectory"
-genTreeCopy:JTVjars,batchScripts/JTVjars
#"Copy Mapping files tree data to top level batch scripts subdirectory"
-genTreeCopy:data.Table,batchScripts/data.Table
#
#"Copy input data tree data to batch scripts subdirectory"
-genTreeCopy:data.GBS/CellTissue,batchScripts/inputTree/CellTissue
#
#"------------- End ---------------"
Example 16 - generates a .txt database file, an Index Map .idx file, and a
Statistics Index Map .sidx file.
The operations consist of three parameter .map files:
Example 16.1
data.MRR-all/params-MRR-all-fastSave.map for the .txt database file,
Example 16.2
params-MRR-all-fastMakeIndex.map for the .idx Index Map file, and
Example 16.3
params-MRR-all-fastMakeStatisticsIndex.map for the .sidx global
Statistics Index Map file. The latter computes an extended Index Map file
with (min,max,mean,stddev) for each row of the numeric data and
global (min,max,mean,stddev) values used in 2 additional header rows.
The .sidx file is used in generating heatmap tables in the flipped table database
search example in Example 17.
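The per-row statistics added by Example 16.3 are ordinary (min, max, mean, stddev) computations
over each row's numeric values. The following minimal Java sketch shows that arithmetic only;
it is not the actual HTMLtools code, and the use of a population standard deviation is an
assumption (the program may use the n-1 form):

// Illustrative sketch of per-row (min, max, mean, stddev) statistics such as
// those described for the .sidx Statistics Index Map; not the actual HTMLtools code.
public class RowStatsSketch {
    static double[] stats(double[] values) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY, sum = 0.0;
        for (double v : values) { min = Math.min(min, v); max = Math.max(max, v); sum += v; }
        double mean = sum / values.length;
        double ss = 0.0;
        for (double v : values) ss += (v - mean) * (v - mean);
        double stddev = Math.sqrt(ss / values.length);   // population form; assumed, not verified
        return new double[] { min, max, mean, stddev };
    }

    public static void main(String[] args) {
        double[] row = { 7.2, 8.1, 6.9, 7.5 };           // hypothetical intensities for one probe
        double[] s = stats(row);
        System.out.printf("min=%.2f max=%.2f mean=%.2f stddev=%.2f%n", s[0], s[1], s[2], s[3]);
    }
}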
===============================================================
Example 16.1 - generates a .txt database file
This creates an edited database file that can be used in other
operations such as the database search example in
Example 17 where the output .txt file could
be copied to the data.search/ directory.
HTMLtools data.MRR-all/params-MRR-all-fastSave.map
where: data.MRR-all/params-MRR-all-fastSave.map
contains:
#File:params-MRR-all-fastSave.map
#"Revised 8-18-2009"
#"Convert MRR all arrays file to edited DB 'EGALLDataSet.txt' file."
#
-inputDir:data.MRR-all
-outputDir:data.MRR-all
-tableDir:data.Table
#
-files:"EG1+EG3.1+EG3.2-MSE430_2-18Arrays-RMA-Grouped.txt"
#
#
#"Save the edited table as a .txt file"
-saveEditedTable2File:EGALLDataSet.txt,noHTML
#
#"Do a fast edit of the .txt file and don't generate HTML file"
-fastEditFile
#-noHTML
#
#-addOutfilePostfix:"-edit"
#
-allowHdrDups
-rmvTrailingBlankRowsAndColumns
-hdrLines:2
-useOnlyLastHeaderLine
-hasEmptyLineBeforeTable
#
#"Map header names. Select from field='Affy .CEL file (16)'"
#" to field= 'GSP ID (9)' or 'Simple GSP ID (10)'"
-mapHdrNames:"data.Table/EGMAP.map,Affy .CEL file (16),GSP ID (9)"
#
#"Drop more columns for simplest file"
-dropColumn:Difference
-dropColumn:p-Value
-dropColumn:Description
#"Drop some of the columns"
-dropColumnColumn:"mgB36 Chr:Start-Stop"
-dropColumn:"mgB36 Cytoband"
-dropColumn:Annotation_Src
-dropColumn:UniGene
-dropColumn:RefSeq
-dropColumn:Refseqs_Hit
-dropColumn:geneIDS_Hit
-dropColumn:"Entrez GeneID"
-dropColumn:"Locus Tag"
-dropColumn:"BioCarta Pathways"
-dropColumn:"KEGG Pathways"
-dropColumn:"Gene Ontology Terms" (remove # if want to drop)
-dropColumn:"GO Tier2 Component"
-dropColumn:"GO Tier3 Component"
-dropColumn:"GO Tier2 Function"
-dropColumn:"GO Tier3 Function"
-dropColumn:"GO Tier2 Process"
-dropColumn:"GO Tier3 Process"
#"The following was added 5/28/09"
-dropColumn:"Map"
-dropColumn:"mgB37_Probe Chr:Start-Stop"
-dropColumn:"mgB37_Probe Cytoband"
-dropColumn:"mgB37_RefSeq Chr:Start-Stop"
-dropColumn:"mgB37_RefSeq Cytoband"
#
#"Sort the remaining columns alphabetically"
-reorderRemainingColumnsAlphabeticly
#
#"Reorder columns to left side of Table"
-reorderColumn:Gene,1
-reorderColumn:"A-B Mean Difference",2
-reorderColumn:Difference,3
-reorderColumn:"A Mean",4
-reorderColumn:"B Mean",5
-reorderColumn:"A-B p-Value",6
-reorderColumn:"p-Value",7
-reorderColumn:"Well ID",8
-reorderColumn:"Feature ID",9
-reorderColumn:"Description",10
#
#"----------- End --------- "
Example 16.2 - generates an .idx Index Map file of the database file
This creates an Index Map .idx file from the database file. These files are
used in other operations such as the
database searchGUI using the script example in
Example 17 where the output .idx file could
be copied to the data.search/ directory.
HTMLtools data.MRR-all/params-MRR-all-fastMakeIndex.map
where: data.MRR-all/params-MRR-all-fastMakeIndex.map
contains:
#File:params-MRR-all-fastMakeIndex.map
#"Revised: 8-19-2009"
#
-inputDir:data.MRR-all
-outputDir:data.MRR-all
-tablesDir:data.Table
#
-files:"EGALLDataSet.txt"
#
-hdrLines:1
#
#"Do a fast edit of the .txt file"
-fastEditFile
#
#"Make an .idx index file of the .txt file"
-makeIndexMapFile:"Gene,Well ID,Feature ID"
#
#"----------- end --------- "
Example 16.3 - generates a .sidx global StatisticsIndex Map file of the database file
This creates a global Statistics Index Map .sidx file from the database file.
This file is used in other operations such as the
database searchGUI using the script example in
Example 17 where the output .sidx file could
be copied to the data.search/ directory and is used if a heatmap table
is generated for the flipped table.
HTMLtools data.MRR-all/params-MRR-all-fastMakeStatisticsIndex.map
where: data.MRR-all/params-MRR-all-fastMakeStatisticsIndex.map
contains:
#File:params-MRR-all-fastMakeStatisticsIndex.map
#"Revised: 8-11-2009"
#
-inputDir:data.MRR-all
-outputDir:data.MRR-all
-tablesDir:data.Table
#
#"Specify the edited data set file to use. It is assumed that"
#"the IndexMap file was created and has a .idx file extension."
-files:"EGALLDataSet.txt"
#
-hdrLines:1
#
#"Specify columns to drop when analyzing the Statistics, the rest are dropped."
-dropColumn:Gene
-dropColumn:"Well ID"
-dropColumn:"Feature ID"
#
#"Make an .sidx index file of the .txt and .idx files"
-makeStatisticsIndexMapFile
#
#"----------- end --------- "
Example 17 - the paramsSearchDefault.map file used
with the "-searchGui" option
The paramsSearchDefault.map file contains additional information
used by the search database GUI ("-searchGui" option). See
Search GUI for more details.
===============================================================
HTMLtools -searchGui
where: this looks for the file data.search/paramsSearchDefault.map, which
contains:
#File:paramsSearchFlip.map
#"$$DATE$$"
#
#"Search information read by the search GUI for prompts and menus"
-searchTermNames:"Gene,Well ID,Feature ID"
-searchRowFilterName:"Sample Experiment Groups"
-searchSampleChoiceFile:sampleExperimentGroupsChoices.txt
-searchTermsDemoData:"Stat5a Stat5b 1438470_at 1441476_at 1446085_at"
-searchUserTermList:"LitRefGeneList.txt,Feature ID,Literature Review"
-searchTermsFilterPrompt:"'Gene', 'Well ID', and/or 'Probe' names. E.g., Stat5a, Stat5b, 1438470_at 1441476_at 1446085_at, etc."
-searchRowFilterPrompt:"'Sample Experiment Groups'. E.g., select one or more Experiment Groups"
#
-addPrologue:data.search/prolog.html
-addEpilogue:data.search/epilogue.html
-addTableName:"$$DATA_SOURCE_SUBTITLE$$"
-addRowNumbers
#
-addTableName:"Search database filtered by Gene and/or Probe IDs and Experiment Groups"
#
-inputDir:data.search
-outputDir:data.search
-tablesDir:data.Table
#
-addOutfilePostfix:"-search"
#
#"Database (.txt) and index map of database (.idx) to search"
-flipTableByIndexMap:"EGALLDataSet.txt,EGALLDataSet.idx"
#
$$SEARCH_FILTERS$$
#
#"Maps to:"
# '-flipOrder:"Gene,Well ID,Feature ID"'
# '-flipColumnName:"*LIST*,Gene,g1,g2,...gn"'
# '-flipColumnName:"*LIST*,Well ID,w1,w2,...,wk"'
# '-flipColumnName:"*LIST*,Feature ID,f1,f2,...,fm"'
# '-flipRowNames:"*LIST*,s1,s2,...,sp"'
# '-dataPrecisionHTMLtable:-1'
# '-showDataHeatmapFlipTable'
# '-flipUseExactColumnNameMatch:TRUE'
#
-allowHdrDups
-alternateRowBackgroundColor:white
-rmvTrailingBlankRowsAndColumns
#
#-shrinkBigCells:25,-5
-shrinkBigCells:1,-5
#
#"This does multiple header-row data mapping."
-hrefHeaderRowMapping
#
#"These map mAdb Feature Report data to Bioinformatics databases"
-hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A
-hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene=
-hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset=
#
#"--------- End ---------"
5. DEMONSTRATION DATA SETS
6. SOFTWARE DESIGN
This Java application converts a set of tab-delimited data files to various
HTML TABLE formats with many mapping options available. We use the
term Table with an uppercase 'T' to indicate the FileTable data
structure used throughout the program. The command line arguments are
parsed by the Switches class. The main() method in the HTMLtools
class class invokes the switch parser and then determines if batch processing
is to be used (if the '-batchProcess' switch is invoked). If the first
command does not have a '-' prefix, it is assumed to be a parameter file
(denoted paramXXX.map above). This is then read and the switches in that
file are then parsed. It is assumed that there may be more than one .txt
file in the input directory ('-inputDir' switch). So a list of these files
is then processed applying the various other command line switches that
were specified. To run it only on a subset of the files in the inputDirectory,
use the '-files:{f1,f2,...,fn}' command specification. In addition, it may be
used iteratively if the initial command line argument is the
'-batchProcess:{a batch file}' switch, in which case the batch
input file (e.g., batchList.doit) is assumed to contain a
list of params .map files to be batch processed.
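As an illustration of this dispatch logic, the following minimal sketch (not the actual HTMLtools source; everything other than the '-batchProcess' and params .map conventions described above is a hypothetical stand-in) shows how a first argument may be treated as either a params .map script or a -batchProcess list:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Minimal sketch of the command dispatch described above (hypothetical names). */
public class DispatchSketch {
    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("Usage: HTMLtools <params.map> | -batchProcess:<list.doit> | -switches...");
            return;
        }
        String first = args[0];
        if (first.startsWith("-batchProcess:")) {
            // Each non-comment line of the .doit file names a params .map script to run.
            String doitFile = first.substring("-batchProcess:".length());
            for (String script : Files.readAllLines(Paths.get(doitFile))) {
                script = script.trim();
                if (!script.isEmpty() && !script.startsWith("#"))
                    runParamScript(script);
            }
        } else if (!first.startsWith("-")) {
            // A first argument without a '-' prefix is assumed to be a params .map file.
            runParamScript(first);
        } else {
            runSwitches(args);            // ordinary command-line switches
        }
    }

    static void runParamScript(String mapFile) throws IOException {
        // Each non-comment line of the .map file is a switch; parse and run them.
        List<String> switches = Files.readAllLines(Paths.get(mapFile));
        switches.removeIf(s -> s.trim().isEmpty() || s.trim().startsWith("#"));
        runSwitches(switches.toArray(new String[0]));
    }

    static void runSwitches(String[] switches) {
        // Stand-in for the Switches parser and the per-file conversion loop.
        System.out.println("Would parse and run: " + String.join(" ", switches));
    }
}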
The program generates an HTMLtools.log
file of the processing every time it is run (and overwrites it on
each run).
java -Xmx256M -classpath .;.\HTMLtools.jar HTMLtools \
-genBatchScripts:data.GBS/params-genBatchScripts.map
This in turn generates the batchScripts/
directory described in Section 7 below, including a generated
batchScripts/buildWebPages.doit list of params .map scripts to execute and
the Windows .BAT file batchScripts/buildWebPages.bat to start the
batch processing.
The program was built using Eclipse (Version 3.4) (www.eclipse.org). The distribution includes the ANT (ant.apache.org) build.xml script that can be used either standalone or with an Integrated Development Environment such as Eclipse (which includes ANT). There is a separate javadocs .BAT file, javadocs-HTMLtools.bat, that can be used to generate the Java class documentation in the javadocs/ directory. The .BAT files are renamed in the initial .zip file distribution and need to have their names restored before use (see Section 2.1 for details).
List of Java Class modules
Source code modules for HTMLtools application.
6.1 Converter GUI design
The ConverterGUI.jar file is just a copy of the HTMLtools.jar
file renamed to ConverterGUI.jar. When it runs, it checks the name it was
invoked under and then behaves the same as running HTMLtools -gui.
When started, it pops up a graphical user interface (see Section 2.1.1,
Using the Graphical User Interface (GUI) to run the converter).
The user selects, using the File menu, either a parameter .map script file or
a batch .doit file (which contains a list of .map script files). When they
press the Process button, it creates a new thread ProcessData.java
and has it execute the selected .map or .doit file. When processing, it
accumulates a list of HTML files that were generated. When done, it puts
this list into a View HTML chooser GUI. If the user selects one, it will
then pop up a Web browser showing this file.
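A minimal sketch of this Process-button flow is shown below; the class, method, and file names (other than ProcessData, which the manual names) are hypothetical stand-ins rather than the actual GUI code:

import java.awt.Desktop;
import java.io.File;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.swing.JOptionPane;
import javax.swing.SwingUtilities;

/** Sketch of the Process button flow described above (hypothetical names). */
public class GuiProcessSketch {
    private final List<String> generatedHtml = new CopyOnWriteArrayList<>();

    /** Called when the user presses Process with a selected .map or .doit file. */
    public void onProcessPressed(final String scriptFile) {
        new Thread(() -> {
            runConversion(scriptFile);                          // long-running conversion
            SwingUtilities.invokeLater(this::showHtmlChooser);  // back on the Swing thread
        }, "ProcessData").start();
    }

    private void runConversion(String scriptFile) {
        // Stand-in: run the converter on scriptFile and record each HTML file it writes.
        generatedHtml.add("data.MRR/example-output.html");      // illustrative entry only
    }

    private void showHtmlChooser() {
        Object pick = JOptionPane.showInputDialog(null, "View HTML file:", "View HTML",
                JOptionPane.PLAIN_MESSAGE, null, generatedHtml.toArray(), null);
        if (pick != null) {
            try {
                Desktop.getDesktop().browse(new File((String) pick).toURI()); // pop up Web browser
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}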
6.2 Search GUI design
The SearchGui.jar file is just a copy of the HTMLtools.jar
file renamed to SearchGui.jar. When it runs, it checks the name it was
invoked under and then behaves the same as running HTMLtools -searchGui.
When started, it pops up a graphical user interface (see Section 2.1.2,
Search User Database with a Graphical User Interface (GUI) Generating Reports).
It also needs to load the Index Map (.idx) file which it does in the background
by creating a new thread ProcessLoadIndexMapData.java which lets the
user continue selecting data in the interface. Processing is delayed until the
map is loaded since it is used to verify the data entered by the user.
The user enters search information into the SearchGui interface to specify
1. a list of genes and/or Well IDs and/or Feature IDs (gene probe IDs);
and then 2. one or more experiment groups. When they press
the Process button, it creates a custom script
data.search/paramSearchFlip.map from a default
data.search/paramsSearchDefault.map script that is domain dependent. Then
it creates a new thread (ProcessDataSearch.java) and recursively calls
HTMLtools to execute the just generated paramSearchFlip.map
script file. The script includes the flip table options to actually generate the
flipped table on the specified subset of data. When the thread is done processing,
it has generated data.search/EGALLDataSet-search.txt and
data.search/EGALLDataSet-search.html files. It then lets the user press
the View HTML button to pop up a Web browser to see the
EGALLDataSet-search.html file.
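A minimal sketch of that script-generation step is given below. It assumes a helper that replaces the $$SEARCH_FILTERS$$ placeholder in paramsSearchDefault.map with flip switches built from the user's selections; only the Gene and experiment-group filters are shown, and the class and method names are hypothetical:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Sketch of generating paramSearchFlip.map from the default script (hypothetical names). */
public class SearchScriptSketch {
    static void generateAndRun(List<String> genes, List<String> expGroups) throws IOException {
        // Build the concrete flip switches from the user's selections (Gene filter only here).
        String filters =
            "-flipOrderHdrColNames:\"Gene,Well ID,Feature ID\"\n" +
            "-flipColumnName:\"*LIST*,Gene," + String.join(",", genes) + "\"\n" +
            "-flipRowFilterNames:\"*LIST*," + String.join(",", expGroups) + "\"";

        // Substitute the $$SEARCH_FILTERS$$ placeholder in the default script.
        String template = new String(Files.readAllBytes(
                Paths.get("data.search/paramsSearchDefault.map")));
        String script = template.replace("$$SEARCH_FILTERS$$", filters);
        Files.write(Paths.get("data.search/paramSearchFlip.map"), script.getBytes());

        // Run the converter on the generated script from a worker thread
        // (the real code calls back into the HTMLtools classes).
        new Thread(() -> ConverterStub.run("data.search/paramSearchFlip.map"),
                   "ProcessDataSearch").start();
    }
}

/** Stand-in for invoking the converter on a params .map script. */
class ConverterStub {
    static void run(String mapFile) { System.out.println("Would run converter on " + mapFile); }
}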
The example data.search/paramSearchFlip.map below shows the flip options
produced by merging the user-specified data with the other default data
(see the file for the rest of the default options):
. . .
#"Database (.txt) and index map of database (.idx) to search"
-flipTableByIndexMap:"EGALLDataSet.txt,EGALLDataSet.idx"
#
-flipOrderHdrColNames:"Gene,Well ID,Feature ID"
-flipColumnName:"*LIST*,Gene,Stat5a,Stat5b"
-flipColumnName:"*LIST*,Feature ID,1438470_at,1441476_at,1446085_at"
-flipRowFilterNames:"*LIST*,EG002,EG003.1,EG003.2"
#
#"Set the data precision for generated HTML."
-dataPrecisionHTMLtable:0
#
#
#"Set the flip-table sort by column name."
-sortFlipTableByColumnName:"Stat5b"
#
#"Generate heat-map data cells in a HTML conversion if .sidx exists."
-showDataHeatmapFlipTable
#
. . .
Flip table computation
The flip table option uses three precomputed database files (created with HTMLtools): data.search/EGALLDataSet{.txt,.idx,.sidx} that allow random access to any row by probe ID (a gene may have one to half a dozen probe IDs). Since the number of genes used in a search is relatively small (under 100, and typically far fewer, on the order of 10 to 20), gathering the data is relatively fast. The script file specifies -flipTableByIndexMap:"EGALLDataSet.txt,EGALLDataSet.idx".
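The lookup itself can be pictured with the minimal sketch below, which assumes the .idx Index Map records the byte offset of each Feature ID's row in the .txt database; the class and field names are hypothetical, not the actual HTMLtools source:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

/** Sketch of random-access row retrieval via an index map (assumed .idx layout). */
public class IndexSeekSketch {
    private final Map<String, Long> rowOffsets = new HashMap<>();  // Feature ID -> byte offset

    /** Fetch the tab-delimited row for one probe ID without scanning the whole database. */
    public String fetchRow(String probeId) throws IOException {
        Long offset = rowOffsets.get(probeId);
        if (offset == null)
            return null;                               // probe ID not in the database
        try (RandomAccessFile db = new RandomAccessFile("data.search/EGALLDataSet.txt", "r")) {
            db.seek(offset);                           // jump directly to the row
            return db.readLine();                      // one tab-delimited data row
        }
    }
}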
If the heatmap option is used, then it uses the EGALLDataSet.sidx file instead (i.e., Statistics Index Map file). This contains the same row seek data as the .idx file, but also statistics (min, max, mean, stddev) for each row and for the entire database. This is then used to map each table data cell value to a cell background color to implement the heat map.
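For example, a simple linear mapping of a cell value onto a background color between the min and max taken from the .sidx statistics could look like the sketch below (the actual color scale used by HTMLtools may differ):

/** Sketch: map a data value to an HTML background color using .sidx min/max statistics. */
public class HeatmapColorSketch {
    /** Low values map toward blue, mid-range toward green, high values toward red. */
    public static String heatmapColor(double value, double min, double max) {
        double t = (max > min) ? (value - min) / (max - min) : 0.5;   // normalize to [0,1]
        t = Math.max(0.0, Math.min(1.0, t));
        int red   = (int) Math.round(255 * t);
        int blue  = (int) Math.round(255 * (1.0 - t));
        int green = (int) Math.round(255 * (1.0 - Math.abs(2.0 * t - 1.0)));
        return String.format("#%02X%02X%02X", red, green, blue);     // e.g., used as a TD BGCOLOR value
    }
}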
In addition to the heatmap option (the default), it also lets you adjust the precision of the data (which normally has 3 or 4 digits) to 0 or more digits, and sort the rows by the data in a particular gene/probe ID column.
So the processing is broken into two parts: the GUI part (SearchGui) that gathers the argument list and generates the paramSearchFlip.map file, and the flip-table generation part that builds the search-results heatmap HTML table.
7. THE batchScripts/ DIRECTORY
This section documents the batchScripts/ directory for creating Web pages. It briefly describes 1) the contents of the batchScripts/ directory, 2) how it is created using the GenBatchScripts (GBS) facility of this converter program from the list of mAdb tests and the data from those tests, and 3) how it is used to create Web pages suitable for copying to the Jak-Stat Prospector (J-S P) Web server on http://jak-stat.nih.gov/. This example could serve as a model for developing static Web server pages for other types of analysis-system generated data to be used on a static Web server.
Before attempting to run the GenBatchScripts process to create the batchScripts/ directory, we recommend you familiarize yourself with the commands in this Reference Manual. The batchScripts/ directory contains a Windows .bat file to run the HTMLtools program, buildWebPages.bat, and a list of conversion batch jobs in buildWebPages.doit. The buildWebPages.doit file contains a list of generated conversions to be performed in converter-batch mode (as opposed to Windows-batch mode). Each conversion is in the form of a generated parameter .map file saved in the batchScripts/ParamScripts/ directory. (In the rest of this discussion, for brevity, we will omit the batchScripts/ prefix when mentioning these directories where it is unambiguous.)
There are additional support files that are required for updating the J-S P Web server tree. These are described at the end of this document in the discussion on converting data from GSP-Inventory for the J-S P.
7.1 Overview of Conversion Process
7.2 Contents of the batchScripts/ directory
The batchScripts/ directory contains files
and subdirectories required to convert the mAdb tab-delimited text data into
HTML and JTV data that can then be copied to the J-S P Web site. The list of
generated batchScripts/ subdirectories and files is:
ParamScripts/ - GBS generated conversion params*.map files
InputTree/ - mAdb generated .txt MRR and JTV data are copied by GBS
Summary/ - GBS generated text HTML top level Web pages
Analyses/ - GBS generated text HTML & edited .txt files
JTV/ - GBS generated JTV text HTML & edited JTV files
JTVjars/ - the JTV runtime jar files are copied by GBS
data.Table/ - the common mapping files are copied by GBS
buildWebPages.doit - GBS generated -batch script to convert the MRR and JTV data
buildWebPages.bat - GBS generated Windows BAT file to run the converter on the .doit file
ExperimentGroups.map - Experiment Group info by EGxxxx
CellTypeTissue.map - Tissue 'Introduction' by EGxxxx for summaries
EGMAP.map - map 'Affy .CEL file' names to 'GSP ID's
mAdbArraySummary.map - the 'mAdb ID' by 'Affy .CEL file'
The ExperimentGroups.map file is the tab-delimited sheet of that name in the GSP-Inventory.xls spreadsheet. The EGMAP.map tab-delimited file is the concatenation of the individual EGxxxx.txt data files from the GSP-Inventory.xls spreadsheet (see the Notes sheet in the GSP-Inventory), assembled into the map by the data.GSP-EG/params-GSPI-EG-concatTXT.map and data.Maps/params-Maps-EGMAP-map.map scripts. The mAdbArraySummary.map file is the saved 'mAdb Array Summary' for all of the samples data. All generated .map files are saved to the data.Table/ directory where they are used.
summaryTemplateProlog.html
summaryTemplateExperimental.html
summaryTemplateAnalysis.html
summaryTemplateFurtherAnalysis.html
summaryTemplateEpilogue.html
$$TISSUE$$ - tissue associated with the test
$$INTRODUCTION$$ - Introduction data from the CellTypeTissue.map file
$$LIST_EXPR_GROUPS$$ - list of expression groups used in the test
$$DESCRIPTION$$ - description using data from mAdb-TestsToDo data
$$ANALYSIS$$ - data generated for the "Analysis" section
$$FUTHERANALYSIS$$ - data generated for the "Further Analysis" section
$$DATE$$ - date of conversion
$$INFILENAME$$ - specific test name (e.g., EG1-test-1+FC-ALL.txt)
paramsTemplate-MRR.map - generate MRR gene expression HTML report
paramsTemplate-MRR-keep.map - generate MRR gene list HTML report
paramsTemplate-JTV-jtvReZip.map - generate mapped JTV data and JTV HTML applet
The $$ keywords are expanded during batchScripts/ParamScripts/ files generation (a minimal sketch of this expansion follows the template file lists below). The entries with ".2" in the name are used for subsequent name remapping during the second phase when evaluating the generated params .map files. This list is common for all params .map files generated for the same test.
$$DATE$$ - date of conversion
$$INPUT_DATA$$ - input data relative directory
$$OUTPUT_DATA$$ - output data relative directory
$$TABLE_DATA$$ - location of the .map files relative directory
$$A_SAMPLE_NAME$$ - name of the 'A' condition
$$B_SAMPLE_NAME$$ - name of the 'B' condition
$$JTV_JARS$$ - location of the JTV runtime .jar support files directory
$$TISSUE.2$$ - tissue associated with the test
$$PAGE_LABEL.2$$ - data from page label in mAdb-TestsToDo for test entry
$$DESCRIPTION.2$$ - data from description in mAdb-TestsToDo for test entry
$$CLASS_A.2$$ - list of GSP IDs for condition A
$$CLASS_B.2$$ - list of GSP IDs for condition B
This list can be different for each params .map file generated for the same test.
$$PROLOG$$ - name of prolog file prologMRR.html or prologJTV.html
$$EPILOGUE$$ - name of epilogue file epilogueMRR.html or epilogueJTV.html
$$TITLE.2$$ - title for specific generated Web page
$$TEST_OR_ALL.2$$ - "Test" or "All" modifier
$$GBS_DESCRIPTION.2$$ - test specific information
$$PARAM_MAP_NAME$$ - name of parameter file with (+-FC, -ALL, -JTV) modifiers
$$TESTMAME.2$$ - test name with (+-FC, -ALL, -JTV) modifiers
$$FILE.2$$ - data input file for each params .map
$$JOIN_TABLE_FILE.2$$ - the -joinTableFile file for MRR-ALL processing only
$$MAPDIR$$ - mAdb mapping file for JTV sample name processing only
prologMRR.html epilogueMRR.html and
prologJTV.html epilogueJTV.html
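The $$ keyword expansion mentioned above can be pictured with the minimal sketch below, which substitutes per-test values into a template params .map file; the helper name is hypothetical, and the real GBS code also performs the second-phase ".2" remapping:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;

/** Sketch of GBS template expansion: replace $$KEY$$ tokens with per-test values. */
public class TemplateExpandSketch {
    static void expandTemplate(String templateFile, String outFile,
                               Map<String, String> keywords) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get(templateFile)));
        for (Map.Entry<String, String> e : keywords.entrySet())
            text = text.replace("$$" + e.getKey() + "$$", e.getValue());
        Files.write(Paths.get(outFile), text.getBytes());
    }
    // e.g., expandTemplate("data.GBS/paramsTemplate-MRR.map",
    //                      "batchScripts/ParamScripts/params-EG1-test-1+FC.map", keywords);
}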
java -Xmx256M -classpath .;.\HTMLtools.jar HTMLtools \
     data.GBS/params-genBatchScripts.map
This will use the mAdb-TestsToDo.txt Table as well as other files in data.GBS/, including CellTypeTissue.map (the tab-delimited sheet from the GSP-Inventory.xls spreadsheet), to:
mAdb (MRR) and JTV generated data:
EG1-test-1+FC.txt EG1-test-1-FC.txt (t-Test or fold-change test gene set)
EG1-test-1+FC-ALL.txt EG1-test-1-FC-ALL.txt (AND of test gene set with ALL samples)
EG1-test-1+FC-JTV.zip EG1-test-1-FC-JTV.zip (JTV heatmap for gene set)
EG1-test-1+FC-JTV-ALL.zip EG1-test-1-FC-JTV-ALL.zip (JTV heatmap for ALL samples)
Converter output: the .txt files processed by the converter produce files with .html extensions; for the .zip files, the .zip extension is removed and an HTML file is generated to start up the JTV. If the mAdb group changes the JTV output to use the GSP IDs instead of the mAdb IDs, we can avoid processing the JTV .zip files.
On the J-S-P Web site:
EG1-test-1+FC.txt EG1-test-1-FC.txt (t-Test or fold-change gene set test)
EG1-test-1+FC-ALL.txt EG1-test-1-FC-ALL.txt (AND of test gene set with ALL samples)
and
EG1-test-1+FC.html EG1-test-1-FC.html (t-Test or F-C test - with expr. data)
EG1-test-1+FC-keep.html EG1-test-1-FC-keep.html (t-Test or F-C test - no expr. data)
EG1-test-1+FC-ALL.html EG1-test-1-FC-ALL.html (AND of test gene set with ALL samples)
EG1-test-1+FC-JTV.html EG1-test-1-FC-JTV.html (JTV heatmap for gene set)
EG1-test-1+FC-JTV-ALL.html EG1-test-1-FC-JTV-ALL.html (JTV heatmap for ALL samples)
EG1-test-1+FC-JTV.zip EG1-test-1-FC-JTV.zip (JTV heatmap for gene set)
EG1-test-1+FC-JTV-ALL.zip EG1-test-1-FC-JTV-ALL.zip (JTV heatmap for ALL samples)
java -Xmx512M -classpath .;.\HTMLtools.jar HTMLtools \
     -batchProcess:batchScripts/buildWebPages.doit
7.10. Converting data from GSP-Inventory for the Jak-Stat Prospector
There are additional support files that are required for updating the J-S P
Web server tree. The following J-S P GSP/ subdirectories are updated by running
the converter -batchProcess:batchList.doit batch job: GSP/GSP-Inventory/,
GSP/Search/, and GSP/Tests/, which are created in
html/GSP/. Each of these subdirectories has
two additional subdirectories, HTML/ and XLS/. The converter converts the
data to HTML and saves the results in the HTML/ subdirectories. The XLS/ data
is not created by the HTMLtools converter, but rather prepared separately from the
source data.
7.10.1 HTMLtools distribution directory
The distribution directory has the following data subdirectories required for
generating data for the Jak-Stat Prospector Web site.
data.GBS/ - GenBatchScripts scripts to create the batchScripts/ directory
data.GSPI-EG/ - EGxxxx.txt data, HTML and concatenated EGMAP.txt scripts
data.GSPI-ExpGrp/ - ExperimentGroups.txt data and HTML scripts
data.mAdb-TestsToDo/ - script to create HTML of mAdb-TestsToDo
data.Maps/ - the scripts used to create HTML and .map files
JTVjars/ - the JTV runtime jar files required
data.Table/ - primary .txt and .map files for the EGMAP, ExperimentGroups,
mAdbArraySummary, and mAdb-TestsToDo files
Directory trees that are created when running the converter. These will contain
the data to be copied to the J-S P Web site staging directory:
batchScripts/ - the directory created when running GenBatchScripts
html/ - the directory created when running -batchProcess:batchList.doit
JTVoutput/ - the JTV demonstration conversion output (from JTVinput/)
Additional directories contain demonstrations of other features that could
be used in conversions, including:
data.MRR/ - separate demonstration mAdb MRR conversions to HTML
data.MRR-all/ - fast-edit table conversion scripts using buffered I/O
JTVinput/ - the JTV demonstration conversion scripts and data
Additional directories are required for support of the converter. Note that the BAT
files in the demo-bat/ directory end in "-bat",
not ".bat". The
README-NOTE-restoring-the-BAT-file-names.txt file describes how to make
the BAT files in the demo-bat/ directory runnable.
build/build.xml - ANT build file for making the converter
demo-bat/ - additional Windows BAT scripts in portable ("...-bat") form
docs/ - additional converter documentation
javadocs/ - automatic javadoc Java documentation for the converter
src/ - source code for the HTMLtools converter
Additional top level files in the distribution directory:
HTMLtools.jar - converter Java jar file used by the BAT scripts
ReferenceManual.html - primary documentation for the converter
README-NOTE-restoring-the-BAT-file-names.txt - how to activate the BAT files
java -Xmx256M -classpath .;.\HTMLtools.jar HTMLtools \
     -batchProcess:batchList.doit
The subdirectories of generated files are created in html/GSP/ and then copied to subdirectories with the same names in the Jak-Stat Prospector Web tree. See Example 10 for the listing of the batchList.doit file.
It has been released with a small non-proprietary sample data set, currently publicly available on NCBI GEO, to demonstrate some of the aspects of the software.
It was derived and refactored from the open source MAExplorer (http://maexplorer.sourceforge.net/), and Open2Dprot (http://Open2Dprot.sourceforge.net/) Table modules.
Copyright 2008, 2009 by Peter Lemkin
E-Mail: lemkin@users.sourceforge.net
http://lemkingroup.com