File: ReferenceManual.html

HTMLtools is a Java program to automate the batch conversion of tab-delimited spreadsheet-type text files to HTML Web-page files. There are a variety of flexible options to make the Web page presentations more useful. It can also be used for editing large tables. This is described in more detail throughout this reference manual.
Additional command subsets were developed for specialized conversions (JTVconvert, GenBatchScripts, and TestsIntersections) and may be ignored for routine Web page generation in other domains where they don't apply.
The JTVconvert commands can re-map data array names in Java TreeView mAdb data set files to more user-friendly experiment names, and can generate HTML Web pages to launch the converted JTV files for each JTV data set.
The GenBatchScripts commands may be used to generate HTMLtools batch scripts for subsequent processing given a list of data test-results files to convert and a tab-delimited tests descriptions file. It is able to use a table (prepared with Excel or some other source) that describes this data and can then use that for extracting and inserting information from various mapping tables into the generated Web pages.
The TestsIntersections commands will synthesize a tests intersection summary table and Web page as well as generating some summary statistics. It uses the same data used in the GenBatchScripts commands.
The Converter GUI mode starts a graphical user interface (GUI) where the user can specify either individual parameter scripts or a batch file list of scripts to be executed in the background. The results are shown in the user interface, including the ability to view the generated HTML files through a pop-up Web browser.
The Search GUI mode starts a database search graphical user interface (GUI) to generate a "flipped table" (see Figure S.6) on a subset of the data from a pre-computed edited table database. The user can specify filters on the rows and columns (data and samples subsets), and presentation options to generate an HTML file viewed through a pop-up Web browser.
Note: this software is released as an
OPEN SOURCE project, HTMLtools, on
http://HTMLtools.SourceForge.net/ with a small non-proprietary sample data
set that is bundled along with the program. This data set has already been
published on the public access NLM/NCBI GEO Web site. The original data set was
proprietary. It was created for the Group STAT Project (GSP), along
with the original conversion program, CvtTabDelim2HTML, to support the NIH
Jak-Stat Prospector Web site
that is part of the Trans-NIH Jak-Stat Initiative
(http://jak-stat.nih.gov/),
accessible through the Prospector login facility. This Web site will
be opened up in the future. Note: the initial release only includes GSP data that has been released to the public in NCBI's GEO database (currently one data set). As more GSP GEO data is released to the public, we will include some of it in the demo database to illustrate more of the features of HTMLtools.
1. Gather sets of laboratory experiments in multiple laboratories relating to the Jak-Stat gene pathway.
   |
   v
2. Create Affymetrix microarray data (resulting in .CEL data files).
   |
   v
3. Create an inventory of relevant data and annotation of the data in the GSP-Inventory.xls spreadsheets consisting of: 1) arrays grouped by experiment EG001, EG002,...EG00n; 2) a top level spreadsheet ExperimentGroups describing all EG experiments.
   |
   v
4. Consolidate data in mAdb (Microarray DataBase system mAdb.nci.nih.gov). Data is uploaded to each EGxxx subproject, and normalized by pooled RMA or MAS5.
   |
   v
4.1 Perform t-test or fold-change tests on subsets of the data that make sense to compare, saving results (+ and - changes separately) as gene subsets.
   |
   v
4.2 Export tab-delimited (Excel) mAdb Retrieval Reports (MRR) for each gene set for 1) just the arrays used in the test; 2) all samples in the database.
   |
   v
4.3 Compute and export the hierarchical clustered heat maps as Java Tree View (JTV) .zip tab-delimited data sets for external viewing for 1) just the arrays used in the test; 2) all samples in the database.
   |
   v
5. Convert the MRR and JTV tab-delimited data to HTML Web pages using the HTMLtools tools.
   |
   v
6. Merge links to this generated data with Web pages in the Jak-Stat Prospector Web server (and upload to the server).

Figure 1 shows an example of a data analysis processing pipeline to convert laboratory microarray data to Web pages that can be used in the Jak-Stat Prospector Web site. Steps 4.1 and 4.2 could be run for a set of experiments as a batch job. Similarly, the set of files exported from mAdb could be batch processed with HTMLtools. Note that although the HTMLtools converter was developed for this project, the command structure is flexible enough that it could easily be used with other types of data.
Section 7 describes creating and running the scripts for the batchScripts/ directory for creating Web pages.
1. Read the mAdb-TestsToDo.txt table that specifies all of the tests to be performed on subsets of the mAdb GSP database. For each test, these include: test name, samples being compared, test thresholds, test name and related annotation, tissue name, and the relative directory for the data (used for both the input InputTree/ and output Analyses/ generated directory trees).
   |
   v
2. Create lists of related tests by grouping by same tissue name.
   |
   v
3. Read additional mapping table files (ExperimentGroups.map, EGMAP.map, CellTypeTissue.map) to use in generating the summary Web pages.
   |
   v
4. Create summary Web pages for each tissue type with links to Web pages for analyses we will generate and save in the Summary/ directory.
   |
   v
5. Generate all of the 'params .map' batch scripts, several for each test, and save them in the ParamScripts/ directory (see Figure 3 for details). It then copies all support files (above mapping tables), JTVjars/, data.Table/ and other files required when running the converter to generate Web pages.
   |
   v
6. Generate a buildWebPages.doit file listing the params .map files to be processed with a subsequent batch run using the HTMLtools converter, and a Windows .BAT file, buildWebPages.bat.
   |
   v
7. Start the buildWebPages.bat batch job, which generates the Web pages in the Summary/, Analyses/ and JTV/ directory trees.
   |
   v
8. Copy the generated Web pages to the Web server.

Figure 2 shows an example of the batch script generation pipeline from a table describing a list of tests that were run as a batch job on another analysis system. In this case, the analysis system is mAdb, and it uses the same test "todo" file (data.Table/mAdb-TestsToDo.txt) to specify the tests as is used here with the GenBatchScripts processing. The mAdb data analysis and the tab-delimited Excel data generated are shown in steps 4.1 and 4.2 (see Figure 1).
In the GenBatchScripts processing, we first create a batchScripts/ directory and then fill it with the various types of data described in this figure.
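Step 2 of Figure 2 (grouping tests by tissue name) can be illustrated with a minimal sketch. This is not the HTMLtools implementation. The class name, the column positions, and the tissue names in the sample rows are assumptions made up for illustration, and the real mAdb-TestsToDo.txt layout may differ.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: group test names by tissue name.
// TEST_NAME_COL and TISSUE_COL are hypothetical column positions.
class TissueGrouping {
    static final int TEST_NAME_COL = 0; // assumed position of the test name
    static final int TISSUE_COL = 1;    // assumed position of the tissue name

    // Returns tissue name -> list of test names, in first-seen order.
    static Map<String, List<String>> groupByTissue(List<String> tabDelimitedRows) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String row : tabDelimitedRows) {
            String[] fields = row.split("\t", -1);
            if (fields.length <= TISSUE_COL) {
                continue; // skip rows that are too short to hold both columns
            }
            groups.computeIfAbsent(fields[TISSUE_COL], k -> new ArrayList<>())
                  .add(fields[TEST_NAME_COL]);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Made-up rows; the tissue names are placeholders.
        List<String> rows = Arrays.asList(
            "EG3.1-test-1\tLiver",
            "EG3.1-test-2\tLiver",
            "EG1.1-test-1\tMuscle");
        System.out.println(groupByTissue(rows));
    }
}
```

Each resulting group would correspond to one summary Web page in the Summary/ directory.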
Tests (MRR & JTV) input:          Converter output:

Tests for samples:
{testName}+FC.txt                 {testName}+FC.html
                                  {testName}+FC-keep.html
{testName}-FC.txt                 {testName}-FC.html
                                  {testName}-FC-keep.html

AND of above tests for ALL samples:
{testName}+FC-ALL.txt             {testName}+FC-ALL.html
{testName}-FC-ALL.txt             {testName}-FC-ALL.html

JTV for test samples:
{testName}+FC-JTV.zip             {testName}+FC-JTV/
                                  {testName}+FC-JTV.zip
                                  {testName}+FC-JTV.html
{testName}-FC-JTV.zip             {testName}-FC-JTV/
                                  {testName}-FC-JTV.zip
                                  {testName}-FC-JTV.html

JTV above for AND of above tests for ALL samples:
{testName}+FC-ALL-JTV.zip         {testName}+FC-ALL-JTV/
                                  {testName}+FC-ALL-JTV.zip
                                  {testName}+FC-ALL-JTV.html
{testName}-FC-ALL-JTV.zip         {testName}-FC-ALL-JTV/
                                  {testName}-FC-ALL-JTV.zip
                                  {testName}-FC-ALL-JTV.html

Figure 3 shows the set of 8 mAdb results files and the 18 HTML and JTV files generated by the converter for each test testName in the mAdb-TestsToDo list. For example, if the test is "EG3.1-test-2", then in the above figure, replace {testName} with EG3.1-test-2, etc. The "+FC" indicates a positive fold-change, and the "-FC" a negative fold-change. The files with "-keep" are gene lists with no expression data. The GenBatchScripts option for the converter generates parameter .map batch scripts for each of these converted files.
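The naming pattern in Figure 3 can be expressed as a small sketch that expands a test name into its 8 input file names and 18 output names. The class and method names are hypothetical and are not part of HTMLtools. The sketch only reproduces the naming convention shown in the figure.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: expand a test name into the file names of Figure 3.
class TestFileNames {
    // The four result variants per test: +FC, -FC, and the ALL-samples form of each.
    static final String[] VARIANTS = {"+FC", "-FC", "+FC-ALL", "-FC-ALL"};

    // The mAdb result files exported for one test (MRR .txt and JTV .zip).
    static List<String> inputFiles(String testName) {
        List<String> names = new ArrayList<>();
        for (String v : VARIANTS) {
            names.add(testName + v + ".txt");
            names.add(testName + v + "-JTV.zip");
        }
        return names;
    }

    // The converter outputs for one test (HTML pages and JTV directories/files).
    static List<String> outputFiles(String testName) {
        List<String> names = new ArrayList<>();
        for (String v : VARIANTS) {
            names.add(testName + v + ".html");
            if (!v.endsWith("-ALL")) {
                // "-keep" gene lists are shown only for the test-sample variants
                names.add(testName + v + "-keep.html");
            }
            names.add(testName + v + "-JTV/");
            names.add(testName + v + "-JTV.zip");
            names.add(testName + v + "-JTV.html");
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : outputFiles("EG3.1-test-2")) {
            System.out.println(name);
        }
    }
}
```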
1. Edit the GSP-Inventory Excel workbook to annotate the set of Affymetrix .CEL files, where we assign the next free Experiment Group EGnnn, simple GSP ID, GSP ID, etc.
   |
   v
2. Upload the Affymetrix .CEL file data to the GSP mAdb database and normalize the new samples using the pooled RMA data for the base GSP database.
   |
   v
3. Add the new test to-do entries in the mAdb-TestsToDo.xls Excel workbook and upload the new test list to mAdb.
   |
   v
4. Run the batch tests in mAdb, resulting in Excel and JTV data sets that are exported for conversion to Web pages.
   |
   v
5. Process these data using the HTMLtools converter into HTML pages and converted data for the Web server.
   |
   v
6. Upload these Web pages and data to the NIDDK Jak-Stat Prospector staging area for the jak-stat.nih.gov server.

Figure 4 shows the top-level procedure used for adding new Affymetrix .CEL file data sets to the GSP database and Jak-Stat Prospector Web server. In addition, if new gene identifications are made for some of the Affymetrix probes (Feature IDs), running steps 4) through 6) can update these identifiers.
Java TreeView (JTV) Documentation

Java TreeView is an open-source (jTreeView.sourceforge.net) Java applet that mAdb uses to view heatmaps of gene sets. We also use Java TreeView for looking at data snapshots we have taken of the mAdb data.

Java TreeView may be downloaded to run as either a standalone application or Java applet from http://jTreeView.sourceforge.net/. The 2004 journal paper by Alok J. Saldanha gives an overview of Java TreeView: "Java Treeview - extensible visualization of microarray data" Bioinformatics 2004 20(17):3246-3248.

The Java TreeView documentation Web page includes links to examples, an FAQ, a user guide, and Alok J. Saldanha's dissertation describing additional aspects of Java TreeView.

NOTE: The Java TreeView applet has been shown to work on Mac OSX, XP and Win2K.
java -Xmx256M -classpath .;.\HTMLtools.jar \ HTMLtools -inputDir:dataXXX -outputDir:html (etc.)
java -Xmx256M -classpath .;.\HTMLtools.jar \ HTMLtools dataXXX/paramsXXX.map
or (in Unix, MacOS-X, or Cygwin):
java -Xmx256M -classpath .:./HTMLtools.jar \ HTMLtools dataXXX/paramsXXX.map
(the Java classpath separator is ':' on Unix and MacOS-X; under Cygwin with a Windows Java runtime, keep the ';' separator and quote the classpath)
where the dataXXX/ directory and the paramsXXX.map file are replaced by your data directory and params map file. The generated HTML files will then be in the html/ directory or whatever output directory is specified by the -outputDirectory switch in the paramsXXX.map file. This command line tells Java to run the program with 256 Mbytes of memory. For very large files, you may need to increase this memory size. For very large data sets, even that may not be enough and you may not be able to convert them, since in the default mode the Table is loaded into memory before being edited. Some commands, such as -fastEditFile, are designed to work with very large files. They process the data as a buffered I/O pipeline and so do not load the Table into memory.
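The difference between loading the whole Table into memory and processing it as a buffered I/O pipeline can be sketched as follows. This is only an illustration of the general streaming technique, not the actual -fastEditFile code. The class name, the method name, and the choice of dropping a column by index are assumptions.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

// Illustrative sketch only: edit a tab-delimited stream one line at a time,
// so memory use does not grow with the size of the input.
class StreamingColumnDrop {
    // Copies tab-delimited text from 'in' to 'out', omitting the column at dropIndex.
    static void dropColumn(Reader in, Writer out, int dropIndex) {
        try {
            BufferedReader reader = new BufferedReader(in);
            BufferedWriter writer = new BufferedWriter(out);
            String line;
            while ((line = reader.readLine()) != null) {
                String[] fields = line.split("\t", -1);
                StringBuilder edited = new StringBuilder();
                boolean first = true;
                for (int i = 0; i < fields.length; i++) {
                    if (i == dropIndex) {
                        continue; // skip the dropped column
                    }
                    if (!first) {
                        edited.append('\t');
                    }
                    edited.append(fields[i]);
                    first = false;
                }
                writer.write(edited.toString());
                writer.write('\n');
            }
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Small in-memory demo; a real run would wrap file readers and writers.
        StringWriter out = new StringWriter();
        dropColumn(new StringReader("Gene\tNote\tValue\nStat5a\tx\t1.5\n"), out, 1);
        System.out.print(out);
    }
}
```

Because each line is written as soon as it is read, an operation that needs all rows at once, such as sorting, cannot be done in this style.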
java -Xmx256M -classpath .;.\HTMLtools.jar \ HTMLtools -batchProcessing:batchList.doit
java -Xmx256M -classpath .;.\HTMLtools.jar \ HTMLtools data.GBS:params-genBatchScripts.map
It is invoked from the command line as:
java -Xmx512M -classpath .;.\HTMLtools.jar HTMLtools -gui
or using the M.S. Windows script (similar for Mac and Linux):
cvtTxt2HTML-GUI.bat
This is illustrated in the following screen shots.
Figure G.1 This shows the initial graphical user interface. Using the File menu, the user should select either a batch ".doit" file or a parameter ".map" script file. The File menu is shown in Figure G.2.
Figure G.2 This shows the File menu, where the user should select either a batch ".doit" file or a parameter ".map" script file.
Figure G.3 This shows the GUI after selecting the script to process. The user then presses the Process button to start processing. The next figure, G.4, shows the program during processing.
Figure G.4 This shows the GUI during processing. The output from the converter is shown in the Report window in the middle of the GUI. This can be saved into a .txt file or cleared if desired. The next figure, G.5, shows the program after processing is finished.
Figure G.5 This shows the GUI after processing. The output from the converter is shown in the Report window in the middle of the GUI. This can be saved into a .txt file or cleared if desired. The next figure, G.6, shows the generated HTML files that can be chosen for viewing.
Figure G.6 This shows the generated HTML files that the GUI offers to view. Selecting one of them will pop up a Web browser with that file (see Figure G.7).
Figure G.7 This shows the pop-up Web browser window for the generated HTML file you chose to view.
However, if you must explicitly run the Java interpreter, you can do it on the command line (invoked various ways on different operating systems) by typing
java -Xmx256M -classpath .;.\searchGui.jar HTMLtools -searchGui
This line was put into a Windows .BAT file (GSP-SearchGUI.bat) that can be run by clicking on this batch file. Notice that the -Xmx256M specification is available to increase or decrease the amount of memory used. The default memory may vary on different computers, so you can use the script to force the program to start with more or less memory if you run into problems.
You also need to select the set of samples to use by selecting one or more Experiment Groups (see the Jak-Stat Prospector Web site for details on Experiment Groups). In the 2. Select one or more 'Sample Experiment Groups' window, selecting ALL is the default and will select all 18 arrays. You can click on individual Experiment Groups. To select a range, click on the first one that starts the range and then hold the SHIFT key and click the end of the range. To select non-adjacent Experiment Groups, hold the CONTROL key as you select different groups. Pressing the Reset button will clear these two windows.
The File menu offers additional processing options. You do not need to use any of these menu options to use the program. However, they can be useful for customizing your search results.
You can save the text output generated during processing that is shown in the 3. Processing Report Log scrollable text area at the bottom of the window. Several File menu commands are used with this, including: Set report file name, Clear report, and Save report as. The Verbose reporting checkbox in the menu can be enabled if you want to see the details of the search and table generation as they progress in the Report Log window. Note that the Clear report and Save report as commands are also available at the bottom as buttons with the same names.
You must specify a list of data search terms in the upper window 1. Enter list of Gene, Well ID or Probe ID.... The simplest way to specify these terms is to either cut and paste or type them into the window. To help demonstrate and simplify specifying the search terms, there are three commands in the File menu: Set demo data to enter a short list Stat5a Stat5b 1438470_at 1441476_at 1446085_at, Set default user term-list data that uses the list of gene probe IDs data in the data.search/LitRefGeneList.txt file, and Get user term-list data from file that lets the user specify the list from a text file.
When the search results table is being generated, you can modify its presentation using other File menu options. Sort descending by column data in generated table sorts the table by a chosen column (see Figure S.3 for more details). Show data heat-map in View HTML shows the generated results table as a colored heatmap (see Figure S.6 for an example); this is the default. Finally, Set data precision for generated HTML adjusts the number of digits presented in the generated table (0 sets it to no fraction, whereas the default -1 shows the full precision available in the data).
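The effect of the data precision setting can be sketched with a small helper. This is a hedged illustration rather than the HTMLtools code: the class and method names are hypothetical, and rounding half-up is an assumption (the program may truncate digits instead).

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Illustrative sketch only: apply a "data precision" setting to one table cell.
class DataPrecision {
    // precision < 0 keeps the input string unchanged (full precision);
    // precision >= 0 limits the number of fraction digits.
    static String format(String value, int precision) {
        if (precision < 0) {
            return value;
        }
        try {
            // Assumption: round half-up; the real converter may clip differently.
            return new BigDecimal(value.trim())
                .setScale(precision, RoundingMode.HALF_UP)
                .toPlainString();
        } catch (NumberFormatException e) {
            return value; // leave non-numeric cells as they are
        }
    }

    public static void main(String[] args) {
        System.out.println(format("3.14159", -1));
        System.out.println(format("3.14159", 2));
        System.out.println(format("7.6", 0));
    }
}
```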
After you finish a search, you can do another one. The File menu options: Reset converter or the Reset button will reset the search specification and make the Process button available.
Figure S.2 This shows the menu options in the File menu.
This menu offers additional processing options described above.
Three additional subsets of specialized commands are described
separately: the 3.1 GenBatchScripts
commands, the 3.2 Tests-Intersection commands,
and the 3.3 Java TreeView commands.
Section 7 describes creating and running the scripts for the
batchScripts/ directory for creating Web
pages.
The parameters that specify the data used in generating the output files include
several directories in the batchScripts/
directory:
Additional data files are used when the -genBatchScripts command is run
including:
There may be multiple instances of the -genCopySupportFile, -genParamTemplate,
-genSummaryTemplate switches.
Figure S.3 This shows the pop up query to let you define the sort name
to specify the generated table gene or gene probe ID column to be used for
the sort process. This will then use the gene expression data for the gene
probe you specified to sort the sample rows for the entire table. The default
is not to sort the data, but to use the sample order of the samples in the
expression groups you have specified. This pop up window is invoked from the
(File menu | Sort descending by column data in generated table).
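The sort described for Figure S.3 (ordering the sample rows by the expression data of one chosen probe, largest first) might look like the sketch below. The row layout (a sample name followed by numeric values stored as strings) and all names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only: sort sample rows in descending order of one data column.
class SortSamplesByProbe {
    // Returns a new list; the input list is not modified.
    // sortCol is the index of the chosen gene/probe column and must hold numbers.
    static List<String[]> sortDescending(List<String[]> rows, int sortCol) {
        List<String[]> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator
            .comparingDouble((String[] r) -> Double.parseDouble(r[sortCol]))
            .reversed());
        return sorted;
    }

    public static void main(String[] args) {
        // Made-up sample names and values.
        List<String[]> rows = Arrays.asList(
            new String[] {"sampleA", "2.0"},
            new String[] {"sampleB", "9.5"},
            new String[] {"sampleC", "4.1"});
        for (String[] row : sortDescending(rows, 1)) {
            System.out.println(String.join("\t", row));
        }
    }
}
```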
Figure S.4 This shows the dialog box (File menu | Set data precision
for generated HTML). The default is -1, which prints all digits available.
Setting it to 0 removes all fractions (used in this example).
Figure S.5 This shows the menu options in the List menu.
You may list some of the data matching the gene/probe search terms or
EG sample search terms prior to doing the search. The first option is
to list all 45K gene/probe ids. The second menu option lets you specify
gene/probe search terms either using the exact gene names or using
substrings. All genes/probes matching will be reported. The third menu
option lets you specify EG sample search terms using the
EG groups selected from the list. In addition, this is filtered by a list of substrings,
which can be qualified as both being required (AND) or either being required (OR)
if the EG sample search terms are specified. All lists are reported in
the bottom scrollable Report Window.
Figure S.5.1 This shows results from
List menu | List matching genes in database in the Report
window.
The genes/probes matching the substring terms "stat" and "jak"
in the 45K probe database are listed in the scrollable
Processing Report log at the bottom of the window.
Figure S.5.2 This shows results from
List menu | List matching EG samples in database in the Report
window using the OR condition.
The Expression Group (EG) samples matching the substring terms
".treated" or ".untreated" in the 18 sample database are listed
in the scrollable Processing Report log at the bottom
of the window. It searches within the EG sample groups you have
selected. In this example, we have selected "All samples", but any
other subset could be used. Also, we required an OR
condition to select samples where either of the search terms
are present.
Figure S.5.3 This shows results from
List menu | List matching EG samples in database in the Report
window using the AND condition.
The Expression Group (EG) samples matching the substring terms
".stat" and ".GH" in the 18 sample database are listed
in the scrollable Processing Report log at the bottom
of the window. It searches within the EG sample groups you have
selected. In this example, we have selected "All samples", but any
other subset could be used. Also, we required an AND
condition to select samples where both search terms are present.
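The AND and OR conditions used in Figures S.5.2 and S.5.3 can be sketched as a case-independent substring filter. This is a minimal illustration under assumptions: the class and method names are hypothetical, and the sample names in the demo are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Illustrative sketch only: AND/OR substring matching on sample names.
class SampleNameFilter {
    // requireAll = true  -> AND: every term must occur in the sample name.
    // requireAll = false -> OR: at least one term must occur.
    // Matching ignores case. An empty term list matches everything.
    static boolean matches(String sampleName, List<String> terms, boolean requireAll) {
        if (terms.isEmpty()) {
            return true;
        }
        String name = sampleName.toLowerCase(Locale.ROOT);
        for (String term : terms) {
            boolean found = name.contains(term.toLowerCase(Locale.ROOT));
            if (requireAll && !found) {
                return false; // AND: one missing term rejects the sample
            }
            if (!requireAll && found) {
                return true;  // OR: one matching term accepts the sample
            }
        }
        return requireAll; // AND: all terms found; OR: no term found
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList(".stat", ".GH");
        // Made-up sample names for the demo.
        System.out.println(matches("EG1.Stat5.GH.treated", terms, true));
        System.out.println(matches("EG1.Stat5.untreated", terms, true));
    }
}
```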
Figure S.6 This shows the Search window after processing
is finished and the View HTML button is made available. Pressing
it will pop up a local web browser with the data shown in the next
figure. Note that the Process button is now disabled and will
be until you reset the converter using the Reset button.
Figure S.7 This shows the generated table Web page created
by the above search and viewed when the View HTML button
was pressed. The colored cells reflect the quantiles that the data belong
to and are based on (max, min, mean, stddev) statistics computed over
the entire database. The data was sorted by the third probe (Stat5b/1422103_at)
and the numeric data was listed with fractions to make it easier to "eyeball"
the data.
3. COMMAND LINE SWITCHES
Command line switches are case-sensitive and of the form '-switchName:a1,a2,...,an'
where 'switchName' is at least the minimum number of characters of the switch
shown below (e.g., '-addE' for '-addEpilogue'), and 'a1', 'a2', etc. are the comma-separated switch arguments
with no spaces between the commas and the arguments. Use double quotes
around arguments with spaces. Tabs are not allowed, and all switches must
be on the same line unless either the switches are in a parameter file, in
which case they are on separate lines, or the command line is entered
using line continuation characters for the operating system (e.g., '\'
in Unix, etc). Switches with additional arguments require the comma-separated
arguments after the ':'. We denote the arguments as being within '{'...'}'
brackets. Note that you do not include the '{' or '}' brackets in the
actual switches; they just denote that there is some argument.
There may be multiple instances of some of the switch commands
including: -files, -hrefData, -dropColumn, -keepColumn, -reorderColumn,
-sortTableByColumn, -mapDollarsigns, -mapQuestionmarks, -copyFile,
-copyTree, -genCopySupportFile, -genParamTemplate, -genSummaryTemplate,
-genCopyfile, -genTreeCopyData, -dirIndexHtml.
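The switch syntax above can be illustrated with a minimal parsing sketch. It is not the HTMLtools parser: the class and method names are hypothetical, it does not resolve abbreviated switch names to their full forms, and it assumes the shell has already removed any double quotes around arguments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: split '-switchName:a1,a2,...,an' tokens into
// a switch name and its argument list, keeping repeated switches.
class SwitchParser {
    // Returns switch name -> one argument list per occurrence of that switch.
    static Map<String, List<List<String>>> parse(String[] tokens) {
        Map<String, List<List<String>>> switches = new LinkedHashMap<>();
        for (String token : tokens) {
            if (!token.startsWith("-")) {
                continue; // not a switch; see paramFile() below
            }
            int colon = token.indexOf(':');
            String name;
            List<String> args;
            if (colon < 0) {
                name = token.substring(1);
                args = new ArrayList<>(); // switch without arguments
            } else {
                name = token.substring(1, colon);
                args = Arrays.asList(token.substring(colon + 1).split(","));
            }
            switches.computeIfAbsent(name, k -> new ArrayList<>()).add(args);
        }
        return switches;
    }

    // Returns the first token that does not start with '-', which is assumed
    // to be a parameter command file, or null if there is none.
    static String paramFile(String[] tokens) {
        for (String token : tokens) {
            if (!token.startsWith("-")) {
                return token;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String[] demo = {"-files:a.txt,b.txt", "-addRow", "-files:c.txt"};
        System.out.println(parse(demo));
    }
}
```

Keeping a list of argument lists per switch name reflects the fact that switches such as -files may appear more than once.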
{parameter command file}
[this argument does not start with '-' and is thus
assumed to be a parameter command file. It will then
get all of the command switches from this file if
present. Examples of command file contents are in
the EXAMPLES section below. By convention, we name
these command text files 'paramXXX.map' with a '.map'
file extension and keep them in the same directory
that we specify with -inputDirectory. We refer to these
files throughout this document as "params .map" files.
The .map file extension is used for tab-delimited text
files that we do not want to convert. We only convert
tab-delimited text files with .txt file extensions.]
-addE:{opt. epilogue file name}
['-addEpilogue:{opt epilogue filename}' add an
epilogue HTML file in inputDir or user directory
(common epilogue for all conversions). If the
keywords $$DATE$$ or $$INPUTFILENAME$$ is in the
file, it will substitute today's date or file name
respectively. $$FILE_ZIP_EXTENSION$$ will substitute
the file name with a ".zip" extension. Default name is
'epilogue.html'. Default is to not add an epilogue to
the HTML output.]
-addO:{postfix name}
['-addOutfilePostfix:{postfix name}' add a postfix
name to the output file before the .html. E.g., for
an output file 'abc.html', with a postfix name of
'-xyz', the new name is 'abc-xyz.html'. This can
be useful if you are mapping the same input file
by several different param.map files and saving
them all in the same html/ output directory.]
-addP:{opt. prolog file name}
['-addProlog:{opt. prolog file name}' to add a prolog
HTML file in inputDir or user directory (common prolog
for all conversions). If the keywords $$DATE$$ or
$$INPUTFILENAME$$ is in the file, it will
substitute today's date or file name respectively.
$$FILE_ZIP_EXTENSION$$ will substitute the file name
with a ".zip" extension. Default name is 'prolog.html'.
The default is to not add a prolog to the HTML output.]
-addRow
['-addRowNumbers' to preface each row with sequential
row numbers. Default is to not add row numbers.]
-addT
['-addTableName' to add TABLE name to HTML. Default
is to not add the name.]
-allowH
['-allowHdrDups' to allow duplicate column fields
in the header. Default is to not allow duplicates.]
-alt:{color name}
['-alternateRowBackgroundColor:{c}' alternate the
background row cell colors in the <TABLE>.
Default is no color changes.]
-batchP:{file of param specs, opt. new working dir}
['-batchProcess:{file of param specs, opt. new working dir}'
batch process a list of param.map type files specified
in a file. If the {opt. new working dir} value is
specified, it will change the current working directory
of HTMLtools when running -batchProcess so
that you can specify that it run in a particular environment.
No other switches should be used with this as they will
be ignored. If errors occur in any of the batch jobs,
the errors are logged in the HTMLtools.log file
and it aborts that particular job and continues on
to do the next job in the batch list. Default
is no batch processing.]
-concat:{concatenatedDataFile,opt."noHTML"}
['-concatTables:{concatenatedDataFile,opt."noHTML"}' to
create a new tab-delimited {concatenatedDataFile} (e.g.,
".txt" or ".map" file) and a .html output file using the
base address (without the ".txt" or ".map" file extensions)
of the {concatenatedDataFile} and if the "noHTML" option
is not specified. The data is from the set of concatenated
input text files data if-and-only-if they have exactly the
same column header names. The -outputDir specifies where
the files are saved. The input files are not converted
to HTML files. Default is to not concatenate the
input files. The -makeMapFile switch can be used
along with the concat switch to make a map file with fewer
columns.]
-copyFile:{srcFile,destFile}
['-copyFile:{srcFile,destFile}' to copy an input source
file {srcFile} to a destination {destFile}.
There can be multiple instances of this option. Default is
to not copy files.]
-copyTree:{sourceTreeDir,destDir}
['-copyTree:{srcTreeFiles,destPath}' to copy an input
source tree subdirectory to a destination subdirectory.
There can be multiple instances of this option. Default is
to not copy tree data.]
-dataP:{nbr digits precision}
['-dataPrecisionHTMLtable:{nbr digits precision}' sets the
precision to use in numeric data for a generated HTML file.
The table must be a numeric data table (such as generated
using the '-flipTableByIndexMap' option). If the value is < 0,
then use the full precision of the data (as supplied in the input
string data). If {nbr digits precision} >= 0, then clip digits
as required.]
-dirIndexHtml:{dir,'O'verride or 'N'ooverride}
['-dirIndexHtml:{dir,'O'verride or 'N'ooverride}' to create
"index.html" files listing all of the files in the specified directories.
The list of directories is specified with multiple copies
of this switch. It is useful when copying a set of directories
to a Web server that does not show the contents of a directory
if there is no index.html file. In addition, if the corresponding
flag is 'Override', then override the "index.html" file if it
already exists in that directory; otherwise don't generate the
"index.html" file. Do this recursively on each directory.
Default is no index.html file generation. Multiple copies
of the switch are allowed.]
-dropColumn:{column header name}
['-dropColumn:{column header name}' to specify a
column to drop from the output TABLE. There can be
multiple instances of this switch.]
-exportB:{opt. big size threshold}
['-exportBigCellsToHTMLfile:{opt. size for big}'
to save the contents of big cells as separate
HTML files with a prefix
'big-R<r>C<c>-<outputFileName>'.
So for a (r,c) of (4,5) and a file name 'xyz.html',
the generated name would be 'big-R4C5-xyz.html'.
The big size threshold defaults to 200. Default is no
exporting of big cells.]
-extractR:{colName,rowNbr,resourceTblFile,htmlStyle}
['-extractRow:{colName,rowNbr,resourceTblFile,
htmlStyle}' to get and lookup a keyword in the
table being processed at (colName,rowNbr) and then
to search a resourceTblFile for that keyword. If
it is found, then it will extract the header row and
the data row from the resource file and create
HTML of htmlStyle to insert into the epilogue.
If $$EXTRACT_ROW$$ is in the epilogue, then
replace it with the generated HTML else insert
the HTML at the front of the epilogue. The
htmlStyles may be DL, OL, UL and TABLE. Default
is no row extraction.]
-fastE:{outTblFile}
['-fastEditFile:{opt. output file}' to process the
input file data line by line without
buffering the data in a Table structure, remapping each
line on the fly using -mapHdrNames,
{-dropColumns or -keepColumns} followed by
-reorderColumns. Data is written immediately to an
output stream so it can handle huge files. Because
it is sequential, it can't do a -sortRowsByColumnData.
This would generally be used to generate tab-delimited
.txt files that can be random accessed. HTML table
generation is disabled. It is used instead of
'-saveEditedTable2File:{outTblFile,opt. "noHTML"}'
and overrides the -saveEditedTable2File options.
Default is not to do a fast edit.]
-files:{f1,f2,...,fn}
['-files:{f1,f2,...,fn}' to specify the list of files
here rather than using all of the files in the
inputDir. You can have multiple instances of this
switch.]
-flipC:{flipColumnFile,flipColumnName} or -flipC:{*LIST*,flipColumnName,v1,v2,...vn}
['-flipColumnName:{flipColumnFile,flipColumnName}'
to specify the source Table column name to use in
filtering which row data to use in the
'-flipTableByIndexMap' operation. An alternative specification
is '-flipColumnName:{*LIST*,flipColumnName,v1,v2,...vn}'
where the values are listed explicitly. Multiple instances
of this '-flipColumnName' switch are used to specify
the header entries by '{flipColumnName}' of the new
flipped table. If the {flipColumnFile}' files exist,
they are used to filter the {flipDataFile} row entries.
Only the rows of the original Table that match one
of the {column-data-list} entries will be transposed.
Default is to transpose all rows unless the filter
files are specified.]
-flipE:{flipExcludeColumnName}
['-flipExcludeColumnName:{flipExcludeColumnName}' to specify the
column names from the source Table to exclude from the final flipped
Table using the '-flipTableByIndexMap' operation. Multiple instances
of this switch are allowed. Default is to include all data Table
columns unless the filter is specified.]
-flipO:{colHdrName1,colHdrName2,...,colHdrNameN}
['-flipOrderHdrColNames:{colHdrName1,colHdrName2,...,colHdrNameN}'
to specify the list of columns in the source Table that will be
used to create the flipped Table multi-line header entries.
This option must be specified when using the '-flipTableByIndexMap'
operation.]
-flipRowF:{flipRowFilterNamesfile} or
-flipRowF:{*LIST*,name1,name2,...,nameK}
['-flipRowFilterNamesFile:{flipRowNamesFile}' or the alternate
'-flipRowFilterNamesFile:{*LIST*,name1,name2,...,nameK}' switch specifies
the source Table column names to use in filtering which source sample
columns data will be used as rows in the final flipped Table using the
'-flipTableByIndexMap' operation. If the "*LIST*" name is used instead of the file
name, then the rest of the switch specifies the row names explicitly. Only the
columns of the original Table that partially match one of the
{flipRowNamesFile} entries will be transposed. Default is to transpose
all data Table columns unless the filter is specified.]
-flipRowGSP:{list of filter substrings}
['-flipRowGSPIDfilters:{list of filter substrings}' is an optional
list of substring filters used to filter Experiment Group sample name
rows in the flipped table computation when using the
'-flipTableByIndexMap:{flipDataFile,flipIndexMapFile}' switch. It matches
case-independent substrings in the GSP ID names for the samples where
if more than one substring is specified, then they must all be found
for that sample to be used (e.g., ".Stat .GH" requires a ".Stat" and
a ".GH" to be present). Default is no filtering.]
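The "all substrings must be found" rule described for '-flipRowGSPIDfilters' can be illustrated with a minimal Java sketch. The class and method names are hypothetical and are not taken from the HTMLtools source. The sketch also assumes the filter list is separated by white space, as the ".Stat .GH" example suggests.

```java
import java.util.Locale;

final class GspIdFilterSketch {
    // Returns true when every white-space separated substring in filterSpec
    // occurs in sampleName, ignoring case. An empty spec means no filtering.
    static boolean matchesAll(String sampleName, String filterSpec) {
        String name = sampleName.toLowerCase(Locale.ROOT);
        for (String f : filterSpec.trim().split("\\s+")) {
            if (!f.isEmpty() && !name.contains(f.toLowerCase(Locale.ROOT))) {
                return false; // one required substring is missing
            }
        }
        return true;
    }
}
```

With this sketch, a made-up sample name such as "Liver.STAT5.gh.wt" passes the ".Stat .GH" filter, while "Liver.Stat5.wt" does not because ".GH" is absent.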
-flipS:{flipSaveOutputFile}
['-flipSaveOutputFile:{flipSaveOutputFile}' is the alternate
output (HTML and TXT) file name to use when generating the
flipped Table using the
'-flipTableByIndexMap:{flipDataFile,flipIndexMapFile,(opt)maxRows}'
switch. Default is to generate the output file name from the
input file base name, adding a postfix using the
'-addOutfilePostfix:{postfix name}' or "-flipped" default
postfix. If the switch is not specified, it will use the base
input file name. (See Example 14
for an example of its usage.)]
-flipT:{flipDataFile,flipIndexMapFile,(opt)maxRows}
['-flipTableByIndexMap:{flipDataFile,flipIndexMapFile,(opt)maxRows}'
to generate a transposed file using random access file indexing to
create a multi-line header (1 line for each column name in the
list) using the list of columns previously specified with the
-flipColTableList and -flipRowTableList filters. It uses the index-map
created with '-makeIndexMapFile:{colName1,colName2,...,colNameN}'
command. It analyzes the index map Table and then uses
all columns before the ("StartByte", "EndByte") columns
to define the flipped Table header. See the '-flipColTableList' and
-flipRowTableList to restrict which flipped column data to use.
See the '-flipRowTableList' to restrict which flipped row data to
use. Default is to not flip the Table.]
-flipUseExactColumnNameMatch:{TRUE | FALSE}
['-flipUseExactColumnNameMatch:{TRUE | FALSE}' to specify the exact
match filter flag. If exact matching is set, then match
'-flipColumnName:{names}' exactly; otherwise look for substring
matches. Case is ignored in both instances. This option may be
specified when using the '-flipTableByIndexMap' operation. The default
if no flipUseExactColumnNameMatch is specified is "AND".]
-font:{-1,-2,-3,-4,+1,+2,+3,+4}
['-fontSizeHtml:{font Size modifier}' to change
the <TABLE> FONT SIZE in the HTML file.]
-gui
['-gui' to invoke the graphical user interface version
of the converter. See
Using the Graphical User Interface (GUI) to run the converter.]
-hdrL:{n}
['-hdrLines:n' to set the number of lines to include in the
header. The last header line is the one searched for mapping
column URLs. Default is 1 line.]
-hdrM:{oldHdrColName,newHdrColName}
['-hdrMapName:{oldHdrColName,newHdrColName}' to map
an old header column name {oldHdrColName} to a new
name {newHdrColName}. There may be multiple instances
of this switch. Default is to not do any mappings.]
-joinT:{joinTableFile}
['-joinTableFile:{joinTableFile}' adds the contents of
the {joinTableFile} file to the table being processed.
This allows us to add fields that can be used for
sorting the new table by the {joinTableFile} data
if it is defined. This switch can not be used with
the -fastEditFile option. Default is not to join
any tables.]
-keepColumn:{colName}
['-keepColumn:{colName}' specifies which columns
to keep in multiple instances of the switch.
Then, when the Table is processed, it drops all
columns not listed. It may be used as an
alternative to -dropColumn as the Table may have
unknown column names. Default is not active.]
-help (or '?')
[print instructions to see the README.txt file.]
-hrefD:{colName,Url,mapToken}
['-hrefData:{colHdrName,Url,(optional)mapToken}' to
get the mapping of column header name and the Url to use
as a base link to use for making a URL for Table data
in that column. It makes the URL by appending the data
in cells in that column to the Url. ([TODO] If
the optional mapToken is specified, then replace the cell
contents for the occurrence of the mapToken in Url.)
There can be multiple instances of this switch. See
the following switch '-hrefHeaderRow' to change the mapping
from Table data to header rows. ]
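As a rough illustration of how '-hrefData' forms a link by appending the cell data to the base Url, here is a minimal Java sketch. The class and method names are hypothetical, the exact HTML that HTMLtools emits is not shown in this manual, and URL encoding of the cell value is left out.

```java
final class HrefDataSketch {
    // Builds an anchor whose target is the base URL followed by the cell
    // value. Blank cells are returned unchanged (an assumption of this sketch).
    static String linkCell(String baseUrl, String cell) {
        if (cell == null) {
            return "";
        }
        String value = cell.trim();
        if (value.isEmpty()) {
            return cell;
        }
        return "<A HREF=\"" + baseUrl + value + "\">" + value + "</A>";
    }
}
```

For example, with the "Feature ID" base Url used in the MRR example and a made-up probe set name, linkCell("https://www.affymetrix.com/LinkServlet?probeset=", "1415670_at") yields an anchor pointing at that Url with the probe set name appended.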
-hrefHeaderRow
['-hrefHeaderRowMapping' is used with the above switch
'-hrefData:{colHdrName,Url,(optional)mapToken}' to map
the data in the header row(s) instead of the data in
the Table data columns. It searches the first column of
the header rows to find the colHdrName to determine
the row to be mapped to that colHdrName. Unlike the
-hrefData option, the colHdrName can be embedded within
a string. The default is not to map the header rows.]
-inputD:{input directory}
['-inputDirectory:{input dir}' where the input
tab-delimited table .txt files to be converted are
found. By convention, we name other text files that
we may need, and want to keep in the inputDirectory
but do not want to convert to HTML, with a '.map'
file extension. Examples of non-data files include
'paramXXX.map', 'prolog.html', 'epilogue.html', etc.,
Default directory is 'data/'.]
-limitM:{maxNbrRows,(opt.)sortFirstByColName,(opt.)'A'scending or 'D'escending}
['-limitMaxTableRows:{maxNbrRows,(opt.)sortFirstByColName,
(opt.)'A'scending or 'D'escending}' to limit the number of
rows of a table to {maxNbrRows}. If the {sortFirstByColName}
is specified, then sort the table first before limiting the
number of rows. Default is not to limit rows.]
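The sort-then-limit behavior of '-limitMaxTableRows' can be sketched in Java as follows. This is only an illustration with hypothetical names. It assumes the sort column holds numeric data and that the rows exclude the header lines.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class LimitRowsSketch {
    // rows: data rows only (no header). sortCol: 0-based index of a numeric
    // column, or -1 to skip sorting. Returns at most maxRows rows.
    static List<String[]> limit(List<String[]> rows, int maxRows,
                                int sortCol, boolean ascending) {
        List<String[]> sorted = new ArrayList<>(rows);
        if (sortCol >= 0) {
            Comparator<String[]> cmp =
                Comparator.comparingDouble(r -> Double.parseDouble(r[sortCol]));
            sorted.sort(ascending ? cmp : cmp.reversed());
        }
        // Truncate only after the sort, so the kept rows are the top ones.
        return new ArrayList<>(sorted.subList(0, Math.min(maxRows, sorted.size())));
    }
}
```

Sorting before truncation is what lets a setting such as "500,A-B Mean Difference,Descending" keep the 500 largest values rather than the first 500 rows of the file.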
-log:{new log file name}
['-logName:{new log file name}' to log all
information about the processing to the console and
then to save this output in a log file. The new file
must end in ".log". Default is to use the
"HTMLtools.log" file name.]
-makeI:{colName1,colName2,...,colNameN}
['-makeIndexMapFile:{colName1,colName2,...,colNameN}' to
make an index map Table file (same name as the input file
but with an .idx file extension) of the input file (or the
file output from -saveEditedTable2File after the input
table has been edited). The index file will contain the
specified columns in the column-list followed by the
StartByte, EndByte for data in the input table with those
column values. This file can then be used to quickly
index a huge input file probably using a Hash table of
the selected column names instances to lookup the
(start,end) file byte pointers to random access the
large file. The software to use the index file is not
part of HTMLtools at this time.
The default is not to make an index map file.]
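The idea behind the index map (a hash table from column values to StartByte and EndByte offsets, used for random access into a large file) can be sketched in Java as below. This is not the HTMLtools implementation. The class and method names are hypothetical, the key is a single column, the first line is assumed to be the only header line, and EndByte is taken to be exclusive and to include the line terminator.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

final class IndexMapSketch {
    // Maps the value in column keyCol of each data row to {startByte, endByte}.
    static Map<String, long[]> buildIndex(Path file, int keyCol) throws IOException {
        Map<String, long[]> index = new HashMap<>();
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.readLine();                        // skip the single header line
            long start = raf.getFilePointer();
            String line;
            while ((line = raf.readLine()) != null) {
                long end = raf.getFilePointer();   // first byte after this row
                String[] cols = line.split("\t", -1);
                if (!line.isEmpty() && keyCol < cols.length) {
                    index.put(cols[keyCol], new long[] {start, end});
                }
                start = end;
            }
        }
        return index;
    }

    // Reads one row back by seeking directly to its byte span.
    static String fetch(Path file, long[] span) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            byte[] buf = new byte[(int) (span[1] - span[0])];
            raf.seek(span[0]);
            raf.readFully(buf);
            return new String(buf, StandardCharsets.UTF_8).replaceAll("[\\r\\n]+$", "");
        }
    }

    // Small self-contained demonstration on a temporary file.
    static String demo() {
        try {
            Path tmp = Files.createTempFile("indexmap", ".txt");
            Files.write(tmp, "Gene\tValue\nAbc\t1.0\nXyz\t2.5\n".getBytes(StandardCharsets.UTF_8));
            Map<String, long[]> index = buildIndex(tmp, 0);
            String row = fetch(tmp, index.get("Xyz"));
            Files.delete(tmp);
            return row;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The point of the design is that once the index is built, a lookup costs one hash probe and one seek, regardless of how large the data file is. RandomAccessFile.readLine is unbuffered, so a production index builder would likely use a buffered byte scanner instead.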
-makeM:{makeMapTblFileName,orderedCommaColumnList}
['-makeMapFile:{makeMapTblFileName,orderedCommaColList}'
used with -concatTable command to also make a map
file at the same time. This switch is only used
with -concatTable. Default is no map is made.]
-makeP
['-makePrefaceHTML' to make a separate preface
HTML file from the input text preceding the table
data. The file has the same name, but has a
"preface-" added to the front of the file name. The
first generated HTML file is then linked from the
second generated file. Default is no preface file.]
-makeS
['-makeStatisticsIndexMapFile' to make a 'Statistics Index Map'
table file with the same base file name as the index map (.idx)
but with a .sidx file extension. It is invoked after the
IndexMap file is created (using the '-makeIndexMapFile' switch).
Therefore, it must be specified in a subsequent command line
(if using batch). Default is not to make a Statistics Index Map.]
-mapD:{$$keyword$$,toString}
['-mapDollarsigns:{$$keyword$$,toString}' to
map cell data of the form '$${keyword}$$' to
{toString}. The preface, epilogue as well as the
table cell data is checked to see if any keywords
should be mapped. There may be multiple instances
of this switch. Default is to not do any mappings.]
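The '$$keyword$$' substitution can be pictured with the following minimal Java sketch, which applies a set of keyword mappings to a piece of text. The names are hypothetical. It uses String.replace, which treats '$' literally, unlike the regular-expression based replaceAll.

```java
import java.util.Map;

final class DollarKeywordSketch {
    // Replaces each $$keyword$$ occurrence with its mapped string.
    // Keywords without a mapping are left in the text unchanged.
    static String apply(String text, Map<String, String> mappings) {
        String out = text;
        for (Map.Entry<String, String> e : mappings.entrySet()) {
            out = out.replace("$$" + e.getKey() + "$$", e.getValue());
        }
        return out;
    }
}
```

The same routine would be run over the preface, the epilogue, and each table cell, since the manual states that all three are checked for keywords.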
-mapH:{mapHdrNamesFile,fromHdrName,toHdrName}
['-mapHdrNames:{mapHdrNamesFile,fromHdrName,toHdrName}'
to map header names. E.g., map long to short header
names, or map obscure to well-defined header names.
The map file (specified with a relative path) is a
tab-delimited file and must contain both the {fromHdrName}
and {toHdrName} entries. Default is no mapping.]
-mapO
['-mapOptionsList' to map ;; delimited strings
to inactive <OPTION> pull-down option lists.
Default is no mapping to option lists.]
-mapQ:{??keyword??,toString}
['-mapQuestionmarks:{??keyword??,toString}' to
map cell data of the form '??{keyword}??' to
{toString}. If the toString is BOLD_RED,
BOLD_GREEN, or BOLD_BLUE, then just map the
whole ??{keyword}?? string to bold and red (green,
or blue). The preface, epilogue as well as the
table cell data is checked to see if any keywords
should be mapped. There may be multiple instances
of this switch. Default is to not do any mappings.]
-noB
['-noBorder' to set no border for tables. The
default is there is a 'BORDER=1' in the TABLE.]
-noHeader
['-noHeader' set no header for tables. The
default is there is a header in input file.]
-noHTML
['-noHTML' set to not generate HTML if it would
normally do so. This switch disallows generation of
HTML when processing an input file if that
operation also allows HTML generation. This is useful
when editing large input files to generate
index maps or saved files. The default is to allow
the generation of HTML.]
-outputD:{output directory}
['-outputDirectory:{output directory}' to set the
output directory. The default directory is 'html/'.]
-reorderC:{colName,newColNbr}
['-reorderColumn:{colName,newColNbr}' to reorder
this column to the new column number. You may
specify multiple new columns (they must be
different). Those columns not specified are moved
toward the right. This is done after the list of
dropped columns has been processed. There can be
multiple instances of this switch. Default is not
to reorder columns.]
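One way to picture the '-reorderColumn' rule is the Java sketch below: requested columns are pinned to their new 1-based positions, and the remaining columns fill the free slots from left to right in their original order. This is an interpretation of the description above, with hypothetical names, and not the HTMLtools code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ReorderColumnsSketch {
    // header: column names after any drops. newPositions: column name to
    // 1-based target position. Invalid or conflicting requests are ignored.
    static List<String> reorder(List<String> header, Map<String, Integer> newPositions) {
        String[] out = new String[header.size()];
        Set<String> pinned = new HashSet<>();
        for (Map.Entry<String, Integer> e : newPositions.entrySet()) {
            int pos = e.getValue() - 1;
            if (header.contains(e.getKey()) && pos >= 0 && pos < out.length
                    && out[pos] == null && !pinned.contains(e.getKey())) {
                out[pos] = e.getKey();
                pinned.add(e.getKey());
            }
        }
        int slot = 0;
        for (String col : header) {
            if (pinned.contains(col)) {
                continue;              // already placed at its requested position
            }
            while (out[slot] != null) {
                slot++;                // skip slots taken by pinned columns
            }
            out[slot] = col;
        }
        return new ArrayList<>(Arrays.asList(out));
    }
}
```

For a header of (Description, Gene, Well ID, p-Value) with Gene moved to 1 and p-Value moved to 2, the sketch returns (Gene, p-Value, Description, Well ID).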
-reorderR
['-reorderRemainingColumnsAlphabeticly' used if doing
a set of -reorderColumn operations, sort the remaining
columns not specified, but that are used, alphabetically.
Default is not to sort the remaining columns.]
-rmvT
['-rmvTrailingBlankRowsAndColumns' in the table.
Default is not to remove trailing blank lines or
trailing blank columns.]
-saveE:{outTblFile,opt. "HTML"}
['-saveEditedTable2File:{outTblFile,opt. "HTML"}'
to make a Table file from the modified input
table stream. It is created after the Table is
edited by -dropColumn, -keepColumn,
-reorderColumn, -sortRowsByColumn. If the outTblFile
is not specified (i.e., ":,") then the input file name
with the postfix name from the
'-addOutfilePostfix:{postfix name}' switch is
used. If the "HTML" option is set, it also outputs the
HTML when doing this operation. Note that the switch
should not be used with '-fastEditFile:{opt. output file}'
which can be used for converting very large files
without generating the HTML file. Default is not to save
the Table.]
-searchGui
['-searchGui' to invoke the graphical user interface for the
database search engine to generate a flip table. See
Search Database GUI generating specialized reports.
Also see Example-17
for examples of the default parameter file used as the
basis of the flip table generated. Default is no search GUI.]
-shrinkB:{opt. size for big,opt. font size decrement}
['-shrinkBigCells:{opt. size for big,opt. font size
decrement}' in the Table with more than the big
threshold number of characters/cell by decreasing
the font size to -5 (or the opt. font size
decrement) for those cells. The big size threshold
defaults to 25 characters. Setting the threshold to
1 forces all cells to shrink. Default is not to
shrink cells.]
-showDataHeatmapFlipTable
['-showDataHeatmapFlipTable' used to generate colored heat-map
data cells in a HTML conversion for a flip table using the
'-flipTableByIndexMap' option. It uses the global statistics
on the (digital) data in the Statistics Index Map .sidx file
if it exists to normalize the data and generate a cell color
background range in 7 quantiles of colors: dark green,
medium green, light green, white, light red, medium red, dark red.
Default is not to generate the colored heatmap.]
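The mapping from a data value to one of the seven heat-map colors might look like the Java sketch below. The manual describes quantiles based on the global statistics in the .sidx file. This sketch instead uses seven equal-width bins between a global minimum and maximum, which is a simplifying assumption, and all names are hypothetical.

```java
final class HeatmapColorSketch {
    static final String[] COLORS = {
        "dark green", "medium green", "light green", "white",
        "light red", "medium red", "dark red"
    };

    // Normalizes value into [0,1] using the global min and max, then picks
    // one of seven equal-width bins. Out-of-range values are clamped.
    static String colorFor(double value, double min, double max) {
        if (max <= min) {
            return COLORS[3];          // no usable range: neutral color
        }
        double t = (value - min) / (max - min);
        t = Math.max(0.0, Math.min(1.0, t));
        int bin = Math.min(COLORS.length - 1, (int) (t * COLORS.length));
        return COLORS[bin];
    }
}
```

A true quantile scheme would replace the linear normalization with bin boundaries computed from the sorted data, so that each color covers roughly the same number of cells.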
-sortFlip:{col data name}
['-sortFlipTableByColumnName:{col data name}' specifies the name
of the field in the flip table to use in sorting by column data in
descending order in the generated table. It is used with the
'-flipTableByIndexMap' option. Note this name can be any of the
flipped header column values (multiheader data names). When doing
the sort it matches the specified name with any of the header
rows to find the column to use for the sort. Default is not to
sort the generated flip table.]
-sortR:{colName,'A'scending or 'D'escending}
['-sortRowsByColumn:{colName,'A'scending or 'D'escending}' to
sort the Table rows by the specified column. You can specify
'A'scending or 'D'escending. This is done after
any columns have been dropped or reordered. If the column is
not found, don't sort - just continue. You can have multiple
instances of the switch. If the first column name is not found,
it looks for the second, etc. and only ignores the sort if no
column names are found. Default is not to sort the table.]
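The fallback rule for multiple '-sortRowsByColumn' instances (use the first column name that exists, and skip sorting if none does) can be sketched in Java as follows. The names are hypothetical and the sketch only selects the column; the sort itself is omitted.

```java
import java.util.List;

final class SortColumnFallbackSketch {
    // Returns the 0-based index of the first candidate column name found in
    // the header, or -1 when none is present (meaning: do not sort).
    static int firstSortColumn(List<String> header, List<String> candidates) {
        for (String name : candidates) {
            int idx = header.indexOf(name);
            if (idx >= 0) {
                return idx;
            }
        }
        return -1;
    }
}
```

This is why a params file can list several sort columns in priority order and still be reused across tables whose column names differ.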
-startT:{keyword}
['-startTableAtKeywordLine:{keyword}' specifies the start
of the last line of a Table header by a keyword that
is part of any of the fields in that line. This is
useful when reading a file with complex preface info
with possibly multiple blank lines. It can be used
with the '-hdrLines' switch to specify multiple
header lines. Default is no keyword search.]
-tableD:{tablesDirectory}
['-tableDir:{tablesDirectory}' to set the various mapping
tables directory. These tables are used during various
conversion procedures. They include both the .txt and
the .map file (same file, but with different extensions).
Examples include: EGMAP.map(.txt), ExperimentGroups.map(.txt),
mAdbArraySummary.map(.txt). The default directory is
'data.Table/'.]
-useOnly
['-useOnlyLastHeaderLine' to reduce the number of header
lines to 1 even if there are more than 1 header line.
Default is to use all of the header lines.]
3.1 GenBatchScripts COMMANDS Extension
These commands are used to create batch scripts for subsequent use by
HTMLtools. This set of commands is called the GenBatchScripts commands.
The GenBatchScripts process is described in
Section 1.1. The -genBatchScripts command is only used to generate these
batch scripts in a set of structured trees suitable for copying directly to a
Web server. It uses a test-ToDo-list.txt Table to specify a list of tests (column
"Test-name"), a column "Relative directory" where the data is to be saved, and some
documentation columns ("Page label", "Page description", and "Tissue name")
that are used to help generate the Summary HTML Web pages and the
params .map files used in the subsequent conversion of the .txt Table
data files to HTML documentation. See Example 15
for an example of a params.map file using the GenBatchScripts commands.
-genBatch:{batchDir,paramScriptsDir,inputTreeDir,summaryDir,analysisDir}
['-genBatchScripts:{batchDir,paramScriptsDir,inputTreeDir,
outputTreeDir,analysisTreeDir,JTVDir}' to generate a set
of scripts to batch convert a set of tab-delimited Table
test data files specified by the -genTestFile:{testToDoFile}
Table in the {batchDir} directory. It generates a set of
parameter .map files in the {paramScriptsDir} directory. It
also generates a set of summary HTML Web pages in {summaryDir}
that describe the data, one page for each type of tissue,
and (pre) generates links to data that will be generated in
the {analysisTreeDir} when the batch script is subsequently
run. These new params .map files can then be run by a converter
batch file called buildWebPages.doit started with a
Windows buildWebPages.bat BAT file to start the batch
job (both files are in the batchDir directory along with a
copy of HTMLtools.jar). The buildWebPages.bat file
could easily be edited to run on MacOS-X or Linux. The paths
created in the {inputTreeDir}, and {analysisTreeDir} base paths
use the "Relative Directory" data in the {testToDoFile} within
those directories. This generated batch .doit script will
process a data set to generate a set of HTML pages and
converted database .txt files defined by the {testToDoFile}
Table database. Default is no batch script generation.
Additional switches required with -genBatchScripts are:
-genTestFile, -genMapEGdetails, and -genMapIntroduction.]
-genC:{support file}
['-genCopySupportFile:{support file}' to specify a list
of support files to copy to the output batchDir (e.g.,
'-outputDir:batchScripts'). The support files are
specified with a list created using multiple instances of
-genCopySupportFile:{support file}. Default is no support
files to copy.]
-genMapEGd:{EGdetailsMapFile}
['-genMapEGdetails:{EGdetailsMapFile}' specifies the
'details' Table used when the -genBatchScripts switch is
invoked. This is required when the -genBatchScripts switch
is used.]
-genMapIntro:{introductionMapFile}
['-genMapIntroduction:{introductionMapFile}' specifies the
'Introduction' Table used when the -genBatchScripts switch
is invoked. This is required when the -genBatchScripts
switch is used.]
-genP:{name,paramTemplateFileName}
['-genParamTemplate:{name,paramTemplateFileName}' to
specify a list of parameter map Templates that are used
for mapping the test-ToDo-list data so that the parameter
files (param-MRR, param-MRR-keep, param-JTV, etc.) are
generated dynamically. The data are then mapped into the
following keywords that may appear in any of these
templates: $$TISSUE$$, $$TEST_NAME$$, $$MRR_FILE$$,
$$DESCRIPTION$$, $$PROLOG$$, $$EPILOG$$, $$DATE$$. Multiple
unique instances are allowed. The default is no parameter
templates.]
-genS:{orderNbr,templateFileName}
['-genSummaryTemplate:{orderNbr,templateFileName}' to define
a list of Summary Templates that are used for mapping the
test-ToDo-list data so that the summary sections
(summaryProlog, summaryExperimental, summaryAnalysis,
summaryFurtherAnalysis, summaryEpilogue, etc.) are generated
dynamically. The '-genSummaryTemplate' instances can be used
to generalize what is currently hardwired. The data are then
mapped into the following keywords that may appear in any of
these templates: $$TISSUE$$, $$LIST_EXPR_GROUPS$$,
$$DESCRIPTION$$, $$ANALYSIS$$, $$FURTHERANALYSIS$$, $$DATE$$.
The $$INTRODUCTION$$ is extracted from the {"CellTypeTissue.map"}.
Default is no templates being defined. Multiple instances
are allowed, where they are concatenated in the order given
by the orderNbr associated with each template.]
-genTest:{testToDoFile}
['-genTestFile:{testToDoFile}' specifies the tests to do
when the -genBatchScripts switch is invoked. This is
required when the -genBatchScripts switch is used.]
-genTree:{sourceTreeDir,destDir}
['-genTreeCopyData:{sourceTreeDir,destDir}' to copy an
input data tree to a batch scripts subdirectory.
There can be multiple instances of this option.
Default is to not copy tree data.]
3.2 Tests-Intersection COMMANDS Extension
The Tests-Intersections subset of commands is only used to create
Tests-Intersection tables from mAdb Retrieval Reports (MRR) containing fold-change
data from the Tests-ToDo database used with GenBatchScripts. The primary command to
invoke this is the '-makeTestsIntersectionTbl' switch. These Tests-Intersection commands
can be used with the regular HTML or table editing commands such as the
'-noHTML' and/or '-saveTable' switches. If HTML is generated, then the
'-addProlog', '-addEpilogue', '-mapQuestion', '-mapDollar', '-sortByColumn',
'-limitMaxTableRows', etc. switches can also be used. See Example 13 for
an example of generating a Tests-Intersection tab-delimited table and HTML
Web page.
-addFCranges
['-addFCrangesForTestsIntersectionTable' may be used when
generating a Tests-Intersection Table using the
'-makeTestsIntersectionTbl:{testsToDoFile}' switch. This switch
does a simple fold-change (FC) row analysis after the
Tests-Intersection Table is created by adding ("Min FC",
"Max FC", "FC Range") data for each row. Because this extends
the table, you can sort by any of these fields.]
-addRange
['-addRangeOfMeansToTItable' to add the ("Range Mean A",
"Range Mean B" and "FC counts %") computations to an expanded
Tests-Intersection Table. Default is to not add these fields.]
-filterData:{dataTableField,d1,d2,...,dn}
['-filterDataTestIntersection:{dataTableField,d1,d2,...,dn}'
that is used with the '-makeTestsIntersectionTbl:{testsToDoFile}'
switch to filter the MRR rows using the specified MRR
{dataTableField}. A row is used if the field matches any of the
{d1,d2,...,dn} substrings. The default is not to filter the
Tests-Intersection Table.]
-filterTest:{testTableField,d1,d2,...,dn}
['-filterTestTestIntersection:{testTableField,d1,d2,...,dn}'
that is used with the '-makeTestsIntersectionTbl:{testsToDoFile}'
switch to filter the Tests-ToDo table rows using the specified
{testTableField}. A row is used if the field matches any of the
{d1,d2,...,dn} substrings. The default is not to filter the
Tests-Intersection Table.]
-makeT:{testsToDoFile,testsInputTreeDir}
['-makeTestsIntersectionTbl:{testsToDoFile}' that generates a
Tests-Intersection Table that contains data from the individual
tests in the tests input data tree. The tests are specified by
the {testsToDoFile} in the -tableDir directory, which gives the
relative data file tree. The tree is found in the -inputDir
directory. The data files in the tree are used as input data.
The computed table is organized by rows of +FC
genes/Feature-IDs and -FC genes/Feature-IDs. The data from the
{testsToDoFile} is used to get additional information for each
test. This switch is used with the '-noHTML' and/or
'-saveTable' switches. If HTML is generated, then the
'-addProlog' and '-addEpilogue', '-mapQuestion', and
'-mapDollar' can be used. You can filter the MRR rows using the
'-filterDataTestIntersection:{dataTableField,d1,d2,...,dn}' and
the {testsToDoFile} test data using the
'-filterTestTestIntersection:{testTableField,d1,d2,...,dn}'.
The default is not to make the Tests-Intersection Table. You
can do a simple FC row analysis by adding ("Min FC", "Max FC",
"FC Range") for each row using the
'-addFCrangesForTestsIntersectionTable' switch.]
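The per-row fold-change summary added by '-addFCrangesForTestsIntersectionTable' can be approximated with the Java sketch below. The manual names the three fields but does not define them, so taking "FC Range" to be Max FC minus Min FC is an assumption, as are the class and method names and the use of NaN for tests with no value in a row.

```java
final class FcRangeSketch {
    // Returns {minFC, maxFC, maxFC - minFC} for one table row.
    // NaN entries stand for missing values and are skipped.
    static double[] fcRange(double[] foldChanges) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double fc : foldChanges) {
            if (Double.isNaN(fc)) {
                continue;
            }
            min = Math.min(min, fc);
            max = Math.max(max, fc);
        }
        if (min > max) {               // no usable values in this row
            return new double[] {Double.NaN, Double.NaN, Double.NaN};
        }
        return new double[] {min, max, max - min};
    }
}
```

For a row with fold changes of -2.0, 1.5, and 3.0 the sketch gives a Min FC of -2.0, a Max FC of 3.0, and an FC Range of 5.0.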
Reorder: "WID:... || xxxxxx_at || MAP:... || gene -- geneDescr. || RID:..." to "gene -- geneDescr. || xxxxxx_at || WID:... || MAP:... || RID:..."
You cannot mix tab-delimited file to HTML conversions with JTV conversions in the params .map files.
-jvtB:{button name for JTV activation button}
['-jvtButtonName:{button name for JTV activation button}' that
may be used with '-jtvHTMLgenerate' to label the button to
activate Java TreeView. The default is
"Press the button to activate JTV".]
-jtvC:{JTV jars directory}
['-jtvCopyJTVjars:{JTV jars directory}' to copy the JTV jar
files and plugins to the jtvOutputDir. The default is no
copying of the .jar files.]
-jvtD:{description text for prologue}
['-jvtDescription:{description text for prologue}' that may be
used with '-jtvHTMLgenerate' to insert additional text into the
prolog where it replaces $$DATA_DESCRIPTION$$. The default is
no description.]
-jtvFiles:{f1,f2,...,fn}
['-jtvFiles:{f1,f2,...,fn}' to specify the list of files to
process rather than using all of the files in the jtvInputDir.
You can have multiple instances of this switch.]
-jtvH
['-jtvHTMLgenerate' to generate an HTML file to invoke the JTV
applet for each JTV specification in the jtvInputDir. It puts
the HTML file in the jtvOutputDir. Some of the non-JTV HTML
modification switches are operable including: '-addEpilogue',
'-addOutfilePostfix', '-addProlog', '-mapQuestionmarks'. The
default is to not generate JTV HTML.]
-jtvI:{input JTV directory}
['-jtvInputDir:{input JTV directory}' to set the input
directory of JTV sub directories. This contains the zipped or
unzipped JTV files downloaded from mAdb. Each zip file contains
3 files with (.atr,.cdt,.gtr) extensions. Default directory is
'JTVinput/'.]
-jtvO:{output JTV directory}
['-jtvOutputDir:{output JTV directory}' to set the output
directory of JTV sub directories. The converted JTV directory
and a corresponding HTML file are saved there. Default
directory is 'JTVoutput/'.]
-jtvN:{mAdbArraySummary,mapHdrNamesFile,fromHdrName,toHdrName}
['-jtvMapping:{mAdbArraySummaryFile,mapHdrNamesFile,
fromHdrName,toHdrName}' to convert a list of sub directories of
JTV file sets by reading the three files from each of the
subdirectories in the jtvInputDir directory. The
{mAdbArraySummaryFile} and {mapHdrNamesFile} are specified with
a relative path. It maps the .cdt file in each sub directory to
use the {toHdrName} column of the equivalent mapNamesFile map
Table instead of the "EID:'mAdb ID'" as generated by mAdb. The
mapping between "mAdb ID" and short array names is done using
the {fromHdrName} column of the jtv_mAdbArraySummaryFile Table
map. It then writes out the JTV subset to a created sub
directory in jtvOutputDir that has the same base name as the
input JTV subdirectory being processed. See the optional
switches: '-jtvInputDir:{jtvInputDir}' and
'-jtvOutputDir:{jtvOutputSubDir}' to set the directories to
other than the defaults ("JTVinput" and "JTVoutput"). The
values for {fromHdrName} and {toHdrName} should be column names
in the mapNamesFile.]
-jtvR
[TODO] ['-jtvReZipConvertedFiles' to reZip the converted files
in the output JTV directory in a file with the same name.
Default is not to zip the converted files.]
-jtvTableDir:{tablesDirectory}
['-jtvTableDir:{tablesDirectory}' to set the various mapping
tables directory. These tables are used during various
conversion procedures. They include both the .txt and the .map
file (same file, but with different extensions). Examples
include: EGMAP.map(.txt), ExperimentGroups.map(.txt),
mAdbArraySummary.map(.txt). Note: this switch is used when
processing JTV files, but may also be set with the
'-tableDir:{tablesDirectory}' switch. The default directory is
'data.Table/'.]
One could experiment with these parameter files by adding or removing various options such as -dropColumn, -reorderColumn, -sortTable, etc.
HTMLtools
HTMLtools -addPrologue:prolog.html -addEpilogue:epilogue.html \
-inputDir:data -outputDir:html -tableDir:data.Table
HTMLtools data/params.map
HTMLtools data.GSPI-EG/params-GSPI-EG.map
where: data.GSPI-EG/params-GSPI-EG.map contains:
#File:params-GSPI-EG.map
#"Revised: 3-30-2009"
#
-addPrologue:data.GSPI-ExpGrp/prolog.html
-addEpilogue:data.GSPI-ExpGrp/epilogue.html
-addRowNumbers
-addTableName:"GSP Experiment Group Samples"
-inputDir:data.GSPI-EG
-outputDir:html/GSP/GSP-Inventory/HTML
-tablesDir:data.Table
#
-extractRow:"Experiment Group ID (1),1,data.Table/ExperimentGroups.map,DL"
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-rmvTrailingBlankRowsAndColumns
#
#"----------- End --------- "
HTMLtools data.GSPI-EG/params-GSPI-EG-concat.map
where: data.GSPI-EG/params-GSPI-EG-concat.map contains:
#File:params-GSPI-EG-concatTXT.map
#"Revised: 3-29-2009"
#
-addPrologue:data.GSPI-EG/prolog.html
-addEpilogue:data.GSPI-EG/epilogue.html
-addRowNumbers
-addTableName:"GSP Inventory Concatenated List of all EG Samples"
-inputDir:data.GSPI-EG
-outputDir:data.Table
-tablesDir:data.Table
#
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-rmvTrailingBlankRowsAndColumns
#
#"Save the concatenated data in the following file."
#-concatTables:EGALLDataSet.txt
-concatTables:EGMAP.txt,noHTML
#
#"----------- End --------- "
HTMLtools data.GSPI-EG/params-GSPI-EG-concatHTML.map
where: data.GSPI-EG/params-GSPI-EG-concatHTML.map contains:
#File:params-GSPI-EG-concatHTML.map
#"Revised: 3-29-2009"
#
-addPrologue:data.GSPI-ExpGrp/prolog.html
-addEpilogue:data.GSPI-ExpGrp/epilogue.html
-addRowNumbers
-addTableName:"GSP Inventory Concatenated List of all EG Samples"
-inputDir:data.GSPI-EG
-outputDir:html/GSP/GSP-Inventory/HTML
-tablesDir:data.Table
#
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-rmvTrailingBlankRowsAndColumns
#
#"Save the concatenated data in the following file."
#-concatTables:EGALLDataSet.txt
-concatTables:EGMAP.txt,noTXT
#
#"----------- End --------- "
HTMLtools data.Maps/params-Maps-EGMAP-map.map
where: data.Maps/params-Maps-EGMAP-map.map contains:
#File:params-Maps-EGMAP-map.map
#"Revised: 3-30-2009"
#"Generate the EGMAP.map file, but no HTML file."
#
-addRowNumbers
-addTableName:"Concatenation of all GSP Experiment Groups tables."
-inputDir:data.Table
-outputDir:data.Table
-tablesDir:data.Table
-files:"EGMAP.txt"
#
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-rmvTrailingBlankRowsAndColumns
#
-concatTables:EGMAP.map,noHTML
#
#"---------- end ---------"
HTMLtools data.GSPI-ExpGrp/params-GSPI-ExpGrp.map
where: data.GSPI-ExpGrp/params-GSPI-ExpGrp.map contains:
#File:params-GSPI-ExpGrp.map
#"Revised: 3-30-2009"
#
-addPrologue:data.GSPI-ExpGrp/prolog.html
-addEpilogue:data.GSPI-ExpGrp/epilogue.html
-addRowNumbers
-addSubTitleFromInputFile
-addTableName:"GSP Experiment Groups Details"
-files:"ExperimentGroups.txt"
-inputDir:data.Table
-outputDir:html/GSP/GSP-Inventory/HTML
-tablesDir:data.Table
#
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-rmvTrailingBlankRowsAndColumns
#
-mapOptionsLists
-mapQuestionmarks:WHO,BOLD_RED
-mapQuestionmarks:WHAT,BOLD_RED
-mapQuestionmarks:WHEN,BOLD_RED
#
#"----------- End --------- "
HTMLtools data.GSPI-ExpGrp/params-GSPI-ExpGrp-exportBigCells.map
where: data.GSPI-ExpGrp/params-GSPI-ExpGrp-exportBigCells.map contains:
#File:params-GSPI-ExpGrp-exportBigCells.map
#"Revised: 3-30-2009"
#
-addOutfilePostfix:"-BRC"
-addPrologue:data.GSPI-ExpGrp/prolog.html
-addEpilogue:data.GSPI-ExpGrp/epilogue.html
-addRowNumbers
-addSubTitleFromInputFile
-addTableName:"GSP Experiment Groups Details"
-inputDir:data.Table
-outputDir:html/GSP/GSP-Inventory/HTML
-tablesDir:data.Table
-files:"ExperimentGroups.txt"
#
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-mapOptionsLists
-mapQuestionmarks:WHO,BOLD_RED
-mapQuestionmarks:WHAT,BOLD_RED
-mapQuestionmarks:WHEN,BOLD_RED
-rmvTrailingBlankRowsAndColumns
#
-exportBigCellsToHTMLfile:200
#
#"----------- End --------- "
HTMLtools data.MRR/params-MRR.map
where: data.MRR/params-MRR.map contains:
#File:params-MRR.map
#"Revised: 5-28-2009"
#
-addPrologue:data.MRR/prolog.html
-addEpilogue:data.MRR/epilogue.html
-addRowNumbers
-addTableName:"mAdb Microarray Retrieval Report"
-inputDir:data.MRR
-outputDir:html/data.MRR
-tablesDir:data.Table
#
#"Limit the number of rows to the highest 500 fold-change values"
-limitMaxTableRows:"500,A-B Mean Difference,Descending"
#
-allowHdrDups
-alternateRowBackgroundColor:white
-rmvTrailingBlankRowsAndColumns
#-shrinkBigCells:25,-5
-shrinkBigCells:1,-5
-hdrLines:2
-hasEmptyLineBeforeTable
-makePrefaceHTML
-mapOptionsLists
#
#"Map header names. Select from field='Affy .CEL file (16)'"
#" to field= 'GSP ID (9)' or 'Simple GSP ID (10)'"
-mapHdrNames:"data.Table/EGMAP.map,Affy .CEL file (16),GSP ID (9)"
#
#"Drop some of the columns"
-dropColumnColumn:"mgB36 Chr:Start-Stop"
-dropColumn:"mgB36 Cytoband"
-dropColumn:Annotation_Src
-dropColumn:UniGene
-dropColumn:RefSeq
-dropColumn:Refseqs_Hit
-dropColumn:geneIDS_Hit
-dropColumn:"Entrez GeneID"
-dropColumn:"Locus Tag"
-dropColumn:"BioCarta Pathways"
-dropColumn:"KEGG Pathways"
#-dropColumn:"Gene Ontology Terms" (remove # if want to drop)
-dropColumn:"GO Tier2 Component"
-dropColumn:"GO Tier3 Component"
-dropColumn:"GO Tier2 Function"
-dropColumn:"GO Tier3 Function"
-dropColumn:"GO Tier2 Process"
-dropColumn:"GO Tier3 Process"
#"The following was added 5/28/09"
-dropColumn:"Map"
-dropColumn:"mgB37_Probe Chr:Start-Stop"
-dropColumn:"mgB37_Probe Cytoband"
-dropColumn:"mgB37_RefSeq Chr:Start-Stop"
-dropColumn:"mgB37_RefSeq Cytoband"
#
#"Reorder columns to left side of Table"
-reorderColumn:Gene,1
-reorderColumn:"A-B Mean Difference",2
-reorderColumn:Difference,3
-reorderColumn:"A Mean",4
-reorderColumn:"B Mean",5
-reorderColumn:"A-B p-Value",6
-reorderColumn:"p-Value",7
-reorderColumn:"Well ID",8
-reorderColumn:"Feature ID",9
-reorderColumn:"Description",10
#
#"Sort rows by column - use whichever comes first"
-sortRowsByColumn:"A-B Mean Difference",Descending
-sortRowsByColumn:Difference,Descending
-sortRowsByColumn:p-Value,Ascending
-sortRowsByColumn:Gene,Ascending
#
#"These map mAdb Feature Report data to Bioinformatics databases"
-hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A
-hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene=
-hrefData:"Entrez GeneID",http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=retrieve&dopt=graphics&list_uids=
-hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset=
#
#"----------- End --------- "
HTMLtools data.MRR/params-MRR-Short_GSP_ID.map where: data.MRR/params-MRR-Short_GSP_ID.map contains:
#File:params-MRR-Short_GSP_ID.map #"Revised: 5-28-2009" # -addOutfilePostfix:"-Short_GSP_ID" -addPrologue:data.MRR/prolog.html -addEpilogue:data.MRR/epilogue.html -addRowNumbers -addTableName:"mAdb Microarray Retrieval Report" -inputDir:data.MRR -outputDir:html/data.MRR -outputDir:html -tablesDir:data.Table # #"Limit the number of rows to the highest 500 fold-change values" -limitMaxTableRows:"500,A-B Mean Difference,Descending" # -allowHdrDups -alternateRowBackgroundColor:white -rmvTrailingBlankRowsAndColumns #-shrinkBigCells:25,-5 -shrinkBigCells:1,-5 -hdrLines:2 -hasEmptyLineBeforeTable -makePrefaceHTML -mapOptionsLists # #"Map header names. Select from field='Affy .CEL file (16)'" #" to field= 'GSP ID (9)' or 'Simple GSP ID (10)'" -mapHdrNames:"data.Table/EGMAP.map,Affy .CEL file (16),Simple GSP ID (10)" # #"Drop some of the columns" -dropColumn:"mgB36 Chr:Start-Stop" -dropColumn:"mgB36 Cytoband" -dropColumn:Annotation_Src -dropColumn:UniGene -dropColumn:RefSeq -dropColumn:Refseqs_Hit -dropColumn:geneIDS_Hit -dropColumn:"Entrez GeneID" -dropColumn:"Locus Tag" -dropColumn:"BioCarta Pathways" -dropColumn:"KEGG Pathways" # #-dropColumn:"Gene Ontology Terms" (remove # if want to drop) -dropColumn:"GO Tier2 Component" -dropColumn:"GO Tier3 Component" -dropColumn:"GO Tier2 Function" -dropColumn:"GO Tier3 Function" -dropColumn:"GO Tier2 Process" -dropColumn:"GO Tier3 Process" #"The following was added 5/28/09" -dropColumn:"Map" -dropColumn:"mgB37_Probe Chr:Start-Stop" -dropColumn:"mgB37_Probe Cytoband" -dropColumn:"mgB37_RefSeq Chr:Start-Stop" -dropColumn:"mgB37_RefSeq Cytoband" # #"Reorder columns to left side of Table" -reorderColumn:Gene,1 -reorderColumn:"A-B Mean Difference",2 -reorderColumn:Difference,3 -reorderColumn:"A Mean",4 -reorderColumn:"B Mean",5 -reorderColumn:"A-B p-Value",6 -reorderColumn:"p-Value",7 -reorderColumn:"Well ID",8 -reorderColumn:"Feature ID",9 -reorderColumn:"Description",10 # #"Sort rows by column - use whichever comes first"
-sortRowsByColumn:"A-B Mean Difference",Descending -sortRowsByColumn:Difference,Descending -sortRowsByColumn:p-Value,Ascending -sortRowsByColumn:Gene,Ascending # #"These map mAdb Feature Report data to Bioinformatics databases" -hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A -hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene= -hrefData:"Entrez GeneID",http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=retrieve&dopt=graphics&list_uids= -hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset= # #"----------- End --------- "
HTMLtools data.MRR/params-MRR-keep.map where: data.MRR/params-MRR-keep.map contains:
#File:params-MRR-keep.map #"Revised: 4-9-2009" # -addOutfilePostfix:"-keep" -addPrologue:data.MRR/prolog.html -addEpilogue:data.MRR/epilogue.html -addRowNumbers -addTableName:"mAdb Microarray Retrieval Report" -inputDir:data.MRR -outputDir:html/data.MRR -tablesDir:data.Table # -allowHdrDups -alternateRowBackgroundColor:white -rmvTrailingBlankRowsAndColumns #-shrinkBigCells:25,-5 -shrinkBigCells:1,-5 -hdrLines:2 -hasEmptyLineBeforeTable -makePrefaceHTML -mapOptionsLists # #"Specify columns to keep, the rest are dropped" -keepColumn:Gene -keepColumn:p-Value -keepColumn:Difference -keepColumn:"A-B p-Value" -keepColumn:"A-B Mean Difference" -keepColumn:"A Mean" -keepColumn:"B Mean" -keepColumn:"Well ID" -keepColumn:"Feature ID" -keepColumn:Description -keepColumn:"Gene Ontology Terms" # #"Reorder columns to left side of Table" -reorderColumn:Gene,1 -reorderColumn:"A-B Mean Difference",2 -reorderColumn:Difference,3 -reorderColumn:"A Mean",4 -reorderColumn:"B Mean",5 -reorderColumn:"A-B p-Value",6 -reorderColumn:"p-Value",7 -reorderColumn:"Well ID",8 -reorderColumn:"Feature ID",9 -reorderColumn:"Description",10 # #"Sort rows by column - use whichever comes first" -sortRowsByColumn:"A-B Mean Difference",Descending -sortRowsByColumn:Difference,Descending -sortRowsByColumn:p-Value,Ascending -sortRowsByColumn:Gene,Ascending # #"These map mAdb Feature Report data to Bioinformatics databases" -hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A -hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene= -hrefData:"Entrez GeneID",http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=retrieve&dopt=graphics&list_uids= -hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset= # #"----------- End --------- "
HTMLtools JTVinput/params-JTV.map where: params-JTV.map contains:
#File:params-JTV.map #"Revised: 3-30-2009" # #"(1) Convert array names in JTV data sets to mapped array names." -jtvNamesMap:"data.Table/mAdbArraySummary.map,data.Table/EGMAP.map,Affy .CEL file (16),GSP ID (9)" # -jtvInputDir:JTVinput -jtvOutputDir:JTVoutput -jtvTableDir:data.Table # #"(2) Generate HTML web pages to invoke the converted JTV data." -jtvHTMLgenerate -jtvDescription:"Sample description paragraph on mouse muscle GH/Stat-null Controlled (+) genes [i.e. experiment $$INFILENAME$$]." -jtvButtonName:"Mouse Muscle: $$INFILENAME$$" -addPrologue:JTVinput/prolog.html -addEpilogue:JTVinput/epilogue.html -jtvCopyJTVjars:JTVjars # #"(3) Rezip the converted files" -jtvReZipConvertedFiles # #"------------ End -----------"
HTMLtools -batchProcess:batchList.doit where: batchList.doit contains:
#File:batchList.doit #"Revised: 6-23-2009" #"Preprocess the data for the NIDDK/mAdb GSP Jak-Stat Prosector Database" # #"(1) Doing GSP-InventoryExperiment Groups conversions and generating HTML pages" data.GSPI-EG/params-GSPI-EG.map data.GSPI-EG/params-GSPI-EG-concatTXT.map data.GSPI-EG/params-GSPI-EG-concatHTML.map data.GSPI-ExpGrp/params-GSPI-ExpGrp.map data.GSPI-ExpGrp/params-GSPI-ExpGrp-exportBigCells.map # #"(2) Doing mAdb Retrieval Report conversions and generating HTML pages" data.MRR/params-MRR.map data.MRR/params-MRR-keep.map # #"(3) Doing JTV array name conversions and generating HTML pages" ###JTVinput/params-JTV.map JTVinput/params-JTV-jtvReZip.map # #"(4) Convert Mapping .txt files to HTML" data.Maps/params-Maps-EGMAP-html.map data.Maps/params-Maps-ExperimentGroups-html.map data.Maps/params-Maps-mAdbArraySummary-html.map # #"(4.1) Convert Mapping .txt files to .map files" data.Maps/params-Maps-EGMAP-map.map data.Maps/params-Maps-ExperimentGroups-map.map data.Maps/params-Maps-mAdbArraySummary-map.map # #"(5) Doing mAdb Retrieval Report Gene List mappings and generating HTML pages" data.MRR-GL-examples/params-MRR-GL-orig.map data.MRR-GL-examples/params-MRR-GL-Review.map data.MRR-GL-examples/params-MRR-GL-GeneList.map # #"(6) Doing mAdb and HTML conversion tests TODO generating HTML pages" data.mAdb-TestsToDo/params-mAdb-TestsToDo.map # #"(7) Convert MRR all arrays to edited DB file." #" This is normally not done each time." #data.MRR-all/params-MRR-all-18-RMA-fast.map #data.MRR-all/params-MRR-all-18-MAS5-fast.map # #"(7.1) Convert MRR Literature data for all arrays." #" This is normally not done each time." data.MRR-Literature/params-MRR.map data.MRR-Literature/params-MRR-keep.map data.MRR-Literature/params-JTV-jtvReZip.map # #"(8) Generate a Tests-Intersection .txt table and also the HTML for it." #" from the mAdb-TestsToDo.txt data." 
data.TestsIntersection/params-TI-HTML-all.map data.TestsIntersection/params-TestsIntersection-ALL.map data.TestsIntersection/params-TestsIntersection-ALL-filter.map data.TestsIntersection/params-TestsIntersection-ALL-filter-LIT.map # #"(9) Flip several types of samples - not currently used in html/GSP" #"(9) Flip several types of samples" #"(9.1) Create Data file for Flip Tables." #" Create edited Tables with Index-Maps." #" This is normally not done each time." data.MRR-flip/params-MRR-all-fastSave.map data.MRR-flip/params-MRR-all-fastMakeIndex.map data.MRR-flip/params-MRR-all-fastSave+MakeIndex.map data.MRR-flip/params-MRR-LitRev-fastSave.map data.MRR-flip/params-MRR-LitRev-fastMakeIndex.map data.MRR-flip/params-MRR-LitRev-fastSave+MakeIndex.map data.MRR-flip/params-MRR-EG3.2-Test1-fastSave.map data.MRR-flip/params-MRR-EG3.2-Test1-fastMakeIndex.map # #"(9.2) Flip Tables with and without filtering saving" #" the flipped .txt file and .html file." data.MRR-flip/params-MRR-flipGID-all-GeneList.map data.MRR-flip/params-MRR-flipGID-all-FeatureID.map data.MRR-flip/params-MRR-flipGID-all-GeneList+FeatureID.map data.MRR-flip/params-MRR-flipGID-all-GeneList-RowNames.map data.MRR-flip/params-MRR-flipGID-all-GeneList+FeatureID-RowNames.map data.MRR-flip/params-MRR-flipGID-LitRev.map data.MRR-flip/params-MRR-flipGID-EG3.2-test1.map # #"(10) Run the GenBatchScripts to create the batch scripts data" #" This is normally not done each time." data.GBS/params-genBatchScripts.map # # #"------------ End -------------"
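The batch list above mixes active script paths with entries disabled by a leading # (for example ###JTVinput/params-JTV.map). As a minimal sketch of how such a .doit list could be reduced to the scripts that will actually run, the following assumes one entry per line and treats any line starting with # as a comment. The class and method names are hypothetical and this is not the HTMLtools implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of HTMLtools.
final class DoitListSketch {

    /** Returns the params .map script paths from a .doit list, in file order. */
    static List<String> scriptsToRun(List<String> lines) {
        List<String> scripts = new ArrayList<>();
        for (String line : lines) {
            String s = line.trim();
            if (s.isEmpty() || s.startsWith("#")) {
                continue;                    // blank line, comment, or disabled script
            }
            scripts.add(s);
        }
        return scripts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            "#\"(2) Doing mAdb Retrieval Report conversions and generating HTML pages\"",
            "data.MRR/params-MRR.map",
            "data.MRR/params-MRR-keep.map",
            "###JTVinput/params-JTV.map",
            "JTVinput/params-JTV-jtvReZip.map");
        for (String script : scriptsToRun(lines)) {
            System.out.println(script);
        }
    }
}
```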
HTMLtools data.MRR-all/params-MRR-all-fast.map where: data.MRR-all/params-MRR-all-fast.map contains:
#File:params-MRR-all-fast.map #"Revised 6-20-2009" #"Convert MRR all arrays file to edited DB 'EGALLDataSet.txt' file." # -inputDir:data.MRR-all -outputDir:data.Table -tablesDir:data.Table # -files:"EG1+EG3.1+EG3.2-MSE430_2-18Arrays-RMA-Grouped.txt" # # #"Save the edited table as a .txt file" -saveEditedTable2File:EGALLDataSet.txt,noHTML # #"Do a fast edit of the .txt file and don't generate HTML file" -fastEditFile #-noHTML # #-addOutfilePostfix:"-edit" # -allowHdrDups -rmvTrailingBlankRowsAndColumns -hdrLines:2 -useOnlyLastHeaderLine -hasEmptyLineBeforeTable # #"Map header names. Select from field='Affy .CEL file (16)'" #" to field= 'GSP ID (9)' or 'Simple GSP ID (10)'" -mapHdrNames:"data.Table/EGMAP.map,Affy .CEL file (16),GSP ID (9)" # #"Drop more columns for simplest file" -dropColumn:Difference -dropColumn:p-Value -dropColumn:Description #"Drop some of the columns" -dropColumn:"mgB36 Chr:Start-Stop" -dropColumn:"mgB36 Cytoband" -dropColumn:Annotation_Src -dropColumn:UniGene -dropColumn:RefSeq -dropColumn:Refseqs_Hit -dropColumn:geneIDS_Hit -dropColumn:"Entrez GeneID" -dropColumn:"Locus Tag" -dropColumn:"BioCarta Pathways" -dropColumn:"KEGG Pathways" -dropColumn:"Gene Ontology Terms" (remove # if want to drop) -dropColumn:"GO Tier2 Component" -dropColumn:"GO Tier3 Component" -dropColumn:"GO Tier2 Function" -dropColumn:"GO Tier3 Function" -dropColumn:"GO Tier2 Process" -dropColumn:"GO Tier3 Process" #"The following was added 5/28/09" -dropColumn:"Map" -dropColumn:"mgB37_Probe Chr:Start-Stop" -dropColumn:"mgB37_Probe Cytoband" -dropColumn:"mgB37_RefSeq Chr:Start-Stop" -dropColumn:"mgB37_RefSeq Cytoband" # #"Sort the remaining columns alphabetically" -reorderRemainingColumnsAlphabeticly # #"Reorder columns to left side of Table" -reorderColumn:Gene,1 -reorderColumn:"A-B Mean Difference",2 -reorderColumn:Difference,3 -reorderColumn:"A Mean",4 -reorderColumn:"B Mean",5 -reorderColumn:"A-B p-Value",6 -reorderColumn:"p-Value",7
-reorderColumn:"Well ID",8 -reorderColumn:"Feature ID",9 -reorderColumn:"Description",10 # #"----------- End --------- "
HTMLtools data.MRR-GL-examples/params-MRR-GL-GeneList.map where: data.MRR-GL-examples/params-MRR-GL-GeneList.map contains:
#File:params-MRR-GL-GeneList.map #"Revised: 3-30-2009" # -addPrologue:data.MRR-GL-examples/prolog.html -addEpilogue:data.MRR-GL-examples/epilogue.html -addRowNumbers -addTableName:"GSP Genes mentioned in Hennighausen & Robinson Review (2008)" -inputDir:data.MRR-GL-examples -outputDir:html/GSP/Search/example/ -tablesDir:data.Table -files:GeneListTbl-all-A+G.txt,GeneListTbl-all-EG1+EG3.txt,GeneListTbl-all-Stat5ab+Socs2.txt # -allowHdrDups -alternateRowBackgroundColor:white -rmvTrailingBlankRowsAndColumns #-shrinkBigCells:25,-5 -shrinkBigCells:1,-5 # #"Map all 3 header lines in the Table" -hdrLines:3 # #"This does multiple header-row data mapping." -hrefHeaderRowMapping #"These map mAdb Feature Report data to Bioinformatics databases" -hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A -hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene= -hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset= # #"------------ End -----------"
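The -hrefData switches above pair a column name with a URL that ends in a query parameter (gene=, probeset=, CLONE=WID%3A), which suggests that the cell value is appended to the URL to form the link. The sketch below illustrates that reading only. The class and method names are hypothetical, the appending behavior is an assumption, and no URL or HTML escaping is attempted because the identifiers in these examples contain only letters, digits, and underscores.

```java
// Hypothetical helper, not part of HTMLtools.
final class HrefDataSketch {

    /**
     * Wraps a table cell value in a link built from baseUrl + value.
     * Empty cells are returned unchanged so no dangling link is produced.
     */
    static String linkCell(String baseUrl, String cellValue) {
        String v = cellValue.trim();
        if (v.isEmpty()) {
            return v;
        }
        return "<a href=\"" + baseUrl + v + "\">" + v + "</a>";
    }

    public static void main(String[] args) {
        System.out.println(
            linkCell("https://www.affymetrix.com/LinkServlet?probeset=", "1438470_at"));
    }
}
```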
HTMLtools data.TestsIntersection/params-TestsIntersection-ALL.map where: data.TestsIntersection/params-TestsIntersection-ALL.map contains:
#File:params-TestsIntersection-ALL.map #"Revised 5-06-2009" # #"[1] Master script to create a Tests Intersection Table file for all tests" #"in the mAdb-TestsToDo.txt file that we have data." # -inputDir:data.GBS -outputDir:html/GSP/TestsIntersection # #"Limit the number of rows to the highest 500 fold-change values" -limitMaxTableRows:"500,Range FC,Descending" # #"The tablesDir subdir. where mapping and other reference Tables are copied" #"to the batchScripts directory." -tablesDir:data.Table # -allowHdrDups -rmvTrailingBlankRowsAndColumns -hdrLines:2 -hasEmptyLineBeforeTable # -makeTestsIntersectionTable:"mAdb-TestsToDo.txt" # #"Add FC range computations and expand the TI table with" #"fields ('Max FC', 'Min FC', 'Range FC')." -addFCrangesForTestsIntersectionTable # #"Add the ('Range A Mean', 'Range B Mean', 'FC counts %') computations to" #"an expanded TestsIntersectionTable table." -addRangeOfMeansToTItable # #"Save the edited table as a .txt file" -saveEditedTable2File:"TestsIntersection-ALL.txt,HTML" -mapDollarsigns:$$EXCEL-FILE$$,"TestsIntersection-ALL.txt" # #"The mAdb-TestsToDo.txt Tables are in" #"the '-tablesDir:data.Table' subdirectory." # #"[2] Now after the Tests-Intersection .txt table is saved, generate the HTML file." #"Note: Converter removes -hasEmptyLineBeforeTable and sets -hdrLines:5 switches." 
# -addPrologue:data.TestsIntersection/prolog-TI.html -addEpilogue:data.TestsIntersection/epilogue-TI.html # # -addRowNumbers -addTableName:"Intersection of All GSP Fold-Change Tests for Genes in any test" -mapDollarsigns:$$TITLE$$,"All GSP Tests for Genes in any test" -allowHdrDups # -alternateRowBackgroundColor:white #-shrinkBigCells:25,-5 -shrinkBigCells:1,-5 # ###-sortRowsByColumn:Gene,Ascending -sortRowsByColumn:"Range FC",Descending # #"These map mAdb Feature Report data to Bioinformatics databases" -hrefData:"Well ID,http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A" -hrefData:"Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene=" -hrefData:"Feature ID,https://www.affymetrix.com/LinkServlet?probeset=" # #"------------- End ---------------"
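The script above uses -mapDollarsigns to bind placeholders such as $$TITLE$$ and $$EXCEL-FILE$$ to replacement text. A minimal sketch of that kind of substitution follows, assuming it is a plain literal replacement applied to template text such as the prologue or epilogue HTML. Where HTMLtools actually applies the mapping is not stated here, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of HTMLtools.
final class DollarMapSketch {

    /** Replaces every occurrence of each $$KEY$$ placeholder with its mapped text. */
    static String apply(String template, Map<String, String> mappings) {
        String out = template;
        for (Map.Entry<String, String> e : mappings.entrySet()) {
            // String.replace does a literal (non-regex) replacement,
            // so the '$' characters need no escaping.
            out = out.replace(e.getKey(), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> mappings = new LinkedHashMap<>();
        mappings.put("$$TITLE$$", "All GSP Tests for Genes in any test");
        mappings.put("$$EXCEL-FILE$$", "TestsIntersection-ALL.txt");
        System.out.println(apply("<h2>$$TITLE$$</h2> (data: $$EXCEL-FILE$$)", mappings));
    }
}
```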
Example 14.1
Create an edited table file and then an Index-Map table file from the
edited table file. See Example 14.2
for the second part, which uses these files to create a flipped table.
E.g., prepare the database files for subsequent flip-table processing.
The switches are in file 'params-MRR-all-fastSave+MakeIndex.map'.
===============================================================
CvtTabDelim2HTML data.flip/params-MRR-all-fastSave+MakeIndex.map where: data.flip/params-MRR-all-fastSave+MakeIndex.map contains:
#File:params-MRR-all-fastSave+MakeIndex.map #"This saves the edited Table and then makes an Index-Map file" #"of the saved edited table." #"Revised 6-23-2009" # -inputDir:data.MRR-all -outputDir:data.MRR-flip -tablesDir:data.Table # -files:"Review-LH-18Arrays-54-pathway-Genes-in-JakStat.txt" # #"Save the edited table as a .txt file" -saveEditedTable2File:EGALLDataSet.txt,noHTML # #"Make an EGALLDataSet.idx index file of the .txt file" -makeIndexMapFile:"Gene,Well ID,Feature ID" # -allowHdrDups -rmvTrailingBlankRowsAndColumns -hdrLines:2 -useOnlyLastHeaderLine -hasEmptyLineBeforeTable # #"Do a fast edit of the .txt file" -fastEditFile -noHTML # #"Map header names. Select from field='Affy .CEL file (16)'" #" to field= 'GSP ID (9)' or 'Simple GSP ID (10)'" -mapHdrNames:"data.Table/EGMAP.map,Affy .CEL file (16),GSP ID (9)" # #"Drop more columns for simplest file" -dropColumn:Difference -dropColumn:p-Value -dropColumn:Description #"Drop some of the columns" -dropColumn:"mgB36 Chr:Start-Stop" -dropColumn:"mgB36 Cytoband" -dropColumn:Annotation_Src -dropColumn:UniGene -dropColumn:RefSeq -dropColumn:Refseqs_Hit -dropColumn:geneIDS_Hit -dropColumn:"Entrez GeneID" -dropColumn:"Locus Tag" -dropColumn:"BioCarta Pathways" -dropColumn:"KEGG Pathways" -dropColumn:"Gene Ontology Terms" (remove # if want to drop) -dropColumn:"GO Tier2 Component" -dropColumn:"GO Tier3 Component" -dropColumn:"GO Tier2 Function" -dropColumn:"GO Tier3 Function" -dropColumn:"GO Tier2 Process" -dropColumn:"GO Tier3 Process" #"The following was added 5/28/09" -dropColumn:"Map" -dropColumn:"mgB37_Probe Chr:Start-Stop" -dropColumn:"mgB37_Probe Cytoband" -dropColumn:"mgB37_RefSeq Chr:Start-Stop" -dropColumn:"mgB37_RefSeq Cytoband" # #"Sort the rest of the columns alphabetically" -reorderRemainingColumnsAlphabeticly # #"Reorder columns to left side of Table" -reorderColumn:Gene,1 -reorderColumn:"A-B Mean Difference",2 -reorderColumn:Difference,3 -reorderColumn:"A Mean",4 -reorderColumn:"B Mean",5
-reorderColumn:"A-B p-Value",6 -reorderColumn:"p-Value",7 -reorderColumn:"Well ID",8 -reorderColumn:"Feature ID",9 -reorderColumn:"Description",10 # #"----------- End ----------- "
CvtTabDelim2HTML data.flip/params-MRR-flipGID-all-GeneListNames+FeatureIDnames-RowNames.map where: data.flip/params-MRR-flipGID-all-GeneListNames+FeatureIDnames-RowNames.map contains:
#File:params-MRR-flipGID-all-GeneListNames+FeatureIDnames-RowNames.map #"Revised 7-18-2009" # -addPrologue:data.MRR-flip/prolog.html -addEpilogue:data.MRR-flip/epilogue.html -addRowNumbers -addTableName:"Flipped 18 GSP Mouse MOE403_2 arrays Filtered by Feature_ID List" # -inputDir:data.MRR-flip -outputDir:html/data.flip -tablesDir:data.Table # -addOutfilePostfix:"-GeneList+FeatureID-RowNames" # -flipTableByIndexMap:"EGALLDataSet.txt,EGALLDataSet.idx" -flipColumnName:"*LIST*,Gene,Socs1,Socs2,Socs3,Stat1,Stat2,Stat3,Stat4,Stat5a,Stat5b" -flipColumnName:"*LIST*,Well ID," -flipColumnName:"*LIST*,Feature ID,1418507_s_at,1449109_at, 1438470_at,1441476_at" -flipRowNames:"*LIST*,EG001,EG003.1,EG003.2" -flipOrder:"Gene,Well ID,Feature ID" # -allowHdrDups -alternateRowBackgroundColor:white -rmvTrailingBlankRowsAndColumns # #-shrinkBigCells:25,-5 -shrinkBigCells:1,-5 # #"This does multiple header-row data mapping." -hrefHeaderRowMapping # #"These map mAdb Feature Report data to Bioinformatics databases" -hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A -hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene= -hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset= # #"--------- End ---------"
HTMLtools data.GBS/params-genBatchScripts.map where: data.GBS/params-genBatchScripts.map contains:
#File:params-genBatchScripts.map #"Revised 4-26-2009" # #"Master script to generate params .map files and buildWebPages.doit" #"file for all tests in the mAdb-TestsToDo.txt file. It generates an" #"environment in batchScripts/ to enable running all of the scripts as" #"a Window's batch job buildWebPages.bat on the buildWebPages.doit batch" #"input list Windows batch startup file." # #"The templates (.html, .param) and .map files are in the same directory" #"as this master batch generation script." # #"Map: mAdb-TestsToDo.txt - test Table to drive the batch scripts generation." # #"Map: CellTypeTissue.map - maps 'Introduction' field for 'Tissues'." #"Map: ExperimentGroups.map - maps 'Details' field for 'Expression Groups'." #"Map: EGMAP.map - maps 'Affy .CEL file' name to 'Simple GSP ID' or 'GSP ID'." # #"Arg: batchScripts - where all files and the following subdirectories are saved" #"Arg: ParamScripts - subdir. where generated params*.map files are copied" #"Arg: inputTree - subdir. where mAdb generated .txt MRR and JTV data are copied" #"Arg: Summary - subdir. where generated text HTML top level Web pages are saved" #"Arg: Analyses - subdir. where generated text HTML & edited .txt files are saved" #"Arg: JTV - subdir. where generated JTVtext HTML & edited JTV files are saved" #"Arg: JTVjars - subdir. where the JTV runtime jar files are copied" # -inputDir:data.GBS -outputDir:batchScripts # #"The tablesDir subdir. where mapping and other reference Tables are copied" #"to the batchScripts directory." -tablesDir:data.Table # -genBatchScripts:"batchScripts,ParamScripts,InputTree,Summary,Analyses,JTV" -rmvTrailingBlankRowsAndColumns # #"The following maps and Tables are in the '-tablesDir:data.Table' subdirectory." -genMapHdrNames:"EGMAP.map" -genMapEGdetails:"ExperimentGroups.map" -genMapIntroduction:"CellTypeTissue.map" -genTestFiles:"mAdb-TestsToDo.txt" # #"Create Tests-Intersection (TI) HTML links in summary file & params .map files." 
-genTestsIntersection # #"List of CellType/Tissue summary templates for generating the Summary pages" -genSummaryTemplate:1,summaryTemplateProlog.html -genSummaryTemplate:2,summaryTemplateExperimental.html -genSummaryTemplate:3,summaryTemplateAnalysis.html -genSummaryTemplate:4,summaryTemplateFurtherAnalysis.html -genSummaryTemplate:5,summaryTemplateEpilogue.html # #"List of params .map templates for generating batch params .map files." -genParamTemplate:MRR,paramsTemplate-MRR.map -genParamTemplate:MRR-keep,paramsTemplate-MRR-keep.map ###-genParamTemplate:JTV,paramsTemplate-JTV.map -genParamTemplate:JTV,paramsTemplate-JTV-jtvReZip.map -genParamTemplate:MRR-saveFile,paramsTemplate-MRR-saveFile.map -genParamTemplate:TI,paramsTemplate-TI.map # #"List of support files to be copied to support -batchProcess of the .doit file." -genCopySupportFile:"../HTMLtools.jar" -genCopySupportFile:"../ReferenceManual.html" -genCopySupportFile:prologMRR.html -genCopySupportFile:prologJTV.html -genCopySupportFile:prologTI.html -genCopySupportFile:epilogueMRR.html -genCopySupportFile:epilogueJTV.html -genCopySupportFile:epilogueTI.html # #"List of JTV support files to be copied to support -batchProcess of the .doit file." #-genCopySupportFile:JTVjars/TreeViewApplet.jar #-genCopySupportFile:JTVjars/nanoxml-2.2.2.jar #-genCopySupportFile:JTVjars/plugins/Dendrogram.jar #-genCopySupportFile:JTVjars/plugins/Karyoscope.jar #-genCopySupportFile:JTVjars/plugins/Scatterplot.jar #-genCopySupportFile:JTVjars/plugins/Treeanno.jar # #"Copy tree data to top level batch scripts subdirectory" -genTreeCopy:JTVjars,batchScripts/JTVjars #"Copy Mapping files tree data to top level batch scripts subdirectory" -genTreeCopy:data.Table,batchScripts/data.Table # #"Copy input data tree data to batch scripts subdirectory" -genTreeCopy:data.GBS/CellTissue,batchScripts/inputTree/CellTissue # #"------------- End ---------------"
CvtTabDelim2HTML data.MRR-all/params-MRR-all-fastSave.map where: data.MRR-all/params-MRR-all-fastSave.map contains:
#File:params-MRR-all-fastSave.map #"Revised 8-18-2009" #"Convert MRR all arrays file to edited DB 'EGALLDataSet.txt' file." # -inputDir:data.MRR-all -outputDir:data.MRR-all -tablesDir:data.Table # -files:"EG1+EG3.1+EG3.2-MSE430_2-18Arrays-RMA-Grouped.txt" # # #"Save the edited table as a .txt file" -saveEditedTable2File:EGALLDataSet.txt,noHTML # #"Do a fast edit of the .txt file and don't generate HTML file" -fastEditFile #-noHTML # #-addOutfilePostfix:"-edit" # -allowHdrDups -rmvTrailingBlankRowsAndColumns -hdrLines:2 -useOnlyLastHeaderLine -hasEmptyLineBeforeTable # #"Map header names. Select from field='Affy .CEL file (16)'" #" to field= 'GSP ID (9)' or 'Simple GSP ID (10)'" -mapHdrNames:"data.Table/EGMAP.map,Affy .CEL file (16),GSP ID (9)" # #"Drop more columns for simplest file" -dropColumn:Difference -dropColumn:p-Value -dropColumn:Description #"Drop some of the columns" -dropColumn:"mgB36 Chr:Start-Stop" -dropColumn:"mgB36 Cytoband" -dropColumn:Annotation_Src -dropColumn:UniGene -dropColumn:RefSeq -dropColumn:Refseqs_Hit -dropColumn:geneIDS_Hit -dropColumn:"Entrez GeneID" -dropColumn:"Locus Tag" -dropColumn:"BioCarta Pathways" -dropColumn:"KEGG Pathways" -dropColumn:"Gene Ontology Terms" (remove # if want to drop) -dropColumn:"GO Tier2 Component" -dropColumn:"GO Tier3 Component" -dropColumn:"GO Tier2 Function" -dropColumn:"GO Tier3 Function" -dropColumn:"GO Tier2 Process" -dropColumn:"GO Tier3 Process" #"The following was added 5/28/09" -dropColumn:"Map" -dropColumn:"mgB37_Probe Chr:Start-Stop" -dropColumn:"mgB37_Probe Cytoband" -dropColumn:"mgB37_RefSeq Chr:Start-Stop" -dropColumn:"mgB37_RefSeq Cytoband" # #"Sort the remaining columns alphabetically" -reorderRemainingColumnsAlphabeticly # #"Reorder columns to left side of Table" -reorderColumn:Gene,1 -reorderColumn:"A-B Mean Difference",2 -reorderColumn:Difference,3 -reorderColumn:"A Mean",4 -reorderColumn:"B Mean",5 -reorderColumn:"A-B p-Value",6 -reorderColumn:"p-Value",7 -reorderColumn:"Well ID",8
-reorderColumn:"Feature ID",9 -reorderColumn:"Description",10 # #"----------- End --------- "
Example 16.2 - generates an .idx Index Map file of the database file
This creates an Index Map .idx file from the database file. These files are
used in other operations such as the
database searchGUI using the script example in
Example 17 where the output .idx file could
be copied to the data.search/ directory.
CvtTabDelim2HTML data.MRR-all/params-MRR-all-fastMakeIndex.map where: data.MRR-all/params-MRR-all-fastMakeIndex.map contains:
#File:params-MRR-all-fastMakeIndex.map #"Revised: 8-19-2009" # -inputDir:data.MRR-all -outputDir:data.MRR-all -tablesDir:data.Table # -files:"EGALLDataSet.txt" # -hdrLines:1 # #"Do a fast edit of the .txt file" -fastEditFile # #"Make an .idx index file of the .txt file" -makeIndexMapFile:"Gene,Well ID,Feature ID" # #"----------- end --------- "
Example 16.3 - generates a .sidx global StatisticsIndex Map file of the database file
This creates a global Statistics Index Map .sidx file from the database file.
This file is used in other operations such as the
database searchGUI using the script example in
Example 17 where the output .sidx file could
be copied to the data.search/ directory and is used if a heatmap table
is generated for the flipped table.
CvtTabDelim2HTML data.MRR-all/params-MRR-all-fastMakeStatisticsIndex.map where: data.MRR-all/params-MRR-all-fastMakeStatisticsIndex.map contains:
#File:params-MRR-all-fastMakeStatisticsIndex.map #"Revised: 8-11-2009" # -inputDir:data.MRR-all -outputDir:data.MRR-all -tablesDir:data.Table # #"Specify the edited data set file to use. It is assumed that" #"the IndexMap file was created and has a .idx file extension." -files:"EGALLDataSet.txt" # -hdrLines:1 # #"Specify columns to drop when analyzing the Statistics, the rest are kept." -dropColumn:Gene -dropColumn:"Well ID" -dropColumn:"Feature ID" # #"Make an .sidx index file of the .txt and .idx files" -makeStatisticsIndexMapFile # #"----------- end --------- "
Example 17 - the paramsSearchDefault.map file used
with the "-searchGui" option
The paramsSearchDefault.map file contains additional information
used by the search database GUI ("-searchGui" option). See
Search GUI for more details.
===============================================================
CvtTabDelim2HTML -searchGui where: this looks for the file data.search/paramsSearchDefault.map, which contains:
#File:paramsSearchFlip.map #"$$DATE$$" # #"Search information read by the search GUI for prompts and menus" -searchTermNames:"Gene,Well ID,Feature ID" -searchRowFilterName:"Sample Experiment Groups" -searchSampleChoiceFile:sampleExperimentGroupsChoices.txt -searchTermsDemoData:"Stat5a Stat5b 1438470_at 1441476_at 1446085_at" -searchUserTermList:"LitRefGeneList.txt,Feature ID,Literature Review" -searchTermsFilterPrompt:"'Gene', 'Well ID', and/or 'Probe' names. E.g., Stat5a, Stat5b, 1438470_at 1441476_at 1446085_at, etc." -searchRowFilterPrompt:"'Sample Experiment Groups'. E.g., select one or more Experiment Groups" # -addPrologue:data.search/prolog.html -addEpilogue:data.search/epilogue.html -addTableName:"$$DATA_SOURCE_SUBTITLE$$" -addRowNumbers # -addTableName:"Search database filtered by Gene and/or Probe IDs and Experiment Groups" # -inputDir:data.search -outputDir:data.search -tablesDir:data.Table # -addOutfilePostfix:"-search" # #"Database (.txt) and index map of database (.idx) to search" -flipTableByIndexMap:"EGALLDataSet.txt,EGALLDataSet.idx" # $$SEARCH_FILTERS$$ # #"Maps to:" # '-flipOrder:"Gene,Well ID,Feature ID"' # '-flipColumnName:"*LIST*,Gene,g1,g2,...gn"' # '-flipColumnName:"*LIST*,Well ID,w1,w2,...,wk"' # '-flipColumnName:"*LIST*,Feature ID,f1,f2,...,fm"' # '-flipRowNames:"*LIST*,s1,s2,...,sp"' # '-dataPrecisionHTMLtable:-1' # '-showDataHeatmapFlipTable' # '-flipUseExactColumnNameMatch:TRUE' # -allowHdrDups -alternateRowBackgroundColor:white -rmvTrailingBlankRowsAndColumns # #-shrinkBigCells:25,-5 -shrinkBigCells:1,-5 # #"This does multiple header-row data mapping." -hrefHeaderRowMapping # #"These map mAdb Feature Report data to Bioinformatics databases" -hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A -hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene= -hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset= # #"--------- End ---------"
java -Xmx256M -classpath .;.\HTMLtools.jar HTMLtools \
     -genBatchScripts:data.GBS/params-genBatchScripts.map
This in turn generates the batchScripts/ directory described in Section 5 above, with a generated batchScripts/buildWebPages.doit list of params .map scripts to execute and the Windows .BAT file batchScripts/buildWebPages.bat to start the batch processing.
The JTV, GenBatchScripts, and TestsIntersections commands are currently loosely integrated into HTMLtools so they can be run with the params .map scripts. However, the code is kept fairly distinct so that it can be separated at some point into a separate JTVconverter application when the program is made open source, as the JTV conversion is specific to NIH.
The program is documented in this Reference Manual file, which includes an introduction, a list of switch commands, a number of examples, and additional documentation. The Java source code files are in the src/ directory.
The program was built using Eclipse (Version 3.4) (www.eclipse.org). The distribution includes the ANT (ant.apache.org) build.xml script, which can be used either standalone or with an Integrated Development Environment such as Eclipse (which includes ANT). There is a separate javadocs .BAT file, javadocs-HTMLtools.bat, that can be used to generate the Java class documentation in the javadocs/ directory. The .BAT files are renamed in the initial .zip file distribution and need to be unpacked before use (see Section 2.1 for details).
List of Java Class modules
Source code modules for HTMLtools application.
6.1 Converter GUI design
The ConverterGUI.jar file is just a copy of the HTMLtools.jar
file renamed to ConverterGUI.jar. When it runs, it checks to see what it was
called and then does the same thing as HTMLtools -gui.
When started, it pops up a graphical user interface (see
2.1.1 Using the Graphical User Interface (GUI) to run the converter).
The user selects, using the File menu, either a parameter .map script file or
a batch .doit file (which contains a list of .map script files). When the user
presses the Process button, it creates a new thread (ProcessData.java)
and has it execute the selected .map or .doit file. While processing, it
accumulates a list of the HTML files that were generated. When done, it puts
this list into a View HTML chooser GUI. If the user selects one, it will
then pop up a Web browser showing this file.
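The "checks what it was called" behavior described for ConverterGUI.jar (and for SearchGui.jar in 6.2) could look roughly like the sketch below. It is an illustration only: the class and method names are hypothetical, and reading the jar location through the class's CodeSource is one possible technique, not necessarily the one HTMLtools uses.

```java
import java.security.CodeSource;

// Hypothetical helper, not part of HTMLtools.
final class LaunchModeSketch {

    /** Maps the jar file name to the switch it implies, or "" for a plain HTMLtools.jar. */
    static String impliedSwitch(String jarFileName) {
        String name = jarFileName.toLowerCase();
        if (name.endsWith("convertergui.jar")) {
            return "-gui";
        }
        if (name.endsWith("searchgui.jar")) {
            return "-searchGui";
        }
        return "";
    }

    /** Best-effort name of the jar this class was loaded from ("" if unknown). */
    static String currentJarName() {
        CodeSource src = LaunchModeSketch.class.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "";
        }
        String path = src.getLocation().getPath();
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        String jar = currentJarName();
        System.out.println("jar='" + jar + "' implied switch='" + impliedSwitch(jar) + "'");
    }
}
```

With this arrangement a single build artifact serves all three entry points, and only the file name decides which interface starts.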
6.2 Search GUI design
The SearchGui.jar file is just a copy of the HTMLtools.jar
file renamed to SearchGui.jar. When it runs, it checks to see what it was
called and then does the same thing as HTMLtools -searchGui.
When started, it pops up a graphical user interface (see
2.1.2 Search User Database with a Graphical User Interface (GUI) Generating Reports).
It also needs to load the Index Map (.idx) file which it does in the background
by creating a new thread ProcessLoadIndexMapData.java which lets the
user continue selecting data in the interface. Processing is delayed until the
map is loaded since it is used to verify the data entered by the user.
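The load-then-verify behavior can be sketched as below. The sketch uses CompletableFuture in place of a hand-written thread class such as ProcessLoadIndexMapData.java. The loader, its two entries, and the offsets are placeholders, since the .idx file format is not described here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;

/** Sketch: load an index map in the background and wait for it before verifying input. */
class IndexMapLoadSketch {

    /** Hypothetical loader; a real one would parse the .idx file. Offsets are placeholders. */
    static Map<String, Long> loadIndexMap() {
        Map<String, Long> index = new HashMap<>();
        index.put("1438470_at", 0L);
        index.put("1441476_at", 128L);
        return index;
    }

    /** Starts loading on a background thread so the user interface stays responsive. */
    static CompletableFuture<Map<String, Long>> startLoading() {
        return CompletableFuture.supplyAsync(IndexMapLoadSketch::loadIndexMap);
    }

    /** Blocks until the map is loaded, then checks that the entered ID is known. */
    static boolean isKnownId(CompletableFuture<Map<String, Long>> pending, String id) {
        return pending.join().containsKey(id);
    }

    public static void main(String[] args) {
        CompletableFuture<Map<String, Long>> pending = startLoading();
        System.out.println(isKnownId(pending, "1438470_at"));
        System.out.println(isKnownId(pending, "no_such_id"));
    }
}
```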
The user enters search information into the SearchGui interface to specify
1. a list of genes and/or Well IDs and/or Feature IDs (gene probe IDs);
and then, 2. select one or more experiment groups. When they press
the Process button, it creates a custom script
data.search/paramSearchFlip.map from a default
data.search/paramsSearchDefault.map script that is domain dependent. Then
it creates a new thread (ProcessDataSearch.java) and recursively calls
CvtTabDelim2HTML to execute the just generated paramSearchFlip.map
script file. The script includes the flip table options to actually generate the
flipped table on the specified subset of data. When the thread is done processing,
it has generated data.search/EGALLDataSet-search.txt and
data.search/EGALLDataSet-search.html files. It then lets the user press
the View HTML button to pop up a Web browser to see the
EGALLDataSet-search.html file.
The example data.search/paramSearchFlip.map below has flip options that
merge the user-specified data with other data
(see the file for the rest of the default options):
. . .
#"Database (.txt) and index map of database (.idx) to search"
-flipTableByIndexMap:"EGALLDataSet.txt,EGALLDataSet.idx"
#
-flipOrderHdrColNames:"Gene,Well ID,Feature ID"
-flipColumnName:"*LIST*,Gene,Stat5a,Stat5b"
-flipColumnName:"*LIST*,Feature ID,1438470_at,1441476_at,1446085_at"
-flipRowFilterNames:"*LIST*,EG002,EG003.1,EG003.2"
#
#"Set the data precision for generated HTML."
-dataPrecisionHTMLtable:0
#
#
#"Set the flip-table sort by column name."
-sortFlipTableByColumnName:"Stat5b"
#
#"Generate heat-map data cells in a HTML conversion if .sidx exists."
-showDataHeatmapFlipTable
#
. . .
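As an illustration of how user selections could be turned into option lines like those in the listing above, here is a small sketch. The helper names are hypothetical, and the reading of each line (the switch, then *LIST*, then a header column name followed by the selected values) is inferred from the listing rather than from the source code. Empty selections are not handled.

```java
import java.util.List;

/** Sketch: format flip-table option lines like those in paramSearchFlip.map. */
class FlipOptionSketch {

    /** Builds one -flipColumnName line for a header column and its selected values. */
    static String flipColumnName(String header, List<String> values) {
        return "-flipColumnName:\"*LIST*," + header + "," + String.join(",", values) + "\"";
    }

    /** Builds the -flipRowFilterNames line for the selected experiment groups. */
    static String flipRowFilterNames(List<String> groups) {
        return "-flipRowFilterNames:\"*LIST*," + String.join(",", groups) + "\"";
    }

    public static void main(String[] args) {
        System.out.println(flipColumnName("Gene", List.of("Stat5a", "Stat5b")));
        System.out.println(flipRowFilterNames(List.of("EG002", "EG003.1", "EG003.2")));
    }
}
```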
Flip table computation
The flip table option uses three precomputed database files (created with CvtTabDelim2HTML), data.search/EGALLDataSet{.txt,.idx,.sidx}, that allow random access to any row by probe ID. (A gene may have from one to half a dozen probe IDs.) Since the number of genes used in a search is relatively small (under 100, and typically on the order of 10 to 20), gathering the data is relatively fast. The script file specifies -flipTableByIndexMap:"EGALLDataSet.txt,EGALLDataSet.idx".
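The random-access idea can be sketched as follows: scan the table once to record the byte offset of each row, then seek directly to a row when it is needed. The in-memory map stands in for the .idx file, whose actual layout is not described here. The class, method names, and sample rows are made up for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

/** Sketch: index a tab-delimited table by row offset, then seek to single rows. */
class RowSeekSketch {

    /** Scans the table once, recording the byte offset of each row keyed by its first column. */
    static Map<String, Long> buildIndex(Path table) throws IOException {
        Map<String, Long> index = new HashMap<>();
        try (RandomAccessFile raf = new RandomAccessFile(table.toFile(), "r")) {
            long offset = raf.getFilePointer();
            String line;
            while ((line = raf.readLine()) != null) {
                index.put(line.split("\t", 2)[0], offset);
                offset = raf.getFilePointer();
            }
        }
        return index;
    }

    /** Reads the single row that starts at the given byte offset. */
    static String readRow(Path table, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(table.toFile(), "r")) {
            raf.seek(offset);
            return raf.readLine();
        }
    }

    /** Builds a tiny made-up table in a temporary file and fetches one row by probe ID. */
    static String demo() {
        try {
            Path table = Files.createTempFile("table", ".txt");
            String text = "Feature ID\tS1\tS2\n1438470_at\t1.5\t2.0\n1441476_at\t7.5\t8.25\n";
            Files.write(table, text.getBytes(StandardCharsets.US_ASCII));
            Map<String, Long> index = buildIndex(table);
            String row = readRow(table, index.get("1441476_at"));
            Files.delete(table);
            return row;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Because only the requested rows are read, the cost of a search grows with the number of probe IDs requested rather than with the size of the database file.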
If the heatmap option is used, then it uses the EGALLDataSet.sidx file instead (i.e., Statistics Index Map file). This contains the same row seek data as the .idx file, but also statistics (min, max, mean, stddev) for each row and for the entire database. This is then used to map each table data cell value to a cell background color to implement the heat map.
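One way to map a cell value to a background color from the min and max statistics is sketched below. The blue-to-red linear scale, the white fallback, and the names are assumptions made for illustration. The color scheme HTMLtools actually uses is not specified here.

```java
/** Sketch: map a data value to an HTML background color using min/max statistics. */
class HeatmapColorSketch {

    /**
     * Linearly maps value in [min, max] to a color from blue (low) to red (high).
     * Values outside the range are clamped; a degenerate range yields white.
     */
    static String cellColor(double value, double min, double max) {
        if (max <= min) {
            return "#FFFFFF";
        }
        double t = Math.max(0.0, Math.min(1.0, (value - min) / (max - min)));
        int red = (int) Math.round(255 * t);
        int blue = 255 - red;
        return String.format("#%02X00%02X", red, blue);
    }

    public static void main(String[] args) {
        System.out.println(cellColor(0.0, 0.0, 10.0));
        System.out.println(cellColor(10.0, 0.0, 10.0));
    }
}
```

The returned string could be used as the bgcolor of a table data cell.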
In addition to the heatmap option (default), it also lets you adjust the precision of the data (which normally has 3 or 4 digits) to 0 or more digits, and to sort the rows by the data in a particular gene/probe ID column.
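The precision and sort options can be pictured with the following sketch. The method names are hypothetical, and the ascending sort order is an assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Sketch: format cells to a fixed precision and sort rows by one data column. */
class PrecisionSortSketch {

    /** Formats a value with the given number of digits after the decimal point. */
    static String formatCell(double value, int digits) {
        return String.format(Locale.ROOT, "%." + digits + "f", value);
    }

    /** Returns a copy of the rows sorted ascending by the value in the given column. */
    static List<double[]> sortByColumn(List<double[]> rows, int column) {
        List<double[]> sorted = new ArrayList<>(rows);
        sorted.sort(Comparator.comparingDouble((double[] row) -> row[column]));
        return sorted;
    }

    public static void main(String[] args) {
        System.out.println(formatCell(3.14159, 0));
        System.out.println(formatCell(3.14159, 2));
        List<double[]> rows = List.of(new double[] {2.0, 9.0}, new double[] {1.0, 5.0});
        System.out.println(sortByColumn(rows, 0).get(0)[1]);
    }
}
```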
So the processing is broken into two parts: the GUI part (SearchGui), which gathers the argument list and generates the paramSearchFlip.map file, and the flip table generation part, which builds the search results heatmap HTML table.
This section documents the batchScripts/ directory for creating Web pages. It briefly describes 1) the contents of the batchScripts/ directory, 2) how it is created by the GenBatchScripts (GBS) facility of this converter program from the list of mAdb tests and the data from those tests, and 3) how it is used to create Web pages suitable for copying to the Jak-Stat Prospector (J-S P) Web server on http://jak-stat.nih.gov/. This example could serve as a model for developing static Web server pages for data generated by other types of analysis systems.
Before attempting to run the GenBatchScripts process to create the batchScripts/ directory, we recommend you familiarize yourself with the commands in this Reference Manual. The batchScripts/ directory contains a Windows .bat file to run the HTMLtools program, buildWebPages.bat, and a list of conversion batch jobs in buildWebPages.doit. The buildWebPages.doit file contains a list of generated conversions to be performed in converter-batch (as opposed to Windows-batch) mode. Each conversion is in the form of a generated parameter .map file saved in the batchScripts/ParamScripts/ directory. (In the rest of this discussion, for brevity, we omit the batchScripts/ prefix when mentioning these directories where it is unambiguous.)
There are additional support files that are required for updating the J-S P Web server tree. These are described at the end of this document in the discussion on converting data from GSP-Inventory for the J-S P.
7.1 Overview of Conversion Process
7.2 Contents of the batchScripts/ directory
The batchScripts/ directory contains files
and subdirectories required to convert the mAdb tab-delimited text data into
HTML and JTV data that can then be copied to the J-S P Web site. The list of
generated batchScripts/ subdirectories is:
ParamScripts/ - GBS generated conversion params*.map files
InputTree/ - mAdb generated .txt MRR and JTV data are copied by GBS
Summary/ - GBS generated text HTML top level Web pages
Analyses/ - GBS generated text HTML & edited .txt files
JTV/ - GBS generated JTVtext HTML & edited JTV files
JTVjars/ - the JTV runtime jar files are copied by GBS
data.Table/ - the common mapping files are copied by GBS
buildWebPages.doit - GBS generated -batch script to convert MRR and JTV data
buildWebPages.bat - GBS generated Windows BAT file run converter on .doit file
ExperimentGroups.map - Experiment Group info by EGxxxx
CellTypeTissue.map - Tissue 'Introduction' by EGxxxx for summaries
EGMAP.map - map 'Affy .CEL file' names to 'GSP ID's
mAdbArraySummary.map - the 'mAdb ID' by 'Affy .CEL file'
The ExperimentGroups.map file is the tab-delimited sheet of that name in the GSP-Inventory.xls spreadsheet. The EGMAP.map tab-delimited file is the concatenation of the individual EGxxxx.txt data files from the GSP-Inventory.xls spreadsheet (see the Notes sheet in the GSP-Inventory), and is assembled into the map by the data.GSP-EG/params-GSPI-EG-concatTXT.map and data.Maps/params-Maps-EGMAP-map.map scripts. The mAdbArraySummary.map file is the saved 'mAdb Array Summary' for all of the samples data. All generated .map files are saved in the data.Table/ directory where they are used.
summaryTemplateProlog.html
summaryTemplateExperimental.html
summaryTemplateAnalysis.html
summaryTemplateFurtherAnalysis.html
summaryTemplateEpilogue.html
$$TISSUE$$ - tissue associated with the test
$$INTRODUCTION$$ - Introduction data from the CellTypeTissue.map file
$$LIST_EXPR_GROUPS$$ - list of expression groups used in the test
$$DESCRIPTION$$ - description using data from mAdb-TestsToDo data
$$ANALYSIS$$ - data generated for the "Analysis" section
$$FUTHERANALYSIS$$ - data generated for the "Further Analysis" section
$$DATE$$ - date of conversion
$$INFILENAME$$ - specific test name (e.g., EG1-test-1+FC-ALL.txt)
paramsTemplate-MRR.map - generate MRR gene expression HTML report
paramsTemplate-MRR-keep.map - generate MRR gene list HTML report
paramsTemplate-JTV-jtvReZip.map - generate mapped JTV data and JTV HTML applet
The $$ keywords are expanded when the batchScripts/ParamScripts/ files are generated. The entries with ".2" in the name are used for subsequent name remapping during the second phase, when the generated params .map files are evaluated. This list is common for all params .map files generated for the same test.
$$DATE$$ - date of conversion
$$INPUT_DATA$$ - input data relative directory
$$OUTPUT_DATA$$ - output data relative directory
$$TABLE_DATA$$ - location of the .map files relative directory
$$A_SAMPLE_NAME$$ - name of the 'A' condition
$$B_SAMPLE_NAME$$ - name of the 'B' condition
$$JTV_JARS$$ - location of the JTV runtime .jar support files directory
$$TISSUE.2$$ - tissue associated with the test
$$PAGE_LABEL.2$$ - data from page label in mAdb-TestsToDo for test entry
$$DESCRIPTION.2$$ - data from description in mAdb-TestsToDo for test entry
$$CLASS_A.2$$ - list of GSP IDs for condition A
$$CLASS_B.2$$ - list of GSP IDs for condition B
This list can be different for each params .map file generated for the same test.
$$PROLOG$$ - name of prolog file prologMRR.html or prologJTV.html
$$EPILOGUE$$ - name of epilogue file epilogueMRR.html or epilogueJTV.html
$$TITLE.2$$ - title for specific generated Web page
$$TEST_OR_ALL.2$$ - "Test" or "All" modifier
$$GBS_DESCRIPTION.2$$ - test specific information
$$PARAM_MAP_NAME$$ - name of parameter file with (+-FC, -ALL, -JTV) modifiers
$$TESTMAME.2$$ - test name with (+-FC, -ALL, -JTV) modifiers
$$FILE.2$$ - data input file for each params .map
$$JOIN_TABLE_FILE.2$$ - the -joinTableFile file for MRR-ALL processing only
$$MAPDIR$$ - mAdb mapping file for JTV sample name processing only
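The $$ keyword expansion described above amounts to literal text substitution in a template. The sketch below shows one way to do it. The class and method names are hypothetical and the values are placeholders. Leaving unknown keywords (such as the ".2" entries) untouched so that a later pass can expand them is an assumption about how the two phases fit together.

```java
import java.util.Map;

/** Sketch: expand $$KEYWORD$$ markers in a template using a keyword-to-value map. */
class KeywordExpandSketch {

    /**
     * Replaces each $$KEY$$ marker that has an entry in values.
     * Markers without an entry are left as-is so a later pass can expand them.
     * String.replace is literal, so the '$' characters need no escaping.
     */
    static String expand(String template, Map<String, String> values) {
        String result = template;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            result = result.replace("$$" + entry.getKey() + "$$", entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        String template = "Converted $$DATE$$ for tissue $$TISSUE.2$$";
        // First pass: only DATE is known; the ".2" keyword survives.
        String phase1 = expand(template, Map.of("DATE", "(date)"));
        System.out.println(phase1);
        // Second pass: the remaining ".2" keyword is expanded.
        System.out.println(expand(phase1, Map.of("TISSUE.2", "(tissue)")));
    }
}
```

A substituted value that itself contains a $$ marker could be expanded again by a later entry. A production version would need to guard against that.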
prologMRR.html
epilogueMRR.html
and
prologJTV.html
epilogueJTV.html
java -Xmx256M -classpath .;.\HTMLtools.jar HTMLtools \
 data.GBS/params-genBatchScripts.map
This will use the mAdb-TestsToDo.txt table as well as other files in data.GBS/ including CellTypeTissue.map (the tab-delimited sheet from the GSP-Inventory.xls spreadsheet) to:
mAdb (MRR) and JTV generated data:
EG1-test-1+FC.txt EG1-test-1-FC.txt (t-Test or fold-change test gene set)
EG1-test-1+FC-ALL.txt EG1-test-1-FC-ALL.txt (AND of test gene set with ALL samples)
EG1-test-1+FC-JTV.zip EG1-test-1-FC-JTV.zip (JTV heatmap for gene set)
EG1-test-1+FC-JTV-ALL.zip EG1-test-1-FC-JTV-ALL.zip (JTV heatmap for ALL samples)
Converter output: The .txt files processed by the converter get .html file extensions; the .zip files have the .zip removed, and an HTML file is generated to start up the JTV. If the mAdb group changes the JTV output to use the GSP IDs instead of mAdb IDs, we can avoid processing the JTV .zip files.
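The naming convention for the generated HTML pages can be sketched as a simple extension swap, as the file names in this section suggest. The helper below is hypothetical. It covers only the name mapping, not the conversion itself or the "-keep" report variants.

```java
/** Sketch: derive the generated HTML page name from a .txt or .zip input name. */
class OutputNameSketch {

    /** Replaces a trailing .txt or .zip extension with .html. */
    static String htmlNameFor(String inputName) {
        if (inputName.endsWith(".txt") || inputName.endsWith(".zip")) {
            return inputName.substring(0, inputName.length() - 4) + ".html";
        }
        throw new IllegalArgumentException("expected a .txt or .zip file: " + inputName);
    }

    public static void main(String[] args) {
        System.out.println(htmlNameFor("EG1-test-1+FC.txt"));
        System.out.println(htmlNameFor("EG1-test-1+FC-JTV.zip"));
    }
}
```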
On the J-S P Web site:
EG1-test-1+FC.txt EG1-test-1-FC.txt (t-Test or fold-change gene set test)
EG1-test-1+FC-ALL.txt EG1-test-1-FC-ALL.txt (AND of test gene set with ALL samples)
and
EG1-test-1+FC.html EG1-test-1-FC.html (t-Test or F-C test - with expr.data)
EG1-test-1+FC-keep.html EG1-test-1-FC-keep.html (t-Test or F-C test - no expr. data)
EG1-test-1+FC-ALL.html EG1-test-1-FC-ALL.html (AND of test gene set with ALL samples)
EG1-test-1+FC-JTV.html EG1-test-1-FC-JTV.html (JTV heatmap for gene set)
EG1-test-1+FC-JTV-ALL.html EG1-test-1-FC-JTV-ALL.html (JTV heatmap for ALL samples)
EG1-test-1+FC-JTV.zip EG1-test-1-FC-JTV.zip (JTV heatmap for gene set)
EG1-test-1+FC-JTV-ALL.zip EG1-test-1-FC-JTV-ALL.zip (JTV heatmap for ALL samples)
java -Xmx512M -classpath .;.\HTMLtools.jar HTMLtools \
 -batchProcess:batchScripts/buildWebPages.doit
7.10. Converting data from GSP-Inventory for the Jak-Stat Prospector
There are additional support files that are required for updating the J-S P
Web server tree. The following J-S P GSP/ subdirectories are updated by running
the converter -batchProcessing:batchList.doit batch job: GSP/GSP-Inventory/,
GSP/Search/, and GSP/Tests/ that are created in
html/GSP. Each of these subdirectories has
two additional subdirectories HTML/ and XLS/. The converter then converts the
data to HTML and saves the results in the HTML subdirectories. The XLS/ data
is not created by the HTMLtools converter, but rather separately from the
source data.
7.10.1 HTMLtools distribution directory
The distribution directory has the following data subdirectories required for
generating data for the Jak-Stat Prospector Web site.
data.GBS/ - GenBatchScripts to create batchScripts/ directory
data.GSPI-EG/ - EGxxxx.txt data, HTML and concatenated EGMAP.txt scripts
data.GSPI-ExpGrp/ - ExperimentGroups.txt data and HTML scripts
data.mAdb-TestsToDo/ - script to create HTML of mAdb-TestsToDo
data.Maps/ - the scripts used to create HTML and .map files
JTVjars/ - the JTV runtime jar files required
data.Table/ - primary .txt and .map files for EGMAP, ExperimentGroups,
mAdbArraySummary, and mAdb-TestsToDo files
Directory trees that are created when running the converter. These will contain
the data to be copied to the J-S P Web site staging directory:
batchScripts/ - the directory created when running GenBatchScripts
html/ - the directory created when running -batchProcess:batchList.doit
JTVoutput/ - the JTV demonstration conversion output (from JTVinput/)
Additional directories have demonstrations of other features that could
be used in conversions including:
data.MRR/ - separate demonstration mAdb MRR conversions to HTML
data.MRR-all/ - fast-edit table conversion scripts using buffered I/O
JTVinput/ - the JTV demonstration conversion scripts and data
Additional directories required for support of the converter. Note that the BAT
files in the demo-bat/ directory end in "-bat"
not ".bat". The
README-NOTE-restoring-the-BAT-file-names.txt describes how to make
the BAT files in the demo-bat/ directory runnable.
build/build.xml - ANT build file for making the converter
demo-bat/ - additional Windows BAT scripts in portable ("...-bat") form
docs/ - additional converter documentation
javadocs/ - automatic javadoc Java documentation for the converter
src/ - source code for the HTMLtools converter
Additional top level files in the distribution directory:
HTMLtools.jar - converter Java jar file used by the BAT scripts
ReferenceManual.html - primary documentation for the converter
README-NOTE-restoring-the-BAT-file-names.txt - how to activate the BAT files
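The "-bat" naming noted above for the demo-bat/ files can be undone with a simple name mapping, sketched here. This is only an illustration of the name change. The procedure in README-NOTE-restoring-the-BAT-file-names.txt is the authoritative one, and the helper name and the sample file name are hypothetical.

```java
/** Sketch: map a portable "...-bat" file name back to a runnable ".bat" name. */
class BatNameSketch {

    /** Returns the name with a trailing "-bat" replaced by ".bat"; other names are unchanged. */
    static String restoredName(String fileName) {
        if (fileName.endsWith("-bat")) {
            return fileName.substring(0, fileName.length() - 4) + ".bat";
        }
        return fileName;
    }

    public static void main(String[] args) {
        System.out.println(restoredName("javadocs-HTMLtools-bat"));
        System.out.println(restoredName("build.xml"));
    }
}
```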
java -Xmx256M -classpath .;.\HTMLtools.jar HTMLtools \
 -batchProcessing:batchList.doit
The subdirectories of generated files are created in html/GSP/ and then copied to subdirectories with the same names in the Jak-Stat Prospector Web tree. See Example 10 for the listing of the batchList.doit file.
It has been released with a small non-proprietary sample data set, currently publicly available on NCBI GEO, to demonstrate some aspects of the software.
It was derived and refactored from the open source MAExplorer (http://maexplorer.sourceforge.org/), and Open2Dprot (http://Open2Dprot.sourceforge.net/) Table modules.
Copyright 2008, 2009 by Peter Lemkin
E-Mail: lemkin@users.sourceforge.net
http://lemkingroup.com