File: ReferenceManual.html

HTMLtools is a Java program that automates the batch conversion of tab-delimited spreadsheet-type text files to HTML Web-page files. A variety of flexible options make the Web page presentations more useful, and the program can also be used for editing large tables. This is described in more detail throughout this reference manual.
Additional command subsets were developed for specialized conversions (JTVconvert, GenBatchScripts, and TestsIntersections) and may be ignored for routine Web page generation in other domains where they don't apply.
The JTVconvert commands can re-map data array names in Java TreeView mAdb data set files to more user-friendly experiment names, as well as generate HTML Web pages to launch these converted JTV files for each JTV data set.
The GenBatchScripts commands may be used to generate HTMLtools batch scripts for subsequent processing, given a list of data test-results files to convert and a tab-delimited tests descriptions file. They use a table (prepared with Excel or some other source) that describes this data and can then extract and insert information from various mapping tables into the generated Web pages.
The TestsIntersections commands synthesize a tests intersection summary table and Web page, as well as generate some summary statistics. They use the same data used in the GenBatchScripts commands.
The Converter GUI mode starts a graphical user interface (GUI) where the user can specify either individual parameter scripts or a batch-file list of scripts to be executed in the background, with the results shown in the user interface, including the ability to view the generated HTML files through a pop-up Web browser.
The Search GUI mode starts a database-search graphical user interface (GUI) to generate a "flipped table" (see examples in Figure S.14 and Figure S.16) on a subset of the data from a pre-computed edited table database. The user can specify filters on the rows and columns (data and sample subsets), and presentation options, to generate an HTML file viewed through a pop-up Web browser.
Note: this software is released as an
OPEN SOURCE project, HTMLtools, on
http://HTMLtools.SourceForge.net/ with a small non-proprietary sample data
set that is bundled along with the program. This data set has already been
published on the public-access NLM/NCBI GEO Web site. The original data set was
proprietary, created for the Group STAT Project (GSP) along
with the original conversion program, CvtTabDelim2HTML, to support the NIH
Jak-Stat Prospector Web site that is part of the Trans-NIH Jak-Stat
Initiative (http://jak-stat.nih.gov/) and will be
accessible when it is opened up to the public in the future. Note: the initial release only includes GSP data that has been released to the public in NCBI's GEO database (currently one data set). As more GSP GEO data is released to the public, we will include some of it in the demo database to illustrate more of the features of HTMLtools.
1. Gather sets of laboratory experiments in multiple laboratories relating to the Jak-Stat gene pathway.
2. Create Affymetrix microarray data (resulting in .CEL data files).
3. Create an inventory of the relevant data and annotation of the data in the GSP-Inventory.xls spreadsheets consisting of: 1) arrays grouped by experiment EG001, EG002, ... EG00n; 2) a top-level spreadsheet ExperimentGroups describing all EG experiments.
4. Consolidate the data in mAdb (Microarray DataBase system, mAdb.nci.nih.gov). Data is uploaded to each EGxxx subproject and normalized by pooled RMA or MAS5.
4.1 Perform t-test or fold-change tests on subsets of the data that make sense to compare, saving the results (+ and - changes separately) as gene subsets.
4.2 Export tab-delimited (Excel) mAdb Retrieval Reports (MRR) for each gene set for 1) just the arrays used in the test; 2) all samples in the database.
4.3 Compute and export the hierarchical clustered heat maps as Java TreeView (JTV) .zip tab-delimited data sets for external viewing for 1) just the arrays used in the test; 2) all samples in the database.
5. Convert the MRR and JTV tab-delimited data to HTML Web pages using the HTMLtools tools.
6. Merge links to this generated data with Web pages on the Jak-Stat Prospector Web server (and upload to the server).

Figure 1. shows an example of a data analysis processing pipeline to convert laboratory microarray data to Web pages that can be used in the Jak-Stat Prospector Web site. Steps 4.1 and 4.2 could be run for a set of experiments as a batch job. Similarly, the set of files exported from mAdb could be batch processed with HTMLtools. Note that although the HTMLtools converter was developed for this project, the command structure is flexible enough that it could easily be used with other types of data.
Section 7 describes creating and running the scripts for the batchScripts/ directory for creating Web pages.
1. Read the mAdb-TestsToDo.txt table that specifies all of the tests to be performed on subsets of the mAdb GSP database. For each test, these include: test name, samples being compared, test thresholds, test name and related annotation, tissue name, and the relative directory for the data (used for both the input InputTree/ and output Analyses/ generated directory trees).
2. Create lists of related tests by grouping tests with the same tissue name.
3. Read additional mapping table files (the ExperimentGroups.map, EGMAP.map, and CellTypeTissue.map tables) to use in generating the summary Web pages.
4. Create summary Web pages for each tissue type with links to the Web pages for the analyses we will generate, and save them in the Summary/ directory.
5. Generate all of the 'params .map' batch scripts, several for each test, and save them in the ParamScripts/ directory (see Figure 3 for details). Then copy all support files (the above mapping tables), JTVjars/, data.Table/ and other files required when running the converter to generate Web pages.
6. Generate a buildWebPages.doit file listing the params .map files to be processed with a subsequent batch run using the HTMLtools converter, and a Windows .BAT file, buildWebPages.bat.
7. Start the buildWebPages.bat batch job, which generates the Web pages in the Summary/, Analyses/ and JTV/ directory trees.
8. Copy the generated Web pages to the Web server.

Figure 2. shows an example of the batch script generation pipeline from a table describing a list of tests that were run as a batch job on another analysis system. In this case, the analysis system is mAdb, and it uses the same test "todo" file, data.Table/mAdb-TestsToDo.txt, to specify the tests as is used here with the GenBatchScripts processing. The mAdb data analysis and the tab-delimited Excel data it generates are shown in steps 4.1 and 4.2 (see Figure 1). In the GenBatchScripts processing, we first create a batchScripts/ directory and then fill it with the various types of data described in this figure.
Tests (MRR & JTV) input -> Converter output

Tests for samples:
  {testName}+FC.txt -> {testName}+FC.html, {testName}+FC-keep.html
  {testName}-FC.txt -> {testName}-FC.html, {testName}-FC-keep.html
AND of above tests for ALL samples:
  {testName}+FC-ALL.txt -> {testName}+FC-ALL.html
  {testName}-FC-ALL.txt -> {testName}-FC-ALL.html
JTV for test samples:
  {testName}+FC-JTV.zip -> {testName}+FC-JTV/, {testName}+FC-JTV.zip, {testName}+FC-JTV.html
  {testName}-FC-JTV.zip -> {testName}-FC-JTV/, {testName}-FC-JTV.zip, {testName}-FC-JTV.html
JTV above for AND of above tests for ALL samples:
  {testName}+FC-ALL-JTV.zip -> {testName}+FC-ALL-JTV/, {testName}+FC-ALL-JTV.zip, {testName}+FC-ALL-JTV.html
  {testName}-FC-ALL-JTV.zip -> {testName}-FC-ALL-JTV/, {testName}-FC-ALL-JTV.zip, {testName}-FC-ALL-JTV.html

Figure 3. shows the set of 8 mAdb results files and the 18 converter-generated HTML and JTV files produced for each test testName in the mAdb-TestsToDo list. For example, if the test is "EG3.1-test-2", then in the above figure replace {testName} with EG3.1-test-2, etc. The "+FC" indicates a positive fold-change, and the "-FC" a negative fold-change. The files with "-keep" are gene lists with no expression data. The GenBatchScripts option for the converter generates parameter .map batch scripts for each of these converted files.
1. Edit the GSP-Inventory Excel workbook to annotate the set of Affymetrix .CEL files, where we assign the next free Experiment Group EGnnn, simple GSP ID, GSP ID, etc.
2. Upload the Affymetrix .CEL file data to the GSP mAdb database and normalize the new samples using the pooled RMA data for the base GSP database.
3. Add the new tests to do in the mAdb-TestsToDo.xls Excel workbook and upload the new test list to mAdb.
4. Run the batch tests in mAdb, resulting in Excel and JTV data sets that are exported for conversion to Web pages.
5. Process these data using the HTMLtools converter into HTML pages and converted data for the Web server.
6. Upload these Web pages and data to the NIDDK Jak-Stat Prospector staging area for the jak-stat.nih.gov server.

Figure 4. shows the top-level procedure used for adding new Affymetrix .CEL file data sets to the GSP database and Jak-Stat Prospector Web server. In addition, if new gene identifications are made to some of the Affymetrix probes (Feature IDs), running steps 4) through 6) can update these identifiers.
Java TreeView (JTV) Documentation

Java TreeView is an open-source (jTreeView.sourceforge.net) Java applet that mAdb uses to view heatmaps of gene sets. We also use Java TreeView for looking at data snapshots we have taken of the mAdb data. Java TreeView may be downloaded to run as either a standalone application or a Java applet from http://jTreeView.sourceforge.net/. The 2004 journal paper by Alok J. Saldanha gives an overview of Java TreeView: "Java Treeview -- extensible visualization of microarray data", Bioinformatics 2004 20(17):3246-3248. The Java TreeView documentation Web page includes links to examples, an FAQ, a user guide, and Alok J. Saldanha's dissertation describing additional aspects of Java TreeView. NOTE: The Java TreeView applet has been shown to work on Mac OSX, XP and Win2K.
java -Xmx256M -classpath .;.\HTMLtools.jar \ HTMLtools -inputDir:dataXXX -outputDir:html (etc.)
java -Xmx256M -classpath .;.\HTMLtools.jar \ HTMLtools dataXXX/paramsXXX.map

or (in Unix, MacOS-X, or Cygwin):

java -Xmx256M -classpath .;./HTMLtools.jar \ HTMLtools dataXXX/paramsXXX.map

or

java -Xmx256M -classpath . -jar HTMLtools.jar \ dataXXX/paramsXXX.map

where the dataXXX/ directory and the paramsXXX.map file are replaced by your data directory and params map file. The generated HTML files will then be in the html/ directory, or whatever output directory is specified by the -outputDirectory switch in the paramsXXX.map file. This command line tells Java to run the program with 256 Mbytes of memory. For very large files, you may need to increase this memory size. For very large data sets, even that may cause problems and you may not be able to convert them, since in the default mode the Table is loaded into memory before being edited. Some commands, such as -fastEditFile, are designed to work with very large files and process them as a buffered I/O pipeline, and so do not load the Table into memory.
java -Xmx256M -classpath .;.\HTMLtools.jar \ HTMLtools -batchProcessing:batchList.doit
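A batch ".doit" file is simply a plain-text list of params .map script paths, one per line, which are run in order. A minimal sketch (the script names here are hypothetical):

   dataXXX/params-tables.map
   dataXXX/params-summary.map
   dataXXX/params-jtv.map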
java -Xmx256M -classpath .;.\HTMLtools.jar \ HTMLtools data.GBS:params-genBatchScripts.map
It is invoked from the command line as:

java -Xmx512M -classpath .;.\HTMLtools.jar HTMLtools -gui

or using the M.S. Windows script (similar for Mac and Linux): cvtTxt2HTML-GUI.bat. This is illustrated in the following screen shots.
Figure G.1 This shows the Initial graphical user interface. Using the File menu, the user should select either a batch ".doit" file or a parameter ".map" script file. The File menu is shown in Figure G.2
Figure G.2 This shows the File menu, from which the user should select either a batch ".doit" file or a parameter ".map" script file.
Figure G.3 This shows the GUI after selecting the script to process. The user then presses the Process button to start processing. The next Figure G.4 shows the program during processing.
Figure G.4 This shows the GUI during processing. The output from the converter is shown in the Report window in the middle of the GUI. This can be saved into a .txt file or cleared if desired. The next Figure G.5 shows the program after processing is finished.
Figure G.5 This shows the GUI after processing. The output from the converter is shown in the Report window in the middle of the GUI. This can be saved into a .txt file or cleared if desired. The next Figure G.6 shows the list of generated HTML files that may be viewed.
Figure G.6 This shows the GUI's list of generated HTML files to choose from for viewing. Selecting one of them will pop up a Web browser with that file (see Figure G.7).
Figure G.7 This shows the popup Web browser window for the selected generated HTML file you chose to view.
However, if you must explicitly run the Java interpreter, you can do it on the command line (invoked various ways on different operating systems) by typing
java -Xmx256M -classpath . -jar searchGui.jar

or
java -Xmx256M -classpath .;.\HTMLtools.jar HTMLtools -searchGui

This line was put into a Windows .BAT file (SearchGUI.bat) that can be run by clicking on the batch file. Notice that the -Xmx256M specification can be increased or decreased to change the amount of memory used. The default memory may vary on different computers, so you can use the script to force the program to start with more or less memory if you run into problems.
You also need to select the set of samples to use by selecting one or more Experiment Groups (see the Jak-Stat Prospector Web site for details on Experiment Groups). In the 2. Select one or more 'Sample Experiment Groups' window, selecting ALL is the default and will select all 18 arrays. You can click on individual Experiment Groups. To select a range, click on the first one that starts the range and then hold the SHIFT key and click the end of the range. To select non-adjacent Experiment Groups, hold the CONTROL key as you select different groups. Pressing the Reset button will clear these two windows.
The File menu offers some additional data input options. You do not need to use any of these menu options to use the program. However, they can be useful for customizing your search results.
You can save the text output generated during processing that is shown in the 3. Processing Report Log scrollable text area at the bottom of the window. Several File menu commands are used with this, including Clear Report-Log and Save Report-Log As. Note that the Clear report and Save report as commands are also available as buttons at the bottom of the window.
You must specify a list of data search terms in the upper window 1. Enter list of Gene, Well ID or Probe ID.... The simplest way to specify these terms is to either cut and paste or type them into the window. To help demonstrate and simplify specifying the search terms, there are two commands in the File menu for setting the list: Set demo term-list data, which enters a short list (Stat5a Stat5b 1438470_at 1441476_at 1446085_at), and Import user term-list data from a file. The file can be a list of Genes, Feature IDs (probes), Well IDs, or any combination. Several example files are provided, including data.search/LitRefGeneList.txt, data.search/testGeneList.txt, and data.search/testFeatureIDList.txt. The first is a tab-delimited file with all 3 fields; the latter two examples just have lists of Genes or Feature IDs.
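For example, a minimal user term-list file could contain one search term per line (a hypothetical file mixing gene names and Feature IDs):

   Stat5a
   Stat5b
   1438470_at
   1441476_at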
After you finish a search, you can do another one. The File menu option Reset converter, or the Reset button at the bottom of the window, will reset the search specification and make the Process button available.
Figure S.2 This shows the menu options in the File menu.
This menu offers additional processing options described above.
Figure S.3 This shows the popup file browser for specifying a list
of gene/probes in a .txt file using the
(File | Import user term-list data from a file) menu option.
If the testGeneList.txt file was selected, the next figure shows the
new term-list.
Figure S.4 This shows the new term-list specified from importing
the gene list from a file (previous figure).
The View menu offers some additional data input
options. The Verbose reporting check box in this menu
may be enabled if you want to see the details of the search and
table generation as they progress in the Report-Log window.
When the search results table is being generated, you can modify
its presentation using other View options:
Sort descending by column data in generated table
(see Figure S.6 for more details);
Show data heat-map in View HTML, which shows the generated results
table as a colored heatmap (see Figure S.14 for
an example) and is the default; and finally Set data precision for generated
HTML, to adjust the number of digits presented in the generated table
(0 sets it to no fraction, whereas the default -1 shows the full precision
available in the data).
Figure S.5 This shows the menu options in the View menu.
This menu offers additional processing options described above.
Figure S.6 This shows the pop-up query that lets you define the sort name,
which specifies the generated table's gene or gene probe ID column to be used for
the sort. This will then use the gene expression data for the gene
probe you specified to sort the sample rows for the entire table. The default
is not to sort the data, but to use the sample order of the samples in the
expression groups you have specified. This pop-up window is invoked from
(View menu | Sort descending by column data in generated table).
Figure S.7 This shows the dialog box for (View menu | Set data precision
for generated HTML). The default is -1, which prints all digits available
in the generated HTML table. Setting it to 0 removes all fractions (as used in
this example).
Figure S.8 This shows the menu options in the List menu.
You may list some of the data matching the gene/probe search terms or
EG sample search terms prior to doing the search. The first option is
to list all 45K gene/probe IDs. The second menu option lets you specify
gene/probe search terms using either the exact gene names or
substrings; all matching genes/probes will be reported. The third menu
option lets you specify EG sample search terms using selected
EG groups from the list. In addition, this is filtered by a list of substrings
which can be qualified as all being required (AND) or any being required (OR)
if the EG sample search terms are specified. All lists are reported in
the bottom scrollable Report window.
Figure S.9 This shows the results from
(List menu | List matching genes in database) in the Report
window.
The genes/probes matching the substring terms "stat" and "jak"
in the 45K probe database are listed in the scrollable
Processing Report log at the bottom of the window.
Figure S.10 This shows the results from
(List menu | List matching EG samples in database) in the Report
window using the OR condition.
The Expression Group (EG) samples matching the substring terms
".treated" or ".untreated" in the 151-sample database are listed
in the scrollable Processing Report log at the bottom
of the window. It searches within the EG sample groups you have
selected. In this example, we have selected "All samples", but any
other subset could be used. Also, we specified an OR
condition to select samples where either of the search terms
is present.
Figure S.11 This shows the results from
(List menu | List matching EG samples in database) in the Report
window using the AND condition.
The Expression Group (EG) samples matching the substring terms
".stat" and ".GH" in the 18-sample database are listed
in the scrollable Processing Report log at the bottom
of the window. It searches within the EG sample groups you have
selected. In this example, we have selected "All samples", but any
other subset could be used. Also, we specified an AND
condition to select samples where both search terms are present.
Figure S.12 This shows the Search window after the search has been
specified and the Process button has been made available, but before
processing has started. Pressing the Process button will start processing.
This will typically take 7 to 10 seconds, so be
patient. Note that the View HTML button is disabled and will
be enabled after processing is completed.
Figure S.13 This shows the Search window after processing
is finished and the View HTML button is made available. Pressing
it will pop up a local Web browser with the data shown in the next
figure. Note that the Process button is now disabled and will
remain so until you reset the converter using the Reset button.
Figure S.14 This shows the generated table Web page created
by the above search and viewed when the View HTML button
was pressed. The colored cells reflect the quantiles that the data belong
to and are based on (max, min, mean, stddev) statistics computed over
the entire database. The data was sorted by the third probe (Stat5b/1422103_a_at)
and the numeric data was listed with no fractions to make it easier to "eyeball"
the data.
Figure S.16 This shows the report generated that includes the intensity data
followed by the fold-change and statistics for that data generated using the data
in the previous figure.
There are three additional subsets of specialized commands that are described
separately: the 3.1 GenBatchScripts
commands, the 3.2 Tests-Intersection commands,
and the 3.3 Java TreeView commands.
Section 7 describes creating and running the scripts for the
batchScripts/ directory for creating Web
pages.
The parameters that specify the data used in generating the output files include
several directories in the batchScripts/
directory:
Additional data files are used when the -genBatchScripts command is run
including:
There may be multiple instances of the -genCopySupportFile, -genParamTemplate,
-genSummaryTemplate switches.
One could experiment with these parameter files adding or removing various
options such as -dropColumn, -reorderColumn, -sortTable, etc.
Adding fold-change statistics to the generated HTML report
The following procedure was used to compare the fold change of the Stat5 subsets for the
specified genes (the demo set of genes/probes was used) for the two sets of
samples, EG003.1 (Stat5KO+GH) and EG003.2 (Stat5KO-GH), called classes A and B
here and in the SearchGUI menus and report.
Procedure
Both HTML and .txt files are generated. Note that the
fold-change results are appended to the regular table, and the class A and class
B samples have those identifiers prefixed to their sample names. The
fold-change report is in the second half of the report, with the statistics reported
being computed on the column data for each gene/probe. Note: sorting can't be
enabled when generating the fold-change report data since it would cause problems with
the reporting format.
Figure S.15 This shows the menu options in the View menu after
the (View | Report Fold Change of 2 sample subsets) option was enabled.
Note the two new commands that are activated: Assign EG samples to Class A and
Assign EG samples to Class B.
e.g., set 2. filter sample search term to ".stat", select EG003.1 in the scrollable
list, then select
(View menu | Assign EG samples to Class A) to define class A samples
e.g., set 2. filter sample search term to ".stat", select EG003.2 in the scrollable
list, then select
(View menu | Assign EG samples to Class B) to define class B samples.
SearchGUI Help
There are several Web pages that contain the documentation in the Help menu.
Figure S.17 This shows the menu options in the Help menu.
This menu offers links to several documentation Web pages. This
document is the first entry, Documentation on using the Search GUI.
3. COMMAND LINE SWITCHES
Command line switches are case-sensitive and of the form '-switchName:a1,a2,...,an'
where 'switchName' is at least the minimum number of characters of the switch
shown below, and 'a1', 'a2', etc. are the comma-separated switch arguments
with no spaces between the commas and the arguments. Use double quotes
around arguments that contain spaces. Tabs are not allowed and all switches must
be on the same line, unless either the switches are in a parameter file, in
which case they are on separate lines, or the command line is entered
using line continuation characters for the operating system (e.g., '\'
in Unix, etc.). Switches with additional arguments require the comma-separated
arguments after the ':'. We denote the arguments as being within '{'...'}'
brackets. Note that you do not include the '{' or '}' brackets in the
actual switches - they just denote that it is an argument.
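For example, assuming a hypothetical input directory dataXXX/ and a column named 'Gene Name', a command line using these conventions might look like:

   java -Xmx256M -classpath .;.\HTMLtools.jar HTMLtools -inputDirectory:dataXXX -outputDirectory:html -sortRowsByColumn:"Gene Name",D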
There may be multiple instances of some of the switch commands
including: -files, -hrefData, -dropColumn, -keepColumn, -reorderColumn,
-sortTableByColumn, -mapDollarsigns, -mapQuestionmarks, -copyFile,
-copyTree, -genCopySupportFile, -genParamTemplate, -genSummaryTemplate,
-genCopyfile, -genTreeCopyData, -dirIndexHtml.
{parameter command file}
[this argument does not start with '-' and is thus
assumed to be a parameter command file. It will then
get all of the command switches from this file if
present. Examples of command file contents are in
the EXAMPLES section below. By convention, we name
these command text files 'paramXXX.map' with a '.map'
file extension and keep them in the same directory
that we specify with -inputDirectory. We refer to these
files throughout this document as "params .map" files.
The .map file extension is used for tab-delimited text
files that we do not want to convert. We only convert
tab-delimited text files with .txt file extensions.]
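As a sketch of this convention, a minimal paramsXXX.map file simply lists one switch per line (all directory and file names here are placeholders):

   -inputDirectory:dataXXX
   -outputDirectory:html
   -files:results1.txt,results2.txt
   -hdrLines:1
   -addProlog:prolog.html
   -addEpilogue:epilogue.html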
-addE:{opt. epilogue file name}
['-addEpilogue:{opt epilogue filename}' add an
epilogue HTML file in inputDir or user directory
(common epilogue for all conversions). If the
keyword $$DATE$$ or $$INPUTFILENAME$$ is in the
file, it will substitute today's date or file name
respectively. $$FILE_ZIP_EXTENSION$$ will substitute
the file name with a ".zip" extension. Default name is
'epilogue.html'. Default is to not add an epilogue to
the HTML output.]
-addO:{postfix name}
['-addOutfilePostfix:{postfix name}' add a postfix
name to the output file before the .html. E.g., for
an output file 'abc.html', with a postfix name of
'-xyz', the new name is 'abc-xyz.html'. This can
be useful if you are mapping the same input file
by several different param.map files and saving
them all in the same html/ output directory.]
-addP:{opt. prolog file name}
['-addProlog:{opt. prolog file name}' to add a prolog
HTML file in inputDir or user directory (common prolog
for all conversions). If the keyword $$DATE$$ or
$$INPUTFILENAME$$ is in the file, it will
substitute today's date or file name respectively.
$$FILE_ZIP_EXTENSION$$ will substitute the file name
with a ".zip" extension. Default name is 'prolog.html'.
The default is to not add a prolog to the HTML output.]
-addRow
['-addRowNumbers' to preface each row with sequential
row numbers. Default is to not add row numbers.]
-addT
['-addTableName' to add TABLE name to HTML. Default
is to not add the name.]
-allowH
['-allowHdrDups' to allow duplicate column fields
in the header. Default is to not allow duplicates.]
-alt:{color name}
['-alternateRowBackgroundColor:{c}' alternate the
background row cell colors in the <TABLE>.
Default is no color changes.]
-batchP:{file of param specs, opt. new working dir}
['-batchProcess:{file of param specs, opt. new working dir}'
batch process a list of param.map type files specified
in a file. If the {opt. new working dir} value is
specified, it will change the current working directory
of HTMLtools when running -batchProcess so
that you can specify it run in a particular environment.
No other switches should be used with this as they will
be ignored. If errors occur in any of the batch jobs,
the errors are logged in the HTMLtools.log file
and it aborts that particular job and continues on
to do the next job in the batch list. Default
is no batch processing.]
-concat:{concatenatedDataFile,opt."noHTML"}
['-concatTables:{concatenatedDataFile,opt."noHTML"}' to
create a new tab-delimited {concatenatedDataFile} (e.g.,
".txt" or ".map" file) and a .html output file using the
base address (without the ".txt" or ".map" file extensions)
of the {concatenatedDataFile} and if the "noHTML" option
is not specified. The data is from the set of concatenated
input text files data if-and-only-if they have exactly the
same column header names. The -outputDir specifies where
the files are saved. The input files are not converted
to HTML files. Default is to not concatenate the
input files. The -makeMapFile switch can be used
along with the concat switch to make a map file with fewer
columns.]
-copyFile:{srcFile,destFile}
['-copyFile:{srcFile,destFile}' to copy an input source
file {srcFile} to a destination file {destFile}.
There can be multiple instances of this option. Default is
to not copy the file.]
-copyTree:{sourceTreeDir,destDir}
['-copyTree:{sourceTreeDir,destDir}' to copy an input
source tree subdirectory to a destination subdirectory.
There can be multiple instances of this option. Default is
to not copy tree data.]
-dataP:{nbr digits precision}
['-dataPrecisionHTMLtable:{nbr digits precision}' sets the
precision to use in numeric data for a generated HTML file.
The table must be a numeric data table (such as generated
using the '-flipTableByIndexMap' option). If the value is < 0,
then use the full precision of the data (as supplied in the input
string data). If {nbr digits precision} >= 0, then clip digits
as required.]
-dirIndexHtml:{dir,'O'verride or 'N'ooverride}
['-dirIndexHtml:{dir,'O'verride or 'N'ooverride}' to create
"index.html" files of all of the files in the specified directories
in the list of directories specified with multiple copies
of this switch. It is useful when copying a set of directories
on a Web server that does not show the contents of the directory
if there is no index.html file. In addition, if the corresponding
flag is 'O'verride, then overwrite the "index.html" file if it
already exists in that directory; otherwise don't regenerate the
"index.html" file. Do this recursively on each directory.
Default is no index.html file generation. Multiple copies
of the switch are allowed.]
-dropColumn:{column header name}
['-dropColumn:{column header name}' to specify a
column to drop from the output TABLE. There can be
multiple instances of this switch.]
-exportB:{opt. big size threshold}
['-exportBigCellsToHTMLfile:{opt. size for big}'
to save the contents of big cells as separate
HTML files with a prefix
'big-R<r>C<c>-<outputFileName>'.
So for a (r,c) of (4,5) and a file name 'xyz.html',
the generated name would be 'big-R4C5-xyz.html'.
The big size threshold defaults to 200. Default is no
exporting of big cells.]
-extractR:{colName,rowNbr,resourceTblFile,htmlStyle}
['-extractRow:{colName,rowNbr,resourceTblFile,
htmlStyle}' to get and lookup a keyword in the
table being processed at (colName,rowNbr) and then
to search a resourceTblFile for that keyword. If
it is found, then it will extract the header row and
the data row from the resource file and create
HTML of htmlStyle to insert into the epilogue.
If $$EXTRACT_ROW$$ is in the epilogue, then
replace it with the generated HTML else insert
the HTML at the front of the epilogue. The
htmlStyles may be DL, OL, UL and TABLE. Default
is no row extraction.]
-fastE:{outTblFile}
['-fastEditFile:{opt. output file}' to process the input
file data line by line without buffering the data in a
Table structure, remapping each
line on the fly using -mapHdrNames,
{-dropColumns or -keepColumns} followed by
-reorderColumns. Data is written immediately to an
output stream so it can handle huge files. Because
it is sequential, it can't do a -sortRowsByColumnData.
This would generally be used to generate a tab-delimited
.txt file that can be randomly accessed. HTML table
generation is disabled. It is used instead of
'-saveEditedTable2File:{outTblFile,opt. "noHTML"}'
and overrides the -saveEditedTable2File options.
Default is not to do a fast edit.]
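A sketch of a fast-edit params .map script that trims and reorders the columns of a very large file without loading it into memory (the column and file names are hypothetical):

   -inputDirectory:dataXXX
   -files:hugeTable.txt
   -dropColumn:Comments
   -reorderColumn:GeneName,1
   -fastEditFile:hugeTable-edited.txt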
-files:{f1,f2,...,fn}
['-files:{f1,f2,...,fn}' to specify a list of files
here rather than all of the files in the
inputDir. You can have multiple instances of this
switch.]
-flipA:{flipAclass}
['-flipAclass:{flipAclass}' to specify the list of EG samples used
in class A if reporting the fold-change data in the flipped Table report.
Default is no list of EG samples is specified.]
-flipB:{flipBclass}
['-flipBclass:{flipBclass}' to specify the list of EG samples used
in class B if reporting the fold-change data in the flipped Table report.
Default is no list of EG samples is specified.]
-flipC:{flipColumnFile,flipColumnName} or -flipC:{*LIST*,flipColumnName,v1,v2,...vn}
['-flipColumnName:{flipColumnFile,flipColumnName}'
to specify the source Table column name to use in
filtering which row data to use in the
'-flipTableByIndexMap' operation. An alternative specification
is '-flipColumnName:{*LIST*,flipColumnName,v1,v2,...vn}'
where the values are listed explicitly. Multiple instances
of this '-flipColumnName' switch are used to specify
the header entries by '{flipColumnName}' of the new
flipped table. If the {flipColumnFile} files exist,
they are used to filter the {flipDataFile} row entries.
Only the rows of the original Table that match one
of the {column-data-list} entries will be transposed.
Default is to transpose all rows unless the filter
files are specified.]
-flipD:{flipDirectory}
['-flipDirectory:{flipDirectory}' to specify the directory to
save the generated flipped Table. Default is the data.search
directory.]
-flipE:{flipExcludeColumnName}
['-flipExcludeColumnName:{flipExcludeColumnName}' to specify the
column names from the source Table to exclude from the final flipped
Table using the '-flipTableByIndexMap' operation. Multiple instances
of this switch are allowed. Default is to include all data Table
columns unless the filter is specified.]
-flipO:{colHdrName1,colHdrName2,...,colHdrNameN}
['-flipOrderHdrColNames:{colHdrName1,colHdrName2,...,colHdrNameN}'
to specify the list of columns in the source Table that will be
used to create the flipped Table multi-line header entries.
This option must be specified when using the '-flipTableByIndexMap'
operation.]
-flipRowF:{flipRowFilterNamesfile} or
-flipRowF:{*LIST*,name1,name2,...,nameK}
['-flipRowFilterNamesFile:{flipRowNamesFile}' or the alternate
'-flipRowFilterNamesFile:{*LIST*,name1,name2,...,nameK}' switch specifies
the source Table column names to use in filtering which source sample
columns' data will be used as rows in the final flipped Table using the
'-flipTableByIndexMap' operation. If the "*LIST*" name is used instead of the file
name, then the rest of the switch specifies the row names explicitly. Only the
columns of the original Table that partially match one of the
{flipRowNamesFile} entries will be transposed. Default is to transpose
all data Table columns unless the filter is specified.]
-flipRowGSP:{list of filter substrings}
['-flipRowGSPIDfilters:{list of filter substrings}' is an optional
list of substring filters used to filter Experiment Group sample name
rows in the flipped table computation when using the
'-flipTableByIndexMap:{flipDataFile,flipIndexMapFile}' switch. It matches
case-independent substrings in the GSP ID names for the samples where
if more than one substring is specified, then they must all be found
for that sample to be used (e.g., ".Stat .GH" requires a ".Stat" and
a ".GH" to be present). Default is no filtering.]
-flipSa:{flipSaveOutputFile}
['-flipSaveOutputFile:{flipSaveOutputFile}' is the alternate
output (HTML and TXT) file name to use when generating the
flipped Table using the
'-flipTableByIndexMap:{flipDataFile,flipIndexMapFile,(opt)maxRows}'
switch. Default is to generate the output file name from the
input file base name, adding a postfix using the
'-addOutfilePostfix:{postfix name}' or "-flipped" default
postfix. If the switch is not specified, it will use the base
input file name. (See Example 14
for an example of its usage.) ]
-flipT:{flipDataFile,flipIndexMapFile,(opt)maxRows}
['-flipTableByIndexMap:{flipDataFile,flipIndexMapFile,(opt)maxRows}'
to generate a transposed file using random access file indexing to
create a multi-line header (1 line for each column name in the
list) using the list of columns previously specified with the
-flipColTableList and -flipRowTableList filters. It uses the index-map
created with '-makeIndexMapFile:{colName1,colName2,...,colNameN}'
command. It analyzes the index map Table and then uses
all columns before the ("StartByte", "EndByte") columns
to define the flipped Table header. See the '-flipColTableList' and
-flipRowTableList to restrict which flipped column data to use.
See the '-flipRowTableList' to restrict which flipped row data to
use. Default is to not flip the Table.]
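As an illustration of how the flip switches combine (assuming the index map was created in an earlier run with '-makeIndexMapFile'; all file, column, and value names here are hypothetical), a flip params .map script might contain:

   -flipTableByIndexMap:allSamples.txt,allSamples.idx
   -flipOrderHdrColNames:Gene,FeatureID
   -flipColumnName:*LIST*,Gene,Stat5a,Stat5b
   -flipRowFilterNamesFile:*LIST*,EG003.1,EG003.2
   -sortFlipTableByColumnName:Stat5a
   -showDataHeatmapFlipTable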
-flipU:{TRUE | FALSE}
['-flipUseExactColumnNameMatch:{TRUE | FALSE}' to specify the exact
match filter flag. If an exact match, then match '-flipColumnName:{names}'
exactly; otherwise look for substring matches. Ignore case in
both instances. This option may be specified when using the
'-flipTableByIndexMap' operation. The default if no
flipUseExactColumnNameMatch is specified is "AND".]
-font:{-1,-2,-3,-4,+1,+2,+3,+4}
['-fontSizeHtml:{font Size modifier}' to change
the <TABLE> FONT SIZE in the HTML file.]
-gui
['-gui' to invoke the graphical user interface version
of the converter. See
Using the Graphical User Interface (GUI) to run the converter.]
-hdrL:{n}
['-hdrLines:{n}' the number of lines to include in the header.
The last header line is the one searched for mapping column URLs.
Default is 1 line.]
-hdrM:{oldHdrColName,newHdrColName}
['-hdrMapName:{oldHdrColName,newHdrColName}' to map
an old header column name {oldHdrColName} to a new
name {newHdrColName}. There may be multiple instances
of this switch. Default is to not do any mappings.]
-joinT:{joinTableFile}
['-joinTableFile:{joinTableFile}' adds the contents of
the {joinTableFile} file to the table being processed.
This allows us to add fields that can be used for
sorting the new table by the {joinTableFile} data
if it is defined. This switch can not be used with
the -fastEditFile option. Default is not to join
any tables.]
-keepColumn:{colName}
['-keepColumn:{colName}' specifies which columns
to keep in multiple instances of the switch.
Then, when the Table is processed, it drops all
columns not listed. It may be used as an
alternative to -dropColumn as the Table may have
unknown column names. Default is not active.]
-help (or '?')
[print instructions to see the README.txt file.]
-hrefD:{colName,Url,mapToken}
['-hrefData:{colHdrName,Url,(optional)mapToken}' to
get the mapping of column header name and the Url to use
as a base link to use for making a URL for Table data
in that column. It makes the URL by appending the data
in cells in that column to the Url. ([TODO] If
the optional mapToken is specified, then replace the cell
contents for the occurrence of the mapToken in Url.)
There can be multiple instances of this switch. See
the following switch '-hrefHeaderRow' to change the mapping
from Table data to header rows. ]
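For example (the column name and base URL here are hypothetical), the following turns each value in a 'GeneSymbol' column into a link by appending the cell data to the base URL:

   -hrefData:GeneSymbol,http://example.org/gene?name=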
-hrefHeaderRow
['-hrefHeaderRowMapping' is used with the above switch
'-hrefData:{colHdrName,Url,(optional)mapToken}' to map
the data in the header row(s) instead of the data in
the Table data columns. It searches the first column of
the header rows to find the colHdrName to determine
the row to be mapped to that colHdrName. Unlike the
-hrefData option, the colHdrName can be embedded within
a string. The default is not to map the header rows.]
-inputD:{input directory}
['-inputDirectory:{input dir}' where the input
tab-delimited table .txt files to be converted are
found. By convention, we name other text files that
we may need, and want to keep in the inputDirectory
but do not want to convert to HTML, with a '.map'
file extension. Examples of non-data files include
'paramXXX.map', 'prolog.html', 'epilogue.html', etc.
Default directory is 'data/'.]
-limitM:{maxNbrRows,(opt.)sortFirstByColName,(opt.)'A'scending or 'D'escending}
['-limitMaxTableRows:{maxNbrRows,(opt.)sortFirstByColName,
(opt.)'A'scending or 'D'escending}' to limit the number of
rows of a table to {maxNbrRows}. If the {sortFirstByColName}
is specified, then sort the table first before limiting the
number of rows. Default is not to limit rows.]
-log:{new log file name}
['-logName:{new log file name}' to log all
information about the processing to the console and
then to save this output in a log file. The new file
must end in ".log". Default is to use the
"HTMLtools.log" file name.]
-makeI:{colName1,colName2,...,colNameN}
['-makeIndexMapFile:{colName1,colName2,...,colNameN}' to
make an index map Table file (same name as the input file
but with an .idx file extension) of the input file (or the
file output from -saveEditedTable2File after the input
table has been edited). The index file will contain the
specified columns in the column-list followed by the
StartByte, EndByte for data in the input table with those
column values. This file can then be used to quickly
index a huge input file probably using a Hash table of
the selected column names instances to lookup the
(start,end) file byte pointers to random access the
large file. The software to use the index file is not
part of HTMLtools at this time.
The default is not to make an index map file.]
-makeM:{makeMapTblFileName,orderedCommaColumnList}
['-makeMapFile:{makeMapTblFileName,orderedCommaColList}'
used with the -concatTables command to also make a map
file at the same time. This switch is only used
with -concatTables. Default is no map is made.]
-makeP
['-makePrefaceHTML' to make a separate preface
HTML file from the input text preceding the table
data. The file has the same name, but has a
"preface-" added to the front of the file name. The
first generated HTML file is then linked from the
second generated file. Default is no preface file.]
-makeS
['-makeStatisticsIndexMapFile' to make a 'Statistics Index Map'
table file with the same base file name as the index map (.idx)
but with a .sidx file extension. It is invoked after the
IndexMap file is created (using the '-makeIndexMapFile' switch).
Therefore, it must be specified in a subsequent command line
(if using batch). Default is not to make a Statistics Index Map.]
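For example, assuming hypothetical script names, a first script containing '-makeIndexMapFile:Gene,FeatureID' would create the .idx file, and a second script containing '-makeStatisticsIndexMapFile' would then create the .sidx file; the batch list would run them in order:

   dataXXX/params-make-index.map
   dataXXX/params-make-stats.map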
-mapD:{$$keyword$$,toString}
['-mapDollarsigns:{$$keyword$$,toString}' to
map cell data of the form '$${keyword}$$' to
{toString}. The preface, epilogue as well as the
table cell data is checked to see if any keywords
should be mapped. There may be multiple instances
of this switch. Default is to not do any mappings.]
-mapH:{mapHdrNamesFile,fromHdrName,toHdrName}
['-mapHdrNames:{mapHdrNamesFile,fromHdrName,toHdrName}'
to map header names. E.g., map long to short header
names, or map obscure to well-defined header names.
The map file (specified with a relative path) is
tab-delimited and must contain both the {fromHdrName}
and {toHdrName} entries. Default is no mapping.]
-mapO
['-mapOptionsList' to map ;; delimited strings
to inactive <OPTION> pull-down option lists.
Default is no mapping to option lists.]
-mapQ:{??keyword??,toString}
['-mapQuestionmarks:{??keyword??,toString}' to
map cell data of the form '??{keyword}??' to
{toString}. If the toString is BOLD_RED,
BOLD_GREEN, or BOLD_BLUE, then just map the
whole ??{keyword}?? string to bold red (green,
or blue). The preface, epilogue as well as the
table cell data is checked to see if any keywords
should be mapped. There may be multiple instances
of this switch. Default is to not do any mappings.]
-noB
['-noBorder' to set no border for tables. The
default is there is a 'BORDER=1' in the TABLE.]
-noHeader
['-noHeader' set no header for tables. The
default is there is a header in input file.]
-noHTML
['-noHTML' set to not generate HTML if it would
normally do so. This switch disallows generation of
HTML when doing input file processing if that
operation also allows HTML generation. This is useful
when editing large input files to generate
index maps or saved files. The default is to allow
the generation of HTML.]
-outputD:{output directory}
['-outputDirectory:{output directory}' to set the
output directory. The default directory is 'html/'.]
-reorderC:{colName,newColNbr}
['-reorderColumn:{colName,newColNbr}' to reorder
this column to the new column number. You may
specify multiple new columns (they must be
different). Those columns not specified are moved
toward the right. This is done after the list of
dropped columns has been processed. There can be
multiple instances of this switch. Default is not
to reorder columns.]
-reorderR
['-reorderRemainingColumnsAlphabeticly' if doing
a set of -reorderColumn operations, sorts the remaining
columns (those used but not explicitly specified) alphabetically.
Default is not to sort the remaining columns.]
-rmvT
['-rmvTrailingBlankRowsAndColumns' in the table.
Default is not to remove trailing blank lines or
trailing blank columns.]
-saveE:{outTblFile,opt. "HTML"}
['-saveEditedTable2File:{outTblFile,opt. "HTML"}'
to make a Table file from the modified input
table stream. It is created after the Table is
edited by -dropColumns, -keepColumns,
-reorderColumns, -sortRowsByColumn. If the outTblFile
is not specified (i.e., ":,") then the input file name
with the name from the input file with the postfile
name from the '-addOutfilePostfix:{postfix name}' is
used. If the "HTML" option is set, it also outputs the
HTML when doing this operation. Note that the switch
should not be used with '-fastEditFile:{opt. output file}'
which can be used for converting very large files
without generating the HTML file. Default is not to save
the Table.]
-searchGui
['-searchGui' to invoke the graphical user interface for the
database search engine to generate a flip table. See
Search Database GUI generating specialized reports.
Also see Example-17
for examples of the default parameter file used as the
basis of the flip table generated. Default is no search GUI.]
-shrinkB:{opt. size for big,opt. font size decrement}
['-shrinkBigCells:{opt. size for big,opt. font size
decrement}' in the Table with more than the big
threshold number of characters/cell by decreasing
the font size to -5 (or the opt. font size
decrement) for those cells. The big size threshold
defaults to 25 characters. Setting the threshold to
1 forces all cells to shrink. Default is not to
shrink cells.]
-showDataHeatmapFlipTable
['-showDataHeatmapFlipTable' used to generate colored heat-map
data cells in a HTML conversion for a flip table using the
'-flipTableByIndexMap' option. It uses the global statistics
on the (digital) data in the Statistics Index Map .sidx file
if it exists to normalize the data and generate a cell color
background range in 7 quantiles of colors: dark green,
medium green, light green, white, light red, medium red, dark red.
Default is not to generate the colored heatmap.]
-sortFlip:{col data name}
['-sortFlipTableByColumnName:{col data name}' specifies the name
of the field in the flip table to use in sorting by column data in
descending order in the generated table. It is used with the
'-flipTableByIndexMap' option. Note this name can be any of the
flipped header column values (multi-header data names). When doing
the sort it matches the specified name with any of the header
rows to find the column to use for the sort. Default is not to
sort the generated flip table.]
-sortR:{colName,'A'scending or 'D'escending}
['-sortRowsByColumn:{colName,'A'scending or 'D'escending}' to sort
the rows by the specified column.
You can specify 'A'scending or 'D'escending. This is done after
any columns have been dropped or reordered. If the column is not
found, don't sort - just
continue. You can have multiple instances of this switch. If the
first column name is not found, it looks for the second, etc.,
and only skips the sort if no column names are found. Default
is not to sort the table.]
-startT:{keyword}
['-startTableAtKeywordLine:{keyword}' specifies the start
of the last line of a Table header by a keyword that
is part of any of the fields in that line. This is
useful when reading a file with complex preface info
with possibly multiple blank lines. It can be used
with the '-hdrLines' switch to specify multiple
header lines. Default no keyword search.]
-tableD:{tablesDirectory}
['-tableDir:{tablesDirectory}' to set the various mapping
tables directory. These tables are used during various
conversion procedures. They include both the .txt and
the .map file (same file, but with different extensions).
Examples include: EGMAP.map(.txt), ExperimentGroups.map(.txt),
mAdbArraySummary.map(.txt). The default directory is
'data.Table/'.]
-useOnly
['-useOnlyLastHeaderLine' to reduce the number of header
lines to 1 even if there are more than 1 header line.
Default is to use all of the header lines.]
3.1 GenBatchScripts COMMANDS Extension
These commands are used to create batch scripts for subsequent use by
HTMLtools. This set of commands is called the GenBatchScript commands.
The GenBatchScript process is described in
Section 1.1. The -genBatchScript command is only used to generate these
batch scripts in a set of structured trees suitable for copying directly to a
Web server. It uses a test-ToDo-list.txt Table to specify a list of tests, column
"Test-name", a column "Relative directory" where the data is to be saved and some
documentation columns "Page label", "Page description", and "Tissue name"
that are used for helping generate the Summary HTML Web pages and
params .map files used in the subsequent conversion of the .txt Table
data files to HTML documentation. See Example 15
for an example of a params .map file using the GenBatchScripts commands.
-genBatch:{batchDir,paramScriptsDir,inputTreeDir,summaryDir,analysisDir}
['-genBatchScripts:{batchDir,paramScriptsDir,inputTreeDir,
outputTreeDir,analysisTreeDir,JTVDir}' to generate a set
of scripts to batch convert a set of tab-delimited Table
test data files specified by the -genTestFile:{testToDoFile}
Table in the {batchDir} directory. It generates a set of
parameter .map files in the {paramScriptsDir} directory. It
also generates a set of summary HTML Web pages in {summaryDir}
that describe the data, one page for each type of tissue,
and (pre) generates links to data that will be generated in
the {analysisTreeDir} when the batch script is subsequently
run. These new params .map files can then be run by a converter
batch file called buildWebPages.doit started with a
Windows buildWebPages.bat BAT file to start the batch
job (both files are in the batchDir directory along with a
copy of HTMLtools.jar). The buildWebPages.bat file
could easily be edited to run on MacOS-X or Linux. The paths
created in the {inputTreeDir}, and {analysisTreeDir} base paths
use the "Relative Directory" data in the {testToDoFile} within
those directories. This generated batch .doit script will
process a data set to generate a set of HTML pages and
converted database .txt files defined by the {testToDoFile}
Table database. Default is no batch script generation.
Additional switches required with -genBatchScripts are:
-genTestFile, -genMapEGdetails, and -genMapEGintroduction]
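A sketch of the core of a params-genBatchScripts.map script showing how the required switches fit together (the directory and file names here are placeholders, not required values):

   -genBatchScripts:batchScripts,ParamScripts,InputTree,Summary,Analyses
   -genTestFile:mAdb-TestsToDo.txt
   -genMapEGdetails:EGMAP.map
   -genMapIntroduction:CellTypeTissue.map
   -tableDir:data.Table
   -outputDir:batchScripts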
-genC:{support file}
['-genCopySupportFile:{support file}' to specify a list
of support files to copy to the output batchDir (e.g.,
'-outputDir:batchScripts'). The support files are
specified with a list created using multiple instances of
-genCopySupportFile:{support file}. Default is no support
files to copy.]
-genMapEGd:{EGdetailsMapFile}
['-genMapEGdetails:{EGdetailsMapFile}' specifies the
'details' Table used when the -genBatchScripts switch is
invoked. This is required when the -genBatchScripts switch
is used.]
-genMapIntro:{introductionMapFile}
['-genMapIntroduction:{introductionMapFile}' specifies the
'Introduction' Table used when the -genBatchScripts switch
is invoked. This is required when the -genBatchScripts
switch is used.]
-genP:{name,paramTemplateFileName}
['-genParamTemplate:{name,paramTemplateFileName}' to
specify a list of parameter map templates that are used
for dynamically mapping the test-ToDo-list data into the
(param-MRR, param-MRR-keep, param-JTV), etc., scripts. The
following keywords are mapped if they appear in
any of these templates: $$TISSUE$$, "$$TEST_NAME$$",
"$$MRR_FILE$$", $$DESCRIPTION$$, $$PROLOG$$, $$EPILOG$$,
$$DATE$$. Multiple unique instances are allowed. The
default is no parameter templates.]
-genS:{orderNbr,templateFileName}
['-genSummaryTemplate:{orderNbr,templateFileName}' to define
a list of Summary Templates that are used for dynamically mapping the
test-ToDo-list data into the (summaryProlog, summaryExperimental,
summaryAnalysis, summaryFurtherAnalysis, summaryEpilogue),
etc., sections. Multiple -genSummaryTemplate:{orderNbr,
templateFileName} instances can be used to generalize
the currently hardwired templates. These are then mapped into the
following keywords that may appear in any of these templates:
$$TISSUE$$, $$LIST_EXPR_GROUPS$$, $$DESCRIPTION$$,
$$ANALYSIS$$, $$FURTHERANALYSIS$$, $$DATE$$. The
$$INTRODUCTION$$ is extracted from the {"CellTypeTissue.map"}.
Default is no templates being defined. Multiple instances
are allowed, where they are concatenated by the orderNbr
associated with each template.]
-genTest:{testToDoFile}
['-genTestFile:{testToDoFile}' specifies the tests to do
when the -genBatchScripts switch is invoked. This is
required when the -genBatchScripts switch is used.]
-genTree:{sourceTreeDir,destDir}
['-genTreeCopyData:{sourceTreeDir,destDir}' to copy an
input data tree to the batch scripts subdirectory.
There can be multiple instances of this option.
Default is to not copy tree data.]
3.2 Tests-Intersection COMMANDS Extension
These Tests-Intersections subset of commands are only used to create
Tests-Intersection tables from mAdb Retrieval Reports (MRR) containing fold-change
data from the Tests-ToDo database used with GenBatchScripts. The primary command to
invoke this is the makeTestsIntersectionTbl switch. These Tests-Intersection commands
can be used with the regular HTML or table editing commands such as
the '-noHTML' and/or '-saveTable' switches. If HTML is generated, then the
'-addProlog', '-addEpilogue', '-mapQuestion', '-mapDollar', '-sortByColumn',
'-limitMaxTableRows', etc. switches may also be used. See Example 13 for
an example of generating a Tests-Intersection tab-delimited table and HTML
Web page.
-addFCranges
['-addFCrangesForTestsIntersectionTable' may be used when
generating a table Tests-Intersection Table using the
'-makeTestsIntersectionTbl:{testsToDoFile}'. This switch
does a simple fold-change (FC) row analysis after the
Tests-Intersection Table is created by adding ("Min FC"
"Max FC" "FC Range") data for each row. Because this
extends the table, you can sort by any of these fields.]
-addRange
['-addRangeOfMeansToTItable' to add the ("Range Mean A",
"Range Mean B" and "FC counts %") computations to an expanded
Tests-Intersection Table. Default is to not add these
fields.]
-filterData:{dataTableField,d1,d2,...,dn}
['-filterDataTestIntersection:{dataTableField,d1,d2,...,dn}' that
is used with the '-makeTestsIntersectionTbl:{testsToDoFile}' to
filter the MRR rows using the specified MRR {dataTableField}
and use it if it matches any of {d1,d2,...,dn} substrings.
The default is not to filter the Tests-Intersection Table.]
-filterTest:{testTableField,d1,d2,...,dn}
['-filterTestTestIntersection:{testTableField,d1,d2,...,dn}' that
is used with the '-makeTestsIntersectionTbl:{testsToDoFile}' to
filter the Tests-ToDo table rows using the specified {testTableField}
and use it if it matches any of {d1,d2,...,dn} substrings.
The default is not to filter the Tests-Intersection Table.]
-makeT:{testsToDoFile,testsInputTreeDir}
['-makeTestsIntersectionTbl:{testsToDoFile}'
generates a Tests-Intersection Table that contains
data from the individual tests in the tests input data tree
specified by the tests in the {testsToDoFile} (located in the
-tableDir directory), which specifies the relative data file tree.
The tree is found in the -inputDir directory. The data files in
the tree are used as input data. The computed table is
organized by rows of +FC genes/Feature-IDs and -FC
genes/Feature-IDs. The data from the {testsToDoFile} is used
to get additional information for each test as follows.
This switch is used with the '-noHTML' and/or '-saveTable'
switches. If HTML is generated, then the '-addProlog' and
'-addEpilogue', '-mapQuestion', and '-mapDollar' can be used.
You can filter the MRR rows using the
'-filterDataTestIntersection:{dataTableField,d1,d2,...,dn}' and
the {testsToDoFile} test data using the
'-filterTestTestIntersection:{testTableField,d1,d2,...,dn}'.
The default is not to make the Tests-Intersection Table. You
can do a simple FC row analysis by adding ("Min FC" "Max FC"
"FC Range") for each row using the
'-addFCrangesForTestsIntersectionTable' switch.]
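The fold-change row analysis added by '-addFCrangesForTestsIntersectionTable' amounts to a
per-row minimum, maximum, and range over the fold-change values. The following minimal Java
sketch illustrates that arithmetic only; it is not the actual HTMLtools implementation, and
it assumes "FC Range" is simply the maximum minus the minimum:

// Illustrative sketch of the per-row ("Min FC", "Max FC", "FC Range") analysis;
// not the actual HTMLtools code.
public class FCRangeSketch {
    // Returns {minFC, maxFC, rangeFC} for one row of fold-change values.
    static double[] fcRange(double[] foldChanges) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double fc : foldChanges) {
            if (fc < min) min = fc;
            if (fc > max) max = fc;
        }
        return new double[] { min, max, max - min };   // assumed definition of "FC Range"
    }

    public static void main(String[] args) {
        double[] row = { 1.8, -2.4, 3.1 };             // hypothetical FC values for one gene
        double[] r = fcRange(row);
        System.out.printf("Min FC=%.2f  Max FC=%.2f  FC Range=%.2f%n", r[0], r[1], r[2]);
    }
}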
3.3 Java TreeView COMMANDS Extension
These commands are only used to convert Java TreeView (JTV) mAdb heatmap data
files for use on the Jak Stat Prospector Web site. In addition to mapping the
sample names from mAdb names to GSP ID names, it reorders the data so that the
"gene - gene description" appears first rather than the "WID: #" in the "NAME"
field. It also changes the contents of the "YORF" field data to the
"gene - gene description" data so that when mousing over a heatmap cell in
the zoom window, the upper left-hand corner displays the "gene - gene description"
for the row and the GSP ID sample for the column.
Reorder:
"WID:... || xxxxxx_at || MAP:... || gene -- geneDescr. || RID:..."
to
"gene -- geneDescr. || xxxxxx_at || WID:... || MAP:... || RID:..."
You cannot mix tab-delimited-file-to-HTML conversions with JTV conversions in
the same params .map file.
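The NAME field reordering shown above is a simple reshuffling of the ' || '-separated fields
so that the "gene -- geneDescr." entry comes first. The following minimal Java sketch
illustrates that string manipulation only; it is not the actual HTMLtools code, and it assumes
the fields always arrive in the five-field order shown above (the example values are hypothetical):

// Illustrative sketch of the NAME field reordering described above; not the
// actual HTMLtools implementation.
public class JTVNameReorderSketch {
    static String reorderName(String name) {
        String[] f = name.split(" \\|\\| ");   // {WID:..., xxxxxx_at, MAP:..., gene -- geneDescr., RID:...}
        if (f.length < 5) return name;         // leave unexpected formats unchanged
        return String.join(" || ", f[3], f[1], f[0], f[2], f[4]);
    }

    public static void main(String[] args) {
        // hypothetical example NAME string
        String in = "WID:12345 || 1418507_s_at || MAP:... || Socs2 -- suppressor of cytokine signaling 2 || RID:...";
        System.out.println(reorderName(in));   // the "gene -- geneDescr." field is now first
    }
}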
-jtvB:{button name for JTV activation button}
['-jtvButtonName:{button name for JTV activation button}'
that may be used with '-jtvHTMLgenerate' to label the
button to activate Java TreeView. The default is
"Press the button to activate JTV".]
-jtvC:{JTV jars directory}
['-jtvCopyJTVjars:{JTV jars directory}' to copy the
JTV jar files and plugins to the jtvOutputDir.
The default is no copying of the .jar files.]
-jtvD:{description text for prologue}
['-jtvDescription:{description text for prologue}'
that may be used with '-jtvHTMLgenerate' to insert
additional text into the prolog where it replaces
$$DATA_DESCRIPTION$$. The default is no description.]
-jtvFiles:{f1,f2,...,fn}
['-jtvFiles:{f1,f2,...,fn}' to specify a list of files
to process rather than all of the files in the
jtvInputDir. You can have multiple instances of this
switch.]
-jtvH:
['-jtvHTMLgenerate' to generate a HTML file to invoke
the JTV applet for each JTV specification in the
jtvInputDir. It puts the HTML file in the jtvOutputDir.
Some of the non-JTV HTML modification switches are
operable including: '-addEpilogue', '-addOutfilePostfix',
'-addProlog', '-mapQuestionmarks'. The default is to not
generate JTV HTML.]
-jtvI:{input JTV directory}
['-jtvInputDir:{input JTV directory}' to set the
input directory of JTV sub directories. This contains
the zipped or unzipped JTV files downloaded from mAdb.
Each zip file contains 3 files with (.atr,.cdt,.gtr)
extensions. Default directory is 'JTVinput/'.]
-jtvO:{output JTV directory}
['-jtvOutputDir:{output JTV directory}' to set the
output directory of JTV sub directories. The converted
JTV directory and a corresponding HTML file are saved
there. Default directory is 'JTVoutput/'.]
-jtvN:{mAdbArraySummary,mapHdrNamesFile,fromHdrName,toHdrName}
['-jtvMapping:{mAdbArraySummaryFile,mapHdrNamesFile,
fromHdrName,toHdrName}' to convert a list of sub
directories of JTV file sets by reading the three files
from each of the subdirectories in the jtvInputDir
directory. The {mAdbArraySummaryFile} and {mapHdrNamesFile}
are specified with a relative path. It maps the .cdt file
in each sub directory to use the {toHdrName} column of
the equivalent {mapHdrNamesFile} map Table instead of
the "EID:'mAdb ID'" as generated by mAdb. The mapping
between "mAdb ID" and short array names is done using
the {fromHdrName} column of the jtv_mAdbArraySummaryFile
Table map. It then writes out the JTV subset to a created
sub directory in jtvOutputDir that has the same base
name as the input JTV subdirectory being processed.
See the optional switches: '-jtvInputDir:{jtvInputDir}'
and '-jtvOutputDir:{jtvOutputSubDir}' to set the
directories to other than the defaults ("JTVinput" and
"JTVoutput"). The values for {fromHdrName} and {toHdrName}
should be column names in the {mapHdrNamesFile}.]
-jtvR [TODO]
['-jtvReZipConvertedFiles' to reZip the converted files
in the output JTV directory in a file with the same name.
Default is not to zip the converted files.]
-jtvTableDir:{tablesDirectory}
['-jtvTableDir:{tablesDirectory}' to set the various mapping
tables directory. These tables are used during various
conversion procedures. They include both the .txt and
the .map file (same file, but with different extensions).
Examples include: EGMAP.map(.txt), ExperimentGroups.map(.txt),
mAdbArraySummary.map(.txt). Note: this switch is used when
processing JTV files, but may also be set with the
'-tableDir:{tablesDirectory}' switch. The default directory
is 'data.Table/'.]
4. EXAMPLES
We demonstrate running the program with a set of examples (and a few sub
examples). The first eight (1 through 8) are for converting tab-delimited .txt
files to .html files. Example 9 illustrates remapping sample labels for a Java
TreeView conversion. Example 10 shows how these examples can be run by specifying
a list of parameter .map files using a batch command. Example 11 illustrates
editing a very large .txt file into another .txt file using the fast edit command.
Example 12 illustrates URL mapping the header data in a transposed table.
Example 13 generates a Tests-Intersection .txt and .html table from the tests data
also used in Example 15. Example 14 illustrates generating a flipped table with
hyperlinked multi-line headers with data filtered by row and column name filters.
Example 15 illustrates generating a set of batch jobs to
convert data described in a table file, generating summary Web pages and a set of params
.map files in a tree structure. Example 16 is used for preparing a database
and Index Map files for use in the GUI-based database search shown
in Example 17.
Example 1.
The program with no arguments uses the defaults described above. It will look
for tab-delimited .txt files in the default input directory (data/) and save
the generated HTML files in the output directory (html/). It also looks for
the default template files prolog.html and epilogue.html in the current directory.
================================================================
HTMLtools
Example 2.
The defaults for Example 1 are shown explicitly here. [The '\' indicates
line continuations in Unix for ease of reading here, but they should all be
on the same line when the command is issued from the command line unless
line continuation characters are used for your particular operating system.]
================================================================
HTMLtools -addProlog:prolog.html -addEpilogue:epilogue.html \
-inputDir:data -outputDir:html -tableDir:data.Table
Example 3.
This gets the arguments from the default data/params.map file if
it exists. These params .map files are generally kept in the same
directory as the .txt input files to be converted. We could use any other
file extension except .txt (since we are converting all .txt files found in
the data directory). So by convention we use the .map file extension instead.
================================================================
HTMLtools data/params.map
Example 4.
This uses a simple spreadsheet with row background colors
alternated between the prolog background color and white; big
cells have their font shrunk, and trailing blank rows are removed.
The switches are in file 'params-GSPI-EG.map'. The
'-extractRow:"Experiment Group ID (1),1,data.Table/ExperimentGroups.map,DL"'
switch tells it to extract the row whose data in column
"Experiment Group ID (1)" matches the column of the same name
in the ExperimentGroups.map file (row 1) and generate a
<DL> list in the epilogue. This lets you use data from
a meta-database table to document each of the individual tables being
converted.
E.g., GSP-Inventory.xls EG samples
data saved from Excel worksheets. Note: you must double quote arguments that use
spaces.
===============================================================
HTMLtools data.GSPI-EG/params-GSPI-EG.map
where: data.GSPI-EG/params-GSPI-EG.map
contains:
#File:params-GSPI-EG.map
#"Revised: 3-30-2009"
#
-addPrologue:data.GSPI-ExpGrp/prolog.html
-addEpilogue:data.GSPI-ExpGrp/epilogue.html
-addRowNumbers
-addTableName:"GSP Experiment Group Samples"
-inputDir:data.GSPI-EG
-outputDir:html/GSP/GSP-Inventory/HTML
-tablesDir:data.Table
#
-extractRow:"Experiment Group ID (1),1,data.Table/ExperimentGroups.map,DL"
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-rmvTrailingBlankRowsAndColumns
#
#"----------- End --------- "
Example 4.1
Example to generate a concatenated .txt file from a
set of simple spreadsheets. Row background colors are alternated,
big cells have their font shrunk, and trailing blank rows are removed.
E.g., build a single table from the set of EG001.txt to EG0nn.txt files
(each with single-row headers) saved from the GSP-Inventory.xls
Excel worksheets. They are concatenated to the file "EGMAP.txt".
The switches are in file 'params-GSPI-EG-concat.map'.
Note: you must double quote arguments that use spaces.
==================================================================
HTMLtools data.GSPI-EG/params-GSPI-EG-concat.map
where: data.GSPI-EG/params-GSPI-EG-concat.map
contains:
#File:params-GSPI-EG-concatTXT.map
#"Revised: 3-29-2009"
#
-addPrologue:data.GSPI-EG/prolog.html
-addEpilogue:data.GSPI-EG/epilogue.html
-addRowNumbers
-addTableName:"GSP Inventory Concatenated List of all EG Samples"
-inputDir:data.GSPI-EG
-outputDir:data.Table
-tablesDir:data.Table
#
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-rmvTrailingBlankRowsAndColumns
#
#"Save the concatenated data in the following file."
#-concatTables:EGALLDataSet.txt
-concatTables:EGMAP.txt,noHTML
#
#"----------- End --------- "
Example 4.2
This example extends Example 4.1 to generate a HTML file of a concatenated
set of .txt files. They are concatenated to the "EGMAP.html" file.
Note the use of
the 'noTXT' argument in the '-concatTables' switch. Row background colors
are alternated, big cells have their font shrunk, and trailing blank rows are
removed. The switches are in file
'data.GSPI-EG/params-GSPI-EG-concatHTML.map'. Note: you must double quote
arguments that use spaces.
==================================================================
HTMLtools data.GSPI-EG/params-GSPI-EG-concatHTML.map
where: data.GSPI-EG/params-GSPI-EG-concatHTML.map
contains:
#File:params-GSPI-EG-concatHTML.map
#"Revised: 3-29-2009"
#
-addPrologue:data.GSPI-ExpGrp/prolog.html
-addEpilogue:data.GSPI-ExpGrp/epilogue.html
-addRowNumbers
-addTableName:"GSP Inventory Concatenated List of all EG Samples"
-inputDir:data.GSPI-EG
-outputDir:html/GSP/GSP-Inventory/HTML
-tablesDir:data.Table
#
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-rmvTrailingBlankRowsAndColumns
#
#"Save the concatenated data in the following file."
#-concatTables:EGALLDataSet.txt
-concatTables:EGMAP.txt,noTXT
#
#"----------- End --------- "
Example 4.3
This example (which extends Example 4.1) generates
a map file from the concatenated .txt file built from a set of simple spreadsheets,
using the '-concatTables' switch. The generated file is saved in file
data.Table/EGMAP.map. Alternatively, a .map file could be specified with
the '-mapHdrNames' switch to restrict the columns that appear in the
generated .map file. There is no HTML file generated. The switches are in file
'data.Maps/params-Maps-EGMAP-map.map'. Note: you must double
quote arguments that use spaces.
==================================================================
HTMLtools data.Maps/params-Maps-EGMAP-map.map
where: data.Maps/params-Maps-EGMAP-map.map
contains:
#File:params-Maps-EGMAP-map.map
#"Revised: 3-30-2009"
#"Generate the EGMAP.map file, but no HTML file."
#
-addRowNumbers
-addTableName:"Concatenation of all GSP Experiment Groups tables."
-inputDir:data.Table
-outputDir:data.Table
-tablesDir:data.Table
-files:"EGMAP.txt"
#
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-rmvTrailingBlankRowsAndColumns
#
-concatTables:EGMAP.map,noHTML
#
#"---------- end ---------"
Example 5.
This uses a simple spreadsheet, mapping some of the
cells to colored bold fonts. Row background colors are alternated, big
cells have their font shrunk, and trailing blank rows are removed.
The switches are in file 'params-GSPI-ExpGrp.map'.
E.g., GSP-Inventory.xls
'ExperimentGroups' sheet describing samples data saved from Excel
worksheets. Note: you must double quote arguments that use spaces.
==================================================================
HTMLtools data.GSPI-ExpGrp/params-GSPI-ExpGrp.map
where: data.GSPI-ExpGrp/params-GSPI-ExpGrp.map
contains:
#File:params-GSPI-ExpGrp.map
#"Revised: 3-30-2009"
#
-addPrologue:data.GSPI-ExpGrp/prolog.html
-addEpilogue:data.GSPI-ExpGrp/epilogue.html
-addRowNumbers
-addSubTitleFromInputFile
-addTableName:"GSP Experiment Groups Details"
-files:"ExperimentGroups.txt"
-inputDir:data.Table
-outputDir:html/GSP/GSP-Inventory/HTML
-tablesDir:data.Table
#
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-rmvTrailingBlankRowsAndColumns
#
-mapOptionsLists
-mapQuestionmarks:WHO,BOLD_RED
-mapQuestionmarks:WHAT,BOLD_RED
-mapQuestionmarks:WHEN,BOLD_RED
#
#"----------- End --------- "
Example 5.1
This example is an extension of Example 5., except
that it uses the '-exportBigCellsToHTMLfile:200' to export large cells with more
than 200 characters to separate small HTML files and generate hyperlinks
to those small files in the affected cells. This makes the spreadsheet more
readable when there are some cells that have a large number of characters.
The switches are in file 'params-GSPI-ExpGrp-exportBigCells.map'.
E.g., GSP-Inventory.xls 'ExperimentGroup'
samples data saved from Excel worksheets. Note: you must double quote arguments
that use spaces.
==================================================================
HTMLtools data.GSPI-ExpGrp/params-GSPI-ExpGrp-exportBigCells.map
where: data.GSPI-ExpGrp/params-GSPI-ExpGrp-exportBigCells.map
contains:
#File:params-GSPI-ExpGrp-exportBigCells.map
#"Revised: 3-30-2009"
#
-addOutfilePostfix:"-BRC"
-addPrologue:data.GSPI-ExpGrp/prolog.html
-addEpilogue:data.GSPI-ExpGrp/epilogue.html
-addRowNumbers
-addSubTitleFromInputFile
-addTableName:"GSP Experiment Groups Details"
-inputDir:data.Table
-outputDir:html/GSP/GSP-Inventory/HTML
-tablesDir:data.Table
-files:"ExperimentGroups.txt"
#
-alternateRowBackgroundColor:white
-shrinkBigCells:25,-5
-mapOptionsLists
-mapQuestionmarks:WHO,BOLD_RED
-mapQuestionmarks:WHAT,BOLD_RED
-mapQuestionmarks:WHEN,BOLD_RED
-rmvTrailingBlankRowsAndColumns
#
-exportBigCellsToHTMLfile:200
#
#"----------- End --------- "
Example 6.
Example using a 2 sub-table spreadsheet with the first and
second tables being separated by a blank line. This example also
allows 2-line headers, dropping some of the columns, and mapping
some of the column cell data to URLs and lists of ';;' separated
items in cells to be mapped to non-active <OPTION> lists. It
creates a preface HTML file from the first part of the input file
and links to it in the second Table file. The '-mapHdrNames' switch
is used to map the long data names to shorter distinct names
specified in a mapping table. After dropping columns, it reorders
columns using the '-reorderColumn' switch (ignoring ones that
don't exist). It sorts the rows by a particular column using
'-sortRowsByColumn', where it uses the p-Value column if it exists,
else it sorts by the Difference data if it exists, etc.
E.g., mAdb Microarray Retrieval Reports (MRR) from the Excel download,
where MRR is a 'mAdb Microarray Retrieval Report'. The switches are
in file 'params-MRR.map'. Note: you must double quote arguments
that use spaces.
===============================================================
HTMLtools data.MRR/params-MRR.map
where: data.MRR/params-MRR.map
contains:
#File:params-MRR.map
#"Revised: 5-28-2009"
#
-addPrologue:data.MRR/prolog.html
-addEpilogue:data.MRR/epilogue.html
-addRowNumbers
-addTableName:"mAdb Microarray Retrieval Report"
-inputDir:data.MRR
-outputDir:html/data.MRR
-tablesDir:data.Table
#
#"Limit the number of rows to the highest 500 fold-change values"
-limitMaxTableRows:"500,A-B Mean Difference,Descending"
#
-allowHdrDups
-alternateRowBackgroundColor:white
-rmvTrailingBlankRowsAndColumns
#-shrinkBigCells:25,-5
-shrinkBigCells:1,-5
-hdrLines:2
-hasEmptyLineBeforeTable
-makePrefaceHTML
-mapOptionsLists
#
#"Map header names. Select from field='Affy .CEL file (16)'"
#" to field= 'GSP ID (9)' or 'Simple GSP ID (10)'"
-mapHdrNames:"data.Table/EGMAP.map,Affy .CEL file (16),GSP ID (9)"
#
#"Drop some of the columns"
-dropColumnColumn:"mgB36 Chr:Start-Stop"
-dropColumn:"mgB36 Cytoband"
-dropColumn:Annotation_Src
-dropColumn:UniGene
-dropColumn:RefSeq
-dropColumn:Refseqs_Hit
-dropColumn:geneIDS_Hit
-dropColumn:"Entrez GeneID"
-dropColumn:"Locus Tag"
-dropColumn:"BioCarta Pathways"
-dropColumn:"KEGG Pathways"
#-dropColumn:"Gene Ontology Terms" (remove # if want to drop)
-dropColumn:"GO Tier2 Component"
-dropColumn:"GO Tier3 Component"
-dropColumn:"GO Tier2 Function"
-dropColumn:"GO Tier3 Function"
-dropColumn:"GO Tier2 Process"
-dropColumn:"GO Tier3 Process"
#"The following was added 5/28/09"
-dropColumn:"Map"
-dropColumn:"mgB37_Probe Chr:Start-Stop"
-dropColumn:"mgB37_Probe Cytoband"
-dropColumn:"mgB37_RefSeq Chr:Start-Stop"
-dropColumn:"mgB37_RefSeq Cytoband"
#
#"Reorder columns to left side of Table"
-reorderColumn:Gene,1
-reorderColumn:"A-B Mean Difference",2
-reorderColumn:Difference,3
-reorderColumn:"A Mean",4
-reorderColumn:"B Mean",5
-reorderColumn:"A-B p-Value",6
-reorderColumn:"p-Value",7
-reorderColumn:"Well ID",8
-reorderColumn:"Feature ID",9
-reorderColumn:"Description",10
#
#"Sort rows by column - use whichever comes first"
-sortRowsByColumn:"A-B Mean Difference",Descending
-sortRowsByColumn:Difference,Descending
-sortRowsByColumn:p-Value,Ascending
-sortRowsByColumn:Gene,Ascending
#
#"These map mAdb Feature Report data to Bioinformatics databases"
-hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A
-hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene=
-hrefData:"Entrez GeneID",http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=retrieve&dopt=graphics&list_uids=
-hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset=
#
#"----------- End --------- "
Example 7.
Similar to Example 6, but remaps the array names
to the shorter 'Simple GSP ID' names in this example.
E.g., mAdb Microarray Retrieval Reports (MRR) from the Excel download,
reordered. The switches are in file 'params-MRR-Short_GSP_ID.map'.
Note: you must double quote arguments that use spaces.
===============================================================
HTMLtools data.MRR/params-MRR-Short_GSP_ID.map
where: data.MRR/params-MRR-Short_GSP_ID.map
contains:
#File:params-MRR-Short_GSP_ID.map
#"Revised: 5-28-2009"
#
-addOutfilePostfix:"-Short_GSP_ID"
-addPrologue:data.MRR/prolog.html
-addEpilogue:data.MRR/epilogue.html
-addRowNumbers
-addTableName:"mAdb Microarray Retrieval Report"
-inputDir:data.MRR
-outputDir:html/data.MRR
-outputDir:html
-tablesDir:data.Table
#
#"Limit the number of rows to the highest 500 fold-change values"
-limitMaxTableRows:"500,A-B Mean Difference,Descending"
#
-allowHdrDups
-alternateRowBackgroundColor:white
-rmvTrailingBlankRowsAndColumns
#-shrinkBigCells:25,-5
-shrinkBigCells:1,-5
-hdrLines:2
-hasEmptyLineBeforeTable
-makePrefaceHTML
-mapOptionsLists
#
#"Map header names. Select from field='Affy .CEL file (16)'"
#" to field= 'GSP ID (9)' or 'Simple GSP ID (10)'"
-mapHdrNames:"data.Table/EGMAP.map,Affy .CEL file (16),Simple GSP ID (10)"
#
#"Drop some of the columns"
-dropColumnColumn:"mgB36 Chr:Start-Stop"
-dropColumn:"mgB36 Cytoband"
-dropColumn:Annotation_Src
-dropColumn:UniGene
-dropColumn:RefSeq
-dropColumn:Refseqs_Hit
-dropColumn:geneIDS_Hit
-dropColumn:"Entrez GeneID"
-dropColumn:"Locus Tag"
-dropColumn:"BioCarta Pathways"
-dropColumn:"KEGG Pathways"
#
#-dropColumn:"Gene Ontology Terms" (remove # if want to drop)
-dropColumn:"GO Tier2 Component"
-dropColumn:"GO Tier3 Component"
-dropColumn:"GO Tier2 Function"
-dropColumn:"GO Tier3 Function"
-dropColumn:"GO Tier2 Process"
-dropColumn:"GO Tier3 Process"
#"The following was added 5/28/09"
-dropColumn:"Map"
-dropColumn:"mgB37_Probe Chr:Start-Stop"
-dropColumn:"mgB37_Probe Cytoband"
-dropColumn:"mgB37_RefSeq Chr:Start-Stop"
-dropColumn:"mgB37_RefSeq Cytoband"
#
#"Reorder columns to left side of Table"
-reorderColumn:Gene,1
-reorderColumn:"A-B Mean Difference",2
-reorderColumn:Difference,3
-reorderColumn:"A Mean",4
-reorderColumn:"B Mean",5
-reorderColumn:"A-B p-Value",6
-reorderColumn:"p-Value",7
-reorderColumn:"Well ID",8
-reorderColumn:"Feature ID",9
-reorderColumn:"Description",10
#
#"Sort rows by column - use whichever comes first"
-sortRowsByColumn:"A-B Mean Difference",Descending
-sortRowsByColumn:Difference,Descending
-sortRowsByColumn:p-Value,Ascending
-sortRowsByColumn:Gene,Ascending
#
#"These map mAdb Feature Report data to Bioinformatics databases"
-hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A
-hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene=
-hrefData:"Entrez GeneID",http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=retrieve&dopt=graphics&list_uids=
-hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset=
#
#"----------- End --------- "
Example 8.
An alternative way to specify columns is using a list of '-keepColumn' switches.
This is the same as Example 7, except that the columns to keep are
specified with '-keepColumn' switches. This is useful if your data contains
many columns that you don't want and you don't know all of their names.
E.g., mAdb Microarray Retrieval Reports (MRR) from the Excel download,
reordered. The switches are in file 'params-MRR-keep.map'.
Note: you must double quote arguments that use spaces.
===============================================================
HTMLtools data.MRR/params-MRR-keep.map
where: data.MRR/params-MRR-keep.map
contains:
#File:params-MRR-keep.map
#"Revised: 4-9-2009"
#
-addOutfilePostfix:"-keep"
-addPrologue:data.MRR/prolog.html
-addEpilogue:data.MRR/epilogue.html
-addRowNumbers
-addTableName:"mAdb Microarray Retrieval Report"
-inputDir:data.MRR
-outputDir:html/data.MRR
-tablesDir:data.Table
#
-allowHdrDups
-alternateRowBackgroundColor:white
-rmvTrailingBlankRowsAndColumns
#-shrinkBigCells:25,-5
-shrinkBigCells:1,-5
-hdrLines:2
-hasEmptyLineBeforeTable
-makePrefaceHTML
-mapOptionsLists
#
#"Specify columns to keep, the rest are dropped"
-keepColumn:Gene
-keepColumn:p-Value
-keepColumn:Difference
-keepColumn:"A-B p-Value"
-keepColumn:"A-B Mean Difference"
-keepColumn:"A Mean"
-keepColumn:"B Mean"
-keepColumn:"Well ID"
-keepColumn:"Feature ID"
-keepColumn:Description
-keepColumn:"Gene Ontology Terms"
#
#"Reorder columns to left side of Table"
-reorderColumn:Gene,1
-reorderColumn:"A-B Mean Difference",2
-reorderColumn:Difference,3
-reorderColumn:"A Mean",4
-reorderColumn:"B Mean",5
-reorderColumn:"A-B p-Value",6
-reorderColumn:"p-Value",7
-reorderColumn:"Well ID",8
-reorderColumn:"Feature ID",9
-reorderColumn:"Description",10
#
#"Sort rows by column - use whichever comes first"
-sortRowsByColumn:"A-B Mean Difference",Descending
-sortRowsByColumn:Difference,Descending
-sortRowsByColumn:p-Value,Ascending
-sortRowsByColumn:Gene,Ascending
#
#"These map mAdb Feature Report data to Bioinformatics databases"
-hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A
-hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene=
-hrefData:"Entrez GeneID",http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=retrieve&dopt=graphics&list_uids=
-hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset=
#
#"----------- End --------- "
Example 9.
Convert all mAdb generated Java TreeView .zip files or (unpacked)
directories in the JTV input directory. It is invoked by the switch
'-jtvNamesMap:{mAdbArraySummaryFile,mapHdrFile,fromHdrName,toHdrName}'.
The JTV input (output) directories are specified with the
-jtvInputDir (-jtvOutputDir) switches.
The JTV input directory contains saved mAdb JTV files that
are unzipped. The zip files could be saved with 'save names'
indicating the data analysis conditions. The processing converts
a list of files from each of the sub directories in the
-jtvInputDir directory. The conversion maps the .cdt file in each
sub directory to use the {toHdrName} column data of the equivalent
{mapHdrFile} map Table instead of the "EID:'mAdb ID'" as generated by
mAdb. The mapping between "mAdb ID" and short array names is done using
the jtv_mAdbArraySummaryFile Table map. It then writes out the
JTV subset to a created sub directory in jtvOutputDir that
has the same base name as the input JTV sub directory being
processed. This is run using the switches '-jtvNamesMap',
'-jtvInputDir' and '-jtvOutputDir'. After converting the JTV files,
it generates corresponding Web pages to invoke the JTV applets using
the '-jtvHTMLgenerate', '-jtvDescription', '-jtvButtonName', '-addProlog', and
'-addEpilogue' switches. When doing '-jtvHTMLgenerate', the '-jtvCopyJTVjars' switch
copies the Java TreeView .jar files and plugins to the jtvOutputDir
directory. It also rezips the output directory since the
'-jtvReZipConvertedFiles' switch was specified. Note: you must double
quote arguments that use spaces.
E.g., convert a set of Java TreeView zip files downloaded from mAdb.
The switches are in file 'params-JTV.map'.
===============================================================
HTMLtools JTVinput/params-JTV.map
where: params-JTV.map
contains:
#File:params-JTV.map
#"Revised: 3-30-2009"
#
#"(1) Convert array names in JTV data sets to mapped array names."
-jtvNamesMap:"data.Table/mAdbArraySummary.map,data.Table/EGMAP.map,Affy .CEL file (16),GSP ID (9)"
#
-jtvInputDir:JTVinput
-jtvOutputDir:JTVoutput
-jtvTableDir:data.Table
#
#"(2) Generate HTML web pages to invoke the converted JTV data."
-jtvHTMLgenerate
-jtvDescription:"Sample description paragraph on mouse muscle GH/Stat-null Controlled (+) genes [i.e. experiment $$INFILENAME$$]."
-jtvButtonName:"Mouse Muscle: $$INFILENAME$$"
-addProlog:JTVinput/prolog.html
-addEpilogue:JTVinput/epilogue.html
-jtvCopyJTVjars:JTVjars
#
# [3] Rezip the converted files
-jtvReZipConvertedFiles
#
#"------------ End -----------"
Example 10.
Batch process a list of HTMLtools params .map files specified in the
batch input file (batchList.doit in this example). The batch processing
is started as shown below with the '-batchProcess' switch. The previous
examples show some of the params .map files that could be used in the list.
Note: you may not nest '-batchProcess' commands (it is not recursive). The
list may only contain comments ('#' prefixed lines) or params .map file names.
Note: you must double quote arguments that use spaces.
E.g., Execute the list of parameter .map files listed in the batch file called
batchList.doit.
===============================================================
HTMLtools -batchProcess:batchList.doit
where: batchList.doit
contains:
#File:batchList.doit
#"Revised: 6-23-2009"
#"Preprocess the data for the NIDDK/mAdb GSP Jak-Stat Prosector Database"
#
#"(1) Doing GSP-InventoryExperiment Groups conversions and generating HTML pages"
data.GSPI-EG/params-GSPI-EG.map
data.GSPI-EG/params-GSPI-EG-concatTXT.map
data.GSPI-EG/params-GSPI-EG-concatHTML.map
data.GSPI-ExpGrp/params-GSPI-ExpGrp.map
data.GSPI-ExpGrp/params-GSPI-ExpGrp-exportBigCells.map
#
#"(2) Doing mAdb Retrieval Report conversions and generating HTML pages"
data.MRR/params-MRR.map
data.MRR/params-MRR-keep.map
#
#"(3) Doing JTV array name conversions and generating HTML pages"
###JTVinput/params-JTV.map
JTVinput/params-JTV-jtvReZip.map
#
#"(4) Convert Mapping .txt files to HTML"
data.Maps/params-Maps-EGMAP-html.map
data.Maps/params-Maps-ExperimentGroups-html.map
data.Maps/params-Maps-mAdbArraySummary-html.map
#
#"(4.1) Convert Mapping .txt files to .map files"
data.Maps/params-Maps-EGMAP-map.map
data.Maps/params-Maps-ExperimentGroups-map.map
data.Maps/params-Maps-mAdbArraySummary-map.map
#
#"(5) Doing mAdb Retrieval Report Gene List mappings and generating HTML pages"
data.MRR-GL-examples/params-MRR-GL-orig.map
data.MRR-GL-examples/params-MRR-GL-Review.map
data.MRR-GL-examples/params-MRR-GL-GeneList.map
#
#"(6) Doing mAdb and HTML conversion tests TODO generating HTML pages"
data.mAdb-TestsToDo/params-mAdb-TestsToDo.map
#
#"(7) Convert MRR all arrays to edited DB file."
#" This is normally not done each time."
#data.MRR-all/params-MRR-all-18-RMA-fast.map
#data.MRR-all/params-MRR-all-18-MAS5-fast.map
#
#"(7.1) Convert MRR Literature data for all arrays."
#" This is normally not done each time."
data.MRR-Literature/params-MRR.map
data.MRR-Literature/params-MRR-keep.map
data.MRR-Literature/params-JTV-jtvReZip.map
#
#"(8) Generate a Tests-Intersection .txt table and also the HTML for it."
#" from the mAdb-TestsToDo.txt data."
data.TestsIntersection/params-TI-HTML-all.map
data.TestsIntersection/params-TestsIntersection-ALL.map
data.TestsIntersection/params-TestsIntersection-ALL-filter.map
data.TestsIntersection/params-TestsIntersection-ALL-filter-LIT.map
#
#"(9) Flip several types of samples - not currently used in html/GSP"
#"(9) Flip several types of samples"
#"(9.1) Create Data file for Flip Tables."
#" Create edited Tables with Index-Maps."
#" This is normally not done each time."
data.MRR-flip/params-MRR-all-fastSave.map
data.MRR-flip/params-MRR-all-fastMakeIndex.map
data.MRR-flip/params-MRR-all-fastSave+MakeIndex.map
data.MRR-flip/params-MRR-LitRev-fastSave.map
data.MRR-flip/params-MRR-LitRev-fastMakeIndex.map
data.MRR-flip/params-MRR-LitRev-fastSave+MakeIndex.map
data.MRR-flip/params-MRR-EG3.2-Test1-fastSave.map
data.MRR-flip/params-MRR-EG3.2-Test1-fastMakeIndex.map
#
#"(9.2) Flip Tables with and without filtering saving"
#" the flipped .txt file and .html file."
data.MRR-flip/params-MRR-flipGID-all-GeneList.map
data.MRR-flip/params-MRR-flipGID-all-FeatureID.map
data.MRR-flip/params-MRR-flipGID-all-GeneList+FeatureID.map
data.MRR-flip/params-MRR-flipGID-all-GeneList-RowNames.map
data.MRR-flip/params-MRR-flipGID-all-GeneList+FeatureID-RowNames.map
data.MRR-flip/params-MRR-flipGID-LitRev.map
data.MRR-flip/params-MRR-flipGID-EG3.2-test1.map
#
#"(10) Run the GenBatchScripts to create the batch scripts data"
#" This is normally not done each time."
data.GBS/params-genBatchScripts.map
#
#
#"------------ End -------------"
Example 11.
Edit a very large .txt file (EG1+EG3.1+EG3.2-MSE430_2-18Arrays-RMA-Grouped.txt) into
another .txt file (EGALLDataSet.txt) using the fast edit command. Note that
-sortRowsByColumn is not available when doing a fast-edit of a large file.
E.g., Edit (-dropColumn, -reorderColumn, -mapHdrNames, -reorderRemainingColumnsAlphabeticly).
The switches are in file 'params-MRR-all-fast.map'.
Note: you must double quote arguments that use spaces.
===============================================================
HTMLtools data.MRR-all/params-MRR-all-fast.map
where: data.MRR-all/params-MRR-all-fast.map
contains:
#File:params-MRR-all-fast.map
#"Revised 6-20-2009"
#"Convert MRR all arrays file to edited DB 'EGALLDataSet.txt' file."
#
-inputDir:data.MRR-all
-outputDir:data.Table
-outputDir:data.Table
-tableDir:data.Table
#
-files:"EG1+EG3.1+EG3.2-MSE430_2-18Arrays-RMA-Grouped.txt"
#
#
#"Save the edited table as a .txt file"
-saveEditedTable2File:EGALLDataSet.txt,noHTML
#
#"Do a fast edit of the .txt file and don't generate HTML file"
-fastEditFile
#-noHTML
#
#-addOutfilePostfix:"-edit"
#
-allowHdrDups
-rmvTrailingBlankRowsAndColumns
-hdrLines:2
-useOnlyLastHeaderLine
-hasEmptyLineBeforeTable
#
#"Map header names. Select from field='Affy .CEL file (16)'"
#" to field= 'GSP ID (9)' or 'Simple GSP ID (10)'"
-mapHdrNames:"data.Table/EGMAP.map,Affy .CEL file (16),GSP ID (9)"
#
#"Drop more columns for simplest file"
-dropColumn:Difference
-dropColumn:p-Value
-dropColumn:Description
#"Drop some of the columns"
-dropColumnColumn:"mgB36 Chr:Start-Stop"
-dropColumn:"mgB36 Cytoband"
-dropColumn:Annotation_Src
-dropColumn:UniGene
-dropColumn:RefSeq
-dropColumn:Refseqs_Hit
-dropColumn:geneIDS_Hit
-dropColumn:"Entrez GeneID"
-dropColumn:"Locus Tag"
-dropColumn:"BioCarta Pathways"
-dropColumn:"KEGG Pathways"
-dropColumn:"Gene Ontology Terms" (remove # if want to drop)
-dropColumn:"GO Tier2 Component"
-dropColumn:"GO Tier3 Component"
-dropColumn:"GO Tier2 Function"
-dropColumn:"GO Tier3 Function"
-dropColumn:"GO Tier2 Process"
-dropColumn:"GO Tier3 Process"
#"The following was added 5/28/09"
-dropColumn:"Map"
-dropColumn:"mgB37_Probe Chr:Start-Stop"
-dropColumn:"mgB37_Probe Cytoband"
-dropColumn:"mgB37_RefSeq Chr:Start-Stop"
-dropColumn:"mgB37_RefSeq Cytoband"
#
#"Sort the remaining columns alphabetically"
-reorderRemainingColumnsAlphabeticly
#
#"Reorder columns to left side of Table"
-reorderColumn:Gene,1
-reorderColumn:"A-B Mean Difference",2
-reorderColumn:Difference,3
-reorderColumn:"A Mean",4
-reorderColumn:"B Mean",5
-reorderColumn:"A-B p-Value",6
-reorderColumn:"p-Value",7
-reorderColumn:"Well ID",8
-reorderColumn:"Feature ID",9
-reorderColumn:"Description",10
#
#"----------- End --------- "
Example 12.
Do URL mapping of multi-row header data in a transposed table. The Table
was transposed in a separate operation using Excel. ([TODO] we may
add a transpose function to the converter in the future.)
E.g., Make hyperlinks in header rows rather than the table data.
The switches are in file 'params-MRR-GL-GeneList.map'.
===============================================================
HTMLtools data.MRR-GL-examples/params-MRR-GL-GeneList.map
where: data.MRR-GL-examples/params-MRR-GL-GeneList.map
contains:
#File:params-MRR-GL-GeneList.map
#"Revised: 3-30-2009"
#
-addPrologue:data.MRR-GL-examples/prolog.html
-addEpilogue:data.MRR-GL-examples/epilogue.html
-addRowNumbers
-addTableName:"GSP Genes mentioned in Hennighausen & Robinson Review (2008)"
-inputDir:data.MRR-GL-examples
-outputDir:html/GSP/Search/example/
-tablesDir:data.Table
-files:GeneListTbl-all-A+G.txt,GeneListTbl-all-EG1+EG3.txt,GeneListTbl-all-Stat5ab+Socs2.txt
#
-allowHdrDups
-alternateRowBackgroundColor:white
-rmvTrailingBlankRowsAndColumns
#-shrinkBigCells:25,-5
-shrinkBigCells:1,-5
#
#"Map all 3 header lines in the Table"
-hdrLines:3
#
#"This does multiple header-row data mapping."
-hrefHeaderRowMapping
#"These map mAdb Feature Report data to Bioinformatics databases"
-hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A
-hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene=
-hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset=
#
#"------------ End -----------"
Example 13.
Generates a Tests-Intersection .txt and .html table from the tests data also
used in Example 15. This table of the tests intersections shows test results
of gene fold-change data. It is computed for all samples and all tests in the
mAdb-TestsToDo.txt list that
was saved from the mAdb-TestsToDo.xls
file. Only genes that have passed any of the tests are included, even if the
gene had only passed one test. The results are sorted by gene name. These
results are available as a tab-delimited (Excel-compatible) file TestsIntersection-ALL.txt and
also as a HTML Web page. See the
Tests-Intersection commands.
E.g., Create a Tests-Intersection .txt and .html table.
The switches are in file 'params-TestsIntersection-ALL.map'.
===============================================================
HTMLtools data.TestsIntersection/params-TestsIntersection-ALL.map
where: data.TestsIntersection/params-TestsIntersection-ALL.map
contains:
#File:params-TestsIntersection-ALL.map
#"Revised 5-06-2009"
#
#"[1] Master script to create a Tests Intersection Table file for all tests"
#"in the mAdb-TestsToDo.txt file that we have data."
#
-inputDir:data.GBS
-outputDir:html/GSP/TestsIntersection
#
#"Limit the number of rows to the highest 500 fold-change values"
-limitMaxTableRows:"500,Range FC,Descending"
#
#"The tablesDir subdir. where mapping and other reference Tables are copied"
#"to the batchScripts directory."
-tablesDir:data.Table
#
-allowHdrDups
-rmvTrailingBlankRowsAndColumns
-hdrLines:2
-hasEmptyLineBeforeTable
#
-makeTestsIntersectionTable:"mAdb-TestsToDo.txt"
#
#"Add FC range computations and expand the TI table with"
#"fields ('Max FC', 'Min FC', 'Range FC')."
-addFCrangesForTestsIntersectionTable
#
#"Add the ('Range A Mean', 'Range B Mean', 'FC counts %') computations to"
#"an expanded TestsIntersectionTable table."
-addRangeOfMeansToTItable
#
#"Save the edited table as a .txt file"
-saveEditedTable2File:"TestsIntersection-ALL.txt,HTML"
-mapDollarsigns:$$EXCEL-FILE$$,"TestsIntersection-ALL.txt"
#
#"The mAdb-TestsToDo.txt Tables are in"
#"the '-tablesDir:data.Table' subdirectory."
#
#"[2] Now after the Tests-Intersection .txt table is saved, generate the HTML file."
#"Note: Converter removes -hasEmptyLineBeforeTable and sets -hdrLines:5 switches."
#
-addPrologue:data.TestsIntersection/prolog-TI.html
-addEpilogue:data.TestsIntersection/epilogue-TI.html
#
#
-addRowNumbers
-addTableName:"Intersection of All GSP Fold-Change Tests for Genes in any test"
-mapDollarsigns:$$TITLE$$,"All GSP Tests for Genes in any test"
-allowHdrDups
#
-alternateRowBackgroundColor:white
#-shrinkBigCells:25,-5
-shrinkBigCells:1,-5
#
###-sortRowsByColumn:Gene,Ascending
-sortRowsByColumn:"Range FC",Descending
#
#"These map mAdb Feature Report data to Bioinformatics databases"
-hrefData:"Well ID,http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A"
-hrefData:"Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene="
-hrefData:"Feature ID,https://www.affymetrix.com/LinkServlet?probeset="
#
#"------------- End ---------------"
Example 14.
Create flipped tables with hyperlinked multi-line headers, with data
filtered by row and column name filters. This process is broken into
two scripts: Example 14.1 to create an edited
table file and then an Index-Map table file from the edited table file.
Then, Example 14.2 to generate a flipped
table saved as .txt and .html files using the edited table and its
Index-Map file.
Example 14.1
Create an edited table file and then an Index-Map table file from the
edited table file. See Example 14.2
for the second part, which uses these files to create a flipped table.
E.g., Create an edited table file and its Index-Map file for subsequent flipped table processing.
The switches are in file 'params-MRR-all-fastSave+MakeIndex.map'.
===============================================================
HTMLtools data.flip/params-MRR-all-fastSave+MakeIndex.map
where: data.flip/params-MRR-all-fastSave+MakeIndex.map
contains:
#File:params-MRR-all-fastSave+MakeIndex.map
#"This saves the edited Table and then makes an Index-Map file"
#"of the saved edited table."
#"Revised 6-23-2009"
#
-inputDir:data.MRR-all
-outputDir:data.MRR-flip
-tablesDir:data.Table
#
-files:"Review-LH-18Arrays-54-pathway-Genes-in-JakStat.txt"
#
#"Save the edited table as a .txt file"
-saveEditedTable2File:EGALLDataSet.txt,noHTML
#
#"Make an EGALLDataSet.idx index file of the .txt file"
-makeIndexMapFile:"Gene,Well ID,Feature ID"
#
-allowHdrDups
-rmvTrailingBlankRowsAndColumns
-hdrLines:2
-useOnlyLastHeaderLine
-hasEmptyLineBeforeTable
#
#"Do a fast edit of the .txt file"
-fastEditFile
-noHTML
#
#"Map header names. Select from field='Affy .CEL file (16)'"
#" to field= 'GSP ID (9)' or 'Simple GSP ID (10)'"
-mapHdrNames:"data.Table/EGMAP.map,Affy .CEL file (16),GSP ID (9)"
#
#"Drop more columns for simplest file"
-dropColumn:Difference
-dropColumn:p-Value
-dropColumn:Description
#"Drop some of the columns"
-dropColumnColumn:"mgB36 Chr:Start-Stop"
-dropColumn:"mgB36 Cytoband"
-dropColumn:Annotation_Src
-dropColumn:UniGene
-dropColumn:RefSeq
-dropColumn:Refseqs_Hit
-dropColumn:geneIDS_Hit
-dropColumn:"Entrez GeneID"
-dropColumn:"Locus Tag"
-dropColumn:"BioCarta Pathways"
-dropColumn:"KEGG Pathways"
-dropColumn:"Gene Ontology Terms" (remove # if want to drop)
-dropColumn:"GO Tier2 Component"
-dropColumn:"GO Tier3 Component"
-dropColumn:"GO Tier2 Function"
-dropColumn:"GO Tier3 Function"
-dropColumn:"GO Tier2 Process"
-dropColumn:"GO Tier3 Process"
#"The following was added 5/28/09"
-dropColumn:"Map"
-dropColumn:"mgB37_Probe Chr:Start-Stop"
-dropColumn:"mgB37_Probe Cytoband"
-dropColumn:"mgB37_RefSeq Chr:Start-Stop"
-dropColumn:"mgB37_RefSeq Cytoband"
#
#"Sort the rest of the columns alphabetically"
-reorderRemainingColumnsAlphabeticly
#
#"Reorder columns to left side of Table"
-reorderColumn:Gene,1
-reorderColumn:"A-B Mean Difference",2
-reorderColumn:Difference,3
-reorderColumn:"A Mean",4
-reorderColumn:"B Mean",5
-reorderColumn:"A-B p-Value",6
-reorderColumn:"p-Value",7
-reorderColumn:"Well ID",8
-reorderColumn:"Feature ID",9
-reorderColumn:"Description",10
#
#"----------- End ----------- "
Example 14.2
Create a flipped table tab-delimited .txt file and .html file from the
edited table file and index-map table files created in
Example 14.1.
E.g., Create a flipped table saved as .txt and .html files.
The switches are in file 'params-MRR-flipGID-all-GeneListNames+FeatureIDnames-RowNames.map'.
===============================================================
HTMLtools data.flip/params-MRR-flipGID-all-GeneListNames+FeatureIDnames-RowNames.map
where: data.flip/params-MRR-flipGID-all-GeneListNames+FeatureIDnames-RowNames.map
contains:
#File:params-MRR-flipGID-all-GeneListNames+FeatureIDnames-RowNames.map
#"Revised 7-18-2009"
#
-addPrologue:data.MRR-flip/prolog.html
-addEpilogue:data.MRR-flip/epilogue.html
-addRowNumbers
-addTableName:"Flipped 18 GSP Mouse MOE403_2 arrays Filtered by Feature_ID List"
#
-inputDir:data.MRR-flip
-outputDir:html/data.flip
-tablesDir:data.Table
#
-addOutfilePostfix:"-GeneList+FeatureID-RowNames"
#
-flipTableByIndexMap:"EGALLDataSet.txt,EGALLDataSet.idx"
-flipColumnName:"*LIST*,Gene,Socs1,Socs2,Socs3,Stat1,Stat2,Stat3,Stat4,Stat5a,Stat5b"
-flipColumnName:"*LIST*,Well ID,"
-flipColumnName:"*LIST*,Feature ID,1418507_s_at,1449109_at, 1438470_at,1441476_at"
-flipRowNames:"*LIST*,EG001,EG003.1,EG003.2"
-flipOrder:"Gene,Well ID,Feature ID"
#
-allowHdrDups
-alternateRowBackgroundColor:white
-rmvTrailingBlankRowsAndColumns
#
#-shrinkBigCells:25,-5
-shrinkBigCells:1,-5
#
#"This does multiple header-row data mapping."
-hrefHeaderRowMapping
#
#"These map mAdb Feature Report data to Bioinformatics databases"
-hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A
-hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene=
-hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset=
#
#"--------- End ---------"
Example 15.
Create a set of batch jobs to convert data described in a table file,
generating summary Web pages and a set of params .map files in a tree structure
in directory batchScripts/. (See Section
1.1 Generating Batch Scripts
for more details on GenBatchScripts processing.)
The data for this script is in data.GBS. It uses
a list of data.GBS/mAdb-TestsToDo.txt
tests that are used to generate a set of HTML files in the Analyses and JTV
directories. The file was created by saving the manually
edited mAdb-TestsToDo.xls as a
tab-delimited file. It also uses a set of params .map templates prefixed with
paramsTemplate, a set of summary templates prefixed with
summaryTemplate, and prolog and epilogue templates for the MRR and JTV
HTML Web pages that will be generated.
The 'inputTree' of tab-delimited and .zip file data for these batch analyses
is in the data.GBS/CellTissue/
directory. The GenBatchScripts processing will generate the
batchScripts/ tree (described in the following
params-genBatchScripts.map script). After GenBatchScripts processing is finished,
the user would run the Windows BAT file batchScripts/buildWebPages.bat on
the batchScripts/buildWebPages.doit
batch list, both just created. The params .map files referenced in the .doit file
are in batchScripts/ParamScripts.
When run, it saves the generated HTML Web pages and converted
JTV files in batchScripts/Summary,
batchScripts/Analyses, and
batchScripts/JTV. See the
GenBatchScripts commands.
E.g., Create batch scripts for subsequent file conversion processing.
The switches are in file 'params-genBatchScripts.map'.
===============================================================
HTMLtools data.GBS/params-genBatchScripts.map
where: data.GBS/params-genBatchScripts.map
contains:
#File:params-genBatchScripts.map
#"Revised 4-26-2009"
#
#"Master script to generate params .map files and buildWebPages.doit"
#"file for all tests in the mAdb-TestsToDo.txt file. It generates an"
#"environment in batchScripts/ to enable running all of the scripts as"
#"a Window's batch job buildWebPages.bat on the buildWebPages.doit batch"
#"input list Windows batch startup file."
#
#"The templates (.html, .param) and .map files are in the same directory"
#"as this master batch generation script."
#
#"Map: mAdb-TestsToDo.txt - test Table to drive the batch scripts generation."
#
#"Map: CellTypeTissue.map - maps 'Introduction' field for 'Tissues'."
#"Map: ExperimentGroups.map - maps 'Details' field for 'Expression Groups'."
#"Map: EGMAP.map - maps 'Affy .CEL file' name to 'Simple GSP ID' or 'GSP ID'."
#
#"Arg: batchScripts - where all files and the following subdirectories are saved"
#"Arg: ParamScripts - subdir. where generated params*.map files are copied"
#"Arg: inputTree - subdir. where mAdb generated .txt MRR and JTV data are copied"
#"Arg: Summary - subdir. where generated text HTML top level Web pages are saved"
#"Arg: Analyses - subdir. where generated text HTML & edited .txt files are saved"
#"Arg: JTV - subdir. where generated JTVtext HTML & edited JTV files are saved"
#"Arg: JTVjars - subdir. where the JTV runtime jar files are copied"
#
-inputDir:data.GBS
-outputDir:batchScripts
#
#"The tablesDir subdir. where mapping and other reference Tables are copied"
#"to the batchScripts directory."
-tablesDir:data.Table
#
-genBatchScripts:"batchScripts,ParamScripts,InputTree,Summary,Analyses,JTV"
-rmvTrailingBlankRowsAndColumns
#
#"The following maps and Tables are in the '-tablesDir:data.Table' subdirectory."
-genMapHdrNames:"EGMAP.map"
-genMapEGdetails:"ExperimentGroups.map"
-genMapIntroduction:"CellTypeTissue.map"
-genTestFiles:"mAdb-TestsToDo.txt"
#
#"Create Tests-Intersection (TI) HTML links in summary file & params .map files."
-genTestsIntersection
#
#"List of CellType/Tissue summary templates for generating the Summary pages"
-genSummaryTemplate:1,summaryTemplateProlog.html
-genSummaryTemplate:2,summaryTemplateExperimental.html
-genSummaryTemplate:3,summaryTemplateAnalysis.html
-genSummaryTemplate:4,summaryTemplateFurtherAnalysis.html
-genSummaryTemplate:5,summaryTemplateEpilogue.html
#
#"List of params .map templates for generating batch params .map files."
-genParamTemplate:MRR,paramsTemplate-MRR.map
-genParamTemplate:MRR-keep,paramsTemplate-MRR-keep.map
###-genParamTemplate:JTV,paramsTemplate-JTV.map
-genParamTemplate:JTV,paramsTemplate-JTV-jtvReZip.map
-genParamTemplate:MRR-saveFile,paramsTemplate-MRR-saveFile.map
-genParamTemplate:TI,paramsTemplate-TI.map
#
#"List of support files to be copied to support -batchProcess of the .doit file."
-genCopySupportFile:"../HTMLtools.jar"
-genCopySupportFile:"../ReferenceManual.html"
-genCopySupportFile:prologMRR.html
-genCopySupportFile:prologJTV.html
-genCopySupportFile:prologTI.html
-genCopySupportFile:epilogueMRR.html
-genCopySupportFile:epilogueJTV.html
-genCopySupportFile:epilogueTI.html
#
#"List of JTV support files to be copied to support -batchProcess of the .doit file."
#-genCopySupportFile:JTVjars/TreeViewApplet.jar
#-genCopySupportFile:JTVjars/nanoxml-2.2.2.jar
#-genCopySupportFile:JTVjars/plugins/Dendrogram.jar
#-genCopySupportFile:JTVjars/plugins/Karyoscope.jar
#-genCopySupportFile:JTVjars/plugins/Scatterplot.jar
#-genCopySupportFile:JTVjars/plugins/Treeanno.jar
#
#"Copy tree data to top level batch scripts subdirectory"
-genTreeCopy:JTVjars,batchScripts/JTVjars
#"Copy Mapping files tree data to top level batch scripts subdirectory"
-genTreeCopy:data.Table,batchScripts/data.Table
#
#"Copy input data tree data to batch scripts subdirectory"
-genTreeCopy:data.GBS/CellTissue,batchScripts/inputTree/CellTissue
#
#"------------- End ---------------"
Example 16 - generates a .txt database file, an Index Map .idx file, and a
Statistics Index Map .sidx file.
The operations consist of three parameter .map files:
Example 16.1
data.MRR-all/params-MRR-all-fastSave.map for the .txt database file,
Example 16.2
params-MRR-all-fastMakeIndex.map for the .idx Index Map file, and
Example 16.3
params-MRR-all-fastMakeStatisticsIndex.map for the .sidx global
Statistics Index Map file. The latter computes an extended Index Map file
with (min,max,mean,stddev) for each row of the numeric data and
global (min,max,mean,stddev) values used in 2 additional header rows.
The .sidx file is used in generating heatmap tables in the flipped table database
search example in Example 17.
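The per-row statistics added by Example 16.3 are ordinary (min, max, mean, stddev) computations
over each row's numeric values. The following minimal Java sketch shows that arithmetic only;
it is not the actual HTMLtools code, and the use of a population standard deviation is an
assumption (the program may use the n-1 form):

// Illustrative sketch of per-row (min, max, mean, stddev) statistics such as
// those described for the .sidx Statistics Index Map; not the actual HTMLtools code.
public class RowStatsSketch {
    static double[] stats(double[] values) {
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY, sum = 0.0;
        for (double v : values) { min = Math.min(min, v); max = Math.max(max, v); sum += v; }
        double mean = sum / values.length;
        double ss = 0.0;
        for (double v : values) ss += (v - mean) * (v - mean);
        double stddev = Math.sqrt(ss / values.length);   // population form; assumed, not verified
        return new double[] { min, max, mean, stddev };
    }

    public static void main(String[] args) {
        double[] row = { 7.2, 8.1, 6.9, 7.5 };           // hypothetical intensities for one probe
        double[] s = stats(row);
        System.out.printf("min=%.2f max=%.2f mean=%.2f stddev=%.2f%n", s[0], s[1], s[2], s[3]);
    }
}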
===============================================================
Example 16.1 - generates a .txt database file
This creates an edited database file that can be used in other
operations such as the database search example in
Example 17 where the output .txt file could
be copied to the data.search/ directory.
HTMLtools data.MRR-all/params-MRR-all-fastSave.map
where: data.MRR-all/params-MRR-all-fastSave.map
contains:
#File:params-MRR-all-fastSave.map
#"Revised 8-18-2009"
#"Convert MRR all arrays file to edited DB 'EGALLDataSet.txt' file."
#
-inputDir:data.MRR-all
-outputDir:data.MRR-all
-tableDir:data.Table
#
-files:"EG1+EG3.1+EG3.2-MSE430_2-18Arrays-RMA-Grouped.txt"
#
#
#"Save the edited table as a .txt file"
-saveEditedTable2File:EGALLDataSet.txt,noHTML
#
#"Do a fast edit of the .txt file and don't generate HTML file"
-fastEditFile
#-noHTML
#
#-addOutfilePostfix:"-edit"
#
-allowHdrDups
-rmvTrailingBlankRowsAndColumns
-hdrLines:2
-useOnlyLastHeaderLine
-hasEmptyLineBeforeTable
#
#"Map header names. Select from field='Affy .CEL file (16)'"
#" to field= 'GSP ID (9)' or 'Simple GSP ID (10)'"
-mapHdrNames:"data.Table/EGMAP.map,Affy .CEL file (16),GSP ID (9)"
#
#"Drop more columns for simplest file"
-dropColumn:Difference
-dropColumn:p-Value
-dropColumn:Description
#"Drop some of the columns"
-dropColumnColumn:"mgB36 Chr:Start-Stop"
-dropColumn:"mgB36 Cytoband"
-dropColumn:Annotation_Src
-dropColumn:UniGene
-dropColumn:RefSeq
-dropColumn:Refseqs_Hit
-dropColumn:geneIDS_Hit
-dropColumn:"Entrez GeneID"
-dropColumn:"Locus Tag"
-dropColumn:"BioCarta Pathways"
-dropColumn:"KEGG Pathways"
-dropColumn:"Gene Ontology Terms" (remove # if want to drop)
-dropColumn:"GO Tier2 Component"
-dropColumn:"GO Tier3 Component"
-dropColumn:"GO Tier2 Function"
-dropColumn:"GO Tier3 Function"
-dropColumn:"GO Tier2 Process"
-dropColumn:"GO Tier3 Process"
#"The following was added 5/28/09"
-dropColumn:"Map"
-dropColumn:"mgB37_Probe Chr:Start-Stop"
-dropColumn:"mgB37_Probe Cytoband"
-dropColumn:"mgB37_RefSeq Chr:Start-Stop"
-dropColumn:"mgB37_RefSeq Cytoband"
#
#"Sort the remaining columns alphabetically"
-reorderRemainingColumnsAlphabeticly
#
#"Reorder columns to left side of Table"
-reorderColumn:Gene,1
-reorderColumn:"A-B Mean Difference",2
-reorderColumn:Difference,3
-reorderColumn:"A Mean",4
-reorderColumn:"B Mean",5
-reorderColumn:"A-B p-Value",6
-reorderColumn:"p-Value",7
-reorderColumn:"Well ID",8
-reorderColumn:"Feature ID",9
-reorderColumn:"Description",10
#
#"----------- End --------- "
Example 16.2 - generates an .idx Index Map file of the database file
This creates an Index Map .idx file from the database file. These files are
used in other operations such as the
database searchGUI using the script example in
Example 17 where the output .idx file could
be copied to the data.search/ directory.
HTMLtools data.MRR-all/params-MRR-all-fastMakeIndex.map
where: data.MRR-all/params-MRR-all-fastMakeIndex.map
contains:
#File:params-MRR-all-fastMakeIndex.map
#"Revised: 8-19-2009"
#
-inputDir:data.MRR-all
-outputDir:data.MRR-all
-tablesDir:data.Table
#
-files:"EGALLDataSet.txt"
#
-hdrLines:1
#
#"Do a fast edit of the .txt file"
-fastEditFile
#
#"Make an .idx index file of the .txt file"
-makeIndexMapFile:"Gene,Well ID,Feature ID"
#
#"----------- end --------- "
Example 16.3 - generates a .sidx global StatisticsIndex Map file of the database file
This creates a global Statistics Index Map .sidx file from the database file.
This file is used in other operations such as the
database searchGUI using the script example in
Example 17 where the output .sidx file could
be copied to the data.search/ directory and is used if a heatmap table
is generated for the flipped table.
HTMLtools data.MRR-all/params-MRR-all-fastMakeStatisticsIndex.map
where: data.MRR-all/params-MRR-all-fastMakeStatisticsIndex.map
contains:
#File:params-MRR-all-fastMakeStatisticsIndex.map
#"Revised: 8-11-2009"
#
-inputDir:data.MRR-all
-outputDir:data.MRR-all
-tablesDir:data.Table
#
#"Specify the edited data set file to use. It is assumed that"
#"the IndexMap file was created and has a .idx file extension."
-files:"EGALLDataSet.txt"
#
-hdrLines:1
#
#"Specify columns to drop when analyzing the Statistics, the rest are dropped."
-dropColumn:Gene
-dropColumn:"Well ID"
-dropColumn:"Feature ID"
#
#"Make an .sidx index file of the .txt and .idx files"
-makeStatisticsIndexMapFile
#
#"----------- end --------- "
Example 17 - the paramsSearchDefault.map file used
with the "-searchGui" option
The paramsSearchDefault.map file contains additional information
used by the search database GUI ("-searchGui" option). See
Search GUI for more details.
===============================================================
HTMLtools -searchGui
where: this looks for the file data.search/paramsSearchDefault.map, which
contains:
#File:paramsSearchFlip.map
#"$$DATE$$"
#
#"Search information read by the search GUI for prompts and menus"
-searchTermNames:"Gene,Well ID,Feature ID"
-searchRowFilterName:"Sample Experiment Groups"
-searchSampleChoiceFile:sampleExperimentGroupsChoices.txt
-searchTermsDemoData:"Stat5a Stat5b 1438470_at 1441476_at 1446085_at"
-searchUserTermList:"LitRefGeneList.txt,Feature ID,Literature Review"
-searchTermsFilterPrompt:"'Gene', 'Well ID', and/or 'Probe' names. E.g., Stat5a, Stat5b, 1438470_at 1441476_at 1446085_at, etc."
-searchRowFilterPrompt:"'Sample Experiment Groups'. E.g., select one or more Experiment Groups"
#
-addPrologue:data.search/prolog.html
-addEpilogue:data.search/epilogue.html
-addTableName:"$$DATA_SOURCE_SUBTITLE$$"
-addRowNumbers
#
-addTableName:"Search database filtered by Gene and/or Probe IDs and Experiment Groups"
#
-inputDir:data.search
-outputDir:data.search
-tablesDir:data.Table
#
-addOutfilePostfix:"-search"
#
#"Database (.txt) and index map of database (.idx) to search"
-flipTableByIndexMap:"EGALLDataSet.txt,EGALLDataSet.idx"
#
$$SEARCH_FILTERS$$
#
#"Maps to:"
# '-flipOrder:"Gene,Well ID,Feature ID"'
# '-flipColumnName:"*LIST*,Gene,g1,g2,...gn"'
# '-flipColumnName:"*LIST*,Well ID,w1,w2,...,wk"'
# '-flipColumnName:"*LIST*,Feature ID,f1,f2,...,fm"'
# '-flipRowNames:"*LIST*,s1,s2,...,sp"'
# '-dataPrecisionHTMLtable:-1'
# '-showDataHeatmapFlipTable'
# '-flipUseExactColumnNameMatch:TRUE'
#
-allowHdrDups
-alternateRowBackgroundColor:white
-rmvTrailingBlankRowsAndColumns
#
#-shrinkBigCells:25,-5
-shrinkBigCells:1,-5
#
#"This does multiple header-row data mapping."
-hrefHeaderRowMapping
#
#"These map mAdb Feature Report data to Bioinformatics databases"
-hrefData:"Well ID",http://madb.nci.nih.gov/cgi-bin/clone_report.cgi?CLONE=WID%3A
-hrefData:Gene,http://www-bimas.cit.nih.gov/cgi-bin/cards/carddisp.pl?gene=
-hrefData:"Feature ID",https://www.affymetrix.com/LinkServlet?probeset=
#
#"--------- End ---------"
5. DEMONSTRATION DATA SETS
6. SOFTWARE DESIGN
This Java application converts a set of tab-delimited data files to various
HTML TABLE formats with many mapping options available. We use the
term Table with an uppercase 'T' to indicate the FileTable data
structure used throughout the program. The command line arguments are
parsed by the Switches class. The main() method in the HTMLtools
class class invokes the switch parser and then determines if batch processing
is to be used (if the '-batchProcess' switch is invoked). If the first
command does not have a '-' prefix, it is assumed to be a parameter file
(denoted paramXXX.map above). This is then read and the switches in that
file are then parsed. It is assumed that there may be more than one .txt
file in the input directory ('-inputDir' switch). So a list of these files
is then processed applying the various other command line switches that
were specified. To run it only on a subset of the files in the inputDirectory,
use the '-files:{f1,f2,...,fn}' command specification. In addition, it may be
used iteratively if the initial command line argument is the
'-batchProcess:{a batch file}' switch, in which case the batch
input file (e.g., batchList.doit) is assumed to contain a
list of params .map files to be batch processed.
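As an illustration of this dispatch logic, the following minimal sketch (not the actual HTMLtools source; everything other than the '-batchProcess' and params .map conventions described above is a hypothetical stand-in) shows how a first argument may be treated as either a params .map script or a -batchProcess list:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Minimal sketch of the command dispatch described above (hypothetical names). */
public class DispatchSketch {
    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("Usage: HTMLtools <params.map> | -batchProcess:<list.doit> | -switches...");
            return;
        }
        String first = args[0];
        if (first.startsWith("-batchProcess:")) {
            // Each non-comment line of the .doit file names a params .map script to run.
            String doitFile = first.substring("-batchProcess:".length());
            for (String script : Files.readAllLines(Paths.get(doitFile))) {
                script = script.trim();
                if (!script.isEmpty() && !script.startsWith("#"))
                    runParamScript(script);
            }
        } else if (!first.startsWith("-")) {
            // A first argument without a '-' prefix is assumed to be a params .map file.
            runParamScript(first);
        } else {
            runSwitches(args);            // ordinary command-line switches
        }
    }

    static void runParamScript(String mapFile) throws IOException {
        // Each non-comment line of the .map file is a switch; parse and run them.
        List<String> switches = Files.readAllLines(Paths.get(mapFile));
        switches.removeIf(s -> s.trim().isEmpty() || s.trim().startsWith("#"));
        runSwitches(switches.toArray(new String[0]));
    }

    static void runSwitches(String[] switches) {
        // Stand-in for the Switches parser and the per-file conversion loop.
        System.out.println("Would parse and run: " + String.join(" ", switches));
    }
}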
The program generates an HTMLtools.log
file of the processing every time it is run (and overwrites it on
each run).
java -Xmx256M -classpath .;.\HTMLtools.jar HTMLtools \
-genBatchScripts:data.GBS/params-genBatchScripts.map
This in turn generates the batchScripts/
directory described in Section 7 below, including a generated
batchScripts/buildWebPages.doit list of params .map scripts to execute and
the Windows .BAT file batchScripts/buildWebPages.bat to start the
batch processing.
The program was built using Eclipse (Version 3.4) (www.eclipse.org). The distribution includes the ANT (ant.apache.org) build.xml script that can be used either standalone or with an Integrated Development Environment such as Eclipse (which includes ANT). There is a separate javadocs .BAT file, javadocs-HTMLtools.bat, that can be used to generate the Java class documentation in the javadocs/ directory. The .BAT files are renamed in the initial .zip file distribution and need to have their names restored before use (see Section 2.1 for details).
List of Java Class modules
Source code modules for HTMLtools application.
6.1 Converter GUI design
The ConverterGUI.jar file is just a copy of the HTMLtools.jar
file renamed to ConverterGUI.jar. When it runs, it checks the name it was
invoked under and then behaves the same as running HTMLtools -gui.
When started, it pops up a graphical user interface (see Section 2.1.1,
Using the Graphical User Interface (GUI) to run the converter).
The user selects, using the File menu, either a parameter .map script file or
a batch .doit file (which contains a list of .map script files). When they
press the Process button, it creates a new thread ProcessData.java
and has it execute the selected .map or .doit file. When processing, it
accumulates a list of HTML files that were generated. When done, it puts
this list into a View HTML chooser GUI. If the user selects one, it will
then pop up a Web browser showing this file.
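A minimal sketch of this Process-button flow is shown below; the class, method, and file names (other than ProcessData, which the manual names) are hypothetical stand-ins rather than the actual GUI code:

import java.awt.Desktop;
import java.io.File;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import javax.swing.JOptionPane;
import javax.swing.SwingUtilities;

/** Sketch of the Process button flow described above (hypothetical names). */
public class GuiProcessSketch {
    private final List<String> generatedHtml = new CopyOnWriteArrayList<>();

    /** Called when the user presses Process with a selected .map or .doit file. */
    public void onProcessPressed(final String scriptFile) {
        new Thread(() -> {
            runConversion(scriptFile);                          // long-running conversion
            SwingUtilities.invokeLater(this::showHtmlChooser);  // back on the Swing thread
        }, "ProcessData").start();
    }

    private void runConversion(String scriptFile) {
        // Stand-in: run the converter on scriptFile and record each HTML file it writes.
        generatedHtml.add("data.MRR/example-output.html");      // illustrative entry only
    }

    private void showHtmlChooser() {
        Object pick = JOptionPane.showInputDialog(null, "View HTML file:", "View HTML",
                JOptionPane.PLAIN_MESSAGE, null, generatedHtml.toArray(), null);
        if (pick != null) {
            try {
                Desktop.getDesktop().browse(new File((String) pick).toURI()); // pop up Web browser
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}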
6.2 Search GUI design
The SearchGui.jar file is just a copy of the HTMLtools.jar
file renamed to SearchGui.jar. When it runs, it checks the name it was
invoked under and then behaves the same as running HTMLtools -searchGui.
When started, it pops up a graphical user interface (see Section 2.1.2,
Search User Database with a Graphical User Interface (GUI) Generating Reports).
It also needs to load the Index Map (.idx) file which it does in the background
by creating a new thread ProcessLoadIndexMapData.java which lets the
user continue selecting data in the interface. Processing is delayed until the
map is loaded since it is used to verify the data entered by the user.
The user enters search information into the SearchGui interface to specify
1. a list of genes and/or Well IDs and/or Feature IDs (gene probe IDs);
and then 2. one or more experiment groups. When they press
the Process button, it creates a custom script
data.search/paramSearchFlip.map from a default
data.search/paramsSearchDefault.map script that is domain dependent. Then
it creates a new thread (ProcessDataSearch.java) and recursively calls
HTMLtools to execute the just generated paramSearchFlip.map
script file. The script includes the flip table options to actually generate the
flipped table on the specified subset of data. When the thread is done processing,
it has generated data.search/EGALLDataSet-search.txt and
data.search/EGALLDataSet-search.html files. It then lets the user press
the View HTML button to pop up a Web browser to see the
EGALLDataSet-search.html file.
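A minimal sketch of that script-generation step is given below. It assumes a helper that replaces the $$SEARCH_FILTERS$$ placeholder in paramsSearchDefault.map with flip switches built from the user's selections; only the Gene and experiment-group filters are shown, and the class and method names are hypothetical:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

/** Sketch of generating paramSearchFlip.map from the default script (hypothetical names). */
public class SearchScriptSketch {
    static void generateAndRun(List<String> genes, List<String> expGroups) throws IOException {
        // Build the concrete flip switches from the user's selections (Gene filter only here).
        String filters =
            "-flipOrderHdrColNames:\"Gene,Well ID,Feature ID\"\n" +
            "-flipColumnName:\"*LIST*,Gene," + String.join(",", genes) + "\"\n" +
            "-flipRowFilterNames:\"*LIST*," + String.join(",", expGroups) + "\"";

        // Substitute the $$SEARCH_FILTERS$$ placeholder in the default script.
        String template = new String(Files.readAllBytes(
                Paths.get("data.search/paramsSearchDefault.map")));
        String script = template.replace("$$SEARCH_FILTERS$$", filters);
        Files.write(Paths.get("data.search/paramSearchFlip.map"), script.getBytes());

        // Run the converter on the generated script from a worker thread
        // (the real code calls back into the HTMLtools classes).
        new Thread(() -> ConverterStub.run("data.search/paramSearchFlip.map"),
                   "ProcessDataSearch").start();
    }
}

/** Stand-in for invoking the converter on a params .map script. */
class ConverterStub {
    static void run(String mapFile) { System.out.println("Would run converter on " + mapFile); }
}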
The example data.search/paramSearchFlip.map below shows the flip options
produced by merging the user-specified data with the other default data
(see the file for the rest of the default options):
. . .
#"Database (.txt) and index map of database (.idx) to search"
-flipTableByIndexMap:"EGALLDataSet.txt,EGALLDataSet.idx"
#
-flipOrderHdrColNames:"Gene,Well ID,Feature ID"
-flipColumnName:"*LIST*,Gene,Stat5a,Stat5b"
-flipColumnName:"*LIST*,Feature ID,1438470_at,1441476_at,1446085_at"
-flipRowFilterNames:"*LIST*,EG002,EG003.1,EG003.2"
#
#"Set the data precision for generated HTML."
-dataPrecisionHTMLtable:0
#
#
#"Set the flip-table sort by column name."
-sortFlipTableByColumnName:"Stat5b"
#
#"Generate heat-map data cells in a HTML conversion if .sidx exists."
-showDataHeatmapFlipTable
#
. . .
Flip table computation
The flip table option uses three precomputed database files (created with HTMLtools): data.search/EGALLDataSet{.txt,.idx,.sidx} that allow random access to any row by probe ID (a gene may have one to half a dozen probe IDs). Since the number of genes used in a search is relatively small (under 100, and typically far fewer, on the order of 10 to 20), gathering the data is relatively fast. The script file specifies -flipTableByIndexMap:"EGALLDataSet.txt,EGALLDataSet.idx".
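The lookup itself can be pictured with the minimal sketch below, which assumes the .idx Index Map records the byte offset of each Feature ID's row in the .txt database; the class and field names are hypothetical, not the actual HTMLtools source:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.HashMap;
import java.util.Map;

/** Sketch of random-access row retrieval via an index map (assumed .idx layout). */
public class IndexSeekSketch {
    private final Map<String, Long> rowOffsets = new HashMap<>();  // Feature ID -> byte offset

    /** Fetch the tab-delimited row for one probe ID without scanning the whole database. */
    public String fetchRow(String probeId) throws IOException {
        Long offset = rowOffsets.get(probeId);
        if (offset == null)
            return null;                               // probe ID not in the database
        try (RandomAccessFile db = new RandomAccessFile("data.search/EGALLDataSet.txt", "r")) {
            db.seek(offset);                           // jump directly to the row
            return db.readLine();                      // one tab-delimited data row
        }
    }
}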
If the heatmap option is used, then it uses the EGALLDataSet.sidx file instead (i.e., Statistics Index Map file). This contains the same row seek data as the .idx file, but also statistics (min, max, mean, stddev) for each row and for the entire database. This is then used to map each table data cell value to a cell background color to implement the heat map.
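For example, a simple linear mapping of a cell value onto a background color between the min and max taken from the .sidx statistics could look like the sketch below (the actual color scale used by HTMLtools may differ):

/** Sketch: map a data value to an HTML background color using .sidx min/max statistics. */
public class HeatmapColorSketch {
    /** Low values map toward blue, mid-range toward green, high values toward red. */
    public static String heatmapColor(double value, double min, double max) {
        double t = (max > min) ? (value - min) / (max - min) : 0.5;   // normalize to [0,1]
        t = Math.max(0.0, Math.min(1.0, t));
        int red   = (int) Math.round(255 * t);
        int blue  = (int) Math.round(255 * (1.0 - t));
        int green = (int) Math.round(255 * (1.0 - Math.abs(2.0 * t - 1.0)));
        return String.format("#%02X%02X%02X", red, green, blue);     // e.g., used as a TD BGCOLOR value
    }
}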
In addition to the heatmap option (the default), it also lets you adjust the precision of the data (which normally has 3 or 4 digits) to 0 or more digits, and sort the rows by the data in a particular gene/probe ID column.
So the processing is broken into two parts: the GUI part (SearchGui) that gathers the argument list and generates the paramSearchFlip.map file, and the flip-table generation part that builds the search-results heatmap HTML table.
7. THE batchScripts/ DIRECTORY
This section documents the batchScripts/ directory for creating Web pages. It briefly describes 1) the contents of the batchScripts/ directory, 2) how it is created using the GenBatchScripts (GBS) facility of this converter program from the list of mAdb tests and the data from those tests, and 3) how it is used to create Web pages suitable for copying to the Jak-Stat Prospector (J-S P) Web server on http://jak-stat.nih.gov/. This example could serve as a model for developing static Web server pages for other types of analysis-system generated data to be used on a static Web server.
Before attempting to run the GenBatchScripts process to create the batchScripts/ directory, we recommend you familiarize yourself with the commands in this Reference Manual. The batchScripts/ directory contains a Windows .bat file to run the HTMLtools program, buildWebPages.bat, and a list of conversion batch jobs in buildWebPages.doit. The buildWebPages.doit file contains a list of generated conversions to be performed in converter-batch mode (as opposed to Windows-batch mode). Each conversion is in the form of a generated parameter .map file saved in the batchScripts/ParamScripts/ directory. (In the rest of this discussion, for brevity, we will omit the batchScripts/ prefix when mentioning these directories where it is unambiguous.)
There are additional support files that are required for updating the J-S P Web server tree. These are described at the end of this document in the discussion on converting data from GSP-Inventory for the J-S P.
7.1 Overview of Conversion Process
7.2 Contents of the batchScripts/ directory
The batchScripts/ directory contains files
and subdirectories required to convert the mAdb tab-delimited text data into
HTML and JTV data that can then be copied to the J-S P Web site. The list of
generated batchScripts/ subdirectories and files is:
ParamScripts/ - GBS generated conversion params*.map files
InputTree/ - mAdb generated .txt MRR and JTV data are copied by GBS
Summary/ - GBS generated text HTML top level Web pages
Analyses/ - GBS generated text HTML & edited .txt files
JTV/ - GBS generated JTV text HTML & edited JTV files
JTVjars/ - the JTV runtime jar files are copied by GBS
data.Table/ - the common mapping files are copied by GBS
buildWebPages.doit - GBS generated -batch script to convert the MRR and JTV data
buildWebPages.bat - GBS generated Windows BAT file to run the converter on the .doit file
ExperimentGroups.map - Experiment Group info by EGxxxx
CellTypeTissue.map - Tissue 'Introduction' by EGxxxx for summaries
EGMAP.map - map 'Affy .CEL file' names to 'GSP ID's
mAdbArraySummary.map - the 'mAdb ID' by 'Affy .CEL file'
The ExperimentGroups.map file is the tab-delimited sheet of that name in the GSP-Inventory.xls spreadsheet. The EGMAP.map tab-delimited file is the concatenation of the individual EGxxxx.txt data files from the GSP-Inventory.xls spreadsheet (see the Notes sheet in the GSP-Inventory), assembled into the map by the data.GSP-EG/params-GSPI-EG-concatTXT.map and data.Maps/params-Maps-EGMAP-map.map scripts. The mAdbArraySummary.map file is the saved 'mAdb Array Summary' for all of the samples data. All generated .map files are saved to the data.Table/ directory where they are used.
summaryTemplateProlog.html
summaryTemplateExperimental.html
summaryTemplateAnalysis.html
summaryTemplateFurtherAnalysis.html
summaryTemplateEpilogue.html
$$TISSUE$$ - tissue associated with the test
$$INTRODUCTION$$ - Introduction data from the CellTypeTissue.map file
$$LIST_EXPR_GROUPS$$ - list of expression groups used in the test
$$DESCRIPTION$$ - description using data from mAdb-TestsToDo data
$$ANALYSIS$$ - data generated for the "Analysis" section
$$FUTHERANALYSIS$$ - data generated for the "Further Analysis" section
$$DATE$$ - date of conversion
$$INFILENAME$$ - specific test name (e.g., EG1-test-1+FC-ALL.txt)
paramsTemplate-MRR.map - generate MRR gene expression HTML report
paramsTemplate-MRR-keep.map - generate MRR gene list HTML report
paramsTemplate-JTV-jtvReZip.map - generate mapped JTV data and JTV HTML applet
The $$ keywords are expanded during batchScripts/ParamScripts/ files generation (a minimal sketch of this expansion follows the template file lists below). The entries with ".2" in the name are used for subsequent name remapping during the second phase when evaluating the generated params .map files. This list is common for all params .map files generated for the same test.
$$DATE$$ - date of conversion
$$INPUT_DATA$$ - input data relative directory
$$OUTPUT_DATA$$ - output data relative directory
$$TABLE_DATA$$ - location of the .map files relative directory
$$A_SAMPLE_NAME$$ - name of the 'A' condition
$$B_SAMPLE_NAME$$ - name of the 'B' condition
$$JTV_JARS$$ - location of the JTV runtime .jar support files directory
$$TISSUE.2$$ - tissue associated with the test
$$PAGE_LABEL.2$$ - data from page label in mAdb-TestsToDo for test entry
$$DESCRIPTION.2$$ - data from description in mAdb-TestsToDo for test entry
$$CLASS_A.2$$ - list of GSP IDs for condition A
$$CLASS_B.2$$ - list of GSP IDs for condition B
This list can be different for each params .map file generated for the same test.
$$PROLOG$$ - name of prolog file prologMRR.html or prologJTV.html
$$EPILOGUE$$ - name of epilogue file epilogueMRR.html or epilogueJTV.html
$$TITLE.2$$ - title for specific generated Web page
$$TEST_OR_ALL.2$$ - "Test" or "All" modifier
$$GBS_DESCRIPTION.2$$ - test specific information
$$PARAM_MAP_NAME$$ - name of parameter file with (+-FC, -ALL, -JTV) modifiers
$$TESTMAME.2$$ - test name with (+-FC, -ALL, -JTV) modifiers
$$FILE.2$$ - data input file for each params .map
$$JOIN_TABLE_FILE.2$$ - the -joinTableFile file for MRR-ALL processing only
$$MAPDIR$$ - mAdb mapping file for JTV sample name processing only
prologMRR.html epilogueMRR.html and
prologJTV.html epilogueJTV.html
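The $$ keyword expansion mentioned above can be pictured with the minimal sketch below, which substitutes per-test values into a template params .map file; the helper name is hypothetical, and the real GBS code also performs the second-phase ".2" remapping:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;

/** Sketch of GBS template expansion: replace $$KEY$$ tokens with per-test values. */
public class TemplateExpandSketch {
    static void expandTemplate(String templateFile, String outFile,
                               Map<String, String> keywords) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get(templateFile)));
        for (Map.Entry<String, String> e : keywords.entrySet())
            text = text.replace("$$" + e.getKey() + "$$", e.getValue());
        Files.write(Paths.get(outFile), text.getBytes());
    }
    // e.g., expandTemplate("data.GBS/paramsTemplate-MRR.map",
    //                      "batchScripts/ParamScripts/params-EG1-test-1+FC.map", keywords);
}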
java -Xmx256M -classpath .;.\HTMLtools.jar HTMLtools \
     data.GBS/params-genBatchScripts.map
This will use the mAdb-TestsToDo.txt Table as well as other files in data.GBS/, including CellTypeTissue.map (the tab-delimited sheet from the GSP-Inventory.xls spreadsheet), to:
mAdb (MRR) and JTV generated data:
EG1-test-1+FC.txt EG1-test-1-FC.txt (t-Test or fold-change test gene set)
EG1-test-1+FC-ALL.txt EG1-test-1-FC-ALL.txt (AND of test gene set with ALL samples)
EG1-test-1+FC-JTV.zip EG1-test-1-FC-JTV.zip (JTV heatmap for gene set)
EG1-test-1+FC-JTV-ALL.zip EG1-test-1-FC-JTV-ALL.zip (JTV heatmap for ALL samples)
Converter output: the .txt files processed by the converter produce files with .html extensions; for the .zip files, the .zip extension is removed and an HTML file is generated to start up the JTV. If the mAdb group changes the JTV output to use the GSP IDs instead of the mAdb IDs, we can avoid processing the JTV .zip files.
On the J-S-P Web site:
EG1-test-1+FC.txt EG1-test-1-FC.txt (t-Test or fold-change gene set test)
EG1-test-1+FC-ALL.txt EG1-test-1-FC-ALL.txt (AND of test gene set with ALL samples)
and
EG1-test-1+FC.html EG1-test-1-FC.html (t-Test or F-C test - with expr. data)
EG1-test-1+FC-keep.html EG1-test-1-FC-keep.html (t-Test or F-C test - no expr. data)
EG1-test-1+FC-ALL.html EG1-test-1-FC-ALL.html (AND of test gene set with ALL samples)
EG1-test-1+FC-JTV.html EG1-test-1-FC-JTV.html (JTV heatmap for gene set)
EG1-test-1+FC-JTV-ALL.html EG1-test-1-FC-JTV-ALL.html (JTV heatmap for ALL samples)
EG1-test-1+FC-JTV.zip EG1-test-1-FC-JTV.zip (JTV heatmap for gene set)
EG1-test-1+FC-JTV-ALL.zip EG1-test-1-FC-JTV-ALL.zip (JTV heatmap for ALL samples)
java -Xmx512M -classpath .;.\HTMLtools.jar HTMLtools \
     -batchProcess:batchScripts/buildWebPages.doit
7.10. Converting data from GSP-Inventory for the Jak-Stat Prospector
There are additional support files that are required for updating the J-S P
Web server tree. The following J-S P GSP/ subdirectories are updated by running
the converter -batchProcess:batchList.doit batch job: GSP/GSP-Inventory/,
GSP/Search/, and GSP/Tests/, which are created in
html/GSP/. Each of these subdirectories has
two additional subdirectories, HTML/ and XLS/. The converter converts the
data to HTML and saves the results in the HTML/ subdirectories. The XLS/ data
is not created by the HTMLtools converter, but rather prepared separately from the
source data.
7.10.1 HTMLtools distribution directory
The distribution directory has the following data subdirectories required for
generating data for the Jak-Stat Prospector Web site.
data.GBS/ - GenBatchScripts scripts to create the batchScripts/ directory
data.GSPI-EG/ - EGxxxx.txt data, HTML and concatenated EGMAP.txt scripts
data.GSPI-ExpGrp/ - ExperimentGroups.txt data and HTML scripts
data.mAdb-TestsToDo/ - script to create HTML of mAdb-TestsToDo
data.Maps/ - the scripts used to create HTML and .map files
JTVjars/ - the JTV runtime jar files required
data.Table/ - primary .txt and .map files for the EGMAP, ExperimentGroups,
mAdbArraySummary, and mAdb-TestsToDo files
Directory trees that are created when running the converter. These will contain
the data to be copied to the J-S P Web site staging directory:
batchScripts/ - the directory created when running GenBatchScripts
html/ - the directory created when running -batchProcess:batchList.doit
JTVoutput/ - the JTV demonstration conversion output (from JTVinput/)
Additional directories contain demonstrations of other features that could
be used in conversions, including:
data.MRR/ - separate demonstration mAdb MRR conversions to HTML
data.MRR-all/ - fast-edit table conversion scripts using buffered I/O
JTVinput/ - the JTV demonstration conversion scripts and data
Additional directories are required for support of the converter. Note that the BAT
files in the demo-bat/ directory end in "-bat",
not ".bat". The
README-NOTE-restoring-the-BAT-file-names.txt file describes how to make
the BAT files in the demo-bat/ directory runnable.
build/build.xml - ANT build file for making the converter
demo-bat/ - additional Windows BAT scripts in portable ("...-bat") form
docs/ - additional converter documentation
javadocs/ - automatic javadoc Java documentation for the converter
src/ - source code for the HTMLtools converter
Additional top level files in the distribution directory:
HTMLtools.jar - converter Java jar file used by the BAT scripts
ReferenceManual.html - primary documentation for the converter
README-NOTE-restoring-the-BAT-file-names.txt - how to activate the BAT files
java -Xmx256M -classpath .;.\HTMLtools.jar HTMLtools \
     -batchProcess:batchList.doit
The subdirectories of generated files are created in html/GSP/ and then copied to subdirectories with the same names in the Jak-Stat Prospector Web tree. See Example 10 for the listing of the batchList.doit file.
It has been released with a small non-proprietary sample data set, currently publicly available on NCBI GEO, to demonstrate some of the aspects of the software.
It was derived and refactored from the open source MAExplorer (http://maexplorer.sourceforge.net/), and Open2Dprot (http://Open2Dprot.sourceforge.net/) Table modules.
Copyright 2008, 2009 by Peter Lemkin
E-Mail: lemkin@users.sourceforge.net
http://lemkingroup.com