The graphical user interface was implemented to give users an easy entry point and to make working with the jPREdictor more comfortable. All settings and parameters can be found here, and tasks are started simply by pressing the corresponding action buttons.
The graphical part of the jPREdictor can be started either as a stand-alone program or from within a website, as a Java Applet or as a Java WebStart application. Both the Applet and the WebStart application are embedded in the jPREdictor website:
http://bibiserv.techfak.uni-bielefeld.de/jpredictor
To start the program with the graphical user interface simply type
java -jar jPREdictor.jar
into the terminal. Prior to version 1.2 this was the only way to start the GUI, and no parameters were accepted at start-up. Version 1.2 introduced parameters that force the jPREdictor into certain actions. One of these actions is to start the GUI regardless of other settings, which is done with:
java -jar jPREdictor.jar [parameters] --forceGUI [parameters]
The jPREdictor program first evaluates all other given parameters and then starts the graphical user interface. Note that all other force parameters are ignored.
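The start-up rule described above can be illustrated with a short Java sketch. It is a hypothetical model, not the jPREdictor's actual argument parser: the class and method names are invented, the parameters in the usage example are placeholders, and we assume that the other force parameters share the prefix "--force", which the text does not state. Without any parameters, the sketch opens the GUI, as the program did before version 1.2.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of the --forceGUI start-up rule; not the jPREdictor's real parser. */
class StartupSketch {

    /** Outcome of evaluating the command line. */
    static final class Plan {
        final List<String> evaluatedParameters = new ArrayList<>();
        final List<String> ignoredForceFlags = new ArrayList<>();
        boolean startGui;
    }

    static Plan evaluate(String[] args) {
        Plan plan = new Plan();
        boolean forceGui = Arrays.asList(args).contains("--forceGUI");
        for (String arg : args) {
            if (arg.equals("--forceGUI")) {
                continue; // acted upon last, after all other parameters
            }
            if (forceGui && arg.startsWith("--force")) {
                // assumed "--force" prefix; other force parameters are ignored when --forceGUI is given
                plan.ignoredForceFlags.add(arg);
            } else {
                plan.evaluatedParameters.add(arg); // ordinary parameters are evaluated first
            }
        }
        // Without any parameters the GUI starts, as it always did before version 1.2.
        plan.startGui = forceGui || args.length == 0;
        return plan;
    }
}
```

For the hypothetical call with "-x --forceFoo --forceGUI -y", the sketch evaluates "-x" and "-y", ignores "--forceFoo", and then starts the GUI.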
3.6.1.1 Main window
The main window of the jPREdictor is organized into sections, each grouping related components (Figure 3.7). All components in Section 1 relate to file input and output, namely the sequence files to search or score, the positive and negative training sets, and the output file. Files can be selected by pressing the corresponding buttons or typed in directly. Alternatively, sequence data can be entered directly by pressing the “Paste...” button.
Section 2 contains additional parameters and settings (Figure 3.7). Both window parameters are required for scoring a sequence, since the score is calculated within a window that slides along the sequence. The width is given in nucleotides, e.g. 500. The shift is the number of nucleotides between the start positions of two consecutive windows; for example, if the first window starts at position 700 and the second at position 710, the shift is 10. The cut-off value changes the output of the “Score sequence” task. If it is omitted, the score of every sequence window is printed. If it is specified, only windows scoring higher than the cut-off are reported, after an additional step that merges overlapping windows into so-called bands.
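The window parameters and the band step can be illustrated with the following Java sketch, which enumerates window start positions for a given width and shift and merges overlapping windows above the cut-off into bands. It is a simplified model under stated assumptions: positions are 0-based, only complete windows are generated, the window scores are computed elsewhere, and a band is simply the union of overlapping above-cut-off windows. The jPREdictor may treat sequence ends and band scores differently, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of sliding windows and band merging; not the jPREdictor's code. */
class WindowSketch {

    /** Start positions (0-based) of all complete windows of the given width and shift. */
    static List<Integer> windowStarts(int sequenceLength, int width, int shift) {
        List<Integer> starts = new ArrayList<>();
        for (int start = 0; start + width <= sequenceLength; start += shift) {
            starts.add(start);
        }
        return starts;
    }

    /**
     * Keeps windows scoring strictly above the cut-off and merges overlapping ones
     * into bands, returned as half-open intervals {start, end}.
     * scores.get(i) is assumed to belong to the window starting at starts.get(i),
     * and starts are assumed to be in increasing order.
     */
    static List<int[]> bands(List<Integer> starts, List<Double> scores, int width, double cutoff) {
        List<int[]> result = new ArrayList<>();
        int[] current = null;
        for (int i = 0; i < starts.size(); i++) {
            if (scores.get(i) <= cutoff) {
                continue; // window not above the cut-off
            }
            int start = starts.get(i);
            int end = start + width;
            if (current != null && start < current[1]) {
                current[1] = Math.max(current[1], end); // overlaps the open band: extend it
            } else {
                current = new int[] {start, end}; // no overlap: start a new band
                result.add(current);
            }
        }
        return result;
    }
}
```

For a 1,000 nt sequence with width 500 and shift 10, windowStarts returns the 51 start positions 0, 10, ..., 500.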
Four checkboxes can be found in Section 2 (Figure 3.7). The first checkbox, “Output motif occurrences”, applies to the “Search for motifs” task. Normally, searching for motifs does not return the single hits but the number of motif hits per sequence. With this checkbox activated, every single hit is printed. The second checkbox, “Weight normalization”, enables or disables sequence length normalization when calculating the weights (see Chapter 3.3.2 for an explanation). This checkbox applies to the “Weight motifs” task. The last two checkboxes redirect the output. The first of them, “Mirror output”, copies all text that is normally written to the output file given in Section 1 to the output section (Section 5). If a lot of output is produced, this may slow down the jPREdictor considerably, since Java's text fields are not designed for very large amounts of text. The graphical output, enabled by checking the last box in Section 2, only affects the “Score sequence” task. After the task is started, a window called the ScorePlotBrowser pops up and shows a graph of the scores (see Figure 3.8).
Figure 3.7: Main window of the jPREdictor program. Sections 1 and 2 contain settings and parameters, Section 3 lists all usable motifs in a tree structure, Section 4 contains the action buttons for starting tasks, and Section 5 has two output fields, the upper one for “normal” output and the lower one for error messages.
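The difference between the default output of “Search for motifs” (a count per sequence) and the output with “Output motif occurrences” checked (every single hit) can be sketched in Java with java.util.regex. This is an illustration only: the class name is hypothetical, the pattern GAGAG in the usage example is a made-up example rather than a motif definition taken from the jPREdictor, and counting overlapping hits is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of counting vs. listing motif hits; not the jPREdictor's code. */
class MotifSearchSketch {

    /** All start positions (0-based) of a regular-expression motif; overlapping hits are included. */
    static List<Integer> occurrences(Pattern motif, String sequence) {
        List<Integer> hits = new ArrayList<>();
        Matcher m = motif.matcher(sequence);
        int from = 0;
        while (from < sequence.length() && m.find(from)) {
            hits.add(m.start());
            from = m.start() + 1; // resume one position later so overlapping hits are found
        }
        return hits;
    }

    /** Default output: only the number of hits in the sequence. */
    static int count(Pattern motif, String sequence) {
        return occurrences(motif, sequence).size();
    }
}
```

With the hypothetical pattern GAGAG, the sequence GAGAGAGTTGAGAG yields the hits 0, 2 and 9, so the default output would report 3.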
Section 3 (Figure 3.7) shows the list of motifs used for searching and scoring. Motifs can be checked or unchecked to include them in or exclude them from the tasks. This list can be filled and edited by loading an option file via the menu File→Load..., by selecting a predefined set of motifs from the Presets menu, or by designing motifs in the built-in MotifMaker. The list of motifs can be saved to an option file, together with most parameters and settings, via the File→Save... menu.
Section 4 (Figure 3.7) holds the buttons that start the tasks. Pressing the first button, “Search for motifs”, makes the jPREdictor search for the checked motifs in the given sequence file or in the pasted sequence. For every motif and every sequence it outputs the number of hits or, if the box “Output motif occurrences” is checked, an exhaustive list of all motif occurrences. Pressing the second button, “Weight motifs”, searches both the positive and the negative set for all checked motifs. As a result, the motifs are assigned weights as explained in Chapter 3.3.2; any previously assigned weights are discarded. Pressing the third button, “Score sequence”, starts scoring a sequence window by window. Note that all checked motifs must have been weighted beforehand; otherwise an error message pops up. All three tasks can be canceled at any time by pressing the “Cancel” button, which is only available once a task has been started. All three tasks also send their status and error messages to the output section, numbered 5 in Figure 3.7.
Figure 3.8: Example score plot over the bithorax complex of D. melanogaster. The grey vertical bar marks the position of the sequence excerpt shown in the lower part of the window, in which single motifs on the sequence are highlighted in green.
3.6.1.2 Cut-off calculator
Section 4 contains a fourth button, which starts a cut-off calculation via the CutoffCalculator (see Figure 3.9). The cut-off calculator works by generating a large number of random sequences. The underlying generator uses a zeroth-order Markov chain model, i.e. each nucleotide is drawn independently from a fixed nucleotide distribution. This distribution is either the program-wide one or is obtained by counting the nucleotides in the sequence file given in the main window (Section 1 in Figure 3.7). The counting is started by pressing the “Get Distribution” button. The number of sequences to generate is given in the text field “Sample Size” (Figure 3.9), and the length of each generated sequence in the text field “Genome length”.
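A zeroth-order Markov generator of the kind described above, together with the nucleotide counting behind “Get Distribution”, can be sketched in Java as follows. The class and method names are hypothetical; we assume that characters other than A, C, G and T (e.g. N) are skipped when counting, and that a uniform distribution is used if no nucleotide is found at all.

```java
import java.util.Random;

/** Hypothetical sketch of a zeroth-order Markov sequence generator; not the jPREdictor's code. */
class RandomSequenceSketch {

    static final char[] NUCLEOTIDES = {'A', 'C', 'G', 'T'};

    /** Relative frequencies of A, C, G, T; other characters (e.g. N) are skipped (assumption). */
    static double[] distribution(String sequence) {
        long[] counts = new long[4];
        long total = 0;
        for (char c : sequence.toUpperCase().toCharArray()) {
            int idx = "ACGT".indexOf(c);
            if (idx >= 0) {
                counts[idx]++;
                total++;
            }
        }
        double[] freq = new double[4];
        for (int i = 0; i < 4; i++) {
            freq[i] = total == 0 ? 0.25 : (double) counts[i] / total; // uniform fallback (assumption)
        }
        return freq;
    }

    /** Zeroth-order Markov chain: every position is drawn independently from freq. */
    static String generate(int length, double[] freq, Random rng) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            double u = rng.nextDouble();
            double cumulative = 0.0;
            int pick = 3; // fallback to the last nucleotide in case of rounding error
            for (int j = 0; j < 4; j++) {
                cumulative += freq[j];
                if (u < cumulative) {
                    pick = j;
                    break;
                }
            }
            sb.append(NUCLEOTIDES[pick]);
        }
        return sb.toString();
    }
}
```

Because each position ignores its predecessors, the generated sequences reproduce only the nucleotide composition of the input, not dinucleotide or longer patterns.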
To obtain a valid cut-off, all generated sequences are scored and, for each possible cut-off, the number of windows scoring above it is counted. The motifs used for scoring are those set in the main window of the jPREdictor. Finally, a window pops up listing the optimal cut-off for every E-value. When the real genome is scored, the E-value is the number of sequence windows expected to score above the corresponding cut-off merely by chance.
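The step from counted windows to a cut-off per E-value can be sketched in Java, under assumptions that the text does not state: the window scores of all generated sequences are pooled; each generated sequence is as long as the genome, so the expected number of windows above a cut-off in one genome is estimated as the count above it divided by the sample size; and the “optimal” cut-off for an E-value is taken to be the lowest observed score whose estimate does not exceed that E-value. The class and method names are hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: lowest cut-off whose estimated E-value does not exceed a target. */
class CutoffSketch {

    /**
     * windowScores: scores of all windows from all generated sequences (pooled).
     * sampleSize:   number of generated sequences, each assumed to be genome-sized.
     */
    static double cutoffForEValue(double[] windowScores, int sampleSize, double eValue) {
        double[] s = windowScores.clone();
        Arrays.sort(s); // ascending
        int n = s.length;
        int i = 0;
        while (i < n) {
            double candidate = s[i];
            int j = i;
            while (j < n && s[j] == candidate) {
                j++; // skip ties; afterwards s[j..n-1] are the windows scoring above the candidate
            }
            double expectedAbove = (double) (n - j) / sampleSize;
            if (expectedAbove <= eValue) {
                return candidate;
            }
            i = j;
        }
        return Double.NaN; // no windows at all, or a negative target E-value
    }
}
```

For pooled scores {5, 1, 8, 2, 3, 2} from two generated sequences, an E-value of 1.0 yields the cut-off 3.0, since two windows (5 and 8) score above it, i.e. one per genome on average.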
Figure 3.9: The CutoffCalculator built into the jPREdictor. Performs a cut-off calculation by scoring randomly generated sequences, using the motifs from the main window of the jPREdictor. After start-up (left), while performing a calculation (middle), and after finishing the calculation (right).
3.6.1.3 Motif maker
The MotifMaker is started by pressing the “MotifMaker” button in Section 3 of the jPREdictor main window (Figure 3.7). It allows for easy creation and editing of motifs and for rearranging the list of used motifs. Figure 3.10 shows two windows with different tabs activated: on the left for editing a regular expression motif, and on the right for editing a matrix motif. All attributes of a motif can be changed, e.g. identifier (name), description, weight, and search direction. To reproduce both MotifMaker windows, select Presets→New (2006) PRE/TRE prediction on D. melanogaster in the jPREdictor’s main window, then click the “MotifMaker” button. Within the MotifMaker, expand the tree on the left until an En1 or a pssmPHO motif, respectively, is selectable. Pressing “Edit” automatically switches to the tab that displays all attributes of the motif. For the pssmPHO motif, all sequences the matrix was created from can be viewed by clicking the “Paste...” button.
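Matrix motifs such as pssmPHO are position specific score matrices built from a set of sequences. The Java sketch below shows one common way to derive such a matrix (natural-log odds against a uniform background, with a pseudocount) and to score a word, including the reverse strand via the reverse complement. It is a generic textbook construction, not necessarily the formula the jPREdictor uses; the class name is hypothetical, and reading a search direction covering both strands as scanning the reverse complement is an assumption.

```java
/** Hypothetical, generic PSSM sketch; not necessarily the jPREdictor's scoring scheme. */
class PssmSketch {

    private static final String ALPHABET = "ACGT";

    /** Log-odds matrix [position][nucleotide] from equal-length sites, uniform background. */
    static double[][] build(String[] sites, double pseudocount) {
        int len = sites[0].length();
        double[][] m = new double[len][4];
        for (int pos = 0; pos < len; pos++) {
            double[] counts = new double[4];
            for (String site : sites) {
                int idx = ALPHABET.indexOf(Character.toUpperCase(site.charAt(pos)));
                if (idx >= 0) {
                    counts[idx]++;
                }
            }
            double total = 0;
            for (double c : counts) {
                total += c + pseudocount;
            }
            for (int b = 0; b < 4; b++) {
                double p = (counts[b] + pseudocount) / total;
                m[pos][b] = Math.log(p / 0.25); // log-odds against a uniform background
            }
        }
        return m;
    }

    /** Score of the word starting at offset; a character outside ACGT makes it unscorable. */
    static double score(double[][] m, String seq, int offset) {
        double s = 0;
        for (int pos = 0; pos < m.length; pos++) {
            int idx = ALPHABET.indexOf(Character.toUpperCase(seq.charAt(offset + pos)));
            if (idx < 0) {
                return Double.NEGATIVE_INFINITY;
            }
            s += m[pos][idx];
        }
        return s;
    }

    /** Reverse complement, used here to scan the opposite strand. */
    static String reverseComplement(String seq) {
        StringBuilder sb = new StringBuilder(seq.length());
        for (int i = seq.length() - 1; i >= 0; i--) {
            int idx = ALPHABET.indexOf(Character.toUpperCase(seq.charAt(i)));
            sb.append(idx < 0 ? 'N' : "TGCA".charAt(idx));
        }
        return sb.toString();
    }
}
```

A site that appears as ATGGC on the forward strand scores as highly as the consensus once its reverse complement, GCCAT, is scored.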
After all changes are finished, the newly created motif can either replace the motifs selected in the tree (button “Overwrite”) or be registered as a new motif (button “Register”). Note that the name acts as a unique identifier, so duplicate names are not allowed.
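The naming rule for “Register” and “Overwrite” can be modeled with a map keyed by motif name. The Java sketch below is hypothetical: the class name and the motif name newMotif in the usage example are invented, motif definitions are reduced to plain strings, and we assume that an overwrite may reuse the name of one of the replaced motifs but not the name of any other motif.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of unique motif names; not the jPREdictor's code. */
class MotifRegistrySketch {

    private final Map<String, String> motifs = new LinkedHashMap<>(); // name -> simplified definition

    /** "Register": add a new motif; an existing name is rejected. */
    boolean register(String name, String definition) {
        if (motifs.containsKey(name)) {
            return false;
        }
        motifs.put(name, definition);
        return true;
    }

    /** "Overwrite": remove the selected motifs and add the edited motif instead. */
    boolean overwrite(List<String> selectedNames, String name, String definition) {
        // assumption: the new name may equal a replaced motif's name, but no other existing name
        if (motifs.containsKey(name) && !selectedNames.contains(name)) {
            return false;
        }
        for (String selected : selectedNames) {
            motifs.remove(selected);
        }
        motifs.put(name, definition);
        return true;
    }

    boolean contains(String name) {
        return motifs.containsKey(name);
    }

    int size() {
        return motifs.size();
    }
}
```

In this model, registering En1 twice fails on the second attempt, while overwriting pssmPHO with a motif called newMotif removes pssmPHO and adds newMotif.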
Figure 3.11 shows the tab for creating and editing motif patterns, also called MultiMotifs. Since MultiMotifs can become very complicated, the usual procedure is to create simple, flat MultiMotifs first and afterwards combine them into higher-order MultiMotifs of any complexity. The intermediate motifs are all stored in the list of motifs on the left and have to be deleted after the assembly process. To insert a distance constraint between motifs, type in the minimal and maximal distance and then press the “Add” button. Note that the insert policy appends a distance constraint or a motif at the end only if nothing is selected; if a selection is made, the constraint or motif is placed before it.
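The insert policy for MultiMotifs can be sketched with a simple list model in Java. Everything here is hypothetical: a MultiMotif is modeled as a list of motif references and distance constraints, nesting is expressed by referring to another MultiMotif by name (as the intermediate motifs are kept in the motif list), and “nothing selected” is encoded as a negative index.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical list model of a MultiMotif and its insert policy; not the jPREdictor's code. */
class MultiMotifSketch {

    /** An element of a MultiMotif: a motif reference or a distance constraint. */
    interface Element {}

    /** Reference by name to a simple motif or to another (intermediate) MultiMotif. */
    static final class MotifRef implements Element {
        final String name;

        MotifRef(String name) {
            this.name = name;
        }

        @Override
        public String toString() {
            return name;
        }
    }

    /** Minimal and maximal distance between neighbouring motifs. */
    static final class Distance implements Element {
        final int min;
        final int max;

        Distance(int min, int max) {
            if (min < 0 || max < min) {
                throw new IllegalArgumentException("need 0 <= min <= max");
            }
            this.min = min;
            this.max = max;
        }

        @Override
        public String toString() {
            return "[" + min + "," + max + "]";
        }
    }

    final List<Element> elements = new ArrayList<>();

    /** Insert policy: append if nothing is selected (selected < 0), else insert before the selection. */
    void insert(Element e, int selected) {
        if (selected < 0) {
            elements.add(e);
        } else {
            elements.add(selected, e);
        }
    }

    @Override
    public String toString() {
        return elements.toString();
    }
}
```

Rebuilding the pattern of Figure 3.11 in this model, inserting PHO-DSP1 and Z:Z with nothing selected and then pssmPHO with Z:Z selected yields [PHO-DSP1, pssmPHO, Z:Z].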
Figure 3.10: The MotifMaker with two of its five tabs, for editing a RegularExpressionMotif and a PSSM motif. The left picture shows the regular expression motif Engrailed, the right picture the position specific score matrix for the Pho motif.
Figure 3.11: The MotifMaker with the tab for creating and editing motif patterns, so-called MultiMotifs, selected. A non-flat sample motif was created, containing one MultiMotif, PHO-DSP1, followed by a matrix motif, pssmPHO, and terminated by another MultiMotif, Z:Z.