
Nucleosome Dynamics (ND) is one of the analysis methods integrated into the platform. It offers a set of analyses packed into a single modular MuGVRE tool, together with a custom report viewer. The suite also interacts with other services on the platform:

- (i) The ArrayExpress repository includes MNase-seq experiments. These can be used either to import FASTQ reads or to compare against other nucleosome-annotated sequences from the literature.
- (ii) Alignment tools (BWA, Bowtie2) can produce the aligned reads (BAM format) that ND takes as input.
- (iii) The resulting nucleosome architectures, written as annotation files (GFF3, BW), can be visualized with two of the visualizers offered at MuGVRE: JBrowse and TADkit.

Nucleosome Dynamics’ tool preparation

The ND code (Nucleosome Dynamics CLI) is installed, together with its dependencies, in a MuG base VMI hosted at the MMB IRB cloud (see Methods, section 3.1.2.1), in this case configured with SGE. The MuGVRE tool integration protocol offers two options:

- (i) the support team provides IaaS access to the virtual instance, and the Tool Developer performs the installation there directly; or
- (ii) the Tool Developer makes the code public, and the support team installs it in the virtual instance.

The choice mainly depends on how complex the tool installation is. Here, we requested direct access to the VM.

The tool wrapping is implemented as a thin Python script built on MG-TOOL-API, which responds to the CLI agreed upon within MuG. The script parses the MuG auxiliary files and extracts from them the input file locations and the ND arguments. Depending on the received arguments, it builds the ND command(s) for the requested analyses (e.g. nucleR, nuclDyn, NFR) and calls a system subprocess for each one (Figure 4.37). The user switches pipeline analyses ON/OFF on the web, and the wrapper runs them accordingly (e.g. arguments.NUCLER: true/false). If multiple BAM files are given, the pipeline is executed multiple times. MuG metadata is read to learn the library type of the given BAM (single or paired) and its reference genome (e.g. hg38). The ND adaptor code is publicly available at the NucleosomeDynamics-MuGVRE repository (footnote 16), together with some documentation and test data.
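The following is a minimal sketch of the command-building logic described above. It is not the actual NucleosomeDynamics-MuGVRE adaptor and does not use MG-TOOL-API. It makes a few assumptions, all labelled hypothetical:

- Arguments arrive as the `{"name", "value"}` list shown in Figure 4.37.
- Each analysis is toggled by a boolean switch such as `NUCLER`.
- Its parameters follow the `SWITCH:param` naming convention.
- Script and file names (`nucleR.R`, `NR_<stem>.gff`) are copied from the figure's example calls.

```python
"""Sketch: turn MuG-style ND arguments into Rscript command lines.

Hypothetical assumptions: arguments are a [{"name": ..., "value": ...}] list
(Figure 4.37); each analysis is toggled by a boolean switch ("NUCLER",
"NUCDYN") whose parameters are named "SWITCH:param"; script and output file
names follow the example calls in Figure 4.37.
"""
import os
import subprocess
from typing import Callable, Dict, List, Tuple


def nucler_output(bam_path: str) -> str:
    """nucleR output GFF for a BAM, e.g. chrII4545.bam -> NR_chrII4545.gff."""
    folder, name = os.path.split(bam_path)
    stem = os.path.splitext(name)[0]
    return os.path.join(folder, "NR_" + stem + ".gff")


# (switch, R script, function returning the script's input file).
ANALYSES: List[Tuple[str, str, Callable[[str], str]]] = [
    ("NUCLER", "nucleR.R", lambda bam: bam),
    ("NUCDYN", "nuclDyn.R", nucler_output),  # nuclDyn reads nucleR's GFF
]


def to_dict(arguments: List[dict]) -> Dict[str, object]:
    """Flatten [{"name": n, "value": v}, ...] into {n: v, ...}."""
    return {arg["name"]: arg["value"] for arg in arguments}


def build_commands(arguments: List[dict], bam_path: str,
                   library_type: str) -> List[List[str]]:
    """Return one Rscript command per analysis switched ON by the user."""
    args = to_dict(arguments)
    commands = []
    for switch, script, input_for in ANALYSES:
        if args.get(switch) is not True:
            continue  # analysis switched OFF in the web form
        cmd = ["Rscript", script]
        prefix = switch + ":"
        for name, value in args.items():
            if name.startswith(prefix):
                cmd += ["--" + name[len(prefix):], str(value)]
        cmd += ["--type", library_type, input_for(bam_path)]
        commands.append(cmd)
    return commands


def run_pipeline(commands: List[List[str]], dry_run: bool = True) -> None:
    """Run each command as a subprocess, or only print it in dry-run mode."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step


if __name__ == "__main__":
    arguments = [
        {"name": "NUCLER", "value": True},
        {"name": "NUCLER:width", "value": 147},
        {"name": "NUCLER:hthresh", "value": 0.4},
        {"name": "NUCDYN", "value": True},
        {"name": "NUCDYN:range", "value": "All"},
        {"name": "NUCDYN:maxDiff", "value": 70},
    ]
    # The pipeline is run once for each input BAM.
    for bam in ["$WORKING_DIR/chrII4545.bam"]:
        run_pipeline(build_commands(arguments, bam, "paired"))
```

Commands are built as argument lists rather than shell strings, so `subprocess.run` needs no shell quoting. The dry-run mode lets the wrapper logic be checked without R installed.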

ND requires access to certain reference data. These data are part of the MuG data repository, which is accessible via NFS under the “public_dir” mount point on all SGE-enabled VMs. They consist of assembly-associated data (e.g. hg38 annotated genes) used to compute nucleosome-related statistics. MuGVRE sends the location of the public directory together with the rest of the arguments, and the script uses it to compose the paths of the files it needs.
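As a minimal sketch of this path composition, the code below resolves a reference file from the public directory and the BAM's assembly. The `<assembly>genes.gff` naming is a hypothetical convention inferred only from the example `$PUBLIC_DIR/hg38genes.gff` in Figure 4.37. Both function names are illustrative.

```python
import os
from typing import List


def reference_genes_path(public_dir: str, assembly: str,
                         check: bool = False) -> str:
    """Compose the path of the annotated-genes file for an assembly.

    Hypothetical naming convention inferred from Figure 4.37
    ($PUBLIC_DIR/hg38genes.gff): <public_dir>/<assembly>genes.gff.
    """
    path = os.path.join(public_dir, assembly + "genes.gff")
    if check and not os.path.isfile(path):
        raise FileNotFoundError(f"reference data not found: {path}")
    return path


def stats_command(script: str, public_dir: str, assembly: str,
                  gff_input: str) -> List[str]:
    """Build a statistics call such as the nucleRStats.R call in Figure 4.37."""
    return ["Rscript", script,
            "--genome", reference_genes_path(public_dir, assembly),
            "--input", gff_input]
```

Passing `check=True` makes a missing reference file fail early in the wrapper, instead of later inside the R script.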

Nucleosome Dynamics’ registration

Once the code is tested and ready, it is registered at the platform using a Tool Developer user account. The new tool request is submitted online and must be accepted by the MuGVRE support team. The following information is supplied to the platform:

- The Tool definition is the first and essential requirement. All the attributes of the Tool data model (section 8.4, MuG data models) must be filled in: list of input files, arguments, titles, keywords, etc. The developer's panel in MuGVRE embeds an online JSON schema validator that ensures the consistency of submitted Tools (Figure 4.38 (a)).

config.json (excerpt):

"arguments": [
    {"name": "execution",      "value": "$WORKING_DIR"},
    {"name": "NUCLER",         "value": true},
    {"name": "NUCLER:width",   "value": 147},
    {"name": "NUCLER:hthresh", "value": 0.4},
    {"name": "NUCDYN",         "value": true},
    {"name": "NUCDYN:range",   "value": "All"},
    {"name": "NUCDYN:maxDiff", "value": 70},
    […]

input_metadata.json (excerpt):

{
    "_id": "unique_file_id",
    "file_path": "$INPUTS_DIR/chrII_4545.bam",
    "file_type": "BAM",
    "data_type": "data_mnase_seq",
    "meta_data": {
        "paired": "paired",
        "assembly": "hg38",
        […]

Nucleosome Dynamics Tool Adaptor, resulting Rscript calls:

Rscript nucleR.R --width 147 --hthresh 0.4 --type paired […] $WORKING_DIR/chrII4545.bam
Rscript nucleRStats.R --genome $PUBLIC_DIR/hg38genes.gff --input $WORKING_DIR/NR_chrII4545.gff
Rscript nuclDyn.R --range All --maxDiff 70 --type paired […] $WORKING_DIR/NR_chrII4545.gff […]
Rscript nuclDynStats.R --genome $PUBLIC_DIR/hg38genes.gff --input $WORKING_DIR/ND_chrII4545.gff
[…]

Figure 4.37: The MuGVRE wrapper for Nucleosome Dynamics builds the Rscript calls.

The Nucleosome Dynamics Tool definition is available at the NucleosomeDynamics-MuGVRE repository.

- After ND is accepted, it appears on the web interface in “Testing” mode, so that only the Tool owner has access to it. Documentation and how-to tutorials are then added to the help pages, which can be edited online as Markdown files (Figure 4.38 (b)). Extra descriptive information, logo images and ND snapshots are also submitted for display on the home page.

- A self-contained single-page HTML/JavaScript application (hereinafter, the custom viewer) is prepared to display the ND statistical information for each run. The viewer renders some of the CSV files generated during the ND execution as histograms and HTML tables. Tool Developers submit it, and the support team integrates it into the platform. This is the mechanism MuGVRE offers for providing a custom visualization of tool results. The viewer is displayed under the “View Results” side menu of each Run folder in the workspace (Figure 4.38 (c)).

- Example datasets of ND input and output files are prepared and offered at MuGVRE as sample data, enabling a one-click import of demo data into the workspace. Three different datasets are prepared ([220]–[222]) with a dual purpose: demonstrating ND usage to MuGVRE users, and showcasing the potential of the ND method during journal peer review. Demo data are available under the “Get Data” section.
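The Tool definition check mentioned in the first item of the list above relies on an online JSON schema validator. The code below is a much simpler, stdlib-only stand-in that checks required attributes and their JSON types. It is not a full JSON Schema validator. The attribute names in `REQUIRED_FIELDS` are a hypothetical subset; the actual Tool data model is described in section 8.4.

```python
import json
from typing import Dict, List

# Hypothetical subset of Tool data model attributes and their JSON types;
# the real model is described in section 8.4 (MuG data models).
REQUIRED_FIELDS: Dict[str, type] = {
    "_id": str,
    "name": str,
    "title": str,
    "keywords": list,
    "input_files": list,
    "arguments": list,
}


def validate_tool(tool: dict) -> List[str]:
    """Return human-readable errors (an empty list if the Tool looks valid)."""
    errors = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in tool:
            errors.append(f"missing required attribute '{field}'")
        elif not isinstance(tool[field], expected):
            errors.append(f"'{field}' must be of type {expected.__name__}")
    # Every declared input file and argument needs at least a name.
    for field in ("input_files", "arguments"):
        items = tool.get(field)
        if isinstance(items, list):
            for i, item in enumerate(items):
                if not isinstance(item, dict) or "name" not in item:
                    errors.append(f"{field}[{i}] has no 'name'")
    return errors


if __name__ == "__main__":
    # Hypothetical submitted definition, parsed from JSON text.
    submitted = json.loads("""{
        "_id": "my_tool", "name": "My tool", "title": "Example tool",
        "keywords": ["MNase-seq"], "input_files": [{"name": "bam"}],
        "arguments": [{"name": "NUCLER"}]
    }""")
    print(validate_tool(submitted) or "Tool definition is consistent")
```

The function collects all errors instead of stopping at the first one. This mirrors how an online validator can report every inconsistency in a single pass.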

Running Nucleosome Dynamics

Running Nucleosome Dynamics (ND) at MuGVRE follows the same basic flow already described from the MuGVRE user's perspective, which can be summarized as:

I. Get Data

II. Choose and Configure Tool

III. Display Results

Figure 4.38: Nucleosome Dynamics Tool submission on MuGVRE.

(a) Online validation of the Tool object definition. (b) Example of a Markdown help page, editable online. (c) Nucleosome Dynamics custom viewer.

Choosing “Import example dataset” and then “Cell Cycle Data Set” quickly loads sample data sufficient for demonstration purposes. Two BAM files of type “MNaseq data” should then appear under the “Uploads” folder of our active workspace.

We can reach the ND Tool from either “Workspace” or “Launch Tool”:

- “Workspace” requires knowing in advance which input files ND takes. We select them in the Workspace, then choose “Nucleosome Dynamics” from the “Available Tools” drop-down menu that appears below, next to the shopping cart listing the selected data.
- “Launch Tool” is designed for a more exploratory use of MuGVRE. This section lists the available tools and visualizers, with descriptive information to help browse among them. Nucleosome Dynamics appears under “Analyse MNase-seq Data”.

We should now set a name for the new job (run[000] by default) and configure the ND pipeline. The web form lists the different ND analyses, each grouped under a switch button that enables or disables it. The input file and argument fields are then filled in. If files have already been selected in the “Workspace”, the form shows them in place. Otherwise, selecting each input field opens a dialog listing all the files suitable for that field, filtering out data whose “File Format” or “Data Type” is not appropriate.
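The input-selection dialog described above filters workspace files by “File Format” and “Data Type”. A minimal sketch of that filtering follows. The classes and attribute names are hypothetical. The example values `BAM` and `data_mnase_seq` are taken from the input metadata in Figure 4.37.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class WorkspaceFile:
    path: str
    file_type: str  # "File Format", e.g. "BAM"
    data_type: str  # "Data Type", e.g. "data_mnase_seq"


@dataclass
class InputField:
    name: str
    file_types: List[str] = field(default_factory=list)  # accepted formats
    data_types: List[str] = field(default_factory=list)  # accepted data types


def eligible_files(input_field: InputField,
                   files: List[WorkspaceFile]) -> List[WorkspaceFile]:
    """Keep only the workspace files whose format and data type the field accepts."""
    return [f for f in files
            if f.file_type in input_field.file_types
            and f.data_type in input_field.data_types]
```

For an ND input field accepting `["BAM"]` and `["data_mnase_seq"]`, only the uploaded MNase-seq BAM files would be offered. GFF results or BAMs of another data type would be hidden.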

After job submission, the Run folder appears in the “Workspace” with all the job information (e.g. arguments in use, requested resources) and the job status (i.e. PENDING, RUNNING, ERROR, FINISHED). Meanwhile, execution progress can also be monitored through the log file, shown either nicely formatted or in raw format.
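The four job states listed above can be modelled as a small enumeration, with ERROR and FINISHED as terminal states. The following minimal sketch polls a job until it reaches one of them. `get_status` is a hypothetical callback (for instance, one that queries the queue system or parses the job log). It is not part of any MuGVRE API.

```python
import time
from enum import Enum
from typing import Callable


class JobStatus(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    ERROR = "ERROR"
    FINISHED = "FINISHED"


TERMINAL = {JobStatus.ERROR, JobStatus.FINISHED}


def wait_for_job(get_status: Callable[[], JobStatus],
                 interval_s: float = 5.0, max_polls: int = 1000) -> JobStatus:
    """Poll a job until it reaches ERROR or FINISHED.

    get_status is a hypothetical callback; it is not a MuGVRE function.
    """
    for _ in range(max_polls):
        status = get_status()
        if status in TERMINAL:
            return status
        time.sleep(interval_s)  # still PENDING or RUNNING: wait and retry
    raise TimeoutError(f"job not finished after {max_polls} polls")
```

Bounding the number of polls ensures that a job stuck in PENDING or RUNNING surfaces as an explicit error rather than blocking indefinitely.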

On job completion (after 5–15 minutes, depending on the selected ND analyses), the GFF and BED files making up the ND results appear grouped inside the Run folder. Under “View Results”, some statistical data and reports for the run are displayed. 2D sequence annotation files can be displayed with either JBrowse or TADkit. Since TADkit is more focused on chromatin models, we choose the generic genome browser, JBrowse. JBrowse is available for a selection of reference genomes, including S. cerevisiae, the yeast in our dataset. ND sequence tracks can be visualized and compared with the user's other data, whether from the workspace or imported from ArrayExpress. They can also be compared with bibliographic tracks integrated into JBrowse as public reference data.
