
of-function studies have become feasible with RNAi (Wheeler et al., 2005). The interfering RNAs are spotted on a microarray and cell cultures are grown. The cultures can then be analyzed for different phenotypes such as cell morphology, cell viability, or the expression of a reporter gene.

7.2.4 Data integration

As mentioned above, an important challenge for future research is the improvement of data integration. The current situation is characterized by data of differing quality and level of detail. These differences have to be accommodated and, in addition, data from different experimental approaches and different species should be integrated into algorithmic approaches that aim at improving the understanding of certain biological processes or diseases.

Integration of data from different species is of particular importance. On the sequence level, comparison between different species has become a standard procedure for many research questions, as sequence conservation can often provide valuable information about the functional significance of certain features. An example of this approach was presented in Section 6.2.2, where we used sequence conservation to find functional transcription factor binding sites. An interesting question for future research is how far this comparative approach can be extended to other types of data such as networks or even expression data.

7.3 Final remarks

Most parts of this thesis were concerned with the integration of different data types, especially gene expression data and biological networks. As larger and better-curated network databases will certainly evolve, it is important to develop algorithms that can work with these new data. At the moment, two kinds of methods are prevalent. The first comprises simple querying mechanisms, usually based on keyword matching, as found in almost any web interface for a biological database. The second covers simulation approaches, which need very detailed models that are available for only very few biological systems. Pathway queries were developed as an attempt to provide a method that lies between these two approaches. They provide capabilities for complex queries that take network structure into account, as well as for basic reasoning using statistical scoring methods. As the name suggests, pathway queries are still mainly a querying mechanism. Further developments should improve the reasoning capabilities, possibly taking the dynamics of biological systems into account. Ideas for such developments could be borrowed from simulation methods as well as from classical reasoning systems. With such an improved querying and reasoning system, it will be possible to integrate existing and emerging biological databases much more efficiently into the task of interpreting newly generated data.

Appendix A

XML Schema and stylesheet of the Pathway Query Language

A.1 Schema definition
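As orientation for the schema listed in this section, a minimal pathway query instance might look like the following sketch. It describes a transcription factor node connected to a target node by a path of one or two edges, with the target scored by a rank-based p-value. The map names ("GO molecular function", "expression p-value", "expression fold change"), the threshold 0.05, and the node and subnet names are hypothetical and chosen only for illustration. The namespace URI is assumed to be written with plain ASCII hyphens. The sketch has been checked by hand only against the part of the schema reproduced in this appendix and uses only types that are fully defined there.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical pathway query: a transcription factor linked to a target
     node by a path of one or two edges. All map names and values are
     illustrative assumptions, not taken from an actual ToPNet setup. -->
<pw:TheNet xmlns:pw="http://bio.informatik.uni-muenchen.de/Pathways"
           name="TFToTarget">
  <!-- Subnets come first. Each one wraps a single PathwayNode here. -->
  <pw:Subnet name="TF">
    <pw:PathwayNode name="TFNode">
      <pw:Query>
        <pw:BasicQuery>
          <pw:MapName>GO molecular function</pw:MapName>
          <pw:Operator>eq</pw:Operator>
          <pw:Value>transcription factor activity</pw:Value>
        </pw:BasicQuery>
      </pw:Query>
    </pw:PathwayNode>
  </pw:Subnet>
  <pw:Subnet name="Target">
    <pw:PathwayNode name="TargetNode">
      <pw:Query>
        <pw:BasicQuery>
          <pw:MapName>expression p-value</pw:MapName>
          <pw:Operator>lt</pw:Operator>
          <pw:Value>0.05</pw:Value>
        </pw:BasicQuery>
      </pw:Query>
      <!-- RankScore is used on a node with multiplicity one (the default). -->
      <pw:Scoring>
        <pw:P-Value>
          <pw:RankScore MapName="expression fold change" tail="both"/>
        </pw:P-Value>
      </pw:Scoring>
    </pw:PathwayNode>
  </pw:Subnet>
  <!-- Connections follow the subnets and refer to them by name. -->
  <pw:Connection minEdges="1" maxEdges="2">
    <pw:ConnectFrom>TF</pw:ConnectFrom>
    <pw:ConnectTo>Target</pw:ConnectTo>
  </pw:Connection>
</pw:TheNet>
```

The contents of ConnectFrom and ConnectTo must match the name attribute of a Subnet element, since the FromRef and ToRef constraints refer to SubnetKey. All elements carry the pw prefix because the schema sets elementFormDefault to qualified.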

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:pw="http://bio.informatik.uni-muenchen.de/Pathways"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns="http://www.w3.org/1999/xhtml"
           elementFormDefault="qualified"
           targetNamespace="http://bio.informatik.uni-muenchen.de/Pathways"
           version="1.0" xml:lang="en">
  <xs:annotation>
    <xs:documentation xml:lang="en">Schema for pathway queries.
      A pathway query describes a template for biological networks with
      annotations and experimental data.
      <p>Florian Sohler, Ralf Zimmer: <cite>Identifying active transcription factors and
      kinases from expression data using Pathway Queries.</cite>
      Bioinformatics. 2005; 21(Suppl. 2):ii115-ii122.
      </p><p>
      Florian Sohler, Daniel Hanisch, Ralf Zimmer: <cite>
      New methods for joint analysis of biological networks and expression data.</cite>
      Bioinformatics. 2004 Jul 1;20(10):1517-21.
      </p>
    </xs:documentation>
  </xs:annotation>

  <xs:element name="TheNet" type="pw:Net">
    <xs:annotation>


      <xs:documentation xml:lang="en">
        The root element for a pathway query is TheNet.
      </xs:documentation>
    </xs:annotation>
    <xs:key name="SubnetKey">
      <xs:annotation>
        <xs:documentation xml:lang="en">Names of subnet elements must be unique.
          They are referred to in connection elements.</xs:documentation>
      </xs:annotation>
      <xs:selector xpath=".//pw:Subnet"/>
      <xs:field xpath="@name"/>
    </xs:key>
    <xs:keyref name="FromRef" refer="pw:SubnetKey">
      <xs:selector xpath=".//pw:Connection"/>
      <xs:field xpath="pw:ConnectFrom"/>
    </xs:keyref>
    <xs:keyref name="ToRef" refer="pw:SubnetKey">
      <xs:selector xpath=".//pw:Connection"/>
      <xs:field xpath="pw:ConnectTo"/>
    </xs:keyref>
    <!--unique name="ConnectionKey">
      <selector xpath=".//pw:Connection"/>
      <field xpath="@name"/>
    </unique-->
  </xs:element>
  <xs:complexType name="Net">
    <xs:annotation>

      <xs:documentation xml:lang="en">The type Net defines a network template.
        It specifies subnetworks or single nodes in the subnet element and connections
        (corresponding to paths in instances of the template) in the connection elements.
      </xs:documentation>
    </xs:annotation>
    <xs:choice>
      <xs:sequence>
        <xs:element maxOccurs="unbounded" minOccurs="0" name="Subnet" type="pw:Net"/>
        <xs:choice maxOccurs="unbounded" minOccurs="0">
          <xs:element name="Connection" type="pw:ConnectionType"/>
          <xs:element name="VirtualConnection" type="pw:VirtualConnectionType"/>
        </xs:choice>
      </xs:sequence>
      <xs:element name="PathwayNode" type="pw:PathwayNodeType"/>
    </xs:choice>
    <xs:attribute name="name" type="xs:string" use="required"/>
    <xs:attribute default="false" name="multiple" type="xs:boolean"/>


    <xs:attribute name="multipleMin" type="xs:nonNegativeInteger"/>
    <xs:attribute name="layer" type="xs:positiveInteger" use="optional"/>
  </xs:complexType>

  <xs:complexType name="ConnectionType">
    <xs:sequence>
      <xs:element maxOccurs="1" minOccurs="1" name="ConnectFrom" type="xs:string"/>
      <xs:element maxOccurs="1" minOccurs="1" name="ConnectTo" type="xs:string"/>
      <xs:element maxOccurs="1" minOccurs="0" name="PlaceQuery" type="pw:QueryType"/>
      <xs:element maxOccurs="1" minOccurs="0" name="TransitionQuery" type="pw:QueryType"/>
      <xs:element maxOccurs="1" minOccurs="0" name="Scoring" type="pw:ScoringType"/>
    </xs:sequence>
    <xs:attribute default="0" name="minEdges" type="xs:nonNegativeInteger"/>
    <xs:attribute default="1" name="maxEdges" type="xs:positiveInteger"/>
    <xs:attribute default="false" name="intersectionAllowed" type="xs:boolean"/>
    <xs:attribute default="false" name="undirected" type="xs:boolean"/>
    <xs:attribute name="layer" type="xs:nonNegativeInteger" use="optional"/>
  </xs:complexType>

  <xs:complexType name="VirtualConnectionType">
    <xs:sequence>
      <xs:element maxOccurs="1" minOccurs="1" name="ConnectFrom" type="xs:string"/>
      <xs:element maxOccurs="1" minOccurs="1" name="ConnectTo" type="xs:string"/>
      <xs:choice>
        <xs:element name="Comparison" type="xs:string"/>
        <xs:sequence>
          <xs:element minOccurs="0" name="FromDynamicQuery" type="pw:DynamicQueryType"/>
          <xs:element minOccurs="0" name="ToDynamicQuery" type="pw:DynamicQueryType"/>
        </xs:sequence>
      </xs:choice>
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name="DynamicQueryType">
    <xs:choice>
      <xs:sequence>
        <xs:element maxOccurs="1" minOccurs="1" name="BasicQuery" type="pw:BasicQueryType"/>
        <xs:element maxOccurs="unbounded" minOccurs="0" name="DynamicParameter">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="MapName" type="xs:string"/>
              <xs:element name="Node">
                <xs:simpleType>

                  <xs:restriction base="xs:string">
                    <xs:enumeration value="from"/>
                    <xs:enumeration value="to"/>
                  </xs:restriction>
                </xs:simpleType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:sequence>

        <xs:element maxOccurs="1" minOccurs="1" name="DynamicQuery1" type="pw:DynamicQueryType"/>
        <xs:element maxOccurs="1" minOccurs="1" name="Operator" type="pw:OperatorType"/>
        <xs:element maxOccurs="1" minOccurs="1" name="DynamicQuery2" type="pw:DynamicQueryType"/>
      </xs:sequence>
    </xs:choice>
  </xs:complexType>
  <xs:complexType name="PathwayNodeType">
    <xs:annotation>

      <xs:documentation xml:lang="en">Defines a node of the network template by specifying
        restrictions on node annotations.</xs:documentation>
    </xs:annotation>
    <xs:sequence>
      <xs:element maxOccurs="1" minOccurs="1" name="Query" type="pw:QueryType"/>
      <xs:element maxOccurs="1" minOccurs="0" name="Scoring" type="pw:ScoringType"/>
    </xs:sequence>
    <xs:attribute name="name" type="xs:string"/>
    <xs:attribute default="1000000000" name="maxVertices" type="xs:positiveInteger"/>
  </xs:complexType>

  <xs:complexType name="QueryType">
    <xs:annotation>
      <xs:documentation xml:lang="en">
        Combines BasicQuery elements using boolean operators.
      </xs:documentation>
    </xs:annotation>
    <xs:choice>
      <xs:element maxOccurs="1" minOccurs="1" name="BasicQuery" type="pw:BasicQueryType"/>
      <xs:sequence>
        <xs:element maxOccurs="1" minOccurs="1" name="Query1" type="pw:QueryType"/>


        <xs:element maxOccurs="1" minOccurs="1" name="Query2" type="pw:QueryType"/>
      </xs:sequence>
    </xs:choice>
  </xs:complexType>
  <xs:simpleType name="OperatorType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="and"/>
      <xs:enumeration value="or"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="BasicQueryType">
    <xs:annotation>
      <xs:documentation xml:lang="en">Defines a query on one annotation type, e.g.
        GO molecular function = transcription factor activity.</xs:documentation>
    </xs:annotation>
    <xs:sequence>
      <xs:element maxOccurs="1" minOccurs="1" name="MapName" type="xs:string"/>
      <xs:element maxOccurs="unbounded" minOccurs="0" name="Parameter" type="xs:string"/>
      <xs:element maxOccurs="1" minOccurs="1" name="Operator" type="pw:MapOperatorType"/>
      <xs:element maxOccurs="1" minOccurs="1" name="Value" type="xs:string"/>
    </xs:sequence>
    <xs:attribute default="false" name="Negated" type="xs:boolean"/>
  </xs:complexType>
  <xs:simpleType name="MapOperatorType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="lt"/>
      <xs:enumeration value="gt"/>
      <xs:enumeration value="eq"/>
      <xs:enumeration value="like"/>
      <xs:enumeration value="gteq"/>
      <xs:enumeration value="lteq"/>
      <xs:enumeration value="isnull"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:complexType name="ScoringType">
    <xs:annotation>

      <xs:documentation xml:lang="en">Defines different scoring methods that are used to score
        instances of the network template.</xs:documentation>

    </xs:annotation>

    <xs:choice>
      <xs:element name="P-Value" type="pw:PValType"/>
      <xs:element name="AdditiveScore" type="pw:AdditiveScoreType"/>
    </xs:choice>
  </xs:complexType>

  <xs:complexType name="PValType">
    <xs:annotation>
      <xs:documentation xml:lang="en">Defines scoring types that result in a p-value.
        The MapName scoring type simply refers to a ToPNet data map and gets its result
        by applying the data map to the nodes of the instance.
        The RankScore should only be applied to PathwayNodes with
        multiplicity one. It specifies a data map to rank all nodes of the search network.
        From these ranks significance values will be computed.
        The FETScore works similarly to the RankScore, but uses Fisher's exact test to
        compute p-values. A Query is performed on
        all nodes in the search graph. The significance of the overlap with the instance nodes
        for the given PathwayNode is computed.</xs:documentation>
    </xs:annotation>
    <xs:choice>
      <xs:element name="MapName" type="xs:string"/>
      <xs:element name="MultiNode">
        <xs:complexType/>
      </xs:element>
      <xs:element name="RankScore">
        <xs:complexType>
          <xs:attribute name="MapName" type="xs:string"/>
          <xs:attribute name="tail" use="required">
            <xs:simpleType>
              <xs:restriction base="xs:string">
                <xs:enumeration value="high"/>
                <xs:enumeration value="low"/>
                <xs:enumeration value="both"/>
              </xs:restriction>
            </xs:simpleType>
          </xs:attribute>
        </xs:complexType>
      </xs:element>
      <xs:element name="FETScore">
        <xs:complexType>
          <xs:sequence>
            <xs:element name="Query" type="pw:QueryType"/>
          </xs:sequence>
