HYPOTHESIS:
3. MATERIALS AND METHODS
4.2 STUDY OF THE CONTACTS
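A plane graph is a pair $(V, E)$ of finite sets with the following properties:
1. $V \subseteq \mathbb{R}^2$;
2. every edge is an arc between two vertices;
3. different edges have different sets of endpoints;
4. the interior of an edge contains no vertex and no point of any other edge.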
When $G$ is a plane graph, we call the regions of $\mathbb{R}^2 \setminus G$ the faces of $G$. These are open subsets of $\mathbb{R}^2$ and hence have their frontiers in $G$. Since $G$ is bounded (i.e., it lies inside some sufficiently large disc $D$), exactly one of its faces is unbounded: the face that contains $\mathbb{R}^2 \setminus D$. This face is the outer face of $G$; the other faces are its inner faces. We denote the set of faces of $G$ by $F(G)$. A planar graph is a graph which can be embedded in $\mathbb{R}^2$ to form a plane graph.
Definition 7.16. Let $G = (V, E)$ and $G' = (V', E')$ be two graphs. A map $\phi: V \to V'$ is a homomorphism from $G$ to $G'$ if it preserves the adjacency of vertices, that is, if $\{\phi(x), \phi(y)\} \in E'$ whenever $\{x, y\} \in E$. If $\phi$ is bijective and its inverse $\phi^{-1}$ is also a homomorphism (so that $xy \in E \Leftrightarrow \phi(x)\phi(y) \in E'$ for all $x, y \in V$), we call $\phi$ an isomorphism, say that $G$ and $G'$ are isomorphic, and write $G \simeq G'$. An isomorphism from $G$ to itself is an automorphism of $G$. Let $G'' = (V'', E'')$ be a subgraph of $G'$. If a bijective homomorphism $\phi: V \to V''$ from $G$ to $G''$ exists, then $G$ is subgraph isomorphic with $G'$; that is, $G$ is isomorphic with a subgraph of $G'$.
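As a minimal sketch of Definition 7.16, the Python functions below check whether a given vertex map is a homomorphism, an isomorphism or a subgraph isomorphism. This is not the thesis's own implementation. For illustration only, graphs are assumed to be a vertex set plus a set of two-element `frozenset` edges, and all function names are hypothetical. A subgraph isomorphism is tested as an injective homomorphism. That is equivalent to a bijective homomorphism onto a subgraph $G''$ of $G'$.

```python
from typing import Dict, FrozenSet, Hashable, Set

Vertex = Hashable
Edge = FrozenSet[Vertex]


def edges(*pairs) -> Set[Edge]:
    """Build an undirected edge set from (u, v) pairs."""
    return {frozenset(p) for p in pairs}


def is_homomorphism(phi: Dict[Vertex, Vertex], E: Set[Edge], E_prime: Set[Edge]) -> bool:
    """True if {phi(x), phi(y)} is an edge of G' for every edge {x, y} of G."""
    return all(frozenset(phi[v] for v in e) in E_prime for e in E)


def is_isomorphism(phi, V, E, V_prime, E_prime) -> bool:
    """phi must be a bijection V -> V' whose inverse is also a homomorphism."""
    bijective = (set(phi) == set(V) and len(V) == len(V_prime)
                 and set(phi.values()) == set(V_prime))
    if not bijective:
        return False
    inverse = {w: v for v, w in phi.items()}
    return is_homomorphism(phi, E, E_prime) and is_homomorphism(inverse, E_prime, E)


def is_subgraph_isomorphism(phi, V, E, E_prime) -> bool:
    """Injective homomorphism: G is isomorphic to the subgraph (phi(V), phi(E)) of G'."""
    injective = set(phi) == set(V) and len(set(phi.values())) == len(V)
    return injective and is_homomorphism(phi, E, E_prime)


# Usage: the path 1-2-3 mapped onto the triangle x-y-z.
V, E = {1, 2, 3}, edges((1, 2), (2, 3))
V_t, E_t = {"x", "y", "z"}, edges(("x", "y"), ("y", "z"), ("x", "z"))
phi = {1: "x", 2: "y", 3: "z"}
print(is_homomorphism(phi, E, E_t))             # True
print(is_isomorphism(phi, V, E, V_t, E_t))      # False: edge {x, z} has no preimage
print(is_subgraph_isomorphism(phi, V, E, E_t))  # True
```

Mapping the path onto the triangle preserves every edge, so it is a homomorphism and a subgraph isomorphism. It is not an isomorphism, because the inverse map sends the triangle edge $\{x, z\}$ to the non-edge $\{1, 3\}$.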
7.2. Condensed Molecular Graph
The vertex and edge colourings of a molecular graph (definition 7.14) contain insufficient molecular information to be used for the parametrisation process. In particular, they lack stereochemical information and formal charge information on the atoms. Additionally, molecular graphs contain a large number of leaves, which can cause an exponential explosion in the number of (sub)graph isomorphic mappings between two molecular graphs. To bypass these issues, the concept of a condensed molecular graph is introduced.
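To make the growth in the number of mappings concrete, the brute-force Python sketch below counts the automorphisms of methane and ethane with explicit hydrogen leaves. It then does the same for their condensed counterparts, with the hydrogens removed. The sketch ignores vertex and edge colours. That does not change these particular counts, because carbon and hydrogen already differ in degree. The function and variable names are illustrative only.

```python
from itertools import permutations


def count_automorphisms(vertices, edge_list):
    """Count vertex permutations that map the edge set onto itself (brute force)."""
    vertices = list(vertices)
    edge_set = {frozenset(e) for e in edge_list}
    count = 0
    for image in permutations(vertices):
        phi = dict(zip(vertices, image))
        if {frozenset(phi[v] for v in e) for e in edge_set} == edge_set:
            count += 1
    return count


# Methane with explicit hydrogens: a carbon with four hydrogen leaves.
methane_v = ["C", "H1", "H2", "H3", "H4"]
methane_e = [("C", h) for h in methane_v[1:]]

# Ethane with explicit hydrogens: C1-C2, each carbon carrying three hydrogens.
ethane_v = ["C1", "C2"] + [f"H{i}" for i in range(1, 7)]
ethane_e = ([("C1", "C2")] + [("C1", f"H{i}") for i in (1, 2, 3)]
            + [("C2", f"H{i}") for i in (4, 5, 6)])

print(count_automorphisms(methane_v, methane_e))          # 24 = 4!
print(count_automorphisms(["C"], []))                     # 1 (condensed methane)
print(count_automorphisms(ethane_v, ethane_e))            # 72 = 2 * 3! * 3!
print(count_automorphisms(["C1", "C2"], [("C1", "C2")]))  # 2 (condensed ethane)
```

Every permutation of the equivalent hydrogen leaves on an atom yields a distinct mapping. For $n$ such leaves the count therefore grows as $n!$, and condensing the leaves removes that factor.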
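Definition 7.17. Let $M = (A, B)$ be a molecular graph, with weak vertex colouring $c_v^M: A \to P$ and weak edge colouring $c_e^M: B \to \mathbb{R}$ as per definition 7.14. A condensed molecular graph $G = (V, E)$ is then the subgraph $G = M[A']$ induced by $A' \subseteq A$. Here $A'$ is the set of vertices $a \in A$ with $d_M(a) \neq 1$, together with those vertices for which $d_M(a) = 1$ and either $c_e^M(\{a\} \cup N_M(a)) \neq 1$ or $c_v^G(a) \wedge 7 \neq 0$. In the last condition, $\wedge$ is a bitwise AND that extracts the formal charge magnitude (section 7.2.1). A condensed molecular graph has a vertex colouring $c_v^G: V \to S_v$, where $S_v$ is the set of vertex colours as per section 7.2.1. It also has an edge colouring $c_e^G: E \to S_e$, where $S_e$ is the set of edge colours as per section 7.2.2.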
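The vertex selection in Definition 7.17 can be sketched in Python as follows. This is an illustrative reading, not the thesis's implementation. The molecular graph is assumed to be given as hypothetical dictionaries of element symbols, bond orders and integer formal charges. Under this reading, a leaf is retained when its bond is not a single bond or when the atom carries a formal charge. The charge condition is how the `∧ 7` test on the vertex colour is interpreted here, since the lowest three bits of a vertex colour store the formal charge magnitude (section 7.2.1). Each removed leaf is tallied against its parent.

```python
from collections import Counter, defaultdict


def condense(elements, bond_order, formal_charge):
    """Sketch of the retention rule (input format is a hypothetical choice).

    elements:      dict vertex -> element symbol
    bond_order:    dict frozenset({u, v}) -> bond order
    formal_charge: dict vertex -> integer formal charge (missing means 0)

    Returns the retained vertex set and, for each parent vertex,
    a Counter of the elements condensed into it.
    """
    neighbours = defaultdict(set)
    for bond in bond_order:
        u, v = tuple(bond)
        neighbours[u].add(v)
        neighbours[v].add(u)

    retained, condensed = set(), defaultdict(Counter)
    for a in elements:
        if len(neighbours[a]) != 1:
            retained.add(a)  # not a leaf: always kept
            continue
        (parent,) = neighbours[a]
        multiple_bond = bond_order[frozenset((a, parent))] != 1
        charged = formal_charge.get(a, 0) != 0  # stands in for the "AND 7 != 0" test
        if multiple_bond or charged:
            retained.add(a)  # leaf kept, e.g. a carbonyl oxygen
        else:
            condensed[parent][elements[a]] += 1
    return retained, condensed


# Formaldehyde H2C=O: the doubly bonded O is kept, both H are condensed into C1.
formaldehyde = {"C1": "C", "O1": "O", "H1": "H", "H2": "H"}
formaldehyde_bonds = {frozenset(("C1", "O1")): 2,
                      frozenset(("C1", "H1")): 1,
                      frozenset(("C1", "H2")): 1}
kept, merged = condense(formaldehyde, formaldehyde_bonds, {})
print(sorted(kept), dict(merged["C1"]))  # ['C1', 'O1'] {'H': 2}

# Methoxide CH3O-: the singly bonded O is kept only because it is charged.
methoxide = {"C1": "C", "O1": "O", "H1": "H", "H2": "H", "H3": "H"}
methoxide_bonds = {frozenset(("C1", b)): 1 for b in ("O1", "H1", "H2", "H3")}
kept2, merged2 = condense(methoxide, methoxide_bonds, {"O1": -1})
print(sorted(kept2), dict(merged2["C1"]))  # ['C1', 'O1'] {'H': 3}
```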
7. Algorithmic Design and Theory
[Figure 7.9 image: the 32 bits of a vertex colour from $2^0$ (left) to $2^{31}$ (right), 01111000000100000001110100011101, grouped into the fields FC, +/−, H-count, F-count, Cl-count, Br-count, I-count, R/S, degree and atomic number.]
Figure 7.9: Bit string representation of the vertex colour 32-bit integer. This string has the integer value 3,099,068,446 and corresponds to the (unrealistic) situation of a uranium atom with a formal charge of −6, one condensed hydrogen atom, two condensed chlorine atoms, S stereochemistry and a degree of 5 in the normal molecular graph.
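Each degree-one vertex of the molecular graph that is excluded from the condensed molecular graph is incorporated into its parent vertex (its single neighbour) and is known as a condensed vertex. The vertex colouring contains information on the condensed vertices.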
To pack large amounts of information into as little memory as possible, bit manipulation is used to define the colours available to the vertex and edge colourings. Multiple types of information are compressed into a single integer by assigning each type to its own, non-overlapping range of bit indices within the binary representation of the integer.
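The packing idea can be illustrated with two small helpers that write and read an unsigned field of a given width at a given bit offset. This is a generic Python sketch, not code from the thesis, and the names `set_field` and `get_field` are hypothetical. The example stores a formal charge magnitude of 6 in bits $2^0 \to 2^2$ and a negative sign flag in bit $2^3$, which together give the integer 14.

```python
def set_field(value: int, offset: int, width: int, field: int) -> int:
    """Return `value` with bits offset .. offset+width-1 replaced by `field`."""
    if not 0 <= field < (1 << width):
        raise ValueError(f"{field} does not fit in {width} bits")
    mask = ((1 << width) - 1) << offset
    return (value & ~mask) | (field << offset)


def get_field(value: int, offset: int, width: int) -> int:
    """Read bits offset .. offset+width-1 of `value` as an unsigned integer."""
    return (value >> offset) & ((1 << width) - 1)


colour = 0
colour = set_field(colour, offset=0, width=3, field=6)  # magnitude 6 -> bits 2^1 and 2^2
colour = set_field(colour, offset=3, width=1, field=1)  # sign flag -> bit 2^3
print(colour)                   # 14
print(get_field(colour, 0, 3))  # 6
```

Clearing the target bits with the mask before OR-ing means a field can be overwritten safely. Because the fields never overlap, each one can be recovered independently.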
7.2.1. Vertex Colours
Each vertex colour is a 32-bit integer, of which 31 bits are used. These 31 bits contain ten distinct pieces of information, as shown in figure 7.9 and detailed below from least significant bit to most significant bit. As in the figures, multi-bit patterns quoted below are written least significant bit first.
$2^0 \to 2^2$ These three bits give the magnitude of the formal charge on the atom, as determined by the FPT algorithm described in section 9.4.4. Three bits allow formal charge magnitudes between zero and seven, which should cover all reasonable formal charge values. The three bits are set to the binary value of the formal charge magnitude.
$2^3$ This bit gives the sign of the formal charge. It is set to 0 if the formal charge is zero or positive and 1 if it is negative.
$2^4 \to 2^6$, $2^7 \to 2^9$, $2^{10} \to 2^{12}$, $2^{13} \to 2^{15}$ and $2^{16} \to 2^{18}$ These five groups of three bits represent the counts of hydrogen, fluorine, chlorine, bromine and iodine atoms, respectively, that have been condensed into a vertex in the transition from a molecular graph to a condensed molecular graph. The rules governing which vertices may be condensed into their parent vertex (definition 7.17) effectively limit the elements that can be condensed to hydrogen and the halogens. It is unlikely that more than three vertices of the same molecular graph colour would need to be condensed into the same parent vertex. However, this does occur in some simple cases, such as methane, where four hydrogen atoms are condensed into the carbon atom. Three bits, allowing counts from zero to seven, are therefore required rather than just two. Again, the bits are set to the binary value of the integer count of each element.

[Figure 7.10 image: the 8 bits of an edge colour from $2^0$ (left) to $2^7$ (right), 01011000, grouped into the fields E/Z and bond order.]
Figure 7.10: Bit string representation of the edge colour 8-bit integer. This string has the integer value 26 and corresponds to the (unrealistic) situation of a bond with E stereochemistry and an order of 6.
$2^{19} \to 2^{20}$ These two bits represent any chirality associated with an atom. Using the Cahn–Ingold–Prelog (CIP) priority rules and the Cartesian coordinates from which a molecular graph is generated, an estimate of the stereochemistry (section 7.3) of an atom with four unique neighbours is made.³⁹,⁴⁰ An estimate is made, as opposed to an exact calculation, because the stereochemistry of an atomic centre could change depending on the size of the fragment that it is in. To make this estimate, only atoms within a path of length $k$ from the centre of interest are included in the calculation. For the purposes of this description, $k$ is assumed to be two. The chirality of a centre can take three possible states: achiral, represented by 00; R, represented by 01; and S, represented by 11.
$2^{21} \to 2^{23}$ The degree of the vertex within the molecular graph is given by these three bits. This information is somewhat redundant, but it could be useful in some matching situations. The three bits are set to the binary value of the integer degree.
$2^{25} \to 2^{31}$ These final seven bits represent the atomic number of the element associated with the vertex; bit $2^{24}$ is left unused. There are currently 118 elements in the periodic table, all of which can be represented by the seven available bits, since these allow values up to 127. Naturally, the bits are set to the binary value of the atomic number.
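The complete vertex colour layout can be gathered into one encoder and decoder. The Python sketch below is an illustration under stated assumptions, not the thesis's implementation. The field names and function signatures are hypothetical, and the condensed counts are assumed to be supplied as a dictionary keyed by element symbol. The two-bit chirality codes are read least significant bit first, as in the figures, so R (written 01) is stored as the field value 2 and S (written 11) as 3. With the values from Figure 7.9, the sketch reproduces the integer 3,099,068,446.

```python
# Field layout from section 7.2.1: name -> (bit offset, width). Bit 2^24 is unused.
FIELDS = {
    "fc_magnitude": (0, 3),
    "fc_negative": (3, 1),
    "H": (4, 3), "F": (7, 3), "Cl": (10, 3), "Br": (13, 3), "I": (16, 3),
    "chirality": (19, 2),
    "degree": (21, 3),
    "atomic_number": (25, 7),
}
CONDENSABLE = ("H", "F", "Cl", "Br", "I")
# Assumption: the codes 00 / 01 / 11 in the text are written least-significant-bit
# first (as in figures 7.9 and 7.10), so R = 0b10 and S = 0b11 as field values.
CHIRALITY_CODE = {"achiral": 0b00, "R": 0b10, "S": 0b11}


def _put(colour: int, name: str, value: int) -> int:
    offset, width = FIELDS[name]
    if not 0 <= value < (1 << width):
        raise ValueError(f"{name}={value} does not fit in {width} bits")
    return colour | (value << offset)  # fields never overlap, so OR is enough


def _get(colour: int, name: str) -> int:
    offset, width = FIELDS[name]
    return (colour >> offset) & ((1 << width) - 1)


def encode_vertex_colour(atomic_number, degree, formal_charge=0,
                         condensed=None, chirality="achiral"):
    condensed = condensed or {}
    colour = _put(0, "fc_magnitude", abs(formal_charge))
    colour = _put(colour, "fc_negative", int(formal_charge < 0))
    for element in CONDENSABLE:
        colour = _put(colour, element, condensed.get(element, 0))
    colour = _put(colour, "chirality", CHIRALITY_CODE[chirality])
    colour = _put(colour, "degree", degree)
    return _put(colour, "atomic_number", atomic_number)


def decode_vertex_colour(colour):
    sign = -1 if _get(colour, "fc_negative") else 1
    chirality_names = {code: name for name, code in CHIRALITY_CODE.items()}
    return {
        "atomic_number": _get(colour, "atomic_number"),
        "degree": _get(colour, "degree"),
        "formal_charge": sign * _get(colour, "fc_magnitude"),
        "condensed": {e: _get(colour, e) for e in CONDENSABLE if _get(colour, e)},
        "chirality": chirality_names.get(_get(colour, "chirality"), "unknown"),
    }


# Figure 7.9: uranium, formal charge -6, one H and two Cl condensed, S, degree 5.
colour = encode_vertex_colour(92, 5, formal_charge=-6,
                              condensed={"H": 1, "Cl": 2}, chirality="S")
print(colour)                                         # 3099068446
print(colour & 7)                                     # 6, the formal charge magnitude
print(decode_vertex_colour(colour)["formal_charge"])  # -6
```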
7.2.2. Edge Colours
Edge colours utilise five of eight bits in an 8-bit integer. These five bits contain two types of information, as shown in figure 7.10 and detailed below.
$2^0 \to 2^1$ These two bits contain information about the stereochemistry of double bonds. As with the stereochemistry of atomic centres, this is only an estimate, because the result may change depending on the size of the fragment. If a bond is not a double bond, or the double bond is symmetric, a value of 00 is used. If a double bond has E stereochemistry, a value of 01 is used, and a value of 11 is used for Z stereochemistry.
$2^2 \to 2^4$ These three bits contain the bond order as produced by the FPT algorithm described in section 9.4.4. The bits are set to the binary value of the integer bond order, unless the bond is aromatic, in which case the bits are set to 111. Aromatic bonds are identified using the bond orders obtained by the FPT algorithm together with the aromaticity perception algorithm in OpenBabel.⁴¹
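An analogous Python sketch for the edge colour follows, under the same assumptions. The function names are hypothetical, and the two-bit E/Z codes are read least significant bit first, which is what makes Figure 7.10 (E stereochemistry, order 6) come out as 26. Because 111 is reserved for aromatic bonds, the sketch rejects a non-aromatic bond order of 7, which would otherwise be indistinguishable from an aromatic bond.

```python
STEREO_CODE = {"none": 0b00, "E": 0b10, "Z": 0b11}  # 01 / 11 written LSB-first
AROMATIC = 0b111


def encode_edge_colour(bond_order: int = 1, stereo: str = "none",
                       aromatic: bool = False) -> int:
    """Bits 2^0-2^1: E/Z code; bits 2^2-2^4: bond order, or 111 for aromatic bonds."""
    if aromatic:
        order_bits = AROMATIC
    elif 0 <= bond_order < AROMATIC:
        order_bits = bond_order
    else:
        raise ValueError("non-aromatic bond order must be between 0 and 6")
    return STEREO_CODE[stereo] | (order_bits << 2)


def decode_edge_colour(colour: int) -> dict:
    order_bits = (colour >> 2) & 0b111
    stereo_names = {code: name for name, code in STEREO_CODE.items()}
    return {
        "stereo": stereo_names.get(colour & 0b11, "unknown"),
        "aromatic": order_bits == AROMATIC,
        "bond_order": None if order_bits == AROMATIC else order_bits,
    }


print(encode_edge_colour(6, "E"))  # 26, as in Figure 7.10
print(encode_edge_colour(2, "Z"))  # 11
print(decode_edge_colour(encode_edge_colour(aromatic=True)))
```

Only five of the eight bits are ever set, so every colour produced here stays below 32.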