This part of the original CLARET program, handling the creation of the relation graphs, was split off into a separate program called CGRAPH.
In a first step, the relations between all parts, vectors of binary attribute values are computed from the primary attributes.
From this fully connected structure those relations are selected for the graph, which fit the desired structure.
Binary Attributes
Using the intermediary attributes described in the previous section, the fol- lowing list of binary attributes is computed. Most of the attributes are asymmetric, i.e. in generalA(i, j)6=A(j, i). The attributes can be grouped, according to their dependency on depth values1. All values are normalised, so the input representation is scale invariant as well as invariant to the ab- solute level of brightness. The computed attributes form an attribute vector
~
A(i, j), describing the relation between parts i and j.
• Geometric 3D
– D3(i, j) Distance in 3D between two parts
The distance is normalised by the mean distance.
D(i, j) = |~p(i)−~p(j)|/N1 P
l,k>l(|~p(l)−~p(k)|).
– Angles in 3D
Angles between two vectors 6 (d~
i, ~dj) are computed using the fol-
lowing formula: 6 (d~i, ~dj) = arcsin( ~ di×d~j |d~i||d~j| [d~i×d~j]z |[d~i×d~j]z| ); (B.1) See sec. B.1.2 how the angle between to parts is defined.
1Note that of course there can be strong correlations between the groups. For instance
B.1. INPUT REPRESENTATION 119
– F3(i, j) Relative area in 3D
The difference of the areas of part i and part j is normalised by the area of part i:
F(i, j) = (f(i)−f(j))/f(i). – S3(i, j) Maximum span in 3D
The difference of the spans of part i and part j is normalised by the span of part i:
S3(i, j) = (s3(i)−s3(j))/s3(i).
– B3(i, j) Length of shared border in 3D
The length of the border is normalised by the total border length of part i:
B3(i, j) =b3(i, j)/b3(i, i). – Z(i, j) Difference in depth
The difference of the mean depths of partiand partjis normalised by the depth of part i. z(i) denotes the z or depth component, i.e. the distance from the virtual camera, of the coordinate vector
p(i) of the centre of part i:
Z(i, j) = (z(i)−z(j))/z(i). – Vz(i, j) Variance of depth
The difference of the variances of part i and partj is normalised by the variance of part i:
Vz(i, j) = (vz(i)−vz(j))/vz(i).
• Geometric 2D
The following attributes are computed in analogy to the 3D geometric attributes, using the corresponding 2D primary attributes and discard- ing the z-coordinate values.
– D2(i, j) Distance – Angle – F2(i, j) Area – S2(i, j) Span – B2(i, j) Border length – L(i, j) Mean intensity
The difference of the mean intensities of part i and part j is nor- malised by the mean intensity of part i:
120 APPENDIX B. DESCRIPTION OF CLARET-2
– Vl(i, j) Variance of the intensity
The difference of the variances of the intensities of partiand part
j is normalised by the variance of part i:
Vl(i, j) = (vl(i)−vl(j))/vl(i).
Geometric 3D attributes are computed using a range image providing depth values. This range image is constructed from the rendered projections of the objects, as the observers see it during learning. Note that the use of such a range image is only for convenience. The same depth information could be constructed from the knowledge that the objects are composed of four identical spheres and the fact that a perspective camera is used. By estimating the apparent radius of a sphere from the curvature of its boundary the depth values of its visible surface can be computed.
Geometric 2D attributes and intensity based attributes are computed using only the projection of the objects on the computer screen, the same views the human observers are presented with during learning.
The difference in the variance of the depth values over all visible pixels of a sphere, has the following interpretation. Since all four spheres, of which each object consists, are identical, also the variance of an unoccluded sphere is a constant. Depending on how much a sphere is occluded, the variance of depth changes. With increasing occlusion the variance first increases until half the sphere is occluded and then drops to zero, as finally only the outer rim is visible (the variance in brightness of this part should be closely correlated with this attribute).
Implementing Higher Order Attributes
Given the symmetry properties of the objects, it should not be sufficient to use binary attributes alone. Also higher order relations are necessary to distinguish between the objects. In our case the angle as a tertiary (or quaternary) attribute was implemented. Since CLARET-2 only supports binary relations, angles have to be implemented using a trick. In the same way the representation of a binary attribute within an unary framework was solved in the case of border lengths, angles can be implemented by enumerating the possible combinations, again assuming a fixed maximum number of parts.
Given the set of four parts P = {a, b, c, d} each object consists of, the angle between two parts A(a, b) is defined as the angle between the two connecting lines:
B.1. INPUT REPRESENTATION 121
with D~(ij) = ~p(i)−p~(j) the connecting line between the centres of part i
and part j. Should one sphere be completely occluded, the angle is defined to be zeroA(a, b) = 0.
Graph Representations of Derived Binary Attributes
The binary attributes are organised in a graph to capture the structure of an object view. The nodes of this graph are labelled with part indices, the edges labelled with attribute vectors. The computation of this graph struc- ture can involve two main stages, first creating a pool of relations and then optionally selecting relations from this pool to satisfy constraints regarding the minimum and maximum number of edges originating from a node (so called “valency”)
For creating the pool of relations two criteria can be selected via com- mandline parameters:
Fully connected The initial relations connect all parts with each other, resulting in a total number of relations ofNr =Np(Np−1).
Connect adjacent parts Only parts which are adjacent to each other, i.e. share a border, are connected by relations.
For graphs representing objects with comparatively few parts, as in our case, the complete pool can be used to represent the object view. With complex objects having a large number of parts these initial pools of relations can be further reduced, to also reduce the computing time required by the CLARET-2 algorithm. For this end the minimum and the maximum number of edges originating from a node can be specified.
From the sorted list of relations, those are selected, where both parts have less than the desired minimum number of relations. This is the first pass of selection. After connecting all parts appropriately, in a second pass, all those relations are selected, connecting a part, which has less than the minimum number of relations. Preferably, a relation is chosen where the second part has not reached the maximum number of connections. The list of relations is constantly kept sorted first by the number of connections both parts already have and second by the distance between the parts. So if a several relations connect parts with the same number of edges, the relation with the smallest distance is selected. The result of this approach a graphs with a more regular structure, since relations between parts with smaller numbers of connections are selected first.
There is a final check making sure that the resulting graph does not dis- sociate into two unconnected subgraphs. In that case an additional relation creating a connection between them is selected.
122 APPENDIX B. DESCRIPTION OF CLARET-2