The caption semantics framework of ShapeWorld is loosely inspired by how language meaning is modelled in formal semantics via a logical formalism. A caption component can act either as a caption itself, that is, a fully-specified component which corresponds to a natural language statement or, where applicable, as an argument to another component, compositionally forming a more complex nested structure. The next paragraphs introduce the various caption components which are currently implemented in ShapeWorld. Similar to a microworld, these components are internally represented as dictionary-like attribute-value structures.
Captions and predicates. The interpretation $\llbracket c \rrbracket$ of a caption $c$ is a function mapping world instances to ternary truth values, $W \mapsto \{\text{true}, \text{false}, \text{ambiguous}\}$, indicating whether caption $c$ is true, false or ambiguous given a corresponding image. Some caption components, in the following referred to as predicates, are better characterised via two mappings p.agree(·) and p.disagree(·), which indicate whether the predicate caption p agrees with an entity or not: $e \mapsto \{\text{true}, \text{false}\}$. These two functions alternatively characterise the semantic interpretation of a predicate via:

$$
\llbracket p \rrbracket(W) = \begin{cases}
\text{true} & \text{if } \exists\, e \in W : p.\text{agree}(e) \\
\text{false} & \text{if } \forall\, e \in W : p.\text{disagree}(e) \\
\text{ambiguous} & \text{otherwise}
\end{cases}
$$
Note that an entity may neither clearly agree nor disagree, in which case the truth of the predicate is interpreted as ambiguous. Importantly, the boundary to definite truth values is not informed by human perception of ambiguity (a research topic in its own right), but is instead chosen as a ‘safe margin’ which generously excludes potentially controversial configurations. Combinations of a caption and an image with ambiguous agreement are categorically rejected. Consequently, all generated outputs are unambiguously true or false, while there are ‘grey zones’ of the ShapeWorld instance space which are never sampled.
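The ternary interpretation and the rejection step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not ShapeWorld's actual code: entities are assumed to be plain dictionaries, the helper names `interpret_predicate` and `accept_sample` are hypothetical, and the continuous `redness` feature with its 0.8/0.2 safe margins is invented purely to show how a grey zone arises.

```python
from typing import Callable, Dict, Iterable

# Hypothetical representation: an entity is a plain attribute-value dictionary.
Entity = Dict[str, float]
Check = Callable[[Entity], bool]

TRUE, FALSE, AMBIGUOUS = "true", "false", "ambiguous"


def interpret_predicate(world: Iterable[Entity], agree: Check, disagree: Check) -> str:
    """Ternary truth value of a predicate, derived from its agree/disagree mappings."""
    entities = list(world)
    if any(agree(e) for e in entities):     # some entity clearly agrees
        return TRUE
    if all(disagree(e) for e in entities):  # every entity clearly disagrees
        return FALSE
    return AMBIGUOUS                         # no agreement, but some entity is in the grey zone


def accept_sample(world: Iterable[Entity], agree: Check, disagree: Check) -> bool:
    """Rejection step: keep a caption/image pair only if its truth value is definite."""
    return interpret_predicate(world, agree, disagree) != AMBIGUOUS


# Hypothetical safe margins on an invented continuous 'redness' feature.
is_red: Check = lambda e: e["redness"] > 0.8
not_red: Check = lambda e: e["redness"] < 0.2

print(interpret_predicate([{"redness": 0.9}, {"redness": 0.1}], is_red, not_red))  # true
print(interpret_predicate([{"redness": 0.1}], is_red, not_red))                    # false
print(accept_sample([{"redness": 0.5}], is_red, not_red))                          # False
```

An entity with `redness` 0.5 neither agrees nor disagrees, so the sample is rejected rather than labelled.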
Attributes. An attribute is a predicate component specified by a type and a value. Available types are: “shape”, “colour” and “combination” which accept as value a corresponding shape/colour value or a pair thereof; and “shapes”, “colours” and “combinations” which accept a set of such values. The following two examples illustrate the general pattern of attribute definitions.
• “red” translates to p = attribute(type = colour, value = red):
p.agree(e) := e.colour = red
p.disagree(e) := e.colour ≠ red
• “round” translates to p = attribute(type = shapes, value = {circle, semicircle, ellipse}):
p.agree(e) := e.shape ∈ {circle, semicircle, ellipse}
p.disagree(e) := e.shape ∉ {circle, semicircle, ellipse}
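A minimal Python sketch of the two attribute patterns above, assuming entities are dictionaries with `shape` and `colour` keys and representing a predicate as a hypothetical `(agree, disagree)` pair of functions; the "combination(s)" types are omitted. Because these attributes are discrete, `agree` and `disagree` are exact complements here, so attributes on their own never produce an ambiguous entity.

```python
from typing import Callable, Dict, Tuple

Entity = Dict[str, str]
# Hypothetical predicate encoding: a pair of (agree, disagree) functions.
Predicate = Tuple[Callable[[Entity], bool], Callable[[Entity], bool]]


def attribute(type_: str, value) -> Predicate:
    """Build the agree/disagree pair for attribute(type = type_, value = value)."""
    if type_ in ("shape", "colour"):
        # Single value: agree on equality, disagree on inequality.
        return (lambda e: e[type_] == value, lambda e: e[type_] != value)
    if type_ in ("shapes", "colours"):
        key = type_[:-1]           # "shapes" -> "shape", "colours" -> "colour"
        values = frozenset(value)  # set of accepted values
        return (lambda e: e[key] in values, lambda e: e[key] not in values)
    raise ValueError(f"unsupported attribute type: {type_!r}")


red = attribute("colour", "red")
round_ = attribute("shapes", {"circle", "semicircle", "ellipse"})

ellipse = {"shape": "ellipse", "colour": "red"}
square = {"shape": "square", "colour": "blue"}
print(red[0](ellipse), round_[0](ellipse))  # True True
print(red[1](square), round_[1](square))    # True True
```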
Object-types. An object-type is an intersective combination of attributes, itself again forming a predicate component. For instance, “red square” combines two attributes into an object-type. Note that an “object-type” describes a class of objects (red squares), whereas an “object” corresponds to a concrete entity in an image (a concrete red square).
• p = object-type(attributes = A):
p.agree(e) := ∀ p′ ∈ A : p′.agree(e)
p.disagree(e) := ∃ p′ ∈ A : p′.disagree(e)
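Representing a predicate as a hypothetical `(agree, disagree)` pair of functions, an object-type can be sketched as an intersection: it agrees only if every attribute agrees, and disagrees as soon as any attribute disagrees. The `colour_is` and `shape_is` helpers and the dictionary entities are illustrative assumptions, not ShapeWorld functions.

```python
from typing import Callable, Dict, Sequence, Tuple

Entity = Dict[str, str]
Predicate = Tuple[Callable[[Entity], bool], Callable[[Entity], bool]]


def colour_is(value: str) -> Predicate:
    """Hypothetical single-colour attribute."""
    return (lambda e: e["colour"] == value, lambda e: e["colour"] != value)


def shape_is(value: str) -> Predicate:
    """Hypothetical single-shape attribute."""
    return (lambda e: e["shape"] == value, lambda e: e["shape"] != value)


def object_type(attributes: Sequence[Predicate]) -> Predicate:
    """Intersective combination: agree if all attributes agree, disagree if any disagrees."""
    def agree(e: Entity) -> bool:
        return all(a(e) for a, _ in attributes)

    def disagree(e: Entity) -> bool:
        return any(d(e) for _, d in attributes)

    return agree, disagree


red_square = object_type([colour_is("red"), shape_is("square")])
print(red_square[0]({"shape": "square", "colour": "red"}))  # True
print(red_square[1]({"shape": "circle", "colour": "red"}))  # True
```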
Relations. A relation is a predicate component specified by a type, a value and a reference object-type. Relations address most properties of an object: x-coordinate (“to the left/right of X”), y-coordinate (“above/below X”), z-coordinate (“behind/in front of X”), shape (“same/different shape as/from X”), colour (“same/different colour as/from X”), shape size (“smaller/bigger than X”), colour shade (“darker/lighter than X”), plus two ternary relations addressing relative distances (“closer to the Y than X”, “farther from the Y than X”) which additionally involve a second unique comparison object-type. Furthermore, both an attribute and an object-type can be trivially turned into a relation (via a form of “to be X”, for example, “is blue”). The following two examples illustrate the pattern of relation definitions. Note the avoidance of ambiguity by requiring minimal value differences (the thresholds distance and area in the definitions below), by accepting “left” only if an object’s x-distance to the reference exceeds its y-distance, and by accepting “bigger” only if both objects have the same shape, since relative size perception can be skewed for certain shape pairs.
• “to the left of” translates to p = relation(type = x-rel, value = −1, reference = r):
p.agree(e) := ∃ e′ ∈ r.agree(·) : (e′.x − e.x) > max(distance, |e.y − e′.y|)
p.disagree(e) := ∀ e′ ∈ ¬r.disagree(·) : (e′.x − e.x) < −distance
• “bigger” translates to p = relation(type = area-rel, value = 1, reference = r):
p.agree(e) := ∃ e′ ∈ r.agree(·) : (e.area − e′.area) > area ∧ e′.shape = e.shape
p.disagree(e) := ∀ e′ ∈ ¬r.disagree(·) : (e.area − e′.area) < −area
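Because a relation quantifies over the reference objects of the world, a sketch has to close over the world instance. The Python below is a minimal illustration under assumptions: the `Entity` dataclass, the `(agree, disagree)` pair encoding, the function names `left_of` and `bigger_than`, and the threshold values `DISTANCE = 0.1` and `AREA = 0.01` are all hypothetical stand-ins for the thresholds distance and area in the definitions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Entity:
    x: float
    y: float
    area: float
    shape: str
    colour: str


Check = Callable[[Entity], bool]
Predicate = Tuple[Check, Check]

DISTANCE = 0.1  # hypothetical minimal x-difference (the 'distance' threshold)
AREA = 0.01     # hypothetical minimal area difference (the 'area' threshold)


def left_of(world: List[Entity], reference: Predicate) -> Predicate:
    """'to the left of r': x-rel relation with value -1."""
    r_agree, r_disagree = reference
    refs = [r for r in world if r_agree(r)]               # e' in r.agree(.)
    candidates = [r for r in world if not r_disagree(r)]  # e' in not r.disagree(.)

    def agree(e: Entity) -> bool:
        # Some clear reference lies to the right by more than both margins.
        return any((r.x - e.x) > max(DISTANCE, abs(e.y - r.y)) for r in refs)

    def disagree(e: Entity) -> bool:
        # Every possible reference lies clearly to the left of e.
        return all((r.x - e.x) < -DISTANCE for r in candidates)

    return agree, disagree


def bigger_than(world: List[Entity], reference: Predicate) -> Predicate:
    """'bigger than r': area-rel relation with value 1 (same shape required to agree)."""
    r_agree, r_disagree = reference
    refs = [r for r in world if r_agree(r)]
    candidates = [r for r in world if not r_disagree(r)]

    def agree(e: Entity) -> bool:
        return any((e.area - r.area) > AREA and r.shape == e.shape for r in refs)

    def disagree(e: Entity) -> bool:
        return all((e.area - r.area) < -AREA for r in candidates)

    return agree, disagree


red_square = Entity(x=0.2, y=0.5, area=0.2, shape="square", colour="red")
blue_square = Entity(x=0.6, y=0.55, area=0.05, shape="square", colour="blue")
green_triangle = Entity(x=0.9, y=0.5, area=0.01, shape="triangle", colour="green")
world = [red_square, blue_square, green_triangle]
blue: Predicate = (lambda e: e.colour == "blue", lambda e: e.colour != "blue")

left_agree, left_disagree = left_of(world, blue)
print(left_agree(red_square), left_disagree(green_triangle))      # True True
bigger_agree, bigger_disagree = bigger_than(world, blue)
print(bigger_agree(red_square), bigger_disagree(green_triangle))  # True True
```

Note that, as in the definition, the triangle clearly disagrees with “bigger than the blue square” without a shape check, while agreement requires matching shapes.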
Selectors. I refer to a range of “the … X” phrases as “selectors”, such as “the bigger square” or “the leftmost circle”, which ‘select’ one object from a set of two or more objects according to a certain criterion, like size or relative x-location. Each phrase comes in two variants: one based on the positive or comparative form of the adjective, like “the bigger X” or “the left X” out of exactly two “X”, and another based on the superlative form, like “the biggest X” or “the leftmost X” out of an arbitrary number of “X”. Formally, a selector is a predicate component specified by a type, a value and a scope object-type which defines the set of objects from which one is selected. Similar to relations, selectors may address most properties of an object: x-coordinate (“the left/right X” and “the leftmost/rightmost X”), y-coordinate (“the upper/lower X” and “the uppermost/lowermost X”), shape size (“the smaller/bigger X” and “the smallest/biggest X”), colour shade (“the darker/lighter X” and “the darkest/lightest X”), and two relative distance selectors (“the X closer to/farther from the Y” and “the X closest to/farthest from the Y”) which additionally involve a second unique comparison object-type. The following two examples illustrate the pattern of selector definitions. As in the relation examples, ambiguity is avoided by requiring minimal differences. Note also that the semantics of the “the bigger” example is defined as ambiguous (and consequently discarded in the generation process) unless there are exactly two objects to choose from, as the phrase is not well-defined otherwise.
• “the bigger” translates to p = selector(type = area-two, value = 1, scope = s):
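$$
\begin{aligned}
p.\text{agree}(e) := {} & s.\text{agree}(e) \;\wedge && \text{(part of scope)} \\
& |\{e' : s.\text{agree}(e')\}| = |\{e' : \neg s.\text{disagree}(e')\}| = 2 \;\wedge && \text{(two scope objects)} \\
& \forall_{\geq 1}\, e' \in s.\text{agree}(\cdot) : e' \neq e \;\wedge && \text{(at least one other scope object)} \\
& \quad e'.\text{shape} = e.\text{shape} \;\wedge && \text{(other scope objects have the same shape)} \\
& \quad (e.\text{area} - e'.\text{area}) > \text{area} && \text{(other scope objects are smaller)} \\[4pt]
p.\text{disagree}(e) := {} & s.\text{disagree}(e) \;\vee && \text{(either not part of scope)} \\
& \big[\, |\{e' : s.\text{agree}(e')\}| = |\{e' : \neg s.\text{disagree}(e')\}| = 2 \;\wedge && \text{(or two scope objects)} \\
& \quad \exists\, e' \in \neg s.\text{disagree}(\cdot) : (e.\text{area} - e'.\text{area}) < -\text{area} \,\big] && \text{(other objects bigger)}
\end{aligned}
$$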
• “the biggest” translates to p = selector(type = area-max, value = 1, scope = s):
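$$
\begin{aligned}
p.\text{agree}(e) := {} & s.\text{agree}(e) \;\wedge && \text{(part of scope)} \\
& |\{e' : s.\text{agree}(e')\}| \geq 2 \;\wedge && \text{(at least two scope objects)} \\
& \forall_{\geq 1}\, e' \in s.\text{agree}(\cdot) : e' \neq e \;\wedge && \text{(at least one other scope object)} \\
& \quad e'.\text{shape} = e.\text{shape} \;\wedge && \text{(other scope objects have the same shape)} \\
& \quad (e.\text{area} - e'.\text{area}) > \text{area} && \text{(other scope objects are smaller)} \\[4pt]
p.\text{disagree}(e) := {} & s.\text{disagree}(e) \;\vee && \text{(either not part of scope)} \\
& \big[\, |\{e' : s.\text{agree}(e')\}| \geq 2 \;\wedge && \text{(or at least two scope objects)} \\
& \quad \exists\, e' \in \neg s.\text{disagree}(\cdot) : (e.\text{area} - e'.\text{area}) < -\text{area} \,\big] && \text{(other objects bigger)}
\end{aligned}
$$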
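Both selector definitions differ only in their cardinality condition, so a single sketch can cover them. The Python below is illustrative only: the `area_selector` function, its `mode` parameter ("two" for area-two, "max" for area-max), the `Entity` dataclass and `AREA = 0.01` are hypothetical. It reads the quantifier ∀≥1 as “every other clear scope object satisfies the condition, and there is at least one such object”, which is an interpretation of the notation rather than something the text spells out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Entity:
    shape: str
    area: float


Check = Callable[[Entity], bool]
Predicate = Tuple[Check, Check]

AREA = 0.01  # hypothetical minimal area difference


def area_selector(world: List[Entity], scope: Predicate, mode: str) -> Predicate:
    """'the bigger' (mode='two', area-two) or 'the biggest' (mode='max', area-max)."""
    s_agree, s_disagree = scope
    in_scope = [o for o in world if s_agree(o)]            # clear scope objects
    maybe_scope = [o for o in world if not s_disagree(o)]  # possible scope objects
    if mode == "two":
        count_ok = len(in_scope) == len(maybe_scope) == 2
    elif mode == "max":
        count_ok = len(in_scope) >= 2
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    def agree(e: Entity) -> bool:
        # Identity check, so distinct objects with equal attributes are still 'other'.
        others = [o for o in in_scope if o is not e]
        return (
            s_agree(e)
            and count_ok
            and len(others) >= 1
            and all(o.shape == e.shape and (e.area - o.area) > AREA for o in others)
        )

    def disagree(e: Entity) -> bool:
        return s_disagree(e) or (
            count_ok and any((e.area - o.area) < -AREA for o in maybe_scope)
        )

    return agree, disagree


squares = [Entity("square", 0.3), Entity("square", 0.2), Entity("square", 0.1)]
square: Predicate = (lambda e: e.shape == "square", lambda e: e.shape != "square")

biggest_agree, biggest_disagree = area_selector(squares, square, "max")
print([biggest_agree(e) for e in squares])     # [True, False, False]
print([biggest_disagree(e) for e in squares])  # [False, True, True]

bigger_agree, bigger_disagree = area_selector(squares, square, "two")
print([bigger_agree(e) or bigger_disagree(e) for e in squares])  # [False, False, False]
```

With three squares, “the bigger square” is ambiguous for every object, so such a caption would be discarded, as described above.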
Existentials. An existential is a combination of an object-type or selector acting as subject, and a relation acting as verb.
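• c = existential(subject = s, verb = v):

$$
\llbracket c \rrbracket(W) := \begin{cases}
\text{true} & \text{if } \exists\, e \in W : s.\text{agree}(e) \wedge v.\text{agree}(e) \\
\text{false} & \text{if } \forall\, e \in W : s.\text{disagree}(e) \vee v.\text{disagree}(e) \\
\text{ambiguous} & \text{otherwise}
\end{cases}
$$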
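The existential definition translates almost directly into code. This Python sketch assumes dictionary entities and a hypothetical `(agree, disagree)` pair encoding for the subject and verb; the function name `existential` simply mirrors the component name and is not a documented ShapeWorld API.

```python
from typing import Callable, Dict, List, Tuple

Entity = Dict[str, str]
Predicate = Tuple[Callable[[Entity], bool], Callable[[Entity], bool]]


def existential(world: List[Entity], subject: Predicate, verb: Predicate) -> str:
    """Ternary truth value of existential(subject = s, verb = v)."""
    s_agree, s_disagree = subject
    v_agree, v_disagree = verb
    if any(s_agree(e) and v_agree(e) for e in world):
        return "true"   # some entity clearly satisfies subject and verb
    if all(s_disagree(e) or v_disagree(e) for e in world):
        return "false"  # every entity clearly fails subject or verb
    return "ambiguous"


square: Predicate = (lambda e: e["shape"] == "square", lambda e: e["shape"] != "square")
is_red: Predicate = (lambda e: e["colour"] == "red", lambda e: e["colour"] != "red")

# "A square is red."
world_true = [{"shape": "square", "colour": "red"}, {"shape": "circle", "colour": "blue"}]
world_false = [{"shape": "square", "colour": "blue"}, {"shape": "circle", "colour": "red"}]
print(existential(world_true, square, is_red))   # true
print(existential(world_false, square, is_red))  # false
```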
Quantifiers. A quantifier is a caption component specified by a type (“count” or “ratio”), a comparator (“equal”, “not equal”, “less than”, “at most”, “more than”, “at least”), a quantity, plus an object-type acting as subject and a relation acting as verb. The type defines whether object numbers are quantified, like “three”, or fractions between set cardinalities, like “half”.³ The quantity specifies the reference number/fraction, which combined with the comparator yields the associated truth value. Currently supported quantities are the numbers 0 to 5, and 0.0 (“no”), 0.25 (“a quarter of”), 0.33 (“a third of”), 0.5 (“half”), 0.66 (“two thirds of”), 0.75 (“three quarters of”), and 1.0 (“all”). Trivial and nonsensical combinations are excluded, for instance, “less than/at most zero/no” or “more than/at least all”. The following two examples illustrate the pattern of quantifier definitions. Once again, note the avoidance of ambiguity: each comparison conservatively uses either the smaller set of agreeing objects or the larger set of not-disagreeing objects, whichever makes the respective truth value harder to establish, so that a caption is only judged true (or false) if it would be so however the uncertain objects are resolved.

³ Note that the current definition avoids non-trivial nested quantification, which simplifies some of the definitions,
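• “at most three” translates to c = quantifier(type = count, comparator = at most, quantity = 3, subject = s, verb = v):

$$
\llbracket c \rrbracket(W) := \begin{cases}
\text{true} & \text{if } |\{e \in W : \neg s.\text{disagree}(e) \wedge \neg v.\text{disagree}(e)\}| \leq 3 \\
\text{false} & \text{if } |\{e \in W : s.\text{agree}(e) \wedge v.\text{agree}(e)\}| > 3 \\
\text{ambiguous} & \text{otherwise}
\end{cases}
$$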
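• “most” / “more than half” translates to c = quantifier(type = ratio, comparator = more than, quantity = 0.5, subject = s, verb = v):

$$
\llbracket c \rrbracket(W) := \begin{cases}
\text{true} & \text{if } \dfrac{|\{e \in W : s.\text{agree}(e) \wedge v.\text{agree}(e)\}|}{|\{e \in W : \neg s.\text{disagree}(e)\}|} > 0.5 \\[10pt]
\text{false} & \text{if } \dfrac{|\{e \in W : \neg s.\text{disagree}(e) \wedge \neg v.\text{disagree}(e)\}|}{|\{e \in W : s.\text{agree}(e)\}|} \leq 0.5 \\[10pt]
\text{ambiguous} & \text{otherwise}
\end{cases}
$$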
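The two quantifier examples can be sketched in Python as below, showing where the conservative lower and upper counts enter each comparison. This is a minimal illustration, not ShapeWorld's implementation: the function names `at_most` and `more_than_ratio`, the dictionary entities and the `(agree, disagree)` pair encoding are hypothetical, and treating an empty denominator as “comparison not applicable” (yielding ambiguous for an empty subject set) is an assumption the definitions above do not address.

```python
from typing import Callable, Dict, List, Tuple

Entity = Dict[str, str]
Check = Callable[[Entity], bool]
Predicate = Tuple[Check, Check]


def _count(world: List[Entity], cond: Check) -> int:
    return sum(1 for e in world if cond(e))


def at_most(world: List[Entity], subject: Predicate, verb: Predicate, n: int) -> str:
    """count quantifier with comparator 'at most' and quantity n."""
    (s_a, s_d), (v_a, v_d) = subject, verb
    upper = _count(world, lambda e: not s_d(e) and not v_d(e))  # largest possible count
    lower = _count(world, lambda e: s_a(e) and v_a(e))          # smallest certain count
    if upper <= n:
        return "true"
    if lower > n:
        return "false"
    return "ambiguous"


def more_than_ratio(world: List[Entity], subject: Predicate, verb: Predicate, q: float) -> str:
    """ratio quantifier with comparator 'more than' and quantity q."""
    (s_a, s_d), (v_a, v_d) = subject, verb
    min_num = _count(world, lambda e: s_a(e) and v_a(e))
    max_den = _count(world, lambda e: not s_d(e))
    max_num = _count(world, lambda e: not s_d(e) and not v_d(e))
    min_den = _count(world, s_a)
    # Assumption: an empty denominator makes the respective comparison inapplicable.
    if max_den > 0 and min_num / max_den > q:
        return "true"
    if min_den > 0 and max_num / min_den <= q:
        return "false"
    return "ambiguous"


square: Predicate = (lambda e: e["shape"] == "square", lambda e: e["shape"] != "square")
is_red: Predicate = (lambda e: e["colour"] == "red", lambda e: e["colour"] != "red")

world = [{"shape": "square", "colour": "red"}, {"shape": "square", "colour": "red"},
         {"shape": "square", "colour": "blue"}, {"shape": "circle", "colour": "red"}]
print(at_most(world, square, is_red, 3))            # true  (2 red squares)
print(more_than_ratio(world, square, is_red, 0.5))  # true  (2 of 3 squares)
```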
Propositions. A proposition is a caption component specified by a type and a set of clause captions. It combines the truth values of these clauses according to common logical operators given by the type: “conjunction”, “disjunction”, “implication” (requiring exactly two clauses), and “equivalence”. The following two examples illustrate the pattern of proposition definitions. Note that these definitions use the fact that the ternary logic formalism is implemented as continuous values between −1.0 and 1.0.
• “and” translates to c = proposition(type = conjunction, clauses = C):
$$
\llbracket c \rrbracket(W) := \min_{p \in C} \llbracket p \rrbracket(W)
$$
• “if and only if” translates to c = proposition(type = equivalence, clauses = C):
$$
\llbracket c \rrbracket(W) := \max\Big(\min_{p \in C} \llbracket p \rrbracket(W),\; \min_{p \in C} -\llbracket p \rrbracket(W)\Big)
$$
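On the continuous scale, conjunction is a minimum, and equivalence takes the better of “all clauses are true” (the minimum) and “all clauses are false” (the minimum of the negated values). The Python below sketches just these two operators. The text only states that truth values lie between −1.0 and 1.0; encoding true as 1.0, false as −1.0 and ambiguous as 0.0 is an assumption made here for illustration, as are the function names.

```python
from typing import Sequence

# Assumption: true = 1.0, false = -1.0 and ambiguous = 0.0 on the continuous scale.
TRUE, AMBIGUOUS, FALSE = 1.0, 0.0, -1.0


def conjunction(values: Sequence[float]) -> float:
    """'and': the weakest clause determines the result."""
    return min(values)


def equivalence(values: Sequence[float]) -> float:
    """'if and only if': true if all clauses are true or all are false."""
    return max(min(values), min(-v for v in values))


print(conjunction([TRUE, FALSE]))      # -1.0
print(equivalence([FALSE, FALSE]))     # 1.0
print(equivalence([TRUE, AMBIGUOUS]))  # 0.0
```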