The idea of automated reasoning dates back before AI itself and can be traced to ancient
Greece. Aristotle’s syllogisms paved the way for the formalism of deductive reasoning. The idea
was developed further by philosophers such as Al-Kindi, Al-Farabi, and Avicenna (Davidson, 1992), before culminating in modern mathematics and logic.
Within AI research, McCarthy (1963) pioneered the use of logic for automating reasoning
for language problems, which over time branched into other classes of reasoning (Holland
et al., 1989; Evans et al., 1993).
A class of reasoning closely related to what we study here is abduction (Peirce, 1883; Hobbs et al.,
1993), which is the process of finding the best minimal explanation for a set of observations
(see Figure 8). Unlike in deductive reasoning, in abductive reasoning the premises do not
guarantee the conclusion. Informally speaking, abduction is inferring cause from effect (the reverse direction from deductive reasoning). The two reasoning systems in Chapters 3 and
4 can be interpreted as abductive systems.
We define the notation to make the exposition slightly more formal. Let ⊢ denote entailment and ⊥ denote contradiction. Formally, (logical) abductive reasoning is defined as follows:
Given background knowledge B and observations O, find a hypothesis H such that
B ∪ H ⊬ ⊥ (consistency with the given background) and B ∪ H ⊢ O (explaining the
observations).
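The definition above can be illustrated with a minimal sketch, under several assumptions that are not part of the original text: the background knowledge B is restricted to propositional Horn rules, entailment is checked by forward chaining, contradiction is a reserved atom named `FALSE`, and "minimal" is taken to mean fewest assumed atoms. The rules and atom names are hypothetical toy data.

```python
from itertools import combinations

def forward_chain(facts, rules):
    """Return the closure of `facts` under Horn `rules`, given as (body, head) pairs."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and set(body) <= derived:
                derived.add(head)
                changed = True
    return derived

def abduce(rules, facts, observations, abducibles, contradiction="FALSE"):
    """Return all smallest hypothesis sets H with B ∪ H ⊬ ⊥ and B ∪ H ⊢ O.

    B is `rules` plus `facts`; candidate hypotheses are subsets of `abducibles`.
    """
    for size in range(len(abducibles) + 1):
        found = []
        for hypothesis in combinations(sorted(abducibles), size):
            closure = forward_chain(set(facts) | set(hypothesis), rules)
            consistent = contradiction not in closure    # B ∪ H ⊬ ⊥
            explains = set(observations) <= closure      # B ∪ H ⊢ O
            if consistent and explains:
                found.append(set(hypothesis))
        if found:  # stop at the smallest size that yields an explanation
            return found
    return []

# Hypothetical toy background knowledge.
RULES = [
    (("rain",), "wet_grass"),
    (("sprinkler",), "wet_grass"),
    (("rain",), "wet_street"),
    (("rain", "clear_sky"), "FALSE"),  # rain contradicts a clear sky
]
ABDUCIBLES = {"rain", "sprinkler"}

explanations = abduce(RULES, {"clear_sky"}, {"wet_grass"}, ABDUCIBLES)
print(explanations)
```

With a clear sky in the background, `rain` is ruled out by the consistency condition, so the only minimal explanation of `wet_grass` is `sprinkler`. The exhaustive search over subsets is exponential in the number of abducibles, which is in line with limitation (c) discussed next.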
In practical settings, this purely logical definition has many limitations: (a) There could be
multiple hypotheses H that explain a particular set of observations given the background
knowledge. The best hypothesis has to be selected based on some measure of goodness
Figure 8: Brief definitions for popular reasoning classes and their examples.
elements, i.e., there are degrees of certainty (rather than binary assignments) associated
with observations and background knowledge. Hence the decision of consistency and
explainability has to be made with respect to this fuzzy measure. (c) The inference problem in its general form is computationally intractable; assumptions often have to be made to
make inference tractable (e.g., restricting the representation to Horn clauses).
2.5.2. Incorporating “uncertainty” in reasoning
Over the years, a wide variety of soft alternatives have emerged for reasoning algorithms,
by incorporating uncertainty into symbolic models. This resulted in theories such as fuzzy
logic (Zadeh, 1975), probabilistic Bayesian networks (Pearl, 1988; Dechter, 2013), and soft
abduction (Hobbs et al., 1988; Selman and Levesque, 1990; Poole, 1990). In Bayesian networks, the (uncertain) background knowledge is encoded in a graphical structure and, upon
receiving observations, the probabilistic explanation is derived by maximizing a posterior
probability distribution. These models are essentially based on propositional logic and
cannot handle quantifiers (Kate and Mooney, 2009). Weighted abduction combines the weights
of relevance/plausibility with first-order logic rules (Hobbs et al., 1988). However, unlike
ical basis and does not lend itself to a complete probabilistic analysis. Our framework in
Chapters 3 and 4 is also a way to perform abductive reasoning under uncertainty. Our proposal
differs from the previous models in a few ways: (i) Unlike Bayesian networks, our
framework is not limited to propositional rules; in fact, first-order relations are used in the
design of TableILP (more details in Chapter 3). (ii) Unlike many previous works,
we do not make representational assumptions to simplify inference (such as limiting to Horn clauses, or certain independence assumptions). In fact, the inference might be
NP-hard, but given the availability of industrial ILP solvers this is not an issue in practice.
Our work is inspired by a prior line of work on inference on structured representations to
reason on (and with) language; see Chang et al. (2008, 2010); ?, 2012), among others.
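As a small illustration of the Bayesian-network view of explanation mentioned above (deriving an explanation by maximizing a posterior), the following sketch enumerates the joint distribution of a three-node network. The network structure, the variable names, and every probability value are hypothetical assumptions chosen for the example; none of them come from the works cited here.

```python
from itertools import product

# Hypothetical network: rain -> wet_grass <- sprinkler (all numbers are made up).
P_RAIN = 0.2
P_SPRINKLER = 0.1
P_WET_GIVEN = {  # P(wet_grass = True | rain, sprinkler)
    (True, True): 0.99,
    (True, False): 0.90,
    (False, True): 0.80,
    (False, False): 0.01,
}

def map_explanation(wet_observed=True):
    """Return the assignment to (rain, sprinkler) with the highest posterior.

    The posterior is proportional to the joint P(rain, sprinkler, wet_grass),
    so maximizing the joint over the hidden causes gives the MAP explanation.
    """
    best, best_joint = None, -1.0
    for rain, sprinkler in product([True, False], repeat=2):
        prior = (P_RAIN if rain else 1 - P_RAIN) * (
            P_SPRINKLER if sprinkler else 1 - P_SPRINKLER
        )
        p_wet = P_WET_GIVEN[(rain, sprinkler)]
        likelihood = p_wet if wet_observed else 1 - p_wet
        joint = prior * likelihood
        if joint > best_joint:
            best, best_joint = {"rain": rain, "sprinkler": sprinkler}, joint
    return best

print(map_explanation(wet_observed=True))
```

Under these assumed numbers, observing wet grass is best explained by rain without the sprinkler. Brute-force enumeration is only feasible for tiny networks; it is used here purely to make the maximization explicit.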
2.5.3. Macro-reading vs micro-reading
With the increased availability of information (especially through the internet), macro-reading
systems have emerged with the aim of leveraging a large variety of resources and exploiting the redundancy of information (Mitchell et al., 2009). Even if a system does not understand
one text, there might be many other texts that convey a similar meaning. Such systems
derive significant leverage from relatively shallow statistical methods with surprisingly strong
performance (Clark et al., 2016). Today’s Internet search engines, for instance, can
successfully retrieve factoid-style answers to many natural language queries by efficiently searching
the Web. Information Retrieval (IR) systems work under the assumption that answers to
many questions of interest are often explicitly stated somewhere (Kwok et al., 2001), and
all one needs, in principle, is access to a sufficiently large corpus. Similarly, statistical
correlation-based methods, such as those using Pointwise Mutual Information or PMI (Church and Hanks, 1989), work under the assumption that many questions can be answered by
looking for words that tend to co-occur with the question words in a large corpus. While
both of these approaches help identify correct answers, they are not suitable for questions
requiring language understanding and reasoning, such as chaining together multiple facts in
a piece of evidence given to the system, without reliance on redundancy. The focus of this
thesis is micro-reading, as it directly addresses NLU; that being said, whenever possible, we
use macro-reading systems as our baselines.
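To make the PMI assumption mentioned above concrete, the following is a minimal sketch that estimates PMI(x, y) = log2( p(x, y) / (p(x) p(y)) ) from co-occurrence counts. It assumes document-level co-occurrence (two words co-occur if they appear in the same document), which is a simplification rather than the windowed counting of the original PMI work, and the four-sentence corpus is hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def build_pmi(documents):
    """Return a function pmi(x, y) estimated from document-level co-occurrence."""
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for doc in documents:
        words = set(doc.lower().split())
        word_counts.update(words)
        pair_counts.update(frozenset(pair) for pair in combinations(sorted(words), 2))

    def pmi(x, y):
        joint = pair_counts[frozenset((x, y))]
        if joint == 0:
            return float("-inf")  # the two words never co-occur in this corpus
        # log2( (joint / N) / ((count_x / N) * (count_y / N)) )
        return math.log2(joint * n_docs / (word_counts[x] * word_counts[y]))

    return pmi

# Hypothetical toy corpus.
CORPUS = [
    "plants need sunlight",
    "plants need water",
    "cars need fuel",
    "sunlight helps plants grow",
]
pmi = build_pmi(CORPUS)
print(pmi("cars", "fuel"), pmi("plants", "sunlight"), pmi("cars", "sunlight"))
```

A PMI-style baseline would score each candidate answer by its association with the question words and pick the highest-scoring one. As the text notes, such a score reflects co-occurrence only and carries no mechanism for chaining facts.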
2.5.4. Reasoning on “structured” representations
With increasing knowledge resources and diversity of the available knowledge representations, numerous QA systems have been developed to operate over large-scale explicit knowledge
representations. These approaches perform reasoning over structured (discrete)
abstractions. For instance, Chang et al. (2010) address RTE (and other tasks) via inference on
structured representations, Banarescu et al. (2013) use AMR annotators (Wang et al.,
2015), Unger et al. (2012) use RDF knowledge (Yang et al., 2017), Zettlemoyer and Collins
(2005); Clarke et al. (2010); Goldwasser and Roth (2014); Krishnamurthy et al. (2016) use
semantic parsers to answer a given question, and Do et al. (2011, 2012) employ constrained
inference for temporal/causal reasoning. The framework we study in Chapter 3 is a reasoning algorithm functioning over tabular knowledge (frames) of basic science concepts.
An important limitation of IR-based systems is their inability to connect distant pieces of
information together. Many realistic domains (such as science questions or
biology articles), however, have answers that are not explicitly stated in text, and instead require
combining facts together. Khot et al. (2017) create an inference system capable of combining
Open IE tuples (Banko et al., 2007). Jansen et al. (2017) propose reasoning by aggregating
sentential information from multiple knowledge bases. Socher et al. (2013); McCallum et al.
(2017) propose frameworks for chaining relations to infer new (unseen) relations. Our work in Chapter 3 chains information across multiple tables. The reasoning framework
in Chapter 4 investigates reasoning over multiple pieces of raw text. The QA dataset
we propose in Chapter 5 also encourages the use of information from different segments of the
2.5.5. Models utilizing massive annotated data
A highlight of the past two decades is the advent of statistical techniques in NLP (Hirschman
et al., 1999). Since then, a wide variety of supervised-learning algorithms have shown strong
performance on different datasets.
The increasingly large amount of data available for recent benchmarks makes it possible to
train neural models (see “Connectionism”; Section 2.4.2) (Seo et al., 2016; Parikh et al.,
2016; Wang et al., 2018; Liu et al., 2018; Hu et al., 2018). Moreover, an additional
technical shift was the use of distributional representations of words (word vectors or embeddings)
extracted from large-scale text corpora (Mikolov et al., 2013; Pennington et al., 2014) (see
Section 2.4.3).
Despite all the decade-long excitement about supervised-learning algorithms, the main progress,
especially in the past few years, has mostly been due to the re-emergence of unsupervised
representations (Peters et al., 2018; Devlin et al., 2018).