We now briefly discuss a number of directions for future research, divided into three parts. The first part reviews the open questions and future directions on combining CSP(O) and probabilistic inference. The second part deals with possibilities for future work in applying integer programming to constrained clustering problems. We conclude by mentioning opportunities for further research on learning taxi passenger demand.
132 CONCLUSIONS AND FUTURE WORK
8.2.1 Probabilistic Models in Constraint Satisfaction and Optimization
In one of our studies we represented an arithmetic circuit by a set of constraints. It has been shown that using a dedicated propagator for an s-DNNF outperforms the method that represents the circuit as a set of constraints (Gange and Stuckey 2012). A dedicated propagator for the arithmetic circuit obtained from the d-DNNF might improve the efficiency of the search. Designing such a propagator is a direction for future research.
In our work on stochastic constraint programming using And-Or search, there are several directions for future research. One component that is common in existing research on stochastic constraint programming but missing from our work is the notion of chance constraints. A chance constraint is a constraint that is allowed to be violated in a certain fraction of possible worlds (Walsh 2002). Introducing chance constraints to our method is a topic for future work. Stochastic constraint programming combines two difficult problems, which motivates designing approximate methods that aim at solving large instances. One example of such an approximate method is to use reinforcement learning (S. D. Prestwich et al. 2017). Another possible approach, closer to an exact algorithm, is to use the same And-Or branch and bound method while ignoring the branches that lead to a small probability (or expected utility), in such a way that the difference from the exact objective is guaranteed to be less than a certain threshold.
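The definition of a chance constraint given above can be made concrete with a small sketch. It assumes independent Boolean random variables and enumerates all possible worlds, which is only feasible for tiny instances; the function name `chance_constraint_holds`, the threshold name `theta`, and the spare-parts example are hypothetical and not part of the method described in this work.

```python
from itertools import product

def chance_constraint_holds(probs, constraint, decision, theta):
    """Return (satisfied, p_sat), where p_sat is the total probability of the
    possible worlds in which constraint(decision, world) is true."""
    p_sat = 0.0
    # Enumerate every possible world: one truth value per random variable.
    for world in product([False, True], repeat=len(probs)):
        # Probability of this world under the (assumed) independence.
        p_world = 1.0
        for value, p in zip(world, probs):
            p_world *= p if value else 1.0 - p
        if constraint(decision, world):
            p_sat += p_world
    # The chance constraint holds if it is satisfied with probability >= theta.
    return p_sat >= theta, p_sat

# Hypothetical example: two machines fail independently; the decision is the
# number of spare units, and the constraint requires spares to cover failures.
fail_probs = [0.1, 0.2]
covers = lambda spares, world: sum(world) <= spares

ok, p = chance_constraint_holds(fail_probs, covers, decision=1, theta=0.95)
```

With one spare unit the constraint fails only when both machines break down, so it holds with probability of about 0.98 and the 0.95 threshold is met; with no spare units the probability drops to about 0.72 and the chance constraint is violated.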
The And-Or branch and bound algorithm shares many principles with nested constraint programming (NCP) (Chu and Stuckey 2014). In principle, factored stochastic constraint programs can be solved as special cases of NCP, which would make it possible to take advantage of the improvements built into this framework. Currently, when a stochastic constraint program is modeled as an NCP, it is assumed that the random variables are independent. Modeling factored stochastic programs as NCPs requires designing special propagators that reason over the joint probability distribution. This can be done along the same lines as designing propagators that reason over arithmetic circuits.
In our FSCP framework, the problems of constraint satisfaction/optimization (search) and probabilistic inference (counting) were decoupled: the search component repeatedly communicates with an external inference engine to answer probability queries. A direction for future research is to perform these tasks jointly within a single mechanism, which could facilitate exploiting the joint structure of deterministic and probabilistic factors. DPLL search is a possible candidate for unifying search and counting. On the one hand, it has been used for
probabilistic inference (Sang et al. 2004), and on the other hand it can be used for propagation in constraint satisfaction/optimization (Ohrimenko et al. 2007).
Finally, it would be interesting to allow the probability distribution to be influenced by the decisions. Standard scenario-based methods sample the scenarios in a phase prior to the decision-making step and hence cannot deal with such problems. The advantage of our method is that it reasons directly over the problem structure, and hence it might be easier to adapt to such problems.
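To illustrate why DPLL is a natural candidate for combining search and counting, the following minimal sketch computes the probability that a CNF formula is satisfied by branching on one variable at a time. It assumes independent Boolean variables with given probabilities and omits unit propagation, component decomposition, and caching, so it is not the algorithm of Sang et al. (2004); the function name `weighted_count` and the clause encoding are illustrative choices.

```python
def weighted_count(clauses, weights, variables):
    """DPLL-style weighted model counting for a CNF formula.

    clauses:   list of frozensets of integer literals (negative = negated)
    weights:   dict mapping each variable to its probability of being true
    variables: list of unassigned variables, covering all variables in clauses
    """
    if any(len(clause) == 0 for clause in clauses):
        return 0.0  # conflict: some clause is falsified on this branch
    if not variables:
        return 1.0  # every variable is assigned and no clause is falsified
    var, rest = variables[0], variables[1:]
    total = 0.0
    for literal, weight in ((var, weights[var]), (-var, 1.0 - weights[var])):
        # Condition on the literal: drop satisfied clauses and remove the
        # falsified opposite literal from the remaining ones.
        reduced = [c - {-literal} for c in clauses if literal not in c]
        total += weight * weighted_count(reduced, weights, rest)
    return total

# Hypothetical example: the single clause (x1 or x2).
cnf = [frozenset({1, 2})]
prob_true = {1: 0.3, 2: 0.5}
p_sat = weighted_count(cnf, prob_true, [1, 2])
```

For this example the result is about 0.65, that is, one minus the probability that both variables are false. The same branching skeleton, with conflicts detected on empty clauses, is what a satisfaction search would use, which is the structural overlap the paragraph above refers to.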
8.2.2 Constrained Clustering using Integer Linear Programming
There are a number of open questions about our work on clustering by column generation. The performance of this method is affected by the efficiency of solving the master problem and the subproblem. An interesting direction for future work is to improve the bounding method in the branch and bound algorithm that solves the subproblem. It might also be beneficial to use an approximate algorithm for solving the subproblem and to fall back on the exact method only when the approximate method fails. This technique has been used in the column generation algorithm for the unconstrained version of our clustering problem (Aloise, Hansen, and Liberti 2012). In our algorithm, the constraints are enforced in the subproblem. Another interesting direction is to extend the branch and bound algorithm so that it supports other types of constraints.
In our work on graph clustering, we based our formulation on encoding assignments of nodes to clusters as decision variables. Another common encoding in graph clustering is based on co-membership of pairs of nodes in a cluster. Using this approach, the number of clusters can be decided automatically (Benati et al. 2017). Formulating the graph clustering problem using this encoding and comparing it with the current two formulations is an interesting direction for future work. Moreover, in our formulations of the graph clustering problem we did not use any redundant constraints. Finding valid inequalities that can strengthen the current formulations is another topic that needs further investigation.
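The co-membership encoding mentioned above can be sketched as follows. The sketch assumes the standard transitivity (triangle) inequalities x_ij + x_jk - x_ik <= 1 over binary pair variables, as used in clique-partitioning formulations; this is not necessarily the exact model of Benati et al. (2017), and the helper names are hypothetical. It only checks a given assignment and recovers the clusters, and does not solve the optimization problem.

```python
from itertools import combinations, permutations

def is_valid_comembership(x, n):
    """Check x[i,j] + x[j,k] - x[i,k] <= 1 for all ordered triples of nodes.

    x maps each pair (i, j) with i < j to 1 if the two nodes share a cluster.
    """
    def get(i, j):
        return x[(min(i, j), max(i, j))]
    return all(get(i, j) + get(j, k) - get(i, k) <= 1
               for i, j, k in permutations(range(n), 3))

def clusters_from_comembership(x, n):
    """Recover the clusters implied by a valid co-membership assignment."""
    clusters = []
    for node in range(n):
        for cluster in clusters:
            rep = cluster[0]  # any member works, since the relation is transitive
            if x[(min(rep, node), max(rep, node))] == 1:
                cluster.append(node)
                break
        else:
            clusters.append([node])  # the node starts a new cluster
    return clusters

# Hypothetical example with four nodes: {0, 1} and {2, 3} are co-members.
n = 4
x = {pair: 0 for pair in combinations(range(n), 2)}
x[(0, 1)] = 1
x[(2, 3)] = 1

valid = is_valid_comembership(x, n)
clusters = clusters_from_comembership(x, n)
```

Note that the number of clusters never appears as an input: it is simply the number of groups recovered from the pair variables, which is why this encoding lets the solver decide it.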
In both formulations of clustering problems, we had to develop specialized algorithms for solving the subproblems. This at least partially contradicts the declarative nature of constraint solving, which was one of the motivations for formulating DM/ML tasks as CSP(O) models. A possible direction for future research is to detect and solve the subproblems automatically. There is a framework called Generic Column Generation (GCG) that converts a MILP formulation into an equivalent model that can be solved using the
column-generation method. It then formulates the subproblem as another MILP model and solves the problem using the branch-and-price method (Gamrath and Lübbecke 2010). However, our MSS clustering problem cannot easily be modeled as a compact MILP problem. Moreover, the subproblem in our formulation involves non-linear constraints and hence cannot be solved directly by a MILP solver. This suggests that new representations and solving techniques are required to extend existing methods such as GCG to general clustering problems.
8.2.3 Learning Taxi Passenger Demand
In our work we assumed that the random variables representing the passenger demand in different areas are independent given the contextual features (e.g. weather, demographic attributes, and calendar information). This assumption simplifies both learning the distributions and sampling from them. However, there are methods in spatio-temporal data analysis that take the dependencies between variables into account (Diggle 2013). Another type of model that encodes such dependencies between count variables is the Poisson dependency network (Hadiji et al. 2015). Using these richer models might lead to more accurate distributions and more realistic samples.
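The conditional independence assumption described above makes scenario sampling straightforward, as the following sketch shows. It assumes, purely for illustration, that the demand in each area is Poisson distributed with a log-linear rate in the contextual features; the weights, feature vector, area names, and function names are all hypothetical and do not reproduce the models learned in this work.

```python
import math
import random

def poisson_sample(rate, rng):
    """Draw one Poisson variate with Knuth's multiplication method
    (adequate for the small rates used in this illustration)."""
    threshold = math.exp(-rate)
    count, prod = 0, rng.random()
    while prod > threshold:
        count += 1
        prod *= rng.random()
    return count

def demand_rate(weights, features):
    """Assumed log-linear rate: lambda = exp(w . x)."""
    return math.exp(sum(w * f for w, f in zip(weights, features)))

def sample_scenario(area_weights, features, rng):
    """One scenario: an independent Poisson draw per area, all conditioned
    on the same contextual features."""
    return {area: poisson_sample(demand_rate(w, features), rng)
            for area, w in area_weights.items()}

# Hypothetical setup: two areas, features = [bias, rain indicator].
area_weights = {"center": [1.0, 0.5], "suburb": [0.2, -0.3]}
features = [1.0, 1.0]
rng = random.Random(0)
scenarios = [sample_scenario(area_weights, features, rng) for _ in range(1000)]
```

Because each area is sampled separately, a scenario costs one draw per area, but any correlation between neighbouring areas is lost. Models that capture such dependencies, like the ones cited above, would replace the independent draws in `sample_scenario` with a joint sampling step.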
An emerging topic in machine learning is probabilistic programming. In this declarative approach, the user specifies a probabilistic model, and inference (including learning, as a special type of inference) is performed automatically. Several probabilistic programming frameworks rely on sampling as their approximate inference mechanism. It might be beneficial to use these frameworks for learning distributions and sampling scenarios. A next step would be to extend probabilistic programming frameworks with decision-making capabilities. This would offer a fully declarative approach to (multi-stage) stochastic optimization. Since the declarative specification might include constraints over the decision variables, enforcing such constraints in this setting is a challenge that needs to be addressed.