Separating the ranking and testing effects is important in its own right, as it demonstrates that current research on post-selection inference ignores the testing effect. Furthermore, the magnitude of the testing effect is often much larger than that of the ranking effect. Sequential methods that revisit hypotheses are designed to control these selection effects, producing simple procedures that outperform the more complicated ones from the conditional inference framework.
The sequential selection framework provides many practical benefits. The core improvement is the ability to have dynamic algorithms that do not require a fixed set of hypotheses to be specified in advance. This allows for revisiting and directed search procedures. This flexibility also highlights the need for new notions of false rejections. For example, identifying the interaction $X_1 X_2$ in our framework requires both marginal terms to be included initially, even though neither is in the true model. The marginal terms are often considered false rejections even though, in the model in which they are tested, they capture significant signal. Progress on revising measures of false rejections to account for these considerations has been made by G’Sell et al. (2013).
Future projects in this domain are as follows:
• One straightforward extension to Chapter 3 is to explore the use of Revisiting Alpha-Investing in generalized linear models. As previously discussed, type-I error control is maintained in this domain, but similar proofs for the performance of the procedure and the implications of submodularity are unknown. This is partly because submodularity is closely tied to linear model theory, although the theory can be extended to generalized linear models via weighted least squares. Even in linear models, however, assessing the approximate submodularity of interaction spaces is an interesting and open problem.
• Further computational improvements can be achieved when approximating forward stepwise by using “lazy evaluation,” an alternative algorithm that approximately sorts the p-values. Lazy evaluation begins with a sorted list of marginal p-values, from which the most significant feature is chosen. In order to select the correct second feature, forward stepwise recomputes all stepwise p-values conditional on the previously selected feature. Lazy evaluation merely recomputes the second-smallest p-value. If, after recomputing, it is still smaller than the third-smallest p-value, the corresponding hypothesis test is rejected. Otherwise, the third-smallest p-value is recomputed and compared to the fourth-smallest. The process continues until a corrected p-value is smaller than all remaining p-values from the previous step and all corrected p-values in the current step. Under submodularity, this process is exactly the same as forward stepwise. The performance and error control of lazy evaluation under approximate submodularity are open questions.
• Instead of recomputing p-values after each feature is added to a model, one could recompute p-values only after each testing pass of a revisiting procedure is complete. This may result in a different set of selected features; however, the rejected features fall into two intuitively appealing sets. The first set contains those features which would have been rejected in the revisiting procedures of Chapters 2 and 3. These features contain unique signal not captured by the others. The second set contains features that are “redundant” in that multiple features convey the same information. Loosely speaking, forward stepwise only selects one from each set. While this does reduce the proportion of seemingly false rejections, when correlation is high, as it is in real data, it can be difficult to separate true features from the false features with which they are highly correlated. Including groups of features in this way is similar to a discrete version of the group lasso (Yuan and Lin, 2006).
• The Benjamini-Hochberg step-up procedures are “forward looking” in that they do not provide an accurate stopping time like alpha-investing rules. The power of alpha-investing can be increased if it is allowed to “peek” at the results of subsequent tests. The challenge in doing so is that the martingale properties of mFDR need to be preserved. By leveraging tests of intersection hypotheses and the closure principle, we conjecture that some future information can be used. For example, instead of merely testing an individual hypothesis $H_1$, consider a test of the joint hypothesis $H_{[4]} = H_1 \cap \dots \cap H_4$. If $H_1$ fails to be rejected but $H_{[4]}$ is rejected, this indicates that there is a false hypothesis among the subsequent tests. Intuitively, if alpha-investing could take out a “loan” of error probability, it may be able to “repay” the loan when rejecting the future tests. Such behavior may preserve the martingale structure over sets of tests.
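The lazy evaluation idea from the list above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in this work: `pvalue_fn`, `toy_pvalue`, and `BASE_P` are hypothetical names introduced here, the p-values come from a made-up table rather than from regressions, and a fixed threshold `alpha` stands in for the alpha-investing rules discussed in the text.

```python
import heapq

def lazy_forward_stepwise(features, pvalue_fn, alpha=0.05):
    """Lazy-evaluation sketch of forward stepwise selection.

    pvalue_fn(feature, selected) is a hypothetical user-supplied callable
    returning the p-value of `feature` given the tuple of selected features.
    Returns the selected features and the number of p-value evaluations.
    """
    selected = []
    # Heap entries: (p-value, model size when it was computed, feature).
    heap = [(pvalue_fn(f, ()), 0, f) for f in features]
    n_evals = len(heap)
    heapq.heapify(heap)

    while heap:
        p, stamp, f = heapq.heappop(heap)
        if stamp == len(selected):
            # p is up to date and no larger than every other entry,
            # whether stale (previous steps) or corrected (this step).
            if p > alpha:  # assumed fixed-threshold stopping rule
                break
            selected.append(f)
        else:
            # Stale p-value: recompute it for the current model only.
            new_p = pvalue_fn(f, tuple(selected))
            heapq.heappush(heap, (new_p, len(selected), f))
            n_evals += 1
    return selected, n_evals

# Made-up marginal p-values for illustration only.
BASE_P = {"a": 0.001, "b": 0.004, "c": 0.010, "d": 0.300, "e": 0.400, "f": 0.600}

def toy_pvalue(feature, selected):
    """Toy oracle: p-values never decrease as the model grows."""
    if feature == "b" and "a" in selected:
        return 0.5  # "b" is redundant once "a" is in the model
    return min(1.0, BASE_P[feature] * 1.5 ** len(selected))

if __name__ == "__main__":
    chosen, evals = lazy_forward_stepwise(list(BASE_P), toy_pvalue)
    print(chosen, evals)
```

In this toy example the p-values are monotone by construction, so stale values are lower bounds on current ones and the lazy search selects the same features as exhaustive forward stepwise while recomputing fewer p-values. When that monotonicity fails, the stale bounds can mislead the search, which is exactly the approximate-submodularity question left open above.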
6.2. Submodularity
Submodularity plays an important role in statistics because it characterizes the difficulty of the search problem of feature selection. Assumptions used to prove the success of the Lasso, such as the restricted eigenvalue and restricted isometry properties, bound minimum sparse eigenvalues and hence are stronger assumptions than submodularity. Similarly, SIS requires true model features to have a bounded discrepancy between their joint coefficient in the true model and their marginal coefficient from a simple regression. Bounding this discrepancy is stronger than approximate submodularity, as all true features cannot become vastly more significant in the presence of others. Worst-case data examples can also be crafted by intentionally breaking submodularity; see Berk et al. (2013) and Miller (2002). Due to the importance of submodularity in discrete optimization, it provides a more theoretically robust assumption than those more commonly considered in statistics. Furthermore, it characterizes a different dimension of difficulty than the signal-to-noise ratio. As such, it is an important statistic to report in simulated data analyses. Future projects in this domain are as follows:
• If data is submodular, then the forward stepwise table will have non-decreasing p-values; however, the extent to which the converse holds is not known. Sorted p-values clearly need not imply that there are no conditional suppressor variables in the entire data set. Suppression can easily occur among the “insignificant” features which are not selected by forward stepwise for many steps; however, many feature selection assumptions only need to hold on the restricted set of relevant variables. Therefore, we conjecture that sorted stepwise p-values have an important connection to the submodularity of the highly significant variables.
• The graphs and discussion of Chapter 4 indicate that regardless of the correlation between $X_1$ and $X_2$, there exists a response $Y$ such that the data is submodular. We conjecture that this is a general property for any correlation matrix with fewer than $n$ features. This would force researchers to focus more on the nature of their true response function when using simulated data.
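A small numerical check can make the two-feature case concrete. The sketch below assumes that submodularity is measured on $R^2$ as a set function of the features, and it uses the standard two-predictor formula for $R^2$ in terms of correlations; the function names and the example correlations are hypothetical choices made here. It illustrates the two-feature situation only and says nothing about the general conjecture.

```python
def r2_pair(r1, r2, rho):
    """R^2 from regressing Y on X1 and X2 (all standardized).

    r1, r2: correlations of Y with X1 and X2; rho: correlation of X1 and X2.
    """
    return (r1**2 + r2**2 - 2 * rho * r1 * r2) / (1 - rho**2)

def is_submodular_pair(r1, r2, rho, tol=1e-12):
    """Diminishing returns for two features: R^2({1,2}) <= R^2({1}) + R^2({2})."""
    return r2_pair(r1, r2, rho) <= r1**2 + r2**2 + tol

if __name__ == "__main__":
    c = 0.6  # assumed correlation between Y and X1
    for rho in (-0.9, -0.5, 0.0, 0.5, 0.9):
        # A response that depends on X1 only: corr(Y, X2) = c * rho.
        print(rho, is_submodular_pair(c, c * rho, rho))
    # A suppressor: X2 is uncorrelated with Y but correlated with X1.
    print(is_submodular_pair(0.5, 0.0, 0.8))
```

For a response driven by $X_1$ alone, the joint $R^2$ equals $c^2$ for every value of `rho`, so the diminishing-returns inequality holds whatever the correlation between the two features. The suppressor configuration violates the inequality, since adding $X_2$ raises $R^2$ from 0.25 to roughly 0.69.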