El sistema de Telesecundarias - Descripción del contexto

Capítulo III. Descripción del contexto

3.1 Telesecundarias

3.1.1 El sistema de Telesecundarias

A.1 Auxiliary Tasks: Additional Motivation and Details

A.1.1 Autoencoding

Motivation. Autoencoding encourages the hidden state to retain all relevant information from the original input. By reducing the size of the hidden state, our model must learn a compressed representation of the signal. Compression encourages learning latent struc- ture and discourages learning noise.

Details. We use a single-layer autoregressive LSTM for the recurrent layer RAE, which

is initialized using the hidden and cell states from the encoder LSTM and sequentially decodes hidden states z_tAE. These hidden states are fed into a fully-connected output layer, OAE, that maps ztAE → ˆxt. The output ˆxt is then fed into RAE, generating z_tAE+1,

continuing until ˆx0:T is fully generated. To provide a shorter path for gradient flow, we

A.1.2 Forecasting

Motivation. Forecasting encourages E to encode the dynamics of the data-generating process. Without information about the underlying dynamics, future value prediction is either challenging or trivial (if the signal does not change). Such dynamics may carry valuable information for a range of sequence-level tasks, though the aspects of the dynamics most relevant to the target task may differ from those that predict future values. Details. We focus on a multi-output forecasting architecture, predicting several future values simultaneously [173]. This is done using a recurrent decoder, similar to the au- toencoder, but applied at each encoding step and expanded only h steps, where h is a hyperparameter.

A.1.3 Partial-Signal Autoencoding

Motivation. Previous work has found advantages to reconstruction over prediction [96]. Dai and Le hypothesize the observed superiority of sequence autoencoding over forecasting may result from the short-term nature of the language modeling task (only predicting the next word). To investigate the effect of short-term dependency auxiliary tasks, we use a multi-step forecasting system, where we can examine the effect of vary- ing the prediction horizon. Analogously, we use PS-AE to examine the effect of produc- ing short-term reconstructions.

Details. PS-AE is implemented using a setup identical to our forecasting approach, ex- cept we estimate the previous h values instead of the subsequent values (Figure 4.1 a, DPS).

A.1.4 Piecewise-Linear Autoencoding

Motivation. AE encourages fine-grained modeling of the signal over long periods, whereas PS-AE encourages fine-grained modeling over a short period. While PS-AE can control the range of temporal dependencies modeled using the decoding horizon h, it does not vary the granularity at which the output is modeled. What level of signal granularity is

required for reasonable target-task performance? To explore this question, we introduce Piecewise-Linear (PL) Autoencoding.

Details. To generate a PL representation of a signal with n distinct pieces, we take the encoded representation of the signal, zT, and feed it into a recurrent layer RPL. Figure

4.1 a, DPLillustrates our point-generation system. The sequential output of the decoder,

zPL_j , generates the jthpoint value and position. Specifically, we use two fully-connected output layers, Oval and Opos, that map zPL_j →vj, pjrespectively. Since we know p0 =0,

at the first step we generate only an initial value. We continue the decoding process, feeding vj,∑_kj=0pkto the decoder to generate z

j+1, until we have generated n+1 points.

We then use linear interpolation to map(v0, p0). . .(vn, pn) → ˆx0:T. We normalize the

position vector and use the cumulative summation to determine segment positions.

A.2 All Combinations of Auxiliary Tasks

In Figure 5 we present plots showing average AUC-ROC for all possible combinations of auxiliary tasks, for each data set. Here the contribution of each task can be examined.

A.3 Application to UCR Times Series data-sets

In order to evaluate the use of self-supervised auxiliary tasks in an easily replicable set- ting, and in order to facilitate comparisons with state-of-the are methods, we performed classification on 7 data sets from the UCR Time Series Archive https://www.cs.ucr.edu/ ~eamonn/time_series_data_2018/. We included tasks that had fixed length sequences and a sample to label ratio greater than 100. We only included the subset of these tasks for which our baseline architecture performed better than random chance. These selections were to ensure that the tasks were reasonably well suited to our baseline architecture. For each dataset, the last 20% of the training sample was used as a validation set, with the testing sample used as a test set.

Table A.1 shows baseline results for each task (accuracy when our model was run with no auxiliary tasks), the highest accuracy achieved with any combination of auxiliary

0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 AUC-ROC PLA 0.500 0.525 0.550 0.575 0.600 0.625 0.650 0.675 0.700 AUC-ROC T1D None AE Forecaster InlineAE PLA

AE+Fore AE+Inline AE+PLA

Fore+Inline Fore+PLA Inline+PLA NOT AE NOT Fore NOT Inline NOT PLA ALL 0.50 0.52 0.54 0.56 0.58 0.60 0.62 0.64 AUC-ROC AF

FIGUREA.1: Average AUC-ROC for every combination of Auxiliary task for all three analyses. Here, columns marked ’NOT’ included all but the indi-

cated auxiliary task.

tasks, and state-of-the-art performance according to timeseriesclassification.com. Al- though the baseline architecture is routinely and drastically outperformed by state-of- the-art methods, the addition of auxiliary tasks improved performance to be significantly closer to the state-of-the-art level in most cases, and our model achieved at- state-of-the- art accuracy for the Two Patterns task. Although our model was unable to achieve state- of-the-art performance on most tasks, the improvement over baseline indicates that the addition of self-supervised auxiliary tasks could enhance better suited baseline architec- tures on these and other tasks.

TABLEA.1: Accuracy on 7 data sets from the UCR Archive.

Data Set Baseline-ACC Best-ACC Best-Tasks SOTA- ACC

ElectricDevices 57.5 59.6 Forecaster 89.5

FordB 60.3 75.4 Inline + PLA 92.9

FordA 61.4 86.6 Not Forecaster 96.5

Wafer 89.1 95.2 AE 100.0

HandOutlines 54.7 54.7 None 92.4

TwoPatterns 64.4 100 Forecaster+Inline 100

In document Prácticas inclusivas de docentes de telesecundaria en un contexto de violencia y drogadicción (página 93-97)