Revista Argentina de Clínica Psicológica 2020, Vol. XXIX, N°2, 400-405
DOI: 10.24205/03276716.2020.255
PREDICTION OF SPORTS NEWS CLICK-THROUGH RATE BASED ON NEURAL NETWORKS

Liquan Chen¹, You Li¹*, Shandong Pang², Jiaxuan Chen³

Abstract
In the era of new media, major search engines and news portals are concerned with the accurate prediction of the click-through rate (CTR). Against this backdrop, this paper improves the recurrent neural network (RNN) for better prediction of the CTR of sports news. First, the curve neurons in the RNN were replaced with the long short-term memory (LSTM) structure, so that the improved RNN can record long-term historical data. Besides, the stochastic gradient descent (SGD) algorithm and cross entropy were introduced to further optimize the computing method and objective function of the network. Furthermore, the accumulated-output calculation mode was adopted to solve the vanishing gradient problem. Finally, the improved RNN was compared with existing methods, namely the backpropagation neural network (BPNN) and the logistic regression (LR) algorithm, in a contrastive experiment. The results show that the improved RNN achieved the minimum log-likelihood loss (logloss), i.e. the most accurate CTR prediction for sports news. The research results shed new light on the design of effective news delivery strategies for news websites.
Key words: Sports News, Click-Through Rate (CTR), Prediction, Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM).
Received: 04-03-19 | Accepted: 16-10-19
INTRODUCTION
At present, the world has entered the era of new media, and the emergence of smartphones has enabled people to view massive amounts of hot news online anytime and anywhere (Lawson-Borders, 2003; Oakley & Crisp, 2011). In sports, hot news such as the NBA, the UEFA Champions League, tennis, and F1 racing is characterized by a high click-through rate and fast dissemination (Chen & Liu, 2014; Yang, 2014; Zeng, 2006). CTR prediction for sports news and public opinion helps to follow news trends, and hot news can improve the CTR and advertising-related income of a portal website (Gu, Dong & He, 2014; Zanjani & Khadivi, 2015).
¹College of Sport Science and Physical Education, Mudanjiang Normal University, Mudanjiang 157012, China. ²Department of Social Sports, Hebei Institute of Physical Education, Shijiazhuang 050041, China. ³International Elite College, Yonsei University, Wonju 26493, South Korea. E-Mail: chenliquan2004@163.com
CTR prediction of news is one of the issues of greatest concern to major search companies and portal websites (Bauman, 2013).
Figure 1 shows the CTR prediction process of sports news, mainly including data collection, data feature extraction and processing, and the selection of a reasonable prediction model (Arkhipova, Grauer, Kuralenok et al., 2015; Atkinson, Driesener, & Corkindale, 2014; Shan, Lin, Sun et al., 2016).
Figure 1. CTR prediction process of sports news (sports news data → data feature extraction and processing → prediction model → click rate; used to determine the sports news ordering and increase the click-through rate).
When similar historical click data exist for a period of time, the future CTR of a news item can be predicted fairly well by traditional linear regression models and ordinary intelligent learning models, including support vector machine models, BPNN models, genetic algorithm models, simulated annealing models, etc. (Delgado, Pegalajar, & Cuéllar, 2010; Freitag, Graf, Kaliske et al., 2011; Gao & Meng, 2005; Lee, Shi, Wang et al., 2016; Parlos, Rais, & Atiya, 2000; Wang, Suphamitmongkol, & Wang, 2013; Yue, Wang, Zhu et al., 2013; Zhang, Dai, Xu et al., 2014; Zhang, Jansen, & Spink, 2014). However, the number of sports news items now grows very fast, most new items have no similar historical data for reference, and click data are relatively sparse. For this case, researchers have proposed hierarchical clustering models, temporal-spatial prediction models, factorization models, etc. (Goldberger, 2004; Hasnat, Alata, & Trémeau, 2016; Li, Deng, Wang et al., 2014; Li, Zheng, Yang et al., 2014; Posse, 2001), all of which are still at a preliminary research stage, with long computation times and relatively large errors.
Based on the existing research, this paper proposes an improved RNN model using the LSTM structure, which can effectively suppress the exploding gradient defect of the RNN model during calculation. The research conclusions provide a new method for news CTR prediction.
IMPROVED RECURRENT NEURAL NETWORK-BASED CTR PREDICTION MODEL
Recurrent neural network model
RNNs are improved algorithms based on the traditional BP neural network. Figure 2 shows the structure of the traditional BP neural network.
Figure 2. BP neural network (input layer, hidden layer, output layer).
Due to its structural characteristics, the BP neural network has defects such as long training time in the early stage, premature convergence, and trapping in local optima. It also has a short memory and large fluctuations. The RNN effectively overcomes the poor storage capacity and convergence of the BP neural network. Information propagating within the RNN structure persists for a long time, and the output shared between the upper layer and the hidden layer ensures that the RNN model predicts the news CTR well even with large data volumes and complicated conditions.
Figure 3. Recurrent neural network and partially enlarged details: (a) recurrent neural network with inputs u1, ..., uK and outputs y1, ..., yL; (b) enlarged view of the input-hidden-output connections (weights $w_{ih}$ and $w_{h'h}$) between times $t$ and $t+1$.
The RNN structure is shown in Figure 3. The data propagation process of the RNN can be divided into forward propagation and back propagation.
For forward propagation, the input vector $a_h^t$ and the output vector $b_h^t$ of the hidden layer at time $t$ are expressed as:

$$a_h^t = \sum_{i=1}^{I} w_{ih} x_i^t + \sum_{h'=1}^{H} w_{h'h} b_{h'}^{t-1} \qquad (1)$$

$$b_h^t = \theta_h\left(a_h^t\right) \qquad (2)$$
where $w_{ih}$ and $w_{h'h}$ are the weights between the different layers, $x_i^t$ is the input value, and $\theta$ is the excitation function. The input vector $a_o^t$ and the output vector $b_o^t$ of the output layer at time $t$ are respectively expressed as:
$$a_o^t = \sum_{h=1}^{H} w_{ho} b_h^t \qquad (3)$$

$$b_o^t = \theta_o\left(a_o^t\right) \qquad (4)$$

The backpropagation process is roughly the same as in the traditional BP neural network. During propagation, the error signal $\delta_h^t$ is expressed as:
$$\delta_h^t = \theta'\left(a_h^t\right) \left( \sum_{o=1}^{O} \delta_o^t w_{ho} + \sum_{h'=1}^{H} \delta_{h'}^{t+1} w_{hh'} \right) \qquad (5)$$

The partial derivative of the loss $L$ with respect to each weight $w_{ij}$ is:
$$\frac{\partial L}{\partial w_{ij}} = \sum_{t=1}^{T} \frac{\partial L}{\partial a_j^t} \frac{\partial a_j^t}{\partial w_{ij}} = \sum_{t=1}^{T} \delta_j^t b_i^t \qquad (6)$$

Then, the weight $w_{ij}$ can be updated according to Equation (7):
$$w_{ij} = w_{ij} - \alpha \frac{\partial L}{\partial w_{ij}} \qquad (7)$$

where $\alpha$ is the learning rate. The RNN training process is as follows: (1) input the historical data and convert them into an output vector through the input layer; (2) take that output vector as the input vector of the hidden layer and, after the hidden-layer transformation, obtain an output vector conforming to the excitation function and the threshold condition; (3) continuously update the weights of the different layers by Equations (5)-(7) until the expected error is reached.
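The forward pass of Equations (1)-(4) can be sketched as a minimal NumPy loop. The paper does not name the excitation functions, so tanh for the hidden layer and the sigmoid for the output layer are assumptions here, and the function and variable names are illustrative only:

```python
import numpy as np

def rnn_forward(x_seq, w_ih, w_hh, w_ho):
    """Forward pass of a simple RNN, following Equations (1)-(4).

    x_seq : (T, I) input sequence
    w_ih  : (I, H) input-to-hidden weights
    w_hh  : (H, H) hidden-to-hidden weights w_{h'h}
    w_ho  : (H, O) hidden-to-output weights
    """
    T, _ = x_seq.shape
    H = w_hh.shape[0]
    b_prev = np.zeros(H)                         # b_h^{t-1}, zero at t = 0
    outputs = []
    for t in range(T):
        a_h = x_seq[t] @ w_ih + b_prev @ w_hh    # Eq. (1)
        b_h = np.tanh(a_h)                       # Eq. (2), tanh assumed
        a_o = b_h @ w_ho                         # Eq. (3)
        b_o = 1.0 / (1.0 + np.exp(-a_o))         # Eq. (4), sigmoid assumed
        outputs.append(b_o)
        b_prev = b_h                             # recur the hidden output
    return np.array(outputs)
```

With a sigmoid output the predictions stay in (0, 1), which is what a CTR model needs.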
Improved recurrent neural network model
Traditional RNN algorithms suffer from significant vanishing or exploding gradients. In this paper, the LSTM structure was incorporated into the RNN model to remedy these defects.
Figure 4 shows the specific composition of the LSTM structure, including the input gate (controls information input), the output gate (controls the converted information output), the forget gate (determines whether to retain historical calculation data), and the cell node unit. $s_c^t$ is the value calculated at time $t$.
Figure 4. Long short-term memory structure (cell block with input gate, forget gate, and output gate; $g$ and $h$ denote the cell input and output transformations, $f$ the gate activation).

Figure 5. Improved recurrent neural network model (the LSTM cell replaces the hidden-layer neurons between the input and output layers at times $t$ and $t+1$, with weights $w_{ih}$ and $w_{h'h}$).
Figure 5 shows the improved RNN model established in this paper. The number of inputs to the input gate is increased to three: the output vectors of the input layer and the hidden layer, and the historical information retained by the cell node. Let the input vector of the input gate in the improved RNN model be $a_\tau^t$; then its output vector $b_\tau^t$ can be expressed as:
$$b_\tau^t = f\left(a_\tau^t\right) \qquad (8)$$

Similarly, using the input vector $a_\varphi^t$, the output vector $b_\varphi^t$ of the forget gate can be written as:

$$b_\varphi^t = f\left(a_\varphi^t\right) \qquad (9)$$
The model uses the forget gate to determine whether the historical information retained by the cell node is adopted:

$$s_c^t = b_\varphi^t s_c^{t-1} + b_\tau^t g\left(a_c^t\right) \qquad (10)$$
The output vector $b_\omega^t$ of the output gate and the output vector $b_c^t$ of the cell unit can be expressed as:

$$b_\omega^t = f\left(a_\omega^t\right) \qquad (11)$$

$$b_c^t = b_\omega^t h\left(s_c^t\right) \qquad (12)$$
Then, the final output vector $b_k^t$ (optimized by the three gates) of the improved RNN model is:

$$b_k^t = f\left(a_k^t\right) \qquad (13)$$

The node weights of the improved RNN model are updated as:

$$w_{ij} = w_{ij} - \alpha \delta_j^t b_i^t \qquad (14)$$

It can be seen from Equations (8)-(14) that the state function and its derivative in the improved RNN model are obtained by accumulation, which avoids the gradient problems of the original recurrent network when derivative values tend to zero.
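One step of the gated cell in Equations (8)-(12) can be sketched as follows. The gate activation $f$ is assumed to be the sigmoid and $g$, $h$ to be tanh (common LSTM choices, not stated explicitly in the paper); the pre-activations are assumed to have been computed already from the layer inputs and recurrent state:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(a_tau, a_phi, a_c, a_omega, s_prev):
    """One LSTM cell step following Equations (8)-(12).

    a_tau, a_phi, a_c, a_omega : pre-activations of the input gate,
        forget gate, cell input, and output gate
    s_prev : previous cell state s_c^{t-1}
    Returns the cell output b_c^t and the new state s_c^t.
    """
    b_tau = sigmoid(a_tau)                         # Eq. (8), input gate
    b_phi = sigmoid(a_phi)                         # Eq. (9), forget gate
    s_c = b_phi * s_prev + b_tau * np.tanh(a_c)    # Eq. (10), g = tanh assumed
    b_omega = sigmoid(a_omega)                     # Eq. (11), output gate
    b_c = b_omega * np.tanh(s_c)                   # Eq. (12), h = tanh assumed
    return b_c, s_c
```

Because the new state in Equation (10) is a sum of the gated old state and the gated new input, gradients flow through the additive path instead of repeated multiplication, which is what suppresses the gradient problems mentioned above.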
The log-loss function of Equations (15) and (16) was used to evaluate the CTR prediction accuracy of the different models:

$$logloss = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{M} y_{ij} \log p_{ij} \qquad (15)$$

$$logloss = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log p_i + (1 - y_i) \log (1 - p_i) \right] \qquad (16)$$

where $y$ is the true click label and $p$ is the predicted click probability. The smaller the logloss, the more accurate the model's predictions.
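The binary form in Equation (16) is straightforward to compute; a small sketch (the clipping constant is an implementation detail added here to avoid log(0), not part of the paper's definition):

```python
import numpy as np

def logloss(y, p, eps=1e-15):
    """Binary log-likelihood loss of Equation (16).

    y : array of true click labels (0 or 1)
    p : array of predicted click probabilities
    """
    p = np.clip(p, eps, 1.0 - eps)   # guard against log(0)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))
```

For example, an uninformative model that always predicts p = 0.5 scores logloss = ln 2 ≈ 0.693, well above the values around 0.38-0.46 reported below, while a perfect model scores essentially 0.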
TEST RESULTS AND ANALYSIS
The news data provided by a famous online sports news portal were selected as the sample data in this study. Each sample contains several explicit and implicit features. All sample data were divided into five parts: four for training and one for testing.
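The five-part split described above can be sketched as follows (the function name and the fixed seed are illustrative; the paper does not specify how the partition was drawn):

```python
import numpy as np

def five_part_split(n_samples, test_part=0, seed=42):
    """Shuffle sample indices and split them into 5 parts:
    4 parts for training and 1 for testing, as in the setup above.
    Part sizes differ by at most one when n_samples is not divisible by 5."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    parts = np.array_split(idx, 5)
    test_idx = parts[test_part]
    train_idx = np.concatenate(
        [p for i, p in enumerate(parts) if i != test_part])
    return train_idx, test_idx
```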
In the calculation process, samples with few occurrences of Device_ were eliminated, making the model calculation faster and more stable. The learning rate was set to 0.00025 and the optimization function to Adagrad.
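Adagrad scales each weight's step by the root of its accumulated squared gradients, so frequently-updated weights take smaller steps. A minimal sketch of one update, using the 0.00025 learning rate mentioned above (the epsilon guard is a standard implementation detail, not from the paper):

```python
import numpy as np

def adagrad_update(w, grad, cache, lr=0.00025, eps=1e-8):
    """One Adagrad step: accumulate squared gradients in `cache`,
    then apply a per-weight scaled gradient step."""
    cache = cache + grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache
```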
Both the original RNN model and the improved model established in this paper were implemented in the Keras framework with a stochastic gradient descent algorithm, using the cross-entropy function as the objective function of both.
To verify the feasibility and superiority of the proposed LSTM-based algorithm, it was compared with the traditional BPNN, the LR algorithm, and the original RNN algorithm. Sample training was performed first, followed by testing. Six test runs were conducted, with the number of iterations increased from 10 to 60 in steps of 10.
Figure 6 shows the change in the logloss values of the four prediction models as the number of iterations increased from 10 to 60. Figure 7 shows the minimum logloss values of the BPNN algorithm, the LR algorithm, the RNN algorithm, and the proposed algorithm under different numbers of iterations.
It can be seen from Figures 6 and 7 that the logloss value of the LR algorithm first decreases and then increases; at 40 iterations it reaches the minimum value of 0.389, and as the number of iterations increases further, the logloss starts to rise, indicating that the LR algorithm learns best at 40 iterations.
The logloss value of the BPNN shows an overall decreasing trend, reaching its minimum of 0.462 at 60 iterations. However, compared with the other three algorithms, the logloss of the BPNN is much larger, indicating that its predictions are the least accurate of the four.
The logloss values of both the RNN model and the model established in this paper decrease with the number of iterations. The minimum logloss of the RNN model was 0.386, while that of the LSTM-based model in this paper was 0.383, indicating that the LSTM-based model predicts the news CTR most accurately.
Figure 6. CTR prediction logloss values of the four models over 10-60 iterations.

Figure 7. Minimum CTR prediction logloss values of the four models (LR: 0.3892; BPNN: 0.4622; RNN: 0.3859; LSTM: 0.3830).
Figures 6 and 7 further show that the RNN algorithm predicts the CTR more accurately than the traditional LR and BPNN algorithms. The LR algorithm suffers from insufficient learning, over-fitting, and local optima when predicting nonlinear problems. The core of the BPNN is the gradient descent algorithm, and it cannot memorize historical data, so it easily falls into local extrema or terminates before the iterations reach the preset threshold, leading to different calculation results on each run.
The RNN algorithm remedies the above defects of the BPNN algorithm, but its exploding gradient makes its results unstable on complex problems. The model proposed in this paper uses the LSTM structure to replace the curve neurons in the original RNN, so that the model can retain historical results over long periods. The output function is calculated in an accumulated manner, which ensures that no gradient explosion occurs. The results in Figures 6 and 7 also prove that the proposed algorithm predicts the CTR of sports news more accurately.
CONCLUSIONS
Based on the existing research, this paper proposes an improved recurrent neural network model that uses the LSTM structure to replace the curve neurons in the original recurrent neural network, so that the model can retain long-term historical information. Then, the SGD algorithm and cross entropy were applied to further optimize the calculation method and objective function of the traditional RNN, and the accumulated-output calculation mode was used to ensure that no gradient explosion occurs in the model.
Comparing the proposed algorithm with the existing BPNN, LR, and other algorithms, it was found that, as the number of iterations increases, the LSTM-based model reaches a minimum logloss value of 0.383, proving that it achieves the highest CTR prediction accuracy for sports news.
Using the optimization model proposed in this paper, the news delivery strategy can be effectively formulated to maximize the CTR and profitability of hot sports news.
Acknowledgement
This study was supported by:
(1) 2018 National Cultivation project of Mudanjiang Normal University "Empirical Analysis and Optimization Strategy Research on Business Environment of China's Leisure Skiing Industry", No.GP2019007, Host: Liquan Chen.
(2) Research Platform for Basic Research Operating Expenses of Education Department of Heilongjiang Province in 2018, Research on Overseas Marketing Strategy and Inspiration of Australian Football League (AFL), No.1353PT001, Host: Liquan Chen.
(3) The Subject of Education Planning Filing in Heilongjiang Province in 2018, Research on the Matching between the Modular Talent Training of Social Sports Guidance and Management Specialty and the Sports Market Demand, No.GJC1318109, Host: Liquan Chen.
REFERENCES
Arkhipova, O., Grauer, L., Kuralenok, I., & Serdyukov, P. (2015). Search engine evaluation based on
search engine switching prediction. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in
Information Retrieval, 723-726.
Atkinson, G., Driesener, C., & Corkindale, D. (2014). Search engine advertisement design effects on click-through rates. Journal of Interactive
Advertising, 14(1), 24-30.
Bauman, K. E. (2013). Optimization of click-through rate prediction in the yandex search engine. Automatic Documentation & Mathematical
Linguistics, 47(2), 52-58.
Chen, N., & Liu, J. (2014). Research on the impact of modern network technology on sports culture dissemination. Advanced Materials Research,
926-930, 2722-2725.
Delgado, M., Pegalajar, M. C., & Cuéllar, M. P. (2010). Memetic evolutionary training for recurrent neural networks: an application to time-series prediction. Expert Systems, 23(2), 99-115.
Freitag, S., Graf, W., Kaliske, M., & Sickert, J. U. (2011).
Prediction of time-dependent structural behaviour with recurrent neural networks for fuzzy data. Computers & Structures, 89(21), 1971-1981.
Gao, Y., & Meng, J. E. (2005). Narmax time series model prediction: feedforward and recurrent fuzzy neural network approaches. Fuzzy Sets &
Systems, 150(2), 331-350.
Goldberger, J. (2004). Hierarchical clustering of a mixture model. Neural Information Processing
Systems, 17, 505-512.
Gu, W., Dong, S., & He, J. (2014). Automatic prediction method for hot news topics based on time-series analysis. Journal of Computational
Information Systems, 10(8), 3473-3485.
Hasnat, M. A., Alata, O., & Trémeau, A. (2016). Model-based hierarchical clustering with bregman divergences and fishers mixture model: Application to depth image analysis. Statistics &
Computing, 26(4), 861-880.
Lawson-Borders, G. (2003). Integrating new media and old media: seven observations of convergence as a strategy for best practices in media organizations. International Journal on
Media Management, 5(2), 91-99.
Lee, J., Shi, Y., Wang, F., Lee, H., & Kim, H. K. (2016). Advertisement clicking prediction by using multiple criteria mathematical programming. World Wide Web-internet & Web Information
Systems, 19(4), 707-724.
Li, L., Zheng, L., Yang, F., & Li, T. (2014). Modeling and broadening temporal user interest in personalized news recommendation. Expert
Systems with Applications, 41(7), 3168-3177.
Li, M., Deng, S., Wang, L., Feng, S., & Fan, J. (2014). Hierarchical clustering algorithm for categorical data using a probabilistic rough set model.
Knowledge-Based Systems, 65(1), 60-71.
Oakley, T., & Crisp, P. (2011). Honeymoons and pilgrimages: conceptual integration and allegory in old and new media. Metaphor & Symbol, 26(2), 152-159.
Parlos, A. G., Rais, O. T., & Atiya, A. F. (2000). Multi-step-ahead prediction using dynamic recurrent neural networks. Neural Networks,13(7), 765-786.
Posse, C. (2001). Hierarchical model-based clustering for large datasets. Journal of
Computational & Graphical Statistics, 10(3),
464-486.
Shan, L., Lin, L., Sun, C., & Wang, X. (2016). Predicting ad click-through rates via feature-based fully coupled interaction tensor factorization. Electronic Commerce Research & Applications,
16(C), 30-42.
Wang, F., Suphamitmongkol, W., & Wang, B. (2013). Advertisement click-through rate prediction using multiple criteria linear programming regression model. Procedia Computer Science, 17, 803-811.
Yang, J. W. (2014). Research on information dissemination channels and monitoring of sports events. Advanced Materials Research, 926-930, 2606-2609.
Yue, K., Wang, C. L., Zhu, Y. L., Hao, W. U., & Liu, W. Y. (2013). Click-through rate prediction of online advertisements based on probabilistic graphical model. Journal of East China Normal University, 53(3), 15-25.
Zanjani, M. D., & Khadivi, S. (2015). Predicting user click behaviour in search engine advertisements. New Review of Hypermedia & Multimedia, 21(3-4), 301-319.
Zeng, W. (2006). Research on impact of the internet on the dissemination of sports culture. Shandong Sports Science & Technology, 926-930, 2702-2705.
Zhang, Y., Dai, H., Xu, C., Feng, J., Wang, T., Bian, J., Wang, B., & Liu, T. Y. (2014). Sequential click prediction for sponsored search with recurrent neural networks. In Twenty-Eighth AAAI
Conference on Artificial Intelligence, 1369-1375.
Zhang, Y., Jansen, B. J., & Spink, A. (2014). Identification of factors predicting clickthrough in web searching using neural network analysis. Journal of the American Society for Information