The system’s main components can be divided into external components, which users connect to, and internal components, where the system performs all training, model evaluation and model selection. To help users connect to the system, a web interface was built with forms for submitting a task and checking the status of a submitted task. Internally, there are several components: tweet collection, feature extraction, and data preparation and classification. All the components are shown in Figure 6.1, and the components and flow of the system are described in detail in this section.

Figure 6.1 System's main components

The flow starts from the system web interface and ends by sending an email to the client. The small grey area represents the internal retraining process, which is conducted periodically.

All components were developed from scratch, starting with building a web interface, uploading datasets to the system host and then importing the data into a database. Data collection depends on the type of source, which can be direct tweet IDs or the monitoring of specific keywords or hashtags. The feature extraction components (see Figure 6.2) were then developed, as each one requires specific, customised methods to obtain the required data. Moreover, the system models are maintained, and an automatic retraining schedule is programmed internally to run daily. Finally, the results are prepared and the applicant is notified that they are ready to download.

Task input (using Web GUI): The system’s web interface is the main entry point to the researchers’ prototype system. Users submit a task through simple forms that require a name, an email address and a file, keyword or hashtag. Users can either submit tweet IDs, for which the researchers carry out the collection and sequencing process, or ask for data to be collected from the tweet stream for tweets that contain specific keywords or hashtags. After a user submits a task to the system, they receive a confirmation that contains the task ID and the expected finishing time or date. They are also directed to the status page, where they can check the task’s completion percentage.
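The form checks described above can be illustrated with a minimal sketch. It is not the prototype’s actual code: the class name `TaskRequest`, its field names, the simplified email pattern and the rule that exactly one source (file, keyword or hashtag) must be given are all assumptions made for illustration.

```python
import re
import uuid
from dataclasses import dataclass, field
from typing import List, Optional

# Deliberately simple pattern; an assumption, not the system's real check.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


@dataclass
class TaskRequest:
    """Hypothetical representation of one submitted task."""
    name: str
    email: str
    tweet_id_file: Optional[str] = None   # path to an uploaded file of tweet IDs
    keyword: Optional[str] = None
    hashtag: Optional[str] = None
    # Task ID returned to the user in the confirmation message.
    task_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def validate(self) -> List[str]:
        """Return a list of validation errors (empty if the task is valid)."""
        errors = []
        if not self.name.strip():
            errors.append("name is required")
        if not EMAIL_RE.match(self.email):
            errors.append("email is not valid")
        sources = [s for s in (self.tweet_id_file, self.keyword, self.hashtag) if s]
        if len(sources) != 1:  # assumed rule: exactly one data source per task
            errors.append("provide exactly one of: file, keyword, hashtag")
        return errors


task = TaskRequest(name="Alice", email="alice@example.org", hashtag="#offers")
print(task.task_id, task.validate())
```

A task with an empty error list would then be stored and its `task_id` shown in the confirmation and on the status page.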

Tweet collection, feature extraction and data preparation: The system first connects to Twitter to retrieve the tweets for a task, and all retrieved tweets are stored in a database (called the requests DB). The feature extraction component starts once a new tweet is added to the table. This component contains several subcomponents that focus on extracting groups of features from various sources. Figure 6.2 shows in detail the inner process applied to each request that enters this component, starting with obtaining the final landing page from the tweet’s URL (unshortening it if it is a shortened link) and ending with combining all the features and transforming them into an accepted machine learning format.
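The first step of this process, resolving a shortened URL to its landing page, can be sketched as follows. To keep the example runnable offline, a plain dictionary (`redirect_map`, a hypothetical stand-in) plays the role of the HTTP responses; the function name and the hop limit are likewise assumptions rather than details of the prototype.

```python
from typing import Dict, List, Tuple


def follow_redirects(
    url: str, redirect_map: Dict[str, str], max_hops: int = 10
) -> Tuple[str, List[str]]:
    """Follow redirects from `url` and return (landing_page, visited_chain).

    `redirect_map` maps a URL to the URL it redirects to; a URL that is
    absent from the map is treated as a landing page.
    """
    chain = [url]
    seen = {url}
    while len(chain) - 1 < max_hops:
        nxt = redirect_map.get(chain[-1])
        if nxt is None or nxt in seen:  # landing page reached, or a loop
            break
        chain.append(nxt)
        seen.add(nxt)
    return chain[-1], chain


# Simulated responses: a short link that redirects twice.
redirects = {
    "http://sho.rt/a": "http://mid.example/x",
    "http://mid.example/x": "http://landing.example/",
}
landing, chain = follow_redirects("http://sho.rt/a", redirects)
redirect_count = len(chain) - 1  # number of redirections before the landing page
print(landing, redirect_count)
```

With a real HTTP client such as `requests`, the same count could be read from the length of `response.history` after a request that follows redirects, although this would not capture redirects performed by scripts in the page.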

Figure 6.2 Feature extraction internal processes

The feature extraction process starts by retrieving the URL from the database and ends by writing all extracted features back to the database.

The URL redirection feature gives the number of redirections followed before reaching the landing page. The pages opened during this redirection process are stored. Once the landing page is reached, it is crawled using Selenium WebDriver (Google Chrome and Firefox). The landing page domain name is then passed to WHOIS feature extraction, which issues WHOIS requests for the domain and parses the response to obtain the domain age and registrar information. In the final process of the feature extraction component, the system’s feature validation and transformation component carries out the final data validation. It checks that all the features needed by the detection model exist and have been transformed into an accepted machine learning format, which is called the machine learning domain features vector. The process of checking the features’ formats is done automatically, without any human intervention. Features must satisfy certain criteria and are validated before being inserted back into the database.
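The validation and transformation step can be sketched as below. The feature names in `REQUIRED_FEATURES` and the criteria applied (present, numeric, finite and non-negative) are assumptions chosen for illustration; the actual feature set is the one described in chapter 3.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical feature names, listed in the order the model would expect.
REQUIRED_FEATURES = ("redirect_count", "domain_age_days", "has_registrar")


def to_feature_vector(raw: Dict[str, object]) -> Tuple[List[float], List[str]]:
    """Validate raw features and return (vector, errors).

    The vector is empty whenever any feature fails validation, so an
    incomplete request is never passed on to the detection model.
    """
    errors: List[str] = []
    vector: List[float] = []
    for name in REQUIRED_FEATURES:
        if raw.get(name) is None:
            errors.append(f"missing feature: {name}")
            continue
        try:
            value = float(raw[name])  # booleans become 0.0 or 1.0
        except (TypeError, ValueError):
            errors.append(f"non-numeric feature: {name}")
            continue
        if math.isnan(value) or math.isinf(value) or value < 0:
            errors.append(f"out-of-range feature: {name}")
            continue
        vector.append(value)
    return (vector if not errors else []), errors


vector, errors = to_feature_vector(
    {"redirect_count": 2, "domain_age_days": "365", "has_registrar": True}
)
print(vector, errors)
```

Returning the errors alongside the vector lets the caller decide whether to retry extraction or mark the request as failed, without any manual inspection.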

Finally, all the features and data collected are returned to the requests database; this time, however, they are flagged as ready to be used against the system detection model. For further information about the features used in this system and the implementation and techniques involved, see chapter 3, section 3.3.1.
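One way to picture this flagging is a status column in the requests table. The sketch below uses an in-memory SQLite database; the table layout, the column names and the status values `new` and `ready` are assumptions, since the schema of the requests DB is not given here.

```python
import json
import sqlite3
from typing import List, Tuple


def init_db(conn: sqlite3.Connection) -> None:
    """Create a hypothetical requests table with a status flag."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS requests ("
        "tweet_id TEXT PRIMARY KEY, url TEXT, features TEXT, "
        "status TEXT NOT NULL DEFAULT 'new')"
    )


def add_tweet(conn: sqlite3.Connection, tweet_id: str, url: str) -> None:
    """Store a newly collected tweet; it starts with status 'new'."""
    conn.execute(
        "INSERT INTO requests (tweet_id, url) VALUES (?, ?)", (tweet_id, url)
    )


def store_features(
    conn: sqlite3.Connection, tweet_id: str, features: List[float]
) -> None:
    """Write the validated feature vector back and flag the row as ready."""
    conn.execute(
        "UPDATE requests SET features = ?, status = 'ready' WHERE tweet_id = ?",
        (json.dumps(features), tweet_id),
    )


def fetch_ready(conn: sqlite3.Connection) -> List[Tuple[str, List[float]]]:
    """Return the requests that are ready for the detection model."""
    rows = conn.execute(
        "SELECT tweet_id, features FROM requests "
        "WHERE status = 'ready' ORDER BY tweet_id"
    ).fetchall()
    return [(tweet_id, json.loads(features)) for tweet_id, features in rows]


conn = sqlite3.connect(":memory:")
init_db(conn)
add_tweet(conn, "1001", "http://sho.rt/a")
store_features(conn, "1001", [2.0, 365.0, 1.0])
print(fetch_ready(conn))
```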

Classification stage: This stage contains pre-trained machine learning models that give the suspicion probability for each tweet (with URLs attached). In the back end of the system, a dedicated component conducts model training and model selection. The next section gives more details about how the models were built and trained. In general, the task in this stage is to retrieve flagged requests and apply the deployed model or models to them, depending on the method used. Requests should have the same number of features, in the same format, as the ground truth dataset used in training the models. The classification output is a probability that represents how suspicious the content is. The suspicion level ranges from 0 to 1, where 0 means not suspicious at all and 1 means very likely to be a spam tweet.
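The scoring step can be illustrated with a small stand-in model. The model family used by the system is not stated here, so the logistic scorer below, its class name `LogisticStandIn` and its weights are purely hypothetical; the sketch only shows the two properties described above, namely that a request must match the training feature count and that the output lies between 0 and 1.

```python
import math
from typing import Sequence


class LogisticStandIn:
    """Hypothetical stand-in for a deployed, pre-trained model."""

    def __init__(self, weights: Sequence[float], bias: float) -> None:
        self.weights = list(weights)
        self.bias = bias

    def suspicion(self, features: Sequence[float]) -> float:
        """Return a suspicion probability in [0, 1] for one request."""
        if len(features) != len(self.weights):
            raise ValueError(
                f"expected {len(self.weights)} features, got {len(features)}"
            )
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        # Numerically stable sigmoid: avoids overflow for large |z|.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)


# Made-up weights for (redirect_count, domain_age_days, has_registrar).
model = LogisticStandIn(weights=[0.8, -0.01, -0.5], bias=0.2)
score = model.suspicion([3.0, 20.0, 0.0])
print(round(score, 3))
```

Rejecting a request whose feature count differs from the training data, rather than padding or truncating it, keeps a malformed vector from being scored silently.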