Introducing Oracle Predictive Analytics 3-7

Behind the Scenes

This section provides some high-level information about the inner workings of Oracle predictive analytics. If you know something about data mining, you will find this information straightforward and easy to understand. If you are unfamiliar with data mining, you can skip this section; you do not need this information to use predictive analytics.

EXPLAIN

EXPLAIN creates an attribute importance model. Attribute importance uses the Minimum Description Length algorithm to determine the relative importance of attributes in predicting a target value. EXPLAIN returns a list of attributes ranked in relative order of their impact on the prediction. This information is derived from the model details for the attribute importance model.

Attribute importance models are not scored against new data. They simply return information (model details) about the data you provide.
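To make "ranking attributes by relative importance" concrete, here is a minimal Python sketch. EXPLAIN itself uses the Minimum Description Length algorithm; this sketch swaps in a simpler criterion, information gain (the reduction in target entropy after splitting on an attribute), purely for illustration. It is not the MDL computation Oracle performs, and the function names and toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    """Reduction in target entropy after grouping rows by attr.
    Stand-in criterion only; EXPLAIN uses MDL, not information gain."""
    base = entropy([r[target] for r in rows])
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return base - remainder

def rank_attributes(rows, attrs, target):
    """Return (attribute, score) pairs, most important first."""
    scores = [(a, information_gain(rows, a, target)) for a in attrs]
    return sorted(scores, key=lambda s: s[1], reverse=True)

# Hypothetical toy data, for illustration only.
rows = [
    {"region": "north", "tier": "gold",   "bought": "yes"},
    {"region": "north", "tier": "silver", "bought": "no"},
    {"region": "south", "tier": "gold",   "bought": "yes"},
    {"region": "south", "tier": "silver", "bought": "no"},
]
print(rank_attributes(rows, ["region", "tier"], "bought"))
# [('tier', 1.0), ('region', 0.0)]
```

As with an attribute importance model, nothing here is scored against new data: the result is simply information about the rows supplied.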

Attribute importance is described in "Feature Selection" on page 9-2.

PREDICT

PREDICT creates a Support Vector Machine (SVM) model for classification or regression.

PREDICT creates a Receiver Operating Characteristic (ROC) curve to analyze the per-case accuracy of the predictions, and it optimizes the probability threshold for binary classification models. The probability threshold is the probability that the model uses as the cutoff for making a positive prediction. The default is 50%.
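The following minimal Python sketch illustrates how a probability threshold and an ROC curve relate. It assumes a case is predicted positive when its probability is at least the threshold (0.5 by default). The function names `predict` and `roc_points` and the sample data are hypothetical and are not part of the PREDICT interface. Each distinct predicted probability is tried as a threshold, producing one (threshold, false positive rate, true positive rate) point on the curve.

```python
def predict(probs, threshold=0.5):
    """Predict positive (1) when the probability meets the threshold."""
    return [1 if p >= threshold else 0 for p in probs]

def roc_points(labels, probs):
    """Return (threshold, FPR, TPR) for each distinct probability."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(probs), reverse=True):
        preds = predict(probs, t)
        tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
        fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
        points.append((t, fp / neg, tp / pos))
    return points

# Hypothetical actual classes and model probabilities.
labels = [1, 1, 0, 1, 0, 0]
probs = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
print(predict(probs))  # [1, 1, 1, 0, 0, 0] at the 50% default
for point in roc_points(labels, probs):
    print(point)
```

Lowering the threshold moves along the curve toward (1.0, 1.0), where every case is predicted positive.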

Accuracy

PREDICT returns a value indicating the accuracy, or predictive confidence, of the prediction. The accuracy is the improvement gained over a naive prediction. For a categorical target, a naive prediction is the most common class; for a numerical target, it is the mean. For example, if a categorical target can have the values small, medium, or large, and small occurs more often than medium or large, a naive model returns small for all cases. Predictive analytics uses the accuracy of a naive model as the baseline accuracy.
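A minimal Python sketch of the naive baseline described above: the most common class for a categorical target, the mean for a numerical one. The function names and data are hypothetical, and treating any all-numeric target as numerical is a simplifying assumption.

```python
from collections import Counter
from statistics import mean

def naive_prediction(targets):
    """Mean for a numerical target, most common class for a categorical one."""
    if all(isinstance(t, (int, float)) for t in targets):
        return mean(targets)
    return Counter(targets).most_common(1)[0][0]

def naive_accuracy(targets):
    """Fraction of cases a naive categorical model gets right."""
    guess = naive_prediction(targets)
    return sum(1 for t in targets if t == guess) / len(targets)

sizes = ["small", "medium", "small", "large", "small"]
print(naive_prediction(sizes))            # small
print(naive_accuracy(sizes))              # 0.6
print(naive_prediction([3.0, 5.0, 7.0]))  # 5.0
```

Any model worth deploying should beat the 0.6 baseline on this target; that improvement is what the accuracy value expresses.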

The accuracy metric returned by PREDICT measures the improvement in maximum average accuracy over a naive model's maximum average accuracy. Maximum average accuracy is the average per-class accuracy achieved at the probability threshold that yields a higher average accuracy than any other possible threshold. SVM is described in Chapter 18.
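A minimal Python sketch of maximum average accuracy for a binary target, assuming a case is predicted positive when its probability is at least the threshold and that candidate thresholds are the distinct predicted probabilities. The function names and data are hypothetical. The sketch reports the model's figure and the naive figure side by side; how PREDICT combines them into the single value it returns is not specified in this section, so no formula is assumed.

```python
from collections import Counter

def average_per_class_accuracy(labels, probs, threshold):
    """Mean of the accuracy on positive cases and on negative cases."""
    preds = [1 if p >= threshold else 0 for p in probs]
    on_pos = [p for y, p in zip(labels, preds) if y == 1]
    on_neg = [p for y, p in zip(labels, preds) if y == 0]
    pos_acc = sum(on_pos) / len(on_pos)                 # true positive rate
    neg_acc = sum(1 - p for p in on_neg) / len(on_neg)  # true negative rate
    return (pos_acc + neg_acc) / 2

def max_average_accuracy(labels, probs):
    """Try each distinct probability as a threshold and keep the best."""
    candidates = ((t, average_per_class_accuracy(labels, probs, t))
                  for t in sorted(set(probs)))
    return max(candidates, key=lambda pair: pair[1])

def naive_average_accuracy(labels):
    """A naive model predicts the majority class for every case, so its
    per-class accuracy is 1 for that class and 0 for every other class."""
    majority = Counter(labels).most_common(1)[0][0]
    classes = set(labels)
    return sum(1.0 if c == majority else 0.0 for c in classes) / len(classes)

# Hypothetical actual classes and model probabilities.
labels = [1, 1, 0, 1, 1, 0, 0]
probs = [0.95, 0.8, 0.7, 0.6, 0.45, 0.3, 0.1]
best_t, best_acc = max_average_accuracy(labels, probs)
baseline = naive_average_accuracy(labels)
print(best_t, round(best_acc, 4), baseline)  # 0.45 0.8333 0.5
```

Averaging per-class accuracy, rather than counting all correct cases, keeps a model from looking good simply by favoring the majority class.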

PROFILE

PROFILE creates a Decision Tree model to identify the characteristics of the attributes that predict a common target. For example, if the data has a categorical target with values small, medium, or large, PROFILE would describe how certain attributes typically predict each size.

See Also: Chapter 2 for an overview of model functions and algorithms

3-8 Oracle Data Mining Concepts

The Decision Tree algorithm creates rules that describe the decisions that affect the prediction. The rules, expressed in XML as if-then-else statements, are returned in the model details. PROFILE returns XML that is derived from the model details generated by the algorithm.
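The following minimal Python sketch shows how a decision tree turns into rules: each path from the root to a leaf becomes one IF ... THEN rule. The nested-dictionary tree, the function name `tree_to_rules`, and the attributes are hypothetical; PROFILE returns XML whose schema is not shown here, and this sketch emits plain strings instead.

```python
def tree_to_rules(node, target, conditions=()):
    """Flatten a binary decision tree into one IF ... THEN rule per leaf."""
    if "predict" in node:  # leaf: emit the path that reaches it
        cond = " AND ".join(conditions) or "TRUE"
        return [f"IF {cond} THEN {target} = {node['predict']}"]
    attr, split = node["attr"], node["split"]
    rules = tree_to_rules(node["left"], target,
                          conditions + (f"{attr} <= {split}",))
    rules += tree_to_rules(node["right"], target,
                           conditions + (f"{attr} > {split}",))
    return rules

# Hypothetical tree predicting a size of small, medium, or large.
tree = {
    "attr": "age", "split": 30,
    "left": {"predict": "small"},
    "right": {
        "attr": "income", "split": 50000,
        "left": {"predict": "medium"},
        "right": {"predict": "large"},
    },
}
for rule in tree_to_rules(tree, "size"):
    print(rule)
# IF age <= 30 THEN size = small
# IF age > 30 AND income <= 50000 THEN size = medium
# IF age > 30 AND income > 50000 THEN size = large
```

Read together, the rules describe which attribute values typically lead to each size, which is the kind of profile the text describes.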

Part II

Mining Functions

In Part II, you will learn about the mining functions supported by Oracle Data Mining. Mining functions represent a class of mining problems that can be solved using data mining algorithms. When creating a data mining model, you must first specify the mining function, then choose an appropriate algorithm to implement the function if one is not provided by default. Oracle Data Mining algorithms are described in Part III.

Part II contains the following chapters:

■ Chapter 4, "Regression"
■ Chapter 5, "Classification"
■ Chapter 6, "Anomaly Detection"
■ Chapter 7, "Clustering"
■ Chapter 8, "Association"
■ Chapter 9, "Feature Selection and Extraction"

Note on Terminology: The term mining function has no relationship to a SQL language function.

Oracle Data Mining supports a family of SQL language functions that serve as operators for the deployment of mining models. See "Scoring and Deployment" in Oracle Data Mining Application Developer's Guide.

4

Regression

This chapter describes regression, the supervised mining function for predicting a continuous, numerical target.

This chapter includes the following topics:

■ About Regression
■ A Sample Regression Problem
■ Testing a Regression Model
■ Regression Algorithms

About Regression

Regression is a data mining function that predicts a number. Profit, sales, mortgage rates, house values, square footage, temperature, or distance could all be predicted using regression techniques. For example, a regression model could be used to predict the value of a house based on location, number of rooms, lot size, and other factors. A regression task begins with a data set in which the target values are known. For example, a regression model that predicts house values could be developed based on observed data for many houses over a period of time. In addition to the value, the data might track the age of the house, square footage, number of rooms, taxes, school district, proximity to shopping centers, and so on. House value would be the target, the other attributes would be the predictors, and the data for each house would constitute a case.

In the model build (training) process, a regression algorithm estimates the value of the target as a function of the predictors for each case in the build data. These relationships between predictors and target are summarized in a model, which can then be applied to a different data set in which the target values are unknown.
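As a minimal Python sketch of building a model on cases with known targets and applying it to a case with an unknown target, the example below fits a straight line by ordinary least squares with a single predictor. Both the technique and the data are assumptions chosen for simplicity: this is not necessarily the algorithm Oracle Data Mining uses, and the square-footage and value figures are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates of intercept a and slope b in y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical build data: square footage (predictor) -> house value (target).
sqft = [1000, 1500, 2000, 2500]
value = [200_000, 270_000, 340_000, 410_000]
a, b = fit_line(sqft, value)
print(a, b)          # 60000.0 140.0

# Apply the model to a new house whose value is unknown.
print(a + b * 1800)  # 312000.0
```

A real house-value model would use many predictors (age, rooms, taxes, and so on), but the build-then-apply pattern is the same.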

Regression models are tested by computing various statistics that measure the difference between the predicted values and the expected values. The historical data for a regression project is typically divided into two data sets: one for building the model, the other for testing the model. See "Testing a Regression Model" on page 4-5. Regression modeling has many applications in trend analysis, business planning, marketing, financial forecasting, time series prediction, biomedical and drug response modeling, and environmental modeling.
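A minimal Python sketch of the testing workflow: divide historical cases into build and test sets, then compare predicted and actual target values on the test set. Root mean squared error (RMSE) and mean absolute error (MAE) are used here as two commonly used error statistics; the 30% test fraction, the function names, and the sample values are assumptions for illustration.

```python
import random
from math import sqrt

def split(cases, test_fraction=0.3, seed=0):
    """Randomly divide historical cases into build and test sets."""
    shuffled = list(cases)
    random.Random(seed).shuffle(shuffled)
    k = int(len(shuffled) * test_fraction)
    return shuffled[k:], shuffled[:k]

def rmse(actual, predicted):
    """Root mean squared error: penalizes large errors heavily."""
    return sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                / len(actual))

def mae(actual, predicted):
    """Mean absolute error: average size of the errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

build, test = split(range(10))
print(len(build), len(test))  # 7 3

# Hypothetical actual vs. predicted targets for three test cases.
actual = [310, 250, 400]
predicted = [300, 260, 380]
print(round(rmse(actual, predicted), 3))  # 14.142
print(round(mae(actual, predicted), 3))   # 13.333
```

Because RMSE squares each error, it grows faster than MAE when a few predictions are far off, so reporting both helps show whether errors are uniform or dominated by outliers.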
