Overview of the Current Advertising Industry


Statistical data analysis faces limitations in dealing with data that carry high levels of uncertainty or non-monotonic relationships among the variables. Pawlak provided a rule-based approach using rough sets theory to address these problems. The original idea behind his rough sets theory concerned “… vagueness inherent to the representation of a decision situation. Vagueness may be caused by granularity of the representation. Due to the granularity, the facts describing a situation are either expressed precisely by means of ‘granule’ of the representation or only approximately.”[3] The vagueness and imprecision problems are present in the information that describes most real-world applications.

Information System

In rough sets, an information system is a representation of data that describes a set of objects. An information system $S$ is a 4-tuple

$$S = \langle U, Q, V, f \rangle$$

where $U$ is the closed universe of $N$ objects $\{x_1, x_2, \ldots, x_N\}$, a non-empty finite set; $Q$ is a non-empty finite set of $n$ attributes $\{q_1, q_2, \ldots, q_n\}$ (that uniquely characterize the objects); $V = \bigcup_{q \in Q} V_q$, where $V_q$ is the set of values of the attribute $q$; and $f : U \times Q \to V$ is the total function, called the information function, such that $f(x, q) \in V_q$ for every $q \in Q$ and $x \in U$.[4] Pawlak (2000) gave a concrete instance in the form of a universe of six stores with four attributes (sales force empowerment {no, medium, or high}, merchandise quality {good or average}, presence of high traffic {yes, no}, and categorization of profit {profit, loss}).[5] The six stores are the universe $U$, the first three attributes are $Q$, their possible values $V$, and the profit category $f$.

Any pair $(q, v)$ with $q \in Q$ and $v \in V_q$ is called a descriptor in an information system $S$. The information system can be represented as a finite data table in which the columns represent the attributes, the rows represent the objects, and the cells represent the attribute values $f(x, q)$. Thus, each row in the table describes the information about one object in $S$.
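The table view of an information system can be sketched in a few lines of Python. This is a minimal illustration only: the attribute names, the object labels x1 to x6, and every cell value below are assumptions made up for the sketch, not data taken from Pawlak's store example.

```python
# Hypothetical six-store table. All values are illustrative assumptions.
ATTRIBUTES = ("empowerment", "quality", "traffic", "profit")

STORES = {
    "x1": ("high", "good", "no", "profit"),
    "x2": ("medium", "good", "no", "loss"),
    "x3": ("medium", "good", "no", "profit"),
    "x4": ("no", "average", "no", "loss"),
    "x5": ("medium", "average", "yes", "loss"),
    "x6": ("high", "average", "yes", "profit"),
}


def f(x, q):
    """Information function f(x, q): the value of attribute q for object x."""
    return STORES[x][ATTRIBUTES.index(q)]


def domain(q):
    """V_q: the set of values that attribute q takes in the table."""
    return {f(x, q) for x in STORES}


def descriptors(x):
    """All (q, v) descriptors that describe object x, one per attribute."""
    return [(q, f(x, q)) for q in ATTRIBUTES]


print(f("x1", "quality"))
print(sorted(domain("empowerment")))
print(descriptors("x4"))
```

Each row of `STORES` plays the role of one object, and `f` is simply a table lookup, which is why the information function is total: every object has a value for every attribute.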

[3] Z. Pawlak, R. Słowinski (1994). Decision analysis using rough sets, International Transactions of Operational Research 1:1, 107–114.
[4] Pawlak (1991), op. cit.

Indiscernibility

Let $S = \langle U, Q, V, f \rangle$ be an information system, $A \subseteq Q$ a subset of attributes, and $x, y \in U$ objects. Then $x$ and $y$ are indiscernible by the attribute set $A$ in $S$ if and only if $f(x, a) = f(y, a)$ for every $a \in A$. Every subset of attributes $A \subseteq Q$ thus determines an equivalence relation on the universe $U$, denoted $IND(A)$ and called the indiscernibility relation. It can be defined as

$$IND(A) = \{(x, y) \in U \times U : f(x, a) = f(y, a) \text{ for all } a \in A\}$$

If the pair of objects $(x, y)$ belongs to the relation $IND(A)$, then objects $x$ and $y$ are called indiscernible with respect to the attribute set $A$. In other words, we cannot distinguish object $x$ from $y$ based on the information contained in the attribute set $A$.
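The indiscernibility relation can be illustrated by grouping objects that share the same values on the chosen attributes. The sketch below is a minimal example under assumed data: the five objects and their attribute values are hypothetical, and the function names are chosen for illustration only.

```python
from collections import defaultdict

# Hypothetical information table: object -> {attribute: value}.
TABLE = {
    "x1": {"quality": "good", "traffic": "no"},
    "x2": {"quality": "good", "traffic": "no"},
    "x3": {"quality": "average", "traffic": "no"},
    "x4": {"quality": "average", "traffic": "yes"},
    "x5": {"quality": "average", "traffic": "yes"},
}


def indiscernible(table, attrs, x, y):
    """True if x and y agree on every attribute in attrs, i.e. (x, y) is in IND(attrs)."""
    return all(table[x][a] == table[y][a] for a in attrs)


def ind_classes(table, attrs):
    """Partition the objects into the equivalence classes of IND(attrs)."""
    groups = defaultdict(set)
    for obj, row in table.items():
        key = tuple(row[a] for a in attrs)  # objects with equal keys are indiscernible
        groups[key].add(obj)
    return list(groups.values())


print(ind_classes(TABLE, ["quality"]))
print(ind_classes(TABLE, ["quality", "traffic"]))
print(indiscernible(TABLE, ["quality"], "x3", "x4"))
```

Adding an attribute can only split classes, never merge them: x3 and x4 are indiscernible on `quality` alone but become distinguishable once `traffic` is included.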

Decision Table

In rough sets, a decision table (or decision system) is a special case of an information system in which the attribute set $Q$ is divided into two disjoint sets, the conditional attribute set $C$ and the decision attribute set $D$, such that $C \cup D = Q$ and $C \cap D = \emptyset$. For instance, in a classification problem, the set $C$ would represent the characterization of the pattern (the independent variables), such as the financial indicators of a company, and the set $D$ would represent the classification decision (bankruptcy or not). A decision table can be denoted as

$$DT = \langle U, C \cup D, V, f \rangle$$

where $C$ is a set of condition attributes (a non-empty set of discrete-valued independent variables); $D$ is a set of decision attributes (a non-empty set of discrete-valued decision variables); $V = \bigcup_{q \in C \cup D} V_q$, where $V_q$ is the set of discrete values of the attribute $q \in Q$; and $f : U \times (C \cup D) \to V$ is a total decision function such that $f(x, q) \in V_q$ for every $q \in Q$ and $x \in U$. A decision table can also be represented as $(U, C \cup D)$ or DTC, where the set $C$ denotes the conditional attribute set and the set $D$ the decision attribute set.

A decision table is called deterministic if each object’s decision attribute values can be uniquely specified by some combination of the conditional attribute values. On the other hand, a decision table is called non-deterministic if several decision attribute values may occur for a given combination of conditional attribute values. Some non-deterministic decision tables may be decomposed into two sub-tables, one deterministic and one totally non-deterministic, where a totally non-deterministic decision table does not contain any deterministic sub-table.
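A determinism check can be sketched by collecting, for each combination of conditional values, the set of decisions observed with it. The example below is a minimal illustration under assumptions: the four-row table is hypothetical, and the row-wise `split` rule (a row is deterministic when its condition values occur with exactly one decision) is one straightforward reading of the decomposition described above, not necessarily the authors' procedure.

```python
from collections import defaultdict

# Hypothetical decision table: conditions C = (quality, traffic), decision D = (profit,).
TABLE = {
    "x1": {"quality": "good", "traffic": "no", "profit": "profit"},
    "x2": {"quality": "good", "traffic": "no", "profit": "loss"},
    "x3": {"quality": "average", "traffic": "no", "profit": "loss"},
    "x4": {"quality": "average", "traffic": "yes", "profit": "profit"},
}
C = ("quality", "traffic")
D = ("profit",)


def decisions_by_condition(table, cond, dec):
    """Map each combination of condition values to the set of decisions seen with it."""
    seen = defaultdict(set)
    for row in table.values():
        seen[tuple(row[c] for c in cond)].add(tuple(row[d] for d in dec))
    return seen


def is_deterministic(table, cond, dec):
    """True if every combination of condition values leads to exactly one decision."""
    return all(len(v) == 1 for v in decisions_by_condition(table, cond, dec).values())


def split(table, cond, dec):
    """Split the rows into a deterministic part and a non-deterministic part."""
    seen = decisions_by_condition(table, cond, dec)
    det, nondet = {}, {}
    for obj, row in table.items():
        key = tuple(row[c] for c in cond)
        (det if len(seen[key]) == 1 else nondet)[obj] = row
    return det, nondet


det, nondet = split(TABLE, C, D)
print(is_deterministic(TABLE, C, D))
print(sorted(det), sorted(nondet))
```

In this hypothetical table, x1 and x2 share the same condition values but differ in the decision, so they form the non-deterministic part, while x3 and x4 form a deterministic sub-table.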

Approximation

Some objects cannot be completely distinguished from the rest of the objects (in line with the decision variable values) in terms of the available conditional attributes. Their designation can only be roughly (approximately) defined, which leads to the idea of rough-set-based approximation. The fundamental idea underlying rough sets is the approximation of a set of objects by a pair of sets, called the lower and upper approximation sets.

Let $S = \langle U, Q, V, f \rangle$ be an information system. A given subset of attributes $A \subseteq Q$ determines the approximation space $AS = (U, IND(A))$ in $S$. For a given $A \subseteq Q$ and $X \subseteq U$, the $A$-lower approximation $\underline{A}X$ of set $X$ in $AS$ and the $A$-upper approximation $\overline{A}X$ of $X$ in $AS$ are defined as follows:

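$$\underline{A}X = \{x \in U : [x]_A \subseteq X\} = \bigcup \{Y \in A^* : Y \subseteq X\}$$

$$\overline{A}X = \{x \in U : [x]_A \cap X \neq \emptyset\} = \bigcup \{Y \in A^* : Y \cap X \neq \emptyset\}$$

Here $[x]_A$ is the elementary set (the equivalence class of $IND(A)$) containing $x$, and $A^*$ is the family of all such elementary sets.

The lower approximation $\underline{A}X$ of set $X$ is the union of all those elementary sets that are contained in $X$. For any $x \in \underline{A}X$, it is certain that $x$ belongs to $X$. In other words, the lower approximation $\underline{A}X$ of set $X$ contains all objects that, based on the information content of the attributes $A$, can be classified as belonging to the concept $X$ with complete certainty.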

The upper approximation $\overline{A}X$ of set $X$ is the union of those elementary sets that have a non-empty intersection with $X$. For any $x \in \overline{A}X$, we can only say that $x$ can “possibly” belong to $X$. In other words, the upper approximation $\overline{A}X$ of set $X$ contains all objects that, based on the information contained in $A$, cannot be classified as not belonging to the concept $X$.
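The two approximations can be computed directly from the elementary sets. The following sketch assumes a hypothetical six-object table with two condition attributes and a hypothetical concept X; the names and values are illustrative only.

```python
from collections import defaultdict

# Hypothetical table: object -> (quality, traffic). Values are assumptions.
TABLE = {
    "x1": ("good", "no"),
    "x2": ("good", "no"),
    "x3": ("good", "yes"),
    "x4": ("average", "no"),
    "x5": ("average", "yes"),
    "x6": ("average", "yes"),
}
X = {"x1", "x3", "x5", "x6"}  # hypothetical concept, e.g. the profitable stores


def elementary_sets(table):
    """Equivalence classes of the indiscernibility relation over all listed attributes."""
    groups = defaultdict(set)
    for obj, values in table.items():
        groups[values].add(obj)
    return list(groups.values())


def lower_approximation(table, concept):
    """Union of the elementary sets that are entirely contained in the concept."""
    result = set()
    for block in elementary_sets(table):
        if block <= concept:
            result |= block
    return result


def upper_approximation(table, concept):
    """Union of the elementary sets that share at least one object with the concept."""
    result = set()
    for block in elementary_sets(table):
        if block & concept:
            result |= block
    return result


print(sorted(lower_approximation(TABLE, X)))  # ['x3', 'x5', 'x6']
print(sorted(upper_approximation(TABLE, X)))  # ['x1', 'x2', 'x3', 'x5', 'x6']
```

Here x1 belongs to X but is indiscernible from x2, which does not, so the elementary set {x1, x2} is excluded from the lower approximation and included in the upper one.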

The $A$-boundary region of a set $X \subseteq U$ in $AS$ (the doubtful region of $IND(A)$) is defined as follows:

$$BN_A(X) = \overline{A}X - \underline{A}X$$

For any $x \in U$ belonging to $BN_A(X)$, it is impossible to determine whether $x$ belongs to $X$ based on the descriptions of the elementary sets of $IND(A)$.

The $A$-lower approximation of set $X$ is the greatest definable set in $A$ contained in $X$ (the region of certainty), and the $A$-upper approximation of set $X$ is the smallest definable set in $A$ containing $X$ (the region of possibility). The $A$-boundary is the doubtful region in $A$ of set $X$.
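The boundary region follows from the two approximations by set difference. The sketch below takes the partition into elementary sets as given; the partition and the concept X are hypothetical, and the helper `is_definable` rests on the assumption that a set counts as definable exactly when its boundary region is empty.

```python
def approximations(partition, concept):
    """Return (lower, upper) approximations of concept for a partition of U."""
    lower, upper = set(), set()
    for block in partition:
        if block <= concept:   # block lies entirely inside the concept
            lower |= block
        if block & concept:    # block overlaps the concept
            upper |= block
    return lower, upper


def boundary(partition, concept):
    """Boundary region: upper approximation minus lower approximation."""
    lower, upper = approximations(partition, concept)
    return upper - lower


def is_definable(partition, concept):
    """Assumed criterion: a set is definable when its boundary region is empty."""
    return not boundary(partition, concept)


# Hypothetical elementary sets of IND(A) and a hypothetical concept X.
PARTITION = [{"x1", "x2"}, {"x3"}, {"x4"}, {"x5", "x6"}]
X = {"x1", "x3", "x5", "x6"}

print(sorted(boundary(PARTITION, X)))        # ['x1', 'x2']
print(is_definable(PARTITION, X))            # False
print(is_definable(PARTITION, {"x3", "x4"}))  # True
```

The objects x1 and x2 land in the boundary because they are indiscernible from each other while only one of them belongs to X, so the available attributes cannot settle their membership.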

In document Aplicaciones de la inteligencia artificial en la industria publicitaria (pages 50-57)