Economic Sector Analysis - Market Study

We now introduce the algorithm for deriving the set of contents to upload, called the Greedy Content Modification algorithm (GCM for short). GCM has two phases. In the first phase, GCM searches for a feasible set of contents covering all the features in a task. In the second phase, GCM revises the contents one by one until the privacy constraint is satisfied.

For the first phase, GCM follows the typical heuristic algorithms designed for the set cover problem. It iteratively selects the content whose features cover the maximum number of uncovered features in a task, until all the features are included in the selected contents. Initially, GCM assigns to each task a vector Si = {si1, si2, ..., siK0}, which indicates whether each feature in the task is covered. All the entries of Si are set to 0 at the beginning.

In each iteration, GCM evaluates the following quantity for each non-determined content Cij:

\[ \mathit{NCC}_{ij} = \frac{\left|\{ f_{ijk} \mid \exists f_l = f_{ijk},\ t_{il} = 1,\ s_{il} = 0 \}\right|}{B_{ij}}, \tag{4.9} \]

where j ∈ {1, 2, ..., Ki}, fijk is the kth feature in content Cij, k ∈ {1, 2, ..., Nij}, and | · | denotes the number of elements in a set. GCM then selects the content with the maximum NCCij, sets Iij = 1, and updates Si by changing the entries of the newly covered features to 1. GCM moves to the next phase when all the required features in Ti are covered and the relative entropy is still above the threshold. In other cases, GCM terminates, and either uploads the selected content or sends a notice to the participants.
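The first phase can be illustrated with a minimal Python sketch of the greedy selection. It assumes that Bij acts as a per-content cost and that features are plain strings; the function name `greedy_cover` and the sample data are hypothetical, and the relative-entropy check that decides whether to enter the second phase is left out.

```python
from typing import Dict, List, Optional, Set


def greedy_cover(required: Set[str],
                 contents: Dict[str, Set[str]],
                 cost: Dict[str, float]) -> List[str]:
    """Greedily pick contents by newly covered required features per unit cost."""
    uncovered = set(required)        # features whose entry in Si is still 0
    selected: List[str] = []         # contents whose indicator Iij is set to 1
    while uncovered:
        best: Optional[str] = None
        best_score = 0.0
        for name, feats in contents.items():
            if name in selected:
                continue
            # Score in the spirit of (4.9); cost[name] stands in for Bij (assumption).
            score = len(feats & uncovered) / cost[name]
            if score > best_score:
                best, best_score = name, score
        if best is None:
            # No remaining content covers any uncovered feature.
            raise ValueError("required features cannot be covered")
        selected.append(best)
        uncovered -= contents[best]
    return selected


# Hypothetical example data.
contents = {"c1": {"f1", "f2", "f3"}, "c2": {"f1"}, "c3": {"f3", "f4"}, "c4": {"f4"}}
cost = {"c1": 3.0, "c2": 0.5, "c3": 2.0, "c4": 1.0}
print(greedy_cover({"f1", "f2", "f3", "f4"}, contents, cost))
```

In this sketch ties are broken by iteration order, which the text does not specify.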

For the second phase, GCM updates the results of the previous phase and looks for a feasible solution that satisfies the privacy constraint. The main idea in this phase is to use the minimum cost of extra contents to decrease the relative entropy below the threshold.

Assume the uploading decision for participant Wi is I = {I1, I2, ..., IKi}. In each iteration, GCM first partitions the set of physical statuses L = {L1, L2, ..., LN} into two categories: lower case and upper case. The lower case refers to the statuses that would lead to a decrease of the relative entropy if GCM uploaded one more content belonging to that status, i.e.,

\[ D(P_I \,\|\, P_{GI}) > D(P_{I_j} \,\|\, P_{GI}), \tag{4.10} \]

where Ij is the same as I except that one more content with physical status Lj is uploaded. Meanwhile, the upper case accordingly refers to the statuses that would lead to an increase of the relative entropy.
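The partition into lower and upper case can be sketched as follows. This is only an illustration under strong assumptions: P_I is modelled as the empirical distribution of physical statuses over the uploaded contents, P_GI as a given reference distribution, and a small smoothing constant `eps` avoids log(0). The helper names and the sample statuses are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def status_distribution(uploaded: List[str], statuses: List[str],
                        eps: float = 1e-9) -> Dict[str, float]:
    """Smoothed empirical distribution of statuses among uploaded contents."""
    counts = Counter(uploaded)
    total = len(uploaded) + eps * len(statuses)
    return {s: (counts[s] + eps) / total for s in statuses}


def kl(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Relative entropy D(p || q) with natural logarithm."""
    return sum(p[s] * math.log(p[s] / q[s]) for s in p)


def partition(uploaded: List[str],
              reference: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split statuses by whether one more upload lowers D, following (4.10)."""
    statuses = list(reference)
    base = kl(status_distribution(uploaded, statuses), reference)
    lower, upper = [], []
    for s in statuses:
        trial = kl(status_distribution(uploaded + [s], statuses), reference)
        (lower if trial < base else upper).append(s)
    return lower, upper


# Hypothetical data: the uploads over-represent "walk" relative to the reference.
reference = {"walk": 0.5, "run": 0.3, "sit": 0.2}
lower, upper = partition(["walk", "walk", "walk", "run"], reference)
print(lower, upper)
```

Statuses that leave the relative entropy unchanged fall into the upper case here, since (4.10) uses a strict inequality.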

As a first step, GCM checks each uploaded content in the upper case. It searches the lower case for contents that include all the features covered by an upper-case content selected in the previous phase. When there are multiple candidate contents in the lower case, GCM selects the one leading to the largest change in the relative entropy. More specifically, the features to be matched are only the ones that were newly covered in the first phase when the upper-case content to be replaced was selected. When a match is found, GCM swaps the uploading indicators of the two contents. Then GCM updates the relative entropy and the statuses in the upper and lower cases. This procedure stops when the relative entropy falls below the threshold or no replacement exists.
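A minimal sketch of the replacement search for a single upper-case content is given below. It assumes the relative entropy that would result from each swap has already been computed and is passed in as `entropy_after`, and it reads "largest change" as the lowest resulting relative entropy. All names and values are hypothetical.

```python
from typing import Dict, Optional, Set


def find_replacement(needed: Set[str],
                     lower_candidates: Dict[str, Set[str]],
                     entropy_after: Dict[str, float]) -> Optional[str]:
    """Return the lower-case content that covers `needed` and lowers D the most."""
    # A candidate qualifies only if it includes every feature that the
    # upper-case content newly covered in the first phase.
    feasible = [name for name, feats in lower_candidates.items() if needed <= feats]
    if not feasible:
        return None                  # no replacement exists for this content
    # Lowest resulting relative entropy corresponds to the largest decrease.
    return min(feasible, key=lambda name: entropy_after[name])


# Hypothetical data: "y" would lower D the most but does not cover f2.
lower_candidates = {"x": {"f1", "f2", "f3"}, "y": {"f1"}, "z": {"f1", "f2"}}
entropy_after = {"x": 0.4, "y": 0.1, "z": 0.2}
print(find_replacement({"f1", "f2"}, lower_candidates, entropy_after))
```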

As a second step, GCM selects contents from the lower case to further decrease the relative entropy. It ranks each non-determined content according to its scale of change on the relative entropy, i.e., the difference between the relative entropies when Cij is uploaded or not, denoted as Div(Cij):

\[ \mathit{Div}(C_{ij}) = \frac{D(P_I \,\|\, P_{GI}) - D(P_{I_j} \,\|\, P_{GI})}{B_{ij}}. \tag{4.11} \]

When multiple contents have the same Div(Cij), GCM ranks them according to the number of features they cover among those of the currently uploaded contents in the upper case.
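The ranking with its tie-breaking rule can be sketched as a two-key sort. The `Candidate` record and its field names are hypothetical; the relative entropies are assumed to be precomputed, `cost` stands in for Bij, and `upper_overlap` is the number of features shared with the uploaded upper-case contents.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    name: str
    d_before: float      # D(P_I || P_GI) before uploading the content
    d_after: float       # D(P_Ij || P_GI) after uploading the content
    cost: float          # stands in for Bij (assumption)
    upper_overlap: int   # tie-breaker: features shared with upper-case uploads

    @property
    def div(self) -> float:
        """Cost-normalized decrease of relative entropy, as in (4.11)."""
        return (self.d_before - self.d_after) / self.cost


def rank(candidates: List[Candidate]) -> List[Candidate]:
    """Sort by Div descending, then by upper-case overlap descending."""
    return sorted(candidates, key=lambda c: (-c.div, -c.upper_overlap))


# Hypothetical data: "a" and "b" tie on Div, so the overlap decides.
cands = [
    Candidate("a", 0.5, 0.25, 1.0, 1),
    Candidate("b", 0.5, 0.0, 2.0, 3),
    Candidate("c", 0.5, 0.375, 0.25, 0),
]
print([c.name for c in rank(cands)])
```

Exact equality of floating-point Div values is rare in practice, so an implementation might compare within a tolerance instead.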

As a third step, GCM selects the first content Cir1 in the ranking list and sets Iir1 = 1. After the selection, GCM checks whether any content in the upper case with Iij = 1 is covered by the previously determined contents. If so, GCM first removes the covered content from the uploading set. All the previously determined contents covering this content are marked as exclusive contents.

When the relative entropy is still above the threshold, GCM continues by ranking and selecting the next non-determined content. In this scenario, GCM updates the relative entropy, Div(·), and the uploading set I, and goes back to the second step for the next iteration.

Otherwise, when the threshold is reached, GCM starts to trace back. It checks the

means GCM always selects the content which leads to the minimum increase in the relative entropy. If the content is marked as exclusive, GCM moves on to the next one. Otherwise, GCM removes the content from the uploading set. The tracing back ends when removing any content would violate the privacy constraint, and GCM stops.
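The trace-back can be sketched as a loop that repeatedly removes one removable content. This is a sketch under assumptions, not the exact procedure: the order in which contents are examined is not fully recoverable from the text, so the code simply tries every non-exclusive content and removes the one whose removal yields the lowest relative entropy that still stays below the threshold. The `entropy` callable and the sample values are hypothetical, and feature coverage is assumed to be protected by the exclusive marks.

```python
from typing import Callable, List, Set


def trace_back(uploaded: List[str],
               exclusive: Set[str],
               entropy: Callable[[List[str]], float],
               threshold: float) -> List[str]:
    """Drop non-exclusive contents while the privacy constraint still holds."""
    uploaded = list(uploaded)
    while True:
        removable = []
        for c in uploaded:
            if c in exclusive:
                continue             # exclusive contents are never removed
            rest = [x for x in uploaded if x != c]
            d = entropy(rest)
            if d < threshold:        # constraint still satisfied without c
                removable.append((d, c))
        if not removable:
            return uploaded          # any further removal would violate it
        _, victim = min(removable)   # minimum increase in relative entropy
        uploaded.remove(victim)


# Hypothetical toy model: each content lowers the relative entropy by a fixed gain.
gain = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
toy_entropy = lambda contents: 1.0 - sum(gain[c] for c in contents)
print(trace_back(["a", "b", "c", "d"], {"a"}, toy_entropy, 0.3))
```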

Finally, GCM uploads all the contents with Iij = 1 in the uploading decision vector and terminates.

4.4 Performance Analysis

In this section, we first analyze the time complexity of GCM. Then we prove GCM can

always find a feasible uploading scheme for our problem when such a solution exists. Finally,

we analyze the extra cost caused by the existence of privacy issues.
