We now introduce the algorithm for deriving the set of uploading contents, which is
called the Greedy Content Modification algorithm (GCM for short). GCM has two
phases. In the first phase, GCM searches for a feasible set of contents covering all the
features in a task. In the second phase, GCM revises the contents one by one until the privacy constraint is satisfied.
For the first phase, GCM follows the category of typical heuristic algorithms designed for
the set cover problem. It iteratively selects the contents whose features cover the maximum
number of uncovered features in a task, until all the features are included in the selected
contents. Initially, GCM assigns to each task a vector Si = {si1, si2, · · · , siK0}, which indicates whether each feature in the task is covered. All the entries of Si are set to 0 at the beginning.
In each iteration, GCM evaluates the following information for each non-determined content
Cij:
\[
\mathit{NCC}_{ij} = \frac{\left|\{\, f_{ijk} \mid \exists f_l = f_{ijk},\ t_{il} = 1,\ s_{il} = 0 \,\}\right|}{B_{ij}}, \tag{4.9}
\]
where j ∈ {1,2,· · · , Ki}, fijk is the kth feature in content Cij, k ∈ {1,2,· · · , Nij}, and
| · | refers to the number of elements in the set. Then GCM selects the content with the
maximum NCCij, sets Iij = 1, and updates Si by changing the entries of those newly covered features to 1. GCM moves to the next phase when all the required features in Ti are covered
and the relative entropy is still above the threshold. In other cases, GCM terminates, and
either uploads the selected content or sends notice to the participants.
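The first phase can be sketched as a cost-weighted greedy set cover. The following is a minimal illustration, not the authors' implementation: the `Content` class, the `greedy_cover` function, and the sample data are hypothetical names introduced here, and the costs Bij are assumed to be strictly positive. The relative-entropy check that decides whether to enter the second phase is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Content:
    """Hypothetical container for one content C_ij."""
    name: str
    features: frozenset  # features f_ijk contained in the content
    cost: float          # B_ij, assumed strictly positive


def greedy_cover(required, contents):
    """Greedily pick contents by NCC_ij (Eq. 4.9).

    Returns (selected, covered_all), where covered_all tells whether
    every required feature of the task ended up covered.
    """
    uncovered = set(required)   # features whose entry in S_i is still 0
    remaining = list(contents)  # non-determined contents
    selected = []               # contents with I_ij = 1
    while uncovered and remaining:
        # NCC_ij: number of newly covered required features per unit cost.
        best = max(remaining,
                   key=lambda c: len(c.features & uncovered) / c.cost)
        if not (best.features & uncovered):
            break  # no remaining content covers anything new
        selected.append(best)
        remaining.remove(best)
        uncovered -= best.features  # set the matching entries of S_i to 1
    return selected, not uncovered


if __name__ == "__main__":
    pool = [
        Content("a", frozenset({1, 2, 3}), 3.0),
        Content("b", frozenset({1, 2}), 1.0),
        Content("c", frozenset({3, 4}), 1.0),
        Content("d", frozenset({4}), 2.0),
    ]
    chosen, ok = greedy_cover({1, 2, 3, 4}, pool)
    print([c.name for c in chosen], ok)
```

When several contents share the maximum ratio, this sketch keeps the first one in list order; the text does not specify a tie-breaking rule for this phase.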
For the second phase, GCM updates the results in the previous phase, and looks for a
feasible solution that follows the privacy constraint. The main idea in this phase is to use
the minimum cost of extra contents to decrease the relative entropy below the threshold.
Assume the uploading decision for participant Wi is I = {I1, I2,· · · , IKi}. In each iteration, GCM first partitions the set of physical status L = {L1, L2,· · · , LN} into two
categories: lower case and upper case. The lower case refers to the status that would lead
to the decrease of relative entropy, if GCM uploads one content belonging to that status,
i.e.,
\[
D(P_I \,\|\, P_{GI}) > D(P_{I_j} \,\|\, P_{GI}), \tag{4.10}
\]
where Ij is the same as I except for uploading one more content with physical status Lj.
Meanwhile, the upper case accordingly refers to the status that would lead to an increase of the relative entropy.
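The partition into lower and upper cases can be illustrated with a short sketch. It rests on assumptions that this section does not state: PI is taken to be the empirical distribution of physical statuses over the currently uploaded contents, PGI is taken to be a given reference distribution with nonzero probability for every status, and the uploaded set is non-empty. The function names are hypothetical.

```python
import math
from collections import Counter


def relative_entropy(p, q):
    """D(p || q) = sum_x p(x) * log(p(x) / q(x)).

    Terms with p(x) = 0 contribute nothing; q(x) is assumed positive
    wherever p(x) > 0.
    """
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)


def status_distribution(statuses):
    """Empirical distribution of statuses (assumed meaning of P_I)."""
    counts = Counter(statuses)
    total = sum(counts.values())
    return {s: n / total for s, n in counts.items()}


def partition_statuses(uploaded, reference):
    """Split statuses into lower and upper cases following Eq. (4.10).

    A status is in the lower case when uploading one more content with
    that status strictly decreases the relative entropy. Statuses that
    leave it unchanged are placed in the upper case here, which is a
    choice made for this sketch only.
    """
    uploaded = list(uploaded)
    base = relative_entropy(status_distribution(uploaded), reference)
    lower, upper = [], []
    for status in reference:
        trial = relative_entropy(
            status_distribution(uploaded + [status]), reference)
        (lower if trial < base else upper).append(status)
    return lower, upper


if __name__ == "__main__":
    ref = {"home": 0.5, "work": 0.5}
    print(partition_statuses(["home", "home", "home"], ref))
```

In the toy run, every uploaded content has status "home", so one more "home" content leaves the distribution unchanged, while a "work" content moves it toward the reference and lowers the relative entropy.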
As a first step, GCM checks for each uploaded content in the upper case. It searches
in the lower case for the contents that include all the features covered by an upper-case
content selected in the previous phase. When there are multiple candidate contents in the
lower case, GCM selects the one leading to the largest change in the relative entropy. More
specifically, the features only refer to the ones newly covered in the first phase when the
replaced upper-case content is selected. GCM changes the uploading indicators for the two
contents when a matching is discovered. Then GCM updates the relative entropy and the
status in the upper and lower cases. This procedure stops when the relative entropy falls
below the threshold or no replacement exists.
As a second step, GCM selects the contents from the lower case to further decrease the
relative entropy. It ranks each non-determined content according to its scale of change
on the relative entropy, i.e., the difference between relative entropies when Cij is uploaded
or not, denoted as Div(Cij):
\[
\mathrm{Div}(C_{ij}) = \frac{D(P_I \,\|\, P_{GI}) - D(P_{I_j} \,\|\, P_{GI})}{B_{ij}}. \tag{4.11}
\]
When multiple contents have the same Div(Cij), GCM ranks them according to the number
of features they cover for currently uploading contents in the upper case.
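The ranking rule of the second step can be sketched as a sort on a two-part key. This is only an illustration under stated assumptions: the `Candidate` tuple and its fields are hypothetical, the relative entropy after uploading each candidate is assumed to be precomputed, and a larger Div(Cij) is assumed to rank first because the goal of this phase is to lower the relative entropy at minimum cost.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Hypothetical record for one non-determined lower-case content."""
    name: str
    entropy_after: float   # D(P_Ij || P_GI) if this content is uploaded
    cost: float            # B_ij, assumed strictly positive
    upper_covered: int     # features covered for uploaded upper-case contents


def rank_by_div(candidates, base_entropy):
    """Order candidates by Div(C_ij) from Eq. (4.11), largest first.

    Ties on Div are broken by the number of features the candidate
    covers for currently uploaded upper-case contents.
    """
    def key(c):
        div = (base_entropy - c.entropy_after) / c.cost
        return (div, c.upper_covered)

    return sorted(candidates, key=key, reverse=True)


if __name__ == "__main__":
    ranked = rank_by_div(
        [
            Candidate("a", 0.5, 1.0, 1),
            Candidate("b", 0.0, 2.0, 3),
            Candidate("c", 0.75, 1.0, 0),
        ],
        base_entropy=1.0,
    )
    print([c.name for c in ranked])
```

Exact ties between floating-point Div values are rare in practice, so an implementation would probably compare them within a small tolerance before applying the tie-breaker; the sketch omits that step.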
As a third step, GCM selects the first content Cir1 in the ranking list and sets Iir1 = 1. After the selection, GCM checks whether any content in the upper case with Iij = 1 is covered
by the previously determined contents. If so, GCM first removes the covered content from
the uploading set. All the previously determined contents covering this content are marked as
exclusive contents.
When the relative entropy is still above the threshold, GCM continues ranking and
selecting the next non-determined content. In this scenario, GCM updates the relative entropy,
Div(·), and the uploading set I, and goes back to the second step for the next iteration.
Otherwise, when the threshold is reached, GCM starts to trace back. It checks the
means GCM always selects the content which leads to the minimum increase on the relative
entropy. If the content is marked as exclusive, GCM searches for the next one. Otherwise,
GCM removes the content from the uploading set. The tracing back ends when removing
any content violates the privacy constraint, and GCM stops.
Finally, GCM uploads all the contents with Iij = 1 in the uploading decision vector and
terminates.
4.4 Performance Analysis
In this section, we first analyze the time complexity of GCM. Then we prove GCM can
always find a feasible uploading scheme for our problem when such a solution exists. Finally,
we analyze the extra cost caused by the existence of privacy issues.