$i = 1, \ldots, x$, then select a random number $r$ between zero and one and choose the event $e_i$ such that

$$\sum_{j=1}^{i-1} \Pr[e_j] < r \le \sum_{j=1}^{i} \Pr[e_j]$$

(in the above expression $\sum_{j=1}^{0} \Pr[e_j] = 0$). To compute the probabilities $\Pr[e_i]$, a method must be available to list all the $e_i$'s.
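The selection rule above is the usual inverse-transform (roulette-wheel) sampling. A minimal Python sketch follows, assuming the probabilities $\Pr[e_1], \ldots, \Pr[e_x]$ are already available as a list that sums to 1; the function name choose_event is hypothetical. Drawing $r$ from $(0, 1]$ makes the boundary cases match the strict and non-strict inequalities of the rule.

```python
import bisect
import itertools
import random


def choose_event(probs, rng=random):
    """Pick event e_i (returned as the 1-based index i) with probability probs[i-1].

    probs: [Pr[e_1], ..., Pr[e_x]], assumed non-negative and summing to 1.
    rng:   any object with a random() method returning a float in [0, 1).
    """
    # cumulative[i-1] = Pr[e_1] + ... + Pr[e_i]
    cumulative = list(itertools.accumulate(probs))
    # 1 - random() lies in (0, 1], matching sum_{j<i} < r <= sum_{j<=i}
    r = 1.0 - rng.random()
    # Index of the first prefix sum that is >= r
    i = bisect.bisect_left(cumulative, r)
    # Guard against rounding that leaves the last prefix sum slightly below 1
    return min(i, len(probs) - 1) + 1
```

Binary search over the prefix sums costs $O(\log x)$ per draw once the sums are built; the expensive part, as the text notes, is obtaining the list of all $\Pr[e_i]$ in the first place.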
In the first part of this section we describe a well-known bijection between the conjugacy classes of the permutation group of order $n$ and the set of so-called integer partitions of $n$. By Theorem 19 below and Theorem 16, for each conjugacy class $C$, all the information required to compute $\Pr[C]$ can be extracted from the integer partition corresponding to the cycle type of $C$. Therefore the selection of a conjugacy class in Dixon and Wilf's algorithm can be done by listing the corresponding integer partitions and using them to compute the required probabilities.

As a side result, the bijection mentioned above, together with some classical results on the number of integer partitions of a number, implies that the number of conjugacy classes of a permutation group is superpolynomial in $n$. Hence explicitly listing all of them is not an effective way to compute the required probabilities. However, in Section 2.2 we showed that if conjugacy classes are listed in w.a.l. order, the first few are much more probable than all the others. In the final part of the section we describe an NC parallel algorithm for listing a polynomial number of integer partitions in the corresponding partial order.
2.4.1 Definitions and Relationship with Conjugacy Classes
For $n \in \mathbb{N}^+$, any sequence $\lambda = (a_i)_{i=1,\ldots,h}$ with $1 \le a_1 \le a_2 \le \cdots \le a_h \le n$ and $n = \sum_{i=1}^{h} a_i$ is called an integer partition of $n$. Following Theorem 19 below, we will occasionally drop the distinction between $\lambda$ and the cycle type of the conjugacy class associated with it, and represent $\lambda$ by the cycle type notation $[k_1, \ldots, k_n]$ (compare with Section 1.2.6), where $n = \sum_{i=1}^{n} i k_i$ and $k_i$ gives the number of parts of size $i$. A more compact multiplicity notation is obtained by taking only the pairs $(k_i, i)$ such that $k_i \neq 0$. Figure 2.3 lists the integer partitions of 8 in the three representations.
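The three representations are straightforward to convert between. The Python sketch below is illustrative only; the function names are hypothetical. It maps the standard notation to the cycle type $[k_1, \ldots, k_n]$, the cycle type to the multiplicity pairs $(k_i, i)$, and the pairs back to the standard notation.

```python
from collections import Counter


def to_cycle_type(parts, n):
    """Standard notation a_1 <= ... <= a_h  ->  cycle type [k_1, ..., k_n]."""
    counts = Counter(parts)
    # k_i is the number of parts equal to i (Counter returns 0 for absent keys)
    return [counts[i] for i in range(1, n + 1)]


def to_multiplicity(cycle_type):
    """Cycle type [k_1, ..., k_n]  ->  list of pairs (k_i, i) with k_i != 0."""
    return [(k, i) for i, k in enumerate(cycle_type, start=1) if k != 0]


def from_multiplicity(pairs):
    """Pairs (k_i, i)  ->  standard notation as a non-decreasing list of parts."""
    parts = []
    for k, i in sorted(pairs, key=lambda pair: pair[1]):
        parts.extend([i] * k)
    return parts


# Example: the partition 1 1 1 2 3 of n = 8
cycle_type = to_cycle_type([1, 1, 1, 2, 3], 8)
pairs = to_multiplicity(cycle_type)
# The cycle type always satisfies n = sum_i i * k_i
assert sum(i * k for i, k in enumerate(cycle_type, start=1)) == 8
```

Here the cycle type is `[3, 1, 1, 0, 0, 0, 0, 0]` and the multiplicity notation is `[(3, 1), (1, 2), (1, 3)]`, matching the corresponding row of Figure 2.3.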
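Lemma 4 The number of non-zero pairs in the multiplicity notation of any integer partition of $n$ is $O(\sqrt{n})$.

Proof. Let $(k_1, i_1)(k_2, i_2)\cdots(k_h, i_h)$ be an integer partition of $n$ with $h$ non-zero pairs, where the part sizes are ordered so that $i_1 < i_2 < \cdots < i_h$. Since the $i_j$ are distinct positive integers, $i_j \ge j$; since every $k_j \ge 1$,

$$n = \sum_{j=1}^{h} i_j k_j \ge \sum_{j=1}^{h} i_j \ge \sum_{j=1}^{h} j = \frac{h(h+1)}{2}.$$

This implies $h = O(\sqrt{n})$; in particular, $h \le \sqrt{2n}$ for every positive integer $n$. □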
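Lemma 4 can also be checked by brute force for small $n$. The Python sketch below (names hypothetical) enumerates all partitions of $n$, counts the distinct part sizes, which are exactly the non-zero pairs of the multiplicity notation, and confirms both inequalities used in the proof for $n \le 30$.

```python
import math


def partitions(n, min_part=1):
    """Yield the integer partitions of n as non-decreasing lists of parts."""
    if n == 0:
        yield []
        return
    for first in range(min_part, n + 1):
        # Remaining parts must be at least as large as `first`
        for rest in partitions(n - first, first):
            yield [first] + rest


def max_nonzero_pairs(n):
    """Largest number of non-zero multiplicity pairs over all partitions of n."""
    return max(len(set(parts)) for parts in partitions(n))


for n in range(1, 31):
    h = max_nonzero_pairs(n)
    # The two inequalities established in the proof of Lemma 4
    assert h * (h + 1) // 2 <= n
    assert h <= math.sqrt(2 * n)
```

The maximum is in fact the largest $h$ with $h(h+1)/2 \le n$: take the parts $1, 2, \ldots, h$ and add the remainder to the largest one.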
Standard           Cycle type           Multiplicity
1 1 1 1 1 1 1 1    [8,0,0,0,0,0,0,0]    (8,1)
1 1 1 1 1 1 2      [6,1,0,0,0,0,0,0]    (6,1)(1,2)
1 1 1 1 1 3        [5,0,1,0,0,0,0,0]    (5,1)(1,3)
1 1 1 1 2 2        [4,2,0,0,0,0,0,0]    (4,1)(2,2)
1 1 1 1 4          [4,0,0,1,0,0,0,0]    (4,1)(1,4)
1 1 1 2 3          [3,1,1,0,0,0,0,0]    (3,1)(1,2)(1,3)
1 1 1 5            [3,0,0,0,1,0,0,0]    (3,1)(1,5)
1 1 2 2 2          [2,3,0,0,0,0,0,0]    (2,1)(3,2)
1 1 2 4            [2,1,0,1,0,0,0,0]    (2,1)(1,2)(1,4)
1 1 3 3            [2,0,2,0,0,0,0,0]    (2,1)(2,3)
1 1 6              [2,0,0,0,0,1,0,0]    (2,1)(1,6)
1 2 2 3            [1,2,1,0,0,0,0,0]    (1,1)(2,2)(1,3)
1 2 5              [1,1,0,0,1,0,0,0]    (1,1)(1,2)(1,5)
1 3 4              [1,0,1,1,0,0,0,0]    (1,1)(1,3)(1,4)
1 7                [1,0,0,0,0,0,1,0]    (1,1)(1,7)
2 2 2 2            [0,4,0,0,0,0,0,0]    (4,2)
2 2 4              [0,2,0,1,0,0,0,0]    (2,2)(1,4)
2 3 3              [0,1,2,0,0,0,0,0]    (1,2)(2,3)
2 6                [0,1,0,0,0,1,0,0]    (1,2)(1,6)
3 5                [0,0,1,0,1,0,0,0]    (1,3)(1,5)
4 4                [0,0,0,2,0,0,0,0]    (2,4)
8                  [0,0,0,0,0,0,0,1]    (1,8)
Figure 2.3: Integer partitions of 8 in different representations.
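Figure 2.3 can be regenerated mechanically. The following Python sketch (names hypothetical) enumerates the partitions of $n$ in lexicographic order of their standard notation, which is the order used in the figure, and prints each one in the three representations.

```python
def partitions(n, min_part=1):
    """Yield partitions of n as non-decreasing lists, in lexicographic order."""
    if n == 0:
        yield []
        return
    for first in range(min_part, n + 1):
        # All remaining parts are >= first, so each list stays non-decreasing
        for rest in partitions(n - first, first):
            yield [first] + rest


def representations(n):
    """Return (standard, cycle type, multiplicity) for every partition of n."""
    rows = []
    for parts in partitions(n):
        cycle_type = [parts.count(i) for i in range(1, n + 1)]
        multiplicity = [(k, i) for i, k in enumerate(cycle_type, start=1) if k]
        rows.append((parts, cycle_type, multiplicity))
    return rows


def print_table(n):
    """Print the partitions of n in the layout of Figure 2.3."""
    for parts, cycle_type, multiplicity in representations(n):
        standard = " ".join(map(str, parts))
        ctype = "[" + ",".join(map(str, cycle_type)) + "]"
        mult = "".join(f"({k},{i})" for k, i in multiplicity)
        print(f"{standard:<19}{ctype:<21}{mult}")


if __name__ == "__main__":
    print_table(8)
```

For $n = 8$ this produces 22 rows, in agreement with $p(8) = 22$.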
The next result shows a relationship between integer partitions and conjugacy classes in permutation groups.
Theorem 19 The number of different conjugacy classes in the permutation group $S_n$ is exactly the number $p(n)$ of integer partitions of $n$.
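Proof. The result is proved by showing that every conjugacy class in $S_n$ defines an integer partition of $n$, that distinct classes define distinct partitions, and that for every integer partition of $n$ there exists a conjugacy class in $S_n$.
By Theorem 13 all permutations in a given conjugacy class have the same cycle type. Since permutations are bijections, every $i \in [n]$ belongs to exactly one cycle. Hence the multiset of cycle lengths of a permutation forms an integer partition of $n$. Moreover, two permutations with the same cycle type are conjugate (a bijection mapping each cycle of one onto a cycle of the same length of the other conjugates the first into the second), so distinct classes yield distinct partitions.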
Conversely, if $\lambda = [k_1, \ldots, k_n]$ is an integer partition of $n$, then the permutation $g$ described by

$$\underbrace{(k_1+1\;\; k_1+2)\cdots(k_1+2k_2-1\;\; k_1+2k_2)}_{k_2 \text{ cycles of length } 2}\;\underbrace{(k_1+2k_2+1\;\; k_1+2k_2+2\;\; k_1+2k_2+3)\cdots}_{k_3 \text{ cycles of length } 3}\cdots$$

has $k_1$ fixed elements, $k_2$ cycles of length 2, and so on. □
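The construction of $g$ in the proof is easy to make concrete. The Python sketch below (function names hypothetical, permutations stored as dictionaries on $\{1, \ldots, n\}$) builds $g$ from a cycle type exactly as in the proof: it fixes $1, \ldots, k_1$ and then lays out consecutive cycles of increasing length. It also recovers the cycle type of a permutation, so the two directions of the bijection can be checked against each other.

```python
def permutation_from_cycle_type(cycle_type):
    """Build the permutation g of Theorem 19 from [k_1, ..., k_n].

    Returns a dict mapping each point to its image: points 1..k_1 are fixed,
    then come k_2 consecutive 2-cycles, then k_3 consecutive 3-cycles, etc.
    """
    g = {}
    next_point = 1
    for length, count in enumerate(cycle_type, start=1):
        for _ in range(count):
            cycle = list(range(next_point, next_point + length))
            for pos, point in enumerate(cycle):
                # Each point maps to the next one in its cycle, wrapping around
                g[point] = cycle[(pos + 1) % length]
            next_point += length
    return g


def cycle_type_of(g, n):
    """Return [k_1, ..., k_n] for a permutation g given as a dict on {1, ..., n}."""
    k = [0] * n
    seen = set()
    for start in range(1, n + 1):
        if start in seen:
            continue
        # Follow the cycle through `start` and measure its length
        length, point = 0, start
        while point not in seen:
            seen.add(point)
            point = g[point]
            length += 1
        k[length - 1] += 1
    return k
```

For instance, the cycle type `[3, 1, 1, 0, 0, 0, 0, 0]` gives the permutation with fixed points 1, 2, 3 and cycles $(4\;5)(6\;7\;8)$.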
Both the counting and the uniform generation problems for integer partitions can be solved fast in parallel. The following result states a well-known combinatorial identity for the number $p(n)$ and the existence of an NC algorithm for computing $p(n)$.
Theorem 20 [SS96] For every $n \in \mathbb{N}^+$, the numbers $p(i)$ for $i \in \{0, \ldots, n\}$ can be computed in $O(\log^2 n)$ time using $n^2$ processors on an EREW PRAM.
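Proof. It is well known (see for example [Rio58, p. 111]) that the numbers $p(i)$ for $i \in \{0, \ldots, n\}$ occur as the coefficients of $x^i$ in the polynomial

$$\prod_{j=1}^{n} \sum_{i=0}^{\lfloor n/j \rfloor} x^{ij}.$$

Let $Q_j(x) = \sum_{i=0}^{\lfloor n/j \rfloor} x^{ij}$ and let $Q[j]$ be its internal representation. For example, if $n = 10$ then $Q[3] = (1,0,0,1,0,0,1,0,0,1,0)$, corresponding to $1 + x^3 + x^6 + x^9$. The product of two polynomials can be implemented by a function poly_prod in $O(\log n)$ time using $n$ processors on an EREW PRAM, using a parallel algorithm to perform a Fast Fourier Transform (see [Wil94] for a nice and gentle description of the algorithmic ideas and [Lei92] for details about the parallel algorithm); coefficients of degree greater than $n$ can be discarded after each product, since they do not affect the coefficients of degree at most $n$. Finally, the $n$ polynomials can be multiplied using the function tree(Q, n, poly_prod) as defined in Section 1.2.2: the tree has $\lceil \log_2 n \rceil$ levels, each consisting of at most $n/2$ independent products, so the whole computation takes $O(\log^2 n)$ time with $n^2$ processors. The result follows. □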
Example. For $n = 10$,

$$\prod_{j=1}^{10} \sum_{i=0}^{\lfloor 10/j \rfloor} x^{ij} = (1+x+x^2+x^3+x^4+x^5+x^6+x^7+x^8+x^9+x^{10})(1+x^2+x^4+x^6+x^8+x^{10})(1+x^3+x^6+x^9)(1+x^4+x^8)(1+x^5+x^{10})(1+x^6)(1+x^7)(1+x^8)(1+x^9)(1+x^{10}),$$

which eventually gives the polynomial

$$1 + x + 2x^2 + 3x^3 + 5x^4 + 7x^5 + 11x^6 + 15x^7 + 22x^8 + 30x^9 + 42x^{10} + \cdots$$
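The proof of Theorem 20 can be simulated sequentially to reproduce this example. In the Python sketch below, each level of tree performs independent products that a PRAM would run in parallel; here they simply run one after another. The names poly_prod and tree follow the text, but their signatures are assumptions: the tree function of Section 1.2.2 is not reproduced here, and this version passes $n$ through so that every product is truncated at degree $n$. The FFT is a textbook recursive radix-2 transform in floating point, so rounding back to integers is only trustworthy for small $n$; large $n$ would call for exact arithmetic, for example a number-theoretic transform.

```python
import cmath


def fft(a, invert=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def poly_prod(a, b, n):
    """Product of integer polynomials a and b via FFT, truncated to degree n."""
    size = 1
    while size < len(a) + len(b) - 1:
        size *= 2
    fa = fft(list(a) + [0] * (size - len(a)))
    fb = fft(list(b) + [0] * (size - len(b)))
    prod = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # The unnormalised inverse transform is scaled by `size`; undo it and round
    return [round(c.real / size) for c in prod[: n + 1]]


def tree(polys, n, op):
    """Combine polys pairwise, level by level, as a balanced binary tree of op calls."""
    level = list(polys)
    while len(level) > 1:
        # All calls on one level are independent (parallel on a PRAM)
        nxt = [op(level[i], level[i + 1], n) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


def partition_numbers(n):
    """Return [p(0), ..., p(n)] as the coefficients of prod_{j=1}^{n} Q_j(x), n >= 1."""
    # Q[j-1] is the coefficient vector of Q_j: ones at degrees 0, j, 2j, ... <= n
    Q = [[1 if i % j == 0 else 0 for i in range(n + 1)] for j in range(1, n + 1)]
    return tree(Q, n, poly_prod)
```

With $n = 10$, the vector for $j = 3$ is exactly $Q[3] = (1,0,0,1,0,0,1,0,0,1,0)$ from the proof, and partition_numbers(10) returns the coefficients $1, 1, 2, 3, 5, 7, 11, 15, 22, 30, 42$ of the example.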
It is worth noting that there is a simpler way to compute the $p(i)$'s. It is easy to design an algorithm that multiplies two polynomials in $O(\log n)$ parallel steps using $n^2$ processors on an EREW PRAM. Such an algorithm can be used in the proof of Theorem 20 instead of the one based on the Fast Fourier Transform. This results in an $O(\log^2 n)$ time, $O(n^3)$ processor algorithm for computing all the $p(i)$'s. Since the value of $n$ for which $p(1), \ldots, p(n)$ are computed is logarithmic in the overall input order, these complexity bounds do not represent any significant penalty.

We conclude this section by stating without proof a well-known asymptotic formula for $p(n)$. It will be used in analysing the algorithms presented in the next subsection.
Theorem 21 [And76, Th. 6.3] $p(n) \sim \dfrac{e^{\pi\sqrt{2n/3}}}{4n\sqrt{3}}$.
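To see how quickly the asymptotic formula becomes accurate, the following Python sketch compares it with exact values of $p(n)$. The exact values come from the standard sequential dynamic program over part sizes, not from the parallel algorithm of Theorem 20, and the function names are hypothetical. Because Theorem 21 is only a leading-order statement, the ratio approaches 1 slowly.

```python
import math


def partition_numbers(n):
    """Exact [p(0), ..., p(n)] via the standard dynamic program over part sizes."""
    p = [1] + [0] * n
    for part in range(1, n + 1):
        # Allow parts of size `part`: p[total] += partitions of total - part
        for total in range(part, n + 1):
            p[total] += p[total - part]
    return p


def hardy_ramanujan(n):
    """Leading-order approximation exp(pi * sqrt(2n/3)) / (4n * sqrt(3))."""
    return math.exp(math.pi * math.sqrt(2 * n / 3)) / (4 * n * math.sqrt(3))


exact = partition_numbers(1000)
for n in (10, 100, 1000):
    # Ratio of the exact value to the asymptotic estimate
    print(n, exact[n], exact[n] / hardy_ramanujan(n))
```

The superpolynomial growth claimed at the beginning of the section is visible here: the exponent grows like $\sqrt{n}$, which eventually dominates any fixed power of $n$.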