1.3. Membrane separation processes
1.3.6. Phenomena involved in mass transfer
In this subsection, we study uniqueness of the matrix factorization problem (2.12) (modulo permutation of columns/rows). First note that, in view of the affine independence of the columns of $T$, the factorization is unique iff $T$ is, which holds iff
$$\operatorname{aff}(D) \cap \{0,1\}^m = \operatorname{aff}(T) \cap \{0,1\}^m = \{T_{:,1}, \dots, T_{:,r}\}, \qquad (2.19)$$
i.e. if the affine subspace generated by $\{T_{:,1}, \dots, T_{:,r}\}$ contains no vertices of $[0,1]^m$ other than the $r$ given ones (cf. Figure 2.2). Uniqueness is of great importance in applications where one aims at an interpretation in which the columns of $T$ play the role of underlying data-generating elements. Such an interpretation is not valid if (2.19) fails, since it is then possible to replace one of the columns of a specific choice of $T$ by another vertex contained in the same affine subspace.
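For small $m$, condition (2.19) can be checked directly from its definition: enumerate all $2^m$ vertices $b$ of $[0,1]^m$ and test whether $(b; 1)$ lies in the column span of the lifted matrix $[T; 1_r^\top]$, which is equivalent to $b \in \operatorname{aff}(T)$. The sketch below is a brute-force illustration only (exponential in $m$, and not Algorithm 2.3); it assumes $T$ is a 0/1 matrix given as a list of rows, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product


def rank(rows):
    """Exact rank of a rational matrix given as a list of rows (forward elimination)."""
    mat = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for col in range(len(mat[0]) if mat else 0):
        # Look for a non-zero pivot in this column at or below row rk.
        piv = next((i for i in range(rk, len(mat)) if mat[i][col] != 0), None)
        if piv is None:
            continue
        mat[rk], mat[piv] = mat[piv], mat[rk]
        # Eliminate all entries below the pivot.
        for i in range(rk + 1, len(mat)):
            f = mat[i][col] / mat[rk][col]
            mat[i] = [a - f * b for a, b in zip(mat[i], mat[rk])]
        rk += 1
    return rk


def vertices_in_aff(T):
    """All b in {0,1}^m lying in aff(T); T is a 0/1 matrix given as m rows of length r."""
    # b is in aff(T) iff (b; 1) lies in the column span of the lifted matrix [T; 1_r^T].
    lifted = [list(row) for row in T] + [[1] * len(T[0])]
    base = rank(lifted)
    found = []
    for b in product((0, 1), repeat=len(T)):  # all 2^m vertices, brute force
        augmented = [row + [v] for row, v in zip(lifted, list(b) + [1])]
        if rank(augmented) == base:
            found.append(b)
    return found


def satisfies_2_19(T):
    """Condition (2.19): the only vertices in aff(T) are the columns of T."""
    columns = {tuple(row[j] for row in T) for j in range(len(T[0]))}
    return set(vertices_in_aff(T)) == columns


# Columns e_1, e_2 of R^3: (2.19) holds.
print(satisfies_2_19([[1, 0], [0, 1], [0, 0]]))                 # True
# Columns e_1, e_2, 0 of R^3: aff(T) is the plane x_3 = 0, containing four vertices.
print(len(vertices_in_aff([[1, 0, 0], [0, 1, 0], [0, 0, 0]])))  # 4
```

The second matrix already shows how (2.19) can fail: the vertex $(1,1,0)$ is the affine combination $e_1 + e_2 - 0$ of the three columns.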
Solution of a non-negative variant of our factorization. In the sequel, we argue that property (2.19) plays an important role from a computational point of view when solving extensions of problem (2.12) in which further constraints are imposed on $A$. One particularly important extension is the following:
$$\text{find } T \in \{0,1\}^{m\times r} \text{ and } A \in \mathbb{R}_+^{r\times n},\ A^\top 1_r = 1_n, \text{ such that } D = TA. \qquad (2.20)$$
Problem (2.20) is a special instance of NMF. The additional non-negativity constraints are of particular interest here because they arise in the real-world application that motivated this work; see §2.4.8 below. It is natural to ask whether Algorithm 2.3 can be adapted to solve problem (2.20). A change is obviously required in the second step, when selecting $r$ vertices from $\mathcal{T}$, since in (2.20) the columns of $D$ now have to be expressed as convex instead of merely affine combinations of columns of $T$: picking an affinely independent collection from $\mathcal{T}$ does not take into account the non-negativity constraint imposed on $A$. If, however, (2.19) holds, we have $|\mathcal{T}| = r$, and Algorithm 2.3 must return a solution of (2.20) provided that one exists.
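The difference between (2.12) and (2.20) lies entirely in the constraints on $A$, which are easy to verify for a candidate pair $(T, A)$. The sketch below (hypothetical helper name, exact rational arithmetic) checks that $T$ is binary, that $A$ is entrywise non-negative with columns summing to one ($A^\top 1_r = 1_n$), and that $D = TA$. It only verifies a given factorization; it does not search for one.

```python
from fractions import Fraction


def is_feasible_2_20(D, T, A):
    """Check whether (T, A) satisfies the constraints of (2.20); matrices are lists of rows."""
    m, r, n = len(T), len(A), len(A[0])
    if len(D) != m or any(len(row) != r for row in T) or any(len(row) != n for row in D):
        return False
    # T in {0,1}^{m x r}
    if any(v not in (0, 1) for row in T for v in row):
        return False
    # A in R_+^{r x n}
    if any(Fraction(v) < 0 for row in A for v in row):
        return False
    # A^T 1_r = 1_n: each column of A holds convex weights.
    if any(sum(Fraction(A[k][j]) for k in range(r)) != 1 for j in range(n)):
        return False
    # D = T A, compared exactly entry by entry.
    return all(
        Fraction(D[i][j]) == sum(T[i][k] * Fraction(A[k][j]) for k in range(r))
        for i in range(m)
        for j in range(n)
    )


half = Fraction(1, 2)
T = [[1, 0], [0, 1], [1, 1]]
A = [[half, 1], [half, 0]]                # convex weights
D = [[half, 1], [half, 0], [1, 1]]        # D = T A
print(is_feasible_2_20(D, T, A))          # True
# Weights that are affine (columns sum to one) but not convex: rejected by (2.20).
A_aff = [[2, 1], [-1, 0]]
D_aff = [[2, 1], [-1, 0], [1, 1]]
print(is_feasible_2_20(D_aff, T, A_aff))  # False
```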
Corollary 2.5. If problem (2.12) has a unique solution, i.e. if condition (2.19) holds, and if there exists a solution of (2.20), then it is returned by Algorithm 2.3.
Corollary 2.5 follows immediately from Proposition 2.4. Note that analogous results hold for arbitrary (not necessarily convex) constraints imposed on $A$; in this sense, the statement can be seen as trivial. Nevertheless, to appreciate the result, consider the opposite case $|\mathcal{T}| > r$. Since the aim is a minimal factorization, one has to find a subset of $\mathcal{T}$ of cardinality $r$ such that (2.20) can be solved. In principle, this can be achieved by considering all $\binom{|\mathcal{T}|}{r}$ subsets of $\mathcal{T}$, but this is in general not computationally feasible: the upper bound of Proposition 2.4 indicates that $|\mathcal{T}|$ can be as large as $2^{r-1}$ in the worst case. In the following example, $\mathcal{T}$ consists of all $2^{r-1}$ vertices contained in an $(r-1)$-dimensional face of $[0,1]^m$:
$$T = \begin{bmatrix} 0_{(m-r)\times(r-1)} & 0_{m-r} \\ I_{r-1} & 0_{r-1} \\ 0_{r-1}^\top & 0 \end{bmatrix}, \qquad \mathcal{T} = \Big\{ T\lambda : \lambda_1 \in \{0,1\}, \dots, \lambda_{r-1} \in \{0,1\},\ \lambda_r = 1 - \sum_{k=1}^{r-1} \lambda_k \Big\}. \qquad (2.21)$$
Remark. In Appendix B, we show that in general even the NMF problem (2.20) may not have a unique solution (note that failure of (2.19) only implies non-uniqueness of problem (2.12)). While it is well known that NMF need not have a unique solution [48], it is rather remarkable that uniqueness may fail even with binary constraints on one factor, which is a strong additional restriction.
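The vertex set of example (2.21) can be reproduced directly from its parametrization: build the $m \times r$ matrix $T$, evaluate $T\lambda$ for every $\lambda$ with $\lambda_1, \dots, \lambda_{r-1} \in \{0,1\}$ and $\lambda_r = 1 - \sum_k \lambda_k$, and collect the distinct results. The sketch also evaluates how many $r$-subsets an exhaustive search over these vertices would have to examine. The function names are hypothetical.

```python
from itertools import product
from math import comb


def example_2_21(m, r):
    """The m x r matrix of (2.21): m - r zero rows, then [I_{r-1} 0_{r-1}], then a zero row."""
    if not 2 <= r <= m:
        raise ValueError("need 2 <= r <= m")
    T = [[0] * r for _ in range(m - r)]
    T += [[1 if j == i else 0 for j in range(r)] for i in range(r - 1)]
    T.append([0] * r)
    return T


def vertex_set_2_21(T):
    """Evaluate {T lambda : lambda_1..lambda_{r-1} in {0,1}, lambda_r = 1 - sum_k lambda_k}."""
    r = len(T[0])
    vertices = set()
    for bits in product((0, 1), repeat=r - 1):
        lam = list(bits) + [1 - sum(bits)]  # affine weights; lam[-1] may be negative
        vertices.add(tuple(sum(t * w for t, w in zip(row, lam)) for row in T))
    return vertices


T = example_2_21(m=6, r=4)
V = vertex_set_2_21(T)
print(len(V))            # 8 = 2**(r - 1) distinct vertices of [0,1]^6
print(comb(len(V), 4))   # 70 candidate r-subsets for an exhaustive search
```

Since the last column of $T$ is zero, every such $T\lambda$ is binary even when $\lambda_r$ is negative; the subset count $\binom{2^{r-1}}{r}$ grows very quickly with $r$.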
Uniqueness under separability. In view of the negative example (2.21), one might ask whether uniqueness in the sense of (2.19) can at least be achieved under additional conditions on $T$. Below we prove uniqueness under separability (Definition 2.3).
Proposition 2.6. If $T$ is separable, condition (2.19) holds, and thus problem (2.12) has a unique solution.
Proof. A vertex $b \in \{0,1\}^m$ lies in $\operatorname{aff}(T)$ iff there exists $\lambda \in \mathbb{R}^r$ with $\lambda^\top 1_r = 1$ such that $T\lambda = b$. Since $T$ is separable, there exists a permutation matrix $\Pi$ such that $\Pi T = [I_r; M]$ with $M \in \{0,1\}^{(m-r)\times r}$. As a result,
$$T\lambda = b \iff \Pi T\lambda = \Pi b \iff [I_r; M]\lambda = \Pi b.$$
The top $r$ block of this linear system reads $\lambda = (\Pi b)_{1:r}$; since $\Pi b \in \{0,1\}^m$, it can only be fulfilled if $\lambda \in \{0,1\}^r$. The condition $\lambda^\top 1_r = 1$ then implies that $\lambda$ must be one of the $r$ canonical basis vectors of $\mathbb{R}^r$, i.e. $b = T\lambda$ is a column of $T$. Conversely, every column $T_{:,k} = Te_k$ lies in $\operatorname{aff}(T) \cap \{0,1\}^m$. We conclude that $\operatorname{aff}(T) \cap \{0,1\}^m = \{T_{:,1}, \dots, T_{:,r}\}$.
Uniqueness under generic random sampling. Both the negative example (2.21) and the positive result of Proposition 2.6 are associated with special matrices $T$. This raises the question whether uniqueness holds or fails for broader classes of binary matrices. To gain insight into this question, we consider random $T$ with i.i.d. entries from a Bernoulli distribution with parameter $1/2$ and study the probability of the event $\{\operatorname{aff}(T) \cap \{0,1\}^m = \{T_{:,1}, \dots, T_{:,r}\}\}$. This question has essentially been studied in combinatorics [117], with further improvements in [84]. The results therein rely crucially on Littlewood-Offord theory, a topic we touch upon in the subsequent paragraph.
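Before turning to random matrices, note that for separable $T$ (Proposition 2.6) the event above can be decided without any linear algebra. Following the proof, if rows $i_1, \dots, i_r$ of $T$ equal the canonical row vectors $e_1^\top, \dots, e_r^\top$, then $T\lambda = b$ forces $\lambda_k = b_{i_k}$, so testing $b \in \operatorname{aff}(T)$ costs $O(mr)$ operations. The sketch below is illustrative; the helper names are hypothetical, and the brute-force loop over $\{0,1\}^m$ is only there to confirm the conclusion of Proposition 2.6 on a small example.

```python
from itertools import product


def unit_rows(T):
    """Row indices i_1..i_r with T[i_k] = e_k^T, or None if T is not separable."""
    r = len(T[0])
    idx = []
    for k in range(r):
        e_k = [1 if j == k else 0 for j in range(r)]
        hit = next((row_idx for row_idx, row in enumerate(T) if list(row) == e_k), None)
        if hit is None:
            return None
        idx.append(hit)
    return idx


def in_aff_separable(T, b, idx):
    """Test b in aff(T) for separable T, following the proof of Proposition 2.6."""
    lam = [b[i] for i in idx]  # the top block of [I_r; M] lambda = Pi b fixes lambda
    if sum(lam) != 1:          # affine constraint lambda^T 1_r = 1
        return False
    return all(sum(t * w for t, w in zip(row, lam)) == bi for row, bi in zip(T, b))


def vertices_in_aff_separable(T):
    """All b in {0,1}^m in aff(T) for separable T (enumeration, small m only)."""
    idx = unit_rows(T)
    if idx is None:
        raise ValueError("T is not separable")
    return {b for b in product((0, 1), repeat=len(T)) if in_aff_separable(T, b, idx)}


T = [[1, 0, 0], [0, 1, 1], [0, 1, 0], [0, 0, 1], [1, 1, 0]]
print(sorted(vertices_in_aff_separable(T)))  # the three columns of T
```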
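Theorem 2.7. Let $T$ be a random $m \times r$ matrix whose entries are drawn i.i.d. from $\{0,1\}$, each with probability $1/2$. Then there is a constant $C$ such that if $r \le m - C$,
$$\mathbb{P}\big(\operatorname{aff}(T) \cap \{0,1\}^m = \{T_{:,1}, \dots, T_{:,r}\}\big) \ge 1 - (1 + o(1))\, 4\binom{r}{3}\Big(\frac{3}{4}\Big)^m - \Big(\frac{3}{4} + o(1)\Big)^m$$
as $m \to \infty$.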
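To get a feel for the size of the bound in Theorem 2.7, its leading terms can be evaluated after dropping the $(1 + o(1))$ and $o(1)$ corrections and ignoring the unspecified constant $C$. This is a heuristic reading of an asymptotic statement, not a guaranteed finite-$m$ bound; the function names are hypothetical.

```python
from math import comb


def failure_leading_terms(m, r):
    """4*C(r,3)*(3/4)^m + (3/4)^m: the failure probability of Theorem 2.7 with the
    (1 + o(1)) and o(1) corrections dropped (a heuristic, not a finite-m bound)."""
    return 4 * comb(r, 3) * 0.75 ** m + 0.75 ** m


def smallest_m(r, eps):
    """Smallest m >= r at which the leading-term failure probability is below eps."""
    m = r
    while failure_leading_terms(m, r) >= eps:
        m += 1
    return m


for r in (8, 16, 24):
    print(r, smallest_m(r, 0.01), failure_leading_terms(500, r))
```

Under this heuristic reading, the leading-term failure probability drops below $0.01$ at $m = 35, 43, 48$ for $r = 8, 16, 24$, respectively, and is far below $10^{-50}$ at $m = 500$.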
Our proof of Theorem 2.7 relies on two seminal results on random $\pm 1$ matrices.
Theorem 2.8 [84]. Let $M$ be a random $m \times r$ matrix whose entries are drawn i.i.d. from $\{-1,1\}$, each with probability $1/2$. There is a constant $C$ such that if $r \le m - C$,
$$\mathbb{P}\big(\operatorname{span}(M) \cap \{-1,1\}^m = \{\pm M_{:,1}, \dots, \pm M_{:,r}\}\big) \ge 1 - (1 + o(1))\, 4\binom{r}{3}\Big(\frac{3}{4}\Big)^m \qquad (2.22)$$
as $m \to \infty$.
Theorem 2.9 [148]. Let $M$ be a random $m \times r$ matrix, $r \le m$, whose entries are drawn i.i.d. from $\{-1,1\}$, each with probability $1/2$. Then
$$\mathbb{P}(M \text{ has linearly independent columns}) \ge 1 - \Big(\frac{3}{4} + o(1)\Big)^m. \qquad (2.23)$$
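Proof (Theorem 2.7). Note that $T = \frac{1}{2}(M + 1_{m\times r})$, where $M$ is a random $\pm 1$ matrix as in Theorem 2.8. Let $\lambda \in \mathbb{R}^r$ with $\lambda^\top 1_r = 1$, and let $b \in \{0,1\}^m$. Since $1_{m\times r}\lambda = (\lambda^\top 1_r)\, 1_m = 1_m$, we have
$$T\lambda = b \iff \tfrac{1}{2}(M\lambda + 1_m) = b \iff M\lambda = 2b - 1_m \in \{-1,1\}^m. \qquad (2.24)$$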
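Now note that, with the probability given in (2.22), $\operatorname{span}(M) \cap \{-1,1\}^m = \{\pm M_{:,1}, \dots, \pm M_{:,r}\}$; since $\operatorname{aff}(M) \subseteq \operatorname{span}(M)$, this implies
$$\operatorname{aff}(M) \cap \{-1,1\}^m \subseteq \{\pm M_{:,1}, \dots, \pm M_{:,r}\}.$$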
On the other hand, with the probability given in (2.23), the columns of $M$ are linearly independent. If this is the case,
$$\operatorname{aff}(M) \cap \{-1,1\}^m \subseteq \{\pm M_{:,1}, \dots, \pm M_{:,r}\} \implies \operatorname{aff}(M) \cap \{-1,1\}^m = \{M_{:,1}, \dots, M_{:,r}\}. \qquad (2.25)$$
To verify this, first note the obvious inclusion $\operatorname{aff}(M) \cap \{-1,1\}^m \supseteq \{M_{:,1}, \dots, M_{:,r}\}$. Moreover, suppose for contradiction that there exist $j \in \{1, \dots, r\}$ and $\theta \in \mathbb{R}^r$ with $\theta^\top 1_r = 1$ such that $M\theta = -M_{:,j}$. Writing $e_j$ for the $j$-th canonical basis vector, this would imply $M(\theta + e_j) = 0$ and in turn, by linear independence, $\theta = -e_j$, which contradicts $\theta^\top 1_r = 1$.
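Under the event in (2.25), and again using the linear independence of the columns of $M$, $M\lambda = 2b - 1_m$ is fulfilled iff $\lambda$ is equal to one of the canonical basis vectors and $2b - 1_m$ equals the corresponding column of $M$, i.e. iff $b$ is the corresponding column of $T$. In view of (2.24), both events together imply $\operatorname{aff}(T) \cap \{0,1\}^m = \{T_{:,1}, \dots, T_{:,r}\}$, and a union bound over their failure probabilities in (2.22) and (2.23) yields the assertion.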
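The change of variables behind (2.24) can be sanity-checked with exact rational arithmetic: for any $\pm 1$ matrix $M$ and any weights with $\lambda^\top 1_r = 1$, the matrix $T = \frac{1}{2}(M + 1_{m\times r})$ satisfies $T\lambda = \frac{1}{2}(M\lambda + 1_m)$, so $T\lambda$ is binary exactly when $M\lambda$ has $\pm 1$ entries. The $4 \times 3$ matrix and the weight vectors below are arbitrary illustrative choices, and the helper names are hypothetical.

```python
from fractions import Fraction


def matvec(M, lam):
    """Exact matrix-vector product for a matrix given as a list of rows."""
    return [sum(Fraction(a) * w for a, w in zip(row, lam)) for row in M]


def to_binary(M):
    """T = (M + 1_{m x r}) / 2 maps a +-1 matrix entrywise to a 0/1 matrix."""
    return [[(v + 1) // 2 for v in row] for row in M]


def check_2_24(M, lam):
    """Verify T lambda = (M lambda + 1_m) / 2, which requires lambda^T 1_r = 1."""
    if sum(lam) != 1:
        raise ValueError("lambda must satisfy lambda^T 1_r = 1")
    lhs = matvec(to_binary(M), lam)
    rhs = [(v + 1) / 2 for v in matvec(M, lam)]
    return lhs == rhs


def binary_and_sign(M, lam):
    """Return (T lambda in {0,1}^m, M lambda in {-1,1}^m); by (2.24) the two agree."""
    t_ok = all(v in (0, 1) for v in matvec(to_binary(M), lam))
    m_ok = all(v in (-1, 1) for v in matvec(M, lam))
    return t_ok, m_ok


M = [[1, -1, 1], [-1, -1, 1], [1, 1, -1], [1, -1, -1]]
print(check_2_24(M, [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]))  # True
print(binary_and_sign(M, [1, 0, 0]))   # (True, True): lambda = e_1 picks a column
print(binary_and_sign(M, [2, -1, 0]))  # (False, False)
```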
Theorem 2.7 suggests a positive answer to the question of uniqueness posed above. Asymptotically as $m \to \infty$ and for $r$ small compared to $m$ (in fact, following [84], one may conjecture that Theorem 2.7 holds with $C = 1$), the probability that the affine hull of $r$ vertices of $[0,1]^m$ selected uniformly at random contains some other vertex is exponentially small in the dimension $m$. A first natural question is whether a result similar to Theorem 2.7 holds if the entries of $T$ are drawn from a Bernoulli distribution with parameter $p \in (0,1)$ sufficiently far away from the boundary points. A second question is whether the above statement is already valid for finite but reasonably large values of $m$. We have therefore conducted an experiment whose outcome suggests that the answers to both questions are positive.
For this experiment, we consider the grid $\{0.01, 0.02, \dots, 0.99\}$ for $p$ and generate random binary matrices $T \in \{0,1\}^{m\times r}$ with $m = 500$ and $r \in \{8, 16, 24\}$ whose entries are i.i.d. Bernoulli with parameter $p$. For each value of $p$ and $r$, 100 trials are performed, and for each trial we compute the number of vertices of $[0,1]^m$ contained in $\operatorname{aff}(T)$. In Figure 2.3, we report the maximum number of vertices over these trials. One observes that, except for a small set of values of $p$ very close to 0 or 1, exactly $r$ vertices are returned in all trials. For extreme values of $p$, on the other hand, the number of vertices can be as large as $2^{20}$ in the worst case.
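A scaled-down version of this experiment can be run in pure Python. The sketch below is not necessarily the procedure behind Figure 2.3; it uses a swapped-in exact enumeration. After reducing $T$ to a maximal affinely independent set of $k$ columns, it selects $k$ linearly independent rows of the lifted matrix $[T; 1^\top]$, starting with the row of ones. Every point of $\operatorname{aff}(T)$ is determined by its coordinates in those rows, so at most $2^{k-1}$ candidate vertices need to be tested, consistent with the worst case $2^{r-1}$ mentioned above. The sizes in the demo ($m = 30$, $r = 5$, three values of $p$, 10 trials) are illustrative choices far below $m = 500$, and all helper names are hypothetical.

```python
import random
from fractions import Fraction
from itertools import product


def rank(rows):
    """Exact rank of a rational matrix given as a list of rows (forward elimination)."""
    mat = [[Fraction(x) for x in row] for row in rows]
    rk = 0
    for col in range(len(mat[0]) if mat else 0):
        piv = next((i for i in range(rk, len(mat)) if mat[i][col] != 0), None)
        if piv is None:
            continue
        mat[rk], mat[piv] = mat[piv], mat[rk]
        for i in range(rk + 1, len(mat)):
            f = mat[i][col] / mat[rk][col]
            mat[i] = [a - f * b for a, b in zip(mat[i], mat[rk])]
        rk += 1
    return rk


def solve_square(A, y):
    """Solve A x = y exactly for an invertible square matrix A (Gauss-Jordan)."""
    n = len(A)
    aug = [[Fraction(v) for v in row] + [Fraction(t)] for row, t in zip(A, y)]
    for col in range(n):
        piv = next(i for i in range(col, n) if aug[i][col] != 0)
        aug[col], aug[piv] = aug[piv], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for i in range(n):
            if i != col and aug[i][col] != 0:
                f = aug[i][col]
                aug[i] = [a - f * b for a, b in zip(aug[i], aug[col])]
    return [row[n] for row in aug]


def count_vertices_in_aff(T):
    """Number of vertices of [0,1]^m in aff(T), for a 0/1 matrix T given as m rows."""
    m, r = len(T), len(T[0])
    # Keep a maximal affinely independent subset of columns; aff(T) does not change.
    basis = []
    for j in range(r):
        cand = basis + [[T[i][j] for i in range(m)]]
        if rank([c + [1] for c in cand]) == len(cand):
            basis = cand
    k = len(basis)
    # Lifted (m+1) x k matrix [T_B; 1^T]; choose k independent rows, the row of ones first.
    L = [[c[i] for c in basis] for i in range(m)] + [[1] * k]
    rows = [m]
    for i in range(m):
        if len(rows) == k:
            break
        if rank([L[q] for q in rows + [i]]) == len(rows) + 1:
            rows.append(i)
    L_sel = [L[q] for q in rows]
    # A point of aff(T) is fixed by its coordinates in the chosen rows: the ones-row
    # coordinate is 1 and the other k - 1 coordinates must be 0 or 1.
    count = 0
    for bits in product((0, 1), repeat=k - 1):
        lam = solve_square(L_sel, (1,) + bits)
        x = [sum(L[i][j] * lam[j] for j in range(k)) for i in range(m)]
        if all(v in (0, 1) for v in x):
            count += 1
    return count


def max_vertices(m, r, p, trials, rng):
    """Maximum of count_vertices_in_aff over random T with i.i.d. Bernoulli(p) entries."""
    best = 0
    for _ in range(trials):
        T = [[int(rng.random() < p) for _ in range(r)] for _ in range(m)]
        best = max(best, count_vertices_in_aff(T))
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    for p in (0.05, 0.5, 0.95):
        print(p, max_vertices(m=30, r=5, p=p, trials=10, rng=rng))
```

Exact rational arithmetic avoids tolerance choices when testing whether $T\lambda$ is binary, at the price of speed; a floating-point variant would be needed to approach $m = 500$.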
As a byproduct, these results indicate that the NMF variant (2.20) of our matrix factorization problem can also, in most cases, be reduced to identifying a set of $r$ vertices of $[0,1]^m$ (cf. Corollary 2.5).
[Figure 2.3 plot: maximum number of returned vertices over 100 trials ($\log_2$ scale) against the Bernoulli parameter $p$, with curves for $r = 8, 16, 24$.]
Figure 2.3: Number of vertices contained in $\operatorname{aff}(T)$ over 100 trials for $T$ drawn entry-wise from a Bernoulli distribution with parameter $p$.