The bag model in language statistics
F. Criado a,*, T. Gachechiladze b, H. Meladze b, G. Tsertsradze b
a Facultad de Ciencias, Campus de Teatinos, Universidad de Málaga, 29071 Málaga, Spain
b Department of Applied Mathematics and Computer Science, Tbilisi State University, 1, Chavchavadze Ave, Tbilisi 380028, Georgia
Received 8 May 2000; received in revised form 3 November 2001; accepted 30 January 2002
Abstract
In this paper, fuzzy quantitative models of language statistics are constructed. All suggested models assume a superposition of two kinds of uncertainty: probabilistic and possibilistic. This superposition is realized in statistical distributions by the probability measure splitting procedure. In this way, fuzzy versions of the generalized binomial, Fucks and Zipf–Mandelbrot distributions are constructed, describing the probabilistic and possibilistic organization of language at any level: morphological, syntactic or phonological. The main problem in constructing the quantitative model of a fuzzy linear structure is finding the corresponding linguistic spectrum, which is reduced to the solution of systems of algebraic or transcendental equations by inverse spline-interpolation. In the final sections, the general linear mathematical model of language structures is described briefly, as well as bag statistics for consonantal structures of languages.
© 2002 Elsevier Science Inc. All rights reserved.
Keywords: Fuzzy sets; Membership functions; Probability theory; Linguistic modeling
www.elsevier.com/locate/ins
* Corresponding author.
E-mail address: [email protected] (F. Criado).
0020-0255/02/$ - see front matter © 2002 Elsevier Science Inc. All rights reserved. PII: S0020-0255(02)00201-3
1. Introduction
Fuzzy logic and fuzzy set theory were initially proposed to describe linguistic variables, i.e., to describe the meaning of words in natural language. Originally, Zadeh thought that linguistics would be one of the major fields of application for this new formalism. Surprisingly, the main area of application is now control, and in comparison with control there are only a few applications in the field of linguistics. In view of this, this paper describes a new approach to the study of natural language.
A new approach to the representation of fuzzy sets is given in Section 2: a fuzzy set is obtained as the result of splitting a usual subset of some universal set. This representation is convenient for describing possibilistic and probabilistic superpositions.
Section 3 considers the characteristic laws of the lattice of split subsets (especially pseudo-complements and relative pseudo-complements), that is, of the Brouwerian lattice of indicators (membership functions) of fuzzy sets. As a consequence, measures of fuzziness are considered in Section 5. Fuzziness is characterized by a relation between $\tilde{A}$ and $\tilde{A}^D$, which underlines the fact that fuzziness is an intrinsic property of $\tilde{A}$ and is independent of the pseudo-complement.
The set splitting procedure is a new tool for defining and calculating the probabilities of random fuzzy events. On this basis, Section 7 obtains new generalizations of the binomial, Zipf–Mandelbrot and other distributions, which describe the possibilistic–probabilistic organization of structures created by different language elements. In Section 8 the general linear mathematical model of language structures and its main characteristics are described briefly. The possibilistic characteristics of these models are represented by components of the so-called linguistic spectrum.
Applications of these models to language structures are presented in Section 9.
2. Set splitting
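Let $X$ be a finite set and $A$ any subset, $A \subseteq X$. Consider a correspondence $I_A \to (I_{\tilde{A}}, I_{\tilde{A}^D})$, where $I_A$ is the indicator of subset $A$, $I_{\tilde{A}}, I_{\tilde{A}^D} \in [0,1]^X$ and

$$I_A(x) = I_{\tilde{A}}(x) + I_{\tilde{A}^D}(x) \quad \forall x \in X. \qquad (1)$$

$A$ is a support of the mappings $I_{\tilde{A}}$ and $I_{\tilde{A}^D}$.
According to Zadeh [15], the splitting components $I_{\tilde{A}}$ and $I_{\tilde{A}^D}$ are fuzzy subsets of $X$. Call $I_{\tilde{A}^D}$ the dual of $I_{\tilde{A}}$.
The procedure in which indicator $I_A$ is compared with a pair $(I_{\tilde{A}}, I_{\tilde{A}^D})$ is called "splitting of indicator $I_A$ (subset $A$)".$^1$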
The splitting procedure of some subsets $A, B \subseteq X$ induces the corresponding splitting of the union and intersection of these two subsets. For the split indicators $I_{\widetilde{A \cap B}}$ and $I_{\widetilde{A \cup B}}$ it is essential to fulfill the natural conditions (as for non-split ones)

$$I_{\tilde{A}}(x),\ I_{\tilde{B}}(x) \ge I_{\widetilde{A \cap B}}(x); \qquad I_{\tilde{A}}(x),\ I_{\tilde{B}}(x) \le I_{\widetilde{A \cup B}}(x), \quad x \in X.$$
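Then, as is easy to see, the following expressions are obtained for the intersection and union indicators:

$$I_{\widetilde{A \cap B}}(x) = I_{\tilde{A}}(x) \wedge I_{\tilde{B}}(x) \quad \forall x \in X \quad (\wedge \equiv \min) \quad \text{(simultaneous splitting)}; \qquad (2)$$

$$I_{\widetilde{\widetilde{A \cap B}}}(x) = I_{\tilde{A}}(x)\, I_{\tilde{B}}(x) \quad \forall x \in X \quad \text{(sequential splitting)}; \qquad (3)$$

$$I_{\widetilde{\widetilde{A \cup B}}}(x) = I_{\tilde{A}}(x) + I_{\tilde{B}}(x) - I_{\widetilde{\widetilde{A \cap B}}}(x) \quad \forall x \in X \quad \text{(sequential splitting)}; \qquad (4)$$

$$I_{\widetilde{A \cup B}}(x) = I_{\tilde{A}}(x) \vee I_{\tilde{B}}(x) \quad \forall x \in X \quad (\vee \equiv \max) \quad \text{(simultaneous splitting)}. \qquad (5)$$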
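The splitting rules (1)–(5) can be illustrated on a small finite universe. The following is a minimal sketch, assuming that indicators are stored as Python dictionaries mapping elements to membership grades; the function names and the sample grades are hypothetical and chosen only for illustration.

```python
def split(support, mu, universe):
    """Split the crisp indicator of `support` into (fuzzy part, dual part), eq. (1)."""
    fuzzy = {x: mu[x] if x in support else 0.0 for x in universe}
    dual = {x: 1.0 - mu[x] if x in support else 0.0 for x in universe}
    return fuzzy, dual


def meet_simultaneous(ia, ib):
    """Eq. (2): pointwise minimum."""
    return {x: min(ia[x], ib[x]) for x in ia}


def meet_sequential(ia, ib):
    """Eq. (3): pointwise product."""
    return {x: ia[x] * ib[x] for x in ia}


def join_sequential(ia, ib):
    """Eq. (4): sum minus the sequential intersection."""
    return {x: ia[x] + ib[x] - ia[x] * ib[x] for x in ia}


def join_simultaneous(ia, ib):
    """Eq. (5): pointwise maximum."""
    return {x: max(ia[x], ib[x]) for x in ia}


# Hypothetical example: A = {1, 2, 3}, B = {2, 3, 4} inside X = {1, 2, 3, 4}.
universe = [1, 2, 3, 4]
fa, fa_dual = split({1, 2, 3}, {1: 0.25, 2: 0.5, 3: 1.0}, universe)
fb, fb_dual = split({2, 3, 4}, {2: 0.75, 3: 0.5, 4: 0.25}, universe)
```

Both variants satisfy the natural conditions stated above: each intersection indicator lies below both components and each union indicator lies above them.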
3. The lattice of split elements of ordinary indicators’ Boolean lattice
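Consider the Boolean lattice $I = (\{0,1\}^X, \vee, \wedge)$ with natural order. The set of all split elements of this lattice with natural order, $\tilde{I} = ([0,1]^X, \vee, \wedge)$, is a lattice.
Theorem 1. $\tilde{I}$ is a Brouwer's lattice.
A direct demonstration of this theorem (i.e., the demonstration that for any two elements $I_{\tilde{A}}$ and $I_{\tilde{B}} \in \tilde{I}$ the set of all $I_{\tilde{X}} \in \tilde{I}$ such that $I_{\tilde{A}} \wedge I_{\tilde{X}} \le I_{\tilde{B}}$ has the greatest element $(I_{\tilde{B}} : I_{\tilde{A}})$, called the relative pseudo-complement of $I_{\tilde{A}}$ in $I_{\tilde{B}}$) can be found in [1]. It is easy to see that

$$(I_{\tilde{B}} : I_{\tilde{A}})(x) = \begin{cases} 1, & I_{\tilde{A}}(x) \le I_{\tilde{B}}(x), \\ I_{A^C}(x) \vee I_{\tilde{B}}(x), & I_{\tilde{A}}(x) > I_{\tilde{B}}(x) \end{cases} \quad \forall x \in X, \qquad (6)$$

where $I_{A^C} = (I_{\emptyset} : I_{\tilde{A}})$ is a pseudo-complement of $I_{\tilde{A}}$ and, as a function of $x$, represents the indicator of the usual complement of set $A$ in $X$. Next, the following theorem is easy to demonstrate.

$^1$ Notice that component $I_{\tilde{A}}$ can be split again: $I_{\tilde{A}} = (I_{\tilde{\tilde{A}}}, I_{\tilde{\tilde{A}}^D})$, where $I_{\tilde{\tilde{A}}} = \chi I_{\tilde{A}}$, $I_{\tilde{\tilde{A}}^D} = (1 - \chi) I_{\tilde{A}}$; $\chi: X \to [0,1]$. Two sequential splittings induce the splitting of the initial subset $I_A = (\chi\mu I_A, (1 - \chi\mu) I_A)$; $\mu I_A = I_{\tilde{A}}$; $(1 - \mu) I_A = I_{\tilde{A}^D}$; $\mu: X \to [0,1]$.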
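Theorem 2. The following statements hold in lattice $\tilde{I}$:

(i) If $I_{\tilde{A}} \le I_{\tilde{B}}$, then $(I_{\emptyset} : I_{\tilde{B}}) \le (I_{\emptyset} : I_{\tilde{A}})$;
(ii) $I_{\tilde{A}} \le (I_{\emptyset} : (I_{\emptyset} : I_{\tilde{A}}))$;
(iii) $(I_{\emptyset} : I_{\tilde{A}}) = (I_{\emptyset} : (I_{\emptyset} : (I_{\emptyset} : I_{\tilde{A}})))$;
(iv) $(I_{\emptyset} : (I_{\tilde{A}} \vee I_{\tilde{B}})) = (I_{\emptyset} : I_{\tilde{A}}) \wedge (I_{\emptyset} : I_{\tilde{B}})$;
(v) $(I_{\emptyset} : (I_{\tilde{A}} \wedge I_{\tilde{B}})) = (I_{\emptyset} : I_{\tilde{A}}) \vee (I_{\emptyset} : I_{\tilde{B}})$;
(vi) $(I_{\tilde{A}} : I_{\tilde{B}}) \wedge I_{\tilde{A}} = I_{\tilde{A}}$;
(vii) $(I_{\tilde{A}} : I_{\tilde{B}}) \wedge I_{\tilde{B}} = I_{\tilde{A}} \wedge I_{\tilde{B}}$;
(viii) $((I_{\tilde{A}} : I_{\tilde{B}}) : I_{\tilde{C}}) = (I_{\tilde{A}} : I_{\tilde{C}}) \wedge (I_{\tilde{B}} : I_{\tilde{C}})$;
(ix) $(I_{\tilde{A}} : (I_{\tilde{B}} \vee I_{\tilde{C}})) = (I_{\tilde{A}} : I_{\tilde{B}}) \wedge (I_{\tilde{A}} : I_{\tilde{C}})$. $\qquad (7)$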
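Formula (6) and a few of the statements in (7) can be checked numerically on a finite universe. The sketch below assumes dictionary-valued indicators; the function names and the sample grades are hypothetical. It evaluates (6) pointwise and derives the pseudo-complement as $(I_{\emptyset} : I_{\tilde{A}})$.

```python
def rel_pseudo_complement(ib, ia):
    """(I_B : I_A) of eq. (6), evaluated pointwise."""
    out = {}
    for x in ia:
        if ia[x] <= ib[x]:
            out[x] = 1.0
        else:
            comp_a = 1.0 if ia[x] == 0.0 else 0.0  # indicator of A^C at x
            out[x] = max(comp_a, ib[x])
    return out


def pseudo_complement(ia):
    """(I_empty : I_A): indicator of the crisp complement of the support."""
    zero = {x: 0.0 for x in ia}
    return rel_pseudo_complement(zero, ia)


def meet(ia, ib):
    return {x: min(ia[x], ib[x]) for x in ia}


def leq(ia, ib):
    return all(ia[x] <= ib[x] for x in ia)


# Hypothetical membership grades on X = {1, 2, 3, 4}.
ia = {1: 0.25, 2: 0.5, 3: 1.0, 4: 0.0}
ib = {1: 0.5, 2: 0.25, 3: 0.75, 4: 0.0}
b_rel_a = rel_pseudo_complement(ib, ia)   # (I_B : I_A)
a_rel_b = rel_pseudo_complement(ia, ib)   # (I_A : I_B)
not_a = pseudo_complement(ia)
```

With these grades, `not_a` is the indicator of the element 4 alone, in agreement with the remark that the pseudo-complement is the indicator of the usual complement of $A$.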
4. The splitting of a set
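The splitting of a set, which as already seen corresponds to the indicator splitting, is represented by

$$(I_A \to (I_{\tilde{A}}, I_{\tilde{A}^D})) \Leftrightarrow (A \to (\tilde{A}, \tilde{A}^D)),$$
$$(I_A = I_{\tilde{A}} + I_{\tilde{A}^D}) \Leftrightarrow (A = \tilde{A} \oplus \tilde{A}^D). \qquad (8)$$

Here $\oplus$ is the operation of set synthesis.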
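On the basis of (8) one can obtain a more general expression $\tilde{A} \oplus \tilde{B}$, which obviously makes sense provided that $\tilde{B} \subseteq \neg\tilde{A}$, or $\tilde{A} \subseteq \neg\tilde{B}$. One can also obtain the existence conditions for expressions $\tilde{A} \oplus \tilde{B} \oplus \tilde{C}$, etc.
Assuming that such a condition holds for the above expressions, one can easily prove that

(i) $\tilde{A} \oplus \tilde{B} = \tilde{B} \oplus \tilde{A}$;
(ii) $\tilde{A} \oplus (\tilde{B} \oplus \tilde{C}) = (\tilde{A} \oplus \tilde{B}) \oplus \tilde{C}$;
(iii) $(\tilde{A} \oplus \tilde{A}^D) \cap (\tilde{B} \oplus \tilde{B}^D) = (\widetilde{A \cap B}) \oplus (\widetilde{A \cap B})^D = (\tilde{A} \cap \tilde{B}) \oplus [(A \cap \tilde{B}^D) \cup (\tilde{A}^D \cap B)]$;
(iv) $(\tilde{A} \oplus \tilde{A}^D) \cup (\tilde{B} \oplus \tilde{B}^D) = (\widetilde{A \cup B}) \oplus (\widetilde{A \cup B})^D = (\tilde{A} \cup \tilde{B}) \oplus [(\tilde{A}^D \cap \tilde{B}^D) \cup (A^C \cap \tilde{B}^D) \cup (\tilde{A}^D \cap B^C)]$;
(v) $\tilde{A} \oplus (\tilde{B} \cap \tilde{C}) = (\tilde{A} \oplus \tilde{B}) \cap (\tilde{A} \oplus \tilde{C})$;
(vi) $\tilde{A} \oplus (\tilde{B} \cup \tilde{C}) = (\tilde{A} \oplus \tilde{B}) \cup (\tilde{A} \oplus \tilde{C})$. $\qquad (9)$

For example, to prove the last two formulae, one can write

(v) $\tilde{A} \oplus (\tilde{B} \cap \tilde{C}) \Leftrightarrow I_{\tilde{A}} + (I_{\tilde{B}} \wedge I_{\tilde{C}}) = (I_{\tilde{A}} + I_{\tilde{B}}) \wedge (I_{\tilde{A}} + I_{\tilde{C}}) \Leftrightarrow (\tilde{A} \oplus \tilde{B}) \cap (\tilde{A} \oplus \tilde{C})$.
(vi) $\tilde{A} \oplus (\tilde{B} \cup \tilde{C}) \Leftrightarrow I_{\tilde{A}} + (I_{\tilde{B}} \vee I_{\tilde{C}}) = (I_{\tilde{A}} + I_{\tilde{B}}) \vee (I_{\tilde{A}} + I_{\tilde{C}}) \Leftrightarrow (\tilde{A} \oplus \tilde{B}) \cup (\tilde{A} \oplus \tilde{C})$.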
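Let it be assumed that in these formulae the following relations hold:

$$(I_{\widetilde{A \cap B}} = I_{\tilde{A}} \wedge I_{\tilde{B}}) \Leftrightarrow (\widetilde{A \cap B} = \tilde{A} \cap \tilde{B}),$$
$$(I_{\widetilde{A \cup B}} = I_{\tilde{A}} \vee I_{\tilde{B}}) \Leftrightarrow (\widetilde{A \cup B} = \tilde{A} \cup \tilde{B}), \qquad (10)$$

which are evident because of (2), (5) and (8).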
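In the lattice of split subsets almost all Boolean lattice rules hold:
(1) Reflexivity: $\tilde{A} \subseteq \tilde{A}$.
(2) Antisymmetry: $(\tilde{A} \subseteq \tilde{B},\ \tilde{B} \subseteq \tilde{A}) \Rightarrow \tilde{A} = \tilde{B}$.
(3) Transitivity: $(\tilde{A} \subseteq \tilde{B},\ \tilde{B} \subseteq \tilde{C}) \Rightarrow (\tilde{A} \subseteq \tilde{C})$.
(4) Idempotency: $\tilde{A} \cap \tilde{A} = \tilde{A}$ and $\tilde{A} \cup \tilde{A} = \tilde{A}$.
(5) Commutativity: $\tilde{A} \cap \tilde{B} = \tilde{B} \cap \tilde{A}$ and $\tilde{A} \cup \tilde{B} = \tilde{B} \cup \tilde{A}$.
(6) Associativity: $(\tilde{A} \cap \tilde{B}) \cap \tilde{C} = \tilde{A} \cap (\tilde{B} \cap \tilde{C})$ and $(\tilde{A} \cup \tilde{B}) \cup \tilde{C} = \tilde{A} \cup (\tilde{B} \cup \tilde{C})$.
(7) Distributivity: $\tilde{A} \cap (\tilde{B} \cup \tilde{C}) = (\tilde{A} \cap \tilde{B}) \cup (\tilde{A} \cap \tilde{C})$ and $\tilde{A} \cup (\tilde{B} \cap \tilde{C}) = (\tilde{A} \cup \tilde{B}) \cap (\tilde{A} \cup \tilde{C})$.
(8) Absorption laws: $\tilde{A} \cap (\tilde{A} \cup \tilde{B}) = \tilde{A}$ and $\tilde{A} \cup (\tilde{A} \cap \tilde{B}) = \tilde{A}$.
(9) Involution law for fuzzy complement: $\neg(\neg\tilde{A}) = \tilde{A}$.
(10) Identity laws: $\tilde{A} \cup \emptyset = \tilde{A}$, $\tilde{A} \cap X = \tilde{A}$ and $\tilde{A} \cup X = X$, $\tilde{A} \cap \emptyset = \emptyset$.
(11) Order inversion laws: $(\tilde{A} \subseteq \tilde{B}) \Leftrightarrow (\neg\tilde{B} \subseteq \neg\tilde{A})$ and $(\tilde{A} \subseteq \tilde{B}) \Leftrightarrow (\tilde{B}^D \subseteq \tilde{A}^D)$.
(12) De Morgan's laws: $\neg(\tilde{A} \cup \tilde{B}) = (\neg\tilde{A} \cap \neg\tilde{B})$ and $\neg(\tilde{A} \cap \tilde{B}) = (\neg\tilde{A} \cup \neg\tilde{B})$.
In connection with the introduced notion of dual subsets one can prove the following laws:
(13) Involution law for the dual subset: $(\tilde{A}^D)^D = \tilde{A}$.
(14) Duality laws for the union and intersection of split subsets:
$$(\tilde{A} \cup \tilde{B})^D = (\tilde{A}^D \cap \tilde{B}^D) \cup (A^C \cap \tilde{B}^D) \cup (B^C \cap \tilde{A}^D);$$
$$(\tilde{A} \cap \tilde{B})^D = (A \cap \tilde{B}^D) \cup (\tilde{A}^D \cap B).$$
Notice that in lattice $\tilde{I}$ the laws of contradiction and tertium non datur do not hold.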
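The duality laws (14) can be verified pointwise on a small example. The sketch below assumes simultaneous splitting (minimum and maximum for intersection and union) and represents the dual of a split subset as the crisp indicator of its support minus the fuzzy part, following (1); the helper names and the membership grades are hypothetical.

```python
def indicator(support, universe):
    return {x: 1.0 if x in support else 0.0 for x in universe}


def dual(fuzzy, support):
    """Dual part: crisp indicator of the support minus the fuzzy part, eq. (1)."""
    return {x: (1.0 if x in support else 0.0) - fuzzy[x] for x in fuzzy}


def fmin(*parts):
    """Intersection of several indicators (pointwise minimum)."""
    return {x: min(p[x] for p in parts) for x in parts[0]}


def fmax(*parts):
    """Union of several indicators (pointwise maximum)."""
    return {x: max(p[x] for p in parts) for x in parts[0]}


universe = range(5)
A, B = {0, 1, 2}, {1, 2, 3}
fa = {0: 0.25, 1: 0.5, 2: 1.0, 3: 0.0, 4: 0.0}   # fuzzy part of A
fb = {0: 0.0, 1: 0.75, 2: 0.25, 3: 0.5, 4: 0.0}  # fuzzy part of B
fa_d, fb_d = dual(fa, A), dual(fb, B)
ia, ib = indicator(A, universe), indicator(B, universe)
not_a = {x: 1.0 - ia[x] for x in universe}       # indicator of A^C
not_b = {x: 1.0 - ib[x] for x in universe}       # indicator of B^C

# Law (14), union: dual taken with respect to the support A ∪ B.
union_dual_lhs = dual(fmax(fa, fb), A | B)
union_dual_rhs = fmax(fmin(fa_d, fb_d), fmin(not_a, fb_d), fmin(not_b, fa_d))

# Law (14), intersection: dual taken with respect to the support A ∩ B.
inter_dual_lhs = dual(fmin(fa, fb), A & B)
inter_dual_rhs = fmax(fmin(ia, fb_d), fmin(fa_d, ib))
```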
5. Dual element and fuzziness (qualitative consideration)
As illustrated before, the dual element plays an important role in describing split subset lattices. Now, the role of the dual element in understanding fuzziness will be considered.
There is an important difference between usual and fuzzy subsets. The usual subset (set) can be represented as an aggregate of real objects only when the real measured potential possibility of aggregate formation corresponds to fuzzy subsets. A fuzzy subset is a medium of formation for a real aggregate. It is important to notice that the term "medium of formation" is borrowed from Weil [11] to underline the following circumstance: any sequence of research outcomes is a result of acts of free decision-making by the subject (observer); any concrete sequence is a crisp finite subset of some universe, but the fuzzy subset is an analogue of Weil's continuum.
In the lattice of fuzzy subsets a dual element $\tilde{A}^D$ is defined by the splitting procedure [2,5,6]. Its sense can be explained as follows: the value of the membership function $I_{\tilde{A}}(x)$ is a degree of concordance of an element $x$ with the concept represented by $\tilde{A}$; the value $I_{\tilde{A}^D}(x)$ has the same sense with respect to the concept represented by $\tilde{A}^D$, which together with $\tilde{A}$, as the pair $(\tilde{A}, \tilde{A}^D)$, defines a crisp subset $A$. The nearer (in some sense) $\tilde{A}$ and $\tilde{A}^D$ are [12], the more fuzzy is the statement "Elements of $A$ possess property $\tilde{A}$ ($\tilde{A}^D$)". Below, a qualitative description of fuzziness is considered analogously with [12], but with the following difference: in [12], the fuzziness is characterized by the relation $f$ between $\tilde{A}$ and Zadeh's negation $\neg\tilde{A}$. In the present case, the less rigid relation $u$ between $\tilde{A}$ and $\tilde{A}^D$ is assumed as a basis; in the authors' opinion this underlines the fact that fuzziness is an intrinsic property of $\tilde{A}$ and is independent of the pseudo-complement. The basis for considering the relation $u$ is a relation in a distributive lattice, "$\tilde{C}$ is between $\tilde{A}$ and $\tilde{B}$, $(\tilde{A}, \tilde{C}, \tilde{B})$" [12].
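Definition 1. Let $\tilde{X}$ and $\tilde{Y} \in L$ (distributive lattice). $\tilde{X}$ is no less fuzzy than $\tilde{Y}$ $(\tilde{X}\,u\,\tilde{Y})$ if $\tilde{X} \cap Y$ and $(\tilde{X} \cap Y)^D = \tilde{X}^D \cap Y$ are in $L$ between $\tilde{Y}$ and $\tilde{Y}^D$. Here

$$(\tilde{X}\,u\,\tilde{Y}) = (\tilde{Y}, \tilde{X} \cap Y, \tilde{Y}^D) \text{ and } (\tilde{Y}, \tilde{X}^D \cap Y, \tilde{Y}^D) \iff \tilde{Y} \cap \tilde{Y}^D \subseteq \tilde{X} \cap Y \subseteq \tilde{Y} \cup \tilde{Y}^D.$$

Theorem 3. Relation $u$ is reflexive and transitive on $L$, i.e.,

$$(\tilde{X}\,u\,\tilde{X}) \quad \text{and} \quad [(\tilde{X}\,u\,\tilde{Y}) \text{ and } (\tilde{Y}\,u\,\tilde{Z})] \Rightarrow (\tilde{X}\,u\,\tilde{Z}).$$

It can be seen that $u$ on $L$ is not antisymmetric and, therefore, not a partial order.
Theorem 4. Relation $u$ on $L$ is such that
(1) $(\tilde{X}\,u\,\tilde{X}^D)$ and $(\tilde{X}^D\,u\,\tilde{X})$.
(2) $(\tilde{X}\,u\,\tilde{Y}) \iff (\tilde{X}^D\,u\,\tilde{Y}) \iff (\tilde{X}^D\,u\,\tilde{Y}^D) \iff (\tilde{X}\,u\,\tilde{Y}^D)$.
On the lattice $L$, let a relation $E$ be defined so that $(\tilde{X}\,E\,\tilde{Y})$ if $\tilde{X} = \tilde{Y}$ or $\tilde{X}^D = \tilde{Y}$ or $\tilde{X} = \tilde{Y}^D$. It can be shown that $E$ is an equivalence relation. Each equivalence class consists of a fuzzy subset and its respective dual. If $\tilde{X} = \tilde{X}^D$ then the equivalence class consists of only one element.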
The subset consisting of any fuzzy subset and its respective dual subset is called the dual pair. According to Theorem 4, if one component of the dual pair is more fuzzy than any component of the other pair, then any component of the first pair is more fuzzy than any component of the second pair. So it is reasonable to introduce the notion of fuzziness of the dual pair.
Definition 2. Let $\mathcal{L}$ be a set of dual pairs. Define on $\mathcal{L}$ a relation $U$ so that $(\tilde{u}\,U\,\tilde{v})$ for $\tilde{u}, \tilde{v} \in \mathcal{L}$, if one can say that the dual pair $\tilde{u}$ is no less fuzzy than the dual pair $\tilde{v}$.
It is easy to demonstrate that relation $U$ on the set of dual pairs is a partial order relation.
6. Probability measure splitting
Let $(X, \mathcal{B}, p(\cdot))$ be a given probability space. The probability of the event $K \in \mathcal{B}$ is calculated by the formula

$$p(K) = \int_X I_K(x)\, p(dx). \qquad (11)$$

According to the splitting procedure of the set $K$, this formula can be rewritten in the following form:

$$p(\tilde{K} \oplus \tilde{K}^D) = \int_X I_{\tilde{K}}(x)\, p(dx) + \int_X I_{\tilde{K}^D}(x)\, p(dx), \qquad (12)$$

where $I_{\tilde{K}}$ is a $\mathcal{B}$-measurable membership function (the corresponding subset $\tilde{K}$ is a fuzzy random event). Define $p(\tilde{K})$ and $p(\tilde{K}^D)$ as follows:

$$p(\tilde{K}) = \int_X I_{\tilde{K}}(x)\, p(dx) \quad \text{and} \quad p(\tilde{K}^D) = \int_X I_{\tilde{K}^D}(x)\, p(dx), \qquad (13)$$

the probability of the fuzzy event $\tilde{K}$ and the probability of the dual fuzzy event $\tilde{K}^D$, respectively. Let the representation

$$p(K) = p(\tilde{K} \oplus \tilde{K}^D) = p(\tilde{K}) + p(\tilde{K}^D) \qquad (14)$$

be called the procedure of probability measure splitting [16].
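On a finite probability space the integrals in (11)–(13) reduce to sums, and the splitting (14) can be checked exactly. The following is a minimal sketch under that assumption; the probabilities and membership grades are hypothetical, and exact fractions are used so that the identity holds without rounding.

```python
from fractions import Fraction as Fr


def probability(ind, p):
    """Discrete analogue of eqs. (11) and (13): sum of I(x) * p(x)."""
    return sum(ind[x] * p[x] for x in p)


# Hypothetical finite space X = {1, 2, 3, 4} with point probabilities p(x).
p = {1: Fr(1, 2), 2: Fr(1, 4), 3: Fr(1, 8), 4: Fr(1, 8)}
K = {1, 2}                              # crisp event
mu = {1: Fr(1, 2), 2: Fr(3, 4)}         # membership grades on K

ind_K = {x: Fr(1) if x in K else Fr(0) for x in p}
ind_fuzzy = {x: mu[x] if x in K else Fr(0) for x in p}
ind_dual = {x: ind_K[x] - ind_fuzzy[x] for x in p}

p_K = probability(ind_K, p)             # p(K)
p_fuzzy = probability(ind_fuzzy, p)     # probability of the fuzzy event
p_dual = probability(ind_dual, p)       # probability of the dual fuzzy event
```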
7. Fuzzy distributions
7.1. Binomial distribution with fuzzy elementary events
Let $A = \{0, 1\}$ be the space of elementary events.
One can obtain the fuzzy elementary events by splitting the usual events $\{0\}$ and $\{1\}$. For membership functions one can write

$$\chi_{\{0\}}(x) = \mu_0(x)\chi_{\{0\}}(x) + (1 - \mu_0(x))\chi_{\{0\}}(x),$$
$$\chi_{\{1\}}(x) = \mu_1(x)\chi_{\{1\}}(x) + (1 - \mu_1(x))\chi_{\{1\}}(x), \qquad (15)$$

where $\mu_0, \mu_1: A \to [0,1]$, $x = 0, 1$.
According to (13), the probabilities of the fuzzy elementary events are

$$p\{\tilde{0}\} = \mu_0 p_0, \qquad p\{\tilde{1}\} = \mu_1 p_1, \qquad (16)$$

where $p_0$ and $p_1$ are the probabilities of the corresponding crisp events. Now it is easy to write the split binomial distribution corresponding to fuzzy elementary events. Only two variants will be considered: completely simultaneous and completely sequential. The intermediate cases are not considered here.
For the completely simultaneous case, the split binomial distribution is

$$p(\tilde{B}_{n,n}) = \mu_1 p_1^n,$$
$$p(\tilde{B}_{n,0}) = \mu_0 (1 - p_1)^n,$$
$$p(\tilde{B}_{n,k}) = (\mu_0 \wedge \mu_1) \binom{n}{k} p_1^k (1 - p_1)^{n-k}, \quad k = 1, \ldots, n-1, \qquad (17)$$

where $\tilde{B}_{n,k}$ is the fuzzy Bernoulli event. The normalization factor is determined by the sum of (17) over $k$:

$$p(\tilde{A}^n) = (\mu_0 \wedge \mu_1) + (\mu_1 - (\mu_0 \wedge \mu_1))\, p_1^n + (\mu_0 - (\mu_0 \wedge \mu_1))(1 - p_1)^n.$$

For the completely sequential case one gets

$$p(\tilde{\tilde{B}}_{n,k}) = \binom{n}{k} (\mu_1 p_1)^k (\mu_0 (1 - p_1))^{n-k} \qquad (18)$$

and

$$p(\tilde{\tilde{A}}^n) = [\mu_0 + (\mu_1 - \mu_0) p_1]^n.$$
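The two split binomial distributions and their normalizing sums can be compared numerically. The sketch below is an illustration only: the function names and the parameter values are hypothetical, and $n \ge 1$ is assumed so that the cases $k = 0$ and $k = n$ of (17) are distinct.

```python
from math import comb, isclose


def split_binomial_simultaneous(n, k, p1, mu0, mu1):
    """Eq. (17): completely simultaneous splitting (n >= 1 assumed)."""
    if k == n:
        return mu1 * p1 ** n
    if k == 0:
        return mu0 * (1 - p1) ** n
    return min(mu0, mu1) * comb(n, k) * p1 ** k * (1 - p1) ** (n - k)


def split_binomial_sequential(n, k, p1, mu0, mu1):
    """Eq. (18): completely sequential splitting."""
    return comb(n, k) * (mu1 * p1) ** k * (mu0 * (1 - p1)) ** (n - k)


# Hypothetical parameter values.
n, p1, mu0, mu1 = 6, 0.3, 0.8, 0.6
m = min(mu0, mu1)

total_sim = sum(split_binomial_simultaneous(n, k, p1, mu0, mu1) for k in range(n + 1))
closed_sim = m + (mu1 - m) * p1 ** n + (mu0 - m) * (1 - p1) ** n

total_seq = sum(split_binomial_sequential(n, k, p1, mu0, mu1) for k in range(n + 1))
closed_seq = (mu0 + (mu1 - mu0) * p1) ** n
```

Dividing each term by the corresponding total gives a distribution that sums to one; for $\mu_0 = \mu_1 = 1$ both variants reduce to the ordinary binomial distribution.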
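The important characteristic of the split Bernoulli probability (17) is the composition law; in the simultaneous case

$$p(\tilde{B}_{n,k}; p_1 p_2) = \sum_{m=0}^{n} p(B_{n,m}; p_1)\, p(\tilde{B}_{m,k}; p_2) \qquad (19)$$

and in the sequential case

$$p\!\left(\tilde{\tilde{B}}_{n,k};\ \frac{\mu_1 \mu_2 p_1 p_2}{(\mu_0 + (\mu_1 - \mu_0) p_1)(\mu_0 + (\mu_1 - \mu_0) p_2)}\right) = \sum_{m=0}^{n} p\!\left(\tilde{\tilde{B}}_{n,m};\ \frac{\mu_1 p_1}{\mu_0 + (\mu_1 - \mu_0) p_1}\right) p_{\tilde{\tilde{A}}^n}\!\left(B_{m,k};\ \frac{\mu_2 p_2}{\mu_0 + (\mu_1 - \mu_0) p_2}\right) \qquad (20)$$

and

$$p_{\tilde{\tilde{A}}^n}\!\left(B_{n,k};\ \frac{\mu_1 \mu_2 p_1 p_2}{(\mu_0 + (\mu_1 - \mu_0) p_1)(\mu_0 + (\mu_1 - \mu_0) p_2)}\right) = \sum_{m=0}^{n} p_{\tilde{\tilde{A}}^n}\!\left(B_{n,m};\ \frac{\mu_1 p_1}{\mu_0 + (\mu_1 - \mu_0) p_1}\right) p_{\tilde{\tilde{A}}^n}\!\left(B_{m,k};\ \frac{\mu_2 p_2}{\mu_0 + (\mu_1 - \mu_0) p_2}\right).$$

As a further characteristic of binomial probabilities in the case of fuzzy elementary events, one may consider the known property of the exponential distribution; in the simultaneous case

$$\sum_{m=0}^{\infty} p(\tilde{B}_{m,n}; p_1)\, f(m; u) = (\mu_0 \wedge \mu_1)(1 - v) v^n + (\mu_1 - (\mu_0 \wedge \mu_1))(1 - u)(p_1 u)^n, \quad n \ne 0,$$
$$\sum_{m=0}^{\infty} p(\tilde{B}_{m,0}; p_1)\, f(m; u) = \frac{\mu_0 (1 - u)}{1 - (1 - p_1) u} = \mu_0\, g(0; v) \qquad (21)$$

and in the sequential case

$$\sum_{m=0}^{\infty} p_{\tilde{\tilde{A}}^n}\!\left(B_{n,m};\ \frac{\mu_1 p_1}{\mu_0 + (\mu_1 - \mu_0) p_1}\right) f(m; u) = g(n; v'), \qquad (22)$$

where

$$v = \frac{p_1 u}{1 - u + p_1 u}, \qquad v' = \frac{\mu_1 p_1 u}{(1 - p_1)\mu_0 + \mu_1 p_1 + (1 - p_1)\mu_0 u}.$$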
7.2. The binomial distribution with fuzzy number of successes
Let the set $A_n = \{0, 1, \ldots, n\}$ be considered. The fuzzy quantity "approximately $k$ from $n$" is defined as a fuzzy subset of $A_n$. Therefore the corresponding distribution is

$$p(B_n^{\tilde{k}}; p) = \sum_{l=0}^{n} \mu_{\tilde{k}}(l)\, p(B_n^{l}; p), \qquad (23)$$

where $\mu_{\tilde{k}}(l)$ is the membership function of the fuzzy number "approximately $k$ from $n$".
This distribution is also called the binomial distribution because it is characterized by the above composition law and the property of exponential distribution.
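Formula (23) is a membership-weighted sum of ordinary binomial probabilities. The sketch below illustrates it with a triangular membership function for "approximately $k$"; the triangular shape, its width, and all names are assumptions made for the example and are not prescribed by the text.

```python
from math import comb, isclose


def binomial_pmf(n, l, p):
    """Ordinary binomial probability of l successes in n trials."""
    return comb(n, l) * p ** l * (1 - p) ** (n - l)


def fuzzy_count_probability(n, p, membership):
    """Eq. (23): sum over l of mu(l) * p(B_n^l; p)."""
    return sum(membership(l) * binomial_pmf(n, l, p) for l in range(n + 1))


def approximately(k, width):
    """Hypothetical triangular membership function centred at k."""
    return lambda l: max(0.0, 1.0 - abs(l - k) / width)


n, p = 10, 0.4
p_about_4 = fuzzy_count_probability(n, p, approximately(4, 2))
p_crisp_4 = fuzzy_count_probability(n, p, lambda l: 1.0 if l == 4 else 0.0)
```

A crisp membership function concentrated at one value recovers the ordinary binomial probability, which is the sense in which (23) generalizes the classical case.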
7.3. Fuzzy upper binomial distribution
The consideration of the usual upper binomial distribution is based on the model of superposition of two events: the Bernoulli event and the emergence of the total amount of failures, characterized by the a priori probability $p(B_0) = 1 - c$.
If $p_1$ is the probability of elementary success, and $\mu_0$ and $\mu_0'$ are the values of the membership functions corresponding to the complicated events $(\overbrace{0, 0, \ldots, 0}^{n})$ when distinguishing the events of a Bernoulli and non-Bernoulli origin, then the universal set $X$, which is the composition $B_0 \oplus \left(\bigcup_{i=0}^{n} B_{n,i}\right)$, is split in the following way:

$$X = \widetilde{B_0 \oplus \Big(\bigcup_{i=0}^{n} B_{n,i}\Big)} \oplus \widetilde{B_0 \oplus \Big(\bigcup_{i=0}^{n} B_{n,i}\Big)}^{D}.$$

The corresponding membership function is

$$\chi_{\widetilde{B_0 \oplus (\bigcup_{i=0}^{n} B_{n,i})}}(x_1, \ldots, x_n) = \mu_0'\, \chi_{B_0}(x_1, \ldots, x_n) + \mu_0\, \chi_{B_{n,0}}(x_1, \ldots, x_n) + \sum_{i=1}^{n} \chi_{B_{n,i}}(x_1, \ldots, x_n)$$
(conditionBBgnn;;00fBB00 :ðS
n
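The probability measure corresponding to the fuzzy upper binomial distribution is

$$p_{\widetilde{B_0 \oplus (\bigcup_{i=0}^{n} B_{n,i})}}(B_0 \oplus B_{n,i}; p_1, c) = \frac{1}{p\Big(\widetilde{B_0 \oplus (\bigcup_{i=0}^{n} B_{n,i})}\Big)} \begin{cases} \mu_0'(1 - c) + \mu_0 c (1 - p_1)^n, & i = 0, \\ c \binom{n}{i} p_1^i (1 - p_1)^{n-i}, & i = 1, \ldots, n, \end{cases} \qquad (24)$$

where

$$p\Big(\widetilde{B_0 \oplus \big(\textstyle\bigcup_{i=0}^{n} B_{n,i}\big)}\Big) = \mu_0'(1 - c) + \mu_0 c (1 - p_1)^n + c\,(1 - (1 - p_1)^n).$$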
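The normalization in (24) can be confirmed by direct summation. The following sketch evaluates the fuzzy upper binomial distribution for hypothetical parameter values; the function name and the argument order are illustrative only.

```python
from math import comb, isclose


def fuzzy_upper_binomial(n, p1, c, mu0, mu0_prime):
    """Eq. (24): list of probabilities for i = 0, 1, ..., n."""
    q = (1 - p1) ** n                       # probability of n Bernoulli failures
    norm = mu0_prime * (1 - c) + mu0 * c * q + c * (1 - q)
    probs = [(mu0_prime * (1 - c) + mu0 * c * q) / norm]
    probs += [c * comb(n, i) * p1 ** i * (1 - p1) ** (n - i) / norm
              for i in range(1, n + 1)]
    return probs


# Hypothetical parameter values.
fuzzy_probs = fuzzy_upper_binomial(n=5, p1=0.3, c=0.8, mu0=0.6, mu0_prime=0.9)
crisp_probs = fuzzy_upper_binomial(n=5, p1=0.3, c=0.8, mu0=1.0, mu0_prime=1.0)
```

With both membership values equal to one the normalizing sum is one and the zero class has probability $(1 - c) + c(1 - p_1)^n$, which is the usual upper binomial distribution.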
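The Poisson limit is

$$P(i) = \left[\mu_0'(1 - c) + \mu_0 c\, e^{-\gamma} + c\,(1 - e^{-\gamma})\right]^{-1} \begin{cases} \mu_0'(1 - c) + \mu_0 c\, e^{-\gamma}, & i = 0, \\ c \exp(-\gamma)\, \dfrac{\gamma^i}{i!}, & i = 1, 2, \ldots, \end{cases} \qquad (25)$$

where $\bar{i}$ and $\gamma$ are connected by the relation

$$\bar{i} = \frac{c\gamma}{\mu_0'(1 - c) + \mu_0 c\, e^{-\gamma} + c\,(1 - e^{-\gamma})}.$$

From a practical viewpoint, what is interesting is the expression of the sum over all values of $\mu_0$ and $\mu_0'$:$^3$

$$\tilde{P}(i) = \begin{cases} 1 - (1 - e^{-\gamma})\,\xi, & i = 0, \\ \xi \exp(-\gamma)\, \dfrac{\gamma^i}{i!}, & i = 1, 2, \ldots, \end{cases} \qquad (26)$$

where

$$\xi = c \iint_{0 \le \mu_0, \mu_0' \le 1} \left[\mu_0'(1 - c) + \mu_0 c\, e^{-\gamma} + c\,(1 - e^{-\gamma})\right]^{-1} d\mu_0\, d\mu_0'.$$

Taking into account the relation between $\gamma$, $\xi$ and $\bar{i}$, one obtains

$$\tilde{P}(i) = \begin{cases} 1 - \xi\,(1 - e^{-\bar{i}/\xi}), & i = 0, \\ \xi \exp\!\left(-\dfrac{\bar{i}}{\xi}\right) \dfrac{(\bar{i}/\xi)^i}{i!}, & i = 1, 2, \ldots \end{cases}$$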
3Notice that formula (26) does not itself contain any fuzziness, being a nice instance of
7.4. Negative binomial distribution with fuzzy elementary events [8]
Let the sequence of Bernoulli trials with probability of fuzzy success $p(\tilde{1}) = \mu_1 p_1$ be considered ($\mu_1: \{0,1\} \to [0,1]$), where $p_1$ is the probability of the usual Bernoulli elementary event, and let $f(k; r, p_1)$ denote the probability that the $r$th success takes place in the $(k + r)$th trial, provided that trials are continued up to the $r$th success. Accepting the splitting scheme that is used for the binomial distribution with fuzzy elementary events, one can write

$$f(k; r, p(\tilde{1})) = \begin{cases} (\mu_0 \wedge \mu_1) \binom{r + k - 1}{k} p_1^r (1 - p_1)^k, & k > 0, \\ \mu_1 p_1^r, & k = 0. \end{cases} \qquad (27)$$

Since for any $m > 0$,

$$\binom{m + k - 1}{k} = (-1)^k \binom{-m}{k},$$

the above formula can be written in the following form:

$$f(k; r, p(\tilde{1})) = \begin{cases} (\mu_0 \wedge \mu_1) \binom{-r}{k} (-1)^k p_1^r (1 - p_1)^k, & k > 0, \\ \mu_1 p_1^r, & k = 0. \end{cases} \qquad (28)$$

Define the negative binomial distribution with fuzzy elementary events, for a fixed real number $r > 0$ and $0 < p_1 < 1$, as the sequence

$$\{(p)^{-1} f(k; r, p(\tilde{1}))\}, \qquad (29)$$

where

$$p = \sum_{k=0}^{\infty} f(k; r, p(\tilde{1})) = (\mu_0 \wedge \mu_1) + [\mu_1 - (\mu_0 \wedge \mu_1)]\, p_1^r.$$

Note that if $\mu_0, \mu_1 \to 1$, or $(\mu_0 \wedge \mu_1) = \mu_1$, then (29) reduces to the usual negative binomial distribution.
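The normalizing sum in (29) can be checked against a truncated series. The sketch below restricts $r$ to a positive integer so that `math.comb` can be used (the text allows any real $r > 0$); the truncation length and the parameter values are hypothetical.

```python
from math import comb, isclose


def fuzzy_negative_binomial_term(k, r, p1, mu0, mu1):
    """Eq. (27) for integer r: weight of k failures before the r-th success."""
    if k == 0:
        return mu1 * p1 ** r
    return min(mu0, mu1) * comb(r + k - 1, k) * p1 ** r * (1 - p1) ** k


# Hypothetical parameter values; the series is truncated after 400 terms.
r, p1, mu0, mu1 = 3, 0.4, 0.7, 0.9
terms = [fuzzy_negative_binomial_term(k, r, p1, mu0, mu1) for k in range(400)]
series_sum = sum(terms)
m = min(mu0, mu1)
closed_form = m + (mu1 - m) * p1 ** r     # normalizing sum p of eq. (29)
distribution = [t / closed_form for t in terms]
```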
7.5. Fuzzy Fucks’ distribution
As in the case of the "upper Bernoulli" distribution, all variants of Fucks' distribution [3] are based on the assumption that Fucks' event is a superposition of Bernoulli and deterministic events

$$\Phi^{k}_{n,r;p_1} = B_r \oplus B^{k-r}_{n-r}; \qquad \Phi^{k}_{n;p_1} = \bigcup_{r=0}^{n} \left(B_r \oplus B^{k-r}_{n-r;p_1}\right), \qquad (30)$$

where $B_r$ is deterministic (certainly $r$ successes in $n$ trials) and $B^{k-r}_{n-r;p_1}$ is a Bernoulli event ($(k - r)$ successes in $(n - r)$ random events).
There are many variants of Fucks' event splitting, but only some of them are considered in this paper.
(1) The deterministic event is non-fuzzy, but the Bernoulli elementary events are fuzzy. In this case

$$\tilde{\Phi}^{k}_{n;p_1} = \bigcup_{r=0}^{n} \left(B_r \oplus \tilde{B}^{k-r}_{n-r;p_1}\right).$$

The corresponding probability measure is

$$P(\tilde{\Phi}^{k}_{n;p_1}) = \begin{cases} \sum_{r=0}^{n} q_r (\mu_0 \wedge \mu_1) \binom{n-r}{k-r} p_1^{k-r} (1 - p_1)^{n-k}, & k = 1, 2, \ldots, n-1, \\ \sum_{r=0}^{n} q_r \mu_1 p_1^{n-r}, & k = n, \\ q_0 \mu_0 (1 - p_1)^n, & k = 0 \end{cases} \qquad (31)$$

(for simultaneous splitting) with

$$\sum_{k=0}^{n} P(\tilde{\Phi}^{k}_{n;p_1}) = \begin{cases} \mu_1 + (\mu_0 - \mu_1)\, q_0 (1 - p_1)^n, & \mu_0 \ge \mu_1, \\ \mu_0 + (\mu_1 - \mu_0) \sum_{r=0}^{n} q_r p_1^{n-r}, & \mu_0 < \mu_1 \end{cases}$$

and

$$P(\tilde{\tilde{\Phi}}^{k}_{n;p_1}) = \sum_{r=0}^{n} q_r \binom{n-r}{k-r} (\mu_1 p_1)^{k-r} (\mu_0 (1 - p_1))^{n-k} \qquad (32)$$

(for sequential splitting) with

$$\sum_{k=0}^{n} P(\tilde{\tilde{\Phi}}^{k}_{n;p_1}) = \sum_{r=0}^{n} q_r (\mu_0 + (\mu_1 - \mu_0) p_1)^{n-r}.$$

Here $q_r$ is connected to the linguistic spectrum [3].
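(2) The $B_r$ events are split $(B_r = \tilde{B}_r \oplus \tilde{B}_r^D)$ and the Bernoulli events are crisp:

$$\tilde{\Phi}^{k}_{n;p_1} = \bigcup_{r=0}^{n} \left(\tilde{B}_r \oplus B^{k-r}_{n-r;p_1}\right).$$

Evidently

$$P(\tilde{\Phi}^{k}_{n;p_1}) = \sum_{r=0}^{n} \chi_r q_r \binom{n-r}{k-r} p_1^{k-r} (1 - p_1)^{n-k}, \qquad (33)$$

where $\chi_r$ is the membership function of the fuzzy set $\tilde{B}_r$ and

$$\sum_{k=0}^{n} P(\tilde{\Phi}^{k}_{n;p_1}) = \sum_{r=0}^{n} \chi_r q_r. \qquad (34)$$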
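Relation (34) follows from (33) because, for each fixed $r$, the Bernoulli factor sums to one over $k$. The sketch below checks this numerically; the spectrum weights `q`, the membership values `chi`, and the function name are hypothetical, and terms with $r > k$ are taken to vanish.

```python
from math import comb, isclose


def fucks_split_deterministic(n, k, p1, q, chi):
    """Eq. (33): q[r] are spectrum weights, chi[r] the membership of the split
    deterministic event; terms with r > k do not contribute."""
    return sum(chi[r] * q[r] * comb(n - r, k - r) * p1 ** (k - r) * (1 - p1) ** (n - k)
               for r in range(min(k, n) + 1))


# Hypothetical spectrum and membership values for n = 5.
n, p1 = 5, 0.35
q = [0.1, 0.4, 0.3, 0.2, 0.0, 0.0]
chi = [1.0, 0.9, 0.5, 0.7, 1.0, 1.0]

total = sum(fucks_split_deterministic(n, k, p1, q, chi) for k in range(n + 1))
expected_total = sum(c * w for c, w in zip(chi, q))   # right-hand side of eq. (34)
```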
(3) In the case when both deterministic and Bernoulli events are split, one must discriminate clearly between the simultaneous and the successive (sequential) splitting of Fucks' event. In the latter case it is easy to obtain the final result. Consideration of the two aforesaid cases allows one to write

$$\tilde{\tilde{\Phi}}^{k}_{n;p_1} = \bigcup_{r=0}^{n} \left(\tilde{B}_r \oplus \begin{cases} \tilde{B}^{k-r}_{n-r;p_1} \\ \tilde{\tilde{B}}^{k-r}_{n-r;p_1} \end{cases}\right); \qquad (35)$$

consequently

$$P(\tilde{\tilde{\Phi}}^{k}_{n;p_1}) = \begin{cases} \sum_{r=0}^{n} q_r (\chi_r \wedge (\mu_0 \wedge \mu_1)) \binom{n-r}{k-r} p_1^{k-r} (1 - p_1)^{n-k}, & k = 1, \ldots, n-1, \\ \sum_{r=0}^{n} q_r (\chi_r \wedge \mu_1)\, p_1^{n-r}, & k = n, \\ (\chi_0 \wedge \mu_0)\, q_0 (1 - p_1)^n, & k = 0 \end{cases} \qquad (36)$$

(simultaneous splitting of the Bernoulli event) and

$$P(\tilde{\tilde{\Phi}}^{k}_{n;p_1}) = \sum_{r=0}^{n} \chi_r q_r \binom{n-r}{k-r} (\mu_1 p_1)^{k-r} (\mu_0 (1 - p_1))^{n-k} \qquad (37)$$

(completely sequential splitting of the Bernoulli event). When Fucks' event is split simultaneously, the authors' reasoning is as follows: $(B_r \oplus B^{k-r}_{n-r;p_1})$ is a realized chain of distributed successes and failures, a chain that is a concatenation of two others: a deterministic one, in which there are only $r$ successes, and a Bernoulli sequence of length $(n - r)$ containing $k - r$ successes. Therefore simultaneous splitting must take place according to the rule

$$\mu\big(\widetilde{B_r \oplus B^{k-r}_{n-r;p_1}}\big) = \mu(\tilde{B}_r) \wedge \mu(\tilde{B}^{k-r}_{n-r;p_1}). \qquad (38)$$

Consequently

$$P(\tilde{\Phi}^{k}_{n;p_1}) = \begin{cases} \sum_{r=0}^{n} q_r (\chi_r \wedge (\mu_0 \wedge \mu_1)) \binom{n-r}{k-r} p_1^{k-r} (1 - p_1)^{n-k}, & k = 1, \ldots, n-1, \\ \sum_{r=0}^{n} q_r (\chi_r \wedge \mu_1)\, p_1^{n-r}, & k = n, \\ (\chi_0 \wedge \mu_0)\, q_0 (1 - p_1)^n, & k = 0. \end{cases} \qquad (39)$$

The considered fuzzy Fucks' distributions play a leading part in constructing fuzzy quantitative micro-linguistic models of language.
(4) Some language structures are often described by generalized Fucks' distributions when there are two kinds of successes with probabilities $p$ and $q$. In this case

$$P'(\tilde{\Phi}^{k}_{n;\Psi_r;p,q}) = \sum_{r=0}^{n} q_r \binom{n-r}{k-r} \cdots$$

If one is only interested in one kind of success then

$$P''(\tilde{\Phi}^{k}_{n;\Psi_r;p,q}) = \sum_{r=0}^{n} q_r \binom{n-r}{k-r} \int_0^1 (pq)^{k-r} (1 - p + p(1 - q))^{n-k}\, dp. \qquad (41)$$

The corresponding Poisson limit ($n \to \infty$; $q \to 0$ and $a \equiv \bar{i}_{\exp} - \sum_{r=0}^{n} r q_r = \frac{q}{2}\left(n - \sum_{r=0}^{n} r q_r\right) = \text{const}$) is

$$F'(\tilde{\Phi}^{k}_{\Psi_r}) = e^{-a} \sum_{r=0}^{\infty} q_r \frac{a^{k-r}}{(k-r)!}\, \varphi_{k-r}(a), \qquad (42)$$

where

$$\varphi_{k-r}(a) = \frac{e^{a}}{2a^{k-r+1}} \int_0^{2a} t^{k-r} e^{-t}\, dt = \frac{e^{a}\, \Gamma(k-r+1)}{2a^{k-r+1}}\, P(k-r+1, 2a), \qquad (43)$$

$\Gamma(z)$ is an Euler integral, and $P(k-r+1, 2a)$ an incomplete gamma function. Taking into account the relation between the incomplete gamma function and the $\chi^2$-distribution, one finally obtains

$$F'(\tilde{\Phi}^{k}_{\Psi_r}) = \frac{1}{2a} \sum_{r=0}^{\infty} q_r\, P(4a \mid k-r+1), \qquad (44)$$

where $P(4a \mid k-r+1)$ is the $\chi^2$-distribution with $2(k-r+1)$ degrees of freedom. Distribution (44) is called the "$\chi^2$-distribution with approximately $(k-r+1)$ degrees of freedom".
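The step from (42)–(43) to (44) can be checked numerically for a single term with $j = k - r$. The sketch below rests on two assumptions that are not stated explicitly in the text: $P(s, x)$ is read as the regularized lower incomplete gamma function, and the Poisson limit of the integrand in (41) is read as a Poisson probability with rate $2ap$ averaged over $p \in [0, 1]$. For integer $s$ the incomplete gamma function has a finite closed form, so only the standard library is needed; the names and values are hypothetical.

```python
from math import exp, factorial, isclose


def reg_lower_gamma_int(s, x):
    """Regularized lower incomplete gamma P(s, x) for a positive integer s."""
    return 1.0 - exp(-x) * sum(x ** i / factorial(i) for i in range(s))


def phi(j, a):
    """Eq. (43) with j = k - r."""
    return exp(a) * factorial(j) / (2 * a ** (j + 1)) * reg_lower_gamma_int(j + 1, 2 * a)


def term_closed(j, a):
    """One summand of eq. (42) without the weight q_r."""
    return exp(-a) * a ** j / factorial(j) * phi(j, a)


def term_numeric(j, a, steps=20000):
    """Midpoint rule for the integral over p in [0, 1] of a Poisson probability
    with rate 2*a*p (assumed reading of the Poisson limit of eq. (41))."""
    h = 1.0 / steps
    total = 0.0
    for i in range(steps):
        rate = 2 * a * (i + 0.5) * h
        total += exp(-rate) * rate ** j / factorial(j)
    return h * total


j, a = 3, 0.7
closed = term_closed(j, a)
numeric = term_numeric(j, a)
chi2_form = reg_lower_gamma_int(j + 1, 2 * a) / (2 * a)   # summand of eq. (44)
```

The last quantity uses the standard identity that the $\chi^2$ distribution function with $2s$ degrees of freedom evaluated at $4a$ equals $P(s, 2a)$.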
7.6. Fuzzy Zipf–Mandelbrot distribution
It is well known that Mandelbrot's theory of recurrent coding constitutes the basis of statistical macro-linguistics. If the vocabulary of volume $R$ is divided into $S$ classes according to the informational cost [9] of words of a given class, then the probability of a word of the $k$th class can be expressed as

$$p_k = P M^{-B C_k}, \qquad (45)$$

where $P$, $M$, $B$ do not depend on the cost and $C_k$ is the informational cost of the $k$th class.
Let three cases of splitting be considered.
(1) The set of classes $K = (k_1, k_2, \ldots, k_S) = \tilde{K} \oplus \tilde{K}^D$. In this case

$$\tilde{p}_{k_i} = \mu_{\tilde{k}}(k_i)\, p_{k_i}, \quad i = 1, \ldots, S. \qquad (46)$$

(2) The set of informational costs $C = (c_1, \ldots, c_S) = \tilde{C} \oplus \tilde{C}^D$. Since $p_k$ is a function of $c_k$, according to the principle of generalization [17] one obtains

$$\tilde{p}_{k_i} \cdots$$

(3) When the number of classes is the fuzzy number $\tilde{S} = \bigcup_{S=1}^{\infty} (\mu_{\tilde{S}}(S)/S)$, by analogy with the binomial distribution with fuzzy number of trials, one can write

$$\tilde{p}_k = \sum_{S=1}^{\infty} \mu_{\tilde{S}}(S)\, P(S)\, M^{-B(S) C_k}. \qquad (48)$$

The above-mentioned formulae must be applied to the whole language as a formation medium, while the classical one must be applied to individual texts.
8. Linear structure
One of the methods for studying the linear structure of a sequence of language elements is gap analysis, which rests on the following observation: the elements of a sequence are not distributed randomly (in disorder), and any deviation from full disorder indicates the presence of some structure. The quantitative investigation technique is as follows: pairs of elements are fixed by some features, and the elements between the fixed ones are considered as gaps. Hence the sequence may have the following form: – – –[a1]– – –[a2]– – –[b1][a3]– – –[b2]– – –[a4][b3]– – –[b4]– – –[a5]– – –[b5]– – –[a6]– – –[b6]– – –.
Let the structure defined by elements [a]–[b] be considered. The complex consisting of [a], the nearest [b] and the gaps between them is called a "word" (Mandelbrot's definition).
To describe such word generation mathematically, a model is applied in which the generation process of any analyzed structure is represented as the superposition of two processes: probabilistic and possibilistic. Therefore, one may apply the considered fuzzy probability measures for describing the gap distribution by words. The gap analysis method, together with the suggested modeling schemes, allows one to establish the structural dependence between elements of any level.
The main characteristics of linear models are the components of the linguistic spectrum $(q_r, \varphi_r, \psi_r)$. Their determination is reduced to the solution of the system of equations

$$\left.\frac{\partial^k G(y; a)}{\partial y^k}\right|_{y \to 1} = \big(i(i-1)\cdots(i-k+1)\big)_{\exp}, \quad k = 1, 2, \ldots, \qquad (49)$$

where

$$G(y; a) = \sum_{l} P(l)\, y^l,$$

$P(l)$ is the probability distribution of the chosen model, $a$ is a known function of the linguistic spectrum components and $(\cdot)_{\exp}$ are the measured moments of the gap distribution. A special method is elaborated for solving system (49). The determination of the linguistic spectrum allows one to calculate the informational content of any given structure.
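The left-hand side of (49) is the $k$th factorial moment of the model distribution, since differentiating $G(y; a) = \sum_l P(l) y^l$ $k$ times and letting $y \to 1$ gives $\sum_l l(l-1)\cdots(l-k+1) P(l)$. The sketch below computes these moments for a distribution given as a list of probabilities; the binomial test distribution and all names are chosen only for illustration and are not the model of the paper.

```python
from math import comb, isclose


def factorial_moment(pmf, k):
    """k-th factorial moment: sum over l of l(l-1)...(l-k+1) * P(l),
    i.e. the k-th derivative of G(y) = sum_l P(l) y^l at y = 1."""
    total = 0.0
    for l, prob in enumerate(pmf):
        falling = 1
        for j in range(k):
            falling *= (l - j)
        total += falling * prob
    return total


# Illustrative distribution: binomial with n = 8, p = 0.3, for which the
# k-th factorial moment is known to be n(n-1)...(n-k+1) * p^k.
n, p = 8, 0.3
pmf = [comb(n, l) * p ** l * (1 - p) ** (n - l) for l in range(n + 1)]
first = factorial_moment(pmf, 1)
second = factorial_moment(pmf, 2)
third = factorial_moment(pmf, 3)
```

In system (49) such model moments, expressed through the spectrum components, are equated to the measured factorial moments of the gap distribution.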
9. Bag statistics for consonantal structures of languages
One method of analyzing several structures of printed information entails the investigation of the probabilistic–possibilistic organization of some of the distribution elements determining the analytic structure [7]. From the point of view of the chosen elements, printed information can be considered as corteges of elements or Yager's bags, the main characteristics of which can serve as quantitative analytic parameters, and the probabilistic–possibilistic model parameters as the characteristics of the structures studied in this paper.
The probabilistic–possibilistic organization of bag distributions is described by the generalized Fucks' distribution [4]. Some of the results obtained from this distribution are given below. The aforesaid Fucks' distribution is based on the superposition of the following two processes:

$$\Phi^{k}_{n,r;p} = B_r \oplus B^{k-r}_{n-r;p}; \qquad \Phi^{k}_{n;p} = \bigcup_{r=0}^{n} \left(B_r \oplus B^{k-r}_{n-r;p}\right), \qquad (50)$$

where $\Phi^{k}_{n,r;p}$ is a Fucks' event, $B^{k-r}_{n-r}$ a Bernoulli event and $B_r$ the so-called deterministic event [3]. There are many ways of splitting Fucks' event [4], but only the one related to the bag distribution model will be considered here, that is to say, the case in which the Bernoulli event is classical but the deterministic event is split:

$$\tilde{\Phi}^{k}_{n;p} = \bigcup_{r=0}^{k} \left(\tilde{B}_r \oplus B^{k-r}_{n-r;p}\right). \qquad (51)$$

Let the structure of the set of events $\tilde{\Phi}^{k}_{n;p}$ be described.
Before continuing, the following comments should be made. In fuzzy subset applications the problem of evaluating the membership grade is highly important. The membership grade is a result of expert research determining (creating) the fuzzy subset. Let a method be considered that makes it possible to reveal the membership function in a logically consistent way. It is supposed that the fuzzy subset elements are such that $I_{\tilde{A}}(x') \ge I_{\tilde{A}}(x'')$, $x', x'' \in X'$, if $x' \succeq x''$; $I_{\tilde{A}}$ is the membership function of $\tilde{A}$. In the present case $X'$ consists of Bernoulli events $B^{i-r}_{n-r}$, where $i$ is the full number of successes and $r$ is a fixed number of successes determining the structure of event $\tilde{F}^{i}_{n,r;p}$ and corresponding to event $B_r$. The fuzzy subsets considered here are normalized. This permits one to
can be easily related to focal probabilities, without the necessity of making any additional assumptions.
Let the random experiment in which the level set notion is used and Yager's algorithm is represented by $x' \in X'$ [13] be considered. Firstly, let a value $\alpha \in [0,1]$ and an element from the corresponding $\alpha$-level set be chosen. Now let the probability of choosing specifically element $x' \in X'$ be calculated according to the conditions established in this example. In accordance with this assumption

$$0 \le \pi_1 \le \pi_2 \le \cdots \le \pi_n = \pi_{\max} = 1,$$

where $\pi_r$ are values of the membership function (components of the possibility distribution, or components of the so-called linguistic spectrum [3]). The level sets are as follows:

when $0 \le \alpha \le \pi_1$: $B_1 = \{x'_1, \ldots, x'_n\}$;
$\pi_1 \le \alpha \le \pi_2$: $B_2 = \{x'_2, \ldots, x'_n\}$;
$\pi_2 \le \alpha \le \pi_3$: $B_3 = \{x'_3, \ldots, x'_n\}$; ...
$\pi_{n-2} \le \alpha \le \pi_{n-1}$: $B_{n-1} = \{x'_{n-1}, x'_n\}$;
$\pi_{n-1} \le \alpha \le \pi_n$: $B_n = \{x'_n\}$;
$\pi_n < \alpha$: $B_\alpha = \emptyset$.

Because $\alpha$ is chosen randomly in this example, the probability that level set $B_r$ will be chosen is equal to the length of the interval $(\pi_{r-1}, \pi_r)$, $m(B_r) = \pi_r - \pi_{r-1}$. Besides, an element is chosen from the level set in accordance with Bernoulli's probability model, thus

$$F(\text{choose element } x' \mid B_r) = \begin{cases} \binom{n-r}{i-r} p^{i-r} (1 - p)^{n-r-(i-r)}, & \text{if } x' \in B_r, \\ 0, & \text{if } x' \notin B_r. \end{cases} \qquad (52)$$
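Then, according to the formula of full probability,

$$F_n(i) = \sum_{r=1}^{n} m(B_r) \binom{n-r}{i-r} p^{i-r} (1 - p)^{n-i} \quad (i = 1, \ldots, n), \qquad (53)$$

or, in the Poisson limit,

$$F(i) = e^{-a} \sum_{r=1}^{\infty} m(B_r) \frac{a^{i-r}}{(i-r)!} \quad (i = 1, \ldots, n). \qquad (54)$$

Here $a = \bar{i} - \bar{r} = \text{const}$, $\bar{i}$ is the average empirical value of the random variable $\xi = i$, and $\bar{r} = \sum_{r=1}^{\infty} r\, m(B_r)$, in full accordance with the model described in this paper.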
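Formulas (53) and (54) can be evaluated directly once the focal probabilities $m(B_r)$ are given. In the sketch below, terms with $r > i$ are taken to vanish, the function names are hypothetical, and the first example uses made-up weights. The second example uses the values of $a$ and $m(B_r)$ reported for the English mixed case in Table 1, so the result can be compared with the frequencies listed there.

```python
from math import comb, exp, factorial, isclose


def bag_distribution(n, p, m):
    """Eq. (53): m[r-1] = m(B_r) for r = 1..n; returns [F_n(1), ..., F_n(n)]."""
    return [sum(m[r - 1] * comb(n - r, i - r) * p ** (i - r) * (1 - p) ** (n - i)
                for r in range(1, i + 1))
            for i in range(1, n + 1)]


def bag_distribution_poisson(a, m, i_max):
    """Eq. (54): Poisson limit; returns [F(1), ..., F(i_max)]."""
    return [exp(-a) * sum(m[r - 1] * a ** (i - r) / factorial(i - r)
                          for r in range(1, min(i, len(m)) + 1))
            for i in range(1, i_max + 1)]


# Hypothetical focal probabilities for the finite-n formula.
finite = bag_distribution(n=6, p=0.2, m=[0.3, 0.5, 0.2, 0.0, 0.0, 0.0])

# Values of a and m(B_r) taken from Table 1 (English, mixed case).
english_mixed = bag_distribution_poisson(a=0.6983, m=[0.2711, 0.6000, 0.1309], i_max=4)
```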
The above example provides an explanation of the rule of probabilistic and possibilistic uncertainty index composition [2]. From (54) one can obtain

$$m(B_r) = e^{a} \sum_{k=1}^{r} (-1)^{k-1} F(r-k+1) \frac{a^{k-1}}{(k-1)!} \quad (r = 1, \ldots, n). \qquad (55)$$

The information contained in the distribution moments must be used for determining parameter $a$. This can be achieved by means of the relationship between focal and empirical moments

$$\mu^{\text{focal}}_k = \sum_{l=0}^{k} (-1)^l \binom{k}{l} M^{\text{emp}}_{k-l}\, a^l, \qquad (56)$$

where $\mu^{\text{focal}}_k = \sum_{r=1}^{m} r(r-1)\cdots(r-k+1)\, m(B_r)$; $M^{\text{emp}}_j = \sum_{k} k(k-1)\cdots(k-j+1)\, f_k$; and $f_k$ are the empirical frequencies.
In the case of a finite spectrum the higher moments from some order onwards are equal to 0. This condition allows one to obtain the equation for $a$.
Another way of obtaining the equation for $a$ can be formulated as follows: the empirical frequencies from some $i$ onwards are practically $\approx 0$; in this case it is natural to assume that $m(B_r)$ for $r = i$ is also $\approx 0$. One obtains the equation

$$P(r) - P(r-1)\,a + P(r-2)\frac{a^2}{2!} - \cdots + (-1)^{r-1} P(1) \frac{a^{r-1}}{(r-1)!} = 0. \qquad (57)$$
The numerical solution of such an equation does not present any difficulties. It is essential to choose, among the solutions of the aforementioned equation, the positive one that fulfills the condition

$$a + \sum_{m=1}^{r} m \cdot m(B_m) = \bar{i}.$$

It is worth mentioning that the choice of a solution in all cases is now the subject of further research. The method described above is applied to the investigation of consonantal structures in English, French, Latin, Spanish and Georgian. Empirical data are obtained from [10], presenting word frequencies with consonantal structures in accordance with the number of syllables. Using Yager's notation [14], the following types of bags are subject to processing:

{ccv} = (1/v, 2/c); {vcc} = (1/v, 2/c);
{ccvcc} = (1/v, 4/c); {ccvccvcc} = (2/v, 6/c);
{vccc} = (1/v, 3/c); {cccv} = (1/v, 3/c).

Additionally, the mixed case representing all consonantal structures is considered. Here v represents a vowel and c a consonant. All the structures are typical of the above-mentioned languages.
From the condition $P(i) \approx 0$ and the data regarding $\bar{i}$ one obtains the following equations for each of the languages under investigation:
(1) English language:
(a) Mixed case
$a^3 - 8.7463a^2 + 13.6513a - 5.6083 = 0$;
(b) Structure cc
$a^3 - 6.9360a^2 + 10.0423a - 3.9026 = 0$;
(c) Structure cccc
$a^3 - 12.5278a^2 + 28.0481a - 12.8692 = 0$;
(d) Structure cccccc
$a^3 - 7.9731a^2 + 11.8423a - 1.8172 = 0$;
(e) Structure ccc
$a^4 - 14.0925a^3 + 20.0506a^2 - 12.7206a + 2.3685 = 0$.
(2) French language:
(a) Mixed case
$a^3 - 14.7997a^2 + 37.2222a - 19.3223 = 0$;
(b) Structure cc
$a^3 - 14.0118a^2 + 32.4529a - 15.6912 = 0$;
(c) Structure cccc
$a^4 - 7.4352a^3 + 13.4000a^2 - 10.0444a + 2.2222 = 0$;
(d) Structure cccccc
$a^4 - 30.3984a^3 + 80.7891a^2 - 76.7813a + 17.5781 = 0$;
(e) Structure ccc
$a^3 - 12.6322a^2 + 26.4220a - 13.9248 = 0$.
(3) Latin language:
(a) Mixed case
$a^3 - 7.1387a^2 + 10.9416a - 4.0073 = 0$;
(b) Structure cc
$a^3 - 5.9020a^2 + 8.6708a - 3.1547 = 0$;
(c) Structure cccc
(d) Structure cccccc
$a^3 - 26.9878a^2 + 115.9512a - 43.9756 = 0$;
(e) Structure ccc
$a^3 - 8.1323a^2 + 10.8228a - 3.6494 = 0$.
(4) Spanish language:
(a) Mixed case
$a^5 - 14.9809a^4 + 66.0878a^3 - 96.1527a^2 + 59.8855a - 14.7710 = 0$;
(b) Structure cc
$a^5 - 12.3107a^4 + 47.9429a^3 - 58.3714a^2 + 28.7143a - 5.8286 = 0$;
(c) Structure cccc
$a^5 - 41.8090a^4 + 247.5000a^3 - 419.4231a^2 + 306.9231a - 82.3077 = 0$;
(d) Structure cccccc
$a^5 - 24.2440a^4 + 119.4896a^3 - 181.5311a^2 + 110.8134a - 19.7129 = 0$;
(e) Structure ccc
$a^5 - 15.5704a^4 + 59.3172a^3 - 60.6994a^2 + 22.4813a - 5.5953 = 0$.
(5) Georgian language:
(a) Mixed case
$a^4 - 9.3378a^3 + 22.6638a^2 - 39.2616a + 17.7409 = 0$;
(b) Structure cc
$a^4 - 8.8120a^3 + 26.7474a^2 - 36.7270a + 14.8680 = 0$;
(c) Structure cccc
$a^5 - 17.7878a^4 + 113.1295a^3 - 260.9353a^2 + 282.0860a - 93.4530 = 0$;
(d) Structure cccccc
$a^4 - 14.0866a^3 + 46.2600a^2 - 69.6806a + 27.7148 = 0$;
(e) Structure ccc
$a^5 - 10.6601a^4 + 31.0880a^3 - 52.4083a^2 + 42.4694a - 8.8487 = 0$.
The calculation results of parameter a, the spectral parameter values, the first empirical and focal moments and the empirical and model frequencies are given in Tables 1–5.
Table 1. English

Distribution characteristics ($\bar{i}$: mean value of word length, experiment)

| N | Structure | $a$ | $m(B_1)$ / 1 | $m(B_2)$ / 2 | $m(B_3)$ / 3 | $m(B_4)$ / 4 | $\bar{i}$ | $\bar{r}$ |
|---|---|---|---|---|---|---|---|---|
| 1 | Mixed case | 0.6983 | 0.2711 / 1.0000 | 0.6000 / 0.7289 | 0.1309 / 0.1281 | – | 2.5531 | 1.8654 |
| 2 | Structure cc | 0.6639 | 0.3304 / 1.0000 | 0.5475 / 0.6696 | 0.1167 / 0.1221 | – | 2.4630 | 1.7755 |
| 3 | Structure cccc | 0.6242 | 0.1474 / 1.0000 | 0.5238 / 0.8525 | 0.3337 / 0.3287 | – | 2.7997 | 2.1961 |
| 4 | Structure cccccc | 0.1732 | 0.0000 / 1.0000 | 0.1990 / 1.000 | 0.4945 / 0.8018 | 0.3042 / 0.3065 | 3.2886 | 3.0983 |
| 5 | Structure ccc | 0.2965 | 0.1949 / 1.0000 | 0.6289 / 0.8051 | 0.1294 / 0.1762 | 0.0320 / 0.0468 | 2.3251 | 1.9689 |

Empirical and theoretical frequencies

| N | $P(1)$ / $F(1)$ | $P(2)$ / $F(2)$ | $P(3)$ / $F(3)$ | $P(4)$ / $F(4)$ | $P(5)$ / $F(5)$ | $P(6)$ / $F(6)$ | $P(7)$ / $F(7)$ | $P(8)$ / $F(8)$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.1348 / 0.1348 | 0.3930 / 0.3930 | 0.3067 / 0.3067 | 0.1260 / 0.1260 | 0.0311 / 0.0311 | 0.0063 / 0.0068 | 0.0015 / 0.0015 | 0.0002 / 0.0001 |
| 2 | 0.1701 / 0.1701 | 0.3948 / 0.3948 | 0.2847 / 0.2847 | 0.1123 / 0.1122 | 0.0303 / 0.0284 | 0.0062 / 0.0054 | 0.0015 / 0.0015 | 0.0001 / 0.0000 |
| 3 | 0.0790 / 0.0790 | 0.3299 / 0.3299 | 0.3693 / 0.3693 | 0.1698 / 0.1694 | 0.0410 / 0.0463 | 0.0077 / 0.0077 | 0.0030 / 0.0033 | 0.0002 / 0.0003 |
| 4 | 0.0000 / 0.0000 | 0.1674 / 0.1674 | 0.4449 / 0.4449 | 0.3304 / 0.3304 | 0.0507 / 0.0503 | 0.0022 / 0.0042 | 0.0044 / 0.0024 | 0.0000 / 0.0000 |
| 5 | 0.1449 / 0.1449 | 0.5105 / 0.5105 | 0.2473 / 0.2412 | 0.0768 / 0.0750 | 0.0143 / 0.0140 | 0.0011 / 0.0017 | 0.0000 / 0.0017 | 0.0000 / 0.0000 |
F. Criado et al. / Information Sciences 147 (2002) 13–44
Table 2. French

Distribution characteristics ($\bar{i}$: mean value of word length, experiment)

| N | Structure | $a$ | $m(B_1)$ / 1 | $m(B_2)$ / 2 | $m(B_3)$ / 3 | $m(B_4)$ / 4 | $m(B_5)$ / 5 | $\bar{i}$ | $\bar{r}$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Mixed case | 0.7099 | 0.1218 / 1.0000 | 0.5145 / 0.8782 | 0.3608 / 0.3737 | 0.0000 / 0.0029 | 0.0000 / 0.0029 | 2.9602 | 2.2332 |
| 2 | Structure cc | 0.6791 | 0.1341 / 1.0000 | 0.5353 / 0.8659 | 0.3455 / 0.3300 | 0.0000 / 0.0096 | – | 2.8779 | 2.2412 |
| 3 | Structure cccc | 0.3641 | 0.0384 / 1.0000 | 0.3109 / 0.9616 | 0.6507 / 0.6507 | 0.1573 / 0.1857 | 0.0396 / 0.0284 | 3.2295 | 2.8819 |
| 4 | Structure cccccc | 0.3287 | 0.0000 / 1.0000 | 0.0711 / 1.0000 | 0.5171 / 0.9289 | 0.3050 / 0.4118 | 0.0979 / 0.0968 | 3.7877 | 3.4070 |
| 5 | Structure ccc | 0.6470 | 0.1738 / 1.0000 | 0.5846 / 0.8262 | 0.2081 / 0.2416 | 0.0000 / 0.0335 | – | 2.9888 | 1.9673 |

Empirical and theoretical frequencies

| N | $P(1)$ / $F(1)$ | $P(2)$ / $F(2)$ | $P(3)$ / $F(3)$ | $P(4)$ / $F(4)$ | $P(5)$ / $F(5)$ | $P(6)$ / $F(6)$ | $P(7)$ / $F(7)$ | $P(8)$ / $F(8)$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.0599 / 0.0599 | 0.2955 / 0.2955 | 0.3716 / 0.3716 | 0.1936 / 0.1929 | 0.0605 / 0.0603 | 0.0148 / 0.0133 | 0.0034 / 0.0023 | 0.0004 / 0.0003 |
| 2 | 0.0680 / 0.0680 | 0.3176 / 0.3176 | 0.3678 / 0.3733 | 0.1801 / 0.1843 | 0.0514 / 0.0545 | 0.0123 / 0.0120 | 0.0024 / 0.0015 | 0.0003 / 0.0002 |
| 3 | 0.0267 / 0.0000 | 0.2160 / 0.2160 | 0.4015 / 0.4015 | 0.2412 / 0.2412 | 0.0904 / 0.0904 | 0.0200 / 0.0100 | 0.0033 / 0.0029 | 0.0008 / 0.0008 |
| 4 | 0.0000 / 0.0000 | 0.0512 / 0.0512 | 0.3891 / 0.3891 | 0.3447 / 0.3447 | 0.1638 / 0.1631 | 0.0375 / 0.0373 | 0.0136 / 0.0148 | 0.0000 / 0.0000 |
| 5 | 0.0745 / 0.0745 | 0.3137 / 0.3137 | 0.3282 / 0.3282 | 0.1729 / 0.1715 | 0.0683 / 0.0608 | 0.0279 / 0.0327 | 0.0134 / 0.0032 | 0.0010 / 0.0045 |
Table 3. Latin

Distribution characteristics ($\bar{i}$: mean value of word length, experiment)

| N | Structure | $a$ | $m(B_1)$ / 1 | $m(B_2)$ / 2 | $m(B_3)$ / 3 | $m(B_4)$ / 4 | $m(B_5)$ / 5 | $\bar{i}$ | $\bar{r}$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Mixed case | 0.5456 | 0.0139 / 1.0000 | 0.2837 / 0.9860 | 0.5203 / 0.7023 | 0.1108 / 0.1820 | 0.0000 / 0.0053 | 3.4213 | 2.5854 |
| 2 | Structure cc | 0.5515 | 0.0149 / 1.0000 | 0.3400 / 0.9851 | 0.4815 / 0.6451 | 0.1510 / 0.1636 | 0.0000 / 0.1260 | 3.3504 | 2.7434 |
| 3 | Structure cccc | 0.5855 | 0.0123 / 1.0000 | 0.1693 / 0.9877 | 0.6023 / 0.8184 | 0.2284 / 0.2161 | 0.0000 / 0.0000 | 3.6045 | 3.0714 |
| 4 | Structure cccccc | 0.4196 | 0.0000 / 1.0000 | 0.0374 / 1.0000 | 0.3209 / 0.9626 | 0.5853 / 0.6417 | 0.0000 / 0.0902 | 4.1141 | 3.4087 |
| 5 | Structure ccc | 0.5466 | 0.0295 / 1.0000 | 0.2724 / 0.9775 | 0.5895 / 0.7061 | 0.1284 / 0.1156 | – | 3.3514 | 2.8564 |

Empirical and theoretical frequencies

| N | $P(1)$ / $F(1)$ | $P(2)$ / $F(2)$ | $P(3)$ / $F(3)$ | $P(4)$ / $F(4)$ | $P(5)$ / $F(5)$ | $P(6)$ / $F(6)$ | $P(7)$ / $F(7)$ | $P(8)$ / $F(8)$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.0081 / 0.0000 | 0.1644 / 0.1644 | 0.3912 / 0.3912 | 0.2998 / 0.2535 | 0.1098 / 0.0844 | 0.0245 / 0.0203 | 0.0020 / 0.0029 | 0.0002 / 0.0006 |
| 2 | 0.0086 / 0.0000 | 0.1959 / 0.1959 | 0.3854 / 0.3854 | 0.2831 / 0.2897 | 0.1030 / 0.1000 | 0.0217 / 0.0217 | 0.0022 / 0.0035 | 0.0001 / 0.0004 |
| 3 | 0.0069 / 0.0000 | 0.0943 / 0.0943 | 0.3906 / 0.3906 | 0.3397 / 0.3397 | 0.1351 / 0.1351 | 0.0307 / 0.0347 | 0.0021 / 0.0059 | 0.0005 / 0.0008 |
| 4 | 0.0000 / 0.0000 | 0.0246 / 0.0246 | 0.2213 / 0.2213 | 0.4754 / 0.4754 | 0.1803 / 0.1803 | 0.0902 / 0.0365 | 0.0081 / 0.0050 | 0.0000 / 0.0005 |
| 5 | 0.0130 / 0.0130 | 0.1580 / 0.1650 | 0.4283 / 0.4296 | 0.2850 / 0.2850 | 0.0961 / 0.0751 | 0.0195 / 0.0210 | 0.0000 / 0.0033 | 0.0000 / 0.0000 |
Table 4. Spanish

Distribution characteristics ($\bar{i}$: mean value of word length, experiment)

| N | Structure | $a$ | $m(B_1)$ / 1 | $m(B_2)$ / 2 | $m(B_3)$ / 3 | $m(B_4)$ / 4 | $m(B_5)$ / 5 | $m(B_6)$ / 6 | $m(B_7)$ / 7 | $\bar{i}$ | $\bar{r}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mixed case | 0.7504 | 0.0044 / 1.0000 | 0.2219 / 0.9956 | 0.4985 / 0.7737 | 0.1719 / 0.2752 | 0.0985 / 0.1033 | | | 3.7844 | 3.1138 |
| 2 | Structure cc | 0.5475 | 0.0054 / 1.0000 | 0.2420 / 0.9405 | 0.4634 / 0.7526 | 0.2897 / 0.2892 | 0.0008 / 0.0601 | | | 3.5912 | 3.0424 |
| 3 | Structure cccc | 0.6304 | 0.0009 / 1.0000 | 0.0586 / 0.9991 | 0.4531 / 0.9405 | 0.4279 / 0.4874 | 0.0474 / 0.0595 | 0.0157 / 0.0121 | | 4.1252 | 3.5402 |
| 4 | Structure cccccc | 0.2930 | 0.0000 / 1.0000 | 0.0011 / 1.0000 | 0.0840 / 0.9989 | 0.3829 / 0.9149 | 0.3863 / 0.5320 | 0.1243 / 0.1457 | 0.0229 / 0.0214 | 4.9046 | 4.6234 |
| 5 | Structure ccc | 0.2402 | 0.0000 / 1.0000 | 0.1527 / 1.0000 | 0.4389 / 0.8473 | 0.3431 / 0.4184 | 0.0591 / 0.0653 | 0.0035 / 0.0062 | 0.0000 / 0.0013 | 3.5687 | 3.3110 |

Empirical and theoretical frequencies

| N | $P(1)$ / $F(1)$ | $P(2)$ / $F(2)$ | $P(3)$ / $F(3)$ | $P(4)$ / $F(4)$ | $P(5)$ / $F(5)$ | $P(6)$ / $F(6)$ | $P(7)$ / $F(7)$ | $P(8)$ / $F(8)$ |
|---|---|---|---|---|---|---|---|---|
| 1 | 0.0021 / 0.0000 | 0.1048 / 0.1047 | 0.3140 / 0.3140 | 0.3463 / 0.3463 | 0.1662 / 0.1788 | 0.0523 / 0.0574 | 0.0129 / 0.0153 | 0.0013 / 0.0020 |
| 2 | 0.0031 / 0.0000 | 0.1400 / 0.1400 | 0.3447 / 0.3447 | 0.3353 / 0.3353 | 0.1362 / 0.1361 | 0.0335 / 0.0332 | 0.0068 / 0.0063 | 0.0004 / 0.0008 |
| 3 | 0.0005 / 0.0000 | 0.0312 / 0.0312 | 0.2609 / 0.2609 | 0.3861 / 0.3861 | 0.2181 / 0.2104 | 0.0798 / 0.0711 | 0.0214 / 0.0211 | 0.0020 / 0.0037 |
| 4 | 0.0000 / 0.0000 | 0.0008 / 0.0000 | 0.0627 / 0.0627 | 0.3040 / 0.3040 | 0.3746 / 0.3746 | 0.1897 / 0.1897 | 0.0579 / 0.0579 | 0.0103 / 0.0103 |
| 5 | 0.0000 / 0.0000 | 0.1201 / 0.1201 | 0.3740 / 0.3740 | 0.3562 / 0.3562 | 0.1215 / 0.1215 | 0.0225 / 0.0225 | 0.0056 / 0.0027 | 0.0000 / 0.0002 |
Table 5. Georgian

Distribution characteristics ($\bar{i}$: mean value of word length, experiment)

| N | Structure | $a$ | $m(B_1)$ / 1 | $m(B_2)$ / 2 | $m(B_3)$ / 3 | $m(B_4)$ / 4 | $m(B_5)$ / 5 | $m(B_6)$ / 6 | $m(B_7)$ / 7 | $m(B_8)$ / 8 | $\bar{i}$ | $\bar{r}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mixed case | 0.6216 | 0.0136 / 1.0000 | 0.2199 / 0.9864 | 0.3770 / 0.7665 | 0.2446 / 0.3895 | 0.1236 / 0.1419 | 0.0000 / 0.0213 | | | 3.9542 | 3.1808 |
| 2 | Structure cc | 0.6546 | 0.0110 / 1.0000 | 0.2446 / 0.9890 | 0.3787 / 0.7444 | 0.2449 / 0.3657 | 0.1244 / 0.1208 | 0.0000 / 0.0000 | | | 3.8923 | 3.2379 |
| 3 | Structure cccc | 0.5499 | 0.0012 / 1.0000 | 0.0964 / 0.9988 | 0.2898 / 0.9084 | 0.3722 / 0.6126 | 0.1687 / 0.2404 | 0.0693 / 0.0717 | | | 4.3921 | 3.8115 |
| 4 | Structure cccccc | 0.5877 | 0.0000 / 1.0000 | 0.0000 / 1.0000 | 0.1200 / 0.9859 | 0.3935 / 0.8659 | 0.2625 / 0.4724 | 0.1625 / 0.2099 | 0.0090 / 0.0474 | 0.0518 / 0.0465 | 5.2287 | 4.7412 |
| 5 | Structure ccc | 0.3094 | 0.0118 / 1.0000 | 0.2229 / 0.9882 | 0.4062 / 0.7653 | 0.2101 / 0.3591 | 0.1091 / 0.1490 | 0.0339 / 0.0399 | | | 3.5801 | 3.2655 |

Empirical and theoretical frequencies

| N | $P(1)$ / $F(1)$ | $P(2)$ / $F(2)$ | $P(3)$ / $F(3)$ | $P(4)$ / $F(4)$ | $P(5)$ / $F(5)$ | $P(6)$ / $F(6)$ | $P(7)$ / $F(7)$ | $P(8)$ / $F(8)$ |
|---|---|---|---|---|---|---|---|---|
| 1 | – | 0.1181 / 0.1181 | 0.2757 / 0.2759 | 0.2821 / 0.2801 | 0.1932 / 0.1920 | 0.0873 / 0.0755 | 0.0277 / 0.0276 | 0.0000 / 0.0066 |
| 2 | – | 0.1271 / 0.1271 | 0.2800 / 0.2800 | 0.2833 / 0.2833 | 0.1945 / 0.1948 | 0.0789 / 0.0799 | 0.0240 / 0.0243 | 0.0000 / 0.0042 |
| 3 | – | 0.0556 / 0.0556 | 0.1978 / 0.1978 | 0.3145 / 0.3151 | 0.2418 / 0.2422 | 0.1307 / 0.1308 | 0.0433 / 0.0449 | 0.0000 / 0.0096 |
| 4 | – | – | 0.0745 / 0.0748 | 0.2624 / 0.2624 | 0.2872 / 0.2872 | 0.2163 / 0.2163 | 0.0496 / 0.0867 | 0.0496 / 0.0512 |
| 5 | – | 0.1636 / 0.1636 | 0.3488 / 0.3488 | 0.2543 / 0.2543 | 0.1429 / 0.1429 | 0.0579 / 0.0578 | 0.0122 / 0.0122 | 0.0031 / 0.0016 |
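English model distributions:

$$F_{\mathrm{mix}}(i) = e^{-0.6983}\left(0.2711\,\frac{(0.6983)^{i-1}}{(i-1)!} + 0.6000\,\frac{(0.6983)^{i-2}}{(i-2)!} + 0.1309\,\frac{(0.6983)^{i-3}}{(i-3)!}\right),$$

$$F_{cc}(i) = e^{-0.6639}\left(0.3304\,\frac{(0.6639)^{i-1}}{(i-1)!} + 0.5475\,\frac{(0.6639)^{i-2}}{(i-2)!} + 0.1167\,\frac{(0.6639)^{i-3}}{(i-3)!}\right),$$

$$F_{cccc}(i) = e^{-0.6242}\left(0.1474\,\frac{(0.6242)^{i-1}}{(i-1)!} + 0.5238\,\frac{(0.6242)^{i-2}}{(i-2)!} + 0.3337\,\frac{(0.6242)^{i-3}}{(i-3)!}\right),$$

$$F_{cccccc}(i) = e^{-0.1732}\left(0.1990\,\frac{(0.1732)^{i-2}}{(i-2)!} + 0.4945\,\frac{(0.1732)^{i-3}}{(i-3)!} + 0.3042\,\frac{(0.1732)^{i-4}}{(i-4)!}\right),$$

$$F_{ccc}(i) = e^{-0.2965}\left(0.1949\,\frac{(0.2965)^{i-1}}{(i-1)!} + 0.6289\,\frac{(0.2965)^{i-2}}{(i-2)!} + 0.1294\,\frac{(0.2965)^{i-3}}{(i-3)!} + 0.0320\,\frac{(0.2965)^{i-4}}{(i-4)!}\right).$$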
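The model distributions of this form can be evaluated directly. The sketch below does so for the English mixed case, using $a$ and $m(B_r)$ from Table 1; the function name `model_frequency` is hypothetical, and terms with $i < r$ are taken to be zero, which is an assumption about how the factorial of a negative argument is to be read.

```python
from math import exp, factorial

def model_frequency(i, a, m):
    """F(i) = e^{-a} * sum_r m[r] * a^(i-r) / (i-r)!, keeping only terms with r <= i."""
    return exp(-a) * sum(m_r * a ** (i - r) / factorial(i - r)
                         for r, m_r in m.items() if r <= i)

a_mix = 0.6983
m_mix = {1: 0.2711, 2: 0.6000, 3: 0.1309}  # focal probabilities m(B_r), Table 1

F = [model_frequency(i, a_mix, m_mix) for i in range(1, 9)]
print([round(x, 4) for x in F])
```

The first four values can be compared with the $F(1),\ldots,F(4)$ entries for the mixed case in Table 1; small differences in the last printed digit are to be expected because the tabulated parameters are rounded.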
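French model distributions:

$$F_{\mathrm{mix}}(i) = e^{-0.7099}\left(0.1218\,\frac{(0.7099)^{i-1}}{(i-1)!} + 0.5145\,\frac{(0.7099)^{i-2}}{(i-2)!} + 0.3608\,\frac{(0.7099)^{i-3}}{(i-3)!}\right),$$

$$F_{cc}(i) = e^{-0.6791}\left(0.1341\,\frac{(0.6791)^{i-1}}{(i-1)!} + 0.5353\,\frac{(0.6791)^{i-2}}{(i-2)!} + 0.3455\,\frac{(0.6791)^{i-3}}{(i-3)!}\right),$$

$$F_{cccc}(i) = e^{-0.3641}\left(0.0384\,\frac{(0.3641)^{i-1}}{(i-1)!} + 0.3109\,\frac{(0.3641)^{i-2}}{(i-2)!} + 0.6507\,\frac{(0.3641)^{i-3}}{(i-3)!} + 0.1573\,\frac{(0.3641)^{i-4}}{(i-4)!} + 0.0396\,\frac{(0.3641)^{i-5}}{(i-5)!}\right),$$

$$F_{cccccc}(i) = e^{-0.3287}\left(0.0711\,\frac{(0.3287)^{i-2}}{(i-2)!} + 0.5171\,\frac{(0.3287)^{i-3}}{(i-3)!} + 0.3050\,\frac{(0.3287)^{i-4}}{(i-4)!} + 0.0979\,\frac{(0.3287)^{i-5}}{(i-5)!}\right),$$

$$F_{ccc}(i) = e^{-0.6470}\left(0.1738\,\frac{(0.6470)^{i-1}}{(i-1)!} + 0.5846\,\frac{(0.6470)^{i-2}}{(i-2)!} + 0.2081\,\frac{(0.6470)^{i-3}}{(i-3)!}\right).$$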
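Latin model distributions:

$$F_{\mathrm{mix}}(i) = e^{-0.5456}\left(0.0139\,\frac{(0.5456)^{i-1}}{(i-1)!} + 0.2837\,\frac{(0.5456)^{i-2}}{(i-2)!} + 0.5203\,\frac{(0.5456)^{i-3}}{(i-3)!} + 0.1108\,\frac{(0.5456)^{i-4}}{(i-4)!}\right),$$

$$F_{cc}(i) = e^{-0.5515}\left(0.0149\,\frac{(0.5515)^{i-1}}{(i-1)!} + 0.3400\,\frac{(0.5515)^{i-2}}{(i-2)!} + 0.4815\,\frac{(0.5515)^{i-3}}{(i-3)!} + 0.1510\,\frac{(0.5515)^{i-4}}{(i-4)!}\right),$$

$$F_{cccc}(i) = e^{-0.5855}\left(0.0123\,\frac{(0.5855)^{i-1}}{(i-1)!} + 0.1693\,\frac{(0.5855)^{i-2}}{(i-2)!} + 0.6023\,\frac{(0.5855)^{i-3}}{(i-3)!} + 0.2284\,\frac{(0.5855)^{i-4}}{(i-4)!}\right),$$

$$F_{cccccc}(i) = e^{-0.4196}\left(0.0374\,\frac{(0.4196)^{i-2}}{(i-2)!} + 0.3209\,\frac{(0.4196)^{i-3}}{(i-3)!} + 0.5853\,\frac{(0.4196)^{i-4}}{(i-4)!}\right),$$

$$F_{ccc}(i) = e^{-0.5466}\left(0.0295\,\frac{(0.5466)^{i-1}}{(i-1)!} + 0.2724\,\frac{(0.5466)^{i-2}}{(i-2)!} + 0.5895\,\frac{(0.5466)^{i-3}}{(i-3)!} + 0.1284\,\frac{(0.5466)^{i-4}}{(i-4)!}\right).$$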
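Spanish model distributions:

$$F_{\mathrm{mix}}(i) = e^{-0.7504}\left(0.0044\,\frac{(0.7504)^{i-1}}{(i-1)!} + 0.2219\,\frac{(0.7504)^{i-2}}{(i-2)!} + 0.4985\,\frac{(0.7504)^{i-3}}{(i-3)!} + 0.1719\,\frac{(0.7504)^{i-4}}{(i-4)!} + 0.0985\,\frac{(0.7504)^{i-5}}{(i-5)!}\right),$$

$$F_{cc}(i) = e^{-0.5475}\left(0.0054\,\frac{(0.5475)^{i-1}}{(i-1)!} + 0.2420\,\frac{(0.5475)^{i-2}}{(i-2)!} + 0.4634\,\frac{(0.5475)^{i-3}}{(i-3)!} + 0.2897\,\frac{(0.5475)^{i-4}}{(i-4)!} + 0.0008\,\frac{(0.5475)^{i-5}}{(i-5)!}\right),$$

$$F_{cccc}(i) = e^{-0.6304}\left(0.0009\,\frac{(0.6304)^{i-1}}{(i-1)!} + 0.0586\,\frac{(0.6304)^{i-2}}{(i-2)!} + 0.4531\,\frac{(0.6304)^{i-3}}{(i-3)!} + 0.4279\,\frac{(0.6304)^{i-4}}{(i-4)!} + 0.0474\,\frac{(0.6304)^{i-5}}{(i-5)!} + 0.0157\,\frac{(0.6304)^{i-6}}{(i-6)!}\right),$$

$$F_{cccccc}(i) = e^{-0.2930}\left(0.0011\,\frac{(0.2930)^{i-2}}{(i-2)!} + 0.0840\,\frac{(0.2930)^{i-3}}{(i-3)!} + 0.3829\,\frac{(0.2930)^{i-4}}{(i-4)!} + 0.3863\,\frac{(0.2930)^{i-5}}{(i-5)!} + 0.1243\,\frac{(0.2930)^{i-6}}{(i-6)!} + 0.0229\,\frac{(0.2930)^{i-7}}{(i-7)!}\right),$$

$$F_{ccc}(i) = e^{-0.2402}\left(0.1527\,\frac{(0.2402)^{i-2}}{(i-2)!} + 0.4389\,\frac{(0.2402)^{i-3}}{(i-3)!} + 0.3431\,\frac{(0.2402)^{i-4}}{(i-4)!} + 0.0591\,\frac{(0.2402)^{i-5}}{(i-5)!} + 0.0035\,\frac{(0.2402)^{i-6}}{(i-6)!}\right).$$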
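Georgian model distributions:

$$F_{\mathrm{mix}}(i) = e^{-0.6216}\left(0.0136\,\frac{(0.6216)^{i-1}}{(i-1)!} + 0.2199\,\frac{(0.6216)^{i-2}}{(i-2)!} + 0.3770\,\frac{(0.6216)^{i-3}}{(i-3)!} + 0.2446\,\frac{(0.6216)^{i-4}}{(i-4)!} + 0.1244\,\frac{(0.6216)^{i-5}}{(i-5)!}\right),$$

$$F_{cc}(i) = e^{-0.6546}\left(0.0110\,\frac{(0.6546)^{i-1}}{(i-1)!} + 0.2446\,\frac{(0.6546)^{i-2}}{(i-2)!} + 0.3787\,\frac{(0.6546)^{i-3}}{(i-3)!} + 0.2449\,\frac{(0.6546)^{i-4}}{(i-4)!} + 0.1244\,\frac{(0.6546)^{i-5}}{(i-5)!}\right),$$

$$F_{cccc}(i) = e^{-0.5499}\left(0.0012\,\frac{(0.5499)^{i-1}}{(i-1)!} + 0.0964\,\frac{(0.5499)^{i-2}}{(i-2)!} + 0.2898\,\frac{(0.5499)^{i-3}}{(i-3)!} + 0.3722\,\frac{(0.5499)^{i-4}}{(i-4)!} + 0.1687\,\frac{(0.5499)^{i-5}}{(i-5)!} + 0.0693\,\frac{(0.5499)^{i-6}}{(i-6)!}\right),$$

$$F_{cccccc}(i) = e^{-0.5877}\left(0.1200\,\frac{(0.5877)^{i-3}}{(i-3)!} + 0.3935\,\frac{(0.5877)^{i-4}}{(i-4)!} + 0.2625\,\frac{(0.5877)^{i-5}}{(i-5)!} + 0.1625\,\frac{(0.5877)^{i-6}}{(i-6)!} + 0.0090\,\frac{(0.5877)^{i-7}}{(i-7)!} + 0.0518\,\frac{(0.5877)^{i-8}}{(i-8)!}\right),$$

$$F_{ccc}(i) = e^{-0.3094}\left(0.0118\,\frac{(0.3094)^{i-1}}{(i-1)!} + 0.2229\,\frac{(0.3094)^{i-2}}{(i-2)!} + 0.4062\,\frac{(0.3094)^{i-3}}{(i-3)!} + 0.2101\,\frac{(0.3094)^{i-4}}{(i-4)!} + 0.1091\,\frac{(0.3094)^{i-5}}{(i-5)!} + 0.0339\,\frac{(0.3094)^{i-6}}{(i-6)!}\right).$$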
Table 6

| Language | Structure | Phonological structure length |
|---|---|---|
| English | Mixed case | $\tilde{1} = (1.0000/1, 0.7289/2, 0.1281/3)$ |
| | cc | $\tilde{1} = (1.0000/1, 0.6696/2, 0.1221/3)$ |
| | cccc | $\tilde{1} = (1.0000/1, 0.8525/2, 0.3287/3)$ |
| | cccccc | $\widetilde{(1,2)} = (1.0000/1, 1.0000/2, 0.8018/3, 0.3042/4)$ |
| | ccc | $\tilde{1} = (1.0000/1, 0.8051/2, 0.1762/3, 0.0468/4)$ |
| French | Mixed case | $\tilde{1} = (1.0000/1, 0.8732/2, 0.3737/3, 0.0029/4, 0.0029/5)$ |
| | cc | $\tilde{1} = (1.0000/1, 0.8659/2, 0.3300/3, 0.0096/4)$ |
| | cccc | $\tilde{1} = (1.0000/1, 0.9616/2, 0.6507/3, 0.1857/4, 0.0284/5)$ |
| | cccccc | $\widetilde{(1,2)} = (1.0000/1, 1.0000/2, 0.9289/3, 0.4118/4, 0.0968/5)$ |
| | ccc | $\tilde{1} = (1.0000/1, 0.8262/2, 0.2416/3, 0.0335/4)$ |
| Latin | Mixed case | $\tilde{1} = (1.0000/1, 0.9860/2, 0.7023/3, 0.1820/4, 0.0053/5)$ |
| | cc | $\tilde{1} = (1.0000/1, 0.9851/2, 0.6451/3, 0.1636/4, 0.1260/5)$ |
| | cccc | $\tilde{1} = (1.0000/1, 0.9877/2, 0.8184/3, 0.2161/4)$ |
| | cccccc | $\widetilde{(1,2)} = (1.0000/1, 1.0000/2, 0.9626/3, 0.6417/4, 0.0902/5)$ |
| | ccc | $\tilde{1} = (1.0000/1, 0.9775/2, 0.7061/3, 0.1156/4)$ |
| Spanish | Mixed case | $\tilde{1} = (1.0000/1, 0.9956/2, 0.7737/3, 0.2752/4, 0.1033/5)$ |
| | cc | $\tilde{1} = (1.0000/1, 0.9405/2, 0.7526/3, 0.2892/4, 0.0601/5)$ |
| | cccc | $\tilde{1} = (1.0000/1, 0.9991/2, 0.9405/3, 0.4874/4, 0.0595/5, 0.0121/6)$ |
| | cccccc | $\widetilde{(1,2)} = (1.0000/1, 1.0000/2, 0.9989/3, 0.9149/4, 0.5320/5, 0.1457/6, 0.0214/7)$ |
| | ccc | $\tilde{1} = (1.0000/1, 1.0000/2, 0.8473/3, 0.4184/4, 0.0653/5, 0.0062/6, 0.0013/7)$ |
| Georgian | Mixed case | $\tilde{1} = (1.0000/1, 0.9864/2, 0.7665/3, 0.3895/4, 0.1419/5, 0.0213/6)$ |
| | cc | $\tilde{1} = (1.0000/1, 0.9890/2, 0.7444/3, 0.3657/4, 0.1208/5)$ |
| | cccc | $\tilde{1} = (1.0000/1, 0.9988/2, 0.9084/3, 0.6126/4, 0.2404/5, 0.0717/6)$ |
| | cccccc | $\widetilde{(1,2)} = (1.0000/1, 1.0000/2, 0.8659/3, 0.8659/4, 0.4724/5, 0.2099/6, 0.0474/7, 0.0465/8)$ |
| | ccc | $\tilde{1} = (1.0000/1, 0.9882/2, 0.7653/3, 0.3591/4, 0.1490/5, 0.0399/6)$ |
The findings of this study suggest that real structures are characterized by fuzzy phonological lengths (the number of sounds in the structure). Words (corteges of chosen elements) are represented by mixtures of certain focal corteges $B_r$ with focal probabilities $m(B_r)$ and by a fuzzy unimodal structure with a length of "approximately 1"; such a model is assumed for the mixed case and the bags $(1/v, 2/c)$, $(1/v, 3/c)$ and $(1/v, 1/u)$. The bag $(2/v, 6/c)$ corresponds to the fuzzy bimodal structure model with a length of "approximately 2".
The data concerning fuzzy structure lengths are to be found in Table 6.
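The tabulated membership grades appear to follow a simple recursion in the focal probabilities: the grade of length 1 is 1, and each subsequent grade is the previous one minus $m(B_r)$. This is an observation drawn from the values in Tables 1 and 6, not a formula quoted from this section, so the sketch below should be read under that assumption; the function name `fuzzy_length` is hypothetical.

```python
def fuzzy_length(m):
    """Membership grades mu(1), mu(2), ... with mu(1) = 1 and mu(r+1) = mu(r) - m(B_r).

    m is the list of focal probabilities m(B_1), m(B_2), ...
    (assumed relation, inferred from the tabulated values).
    """
    grades = [1.0]
    for m_r in m[:-1]:
        grades.append(grades[-1] - m_r)
    return grades

m_cc = [0.3304, 0.5475, 0.1167]  # English, structure cc (Table 1)
grades = fuzzy_length(m_cc)
print([round(g, 4) for g in grades])
```

For the English structure cc, the result can be compared with the corresponding row of Table 6.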
References
[1] G. Birkhoff, Lattice Theory, NY, 1981.
[2] F. Criado, T. Gachechiladze, Fuzzy random events and their corresponding conditional probability measures, Real Academia de Ciencias Exactas LXXXIX (1995).
[3] W. Fucks, Mathematical theory of word formation, Communication Theory, London, 1953.
[4] T. Gachechiladze, T. Manjaparashvili, Fuzzy generalized Bernoulli distributions, in: Proceedings of Tbilisi State University, Cybernetics, Applied Mathematics, vol. 224, 1981.
[5] T. Gachechiladze, T. Manjaparashvili, On fuzzy sets, Rep. of Tbilisi University 279 (1988).
[6] T. Gachechiladze, T. Manjaparashvili, Fuzzy random events and corresponding probability measures, Rep. of Tbilisi University (1990) 300.
[7] T. Gachechiladze, T. Manjaparashvili, Fuzzy linguistical models, in: Quantitative Linguistic, Tallin-Tbilisi, 1990.
[8] S. Kullback, Information Theory and Statistics, John Wiley, London, 1958.
[9] B. Mandelbrot, An information theory and statistical structure of language, Communication Theory, London, 1953.
[10] R. Megrelishvili, Structures of word and mathematical theory of word formation, in: Proceedings of Tbilisi State University, Cybernetics, Applied Mathematics, vol. 289, 1989.
[11] H. Weyl, Philosophie der Mathematik und Naturwissenschaft, in: A. Bäumler, M. Schröter (Eds.), Handbuch der Philosophie, 1927.
[12] R. Yager, On the measures of fuzziness and negation, I, Intern. J. General Systems 5 (1979) 221.
[13] R. Yager, Level sets for evaluation of the grade of membership of the fuzzy sets, in: R. Yager (Ed.), Fuzzy Sets and Possibility Theory, Pergamon Press, Oxford, 1984.
[14] R. Yager, On the theory of bags, Tech Report M11-601, IONA college, Machine Intelligence Inst. (1986).
[15] L. Zadeh, Fuzzy sets, Inform. and Control (8) (1965) 338.
[16] L. Zadeh, Probability measures and fuzzy events, J. Math. Anal. and Applic. 23 (1968) 424.
[17] L. Zadeh, The concept of linguistic variable and its application to approximate reasoning, Information Sciences 8 (1975) 199–249 (see also pp. 301–357).