

The bag model in language statistics

F. Criado a,*, T. Gachechiladze b, H. Meladze b, G. Tsertsradze b

a Facultad de Ciencias, Campus de Teatinos, Universidad de Málaga, 29071 Málaga, Spain

b Department of Applied Mathematics and Computer Science, Tbilisi State University, 1, Chavchavadze Ave, Tbilisi 380028, Georgia

Received 8 May 2000; received in revised form 3 November 2001; accepted 30 January 2002

Abstract

In this paper, fuzzy quantitative models of language statistics are constructed. All suggested models are based on the assumption of a superposition of two kinds of uncertainty: probabilistic and possibilistic. This superposition is realized in statistical distributions by the probability measure splitting procedure. In this way, fuzzy versions of the generalized binomial, Fucks and Zipf–Mandelbrot distributions are constructed, describing the probabilistic and possibilistic organization of language at any level: morphological, syntactic or phonological. The main problem when constructing the quantitative model of a fuzzy linear structure is finding the corresponding linguistic spectrum, which reduces to solving systems of algebraic or transcendental equations by inverse spline interpolation. In the final sections, the general linear mathematical model of language structures is described briefly, together with bag statistics for the consonantal structures of languages.

Keywords: Fuzzy sets; Membership functions; Probability theory; Linguistic modeling

www.elsevier.com/locate/ins

* Corresponding author.

E-mail address: [email protected] (F. Criado).


1. Introduction

Fuzzy logic and fuzzy set theory were initially proposed to describe linguistic variables, i.e., to describe the meaning of words in natural language. Originally, Zadeh thought that linguistics would be one of the major fields of application for this new formalism. Surprisingly, the main area of application is now control, and in comparison with control there are only a few applications in linguistics. In view of this, this paper describes a new approach to the study of natural language.

Section 2 gives a new approach in which fuzzy sets are represented as the result of a procedure that splits ordinary subsets of some universal set; this representation is convenient for describing superpositions of possibilistic and probabilistic uncertainty.

Section 3 considers the characteristic laws of the lattice of split subsets (especially pseudo-complements and relative pseudo-complements) within the Brouwerian lattice of indicators (membership functions) of fuzzy sets; as a consequence, measures of fuzziness are considered in Section 5. There, fuzziness is characterized by a relation between $\tilde{A}$ and its dual $\tilde{A}^D$, which underlines the fact that fuzziness is an intrinsic property of $\tilde{A}$ and is independent of the pseudo-complement.

The set splitting procedure is a new tool for defining and calculating the probabilities of random fuzzy events (Section 6). On this basis, Section 7 obtains new generalizations of the binomial, Zipf–Mandelbrot and other distributions, describing the possibilistic–probabilistic organization of structures created by different language elements. In Section 8 the general linear mathematical model of language structures and its main characteristics are described briefly. The possibilistic characteristics of these models are represented by the components of the so-called linguistic spectrum.

Applications of these models to language structures are presented in Section 9.

2. Set splitting

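Let $X$ be a finite set and $A$ any subset, $A \subseteq X$. Consider a correspondence $I_A \to (I_{\tilde{A}}, I_{\tilde{A}^D})$, where $I_A$ is the indicator of the subset $A$, $I_{\tilde{A}}, I_{\tilde{A}^D} \in [0,1]^X$ and

$$I_A(x) = I_{\tilde{A}}(x) + I_{\tilde{A}^D}(x) \quad \forall x \in X. \qquad (1)$$

$A$ is a support of the mappings $I_{\tilde{A}}$ and $I_{\tilde{A}^D}$.

According to Zadeh [15], the splitting components $I_{\tilde{A}}$ and $I_{\tilde{A}^D}$ are fuzzy subsets of $X$. Call I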


The procedure in which the indicator $I_A$ is associated with a pair $(I_{\tilde{A}}, I_{\tilde{A}^D})$ is called "splitting of the indicator $I_A$ (subset $A$)".¹

The splitting procedure of some subsets $A, B \subseteq X$ induces the corresponding splitting of the union and intersection of these two subsets. For the split indicators $I_{\widetilde{A\cap B}}$ and $I_{\widetilde{A\cup B}}$ it is essential to fulfil the natural conditions (as for non-split ones)

$$I_{\tilde{A}}(x),\, I_{\tilde{B}}(x) \ge I_{\widetilde{A\cap B}}(x); \qquad I_{\tilde{A}}(x),\, I_{\tilde{B}}(x) \le I_{\widetilde{A\cup B}}(x), \qquad x \in X.$$

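Then, as is easy to see, the following expressions are obtained for the intersection and union indicators:

$$I_{\widetilde{A\cap B}}(x) = I_{\tilde{A}}(x) \wedge I_{\tilde{B}}(x) \quad \forall x \in X \quad (\wedge = \min) \quad \text{(simultaneous splitting)}; \qquad (2)$$

$$I_{\widetilde{\widetilde{A\cap B}}}(x) = I_{\tilde{A}}(x)\, I_{\tilde{B}}(x) \quad \forall x \in X \quad \text{(sequential splitting)}; \qquad (3)$$

$$I_{\widetilde{\widetilde{A\cup B}}}(x) = I_{\tilde{A}}(x) + I_{\tilde{B}}(x) - I_{\widetilde{\widetilde{A\cap B}}}(x) \quad \forall x \in X \quad \text{(sequential splitting)}; \qquad (4)$$

$$I_{\widetilde{A\cup B}}(x) = I_{\tilde{A}}(x) \vee I_{\tilde{B}}(x) \quad \forall x \in X \quad (\vee = \max) \quad \text{(simultaneous splitting)}. \qquad (5)$$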
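As an illustration only, the splitting (1) and the rules (2)–(5) can be sketched in Python on a small finite universe. The function names and membership values are hypothetical. The split is assumed to take the form $I_{\tilde A} = \mu I_A$, $I_{\tilde A^D} = (1-\mu) I_A$. Dyadic membership values are used so that the floating-point equalities are exact.

```python
# Minimal sketch of indicator splitting (1) and the composition rules (2)-(5)
# on a finite universe. All membership values are illustrative only.

def split(indicator, mu):
    """Return (I_tilde, I_dual) with I_tilde = mu * I_A and I_dual = (1 - mu) * I_A."""
    tilde = {x: mu[x] * indicator[x] for x in indicator}
    dual = {x: (1 - mu[x]) * indicator[x] for x in indicator}
    return tilde, dual

def meet_simultaneous(a, b):  # rule (2): pointwise minimum
    return {x: min(a[x], b[x]) for x in a}

def meet_sequential(a, b):  # rule (3): pointwise product
    return {x: a[x] * b[x] for x in a}

def join_sequential(a, b):  # rule (4): a + b - a * b
    return {x: a[x] + b[x] - a[x] * b[x] for x in a}

def join_simultaneous(a, b):  # rule (5): pointwise maximum
    return {x: max(a[x], b[x]) for x in a}

I_A = {"x1": 1, "x2": 1, "x3": 0, "x4": 1}
I_B = {"x1": 1, "x2": 0, "x3": 1, "x4": 1}
A_t, A_d = split(I_A, {"x1": 0.75, "x2": 0.5, "x3": 0.25, "x4": 0.25})
B_t, B_d = split(I_B, {"x1": 0.5, "x2": 0.75, "x3": 0.5, "x4": 1.0})

# Condition (1): the two components add up to the crisp indicator.
assert all(A_t[x] + A_d[x] == I_A[x] for x in I_A)

# Natural conditions: the intersection lies below, the union above, both components.
for x in I_A:
    for meet, join in ((meet_simultaneous, join_simultaneous),
                       (meet_sequential, join_sequential)):
        assert meet(A_t, B_t)[x] <= min(A_t[x], B_t[x])
        assert join(A_t, B_t)[x] >= max(A_t[x], B_t[x])
```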

3. The lattice of split elements of ordinary indicators’ Boolean lattice

Consider the Boolean lattice $\mathcal{I} = (\{0,1\}^X, \vee, \wedge)$ with the natural order. The set of all split elements of this lattice with the natural order, $\tilde{\mathcal{I}} = ([0,1]^X, \vee, \wedge)$, is a lattice.

Theorem 1. $\tilde{\mathcal{I}}$ is a Brouwerian lattice.

A direct proof of this theorem (i.e., the proof that for any two elements $I_{\tilde{A}}$ and $I_{\tilde{B}} \in \tilde{\mathcal{I}}$ the set of all $I_{\tilde{X}} \in \tilde{\mathcal{I}}$ such that $I_{\tilde{A}} \wedge I_{\tilde{X}} \le I_{\tilde{B}}$ has a greatest element $(I_{\tilde{B}} : I_{\tilde{A}})$, called the relative pseudo-complement of $I_{\tilde{A}}$ in $I_{\tilde{B}}$) can be given following [1]. It is easy to see that

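¹ Notice that the component $I_{\tilde{A}}$ can be split again: $I_{\tilde{A}} = (I_{\tilde{\tilde{A}}}, I_{\tilde{\tilde{A}}^D})$, where $I_{\tilde{\tilde{A}}} = \nu I_{\tilde{A}}$, $I_{\tilde{\tilde{A}}^D} = (1-\nu) I_{\tilde{A}}$, $\nu : X \to [0,1]$. Two sequential splittings induce the splitting of the initial subset $I_A = (\nu\mu I_A, (1-\nu\mu) I_A)$, where $\mu I_A = I_{\tilde{A}}$, $(1-\mu) I_A = I_{\tilde{A}^D}$, $\mu : X \to [0,1]$.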


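$$(I_{\tilde{B}} : I_{\tilde{A}})(x) = \begin{cases} 1, & I_{\tilde{A}}(x) \le I_{\tilde{B}}(x), \\ I_{A^C}(x) \vee I_{\tilde{B}}(x), & I_{\tilde{A}}(x) > I_{\tilde{B}}(x), \end{cases} \qquad \forall x \in X, \qquad (6)$$

where $I_{A^C} = (I_{\emptyset} : I_{\tilde{A}})$ is the pseudo-complement of $I_{\tilde{A}}$ and, as a function of $x$, represents the indicator of the usual complement of the set $A$ in $X$. Next, the following theorem is easy to demonstrate.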

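Theorem 2. The following statements hold in the lattice $\tilde{\mathcal{I}}$:

$$\begin{aligned}
&(\mathrm{i}) && \text{if } I_{\tilde{A}} \le I_{\tilde{B}}, \text{ then } (I_\emptyset : I_{\tilde{B}}) \le (I_\emptyset : I_{\tilde{A}});\\
&(\mathrm{ii}) && I_{\tilde{A}} \le (I_\emptyset : (I_\emptyset : I_{\tilde{A}}));\\
&(\mathrm{iii}) && (I_\emptyset : I_{\tilde{A}}) = (I_\emptyset : (I_\emptyset : (I_\emptyset : I_{\tilde{A}})));\\
&(\mathrm{iv}) && (I_\emptyset : (I_{\tilde{A}} \vee I_{\tilde{B}})) = (I_\emptyset : I_{\tilde{A}}) \wedge (I_\emptyset : I_{\tilde{B}});\\
&(\mathrm{v}) && (I_\emptyset : (I_{\tilde{A}} \wedge I_{\tilde{B}})) = (I_\emptyset : I_{\tilde{A}}) \vee (I_\emptyset : I_{\tilde{B}});\\
&(\mathrm{vi}) && (I_{\tilde{A}} : I_{\tilde{B}}) \wedge I_{\tilde{A}} = I_{\tilde{A}};\\
&(\mathrm{vii}) && (I_{\tilde{A}} : I_{\tilde{B}}) \wedge I_{\tilde{B}} = I_{\tilde{A}} \wedge I_{\tilde{B}};\\
&(\mathrm{viii}) && ((I_{\tilde{A}} \wedge I_{\tilde{B}}) : I_{\tilde{C}}) = (I_{\tilde{A}} : I_{\tilde{C}}) \wedge (I_{\tilde{B}} : I_{\tilde{C}});\\
&(\mathrm{ix}) && (I_{\tilde{A}} : (I_{\tilde{B}} \vee I_{\tilde{C}})) = (I_{\tilde{A}} : I_{\tilde{B}}) \wedge (I_{\tilde{A}} : I_{\tilde{C}}).
\end{aligned} \qquad (7)$$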
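The relative pseudo-complement (6) can be checked numerically. The sketch below is a hypothetical set of helpers, not part of the paper. It represents elements of $\tilde{\mathcal{I}}$ as lists of values on a finite universe and reads (6) pointwise. It then tests the greatest-element property behind Theorem 1, together with items (iv), (v), (vii), (viii) and (ix) of Theorem 2, on random elements. Item (viii) is tested in the form $((I_{\tilde A} \wedge I_{\tilde B}) : I_{\tilde C}) = (I_{\tilde A} : I_{\tilde C}) \wedge (I_{\tilde B} : I_{\tilde C})$.

```python
import random

# Elements of [0,1]^X on a finite universe, stored as equal-length lists.
def meet(a, b):
    return [min(x, y) for x, y in zip(a, b)]

def join(a, b):
    return [max(x, y) for x, y in zip(a, b)]

def leq(a, b):
    return all(x <= y for x, y in zip(a, b))

def rel_pc(b, a):
    """(b : a) following (6): 1 where a(x) <= b(x), otherwise b(x).
    Where a(x) > b(x) >= 0 the point lies in the support of A, so the
    term I_{A^C}(x) in (6) is 0 and the value reduces to b(x)."""
    return [1.0 if ax <= bx else bx for ax, bx in zip(a, b)]

def pc(a):
    """Pseudo-complement (I_empty : a): 1 where a(x) == 0, else 0."""
    return rel_pc([0.0] * len(a), a)

def check_theorem_2(trials=500, n=6, seed=0):
    rng = random.Random(seed)
    grid = [0.0, 0.25, 0.5, 0.75, 1.0]  # exact zeros make pseudo-complements non-trivial
    def elem():
        return [rng.choice(grid) for _ in range(n)]
    for _ in range(trials):
        A, B, C, Xs = elem(), elem(), elem(), elem()
        # (B : A) is the greatest X with A ^ X <= B
        assert leq(meet(A, rel_pc(B, A)), B)
        if leq(meet(A, Xs), B):
            assert leq(Xs, rel_pc(B, A))
        assert pc(join(A, B)) == meet(pc(A), pc(B))                        # (iv)
        assert pc(meet(A, B)) == join(pc(A), pc(B))                        # (v)
        assert meet(rel_pc(A, B), B) == meet(A, B)                         # (vii)
        assert rel_pc(meet(A, B), C) == meet(rel_pc(A, C), rel_pc(B, C))   # (viii)
        assert rel_pc(A, join(B, C)) == meet(rel_pc(A, B), rel_pc(A, C))   # (ix)
    return True
```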

4. The splitting of a set

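The splitting of a set, which, as already seen, corresponds to the splitting of its indicator, is represented by

$$\big(I_A \to (I_{\tilde{A}}, I_{\tilde{A}^D})\big) \Leftrightarrow \big(A \to (\tilde{A}, \tilde{A}^D)\big), \qquad \big(I_A = I_{\tilde{A}} + I_{\tilde{A}^D}\big) \Leftrightarrow \big(A = \tilde{A} \oplus \tilde{A}^D\big). \qquad (8)$$

Here $\oplus$ is the operation of set synthesis.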

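On the basis of (8) one can obtain the more general expression $\tilde{A} \oplus \tilde{B}$, which will obviously make sense provided that $\tilde{B} \subseteq \neg\tilde{A}$ or, equivalently, $\tilde{A} \subseteq \neg\tilde{B}$ (i.e., $I_{\tilde{A}} + I_{\tilde{B}} \le 1$). One can also obtain the existence conditions for expressions such as $\tilde{A} \oplus \tilde{B} \oplus \tilde{C}$.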

Considering that such a condition holds for the above expressions, one can easily prove that


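$$\begin{aligned}
&(\mathrm{i}) && \tilde{A} \oplus \tilde{B} = \tilde{B} \oplus \tilde{A};\\
&(\mathrm{ii}) && \tilde{A} \oplus (\tilde{B} \oplus \tilde{C}) = (\tilde{A} \oplus \tilde{B}) \oplus \tilde{C};\\
&(\mathrm{iii}) && (\tilde{A} \oplus \tilde{A}^D) \cap (\tilde{B} \oplus \tilde{B}^D) = \widetilde{A\cap B} \oplus (\widetilde{A\cap B})^D = (\tilde{A} \cap \tilde{B}) \oplus [(A \cap \tilde{B}^D) \cup (\tilde{A}^D \cap B)];\\
&(\mathrm{iv}) && (\tilde{A} \oplus \tilde{A}^D) \cup (\tilde{B} \oplus \tilde{B}^D) = \widetilde{A\cup B} \oplus (\widetilde{A\cup B})^D = (\tilde{A} \cup \tilde{B}) \oplus [(\tilde{A}^D \cap \tilde{B}^D) \cup (A^C \cap \tilde{B}^D) \cup (\tilde{A}^D \cap B^C)];\\
&(\mathrm{v}) && \tilde{A} \oplus (\tilde{B} \cap \tilde{C}) = (\tilde{A} \oplus \tilde{B}) \cap (\tilde{A} \oplus \tilde{C});\\
&(\mathrm{vi}) && \tilde{A} \oplus (\tilde{B} \cup \tilde{C}) = (\tilde{A} \oplus \tilde{B}) \cup (\tilde{A} \oplus \tilde{C}).
\end{aligned} \qquad (9)$$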

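For example, to prove the last two formulae, one can write

(v) $\tilde{A} \oplus (\tilde{B} \cap \tilde{C}) \Leftrightarrow I_{\tilde{A}} + (I_{\tilde{B}} \wedge I_{\tilde{C}}) = (I_{\tilde{A}} + I_{\tilde{B}}) \wedge (I_{\tilde{A}} + I_{\tilde{C}}) \Leftrightarrow (\tilde{A} \oplus \tilde{B}) \cap (\tilde{A} \oplus \tilde{C})$;

(vi) $\tilde{A} \oplus (\tilde{B} \cup \tilde{C}) \Leftrightarrow I_{\tilde{A}} + (I_{\tilde{B}} \vee I_{\tilde{C}}) = (I_{\tilde{A}} + I_{\tilde{B}}) \vee (I_{\tilde{A}} + I_{\tilde{C}}) \Leftrightarrow (\tilde{A} \oplus \tilde{B}) \cup (\tilde{A} \oplus \tilde{C})$.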
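Formulas (iii) and (iv) of (9) can be spot-checked the same way. In this sketch (hypothetical helper names), split subsets are lists of membership values. Set synthesis $\oplus$ is taken to be the pointwise sum of indicators, defined only where the sum stays within $[0,1]$, and $\cap$, $\cup$ are pointwise min and max. The check confirms that each bracketed dual part restores the crisp intersection or union.

```python
import random

def meet(a, b):
    return [min(x, y) for x, y in zip(a, b)]

def join(a, b):
    return [max(x, y) for x, y in zip(a, b)]

def synth(a, b, eps=1e-12):
    """Set synthesis as a pointwise sum of indicators; undefined if the sum exceeds 1."""
    s = [x + y for x, y in zip(a, b)]
    if any(v > 1 + eps for v in s):
        raise ValueError("synthesis undefined: membership sum exceeds 1")
    return s

def random_split(rng, n):
    """Random crisp indicator I_A together with a random split (I_tilde, I_dual)."""
    ind = [rng.randint(0, 1) for _ in range(n)]
    mu = [rng.random() for _ in range(n)]
    return ind, [m * i for m, i in zip(mu, ind)], [(1 - m) * i for m, i in zip(mu, ind)]

def close(u, v, eps=1e-12):
    return all(abs(x - y) < eps for x, y in zip(u, v))

def check_9_iii_iv(trials=300, n=8, seed=1):
    rng = random.Random(seed)
    for _ in range(trials):
        IA, At, Ad = random_split(rng, n)
        IB, Bt, Bd = random_split(rng, n)
        AC = [1 - v for v in IA]  # crisp complements
        BC = [1 - v for v in IB]
        # (iii): (A~ ∩ B~) ⊕ [(A ∩ B~^D) ∪ (A~^D ∩ B)] gives the crisp A ∩ B
        dual_meet = join(meet(IA, Bd), meet(Ad, IB))
        assert close(synth(meet(At, Bt), dual_meet), meet(IA, IB))
        # (iv): (A~ ∪ B~) ⊕ [(A~^D ∩ B~^D) ∪ (A^C ∩ B~^D) ∪ (A~^D ∩ B^C)] gives A ∪ B
        dual_join = join(join(meet(Ad, Bd), meet(AC, Bd)), meet(Ad, BC))
        assert close(synth(join(At, Bt), dual_join), join(IA, IB))
    return True
```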

Let it be assumed that in these formulae the following relations hold:

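$$\big(I_{\widetilde{A\cap B}} = I_{\tilde{A}} \wedge I_{\tilde{B}}\big) \Leftrightarrow \big(\widetilde{A\cap B} = \tilde{A} \cap \tilde{B}\big), \qquad \big(I_{\widetilde{A\cup B}} = I_{\tilde{A}} \vee I_{\tilde{B}}\big) \Leftrightarrow \big(\widetilde{A\cup B} = \tilde{A} \cup \tilde{B}\big), \qquad (10)$$

which are evident because of (2), (5) and (8).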

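In the lattice of split subsets almost all Boolean lattice rules hold:

(1) Reflexivity: $\tilde{A} \subseteq \tilde{A}$.

(2) Antisymmetry: $(\tilde{A} \subseteq \tilde{B},\ \tilde{B} \subseteq \tilde{A}) \Rightarrow \tilde{A} = \tilde{B}$.

(3) Transitivity: $(\tilde{A} \subseteq \tilde{B},\ \tilde{B} \subseteq \tilde{C}) \Rightarrow \tilde{A} \subseteq \tilde{C}$.

(4) Idempotency: $\tilde{A} \cap \tilde{A} = \tilde{A}$ and $\tilde{A} \cup \tilde{A} = \tilde{A}$.

(5) Commutativity: $\tilde{A} \cap \tilde{B} = \tilde{B} \cap \tilde{A}$ and $\tilde{A} \cup \tilde{B} = \tilde{B} \cup \tilde{A}$.

(6) Associativity: $(\tilde{A} \cap \tilde{B}) \cap \tilde{C} = \tilde{A} \cap (\tilde{B} \cap \tilde{C})$ and $(\tilde{A} \cup \tilde{B}) \cup \tilde{C} = \tilde{A} \cup (\tilde{B} \cup \tilde{C})$.

(7) Distributivity: $\tilde{A} \cap (\tilde{B} \cup \tilde{C}) = (\tilde{A} \cap \tilde{B}) \cup (\tilde{A} \cap \tilde{C})$ and $\tilde{A} \cup (\tilde{B} \cap \tilde{C}) = (\tilde{A} \cup \tilde{B}) \cap (\tilde{A} \cup \tilde{C})$.

(8) Absorption laws: $\tilde{A} \cap (\tilde{A} \cup \tilde{B}) = \tilde{A}$ and $\tilde{A} \cup (\tilde{A} \cap \tilde{B}) = \tilde{A}$.

(9) Involution law for the fuzzy complement: $\neg(\neg\tilde{A}) = \tilde{A}$.

(10) Identity laws: $\tilde{A} \cup \emptyset = \tilde{A}$, $\tilde{A} \cap X = \tilde{A}$ and $\tilde{A} \cup X = X$, $\tilde{A} \cap \emptyset = \emptyset$.

(11) Order inversion laws: $(\tilde{A} \subseteq \tilde{B}) \Leftrightarrow (\neg\tilde{B} \subseteq \neg\tilde{A})$ and $(\tilde{A} \subseteq \tilde{B}) \Leftrightarrow (\tilde{B}^D \subseteq \tilde{A}^D)$.

(12) De Morgan's laws: $\neg(\tilde{A} \cup \tilde{B}) = \neg\tilde{A} \cap \neg\tilde{B}$ and $\neg(\tilde{A} \cap \tilde{B}) = \neg\tilde{A} \cup \neg\tilde{B}$.

In connection with the introduced notion of dual subsets one can prove the following laws: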


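(13) Involution law for the dual subset: $(\tilde{A}^D)^D = \tilde{A}$.

(14) Duality laws for the union and intersection of split subsets:

$$(\tilde{A} \cup \tilde{B})^D = (\tilde{A}^D \cap \tilde{B}^D) \cup (A^C \cap \tilde{B}^D) \cup (B^C \cap \tilde{A}^D), \qquad (\tilde{A} \cap \tilde{B})^D = (A \cap \tilde{B}^D) \cup (\tilde{A}^D \cap B).$$

Notice that the laws of contradiction and of the excluded middle (tertium non datur) do not hold in the lattice $\tilde{\mathcal{I}}$.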

5. Dual element and fuzziness (qualitative consideration)

As illustrated above, the dual element plays an important role in describing split subset lattices. We now consider the role of the dual element in understanding fuzziness.

There is an important difference between usual and fuzzy subsets. A usual (crisp) subset can be represented as an aggregate of real objects, whereas a fuzzy subset corresponds to the measured potential possibility of forming such an aggregate: a fuzzy subset is a medium of formation for a real aggregate. The term "medium of formation" is borrowed from Weyl [11] to underline the following circumstance: any sequence of research outcomes is the result of acts of free decision-making by the subject (observer), and any concrete sequence is a crisp finite subset of some universe, whereas a fuzzy subset is an analogue of Weyl's continuum.

In the lattice of fuzzy subsets, a dual element $\tilde{A}^D$ is defined by the splitting procedure [2,5,6]. Its sense can be explained as follows: the value of the membership function $I_{\tilde{A}}(x)$ is the degree of concordance of an element $x$ with the concept represented by $\tilde{A}$; the value $I_{\tilde{A}^D}(x)$ has the same sense with respect to the concept represented by $\tilde{A}^D$, which together with $\tilde{A}$ (as the pair $(\tilde{A}, \tilde{A}^D)$) defines a crisp subset $A$. The nearer (in some sense) $\tilde{A}$ and $\tilde{A}^D$ are [12], the fuzzier the statement "Elements of $A$ possess the property $\tilde{A}$ ($\tilde{A}^D$)". Below, a qualitative description of fuzziness is given analogously to [12], but with the following difference: in [12], fuzziness is characterized by the relation $f$ between $\tilde{A}$ and Zadeh's negation $\neg\tilde{A}$. Here, the less rigid relation $u$ between $\tilde{A}$ and $\tilde{A}^D$ is taken as the basis; in the authors' opinion, this underlines the fact that fuzziness is an intrinsic property of $\tilde{A}$ and is independent of the pseudo-complement. The relation $u$ is based on the betweenness relation in a distributive lattice, "$\tilde{C}$ is between $\tilde{A}$ and $\tilde{B}$", written $(\tilde{A}, \tilde{C}, \tilde{B})$, i.e. $\tilde{A} \cap \tilde{B} \subseteq \tilde{C} \subseteq \tilde{A} \cup \tilde{B}$ [12].

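Definition 1. Let $\tilde{X}, \tilde{Y} \in L$ (a distributive lattice). $\tilde{X}$ is no less fuzzy than $\tilde{Y}$, written $(\tilde{X}\,u\,\tilde{Y})$, if $\tilde{X} \cap Y$ and $(\tilde{X} \cap Y)^D = \tilde{X}^D \cap Y$ lie in $L$ between $\tilde{Y}$ and $\tilde{Y}^D$. Here

$$(\tilde{X}\,u\,\tilde{Y}) = \big[(\tilde{Y}, \tilde{X} \cap Y, \tilde{Y}^D) \text{ and } (\tilde{Y}, \tilde{X}^D \cap Y, \tilde{Y}^D)\big] \Leftrightarrow \tilde{Y} \cap \tilde{Y}^D \subseteq \tilde{X} \cap Y \subseteq \tilde{Y} \cup \tilde{Y}^D.$$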

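Theorem 3. The relation $u$ is reflexive and transitive on $L$, i.e.,

$$(\tilde{X}\,u\,\tilde{X}) \quad \text{and} \quad [(\tilde{X}\,u\,\tilde{Y}) \text{ and } (\tilde{Y}\,u\,\tilde{Z})] \Rightarrow (\tilde{X}\,u\,\tilde{Z}).$$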

It can be seen thatuonLis not antisymmetric and, therefore, not a partial order.

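Theorem 4. The relation $u$ on $L$ is such that

(1) $(\tilde{X}\,u\,\tilde{X}^D)$ and $(\tilde{X}^D\,u\,\tilde{X})$;

(2) $(\tilde{X}\,u\,\tilde{Y}) \Leftrightarrow (\tilde{X}^D\,u\,\tilde{Y}) \Leftrightarrow (\tilde{X}^D\,u\,\tilde{Y}^D) \Leftrightarrow (\tilde{X}\,u\,\tilde{Y}^D)$.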
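Definition 1 can be made concrete under a simplifying assumption: both fuzzy subsets have the whole finite universe as support. Then $\tilde Y^D = 1 - \tilde Y$ pointwise, and the restriction of $\tilde X$ to $Y$ is $\tilde X$ itself. Under that assumption, and reading the operator between $\tilde X$ and $Y$ as intersection, the relation $u$ reduces to a pointwise betweenness test. The helper names below are hypothetical.

```python
def dual(x):
    """Dual subset when the support is the whole universe: I_dual = 1 - I_tilde."""
    return [1 - v for v in x]

def no_less_fuzzy(x, y):
    """(X~ u Y~) per Definition 1, read pointwise with ∩ = min and ∪ = max,
    assuming both fuzzy subsets have the whole universe as support.
    Each value of X~ must lie between min(y, 1 - y) and max(y, 1 - y);
    that interval is symmetric about 1/2, so the same test covers X~^D."""
    return all(min(yv, 1 - yv) <= xv <= max(yv, 1 - yv) for xv, yv in zip(x, y))
```

For example, `no_less_fuzzy([0.25], [0.75])` and `no_less_fuzzy([0.75], [0.25])` both hold although the two arguments differ. This illustrates why $u$ is not antisymmetric.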

On the lattice $L$, let a relation $E$ be defined so that $(\tilde{X}\,E\,\tilde{Y})$ if $\tilde{X} = \tilde{Y}$ or $\tilde{X}^D = \tilde{Y}$ or $\tilde{X} = \tilde{Y}^D$. It can be shown that $E$ is an equivalence relation. Each equivalence class consists of a fuzzy subset and its respective dual. If $\tilde{X} = \tilde{X}^D$, then the equivalence class consists of only one element.

The subset consisting of any fuzzy subset and its respective dual subset is called the dual pair. According to Theorem 4, if one component of the dual pair is more fuzzy than any component of the other pair, then any component of the ﬁrst pair is more fuzzy than any component of the second pair. So it is reasonable to introduce the notion of fuzziness of the dual pair.

Definition 2. Let $\mathcal{L}$ be a set of dual pairs. Define on $\mathcal{L}$ a relation $U$ so that $(\tilde{u}\,U\,\tilde{v})$ for $\tilde{u}, \tilde{v} \in \mathcal{L}$ if the dual pair $\tilde{u}$ is no less fuzzy than the dual pair $\tilde{v}$.

It is easy to demonstrate that the relation $U$ on the set of dual pairs is a partial order relation.

6. Probability measure splitting

Let $(X, \mathcal{B}, p(\cdot))$ be a given probability space. The probability of an event $K \in \mathcal{B}$ is calculated by the formula

$$p(K) = \int_X I_K(x)\, p(dx). \qquad (11)$$

According to the splitting procedure of the set $K$, this formula can be rewritten in the following form:

$$p(\tilde{K} \oplus \tilde{K}^D) = \int_X I_{\tilde{K}}(x)\, p(dx) + \int_X I_{\tilde{K}^D}(x)\, p(dx), \qquad (12)$$

where $I_{\tilde{K}}$ is a $\mathcal{B}$-measurable membership function (the corresponding subset $\tilde{K}$ is a fuzzy random event). Define $p(\tilde{K})$ and $p(\tilde{K}^D)$ as follows:

$$p(\tilde{K}) = \int_X I_{\tilde{K}}(x)\, p(dx) \quad \text{and} \quad p(\tilde{K}^D) = \int_X I_{\tilde{K}^D}(x)\, p(dx), \qquad (13)$$

the probability of the fuzzy event $\tilde{K}$ and the probability of the dual fuzzy event $\tilde{K}^D$, respectively. The representation

$$p(K) = p(\tilde{K} \oplus \tilde{K}^D) = p(\tilde{K}) + p(\tilde{K}^D) \qquad (14)$$

is called the procedure of probability measure splitting [16].
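On a finite probability space the integrals in (11)–(13) become sums, and the splitting (14) can be verified directly. The space, the event and the membership values below are made up for illustration, and the helper name `prob` is hypothetical.

```python
def prob(membership, p):
    """Discrete analogue of (11)/(13): sum over x of I(x) * p(x)."""
    return sum(membership[x] * p[x] for x in p)

p = {"a": 0.5, "b": 0.25, "c": 0.25}       # illustrative probability space
I_K = {"a": 1, "b": 1, "c": 0}              # crisp event K = {a, b}
mu = {"a": 0.75, "b": 0.5, "c": 0.0}        # hypothetical split of K
I_Kt = {x: mu[x] * I_K[x] for x in p}       # fuzzy event K~
I_Kd = {x: I_K[x] - I_Kt[x] for x in p}     # dual fuzzy event K~^D

p_K, p_Kt, p_Kd = prob(I_K, p), prob(I_Kt, p), prob(I_Kd, p)
```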

7. Fuzzy distributions

7.1. Binomial distribution with fuzzy elementary events

Let $A = \{0, 1\}$ be the space of elementary events.

One can obtain the fuzzy elementary events by splitting the usual events $\{0\}$ and $\{1\}$. For the membership functions one can write

$$\chi_{\{0\}}(x) = \mu_0(x)\chi_{\{0\}}(x) + (1 - \mu_0(x))\chi_{\{0\}}(x), \qquad \chi_{\{1\}}(x) = \mu_1(x)\chi_{\{1\}}(x) + (1 - \mu_1(x))\chi_{\{1\}}(x), \qquad (15)$$

where $\mu_0, \mu_1 : A \to [0, 1]$, $x = 0, 1$.

According to (13), the probabilities of the fuzzy elementary events are

$$p\{\tilde{0}\} = \mu_0 p_0, \qquad p\{\tilde{1}\} = \mu_1 p_1, \qquad (16)$$

where $p_0$ and $p_1$ are the probabilities of the corresponding crisp events. Now it is easy to write the split binomial distribution corresponding to fuzzy elementary events. Only two variants will be considered: completely simultaneous and completely sequential splitting. The intermediate cases are of no particular interest and will not be considered here.

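For the completely simultaneous case, the split binomial distribution is

$$\begin{aligned} p(\tilde{B}_{n,n}) &= \mu_1 p_1^n,\\ p(\tilde{B}_{n,0}) &= \mu_0 (1 - p_1)^n,\\ p(\tilde{B}_{n,k}) &= (\mu_0 \wedge \mu_1)\binom{n}{k} p_1^k (1 - p_1)^{n-k}, \quad k = 1, \ldots, n-1, \end{aligned} \qquad (17)$$

where $\tilde{B}_{n,k}$ is the fuzzy Bernoulli event. The normalization factor is

$$p^{-1}(\tilde{A}_n) = \big[(\mu_0 \wedge \mu_1) + (\mu_1 - (\mu_0 \wedge \mu_1))\, p_1^n + (\mu_0 - (\mu_0 \wedge \mu_1))(1 - p_1)^n\big]^{-1}.$$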

For the completely sequential case one gets

$$p(\tilde{\tilde{B}}_{n,k}) = \binom{n}{k} (\mu_1 p_1)^k (\mu_0 (1 - p_1))^{n-k} \qquad (18)$$

and

$$p^{-1}(\tilde{\tilde{A}}_n) = [\mu_0 + (\mu_1 - \mu_0) p_1]^{-n}.$$
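The two split binomial distributions and their normalization factors can be computed directly. This sketch (hypothetical function names, integer $n \ge 1$) evaluates the unnormalized probabilities (17) and (18), together with the sums whose reciprocals are the normalization factors. It also shows that, after normalization, the sequential case is an ordinary binomial distribution with success probability $\mu_1 p_1 / (\mu_0 + (\mu_1 - \mu_0) p_1)$.

```python
from math import comb

def split_binomial_simultaneous(n, p1, mu0, mu1):
    """Unnormalized (17), listed for k = 0..n (n >= 1)."""
    lo = min(mu0, mu1)
    probs = [lo * comb(n, k) * p1 ** k * (1 - p1) ** (n - k) for k in range(n + 1)]
    probs[0] = mu0 * (1 - p1) ** n   # all failures
    probs[n] = mu1 * p1 ** n         # all successes
    return probs

def split_binomial_sequential(n, p1, mu0, mu1):
    """Unnormalized (18), listed for k = 0..n."""
    return [comb(n, k) * (mu1 * p1) ** k * (mu0 * (1 - p1)) ** (n - k) for k in range(n + 1)]

def total_simultaneous(n, p1, mu0, mu1):
    """Sum of (17); its reciprocal is the normalization factor p^{-1}(A~_n)."""
    lo = min(mu0, mu1)
    return lo + (mu1 - lo) * p1 ** n + (mu0 - lo) * (1 - p1) ** n

def total_sequential(n, p1, mu0, mu1):
    """Sum of (18); its reciprocal is the normalization factor for the sequential case."""
    return (mu0 + (mu1 - mu0) * p1) ** n
```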

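An important characteristic of the split Bernoulli probability (17) is the composition law; in the simultaneous case

$$p(\tilde{B}_{n,k}; p_1 p_2) = \sum_{m=0}^{n} p(B_{n,m}; p_1)\, p(\tilde{B}_{m,k}; p_2) \qquad (19)$$

and in the sequential case

$$p\Big(\tilde{\tilde{B}}_{n,k}; \frac{\mu_1\mu_2 p_1 p_2}{(\mu_0 + (\mu_1 - \mu_0)p_1)(\mu_0 + (\mu_1 - \mu_0)p_2)}\Big) = \sum_{m=0}^{n} p\Big(\tilde{\tilde{B}}_{n,m}; \frac{\mu_1 p_1}{\mu_0 + (\mu_1 - \mu_0)p_1}\Big)\, \hat{p}\Big(B_{m,k}; \frac{\mu_2 p_2}{\mu_0 + (\mu_1 - \mu_0)p_2}\Big) \qquad (20)$$

and

$$\hat{p}\Big(B_{n,k}; \frac{\mu_1\mu_2 p_1 p_2}{(\mu_0 + (\mu_1 - \mu_0)p_1)(\mu_0 + (\mu_1 - \mu_0)p_2)}\Big) = \sum_{m=0}^{n} \hat{p}\Big(B_{n,m}; \frac{\mu_1 p_1}{\mu_0 + (\mu_1 - \mu_0)p_1}\Big)\, \hat{p}\Big(B_{m,k}; \frac{\mu_2 p_2}{\mu_0 + (\mu_1 - \mu_0)p_2}\Big),$$

where $\hat{p}$ denotes the corresponding split probability normalized over $\tilde{\tilde{A}}_n$.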

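As well as the characteristics of binomial probabilities in the case of fuzzy elementary events, one may consider the known property of the exponential distribution; in the simultaneous case

$$\begin{aligned}
\sum_{m=0}^{\infty} p(\tilde{B}_{m,n}; p_1)\, f(m, u) &= (\mu_0 \wedge \mu_1)(1 - v)v^n + (\mu_1 - (\mu_0 \wedge \mu_1))(1 - u)(p_1 u)^n, \quad n \ne 0,\\
\sum_{m=0}^{\infty} p(\tilde{B}_{m,0}; p_1)\, f(m, u) &= \frac{\mu_0 (1 - u)}{1 - (1 - p_1)u} = \mu_0\, g(0, v)
\end{aligned} \qquad (21)$$

and in the sequential case

$$\sum_{m=0}^{\infty} \hat{p}\Big(B_{m,n}; \frac{\mu_1 p_1}{\mu_0 + (\mu_1 - \mu_0)p_1}\Big) f(m, u) = g(n, v'), \qquad (22)$$

where $\hat{p}$ is the normalized split probability and

$$v = \frac{p_1 u}{1 - u + p_1 u}, \qquad v' = \frac{\mu_1 p_1 u}{(1 - p_1)\mu_0 + \mu_1 p_1 - (1 - p_1)\mu_0 u}.$$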
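The text does not define $f$ and $g$ explicitly. Assume $f(m, u) = (1-u)u^m$ and $g(n, v) = (1-v)v^n$, i.e. geometric weights; this is the reading under which the second line of (21) holds as written. Both (21) and (22) can then be checked numerically by truncating the series. All function names are hypothetical, and the sequential check uses the normalized version of (18), which is an ordinary binomial law.

```python
from math import comb

def split_simultaneous(m, n, p1, mu0, mu1):
    """Simultaneous split binomial (17) for m trials and n successes."""
    if n < 0 or n > m:
        return 0.0
    if n == 0:                      # all failures (also covers m = 0)
        return mu0 * (1 - p1) ** m
    if n == m:                      # all successes
        return mu1 * p1 ** m
    return min(mu0, mu1) * comb(m, n) * p1 ** n * (1 - p1) ** (m - n)

def f(m, u):
    """Assumed geometric weight f(m, u) = (1 - u) u^m."""
    return (1 - u) * u ** m

def lhs_21(n, p1, u, mu0, mu1, terms=400):
    return sum(split_simultaneous(m, n, p1, mu0, mu1) * f(m, u) for m in range(terms))

def rhs_21(n, p1, u, mu0, mu1):
    v = p1 * u / (1 - u + p1 * u)
    if n == 0:
        return mu0 * (1 - v)        # mu0 * g(0, v)
    lo = min(mu0, mu1)
    return lo * (1 - v) * v ** n + (mu1 - lo) * (1 - u) * (p1 * u) ** n

def lhs_22(n, p1, u, mu0, mu1, terms=400):
    pi = mu1 * p1 / (mu0 + (mu1 - mu0) * p1)   # normalized sequential success probability
    return sum(comb(m, n) * pi ** n * (1 - pi) ** (m - n) * f(m, u) for m in range(n, terms))

def rhs_22(n, p1, u, mu0, mu1):
    v1 = mu1 * p1 * u / ((1 - p1) * mu0 + mu1 * p1 - (1 - p1) * mu0 * u)
    return (1 - v1) * v1 ** n       # g(n, v')
```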

7.2. The binomial distribution with fuzzy number of successes

Let the set $A_n = \{0, 1, \ldots, n\}$ be considered. The fuzzy quantity "approximately $k$ from $n$" is defined as a fuzzy subset of $A_n$. Therefore the corresponding distribution is

$$p(B_{n,\tilde{k}}; p) = \sum_{l=0}^{n} \mu_{\tilde{k}}(l)\, p(B_{n,l}; p), \qquad (23)$$

where $\mu_{\tilde{k}}(l)$ is the membership function of the fuzzy number "approximately $k$ from $n$".

This distribution is also called the binomial distribution because it is characterized by the above composition law and the property of the exponential distribution.
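Formula (23) only needs a membership function on $A_n$. The triangular shape below is a hypothetical choice for "approximately $k$", and the function names are illustrative. With a width below 1 the membership function is crisp, and (23) collapses to the ordinary binomial probability.

```python
from math import comb

def triangular(k, width):
    """Hypothetical membership for 'approximately k': 1 at k, linear decay to 0 at distance `width`."""
    return lambda l: max(0.0, 1 - abs(l - k) / width)

def fuzzy_successes(n, p, mu):
    """(23): sum over l of mu(l) * C(n, l) p^l (1 - p)^(n - l)."""
    return sum(mu(l) * comb(n, l) * p ** l * (1 - p) ** (n - l) for l in range(n + 1))
```

For instance, `fuzzy_successes(10, 0.3, triangular(3, 2))` adds half the weight of 2 and 4 successes to the probability of exactly 3.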

7.3. Fuzzy upper binomial distribution

The usual upper binomial distribution is based on a model in which two events are superposed: the Bernoulli event, and the event consisting of failures only, characterized by the a priori probability $p(B_0) = 1 - \gamma$.

Suppose that $p_1$ is the probability of an elementary success. Let $\mu_0$ and $\mu_0'$ be the values of the membership functions corresponding to the complex event $(\underbrace{0, 0, \ldots, 0}_{n})$ when the events of Bernoulli and of non-Bernoulli origin are distinguished. Then the universal set $X$, which is the composition $B_0\big(\bigcup_{i=0}^{n} B_{n,i}\big)$, is split in the following way:

$$X = \widetilde{B_0 \textstyle\bigcup_{i=0}^{n} B_{n,i}} \oplus \Big(\widetilde{B_0 \textstyle\bigcup_{i=0}^{n} B_{n,i}}\Big)^D.$$

The corresponding membership function is

$$\chi_{\widetilde{B_0 \bigcup_{i=0}^{n} B_{n,i}}}(x_1, \ldots, x_n) = \mu_0' \chi_{B_0}(x_1, \ldots, x_n) + \mu_0 \chi_{B_{n,0}}(x_1, \ldots, x_n) + \sum_{i=1}^{n} \chi_{B_{n,i}}(x_1, \ldots, x_n)$$

(condition $\tilde{B}_{n,0}\,\tilde{B}_0 : \big(\bigcup^{n}$

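The probability measure corresponding to the fuzzy upper binomial distribution is

$$p\Big(\widetilde{B_0 \textstyle\bigcup_{i=0}^{n} B_{n,i}}\Big)(B_0 B_{n,i}; p_1, \gamma) = \frac{1}{p\big(\widetilde{B_0 \bigcup_{i=0}^{n} B_{n,i}}\big)} \begin{cases} \mu_0'(1 - \gamma) + \mu_0 \gamma (1 - p_1)^n, & i = 0,\\ \gamma \dbinom{n}{i} p_1^i (1 - p_1)^{n-i}, & i = 1, \ldots, n, \end{cases} \qquad (24)$$

where

$$p\Big(\widetilde{B_0 \textstyle\bigcup_{i=0}^{n} B_{n,i}}\Big) = \mu_0'(1 - \gamma) + \mu_0 \gamma (1 - p_1)^n + \gamma\big(1 - (1 - p_1)^n\big).$$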
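A direct evaluation of (24) is sketched below with hypothetical names. Here `gamma` is the prior weight $\gamma$ of the Bernoulli component, and `mu0`, `mu0_prime` stand for $\mu_0$ and $\mu_0'$. With $\mu_0 = \mu_0' = 1$ the ordinary upper binomial distribution is recovered.

```python
from math import comb

def fuzzy_upper_binomial(n, p1, gamma, mu0, mu0_prime):
    """Normalized probabilities (24) for i = 0..n."""
    weights = [gamma * comb(n, i) * p1 ** i * (1 - p1) ** (n - i) for i in range(n + 1)]
    # i = 0 collects the non-Bernoulli zero (weight mu0') and the Bernoulli zero (weight mu0)
    weights[0] = mu0_prime * (1 - gamma) + mu0 * gamma * (1 - p1) ** n
    total = mu0_prime * (1 - gamma) + mu0 * gamma * (1 - p1) ** n + gamma * (1 - (1 - p1) ** n)
    return [w / total for w in weights]
```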

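The Poisson limit is

$$P(i) = \frac{1}{\mu_0'(1 - \gamma) + \mu_0 \gamma e^{-c} + \gamma(1 - e^{-c})} \begin{cases} \mu_0'(1 - \gamma) + \mu_0 \gamma e^{-c}, & i = 0,\\ \gamma e^{-c} \dfrac{c^i}{i!}, & i = 1, 2, \ldots, \end{cases} \qquad (25)$$

where $\bar{i}$ and $c$ are connected by the relation

$$\bar{i} = \frac{\gamma c}{\mu_0'(1 - \gamma) + \mu_0 \gamma e^{-c} + \gamma(1 - e^{-c})}.$$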

From a practical viewpoint, the interesting expression is the one obtained by summing (integrating) over all values of $\mu_0$ and $\mu_0'$:³

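$$\tilde{P}(i) = \begin{cases} 1 - \nu(1 - e^{-c}), & i = 0,\\ \nu e^{-c} \dfrac{c^i}{i!}, & i = 1, 2, \ldots, \end{cases} \qquad (26)$$

where

$$\nu = \gamma \iint_{0 \le \mu_0, \mu_0' \le 1} \big[\mu_0'(1 - \gamma) + \mu_0 \gamma e^{-c} + \gamma(1 - e^{-c})\big]^{-1} d\mu_0\, d\mu_0'.$$

Taking into account the relation between $c$, $\nu$ and $\bar{i}$ (the mean of (26) is $\nu c$, so $c = \bar{i}/\nu$), one obtains

$$\tilde{P}(i) = \begin{cases} 1 - \nu\big(1 - e^{-\bar{i}/\nu}\big), & i = 0,\\ \nu e^{-\bar{i}/\nu} \dfrac{(\bar{i}/\nu)^i}{i!}, & i = 1, 2, \ldots \end{cases}$$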
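The distribution (26), written in terms of $\bar i$ and $\nu$, is a zero-modified Poisson law. The sketch below (hypothetical names) takes $\nu$ as an input rather than computing it from the double integral. It checks that the probabilities sum to 1 and that the mean equals $\bar i$.

```python
from math import exp, factorial

def averaged_poisson(i, mean, nu):
    """(26) rewritten with c = mean / nu; requires nu * (1 - exp(-c)) <= 1."""
    c = mean / nu
    if i == 0:
        return 1 - nu * (1 - exp(-c))
    return nu * exp(-c) * c ** i / factorial(i)
```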

³ Notice that formula (26) does not itself contain any fuzziness, being a nice instance of

7.4. Negative binomial distribution with fuzzy elementary events [8]

Consider a sequence of Bernoulli trials with probability of fuzzy success $p(\tilde{1}) = \mu_1 p_1$ ($\mu_1 : \{0, 1\} \to [0, 1]$), where $p_1$ is the probability of the usual Bernoulli elementary event. Let $f(k; r, p_1)$ denote the probability that the $r$th success takes place in the $(k + r)$th trial, provided that the trials are continued up to the $r$th success. Adopting the splitting scheme used for the binomial distribution with fuzzy elementary events, one can write

$$f(k; r, p(\tilde{1})) = \begin{cases} (\mu_0 \wedge \mu_1)\dbinom{r + k - 1}{k} p_1^r (1 - p_1)^k, & k > 0,\\ \mu_1 p_1^r, & k = 0. \end{cases} \qquad (27)$$

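Since for any $m > 0$

$$\binom{m + k - 1}{k} = (-1)^k \binom{-m}{k},$$

the above formula can be written in the following form:

$$f(k; r, p(\tilde{1})) = \begin{cases} (\mu_0 \wedge \mu_1)\dbinom{-r}{k}(-1)^k p_1^r (1 - p_1)^k, & k > 0,\\ \mu_1 p_1^r, & k = 0. \end{cases} \qquad (28)$$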

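Define the negative binomial distribution with fuzzy elementary events, but with fixed real number $r > 0$ and $0 < p_1 < 1$, as the sequence

$$\{p^{-1} f(k; r, p(\tilde{1}))\}, \qquad (29)$$

where

$$p = \sum_{k=0}^{\infty} f(k; r, p(\tilde{1})) = (\mu_0 \wedge \mu_1) + [\mu_1 - (\mu_0 \wedge \mu_1)]\, p_1^r.$$

Note that if $\mu_0, \mu_1 \to 1$, or $(\mu_0 \wedge \mu_1) = \mu_1$, then (29) reduces to the usual negative binomial distribution.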
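Because $r > 0$ may be real, the binomial coefficient in (27) can be evaluated through the gamma function, $\binom{r+k-1}{k} = \Gamma(r+k)/(\Gamma(r)\,k!)$. The following sketch (hypothetical names) normalizes (27) as in (29). It also illustrates the reduction to the ordinary negative binomial distribution when $\mu_0 \wedge \mu_1 = \mu_1$.

```python
from math import exp, lgamma

def nb_coeff(r, k):
    """C(r + k - 1, k) = Gamma(r + k) / (Gamma(r) k!), valid for real r > 0."""
    return exp(lgamma(r + k) - lgamma(r) - lgamma(k + 1))

def fuzzy_negative_binomial(k, r, p1, mu0, mu1):
    """Normalized sequence (29): f(k; r, p(1~)) / p, with f taken from (27)."""
    lo = min(mu0, mu1)
    f = mu1 * p1 ** r if k == 0 else lo * nb_coeff(r, k) * p1 ** r * (1 - p1) ** k
    total = lo + (mu1 - lo) * p1 ** r
    return f / total
```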

7.5. Fuzzy Fucks’ distribution

As in the case of the "upper Bernoulli" distribution, all variants of Fucks' distribution [3] are based on the assumption that Fucks' event is a superposition of Bernoulli and deterministic events

$$U^k_{n,r,p_1} = B_r B^{k-r}_{n-r,p_1}, \qquad U^k_{n,p_1} = \bigcup_{r=0}^{n} B_r B^{k-r}_{n-r,p_1}, \qquad (30)$$

where $B_r$ is deterministic (certainly $r$ successes in $n$ trials) and $B^{k-r}_{n-r,p_1}$ is a Bernoulli event ($k - r$ successes in $n - r$ random trials).

There are many variants of splitting Fucks' event, but only some of them are considered in this paper.

(1) The deterministic event is non-fuzzy, but the Bernoulli elementary events are fuzzy. In this case

$$\tilde{U}^k_{n,p_1} = \bigcup_{r=0}^{n} B_r \tilde{B}^{k-r}_{n-r,p_1}.$$

The corresponding probability measure is

$$P(\tilde{U}^k_{n,p_1}) = \begin{cases} \sum_{r=0}^{n} q_r (\mu_0 \wedge \mu_1) \binom{n-r}{k-r} p_1^{k-r} (1 - p_1)^{n-k}, & k = 1, 2, \ldots, n-1,\\ \sum_{r=0}^{n} q_r \mu_1 p_1^{n-r}, & k = n,\\ q_0 \mu_0 (1 - p_1)^n, & k = 0 \end{cases} \qquad (31)$$

(for simultaneous splitting) with

$$\sum_{k=0}^{n} P(\tilde{U}^k_{n,p_1}) = \begin{cases} \mu_1 + (\mu_0 - \mu_1) q_0 (1 - p_1)^n, & \mu_0 \ge \mu_1,\\ \mu_0 + (\mu_1 - \mu_0) \sum_{r=0}^{n} q_r p_1^{n-r}, & \mu_0 < \mu_1, \end{cases}$$

and

$$P(\tilde{\tilde{U}}^k_{n,p_1}) = \sum_{r=0}^{n} q_r \binom{n-r}{k-r} (\mu_1 p_1)^{k-r} (\mu_0 (1 - p_1))^{n-k} \qquad (32)$$

(for sequential splitting) with

$$\sum_{k=0}^{n} P(\tilde{\tilde{U}}^k_{n,p_1}) = \sum_{r=0}^{n} q_r (\mu_0 + (\mu_1 - \mu_0) p_1)^{n-r}.$$

Here $q_r$ is connected to the linguistic spectrum [3].

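(2) The events $B_r$ are split ($B_r = \tilde{B}_r \oplus \tilde{B}_r^D$) and the Bernoulli events are crisp:

$$\tilde{U}^k_{n,p_1} = \bigcup_{r=0}^{n} \tilde{B}_r B^{k-r}_{n-r,p_1}.$$

Evidently

$$P(\tilde{U}^k_{n,p_1}) = \sum_{r=0}^{n} \nu_r q_r \binom{n-r}{k-r} p_1^{k-r} (1 - p_1)^{n-k}, \qquad (33)$$

where $\nu_r$ is the membership function of the fuzzy set $\tilde{B}_r$, and

$$\sum_{k=0}^{n} P(\tilde{U}^k_{n,p_1}) = \sum_{r=0}^{n} \nu_r q_r. \qquad (34)$$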

(3) In the case when both the deterministic and the Bernoulli events are split, one must clearly distinguish between simultaneous and successive (sequential) splitting of Fucks' event. In the latter case it is easy to obtain the final result. Consideration of the two aforesaid cases allows one to write

$$\tilde{U}^k_{n,p_1} = \bigcup_{r=0}^{n} \tilde{B}_r \begin{cases} \tilde{B}^{k-r}_{n-r,p_1} & \text{(simultaneous splitting)},\\ \tilde{\tilde{B}}^{k-r}_{n-r,p_1} & \text{(sequential splitting)}, \end{cases} \qquad (35)$$

consequently

$$P(\tilde{U}^k_{n,p_1}) = \begin{cases} \sum_{r=0}^{n} q_r (\nu_r \wedge (\mu_0 \wedge \mu_1)) \binom{n-r}{k-r} p_1^{k-r} (1 - p_1)^{n-k}, & k = 1, \ldots, n-1,\\ \sum_{r=0}^{n} q_r (\nu_r \wedge \mu_1) p_1^{n-r}, & k = n,\\ (\nu_0 \wedge \mu_0) q_0 (1 - p_1)^n, & k = 0 \end{cases} \qquad (36)$$

(simultaneous splitting of the Bernoulli event) and

$$P(\tilde{\tilde{U}}^k_{n,p_1}) = \sum_{r=0}^{n} \nu_r q_r \binom{n-r}{k-r} (\mu_1 p_1)^{k-r} (\mu_0 (1 - p_1))^{n-k} \qquad (37)$$

(completely sequential splitting of the Bernoulli event). When Fucks' event is split simultaneously, the authors' reasoning is as follows. $(B_r B^{k-r}_{n-r,p_1})$ is a realized chain of distributed successes and failures. This chain is a concatenation of two others: a deterministic one, in which there are only $r$ successes, and a Bernoulli sequence of length $n - r$ containing $k - r$ successes. Therefore simultaneous splitting must take place according to the rule

$$\mu(\tilde{B}_r \tilde{B}^{k-r}_{n-r,p_1}) = \mu(\tilde{B}_r) \wedge \mu(\tilde{B}^{k-r}_{n-r,p_1}). \qquad (38)$$

Consequently

$$P(\tilde{U}^k_{n,p_1}) = \begin{cases} \sum_{r=0}^{n} q_r (\nu_r \wedge (\mu_0 \wedge \mu_1)) \binom{n-r}{k-r} p_1^{k-r} (1 - p_1)^{n-k}, & k = 1, \ldots, n-1,\\ \sum_{r=0}^{n} q_r (\nu_r \wedge \mu_1) p_1^{n-r}, & k = n,\\ (\nu_0 \wedge \mu_0) q_0 (1 - p_1)^n, & k = 0. \end{cases} \qquad (39)$$

The fuzzy Fucks' distributions considered here play a leading part in constructing fuzzy quantitative micro-linguistic models of language.

(4) Some language structures are often described by generalized Fucks' distributions, in which there are two kinds of successes with probabilities $p$ and $q$. In this case

$$P'(\tilde{U}^k_{n,W_r,p,q}) = \sum_{r=0}^{n} q_r \binom{n-r}{k-r}$$

If one is only interested in one kind of success, then

$$P''(\tilde{U}^k_{n,W_r,p,q}) = \sum_{r=0}^{n} q_r \binom{n-r}{k-r} \int_0^1 (pq)^{k-r} (1 - p + p(1 - q))^{n-k}\, dp. \qquad (41)$$

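The corresponding Poisson limit ($n \to \infty$, $q \to 0$ and $a = \bar{i}_{\mathrm{exp}} - \sum_{r=0}^{n} r q_r = \frac{q}{2}\big(n - \sum_{r=0}^{n} r q_r\big) = \mathrm{const}$) is

$$F'(\tilde{U}^k_{W_r}) = e^{-a} \sum_{r=0}^{\infty} q_r \frac{a^{k-r}}{(k-r)!} \varphi_{k-r}(a), \qquad (42)$$

where

$$\varphi_{k-r}(a) = \frac{e^a}{2a^{k-r+1}} \int_0^{2a} t^{k-r} e^{-t}\, dt = \frac{e^a\, \Gamma(k-r+1)}{2a^{k-r+1}} P(k-r+1, 2a), \qquad (43)$$

$\Gamma(z)$ is Euler's integral (the gamma function), and $P(k-r+1, 2a)$ is the (regularized) incomplete gamma function. Taking into account the relation between the incomplete gamma function and the $\chi^2$-distribution, one finally obtains

$$F'(\tilde{U}^k_{W_r}) = \frac{1}{2a} \sum_{r=0}^{\infty} q_r P(4a \mid k-r+1), \qquad (44)$$

where $P(4a \mid k-r+1)$ is the $\chi^2$-distribution with $2(k-r+1)$ degrees of freedom. Distribution (44) is called the "$\chi^2$-distribution with approximately $(k-r+1)$ degrees of freedom".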
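Two parts of this subsection can be checked numerically with the standard library alone. The sketch below (hypothetical names) first evaluates (31) for a made-up spectrum $q_r$ that sums to 1 and compares the total with its closed form. For $\mu_0 \ge \mu_1$ the coefficient of $q_0(1-p_1)^n$ in that closed form is $\mu_0 - \mu_1$. The sketch then compares the Poisson limit of one integral in (41) with the term $P(k-r+1, 2a)/(2a)$ of (44). That limit is assumed to be $\int_0^1$ of a Poisson probability with mean $2ap$, i.e. $(n-r)q \to 2a$.

```python
from math import comb, exp, factorial

def fucks_simultaneous(n, p1, q, mu0, mu1):
    """(31): crisp deterministic part, simultaneously split Bernoulli part.
    q[r], r = 0..n, are spectrum weights (assumed here to sum to 1)."""
    lo = min(mu0, mu1)
    out = [q[0] * mu0 * (1 - p1) ** n]                                   # k = 0
    for k in range(1, n):
        out.append(sum(q[r] * lo * comb(n - r, k - r) * p1 ** (k - r) * (1 - p1) ** (n - k)
                       for r in range(k + 1)))
    out.append(sum(q[r] * mu1 * p1 ** (n - r) for r in range(n + 1)))   # k = n
    return out

def fucks_total(n, p1, q, mu0, mu1):
    """Closed form of the sum of (31) over k."""
    if mu0 >= mu1:
        return mu1 + (mu0 - mu1) * q[0] * (1 - p1) ** n
    return mu0 + (mu1 - mu0) * sum(q[r] * p1 ** (n - r) for r in range(n + 1))

def reg_lower_gamma(s, x, terms=200):
    """Regularized lower incomplete gamma P(s, x) for integer s >= 1 (series expansion)."""
    term = total = 1.0 / s
    for j in range(1, terms):
        term *= x / (s + j)
        total += term
    return x ** s * exp(-x) * total / factorial(s - 1)

def poisson_mixture(j, a, steps=20000):
    """Midpoint-rule integral over p in (0, 1) of the Poisson probability of j with mean 2ap."""
    h = 1.0 / steps
    return h * sum(exp(-2 * a * (i + 0.5) * h) * (2 * a * (i + 0.5) * h) ** j
                   for i in range(steps)) / factorial(j)
```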

7.6. Fuzzy Zipf–Mandelbrot distribution

It is well known that Mandelbrot's theory of recurrent coding constitutes the basis of statistical macro-linguistics. Suppose a vocabulary of volume $R$ is divided into $S$ classes according to the informational cost [9] of the words of a given class. Then the probability of a word of the $k$th class can be expressed as

$$p_k = P M^{-B C_k}, \qquad (45)$$

where $P$, $M$, $B$ do not depend on the cost and $C_k$ is the informational cost of the $k$th class.

Let three cases of splitting be considered.

(1) The set of classes $K = (k_1, k_2, \ldots, k_S) = \tilde{K} \oplus \tilde{K}^D$. In this case

$$\tilde{p}_{k_i} = \mu_{\tilde{K}}(k_i)\, p_{k_i}, \qquad i = 1, \ldots, S. \qquad (46)$$

(2) The set of informational costs $C = (c_1, \ldots, c_S) = \tilde{C} \oplus \tilde{C}^D$. Since $p_k$ is a function of $c_k$, according to the extension principle [17] one obtains

$\tilde{p}_{k_i}$

(3) When the number of classes is a fuzzy number $\tilde{S} = \bigcup_{S=1}^{\infty} (\mu_{\tilde{S}}(S)/S)$, by analogy with the binomial distribution with a fuzzy number of trials, one can write

$$\tilde{p}_k = \sum_{S=1}^{\infty} \mu_{\tilde{S}}(S)\, P(S)\, M^{-B(S) C_k}. \qquad (48)$$

The above formulae are to be applied to the whole language as a formation medium, while the classical formula applies to individual texts.

8. Linear structure

One of the research methods for the linear structure of a sequence of language elements is the gap analysis method. Its premise is that the elements of a sequence are not distributed randomly (in disorder), and any deviation from full disorder indicates the presence of some structure. The quantitative investigation technique is as follows. A pair of elements is fixed by some features, and the elements between the fixed ones are considered as gaps. Hence the sequence may have the following form:

– – – [a1] – – – [a2] – – – [b1] [a3] – – – [b2] – – – [a4] [b3] – – – [b4] – – – [a5] – – – [b5] – – – [a6] – – – [b6] – – –.

Let the structure defined by the elements [a]–[b] be considered. The complex consisting of [a], the nearest [b] and the gaps between them is called a "word" (Mandelbrot's definition).

To describe such word generation mathematically, the model represents the generation process of any analyzed structure as the superposition of two processes: probabilistic and possibilistic. Therefore, the fuzzy probability measures considered above may be applied to describe the distribution of gaps over words. The gap analysis method, together with the suggested modeling schemes, allows one to establish the structural dependence between elements at any level.

The main characteristics of linear models are the components of the linguistic spectrum $(q_r, \varphi_r, \psi_r)$. Their determination reduces to solving the system of equations

$$\left.\frac{\partial^k G(y, a)}{\partial y^k}\right|_{y \to 1} = \langle i(i-1)\cdots(i-k+1)\rangle_{\mathrm{exp}}, \qquad k = 1, 2, \ldots, \qquad (49)$$

where

$$G(y, a) = \sum_{l} P(l)\, y^l,$$

$P(l)$ is the probability distribution of the chosen model, $a$ is a known function of the linguistic spectrum components and $\langle\cdot\rangle_{\mathrm{exp}}$ denotes the measured moments of the gap distribution. A special method has been elaborated for solving system (49). The determination of the linguistic spectrum allows one to calculate the informational content of any given structure.
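The left-hand side of (49) is the $k$th factorial moment of the model distribution, and the right-hand side is its empirical counterpart. The sketch below (hypothetical names) computes both, from a probability list and from observed gap counts respectively. The special solution method mentioned in the text is not reproduced.

```python
from math import exp, factorial

def factorial_moment(pmf, k):
    """sum_l l (l - 1) ... (l - k + 1) P(l): the k-th derivative of G(y) = sum_l P(l) y^l at y = 1."""
    total = 0.0
    for l, prob in enumerate(pmf):
        falling = 1
        for j in range(k):
            falling *= l - j
        total += falling * prob
    return total

def empirical_factorial_moment(counts, k):
    """Right-hand side of (49) from observed counts: counts[l] = number of words with l gaps."""
    n = sum(counts)
    return factorial_moment([c / n for c in counts], k)
```

For a Poisson model with mean $\lambda$, for example, the $k$th factorial moment is $\lambda^k$, which the test below uses as a check.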

9. Bag statistics for consonantal structures of languages

One method of analyzing the structures of printed information entails investigating the probabilistic–possibilistic organization of the distribution of the elements that determine the analyzed structure [7]. From the point of view of the chosen elements, printed information can be considered as tuples (cortèges) of elements, or Yager's bags. The main characteristics of these bags can serve as quantitative analytic parameters, and the parameters of the probabilistic–possibilistic model serve as the characteristics of the structures studied in this paper.

The probabilistic–possibilistic organization of bag distributions is described by the generalized Fucks' distribution [4]. Some of the results obtained from this distribution are given below. The aforesaid Fucks' distribution is based on the superposition of the following two processes:

$$U^k_{n,r,p} = B_r B^{k-r}_{n-r,p}, \qquad U^k_{n,p} = \bigcup_{r=0}^{n} B_r B^{k-r}_{n-r,p}, \qquad (50)$$

where $U^k_{n,r,p}$ is Fucks' event, $B^{k-r}_{n-r,p}$ is Bernoulli's event and $B_r$ is the so-called deterministic event [3]. There are many ways of splitting Fucks' event [4], but only the one related to the bag distribution model will be considered here: the case in which Bernoulli's event is classical but the deterministic event is split,

$$\tilde{U}^k_{n,p} = \bigcup_{r=0}^{k} \tilde{B}_r B^{k-r}_{n-r,p}. \qquad (51)$$

Let the structure of the set of events $\tilde{U}^k_{n,p}$ be described.

Before continuing, the following comments should be made. In fuzzy subset applications the problem of evaluating the membership grade is highly important. The membership grade is the result of expert research determining (creating) the fuzzy subset. Let us consider a method that makes it possible to reveal the membership function in a logically consistent way. It is supposed that the elements of the fuzzy subset are such that $I_{\tilde{A}}(x') \ge I_{\tilde{A}}(x'')$, $x', x'' \in X'$, if $x' \succ x''$, where $I_{\tilde{A}}$ is the membership function of $\tilde{A}$. In the present case $X'$ consists of Bernoulli's events $B^{i-r}_{n-r}$, where $i$ is the full number of successes and $r$ is a fixed number of successes determining the structure of the event $\tilde{F}^i_{n,r,p}$ and corresponding to the event $B_r$. The fuzzy subsets considered here are normalized. This permits one to

can be easily related to focal probabilities, without the necessity of making any additional assumptions.

Let us consider a random experiment in which the level-set notion is used and Yager's algorithm is represented by $x' \in X'$ [13]. First, a value $\alpha \in [0, 1]$ is chosen, and then an element from the corresponding $\alpha$-level set. Now let the probability of choosing specifically the element $x' \in X'$ be calculated according to the conditions established in this example. In accordance with this assumption

$$0 \le \pi_1 \le \pi_2 \le \cdots \le \pi_n = \max = 1,$$

where $\pi_r$ are values of the membership function (components of the possibility distribution, or components of the so-called linguistic spectrum [3]). The level sets are as follows:

$$\begin{aligned}
0 \le \alpha \le \pi_1 &: \ B_1 = \{x'_1, \ldots, x'_n\},\\
\pi_1 \le \alpha \le \pi_2 &: \ B_2 = \{x'_2, \ldots, x'_n\},\\
\pi_2 \le \alpha \le \pi_3 &: \ B_3 = \{x'_3, \ldots, x'_n\},\\
&\ \ \vdots\\
\pi_{n-2} \le \alpha \le \pi_{n-1} &: \ B_{n-1} = \{x'_{n-1}, x'_n\},\\
\pi_{n-1} \le \alpha \le \pi_n &: \ B_n = \{x'_n\},\\
\pi_n < \alpha &: \ B_\alpha = \emptyset.
\end{aligned}$$

Because $\alpha$ was chosen randomly in this example, the probability that the level set $B_r$ will be chosen is equal to the length of the interval $(\pi_{r-1}, \pi_r)$, $m(B_r) = \pi_r - \pi_{r-1}$ (with $\pi_0 = 0$). Besides, an element is chosen from the level set in accordance with Bernoulli's probability model, thus

$$F(\text{choose element } x' \mid B_r) = \begin{cases} \dbinom{n-r}{i-r} p^{i-r} (1-p)^{n-r-(i-r)}, & x' \in B_r,\\ 0, & x' \notin B_r. \end{cases} \qquad (52)$$

Then, according to the law of total probability,

$$F_n(i) = \sum_{r=1}^{n} m(B_r) \binom{n-r}{i-r} p^{i-r} (1-p)^{n-i} \qquad (i = 1, \ldots, n). \qquad (53)$$

Or, in the Poisson limit,

$$F(i) = e^{-a} \sum_{r=1}^{\infty} m(B_r) \frac{a^{i-r}}{(i-r)!} \qquad (i = 1, \ldots, n), \qquad (54)$$

where $a = \bar{i} - \bar{r} = \mathrm{const}$, $\bar{i}$ is the average empirical value of the random variable $\xi = i$, and $\bar{r} = \sum_{r=1}^{\infty} r\, m(B_r)$, in full accordance with the model described in this paper.

The above example explains the rule for composing the probabilistic and possibilistic uncertainty indices [2]. From (54) one can obtain

$$m(B_r) = e^{a} \sum_{k=1}^{r} (-1)^{k-1} F(r-k+1) \frac{a^{k-1}}{(k-1)!} \qquad (r = 1, \ldots, n). \qquad (55)$$
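Relations (54) and (55) are a discrete convolution with Poisson weights and its inverse. A minimal round-trip check follows, with hypothetical function names. The masses $m(B_r)$ come from a made-up nondecreasing spectrum ending at 1, each mass being the difference between successive spectrum values, starting from 0.

```python
from math import exp, factorial

def masses_from_spectrum(spectrum):
    """m(B_r) = spectrum_r - spectrum_{r-1}, with spectrum_0 = 0."""
    return [b - a for a, b in zip([0.0] + spectrum[:-1], spectrum)]

def poisson_limit_54(m, a, imax):
    """F(i) = e^{-a} sum_{r=1}^{i} m(B_r) a^{i-r} / (i-r)!, i = 1..imax; m[r-1] = m(B_r)."""
    return [exp(-a) * sum(m[r - 1] * a ** (i - r) / factorial(i - r)
                          for r in range(1, min(i, len(m)) + 1))
            for i in range(1, imax + 1)]

def invert_55(F, a, rmax):
    """m(B_r) = e^{a} sum_{k=1}^{r} (-1)^{k-1} F(r-k+1) a^{k-1} / (k-1)!; F[i-1] = F(i)."""
    return [exp(a) * sum((-1) ** (k - 1) * F[r - k] * a ** (k - 1) / factorial(k - 1)
                         for k in range(1, r + 1))
            for r in range(1, rmax + 1)]
```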

The information contained in the distribution moments must be used for determining the parameter $a$. This can be achieved by means of the relationship between focal and empirical moments

$$\mu^{\mathrm{focal}}_k = \sum_{l=0}^{k} (-1)^l \binom{k}{l} M^{\mathrm{emp}}_{k-l}\, a^l, \qquad (56)$$

where $\mu^{\mathrm{focal}}_k = \sum_{r=1}^{m} r(r-1)\cdots(r-k+1)\, m(B_r)$, $M^{\mathrm{emp}}_j = \sum_k k(k-1)\cdots(k-j+1) f_k$, and $f_k$ are the empirical frequencies.

In the case of a finite spectrum, the higher moments from some order onwards are equal to 0. This condition allows one to obtain the equation for $a$.

Another way of obtaining the equation for $a$ is as follows. The empirical frequencies from some $i$ onwards are practically $\approx 0$; in this case it is natural to assume that $m(B_r)$ for $r = i$ is also $\approx 0$. Setting (55) equal to zero then gives the equation

$$P(r) - P(r-1)\, a + P(r-2) \frac{a^2}{2!} + \cdots + (-1)^{r-1} P(1) \frac{a^{r-1}}{(r-1)!} = 0. \qquad (57)$$

The numerical solution of such an equation does not present any difficulties. It is essential to choose, from the solutions obtained, the positive solution that fulfils the condition

$$a + \sum_{r} r\, m(B_r) = \bar{i}.$$
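Equations of type (57) are polynomials in $a$, so their positive roots can be bracketed and refined by bisection. This sketch (hypothetical names, standard library only) lists all positive roots of the English mixed-case equation given in the text, read with alternating signs as in (57). It does not implement the final choice of root, which the text leaves open.

```python
def horner(coeffs, x):
    """Evaluate a polynomial given by coefficients from the highest degree down."""
    acc = 0.0
    for c in coeffs:
        acc = acc * x + c
    return acc

def positive_roots(coeffs, upper=50.0, grid=5000, tol=1e-12):
    """Positive real roots on (0, upper]: bracket sign changes on a uniform grid,
    then refine each bracket by bisection. Roots of even multiplicity, or two
    roots closer together than one grid step, are not detected."""
    xs = [upper * i / grid for i in range(1, grid + 1)]
    roots = []
    for lo, hi in zip(xs, xs[1:]):
        f_lo, f_hi = horner(coeffs, lo), horner(coeffs, hi)
        if f_lo == 0.0:
            roots.append(lo)
            continue
        if f_lo * f_hi > 0:
            continue
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            f_mid = horner(coeffs, mid)
            if f_lo * f_mid <= 0:
                hi = mid
            else:
                lo, f_lo = mid, f_mid
        roots.append(0.5 * (lo + hi))
    return roots

# English, mixed case: a^3 - 8.7463 a^2 + 13.6513 a - 5.6083 = 0
english_mixed = [1.0, -8.7463, 13.6513, -5.6083]
candidates = positive_roots(english_mixed)
```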

It is worth mentioning that the choice of a solution in all cases is still the subject of further research. The method described above is applied to the investigation of consonantal structures in English, French, Latin, Spanish and Georgian. Empirical data are obtained from [10], which presents the frequencies of words with consonantal structures according to the number of syllables. Using Yager's notation [14], the following types of bags are processed:

$$\{ccv\} = (1/v, 2/c), \quad \{vcc\} = (1/v, 2/c),$$

$$\{ccvcc\} = (1/v, 4/c), \quad \{ccvccvcc\} = (2/v, 6/c),$$

$$\{vccc\} = (1/v, 3/c), \quad \{cccv\} = (1/v, 3/c).$$

In addition, the mixed case, which represents all consonantal structures together, is considered. Here v denotes a vowel and c a consonant. All these structures are typical of the languages mentioned above.
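A bag records only how many times each element occurs, so the order of vowels and consonants is lost. That is why {ccv} and {vcc} receive the same bag (1/v, 2/c). The following minimal sketch uses Python's `collections.Counter` as the multiset. The helper names `bag_of` and `yager_notation` are hypothetical.

```python
from collections import Counter

def bag_of(structure):
    """Bag (multiset) of a v/c pattern, e.g. 'ccvcc' -> Counter({'c': 4, 'v': 1})."""
    return Counter(structure)

def yager_notation(bag):
    """Render a bag as (count/element, ...) with vowels first, as in the list above."""
    return "(" + ", ".join(f"{bag[x]}/{x}" for x in sorted(bag, key="vc".index)) + ")"

for s in ["ccv", "vcc", "ccvcc", "ccvccvcc", "vccc", "cccv"]:
    print("{" + s + "} =", yager_notation(bag_of(s)))
```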


From the condition $P(i) \approx 0$ and the data regarding $\bar{i}$, one obtains the following equations for each of the languages under investigation.

(1) English language:

(a) Mixed case

$$a^3 - 8.7463a^2 + 13.6513a - 5.6083 = 0;$$

(b) Structure cc

$$a^3 - 6.9360a^2 + 10.0423a - 3.9026 = 0;$$

(c) Structure cccc

$$a^3 - 12.5278a^2 + 28.0481a - 12.8692 = 0;$$

(d) Structure cccccc

$$a^3 - 7.9731a^2 + 11.8423a - 1.8172 = 0;$$

(e) Structure ccc

$$a^4 - 14.0925a^3 + 20.0506a^2 - 12.7206a + 2.3685 = 0.$$

(2) French language:

(a) Mixed case

$$a^3 - 14.7997a^2 + 37.2222a - 19.3223 = 0;$$

(b) Structure cc

$$a^3 - 14.0118a^2 + 32.4529a - 15.6912 = 0;$$

(c) Structure cccc

$$a^4 - 7.4352a^3 + 13.4000a^2 - 10.0444a + 2.2222 = 0;$$

(d) Structure cccccc

$$a^4 - 30.3984a^3 + 80.7891a^2 - 76.7813a + 17.5781 = 0;$$

(e) Structure ccc

$$a^3 - 12.6322a^2 + 26.4220a - 13.9248 = 0.$$

(3) Latin language:

(a) Mixed case

$$a^3 - 7.1387a^2 + 10.9416a - 4.0073 = 0;$$

(b) Structure cc

$$a^3 - 5.9020a^2 + 8.6708a - 3.1547 = 0;$$

(c) Structure cccc


(d) Structure cccccc

$$a^3 - 26.9878a^2 + 115.9512a - 43.9756 = 0;$$

(e) Structure ccc

$$a^3 - 8.1323a^2 + 10.8228a - 3.6494 = 0.$$

(4) Spanish language:

(a) Mixed case

$$a^5 - 14.9809a^4 + 66.0878a^3 - 96.1527a^2 + 59.8855a - 14.7710 = 0;$$

(b) Structure cc

$$a^5 - 12.3107a^4 + 47.9429a^3 - 58.3714a^2 + 28.7143a - 5.8286 = 0;$$

(c) Structure cccc

$$a^5 - 41.8090a^4 + 247.5000a^3 - 419.4231a^2 + 306.9231a - 82.3077 = 0;$$

(d) Structure cccccc

$$a^5 - 24.2440a^4 + 119.4896a^3 - 181.5311a^2 + 110.8134a - 19.7129 = 0;$$

(e) Structure ccc

$$a^5 - 15.5704a^4 + 59.3172a^3 - 60.6994a^2 + 22.4813a - 5.5953 = 0.$$

(5) Georgian language:

(a) Mixed case

$$a^4 - 9.3378a^3 + 22.6638a^2 - 39.2616a + 17.7409 = 0;$$

(b) Structure cc

$$a^4 - 8.8120a^3 + 26.7474a^2 - 36.7270a + 14.8680 = 0;$$

(c) Structure cccc

$$a^5 - 17.7878a^4 + 113.1295a^3 - 260.9353a^2 + 282.0860a - 93.4530 = 0;$$

(d) Structure cccccc

$$a^4 - 14.0866a^3 + 46.2600a^2 - 69.6806a + 27.7148 = 0;$$

(e) Structure ccc

$$a^5 - 10.6601a^4 + 31.0880a^3 - 52.4083a^2 + 42.4694a - 8.8487 = 0.$$
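The printed equations appear to be (57) divided by its leading coefficient. Multiplying (57) with r = 4 by $-3!/P(1)$, for instance, gives $a^3 - 3\frac{P(2)}{P(1)}a^2 + 6\frac{P(3)}{P(1)}a - 6\frac{P(4)}{P(1)} = 0$. This reading is inferred from the numbers rather than stated in the paper, and not every printed coefficient is reproduced exactly from the rounded table frequencies. The following minimal sketch (hypothetical helper `monic_eq57`) recovers the English mixed-case and cccccc coefficients from the P(i) columns of Table 1. For cccccc, P(1) = 0, so P(2) plays the role of P(1).

```python
from math import factorial

def monic_eq57(P, degree, start=1):
    """Monic form of eq. (57), highest power of a first.

    degree -- degree in a (r - 1 in the notation of (57))
    start  -- first length with non-zero frequency; P(start) plays the role of P(1)
    """
    def q(j):
        return P.get(j + start - 1, 0.0)   # shifted frequencies
    r = degree + 1
    lead = (-1) ** degree * q(1) / factorial(degree)
    return [(-1) ** k * q(r - k) / factorial(k) / lead for k in range(degree, -1, -1)]

english_mixed = {1: 0.1348, 2: 0.3930, 3: 0.3067, 4: 0.1260}
english_cccccc = {1: 0.0000, 2: 0.1674, 3: 0.4449, 4: 0.3304, 5: 0.0507}
print([round(c, 4) for c in monic_eq57(english_mixed, 3)])
print([round(c, 4) for c in monic_eq57(english_cccccc, 3, start=2)])
```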

The computed values of the parameter a, the spectral parameters, the first empirical and focal moments ($\bar{i}$ and $\bar{r}$), and the empirical and model frequencies are given in Tables 1–5.


Table 1 English

$\bar{i}$ is the mean value of word length (experiment); $\mu(r)$ is the membership grade of length r in the fuzzy structure length (cf. Table 6).

| N | Structure | $a$ | $m(B_1)$ | $\mu(1)$ | $m(B_2)$ | $\mu(2)$ | $m(B_3)$ | $\mu(3)$ | $m(B_4)$ | $\mu(4)$ | $\bar{i}$ | $\bar{r}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mixed case | 0.6983 | 0.2711 | 1.0000 | 0.6000 | 0.7289 | 0.1309 | 0.1281 | – | – | 2.5531 | 1.8654 |
| 2 | cc | 0.6639 | 0.3304 | 1.0000 | 0.5475 | 0.6696 | 0.1167 | 0.1221 | – | – | 2.4630 | 1.7755 |
| 3 | cccc | 0.6242 | 0.1474 | 1.0000 | 0.5238 | 0.8525 | 0.3337 | 0.3287 | – | – | 2.7997 | 2.1961 |
| 4 | cccccc | 0.1732 | 0.0000 | 1.0000 | 0.1990 | 1.000 | 0.4945 | 0.8018 | 0.3042 | 0.3065 | 3.2886 | 3.0983 |
| 5 | ccc | 0.2965 | 0.1949 | 1.0000 | 0.6289 | 0.8051 | 0.1294 | 0.1762 | 0.0320 | 0.0468 | 2.3251 | 1.9689 |

Empirical and theoretical frequencies

| N | P(1) | F(1) | P(2) | F(2) | P(3) | F(3) | P(4) | F(4) | P(5) | F(5) | P(6) | F(6) | P(7) | F(7) | P(8) | F(8) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.1348 | 0.1348 | 0.3930 | 0.3930 | 0.3067 | 0.3067 | 0.1260 | 0.1260 | 0.0311 | 0.0311 | 0.0063 | 0.0068 | 0.0015 | 0.0015 | 0.0002 | 0.0001 |
| 2 | 0.1701 | 0.1701 | 0.3948 | 0.3948 | 0.2847 | 0.2847 | 0.1123 | 0.1122 | 0.0303 | 0.0284 | 0.0062 | 0.0054 | 0.0015 | 0.0015 | 0.0001 | 0.0000 |
| 3 | 0.0790 | 0.0790 | 0.3299 | 0.3299 | 0.3693 | 0.3693 | 0.1698 | 0.1694 | 0.0410 | 0.0463 | 0.0077 | 0.0077 | 0.0030 | 0.0033 | 0.0002 | 0.0003 |
| 4 | 0.0000 | 0.0000 | 0.1674 | 0.1674 | 0.4449 | 0.4449 | 0.3304 | 0.3304 | 0.0507 | 0.0503 | 0.0022 | 0.0042 | 0.0044 | 0.0024 | 0.0000 | 0.0000 |
| 5 | 0.1449 | 0.1449 | 0.5105 | 0.5105 | 0.2473 | 0.2412 | 0.0768 | 0.0750 | 0.0143 | 0.0140 | 0.0011 | 0.0017 | 0.0000 | 0.0017 | 0.0000 | 0.0000 |
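The tables report empirical and model frequencies side by side without a summary statistic. As an illustration only, two common ways to summarize the fit are the largest absolute deviation and the Kullback-Leibler divergence (cf. [8]). The paper itself does not use these measures in this section. The helper names below are hypothetical, and both distributions are normalized before computing the divergence.

```python
from math import log

def normalize(p):
    total = sum(p)
    return [x / total for x in p]

def kl_divergence(p, q):
    """Kullback-Leibler divergence sum p_i log(p_i / q_i) of normalized p from normalized q.
    Terms with p_i = 0 contribute nothing; q_i must be > 0 wherever p_i > 0."""
    p, q = normalize(p), normalize(q)
    return sum(pi * log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def max_abs_deviation(p, q):
    return max(abs(pi - qi) for pi, qi in zip(p, q))

# Row 1 (English mixed case) of Table 1: empirical P(i) and model F(i), i = 1..8.
P = [0.1348, 0.3930, 0.3067, 0.1260, 0.0311, 0.0063, 0.0015, 0.0002]
F = [0.1348, 0.3930, 0.3067, 0.1260, 0.0311, 0.0068, 0.0015, 0.0001]
print(max_abs_deviation(P, F), kl_divergence(P, F))
```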

F. Criado et al. / Information Sciences 147 (2002) 13–44


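Table 2 French

Columns as in Table 1: $\bar{i}$ is the mean value of word length (experiment); $\mu(r)$ is the membership grade of length r (cf. Table 6).

| N | Structure | $a$ | $m(B_1)$ | $\mu(1)$ | $m(B_2)$ | $\mu(2)$ | $m(B_3)$ | $\mu(3)$ | $m(B_4)$ | $\mu(4)$ | $m(B_5)$ | $\mu(5)$ | $\bar{i}$ | $\bar{r}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mixed case | 0.7099 | 0.1218 | 1.0000 | 0.5145 | 0.8782 | 0.3608 | 0.3737 | 0.0000 | 0.0029 | – | – | 2.9602 | 2.2332 |
| 2 | cc | 0.6791 | 0.1341 | 1.0000 | 0.5353 | 0.8659 | 0.3455 | 0.3300 | 0.0000 | 0.0096 | – | – | 2.8779 | 2.2412 |
| 3 | cccc | 0.3641 | 0.0384 | 1.0000 | 0.3109 | 0.9616 | 0.6507 | 0.6507 | 0.1573 | 0.1857 | 0.0396 | 0.0284 | 3.2295 | 2.8819 |
| 4 | cccccc | 0.3287 | 0.0000 | 1.0000 | 0.0711 | 1.0000 | 0.5171 | 0.9289 | 0.3050 | 0.4118 | 0.0979 | 0.0968 | 3.7877 | 3.4070 |
| 5 | ccc | 0.6470 | 0.1738 | 1.0000 | 0.5846 | 0.8262 | 0.2081 | 0.2416 | 0.0000 | 0.0335 | – | – | 2.9888 | 1.9673 |

Empirical and theoretical frequencies

| N | P(1) | F(1) | P(2) | F(2) | P(3) | F(3) | P(4) | F(4) | P(5) | F(5) | P(6) | F(6) | P(7) | F(7) | P(8) | F(8) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0599 | 0.0599 | 0.2955 | 0.2955 | 0.3716 | 0.3716 | 0.1936 | 0.1929 | 0.0605 | 0.0603 | 0.0148 | 0.0133 | 0.0034 | 0.0023 | 0.0004 | 0.0003 |
| 2 | 0.0680 | 0.0680 | 0.3176 | 0.3176 | 0.3678 | 0.3733 | 0.1801 | 0.1843 | 0.0514 | 0.0545 | 0.0123 | 0.0120 | 0.0024 | 0.0015 | 0.0003 | 0.0002 |
| 3 | 0.0267 | 0.0000 | 0.2160 | 0.2160 | 0.4015 | 0.4015 | 0.2412 | 0.2412 | 0.0904 | 0.0904 | 0.0200 | 0.0100 | 0.0033 | 0.0029 | 0.0008 | 0.0008 |
| 4 | 0.0000 | 0.0000 | 0.0512 | 0.0512 | 0.3891 | 0.3891 | 0.3447 | 0.3447 | 0.1638 | 0.1631 | 0.0375 | 0.0373 | 0.0136 | 0.0148 | 0.0000 | 0.0000 |
| 5 | 0.0745 | 0.0745 | 0.3137 | 0.3137 | 0.3282 | 0.3282 | 0.1729 | 0.1715 | 0.0683 | 0.0608 | 0.0279 | 0.0327 | 0.0134 | 0.0032 | 0.0010 | 0.0045 |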



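Table 3 Latin

Columns as in Table 1: $\bar{i}$ is the mean value of word length (experiment); $\mu(r)$ is the membership grade of length r (cf. Table 6).

| N | Structure | $a$ | $m(B_1)$ | $\mu(1)$ | $m(B_2)$ | $\mu(2)$ | $m(B_3)$ | $\mu(3)$ | $m(B_4)$ | $\mu(4)$ | $m(B_5)$ | $\mu(5)$ | $\bar{i}$ | $\bar{r}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mixed case | 0.5456 | 0.0139 | 1.0000 | 0.2837 | 0.9860 | 0.5203 | 0.7023 | 0.1108 | 0.1820 | 0.0000 | 0.0053 | 3.4213 | 2.5854 |
| 2 | cc | 0.5515 | 0.0149 | 1.0000 | 0.3400 | 0.9851 | 0.4815 | 0.6451 | 0.1510 | 0.1636 | 0.0000 | 0.1260 | 3.3504 | 2.7434 |
| 3 | cccc | 0.5855 | 0.0123 | 1.0000 | 0.1693 | 0.9877 | 0.6023 | 0.8184 | 0.2284 | 0.2161 | 0.0000 | 0.0000 | 3.6045 | 3.0714 |
| 4 | cccccc | 0.4196 | 0.0000 | 1.0000 | 0.0374 | 1.0000 | 0.3209 | 0.9626 | 0.5853 | 0.6417 | 0.0000 | 0.0902 | 4.1141 | 3.4087 |
| 5 | ccc | 0.5466 | 0.0295 | 1.0000 | 0.2724 | 0.9775 | 0.5895 | 0.7061 | 0.1284 | 0.1156 | – | – | 3.3514 | 2.8564 |

Empirical and theoretical frequencies

| N | P(1) | F(1) | P(2) | F(2) | P(3) | F(3) | P(4) | F(4) | P(5) | F(5) | P(6) | F(6) | P(7) | F(7) | P(8) | F(8) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0081 | 0.0000 | 0.1644 | 0.1644 | 0.3912 | 0.3912 | 0.2998 | 0.2535 | 0.1098 | 0.0844 | 0.0245 | 0.0203 | 0.0020 | 0.0029 | 0.0002 | 0.0006 |
| 2 | 0.0086 | 0.0000 | 0.1959 | 0.1959 | 0.3854 | 0.3854 | 0.2831 | 0.2897 | 0.1030 | 0.1000 | 0.0217 | 0.0217 | 0.0022 | 0.0035 | 0.0001 | 0.0004 |
| 3 | 0.0069 | 0.0000 | 0.0943 | 0.0943 | 0.3906 | 0.3906 | 0.3397 | 0.3397 | 0.1351 | 0.1351 | 0.0307 | 0.0347 | 0.0021 | 0.0059 | 0.0005 | 0.0008 |
| 4 | 0.0000 | 0.0000 | 0.0246 | 0.0246 | 0.2213 | 0.2213 | 0.4754 | 0.4754 | 0.1803 | 0.1803 | 0.0902 | 0.0365 | 0.0081 | 0.0050 | 0.0000 | 0.0005 |
| 5 | 0.0130 | 0.0130 | 0.1580 | 0.1650 | 0.4283 | 0.4296 | 0.2850 | 0.2850 | 0.0961 | 0.0751 | 0.0195 | 0.0210 | 0.0000 | 0.0033 | 0.0000 | 0.0000 |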


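Table 4 Spanish

Columns as in Table 1: $\bar{i}$ is the mean value of word length (experiment); $\mu(r)$ is the membership grade of length r (cf. Table 6).

| N | Structure | $a$ | $m(B_1)$ | $\mu(1)$ | $m(B_2)$ | $\mu(2)$ | $m(B_3)$ | $\mu(3)$ | $m(B_4)$ | $\mu(4)$ | $m(B_5)$ | $\mu(5)$ | $m(B_6)$ | $\mu(6)$ | $m(B_7)$ | $\mu(7)$ | $\bar{i}$ | $\bar{r}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mixed case | 0.7504 | 0.0044 | 1.0000 | 0.2219 | 0.9956 | 0.4985 | 0.7737 | 0.1719 | 0.2752 | 0.0985 | 0.1033 | – | – | – | – | 3.7844 | 3.1138 |
| 2 | cc | 0.5475 | 0.0054 | 1.0000 | 0.2420 | 0.9405 | 0.4634 | 0.7526 | 0.2897 | 0.2892 | 0.0008 | 0.0601 | – | – | – | – | 3.5912 | 3.0424 |
| 3 | cccc | 0.6304 | 0.0009 | 1.0000 | 0.0586 | 0.9991 | 0.4531 | 0.9405 | 0.4279 | 0.4874 | 0.0474 | 0.0595 | 0.0157 | 0.0121 | – | – | 4.1252 | 3.5402 |
| 4 | cccccc | 0.2930 | 0.0000 | 1.0000 | 0.0011 | 1.0000 | 0.0840 | 0.9989 | 0.3829 | 0.9149 | 0.3863 | 0.5320 | 0.1243 | 0.1457 | 0.0229 | 0.0214 | 4.9046 | 4.6234 |
| 5 | ccc | 0.2402 | 0.0000 | 1.0000 | 0.1527 | 1.0000 | 0.4389 | 0.8473 | 0.3431 | 0.4184 | 0.0591 | 0.0653 | 0.0035 | 0.0062 | 0.0000 | 0.0013 | 3.5687 | 3.3110 |

Empirical and theoretical frequencies

| N | P(1) | F(1) | P(2) | F(2) | P(3) | F(3) | P(4) | F(4) | P(5) | F(5) | P(6) | F(6) | P(7) | F(7) | P(8) | F(8) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0021 | 0.0000 | 0.1048 | 0.1047 | 0.3140 | 0.3140 | 0.3463 | 0.3463 | 0.1662 | 0.1788 | 0.0523 | 0.0574 | 0.0129 | 0.0153 | 0.0013 | 0.0020 |
| 2 | 0.0031 | 0.0000 | 0.1400 | 0.1400 | 0.3447 | 0.3447 | 0.3353 | 0.3353 | 0.1362 | 0.1361 | 0.0335 | 0.0332 | 0.0068 | 0.0063 | 0.0004 | 0.0008 |
| 3 | 0.0005 | 0.0000 | 0.0312 | 0.0312 | 0.2609 | 0.2609 | 0.3861 | 0.3861 | 0.2181 | 0.2104 | 0.0798 | 0.0711 | 0.0214 | 0.0211 | 0.0020 | 0.0037 |
| 4 | 0.0000 | 0.0000 | 0.0008 | 0.0000 | 0.0627 | 0.0627 | 0.3040 | 0.3040 | 0.3746 | 0.3746 | 0.1897 | 0.1897 | 0.0579 | 0.0579 | 0.0103 | 0.0103 |
| 5 | 0.0000 | 0.0000 | 0.1201 | 0.1201 | 0.3740 | 0.3740 | 0.3562 | 0.3562 | 0.1215 | 0.1215 | 0.0225 | 0.0225 | 0.0056 | 0.0027 | 0.0000 | 0.0002 |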



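Table 5 Georgian

Columns as in Table 1: $\bar{i}$ is the mean value of word length (experiment); $\mu(r)$ is the membership grade of length r (cf. Table 6).

| N | Structure | $a$ | $m(B_1)$ | $\mu(1)$ | $m(B_2)$ | $\mu(2)$ | $m(B_3)$ | $\mu(3)$ | $m(B_4)$ | $\mu(4)$ | $m(B_5)$ | $\mu(5)$ | $m(B_6)$ | $\mu(6)$ | $m(B_7)$ | $\mu(7)$ | $m(B_8)$ | $\mu(8)$ | $\bar{i}$ | $\bar{r}$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mixed case | 0.6216 | 0.0136 | 1.0000 | 0.2199 | 0.9864 | 0.3770 | 0.7665 | 0.2446 | 0.3895 | 0.1236 | 0.1419 | 0.0000 | 0.0213 | – | – | – | – | 3.9542 | 3.1808 |
| 2 | cc | 0.6546 | 0.0110 | 1.0000 | 0.2446 | 0.9890 | 0.3787 | 0.7444 | 0.2449 | 0.3657 | 0.1244 | 0.1208 | 0.0000 | 0.0000 | – | – | – | – | 3.8923 | 3.2379 |
| 3 | cccc | 0.5499 | 0.0012 | 1.0000 | 0.0964 | 0.9988 | 0.2898 | 0.9084 | 0.3722 | 0.6126 | 0.1687 | 0.2404 | 0.0693 | 0.0717 | – | – | – | – | 4.3921 | 3.8115 |
| 4 | cccccc | 0.5877 | 0.0000 | 1.0000 | 0.0000 | 1.0000 | 0.1200 | 0.9859 | 0.3935 | 0.8659 | 0.2625 | 0.4724 | 0.1625 | 0.2099 | 0.0090 | 0.0474 | 0.0518 | 0.0465 | 5.2287 | 4.7412 |
| 5 | ccc | 0.3094 | 0.0118 | 1.0000 | 0.2229 | 0.9882 | 0.4062 | 0.7653 | 0.2101 | 0.3591 | 0.1091 | 0.1490 | 0.0339 | 0.0399 | – | – | – | – | 3.5801 | 3.2655 |

Empirical and theoretical frequencies

| N | P(1) | F(1) | P(2) | F(2) | P(3) | F(3) | P(4) | F(4) | P(5) | F(5) | P(6) | F(6) | P(7) | F(7) | P(8) | F(8) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | – | – | 0.1181 | 0.1181 | 0.2757 | 0.2759 | 0.2821 | 0.2801 | 0.1932 | 0.1920 | 0.0873 | 0.0755 | 0.0277 | 0.0276 | 0.0000 | 0.0066 |
| 2 | – | – | 0.1271 | 0.1271 | 0.2800 | 0.2800 | 0.2833 | 0.2833 | 0.1945 | 0.1948 | 0.0789 | 0.0799 | 0.0240 | 0.0243 | 0.0000 | 0.0042 |
| 3 | – | – | 0.0556 | 0.0556 | 0.1978 | 0.1978 | 0.3145 | 0.3151 | 0.2418 | 0.2422 | 0.1307 | 0.1308 | 0.0433 | 0.0449 | 0.0000 | 0.0096 |
| 4 | – | – | – | – | 0.0745 | 0.0748 | 0.2624 | 0.2624 | 0.2872 | 0.2872 | 0.2163 | 0.2163 | 0.0496 | 0.0867 | 0.0496 | 0.0512 |
| 5 | – | – | 0.1636 | 0.1636 | 0.3488 | 0.3488 | 0.2543 | 0.2543 | 0.1429 | 0.1429 | 0.0579 | 0.0578 | 0.0122 | 0.0122 | 0.0031 | 0.0016 |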


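English model distributions:

$$F_{\text{mix}}(i) = e^{-0.6983}\left(0.2711\,\frac{(0.6983)^{i-1}}{(i-1)!} + 0.6000\,\frac{(0.6983)^{i-2}}{(i-2)!} + 0.1309\,\frac{(0.6983)^{i-3}}{(i-3)!}\right);$$

$$F_{cc}(i) = e^{-0.6639}\left(0.3304\,\frac{(0.6639)^{i-1}}{(i-1)!} + 0.5475\,\frac{(0.6639)^{i-2}}{(i-2)!} + 0.1167\,\frac{(0.6639)^{i-3}}{(i-3)!}\right);$$

$$F_{cccc}(i) = e^{-0.6242}\left(0.1474\,\frac{(0.6242)^{i-1}}{(i-1)!} + 0.5238\,\frac{(0.6242)^{i-2}}{(i-2)!} + 0.3337\,\frac{(0.6242)^{i-3}}{(i-3)!}\right);$$

$$F_{cccccc}(i) = e^{-0.1732}\left(0.1990\,\frac{(0.1732)^{i-2}}{(i-2)!} + 0.4945\,\frac{(0.1732)^{i-3}}{(i-3)!} + 0.3042\,\frac{(0.1732)^{i-4}}{(i-4)!}\right);$$

$$F_{ccc}(i) = e^{-0.2965}\left(0.1949\,\frac{(0.2965)^{i-1}}{(i-1)!} + 0.6289\,\frac{(0.2965)^{i-2}}{(i-2)!} + 0.1294\,\frac{(0.2965)^{i-3}}{(i-3)!} + 0.0320\,\frac{(0.2965)^{i-4}}{(i-4)!}\right).$$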
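The English model distributions can be checked numerically. This short sketch uses a hypothetical helper, `model_frequency`, to evaluate each formula above for i = 1, ..., 8. The coefficients are transcribed from the formulas, and the printed values can be compared with the F(i) columns of Table 1. Lengths below the smallest focal length, such as i = 1 for cccccc, get probability 0.

```python
from math import exp, factorial

def model_frequency(i, a, masses):
    """F(i) = e^(-a) * sum_r m(B_r) a^(i-r) / (i-r)!, terms with r > i omitted."""
    return exp(-a) * sum(m * a ** (i - r) / factorial(i - r)
                         for r, m in masses.items() if i >= r)

english = {
    "mixed":  (0.6983, {1: 0.2711, 2: 0.6000, 3: 0.1309}),
    "cc":     (0.6639, {1: 0.3304, 2: 0.5475, 3: 0.1167}),
    "cccc":   (0.6242, {1: 0.1474, 2: 0.5238, 3: 0.3337}),
    "cccccc": (0.1732, {2: 0.1990, 3: 0.4945, 4: 0.3042}),
    "ccc":    (0.2965, {1: 0.1949, 2: 0.6289, 3: 0.1294, 4: 0.0320}),
}

for name, (a, masses) in english.items():
    print(name, [round(model_frequency(i, a, masses), 4) for i in range(1, 9)])
```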

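French model distributions:

$$F_{\text{mix}}(i) = e^{-0.7099}\left(0.1218\,\frac{(0.7099)^{i-1}}{(i-1)!} + 0.5145\,\frac{(0.7099)^{i-2}}{(i-2)!} + 0.3608\,\frac{(0.7099)^{i-3}}{(i-3)!}\right);$$

$$F_{cc}(i) = e^{-0.6791}\left(0.1341\,\frac{(0.6791)^{i-1}}{(i-1)!} + 0.5353\,\frac{(0.6791)^{i-2}}{(i-2)!} + 0.3455\,\frac{(0.6791)^{i-3}}{(i-3)!}\right);$$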


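$$F_{cccc}(i) = e^{-0.3641}\left(0.0384\,\frac{(0.3641)^{i-1}}{(i-1)!} + 0.3109\,\frac{(0.3641)^{i-2}}{(i-2)!} + 0.6507\,\frac{(0.3641)^{i-3}}{(i-3)!} + 0.1573\,\frac{(0.3641)^{i-4}}{(i-4)!} + 0.0396\,\frac{(0.3641)^{i-5}}{(i-5)!}\right);$$

$$F_{cccccc}(i) = e^{-0.3287}\left(0.0711\,\frac{(0.3287)^{i-2}}{(i-2)!} + 0.5171\,\frac{(0.3287)^{i-3}}{(i-3)!} + 0.3050\,\frac{(0.3287)^{i-4}}{(i-4)!} + 0.0979\,\frac{(0.3287)^{i-5}}{(i-5)!}\right);$$

$$F_{ccc}(i) = e^{-0.6470}\left(0.1738\,\frac{(0.6470)^{i-1}}{(i-1)!} + 0.5846\,\frac{(0.6470)^{i-2}}{(i-2)!} + 0.2081\,\frac{(0.6470)^{i-3}}{(i-3)!}\right).$$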

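Latin model distributions:

$$F_{\text{mix}}(i) = e^{-0.5456}\left(0.0139\,\frac{(0.5456)^{i-1}}{(i-1)!} + 0.2837\,\frac{(0.5456)^{i-2}}{(i-2)!} + 0.5203\,\frac{(0.5456)^{i-3}}{(i-3)!} + 0.1108\,\frac{(0.5456)^{i-4}}{(i-4)!}\right);$$

$$F_{cc}(i) = e^{-0.5515}\left(0.0149\,\frac{(0.5515)^{i-1}}{(i-1)!} + 0.3400\,\frac{(0.5515)^{i-2}}{(i-2)!} + 0.4815\,\frac{(0.5515)^{i-3}}{(i-3)!} + 0.1510\,\frac{(0.5515)^{i-4}}{(i-4)!}\right);$$

$$F_{cccc}(i) = e^{-0.5855}\left(0.0123\,\frac{(0.5855)^{i-1}}{(i-1)!} + 0.1693\,\frac{(0.5855)^{i-2}}{(i-2)!} + 0.6023\,\frac{(0.5855)^{i-3}}{(i-3)!} + 0.2284\,\frac{(0.5855)^{i-4}}{(i-4)!}\right);$$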


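$$F_{cccccc}(i) = e^{-0.4196}\left(0.0374\,\frac{(0.4196)^{i-2}}{(i-2)!} + 0.3209\,\frac{(0.4196)^{i-3}}{(i-3)!} + 0.5853\,\frac{(0.4196)^{i-4}}{(i-4)!}\right);$$

$$F_{ccc}(i) = e^{-0.5466}\left(0.0295\,\frac{(0.5466)^{i-1}}{(i-1)!} + 0.2724\,\frac{(0.5466)^{i-2}}{(i-2)!} + 0.5895\,\frac{(0.5466)^{i-3}}{(i-3)!} + 0.1284\,\frac{(0.5466)^{i-4}}{(i-4)!}\right).$$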

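Spanish model distributions:

$$F_{\text{mix}}(i) = e^{-0.7504}\left(0.0044\,\frac{(0.7504)^{i-1}}{(i-1)!} + 0.2219\,\frac{(0.7504)^{i-2}}{(i-2)!} + 0.4985\,\frac{(0.7504)^{i-3}}{(i-3)!} + 0.1719\,\frac{(0.7504)^{i-4}}{(i-4)!} + 0.0985\,\frac{(0.7504)^{i-5}}{(i-5)!}\right);$$

$$F_{cc}(i) = e^{-0.5475}\left(0.0054\,\frac{(0.5475)^{i-1}}{(i-1)!} + 0.2420\,\frac{(0.5475)^{i-2}}{(i-2)!} + 0.4634\,\frac{(0.5475)^{i-3}}{(i-3)!} + 0.2897\,\frac{(0.5475)^{i-4}}{(i-4)!} + 0.0008\,\frac{(0.5475)^{i-5}}{(i-5)!}\right);$$

$$F_{cccc}(i) = e^{-0.6304}\left(0.0009\,\frac{(0.6304)^{i-1}}{(i-1)!} + 0.0586\,\frac{(0.6304)^{i-2}}{(i-2)!} + 0.4531\,\frac{(0.6304)^{i-3}}{(i-3)!} + 0.4279\,\frac{(0.6304)^{i-4}}{(i-4)!} + 0.0474\,\frac{(0.6304)^{i-5}}{(i-5)!} + 0.0157\,\frac{(0.6304)^{i-6}}{(i-6)!}\right);$$

$$F_{cccccc}(i) = e^{-0.2930}\left(0.0011\,\frac{(0.2930)^{i-2}}{(i-2)!} + 0.0840\,\frac{(0.2930)^{i-3}}{(i-3)!} + 0.3829\,\frac{(0.2930)^{i-4}}{(i-4)!} + 0.3863\,\frac{(0.2930)^{i-5}}{(i-5)!} + 0.1243\,\frac{(0.2930)^{i-6}}{(i-6)!} + 0.0229\,\frac{(0.2930)^{i-7}}{(i-7)!}\right);$$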


$$F_{ccc}(i) = e^{-0.2402}\left(0.1527\,\frac{(0.2402)^{i-2}}{(i-2)!} + 0.4389\,\frac{(0.2402)^{i-3}}{(i-3)!} + 0.3431\,\frac{(0.2402)^{i-4}}{(i-4)!} + 0.0591\,\frac{(0.2402)^{i-5}}{(i-5)!} + 0.0035\,\frac{(0.2402)^{i-6}}{(i-6)!}\right).$$

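Georgian model distributions:

$$F_{\text{mix}}(i) = e^{-0.6216}\left(0.0136\,\frac{(0.6216)^{i-1}}{(i-1)!} + 0.2199\,\frac{(0.6216)^{i-2}}{(i-2)!} + 0.3770\,\frac{(0.6216)^{i-3}}{(i-3)!} + 0.2446\,\frac{(0.6216)^{i-4}}{(i-4)!} + 0.1244\,\frac{(0.6216)^{i-5}}{(i-5)!}\right);$$

$$F_{cc}(i) = e^{-0.6546}\left(0.0110\,\frac{(0.6546)^{i-1}}{(i-1)!} + 0.2446\,\frac{(0.6546)^{i-2}}{(i-2)!} + 0.3787\,\frac{(0.6546)^{i-3}}{(i-3)!} + 0.2449\,\frac{(0.6546)^{i-4}}{(i-4)!} + 0.1244\,\frac{(0.6546)^{i-5}}{(i-5)!}\right);$$

$$F_{cccc}(i) = e^{-0.5499}\left(0.0012\,\frac{(0.5499)^{i-1}}{(i-1)!} + 0.0964\,\frac{(0.5499)^{i-2}}{(i-2)!} + 0.2898\,\frac{(0.5499)^{i-3}}{(i-3)!} + 0.3722\,\frac{(0.5499)^{i-4}}{(i-4)!} + 0.1687\,\frac{(0.5499)^{i-5}}{(i-5)!} + 0.0693\,\frac{(0.5499)^{i-6}}{(i-6)!}\right);$$

$$F_{cccccc}(i) = e^{-0.5877}\left(0.1200\,\frac{(0.5877)^{i-3}}{(i-3)!} + 0.3935\,\frac{(0.5877)^{i-4}}{(i-4)!} + 0.2625\,\frac{(0.5877)^{i-5}}{(i-5)!} + 0.1625\,\frac{(0.5877)^{i-6}}{(i-6)!} + 0.0090\,\frac{(0.5877)^{i-7}}{(i-7)!} + 0.0518\,\frac{(0.5877)^{i-8}}{(i-8)!}\right);$$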


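$$F_{ccc}(i) = e^{-0.3094}\left(0.0118\,\frac{(0.3094)^{i-1}}{(i-1)!} + 0.2229\,\frac{(0.3094)^{i-2}}{(i-2)!} + 0.4062\,\frac{(0.3094)^{i-3}}{(i-3)!} + 0.2101\,\frac{(0.3094)^{i-4}}{(i-4)!} + 0.1091\,\frac{(0.3094)^{i-5}}{(i-5)!} + 0.0339\,\frac{(0.3094)^{i-6}}{(i-6)!}\right).$$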

Table 6

| Language | Structure | Phonological structure length |
|---|---|---|
| English | Mixed case | $\tilde{1}$ = (1.0000/1; 0.7289/2; 0.1281/3) |
| | cc | $\tilde{1}$ = (1.0000/1; 0.6696/2; 0.1221/3) |
| | cccc | $\tilde{1}$ = (1.0000/1; 0.8525/2; 0.3287/3) |
| | cccccc | $(\tilde{1}; \tilde{2})$ = (1.0000/1; 1.0000/2; 0.8018/3; 0.3042/4) |
| | ccc | $\tilde{1}$ = (1.0000/1; 0.8051/2; 0.1762/3; 0.0468/4) |
| French | Mixed case | $\tilde{1}$ = (1.0000/1; 0.8732/2; 0.3737/3; 0.0029/4; 0.0029/5) |
| | cc | $\tilde{1}$ = (1.0000/1; 0.8659/2; 0.3300/3; 0.0096/4) |
| | cccc | $\tilde{1}$ = (1.0000/1; 0.9616/2; 0.6507/3; 0.1857/4; 0.0284/5) |
| | cccccc | $(\tilde{1}; \tilde{2})$ = (1.0000/1; 1.0000/2; 0.9289/3; 0.4118/4; 0.0968/5) |
| | ccc | $\tilde{1}$ = (1.0000/1; 0.8262/2; 0.2416/3; 0.0335/4) |
| Latin | Mixed case | $\tilde{1}$ = (1.0000/1; 0.9860/2; 0.7023/3; 0.1820/4; 0.0053/5) |
| | cc | $\tilde{1}$ = (1.0000/1; 0.9851/2; 0.6451/3; 0.1636/4; 0.1260/5) |
| | cccc | $\tilde{1}$ = (1.0000/1; 0.9877/2; 0.8184/3; 0.2161/4) |
| | cccccc | $(\tilde{1}; \tilde{2})$ = (1.0000/1; 1.0000/2; 0.9626/3; 0.6417/4; 0.0902/5) |
| | ccc | $\tilde{1}$ = (1.0000/1; 0.9775/2; 0.7061/3; 0.1156/4) |
| Spanish | Mixed case | $\tilde{1}$ = (1.0000/1; 0.9956/2; 0.7737/3; 0.2752/4; 0.1033/5) |
| | cc | $\tilde{1}$ = (1.0000/1; 0.9405/2; 0.7526/3; 0.2892/4; 0.0601/5) |
| | cccc | $\tilde{1}$ = (1.0000/1; 0.9991/2; 0.9405/3; 0.4874/4; 0.0595/5; 0.0121/6) |
| | cccccc | $(\tilde{1}; \tilde{2})$ = (1.0000/1; 1.0000/2; 0.9989/3; 0.9149/4; 0.5320/5; 0.1457/6; 0.0214/7) |
| | ccc | $\tilde{1}$ = (1.0000/1; 1.0000/2; 0.8473/3; 0.4184/4; 0.0653/5; 0.0062/6; 0.0013/7) |
| Georgian | Mixed case | $\tilde{1}$ = (1.0000/1; 0.9864/2; 0.7665/3; 0.3895/4; 0.1419/5; 0.0213/6) |
| | cc | $\tilde{1}$ = (1.0000/1; 0.9890/2; 0.7444/3; 0.3657/4; 0.1208/5) |
| | cccc | $\tilde{1}$ = (1.0000/1; 0.9988/2; 0.9084/3; 0.6126/4; 0.2404/5; 0.0717/6) |
| | cccccc | $(\tilde{1}; \tilde{2})$ = (1.0000/1; 1.0000/2; 0.8659/3; 0.8659/4; 0.4724/5; 0.2099/6; 0.0474/7; 0.0465/8) |
| | ccc | $\tilde{1}$ = (1.0000/1; 0.9882/2; 0.7653/3; 0.3591/4; 0.1490/5; 0.0399/6) |


The findings of this study suggest that real structures are characterized by fuzzy phonological lengths (the number of sounds in the structure). Words (cortèges, i.e. tuples, of chosen elements) are represented by mixtures of certain focal cortèges $B_r$ with focal probabilities $m(B_r)$, and by a fuzzy unimodal structure with a length of "approximately 1"; such a model is assumed for the mixed case and for the bags (1/v, 2/c), (1/v, 3/c) and (1/v, 4/c). The bag (2/v, 6/c) corresponds to a fuzzy bimodal structure model with a length of "approximately 2".

The data on the fuzzy structure lengths are given in Table 6.
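One simple way to read a row of Table 6 is through its core (the lengths with full membership) and its α-cuts, or level sets (cf. [13]). This sketch is illustrative only. The helper names `core` and `alpha_cut` are hypothetical, and the level α = 0.5 is an arbitrary choice. In the two English rows, the mixed case has full membership only at length 1, while cccccc has full membership at both 1 and 2.

```python
def alpha_cut(fuzzy, alpha):
    """Lengths whose membership grade is at least alpha."""
    return sorted(x for x, g in fuzzy.items() if g >= alpha)

def core(fuzzy):
    """Lengths with the maximal membership grade."""
    top = max(fuzzy.values())
    return sorted(x for x, g in fuzzy.items() if g == top)

# Two English rows of Table 6, written as {length: membership grade}.
english_mixed = {1: 1.0000, 2: 0.7289, 3: 0.1281}
english_cccccc = {1: 1.0000, 2: 1.0000, 3: 0.8018, 4: 0.3042}

for name, fz in [("mixed", english_mixed), ("cccccc", english_cccccc)]:
    print(name, "core:", core(fz), "0.5-cut:", alpha_cut(fz, 0.5))
```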

References

[1] G. Birkhoff, Lattice Theory, NY, 1981.

[2] F. Criado, T. Gachechiladze, Fuzzy random events and their corresponding conditional probability measures, Real Academia de Ciencias Exactas LXXXIX (1995).

[3] W. Fucks, Mathematical theory of word formation, Communication Theory, London, 1953.

[4] T. Gachechiladze, T. Manjaparashvili, Fuzzy generalized Bernoulli distributions, in: Proceedings of Tbilisi State University, Cybernetics, Applied Mathematics, vol. 224, 1981.

[5] T. Gachechiladze, T. Manjaparashvili, On fuzzy sets, Rep. of Tbilisi University 279 (1988).

[6] T. Gachechiladze, T. Manjaparashvili, Fuzzy random events and corresponding probability measures, Rep. of Tbilisi University 300 (1990).

[7] T. Gachechiladze, T. Manjaparashvili, Fuzzy linguistical models, in: Quantitative Linguistic, Tallin-Tbilisi, 1990.

[8] S. Kullback, Information Theory and Statistics, John Wiley, London, 1958.

[9] B. Mandelbrot, An information theory and statistical structure of language, Communication Theory, London, 1953.

[10] R. Megrelishvili, Structures of word and mathematical theory of word formation, in: Proceedings of Tbilisi State University, Cybernetics, Applied Mathematics, vol. 289, 1989.

[11] H. Weyl, Philosophie der Mathematik und Naturwissenschaft, in: A. Baeumler, M. Schröter (Eds.), Handbuch der Philosophie, 1927.

[12] R. Yager, On the measures of fuzziness and negation, I, Intern. J. General Systems 5 (1979) 221.

[13] R. Yager, Level sets for evaluation of the grade of membership of the fuzzy sets, in: R. Yager (Ed.), Fuzzy Sets and Possibility Theory, Pergamon Press, Oxford, 1984.

[14] R. Yager, On the theory of bags, Tech Report MII-601, Iona College, Machine Intelligence Inst. (1986).

[15] L. Zadeh, Fuzzy sets, Inform. and Control 8 (1965) 338.

[16] L. Zadeh, Probability measures of fuzzy events, J. Math. Anal. and Applic. 23 (1968) 421.

[17] L. Zadeh, The concept of linguistic variable and its application to approximate reasoning, Information Sciences 8 (1975) 199–249 (see also pp. 301–357).