
The bag model in language statistics

F. Criado a,*, T. Gachechiladze b, H. Meladze b, G. Tsertsradze b

a Facultad de Ciencias, Campus de Teatinos, Universidad de Málaga, 29071 Málaga, Spain
b Department of Applied Mathematics and Computer Science, Tbilisi State University, 1, Chavchavadze Ave, Tbilisi 380028, Georgia

Received 8 May 2000; received in revised form 3 November 2001; accepted 30 January 2002

Abstract

In this paper, fuzzy quantitative models of language statistics are constructed. All suggested models are based on the assumption of a superposition of two kinds of uncertainty: probabilistic and possibilistic. This superposition is realized in statistical distributions by the probability measure splitting procedure. In this way, fuzzy versions of the generalized binomial, Fucks and Zipf–Mandelbrot distributions are constructed, describing the probabilistic and possibilistic organization of language at any level: morphological, syntactic or phonological. The main problem when constructing the quantitative model of some fuzzy linear structure is finding the corresponding linguistic spectrum, which is reduced to the solution of systems of algebraic or transcendental equations by inverse spline-interpolation. In the final sections, the general linear mathematical model of language structures is described briefly, as well as bag statistics for consonantal structures of languages.

© 2002 Elsevier Science Inc. All rights reserved.

Keywords: Fuzzy sets; Membership functions; Probability theory; Linguistic modeling

www.elsevier.com/locate/ins

*Corresponding author.

E-mail address: [email protected] (F. Criado).

0020-0255/02/$ - see front matter. PII: S0020-0255(02)00201-3

1. Introduction

Fuzzy logic and fuzzy set theory were initially proposed to describe linguistic variables, i.e., to describe the meaning of words in natural language. Originally, Zadeh thought that linguistics would be one of the major fields of application for this new formalism. Surprisingly, the main area of application is now control, and in comparison with control there are only a few applications in the field of linguistics. In view of this, this paper describes a new approach to the study of natural language.

Section 2 gives a new approach to the representation of fuzzy sets: a fuzzy set is obtained as the result of splitting a usual subset of some universal set. This representation is convenient for describing possibilistic and probabilistic superpositions.

Section 3 considers the characteristic laws of the split subset lattice (especially pseudo-complements and relative pseudo-complements), i.e., of the Brouwerian lattice of indicators (membership functions) of fuzzy sets. As a consequence, measures of fuzziness are considered in Section 5. Fuzziness is characterized by a relation between $\tilde A$ and $\tilde A^D$, which underlines the fact that fuzziness is an intrinsic property of $\tilde A$ and is independent of the pseudo-complement.

The set splitting procedure is a new tool for defining and calculating the probabilities of random fuzzy events. On this basis, Section 7 obtains new generalizations of the binomial, Zipf–Mandelbrot and other distributions, describing the possibilistic–probabilistic organization of structures created by different language elements. In Section 8 the general linear mathematical model of language structures and its main characteristics are described briefly. The possibilistic characteristics of these models are represented by components of the so-called linguistic spectrum.

Applications of these models to language structures are presented in Section 9.

2. Set splitting

Let $X$ be a finite set and $A$ any subset, $A \subseteq X$. Consider a correspondence $I_A \to (I_{\tilde A}, I_{\tilde A^D})$, where $I_A$ is the indicator of subset $A$, $I_{\tilde A}, I_{\tilde A^D} \in [0,1]^X$ and

$$I_A(x) = I_{\tilde A}(x) + I_{\tilde A^D}(x) \quad \forall x \in X. \tag{1}$$

$A$ is a support of mappings $I_{\tilde A}$ and $I_{\tilde A^D}$.

According to Zadeh [15], splitting components $I_{\tilde A}$ and $I_{\tilde A^D}$ are fuzzy subsets of $X$. Call $I$

The procedure in which indicator $I_A$ is compared with a pair $(I_{\tilde A}, I_{\tilde A^D})$ is called "splitting of indicator $I_A$ (subset $A$)".¹

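The splitting procedure of some subsets $A, B \subseteq X$ induces the corresponding splitting of the union and intersection of these two subsets. For split indicators $I_{\widetilde{A \cap B}}$ and $I_{\widetilde{A \cup B}}$ it is essential to fulfill the natural conditions (as for non-split ones)

$$I_{\tilde A}(x),\ I_{\tilde B}(x) \ge I_{\widetilde{A \cap B}}(x); \qquad I_{\tilde A}(x),\ I_{\tilde B}(x) \le I_{\widetilde{A \cup B}}(x), \qquad x \in X.$$

Then, as is easy to see, the following expressions are obtained for the intersection and union indicators:

$$I_{\widetilde{A \cap B}}(x) = I_{\tilde A}(x) \wedge I_{\tilde B}(x) \quad \forall x \in X \quad (\wedge \equiv \min) \quad \text{(simultaneous splitting)}, \tag{2}$$

$$I_{\widetilde{\widetilde{A \cap B}}}(x) = I_{\tilde A}(x)\, I_{\tilde B}(x) \quad \forall x \in X \quad \text{(sequential splitting)}, \tag{3}$$

$$I_{\widetilde{\widetilde{A \cup B}}}(x) = I_{\tilde A}(x) + I_{\tilde B}(x) - I_{\widetilde{\widetilde{A \cap B}}}(x) \quad \forall x \in X \quad \text{(sequential splitting)}, \tag{4}$$

$$I_{\widetilde{A \cup B}}(x) = I_{\tilde A}(x) \vee I_{\tilde B}(x) \quad \forall x \in X \quad (\vee \equiv \max) \quad \text{(simultaneous splitting)}. \tag{5}$$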
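The splitting rules (1)–(5) can be checked on a small finite universe. The following is a minimal sketch, not the authors' implementation: the universe, the subsets and the membership grades are hypothetical toy data, and the sequential intersection (3) is read as a pointwise product, which is an assumption about the extracted formula.

```python
# Hypothetical toy universe and two crisp subsets A, B.
X = ["x1", "x2", "x3", "x4", "x5"]
A = {"x1", "x2", "x3"}
B = {"x2", "x3", "x4"}

# Hypothetical membership grades; each split component is supported on its crisp set.
mu_A = {"x1": 0.9, "x2": 0.5, "x3": 0.2}
mu_B = {"x2": 0.7, "x3": 0.4, "x4": 0.6}


def indicator(subset):
    """Indicator of a crisp subset of X."""
    return {x: 1.0 if x in subset else 0.0 for x in X}


def split(subset, mu):
    """Return (I_tilde, I_dual) such that I_subset = I_tilde + I_dual, as in Eq. (1)."""
    crisp = indicator(subset)
    tilde = {x: mu.get(x, 0.0) * crisp[x] for x in X}
    dual = {x: crisp[x] - tilde[x] for x in X}
    return tilde, dual


I_A, I_B = indicator(A), indicator(B)
A_t, A_d = split(A, mu_A)
B_t, B_d = split(B, mu_B)

inter_sim = {x: min(A_t[x], B_t[x]) for x in X}              # Eq. (2), simultaneous
inter_seq = {x: A_t[x] * B_t[x] for x in X}                  # Eq. (3), sequential (assumed product)
union_seq = {x: A_t[x] + B_t[x] - inter_seq[x] for x in X}   # Eq. (4), sequential
union_sim = {x: max(A_t[x], B_t[x]) for x in X}              # Eq. (5), simultaneous
```

Both variants respect the natural conditions stated before (2): every intersection indicator lies below the two components and every union indicator lies above them.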

3. The lattice of split elements of ordinary indicators’ Boolean lattice

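Consider the Boolean lattice $I = (\{0,1\}^X, \vee, \wedge)$ with natural order. The set of all split elements of this lattice with natural order, $\tilde I = ([0,1]^X, \vee, \wedge)$, is a lattice.

Theorem 1. $\tilde I$ is a Brouwer's lattice.

A direct demonstration of this theorem (i.e., the demonstration that for any two elements $I_{\tilde A}$ and $I_{\tilde B} \in \tilde I$ the set of all $I_{\tilde X} \in \tilde I$ such that $I_{\tilde A} \wedge I_{\tilde X} \le I_{\tilde B}$ has the greatest element $(I_{\tilde B} : I_{\tilde A})$, called the relative pseudo-complement of $I_{\tilde A}$ in $I_{\tilde B}$) can be found in [1]. It is easy to see that

$$(I_{\tilde B} : I_{\tilde A})(x) = \begin{cases} 1, & I_{\tilde A}(x) \le I_{\tilde B}(x), \\ I_{A^C}(x) \vee I_{\tilde B}(x), & I_{\tilde A}(x) > I_{\tilde B}(x), \end{cases} \qquad \forall x \in X, \tag{6}$$

where $I_{A^C} = (I_\emptyset : I_{\tilde A})$ is a pseudo-complement of $I_{\tilde A}$ and, as a function of $x$, represents the indicator of the usual complement of set $A$ in $X$. Next, the following theorem is easy to demonstrate.

¹ Notice that component $I_{\tilde A}$ can be split again: $I_{\tilde A} = (I_{\tilde{\tilde A}}, I_{\tilde{\tilde A}^D})$, where $I_{\tilde{\tilde A}} = \chi I_{\tilde A}$, $I_{\tilde{\tilde A}^D} = (1 - \chi) I_{\tilde A}$, $\chi : X \to [0,1]$. Two sequential splittings induce the splitting of the initial subset $I_A = (\chi\mu I_A, (1 - \chi\mu) I_A)$; $\mu I_A = I_{\tilde A}$; $(1 - \mu) I_A = I_{\tilde A^D}$; $\mu : X \to [0,1]$.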

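Theorem 2. The following statements hold in lattice $\tilde I$:

(i) If $I_{\tilde A} \le I_{\tilde B}$, then $(I_\emptyset : I_{\tilde B}) \le (I_\emptyset : I_{\tilde A})$;
(ii) $I_{\tilde A} \le (I_\emptyset : (I_\emptyset : I_{\tilde A}))$;
(iii) $(I_\emptyset : I_{\tilde A}) = (I_\emptyset : (I_\emptyset : (I_\emptyset : I_{\tilde A})))$;
(iv) $(I_\emptyset : (I_{\tilde A} \vee I_{\tilde B})) = (I_\emptyset : I_{\tilde A}) \wedge (I_\emptyset : I_{\tilde B})$;
(v) $(I_\emptyset : (I_{\tilde A} \wedge I_{\tilde B})) = (I_\emptyset : I_{\tilde A}) \vee (I_\emptyset : I_{\tilde B})$;
(vi) $(I_{\tilde A} : I_{\tilde B}) \wedge I_{\tilde A} = I_{\tilde A}$;
(vii) $(I_{\tilde A} : I_{\tilde B}) \wedge I_{\tilde B} = I_{\tilde A} \wedge I_{\tilde B}$;
(viii) $((I_{\tilde A} : I_{\tilde B}) : I_{\tilde C}) = (I_{\tilde A} : I_{\tilde C}) \wedge (I_{\tilde B} : I_{\tilde C})$;
(ix) $(I_{\tilde A} : (I_{\tilde B} \vee I_{\tilde C})) = (I_{\tilde A} : I_{\tilde B}) \wedge (I_{\tilde A} : I_{\tilde C})$. (7)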

4. The splitting of a set

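The splitting of a set, which as already seen corresponds to the indicator splitting, is represented by

$$(I_A \to (I_{\tilde A}, I_{\tilde A^D})) \Leftrightarrow (A \to (\tilde A, \tilde A^D)), \qquad (I_A = I_{\tilde A} + I_{\tilde A^D}) \Leftrightarrow (A = \tilde A \oplus \tilde A^D). \tag{8}$$

Here $\oplus$ is the operation of set synthesis.

On the basis of (8) one can obtain a more general expression $\tilde A \oplus \tilde B$, which obviously will make sense provided that $\tilde B \subseteq \neg\tilde A$, or $\tilde A \subseteq \neg\tilde B$. One can also obtain the existence conditions for expressions $\tilde A \oplus \tilde B \oplus \tilde C$, etc.

Considering that such a condition holds for the above expressions, one can easily prove that

(i) $\tilde A \oplus \tilde B = \tilde B \oplus \tilde A$;
(ii) $\tilde A \oplus (\tilde B \oplus \tilde C) = (\tilde A \oplus \tilde B) \oplus \tilde C$;
(iii) $(\tilde A \oplus \tilde A^D) \cap (\tilde B \oplus \tilde B^D) = \widetilde{(A \cap B)} \oplus \widetilde{(A \cap B)}^D = (\tilde A \cap \tilde B) \oplus [(A \cap \tilde B^D) \cup (\tilde A^D \cap B)]$;
(iv) $(\tilde A \oplus \tilde A^D) \cup (\tilde B \oplus \tilde B^D) = \widetilde{(A \cup B)} \oplus \widetilde{(A \cup B)}^D = (\tilde A \cup \tilde B) \oplus [(\tilde A^D \cap \tilde B^D) \cup (A^C \cap \tilde B^D) \cup (\tilde A^D \cap B^C)]$;
(v) $\tilde A \oplus (\tilde B \cap \tilde C) = (\tilde A \oplus \tilde B) \cap (\tilde A \oplus \tilde C)$;
(vi) $\tilde A \oplus (\tilde B \cup \tilde C) = (\tilde A \oplus \tilde B) \cup (\tilde A \oplus \tilde C)$. (9)

For example, to prove the last two formulae, one can write

(v) $\tilde A \oplus (\tilde B \cap \tilde C) \Leftrightarrow I_{\tilde A} + (I_{\tilde B} \wedge I_{\tilde C}) = (I_{\tilde A} + I_{\tilde B}) \wedge (I_{\tilde A} + I_{\tilde C}) \Leftrightarrow (\tilde A \oplus \tilde B) \cap (\tilde A \oplus \tilde C)$.

(vi) $\tilde A \oplus (\tilde B \cup \tilde C) \Leftrightarrow I_{\tilde A} + (I_{\tilde B} \vee I_{\tilde C}) = (I_{\tilde A} + I_{\tilde B}) \vee (I_{\tilde A} + I_{\tilde C}) \Leftrightarrow (\tilde A \oplus \tilde B) \cup (\tilde A \oplus \tilde C)$.

Let it be assumed that in these formulae the following relations hold:

$$(I_{\widetilde{A \cap B}} = I_{\tilde A} \wedge I_{\tilde B}) \Leftrightarrow (\widetilde{A \cap B} = \tilde A \cap \tilde B), \qquad (I_{\widetilde{A \cup B}} = I_{\tilde A} \vee I_{\tilde B}) \Leftrightarrow (\widetilde{A \cup B} = \tilde A \cup \tilde B), \tag{10}$$

which are evident because of (2), (5) and (8).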

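In the lattice of split subsets almost all Boolean lattice rules hold:

(1) Reflexivity: $\tilde A \subseteq \tilde A$.
(2) Antisymmetry: $(\tilde A \subseteq \tilde B,\ \tilde B \subseteq \tilde A) \Rightarrow \tilde A = \tilde B$.
(3) Transitivity: $(\tilde A \subseteq \tilde B,\ \tilde B \subseteq \tilde C) \Rightarrow (\tilde A \subseteq \tilde C)$.
(4) Idempotency: $\tilde A \cap \tilde A = \tilde A$ and $\tilde A \cup \tilde A = \tilde A$.
(5) Commutativity: $\tilde A \cap \tilde B = \tilde B \cap \tilde A$ and $\tilde A \cup \tilde B = \tilde B \cup \tilde A$.
(6) Associativity: $(\tilde A \cap \tilde B) \cap \tilde C = \tilde A \cap (\tilde B \cap \tilde C)$ and $(\tilde A \cup \tilde B) \cup \tilde C = \tilde A \cup (\tilde B \cup \tilde C)$.
(7) Distributivity: $\tilde A \cap (\tilde B \cup \tilde C) = (\tilde A \cap \tilde B) \cup (\tilde A \cap \tilde C)$ and $\tilde A \cup (\tilde B \cap \tilde C) = (\tilde A \cup \tilde B) \cap (\tilde A \cup \tilde C)$.
(8) Annihilation laws: $\tilde A \cap (\tilde A \cup \tilde B) = \tilde A$ and $\tilde A \cup (\tilde A \cap \tilde B) = \tilde A$.
(9) Involution law for fuzzy complement: $\neg(\neg\tilde A) = \tilde A$.
(10) Identity laws: $\tilde A \cup \emptyset = \tilde A$; $\tilde A \cap X = \tilde A$ and $\tilde A \cup X = X$; $\tilde A \cap \emptyset = \emptyset$.
(11) Order inversion laws: $(\tilde A \subseteq \tilde B) \Leftrightarrow (\neg\tilde B \subseteq \neg\tilde A)$ and $(\tilde A \subseteq \tilde B) \Leftrightarrow (\tilde B^D \subseteq \tilde A^D)$.
(12) De Morgan's laws: $\neg(\tilde A \cup \tilde B) = (\neg\tilde A \cap \neg\tilde B)$ and $\neg(\tilde A \cap \tilde B) = (\neg\tilde A \cup \neg\tilde B)$.

In connection with the introduced notion of dual subsets one can prove the following laws:

(13) Involution law for the dual subset: $(\tilde A^D)^D = \tilde A$.
(14) Duality laws for the union and intersection of split subsets:

$$(\tilde A \cup \tilde B)^D = (\tilde A^D \cap \tilde B^D) \cup (A^C \cap \tilde B^D) \cup (B^C \cap \tilde A^D),$$

$$(\tilde A \cap \tilde B)^D = (A \cap \tilde B^D) \cup (\tilde A^D \cap B).$$

Notice that in lattice $\tilde I$ the laws of contradiction and tertium non datur do not hold.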
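The duality laws (14) and the failure of the law of contradiction can be verified pointwise. The sketch below is only an illustration under stated assumptions: the universe, the supports and the membership values are hypothetical, intersection and union are taken as pointwise min and max (simultaneous splitting), and the fuzzy complement is taken as $1 - \mu$ (Zadeh's negation).

```python
# Hypothetical universe {0,...,4}; A = {0,1,2}, B = {1,2,3}.
I_A = [1.0, 1.0, 1.0, 0.0, 0.0]
I_B = [0.0, 1.0, 1.0, 1.0, 0.0]
a = [0.9, 0.5, 0.2, 0.0, 0.0]   # membership of split component A~ (hypothetical values)
b = [0.0, 0.7, 0.4, 0.6, 0.0]   # membership of split component B~ (hypothetical values)

# Dual components: indicator minus split component, as in Eq. (1).
a_dual = [ia - x for ia, x in zip(I_A, a)]
b_dual = [ib - x for ib, x in zip(I_B, b)]
not_A = [1.0 - ia for ia in I_A]   # indicator of the crisp complement A^C
not_B = [1.0 - ib for ib in I_B]   # indicator of the crisp complement B^C

n = len(I_A)

# Left-hand sides: dual of the split union / intersection, by Eq. (1).
union_dual_lhs = [max(I_A[i], I_B[i]) - max(a[i], b[i]) for i in range(n)]
inter_dual_lhs = [min(I_A[i], I_B[i]) - min(a[i], b[i]) for i in range(n)]

# Right-hand sides of the duality laws (14).
union_dual_rhs = [
    max(min(a_dual[i], b_dual[i]), min(not_A[i], b_dual[i]), min(not_B[i], a_dual[i]))
    for i in range(n)
]
inter_dual_rhs = [
    max(min(I_A[i], b_dual[i]), min(a_dual[i], I_B[i]))
    for i in range(n)
]

# Law of contradiction: A~ AND (NOT A~) is not identically zero for fuzzy A~.
contradiction = [min(a[i], 1.0 - a[i]) for i in range(n)]
```

For these toy values the two sides of each duality law agree at every point, while `contradiction` is nonzero wherever the membership grade is strictly between 0 and 1.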

5. Dual element and fuzziness (qualitative consideration)

As illustrated before, the dual element plays an important role in describing split subset lattices. Now, the role of the dual element in understanding fuzziness will be considered.

There is an important difference between usual and fuzzy subsets. The usual subset (set) can be represented as an aggregate of real objects, whereas the really measured potential possibility of aggregate formation corresponds to fuzzy subsets. A fuzzy subset is a medium of formation for a real aggregate. It is important to notice that the term "medium of formation" is borrowed from Weil [11] to underline the following circumstance: any sequence of research outcomes is a result of acts of free decision-making by the subject (observer); any concrete sequence is a crisp finite subset of some universum, but the fuzzy subset is analogous to Weil's continuum.

In the lattice of fuzzy subsets a dual element $\tilde A^D$ is defined by the splitting procedure [2,5,6]. Its sense can be explained as follows: the value of the membership function $I_{\tilde A}(x)$ is a degree of concordance of an element $x$ with the concept represented by $\tilde A$; the value $I_{\tilde A^D}(x)$ has the same sense with respect to the concept represented by $\tilde A^D$, which together with $\tilde A$, $(\tilde A, \tilde A^D)$, defines a crisp subset $A$. The nearer (in some sense) $\tilde A$ and $\tilde A^D$ are [12], the more fuzzy is the statement "Elements of $A$ possess property $\tilde A$ ($\tilde A^D$)". Below, a qualitative description of fuzziness is considered analogously with [12], but with the following difference: in [12], the fuzziness is characterized by the relation $f$ between $\tilde A$ and Zadeh's negation $\neg\tilde A$. In the present case, the less rigid relation $u$ between $\tilde A$ and $\tilde A^D$ is assumed as a basis; in the authors' opinion this underlines the fact that fuzziness is an intrinsic property of $\tilde A$ and is independent of the pseudo-complement. The basis for considering the relation $u$ is a relation in a distributive lattice, "$\tilde C$ is between $\tilde A$ and $\tilde B$, $(\tilde A, \tilde C, \tilde B)$" [12].

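Definition 1. Let $\tilde X$ and $\tilde Y \in L$ (distributive lattice). $\tilde X$ is no less fuzzy than $\tilde Y$ $(\tilde X \mathrel{u} \tilde Y)$ if $\tilde X\,Y$ and $(\tilde X\,Y)^D = \tilde X^D\,Y$ are in $L$ between $\tilde Y$ and $\tilde Y^D$. Here

$$(\tilde X \mathrel{u} \tilde Y) = (\tilde Y, \tilde X\,Y, \tilde Y^D)\,(\tilde Y, \tilde X^D\,Y, \tilde Y^D) \iff \tilde Y \cap \tilde Y^D \subseteq \tilde X\,Y \subseteq \tilde Y \cup \tilde Y^D.$$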

Theorem 3. Relation $u$ is reflexive and transitive on $L$, i.e.,

$$(\tilde X \mathrel{u} \tilde X) \quad \text{and} \quad [(\tilde X \mathrel{u} \tilde Y) \text{ and } (\tilde Y \mathrel{u} \tilde Z)] \Rightarrow (\tilde X \mathrel{u} \tilde Z).$$

It can be seen that $u$ on $L$ is not antisymmetric and, therefore, not a partial order.

Theorem 4. Relation $u$ on $L$ is such that

(1) $(\tilde X \mathrel{u} \tilde X^D)$ and $(\tilde X^D \mathrel{u} \tilde X)$.
(2) $(\tilde X \mathrel{u} \tilde Y) \iff (\tilde X^D \mathrel{u} \tilde Y) \iff (\tilde X^D \mathrel{u} \tilde Y^D) \iff (\tilde X \mathrel{u} \tilde Y^D)$.

On the lattice $L$, let a relation $E$ be defined so that $(\tilde X \mathrel{E} \tilde Y)$ if $\tilde X = \tilde Y$ or $\tilde X^D = \tilde Y$ or $\tilde X = \tilde Y^D$. It can be shown that $E$ is an equivalence relation. Each equivalence class consists of a fuzzy subset and its respective dual. If $\tilde X = \tilde X^D$ then the equivalence class consists of only one element.

The subset consisting of any fuzzy subset and its respective dual subset is called the dual pair. According to Theorem 4, if one component of the dual pair is more fuzzy than any component of the other pair, then any component of the first pair is more fuzzy than any component of the second pair. So it is reasonable to introduce the notion of fuzziness of the dual pair.

Definition 2. Let $\mathcal L$ be a set of dual pairs. Define on $\mathcal L$ a relation $U$ so that $(\tilde u \mathrel{U} \tilde v)$ for $\tilde u, \tilde v \in \mathcal L$ if one can say that the dual pair $\tilde u$ is no less fuzzy than the dual pair $\tilde v$.

It is easy to demonstrate that relation $U$ on the set of dual pairs is a partial order relation.

6. Probability measure splitting

Let $(X, \mathcal B, p(\cdot))$ be a given probability space. The probability of the event $K \in \mathcal B$ is calculated by the formula

$$p(K) = \int_X I_K(x)\, p(dx). \tag{11}$$

According to the splitting procedure of the set $K$, this formula can be rewritten in the following form:

$$p(\tilde K \oplus \tilde K^D) = \int_X I_{\tilde K}(x)\, p(dx) + \int_X I_{\tilde K^D}(x)\, p(dx), \tag{12}$$

where $I_{\tilde K}$ is a $\mathcal B$-measurable membership function (the corresponding subset $\tilde K$ is a fuzzy random event). Define $p(\tilde K)$ and $p(\tilde K^D)$ as follows:

$$p(\tilde K) = \int_X I_{\tilde K}(x)\, p(dx) \quad \text{and} \quad p(\tilde K^D) = \int_X I_{\tilde K^D}(x)\, p(dx), \tag{13}$$

the probability of the fuzzy event $\tilde K$ and the probability of the dual fuzzy event $\tilde K^D$, respectively. Let the representation

$$p(K) = p(\tilde K \oplus \tilde K^D) = p(\tilde K) + p(\tilde K^D) \tag{14}$$

be called the procedure of probability measure splitting [16].
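On a finite probability space the integrals in (11)–(14) become finite sums, which makes the splitting of the probability measure easy to check. This is a minimal sketch with hypothetical outcome probabilities and a hypothetical membership function; it is not taken from the paper.

```python
# Hypothetical finite probability space: outcomes 0..4 with probabilities p[x].
p = [0.10, 0.25, 0.30, 0.20, 0.15]

# Crisp event K = {1, 2, 3} and a hypothetical membership function supported on K.
I_K = [0.0, 1.0, 1.0, 1.0, 0.0]
mu_K = [0.0, 0.8, 0.5, 0.1, 0.0]                 # indicator of the fuzzy event K~
mu_K_dual = [i - m for i, m in zip(I_K, mu_K)]   # indicator of the dual fuzzy event


def prob(indicator):
    """Discrete counterpart of Eq. (11)/(13): sum of indicator(x) * p(x)."""
    return sum(i * px for i, px in zip(indicator, p))


p_K = prob(I_K)                 # Eq. (11)
p_K_fuzzy = prob(mu_K)          # first integral in Eq. (13)
p_K_dual = prob(mu_K_dual)      # second integral in Eq. (13)
```

By construction `p_K_fuzzy + p_K_dual` reproduces `p_K`, which is the content of (14).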

7. Fuzzy distributions

7.1. Binomial distribution with fuzzy elementary events

Let $A = \{0,1\}$ be the space of elementary events.

One can obtain the fuzzy elementary events by splitting the usual events $\{0\}$ and $\{1\}$. For membership functions one can write

$$\chi_{\{0\}}(x) = \mu_0(x)\chi_{\{0\}}(x) + (1 - \mu_0(x))\chi_{\{0\}}(x), \qquad \chi_{\{1\}}(x) = \mu_1(x)\chi_{\{1\}}(x) + (1 - \mu_1(x))\chi_{\{1\}}(x), \tag{15}$$

where $\mu_0, \mu_1 : A \to [0,1]$, $x = 0, 1$.

According to (13), the probability of fuzzy elementary events is

$$p\{\tilde 0\} = \mu_0 p_0, \qquad p\{\tilde 1\} = \mu_1 p_1, \tag{16}$$

where $p_0$ and $p_1$ are the probabilities of the corresponding crisp events. Now it is easy to write the split binomial distribution corresponding to fuzzy elementary events. Only two variants will be considered: completely simultaneous and completely sequential. The intermediate cases are not of any interest and for this reason they will not be considered here.

For the completely simultaneous case, the split binomial distribution is

$$p(\tilde B_{n,n}) = \mu_1 p_1^n, \qquad p(\tilde B_{n,0}) = \mu_0 (1 - p_1)^n, \qquad p(\tilde B_{n,k}) = (\mu_0 \wedge \mu_1)\binom{n}{k} p_1^k (1 - p_1)^{n-k}, \quad k = 1, \ldots, n-1, \tag{17}$$

where $\tilde B_{n,k}$ is the fuzzy Bernoulli event. The normalization factor is

$$p^{-1}(\tilde A^n) = \left[(\mu_0 \wedge \mu_1) + (\mu_1 - (\mu_0 \wedge \mu_1)) p_1^n + (\mu_0 - (\mu_0 \wedge \mu_1))(1 - p_1)^n\right]^{-1}.$$

For the completely sequential case one gets

$$p(\tilde{\tilde B}_{n,k}) = \binom{n}{k} (\mu_1 p_1)^k (\mu_0 (1 - p_1))^{n-k} \tag{18}$$

and

$$p^{-1}(\tilde{\tilde A}^n) = [\mu_0 + (\mu_1 - \mu_0) p_1]^{-n}.$$
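The two split binomial distributions (17) and (18) and their total masses can be computed directly. The sketch below assumes $n \ge 1$ and uses hypothetical values of $n$, $p_1$, $\mu_0$ and $\mu_1$; the function names are illustrative and not from the paper.

```python
from math import comb


def split_binomial_simultaneous(n, p1, mu0, mu1):
    """Unnormalized probabilities of Eq. (17), k = 0..n (assumes n >= 1)."""
    m = min(mu0, mu1)
    probs = []
    for k in range(n + 1):
        base = comb(n, k) * p1**k * (1 - p1) ** (n - k)
        if k == n:
            weight = mu1      # all successes: only the fuzzy success grade enters
        elif k == 0:
            weight = mu0      # all failures: only the fuzzy failure grade enters
        else:
            weight = m        # mixed outcomes: minimum of both grades
        probs.append(weight * base)
    return probs


def split_binomial_sequential(n, p1, mu0, mu1):
    """Unnormalized probabilities of Eq. (18), k = 0..n."""
    return [comb(n, k) * (mu1 * p1) ** k * (mu0 * (1 - p1)) ** (n - k) for k in range(n + 1)]


# Hypothetical parameter values.
n, p1, mu0, mu1 = 6, 0.3, 0.8, 0.6
sim = split_binomial_simultaneous(n, p1, mu0, mu1)
seq = split_binomial_sequential(n, p1, mu0, mu1)

m = min(mu0, mu1)
total_sim = m + (mu1 - m) * p1**n + (mu0 - m) * (1 - p1) ** n   # inverse normalization factor for (17)
total_seq = (mu0 + (mu1 - mu0) * p1) ** n                      # inverse normalization factor for (18)

# Dividing by the totals gives properly normalized distributions.
sim_normalized = [x / total_sim for x in sim]
seq_normalized = [x / total_seq for x in seq]
```

Setting $\mu_0 = \mu_1 = 1$ makes both variants collapse to the classical binomial distribution, which is a quick sanity check on the formulas.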

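An important characteristic of the split Bernoulli probability (17) is the composition law; in the simultaneous case

$$p(\tilde B_{n,k};\, p_1 p_2) = \sum_{m=0}^{n} p(B_{n,m};\, p_1)\, p(\tilde B_{m,k};\, p_2) \tag{19}$$

and in the sequential case

$$p\!\left(\tilde{\tilde B}_{n,k};\ \frac{\mu_1 \mu_2 p_1 p_2}{(\mu_0 + (\mu_1 - \mu_0) p_1)(\mu_0 + (\mu_1 - \mu_0) p_2)}\right) = \sum_{m=0}^{n} p\!\left(\tilde{\tilde B}_{n,m};\ \frac{\mu_1 p_1}{\mu_0 + (\mu_1 - \mu_0) p_1}\right) p_{\tilde{\tilde A}^n}\!\left(B_{m,k};\ \frac{\mu_2 p_2}{\mu_0 + (\mu_1 - \mu_0) p_2}\right) \tag{20}$$

and

$$p_{\tilde{\tilde A}^n}\!\left(B_{n,k};\ \frac{\mu_1 \mu_2 p_1 p_2}{(\mu_0 + (\mu_1 - \mu_0) p_1)(\mu_0 + (\mu_1 - \mu_0) p_2)}\right) = \sum_{m=0}^{n} p_{\tilde{\tilde A}^n}\!\left(B_{n,m};\ \frac{\mu_1 p_1}{\mu_0 + (\mu_1 - \mu_0) p_1}\right) p_{\tilde{\tilde A}^n}\!\left(B_{m,k};\ \frac{\mu_2 p_2}{\mu_0 + (\mu_1 - \mu_0) p_2}\right).$$

As a further characteristic of binomial probabilities in the case of fuzzy elementary events, one may consider the known property of exponential distribution; in the simultaneous case

$$\sum_{m=0}^{\infty} p(\tilde B_{m,n};\, p_1) f(m, u) = (\mu_0 \wedge \mu_1)(1 - v) v^n + (\mu_1 - (\mu_0 \wedge \mu_1))(1 - u)(p_1 u)^n, \quad n \ne 0,$$

$$\sum_{m=0}^{\infty} p(\tilde B_{m,0};\, p_1) f(m, u) = \frac{\mu_0 (1 - u)}{1 - (1 - p_1) u} = \mu_0\, g(0, v) \tag{21}$$

and in the sequential case

$$\sum_{m=0}^{\infty} p_{\tilde{\tilde A}^n}\!\left(B_{n,m};\ \frac{\mu_1 p_1}{\mu_0 + (\mu_1 - \mu_0) p_1}\right) f(m, u) = g(n, v'), \tag{22}$$

where

$$v = \frac{p_1 u}{1 - u + p_1 u}, \qquad v' = \frac{\mu_1 p_1 u}{(1 - p_1)\mu_0 + \mu_1 p_1 + (1 - p_1)\mu_0 u}.$$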

7.2. The binomial distribution with fuzzy number of successes

Let the set $A_n = \{0, 1, \ldots, n\}$ be considered. The fuzzy quantity "approximately $k$ from $n$" is defined as a fuzzy subset of $A_n$. Therefore the corresponding distribution is

$$p(B_n^{\tilde k};\, p) = \sum_{l=0}^{n} \mu_{\tilde k}(l)\, p(B_n^l;\, p), \tag{23}$$

where $\mu_{\tilde k}(l)$ is the membership function of the fuzzy number "approximately $k$ from $n$".

This distribution is also called the binomial distribution because it is characterized by the above composition law and the property of exponential distribution.
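Formula (23) is a membership-weighted sum of ordinary binomial probabilities. The sketch below illustrates it with a triangular membership function for "approximately $k$", which is purely an assumption made for the example; the paper does not prescribe the shape of $\mu_{\tilde k}$.

```python
from math import comb


def binomial_pmf(n, l, p):
    """Classical probability of exactly l successes in n Bernoulli trials."""
    return comb(n, l) * p**l * (1 - p) ** (n - l)


def triangular_membership(k, width):
    """Hypothetical membership of 'approximately k': 1 at k, falling linearly to 0 at distance `width`."""
    def mu(l):
        return max(0.0, 1.0 - abs(l - k) / width)
    return mu


def fuzzy_success_probability(n, p, mu):
    """Eq. (23): sum over l of mu(l) * p(B_n^l; p)."""
    return sum(mu(l) * binomial_pmf(n, l, p) for l in range(n + 1))


n, p, k = 10, 0.4, 4
approx_k = fuzzy_success_probability(n, p, triangular_membership(k, width=2))
exact_k = binomial_pmf(n, k, p)

# A crisp membership (1 at k, 0 elsewhere) recovers the classical value.
crisp_k = fuzzy_success_probability(n, p, lambda l: 1.0 if l == k else 0.0)
```

With this choice of membership the fuzzy probability `approx_k` is at least the crisp probability `exact_k`, since neighbouring counts contribute with partial weight.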

7.3. Fuzzy upper binomial distribution

The consideration of the usual upper binomial distribution is based on the model of superposition of two events: the Bernoulli event and the emergence of the total amount of failures, characterized by the a priori probability $p(B_0) = 1 - c$.

If $p_1$ is the probability of elementary success, and $\mu_0$ and $\mu_0'$ are values of membership functions corresponding to the complicated events $(\underbrace{0, 0, \ldots, 0}_{n})$ when distinguishing the events of a Bernoulli and non-Bernoulli origin, then the universal set $X$, which is the composition $B_0 \oplus \left(\bigcup_{i=0}^{n} B_{n,i}\right)$, is split in the following way:

$$X = \widetilde{\left(B_0 \oplus \bigcup_{i=0}^{n} B_{n,i}\right)} \oplus \widetilde{\left(B_0 \oplus \bigcup_{i=0}^{n} B_{n,i}\right)}^{D}.$$

The corresponding membership function is

$$\chi_{\widetilde{\left(B_0 \oplus \bigcup_{i=0}^{n} B_{n,i}\right)}}(x_1, \ldots, x_n) = \mu_0'\, \chi_{B_0}(x_1, \ldots, x_n) + \mu_0\, \chi_{B_{n,0}}(x_1, \ldots, x_n) + \sum_{i=1}^{n} \chi_{B_{n,i}}(x_1, \ldots, x_n)$$

(conditionBBgnn;;00fBB00 :ðS

n


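The probability measure corresponding to the fuzzy upper binomial distribution is

$$p_{\widetilde{\left(B_0 \oplus \bigcup_{i=0}^{n} B_{n,i}\right)}}(B_0 \oplus B_{n,i};\, p_1, c) = \frac{1}{p\!\left(\widetilde{B_0 \oplus \bigcup_{i=0}^{n} B_{n,i}}\right)} \begin{cases} \mu_0'(1 - c) + \mu_0 c (1 - p_1)^n, & i = 0, \\ c \binom{n}{i} p_1^i (1 - p_1)^{n-i}, & i = 1, \ldots, n, \end{cases} \tag{24}$$

where

$$p\!\left(\widetilde{B_0 \oplus \bigcup_{i=0}^{n} B_{n,i}}\right) = \mu_0'(1 - c) + \mu_0 c (1 - p_1)^n + c\,(1 - (1 - p_1)^n).$$

The Poisson limit is

$$P(i) = \left[\mu_0'(1 - c) + \mu_0 c\, e^{-\gamma} + c\,(1 - e^{-\gamma})\right]^{-1} \begin{cases} \mu_0'(1 - c) + \mu_0 c\, e^{-\gamma}, & i = 0, \\ c\, \exp(-\gamma)\, \dfrac{\gamma^i}{i!}, & i = 1, 2, \ldots, \end{cases} \tag{25}$$

where $\bar i$ and $\gamma$ are connected by the relation

$$\bar i = \frac{c\,\gamma}{\mu_0'(1 - c) + \mu_0 c\, e^{-\gamma} + c\,(1 - e^{-\gamma})}.$$

From a practical viewpoint, what is interesting is the expression of the sum over all values of $\mu_0$ and $\mu_0'$:³

$$\tilde P(i) = \begin{cases} 1 - (1 - e^{-\gamma})\,\xi, & i = 0, \\ \xi\, \exp(-\gamma)\, \dfrac{\gamma^i}{i!}, & i = 1, 2, \ldots, \end{cases} \tag{26}$$

where

$$\xi = c \iint_{0 \le \mu_0, \mu_0' \le 1} \left[\mu_0'(1 - c) + \mu_0 c\, e^{-\gamma} + c\,(1 - e^{-\gamma})\right]^{-1} d\mu_0\, d\mu_0'.$$

Taking into account the relation between $\gamma$, $\xi$ and $\bar i$, then

$$\tilde P(i) = \begin{cases} 1 - \xi\,(1 - e^{-\bar i/\xi}), & i = 0, \\ \xi\, \exp\!\left(-\dfrac{\bar i}{\xi}\right) \dfrac{(\bar i/\xi)^i}{i!}, & i = 1, 2, \ldots \end{cases}$$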

³ Notice that formula (26) does not itself contain any fuzziness, being a nice instance of

7.4. Negative binomial distribution with fuzzy elementary events [8]

Let the sequence of Bernoulli trials with probability of fuzzy success $p(\tilde 1) = \mu_1 p_1$ be considered ($\mu_1 : \{0,1\} \to [0,1]$), where $p_1$ is the probability of the usual Bernoulli elementary event. Let $f(k; r, p_1)$ denote the probability that the $r$th success takes place in the $(k + r)$th trial, provided that trials are continued up to the $r$th success. Accepting the splitting scheme that is used for the binomial distribution with fuzzy elementary events, one can write

$$f(k; r, p(\tilde 1)) = \begin{cases} (\mu_0 \wedge \mu_1) \binom{r + k - 1}{k} p_1^r (1 - p_1)^k, & k > 0, \\ \mu_1 p_1^r, & k = 0. \end{cases} \tag{27}$$

Since for any $m > 0$,

$$\binom{m + k - 1}{k} = (-1)^k \binom{-m}{k},$$

the above formula can be written in the following form:

$$f(k; r, p(\tilde 1)) = \begin{cases} (\mu_0 \wedge \mu_1) \binom{-r}{k} (-1)^k p_1^r (1 - p_1)^k, & k > 0, \\ \mu_1 p_1^r, & k = 0. \end{cases} \tag{28}$$

Define the negative binomial distribution with fuzzy elementary events, for a fixed real number $r > 0$ and $0 < p_1 < 1$, as the sequence

$$\{(p)^{-1} f(k; r, p(\tilde 1))\}, \tag{29}$$

where

$$p = \sum_{k=0}^{\infty} f(k; r, p(\tilde 1)) = (\mu_0 \wedge \mu_1) + [\mu_1 - (\mu_0 \wedge \mu_1)]\, p_1^r.$$

Note that if $\mu_0, \mu_1 \to 1$, or $(\mu_0 \wedge \mu_1) = \mu_1$, then (29) reduces to the usual negative binomial distribution.
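The closed form of the normalizing sum in (29) can be checked numerically. The following sketch assumes an integer $r$ (the text allows any real $r > 0$) and hypothetical parameter values, and truncates the infinite sum at a point where the remaining tail is negligible.

```python
from math import comb


def fuzzy_negative_binomial(k, r, p1, mu0, mu1):
    """Unnormalized term f(k; r, p(1~)) of Eq. (27), for integer r >= 1."""
    if k == 0:
        return mu1 * p1**r
    return min(mu0, mu1) * comb(r + k - 1, k) * p1**r * (1 - p1) ** k


# Hypothetical parameter values.
r, p1, mu0, mu1 = 3, 0.4, 0.9, 0.7
m = min(mu0, mu1)

# Truncated version of the infinite sum; (1 - p1)**400 is far below double precision.
numeric_total = sum(fuzzy_negative_binomial(k, r, p1, mu0, mu1) for k in range(400))
closed_form_total = m + (mu1 - m) * p1**r

# Normalized distribution of Eq. (29), first few terms.
normalized = [fuzzy_negative_binomial(k, r, p1, mu0, mu1) / closed_form_total for k in range(5)]
```

When $\mu_0 \wedge \mu_1 = \mu_1$, as in this example, the normalized terms coincide with the classical negative binomial probabilities, in line with the remark after (29).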

7.5. Fuzzy Fucks’ distribution

As in the case of the "upper Bernoulli" distribution, all variants of Fucks' distribution [3] are based on the assumption that Fucks' event is a superposition of Bernoulli and deterministic events

$$U^k_{n,r,p_1} = B_r B^{k-r}_{n-r,p_1}, \qquad U^k_{n,p_1} = \bigcup_{r=0}^{n} B_r B^{k-r}_{n-r,p_1}, \tag{30}$$

where $B_r$ is deterministic (certainly $r$ successes in $n$ trials) and $B^{k-r}_{n-r,p_1}$ is a Bernoulli event ($(k - r)$ successes in $(n - r)$ random events).

There are many variants of Fucks' event splitting, but only some of them are considered in this paper.

(1) The deterministic event is non-fuzzy, but Bernoulli elementary events are fuzzy. In this case

$$\tilde U^k_{n,p_1} = \bigcup_{r=0}^{n} B_r \tilde B^{k-r}_{n-r,p_1}.$$

The corresponding probability measure is

$$P(\tilde U^k_{n,p_1}) = \begin{cases} \sum_{r=0}^{n} q_r (\mu_0 \wedge \mu_1) \binom{n-r}{k-r} p_1^{k-r} (1 - p_1)^{n-k}, & k = 1, 2, \ldots, n-1, \\ \sum_{r=0}^{n} q_r \mu_1 p_1^{n-r}, & k = n, \\ q_0 \mu_0 (1 - p_1)^n, & k = 0, \end{cases} \tag{31}$$

(for simultaneous splitting) with

$$\sum_{k=0}^{n} P(\tilde U^k_{n,p_1}) = \begin{cases} \mu_1 + (\mu_0 \wedge \mu_1)\, q_0 (1 - p_1)^n, & \mu_0 \ge \mu_1, \\ \mu_0 + (\mu_1 - \mu_0) \sum_{r=0}^{n} q_r p_1^{n-r}, & \mu_0 < \mu_1, \end{cases}$$

and

$$P(\tilde{\tilde U}^k_{n,p_1}) = \sum_{r=0}^{n} q_r \binom{n-r}{k-r} (\mu_1 p_1)^{k-r} (\mu_0 (1 - p_1))^{n-k} \tag{32}$$

(for sequential splitting) with

$$\sum_{k=0}^{n} P(\tilde{\tilde U}^k_{n,p_1}) = \sum_{r=0}^{n} q_r (\mu_0 + (\mu_1 - \mu_0) p_1)^{n-r}.$$

Here $q_r$ is connected to the linguistic spectrum [3].

(2) $B_r$ events are split ($B_r = \tilde B_r \oplus \tilde B_r^D$) and Bernoulli events are crisp:

$$\tilde U^k_{n,p_1} = \bigcup_{r=0}^{n} \tilde B_r B^{k-r}_{n-r,p_1}.$$

Evidently

$$P(\tilde U^k_{n,p_1}) = \sum_{r=0}^{n} \chi_r q_r \binom{n-r}{k-r} p_1^{k-r} (1 - p_1)^{n-k}, \tag{33}$$

where $\chi_r$ is the membership function of the fuzzy set $\tilde B_r$, and

$$\sum_{k=0}^{n} P(\tilde U^k_{n,p_1}) = \sum_{r=0}^{n} \chi_r q_r. \tag{34}$$
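The normalization (34) follows from (33) because, for each fixed $r$, the Bernoulli part sums to one over $k$. A minimal numerical sketch of this check is given below; the spectrum `q` and the membership values `chi` are hypothetical illustrative data, not values from the paper.

```python
from math import comb


def fucks_split_deterministic(k, n, p1, q, chi):
    """Eq. (33): deterministic events B_r are fuzzy (membership chi[r]), Bernoulli events are crisp."""
    total = 0.0
    for r in range(0, k + 1):          # terms with r > k vanish (no negative number of successes)
        bernoulli = comb(n - r, k - r) * p1 ** (k - r) * (1 - p1) ** (n - k)
        total += chi[r] * q[r] * bernoulli
    return total


# Hypothetical data: n = 4, spectrum q_r (sums to 1) and membership grades chi_r for r = 0..n.
n, p1 = 4, 0.35
q = [0.4, 0.3, 0.2, 0.1, 0.0]
chi = [1.0, 0.8, 0.5, 0.3, 0.1]

distribution = [fucks_split_deterministic(k, n, p1, q, chi) for k in range(n + 1)]
total_mass = sum(distribution)                                # left-hand side of Eq. (34)
expected_mass = sum(c * w for c, w in zip(chi, q))            # right-hand side of Eq. (34)
```

With all `chi[r]` equal to 1 the total mass reduces to the sum of the spectrum, i.e. the crisp Fucks distribution is recovered.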

(3) In the case when both deterministic and Bernoulli events are split, one must discriminate clearly between the simultaneous and the successive (sequential) splitting of Fucks' event. In the last case it is easy to obtain the final result. Consideration of the two aforesaid cases allows one to write

$$\tilde{\tilde U}^k_{n,p_1} = \bigcup_{r=0}^{n} \tilde B_r \begin{cases} \tilde B^{k-r}_{n-r,p_1}, \\ \tilde{\tilde B}^{k-r}_{n-r,p_1}, \end{cases} \tag{35}$$

consequently

$$P(\tilde{\tilde U}^k_{n,p_1}) = \begin{cases} \sum_{r=0}^{n} q_r (\chi_r \wedge (\mu_0 \wedge \mu_1)) \binom{n-r}{k-r} p_1^{k-r} (1 - p_1)^{n-k}, & k = 1, \ldots, n-1, \\ \sum_{r=0}^{n} q_r (\chi_r \wedge \mu_1) p_1^{n-r}, & k = n, \\ (\chi_0 \wedge \mu_0)\, q_0 (1 - p_1)^n, & k = 0, \end{cases} \tag{36}$$

(simultaneous splitting of Bernoulli event) and

$$P(\tilde{\tilde U}^k_{n,p_1}) = \sum_{r=0}^{n} \chi_r q_r \binom{n-r}{k-r} (\mu_1 p_1)^{k-r} (\mu_0 (1 - p_1))^{n-k} \tag{37}$$

(completely sequential splitting of Bernoulli event). When Fucks' event is split simultaneously the authors' reasoning is as follows: $(B_r B^{k-r}_{n-r,p_1})$ is a realized chain of distributed successes and failures, a chain that is a concatenation of two others: a deterministic one in which there are only $r$ successes, and a Bernoulli sequence of length $(n - r)$ containing $k - r$ successes. Therefore simultaneous splitting must take place according to the rule

$$\mu(\widetilde{B_r B^{k-r}_{n-r,p_1}}) = \mu(\tilde B_r) \wedge \mu(\tilde B^{k-r}_{n-r,p_1}). \tag{38}$$

Consequently

$$P(\tilde U^k_{n,p_1}) = \begin{cases} \sum_{r=0}^{n} q_r (\chi_r \wedge (\mu_0 \wedge \mu_1)) \binom{n-r}{k-r} p_1^{k-r} (1 - p_1)^{n-k}, & k = 1, \ldots, n-1, \\ \sum_{r=0}^{n} q_r (\chi_r \wedge \mu_1) p_1^{n-r}, & k = n, \\ (\chi_0 \wedge \mu_0)\, q_0 (1 - p_1)^n, & k = 0. \end{cases} \tag{39}$$

The considered fuzzy Fucks' distributions play a leading part in constructing fuzzy quantitative micro-linguistic models of language.

(4) Some language structures are often described by generalized Fucks' distributions, in which there are two kinds of successes with probabilities $p$ and $q$. In this case

$$P'(\tilde U^k_{n,W_r;p,q}) = \sum_{r=0}^{n} q_r \binom{n-r}{k-r} \cdots$$

If one is only interested in one kind of success then

$$P''(\tilde U^k_{n,W_r;p,q}) = \sum_{r=0}^{n} q_r \binom{n-r}{k-r} \int_0^1 (pq)^{k-r} (1 - p + p(1 - q))^{n-k}\, dp. \tag{41}$$

The corresponding Poisson limit ($n \to \infty$, $q \to 0$ and $a \equiv \bar i_{\exp} - \sum_{r=0}^{n} r q_r = \frac{q}{2}\left(n - \sum_{r=0}^{n} r q_r\right) = \text{const}$) is

$$F'(\tilde U^k_{W_r}) = e^{-a} \sum_{r=0}^{\infty} q_r \frac{a^{k-r}}{(k-r)!}\, \phi_{k-r}(a), \tag{42}$$

where

$$\phi_{k-r}(a) = \frac{e^{a}}{2 a^{k-r+1}} \int_0^{2a} t^{k-r} e^{-t}\, dt = \frac{e^{a}\, \Gamma(k-r+1)}{2 a^{k-r+1}}\, P(k-r+1, 2a); \tag{43}$$

$\Gamma(z)$ is an Euler integral, and $P(k-r+1, 2a)$ an incomplete gamma-function. Taking into account the relation between the incomplete gamma-function and the $\chi^2$-distribution, one finally obtains

$$F'(\tilde U^k_{W_r}) = \frac{1}{2a} \sum_{r=0}^{\infty} q_r\, P(4a \mid k-r+1), \tag{44}$$

where $P(4a \mid k-r+1)$ is the $\chi^2$-distribution with $2(k-r+1)$ degrees of freedom. Distribution (44) is called the "$\chi^2$-distribution with approximately $(k-r+1)$ degrees of freedom".

7.6. Fuzzy Zipf–Mandelbrot distribution

It is well known that Mandelbrot's theory of recurrent coding constitutes the basis of statistical macro-linguistics. If the vocabulary of volume $R$ is divided into $S$ classes according to the informational cost [9] of words of a given class, then the probability of a word of the $k$th class can be expressed as

$$p_k = P M^{-B C_k}, \tag{45}$$

where $P, M, B$ do not depend on the cost and $C_k$ is the informational cost of the $k$th class.

Let three cases of splitting be considered.

(1) The set of classes $K = (k_1, k_2, \ldots, k_S) = \tilde K \oplus \tilde K^D$. In this case

$$\tilde p_{k_i} = \mu_{\tilde k}(k_i)\, p_{k_i}, \quad i = 1, \ldots, S. \tag{46}$$

(2) The set of informational costs $C = (c_1, \ldots, c_S) = \tilde C \oplus \tilde C^D$. Since $p_k$ is a function of $c_k$, according to the principle of generalization [17] one obtains

$\tilde p_{k_i}$

(3) When the number of classes is a fuzzy number $\tilde S = \bigcup_{S=1}^{\infty} (\mu_{\tilde S}(S)/S)$, by analogy with the binomial distribution with a fuzzy number of trials, one can write

$$\tilde p_k = \sum_{S=1}^{\infty} \mu_{\tilde S}(S)\, P(S)\, M^{-B(S) C_k}. \tag{48}$$

The above-mentioned formulae must be applied to the whole language as a formation medium, while the classical one must be applied to individual texts.

8. Linear structure

One of the research methods for the linear structure of a sequence of language elements is the gap analysis method, which rests on the following idea: the elements of a sequence are not distributed randomly (in disorder), and any deviation from full disorder indicates the presence of some structure. The quantitative investigation technique is as follows: pairs of elements are fixed by some features, and the elements between the fixed ones are considered as gaps. Hence the sequence may have the following form: – – –[a1]– – –[a2]– – –[b1][a3]– – –[b2]– – –[a4][b3]– – –[b4]– – –[a5]– – –[b5]– – –[a6]– – –[b6]– – –.

Let the structure defined by elements [a]–[b] be considered. The complex consisting of [a], the nearest [b] and the gaps between them is called a "word" (Mandelbrot's definition).

To describe such word generation mathematically, a model is applied in which the generation process of any analyzed structure is represented as the superposition of two processes: probabilistic and possibilistic. Therefore, one may apply the considered fuzzy probability measures for describing the distribution of gaps by words. The gap analysis method, together with the suggested modeling schemes, allows one to establish the structural dependence between elements of any level.

The main characteristics of linear models are the components of the linguistic spectrum $(q_r, \phi_r, w_r)$. Their determination is reduced to the solution of the system of equations

$$\left.\frac{\partial^k G(y, a)}{\partial y^k}\right|_{y \to 1} = \overline{i(i-1)\cdots(i-k+1)}_{\exp}, \quad k = 1, 2, \ldots, \tag{49}$$

where

$$G(y, a) = \sum_{l} P(l)\, y^l,$$

$P(l)$ is the probability distribution of the chosen model, $a$ is a known function of the linguistic spectrum components and $\overline{(\cdot)}_{\exp}$ are measured moments of the gap distribution. A special method is elaborated for solving system (49). The determination of the linguistic spectrum allows one to calculate the informational content of any given structure.
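The left-hand side of (49) is the $k$th factorial moment of the model distribution, since differentiating the generating function $k$ times and letting $y \to 1$ gives the sum of $l(l-1)\cdots(l-k+1)P(l)$. The sketch below illustrates this with a plain Poisson distribution as a stand-in model (an assumption made only for the example, because its factorial moments are known to be $a^k$); the same helper could be applied to empirical frequencies to obtain the right-hand side.

```python
from math import exp


def falling_factorial(l, k):
    """l (l-1) ... (l-k+1), the weight produced by differentiating y**l k times at y = 1."""
    result = 1
    for j in range(k):
        result *= (l - j)
    return result


def factorial_moment(pmf, k):
    """k-th derivative of G(y) = sum_l pmf[l] * y**l evaluated at y = 1."""
    return sum(falling_factorial(l, k) * prob for l, prob in enumerate(pmf))


def poisson_pmf(a, n_terms):
    """First n_terms Poisson probabilities, built iteratively to avoid large factorials."""
    probs = [exp(-a)]
    for l in range(1, n_terms):
        probs.append(probs[-1] * a / l)
    return probs


a = 2.0                                   # hypothetical model parameter
model = poisson_pmf(a, n_terms=80)        # truncated; the neglected tail is negligible for a = 2
moments = [factorial_moment(model, k) for k in (1, 2, 3)]
```

Matching such model moments to the measured ones yields one equation per $k$, which is how system (49) constrains the spectrum components.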

9. Bag statistics for consonantal structures of languages

One method of analyzing several structures of printed information entails the investigation of the probabilistic–possibilistic organization of some of the distribution elements determining the analytic structure [7]. From the point of view of the chosen elements, printed information can be considered as corteges of elements or Yager's bags, the main characteristics of which can serve as quantitative analytic parameters, and the probabilistic–possibilistic model parameters as the characteristics of the structures studied in this paper.

The probabilistic–possibilistic organization of bag distributions is described by the generalized Fucks' distribution [4]. Some of the results obtained from this distribution are given below. The aforesaid Fucks' distribution is based on the superposition of the following two processes:

$$U^k_{n,r,p} = B_r B^{k-r}_{n-r,p}, \qquad U^k_{n,p} = \bigcup_{r=0}^{n} B_r B^{k-r}_{n-r,p}, \tag{50}$$

where $U^k_{n,r,p}$ is a Fucks' event, $B^{k-r}_{n-r}$ a Bernoulli's event and $B_r$ the so-called deterministic event [3]. There are many ways of splitting Fucks' event [4], but only the one related to the bag distribution model will be considered here, that is to say, the case in which Bernoulli's event is classical but the deterministic event is split:

$$\tilde U^k_{n,p} = \bigcup_{r=0}^{k} \tilde B_r B^{k-r}_{n-r,p}. \tag{51}$$

Let the structure of the set of events $\tilde U^k_{n,p}$ be described.

Before continuing, the following comments should be made. In fuzzy subset applications the problem of evaluating the membership grade is highly important. The membership grade is a result of expert research determining (creating) the fuzzy subset. Let a method be considered that makes it possible to reveal the membership function in a logically consistent way. It is supposed that the fuzzy subset elements are such that $I_{\tilde A}(x') \ge I_{\tilde A}(x'')$, $x', x'' \in X'$, if $x' \succeq x''$; $I_{\tilde A}$ is a membership function of $\tilde A$. In the present case $X'$ consists of Bernoulli's events $B^{i-r}_{n-r}$, where $i$ is the full number of successes and $r$ is a fixed number of successes determining the structure of event $\tilde F^i_{n,r,p}$ and corresponding to event $B_r$. The fuzzy subsets considered here are normalized. This permits one to


can be easily related to focal probabilities, without the necessity of making any additional assumptions.

Consider a random experiment, based on the level set notion and Yager's algorithm [13], in which an element $x' \in X'$ is chosen. Firstly, let a value $\alpha \in [0,1]$ and an element from the corresponding $\alpha$-level set be chosen. Now let the probability of choosing specifically element $x' \in X'$ be calculated according to the conditions established in this example. In accordance with this assumption

$$0 \le \pi_1 \le \pi_2 \le \cdots \le \pi_n^{\max} = 1,$$

where $\pi_r$ are values of the membership function (components of the possibility distribution, or components of the so-called linguistic spectrum [3]). The level sets are as follows:

when $0 \le \alpha \le \pi_1$: $B_1 = \{x'_1, \ldots, x'_n\}$;
$\pi_1 \le \alpha \le \pi_2$: $B_2 = \{x'_2, \ldots, x'_n\}$;
$\pi_2 \le \alpha \le \pi_3$: $B_3 = \{x'_3, \ldots, x'_n\}$;
$\ldots$
$\pi_{n-2} \le \alpha \le \pi_{n-1}$: $B_{n-1} = \{x'_{n-1}, x'_n\}$;
$\pi_{n-1} \le \alpha \le \pi_n$: $B_n = \{x'_n\}$;
$\pi_n < \alpha$: $B_\alpha = \emptyset$.

Because $\alpha$ was chosen randomly in this example, the probability that level set $B_r$ will be chosen is equal to the length of the interval $(\pi_{r-1}, \pi_r)$, $m(B_r) = \pi_r - \pi_{r-1}$. Besides, an element is chosen from the level set in accordance with Bernoulli's probability model, thus

$$F(\text{choose element } x' \mid B_r) = \begin{cases} \binom{n-r}{i-r} p^{i-r} (1-p)^{n-r-(i-r)} & \text{if } x' \in B_r, \\ 0 & \text{if } x' \notin B_r. \end{cases} \tag{52}$$

Then, according to the formula of full probability,

$$F_n(i) = \sum_{r=1}^{n} m(B_r) \binom{n-r}{i-r} p^{i-r} (1-p)^{n-i} \quad (i = \overline{1,n}), \tag{53}$$

or, in the Poisson limit,

$$F(i) = e^{-a} \sum_{r=1}^{\infty} m(B_r) \frac{a^{i-r}}{(i-r)!} \quad (i = \overline{1,n}). \tag{54}$$

Here $\bar i - \bar r = \text{const}$, $\bar i$ is the average empirical value of the random variable $\xi_i = i$, and $\bar r = \sum_{r=1}^{\infty} r\, m(B_r)$, in full accordance with the model described in this paper.

The above example provides an explanation of the rule of probabilistic and possibilistic uncertainty index composition [2]. From (54) one can obtain

$$m(B_r) = e^{a} \sum_{k=1}^{r} (-1)^{k-1} F(r-k+1) \frac{a^{k-1}}{(k-1)!} \quad (r = \overline{1,n}). \tag{55}$$
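The inversion (55) undoes the Poisson-limit mixture (54): the distribution $F$ is a convolution of the focal weights $m(B_r)$ with a Poisson kernel, and (55) is the corresponding deconvolution. The round trip can be sketched as follows; the focal weights and the parameter `a` are hypothetical values chosen for illustration.

```python
from math import exp, factorial


def bag_distribution(m, a, i_max):
    """Eq. (54): F(i) for i = 1..i_max, where m[r-1] holds m(B_r). Terms with r > i vanish."""
    F = []
    for i in range(1, i_max + 1):
        s = sum(m[r - 1] * a ** (i - r) / factorial(i - r) for r in range(1, min(i, len(m)) + 1))
        F.append(exp(-a) * s)
    return F


def focal_weights(F, a, r_max):
    """Eq. (55): recover m(B_r), r = 1..r_max, from F, where F[i-1] holds F(i)."""
    weights = []
    for r in range(1, r_max + 1):
        s = sum((-1) ** (k - 1) * F[r - k] * a ** (k - 1) / factorial(k - 1) for k in range(1, r + 1))
        weights.append(exp(a) * s)
    return weights


# Hypothetical finite spectrum m(B_1), m(B_2), m(B_3) and Poisson parameter a.
m_true = [0.5, 0.3, 0.2]
a = 0.8

F = bag_distribution(m_true, a, i_max=5)
m_recovered = focal_weights(F, a, r_max=5)
```

For a finite spectrum the recovered weights beyond its length come out as zero (up to rounding), which is the property the text later uses to derive an equation for $a$.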

The information contained in the distribution moments must be used for determining parameter $a$. This can be achieved by means of the relationship between focal and empirical moments

$$\mu^{\text{focal}}_k = \sum_{l=0}^{k} (-1)^l \binom{k}{l} M^{\text{emp}}_{k-l}\, a^l, \tag{56}$$

where $\mu^{\text{focal}}_k = \sum_{r=1}^{m} r(r-1)\cdots(r-k+1)\, m(B_r)$, $M^{\text{emp}}_j = \sum_{k} k(k-1)\cdots(k-j+1)\, f_k$, and $f_k$ are the empirical frequencies.

In the case of a finite spectrum the higher moments, starting from some order, are equal to 0. This condition allows one to obtain the equation for $a$.

Another way of obtaining the equation for $a$ can be formulated as follows: the empirical frequencies from some $i$ onwards are practically equal to 0; in this case it is natural to assume that $m(B_r)$ for $r = i$ is also approximately 0. One obtains the equation

$$P(r) - P(r-1)\, a + P(r-2) \frac{a^2}{2!} - \cdots + (-1)^{r-1} P(1) \frac{a^{r-1}}{(r-1)!} = 0. \tag{57}$$

The numerical solution of such an equation does not present any difficulties. It is essential to choose, from the solutions of the aforementioned equation, the positive one which fulfills the condition

$$a + \sum_{m=1}^{r} \pi_m = \bar i.$$

It is worth mentioning that the choice of a solution in all cases is now the subject of further research. The method described above is applied to the in-vestigation of consonantal structures in English, French, Latin, Spanish and Georgian. Empirical data are obtained from [10], presenting word frequencies with consonantal structures in accordance with the number of syllables. Using YagerÕs notation [14], the following types of bags are subject to processing:

$$\{ccv\} = (1/v,\ 2/c); \qquad \{vcc\} = (1/v,\ 2/c);$$

$$\{ccvcc\} = (1/v,\ 4/c); \qquad \{ccvccvcc\} = (2/v,\ 6/c);$$

$$\{vccc\} = (1/v,\ 3/c); \qquad \{cccv\} = (1/v,\ 3/c).$$

Additionally, the mixed case representing all consonantal structures is considered. v represents a vowel and c a consonant. All the structures are typical of the above-mentioned languages.
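The bag notation above can be mimicked with a multiset. The following sketch uses Python's `collections.Counter`; the helper names `bag` and `yager_notation` are hypothetical and only illustrate how a structure such as ccvcc maps to (1/v, 4/c).

```python
from collections import Counter

def bag(structure):
    """Multiset of symbols in a structure, e.g. 'ccvcc' -> {'c': 4, 'v': 1}."""
    return Counter(structure)

def yager_notation(b):
    """Render a bag as '(count/symbol, ...)', listing v before c."""
    items = sorted(b.items(), reverse=True)
    return "(" + ", ".join(f"{count}/{symbol}" for symbol, count in items) + ")"

examples = {s: yager_notation(bag(s)) for s in ["ccv", "vcc", "ccvcc", "ccvccvcc"]}
```

Because a bag ignores order, ccv and vcc map to the same bag, which is why both appear with the same right-hand side in the list above.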


From the condition $P(i) \approx 0$ and the data regarding $\bar{i}$, one obtains the following equations for each of the languages under investigation:

(1) English language:

(a) Mixed case

$$a^3 - 8.7463a^2 + 13.6513a - 5.6083 = 0,$$

(b) Structure cc

$$a^3 - 6.9360a^2 + 10.0423a - 3.9026 = 0,$$

(c) Structure cccc

$$a^3 - 12.5278a^2 + 28.0481a - 12.8692 = 0,$$

(d) Structure cccccc

$$a^3 - 7.9731a^2 + 11.8423a - 1.8172 = 0,$$

(e) Structure ccc

$$a^4 - 14.0925a^3 + 20.0506a^2 - 12.7206a + 2.3685 = 0.$$

(2) French language:

(a) Mixed case

$$a^3 - 14.7997a^2 + 37.2222a - 19.3223 = 0,$$

(b) Structure cc

$$a^3 - 14.0118a^2 + 32.4529a - 15.6912 = 0,$$

(c) Structure cccc

$$a^4 - 7.4352a^3 + 13.4000a^2 - 10.0444a + 2.2222 = 0,$$

(d) Structure cccccc

$$a^4 - 30.3984a^3 + 80.7891a^2 - 76.7813a + 17.5781 = 0,$$

(e) Structure ccc

$$a^3 - 12.6322a^2 + 26.4220a - 13.9248 = 0.$$

(3) Latin language:

(a) Mixed case

$$a^3 - 7.1387a^2 + 10.9416a - 4.0073 = 0,$$

(b) Structure cc

$$a^3 - 5.9020a^2 + 8.6708a - 3.1547 = 0,$$

(c) Structure cccc

(d) Structure cccccc

$$a^3 - 26.9878a^2 + 115.9512a - 43.9756 = 0,$$

(e) Structure ccc

$$a^3 - 8.1323a^2 + 10.8228a - 3.6494 = 0.$$

(4) Spanish language:

(a) Mixed case

$$a^5 - 14.9809a^4 + 66.0878a^3 - 96.1527a^2 + 59.8855a - 14.7710 = 0,$$

(b) Structure cc

$$a^5 - 12.3107a^4 + 47.9429a^3 - 58.3714a^2 + 28.7143a - 5.8286 = 0,$$

(c) Structure cccc

$$a^5 - 41.8090a^4 + 247.5000a^3 - 419.4231a^2 + 306.9231a - 82.3077 = 0,$$

(d) Structure cccccc

$$a^5 - 24.2440a^4 + 119.4896a^3 - 181.5311a^2 + 110.8134a - 19.7129 = 0,$$

(e) Structure ccc

$$a^5 - 15.5704a^4 + 59.3172a^3 - 60.6994a^2 + 22.4813a - 5.5953 = 0.$$

(5) Georgian language:

(a) Mixed case

$$a^4 - 9.3378a^3 + 22.6638a^2 - 39.2616a + 17.7409 = 0,$$

(b) Structure cc

$$a^4 - 8.8120a^3 + 26.7474a^2 - 36.7270a + 14.8680 = 0,$$

(c) Structure cccc

$$a^5 - 17.7878a^4 + 113.1295a^3 - 260.9353a^2 + 282.0860a - 93.4530 = 0,$$

(d) Structure cccccc

$$a^4 - 14.0866a^3 + 46.2600a^2 - 69.6806a + 27.7148 = 0,$$

(e) Structure ccc

$$a^5 - 10.6601a^4 + 31.0880a^3 - 52.4083a^2 + 42.4694a - 8.8487 = 0.$$

The calculation results of parameter a, the spectral parameter values, the first empirical and focal moments and the empirical and model frequencies are given in Tables 1–5.


Table 1. English

Distribution characteristics

| N | Structure | $a$ | Mean value of word length (experiment) $\bar{i}$ | $\bar{r}$ | $m(B_1)$ / 1 | $m(B_2)$ / 2 | $m(B_3)$ / 3 | $m(B_4)$ / 4 |
|---|---|---|---|---|---|---|---|---|
| 1 | Mixed case | 0.6983 | 2.5531 | 1.8654 | 0.2711 / 1.0000 | 0.6000 / 0.7289 | 0.1309 / 0.1281 | – |
| 2 | Structure cc | 0.6639 | 2.4630 | 1.7755 | 0.3304 / 1.0000 | 0.5475 / 0.6696 | 0.1167 / 0.1221 | – |
| 3 | Structure cccc | 0.6242 | 2.7997 | 2.1961 | 0.1474 / 1.0000 | 0.5238 / 0.8525 | 0.3337 / 0.3287 | – |
| 4 | Structure cccccc | 0.1732 | 3.2886 | 3.0983 | 0.0000 / 1.0000 | 0.1990 / 1.000 | 0.4945 / 0.8018 | 0.3042 / 0.3065 |
| 5 | Structure ccc | 0.2965 | 2.3251 | 1.9689 | 0.1949 / 1.0000 | 0.6289 / 0.8051 | 0.1294 / 0.1762 | 0.0320 / 0.0468 |

Empirical and theoretical frequencies

| N | | $i=1$ | $i=2$ | $i=3$ | $i=4$ | $i=5$ | $i=6$ | $i=7$ | $i=8$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | $P(i)$ | 0.1348 | 0.3930 | 0.3067 | 0.1260 | 0.0311 | 0.0063 | 0.0015 | 0.0002 |
| | $F(i)$ | 0.1348 | 0.3930 | 0.3067 | 0.1260 | 0.0311 | 0.0068 | 0.0015 | 0.0001 |
| 2 | $P(i)$ | 0.1701 | 0.3948 | 0.2847 | 0.1123 | 0.0303 | 0.0062 | 0.0015 | 0.0001 |
| | $F(i)$ | 0.1701 | 0.3948 | 0.2847 | 0.1122 | 0.0284 | 0.0054 | 0.0015 | 0.0000 |
| 3 | $P(i)$ | 0.0790 | 0.3299 | 0.3693 | 0.1698 | 0.0410 | 0.0077 | 0.0030 | 0.0002 |
| | $F(i)$ | 0.0790 | 0.3299 | 0.3693 | 0.1694 | 0.0463 | 0.0077 | 0.0033 | 0.0003 |
| 4 | $P(i)$ | 0.0000 | 0.1674 | 0.4449 | 0.3304 | 0.0507 | 0.0022 | 0.0044 | 0.0000 |
| | $F(i)$ | 0.0000 | 0.1674 | 0.4449 | 0.3304 | 0.0503 | 0.0042 | 0.0024 | 0.0000 |
| 5 | $P(i)$ | 0.1449 | 0.5105 | 0.2473 | 0.0768 | 0.0143 | 0.0011 | 0.0000 | 0.0000 |
| | $F(i)$ | 0.1449 | 0.5105 | 0.2412 | 0.0750 | 0.0140 | 0.0017 | 0.0017 | 0.0000 |
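The summary columns of the English table can be cross-checked from the listed frequencies. The sketch below assumes that the two values closing each row are the empirical mean $\bar{i}$ and the focal mean $\bar{r}$; the helper name `mean` and the variable names are hypothetical.

```python
def mean(weights):
    """Mean of a distribution on 1, 2, 3, ...; weights[0] is the weight of 1."""
    return sum(x * w for x, w in enumerate(weights, start=1))

# English "mixed case": empirical frequencies P(1)..P(8) and focal probabilities
P_mixed = [0.1348, 0.3930, 0.3067, 0.1260, 0.0311, 0.0063, 0.0015, 0.0002]
m_mixed = [0.2711, 0.6000, 0.1309]

i_bar = mean(P_mixed)   # to be compared with the tabulated 2.5531
r_bar = mean(m_mixed)   # to be compared with the tabulated 1.8654
```

Small deviations from the tabulated means are to be expected, since the published frequencies are rounded to four decimals.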

F. Criado et al. / Information Sciences 147 (2002) 13–44


Table 2. French

Distribution characteristics

| N | Structure | $a$ | Mean value of word length (experiment) $\bar{i}$ | $\bar{r}$ | $m(B_1)$ / 1 | $m(B_2)$ / 2 | $m(B_3)$ / 3 | $m(B_4)$ / 4 | $m(B_5)$ / 5 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Mixed case | 0.7099 | 2.9602 | 2.2332 | 0.1218 / 1.0000 | 0.5145 / 0.8782 | 0.3608 / 0.3737 | 0.0000 / 0.0029 | 0.0000 / 0.0029 |
| 2 | Structure cc | 0.6791 | 2.8779 | 2.2412 | 0.1341 / 1.0000 | 0.5353 / 0.8659 | 0.3455 / 0.3300 | 0.0000 / 0.0096 | – |
| 3 | Structure cccc | 0.3641 | 3.2295 | 2.8819 | 0.0384 / 1.0000 | 0.3109 / 0.9616 | 0.6507 / 0.6507 | 0.1573 / 0.1857 | 0.0396 / 0.0284 |
| 4 | Structure cccccc | 0.3287 | 3.7877 | 3.4070 | 0.0000 / 1.0000 | 0.0711 / 1.0000 | 0.5171 / 0.9289 | 0.3050 / 0.4118 | 0.0979 / 0.0968 |
| 5 | Structure ccc | 0.6470 | 2.9888 | 1.9673 | 0.1738 / 1.0000 | 0.5846 / 0.8262 | 0.2081 / 0.2416 | 0.0000 / 0.0335 | – |

Empirical and theoretical frequencies

| N | | $i=1$ | $i=2$ | $i=3$ | $i=4$ | $i=5$ | $i=6$ | $i=7$ | $i=8$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | $P(i)$ | 0.0599 | 0.2955 | 0.3716 | 0.1936 | 0.0605 | 0.0148 | 0.0034 | 0.0004 |
| | $F(i)$ | 0.0599 | 0.2955 | 0.3716 | 0.1929 | 0.0603 | 0.0133 | 0.0023 | 0.0003 |
| 2 | $P(i)$ | 0.0680 | 0.3176 | 0.3678 | 0.1801 | 0.0514 | 0.0123 | 0.0024 | 0.0003 |
| | $F(i)$ | 0.0680 | 0.3176 | 0.3733 | 0.1843 | 0.0545 | 0.0120 | 0.0015 | 0.0002 |
| 3 | $P(i)$ | 0.0267 | 0.2160 | 0.4015 | 0.2412 | 0.0904 | 0.0200 | 0.0033 | 0.0008 |
| | $F(i)$ | 0.0000 | 0.2160 | 0.4015 | 0.2412 | 0.0904 | 0.0100 | 0.0029 | 0.0008 |
| 4 | $P(i)$ | 0.0000 | 0.0512 | 0.3891 | 0.3447 | 0.1638 | 0.0375 | 0.0136 | 0.0000 |
| | $F(i)$ | 0.0000 | 0.0512 | 0.3891 | 0.3447 | 0.1631 | 0.0373 | 0.0148 | 0.0000 |
| 5 | $P(i)$ | 0.0745 | 0.3137 | 0.3282 | 0.1729 | 0.0683 | 0.0279 | 0.0134 | 0.0010 |
| | $F(i)$ | 0.0745 | 0.3137 | 0.3282 | 0.1715 | 0.0608 | 0.0327 | 0.0032 | 0.0045 |


Table 3. Latin

Distribution characteristics

| N | Structure | $a$ | Mean value of word length (experiment) $\bar{i}$ | $\bar{r}$ | $m(B_1)$ / 1 | $m(B_2)$ / 2 | $m(B_3)$ / 3 | $m(B_4)$ / 4 | $m(B_5)$ / 5 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Mixed case | 0.5456 | 3.4213 | 2.5854 | 0.0139 / 1.0000 | 0.2837 / 0.9860 | 0.5203 / 0.7023 | 0.1108 / 0.1820 | 0.0000 / 0.0053 |
| 2 | Structure cc | 0.5515 | 3.3504 | 2.7434 | 0.0149 / 1.0000 | 0.3400 / 0.9851 | 0.4815 / 0.6451 | 0.1510 / 0.1636 | 0.0000 / 0.1260 |
| 3 | Structure cccc | 0.5855 | 3.6045 | 3.0714 | 0.0123 / 1.0000 | 0.1693 / 0.9877 | 0.6023 / 0.8184 | 0.2284 / 0.2161 | 0.0000 / 0.0000 |
| 4 | Structure cccccc | 0.4196 | 4.1141 | 3.4087 | 0.0000 / 1.0000 | 0.0374 / 1.0000 | 0.3209 / 0.9626 | 0.5853 / 0.6417 | 0.0000 / 0.0902 |
| 5 | Structure ccc | 0.5466 | 3.3514 | 2.8564 | 0.0295 / 1.0000 | 0.2724 / 0.9775 | 0.5895 / 0.7061 | 0.1284 / 0.1156 | – |

Empirical and theoretical frequencies

| N | | $i=1$ | $i=2$ | $i=3$ | $i=4$ | $i=5$ | $i=6$ | $i=7$ | $i=8$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | $P(i)$ | 0.0081 | 0.1644 | 0.3912 | 0.2998 | 0.1098 | 0.0245 | 0.0020 | 0.0002 |
| | $F(i)$ | 0.0000 | 0.1644 | 0.3912 | 0.2535 | 0.0844 | 0.0203 | 0.0029 | 0.0006 |
| 2 | $P(i)$ | 0.0086 | 0.1959 | 0.3854 | 0.2831 | 0.1030 | 0.0217 | 0.0022 | 0.0001 |
| | $F(i)$ | 0.0000 | 0.1959 | 0.3854 | 0.2897 | 0.1000 | 0.0217 | 0.0035 | 0.0004 |
| 3 | $P(i)$ | 0.0069 | 0.0943 | 0.3906 | 0.3397 | 0.1351 | 0.0307 | 0.0021 | 0.0005 |
| | $F(i)$ | 0.0000 | 0.0943 | 0.3906 | 0.3397 | 0.1351 | 0.0347 | 0.0059 | 0.0008 |
| 4 | $P(i)$ | 0.0000 | 0.0246 | 0.2213 | 0.4754 | 0.1803 | 0.0902 | 0.0081 | 0.0000 |
| | $F(i)$ | 0.0000 | 0.0246 | 0.2213 | 0.4754 | 0.1803 | 0.0365 | 0.0050 | 0.0005 |
| 5 | $P(i)$ | 0.0130 | 0.1580 | 0.4283 | 0.2850 | 0.0961 | 0.0195 | 0.0000 | 0.0000 |
| | $F(i)$ | 0.0130 | 0.1650 | 0.4296 | 0.2850 | 0.0751 | 0.0210 | 0.0033 | 0.0000 |


Table 4. Spanish

Distribution characteristics

| N | Structure | $a$ | Mean value of word length (experiment) $\bar{i}$ | $\bar{r}$ | $m(B_1)$ / 1 | $m(B_2)$ / 2 | $m(B_3)$ / 3 | $m(B_4)$ / 4 | $m(B_5)$ / 5 | $m(B_6)$ / 6 | $m(B_7)$ / 7 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mixed case | 0.7504 | 3.7844 | 3.1138 | 0.0044 / 1.0000 | 0.2219 / 0.9956 | 0.4985 / 0.7737 | 0.1719 / 0.2752 | 0.0985 / 0.1033 | | |
| 2 | Structure cc | 0.5475 | 3.5912 | 3.0424 | 0.0054 / 1.0000 | 0.2420 / 0.9405 | 0.4634 / 0.7526 | 0.2897 / 0.2892 | 0.0008 / 0.0601 | | |
| 3 | Structure cccc | 0.6304 | 4.1252 | 3.5402 | 0.0009 / 1.0000 | 0.0586 / 0.9991 | 0.4531 / 0.9405 | 0.4279 / 0.4874 | 0.0474 / 0.0595 | 0.0157 / 0.0121 | |
| 4 | Structure cccccc | 0.2930 | 4.9046 | 4.6234 | 0.0000 / 1.0000 | 0.0011 / 1.0000 | 0.0840 / 0.9989 | 0.3829 / 0.9149 | 0.3863 / 0.5320 | 0.1243 / 0.1457 | 0.0229 / 0.0214 |
| 5 | Structure ccc | 0.2402 | 3.5687 | 3.3110 | 0.0000 / 1.0000 | 0.1527 / 1.0000 | 0.4389 / 0.8473 | 0.3431 / 0.4184 | 0.0591 / 0.0653 | 0.0035 / 0.0062 | 0.0000 / 0.0013 |

Empirical and theoretical frequencies

| N | | $i=1$ | $i=2$ | $i=3$ | $i=4$ | $i=5$ | $i=6$ | $i=7$ | $i=8$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | $P(i)$ | 0.0021 | 0.1048 | 0.3140 | 0.3463 | 0.1662 | 0.0523 | 0.0129 | 0.0013 |
| | $F(i)$ | 0.0000 | 0.1047 | 0.3140 | 0.3463 | 0.1788 | 0.0574 | 0.0153 | 0.0020 |
| 2 | $P(i)$ | 0.0031 | 0.1400 | 0.3447 | 0.3353 | 0.1362 | 0.0335 | 0.0068 | 0.0004 |
| | $F(i)$ | 0.0000 | 0.1400 | 0.3447 | 0.3353 | 0.1361 | 0.0332 | 0.0063 | 0.0008 |
| 3 | $P(i)$ | 0.0005 | 0.0312 | 0.2609 | 0.3861 | 0.2181 | 0.0798 | 0.0214 | 0.0020 |
| | $F(i)$ | 0.0000 | 0.0312 | 0.2609 | 0.3861 | 0.2104 | 0.0711 | 0.0211 | 0.0037 |
| 4 | $P(i)$ | 0.0000 | 0.0008 | 0.0627 | 0.3040 | 0.3746 | 0.1897 | 0.0579 | 0.0103 |
| | $F(i)$ | 0.0000 | 0.0000 | 0.0627 | 0.3040 | 0.3746 | 0.1897 | 0.0579 | 0.0103 |
| 5 | $P(i)$ | 0.0000 | 0.1201 | 0.3740 | 0.3562 | 0.1215 | 0.0225 | 0.0056 | 0.0000 |
| | $F(i)$ | 0.0000 | 0.1201 | 0.3740 | 0.3562 | 0.1215 | 0.0225 | 0.0027 | 0.0002 |


Table 5. Georgian

Distribution characteristics

| N | Structure | $a$ | Mean value of word length (experiment) $\bar{i}$ | $\bar{r}$ | $m(B_1)$ / 1 | $m(B_2)$ / 2 | $m(B_3)$ / 3 | $m(B_4)$ / 4 | $m(B_5)$ / 5 | $m(B_6)$ / 6 | $m(B_7)$ / 7 | $m(B_8)$ / 8 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Mixed case | 0.6216 | 3.9542 | 3.1808 | 0.0136 / 1.0000 | 0.2199 / 0.9864 | 0.3770 / 0.7665 | 0.2446 / 0.3895 | 0.1236 / 0.1419 | 0.0000 / 0.0213 | | |
| 2 | Structure cc | 0.6546 | 3.8923 | 3.2379 | 0.0110 / 1.0000 | 0.2446 / 0.9890 | 0.3787 / 0.7444 | 0.2449 / 0.3657 | 0.1244 / 0.1208 | 0.0000 / 0.0000 | | |
| 3 | Structure cccc | 0.5499 | 4.3921 | 3.8115 | 0.0012 / 1.0000 | 0.0964 / 0.9988 | 0.2898 / 0.9084 | 0.3722 / 0.6126 | 0.1687 / 0.2404 | 0.0693 / 0.0717 | | |
| 4 | Structure cccccc | 0.5877 | 5.2287 | 4.7412 | 0.0000 / 1.0000 | 0.0000 / 1.0000 | 0.1200 / 0.9859 | 0.3935 / 0.8659 | 0.2625 / 0.4724 | 0.1625 / 0.2099 | 0.0090 / 0.0474 | 0.0518 / 0.0465 |
| 5 | Structure ccc | 0.3094 | 3.5801 | 3.2655 | 0.0118 / 1.0000 | 0.2229 / 0.9882 | 0.4062 / 0.7653 | 0.2101 / 0.3591 | 0.1091 / 0.1490 | 0.0339 / 0.0399 | | |

Empirical and theoretical frequencies

| N | | $i=1$ | $i=2$ | $i=3$ | $i=4$ | $i=5$ | $i=6$ | $i=7$ | $i=8$ |
|---|---|---|---|---|---|---|---|---|---|
| 1 | $P(i)$ | – | 0.1181 | 0.2757 | 0.2821 | 0.1932 | 0.0873 | 0.0277 | 0.0000 |
| | $F(i)$ | – | 0.1181 | 0.2759 | 0.2801 | 0.1920 | 0.0755 | 0.0276 | 0.0066 |
| 2 | $P(i)$ | – | 0.1271 | 0.2800 | 0.2833 | 0.1945 | 0.0789 | 0.0240 | 0.0000 |
| | $F(i)$ | – | 0.1271 | 0.2800 | 0.2833 | 0.1948 | 0.0799 | 0.0243 | 0.0042 |
| 3 | $P(i)$ | – | 0.0556 | 0.1978 | 0.3145 | 0.2418 | 0.1307 | 0.0433 | 0.0000 |
| | $F(i)$ | – | 0.0556 | 0.1978 | 0.3151 | 0.2422 | 0.1308 | 0.0449 | 0.0096 |
| 4 | $P(i)$ | – | – | 0.0745 | 0.2624 | 0.2872 | 0.2163 | 0.0496 | 0.0496 |
| | $F(i)$ | – | – | 0.0748 | 0.2624 | 0.2872 | 0.2163 | 0.0867 | 0.0512 |
| 5 | $P(i)$ | – | 0.1636 | 0.3488 | 0.2543 | 0.1429 | 0.0579 | 0.0122 | 0.0031 |
| | $F(i)$ | – | 0.1636 | 0.3488 | 0.2543 | 0.1429 | 0.0578 | 0.0122 | 0.0016 |


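English model distributions:

$$F_{\text{mix}}(i) = e^{-0.6983}\left(0.2711\,\frac{(0.6983)^{i-1}}{(i-1)!} + 0.6000\,\frac{(0.6983)^{i-2}}{(i-2)!} + 0.1309\,\frac{(0.6983)^{i-3}}{(i-3)!}\right),$$

$$F_{cc}(i) = e^{-0.6639}\left(0.3304\,\frac{(0.6639)^{i-1}}{(i-1)!} + 0.5475\,\frac{(0.6639)^{i-2}}{(i-2)!} + 0.1167\,\frac{(0.6639)^{i-3}}{(i-3)!}\right),$$

$$F_{cccc}(i) = e^{-0.6242}\left(0.1474\,\frac{(0.6242)^{i-1}}{(i-1)!} + 0.5238\,\frac{(0.6242)^{i-2}}{(i-2)!} + 0.3337\,\frac{(0.6242)^{i-3}}{(i-3)!}\right),$$

$$F_{cccccc}(i) = e^{-0.1732}\left(0.1990\,\frac{(0.1732)^{i-2}}{(i-2)!} + 0.4945\,\frac{(0.1732)^{i-3}}{(i-3)!} + 0.3042\,\frac{(0.1732)^{i-4}}{(i-4)!}\right),$$

$$F_{ccc}(i) = e^{-0.2965}\left(0.1949\,\frac{(0.2965)^{i-1}}{(i-1)!} + 0.6289\,\frac{(0.2965)^{i-2}}{(i-2)!} + 0.1294\,\frac{(0.2965)^{i-3}}{(i-3)!} + 0.0320\,\frac{(0.2965)^{i-4}}{(i-4)!}\right).$$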

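French model distributions:

$$F_{\text{mix}}(i) = e^{-0.7099}\left(0.1218\,\frac{(0.7099)^{i-1}}{(i-1)!} + 0.5145\,\frac{(0.7099)^{i-2}}{(i-2)!} + 0.3608\,\frac{(0.7099)^{i-3}}{(i-3)!}\right),$$

$$F_{cc}(i) = e^{-0.6791}\left(0.1341\,\frac{(0.6791)^{i-1}}{(i-1)!} + 0.5353\,\frac{(0.6791)^{i-2}}{(i-2)!} + 0.3455\,\frac{(0.6791)^{i-3}}{(i-3)!}\right),$$

$$F_{cccc}(i) = e^{-0.3641}\left(0.0384\,\frac{(0.3641)^{i-1}}{(i-1)!} + 0.3109\,\frac{(0.3641)^{i-2}}{(i-2)!} + 0.6507\,\frac{(0.3641)^{i-3}}{(i-3)!} + 0.1573\,\frac{(0.3641)^{i-4}}{(i-4)!} + 0.0396\,\frac{(0.3641)^{i-5}}{(i-5)!}\right),$$

$$F_{cccccc}(i) = e^{-0.3287}\left(0.0711\,\frac{(0.3287)^{i-2}}{(i-2)!} + 0.5171\,\frac{(0.3287)^{i-3}}{(i-3)!} + 0.3050\,\frac{(0.3287)^{i-4}}{(i-4)!} + 0.0979\,\frac{(0.3287)^{i-5}}{(i-5)!}\right),$$

$$F_{ccc}(i) = e^{-0.6470}\left(0.1738\,\frac{(0.6470)^{i-1}}{(i-1)!} + 0.5846\,\frac{(0.6470)^{i-2}}{(i-2)!} + 0.2081\,\frac{(0.6470)^{i-3}}{(i-3)!}\right).$$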

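Latin model distributions:

$$F_{\text{mix}}(i) = e^{-0.5456}\left(0.0139\,\frac{(0.5456)^{i-1}}{(i-1)!} + 0.2837\,\frac{(0.5456)^{i-2}}{(i-2)!} + 0.5203\,\frac{(0.5456)^{i-3}}{(i-3)!} + 0.1108\,\frac{(0.5456)^{i-4}}{(i-4)!}\right),$$

$$F_{cc}(i) = e^{-0.5515}\left(0.0149\,\frac{(0.5515)^{i-1}}{(i-1)!} + 0.3400\,\frac{(0.5515)^{i-2}}{(i-2)!} + 0.4815\,\frac{(0.5515)^{i-3}}{(i-3)!} + 0.1510\,\frac{(0.5515)^{i-4}}{(i-4)!}\right),$$

$$F_{cccc}(i) = e^{-0.5855}\left(0.0123\,\frac{(0.5855)^{i-1}}{(i-1)!} + 0.1693\,\frac{(0.5855)^{i-2}}{(i-2)!} + 0.6023\,\frac{(0.5855)^{i-3}}{(i-3)!} + 0.2284\,\frac{(0.5855)^{i-4}}{(i-4)!}\right),$$

$$F_{cccccc}(i) = e^{-0.4196}\left(0.0374\,\frac{(0.4196)^{i-2}}{(i-2)!} + 0.3209\,\frac{(0.4196)^{i-3}}{(i-3)!} + 0.5853\,\frac{(0.4196)^{i-4}}{(i-4)!}\right),$$

$$F_{ccc}(i) = e^{-0.5466}\left(0.0295\,\frac{(0.5466)^{i-1}}{(i-1)!} + 0.2724\,\frac{(0.5466)^{i-2}}{(i-2)!} + 0.5895\,\frac{(0.5466)^{i-3}}{(i-3)!} + 0.1284\,\frac{(0.5466)^{i-4}}{(i-4)!}\right).$$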

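Spanish model distributions:

$$F_{\text{mix}}(i) = e^{-0.7504}\left(0.0044\,\frac{(0.7504)^{i-1}}{(i-1)!} + 0.2219\,\frac{(0.7504)^{i-2}}{(i-2)!} + 0.4985\,\frac{(0.7504)^{i-3}}{(i-3)!} + 0.1719\,\frac{(0.7504)^{i-4}}{(i-4)!} + 0.0985\,\frac{(0.7504)^{i-5}}{(i-5)!}\right),$$

$$F_{cc}(i) = e^{-0.5475}\left(0.0054\,\frac{(0.5475)^{i-1}}{(i-1)!} + 0.2420\,\frac{(0.5475)^{i-2}}{(i-2)!} + 0.4634\,\frac{(0.5475)^{i-3}}{(i-3)!} + 0.2897\,\frac{(0.5475)^{i-4}}{(i-4)!} + 0.0008\,\frac{(0.5475)^{i-5}}{(i-5)!}\right),$$

$$F_{cccc}(i) = e^{-0.6304}\left(0.0009\,\frac{(0.6304)^{i-1}}{(i-1)!} + 0.0586\,\frac{(0.6304)^{i-2}}{(i-2)!} + 0.4531\,\frac{(0.6304)^{i-3}}{(i-3)!} + 0.4279\,\frac{(0.6304)^{i-4}}{(i-4)!} + 0.0474\,\frac{(0.6304)^{i-5}}{(i-5)!} + 0.0157\,\frac{(0.6304)^{i-6}}{(i-6)!}\right),$$

$$F_{cccccc}(i) = e^{-0.2930}\left(0.0011\,\frac{(0.2930)^{i-2}}{(i-2)!} + 0.0840\,\frac{(0.2930)^{i-3}}{(i-3)!} + 0.3829\,\frac{(0.2930)^{i-4}}{(i-4)!} + 0.3863\,\frac{(0.2930)^{i-5}}{(i-5)!} + 0.1243\,\frac{(0.2930)^{i-6}}{(i-6)!} + 0.0229\,\frac{(0.2930)^{i-7}}{(i-7)!}\right),$$

$$F_{ccc}(i) = e^{-0.2402}\left(0.1527\,\frac{(0.2402)^{i-2}}{(i-2)!} + 0.4389\,\frac{(0.2402)^{i-3}}{(i-3)!} + 0.3431\,\frac{(0.2402)^{i-4}}{(i-4)!} + 0.0591\,\frac{(0.2402)^{i-5}}{(i-5)!} + 0.0035\,\frac{(0.2402)^{i-6}}{(i-6)!}\right).$$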

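Georgian model distributions:

$$F_{\text{mix}}(i) = e^{-0.6216}\left(0.0136\,\frac{(0.6216)^{i-1}}{(i-1)!} + 0.2199\,\frac{(0.6216)^{i-2}}{(i-2)!} + 0.3770\,\frac{(0.6216)^{i-3}}{(i-3)!} + 0.2446\,\frac{(0.6216)^{i-4}}{(i-4)!} + 0.1244\,\frac{(0.6216)^{i-5}}{(i-5)!}\right),$$

$$F_{cc}(i) = e^{-0.6546}\left(0.0110\,\frac{(0.6546)^{i-1}}{(i-1)!} + 0.2446\,\frac{(0.6546)^{i-2}}{(i-2)!} + 0.3787\,\frac{(0.6546)^{i-3}}{(i-3)!} + 0.2449\,\frac{(0.6546)^{i-4}}{(i-4)!} + 0.1244\,\frac{(0.6546)^{i-5}}{(i-5)!}\right),$$

$$F_{cccc}(i) = e^{-0.5499}\left(0.0012\,\frac{(0.5499)^{i-1}}{(i-1)!} + 0.0964\,\frac{(0.5499)^{i-2}}{(i-2)!} + 0.2898\,\frac{(0.5499)^{i-3}}{(i-3)!} + 0.3722\,\frac{(0.5499)^{i-4}}{(i-4)!} + 0.1687\,\frac{(0.5499)^{i-5}}{(i-5)!} + 0.0693\,\frac{(0.5499)^{i-6}}{(i-6)!}\right),$$

$$F_{cccccc}(i) = e^{-0.5877}\left(0.1200\,\frac{(0.5877)^{i-3}}{(i-3)!} + 0.3935\,\frac{(0.5877)^{i-4}}{(i-4)!} + 0.2625\,\frac{(0.5877)^{i-5}}{(i-5)!} + 0.1625\,\frac{(0.5877)^{i-6}}{(i-6)!} + 0.0090\,\frac{(0.5877)^{i-7}}{(i-7)!} + 0.0518\,\frac{(0.5877)^{i-8}}{(i-8)!}\right),$$

$$F_{ccc}(i) = e^{-0.3094}\left(0.0118\,\frac{(0.3094)^{i-1}}{(i-1)!} + 0.2229\,\frac{(0.3094)^{i-2}}{(i-2)!} + 0.4062\,\frac{(0.3094)^{i-3}}{(i-3)!} + 0.2101\,\frac{(0.3094)^{i-4}}{(i-4)!} + 0.1091\,\frac{(0.3094)^{i-5}}{(i-5)!} + 0.0339\,\frac{(0.3094)^{i-6}}{(i-6)!}\right).$$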

Table 6

| Language | Structure | Phonological structure length |
|---|---|---|
| English | Mixed case | $\tilde{1} = (1.0000/1,\ 0.7289/2,\ 0.1281/3)$ |
| | cc | $\tilde{1} = (1.0000/1,\ 0.6696/2,\ 0.1221/3)$ |
| | cc cc | $\tilde{1} = (1.0000/1,\ 0.8525/2,\ 0.3287/3)$ |
| | cc cc cc | $\widetilde{(1,2)} = (1.0000/1,\ 1.0000/2,\ 0.8018/3,\ 0.3042/4)$ |
| | ccc | $\tilde{1} = (1.0000/1,\ 0.8051/2,\ 0.1762/3,\ 0.0468/4)$ |
| French | Mixed case | $\tilde{1} = (1.0000/1,\ 0.8732/2,\ 0.3737/3,\ 0.0029/4,\ 0.0029/5)$ |
| | cc | $\tilde{1} = (1.0000/1,\ 0.8659/2,\ 0.3300/3,\ 0.0096/4)$ |
| | cc cc | $\tilde{1} = (1.0000/1,\ 0.9616/2,\ 0.6507/3,\ 0.1857/4,\ 0.0284/5)$ |
| | cc cc cc | $\widetilde{(1,2)} = (1.0000/1,\ 1.0000/2,\ 0.9289/3,\ 0.4118/4,\ 0.0968/5)$ |
| | ccc | $\tilde{1} = (1.0000/1,\ 0.8262/2,\ 0.2416/3,\ 0.0335/4)$ |
| Latin | Mixed case | $\tilde{1} = (1.0000/1,\ 0.9860/2,\ 0.7023/3,\ 0.1820/4,\ 0.0053/5)$ |
| | cc | $\tilde{1} = (1.0000/1,\ 0.9851/2,\ 0.6451/3,\ 0.1636/4,\ 0.1260/5)$ |
| | cc cc | $\tilde{1} = (1.0000/1,\ 0.9877/2,\ 0.8184/3,\ 0.2161/4)$ |
| | cc cc cc | $\widetilde{(1,2)} = (1.0000/1,\ 1.0000/2,\ 0.9626/3,\ 0.6417/4,\ 0.0902/5)$ |
| | ccc | $\tilde{1} = (1.0000/1,\ 0.9775/2,\ 0.7061/3,\ 0.1156/4)$ |
| Spanish | Mixed case | $\tilde{1} = (1.0000/1,\ 0.9956/2,\ 0.7737/3,\ 0.2752/4,\ 0.1033/5)$ |
| | cc | $\tilde{1} = (1.0000/1,\ 0.9405/2,\ 0.7526/3,\ 0.2892/4,\ 0.0601/5)$ |
| | cc cc | $\tilde{1} = (1.0000/1,\ 0.9991/2,\ 0.9405/3,\ 0.4874/4,\ 0.0595/5,\ 0.0121/6)$ |
| | cc cc cc | $\widetilde{(1,2)} = (1.0000/1,\ 1.0000/2,\ 0.9989/3,\ 0.9149/4,\ 0.5320/5,\ 0.1457/6,\ 0.0214/7)$ |
| | ccc | $\tilde{1} = (1.0000/1,\ 1.0000/2,\ 0.8473/3,\ 0.4184/4,\ 0.0653/5,\ 0.0062/6,\ 0.0013/7)$ |
| Georgian | Mixed case | $\tilde{1} = (1.0000/1,\ 0.9864/2,\ 0.7665/3,\ 0.3895/4,\ 0.1419/5,\ 0.0213/6)$ |
| | cc | $\tilde{1} = (1.0000/1,\ 0.9890/2,\ 0.7444/3,\ 0.3657/4,\ 0.1208/5)$ |
| | cc cc | $\tilde{1} = (1.0000/1,\ 0.9988/2,\ 0.9084/3,\ 0.6126/4,\ 0.2404/5,\ 0.0717/6)$ |
| | cc cc cc | $\widetilde{(1,2)} = (1.0000/1,\ 1.0000/2,\ 0.8659/3,\ 0.8659/4,\ 0.4724/5,\ 0.2099/6,\ 0.0474/7,\ 0.0465/8)$ |
| | ccc | $\tilde{1} = (1.0000/1,\ 0.9882/2,\ 0.7653/3,\ 0.3591/4,\ 0.1490/5,\ 0.0399/6)$ |


The findings of this study suggest that real structures are characterized by fuzzy phonological lengths (the number of sounds in the structure). Words (carteges of chosen elements) are represented by mixtures of certain focal carteges $B_r$ with focal probabilities $m(B_r)$ and by a fuzzy unimodal structure with a length of "approximately 1"; such a model is assumed for the mixed case and for the bags $(1/v,\ 2/c)$, $(1/v,\ 3/c)$ and $(1/v,\ 1/u)$. The bag $(2/v,\ 6/c)$ corresponds to the fuzzy bimodal structure model with a length of "approximately 2".

The data concerning fuzzy structure lengths are to be found in Table 6.

References

[1] G. Birkhoff, Lattice Theory, NY, 1981.

[2] F. Criado, T. Gachechiladze, Fuzzy random events and their corresponding conditional probability measures, Real Academia de Ciencias Exactas LXXXIX (1995).

[3] W. Fucks, Mathematical theory of word formation, Communication Theory, London, 1953.

[4] T. Gachechiladze, T. Manjaparashvili, Fuzzy generalized Bernoulli distributions, in: Proceedings of Tbilisi State University, Cybernetics, Applied Mathematics, vol. 224, 1981.

[5] T. Gachechiladze, T. Manjaparashvili, On fuzzy sets, Rep. of Tbilisi University 279 (1988).

[6] T. Gachechiladze, T. Manjaparashvili, Fuzzy random events and corresponding probability measures, Rep. of Tbilisi University (1990) 300.

[7] T. Gachechiladze, T. Manjaparashvili, Fuzzy linguistical models, in: Quantitative Linguistic, Tallin-Tbilisi, 1990.

[8] S. Kullback, Information Theory and Statistics, John Wiley, London, 1958.

[9] B. Mandelbrot, An information theory and statistical structure of language, Communication Theory, London, 1953.

[10] R. Megrelishvili, Structures of word and mathematical theory of word formation, in: Proceeding of Tbilisi State University, Cybernetics, Applied Mathematics, vol. 289, 1989.

[11] H. Weil, in: A. Baumler, Schröter (Eds.), Filosofie der mathematik und Naturwissenschaft-Handbuch der Filosofie, 1927.

[12] R. Yager, On the measures of fuzziness and negation, I, Intern. J. General Systems 5 (1979) 221.

[13] R. Yager, Level sets for evaluation of the grade of membership of the fuzzy sets, in: R. Yager (Ed.), Fuzzy Sets and Possibility Theory, Pergamon Press, Oxford, 1984.

[14] R. Yager, On the theory of bags, Tech Report M11-601, IONA college, Machine Intelligence Inst. (1986).

[15] L. Zadeh, Fuzzy sets, Inform. and Control (8) (1965) 338.

[16] L. Zadeh, Probability measures and fuzzy events, J. Math. Anal. and Applic. 23 (1968) 424.

[17] L. Zadeh, The concept of linguistic variable and its application to approximate reasoning, Information Sciences 8 (1975) 199–249 (see also pp. 301–357).
