Formal Models in NLP

Finite-State Automata

Nina Seemann

Universität Stuttgart

Institut für Maschinelle Sprachverarbeitung

Pfaffenwaldring 5b

70569 Stuttgart

Outline

1 Finite-State Automata: Characterization

2 Closure Properties of Finite-State Acceptors

3 Closure Properties of Finite-State Transducers

Finite-State Acceptors

Non-Deterministic Finite-State Acceptor

Definition (Non-deterministic finite-state acceptor (NFA))

A non-deterministic finite-state acceptor A is a 5-tuple (Q, Σ, q0, F, δ) where

Q is a finite set of states

Σ is the alphabet

q0 ∈ Q is the start state

F ⊆ Q is a set of final states

δ : Q × (Σ ∪ {ε}) → 2^Q is the transition function

Non-determinism refers to the fact that an NFA can be in several states at once.

Deterministic Finite-State Acceptor

Definition (Deterministic finite-state acceptor (DFA))

A deterministic finite-state acceptor D is a 5-tuple (Q, Σ, q0, F, δ) where

Q is a finite set of states

Σ is a finite set called the alphabet

q0 ∈ Q is the initial state

F ⊆ Q is the set of final states

δ : Q × Σ → Q is the transition function

Determinism refers to the fact that, for each state and input symbol, a DFA moves to exactly one state. DFAs are ε-free by definition.

DFAs and NFAs have the same expressive power, i.e. they are equivalent: every language recognized by an NFA is also recognized by some DFA, and vice versa.

Extended Transition Function & Language

Definition (Extended transition function δ̂)

δ̂ describes what happens when we start in any state and follow any sequence of inputs.

δ̂(q, ε) = q

δ̂(q, w) = δ(δ̂(q, x), a) with w = xa, where x ∈ Σ* and a ∈ Σ

Definition (Language of a DFA A)

L(A) = {w ∈ Σ* | δ̂(q0, w) ∈ F}

We also say that L(A) is recognized by A.

Definition (Regular language)

A language L is regular if L = L(A) for some DFA A.

Extended Transition Function for DFA

Example (frog in DFA D_lex)

Assumption: δ̂(0, frog) ∈ {26, 24, 22, 13, 11, 9, 8}

δ̂(0, ε) = 0

δ̂(0, f) = δ(δ̂(0, ε), f) = δ(0, f) = 3

δ̂(0, fr) = δ(δ̂(0, f), r) = δ(3, r) = 6

δ̂(0, fro) = δ(δ̂(0, fr), o) = δ(6, o) = 7

δ̂(0, frog) = δ(δ̂(0, fro), g) = δ(7, g) = 8
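The recursion for the extended transition function can be traced in code. The following Python sketch is an illustration only: it encodes just the four transitions of the DFA that the frog example uses (the rest of the automaton is not given here), and the names `delta`, `delta_hat` and `accepts` are chosen for this example. The loop computes the same value as the recursive definition, consuming one symbol per step.

```python
# Fragment of the lexicon DFA: only the transitions used in the frog example.
delta = {
    (0, "f"): 3,
    (3, "r"): 6,
    (6, "o"): 7,
    (7, "g"): 8,
}
# Set of states taken from the assumption of the example.
final_states = {26, 24, 22, 13, 11, 9, 8}


def delta_hat(q, w):
    """Extended transition function: follow w symbol by symbol from q.

    Returns None if a transition is missing (the fragment is partial).
    """
    for a in w:
        q = delta.get((q, a))
        if q is None:
            return None
    return q


def accepts(w, start=0):
    """A word is accepted if the state reached from start is final."""
    return delta_hat(start, w) in final_states


print(delta_hat(0, "fro"))  # 7
print(accepts("frog"))      # True
```

With the empty word, `delta_hat(0, "")` returns the start state 0 unchanged, which mirrors the base case of the definition.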

Extended Transition Function for NFA

Example (frog in NFA A_lex)

Assumption: δ̂(31, frog) ∩ {2, 6, 9, 13, 18, 21, 30} ≠ ∅

δ̂(31, ε) = {31}

δ̂(31, f) = δ(δ̂(31, ε), f) = δ(31, f) = {3, 7, 10}

δ̂(31, fr) = δ(δ̂(31, f), r) = δ(3, r) ∪ δ(7, r) ∪ δ(10, r) = {4} ∪ ∅ ∪ ∅ = {4}

δ̂(31, fro) = δ(δ̂(31, fr), o) = δ(4, o) = {5}

δ̂(31, frog) = δ(δ̂(31, fro), g) = δ(5, g) = {6}
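For an NFA the extended transition function returns a set of states, and each step takes the union over all states reached so far. The Python sketch below is an illustration under two assumptions: it contains only the transitions of the NFA that appear in the frog example, and it ignores ε-transitions (none occur in the example). The names are chosen for this sketch.

```python
# Fragment of the lexicon NFA: only the transitions used in the frog example.
# Missing entries stand for the empty set.
delta = {
    (31, "f"): {3, 7, 10},
    (3, "r"): {4},
    (4, "o"): {5},
    (5, "g"): {6},
}
# Set of states taken from the assumption of the example.
final_states = {2, 6, 9, 13, 18, 21, 30}


def delta_hat(states, w):
    """Set-valued extended transition function (no epsilon-transitions)."""
    current = set(states)
    for a in w:
        nxt = set()
        for q in current:
            # Union of the successor sets of all current states.
            nxt |= delta.get((q, a), set())
        current = nxt
    return current


def accepts(w, start=31):
    """Accept if at least one state reached on w is final."""
    return bool(delta_hat({start}, w) & final_states)


print(delta_hat({31}, "fr"))  # {4}
print(accepts("frog"))        # True
```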

Finite-State Transducers

Definition

Definition ((Non-deterministic) finite-state transducer (NFST))

A (non-deterministic) finite-state transducer T is a 7-tuple (Q, Σ, ∆, q0, F, δ, σ) where

Q is a finite set of states

Σ is the input alphabet of T

∆ is the output alphabet of T

q0 ∈ Q is the start state

F ⊆ Q is the set of final states

δ : Q × (Σ ∪ {ε}) → 2^Q is the transition function

σ : Q × (Σ ∪ {ε}) × Q → ∆* is the output function

Alternative Definition

Definition (Normalized finite-state transducer)

A normalized finite-state transducer T is a 6-tuple (Q, Σ, ∆, q0, F, E) where

Q is a finite set of states

Σ is a set called the input alphabet of T

∆ is a set called the output alphabet of T

q0 ∈ Q is the start state

F ⊆ Q is the set of final states

E ⊆ Q × (Σ ∪ {ε}) × (∆ ∪ {ε}) × Q is the set of transitions

Every transducer can be transformed into a normalized transducer.

Deterministic Finite-State Transducer

Definition (Deterministic finite-state transducer (DFST))

A deterministic finite-state transducer T is a 7-tuple (Q, Σ, ∆, q0, F, δ, σ) where

Q is a finite set of states

Σ is a set called the input alphabet of T

∆ is a set called the output alphabet of T

q0 ∈ Q is the start state

F ⊆ Q is the set of final states

δ : Q × Σ → Q is the (deterministic) transition function

σ : Q × Σ × Q → ∆* is the (deterministic) output function

Note: Not every NFST can be determinized.

Closure Properties of Finite-State Acceptors

Finite-state acceptors are closed under:

Union

Concatenation

Closure (Kleene star)

Reversal

Intersection

Complementation

Difference

Union

Example (Union of two acceptors A1 and A2)

Concatenation

Example (Concatenation of two acceptors A1 and A2)

Closure (Kleene Star)

Example (Closure of acceptor A1 )

Reversal

Example (Reversal of acceptor A2)

Intersection

Let L and M be the languages of the deterministic automata A_L = (Q_L, Σ, δ_L, q_L, F_L) and A_M = (Q_M, Σ, δ_M, q_M, F_M). For L ∩ M we construct the automaton

A = (Q_L × Q_M, Σ, δ, (q_L, q_M), F_L × F_M)

where δ((p, q), a) = (δ_L(p, a), δ_M(q, a)) for p ∈ Q_L, q ∈ Q_M, and a ∈ Σ.

The set F = F_L × F_M of final states consists of all pairs (p, q) such that p ∈ F_L and q ∈ F_M.

The states of A are pairs (p, q) of a state p of A_L and a state q of A_M.

Suppose A is in state (p, q). Given input symbol a, A moves to the state (δ_L(p, a), δ_M(q, a)), so it simulates A_L and A_M in parallel.
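The product construction translates almost directly into code. The Python sketch below is an illustration, assuming both DFAs are complete and share the same alphabet. A DFA is represented as a tuple in the order used on this slide (states, alphabet, transition dict, start state, final states). The two automata `EVEN_A` and `ENDS_B` are hypothetical examples and do not come from the slides.

```python
from itertools import product


def intersect(dfa_l, dfa_m):
    """Product construction for two complete DFAs over the same alphabet."""
    q_l, sigma, delta_l, start_l, f_l = dfa_l
    q_m, _, delta_m, start_m, f_m = dfa_m
    states = set(product(q_l, q_m))
    # Both components move simultaneously on the same input symbol.
    delta = {
        ((p, q), a): (delta_l[(p, a)], delta_m[(q, a)])
        for (p, q) in states
        for a in sigma
    }
    finals = set(product(f_l, f_m))
    return states, sigma, delta, (start_l, start_m), finals


def accepts(dfa, w):
    _, _, delta, q, finals = dfa
    for a in w:
        q = delta[(q, a)]
    return q in finals


# Hypothetical example automata over {a, b}:
# EVEN_A accepts words with an even number of a's,
# ENDS_B accepts words ending in b.
SIGMA = {"a", "b"}
EVEN_A = ({0, 1}, SIGMA,
          {(0, "a"): 1, (0, "b"): 0, (1, "a"): 0, (1, "b"): 1}, 0, {0})
ENDS_B = ({"n", "y"}, SIGMA,
          {("n", "a"): "n", ("n", "b"): "y", ("y", "a"): "n", ("y", "b"): "y"},
          "n", {"y"})
BOTH = intersect(EVEN_A, ENDS_B)

print(accepts(BOTH, "aab"))  # True
print(accepts(BOTH, "ab"))   # False
```

The sketch builds all pairs of states, including pairs that are not reachable from the start pair; a practical implementation would typically explore only reachable pairs.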

Example (Intersection of two acceptors A1 and A3)

Complementation

Example (Complementation of acceptor A3)
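A common way to complement an acceptor is to swap final and non-final states. The Python sketch below illustrates this under the assumption that the automaton is a complete DFA, i.e. δ is defined for every state and symbol; for an NFA or an incomplete automaton the swap alone does not give the complement. The DFA `ENDS_B` is a hypothetical example, not the acceptor A3 of the slide.

```python
def complement(dfa):
    """Complement a complete DFA by swapping final and non-final states."""
    states, sigma, delta, start, finals = dfa
    return states, sigma, delta, start, set(states) - set(finals)


def accepts(dfa, w):
    _, _, delta, q, finals = dfa
    for a in w:
        q = delta[(q, a)]
    return q in finals


# Hypothetical complete DFA over {a, b} accepting words that end in b.
ENDS_B = (
    {"n", "y"},
    {"a", "b"},
    {("n", "a"): "n", ("n", "b"): "y", ("y", "a"): "n", ("y", "b"): "y"},
    "n",
    {"y"},
)
NOT_ENDS_B = complement(ENDS_B)

print(accepts(ENDS_B, "ab"))      # True
print(accepts(NOT_ENDS_B, "ab"))  # False
```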

Difference

Example (Difference of two acceptors A1 and A2)

Closure Properties of Finite-State Transducers

Finite-state transducers are closed under:

Union

Concatenation

Closure (Kleene star)

Reversal

Projection (leads to FSAs)

Composition

Inversion

Finite-state transducers are not closed under:

Complementation

Intersection (but acyclic and ε-free transducers are)

Difference

Projection

Example (Projection of transducer T )
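Projection keeps one side of every transition label and discards the other, which turns a transducer into an acceptor. The Python sketch below is an illustration for a normalized transducer given as a tuple (states, input alphabet, output alphabet, start state, final states, transitions), where each transition is a tuple (source, input, output, target). The transducer `T` is a hypothetical example and not the one from the slide.

```python
def project_input(fst):
    """First projection: keep only the input label of each transition."""
    states, sigma, _out_alphabet, start, finals, edges = fst
    nfa_edges = {(p, a, q) for (p, a, _b, q) in edges}
    return states, sigma, start, finals, nfa_edges


def project_output(fst):
    """Second projection: keep only the output label of each transition."""
    states, _sigma, out_alphabet, start, finals, edges = fst
    nfa_edges = {(p, b, q) for (p, _a, b, q) in edges}
    return states, out_alphabet, start, finals, nfa_edges


# Hypothetical transducer mapping the word "ab" to "xy".
T = (
    {0, 1, 2},
    {"a", "b"},
    {"x", "y"},
    0,
    {2},
    {(0, "a", "x", 1), (1, "b", "y", 2)},
)

print(sorted(project_input(T)[4]))   # [(0, 'a', 1), (1, 'b', 2)]
print(sorted(project_output(T)[4]))  # [(0, 'x', 1), (1, 'y', 2)]
```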

Composition

Definition (ε-free composition)

Let T1 = (Q1, Σ1, ∆1, q1, F1, E1) and T2 = (Q2, Σ2, ∆2, q2, F2, E2) be two normalized, ε-free FSTs. T1 ∘ T2 is the transducer

T = (Q1 × Q2, Σ1, ∆2, (q1, q2), F1 × F2, E)

where E = {((p, q), a, b, (p′, q′)) | ∃c ∈ ∆1 ∩ Σ2 : (p, a, c, p′) ∈ E1 ∧ (q, c, b, q′) ∈ E2}

How does composition work?

Whenever T1 contains a transition (p, a, c, p′) and T2 contains a transition (q, c, b, q′), T1 ∘ T2 contains the transition ((p, q), a, b, (p′, q′)).
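The definition of the transition set E can be written as a set comprehension. The Python sketch below is an illustration for normalized, ε-free transducers given as tuples (states, input alphabet, output alphabet, start state, final states, transitions); the two transducers `T1` and `T2` are hypothetical examples. Matching the output symbol of T1 against the input symbol of T2 plays the role of the intermediate symbol c.

```python
def compose(t1, t2):
    """Epsilon-free composition of two normalized transducers."""
    q1, sigma1, _out1, start1, f1, e1 = t1
    q2, _sigma2, out2, start2, f2, e2 = t2
    # Pair a transition of T1 with a transition of T2 whenever the output
    # of the first equals the input of the second.
    edges = {
        ((p, q), a, b, (p_next, q_next))
        for (p, a, c, p_next) in e1
        for (q, c2, b, q_next) in e2
        if c == c2
    }
    states = {(p, q) for p in q1 for q in q2}
    finals = {(p, q) for p in f1 for q in f2}
    return states, sigma1, out2, (start1, start2), finals, edges


# Hypothetical transducers: T1 rewrites a to b, T2 rewrites b to c.
T1 = ({0, 1}, {"a"}, {"b"}, 0, {1}, {(0, "a", "b", 1)})
T2 = ({0, 1}, {"b"}, {"c"}, 0, {1}, {(0, "b", "c", 1)})
T = compose(T1, T2)

print(T[5])  # {((0, 0), 'a', 'c', (1, 1))}
```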

Example (Composition)

Inversion

Example (Inversion)

FST TMorph mapping words to morphological categories
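Inversion swaps the input and output side of a transducer, so a transducer that maps words to categories becomes one that maps categories to words. The Python sketch below is an illustration for a normalized transducer given as a tuple (states, input alphabet, output alphabet, start state, final states, transitions). The transducer `T` is a hypothetical toy example; it is not the morphological transducer of the slide.

```python
def invert(fst):
    """Swap input and output labels on every transition."""
    states, sigma, out_alphabet, start, finals, edges = fst
    inverted = {(p, b, a, q) for (p, a, b, q) in edges}
    # The alphabets change roles as well.
    return states, out_alphabet, sigma, start, finals, inverted


# Hypothetical transducer mapping the word "ab" to "xy".
T = (
    {0, 1, 2},
    {"a", "b"},
    {"x", "y"},
    0,
    {2},
    {(0, "a", "x", 1), (1, "b", "y", 2)},
)
T_INV = invert(T)

print(sorted(T_INV[5]))  # [(0, 'x', 'a', 1), (1, 'y', 'b', 2)]
```

Inverting twice returns the original transducer.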

Equivalence Transformations on Finite-State Acceptors

Equivalence transformations are operations on automata which change the topology of an automaton but not its language. They usually serve optimization purposes, i.e. they create smaller and/or faster automata.

Sometimes they are even necessary (e.g. determinization is crucial for complementation).

Finite-state acceptors admit the following transformations:

ε-Removal

Determinization

Minimization

Determinization

Subset Construction

A DFA can be constructed from an NFA by the subset construction. In the worst case, the smallest DFA has 2^n states, where n is the number of states of the NFA.

Example

Q_D is the power set of Q_N

F_D is the set of subsets S of Q_N such that S ∩ F_N ≠ ∅

For each set S ⊆ Q_N and for each input symbol a ∈ Σ:

δ_D(S, a) = ⋃_{p ∈ S} δ_N(p, a)

Transition function δ of the NFA:

δ(p0, 0) = {p0, p1}
δ(p0, 1) = {p0}
δ(p1, 1) = {p2}

Transition table of the DFA obtained by the subset construction (→ marks the start state, ∗ marks final states):

State              0            1
∅                  ∅            ∅            not accessible
→ {p0}             {p0, p1}     {p0}
{p1}               ∅            {p2}         not accessible
∗ {p2}             ∅            ∅            not accessible
{p0, p1}           {p0, p1}     {p0, p2}
∗ {p0, p2}         {p0, p1}     {p0}
∗ {p1, p2}         ∅            {p2}         not accessible
∗ {p0, p1, p2}     {p0, p1}     {p0, p2}     not accessible

Subset Construction: Lazy Evaluation

Basis: The set containing only the start state of the NFA N is accessible.

Induction: Suppose the set S of states is accessible. Then for each input symbol a, compute the set of states δ_D(S, a); these sets are accessible as well.

Example

δ_D({p0}, 0) = {p0, p1} (new accessible state)

δ_D({p0}, 1) = {p0} ('old' state)

δ_D({p0, p1}, 0) = δ_N(p0, 0) ∪ δ_N(p1, 0) = {p0, p1} ∪ ∅ = {p0, p1} ('old' state)

δ_D({p0, p1}, 1) = δ_N(p0, 1) ∪ δ_N(p1, 1) = {p0} ∪ {p2} = {p0, p2} (new accessible state)

δ_D({p0, p2}, 0) = δ_N(p0, 0) ∪ δ_N(p2, 0) = {p0, p1} ∪ ∅ = {p0, p1} ('old' state)

δ_D({p0, p2}, 1) = δ_N(p0, 1) ∪ δ_N(p2, 1) = {p0} ∪ ∅ = {p0} ('old' state)
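The lazy evaluation can be sketched as a worklist algorithm that only ever creates accessible subsets. The Python code below is an illustration, not a definitive implementation: it uses the three NFA transitions of the example, assumes that p2 is the only final state of the NFA (as the starred subsets in the transition table suggest), and ignores ε-transitions. The function name `determinize` is chosen for this sketch.

```python
from collections import deque

# NFA of the example; missing entries stand for the empty set.
delta_n = {
    ("p0", "0"): {"p0", "p1"},
    ("p0", "1"): {"p0"},
    ("p1", "1"): {"p2"},
}
start_n = "p0"
finals_n = {"p2"}  # assumption, see lead-in
alphabet = ["0", "1"]


def determinize(delta_n, start_n, finals_n, alphabet):
    """Subset construction restricted to accessible subsets."""
    start_d = frozenset({start_n})      # basis: the start set is accessible
    delta_d = {}
    accessible = {start_d}
    queue = deque([start_d])
    while queue:
        s = queue.popleft()
        for a in alphabet:
            # delta_D(S, a) is the union of delta_N(p, a) over all p in S.
            target = frozenset(
                q for p in s for q in delta_n.get((p, a), set())
            )
            delta_d[(s, a)] = target
            if target not in accessible:  # induction: a new accessible set
                accessible.add(target)
                queue.append(target)
    finals_d = {s for s in accessible if s & finals_n}
    return accessible, delta_d, start_d, finals_d


states_d, delta_d, start_d, finals_d = determinize(
    delta_n, start_n, finals_n, alphabet
)
print(len(states_d))  # 3
```

Only three of the eight subsets are created, namely {p0}, {p0, p1} and {p0, p2}.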
