Saturday, February 25, 2012

lx in core.logic #3: Finite State Transducers

This is the third post in the series on using core.logic to implement basic constructs in computational linguistics. If you haven't already, you might want to have a look at the first two before you start:

Today, we're gonna look at finite state transducers, which are commonly used to model and implement translation. While sounding fancy and powerful, they are straightforward extensions of finite automata.

(ns fst
(:refer-clojure :exclude [==])
(:use [clojure.core.logic]))
;; A finite state transducer is essentially a translator between
;; two tapes of symbols. It is normally a translator from an input
;; tape to an output tape, but since we are using core.logic,
;; we hope to relax this restriction :).
;; The main idea is that every transition accepts two symbols
;; (one from each tape). We will implement a simple pluralizer
;; for most English words.
;; WARNING: Maths ahead, skip at your leisure
;;
;; Formally, a finite state transducer is a tuple
;; T = (Q, Sigma, Gamma, I, F, delta)
;;
;; where
;; Q is the state space | Q = {0, std, es, s, 1}
;; Sigma is the input alphabet | Sigma = {a-z} and :jump
;; Gamma is the output alphabet | Gamma = {a-z} and :jump
;; I are the starting states | {0}
;; F are the accepting states | {1}
;; delta is the transition function
;; delta(q, a, b, qto) signifies that it is possible
;; to transition from state q to state qto by consuming
;; a from the first tape and b from the second
;; We will implement the following rules:
;; -h, -s, -o |--> -(h|s|o)es
;; -y |--> -ies
;; - |--> -s
;; Our transducer looks as follows:
;;
;; +---(any <x> not in {s,h,o,y}):<x>----(std)
;; | |
;; | #:s
;; | s:s |
;; | h:h v
;; +------> (0) --o:o--> (es) --#:e--> (s) --#:s--> (1)
;; | | y:i
;; +-<x>:<x>-+
;;
;; Notation
;;
;; <a>:<b> - <a> consumed from first tape and <b> from second
;; # - jump
;; (<x>) - state
(defrel start q)
(fact start 0)
(defrel accepting q)
(fact accepting 1)
;; Transition table. Note we haven't included
;; the transitions that accept many symbols, since
;; we do not want to enumerate the input alphabets.
(defrel transition* from a b to)
(facts transition* [[0 'y 'i 'es]
[0 's 's 'es]
[0 'h 'h 'es]
[0 'o 'o 'es]
['es :jump 'e 's]
['s :jump 's 1]
['std :jump 's 1]])
;; Dynamic extension of the transition table
;; to our full transition relation. This includes
;; the transitions from 0 to 0 and 0 to std.
(defn transition [from a b to]
(conde
((transition* from a b to))
((!= a :jump)
(== a b)
(== from 0)
(conde ((== to 0))
((== to 'std)
(!= a 's)
(!= a 'h)
(!= a 'o)
(!= a 'y))))))
;; Translation *relation*
(defn translate
([tape1 tape2]
(fresh [q0]
(start q0)
(translate q0 tape1 tape2)))
([q tape1 tape2]
(matcha [tape1 tape2]
(['() '()]
(accepting q))
([[t . ape1] [c . ape2]]
; This seems unnecessary going forward, but it makes
; sure we get no jumps in the input if we
; "translate" backwards, i.e., look for the singular
(!= t :jump)
(fresh [qto]
(transition q t c qto)
(translate qto ape1 ape2)))
([_ [t . ape2]]
(fresh [qto]
(transition q :jump t qto)
(translate qto tape1 ape2))))))
(run* [q] (translate '(p a s s) '(p a s s e s)))
;; => (_.0)
;;
;; Checking pluralizations
(run* [q] (translate '(p i r a t e) q))
;; => ((p i r a t e s))
;;
;; Running the pluralizer
(run* [q] (translate '(d a i s y) q))
;; => ((d a i s i e s))
(run* [q] (translate '(h e r o) q))
;; => ((h e r o e s))
(run* [q] (translate q '(v a r s)))
;; => ((v a r))
;;
;; Getting the singular
(run* [q] (translate q '(p a s s e s)))
;; => ((p a s s) (p a s s e))
;;
;; Unfortunately, the transducer doesn't really know English,
;; but at least it got the right answer
(run 10 [q] (fresh [a b] (translate a b) (== q [a b])))
;; => ([(h) (h e s)]
;; [(o) (o e s)]
;; [(s) (s e s)]
;; [(_.0 h) (_.0 h e s)]
;; [(_.0) (_.0 s)]
;; [(y) (i e s)]
;; [(_.0 o) (_.0 o e s)]
;; [(_.0 s) (_.0 s e s)]
;; [(_.0 _.1 h) (_.0 _.1 h e s)]
;; [(_.0 _.1) (_.0 _.1 s)])
view raw fst.clj hosted with ❤ by GitHub