Friday, January 27, 2012

lx in core.logic #2: Jumps, Flexible Transitions and Parsing

This is a continuation of the post Finite State Machines in Clojure core.logic.

This current plan for this series is to follow the book Algorithms for Computational Linguistics using Clojure core.logic instead of Prolog.

Jumps, wildcard transitions and parsing are natural and useful ways to extend and leverage finite state machines for text analysis. This was an opportunity to introduce extensions of fact databases and non-deterministic matching. Here's the code:

(ns fsmparse
(:refer-clojure :exclude [==])
(:use [clojure.core.logic]))
;; We will encode a state machine that accepts lists containing '(w h y) as a sublist
;; Moreover, instead of a recognizer, we will implement a parser, that returns a list
;; of visited states in order
;;
;; +----#-----+----#-----+ +--?--+
;; v | | v |
;; +> (x) --w--> (w) --h--> (wh) --y--> (why) --+
;; | |
;; +-?-+
;;
;; Notation:
;; ? - any character
;; # - jump; does not consume a character from the input
;; (<x>) - state named <x>
;; --<i>--> - transition with input <i>
(defrel start q)
(fact start 'x)
;; Encoded transitions including jumps
(defrel transition* from via to)
(facts transition* [['x 'w 'w]
['w 'h 'wh]
['wh 'y 'why]
['w :jump 'x]
['wh :jump 'x]])
;; An extension of the transition* relation to implement start state and final
;; state transitions that accept any character
(defn transition [from via to]
(conde
((transition* from via to))
((!= via :jump) (== from 'x) (== to 'x))
((!= via :jump) (== from 'why) (== to 'why))))
(defrel accepting q)
(fact accepting 'why)
(defn parse
([input parsed]
(fresh [q0]
(start q0)
(parse q0 input parsed)))
([q input parsed]
; Non-relational matching, commits to first match
(matcha [input]
(['()]
(accepting q)
(== parsed (list q)))
([[i . nput]]
; Handling transitions that consume input characters
(!= i :jump)
(fresh [qto subparsed]
(transition q i qto)
(parse qto nput subparsed)
; conso is a built in relation defined as
; conso(x, xs, [x . xs]) succeeds
(conso q subparsed parsed)))
([_]
; Handling jump transitions
(fresh [qto subparsed]
(transition q :jump qto)
(parse qto input subparsed)
(conso q subparsed parsed))))))
(run* [q] (parse '(a w h y i n s i d e) q))
;; => ((x x w wh why why why why why why why))
(run* [q] (parse '(n o w a y) q))
;; => ()
(run 3 [q] (fresh [m] (parse q m)))
;; => ((w h y) (_.0 w h y) (_.0 _.1 w h y))
view raw fsmparse.clj hosted with ❤ by GitHub

2 comments:

  1. This is really fantastic stuff! I'm also very, very excited about leveraging core.logic for parsing. One hurdle we have to surmount is very large inputs - Prolog has some tricks that we haven't implemented yet. Looking forward to watching this series evolve and any feedback you may have about core.logic.

    ReplyDelete
    Replies
    1. Cool to hear that. I haven't been looking at complexity yet, just trying to get used to this paradigm and looking for what it can offer over Prolog. Just being able to prototype and experiment this fast is great already. But absolutely, it would be pretty neat if this actually scaled up.

      Delete