CRM114 Controllable Regex Mutilator for Ruby
This is a Ruby interface to the CRM114 Controllable Regex Mutilator, an advanced and fast text classifier that uses sparse binary polynomial matching with a Bayesian Chain Rule evaluator and a hidden Markov model to categorize data with up to 99.87% accuracy.
The Ruby wrapper grew out of this:
About CRM114
Download
- gem install crm114
- svn checkout svn://rubyforge.org/var/svn/crm114
Dependencies
Requires the CRM114 binaries to be installed. Specifically, the ‘crm’ binary should be accessible in the current user’s PATH environment variable.
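Before using the wrapper, it can help to confirm that the binary is actually reachable. The following is a minimal sketch of such a check, not something the gem provides: find_executable is a hypothetical helper name, and the lookup simply walks the directories listed in PATH.

```ruby
# Hypothetical helper, not part of the crm114 gem: returns the full path of
# the first executable file called `name` found in the given PATH string,
# or nil when no directory contains one.
def find_executable(name, path = ENV.fetch('PATH', ''))
  path.split(File::PATH_SEPARATOR).each do |dir|
    next if dir.empty?
    candidate = File.join(dir, name)
    return candidate if File.file?(candidate) && File.executable?(candidate)
  end
  nil
end

if (crm_path = find_executable('crm'))
  puts "Found crm at #{crm_path}"
else
  warn "The 'crm' binary was not found in PATH; install CRM114 first."
end
```

Running this before the first call to the classifier turns a missing installation into a clear message instead of a failure later on.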
Usage
The CRM114 library interface is very similar to that of the Classifier project.
Here follows a brief example:
require 'crm114'
crm = Classifier::CRM114.new([:interesting, :boring])
crm.train! :interesting, 'Some data set with a decent signal to noise ratio.'
crm.train! :boring, 'Pig latin, as in lorem ipsum dolor sit amet.'
crm.classify 'Lorem ipsum'       # => [:boring, 0.99]
crm.interesting? 'Lorem ipsum'   # => false
crm.boring? 'Lorem ipsum'        # => true
Have a look at the included unit tests for more comprehensive examples.
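Since classify returns a category together with a probability, a caller may want to ignore low-confidence results. The sketch below shows one way to do that. It assumes only the [category, probability] pair shape shown in the example above; confident_category and the 0.9 default threshold are hypothetical and not part of the gem.

```ruby
# Hypothetical helper, not part of the crm114 gem. Takes the
# [category, probability] pair returned by classify and returns the
# category only when the probability reaches the threshold, else nil.
def confident_category(result, threshold = 0.9)
  category, probability = result
  probability >= threshold ? category : nil
end

confident_category([:boring, 0.99])   # => :boring
confident_category([:boring, 0.55])   # => nil
```

In real use the pair would come straight from the classifier, as in confident_category(crm.classify('Lorem ipsum')).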
Related Projects
- www.elegantchaos.com/node/129 (crm.py)
Author
Arto Bendiken - bendiken.net
License
Released under the terms of the MIT license. See the accompanying LICENSE file for more information.