SimpleInlineTextAnnotation (Ruby gem)
SimpleInlineTextAnnotation is a Ruby gem for working with inline text annotations. It parses annotated text into structured data and generates annotated text from that data.
Installation
To use this gem in a Rails application, add the following line to your application's Gemfile:
gem 'simple_inline_text_annotation'
Then, run the following command to install the gem:
bundle install
Usage
The SimpleInlineTextAnnotation gem provides two main methods: parse and generate. These methods allow you to work with inline text annotations in a structured way.
parse Method
The parse method takes a string with inline annotations and extracts structured information about the annotations, including the character positions and annotation types.
Example
result = SimpleInlineTextAnnotation.parse('[Elon Musk][Person] is a member of the [PayPal Mafia][Organization].')
puts result
# => {
# text: "Elon Musk is a member of the PayPal Mafia.",
# denotations: [
# {span: {begin: 0, end: 9}, obj: "Person"},
# {span: {begin: 29, end: 41}, obj: "Organization"}
# ]
# }
Explanation
The input string "[Elon Musk][Person] is a member of the [PayPal Mafia][Organization]." contains two annotations:
- [Elon Musk][Person]: The text Elon Musk is annotated as Person.
- [PayPal Mafia][Organization]: The text PayPal Mafia is annotated as Organization.
The method returns a hash with:
- "text": The plain text without annotations.
- "denotations": An array of hashes, where each hash contains:
  - "span": The character positions (begin and end) of the annotated text.
  - "obj": The annotation type.
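To make the span positions concrete, here is a minimal sketch that slices the plain text with each span. It does not call the gem: the hash is copied from the example output above, and the sketch assumes that begin is inclusive and end is exclusive, which is what the example values suggest.

```ruby
# Result hash copied from the example above (symbol keys, as printed there).
result = {
  text: "Elon Musk is a member of the PayPal Mafia.",
  denotations: [
    { span: { begin: 0, end: 9 }, obj: "Person" },
    { span: { begin: 29, end: 41 }, obj: "Organization" }
  ]
}

# Slice the text with an exclusive range (assumption: end is not included).
covered = result[:denotations].map do |denotation|
  span = denotation[:span]
  [result[:text][span[:begin]...span[:end]], denotation[:obj]]
end

covered.each { |text, type| puts "#{text} => #{type}" }
```

If the slices match the bracketed text in the input string, the spans are being read correctly.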
generate Method
The generate method performs the reverse operation of parse. It takes a hash containing the plain text and its annotations, and generates a string with inline annotations.
Example
result = SimpleInlineTextAnnotation.generate({
"text" => "Elon Musk is a member of the PayPal Mafia.",
"denotations" => [
{ "span" => { "begin" => 0, "end" => 9 }, "obj" => "Person" },
{ "span" => { "begin" => 29, "end" => 41 }, "obj" => "Organization" }
]
})
puts result
# => "[Elon Musk][Person] is a member of the [PayPal Mafia][Organization]."
Explanation
- The input hash contains:
  - "text": The plain text ("Elon Musk is a member of the PayPal Mafia.").
  - "denotations": An array of hashes, where each hash specifies:
    - "span": The character positions (begin and end) of the annotated text.
    - "obj": The annotation type.
- The method generates a string where:
  - The text specified in "span" is enclosed in square brackets [].
  - The annotation type specified in "obj" is added in a second set of square brackets [].
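The bracketing described above can be sketched in plain Ruby. This is an illustration only, not the gem's implementation: sketch_generate is a hypothetical helper, and it assumes that spans do not overlap and that end is exclusive.

```ruby
# Hypothetical helper, not part of the gem: wraps each span in [] and
# appends the annotation type in a second [].
def sketch_generate(source)
  text = source["text"].dup
  # Work from the last span to the first so earlier offsets stay valid
  # while the string grows.
  sorted = source["denotations"].sort_by { |d| -d["span"]["begin"] }
  sorted.each do |d|
    b = d["span"]["begin"]
    e = d["span"]["end"]
    text[b...e] = "[#{text[b...e]}][#{d['obj']}]"
  end
  text
end

sketched = sketch_generate({
  "text" => "Elon Musk is a member of the PayPal Mafia.",
  "denotations" => [
    { "span" => { "begin" => 0, "end" => 9 }, "obj" => "Person" },
    { "span" => { "begin" => 29, "end" => 41 }, "obj" => "Organization" }
  ]
})
puts sketched
```

Replacing from the end of the text backwards avoids recomputing offsets after each insertion.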
Relation Annotation
The SimpleInlineTextAnnotation gem supports advanced relation annotation, allowing you to define relationships between annotated entities. This is achieved by interpreting the second set of square brackets ([]) based on the number of elements it contains.
Parsing Rules
- If the second [] contains 1 element, it is treated as the annotation type (default behavior).
- If the second [] contains 2 elements, the first element is interpreted as the id of the denotation, and the second element as the obj (annotation type).
- If the second [] contains 4 elements, the elements are interpreted as follows:
  - The first element is the id of the denotation and the subj of the relation.
  - The second element is the obj (annotation type) of the denotation.
  - The third element is the pred (predicate) of the relation.
  - The fourth element is the obj of the relation.
- Any other cases are ignored.
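The rules above amount to a dispatch on the element count. The following sketch shows one way to express that; interpret_label is a hypothetical helper, not the gem's API, and it assumes elements are separated by commas as in the example below.

```ruby
# Hypothetical helper, not part of the gem: classifies the content of the
# second [] by its number of comma-separated elements.
def interpret_label(label)
  parts = label.split(",").map(&:strip)
  case parts.size
  when 1
    { denotation: { obj: parts[0] } }
  when 2
    { denotation: { id: parts[0], obj: parts[1] } }
  when 4
    {
      denotation: { id: parts[0], obj: parts[1] },
      relation: { subj: parts[0], pred: parts[2], obj: parts[3] }
    }
  else
    nil # any other element count is ignored
  end
end

p interpret_label("Person")
p interpret_label("T2, Organization")
p interpret_label("T1, Person, member_of, T2")
p interpret_label("T1, Person, member_of")
```

Note that in the 4-element case the first element is used twice: as the denotation id and as the relation subject.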
Example
source = "[Elon Musk][T1, Person, member_of, T2] is a member of the [PayPal Mafia][T2, Organization]."
result = SimpleInlineTextAnnotation.parse(source)
puts result
# => {
# text: "Elon Musk is a member of the PayPal Mafia.",
# denotations: [
# { id: "T1", span: { begin: 0, end: 9 }, obj: "Person" },
# { id: "T2", span: { begin: 29, end: 41 }, obj: "Organization" }
# ],
# relations: [
# { pred: "member_of", subj: "T1", obj: "T2" }
# ]
# }
Explanation
The input string "[Elon Musk][T1, Person, member_of, T2] is a member of the [PayPal Mafia][T2, Organization]." contains:
- Two denotations:
  - [Elon Musk][T1, Person, member_of, T2]: The text Elon Musk is annotated as Person with id T1. It also serves as the subj of the relation.
  - [PayPal Mafia][T2, Organization]: The text PayPal Mafia is annotated as Organization with id T2.
- One relation:
  - member_of: Indicates that T1 (Elon Musk) is a member of T2 (PayPal Mafia).
The method returns a hash with:
- "text": The plain text without annotations.
- "denotations": An array of hashes, where each hash contains:
  - "id": The unique identifier of the denotation.
  - "span": The character positions (begin and end) of the annotated text.
  - "obj": The annotation type.
- "relations": An array of hashes, where each hash contains:
  - "pred": The predicate or type of the relation.
  - "subj": The id of the subject denotation.
  - "obj": The id of the object denotation.
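Because relations refer to denotations only by id, reading a relation usually means looking up both ends. The sketch below does this on a hash copied from the example output above; it does not call the gem, and it assumes an exclusive end in each span.

```ruby
# Result hash copied from the relation example above.
result = {
  text: "Elon Musk is a member of the PayPal Mafia.",
  denotations: [
    { id: "T1", span: { begin: 0, end: 9 }, obj: "Person" },
    { id: "T2", span: { begin: 29, end: 41 }, obj: "Organization" }
  ],
  relations: [
    { pred: "member_of", subj: "T1", obj: "T2" }
  ]
}

# Index denotations by id so each end of a relation can be looked up.
by_id = result[:denotations].to_h { |d| [d[:id], d] }

# Return the text covered by the denotation with the given id.
surface = lambda do |id|
  span = by_id.fetch(id)[:span]
  result[:text][span[:begin]...span[:end]]
end

triples = result[:relations].map do |r|
  [surface.call(r[:subj]), r[:pred], surface.call(r[:obj])]
end

triples.each { |subj, pred, obj| puts "#{subj} --#{pred}--> #{obj}" }
```

Using fetch raises a KeyError when a relation references an id that has no denotation, which makes dangling references easy to spot.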
Generating Relation Annotation
The generate method can also create strings with relation annotations from structured data.
result = SimpleInlineTextAnnotation.generate({
"text" => "Elon Musk is a member of the PayPal Mafia.",
"denotations" => [
{ "id" => "T1", "span" => { "begin" => 0, "end" => 9 }, "obj" => "Person" },
{ "id" => "T2", "span" => { "begin" => 29, "end" => 41 }, "obj" => "Organization" }
],
"relations" => [
{ "pred" => "member_of", "subj" => "T1", "obj" => "T2" }
]
})
puts result
# => "[Elon Musk][T1, Person, member_of, T2] is a member of the [PayPal Mafia][T2, Organization]."
Explanation
- The input hash includes:
  - "text": The plain text.
  - "denotations": An array of annotations with id, span, and obj.
  - "relations": An array of relationships, where:
    - "subj" and "obj" reference ids in the denotations array.
    - "pred" specifies the relationship type.
- The method generates a string with inline annotations and relationships.
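As a rough illustration of how the content of the second [] could be assembled from this input, consider the sketch below. sketch_label is a hypothetical helper, not the gem's implementation, and it assumes at most one relation per subject; how the gem handles several relations with the same subject is not covered here.

```ruby
# Hypothetical helper, not part of the gem: builds the content of the
# second [] for one denotation.
def sketch_label(denotation, relations)
  parts = [denotation["id"], denotation["obj"]]
  # Assumption: at most one relation has this denotation as its subject.
  relation = relations.find { |r| r["subj"] == denotation["id"] }
  parts += [relation["pred"], relation["obj"]] if relation
  parts.join(", ")
end

denotations = [
  { "id" => "T1", "span" => { "begin" => 0, "end" => 9 }, "obj" => "Person" },
  { "id" => "T2", "span" => { "begin" => 29, "end" => 41 }, "obj" => "Organization" }
]
relations = [
  { "pred" => "member_of", "subj" => "T1", "obj" => "T2" }
]

labels = denotations.map { |d| sketch_label(d, relations) }
labels.each { |label| puts "[#{label}]" }
```

The subject denotation gets the 4-element form and every other denotation gets the 2-element form, matching the parsing rules.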