Class: Arx::Cleaner

Inherits:
Object
Defined in:
lib/arx/cleaner.rb

Overview

Class for cleaning strings and for extracting arXiv identifiers and version numbers from them.

Constant Summary

URL_PREFIX =

arXiv paper URL prefix format

/^(https?\:\/\/)?(www.)?arxiv\.org\/abs\//
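To make the pattern concrete, the sketch below tests a few sample strings against a copy of the constant. The constant is copied so the snippet runs without the arx gem, and the sample inputs are illustrative only.

```ruby
# Same pattern as Arx::Cleaner::URL_PREFIX, copied so this runs without the arx gem.
URL_PREFIX = /^(https?\:\/\/)?(www.)?arxiv\.org\/abs\//

# Illustrative inputs, chosen to exercise each optional group of the pattern.
samples = [
  'https://arxiv.org/abs/1705.01662',
  'http://www.arxiv.org/abs/1705.01662',
  'arxiv.org/abs/1705.01662',             # scheme and "www." are both optional
  'https://arxiv.org/pdf/1705.01662',     # /pdf/ links do not match, only /abs/
  'wwwXarxiv.org/abs/1705.01662'          # unescaped '.' in (www.) matches any character
]

# Print whether each sample starts with a recognised abstract-page prefix.
samples.each { |s| puts "#{URL_PREFIX.match?(s)}\t#{s}" }
```

As the last sample shows, the dot in (www.) is not escaped, so it matches any single character rather than only a literal period.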


Class Method Details

.clean(string) ⇒ String

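Cleans a string by replacing newline and carriage-return characters with spaces, stripping leading and trailing whitespace, and collapsing runs of spaces into a single space.

Parameters:

  • string (String)

    The string to clean.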

Returns:

  • (String)

    The cleaned string.


# File 'lib/arx/cleaner.rb', line 17

def clean(string)
  string.gsub(/\r\n|\r|\n/, ' ').strip.squeeze ' '
end
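The three String calls in .clean can be traced one at a time. The following sketch applies them to a made-up input in plain Ruby (no gem required) and prints each intermediate result.

```ruby
# Trace of the String calls used in Arx::Cleaner.clean, applied to a made-up input.
raw = "  Attention is\r\nall you  need.\n"

step1 = raw.gsub(/\r\n|\r|\n/, ' ') # each CRLF, CR or LF becomes a single space
step2 = step1.strip                 # leading and trailing whitespace removed
step3 = step2.squeeze(' ')          # runs of spaces collapsed to one space

p step1 # => "  Attention is all you  need. "
p step2 # => "Attention is all you  need."
p step3 # => "Attention is all you need."

# squeeze(' ') only collapses spaces, so other whitespace such as tabs survives.
p "a\t\tb".gsub(/\r\n|\r|\n/, ' ').strip.squeeze(' ') # => "a\t\tb"
```

Because \r\n is listed first in the alternation, a Windows line ending becomes one space rather than two.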

.extract_id(string, version: false) ⇒ String

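Attempts to extract an arXiv identifier from a string, such as an arXiv abstract URL. An abstract-URL prefix and any trailing slash are removed, and the version suffix (e.g. v2) is dropped unless version is true.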

Parameters:

  • string (String)

    The string to extract the ID from.

  • version (Boolean) (defaults to: false)

    Whether or not to include the paper’s version.

Returns:

  • (String)

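    The extracted ID.

Raises:

  • (ArgumentError)

    If no valid arXiv identifier can be extracted from string.

  • (TypeError)

    If string is not a String, or version is not true or false.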


# File 'lib/arx/cleaner.rb', line 26

def extract_id(string, version: false)
  # `!!version` coerces to true/false, so this holds only when version is already a boolean
  if version == !!version
    if string.is_a? String
      # Strip the abstract-URL prefix and any trailing slash, if the string is such a URL
      trimmed = /#{URL_PREFIX}.+\/?$/.match?(string) ? string.gsub(/(#{URL_PREFIX})|(\/$)/, '') : string
      raise ArgumentError.new("Couldn't extract arXiv identifier from: #{string}") unless Validate.id? trimmed
      # Keep the trailing "vN" suffix only when version: true was requested
      version ? trimmed : trimmed.sub(/v[0-9]+$/, '')
    else
      raise TypeError.new("Expected `string` to be a String, got: #{string.class}")
    end
  else
    raise TypeError.new("Expected `version` to be boolean (TrueClass or FalseClass), got: #{version.class}")
  end
end
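The trimming logic of .extract_id can be tried in isolation. The sketch below reproduces only the prefix, trailing-slash and version-suffix handling. It omits the Validate.id? check, which belongs to the arx gem, and trim_arxiv_url is a hypothetical helper name used for illustration.

```ruby
# Copy of Arx::Cleaner::URL_PREFIX so the sketch runs without the arx gem.
URL_PREFIX = /^(https?\:\/\/)?(www.)?arxiv\.org\/abs\//

# Hypothetical helper mirroring the trimming steps of extract_id.
# NOTE: no identifier validation is performed here (Validate.id? is omitted).
def trim_arxiv_url(string, version: false)
  trimmed =
    if /#{URL_PREFIX}.+\/?$/.match?(string)
      string.gsub(/(#{URL_PREFIX})|(\/$)/, '') # drop URL prefix and trailing slash
    else
      string                                   # not an abstract URL: leave as-is
    end
  version ? trimmed : trimmed.sub(/v[0-9]+$/, '') # drop "vN" suffix unless requested
end

puts trim_arxiv_url('https://arxiv.org/abs/1705.01662v1/')               # 1705.01662
puts trim_arxiv_url('https://arxiv.org/abs/1705.01662v1', version: true) # 1705.01662v1
puts trim_arxiv_url('arxiv.org/abs/cond-mat/9609089v2')                  # cond-mat/9609089
```

The slash inside an old-style identifier such as cond-mat/9609089 is kept, because the (\/$) alternative only removes a slash at the end of the string.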

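.extract_version(string) ⇒ Integer

Attempts to extract the version number from an arXiv identifier. Because the input is first passed through .extract_id, an arXiv abstract URL is also accepted.

Parameters:

  • string (String)

    The arXiv identifier to extract the version number from.

Returns:

  • (Integer)

    The extracted version number.

Raises:

  • (ArgumentError)

    If no valid arXiv identifier, or no version suffix such as v2, can be extracted.

  • (TypeError)

    If string is not a String.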


# File 'lib/arx/cleaner.rb', line 44

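def extract_version(string)
  # Reverse the full identifier (version included) so the version digits come first
  reversed = extract_id(string, version: true).reverse

  # A versioned ID ends in "v<digits>", so its reverse starts with "<digits>v"
  if /^[0-9]+v/.match? reversed
    # Take the digits before the first "v", restore their order and convert to Integer
    reversed.partition('v').first.reverse.to_i
  else
    raise ArgumentError.new("Couldn't extract version number from identifier: #{string}")
  end
end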
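The reverse-and-partition approach can be compared with a direct end-anchored regular expression. In the sketch below, both helper names are hypothetical, the input is assumed to be an already-validated identifier (the extract_id step from the arx gem is skipped), and version_by_regex is an alternative formulation, not the library's method.

```ruby
# Mirrors the reverse/partition steps of extract_version on an already-validated ID.
def version_by_reversal(id)
  reversed = id.reverse # e.g. "1705.01662v12" -> "21v26610.5071"
  raise ArgumentError, "no version in #{id}" unless /^[0-9]+v/.match?(reversed)
  reversed.partition('v').first.reverse.to_i # digits before the first 'v', un-reversed
end

# Hypothetical alternative: capture the trailing "v<digits>" directly.
def version_by_regex(id)
  match = /v([0-9]+)$/.match(id)
  raise ArgumentError, "no version in #{id}" unless match
  match[1].to_i
end

%w[1705.01662v12 cond-mat/9609089v2].each do |id|
  puts [version_by_reversal(id), version_by_regex(id)].inspect
end
# Prints:
# [12, 12]
# [2, 2]
```

For single-line identifiers both helpers return the same Integer. The reversal only inspects the characters up to the first v of the reversed string, so letters elsewhere in an old-style archive name do not interfere.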