Class: Prism::Source

Inherits:

Object


Defined in: lib/prism/parse_result.rb, ext/prism/extension.c

Overview

Represents a source of Ruby code that has been parsed. It is used together with locations, which rely on it to resolve line numbers and source ranges.

Direct Known Subclasses

ASCIISource

Instance Attribute Summary

#offsets ⇒ Object readonly

The list of newline byte offsets in the source code.
#source ⇒ Object readonly

The source code that this source object represents.
#start_line ⇒ Object readonly

The line number where this source starts.

Class Method Summary

.for(source, start_line = 1, offsets = []) ⇒ Object

Create a new source object with the given source code.

Instance Method Summary

#character_column(byte_offset) ⇒ Object

Return the column number in characters for the given byte offset.
#character_offset(byte_offset) ⇒ Object

Return the character offset for the given byte offset.
#code_units_cache(encoding) ⇒ Object

Generate a cache that targets a specific encoding for calculating code unit offsets.
#code_units_column(byte_offset, encoding) ⇒ Object

Returns the column number in code units for the given encoding for the given byte offset.
#code_units_offset(byte_offset, encoding) ⇒ Object

Returns the offset from the start of the file for the given byte offset counting in code units for the given encoding.
#column(byte_offset) ⇒ Object

Return the column number for the given byte offset.
#deep_freeze ⇒ Object

Freeze this object and the objects it contains.
#encoding ⇒ Object

Returns the encoding of the source code, which is set by parameters to the parser or by the encoding magic comment.
#initialize(source, start_line = 1, offsets = []) ⇒ Source constructor

Create a new source object with the given source code.
#line(byte_offset) ⇒ Object

Binary search through the offsets to find the line number for the given byte offset.
#line_end(byte_offset) ⇒ Object

Returns the byte offset of the end of the line corresponding to the given byte offset.
#line_start(byte_offset) ⇒ Object

Return the byte offset of the start of the line corresponding to the given byte offset.
#lines ⇒ Object

Returns the lines of the source code as an array of strings.
#replace_offsets(offsets) ⇒ Object

Replace the value of offsets with the given value.
#replace_start_line(start_line) ⇒ Object

Replace the value of start_line with the given value.
#slice(byte_offset, length) ⇒ Object

Perform a byteslice on the source code using the given byte offset and byte length.

Constructor Details

#initialize(source, start_line = 1, offsets = []) ⇒ `Source`

Create a new source object with the given source code.

# File 'lib/prism/parse_result.rb', line 46

def initialize(source, start_line = 1, offsets = [])
  @source = source
  @start_line = start_line # set after parsing is done
  @offsets = offsets # set after parsing is done
end

Instance Attribute Details

#offsets ⇒ `Object` (readonly)

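The list of newline byte offsets in the source code. #line_start and #line_end index into this list, so each entry is treated as the byte offset at which a line begins.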



# File 'lib/prism/parse_result.rb', line 43

def offsets
  @offsets
end

#source ⇒ `Object` (readonly)

The source code that this source object represents.



# File 'lib/prism/parse_result.rb', line 37

def source
  @source
end

#start_line ⇒ `Object` (readonly)

The line number where this source starts.



# File 'lib/prism/parse_result.rb', line 40

def start_line
  @start_line
end

Class Method Details

.for(source, start_line = 1, offsets = []) ⇒ `Object`

Create a new source object with the given source code. Use this method instead of `new`: it returns either a `Source` or a specialized, more performant `ASCIISource` when the source code contains no multibyte characters. A binary-encoded source whose bytes are not valid UTF-8 also falls back to `ASCIISource`.

# File 'lib/prism/parse_result.rb', line 13

def self.for(source, start_line = 1, offsets = [])
  if source.ascii_only?
    ASCIISource.new(source, start_line, offsets)
  elsif source.encoding == Encoding::BINARY
    source.force_encoding(Encoding::UTF_8)

    if source.valid_encoding?
      new(source, start_line, offsets)
    else
      # This is an extremely niche use case where the file is marked as
      # binary, contains multi-byte characters, and those characters are not
      # valid UTF-8. In this case we'll mark it as binary and fall back to
      # treating everything as a single-byte character. This _may_ cause
      # problems when asking for code units, but it appears to be the
      # cleanest solution at the moment.
      source.force_encoding(Encoding::BINARY)
      ASCIISource.new(source, start_line, offsets)
    end
  else
    new(source, start_line, offsets)
  end
end
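
The branch structure above can be tried out on plain strings. The following is a minimal sketch, not part of Prism: `source_kind` is a hypothetical helper that mirrors the conditions in `.for` and returns a symbol instead of constructing an object.

```ruby
# Hypothetical helper (not part of Prism) that mirrors the branches of
# Source.for and reports which class would be chosen.
def source_kind(source)
  source = source.dup # work on a copy; Source.for re-tags the string it is given

  if source.ascii_only?
    :ascii_source
  elsif source.encoding == Encoding::BINARY
    source.force_encoding(Encoding::UTF_8)
    # Valid UTF-8 bytes tagged as binary are treated as UTF-8; anything
    # else falls back to single-byte handling.
    source.valid_encoding? ? :source : :ascii_source
  else
    :source
  end
end

source_kind("foo = 1")        # => :ascii_source
source_kind("é = 1")          # => :source
source_kind("caf\xC3\xA9".b)  # => :source
source_kind("\xFF\xFE".b)     # => :ascii_source
```

Note that the real method calls `force_encoding` on the string it receives, so the caller's string may come back with a different encoding tag; the sketch duplicates the string to avoid that side effect.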

Instance Method Details

#character_column(byte_offset) ⇒ `Object`

Return the column number in characters for the given byte offset.



# File 'lib/prism/parse_result.rb', line 108

def character_column(byte_offset)
  character_offset(byte_offset) - character_offset(line_start(byte_offset))
end

#character_offset(byte_offset) ⇒ `Object`

Return the character offset for the given byte offset.



# File 'lib/prism/parse_result.rb', line 103

def character_offset(byte_offset)
  (source.byteslice(0, byte_offset) or raise).length
end
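
To illustrate how byte offsets and character offsets diverge once multibyte characters appear, here is a small standalone sketch. The free function `character_offset` and the hard-coded `line_start` value are assumptions made for the example; they stand in for the instance methods documented here.

```ruby
# Standalone version of the byteslice-then-length technique shown above.
def character_offset(source, byte_offset)
  source.byteslice(0, byte_offset).length
end

source = "é = 1\nñ = 22\n" # "é" and "ñ" each take 2 bytes in UTF-8
line_start = 7              # byte offset where the second line begins
byte_offset = 12            # byte offset of the first "2" on that line

byte_column = byte_offset - line_start
char_column = character_offset(source, byte_offset) -
              character_offset(source, line_start)

byte_column # => 5
char_column # => 4
```

The byte column counts the two bytes of "ñ" separately, while the character column counts it once.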

#code_units_cache(encoding) ⇒ `Object`

Generate a cache that targets a specific encoding for calculating code unit offsets.



# File 'lib/prism/parse_result.rb', line 136

def code_units_cache(encoding)
  CodeUnitsCache.new(source, encoding)
end

#code_units_column(byte_offset, encoding) ⇒ `Object`

Returns the column number in code units for the given encoding for the given byte offset.



# File 'lib/prism/parse_result.rb', line 142

def code_units_column(byte_offset, encoding)
  code_units_offset(byte_offset, encoding) - code_units_offset(line_start(byte_offset), encoding)
end

#code_units_offset(byte_offset, encoding) ⇒ `Object`

Returns the offset from the start of the file for the given byte offset counting in code units for the given encoding.

This method is tested with UTF-8, UTF-16, and UTF-32. For any other encoding the result is the character count of the re-encoded prefix; if that encoding has a notion of code units that differs from its character count, it is not captured here.

We purposefully replace invalid and undefined characters with replacement characters in this conversion. This happens for two reasons. First, it’s possible that the given byte offset will not occur on a character boundary. Second, it’s possible that the source code will contain a character that has no equivalent in the given encoding.

# File 'lib/prism/parse_result.rb', line 124

def code_units_offset(byte_offset, encoding)
  byteslice = (source.byteslice(0, byte_offset) or raise).encode(encoding, invalid: :replace, undef: :replace)

  if encoding == Encoding::UTF_16LE || encoding == Encoding::UTF_16BE
    byteslice.bytesize / 2
  else
    byteslice.length
  end
end
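
As an illustration of why the UTF-16 branch exists, the sketch below applies the same computation to a string containing a character outside the Basic Multilingual Plane. The free function takes the source string as an extra argument, which is an assumption made to keep the example self-contained.

```ruby
# Standalone version of the computation above, taking the source string
# as an explicit argument.
def code_units_offset(source, byte_offset, encoding)
  prefix = source.byteslice(0, byte_offset)
                 .encode(encoding, invalid: :replace, undef: :replace)

  if encoding == Encoding::UTF_16LE || encoding == Encoding::UTF_16BE
    prefix.bytesize / 2 # each UTF-16 code unit is 2 bytes
  else
    prefix.length
  end
end

source = "\u{1F600} = 1" # U+1F600 takes 4 bytes in UTF-8
byte_offset = 5          # byte offset of "="

code_units_offset(source, byte_offset, Encoding::UTF_8)    # => 2
code_units_offset(source, byte_offset, Encoding::UTF_16LE) # => 3
code_units_offset(source, byte_offset, Encoding::UTF_32LE) # => 2
```

U+1F600 is encoded as a surrogate pair in UTF-16, so it contributes two code units there but a single one in UTF-8 and UTF-32 as counted by this method.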

#column(byte_offset) ⇒ `Object`

Return the column number in bytes for the given byte offset, that is, its distance from the start of the line that contains it.



# File 'lib/prism/parse_result.rb', line 98

def column(byte_offset)
  byte_offset - line_start(byte_offset)
end

#deep_freeze ⇒ `Object`

Freeze this object and the objects it contains.

# File 'lib/prism/parse_result.rb', line 147

def deep_freeze
  source.freeze
  offsets.freeze
  freeze
end

#encoding ⇒ `Object`

Returns the encoding of the source code, which is set by parameters to the parser or by the encoding magic comment.



# File 'lib/prism/parse_result.rb', line 64

def encoding
  source.encoding
end

#line(byte_offset) ⇒ `Object`

Binary search through the offsets to find the line number for the given byte offset. The zero-based index found by the search is added to start_line.



# File 'lib/prism/parse_result.rb', line 81

def line(byte_offset)
  start_line + find_line(byte_offset)
end
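
`find_line` is called above but its body is not listed on this page. The following is a sketch of one way such a lookup could work, assuming `offsets` is a sorted array of line-start byte offsets beginning with 0. It uses `Array#bsearch_index` and is not necessarily how Prism implements it.

```ruby
# Hypothetical stand-in for the find_line helper: returns the zero-based
# index of the line that contains byte_offset.
def find_line(offsets, byte_offset)
  # Index of the first line that starts after byte_offset, if any.
  next_line = offsets.bsearch_index { |offset| offset > byte_offset }
  (next_line || offsets.length) - 1
end

def line(offsets, start_line, byte_offset)
  start_line + find_line(offsets, byte_offset)
end

source = "a = 1\nb = 2\nc = 3"

# Build the line-start offsets by hand: 0, then the byte after each "\n".
offsets = [0]
source.bytes.each_with_index { |byte, index| offsets << index + 1 if byte == 10 }

offsets              # => [0, 6, 12]
line(offsets, 1, 0)  # => 1
line(offsets, 1, 8)  # => 2
line(offsets, 1, 12) # => 3
line(offsets, 10, 8) # => 11
```

The last call shows the effect of a non-default `start_line`, for example when the parsed code is a fragment embedded at line 10 of a larger file.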

#line_end(byte_offset) ⇒ `Object`

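Returns the byte offset of the end of the line corresponding to the given byte offset. This is the next entry in offsets, or the byte size of the source when the offset falls on the last line.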



# File 'lib/prism/parse_result.rb', line 93

def line_end(byte_offset)
  offsets[find_line(byte_offset) + 1] || source.bytesize
end

#line_start(byte_offset) ⇒ `Object`

Return the byte offset of the start of the line corresponding to the given byte offset.



# File 'lib/prism/parse_result.rb', line 87

def line_start(byte_offset)
  offsets[find_line(byte_offset)]
end
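
Taken together, `line_start` and `line_end` delimit the bytes of one line. The sketch below shows this with a hypothetical `LineIndex` struct, again assuming `offsets` holds line-start byte offsets beginning with 0 and using an illustrative lookup in place of the unlisted `find_line`.

```ruby
# Hypothetical struct (not part of Prism) bundling a source string with
# its line-start offsets.
LineIndex = Struct.new(:source, :offsets) do
  def find_line(byte_offset)
    (offsets.bsearch_index { |offset| offset > byte_offset } || offsets.length) - 1
  end

  def line_start(byte_offset)
    offsets[find_line(byte_offset)]
  end

  def line_end(byte_offset)
    # Falls back to the source size on the last line.
    offsets[find_line(byte_offset) + 1] || source.bytesize
  end
end

index = LineIndex.new("a = 1\nbb = 2\nc", [0, 6, 13])

index.line_start(8)  # => 6
index.line_end(8)    # => 13
index.line_end(13)   # => 14

start = index.line_start(8)
index.source.byteslice(start, index.line_end(8) - start) # => "bb = 2\n"
```

Because the end of a line is taken from the start of the next one, the extracted range includes the trailing newline.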

#lines ⇒ `Object`

Returns the lines of the source code as an array of strings.



# File 'lib/prism/parse_result.rb', line 69

def lines
  source.lines
end

#replace_offsets(offsets) ⇒ `Object`

Replace the value of offsets with the given value.



# File 'lib/prism/parse_result.rb', line 58

def replace_offsets(offsets)
  @offsets.replace(offsets)
end
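
The use of `Array#replace` rather than reassignment means the existing array object is updated in place. A short sketch of the difference, using plain arrays with illustrative variable names:

```ruby
# In-place replacement: every reference to the array sees the new contents.
shared = []
other_reference = shared
shared.replace([0, 6, 12])

other_reference               # => [0, 6, 12]
other_reference.equal?(shared) # => true

# Reassignment: the old array is left untouched.
reassigned = []
stale_reference = reassigned
reassigned = [0, 6, 12]

stale_reference # => []
```

Any object that already holds a reference to the original offsets array therefore observes the updated values after `replace_offsets` is called.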

#replace_start_line(start_line) ⇒ `Object`

Replace the value of start_line with the given value.



# File 'lib/prism/parse_result.rb', line 53

def replace_start_line(start_line)
  @start_line = start_line
end

#slice(byte_offset, length) ⇒ `Object`

Perform a byteslice on the source code using the given byte offset and byte length.



# File 'lib/prism/parse_result.rb', line 75

def slice(byte_offset, length)
  source.byteslice(byte_offset, length) or raise
end
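
The `or raise` guard matters because `String#byteslice` returns `nil` when the starting offset lies beyond the end of the string. A minimal sketch of the behavior, with a hypothetical free function `slice_bytes` standing in for the method:

```ruby
# Standalone version of the byteslice-or-raise pattern shown above.
def slice_bytes(source, byte_offset, length)
  source.byteslice(byte_offset, length) or raise
end

source = "é = 1" # 6 bytes: "é" occupies bytes 0 and 1

slice_bytes(source, 3, 1)                 # => "="
slice_bytes(source, 0, 1).valid_encoding? # => false, the slice cuts "é" in half

begin
  slice_bytes(source, 10, 1) # offset past the end: byteslice returns nil
rescue RuntimeError => error
  error.class # => RuntimeError
end
```

Since offsets and lengths are in bytes, a slice that does not fall on character boundaries yields a string with invalid encoding rather than an error.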
