Class: DataFrame

Inherits:
Object
Includes:
ARFF
Defined in:
lib/data_frame/model.rb,
lib/data_frame/data_frame.rb

Overview

A DataFrame has named columns and, optionally, named rows. It supports calculations (usually on the columns) and can transpose the underlying matrix, storing the transposed matrix until the object is tainted.

Methods included from ARFF

#to_arff, #to_csv

Constructor Details

#initialize(*labels) ⇒ DataFrame

Returns a new instance of DataFrame.



# File 'lib/data_frame/data_frame.rb', line 85

def initialize(*labels)
  @labels = labels.map {|e| e.to_underscore_sym }
  @items = TransposableArray.new
end

Dynamic Method Handling

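This class handles dynamic methods through the method_missing method. A call that matches a column label returns that column, a call that matches a row label returns that row, and any other call is forwarded to the underlying items if they respond to it.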

#method_missing(sym, *args, &block) ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 137

def method_missing(sym, *args, &block)
  if self.labels.include?(sym)
    render_column(sym)
  elsif self.row_labels.include?(sym)
    render_row(sym)
  elsif @items.respond_to?(sym)
    @items.send(sym, *args, &block)
  else
    super
  end
end
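
The lookup order above can be tried out with a small stand-in class. This is a minimal sketch, not the gem's code: MiniFrame and its constructor are hypothetical, and the respond_to_missing? override is an addition that the source listing does not show, included so that respond_to? agrees with method_missing.

```ruby
# MiniFrame is a hypothetical stand-in for DataFrame, used only to
# illustrate the lookup order: column label, then row label, then the
# underlying items array.
class MiniFrame
  attr_reader :labels, :row_labels, :items

  def initialize(labels, row_labels, items)
    @labels = labels
    @row_labels = row_labels
    @items = items
  end

  def method_missing(sym, *args, &block)
    if labels.include?(sym)
      items.transpose[labels.index(sym)]   # column lookup
    elsif row_labels.include?(sym)
      items[row_labels.index(sym)]         # row lookup
    elsif items.respond_to?(sym)
      items.send(sym, *args, &block)       # delegate to the items array
    else
      super
    end
  end

  # Not in the original source: keeps respond_to? consistent with the above.
  def respond_to_missing?(sym, include_private = false)
    labels.include?(sym) || row_labels.include?(sym) ||
      items.respond_to?(sym) || super
  end
end

frame = MiniFrame.new([:x, :y], [:first, :second], [[1, 2], [3, 4]])
column_x   = frame.x        # the :x column
row_second = frame.second   # the row labelled :second
row_count  = frame.size     # delegated to the items array
```

Because column labels are checked first, a column named like an Array method (for example :size) would shadow the delegated method.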

Instance Attribute Details

#items ⇒ Object (readonly) Also known as: rows

The items stored in the frame



# File 'lib/data_frame/data_frame.rb', line 80

def items
  @items
end

#labels ⇒ Object (readonly) Also known as: variables

The labels of the data items



# File 'lib/data_frame/data_frame.rb', line 76

def labels
  @labels
end

#name ⇒ Object

An optional name, useful for ARFF files.



# File 'lib/data_frame/data_frame.rb', line 83

def name
  @name
end

Class Method Details

.from_csv(obj, opts = {}) ⇒ Object

This is the neatest part of this neat gem. DataFrame.from_csv can be called in several ways:

  DataFrame.from_csv(csv_contents)
  DataFrame.from_csv(filename)
  DataFrame.from_csv(url)

If you need to define converters for FasterCSV, do it before calling this method:

  FasterCSV::Converters[:special] = lambda{|f| f == 'foo' ? 'bar' : 'foo'}
  DataFrame.from_csv('example.com/my_special_url.csv', :converters => :special)

This returns 'bar' where 'foo' was found and 'foo' everywhere else. The method returns nil if the labels and table cannot be inferred from the input.



# File 'lib/data_frame/data_frame.rb', line 19

def from_csv(obj, opts={})
  labels, table = infer_csv_contents(obj, opts)
  name = infer_name_from_contents(obj, opts)
  return nil unless labels and table
  df = new(*labels)
  df.import(table)
  df.name = name
  df
end
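
The converter mechanism described above can be sketched with Ruby's CSV library, which replaced FasterCSV from Ruby 1.9 onward. This is an illustration under assumptions, not the gem's implementation: parse_csv_contents is a hypothetical helper (the gem's own infer_csv_contents is not shown on this page), and it handles only CSV text, not filenames or URLs.

```ruby
require 'csv'

# Register a custom converter under the name :special, mirroring the
# FasterCSV example in the method description.
CSV::Converters[:special] = lambda { |f| f == 'foo' ? 'bar' : 'foo' }

# Hypothetical helper: splits CSV text into column labels and a table
# (an array of row arrays). Field converters are applied to the data
# rows only, not to the header row.
def parse_csv_contents(contents, opts = {})
  options = { headers: true }.merge(opts)
  parsed = CSV.parse(contents, **options)
  [parsed.headers, parsed.map(&:fields)]
end

labels, table = parse_csv_contents("a,b\nfoo,x\ny,foo\n", converters: :special)
```

Registering the converter by name before parsing is what lets the :converters option refer to it as :special.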

Instance Method Details

#add_item(item) ⇒ Object Also known as: add



# File 'lib/data_frame/data_frame.rb', line 90

def add_item(item)
  self.items << item
end

#append!(column_name, value = nil) ⇒ Object

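Adds a new, uniquely named column to the table. If value is an Array, its elements are assigned to the rows in order; otherwise every row receives the same value.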

Raises:

  • (ArgumentError)


# File 'lib/data_frame/data_frame.rb', line 283

def append!(column_name, value=nil)
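  # Normalize first so that a string name is compared against the stored symbols.
  column_name = column_name.to_underscore_sym
  raise ArgumentError, "Can't have duplicate column names" if self.labels.include?(column_name)
  self.labels << column_name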
  if value.is_a?(Array)
    self.items.each_with_index do |item, i|
      item << value[i]
    end
  else
    self.items.each do |item|
      item << value
    end
  end
  # Because we are tainting the sub arrays, the TaintableArray doesn't know it's been changed.
  self.items.taint
end
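
The two ways of supplying values can be sketched on plain arrays. This is a hedged illustration: append_column! is a hypothetical stand-in, and it uses to_sym where the gem uses its own to_underscore_sym helper.

```ruby
# Hypothetical stand-in for the column-append logic, working on plain arrays.
# labels: array of symbols; rows: array of row arrays (mutated in place).
def append_column!(labels, rows, column_name, value = nil)
  name = column_name.to_sym
  raise ArgumentError, "Can't have duplicate column names" if labels.include?(name)
  labels << name
  rows.each_with_index do |row, i|
    # An Array supplies one value per row; anything else is repeated.
    row << (value.is_a?(Array) ? value[i] : value)
  end
  labels
end

labels = [:x]
rows = [[1], [2]]
append_column!(labels, rows, :y, [10, 20])   # per-row values
append_column!(labels, rows, :flag, true)    # same value for every row
```

If the Array is shorter than the number of rows, the remaining rows receive nil, since value[i] is nil past the end of the Array.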

#columns(reset = false) ⇒ Object Also known as: to_hash, to_dictionary

Returns the columns as a Dictionary (if that class is defined) or a Hash, keyed by label. The result is cached; call columns(true) to reset the cache.



# File 'lib/data_frame/data_frame.rb', line 115

def columns(reset=false)
  @columns = nil if reset
  return @columns if @columns
  
  container = defined?(Dictionary) ? Dictionary.new : Hash.new
  i = 0
  
  @columns = @items.transpose.inject(container) do |cont, col|
    cont[@labels[i]] = col
    i += 1
    cont
  end
end
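
The label-to-column mapping built above can be sketched without the cache or the Dictionary fallback. columns_hash is a hypothetical helper operating on plain arrays, not part of the gem.

```ruby
# Hypothetical stand-in: pairs each label with the matching column of the
# transposed row matrix and returns the result as a Hash.
def columns_hash(labels, rows)
  labels.zip(rows.transpose).to_h
end

cols = columns_hash([:x, :y], [[1, 2], [3, 4], [5, 6]])
x_values = cols[:x]
y_values = cols[:y]
```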

#drop!(*labels) ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 149

def drop!(*labels)
  labels.each do |label|
    drop_one!(label)
  end
  self
end

#filter(as = Array, &block) ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 211

def filter(as=Array, &block)
  new_data_frame = self.clone
  new_data_frame.filter!(as, &block)
end

#filter!(as = Array, &block) ⇒ Object

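Takes a block to evaluate on each row and keeps only the rows for which the block returns a true value. The row can be converted into an OpenStruct or a Hash to make the filter easier to write. Note: don't try this with a Hash or OpenStruct unless you have facets available. Modifies the frame in place and returns self.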



# File 'lib/data_frame/data_frame.rb', line 200

def filter!(as=Array, &block)
  as = infer_class(as)
  items = []
  self.items.each do |row|
    value = block.call(cast_row(row, as))
    items << row if value
  end
  @items = items.dup
  self
end
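
The effect of the as argument can be sketched on plain arrays. This is an assumption-laden illustration: filter_rows is a hypothetical helper, and the gem's infer_class and cast_row (not shown on this page) may convert rows differently.

```ruby
# Hypothetical stand-in for filter!: keeps the rows for which the block is
# truthy. With as = Hash, each row is handed to the block as a
# label => value Hash; otherwise the block receives the raw row Array.
def filter_rows(labels, rows, as = Array)
  rows.select do |row|
    candidate = as == Hash ? labels.zip(row).to_h : row
    yield candidate
  end
end

labels = [:name, :age]
rows = [["ann", 31], ["bob", 17], ["cy", 45]]
adults   = filter_rows(labels, rows, Hash) { |row| row[:age] >= 18 }
under_40 = filter_rows(labels, rows) { |row| row[1] < 40 }
```

The Hash form lets the block refer to columns by label rather than by position, which is the convenience the method description refers to.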

#filter_by_category(hash) ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 299

def filter_by_category(hash)
  new_data_frame = self.dup
  hash.each do |key, value|
    key = key.to_underscore_sym
    next unless self.labels.include?(key)
    value = [value] unless value.is_a?(Array) or value.is_a?(Range)
    new_data_frame.filter!(:hash) {|row| value.include?(row[key])}
  end
  new_data_frame
end
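
The matching rule in the listing above (single value, Array of values, or Range, with unknown labels skipped) can be sketched on plain arrays. filter_rows_by_category is a hypothetical helper, and to_sym stands in for the gem's to_underscore_sym.

```ruby
# Hypothetical stand-in: a row is kept only if it matches every known key
# in the criteria hash. Values may be a single item, an Array of allowed
# items, or a Range.
def filter_rows_by_category(labels, rows, criteria)
  criteria.inject(rows) do |kept, (key, allowed)|
    index = labels.index(key.to_sym)
    next kept unless index                      # unknown labels are skipped
    allowed = [allowed] unless allowed.is_a?(Array) || allowed.is_a?(Range)
    kept.select { |row| allowed.include?(row[index]) }
  end
end

labels = [:color, :size]
rows = [["red", 1], ["blue", 5], ["red", 9]]
small_red = filter_rows_by_category(labels, rows, { :color => "red", :size => (1..5) })
unchanged = filter_rows_by_category(labels, rows, { :nope => 1 })
```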

#filter_by_category!(hash) ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 310

def filter_by_category!(hash)
  hash.each do |key, value|
    key = key.to_underscore_sym
    next unless self.labels.include?(key)
    value = [value] unless value.is_a?(Array) or value.is_a?(Range)
    self.filter!(:hash) {|row| value.include?(row[key])}
  end
end

#import(rows) ⇒ Object

Loads a batch of rows. Expects an array of arrays; with anything else, the contents of the frame are unpredictable.



# File 'lib/data_frame/data_frame.rb', line 65

def import(rows)
  rows.each do |row|
    self.add_item(row)
  end
end

#inspect ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 71

def inspect
  "DataFrame rows: #{self.rows.size} labels: #{self.labels.inspect}"
end

#j_binary_ize!(*columns) ⇒ Object

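A weird name. For each given column, this creates one new column per category, named column_category, and marks each row true or false according to whether its value falls in that category. Pass :allow_overlap => true to let a row count toward every category it belongs to.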



# File 'lib/data_frame/data_frame.rb', line 260

def j_binary_ize!(*columns)
  # Allows to mix a hash with the columns.
  options = columns.find_all {|e| e.is_a?(Hash)}.inject({}) {|h, e| h.merge!(e)}
  columns.delete_if {|e| e.is_a?(Hash)}
  
  # Generates new columns
  columns.each do |col|
    values = render_column(col.to_underscore_sym)
    values.categories.each do |category|
      full_name = (col.to_s + "_" + category.to_s).to_sym
      if options[:allow_overlap]
        category_map = values.inject([]) do |list, e|
          list << values.all_categories(e)
        end
        self.append!(full_name, category_map.map{|e| e.include?(category)})
      else
        self.append!(full_name, values.category_map.map{|e| e == category})
      end
    end
  end
end
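
The non-overlapping branch can be sketched on a plain array. This is a simplification under a stated assumption: binary_columns is a hypothetical helper that treats each distinct value as its own category, whereas the categories and category_map methods used in the listing come from code not shown on this page and may group values differently.

```ruby
# Hypothetical stand-in for the non-overlapping case: one boolean column per
# distinct value, named "<column>_<value>".
def binary_columns(column_name, values)
  values.uniq.each_with_object({}) do |category, out|
    out["#{column_name}_#{category}".to_sym] = values.map { |v| v == category }
  end
end

dummies = binary_columns(:color, ["red", "blue", "red"])
red_flags  = dummies[:color_red]
blue_flags = dummies[:color_blue]
```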

#model(name = nil, &block) ⇒ Object

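Returns the model if one is already defined under this name. Otherwise, if a block is given, defines a model with the block and stores it in the models container, which gives us access like: df.models.new_model_name… Returns false if the model is not defined and no block is given.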



# File 'lib/data_frame/model.rb', line 8

def model(name=nil, &block)
  return self.models[name] if self.models.table.keys.include?(name)
  return false unless block
  @pc = ParameterCapture.new(&block)
  model = self.filter(Hash) do |row|
    @pc.filter(row)
  end
  self.models.table[name] = model
end

#models ⇒ Object



# File 'lib/data_frame/model.rb', line 18

def models
  @models ||= OpenStruct.new
end

#render_column(sym) ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 104

def render_column(sym)
  i = @labels.index(sym)
  return nil unless i
  @items.transpose[i]
end

#render_row(sym) ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 131

def render_row(sym)
  i = self.row_labels.index(sym)
  return nil unless i
  @items[i]
end

#replace!(column, values = nil, &block) ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 167

def replace!(column, values=nil, &block)
  column = validate_column(column)
  if not values
    values = self.send(column)
    values.map! {|e| block.call(e)}
  end
  replace_column(column, values)
  self
end

#row_labels ⇒ Object



# File 'lib/data_frame/data_frame.rb', line 95

def row_labels
  @row_labels ||= []
end

#row_labels=(ary) ⇒ Object

Raises:

  • (ArgumentError)


# File 'lib/data_frame/data_frame.rb', line 99

def row_labels=(ary)
  raise ArgumentError, "Row labels must be an array" unless ary.is_a?(Array)
  @row_labels = ary
end

#subset_from_columns(*cols) ⇒ Object

Creates a new data frame containing only the specified columns, in their original order.



# File 'lib/data_frame/data_frame.rb', line 245

def subset_from_columns(*cols)
  new_labels = self.labels.inject([]) do |list, label|
    list << label if cols.include?(label)
    list
  end
  new_data_frame = DataFrame.new(*self.labels)
  new_data_frame.import(self.items)
  self.labels.each do |label|
    new_data_frame.drop!(label) unless new_labels.include?(label)
  end
  new_data_frame
end
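
The result of subsetting can be sketched on plain arrays. subset_columns is a hypothetical helper: instead of copying the frame and dropping the unwanted columns, as the listing does, it selects the wanted positions directly, which should give the same labels and rows.

```ruby
# Hypothetical stand-in: picks the requested columns out of each row,
# keeping the original column order regardless of the argument order.
def subset_columns(labels, rows, *cols)
  keep = labels.each_index.select { |i| cols.include?(labels[i]) }
  [keep.map { |i| labels[i] }, rows.map { |row| row.values_at(*keep) }]
end

new_labels, new_rows = subset_columns([:x, :y, :z], [[1, 2, 3], [4, 5, 6]], :z, :x)
```

Selecting by position builds new row arrays, so the rows of the source data are left untouched.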