Class: DataFrame
- Inherits: Object
- Includes: ARFF
- Defined in: lib/data_frame/model.rb, lib/data_frame/data_frame.rb
Overview
This allows me to have named columns and optionally named rows in a data frame, to run calculations (usually on the columns), and to transpose the matrix, storing the transposed matrix until the object is tainted (marked as changed).
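The caching idea can be sketched in plain Ruby. This is a minimal illustration under stated assumptions, not the gem's TransposableArray: because Object#taint was deprecated in Ruby 2.7 and removed in Ruby 3.2, the sketch uses an explicit invalidation method instead. `CachedRows` and `mark_changed!` are hypothetical names.

```ruby
# Minimal sketch: an array of rows that caches its transpose until it is
# marked as changed. `CachedRows` and `mark_changed!` are hypothetical names,
# standing in for the gem's taint-based TransposableArray.
class CachedRows
  attr_reader :rows

  def initialize(rows = [])
    @rows = rows
    @transposed = nil
  end

  # Appending a row invalidates the cached transpose.
  def <<(row)
    @rows << row
    mark_changed!
    self
  end

  # Call this after mutating rows in place (e.g. appending to a sub-array).
  def mark_changed!
    @transposed = nil
  end

  # Returns the cached transpose, recomputing it only when needed.
  def transpose
    @transposed ||= @rows.transpose
  end
end

data = CachedRows.new([[1, 2], [3, 4]])
p data.transpose  # => [[1, 3], [2, 4]]
data << [5, 6]
p data.transpose  # => [[1, 3, 5], [2, 4, 6]]
```

Repeated calls to `transpose` return the same cached array until a change is recorded.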
Instance Attribute Summary
- #items ⇒ Object (also: #rows) readonly
  The items stored in the frame.
- #labels ⇒ Object (also: #variables) readonly
  The labels of the data items.
- #name ⇒ Object
  An optional name, useful for arff files.
Class Method Summary
- .from_csv(obj, opts = {}) ⇒ Object
  Builds a DataFrame from CSV contents, a filename, or a URL. This is the neatest part of this neat gem.
Instance Method Summary
- #add_item(item) ⇒ Object (also: #add)
- #append!(column_name, value = nil) ⇒ Object
  Adds a unique column to the table.
- #columns(reset = false) ⇒ Object (also: #to_hash, #to_dictionary)
  The columns as a Dictionary or Hash. This is cached; call columns(true) to reset the cache.
- #drop!(*labels) ⇒ Object
- #filter(as = Array, &block) ⇒ Object
- #filter!(as = Array, &block) ⇒ Object
  Takes a block to evaluate on each row.
- #filter_by_category(hash) ⇒ Object
- #filter_by_category!(hash) ⇒ Object
- #import(rows) ⇒ Object
  Loads a batch of rows.
- #initialize(*labels) ⇒ DataFrame (constructor)
  A new instance of DataFrame.
- #inspect ⇒ Object
- #j_binary_ize!(*columns) ⇒ Object
  A weird name. Creates a boolean column for every category found in the given columns.
- #method_missing(sym, *args, &block) ⇒ Object
- #model(name = nil, &block) ⇒ Object
  Returns a model if defined. Defines a model with a block, if given and not defined. Stores the model in the models container, which gives us access like: df.models.new_model_name…
- #models ⇒ Object
- #render_column(sym) ⇒ Object
- #render_row(sym) ⇒ Object
- #replace!(column, values = nil, &block) ⇒ Object
- #row_labels ⇒ Object
- #row_labels=(ary) ⇒ Object
- #subset_from_columns(*cols) ⇒ Object
  Creates a new data frame, only with the specified columns.
Methods included from ARFF
Constructor Details
#initialize(*labels) ⇒ DataFrame
Returns a new instance of DataFrame.
# File 'lib/data_frame/data_frame.rb', line 85
def initialize(*labels)
  @labels = labels.map {|e| e.to_underscore_sym }
  @items = TransposableArray.new
end
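The constructor normalizes every label with to_underscore_sym, an extension the gem defines elsewhere and whose exact rules are not shown here. The following is a hypothetical stand-in (`underscore_sym` is an illustrative name) that assumes labels such as "First Name" or "lastName" should become `:first_name` and `:last_name`.

```ruby
# Hypothetical stand-in for the gem's to_underscore_sym extension.
# Assumption: camelCase boundaries and runs of non-alphanumerics become "_".
def underscore_sym(label)
  s = label.to_s.strip
  s = s.gsub(/([a-z\d])([A-Z])/, '\1_\2') # camelCase -> camel_Case
  s = s.gsub(/[^A-Za-z0-9]+/, '_')        # spaces/punctuation -> "_"
  s = s.gsub(/\A_+|_+\z/, '')             # trim leading/trailing "_"
  s.downcase.to_sym
end

labels = ["First Name", "lastName", :age].map { |e| underscore_sym(e) }
p labels  # => [:first_name, :last_name, :age]
```

Normalizing up front is what lets the dynamic accessors below (for example `df.first_name`) work with plain Ruby method names.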
Dynamic Method Handling
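This class handles dynamic methods through the method_missing method: a column label returns that column, a row label returns that row, and any other method the underlying items array responds to is delegated to it.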
#method_missing(sym, *args, &block) ⇒ Object
# File 'lib/data_frame/data_frame.rb', line 137
def method_missing(sym, *args, &block)
  if self.labels.include?(sym)
    render_column(sym)
  elsif self.row_labels.include?(sym)
    render_row(sym)
  elsif @items.respond_to?(sym)
    @items.send(sym, *args, &block)
  else
    super
  end
end
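The dispatch order above can be reproduced in a self-contained sketch. `MiniFrame` is a hypothetical class using plain arrays of rows and symbol labels; row labels are left out for brevity, and unlike the listing it also defines respond_to_missing? so that respond_to? agrees with method_missing.

```ruby
# Minimal sketch of label-based dynamic access, assuming rows are arrays
# and labels are symbols. `MiniFrame` is a hypothetical name.
class MiniFrame
  attr_reader :labels, :items

  def initialize(labels, items)
    @labels = labels
    @items = items
  end

  # frame.age returns the :age column, mirroring render_column.
  def method_missing(sym, *args, &block)
    i = @labels.index(sym)
    return @items.transpose[i] if i
    # Fall back to the rows array, as the original delegates to @items.
    return @items.send(sym, *args, &block) if @items.respond_to?(sym)
    super
  end

  # Keeps respond_to? consistent with method_missing.
  def respond_to_missing?(sym, include_private = false)
    @labels.include?(sym) || @items.respond_to?(sym, include_private) || super
  end
end

frame = MiniFrame.new([:name, :age], [["ann", 30], ["bob", 40]])
p frame.age   # => [30, 40]
p frame.size  # => 2 (delegated to the rows array)
```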
Instance Attribute Details
#items ⇒ Object (readonly) Also known as: rows
The items stored in the frame
# File 'lib/data_frame/data_frame.rb', line 80
def items
  @items
end
#labels ⇒ Object (readonly) Also known as: variables
The labels of the data items
# File 'lib/data_frame/data_frame.rb', line 76
def labels
  @labels
end
#name ⇒ Object
An optional name, useful for arff files
# File 'lib/data_frame/data_frame.rb', line 83
def name
  @name
end
Class Method Details
.from_csv(obj, opts = {}) ⇒ Object
This is the neatest part of this neat gem. DataFrame.from_csv can be called in a lot of ways:

  DataFrame.from_csv(csv_contents)
  DataFrame.from_csv(filename)
  DataFrame.from_csv(url)

If you need to define converters for FasterCSV, do it before calling this method:

  FasterCSV::Converters[:special] = lambda{|f| f == 'foo' ? 'bar' : 'foo'}
  DataFrame.from_csv('example.com/my_special_url.csv', :converters => :special)

This returns 'bar' where 'foo' was found and 'foo' everywhere else.
# File 'lib/data_frame/data_frame.rb', line 19
def from_csv(obj, opts={})
  labels, table = infer_csv_contents(obj, opts)
  name = infer_name_from_contents(obj, opts)
  return nil unless labels and table
  df = new(*labels)
  df.import(table)
  df.name = name
  df
end
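from_csv delegates parsing to helpers (infer_csv_contents, infer_name_from_contents) that are not shown here. As a rough sketch of the core step, assuming the first CSV row holds the labels, Ruby's standard csv library (FasterCSV became Ruby's CSV in 1.9) can split contents into labels and rows. `split_csv` is a hypothetical helper, and the converter is passed directly as a lambda rather than registered by name.

```ruby
require 'csv'

# Hypothetical helper: splits CSV text into [labels, rows], treating the
# first row as the header. Numeric-looking fields are converted.
def split_csv(text)
  table = CSV.parse(text, converters: :numeric)
  return nil if table.empty?
  [table.first, table.drop(1)]
end

labels, rows = split_csv("name,age\nann,30\nbob,40\n")
p labels  # => ["name", "age"]
p rows    # => [["ann", 30], ["bob", 40]]

# The 'foo'/'bar' converter from the description, as an inline lambda.
special = ->(f) { f == 'foo' ? 'bar' : 'foo' }
p CSV.parse("foo,baz\n", converters: [special])  # => [["bar", "foo"]]
```

The labels and rows would then feed `new(*labels)` and `import`, as in the listing above.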
Instance Method Details
#add_item(item) ⇒ Object Also known as: add
# File 'lib/data_frame/data_frame.rb', line 90
def add_item(item)
  self.items << item
end
#append!(column_name, value = nil) ⇒ Object
Adds a unique column to the table
# File 'lib/data_frame/data_frame.rb', line 283
def append!(column_name, value=nil)
  raise ArgumentError, "Can't have duplicate column names" if self.labels.include?(column_name)
  self.labels << column_name.to_underscore_sym
  if value.is_a?(Array)
    self.items.each_with_index do |item, i|
      item << value[i]
    end
  else
    self.items.each do |item|
      item << value
    end
  end
  # Because we are tainting the sub arrays, the TaintableArray doesn't know it's been changed.
  self.items.taint
end
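The two branches of append! (one value per row versus one scalar for every row) can be sketched on plain arrays. `append_column` is a hypothetical name; labels are assumed to be symbols already, and cache invalidation is left out.

```ruby
# Sketch of append! on plain arrays: `labels` is an array of symbols and
# `items` an array of row arrays. `append_column` is a hypothetical name.
def append_column(labels, items, name, value = nil)
  raise ArgumentError, "Can't have duplicate column names" if labels.include?(name)
  labels << name
  if value.is_a?(Array)
    items.each_with_index { |row, i| row << value[i] }  # one value per row
  else
    items.each { |row| row << value }                   # same scalar for every row
  end
  items
end

labels = [:name]
items = [["ann"], ["bob"]]
append_column(labels, items, :age, [30, 40])
append_column(labels, items, :active, true)
p items  # => [["ann", 30, true], ["bob", 40, true]]
```

As in the listing, an array shorter than the number of rows silently pads with nil, because `value[i]` returns nil past the end.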
#columns(reset = false) ⇒ Object Also known as: to_hash, to_dictionary
The columns as a Dictionary or Hash. This is cached; call columns(true) to reset the cache.
# File 'lib/data_frame/data_frame.rb', line 115
def columns(reset=false)
  @columns = nil if reset
  return @columns if @columns

  container = defined?(Dictionary) ? Dictionary.new : Hash.new
  i = 0

  @columns = @items.transpose.inject(container) do |cont, col|
    cont[@labels[i]] = col
    i += 1
    cont
  end
end
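The label-to-column mapping can be written compactly with zip and transpose. This is a sketch on plain arrays (`columns_of` is a hypothetical name) and it omits the caching. Since Ruby 1.9 a plain Hash keeps insertion order; the listing prefers Facets' Dictionary when it is defined, presumably for ordering on older Rubies.

```ruby
# Sketch: build label => column pairs by transposing the rows.
# A plain Hash preserves insertion order, so columns stay in label order.
def columns_of(labels, items)
  labels.zip(items.transpose).to_h
end

cols = columns_of([:name, :age], [["ann", 30], ["bob", 40]])
p cols.keys  # => [:name, :age]
p cols[:age] # => [30, 40]
```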
#drop!(*labels) ⇒ Object
# File 'lib/data_frame/data_frame.rb', line 149
def drop!(*labels)
  labels.each do |label|
    drop_one!(label)
  end
  self
end
#filter(as = Array, &block) ⇒ Object
# File 'lib/data_frame/data_frame.rb', line 211
def filter(as=Array, &block)
  new_data_frame = self.clone
  new_data_frame.filter!(as, &block)
end
#filter!(as = Array, &block) ⇒ Object
Takes a block to evaluate on each row. The row can be converted into an OpenStruct or a Hash to make filter conditions easier to write. Note: don't try this with a Hash or OpenStruct unless you have Facets available.
# File 'lib/data_frame/data_frame.rb', line 200
def filter!(as=Array, &block)
  as = infer_class(as)
  items = []
  self.items.each do |row|
    value = block.call(cast_row(row, as))
    items << row if value
  end
  @items = items.dup
  self
end
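The key idea in filter! is that the block sees a converted copy of each row while the original row is what gets kept. A sketch under the assumption that `as` is either `:hash` or `Array`; `cast_row` here is a hypothetical stand-in for the gem's private helper, and `filter_rows` returns a new array rather than replacing `@items`.

```ruby
# Sketch of filter!'s row casting, assuming `as` is :hash or Array.
# cast_row is a hypothetical stand-in for the gem's private helper.
def cast_row(labels, row, as)
  as == :hash ? labels.zip(row).to_h : row
end

# Keeps the original rows whose cast form satisfies the block.
def filter_rows(labels, items, as = Array)
  items.select { |row| yield cast_row(labels, row, as) }
end

labels = [:name, :age]
items = [["ann", 30], ["bob", 40], ["cy", 25]]
adults = filter_rows(labels, items, :hash) { |r| r[:age] >= 30 }
p adults  # => [["ann", 30], ["bob", 40]]
```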
#filter_by_category(hash) ⇒ Object
# File 'lib/data_frame/data_frame.rb', line 299
def filter_by_category(hash)
  new_data_frame = self.dup
  hash.each do |key, value|
    key = key.to_underscore_sym
    next unless self.labels.include?(key)
    value = [value] unless value.is_a?(Array) or value.is_a?(Range)
    new_data_frame.filter!(:hash) {|row| value.include?(row[key])}
  end
  new_data_frame
end
#filter_by_category!(hash) ⇒ Object
# File 'lib/data_frame/data_frame.rb', line 310
def filter_by_category!(hash)
  hash.each do |key, value|
    key = key.to_underscore_sym
    next unless self.labels.include?(key)
    value = [value] unless value.is_a?(Array) or value.is_a?(Range)
    self.filter!(:hash) {|row| value.include?(row[key])}
  end
end
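Both category filters rely on one trick: a scalar criterion is wrapped in an array, while Arrays and Ranges are used as-is, so `include?` works for all three. A sketch on rows that are already hashes (the gem casts array rows with `:hash`); `category_match?` is a hypothetical name, and applying every criterion with `all?` matches the listing's successive `filter!` calls.

```ruby
# Sketch of the category test used by filter_by_category.
def category_match?(value, cell)
  value = [value] unless value.is_a?(Array) || value.is_a?(Range)
  value.include?(cell)
end

rows = [{ age: 25, city: "Oslo" }, { age: 35, city: "Rome" }, { age: 45, city: "Oslo" }]
criteria = { age: 30..50, city: "Oslo" }
matches = rows.select { |r| criteria.all? { |k, v| category_match?(v, r[k]) } }
p matches.map { |r| r[:age] }  # => [45]
```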
#import(rows) ⇒ Object
Loads a batch of rows. Expects an array of arrays (one array per row); anything else gives unpredictable results.
# File 'lib/data_frame/data_frame.rb', line 65
def import(rows)
  rows.each do |row|
    self.add_item(row)
  end
end
#inspect ⇒ Object
# File 'lib/data_frame/data_frame.rb', line 71
def inspect
  "DataFrame rows: #{self.rows.size} labels: #{self.labels.inspect}"
end
#j_binary_ize!(*columns) ⇒ Object
A weird name. This creates a boolean column for every category in each given column and marks each row by whether its value matches that category.
# File 'lib/data_frame/data_frame.rb', line 260
def j_binary_ize!(*columns)
  # Allows to mix a hash with the columns.
   = columns.find_all {|e| e.is_a?(Hash)}.inject({}) {|h, e| h.merge!(e)}
  columns.delete_if {|e| e.is_a?(Hash)}

  # Generates new columns
  columns.each do |col|
    values = render_column(col.to_underscore_sym)
    values.categories.each do |category|
      full_name = (col.to_s + "_" + category.to_s).to_sym
      if [:allow_overlap]
        category_map = values.inject([]) do |list, e|
          list << values.all_categories(e)
        end
        self.append!(full_name, category_map.map{|e| e.include?(category)})
      else
        self.append!(full_name, values.category_map.map{|e| e == category})
      end
    end
  end
end
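Without the overlap option, j_binary_ize! amounts to one-hot encoding: one boolean column named `<column>_<category>` per distinct value. A sketch on plain arrays; `one_hot` is a hypothetical name, categories are taken in order of first appearance (an assumption, since the gem's `categories` helper is not shown), and the `:allow_overlap` path is not modeled.

```ruby
# Sketch of one-hot ("binary-ized") columns for a single categorical column.
def one_hot(labels, items, column)
  i = labels.index(column) or raise ArgumentError, "unknown column #{column}"
  values = items.map { |row| row[i] }
  values.uniq.each do |category|
    labels << :"#{column}_#{category}"
    # Mark each row true if its value equals this category.
    items.each_with_index { |row, r| row << (values[r] == category) }
  end
  [labels, items]
end

labels = [:color]
items = [["red"], ["blue"], ["red"]]
one_hot(labels, items, :color)
p labels  # => [:color, :color_red, :color_blue]
p items   # => [["red", true, false], ["blue", false, true], ["red", true, false]]
```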
#model(name = nil, &block) ⇒ Object
Returns a model if defined. Defines a model with a block, if given and not defined. Stores the model in the models container, which gives us access like: df.models.new_model_name…
# File 'lib/data_frame/model.rb', line 8
def model(name=nil, &block)
  return self.models[name] if self.models.table.keys.include?(name)
  return false unless block
  @pc = ParameterCapture.new(&block)
  model = self.filter(Hash) do |row|
    @pc.filter(row)
  end
  self.models.table[name] = model
end
#models ⇒ Object
# File 'lib/data_frame/model.rb', line 18
def models
  @models ||= OpenStruct.new
end
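The model/models pair stores named row subsets in an OpenStruct so they read back as attributes. A minimal sketch under the assumption that a plain filtering block is enough: the gem's ParameterCapture DSL is not reproduced, and `ModelStore` is a hypothetical class over rows that are already hashes.

```ruby
require 'ostruct'

# Sketch of the models container: named subsets stored in an OpenStruct so
# they can be read back as store.models.<name>.
class ModelStore
  def initialize(rows)
    @rows = rows
  end

  def models
    @models ||= OpenStruct.new
  end

  # Returns an existing model, or defines one from the block if given.
  def model(name, &block)
    return models[name] if models.to_h.key?(name)
    return false unless block
    models[name] = @rows.select(&block)
  end
end

store = ModelStore.new([{ age: 25 }, { age: 35 }])
store.model(:adults) { |r| r[:age] >= 30 }
p store.models.adults.size  # => 1
```

A second call with the same name returns the stored subset and ignores any new block, mirroring the early return in the listing.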
#render_column(sym) ⇒ Object
# File 'lib/data_frame/data_frame.rb', line 104
def render_column(sym)
  i = @labels.index(sym)
  return nil unless i
  @items.transpose[i]
end
#render_row(sym) ⇒ Object
# File 'lib/data_frame/data_frame.rb', line 131
def render_row(sym)
  i = self.row_labels.index(sym)
  return nil unless i
  @items[i]
end
#replace!(column, values = nil, &block) ⇒ Object
# File 'lib/data_frame/data_frame.rb', line 167
def replace!(column, values=nil, &block)
  column = validate_column(column)
  if not values
    values = self.send(column)
    values.map! {|e| block.call(e)}
  end
  replace_column(column, values)
  self
end
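replace! accepts either a full set of new values or a block that maps each old value. Since validate_column and replace_column are not shown, the following is a sketch on plain arrays; `replace_column!` is a hypothetical name.

```ruby
# Sketch of replace! on plain arrays: supply new values, or a block that
# maps each old value of the column.
def replace_column!(labels, items, column, values = nil, &block)
  i = labels.index(column) or raise ArgumentError, "unknown column #{column}"
  values ||= items.map { |row| block.call(row[i]) }
  items.each_with_index { |row, r| row[i] = values[r] }
  items
end

labels = [:name, :age]
items = [["ann", 30], ["bob", 40]]
replace_column!(labels, items, :age) { |a| a + 1 }
p items  # => [["ann", 31], ["bob", 41]]
```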
#row_labels ⇒ Object
# File 'lib/data_frame/data_frame.rb', line 95
def row_labels
  @row_labels ||= []
end
#row_labels=(ary) ⇒ Object
# File 'lib/data_frame/data_frame.rb', line 99
def row_labels=(ary)
  raise ArgumentError, "Row labels must be an array" unless ary.is_a?(Array)
  @row_labels = ary
end
#subset_from_columns(*cols) ⇒ Object
Creates a new data frame, only with the specified columns.
# File 'lib/data_frame/data_frame.rb', line 245
def subset_from_columns(*cols)
  new_labels = self.labels.inject([]) do |list, label|
    list << label if cols.include?(label)
    list
  end
  new_data_frame = DataFrame.new(*self.labels)
  new_data_frame.import(self.items)
  self.labels.each do |label|
    new_data_frame.drop!(label) unless new_labels.include?(label)
  end
  new_data_frame
end
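The listing copies the whole frame and then drops unwanted columns. An alternative sketch selects the wanted column indices directly, keeping the frame's own column order as the listing does (it walks `self.labels`). `subset_columns` is a hypothetical name working on plain arrays.

```ruby
# Sketch: keep only the requested columns, in the frame's column order.
def subset_columns(labels, items, *cols)
  keep = labels.each_index.select { |i| cols.include?(labels[i]) }
  [keep.map { |i| labels[i] }, items.map { |row| row.values_at(*keep) }]
end

new_labels, new_items = subset_columns([:name, :age, :city],
                                       [["ann", 30, "Oslo"], ["bob", 40, "Rome"]],
                                       :city, :name)
p new_labels  # => [:name, :city]
p new_items   # => [["ann", "Oslo"], ["bob", "Rome"]]
```

Because `values_at` builds new row arrays, the source rows are never mutated.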