Class: RStore::CSV

Inherits:
Object
Defined in:
lib/rstore/csv.rb

Constructor Details

#initialize(&block) ⇒ CSV

This constructor takes a block that is evaluated in the context of the new instance, so methods can be called without an explicit receiver. Within the block, the methods #from, #to, and #run need to be called:

Examples:

RStore::CSV.new do
  from '../easter/children', :recursive => true                   # select a directory or
  from '../christmas/children/toys.csv'                           # file, or
  from 'www.example.com/sweets.csv', :selector => 'pre div.line'  # URL
  to   'company.products'                                         # provide database and table name
  run                                                             # run the program
end


# File 'lib/rstore/csv.rb', line 37

def initialize &block
  @data_hash  = {}
  @data_array = []
  @database   = nil
  @table      = nil

  # Tracking method calls to #from, #to, and #run.
  @from = false
  @to   = false
  @run  = false

  instance_eval(&block) if block_given?

end
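
The block-style constructor relies on instance_eval. The following is a minimal, self-contained sketch of that pattern, not RStore's implementation: the class MiniStore and its calls attribute are hypothetical names introduced only for illustration.

```ruby
# Hypothetical stand-in for a block-style constructor; not RStore's code.
class MiniStore
  attr_reader :calls

  def initialize(&block)
    @calls = []
    # instance_eval makes the new object `self` inside the block, so bare
    # calls such as `from` and `to` resolve to the instance methods below.
    instance_eval(&block) if block_given?
  end

  def from(source, options = {})
    @calls << [:from, source, options]
  end

  def to(db_table)
    @calls << [:to, db_table]
  end

  def run
    @calls << [:run]
  end
end

store = MiniStore.new do
  from 'toys.csv', :recursive => false
  to   'company.products'
  run
end

p store.calls.map(&:first)  # => [:from, :to, :run]
```

Because the block runs with the new object as self, instance variables and local helper methods of the calling scope are not reachable by bare name inside it. That is a general property of instance_eval-based DSLs.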

Instance Attribute Details

#data_array ⇒ Array<Data> (readonly)

Holds RStore::Data objects that are used internally to store information from a data source.

Returns:

  • (Array<Data>)

    holds RStore::Data objects that are used internally to store information from a data source.



# File 'lib/rstore/csv.rb', line 20

def data_array
  @data_array
end

#database ⇒ BaseDB (readonly)

Returns a subclass of BaseDB.

Returns:

  • (BaseDB)


# File 'lib/rstore/csv.rb', line 16

def database
  @database
end

#table ⇒ BaseTable (readonly)

Returns a subclass of BaseTable.

Returns:

  • (BaseTable)


# File 'lib/rstore/csv.rb', line 18

def table
  @table
end

Class Method Details

.change_default_options(options) ⇒ void

This method returns an undefined value.

Change the default options recognized by #from. The new option values apply to all following instances of RStore::CSV. Options can be reset to their defaults by calling .reset_default_options. See #from for a list of all options and their default values.

Examples:

# Search directories recursively and handle the first row of a file as data by default
RStore::CSV.change_default_options(:recursive => true, :has_headers => false)

Parameters:

  • options (Hash)

    Keys from default options with their respective new values.



# File 'lib/rstore/csv.rb', line 275

def self.change_default_options options
  Configuration.change_default_options(options)
end
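
The listing above only delegates to Configuration, whose source is not shown here. The following sketch illustrates one way class-wide defaults can be changed and reset. MiniConfig, its two options, and the rejection of unknown keys are all assumptions made for illustration, not RStore's actual behavior.

```ruby
# Hypothetical sketch; module name, option set, and validation are illustrative.
module MiniConfig
  DEFAULTS = { :recursive => false, :has_headers => true }.freeze

  @current = DEFAULTS.dup

  class << self
    attr_reader :current

    # Merge new values into the current defaults.
    # Rejecting unknown keys is an assumption of this sketch.
    def change_default_options(options)
      unknown = options.keys - DEFAULTS.keys
      raise ArgumentError, "Unknown option(s): #{unknown.join(', ')}" unless unknown.empty?
      @current = @current.merge(options)
    end

    # Restore the original defaults.
    def reset_default_options
      @current = DEFAULTS.dup
    end
  end
end

MiniConfig.change_default_options(:recursive => true, :has_headers => false)
p MiniConfig.current
MiniConfig.reset_default_options
p MiniConfig.current
```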

.database_table(db_table) ⇒ Object

Resolves a 'database.table' string to the matching BaseDB and BaseTable subclasses and returns them as a two-element array. Raises an ArgumentError unless the two names are separated by exactly one dot.

Raises:

  • (ArgumentError)


# File 'lib/rstore/csv.rb', line 114

def self.database_table db_table
  raise ArgumentError, "The name of the database and table have to be separated with a dot (.)"  unless delimiter_correct?(db_table)

  db, tb = db_table.split('.')

  database = BaseDB.db_classes[db.downcase.to_sym]
  table    = BaseTable.table_classes[tb.downcase.to_sym]

  raise Exception, "Database '#{db}' not found"  if database.nil?
  raise Exception, "Table '#{tb}' not found"     if table.nil?

  [database, table]
end
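
The lookup step in the listing above can be tried in isolation with plain hashes. In this sketch, DB_CLASSES, TABLE_CLASSES, and resolve are hypothetical stand-ins for BaseDB.db_classes, BaseTable.table_classes, and .database_table. It assumes the input has already passed the format check, and it raises ArgumentError where the original raises Exception.

```ruby
# Hypothetical registries; in RStore these roles are played by
# BaseDB.db_classes and BaseTable.table_classes.
DB_CLASSES    = { :company  => 'CompanyDB' }
TABLE_CLASSES = { :products => 'ProductsTable' }

# Assumes db_table already has the form 'database.table'.
def resolve(db_table)
  db, tb = db_table.split('.')

  # Keys are lower-cased symbols, so the lookup is case-insensitive.
  database = DB_CLASSES[db.downcase.to_sym]
  table    = TABLE_CLASSES[tb.downcase.to_sym]

  raise ArgumentError, "Database '#{db}' not found" if database.nil?
  raise ArgumentError, "Table '#{tb}' not found"    if table.nil?

  [database, table]
end

p resolve('Company.Products')  # => ["CompanyDB", "ProductsTable"]
```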

.delimiter_correct?(name) ⇒ Boolean

Returns true if name consists of exactly two non-empty parts separated by a single dot.

Returns:

  • (Boolean)


# File 'lib/rstore/csv.rb', line 255

def self.delimiter_correct? name
  !!(name =~ /^[^\.]+\.[^\.]+$/)
end
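
To see which names the pattern accepts, the same regular expression can be run against a few sample strings. This is a standalone copy of the check for experimentation. The constant DB_TABLE_FORMAT is a name introduced here and does not exist in RStore.

```ruby
# Same pattern as in the listing above: one or more non-dot characters,
# a single dot, then one or more non-dot characters.
DB_TABLE_FORMAT = /^[^\.]+\.[^\.]+$/

def delimiter_correct?(name)
  !!(name =~ DB_TABLE_FORMAT)
end

samples = ['company.products', 'company', 'company.products.archive', '.products', 'company.']
samples.each do |name|
  puts "#{name.inspect} => #{delimiter_correct?(name)}"
end
```

Only the first sample is accepted. A missing dot, a second dot, or an empty part on either side makes the check fail.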

.query(db_table) {|table| ... } ⇒ void

This method returns an undefined value.

Easy querying by yielding a Sequel::Dataset instance of your table.

Examples:

RStore::CSV.query('company.products') do |table|    # table = Sequel::Dataset object
  table.all                                         # fetch everything
  table.all[3]                                      # fetch row number 4
  table.filter(:id => 2).update(:on_stock => true)  # update entry
  table.filter(:id => 3).delete                     # delete entry
end

Parameters:

  • db_table (String)

    The name of the database and table, separated by a dot.

Yield Parameters:

  • table (Sequel::Dataset)

    The dataset of your table



# File 'lib/rstore/csv.rb', line 245

def self.query db_table, &block
  database, table = database_table(db_table)
  database.connect do |db|
    block.call(db[table.name]) if block_given?  # Sequel::Dataset
  end
end

.reset_default_options ⇒ void

This method returns an undefined value.

Reset the options recognized by #from to their default values.

Examples:

RStore::CSV.reset_default_options


# File 'lib/rstore/csv.rb', line 285

def self.reset_default_options
  Configuration.reset_default_options
end

Instance Method Details

#create_table(db) ⇒ Object

Creates the table described by the BaseTable subclass unless it already exists. When the MySQL adapter is used, the default engine is set to InnoDB first.



# File 'lib/rstore/csv.rb', line 214

def create_table db

  name = @table.name

  if @database.connection_info.is_a?(Hash)
    if @database.connection_info[:adapter] == 'mysql'
      # http://sequel.rubyforge.org/rdoc/files/doc/release_notes/2_10_0_txt.html
      Sequel::MySQL.default_engine = 'InnoDB'
      # http://stackoverflow.com/questions/1671401/unable-to-output-mysql-tables-which-involve-dates-in-sequel
      Sequel::MySQL.convert_invalid_date_time = nil
    end
  end

  unless db.table_exists?(name)
    db.create_table(name, &@table.table_info)
  end

end

#from(source, options) ⇒ void #from(source) ⇒ void

This method returns an undefined value.

Specify the source of the csv file(s). There can be several calls to this method on a given instance of RStore::CSV. This method has to be called before #run.

Examples:

store = RStore::CSV.new
# fetching data from a file
store.from '../christmas/children/toys.csv'
# fetching data from a directory
store.from '../easter/children', :recursive => true
# fetching data from a URL
store.from 'www.example.com/sweets.csv', :selector => 'pre div.line'

Overloads:

  • #from(source, options) ⇒ void

    Parameters:

    • source (String)

      The relative or full path to a directory, file, or a URL

    • options (Hash)

      The options used to customize fetching and parsing of csv data

    Options Hash (options):

    • :has_headers (Boolean)

      When set to false, the first line of a file is processed as data, otherwise it is discarded. (default: true)

    • :recursive (Boolean)

      When set to true and a directory is given, recursively search for files. Non-csv files are skipped. (default: false)

    • :selector (String)

      Mandatory css selector when fetching data from a URL. Uses the same syntax as Nokogiri. (default: "")

    • :col_sep (String)

      The String placed between each field. (default: ",")

    • :row_sep (String, Symbol)

      The String appended to the end of each row. (default: :auto)

    • :quote_char (String)

      The character used to quote fields. (default: '"')

    • :field_size_limit (Integer, Nil)

      The maximum size CSV will read ahead looking for the closing quote for a field. (default: nil)

    • :skip_blanks (Boolean)

      When set to a true value, CSV will skip over any rows with no content. (default: false)

    • :digit_seps (Array)

      The *thousands separator* and *decimal mark* used for numbers in the data source (default: [',', '.']). Different countries use different thousands separators and decimal marks, and setting this option ensures that parsing of these numbers succeeds. Note that all numbers will still be stored in the format that Ruby recognizes, that is, with a point (.) as the decimal mark.

  • #from(source) ⇒ void

    Parameters:

    • source (String)

      The relative or full path to a directory, file, or a URL. The default options will be used.
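
The effect of :digit_seps can be illustrated with a small standalone helper. This is a sketch of the idea only: normalize_number is a hypothetical function, not part of RStore, and RStore's own conversion may work differently.

```ruby
# Hypothetical helper: rewrite a localized number string into the form
# Ruby's Float() understands. digit_seps is [thousands_separator, decimal_mark].
def normalize_number(text, digit_seps = [',', '.'])
  thousands, decimal = digit_seps
  # Drop every thousands separator, then turn the decimal mark into a point.
  text.gsub(thousands, '').sub(decimal, '.')
end

p Float(normalize_number('1,234.56'))              # => 1234.56
p Float(normalize_number('1.234,56', ['.', ','])) # => 1234.56
```

With the default setting the first call only strips the comma. With ['.', ','] the point is treated as the thousands separator and the comma as the decimal mark.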



# File 'lib/rstore/csv.rb', line 89

def from source, options={}
  crawler = FileCrawler.new(source, :csv, options)
  @data_hash.merge!(crawler.data_hash)
  @from = true
end
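
Several of the parsing options documented for #from (:col_sep, :row_sep, :quote_char, :field_size_limit, :skip_blanks) carry the same names and descriptions as options of Ruby's standard CSV library. Assuming they are handed on to that library unchanged, which this page does not state explicitly, their effect can be previewed with CSV.parse directly. The sample data is made up.

```ruby
require 'csv'

# Semicolon-separated data with single-quoted fields and one blank line.
raw = "name;price\n'Teddy; brown';12\n\n'Robot';30\n"

rows = CSV.parse(raw, col_sep: ';', quote_char: "'", skip_blanks: true)

# The first row holds the headers; the remaining rows are data.
header, *data = rows
p header  # => ["name", "price"]
p data    # => [["Teddy; brown", "12"], ["Robot", "30"]]
```

The quoted field keeps its embedded semicolon, and the blank line is dropped because skip_blanks is set.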

#ran_once? ⇒ Boolean

Test if the data has been inserted into the database table.

Returns:

  • (Boolean)


# File 'lib/rstore/csv.rb', line 261

def ran_once?
  @run == true
end

#read_data(data_object) ⇒ Object

Fetches the raw content of a data source. For a URL, this is the text of all nodes matching the :selector option, joined by newlines; otherwise it is the content of the file. Errors are handed to the logger.



# File 'lib/rstore/csv.rb', line 183

def read_data data_object
  path    = data_object.path
  options = data_object.options

  begin
    if path.url?
      require 'nokogiri'
      doc = Nokogiri::HTML(open(path))
      selector = options[:file_options][:selector]

      content = doc.css(selector).inject("") do |result, link|
        result << link.content << "\n"
        result
      end
    else
      content = File.read(path)
    end

    raise ArgumentError, "Empty content!"  if content.empty?

  rescue Exception => e
    logger = Logger.new(data_object)
    logger.log(:fetch, e)
    logger.error
  end

  content
end

#run ⇒ void

This method returns an undefined value.

Start processing the csv files, storing the data into a database table. Both methods, #from and #to, have to be called before this method.

Raises:

  • (Exception)


# File 'lib/rstore/csv.rb', line 132

def run
  return  if ran_once?   # Ignore subsequent calls to #run
  raise Exception, "At least one method 'from' has to be called before method 'run'"  unless @from == true
  raise Exception, "Method 'to' has to be called before method 'run'"                 unless @to   == true

  @data_hash.each do |path, data|
    content = read_data(data)
    @data_array << Data.new(path, content, :raw, data.options)
  end

  @database.connect do |db|

    create_table(db)
    name = @table.name

    prepared_data_array = @data_array.map do |data|
      data.parse_csv.convert_fields(db, name)
    end

    insert_all(prepared_data_array, db, name)

    @run = true
    message = <<-TEXT.gsub(/^\s+/, '')
    ===============================
    All data has been successfully inserted into table '#{database.name}.#{table.name}'
    -------------------------------
    You can retrieve all table data with the following code:
    -------------------------------
    #{self.class}.query('#{database.name}.#{table.name}') do |table|
      table.all
    end
    ===============================
    TEXT
    puts message
  end
end
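
The call-order rules enforced at the top of #run can be isolated in a few lines. The class below is a hypothetical reduction that keeps only the three tracking flags. It raises RuntimeError instead of Exception, and the runs counter exists only to make the behavior observable.

```ruby
# Hypothetical reduction of the call-order checks; not RStore's code.
class GuardedRunner
  attr_reader :runs

  def initialize
    @from = @to = @run = false
    @runs = 0
  end

  def from(_source)
    @from = true
  end

  def to(_db_table)
    @to = true
  end

  def run
    return if @run  # ignore subsequent calls
    raise "At least one 'from' has to be called before 'run'" unless @from
    raise "'to' has to be called before 'run'"                unless @to

    @runs += 1      # stands in for reading, parsing, and inserting the data
    @run = true
  end
end

runner = GuardedRunner.new
runner.from 'toys.csv'
runner.to   'company.products'
runner.run
runner.run      # no effect, the work is done only once
p runner.runs   # => 1
```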

#to(db_table) ⇒ void

This method returns an undefined value.

Choose the database table to store the csv data into. This method has to be called before #run.

Examples:

store = RStore::CSV.new
store.to('company.products')

Parameters:

  • db_table (String)

    The names of the database and table, separated by a dot, e.g. 'database.table'. The name of the database has to correspond to a subclass of RStore::BaseDB, e.g. CompanyDB < RStore::BaseDB -> 'company'. The name of the table has to correspond to a subclass of RStore::BaseTable, e.g. DataTable < RStore::BaseTable -> 'data'.



# File 'lib/rstore/csv.rb', line 107

def to db_table
  @database, @table = CSV.database_table(db_table)
  @to       = true
end