Class: EXEL::Processors::SplitProcessor
- Inherits: Object
  - Object
    - EXEL::Processors::SplitProcessor
- Includes: LoggingHelper
- Defined in: lib/exel/processors/split_processor.rb
Overview
Implements the split instruction. Used to process a large file concurrently by splitting it into small chunks that are processed separately.
Supported Context Options
- :delete_resource
  Defaults to true. Set to false to preserve the original resource; otherwise, the resource is deleted when splitting is complete.
- :chunk_size
  Sets the number of lines that each chunk should contain.
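The defaulting rule for :delete_resource can be sketched as follows. This is a minimal illustration, not the library's code path: a plain Hash stands in for the EXEL context, and apply_delete_default is a hypothetical helper that copies the one line the constructor uses.

```ruby
# A plain Hash stands in for the EXEL context (an assumption for illustration).
# apply_delete_default is a hypothetical helper, not part of EXEL.
def apply_delete_default(context)
  # Only fill in the default when the key is unset, so an explicit false survives.
  context[:delete_resource] = true if context[:delete_resource].nil?
  context
end

defaulted = apply_delete_default({ resource: 'data.csv' })
preserved = apply_delete_default({ resource: 'data.csv', delete_resource: false })

puts defaulted[:delete_resource] # true
puts preserved[:delete_resource] # false
```

The nil? check matters here: a shorthand such as ||= would overwrite an explicit false with true, which would delete a resource the caller asked to keep.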
Constant Summary
- DEFAULT_CHUNK_SIZE = 1000
  Number of lines to include in each chunk. Can be overridden by setting :chunk_size in the context.
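One plausible way the override could be resolved is shown below. The page does not show how the processor reads :chunk_size, so the module name ChunkSizeSketch and the helper chunk_size_for are hypothetical, and a plain Hash stands in for the context.

```ruby
# Hypothetical sketch: the real lookup inside SplitProcessor is not shown on this page.
module ChunkSizeSketch
  DEFAULT_CHUNK_SIZE = 1000

  # Use the context's :chunk_size when present, otherwise fall back to the default.
  def self.chunk_size_for(context)
    context[:chunk_size] || DEFAULT_CHUNK_SIZE
  end
end

puts ChunkSizeSketch.chunk_size_for({})                # 1000
puts ChunkSizeSketch.chunk_size_for({ chunk_size: 50 }) # 50
```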
Instance Attribute Summary
- #block ⇒ Object
  Returns the value of attribute block.
- #file_name ⇒ Object
  Returns the value of attribute file_name.
Instance Method Summary
- #generate_chunk(content) ⇒ Object
- #initialize(context) ⇒ SplitProcessor (constructor)
  The context must contain a CSV File object under the :resource key.
- #process(callback) ⇒ Object
- #process_line(line, callback) ⇒ Object
Methods included from LoggingHelper
#log_debug, #log_error, #log_fatal, #log_info, #log_warn, #logger
Constructor Details
#initialize(context) ⇒ SplitProcessor
The context must contain a CSV File object under the :resource key.
    # File 'lib/exel/processors/split_processor.rb', line 24
    def initialize(context)
      @buffer = []
      @tempfile_count = 0
      @context = context
      @file = context[:resource]
      @context[:delete_resource] = true if @context[:delete_resource].nil?
    end
Instance Attribute Details
#block ⇒ Object
Returns the value of attribute block.
    # File 'lib/exel/processors/split_processor.rb', line 18
    def block
      @block
    end
#file_name ⇒ Object
Returns the value of attribute file_name.
    # File 'lib/exel/processors/split_processor.rb', line 18
    def file_name
      @file_name
    end
Instance Method Details
#generate_chunk(content) ⇒ Object
    # File 'lib/exel/processors/split_processor.rb', line 47
    def generate_chunk(content)
      @tempfile_count += 1
      chunk = Tempfile.new([chunk_filename, '.csv'])
      chunk.write(content)
      chunk.rewind
      log_info "Generated chunk # #{@tempfile_count} for file #{filename(@file)} in #{chunk.path}"
      chunk
    end
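The tempfile handling in generate_chunk can be tried in isolation. The sketch below is a simplified stand-in: write_chunk is a hypothetical helper that takes the base name as an argument in place of chunk_filename and omits the logging and counter.

```ruby
require 'tempfile'

# Hypothetical stand-in for generate_chunk, without logging or the chunk counter.
def write_chunk(content, basename)
  # An Array argument gives Tempfile a name prefix and a suffix (the extension).
  chunk = Tempfile.new([basename, '.csv'])
  chunk.write(content)
  # Writing leaves the position at the end of the file; rewind so a consumer reads from the start.
  chunk.rewind
  chunk
end

chunk = write_chunk("a,b\n1,2\n", 'example_chunk_')
extension = File.extname(chunk.path)
contents = chunk.read

puts extension # .csv
puts contents

# Clean up the temporary file once it is no longer needed.
chunk.close
chunk.unlink
```

Returning the open, rewound Tempfile lets the caller read the chunk immediately without reopening it by path.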
#process(callback) ⇒ Object
    # File 'lib/exel/processors/split_processor.rb', line 32
    def process(callback)
      process_file(callback)
      finish(callback)
    end
#process_line(line, callback) ⇒ Object
    # File 'lib/exel/processors/split_processor.rb', line 37
    def process_line(line, callback)
      if line == :eof
        flush_buffer(callback)
      else
        @buffer << CSV.generate_line(line)
        flush_buffer(callback) if buffer_full?
      end
    end
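The buffering behavior of process_line can be illustrated with a small standalone class. This is a sketch under assumptions: buffer_full? and flush_buffer are not shown on this page, so their bodies here are guesses, and BufferingSplitter is a hypothetical class whose callback is a lambda that receives the joined chunk text (the real processor presumably hands a generated chunk file to the callback instead).

```ruby
require 'csv'

# Hypothetical stand-in for SplitProcessor's line buffering.
class BufferingSplitter
  def initialize(chunk_size)
    @chunk_size = chunk_size
    @buffer = []
  end

  # Same shape as the documented process_line: flush on :eof or when the buffer fills.
  def process_line(line, callback)
    if line == :eof
      flush_buffer(callback)
    else
      @buffer << CSV.generate_line(line)
      flush_buffer(callback) if buffer_full?
    end
  end

  private

  # Assumed implementation: full once the buffer holds chunk_size lines.
  def buffer_full?
    @buffer.size >= @chunk_size
  end

  # Assumed implementation: emit the buffered lines as one chunk, then reset.
  def flush_buffer(callback)
    return if @buffer.empty?

    callback.call(@buffer.join)
    @buffer = []
  end
end

chunks = []
collect = ->(chunk) { chunks << chunk }

splitter = BufferingSplitter.new(2)
[%w[a 1], %w[b 2], %w[c 3]].each { |row| splitter.process_line(row, collect) }
splitter.process_line(:eof, collect)

p chunks # ["a,1\nb,2\n", "c,3\n"]
```

With a chunk size of 2, the first two rows are flushed together as soon as the buffer fills, and the :eof marker flushes the remaining partial chunk.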