Class: Proptax::Consolidator
- Inherits:
-
Object
- Object
- Proptax::Consolidator
- Defined in:
- lib/proptax.rb
Constant Summary collapse
- Headers =
Headers for publicly available assessment data for 2019.
['Current Assessed Value', 'Roll Number', 'Location Address', 'Taxation Status', 'Assessment Class', 'Property Type', 'Property Use', 'Valuation Approach', 'Market Adjustment', 'Community', 'Market Area', 'Sub Neighbourhood Code (SNC)', 'Sub Market Area', 'Influences', 'Land Use Designation', 'Assessable Land Area', 'Building Count', 'Building Type/Structure', 'Year of Construction', 'Quality', 'Total Living Area Above Grade', 'Living Area Below Grade', 'Garage Area', 'Fireplace Count']
- MultiFieldHeaders =
Keys that can be assigned multiple values
['Influences']
- Sections =
Section headers for publicly available assessment data for 2019.
['Assessment Details', 'Assessment Approach', 'Location Details', 'Land Details', 'Building Details']
Class Method Summary collapse
-
.clean(text, header) ⇒ Object
Return square footage only when given spacial data.
-
.parse(text) ⇒ Object
Take text-converted 2019 assessment reports and extract the relevant data into an array.
-
.process(dir_name) ⇒ Object
Convert a directory containing 2019 PDF assessment reports to text and write the relevant CSV information to stdout.
Class Method Details
.clean(text, header) ⇒ Object
Return square footage only when given spacial data
136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 |
# File 'lib/proptax.rb', line 136 def self.clean(text, header) case header when 'Current Assessed Value' text.gsub!(/,/, '') when 'Assessable Land Area', 'Total Living Area Above Grade', 'Living Area Below Grade', 'Garage Area' text = text.split(' ')[0].gsub(/,/, '') # when 'Basement Suite', # 'Walkout Basement', # 'Constructed On Original Foundation', # 'Modified For Disabled', # 'Old House On New Foundation', # 'Basementless', # 'Penthouse', # 'Market Adjustment' # text = text == 'Yes' ? 'T' : 'F' when 'Market Adjustment' text = text == 'Yes' ? 'T' : 'F' end text end |
.parse(text) ⇒ Object
Take text-converted 2019 assessment reports and extract the relevant data into an array.
67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 |
# File 'lib/proptax.rb', line 67 def self.parse(text) csv_hash = {} current_header = nil text.split(/\n/).each do |line| # Replace non-breaking spaces with space and strip line = line.gsub(/\u00a0/, ' ').strip # Remove soft hyphens line.gsub!(/\u00AD/, '') ## This is untested line.gsub!(/â\\200\\224/, '-') # Remove trailing colon line.gsub!(/:+$/, ""); next if line.empty? if current_header if Sections.include? line current_header = nil next end if MultiFieldHeaders.include? current_header if csv_hash[current_header].nil? csv_hash[current_header] = clean(line, current_header) else csv_hash[current_header] += "/#{clean(line, current_header)}" end else csv_hash[current_header] = clean(line, current_header) current_header = nil end elsif Headers.include? line current_header = line # # Need a rasterized report (a la, Windows 7) # # For tesseract-extracted text # else # Headers.each do |header| # current_header = line[Regexp.new("^#{Regexp.escape(header)}")] # break if current_header # end # if current_header # line = line.gsub(Regexp.new("^#{Regexp.escape(current_header)}"), '').strip # csv_hash[current_header] = clean(line, current_header) # current_header = nil # end end end # Headers.map { |header| csv_hash[header] || '0' } Headers.map do |header| if header == 'Renovation' csv_hash[header] || 'unk.' else csv_hash[header] || '0' end end end |
.process(dir_name) ⇒ Object
Convert a directory containing 2019 PDF assessment reports to text and write the relevant CSV information to stdout
38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 |
# File 'lib/proptax.rb', line 38 def self.process(dir_name) # `strip!': can't modify frozen String (RuntimeError) #dir_name.strip! dir_name = dir_name.strip if dir_name.empty? puts "No directory specified" return end begin path = Pathname.new(dir_name).realpath puts Headers.to_csv if Dir.exists?(path) Dir.glob("#{path}/*.{pdf,PDF}") do |pdf| data = `pdftotext "#{pdf}" -` puts parse(data).to_csv end rescue puts "Directory #{dir_name} does not exist" end end |