Class: Proptax::Consolidator

Inherits:
Object
Defined in:
lib/proptax.rb

Constant Summary

Headers =

Headers for publicly available assessment data for 2019.

['Current Assessed Value', 'Roll Number', 'Location Address', 'Taxation Status',
'Assessment Class', 'Property Type', 'Property Use', 'Valuation Approach',
'Market Adjustment', 'Community', 'Market Area', 'Sub Neighbourhood Code (SNC)',
'Sub Market Area', 'Influences', 'Land Use Designation', 'Assessable Land Area',
'Building Count', 'Building Type/Structure', 'Year of Construction',
'Quality', 'Total Living Area Above Grade', 'Living Area Below Grade',
'Garage Area', 'Fireplace Count']
MultiFieldHeaders =

Headers that can be assigned multiple values. parse joins successive values for these headers with "/".

['Influences']
Sections =

Section headers for publicly available assessment data for 2019.

['Assessment Details', 'Assessment Approach', 'Location Details', 'Land Details', 'Building Details']

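Class Method Summary

  • .clean(text, header) ⇒ Object
    • Normalize a field value according to its header.

  • .parse(text) ⇒ Object
    • Extract the relevant data from a text-converted 2019 assessment report into an array.

  • .process(dir_name) ⇒ Object
    • Convert a directory of 2019 PDF assessment reports to CSV on stdout.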

Class Method Details

.clean(text, header) ⇒ Object

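Normalize a field value according to its header. Commas are stripped from the current assessed value; spatial fields (land, living and garage areas) are reduced to the bare square footage with commas removed; Market Adjustment is mapped to 'T' or 'F'. Any other value is returned unchanged.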

Parameters:

  • text (string)
    • the text data to be cleaned

  • header (string)
    • the header associated with the text data

Returns:

  • string



# File 'lib/proptax.rb', line 136

def self.clean(text, header)
  case header
  when 'Current Assessed Value'
    # Non-destructive, so frozen strings are accepted and the caller's string is left untouched
    text = text.gsub(/,/, '')
  when 'Assessable Land Area',
       'Total Living Area Above Grade',
       'Living Area Below Grade',
       'Garage Area'
    # Keep only the leading number and drop the text that follows it
    text = text.split(' ')[0].gsub(/,/, '')
  # Disabled Yes/No fields:
  # when 'Basement Suite',
  #      'Walkout Basement',
  #      'Constructed On Original Foundation',
  #      'Modified For Disabled',
  #      'Old House On New Foundation',
  #      'Basementless',
  #      'Penthouse',
  #      'Market Adjustment'
  #   text = text == 'Yes' ? 'T' : 'F'
  when 'Market Adjustment'
    text = text == 'Yes' ? 'T' : 'F'
  end
  text
end
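
The case logic above can be tried in isolation. The following is a minimal sketch, not the library's code: `clean_value` and `AREA_HEADERS` are hypothetical names, the sample values are invented, and the "number followed by a unit" shape of the area fields is an assumption inferred from the `split(' ')[0]` call.

```ruby
# Hypothetical stand-in that mirrors the case logic of Proptax::Consolidator.clean.
AREA_HEADERS = ['Assessable Land Area', 'Total Living Area Above Grade',
                'Living Area Below Grade', 'Garage Area'].freeze

def clean_value(text, header)
  case header
  when 'Current Assessed Value'
    text.delete(',')                        # "1,234,500" becomes "1234500"
  when *AREA_HEADERS
    # Keep the leading number only; to_s guards against an empty string
    text.split(' ').first.to_s.delete(',')
  when 'Market Adjustment'
    text == 'Yes' ? 'T' : 'F'
  else
    text                                    # every other field passes through
  end
end

# Invented sample values, for illustration only
[
  ['1,234,500',   'Current Assessed Value'],
  ['5,995 sq ft', 'Assessable Land Area'],
  ['Yes',         'Market Adjustment'],
  ['Residential', 'Assessment Class']
].each { |text, header| puts "#{header}: #{clean_value(text, header)}" }
```

Unlike the original, the sketch returns an empty string for an empty area value instead of raising on `nil`.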

.parse(text) ⇒ Object

Take text-converted 2019 assessment reports and extract the relevant data into an array.

Parameters:

  • text (string)
    • text-converted 2019 assessment report

Returns:

  • array
    • one value per entry in Headers, in the same order; fields missing from the report default to '0'



# File 'lib/proptax.rb', line 67

def self.parse(text)
  csv_hash = {}
  current_header = nil
  text.split(/\n/).each do |line|
    # Replace non-breaking spaces with regular spaces, then strip
    line = line.gsub(/\u00a0/, ' ').strip

    # Remove soft hyphens
    line.gsub!(/\u00AD/, '')
    # Untested: replace an escaped, mis-encoded em dash (the literal text "â\200\224") with a hyphen
    line.gsub!(/â\\200\\224/, '-')

    # Remove trailing colons
    line.gsub!(/:+$/, '')
  
    next if line.empty?
  
    if current_header
      if Sections.include? line
        current_header = nil 
        next
      end

      if MultiFieldHeaders.include? current_header
        if csv_hash[current_header].nil?
          csv_hash[current_header] = clean(line, current_header)
        else
          csv_hash[current_header] += "/#{clean(line, current_header)}"
        end
      else
        csv_hash[current_header] = clean(line, current_header)
        current_header = nil
      end
    elsif Headers.include? line
      current_header = line
    #
    # Need a rasterized report (a la, Windows 7)
    #
    # For tesseract-extracted text
#        else
#          Headers.each do |header|
#            current_header = line[Regexp.new("^#{Regexp.escape(header)}")]
#            break if current_header
#          end
#          if current_header
#            line = line.gsub(Regexp.new("^#{Regexp.escape(current_header)}"), '').strip
#            csv_hash[current_header] = clean(line, current_header)
#            current_header = nil
#          end
    end
  end
  # Emit one value per header, in order. 'Renovation' is not in Headers,
  # so the 'unk.' branch is unreachable and every missing field becomes '0'.
  Headers.map do |header|
    if header == 'Renovation'
      csv_hash[header] || 'unk.'
    else
      csv_hash[header] || '0'
    end
  end
end
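
The loop above is a small state machine: a line that matches a header arms `current_header`, and the next non-empty line (or lines, for multi-value headers) supplies the value. The sketch below reproduces only that control flow so it can be run on its own. `parse_sketch`, the trimmed constant lists, and the sample report text are all hypothetical, and the value cleaning step is omitted.

```ruby
# Hypothetical, trimmed-down constants; the real class uses the full lists above.
HEADERS = ['Roll Number', 'Influences', 'Building Count'].freeze
MULTI_FIELD_HEADERS = ['Influences'].freeze
SECTIONS = ['Location Details', 'Building Details'].freeze

def parse_sketch(text)
  values = {}
  current = nil
  text.each_line do |raw|
    # Normalize the line: non-breaking spaces, surrounding whitespace, trailing colons
    line = raw.gsub("\u00a0", ' ').strip.sub(/:+\z/, '')
    next if line.empty?

    if current
      if SECTIONS.include?(line)
        current = nil                       # a section heading ends the field
      elsif MULTI_FIELD_HEADERS.include?(current)
        # Append further values with "/" and stay on the same header
        values[current] = [values[current], line].compact.join('/')
      else
        values[current] = line              # single-value field: take one line
        current = nil
      end
    elsif HEADERS.include?(line)
      current = line                        # the next line carries the value
    end
  end
  HEADERS.map { |header| values[header] || '0' }
end

# Invented report text, for illustration only
report = <<~TEXT
  Location Details
  Roll Number:
  123456789
  Influences:
  Corner Lot
  Traffic Collector
  Building Details
TEXT

p parse_sketch(report)
# => ["123456789", "Corner Lot/Traffic Collector", "0"]
```

As in the original, a multi-value header keeps collecting lines until a section heading appears, so a header line that follows it within the same section would be appended as a value rather than recognized as a header.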

.process(dir_name) ⇒ Object

Convert each 2019 PDF assessment report in a directory to text with pdftotext, then write a CSV header row followed by one CSV row per report to stdout.

Parameters:

  • dir_name (string)
    • path to directory containing PDFs

Returns:

  • nil



# File 'lib/proptax.rb', line 38

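# Relies on the 'csv' (Array#to_csv) and 'pathname' libraries being loaded by the enclosing file.
def self.process(dir_name)
  # strip! raises on a frozen String, so use the non-destructive form
  dir_name = dir_name.strip
  if dir_name.empty?
    puts "No directory specified"
    return
  end

  begin
    # realpath raises Errno::ENOENT when the path does not exist
    path = Pathname.new(dir_name).realpath
    # Dir.exists? was removed in Ruby 3.2; use Dir.exist?
    puts Headers.to_csv if Dir.exist?(path)
    Dir.glob("#{path}/*.{pdf,PDF}") do |pdf|
      data = `pdftotext "#{pdf}" -`
      puts parse(data).to_csv
    end
  rescue Errno::ENOENT
    # A bare rescue here would also hide unrelated errors behind this message
    puts "Directory #{dir_name} does not exist"
  end
end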
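
The backtick call in process interpolates the file name into a shell command, so a name containing a double quote or `$` would be interpreted by the shell. One possible alternative, sketched here as an assumption rather than as part of the library, passes the arguments directly with `Open3.capture2`. `extract_text` and its `command:` keyword are hypothetical names.

```ruby
require 'open3'

# Hypothetical helper: run a text extractor on one file without going through a shell.
# `command` defaults to pdftotext; the trailing '-' asks it to write to stdout,
# as in the backtick call above.
def extract_text(pdf_path, command: 'pdftotext')
  # Each argument is passed as-is, so no quoting or escaping is needed
  stdout, status = Open3.capture2(command, pdf_path, '-')
  status.success? ? stdout : nil
rescue Errno::ENOENT
  nil # the extractor binary is not installed
end
```

A caller could then skip reports that fail to convert, for example `text = extract_text(pdf)` followed by `puts parse(text).to_csv if text`.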