Class: RedParse
- Inherits:
-
Object
- Object
- RedParse
- Defined in:
- lib/redparse/version.rb,
lib/redparse/generate.rb
Constant Summary collapse
- VERSION =
'0.8.2'
- CHARMAPPINGS =
{ ?`=>'bquote', ?~=>'tilde', ?!=>'bang', ?@=>'at', ?#=>'num', ?$=>'dollar', ?%=>'percent', ?^=>'caret', ?&=>'and', ?*=>'star', ?(=>'lparen', ?)=>'rparen', ?-=>'minus', ?+=>'plus', ?==>'equals', ?{=>'lbrace', ?}=>'rbrace', ?[=>'lbrack', ?]=>'rbrack', ?|=>'or', ?\\=>'bslash',?:=>'colon', ?;=>'semicolon', ?"=>'dquote', ?'=>'squote', ?,=>'comma', ?.=>'dot', ?<=>'less', ?>=>'more', ??=>'q', ?/=>'y', ?\s=>'space', ?X=>'x', }
- STRMAPPINGS =
{ '::'=>"XX", '++'=>"Xeval", '--'=>"Xsingleton", '[]'=>"Xbrackets", '->'=>"Xcalling", }
- STRMAP_REX =
/#{STRMAPPINGS.keys.map{|x| Regexp.quote x}.join "|"}/
Class Method Summary collapse
Instance Method Summary collapse
- #action2c(action) ⇒ Object
-
#error_handler ⇒ Object
4.5 Error Recovery yacc’s error recovery mechanism is rather idiosyncratic.
-
#generate_c(output) ⇒ Object
The case arms of the switch statement are taken directly from the goto table that was computed by the LALR(1) grammar analysis.
-
#init_code ⇒ Object
3 LR-Parsing Mechanics We briefly explain the fundamentals of shift-reduce parsing (which represents the LR(1) family) without going into any more detail than necessary for subsequent exposition.
-
#nonterminal(j) ⇒ Object
User actions are associated with reductions, and the code corresponding to a given production is expanded in-place.
-
#repl(rule, m) ⇒ Object
The state number is stored in the stack, followed by possibly invoking the lexical analyzer.
-
#state(state_n, n) ⇒ Object
4.2 Hard-coded States For each automata state, mule creates code responsible for simulating the action of that state based on the current input token.
- #state_utils ⇒ Object
- #str2cname(str) ⇒ Object
Class Method Details
.str2cname(str) ⇒ Object
354 355 356 357 358 359 |
# File 'lib/redparse/generate.rb', line 354 def self.str2cname str str.gsub(STRMAP_REX){|str| STRMAPPINGS[str] } \ .gsub(/(?!#{LETTER_DIGIT}).|[X]/o){|ch| "X"+ esc=CHARMAPPINGS[ch[0]] ? esc : ch[0].to_s(16) } end |
Instance Method Details
#action2c(action) ⇒ Object
137 138 139 140 141 142 143 144 145 146 147 148 |
# File 'lib/redparse/generate.rb', line 137 def action2c(action) case action when Rule; "goto reduce_#{str2cname action.name};" when nil,:error; "goto error_handler;" when ParserState; "goto shift_state_#{str2cname action.name};" when :accept; "YYACCEPT;" when MultiReduce; action.action2c when MultiShift; action.action2c # when StackMonkey; action.action2c else fail "unexpected action type: #{action.class} = #{action}" end end |
#error_handler ⇒ Object
4.5 Error Recovery yacc’s error recovery mechanism is rather idiosyncratic. In fact, examining two books, [LMB92] and [ASU86], and the output generated by yacc yields three dierent descriptions of the recovery mechanism. We have tried to be faithful to the output of yacc. Fortunately, the mechanism has few consequences to the generation of the rest of the hard- coded parser. The only change to the parser is the maintenance of the variable, yyerrorstatus. Although relatively short, the code below is very subtle, like the explanation of yacc’s error recovery mechanism. The code is given only for completeness.
301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 |
# File 'lib/redparse/generate.rb', line 301 def error_handler %[ error_handler: if (yyerrorstatus > 2){ yyerror("syntax error"); } user_error_handler: if (yyerrorstatus == 0){ huh if (la_identity == 0) YYABORT;// End of input. la_identity = yylex(&la_token); switch (OLDSTACK){ #{@states.map{|state| i=state.small_int "case #{i}: goto state_action_#{str2cname state.name};\n" } } }else{ yyerrorstatus = 0; while (stack != stack_start){ switch (OLDSTACK){ case N: goto state_M;// iff M = goto[N,error]. . . . } stack--; } YYABORT;// Empty stack. } ] end |
#generate_c(output) ⇒ Object
The case arms of the switch statement are taken directly from the goto table that was computed by the LALR(1) grammar analysis. Because this switch cannot fail, no default entry is needed. However, making the most common case arm the default is a trivial time and space optimization.
277 278 279 280 281 282 283 284 285 |
# File 'lib/redparse/generate.rb', line 277 def generate_c output output<< init_code output<< state_utils (0...RULES().size).each_with_index{|i,m| output<< (reduce i,m) } node_types.each{|nt| output<< (nonterminal nt) } map_with_index(all_states){|st,i| output<< (state st,i) } #output<< error_handler #disabled, i have rules for error recovery output<< "}" end |
#init_code ⇒ Object
3 LR-Parsing Mechanics We briefly explain the fundamentals of shift-reduce parsing (which represents the LR(1) family) without going into any more detail than necessary for subsequent exposition. LALR(1) parsers like yacc simulate, either directly or indirectly, a very simple automaton with a stack of automaton states [FL88]. (Parsers generated by yacc also maintain a semantic stack, but since that stack grows in parallel with the state stack, we only describe the use of the state stack here.) Simulating the automaton requires two mechanisms: one for determining the action, which is determined by the current input symbol and the state on the top of the stack, and one for determining state transitions based on the current top of stack and a grammar symbol. At parser-generation time LALR(1) grammar analysis builds these tables, called action and goto, respectively. (The analysis is necessary regardless of whether a table-driven or hard-coded parser is desired.) Functionally, these tables have the following signatures.
goto: state x symbol -> state action: state x token -> shift,reduce_y,accept,error
There are only four possible actions: reduce, shift, accept, and error. Reduce actions are parameterized by the grammar production being reduced. Actions are described below. let TOS be the state on the top of the stack, and let la_identity be the current lookahead token.
shift A shift pushes goto onto the stack, and updates la_identity by advancing the lexical analyzer.
reduce_y A reduction processes production Y : X -> x_1…x_n, which requires popping n states off the stack, followed by pushing goto[TOS, X]. (The semantic action of the parser relating to this production would be executed prior to popping states off the stack.)
accept An accept signals a successful parse.
error An error requires error reporting and/or recovery.
4 Simple Implementation mule creates a single parsing routine, yyparse(), that simulates the LALR(1) parser directly in ANSI C, without interpreting any tables. The routine has five simple parts: initialization, automata states, reduction actions, nonterminal transitions, and error recovery. Although very similar to the inverted table structure in [Pfa90], this structure avoids the duplication of semantic action routines. Another diverence is the yacc-compatible error recovery. The structure is simple, with all code being generated from a tiny set of small, well-defined templates that directly mirror the grammar or LALR(1) automaton. Since both the state stack and the semantic stack grow in unison, we wrap the stack entries into a single structure, StackType.
4.1 Initialization The initalization phase simply sets up bookkeeping and data structures for subsequent automata simulation. It is grammar-independent.
57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 |
# File 'lib/redparse/generate.rb', line 57 def init_code " #define YYABORT do { \\ free(start_stack);return -1; \\ } while(0) #define YYACCEPT do { \\ YYSTYPE result=SEMANTIC_STACK; \\ free(start_stack); \\ return result; \\ } while(0) /*#define yyclearin_token = yylex(&la_token)*/ #define yyerrok yyerrorstatus = 3 #define YYERROR goto user_error_handler #define YYRECOVERING() (yyerrorstatus <= 2) typedef VALUE YYSTYPE; #if 0 typedef struct stackType{ int state;// State stack element. } StackType; typedef struct { VALUE semantic; } SemanticStackType; #else typedef int StackType; typedef VALUE SemanticStackType; #end int yyparse(void){ YYSTYPE la_token;// Semantic value computed by yylex(). int la_identity; unsigned yyerrorstatus = 3;// Initialize error-recovery counter. YYSTYPE yyredval;// Variable holds semantic value of$$. VALUE semantic_stack; /*Array of Node|Token*/ // SemanticStackType *semantic_stack_start; StackType *stack_start;// Stack. unsigned i=0; unsigned stack_size=64; stack_start=realloc(NULL,sizeof(StackType)*stack_size); if (stack_start==NULL) MALLOC_ERROR(); semantic_stack=rb_ary_new(); // semantic_stack_start=realloc(NULL,sizeof(SemanticStackType)*stack_size); // if (semantic_stack_start==NULL) MALLOC_ERROR(); la_identity = yylex(&la_token); /* Get 1st token.*/ goto shift_state_#{str2cname all_states.first.name};/* Start state.*/ " end |
#nonterminal(j) ⇒ Object
User actions are associated with reductions, and the code corresponding to a given production is expanded in-place. After the user code, the symbols associated with right-hand side of the production are popped, followed by copying $$ onto the semantic stack. Finally, there is a jump to the code that will compute the appropriate state given the left-hand side symbol of this production.
4.4 Nonterminal Transitions For each nonterminal, code is produced to compute (and jump to) the appropriate state given the current state. This simple switch statement is given below.
255 256 257 258 259 260 261 262 263 264 265 266 267 268 |
# File 'lib/redparse/generate.rb', line 255 def nonterminal(j) " nonterminal_#{str2cname j.name}: /*nonterminal_#{j.small_int}:*/ switch (OLDSTACK){ // Top of stack. #{ all_states.map_with_index do|state,k| %[ case #{k}: goto state_#{str2cname state.goto[j].name};\n] end } } " rescue Exception=>e backtrace.unshift("exception in node(nonterminal) #{j.name} #{e.class}:#{e}").join("\n") end |
#repl(rule, m) ⇒ Object
The state number is stored in the stack, followed by possibly invoking the lexical analyzer. The three optional lines store the semantic value of the current token, advance the lexical analyzer, and do error-recovery bookkeeping. Incrementing the stack pointer completes the push. The case arms of the switch are determined by the action table computed by the LALR(1) analysis; for each condition met in the comments, a case arm must be generated. Default actions were developed for compressing table-driven parsers, and can be similarly employed here for generating the switchs default [FL88].
4.3 Reduction Actions One piece of code is generated for each production. Its template is given below.
201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 |
# File 'lib/redparse/generate.rb', line 201 def repl(rule,m) repl=rule.replacement case repl when :shift,:accept #do nothing? when Class %[static VALUE repl_#{rule.name}=rb_const_lookup(rb_const_lookup(kNIL,"RedParse"),"#{repl.name});\n] when StackMonkey end def reduce(rule,m) repl=rule.replacement pattern=rule.pattern numpatterns=pattern.subregs.size leader="\nreduce_#{rule.name}:\n/*reduce_#{m}:*/ /*Production #{m}: #{rule.to_s} */\n" if StackMonkey===repl userhook=" huh rb_proc_call(stack_monkey_#{rule.name},semantic_stack); huh /*objects removed from semantic stack get pushed back onto lexer input*/ i -= #{repl.backup_count}; goto nonterminal_switcher_#{str2cname repl.huh}; // Compute transition on some old node from the (semantic) stack. huh//need a node (or token?) number to goto address mapper (switch stmt) here " elsif :accept==repl return leader+"YYACCEPT;" end %[ #{leader}#{userhook} yyredval=huh rb_call(repl_#{rule.name},huh("create"),semantic_stack,huh2ruby(-#{numpatterns})); i -= #{numpatterns}; /* Pop patterns from stack.*/ SEMANTIC_STACK_SET(yyredval); /* Copy ($$) onto semantic stack.*/ goto nonterminal_#{str2cname repl.name}; /* Compute transition on produced node type.*/ ] end rescue Exception=>e backtrace.unshift("exception in reduce of rule #{rule.name} #{e.class}:#{e}").join("\n") end |
#state(state_n, n) ⇒ Object
4.2 Hard-coded States For each automata state, mule creates code responsible for simulating the action of that state based on the current input token. All transitions into a given state are labeled with the same grammar symbol. States labeled with a token are called shift states and they require extra code to advance the lexical analyzer. The template of this code for state N is
162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 |
# File 'lib/redparse/generate.rb', line 162 def state(state_n,n) #n=state_n.small_int name=state_n.name " shift_state_#{name}: GET_TOKEN(); /*modifies token, la_token*/ state_#{name}: /*state_#{n}:*/ STACK = #{n}; RESERVE_STACK_SLOT(); state_action_#{name}: /* Error-recovery entry point.*/ /*state_action_#{n}:*/ switch (la_identity){ #{state_n.actions.map do |tok,action| %[ case #{str2cname(tok)}: #{action2c action}] end.join(%[\n]) } default: #{action2c state_n.actions.default} } " rescue Exception=>e backtrace.unshift("exception in state #{name} #{e.class}:#{e}").join("\n") end |
#state_utils ⇒ Object
110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 |
# File 'lib/redparse/generate.rb', line 110 def state_utils " #define MALLOC_ERROR() huh #define RESERVE_STACK_SLOT() \\ if (++i >= stack_size){ \\ unsigned new_stack_size=stack_size*2; \\ stack_start=realloc(stack_start,sizeof(StackType)*new_stack_size); \\ if (stack_start==NULL) MALLOC_ERROR(); \\ //semantic_stack_start=realloc(semantic_stack_start,sizeof(SemanticStackType)*new_stack_size); \\ //if (semantic_stack_start==NULL) MALLOC_ERROR(); \\ stack_size=new_stack_size; \\ } #define GET_TOKEN() \\ do { \\ SEMANTIC_STACK_SET(la_token); /*Put lexical semantic entry on stack.*/ \\ la_identity = yylex(&la_token); /* Advance lexical analysis.*/ \\ yyerrorstatus++; /* Update error-recovery counter.*/ \\ } while(0) #define STACK stack_start[i] #define SEMANTIC_STACK rb_ary_get(semantic_stack,rb_int2fixnum(i)) #define SEMANTIC_STACK_SET(x) rb_ary_set(semantic_stack,rb_int2fixnum(i),x) #define OLDSTACK stack_start[i-1] " end |