python - Creating a Lexer


Hey guys, I'm trying to understand concepts regarding lexers. I understand that lexers are used in compilers to separate the characters of a string into a form known as tokens. The thing that confuses me is the matching part: I don't understand the logic of why we need to match characters at a corresponding position.

import sys
import re

def lex(characters, token_exprs):
    pos = 0
    tokens = []
    while pos < len(characters):
        match = None
        for token_expr in token_exprs:
            pattern, tag = token_expr
            regex = re.compile(pattern)
            match = regex.match(characters, pos)
            if match:
                text = match.group(0)
                if tag:
                    token = (text, tag)
                    tokens.append(token)
                break
        if not match:
            sys.stderr.write('illegal character: %s\n' % characters[pos])
            sys.exit(1)
        else:
            pos = match.end(0)
    return tokens

This is the code I don't understand. After the loop, I don't quite grasp what the code is trying to do. Why does it have to match characters at a position?
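The key is that `regex.match(characters, pos)` only attempts a match starting at exactly index `pos`; the lexer then advances `pos` to `match.end(0)` and repeats, consuming the input one token at a time. A minimal demonstration (the sample string and patterns are made up for illustration):

```python
import re

source = "foo 42"
word = re.compile(r'[a-z]+')
number = re.compile(r'[0-9]+')

# match() with a pos argument tries to match starting exactly at that index
m = word.match(source, 0)
print(m.group(0))    # prints: foo
print(m.end(0))      # prints: 3  -- the next token starts here

# after skipping the space at index 3, the number pattern matches at index 4
m2 = number.match(source, 4)
print(m2.group(0))   # prints: 42
```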

A pretty traditional lexer can work like this:

  1. Get a character from somewhere, e.g. a file or buffer.
  2. Check what the current character is:
    • Is it whitespace? Skip the whitespace.
    • Is it a comment-introduction character? If so, skip the comment.
    • Is it a digit? Try to read a number.
    • Is it a `"`? Try to read a string.
    • Is it a letter? Try to read an identifier.
      • Is the identifier a keyword/reserved word?
    • Otherwise, is it a valid operator sequence?
  3. Return the token type.
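The steps above can be sketched as a minimal hand-written lexer. This is an illustrative subset (the token categories, comment syntax, and operator set are assumptions, not from the original code):

```python
def hand_lex(source):
    """Minimal character-at-a-time lexer: numbers, identifiers, operators."""
    keywords = {'if', 'while', 'return'}    # illustrative reserved words
    tokens = []
    pos = 0
    while pos < len(source):
        ch = source[pos]
        if ch.isspace():                    # whitespace? skip it
            pos += 1
        elif ch == '#':                     # comment character? skip to end of line
            while pos < len(source) and source[pos] != '\n':
                pos += 1
        elif ch.isdigit():                  # digit? try a number
            start = pos
            while pos < len(source) and source[pos].isdigit():
                pos += 1
            tokens.append((source[start:pos], 'NUMBER'))
        elif ch.isalpha() or ch == '_':     # letter? try an identifier
            start = pos
            while pos < len(source) and (source[pos].isalnum() or source[pos] == '_'):
                pos += 1
            text = source[start:pos]
            # is the identifier a keyword/reserved word?
            tokens.append((text, 'KEYWORD' if text in keywords else 'IDENT'))
        elif ch in '+-*/=':                 # otherwise, a valid operator?
            tokens.append((ch, 'OP'))
            pos += 1
        else:
            raise SyntaxError('illegal character: %s' % ch)
    return tokens
```

For example, `hand_lex("x = 41 + 1")` yields `[('x', 'IDENT'), ('=', 'OP'), ('41', 'NUMBER'), ('+', 'OP'), ('1', 'NUMBER')]`.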

Instead of checking single characters at a time, you can of course use regular expressions.
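With regular expressions, the per-character checks collapse into a table of (pattern, tag) pairs tried in order at the current position, which is exactly the shape of the asker's code. A self-contained sketch (the token set here is made up for illustration):

```python
import re

# each entry: (regex pattern, tag); a None tag means "match but emit no token"
token_exprs = [
    (r'[ \n\t]+',     None),      # whitespace: consumed, discarded
    (r'[0-9]+',       'NUMBER'),
    (r'[A-Za-z_]\w*', 'IDENT'),
    (r'[+\-*/=]',     'OP'),
]

def regex_lex(characters):
    pos, tokens = 0, []
    while pos < len(characters):
        for pattern, tag in token_exprs:
            match = re.compile(pattern).match(characters, pos)
            if match:
                if tag:
                    tokens.append((match.group(0), tag))
                pos = match.end(0)    # advance past the matched text
                break
        else:                         # no pattern matched at this position
            raise SyntaxError('illegal character: %s' % characters[pos])
    return tokens
```

Note how ordering the table puts more specific patterns first; `regex_lex("x = 1")` gives `[('x', 'IDENT'), ('=', 'OP'), ('1', 'NUMBER')]`.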


The best way to learn how a hand-written lexer works is (IMO) to find simple existing lexers and try to understand them.

