python - Creating a Lexer
Hey guys, I'm trying to understand some concepts regarding lexers. I understand that lexers are used in compilers to separate the individual characters of a string into a form known as tokens. The thing that confuses me is the matching part: I don't understand the logic of why we need to match characters at their corresponding position.
```python
import re
import sys

def lex(characters, token_exprs):
    pos = 0
    tokens = []
    while pos < len(characters):
        match = None
        for token_expr in token_exprs:
            pattern, tag = token_expr
            regex = re.compile(pattern)
            match = regex.match(characters, pos)
            if match:
                text = match.group(0)
                if tag:
                    token = (text, tag)
                    tokens.append(token)
                break
        if not match:
            sys.stderr.write('Illegal character: %s\n' % characters[pos])
            sys.exit(1)
        else:
            pos = match.end(0)
    return tokens
```
This is the code I don't understand. After the while loop, I don't quite grasp what the code is trying to do. Why does it have to match characters at a position?
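To make the matching concrete, here is a cleaned-up, runnable version of that function together with a small token table and sample input (the table and input are my own illustration, not from the question). `regex.match(characters, pos)` anchors each attempt at `pos`, which is why the loop can walk through the string one token at a time.

```python
import re
import sys

def lex(characters, token_exprs):
    """Scan `characters` left to right, producing (text, tag) tokens."""
    pos = 0
    tokens = []
    while pos < len(characters):
        match = None
        for pattern, tag in token_exprs:
            regex = re.compile(pattern)
            match = regex.match(characters, pos)  # try to match exactly at pos
            if match:
                text = match.group(0)
                if tag:                           # a tag of None means "discard"
                    tokens.append((text, tag))
                break
        if not match:
            sys.stderr.write('Illegal character: %s\n' % characters[pos])
            sys.exit(1)
        pos = match.end(0)                        # advance past the matched text
    return tokens

# A made-up token table for illustration; rules are tried in order,
# and the first one that matches at the current position wins.
token_exprs = [
    (r'[ \t\n]+', None),                  # whitespace: matched, then dropped
    (r'[0-9]+', 'INT'),
    (r'[A-Za-z_][A-Za-z0-9_]*', 'ID'),
    (r'\+', 'PLUS'),
]

print(lex('x + 42', token_exprs))
# [('x', 'ID'), ('+', 'PLUS'), ('42', 'INT')]
```

Each iteration either consumes one token starting at `pos` and moves `pos` past it, or reports the character at `pos` as illegal; without anchoring at `pos`, a rule could match text later in the string and the lexer would silently skip input.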
A pretty traditional lexer can work like this:
- Get a character from somewhere, e.g. a file or a buffer.
- Check what the current character is:
  - Is it whitespace? Skip the whitespace.
  - Is it a comment-introduction character? If so, skip the comment.
  - Is it a digit? Try to scan a number.
  - Is it a `"`? Try to scan a string.
  - Is it a letter? Try to scan an identifier.
    - Is the identifier a keyword/reserved word?
  - Otherwise, is it a valid operator sequence?
- Return the token type.
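The steps above can be sketched as a character-at-a-time lexer. This is only an illustration of the outline, not any particular implementation; the names `next_token`, `tokenize`, `KEYWORDS`, and `OPERATORS`, and the choice of `#` as the comment character, are all my own assumptions.

```python
# Illustrative tables; real lexers would have the full language's sets here.
KEYWORDS = {'if', 'else', 'while'}
OPERATORS = {'+', '-', '*', '/', '=', '(', ')'}

def next_token(text, pos):
    """Return (token_type, lexeme, new_pos), or None at end of input."""
    n = len(text)
    # 1. Skip whitespace.
    while pos < n and text[pos].isspace():
        pos += 1
    if pos >= n:
        return None
    ch = text[pos]
    # 2. Comment-introduction character? Skip to end of line, then retry.
    if ch == '#':
        while pos < n and text[pos] != '\n':
            pos += 1
        return next_token(text, pos)
    # 3. Digit? Try to scan a number.
    if ch.isdigit():
        start = pos
        while pos < n and text[pos].isdigit():
            pos += 1
        return ('NUMBER', text[start:pos], pos)
    # 4. Quote? Try to scan a string literal.
    if ch == '"':
        start = pos
        pos += 1
        while pos < n and text[pos] != '"':
            pos += 1
        return ('STRING', text[start:pos + 1], pos + 1)
    # 5. Letter? Scan an identifier, then check the keyword table.
    if ch.isalpha() or ch == '_':
        start = pos
        while pos < n and (text[pos].isalnum() or text[pos] == '_'):
            pos += 1
        word = text[start:pos]
        return ('KEYWORD' if word in KEYWORDS else 'ID', word, pos)
    # 6. Otherwise, is it a valid operator?
    if ch in OPERATORS:
        return ('OP', ch, pos + 1)
    raise SyntaxError('illegal character: %r' % ch)

def tokenize(text):
    tokens, pos = [], 0
    while True:
        tok = next_token(text, pos)
        if tok is None:
            return tokens
        kind, lexeme, pos = tok
        tokens.append((kind, lexeme))

print(tokenize('if x = 42  # comment'))
# [('KEYWORD', 'if'), ('ID', 'x'), ('OP', '='), ('NUMBER', '42')]
```

The dispatch on the first character is the whole trick: one look at `text[pos]` decides which scanning routine runs, and each routine consumes exactly the characters belonging to its token.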
Instead of checking single characters at a time, you can of course use regular expressions.
The best way to learn how a hand-written lexer works is (IMO) to find simple existing lexers and try to understand them.