
About Tokens and Lexemes


  1. About tokens and lexemes: Ben Scholzen, Game Developer, Gameforge Productions GmbH
  2. What we'll cover
    • Definition of a compiler, tokenizer and parser
    • Basic structure of a tokenizer and a parser
    • Where to optimize things for PHP
  3. What about parser generators?
  4. They are evil!
    • PHP_LexerGenerator, PHP_ParserGenerator, lemon-PHP
    • Generate lots of function calls, just like lemon parsers in C
    • Do not perform well
    • Will eat up all your memory
  5. Conclusion
      Don't use them!
  6. Let's get started
  7. What a compiler is and how it works
    • Acts as the front end of the application
    • Converts human-readable data into machine-readable data
    • Consists of two components:
      • The lexer:
        • Is a finite-state machine
        • Reads the input stream
        • Cleans up the input data
        • Creates a list of tokens
      • The parser:
        • Gets tokens from the lexer
        • Converts them into a data structure
  8. What a compiler is and how it works
    [Diagram: document (stream) → lexer → tokens → parser → structure]
  9. Sounds great, but where do I need it?
    • Formatting languages
      • BBCode
      • Wiki markup
    • Description languages
      • iCalendar / vCalendar
      • XML
    • Even programming languages
      • JavaScript
      • PHP
    • Anything else you want your program to understand
  10. The lexer (or tokenizer)
  11. What are tokens?
    • Categorized block of text
      • Token type
      • Corresponding block of text (lexeme)
    • List of tokens represents an entire document
    • Example in PHP: $value = 5 * 7 ;
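For PHP source specifically, PHP ships its own tokenizer. token_get_all() from the tokenizer extension, which most builds enable, returns exactly these type/lexeme pairs for the slide's example. It is PHP's tokenizer for PHP code, not the hand-written lexer this talk builds. Skipping whitespace and labelling single-character tokens as "char" are choices made only for this sketch.

```php
<?php
// PHP's built-in tokenizer (ext/tokenizer, enabled by default in most builds).
$source = '<?php $value = 5 * 7;';

$pairs = [];
foreach (token_get_all($source) as $token) {
    if (is_array($token)) {
        // Array form: [token id, lexeme, line number]
        [$id, $text] = $token;
        if ($id === T_OPEN_TAG || $id === T_WHITESPACE) {
            continue; // skip tokens a parser would not need
        }
        $pairs[] = [token_name($id), $text];
    } else {
        // Single-character tokens such as "=", "*" and ";" come back as plain strings
        $pairs[] = ['char', $token];
    }
}

// Print one "type  lexeme" line per token
foreach ($pairs as [$type, $lexeme]) {
    echo str_pad($type, 12), $lexeme, PHP_EOL;
}
```

The result has six pairs: T_VARIABLE "$value", "=", T_LNUMBER "5", "*", T_LNUMBER "7" and ";".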
  12. How the tokenizer works
    • Define possible states of the lexer
    • Tokenize the input in a loop
      • Scan with preg_match()
        • strtok() is usually too simple
        • Reading char-by-char is too slow
        • Use the offset parameter
        • Use the \G assertion (^ won't work)
      • Always store the current position
      • Use either a switch statement or a structured array
    • Return the tokens
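The loop described in this slide can be sketched as below. This is a minimal sketch under stated assumptions, not the talk's own code. The function name tokenize(), the token patterns and the exception on unknown input are all hypothetical. It uses a structured array of patterns and anchors each pattern with \G. It passes the stored position to preg_match() through the offset parameter and drops whitespace, so the output has the same shape as the var_dump() in slide 23.

```php
<?php
/**
 * Hypothetical regex-driven lexer: returns a list of [type, lexeme] pairs.
 */
function tokenize(string $input): array
{
    // Structured array: token type => pattern. \G pins each match to $offset.
    $patterns = [
        'whitespace' => '/\G\s+/',
        'variable'   => '/\G\$[A-Za-z_][A-Za-z0-9_]*/',
        'number'     => '/\G\d+(?:\.\d+)?/',
        'operator'   => '/\G[=+\-*\/;]/',
    ];

    $tokens = [];
    $offset = 0;                 // always store the current position
    $length = strlen($input);

    while ($offset < $length) {
        foreach ($patterns as $type => $pattern) {
            if (preg_match($pattern, $input, $match, 0, $offset)) {
                $offset += strlen($match[0]);
                if ($type !== 'whitespace') {   // clean up the input
                    $tokens[] = [$type, $match[0]];
                }
                continue 2;                     // next iteration of the while loop
            }
        }
        // No pattern matched at the current position
        throw new RuntimeException(sprintf('Unexpected "%s" at offset %d', $input[$offset], $offset));
    }

    return $tokens;
}

var_dump(tokenize('$value = 5 * 7;'));
```

The \G assertion matters here. With ^ the patterns could only ever match at the very start of the string, because preg_match() does not treat the offset as the start of the subject.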
  13. What we can optimize
    • Use little memory
      • Read only part of the document into memory at a time
        • Via fopen() and fgets()
        • Requires knowing in advance where tokens end
      • Offer a method that lets the parser fetch tokens in batches
    • Speed up execution time
      • Avoid internal function calls where possible
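The low-memory idea can be sketched as follows. It assumes no token ever spans a line break, which is the advance knowledge of where tokens end that the slide calls for. fgets() then keeps only one line in memory, and a generator hands the parser tokens in batches. The names tokenBatches() and TOKEN_PATTERNS and the batch size are hypothetical. php://memory stands in for a real file opened with fopen().

```php
<?php
// Hypothetical token patterns; \G pins each match to the current offset.
const TOKEN_PATTERNS = [
    'whitespace' => '/\G\s+/',
    'variable'   => '/\G\$[A-Za-z_]\w*/',
    'number'     => '/\G\d+/',
    'operator'   => '/\G[=+\-*\/;]/',
];

/**
 * Reads the stream line by line and yields batches of at most $batchSize
 * [type, lexeme] pairs. Assumes no token spans a line break.
 */
function tokenBatches($handle, int $batchSize = 64): Generator
{
    $batch = [];
    while (($line = fgets($handle)) !== false) {   // only one line in memory
        $offset = 0;
        $length = strlen($line);
        while ($offset < $length) {
            foreach (TOKEN_PATTERNS as $type => $pattern) {
                if (preg_match($pattern, $line, $m, 0, $offset)) {
                    $offset += strlen($m[0]);
                    if ($type !== 'whitespace') {
                        $batch[] = [$type, $m[0]];
                        if (count($batch) === $batchSize) {
                            yield $batch;          // parser gets a partial bunch of tokens
                            $batch = [];
                        }
                    }
                    continue 2;
                }
            }
            throw new RuntimeException("Unexpected input at offset $offset");
        }
    }
    if ($batch !== []) {
        yield $batch;                               // flush the remainder
    }
}

$handle = fopen('php://memory', 'r+');
fwrite($handle, "\$a = 1;\n\$b = \$a * 7;\n");
rewind($handle);

foreach (tokenBatches($handle, 4) as $i => $batch) {
    echo "batch $i: ", count($batch), " tokens\n";
}
fclose($handle);
```

The trade-off is that the batch size bounds how many tokens the parser sees at once. A grammar that needs long lookahead would have to buffer batches itself.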
  14. Going into practice
  15. The beginning
    • Use little memory
      • Via fopen() and fread()
        • Requires knowing in advance where tokens end
      • Offer a method that lets the parser fetch tokens in batches
    • Speed up execution time
      • Avoid internal function calls where possible
  16. Throwing in a file
  17. Preparing stuff
  18. Base state
  19. Operator state
  20. Value state
  21. Rounding it up
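The slide titles point to a lexer that switches between a base, an operator and a value state. The sketch below is a guess at such a design, not a reconstruction of the talk's code. The three states, their transitions and the function name lexWithStates() are all assumptions. Tracking the state lets the lexer reject malformed input such as "5 = $x;" early, because in this sketch a statement must begin with a variable.

```php
<?php
/**
 * Hypothetical state-driven lexer using a switch statement per state:
 *   base     -> expects the variable that starts a statement
 *   operator -> expects an operator after an operand
 *   value    -> expects a number or variable after an operator
 */
function lexWithStates(string $input): array
{
    $state  = 'base';
    $offset = 0;
    $length = strlen($input);
    $tokens = [];

    while ($offset < $length) {
        // Whitespace is skipped in every state
        if (preg_match('/\G\s+/', $input, $m, 0, $offset)) {
            $offset += strlen($m[0]);
            continue;
        }

        switch ($state) {
            case 'base':
                if (!preg_match('/\G\$\w+/', $input, $m, 0, $offset)) {
                    throw new RuntimeException("Expected variable at offset $offset");
                }
                $tokens[] = ['variable', $m[0]];
                $state = 'operator';
                break;

            case 'operator':
                if (!preg_match('/\G[=+\-*\/;]/', $input, $m, 0, $offset)) {
                    throw new RuntimeException("Expected operator at offset $offset");
                }
                $tokens[] = ['operator', $m[0]];
                // ";" ends the statement; any other operator needs a value next
                $state = ($m[0] === ';') ? 'base' : 'value';
                break;

            case 'value':
                if (preg_match('/\G\d+/', $input, $m, 0, $offset)) {
                    $tokens[] = ['number', $m[0]];
                } elseif (preg_match('/\G\$\w+/', $input, $m, 0, $offset)) {
                    $tokens[] = ['variable', $m[0]];
                } else {
                    throw new RuntimeException("Expected value at offset $offset");
                }
                $state = 'operator';
                break;
        }

        // $m holds the match from whichever branch succeeded
        $offset += strlen($m[0]);
    }

    return $tokens;
}

print_r(lexWithStates('$value = 5 * 7;'));
```

For "$value = 5 * 7;" this yields the same six [type, lexeme] pairs as slide 23. Each state only tries the patterns that are legal at that point, which also saves preg_match() calls.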
  22. Some actual testing
  23. And what we get
    array(6) {
      [0]=>
      array(2) {
        [0]=>
        string(8) "variable"
        [1]=>
        string(6) "$value"
      }
      [1]=>
      array(2) {
        [0]=>
        string(8) "operator"
        [1]=>
        string(1) "="
      }
      [2]=>
      array(2) {
        [0]=>
        string(6) "number"
        [1]=>
        string(1) "5"
      }
      [3]=>
      array(2) {
        [0]=>
        string(8) "operator"
        [1]=>
        string(1) "*"
      }
      [4]=>
      array(2) {
        [0]=>
        string(6) "number"
        [1]=>
        string(1) "7"
      }
      [5]=>
      array(2) {
        [0]=>
        string(8) "operator"
        [1]=>
        string(1) ";"
      }
    }
  24. The parser
  25. So we have a bunch of tokens, what now?
    • Loop through the tokens and analyze them
    • Build an object-oriented tree structure, or interpret the tokens directly
    • Avoid non-tail recursion
      • Use tail recursion driven by a trampoline instead (PHP does not optimize tail calls itself)
      • Keeps you from hitting the stack limit
    • That's it!
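A trampoline works as follows. A tail-recursive step returns a closure describing its next call instead of making that call. A driver loop keeps invoking closures until it gets a plain value, so the PHP call stack never grows. The sketch below applies this to interpreting a flat expression token list.

Several parts are hypothetical:
- the names trampoline() and evaluate()
- strict left-to-right evaluation with no operator precedence, which is a deliberate simplification

```php
<?php
/**
 * Calls $step, then keeps calling returned closures until a non-closure
 * value appears. Each step returns before the next runs, so the stack stays flat.
 */
function trampoline(callable $step, ...$args)
{
    $result = $step(...$args);
    while ($result instanceof Closure) {
        $result = $result();
    }
    return $result;
}

/**
 * Hypothetical tail-recursive interpreter for tokens like
 * [number 5] [operator *] [number 7], evaluated left to right (no precedence).
 * $i points at the next operator; $acc is the value so far.
 */
function evaluate(array $tokens, int $i, $acc)
{
    if ($i >= count($tokens)) {
        return $acc;                                   // base case: plain value
    }
    if (!isset($tokens[$i + 1])) {
        throw new RuntimeException("Missing operand after token $i");
    }
    [, $op] = $tokens[$i];
    $operand = (int) $tokens[$i + 1][1];
    switch ($op) {
        case '+': $acc += $operand; break;
        case '-': $acc -= $operand; break;
        case '*': $acc *= $operand; break;
        case '/': $acc /= $operand; break;
        default:  throw new RuntimeException("Unknown operator $op");
    }
    // Tail call, deferred: the trampoline invokes it after this frame returns
    return function () use ($tokens, $i, $acc) {
        return evaluate($tokens, $i + 2, $acc);
    };
}

// The right-hand side of slide 23's example, without "$value =" and ";"
$tokens = [['number', '5'], ['operator', '*'], ['number', '7']];
echo trampoline('evaluate', $tokens, 1, (int) $tokens[0][1]), PHP_EOL; // 35
```

A direct recursive evaluate() would add one stack frame per operator. The trampolined version handles tens of thousands of operators with constant stack depth. One caveat of this simple driver is that a computation whose final result is itself a Closure cannot be distinguished from a pending step.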
  26. Summary — Questions?
  27. Where to go from here
    • Wikipedia: http://en.wikipedia.org/wiki/Compiler http://en.wikipedia.org/wiki/Parsing
    • About tail-recursion in PHP: http://www.alternateinterior.com/2006/09/tail-recursion-in-php.html
    • My blog: http://www.dasprids.de
    • Rate this talk: http://joind.in/635
    • Follow me on Twitter: http://www.twitter.com/dasprid
  28. Thank you!
