DocIdeas

Commit

Commit MetaInfo

Revision	a47ed1afefeef2d5cfd19dd685f75db9a356e3be
Time	2022-05-17 00:10:49
Author	Albert Mietus < albert AT mietus DOT nl >
Committer	Albert Mietus < albert AT mietus DOT nl >

Log Message

asis

Change Summary

add: CCastle/Design/syntax/2.grammar_code.rst
add: CCastle/Usage/1.CompilerCompiler.rst
delete: CCastle/Usage/CompilerCompiler.rst

Incremental Difference

diff -r 073412a6dd1e -r a47ed1afefee CCastle/Design/syntax/2.grammar_code.rst

--- /dev/null Thu Jan 01 00:00:00 1970 +0000

+++ b/CCastle/Design/syntax/2.grammar_code.rst Mon May 16 17:10:49 2022 +0200

		@@ -0,0 +1,218 @@
	1	+.. _grammmar-code:
	2	+
	3	+===============
	4	+Grammar is code
	5	+===============
	6	+
	7	+.. post:: 2022/05/14
	8	+ :category: Castle, DesignStudy
	9	+ :tags: Castle, grammar, PEG
	10	+
	11	+ In :ref:`Castle-CompilerCompiler` we have seen that a grammar can be defined within a Castle program, and we have
	12	+ argued that each grammar rule can be considered a function.
	13	+
	14	+ In this post, we look into the details of how this works, and confirm that grammars are code ...
	15	+
	16	+Let’s start with an example. The grammar below is a simplified description of what a (PEG) parsing rule looks like. The
	17	+syntax in Castle is a bit more complicated, as it has more details and options, but it is very similar. Most
	18	+Castle grammars can be parsed by the grammar below!
	19	+
	20	+.. code-block:: PEG
	21	+
	22	+ PEG_rule <- rule_name '<-' expression ';' ;
	23	+ expression <- ordered_choice+ ;
	24	+ ordered_choice <- sequence ( '|' sequence)* ;
	25	+ sequence <- group | atom ;
	26	+ group <- '(' expression ')' ;
	27	+ atom <- rule_xref | str_lit | regexp ;
	28	+ str_lit <- '"' /[^"]* '"' ;
	29	+ regexp <- '/' /[^/]* '/' ;
	30	+ rule_name <- ID ;
	31	+ rule_xref <- ID ;
	32	+
	33	+With this grammar we can read and check whether a string is a valid rule by simply calling:
	34	+:math:`ast:=PEG\_rule(single\_line)`. If it is not, ``ast`` is :math:`False`; otherwise ``ast`` has 4 children (or fields).
	35	+Here :data:`ast[0]` represents the ``rule_name`` and :data:`ast[2]` is an ``expression``, which is a sequence of
	36	+``ordered_choice``\(s).
	37	+|BR|
	38	+As the rule has two constants, :data:`ast[1]` and :data:`ast[3]` will always match the strings ‘<-’ and ‘;’.
	39	+
	40	+
	41	+Grammar to code
	42	+===============
	43	+
	44	+Before we study how Castle expands a grammar into code, let’s examine how to do that with manually crafted code, or with an
	45	+external compiler-compiler. In both cases, we focus on the “validation” part only, and we skip a lot of details,
	46	+assuming you know a bit of the theory of (PEG) parsers. If not, read Guido van Rossum’s excellent blog series `PEG Parsing
	47	+Series Overview <https://medium.com/@gvanrossum_83706/peg-parsing-series-de5d41b2ed60>`__.
	48	+
	49	+
	50	+Hand written code
	51	+-----------------
	52	+
	53	+Again, let’s suppose we would like to verify that a string contains a PEG rule. Then we need some functions [#func]_, whose signatures
	54	+are something like:
	55	+
	56	+.. tabs::
	57	+
	58	+   .. code-tab:: python
	59	+
	60	+      def PEG_rule(text: str) -> ast: ...
	61	+      def expression(text: str) -> ast: ...
	62	+      def ordered_choice(text: str) -> ast: ...
	63	+      def sequence(text: str) -> ast: ...
	64	+      ...
	65	+
	66	+   .. code-tab:: c
	67	+
	68	+      ast PEG_rule(char* txt);
	69	+      ast expression(char* txt);
	70	+      ast ordered_choice(char* txt);
	71	+      ast sequence(char* txt);
	72	+      ...
	73	+
	74	+Here we assume the type ``ast`` is defined in the chosen language, and that it is Null/None/empty to signal :math:`False`:
	75	+not valid.
	76	+
	77	+The implementation of ``PEG_rule()`` checks that the input string starts with a ``rule_name``, followed by the literal
	78	+string “<-”, then has an ``expression``, and finally the literal string “;”. When one of those (four) steps fails,
	79	+we return :math:`False`.
	80	+|BR|
	81	+We follow this pattern for all rules.
	82	+
	83	+This concept is easy to implement: each rule-function calls other rule-functions as defined by the grammar. When we
	84	+need to check for a literal string, we use :func:`expect(txt: str, literal: str) -> bool`. We also need a function to find
	85	+an ID; again easy. Sometimes we need to implement a loop, to handle ``*`` and ``+``, or an :math:`or`, to
	86	+implement alternatives (``|``). None of that is rocket science.
	87	+
	88	+A real implementation is a bit harder, as we have to skip whitespace (and comments), handle newlines, and keep
	89	+track of where we are. Nowadays, that is typically done by embedding those functions in a class; then the “input text” can be
	90	+stored in the instance (instead of passing it around constantly). That instance also holds a ‘cursor’ to the current
	91	+location.
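The pattern described above can be sketched in Python. This is a minimal, hypothetical illustration for the simplified grammar, not Castle’s generated code: the class ``PEGRuleParser``, its helpers ``skip_ws`` and ``match_re``, and the wrapper ``parse_rule`` are made-up names, and ``expect`` is the method form of the ``expect(txt, literal)`` mentioned above, because the text now lives in the instance. It assumes whitespace is the only thing to skip (no comments), uses ``None`` for :math:`False`, and represents the ``ast`` as nested tuples and lists.

```python
import re
from typing import Optional


class PEGRuleParser:
    """Hypothetical hand-written recursive-descent parser for the simplified PEG grammar.

    Every grammar rule is a method. On success it returns a (truthy) tree and
    advances self.pos; on failure it returns None and leaves self.pos at the start.
    """

    ID = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
    STR = re.compile(r'"[^"]*"')     # str_lit: a double-quoted string
    REGEXP = re.compile(r"/[^/]*/")  # regexp: text between two slashes

    def __init__(self, text: str):
        self.text = text  # the input text lives in the instance ...
        self.pos = 0      # ... together with a cursor into it

    def skip_ws(self) -> None:
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def expect(self, literal: str) -> Optional[str]:
        """Match a literal string at the cursor (after skipping whitespace)."""
        self.skip_ws()
        if self.text.startswith(literal, self.pos):
            self.pos += len(literal)
            return literal
        return None

    def match_re(self, pattern: re.Pattern) -> Optional[str]:
        self.skip_ws()
        m = pattern.match(self.text, self.pos)
        if m is None:
            return None
        self.pos = m.end()
        return m.group()

    # -- one method per grammar rule ---------------------------------------
    def PEG_rule(self):          # rule_name '<-' expression ';'
        start = self.pos
        name = self.rule_name()
        arrow = name and self.expect("<-")
        expr = arrow and self.expression()
        semi = expr and self.expect(";")
        if semi:
            return (name, arrow, expr, semi)   # 4 children: ast[0] .. ast[3]
        self.pos = start                       # one step failed: backtrack
        return None

    def expression(self):        # ordered_choice+
        choices = []
        while (choice := self.ordered_choice()) is not None:
            choices.append(choice)
        return choices or None

    def ordered_choice(self):    # sequence ( '|' sequence )*
        first = self.sequence()
        if first is None:
            return None
        alternatives = [first]
        while True:
            start = self.pos
            alt = self.sequence() if self.expect("|") else None
            if alt is None:
                self.pos = start               # do not consume a dangling '|'
                return ("choice", alternatives)
            alternatives.append(alt)

    def sequence(self):          # group | atom
        return self.group() or self.atom()

    def group(self):             # '(' expression ')'
        start = self.pos
        if self.expect("(") and (expr := self.expression()) and self.expect(")"):
            return ("group", expr)
        self.pos = start
        return None

    def atom(self):              # rule_xref | str_lit | regexp
        for kind, pattern in (("xref", self.ID), ("str", self.STR), ("re", self.REGEXP)):
            if (value := self.match_re(pattern)) is not None:
                return (kind, value)
        return None

    def rule_name(self):         # ID
        return self.match_re(self.ID)


def parse_rule(text: str):
    """Return the ast of one PEG rule, or None when text is not (exactly) a valid rule."""
    parser = PEGRuleParser(text)
    ast = parser.PEG_rule()
    parser.skip_ws()
    return ast if parser.pos == len(text) else None
```

For example, ``parse_rule('greeting <- "hello" (name | "world") ;')`` returns a 4-tuple starting with ``'greeting'`` and ending with ``';'``, while ``parse_rule('greeting "hello" ;')`` returns ``None`` because the ``<-`` step fails.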
	92	+
	93	+More details
	94	+~~~~~~~~~~~~
	95	+
	96	+There are a lot of details that make writing a grammar complex. We mention a few, and their effect on the (manually
	97	+written) code.
	98	+
	99	+When using alternatives (the ``|`` operator in the grammar), a PEG parser always tries the first alternative first.
	100	+Only when that fails does it back up and try the next alternative. Sometimes that means (almost) starting again, and parsing the same file almost
	101	+completely again. Therefore the packrat algorithm, which uses memoization, is usually applied.
	102	+|BR|
	103	+This is not hard: just add a few lines of boilerplate before and after each call, to store intermediate partial AST(s) in a
	104	+cache.
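That packrat boilerplate can be factored into a decorator. The sketch below is a hypothetical illustration (``memoize``, ``Demo`` and the ``calls`` counter are made-up names): for every (rule, start position) it caches the result and the end position, so backtracking into the same rule at the same place costs a dictionary lookup instead of a re-parse.

```python
import functools


def memoize(rule):
    """Packrat boilerplate: cache (result, end position) per (rule name, start position)."""
    @functools.wraps(rule)
    def wrapper(self):
        key = (rule.__name__, self.pos)
        if key not in self.memo:
            result = rule(self)                 # parse for real, once per position
            self.memo[key] = (result, self.pos)
        result, self.pos = self.memo[key]       # replay: result and where the rule stopped
        return result
    return wrapper


class Demo:
    """Hypothetical parser for:  sum_or_diff <- number '+' number | number '-' number"""

    def __init__(self, text: str):
        self.text, self.pos, self.memo = text, 0, {}
        self.calls = 0                          # counts real (non-cached) parses of number

    @memoize
    def number(self):
        self.calls += 1
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos].isdigit():
            self.pos += 1
        return self.text[start:self.pos] if self.pos > start else None

    def sum_or_diff(self):
        for op in "+-":                         # ordered choice: try '+' first, then '-'
            start = self.pos
            left = self.number()
            if left and self.text.startswith(op, self.pos):
                self.pos += 1
                right = self.number()
                if right:
                    return (op, left, right)
            self.pos = start                    # back up and try the next alternative
        return None
```

Parsing ``"12-34"`` tries the ``'+'`` alternative first, fails, and backs up; the second attempt gets ``number`` at position 0 from the cache, so ``number`` is really parsed only twice instead of three times.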
	105	+
	106	+Sometimes we would like to use another parsing strategy, like LALR_ (used by Yacc_), GLR_ (e.g. Bison, the successor of Yacc_)
	107	+or `LL(k)`_ (used by ANTLR, which was popular for a while); each one has its pros and cons. Still, all (or almost all)
	108	+start with the same grammar (although smarter strategies may result in shorter, easier to maintain [#maintain]_
	109	+grammars) [#notation]_.
	110	+
	111	+For a long time, PEG parsers were not able to handle left-recursive rules [#leftStack]_, until it was shown that packrat
	112	+parsers can support it after all. Grammars in Castle can be left recursive! Both direct and indirect recursion are allowed. It is possible to
	113	+rewrite a grammar to remove the recursion; however, that makes the grammar more complex to maintain *(and for that reason
	114	+Castle will support left recursion!)*
	115	+|BR|
	116	+For example, a simple calculation such as :math:`7-5-3` should result in :math:`((7-5)-3)`, but a grammar that says so directly needs left
	117	+recursion. When rewriting it, you must be careful not to get :math:`(7-(5-3))`!
	118	+|BR|
	119	+This can be fixed by adding an extra step. But it is better to use the updated PEG strategy: just add more boilerplate code!
	120	+
	121	+.. tabs::
	122	+
	123	+   .. code-tab:: PEG Direct recursion
	124	+
	125	+      expr <- expr '-' term | term
	126	+
	127	+   .. code-tab:: PEG Indirect recursion
	128	+
	129	+      A <- B "a" | "a"
	130	+      B <- A "b" | "b"
	131	+
	132	+   .. code-tab:: PEG A rewritten grammar
	133	+
	134	+      expr <- term ( '-' term )*
	135	+
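Both ways to get :math:`((7-5)-3)` can be sketched in Python. The class ``Calc``, its methods and ``evaluate`` are hypothetical. ``expr_loop`` implements the rewritten grammar and builds the tree by folding to the left. ``expr_rec`` keeps the left-recursive rule and “grows a seed” through the memo: the recursive call first fails, then each round reuses the previous (shorter) result, until the match stops getting longer. This is similar in spirit to the memo-based approach in Guido van Rossum’s PEG series, but the sketch only handles direct left recursion; indirect recursion needs more bookkeeping.

```python
class Calc:
    """Hypothetical mini-parser for subtraction, in two variants:

    expr_loop: the rewritten grammar    expr <- term ( '-' term )*
    expr_rec:  the left-recursive one   expr <- expr '-' term | term
    Trees are nested tuples ("-", left, right); a term is an int.
    """

    def __init__(self, text: str):
        self.text = text.replace(" ", "")   # keep the sketch short: just drop spaces
        self.pos = 0
        self.memo = {}

    def term(self):                         # term <- [0-9]+
        start = self.pos
        while self.pos < len(self.text) and self.text[self.pos].isdigit():
            self.pos += 1
        return int(self.text[start:self.pos]) if self.pos > start else None

    def expr_loop(self):
        tree = self.term()
        while tree is not None and self.text.startswith("-", self.pos):
            start = self.pos
            self.pos += 1
            right = self.term()
            if right is None:
                self.pos = start            # a dangling '-': leave it unconsumed
                break
            tree = ("-", tree, right)       # fold to the LEFT: ((7-5)-3), not (7-(5-3))
        return tree

    def expr_rec(self):
        start = self.pos
        key = ("expr", start)
        if key in self.memo:                # recursive call at the same position:
            result, self.pos = self.memo[key]   # answer with the current seed
            return result
        self.memo[key] = (None, start)      # seed: the left-recursive call fails first
        best, best_end = None, start
        while True:                         # grow the seed while the match gets longer
            self.pos = start
            result = self._expr_alternatives()
            if result is None or self.pos <= best_end:
                break
            best, best_end = result, self.pos
            self.memo[key] = (best, best_end)
        self.pos = best_end
        return best

    def _expr_alternatives(self):           # expr '-' term | term
        start = self.pos
        left = self.expr_rec()
        if left is not None and self.text.startswith("-", self.pos):
            self.pos += 1
            right = self.term()
            if right is not None:
                return ("-", left, right)
        self.pos = start
        return self.term()


def evaluate(tree):
    if isinstance(tree, int):
        return tree
    _, left, right = tree
    return evaluate(left) - evaluate(right)
```

Both variants return ``("-", ("-", 7, 5), 3)`` for ``"7-5-3"``, which evaluates to :math:`-1`; the wrongly nested :math:`(7-(5-3))` would give :math:`5`.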
	136	+
	137	+
	138	+Generating the code
	139	+===================
	140	+
	141	+You might recognise the pattern: to make the grammar more useful, the algorithms become more complex and add more
	142	+code. This “extra” code, however, is not hard; you just need the same (or almost the same) lines in many places.
	143	+|BR|
	144	+This calls for automation, and that is exactly what most compiler-compilers do.
	145	+
	146	+A compiler-compiler reads the grammar and generates the code. As shown above, it generates (C, C++, C#, Java,
	147	+Python, or ...) functions [#OrTables]_ that call each other. It can also detect left recursion, and might compensate for
	148	+that. The result: more boilerplate code; but as it is generated automatically, that is no burden.
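To make “generates functions that call each other” concrete, here is a toy generator in Python. It is hypothetical and deliberately tiny: ``generate_rule`` and its step format are made up, and it only handles a plain sequence of literals and rule references (no alternatives, loops or memoization). It emits Python source text that follows the pattern described in “Hand written code”: try each step in order; on the first failure, restore the cursor and return ``None``.

```python
def generate_rule(name: str, steps: list) -> str:
    """Emit Python source for one rule that is a plain sequence of steps.

    steps: list of ("lit", "text") or ("ref", "rule_name") tuples (a made-up format).
    The generated method assumes an instance with .pos and .expect(), like a
    hand-written parser class, and returns (name, children) or None.
    """
    lines = [
        f"def {name}(self):",
        "    start = self.pos",
        "    children = []",
    ]
    for kind, value in steps:
        call = f"self.expect({value!r})" if kind == "lit" else f"self.{value}()"
        lines += [                              # the same boilerplate for every step
            f"    child = {call}",
            "    if child is None:",
            "        self.pos = start",
            "        return None",
            "    children.append(child)",
        ]
    lines.append(f"    return ({name!r}, children)")
    return "\n".join(lines) + "\n"


# PEG_rule <- rule_name '<-' expression ';' ;
print(generate_rule("PEG_rule", [("ref", "rule_name"), ("lit", "<-"),
                                 ("ref", "expression"), ("lit", ";")]))
```

The four steps of ``PEG_rule`` become four nearly identical snippets; that repetition is exactly what makes the job easy to automate.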
	149	+
	150	+Classic tools
	151	+-------------
	152	+There are many tools that we can use for inspiration. Here is a short overview, and how they influence Castle.
	153	+
	154	+Possibly the most famous compiler-compiler is Yacc_. It was developed in 197X and generates C code that can be compiled
	155	+and linked to your code. To parse a string, you had to call ``yyparse()``. It would, however, be relatively simple to
	156	+generate functions with the name of each rule, using the same machinery. In that decade, however, the goal was
	157	+different. Memory was limited, which we can also see in the grammar it uses: one had to craft it carefully, as there was no
	158	+backtracking and only a single token of look-ahead.
	159	+
	160	+Bison_ is the GNU reimplementation of Yacc_, but it can use several parsing algorithms. Like Yacc_, it uses a separate Lexer_:
	161	+typically flex (whereas Yacc uses lex). A lexer_ splits the input string into a stream of tokens using another (simpler,
	162	+but faster) algorithm. At the time, that mattered.
	163	+|BR|
	164	+As a lexer_ can be implemented with a parsing algorithm (but not the other way around), and as the need for speed no longer
	165	+demands a separate lexer_, modern parsers are often “scannerless”. This removes the need to use two meta-syntaxes
	166	+(one for the lexer/scanner and one for the parser), and so is simpler to use.
	167	+|BR|
	168	+Castle also uses a scannerless approach.
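To show what such a separate lexer does, here is a minimal Python sketch built on the standard ``re`` module. The token names and patterns (``NUMBER``, ``ID``, ``OP``, ``SKIP``) and the function ``tokenize`` are made up for illustration; they are neither flex syntax nor Castle’s. A scannerless parser would instead match these patterns inside its rule-functions.

```python
import re

# Hypothetical token set for a tiny expression language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[-+*/=()]"),
    ("SKIP",   r"\s+"),
]
# One alternation of named groups; m.lastgroup tells which token kind matched.
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text: str) -> list:
    """Split text into (kind, value) tokens; whitespace is dropped."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at position {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


print(tokenize("x = 42 + y"))
# [('ID', 'x'), ('OP', '='), ('NUMBER', '42'), ('OP', '+'), ('ID', 'y')]
```

The parser would then work on this token list instead of on characters, which is why two meta-syntaxes (one for tokens, one for rules) are needed.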
	169	+
	170	+Castle
	171	+------
	172	+In Castle you can also use grammars, but directly in your program, using the Castle syntax. And Castle will generate
	173	+“code” (Castle functions, that is), but without an extra tool.
	174	+|BR|
	175	+Actually, it probably will not generate code, at least not ‘code as text’. Why generate code, only to read & parse it back
	176	+and compile it? It is easier to directly generate the AST that would result from parsing the generated code.
	177	+
	178	+But the effect is the same. You create a set of functions with this generic “text to tree” signature by writing some
	179	+simple rules. Castle does the rest for you. Easy!
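The idea of skipping ‘code as text’ can be approximated in Python: build the rule-functions directly from a grammar held as data, so no source text is ever generated or parsed. This is only an analogy under stated assumptions: the expression format and the name ``build_rules`` are hypothetical, and it says nothing about how Castle itself is implemented.

```python
import re


def build_rules(grammar: dict) -> dict:
    """Turn a grammar given as data into callable rule functions (no source text involved).

    Hypothetical expression format:
      ("lit", s) | ("re", pattern) | ("ref", rule) | ("seq", e, ...) | ("alt", e, ...) | ("star", e)
    Every compiled expression is a function (text, pos) -> (tree, new_pos), or None on failure.
    """
    rules = {}

    def compile_expr(expr):
        kind, args = expr[0], expr[1:]
        if kind == "lit":
            lit = args[0]
            return lambda text, pos: (lit, pos + len(lit)) if text.startswith(lit, pos) else None
        if kind == "re":
            pattern = re.compile(args[0])

            def match_re(text, pos):
                m = pattern.match(text, pos)
                return (m.group(), m.end()) if m else None
            return match_re
        if kind == "ref":
            name = args[0]
            return lambda text, pos: rules[name](text, pos)   # looked up at call time: recursion works
        parts = [compile_expr(e) for e in args]
        if kind == "seq":
            def seq(text, pos):
                trees = []
                for part in parts:
                    r = part(text, pos)
                    if r is None:
                        return None
                    tree, pos = r
                    trees.append(tree)
                return trees, pos
            return seq
        if kind == "alt":
            def alt(text, pos):
                for part in parts:                             # ordered choice: first success wins
                    r = part(text, pos)
                    if r is not None:
                        return r
                return None
            return alt
        if kind == "star":
            def star(text, pos):
                trees = []
                while (r := parts[0](text, pos)) is not None and r[1] > pos:
                    tree, pos = r
                    trees.append(tree)
                return trees, pos
            return star
        raise ValueError(f"unknown expression kind: {kind!r}")

    def make_rule(name, body):
        def rule(text, pos=0):
            r = body(text, pos)
            return None if r is None else ((name, r[0]), r[1])   # label the subtree with the rule
        return rule

    for name, expr in grammar.items():
        rules[name] = make_rule(name, compile_expr(expr))
    return rules


# qualID <- nameID ('.' nameID)* ;   nameID <- /[A-Za-z_]\w*/ ;
rules = build_rules({
    "qualID": ("seq", ("ref", "nameID"), ("star", ("seq", ("lit", "."), ("ref", "nameID")))),
    "nameID": ("re", r"[A-Za-z_]\w*"),
})
tree, end = rules["qualID"]("std.io")
```

Each rule becomes an ordinary callable with a “text to tree” signature, built in memory from the grammar, much like the effect described above.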
	180	+
	181	+
	182	+----------
	183	+
	184	+.. rubric:: Footnotes
	185	+
	186	+.. [#func]
	187	+ Instead of a function, it can also be a method, or any callable. We use ‘function’ as a generic term, in its
	188	+ mathematical meaning: some input (parameters) and an output (return value).
	189	+
	190	+.. [#maintain]
	191	+ This is not specific to grammars; it holds for all programming languages. New languages may introduce new
	192	+ concepts (as OO once did). When the compiler becomes smarter, the programmer can focus on the important bits!
	193	+
	194	+.. [#notation]
	195	+ Aside from multiple parsing algorithms, there are also several notations to write the grammar itself, like `EBNF
	196	+ <https://en.wikipedia.org/wiki/Extended_Backus–Naur_form>`__, `ABNF
	197	+ <https://en.wikipedia.org/wiki/Augmented_Backus–Naur_form>`__, and `YACC`_.
	198	+ Most implementations of a given algorithm use a dialect of a standard one, to enable :ref:`G2C-actions`, or ..
	199	+
	200	+ Castle does this too: we use the Castle grammar, which is based on both EBNF and PEG, but uses the classic ``|``
	201	+ instead of the ``/`` for ordered choice.
	202	+
	203	+.. [#leftStack]
	204	+ Without going into details: left recursion is hard for many parsing algorithms. In the approach shown, a
	205	+ rule-function (for a rule that is directly left-recursive) calls itself as its first step. In this way no progress is
	206	+ made, and the stack quickly overflows.
	207	+
	208	+.. [#OrTables]
	209	+ Some tools, like Yacc for example, use another approach. Instead of many functions, they have a generic (run-time) library
	210	+ that uses tables generated by the tool. Still, that is just an implementation detail.
	211	+
	212	+.. _LALR: https://en.wikipedia.org/wiki/LALR_parser
	213	+.. _LALR(1): LALR_
	214	+.. _GLR: https://en.wikipedia.org/wiki/GLR_parser
	215	+.. _LL(k): https://en.wikipedia.org/wiki/LL_parser
	216	+.. _YACC: https://en.wikipedia.org/wiki/Yacc
	217	+.. _Bison: https://en.wikipedia.org/wiki/GNU_Bison
	218	+.. _Lexer: https://en.wikipedia.org/wiki/Lexical_analysis

diff -r 073412a6dd1e -r a47ed1afefee CCastle/Usage/1.CompilerCompiler.rst

--- /dev/null Thu Jan 01 00:00:00 1970 +0000

+++ b/CCastle/Usage/1.CompilerCompiler.rst Mon May 16 17:10:49 2022 +0200

		@@ -0,0 +1,124 @@
	1	+.. _Castle-CompilerCompiler:
	2	+
	3	+=================
	4	+Compiler Compiler
	5	+=================
	6	+
	7	+.. post:: 2022/05/7
	8	+ :category: Castle, Usage
	9	+ :tags: Castle, grammar
	10	+
	11	+ In Castle you can define a grammar directly in your code. The compiler will translate it into functions, using
	12	+ the built-in (PEG) compiler-compiler; at least, that is what it was called back in the days of YACC.
	13	+
	14	+ How does one use that? And why should you?
	15	+
	16	+Grammars, a short intro
	17	+=======================
	18	+
	19	+A grammar is a collection of (parsing) rules and, optionally, some settings. Rules are written in a mixture of EBNF
	20	+and PEG meta-syntax. Let’s start with a simple example:
	21	+
	22	+.. code-block:: PEG
	23	+
	24	+ castle_file <- ( import_line | interface | implementation )* ;
	25	+ import_line <- IMPORT_stmt ( STRING_literal | qualID ) ';' ;
	26	+ qualID <- '.'? nameID ('.' nameID )* ;
	27	+ IMPORT_stmt = "import" ;
	28	+ ...
	29	+
	30	+
	31	+This basically defines that a ``castle_file`` is a sequence (possibly empty) of ``import_line``, ``interface``, and
	32	+``implementation`` parts, in any order. An ``import_line`` starts with the ``IMPORT_stmt`` *(which is set to the string ‘import’, on
	33	+line 4)*, followed by either a ``STRING_literal`` (indeed, a literal string) or a ``qualID``, and ends with a semicolon
	34	+(``;``). Likewise, a ``qualID`` is a ``nameID`` (a name that is used as an ID, like in any programming language),
	35	+optionally followed by sub-names *(again like most languages: a dotted name, specifying a field (in a field, in
	36	+...))*. In Castle, that name may start with a dot, which is a shorthand notation for “in the current namespace”. You can
	37	+ignore that for now.
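To see those rules as functions, here is a minimal hand-written Python sketch of ``import_line``, ``qualID`` and ``IMPORT_stmt``. The grammar above does not show how ``nameID`` and ``STRING_literal`` are defined, so this sketch assumes a plain identifier and a double-quoted string without escapes; the class ``ImportParser`` and the tuple-shaped trees are hypothetical.

```python
import re
from typing import Optional

NAME_ID = re.compile(r"[A-Za-z_]\w*")     # assumption: a nameID is a plain identifier
STRING_LITERAL = re.compile(r'"[^"]*"')    # assumption: double quotes, no escapes


class ImportParser:
    """Hypothetical hand-written sketch of three rules from the example grammar:

        import_line <- IMPORT_stmt ( STRING_literal | qualID ) ';' ;
        qualID      <- '.'? nameID ('.' nameID )* ;
        IMPORT_stmt  = "import" ;
    Each rule returns a tree (a tuple) on success, or None on failure.
    """

    def __init__(self, text: str):
        self.text, self.pos = text, 0

    def _skip_ws(self) -> None:
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1

    def _literal(self, s: str) -> bool:
        self._skip_ws()
        if self.text.startswith(s, self.pos):
            self.pos += len(s)
            return True
        return False

    def _regex(self, pattern: re.Pattern) -> Optional[str]:
        self._skip_ws()
        m = pattern.match(self.text, self.pos)
        if m is None:
            return None
        self.pos = m.end()
        return m.group()

    def qualID(self):
        start = self.pos
        relative = self._literal(".")             # leading dot: "in the current namespace"
        first = self._regex(NAME_ID)
        if first is None:
            self.pos = start
            return None
        names = [first]
        while True:
            save = self.pos
            if self._literal(".") and (name := self._regex(NAME_ID)):
                names.append(name)
            else:
                self.pos = save                   # no further sub-name: undo the dot
                return ("qualID", relative, names)

    def import_line(self):
        start = self.pos
        # NB: a real grammar would also require "import" to be a whole word.
        if self._literal("import"):
            target = self._regex(STRING_LITERAL) or self.qualID()
            if target and self._literal(";"):
                return ("import_line", target)
        self.pos = start
        return None
```

``ImportParser("import std.io ;").import_line()`` yields ``("import_line", ("qualID", False, ["std", "io"]))``; an incomplete line such as ``"import ;"`` yields ``None``.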
	38	+
	39	+The grammar defines how one should read the input (a text), or more formally: how to parse it. The result of this
	40	+parsing is twofold. It checks whether the input conforms to the grammar, resulting in a boolean, for the mathematicians
	41	+among us. And it translates a sequential (flat) text into a tree structure, which is typically much more useful for a
	42	+software engineer.
	43	+|BR|
	44	+A well-known example is this HTML file. On disk it’s nothing but text, which is easy to store and to transfer. But
	45	+when sent to your browser, it’s parsed to create the `DOM
	46	+<https://nl.wikipedia.org/wiki/Document_Object_Model>`__: a tree of the document, with sections, paragraphs,
	47	+hyperlinks, etc. By regarding it as a tree, it is easy to describe (e.g. with CSS) how all parts should be shown: all
	48	+headers have a background, the first row in a table is highlighted, etc.
	49	+
	50	+
	51	+Parsing
	52	+=======
	53	+Another well-known example is (the source of) a program. As code, it is just text. But the compiler will parse it into
	54	+a parse tree and/or an abstract syntax tree, which is built out of classes, methods, statements, etc.
	55	+|BR|
	56	+Your favorite IDE will parse it too: to highlight the code, give tooltips, enable you to quickly navigate and
	57	+refactor it, and all those convenient features that make it your favorite editor.
	58	+
	59	+And you are probably parsing text as part of your daily job, too. When you deserialise data, you are (often) parsing
	60	+text; when you read a configuration, you are (or should be) parsing that text. Even a simple user input might
	61	+need a bit of parsing. The text “42” is not the number :math:`42.0`; you need to convert it: parse it.
	62	+
	63	+There are many ways to parse. You do not need a full-fledged grammar to translate “42” into :math:`42` or
	64	+:math:`42.0`: a stdlib function such as ``atoi()`` or ``atof()`` will do. But how about handling complex numbers
	65	+(:math:`4+j2`) or fractions (:math:`\frac{17}{42}`)?
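A small Python sketch shows where a bit of grammar starts to help. The mini-grammar in the comments is made up for illustration (including the :math:`4+j2` notation with ``j`` in front), and ``parse_number`` is a hypothetical name. Each rule here is simple enough (no nesting) to use a regular expression as its body; the grammar part is the ordered choice between the rules.

```python
import re
from fractions import Fraction

# A made-up mini-grammar, one regular expression per rule body:
#   number   <- complex | fraction | integer ;
#   complex  <- integer ('+' | '-') 'j' integer ;   e.g. "4+j2"
#   fraction <- integer '/' integer ;               e.g. "17/42"
#   integer  <- /[0-9]+/ ;
COMPLEX = re.compile(r"(\d+)([+-])j(\d+)")
FRACTION = re.compile(r"(\d+)/(\d+)")
INTEGER = re.compile(r"\d+")


def parse_number(text: str):
    """Return a complex, Fraction or int for the whole text, or None when it is not a number."""
    text = text.strip()
    if m := COMPLEX.fullmatch(text):            # ordered choice: most specific form first
        real, sign, imag = int(m[1]), m[2], int(m[3])
        return complex(real, imag if sign == "+" else -imag)
    if m := FRACTION.fullmatch(text):
        return Fraction(int(m[1]), int(m[2]))
    if INTEGER.fullmatch(text):
        return int(text)
    return None
```

``parse_number("4+j2")`` gives ``(4+2j)``, ``parse_number("17/42")`` gives ``Fraction(17, 42)``, and ``"42"`` still becomes the integer ``42``.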
	66	+
	67	+Non-parsing
	68	+-----------
	69	+
	70	+As proper parsing used to be hard, other similar (but simpler) techniques exist, like `globbing
	71	+<https://en.wikipedia.org/wiki/Glob_(programming)>`__ (``*.Castle`` on the bash prompt will result in all
	72	+Castle files). Using `regular expressions <https://en.wikipedia.org/wiki/Regular_expression>`__ is more powerful, and is
	73	+often used to highlight code; a pattern such as ``//.*$`` can be used to highlight (single-line) comments. It often works, but
	74	+this simple pattern might match a piece of text inside a multi-line (doc)string, which is wrong.
	75	+|BR|
	76	+For parsing input text, it is not a sound solution, although I have seen cunning regular expressions that almost always
	77	+work. But regular expressions do not have the same power as a grammar; that was proven half a century ago and will not be
	78	+repeated here.
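The pitfall is easy to reproduce with Python’s ``re`` module. In this sketch the string literal contains a URL, so the same problem shows up even without a multi-line docstring: the comment pattern ``//.*$`` starts matching inside the string and swallows the real comment as well.

```python
import re

# The "highlight single-line comments" pattern; MULTILINE makes $ match at every line end.
COMMENT = re.compile(r"//.*$", re.MULTILINE)

source = 'url = "http://example.com";  // the home page\n'

match = COMMENT.search(source)
print(repr(match.group()))
# '//example.com";  // the home page'   <- starts inside the string literal
```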
	79	+
	80	+Grammars are more powerful
	81	+===========================
	82	+
	83	+A grammar (even a simple one) is more powerful. You can define the overall structure of the input and the sub-structure
	84	+of each lump. When a multi-line string has no sub-structure, the parser will never find comments inside it, nor the other
	85	+way around; it simply is not hunting for them.
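A scanner that knows the structure avoids that mistake: a string is one lump, so it never looks for comments inside it. Below is a minimal hand-written sketch of the structure ``file <- ( string | comment | other )*``; the function name ``find_comments`` is hypothetical, and escape sequences and triple-quoted strings are not handled, to keep it short.

```python
def find_comments(source: str) -> list:
    """Return the single-line // comments, never looking inside "..." strings."""
    comments = []
    i = 0
    while i < len(source):
        if source[i] == '"':                    # a string: one lump, skip to its closing quote
            end = source.find('"', i + 1)
            i = len(source) if end == -1 else end + 1
        elif source.startswith("//", i):        # a comment: runs to the end of the line
            end = source.find("\n", i)
            end = len(source) if end == -1 else end
            comments.append(source[i:end])
            i = end
        else:                                   # anything else: one character at a time
            i += 1
    return comments
```

For ``'url = "http://example.com";  // the home page\n'`` this returns only ``['// the home page']``.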
	86	+
	87	+As most programming languages do not have built-in support for grammars, one has to resort to external tools, like the
	88	+famous `YACC <https://en.wikipedia.org/wiki/Yacc>`__, developed in 197X. YACC reads a grammar file and generates
	89	+C code that can be compiled and linked to your code.
	90	+
	91	+Back then, writing compiler-compilers was a popular academic research exercise (YACC stands for: Yet Another Compiler
	92	+Compiler). It was great for compiler designers, but clumsy to use for average developers: the syntax to write a grammar
	93	+was hard to grasp, with many pitfalls, and the interface between your code and the parser was awkward (you had to call
	94	+``yyparse()``; it needed some globals; C had no OO, so no inheritance or data hiding, which resulted in puzzling tricks
	95	+to use multiple parsers, etc.).
	96	+|BR|
	97	+Aside from that, more and better parsing strategies have been developed; that is handled in another :ref:`blog <grammmar-code>`.
	98	+
	99	+Unleash that power!
	100	+-------------------
	101	+
	102	+With those better parsing algorithms, faster computers with a lot more memory, and other inventions, writing grammars
	103	+has become more pleasant. Except that you still need an extra step and another syntax, as you still need to use an external
	104	+tool, which sometimes isn’t maintained after a couple of years ...
	105	+|BR|
	106	+The effect is that most developers don’t use grammars: they write parser-like code manually, or they settle for a less optimal
	107	+result. Or they are utterly unaware that a grammar can provide another (better, easier) solution.
	108	+
	109	+Castle has built-in support for grammars, and hides the nasty details of parsing strategies. There is no need to
	110	+generate, compile, and use that code with external tools. All that clutter is gone.
	111	+|BR|
	112	+With a few lines, you define the structure of the input. Each rule is like a function: it has a name (the left-hand side
	113	+of the rule, so the part before the arrow) and an implementation (the part after the arrow). That implementation “calls”
	114	+other rules, like normal code.
	115	+
	116	+To use the grammar, you simply call one of those rules as a function: pass the input (string), and it will return a
	117	+(generic) tree structure.
	118	+|BR|
	119	+When you simply want to verify that the syntax is correct, use the tree as a boolean: when it is not empty, the input is valid.
	120	+
	121	+But typically you process (use) that tree, like you do in many situations: read the configuration values, walk over the
	122	+tree, or traverse it as if it were a DOM. You can even use Castle’s :ref:`matching-statements` to simplify that.
	123	+
	124	+Grammars make reading text easy. Define the structure, call the “main rule”, and use the values. Castle makes that simple!

diff -r 073412a6dd1e -r a47ed1afefee CCastle/Usage/CompilerCompiler.rst

--- a/CCastle/Usage/CompilerCompiler.rst Mon May 16 17:04:58 2022 +0200

+++ /dev/null Thu Jan 01 00:00:00 1970 +0000

