Question on backref
I'm having trouble understanding exactly how backref finds the capture it refers to.
The following pattern is copied from the backref-rpl.rpl file; a similar one is in extra/examples/html.rpl. Let's apply this pattern to a few inputs and see what happens:
```
tagname = word
alias matching_tag = backref:tagname
starttag = {"<" tagname ">"}
endtag = {"</" matching_tag ">"}
grammar
  content = { {!"<" .}* {>starttag html}? {!"<" .}* }
in
  html = { starttag content? endtag }+
end
```
I've chosen this pattern because "starttag" is used twice, in "html" and "content".
- `<foo><bar></bar></foo>`: The first "starttag" matches "foo" and the second matches "bar". For this unit test to pass, the final `</foo>` endtag must refer to the first starttag.
- `<foo></foo><bar></bar>`: The first starttag matches "foo" and the second (last) one matches "bar". For this unit test to pass, the `</bar>` endtag must refer to the last starttag.
Conclusion: neither "pick the first" nor "pick the last" works. The rule actually required is "the first match within the group". But the group is never captured on its own: "html" captures {..}+, i.e. the group repeated one or more times, so there is no capture whose boundaries correspond to a single iteration of the group.
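To make the two cases concrete, here is a small Python model. It is a hypothetical sketch, not Rosie code and not how Rosie actually resolves backrefs; all names (`Element`, `parse`, `endtag_checks`, `RULES`, `rule_works`) are made up for illustration. It treats each iteration of `{ starttag content? endtag }` as a node, assumes the backref searches the enclosing `html` capture (which spans all iterations of `{..}+`), and checks which resolution rule yields the correct tag name for every endtag. Text between tags is ignored and the input is assumed to be well formed.

```python
# Hypothetical model of backref resolution; NOT Rosie's implementation.
import re
from dataclasses import dataclass, field


@dataclass
class Element:
    """One iteration of the group { starttag content? endtag }.

    `children` are the iterations of the nested `html` capture inside `content`.
    """
    name: str
    children: list["Element"] = field(default_factory=list)


def parse(text: str) -> list[Element]:
    """Parse bare, well-formed tags such as '<foo><bar></bar></foo>'."""
    root = Element("")  # sentinel holding the top-level iterations
    stack = [root]
    for closing, name in re.findall(r"<(/?)(\w+)>", text):
        if closing:
            stack.pop()  # assumes the closing tag matches (well-formed input)
        else:
            element = Element(name)
            stack[-1].children.append(element)
            stack.append(element)
    return root.children


def tag_names(element: Element) -> list[str]:
    """All tagname matches inside one iteration, in document order."""
    names = [element.name]
    for child in element.children:
        names.extend(tag_names(child))
    return names


def endtag_checks(iterations: list[Element]):
    """Yield (scope, group, expected) for every endtag.

    scope:    tagnames matched so far inside the enclosing html capture (all iterations)
    group:    tagnames matched inside the current iteration only
    expected: the name the endtag's backref must resolve to
    """
    scope: list[str] = []
    for element in iterations:
        yield from endtag_checks(element.children)  # the nested html capture
        group = tag_names(element)
        scope.extend(group)
        yield list(scope), group, element.name


# Candidate resolution rules from the discussion above.
RULES = {
    "pick the first": lambda scope, group: scope[0],
    "pick the last": lambda scope, group: scope[-1],
    "first within the group": lambda scope, group: group[0],
}


def rule_works(rule, text: str) -> bool:
    """True if `rule` resolves every endtag in `text` to its own starttag."""
    return all(rule(scope, group) == expected
               for scope, group, expected in endtag_checks(parse(text)))


if __name__ == "__main__":
    for text in ["<foo><bar></bar></foo>", "<foo></foo><bar></bar>"]:
        for label, rule in RULES.items():
            status = "ok" if rule_works(rule, text) else "FAILS"
            print(f"{text:24} {label:24} {status}")
```

Under these assumptions, "pick the first" fails only on the second input and "pick the last" only on the first, while "first within the group" passes both. The model can express that last rule only because it tracks group iterations explicitly, which, as argued above, the capture produced by `html` does not.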
I did not find a definitive description of how backref is meant to work. I haven't tested it yet, but a rule that is reasonably easy (and fast) to implement might be "the last match within the enclosing (binding) capture". The only catch is that the grammar above would no longer work. That is easily fixed, though: just remove `>starttag` from `content`, which is redundant anyway because `html` already begins with `starttag`. Hence, IMHO, breaking this grammar is not a big issue, especially since the construct is easily misleading. Instead, the parser/compiler should throw an error for such constructs to help users write clear, clean code.