Question on backref

I am having trouble understanding exactly how backref finds the capture it refers to.

The following pattern is copied from the backref-rpl.rpl file, and a similar one is in extra/examples/html.rpl. Let's apply this pattern to a couple of inputs and see what happens.

tagname = word
alias matching_tag = backref:tagname
starttag = {"<" tagname ">"}
endtag = {"</" matching_tag ">"}

grammar
   content = { {!"<" .}* {>starttag html}? {!"<" .}* }
in
   html = { starttag content? endtag }+
end

I've chosen this pattern because "starttag" is used twice, in "html" and "content".

<foo><bar></bar></foo>: The first "starttag" will match "foo" and the second will match "bar". For this unit test to work, the first starttag must be used for the final "endtag" (</foo>).
<foo></foo><bar></bar>: The first "starttag" will match "foo" and the second (last) one "bar". For this unit test to work, the last starttag must be used for the final "endtag" (</bar>).

Conclusion: neither "pick the first" nor "pick the last" actually works. The rule required is "the first match within the group". But the "group" itself is never captured: "html" captures {..}+, that is, 1 to n repetitions of the group.
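
To make the three candidate rules concrete, here is a small Python sketch. It is only a model built on my own assumptions, not how Rosie resolves backref: the input is reduced to a list of open/close tag events, and the names tokenize, matches, and the rule labels "first", "last", and "group" are made up for this illustration. The "group" rule is approximated with a stack of start tags that are still open. Unlike the walk-through above, the check covers every end tag, not only the final one.

```python
# Hypothetical model, NOT Rosie's implementation: each rule decides which
# earlier tagname an end tag has to repeat.

def tokenize(text):
    """Split '<foo></foo>' into [('open', 'foo'), ('close', 'foo')]."""
    events = []
    for chunk in text.split("<")[1:]:
        name = chunk.split(">")[0]
        if name.startswith("/"):
            events.append(("close", name[1:]))
        else:
            events.append(("open", name))
    return events


def matches(text, rule):
    """Return True if every end tag equals the tagname selected by `rule`."""
    seen = []   # every tagname captured so far, in input order
    stack = []  # tagnames of start tags that are still open
    for kind, name in tokenize(text):
        if kind == "open":
            seen.append(name)
            stack.append(name)
            continue
        if not stack:
            return False  # end tag without any open start tag
        if rule == "first":
            expected = seen[0]    # first tagname in the whole input
        elif rule == "last":
            expected = seen[-1]   # most recently captured tagname
        elif rule == "group":
            expected = stack[-1]  # start tag of the enclosing group
        else:
            raise ValueError(f"unknown rule: {rule}")
        if name != expected:
            return False
        stack.pop()
    return not stack  # every start tag must have been closed


if __name__ == "__main__":
    for text in ("<foo><bar></bar></foo>", "<foo></foo><bar></bar>"):
        for rule in ("first", "last", "group"):
            print(text, rule, matches(text, rule))
```

In this model only "group" accepts both inputs. "last" accepts the sequential input but rejects the nested one at </foo>, and "first" rejects both, because here the inner or second end tag is checked as well.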

I did not find a definitive description of how backref is meant to work. I haven't tested it yet, but a reasonably easy to implement (and fast) rule might be "the last match in a (binding) capture". The grammar above would no longer work under that rule, though it could easily be fixed: just remove >starttag, as it is redundant anyway. Hence IMHO it is not such a big issue, and the redundant >starttag is easily misleading anyway. Instead, the parser/compiler should throw an error to help the user write clear, clean code.

Edited Sep 09, 2021 by Juergen
