|
From: David G. <go...@us...> - 2002-09-10 02:06:01
|
The following message was sent to docutils-users. My Zope knowledge is
zero. Can anybody here help Colin?
Colin Leath wrote:
> I'm attempting to make either (1) a new CMFDocument type or (2) a
> script that can process the default CMFDocument to create a
> collapsible/expandable (e.g. msword outline view) view of that
> document.
>
> I would like the document to be able to be composed using STX or reST
> (http://docutils.sourceforge.net/rst.html)
...
The full message can be found here:
http://sf.net/mailarchive/forum.php?thread_id=1045103&forum_id=11444
On a related note, I'd appreciate it if docutils-developers would
subscribe to docutils-users to help answer the occasional question.
Subscribe at:
http://lists.sourceforge.net/mailman/listinfo/docutils-users
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: Aahz <aa...@py...> - 2002-09-10 01:39:46
|
On Tue, Aug 27, 2002, David Goodger wrote:
> Aahz wrote:
>>
>> this is relevant to me, but can't talk about it with my broken
>> wrist.
>
> You have my sympathies. That must be affecting your book writing!
No shit. :-( I'm up to 75% of my normal typing speed, but I can't
type as much as normal.
--
Aahz (aa...@py...) <*> http://www.pythoncraft.com/
Project Vote Smart: http://www.vote-smart.org/
|
|
From: David G. <go...@us...> - 2002-09-07 17:09:23
|
[Adam]
>>> Is there any way to force line breaks? Example of usage::
>>>
>>> :Name: Adam Chodorowski
>>> :Address: The Street
>>> 11111 The Town
>>> The Country
>>> :Phone: +11-111111
[David]
>> If you're using a recent snapshot, it's already taken care of for
>> you. The "Address" field should produce the equivalent of a
>> line_block element, which is exactly what you describe.
[Adam]
> I'm guessing that is only valid for bibliographic fields...
Yes. Special processing in a specific context, which cannot be
assumed anywhere else.
>> - You shouldn't really be using field lists in the first place.
>> They're for extension syntax (docinfo & directive options). I'll
>> make this more explicit in the docs. Perhaps a table or a
>> definition list would be better?
>
> I disagree strongly on this. Why on earth shouldn't field lists be
> usable as a construct of their own?
On April 25, I clarified the spec to say this about field lists: "They
are not intended to be an alternative to definition lists or tables."
I made this admonition stronger yesterday: "Field lists should only be
used when and where application code specifically expects them; they
should *not* be used as arbitrary constructs in documents."
Before you begin your reply, please read to the end. (It's one of my
thinking-an-issue-through-to-the-end essays.)
> Definition lists do fit *conceptually*, but unfortunately not with
> regard to presentation. When you specify contact information or
> similar data, you almost always do it like this::
>
> Name: Adam Chodorowski
> Address: The Street 42
> 11111 The City
> Country: Sweden
> Phone: +46-8-1111111
Actually, I would write it like this::
Adam Chodorowski
The Street 42
11111 The City
Sweden
phone +46-8-1111111
Most of the labels are redundant; people know how to read addresses
from context.
> Having definition lists is *not* intuitive for the reader
...
> nobody expects contact information to be presented that way.
I wouldn't expect contact information to be presented the "field list"
way either. Are conventions so different in Sweden? Do you really
need to label everything? (But this is avoiding the real issue.)
> Field lists are IMHO perfect for "single vertical table rows". Just
> what the doctor ordered. ;-)
This may be a case of abusing markup because it gives the desired
results. In DocBook terms, it's like a doc writer using <emphasis>
instead of <glossterm>, because they want italics in the output. A
*side effect* of <glossterm> may be italics, but the main effect may
be quite different (hyperlinking all glossterms to a glossary, for
example).
But I do see Adam's point. I've been moving towards thinking about
field lists purely as an extension syntax, not as a generic construct.
Partly this is due to outside influence, the incessant
"reStructuredText is too complex!" from certain parties. So we have
to choose:
1. Field lists shall be restricted to extension syntax use (directive
options, docinfo) only.
2. Field lists shall also be available to authors as a generic
two-column table/list construct.
There are consequences. If we choose #1, we should remove field list
support from writers; it would be an error for an unprocessed field
list to remain in the document tree. If we choose #2, we should ask
ourselves, are field lists generic *enough*?
The current field list syntax (geared toward extension syntax use,
choice #1) was modeled after RFC 2822's fields and JavaDoc's "@tags".
It limits field names to single (possibly compound) words. Anything
after whitespace is taken to be field arguments, which are
semantically and presentationally separate from field names. Field
arguments allow the equivalent of a JavaDoc "@param name description"
tag.
In Adam's contact info block above, there's a "phone" field. What if
he wants to differentiate between phone numbers (home, cell, pager)?
Let's assume that in Swedish we cannot rework the field names into
single words; we *have* to use multiple words. We'd want this as
output::
Home Phone: +46-8-1111111
Cell Phone: +46-8-2222222
Pager Number: +46-8-3333333
The closest we could get with current field lists would be like this::
:Home-Phone: +46-8-1111111
:Cell-Phone: +46-8-2222222
:Pager-Number: +46-8-3333333
(Underscores could be used instead of hyphens.) I doubt that this
would be acceptable to most document authors, and the output (with
hyphens intact) would look strange to readers. The alternative is to
redefine field lists to allow multi-word field names, and use a
different syntax for field arguments. Perhaps something like this::
:Field name <field arguments>: Field body
Or perhaps we just drop the idea of field arguments altogether, and
rely on the application code to parse the field name as it likes.
Apart from the test suite, the only place where field arguments are
used is in the "meta" directive, which produces <meta> tags. This is
from tools/test.txt::
    .. meta::
       :keywords: reStructuredText, test, parser
       :description lang=en: A test document, containing at least one
           example of each reStructuredText construct.
The "meta" directive code would have no problem extracting the
"lang=en" argument from the field name. The "meta" directive would
specify that the first word of the field name would be taken as the
"name" attribute, and the rest should be "attname=value" attribute
definitions.
I'm beginning to like this change. If we drop the notion of "field
arguments", field lists become simpler and fully generic. Then I'd
drop my objection to their use as a generic document construct. Any
objections to this change?
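A minimal sketch of the name-splitting step described above, assuming a
hypothetical helper called split_meta_field_name (it is not part of
Docutils). The first word of the field name becomes the "name" attribute
and every remaining word must be an "attname=value" definition:

```python
def split_meta_field_name(field_name):
    """Split 'description lang=en' into ('description', {'lang': 'en'}).

    Hypothetical helper for illustration only; not the Docutils API.
    """
    words = field_name.split()
    if not words:
        raise ValueError("empty field name")
    name = words[0]
    attributes = {}
    for word in words[1:]:
        # Each remaining word must look like attname=value.
        attname, separator, value = word.partition("=")
        if not separator or not attname:
            raise ValueError("expected attname=value, got %r" % word)
        attributes[attname] = value
    return name, attributes
```

With this approach the parser only hands over the raw field name, and
the directive decides how to interpret it, which is the simplification
proposed above.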
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: Adam C. <ad...@ch...> - 2002-09-07 08:12:58
|
On Fri, 06 Sep 2002 20:11:44 -0400 David Goodger
<go...@us...> wrote:
> > Is there any way to force line breaks? Example of usage::
> >
> > :Name: Adam Chodorowski
> > :Address: The Street
> > 11111 The Town
> > The Country
> > :Phone: +11-111111
>
> If you're using a recent snapshot, it's already taken care of for you.
> The "Address" field should produce the equivalent of a line_block
> element, which is exactly what you describe.
I'm guessing that is only valid for bibliographic fields...
> I suspect that this is a document *fragment* you're showing us though.
Exactly. Although the snippet I gave you actually *is* part of a
bibliographic field list, I would like the functionality in other places as
well.
> In that case,
>
> - You shouldn't really be using field lists in the first place.
> They're for extension syntax (docinfo & directive options). I'll
> make this more explicit in the docs. Perhaps a table or a
> definition list would be better?
I disagree strongly on this. Why on earth shouldn't field lists be usable as
a construct of their own? Using a table to show contact information is ugly
and non-intuitive (if you only have contact information for one person; if you
have many it does make sense), using a bulleted/numbered list is definitely
not appropriate and a definition list doesn't really fit either.
Definition lists do fit *conceptually*, but unfortunately not with regard to
presentation. When you specify contact information or similar data, you almost
always do it like this::
    Name:    Adam Chodorowski
    Address: The Street 42
             11111 The City
    Country: Sweden
    Phone:   +46-8-1111111
Having definition lists is *not* intuitive for the reader, since it results in
(at least for HTML)::
    Name:
        Adam Chodorowski
    Address:
        The Street 42
        11111 The City
    Country:
        Sweden
    Phone:
        +46-8-1111111
Not to mention that it takes more space; nobody expects contact information to
be presented that way. Field lists are IMHO perfect for "single vertical table
rows". Just what the doctor ordered. ;-)
[...]
> > I looked through the documentation, but couldn't find any mention
> > of it...
>
> http://docutils.sf.net/spec/rst/directives.html#line-block
That's exactly what I need, thanks. :-)
---
Adam Chodorowski <ad...@ch...>
If entropy is increasing, where is it coming from?
|
|
From: <pf...@ar...> - 2002-09-07 07:38:40
|
Hi,
David Goodger:
> Although the idea of a generic reStructuredText writer is attractive,
> I doubt if it will be easy to write or particularly useful. The value
> of reStructuredText is as an easy input format, not as an output
> format.
>
> Typically in data conversion jobs like yours, the best way to proceed
> is 80/20: write some quick & dirty conversion code to do 80% of the
> work, then finish up with some manual scrubbing.
Hmmm... I tend to disagree here. There are zillions of other formats
out there. In some situations a round-tripping workflow is hard to
avoid, at least for some periods of time. Obviously, in such a
situation it is necessary to find the greatest common set of features
of the document types involved.
> > Does anyone else think there's value in round-tripping documents?
>
> Abstract value, yes. I don't know that there's much practical value.
I think there is much practical value! For example, in the context of
wiki software (like MoinMoin) and Zope CMS. It would be nice to move
this data, which is subject to daily modifications in an agile process,
to another representation without having to redo edits again and again
until reST can be taken into the production environment.
> > Would it be worth adding a marker to the document tree (either an
> > element or an attribute) to the effect that 'this is generated
> > content from directive xyz with options foo, bar="ni", baz="blue,
> > no, green"'.
>
> I don't think so. It would only be useful as a "cheat" to help
> round-trip reStructuredText to reStructuredText processing. Other
> than abstract curiosity, I don't see much value in that.
The problem is that this information will be lost when writing some
other format that doesn't have a comparable feature.
Regards, Peter
--
Peter Funk, Oldenburger Str.86, D-27777 Ganderkesee, Germany
office: +49 421 20419-0 (ArtCom GmbH, Grazer Str.8, D-28359 Bremen, Germany)
|
|
From: David G. <go...@us...> - 2002-09-07 00:14:45
|
Dethe Elza wrote:
> Has anyone seen or written a writer which produces reST from the
> document tree?
Nothing concrete, no, just rumors.
> I've had to pause in the include/raw/directive work
That's unfortunate. I was looking forward to it. Have you made any
progress since the initial zipfile you sent me? If so, please send it
on, and I may take a crack at completing it along the lines of our
discussions.
> while I convert some legacy documents into reST for maintenance.
> Some are arbitrary HTML and at least one is DocBook. It occurs to
> me that the simplest thing for handling XML formats would be to
> convert them into reST's XML format (canonicalization?) and then
> into reST for editing.
...
> Does anyone have an alternative solution?
Although the idea of a generic reStructuredText writer is attractive,
I doubt if it will be easy to write or particularly useful. The value
of reStructuredText is as an easy input format, not as an output
format.
Typically in data conversion jobs like yours, the best way to proceed
is 80/20: write some quick & dirty conversion code to do 80% of the
work, then finish up with some manual scrubbing. What kind of volume
are you dealing with? Factor in the time required to *write* the
conversion code into the 80% and manual conversion begins to look
better & better -- unless you have a *lot* of data.
> I realize that directives, in particular, will not round trip well,
> since they are not represented in the document tree.
The reStructuredText writer would just have to be very smart in its
pattern recognition, that's all. ;-)
> Does anyone else think there's value in round-tripping documents?
Abstract value, yes. I don't know that there's much practical value.
> Would it be worth adding a marker to the document tree (either an
> element or an attribute) to the effect that 'this is generated
> content from directive xyz with options foo, bar="ni", baz="blue,
> no, green"'.
I don't think so. It would only be useful as a "cheat" to help
round-trip reStructuredText to reStructuredText processing. Other
than abstract curiosity, I don't see much value in that.
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: David G. <go...@us...> - 2002-09-07 00:08:38
|
Adam Chodorowski wrote:
> Is there any way to force line breaks? Example of usage::
>
> :Name: Adam Chodorowski
> :Address: The Street
> 11111 The Town
> The Country
> :Phone: +11-111111
If you're using a recent snapshot, it's already taken care of for you.
The "Address" field should produce the equivalent of a line_block
element, which is exactly what you describe.
I suspect that this is a document *fragment* you're showing us though.
In that case,
- You shouldn't really be using field lists in the first place.
They're for extension syntax (docinfo & directive options). I'll
make this more explicit in the docs. Perhaps a table or a
definition list would be better?
- You could use the "line-block" directive:
  ======= =================
  Address .. line-block::
             The Street
             11111 The Town
             The Country
  ======= =================
> I looked through the documentation, but couldn't find any mention
> of it...
http://docutils.sf.net/spec/rst/directives.html#line-block
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: Dethe E. <de...@ma...> - 2002-09-06 17:14:47
|
Hi all,
Has anyone seen or written a writer which produces reST from the
document tree? I've had to pause in the include/raw/directive work
while I convert some legacy documents into reST for maintenance. Some
are arbitrary HTML and at least one is DocBook. It occurs to me that
the simplest thing for handling XML formats would be to convert them
into reST's XML format (canonicalization?) and then into reST for
editing.
Round tripping is a problem, especially for attributes and styles, but
in some ways that's a Good Thing[tm] since it will enforce a certain
style to the resulting documents.
I realize that directives, in particular, will not round trip well,
since they are not represented in the document tree. Are there any
other Gotchas that folks can think of offhand?
Does anyone else think there's value in round-tripping documents?
Would it be worth adding a marker to the document tree (either an
element or an attribute) to the effect that 'this is generated content
from directive xyz with options foo, bar="ni", baz="blue, no, green"'.
Does anyone have an alternative solution?
--Dethe
|
|
From: Adam C. <ad...@ch...> - 2002-09-06 16:45:44
|
Hi.
Is there any way to force line breaks? Example of usage::
:Name: Adam Chodorowski
:Address: The Street
11111 The Town
The Country
:Phone: +11-111111
This looks very ugly if you run it through docutils, so the ability to force
line breaks (at the obvious places) would be very useful. I looked through the
documentation, but couldn't find any mention of it...
---
Adam Chodorowski <ad...@ch...>
Emacs is a nice OS, but it lacks a good text editor. That's why I am using
Vim.
-- Anonymous
|
|
From: David G. <go...@us...> - 2002-08-30 22:48:43
|
Dethe Elza wrote:
> Can reStructured Text be defined in Extended Backus-Naur Form?
I don't know if it can or not; I've forgotten all I ever knew about
formal grammars. There are at least two problems with EBNF for
reStructuredText: indentation and lookahead. Python's grammar has a
solution for indentation; I don't know if it's applicable to
reStructuredText or not. I don't know about lookahead.
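To illustrate the indentation point: Python's tokenizer turns changes in
leading whitespace into explicit INDENT and DEDENT tokens, so the
grammar proper never has to count spaces. The sketch below shows that
idea in miniature. The name indent_tokens is hypothetical, tabs are not
handled, and blank lines (which are significant in reStructuredText) are
simply skipped, so this is an illustration of the technique and not a
reStructuredText parser:

```python
def indent_tokens(lines):
    """Yield ('INDENT',), ('DEDENT',) and ('LINE', text) tokens.

    Illustration only: spaces only, blank lines are ignored.
    """
    stack = [0]  # indentation widths of the currently open blocks
    for line in lines:
        if not line.strip():
            continue  # blank lines do not change the indentation level
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            yield ("INDENT",)
        while width < stack[-1]:
            stack.pop()
            yield ("DEDENT",)
        if width != stack[-1]:
            # Dedented to a column that never opened a block.
            raise ValueError("inconsistent dedent: %r" % line)
        yield ("LINE", line.strip())
    while len(stack) > 1:  # close any blocks still open at end of input
        stack.pop()
        yield ("DEDENT",)
```

Whether the same trick is enough for reStructuredText is exactly the
open question here; lookahead is a separate problem that this sketch
does not address.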
Ed Loper did some work on this early last year. His results are
here:
http://www.cis.upenn.edu/~edloper/pydoc/stminus-001.html
I agree that it would be good if reStructuredText could be expressed
in EBNF.
> This would also separate reStructured Text as a concept and language
> from the particular state-machine architecture and implementation.
That's why the spec was written before the parser, although they have
evolved together since. I've tried to keep the spec current, so that
anyone could take it and implement another parser, if they were so
inclined.
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: David G. <go...@us...> - 2002-08-30 22:47:34
|
Dethe Elza wrote:
> While working my way towards understanding the internals of
> reStructured Text well enough to write directives, I've had some
> questions about the implementation decisions.
Always welcome.
> Disclaimer: Nothing in this is an attack or criticism of anyone, I'm
> just curious and working to understand it myself.
Understood.
> reStructuredText appears to create its own XML DOM-like internal
> structures and tools, and can be serialized as XML or XHTML. Given
> all that, why not use one of the many XML DOMs that already exist
> for Python (at least one is distributed with the main Python distro)
> rather than recreating the wheel? This would simplify the code base
> and allow reST to work with other tools such as XPath and XSLT.
Yes and no. It would simplify the code in some ways, but make it much
more complicated in others. This question has been asked before; see
this 2001-10-24 Doc-SIG post for my reply (Docutils was called DPS back
then):
http://mail.python.org/pipermail/doc-sig/2001-October/002331.html
Of course, there's been a lot of progress since then, including the
addition of Visitor-pattern classes to nodes.py (briefly alluded to in
the above post).
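For readers unfamiliar with the Visitor pattern mentioned above, here is
a generic sketch of the idea on a toy tree. All names (Node, walk,
TextCollector, default_visit) are invented for this illustration and
should not be read as the nodes.py API:

```python
class Node:
    """A toy document-tree node: a tag name, optional text, child nodes."""

    def __init__(self, tagname, children=(), text=""):
        self.tagname = tagname
        self.children = list(children)
        self.text = text

    def walk(self, visitor):
        # Dispatch on the node type, falling back to a default handler.
        method = getattr(visitor, "visit_" + self.tagname,
                         visitor.default_visit)
        method(self)
        for child in self.children:
            child.walk(visitor)


class TextCollector:
    """A visitor that gathers the text of every 'text' node it meets."""

    def __init__(self):
        self.chunks = []

    def default_visit(self, node):
        pass  # node types this visitor does not care about

    def visit_text(self, node):
        self.chunks.append(node.text)


tree = Node("document", [
    Node("paragraph", [
        Node("text", text="Hello"),
        Node("emphasis", [Node("text", text="world")]),
    ]),
])
collector = TextCollector()
tree.walk(collector)
```

The tree classes stay small, and each new kind of processing (a writer,
a transform) becomes a separate visitor class instead of more methods on
every node.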
BTW, I've been working on a reference for the DTD. Both the reference
and the DTD should be useful to you:
http://docutils.sf.net/spec/doctree.html
http://docutils.sf.net/spec/docutils.dtd
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: Dethe E. <de...@ma...> - 2002-08-30 20:29:38
|
Dethe's Design Question #2:
Can reStructured Text be defined in Extended Backus-Naur Form? This
is a standard syntax for describing parsers, used by the [f]lex and
yacc/bison tools, among many others. Having an EBNF description could
have several benefits:
- Allow parallel implementations in other languages
- Formal definition for language lawyers
- Help newcomers bootstrap, since EBNF is a standard way to describe
  languages
- Aid in compliance and completeness testing
This would also separate reStructured Text as a concept and language
from the particular state-machine architecture and implementation.
Besides the standard Unix lexer and parser tools, there are several
general-purpose parsers in Python already which could possibly be
leveraged to reduce the complexity of the reStructured Text
implementation.
More info:
David Mertz on the Spark Parser:
http://www-106.ibm.com/developerworks/linux/library/l-spark.html
David Mertz on the SimpleParse Parser:
http://www-106.ibm.com/developerworks/library/l-simple.html
Markus Kuhn's resources for the ISO EBNF standard:
http://www.cl.cam.ac.uk/~mgk25/iso-ebnf.html
OK, I'm a big David Mertz fan. He's got his own lightweight syntax
that does a small subset of what reStructured Text does, which he uses
as an example in the above articles. He's also working on a book
called "Text Processing in Python" which has a chapter on parsers and
state machines, and may be relevant generally to reST developers.
http://gnosis.cx/TPiP/
--Dethe
--
"Melting down holy grails to make silver bullets for my .357 Panacea"
Dethe Elza (de...@bu...)
Chief Mad Scientist
Enfolding Systems (http://enfoldingsystems.com)
Weblog: http://livingcode.ca/
|
|
From: Dethe E. <de...@ma...> - 2002-08-30 20:15:01
|
While working my way towards understanding the internals of
reStructured Text well enough to write directives, I've had some
questions about the implementation decisions.
Disclaimer: Nothing in this is an attack or criticism of anyone, I'm
just curious and working to understand it myself.
I'm going to put each question in its own email, in case they spawn
any discussion. That aside, here's Dethe's Design Question #1:
reStructuredText appears to create its own XML DOM-like internal
structures and tools, and can be serialized as XML or XHTML. Given
all that, why not use one of the many XML DOMs that already exist for
Python (at least one is distributed with the main Python distro)
rather than recreating the wheel? This would simplify the code base
and allow reST to work with other tools such as XPath and XSLT.
--Dethe
--
"Melting down holy grails to make silver bullets for my .357 Panacea"
Dethe Elza (de...@bu...)
Chief Mad Scientist
Enfolding Systems (http://enfoldingsystems.com)
Weblog: http://livingcode.ca/
|
|
From: David G. <go...@us...> - 2002-08-30 04:20:18
|
With many thanks to Barry Warsaw for his help and patience, I am
pleased to announce that a new format for PEPs (Python Enhancement
Proposals) has been deployed. The new format is reStructuredText, a
lightweight what-you-see-is-what-you-get plaintext markup syntax and
parser component of the Docutils project. From the new PEP 12:
ReStructuredText is offered as an alternative to plaintext PEPs,
to allow PEP authors more functionality and expressivity, while
maintaining easy readability in the source text. The processed
HTML form makes the functionality accessible to readers: live
hyperlinks, styled text, tables, images, and automatic tables of
contents, among other advantages.
The following PEPs have been marked up with reStructuredText:
- PEP 12 -- Sample reStructuredText PEP Template
(http://www.python.org/peps/pep-0012.html)
- PEP 256 -- Docstring Processing System Framework
(http://www.python.org/peps/pep-0256.html)
- PEP 257 -- Docstring Conventions
(http://www.python.org/peps/pep-0257.html)
- PEP 258 -- Docutils Design Specification
(http://www.python.org/peps/pep-0258.html)
- PEP 287 -- reStructuredText Docstring Format
(http://www.python.org/peps/pep-0287.html)
- PEP 290 -- Code Migration and Modernization
(http://www.python.org/peps/pep-0290.html)
In addition, the text of PEP 1 and PEP 9 has been revised.
Authors of new PEPs are invited to consider using the new format, and
authors of existing PEPs are invited to convert their PEPs to
reStructuredText to take advantage of the many enhancements over the
plaintext format. I, along with the other Docutils developers and
users, will be happy to assist. Please send questions to:
doc...@li...
The latest project snapshot can always be downloaded from:
http://docutils.sourceforge.net/docutils-snapshot.tgz
(This is required to process the PEP source into HTML. It requires
at least Python 2.0; Python 2.1 or later is recommended.)
Docutils and reStructuredText are under active development. Input is
very welcome, especially HTML rendering/stylesheet issues with
different browsers. We welcome new contributors. If you'd like to
get involved, please visit:
http://docutils.sourceforge.net/
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: David G. <go...@us...> - 2002-08-30 01:21:53
|
Looks great! I can't wait to see the finished version, and especially
the example code! ;-)
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: Dethe E. <de...@ma...> - 2002-08-29 21:56:21
|
Here's what it's looking like so far. I'm reworking the examples then I'll write up those sections stepping through the code. |
|
From: David G. <go...@us...> - 2002-08-29 00:07:45
|
[David]
>> You almost had me convinced. But after looking at the existing
>> directive code, I saw a problem. Directives that have options,
>> like "image" or "contents", use the options dictionary returned by
>> "docutils.utils.extract_extension_options" to update another
>> dictionary, either "option_presets" or "pending.details". In the
>> case of "image", the "option_presets" parameter contains an "alt"
>> entry, a filename to be used as the last-resort default for the
>> "alt" option. Setting default option values for unspecified
>> options means that the dict.update code would have to become very
>> nasty. *That's* where the lack of defaults pays off: a
>> ``presets.update(options)`` operation doesn't clobber legitimate
>> presets with bogus defaults.
[Dethe]
> But in this case, aren't the presets the defaults?
Yes, but only if there *are* presets (they're rare), and only if
defaults *always* make sense (they don't). Take a look at a
substitution definition which uses an "image" directive::
    .. |symbol| image:: symbol.png
       :height: 50
       :width: 100
This will produce the following pseudo-XML::
    <substitution_definition name="symbol">
        <image alt="symbol" height="50" uri="symbol.png" width="100">
The "alt" attribute comes from the substitution name (bracketed by
"|"). With a straight "image" directive (not in a substitution
definition), what should the default "alt" option be? We rejected the
URL as a default long ago. The only default that would make sense
would be some form of "no value", like ``None``. But then the code
would have to special-case a ``None`` value, removing the option at
some point. Seems like a lot more work than checking for dictionary
keys before accessing them.
And what if there are *no* options? ::
.. |symbol| image:: symbol.png
What should the default values be for the "height" and "width"
options? The directive code would have to go through the options
dictionary and remove any that don't make sense (like ``{'height':
None, 'width': None}``). Here's a perfect example of "not specified"
being different from "default value".
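The difference between "not specified" and "default value" can be shown
with plain dictionaries. This is only a sketch: apply_options and the
sample dictionaries are hypothetical stand-ins for the directive code,
which is not reproduced here.

```python
def apply_options(presets, options):
    """Return the presets overridden by whatever options were given."""
    merged = dict(presets)   # copy, so the caller's presets are untouched
    merged.update(options)   # explicitly specified options win
    return merged


# Preset supplied by the substitution definition |symbol|.
presets = {"alt": "symbol"}

# Options as written in the document: "alt" was simply not specified.
specified_only = {"height": "50", "width": "100"}

# The same options if every unspecified option were filled with a default.
padded_with_defaults = {"alt": None, "height": "50", "width": "100"}

good = apply_options(presets, specified_only)
bad = apply_options(presets, padded_with_defaults)
```

In the first case the preset "alt" survives the update. In the second it
is overwritten by the placeholder None, so the code would need an extra
pass to filter such values out before or after updating.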
> Wouldn't making the presets act specifically as defaults make the
> code simpler rather than more complicated? Or am I just not getting
> it?
When you began this thread, I had a "won't work" feeling, but didn't
know exactly why. During the thread, you almost convinced me twice.
But the feeling never went away and every time I examined the existing
code and examples, I've rediscovered cases where it would be a lot
*more* work to go the defaults route. It just seems so much simpler
the way it is, and I don't see much benefit from using defaults. Have
I convinced you yet? :-)
If not, please code up a solution using defaults and convince me with
proof, not words. ;-)
> Sorry to be so blockheaded on such a relatively trivial matter.
Not at all. Such questions help us examine assumptions that were
made, often without sufficient initial thought. Why did I implement
this the way I did? Straight answer: it made sense at the time. But
now I have to justify the decision. Sometimes examining the decision
in detail invalidates it, sometimes it reinforces its validity.
Either way, asking the questions benefits the project.
For example, Dmitry's "sectnum" directive/transform hinted at a
weakness in the transform priority system. The weakness was fully
exposed by the "target-notes" directive/transform I implemented for
PEPs. It just goes to show that we can't cling to our code or ideas;
we have to be willing to throw them away if and when they're shown to
be deficient in some way. But that's not always easy to do.
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: Dethe E. <de...@ma...> - 2002-08-28 18:24:55
|
Off topic, but to follow up on David's response to Aahz:
The Twiddler is a favorite of wearable-computer enthusiasts. I have one,
but waiting for my new laptop to train myself on it. It combines mouse
and keyboard on one hand fairly elegantly, and can be used with either
hand, plus it's programmable so you can type in words and phrases with a
single chord.
http://www.handykey.com/
--Dethe
|
|
From: Dethe E. <de...@ma...> - 2002-08-28 17:25:17
|
[snipped a bunch of to and fro about default arguments]
> You almost had me convinced. But after looking at the existing
> directive code, I saw a problem. Directives that have options, like
> "image" or "contents", use the options dictionary returned by
> "docutils.utils.extract_extension_options" to update another
> dictionary, either "option_presets" or "pending.details". In the case
> of "image", the "option_presets" parameter contains an "alt" entry, a
> filename to be used as the last-resort default for the "alt" option.
> Setting default option values for unspecified options means that the
> dict.update code would have to become very nasty. *That's* where the
> lack of defaults pays off: a ``presets.update(options)`` operation
> doesn't clobber legitimate presets with bogus defaults.
But in this case, aren't the presets the defaults? Wouldn't making the
presets act specifically as defaults make the code simpler rather than
more complicated? Or am I just not getting it?
Sorry to be so blockheaded on such a relatively trivial matter.
> But at least it's an established standard that many people are
> familiar with. And directives *are* commands; they're commands to the
> parser from inside the document. Again, they're *not* XML elements.
No, they're XML Processing Instructions <0.5 wink>
--Dethe
|
|
From: David G. <go...@us...> - 2002-08-28 00:30:25
|
Dethe Elza wrote:
>> The concept of "optional" is useful. I'm using the XML idea of
>> #IMPLIED attributes, rather than defaults in the DTD (or in the
>> attribute parsing code, in our case). I've always found it more
>> flexible for the downstream parts of the processing chain to make the
>> decisions; keep your options open. If you put in default values
>> early, you lose the information that the attribute just *wasn't
>> specified*, and that can be valuable information lost.
>
> I think that this complicates processing by the user (programmer) of
> a directive. When searching for a value, we'll always have to test
> whether the value is set or not. Note, that the #IMPLIED vs. DTD
> default isn't what I'm arguing here--that's an implementation
> detail. If you're parsing HTML or another known format and the spec
> says that htmlOption.selected returns a boolean, you want to get a
> boolean whether the option was set explicitly <option
> selected="false"/> or not <option/>.
But directives are *not* XML tags. I think that the now-obsolete
"attribute" terminology was misleading. Calling them "options" is
much better. Let's drop the name "attribute" altogether.
> If all we want to know is, 'was this set?' we can initialize to
> false (0) and pass the exists function ('def exists(arg): return
> 1'). If we need to know existence as well as value, then return a
> tuple ('def existsInt(arg): return (1, int(arg))'). Again, this is
> explicit vs. implicit, and it allows you to change things like:
>
>     if attributes.has_key('selected') and attributes['selected']:
>         do_if_true()
>     else:
>         do_if_false()
>
> to:
>
>     if attributes['selected']:
>         do_if_true()
>     else:
>         do_if_false()
>
> Which looks cleaner to me. I guess there's some question of whether
> this is a valid usecase. Personally, I would rather know that I can
> access an attribute and get a value back consistently, without
> having to test for it. I guess it depends on where and how often we
> need to know whether a value has not been set at all.
I first thought up a hypothetical use case counterargument, but it
wasn't convincing:
Say there's a global-impact directive with an option which sets
some persistent parameter, and this directive can occur multiple
times. Subsequent occurrences should use the previous directive's
option settings by default (i.e., don't override the persistent
parameter unless the option was explicitly set). The second time
the directive occurs, if there's a default value for the option,
there's no way to know that the option wasn't explicitly set, and
the default from the first directive's option will be lost.
Perhaps all we have to do is to use an invalid default value (like
``None``), and check for that before resetting the persistent
parameter.
You almost had me convinced. But after looking at the existing
directive code, I saw a problem. Directives that have options, like
"image" or "contents", use the options dictionary returned by
"docutils.utils.extract_extension_options" to update another
dictionary, either "option_presets" or "pending.details". In the case
of "image", the "option_presets" parameter contains an "alt" entry, a
filename to be used as the last-resort default for the "alt" option.
Setting default option values for unspecified options means that the
dict.update code would have to become very nasty. *That's* where the
lack of defaults pays off: a ``presets.update(options)`` operation
doesn't clobber legitimate presets with bogus defaults.
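To illustrate the clobbering problem, here is a small sketch in modern Python. The helper ``parse_options`` and its ``fill_defaults`` switch are hypothetical stand-ins, not the real "docutils.utils.extract_extension_options" function:

```python
def parse_options(specified, known_options, fill_defaults=False):
    """Return parsed options; optionally fill unspecified ones with None."""
    options = dict(specified)
    if fill_defaults:
        for name in known_options:
            options.setdefault(name, None)
    return options


known = ('alt', 'height', 'width')
presets = {'alt': 'symbol.png'}          # last-resort "alt" preset

# Without defaults: the preset survives a plain update.
without_defaults = dict(presets)
without_defaults.update(parse_options({'height': '50'}, known))

# With defaults: the bogus ``None`` overwrites the legitimate preset.
with_defaults = dict(presets)
with_defaults.update(
    parse_options({'height': '50'}, known, fill_defaults=True))
```

In the second case the update code would need an extra filter to keep the "alt" preset, which is the nastiness referred to above.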
I don't find testing for the existence of dictionary keys particularly
onerous. And in this case, the solution seems to be much worse than
the problem.
Unless there's another solution?
>> Modelling directives on shell commands works well. Let's
>> go with it.
>
> Does it?
For me, yes. Can't speak for everybody. :-)
But at least it's an established standard that many people are
familiar with. And directives *are* commands; they're commands to the
parser from inside the document. Again, they're *not* XML elements.
> I find shell command options to be really difficult to remember,
> after ten years of steady use.
Details are always hard to remember. I don't know that we can do more
than standardize the directive interface (into arguments, options, and
content, which we're doing now), choose good names for options, and
document it all well. Directives are always going to be "power user"
tools.
> And shell commands are inherently verbs, making their options
> adverbs, if you will. Documents are inherently nouns, making their
> option/attributes adjective, to stretch a metaphor.
I'd say that directives are verbs too, although most are named after
nouns. The directive syntax should be read as "do X" or "make an X".
But the syntax is easily ignored.
>> Out of curiosity, what would a Zope/ZReST URL look like?
...
> In other words, the URI paths are turned into object references in
> the ZODB (Zope Object Database) and the default action is taken on
> them (call, if it's a method, transform into HTML if it's data,
> etc.) It's a rich and complex environment, with a bit too much
> magic going on behind the scenes for my taste.
I see. Thanks for the explanation!
>> * ".. raw:: format" + either:
>>
>> - a second, optional "filepath" argument, or
>> - a "source" (or equivalent) option, or
>> - (perhaps best?) *two* options, one for a filesystem path source,
>> the other for a URL source (and we can't use both in one
>> directive)
>
> There's actually a usecase for including both. It's the same as
> using both PUBLIC and SYSTEM identifiers for a DTD. Basically, it
> says, get it from this URL if available, or this file if you can't
> get to the URL for some reason. A fallback option, in other words.
I don't know if we're ever "gonna need it", but go ahead if you're
keen.
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: David G. <go...@us...> - 2002-08-28 00:26:14
|
Aahz wrote:
> this is relevant to me, but can't talk about it with my broken
> wrist.
You have my sympathies. That must be affecting your book writing!
Hope you're back to normal soon.
Have you looked into one-handed keyboards? If it's your right hand
that's broken, there's a nifty half-keyboard here (designed for use
with a Palm, but works with anything I think):
http://www.aboutonehandtyping.com/bat.html
Also, there's the TouchStream keyboards that combine keyboard & mouse
in one unit. The TouchStream Mini might help:
http://www.fingerworks.com/overview.html
I'd like to get a TouchStream ST for myself, but can't afford it right
now.
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: David G. <go...@us...> - 2002-08-28 00:24:25
|
fantasai wrote:
> I have a feeling a lot of SGML/XML markup formats don't have their
> own MIME type--that they're just sent as text/sgml or text/xml.
> Perhaps :raw:'s value shouldn't be half a MIME type, but a token
> used as a language identifier. One might want to skip printing raw
> content that doesn't match the output format, and if there are
> several different XML languages, a MIME type won't label the
> difference.
Good point. The intended usage of the "format" argument is exactly
that: a token matching the output format (Writer name or alias). I
don't think MIME types are suitable for our purposes.
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/
|
|
From: Dethe E. <de...@ma...> - 2002-08-27 17:16:54
|
> And yes, you're right about returning a uniform set of data. But we
> actually need a 4-tuple; read on.
Yes, I originally wrote "4-tuple," then modified it to reflect what
you'd written about the spec.
> The concept of "optional" is useful. I'm using the XML idea of
> #IMPLIED attributes, rather than defaults in the DTD (or in the
> attribute parsing code, in our case). I've always found it more
> flexible for the downstream parts of the processing chain to make the
> decisions; keep your options open. If you put in default values
> early, you lose the information that the attribute just *wasn't
> specified*, and that can be valuable information lost.
I think that this complicates processing by the user (programmer) of a
directive. When searching for a value, we'll always have to test
whether the value is set or not. Note, that the #IMPLIED vs. DTD
default isn't what I'm arguing here--that's an implementation detail.
If you're parsing HTML or another known format and the spec says that
htmlOption.selected returns a boolean, you want to get a boolean whether
the option was set explicitly <option selected="false"/> or not
<option/>.
If all we want to know is, 'was this set?' we can initialize to false
(0) and pass the exists function ('def exists(arg): return 1'). If we
need to know existence as well as value, then return a tuple ('def
existsInt(arg): return (1, int(arg))'). Again, this is explicit vs.
implicit, and it allows you to change things like:
    if attributes.has_key('selected') and attributes['selected']:
        do_if_true()
    else:
        do_if_false()
to:
    if attributes['selected']:
        do_if_true()
    else:
        do_if_false()
Which looks cleaner to me. I guess there's some question of whether
this is a valid usecase. Personally, I would rather know that I can
access an attribute and get a value back consistently, without having to
test for it. I guess it depends on where and how often we need to know
whether a value has not been set at all.
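For comparison, the two styles can be sketched in modern Python, where ``has_key`` no longer exists and ``in`` is used instead. The function names and the ``attributes`` dictionaries are made up for illustration and are not Docutils data:

```python
def is_selected_checked(attributes):
    """Test for the key explicitly, then test its value."""
    return 'selected' in attributes and bool(attributes['selected'])


def is_selected_defaulted(attributes):
    """Supply a default at lookup time so a value always comes back."""
    return bool(attributes.get('selected', False))
```

Both give the same answer for an unset option, but only the dictionary without a stored default can still say whether "selected" was written by the author at all.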
> Modelling directives on shell commands works well. Let's go with it.
Does it? I find shell command options to be really difficult to
remember, after ten years of steady use. And shell commands are
inherently verbs, making their options adverbs, if you will. Documents
are inherently nouns, making their option/attributes adjective, to
stretch a metaphor.
> > Well, I think we *do* need URIs if we want includes to be useful to
> > ZReST (which is not file-system based, but lives in an
> > object-oriented database accessible by URI). Obviously, we don't
> > need a file: URI for such a trivial (and file-based) example, that's
> > just a way to test that URIs work in general.
>
> OK, good use case. But I'm still uncomfortable with the "openAny"
> function you sent::
>
>     def openAny(path):
>         try:
>             # is it a file?
>             return open(path)
>         except:
>             try:
>                 # is it a url?
>                 return urlopen(path)
>             except (URLError, ValueError):
>                 # treat as a string
>                 return StringIO(path)
>
> Especially the final StringIO part; the function should simply fail.
> Should we check if "path" is a URL first, to avoid the "open(path)"
> failure? Or is this a case of "look before you leap" vs. "it's easier
> to ask forgiveness than permission"? Can a URI look like a filesystem
> path? Seems a bit ambiguous to me.
Sure. There's no need to have openAny handle strings if we're not
allowing the include directive to have a raw attribute. That was an
artifact of how I was interpreting include at the time.
> Out of curiosity, what would a Zope/ZReST URL look like?
Zope turns object references into URI paths. So a folder which lives at
http://myserver.com/myfolder
could have a document which lives at
http://myserver.com/myfolder/mydocument
which could be processed by a method edit, returning the document in an
edit form
http://myserver.com/myfolder/mydocument/edit
or maybe searched by xpath
http://myserver.com/myfolder/mydocument/xpath?/root/branch/leaf[@selected]
In other words, the URI paths are turned into object references in the
ZODB (Zope Object Database) and the default action is taken on them
(call, if it's a method, transform into HTML if it's data, etc.) It's a
rich and complex environment, with a bit too much magic going on behind
the scenes for my taste.
> > The way I was grabbing (or failing to grab?) the content threw an
> > exception if I put the attributes first, but worked OK if I put them
> > after. I didn't like this either.
>
> The spec is there for a reason. :-) But it's not immutable. It can be
> changed when there's good reason. The code too; there's a *reason*
> we're not at release 1.0 yet! This is a learning experience.
Yup. And I'm learning. I have read the spec, but I haven't
*internalized* it yet the way I have with, say, the DOM. So I'm
learning mostly from the examples of the code, and trying to keep to the
spec. But I know I'll still goof at this point (this is very much a
side project for me right now), so I'm very glad you're there for a
sanity check.
> > But it seems more intuitive to me to put the required argument
> > first: so the first argument to :raw: would always be format and we
> > don't have to run tests or special cases. If there's a second
> > argument, then it's path.
>
> You mis-read. It *is* ".. raw:: format [path]" (the *second*
> argument, "path", is optional).
Oops. My bad.
> > This still isn't as clear and explicit (to me) as using attributes,
> > but better than having argument position be dependent on number of
> > arguments.
>
> It could be::
>
> .. raw:: format
> :source: path/URL
>
> Nothing wrong with that, I suppose. Since it is an *optional*
> argument, it does fit into the "option" mold. And in fact, it could
> be even more explicit (and remove my misgivings at the same time) if
> we made the option more specific::
>
> .. raw:: format
> :file: path
>
> .. raw:: format
> :url: URL
Hey, I like it.
> > Yikes. Now I see why you want get_directive_content(). Doesn't
> > that introduce all sorts of possible ambiguities? Shouldn't there
> > be one format for directives in all cases so that the users (typing
> > in raw text with reST getting in their way as little as possible)
> > don't have to remember the special cases for different directives?
>
> Yes and yes, and I came to the same conclusion, as discussed above.
> Just "parse_directive()" will be sufficient.
Cool.
> * ".. include:: filepath" (must be reStructuredText).
>
> * ".. raw:: format" + either:
>
> - a second, optional "filepath" argument, or
> - a "source" (or equivalent) option, or
> - (perhaps best?) *two* options, one for a filesystem path source,
> the other for a URL source (and we can't use both in one
> directive)
There's actually a usecase for including both. It's the same as using
both PUBLIC and SYSTEM identifiers for a DTD. Basically, it says, get
it from this URL if available, or this file if you can't get to the URL
for some reason. A fallback option, in other words.
> If the "external source" argument or option is specified (in
> whatever form), there can be no directive content. If there is,
> it's an error.
>
> * A single function for directives/__init__.py, "parse_directive".
> (Plus any auxiliary functions required, of course.)
>
> * An exception for directives/__init__.py::
>
> class DirectiveParseError(docutils.ApplicationError): pass
>
> (It doesn't need an "__init__" method, but it doesn't hurt much.)
>
> The signature for "parse_directive" could be something like this::
>
> def parse_directive(match, type_name, state, state_machine,
> option_presets, arguments=None,
> option_spec={}, content=None):
> """
> Parameters:
>
> - `match`, `type_name`, `state`, `state_machine`, and
> `option_presets`: See `docutils.parsers.rst.directives.__init__`.
> - `arguments`: A 2-tuple of the number of ``(required,
> optional)`` whitespace-separated arguments to parse, or
> ``None`` if no arguments (same as ``(0, 0)``). If an
> argument may contain whitespace (multiple words), specify
> only one argument (either required or optional); the client
> code must do any context-sensitive parsing.
> - `option_spec`: A dictionary, mapping known option names to
> conversion functions such as `int` or `float`. ``None`` or
> an empty dict implies no options to parse.
> - `content`: A boolean; true if content is allowed. Client
> code must handle the case where content is required but not
> supplied (an empty content list will be returned).
>
> Returns a 4-tuple: list of arguments, dict of options, list of
> strings (content block), and a boolean (blank finish).
>
> Or raises `DirectiveParseError` with arguments: node (system
> message), boolean (blank finish).
> """
>
> Once "parse_directive" is ready, we'll be able to convert all existing
> directives to use it. As a side-effect, we will be able to drop the
> "data" parameter from the directive function signature. The end
> result will be much simpler directive code. Great result!
Looks great. Thanks for the comments and the help.
--Dethe
|
|
From: Aahz <aa...@py...> - 2002-08-27 14:48:42
|
On Mon, Aug 26, 2002, David Goodger wrote:
>
> I don't mind at all. Actually, I prefer the discussions online; it
> opens up the possibility for more input.
this is relevant to me, but can't talk about it with my broken wrist.
--
Aahz (aa...@py...) <*> http://www.pythoncraft.com/
Project Vote Smart: http://www.vote-smart.org/
|
|
From: David G. <go...@us...> - 2002-08-27 00:49:12
|
I'd already begun a follow-up to my last message, which I'll
interweave into this reply.
[Dethe]
> Thanks for the great feedback. I hope you don't mind me taking this
> back online, I think there's some valuable stuff here.
I don't mind at all. Actually, I prefer the discussions online; it
opens up the possibility for more input.
[David]
>> I just hope I don't scare you away.
>
> Not a bit of it.
Glad to hear it. In the message I'd already begun, I wrote:
If this small task has grown too big or time-consuming or
onerous for you to handle, please let me know and I'll take over,
with gratitude for work done and ideas inspired.
So you've got an escape clause if you change your mind. ;-)
>> I'm looking at the "utils.cheapDirective" function you wrote. I
>> think it's a very good idea to provide a generic directive parser
>> function, but to be truly useful, this function needs to be even
>> *more* generic. Please bear with me here.
>
> More generic would be good, I was working out from the images
> directive and trying to factor what was specific to that directive
> from what was common to all directives as I went.
And a very fruitful side-trip it has become! It's a step I hadn't
taken. Every new directive began with a copy & paste from the most
similar existing one, but I hadn't looked at the patterns yet.
>> From the markup spec, there are 3 physical parts to a directive:
>>
>> 1. Directive type (identifier before the "::")
>> 2. Directive data (remainder of the first line)
>> 3. Directive block (indented text following the first line)
>
> Except that there is a fourth part: The attribute list.
That's a *logical* part, not physical (at least, not by the original
definition). No matter; I think we'll dispense with the old
"physical" terms as obsolete.
>> I think the "include" directive should be reStructuredText-only,
>> and the "raw" directive should have an optional second argument, a
>> path. They definitely should not duplicate each other's
>> functionality (TOOWTDI). In the terms defined above,
>
> I'm not disagreeing, but I got a bit lost in the debate over :raw:
> vs. :include: and wanted a) some context to help me form an
> opinion, and b) to test the generalized directive parsing by using
> it for more than one directive.
Sure. That's cool.
>> - "include" should have one argument, a path. (1)
>>
>> - "raw" should have one or two arguments -- a format ("html" etc.)
>> and an optional path -- and content, but only if there was no
>> path argument. So this would be either a (1) or a (1,3)
>> directive. So that forces us to split the parse in two, perhaps
>> into "parse_directive" (which parses the arguments & attributes),
>> plus "get_directive_content".
>
> Not necessarily. Is there any reason we can't parse a directive and
> return a 3-tuple (arguments, options, content). It would then be up
> to the individual directive to test that these do have values, if
> required.
Originally, I was thinking that if a directive didn't *need* a content
block, it shouldn't consume any that happened to be there. That
would allow a directive to be followed by a block quote, but it makes
directives conceptually more difficult than they need to be. While
updating the markup and directive specs as per my last message, I
thought some more about the logical parts of a directive, and I now
believe that the original thinking was wrong. Instead, all indented
text following a directive *should* be consumed by the directive. The
"parse_directive" function should signal an error if there *is*
content, but the directive doesn't ask for it. If a block quote
should follow a directive, an empty comment inserted between them will
do the trick. Docs updated accordingly.
And yes, you're right about returning a uniform set of data. But we
actually need a 4-tuple; read on.
>> Note that there are no attributes/options now. If an attribute is
>> required, it shouldn't be an attribute. It's analogous to
>> command-line options: "required option" is an oxymoron. I'm thinking
>> of changing the terminology in the spec from "attribute" to "option"
>> to help reinforce this.
(As I threatened here, I've changed terminology from "directive
attributes" to "directive options", in the docs and in the parser
code. See the latest CVS or snapshot.)
> OK, but this means sub-parsing the arguments instead of pulling data
> out of the attributes. My preference would be to have a standard
> way to specify not only the types of attributes, but default values
> for them. Then there are no optional attributes, just default or
> explicitly set. This is a more XML-ish way to go.
The concept of "optional" is useful. I'm using the XML idea of
#IMPLIED attributes, rather than defaults in the DTD (or in the
attribute parsing code, in our case). I've always found it more
flexible for the downstream parts of the processing chain to make the
decisions; keep your options open. If you put in default values
early, you lose the information that the attribute just *wasn't
specified*, and that can be valuable information lost.
Modelling directives on shell commands works well. Let's go with it.
>> Looking at the directives in the test1.rst file in order, first
>> there's::
>>
>> .. include:: file:test2.rst
>>
>> I'm not comfortable with URL syntax here. I really think it's
>> YAGNI, and may open up a big can of worms. So that one should
>> become::
>>
>> .. include:: test2.rst
>
> Well, I think we *do* need URIs if we want includes to be useful to
> ZReST (which is not file-system based, but lives in an
> object-oriented database accessible by URI). Obviously, we don't
> need a file: URI for such a trivial (and file-based) example, that's
> just a way to test that URIs work in general.
OK, good use case. But I'm still uncomfortable with the "openAny"
function you sent::
    def openAny(path):
        try:
            # is it a file?
            return open(path)
        except:
            try:
                # is it a url?
                return urlopen(path)
            except (URLError, ValueError):
                # treat as a string
                return StringIO(path)
Especially the final StringIO part; the function should simply fail.
Should we check if "path" is a URL first, to avoid the "open(path)"
failure? Or is this a case of "look before you leap" vs. "it's easier
to ask forgiveness than permission"? Can a URI look like a filesystem
path? Seems a bit ambiguous to me.
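One possible "look before you leap" shape, as a rough sketch in modern Python. The names ``looks_like_url`` and ``open_source`` and the list of schemes are assumptions for illustration only; nothing like this exists in the code yet:

```python
from urllib.parse import urlparse
from urllib.request import urlopen

URL_SCHEMES = ('http', 'https', 'ftp')   # assumed set, for illustration


def looks_like_url(path):
    """True if `path` starts with one of the assumed URL schemes."""
    return urlparse(path).scheme.lower() in URL_SCHEMES


def open_source(path):
    """Open a URL or a local file; let any failure propagate."""
    if looks_like_url(path):
        return urlopen(path)
    # No StringIO fallback: a missing file is simply an error.
    return open(path)
```

Checking the scheme first removes the guesswork for ordinary paths, though it still has to pick which schemes count as URLs, so some of the ambiguity remains.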
Out of curiosity, what would a Zope/ZReST URL look like?
>> The next one is an "include"/"raw" hybrid::
>>
>> .. include:: test3.rst
>> :raw:
>> :format: html
>
> Well, there were a few possibilities tossed around in email. I
> couldn't remember why the format was even needed (it's *raw*, what
> else do we need?), but then my data wouldn't show up using html.py
> and I had to dig through the writer code to figure out that raw
> nodes are tested for format type.
Yes, it wouldn't do much good to insert raw HTML in the middle of PDF.
The "raw" directive is meant to be a solution of last resort anyhow;
it's not portable.
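The format test can be pictured with a toy sketch in modern Python. ``RawNode`` and ``SketchWriter`` are hypothetical stand-ins invented here, not the real node or Writer classes:

```python
class RawNode:
    """Stand-in for a raw node: a format token plus the literal text."""

    def __init__(self, format, text):
        self.format = format
        self.text = text


class SketchWriter:
    """Toy writer that emits raw text only when the format matches."""

    def __init__(self, name, aliases=()):
        self.names = {name, *aliases}
        self.output = []

    def visit_raw(self, node):
        if node.format in self.names:
            self.output.append(node.text)
        # Otherwise skip silently: raw data meant for another output format.
```

A writer built as ``SketchWriter('html')`` would pass raw "html" nodes through untouched and drop everything else.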
>> It should be::
>>
>> .. raw:: html test3.rst
>
> OK, if that's the correct way to do it in reST, I'll do that. I
> still prefer the default attributes method discussed above rather
> than multiple arguments because it makes the names and types of the
> argument/option/attribute explicit.
Discussed further below.
>> I noticed that you've got the attributes *after* the content in the
>> next ones::
>>
>> .. include::
>> This is a <super>Test</super> of the <sub>Emergency</sub>
>> <strike>Broadcasting</strike> "System"
>> :raw:
>> :format: html
>
> Yes, that was my mis-parsing of the directive (because I was working
> from images.py, not the spec).
Aha!
> The way I was grabbing (or failing to grab?) the content threw an
> exception if I put the attributes first, but worked OK if I put them
> after. I didn't like this either.
The spec is there for a reason. :-) But it's not immutable. It can be
changed when there's good reason. The code too; there's a *reason*
we're not at release 1.0 yet! This is a learning experience.
>> Attributes always come *before* the content (who's to say that
>> ":format: html" isn't valid raw data in some format?). In any case,
>> that directive should become::
>>
>> .. raw:: html
>
> OK. But the same arguments apply. Who's to say :format: html isn't
> valid raw data at the beginning of the data as well as easily as at
> the end? I agree that it's aesthetically and functionally better to
> have the attributes first, just not because the string ":format:
> html" might appear in the data.
That's why the blank line is necessary between options and content.
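In code, the rule amounts to splitting the directive block at its first blank line. A minimal sketch in modern Python, where the function name and the list-of-lines input are assumptions for illustration:

```python
def split_options_and_content(block_lines):
    """Split a directive block at its first blank line.

    Lines before the blank line are treated as the option list,
    everything after it as content.
    """
    for index, line in enumerate(block_lines):
        if not line.strip():
            return list(block_lines[:index]), list(block_lines[index + 1:])
    # No blank line at all: the whole block is option lines.
    return list(block_lines), []
```

With this split, a line such as ":format: html" after the blank line is just data, whatever it looks like.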
> Also, the problem with using multiple arguments comes up. Arguments to
> :raw: are
>
> path format
> or
> format
>
> But it seems more intuitive to me to put the required argument
> first: so the first argument to :raw: would always be format and we
> don't have to run tests or special cases. If there's a second
> argument, then it's path.
You mis-read. It *is* ".. raw:: format [path]" (the *second*
argument, "path", is optional).
> This still isn't as clear and explicit (to me) as using attributes,
> but better than having argument position be dependent on number of
> arguments.
It could be::
    .. raw:: format
       :source: path/URL
Nothing wrong with that, I suppose. Since it is an *optional*
argument, it does fit into the "option" mold. And in fact, it could
be even more explicit (and remove my misgivings at the same time) if
we made the option more specific::
    .. raw:: format
       :file: path
    .. raw:: format
       :url: URL
Although better option names may exist.
>> Next::
>>
>> .. raw::
>> This is <strong>RAW</strong>. Really, <em>really,</em> raw.
>> :format: html
>>
>> Should become::
>>
>> .. raw:: html
>>
>> This is <strong>RAW</strong>. Really, <em>really,</em> raw.
>>
>> (Note the blank line.)
>
> Oh, there's a blank line *before* the content. I didn't get that.
Yes. The directives.txt file was mistaken for the description of
"raw" (although if you didn't *read* it, that shouldn't have mattered
;-). It's fixed now, and the specs are much more explicit. See
http://docutils.sf.net/spec/rst/reStructuredText.html#directives, and
http://docutils.sf.net/spec/rst/directives.html.
> I thought the directive *ends* with a blank line.
There was some ambiguity about that, gone now. The rule is, a
directive block ends with the end of indentation. That's it. If a
directive doesn't need a content block, it should be empty, otherwise
it's an error.
> Yikes. Now I see why you want get_directive_content(). Doesn't
> that introduce all sorts of possible ambiguities? Shouldn't there
> be one format for directives in all cases so that the users (typing
> in raw text with reST getting in their way as little as possible)
> don't have to remember the special cases for different directives?
Yes and yes, and I came to the same conclusion, as discussed above.
Just "parse_directive()" will be sufficient.
>> The last "raw" directive was::
>>
>> .. raw::
>> :include: test3.rst
>> :format: html
>>
>> And should become::
>>
>> .. raw:: html test3.rst
>>
>> (This is the same as the second directive, therefore redundant.)
>
> The redundancy is deliberate. I was testing two different
> possibilities and wanted to be sure they both worked. They do the
> same thing, therefore they work.
Sorry, I should have said, "(This is *now* the same as the second
directive, therefore redundant.)". :-)
> So what you'd like to see is:
>
> :raw: [filepath] format
>
> :include: filepath (which must be reST)
>
> and functions for directives/__init__.py
>
> parse_directive()
>
> and
>
> get_directive_content()
>
> right?
Not quite. Just to be clear, let's summarize:
* ".. include:: filepath" (must be reStructuredText).
* ".. raw:: format" + either:
  - a second, optional "filepath" argument, or
  - a "source" (or equivalent) option, or
  - (perhaps best?) *two* options, one for a filesystem path source,
    the other for a URL source (and we can't use both in one
    directive)
  If the "external source" argument or option is specified (in
  whatever form), there can be no directive content. If there is,
  it's an error.
* A single function for directives/__init__.py, "parse_directive".
  (Plus any auxiliary functions required, of course.)
* An exception for directives/__init__.py::
      class DirectiveParseError(docutils.ApplicationError): pass
  (It doesn't need an "__init__" method, but it doesn't hurt much.)
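As a rough sketch of the "external source excludes content" rule for the two-option variant: the option names "file" and "url" and the helper "check_raw_source" are assumptions made up for this example, and a local stand-in replaces docutils.ApplicationError so the snippet runs on its own. The exception here carries a plain message rather than the (system message node, blank finish) arguments proposed for the real thing:

```python
class ApplicationError(Exception):
    """Stand-in for docutils.ApplicationError so the sketch runs alone."""


class DirectiveParseError(ApplicationError):
    pass


def check_raw_source(options, content):
    """Validate the source of a "raw" directive.

    `options` is a dict of parsed options; the names "file" and "url" are
    hypothetical. `content` is the list of content-block lines.
    Returns the (name, value) pair of the external source, or None if the
    directive content itself is the raw data.
    """
    sources = [(name, options[name]) for name in ("file", "url")
               if name in options]
    if len(sources) > 1:
        raise DirectiveParseError('"file" and "url" cannot be used together')
    if sources and content:
        raise DirectiveParseError(
            "an external source excludes directive content")
    return sources[0] if sources else None


if __name__ == "__main__":
    print(check_raw_source({"file": "test3.rst"}, []))
```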
The signature for "parse_directive" could be something like this::
    def parse_directive(match, type_name, state, state_machine,
                        option_presets, arguments=None,
                        option_spec={}, content=None):
        """
        Parameters:
        - `match`, `type_name`, `state`, `state_machine`, and
          `option_presets`: See `docutils.parsers.rst.directives.__init__`.
        - `arguments`: A 2-tuple of the number of ``(required,
          optional)`` whitespace-separated arguments to parse, or
          ``None`` if no arguments (same as ``(0, 0)``). If an
          argument may contain whitespace (multiple words), specify
          only one argument (either required or optional); the client
          code must do any context-sensitive parsing.
        - `option_spec`: A dictionary, mapping known option names to
          conversion functions such as `int` or `float`. ``None`` or
          an empty dict implies no options to parse.
        - `content`: A boolean; true if content is allowed. Client
          code must handle the case where content is required but not
          supplied (an empty content list will be returned).
        Returns a 4-tuple: list of arguments, dict of options, list of
        strings (content block), and a boolean (blank finish).
        Or raises `DirectiveParseError` with arguments: node (system
        message), boolean (blank finish).
        """
Once "parse_directive" is ready, we'll be able to convert all existing
directives to use it. As a side-effect, we will be able to drop the
"data" parameter from the directive function signature. The end
result will be much simpler directive code. Great result!
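To illustrate how the `arguments` and `option_spec` parameters described in the docstring above might behave, here is a small standalone sketch. The helper names "split_arguments" and "convert_options" are hypothetical, and errors are raised as plain ValueError instead of DirectiveParseError to keep it short. This is one possible reading of the proposal, not the final implementation:

```python
def split_arguments(text, arguments=None):
    """Split directive argument text per a (required, optional) tuple."""
    required, optional = arguments or (0, 0)
    text = text.strip()
    if required + optional == 1:
        # A single argument may contain whitespace: keep the text whole.
        args = [text] if text else []
    else:
        args = text.split()
    if len(args) < required:
        raise ValueError("%d argument(s) required, %d supplied"
                         % (required, len(args)))
    if len(args) > required + optional:
        raise ValueError("maximum %d argument(s) allowed, %d supplied"
                         % (required + optional, len(args)))
    return args


def convert_options(raw_options, option_spec=None):
    """Convert option strings using the functions in `option_spec`."""
    option_spec = option_spec or {}  # None or {} means no options allowed
    converted = {}
    for name, value in raw_options.items():
        if name not in option_spec:
            raise ValueError('unknown option: "%s"' % name)
        converted[name] = option_spec[name](value)
    return converted


if __name__ == "__main__":
    # ".. raw:: html test3.rst" with one required, one optional argument
    print(split_arguments("html test3.rst", (1, 1)))
    print(convert_options({"width": "40"}, {"width": int}))
```

A directive function would then only declare its counts and conversion functions, and leave the splitting and checking to the shared code.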
--
David Goodger <go...@us...> Open-source projects:
- Python Docutils: http://docutils.sourceforge.net/
(includes reStructuredText: http://docutils.sf.net/rst.html)
- The Go Tools Project: http://gotools.sourceforge.net/