• String-Based Macro Systems

    From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.misc on Sat Apr 13 02:29:56 2024
    From Newsgroup: comp.lang.misc

    I think most of us are familiar with the “#define” preprocessor in C and C++. There are more powerful macro processors around, like GNU m4. They
    all have the same basic concept: pass input text straight through to
    output, until something triggers a macro substitution on the text.

    The original m4 was created by the Unix folks at Bell Labs, modelled on an earlier concept called “Macrogenerator” by Christopher Strachey (one of the brains behind the programming language CPL, which led to BCPL, which
    led to B and then C). Macrogenerator had special symbols to indicate macro definition, and macro and argument expansion:

    §DEF,«name»,<«definition»>;

    where the “<” and “>” are actual quote symbols in the notation, while I
    use “«” and “»” as metasyntactic brackets. Within the «definition»,
    occurrences of “~1”, “~2” etc are replaced with the first, second etc actual argument specified in the call. You then use this macro as

    §«name»,«args»;

    where multiple arguments are comma-separated.

    Simple example: given

    §DEF,greetings,<Hello, ~1!>;

    then

    I would just like to say, “§greetings,world;” to anybody listening

    should expand to

    I would just like to say, “Hello, world!” to anybody listening
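
    As a rough illustration of the substitution just described, here is a minimal Python sketch. It handles only flat (non-nested) calls and ignores the “<” and “>” quoting, so it is not a faithful Macrogenerator; the function name expand_call and the dictionary of definitions are assumptions made for the example.

```python
import re

def expand_call(definitions, text):
    """Expand flat calls of the form §name,arg1,arg2; found in text."""
    def substitute(match):
        name, *args = match.group(1).split(",")
        body = definitions[name]
        # Replace ~1, ~2, ... with the corresponding actual argument.
        return re.sub(r"~(\d+)", lambda m: args[int(m.group(1)) - 1], body)
    # A call runs from § to the next ; with no nested § in between.
    return re.sub(r"§([^§;]*);", substitute, text)

# Equivalent of §DEF,greetings,<Hello, ~1!>;
definitions = {"greetings": "Hello, ~1!"}
result = expand_call(
    definitions,
    "I would just like to say, “§greetings,world;” to anybody listening",
)
print(result)
```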

    Here is a moderately interesting example, from the Bryan Higman book where
    I first heard about this. It uses a builtin called §UPDATE, which assigns a new value to an existing macro name. Note also the occurrence of
    §DEFs within §DEFs, for local (temporary) macro definitions. The auxiliary macro §Q has to persist between invocations, so it cannot be one of these:

    §DEF,Q,A;
    §DEF,AORB,<§§Q;;>,§DEF,A,<A§UPDATE,Q,B;>;,§DEF,B,<B§UPDATE,Q,A;>;;

    Each time you write “§AORB;”, it expands alternately to
    “A” or “B”.
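
    The alternation can be modelled in Python as follows. This is only a sketch of the behaviour, not of Macrogenerator's expansion algorithm: the names macros, local_macros, update and aorb are invented for the example, and the local definitions are represented as ordinary functions.

```python
# Persistent macro table; corresponds to §DEF,Q,A;
macros = {"Q": "A"}

def update(name, value):
    """Stand-in for §UPDATE: assign to a macro name that already exists."""
    if name not in macros:
        raise KeyError(f"cannot update undefined macro {name!r}")
    macros[name] = value

def call_a():
    update("Q", "B")   # <A§UPDATE,Q,B;>
    return "A"

def call_b():
    update("Q", "A")   # <B§UPDATE,Q,A;>
    return "B"

# The two temporary macros that exist only while AORB is being expanded.
local_macros = {"A": call_a, "B": call_b}

def aorb():
    # §§Q;; : expand Q to obtain a name, then call the macro with that name.
    return local_macros[macros["Q"]]()

sequence = "".join(aorb() for _ in range(4))
print(sequence)
```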

    The big difference with m4 is that it does away with these special
    symbols; the mere occurrence of a name matching a defined macro (or an argument of the macro currently being expanded) is sufficient to trigger substitution. Do you think this is a good idea?
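
    To see what name-triggered substitution implies, here is a small Python sketch. It is not m4's algorithm (it makes a single pass, with no quoting and no rescanning of the output); it only shows that every bare word matching a definition is replaced, whether or not the author meant it as a macro call. The function name expand_bare is an assumption for the example.

```python
import re

# A word here is a letter or underscore followed by letters, digits, underscores.
WORD = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

def expand_bare(definitions, text):
    """Replace every whole word that matches a defined macro name."""
    return WORD.sub(lambda m: definitions.get(m.group(0), m.group(0)), text)

definitions = {"index": "i + 1"}
expanded = expand_bare(definitions, "see the index of array[index]")
print(expanded)
```

    Both occurrences of “index” are replaced, including the one in ordinary prose, while a longer word such as “reindex” is left alone because it is a different token.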

    There are all kinds of pitfalls with such macro systems. The original Macrogenerator could not cope with substitutions containing unpaired
    “< ... >” quote symbols, and even GNU m4 lacks something as simple as a backslash-style “escape next single character, whatever it is”. While m4 lets you switch the quoting symbols, it still insists that they occur in pairs.

    Would adding such an escape character be useful?
    --- Synchronet 3.20a-Linux NewsLink 1.114
  • From Blue-Maned_Hawk@bluemanedhawk@invalid.invalid to comp.lang.misc on Sat Apr 13 05:09:26 2024

    Lawrence D'Oliveiro wrote:

    The big difference with m4 is that it does away with these special
    symbols; the mere occurrence of a name matching a defined macro (or an argument of the macro currently being expanded) is sufficient to trigger substitution. Do you think this is a good idea?

    There are all kinds of pitfalls with such macro systems. The original Macrogenerator could not cope with substitutions containing unpaired “<
    ... >” quote symbols, and even GNU m4 lacks something as simple as a backslash-style “escape next single character, whatever it is”. While m4 lets you switch the quoting symbols, it still insists that they occur in pairs.

    Would adding such an escape character be useful?

    Yes, of course.

    Whenever a system has a system to escape symbols, there are two ways to go about it: either the symbol is magic by default, and the escape makes it normal, or the symbol is normal by default, and the escape makes it magic.

    Having both of the systems at once is generally confusing, because it
    makes it difficult to remember which symbols are which. It's more
    practical to have all of them be one or the other.

    One could say that having the symbols only become magic upon escapement is better, because it clearly indicates when a symbol has magic properties.
    This is analogous to the logic used to defend sigils, a form of
    disambiguation repeatedly found to be pointless because names already do
    that disambiguation. Therefore, the correct choice is magic by default.

    One fallacious argument i've heard used to justify magic by default is
    that it means that the treatment of the escape symbol itself is consistent with all the other symbols in that it's magic by default unless escaped by itself. I consider this fallacious because in a system where magic must
    be explicit, the escape symbol would be the _only_ exception, and it would
    be _impossible_ to make any others—what i'd say is a worthwhile sacrifice.

    Either way, figuring out the solution to the problem of “Magic: by
    default or by request?” is almost certainly a lower priority than the majority of other problems.
    --
    Blue-Maned_Hawk│shortens to Hawk│/blu.mɛin.dʰak/│he/him/his/himself/Mr. blue-maned_hawk.srht.site
    (?<sigil> [&*\$\@\%])
  • From Lawrence D'Oliveiro@ldo@nz.invalid to comp.lang.misc on Sat Apr 13 05:51:27 2024

    On Sat, 13 Apr 2024 05:09:26 -0000 (UTC), Blue-Maned_Hawk wrote:

    Whenever a system has a system to escape symbols, there are two ways to
    go about it: either the symbol is magic by default, and the escape
    makes it normal, or the symbol is normal by default, and the escape
    makes it magic.

    And here’s another question: is magic iterative? Is text produced by a
    macro substitution automatically subject to further macro substitutions?

    This is true of Macrogenerator and m4, but perhaps this is a source of a
    lot of the problems with string-based macro systems.

    On the other hand, if you didn’t do this, then how would you implement the example I gave?

    §DEF,AORB,<§§Q;;>,§DEF,A,<A§UPDATE,Q,B;>;,§DEF,B,<B§UPDATE,Q,A;>;;

    If “§A;” expands literally to “A§UPDATE,Q,B;” with no further special
    interpretation of the embedded “§”, then how would you explicitly request invocation of the “UPDATE” function?

    The answer would be, the body of the macro would not directly be
    interpreted as literal text, but would have to consist of a sequence of explicit directives, like “insert literal text”, “insert expansion of a further macro” and so on.
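
    A minimal Python sketch of that idea follows. The directive names ("literal", "call", "call_named_by", "update") and the helpers define and expand are assumptions made up for the example, and A and B are defined globally rather than as local macros to keep it short.

```python
macros = {}

def define(name, body):
    """Store a macro body as a list of explicit directives."""
    macros[name] = body

def expand(name):
    out = []
    for directive, *operands in macros[name]:
        if directive == "literal":
            out.append(operands[0])            # insert text, never rescanned
        elif directive == "call":
            out.append(expand(operands[0]))    # insert expansion of a macro
        elif directive == "call_named_by":
            # Expand one macro to get a name, then expand the macro so named.
            out.append(expand(expand(operands[0])))
        elif directive == "update":
            target, new_body = operands
            macros[target] = new_body          # reassign an existing macro
        else:
            raise ValueError(f"unknown directive {directive!r}")
    return "".join(out)

define("Q", [("literal", "A")])
define("A", [("literal", "A"), ("update", "Q", [("literal", "B")])])
define("B", [("literal", "B"), ("update", "Q", [("literal", "A")])])
define("AORB", [("call_named_by", "Q")])

sequence = "".join(expand("AORB") for _ in range(4))
print(sequence)
```

    Because literal text is never rescanned, a “§” inside a "literal" operand would stay inert; the only way to invoke anything is through an explicit directive.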
  • From James Harris@james.harris.1@gmail.com to comp.lang.misc on Fri May 3 10:01:57 2024

    On 13/04/2024 06:09, Blue-Maned_Hawk wrote:
    Lawrence D'Oliveiro wrote:

    The big difference with m4 is that it does away with these special
    symbols; the mere occurrence of a name matching a defined macro (or an
    argument of the macro currently being expanded) is sufficient to trigger
    substitution. Do you think this is a good idea?

    There are all kinds of pitfalls with such macro systems. The original
    Macrogenerator could not cope with substitutions containing unpaired “<
    ... >” quote symbols, and even GNU m4 lacks something as simple as a
    backslash-style “escape next single character, whatever it is”. While m4 lets you switch the quoting symbols, it still insists that they occur in
    pairs.

    Would adding such an escape character be useful?

    Yes, of course.

    Whenever a system has a system to escape symbols, there are two ways to go about it: either the symbol is magic by default, and the escape makes it normal, or the symbol is normal by default, and the escape makes it magic.

    Having both of the systems at once is generally confusing, because it
    makes it difficult to remember which symbols are which. It's more
    practical to have all of them be one or the other.

    One could say that having the symbols only become magic upon escapement is better, because it clearly indicates when a symbol has magic properties.
    This is analogous to the logic used to defend sigils, a form of disambiguation repeatedly found to be pointless because names already do
    that disambiguation. Therefore, the correct choice is magic by default.

    Interesting points though I am not sure how you got to that conclusion
    (or what you mean by "the logic used to defend sigils").

    In particular, magic characters are sometimes used in contexts in which
    there are no "names" with which to do any disambiguation. For example,
    the regular expression to match "parts" and "party" might be

    "part[sy]"

    I presume you would take that as magic-by-default so any occurrence of a
    magic symbol needs to be escaped as in a[b] appearing as

    "a\[b\]"

    Alternatively, if magic symbols were prefixed with ~ then the above two strings would appear as

    "part~[sy~]"
    "a[b]"

    Is that really worse?
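
    For anyone who wants to experiment with the magic-by-request form, here is a small Python sketch that translates it into the syntax accepted by Python's re module. The function name tilde_to_regex is hypothetical, and the rule that “~~” yields a plain “~” is an assumption of the sketch rather than part of the proposal.

```python
import re

def tilde_to_regex(pattern, marker="~"):
    """Translate a magic-by-request pattern into Python regex syntax."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == marker:
            if i + 1 >= len(pattern):
                raise ValueError("dangling marker at end of pattern")
            out.append(pattern[i + 1])   # marked: keep its special meaning
            i += 2
        else:
            out.append(re.escape(ch))    # unmarked: always literal
            i += 1
    return "".join(out)

magic = tilde_to_regex("part~[sy~]")
plain = tilde_to_regex("a[b]")
print(magic)
print(plain)
```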


    One fallacious argument i've heard used to justify magic by default is
    that it means that the treatment of the escape symbol itself is consistent with all the other symbols in that it's magic by default unless escaped by itself. I consider this fallacious because in a system where magic must
    be explicit, the escape symbol would be the _only_ exception, and it would
    be _impossible_ to make any others—what i'd say is a worthwhile sacrifice.

    Indeed. In C, backslash does double duty

    \n - backslash /gives/ significance to n
    \" - backslash /removes/ the significance of the double quote

    That inconsistency does seem odd.


    Either way, figuring out the solution to the problem of “Magic: by
    default or by request?” is almost certainly a lower priority than the majority of other problems.

    It's an important issue nonetheless. And aren't there two contexts, as follows?

    (1) A string which has to be converted by the compiler into binary
    encodings, e.g.

    "Hello\nWorld"

    (2) A string which /after any conversions/ means something to a function
    which processes it, e.g.

    "Hello\:space:world"

    where "\:space:" is meant to indicate whitespace to some program which processes the string and implies that the backslash has to remain in the encoded string.
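
    The two contexts can be seen side by side in Python. This is only an analogy: the "\:space:" notation above is hypothetical, so the sketch uses the \s class of Python's re module as the escape that must survive into the stored string.

```python
import re

# Context 1: the language converts the escape while reading the literal.
greeting = "Hello\nWorld"
assert len(greeting) == 11          # \n became a single newline character

# Context 2: the backslash must survive into the stored string, because a
# later consumer (here the regex engine) is what gives it meaning.
pattern = "Hello\\sworld"           # the stored text is Hello\sworld
assert pattern == r"Hello\sworld"   # a raw literal skips context 1 entirely
assert len(pattern) == 12

matches_space = re.fullmatch(pattern, "Hello world") is not None
matches_tab = re.fullmatch(pattern, "Hello\tworld") is not None
print(matches_space, matches_tab)
```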
    --
    James Harris
  • From ram@ram@zedat.fu-berlin.de (Stefan Ram) to comp.lang.misc on Fri May 3 09:33:16 2024

    James Harris <james.harris.1@gmail.com> wrote or quoted:
    Alternatively, if magic symbols were prefixed with ~ then the above two strings would appear as
    "part~[sy~]"
    "a[b]"
    Is that really worse?

    I reckon it's all about the stats. If a symbol shows up more often in
    "regular text" than when used as a magic symbol, you're more inclined
    to type more when it's used as a magic symbol, and vice versa.
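
    The trade-off can be made concrete with a short Python sketch that writes the same pattern under both conventions and compares lengths. The set of special symbols, the helper name render and the choice of escape characters are all assumptions made for the illustration.

```python
SYMBOLS = set("[]")   # the symbols that can be either magic or literal

def render(tokens, magic_by_default, escape):
    """Write (char, is_magic) tokens using one of the two conventions."""
    out = []
    for ch, is_magic in tokens:
        # A symbol needs the escape only when its role differs from the default.
        if ch in SYMBOLS and is_magic != magic_by_default:
            out.append(escape)
        out.append(ch)
    return "".join(out)

# "part[sy]" with the brackets meant as magic; "a[b]" with them meant literally.
magic_use = [(c, c in SYMBOLS) for c in "part[sy]"]
literal_use = [(c, False) for c in "a[b]"]

for tokens in (magic_use, literal_use):
    by_default = render(tokens, magic_by_default=True, escape="\\")
    by_request = render(tokens, magic_by_default=False, escape="~")
    print(by_default, len(by_default), by_request, len(by_request))
```

    Whichever role a symbol plays more often in the text being written is the one that should cost no extra characters.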

    \n - backslash /gives/ significance to n
    \" - backslash /removes/ the significance of the double quote
    That inconsistency does seem odd.

    This "inconsistency" here only stems from how you've worded
    it above. You could phrase it differently, and then the
    "inconsistency" would vanish.