Hacker News | decomposed's comments

It is "amazingly simple to get started" but also amazingly difficult to go much deeper. Some of this, IMO, is because the preferred introduction to Red/Rebol is commonly the View (GUI) subsystem. Gradual documentation and other structured paths to the deeper end of Red/Rebol appear to be non-existent.

I've seen Red compared to Lisp, Smalltalk, Forth, and other concise languages.

But Smalltalk syntax can famously be fully represented on a postcard. (https://pharoweekly.wordpress.com/2018/06/02/all-pharo-synta...) Lisp syntax is even simpler and more orthogonal. (Lisp BNF: http://cuiwww.unige.ch/isi/bnf/LISP/BNFlisp.html) I didn't find a good, common BNF for Forth in a quick web search, but the stack is exposed, and complete, standardized implementations are historically common; I wrote one myself once upon a time.

What about Red/Rebol? Where is the BNF?

Anyway, the road to deep Red/Rebol mastery is heavily shrouded at present. All that said, the Red project is amazing and pretty much unique in ambition, and I, for one, am certainly hoping it makes it through to a solid 1.0. At that point, I hope, a strong effort will allow the language to be fully grokked by, at least, the folks who are comfortable with the likes of Lisp and Haskell.

Let's keep our fingers crossed and support the 2019 roadmap.


> Where is the BNF?

Here [1]. Red and Smalltalk syntax look similar, but the semantics are very different. The only relation Red/Rebol have to object-oriented languages is through prototype-based OOP, as in e.g. Self [2].

Forth doesn't have a grammar specification AFAIK - any space-separated string of ASCII tokens is a valid Forth program, though this varies from dialect to dialect. It's more of an idea than a programming language.

[1]: https://github.com/meijeru/red.specs-public/blob/master/spec...

[2]: http://www.selflanguage.org/


It is good to see there IS a specification; I - and, I think, some others (see the Temple OS comments) - had concerns that Red/Rebol might actually be too ad hoc (throw things against the wall and see if they stick).

Is there a means to access the lexer directly, as a way to make explicit how, exactly, Red is interpreting? The use of blocks in Red, and the layout in general, did remind me of Smalltalk, but in Smalltalk I could at least rely on really simple syntax rules, including some really useful grouping tools to break things up into more human-friendly chunks (parens, semicolon, and period).

The spec says: "As a first operation of the toolchain, the text file will be subjected to lexical analysis which will break the text up in a series of lexemes, i.e. textual representations of Red single values, interspersed with grouping tokens. The grouping tokens should occur in properly nested pairs, and are the following: ( ), [ ], #( ), #[ ]. A sequence of lexemes enclosed in matching grouping tokens represents a Red grouped value of a certain type, and this construct may again be enclosed in grouping tokens etc. Note that the token pairs " ", #" ", { }, #{ } and < > each delimit a single value, they are not grouping tokens...As a rule, lexemes must be separated from each other and from grouping tokens by one or more whitespace characters."

Anyway, I think trivial access to the lexer output, so one might SEE exactly how the Red interpreter is viewing things, would go a long way toward dispelling the "Temple OS" concern and providing the on-ramp that IMO is otherwise missing.

Any help on this front?


> Red/Rebol might actually be too ad hoc

No offence, but very few people who raise such concerns actually take the time to learn the language (or even launch a REPL at least once) and understand its design, so I appreciate you digging deeper. Things are "thrown against the wall" only in terms of the constant search for a sound business model, and this, I believe, is a struggle that any startup (especially a programming language) faces.

> Is there means to access the lexer directly as a means to explicate how, exactly, Red is interpreting?

I think you can start by, well, reading the lexer code, which is written in the Parse dialect [1], but the specification I showed you might be more approachable. But really, just grab the latest build and start playing; I'll give some very basic examples below.

Now, to the main point: from what I know, Rebol (and Red) are based on research in denotational semantics that Carl Sassenrath did. I'll try to briefly explain the main points.

As you already know, everything starts with a UTF-8 encoded string. Each valid token in this string is converted to an internal data representation - a boxed structure called a value slot, or sometimes a cell.

A value slot is composed of a header and a payload. The header contains various flags and a datatype ID; the payload specifies the exact content of the value. If the content doesn't fit in one value slot, then the payload contains a pointer to an external buffer (an array of value slots, bytes, or other units, plus an offset and start/end addresses IIRC) with the extra data.
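In Red/System terms, such a cell can be pictured roughly like this (a sketch from memory - the field names and layout are illustrative, not the exact runtime definition):

```red
cell!: alias struct! [
    header [integer!]   ; datatype ID + various flags
    data1  [integer!]   ; payload, or pointer to an external buffer
    data2  [integer!]   ; more payload (e.g. series offset)
    data3  [integer!]   ; more payload (e.g. series node pointer)
]
```

The fixed slot size is what lets blocks be plain arrays of cells, with bigger values spilling into external buffers.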

So, the lexer converts the string representation to a tree of value slots (this phase is called "loading"), which is essentially a concrete syntax tree (CST) - this is the crux of homoiconicity.

  >> "6 * 7"
  == "6 * 7"
  >> type? "6 * 7"
  == string!
  >> load "6 * 7"
  == [6 * 7]
  >> type? load "6 * 7"
  == block!
  >> first load "6 * 7"
  == 6
  >> type? first load "6 * 7"
  == integer!
 
Everything is a (first-class) value, and every value has a datatype (we have roughly 50 of them right now). And there's no code - only this data structure, which is just a block that you can freely manipulate at will (as you can any other value).

  >> reverse [6 * 7]
  == [7 * 6]
  >> append reverse [6 * 7] [+ 1]
  == [7 * 6 + 1]
  >> skip append reverse [6 * 7] [+ 1] 2
  == [6 + 1]
What the interpreter does is just a "walk" over this tree of values, dictated by a set of simple evaluation rules (expressions are evaluated left to right, operators take precedence over functions and bind their left side more tightly, literals evaluate to themselves, functions are applied to a fixed set of arguments, symbolic values of type "set-word!" [more on this later] are bound to the result of the expression that follows them, etc.), but there are a couple of catches.

The first catch is that some values are symbolic - that is, they indirectly refer to some other values via a context (namespace). You can modify this reference (called binding) freely at runtime, and thus change the meaning of symbolic values and of an entire block that contains them.

So, the "meaning" of a given block is always relative to some context(s) - this is what relative expression means (the RE in REBOL). And a context itself is just an environment of key/value pairs (the key is a "symbol", the value is its "meaning") represented as an object (the O in REBOL).

  >> block: [6 * 7]             ; "block:" is a value of type "set-word!"
  == [6 * 7]
  >> type? second block
  == word!                      ; words are one of the symbolic values I've mentioned
  >> do block
  == 42
  >> bind block object [*: :+] ; now "multiplication" means "addition"
  == [6 * 7]
  >> do block
  == 13
The second catch is that you are not restricted to the default interpreter (represented by the "do" function) and can use any other one, or even implement your own, thus making an embedded DSL - a dialect in Red/Rebol parlance.

* Red/System takes a block of C-level code and does the prescribed job.

* View takes a block that specifies GUI layout and shows a fancy window.

* Draw takes a block of drawing commands and renders an image.

* Parse takes an input series and a block of PEG grammar, and parses the input.

* Math takes a block and interprets it with common operator precedence rules.

The sky is the limit, and it's dialects all the way down.
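To make the dialect idea concrete, here's a tiny taste of Parse in the REPL (assuming a recent Red build - the rules here are deliberately trivial):

```red
>> parse "aaab" [some "a" "b"]      ; one or more "a", then "b"
== true
>> parse [1 2 3] [some integer!]    ; Parse works on blocks too, matching by datatype
== true
```

The rule blocks are themselves just Red blocks - data until the moment parse interprets them.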

To reiterate: the basic building block (no pun intended) is a block of values, which can represent either code (a relative expression which, upon evaluation, will yield a value) or data (just a bunch of values arranged in a specific format - such micro-formats are considered dialects too †). A block can also contain symbolic values (called words) which can change their binding during evaluation, and thus alter the semantics of the expression.

There's a lot hiding behind the facade, as you can see. And what is there is hardly an ad-hoc hodge-podge slapped together.

[1]: https://github.com/red/red/blob/master/environment/lexer.red

(†): one example of such micro-format dialect is function specification, e.g.

  spec: [
      "Add two numeric values together"
      x [number!]
      y [number!]
  ]
is a specification (or an "interface") for a function that performs addition. Here's a block that expresses the addition of two specific numbers:

  [1 + 2]
If we wish to abstract over it, we can substitute words for 1 and 2:

  expression: [x + y]
And then we can alter the bindings of these words to the actual arguments we wish to add together:

  bind expression object [x: 1 y: 2]
We can then evaluate such an expression and yield the resulting value:

  >> do expression
  == 3
The trick is that functions are just an abstraction over the evaluation of an expression in some environment - that is, syntactic sugar for

  do bind [...] object [...]
with some additional optimizations and type-checking. So, addition can instead be expressed as:

  >> add: func spec expression
  == func [
    "Add two numeric values together" 
    x [number!] y [number!]
  ][x + y]
  >> type? :add ; ":add" is a value of type "get-word!" which, on evaluation, yields the function value referred to by the word "add" as-is, without triggering its application.
  == function!
  >> add 1 2
  == 3


Thank you for the details; I will study them more.

But let me try again with my main question. I want to see what the lexer outputs. How can I view that?

I'd like to be able to hand a string (or file) of what I think is valid (or invalid) Red code and have the lexer return to me that input NOT executed but parsed into lexemes, grouping tokens, and appropriate white space.

Again, according to the specification you linked, lexing is a discrete step that resolves any syntactic ambiguity ahead of interpretation/execution. So I'd think accessing the lexer output would be trivial and transparent - not to mention invaluable for learning and debugging Red. Why should the lexer be crystal clear about exactly what is going to be interpreted, and in what order, while the human is sometimes left in the dark?

Then, obviously, I'd like to be able to pass this lexed and "grouped" string directly into the interpreter for execution as a second step. Showing me the result of interpreting a string isn't nearly as helpful as this would be. Can this be done?

Why would I want to do this? Well, I'm weird. ;-/ More seriously, at this point you guys don't have a good stepping debugger - this would help me debug my own code. It would also be a rather cool thing to be able to do just for the hell of it, as a great exploratory exercise. I would think I could then play with the lexemes individually once I can see exactly what the lexer sees, instead of only what I think - too often mistakenly - the lexer sees.

Thanks for entertaining my questions.


> I'd like to be able to hand a string (or file) of what I think is valid (or invalid) Red code and have the lexer return to me that input NOT executed but parsed into lexemes, grouping tokens, and appropriate white space.

This is what "load" essentially does - it takes a string and returns a concrete syntax tree [1]. "do" then takes that CST and evaluates it. You can do that in a single-step manner with "do/next" (there's also "load/next"). Moreover, you can think of blocks as phrase markers (see the wiki example):

  [S [NP John] [VP [V hit] [NP the [N ball]]]]
Now compare it to Red internal data structure written down in similar fashion:

  <block! [<integer! 1> <word! +> <integer! 2>]>
Here I loosely follow phrase-marker notation - nonterminals correspond to datatypes, terminals correspond to literal values. But the datatype is implied by the literal form, and implicitly contained in each value slot (it's the datatype ID tag in the header), so I can tidy this up to:

  [1 + 2]
This is what the lexer returns - just a block. You can then manipulate it whichever way you want and evaluate it.
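Single-stepping with "do/next" might look roughly like this - a sketch assuming the refinement takes a word that receives the position after the evaluated expression (this interface is from memory; check `help do` in your build):

```red
; load the whole string once, then evaluate one expression at a time
code: load "1 + 2 3 * 4"
probe do/next code 'rest   ; evaluates only the first expression
probe rest                 ; the remaining, not-yet-evaluated part
```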

> at this point you guys don't have a good stepping debugger

We don't have one for a reason - implementing it in user space would be extremely limited, as you can't really distinguish between code and data, and so can't adequately place debugging hooks. The best alternative is just to spruce up the block with debugging "print"s and such, or, in the case of syntactic errors, use "load/trap" and search for error values. Rebol had some interesting projects in this regard - Anamonitor comes to mind [2]. I think someone has already ported it to Red, or re-implemented it using our reactive framework.
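For example, since code is just a block, you can splice debugging calls in before evaluating: "probe" prints a molded value and passes it through unchanged. The block and insertion position here are purely illustrative:

```red
>> code: [x: 6 y: 7 x * y]
== [x: 6 y: 7 x * y]
>> insert skip code 4 [probe x]   ; inject a probe before the final expression
== [x * y]
>> do code                        ; probe prints x, then evaluation continues
6
== 42
```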

There was a recent discussion in the community chat about that, and we came to the conclusion that a debugger should rather be implemented at the level of "load" and "do" themselves (e.g. "load" can provide some meta info, like line numbers and file names, and "do" can then single-step in real time and provide meaningful info).

[1]: https://en.wikipedia.org/wiki/Parse_tree

[2]: http://rebol2.blogspot.com/2011/11/anamonitor-2-check-block-...

