Compiling to another language is always going to be very slow, and very frustrating, because one's expressiveness is limited to the expressiveness of the target language.
(Yes, I know, because C++ is Turing Complete, anything is expressible, but it's a matter of efficiency.)
For example, what if the language needed exceptions to work in a different way than the target does?
And these days, one can easily use existing backends like the Digital Mars one (Boost Licensed!), the Gnu one or the LLVM one. Why punish yourself by emitting code in another language? Natalie already requires gcc or clang anyway.
I could never have gotten D to work if it emitted C code as output.
The C++ that cfront compiled in the 80s was far simpler than today's C++. And if you ever looked at the code it generated, you'd be horrified.
One major problem it had in generating C was the lack of COMDAT sections in the generated object file. These were needed for generating code for the same member function in multiple source files. C compilers would just stick the code into the text segment, resulting in multiple-definition errors from the linker.
I am not saying Bjarne was an inept programmer. He's actually brilliant, it was just a very very hard problem he set for himself, and it's amazing cfront worked as well as it did.
The cfront on DOS also was very slow (imagine writing the generated source code out to floppies!) and crippled because it didn't have near and far pointers. Zortech C++ was the first native C++ compiler, which bypassed all these problems. ZTC++ simply vaulted over cfront by making C++ simple and fast on DOS. DOS at the time was where 90% of the programming action was, so this was no small thing.
My opinion, and that of a few others at the time, was that Zortech saved C++ from oblivion by having a usable implementation on DOS. You can see that by the traffic level in comp.lang.c++ at the time - it took off immediately after ZTC++ was released, and never looked back.
I always love your responses. I didn't know about COMDAT, but I did use the SAS compiler on the Amiga which was Cfront based. I didn't know enough to be horrified, but I often looked at the output because I was integrating the output with regular C code (interpreters, Arexx, etc). It was kinda like Hungarian Notation++ ;) I got to the point where it felt natural to read the name mangling.
The Amiga SAS/C compiler was also pretty slow. I can see how ZTC would be popular, skipping the intermediate step and not having to reinvoke the compiler.
How many problems from the 80s melted away because of the large amount of ram available in the 90s?
The dmd code generator was/is a C++ compiler backend too, and can trace its lineage back to the first compilers to make Cfront obsolete.
We could emit C code, but it's just worse when you need to get to the cutting edge. Even if you end up going through the same backend via C, you've now (say) lost devirtualization.
>Compiling to another language is always going to be very slow
I'm curious as to why you think this is necessarily true? It's always going to be _slower_ than writing the target language directly, of course; that's just a tautology. But if I pick a target language with a really fast compiler, say Golang (judging purely by reputation, not actual experience), then I have a huge head start on other languages, right? Even though I'm slower than Golang, I'm faster than other languages because Golang is so much faster than them. This is not the case with C++, but that's just a special case.
I think the comparison you're actually making is different: parsing a language, processing it, then writing it out to another language (which itself needs to be parsed and processed, etc.) is always slower than directly emitting whatever final IR we eventually reach. But like I said above, if the target language is chosen well, the overhead doesn't matter much, and in return you get the benefits of using a source language as your IR; I'll get to that later in reply to another point you made.
>and very frustrating, because one's expressiveness is limited to the expressiveness of the target language.
But C++ is extremely expressive; indeed, it's _too_ expressive for humans to grok and control.
>For example, what if the language needed exceptions to work in a different way than the target does?
I'm not the brightest bulb on runtimes and exceptions so I will take your (admittedly brilliant, I'm a fan) word that this is actually an insurmountable problem.
But isn't this a time-precision tradeoff? If you don't compile to a language with exceptions, then you need to make exceptions work from scratch. Using a source language as a target, you get a working, heavily-debugged exception mechanism, but you're now constrained to what it can express.
>Why punish yourself by emitting code in another language?
This is where the benefit of compiling-to-source that I mentioned above comes in: it's an extremely low-barrier-to-entry strategy. Even though VM bytecode formats and other IRs are specifically designed to be clean and abstract targets that wrap the ugly details of platforms/architectures, there is nothing more clean and abstract than a source language designed to be used by humans.
It's extremely attractive for new compiler writers to not have to learn yet another language and keep its (quite low level) details in their mind along with the source one, they can just specify an equivalent program in another source language they already know and get all of its toolchain for free.
This can backfire in cases like C and C++, where the languages are not actually clean at all and there are tons of special cases and undefined behaviour that most developers ignore, but again that's just a special case, there is no reason the overall approach can't work with other languages.
>I could never have gotten D to work if it emitted C code as output
Like I said, it's an entry-level strategy. It works when you need something working and you need it fast, once you have something working you can ditch the makeshift backend and create a proper one (hopefully you haven't relied on any implicit semantics of the previous backend). It's like an intermediate point between an honest-to-God compiler and a naive tree-walking interpreter, they are all points on the same tradeoff axis.
> I'm curious as to why you think this is necessarily true ?
It's always going to be slower simply because it's two compilers rather than one. You're writing another file to disk, and reading it back in. The lexing and parsing have to be done over again.
C++ only works if it is a superset of your language. For example, suppose your language wants to trap integer overflow. C++ doesn't do that. Think of how you'd write a+b in C++ and check for overflow. It isn't pretty or efficient. Or suppose you wanted to do a computed goto. C++ doesn't have that, and it's not easy or efficient to rewrite it. (gcc has it as an extension.)
Suppose you wanted to use the BCD arithmetic type in the x87. You're out of luck using C++. Or the 80 bit reals in the x87. You're out of luck with many C++'s as they don't support that.
> if you don't compile to a language with exceptions then you need to make exceptions work from the very scratch.
Yes, not an easy task at all.
> there is no reason the overall approach can't work with other languages
You'll find, as a practical matter, that if you're using language X as the target of your language, it will inevitably constrain the semantics of your language to be that of X. You can't even do things like use a different function call ABI.
> It works when you need something working and you need it fast
I bet you'll get something working fast, but trying to get the last 25% working will consume much more time than if you used an existing, well-developed back end.
RubyMotion is probably as good as it gets for AOT. Ruby depends on dynamic dispatch, and thankfully objc is very similar to the needs of a Ruby-like language.
Truffle might work even better as it’s able to recompile.
Any C++ port will likely need to reimplement half of the objc runtime to support all of Ruby. Not sure if clang's/gcc's objc support includes the runtime, but I'm imagining it would… so maybe it's reusable that way.
It is not dead. There are regular releases, a helpful community at slack.RubyMotion.com and training available at https://wndx.school (that last is mine)
For me, it's that it's almost guaranteed a C++ compiler is available for the platform I'm targeting. This could be for various reasons, but it's usually either:
1. The chip isn't supported by LLVM (which most new languages use)
2. The platform owner /requires/ the use of their C++ only SDK and will not approve any other compilers for use on their app store, so C++ becomes the "machine language" of that platform.
Some people may do it because it's more approachable to them (and/or others). Others may have language models/runtimes that align closely with the target language. Or they may want to "stand on the shoulders of giants", benefiting from the higher-level optimizations of the target language. Also, for example, LLVM still doesn't target as many architectures as gcc (though I'm not saying that's necessarily very relevant for most users).
That's just what comes to mind. I can't say for certain anything about this particular language though!
I’m a little confused about this project. I’m trying to build the most complete list of programming languages out there (currently working on it via my favorites, if you want to have a look), and I’m trying to figure out if this qualifies.
Usually it’s pretty easy to get on my list: if you call your project a language I add it to the list.
But this one gives me a little pause, because it seems like this language is not distinct from Ruby at all. Rather, it's a straight-up Ruby -> C++ compiler.
Is it fair to call this a language rather than a compiler? To me, a language is more than syntax and semantics; it includes a library ecosystem, tooling, and community. Does Natalie aim to grow a community, or will it exist fully within the Ruby ecosystem?
In a strict CS sense, languages are defined by what they accept, and how they parse what they accept.
Given that this will only accept and correctly parse a subset of the Ruby language, in a strict CS sense, it accepts a different language to Ruby.
In time, this difference may shrink (as Natalie becomes more complete), become larger (as Ruby gets more fully featured) or diverge so that Natalie isn't a subset (e.g. if Natalie decides for some reason to parse some Ruby construct differently to Ruby).
This isn't the first time something like this has been done; for a while Facebook used a PHP to C++ compiler before switching to compiling it directly to native code via a JIT:
The domain is Natalie-lang.org and the header on that page is “Natalie Programming Language”, so that’s where I’m a little confused. Maybe the goal isn’t to be a Ruby compiler but to transition into something more, so I was wondering if anyone had any idea.
In the JavaScript world this would be called a transpiler. Instead of compiling a newer version of JS into an older one, this compiles Ruby into a totally different language, C++. However, the Ruby interpreter (MRI) is written in C, so C++ is less far from it than, let's say, Java. I didn't check the code, but I wonder if they inlined some code from MRI.
I was under the impression this can already be achieved via TruffleRuby compilation into a native image? Not that I've used it, but I thought this is doable, and the process is well tested.
autoconf is *nix-only in practice, macro-based, verbose, and carries a significant legacy burden. I have suffered plenty of pain trying to build the few projects that still use it. In such cases it's often been faster to write my own build scripts for some other, newer build system.
CMake and Meson I've used across tens of different projects; there's solid online documentation, significant usage, a good underlying language, and I already have my toolchain files ready. Using autoconf instead of something more common and modern is likely to turn people off contributing to or using your project, as they'll be unfamiliar with it and will have more trouble solving issues on their own.