Hacker News

That's untrue.

(Famously so: Intel used to ship ARM chips with WMMX, and Apple, for example, ships its CPUs today with the AMX AI acceleration extension.)



WMMX was exposed via the ARM coprocessor mechanism, so it was permitted by the architecture. The coprocessor mechanism was removed in ARMv8.


Now the custom instructions sit directly in the regular instruction space...

(Plus there's the can of worms of target-specific MSRs being writable from user space: Apple does this as part of APRR to flip the JIT region from RW- to R-X and back without a round trip to the kernel. That also has the advantage that the state can be changed per thread.)
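To make the per-thread point concrete, here is a toy Python model. It is explicitly not Apple's implementation: the class `PerThreadJitPermissions` is hypothetical, and a thread-local variable stands in for the user-space system-register write. The contrast it illustrates is with mprotect(2), which changes a mapping's permissions for every thread in the process at once.

```python
import threading
from typing import Tuple


class PerThreadJitPermissions:
    """Toy model (hypothetical, not Apple's code): the effective permission
    of one JIT region is selected by a per-thread "register" value, so a
    flip needs no kernel call and does not affect other threads."""

    def __init__(self) -> None:
        # threading.local gives each thread its own copy of the "register".
        self._reg = threading.local()

    def set_writable(self, writable: bool) -> None:
        # Stands in for the user-space register write; no syscall involved.
        self._reg.writable = writable

    def effective_perms(self) -> str:
        # Threads that never touched the "register" see the region as R-X.
        return "RW-" if getattr(self._reg, "writable", False) else "R-X"


def demo() -> Tuple[str, str]:
    jit = PerThreadJitPermissions()
    jit.set_writable(True)  # this thread is emitting code
    seen_by_other = []
    worker = threading.Thread(
        target=lambda: seen_by_other.append(jit.effective_perms()))
    worker.start()
    worker.join()
    # (view in the writing thread, view in another thread at the same time)
    return jit.effective_perms(), seen_by_other[0]


print(demo())  # ('RW-', 'R-X')
```

The writer thread sees the region as writable while another thread still sees it as executable, which is exactly what a process-wide mprotect cannot give you.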


In ARMv8 you have a much cleaner mechanism through system registers (MSR/MRS).
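For readers unfamiliar with the mechanism: MRS/MSR name a system register by five fields (op0, op1, CRn, CRm, op2), and part of that space (op0 = 3 with CRn = 11 or 15, written S3_<op1>_C<n>_C<m>_<op2>) is reserved for IMPLEMENTATION DEFINED registers, so a vendor can add registers without new opcodes. A minimal Python sketch of the encoding, based on our reading of the Arm Architecture Reference Manual; `encode_sysreg_move` and `is_impdef_register` are hypothetical helpers, not any tool's API.

```python
def encode_sysreg_move(read: bool, op0: int, op1: int, crn: int, crm: int,
                       op2: int, rt: int) -> int:
    """Encode A64 MRS (read=True) or MSR register form (read=False).

    Layout: 1101010100 | L | op0(2) | op1(3) | CRn(4) | CRm(4) | op2(3) | Rt(5)
    """
    if op0 not in (2, 3):
        raise ValueError("MRS/MSR (register) use op0 = 2 or 3")
    fields = ((op1, 3), (crn, 4), (crm, 4), (op2, 3), (rt, 5))
    if any(not 0 <= v < (1 << w) for v, w in fields):
        raise ValueError("field out of range")
    return (0xD5000000 | (int(read) << 21) | (op0 << 19) | (op1 << 16)
            | (crn << 12) | (crm << 8) | (op2 << 5) | rt)


def is_impdef_register(op0: int, crn: int) -> bool:
    # op0 == 3 with CRn == 0b1x11 (11 or 15) is the IMPLEMENTATION DEFINED space.
    return op0 == 3 and crn in (11, 15)


# TPIDR_EL0 is S3_3_C13_C0_2, an architected register.
print(hex(encode_sysreg_move(True, 3, 3, 13, 0, 2, 0)))  # 0xd53bd040 (mrs x0, tpidr_el0)
print(is_impdef_register(3, 15))                         # True
```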


Apple has been using system registers for years already. AMX is interesting because it uses actual instruction encodings that are unallocated in the spec.


That's like saying that my Intel CPU comes with an NVIDIA Turing AI acceleration extension. The instructions an Apple ARM-based CPU can run are all ARM ISA. That's part of the license arrangement: if you fail ARM's compliance tests (which include not adding your own instructions or modifying the included ones), you can't use ARM's license.

Please, stop spreading nonsense. All of this is public knowledge.


No. I reverse-engineered it, and AMX on the Apple A13 is an instruction set extension running on the main CPU core.

The Neural Engine is a completely separate hardware block, and there are good reasons to have such an extension directly on the CPU, such as reducing latency for short-running tasks.


Are the AMX instructions available in EL0?

Is it possible AMX is implemented with the implementation-defined system registers and aliases of SYS/SYSL in the encoding space reserved for implementation-defined system instructions? Do you have the encodings for the AMX instructions?
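The question can be checked mechanically: MSR/MRS and SYS/SYSL all share bits 31:22 = 1101010100, and (per our reading of the Arm ARM) CRn = 11 or 15 marks the IMPLEMENTATION DEFINED space both for SYS/SYSL (op0 = 1) and for system registers (op0 = 3). A minimal Python sketch; `decode_system` is a hypothetical helper, and it only splits fields, it does not validate every sub-encoding.

```python
from typing import Optional


def decode_system(word: int) -> Optional[dict]:
    """Split an A64 word from the system-instruction group into fields.

    Returns None if bits 31:22 are not 1101010100 (i.e. not in that group).
    """
    if word & 0xFFC00000 != 0xD5000000:
        return None
    f = {
        "L": (word >> 21) & 1,        # 1 = MRS/SYSL (read), 0 = MSR/SYS
        "op0": (word >> 19) & 0b11,   # 1 = SYS/SYSL, 2/3 = system registers
        "op1": (word >> 16) & 0b111,
        "CRn": (word >> 12) & 0xF,
        "CRm": (word >> 8) & 0xF,
        "op2": (word >> 5) & 0b111,
        "Rt": word & 0x1F,
    }
    # CRn == 0b1x11 is reserved for IMPLEMENTATION DEFINED use.
    f["impdef"] = f["op0"] in (1, 3) and f["CRn"] in (11, 15)
    return f


print(decode_system(0xD53BD040)["impdef"])  # False: mrs x0, tpidr_el0 (architected)
print(decode_system(0xD508F000)["impdef"])  # True: a SYS with CRn = 15
```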


Yes, AMX instructions are available in EL0, and they are used by CoreML and Accelerate.framework.

A sample instruction: 20 12 20 00... which doesn't by any stretch parse as a valid arm64 instruction in the Arm specification.

Edit: Some other AMX combinations off-hand:

00 10 20 00

21 12 20 00

20 12 20 00

40 10 20 00
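The byte dumps above can be checked against the top level of the A64 decode table. The sketch below is a simplified Python classifier covering only bits 28:25 plus the UDF special case, written from our reading of the Arm Architecture Reference Manual; `to_word` and `a64_top_level_group` are hypothetical helpers. It treats each dump as a little-endian 32-bit word, since A64 instructions are stored little-endian. It only shows where the words fall in the published spec; it says nothing about what the hardware does with them.

```python
AMX_SAMPLES = ["00 10 20 00", "21 12 20 00", "20 12 20 00", "40 10 20 00"]


def to_word(hexbytes: str) -> int:
    # Bytes are in memory order; A64 instruction words are little-endian.
    return int.from_bytes(bytes.fromhex(hexbytes), "little")


def a64_top_level_group(word: int) -> str:
    op1 = (word >> 25) & 0xF  # bits 28:25 select the top-level group
    if op1 == 0b0000:
        if word >> 31:
            return "SME (Armv9) or unallocated"
        # Reserved group: only UDF (bits 31:16 all zero) is allocated.
        return "UDF" if (word >> 16) == 0 else "unallocated (reserved group)"
    if op1 in (0b0001, 0b0011):
        return "unallocated"
    if op1 == 0b0010:
        return "SVE"
    if op1 & 0b1110 == 0b1000:   # 100x
        return "data processing (immediate)"
    if op1 & 0b1110 == 0b1010:   # 101x
        return "branches, exceptions, system"
    if op1 & 0b0101 == 0b0100:   # x1x0
        return "loads and stores"
    if op1 & 0b0111 == 0b0101:   # x101
        return "data processing (register)"
    return "scalar FP and Advanced SIMD"  # x111


for s in AMX_SAMPLES:
    w = to_word(s)
    # every sample prints "unallocated (reserved group)"
    print(f"{s} -> {w:#010x}: {a64_top_level_group(w)}")
```

All four words also share the same upper 22 bits (0x00201000 after masking with 0xFFFFFC00), so only the low bits vary between the samples.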


Very interesting, thanks!


The AMX is an accelerator block... If you concluded otherwise, your reverse-engineering skills are not great...

Let me repeat this: part of the ARM architectural license says that you can't modify the ISA. You have to implement a whole subset (the manual says what's mandatory and what's optional), and only that. This is, as I've been saying, public knowledge. This is how it works. And there are very good reasons for this, like avoiding fragmentation and ARM losing control of its own ISA.

And once again, stop spreading misinformation.


Hello,

Specifically about the Apple case,

Given your tone, I'm certainly not obligated to answer, but I'll write a quick reply...

The Apple A13 adds AMX, a set of (mostly) AI acceleration instructions that are also useful for matrix math in general. AMX is configured through the AMX_CONFIG_EL1/EL12/EL2/EL21 registers; AMX_STATE_T_EL1 and AMX_CONTEXT_EL1 are also present.

The list of instructions is at https://pastebin.ubuntu.com/p/xZmmVF7tS8/ (I haven't bothered to document it publicly, at least not yet).

Hopefully that clears things up a bit,

And please don't ever do this again, thank you. (It also doesn't comply with the guidelines.)

-- a member of the checkra1n team


You may be correct, but do you really have to be so attacking?


Can you provide a link to the "public knowledge" for those who don't know?



