That's untrue. (famously so, Intel used to ship arm chips with WMMX and Apple fo...

rrss · on Sept 14, 2020

WMMX was exposed via the ARM coprocessor mechanism (so it was permitted by the architecture). The coprocessor stuff was removed in ARMv8.

my123 · on Sept 14, 2020

Now custom instructions are directly in the regular instruction space...

(Plus there's the can of worms of target-specific MSRs being writable from user space: Apple does this as part of APRR to flip the JIT region from RW- to R-X and vice versa without a trip to the kernel. That also has the advantage that the state is modifiable per thread.)

Followerer · on Sept 14, 2020

In ARMv8 you have a much cleaner mechanism through system registers (MSR/MRS).

saagarjha · on Sept 14, 2020

Apple has been using system registers for years already. AMX is interesting because it's actual instruction encodings that are unused by the spec.

Followerer · on Sept 14, 2020

That's like saying that my Intel CPU comes with an NVIDIA Turing AI acceleration extension. The instructions that an Apple ARM-based CPU can run are all ARM ISA. That's in the license arrangement: if you fail to pass ARM's compliance tests (which include not adding your own instructions, or modifying the ones included), you can't use ARM's license.

Please, stop spreading nonsense. All of this is public knowledge.

my123 · on Sept 14, 2020

No. I reverse-engineered it and AMX on the Apple A13 is an instruction set extension running on the main CPU core.

The Neural Engine is a completely separate hardware block, and you have good reasons to have such an extension available on the CPU directly, to reduce latency for short-running tasks.

rrss · on Sept 14, 2020

Are the AMX instructions available in EL0?

Is it possible AMX is implemented with the implementation-defined system registers and aliases of SYS/SYSL in the encoding space reserved for implementation-defined system instructions? Do you have the encodings for the AMX instructions?
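One way to test this hypothesis against a candidate encoding is to check whether the word falls in the A64 system-instruction class at all. The sketch below assumes the field layout as I read it in the Arm Architecture Reference Manual (bits [31:22] fixed at `0b1101010100` for MSR/MRS/SYS/SYSL, with implementation-defined space at `op0` of 1 or 3 and `CRn` of 11 or 15). The function names are hypothetical, and the word `0x00201220` assumes the sample bytes `20 12 20 00` quoted further down the thread are in little-endian memory order.

```python
def sys_fields(word):
    """Split a 32-bit A64 word into system-instruction fields.

    Returns None if the word is outside the MSR/MRS/SYS/SYSL class,
    i.e. bits [31:22] are not 0b1101010100 (assumed layout, see above).
    """
    if (word >> 22) != 0b1101010100:
        return None
    return {
        "L":   (word >> 21) & 0x1,   # 1 = read (MRS/SYSL), 0 = write (MSR/SYS)
        "op0": (word >> 19) & 0x3,
        "op1": (word >> 16) & 0x7,
        "CRn": (word >> 12) & 0xF,
        "CRm": (word >> 8) & 0xF,
        "op2": (word >> 5) & 0x7,
        "Rt":  word & 0x1F,
    }


def is_impdef_system(word):
    """True if the word sits in the implementation-defined system space."""
    f = sys_fields(word)
    return f is not None and f["op0"] in (0b01, 0b11) and f["CRn"] in (11, 15)


if __name__ == "__main__":
    candidates = {
        "mrs x0, midr_el1": 0xD5380000,        # architected register
        "mrs x0, s3_0_c15_c0_0": 0xD538F000,   # implementation-defined register
        "sample from thread": 0x00201220,      # assumes little-endian bytes
    }
    for name, word in candidates.items():
        print(f"{name:24s} 0x{word:08x} "
              f"system={sys_fields(word) is not None} "
              f"impdef={is_impdef_system(word)}")
```

Under these assumptions, an encoding that is an alias of SYS/SYSL or an implementation-defined register access would report `system=True`, so a word that reports `system=False` cannot be explained that way.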

my123 · on Sept 14, 2020

AMX instructions are available in EL0, yes, and are used by CoreML and Accelerate.framework.

A sample instruction: 20 12 20 00... which doesn't by any stretch parse as a valid arm64 instruction in the Arm specification.

Edit: Some other AMX combinations off-hand:

00 10 20 00

21 12 20 00

20 12 20 00

40 10 20 00
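For readers who want to look at these byte strings as instruction words, here is a minimal sketch. It assumes the bytes are listed in memory order (little-endian, as arm64 instructions normally are stored); the helper names are hypothetical. It prints each 32-bit word together with bits [28:25], the field the A64 top-level decode table switches on. As I understand the Arm manual, a value of `0000` there with bit 31 clear is not allocated to any instruction class, which would be consistent with the claim above, but treat that reading as an assumption.

```python
import struct

# Byte strings exactly as quoted in the comment above.
SAMPLES = ["20 12 20 00", "00 10 20 00", "21 12 20 00", "40 10 20 00"]


def to_word(hex_bytes):
    """Turn 'aa bb cc dd' (assumed memory order) into a 32-bit word."""
    return struct.unpack("<I", bytes.fromhex(hex_bytes))[0]


def top_level_group(word):
    """Return bits [28:25] of the instruction word."""
    return (word >> 25) & 0xF


if __name__ == "__main__":
    for s in SAMPLES:
        w = to_word(s)
        print(f"{s} -> 0x{w:08x}  "
              f"bits[28:25]={top_level_group(w):04b}  "
              f"upper20=0x{w >> 12:05x}  low12=0x{w & 0xFFF:03x}")
```

Splitting each word into its upper 20 and lower 12 bits makes it easy to see which part of the encoding is shared across the samples and which part varies.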

rrss · on Sept 14, 2020

very interesting, thanks!

Followerer · on Sept 14, 2020

The AMX is an accelerator block... If you concluded otherwise, your reverse-engineering skills are not great...

Let me repeat this: part of the ARM architectural license says that you can't modify the ISA. You have to implement a whole subset (the manual says what's mandatory and what's optional), and only that. This is, as I've been saying, public knowledge. This is how it works. And there are very good reasons for this, like avoiding fragmentation and losing control of their own ISA.

And once again, stop spreading misinformation.

my123 · on Sept 14, 2020

Hello,

Specifically about the Apple case,

Given your tone, I'm certainly not obligated to answer, but I'll write a quick reply...

Apple A13 adds AMX, a set of (mostly) AI acceleration instructions that are also useful for matrix math in general. AMX configuration happens at the level of the AMX_CONFIG_EL1/EL12/EL2/EL21 registers, with AMX_STATE_T_EL1 and AMX_CONTEXT_EL1 also being present.

The list of instructions is at https://pastebin.ubuntu.com/p/xZmmVF7tS8/ (didn't bother to document it publicly at least at this point).

Hopefully that clears things up a bit,

And please don't ever do this again, thank you. (this also doesn't comply with the guidelines)

-- a member of the checkra1n team

eklavya · on Sept 14, 2020

You may be correct, but do you really have to be so attacking?

btian · on Sept 14, 2020

Can you provide a link to the "public knowledge" for those who don't know?