Overview over NASA's new RISC-V-based Processor

Overview over NASA's new RISC-V-based Processor
Photo by NASA / Unsplash

It happens not often that NASA switches to a new processor for their spacecraft. For example the rover Perseverance, which landed on Mars in 2021, uses for its processor the RAD750, developed some 30 years ago. Its based on a PowerPC 750, the one that was also used in the original iMac. Since then the hardware and software have proven themselves on many missions. However its compute performance, scalability, security and fault tolerance are lacking. In 2022 NASA awarded Microchip with a contract to design a new processor. Its to be used an all future lunar and planetary missions. The new processor should provide 100x the compute performance and also address the other needs. Recently Microchip announced that this processor is now being delivered. Its called PIC64-HPSC.

The main function blocks are depicted in the block diagram below:

Source: Microchip

It looks like a modern processor containing next to the CPU, memory and system controller a dedicated security coprocessor, lots of peripherals and an Ethernet Switch.

Processing Performance and Fault tolerance

The processing power is delivered by 8 SiFive X280 CPU Cores. The X280 is advertised with a compute performance of 4.6 Specint2k6 / GHz.

I'v tried to find ARM Cores with a similar Specint2k6 score. The closest one I found was a Cortex A75, estimated at 6.8 / Specint2k GHz.
Going by ARMs advertised generational performance gains of 20 % (A75 over A73) and 30 % (A73 over A72) and doing some crude math this would put the Cortex A72 at a similar performance level:

6.8 / (1.3 * 1.2) = 4.3 Specint2k6 / GHz

compared to SiFives X280. The Cortex A72 is used for example in the Raspberry Pi 4's Broadcom BCM2711 SoC. Given its similar performance and power level and popularity I thought the Pi 4 to be an interesting comparison point. Additionally its meant to highlight some space specific particularities of the PIC64, as the Pi is not build for space application but meant for teaching and sees some industrial applications.

The following table compares key specs of the Microchip PIC64 HPSC1000 with its predecessor the BAE RAD750 and the Raspberry Pi 4 SoC BCM2711.

BAE RAD750 Microchip PIC64 HPSC1000-RH Raspberry Pi 4 Broadcom 2711
1x RAD750 8x SiFive X280 4x ARM Cortex A72
ISA PowerPC RISC-V ARM
fmax
200 MHz 1000 MHz 1500 MHz / 1800 MHz
MIPS 366 26000
Real Adress Range 32 bit 64 bit 64 bit
Cache L1 32 KiB I$ 8-ways
32 KiB D$ 8-ways
32 KiB I$ 4-ways
32 KiB D$ 4-ways
48 KiB I$ 3-ways
32 KiB D$ 2-ways
Cache L2 external shared 2 MiB shared 1 MiB 16-ways
Execution Sequence Out of Order in order Out of Order
Pipeline Stages 4 8 15
Issue Width 2 2 (Scalar) + 1 (Vector) 8 (Scalar/Vector)
Vector Length n.a. 512 bit (512 bit datapath) 128 bit (64 bit data path)
Memory SDRAM 256 MB
ECC
133 MT/s * 64 bit = 1 GB/s
DDR3-2133 or DD4-3200 max 2x32 GB
with multi-Bit ECC
3200 MT/s * 128 bit = 51.2 GB/s
LPDDR4-3200 max. 8 GB

12.8 GB/s

Compared to recent x86 or ARM application class processors found in Laptops or Smartphones, SiFives X280 cores scalar processing capabilities are limited. Up to two instructions can be issued per cycle which are executed in order. In comparison even the Cortex A72 of the Raspi 4 are much more complex: capable of changing the instruction sequence to utilize the eight functional units as best as possible. What sets the X280 apart from many other core designs with similar scalar capabilities is the 512 bit long vectors with a true 512 bit data path. In comparison the Cortex A72 has smaller 128 bit vectors while the internal data path is only 64 bit. This means when using vector instructions, the per cycle throughput of an X280 core is 8 times higher compared to an Cortex A72. When using small numbers like INT8, which are typical for machine learning, up to 64 Multiply-Accumulate (MAC) operations can be performed by one 512-bit vector instruction.

On x86 512 bit vectors are part of the AVX512 instruction set extension. This can be found only in recent high performance µArch like Zen4/Zen5 and some Intel CPUs. Looking in the ARM world, 512 bit vectors are possible via the Scalable Vector Extensions (SVE). 512 bit vectors are used e.g. in Fugaku, worlds fastest supercomputer from 2020 to 2022 or Amazons data center processor Graviton3.

These vector processing capabilities allow the processor, which does not feature a dedicated Neural Processing Unit, to provide some reasonable throughput at advertised 2 TOPS. The TOPS number represents the theoretical maximum number of 8-bit-integer Multiply-Accumulate (MAC) operations that can be performed per second by all cores. A "G" refers to 10⁹ (Giga) or one billion, a "T" refers to 10¹² (Tera) or a trillion.
The OPS are the product of the number of cores, the frequency f and the number of 8 bit operations per vector. Although Multiply-Accumulate (MAC) is typically executed using just one instruction, its counted as two operations, hence the factor 2.

For comparison the latest iPhone 15 Pro features a dedicated NPU, capable of 35 TOPS, while Microsoft requires at least 40 TOPS for their "AI PCs".
The following table compares the performance of NASAs old and new processor as well as the P i4, using some popular CPU benchmarks where I could scores. Higher is better. Overall the performance uplift is substantial, achieving 65x the DMIPS score of previous processor. Compared to the ARM Cortex A72 cores in the Pi, the single core is worse, maybe half as good. The multicore performance seems on a similar level, thanks to having double the amount of cores.

System BAE RAD750 Microchip PIC64 HPSC1000-RH Raspberry Pi 4
Single Core 1x RAD750 @ 200 MHz 1x SiFive X280 @ 1000 MHz 1x ARM Cortex A72 @ 1500 MHz
Dhrystone MIPS 400 3250* 8176
Dhrystone MIPS / MHz 2* 3.25 5.45
specINT2k6 / GHz 4.6 4.3*
All Core 8x SiFive X280 @ 1 GHz 4x ARM Cortex A72 @ 1800 MHz
DMIPS 400 26,000
CoreMark 46,000 48,628
INT8 own calc 1 * 0.2 GHz * 2 = 0.4 GOPS 8 * 512/8 * 1 GHz *2 = 1000 GOPS 4 * 64/8 * 1.5 GHz *2 = 96 GOPS
INT advertised 2000 GOPS

*indicates own calculations based on reported scores

The CPU can be configured to run in lockstep, trading off half the CPU cores for increased fault tolerance, e.g. detection of single event upsets. With lockstep, a pair of CPUs executes the same code, with some cycles delay between them. When ever they come to different results, an error is raised and appropriate action can be taken. This can be as simple as running the code section again.

Next to the eight main CPU cores, there are two RISC-V cores embedded into the system and secure controller. Those are meant for house keeping tasks, freeing up the main CPU. This underscores the flexibility of the RISC-V architecture, being able to scale from simple embedded class cores to complex application class cores.
The secure controller acts as a platform root of trust. It supports secure boot. Additionally the post quantum cryptography protocols ML-KEM and ML-DSA are supported.

Scalability and Flexibility

Notable peripherals are CXL (Compute Express Link) TSN (Time Sensitive Networking). Those are relatively new standards, released only in the 2010s and 2020s, compared to I2C and SPI, released in the 1980s.

CXL was originally developed for high performance data center computers. It allows the host CPU and devices to work together in a shared coherent memory space. The coherency is managed by the CPU, enabling low latency and efficient memory access with a less compex software stack. Processing workloads can be more easily distributed between the CPU and accelerators (e.g. FPGA, GPU). This allows NASA to flexibly scale the performance of the computer as required by future missions.

TSN makes Ethernet based networks real-time capable. In contrast to Ethernet, there needs to be a guarantee that messages can be received within a certain maximum delay. This is important e.g. in control tasks like landing a spaceship where motors have to react to sensor input within a certain time frame, otherwise the spaceship will crash.

Processor BAE RAD750 Microchip PIC64 HPSC1000-RH Raspberry Pi 4 BCM2711
Cores 1x RAD750 8x SiFive X280 4x ARM Cortex A72
Temperature - 55 °C to 125 °C -55 °C to 125 °C 0 °C to 50 °C
Total Ionizing Dose 1000 krad 200 krad
Technology 250 nm 28 nm
Power 10 W ~ 5 W
MIPS 366 26000
Peripherals UART
PCI
UART, I2C, SPI
4x TSN Ethernet 10 Mb up to 10 Gb
PCIe Gen 3x8 + CXL 2.0
UART, I2C, SPI
Ethernet 1 Gb
USB
Production Lifetime January 2034

References

NASA Contract Announcement https://www.nasa.gov/news-release/nasa-awards-next-generation-spaceflight-computing-processor-contract/

Microchip PIC64 HPSC Announcement https://www.microchip.com/en-us/about/news-releases/products/microchip-unveils-industrys-highest-performance-64-bit-hpsc-mpu

SiFive X280 Specsheet https://sifive.cdn.prismic.io/sifive/9405d3d0-35a1-4680-a259-7a5598d1ecb2_sifive-intelligence-x200-datasheet.pdf

BAE RAD750 https://www.baesystems.com/en-media/uploadFile/20210404045936/1434555668211.pdf
http://powerpc.i-logout.cz/docs/rad750.pdf

Microchip PIC64 HPSC Series Specsheet https://ww1.microchip.com/downloads/aemDocuments/documents/MPU64/ProductDocuments/Brochures/PIC64-HPSC-Series-00005391.pdf