Introduction :
Finite Field Assembly is a programming language that lets you emulate GPUs on CPUs
It's a CUDA alternative that uses finite field theory to convert GPU kernels to prime number fields.
Finite Field is the primary data structure : FF-asm is a CUDA alternative designed for computations over finite fields.
Recursive computing support : not cache-aware vectorization, not parallelization, but performing a calculation inside a calculation inside another calculation.
Extension of C89 - runs everywhere gcc is available.
Context : I'm getting my math PhD and I built this language around my area of expertise, Number Theory and Finite Fields.
I think I get it. You're using the Ring isomorphism from the Chinese Remainder Theorem to do "parallel computation". This is the same principle as how boolean algebra on binary strings computes the pairwise results of each bit in parallel. Unfortunately, there's no free lunch - if you want to perform K operations on N-bit integers in parallel, you still need to work with (K * N)-bit-wide vectors, which is essentially what SIMD does anyway.
One of the more subtle aspects of retargeting GPU code to run on the CPU is the presence of fine grained(read - block level and warp level) explicit synchronization mechanisms being available in the GPU. However, this is not the same in CPU land, so additional care has to be taken to handle this. One example of work which tries this is https://arxiv.org/pdf/2207.00257 .
Interestingly, in the same work, contrary to what you’d expect, transpiling GPU code to run on CPU gives ~76% speedups in HPC workloads compared to a hand optimized multi-core CPU implementation on Fugaku(a CPU only supercomputer), after accounting for these differences in synchronization.
It's a bit hard for me to tell the intention here. Is the idea that finite fields can take better advantage of CPU architecture than something like SIMD for parallel computation? Or is this just for experimentation?
Edit: this tickles my brain about some similar seeming sort of programming language experiment, where they were also trying to express concurrency (not inherently the same as parallelism) using some fancy math. I can't remember what it was though?
pwdisswordfishz ·1 minutes ago
I thought a finite field's order has to be a prime power.
muragekibicho ·3 days ago
It's a CUDA alternative that uses finite field theory to convert GPU kernels to prime number fields.
Finite Field is the primary data structure : FF-asm is a CUDA alternative designed for computations over finite fields.
Recursive computing support : not cache-aware vectorization, not parallelization, but performing a calculation inside a calculation inside another calculation.
Extension of C89 - runs everywhere gcc is available. Context : I'm getting my math PhD and I built this language around my area of expertise, Number Theory and Finite Fields.
Show replies
adamvenis ·45 minutes ago
Show replies
vimarsh6739 ·31 minutes ago
Interestingly, in the same work, contrary to what you’d expect, transpiling GPU code to run on CPU gives ~76% speedups in HPC workloads compared to a hand optimized multi-core CPU implementation on Fugaku(a CPU only supercomputer), after accounting for these differences in synchronization.
foota ·49 minutes ago
Edit: this tickles my brain about some similar seeming sort of programming language experiment, where they were also trying to express concurrency (not inherently the same as parallelism) using some fancy math. I can't remember what it was though?