First draft of pointer tagging

author: Federico Angelilli <code@fedang.net> 2025-02-15 02:30:11 +0100
committer: Federico Angelilli <code@fedang.net> 2025-02-15 02:30:11 +0100
commit: bbd194e390c1a9b64902df57bfb1ea9562e9fe67 (patch)
tree: b5b55a7633f913cfe2d6de3a3726fb1491a63292 /content/posts
parent: 2570061c8a9fe49cae0e10c8d351692272373fba (diff)
1 files changed, 179 insertions, 6 deletions
diff --git a/content/posts/pointer-tagging/index.md b/content/posts/pointer-tagging/index.md
index a41c601..180e120 100644
--- a/content/posts/pointer-tagging/index.md
+++ b/content/posts/pointer-tagging/index.md
@@ -1,21 +1,194 @@
 +++
-title = 'Pointer tagging for Interpreters'
-date = 2025-02-01T13:58:39+01:00
-categories = [ "langdev", "guide" ]
-tags = [ "langdev", "interpreter", "optimization", "low-level" ]
+title = 'Theory and implementation of tagged pointers'
+date = 2025-02-14T13:58:39+01:00
+categories = [ "langdev", "guide", "low-level" ]
+tags = [ "low-level", "memory", "optimization", "langdev", "interpreter" ]
 draft = true
 +++
 
+Are you curious about the tricks that make some interpreted languages so fast?
+Let's analyze and implement **tagged pointers**, a clever optimization
+that can reduce memory footprint and boost performance.
 
 <!--more-->
 
-## Default representation
+## Beginning with theory
+
+A *pointer* is a word-sized integer representing a *memory address*.
+In theory, all the bits of a pointer are used as the whole address space is available.
+While this could be the case for older or embedded systems, modern ones leave some bits unused.
+
+Let's start from 32-bit architectures (x86, Arm32, etc.).
+They use *32-bit words* (they got their name from that), and can address up to 4GiB (4 × 1024&sup3;) of memory[^pae].
+Since 32-bit computers can use all of the 4 gigs, every bit in the address will be utilized, right? Well, not so fast.
+
+This is where **alignment** comes into the picture.
+Processors don't like reading from addresses not divisible by their word width.
+These unaligned accesses can cause performance problems (two separate reads are executed) or even trap.
+
+To prevent this, compilers insert padding to align values according to their types (you query it with `alignof`).
+Memory allocated with `malloc` is typically aligned to 8 bytes (16 on 64-bit systems)[^malloc].
+In this case, aligned pointers will always have their bottom 3 bits set to zero.
+
+```
+Top                       Bottom
+▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮000
+```
+
+We can use those bits to store some metadata, masking them when the raw pointer is needed.
+This technique, let's call it **low bits tagging**, can be used just as well on
+64-bit architectures (x86_64, Aarch64, etc.).
+
+But another opportunity opens up for us on these systems thanks to their
+massive (theoretical) address space of 16 EB (16 million terabytes!).
+This range is so ludicrously big that processors don't physically support all of it.
+Today, the most common configuration is using *48-bit virtual addresses*, leaving the top 16 bits unused.
+
+Let's see how these bits are handled in various architectures (technical content ahead!).
+
+### X86_64
+
+Canonical addresses on x86_64 must have bits 63 through 48 set to zeros or ones (depending on whether bit 47 is a zero or one)[^intel].
+The Linux kernel uses the bit 47 to distinguish between user and kernel address space[^mmap].
+This means that in user space, bits 63:47 will always be zero.
+
+Linux 4.14 added support for 5-level paging, which allows *57-bit virtual addresses*.
+Everything said above applies, but for bit 56 instead of 47.
+By default, address space above 47-bit will not be allocated for user space[^page5].
+
+An Intel extension called *Linear Address Masking*, ignores the high bits of pointers[^lam], avoiding explicit masking.
+The most significant bit is reserved for kernel use, allowing metadata in bits 62:57 (or 62:48 in LAM_U48 mode).
+
+A similar feature from AMD called *Upper Address Ignore* has not been added to the kernel due to
+security concerns[^uai].
+It allows the uppermost 7 bits to be ignored, including the most significant bit, which is
+usually reserved for kernel addresses.
+
+### Aarch64
+
+Aarch64 adopts a slightly different approach.
+For 48-bit virtual addresses, bits 63:48 must be either all 0s or all 1s.
+Address space partitioning is independent of bit 47, allowing for a range of 256 TB (instead of the 128 TB on x86_64).
+Kernel addresses have the high bits set to 1s, while user space addresses must have them set to zero[^armptr].
+
+Linux also supports 52-bit addressing[^armmap], which not all processors implement.
+For compatibility, all user space addresses will be by default in the 48-bit range.
+
+Arm-v8 introduced the *Top Byte Ignore* feature, which has similarities to Intel's LAM.
+It was added to allow for virtual address tagging[^tbi] and works by ignoring bits 63:56.
+Linux supports TBI alongside the *Memory Tagging Extension*[^mtelinux].
+
+### RISC-V
+
+RISC-V supports three addressing modes: 39-bit (SV39), 48-bit (SV48) and 57-bit (SV57) virtual addresses.
+Like x86_64, the high bits of the address must match (sign extend) the last addressable bit[^riscv].
+For example, when using SV48 addressing, bits 63:48 must be equal to bit 47.
+
+The virtual memory layout[^rvmap] is similar to x86_64, where bit 47 is set for kernel addresses
+and the top bits are zero for user space.
+
+The recently ratified *RISC-V Pointer Masking* proposal[^rvext], adds the `Smmpm`, `Smnpm` and `Ssnpm` extensions.
+These can be used to configure the processor to ignore `PMLEN` bits in addresses, starting from the most significant bit.
+
+By setting `PMLEN` to 7 the addresses are masked similarly to AMD's UAI.
+The security concerns were addressed in a note, specifying that an appropriate system ABI
+can prevent the usage of the MSB without changes to the masking mechanism.
+
+### To sum up
+
+As we have seen, each architecture leaves some parts of the address unused.
+Let's call the technique that stores metadata in those bits **high bits tagging**.
+
+For the sake of simplicity, let's select a single layout to act as a common denominator between the
+various platforms.
+To accommodate x86_64 5-level paging and LAM we can choose to use only bits 62:57 for tagging.
+These bits are allowed on RISC-V and also fall inside the range of Arm's TBI feature.
+
+Thus, we end up with the following pointer layout (`T` for tag bits).
+
+```
+Top                                                       Bottom
+▮TTTTTT▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮▮0000
+```
+
+## What are the advantages?
+
+Now that we have understood how pointer tagging works,
+we need to address :wink: *why* we would want to use it.
+
+
+
+
+
+## Implementation
+
+
+### Union
+
+
+## Tagged implementation
+
+### Low bit
+
+### High bit
 
 
-## Optimized tagged pointers
 
 
 ## A word of caution
 
+This does not work on funky architectures. FPGA, Xeon Phi
+
+
+While there are no standard guarantees about memory alignment, common architectures like x86 are very consistent.
+
+
+## Alternative approaches
+
+There are other clever ways to stuff extra information in a pointer.
+A technique called NaN boxing makes use specific bit configurations of the IEEE 754 double precision float
+to store pointers (up to 52 bits).
+
 
 ## References
+- https://muxup.com/2023q4/storing-data-in-pointers
+- https://coredumped.dev/2024/09/09/what-is-the-best-pointer-tagging-method
+- https://bernsteinbear.com/blog/small-objects
+- https://wingolog.org/archives/2011/05/18/value-representation-in-javascript-implementations
+- https://en.wikichip.org/wiki/arm/tbi
+- https://mikeash.com/pyblog/friday-qa-2012-07-27-lets-build-tagged-pointers.html
+
+[^mmap]: https://docs.kernel.org/arch/x86/x86_64/mm.html
+[^rvmap]: https://docs.kernel.org/arch/riscv/vm-layout.html
+[^armmap]: https://docs.kernel.org/arch/arm64/memory.html
+
+[^uai]: https://lwn.net/Articles/888914/
+
+[^lam]: https://lwn.net/Articles/902094
+[^armptr]: https://www.kernel.org/doc/Documentation/arm64/tagged-pointers.txt
+
+[^tbi]: https://source.android.com/docs/security/test/tagged-pointers#top-byte-ignore
+
+[^mte]: *Armv8.5-A Memory Tagging Extension*, \
+    https://developer.arm.com/-/media/Arm%20Developer%20Community/PDF/Arm_Memory_Tagging_Extension_Whitepaper.pdf
+
+[^mtelinux]: https://docs.kernel.org/arch/arm64/memory-tagging-extension.html
+
+[^page5]: https://docs.kernel.org/arch/x86/x86_64/5level-paging.html
+
+[^rvext]: *RISC-V Pointer Masking*, \
+    https://drive.google.com/file/d/159QffOTbi3EEbdkKndYRZ2c46D25ZLmO/view
+
+[^riscv]: *The RISC-V Instruction Set Manual: Volume II*, \
+    https://drive.google.com/file/d/17GeetSnT5wW3xNuAHI95-SI1gPGd5sJ_/view
+
+[^malloc]: Every libc has some alignment guarantees, for glibc see \
+    https://sourceware.org/glibc/wiki/MallocInternals
+
+[^pae]: I purposefully didn't mention PAE, see \
+    https://www.realworldtech.com/forum/?threadid=76912&curpostid=76973
+
+[^intel]: *Intel 64 and IA-32 Architectures Software Developer’s Manual*, \
+    https://cdrdv2.intel.com/v1/dl/getContent/671200
+
+[^rvnote]: https://github.com/riscv/riscv-j-extension/blob/master/zjpm/background.adoc#pointer-masking-and-privilege-modes
author	Federico Angelilli <code@fedang.net>	2025-02-15 02:30:11 +0100
committer	Federico Angelilli <code@fedang.net>	2025-02-15 02:30:11 +0100
commit	bbd194e390c1a9b64902df57bfb1ea9562e9fe67 (patch)
tree	b5b55a7633f913cfe2d6de3a3726fb1491a63292 /content/posts
parent	2570061c8a9fe49cae0e10c8d351692272373fba (diff)