ok so this has been driving me nuts for weeks now
AWS drops these new P6 instances with Blackwell GPUs and they're insanely fast. exactly the kind of horsepower i need for Anansi's proof generation workloads. should be perfect, right?
except here's the thing that makes me want to throw my laptop out the window: they didn't expose ANY gpu tee controls. none. zero. nothing in the docs about encrypted HBM, no gpu attestation api that customers can actually use. it's like they built an amazing race car but forgot to include the steering wheel
meanwhile azure has had confidential gpus working for months. MONTHS. their NCCads H100 v5 instances give you cpu AND gpu attestation right out of the box. they literally have a github repo that walks you through the whole thing. i can spin one up right now and get attestation tokens in like 10 minutes
google cloud? same deal. A3 confidential VMs with H100s, nvidia NRAS attestation, the whole nine yards. it just works
seriously AWS what are you doing
your competitors are shipping actual confidential computing features while you're over here writing blog posts about how fast your new gpus are. cool story but enterprises need more than speed benchmarks
that's what i'm talking about
attestation, encrypted memory, tooling, docs - every single piece working together. no missing parts, no "oh we'll add that later"
that's what cloud infrastructure should look like
but whatever, can't sit around waiting for AWS to figure it out. so here's what we're actually doing for anansi alpha
two-cloud MPC. sounds fancy but it's pretty straightforward: we use AWS P6 instances for the raw compute because yeah, they're stupid fast, and we keep everything confidential by running multi-party computation with azure or gcp confidential gpus
basically your data never exists in cleartext on any single cloud. each provider only ever sees its own share - random garbage that looks like noise. even if AWS could somehow peek at gpu memory (which they probably can't, but still), all they'd see is meaningless random numbers
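to make that concrete, here's a toy python sketch of two-party additive secret sharing. this is NOT anansi's actual protocol or code - the field size and function names are made up for this post, and real MPC frameworks add fixed-point encoding, multiplication triples, and a lot more. it just shows why either cloud's view on its own is uniform noise.

```python
# toy two-party additive secret sharing - illustrative only, not anansi's protocol.
import secrets

P = 2**61 - 1  # toy prime field

def share(x: int) -> tuple[int, int]:
    """Split x into two shares; each share on its own is a uniformly random field element."""
    r = secrets.randbelow(P)
    return r, (x - r) % P   # one share goes to the AWS P6 box, the other to the azure/gcp confidential GPU

def reconstruct(a: int, b: int) -> int:
    return (a + b) % P

secret = 42_000
s_aws, s_other = share(secret)
print(s_aws)     # looks like random garbage
print(s_other)   # so does this
assert reconstruct(s_aws, s_other) == secret
```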
the proof bundle we spit out has everything. azure/gcp cpu attestation, gpu NRAS tokens, aws nitro enclave attestation, mpc protocol transcripts. boom. cryptographic proof that no single cloud provider ever saw your secrets
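to give a feel for the shape, here's a rough python sketch of what a bundle could carry. the field names are illustrative for this post, not the actual anansi schema.

```python
# rough shape of a proof bundle - field names are made up here, not the real schema.
from dataclasses import dataclass

@dataclass
class ProofBundle:
    job_id: str
    cpu_attestation_jwt: str      # e.g. Azure MAA / GCP confidential VM token
    gpu_nras_token: str           # NVIDIA NRAS attestation for the H100 in CC mode
    nitro_attestation_doc: bytes  # COSE-signed attestation document from the AWS Nitro side
    mpc_transcript_hash: str      # hash binding the MPC protocol transcript
    output_commitment: str        # commitment to the (still secret-shared) result
```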
and here's the kicker: even if they could read the pcie bus or gpu memory directly, it wouldn't matter. they'd only ever hold random shares that are useless without the other half
look i know this sounds like overkill. "just trust the cloud provider" right? except when you're dealing with enterprises running their most sensitive AI workloads, trust isn't enough anymore
i've been in meetings with compliance teams. they don't want to hear about your security promises or your certifications. they want cryptographic proof. they want attestation tokens they can verify themselves. they want audit trails that would hold up if things go sideways
that's exactly what anansi gives them. every single computation comes with a proof bundle that anyone can verify independently. no trust required
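a sketch of what that independent verification could look like, building on the ProofBundle sketch above. every helper here is a stub standing in for the real check (validating the MAA/GCP token, verifying the NRAS JWT against NVIDIA's published keys, checking the Nitro document's cert chain to the AWS root, replaying the MPC transcript). this is structure only, not anansi's shipped verifier.

```python
# illustrative verifier structure - each helper is a placeholder for a real check.

def verify_cpu_attestation(token: str) -> bool:
    # stub: real check validates the JWT signature, issuer, and TEE claims
    return bool(token)

def verify_gpu_attestation(token: str) -> bool:
    # stub: real check verifies the NRAS token and the GPU's CC-mode claims
    return bool(token)

def verify_nitro_document(doc: bytes) -> bool:
    # stub: real check verifies the COSE signature and the cert chain to the Nitro root
    return bool(doc)

def verify_mpc_transcript(transcript_hash: str, output_commitment: str) -> bool:
    # stub: real check confirms the transcript is consistent and binds the output
    return bool(transcript_hash) and bool(output_commitment)

def verify_bundle(bundle: ProofBundle) -> bool:
    # a bundle passes only if every attestation and the MPC transcript check out
    return all([
        verify_cpu_attestation(bundle.cpu_attestation_jwt),
        verify_gpu_attestation(bundle.gpu_nras_token),
        verify_nitro_document(bundle.nitro_attestation_doc),
        verify_mpc_transcript(bundle.mpc_transcript_hash, bundle.output_commitment),
    ])
```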
bottom line
aws has the fastest hardware but no confidential computing story. azure and gcp have confidential computing but slower gpus. so we just use both. problem solved
this is what's shipping in anansi alpha. code's gonna be open source soon so you can see exactly how we're doing it. and hey if anyone from aws is reading this - please just expose those gpu tee controls already. would make all our lives easier
anyway that's how we're solving confidential ai compute right now. not waiting around for perfect solutions, just building what works with what we've got