# Claude Code Can Debug Low-level Cryptography

*1 Nov 2025*

ylc3000, 2025-11-13

Over the past few days I wrote a new Go implementation of ML-DSA, a post-quantum signature algorithm specified by NIST last summer. I [livecoded](https://twitch.tv/filosottile) it all over four days, finishing it on Thursday evening. Except… `Verify` was always rejecting valid signatures.

```sh
$ bin/go test crypto/internal/fips140/mldsa
--- FAIL: TestVector (0.00s)
    mldsa_test.go:47: Verify: mldsa: invalid signature
    mldsa_test.go:84: Verify: mldsa: invalid signature
    mldsa_test.go:121: Verify: mldsa: invalid signature
FAIL
FAIL	crypto/internal/fips140/mldsa	2.142s
FAIL
```

I was exhausted, so I tried debugging for half an hour and then gave up, with the intention of coming back to it the next day with a fresh mind.

On a whim, I figured I would let Claude Code take a shot while I read emails and resurfaced from hyperfocus. I mostly expected it to flail in some maybe-interesting way, or rule out some issues.

Instead, it rapidly figured out a fairly complex low-level bug in my implementation of a relatively novel cryptography algorithm. I am sharing this because it made me realize I still don’t have a good intuition for when to invoke AI tools, and because I think it’s a fantastic case study for anyone who’s still skeptical about their usefulness.

> Full disclosure: Anthropic gave me a few months of Claude Max for free. They reached out one day and told me they were giving it away to some open source maintainers.
> Maybe it’s a ploy to get me hooked so I’ll pay for it when the free coupon expires. Maybe they hoped I’d write something like this. Maybe they are just nice. Anyway, they made no request or suggestion to write anything public about Claude Code. Now you know.

## Finding the bug

I started Claude Code v2.0.28 with Opus 4.1 and no system prompts, and gave it the following prompt (typos included):

> I implemented ML-DSA in the Go standard library, and it all works except that verification always rejects the signatures. I know the signatures are right because they match the test vector.
>
> YOu can run the tests with “bin/go test crypto/internal/fips140/mldsa”
>
> You can find the code in src/crypto/internal/fips140/mldsa
>
> Look for potential reasons the signatures don’t verify. ultrathink
>
> I spot-checked and w1 is different from the signing one.

To my surprise, it pinged me a few minutes later with [a complete fix](https://go-review.googlesource.com/c/go/+/716540/1..2).

Maybe I shouldn’t be surprised! Maybe it would have been clear to anyone more familiar with AI tools that this was a good AI task: a well-scoped issue with failing tests. On the other hand, this is a low-level issue in a fresh implementation of a complex, *relatively novel* algorithm.
It figured out that I had merged `HighBits` and `w1Encode` into a single function so I could use it from `Sign`, and then reused it from `Verify`, where `UseHint` already produces the high bits. The result was that `Verify` effectively took the high bits of `w1` twice.

Looking at [the log](https://gist.github.com/FiloSottile/d019f68db7143493c6a7e9c5fd08e872), it loaded the implementation into the context and then *immediately* figured it out, without any exploratory tool use! After that it wrote itself a cute little test that reimplemented half of verification to confirm the hypothesis, wrote a mediocre fix, and checked the tests pass.

I [threw the fix away](https://go-review.googlesource.com/c/go/+/716540/2..3) and refactored `w1Encode` to take high bits as input, and changed the type of the high bits, which is both clearer and saves a round-trip through Montgomery representation. Still, this 100% saved me a bunch of debugging time.

## A second synthetic experiment

On Monday, I had also finished implementing signing with failing tests. There were two bugs, which I fixed over the following couple of evenings.

The first one was due to [somehow computing a couple hardcoded constants (1 and -1 in the Montgomery domain) wrong](https://go-review.googlesource.com/c/go/+/716240/1..2). It was very hard to find, requiring a lot of deep printfs and guesswork.
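For readers unfamiliar with the Montgomery domain: a value x is stored as x·R mod q, so the constant 1 is stored as R mod q and -1 as q − (R mod q). The following is a minimal sketch, not the code from the change. It assumes the ML-DSA modulus q = 8380417 and a radix of R = 2^32 (the radix is an assumption about the implementation), and the helper names `toMontgomery`, `modPow`, and `montMul` are hypothetical.

```go
package main

import "fmt"

const (
	q     = 8380417 // ML-DSA modulus: 2^23 - 2^13 + 1
	rBits = 32      // assumption: Montgomery radix R = 2^32
)

// toMontgomery returns x*R mod q, the Montgomery-domain form of x.
// This is a slow reference computation, not an optimized reduction.
func toMontgomery(x uint64) uint64 {
	return ((x % q) << rBits) % q
}

// modPow returns base^exp mod q using square-and-multiply.
func modPow(base, exp uint64) uint64 {
	result := uint64(1)
	base %= q
	for ; exp > 0; exp >>= 1 {
		if exp&1 == 1 {
			result = result * base % q
		}
		base = base * base % q
	}
	return result
}

// montMul returns a*b*R^-1 mod q, which is what a Montgomery
// multiplication computes. R^-1 is obtained as R^(q-2) mod q,
// which is valid because q is prime.
func montMul(a, b uint64) uint64 {
	rInv := modPow(toMontgomery(1), q-2)
	return (a % q) * (b % q) % q * rInv % q
}

func main() {
	one := toMontgomery(1)          // 1 in the Montgomery domain
	minusOne := toMontgomery(q - 1) // -1 in the Montgomery domain
	fmt.Println(one, minusOne)      // 4193792 4186625

	// Sanity checks a hardcoded constant should satisfy.
	fmt.Println((one+minusOne)%q == 0) // true
	x := toMontgomery(12345)
	fmt.Println(montMul(one, x) == x) // true
}
```

Comparing hardcoded constants against a slow reference computation like this is one way such a mistake could be caught in a test.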
Took me maybe an hour or two.

The second one was easier: [a value that ends up encoded in the signature was too short (32 bits instead of 32 bytes)](https://go-review.googlesource.com/c/go/+/716240/2..3). It was relatively easy to tell because only the first four bytes of the signature were the same, and then the signature lengths were different.

I figured these would be an interesting way to validate Claude’s ability to help find bugs in low-level cryptography code, so I checked out the old version of the change with the bugs (yay Jujutsu!) and kicked off a fresh Claude Code session with this prompt:

> I am implementing ML-DSA in the Go standard library, and I just finished implementing signing, but running the tests against a known good test vector it looks like it goes into an infinite loop, probably because it always rejects in the Fiat-Shamir with Aborts loop.
>
> You can run the tests with “bin/go test crypto/internal/fips140/mldsa”
>
> You can find the code in src/crypto/internal/fips140/mldsa
>
> Figure out why it loops forever, and get the tests to pass. ultrathink

It spent [some time doing printf debugging and chasing down incorrect values very similarly to how I did it, and then figured out and fixed the wrong constants](https://gist.github.com/FiloSottile/d16c37b2fada56875a894cdd2670a860).
Took Claude definitely less than it took me. Impressive.

It gave up after fixing that bug even though the tests still failed, so I started a fresh session (on the assumption that the context on the wrong constants would do more harm than good when investigating an independent bug), and gave it this prompt:

> I am implementing ML-DSA in the Go standard library, and I just finished implementing signing, but running the tests against a known good test vector they don’t match.
>
> You can run the tests with “bin/go test crypto/internal/fips140/mldsa”
>
> You can find the code in src/crypto/internal/fips140/mldsa
>
> Figure out what is going on. ultrathink

[It took a couple wrong paths, thought for quite a bit longer, and then found this one too](https://gist.github.com/FiloSottile/b184888663c5d57078dc90b1a019981b). I honestly expected it to fail initially.

It’s interesting how Claude found the “easier” bug more difficult. My guess is that maybe the large random-looking outputs of the failing tests did not play well with its attention.

The fix it proposed was updating only the allocation’s length and not its capacity, but whatever, the point is finding the bug, and I’ll usually want to throw away the fix and rewrite it myself anyway.
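As a generic illustration of the difference between a Go slice’s length and its capacity (this is not the code from the change, and the helper `appendN` is hypothetical): a buffer whose capacity is still sized as if 32 bits were 4 bytes ends up with the right contents once 32 bytes are appended, but only because `append` silently reallocates the backing array.

```go
package main

import "fmt"

// appendN appends n zero bytes to buf and reports whether the backing
// array had to be reallocated to fit them.
func appendN(buf []byte, n int) ([]byte, bool) {
	before := cap(buf)
	buf = append(buf, make([]byte, n)...)
	return buf, cap(buf) != before
}

func main() {
	const size = 32 // bytes the encoded value should occupy

	// Capacity still computed as size/8 (4 bytes): the final length is
	// right, but append has to grow the backing array to get there.
	small := make([]byte, 0, size/8)
	small, grew := appendN(small, size)
	fmt.Println(len(small), grew) // 32 true

	// Capacity matches the intended size: no reallocation needed.
	right := make([]byte, 0, size)
	right, grew = appendN(right, size)
	fmt.Println(len(right), cap(right), grew) // 32 32 false
}
```

A fix that corrects only one of the two values can therefore still pass the tests, which is one reason a passing test run says little about the quality of a proposed fix.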
Three out of three one-shot debugging hits with no help is *extremely impressive*. Importantly, there is no need to trust the LLM or review its output when its job is just saving me an hour or two by telling me where the bug is, for me to reason about it and fix it.

As ever, I wish we had better tooling for using LLMs which didn’t look like chat or autocomplete or “make me a PR.” For example, how nice would it be if every time tests fail, an LLM agent was kicked off with the task of figuring out why, and only notified us if it did before we fixed it?

For more low-level cryptography ~~bugs~~ implementations, follow me on Bluesky at [@filippo.abyssdomain.expert](https://bsky.app/profile/filippo.abyssdomain.expert) or on Mastodon at [@filippo@abyssdomain.expert](https://abyssdomain.expert/@filippo). I promise I almost never post about AI.

## The picture

Enjoy the silliest floof. Surely this will help redeem me in the eyes of folks who consider AI less of a tool and more of something to be hated or loved.

My work is made possible by [Geomys](https://geomys.org), an organization of professional Go maintainers, which is funded by [Smallstep](https://smallstep.com/), [Ava Labs](https://www.avalabs.org/), [Teleport](https://goteleport.com/), [Tailscale](https://tailscale.com/), and [Sentry](https://sentry.io/).
Through our retainer contracts they ensure the sustainability and reliability of our open source maintenance work and get a direct line to my expertise and that of the other Geomys maintainers. (Learn more in the [Geomys announcement](https://words.filippo.io/geomys).) Here are a few words from some of them!

Teleport: For the past five years, attacks and compromises have been shifting from traditional malware and security breaches to identifying and compromising valid user accounts and credentials with social engineering, credential theft, or phishing. [Teleport Identity](https://goteleport.com/platform/identity/?utm=filippo) is designed to eliminate weak access patterns through access monitoring, minimize attack surface with access requests, and purge unused permissions via mandatory access reviews.

Ava Labs: We at [Ava Labs](https://www.avalabs.org), maintainer of [AvalancheGo](https://github.com/ava-labs/avalanchego) (the most widely used client for interacting with the [Avalanche Network](https://www.avax.network)), believe the sustainable maintenance and development of open source cryptographic protocols is critical to the broad adoption of blockchain technology. We are proud to support this necessary and impactful work through our ongoing sponsorship of Filippo and his team.