I finally have the feeling that I’m a decent programmer, so I thought it would be fun to write some advice with the idea of “what would have gotten me to this point faster?” I’m not claiming this is great advice for everyone, just that it would have been good advice for me.
我终于觉得自己是个像样的程序员了,所以我想以 "如果是我,会怎样做才能更快达到这个境界?"的想法来写一些建议,会很有趣。我并不是说这对每个人都是好建议,只是说这对我来说是个好建议。
如果你(或你的团队)经常自寻死路,那就把枪修好吧
I can’t tell you how many times I’ve been on a team and there’s something about the system that’s very easy to screw up, but no one thinks about ways to make it harder to make that mistake.
我无法告诉你有多少次,我在一个团队中,系统中有些东西很容易被搞砸,但没有人想过如何让犯错变得更难。
When I was doing iOS development, used CoreData, and subscribed to changes from a bunch of views. The subscription callback came on the same thread that the change was triggered from. Sometimes that was the main thread, and sometimes it was a background thread. Importantly in iOS development, you can only make UI updates on the main thread, or your app crashes. So a new subscription could be working fine, but then it would break when someone triggered a change from a background thread, or if you add a UI update later on.
我在进行 iOS 开发时使用了 CoreData,并订阅了大量视图的更改。订阅回调来自触发更改的同一线程。有时是主线程,有时是后台线程。重要的是,在 iOS 开发中,只能在主线程上进行 UI 更新,否则应用程序就会崩溃。因此,新订阅可能运行正常,但当有人从后台线程触发更改时,或者如果稍后添加用户界面更新,它就会崩溃。
This was just something people transparently accepted, and often came up in reviews for newer people on the team. Every now and then, one would slip through, and we’d go add a DispatchQueue.main.async
when we saw the crash report.
大家都很坦然地接受了这一点,而且经常会在对团队新人的审查中提到这一点。时不时就会有一个漏网之鱼,当我们看到崩溃报告时,就会去添加 DispatchQueue.main.async。
I decided to fix it, and it took ten minutes to update our subscription layer to call subscribers on the main thread instead, which eliminated a whole class of crashes, and a bit of mental load.
我决定解决这个问题,花了十分钟更新了我们的订阅层,改为在主线程上调用订阅器,这样就避免了一整类崩溃,也减轻了一点心理负担。
I’m not trying to be like “look at these idiots not fixing an obvious issue with the codebase, when it was obvious to me”, because it would have been obvious to anyone who thought about it for a few minutes. These things stick around a weird amount, because there’s never a natural time to address them. When you’re first getting onboarded, you’re not trying to change anything big. So you may think it’s weird, but you shouldn’t go changing a bunch of things you’re still learning about. Then when you’ve been on the team for a while, it sort of fades into the background.
我并不是想说 "看看这些白痴,明明代码库存在明显的问题,却不去解决",因为任何人只要多想几分钟就会明白。这些问题的存在很奇怪,因为从来没有一个自然的时间来解决它们。当你刚入职的时候,你并不想改变什么大事。所以你可能会觉得这很奇怪,但你不应该去改变一堆你还在学习的东西。当你在团队中工作一段时间后,这些问题就会逐渐消失。
It’s a mindset shift. You just need to occasionally remind yourself that you are capable of making you and your team’s life easier, when these sorts of things are hanging around.
这是一种心态的转变。你只需要偶尔提醒自己,当这些事情缠身时,你有能力让自己和团队的生活变得更轻松。
评估您在质量和速度之间的权衡,确保它适合您的情况
There’s always a trade-off between implementation speed and how confident you are about correctness. So you should ask yourself: how okay is it to ship bugs in my current context? If the answer to this doesn’t affect the way you work, you’re being overly inflexible.
在实现速度和对正确性的信心之间,总是需要权衡利弊。因此,你应该问问自己:在我目前的情况下,是否有可能出现错误?如果这个问题的答案不会影响你的工作方式,那么你就是过于不灵活了。
At my first job, I was working on greenfield projects around data processing, which had good systems in place to retroactively re-process data. The impact of shipping a bug was very minor. The proper response to that environment is to rely on the guardrails a bit, move faster because you can. You don’t need 100% test coverage or an extensive QA process, which will slow down the pace of development.
在我的第一份工作中,我从事的是与数据处理有关的新建项目,这些项目拥有良好的系统,可以追溯性地重新处理数据。出错的影响非常小。在这种环境下,正确的应对方法是稍微依赖护栏,因为可以,所以走得更快。你不需要 100% 的测试覆盖率,也不需要广泛的质量保证流程,因为这会拖慢开发速度。
At my second company, I was working on a product used by tens of millions of people, which involved high-value financial data and personally identifiable information. Even a small bug would entail a post-mortem. I shipped features at a snail’s pace, but I also think I may have shipped 0 bugs that year.
在我的第二家公司,我正在开发一款有数千万人使用的产品,其中涉及高价值的财务数据和个人身份信息。哪怕是一个小错误,都会被追责。我以蜗牛般的速度开发功能,但我也认为那一年我可能只开发了 0 个 Bug。
Usually, you’re not at the second company. I’ve seen a lot of devs err on the side of that sort of programming though. In situations where bugs aren’t mission critical (ex. 99% of web apps), you’re going to get further with shipping fast and fixing bugs fast, than taking the time to make sure you’re shipping pristine features on your first try.
通常情况下,你不在第二家公司。不过,我看到很多开发人员都偏向于这种编程方式。在错误不是关键任务的情况下(例如,99% 的网络应用程序),快速交付和快速修复错误比花时间确保在第一次尝试时就交付原始功能更有意义。
磨刀不误砍柴工
You’re going to be renaming things, going to type definitions, finding references, etc a lot; you should be fast at this. You should know all the major shortcuts in your editor. You should be a confident and fast typist. You should know your OS well. You should be proficient in the shell. You should know how to use the browser dev tools effectively.
你要经常重命名事物、查找类型定义、查找参考文献等;你应该在这方面很快。你应该知道编辑器中的所有主要快捷方式。你应该是一个自信而快速的打字员。熟悉操作系统。熟练掌握 shell。了解如何有效使用浏览器开发工具。
I can already tell people are gonna be in the comments like “you can’t just spend all day tweaking your neovim config, sometimes you need to chop the tree too”. I don’t think I’ve ever seen someone actually overdo this though; one of the biggest green flags I’ve seen in new engineers is a level of care in choosing and becoming proficient with their tools.
我已经知道有人会在评论中说:"你不能整天都在调整你的 neovim 配置,有时你也需要砍树"。不过,我从未见过有人真的过度使用这些工具;我在新工程师身上看到的最大 "绿旗 "之一,就是他们在选择和熟练使用工具时的谨慎程度。
如果你无法轻易解释为什么某件事情很困难,那么它就是偶然的复杂性,也许值得解决
My favorite manager in my career had a habit of pressing me when I would claim something was difficult to implement. Often his response was something along the lines of “isn’t this just a matter of sending up X when we Y”, or “isn’t this just like Z that we did a couple months ago”? Very high level objections, is what I’m trying to say, not on the level of the actual functions and classes we were dealing with, which I was trying to explain.
在我的职业生涯中,我最喜欢的经理有一个习惯,那就是当我说某件事很难实施时,他会逼问我。他的回答往往是 "这不就是在我们 Y 时发送 X 的问题吗",或者 "这不就是我们几个月前做的 Z 的问题吗"?我想说的是,这些都是非常高层次的反对意见,而不是我试图解释的我们正在处理的实际函数和类的问题。
I think conventional wisdom is that it’s just annoying when managers simplify things like this. But a shockingly high percentage of the time, I’d realize when he was pressing me, that most of the complexity I was explaining was incidental complexity, and that I could actually address that first, thus making the problem as trivial as he was making it sound. This sort of thing tends to make future changes easier too.
我认为传统观点认为,管理者把事情简化成这样只会让人讨厌。但令人震惊的是,很多时候,当他催促我时,我会意识到我所解释的复杂性大多是附带的复杂性,而我其实可以先解决这些问题,从而使问题变得像他说的那样微不足道。这种事情往往也会让未来的修改变得更容易。
尝试解决更深层次的问题
Imagine you have a React component in a dashboard, that deals with a User
object retrieved from state, of the currently logged in user. You see a bug report in Sentry, where user
was null
during render. You could add a quick if (!user) return null
. Or you could investigate a bit more, and find that your logout function makes two distinct state updates – the first to set the user to null
, the second to redirect to the homepage. You swap the two, and now no component will ever have this bug again, because the user object is never null
while you’re within the dashboard.
想象一下,您在仪表板中有一个 React 组件,该组件处理从当前登录用户的状态中获取的 User 对象。您在 Sentry 中看到一个错误报告,其中用户在呈现时为空。你可以快速添加 if (!user) 返回 null。或者你可以再深入调查一下,发现注销函数进行了两次不同的状态更新--第一次是将用户设置为空,第二次是重定向到主页。将两者互换,现在任何组件都不会再出现这个 bug,因为在仪表板中,用户对象永远不会为空。
Keep doing the first sort of bug fix, and you end up with a mess. Keep doing the second type of bug fix, and you’ll have a clean system and a deep understanding of the invariants.
继续进行第一种错误修复,结果就是一团糟。继续进行第二种错误修复,你就会拥有一个干净的系统,并对不变式有深刻的理解。
不要低估挖掘历史调查一些错误的价值
I’ve always been pretty good at debugging weird issues, with the usual toolkit of println
and the debugger. So I never really looked at git much to figure out the history of a bug. But for some bugs it’s crucial.
我一直很擅长调试奇怪的问题,通常使用的工具包是 println 和调试器。所以我从来没怎么用过 git 来了解 bug 的历史。但对于某些 bug 而言,这是至关重要的。
I recently had an issue with my server where it was leaking memory seemingly constantly, and then getting OOM-killed and restarted. I couldn’t figure out the cause of this for the life of me. Every likely culprit was ruled out, I couldn’t reproduce it locally, it felt like throwing darts blindfolded.
最近,我的服务器遇到了一个问题,它似乎一直在泄漏内存,然后被 OOM 杀死并重新启动。我怎么也找不到原因。所有可能的罪魁祸首都被排除了,我无法在本地重现,感觉就像蒙着眼睛扔飞镖。
I looked at the commit history, and found it started happening after I added support for Play Store payments. Never a place I would have looked in a million years, it’s just a couple http requests. Turns out it was getting stuck in an infinite loop of fetching access tokens, after the first one expired. Maybe every request only added a kB or so to memory, but when they’re retrying every 10ms on multiple threads, that adds up quick. And usually this sort of thing would have resulted in a stack overflow, but I was using async recursion in Rust, which doesn’t stack overflow. This never would have occurred to me, but when I’m forced to look into a specific bit of code that I know must have caused it, suddenly the theory pops up.
我查看了提交历史,发现它是在我添加 Play Store 支付支持后开始出现的。我万万没想到,这只是几个 http 请求。原来,在第一个访问令牌过期后,它就陷入了获取访问令牌的无限循环中。也许每个请求只增加了 1KB 左右的内存,但如果在多个线程上每隔 10 毫秒就重试一次,内存就会迅速增加。通常这种情况会导致堆栈溢出,但我使用的是 Rust 中的异步递归,它不会导致堆栈溢出。我从来没想过会出现这种情况,但当我被迫去研究一段特定的代码时,我知道一定是它导致了这种情况,突然间我就想到了这个理论。
I’m not sure what the rule is here for when to do this and when not to. It’s intuition based, a different sort of “huh” to a bug report that triggers this sort of investigation. You’ll develop the intuition over time, but it’s enough to know that sometimes it’s invaluable, if you’re stuck.
我不知道什么时候该这么做,什么时候不该这么做。这是一种直觉,是对引发此类调查的错误报告的一种不同的 "呵呵"。随着时间的推移,你会逐渐形成这种直觉,但只要知道有时在你陷入困境时这种直觉是非常宝贵的就足够了。
Along similar lines, try out git bisect
if the problem is amenable to it — meaning a git history of small commits, a quick automated way to test for the issue, and you have one commit you know is bad and one that’s good.
与此类似,如果问题可以用 git bisect 来解决,那就试试吧--这意味着 git 历史上的小提交、快速自动测试问题的方法,以及您知道的一个坏提交和一个好提交。
糟糕的代码会给你反馈,完美的代码不会。偏向于编写糟糕的代码
It’s really easy to write terrible code. But it’s also really easy to write code that follows absolutely every best practice, which has been unit, integration, fuzz, and mutation-tested for good measure – your startup will just run out of money before you finish. So a lot of programming is figuring out the balance.
写出糟糕的代码真的很容易。但要写出完全遵循所有最佳实践的代码,并经过单元测试、集成测试、模糊测试和突变测试,也是非常容易的。因此,很多编程工作都是在寻找平衡点。
If you err on the side of writing code quickly, you’ll occasionally get bitten by a bad bit of tech debt. You’ll learn stuff like “I should add great testing for data processing, because it’s often impossible to correct later”, or “I should really think through table design, because changing things without downtime can be extremely hard”.
如果你偏向于快速编写代码,那么你偶尔就会被一些糟糕的技术债务所困扰。你会学到这样的东西:"我应该为数据处理添加强大的测试,因为事后往往无法纠正",或者 "我真的应该仔细考虑表格的设计,因为在不停机的情况下更改东西会非常困难"。
If you err on the side of writing perfect code, you don’t get any feedback. Things just universally take a long time. You don’t know where you’re spending your time on things that really deserve it, and where you’re wasting time. Feedback mechanisms are essential for learning, and you’re not getting that.
如果你偏向于编写完美的代码,就不会得到任何反馈。事情总是要花费很长时间。你不知道自己在哪些方面花了时间,在哪些方面浪费了时间。反馈机制对学习至关重要,而你却得不到。
To be clear I don’t mean bad as in “I couldn’t remember the syntax for creating a hash map, so I did two inner loops instead”, I mean bad as in:
说白了,我不是指 "我记不住创建哈希映射的语法,所以用了两个内循环来代替 "的糟糕,而是指 "我记不住创建哈希映射的语法,所以用了两个内循环来代替 "的糟糕:
Instead of a re-write of our data ingestion to make this specific state unrepresentable, I added a couple asserts over our invariants, at a couple key checkpoints
我没有重写我们的数据摄取,使这种特定状态无法体现,而是在我们的不变式上,在几个关键检查点上添加了几个断言
Our server models are exactly the same as the DTOs we would write, so I just serialized those, instead of writing all the boilerplate, we can write DTOs later, as needed
我们的服务器模型与我们要编写的 DTO 完全相同,所以我只是将这些模型序列化,而不是编写所有的模板,以后我们可以根据需要编写 DTO
I skipped writing tests for these components because they’re trivial and a bug in one of them is no big deal
我跳过了这些组件的测试,因为它们都很琐碎,其中一个组件出错也没什么大不了的
让调试更轻松
There’s so many little tricks I’ve acquired over the years on making software easier to debug. If you don’t make any effort to make debugging easy, you’re going to spend unacceptable amounts of time debugging each issue, as your software gets more and more complex. You’ll be terrified to make changes because even a couple new bugs might take you a week to figure out.
多年来,我学到了很多让软件更容易调试的小技巧。如果你不努力让调试变得简单,那么随着软件变得越来越复杂,你将花费难以接受的时间来调试每个问题。你会害怕进行修改,因为即使是几个新的错误,你也可能要花一周的时间才能弄明白。
Here’s some examples of what I mean:
For the backend of Chessbook
I have a command to copy all of a user’s data down to local, so I can reproduce issues easily with only a username
我有一个命令,可以将用户的所有数据复制到本地,这样我就可以只用一个用户名就轻松地重现问题了
I trace every local request with OpenTelemetry, making it very easy to see how a request spends its time
我使用 OpenTelemetry 跟踪每个本地请求,这样就能很容易地了解请求是如何耗费时间的
I have a scratch file that acts as a pseudo-REPL, which re-executes on every change. This makes it easy to pull out bits of code and play around with it to get a better idea what’s going on
我有一个作为伪 REPL 的 scratch 文件,每次更改都会重新执行。这样就可以很容易地调出一些代码并对其进行处理,以便更好地了解发生了什么事
In the staging environment, I limit parallelism to 1, so that it’s easier to visually parse logs
在试运行环境中,我将并行性限制为 1,以便更容易直观地解析日志
For the frontend
I have a debugRequests
setting which prevents optimistic loading of data, to make it easier to debug requests
我有一个 debugRequests 设置,可以防止乐观加载数据,从而更容易调试请求
I have a debugState
setting that will print out the entire state of the program after every update, along with a pretty diff of what changed
我有一个 debugState 设置,每次更新后都会打印出程序的整个状态,以及一个漂亮的变化差异表
I have a file full of little functions that get the UI into specific states, so that as I’m trying to fix bugs, I don’t have to keep clicking in the UI to get to that state.
我有一个文件,里面装满了让用户界面进入特定状态的小函数,这样当我试图修复错误时,就不必一直点击用户界面来进入该状态。
Stay vigilant about how much of your debugging time is spent on setup, reproduction, and cleanup afterwards. If it’s over 50%, you should figure out how to make it easier, even if that will take slightly longer this time. Bugs should get easier to fix over time, all else being equal.
保持警惕,看看有多少调试时间花在设置、重现和事后清理上。如果超过 50%,你就应该想办法让它变得更容易,即使这次需要的时间会稍长一些。在其他条件相同的情况下,随着时间的推移,错误应该会越来越容易修复。
在团队中工作时,你通常应该问这样一个问题
There’s a spectrum of “trying to figure out everything for yourself” to “bugging your coworkers with every little question”, and I think most people starting their careers are too far on the former side. There’s always someone around that has been in the codebase longer, or knows technology X way better than you, or knows the product better, or is just a more experienced engineer in general. There’s so many times in the first 6 months of working somewhere, where you could spend over an hour figuring something out, or you could get an answer in a few minutes.
从 "试图自己解决所有问题 "到 "向同事纠缠所有问题",我认为大多数初入职场的人都偏向于前者。总有人在代码库里待的时间更长,或者比你更了解 X 技术,或者比你更了解产品,或者是更有经验的工程师。在某个地方工作的前 6 个月里,你有很多次可能要花一个多小时才能弄明白一件事,或者几分钟就能得到答案。
Ask the question. The only time this will be annoying to anyone, is if it’s clear you could have found the answer yourself in a few minutes.
提出问题。只有当你很明显可以在几分钟内自己找到答案时,这样做才会让别人讨厌。
发货节奏非常重要。认真思考怎样才能快速、频繁地发货
Startups have limited runway. Projects have deadlines. When you quit your job to strike out on your own, your savings will only last you for so many months.
初创企业的发展空间有限。项目有期限。当你辞职自立门户时,你的积蓄只能支撑几个月。
Ideally, your speed on a project only compounds over time, until you’re shipping features faster than you could have imagined. To ship fast you need a lot of things:
理想情况下,你在一个项目上的速度只会随着时间的推移而加快,直到你以超乎想象的速度交付功能。要实现快速交付,你需要很多东西:
A system that isn’t prone to bugs
不易出现错误的系统
Quick turnaround time between teams
团队之间的快速周转时间
A willingness to cut out the 10% of a new feature that’s going to take 50% of the engineering time, and the foresight to know what those pieces are
愿意砍掉新功能中需要花费 50% 工程时间的 10% 部分,并有远见地知道这些部分是什么
Consistent reusable patterns, that you can compose together for new screens/features/endpoints
一致的可重用模式,您可以将其组合在一起,用于新的屏幕/功能/端点
Quick, easy deploys
快速、轻松部署
Process that doesn’t slow you down; flaky tests, slow CI, fussy linters, slow PR reviews, JIRA as a religion, etc.
不拖累你的流程;不稳定的测试、缓慢的 CI、挑剔的衬垫、缓慢的 PR 审核、将 JIRA 奉为信仰等等。
About a million other things
大约一百万件其他事情
Shipping slowly should merit a post-mortem as much as breaking production does. Our industry doesn’t run like that, but that doesn’t mean you can’t personally follow the north star of Shipping Fast.
缓慢出货应该像打破产量一样,值得进行事后总结。我们的行业不是这样运行的,但这并不意味着你不能亲自追随 "快速出货 "这颗北极星。