"AI Systems Performance Engineering", the Longest Book in O'Reilly's History, Reaches No. 1 on Amazon

O'Reilly Media, 2025-10-23

The new 1,000-page O'Reilly book "AI Systems Performance Engineering" has reached the top of Amazon's "computer hardware & architecture" category. It is also the longest book O'Reilly has ever published. It explains how hardware and software work together in real-world AI systems, and it addresses performance pain points that are rarely discussed but have a large impact.

The book's contents include:

1. Introduction and AI System Overview
2. AI System Hardware Overview
3. OS, Docker, and Kubernetes Tuning
4. Tuning Distributed Networking Communication
5. GPU-based Storage I/O Optimizations
6. GPU Architecture, CUDA Programming, and Maximizing Occupancy
7. Profiling and Tuning GPU Memory Access Patterns
8. Occupancy Tuning, Warp Efficiency, and Instruction-Level Parallelism
9. Increasing CUDA Kernel Efficiency and Arithmetic Intensity
10. Intra-Kernel Pipelining and Cooperative Thread Block Clusters
11. Inter-Kernel Pipelining and CUDA Streams
12. Dynamic and Device-Side Kernel Orchestration
13. Profiling, Tuning, and Scaling PyTorch
14. PyTorch Compiler, XLA, and OpenAI Triton Backends
15. Multi-Node Inference Parallelism and Routing
16. Profiling, Debugging, and Tuning Inference at Scale
17. Scaling Disaggregated Prefill and Decode
18. Advanced Prefill-Decode and KV Cache Tuning
19. Dynamic and Adaptive Inference Engine Optimizations
20. AI-Assisted Performance Optimizations

The author previously held engineering leadership roles at AWS and Databricks, and several industry experts have praised the book. From hardware tuning, operating systems, and container orchestration to GPU programming and PyTorch optimization, it examines every layer of modern AI system performance and offers guidance for companies seeking technical breakthroughs and cost savings.

Preorders are open now, and the print edition will be released next month. O'Reilly subscribers can read the electronic edition early.

The book sets out to answer nearly every performance question about modern AI systems, and it offers new ideas for readers following the changing economics of AI.

Source: the original post about the book on X.

#AI #CUDA #Docker #GPU #Kubernetes #PyTorch #DistributedSystems #PerformanceOptimization #MachineLearning #DeepLearning