Maintain Your Mechanical Sympathy

As a new technologist, you face an endless variety of lessons to learn: everything from the fundamentals of logic, through syntax and software design, to building and integrating large distributed systems. Over years of practice, ceaseless learning, and inscrutable errors, you will internalize an understanding of more computational minutiae than you could possibly foresee. In this torrent of information, your perpetual bafflement will be washed away, along with your "beginner mind".

One thing that will serve you well on every step of your journey is a healthy dose of mechanical sympathy. It is possible to write useful software, build data platforms, or produce valuable analytics without ever understanding all of the underlying infrastructure and computation. However, when something breaks or you run into confusing performance issues, that understanding will prove invaluable. Regardless of your chosen specialization, whether it is data engineering, data science, web development, or any of the countless gradations in between, you will be navigating an endless sea of abstractions. Each layer of indirection takes you further from the hardware that is diligently churning through your demanding instructions and winging your data across the world at close to light speed.

By considering the physical realities of how CPUs execute instructions, how registers and memory are populated and accessed, and the complex dance of network connections, you will gain a greater appreciation for what is possible, what is practical, and when you might want to take a different approach. It is not necessary (nor is it feasible) to become an expert in all of the systems and software that power your specific project. Instead, you should aim to learn just enough about them to understand their limitations, and to know what you don’t know.

For data engineers, some of the most useful foundational principles concern the different access speeds of networked services, hard drives of different types, and system memory. All of these contribute greatly to the concept of data gravity, which will influence your decisions about how and when to move information between systems, or when to leave it in place and bring your computation to the data. It is also helpful to know the differences between computational architectures, such as general-purpose CPUs, GPUs, and ASICs, and when to leverage the capabilities of each.
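To make the idea of data gravity a little more concrete, here is a minimal back-of-the-envelope sketch in Python. The throughput figures, the dictionary, and both function names are assumptions chosen purely for illustration; they are rough orders of magnitude rather than measurements of any particular hardware, and real systems will differ.

```python
# Assumed sustained throughput in bytes per second. These are rough,
# illustrative orders of magnitude, not measurements of any real system.
ASSUMED_BYTES_PER_SECOND = {
    "system_memory": 10e9,   # on the order of 10 GB/s
    "ssd": 500e6,            # on the order of 500 MB/s
    "hard_drive": 150e6,     # on the order of 150 MB/s
    "network_1gbit": 125e6,  # a 1 Gbit/s link moves 125 MB/s at best
}


def seconds_to_read(size_bytes: float, medium: str) -> float:
    """Estimate how long it takes to stream size_bytes through a medium."""
    return size_bytes / ASSUMED_BYTES_PER_SECOND[medium]


def cheaper_to_move_code(
    data_bytes: float, code_bytes: float, link: str = "network_1gbit"
) -> bool:
    """Return True when shipping the code over the link beats shipping the data."""
    return seconds_to_read(code_bytes, link) < seconds_to_read(data_bytes, link)


if __name__ == "__main__":
    one_terabyte = 1e12
    for medium in ASSUMED_BYTES_PER_SECOND:
        estimate = seconds_to_read(one_terabyte, medium)
        print(f"{medium:>14}: {estimate:>8.0f} s per TB")
    # Compare moving 1 TB of data against moving a 50 MB job package.
    print(cheaper_to_move_code(data_bytes=one_terabyte, code_bytes=50e6))
```

Under these assumed figures, pushing one terabyte across a 1 Gbit/s link takes about 8,000 seconds, or more than two hours, while a 50 MB job package crosses the same link in under a second. That gap is the intuition behind bringing the computation to the data.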

At the end of the day, there will always be a shiny new tool, language, or problem, but you will always be well served by the time that you invest in understanding the fundamental principles that underlie everything that we do as software professionals.