Setting Up Data Science on Your M1 MacBook: A Comprehensive Guide
Chapter 1: Unboxing the M1 MacBook
In April 2021, I had the chance to unbox a brand new M1 Apple MacBook Air.
Quick Overview
If you aim to run Python libraries natively (without relying on Rosetta 2) on your new M1 MacBook, I highly recommend using Miniforge. It functions similarly to Miniconda but comes with Apple M1 support from the get-go.
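To check whether a given Python interpreter is actually running natively, one option is to ask it which machine type it reports. The snippet below is a minimal sketch, not part of my original setup: the helper name describe_interpreter is made up for illustration, and it assumes that a native Apple Silicon build reports "arm64" while an interpreter running under Rosetta 2 reports "x86_64".

```python
import platform
import sys
from typing import Optional


def describe_interpreter(machine: Optional[str] = None) -> str:
    """Return a short label for the architecture this interpreter runs as.

    `machine` can be passed in explicitly for testing; by default the
    value comes from platform.machine().
    """
    machine = machine or platform.machine()
    if machine == "arm64":
        return "native Apple Silicon (arm64)"
    if machine == "x86_64":
        # Assumption: on an M1 Mac, an x86_64 interpreter is being translated by Rosetta 2.
        return "x86_64 (on an M1 Mac this means Rosetta 2)"
    return f"other architecture: {machine}"


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}: {describe_interpreter()}")
```

Running this with each interpreter on the machine (system Python, pyenv builds, Miniforge environments) shows which of them are native.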
Why Choose a Mac?
While I contemplated an Ubuntu setup, I am fond of certain commercial applications like Adobe Lightroom, which are exclusive to macOS and Windows. Although I thought about switching to Windows with WSL2 to enjoy both GUI apps and a Linux development environment, there’s an undeniable charm in the MacBook's industrial design. The convenience of tossing a MacBook Air into my backpack is unmatched, and I've become quite accustomed to macOS.
I acknowledge the challenges posed by the new ARM64 processor architecture, which has led to some compatibility issues with data science libraries. This is a new obstacle for me, but the speed of the new Apple chips is impressive, and the developer community is actively addressing compatibility concerns.
While it may not be the most powerful laptop for the price, user experience is more critical for me than sheer computational power, especially since heavy machine learning tasks are increasingly handled on cloud servers. I have a basic understanding of AWS and am currently enhancing my skills in Docker, Kubernetes, and Terraform. I can still access my old x86 Hackintosh desktop for specific tasks when needed.
Why a New Laptop?
I was initially searching for used MacBooks from 2016 to 2018 on eBay, trying to dodge the unreliable butterfly keyboards. However, my employer offered to purchase a new Apple machine for me, which I gladly accepted. I'm confident that full compatibility with popular data science libraries will improve over time, making this new computer faster and more durable than a used model.
Now, let’s go through the initial setup process on my new hardware.
Initial Setup on macOS Big Sur
I breezed through the welcome screens, opting out of Siri and selecting English as my language. Here’s how I customized my experience:
- Dock Configuration: I hid the dock to maximize screen space and switched the minimize animation to a more sophisticated "scale."
- WiFi Connection: I connected to my phone's hotspot to log into LastPass and copied my home WiFi password from there.
- Browser Setup: I downloaded Chrome and set it as my default browser, logging into my Google account and installing my essential extensions: LastPass, RegEx search, Privacy Badger, AdBlock, Grammarly, and JSONview.
- Trackpad Sensitivity: I maximized the trackpad sensitivity.
- Uninstalling Pre-installed Apps: I removed apps like GarageBand and Keynote, opting for Google’s cloud applications instead.
Development Environment Preparation
After installing Homebrew, I added it to my PATH variable and installed awscli for cloud development.
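A quick way to confirm the PATH change took effect is to look for Homebrew's bin directory among the PATH entries. The following is a rough sketch under one assumption: that Homebrew used its default Apple Silicon prefix, /opt/homebrew. The function name is_on_path is hypothetical.

```python
import os

# Assumption: Homebrew's default prefix on Apple Silicon Macs is /opt/homebrew.
APPLE_SILICON_BREW_BIN = "/opt/homebrew/bin"


def is_on_path(directory: str, path_value: str) -> bool:
    """Return True if `directory` is one of the entries in a PATH-style string."""
    # Drop empty entries and ignore trailing slashes when comparing.
    entries = [entry.rstrip("/") for entry in path_value.split(os.pathsep) if entry]
    return directory.rstrip("/") in entries


if __name__ == "__main__":
    current_path = os.environ.get("PATH", "")
    if is_on_path(APPLE_SILICON_BREW_BIN, current_path):
        print(f"{APPLE_SILICON_BREW_BIN} is on PATH")
    else:
        print(f"{APPLE_SILICON_BREW_BIN} is missing from PATH")
```

If the directory is reported as missing, the shell profile that sets PATH probably has not been reloaded yet.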
Quality-of-Life Enhancements
I installed tools like Flycut and Spectacle for better clipboard management and screen organization. However, I encountered a hiccup: Spectacle required Rosetta to function, which indicated that compatibility issues might persist.
I also set up MenuMeters to monitor CPU temperature and utilization at a glance.
Next, I installed Dropbox for cloud storage, and communication apps like WhatsApp and Slack. Music is vital, so I installed Spotify as well.
Diving Back into Development
I installed pyenv, Git Open, and The Fuck for managing Python versions and enhancing terminal interactions.
I set up Vundle for Vim, primarily for airline features and relative line numbers. After syncing Visual Studio Code settings, I ensured that necessary extensions were installed, such as Python Docstring Generator and SonarLint.
Troubleshooting Begins
I attempted to install Python 3.8.6 but faced compatibility issues. I then tried Python 3.9.4, which worked seamlessly on Apple Silicon. However, installing Python libraries such as OpenCV kept failing, so I decided to build them from source.
After several attempts, I successfully installed NumPy directly from the source repository, which gave me a glimmer of hope for native functionality on the ARM64 architecture.
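After a source build like this, it helps to confirm that the library imports and that the interpreter loading it is an arm64 one. Here is a small sketch of such a check. The helper installed_version is a name invented for this example, and it simply returns None when the module is not installed, so it is safe to run on any machine.

```python
import importlib
import importlib.util
import platform
from typing import Optional


def installed_version(module_name: str) -> Optional[str]:
    """Return the module's __version__, "unknown" if it has none, or None if not installed."""
    if importlib.util.find_spec(module_name) is None:
        return None
    module = importlib.import_module(module_name)
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    version = installed_version("numpy")
    if version is None:
        print("NumPy is not installed for this interpreter")
    else:
        # platform.machine() reports the architecture the interpreter runs as.
        print(f"NumPy {version} imported on {platform.machine()}")
```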
The Ultimate Solution: Miniforge
Faced with ongoing issues, I decided to install Miniforge via Homebrew. It simplifies package management and makes it easy to create virtual environments. The tool is designed to support a range of CPU architectures, which is beneficial as the community transitions towards compatibility with Apple Silicon.
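Creating an environment with Miniforge comes down to a single conda create command. The sketch below only assembles that command as a list of arguments and prints it, so nothing is installed by running it. The environment name ds-m1 and the package list are placeholders I chose for illustration, and the helper conda_create_command is hypothetical. Python 3.9 is used because that is the version that worked for me above.

```python
from typing import List, Sequence


def conda_create_command(
    env_name: str, python_version: str, packages: Sequence[str]
) -> List[str]:
    """Build the argument list for creating a conda environment from conda-forge."""
    return [
        "conda",
        "create",
        "--yes",  # skip the interactive confirmation prompt
        "--name",
        env_name,
        "--channel",
        "conda-forge",
        f"python={python_version}",
        *packages,
    ]


if __name__ == "__main__":
    # "ds-m1" and the package list are illustrative placeholders.
    command = conda_create_command("ds-m1", "3.9", ["numpy"])
    print(" ".join(command))
```

The printed command can be pasted into a terminal, or the list can be passed to subprocess.run(command, check=True) once conda is on the PATH.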
Chapter 2: YouTube Tutorials
For those looking to set up their M1 MacBook for data science, here are two valuable video resources:
The first video, "Correct Data Science setup for Arm Macs (M1/M2)", offers a comprehensive guide for setting up your data science environment specifically tailored for Apple’s M1 and M2 chips.
The second video, "Setup Mac for Machine Learning with PyTorch in 11 minutes", provides a quick walkthrough for configuring your Mac for machine learning tasks with PyTorch, applicable to all M1 and M2 models.
In summary, this has been my journey in configuring a new M1 MacBook Air for data science tasks. Initially, I tried using pyenv to manage Python versions, but ultimately, Miniforge proved to be the most effective solution for managing my Python packages and environments, ensuring compatibility with the ARM architecture.