GO
Senior Software Engineer, Machine Learning Infrastructure
Job Description
Google DeepMind is a world-leading AI research lab. We are seeking experienced Senior Software Engineers to build and scale the infrastructure that powers our groundbreaking AI research and development. You will play a crucial role in designing, implementing, and maintaining the systems that enable our researchers and engineers to train and deploy large-scale models efficiently and reliably.
**Responsibilities:**
* Develop and optimize high-performance distributed training and inference systems.
* Build robust and scalable ML pipelines for data processing, model training, and deployment.
* Contribute to the design and implementation of internal ML frameworks and tools.
* Collaborate with research scientists and other engineers to understand and address infrastructure needs.
* Ensure the reliability, efficiency, and maintainability of our ML systems.
**Requirements:**
* BS/MS degree in Computer Science or a related technical field, or equivalent practical experience.
* 5+ years of software engineering experience, with a focus on large-scale systems.
* Strong programming skills in C++ and Python.
* Experience with distributed systems, cloud computing (GCP preferred), and containerization technologies (e.g., Kubernetes).
* Familiarity with machine learning concepts and frameworks (e.g., TensorFlow, JAX).
* Experience with performance tuning and optimization.
**Benefits:**
Google offers a comprehensive benefits package including competitive salary, performance bonuses, stock grants, 401(k) with company match, health insurance, and numerous on-site amenities. We provide opportunities for continuous learning and professional growth.
Skills & Tags
ml infrastructuresoftware engineeringdistributed systemshigh-performance computingpython