Advanced Distributed Machine Learning with Apache Spark


Provider
edX

Price
Free

School
Berkeley

Type
University

Instructors
Ameet Talwalkar, Jon Bates

Categories
Computer Science

Duration
4 weeks

Format
Mixed

Language
English

Description
Building on the core ideas presented in Distributed Machine Learning with Spark, this course covers advanced topics in training and deploying large-scale learning pipelines. You will study state-of-the-art distributed algorithms for collaborative filtering, ensemble methods (e.g., random forests), clustering, and topic modeling, with a focus on model parallelism and the tradeoffs between computation and communication. After completing this course, you will understand the statistical and algorithmic principles required to develop and deploy distributed machine learning pipelines. You will also be able to write efficient, scalable Spark code, in particular using MLlib and the spark.ml package.