Mining Massive Datasets


Provider
coursera

Price
Free

School
Stanford University

Type
University

Instructors
Jeff Ullman, Jure Leskovec, Anand Rajaraman

Categories
Computer Science

Duration
7 weeks

Format
Video

Language
English

Description
We introduce the participant to modern distributed file systems and MapReduce, including what distinguishes good MapReduce algorithms from good algorithms in general. The rest of the course is devoted to algorithms for extracting models and information from large datasets. Participants will learn how Google's PageRank algorithm models the importance of Web pages, and will study some of the many extensions of PageRank that have been used for a variety of purposes. We'll cover locality-sensitive hashing, a bit of magic that lets you find similar items in a set so large that you cannot possibly compare every pair. When data is stored as a very large, sparse matrix, dimensionality reduction is often a good way to model it, but standard approaches do not scale well; we'll talk about efficient alternatives. Many other large-scale algorithms are covered as well, as outlined in the course syllabus.
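To give a flavor of how PageRank "models importance," here is a minimal sketch of PageRank computed by power iteration with teleportation. It is not taken from the course materials. The function name `pagerank`, the dictionary-of-lists graph format, the damping factor `beta = 0.85`, the fixed iteration count, and the choice to spread a dead-end page's rank uniformly over all pages are all assumptions made for illustration.

```python
def pagerank(links, beta=0.85, iterations=100):
    """Hypothetical helper: PageRank via power iteration with teleportation.

    links maps each page to the list of pages it links to. A page with no
    out-links (a dead end) spreads its rank evenly over all pages, so the
    scores always sum to 1.
    """
    # Collect every page that appears as a source or a link target.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution

    for _ in range(iterations):
        # With probability (1 - beta) a random surfer teleports to any page.
        new_rank = {p: (1.0 - beta) / n for p in pages}
        for p in pages:
            outs = links.get(p, [])
            if outs:
                # With probability beta the surfer follows a random out-link.
                share = beta * rank[p] / len(outs)
                for q in outs:
                    new_rank[q] += share
            else:
                # Dead end: redistribute this page's rank over all pages.
                for q in pages:
                    new_rank[q] += beta * rank[p] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    graph = {"A": ["B", "C"], "B": ["A"], "C": ["A"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 4))
```

In this toy graph both B and C link only to A, so A ends up with the highest score while B and C, being symmetric, receive equal scores. Teleportation keeps the iteration from getting stuck on dead ends or closed loops of pages, which is one reason a damping factor is used at all.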