Professor Widom offers a range of options for different audiences, although the focus is on fundamental learning rather than advanced development skills or operational deployment. Material is drawn from a course she recently developed at Stanford. The most detailed offering is a short course lasting a full week, covering a variety of topics and including a great deal of hands-on learning. Except for the broad overview, students should be comfortable with basic mathematical concepts. Some portions of the material require a modest amount of computer programming experience (equivalent to an introductory programming course).
Many of the world's biggest discoveries and decisions in science, technology, business, medicine, politics, and society as a whole are now being made on the basis of analyzing massive data sets. At the same time, it is surprisingly easy to make errors or reach false conclusions from data analysis alone. Professor Widom's seminars, tutorials, and short courses provide a broad introduction to big data and data science, including history, case studies, pitfalls, and basic tools & techniques for data collection, analysis, and visualization.
Formats range from 2-hour seminars to 1-2 day tutorials to a weeklong course. Depending on the desired format and the background of the students, the following topics may be covered.
Introduction to Big Data and Data Science
- Motivation, history, and terminology
- Success stories and failure cases
- Privacy considerations
Fundamental Concepts and Techniques
- Basic data operations
- Data mining
- Machine learning: regression, classification, clustering
- Correlation and causation
Tools for Data Manipulation and Analysis
- Relational databases and SQL
- The Python and R programming languages
- Data visualization tools