Course Details

May 30-June 1, 2017

Location: Columbia University Campus, New York City, New York USA

Professor Eugene Wu

Eugene Wu is Assistant Professor of Computer Science at Columbia University. He received a Ph.D. in Electrical Engineering and Computer Science from MIT in 2014, and a B.S. from UC Berkeley. He is broadly interested in technologies that help users play with their data. His goal is for users at all technical levels to make sense of their information effectively and quickly. Eugene is interested in solutions that ultimately improve the interface between users and data, drawing on techniques from fields such as data management, systems, crowdsourcing, visualization, and human-computer interaction.

Prof. Suman Jana

Suman Jana is Assistant Professor of Computer Science at Columbia University. His primary research area is computer security and privacy. He is broadly interested in securing software systems by designing automated tools for finding software bugs and vulnerabilities. He has received several awards for his research, including two best paper awards at the IEEE Symposium on Security and Privacy (Oakland), the PET Award, and the NYU-Poly/AT&T Best Applied Security Paper Award. He is also a recipient of a Google Research Award in Computer Security. More details about his work can be found at

Course Content and Structure

This three-day course provides a hands-on introduction to big data processing, as well as to the security risks and privacy issues that big data systems raise. Data processing topics include:

  • Key components of a big data processing pipeline
  • Data cleaning and preparation
  • Query execution
  • Modern data processing engines
  • Data visualization

Big Data privacy topics include:

  • k-anonymity
  • l-diversity
  • t-closeness
  • Deanonymization attacks
  • Differential privacy

Topics in designing secure big data systems include:

  • Memory corruption attacks
  • Sandboxing
  • Understanding and using bug finding tools
  • Problems with data anonymity and privacy
  • Deanonymization attacks
  • Differential privacy

Course Objectives

After taking this course, participants will have a broad understanding of the data management lifecycle and the key issues that need to be taken into account when creating a data processing pipeline. This includes the ability to clean and prepare data for big data processing systems, an understanding of the architectural decisions involved in choosing such a system, and the ability to use it to process and visualize large data sets.

Further, participants will learn about the key classes of security and privacy risks that big data systems pose, as well as the principles needed to design systems that mitigate these risks. In addition, participants will learn about the dangers that buggy code poses to big data systems and gain hands-on experience finding and fixing security bugs using automated bug-finding tools.

Who Should Attend

Professionals and executives who are interested in managing large amounts of data and in the security and privacy implications of doing so. We will discuss large-scale data management, data privacy, security, and visualization.