Big Data Challenge

Competition for high school students

This is an opportunity for every high-school student to try the emerging profession of Big Data Analytics by analysing a Big Data business case. Slice and dice the data, zoom in and out of it, complement it with data from other public sources, find patterns and trends, and tell a story from your discoveries. This is a team competition where you will have a chance to exercise all types of skills: curiosity, statistics, programming, common sense, creativity and presentation skills.


The Big Data Challenge team participation fee is $100. You can pay for it here

Big Data is information of extreme size, diversity, velocity and complexity, far beyond the human ability to comprehend.

Did you know? In the last minute there were:

  • 61,000 hours of music downloaded from Pandora
  • 20 million photo views and 3 million uploads on Flickr
  • 100,000 tweets generated
  • 6 million page views and 277,000 Facebook logins
  • 2+ million Google searches
  • 204 million emails sent (stats from 2012)

Capturing the potential of Big Data requires developing new models and analytic capabilities that convert this torrent of data into useful information. Thus far, applications of Big Data have been able to:

  • predict infection in infants before doctors know babies are sick
  • identify traffic patterns to reduce road congestion
  • provide customized product offerings to customers that meet specific needs, reinforcing customer loyalty and improving profitability

The Challenge:

Students will be supplied with a real data set of grocery store transactions. These will include all visits to the stores by customers, and will include their categorized shopping lists. The shoppers themselves will be tagged by their date of birth, gender and address (postal code). There is data for 5 stores. Using the provided data, the students are asked to build a model to analyze the data, and address one or more of the following questions:

  • Predict next week's shopping list for each consumer
  • Attempt to classify the consumers by household size, income and demographics
  • Classify consumers by lifestyle and political preferences
  • Any other indicators the students find interesting – be creative!

Students are free to use any supplementary open/freely available data sets they can find in order to support their model and analysis.
Students are free to use any modeling/computational tools available to them for their analysis.
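
As one possible starting point for the first question, here is a minimal baseline sketch in Python. It assumes the transactions have already been reduced to (customer_id, week, category) tuples; that layout, the function name `predict_next_week` and the `min_share` threshold are illustrative assumptions, not the format of the supplied data set or a required method. The idea is simply to predict that a shopper will buy again the categories they bought in most of the weeks they visited a store.

```python
from collections import defaultdict


def predict_next_week(transactions, min_share=0.5):
    """Predict next week's shopping list per customer with a frequency baseline.

    transactions: iterable of (customer_id, week, category) tuples
                  (a hypothetical layout, not the real data format).
    min_share:    a category is predicted if the customer bought it in at
                  least this share of the weeks in which they shopped.
    Returns {customer_id: sorted list of predicted categories}.
    """
    weeks_seen = defaultdict(set)       # customer -> weeks with any purchase
    weeks_with_item = defaultdict(set)  # (customer, category) -> weeks bought

    for customer, week, category in transactions:
        weeks_seen[customer].add(week)
        weeks_with_item[(customer, category)].add(week)

    # Start every customer with an empty list so nobody is dropped.
    predictions = {customer: [] for customer in weeks_seen}
    for (customer, category), weeks in weeks_with_item.items():
        share = len(weeks) / len(weeks_seen[customer])
        if share >= min_share:
            predictions[customer].append(category)

    return {customer: sorted(items) for customer, items in predictions.items()}


# Tiny made-up example: two shoppers over three weeks.
sample = [
    ("A", 1, "dairy"), ("A", 1, "bread"),
    ("A", 2, "dairy"),
    ("A", 3, "dairy"), ("A", 3, "snacks"),
    ("B", 1, "produce"),
    ("B", 2, "produce"), ("B", 2, "bread"),
]
print(predict_next_week(sample))
```

A baseline like this is useful mainly as a yardstick: a more elaborate model that uses age, gender, postal code or outside data should be able to beat it on held-out weeks.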

Rules and timeline:

  • Students should form teams (2-4 participants). They are encouraged to have a mentor.
  • SciNet analysts are available for consultation throughout the competition. They can be contacted at
  • Optional information sessions to discuss the contest, the data and software tools (e.g. Python, R, SAS), and to answer any questions from the students, will be held on November 4 and November 5, from 4 to 5 PM, at the SciNet headquarters (or by Google Hangout).
  • by November 15, 2014
    • Provide list of participants, their school affiliation, contact information, and mentor information.
    • Pay the $100 participation fee. The money will be used for prizes.
    • Submit a 1-2 page abstract describing the students' motivation and goals for the challenge (other than the prize, of course).
  • by January 12, 2015
    • Submit the report describing Hypothesis, Methodology, Results and Discussion.
    • Supply the computer code used to analyze the data, for evaluation and reproducibility.
  • by end of January, 2015
    • Judges will announce the short list of 5 finalists, who will be invited to present their work at the SAS Canada headquarters, or online if travel is not possible.
  • February 13, 2015 – Big Data Day
    • The 5 finalist teams will present their projects to a panel of experts at the SAS Canada headquarters.
    • Students are encouraged to participate in the discussions.
    • The judges will select the top 3 winning teams.
  • Submissions and further inquiries should go to:
  • The dataset can be downloaded at:

The winning team will be awarded the $1,000 SAS prize.
The second- and third-place teams will each receive a monetary prize, with the amount based on the number of participating teams.

The following SAS resources will be provided to students:

  1. SAS University Edition:
  2. SAS On-Demand For Academics – This is a free cloud version of 5 of our products, including Enterprise Miner, our foundation Data Mining tool. I will start a course for the competition.
  3. Free E-Learning from U of T – See the attached PDF; the access code is G70072789. There are twelve different courses for students to learn how to use SAS.
  4. – Great website for students to search and learn about different techniques of Analytics.
