Bootstrap’s Data Science Course for Middle- and High-School Students

From a deluge of job openings to new university programs, Data Science has become a hot topic in both industry and academia. But if Data Science is so important, why wait until a student enters university to introduce them to the subject? In addition, Data Science is a way to show how computing engages with society, and it has the potential to generate interesting and surprising results. It therefore offers a channel to interest whole new groups of students and teachers in computing.

Plus, children are natural data scientists! They argue about who was the greatest quarterback of all time, the most successful singer, or which chain has the best pizza. These questions quickly shift to data: a child will point out that athlete X won more trophies than athlete Y, but her friend will argue that Y won their trophies in a shorter length of time. Perhaps Al’s Pizza Barn is rated higher than Bob’s Pizza Warehouse, but Bob’s is much less expensive. As students argue over which choice is best, they naturally shift to “which definition of best is most accurate.” As they become more politically involved or career-conscious, they want to know whether a law is racist, or whether the outcomes of going to a particular college justify the extra student loans. In a world that is data-rich, the problem these students face isn’t a lack of information. It’s a lack of tools to ask questions of that information.

The motivation for a K-12 Data Science curriculum is clear, but building one requires careful thinking about software, pedagogy, and curriculum. To blend in with mainstream courses, these curricula should be designed to fit comfortably within existing content strands, and aligned to national and state standards for statistics, CTE, and/or business.

Professional-grade tools like Stata and R offer powerful features, but aren’t explicitly designed to be child- or teacher-friendly. At the opposite extreme, spreadsheets have deep roots in educational settings, but lack the programming component, and some of the features, necessary to build a rigorous introductory Data Science course.

Current attempts to fill the gap in the middle are a poor compromise, attempting to squeeze Data Science in by having students program various loops over two-dimensional arrays. Making for-loops and nested data structures a prerequisite for Data Science immediately limits the possible audience of students, and burns valuable time that could be spent addressing the standards-alignment of mainstream courses. If a Data Science module needs a few weeks to introduce for-loops and two-dimensional arrays, that’s weeks of time spent before a teacher can address the needed standards for bar, pie and scatter plots, measures of center, or linear regression. This approach rules out scale (only a small percentage of students and teachers are ready for this material) and equity (asking students to self-select into these courses only reinforces existing stereotypes). What’s needed is a holistic approach to Data Science that has all three: equity, rigor, and scale.

Bootstrap is one of the few groups in the CS Education field that builds its own curriculum, software, and programming tools. This gives us a unique opportunity to fill that gap, with a programming language that makes operations on tabular data (literally, spreadsheets) accessible without the overhead of teaching loops. We’ve leveraged our world-class language development team to bring rigorous Data Science to an introductory computing module. And since our unit of data storage is a spreadsheet, there’s a smooth on-ramp for teachers who are comfortable with Microsoft Excel and Google Sheets.
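To make the contrast concrete, here is a minimal sketch written in Python rather than in Bootstrap's own language. The restaurant data and the helper names (mean_price_with_loops, column, filter_rows) are made up for illustration and are not part of the Bootstrap curriculum or software. The first function works the way a loops-and-arrays module would; the second half treats the data as a table with named columns.

```python
from statistics import mean

# Hypothetical table: one dictionary per row, keyed by column name.
table = [
    {"name": "Al's Pizza Barn", "rating": 4.5, "price": 18.0},
    {"name": "Bob's Pizza Warehouse", "rating": 4.0, "price": 11.0},
    {"name": "Cal's Slice Shop", "rating": 3.5, "price": 10.0},
]


def mean_price_with_loops(grid):
    """Array-style approach: walk a two-dimensional list by index."""
    total = 0.0
    for i in range(len(grid)):
        total += grid[i][2]  # the student must remember that column 2 is the price
    return total / len(grid)


def column(rows, name):
    """Table-style helper (hypothetical): pull out one column by its name."""
    return [row[name] for row in rows]


def filter_rows(rows, keep):
    """Table-style helper (hypothetical): keep only the rows that pass a test."""
    return [row for row in rows if keep(row)]


# The same data as a bare two-dimensional array, for the loop version.
grid = [[row["name"], row["rating"], row["price"]] for row in table]

loop_result = mean_price_with_loops(grid)
table_result = mean(column(table, "price"))
cheap_names = column(filter_rows(table, lambda row: row["price"] < 12), "name")

print(loop_result, table_result, cheap_names)
```

Both versions compute the same average. The difference is what the student has to learn first: index arithmetic and loop bookkeeping in one case, column names and a question about the rows in the other. The iteration has not disappeared in the second version; it is hidden inside the helpers.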

Thanks to seed funding from Bloomberg’s Office of the CTO, a Bootstrap:Data Science course is already being piloted at several middle and high schools. In Rhode Island, high school students used our module to compare college acceptance rates for high schools across the state. Sixth graders in North Carolina studied the role that Data Science plays in things like credit card fraud detection. By lowering the barrier to entry, Bootstrap:Data Science is a true introductory course, which addresses learning goals that make it suitable in a number of mainstream classes:

  • Statistics classes can use Bootstrap:Data Science to cover core concepts in statistics, such as measures of center (mean, median, mode), and visualizing data (line, bar, and scatter plots).
  • Business classes can use the Data Science module to make inferences about sales, profits, and customer demand, porting existing spreadsheet-based coursework to our programmed lessons.
  • Civics and Social Studies classes can use Data Science to explore the role of data in government and social policy, exploring the impact of things like stop-and-frisk, the Electoral College, and third-party voting on the world around them.

Even with a powerful tool and a flexible curriculum, a successful course still needs an effective pedagogy. Bootstrap builds on a world-class pedagogical technique developed over decades at the university level, which has been proven to work with students of all ages and abilities via our Bootstrap:Algebra course. More than 20,000 students each year use our structured approach to problem solving (~45% female, ~50% students of color), and we’re excited to bring this approach to Data Science.

This summer, Bootstrap held its first-ever teacher professional development training for Bootstrap:Data Science at CSPdWeek. In an early indicator of the curriculum’s potential, the majority of the applicants were not CS teachers, nor were they being assigned to teach a CS class! The training covered introductory programming, visualization using half a dozen chart types, the core statistical concepts mandated by most state and national standards (mean, median, mode, linear regression, r-squared, etc.), and table queries.
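For readers who want to see what those statistical concepts look like as computation, here is a small sketch in Python (not Bootstrap's own language). The loudness and tempo numbers are invented for illustration, and the function names linear_regression and r_squared are our own; the formulas are the standard least-squares ones.

```python
from statistics import mean, median, mode

# Made-up measurements for five songs; not real data.
loudness = [1, 2, 3, 4, 5]
tempo = [2, 4, 4, 4, 6]


def linear_regression(xs, ys):
    """Least-squares line through the points: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Fraction of the variation in ys that the fitted line accounts for."""
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Measures of center for the tempo column.
center = (mean(tempo), median(tempo), mode(tempo))

# Line of best fit and how well it explains the data.
slope, intercept = linear_regression(loudness, tempo)
fit = r_squared(loudness, tempo, slope, intercept)

print(center, slope, intercept, fit)
```

The numbers are only the starting point. Whether a given r-squared value means louder songs really do tend to be faster is the kind of question the course asks teachers and students to argue about in writing.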

Staying true to our belief that Data Science must be about more than the tool, the course emphasized the thinking and writing side of things, encouraging teachers to focus on making meaning rather than just writing code. Some investigated the relationship between home ownership and income. Others looked into whether or not louder pop songs tended to have faster beats. In every case, teachers encountered outliers, surprising trends, and more complex relationships than they anticipated.

Just as their students will, these teachers focused on questions they cared about, and used the Data Science concepts they learned to search for answers in the data. By the end of the week, they had created detailed reports of their analysis, which they presented in front of an audience. They found themselves thinking about fitting the material into their own math, business, computing, and social studies classes, and gave us invaluable feedback about how to make the curriculum even better. We still have more to learn and too much to do, but we believe that we’ve found a great starting point for an authentic, accessible curriculum: using computation as a vehicle for thinking, talking, and writing about data.

This article was contributed by Brown University’s Shriram Krishnamurthi and Emmanuel Schanzer, who are part of the team that directs Bootstrap.