Understanding Data:
An Introductory Lesson
A slide-based lesson for adult career-changers entering a data program — built to establish shared vocabulary before any tool or technique is introduced.
The Gap
Career-changers entering a data program often arrive with enthusiasm and some tool exposure, but without a shared vocabulary for talking about data itself. In practice this showed up in a specific and consistent way: students were selecting statistical tests and visualizations that weren't appropriate for their data types. They could operate tools, but they couldn't reason about what the tools should be doing.
A compounding factor: the curriculum had been built without formative checks, so misconceptions weren't surfacing until students were already applying tools incorrectly. By the time errors appeared, they were symptoms, well downstream of the real problem.
The Diagnosis
The root cause wasn't a gap in tool skills — it was a gap one level upstream. Without a reliable way to classify and describe data, students had no principled basis for any downstream decision. They were making choices by pattern-matching to examples rather than understanding the logic. No amount of additional tool instruction would fix that.
The Approach
I redesigned the lesson to establish a shared vocabulary before introducing any software or technique. The core sequence:
- Anchor the lesson to a single core idea: a dataset is organized information with structure.
- Introduce the Iris dataset as a running example — chosen deliberately because it contains both numerical and categorical variables in the same table, setting up the classification distinctions that follow.
- Define variable and observation using consistent color-coded highlights on the same table, so learners see the same data from two structural angles.
- Classify data types (numerical vs. categorical, then ordinal vs. nominal within categorical) and connect each classification to the decisions it enables.
- Close with a discussion prompt — a smoking habits variable represented two ways — that asks learners to reason about how representation affects downstream use. This is the formative check the original curriculum was missing.
Pacing was managed through Beamer reveals, so each concept landed before the next was introduced. The goal throughout was to give students a reliable first question to ask when facing any unfamiliar dataset: what kind of data is this, and what does that tell me about what I can do with it?
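To make that first question concrete, here is a minimal Python sketch of how a classification can constrain what you do next. It is an illustration, not part of the lesson materials. The `ALLOWED` table, the `describe` helper, and the level names are hypothetical. The two smoking representations (category labels vs. cigarettes per day) and their values are assumed for illustration, since the slide content isn't reproduced here.

```python
from statistics import mean, mode

# Summaries that are meaningful at each level (a common textbook rule of thumb).
ALLOWED = {
    "nominal": {"mode"},
    "ordinal": {"mode", "median"},
    "numerical": {"mode", "median", "mean"},
}


def describe(values, level):
    """Pick a summary the data type supports: mean if allowed, else mode."""
    if "mean" in ALLOWED[level]:
        return ("mean", mean(values))
    return ("mode", mode(values))


# A few Iris rows: each dict is an observation, each key is a variable.
iris_rows = [
    {"sepal_length": 5.1, "species": "setosa"},
    {"sepal_length": 4.9, "species": "setosa"},
    {"sepal_length": 7.0, "species": "versicolor"},
    {"sepal_length": 6.3, "species": "virginica"},
]
sepal = describe([r["sepal_length"] for r in iris_rows], "numerical")
species = describe([r["species"] for r in iris_rows], "nominal")

# Hypothetical: the same smoking variable represented two ways.
smoking_as_category = ["never", "former", "current", "never"]  # assumed labels
smoking_as_count = [0, 0, 12, 0]  # assumed cigarettes per day

by_category = describe(smoking_as_category, "nominal")
by_count = describe(smoking_as_count, "numerical")
```

The same people yield a mode under one representation and a mean under the other, which is the kind of downstream consequence the discussion prompt asks learners to reason about.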
The Lesson
Next Iteration
I'd add a short diagnostic at the start — a few examples where students classify data and justify their choice before the lesson begins. That would surface misconceptions earlier and give a clearer picture of where each student is starting from.
More broadly, this lesson was one part of a larger response to a curriculum that lacked formative structure throughout. The discussion prompt at the end is a step in that direction — but integrating checks for understanding at each stage of the vocabulary sequence would make the design more complete.