Creating Learner Progress Monitoring Using Python, Pandas and Streamlit
- Habib Abdurrasyid
- 04 Apr, 2022
Why I Built This
In my role as Academic Program Manager for the AWS re/Start program, I was managing multiple cohorts at the same time. Each cohort had its own Canvas LMS gradebook, and each gradebook could have more than 170 columns of mixed data: Knowledge Checks (KCs), Labs, scores, and completion flags, all in one file.
At first, reviewing each cohort manually was fine. But once the program grew to 10 or more cohorts per batch, the manual approach took too much time and became error-prone.
I needed something better, so I built a tool for it.
The Problem in Practice
Canvas LMS is actually a good system. But the gradebook exports it gives you are designed to be complete, not easy to read. Here is what I was dealing with every week:
- One CSV file per cohort
- 170+ columns per file, mixing Knowledge Checks, Labs, and general course data
- No clear separation between KC scores, Lab completion, and general course data
- Weekly progress targets that needed to be checked for each learner
With 10 cohorts, that is 10 files and hundreds of rows to go through manually. When programs also ran in India at the same time, it was more than 40 cohorts total. Manual review was not practical anymore.
What I Built
I made a web application using Python, Pandas, and Streamlit. The idea was to keep it simple: upload the files, get the results.
The flow is:
graph LR
A[Upload CSVs] --> B[Parse Data]
B --> C[Aggregate KC and Lab]
C --> D[Check Weekly Targets]
D --> E[Download Summary]
No complicated setup. The people using it are program coordinators, not engineers, so the interface had to be straightforward.
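The upload and parse steps can be sketched in a few lines. This is a minimal illustration, not the original code: `load_cohorts` is a hypothetical helper, and it assumes each file name identifies its cohort. It accepts any file-like objects that have a `name` attribute, which is what Streamlit's `st.file_uploader(..., accept_multiple_files=True)` returns.

```python
from pathlib import Path

import pandas as pd


def load_cohorts(files) -> dict[str, pd.DataFrame]:
    """Read each uploaded CSV into a DataFrame keyed by cohort name.

    `files` is any iterable of file-like objects with a `name` attribute.
    """
    cohorts = {}
    for f in files:
        # Assumption: the file name (without extension) identifies the cohort.
        cohort = Path(f.name).stem
        cohorts[cohort] = pd.read_csv(f)
    return cohorts
```

Keeping the parsing in a plain function like this means it can be checked without starting the web app.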
What the app does:
- Accept multiple CSV uploads at once (one per cohort)
- Automatically find and parse Knowledge Check columns
- Automatically find and parse Lab completion columns
- Calculate each learner’s progress against the weekly targets
- Output a clean, downloadable summary per cohort
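The column detection and target check described above could look roughly like the sketch below. The naming convention is an assumption: it supposes that Knowledge Check columns contain "KC" in their title, Lab columns contain "Lab", and learner names sit in a `Student` column. The function names, the summary columns, and the target parameters are all hypothetical, and a real export may need different rules.

```python
import pandas as pd

# Assumed markers; the real gradebook's column titles may differ.
KC_MARKER = "KC"
LAB_MARKER = "Lab"


def find_columns(df: pd.DataFrame, marker: str) -> list[str]:
    """Return the columns whose title contains the marker (case-sensitive)."""
    return [col for col in df.columns if marker in col]


def summarise(df: pd.DataFrame, kc_target: int, lab_target: int) -> pd.DataFrame:
    """Build a per-learner summary and flag who has met the weekly targets."""
    # Mixed cells (text, blanks) are coerced to NaN so they count as not done.
    kc = df[find_columns(df, KC_MARKER)].apply(pd.to_numeric, errors="coerce")
    labs = df[find_columns(df, LAB_MARKER)].apply(pd.to_numeric, errors="coerce")

    summary = pd.DataFrame({
        "Student": df["Student"],
        "kc_done": kc.notna().sum(axis=1),    # KCs with any numeric score
        "kc_avg": kc.mean(axis=1),            # average over attempted KCs
        "labs_done": (labs > 0).sum(axis=1),  # Labs with a positive mark
    })
    summary["on_track"] = (
        (summary["kc_done"] >= kc_target) & (summary["labs_done"] >= lab_target)
    )
    return summary
```

Calling `summarise(df, kc_target=2, lab_target=2).to_csv(index=False)` would then give the text for a downloadable summary file.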
Tech Stack
I picked tools that I knew well and that would be fast to build and easy to maintain later.
- Python for the core logic
- Pandas for handling 170+ column DataFrames without it getting messy
- Streamlit, because building a working web UI with it takes hours, not days
- PythonAnywhere for hosting, so anyone can access it without installing anything
This is not the most sophisticated stack, but it was the right one for this problem. The tool needed to run reliably and be usable by non-technical team members. This stack delivered that.
One Extra Thing: The Leaderboard
After the main tool was working, I added a learner-facing progress leaderboard using Google Sheets.
The idea was simple. Learners could see their own KC and Lab scores compared to others in the cohort. This did two things:
- It reduced the number of reminder messages coordinators had to send
- It gave learners a way to monitor themselves without waiting for someone to tell them how they were doing
It was a small addition, but it made the experience better for both sides.
Impact
The tool ended up being used across multiple batches:
- 3 AWS re/Start batches total
- 10+ cohorts in Indonesia
- 40+ cohorts in India
The time to process each batch dropped significantly compared with manual review. Coordinators could spend that saved time on actual learner support instead of spreadsheet work.
What I Would Change Now
Looking back at this, there are things I would do differently:
Automated gradebook retrieval. Right now, someone has to download the files from Canvas and upload them. It would be better to pull directly from the Canvas API.
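As a starting point, here is a hedged sketch of what a direct pull might involve. It only builds the request and does not send it. It assumes the Canvas REST endpoint for listing student submissions in a course and bearer-token authentication. The base URL, course ID, and token are placeholders, and `build_submissions_request` is a hypothetical helper.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_submissions_request(base_url: str, course_id: int, token: str) -> Request:
    """Prepare a GET request for all student submissions in one course."""
    # "student_ids[]=all" asks for every student; per_page limits the page size.
    query = urlencode({"student_ids[]": "all", "per_page": 100})
    url = (
        f"{base_url.rstrip('/')}/api/v1/courses/{course_id}"
        f"/students/submissions?{query}"
    )
    return Request(url, headers={"Authorization": f"Bearer {token}"})
```

The request could be sent with `urllib.request.urlopen`. Canvas paginates its responses, so a real client would also need to follow the `Link` header until no next page remains.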
Built-in charts. The output is a clean CSV, but visualizing trends over time still requires extra work. Putting basic charts directly in the app would help a lot.
Timeline tracking. Right now the tool shows only the current state. Tracking progress over several weeks would make it easier to spot patterns early.
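One possible approach is to keep each week's summary and pivot the stack into one row per learner. The sketch below assumes every weekly summary has a `Student` column and a count column, here given the hypothetical name `kc_done`. `build_timeline` is likewise an illustrative name.

```python
import pandas as pd


def build_timeline(snapshots: dict[int, pd.DataFrame]) -> pd.DataFrame:
    """Combine weekly summaries into one row per learner, one column per week."""
    # Tag each snapshot with its week number before stacking them.
    frames = [df.assign(week=week) for week, df in snapshots.items()]
    stacked = pd.concat(frames, ignore_index=True)
    return stacked.pivot(index="Student", columns="week", values="kc_done")
```

Calling `timeline.diff(axis=1)` on the result gives the week-over-week change, so a learner whose latest value is 0 has stalled. The same wide table, transposed so weeks run along the index, could be passed to `st.line_chart` for a basic trend chart.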
But for what it was solving at the time, it worked well.