Computer Science

Beginner

240 mins

Teacher/Student led

What you need:
Chromebook/Laptop/PC

Data Processing and Algorithm Implementation

In this lesson, you'll learn to process data from your previous work and implement algorithms to analyse it. Follow guided steps to review, clean, and explore your data, code algorithms, evaluate results, and document findings for initial insights.

1 Introduction
2 Review Collected Data
3 Clean the Data
4 Perform Exploratory Data Analysis
5 Implement Algorithms
6 Evaluate Initial Results
7 Document Progress

1 - Introduction

In this lesson, you'll process the data you collected in the previous lesson and implement algorithms to analyse it. This builds on your hypothesis and prepares you for building the full analytics artefact.

We'll guide you through reviewing your data, cleaning it, exploring patterns, selecting and coding algorithms, evaluating results, and documenting your work. By the end, you'll have initial insights from your data analysis.

This lesson uses Python, as covered in earlier modules. Adapt examples to your dataset. It should take about 180-240 minutes over a few sessions.

2 - Review Collected Data

Begin by examining the dataset from your previous lesson. Check for completeness, structure, and relevance to your hypothesis.

Key Checks:

Is the data complete? Look for missing entries.
Does it have a clear structure (e.g., one column per variable, such as age or score)?
Is it relevant? Does it help test your hypothesis?
Are there inconsistencies (e.g., dates written in different formats)?

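Activity: Load your data in Python. The example assumes a CSV file whose first row holds the column names; if your data is already in a Python list, skip the loading step. Print a summary, such as the column names, the number of rows and the first few rows. Note any issues in a text file for later.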

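import csv

# Example: load data from a CSV file (replace 'your_data.csv' with your file name)
# newline='' is the recommended way to open files for the csv module
with open('your_data.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    header = next(reader, [])  # First row holds the column names
    data = [row for row in reader]  # Remaining rows are the data

# Print a short summary
print('Columns:', header)
print('Number of rows:', len(data))
print(data[:5])  # Print first 5 rows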
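The completeness and structure checks above can be partly automated. The sketch below counts empty cells in each column and lists rows whose length does not match the header. It assumes the data is a header list plus a list of rows of strings, which is what the csv module produces. The function name `summarise_missing` and the example rows are hypothetical.

```python
def summarise_missing(header, rows):
    """Count empty cells per column and find rows with the wrong number of cells.

    Assumes header is a list of column names and rows is a list of lists of strings.
    """
    missing = {name: 0 for name in header}
    bad_rows = []  # indexes of rows whose length does not match the header
    for index, row in enumerate(rows):
        if len(row) != len(header):
            bad_rows.append(index)
        for name, cell in zip(header, row):
            if cell.strip() == '':
                missing[name] += 1
        # Cells that are absent from a short row also count as missing
        for name in header[len(row):]:
            missing[name] += 1
    return missing, bad_rows


# Hypothetical example rows (note the mixed date formats, another issue to record)
header = ['name', 'score', 'date']
rows = [
    ['Alice', '12', '03/04/2024'],
    ['Bob', '', '2024-04-05'],
    ['Cara', '9'],
]
missing, bad_rows = summarise_missing(header, rows)
print('Missing values per column:', missing)
print('Rows with the wrong number of cells:', bad_rows)
```

Run it on your own `header` and `data` and copy the results into your issues file.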

3 - Clean the Data

Fix the issues you found during the review. Cleaning structures and transforms the raw data so it is ready for analysis (outcome 3.5).

Steps:

Remove or fill missing values (e.g., replace with averages or remove rows with too many gaps).
Normalise formats (e.g., convert all dates to YYYY-MM-DD or standardise text cases).
Transform variables (e.g., convert strings to numbers, or scale values for consistency).
Check for duplicates and remove them if necessary.
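The cleaning steps above can be sketched in Python as follows. This is one possible approach, assuming rows are lists of strings with the key numeric value at index 1 and, optionally, a date column. The day-first date formats in `DATE_FORMATS`, the function names and the example rows are all hypothetical; adjust them to match your data. The result, `cleaned_data`, has the shape the later examples in this lesson expect (a number at index 1).

```python
from datetime import datetime

# Date formats this sketch assumes may appear in the raw data (day-first)
DATE_FORMATS = ('%d/%m/%Y', '%Y-%m-%d', '%d-%m-%Y')


def to_iso_date(text):
    """Return text as a YYYY-MM-DD string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime('%Y-%m-%d')
        except ValueError:
            continue  # try the next format
    return None


def clean_rows(rows, numeric_col=1, date_col=None):
    """Return a cleaned copy of rows (lists of strings).

    Removes rows with a missing or non-numeric value in numeric_col,
    converts that value to float, normalises date_col (if given),
    and removes exact duplicates.
    """
    cleaned = []
    seen = set()
    for row in rows:
        row = [cell.strip() for cell in row]  # trim stray spaces
        if len(row) <= numeric_col or row[numeric_col] == '':
            continue  # remove rows missing the key value
        try:
            row[numeric_col] = float(row[numeric_col])  # string -> number
        except ValueError:
            continue  # remove rows whose value is not a number
        if date_col is not None and date_col < len(row):
            row[date_col] = to_iso_date(row[date_col])
        key = tuple(row)
        if key in seen:
            continue  # duplicate of a row already kept
        seen.add(key)
        cleaned.append(row)
    return cleaned


def fill_missing_with_mean(values):
    """Alternative to removal: replace None entries with the mean of the known values."""
    known = [v for v in values if v is not None]
    if not known:
        return list(values)  # nothing to average
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale a non-empty list of numbers to the range 0 to 1."""
    low, high = min(values), max(values)
    if low == high:
        return [0.0 for _ in values]  # all values equal: nothing to spread
    return [(v - low) / (high - low) for v in values]


# Hypothetical raw rows: stray spaces, a blank, a duplicate, text in a number column
raw = [
    ['Alice', ' 12 ', '03/04/2024'],
    ['Bob', '', '05/04/2024'],
    ['Alice', '12', '2024-04-03'],
    ['Cara', 'abc', '01/01/2024'],
    ['Dan', '7', '31-12-2023'],
]
cleaned_data = clean_rows(raw, numeric_col=1, date_col=2)
print(cleaned_data)
```

Here the second "Alice" row only counts as a duplicate after its date has been normalised, which is why normalising comes before the duplicate check.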

4 - Perform Exploratory Data Analysis

Use statistics and visualisations to understand your data's patterns, trends, and relationships.

Approach:

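Calculate basic statistics such as the mean, minimum and maximum.
Create simple plots (text-based plots with print, or charts with matplotlib if it is installed).
Look for correlations between pairs of variables (e.g., whether score tends to rise with age).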

Exercise: Compute stats on a numeric column. Interpret what this shows about your hypothesis.

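# Assume cleaned_data is a list of rows with a numeric value at index 1
values = [float(row[1]) for row in cleaned_data]  # float() also accepts numeric strings
mean = sum(values) / len(values)
print('Mean:', mean)
print('Min:', min(values))
print('Max:', max(values))

# Simple text visualisation: a histogram with bins 5 units wide
# (assumes values are zero or positive; labels assume whole numbers)
print('Histogram:')
bin_width = 5
for i in range(0, int(max(values)) + 1, bin_width):  # range() needs whole numbers
    count = sum(1 for v in values if i <= v < i + bin_width)
    print(f'{i}-{i + bin_width - 1}: ' + '*' * count)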
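To look for correlations, one common measure is Pearson's correlation coefficient r, which runs from -1 (one variable falls as the other rises) through 0 (no linear relationship) to 1 (both rise together). The sketch below computes it by hand for two equal-length lists of numbers. The function name and the revision-hours example are hypothetical. On Python 3.10 or later, the standard library's `statistics.correlation(xs, ys)` computes the same coefficient.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson's r for two equal-length lists of numbers (result is between -1 and 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError('need two lists of the same length, with at least 2 values')
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # How much each pair moves together, measured from the means
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError('correlation is undefined when a list has no variation')
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical example: hours of revision against test score
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
print('r =', round(pearson_correlation(hours, scores), 3))
```

To use it on your own data, pass two numeric columns taken from `cleaned_data`. A strong r suggests a pattern worth testing against your hypothesis, but it does not show that one variable causes the other, and a value near 0 only rules out a straight-line relationship.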

5 - Implement Algorithms

Code the algorithms to calculate frequency, mean, median, and mode, and apply them to your data.

Approach:

Write functions for each statistic (mean, median, mode, frequency).
Test on sample data first to ensure they work correctly.
Apply to your full dataset and store the results.

Activity: Implement these in Python. Adapt them to your data structure (e.g., a list of numbers).

from collections import Counter

# Function to calculate mean (the average); returns 0 for an empty list
def calculate_mean(data):
    return sum(data) / len(data) if data else 0

# Function to calculate median (the middle value once sorted)
def calculate_median(data):
    if not data:
        return 0  # No values, so no middle value
    sorted_data = sorted(data)
    n = len(sorted_data)
    if n % 2 == 0:
        # Even number of values: average the two middle ones
        return (sorted_data[n//2 - 1] + sorted_data[n//2]) / 2
    else:
        return sorted_data[n//2]

# Function to calculate mode; returns a list because several values can tie
def calculate_mode(data):
    if not data:
        return []
    count = Counter(data)
    max_count = max(count.values())
    return [k for k, v in count.items() if v == max_count]

# Function to calculate frequency (how many times each value appears)
def calculate_frequency(data):
    return dict(Counter(data))

# Example usage
sample_data = [1, 2, 2, 3, 4, 4, 4]
print('Mean:', calculate_mean(sample_data))
print('Median:', calculate_median(sample_data))  # 3
print('Mode:', calculate_mode(sample_data))  # [4]
print('Frequency:', calculate_frequency(sample_data))  # {1: 1, 2: 2, 3: 1, 4: 3}

# Apply to your data (replace index 1 with the position of your numeric column)
# "!= ''" skips blank cells but keeps zeros that are already stored as numbers
your_data = [float(row[1]) for row in cleaned_data if row[1] != '']
print('Your Data Mean:', calculate_mean(your_data))
print('Your Data Median:', calculate_median(your_data))
print('Your Data Mode:', calculate_mode(your_data))
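To test your functions on sample data before trusting them on the full dataset, one option is to compare them with Python's built-in `statistics` module, which provides `mean`, `median` and `multimode` (`multimode` needs Python 3.8 or later). The sketch below computes reference answers for the same sample data; the name `reference` is illustrative.

```python
import statistics
from collections import Counter

# Reference values from Python's standard library for the same sample data
sample_data = [1, 2, 2, 3, 4, 4, 4]
reference = {
    'mean': statistics.mean(sample_data),
    'median': statistics.median(sample_data),
    'mode': statistics.multimode(sample_data),  # list of the most common values
    'frequency': dict(Counter(sample_data)),
}
for name, value in reference.items():
    print(f'{name}: {value}')
```

With your own functions defined in the same file, checks such as `assert calculate_median(sample_data) == reference['median']` and `assert math.isclose(calculate_mean(sample_data), reference['mean'])` (after `import math`) stop the program with an AssertionError if a result disagrees. `math.isclose` avoids false failures caused by tiny floating-point differences.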

Copyright Notice
This lesson is copyright of DigitalSkills.org 2017 - 2025. Unauthorised use, copying or distribution is not allowed.
