Computer Science
Beginner
240 mins
Teacher/Student led
What you need:
Chromebook/Laptop/PC

Data Processing and Algorithm Implementation

In this lesson, you'll learn to process data from your previous work and implement algorithms to analyse it. Follow guided steps to review, clean, and explore your data, code algorithms, evaluate results, and document findings for initial insights.

    1 - Introduction

    In this lesson, you'll process the data you collected in the previous lesson and implement algorithms to analyse it. This builds on your hypothesis and prepares you for building the full analytics artefact.

    We'll guide you through reviewing your data, cleaning it, exploring patterns, selecting and coding algorithms, evaluating results, and documenting your work. By the end, you'll have initial insights from your data analysis.



    This lesson uses Python, as covered in earlier modules. Adapt the examples to your own dataset. The lesson should take about 180-240 minutes, spread over a few sessions.

    2 - Review Collected Data

    Begin by examining the dataset from your previous lesson. Check for completeness, structure, and relevance to your hypothesis.

    Key Checks:

    • Is the data complete? Look for missing entries.
    • Does it have a clear structure (e.g., columns for variables like age, score)?
    • Is it relevant? Does it help test your hypothesis?
    • Identify issues like inconsistencies (e.g., different date formats).
    Activity: Load your data in Python (assume it's in a list or CSV). Print a summary. Note any issues in a text file for later.
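    import csv
    
    # Example: load every row of a CSV file into a list of lists.
    # csv.reader returns each cell as a string, and the first row may be a header.
    data = []
    with open('your_data.csv', 'r', newline='') as file:
        reader = csv.reader(file)
        for row in reader:
            data.append(row)
    
    print(len(data), 'rows loaded')
    print(data[:5])  # Print first 5 rows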
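
    Some of the key checks above can be automated. The following is a minimal sketch, assuming the first row of your data is a header row. The function name `summarise_rows` and the small sample table are made up for illustration; swap in the rows you loaded from your own file.

```python
import csv
import io


def summarise_rows(rows):
    """Report basic completeness facts for a table whose first row is a header."""
    header, body = rows[0], rows[1:]
    missing = {name: 0 for name in header}  # empty cells per column
    ragged_rows = 0                         # rows with the wrong number of cells
    for row in body:
        if len(row) != len(header):
            ragged_rows += 1
        for name, cell in zip(header, row):
            if cell.strip() == '':
                missing[name] += 1
    return {
        'rows': len(body),
        'columns': header,
        'missing': missing,
        'ragged_rows': ragged_rows,
    }


# Hypothetical sample standing in for the contents of your CSV file
sample_csv = "name,age,score\nAva,15,72\nBen,,65\nCal,16,\n"
rows = list(csv.reader(io.StringIO(sample_csv)))

summary = summarise_rows(rows)
print('Rows:', summary['rows'])
print('Columns:', summary['columns'])
print('Missing per column:', summary['missing'])
print('Rows with wrong length:', summary['ragged_rows'])
```

    Copy anything surprising from this summary (for example, a column with many missing cells) into your notes file so you can deal with it when cleaning.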

    3 - Clean the Data

    Fix the issues you found in the review so that the data is ready for analysis. This step covers structuring and transforming raw data to prepare it for analysis (outcome 3.5).

    Steps:

    1. Remove or fill missing values (e.g., replace with averages or remove rows with too many gaps).
    2. Normalise formats (e.g., convert all dates to YYYY-MM-DD or standardise text cases).
    3. Transform variables (e.g., convert strings to numbers, or scale values for consistency).
    4. Check for duplicates and remove them if necessary.
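
    The steps above can be sketched in Python as follows. This is one possible approach, not the only one, and it rests on several assumptions: each row is shaped `[date, score]` with both cells stored as text, dates arrive in one of two formats, and missing scores are filled with the average of the known scores. The names `normalise_date`, `clean_rows` and `DATE_FORMATS` are hypothetical. The sketch removes duplicates before filling gaps so that repeated rows do not skew the average.

```python
from datetime import datetime

# Assumed input formats; add the formats that appear in your own data
DATE_FORMATS = ('%Y-%m-%d', '%d/%m/%Y')


def normalise_date(text):
    """Return the date as YYYY-MM-DD, or None if no assumed format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).strftime('%Y-%m-%d')
        except ValueError:
            continue
    return None


def clean_rows(rows):
    """Clean rows shaped [date_text, score_text]; return [date, float] rows."""
    # Normalise dates and convert scores to numbers (None marks a gap).
    # Note: float() raises ValueError for text that is not a number.
    converted = []
    for date_text, score_text in rows:
        score_text = score_text.strip()
        score = float(score_text) if score_text else None
        converted.append([normalise_date(date_text), score])

    # Remove exact duplicates while keeping the original order
    seen = set()
    unique = []
    for row in converted:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # Fill missing scores with the average of the known scores
    known = [score for _, score in unique if score is not None]
    average = sum(known) / len(known) if known else 0.0
    return [[date, score if score is not None else average]
            for date, score in unique]


# Hypothetical raw rows: two date formats, one duplicate, one missing score
raw = [['2024-01-05', '10'], ['05/01/2024', '10'],
       ['2024-01-06', ''], ['2024-01-07', '20']]
cleaned_data = clean_rows(raw)
print(cleaned_data)
```

    Filling gaps with the average is only one choice. If a row has too many gaps, removing it may be the better option for your hypothesis.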

    4 - Perform Exploratory Data Analysis

    Use statistics and visualisations to understand your data's patterns, trends, and relationships.

    Approach:

    • Calculate basics like mean, min, max.
    • Create simple plots (use print for text-based charts, or matplotlib if available).
    • Look for correlations.
    Exercise: Compute stats on a numeric column. Interpret what this shows about your hypothesis.
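    # Assume cleaned_data has numeric values in index 1
    # CSV cells are read as strings, so convert them to numbers first
    values = [float(row[1]) for row in cleaned_data]
    mean = sum(values) / len(values)
    print('Mean:', mean)
    print('Min:', min(values), 'Max:', max(values))
    
    # Simple text visualisation: one star per value in each bin of width 5
    # (assumes the values are not negative)
    print('Histogram:')
    for i in range(0, int(max(values)) + 1, 5):
        count = sum(1 for v in values if i <= v < i + 5)
        print(f'{i}-{i+4}: ' + '*' * count)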
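
    To look for correlations between two numeric columns, one common measure is the Pearson correlation coefficient, which ranges from -1 (one value falls as the other rises) to 1 (both rise together). The sketch below is a plain-Python version; the function name `pearson_correlation` and the sample lists `hours` and `scores` are invented for illustration.

```python
import math


def pearson_correlation(xs, ys):
    """Return Pearson's r for two equal-length lists, or None if a list is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError('need two lists of the same length with at least 2 values')
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        return None  # a column with no variation has no defined correlation
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical sample: hours studied and test scores
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson_correlation(hours, scores)
print(f'Correlation: {r:.2f}')
```

    A value close to 1 or -1 suggests a strong linear relationship, but it does not show that one variable causes the other. Keep that in mind when you interpret the result against your hypothesis.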

    5 - Implement Algorithms

    Code the algorithms to calculate frequency, mean, median, and mode, and apply them to your data. 

    Approach:

    • Write functions for each statistic (mean, median, mode, frequency).
    • Test on sample data first to ensure they work correctly.
    • Apply to your full dataset and store the results.
    Activity: Implement these in Python. Adapt to your data structure (e.g., a list of numbers).
    from collections import Counter
    
    # Function to calculate mean (returns None if there is no data)
    def calculate_mean(data):
        return sum(data) / len(data) if data else None
    
    # Function to calculate median (returns None if there is no data)
    def calculate_median(data):
        if not data:
            return None
        sorted_data = sorted(data)
        n = len(sorted_data)
        if n % 2 == 0:
            # Even count: average the two middle values
            return (sorted_data[n//2 - 1] + sorted_data[n//2]) / 2
        else:
            return sorted_data[n//2]
    
    # Function to calculate mode (returns a list, because several values can tie)
    def calculate_mode(data):
        if not data:
            return []
        count = Counter(data)
        max_count = max(count.values())
        return [k for k, v in count.items() if v == max_count]
    
    # Function to calculate frequency (how many times each value appears)
    def calculate_frequency(data):
        return dict(Counter(data))
    
    # Example usage
    sample_data = [1, 2, 2, 3, 4, 4, 4]
    print('Mean:', calculate_mean(sample_data))
    print('Median:', calculate_median(sample_data))
    print('Mode:', calculate_mode(sample_data))
    print('Frequency:', calculate_frequency(sample_data))
    
    # Apply to your data (replace with your numeric column)
    # Skip empty cells but keep genuine zeros; assumes numeric values in index 1
    your_data = [float(row[1]) for row in cleaned_data if row[1] not in ('', None)]
    print('Your Data Mean:', calculate_mean(your_data))
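
    One way to test hand-written statistics functions is to compare their output with Python's built-in `statistics` module, and one way to store results is as JSON text. The sketch below shows both ideas under some assumptions: the function name `reference_summary` is hypothetical, the sample list is the same small example used above, and `statistics.multimode` needs Python 3.8 or later.

```python
import json
import statistics
from collections import Counter


def reference_summary(values):
    """Compute the four statistics with standard-library tools for comparison."""
    return {
        'mean': statistics.mean(values),
        'median': statistics.median(values),
        'mode': statistics.multimode(values),  # list of all tied modes
        'frequency': dict(Counter(values)),
    }


sample_data = [1, 2, 2, 3, 4, 4, 4]
reference = reference_summary(sample_data)
print(reference)

# Convert the results to JSON text so they can be saved and reused later.
# JSON object keys are always strings, so the frequency keys become '1', '2', ...
results_text = json.dumps(reference, indent=2)
print(results_text)
```

    If your own functions return different values from the reference on the same list, check your code before applying it to the full dataset. To keep the results, you could write `results_text` to a file of your choice with `open(..., 'w')`.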

    Copyright Notice
    This lesson is copyright of DigitalSkills.org 2017 - 2025. Unauthorised use, copying or distribution is not allowed.