Week 12 Tutorial: File I/O

Problem 1: The Expense Calculator

Problem Statement: You are helping a small business owner calculate their daily expenses. They have a text file called expenses.txt where every line contains a category and a dollar amount separated by a comma (e.g., Lunch,12.50).

Write a program that:

Opens and reads expenses.txt.
Extracts the cost from each line.
Calculates the total sum of all expenses.
Counts how many individual transactions (lines) there were.
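Prints a summary to the console showing the transaction count, the total spent, and the average expense (total divided by transaction count).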

Note: You do not need to write to a new file; just print the results.

Input Data Setup (Run this first to create the file):

data = """Lunch,12.50
Coffee,5.00
Office Supplies,23.75
Taxi,10.00
Coffee,8.25
Dinner,50.00"""

with open("expenses.txt", "w") as f:
    f.write(data)

Expected Output:

--- Expense Report ---
Total Transactions: 6
Total Spent: $109.50
Average Expense: $18.25
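
The steps for this problem can be sketched as follows. This is one possible approach rather than a model answer: the variable names (total, count, average, report) are illustrative choices, and the sketch assumes every non-blank line ends with a comma followed by a valid number. The first statement recreates the sample file so the snippet runs on its own.

```python
# Recreate the sample input so this sketch is self-contained.
with open("expenses.txt", "w") as f:
    f.write("Lunch,12.50\nCoffee,5.00\nOffice Supplies,23.75\n"
            "Taxi,10.00\nCoffee,8.25\nDinner,50.00")

total = 0.0
count = 0
with open("expenses.txt") as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Split on the last comma: category on the left, amount on the right.
        category, amount = line.rsplit(",", 1)
        total += float(amount)
        count += 1

# Guard against an empty file to avoid dividing by zero.
average = total / count if count else 0.0

report = (
    "--- Expense Report ---\n"
    f"Total Transactions: {count}\n"
    f"Total Spent: ${total:.2f}\n"
    f"Average Expense: ${average:.2f}"
)
print(report)
```

The `:.2f` format specifier is what produces two decimal places in the dollar amounts.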

Problem 2: The Log Filter

Problem Statement: You have a messy server log file called server_log.txt. It contains general info messages, warnings, and critical errors mixed together. Your job is to extract only the lines containing the word “ERROR” and save them into a separate file called urgent_alerts.txt.

Write a program that:

Reads server_log.txt.
Checks whether each line contains the substring "ERROR" (case-sensitive).
Writes each matching line to a new file called urgent_alerts.txt.
At the end, print how many errors were found.

Input Data Setup (Run this first to create the file):

log_data = """[INFO] System started successfully
[WARNING] Memory usage high
[ERROR] Database connection failed
[INFO] User logged in
[ERROR] Payment gateway timeout
[INFO] Scheduled backup complete
[ERROR] Disk space critical"""

with open("server_log.txt", "w") as f:
    f.write(log_data)

Expected Output (Console):

Scan complete. Found 3 errors.
Please check urgent_alerts.txt.

Expected Output (File: urgent_alerts.txt):

[ERROR] Database connection failed
[ERROR] Payment gateway timeout
[ERROR] Disk space critical
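
A minimal sketch of the filtering steps is shown here, assuming both files fit the simple line-by-line pattern described in the problem. The names src, dst, and error_count are illustrative. The setup lines recreate the sample log so the snippet runs on its own.

```python
# Recreate the sample log so this sketch is self-contained.
log_data = """[INFO] System started successfully
[WARNING] Memory usage high
[ERROR] Database connection failed
[INFO] User logged in
[ERROR] Payment gateway timeout
[INFO] Scheduled backup complete
[ERROR] Disk space critical"""

with open("server_log.txt", "w") as f:
    f.write(log_data)

error_count = 0
# Open the input and output files together; both close automatically.
with open("server_log.txt") as src, open("urgent_alerts.txt", "w") as dst:
    for line in src:
        if "ERROR" in line:  # case-sensitive substring test
            # The last line of the log has no newline, so normalise the ending.
            dst.write(line.rstrip("\n") + "\n")
            error_count += 1

print(f"Scan complete. Found {error_count} errors.")
print("Please check urgent_alerts.txt.")
```

Reading and writing in a single pass means the whole log never has to be held in memory.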

Problem 3: The Inventory Restocker

Problem Statement: You manage a warehouse. You have a file called inventory.csv where each line represents a product in the format: Product Name,Current Stock,Minimum Required.

Write a program that:

Reads the inventory.csv file.
Identifies which items are below their minimum required level.
Creates a new file called reorder_list.txt.
Writes the name of each item that needs to be reordered, along with the quantity required to reach the minimum level (Minimum Required minus Current Stock).

Input Data Setup (Run this first):

# Format: Product, Stock, Minimum
data = """Apples,50,100
Bananas,120,100
Cherries,5,20
Dates,50,50
Eggs,10,24"""

with open("inventory.csv", "w") as f:
    f.write(data)

Expected Logic Example:

Apples: Have 50, need 100. (50 < 100). Order 50.
Bananas: Have 120, need 100. (120 >= 100). Do nothing.

Expected Output (File: reorder_list.txt):

Item: Apples | Order Amount: 50
Item: Cherries | Order Amount: 15
Item: Eggs | Order Amount: 14
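
The restocking logic can be sketched as below. This is one possible approach, assuming every non-blank line has exactly three comma-separated fields and that the two numeric fields are whole numbers. The name shortfall is an illustrative choice.

```python
# Recreate the sample inventory so this sketch is self-contained.
data = """Apples,50,100
Bananas,120,100
Cherries,5,20
Dates,50,50
Eggs,10,24"""

with open("inventory.csv", "w") as f:
    f.write(data)

with open("inventory.csv") as src, open("reorder_list.txt", "w") as dst:
    for line in src:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        name, stock, minimum = line.split(",")
        # A positive shortfall means stock is below the minimum.
        shortfall = int(minimum) - int(stock)
        if shortfall > 0:
            dst.write(f"Item: {name} | Order Amount: {shortfall}\n")
```

Dates is skipped because its stock equals the minimum, so the shortfall is zero.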

Problem 4: The Formatting Fixer

Problem Statement: You have received a file raw_users.txt containing user signup data. However, the users typed their names in messy ways (weird capitalization, extra spaces) and provided their birth year instead of their age.

Data Format: Full Name - BirthYear

Write a program that:

Reads the file.
Strips the extra whitespace and converts the name to Title Case (e.g., “jOhN dOE” becomes “John Doe”).
Calculates their approximate age (assume the current year is 2025).
Writes a clean file clean_profiles.txt in the format: Name: [Name] (Age: [Age]).

Input Data Setup (Run this first):

# Note the messy spacing and casing
data = """  john smith - 1990
SARAH CONNOR - 1984
  kylo REN - 1995
LARA croft - 1992"""

with open("raw_users.txt", "w") as f:
    f.write(data)

Expected Output (File: clean_profiles.txt):

Name: John Smith (Age: 35)
Name: Sarah Connor (Age: 41)
Name: Kylo Ren (Age: 30)
Name: Lara Croft (Age: 33)
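
One way to clean the records is sketched below, assuming each non-blank line contains a name, a hyphen, and a four-digit year. CURRENT_YEAR and the other variable names are illustrative. Splitting on the last hyphen is a design choice so that a hyphenated name would stay intact.

```python
CURRENT_YEAR = 2025  # fixed by the problem statement

# Recreate the sample input so this sketch is self-contained.
data = """  john smith - 1990
SARAH CONNOR - 1984
  kylo REN - 1995
LARA croft - 1992"""

with open("raw_users.txt", "w") as f:
    f.write(data)

with open("raw_users.txt") as src, open("clean_profiles.txt", "w") as dst:
    for line in src:
        if not line.strip():
            continue  # ignore blank lines
        # Split once from the right: name on the left, birth year on the right.
        raw_name, raw_year = line.rsplit("-", 1)
        # split() + join collapses repeated spaces; title() fixes the casing.
        name = " ".join(raw_name.split()).title()
        age = CURRENT_YEAR - int(raw_year)  # int() tolerates surrounding spaces
        dst.write(f"Name: {name} (Age: {age})\n")
```

Note that str.title() also capitalizes the letter after an apostrophe, so names containing apostrophes may need extra handling.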

Problem 5: The Election Auditor

Problem Statement: You are auditing an election. You have a file called votes.txt. Each line represents a ballot in the format: VoterID:CandidateName.

However, the machine that generated the file was glitchy:

Some lines are incomplete (missing a name or ID).
Some lines have extra whitespace.

You need to count the valid votes for each candidate.

Write a program that:

Reads votes.txt.
Skips invalid lines (where the format isn’t ID:Name, or where the ID or the name is empty after trimming whitespace).
Counts the votes for each candidate using a dictionary.
Calculates the percentage of the total vote each candidate received.
Writes a results.txt file that lists the candidates, their vote counts, their percentages, and declares a winner.

Input Data Setup:

data = """1001:Alice
1002:Bob
1003:Alice
ERROR_READING_LINE
1004: Charlie
1005:Alice
1006:Bob
1007:   
1008:David"""

with open("votes.txt", "w") as f:
    f.write(data)

Expected Output (File: results.txt):

OFFICIAL ELECTION RESULTS
-------------------------
Alice: 3 votes (42.9%)
Bob: 2 votes (28.6%)
Charlie: 1 votes (14.3%)
David: 1 votes (14.3%)

-------------------------
Total Valid Votes: 7
WINNER: Alice
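
A minimal sketch of the audit is given here. It rests on a few assumptions that the problem leaves open: a line is valid only if it splits into exactly two non-empty parts around a single colon, candidates are listed by vote count (highest first) and then alphabetically, and a tie for first place is resolved in favor of whichever tied candidate was seen first. The variable names are illustrative.

```python
# Recreate the sample ballots so this sketch is self-contained.
data = ("1001:Alice\n1002:Bob\n1003:Alice\nERROR_READING_LINE\n"
        "1004: Charlie\n1005:Alice\n1006:Bob\n1007:   \n1008:David")

with open("votes.txt", "w") as f:
    f.write(data)

counts = {}
with open("votes.txt") as f:
    for line in f:
        parts = line.strip().split(":")
        if len(parts) != 2:
            continue  # no colon, or more than one
        voter_id, candidate = parts[0].strip(), parts[1].strip()
        if not voter_id or not candidate:
            continue  # missing ID or missing name
        counts[candidate] = counts.get(candidate, 0) + 1

total = sum(counts.values())
winner = max(counts, key=counts.get)  # first-seen candidate wins a tie

with open("results.txt", "w") as out:
    out.write("OFFICIAL ELECTION RESULTS\n")
    out.write("-" * 25 + "\n")
    # Sort by votes (descending), then by name for a stable listing.
    for candidate, votes in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        out.write(f"{candidate}: {votes} votes ({votes / total * 100:.1f}%)\n")
    out.write("\n" + "-" * 25 + "\n")
    out.write(f"Total Valid Votes: {total}\n")
    out.write(f"WINNER: {winner}\n")
```

The pattern counts.get(candidate, 0) + 1 avoids a separate check for whether the candidate is already in the dictionary.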

Problem 6: The Cross-Referencing Text Analyzer

Problem Statement: You are building a tool to analyze the keyword density of a text file, but you need to ignore common “stop words” (like “the”, “is”, “at”) so they don’t clutter the results.

You have two files:

stopwords.txt: A list of words to ignore (one per line).
story.txt: A paragraph of text.

Write a program that:

Loads the stopwords into a list.
Reads the story.txt.
Processes the story word-by-word. You must:
- Convert to lowercase.
- Remove punctuation (periods, commas, question marks).
- Ignore the word if it is in your stopword list.
Counts the frequency of the remaining “interesting” words.
Writes the valid words and their counts to analysis.txt.

Input Data Setup:

# File 1: Words to ignore
stops = """the
is
at
on
a
and"""

with open("stopwords.txt", "w") as f:
    f.write(stops)

# File 2: The text to analyze
story = """The cat sat on the mat. 
The cat is a good cat. 
Is the dog on the mat? No, the dog is at the park."""

with open("story.txt", "w") as f:
    f.write(story)

Expected Output (File: analysis.txt):

WORD FREQUENCY REPORT
---------------------
cat: 3
sat: 1
mat: 2
good: 1
dog: 2
no: 1
park: 1
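
The word-processing steps can be sketched as follows. This is one possible approach, with illustrative variable names. It strips any leading or trailing punctuation using string.punctuation, which covers the periods, commas, and the question mark in the sample story. Words are reported in the order they first appear, which relies on dictionaries preserving insertion order (guaranteed since Python 3.7).

```python
import string

# Recreate both sample files so this sketch is self-contained.
with open("stopwords.txt", "w") as f:
    f.write("the\nis\nat\non\na\nand")

with open("story.txt", "w") as f:
    f.write("The cat sat on the mat. \n"
            "The cat is a good cat. \n"
            "Is the dog on the mat? No, the dog is at the park.")

with open("stopwords.txt") as f:
    stopwords = [w.strip().lower() for w in f if w.strip()]

counts = {}
with open("story.txt") as f:
    for line in f:
        for raw in line.split():
            # Lowercase, then trim punctuation from both ends of the word.
            word = raw.lower().strip(string.punctuation)
            if not word or word in stopwords:
                continue
            counts[word] = counts.get(word, 0) + 1

with open("analysis.txt", "w") as out:
    out.write("WORD FREQUENCY REPORT\n")
    out.write("-" * 21 + "\n")
    for word, n in counts.items():
        out.write(f"{word}: {n}\n")
```

For a long stopword file, converting the list to a set would make each membership test faster, at the cost of departing from the list the problem asks for.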

Problem 7: The Multi-Store Sales Consolidator

Problem Statement: You are the regional manager for a chain of three stores. Each store manager sends you a daily sales report as a separate CSV file. Your job is to consolidate all three files into a single master report.

Each store file has the format: Product,UnitsSold,PricePerUnit

Write a program that:

Reads all three store files (store_a.csv, store_b.csv, store_c.csv).
Consolidates the data by calculating:
- Total Units Sold for each product across all stores.
- Total Revenue for each product across all stores (Units × Price).
Identifies which store sold the most units overall.
Writes a consolidated report to regional_report.txt.

Hint: You’ll need a dictionary where keys are product names and values are another dictionary (or a list) storing totals.

Input Data Setup (Run this first to create the files):

store_a = """Laptop,5,999.99
Mouse,20,25.00
Keyboard,15,75.00
Monitor,8,300.00"""

store_b = """Laptop,3,999.99
Mouse,35,25.00
Headphones,12,150.00
Keyboard,10,75.00"""

store_c = """Mouse,25,25.00
Monitor,5,300.00
Headphones,8,150.00
Laptop,7,999.99"""

with open("store_a.csv", "w") as f:
    f.write(store_a)

with open("store_b.csv", "w") as f:
    f.write(store_b)

with open("store_c.csv", "w") as f:
    f.write(store_c)

Expected Output (File: regional_report.txt):

============================================
       REGIONAL SALES CONSOLIDATION
============================================

Product          Units Sold    Total Revenue
--------------------------------------------
Laptop           15            $14,999.85
Mouse            80            $2,000.00
Keyboard         25            $1,875.00
Monitor          13            $3,900.00
Headphones       20            $3,000.00

--------------------------------------------
GRAND TOTAL REVENUE: $25,774.85

TOP SELLING STORE: Store B (60 units sold)
============================================

Hints:

Create a function to process a single store file and return its data.
Use a nested dictionary like: {"Laptop": {"units": 0, "revenue": 0.0}, ...}
Track each store’s total units separately to find the top seller.
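
The hints above can be sketched as below. This is one possible approach, not a model answer: read_store is a hypothetical helper name, the store labels ("Store A" and so on) are taken from the expected report, and the column widths are chosen to approximate the sample layout. The sketch assumes every non-blank line has exactly three fields.

```python
# Recreate the three sample files so this sketch is self-contained.
stores = {
    "Store A": ("store_a.csv", "Laptop,5,999.99\nMouse,20,25.00\nKeyboard,15,75.00\nMonitor,8,300.00"),
    "Store B": ("store_b.csv", "Laptop,3,999.99\nMouse,35,25.00\nHeadphones,12,150.00\nKeyboard,10,75.00"),
    "Store C": ("store_c.csv", "Mouse,25,25.00\nMonitor,5,300.00\nHeadphones,8,150.00\nLaptop,7,999.99"),
}
for path, text in stores.values():
    with open(path, "w") as f:
        f.write(text)


def read_store(path):
    """Return a list of (product, units, price) tuples from one store file."""
    rows = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # ignore blank lines
            product, units, price = line.split(",")
            rows.append((product, int(units), float(price)))
    return rows


totals = {}       # product -> {"units": int, "revenue": float}
store_units = {}  # store label -> total units sold by that store
for label, (path, _) in stores.items():
    rows = read_store(path)
    store_units[label] = sum(units for _, units, _ in rows)
    for product, units, price in rows:
        entry = totals.setdefault(product, {"units": 0, "revenue": 0.0})
        entry["units"] += units
        entry["revenue"] += units * price

grand_total = sum(entry["revenue"] for entry in totals.values())
top_store = max(store_units, key=store_units.get)

with open("regional_report.txt", "w") as out:
    out.write("=" * 44 + "\n")
    out.write("       REGIONAL SALES CONSOLIDATION\n")
    out.write("=" * 44 + "\n\n")
    out.write(f"{'Product':<17}{'Units Sold':<14}Total Revenue\n")
    out.write("-" * 44 + "\n")
    for product, entry in totals.items():
        out.write(f"{product:<17}{entry['units']:<14}${entry['revenue']:,.2f}\n")
    out.write("\n" + "-" * 44 + "\n")
    out.write(f"GRAND TOTAL REVENUE: ${grand_total:,.2f}\n\n")
    out.write(f"TOP SELLING STORE: {top_store} ({store_units[top_store]} units sold)\n")
    out.write("=" * 44 + "\n")
```

The `,` in the `:,.2f` format specifier inserts the thousands separator seen in the expected report.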

Problem 8: The Student Grade Processor with Validation

Problem Statement: You are building a grade processing system for a school. The raw input file grades_raw.txt contains student records, but the data is messy and contains various errors that must be handled gracefully.

Data Format: StudentID,Name,Assignment1,Assignment2,Assignment3,Exam

The data has these potential issues:

Some lines have missing fields (fewer than 6 columns).
Some scores are not valid numbers (typos such as “eighty”, or empty values).
Some scores are out of the valid range (scores must be 0-100).
Some lines are completely empty.

Write a program that:

Reads grades_raw.txt and validates each line.
For valid students:
- Calculates their average score (all 4 assessments weighted equally).
- Assigns a letter grade based on the average (A: 90 or above, B: 80 to below 90, C: 70 to below 80, D: 60 to below 70, F: below 60).
For invalid lines:
- Logs the line number, the original data, and a description of the error.
Writes two output files:
- final_grades.txt: Clean list of students with their averages and letter grades.
- processing_errors.txt: Log of all errors encountered.
Prints a summary to the console showing how many records were processed successfully vs. how many had errors.

Input Data Setup (Run this first to create the file):

data = """S001,Alice Smith,85,90,88,92
S002,Bob Jones,78,82,eighty,75
S003,Charlie Brown,95,91,89,94
S004,Diana Prince,70,65
S005,Eve Wilson,88,105,90,85
S006,Frank Miller,60,58,62,55

S007,Grace Lee,72,78,75,80
S008,Henry Ford,,85,80,78
S009,Ivy Chen,90,88,92,95
S010,Jack Black,45,50,48,52"""

with open("grades_raw.txt", "w") as f:
    f.write(data)

Expected Output (File: final_grades.txt):

FINAL GRADE REPORT
==========================================
ID      Name              Average   Grade
------------------------------------------
S001    Alice Smith       88.8      B
S003    Charlie Brown     92.2      A
S006    Frank Miller      58.8      F
S007    Grace Lee         76.2      C
S009    Ivy Chen          91.2      A
S010    Jack Black        48.8      F
==========================================
Total Students Processed: 6
Class Average: 76.0

Expected Output (File: processing_errors.txt):

PROCESSING ERROR LOG
==========================================
Line 2: S002,Bob Jones,78,82,eighty,75
  -> Error: Invalid score format (non-numeric value)

Line 4: S004,Diana Prince,70,65
  -> Error: Missing fields (expected 6, found 4)

Line 5: S005,Eve Wilson,88,105,90,85
  -> Error: Score out of range (must be 0-100)

Line 7:
  -> Error: Empty line detected

Line 9: S008,Henry Ford,,85,80,78
  -> Error: Invalid score format (empty value)

==========================================
Total Errors: 5

Expected Console Output:

Processing complete!
- Successfully processed: 6 students
- Errors encountered: 5 records
Check 'final_grades.txt' and 'processing_errors.txt' for details.

Hints:

Use try/except to catch conversion errors when parsing scores.
Check len(parts) after splitting to detect missing fields.
Validate each score is between 0 and 100 after successful conversion.
Keep track of line numbers using enumerate() or a counter variable.
Process the file once, writing to both output files as you go.
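
The hints above can be combined into a sketch like the one below. It is one possible approach under stated assumptions: parse_scores and letter_grade are hypothetical helper names, any fields beyond the sixth are ignored, and the column widths approximate the sample report. Validation problems are raised as ValueError so that a single try/except handles every error type.

```python
# Recreate the sample input so this sketch is self-contained.
data = """S001,Alice Smith,85,90,88,92
S002,Bob Jones,78,82,eighty,75
S003,Charlie Brown,95,91,89,94
S004,Diana Prince,70,65
S005,Eve Wilson,88,105,90,85
S006,Frank Miller,60,58,62,55

S007,Grace Lee,72,78,75,80
S008,Henry Ford,,85,80,78
S009,Ivy Chen,90,88,92,95
S010,Jack Black,45,50,48,52"""

with open("grades_raw.txt", "w") as f:
    f.write(data)


def letter_grade(average):
    """Map a numeric average to a letter using the tutorial's cut-offs."""
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if average >= cutoff:
            return letter
    return "F"


def parse_scores(line):
    """Return (student_id, name, scores) or raise ValueError with a reason."""
    if not line.strip():
        raise ValueError("Empty line detected")
    parts = [p.strip() for p in line.split(",")]
    if len(parts) < 6:
        raise ValueError(f"Missing fields (expected 6, found {len(parts)})")
    scores = []
    for raw in parts[2:6]:  # assumption: extra trailing fields are ignored
        if raw == "":
            raise ValueError("Invalid score format (empty value)")
        try:
            score = float(raw)
        except ValueError:
            raise ValueError("Invalid score format (non-numeric value)") from None
        if not 0 <= score <= 100:
            raise ValueError("Score out of range (must be 0-100)")
        scores.append(score)
    return parts[0], parts[1], scores


RULE = "=" * 42
averages = []     # one average per valid student
error_lines = []  # line numbers that failed validation

with open("grades_raw.txt") as src, \
     open("final_grades.txt", "w") as good, \
     open("processing_errors.txt", "w") as bad:
    good.write(f"FINAL GRADE REPORT\n{RULE}\n")
    good.write(f"{'ID':<8}{'Name':<18}{'Average':<10}Grade\n{'-' * 42}\n")
    bad.write(f"PROCESSING ERROR LOG\n{RULE}\n")
    for line_no, line in enumerate(src, start=1):
        line = line.rstrip("\n")
        try:
            student_id, name, scores = parse_scores(line)
        except ValueError as exc:
            error_lines.append(line_no)
            bad.write(f"Line {line_no}: {line}".rstrip() + f"\n  -> Error: {exc}\n\n")
            continue
        average = sum(scores) / len(scores)
        averages.append(average)
        good.write(f"{student_id:<8}{name:<18}{average:<10.1f}{letter_grade(average)}\n")
    class_average = sum(averages) / len(averages) if averages else 0.0
    good.write(f"{RULE}\nTotal Students Processed: {len(averages)}\n")
    good.write(f"Class Average: {class_average:.1f}\n")
    bad.write(f"{RULE}\nTotal Errors: {len(error_lines)}\n")

print("Processing complete!")
print(f"- Successfully processed: {len(averages)} students")
print(f"- Errors encountered: {len(error_lines)} records")
print("Check 'final_grades.txt' and 'processing_errors.txt' for details.")
```

The footers are written after the loop because the totals and the class average are only known once every line has been read.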