Week 6 Lecture: Advanced Lists, Tuples & Linear Search

1. Nested Lists (2D Lists)

So far, the lists you’ve worked with have been one-dimensional sequences, like [10, 20, 30, 40]. However, you often need to represent data that has a grid-like structure, such as a tic-tac-toe board, a spreadsheet, or a game map. For this, Python uses nested lists.

A nested list is simply a list that contains other lists as its elements. This allows you to create two-dimensional (or even higher-dimensional) data structures.

Why it matters: Nested lists are the standard way to represent matrices, tables, and grids in Python. This pattern is essential for everything from game development to data analysis.
Relation to Previous Concepts: This concept directly builds on your knowledge of lists (Week 5) and nested loops (Week 3). You will often use a for loop to iterate over the main list (the rows) and a second, nested for loop to iterate over the inner lists (the columns).

Syntax: You create a nested list by placing lists inside another list.

# A 3x3 grid
grid = [
    [1, 2, 3],  # Row 0
    [4, 5, 6],  # Row 1
    [7, 8, 9]   # Row 2
]

To access an element, you use two indices in the format list_name[row_index][column_index].

# Access the element '5' (Row 1, Column 1)
element = grid[1][1] # Result is 5

# Access the element '9' (Row 2, Column 2)
element = grid[2][2] # Result is 9

Common Mistake: Forgetting that indexing starts at 0 for both the row and the column. Accessing grid[3][3] would cause an IndexError, because the valid row and column indices for a 3x3 grid are 0, 1, and 2.
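
The row-then-column traversal described earlier can be sketched as follows. This is a minimal illustration using the same 3x3 grid; the names row, value, total, last_row, and last_col are chosen here just for the example.

```python
# The same 3x3 grid used in the syntax example
grid = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

total = 0
# Outer loop: one pass per row (each row is an ordinary list)
for row in grid:
    # Inner loop: one pass per value (column) in the current row
    for value in row:
        total += value

print(total)  # 45

# The largest valid index is always the length minus 1
last_row = len(grid) - 1
last_col = len(grid[0]) - 1
print(grid[last_row][last_col])  # 9
```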

Problem: You are a teaching assistant for a small class. You have the students’ quiz scores stored in a nested list where each inner list represents a student’s scores on three quizzes. Your task is to write a function calculate_student_averages that takes this 2D list and returns a new 1D list containing the average score for each student.

Input: A nested list of numbers.

student_scores = [
	[85, 92, 78],  # Student 0's scores
	[90, 88, 94],  # Student 1's scores
	[76, 80, 82]   # Student 2's scores
]

Expected Behavior: The function should process the input list and return a new list where each element is the average of the corresponding inner list.

Complete Code:

def calculate_student_averages(scores_grid):
	averages = []
	# The outer loop iterates through each student's list of scores.
	for student_score_list in scores_grid:
		# Calculate the sum and count for the current student.
		total = sum(student_score_list) # sum() is a helpful built-in function!
		count = len(student_score_list)
		
		# Avoid division by zero for an empty inner list.
		if count > 0:
			average = total / count
			averages.append(average)
	
	return averages

# --- Test Case ---
student_scores = [
	[85, 92, 78],
	[90, 88, 94],
	[76, 80, 82]
]

student_averages = calculate_student_averages(student_scores)
print(f"Student averages: {student_averages}")

Code Walkthrough & Key Concepts:

Iterating Through Rows: The first for loop (for student_score_list in scores_grid:) iterates through the “rows” of your 2D list. In the first iteration, the variable student_score_list becomes [85, 92, 78], in the second, it’s [90, 88, 94], and so on.
Processing Each Row: Once you have a single row (which is just a regular list), you can use familiar tools like the built-in sum() and len() functions to process it. This shows how you can break down a complex problem (processing a grid) into a series of simpler problems (processing a single list).
Building the Result: The solution uses the accumulator pattern. You start with an empty list averages and then use the .append() method inside the loop to build up your final result list, one student average at a time.
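Handling Edge Cases: It’s important to write robust code. What if a student had no scores (e.g., an empty inner list [])? The if count > 0: check prevents a ZeroDivisionError by skipping that student. This is an example of writing “defensive code.” Note the trade-off: a skipped student produces no entry in averages, so the result can be shorter than the input and its positions may no longer line up with the student numbers.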

Expected Output:

Student averages: [85.0, 90.66666666666667, 79.33333333333333]
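
The function above leaves out any student whose score list is empty. If you need position i in the result to always correspond to student i, one possible variation records a placeholder instead. This is only a sketch and not part of the original solution; the function name calculate_student_averages_aligned and the choice of None as the placeholder are assumptions made for this example.

```python
def calculate_student_averages_aligned(scores_grid):
    averages = []
    for student_score_list in scores_grid:
        if len(student_score_list) > 0:
            averages.append(sum(student_score_list) / len(student_score_list))
        else:
            # Assumption: None marks "no scores" and keeps positions aligned
            averages.append(None)
    return averages

print(calculate_student_averages_aligned([[80, 90], [], [70]]))
# [85.0, None, 70.0]
```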

2. Advanced List Methods

In Week 5, you learned methods like .append() and .pop() that modify a list’s contents. Python also provides several powerful methods for organizing and searching lists.

.sort(): Modifies a list by sorting its items in ascending order. This happens in-place, which means it changes the original list directly and does not return a new one.
.reverse(): Reverses the order of elements in a list, also in-place.
.count(item): Returns the number of times item appears in the list. It does not modify the list.
.index(item): Searches for item and returns the index of its first occurrence. If the item is not found, it raises a ValueError.

Syntax: These methods are called directly on a list object.

numbers = [4, 1, 7, 1, 5]

# .sort() -> Modifies the list in-place
numbers.sort()
# now numbers is [1, 1, 4, 5, 7]

# .reverse() -> Modifies the list in-place
numbers.reverse()
# now numbers is [7, 5, 4, 1, 1]

# .count(item) -> Returns an integer
num_ones = numbers.count(1) # num_ones is 2

# .index(item) -> Returns an integer index
first_one_index = numbers.index(1) # first_one_index is 3
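
Because .index() raises a ValueError when the item is missing, it can help to test membership with the in operator first. A minimal sketch, assuming you want None (a choice made for this example) when the item is absent:

```python
numbers = [7, 5, 4, 1, 1]

target = 9
if target in numbers:
    position = numbers.index(target)
else:
    position = None  # Assumed placeholder meaning "not found"

print(position)          # None
print(numbers.index(4))  # 2
```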

Important Note: In-Place Modification. A very common beginner mistake is to assign the result of .sort() or .reverse() to a variable, expecting a new list.

my_list = [3, 1, 2]
sorted_list = my_list.sort() # WRONG!
# my_list is now [1, 2, 3]
# sorted_list is None, because .sort() returns None.

Remember to call in-place methods on a line by themselves. They modify the list directly.
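
If you do want a sorted copy while leaving the original list untouched, Python's built-in sorted() function returns a new list. This goes slightly beyond the methods listed in this section, so treat it as an optional aside:

```python
my_list = [3, 1, 2]

# sorted() builds and returns a new list; my_list is left unchanged
sorted_list = sorted(my_list)

print(my_list)      # [3, 1, 2]
print(sorted_list)  # [1, 2, 3]
```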

Problem:

You’re analyzing participant data for a programming contest and have a list of scores. Write a function get_contest_stats that takes this list and returns two pieces of information:

The median score. The median is the middle value in a sorted list. If the list has an even number of elements, it’s the average of the two middle elements.
The number of participants who achieved the perfect score of 100.

Input: A list of integer scores.

scores = [88, 92, 100, 75, 92, 68, 95, 100, 81]

Expected Behavior: The function should return the median and the count of perfect scores. A good way to return multiple values is inside another data structure, like a list.

Complete Code:

def get_contest_stats(scores_list):
	num_scores = len(scores_list)
	if num_scores == 0:
		return [None, 0]

	# First, count perfect scores on the original list
	perfect_scores = scores_list.count(100)

	# To find the median, we need a sorted list.
	# It's good practice to sort a copy to not modify the original list.
	sorted_scores = scores_list[:] # Create a shallow copy using slicing
	sorted_scores.sort() # This sorts the copy in-place

	median = 0
	mid_index = num_scores // 2

	if num_scores % 2 == 1: # Odd number of scores
		median = sorted_scores[mid_index]
	else: # Even number of scores
		# Average of the two middle elements
		middle1 = sorted_scores[mid_index - 1]
		middle2 = sorted_scores[mid_index]
		median = (middle1 + middle2) / 2
	
	return [median, perfect_scores]

# --- Test Case ---
scores = [88, 92, 100, 75, 92, 68, 95, 100, 81]
stats = get_contest_stats(scores)
print(f"Median score: {stats[0]}")
print(f"Number of perfect scores: {stats[1]}")

Code Walkthrough & Key Concepts:

Using .count(): Notice how simple and expressive scores_list.count(100) is for counting the number of perfect scores. This is much cleaner than writing a manual loop.
Preserving Original Data: The line sorted_scores = scores_list[:] is very important. This syntax creates a copy of the list. If you had just written sorted_scores = scores_list, both variables would point to the same list in memory. Sorting sorted_scores would then also change the original scores_list, which might not be what you want. Creating a copy is a safe practice.
Median Calculation Logic: The if/else block demonstrates how to calculate a median. It uses the modulus operator (%) to check if the number of scores is odd or even, and integer division (//) to find the middle index. This logic is a common pattern for statistical calculations.
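
The difference between copying a list and giving it a second name can be seen in a small experiment. The variable names original, alias, and copied are illustrative only:

```python
original = [3, 1, 2]

alias = original      # A second name for the SAME list
copied = original[:]  # A new list holding the same elements

alias.sort()          # Sorts the shared list, so original changes too

print(original)  # [1, 2, 3]
print(copied)    # [3, 1, 2]
```

Sorting through alias changed original because both names refer to one list, while copied kept the order it had when the slice was taken.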

Expected Output:

Median score: 92
Number of perfect scores: 2

3. Introduction to Tuples

A tuple is a sequence of values, much like a list. The single most important difference is that tuples are immutable. Once a tuple is created, you cannot change its contents—you can’t add, remove, or reassign elements.

When should you use a tuple instead of a list?

Data Integrity: Use a tuple for data that should not change, like a coordinate point (x, y), RGB color values (255, 0, 0), or a person’s date of birth. A tuple guarantees that this data cannot be accidentally modified elsewhere in your program.
Performance: Because they are immutable, tuples can be slightly more memory-efficient and faster to process than lists in certain situations.
Dictionary Keys (Preview): In Week 10, you’ll learn about dictionaries. A dictionary key must be hashable, which for Python’s built-in types means it must be immutable. Therefore, tuples whose elements are themselves immutable can be used as dictionary keys, but lists cannot.

Analogy: A list is like a to-do list on a whiteboard—you can erase, add, and reorder items. A tuple is like a plaque engraved with a set of names—it’s permanent and unchangeable.

Syntax: You create tuples using parentheses () instead of square brackets [].

# A tuple of coordinates
point = (10, 20)

# A tuple of mixed data types
person_data = ("Alice", 30, "Engineer")

Accessing elements with indices and slicing works exactly the same as with lists:

x_coord = point[0] # x_coord is 10
job = person_data[2] # job is "Engineer"

However, you cannot modify a tuple:

point[0] = 15  # This will raise a TypeError!
person_data.append("USA") # This will raise an AttributeError (tuples have no 'append' method)

Special Case: The Single-Element Tuple. To create a tuple with only one item, you must include a trailing comma.

not_a_tuple = (50)    # This is just the integer 50
is_a_tuple = (50,)    # The comma makes it a tuple
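
You can confirm the difference with the built-in type() and len() functions. A short check:

```python
not_a_tuple = (50)   # Parentheses alone only group the expression
is_a_tuple = (50,)   # The trailing comma is what creates the tuple

print(type(not_a_tuple))  # <class 'int'>
print(type(is_a_tuple))   # <class 'tuple'>
print(len(is_a_tuple))    # 1
```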

Problem: A robot moves on a 2D plane. Its path is represented as a list of coordinate tuples, like [(x1, y1), (x2, y2), ...]. Write a function get_quadrant_counts that takes this path and counts how many points fall into each of the four Cartesian quadrants.

Quadrant I: x > 0, y > 0
Quadrant II: x < 0, y > 0
Quadrant III: x < 0, y < 0
Quadrant IV: x > 0, y < 0

(For this problem, you can ignore points that lie on an axis, where x or y is 0).

Input: A list of tuples, where each tuple contains two numbers (the x and y coordinates).

path = [(1, 3), (-2, 4), (-5, -3), (6, -2), (8, 8), (-1, 7)]

Expected Behavior: The function should return a list containing the counts for Quadrants I, II, III, and IV, in that order.

Complete Code:

def get_quadrant_counts(path_data):
	# Initialize counts for Quadrants I, II, III, IV
	counts = [0, 0, 0, 0]

	# "Unpack" each tuple directly in the for loop
	for x, y in path_data:
		if x > 0 and y > 0:
			counts[0] += 1  # Quadrant I
		elif x < 0 and y > 0:
			counts[1] += 1  # Quadrant II
		elif x < 0 and y < 0:
			counts[2] += 1  # Quadrant III
		elif x > 0 and y < 0:
			counts[3] += 1  # Quadrant IV
	
	return counts

# --- Test Case ---
robot_path = [(1, 3), (-2, 4), (-5, -3), (6, -2), (8, 8), (-1, 7)]
quadrant_counts = get_quadrant_counts(robot_path)
print(f"Quadrant Counts (I, II, III, IV): {quadrant_counts}")

Code Walkthrough & Key Concepts:

Why Tuples are a Good Fit: A coordinate point (x, y) represents a single piece of data whose components should not change independently. Using a tuple (1, 3) enforces this, making the code’s intent clearer and safer than using a mutable list [1, 3].
A Key Python Feature: Tuple Unpacking. The line for x, y in path_data: demonstrates a powerful and elegant feature called “unpacking.” Instead of accessing elements by index, you can assign them directly to variables in the for loop. This makes your code much cleaner and more readable. Compare it to the more verbose alternative:
```
# The less-Pythonic way
for point in path_data:
	x = point[0]
	y = point[1]
	# ... rest of the logic
```
Connecting to Conditionals: This problem is also a great way to practice using if/elif chains to categorize data based on multiple conditions.
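
Unpacking is not limited to for loops; it also works in an ordinary assignment. A minimal sketch reusing the example tuples from the syntax section (the names name and age are introduced here for illustration):

```python
point = (10, 20)

# Unpack both components in a single assignment
x, y = point
print(x)  # 10
print(y)  # 20

person_data = ("Alice", 30, "Engineer")
name, age, job = person_data
print(job)  # Engineer
```

The number of variables on the left must match the number of elements in the tuple; otherwise Python raises a ValueError.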

Expected Output:

Quadrant Counts (I, II, III, IV): [2, 2, 1, 1]

4. Linear Search Algorithm

Now we will shift from learning about data structures to learning about algorithms. An algorithm is a step-by-step procedure for solving a problem.

The first and most fundamental algorithm you will learn is Linear Search. It’s the most intuitive way to find an item in a list:

1. Start at the first element.
2. Compare it to the target value you’re looking for.
3. If it’s a match, you’re done! You’ve found the item.
4. If it’s not a match, move to the next element and repeat step 2.
5. If you reach the end of the list without finding a match, then the target is not in the list.

Introduction to Algorithm Efficiency: The performance of an algorithm is a key concept in computer science. For Linear Search:

Best Case: The item you are looking for is the very first one. You only need to do 1 comparison.
Worst Case: The item is the very last one, or it’s not in the list at all. You have to check every single element. If the list has n items, you have to do n comparisons.

This means the time it takes to run a linear search grows linearly with the size of the list. If you double the list size, the worst-case number of comparisons doubles as well.
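
To make the best and worst cases concrete, the sketch below counts how many comparisons a linear scan performs. The helper count_comparisons is hypothetical and exists only for this illustration:

```python
def count_comparisons(data_list, target):
    # Hypothetical helper: returns how many == checks a linear scan makes
    comparisons = 0
    for item in data_list:
        comparisons += 1
        if item == target:
            return comparisons  # Stop as soon as the target is found
    return comparisons          # Every element was checked

items = [45, 22, 14, 65, 87, 33, 71]
print(count_comparisons(items, 45))  # 1 (best case: first element)
print(count_comparisons(items, 71))  # 7 (worst case: last element)
print(count_comparisons(items, 50))  # 7 (worst case: not in the list)
```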

Problem: Write a function named linear_search that takes two arguments: data_list (a list of items) and target (the item to search for).

If the target is found in the data_list, the function should return the index of its first occurrence.
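If the target is not found, the function should return -1. Returning -1 is a common programming convention to signal “not found,” since no element’s position, counted from 0, is ever negative. Be careful, though: in Python, data_list[-1] is a valid expression that refers to the last element, so check the result for -1 before using it as an index.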

Input: A list and a target value.

Expected Behavior: For items = [45, 22, 14, 65, 87, 33, 71], linear_search(items, 87) should return 4, while linear_search(items, 50) should return -1.

Complete Code:

def linear_search(data_list, target):
	# We need the index, so we iterate over the indices of the list.
	for i in range(len(data_list)):
		# Check if the element at the current index matches the target.
		if data_list[i] == target:
			return i # Target found! Immediately return the index.
	
	# This line is only reached if the loop completes without finding the target.
	return -1

# --- Test Cases ---
inventory_ids = [101, 105, 115, 121, 133, 137]

# Case 1: Target is in the list
target1 = 115
index1 = linear_search(inventory_ids, target1)
print(f"Searching for {target1}... Found at index: {index1}")

# Case 2: Target is NOT in the list
target2 = 120
index2 = linear_search(inventory_ids, target2)
print(f"Searching for {target2}... Found at index: {index2}")

Code Walkthrough & Key Concepts:

Iterating by Index: Notice the loop for i in range(len(data_list)). We use this form instead of the simpler for item in data_list because the problem requires us to return the index of the target. This loop gives us access to the index i during each iteration.
The Early return: The return i statement is a key part of the algorithm’s logic. As soon as the if condition is true (a match is found), the function immediately stops and returns the current index. The rest of the list is not checked, which makes the search efficient if the target is found early.
Generality of the Algorithm: This linear_search function is generic. It will work for a list of numbers, strings, or any other data type, as long as the items can be compared for equality using the == operator.
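
As an optional aside, the same search can be written with Python's built-in enumerate(), which supplies the index and the item together. The sketch below also runs it on a list of strings to illustrate the generality point. The function name linear_search_enumerate and the sample names are assumptions made for this example, not part of the lecture's solution.

```python
def linear_search_enumerate(data_list, target):
    # enumerate() yields (index, item) pairs, which the loop unpacks
    for i, item in enumerate(data_list):
        if item == target:
            return i  # Index of the first occurrence
    return -1         # Loop finished without a match

names = ["Ravi", "Mia", "Chen", "Mia"]
print(linear_search_enumerate(names, "Mia"))  # 1
print(linear_search_enumerate(names, "Zoe"))  # -1
```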

Important Note: The Position of return -1. A classic beginner mistake is to place the return -1 inside an else block within the loop, like this:

# WRONG IMPLEMENTATION
for i in range(len(data_list)):
	if data_list[i] == target:
		return i
	else:
		return -1 # This is incorrect!

This incorrect version would check only the first element. If it doesn’t match, the function would immediately return -1 without checking the rest of the list. The return -1 must be placed outside and after the loop. Its position there signifies that the entire list has been searched and no match was found.

Expected Output:

Searching for 115... Found at index: 2
Searching for 120... Found at index: -1

Consolidation Problems

Problem 1: Filter Valid Sensor Readings

Concepts Practiced: Iterating over a list of tuples, tuple unpacking, conditional logic, list manipulation (.append()), implementing a search algorithm.

Problem Statement: You are processing data from a network of environmental sensors. The data comes in as a list of tuples, where each tuple is (sensor_id, temperature_reading). Sometimes, a sensor malfunctions and reports an invalid temperature, represented as a negative number.

Write a function process_sensor_data that takes one argument: a list of these reading tuples.

The function should perform two tasks and return two values:

It must create and return a new list containing only the tuples with valid (non-negative) temperature readings.
It must also find and return the sensor_id of the first sensor that reported an invalid reading. If no invalid readings are found, it should return None.

Your function should return these two results together in a list: [list_of_valid_readings, first_invalid_sensor_id].

Example Case:

readings = [
	('SensorA', 22.5),
	('SensorB', 23.1),
	('SensorC', -99.0), # First invalid reading
	('SensorA', 22.7),
	('SensorD', -1.0),
	('SensorB', 23.3)
]

# Calling process_sensor_data(readings) should return:
# [
#   [('SensorA', 22.5), ('SensorB', 23.1), ('SensorA', 22.7), ('SensorB', 23.3)],
#   'SensorC'
# ]

Edge Case: If readings = [('SensorX', 30.0), ('SensorY', 31.0)], the function should return [[('SensorX', 30.0), ('SensorY', 31.0)], None].

Solution:

def process_sensor_data(data):
	valid_readings = []
	first_invalid_id = None
	found_invalid = False

	for sensor_id, temp in data:
		if temp >= 0:
			valid_readings.append((sensor_id, temp))
		# This 'else' block handles the invalid readings
		else:
			# We only want to capture the *first* invalid ID.
			if not found_invalid:
				first_invalid_id = sensor_id
				found_invalid = True
	
	return [valid_readings, first_invalid_id]

# --- Test Case ---
readings = [
	('SensorA', 22.5), ('SensorB', 23.1), ('SensorC', -99.0),
	('SensorA', 22.7), ('SensorD', -1.0), ('SensorB', 23.3)
]
result = process_sensor_data(readings)
print(f"Valid Readings: {result[0]}")
print(f"First Invalid Sensor ID: {result[1]}")

Solution Walkthrough:

Initialization: The function starts by creating an empty list valid_readings to accumulate the good data and setting first_invalid_id to None.
The “Flag” Variable: The boolean variable found_invalid is used as a “flag.” Its purpose is to help us remember whether we have already found our first invalid reading. It starts as False.
Tuple Unpacking: The loop for sensor_id, temp in data: elegantly unpacks each tuple into two separate variables, making the code easy to read.
Conditional Logic:
- If temp is valid (>= 0), the entire tuple (sensor_id, temp) is appended to the valid_readings list.
- If temp is invalid, the code checks the found_invalid flag. If it’s False (meaning this is the first invalid reading we’ve seen), it records the sensor_id and immediately sets found_invalid to True. On any subsequent invalid readings, the if not found_invalid: condition will be false, and the first_invalid_id will not be overwritten.
Return Value: Finally, the function returns a list containing the populated list of valid readings and the captured ID of the first invalid sensor.

Expected Output:

Valid Readings: [('SensorA', 22.5), ('SensorB', 23.1), ('SensorA', 22.7), ('SensorB', 23.3)]
First Invalid Sensor ID: SensorC

Problem 2: Find the Most Frequent Element

Concepts Practiced: Iterating lists, list methods (.count(), .sort()), implementing a search algorithm, keeping track of a “maximum so far”.

Problem Statement: Write a function find_most_frequent that takes a list of items (e.g., numbers or strings) and determines which item appears most often.

Requirements:

The function should return the item that has the highest frequency in the list.
If there is a tie for the most frequent item, the function should return the one that comes first in alphabetical or numerical order (e.g., if ‘apple’ and ‘banana’ both appear 3 times, ‘apple’ should be returned).
If the input list is empty, the function should return None.

Hint: Using .sort() on the list first can simplify handling the tie-breaking rule.

Example Cases:

scores = [90, 85, 90, 75, 90, 85] -> should return 90
items = ['c', 'b', 'a', 'c', 'b', 'c', 'a', 'b'] -> should return 'b' (since ‘b’ and ‘c’ both appear 3 times, and ‘b’ comes first alphabetically).

Solution:

def find_most_frequent(data_list):
	if not data_list: # Handles the empty list case
		return None

	# Sorting first makes the tie-breaker rule easy to implement.
	# We work on a copy to not alter the original list.
	sorted_data = data_list[:]
	sorted_data.sort()

	most_frequent_item = None
	max_count = -1 # Initialize with a value lower than any possible count

	# We only need to check each unique item. We can build a list of them.
	unique_items = []
	for item in sorted_data:
		# This check ensures we only add each unique item once.
		if item not in unique_items:
			unique_items.append(item)
	
	for item in unique_items:
		current_count = sorted_data.count(item)
		# If we find a new highest count, we update our variables.
		if current_count > max_count:
			max_count = current_count
			most_frequent_item = item
	
	return most_frequent_item

# --- Test Case ---
items = ['c', 'b', 'a', 'c', 'b', 'c', 'a', 'b']
most_common = find_most_frequent(items)
print(f"The most frequent item is: '{most_common}'")

Solution Walkthrough:

Handle Edge Case: The first line if not data_list: is a clean way to check for an empty list and return None as required.
Sort for Tie-Breaking: The solution creates a copy of the list and sorts it immediately. Why? Because the problem states that in a tie, the item that comes first in sorted order wins. By processing the unique items from this sorted list, the first one we encounter in a tie (e.g., ‘b’ before ‘c’) will set the most_frequent_item, and it won’t be overridden by another item with the same count.
Find Unique Items: The code builds a separate unique_items list. This is an efficiency improvement. Instead of running .count() for every single item in the original list (e.g., three times for ‘c’), we only run it once for each unique item.
The “Max So Far” Pattern: The variables most_frequent_item and max_count are used to keep track of the winner as you loop. max_count starts at -1 (a number guaranteed to be lower than any real count). The first item you check will always have a count greater than -1, so it becomes the initial “winner.”
Updating the Winner: Inside the loop, if current_count > max_count: is the key logic. If the item you are currently checking is more frequent than the best one you’ve seen so far, you update max_count and most_frequent_item to reflect this new winner. Because you are iterating through the unique items in sorted order, this logic automatically handles the tie-breaking rule.

Expected Output:

The most frequent item is: 'b'

Problem 3: 2D Matrix Row and Column Sums

Concepts Practiced: Nested lists, nested loops, list manipulation, function design.

Problem Statement: You are given a 2D list (a matrix) of numbers. Write a function calculate_sums that computes the sum of each row and the sum of each column.

Requirements: The function should take one argument: matrix, a list of lists of numbers. It should return a list containing two elements:

A list of the sums of each row.
A list of the sums of each column.

You can assume the input matrix is rectangular (i.e., all inner lists have the same length).

Example Case:

matrix = [
	[1, 2, 3],
	[4, 5, 6],
	[7, 8, 9]
]

# Calling calculate_sums(matrix) should return:
# [
#   [6, 15, 24],  # Row sums: 1+2+3=6, 4+5+6=15, 7+8+9=24
#   [12, 15, 18]  # Col sums: 1+4+7=12, 2+5+8=15, 3+6+9=18
# ]

Solution:

def calculate_sums(matrix):
	if not matrix:
		return [[], []]

	# --- Calculate Row Sums ---
	row_sums = []
	for row in matrix:
		row_sums.append(sum(row))

	# --- Calculate Column Sums ---
	num_cols = len(matrix[0]) # Get number of columns from the first row
	col_sums = [0] * num_cols # Pre-initialize a list of zeros for our sums

	# Outer loop iterates through rows using indices
	for i in range(len(matrix)):
		# Inner loop iterates through columns using indices
		for j in range(num_cols):
			# Add the element at (row i, col j) to the j-th column sum
			col_sums[j] += matrix[i][j]

	return [row_sums, col_sums]

# --- Test Case ---
grid = [
	[1, 2, 3],
	[4, 5, 6],
	[7, 8, 9]
]
sums = calculate_sums(grid)
print(f"Row sums: {sums[0]}")
print(f"Column sums: {sums[1]}")

Solution Walkthrough:

Row Sums (The Easy Part): Calculating the sum of each row is straightforward. You can loop through the main list, and for each row (which is itself a list), you can use the built-in sum() function and append the result to row_sums.
Column Sums (The Tricky Part): Calculating column sums requires a different approach because the numbers for a single column are spread across multiple inner lists.
- Initialization: We first determine the number of columns (num_cols). Then, col_sums = [0] * num_cols is a clever way to create a list of zeros with the exact length needed (e.g., [0, 0, 0] for a 3-column matrix). This list will act as our accumulator for the column totals.
- Nested Loops with Indices: To access elements by column, you must use nested loops with indices. The outer loop iterates through the row indices (i), and the inner loop iterates through the column indices (j).
- Accumulating the Sum: The core of the logic is col_sums[j] += matrix[i][j]. Let’s trace it:
  - When j is 0, you are always adding to col_sums[0]. The loop adds matrix[0][0], then matrix[1][0], then matrix[2][0], correctly summing the first column.
  - When j is 1, you are always adding to col_sums[1], summing the second column, and so on.
This problem is a perfect example of why understanding how to iterate through a 2D list by index is so important.
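
The accumulation traced above can be watched step by step by printing the running column totals after each row is processed. This is a small tracing sketch; the print statement is added only for illustration:

```python
matrix = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9]
]

num_cols = len(matrix[0])
col_sums = [0] * num_cols

for i in range(len(matrix)):
    for j in range(num_cols):
        col_sums[j] += matrix[i][j]
    # Show the running column totals once row i has been added
    print(f"After row {i}: {col_sums}")

# After row 0: [1, 2, 3]
# After row 1: [5, 7, 9]
# After row 2: [12, 15, 18]
```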

Expected Output:

Row sums: [6, 15, 24]
Column sums: [12, 15, 18]