Week 9 Tutorial: Advanced String Processing
Problem 1: Anagram Checker (easy)
Problem Statement: An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once. For example, “listen” is an anagram of “silent”.
Write a function are_anagrams(string1, string2) that takes two strings as input and determines if they are anagrams of each other.
Requirements:
- The function should return
Trueif the strings are anagrams, andFalseotherwise. - The comparison must be case-insensitive.
- Spaces and punctuation should be ignored in the comparison.
- You can assume the input strings will only contain letters, spaces, and simple punctuation like commas and periods.
Hint: What would be true about two strings that are anagrams if you were to sort their characters?
Test Cases:
print(are_anagrams("Listen", "Silent"))
print(are_anagrams("The Morse Code", "Here come dots"))
print(are_anagrams("Astronomer", "Moon starer"))
print(are_anagrams("Hello", "World"))
print(are_anagrams("Dormitory", "Dirty room."))
Expected Output:
True
True
True
False
True
Problem 2: Reversing Words in a Sentence (easy)
Problem Statement:
Write a function reverse_words(sentence) that takes a string sentence as input and returns a new sentence where each word is individually reversed, but the order of the words remains the same.
Requirements:
- The function should accept a single string argument,
sentence. - Each word in the sentence should be reversed. For example, “Python” becomes “nohtyP”.
- The order of the words in the sentence must be preserved.
- Whitespace (spaces) between words should also be preserved. The
.split()method with no arguments is very helpful here as it handles multiple spaces between words correctly. - Punctuation should be treated as part of the word and be reversed along with it. For example, “world!” becomes “!dlrow”.
Test Cases:
print(reverse_words("Hello World"))
print(reverse_words("Python is fun!"))
print(reverse_words("This is a test with multiple spaces"))
print(reverse_words("s'teL ecitcarp"))
Expected Output:
olleH dlroW
nohtyP si !nuf
sihT si a tset htiw elpitlum secaps
Let's practice
Problem 3: Reformat Phone Number
Problem Statement:
You are given a string phone_number containing digits, spaces, and dashes. Your task is to reformat the string into a standard phone number format.
Write a function reformat_number(phone_number) that cleans and reformats the number according to the following rules:
- First, remove all spaces and dashes from the input string, leaving only the digits.
- Next, group the digits into blocks of three, separated by dashes.
- The final block of digits is an exception. It can have a length of two, three, or four.
- If the number of remaining digits at the end is 4, they should be grouped into two blocks of two digits each (e.g.,
XX-XX). - If the number of remaining digits is 2 or 3, they form a single final block.
- If the number of remaining digits at the end is 4, they should be grouped into two blocks of two digits each (e.g.,
Test Cases:
print(reformat_number("123 456 789")) # 9 digits -> 3-3-3
print(reformat_number("123-456-7890")) # 10 digits -> 3-3-2-2 (4 remaining -> 2-2)
print(reformat_number("123 45 678")) # 8 digits -> 3-3-2
print(reformat_number("12")) # 2 digits -> 2
print(reformat_number("12345")) # 5 digits -> 3-2
print(reformat_number("--1 23 4-5 6-7--")) # 7 digits -> 3-2-2 (4 remaining -> 2-2)
Expected Output:
123-456-789
123-456-78-90
123-456-78
12
123-45
123-45-67
Problem 4: Decompress Run-Length Encoded String
Problem Statement:
Run-length encoding (RLE) is a simple form of data compression where runs of data (sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. For example, "WWWWBBWW" could be encoded as "3W2B2W".
Write a function decompress_rle(encoded_string) that takes a run-length encoded string and returns the decompressed string.
Requirements:
- The encoded string will consist of digits (
0-9) followed by an uppercase letter (A-Z). - The number part indicates how many times the following letter should be repeated.
- The number part can consist of multiple digits (e.g.,
"12A"means twelve ‘A’s). - You can assume the input string is always in a valid RLE format.
Test Cases:
print(decompress_rle("3A2B4C"))
print(decompress_rle("1W4B1W"))
print(decompress_rle("10X1Y"))
print(decompress_rle("1A1B1C1D1E"))
print(decompress_rle("12Z"))
Expected Output:
AAABBCCCC
WBBBBW
XXXXXXXXXXY
ABCDE
ZZZZZZZZZZZZ
Problem 5: Simple Markdown Parser
Problem Statement: You are building a simplified text converter for a chat application. The app supports basic formatting using special characters:
- Text surrounded by double asterisks
**should be bolded (converted to HTML<b>and</b>tags). - Text surrounded by underscores
_should be italicized (converted to HTML<i>and</i>tags).
Write a function parse_markdown(text) that takes a raw string and returns the HTML-formatted string.
Rules & Constraints:
- You can assume that the formatting markers always appear in pairs. You do not need to handle broken tags (e.g., a
**without a closing**). - You do not need to handle nested formatting (e.g.,
**_bold and italic_**). - You must iterate through the string to process this. You cannot simply use
.replace()globally, because you need to distinguish between an opening tag (which becomes<b>) and a closing tag (which becomes</b>).
Test Cases:
print(parse_markdown("This is **bold** text."))
print(parse_markdown("Hello _world_."))
print(parse_markdown("Make **this** bold and _this_ italic."))
print(parse_markdown("No formatting here."))
print(parse_markdown("**Bold** at start."))
Expected Output:
This is <b>bold</b> text.
Hello <i>world</i>.
Make <b>this</b> bold and <i>this</i> italic.
No formatting here.
<b>Bold</b> at start.
Problem 6: Valid Parentheses Checker (The Stack Approach)
Problem Statement:
Validating mathematical expressions or code syntax is a common problem. Write a function is_balanced(expression) that determines if the parentheses, brackets, and braces in a string are valid.
Rules:
- The function must handle three types of brackets:
(),[], and{}. - Open brackets must be closed by the same type of brackets.
- Open brackets must be closed in the correct order (e.g.,
([)]is invalid). - Any non-bracket characters in the string should be ignored.
- Return
Trueif the expression is balanced,Falseotherwise.
Hint: Think about how you would solve this manually. If you see an opening bracket (, you wait for a closing ). If you see [, you wait for ]. The most recently opened bracket must be the first one to close. You can use a List as a “Stack” (using .append() and .pop()) to keep track of opening brackets.
Test Cases:
print(is_balanced("{[()]}")) # Nested correctly
print(is_balanced("{[}]")) # Incorrect nesting order
print(is_balanced("(]")) # Mismatched types
print(is_balanced("((()))")) # Simple nesting
print(is_balanced("print(list[0])")) # Code with text (valid)
print(is_balanced("(((")) # Unclosed
print(is_balanced("]")) # Closing without opening
Expected Output:
True
False
False
True
True
False
False
This content will be available starting November 25, 2025.