Week 9 Tutorial: Advanced String Processing

Problem 1: Anagram Checker (easy)

Problem Statement: An anagram is a word or phrase formed by rearranging the letters of a different word or phrase, typically using all the original letters exactly once. For example, “listen” is an anagram of “silent”.

Write a function are_anagrams(string1, string2) that takes two strings as input and determines if they are anagrams of each other.

Requirements:

The function should return True if the strings are anagrams, and False otherwise.
The comparison must be case-insensitive.
Spaces and punctuation should be ignored in the comparison.
You can assume the input strings will only contain letters, spaces, and simple punctuation like commas and periods.

Hint: What would be true about two strings that are anagrams if you were to sort their characters?

Test Cases:

print(are_anagrams("Listen", "Silent"))
print(are_anagrams("The Morse Code", "Here come dots"))
print(are_anagrams("Astronomer", "Moon starer"))
print(are_anagrams("Hello", "World"))
print(are_anagrams("Dormitory", "Dirty room."))

Expected Output:

True
True
True
False
True

Problem 2: Reversing Words in a Sentence (easy)

Problem Statement: Write a function reverse_words(sentence) that takes a string sentence as input and returns a new sentence where each word is individually reversed, but the order of the words remains the same.

Requirements:

The function should accept a single string argument, sentence.
Each word in the sentence should be reversed. For example, “Python” becomes “nohtyP”.
The order of the words in the sentence must be preserved.
Whitespace (spaces) between words should also be preserved. The .split() method with no arguments is very helpful here as it handles multiple spaces between words correctly.
Punctuation should be treated as part of the word and be reversed along with it. For example, “world!” becomes “!dlrow”.

Test Cases:

print(reverse_words("Hello World"))
print(reverse_words("Python is fun!"))
print(reverse_words("This is a  test   with   multiple spaces"))
print(reverse_words("s'teL    ecitcarp"))

Expected Output:

olleH dlroW
nohtyP si !nuf
sihT si a  tset   htiw   elpitlum secaps
Let's    practice

Problem 3: Reformat Phone Number

Problem Statement: You are given a string phone_number containing digits, spaces, and dashes. Your task is to reformat the string into a standard phone number format.

Write a function reformat_number(phone_number) that cleans and reformats the number according to the following rules:

First, remove all spaces and dashes from the input string, leaving only the digits.
Next, group the digits into blocks of three, separated by dashes.
The final block of digits is an exception. It can have a length of two, three, or four.
- If the number of remaining digits at the end is 4, they should be grouped into two blocks of two digits each (e.g., XX-XX).
- If the number of remaining digits is 2 or 3, they form a single final block.

Test Cases:

print(reformat_number("123 456 789"))     # 9 digits -> 3-3-3
print(reformat_number("123-456-7890"))    # 10 digits -> 3-3-2-2 (4 remaining -> 2-2)
print(reformat_number("123 45 678"))      # 8 digits -> 3-3-2
print(reformat_number("12"))              # 2 digits -> 2
print(reformat_number("12345"))           # 5 digits -> 3-2
print(reformat_number("--1 23 4-5 6-7--")) # 7 digits -> 3-2-2 (4 remaining -> 2-2)

Expected Output:

123-456-789
123-456-78-90
123-456-78
12
123-45
123-45-67

Problem 4: Decompress Run-Length Encoded String

Problem Statement: Run-length encoding (RLE) is a simple form of data compression where runs of data (sequences in which the same data value occurs in many consecutive data elements) are stored as a single data value and count, rather than as the original run. For example, "WWWWBBWW" could be encoded as "3W2B2W".

Write a function decompress_rle(encoded_string) that takes a run-length encoded string and returns the decompressed string.

Requirements:

The encoded string will consist of digits (0-9) followed by an uppercase letter (A-Z).
The number part indicates how many times the following letter should be repeated.
The number part can consist of multiple digits (e.g., "12A" means twelve ‘A’s).
You can assume the input string is always in a valid RLE format.

Test Cases:

print(decompress_rle("3A2B4C"))
print(decompress_rle("1W4B1W"))
print(decompress_rle("10X1Y"))
print(decompress_rle("1A1B1C1D1E"))
print(decompress_rle("12Z"))

Expected Output:

AAABBCCCC
WBBBBW
XXXXXXXXXXY
ABCDE
ZZZZZZZZZZZZ

Problem 5: Simple Markdown Parser

Problem Statement: You are building a simplified text converter for a chat application. The app supports basic formatting using special characters:

Text surrounded by double asterisks ** should be bolded (converted to HTML  and  tags).
Text surrounded by underscores _ should be italicized (converted to HTML  and  tags).

Write a function parse_markdown(text) that takes a raw string and returns the HTML-formatted string.

Rules & Constraints:

You can assume that the formatting markers always appear in pairs. You do not need to handle broken tags (e.g., a ** without a closing **).
You do not need to handle nested formatting (e.g., **_bold and italic_**).
You must iterate through the string to process this. You cannot simply use .replace() globally, because you need to distinguish between an opening tag (which becomes ) and a closing tag (which becomes ).

Test Cases:

print(parse_markdown("This is **bold** text."))
print(parse_markdown("Hello _world_."))
print(parse_markdown("Make **this** bold and _this_ italic."))
print(parse_markdown("No formatting here."))
print(parse_markdown("**Bold** at start."))

Expected Output:

This is <b>bold</b> text.
Hello <i>world</i>.
Make <b>this</b> bold and <i>this</i> italic.
No formatting here.
<b>Bold</b> at start.

Problem 6: Valid Parentheses Checker (The Stack Approach)

Problem Statement: Validating mathematical expressions or code syntax is a common problem. Write a function is_balanced(expression) that determines if the parentheses, brackets, and braces in a string are valid.

Rules:

The function must handle three types of brackets: (), [], and {}.
Open brackets must be closed by the same type of brackets.
Open brackets must be closed in the correct order (e.g., ([)] is invalid).
Any non-bracket characters in the string should be ignored.
Return True if the expression is balanced, False otherwise.

Hint: Think about how you would solve this manually. If you see an opening bracket (, you wait for a closing ). If you see [, you wait for ]. The most recently opened bracket must be the first one to close. You can use a List as a “Stack” (using .append() and .pop()) to keep track of opening brackets.

Test Cases:

print(is_balanced("{[()]}"))       # Nested correctly
print(is_balanced("{[}]"))         # Incorrect nesting order
print(is_balanced("(]"))           # Mismatched types
print(is_balanced("((()))"))       # Simple nesting
print(is_balanced("print(list[0])")) # Code with text (valid)
print(is_balanced("((("))          # Unclosed
print(is_balanced("]"))            # Closing without opening

Expected Output:

True
False
False
True
True
False
False