Instructions
This document contains a series of programming problems and guided tasks designed to help you practice and strengthen your Python/programming skills. Some of them use sample files so you can test your code. They’re hyperlinked in the problem description, but they can also be found here:
Programming Problem Set Data
To simplify the problems, the data used mostly consists of dummy values.
The tasks are split into two types:
- Programming Problems (30 programming tasks):
- These are self-contained problems, each focusing on getting you to review specific Python concepts.
- They vary in difficulty and topic, allowing you to choose based on what looks interesting to you (e.g., check out Problem 6).
- The problems are independent, so you can attempt them in any order. However, that means that sometimes the task will ask you to do something you’re not familiar with, or use a function you haven’t seen before. You should feel free to look things up – this is not a test, but rather an exercise in self-directed learning.
- Guided Tasks (5 guided tasks):
- These tasks walk you through step-by-step instructions for solving more complex problems. Try one of these first if you want a more guided approach!
If you get stuck:
- Read the error! Sometimes the error itself gives us helpful hints to fix our code. If not, paste the error into a Google search – often there’s a StackOverflow forum post where someone has had the exact same error, and others have commented on ways to fix it.
- Look up functions you’re not familiar with to see examples of how they’re used. For example, searching “how to use input() in Python” results in many links and tutorials. I often find myself on a geeksforgeeks page when I’ve forgotten how something works exactly.
Programming problems
Problem 1: Directory Listing with Flags
Topics Covered: File I/O, Loops, String Manipulation
Write a program that:
- Lists all files and directories in the current directory.
- Flags directories with a star (*).
Example Output:
file1.txt
file2.csv
sample_data/ *
Hint:
- Use the os module to list files and directories.
- Remember to use import os at the start of your code!
- Check if an item is a directory using os.path.isdir().
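Here's one possible sketch. The helper name list_with_flags is just an illustration, and sorting the names is an extra choice to make the output order predictable (os.listdir() returns entries in arbitrary order). It prints one entry per line:

```python
import os

def list_with_flags(path="."):
    """Return the entries in `path`, marking directories with a trailing "/ *"."""
    entries = []
    for name in sorted(os.listdir(path)):  # sorted() only makes the order predictable
        if os.path.isdir(os.path.join(path, name)):
            entries.append(name + "/ *")  # flag directories with a star
        else:
            entries.append(name)
    return entries

for entry in list_with_flags():
    print(entry)
```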
Problem 2: Word Frequency Counter
Topics Covered: Dictionaries, Loops
Write a function count_words that:
- Takes a string as input.
- Returns a dictionary with words as keys and their frequencies as values.
Example:
count_words("hello world hello")  # Output: {'hello': 2, 'world': 1}
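One possible approach, assuming words are separated by whitespace and that "Hello" and "hello" count as different words (this sketch does no lower-casing or punctuation stripping):

```python
def count_words(text):
    """Return a dict mapping each whitespace-separated word to how often it appears."""
    counts = {}
    for word in text.split():
        if word in counts:
            counts[word] += 1  # seen before: bump the count
        else:
            counts[word] = 1   # first time we see this word
    return counts

print(count_words("hello world hello"))
```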
Problem 3: String Lengths
Topics Covered: String functions, lists, for loops
Write and test a function lengths that:
- Takes a list of strings as input.
- Returns a list of the lengths of the strings.
Example:
lengths(['Ed', 'Ted', 'Fred', 'Jennifer'])  # Output: [2, 3, 4, 8]
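A minimal sketch, including a couple of assert statements as a lightweight way to "test" the function as the problem asks:

```python
def lengths(strings):
    """Return a list with the length of each string in `strings`."""
    result = []
    for s in strings:
        result.append(len(s))
    return result

# Quick checks: assert raises an AssertionError if a condition is False
assert lengths(['Ed', 'Ted', 'Fred', 'Jennifer']) == [2, 3, 4, 8]
assert lengths([]) == []
```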
Problem 4: Student Pass/Fail
Topics Covered: Functions, Conditionals
Write a function is_pass that:
- Takes a student's score as input.
- Returns "Pass" if the score is 50 or above, else "Fail".
Example:
is_pass(75)  # Output: "Pass"
is_pass(45)  # Output: "Fail"
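A minimal sketch of one way to write it; note that a score of exactly 50 counts as a pass:

```python
def is_pass(score):
    """Return "Pass" for scores of 50 or above, otherwise "Fail"."""
    if score >= 50:
        return "Pass"
    return "Fail"

print(is_pass(75))
print(is_pass(45))
```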
Problem 5: Filter Even Numbers
Topics Covered: Functions, Lists
Write a function filter_even that:
- Takes a list of integers.
- Returns a new list containing only the even numbers.
Example:
filter_even([1, 2, 3, 4, 5])  # Output: [2, 4]
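A short sketch using the modulo operator: a number is even when dividing it by 2 leaves no remainder.

```python
def filter_even(numbers):
    """Return a new list containing only the even integers from `numbers`."""
    evens = []
    for n in numbers:
        if n % 2 == 0:  # even numbers leave no remainder when divided by 2
            evens.append(n)
    return evens

print(filter_even([1, 2, 3, 4, 5]))
```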
Problem 6: What Problem Should I Solve Next?
Topics Covered: Python’s Random library
Write a program to:
1. Import the random library (e.g., import random).
2. Define the probabilities for choosing between guided tasks and programming problems. For example, you can set the probability of choosing a guided task to 0.2 and the probability of choosing a programming problem to 0.8.
3. Use the random.choices function to select either a guided task or a programming problem based on the defined probabilities.
4. If a guided task is selected, randomly choose a task between 1 and 5.
5. If a programming problem is selected, randomly choose a problem between 1 and 30, excluding problem 6.
Hint: Use a while loop to keep rolling until you get a random number that is not 6.
6. Print the selected task or problem.
Example of using random.choices:
import random

problem_list = ['A', 'B', 'C', 'D']
weights = [0.1, 0.2, 0.3, 0.4]
selected_choice = random.choices(problem_list, weights=weights, k=1)
Hint: Test out that example, and try changing k=2 and printing out selected_choice to see what the output of random.choices looks like.
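Putting the steps together, one possible sketch looks like this. The function name pick_next and its default weights (0.2 and 0.8) are just illustrative choices:

```python
import random

def pick_next(guided_weight=0.2, problem_weight=0.8):
    """Return a (kind, number) pair, e.g. ("programming problem", 17)."""
    # random.choices returns a list of k picks, so take the first element
    kind = random.choices(["guided task", "programming problem"],
                          weights=[guided_weight, problem_weight], k=1)[0]
    if kind == "guided task":
        number = random.randint(1, 5)  # there are 5 guided tasks
    else:
        number = 6
        while number == 6:  # keep rolling until we don't land on Problem 6
            number = random.randint(1, 30)
    return kind, number

kind, number = pick_next()
print(f"Try {kind} {number}!")
```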
Problem 7: Save Shopping List
Topics Covered: Reading and Writing to Files
Write a program to:
- Take user input for a shopping list (e.g., "eggs,milk,bread").
- Save the list to a file named shopping_list.txt, with each item on a new line.
Hint:
You can get user input with the input() function, and use the .split() function to split the string into a list (e.g., userInput.split(",") splits on commas).
Example:
userInput = input("Enter your shopping list separated by commas: ")
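A minimal sketch of the saving part. The function name save_shopping_list is hypothetical, and stripping spaces around each item (so "eggs, milk" also works) is an extra choice not required by the problem:

```python
def save_shopping_list(user_input, filename="shopping_list.txt"):
    """Split a comma-separated string and write one item per line to `filename`."""
    items = []
    for item in user_input.split(","):
        item = item.strip()  # tolerate spaces, e.g. "eggs, milk"
        if item:             # skip empty entries caused by stray commas
            items.append(item)
    with open(filename, "w") as f:
        for item in items:
            f.write(item + "\n")
    return items

# In the full program you would pass in the user's input:
# userInput = input("Enter your shopping list separated by commas: ")
# save_shopping_list(userInput)
```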
Problem 8: Word Scramble Puzzle
Topics Covered: Strings, Randomization, Loops
Write a program to:
- Make a list of unique words in Pride and Prejudice: pap.txt
- Randomly select a word from the list.
- Scramble the letters in the word.
- Ask the user to guess the original word.
Example:
Scrambled word: "erivpetsarev"
Guess the original word: "preservative"
Correct!
Hint:
- Store words in a list for random selection.
- Use the random library to shuffle the letters in a word.
Example:
import random

word = "preservative"
letters = list(word)
random.shuffle(letters)
scrambled_word = ''.join(letters)
print(scrambled_word)  # Example output: "erivpetsarev"
- You can get the user’s guess with the input() function.
Example:
guess = input("Guess the original word: ")
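One way to put the pieces together is sketched below. The helper names (unique_words, scramble, play_round) are hypothetical, and the word cleanup (lower-casing, trimming punctuation and underscores, keeping only purely alphabetic words) is an assumption about what counts as a "word". Note that a shuffle can occasionally return the original word unchanged.

```python
import random

def unique_words(text):
    """Return the sorted unique lower-case alphabetic words in `text`."""
    words = set()
    for raw in text.split():
        word = raw.strip('.,;:!?"\'()-_').lower()  # trim punctuation at both ends
        if word.isalpha():  # skips numbers and words like "don't"
            words.add(word)
    return sorted(words)

def scramble(word):
    letters = list(word)
    random.shuffle(letters)
    return "".join(letters)

def play_round(words):
    word = random.choice(words)
    print("Scrambled word:", scramble(word))
    guess = input("Guess the original word: ")
    if guess.strip().lower() == word:
        print("Correct!")
    else:
        print("Not quite, the word was", word)

# with open("pap.txt", encoding="utf-8") as f:
#     play_round(unique_words(f.read()))
```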
Problem 9: File Analysis - Count Lines, Words, and Characters
Topics Covered: File I/O, Functions
Write a program that:
- Reads a file.
- Counts and prints the number of lines, words, and characters in the file.
Example Input File sample.txt
Hello, world!
Python is fun.
Expected Output:
Lines: 2
Words: 5
Characters: 27 (newline characters are not counted)
Hint:
- Use separate functions for each task (e.g., count_lines, count_words, count_characters).
Try it on Pride and Prejudice: pap.txt
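A minimal sketch with one function per count, as the hint suggests. The analyze_file wrapper and the UTF-8 encoding are assumptions; the character count skips newline characters so that sample.txt gives 27:

```python
def count_lines(text):
    return len(text.splitlines())

def count_words(text):
    return len(text.split())  # split() with no argument splits on any whitespace

def count_characters(text):
    # Newline characters are not counted, matching the expected output of 27
    return len(text.replace("\n", ""))

def analyze_file(filename):
    with open(filename, encoding="utf-8") as f:
        text = f.read()
    print("Lines:", count_lines(text))
    print("Words:", count_words(text))
    print("Characters:", count_characters(text))

# analyze_file("sample.txt")
# analyze_file("pap.txt")
```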
Problem 10: Average Grade
Topics Covered: Dictionaries, Loops, Functions, File I/O
Refer to student_grades.csv
Create a dictionary where keys are student names and values are lists of grades.
Hint to build the dictionary:
Read the file line by line. For each line, the format is [student name]: grade1, grade2
Use split(":") to separate the name from the grades, then split(",") on the grades to get individual grades.
Write a function calculate_average that:
- Takes the dictionary as input.
- Returns the average grade for each student (for example, as a dictionary mapping each name to an average).
Hint:
Use a loop to calculate the sum of grades for each student and divide by the number of grades.
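A possible sketch of both parts. It assumes every line follows the "[student name]: grade1, grade2" format from the hint; if your copy of the file has a header row or is comma-separated like a regular CSV, the parsing will need adapting. The helper name build_grade_dict is hypothetical.

```python
def build_grade_dict(lines):
    """Turn lines like "Alice: 90, 85" into {"Alice": [90.0, 85.0]}."""
    grades = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, grade_text = line.split(":")
        grade_list = []
        for g in grade_text.split(","):
            grade_list.append(float(g))  # float() ignores surrounding spaces
        grades[name.strip()] = grade_list
    return grades

def calculate_average(grades):
    """Return {student: average grade} for a {student: [grades]} dictionary."""
    averages = {}
    for name, grade_list in grades.items():
        averages[name] = sum(grade_list) / len(grade_list)
    return averages

# with open("student_grades.csv") as f:
#     print(calculate_average(build_grade_dict(f)))
```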
Problem 11: Find the Maximum Grade
Topics Covered: Pandas
Write a program to:
- Load a CSV file student_grades.csv
- Sort the dataframe by the Math grades, from highest to lowest
- Use .sort_values() to sort using the Math column, with ascending=False
- Print the names and Math scores of the 3 students with the highest Math grades.
- Recall: df.head(3) returns the first 3 rows, and df[["colname1", "colname2"]] selects only those 2 columns of a DataFrame.
Example output:
       Name  Math
5     Fiona    95
2   Charlie    92
18    Stacy    92
Problem 12: DNA Complement
Topics Covered: Strings, Loops, Functions
Write a function get_complement that:
- Takes a DNA sequence (string) as input.
- Returns its complementary strand (A -> T, T -> A, C -> G, G -> C).
Use the function to:
- Read a DNA sequence from the user using the input() function
- Print its complement.
Hint:
Iterate through the DNA sequence using a loop. For each nucleotide, append its complement to a new string.
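A minimal sketch of the loop described in the hint. Using a dictionary for the base pairs is one choice (an if/elif chain works too), and upper-casing the input is an extra convenience:

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def get_complement(dna):
    """Return the complementary strand of `dna` (A<->T, C<->G)."""
    complement = ""
    for base in dna.upper():  # upper() so "atgc" works too
        complement += COMPLEMENT[base]  # KeyError for anything other than A/T/C/G
    return complement

# sequence = input("Enter a DNA sequence: ")
# print(get_complement(sequence))
```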
Problem 13: Nucleotide Count
Topics Covered: Dictionaries, Loops, Functions
Create a dictionary to store counts of nucleotides (A, T, G, C) in a DNA sequence.
Write a function count_nucleotides that:
- Takes a DNA sequence as input.
- Updates the dictionary with the counts of each nucleotide.
Print the results.
Hint:
Use a loop to iterate through the sequence and update the counts in the dictionary.
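One possible sketch, using the example sequence from Guided Task 1. Characters other than A, T, G and C (such as N) are simply ignored here, which is an assumption on top of the problem statement:

```python
def count_nucleotides(dna, counts):
    """Add the number of A, T, G and C bases in `dna` to the `counts` dictionary."""
    for base in dna.upper():
        if base in counts:  # ignore characters that aren't A/T/G/C
            counts[base] += 1
    return counts

counts = {"A": 0, "T": 0, "G": 0, "C": 0}
count_nucleotides("ATGCCGTAGCTAAGTTCG", counts)
print(counts)
```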
Problem 14: GC Content
Topics Covered: Functions, Math Operators
Write a function gc_content that:
- Takes a DNA sequence as input.
- Returns the GC content (percentage of G and C in the sequence).
Hint:
Count the occurrences of G and C, divide by the total length, and multiply by 100 to get the percentage.
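Following the hint, a short sketch; returning 0.0 for an empty sequence is an extra guard against dividing by zero:

```python
def gc_content(dna):
    """Return the percentage of G and C bases in `dna`."""
    dna = dna.upper()
    if not dna:
        return 0.0  # avoid dividing by zero on an empty sequence
    gc = dna.count("G") + dna.count("C")
    return gc / len(dna) * 100

print(gc_content("ATGCCGTAGC"))
```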
Problem 15: Reverse Complement
Topics Covered: Strings, Slicing, Functions
Write a function reverse_complement that:
- Takes a DNA sequence as input.
- Returns its reverse complement.
Hint:
Use slicing to reverse the sequence, then use a loop to calculate the complement.
Recall:
- Complementary base pairs: A -> T, T -> A, C -> G, G -> C.
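A sketch of the two steps in the hint: reverse with slicing first, then complement each base in a loop:

```python
def reverse_complement(dna):
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    reversed_dna = dna.upper()[::-1]  # step 1: reverse with slicing
    result = ""
    for base in reversed_dna:         # step 2: complement each base
        result += pairs[base]
    return result

print(reverse_complement("ATGC"))
```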
Problem 16: Filter Long Sequences
Topics Covered: Lists, Loops, Conditionals, File I/O
Write a program to:
- Read the DNA sequences from the file sequences.txt.
- Write only sequences longer than 10 bases to a new file, filtered_sequences.txt.
Hint:
Use len() to check sequence length and write() to save valid sequences.
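A minimal sketch, assuming sequences.txt holds one sequence per line and that "longer than 10 bases" means strictly more than 10. The function name filter_long_sequences is hypothetical:

```python
def filter_long_sequences(input_file, output_file, min_length=10):
    """Copy sequences longer than `min_length` bases to `output_file`; return how many were kept."""
    kept = 0
    with open(input_file) as src, open(output_file, "w") as dst:
        for line in src:
            seq = line.strip()  # drop the newline (and any stray spaces)
            if len(seq) > min_length:
                dst.write(seq + "\n")
                kept += 1
    return kept

# filter_long_sequences("sequences.txt", "filtered_sequences.txt")
```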
Problem 17: Amino Acid Frequency
Topics Covered: Dictionaries, File I/O
Task:
- Read protein sequences from a file. Each line in the file includes a label indicating the source of the protein (e.g., HUMAN_HEMOGLOBIN, JELLYFISH_GFP) and the protein sequence.
- Count the frequency of each amino acid using a dictionary.
- Print the results, showing the source label and the amino acid counts.
Hint:
- Use split(":") to separate the label from the sequence.
- Use a dictionary to count occurrences of each character in the sequence.
Input file: protein_sequences.txt
Example output:
HUMAN_HEMOGLOBIN:
M 2
V 19
H 8
L 17
T 8
P 7
E 8
...
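One possible sketch, assuming each line of protein_sequences.txt is a single "LABEL:SEQUENCE" entry (if sequences wrap across several lines, the reading loop would need to change). The helper names are hypothetical:

```python
def count_amino_acids(line):
    """Split "LABEL:SEQUENCE" and count each amino acid letter in the sequence."""
    label, sequence = line.split(":", 1)  # split only on the first colon
    counts = {}
    for aa in sequence.strip():
        counts[aa] = counts.get(aa, 0) + 1
    return label.strip(), counts

def print_amino_acid_counts(filename="protein_sequences.txt"):
    with open(filename) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            label, counts = count_amino_acids(line)
            print(label + ":")
            for aa, n in counts.items():
                print(aa, n)
```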
Problem 18: Palindrome Check
Topics Covered: Strings, Functions
Write a function is_palindrome that:
- Takes a string as input.
- Returns True if the string is a palindrome, otherwise False.
Hint:
Compare the string to its reverse using slicing.
Examples to test:
racecar, level, radar, hello, world, civic, deified, noon, python, madam, rotator, kayak, refer, palindrome, stats
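A one-line comparison with the reversed string is enough; the loop below checks a few of the test words:

```python
def is_palindrome(text):
    """Return True if `text` reads the same backward and forward."""
    return text == text[::-1]

for word in ["racecar", "level", "hello", "kayak", "palindrome"]:
    print(word, is_palindrome(word))
```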
Problem 19: Translate Codons
https://www.hgvs.org/mutnomen/codon.html
Topics Covered: Dictionaries, Loops
Create a dictionary where keys are codons (e.g., ATG, TAA) and values are corresponding amino acids.
Write a program to:
- Build a dictionary of codon-amino acid pairs based on the txt file: codon_to_amino_acid.txt
- Split a DNA sequence into codons.
- Translate each codon into its amino acid using the dictionary.
Note: one codon appears twice in the text file. Handle the duplicate in one of two ways:
Option 1: Overwrite the existing entry with the new value.
Option 2: Ignore the duplicate and keep the first occurrence.
Hint: To check whether a key is already in the dictionary, use the in keyword (e.g., if codon in codon_table:).
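A sketch under an assumption: the layout of codon_to_amino_acid.txt isn't described here, so this treats each line as a codon followed by its amino acid, separated by whitespace (e.g. "ATG Met"); pass sep=":" or similar if your file uses a different separator. The function names and the keep_first flag (covering Options 1 and 2) are hypothetical.

```python
def build_codon_table(lines, sep=None, keep_first=True):
    """Build {codon: amino_acid} from lines such as "ATG Met" (sep=None splits on whitespace)."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        codon, amino_acid = line.split(sep, 1)
        codon, amino_acid = codon.strip().upper(), amino_acid.strip()
        if codon in table and keep_first:
            continue  # Option 2: ignore the duplicate, keep the first occurrence
        table[codon] = amino_acid  # Option 1 (keep_first=False): overwrite
    return table

def translate(dna, table):
    """Split `dna` into codons and look each one up; unknown codons become "?"."""
    amino_acids = []
    for i in range(0, len(dna) - 2, 3):  # ignore a trailing partial codon
        codon = dna[i:i + 3].upper()
        amino_acids.append(table.get(codon, "?"))
    return amino_acids

# with open("codon_to_amino_acid.txt") as f:
#     table = build_codon_table(f)
# print(translate("ATGGCCTAA", table))
```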
Problem 20: Calculate Factorial
Topics Covered: Loops, Functions
Write a function factorial that:
- Takes an integer as input.
- Returns its factorial.
Hint:
Use a loop to multiply numbers from 1 to the input value.
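A minimal loop-based sketch; rejecting negative inputs is an extra safeguard, and 0! = 1 falls out naturally because the loop doesn't run:

```python
def factorial(n):
    """Return n! computed with a loop (n must be a non-negative integer)."""
    if n < 0:
        raise ValueError("factorial is not defined for negative numbers")
    result = 1
    for i in range(1, n + 1):  # multiply 1 * 2 * ... * n
        result *= i
    return result

print(factorial(5))
```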
Problem 21: Password Strength Checker
Topics Covered: Strings, Conditionals
Write a function check_password that:
- Takes a password as input.
- Checks if it:
- Has at least 8 characters.
- Contains at least one uppercase and at least one lowercase letter.
- Includes at least one number.
- Returns whether the password is strong.
Hint:
Use isupper(), islower(), and isdigit() to check conditions.
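One possible sketch: isupper(), islower() and isdigit() are called on each character, and the password is strong only if all four conditions hold:

```python
def check_password(password):
    """Return True if `password` has 8+ characters, upper- and lower-case letters, and a digit."""
    has_upper = False
    has_lower = False
    has_digit = False
    for ch in password:
        if ch.isupper():
            has_upper = True
        elif ch.islower():
            has_lower = True
        elif ch.isdigit():
            has_digit = True
    return len(password) >= 8 and has_upper and has_lower and has_digit

print(check_password("Passw0rd"))
```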
Problem 22: Fibonacci Sequence
Topics Covered: Loops, Functions
Write a function fibonacci that:
- Takes an integer n as input.
- Returns the first n numbers of the Fibonacci sequence.
Example:
The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones.
The sequence starts with 0 and 1.
Example for n=10:
0, 1, 1, 2, 3, 5, 8, 13, 21, 34
Hint:
Initialize the sequence as a list with the first two numbers.
Use a loop to calculate the next number in the sequence, and append it to the list.
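A sketch following the hint. The special cases for n = 0 and n = 1 are extra handling so the function never returns more numbers than asked for:

```python
def fibonacci(n):
    """Return a list with the first n Fibonacci numbers."""
    if n <= 0:
        return []
    if n == 1:
        return [0]
    sequence = [0, 1]
    while len(sequence) < n:
        sequence.append(sequence[-1] + sequence[-2])  # sum of the two previous numbers
    return sequence

print(fibonacci(10))
```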
Problem 23: Filter Odd Numbers
Topics Covered: Lists, File I/O
Write a program to:
- Create a list of numbers from 1 to 50.
- Write only the odd numbers to a file.
Hint:
Use the modulo operator % to check if a number is odd. What do odd numbers all have in common when they’re divided by 2?
Problem 24: Reverse Words
Topics Covered: Strings, Loops, Slicing
Write a program to:
- Reverse the order of words in a sentence.
Hint:
You can either:
- use split() to break the sentence into words, slicing to reverse the list of words (e.g., words[::-1]), and join() to combine them in reverse order; or
- use a loop to iterate through the words from the end to the start and build the reversed sentence manually.
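Both approaches from the hint, sketched side by side (the function names are just illustrative):

```python
def reverse_words(sentence):
    """Slicing approach: reverse the list of words, then join them back up."""
    words = sentence.split()
    return " ".join(words[::-1])

def reverse_words_loop(sentence):
    """Loop approach: walk the words from last to first."""
    words = sentence.split()
    result = ""
    for i in range(len(words) - 1, -1, -1):
        result += words[i]
        if i > 0:
            result += " "  # no trailing space after the last word
    return result

print(reverse_words("the quick brown fox"))
print(reverse_words_loop("the quick brown fox"))
```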
Problem 25: Compare Word Lengths in Two Books
Topics Covered: File I/O, Loops, String Manipulation
Write a program to determine which of two novels uses longer words on average.
- Read the text from Pride and Prejudice, and Huckleberry Finn: pap.txt and hf.txt
- Calculate the average word length for each book.
- Print which book has longer words on average.
Hint:
- Count the total number of words in each book, and the total number of characters in those words (spaces are not counted).
- Use division to calculate the average word length.
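A possible sketch. Punctuation attached to a word (like "Elizabeth,") is counted as part of it here; stripping it first is a reasonable refinement. The UTF-8 encoding and the helper names are assumptions:

```python
def average_word_length(text):
    """Return the average number of characters per whitespace-separated word."""
    words = text.split()
    if not words:
        return 0.0
    total_chars = 0
    for word in words:
        total_chars += len(word)  # punctuation attached to a word is counted too
    return total_chars / len(words)

def compare_books(file_a="pap.txt", file_b="hf.txt"):
    averages = {}
    for filename in (file_a, file_b):
        with open(filename, encoding="utf-8") as f:
            averages[filename] = average_word_length(f.read())
        print(f"{filename}: {averages[filename]:.2f} characters per word")
    longer = max(averages, key=averages.get)
    print(f"{longer} uses longer words on average.")
```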
Problem 26: Count Unique Words
Topics Covered: Strings, Dictionaries
Write a program to:
- Read text from Huckleberry Finn: hf.txt
- Count the number of unique words using a dictionary.
Hint:
Use a dictionary to store each word as a key and its count as the value.
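A minimal sketch. Lower-casing and trimming punctuation (so "Tom," and "tom" count as the same word) is an assumption about what "unique" should mean; the function name is hypothetical:

```python
def count_unique_words(text):
    """Return (number of unique words, {word: count})."""
    counts = {}
    for raw in text.lower().split():
        word = raw.strip('.,;:!?"\'()-_')
        if word:  # skip tokens that were only punctuation, like "--"
            counts[word] = counts.get(word, 0) + 1
    return len(counts), counts

# with open("hf.txt", encoding="utf-8") as f:
#     unique, counts = count_unique_words(f.read())
# print("Unique words:", unique)
```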
Problem 27: Codon Frequency
Topics Covered: Dictionaries, Loops
Write a program to:
- Split a DNA sequence into codons (3 bases).
- Count the frequency of each codon using a dictionary.
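A short sketch that steps through the sequence 3 bases at a time; any leftover 1 or 2 bases at the end are ignored:

```python
def codon_frequency(dna):
    """Return {codon: count} for the complete codons in `dna`."""
    counts = {}
    for i in range(0, len(dna) - 2, 3):  # stop before a trailing partial codon
        codon = dna[i:i + 3]
        counts[codon] = counts.get(codon, 0) + 1
    return counts

print(codon_frequency("ATGATGTAA"))
```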
Problem 28: Sorting Without Built-in Functions
Topics Covered: Lists, Loops
Write a program to:
- Sort a list of numbers in ascending order without using sort() or sorted().
Hint:
Use a nested loop to compare and swap elements.
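One classic nested-loop approach is bubble sort, sketched below (other algorithms such as selection sort work too). Working on a copy is an extra choice so the original list isn't changed:

```python
def bubble_sort(numbers):
    """Return a new list with `numbers` in ascending order (bubble sort)."""
    result = numbers[:]  # work on a copy so the original list is unchanged
    n = len(result)
    for i in range(n):
        # after each pass, the largest remaining value has "bubbled" to the end
        for j in range(n - 1 - i):
            if result[j] > result[j + 1]:
                result[j], result[j + 1] = result[j + 1], result[j]  # swap
    return result

print(bubble_sort([5, 2, 9, 1, 5, 6]))
```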
Problem 29: Longest Common Prefix
Topics Covered: Strings, Loops
Write a function longest_common_prefix that:
- Takes a list of strings as input.
- Returns the longest common prefix.
Hint:
Compare characters of strings at the same index until a mismatch is found.
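A sketch of the character-by-character comparison from the hint, using the first string as the reference:

```python
def longest_common_prefix(strings):
    """Return the longest prefix shared by every string in `strings`."""
    if not strings:
        return ""
    prefix = ""
    for i in range(len(strings[0])):
        char = strings[0][i]
        for s in strings[1:]:
            if i >= len(s) or s[i] != char:  # ran out of characters or found a mismatch
                return prefix
        prefix += char
    return prefix

print(longest_common_prefix(["flower", "flow", "flight"]))
```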
Problem 30: Number Guessing Game
Topics Covered: Randomization, Loops, Conditionals
Write a program to:
- Generate a random number between 1 and 100 (inclusive).
- Allow the user to guess the number.
- Give hints like "Too high" or "Too low" after each guess.
- Stop when the user guesses the number correctly.
Example Interaction:
I have selected a number between 1 and 100. Try to guess it!
Your guess: 50
Too high!
Your guess: 25
Too low!
Your guess: 30
Congratulations! You guessed the number in 3 attempts.
Hint:
- Use random.randint(1, 100) to generate a random number.
- Use a while loop to keep asking for guesses until the number is found.
- Keep a counter to track the number of attempts.
- Use the input() function to get the user’s input.
Example:
guess = input("Your guess: ")
But! The input() function always returns a string, so you’ll have to cast the guess to an integer before comparing it to the correct number. For example:
if "2" < 4:       # raises a TypeError (can't compare a str with an int)
if int("2") < 4:  # this is fine
So read the guess like this:
guess = int(input("Your guess: "))
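A possible sketch of the whole game. The secret and get_guess parameters are hypothetical additions that make it easy to test; by default it picks a random number and reads guesses with input():

```python
import random

def play(secret=None, get_guess=input):
    """Run one game; return the number of attempts it took."""
    if secret is None:
        secret = random.randint(1, 100)  # 1 and 100 are both possible
    print("I have selected a number between 1 and 100. Try to guess it!")
    attempts = 0
    while True:
        guess = int(get_guess("Your guess: "))  # input() returns a string
        attempts += 1
        if guess > secret:
            print("Too high!")
        elif guess < secret:
            print("Too low!")
        else:
            print(f"Congratulations! You guessed the number in {attempts} attempts.")
            return attempts

# play()
```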
Guided Tasks
Guided Task 1: DNA Analysis
You are given a DNA sequence. Your task is to analyze it step by step and perform various operations.
Step 1: Initialize Variables
- Create a variable dna and assign it a string of nucleotides (e.g., "ATGCCGTAGCTAAGTTCG").
- Create a variable complement and assign it an empty string ("").
Step 2: Build the Complementary Strand
- Use a for loop to iterate over each nucleotide in the dna sequence.
- For each nucleotide:
- Replace A with T, T with A, C with G, and G with C.
- Append the complementary nucleotide to the complement variable.
Hints:
- Use if and elif statements to check each nucleotide.
- Print the complement string after the loop.
Step 3: Slice the Sequence
- Use slicing to extract the first 10 nucleotides of the dna sequence.
- Print the sliced sequence.
Hint:
- Use slicing syntax: dna[start:end].
Step 4: Count Nucleotides
- Create variables count_a, count_t, count_g, and count_c and initialize them to 0.
- Use a for loop and conditionals to count the frequency of each nucleotide in the dna sequence.
- Store the counts in the variables and print them.
Step 5: Use Boolean Logic for Comparisons
- Check if the number of adenines (A) is greater than the number of thymines (T).
- Print "Adenines are more frequent" if true, otherwise print "Thymines are more frequent or equal".
Step 6: Split the Sequence into Codons
- Create an empty list variable codons.
- Use slicing to split the dna sequence into chunks of 3 nucleotides (codons).
- Append each codon to the codons list using .append().
- Print the codons list.
Hint:
- Use a for loop with a step size of 3 (range(start, end, step)) to slice the sequence into codons.
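Step 6 could look like the sketch below, using the example sequence from Step 1. If the sequence length isn't a multiple of 3, the last slice will just hold the leftover 1 or 2 bases:

```python
dna = "ATGCCGTAGCTAAGTTCG"
codons = []
for i in range(0, len(dna), 3):  # start at 0 and jump 3 positions each time
    codons.append(dna[i:i + 3])
print(codons)
```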
Guided Task 2: Number Analysis
Your task is to analyze a list of numbers and perform various operations.
Step 1: Create a List of Numbers
- Create a variable numbers and assign it a list of 20 integers (e.g., [12, 45, 67, 23, 89, ...]).
- Print the list to verify its contents.
Hint:
- Use square brackets [] to define the list.
- Use the print() function to display the list.
Step 2: Find and Print Basic Statistics
- Calculate and print the following statistics for the list:
- The maximum value.
- The minimum value.
- The average value.
- The sum of all numbers.
Hint:
- Use Python built-in functions like max(), min(), sum(), and len() to perform calculations.
- The average can be calculated as sum(numbers) / len(numbers).
Step 3: Separate Odd and Even Numbers
- Create two empty lists: odd_numbers and even_numbers.
- Use a for loop to iterate through the list numbers:
- Append odd numbers to odd_numbers.
- Append even numbers to even_numbers.
- Print both lists.
Hint:
- Use the modulo operator % to check if a number is odd or even.
Step 4: Sort the Numbers
- Sort the numbers list in ascending order.
- Sort the odd_numbers and even_numbers lists in descending order.
- Print the sorted lists.
Hint:
- Use the .sort() method for sorting; pass reverse=True (e.g., odd_numbers.sort(reverse=True)) for descending order.
Step 5: Print Multiples of 5
- Create a new list multiples_of_5.
- Use a for loop to add all numbers divisible by 5 from the numbers list to multiples_of_5.
- Print the list of multiples.
Hint:
- Use the modulo operator % to check divisibility.
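Steps 2 through 5 can be sketched like this. The 20 numbers are arbitrary example values, not data from the problem set:

```python
# Arbitrary example data: any 20 integers will do
numbers = [12, 45, 67, 23, 89, 34, 50, 71, 8, 95, 16, 40, 53, 77, 20, 61, 85, 3, 30, 58]

# Step 2: basic statistics
print("Max:", max(numbers))
print("Min:", min(numbers))
print("Average:", sum(numbers) / len(numbers))
print("Sum:", sum(numbers))

# Step 3: separate odd and even numbers
odd_numbers = []
even_numbers = []
for n in numbers:
    if n % 2 == 0:
        even_numbers.append(n)
    else:
        odd_numbers.append(n)
print(odd_numbers, even_numbers)

# Step 4: sort (sort() changes the list in place and returns None)
numbers.sort()
odd_numbers.sort(reverse=True)
even_numbers.sort(reverse=True)
print(numbers, odd_numbers, even_numbers)

# Step 5: multiples of 5
multiples_of_5 = []
for n in numbers:
    if n % 5 == 0:
        multiples_of_5.append(n)
print(multiples_of_5)
```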
Guided Task 3: Text Analysis
Your task is to analyze a block of text.
Step 1: Create a String
- Create a variable text and assign it a paragraph of text.
Example:
The Bernard S. and Sophie G. Gould MIT Summer Research Program in Biology (BSG-MSRP-Bio) is offered in collaboration with MIT's Department of Brain & Cognitive Sciences. The program provides a unique opportunity for students who do not have access to cutting-edge research facilities at their own institution to conduct supervised research in a fast-paced environment with state-of-the-art research facilities, and to experience first hand the academic, social, and cultural environment at MIT. The program is designed to encourage students from low income families, first-generation college students, students from socio-economically-disadvantaged backgrounds, veterans and students with disabilities to attend graduate school and pursue a career in basic research by providing them the opportunity to conduct supervised research in an outstanding research institution, in a supportive learning environment with plenty of interaction with graduate students and faculty. Over 85% of past participants have enrolled in highly ranked graduate programs within three years of completing this summer program. A number of our summer students have been awarded Goldwater Scholarships, pre-doctoral NSF fellowships (GRFP), or Howard Hughes Medical Institute (HHMI) Gilliam Fellowships for Advanced Study. Priority will be given to students studying at non-research intensive institutions, small colleges or public universities.
Step 2: Count Words and Characters
- Split the text into a list of words using the .split() method.
- Count the total number of words.
- Count the total number of characters, including spaces.
- Print the results.
Hint:
- Use len() to count words and characters.
Step 3: Find the Most Frequent Words
- Create a dictionary to store word frequencies.
- Use a for loop to iterate through the list of words:
- If the word is already in the dictionary, increment its count.
- If not, add it to the dictionary with a count of 1.
- Find and print the top 3 words with the highest frequency.
Hint:
- Use .items() to iterate through the dictionary.
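A minimal sketch of Step 3 on a short made-up sentence (swap in your own paragraph). Words are case-sensitive here and punctuation stays attached; use text.lower() and strip punctuation if you want "The" and "the." to count together:

```python
text = "the cat and the dog and the bird"  # replace with your own paragraph
words = text.split()

word_counts = {}
for word in words:
    if word in word_counts:
        word_counts[word] += 1
    else:
        word_counts[word] = 1

# sorted() on .items() gives (word, count) pairs; sort by count, highest first
top_3 = sorted(word_counts.items(), key=lambda item: item[1], reverse=True)[:3]
for word, count in top_3:
    print(word, count)
```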
Step 4: Reverse the Text
- Reverse the order of words in the text.
- Reverse each word in the text.
- Print both results.
Hint:
- Use slicing [::-1] to reverse strings and lists.
Step 5: Check for Palindromes
- Iterate through the list of words.
- Print all words that are palindromes.
Hint:
- A word is a palindrome if it reads the same backward and forward.
- Use slicing [::-1] to check if a word is equal to its reverse.
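Steps 4 and 5 can be sketched on a short made-up sentence. Comparing lower-cased words (so "Anna" counts as a palindrome) is an extra choice, and note that single letters like "a" count as palindromes too:

```python
text = "Anna saw a kayak at noon near the level road"
words = text.split()

# Step 4: reverse the word order, then reverse each word
reversed_order = " ".join(words[::-1])
reversed_words = []
for word in words:
    reversed_words.append(word[::-1])
print(reversed_order)
print(" ".join(reversed_words))

# Step 5: palindromes
palindromes = []
for word in words:
    if word.lower() == word.lower()[::-1]:
        palindromes.append(word)
print(palindromes)
```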
Guided Task 4: Gene Expression Analysis
You are given a CSV file gene_expression.csv containing sample gene expression data with the following columns:
- Gene (e.g., "BRCA1", "TP53")
- Sample_1, Sample_2, Sample_3 (expression levels in different samples)
Step 1: Load the Data
- Import Pandas.
- Load the CSV file into a Pandas DataFrame.
- Print the first few rows using .head().
Hint:
- Use pd.read_csv() to load the file.
Step 2: Explore the Data
- Use .info() to view column types and check for missing values.
- Use .describe() to get summary statistics for numeric columns.
Hint:
- Use .info() to check the structure, and .describe() to calculate statistics like mean, min, and max.
Step 3: Add New Columns
For each row (gene):
- Calculate the average expression across all samples and store it in a new column, Average_Expression.
- Calculate the minimum expression and maximum expression across all samples and store them in Min_Expression and Max_Expression columns, respectively.
Hint:
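- Select the sample columns first (e.g., df[["Sample_1", "Sample_2", "Sample_3"]]), then use the .mean(axis=1), .min(axis=1), and .max(axis=1) methods for row-wise calculations. Calling them on the whole DataFrame would also pick up the text Gene column (which can raise an error) and, for the min and max, the new Average_Expression column.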
Step 4: Filter the Data
- Keep only rows where Average_Expression > 5.
- Create a new DataFrame for the filtered data.
Hint:
- Use a condition like df[df["Average_Expression"] > 5] to filter the rows.
Step 5: Summarize the Filtered Data
- Print the overall mean, maximum, minimum, and standard deviation of the Average_Expression column.
Hint:
- Use .mean(), .max(), .min(), and .std() methods.
Step 6: Save the Filtered Data
- Save the filtered DataFrame to a new CSV file named filtered_genes.csv.
Hint:
- Use df.to_csv("filtered_genes.csv", index=False) to save the file.
Guided Task 5: GPA Calculator
You are given a CSV file student_grades.csv containing student grades in various subjects with the following columns:
- Name: Student names.
- Math, Science, English, History: Numeric grades for each subject (0-100).
Your task is to analyze the data and calculate letter grades and GPAs for each student. The task is divided into clear steps.
Step 1: Load the Data
- Import Pandas.
- Load the CSV file into a DataFrame.
- Print the first few rows using the .head() method to inspect the data.
Hints:
- Use pd.read_csv() to load the CSV file.
- Call df.head() to check the structure of the dataset.
Step 2: Explore the Data
- Use .info() to check column types and identify any missing values.
- Use .describe() to get summary statistics for the numeric columns.
Hints:
- .info() helps you understand the structure of the dataset.
- .describe() calculates statistics like mean, min, and max for numeric columns.
Step 3: Add Letter Grade Columns
For each subject (Math, Science, English, History), create a new column that contains the letter grade based on the numeric grade.
- Write a function get_letter_grade:
- Takes a numeric grade as input.
- Returns the letter grade:
- 90-100: A
- 80-89: B
- 70-79: C
- 60-69: D
- Below 60: F.
- Iterate through each subject:
- Use a for loop to go through the numeric grades in each column (e.g., Math).
- Append the letter grades to a new list for that subject.
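- Add the lists to the DataFrame as new columns named Math_Letter, Science_Letter, English_Letter, and History_Letter:
- Use df["ColumnName"] = list to add the new columns (e.g., df["Math_Letter"] = math_letters).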
Hints:
- Use basic conditionals (if, elif, else) in the function to determine the letter grade.
- Access a column using df["ColumnName"] and iterate over it using a for loop.
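A sketch of the Step 3 logic in plain Python. The helper letter_grades_for is hypothetical; iterating over a pandas column such as df["Math"] yields its values, so the same function works on a column:

```python
def get_letter_grade(grade):
    """Map a numeric grade (0-100) to a letter using the cut-offs above."""
    if grade >= 90:
        return "A"
    elif grade >= 80:
        return "B"
    elif grade >= 70:
        return "C"
    elif grade >= 60:
        return "D"
    else:
        return "F"

def letter_grades_for(column_values):
    """Convert an iterable of numeric grades (e.g. df["Math"]) to a list of letters."""
    letters = []
    for grade in column_values:
        letters.append(get_letter_grade(grade))
    return letters

# With the DataFrame from Step 1, the new columns could then be added like:
# for subject in ["Math", "Science", "English", "History"]:
#     df[subject + "_Letter"] = letter_grades_for(df[subject])
```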
Step 4: Calculate GPA
Add a new column GPA to calculate the GPA for each student based on their letter grades.
- Write a function calculate_gpa:
- Takes a list of letter grades as input.
- Maps letter grades to the GPA scale:
- A: 4.0, B: 3.0, C: 2.0, D: 1.0, F: 0.0.
- Loops through the list of letter grades, adds up the grade points, and returns the GPA (the total divided by the number of grades).
- Calculate GPA for each student:
- Iterate through the rows of the DataFrame.
- Pass the letter grades for all subjects (e.g., Math_Letter, Science_Letter) to the calculate_gpa function.
- Store the result in a list.
- Add the GPA list as a new column:
- Use df["GPA"] = list to add the GPA column to the DataFrame.
Hints:
- Use a for loop instead of list comprehension for clarity.
- Create the list of letter grades for each student by accessing the appropriate columns.
- Calculate the average GPA by dividing the total GPA by the number of grades.
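A minimal sketch of calculate_gpa. The GRADE_POINTS dictionary name is hypothetical; the commented pandas loop assumes the *_Letter columns from Step 3 exist:

```python
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def calculate_gpa(letter_grades):
    """Average the grade points for a list such as ["A", "B", "C", "A"]."""
    total = 0.0
    for letter in letter_grades:
        total += GRADE_POINTS[letter]
    return total / len(letter_grades)

# Looping over the DataFrame rows could then look like:
# gpas = []
# for _, row in df.iterrows():
#     letters = [row["Math_Letter"], row["Science_Letter"],
#                row["English_Letter"], row["History_Letter"]]
#     gpas.append(calculate_gpa(letters))
# df["GPA"] = gpas
```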
Step 5: Save the Updated Data
- Save the updated DataFrame with the new columns (Math_Letter, Science_Letter, etc., and GPA) to a new CSV file named updated_student_grades.csv.
Hints:
- Use df.to_csv("filename.csv", index=False) to save the file without the index column.