Georgina Woo

Instructions

This document contains a series of programming problems and guided tasks designed to help you practice and strengthen your Python/programming skills. Some of them use sample files so you can test your code. They’re hyperlinked in the problem description, but they can also be found here: Programming Problem Set Data

To simplify the problems, the data used mostly consists of dummy values.

The tasks are split into two types:

  1. Programming Problems (30 programming tasks):
  2. Guided Tasks (5 guided tasks):

If you get stuck:

Programming problems

Problem 1: Directory Listing with Flags


Topics Covered: File I/O, Loops, String Manipulation
Write a program that:

  1. Lists all files in the current directory.
  2. Flags directories with a star (*).

Example Output:

file1.txt  
file2.csv  
sample_data/ *  

Hint:


Problem 2: Word Frequency Counter


Topics Covered: Dictionaries, Loops
Write a function count_words that:

  1. Takes a string as input.
  2. Returns a dictionary with words as keys and their frequencies as values.

Example:

count_words("hello world hello")  
# Output: {'hello': 2, 'world': 1}
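
One possible approach, shown only as a sketch: split the string on whitespace and tally each word in a dictionary. It assumes words are separated by spaces and that punctuation and capitalization are left as they are.

```python
def count_words(text):
    """Return a dictionary mapping each word in text to its frequency."""
    counts = {}
    for word in text.split():  # split() with no arguments splits on any whitespace
        # get() returns 0 the first time a word is seen
        counts[word] = counts.get(word, 0) + 1
    return counts


print(count_words("hello world hello"))  # {'hello': 2, 'world': 1}
```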


Problem 3: String Lengths


Topics Covered: String functions, lists, for loops
Write and test a function lengths that:

Example:

lengths(['Ed', 'Ted', 'Fred', 'Jennifer'])
# Output: [2, 3, 4, 8]


Problem 4: Student Pass/Fail


Topics Covered: Functions, Conditionals
Write a function is_pass that:

  1. Takes a student's score as input.
  2. Returns "Pass" if the score is 50 or above, else "Fail".

Example:

is_pass(75)  # Output: "Pass"
is_pass(45)  # Output: "Fail"


Problem 5: Filter Even Numbers


Topics Covered: Functions, Lists
Write a function filter_even that:

  1. Takes a list of integers.
  2. Returns a new list containing only the even numbers.

Example:

filter_even([1, 2, 3, 4, 5])  
# Output: [2, 4]


Problem 6: What Problem Should I Solve Next?


Topics Covered: Python's random module
Write a program to:

1. Import the random library (e.g., import random).

2. Define the probabilities for choosing between guided tasks and programming problems. For example, you can set the probability of choosing a guided task to 0.2, and the other probability to 0.8.

3. Use the random.choices function to select either a guided task or a programming problem based on the defined probabilities.

4. If a guided task is selected, randomly choose a task between 1 and 5.

5. If a programming problem is selected, randomly choose a problem between 1 and 30, excluding problem 6.

Hint: Use a while loop to keep rolling until you get a random number that is not 6

6. Print the selected task or problem.

Example of using random.choices:

import random

problem_list = ['A', 'B', 'C', 'D']
weights = [0.1, 0.2, 0.3, 0.4]
# random.choices returns a list of k items, even when k=1
selected_choice = random.choices(problem_list, weights=weights, k=1)

Hint: Test out that example, and try changing k=2 and printing out selected_choice to see what the output of random.choices looks like.
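
The steps above could be put together roughly as follows. This is a sketch, not the only solution: the function name pick_next and the labels in kinds are made up for illustration, and the guided-task range of 1 to 5 is an assumption based on the five guided tasks in this document.

```python
import random


def pick_next(guided_weight=0.2, problem_weight=0.8):
    """Pick a task type by weight, then a task number within that type."""
    kinds = ["guided task", "programming problem"]
    # random.choices returns a list, so take its single element with [0]
    kind = random.choices(kinds, weights=[guided_weight, problem_weight], k=1)[0]
    if kind == "guided task":
        number = random.randint(1, 5)  # assumes 5 guided tasks
    else:
        number = random.randint(1, 30)
        while number == 6:  # keep rolling until the number is not 6
            number = random.randint(1, 30)
    return kind, number


kind, number = pick_next()
print("Next up:", kind, number)
```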


Problem 7: Save Shopping List


Topics Covered: Reading and Writing to Files
Write a program to:

  1. Take user input for a shopping list (e.g., "eggs,milk,bread").
  2. Save the list to a file named shopping_list.txt, with each item on a new line.

Hint:

You can get user input with the input() function, then call the string's .split(",") method to split it into a list of items.

Example:

userInput = input("Enter your shopping list separated by commas: ")
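
A minimal sketch of the saving step, assuming items are separated by commas and that stray spaces around each item should be dropped. The function name save_shopping_list and its path parameter are hypothetical; they are not part of the problem statement.

```python
def save_shopping_list(raw_items, path="shopping_list.txt"):
    """Write each comma-separated item in raw_items on its own line of path."""
    # Assumption: surrounding spaces and empty entries are discarded
    items = [item.strip() for item in raw_items.split(",") if item.strip()]
    with open(path, "w") as f:
        for item in items:
            f.write(item + "\n")
    return items
```

To connect it to the prompt above, pass the string returned by input() as raw_items.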


Problem 8: Word Scramble Puzzle


Topics Covered: Strings, Randomization, Loops
Write a program to:

  1. Make a list of unique words in Pride and Prejudice: pap.txt
  2. Randomly select words from the list.
  3. Scramble the letters in the word.
  4. Ask the user to guess the original word.

Example:

Scrambled word: "erivpetsarev"
Guess the original word: "preservative"
Correct!

Hint:

Example:

import random
word = "preservative"
letters = list(word)
random.shuffle(letters)
scrambled_word = ''.join(letters)
print(scrambled_word)  # Example output: "erivpetsarev"

Example:

userInput = input("Guess the original word: ")


Problem 9: File Analysis - Count Lines, Words, and Characters


Topics Covered: File I/O, Functions
Write a program that:

  1. Reads a file.
  2. Counts and prints the number of lines, words, and characters in the file.

Example Input File sample.txt

Hello, world!
Python is fun.

Expected Output:

Lines: 2
Words: 5
Characters: 27

Hint:

Try it on Pride and Prejudice: pap.txt
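
One way to sketch the counting, under the assumption that line-ending characters are not counted as characters (this is what makes the sample file come out to 27). The names analyze_text and analyze_file are hypothetical.

```python
def analyze_text(text):
    """Return (lines, words, characters) for a block of text."""
    lines = text.splitlines()
    words = text.split()
    # Assumption: newline characters are excluded from the character count
    characters = sum(len(line) for line in lines)
    return len(lines), len(words), characters


def analyze_file(path):
    """Read the file at path and return its (lines, words, characters)."""
    with open(path) as f:
        return analyze_text(f.read())


# The two lines of the sample file, written as a string
line_count, word_count, char_count = analyze_text("Hello, world!\nPython is fun.\n")
print("Lines:", line_count)
print("Words:", word_count)
print("Characters:", char_count)
```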

Problem 10: Average Grade


Topics Covered: Dictionaries, Loops, Functions, File I/O

Refer to student_grades.csv

Create a dictionary where keys are student names and values are lists of grades.

Hint to build the dictionary:
Read the file line by line. For each line, the format is [student name]: grade1, grade2

Use split(":") to separate the name from the grades, then split(",") on the grades to get the individual grades.


Write a function calculate_average that:

Hint:
Use a loop to calculate the sum of grades for each student and divide by the number of grades.
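
The two hints can be sketched as below. The sample lines are made up to match the format described above, parse_grades is a hypothetical helper name, and the assumption that calculate_average takes one student's list of grades is a guess at the intended signature.

```python
def parse_grades(lines):
    """Build {student name: [grades]} from lines like 'Name: 90, 85'."""
    grades = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        name, raw_grades = line.split(":")
        grades[name.strip()] = [float(g) for g in raw_grades.split(",")]
    return grades


def calculate_average(grade_list):
    """Return the mean of a non-empty list of grades (assumed signature)."""
    total = 0
    for grade in grade_list:
        total += grade
    return total / len(grade_list)


sample_lines = ["Alice: 90, 80", "Bob: 70, 75, 80"]  # made-up data
for name, grade_list in parse_grades(sample_lines).items():
    print(name, calculate_average(grade_list))
```

When reading from the real file, the lines can come straight from iterating over the open file object.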


Problem 11: Find the Maximum Grade


Topics Covered: Pandas
Write a program to:

  1. Load the CSV file student_grades.csv into a DataFrame.
  2. Sort the DataFrame by the Math grades, from highest to lowest.
     1. Use .sort_values() to sort using the Math column, with ascending=False.
  3. Print the names and Math scores of the 3 students with the highest Math grades.
     1. Recall: df.head(3) returns the first 3 rows, and df[["colname1", "colname2"]] selects only those 2 columns of a DataFrame.

Example output:

       Name  Math
5     Fiona    95
2   Charlie    92
18    Stacy    92


Problem 12: DNA Complement


Topics Covered: Strings, Loops, Functions

Write a function get_complement that:

Use the function to:

Hint:
Iterate through the DNA sequence using a loop. For each nucleotide, append its complement to a new string.
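
A sketch of the loop described in the hint, assuming the input contains only uppercase A, T, G, and C. The lookup dictionary COMPLEMENTS is an illustrative choice; a chain of if/elif statements would work just as well.

```python
# Standard base pairing: A pairs with T, G pairs with C
COMPLEMENTS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def get_complement(dna):
    """Return the complementary strand of an uppercase DNA sequence."""
    complement = ""
    for nucleotide in dna:
        complement += COMPLEMENTS[nucleotide]
    return complement


print(get_complement("ATGC"))  # TACG
```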


Problem 13: Nucleotide Count


Topics Covered: Dictionaries, Loops, Functions

Create a dictionary to store counts of nucleotides (A, T, G, C) in a DNA sequence.
Write a function count_nucleotides that:

Print the results.

Hint:
Use a loop to iterate through the sequence and update the counts in the dictionary.


Problem 14: GC Content


Topics Covered: Functions, Math Operators

Write a function gc_content that:

Hint:
Count the occurrences of G and C, divide by the total length, and multiply by 100 to get the percentage.
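
The calculation in the hint might look like this. It is a sketch that assumes an uppercase sequence and returns 0.0 for an empty string to avoid dividing by zero; that edge-case choice is not specified in the problem.

```python
def gc_content(dna):
    """Return the percentage of G and C nucleotides in dna."""
    if not dna:
        return 0.0  # assumption: an empty sequence has 0% GC content
    gc_count = dna.count("G") + dna.count("C")
    return gc_count / len(dna) * 100


print(gc_content("ATGC"))  # 50.0
```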


Problem 15: Reverse Complement


Topics Covered: Strings, Slicing, Functions

Write a function reverse_complement that:

Hint:
Use slicing to reverse the sequence, then use a loop to calculate the complement.

Recall:


Problem 16: Filter Long Sequences


Topics Covered: Lists, Loops, Conditionals, File I/O

Write a program to:

Hint:
Use len() to check sequence length and write() to save valid sequences.


Problem 17: Amino Acid Frequency


Topics Covered: Dictionaries, File I/O

Task:

  1. Read protein sequences from a file. Each line in the file includes a label indicating the source of the protein (e.g., HUMAN_HEMOGLOBIN, JELLYFISH_GFP) and the protein sequence.
  2. Count the frequency of each amino acid using a dictionary.
  3. Print the results, showing the source label and the amino acid counts.

Hint:

Input File protein_sequences.txt

Example output:

HUMAN_HEMOGLOBIN:
M 2
V 19
H 8
L 17
T 8
P 7
E 8
...


Problem 18: Palindrome Check


Topics Covered: Strings, Functions

Write a function is_palindrome that:

Hint:
Compare the string to its reverse using slicing.

Examples to test:

racecar, level, radar, hello, world, civic, deified, noon, python, madam, rotator, kayak, refer, palindrome, stats
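
A short sketch of the slicing comparison, assuming the check is case-sensitive and that the input is a single word with no spaces or punctuation to ignore.

```python
def is_palindrome(text):
    """Return True if text reads the same forwards and backwards."""
    return text == text[::-1]  # [::-1] produces the reversed string


for word in ["racecar", "hello", "noon", "python"]:
    print(word, is_palindrome(word))
```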


Problem 19: Translate Codons


https://www.hgvs.org/mutnomen/codon.html

Topics Covered: Dictionaries, Loops

Create a dictionary where keys are codons (e.g., ATG, TAA) and values are corresponding amino acids.
Write a program to:

Note: A duplicate codon appears in the text file.

Option 1: Overwrite the existing entry with the new value.
Option 2: Ignore the duplicate and retain the first occurrence.

Hint: To check whether a key is already in the dictionary, you can use the in keyword.

Example:

if key in dictionary:


Problem 20: Calculate Factorial


Topics Covered: Loops, Functions

Write a function factorial that:

Hint:
Use a loop to multiply numbers from 1 to the input value.

Example:

5! = 5 × 4 × 3 × 2 × 1 = 120
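
The loop from the hint can be sketched as follows, assuming the input is a non-negative integer (0! is 1 because the loop body never runs).

```python
def factorial(n):
    """Return n! for a non-negative integer n."""
    result = 1
    for i in range(1, n + 1):  # multiply by 1, 2, ..., n
        result *= i
    return result


print(factorial(5))  # 120
```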


Problem 21: Password Strength Checker


Topics Covered: Strings, Conditionals

Write a function check_password that:

Hint:
Use isupper(), islower(), and isdigit() to check conditions.


Problem 22: Fibonacci Sequence


Topics Covered: Loops, Functions

Write a function fibonacci that:

Example:
The Fibonacci sequence is a series of numbers where each number is the sum of the two preceding ones.

Hint:

Initialize the sequence as a list with the first two numbers.

Use a loop to calculate the next number in the sequence, and append it to the list.
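
A sketch following the hint. It assumes the function should return the first n numbers of the sequence and that the sequence starts with 0 and 1; if your version starts with 1 and 1, change the initial list.

```python
def fibonacci(n):
    """Return the first n Fibonacci numbers as a list (assumed to start 0, 1)."""
    sequence = [0, 1]
    while len(sequence) < n:
        # each new number is the sum of the two preceding ones
        sequence.append(sequence[-1] + sequence[-2])
    return sequence[:n]  # the slice handles n = 0 and n = 1


print(fibonacci(10))  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```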


Problem 23: Filter Odd Numbers


Topics Covered: Lists, File I/O

Write a program to:

Hint:

Use the modulo operator % to check if a number is odd. What do odd numbers all have in common when they’re divided by 2?


Problem 24: Reverse Words


Topics Covered: Strings, Loops, Slicing

Write a program to:

Hint:
You can use:


Problem 25: Compare Word Lengths in Two Books


Topics Covered: File I/O, Loops, String Manipulation
Write a program to determine which of two novels uses longer words on average.

  1. Read the text of Pride and Prejudice (pap.txt) and Huckleberry Finn (hf.txt).
  2. Calculate the average word length for each book.
  3. Print which book has longer words on average.

Hint:

Use division to calculate the average word length.

Problem 26: Count Unique Words


Topics Covered: Strings, Dictionaries

Write a program to:

Hint:
Use a dictionary to store each word as a key and its count as the value.


Problem 27: Codon Frequency


Topics Covered: Dictionaries, Loops

Write a program to:


Problem 28: Sorting Without Built-in Functions


Topics Covered: Lists, Loops

Write a program to:

Hint:
Use a nested loop to compare and swap elements.
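
One well-known way to use a nested loop of comparisons and swaps is bubble sort. The sketch below assumes ascending order and returns a sorted copy instead of changing the original list; both choices are assumptions, and the name bubble_sort is illustrative.

```python
def bubble_sort(values):
    """Return a new list with the items of values in ascending order."""
    items = list(values)  # copy so the caller's list is left unchanged
    n = len(items)
    for i in range(n - 1):
        # after each pass, the largest remaining item has moved to the end
        for j in range(n - 1 - i):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]  # swap neighbors
    return items


print(bubble_sort([5, 2, 9, 1, 5]))  # [1, 2, 5, 5, 9]
```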


Problem 29: Longest Common Prefix


Topics Covered: Strings, Loops

Write a function longest_common_prefix that:

Hint:
Compare characters of strings at the same index until a mismatch is found.
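
The index-by-index comparison in the hint can be sketched like this. It assumes the function takes a list of strings and returns an empty string when the list is empty or nothing is shared.

```python
def longest_common_prefix(strings):
    """Return the longest prefix shared by every string in the list."""
    if not strings:
        return ""
    prefix = ""
    shortest = min(len(s) for s in strings)  # no prefix can be longer than this
    for i in range(shortest):
        char = strings[0][i]
        for s in strings:
            if s[i] != char:
                return prefix  # stop at the first mismatch
        prefix += char
    return prefix


print(longest_common_prefix(["flower", "flow", "flight"]))  # fl
```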


Problem 30: Number Guessing Game


Topics Covered: Randomization, Loops, Conditionals

Write a program to:

  1. Generate a random number between 1 and 100 (inclusive).
  2. Allow the user to guess the number.
  3. Give hints like "Too high" or "Too low" after each guess.
  4. Stop when the user guesses the number correctly.

Example Interaction:

I have selected a number between 1 and 100. Try to guess it!  
Your guess: 50  
Too high!  
Your guess: 25  
Too low!  
Your guess: 30  
Congratulations! You guessed the number in 3 attempts.

Hint:

Example:

guess = input("Your guess: ")

But! The input function takes data in as a string, so you’ll have to cast the guess to an integer before trying to compare it to the correct number. For example,

if "2" < 4:  # causes a TypeError
if int("2") < 4:  # this is fine

guess = int(input("Your guess: "))
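
Putting the pieces together, the game loop might be sketched as below. The function name play and its secret and read parameters are hypothetical; they exist only so the loop can be tried with a fixed number or scripted guesses. Non-numeric input is not handled.

```python
import random


def play(secret=None, read=input):
    """Run the guessing loop and return the number of attempts taken."""
    if secret is None:
        secret = random.randint(1, 100)  # randint includes both 1 and 100
    print("I have selected a number between 1 and 100. Try to guess it!")
    attempts = 0
    while True:
        guess = int(read("Your guess: "))  # cast the string to an integer
        attempts += 1
        if guess > secret:
            print("Too high!")
        elif guess < secret:
            print("Too low!")
        else:
            print(f"Congratulations! You guessed the number in {attempts} attempts.")
            return attempts
```

Calling play() with no arguments starts an interactive game.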

Guided Tasks

Guided Task 1: DNA Analysis


You are given a DNA sequence. Your task is to analyze it step by step and perform various operations.

Step 1: Initialize Variables

  1. Create a variable dna and assign it a string of nucleotides (e.g., "ATGCCGTAGCTAAGTTCG").
  2. Create a variable complement and assign it an empty string ("").

Step 2: Build the Complementary Strand

  1. Use a for loop to iterate over each nucleotide in the dna sequence.
  2. For each nucleotide:
     1. Append the complementary nucleotide to the complement variable.

Hints:

Step 3: Slice the Sequence

  1. Use slicing to extract the first 10 nucleotides of the dna sequence.
  2. Print the sliced sequence.

Hint:

Step 4: Count Nucleotides

  1. Create variables count_a, count_t, count_g, and count_c and initialize them to 0.
  2. Use a for loop and conditionals to count the frequency of each nucleotide in the dna sequence.
  3. Store the counts in the variables and print them.

Step 5: Use Boolean Logic for Comparisons

  1. Check if the number of adenines (A) is greater than the number of thymines (T).
  2. Print "Adenines are more frequent" if true, otherwise print "Thymines are more frequent or equal".
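
Steps 4 and 5 could be sketched as follows, using the example sequence from Step 1. This is one possible layout of the loop and conditionals, not the only one.

```python
dna = "ATGCCGTAGCTAAGTTCG"

# Step 4: one counter per nucleotide
count_a = 0
count_t = 0
count_g = 0
count_c = 0
for nucleotide in dna:
    if nucleotide == "A":
        count_a += 1
    elif nucleotide == "T":
        count_t += 1
    elif nucleotide == "G":
        count_g += 1
    elif nucleotide == "C":
        count_c += 1
print("A:", count_a, "T:", count_t, "G:", count_g, "C:", count_c)

# Step 5: compare adenine and thymine counts
if count_a > count_t:
    print("Adenines are more frequent")
else:
    print("Thymines are more frequent or equal")
```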

Step 6: Split the Sequence into Codons

  1. Create an empty list variable codons.
  2. Use slicing to split the dna sequence into chunks of 3 nucleotides (codons).
  3. Append each codon to the codons list using .append().
  4. Print the codons list.
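
A sketch of the slicing loop, using the example sequence from Step 1. It steps through the string three positions at a time; if the length were not a multiple of 3, the last chunk would simply be shorter.

```python
dna = "ATGCCGTAGCTAAGTTCG"

codons = []
for start in range(0, len(dna), 3):  # start = 0, 3, 6, ...
    codons.append(dna[start:start + 3])
print(codons)  # ['ATG', 'CCG', 'TAG', 'CTA', 'AGT', 'TCG']
```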

Hint:


Guided Task 2: Number Analysis


Your task is to analyze a list of numbers and perform various operations.

Step 1: Create a List of Numbers

  1. Create a variable numbers and assign it a list of 20 integers (e.g., [12, 45, 67, 23, 89, ...]).
  2. Print the list to verify its contents.

Hint:

Step 2: Find and Print Basic Statistics

  1. Calculate and print the following statistics for the list:

Hint:

Step 3: Separate Odd and Even Numbers

  1. Create two empty lists: odd_numbers and even_numbers.
  2. Use a for loop to iterate through the list numbers:
  3. Print both lists.

Hint:

Step 4: Sort the Numbers

  1. Sort the numbers list in ascending order.
  2. Sort the odd_numbers and even_numbers lists in descending order.
  3. Print the sorted lists.

Hint:

Step 5: Print Multiples of 5

  1. Create a new list multiples_of_5.
  2. Use a for loop to add all numbers divisible by 5 from the numbers list to multiples_of_5.
  3. Print the list of multiples.

Hint:


Guided Task 3: Text Analysis


Your task is to analyze a block of text.

Step 1: Create a String

  1. Create a variable text and assign it a paragraph of text

Example:

The Bernard S. and Sophie G. Gould MIT Summer Research Program in Biology (BSG-MSRP-Bio) is offered in collaboration with MIT's Department of Brain & Cognitive Sciences. The program provides a unique opportunity for students who do not have access to cutting-edge research facilities at their own institution to conduct supervised research in a fast-paced environment with state-of-the-art research facilities, and to experience first hand the academic, social, and cultural environment at MIT.
The program is designed to encourage students from low income families, first-generation college students, students from socio-economically-disadvantaged backgrounds, veterans and students with disabilities to attend graduate school and pursue a career in basic research by providing them the opportunity to conduct supervised research in an outstanding research institution, in a supportive learning environment with plenty of interaction with graduate students and faculty. Over 85% of past participants have enrolled in highly ranked graduate programs within three years of completing this summer program. A number of our summer students have been awarded Goldwater Scholarships, pre-doctoral NSF fellowships (GRFP), or Howard Hughes Medical Institute (HHMI) Gilliam Fellowships for Advanced Study.
Priority will be given to students studying at non-research intensive institutions, small colleges or public universities.

Step 2: Count Words and Characters

  1. Split the text into a list of words using the .split() method.
  2. Count the total number of words.
  3. Count the total number of characters, including spaces.
  4. Print the results.

Hint:

Step 3: Find the Most Frequent Words

  1. Create a dictionary to store word frequencies.
  2. Use a for loop to iterate through the list of words:
  3. Find and print the top 3 words with the highest frequency.

Hint:

Step 4: Reverse the Text

  1. Reverse the order of words in the text.
  2. Reverse each word in the text.
  3. Print both results.

Hint:

Step 5: Check for Palindromes

  1. Iterate through the list of words.
  2. Print all words that are palindromes.

Hint:


Guided Task 4: Gene Expression Analysis


You are given a CSV file gene_expression.csv containing sample gene expression data with the following columns:

Step 1: Load the Data

  1. Import Pandas.
  2. Load the CSV file into a Pandas DataFrame.
  3. Print the first few rows using .head().

Hint:

Step 2: Explore the Data

  1. Use .info() to view column types and check for missing values.
  2. Use .describe() to get summary statistics for numeric columns.

Hint:

Step 3: Add New Columns

For each row (gene):

  1. Calculate the average expression across all samples and store it in a new column, Average_Expression.
  2. Calculate the minimum expression and maximum expression across all samples and store them in Min_Expression and Max_Expression columns, respectively.

Hint:


Step 4: Filter the Data

  1. Keep only rows where Average_Expression > 5.
  2. Create a new DataFrame for the filtered data.

Hint:

Step 5: Summarize the Filtered Data

  1. Print the overall mean, maximum, minimum, and standard deviation of the Average_Expression column.

Hint:

Step 6: Save the Filtered Data

  1. Save the filtered DataFrame to a new CSV file named filtered_genes.csv.

Hint:


Guided Task 5: GPA Calculator


You are given a CSV file student_grades.csv containing student grades in various subjects with the following columns:

Your task is to analyze the data and calculate letter grades and GPAs for each student. The task is divided into clear steps.

Step 1: Load the Data

  1. Import Pandas.
  2. Load the CSV file into a DataFrame.
  3. Print the first few rows using the .head() method to inspect the data.

Hints:

Step 2: Explore the Data

  1. Use .info() to check column types and identify any missing values.
  2. Use .describe() to get summary statistics for the numeric columns.

Hints:


Step 3: Add Letter Grade Columns

For each subject (Math, Science, English, History), create a new column that contains the letter grade based on the numeric grade.

  1. Write a function get_letter_grade:
  2. Iterate through each subject:
  3. Add the lists to the DataFrame as new columns:

Hints:

Step 4: Calculate GPA

Add a new column GPA to calculate the GPA for each student based on their letter grades.

  1. Write a function calculate_gpa:
  2. Calculate GPA for each student:
  3. Add the GPA list as a new column:

Hints:

Step 5: Save the Updated Data

  1. Save the updated DataFrame with the new columns (Math_Letter, Science_Letter, etc., and GPA) to a new CSV file named updated_student_grades.csv.

Hints: