In [2]:

# /// script
# requires-python = ">=3.10"
# ///

# Standard library imports (no need to declare in dependencies)
import random
import statistics as stats
from datetime import date

Intro to Python – Global BioImaging Data Course - Pune, India¶

Welcome to your first step in Python!
This notebook is written like a small interactive book: you will read, explore, and do.

Who is this for?
Anyone who has Python or Jupyter installed and wants to move from “I can run cells” to “I can write programs.”

Why Python?

Readability first – its syntax looks like pseudocode.

A giant ecosystem – from data science (pandas, scipy) to machine learning (scikit-learn, pytorch, tensorflow) to hardware (pymmcore).

Batteries included – the standard library gives you file I/O, HTTP clients, math, testing, and more.

In bioimage analysis, Python helps automate tasks like:

loading and processing large image datasets,

applying filters and segmentations,

extracting features, and

visualizing results.

You will learn core building blocks:

Chapter	Concept	Why it matters
0	Introduction	Python, environment, and Jupyter notebooks
1	Variables and data types	Store and label information so programs remember things
2	Data structures	Understand the different types of data in Python
3	Functions	Package logic into reusable, testable pieces
4	Control Flow with `if`/`else`	Make decisions and branch logic
5	Control Flow with `for`	Repeat work without copy‑pasting
6	Mini project	Your turn!

Each chapter has:

Narrative explanation – read this like a textbook.
Live demo – run and play.
Exercise – your turn, guess the output! & mini projects ✅

0. Python, environment, and Jupyter notebooks¶

Python is a general-purpose programming language.¶

When choosing a programming language, context matters. Here's how Python stacks up against R and Java in bioimage analysis and scientific computing:

Feature	Python	R	Java
Learning Curve	Gentle – readable syntax, beginner-friendly	Steep for programming, but easy for statistics	Steep – verbose syntax, strong typing
Primary Strengths	General-purpose, excellent for data analysis, scripting, and automation	Specialized for statistics and plotting	Fast performance, robust for large systems
Bioimage Support	Strong (`scikit-image`, `napari`, `cellpose`, etc.)	Limited; mostly via third-party packages or Python bridges	Used in tools like ImageJ/Fiji, but not for prototyping
Speed	Fast enough for most tasks; easy to optimize	Slower; often relies on calling C/C++ under the hood	High performance; suitable for computationally intensive tasks
Community & Ecosystem	Massive, with libraries in AI, biology, and automation	Strong in statistics and epidemiology	Strong in engineering and enterprise
Use Case Fit	Ideal for scripting analysis pipelines and integrating tools	Great for exploratory statistics and quick plots	Better for building plugins or standalone software tools

In summary:

Python strikes a balance between ease of use and power. If you want to explore data, prototype tools, or glue systems together, Python excels.
R is great when your work is statistical in nature.
Java is more suited to performance-critical or plugin-based environments like ImageJ.

What is a Python Environment?¶

Analogy: Imaging Setup

Think of a Python environment as a virtual imaging setup:

Your microscope needs a specific objective, filters, and settings to work for a particular experiment.
Similarly, your Python project needs certain packages, tools, and versions to run correctly.

Environments help keep these project-specific tools isolated and clean, so:

Project A (image segmentation) doesn't break when you install tools for Project B (deep learning).

How to Create an Environment (Optional)

Not covered in this course, because we are using Google Colab.

Using uv:

uv venv --python 3.10
uv pip install package1 package2

Using conda:

conda create --name bioimage-env python=3.10
conda activate bioimage-env

Note: uv is usually much faster than conda or pip at resolving and installing packages.

Note: a requirements.txt file is typically used to list the dependencies of a project. uv can install them into an environment with uv pip install -r requirements.txt.

Using Jupyter Notebooks¶

What is Jupyter?

Jupyter Notebooks are like lab notebooks for code:

You write and test code in cells.
You can mix code, text, images, and results in one document.
It's a quick way to prototype and share code, but a notebook is not production code!

How to Open Jupyter?

Using Anaconda Navigator:

Click on "Launch" under Jupyter Notebook.

Using Terminal:

jupyter notebook

1. Variables and primitive data types¶

Concept.
A variable is a labeled box that can hold any Python object.
Because Python is dynamically typed, the label does not declare a type – the object itself knows its type.

┌─────────────────────────────┐
│ label: `pixel_intensity`    │
└──────────────┬──────────────┘
               ↓
     ┌────────────────────┐
     │ object: 3883.03     │
     │        (float)      │
     └────────────────────┘

Naming matters¶

Use lowercase_with_underscores (number_of_cells or cell_number).
Be descriptive: temperature_c is a better name than t. Code is read by humans far more than by machines.

Mutability & Identity¶

Variables can be rebound:

pixel_intensity = 3883.03
pixel_intensity = "high_intensity"  # ↓ the label now points elsewhere and to another type!

But some objects themselves can change (lists) – we call this mutability.

Concept. Python's primitive (built‑in) data types are:

int, float – numbers
- int: Whole numbers like -1, 0, 42 (e.g. z_slice = 30)
- float: Decimal numbers like 3.14, -0.001 (e.g. pixel_size = 0.25)
str – text
- Sequences of characters in single/double quotes
- Examples: "hello", 'world', "123", "GFP.tif"
bool – truth values (True, False)
- Used for logical operations and control flow
- Result of comparisons like ==, >, <
- Examples: True, False (e.g. is_segmented = True)
None – explicit "nothing"
- Represents absence of a value
- Common default return value for functions

Example:

In [ ]:
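
The example cell is empty in this version, so here is a minimal sketch of what it might contain. The variable names follow the examples in the list above; `file_name` and `mask` are assumptions added for illustration.

```python
# One variable per primitive type (values are illustrative)
z_slice = 30            # int: a whole number
pixel_size = 0.25       # float: a decimal number
file_name = "GFP.tif"   # str: text in quotes
is_segmented = True     # bool: a truth value
mask = None             # None: no value yet

# type() reports the type of the object a label points to
print(type(z_slice))       # <class 'int'>
print(type(pixel_size))    # <class 'float'>
print(type(file_name))     # <class 'str'>
print(type(is_segmented))  # <class 'bool'>
print(type(mask))          # <class 'NoneType'>
```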

✍️ Exercise¶

Create two string variables:

channel_name (the name of your favorite imaging channel) e.g. CY5
stain_name (the name of your favorite stain) e.g. Ki67

Then print: e.g. “Ki67 is imaged using the CY5 channel.”

Hint: Use an f-string. An f-string is a way to embed variables inside a string literal using curly braces {}.
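
To show the f-string syntax without giving away the exercise, here is a small sketch with different, made-up variables (`objective` and `exposure_ms` are assumptions for illustration only):

```python
objective = "40x"    # hypothetical example values
exposure_ms = 100

# The f prefix tells Python to replace {name} with the value of that variable
message = f"The {objective} objective was used with {exposure_ms} ms exposure."
print(message)  # The 40x objective was used with 100 ms exposure.
```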

In [ ]:

2. Data Structures¶

Containers – list, tuple, set, and dict
- list: Ordered, mutable sequences [1, 2, 3]
- tuple: Ordered, immutable sequences (1, 2, 3)
- set: Unordered collection of unique items {1, 2, 3}
- dict: Key-value pairs {"a": 1, "b": 2}

Example:

In [ ]:
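
A possible version of this example, as a sketch: one small container of each kind. All names and values except `channel_colors` are assumptions chosen for illustration.

```python
channels = ["DAPI", "GFP", "DAPI"]               # list: ordered, duplicates allowed
image_shape = (512, 512)                         # tuple: ordered, cannot be changed
unique_channels = set(channels)                  # set: duplicates are dropped
channel_colors = {"GFP": "green", "CY5": "red"}  # dict: key -> value

print(len(channels))          # 3
print(len(unique_channels))   # 2
print(image_shape[0])         # 512
print(channel_colors["GFP"])  # green
```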

Lists – Ordered Collections: A list holds a collection of items like a channel stack.

Example:

In [ ]:
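
Since the example cell is empty here, the following sketch shows the most common list operations on a hypothetical channel stack (the channel names are assumptions):

```python
channel_stack = ["DAPI", "GFP", "CY5"]

print(channel_stack[0])    # DAPI  (indexing starts at 0)
print(channel_stack[-1])   # CY5   (negative indices count from the end)

channel_stack.append("mCherry")   # add an item at the end
channel_stack[1] = "YFP"          # replace an item in place

print(len(channel_stack))  # 4
print(channel_stack)       # ['DAPI', 'YFP', 'CY5', 'mCherry']
```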

Tuples – Fixed-size Groupings: Tuples are like lists but immutable (they can’t be changed after creation). They are useful for storing values that belong together and should stay fixed, e.g. an image shape.

Example:

In [ ]:

Used for pixel dimensions, coordinates, etc.
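
A minimal sketch of working with a tuple, assuming a hypothetical image shape given as (height, width):

```python
image_shape = (1024, 2048)   # (height, width), illustrative values

# Unpacking assigns each element of the tuple to its own variable
height, width = image_shape
print(height * width)  # 2097152 pixels in total

# Tuples are immutable: trying to change an element raises a TypeError
try:
    image_shape[0] = 512
except TypeError as error:
    print("Tuples cannot be modified:", error)
```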

A dictionary is an associative array (hash map) that maps keys → values. Hash maps are a fundamental data structure in computer science; Python's built-in implementation is the dict.

channel_colors = {"GFP": "green", "CY5": "red"}

Dictionaries map keys to values, like an image’s metadata.

Example:

In [ ]:
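
One way this example could look, as a sketch: a dictionary holding the metadata of a single image. The keys are assumptions, and the values are borrowed from the exercise further below.

```python
image_info = {"name": "embryo_02.tif", "z_planes": 40, "pixel_spacing": 0.32}

# Look up a value by its key
print(image_info["z_planes"])  # 40

# Add a new key-value pair
image_info["is_noise_filtered"] = False

# .get() returns a default instead of raising an error when the key is missing
print(image_info.get("objective", "unknown"))  # unknown

print(list(image_info.keys()))
# ['name', 'z_planes', 'pixel_spacing', 'is_noise_filtered']
```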

✍️ Exercise: guess the output!¶

Predict what will be printed:

nums = [1, 2, 3]
alias = nums
alias.append(4)
print('nums:', nums)
print('alias:', alias)

Will the two lists differ? Why/why not? Is 4 added to the beginning of the list?

In [ ]:

Follow up: what if we use nums = alias instead of alias = nums?

In [ ]:

Follow up: how to get the number of elements in nums?

Hint: use the len function.

In [ ]:

✍️ Exercise: your turn!¶

Create and print variables:

In [ ]:

sample_name = "embryo_02.tif"
z_planes = 40
pixel_spacing = 0.32
is_noise_filtered = False

# print the depth of the imaged sample

Work with a list:

In [ ]:

fluorophores = ["Hoechst", "GFP", "mCherry"]

# print the third fluorophore in the list

# add a new fluorophore to the list and print the updated list

# remove the first fluorophore from the list and print the updated list
# Hint: use the `pop` method to remove the fluorophore at index 0

# remove the fluorophore at index 1 and print the updated list

3. Functions ‑ Reusable verbs¶

Concept.
A function groups statements, giving them a name, inputs (parameters), and output (return value).

Syntax of a function definition:

def function_name(parameters):
    """Docstring"""
    return value

Then call the function with function_name(arguments).

Note: A parameter is a variable named in the function or method definition. It acts as a placeholder for the data the function will use. An argument is the actual value that is passed to the function or method when it is called.

Why it matters:

Reuse – write once, call everywhere.
Testing – functions are the unit of testability.
Abstraction – hide complexity behind a simple interface.

Docstrings become the function’s documentation (try running help(fahrenheit_to_celsius) to see it). It's a good practice to include a docstring for every function you write, as it helps you and others understand what the function does.
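
The text mentions `fahrenheit_to_celsius`, but its definition is not shown in this version. A minimal sketch of such a function, assuming the standard conversion formula (the parameter name `temp_f` is an assumption):

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9

print(fahrenheit_to_celsius(212))  # 100.0 (boiling point of water)
print(fahrenheit_to_celsius(32))   # 0.0   (freezing point of water)

# The docstring is stored on the function; help() displays the same text
print(fahrenheit_to_celsius.__doc__)
```

Here `temp_f` is the parameter, while `212` and `32` are the arguments passed in each call.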

Example:

In [2]:

38.46153846153847

✍️ Exercise: guess the output!¶

Predict what will be printed:

def foo(base):
    """What does this function do?"""
    base_map = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return base_map[base]

def foofoo(triplet):
    """What does this function do?"""
    return foo(triplet[0]) + foo(triplet[1]) + foo(triplet[2])

dna_list = ["GTA", "ACC", "TTT"]

result1 = foofoo(dna_list[0])
result2 = foofoo(dna_list[1])
result3 = foofoo("CGT")

print(result1)
print(result2)
print(result3)

What does the function do to DNA codons?

Output:

CAT
TGG
GCA

✍️ Exercise: your turn!¶

Write a function bmi(weight_kg, height_m) that returns the Body‑Mass Index, rounded to 1 decimal.
Then call it with (70 kg, 1.75 m).

Hint: use the round(value, ndigits) function.

In [ ]:

# write your code here

def bmi(weight_kg, height_m):
    return round(weight_kg / (height_m ** 2), 1)

print(bmi(70, 1.75))

4. If/Else ‑ Flow... decisions...¶

Concept: Control Flow

Control flow statements allow your program to make decisions and branch into different paths depending on conditions.

These statements let your code respond to data — like a GPS recalculating your route based on traffic or wrong turns.

Key Keywords

if: the primary gate — only runs the code block if the condition is True
elif: (else if) — test an additional condition if the previous one was False
else: fallback — runs only if all above conditions are False

How it works

Example:

In [ ]:

Moderately bright image
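
The cell that produced the output above is not shown in this version. As a sketch, an if/elif/else chain like the following would print the same message. The variable name and both thresholds are assumptions, not the original values.

```python
mean_intensity = 120   # hypothetical mean intensity of an 8-bit image

# Conditions are checked from top to bottom; only the first match runs
if mean_intensity > 200:
    label = "Very bright image"
elif mean_intensity > 100:
    label = "Moderately bright image"
else:
    label = "Dim image"

print(label)  # Moderately bright image
```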

Truthiness in Python

In Python, not just True and False matter — any object can be evaluated in a boolean context:

Value	Boolean Equivalent
`0`, `0.0`, `''`, `[]`, `{}`	`False`
Non‑zero numbers, non‑empty strings/lists	`True`

if []:
    print("This won't run.")
if [1, 2, 3]:
    print("This will!")  # Lists with items are truthy
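
To check the truthiness of any value yourself, you can pass it to the built-in bool(). A short sketch follows; `detected_cells` is a made-up variable. Note that None is also falsy.

```python
print(bool(0))      # False
print(bool(""))     # False
print(bool([]))     # False
print(bool(None))   # False
print(bool(-0.5))   # True  (any non-zero number)
print(bool("GFP"))  # True  (any non-empty string)
print(bool([0]))    # True  (the list is non-empty, even though it holds a 0)

# A common use: react to an empty result
detected_cells = []   # hypothetical result of a detection step
if not detected_cells:
    print("No cells detected.")
```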

✍️ Exercise: your turn!¶

Write a function that classifies cells based on their size and intensity.

The function should take two arguments:

size: the size of the cell (in µm²)
intensity: the intensity of the cell (in a.u., a fluorescence unit)

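The function should return one of 4 possible outputs (the test inputs below are consistent with calling a cell large when size > 100 and fluorescent when intensity > 25, the same thresholds used in the next exercise):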

"Large & Active" if the cell is both large and fluorescent
"Large & Inactive" if the cell is large but not fluorescent
"Small & Active" if the cell is small and fluorescent
"Small & Inactive" if the cell is small and not fluorescent

Try running the function with the following inputs:

print(classify_cell(120, 50))   # → Large & Active
print(classify_cell(50, 0.3))    # → Small & Inactive
print(classify_cell(130, 12))   # → Large & Inactive
print(classify_cell(80, 75))    # → Small & Active

Hint: use the if/elif/else structure to check the conditions.

In [ ]:

✍️ Exercise: guess the output!¶

Predict what will be printed:

def special_cell_classifier(size, intensity, roundness):
    """What does this function do?"""
    if size > 100 and intensity > 25:
        return "Proliferating"
    elif size <= 100 and roundness > 0.85:
        return "Resting"
    elif intensity < 0.2 or roundness < 0.2:
        return "Likely debris"
    else:
        size_label = "Large" if size > 100 else "Small"
        activity_label = "Active" if intensity > 25 else "Inactive"
        shape_label = "Round" if roundness > 0.85 else "Irregular"
        return size_label + " & " + activity_label + " & " + shape_label

What will the following code print?

print(special_cell_classifier(120, 50, 0.9))
print(special_cell_classifier(50, 0.3, 0.2))
print(special_cell_classifier(130, 0.4, 0.2))
print(special_cell_classifier(80, 125, 0.85))

Output:

Proliferating
Small & Inactive & Irregular
Large & Inactive & Irregular
Small & Active & Irregular

5. For Loops ‑ Repetition made easy¶

Concept.
for loops iterate over iterables: lists, strings, ranges, files, generators…

Why loops matter:

Automate repetition.
Enable algorithms like searching and aggregation.

Pythonic looping favors iterating directly over the items rather than over their indices:

Example:

In [ ]:
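
A sketch of what this example might show, using a hypothetical list of channel names: looping over items directly, using enumerate() when the position is also needed, and using range() to repeat something a fixed number of times.

```python
channels = ["DAPI", "GFP", "CY5"]

# Iterate directly over the items
for channel in channels:
    print(channel)

# enumerate() yields (index, item) pairs when the position is needed too
for index, channel in enumerate(channels):
    print(index, channel)

# range(3) produces 0, 1, 2
total = 0
for z in range(3):
    total += z
print(total)  # 3
```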

✍️ Exercise: guess the output!¶

Predict what will be printed:

def foo(lst):
    """What does this function do?"""
    new_lst = []
    for i in range(len(lst)-1, -1, -1):
        new_lst.append(lst[i])
    return new_lst

numbers = [1, 2, 3, 4, 5]
print(f"Original list: {numbers}")

new_numbers = foo(numbers)
print(f"New list: {new_numbers}")

Output:

Original list: [1, 2, 3, 4, 5]
New list: [5, 4, 3, 2, 1]

Think about what the function does. How is the output achieved with the for loop?

✍️ Exercise: your turn!¶

Write a loop that goes through:

images = {'img1.tif': 1000, 'img2.tif': 2240, 'img3.tif': 3000}

and processes each image by checking whether it is a large image.

Hint: use a for loop to iterate over the dictionary, and use the items() method to get the key-value pairs.
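
As a sketch of how items() works, here it is applied to the channel_colors dictionary from chapter 2 rather than to the exercise data:

```python
channel_colors = {"GFP": "green", "CY5": "red"}

# items() yields one (key, value) pair per entry; the loop unpacks each pair
for channel, color in channel_colors.items():
    print(f"{channel} is displayed in {color}")
# GFP is displayed in green
# CY5 is displayed in red
```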

In [ ]:

6. Mini project: organize your image files¶

Let's create a program that analyzes metadata from microscopy images to help organize and validate your dataset.

You'll work with a dictionary of image metadata containing:

Image names
Microscope settings (magnification, exposure time)
Sample information (cell type, staining)

Tasks:

Create a dictionary of image metadata -> done
Write functions to:
- Check if exposure times are within acceptable range
- Group images by cell type
- Calculate average exposure time per magnification
Use loops and conditionals to process the metadata
Print summary statistics about your dataset

In [1]:

# dictionary of image metadata
image_metadata = {
    'img1.tif': {
        'magnification': 40,
        'exposure_time': 100,
        'cell_type': 'neuron',
        'staining': 'DAPI'
    },
    'img2.tif': {
        'magnification': 60,
        'exposure_time': 150,
        'cell_type': 'astrocyte', 
        'staining': 'GFP'
    },
    'img3.tif': {
        'magnification': 40,
        'exposure_time': 200,
        'cell_type': 'neuron',
        'staining': 'DAPI'
    }
}

# solution to the mini project

Checking exposure times:
img1.tif has valid exposure time: 100
img2.tif has valid exposure time: 150
img3.tif has valid exposure time: 200

Images grouped by cell type:
neuron: ['img1.tif', 'img3.tif']
astrocyte: ['img2.tif']

Average exposure times by magnification:
40x: 150.0ms
60x: 150.0ms
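
The solution code itself is not included in this version. The following is one possible sketch that reproduces the output above. The function names and the acceptable exposure range (50 to 250 ms) are assumptions, since the original range is not stated.

```python
# Same metadata as in the cell above, repeated so this sketch runs on its own
image_metadata = {
    'img1.tif': {'magnification': 40, 'exposure_time': 100,
                 'cell_type': 'neuron', 'staining': 'DAPI'},
    'img2.tif': {'magnification': 60, 'exposure_time': 150,
                 'cell_type': 'astrocyte', 'staining': 'GFP'},
    'img3.tif': {'magnification': 40, 'exposure_time': 200,
                 'cell_type': 'neuron', 'staining': 'DAPI'},
}

# Assumed acceptable exposure range in ms (not given in the original)
MIN_EXPOSURE_MS = 50
MAX_EXPOSURE_MS = 250


def is_valid_exposure(exposure_time):
    """Return True if the exposure time lies within the assumed range."""
    return MIN_EXPOSURE_MS <= exposure_time <= MAX_EXPOSURE_MS


def group_by_cell_type(metadata):
    """Map each cell type to the list of image names of that type."""
    groups = {}
    for name, info in metadata.items():
        groups.setdefault(info['cell_type'], []).append(name)
    return groups


def average_exposure_by_magnification(metadata):
    """Map each magnification to the mean exposure time of its images."""
    exposures = {}
    for info in metadata.values():
        exposures.setdefault(info['magnification'], []).append(info['exposure_time'])
    return {mag: sum(times) / len(times) for mag, times in exposures.items()}


print("Checking exposure times:")
for name, info in image_metadata.items():
    status = "valid" if is_valid_exposure(info['exposure_time']) else "invalid"
    print(f"{name} has {status} exposure time: {info['exposure_time']}")

print("\nImages grouped by cell type:")
for cell_type, names in group_by_cell_type(image_metadata).items():
    print(f"{cell_type}: {names}")

print("\nAverage exposure times by magnification:")
for magnification, average in average_exposure_by_magnification(image_metadata).items():
    print(f"{magnification}x: {average}ms")
```

Here `setdefault(key, [])` returns the list stored under `key`, creating an empty one first if the key is new. This avoids a separate if check before appending.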

Where to go next?¶

Introduction to digital images – numpy, matplotlib.

“Programs must be written for people to read, and only incidentally for machines to execute.”
— Harold Abelson