To get comfortable with PyTorch, you need to master the “big three”: Tensors, Autograd, and the nn.Module workflow.

Think of PyTorch as “NumPy with superpowers”—it feels like standard Python, but it can run on GPUs and calculate derivatives automatically. Here is a structured set of exercises to build your foundation.


Part 1: Tensor Basics

The bread and butter of PyTorch. If you can’t manipulate tensors, you can’t build models. Work through these, then compare with the sketch after the list.

  1. Creation: Create a \(3 \times 3\) matrix of random numbers drawn from a normal distribution. Then, create a tensor filled with zeros of the same shape.
  2. Type Casting: Create a float tensor and convert it to a 64-bit integer tensor.
  3. The “Bridge”: Convert a NumPy array into a PyTorch tensor and back again. What happens to the underlying memory if you modify the NumPy array?
  4. Reshaping: Take a tensor of shape (1, 16) and change its shape to (4, 4) using both .view() and .reshape(). What is the technical difference between these two methods?
  5. Slicing: Given a \(5 \times 5\) tensor, extract the middle \(3 \times 3\) block.
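
Here is one possible sketch covering all five exercises (the specific values are arbitrary, and this is not the only valid approach):

```python
import numpy as np
import torch

# 1. Creation: random normal matrix, plus zeros of the same shape
a = torch.randn(3, 3)
zeros = torch.zeros_like(a)

# 2. Type casting: float -> 64-bit integer
f = torch.tensor([1.5, 2.7, 3.1])
i = f.to(torch.int64)            # equivalently: f.long()

# 3. The "bridge": on CPU, from_numpy() shares memory with the array,
#    so modifying the NumPy array also changes the tensor
arr = np.ones(3)
t = torch.from_numpy(arr)
arr[0] = 99                      # t[0] is now 99 as well
back = t.numpy()

# 4. Reshaping: .view() requires contiguous memory and never copies;
#    .reshape() falls back to a copy when it has to
x = torch.arange(16).reshape(1, 16)
v = x.view(4, 4)
r = x.reshape(4, 4)

# 5. Slicing: the middle 3x3 block of a 5x5 tensor
m = torch.randn(5, 5)
middle = m[1:4, 1:4]
```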

Part 2: Operations and Hardware

Where the speed comes from. A sketch follows the list.

  1. Matrix Math: Perform a matrix multiplication between a \(2 \times 3\) tensor and a \(3 \times 4\) tensor. Ensure you know the difference between * and @.
  2. Broadcasting: Add a 1D tensor of shape (3,) to a 2D tensor of shape (3, 3). How does PyTorch decide which dimensions to “stretch”?
  3. Device Agnostic Code: Write a snippet that checks if a GPU (CUDA) or MPS (Metal Performance Shaders for Mac) is available and moves a tensor to that device.
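
A possible sketch for this part; note that the MPS check assumes a reasonably recent PyTorch build:

```python
import torch

# 1. Matrix math: @ (or torch.matmul) is matrix multiplication; * is element-wise
A = torch.randn(2, 3)
B = torch.randn(3, 4)
C = A @ B                        # shape (2, 4)

# 2. Broadcasting: the (3,) tensor gets a leading dimension of size 1,
#    which is then stretched to match the (3, 3) tensor
row = torch.tensor([1.0, 2.0, 3.0])
grid = torch.zeros(3, 3)
summed = grid + row              # row is added to every row of grid

# 3. Device-agnostic code
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

t = torch.randn(3, 3).to(device)
```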

Part 3: Autograd (The Magic)

This is how PyTorch handles backpropagation; a worked example follows the exercises.

  1. Gradient Tracking: Create a tensor \(x = 2.0\) and set it to track gradients. Define \(y = x^2 + 5x\). Compute the derivative and print the gradient of \(x\).
    • Math Check: The result should be \(\frac{dy}{dx} = 2x + 5\), which evaluates to \(9\) at \(x = 2\).
  2. Detaching: Explain (or demonstrate) what happens when you call .detach() or use the with torch.no_grad(): context manager. Why is this critical during model inference?
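
A minimal sketch for both exercises:

```python
import torch

# 1. Gradient tracking: dy/dx = 2x + 5, which is 9 at x = 2
x = torch.tensor(2.0, requires_grad=True)
y = x**2 + 5 * x
y.backward()
print(x.grad)                    # tensor(9.)

# 2. Detaching: detached tensors and anything computed under no_grad()
#    are cut out of the computation graph, so no gradients are tracked.
#    During inference this saves memory and compute.
z = x.detach()                   # same data, requires_grad is False
with torch.no_grad():
    y_eval = x**2 + 5 * x        # no graph is built here
```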

Part 4: Building a Neural Network

Putting the pieces together using torch.nn; a sketch covering these steps follows the list.

  1. The Linear Layer: Manually calculate the output of an nn.Linear(2, 1) layer given an input [1.0, 2.0]. Then, verify it using PyTorch.
  2. Activation Functions: Apply a ReLU activation to a tensor containing both positive and negative values. What happens to the negative values?
  3. The Subclass: Create a class SimpleNet that inherits from nn.Module.
    • Define two linear layers in __init__.
    • Implement the forward pass with a Sigmoid activation in between.
  4. Loss & Optimization:
    • Define a Mean Squared Error (MSE) loss function.
    • Define a Stochastic Gradient Descent (SGD) optimizer.
    • Write the 5-step “Boilerplate” code for a single training step:
      1. optimizer.zero_grad()
      2. forward pass
      3. calculate loss
      4. backward pass
      5. optimizer.step()
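
One way to put this part together; the hidden size of 4, the learning rate of 0.01, and the dummy batch are arbitrary choices for illustration:

```python
import torch
import torch.nn as nn

# 1. The linear layer: output = x @ W.T + b
layer = nn.Linear(2, 1)
x = torch.tensor([[1.0, 2.0]])
manual = x @ layer.weight.T + layer.bias
assert torch.allclose(manual, layer(x))

# 2. ReLU clamps negative values to zero
print(torch.relu(torch.tensor([-1.0, 0.5])))   # tensor([0.0000, 0.5000])

# 3. The subclass: two linear layers with a Sigmoid in between
class SimpleNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(2, 4)
        self.fc2 = nn.Linear(4, 1)

    def forward(self, x):
        return self.fc2(torch.sigmoid(self.fc1(x)))

# 4. Loss, optimizer, and the 5-step training boilerplate
model = SimpleNet()
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

inputs = torch.randn(8, 2)             # dummy batch
targets = torch.randn(8, 1)

optimizer.zero_grad()                  # 1. clear old gradients
outputs = model(inputs)                # 2. forward pass
loss = criterion(outputs, targets)     # 3. calculate loss
loss.backward()                        # 4. backward pass
optimizer.step()                       # 5. update the weights
```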

Final Challenge: The Synthetic Regression

Create 100 points of synthetic data following the line \(y = 3x + 2\) with some added noise. Use PyTorch to “learn” the weight (\(3\)) and the bias (\(2\)) using a single linear layer and a training loop of 100 epochs. A compact sketch follows.
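
A minimal sketch, assuming a learning rate of 0.1 and inputs drawn from a standard normal (both arbitrary choices that keep plain SGD stable over 100 epochs):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic data: y = 3x + 2 plus a little Gaussian noise
x = torch.randn(100, 1)
y = 3 * x + 2 + 0.1 * torch.randn(100, 1)

model = nn.Linear(1, 1)                # one weight, one bias to learn
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(100):
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()

print(model.weight.item(), model.bias.item())  # should land close to 3 and 2
```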

Pro-Tip: If you get stuck on dimensions, always use tensor.shape. 90% of PyTorch errors are just “shape mismatches” in disguise!
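
For example, printing shapes before a matrix multiply makes the mismatch obvious (the tensors here are hypothetical):

```python
import torch

x = torch.randn(32, 10)   # batch of 32 samples with 10 features
w = torch.randn(5, 10)    # weight matrix of a 10 -> 5 layer
print(x.shape, w.shape)   # torch.Size([32, 10]) torch.Size([5, 10])

# x @ w would raise a shape-mismatch error (10 vs 5);
# transposing the weights fixes it
out = x @ w.T
print(out.shape)          # torch.Size([32, 5])
```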