Implement a Class that Inherits from Dataset

In PyTorch, when working with data for deep learning models, it is common to create a custom dataset class that inherits from torch.utils.data.Dataset. This allows PyTorch to interact with your dataset in a standardized way and makes it possible to use utilities like the DataLoader for batching, shuffling, and parallel data loading.

Typical import:

from torch.utils.data import Dataset

Inheritance in Python means creating a new class that extends the functionality of an existing class.

When you write:

class MyDataset(Dataset):

you are declaring that your class follows the Dataset interface. For a map-style dataset (one whose samples are accessed by index), this means implementing two methods:

  1. __len__() – returns the total number of samples in the dataset

  2. __getitem__(idx) – returns a specific data sample given its index

Because PyTorch expects these methods, it can interact with your dataset automatically.

By inheriting from Dataset, your dataset becomes compatible with PyTorch’s data pipeline, especially the DataLoader.

This allows PyTorch to:

  • Determine the size of the dataset using len(dataset)

  • Access individual samples using dataset[i]

  • Load data in batches

  • Shuffle the data during training

  • Use parallel workers for faster data loading

Without these two methods, the DataLoader has no way to know how many samples exist or how to fetch any one of them.
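To make the role of the two methods concrete, the sketch below does by hand roughly what a DataLoader automates: it asks for the size, shuffles the indices, fetches samples one by one, and stacks them into a batch. It is only an approximation for illustration, not the DataLoader's actual implementation, and SquaresDataset is a hypothetical toy class.

```python
import torch
from torch.utils.data import Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: sample i is the pair (i, i squared)."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        return torch.tensor(float(idx)), torch.tensor(float(idx ** 2))


dataset = SquaresDataset(10)

size = len(dataset)                 # calls dataset.__len__()
order = torch.randperm(size)        # shuffled sample indices
batch_indices = order[:4].tolist()  # indices of the first batch of 4

# Each dataset[i] calls dataset.__getitem__(i).
samples = [dataset[i] for i in batch_indices]

# Stack the individual samples into batch tensors.
batch_x = torch.stack([x for x, _ in samples])
batch_y = torch.stack([y for _, y in samples])
print(batch_x, batch_y)
```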

A typical custom dataset class looks like this:
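A minimal sketch of such a class, assuming the data already sits in memory as two tensors of equal length. The constructor arguments `features` and `labels` and the toy data are illustrative choices, not part of the PyTorch API.

```python
import torch
from torch.utils.data import Dataset


class MyDataset(Dataset):
    """Map-style dataset that keeps features and labels in memory."""

    def __init__(self, features, labels):
        # Assumed inputs: two tensors with the same first dimension.
        if len(features) != len(labels):
            raise ValueError("features and labels must have the same length")
        self.features = features
        self.labels = labels

    def __len__(self):
        # Total number of samples in the dataset.
        return len(self.features)

    def __getitem__(self, idx):
        # One (input, target) pair for the given index.
        return self.features[idx], self.labels[idx]


# Toy data: 100 samples with 4 features each, and binary labels.
features = torch.randn(100, 4)
labels = torch.randint(0, 2, (100,))
dataset = MyDataset(features, labels)

x0, y0 = dataset[0]
print(len(dataset), x0.shape, y0.item())
```

In a real project, `__init__` would usually store file paths or open a data file, and `__getitem__` would load and preprocess one sample on demand.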

Once the dataset class is defined, it can be wrapped with a DataLoader:
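A short sketch of that wrapping step, using the same kind of in-memory dataset with random toy data. The batch size of 32 is an arbitrary choice, and `num_workers=0` keeps loading in the main process so the example runs anywhere; a larger value would enable parallel worker processes.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class MyDataset(Dataset):
    """Illustrative in-memory dataset of (features, label) pairs."""

    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]


dataset = MyDataset(torch.randn(100, 4), torch.randint(0, 2, (100,)))

# Batches of 32 samples, reshuffled at the start of every epoch.
loader = DataLoader(dataset, batch_size=32, shuffle=True, num_workers=0)

# Fetch a single batch to inspect its shape.
batch_x, batch_y = next(iter(loader))
print(batch_x.shape, batch_y.shape, len(loader))
```

With 100 samples and a batch size of 32, the loader yields four batches per epoch, the last one holding the remaining 4 samples.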

Now PyTorch will:

  • Call __len__() to know the dataset size

  • Call __getitem__(idx) repeatedly to retrieve samples

  • Automatically group samples into batches

Example training loop:
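The loop below is a minimal sketch under stated assumptions: random toy data, a single linear layer as the model, cross-entropy loss, and plain SGD. None of these choices come from a specific project; they only show where the DataLoader fits into training.

```python
import torch
from torch import nn
from torch.utils.data import Dataset, DataLoader


class MyDataset(Dataset):
    """Illustrative in-memory dataset of (features, label) pairs."""

    def __init__(self, features, labels):
        self.features = features
        self.labels = labels

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        return self.features[idx], self.labels[idx]


torch.manual_seed(0)
dataset = MyDataset(torch.randn(100, 4), torch.randint(0, 2, (100,)))
loader = DataLoader(dataset, batch_size=32, shuffle=True)

model = nn.Linear(4, 2)            # 4 input features, 2 classes
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(3):
    running_loss = 0.0
    for batch_x, batch_y in loader:
        optimizer.zero_grad()               # clear gradients from the last step
        logits = model(batch_x)             # forward pass
        loss = criterion(logits, batch_y)   # mean loss over the batch
        loss.backward()                     # backpropagation
        optimizer.step()                    # parameter update
        # Weight by batch size so the epoch average is per sample.
        running_loss += loss.item() * batch_x.size(0)
    print(f"epoch {epoch}: mean loss {running_loss / len(dataset):.4f}")
```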

Custom datasets are especially useful when:

  • Loading images from folders

  • Reading CSV or JSON datasets

  • Preprocessing data before feeding it to the model

  • Combining multiple data sources

  • Applying transformations (normalization, augmentation)
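As one illustration of these use cases, the sketch below reads a small CSV and applies a normalization step inside `__getitem__`. The class name CsvDataset, the column layout (label in the last column), the `transform` argument, and the in-memory CSV text are all assumptions made for the example.

```python
import csv
import io

import torch
from torch.utils.data import Dataset


class CsvDataset(Dataset):
    """Hypothetical dataset: numeric CSV rows, last column is the label."""

    def __init__(self, csv_file, transform=None):
        reader = csv.reader(csv_file)
        next(reader)  # skip the header row
        rows = [[float(value) for value in row] for row in reader]
        data = torch.tensor(rows, dtype=torch.float32)
        self.features = data[:, :-1]
        self.labels = data[:, -1].long()
        self.transform = transform

    def __len__(self):
        return len(self.features)

    def __getitem__(self, idx):
        x = self.features[idx]
        if self.transform is not None:
            x = self.transform(x)  # e.g. normalization or augmentation
        return x, self.labels[idx]


# In-memory stand-in for a real file opened with open(path, newline="").
csv_text = "height,weight,label\n1.0,10.0,0\n2.0,20.0,1\n3.0,30.0,0\n"


def scale(x):
    # Divide each feature by its column maximum in the toy data.
    return x / torch.tensor([3.0, 30.0])


dataset = CsvDataset(io.StringIO(csv_text), transform=scale)
x, y = dataset[1]
print(len(dataset), x, y)
```

Applying the transform in `__getitem__` rather than in `__init__` means it runs each time a sample is fetched, which is what makes random augmentations produce different results across epochs.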

 

Creative Commons License @Yolanda Muriel Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)
