ARTIFICIAL INTELLIGENCE (44) – Computer Vision (1) Object Detection/ Open Images dataset

Object-related tasks in vision are problems where a computer analyzes images to understand what objects appear in them. Classification is one of the simplest tasks. The goal is to assign one label to an image. The label answers the question: What is the main object in this image?

The main object-related tasks in computer vision, moving from simpler to more detailed understanding of images are:

1. Classification (single object).

The system assigns one label to the entire image. It answers: What is in this image? Example: an image with one dog, the output is “dog”. No information about where the object is located.

2. Classification + Localization (single object)

The system both identifies the object and locates it. It outputs: A class label (e.g., dog) and a bounding box around the object. This still assumes one main object in the image.

3. Object Detection (multiple objects)

The system detects multiple objects in the same image. For each object, it provides: A class label and a bounding box. Example: two cats are two boxes, each labeled “cat”.

4. Instance Segmentation (multiple objects)

This is the most detailed task. The system: Identifies each object, separates each individual instance, marks the exact pixels belonging to every object. Instead of boxes, you get precise object marks.

In summary:

  • Classification → What is it?
  • Localization → What is it and where is it?
  • Detection → What objects are there and where are they?
  • Instance segmentation → Exactly which pixels belong to each object?

© Image. Machine Learning in Computer Vision

 

Open Images Dataset and why it is important for object detection and related computer vision tasks.

Open Images is a large-scale, publicly available dataset used to train and evaluate computer vision models. It contains millions of real-world images annotated with: Image-level labels, Bounding boxes, Object classes and  Relationships between objects.

Open Images has Compared to older datasets more complex scenes with several objects in the same image. This makes it more realistic and challenging for object detection models. Open Images v7 is especially useful for: Object detection,  Multi-object classification, Learning from complex, and cluttered scenes. It pushes models to handle: Many objects per image and Large variability in size, position, and context.

In summary:
Open Images v7 is a rich and challenging dataset that better represents real-world visual complexity, making it ideal for training advanced object detection and recognition systems.

© Image. Open Images dataset

 

 

Bonus 

Write down your own ideas.

 

Licencia Creative Commons@Yolanda Muriel Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0)

Deja un comentario