Object Detection with TensorFlow Hub



In this post, we will learn how to perform object detection with pre-trained models from TensorFlow Hub. TensorFlow Hub is a library and platform designed for sharing, discovering, and reusing pre-trained machine learning models. Its primary goal is to simplify the reuse of existing models, thereby promoting collaboration, reducing redundant work, and accelerating research and development in machine learning. Users can search for pre-trained models, called modules, which have been contributed by the community or provided by Google. These modules can be integrated into a user's own machine learning projects with just a few lines of code.

Object detection is a subfield of computer vision that focuses on identifying and locating specific objects within digital images or videos. It involves not only classifying the objects present in an image but also determining their precise location and size by placing bounding boxes or other spatial encodings around them. In this example, we will use the model EfficientDet/d4, which is from a family of models known as EfficientDet. The pre-trained models from this family available on TensorFlow Hub were all trained on the COCO 2017 dataset. The different models in the family, ranging from D0 to D7, vary in terms of complexity and input image dimensions. D0, the most compact model, accepts input sizes of 512×512 pixels and provides the fastest inference speed. At the other end of the spectrum, D7 requires an input size of 1536×1536 and takes considerably longer to perform inference. Several other object detection models are available on TensorFlow Hub as well.

import os
import numpy as np
import cv2

import zipfile
import requests
import glob as glob

import tensorflow_hub as hub

import matplotlib
import matplotlib.pyplot as plt

import warnings
import logging
import absl

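# Filter absl warnings
warnings.filterwarnings("ignore", module="absl")

# Capture all warnings in the logging system
logging.captureWarnings(True)

# Set the absl logger level to 'error' to suppress warnings
absl_logger = logging.getLogger("absl")
absl_logger.setLevel(logging.ERROR)

# Reduce TensorFlow C++ log output (2 hides info and warning messages)
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'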

Download Sample Images

def download_file(url, save_name):
    # Download the file at `url` and write its bytes to `save_name`.
    file = requests.get(url)

    with open(save_name, 'wb') as f:
        f.write(file.content)

def unzip(zip_file=None):
    try:
        with zipfile.ZipFile(zip_file) as z:
            z.extractall("./")
            print("Extracted all")
    except Exception:
        print("Invalid file")
Extracted all
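
The extract-or-report pattern used by unzip() can be tried without any network access. The following is a minimal sketch, not the post's code: the helper name unzip_to, the temporary archive, and the file names are all hypothetical and exist only for illustration.

```python
import os
import tempfile
import zipfile


def unzip_to(zip_file, dest_dir):
    """Extract zip_file into dest_dir; return True on success, False otherwise."""
    try:
        with zipfile.ZipFile(zip_file) as z:
            z.extractall(dest_dir)
        return True
    except (zipfile.BadZipFile, FileNotFoundError):
        return False


with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny archive locally so the example needs no download.
    archive = os.path.join(tmp, "sample.zip")
    with zipfile.ZipFile(archive, "w") as z:
        z.writestr("images/readme.txt", "placeholder")

    ok = unzip_to(archive, os.path.join(tmp, "out"))
    extracted = os.path.exists(os.path.join(tmp, "out", "images", "readme.txt"))

    # A file that is not a zip archive is reported instead of raising.
    bogus = os.path.join(tmp, "not_a_zip.zip")
    with open(bogus, "wb") as f:
        f.write(b"plain bytes")
    bad = unzip_to(bogus, os.path.join(tmp, "out2"))

print(ok, extracted, bad)
```

Returning a boolean rather than printing makes the outcome easy to check before moving on to load the images.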

Display Sample Images

image_paths = sorted(glob.glob('object_detection_images' + '/*.png'))

for idx in range(len(image_paths)):
    print(image_paths[idx])

def load_image(path):

    image = cv2.imread(path)
    # Convert image in BGR format to RGB.
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # Add a batch dimension which is required by the model.
    image = np.expand_dims(image, axis=0)
    return image

images = []
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(20, 15))

for idx, axis in enumerate(ax.flat):
    image = load_image(image_paths[idx])
    images.append(image)
    # Drop the batch dimension for display.
    axis.imshow(image[0])
    axis.axis('off')
Sample images to use for object detection with TensorFlow Hub.
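
The two array manipulations in load_image() (channel reordering and adding a batch dimension) can be seen on a tiny synthetic array. This is a sketch using NumPy only; the 2×3 all-blue "image" is made up for illustration, and reversing the last axis is used here as a stand-in for what cv2.COLOR_BGR2RGB does on a 3-channel image.

```python
import numpy as np

# A hypothetical 2x3 "image" in BGR order: every pixel is pure blue.
bgr = np.zeros((2, 3, 3), dtype=np.uint8)
bgr[..., 0] = 255  # channel 0 is blue in BGR

# Reversing the last axis swaps BGR to RGB.
rgb = bgr[..., ::-1]

# Add the leading batch dimension the model expects: (H, W, 3) -> (1, H, W, 3).
batched = np.expand_dims(rgb, axis=0)

# np.squeeze removes the batch dimension again, e.g. before display.
unbatched = np.squeeze(batched, axis=0)

print(rgb[0, 0])        # blue now sits in the last channel
print(batched.shape)
print(unbatched.shape)
```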

Define a Dictionary that Maps Class IDs to Class Names

class_index is a dictionary that maps class IDs to class names for the COCO dataset. The IDs run from 1 to 90, but some IDs are unused, so the dictionary contains 80 classes.

class_index = {
         1: 'person',
         2: 'bicycle',
         3: 'car',
         4: 'motorcycle',
         5: 'airplane',
         6: 'bus',
         7: 'train',
         8: 'truck',
         9: 'boat',
         10: 'traffic light',
         11: 'fire hydrant',
         13: 'stop sign',
         14: 'parking meter',
         15: 'bench',
         16: 'bird',
         17: 'cat',
         18: 'dog',
         19: 'horse',
         20: 'sheep',
         21: 'cow',
         22: 'elephant',
         23: 'bear',
         24: 'zebra',
         25: 'giraffe',
         27: 'backpack',
         28: 'umbrella',
         31: 'handbag',
         32: 'tie',
         33: 'suitcase',
         34: 'frisbee',
         35: 'skis',
         36: 'snowboard',
         37: 'sports ball',
         38: 'kite',
         39: 'baseball bat',
         40: 'baseball glove',
         41: 'skateboard',
         42: 'surfboard',
         43: 'tennis racket',
         44: 'bottle',
         46: 'wine glass',
         47: 'cup',
         48: 'fork',
         49: 'knife',
         50: 'spoon',
         51: 'bowl',
         52: 'banana',
         53: 'apple',
         54: 'sandwich',
         55: 'orange',
         56: 'broccoli',
         57: 'carrot',
         58: 'hot dog',
         59: 'pizza',
         60: 'donut',
         61: 'cake',
         62: 'chair',
         63: 'couch',
         64: 'potted plant',
         65: 'bed',
         67: 'dining table',
         70: 'toilet',
         72: 'tv',
         73: 'laptop',
         74: 'mouse',
         75: 'remote',
         76: 'keyboard',
         77: 'cell phone',
         78: 'microwave',
         79: 'oven',
         80: 'toaster',
         81: 'sink',
         82: 'refrigerator',
         84: 'book',
         85: 'clock',
         86: 'vase',
         87: 'scissors',
         88: 'teddy bear',
         89: 'hair drier',
         90: 'toothbrush'
}
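
Because the class IDs are not contiguous (for example, there is no entry for 12), a lookup that tolerates unused IDs can be convenient. The sketch below is an optional illustration, not part of the post's code: class_name_for and the four-entry excerpt are hypothetical names introduced here.

```python
# A small excerpt of the mapping, enough to illustrate the lookup.
class_index_excerpt = {1: 'person', 2: 'bicycle', 11: 'fire hydrant', 13: 'stop sign'}


def class_name_for(class_id, mapping, default='unknown'):
    """Return the class name for class_id, or default when the ID is unused."""
    # int() turns NumPy integers and floats such as 2.0 into a plain dictionary key.
    return mapping.get(int(class_id), default)


name_found = class_name_for(2.0, class_index_excerpt)
name_missing = class_name_for(12, class_index_excerpt)
print(name_found, name_missing)
```

The conversion with int() matters because the model reports class IDs as floating-point values.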

Here we will use COLOR_IDS to map each class to a unique RGB color.

R = np.array(np.arange(96, 256, 32))
G = np.roll(R, 1)
B = np.roll(R, 2)

COLOR_IDS = np.array(np.meshgrid(R, G, B)).T.reshape(-1, 3)
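
To see why this gives every class its own color, it helps to inspect the table. The sketch below repeats the same construction and checks its size: five intensity levels per channel yield 5 × 5 × 5 = 125 distinct rows, more than the largest class ID (90). The class ID 18 is just an arbitrary example value.

```python
import numpy as np

R = np.arange(96, 256, 32)   # five intensity levels: 96, 128, 160, 192, 224
G = np.roll(R, 1)
B = np.roll(R, 2)

# Every (R, G, B) combination of the five levels, one per row.
COLOR_IDS = np.array(np.meshgrid(R, G, B)).T.reshape(-1, 3)

num_unique = len(np.unique(COLOR_IDS, axis=0))
print(COLOR_IDS.shape, num_unique)

# Look up a color by class ID; the modulo guards against IDs beyond the table.
class_id = 18
color = tuple(COLOR_IDS[class_id % len(COLOR_IDS)].tolist())
print(color)
```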

Model Inference using TensorFlow Hub

TensorFlow Hub contains many different pre-trained object detection models. Here we will use the EfficientDet class of object detection models that were trained on the COCO 2017 dataset. The EfficientDet family consists of several models with different levels of complexity and performance, ranging from D0 to D7. The models differ primarily in their architecture, input image size, computational requirements, and performance.

EfficientDet  = {'EfficientDet D0 512x512'   : 'https://tfhub.dev/tensorflow/efficientdet/d0/1',
                 'EfficientDet D1 640x640'   : 'https://tfhub.dev/tensorflow/efficientdet/d1/1',
                 'EfficientDet D2 768x768'   : 'https://tfhub.dev/tensorflow/efficientdet/d2/1',
                 'EfficientDet D3 896x896'   : 'https://tfhub.dev/tensorflow/efficientdet/d3/1',
                 'EfficientDet D4 1024x1024' : 'https://tfhub.dev/tensorflow/efficientdet/d4/1',
                 'EfficientDet D5 1280x1280' : 'https://tfhub.dev/tensorflow/efficientdet/d5/1',
                 'EfficientDet D6 1280x1280' : 'https://tfhub.dev/tensorflow/efficientdet/d6/1',
                 'EfficientDet D7 1536x1536' : 'https://tfhub.dev/tensorflow/efficientdet/d7/1'
                }

Here we will use the D4 model.

model_url = EfficientDet['EfficientDet D4 1024x1024' ]

print('loading model: ', model_url)
od_model = hub.load(model_url)

print('\nmodel loaded!')
loading model:  https://tfhub.dev/tensorflow/efficientdet/d4/1
Metal device set to: Apple M1 Max

model loaded!

Perform Inference

Before we formalize the code to process several images and post-process the results, let's first see how to perform inference on a single image and inspect the output from the model.

Call the Model

# Call the model.
# The model returns the detection results in the form of a dictionary.
results = od_model(images[0])

Inspect the Results

The object detection model returns the detection results in the form of a dictionary that contains several different types of keys.

# Convert the dictionary values to numpy arrays.
results = {key:value.numpy() for key, value in results.items()}

# Print the keys from the results dictionary.
for key in results:
    print(key)

Notice that the model has several dictionary keys that can be used to access various types of detection data. EfficientDet, like many other object detection models, generates a large number of raw detections (bounding boxes and corresponding class scores) for each input image. Many of these raw detections are redundant, overlapping, or have low confidence scores. To obtain meaningful results, post-processing techniques are applied within the model to filter and refine these raw detections. For our purposes, we're only interested in the detections that have been post-processed within the model, which are available in the dictionary keys that start with detection_.
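
A common technique for removing redundant, overlapping detections is non-maximum suppression (NMS). The sketch below is a simplified, class-agnostic illustration of the idea and should not be read as the model's actual internal post-processing. The function names, the three example boxes, and the 0.5 IoU threshold are assumptions made for this example.

```python
import numpy as np


def iou(box, boxes):
    """IoU between one box and an array of boxes, all as [ymin, xmin, ymax, xmax]."""
    ymin = np.maximum(box[0], boxes[:, 0])
    xmin = np.maximum(box[1], boxes[:, 1])
    ymax = np.minimum(box[2], boxes[:, 2])
    xmax = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(ymax - ymin, 0, None) * np.clip(xmax - xmin, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: keep the best box, drop boxes that overlap it, repeat."""
    order = np.argsort(scores)[::-1]   # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        overlaps = iou(boxes[best], boxes[rest])
        order = rest[overlaps <= iou_thresh]
    return keep


boxes = np.array([[0.10, 0.10, 0.50, 0.50],
                  [0.12, 0.11, 0.52, 0.49],   # near-duplicate of the first box
                  [0.60, 0.60, 0.90, 0.90]])  # a separate object
scores = np.array([0.90, 0.75, 0.80])

kept = nms(boxes, scores)
print(kept)  # -> [0, 2]
```

The near-duplicate box is suppressed because it overlaps a higher-scoring box, while the separate object survives.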

In the following code cell, we show that there are nearly 200,000 raw detections, but only 16 final detections. Each of these final detections has an associated confidence score, which we may want to filter further depending on the nature of our application.

print('Num Raw Detections: ', (len(results['raw_detection_scores'][0])))
print('Num Detections:     ', (results['num_detections'][0]).astype(int))
Num Raw Detections:  196416
Num Detections:      16
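
The raw detection count can be reproduced with a little arithmetic. This is a sketch under assumptions that are not stated in this post: that the model places anchors on feature pyramid levels 3 to 7 and uses 9 anchors per grid cell (3 scales × 3 aspect ratios), the configuration commonly cited for EfficientDet.

```python
# Assumptions (not from the post): pyramid levels 3..7 and 9 anchors per cell.
input_size = 1024          # D4 input resolution
levels = range(3, 8)
anchors_per_cell = 9

# At level L the feature map is input_size / 2**L cells on each side.
cells = sum((input_size // 2 ** level) ** 2 for level in levels)
raw_detections = cells * anchors_per_cell
print(cells, raw_detections)  # -> 21824 196416
```

Under these assumptions the result matches the 196416 raw detections reported above.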

Let's now inspect some of the detection data for all 16 detections. Notice that the detections are sorted from the highest confidence to the lowest.

# Print the Scores, Classes and Bounding Boxes for the detections.
num_dets = (results['num_detections'][0]).astype(int)

print('\nDetection Scores: \n\n', results['detection_scores'][0][0:num_dets])
print('\nDetection Classes: \n\n', results['detection_classes'][0][0:num_dets])
print('\nDetection Boxes: \n\n', results['detection_boxes'][0][0:num_dets])
Detection Scores: 

 [0.9053347  0.8789406  0.7202968  0.35475922 0.2805733  0.17851698
 0.15169667 0.14905979 0.14454156 0.13584    0.12682638 0.11745102
 0.10781792 0.10152479 0.10052315 0.09746186]

Detection Classes: 

 [ 2. 18.  8.  3. 64. 64.  2. 18. 64. 64. 64.  4. 64. 44. 64. 77.]

Detection Boxes: 

 [[0.16487242 0.15703079 0.7441227  0.74429274]
 [0.3536     0.16668764 0.9776781  0.40675405]
 [0.06442685 0.61166453 0.25209486 0.8956611 ]
 [0.06630661 0.611912   0.25146762 0.89877594]
 [0.08410528 0.06995308 0.18153256 0.13178551]
 [0.13754636 0.89751065 0.22187063 0.9401711 ]
 [0.34510636 0.16857824 0.97165954 0.40917954]
 [0.18023838 0.15531728 0.7696747  0.7740346 ]
 [0.087889   0.06875686 0.18782085 0.10366233]
 [0.00896974 0.11013152 0.0894229  0.15709913]
 [0.08782443 0.08899567 0.16129945 0.13988526]
 [0.16456181 0.1708141  0.72982967 0.75529355]
 [0.06907014 0.8944937  0.22174956 0.9605442 ]
 [0.30221778 0.10927744 0.33091408 0.15160759]
 [0.11132257 0.09432659 0.16303536 0.12937708]
 [0.133767   0.5592607  0.18178582 0.5844183 ]]
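
Each box is given as normalized [ymin, xmin, ymax, xmax] values between 0 and 1, so it must be scaled by the image size before drawing. The sketch below shows the conversion on made-up numbers: the helper name to_pixel_box, the 200×100 image size, and the box values are hypothetical and chosen for easy arithmetic.

```python
def to_pixel_box(box, im_width, im_height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to (left, top, right, bottom) pixels."""
    ymin, xmin, ymax, xmax = box
    return (int(xmin * im_width), int(ymin * im_height),
            int(xmax * im_width), int(ymax * im_height))


# Hypothetical 200x100 (width x height) image.
pixel_box = to_pixel_box([0.25, 0.5, 0.75, 1.0], im_width=200, im_height=100)
print(pixel_box)  # -> (100, 25, 200, 75)
```

Note that the y coordinates come first in the model output, while drawing functions typically expect (x, y) points.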

Post-Process and Display Detections

Here we show the logic for how to interpret the detection data for a single image. As we showed above, the model returned 16 detections. However, many of them have low confidence scores, so we need to filter them further using a minimum detection threshold.

  1. Retrieve the detections from the outcomes dictionary
  2. Apply a minimal detection threshold to filter the detections
  3. For each thresholded detection, display the bounding box and a label indicating the detected class and the confidence of the detection.

def process_detection(image, results,  min_det_thresh=.3):

    # Extract the detection results from the results dictionary.
    scores  =  results['detection_scores'][0]
    boxes   =  results['detection_boxes'][0]
    classes = (results['detection_classes'][0]).astype(int)

    # Get the detections whose scores meet or exceed the minimum detection threshold.
    det_indices = np.where(scores >= min_det_thresh)[0]

    scores_thresh  = scores[det_indices]
    boxes_thresh   = boxes[det_indices]
    classes_thresh = classes[det_indices]

    # Make a copy of the image to annotate.
    img_bbox = image.copy()

    im_height, im_width = image.shape[:2]

    font_scale = .6
    box_thickness = 2

    # Loop over all thresholded detections.
    for box, class_id, score in zip(boxes_thresh, classes_thresh, scores_thresh):

        # Get bounding box normalized coordinates.
        ymin, xmin, ymax, xmax = box

        class_name = class_index[class_id]

        # Convert normalized bounding box coordinates to pixel coordinates.
        (left, right, top, bottom) = (int(xmin * im_width), 
                                      int(xmax * im_width), 
                                      int(ymin * im_height), 
                                      int(ymax * im_height))

        # Annotate the image with the bounding box.
        # The color triple for this class is looked up and its channel order reversed.
        color = tuple(COLOR_IDS[class_id % len(COLOR_IDS)].tolist())[::-1]
        img_bbox = cv2.rectangle(img_bbox, (left, top), (right, bottom), color, thickness=box_thickness)

        # Annotate bounding box with detection data (class name and score).

        # Build the text string that contains the class name and score associated with this detection.
        display_txt = "{}: {:.2f}%".format(class_name, 100 * score)
        ((text_width, text_height), _) = cv2.getTextSize(display_txt, cv2.FONT_HERSHEY_SIMPLEX, font_scale, 1)

        # Handle case when the label is above the image frame.
        if top < text_height:
            shift_down = int(2*(1.3*text_height))
        else:
            shift_down = 0

        # Draw a filled rectangle on which the detection results will be displayed.
        img_bbox = cv2.rectangle(img_bbox, 
                                 (left-1, top-box_thickness - int(1.3*text_height) + shift_down), 
                                 (left-1 + int(1.1 * text_width), top),               
                                 color, 
                                 thickness=-1)

        # Annotate the filled rectangle with text (class label and score).
        img_bbox = cv2.putText(img_bbox, 
                               display_txt,
                               (left + int(.05*text_width), top - int(0.2*text_height) + int(shift_down/2)),
                               cv2.FONT_HERSHEY_SIMPLEX, font_scale, (0, 0, 0), 1)
    return img_bbox

Display Results with min_det_thresh=0

First, let's process an image using a minimum detection threshold of zero just to see what the model returned for all 16 detections. Since we aren't filtering the results, we expect that we may have some redundant and/or false detections.

# Call the model.
results = od_model(images[0])

# Convert the dictionary values to numpy arrays.
results = {key:value.numpy() for key, value in results.items()}

# Remove the batch dimension from the first image.
image = np.squeeze(images[0])

# Process the first sample image.
img_bbox = process_detection(image, results, min_det_thresh=0)

plt.figure(figsize=[15, 10])
plt.imshow(img_bbox)
plt.axis('off')

The results below show all the detections returned by the model since we did not apply a detection threshold to filter them. However, notice that all the mislabeled detections also have very low confidence. It is therefore always recommended to apply a minimum detection threshold to the results generated by the model. The value of the threshold is something you need to experiment with depending on the data and the application, but in general, a value somewhere between 0.3 and 0.5 is a good rule of thumb.

EfficientDet results with zero threshold.
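
The effect of the threshold can be checked directly on the 16 detection scores printed earlier for this image. The sketch below simply counts how many detections survive at a few candidate thresholds; the set of thresholds tried is an arbitrary choice for illustration.

```python
import numpy as np

# The 16 detection scores printed earlier for the first sample image.
scores = np.array([0.9053347, 0.8789406, 0.7202968, 0.35475922, 0.2805733, 0.17851698,
                   0.15169667, 0.14905979, 0.14454156, 0.13584, 0.12682638, 0.11745102,
                   0.10781792, 0.10152479, 0.10052315, 0.09746186])

counts = {}
for thresh in (0.0, 0.3, 0.5):
    kept = np.where(scores >= thresh)[0]
    counts[thresh] = len(kept)
    print(f"threshold {thresh:.1f}: {len(kept)} detections kept")
```

For this image, a threshold of 0.3 keeps 4 of the 16 detections and a threshold of 0.5 keeps 3.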

Display Results with min_det_thresh=0.3

Let's now apply a detection threshold to filter the results.

img_bbox = process_detection(image, results, min_det_thresh=.3)

plt.figure(figsize=[15, 10])
plt.imshow(img_bbox)
plt.axis('off')

Formalize the Implementation

In this section, we will formalize the implementation and create a convenience function to execute the model on a list of images. As noted in the documentation, the models in this family do not support “batching.” This means we need to call the model once for each image. Note, however, that the input shape for the image does still require a batch dimension.

run_inference() is a helper function that calls the model for each image in the list of images.

def run_inference(images, model):

    results_list = []
    for img in images:
        result = model(img)
        result = {key:value.numpy() for key,value in result.items()}

        results_list.append(result)

    return results_list

# Perform inference on each image and store the results in a list.
results_list = run_inference(images, od_model)
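
The per-image calling pattern can be exercised without downloading a model by substituting a stand-in. In the sketch below, FakeTensor and fake_model are hypothetical placeholders invented for this example; they only mimic the two properties the helper relies on (a dictionary result whose values expose .numpy(), and an input with a batch dimension of one). The output keys and array shapes are arbitrary.

```python
import numpy as np


class FakeTensor:
    """Hypothetical stand-in for a tensor; only .numpy() is provided."""
    def __init__(self, array):
        self._array = np.asarray(array)

    def numpy(self):
        return self._array


def fake_model(batched_image):
    """Hypothetical detector that accepts exactly one image with a batch dimension."""
    assert batched_image.ndim == 4 and batched_image.shape[0] == 1
    return {'num_detections': FakeTensor([0.0]),
            'detection_scores': FakeTensor(np.zeros((1, 100)))}


def run_inference(images, model):
    results_list = []
    for img in images:                      # one call per image: no batching
        result = model(img)
        result = {key: value.numpy() for key, value in result.items()}
        results_list.append(result)
    return results_list


# Three dummy 8x8 RGB images, each with a leading batch dimension.
dummy_images = [np.zeros((1, 8, 8, 3), dtype=np.uint8) for _ in range(3)]
dummy_results = run_inference(dummy_images, fake_model)
print(len(dummy_results), sorted(dummy_results[0]))
```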

Next, we loop over each of the images and use the results from the model to annotate a copy of the image, which is displayed to the console.

for idx in range(len(images)):
    # Remove the batch dimension.
    image = np.squeeze(images[idx])
    # Generate the annotated image.
    image_bbox = process_detection(image, results_list[idx], min_det_thresh=.31)
    # Display annotated image.
    plt.figure(figsize=[15, 10])
    plt.imshow(image_bbox)
    plt.axis('off')
EfficientDet results dog, bicycle, car
EfficientDet results elephants
EfficientDet results home interior
EfficientDet results place setting


In this post, we covered how to use pre-trained object detection models available in TensorFlow Hub. TensorFlow Hub simplifies the process of reusing existing models by providing a central repository for sharing, discovering, and reusing pre-trained machine learning models. An essential aspect of working with these models is interpreting their output, and a key part of that is applying a detection threshold to filter the results generated by the model. Setting an appropriate detection threshold often requires experimentation and also depends heavily on the type of application. In this example, we used the D4 model from the EfficientDet family. However, if your application requires faster inference speeds, you should consider a smaller model (D0 to D3).

TensorFlow Hub Resources: