Object Detection with TensorFlow 2

Today I'll continue with the object detection problem, this time using Google's TensorFlow 2 and its Object Detection API.

TensorFlow 2.0, released in October 2019, reworked the framework in many ways based on user feedback to make it easier and more efficient to work with (for example, by using the relatively simple Keras API for training models). Distributed training is easier to run thanks to a new API, and support for TensorFlow Lite makes it possible to deploy models on a wide variety of platforms. However, code written for earlier versions of TensorFlow has to be rewritten, sometimes only slightly and sometimes quite significantly, to take full advantage of the new TensorFlow 2.0 features. (source: the article linked here)

1. Install the library

As usual, I'll create a virtual environment so the installation doesn't affect other libraries. You can review how to do that here.

Then install the library with pip. Here I'm installing the latest version at the time of writing, TensorFlow 2.7.

pip install tensorflow
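
To confirm which version pip actually installed in the virtual environment, a small check like the one below may help. It is only a sketch: installed_version is a hypothetical helper name, and it relies on the standard-library importlib.metadata module (Python 3.8+) instead of importing TensorFlow itself.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a pip package, or None if missing."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("tensorflow")
if version is None:
    print("tensorflow is not installed in this environment")
else:
    print("tensorflow", version)
```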

2. Install the Object Detection API

Download the TensorFlow Models repository here.

After downloading and extracting it, you can find the object_detection folder inside the models-master/research folder.

Download the latest protobuf release here. I'm on Windows, so I chose the file protoc-3.19.1-win64.zip.

Then extract the downloaded archive; it contains several subfolders, including bin.

Next, add the path of the extracted bin folder to the PATH environment variable.

Once the bin folder is on PATH, open a command-line window in the models-master/research folder and run:

protoc object_detection/protos/*.proto --python_out=.
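
If your shell does not expand the *.proto wildcard (this can happen in the Windows command prompt), one workaround is to call protoc once per file. The sketch below assumes protoc is already on PATH and that research_dir points at the research folder; protoc_commands and compile_protos are hypothetical helper names, not part of the API.

```python
import glob
import os
import subprocess


def protoc_commands(research_dir="."):
    """Build one protoc command per .proto file under object_detection/protos."""
    pattern = os.path.join(research_dir, "object_detection", "protos", "*.proto")
    commands = []
    for path in sorted(glob.glob(pattern)):
        # Use a path relative to the research folder; forward slashes work on every platform.
        rel_path = os.path.relpath(path, research_dir).replace(os.sep, "/")
        commands.append(["protoc", rel_path, "--python_out=."])
    return commands


def compile_protos(research_dir="."):
    """Run protoc for every .proto file; raises CalledProcessError on failure."""
    for command in protoc_commands(research_dir):
        subprocess.run(command, cwd=research_dir, check=True)
```

Calling compile_protos() from the research folder should have the same effect as the wildcard command above.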

Copy the setup.py file from the object_detection/packages/tf2 folder into the research folder, then run the command below. It may take a few minutes.

python -m pip install --use-feature=2020-resolver .

Check whether the Object Detection API was installed successfully. The command below runs the API's model builder tests; they should all pass.

python object_detection/builders/model_builder_tf2_test.py

3. Training

First, prepare the training data; see my previous post here. The dataset I'm using today is the pistol dataset.

Create a label map file for the dataset:

item {
    id: 1
    name: 'pistol'
}
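
For datasets with more classes, writing the label map by hand gets tedious. The sketch below generates the same text from a list of class names; build_label_map is a hypothetical helper, and the ids start at 1 because id 0 is reserved for the background class.

```python
def build_label_map(class_names):
    """Return label map text with ids starting at 1 (id 0 is reserved for background)."""
    entries = []
    for index, name in enumerate(class_names, start=1):
        entries.append("item {\n    id: %d\n    name: '%s'\n}\n" % (index, name))
    return "\n".join(entries)


label_map_text = build_label_map(["pistol"])
print(label_map_text)

# To save it next to the dataset (the path is taken from the config used later):
# with open("pistol_dataset/labelmap.pbtxt", "w") as f:
#     f.write(label_map_text)
```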

After splitting the data and creating the record files, download a pre-trained model from the TensorFlow 2 model zoo. I'll use SSD ResNet50 V1 FPN 640×640 for training.

Extract the downloaded archive and change the paths so they point to the dataset.

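Now we'll edit the pipeline.config file. The values to adapt to your own setup are mainly num_classes, batch_size, fine_tune_checkpoint, fine_tune_checkpoint_type, label_map_path, and input_path.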

model {
  ssd {
    num_classes: 1   # number of classes to train
    image_resizer {
      fixed_shape_resizer {
        height: 640
        width: 640
      }
    }
    feature_extractor {
      type: "ssd_resnet50_v1_fpn_keras"
      depth_multiplier: 1.0
      min_depth: 16
      conv_hyperparams {
        regularizer {
          l2_regularizer {
            weight: 0.00039999998989515007
          }
        }
        initializer {
          truncated_normal_initializer {
            mean: 0.0
            stddev: 0.029999999329447746
          }
        }
        activation: RELU_6
        batch_norm {
          decay: 0.996999979019165
          scale: true
          epsilon: 0.0010000000474974513
        }
      }
      override_base_feature_extractor_hyperparams: true
      fpn {
        min_level: 3
        max_level: 7
      }
    }
    box_coder {
      faster_rcnn_box_coder {
        y_scale: 10.0
        x_scale: 10.0
        height_scale: 5.0
        width_scale: 5.0
      }
    }
    matcher {
      argmax_matcher {
        matched_threshold: 0.5
        unmatched_threshold: 0.5
        ignore_thresholds: false
        negatives_lower_than_unmatched: true
        force_match_for_each_row: true
        use_matmul_gather: true
      }
    }
    similarity_calculator {
      iou_similarity {
      }
    }
    box_predictor {
      weight_shared_convolutional_box_predictor {
        conv_hyperparams {
          regularizer {
            l2_regularizer {
              weight: 0.00039999998989515007
            }
          }
          initializer {
            random_normal_initializer {
              mean: 0.0
              stddev: 0.009999999776482582
            }
          }
          activation: RELU_6
          batch_norm {
            decay: 0.996999979019165
            scale: true
            epsilon: 0.0010000000474974513
          }
        }
        depth: 256
        num_layers_before_predictor: 4
        kernel_size: 3
        class_prediction_bias_init: -4.599999904632568
      }
    }
    anchor_generator {
      multiscale_anchor_generator {
        min_level: 3
        max_level: 7
        anchor_scale: 4.0
        aspect_ratios: 1.0
        aspect_ratios: 2.0
        aspect_ratios: 0.5
        scales_per_octave: 2
      }
    }
    post_processing {
      batch_non_max_suppression {
        score_threshold: 9.99999993922529e-09
        iou_threshold: 0.6000000238418579
        max_detections_per_class: 100
        max_total_detections: 100
        use_static_shapes: false
      }
      score_converter: SIGMOID
    }
    normalize_loss_by_num_matches: true
    loss {
      localization_loss {
        weighted_smooth_l1 {
        }
      }
      classification_loss {
        weighted_sigmoid_focal {
          gamma: 2.0
          alpha: 0.25
        }
      }
      classification_weight: 1.0
      localization_weight: 1.0
    }
    encode_background_as_zeros: true
    normalize_loc_loss_by_codesize: true
    inplace_batchnorm_update: true
    freeze_batchnorm: false
  }
}
train_config {
  batch_size: 2  # increase or decrease depending on your hardware; my machine is pretty weak :v
  data_augmentation_options {
    random_horizontal_flip {
    }
  }
  data_augmentation_options {
    random_crop_image {
      min_object_covered: 0.0
      min_aspect_ratio: 0.75
      max_aspect_ratio: 3.0
      min_area: 0.75
      max_area: 1.0
      overlap_thresh: 0.0
    }
  }
  sync_replicas: true
  optimizer {
    momentum_optimizer {
      learning_rate {
        cosine_decay_learning_rate {
          learning_rate_base: 0.03999999910593033
          total_steps: 25000
          warmup_learning_rate: 0.013333000242710114
          warmup_steps: 2000
        }
      }
      momentum_optimizer_value: 0.8999999761581421
    }
    use_moving_average: false
  }
  fine_tune_checkpoint: "ssd_resnet50_v1_fpn/checkpoint/ckpt-0"
  num_steps: 25000
  startup_delay_steps: 0.0
  replicas_to_aggregate: 8
  max_number_of_boxes: 100
  unpad_groundtruth_tensors: false
  fine_tune_checkpoint_type: "detection"
  use_bfloat16: false
  fine_tune_checkpoint_version: V2
}
train_input_reader {
  label_map_path: "pistol_dataset/labelmap.pbtxt"
  tf_record_input_reader {
    input_path: "pistol_dataset/train.record"
  }
}
eval_config {
  metrics_set: "coco_detection_metrics"
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "pistol_dataset/labelmap.pbtxt"
  shuffle: false
  num_epochs: 1
  tf_record_input_reader {
    input_path: "pistol_dataset/test.record"
  }
}
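
A wrong path in pipeline.config is a common reason for training to fail right at startup, so it can be worth checking the paths first. The following is a rough sketch using only the standard library; find_config_paths and missing_paths are hypothetical helpers, and the regular expression assumes each path sits on its own line in double quotes, as in the config above.

```python
import os
import re

# Keys in pipeline.config whose values are file paths.
PATH_KEYS = ("fine_tune_checkpoint", "label_map_path", "input_path")


def find_config_paths(config_text):
    """Return (key, path) pairs for every path-valued key found in the config text."""
    pattern = re.compile(r'^\s*(%s)\s*:\s*"([^"]*)"' % "|".join(PATH_KEYS), re.MULTILINE)
    return pattern.findall(config_text)


def missing_paths(config_text, base_dir="."):
    """List the (key, path) pairs that do not exist relative to base_dir."""
    missing = []
    for key, path in find_config_paths(config_text):
        full_path = os.path.join(base_dir, path)
        # A checkpoint prefix such as ckpt-0 is stored on disk as ckpt-0.index.
        candidates = [full_path, full_path + ".index"]
        if not any(os.path.exists(candidate) for candidate in candidates):
            missing.append((key, path))
    return missing


# Example usage, run from the folder that the relative paths start from:
# with open("models/ssd_resnet50_v1_fpn/pipeline.config") as f:
#     print(missing_paths(f.read()))
```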

Then find the model_main_tf2.py file in the research/object_detection folder and run the following command:

python model_main_tf2.py --model_dir=models/ssd_resnet50_v1_fpn --pipeline_config_path=models/ssd_resnet50_v1_fpn/pipeline.config
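
While training runs, the learning rate follows the cosine_decay_learning_rate block from the config. The sketch below shows what such a schedule computes, assuming the usual formula of a linear warmup followed by cosine decay to zero; the API's own implementation may differ in details, and the default values are rounded from the config above.

```python
import math


def cosine_decay_with_warmup(step, learning_rate_base=0.04, total_steps=25000,
                             warmup_learning_rate=0.013333, warmup_steps=2000):
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    if step > total_steps:
        return 0.0
    if step < warmup_steps:
        # Linear ramp from warmup_learning_rate up to learning_rate_base.
        slope = (learning_rate_base - warmup_learning_rate) / warmup_steps
        return warmup_learning_rate + slope * step
    # Cosine decay over the steps that remain after warmup.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * learning_rate_base * (1.0 + math.cos(math.pi * progress))


for step in (0, 1000, 2000, 13500, 25000):
    print(step, round(cosine_decay_with_warmup(step), 6))
```

If you change num_steps, it is worth changing total_steps to match, otherwise the rate reaches zero before or after training actually ends.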

Export the trained model

Find the exporter_main_v2.py file in the object_detection folder and run the following command:

python exporter_main_v2.py --input_type image_tensor --pipeline_config_path ssd_resnet50_v1_fpn/pipeline.config --trained_checkpoint_dir checkpoint_dir --output_directory output_dir
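
After the export finishes, the output directory should contain a saved_model folder, which is what the test script loads. A quick way to check this is sketched below; find_saved_model_dir is a hypothetical helper, and output_dir is the same placeholder as in the command above.

```python
import os


def find_saved_model_dir(output_directory):
    """Return the saved_model folder inside an export directory, or None if absent."""
    saved_model_dir = os.path.join(output_directory, "saved_model")
    # A TensorFlow SavedModel is identified by its saved_model.pb file.
    if os.path.isfile(os.path.join(saved_model_dir, "saved_model.pb")):
        return saved_model_dir
    return None


print(find_saved_model_dir("output_dir"))
```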

Test the model

import time
import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
import numpy as np
import warnings
import cv2
import os

warnings.filterwarnings('ignore')  # Suppress all Python warnings


PATH_TO_SAVED_MODEL = r"my_model/saved_model"
PATH_TO_LABELS = r"labelmap.pbtxt"
print('Loading model...', end='')
start_time = time.time()

# Load saved model and build the detection function
detect_fn = tf.saved_model.load(PATH_TO_SAVED_MODEL)

end_time = time.time()
elapsed_time = end_time - start_time
print('Done! Took {} seconds'.format(elapsed_time))

category_index = label_map_util.create_category_index_from_labelmap(PATH_TO_LABELS,
                                                                    use_display_name=True)

for img in os.scandir(r"img/test_model"):
    image_np = cv2.imread(img.path)
    if image_np is None:
        continue  # skip files that OpenCV cannot read as images

    # OpenCV loads images as BGR, while the Object Detection API decodes training images as RGB.
    image_rgb = cv2.cvtColor(image_np, cv2.COLOR_BGR2RGB)

    # The input needs to be a tensor, convert it using `tf.convert_to_tensor`.
    input_tensor = tf.convert_to_tensor(image_rgb)
    # The model expects a batch of images, so add an axis with `tf.newaxis`.
    input_tensor = input_tensor[tf.newaxis, ...]

    detections = detect_fn(input_tensor)

    # All outputs are batched tensors.
    # Convert to numpy arrays, and take index [0] to remove the batch dimension.
    # We're only interested in the first num_detections.
    num_detections = int(detections.pop('num_detections'))
    detections = {key: value[0, :num_detections].numpy()
                  for key, value in detections.items()}
    detections['num_detections'] = num_detections

    # detection_classes should be ints.
    detections['detection_classes'] = detections['detection_classes'].astype(np.int64)

    image_np_with_detections = image_np.copy()

    viz_utils.visualize_boxes_and_labels_on_image_array(
        image_np_with_detections,
        detections['detection_boxes'],
        detections['detection_classes'],
        detections['detection_scores'],
        category_index,
        use_normalized_coordinates=True,
        max_boxes_to_draw=200,
        min_score_thresh=.30,
        agnostic_mode=False)

    cv2.imshow("output", image_np_with_detections)
    cv2.waitKey(0)
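
If you need the detections as pixel coordinates (for cropping or logging) instead of just drawing them, the normalized [ymin, xmin, ymax, xmax] boxes can be converted as sketched below. to_pixel_boxes is a hypothetical helper, and the example values are made up for illustration.

```python
def to_pixel_boxes(boxes, scores, image_width, image_height, min_score=0.3):
    """Convert normalized [ymin, xmin, ymax, xmax] boxes to pixel (left, top, right, bottom)."""
    pixel_boxes = []
    for (ymin, xmin, ymax, xmax), score in zip(boxes, scores):
        if score < min_score:
            continue  # skip low-confidence detections
        pixel_boxes.append((int(xmin * image_width), int(ymin * image_height),
                            int(xmax * image_width), int(ymax * image_height)))
    return pixel_boxes


# Made-up detections for a 640x480 image, for illustration only.
example_boxes = [[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 0.5, 0.5]]
example_scores = [0.9, 0.1]
print(to_pixel_boxes(example_boxes, example_scores, image_width=640, image_height=480))
```

In the script above, you could pass detections['detection_boxes'] and detections['detection_scores'], with image_np.shape[1] as the width and image_np.shape[0] as the height.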