Object Detection with TensorFlow 1

1. What is object detection?

Object detection is the process of finding real-world objects, such as faces, houses, or bicycles, in images or video. Object detection algorithms typically extract features and use learning algorithms to recognize instances of the object categories they were trained on. It is widely used in applications such as image retrieval, security, surveillance, and self-driving cars.

2. The TensorFlow Object Detection API

TensorFlow is an open-source deep learning framework created by Google Brain. The TensorFlow Object Detection API is a powerful tool for building object detection models.

Install the required libraries:

pip install protobuf
pip install pillow
pip install lxml
pip install Cython
pip install jupyter
pip install matplotlib
pip install pandas
pip install opencv-python 
pip install "tensorflow<2"

2.1 Installing the TensorFlow Object Detection API

Create a virtual environment so that your other projects are not affected. You can read more about this here.
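As a rough sketch of this step, a virtual environment can also be created from Python itself with the standard-library venv module. The helper name create_env and the directory name tf1-od-env used below are assumptions chosen for illustration; they are not part of the original setup.

```python
import venv
from pathlib import Path


def create_env(env_dir, with_pip=True):
    """Create a virtual environment in env_dir and return its path."""
    env_path = Path(env_dir)
    # venv.create builds the directory layout and writes pyvenv.cfg;
    # with_pip=True also bootstraps pip into the new environment.
    venv.create(env_path, with_pip=with_pip)
    return env_path
```

Calling create_env("tf1-od-env") should produce a folder that you can activate on Windows with tf1-od-env\Scripts\activate before running the pip install commands.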

Download the source code from GitHub.

After downloading and extracting it, you will find the object_detection directory inside models-master/research.

From the command line, create a PYTHONPATH variable that points to the directories \models (models-master if you downloaded the zip), \models\research, and \models\research\slim that you just extracted. This command must be run again every time you open a new cmd window.

set PYTHONPATH=E:/PROGRAMMING/models-master;E:/PROGRAMMING/models-master/research;E:/PROGRAMMING/models-master/research/slim
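Because the value has to be typed on a single line and is easy to get wrong, here is a small sketch that builds it for you. The function name build_pythonpath is hypothetical, and the root path is just the example used in this article.

```python
import os


def build_pythonpath(models_root, sep=os.pathsep):
    """Join the three directories the Object Detection API expects on PYTHONPATH."""
    root = models_root.rstrip("/\\")
    parts = [root, root + "/research", root + "/research/slim"]
    return sep.join(parts)


# Example root taken from this article; adjust it to your own checkout.
print(build_pythonpath("E:/PROGRAMMING/models-master", sep=";"))
```

The printed string can be pasted after set PYTHONPATH= in the cmd window.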

Compile the protobuf files and run setup.py:

We need to compile the protobuf files, which TensorFlow uses to configure the model and its training parameters. To do so, first download the protobuf compiler (protoc) here. Choose the release ending in win64.zip, since we are working on Windows.

After downloading, extract it into models-master/research. Copy the code below and save it as use_protobuf.py in models-master/research.

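import os
import sys

# Usage: python use_protobuf.py <proto_directory> <path_to_protoc>
args = sys.argv
directory = args[1]
protoc_path = args[2]

# Compile every .proto file in the directory into a Python module.
for file in os.listdir(directory):
    if file.endswith(".proto"):
        os.system(protoc_path + " " + directory + "/" + file + " --python_out=.")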

Change to the models-master/research directory in the same command window where you set PYTHONPATH above (this is required), then run the Python command:

python use_protobuf.py  .\object_detection\protos\ .\bin\protoc
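The script above ignores protoc errors because os.system only returns an exit code. As an alternative sketch (not the original author's script), the same loop can be written with subprocess so that a failed compilation stops the run. The function names protoc_commands and compile_protos are hypothetical.

```python
import subprocess
from pathlib import Path


def protoc_commands(proto_dir, protoc_path):
    """Return one protoc argument list per .proto file in proto_dir."""
    proto_dir = Path(proto_dir)
    return [
        [str(protoc_path), proto_file.as_posix(), "--python_out=."]
        for proto_file in sorted(proto_dir.glob("*.proto"))
    ]


def compile_protos(proto_dir, protoc_path):
    """Run protoc on every .proto file and stop at the first failure."""
    for command in protoc_commands(proto_dir, protoc_path):
        # check=True raises CalledProcessError if protoc exits with an error.
        subprocess.run(command, check=True)
```

As with the original command, this is meant to be called from the research directory with relative paths, for example compile_protos("object_detection/protos", "bin/protoc").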

Then copy setup.py from object_detection/packages/tf1 into the current research directory and run the two Python commands below. (There are two directories, tf1 and tf2: tf1 is the one for this article, and I will cover tf2 for TensorFlow 2 in a later post.)

 python setup.py build
 python setup.py install

The result should look like the image below.

3. Training

3.1 Preparing the dataset

I downloaded the training data from here.

3.2 Labeling the data

Use the labelImg tool, which can be downloaded from the labelImg download page.

Usage:

Each labeled image produces one XML file.
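To make the annotation format concrete, here is a minimal sketch that reads the bounding boxes out of one such XML file. The sample XML is a made-up example in the Pascal VOC layout that labelImg writes (the file name and coordinates are illustrative only), and read_boxes is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Illustrative annotation; real files come from labelImg.
SAMPLE_XML = """<annotation>
  <filename>raccoon-1.jpg</filename>
  <size><width>650</width><height>417</height><depth>3</depth></size>
  <object>
    <name>raccoon</name>
    <bndbox><xmin>81</xmin><ymin>88</ymin><xmax>522</xmax><ymax>408</ymax></bndbox>
  </object>
</annotation>"""


def read_boxes(xml_text):
    """Return (filename, [(label, xmin, ymin, xmax, ymax), ...]) from one annotation."""
    root = ET.fromstring(xml_text)
    boxes = []
    for obj in root.findall("object"):
        bndbox = obj.find("bndbox")
        boxes.append((
            obj.find("name").text,
            int(bndbox.find("xmin").text),
            int(bndbox.find("ymin").text),
            int(bndbox.find("xmax").text),
            int(bndbox.find("ymax").text),
        ))
    return root.find("filename").text, boxes


print(read_boxes(SAMPLE_XML))
```

These are the same fields that the TFRecord conversion script later in this article reads from each file.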

3.3 Splitting the data

After labeling, the data should be split into two parts, train and test. Here we use a 90% train and 10% test split with the following code, which can also be downloaded here:

""" usage: partition_dataset.py [-h] [-i IMAGEDIR] [-o OUTPUTDIR] [-r RATIO] [-x]

Partition dataset of images into training and testing sets

optional arguments:
  -h, --help            show this help message and exit
  -i IMAGEDIR, --imageDir IMAGEDIR
                        Path to the folder where the image dataset is stored. If not specified, the CWD will be used.
  -o OUTPUTDIR, --outputDir OUTPUTDIR
                        Path to the output folder where the train and test dirs should be created. Defaults to the same directory as IMAGEDIR.
  -r RATIO, --ratio RATIO
                        The ratio of the number of test images over the total number of images. The default is 0.1.
  -x, --xml             Set this flag if you want the xml annotation files to be processed and copied over.
"""
import os
import re
from shutil import copyfile
import argparse
import math
import random


def iterate_dir(source, dest, ratio, copy_xml):
    source = source.replace('\\', '/')
    dest = dest.replace('\\', '/')
    train_dir = os.path.join(dest, 'train')
    test_dir = os.path.join(dest, 'test')

    if not os.path.exists(train_dir):
        os.makedirs(train_dir)
    if not os.path.exists(test_dir):
        os.makedirs(test_dir)

    images = [f for f in os.listdir(source)
              if re.search(r'([a-zA-Z0-9\s_\\.\-\(\):])+(\.jpg|\.jpeg|\.png)$', f, re.IGNORECASE)]

    num_images = len(images)
    num_test_images = math.ceil(ratio*num_images)

    for i in range(num_test_images):
        idx = random.randint(0, len(images)-1)
        filename = images[idx]
        copyfile(os.path.join(source, filename),
                 os.path.join(test_dir, filename))
        if copy_xml:
            xml_filename = os.path.splitext(filename)[0]+'.xml'
            copyfile(os.path.join(source, xml_filename),
                     os.path.join(test_dir,xml_filename))
        images.remove(images[idx])

    for filename in images:
        copyfile(os.path.join(source, filename),
                 os.path.join(train_dir, filename))
        if copy_xml:
            xml_filename = os.path.splitext(filename)[0]+'.xml'
            copyfile(os.path.join(source, xml_filename),
                     os.path.join(train_dir, xml_filename))


def main():

    # Initiate argument parser
    parser = argparse.ArgumentParser(description="Partition dataset of images into training and testing sets",
                                     formatter_class=argparse.RawTextHelpFormatter)
    parser.add_argument(
        '-i', '--imageDir',
        help='Path to the folder where the image dataset is stored. If not specified, the CWD will be used.',
        type=str,
        default=os.getcwd()
    )
    parser.add_argument(
        '-o', '--outputDir',
        help='Path to the output folder where the train and test dirs should be created. '
             'Defaults to the same directory as IMAGEDIR.',
        type=str,
        default=None
    )
    parser.add_argument(
        '-r', '--ratio',
        help='The ratio of the number of test images over the total number of images. The default is 0.1.',
        default=0.1,
        type=float)
    parser.add_argument(
        '-x', '--xml',
        help='Set this flag if you want the xml annotation files to be processed and copied over.',
        action='store_true'
    )
    args = parser.parse_args()

    if args.outputDir is None:
        args.outputDir = args.imageDir

    # Now we are ready to start the iteration
    iterate_dir(args.imageDir, args.outputDir, args.ratio, args.xml)


if __name__ == '__main__':
    main()

Then run the Python command below from the command prompt. It creates the train and test directories inside the directory that holds your data:

python partition_dataset.py -x -i [path to the folder containing the images and XML files] -r 0.1
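The script picks test images at random, so each run gives a different split. If you want a split you can reproduce, one option is to seed the random generator. The sketch below shows only the selection logic on a list of file names; split_filenames and its seed parameter are assumptions of mine and are not part of partition_dataset.py.

```python
import math
import random


def split_filenames(filenames, ratio=0.1, seed=None):
    """Return (train, test) lists; test holds ceil(ratio * n) files."""
    rng = random.Random(seed)
    # Sort first so the result depends only on the seed, not on listing order.
    shuffled = sorted(filenames)
    rng.shuffle(shuffled)
    num_test = math.ceil(ratio * len(shuffled))
    return shuffled[num_test:], shuffled[:num_test]


names = ["img%d.jpg" % i for i in range(8)]
train, test = split_filenames(names, ratio=0.25, seed=42)
print(len(train), len(test))
```

With the same seed and the same file names, the function returns the same train and test lists every time.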

3.4 Creating the label map file

TensorFlow requires a label map file in which each object name is mapped to an integer. When several objects need to be detected, it looks like this:

item {
    id: 1
    name: 'cat'
}

item {
    id: 2
    name: 'dog'
}

In our case there is only one object, so the file looks like this (name must match the label you used when annotating the images above):

item {
    id: 1
    name: 'raccoon'
}
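If you have many classes, writing this file by hand gets tedious. Here is a minimal sketch that generates the same text from a list of names; make_label_map is a hypothetical helper, and it assumes ids simply follow the order of the list, starting at 1.

```python
def make_label_map(class_names):
    """Build label map text with ids starting at 1 (id 0 is reserved for background)."""
    items = []
    for index, name in enumerate(class_names, start=1):
        items.append("item {\n    id: %d\n    name: '%s'\n}\n" % (index, name))
    return "\n".join(items)


print(make_label_map(["raccoon"]))
```

Saving the returned string as label_map.pbtxt gives the file described in this section.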

3.5 Creating the TFRecord files

We have created the annotation files and split the data into train and test sets. Now we need to convert them to the TFRecord format. The code below does this.

""" Sample TensorFlow XML-to-TFRecord converter

usage: generate_tfrecord.py [-h] [-x XML_DIR] [-l LABELS_PATH] [-o OUTPUT_PATH] [-i IMAGE_DIR] [-c CSV_PATH]

optional arguments:
  -h, --help            show this help message and exit
  -x XML_DIR, --xml_dir XML_DIR
                        Path to the folder where the input .xml files are stored.
  -l LABELS_PATH, --labels_path LABELS_PATH
                        Path to the labels (.pbtxt) file.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Path of output TFRecord (.record) file.
  -i IMAGE_DIR, --image_dir IMAGE_DIR
                        Path to the folder where the input image files are stored. Defaults to the same directory as XML_DIR.
  -c CSV_PATH, --csv_path CSV_PATH
                        Path of output .csv file. If none provided, then no file will be written.
"""

import os
import glob
import pandas as pd
import io
import xml.etree.ElementTree as ET
import argparse

os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'    # Suppress TensorFlow logging
import tensorflow.compat.v1 as tf
from PIL import Image
from object_detection.utils import dataset_util, label_map_util
from collections import namedtuple

# Initiate argument parser
parser = argparse.ArgumentParser(
    description="Sample TensorFlow XML-to-TFRecord converter")
parser.add_argument("-x",
                    "--xml_dir",
                    help="Path to the folder where the input .xml files are stored.",
                    type=str)
parser.add_argument("-l",
                    "--labels_path",
                    help="Path to the labels (.pbtxt) file.", type=str)
parser.add_argument("-o",
                    "--output_path",
                    help="Path of output TFRecord (.record) file.", type=str)
parser.add_argument("-i",
                    "--image_dir",
                    help="Path to the folder where the input image files are stored. "
                         "Defaults to the same directory as XML_DIR.",
                    type=str, default=None)
parser.add_argument("-c",
                    "--csv_path",
                    help="Path of output .csv file. If none provided, then no file will be "
                         "written.",
                    type=str, default=None)

args = parser.parse_args()

if args.image_dir is None:
    args.image_dir = args.xml_dir

label_map = label_map_util.load_labelmap(args.labels_path)
label_map_dict = label_map_util.get_label_map_dict(label_map)


def xml_to_csv(path):
    """Iterates through all .xml files (generated by labelImg) in a given directory and combines
    them in a single Pandas dataframe.

    Parameters:
    ----------
    path : str
        The path containing the .xml files
    Returns
    -------
    Pandas DataFrame
        The produced dataframe
    """

    xml_list = []
    for xml_file in glob.glob(path + '/*.xml'):
        tree = ET.parse(xml_file)
        root = tree.getroot()
        filename = root.find('filename').text
        width = int(root.find('size').find('width').text)
        height = int(root.find('size').find('height').text)
        for member in root.findall('object'):
            bndbox = member.find('bndbox')
            value = (filename,
                     width,
                     height,
                     member.find('name').text,
                     int(bndbox.find('xmin').text),
                     int(bndbox.find('ymin').text),
                     int(bndbox.find('xmax').text),
                     int(bndbox.find('ymax').text),
                     )
            xml_list.append(value)
    column_name = ['filename', 'width', 'height',
                   'class', 'xmin', 'ymin', 'xmax', 'ymax']
    xml_df = pd.DataFrame(xml_list, columns=column_name)
    return xml_df


def class_text_to_int(row_label):
    return label_map_dict[row_label]


def split(df, group):
    data = namedtuple('data', ['filename', 'object'])
    gb = df.groupby(group)
    return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)]


def create_tf_example(group, path):
    with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid:
        encoded_jpg = fid.read()
    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = Image.open(encoded_jpg_io)
    width, height = image.size

    filename = group.filename.encode('utf8')
    image_format = b'jpg'
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    classes_text = []
    classes = []

    for index, row in group.object.iterrows():
        xmins.append(row['xmin'] / width)
        xmaxs.append(row['xmax'] / width)
        ymins.append(row['ymin'] / height)
        ymaxs.append(row['ymax'] / height)
        classes_text.append(row['class'].encode('utf8'))
        classes.append(class_text_to_int(row['class']))

    tf_example = tf.train.Example(features=tf.train.Features(feature={
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_jpg),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes),
    }))
    return tf_example


def main(_):

    writer = tf.python_io.TFRecordWriter(args.output_path)
    path = os.path.join(args.image_dir)
    examples = xml_to_csv(args.xml_dir)
    grouped = split(examples, 'filename')
    for group in grouped:
        tf_example = create_tf_example(group, path)
        writer.write(tf_example.SerializeToString())
    writer.close()
    print('Successfully created the TFRecord file: {}'.format(args.output_path))
    if args.csv_path is not None:
        examples.to_csv(args.csv_path, index=None)
        print('Successfully created the CSV file: {}'.format(args.csv_path))


if __name__ == '__main__':
    tf.app.run()

Copy the code above into a file named generate_tfrecord.py, or download it here. Then run the two Python commands below, still in the same command prompt window as before.

# Create train data:
python generate_tfrecord.py -x [PATH_TO_IMAGES_FOLDER]/train -l [PATH_TO_ANNOTATIONS_FOLDER]/label_map.pbtxt -o [PATH_TO_ANNOTATIONS_FOLDER]/train.record

# Create test data:
python generate_tfrecord.py -x [PATH_TO_IMAGES_FOLDER]/test -l [PATH_TO_ANNOTATIONS_FOLDER]/label_map.pbtxt -o [PATH_TO_ANNOTATIONS_FOLDER]/test.record
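One detail of the conversion worth spelling out: the script stores each box as fractions of the image width and height rather than as pixels. The sketch below reproduces that arithmetic on its own. The clamping to the range 0 to 1 is my own addition for boxes drawn slightly outside the image; generate_tfrecord.py does not do it.

```python
def normalize_box(xmin, ymin, xmax, ymax, width, height):
    """Convert pixel coordinates to fractions of the image size, clamped to [0, 1]."""
    def clamp(value):
        return min(max(value, 0.0), 1.0)

    return (clamp(xmin / width), clamp(ymin / height),
            clamp(xmax / width), clamp(ymax / height))


# A 400x800 image with a box from (50, 100) to (200, 400).
print(normalize_box(50, 100, 200, 400, width=400, height=800))
```

For this example the result is (0.125, 0.125, 0.5, 0.5).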

3.6 Configuring training

First, download a pre-trained model from here.

Here I chose the ssd_mobilenet_v1_coco model. After extracting it, the folder looks like the image below:

Next, open pipeline.config and change the parameters to match our dataset. The first is num_classes, the number of classes in the dataset; we have 1 class, so set it to 1. The next is batch_size, which depends on how powerful your machine is: lower it on a low-spec machine and raise it on a stronger one.

Then set fine_tune_checkpoint to point to the pre-trained model directory you just extracted: path/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt.

Finally, set the paths to the label_map.pbtxt file and the record files.

fine_tune_checkpoint: "path/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt"

train_input_reader {
  label_map_path: "path/label_map.pbtxt"
  tf_record_input_reader {
    input_path: "path/train.record"
  }
}
eval_config {
  num_examples: 8000
  max_evals: 10
  use_moving_averages: false
}
eval_input_reader {
  label_map_path: "path/label_map.pbtxt"
  shuffle: false
  num_readers: 1
  tf_record_input_reader {
    input_path: "path/test.record"
  }
}
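If you would rather not edit the file by hand, the three scalar settings can be changed with simple text substitution. This is only a sketch under the assumption that each field appears in the usual `name: value` form; update_pipeline_config is a hypothetical helper, not part of the Object Detection API, and it replaces every occurrence of a field it finds.

```python
import re


def update_pipeline_config(text, num_classes, batch_size, checkpoint):
    """Return config text with three fields replaced by the given values."""
    # Lambdas keep backslashes in Windows paths from being read as regex escapes.
    text = re.sub(r"num_classes:\s*\d+",
                  lambda _: "num_classes: %d" % num_classes, text)
    text = re.sub(r"batch_size:\s*\d+",
                  lambda _: "batch_size: %d" % batch_size, text)
    text = re.sub(r'fine_tune_checkpoint:\s*"[^"]*"',
                  lambda _: 'fine_tune_checkpoint: "%s"' % checkpoint, text)
    return text


sample = 'num_classes: 90\nbatch_size: 24\nfine_tune_checkpoint: "PATH_TO_BE_CONFIGURED/model.ckpt"\n'
print(update_pipeline_config(
    sample, 1, 8, "path/ssd_mobilenet_v1_coco_2018_01_28/model.ckpt"))
```

In practice you would read pipeline.config into a string, pass it through this function, and write the result back. The label map and record paths still need to be set as shown in the config excerpt.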

3.7 Training

Now it is time to start training. You can find train.py in research/object_detection/legacy; run the following command to train:

python train.py --logtostderr --train_dir=training/ --pipeline_config_path=path/pipeline.config

The training process starts.

Checkpoints are saved in the training folder (the train_dir argument of the python train.py command).

Keep training until the loss drops below 1.

Exporting the inference graph

Once training has finished, or once you are satisfied with the loss, find export_inference_graph.py in research/object_detection and run the following Python command in the cmd window. If this is a new cmd window, set PYTHONPATH again as you did at the beginning.

python export_inference_graph.py --input_type image_tensor --pipeline_config_path training/pipeline.config --trained_checkpoint_prefix training/model.ckpt-XXXX --output_directory inference_graph

XXXX here should be the highest step number in the checkpoint directory (train_dir).
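Finding that number by eye is easy to get wrong when the folder holds many checkpoints. Here is a small sketch that picks it out of a list of file names. It assumes the usual TensorFlow 1 naming, such as model.ckpt-1500.index; latest_checkpoint_step is a hypothetical helper.

```python
import re


def latest_checkpoint_step(filenames):
    """Return the largest step found in names like model.ckpt-1234.index, or None."""
    steps = []
    for name in filenames:
        match = re.match(r"model\.ckpt-(\d+)\.", name)
        if match:
            steps.append(int(match.group(1)))
    return max(steps) if steps else None


# Illustrative folder listing; in practice pass os.listdir("training").
files = ["checkpoint", "model.ckpt-0.index", "model.ckpt-1500.meta",
         "model.ckpt-980.data-00000-of-00001", "pipeline.config"]
print(latest_checkpoint_step(files))
```

The returned number is what replaces XXXX in the export command.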

3.8 Demo

import os
import cv2
import numpy as np
import tensorflow as tf
import sys
  
# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
  
# Import utilities
from utils import label_map_util
from utils import visualization_utils as vis_util
  
# Name of the directory containing the object detection module we're using
MODEL_NAME = 'inference_graph' # The path to the directory where frozen_inference_graph is stored.
IMAGE_NAME = '11man.jpg'  # The path to the image in which the object has to be detected.
  
# Grab path to current working directory
CWD_PATH = os.getcwd()
  
# Path to frozen detection graph .pb file, which contains the model that is used
# for object detection.
PATH_TO_CKPT = os.path.join(CWD_PATH, MODEL_NAME, 'frozen_inference_graph.pb')
  
# Path to label map file
PATH_TO_LABELS = os.path.join(CWD_PATH, 'training', 'label_map.pbtxt')
  
# Path to image
PATH_TO_IMAGE = os.path.join(CWD_PATH, IMAGE_NAME)
  
# Number of classes the object detector can identify
NUM_CLASSES = 1
  
# Load the label map.
# Label maps map indices to category names, so that when our convolution
# network predicts `5`, we know that this corresponds to `king`.
# Here we use internal utility functions, but anything that returns a
# dictionary mapping integers to appropriate string labels would be fine
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
        label_map, max_num_classes = NUM_CLASSES, use_display_name = True)
category_index = label_map_util.create_category_index(categories)
  
# Load the Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name ='')
  
    sess = tf.Session(graph = detection_graph)
  
# Define input and output tensors (i.e. data) for the object detection classifier
  
# Input tensor is the image
image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
  
# Output tensors are the detection boxes, scores, and classes
# Each box represents a part of the image where a particular object was detected
detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
  
# Each score represents level of confidence for each of the objects.
# The score is shown on the result image, together with the class label.
detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
  
# Number of objects detected
num_detections = detection_graph.get_tensor_by_name('num_detections:0')
  
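# Load image using OpenCV and
# expand image dimensions to have shape: [1, None, None, 3]
# i.e. a single-column array, where each item in the column has the pixel RGB value.
# OpenCV loads images in BGR order, so convert to RGB before feeding the model.
image = cv2.imread(PATH_TO_IMAGE)
image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
image_expanded = np.expand_dims(image_rgb, axis = 0)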
  
# Perform the actual detection by running the model with the image as input
(boxes, scores, classes, num) = sess.run(
    [detection_boxes, detection_scores, detection_classes, num_detections],
    feed_dict ={image_tensor: image_expanded})
  
# Draw the results of the detection (aka 'visualize the results')
  
vis_util.visualize_boxes_and_labels_on_image_array(
    image,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates = True,
    line_thickness = 8,
    min_score_thresh = 0.60)
  
# All the results have been drawn on the image. Now display the image.
cv2.imshow('Object detector', image)
  
# Press any key to close the image
cv2.waitKey(0)
  
# Clean up
cv2.destroyAllWindows()
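The demo only draws the results. If you want the detections as data, for example to crop or count objects, the boxes have to be converted by hand: the model returns them as [ymin, xmin, ymax, xmax] fractions of the image size. The sketch below does that conversion and applies the same 0.60 score threshold; boxes_in_pixels is a hypothetical helper, written in plain Python so it runs without TensorFlow.

```python
def boxes_in_pixels(boxes, scores, classes, width, height, min_score=0.6):
    """Keep detections scoring at least min_score and scale boxes to pixels."""
    results = []
    for (ymin, xmin, ymax, xmax), score, class_id in zip(boxes, scores, classes):
        if score < min_score:
            continue
        results.append({
            "class_id": int(class_id),
            "score": float(score),
            # Order the corners as (left, top, right, bottom) in pixels.
            "box": (int(xmin * width), int(ymin * height),
                    int(xmax * width), int(ymax * height)),
        })
    return results


# Two made-up detections on a 200x100 image; only the first passes the threshold.
print(boxes_in_pixels([[0.25, 0.5, 0.75, 1.0], [0.0, 0.0, 1.0, 1.0]],
                      [0.9, 0.3], [1.0, 1.0], width=200, height=100))
```

With the demo's variables, a call would look like boxes_in_pixels(np.squeeze(boxes), np.squeeze(scores), np.squeeze(classes), image.shape[1], image.shape[0]).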