JP2021179971A - Method and apparatus for detecting small target, electronic device, computer readable storage medium, and computer program

Info

Publication number: JP2021179971A
Application number: JP2021051677A
Authority: JP
Inventors: Gang He
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2020-05-27
Filing date: 2021-03-25
Publication date: 2021-11-18
Anticipated expiration: 2041-03-25
Also published as: KR20210042275A; KR102523886B1; CN111626208A; CN111626208B; JP7262503B2

Abstract

Embodiments of the present disclosure disclose a method and an apparatus for detecting small targets, an electronic device, a computer-readable storage medium, and a computer program. SOLUTION: A specific embodiment of the method includes: acquiring an original image containing a small target; reducing the original image to a low-resolution image; identifying a candidate region containing the small target from the low-resolution image using a lightweight segmentation network; and determining the position of the small target in the original image by taking the region of the original image corresponding to the candidate region as a region of interest and running a pre-trained detection model on the region of interest. The embodiment uses a two-stage detection design: the region of interest is first located by the lightweight segmentation network, and the detection model is then run only on that region, which substantially reduces the amount of computation. SELECTED DRAWING: Figure 2

Description

Embodiments of the present disclosure relate to the field of computer technology, and in particular to a method and an apparatus for detecting small targets, an electronic device, a computer-readable storage medium, and a computer program.

Target detection is an important research direction in the field of autonomous driving. Detection targets fall into two main classes: stationary targets and moving targets. Stationary targets include traffic lights, traffic signs, lanes, and obstacles; moving targets include cars, pedestrians, and non-motorized vehicles. Among these, traffic sign detection is a fundamental and important task because it supplies rich and necessary navigation information to a driverless vehicle while it is travelling.

In applications such as AR navigation, it is important to detect the traffic signs of the current road section in real time and give the user corresponding prompts. In in-vehicle video, traffic signs span a wide range of sizes and include a large number of small targets (20 pixels or less). Detecting small targets places demands not only on the detection algorithm itself but also on keeping the image at high resolution, which is a serious challenge for the limited computing power of in-vehicle machines.

To ensure effective traffic sign recognition, many existing approaches train a YOLO model on input images and recognize a traffic sign by predicting its class from the resulting prediction values. Because the training network of the YOLO model is a CNN with seven convolutional layers (C1 to C7) and two fully connected layers, recognition completes relatively quickly. However, a traffic sign usually occupies only a very small part of the captured original image, and the feature map shrinks each time it passes through a convolutional layer. Methods based on the existing YOLO model therefore tend to lose the features of small images after multiple convolutional layers, which lowers the success rate of traffic sign recognition.

Embodiments of the present disclosure propose a method and an apparatus for detecting small targets, an electronic device, a computer-readable storage medium, and a computer program.

In a first aspect, embodiments of the present disclosure relate to a method for detecting small targets, including: acquiring an original image containing a small target; reducing the original image to a low-resolution image; identifying a candidate region containing the small target from the low-resolution image using a lightweight segmentation network; and taking the region of the original image corresponding to the candidate region as a region of interest and running a pre-trained detection model on the region of interest to determine the position of the small target in the original image.

In some embodiments, the detection model is trained as follows: determining the network structure of an initial detection model and initializing its network parameters; acquiring a training sample set, where each training sample includes a sample image and annotation information characterizing the position of the small target in the sample image; augmenting the training samples by at least one of copying, multi-scale variation, and editing; training the initial detection model by a machine learning method, using the sample images and the annotation information of the training samples in the augmented training sample set as the input and the desired output of the initial detection model, respectively; and taking the trained initial detection model as the pre-trained detection model.

In some embodiments, a training sample is edited as follows: a small target is extracted from the sample image, scaled and/or rotated, and then pasted at a random other position in the sample image to obtain a new sample image.

In some embodiments, the method further includes: when creating training samples for the segmentation network, setting the pixels inside the rectangular boxes of the detection task as positive samples and the pixels outside the rectangular boxes as negative samples; expanding outward the rectangular box of any small target whose side length is smaller than a predetermined number of pixels; and setting all pixels inside the expanded rectangular box as positive samples.

In some embodiments, the detection model is a deep neural network.

In some embodiments, an attention module is introduced after the feature fusion of each prediction layer to learn appropriate weights for the features of different channels.

In a second aspect, embodiments of the present disclosure relate to an apparatus for detecting small targets, including: an acquisition unit configured to acquire an original image containing a small target; a reduction unit configured to reduce the original image to a low-resolution image; a first detection unit configured to identify a candidate region containing the small target from the low-resolution image using a lightweight segmentation network; and a second detection unit configured to take the region of the original image corresponding to the candidate region as a region of interest and run a pre-trained detection model on the region of interest to determine the position of the small target in the original image.

In some embodiments, the apparatus further includes a training unit configured to: determine the network structure of an initial detection model and initialize its network parameters; acquire a training sample set, where each training sample includes a sample image and annotation information characterizing the position of the small target in the sample image; augment the training samples by at least one of copying, multi-scale variation, and editing; train the initial detection model by a machine learning method, using the sample images and the annotation information of the training samples in the augmented training sample set as the input and the desired output of the initial detection model, respectively; and take the trained initial detection model as the pre-trained detection model.

In some embodiments, the training unit is further configured to extract a small target from the sample image, scale and/or rotate it, and paste it at a random other position in the sample image to obtain a new sample image.

In some embodiments, the first detection unit is further configured to: when creating training samples for the segmentation network, set the pixels inside the rectangular boxes of the detection task as positive samples and the pixels outside the rectangular boxes as negative samples; expand outward the rectangular box of any small target whose side length is smaller than a predetermined number of pixels; and set all pixels inside the expanded rectangular box as positive samples.

In a third aspect, embodiments of the present disclosure relate to an electronic device for detecting small targets, including one or more processors and a storage device storing one or more programs, where the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any implementation of the first aspect.

In a fourth aspect, embodiments of the present disclosure relate to a computer-readable storage medium storing a computer program that, when executed by a processor, implements the method according to any implementation of the first aspect.

In a fifth aspect, embodiments of the present disclosure relate to a computer program that, when executed by a processor, implements the method according to any implementation of the first aspect.

The method and apparatus for detecting small targets according to embodiments of the present disclosure address the problem from three directions: the training method, the model structure, and two-stage detection. The training method and the model structure mainly serve to improve the model's ability to detect small targets, while two-stage detection reduces the computation spent on irrelevant regions of the image and thereby increases processing speed.

The present invention can provide a real-time traffic sign detection algorithm for AR navigation projects, performs better on small targets, and can improve the user's navigation experience.

Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the drawings below.
FIG. 1 is an exemplary system architecture diagram to which an embodiment of the present disclosure is applicable.
FIG. 2 is a flowchart of an embodiment of the method for detecting small targets according to the present disclosure.
FIG. 3 is a schematic diagram of an application scenario of the method for detecting small targets according to the present disclosure.
FIG. 4 is a flowchart of another embodiment of the method for detecting small targets according to the present disclosure.
FIG. 5 is a network structure diagram of the detection model of the method for detecting small targets according to the present disclosure.
FIG. 6 is a schematic structural diagram of an embodiment of the apparatus for detecting small targets according to the present disclosure.
FIG. 7 is a schematic structural diagram of a computer system of an electronic device suitable for implementing embodiments of the present disclosure.

The present disclosure is described in more detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the relevant invention and do not limit it. For ease of description, the drawings show only the parts relevant to the invention.

Provided there is no conflict, the embodiments of the present disclosure and the features of the embodiments may be combined with one another. The present disclosure is described in detail below with reference to the drawings and in conjunction with the embodiments.

FIG. 1 shows an exemplary system architecture 100 to which embodiments of the method or the apparatus for detecting small targets of the present invention may be applied.

As shown in FIG. 1, the system architecture 100 may include a vehicle 101 and a traffic sign 102.

The vehicle 101 may be an ordinary automobile or a driverless vehicle. A controller 1011, a network 1012, and sensors 1013 may be installed in the vehicle 101. The network 1012 provides the medium for a communication link between the controller 1011 and the sensors 1013, and may include various connection types such as wired or wireless communication links or fiber-optic cables.

The controller 1011 (also called the in-vehicle brain) is responsible for the intelligent control of the vehicle 101. The controller 1011 may be a separately installed controller such as a programmable logic controller (PLC), a single-chip microcomputer, or an industrial control machine; it may be a device composed of other electronic components that has input/output ports and computation and control functions; or it may be a computer device on which a vehicle driving control application is installed. The trained segmentation network and detection model are installed on the controller.

The sensors 1013 may be of various kinds, for example a camera, a gravity sensor, a wheel speed sensor, a temperature sensor, a humidity sensor, a lidar, or a millimeter-wave radar. In some cases, the vehicle 101 may also carry a GNSS (Global Navigation Satellite System) device, a SINS (Strap-down Inertial Navigation System), and the like.

The vehicle 101 photographs the traffic sign 102 while travelling. Whether the image is captured from far away or from close by, the traffic signs in it are small targets.

The vehicle 101 passes the captured original image containing the traffic sign to the controller for recognition, thereby determining the position of the traffic sign. The content of the traffic sign can also be recognized by OCR and then output as speech or text.

Note that the method for detecting small targets according to embodiments of the present invention is generally executed by the controller 1011, and accordingly the apparatus for detecting small targets is generally installed in the controller 1011.

It should be understood that the numbers of controllers, networks, and sensors in FIG. 1 are merely illustrative. Any number of controllers, networks, and sensors may be provided as needed.

Referring next to FIG. 2, a flow 200 of an embodiment of the method for detecting small targets according to the present disclosure is shown. The method for detecting small targets includes the following steps.

In step 201, an original image containing a small target is acquired.

In this embodiment, the entity executing the method for detecting small targets (for example, the controller shown in FIG. 1) can capture front-view images through an in-vehicle camera, and the captured original image contains a small target. A small target is the image of a target object whose height and width in pixels are both below a predetermined value (for example, 20).

In step 202, the original image is reduced to a low-resolution image.

In this embodiment, the low-resolution image can be obtained by dividing both the height and the width of the original image by 4 (or another factor). The aspect ratio is not changed during the reduction.
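
By way of a non-limiting illustration, the reduction of step 202 can be sketched as follows. The sketch assumes the image is a plain 2-D list of pixel values and uses nearest-neighbour sampling; the function name `downscale` and the choice of sampling method are assumptions made for illustration and are not taken from the disclosure.

```python
def downscale(image, factor=4):
    """Reduce a 2-D image (list of rows) by an integer factor.

    The same factor is applied to rows and columns, so the aspect
    ratio of the original image is preserved.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    # Keep every factor-th row, and every factor-th pixel in that row.
    return [row[::factor] for row in image[::factor]]


# A 1080-row by 1920-column frame becomes 270 rows by 480 columns.
frame = [[0] * 1920 for _ in range(1080)]
low_res = downscale(frame, factor=4)
print(len(low_res), len(low_res[0]))
```

A production system would more likely use an area-averaging resize from an image library; the slicing above only shows the size relationship between the two images.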

In step 203, a lightweight segmentation network is used to identify candidate regions containing small targets from the low-resolution image.

In this embodiment, the first detection stage only needs to find the approximate positions where targets may exist and does not need precise bounding boxes. It is therefore implemented with a lightweight segmentation network, and points in the final output heat map whose value exceeds a certain threshold are treated as suspected target points. A segmentation network such as U-Net can be used, with ShuffleNet as the backbone to keep the network lightweight.
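
As an illustrative sketch only, the thresholding of the output heat map can be written as below. The heat map is assumed to be a 2-D list of scores in [0, 1]; the function name `suspect_points` and the default threshold of 0.5 are hypothetical, since the disclosure specifies only "a certain threshold".

```python
def suspect_points(heatmap, threshold=0.5):
    """Return the (row, col) positions whose score exceeds the threshold."""
    return [
        (r, c)
        for r, row in enumerate(heatmap)
        for c, score in enumerate(row)
        if score > threshold
    ]


heatmap = [
    [0.1, 0.9],
    [0.6, 0.2],
]
points = suspect_points(heatmap)  # [(0, 1), (1, 0)]
```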

When training samples for the segmentation network are created, the pixels inside the rectangular boxes of the detection task are set as positive samples and the pixels outside the rectangular boxes are set as negative samples. Because the image is scaled down in both directions, to guarantee recall on small targets, the rectangular box of any target whose height and width are below a predetermined value (for example, 20 pixels) is expanded outward by 1x when the training samples are created, and all pixels inside the expanded box are then set as positive samples.
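
The label construction described above might be sketched as follows. This is a minimal sketch under stated assumptions: boxes are `(x0, y0, x1, y1)` tuples with half-open pixel ranges, and "expanded outward by 1x" is interpreted as each side growing by its own length, split evenly on both sides of the centre. That interpretation, and the names `expand_box` and `make_mask`, are assumptions rather than details given in the disclosure.

```python
import math


def expand_box(box, min_side=20, ratio=1.0):
    """Enlarge a box about its centre when width and height are both below min_side."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    if w < min_side and h < min_side:
        dx, dy = w * ratio / 2, h * ratio / 2
        return (x0 - dx, y0 - dy, x1 + dx, y1 + dy)
    return box


def make_mask(width, height, boxes, min_side=20, ratio=1.0):
    """Build a binary mask: 1 inside (possibly expanded) boxes, 0 elsewhere."""
    mask = [[0] * width for _ in range(height)]
    for box in boxes:
        x0, y0, x1, y1 = expand_box(box, min_side, ratio)
        # Clip the box to the image before filling it.
        for y in range(max(0, math.floor(y0)), min(height, math.ceil(y1))):
            for x in range(max(0, math.floor(x0)), min(width, math.ceil(x1))):
                mask[y][x] = 1
    return mask


# A 4x4 target is labelled as an 8x8 positive region.
mask = make_mask(32, 32, [(10, 10, 14, 14)])
positives = sum(map(sum, mask))
```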

In step 204, the region of the original image corresponding to the candidate region is taken as the region of interest, and a pre-trained detection model is run on the region of interest to determine the position of the small target in the original image.

In this embodiment, after the noise points in the output of the segmentation network are filtered out, the minimum bounding rectangle enclosing all remaining suspected target points is formed, and the region of the unscaled high-resolution image corresponding to that rectangle is taken as the region of interest. The detection model is then run on this region of interest, so only part of the high-resolution image needs to be processed, which reduces the amount of computation.
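
One possible way to derive the region of interest is sketched below. The disclosure does not say how noise points are filtered, so the isolated-point filter `drop_isolated` is purely an assumption; the mapping back to the original image simply multiplies low-resolution coordinates by the reduction factor. All function names are hypothetical.

```python
def drop_isolated(points):
    """Assumed noise filter: discard points that have no 8-connected neighbour."""
    point_set = set(points)
    return [
        (r, c)
        for r, c in points
        if any(
            (r + dr, c + dc) in point_set
            for dr in (-1, 0, 1)
            for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)
        )
    ]


def region_of_interest(points, factor, image_width, image_height):
    """Minimum bounding rectangle of low-res points, in original-image pixels.

    Returns (x0, y0, x1, y1) with half-open ranges, or None if no point remains.
    """
    if not points:
        return None
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    x0, y0 = min(cols) * factor, min(rows) * factor
    x1 = min((max(cols) + 1) * factor, image_width)
    y1 = min((max(rows) + 1) * factor, image_height)
    return (x0, y0, x1, y1)


kept = drop_isolated([(2, 3), (2, 4), (9, 9)])       # (9, 9) is dropped
roi = region_of_interest(kept, 4, 1920, 1080)        # (12, 8, 20, 12)
```

The detection model would then be run only on the crop `original[y0:y1]` by `[x0:x1]`, and its box coordinates offset by `(x0, y0)` to express them in the original image.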

As noted above, detecting small targets well requires keeping a higher resolution, but the amount of computation multiplies as the image grows, which makes real-time processing difficult in an in-vehicle environment. Meanwhile, traffic signs occupy only a small fraction of the image; most of it is background, which accounts for a large share of the total computation, so processing the background at high resolution is time-consuming and pointless. The present invention therefore adopts a two-stage detection scheme: a lightweight segmentation network first locates the approximate positions of suspected targets on the low-resolution image; the minimum bounding rectangle containing all suspected targets is then computed; and finally the detection model is run on the high-resolution image block corresponding to that rectangle. This reduces the amount of computation while preserving the detection rate for small targets.

After these two stages, the average computation of the detection model drops to about 25% of the original, and the combined average computation of the two models is about 45% of the original.
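
Read together, the two figures imply a share for the segmentation stage. The short calculation below makes that arithmetic explicit; it assumes that the combined cost is simply the sum of the two stages, which the disclosure does not state, and the function name is hypothetical.

```python
def segmentation_share(total_percent, detection_percent):
    """Share of the original cost left for the segmentation stage,
    assuming total = segmentation + detection on the region of interest."""
    return total_percent - detection_percent


seg = segmentation_share(total_percent=45, detection_percent=25)  # 20
speedup = 100 / 45  # rough overall speed-up factor under the same assumption
```

Under this assumption the lightweight segmentation network accounts for roughly 20% of the original computation, and the pipeline as a whole runs a little over twice as fast as full-resolution detection.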

Referring next to FIG. 3, FIG. 3 is a schematic diagram of an application scenario of the method for detecting small targets according to this embodiment. In the application scenario of FIG. 3, the vehicle collects front-view images in real time while driving. The height and width of each acquired original image are divided by 4 to reduce it to a low-resolution image. The low-resolution image is input to the lightweight segmentation network to identify candidate regions containing traffic signs. The region of the original image corresponding to the candidate regions is then taken as the region of interest. The image of the region of interest is cropped and input to the pre-trained detection model to determine the exact position of the traffic sign in the original image, as indicated by the dashed box.

The method according to the above embodiment of the present disclosure uses two-stage detection to reduce the amount of computation and to improve recognition speed and accuracy.

Referring further to FIG. 4, a flow 400 of another embodiment of the method for detecting small targets is shown. The flow 400 includes the following steps.

In step 401, the network structure of an initial detection model is determined and the network parameters of the initial detection model are initialized.

In this embodiment, the electronic device on which the method for detecting small targets runs (for example, the controller shown in FIG. 1) can train the detection model. Alternatively, the detection model can be trained on a third-party server and then installed on the vehicle's controller. The detection model is a neural network model and may be any existing neural network for target detection.

In some optional implementations of this embodiment, the detection model is a deep neural network, for example a YOLO-family network. YOLO (You Only Look Once) is an object recognition and localization algorithm based on a deep neural network; its main strength is that it runs fast enough to be used in real-time systems. YOLO has now evolved to version 3 (YOLO3), with each new version building on the previous one. In the original YOLO3 structure, low-resolution feature maps are fused with high-resolution feature maps through upsampling. However, this fusion takes place only on the high-resolution feature maps, so features of different scales are not fully fused.

To fuse features from different levels better, the present invention first selects the 8x, 16x, and 32x subsampled features of the backbone network as base features. Then, to predict targets of different sizes, the prediction feature maps are set to 8x, 16x, and 32x subsampling of the image, respectively. Each prediction feature map draws its features from all three base feature layers, which are brought to a common size by subsampling or upsampling before being fused. Taking the 16x prediction layer as an example: its features come from the three base feature layers; to bring them to the same size, the 8x base feature layer is subsampled by one step (a factor of 2, from 8x to 16x), the 32x base feature layer is upsampled by one step (a factor of 2, from 32x to 16x), and the two resulting feature layers are fused with the 16x base feature layer.
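
The resizing needed before fusion follows directly from the strides, and can be tabulated with a small helper. This is a sketch for illustration only: the function name `resampling_plan` and the tuple format are hypothetical, and the helper only reports which operation each base layer needs rather than performing it.

```python
BASE_STRIDES = (8, 16, 32)  # subsampling factors of the three base feature layers


def resampling_plan(target_stride, base_strides=BASE_STRIDES):
    """For one prediction layer, list (base_stride, operation, factor) per base layer."""
    plan = []
    for stride in base_strides:
        if stride < target_stride:
            # Base layer is larger than the prediction layer: shrink it.
            plan.append((stride, "subsample", target_stride // stride))
        elif stride > target_stride:
            # Base layer is smaller than the prediction layer: enlarge it.
            plan.append((stride, "upsample", stride // target_stride))
        else:
            plan.append((stride, "keep", 1))
    return plan


plan_16 = resampling_plan(16)
# [(8, 'subsample', 2), (16, 'keep', 1), (32, 'upsample', 2)]
```

For the 8x prediction layer the same helper reports a 2x upsampling of the 16x layer and a 4x upsampling of the 32x layer, which shows why the smallest-stride prediction layer is the most expensive one to assemble.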

If features of different scales are simply fused, the proportions of the features are identical in all three prediction layers, and the layers cannot emphasize different features according to their different prediction targets. Therefore, an attention module is additionally introduced after the feature fusion of each prediction layer to learn appropriate weights for the features of different channels, so that each prediction layer can emphasize the fused features according to the characteristics of the targets it needs to predict. The network structure is shown in FIG. 5. How the parameters of the attention module are learned is prior art and is not described here.
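
The disclosure does not specify the form of the attention module. As one possible illustration, the sketch below applies a simplified per-channel gate in the spirit of squeeze-and-excitation: each channel is globally average-pooled, passed through a sigmoid, and used to rescale that channel. The one-scalar-per-channel `weights` and `biases` stand in for learned parameters and are assumptions; a real module would typically mix channels through small fully connected layers.

```python
import math


def channel_attention(feature_maps, weights, biases):
    """Rescale each channel by a sigmoid gate computed from its global average.

    feature_maps: list of channels, each a 2-D list of floats.
    weights, biases: one hypothetical learned scalar per channel.
    Returns (reweighted feature maps, gate values).
    """
    reweighted, gates = [], []
    for channel, w, b in zip(feature_maps, weights, biases):
        values = [v for row in channel for v in row]
        pooled = sum(values) / len(values)                 # global average pooling
        gate = 1.0 / (1.0 + math.exp(-(w * pooled + b)))   # sigmoid, in (0, 1)
        gates.append(gate)
        reweighted.append([[v * gate for v in row] for row in channel])
    return reweighted, gates


fused = [[[1.0, 1.0]], [[2.0, 2.0]]]           # two channels, each 1x2
out, gates = channel_attention(fused, weights=[0.0, 0.0], biases=[0.0, 0.0])
```

With zero weights and biases every gate equals 0.5, so all channels are scaled equally; training would move the gates apart so that each prediction layer emphasizes the channels most useful for its target size.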

In the present disclosure, YOLO3 can be adopted as the detection network. In such anchor-based detection methods, the design and assignment of anchors is critical: because few anchors can match small targets, the model learns small targets insufficiently and fails to detect them well. A dynamic anchor matching mechanism is therefore adopted: the IoU (intersection over union) threshold used when matching anchors to the ground truth is selected adaptively according to the size of the ground truth. When the target is small, the IoU threshold is lowered so that more small targets take part in training, which improves the model's performance on small targets. Because the target size is already known when the training samples are created, an appropriate IoU threshold can be selected according to that size.
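
A minimal sketch of size-dependent anchor matching is given below. Boxes are `(x0, y0, x1, y1)` tuples; the specific thresholds (0.3 for small targets, 0.5 otherwise), the 20-pixel size limit used to select between them, and the function names are hypothetical values chosen for illustration, since the disclosure states only that the threshold is lowered for small targets.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union if union > 0 else 0.0


def iou_threshold(gt_box, small_side=20, small_threshold=0.3, default_threshold=0.5):
    """Pick a lower matching threshold when the ground-truth box is small."""
    w, h = gt_box[2] - gt_box[0], gt_box[3] - gt_box[1]
    return small_threshold if max(w, h) < small_side else default_threshold


def matching_anchors(gt_box, anchors):
    """Anchors whose IoU with the ground truth reaches the adaptive threshold."""
    threshold = iou_threshold(gt_box)
    return [anchor for anchor in anchors if iou(anchor, gt_box) >= threshold]


anchors = [(0, 0, 16, 16), (0, 0, 10, 10), (50, 50, 60, 60)]
matched = matching_anchors((0, 0, 10, 10), anchors)
```

Here the 16x16 anchor overlaps the 10x10 ground truth with an IoU of about 0.39, which would be rejected at a fixed threshold of 0.5 but is accepted once the threshold is lowered for the small target.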

In step 402, a training sample set is obtained.

In this embodiment, each training sample includes a sample image and annotation information characterizing the position of the small target in the sample image.

In step 403, the training samples are augmented by at least one of copying, multi-scale transformation, and editing.

In this embodiment, this is mainly a strategy for the case where the training data contains too few small targets. Copying images that contain small targets multiple times directly increases the number of small targets in the data. In addition, extracting a small target from an image, scaling or rotating it, and then pasting it at a random other position in the image not only increases the number of small targets but also introduces more variation and enriches the distribution of the training data.
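The extract, transform, and paste operation can be sketched as follows. This is an illustrative assumption rather than the disclosed implementation: images are plain 2-D lists, rotation is restricted to 90 degrees, scaling to integer factors, and the sketch does not check whether the pasted copy overlaps an existing target. All function names are hypothetical.

```python
import random

def crop(image, box):
    """Cut the (x1, y1, x2, y2) region out of a 2-D list image."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in image[y1:y2]]

def rotate90(patch):
    """Rotate a 2-D patch by 90 degrees clockwise."""
    return [list(row) for row in zip(*patch[::-1])]

def upscale(patch, factor):
    """Nearest-neighbour upscaling by an integer factor."""
    return [[v for v in row for _ in range(factor)]
            for row in patch for _ in range(factor)]

def paste_small_target(image, box, rng, factor=1, rotate=False):
    """Return a new image and the annotation box of the pasted copy."""
    patch = crop(image, box)
    if rotate:
        patch = rotate90(patch)
    patch = upscale(patch, factor)
    ph, pw = len(patch), len(patch[0])
    h, w = len(image), len(image[0])
    # Random top-left corner such that the patch stays inside the image.
    x = rng.randint(0, w - pw)
    y = rng.randint(0, h - ph)
    new_image = [row[:] for row in image]
    for dy in range(ph):
        new_image[y + dy][x:x + pw] = patch[dy]
    return new_image, (x, y, x + pw, y + ph)

image = [[0] * 8 for _ in range(8)]
image[1][1] = 9
image[1][2] = 9  # a 2 x 1 target with box (1, 1, 3, 2)
rng = random.Random(0)
augmented, new_box = paste_small_target(image, (1, 1, 3, 2), rng,
                                        factor=2, rotate=True)
```

The returned box would be appended to the annotation information of the new sample image so that the pasted copy is also used as a training target.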

Optionally, the training images can be rescaled to different scales before training. This enriches the variation in target scale found in the original dataset and lets the model adapt to detecting targets of different scales.

In step 404, the sample images and the annotation information of the training samples in the augmented training sample set are used as the input and the desired output of the initial detection model, respectively, and the initial detection model is trained by a machine learning method.

In this embodiment, the execution body can input a sample image of a training sample in the training sample set into the initial detection model to obtain position information of the small target in that sample image, take the annotation information of that training sample as the desired output of the initial detection model, and train the initial detection model by a machine learning method. Specifically, a preset loss function can first be used to calculate the difference between the obtained position information and the annotation information of the training sample; for example, the L2 norm can be used as the loss function. The network parameters of the initial detection model can then be adjusted based on the calculated difference, and training ends when a preset termination condition is satisfied. The preset termination condition may include, but is not limited to, at least one of the following: the training time exceeds a preset duration, the number of training iterations exceeds a preset count, and the calculated difference is smaller than a preset difference threshold.
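The loss calculation and the three termination conditions can be illustrated on a toy problem. The sketch below is not the detection network: it assumes a one-dimensional linear model fitted by full-batch gradient descent on a squared-error (L2) loss, and the learning rate, limits, and threshold are hypothetical values.

```python
import time

def train(samples, lr=0.1, max_steps=1000, max_seconds=5.0,
          loss_threshold=1e-6):
    """Fit y = w * x + b with a mean squared-error (L2) loss.

    Stops on any of three conditions: loss below a threshold,
    elapsed time above a limit, or step count above a limit.
    """
    w, b = 0.0, 0.0
    n = len(samples)
    start = time.monotonic()
    for step in range(1, max_steps + 1):
        grad_w = grad_b = loss = 0.0
        for x, y in samples:
            err = (w * x + b) - y  # prediction minus annotation
            loss += err * err
            grad_w += 2 * err * x
            grad_b += 2 * err
        loss /= n
        if loss < loss_threshold:
            return w, b, step, "loss below threshold"
        if time.monotonic() - start > max_seconds:
            return w, b, step, "time limit"
        # Adjust the parameters against the gradient of the loss.
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b, max_steps, "step limit"

# Samples drawn from y = 2x + 1.
w, b, steps, reason = train([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
```

For a real detection model the same structure applies, with the forward pass, loss, and gradient computation supplied by a deep learning framework.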

Here, the network parameters of the initial detection model can be adjusted in various ways based on the difference between the generated position information and the annotation information of the training sample. For example, the BP (backpropagation) algorithm or the SGD (stochastic gradient descent) algorithm can be used to adjust the network parameters of the initial detection model.

In step 405, the initial detection model obtained by training is determined as the pre-trained detection model.

In this embodiment, the execution body of the training steps can determine the initial detection model obtained by training in step 404 as the pre-trained detection model.

With further reference to FIG. 6, as an implementation of the methods shown in the above figures, the present invention provides an embodiment of an apparatus for detecting a small target. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.

As shown in FIG. 6, the apparatus 600 for detecting a small target according to this embodiment includes an acquisition unit 601, a reduction unit 602, a first detection unit 603, and a second detection unit 604. The acquisition unit 601 is configured to acquire an original image containing a small target. The reduction unit 602 is configured to reduce the original image to a low-resolution image. The first detection unit 603 is configured to identify a candidate region containing the small target from the low-resolution image using a lightweight segmentation network. The second detection unit 604 is configured to take the region of the original image corresponding to the candidate region as a region of interest and to execute a pre-trained detection model on the region of interest, thereby determining the position of the small target in the original image.
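The data flow through the four units can be sketched end to end. In the sketch below the segmentation network and the detection model are replaced by toy threshold functions passed in as callables, and the reduction step is assumed to be block-wise max pooling by an integer factor; these stand-ins and all function names are hypothetical and serve only to show how coordinates are mapped between the low-resolution image, the region of interest, and the original image.

```python
def downscale(image, factor):
    """Reduce resolution by keeping the maximum of each factor x factor block."""
    h, w = len(image), len(image[0])
    return [[max(image[y + dy][x + dx]
                 for dy in range(factor) for dx in range(factor))
             for x in range(0, w - factor + 1, factor)]
            for y in range(0, h - factor + 1, factor)]

def candidate_region(mask):
    """Bounding box (x1, y1, x2, y2) of all positive mask pixels, or None."""
    points = [(x, y) for y, row in enumerate(mask)
              for x, v in enumerate(row) if v]
    if not points:
        return None
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), min(ys), max(xs) + 1, max(ys) + 1

def detect_small_target(image, segment, detect, factor=4):
    """Two-stage detection: coarse candidate region, then fine detection."""
    low_res = downscale(image, factor)
    region = candidate_region(segment(low_res))
    if region is None:
        return None
    # Map the candidate region back to original-image coordinates.
    x1, y1, x2, y2 = (c * factor for c in region)
    roi = [row[x1:x2] for row in image[y1:y2]]
    # The detector reports a box in ROI coordinates; shift it back.
    bx1, by1, bx2, by2 = detect(roi)
    return bx1 + x1, by1 + y1, bx2 + x1, by2 + y1

def toy_segment(low_res):
    """Stand-in for the lightweight segmentation network."""
    return [[v > 0 for v in row] for row in low_res]

def toy_detect(roi):
    """Stand-in for the pre-trained detection model."""
    return candidate_region([[v > 0 for v in row] for row in roi])

image = [[0] * 16 for _ in range(16)]
image[9][6] = 1
image[9][7] = 1
position = detect_small_target(image, toy_segment, toy_detect)
```

The saving comes from running the expensive detector only on the 4 x 4 region of interest instead of the full 16 x 16 image.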

In this embodiment, for the specific processing of the acquisition unit 601, the reduction unit 602, the first detection unit 603, and the second detection unit 604 of the apparatus 600 for detecting a small target, reference can be made to steps 201, 202, 203, and 204 in the embodiment corresponding to FIG. 2.

In some optional implementations of this embodiment, the apparatus 600 further includes a training unit (not shown) configured to: determine the network structure of an initial detection model and initialize the network parameters of the initial detection model; obtain a training sample set, where each training sample includes a sample image and annotation information characterizing the position of the small target in the sample image; augment the training samples by at least one of copying, multi-scale transformation, and editing; use the sample images and the annotation information of the training samples in the augmented training sample set as the input and the desired output of the initial detection model, respectively, and train the initial detection model by a machine learning method; and determine the initial detection model obtained by training as the pre-trained detection model.

In some optional implementations of this embodiment, the training unit is further configured to extract a small target from a sample image, scale and/or rotate the small target, and then paste it at a random other position in the sample image to obtain a new sample image.

In some optional implementations of this embodiment, the first detection unit is further configured to, when creating training samples for the segmentation network, set the pixels inside the rectangular boxes of the detection task as positive samples and the pixels outside the rectangular boxes as negative samples, expand outward the rectangular box of any small target whose side length is smaller than a predetermined number of pixels, and set all pixels inside the expanded rectangular box as positive samples.
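A minimal sketch of this label construction follows. The text does not say how far a box is expanded, so the sketch assumes that each undersized side is padded symmetrically up to a hypothetical minimum of `min_side` pixels and then clipped to the image; the function names are illustrative.

```python
def expand_box(box, min_side, width, height):
    """Pad a box symmetrically to at least min_side per side, then clip."""
    x1, y1, x2, y2 = box
    pad_x = max(0, min_side - (x2 - x1))
    pad_y = max(0, min_side - (y2 - y1))
    x1, x2 = x1 - pad_x // 2, x2 + (pad_x - pad_x // 2)
    y1, y2 = y1 - pad_y // 2, y2 + (pad_y - pad_y // 2)
    # Clipping can leave a box near the border smaller than min_side.
    return max(0, x1), max(0, y1), min(width, x2), min(height, y2)

def make_mask(width, height, boxes, min_side=4):
    """Segmentation labels: 1 inside (expanded) boxes, 0 elsewhere."""
    mask = [[0] * width for _ in range(height)]
    for box in boxes:
        x1, y1, x2, y2 = expand_box(box, min_side, width, height)
        for y in range(y1, y2):
            for x in range(x1, x2):
                mask[y][x] = 1
    return mask

# A 2 x 2 target is expanded to 4 x 4 before being marked positive.
mask = make_mask(10, 10, [(4, 4, 6, 6)])
positives = sum(map(sum, mask))
```

Expanding the box gives very small targets enough positive pixels to survive the reduction to low resolution.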

In some optional implementations of this embodiment, the detection model is a deep neural network.

In some optional implementations of this embodiment, an attention module is introduced after the feature fusion of each prediction layer to learn appropriate weights for the features of different channels.

Referring now to FIG. 7, a schematic structural diagram of an electronic device 700 (for example, the controller shown in FIG. 1) suitable for implementing the embodiments of the present disclosure is shown. The controller shown in FIG. 7 is merely an example and does not limit the functions or the scope of use of the embodiments of the present disclosure.

As shown in FIG. 7, the electronic device 700 may include a processing device 701 (for example, a central processing unit or a graphics processor) that can perform various appropriate operations and processing according to a program stored in a read-only memory (ROM) 702 or a program loaded from a storage device 708 into a random access memory (RAM) 703. The RAM 703 also stores various programs and data required for the operation of the electronic device 700. The processing device 701, the ROM 702, and the RAM 703 are connected to one another by a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

Typically, the following are connected to the I/O interface 705: an input device 706 including, for example, a touch screen, a touch panel, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output device 707 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage device 708 including, for example, a magnetic tape and a hard disk; and a communication device 709. The communication device 709 allows the electronic device 700 to communicate wirelessly or by wire with other devices in order to exchange data. Although FIG. 7 shows the electronic device 700 with various devices, it should be understood that it is not required to implement or provide all of the devices shown; more or fewer devices may be implemented or provided instead. Each block shown in FIG. 7 may represent one device or, as needed, a plurality of devices.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, the embodiments of the present disclosure include a computer program product that includes a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via the communication device 709, installed from the storage device 708, or installed from the ROM 702. When the computer program is executed by the processing device 701, the above functions defined in the methods of the embodiments of the present disclosure are performed. The computer-readable medium described in the embodiments of the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media include, but are not limited to, an electrical connection with one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present disclosure, a computer-readable storage medium can be any tangible medium that contains or stores a program, and the program can be used by or in combination with an instruction execution system, apparatus, or device. In the embodiments of the present disclosure, a computer-readable signal medium may include a data signal that carries computer-readable program code and is propagated in baseband or as part of a carrier wave. Such a propagated data signal can take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may be any computer-readable medium other than a computer-readable storage medium, and it can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on a computer-readable medium can be transmitted over any suitable medium, including, but not limited to, a wire, an optical fiber cable, RF (radio frequency), or any suitable combination of the above.

The computer-readable medium may be included in the electronic device, or it may exist separately without being incorporated into the electronic device. The computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device acquires an original image containing a small target, reduces the original image to a low-resolution image, takes the region of the original image corresponding to the candidate region as a region of interest, and executes a pre-trained detection model on the region of interest to determine the position of the small target in the original image.

Computer program code for performing the operations of the embodiments of the present disclosure can be written in one or more programming languages or a combination thereof. These programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).

The flowcharts and block diagrams in the figures show the architectures, functions, and operations that can be implemented by the systems, methods, and computer program products according to the various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram can represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing a specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functions involved. It should further be noted that each block of the block diagrams and/or flowcharts, and each combination of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units according to the embodiments of the present disclosure may be implemented in software or in hardware. The described units may be provided in a processor, which may be described, for example, as "a processor including an acquisition unit, a reduction unit, a first detection unit, and a second detection unit". The names of these units do not, in some cases, limit the units themselves; for example, the acquisition unit may also be described as "a unit that receives a user's web page browsing request".

The above description is merely a description of preferred embodiments of the present disclosure and of the technical principles applied. Those skilled in the art should understand that the scope of the invention according to the embodiments of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present disclosure.

Claims

A method for detecting a small target, comprising:
acquiring an original image containing a small target;
reducing the original image to a low-resolution image;
identifying a candidate region containing the small target from the low-resolution image using a lightweight segmentation network; and
taking the region of the original image corresponding to the candidate region as a region of interest, and executing a pre-trained detection model on the region of interest to determine the position of the small target in the original image.

The method according to claim 1, wherein the detection model is trained by:
determining the network structure of an initial detection model and initializing the network parameters of the initial detection model;
obtaining a training sample set, wherein each training sample includes a sample image and annotation information characterizing the position of the small target in the sample image;
augmenting the training samples by at least one of copying, multi-scale transformation, and editing;
using the sample images and the annotation information of the training samples in the augmented training sample set as the input and the desired output of the initial detection model, respectively, and training the initial detection model by a machine learning method; and
determining the initial detection model obtained by training as the pre-trained detection model.

The method according to claim 2, wherein the training samples are edited by:
extracting a small target from a sample image; and
scaling and/or rotating the small target and then pasting it at a random other position in the sample image to obtain a new sample image.

The method according to claim 1, further comprising:
when creating training samples for the segmentation network, setting the pixels inside the rectangular boxes of the detection task as positive samples and the pixels outside the rectangular boxes as negative samples;
expanding outward the rectangular box of any small target whose side length is smaller than a predetermined number of pixels; and
setting all pixels inside the expanded rectangular box as positive samples.

The method according to any one of claims 1 to 3, wherein the detection model is a deep neural network.

The method according to claim 5, wherein an attention module is introduced after the feature fusion of each prediction layer to learn appropriate weights for the features of different channels.

An apparatus for detecting a small target, comprising:
an acquisition unit configured to acquire an original image containing a small target;
a reduction unit configured to reduce the original image to a low-resolution image;
a first detection unit configured to identify a candidate region containing the small target from the low-resolution image using a lightweight segmentation network; and
a second detection unit configured to take the region of the original image corresponding to the candidate region as a region of interest and to execute a pre-trained detection model on the region of interest to determine the position of the small target in the original image.

The apparatus according to claim 7, further comprising a training unit configured to:
determine the network structure of an initial detection model and initialize the network parameters of the initial detection model;
obtain a training sample set, wherein each training sample includes a sample image and annotation information characterizing the position of the small target in the sample image;
augment the training samples by at least one of copying, multi-scale transformation, and editing;
use the sample images and the annotation information of the training samples in the augmented training sample set as the input and the desired output of the initial detection model, respectively, and train the initial detection model by a machine learning method; and
determine the initial detection model obtained by training as the pre-trained detection model.

The apparatus according to claim 8, wherein the training unit is further configured to:
extract a small target from a sample image; and
scale and/or rotate the small target and then paste it at a random other position in the sample image to obtain a new sample image.

The apparatus according to claim 7, wherein the first detection unit is further configured to:
when creating training samples for the segmentation network, set the pixels inside the rectangular boxes of the detection task as positive samples and the pixels outside the rectangular boxes as negative samples;
expand outward the rectangular box of any small target whose side length is smaller than a predetermined number of pixels; and
set all pixels inside the expanded rectangular box as positive samples.

The apparatus according to any one of claims 7 to 10, wherein the detection model is a deep neural network.

The apparatus according to claim 11, wherein an attention module is introduced after the feature fusion of each prediction layer to learn appropriate weights for the features of different channels.

An electronic device for detecting a small target, comprising:
one or more processors; and
a storage device storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1 to 6.

A computer-readable medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1 to 6.

A computer program that, when executed by a processor, implements the method according to any one of claims 1 to 6.