WO2019223582A1 - Target detection method and system


Info

Publication number
WO2019223582A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
image data
model
training
examination
Prior art date
Application number
PCT/CN2019/087015
Other languages
French (fr)
Inventor
Haifeng Shen
Yuan Zhao
Guangda YU
Original Assignee
Beijing Didi Infinity Technology And Development Co., Ltd.
Priority claimed from CN201810510732.3A external-priority patent/CN108805180B/en
Priority claimed from CN201810547022.8A external-priority patent/CN110555339A/en
Application filed by Beijing Didi Infinity Technology And Development Co., Ltd. filed Critical Beijing Didi Infinity Technology And Development Co., Ltd.
Publication of WO2019223582A1 publication Critical patent/WO2019223582A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2155Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling

Definitions

  • the present disclosure generally relates to target detection, and in particular, to systems and methods for training a target detection model and systems and methods for executing the trained target detection model to detect targets in image data.
  • Target detection systems have been widely used in a variety of scenarios that require a detection or recognition of targets or portions thereof. Exemplary scenarios include autonomous driving, video surveillance, face detection, and security checks (e.g., Customs) .
  • target detection systems execute a target detection model to detect targets in image data (e.g., still images, videos, video frames, software code corresponding to images) , and the target detection model is normally trained by a plurality of labelled sample images.
  • the labelled sample images are generated by an operator.
  • because the postures, sizes, and/or shapes of the targets in the sample images are different, errors may occur during the labelling of the targets.
  • the labelling of the sample images can be very subjective (e.g., highly dependent on the operator’s personal judgments) .
  • interference subjects are subjects that have the same characteristics as the target.
  • the target detection systems are likely to erroneously detect the interference subjects as the targets. Therefore, in order to improve the accuracy of the detection or recognition of the targets, during the labelling of the sample images, the interference subjects need to be labelled and distinguished from the targets.
  • a target detection method may include obtaining a target detection model.
  • the target detection model may be generated based on supervised labelled image data and unsupervised labelled image data of pre-training image data.
  • the target detection model may indicate a detection model for targets.
  • the supervised labelled image data and the unsupervised labelled image data may indicate attribute information of the targets in the pre-training image data.
  • the target detection method may further include determining image data-to-be-detected and executing the target detection model on the image data-to-be-detected to generate attribute information of the targets in the image data-to-be-detected.
  • the obtaining the target detection model may include generating an initial seed model based on the supervised labelled image data, preprocessing the pre-training image data based on the initial seed model to generate unsupervised labelled image data, and generating the target detection model based on the supervised labelled image data and the unsupervised labelled image data.
  • the target detection method may further include, before the target detection model is generated based on supervised labelled image data and unsupervised labelled image data of pre-training image data, performing a filtering process on the unsupervised labelled image data to obtain retained first unsupervised labelled image data.
  • the retained first unsupervised labelled image data may satisfy a first preset condition.
  • the target detection method may further include generating the target detection model based on supervised labelled image data and the first unsupervised labelled image data.
  • the first preset condition may include an average output score of the initial seed model being greater than a first threshold, a size ratio of boundaries of a detection region satisfying a preset ratio, a color of a detection region satisfying a preset color requirement, and/or an angle of a target satisfying a preset angle.
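  • As an illustrative sketch only, the first preset condition described above could be evaluated per detection as shown below; the detection fields, threshold values, and helper names are assumptions rather than the disclosed implementation (a color check could be added in the same way):

```python
# Illustrative sketch: retaining unsupervised labelled image data that satisfy a
# "first preset condition". The detection dict layout and all thresholds are assumed.

def satisfies_first_preset_condition(detection,
                                     score_threshold=0.8,        # first threshold (assumed)
                                     aspect_ratio_range=(0.5, 2.0),
                                     allowed_angles=(0, 90),
                                     angle_tolerance=10):
    """Return True if a machine-labelled detection should be retained."""
    # 1. Output score of the seed model greater than a first threshold
    #    (checked per detection here for simplicity; the disclosure refers to an average score).
    if detection["score"] <= score_threshold:
        return False

    # 2. Size ratio of the boundaries of the detection region satisfying a preset ratio.
    x0, y0, x1, y1 = detection["box"]
    width, height = x1 - x0, y1 - y0
    ratio = width / max(height, 1e-6)
    if not (aspect_ratio_range[0] <= ratio <= aspect_ratio_range[1]):
        return False

    # 3. Angle of the target satisfying a preset angle (within an assumed tolerance).
    angle = detection.get("angle", 0)
    if all(abs(angle - a) > angle_tolerance for a in allowed_angles):
        return False

    return True


def filter_unsupervised_labels(detections):
    """Keep only the detections that satisfy the first preset condition."""
    return [d for d in detections if satisfies_first_preset_condition(d)]
```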
  • the generating the target detection model based on supervised labelled image data and the first unsupervised labelled image data may include step A: generating a seed model based on the supervised labelled image data and the first unsupervised labelled image data, step B: preprocessing the pre-training image data based on the seed model to generate updated first unsupervised labelled image data and step C: training the seed model based on the supervised labelled image data and the updated first unsupervised labelled image data to generate an updated seed model.
  • the generating the target detection model may include designating the updated first unsupervised labelled image data as the first unsupervised labelled image data, designating the updated seed model as the seed model, performing step A and step B iteratively until the updated first unsupervised labelled image data satisfy a second preset condition, and designating the updated seed model as the target detection model.
  • the second preset condition may include a count of the iterations being greater than a second threshold and/or an average score corresponding to the updated first unsupervised labelled image data being greater than a third threshold.
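  • A minimal sketch of this iterative procedure (steps A-C with the first and second preset conditions) is given below; train_model, preprocess, and filter_by_first_condition are assumed placeholders, and the threshold values are illustrative:

```python
# Illustrative sketch of the iterative semi-supervised training loop.
# train_model(), preprocess(), and filter_by_first_condition() are assumed placeholders.

def build_target_detection_model(supervised_data, pretraining_data,
                                 max_iterations=10,         # second threshold (assumed)
                                 score_threshold=0.95):     # third threshold (assumed)
    # Generate an initial seed model from the supervised labelled image data.
    seed_model = train_model(supervised_data)

    # Preprocess the pre-training image data and keep the labels that satisfy the
    # first preset condition (the "first unsupervised labelled image data").
    unsupervised_data = filter_by_first_condition(preprocess(seed_model, pretraining_data))

    for iteration in range(max_iterations):
        # Steps A/C: train a seed model on the supervised + first unsupervised labelled data.
        seed_model = train_model(supervised_data + unsupervised_data)

        # Step B: re-label the pre-training image data with the updated seed model.
        unsupervised_data = filter_by_first_condition(preprocess(seed_model, pretraining_data))

        # Second preset condition: stop when the average score of the updated first
        # unsupervised labelled image data is high enough or max_iterations is reached.
        avg_score = (sum(d["score"] for d in unsupervised_data)
                     / max(len(unsupervised_data), 1))
        if avg_score > score_threshold:
            break

    # The seed model of the final iteration is designated as the target detection model.
    return seed_model
```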
  • the target detection method may also include: processing the set of training image data based on a second target examination model, wherein the second target examination model provides labelling information of the target and labelling information of an interference subject associated with the target; and generating the target detection model based on the set of processed training image data.
  • the target detection method may also include: processing the set of supervised labelled image data based on a second target examination model, wherein the second target examination model provides labelling information of the target and labelling information of an interference subject associated with the target; obtaining a set of processed training image data based on the set of processed supervised labelled image data; and generating the target detection model based on the set of processed training image data.
  • the labelling information of the target may include a category of the target, and a score of the target.
  • the labelling information of the interference subject may include a category of the interference subject, and a score of the interference subject.
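  • For illustration only, such labelling information could be represented by a simple record; the field names and example values are assumptions:

```python
# Illustrative record for the labelling information of a target or an interference subject.
from dataclasses import dataclass

@dataclass
class LabelInfo:
    category: str               # category of the target or interference subject
    score: float                # score output by the target examination model
    is_interference: bool = False

# Example: a detected target and an interference subject that resembles it.
target_label = LabelInfo(category="pedestrian", score=0.93)
interference_label = LabelInfo(category="pedestrian", score=0.41, is_interference=True)
```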
  • a target detection apparatus may include an acquisition unit, a determination unit, and a detection unit.
  • the acquisition unit may be configured to obtain a target detection model.
  • the target detection model may be generated based on supervised labelled image data and unsupervised labelled image data of pre-training image data.
  • the target detection model may indicate a detection model for targets.
  • the supervised labelled image data and the unsupervised labelled image data may indicate attribute information of the targets in the pre-training image data.
  • the determination unit may be configured to determine image data-to-be-detected.
  • the detection unit may be configured to execute the target detection model on the image data-to-be-detected to generate attribute information of the targets in the image data-to-be-detected.
  • the acquisition unit may be further configured to generate an initial seed model based on the supervised labelled image data, preprocess the pre-training image data based on the initial seed model to generate unsupervised labelled image data, and generate the target detection model based on the supervised labelled image data and the unsupervised labelled image data.
  • the apparatus may further include a processing unit, configured to perform a filtering process on the unsupervised labelled image data to obtain retained first unsupervised labelled image data.
  • the retained first unsupervised labelled image data may satisfy a first preset condition.
  • the acquisition unit may be further configured to generate the target detection model based on supervised labelled image data and the first unsupervised labelled image data.
  • the first preset condition may include an average output score of the initial seed model being greater than a first threshold, a size ratio of boundaries of a detection region satisfying a preset ratio, a color of a detection region satisfying a preset color requirement, and/or an angle of a target satisfying a preset angle.
  • the acquisition unit may be further configured to perform a step A: generating a seed model based on the supervised labelled image data and the first unsupervised labelled image data, perform a step B: preprocessing the pre-training image data based on the seed model to generate updated first unsupervised labelled image data, and perform a step C: training the seed model based on the supervised labelled image data and the updated first unsupervised labelled image data to generate an updated seed model.
  • the acquisition unit may be further configured to designate the updated first unsupervised labelled image data as the first unsupervised labelled image data, designate the updated seed model as the seed model, perform step A and step B iteratively until the updated first unsupervised labelled image data satisfy a second preset condition, and designate the updated seed model as the target detection model.
  • the second preset condition may include a count of the iterations being greater than a second threshold and/or an average score corresponding to the updated first unsupervised labelled image data being greater than a third threshold.
  • a computer device may include at least one computer-readable storage medium including a set of instructions and at least one processor in communication with the at least one computer-readable storage medium.
  • the at least one processor may be directed to perform a target detection method.
  • a non-transitory computer-readable storage medium embodying a computer program product may include instructions and be configured to cause a computing device to perform a target detection method.
  • a target examination method may include: obtaining data to be examined; and examining the data to be examined, outputting an examination result including a category identifier of a target to be examined and a category identifier of an interference subject, and obtaining the target to be examined.
  • the interference subject may be erroneously identified as the target to be examined at least once in a target examination process.
  • the data to be examined may include image data.
  • the examination result may be image data that uses different identification modes to identify the target to be examined and the interference subject.
  • the examining the data to be examined may include: using a second target examination model to examine the data to be examined.
  • the second target examination model may be obtained based on a training process below: training a preliminary model to generate the second target examination model using training data including labelling information of the target to be examined and labelling information of the interference subject.
  • the interference subject may be erroneously identified as the target to be examined at least once in the target examination process.
  • the training process may further include: obtaining preliminary training data, and training the preliminary model to obtain a preliminary target examination model using the preliminary training data; using the preliminary target examination model to perform target examination and outputting examination results; obtaining training data labelled with the interference subject based on the examination results; and training the preliminary target examination model using the training data labelled with the interference subject to obtain the second target examination model.
  • the preliminary training data may at least include labelling information of the target to be examined.
  • the preliminary training data may also include labelling information of an a priori interference subject.
  • the a priori interference subject may be erroneously identified as the target to be examined at least once in a target examination process other than the training process of the second target examination model.
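  • The following sketch outlines this two-stage training flow; train, examine, relabel_as_interference, and is_false_positive are assumed placeholders rather than the disclosed implementation:

```python
# Illustrative two-stage training of the second target examination model.
# train(), examine(), relabel_as_interference(), and is_false_positive() are placeholders.

def train_second_examination_model(preliminary_training_data, unlabelled_pool):
    # Stage 1: train a preliminary target examination model using preliminary training
    # data that at least includes labelling information of the target to be examined.
    preliminary_model = train(preliminary_training_data)

    # Use the preliminary model to perform target examination and collect the results.
    examination_results = [examine(preliminary_model, sample) for sample in unlabelled_pool]

    # Build training data labelled with the interference subject: any subject erroneously
    # identified as the target at least once is relabelled as an interference subject.
    interference_labelled_data = [relabel_as_interference(result)
                                  for result in examination_results
                                  if is_false_positive(result)]

    # Stage 2: continue training the preliminary target examination model with the
    # interference-labelled data to obtain the second target examination model.
    return train(preliminary_training_data + interference_labelled_data,
                 init_from=preliminary_model)
```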
  • a target examination system may include an obtaining module and an examination module.
  • the obtaining module may be configured to obtain data to be examined.
  • the examination module may be configured to examine the data to be examined, output an examination result including a category identifier of a target to be examined and a category identifier of an interference subject, and obtain the target to be examined.
  • the interference subject may be erroneously identified as the target to be examined at least once in a target examination process.
  • the data to be examined may include image data.
  • the examination result may be image data that uses different identification modes to identify the target to be examined and the interference subject.
  • the examination module may be further configured to use a second target examination model to examine the data to be examined, and the system may further include a training module.
  • the training module may be configured to train a preliminary model to generate the second target examination model using training data including labelling information of the target to be examined and labelling information of the interference subject.
  • the interference subject may be erroneously identified as the target to be examined at least once in the target examination process.
  • the training module may further include: a preliminary training unit, a model examination unit, and a second training unit.
  • the preliminary training unit may be configured to train the preliminary model to obtain a preliminary target examination model using preliminary training data.
  • the model examination unit may be configured to use the preliminary target examination model to perform target examination and output examination results.
  • the second training unit may be configured to obtain training data labelled with the interference subject based on the examination results and train the preliminary target examination model using the training data labelled with the interference subject to obtain the second target examination model.
  • the preliminary training data may at least include labelling information of the target to be examined.
  • the preliminary training data may further include labelling information of an a priori interference subject.
  • the a priori interference subject may be erroneously identified as the target to be examined at least once in a target examination process other than the training process of the second target examination model.
  • a target examination device may include at least one processor and at least one storage device.
  • the at least one storage device may be configured to store a set of instructions.
  • the at least one processor may be configured to execute at least part of the set of instructions to perform a method.
  • the method may include: obtaining data to be examined; and examining the data to be examined, outputting an examination result including a category identifier of a target to be examined and a category identifier of an interference subject, and obtaining the target to be examined.
  • the interference subject may be erroneously identified as the target to be examined at least once in a target examination process.
  • a non-transitory computer readable medium may include executable instructions that, when executed by at least one processor, cause the at least one processor to perform a method.
  • the method may include: obtaining data to be examined; and examining the data to be examined, outputting an examination result including a category identifier of a target to be examined and a category identifier of an interference subject, and obtaining the target to be examined.
  • the interference subject may be erroneously identified as the target to be examined at least once in a target examination process.
  • FIG. 1 is a schematic diagram illustrating an exemplary target detection system according to some embodiments of the present disclosure
  • FIG. 2 is a schematic diagram illustrating an exemplary scenario for an autonomous vehicle according to some embodiments of the present disclosure
  • FIG. 3 is a schematic diagram illustrating exemplary hardware components of a computing device according to some embodiments of the present disclosure
  • FIG. 4 is a block diagram illustrating an exemplary processing device according to some embodiments of the present disclosure.
  • FIG. 5 is a flowchart illustrating an exemplary process for generating attribute information of a target according to some embodiments of the present disclosure
  • FIG. 6 is a flowchart illustrating exemplary processes for generating a target detection model according to some embodiments of the present disclosure
  • FIG. 7 is a schematic diagram illustrating an exemplary process for generating unsupervised labelled image data according to some embodiments of the present disclosure
  • FIG. 8 is a flowchart illustrating exemplary processes for generating a target detection model according to some embodiments of the present disclosure
  • FIG. 9 is a schematic diagram illustrating an exemplary process for generating an updated seed model according to some embodiments of the present disclosure.
  • FIG. 10 is a schematic diagram illustrating an exemplary detection result according to some embodiments of the present disclosure.
  • FIG. 11 is a schematic diagram illustrating an exemplary structure of a model according to some embodiments of the present disclosure.
  • FIG. 12 is a block diagram illustrating an exemplary processing device according to some embodiments of the present disclosure.
  • FIG. 13 is a flowchart illustrating a process for determining an examination result of the data to be examined according to some embodiments of the present disclosure.
  • FIG. 14 is a flowchart illustrating a process for generating a second target examination model according to some embodiments of the present disclosure.
  • the flowcharts used in the present disclosure illustrate operations that systems implement according to some embodiments in the present disclosure. It is to be expressly understood that the operations of the flowcharts may not be implemented in order. Conversely, the operations may be implemented in inverted order, or simultaneously. Moreover, one or more other operations may be added to the flowcharts. One or more operations may be removed from the flowcharts.
  • "filtering" and "selecting" in the present disclosure are used interchangeably to refer to a process of deleting some or all elements from a set of elements based on a selection condition (also referred to as a filtering condition) .
  • the elements that do not satisfy the selection condition are "deleted" , "disposed" , or "filtered" .
  • the elements that satisfy the selection condition are “selected” or “retained” .
  • "detection" refers to a complete process of detecting an existence of a target in image data, identifying a size, a relative position, and a shape of the target, and recognizing a type and/or identity of the target.
  • the present method may be related to the training of a model.
  • the “training” of a model may refer to a process of updating a previously trained model, or a process of generating a new model, or both of them.
  • the present method may be related to a preprocessing process of a model.
  • the preprocessing process in the present disclosure may include using a model to select one or more images that include a target and labelling the target in the selected images.
  • Such preprocessing process is also known as a process of executing the model.
  • the “detection” of a target in image data may refer to a process of generating attribute information of the target.
  • the attribute information may include regional information (the position and/or shape) , category information, content information, or the like, of the target in the image data.
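  • A minimal illustrative container for such attribute information is shown below; the field names and example values are assumptions:

```python
# Illustrative container for the attribute information of a detected target.
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TargetAttributes:
    region: Tuple[int, int, int, int]   # regional information: bounding box (x0, y0, x1, y1)
    category: str                       # category information, e.g., "human", "vehicle"
    content: str                        # content information, e.g., "pedestrian", "bicycle"

# Example: a pedestrian detected inside a 30 x 40 pixel rectangle.
example = TargetAttributes(region=(100, 80, 130, 120), category="human", content="pedestrian")
```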
  • An aspect of the present disclosure relates to systems and methods for training a target detection model.
  • a plurality of training images (e.g., 400 training images) may be obtained.
  • the plurality of training images may correspond to a similar environment or (at least a majority of them) include a similar target.
  • Some of the plurality of training images (e.g., 150 training images) may be manually labelled by operators.
  • operators may provide attribute information of the target in each of the 150 training images.
  • the attribute information may include regional information (the position and/or shape) , category information, content information, or the like, of the target in the each of the 150 training images.
  • a user may describe a boundary around the target in the training images and provide a category (e.g., human, animal, tree, building) and content information of the target.
  • the 150 manually labelled images are also called supervised labelled images (or supervised labelled image data) .
  • the supervised labelled images may be used to train an initial seed model.
  • the initial seed model may be used to preprocess (as mentioned above, a preprocessing process may refer to a filtering process and a labelling process) the plurality of training images (e.g., 400 training images) or the remaining training images (e.g., 250 training images) .
  • the initial seed model may select 120 training images from the remaining 250 training images and label them as unsupervised labelled images (or unsupervised labelled image data) .
  • the 150 supervised labelled images and the 120 unsupervised labelled images may be collectively used to update the initial seed model or train a new model to generate a target detection model.
  • the unsupervised labelled images may be further updated by the target detection model and the further updated unsupervised labelled images may be used to further update the target detection model.
  • the unsupervised labelled images may be manually corrected before being used to train the initial seed model.
  • the supervised labelled image data may be processed based on a second target examination model.
  • the second target examination model may be configured to provide labelling information of the target and labelling information of an interference subject associated with the target.
  • the training image data may be processed based on the second target examination model.
  • the examination result may include labelling information of a target to be examined and labelling information of an interference subject associated with the target to be examined.
  • the second target examination model may be generated based on training data including labelling information of a training target to be examined and labelling information of a training interference subject associated with the training target to be examined.
  • FIG. 1 is a schematic diagram illustrating an exemplary target detection system 100 according to some embodiments of the present disclosure.
  • the target detection system 100 may be used to detect targets in image data (e.g., still images, videos, video frames, software code corresponding to images) .
  • the target detection system 100 may be used in various application scenarios, e.g., face recognition, video surveillance and analysis, image recognition and analysis, smart driving, 3D image vision, industrial visual examination, medical imaging diagnostics, text recognition, image and video editing, etc.
  • the face recognition may include but is not limited to, attendance, access control, identity authentication, face attribute recognition, face examination and tracking, reality examining, face contrasting, face searching, face key point positioning, or the like, or any combination thereof.
  • Video surveillance and analysis may include, but is not limited to, intelligent identification, analysis, and positioning of objects, commodities, and pedestrian attributes, pedestrian analysis and tracking, crowd density and passenger flow analysis, vehicle behavior analysis, or the like, or any combination thereof.
  • Image recognition and analysis may include, but is not limited to, image searching, object/scene recognition, vehicle type recognition, character attribute analysis, character clothing analysis, item identification, pornography/violence identification, or the like, or any combination thereof.
  • Smart driving may include, but is not limited to, vehicle and object examination and collision warning, lane examination and offset warning, traffic sign identification, pedestrian examination, vehicle distance examination, or the like, or any combination thereof.
  • 3D image vision may include, but is not limited to, 3D machine vision, binocular stereo vision, 3D reconstruction, 3D scanning, mapping, map measurement, industrial simulation, or the like, or any combination thereof.
  • Industrial vision examination may include, but is not limited to, industrial cameras, industrial vision monitoring, industrial vision measurement, industrial control, or the like, or any combination thereof.
  • Medical imaging diagnostics may include, but is not limited to, tissue examination and identification, tissue localization, lesion examination and recognition, lesion localization, or the like, or any combination thereof.
  • Text recognition may include, but is not limited to, text examination, text extraction, text recognition, or the like, or any combination thereof.
  • Image and video editing may include, but is not limited to, image/video authoring, image/video repairing, image/video beautification, image/video effect transformation, or the like, or any combination thereof.
  • Different embodiments of the present disclosure may be applied in different industries, e.g., Internet, financial industry, smart home, e-commerce shopping, security, transportation, justice, military, public security, frontier inspection, government, aerospace, electricity, factory, agriculture, forestry, education, entertainment, medical, or the like, or any combination thereof.
  • the target detection system 100 may be used in an autonomous driving system, or a part thereof (e.g., as a driving aid) . More particularly, the target detection system 100 may capture real-time image data around a vehicle and detect targets in the captured image data. The target detection system 100 may send an instruction to change the speed of the vehicle or make a turn according to the detected targets. As another example, the target detection system 100 may be used in a video surveillance system, or a part thereof. The target detection system 100 may continuously monitor an environment (e.g., buildings, parking lots, traffic lights, city streets, vehicles, etc. ) and detect targets (e.g., humans, vehicles, animals, etc. ) in it.
  • the target detection system 100 may be used in a security system (e.g., a Customs) .
  • the target detection system 100 may capture X-ray images of a person's luggage.
  • the target detection system 100 may detect and recognize items in the luggage based on the X-ray images of the luggage.
  • the target detection system 100 or a part thereof may be mounted on a vehicle or a component thereof.
  • the target detection system 100 may include a processing device 110, a terminal 120, an image capturing device 130, a network 140, and a storage device 150.
  • the processing device 110 may be a single processing device or a processing device group.
  • the processing device group may be centralized, or distributed (e.g., the processing device 110 may be a distributed system) .
  • the processing device 110 may be local or remote.
  • the processing device 110 may access information and/or data stored in the terminal 120, the image capturing device 130, and/or the storage device 150 via the network 140.
  • the processing device 110 may be directly connected to the terminal 120, the image capturing device 130, and/or the storage device 150 to access stored information and/or data.
  • the processing device 110 may be implemented on a cloud platform.
  • the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or a combination thereof.
  • the processing device 110 may be implemented on a computing device 300 having one or more components illustrated in FIG. 3 in the present disclosure.
  • the processing device 110 may include a processing engine 112.
  • the processing engine 112 may process information and/or data related to target detection to perform one or more functions described in the present disclosure. For example, the processing engine 112 may train a target detection model based on labelled image data.
  • the labelled image data may include supervised image data that is labelled by an operator and unsupervised image data that is labelled by a machine (e.g., a model stored in the storage device 150 and/or the processing device 110) .
  • the processing engine 112 may detect targets in real-time image data using the trained target detection model.
  • the processing engine 112 may include one or more processing engines (e.g., single-core processing engine (s) or multi-core processor (s) ) .
  • the processing engine 112 may include a central processing unit (CPU) , an application-specific integrated circuit (ASIC) , an application-specific instruction-set processor (ASIP) , a graphics processing unit (GPU) , a physics processing unit (PPU) , a digital signal processor (DSP) , a field-programmable gate array (FPGA) , a programmable logic device (PLD) , a controller, a microcontroller unit, a reduced instruction-set computer (RISC) , a microprocessor, or the like, or a combination thereof.
  • the terminal 120 may include a tablet computer 120-1, a laptop computer 120-2, a built-in device in a vehicle 120-3, a mobile device 120-4, or the like, or a combination thereof.
  • the mobile device 120-4 may include a smart home device, a wearable device, a smart mobile device, an augmented reality device, or the like, or a combination thereof.
  • the wearable device may include a smart bracelet, a smart footgear, smart glasses, a smart helmet, a smart watch, smart clothing, a smart backpack, a smart accessory, or the like, or a combination thereof.
  • the smart mobile device may include a smartphone, a personal digital assistant (PDA) , a gaming device, a navigation device, a point of sale (POS) device, or the like, or a combination thereof.
  • the built-in device in the vehicle 120-3 may include an onboard computer, an automobile data recorder, an auto-piloting system, an onboard human-computer interaction (HCI) system, an onboard television, etc.
  • the processing device 110 may send an instruction or control command to the auto-piloting system of the vehicle 120-3 to control the movement of the vehicle 120-3 based on a target detection result.
  • the image capturing device 130 may be configured to capture an image of one or more objects.
  • the image may include a still image, a video (offline or live streaming) , a frame of a video (or referred to as a video frame) , or a combination thereof.
  • the one or more objects may be static or moving.
  • the one or more objects may be an animal, a human being (a driver, an operator, a student, a worker) or a portion thereof (e.g., faces) , buildings, vehicles, goods, or the like, or a combination thereof.
  • the image capturing device 130 may include an automobile data recorder 130-1, a dome camera 130-2, a fixed camera 130-3, or the like, or a combination thereof.
  • the imaging capturing device 130 may be combined with the terminal 120 (e.g., the mobile device 120-4) .
  • the automobile data recorder 130-1 may be mounted on a vehicle and configured to record a road condition around the vehicle when the driver is driving.
  • the dome camera 130-2 may be mounted on a surface (e.g., a roof, a ceiling, a wall) of a building to monitor environment around the building.
  • the network 140 may facilitate exchange of information and/or data.
  • one or more components of the target detection system 100 (e.g., the processing device 110, the terminal 120, the image capturing device 130, or the storage device 150) may send information and/or data to other component (s) of the target detection system 100 via the network 140.
  • the processing device 110 or the processing engine 112 may receive a plurality of images or video frames from the image capturing device 130 via the network 140.
  • the processing device 110 (or the processing engine 112) may send a notification or instruction to the terminal 120 via the network 140.
  • the network 140 may be any type of wired or wireless network, or combination thereof.
  • the network 140 may include a cable network, a wireline network, an optical fiber network, a telecommunications network, an intranet, an Internet, a local area network (LAN) , a wide area network (WAN) , a wireless local area network (WLAN) , a metropolitan area network (MAN) , a public telephone switched network (PSTN) , a Bluetooth network, a ZigBee network, a near field communication (NFC) network, or the like, or a combination thereof.
  • the network 140 may include one or more network access points.
  • the network 140 may include wired or wireless network access points such as base stations and/or internet exchange points 140-1, 140-2, through which one or more components of the target detection system 100 may be connected to the network 140 to exchange data and/or information.
  • the storage device 150 may store data and/or instructions.
  • the storage device 150 may store data obtained from the terminal 120 and/or the image capturing device 130.
  • the storage device 150 may store a plurality of images captured by the image capturing device 130.
  • the storage device 150 may store data and/or instructions that the processing device 110 may execute or use to perform exemplary methods described in the present disclosure.
  • storage device 150 may include a mass storage, removable storage, a volatile read-and-write memory, a read-only memory (ROM) , or the like, or a combination thereof.
  • Exemplary mass storage may include a magnetic disk, an optical disk, solid-state drives, etc.
  • Exemplary removable storage may include a flash drive, a floppy disk, an optical disk, a memory card, a zip disk, a magnetic tape, etc.
  • Exemplary volatile read-and-write memory may include random-access memory (RAM) .
  • Exemplary RAM may include a dynamic RAM (DRAM) , a double data rate synchronous dynamic RAM (DDR SDRAM) , a static RAM (SRAM) , a thyristor RAM (T-RAM) , and a zero-capacitor RAM (Z-RAM) , etc.
  • Exemplary ROM may include a mask ROM (MROM) , a programmable ROM (PROM) , an erasable programmable ROM (EPROM) , an electrically-erasable programmable ROM (EEPROM) , a compact disk ROM (CD-ROM) , and a digital versatile disk ROM, etc.
  • the storage device 150 may be implemented on a cloud platform.
  • the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or a combination thereof.
  • the storage device 150 may be connected to the network 140 to communicate with one or more components of the target detection system 100 (e.g., the processing device 110, the terminal 120, or the image capturing device 130) .
  • One or more components of the target detection system 100 may access the data or instructions stored in the storage device 150 via the network 140.
  • the storage device 150 may be directly connected to or communicate with one or more components of the target detection system 100 (e.g., the processing device 110, the terminal 120, the image capturing device 130) .
  • the storage device 150 may be part of the processing device 110.
  • one or more components of the target detection system 100 may have permissions to access the storage device 150.
  • one or more components of the target detection system 100 may read and/or modify information when one or more conditions are met.
  • the storage device 150 may store one or more models.
  • the models may include an untrained preliminary model, an initial seed model, an updated seed model and/or a trained target detection model.
  • the processing device 110 may obtain an untrained preliminary model from the storage device 150 via the network 140.
  • the processing device 110 may train the preliminary model based on labelled image data to generate a trained target detection model.
  • the trained target detection model may then be transmitted to the storage device 150 and stored.
  • the processing device 110 may obtain a trained target detection model from the storage device 150.
  • the processing device 110 may execute the target detection model to detect targets in a real-time video frame.
  • FIG. 2 is a schematic diagram illustrating an exemplary scenario for an autonomous vehicle according to some embodiments of the present disclosure.
  • an autonomous vehicle 230 may travel along a road 221 without human input along a path autonomously determined by the autonomous vehicle 230.
  • the road 221 may be a space prepared for a vehicle to travel along.
  • the road 221 may be a road for vehicles with wheels (e.g., a car, a train, a bicycle, a tricycle) or without wheels (e.g., a hovercraft) , an air lane for an airplane or other aircraft, a water lane for a ship or submarine, or an orbit for a satellite.
  • Travel of the autonomous vehicle 230 may not break the traffic laws or regulations of the road 221. For example, the speed of the autonomous vehicle 230 may not exceed the speed limit of the road 221.
  • the autonomous vehicle 230 may avoid colliding with an obstacle 210 by travelling along a path 220 determined by the autonomous vehicle 230.
  • the obstacle 210 may be a static obstacle or a dynamic obstacle.
  • the static obstacle may include a building, tree, roadblock, or the like, or any combination thereof.
  • the dynamic obstacle may include moving vehicles, pedestrians, and/or animals, or the like, or any combination thereof.
  • the autonomous vehicle 230 may include conventional structures of a non-autonomous vehicle, such as an engine, four wheels, a steering wheel, etc.
  • the autonomous vehicle 230 may further include a sensing system 240, including a plurality of sensors (e.g., a sensor 242, a sensor 244, a sensor 246) and a control unit 250.
  • the plurality of sensors may be configured to provide information that is used to control the vehicle.
  • the sensors may sense status of the vehicle.
  • the status of the vehicle may include dynamic situation of the vehicle, environmental information around the vehicle, or the like, or any combination thereof.
  • the plurality of sensors may be configured to sense dynamic situation of the autonomous vehicle 230.
  • the plurality of sensors may include a camera (or an image capturing device) , a video sensor, a distance sensor, a velocity sensor, an acceleration sensor, a steering angle sensor, a traction-related sensor, and/or any sensor.
  • the camera may capture one or more images around (e.g., in front of) the vehicle.
  • a control unit 250 in the vehicle may detect one or more targets (e.g., the obstacle 210) in the captured images and generate an instruction or control command to other components (e.g., a throttle, a steering wheel) of the autonomous vehicle 230.
  • the distance sensor (e.g., a radar, a LiDAR, an infrared sensor) may determine a distance between a vehicle (e.g., the autonomous vehicle 230) and an obstacle (e.g., the obstacle 210) .
  • the velocity sensor may determine a velocity (e.g., an instantaneous velocity, an average velocity) of a vehicle (e.g., the autonomous vehicle 230) .
  • the acceleration sensor (e.g., an accelerometer) may determine an acceleration of a vehicle (e.g., the autonomous vehicle 230) .
  • the steering angle sensor (e.g., a tilt sensor) may determine a steering angle of a vehicle (e.g., the autonomous vehicle 230) .
  • the traction-related sensor may determine a traction of a vehicle (e.g., the autonomous vehicle 230) .
  • the plurality of sensors may sense environment around the autonomous vehicle 230.
  • one or more sensors may detect a road geometry and obstacles (e.g., static obstacles, dynamic obstacles) .
  • the road geometry may include a road width, road length, road type (e.g., ring road, straight road, one-way road, two-way road) .
  • the control unit 250 may be configured to control the autonomous vehicle 230.
  • the control unit 250 may control the autonomous vehicle 230 to drive along a path 220.
  • the control unit 250 may calculate the path 220 based on the status information from the plurality of sensors.
  • the path 220 may be configured to avoid collisions between the vehicle and one or more obstacles (e.g., the obstacle 210) .
  • the obstacle 210 may be detected by a target detection method described elsewhere in the present disclosure.
  • the path 220 may include one or more path samples.
  • Each of the one or more path samples may include a plurality of path sample features.
  • the plurality of path sample features may include a path velocity, a path acceleration, a path location, or the like, or a combination thereof.
  • the autonomous vehicle 230 may drive along the path 220 to avoid a collision with an obstacle.
  • the autonomous vehicle 230 may pass each path location at the corresponding path velocity and the corresponding path acceleration.
  • the autonomous vehicle 230 may also include a positioning system to obtain and/or determine the position of the autonomous vehicle 230.
  • the positioning system may also be connected to another party, such as a base station, another vehicle, or another person, to obtain the position of the party.
  • the positioning system may be able to establish a communication with a positioning system of another vehicle, and may receive the position of the other vehicle and determine the relative positions between the two vehicles.
  • FIG. 3 is a schematic diagram illustrating exemplary hardware components of a computing device 300 according to some embodiments of the present disclosure.
  • the computing device 300 may be a special purpose computing device for target detection, such as a single-board computing device including one or more microchips.
  • the control unit 250 may include one or more computing devices 300.
  • the computing device 300 may be used to implement the method and/or system described in the present disclosure via its hardware, software program, firmware, or a combination thereof.
  • the computing device 300 may include COM ports 350 connected to a network to facilitate data communications.
  • the computing device 300 may also include a processor 320, in the form of one or more processors, for executing computer instructions.
  • the computer instructions may include, for example, routines, programs, objects, components, data structures, procedures, modules, and functions, which perform particular functions described herein.
  • the processor 320 may include one or more hardware processors built in one or more microchips, such as a microcontroller, a microprocessor, a reduced instruction set computer (RISC) , an application-specific integrated circuit (ASIC) , an application-specific instruction-set processor (ASIP) , a central processing unit (CPU) , a graphics processing unit (GPU) , a physics processing unit (PPU) , a microcontroller unit, a digital signal processor (DSP) , a field programmable gate array (FPGA) , an advanced RISC machine (ARM) , a programmable logic device (PLD) , any circuit or processor capable of executing one or more functions, or the like, or any combinations thereof.
  • the exemplary computing device 300 may include an internal communication bus 310 and program storage and data storage of different forms, for example, a disk 370, a read-only memory (ROM) 330, or a random access memory (RAM) 340, for various data files to be processed and/or transmitted by the computing device 300.
  • the exemplary computing device 300 may also include program instructions stored in the ROM 330, the RAM 340, and/or another type of non-transitory storage medium to be executed by the processor 320.
  • the methods and/or processes of the present disclosure may be implemented as the program instructions.
  • the computing device 300 also includes an I/O component 360, supporting input/output between the computer and other components (e.g., user interface elements) .
  • the computing device 300 may also receive programming and data via network communications.
  • the computing device 300 in the present disclosure may also include multiple processors, thus operations and/or method steps that are performed by one processor as described in the present disclosure may also be jointly or separately performed by the multiple processors.
  • if the processor 320 of the computing device 300 executes both step A and step B, it should be understood that step A and step B may also be performed by two different processors jointly or separately in the computing device 300 (e.g., the first processor executes step A and the second processor executes step B, or the first and second processors jointly execute steps A and B) .
  • FIG. 4 is a block diagram illustrating an exemplary processing device according to some embodiments of the present disclosure.
  • the processing device 110 may include an acquisition module 410, a detection module 420, a determination module 430, and a processing module 440.
  • the acquisition module 410 may be configured to obtain a trained target detection model.
  • the target detection model may be trained by supervised labelled image data and/or unsupervised labelled image data.
  • the target detection model may be a general model that may be used to detect multiple targets in image data.
  • the target detection model may be a specialized model that may be used to detect a certain target in image data.
  • the targets may be preset by the processing device 110 or an operator.
  • the targets may be a single object or multiple objects.
  • the targets may be a rigid object or a flexible object. As used herein, a rigid object may refer to an object whose shape does not change.
  • a flexible object may refer to an object whose shape may change at different moments or capturing angles.
  • the objects may include but not limited to buildings, trees, roadblocks, static vehicles, moving vehicles, pedestrians, animals, or the like, or any combination thereof.
  • the supervised labelled image data and the unsupervised labelled image data may indicate attribute information of targets in the corresponding image data.
  • the attribute information may include regional information, category information and/or content information.
  • the detection module 420 may be configured to execute the trained target detection model to generate attribute information of targets in the image data-to-be-detected.
  • the attribute information may include regional information, category information and/or content information.
  • the regional information may correspond to the position and/or shape of the target in the image data.
  • the regional information may include a central point of the target, a close fit boundary of the target (which may be an irregular shape) , a loose fit boundary of the target (which may be a regular shape such as rectangles, circles, ovals, squares, that is slightly larger than the actual shape of the target) , a vertical line along the height direction of the target, a horizontal line along the width direction of the target, a diagonal line, etc.
  • the detection module 420 may determine the regional information of the target as a rectangle of 30*40 pixels.
  • the detection module 420 may mark the boundary of the rectangle or show the rectangle in colors different from the background.
  • the category information may include a category of the target in the image data.
  • the category information may include static object and dynamic object.
  • the category information may include humans, animals, plants, vehicles, buildings, etc.
  • the content information may include a content or identity of the target in the image data.
  • the content information may include teachers, students, pedestrians, motorbikes, bicycles, etc.
  • the content information may include smile, sad, angry, worried, etc.
  • the detection module 420 may mark the boundaries of the targets in the image data in different colors according to their category information and content information. For example, a red rectangle may be marked around a vehicle and a yellow circle may be marked around a rabbit.
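  • As an illustrative sketch of this kind of marking (OpenCV is used here only as an example library; the color mapping and detection fields are assumptions):

```python
# Illustrative visualization of detection results: mark each target's boundary in a color
# chosen by its category. OpenCV is used only as an example; the mapping is an assumption.
import cv2
import numpy as np

CATEGORY_COLORS = {            # BGR colors
    "vehicle": (0, 0, 255),    # red rectangle around a vehicle
    "animal": (0, 255, 255),   # yellow rectangle around an animal (e.g., a rabbit)
}

def draw_detections(image: np.ndarray, detections):
    """Each detection is assumed to be a dict with 'box' (x0, y0, x1, y1) and 'category'."""
    for det in detections:
        color = CATEGORY_COLORS.get(det["category"], (0, 255, 0))
        x0, y0, x1, y1 = det["box"]
        cv2.rectangle(image, (x0, y0), (x1, y1), color, thickness=2)
        cv2.putText(image, det["category"], (x0, max(y0 - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
    return image
```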
  • the determination module 430 may be configured to determine image data-to-be-detected.
  • the image data-to-be-detected may be real-time image data captured by a capturing device (e.g., the image capturing device 130) or non-real-time image data stored in the storage device 150.
  • the image data-to-be-detected may include a target that corresponds to the target detection model.
  • the target detection model may be a model used to detect objects around a vehicle in an autonomous driving system and the image data-to-be-detected may be real-time image data of the environment around a vehicle.
  • the target detection model may be a face detection model used to detect criminals in a Customs and the image data-to-be-detected may include photos of people captured in the Customs.
  • the determination module 430 may determine whether a preset condition is satisfied.
  • the preset condition may include the number of iterations being greater than a second threshold and/or an average output score of the updated first unsupervised labelled image data being greater than a third threshold.
  • the second threshold may be 1, 2, 5, 10, 15, 20, etc.
  • the third threshold may be 80%, 90%, 95%, 99%, etc.
  • the determination module 430 may obtain a set of testing image data and a reference detection result corresponding to the set of testing image data.
  • the determination module 430 may generate a detection result corresponding to the set of testing image data using the seed model generated in the current iteration.
  • the preset condition may include that the difference between the detection result generated by the seed model in the current iteration and the reference detection result being less than a fourth threshold.
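  • A minimal sketch of such stopping checks is given below; the threshold values and the choice of difference metric are assumptions:

```python
# Illustrative stopping checks used by the determination module during iterative training.
# All threshold values and the choice of difference metric are assumptions.

def second_preset_condition_met(iteration_count, avg_score,
                                max_iterations=10,       # second threshold (assumed)
                                score_threshold=0.95):   # third threshold (assumed)
    """True when the iteration count or the average label score exceeds its threshold."""
    return iteration_count > max_iterations or avg_score > score_threshold

def testing_difference_small_enough(predicted_boxes, reference_boxes,
                                    max_mean_error=5.0):  # fourth threshold (assumed)
    """Compare the seed model's detections on testing image data with a reference result."""
    errors = [abs(p - r)
              for pred, ref in zip(predicted_boxes, reference_boxes)
              for p, r in zip(pred, ref)]
    return sum(errors) / max(len(errors), 1) < max_mean_error
```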
  • the processing module 440 may be configured to generate an initial seed model based on the supervised labelled image data.
  • the initial seed model may include an input, an output, and multiple classifiers or layers in between.
  • for a given input, the initial seed model may generate an output according to its structure and the weights of its classifiers.
  • the classifiers may correspond to different perspectives regarding the image, for example, one classifier may focus on size of objects, one classifier may focus on details, one classifier may focus on color, etc.
  • during training, the supervised labelled image data may be inputted to the initial seed model (or an untrained preliminary model) .
  • the input may be unlabeled image data and the output may be the manually labelled image data or the supervised labelled image data.
  • the model may "learn" the strategy of why the images are labelled as such by changing the internal structure, factors, layers, or weights of the classifiers inside the model to achieve a maximum score or a lowest loss function.
  • the processing module 440 may generate a target detection model based on the supervised labelled image data and the first unsupervised labelled image data.
  • the initial seed model may be updated iteratively to generate the target detection model.
  • the processing module 440 may generate or update a seed model based on the supervised labelled image data and the first unsupervised labelled image data.
  • the training of the seed model may be similar to the training of the initial seed model.
  • the unlabeled image data may be provided to the seed model as an input and the supervised labelled image data and the first unsupervised labelled image data may be provided to the seed model as an output.
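  • As a generic sketch of how such training could be carried out (the architecture, optimizer, and loss choice here are assumptions, not the disclosed design):

```python
# Illustrative training loop: the model "learns" by adjusting its internal weights to
# minimize a loss between its predictions and the supervised/unsupervised labels.
import torch
import torch.nn as nn

def train_seed_model(model: nn.Module, data_loader, epochs: int = 10, lr: float = 1e-3):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()    # assumed classification-style loss

    model.train()
    for _ in range(epochs):
        for images, labels in data_loader:   # images: input, labels: expected output
            optimizer.zero_grad()
            predictions = model(images)
            loss = criterion(predictions, labels)
            loss.backward()                  # adjust internal weights to lower the loss
            optimizer.step()
    return model
```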
  • FIG. 5 is a flowchart illustrating an exemplary process for generating attribute information of a target according to some embodiments of the present disclosure.
  • one or more operations of process 500 may be executed by the target detection system 100.
  • the process 500 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device 150, the ROM 330, the RAM 340) and invoked and/or executed by a processing device (e.g., the processing device 110, the processor 320 of the computing device 300, and/or the modules illustrated in FIG. 4) .
  • the instructions may be transmitted in the form of electronic current or electrical signals.
  • the operations of the illustrated process present below are intended to be illustrative.
  • the process 500 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed herein. Additionally, the order in which the operations of the process are illustrated in FIG. 5 and described below is not intended to be limiting.
  • the processing device 110 may obtain a trained target detection model.
  • the target detection model may be trained by supervised labelled image data and/or unsupervised labelled image data.
  • the supervised labelled image data and the unsupervised labelled image may be collectively called training image data.
  • the training image data may be generated based on pre-training image data.
  • the target detection model may be a general model that may be used to detect multiple targets in image data.
  • the target detection model may be a specialized model that may be used to detect a certain target in image data.
  • the targets may be preset by the processing device 110 or an operator.
  • the targets may be a single object or multiple objects.
  • the targets may be a rigid object or a flexible object.
  • a rigid object may refer to an object whose shape does not change.
  • a flexible object may refer to an object whose shape may change at different moments or capturing angles.
  • the objects may include but are not limited to buildings, trees, roadblocks, static vehicles, moving vehicles, pedestrians, animals, or the like, or any combination thereof.
  • the supervised labelled image data and the unsupervised labelled image data may indicate attribute information of targets in the corresponding image data.
  • the attribute information may include regional information, category information and/or content information.
  • the supervised labelled image data may be manually labelled by an operator.
  • the operator may label image data via a user interface of a user terminal.
  • the labelling of a target in the image data may include describing a region (or boundary) of the target in the image data, providing a category (e.g., a human, a vehicle, an animal, a tree, a static target, a dynamic target) of the target and/or providing a content (e.g., a teacher, a student, a pedestrian, a motorbike, a bicycle) of the target.
  • the unsupervised labelled image data may be labelled by a machine or a model.
  • the machine may select images that include the target from a set of pre-training image data and label the target in the selected images.
  • the supervised labelled image data and the unsupervised labelled image data may also include labelling information (e.g., location, category) of an interference subject associated with the target.
  • the labeling information may be labelled manually or by using a second target examination model described in connection with FIGs. 12-14.
  • the second target examination model may be configured to provide labelling information of the target and labelling information of the interference subject associated with the target.
  • an operator may first determine one or more targets that require to be detected in image data.
  • the operator may manually label the one or more targets in at least some images of a set of pre-training image data to generate a set of supervised labelled image data.
  • the set of supervised labelled image data may be processed to generate a set of unsupervised labelled image data.
  • the set of supervised labelled image data and the set of unsupervised labelled image data may be used collectively to train a preliminary model to generate a target detection model.
  • the unsupervised labelled image data may be iteratively updated to generate a more accurate and objective target detection model. More descriptions regarding the generation of the unsupervised labelled image data and the target detection model may be found elsewhere in the present disclosure, e.g., FIGs. 6-9 and descriptions thereof.
  • the processing device 110 may determine image data-to-be-detected.
  • the image data-to-be-detected may be real-time image data captured by a capturing device (e.g., the capturing device 130) or a non-real-time image data stored in the storage device 150.
  • the image data-to-be-detected may include a target that corresponds to the target detection model.
  • the target detection model may be a model used to detect objects around a vehicle in an autonomous driving system and the image data-to-be-detected may be real-time image data of the environment around the vehicle.
  • the target detection model may be a face detection model used to detect criminals in a Customs and the image data-to-be-detected may include photos of people captured in the Customs.
  • the processing device 110 may execute the trained target detection model to generate attribute information of targets in the image data-to-be-detected.
  • the attribute information may include regional information, category information and/or content information.
  • the regional information may correspond to the position and/or shape of the target in the image data.
  • the regional information may include a central point of the target, a close fit boundary of the target (which may be an irregular shape), a loose fit boundary of the target (which may be a regular shape, such as a rectangle, circle, oval, or square, that is slightly larger than the actual shape of the target), a vertical line along the height direction of the target, a horizontal line along the width direction of the target, a diagonal line, etc.
  • the processing device 110 may determine the regional information of the target as a rectangle of 30*40 pixels.
  • the processing device 110 may mark the boundary of the rectangle or show the rectangle in a color different from the background.
  • the category information may include a category of the target in the image data.
  • the category information may include static object and dynamic object.
  • the category information may include humans, animals, plants, vehicles, buildings, etc.
  • the content information may include a content or identity of the target in the image data.
  • the content information may include teachers, students, pedestrians, motorbikes, bicycles, etc.
  • the content information may include smile, sad, angry, worried, etc.
  • the processing device 110 may mark the boundaries of the targets in the image data in different colors according to their category information and content information.
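  • As an illustrative sketch (not the claimed implementation), the following Python snippet shows how such attribute information might be rendered on an image with OpenCV: a loose fit rectangular boundary is drawn in a distinct color and a (category, content) caption is placed near it. The coordinates, colors, and labels are hypothetical examples, not values from the present disclosure.

```python
# A minimal sketch: rendering attribute information (regional, category, and
# content information) on an image. Coordinates, colors, and labels below are
# illustrative assumptions.
import cv2
import numpy as np

# Stand-in for image data-to-be-detected (a blank 800x500 BGR image).
image = np.zeros((500, 800, 3), dtype=np.uint8)

# Hypothetical detection: regional information as a small rectangle, and
# category/content information as ("vehicle", "car").
top_left, bottom_right = (100, 200), (140, 230)
category, content = "vehicle", "car"

# Mark the boundary of the rectangle in a color different from the background.
cv2.rectangle(image, top_left, bottom_right, color=(0, 255, 0), thickness=2)

# Display the category and content information near the detection region.
cv2.putText(image, f"({category}, {content})", (top_left[0], bottom_right[1] + 20),
            cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)

cv2.imwrite("labelled_example.png", image)
```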
  • the image 1000 may be an image data-to-be-detected.
  • the processing device 110 may execute the trained target detection model on the image 1000 to generate attribute information of targets in the image 1000.
  • the regional information of a car may be labelled as a rectangle 1010 around the car, and the category information of the car and the content information of the car may be displayed in image 1000 under the car as (vehicle, car) .
  • the regional information of a road lamp may be labelled as a rectangle 1020 around the road lamp, and the category information of the road lamp and the content information of the road lamp may be displayed in image 1000 on the left of the road lamp as (facility, road lamp).
  • the attribute information of a telegraph pole and a warehouse are also generated.
  • FIG. 6 is a flowchart illustrating exemplary processes for generating a target detection model according to some embodiments of the present disclosure.
  • one or more operations of process 600 may be executed by the target detection system 100.
  • the process 600 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device 150, the ROM 330, the RAM 340) and invoked and/or executed by a processing device (e.g., the processing device 110, the processor 320 of the computing device 300, and/or the modules illustrated in FIG. 4) .
  • the instructions may be transmitted in the form of electronic current or electrical signals.
  • the operations of the illustrated process presented below are intended to be illustrative.
  • the process 600 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed herein. Additionally, the order of the operations of the process 600 as illustrated in FIG. 6 and described below is not intended to be limiting.
  • the process 600 may be an exemplary process for generating a target detection model.
  • the generated target detection model may be obtained in 510 and used in process 500 in FIG. 5.
  • FIG. 7 is a schematic diagram illustrating an exemplary process for generating unsupervised labelled image data according to some embodiments of the present disclosure.
  • the schematic diagram in FIG. 7 may illustrate an exemplary embodiment of the process 600.
  • the processing device 110 may obtain pre-training image data.
  • the pre-training image data may be obtained from an image capturing device (e.g., the image capturing device 130) or a database (e.g., the storage device 150, an external database) .
  • the pre-training image data may be associated with a particular environment.
  • the pre-training image data may include a plurality of traffic images captured by data recorders of multiple vehicles.
  • the pre-training image data may include a plurality of videos captured by dome cameras mounted on the front gates of buildings.
  • the pre-training image data may include a plurality of human faces.
  • the pre-training image data may be real data (e.g., image data of real environment) or virtual data (e.g., image data generated, modified or synthesized by a computer) .
  • the pre-training image data may be selected from a pool of image data.
  • the pre-training image data 720 may be selected from the raw image data 710.
  • the raw image data 710 may include image data captured by different cameras in different environments.
  • the processing device 110 may select some of the raw image data as the pre-training image data 720 according to a requirement.
  • an operator may set vehicles as targets and all the image data that includes vehicles or wheels may be selected as the pre-training image data 720. The selection may be performed manually by an operator or a processor (e.g., an external processor, the processing device 110) .
  • the processing device 110 may receive one or more operations from an operator on the pre-training image data to generate supervised labelled image data.
  • the operator may label the pre-training image data via a user interface of a user terminal.
  • the labelling of a target in the image data may include describing a region (or boundary) of the target in the image data, providing a category (e.g., a human, a vehicle, an animal, a tree, a static target, a dynamic target) of the target and/or providing a content (e.g., a teacher, a student, a pedestrian, a motorbike, a bicycle) of the target.
  • the operator may draw a rectangle around a target in the image data and provide a category and content of the target as (vehicle, car) .
  • the operator may draw an oval around a target in the image data and provide a category and content of the target as (human, cyclist) .
  • the pre-training image data 720 may be manually labelled to generate the supervised labelled image data 740.
  • the supervised labelled image data may also include labelling information (e.g., location, category) of an interference subject associated with the target.
  • the labeling information may be labelled manually or by using a second target examination model described in connection with FIGs. 12-14.
  • the second target examination model may be configured to provide labelling information of the target and labelling information of the interference subject associated with the target.
  • the processing device 110 may generate an initial seed model based on the supervised labelled image data.
  • the initial seed model may include an input, an output, and multiple classifiers or layers in between.
  • the initial seed model may produce an output according to its structure and the weights of its classifiers.
  • the classifiers may correspond to different perspectives regarding the image, for example, one classifier may focus on size of objects, one classifier may focus on details, one classifier may focus on color, etc.
  • during the training, the supervised labelled image data may be inputted to the initial seed model (or an untrained preliminary model).
  • the input may be unlabeled image data and the output may be the manually labelled image data or the supervised labelled image data.
  • the model may “learn” why the images are labelled as such by changing an internal structure, factors, layers, or weights of classifiers inside the model to achieve a maximum score or a lowest loss function.
  • the initial seed model may learn that the vehicles may have different colors and may reduce the weight of the color classifier (e.g., color will not strongly affect the identification of objects anymore).
  • the machine may learn that each type of vehicle has a similar height-width ratio and increase the corresponding classifier’s weight.
  • the initial seed model may learn which features are common in the labelled target and increase the corresponding classifier’s weight, and which features are less common in the labelled target and reduce the corresponding classifier’s weight.
  • the initial seed model (and/or other models in the present disclosure, e.g., seed model, preliminary model, updated seed model, target detection model) may include but not limited to a neural network, an artificial neural network (ANN) , a convolutional neural network (CNN) , a you-only-look-once (YOLO) network, a tiny YOLO network, a support vector machine (SVM) , a regions with convolutional neural network (R-CNN) , a decision tree, a random forest, or the like, or any combination thereof.
  • the supervised labelled image data 740 may be used to train the initial seed model 750.
  • the initial seed model 750 may be a temporary model of the target detection system 760.
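  • A minimal training sketch is given below to illustrate the idea of fitting an initial seed model to supervised labelled image data by gradient descent on a loss function. It treats the seed model as a small image classifier and uses random tensors as stand-ins for the labelled images; the architecture, optimizer, and hyperparameters are assumptions for illustration only, not the specific model of the present disclosure.

```python
# A hedged sketch of training an initial seed model on supervised labelled
# image data. Random tensors stand in for real labelled images.
import torch
from torch import nn

torch.manual_seed(0)

# Stand-in supervised labelled image data: 64 RGB images (32x32) with labels
# drawn from 10 hypothetical target categories.
images = torch.randn(64, 3, 32, 32)
labels = torch.randint(0, 10, (64,))

# A small classifier standing in for the initial seed model.
seed_model = nn.Sequential(
    nn.Conv2d(3, 12, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(2), nn.Flatten(), nn.Linear(12 * 16 * 16, 10),
)
optimizer = torch.optim.SGD(seed_model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Gradient descent: adjust internal weights so that the model's outputs agree
# with the supervised labels (i.e., achieve a lower loss).
for epoch in range(5):
    optimizer.zero_grad()
    loss = loss_fn(seed_model(images), labels)
    loss.backward()
    optimizer.step()
```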
  • the processing device 110 may execute the initial seed model to generate unsupervised labelled image data based on the pre-training image data.
  • the processing device 110 may preprocess the pre-training image data to generate the unsupervised labelled image data.
  • the “preprocess” of the pre-training image data may include a filtering process and a labelling process.
  • the initial seed model may select some or all of the pre-training image data that includes the target from the pre-training image data.
  • the initial seed model may label the target in the selected pre-training image data.
  • the processing device 110 may input the pre-training image data to the initial seed model.
  • the initial seed model may generate labelled pre-training image data as an output.
  • the outputted image data may correspond to some or all of the inputted image data. Such outputted image data may be called the unsupervised labelled image data.
  • the processing device 110 may select some or all image data from the raw image data 710 as the pre-training image data 730.
  • the pre-training image data 730 may be the same as or different from the pre-training image data 720.
  • the processing device 110 may then input the selected pre-training image data 730 into the target detection system 760 to generate the unsupervised labelled image data 770 as an output.
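  • One possible way to realize this labelling step is sketched below: the trained seed model is run in inference mode on unlabeled pre-training images, and its predicted category together with a confidence score is kept as the machine-generated (unsupervised) label. The function name and record format are hypothetical; a detection model would additionally output bounding boxes, which is omitted here for brevity.

```python
# A hedged sketch of generating unsupervised labelled image data: run the
# seed model on unlabeled pre-training images and keep its predictions
# (category plus confidence score) as machine-generated labels.
import torch

def generate_unsupervised_labels(seed_model, unlabeled_images):
    seed_model.eval()
    with torch.no_grad():
        logits = seed_model(unlabeled_images)
        scores, categories = torch.softmax(logits, dim=1).max(dim=1)
    # Each entry pairs an image index with its pseudo label and output score.
    return [
        {"image_index": i, "category": int(c), "score": float(s)}
        for i, (c, s) in enumerate(zip(categories, scores))
    ]

# Example (assuming the hypothetical seed_model from the previous sketch):
# pseudo_labels = generate_unsupervised_labels(seed_model, torch.randn(16, 3, 32, 32))
```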
  • the processing device 110 may process the unsupervised labelled image data to generate first unsupervised labelled image data.
  • the supervised labelled image data and the unsupervised labelled image data (and/or the first unsupervised labelled image data) may be collectively called the training image data.
  • the initial seed model may not be very accurate, and the unsupervised labelled image data may include errors. For example, some of the targets that should be labelled may be missed by the initial seed model. As another example, the attribute information of the detected target may be wrong. Merely by way of example, the regional information (e.g., the shape or boundary) of the detected target may be wrong.
  • the processing device 110 may filter some of the unsupervised labelled image data according to a first preset condition.
  • the retained unsupervised labelled image data may be referred to as the first unsupervised labelled image data.
  • the first preset condition may include that an average output score of the initial seed model is greater than a first threshold, that a size ratio of boundaries of a detection region satisfies a preset ratio, that a color of a detection region satisfies a preset color request, and/or that an angle of the target satisfies a preset angle.
  • the first preset condition may include any one, any two, any three, or all of the above conditions.
  • the unsupervised labelled image data and/or the first unsupervised labelled image data may also include labelling information (e.g., location, category) of an interference subject associated with the target.
  • the labeling information may be labelled manually or by using a second target examination model described in connection with FIGs. 12-14.
  • the second target examination model may be configured to provide labelling information of the target and labelling information of the interference subject associated with the target.
  • the initial seed model may also output an average output score of the unsupervised labelled image data.
  • the average output score may indicate the probability that the unsupervised labelled image data is reliable.
  • the score of the supervised labelled image data may be 100%.
  • if the pre-training image data 720 and the pre-training image data 730 are very different, the output score of the unsupervised labelled image data may be relatively low. In other words, the initial seed model is not very confident about its labelling of the image data.
  • Such unsupervised labelled image data (if lower than the first threshold) may be filtered out from the unsupervised labelled image data.
  • the first threshold may be 80%, 90%, 95%, 99%, etc.
  • the detection region may be a region (usually a rectangle) where the target is labelled.
  • the size ratio of boundaries of the detection region may be determined. For example, if the target is labelled by a rectangle of 40*30 pixels, the size ratio may be 1.33 (40/30) .
  • the size ratio of boundaries of the detection region may also include a ratio between the size of the boundaries of the detection region and the image data. For example, if the target is labelled by a rectangle of 40*30 and the size of the image is 500*800, the size ratio may be 0.08 (40/500) or 0.0375 (30/800) .
  • the preset size ratio may depend on the selection of the target and is not limiting.
  • the color (or the grayscale value) of the detection region may include an average grayscale value, a maximum grayscale value, a minimum grayscale value, a difference between the maximum grayscale value and the minimum grayscale value, a difference between the average grayscale value in the detection region and the average grayscale value of pixels around the detection region.
  • the preset color request may depend on the selection of the target and is not limiting.
  • the angle of the target may include a relative angle from the target to other objects in the image data or a relative position of the target in the image data.
  • vehicles or seats in front of a car may be at a fixed angle (e.g., perpendicular to the capturing angle) .
  • the processing device 110 may determine a boundary of the labelled seat and determine whether certain points or lines along the boundary of the labelled seat satisfy a preset angle condition. For example, the processing device 110 may determine whether an upper edge of the boundary and/or the lower edge of the boundary are horizontal.
  • the preset angle may depend on the selection of the target and is not limiting.
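  • The filtering according to the first preset condition could look like the sketch below, which keeps an unsupervised labelled record only if its output score exceeds a threshold and the size ratio of its detection region lies in a preset range. The thresholds and the record layout are illustrative assumptions, not values from the present disclosure.

```python
# A hedged sketch of filtering unsupervised labelled image data according to
# a first preset condition (score threshold and size-ratio range).

def satisfies_first_preset_condition(record, score_threshold=0.9,
                                     min_ratio=0.5, max_ratio=2.0):
    # Condition 1: the output score exceeds the first threshold.
    if record["score"] <= score_threshold:
        return False
    # Condition 2: the width/height ratio of the detection region lies in a
    # preset range (e.g., a 40x30 box gives a ratio of about 1.33).
    x1, y1, x2, y2 = record["box"]
    ratio = (x2 - x1) / max(y2 - y1, 1)
    return min_ratio <= ratio <= max_ratio

unsupervised = [
    {"score": 0.97, "box": (100, 200, 140, 230)},   # kept
    {"score": 0.62, "box": (10, 10, 50, 40)},       # filtered out: low score
    {"score": 0.95, "box": (0, 0, 300, 20)},        # filtered out: size ratio
]
first_unsupervised = [r for r in unsupervised if satisfies_first_preset_condition(r)]
```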
  • the processing device 110 may generate a target detection model based on the training image data (e.g., the supervised labelled image data and the first unsupervised labelled image data) .
  • the initial seed model may be updated iteratively to generate the target detection model. More descriptions regarding the generation of the target detection model may be found elsewhere in the present disclosure, e.g., FIG. 8, FIG. 9 and the descriptions thereof.
  • the target detection model may be used to detect targets in real images. An exemplary process of using the target detection model may be found in process 500 in FIG. 5.
  • FIG. 8 is a flowchart illustrating exemplary processes for generating a target detection model according to some embodiments of the present disclosure.
  • one or more operations of process 800 may be executed by the target detection system 100.
  • the process 800 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device 150, the ROM 330, the RAM 340) and invoked and/or executed by a processing device (e.g., the processing device 110, the processor 320 of the computing device 300, and/or the modules illustrated in FIG. 4) .
  • the instructions may be transmitted in the form of electronic current or electrical signals.
  • the operations of the illustrated process presented below are intended to be illustrative.
  • the process 800 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed herein. Additionally, the order of the operations of the process 800 as illustrated in FIG. 8 and described below is not intended to be limiting.
  • FIG. 9 may be a schematic diagram illustrating an exemplary process for generating a target detection model according to some embodiments of the present disclosure.
  • the schematic diagram in FIG. 9 may illustrate an exemplary embodiment of the process 800.
  • the processing device 110 may obtain pre-training image data. Operation 810 may be similar to the operation 610 and is not repeated herein.
  • the pre-training image data 920 may correspond to the pre-training image data 720 and/or the pre-training image data 730. Alternatively, the pre-training image data may be selected from the raw image data 710 and be different from the pre-training image data 720 and the pre-training image data 730.
  • the processing device 110 may label the pre-training image data to generate supervised labelled image data. Operation 820 may be similar to the operation 620 and is not repeated herein. Referring to FIG. 9, the supervised labelled image data 910 may correspond to the supervised labelled image data 740.
  • the processing device 110 may generate an initial seed model based on supervised labelled image data. Operation 830 may be similar to the operation 630 and is not repeated herein.
  • the processing device 110 may generate first unsupervised labelled image data based on the initial seed model.
  • Operations 840 and 850 may be similar to the operations 640 and 650 and are not repeated herein.
  • the processing device 110 may generate or update a seed model based on the training image data (e.g., the supervised labelled image data and the first unsupervised labelled image data) .
  • the training of the seed model may be similar to the training of the initial seed model.
  • the unlabeled image data may be provided to the seed model as an input and the supervised labelled image data and the first unsupervised labelled image data may be provided to the seed model as an output.
  • the internal structure, classifiers, layers of the seed model may be updated according to the input and the output.
  • the seed model 950 may be updated by the supervised labelled image data 910 and the unsupervised labelled image data 940.
  • the updated seed model 950 may be a temporary model of the target detection system 930.
  • the processing device 110 may execute the seed model to update first unsupervised labelled image data based on the pre-training image data.
  • the pre-training image data may be inputted to the seed model updated in the current iteration to generate the updated first unsupervised labelled image data.
  • the initial seed model may be updated based on the supervised labelled image data generated in 820 and the first unsupervised labelled image data generated in 850 to generate a first-iteration seed model.
  • the first-iteration seed model may be updated based on the supervised labelled image data generated in 820 and the first unsupervised labelled image data that are updated in the first iteration in 870.
  • the nth-iteration seed model may be updated in the (n+1)th iteration in 860 based on the supervised labelled image data generated in 820 and the first unsupervised labelled image data that are updated in the nth iteration in 870.
  • the unsupervised labelled image data 940 may be updated by the target detection system 930 in response to an input of the pre-training image data 920.
  • the processing device 110 may determine whether a preset condition is satisfied.
  • the preset condition may include the number of iterations being greater than a second threshold and/or an average output score of the updated first unsupervised labelled image data being greater than a third threshold.
  • the second threshold may be 1, 2, 5, 10, 15, 20, etc.
  • the third threshold may be 80%, 90%, 95%, 99%, etc.
  • the processing device 110 may obtain a set of testing image data and a reference detection result corresponding to the set of testing image data. The processing device 110 may generate a detection result corresponding to the set of testing image data using the seed model generated in the current iteration.
  • the preset condition may include that the difference between the detection result generated by the seed model in the current iteration and the reference detection result being less than a fourth threshold.
  • in response to determining that the preset condition is satisfied, the process 800 may proceed to 890; otherwise, the process 800 may proceed back to 860.
  • the processing device 110 may designate the seed model in the current iteration as a target detection model.
  • the target detection model may be used to detect targets in real images.
  • An exemplary process of using the target detection model may be found in process 500 in FIG. 5.
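  • The iterative update described in process 800 could be orchestrated as in the following sketch, which stops when the number of iterations reaches a threshold or the average output score of the updated first unsupervised labelled image data is high enough. The callables train_fn, label_fn, and filter_fn stand for the hypothetical training, pseudo-labelling, and filtering sketches given earlier, not components named in the present disclosure; the thresholds are illustrative assumptions.

```python
# A hedged sketch of the iterative seed-model update loop.

def build_target_detection_model(train_fn, label_fn, filter_fn,
                                 supervised, pretraining_images,
                                 max_iterations=10, score_threshold=0.95):
    # Initial seed model trained on supervised labelled image data only.
    seed_model = train_fn(supervised, [])
    for iteration in range(1, max_iterations + 1):
        # Update the first unsupervised labelled image data with the current model.
        pseudo = filter_fn(label_fn(seed_model, pretraining_images))
        # Update the seed model on supervised plus first unsupervised data.
        seed_model = train_fn(supervised, pseudo)
        # Preset condition: enough iterations or confident-enough pseudo labels.
        avg_score = sum(p["score"] for p in pseudo) / max(len(pseudo), 1)
        if iteration >= max_iterations or avg_score > score_threshold:
            break
    # Designate the seed model of the current iteration as the target detection model.
    return seed_model
```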
  • FIG. 11 is a schematic diagram illustrating an exemplary structure of a model according to some embodiments of the present disclosure.
  • the model 1100 may be an exemplary embodiment of the preliminary model, the initial seed model, the updated seed model, or the target detection model.
  • the model may be a convolutional neural network (CNN) .
  • the CNN may be a multilayer neural network (e.g., including multiple layers) .
  • the multiple layers may include at least one of a convolutional layer (CONV) , a Rectified Linear Unit (ReLU) layer, a pooling layer (POOL) , or a fully connected layer (FC) .
  • the multiple layers of the CNN may correspond to neurons arranged in three dimensions: width, height, and depth.
  • the CNN may have an architecture such as [INPUT - CONV - RELU - POOL - FC].
  • the INPUT [32x32x3] may hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R, G, B.
  • the CONV layer may compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. This may result in a volume of [32x32x12] if 12 filters are used.
  • the CONV layer may be the core building block of CNN that does most of the computational load.
  • the RELU layer may apply an elementwise activation function, such as the max (0, x) thresholding at zero. This may leave the size of the volume unchanged ( [32x32x12] ) .
  • the POOL layer may perform a downsampling operation along the spatial dimensions (width, height) , resulting in a volume such as [16x16x12] .
  • the function of the POOL layer may be to progressively reduce the spatial size of the representation to reduce the amount of parameters and computation in the network, and hence to also control overfitting.
  • the pooling layer operates independently on every depth slice of the input and resizes it spatially, using the MAX operation. In some embodiments, a pooling layer with filters of size 2x2 applied with a stride of 2 downsamples the input by a factor of two along the width and the height.
  • Each MAX operation includes taking a max over 4 numbers (e.g., a little 2x2 region in some depth slice).
  • the FC layer may compute the class scores, resulting in a volume of size [1x1x10], where each of the 10 numbers corresponds to a class score.
  • Each neuron in the FC layer may be connected to all the values in the previous volume.
  • Each class score may correspond to a category, a type, or content information of a particular target. Different targets may correspond to different class scores.
  • CNN may transform the original image layer by layer from the original pixel values to regional class scores.
  • the CONV/FC layers perform transformations that may be a function of not only the activations in the input volume, but also of the parameters (for example, the weights and biases of the neurons) .
  • the RELU/POOL layers may implement a fixed function.
  • the parameters in the CONV/FC layers may be trained with gradient descent so that the class scores that CNN computes may be consistent with the labelled image data in the output.
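  • A CNN with the [INPUT - CONV - RELU - POOL - FC] architecture and the volume sizes described above (a 32x32x3 input, 12 filters, 2x2 max pooling with stride 2, and 10 class scores) might be written as the following sketch; the framework and the 3x3 kernel size are assumptions for illustration, not limitations of the present disclosure.

```python
# A hedged sketch of the [INPUT - CONV - RELU - POOL - FC] architecture with
# the volume sizes discussed above.
import torch
from torch import nn

class SmallCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 12, kernel_size=3, padding=1)  # [32x32x3] -> [32x32x12]
        self.relu = nn.ReLU()                                    # size unchanged
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)        # [32x32x12] -> [16x16x12]
        self.fc = nn.Linear(12 * 16 * 16, num_classes)           # -> 10 class scores

    def forward(self, x):
        x = self.pool(self.relu(self.conv(x)))
        return self.fc(torch.flatten(x, start_dim=1))

scores = SmallCNN()(torch.randn(1, 3, 32, 32))  # one score per class/category
```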
  • the CNN may combine with reinforcement learning to improve the accuracy of target detection.
  • the reinforcement learning may include Markov Decision Process (MDP) , Hidden Markov Model (HMM) , etc.
  • model 1100 described in FIG. 11 is merely illustrative and shall not be limiting.
  • Other types of models may be used in the present disclosure, including but not limited to a neural network, an artificial neural network (ANN) , a convolutional neural network (CNN) , a you-only-look-once (YOLO) network, a tiny YOLO network, a support vector machine (SVM) , a regions with convolutional neural network (R-CNN) , a decision tree, a random forest, or the like, or any combination thereof.
  • FIG. 12 is a block diagram illustrating an exemplary processing device according to some embodiments of the present disclosure.
  • the processing device 110 may include an obtaining module 1210, an examination module 1220 and a model training module 1230.
  • the obtaining module 1210 may be configured to obtain data.
  • the obtaining module 1210 may obtain the data from the target detection system 100 (e.g., the processing device 110, the terminal 120, the image capturing device 130, the storage device 150) or any other device disclosed in the present disclosure.
  • the data may include image data, video data, user instruction, algorithm, model, or the like, or any combination thereof.
  • the obtaining module 1210 may obtain data to be examined.
  • the data to be examined may include image data, video data, or the like, or any combination thereof.
  • the image data may include one or more images.
  • the video data may include a plurality of image frames constituting a video.
  • the examination module 1220 may be configured to examine the data to be examined and output an examination result of the data to be examined.
  • the examination result may include labelling information of a target to be examined and labelling information of an interference subject associated with the target to be examined.
  • the target to be examined may include a target that needs to be obtained from the data to be examined.
  • a user of the target detection system 100 may be interested in the target.
  • the interference subject may include a background target that is erroneously identified as the target to be examined in any target examination process.
  • the interference subject may include, but is not limited to, a subject that has been erroneously identified as the target to be examined using a second target examination model described in the present disclosure, a subject that has been erroneously identified as the target to be examined using an existing target examination model in a target examination field, a subject that has been erroneously identified as the target to be examined using an examination algorithm and/or a model in other fields, a subject that has been erroneously identified as the target to be examined manually, or the like, or any combination thereof.
  • the examination result may include a category of the target to be examined, a score of the target to be examined, a category of the interference subject, a score of the interference subject, etc.
  • the examination module 1220 may label the category of the target to be examined or the interference subject with an examination box.
  • the examination box may refer to a bounding box of a subject, and an examination subject of the data to be examined may be included in the examination box.
  • the score may refer to a probability that a subject in an examination box is identified as a target to be examined or an interference subject.
  • the examination result may also include information (also referred to as “category identifier”) for distinguishing the category of the target to be examined and the category of the interference subject.
  • the examination module 1220 may distinguish the target to be examined and the interference subject using examination boxes with different colors or different shapes (also referred to as “labelling boxes”).
  • the examination module 1220 may use a green examination box labelling the category of the target to be examined, and use a red examination box labelling the category of the interference subject.
  • the examination module 1220 may add a text near the examination box to illustrate whether the subject in the examination box is the target to be examined or the interference subject.
  • the examination module 1220 may output the examination result of the data to be examined using the second target examination model. Further, the examination module 1220 may obtain the target to be examined based on the examination result.
  • the second target examination model may be trained by training data including labelling information of a training target to be examined and labelling information of a training interference subject associated with the training target to be examined.
  • the model training module 1230 may be configured to generate the second target examination model.
  • the model training module 1230 may include a preliminary training unit 1232, a model examination unit 1234, and a second training unit 1236.
  • the preliminary training unit 1232 may be configured to obtain preliminary training data and train a preliminary model to obtain a preliminary target examination model using the preliminary training data.
  • the preliminary training data may include image data, video data, or the like, or a combination thereof.
  • the image data may include a plurality of images.
  • the video data may include a plurality of images of video frames.
  • the preliminary training data may include labelling information of a preliminary training target to be examined.
  • the preliminary training data may also include labelling information of a priori interference subject of the preliminary training targets to be examined.
  • the priori interference subject may refer to a predetermined interference subject of a preliminary training target to be examined.
  • the preliminary model may include a Deformable Part Model (DMP) , an OverFeat model, a Region-Convolutional Neural Network (R-CNN) model, a Spatial Pyramid Pooling Network (SPP-Net) model, a Fast R-CNN model, a Faster R-CNN model, a Region-based Fully Convolutional Network (R-FCN) model, a Deeply Supervised Object Detector (DSOD) model, or the like, or any combination thereof.
  • the model examination unit 1234 may be configured to obtain preliminary data to be examined and determine preliminary target examination results of the preliminary data to be examined using the preliminary target examination model. Similar to the preliminary training data, the preliminary data to be examined may include image data, video data. The preliminary data to be examined may be different from the preliminary training data.
  • the preliminary target examination results may include outputs of the preliminary target examination model for input data, i.e., the preliminary data to be examined.
  • the preliminary target examination results may include one or more examination boxes added in the input data.
  • Each of the examination boxes may include an examination subject, e.g., a preliminary target to be examined, a preliminary background target of the preliminary target to be examined, or a preliminary interference subject of the preliminary target to be examined.
  • Alternatively, the preliminary target examination results may not include any examination box added to the input data, i.e., the input data and the output of the preliminary target examination model may be the same.
  • the preliminary target examination results may also include a score indicating a probability that the subject in the examination box is examined as the preliminary target to be examined.
  • the score may include 1, 0.99, 0.98, 0.97, or any other value in 0-1. For example, if the score is 0.98, it may indicate that the subject in the examination box may have a probability of 98% of being the preliminary target to be examined.
  • the second training unit 1236 may be configured to generate training data at least based on the preliminary data to be examined and the preliminary target examination results, and train the preliminary target examination model to generate the second target examination model based on the training data.
  • the training data may include labelling information (e.g., a category identifier, a category, a score, location information) of a training target to be examined and labelling information (e.g., a category identifier, a category, a score, location information) of a training interference object associated with the training target to be examined.
  • the second training unit 1236 may generate at least a portion of training data based on the preliminary data to be examined and the preliminary target examination results.
  • the at least a portion of training data may include the labelling information of the prior interference subjects, the labelling information of preliminary interference subjects, the labelling information of the preliminary targets to be examined.
  • a second portion of the training data may be generated manually, and the second training unit 1236 may obtain the second portion of training data.
  • the second training unit 1236 may obtain a third portion of training data based on an existing target examination model or an existing target examination algorithm described in connection with FIG. 13. More detailed descriptions of training the preliminary target examination model can be found elsewhere in the present disclosure, e.g., FIG. 14 and the descriptions thereof.
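  • One plausible way to derive such training data is sketched below: preliminary target examination results (boxes with scores) are compared against the known targets in the preliminary data to be examined, and confident detections that do not overlap any known target are relabelled as training interference subjects. The IoU matching rule, the thresholds, and the record layout are assumptions for illustration, not the specific procedure of the present disclosure.

```python
# A hedged sketch of turning preliminary examination results into training
# data that also labels interference subjects: confident detections that do
# not match any known target are treated as erroneously identified subjects.

def iou(a, b):
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(ix2 - ix1, 0) * max(iy2 - iy1, 0)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0

def build_training_labels(detections, known_target_boxes,
                          score_threshold=0.9, iou_threshold=0.5):
    labels = []
    for det in detections:  # det: {"box": (x1, y1, x2, y2), "score": float}
        if det["score"] < score_threshold:
            continue
        matched = any(iou(det["box"], t) >= iou_threshold for t in known_target_boxes)
        labels.append({
            "box": det["box"],
            "score": det["score"],
            # Category identifier: target to be examined vs. interference subject.
            "category": "training_target" if matched else "training_interference",
        })
    return labels
```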
  • the modules and/or units in the processing device 110 may be connected to or communicated with each other via a wired connection or a wireless connection.
  • the wired connection may include a metal cable, an optical cable, a hybrid cable, or the like, or any combination thereof.
  • the wireless connection may include a Local Area Network (LAN) , a Wide Area Network (WAN) , a Bluetooth, a ZigBee, a Near Field Communication (NFC) , or the like, or any combination thereof.
  • the processing device 110 may include a storage module (not shown) which may be used to store data generated by the above-mentioned modules and/or units.
  • the model training module 1230 may be unnecessary and the second target examination model may be obtained from a storage device (e.g., the storage device 150) , such as the ones disclosed elsewhere in the present disclosure.
  • FIG. 13 is a flowchart illustrating a process for determining an examination result of the data to be examined according to some embodiments of the present disclosure.
  • one or more operations of process 1300 may be executed by the target detection system 100.
  • the process 1300 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device 150, the ROM 330, the RAM 340) and invoked and/or executed by a processing device (e.g., the processing device 110, the processor 320 of the computing device 300, and/or the modules illustrated in FIG. 12) .
  • the instructions may be transmitted in the form of electronic current or electrical signals.
  • the processing device 110 may obtain data to be examined.
  • the term “examine” may incorporate the meaning of “detect” , and vice versa.
  • the data to be examined may include image data, video data, or the like, or any combination thereof.
  • the image data may include one or more images.
  • the video data may include a plurality of image frames constituting a video.
  • the processing device 110 may designate the image or the video obtained by an obtaining apparatus (e.g., a camera) of the terminal 120, or the image capturing device 130 in real-time as the data to be examined.
  • the processing device 110 may access the image or the video obtained by the terminal 120 or the image capturing device 130 and previously stored in a storage device (e.g., the storage device 150, the ROM 330, the RAM 340) or the image capturing device 130 via the network 130, and designate thereof as the data to be examined.
  • the processing device 110 may examine the data to be examined and output an examination result.
  • the examination result may include labelling information of a target to be examined and labelling information of an interference subject associated with the target to be examined.
  • the target to be examined may include a target that needs to be obtained from the data to be examined.
  • a user of the target detection system 100 may be interested in the target to be examined.
  • the interference subject may include a background target that has been erroneously identified as the target to be examined in a target examination process.
  • the background target may include any target other than the target to be examined that does not need to be examined in the data to be examined.
  • At least a portion of the background target may have the same and/or similar physical properties (e.g., shape, color, texture, etc. ) as the target to be examined.
  • the processing device 110 may designate the seat belt as the target to be examined, and collectively designate targets other than the seat belt as the background targets.
  • a portion of the background targets may be easily confused with the seat belt since the portion of the background targets may have a similar shape with the seat belt (e.g., a shape of long strip) .
  • the portion of the background targets may include a tie, a lanyard of an accessory hanging on a rear-view mirror, a light shadow, a lane line, etc.
  • the portion of the background targets may be likely to be erroneously identified as the seat belt in a target examination process, and the processing device 110 may designate the portion of the background targets as interference subjects of the seat belt.
  • the interference subject may include, but is not limited to, a subject that has been erroneously identified as the target to be examined using a second target examination model described in the present disclosure, a subject that has been erroneously identified as the target to be examined using an existing target examination model in a target examination field, a subject that has been erroneously identified as the target to be examined using an examination algorithm in other fields, a subject that has been erroneously identified as the target to be examined manually, or the like, or any combination thereof.
  • the second target examination model disclosed in the present disclosure may erroneously identify the seat belt as the tie (i.e., the target to be examined) during a target examination process.
  • the processing device 110 may designate the seat belt as the interference subject of the tie (i.e., the target to be examined) .
  • As another example, the seat belt may be erroneously identified as the tie (i.e., the target to be examined) using an existing target examination model (e.g., an OverFeat model) during a target examination process.
  • the processing device 110 may designate the seat belt as the interference subject of the tie (i.e., the target to be examined) .
  • the seat belt may be erroneously identified as the tie (i.e., the target to be examined) using an adaptive boosting (Adaboost) algorithm during a target examination process.
  • the processing device 110 may designate the seat belt as the interference subject of the tie (i.e., the target to be examined) .
  • As still another example, the seat belt may be erroneously identified as the tie (i.e., the target to be examined) manually.
  • the processing device 110 may designate the seat belt as the interference subject of the tie (i.e., the target to be examined) .
  • the existing target examination model may also include a Deformable Part Model (DMP) , an OverFeat model, a Region-Convolutional Neural Network (R-CNN) model, a Spatial Pyramid Pooling Network (SPP-Net) model, a Fast R-CNN model, a Faster R-CNN model, a Region-based Fully Convolutional Network (R-FCN) model, a Deeply Supervised Object Detector (DSOD) model, or the like, or any combination thereof.
  • the examination algorithm may also include a Support Vector Machine (SVM) algorithm, a Single Shot MultiBox Detector (SSD) algorithm, a you-only-look-once (YOLO) algorithm, or the like, or any combination thereof.
  • the number of targets to be examined may be one or more.
  • the target to be examined may include a seat belt, a face of a driver, a face of a passenger, or the like, or any combination thereof.
  • the number of interference subjects of each target to be examined may be different and depend on properties (e.g., physical properties) of the target to be examined.
  • the interference subjects of the seat belt may include a tie, a lanyard of an accessory hanging on a rear-view mirror, a light shadow, a lane line, or the like, or any combination thereof.
  • if the target to be examined is the face of the driver, the interference subject may include a face of the passenger, a face of a pedestrian, or the like, or any combination thereof.
  • the labelling information of the target to be examined may include a category, a score, location information of the target to be examined, etc.
  • the labelling information of the interference subject may include a category, a score, location information of the interference subject, etc.
  • the processing device 110 may label the category of the target to be examined or the interference subject with an examination box.
  • the examination box may refer to a bounding box of a subject, and an examination subject of the data to be examined may be included in the examination box.
  • the processing device 110 may collectively designate the targets to be examined, the interference subjects of the targets to be examined, the background targets as the examination subject.
  • the score may refer to a probability that a subject in an examination box is identified as a target to be examined or an interference subject.
  • the score may include 1, 0.99, 0.98, 0.97, or any other value in 0-1. For example, assuming that a score corresponding to a target within an examination box in the examination result is 0.98, it may indicate that the target within the examination box may have a probability of 98% of being the target to be examined.
  • the examination result may include one or more examination boxes. Each of the examination subjects may be included in one of the one or more examination boxes.
  • the examination result (or the labelling information) may also include information (also referred to as “category identifier”) for distinguishing the category of the target to be examined and the category of the interference subject.
  • the category identifier may include examination boxes with different colors or different shapes (also referred to as “labelling box” ) . Each of the different colors or different shapes may represent one category.
  • the processing device 110 may use a green examination box labelling the category of the target to be examined, and use a red examination box labelling the category of the interference subject.
  • the processing device 110 may add a text near the examination box to illustrate whether the subject in the examination box is the target to be examined or the interference subject.
  • the location information may include a coordinate or a set of coordinates (also referred to as “coordinate set” ) of the target to be examined and a coordinate or a coordinate set of the interference subject.
  • the coordinate may include a coordinate of a center of the target to be examined or a center of the interference subject.
  • the coordinate set may include a plurality of coordinates associated with a plurality of locations of the target to be examined or a plurality of locations of the interference subject, e.g., a center, a vertex, a location of a boundary of the target to be examined or the interference subject, etc.
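  • As a sketch of how the labelling information described above might be represented programmatically, the following dataclasses bundle a category, a score, a category identifier, and location information (a center coordinate or a coordinate set); the field names and values are illustrative assumptions, not part of the present disclosure.

```python
# A hedged sketch of an examination-result record holding the labelling
# information described above.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class LabellingInfo:
    category: str                      # e.g., "seat belt" or "tie"
    score: float                       # probability in 0-1, e.g., 0.98
    is_interference: bool              # category identifier: target vs. interference
    center: Tuple[float, float]        # coordinate of the subject's center
    coordinate_set: List[Tuple[float, float]] = field(default_factory=list)

@dataclass
class ExaminationResult:
    targets: List[LabellingInfo] = field(default_factory=list)
    interference_subjects: List[LabellingInfo] = field(default_factory=list)

# Example: a seat belt examined as the target and a tie as its interference subject.
result = ExaminationResult(
    targets=[LabellingInfo("seat belt", 0.98, False, (320.0, 240.0))],
    interference_subjects=[LabellingInfo("tie", 0.76, True, (330.0, 180.0))],
)
```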
  • the examination result may be obtained manually.
  • the processing device 110 may obtain the examination result using the second target examination model.
  • the processing device 110 may input the data to be examined into the second target examination model to obtain the examination result.
  • the processing device 110 may obtain the second target examination model by training a relevant model (e.g., a preliminary target examination model described in connection with FIGs. 14-15) using training data.
  • the relevant model may include a classical learning model in the target examination field.
  • the relevant model may include a Deformable Part Model (DMP) , an OverFeat model, a Region-Convolutional Neural Network (R-CNN) model, a Spatial Pyramid Pooling Network (SPP-Net) model, a Fast R-CNN model, a Faster R-CNN model, a Region-based Fully Convolutional Network (R-FCN) model, a Deeply Supervised Object Detector (DSOD) model, or the like, or any combination thereof.
  • the training data of the second target examination model may include image data, video data, or a combination thereof.
  • the training data may include labeling information of training targets to be examined and labelling information of training interference subjects of the training targets to be examined.
  • an identifier may be used to identify location information and category information of the training targets to be examined and the training interference subjects.
  • the identifier may include a labelling box, a text, an arrow, or the like, or any combination thereof.
  • the location information may include a coordinate or a set of coordinates (also referred to as “coordinate set” ) of the training target to be examined or a coordinate or a coordinate set of the training interference subject of the training target to be examined in the image.
  • the coordinate may include a coordinate of a center of the training target to be examined or a center of the training interference subject.
  • the coordinate set may include a plurality of coordinates associated with a plurality of locations of the training target to be examined or a plurality of locations of the training interference subject, e.g., a center, a vertex, a location of a boundary of the training target to be examined or the training interference subject, etc.
  • the category information may include information (also referred to as “category identifier” ) for distinguishing the training target to be examined and the training interference subject of the training target to be examined.
  • the location information of the training target to be examined and the location information of the training interference subject of the target to be examined may be respectively distinguished by examination boxes with different colors or different shapes (also referred to as “labelling box” ) .
  • a text may be added near the examination box (also referred to as “labelling box” ) to illustrate whether the subject in the examination box is the training target to be examined or the training interference subject.
  • the processing device 110 may collectively designate the location information, the category information (including the identifier, the category, the score) as the labelling information.
  • the seat belt may be highlighted using a first labelling box (e.g., a green rectangle examination box) in the image.
  • Coordinates of a range of a subject within the labelling box may include a coordinate set of the seat belt.
  • the tie may be highlighted by using a second labelling box (e.g., a red rectangle examination box) in the image. Coordinates of a range of a subject within the second labelling box may include a coordinate set of the tie.
  • the training target to be examined and the training interference subject of the training target to be examined may be labelled manually or by a high-precision classifier, and the present disclosure may be non-limiting.
  • the high-precision classifier may include a deformable part classifier, an OverFeat classifier, a Region-Convolutional Neural Network (R-CNN) classifier, a Spatial Pyramid Pooling Network (SPP-Net) classifier, a Fast R-CNN classifier, a Faster R-CNN classifier, a Region-based Fully Convolutional Network (R-FCN) classifier, a Deeply Supervised Object Detector (DSOD) classifier, or the like, or any combination thereof.
  • the processing device 110 may also filter out the targets to be examined based on the category of each of the examination subjects in the examination result.
  • the category of the examination subject may be obtained by identifying the examination subject within the identifier.
  • the examination subject may be identified by a machine identification method (e.g., by the second target examination model described in the present disclosure) or manually.
  • the processing device 110 may obtain the targets to be examined after the identification.
  • FIG. 14 is a flowchart illustrating a process for generating a second target examination model according to some embodiments of the present disclosure.
  • one or more operations of process 1400 may be executed by the target detection system 100.
  • the process 1400 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device 150, the ROM 330, the RAM 340) and invoked and/or executed by a processing device (e.g., the processing device 110, the processor 320 of the computing device 300, and/or the modules illustrated in FIG. 12) .
  • the instructions may be transmitted in the form of electronic current or electrical signals.
  • the processing device 110 may obtain preliminary training data (also referred to as “preliminary training samples” ) .
  • the preliminary training data may include image data, video data, or the like, or a combination thereof.
  • the image data may include a plurality of images.
  • the video data may include a plurality of images of video frames.
  • the processing device 110 may pre-obtain the preliminary training data or obtain the preliminary training data in real-time.
  • the processing device 110 may number the images and/or videos included in the preliminary training data using symbols, e.g., characters, letters, digits, codes, or a combination thereof.
  • the preliminary training data may include labelling information of preliminary training targets to be examined.
  • each category of the preliminary training targets to be examined may be labelled with an identifier, such as a rendered box.
  • the identifier may include a square rendered box, a rectangular rendered box, a circular rendered box, a boundary rendered box, or the like, or any combination thereof.
  • the identifier may also display a location (e.g., coordinate information, coordinate set information) of the preliminary training target to be examined in the image.
  • a first preliminary training target to be examined may be labelled with and included in a green rectangular rendered box.
  • a second preliminary training target to be examined may be labelled with and included in a green circular rendered box.
  • the preliminary training targets to be examined may be labelled manually or by a high-precision classifier, and the present disclosure may be non-limiting.
  • the high-precision classifier may include a deformable part classifier, an OverFeat classifier, a Region-Convolutional Neural Network (R-CNN) classifier, a Spatial Pyramid Pooling Network (SPP-Net) classifier, a Fast R-CNN classifier, a Faster R-CNN classifier, a Region-based Fully Convolutional Network (R-FCN) classifier, a Deeply Supervised Object Detector (DSOD) classifier, or the like, or any combination thereof.
  • the preliminary training data may also include labelling information of priori interference subjects of the preliminary training targets to be examined.
  • the priori interference subject may refer to a predetermined interference subject of a preliminary training target to be examined.
  • the process for obtaining the priori interference subjects may be based on statistical analysis of examination results of an existing target examination algorithm and/or a model, or the like, and the present disclosure may be non-limiting.
  • the process for labelling the priori interference subjects may be similar to the process for labelling the preliminary target to be examined described above, and not repeated here.
  • the processing device 110 may train a preliminary model to obtain a preliminary target examination model using the preliminary training data.
  • the preliminary model may include a classical learning model.
  • the classical learning model may include a Deformable Part Model (DPM), an OverFeat model, a Region-Convolutional Neural Network (R-CNN) model, a Spatial Pyramid Pooling Network (SPP-Net) model, a Fast R-CNN model, a Faster R-CNN model, a Region-based Fully Convolutional Network (R-FCN) model, a Deeply Supervised Object Detector (DSOD) model, a convolutional neural network model, an adaptive boosting model, a gradient boosting decision tree, or the like, or any combination thereof.
  • the processing device 110 may input the preliminary training data into the preliminary model and determine preliminary training examination results. Further, the processing device 110 may determine whether the preliminary training examination results satisfy a predetermined condition. In response to a determination that the preliminary training examination results satisfy the predetermined condition, the processing device 110 may terminate the training and designate the preliminary model as the preliminary target examination model. In response to a determination that the preliminary training examination results do not satisfy the predetermined condition, the processing device 110 may adjust parameters of the preliminary model, and continue the training. The processing device 110 may generate an updated preliminary model and determine updated preliminary training examination results associated with the updated preliminary model.
  • the processing device 110 may terminate the training and designate the updated preliminary model as the preliminary target examination model. In response to a determination that the updated preliminary training examination results do not satisfy the predetermined condition, the processing device 110 may continue to adjust parameters of the updated preliminary model and continue the training until newly determined preliminary training examination results associated with a newly determined updated preliminary model satisfy the predetermined condition.
  • the parameters of the (updated) preliminary model may include a learning rate, a hyper parameter, a weight matrix, a bias vector, etc.
  • the predetermined condition may include the number of preliminary training samples reaching a predetermined threshold, a precision rate of the (updated) preliminary model being greater than a predetermined precision threshold, or a value of a loss function of the (updated) preliminary model being less than a predetermined value.
  • the processing device 110 may perform further processing, e.g., a performance test, on the preliminary target examination model. Specifically, the processing device 110 may validate the preliminary target examination model using preliminary validating data.
  • the process for validating the preliminary target examination model using preliminary validating data may be similar to the process for validating the second target examination model described in operation 1460, and not repeated here.
  • the processing device 110 may use the preliminary target examination model to perform target examination and output preliminary target examination results.
  • the processing device 110 may obtain preliminary data to be examined (also referred to as “preliminary examination samples” ) . Similar to the data to be examined or the preliminary training data, the preliminary data to be examined may include image data and/or video data. The preliminary data to be examined may be different from the data to be examined or the first preliminary training data.
  • the processing device 110 may obtain the preliminary data to be examined from the image capturing device 130 via the network 140 in real-time. Additionally or alternatively, the processing device 110 may obtain the preliminary data to be examined from a storage device (e.g., the storage device 150, the ROM 330, the RAM 340) described elsewhere in the present disclosure or an external storage device.
  • the preliminary target examination results may include outputs of the preliminary target examination model for input data, i.e., the preliminary data to be examined.
  • the preliminary target examination results may include one or more examination boxes added in the input data.
  • Each of the examination boxes may include an examination subject, e.g., a preliminary target to be examined, a preliminary background target of the preliminary target to be examined, or a preliminary interference subject of the preliminary target to be examined.
  • the preliminary target examination results may also not include any examination box added in the input data, i.e., the input data and the output of the preliminary target examination model may be the same.
  • the preliminary target examination results may also include a score indicating a probability that the subject in the examination box is examined as the preliminary target to be examined.
  • the score may include 1, 0.99, 0.98, 0.97, or any other value between 0 and 1. For example, if the score is 0.98, it may indicate that the subject in the examination box has a probability of 98% of being the preliminary target to be examined.
  • the processing device 110 may obtain training data (also referred to as “training samples” ) .
  • the training data may include labelling information (e.g., a category identifier, a category, a score, location information) of a training target to be examined and labelling information (e.g., an identifier, a category, a score, location information) of a training interference object associated with the training target to be examined.
  • the processing device 110 may generate at least a portion of training data based on the preliminary data to be examined and the preliminary target examination results.
  • a second portion of the training data may be generated manually and the processing device 110 may obtain the second portion of the training data.
  • the processing device 110 may generate a third portion of training data based on an existing target examination model or an existing target examination algorithm described in connection with FIG. 13.
  • the preliminary target examination model may be likely to erroneously examine the preliminary interference subjects as the preliminary targets to be examined.
  • a portion of the preliminary target examination results may be incorrect for various reasons, e.g., structures, parameters, training conditions of the preliminary target examination model.
  • the processing device 110 may analyze the preliminary target examination results, obtain preliminary interference subjects of each preliminary target to be examined, and then generate the at least a portion of the training data.
  • the processing device 110 may rank the preliminary interference subjects of the preliminary target to be examined based on a strategy.
  • the strategy may be associated with scores of the preliminary target examination results, the number of a category of preliminary examination samples erroneously identified as a preliminary target to be examined, or the like, or a combination thereof.
  • the score may refer to a probability that a subject is examined as a preliminary target to be examined.
  • the score may include any value between 0 and 1.
  • for example, if the preliminary target examination model examines a tie as the seat belt and the probability that the subject (i.e., the tie) is the seat belt is 85%, then 85% may be the score of the preliminary target examination result of the preliminary examination sample.
  • the number of the category of preliminary examination samples erroneously identified as the preliminary target to be examined may refer to the number of times that a subject is erroneously examined as the preliminary target to be examined while the preliminary target examination model examines the preliminary examination samples. For example, assume that the preliminary target to be examined is a seat belt and the number of preliminary examination samples with incorrect results is 5: the preliminary target examination model examines a tie as the seat belt in 3 preliminary examination samples, a lanyard as the seat belt in 1 preliminary examination sample, and a light shadow as the seat belt in 1 preliminary examination sample. The counts for the tie, the lanyard, and the light shadow may then be 3, 1, and 1, respectively.
  • the processing device 110 may rank the preliminary interference subjects according to the scores or the numbers, e.g., in descending order, to obtain a preliminary ranking result.
  • the preliminary ranking result may represent a degree (also referred to as “degree of false examination” ) that the preliminary interference subject is erroneously examined as the preliminary target to be examined.
  • for example, assuming that the preliminary target to be examined is a seat belt and the preliminary ranking result is a tie, a lanyard, and a light shadow arranged in descending order according to the numbers, the tie may have the highest degree of false examination, followed by the lanyard and the light shadow.
  • the processing device 110 may generate a set of the preliminary interference subjects (also referred to as “preliminary interference subject set” ) of the preliminary target to be examined by selecting the preliminary interference subjects partially or in whole according to the preliminary ranking result. For example, the processing device 110 may select the first two preliminary interference subjects in the preliminary ranking result, or select all of the preliminary interference subjects in the preliminary ranking result to generate the preliminary interference subject set. In some embodiments, the number of selected preliminary interference subjects may be a default value, or adjusted according to different preliminary targets to be examined. In some embodiments, the processing device 110 may express the preliminary interference subject set in a similar form of a vector, e.g., [ (seat belt, tie, lanyard) ] , or in the form of a list. The expression of the preliminary interference subject set may be non-limiting in the present disclosure.
  • the at least a portion of training data may include the labelling information of the priori interference subjects, the labelling information of the preliminary interference subject set, and the labelling information of the preliminary targets to be examined.
  • the processing device 110 may train the preliminary target examination model using the training data and obtain a second target examination model.
  • the processing device 110 may set parameters (e.g., a learning rate, a hyper parameter, a weight matrix, a bias vector) of the preliminary target examination model and output training target examination results of the training data. Further, the processing device 110 may determine whether the training target examination results satisfy a predetermined condition. In response to a determination that the training target examination results satisfy a predetermined condition, the processing device 110 may terminate the training and designate the preliminary target examination model as the second target examination model. In response to a determination that the training target examination results do not satisfy the predetermined condition, the processing device 110 may continue to adjust the parameters. Further, the processing device 110 may generate an updated preliminary target examination model and determine updated training target examination results associated with the updated preliminary target examination model.
  • the processing device 110 may terminate the training and designate the updated preliminary target examination model as the second target examination model. In response to a determination that the updated training target examination results do not satisfy the predetermined condition, the processing device 110 may continue to adjust parameters of the updated preliminary target examination model and continue the training until newly determined training target examination results associated with a newly determined updated preliminary target examination model satisfy the predetermined condition.
  • the predetermined condition may include the number of the training samples reaching a predetermined threshold, a precision rate of the (updated) preliminary target examination model greater than a predetermined precision threshold, a value of a loss function of the (updated) preliminary target examination model less than a predetermined value, etc.
  • the precision rate may refer to a ratio of the number of training samples whose training target examination results are correct and include a training target to be examined to the total number of the training samples.
  • the processing device 110 may validate the second target examination model, e.g., a precision rate of the second target examination model.
  • the processing device 110 may determine validating examination results of validating data (also referred to as “validating samples” ) using the second target examination model, and determine the precision rate based on the validating examination results of the validating data.
  • the precision rate may refer to a ratio of the number of validating samples whose validating results are correct and include a validating target to be examined to a total number of the validating samples. For example, assuming that the validating samples include 100 images, a validating target to be examined is seat belt, and the second target examination model correctly examines the seat belt from 97 images, the precision rate may be 97%.
  • the total number of the validating samples may be default settings or adjusted according to practical demands.
  • the validating data may include image data, video data, etc.
  • the validating data may include labelling information of a validating target to be examined and labelling information of a validating interference subject.
  • the validating data and the training data may be different and may not share the same data.
  • the processing device 110 may determine whether the precision rate is greater than or equal to a predetermined threshold. In response to a determination that the precision rate is greater than or equal to the predetermined threshold, the processing device 110 may consider that the precision rate of the second target examination model satisfies a predetermined requirement. The processing device 110 may use the second target examination model to perform target examination (e.g., the target examination described in operation 1320) .
  • in response to a determination that the precision rate is less than the predetermined threshold, the processing device 110 may consider that the precision rate of the second target examination model does not satisfy the predetermined requirement. The processing device 110 may obtain new training data to further train and update the second target examination model until a newly determined precision rate of a newly determined updated second target examination model satisfies the predetermined requirement.
  • the predetermined threshold may be default settings or adjusted according to practical demands.
  • the precision rates for different validating targets to be examined may be the same or different. For example, a precision rate of a validating target that is hard to examine may be smaller than a precision rate of a validating target that is easy to examine. Thus, the predetermined threshold may be adjusted accordingly.
  • the processing device 110 may update the second target examination model at a certain time interval (e.g., per month, per two months) based on newly obtained training data.
  • aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or contexts, including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in an implementation combining software and hardware that may all generally be referred to herein as a "block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.
  • a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including electro-magnetic, optical, or the like, or any suitable combination thereof.
  • a computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that may communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including wireless, wireline, optical fiber cable, RF, or the like, or any suitable combination of the foregoing.
  • Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB.NET, Python or the like, conventional procedural programming languages, such as the "C" programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN) , or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a software as a service (SaaS) .
  • a target detection method comprising:
  • the target detection model is generated based on supervised labelled image data and unsupervised labelled image data of pre-training image data
  • the target detection model is configured to be a detection model for targets, and the supervised labelled image data and the unsupervised labelled image data are used to indicate attribute information of the targets in the pre-training image data;
  • before the target detection model is generated based on supervised labelled image data and unsupervised labelled image data of pre-training image data, performing a filtering process on the unsupervised labelled image data to obtain retained first unsupervised labelled image data, wherein the retained first unsupervised labelled image data satisfy a first preset condition;
  • step A: generating a seed model based on the supervised labelled image data and the first unsupervised labelled image data;
  • step B: preprocessing the pre-training image data based on the seed model to generate updated first unsupervised labelled image data;
  • step C: training the seed model based on the supervised labelled image data and the updated first unsupervised labelled image data to generate an updated seed model;
  • performing step A and step B iteratively until the updated first unsupervised labelled image data satisfy a second preset condition;
  • a target detection apparatus comprising:
  • an acquisition unit configured to obtain a target detection model, wherein the target detection model is generated based on supervised labelled image data and unsupervised labelled image data of pre-training image data, wherein
  • the target detection model indicates a detection model for targets and the supervised labelled image data and the unsupervised labelled image data indicate attribute information of the targets in the pre-training image data;
  • a determination unit configured to determine image data-to-be-detected
  • a detection unit configured to execute the target detection model on the image data-to-be-detected to generate attribute information of the targets in the image data-to-be-detected.
  • a processing unit configured to perform a filtering process on the unsupervised labelled image data to obtain retained first unsupervised labelled image data, wherein the retained first unsupervised labelled image data satisfy a first preset condition, wherein
  • the acquisition unit is further configured to generate the target detection model based on supervised labelled image data and the first unsupervised labelled image data.
  • the first preset condition includes at least one of: an average output score of the initial seed model being greater than a first threshold, a size ratio of boundaries of a detection region satisfying a preset ratio, a color of a detection region satisfying a preset color request, or an angle of a target satisfying a preset angle;
  • step A: generating a seed model based on the supervised labelled image data and the first unsupervised labelled image data;
  • step B: preprocessing the pre-training image data based on the seed model to generate updated first unsupervised labelled image data;
  • step C: training the seed model based on the supervised labelled image data and the updated first unsupervised labelled image data to generate an updated seed model;
  • performing step A and step B iteratively until the updated first unsupervised labelled image data satisfy a second preset condition;
  • a computer device comprising:
  • at least one computer-readable storage medium including a set of instructions; and
  • at least one processor in communication with the at least one computer-readable storage medium, wherein when executing the set of instructions, the at least one processor is directed to perform a target detection method of any one of items 1-6.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Target detection methods and systems are provided. The target detection method may include obtaining a target detection model. The target detection model may be generated based on supervised labelled image data and unsupervised labelled image data of pre-training image data. The target detection model may indicate a detection model for targets. The supervised labelled image data and the unsupervised labelled image data may indicate attribute information of the targets in the pre-training image data. The target detection method may further include determining image data-to-be-detected and executing the target detection model on the image data-to-be-detected to generate attribute information of the targets in the image data-to-be-detected.

Description

TARGET DETECTION METHOD AND SYSTEM
CROSS-REFERENCE TO RELATED APPLICATION
This application claims priority to Chinese Patent Application No. 201810510732.3, filed on May 24, 2018, and Chinese Patent Application No. 201810547022.8, filed on May 31, 2018, the contents of which are hereby incorporated by reference.
TECHNICAL FIELD
The present disclosure generally relates to target detection, and in particular, to systems and methods for training a target detection model and systems and methods for executing the trained target detection model to detect targets in image data.
BACKGROUND
Target detection systems have been widely used in a variety of scenarios that require a detection or recognition of targets or portions thereof. Exemplary scenarios include autonomous driving, video surveillance, face detection, and security checks (e.g., Customs) . Usually, target detection systems execute a target detection model to detect targets in image data (e.g., still images, videos, video frames, software code corresponding to images) , and the target detection model is normally trained by a plurality of labelled sample images. Conventionally, the labelled sample images are generated by an operator. However, as the posture, size, and/or shapes of the targets in the sample images are different, errors may occur during the labelling of the targets. Also, the labelling of the sample images can be very subjective (e.g., highly dependent on the operator’s personal judgments) . In addition, a vast number of labelled sample images are usually required to train a model with low errors, which makes the labelling process and the subsequent training process very time consuming. In order to reduce errors and subjectivity, and to save time, sometimes multiple operators are used to simultaneously label the targets in the sample images. However, different operators have different  criterions of labelling (e.g., whether a very small target should be labelled, whether a twisted target should be labelled) and reduce the consistency of the labelled sample images, dramatically reducing the accuracy and the convergence rate of the target detection model.
Besides, subjects that have the same characteristics as the target (also referred as “interference subject” ) can interfere with the detection or recognition of the targets, and the target detection systems are likely to erroneously detect the interference subjects as the targets. Therefore, in order to improve the accuracy of the detection or recognition of the targets, during the labeling of the sample images, the interference subjects needs to be labelled and distinguished with the targets. Thus, it is desirable to provide systems and methods for training a low-error target detection model in a time-saving and objective way.
SUMMARY
According to an aspect of the present disclosure, a target detection method is provided. The target detection method may include obtaining a target detection model. The target detection model may be generated based on supervised labelled image data and unsupervised labelled image data of pre-training image data. The target detection model may indicate a detection model for targets. The supervised labelled image data and the unsupervised labelled image data may indicate attribute information of the targets in the pre-training image data. The target detection method may further include determining image data-to-be-detected and executing the target detection model on the image data-to-be-detected to generate attribute information of the targets in the image data-to-be-detected.
In some embodiments, the obtaining the target detection model may include generating an initial seed model based on the supervised labelled image data, preprocessing the pre-training image data based on the initial seed model to generate unsupervised labelled image data, and generating the target detection model based on the supervised labelled image data and the unsupervised labelled image data.
In some embodiments, the target detection method may further include, before the target detection model is generated based on supervised labelled image data and unsupervised labelled image data of pre-training image data, performing a filtering process on the unsupervised labelled image data to obtain retained first unsupervised labelled image data. The retained first unsupervised labelled image data may satisfy a first preset condition. The target detection method may further include generating the target detection model based on supervised labelled image data and the first unsupervised labelled image data.
In some embodiments, the first preset condition may include an average output score of the initial seed model being greater than a first threshold, a size ratio of boundaries of a detection region satisfying a preset ratio, a color of a detection region satisfying a preset color request and/or an angle of a target satisfying a preset angle.
In some embodiments, the generating the target detection model based on supervised labelled image data and the first unsupervised labelled image data may include step A: generating a seed model based on the supervised labelled image data and the first unsupervised labelled image data, step B: preprocessing the pre-training image data based on the seed model to generate updated first unsupervised labelled image data and step C: training the seed model based on the supervised labelled image data and the updated first unsupervised labelled image data to generate an updated seed model. If the updated first unsupervised labelled image data does not satisfy a second preset condition, the generating the target detection model may include designating the updated first unsupervised labelled image data as the first unsupervised labelled image data, designating the updated seed model as the seed model, performing step A and step B iteratively until the updated first unsupervised labelled image data satisfy a second preset condition, and designating the updated seed model as the target detection model.
In some embodiments, the second preset condition may include a count of the iterations being greater than a second threshold and/or an average score corresponding to the updated first unsupervised labelled image data being greater than a third threshold.
In some embodiments, the target detection method may also include: processing the set of training image data based on a second target examination model, wherein the second target examination model provides labelling information of the target and labelling information of an interference subject associated with the target; and generating the target detection model based on the set of processed training image data.
In some embodiments, the target detection method may also include: processing the set of supervised labelled image data based on a second target examination model, wherein the second target examination model provides labelling information of the target and labelling information of an interference subject associated with the target; obtaining a set of processed training image data based on the set of processed supervised labelled image data; and generating the target detection model based on the set of processed training image data.
In some embodiments, the labelling information of the target may include a category of the target and a score of the target. The labelling information of the interference subject may include a category of the interference subject and a score of the interference subject.
According to another aspect of the present disclosure, a target detection apparatus is provided. The target detection apparatus may include an acquisition unit, a determination unit, and a detection unit. The acquisition unit may be configured to obtain a target detection model. The target detection model may be generated based on supervised labelled image data and unsupervised labelled image data of pre-training image data. The target detection model may indicate a detection model for targets. The supervised labelled image data and the unsupervised labelled image data may indicate attribute information of the targets in the pre-training image data. The determination unit  may be configured to determine image data-to-be-detected. The detection unit may be configured to execute the target detection model on the image data-to-be-detected to generate attribute information of the targets in the image data-to-be-detected.
In some embodiments, the acquisition unit may be further configured to generate an initial seed model based on the supervised labelled image data, preprocess the pre-training image data based on the initial seed model to generate unsupervised labelled image data, and generate the target detection model based on the supervised labelled image data and the unsupervised labelled image data.
In some embodiments, the apparatus may further include a processing unit, configured to perform a filtering process on the unsupervised labelled image data to obtain retained first unsupervised labelled image data. The retained first unsupervised labelled image data may satisfy a first preset condition. The acquisition unit may be further configured to generate the target detection model based on supervised labelled image data and the first unsupervised labelled image data.
In some embodiments, the first preset condition may include an average output score of the initial seed model being greater than a first threshold, a size ratio of boundaries of a detection region satisfying a preset ratio, a color of a detection region satisfying a preset color request and/or an angle of a target satisfying a preset angle.
In some embodiments, the acquisition unit may be further configured to perform a step A: generating a seed model based on the supervised labelled image data and the first unsupervised labelled image data, perform a step B: preprocessing the pre-training image data based on the seed model to generate updated first unsupervised labelled image data, and perform a step C: training the seed model based on the supervised labelled image data and the updated first unsupervised labelled image data to generate an updated seed model. If the updated first unsupervised labelled image data does not satisfy a second preset condition, the acquisition unit may be further configured to designate the updated first unsupervised labelled image data as the first unsupervised labelled image data,  designate the updated seed model as the seed model, perform step A and step B iteratively until the updated first unsupervised labelled image data satisfy a second preset condition, and designate the updated seed model as the target detection model.
In some embodiments, the second preset condition may include a count of the iterations being greater than a second threshold and/or an average score corresponding to the updated first unsupervised labelled image data being greater than a third threshold.
According to another aspect of the present disclosure, a computer device is provided. The computer device may include at least one computer-readable storage medium including a set of instructions and at least one processor in communication with the at least one computer-readable storage medium. When executing the set of instructions, the at least one processor may be directed to perform a target detection method.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium embodying a computer program product is provided. The computer program product may include instructions and be configured to cause a computing device to perform a target detection method.
According to another aspect of the present disclosure, a target examination method is provided. The target examination method may include: obtaining data to be examined; and examining the data to be examined, outputting an examination result including a category identifier of a target to be examined and a category identifier of an interference subject, and obtaining the target to be examined. The interference subject may be erroneously identified as the target to be examined for at least once in a target examination process.
In some embodiments, the data to be examined may include image data, and the examination result may be image data that uses different identification modes to identify the target to be examined and the interference subject.
In some embodiments, the examining the data to be examined may include: using a second target examination model to examine the data to be examined. The second target examination model may be obtained based on a training process below: training a preliminary model to generate the second target examination model using training data including labelling information of the target to be examined and labelling information of the interference subject. The interference subject may be erroneously identified as the target to be examined for at least once in the target examination process.
In some embodiments, the training process may further include: obtaining preliminary training data and training the preliminary model to obtain a preliminary target examination model using the preliminary training data; using the preliminary target examination model to perform target examination and outputting examination results; obtaining training data labelled with the interference subject based on the examination results; and training the preliminary target examination model using the training data labelled with the interference subject to obtain the second target examination model. The preliminary training data may at least include labelling information of the target to be examined.
In some embodiments, the preliminary training data may also include labelling information of a priori interference subject. The priori interference subject may be erroneously identified as the target to be examined for at least once in a target examination process other than the training process of the second target examination model.
According to another aspect of the present disclosure, a target examination system is provided. The target examination system may include an obtaining module and an examination module. The obtaining module may be configured to obtain data to be examined. The examination module may be configured to examine the data to be examined, output an examination result including a category identifier of a target to be examined and a category identifier of an interference subject, and obtain the  target to be examined. The interference subject may be erroneously identified as the target to be examined for at least once in a target examination process.
In some embodiments, the data to be examined may include image data, and the examination result may be image data that uses different identification modes to identify the target to be examined and the interference subject.
In some embodiments, the examination module may be further configured to use a second target examination model to examine the data to be examined, and the system may further include a training module. The training module may be configured to train a preliminary model to generate the second target examination model using training data including labelling information of the target to be examined and labelling information of the interference subject. The interference subject may be erroneously identified as the target to be examined for at least once in the target examination process.
In some embodiments, the training module may further include: a preliminary training unit, a model examination unit, and a second training unit. The preliminary training unit may be configured to train the preliminary model to obtain a preliminary target examination model using preliminary training data. The model examination unit may be configured to use the preliminary target examination model to perform target examination and output examination results. The second training unit may be configured to obtain training data labelled with the interference subject based on the examination results and train the preliminary target examination model using the training data labelled with the interference subject to obtain the second target examination model. The preliminary training data may at least include labelling information of the target to be examined.
In some embodiments, the preliminary training data may further include labelling information of a priori interference subject. The priori interference subject may be erroneously identified as the target to be examined for at least once in a  target examination process other than the training process of the second target examination model.
According to another aspect of the present disclosure, a target examination device is provided. The target examination device may include at least one processor and at least one storage device. The at least one storage device may be configured to store a set of instructions. The at least one processor may be configured to execute at least part of the set of instructions to perform a method. The method may include: obtaining data to be examined; and examining the data to be examined, outputting an examination result including a category identifier of a target to be examined and a category identifier of an interference subject, and obtaining the target to be examined. The interference subject may be erroneously identified as the target to be examined for at least once in a target examination process.
According to another aspect of the present disclosure, a non-transitory computer readable medium is provided. The non-transitory computer readable medium may include executable instructions that, when executed by at least one processor, cause a method to be performed. The method may include: obtaining data to be examined; and examining the data to be examined, outputting an examination result including a category identifier of a target to be examined and a category identifier of an interference subject, and obtaining the target to be examined. The interference subject may be erroneously identified as the target to be examined for at least once in a target examination process.
Additional features will be set forth in part in the description which follows, and in part will become apparent to those skilled in the art upon examination of the following and the accompanying drawings or may be learned by production or operation of the examples. The features of the present disclosure may be realized and attained by practice or use of various aspects of the methodologies, instrumentalities and combinations set forth in the detailed examples discussed below.
BRIEF DESCRIPTION OF THE DRAWINGS
The present disclosure is further described in terms of exemplary embodiments. These exemplary embodiments are described in detail with reference to the drawings. These embodiments are non-limiting exemplary embodiments, in which like reference numerals represent similar structures throughout the several views of the drawings, and wherein:
FIG. 1 is a schematic diagram illustrating an exemplary target detection system according to some embodiments of the present disclosure;
FIG. 2 is a schematic diagram illustrating an exemplary scenario for an autonomous vehicle according to some embodiments of the present disclosure;
FIG. 3 is a schematic diagram illustrating exemplary hardware components of a computing device according to some embodiments of the present disclosure;
FIG. 4 is a block diagram illustrating an exemplary processing device according to some embodiments of the present disclosure;
FIG. 5 is a flowchart illustrating an exemplary process for generating attribute information of a target according to some embodiments of the present disclosure;
FIG. 6 is a flowchart illustrating exemplary processes for generating a target detection model according to some embodiments of the present disclosure;
FIG. 7 is a schematic diagram illustrating an exemplary process for generating unsupervised labelled image data according to some embodiments of the present disclosure;
FIG. 8 is a flowchart illustrating exemplary processes for generating a target detection model according to some embodiments of the present disclosure;
FIG. 9 is a schematic diagram illustrating an exemplary process for generating an updated seed model according to some embodiments of the present disclosure;
FIG. 10 is a schematic diagram illustrating an exemplary detection result according to some embodiments of the present disclosure;
FIG. 11 is a schematic diagram illustrating an exemplary structure of a model according to some embodiments of the present disclosure;
FIG. 12 is a block diagram illustrating an exemplary processing device according to some embodiments of the present disclosure;
FIG. 13 is a flowchart illustrating a process for determining an examination result of the data to be examined according to some embodiments of the present disclosure; and
FIG. 14 is a flowchart illustrating a process for generating a second target examination model according to some embodiments of the present disclosure.
DETAILED DESCRIPTION
The following description is presented to enable any person skilled in the art to make and use the present disclosure, and is provided in the context of a particular application and its requirements. Various modifications to the disclosed embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other embodiments and applications without departing from the spirit and scope of the present disclosure. Thus, the present disclosure is not limited to the embodiments shown, but is to be accorded the widest scope consistent with the claims.
The terminology used herein is to describe particular example embodiments only and is not intended to be limiting. As used herein, the singular forms "a," "an," and "the" may be intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprise," "comprises," and/or "comprising," "include," "includes," and/or "including," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
These and other features, and characteristics of the present disclosure, as well as the methods of operation and functions of the related elements of structure and the combination of parts and economies of manufacture, may become more apparent upon  consideration of the following description with reference to the accompanying drawings, all of which form a part of this disclosure. It is to be expressly understood, however, that the drawings are for the purpose of illustration and description only and are not intended to limit the scope of the present disclosure. It is understood that the drawings are not to scale.
The flowcharts used in the present disclosure illustrate operations that systems implement according to some embodiments in the present disclosure. It is to be expressly understood, the operations of the flowchart may be implemented not in order. Conversely, the operations may be implemented in inverted order, or simultaneously. Moreover, one or more other operations may be added to the flowcharts. One or more operations may be removed from the flowcharts.
The terms “filtering” and “selecting” in the present disclosure are used interchangeably to refer to a process of deleting some or all elements from a set of elements based on a selection condition (also referred to as a filtering condition). The elements that do not satisfy the selection condition are “deleted”, “disposed”, or “filtered”. The elements that satisfy the selection condition are “selected” or “retained”.
The terms “detection” , “identification” , and “recognition” in the present disclosure are used interchangeably to each refer to a complete process of detecting an existence of a target in image data, identifying a size, a relative position and a shape of the target, and recognizing a type and/or identity of the target.
In some embodiments, the present method may be related to the training of a model. It should be understood the “training” of a model may refer to a process of updating a previously trained model, or a process of generating a new model, or both of them.
In some embodiments, the present method may be related to a preprocessing process of a model. The preprocessing process in the present disclosure may include using a model to select one or more images that include a target and labelling the target in  the selected images. Such preprocessing process is also known as a process of executing the model.
In some embodiments, the “detection” of a target in image data may refer to a process of generating attribute information of the target. The attribute information may include regional information (the position and/or shape) , category information, content information, or the like, of the target in the image data.
An aspect of the present disclosure relates to systems and methods for training a target detection model. For example, a plurality of training images (e.g., 400 training images) may be obtained. The plurality of training images may correspond to a similar environment or (at least a majority of them) include a similar target. Some of the plurality of training images (e.g., 150 training images) may be selected and labelled manually. In other words, operators may provide attribute information of the target in each of the 150 training images. The attribute information may include regional information (the position and/or shape), category information, content information, or the like, of the target in each of the 150 training images. More specifically, a user may describe a boundary around the target in the training images and provide a category (e.g., human, animal, tree, building) and content information of the target. The 150 manually labelled images are also called supervised labelled images (or supervised labelled image data). The supervised labelled images may be used to train an initial seed model. The initial seed model may be used to preprocess (as mentioned above, a preprocessing process may refer to a filtering process and a labelling process) the plurality of training images (e.g., 400 training images) or the remaining training images (e.g., 250 training images). For example, the initial seed model may select 120 training images from the remaining 250 training images and label them as unsupervised labelled images (or unsupervised labelled image data). The 150 supervised labelled images and the 120 unsupervised labelled images may be collectively used to update the initial seed model or train a new model to generate a target detection model. In some embodiments, the unsupervised labelled images may be further updated by the target detection model and the further updated unsupervised labelled images may be used to further update the target detection model. In some embodiments, the unsupervised labelled images may be manually corrected before being used to train the initial seed model.
In some cases, before obtaining the training image data, the supervised labelled image data may be processed based on a second target examination model. The second target examination model may be configured to provide labelling information of the target and labelling information of an interference subject associated with the target.
In some cases, before generating the target detection model, the training image data may be processed based on the second target examination model.
Another aspect of the present disclosure relates to systems and methods for determining an examination result of data to be examined using a second target examination model. The examination result may include labelling information of a target to be examined and labelling information of an interference subject associated with the target to be examined. As used herein, the second target examination model may be generated based on training data including labelling information of a training target to be examined and labelling information of a training interference subject associated with the training target to be examined.
FIG. 1 is a schematic diagram illustrating an exemplary target detection system 100 according to some embodiments of the present disclosure. The target detection system 100 may be used to detect targets in image data (e.g., still images, videos, video frames, software code corresponding to images) . In some embodiments, the target detection system 100 may be used in various application scenarios, e.g., face recognition, video surveillance and analysis, recognition and analysis, smart driving, 3D image vision, industrial visual examination, medical imaging diagnostics, text recognition, image and video editing, etc. The face recognition may include but is not limited to, attendance, access control, identity authentication, face attribute recognition, face examination and  tracking, reality examining, face contrasting, face searching, face key point positioning, or the like, or any combination thereof. Video surveillance and analysis may include, but is not limited to, intelligent identification, analysis and positioning for objects, commodity, pedestrian attributes, pedestrian analysis and tracking, crowd density and passenger flow analysis, vehicle behavior analysis, or the like, or any combination thereof. Image recognition and analysis may include, but is not limited to, image searching, object/scene recognition, vehicle type recognition, character attribute analysis, character clothing analysis, item identification, pornography/violence identification, or the like, or any combination thereof. Smart driving may include, but is not limited to, vehicle and object examination and collision warning, lane examination and offset warning, traffic sign identification, pedestrian examination, vehicle distance examination, or the like, or any combination thereof. 3D image vision may include, but is not limited to, 3D machine vision, binocular stereo vision, 3D reconstruction, 3D scanning, mapping, map measurement, industrial simulation, or the like, or any combination thereof. Industrial vision examination may include, but is not limited to, industrial cameras, industrial vision monitoring, industrial vision measurement, industrial control, or the like, or any combination thereof. Medical imaging diagnostics may include, but is not limited to, tissue examination and identification, tissue localization, lesion examination and recognition, lesion localization, or the like, or any combination thereof. Text recognition may include, but is not limited to, text examination, text extraction, text recognition, or the like, or any combination thereof. Image and video editing may include, but is not limited to, image/video authoring, image/video repairing, image/video beautification, image/video effect transformation, or the like, or any combination thereof. Different embodiments of the present disclosure may be applied in different industries, e.g., Internet, financial industry, smart home, e-commerce shopping, security, transportation, justice, military, public security, frontier inspection, government, aerospace, electricity, factory, agriculture, forestry, education, entertainment, medical, or the like, or any combination thereof. It  should be understood that application scenarios of the system and method disclosed herein are only some examples or embodiments, and those having ordinary skills in the art, without further creative efforts, may apply these drawings to other application scenarios.
For example, the target detection system 100 may be used in an autonomous driving system, or a part thereof (e.g., as a driving aid). More particularly, the target detection system 100 may capture real-time image data around a vehicle and detect targets in the captured image data. The target detection system 100 may send an instruction to change the speed of the vehicle or make a turn according to the detected targets. As another example, the target detection system 100 may be used in a video surveillance system, or a part thereof. The target detection system 100 may continuously monitor an environment (e.g., buildings, parking lots, traffic lights, city streets, vehicles, etc.) and detect targets (e.g., humans, vehicles, animals, etc.) in it. As a further example, the target detection system 100 may be used in a security system (e.g., Customs). Merely by way of example, the target detection system 100 may capture X-ray images of a person's luggage. The target detection system 100 may detect and recognize items in the luggage based on the X-ray images.
In some embodiments, the target detection system 100 or a part thereof may be mounted on a vehicle or a component thereof. The target detection system 100 may include a processing device 110, a terminal 120, an image capturing device 130, a network 140, and a storage device 150.
In some embodiments, the processing device 110 may be a single processing device or a processing device group. The processing device group may be centralized, or distributed (e.g., the processing device 110 may be a distributed system) . In some embodiments, the processing device 110 may be local or remote. For example, the processing device 110 may access information and/or data stored in the terminal 120, the image capturing device 130, and/or the storage device 150 via the network 140. As  another example, the processing device 110 may be directly connected to the terminal 120, the image capturing device 130, and/or the storage device 150 to access stored information and/or data. In some embodiments, the processing device 110 may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or a combination thereof. In some embodiments, the processing device 110 may be implemented on a computing device 200 having one or more components illustrated in FIG. 2 in the present disclosure.
In some embodiments, the processing device 110 may include a processing engine 112. The processing engine 112 may process information and/or data related to target detection to perform one or more functions described in the present disclosure. For example, the processing engine 112 may train a target detection model based on labelled image data. The labelled image data may include supervised image data that is labelled by an operator and unsupervised image data that is labelled by a machine (e.g., a model stored in the storage device 150 and/or the processing device 110). The processing engine 112 may detect targets in real-time image data using the trained target detection model. In some embodiments, the processing engine 112 may include one or more processing engines (e.g., single-core processing engine(s) or multi-core processor(s)). Merely by way of example, the processing engine 112 may include a central processing unit (CPU), an application-specific integrated circuit (ASIC), an application-specific instruction-set processor (ASIP), a graphics processing unit (GPU), a physics processing unit (PPU), a digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a controller, a microcontroller unit, a reduced instruction-set computer (RISC), a microprocessor, or the like, or a combination thereof.
In some embodiments, the terminal 120 may include a tablet computer 120-1, a laptop computer 120-2, a built-in device in a vehicle 120-3, a mobile device 120-4, or the  like, or a combination thereof. In some embodiments, the mobile device 120-4 may include a smart home device, a wearable device, a smart mobile device, an augmented reality device, or the like, or a combination thereof. In some embodiments, the wearable device may include a smart bracelet, a smart footgear, smart glasses, a smart helmet, a smart watch, smart clothing, a smart backpack, a smart accessory, or the like, or a combination thereof. In some embodiments, the smart mobile device may include a smartphone, a personal digital assistant (PDA) , a gaming device, a navigation device, a point of sale (POS) device, or the like, or a combination thereof. In some embodiments, the built-in device in the vehicle 120-3 may include an onboard computer, an automobile data recorder, an auto-piloting system, an onboard human-computer interaction (HCI) system, an onboard television, etc. For example, the processing device 110 may send an instruction or control command to the auto-piloting system of the vehicle 120-3 to control the movement of the vehicle 120-3 based on a target detection result.
The image capturing device 130 may be configured to capture an image of one or more objects. The image may include a still image, a video (offline or live streaming), a frame of a video (or referred to as a video frame), or a combination thereof. The one or more objects may be static or moving. The one or more objects may be an animal, a human being (e.g., a driver, an operator, a student, a worker) or a portion thereof (e.g., a face), a building, a vehicle, goods, or the like, or a combination thereof.
As shown in FIG. 1, the image capturing device 130 may include an automobile data recorder 130-1, a dome camera 130-2, a fixed camera 130-3, or the like, or a combination thereof. The image capturing device 130 may be combined with the terminal 120 (e.g., the mobile device 120-4). In some embodiments, the automobile data recorder 130-1 may be mounted on a vehicle and configured to record road conditions around the vehicle when the driver is driving. In some embodiments, the dome camera 130-2 may be mounted on a surface (e.g., a roof, a ceiling, a wall) of a building to monitor the environment around the building.
The network 140 may facilitate the exchange of information and/or data. In some embodiments, one or more components of the target detection system 100 (e.g., the processing device 110, the terminal 120, the image capturing device 130, or the storage device 150) may transmit information and/or data to another component(s) of the target detection system 100 via the network 140. For example, the processing device 110 (or the processing engine 112) may receive a plurality of images or video frames from the image capturing device 130 via the network 140. As another example, the processing device 110 (or the processing engine 112) may send a notification or instruction to the terminal 120 via the network 140. In some embodiments, the network 140 may be any type of wired or wireless network, or a combination thereof. Merely by way of example, the network 140 may include a cable network, a wireline network, an optical fiber network, a telecommunications network, an intranet, the Internet, a local area network (LAN), a wide area network (WAN), a wireless local area network (WLAN), a metropolitan area network (MAN), a public switched telephone network (PSTN), a Bluetooth network, a ZigBee network, a near field communication (NFC) network, or the like, or a combination thereof. In some embodiments, the network 140 may include one or more network access points. For example, the network 140 may include wired or wireless network access points such as base stations and/or internet exchange points 140-1, 140-2, through which one or more components of the target detection system 100 may be connected to the network 140 to exchange data and/or information.
The storage device 150 may store data and/or instructions. In some embodiments, the storage device 150 may store data obtained from the terminal 120 and/or the image capturing device 130. For example, the storage device 150 may store a plurality of images captured by the image capturing device 130. In some embodiments, the storage device 150 may store data and/or instructions that the processing device 110 may execute or use to perform exemplary methods described in the present disclosure. In some embodiments, the storage device 150 may include a mass storage, a removable storage, a volatile read-and-write memory, a read-only memory (ROM), or the like, or a combination thereof. Exemplary mass storage may include a magnetic disk, an optical disk, solid-state drives, etc. Exemplary removable storage may include a flash drive, a floppy disk, an optical disk, a memory card, a zip disk, a magnetic tape, etc. Exemplary volatile read-and-write memory may include random-access memory (RAM). Exemplary RAM may include a dynamic RAM (DRAM), a double data rate synchronous dynamic RAM (DDR SDRAM), a static RAM (SRAM), a thyristor RAM (T-RAM), a zero-capacitor RAM (Z-RAM), etc. Exemplary ROM may include a mask ROM (MROM), a programmable ROM (PROM), an erasable programmable ROM (EPROM), an electrically erasable programmable ROM (EEPROM), a compact disk ROM (CD-ROM), a digital versatile disk ROM, etc. In some embodiments, the storage device 150 may be implemented on a cloud platform. Merely by way of example, the cloud platform may include a private cloud, a public cloud, a hybrid cloud, a community cloud, a distributed cloud, an inter-cloud, a multi-cloud, or the like, or a combination thereof.
In some embodiments, the storage device 150 may be connected to the network 140 to communicate with one or more components of the target detection system 100 (e.g., the processing device 110, the terminal 120, or the image capturing device 130) . One or more components of the target detection system 100 may access the data or instructions stored in the storage device 150 via the network 140. In some embodiments, the storage device 150 may be directly connected to or communicate with one or more components of the target detection system 100 (e.g., the processing device 110, the terminal 120, the image capturing device 130) . In some embodiments, the storage device 150 may be part of the processing device 110.
In some embodiments, one or more components of the target detection system 100 (e.g., the processing device 110, the terminal 120, the image capturing device 130) may have permissions to access the storage device 150. In some embodiments, one or more components of the target detection system 100 may read and/or modify information  when one or more conditions are met. In some embodiments, the storage device 150 may store one or more models. The models may include an untrained preliminary model, an initial seed model, an updated seed model and/or a trained target detection model. For example, the processing device 110 may obtain an untrained preliminary model from the storage device 150 via the network 140. The processing device 110 may train the preliminary model based on labelled image data to generate a trained target detection model. The trained target detection model may then be transmitted to the storage device 150 and stored. As another example, the processing device 110 may obtain a trained target detection model from the storage device 150. The processing device 110 may execute the target detection model to detect targets in a real-time video frame.
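Merely by way of example, storing a trained model in a storage device such as the storage device 150 and retrieving it for later execution might be sketched as follows. The sketch assumes PyTorch-style serialization, and the file path is a hypothetical placeholder rather than a location defined in the present disclosure.

```python
import torch

MODEL_PATH = "/storage_device_150/target_detection_model.pt"  # hypothetical location

def store_trained_model(model, path=MODEL_PATH):
    # Persist the trained target detection model so that other components
    # (e.g., the processing device 110) can obtain it later via the network 140.
    torch.save(model.state_dict(), path)

def obtain_trained_model(model, path=MODEL_PATH):
    # Restore the previously trained weights into a model instance and switch
    # it to inference mode before executing it on real-time video frames.
    model.load_state_dict(torch.load(path))
    model.eval()
    return model
```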
FIG. 2 is a schematic diagram illustrating an exemplary scenario for an autonomous vehicle according to some embodiments of the present disclosure. As shown in FIG. 2, an autonomous vehicle 230 may travel along a road 221 without human input, along a path autonomously determined by the autonomous vehicle 230. The road 221 may be a space prepared for a vehicle to travel along. For example, the road 221 may be a road for vehicles with wheels (e.g., a car, a train, a bicycle, a tricycle) or without wheels (e.g., a hovercraft), an air lane for an airplane or other aircraft, a water lane for a ship or submarine, or an orbit for a satellite. Travel of the autonomous vehicle 230 may not break the traffic laws of the road 221 regulated by law or regulation. For example, the speed of the autonomous vehicle 230 may not exceed the speed limit of the road 221.
The autonomous vehicle 230 may avoid colliding with an obstacle 210 by travelling along a path 220 determined by the autonomous vehicle 230. The obstacle 210 may be a static obstacle or a dynamic obstacle. The static obstacle may include a building, a tree, a roadblock, or the like, or any combination thereof. The dynamic obstacle may include moving vehicles, pedestrians, and/or animals, or the like, or any combination thereof.
The autonomous vehicle 230 may include conventional structures of a non-autonomous vehicle, such as an engine, four wheels, a steering wheel, etc. The autonomous vehicle 230 may further include a sensing system 240 including a plurality of sensors (e.g., a sensor 242, a sensor 244, a sensor 246), and a control unit 250. The plurality of sensors may be configured to provide information that is used to control the vehicle. In some embodiments, the sensors may sense the status of the vehicle. The status of the vehicle may include a dynamic situation of the vehicle, environmental information around the vehicle, or the like, or any combination thereof.
In some embodiments, the plurality of sensors may be configured to sense the dynamic situation of the autonomous vehicle 230. The plurality of sensors may include a camera (or an image capturing device), a video sensor, a distance sensor, a velocity sensor, an acceleration sensor, a steering angle sensor, a traction-related sensor, and/or any other sensor.
For example, the camera may capture one or more images around (e.g., in front of) the vehicle. A control unit 250 in the vehicle may detect one or more targets (e.g., the obstacle 210) in the captured images and generate an instruction or control command to other components (e.g., a throttle, a steering wheel) of the autonomous vehicle 230. The distance sensor (e.g., a radar, a LiDAR, an infrared sensor) may determine a distance between a vehicle (e.g., the autonomous vehicle 230) and other objects (e.g., the obstacle 210) . The velocity sensor (e.g., a Hall sensor) may determine a velocity (e.g., an instantaneous velocity, an average velocity) of a vehicle (e.g., the autonomous vehicle 230) . The acceleration sensor (e.g., an accelerometer) may determine an acceleration (e.g., an instantaneous acceleration, an average acceleration) of a vehicle (e.g., the autonomous vehicle 230) . The steering angle sensor (e.g., a tilt sensor) may determine a steering angle of a vehicle (e.g., the autonomous vehicle 230) . The traction-related sensor (e.g., a force sensor) may determine a traction of a vehicle (e.g., the autonomous vehicle 230) .
In some embodiments, the plurality of sensors may sense environment around the autonomous vehicle 230. For example, one or more sensors may detect a road geometry  and obstacles (e.g., static obstacles, dynamic obstacles) . The road geometry may include a road width, road length, road type (e.g., ring road, straight road, one-way road, two-way road) .
The control unit 250 may be configured to control the autonomous vehicle 230. The control unit 250 may control the autonomous vehicle 230 to drive along a path 220. The control unit 250 may calculate the path 220 based on the status information from the plurality of sensors. In some embodiments, the path 220 may be configured to avoid collisions between the vehicle and one or more obstacles (e.g., the obstacle 210) . The obstacle 210 may be detected by a target detection method described elsewhere in the present disclosure.
In some embodiments, the path 220 may include one or more path samples. Each of the one or more path samples may include a plurality of path sample features. The plurality of path sample features may include a path velocity, a path acceleration, a path location, or the like, or a combination thereof.
The autonomous vehicle 230 may drive along the path 220 to avoid a collision with an obstacle. In some embodiments, the autonomous vehicle 230 may pass each path location at the corresponding path velocity and the corresponding path acceleration for that path location.
In some embodiments, the autonomous vehicle 230 may also include a positioning system to obtain and/or determine the position of the autonomous vehicle 230. In some embodiments, the positioning system may also be connected to another party, such as a base station, another vehicle, or another person, to obtain the position of the party. For example, the positioning system may be able to establish a communication with a positioning system of another vehicle, and may receive the position of the other vehicle and determine the relative positions between the two vehicles.
FIG. 3 is a schematic diagram illustrating exemplary hardware components of a computing device 300 according to some embodiments of the present disclosure. The computing device 300 may be a special purpose computing device for target detection, such as a single-board computing device including one or more microchips. Further, the control unit 250 may include one or more computing devices 300. The computing device 300 may be used to implement the method and/or system described in the present disclosure via its hardware, software program, firmware, or a combination thereof.
The computing device 300, for example, may include COM ports 350 connected to and from a network connected thereto to facilitate data communications. The computing device 300 may also include a processor 320, in the form of one or more processors, for executing computer instructions. The computer instructions may include, for example, routines, programs, objects, components, data structures, procedures, modules, and functions, which perform particular functions described herein.
In some embodiments, the processor 320 may include one or more hardware processors built in one or more microchips, such as a microcontroller, a microprocessor, a reduced instruction set computer (RISC) , an application specific integrated circuits (ASICs) , an application-specific instruction-set processor (ASIP) , a central processing unit (CPU) , a graphics processing unit (GPU) , a physics processing unit (PPU) , a microcontroller unit, a digital signal processor (DSP) , a field programmable gate array (FPGA) , an advanced RISC machine (ARM) , a programmable logic device (PLD) , any circuit or processor capable of executing one or more functions, or the like, or any combinations thereof.
The exemplary computing device 300 may include an internal communication bus 310, and program storage and data storage of different forms, for example, a disk 370, a read-only memory (ROM) 330, or a random access memory (RAM) 340, for various data files to be processed and/or transmitted by the computer. The exemplary computing device 300 may also include program instructions stored in the ROM 330, the RAM 340, and/or another type of non-transitory storage medium to be executed by the processor 320. The methods and/or processes of the present disclosure may be implemented as the program instructions. The computing device 300 may also include an I/O component 360, supporting input/output between the computer and other components (e.g., user interface elements). The computing device 300 may also receive programming and data via network communications.
Merely for illustration, only one processor is described in the computing device 300. However, it should be noted that the computing device 300 in the present disclosure may also include multiple processors, thus operations and/or method steps that are performed by one processor as described in the present disclosure may also be jointly or separately performed by the multiple processors. For example, if in the present disclosure the processor 320 of the computing device 300 executes both step A and step B, it should be understood that step A and step B may also be performed by two different processors jointly or separately in the computing device 300 (e.g., the first processor executes step A and the second processor executes step B, or the first and second processors jointly execute steps A and B) .
FIG. 4 is a block diagram illustrating an exemplary processing device according to some embodiments of the present disclosure. As shown in FIG. 4, the processing device 110 may include an acquisition module 410, a detection module 420, a determination module 430, and a processing module 440.
The acquisition module 410 (also referred to as an acquisition unit) may be configured to obtain a trained target detection model. In some embodiments, the target detection model may be trained by supervised labelled image data and/or unsupervised labelled image data. For example, the target detection model may be a general model that may be used to detect multiple targets in image data. As another example, the target detection model may be a specialized model that may be used to detect a certain target in image data. In some embodiments, the targets may be preset by the processing device 110 or an operator. The targets may be a single object or multiple objects. The targets may be a rigid object or a flexible object. As used herein, a rigid object may refer to an object whose shape does not change. A flexible object may refer to an object whose shape may change at different moments or capturing angles. The objects may include, but are not limited to, buildings, trees, roadblocks, static vehicles, moving vehicles, pedestrians, animals, or the like, or any combination thereof. The supervised labelled image data and the unsupervised labelled image data may indicate attribute information of targets in the corresponding image data. The attribute information may include regional information, category information, and/or content information.
The detection module 420 (also referred to as a detection unit) may be configured to execute the trained target detection model to generate attribute information of targets in the image data-to-be-detected. The attribute information may include regional information, category information, and/or content information. The regional information may correspond to the position and/or shape of the target in the image data. For example, the regional information may include a central point of the target, a close fit boundary of the target (which may be an irregular shape), a loose fit boundary of the target (which may be a regular shape, such as a rectangle, circle, oval, or square, that is slightly larger than the actual shape of the target), a vertical line along the height direction of the target, a horizontal line along the width direction of the target, a diagonal line, etc. For example, the detection module 420 may determine the regional information of the target as a rectangle of 30*40 pixels. The detection module 420 may mark the boundary of the rectangle or show the rectangle in colors different from the background. The category information may include a category of the target in the image data. For example, the category information may include static objects and dynamic objects. As another example, the category information may include humans, animals, plants, vehicles, buildings, etc. The content information may include a content or identity of the target in the image data. For example, the content information may include teachers, students, pedestrians, motorbikes, bicycles, etc. As another example, the content information may include facial expressions such as smiling, sad, angry, or worried. The detection module 420 may mark the boundaries of the targets in the image data in different colors according to their category information and content information. For example, a red rectangle may be marked around a vehicle and a yellow circle may be marked around a rabbit.
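Merely by way of illustration, the attribute information of one detected target could be represented as in the following Python sketch; the container and its field names are assumptions for illustration, not a data structure defined in the present disclosure.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class DetectedTarget:
    """Illustrative container for the attribute information of one detected target."""
    box: Tuple[int, int, int, int]  # regional information: loose-fit rectangle (x, y, width, height) in pixels
    category: str                   # category information, e.g., "vehicle", "facility", "human"
    content: str                    # content information, e.g., "car", "road lamp", "pedestrian"
    score: float                    # output score of the model for this detection

# Example: a 40*30-pixel rectangle labelled with category "vehicle" and content "car".
example_target = DetectedTarget(box=(120, 80, 40, 30), category="vehicle", content="car", score=0.97)
```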
The determination module 430 (also referred to as a determination unit) may be configured to determine image data-to-be-detected. The image data-to-be-detected may be real-time image data captured by a capturing device (e.g., the image capturing device 130) or non-real-time image data stored in the storage device 150. The image data-to-be-detected may include a target that corresponds to the target detection model. For example, the target detection model may be a model used to detect objects around a vehicle in an autonomous driving system, and the image data-to-be-detected may be real-time image data of the environment around a vehicle. As another example, the target detection model may be a face detection model used to detect criminals at Customs, and the image data-to-be-detected may include photos of people captured at Customs. The determination module 430 may determine whether a preset condition is satisfied. In some embodiments, the preset condition may include the number of iterations being greater than a second threshold and/or an average output score of the updated first unsupervised labelled image data being greater than a third threshold. The second threshold may be 1, 2, 5, 10, 15, 20, etc. The third threshold may be 80%, 90%, 95%, 99%, etc. In some embodiments, the determination module 430 may obtain a set of testing image data and a reference detection result corresponding to the set of testing image data. The determination module 430 may generate a detection result corresponding to the set of testing image data using the seed model generated in the current iteration. The preset condition may include the difference between the detection result generated by the seed model in the current iteration and the reference detection result being less than a fourth threshold.
The processing module 440 (also referred to as a processing unit) may be configured to generate an initial seed model based on the supervised labelled image data. For example, the initial seed model may include an input, an output, and multiple classifiers or layers in between. When an input A is provided, the initial seed model may output a B according to its structure and the weights of the classifiers. The classifiers may correspond to different perspectives regarding the image; for example, one classifier may focus on the size of objects, one classifier may focus on details, one classifier may focus on color, etc. When the supervised labelled image data is inputted to the initial seed model (or an untrained preliminary model), the input may be the unlabeled image data and the output may be the manually labelled image data, i.e., the supervised labelled image data. The model may "learn" why the images are labelled as such by changing its internal structure, factors, layers, or the weights of the classifiers inside the model to achieve a maximum score or a lowest loss. The processing module 440 may generate a target detection model based on the supervised labelled image data and the first unsupervised labelled image data. In some embodiments, the initial seed model may be updated iteratively to generate the target detection model. The processing module 440 may generate or update a seed model based on the supervised labelled image data and the first unsupervised labelled image data. The training of the seed model may be similar to the training of the initial seed model. For example, the unlabeled image data may be provided to the seed model as an input, and the supervised labelled image data and the first unsupervised labelled image data may be provided to the seed model as an output.
FIG. 5 is a flowchart illustrating an exemplary process for generating attribute information of a target according to some embodiments of the present disclosure. In some embodiments, one or more operations of process 500 may be executed by the target detection system 100. For example, the process 500 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device 150, the ROM 330, the RAM 340) and invoked and/or executed by a processing device (e.g., the processing device 110, the processor 320 of the computing device 300, and/or the modules illustrated in FIG. 4). In some embodiments, the instructions may be transmitted in the form of electronic current or electrical signals. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 500 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed herein. Additionally, the order of the operations of the process as illustrated in FIG. 5 and described below is not intended to be limiting.
In 510, the processing device 110 (e.g., the acquisition module 410) may obtain a trained target detection model. In some embodiments, the target detection model may be trained by supervised labelled image data and/or unsupervised labelled image data. The supervised labelled image data and the unsupervised labelled image data may be collectively called training image data. The training image data may be generated based on pre-training image data. For example, the target detection model may be a general model that may be used to detect multiple targets in image data. As another example, the target detection model may be a specialized model that may be used to detect a certain target in image data. In some embodiments, the targets may be preset by the processing device 110 or an operator. The targets may be a single object or multiple objects. The targets may be a rigid object or a flexible object. As used herein, a rigid object may refer to an object whose shape does not change. A flexible object may refer to an object whose shape may change at different moments or capturing angles. The objects may include, but are not limited to, buildings, trees, roadblocks, static vehicles, moving vehicles, pedestrians, animals, or the like, or any combination thereof. The supervised labelled image data and the unsupervised labelled image data may indicate attribute information of targets in the corresponding image data. The attribute information may include regional information, category information, and/or content information.
In some embodiments, the supervised labelled image data may be manually labelled by an operator. For example, the operator may label image data via a user interface of a user terminal. The labelling of a target in the image data may include  describing a region (or boundary) of the target in the image data, providing a category (e.g., a human, a vehicle, an animal, a tree, a static target, a dynamic target) of the target and/or providing a content (e.g., a teacher, a student, a pedestrian, a motorbike, a bicycle) of the target. The unsupervised labelled image data may be labelled by a machine or a model. For example, the machine may select images that include the target from a set of pre-training image data and label the target in the selected images. In some embodiments, the supervised labelled image data and the unsupervised labelled image data may also include labelling information (e.g., location, category) of an interference subject associated with the target. The labeling information may be labelled manually or by using a second target examination model described in connection with FIGs. 12-14. As used herein, the second target examination model may be configured to provide labelling information of the target and labelling information of the interference subject associated with the target.
In some embodiments, an operator may first determine one or more targets that need to be detected in image data. The operator may manually label the one or more targets in at least some images of a set of pre-training image data to generate a set of supervised labelled image data. The set of supervised labelled image data may be processed to generate a set of unsupervised labelled image data. The set of supervised labelled image data and the set of unsupervised labelled image data may be used collectively to train a preliminary model to generate a target detection model. In some embodiments, the unsupervised labelled image data may be iteratively updated to generate a more accurate and objective target detection model. More descriptions regarding the generation of the unsupervised labelled image data and the target detection model may be found elsewhere in the present disclosure, e.g., FIGs. 6-9 and the descriptions thereof.
In 520, the processing device 110 (e.g., the determination module 430) may determine image data-to-be-detected. The image data-to-be-detected may be real-time  image data captured by a capturing device (e.g., the capturing device 130) or a non-real-time image data stored in the storage device 150. The image data-to-be-detected may include a target that corresponds to the target detection model. For example, the target detection model may be a model used to detect objects around a vehicle in an autonomous driving system and the image data-to-be-detected may be a real-time image data of environment around a vehicle. As another example, the target detection model may be a face detection model used to detect criminals in a Customs and the image data-to-be-detected may include photos of people captured in the Customs.
In 530, the processing device 110 (e.g., the processing module 440) may execute the trained target detection model to generate attribute information of targets in the image data-to-be-detected. The attribute information may include regional information, category information, and/or content information. The regional information may correspond to the position and/or shape of the target in the image data. For example, the regional information may include a central point of the target, a close fit boundary of the target (which may be an irregular shape), a loose fit boundary of the target (which may be a regular shape, such as a rectangle, circle, oval, or square, that is slightly larger than the actual shape of the target), a vertical line along the height direction of the target, a horizontal line along the width direction of the target, a diagonal line, etc. For example, the processing device 110 may determine the regional information of the target as a rectangle of 30*40 pixels. The processing device 110 may mark the boundary of the rectangle or show the rectangle in colors different from the background. The category information may include a category of the target in the image data. For example, the category information may include static objects and dynamic objects. As another example, the category information may include humans, animals, plants, vehicles, buildings, etc. The content information may include a content or identity of the target in the image data. For example, the content information may include teachers, students, pedestrians, motorbikes, bicycles, etc. As another example, the content information may include facial expressions such as smiling, sad, angry, or worried. The processing device 110 may mark the boundaries of the targets in the image data in different colors according to their category information and content information. For example, a red rectangle may be marked around a vehicle and a yellow circle may be marked around a rabbit. It should be noted that the generation of the attribute information of targets may include visually marking the target or virtually correlating the targets with software code corresponding to the related regional information, content information, and/or category information. Referring to FIG. 10, the image 1000 may be image data-to-be-detected. The processing device 110 may execute the trained target detection model on the image 1000 to generate attribute information of targets in the image 1000. For example, the regional information of a car may be labelled as a rectangle 1010 around the car, and the category information and the content information of the car may be displayed in the image 1000 under the car as (vehicle, car). As another example, the regional information of a road lamp may be labelled as a rectangle 1020 around the road lamp, and the category information and the content information of the road lamp may be displayed in the image 1000 to the left of the road lamp as (facility, road lamp). Similarly, the attribute information of a telegraph pole and a warehouse may also be generated.
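Merely by way of example, marking the attribute information on an image such as the image 1000 might be implemented with OpenCV as sketched below; the color assignments and the reuse of the DetectedTarget container from the earlier sketch are illustrative assumptions rather than requirements of the present disclosure.

```python
import cv2

# Illustrative mapping from category information to a drawing color (BGR order),
# e.g., vehicles in red and facilities in yellow.
CATEGORY_COLORS = {"vehicle": (0, 0, 255), "facility": (0, 255, 255)}

def mark_attribute_information(image, detected_targets):
    """Draw a colored rectangle around each target and print its (category, content)."""
    for target in detected_targets:
        x, y, w, h = target.box
        color = CATEGORY_COLORS.get(target.category, (0, 255, 0))
        cv2.rectangle(image, (x, y), (x + w, y + h), color, 2)
        cv2.putText(image, "({}, {})".format(target.category, target.content),
                    (x, y + h + 15), cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
    return image
```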
FIG. 6 is a flowchart illustrating exemplary processes for generating a target detection model according to some embodiments of the present disclosure. In some embodiments, one or more operations of process 600 may be executed by the target detection system 100. For example, the process 600 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device 150, the ROM 330, the RAM 340) and invoked and/or executed by a processing device (e.g., the processing device 110, the processor 320 of the computing device 300, and/or the modules illustrated in FIG. 4). In some embodiments, the instructions may be transmitted in the form of electronic current or electrical signals. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 600 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed herein. Additionally, the order of the operations of the process as illustrated in FIG. 6 and described below is not intended to be limiting.
The process 600 may be an exemplary process for generating a target detection model. The generated target detection model may be obtained in 510 and used in the process 500 in FIG. 5. FIG. 7 is a schematic diagram illustrating an exemplary process for generating unsupervised labelled image data according to some embodiments of the present disclosure. The schematic diagram of FIG. 7 may be an exemplary embodiment of the process 600.
In 610, the processing device 110 (e.g., the acquisition module 410) may obtain pre-training image data. The pre-training image data may be obtained from an image capturing device (e.g., the image capturing device 130) or a database (e.g., the storage device 150, an external database). The pre-training image data may be associated with a particular environment. For example, the pre-training image data may include a plurality of traffic images captured by data recorders of multiple vehicles. As another example, the pre-training image data may include a plurality of videos captured by dome cameras mounted on the front gates of buildings. As a further example, the pre-training image data may include a plurality of human faces. The pre-training image data may be real data (e.g., image data of a real environment) or virtual data (e.g., image data generated, modified, or synthesized by a computer).
In some embodiments, the pre-training image data may be selected from a pool of image data. Referring to FIG. 7, the pre-training image data 720 may be selected from the raw image data 710. For example, the raw image data 710 may include image data captured by different cameras in different environments. The processing device 110 may select some of the raw image data as the pre-training image data 720 according to a requirement. Merely by way of example, an operator may set vehicles as targets, and all the image data that includes vehicles or wheels may be selected as the pre-training image data 720. The selection may be performed manually by an operator or by a processor (e.g., an external processor, the processing device 110).
In 620, the processing device 110 (e.g., the processing module 440) may receive one or more operations from an operator on the pre-training image data to generate supervised labelled image data. For example, the operator may label the pre-training image data via a user interface of a user terminal. The labelling of a target in the image data may include describing a region (or boundary) of the target in the image data, providing a category (e.g., a human, a vehicle, an animal, a tree, a static target, a dynamic target) of the target, and/or providing a content (e.g., a teacher, a student, a pedestrian, a motorbike, a bicycle) of the target. For example, the operator may draw a rectangle around a target in the image data and provide the category and content of the target as (vehicle, car). As another example, the operator may draw an oval around a target in the image data and provide the category and content of the target as (human, cyclist). Referring to FIG. 7, the pre-training image data 720 may be manually labelled to generate the supervised labelled image data 740.
In some embodiments, the supervised labelled image data may also include labelling information (e.g., location, category) of an interference subject associated with the target. The labeling information may be labelled manually or by using a second target examination model described in connection with FIGs. 12-14. As used herein, the second target examination model may be configured to provide labelling information of the target and labelling information of the interference subject associated with the target.
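Merely by way of illustration, one item of supervised labelled image data, including optional labelling information of an interference subject, might be recorded as follows; the schema, the file name, and the example interference subject are assumptions for this sketch, since the present disclosure does not prescribe a storage format.

```python
# One item of supervised labelled image data, e.g., produced by an operator
# through a user interface of a user terminal.
supervised_labelled_sample = {
    "image": "raw_image_data/frame_000123.jpg",   # hypothetical file name
    "targets": [
        {"region": {"shape": "rectangle", "box": [210, 145, 96, 64]},
         "category": "vehicle", "content": "car"},
        {"region": {"shape": "oval", "center": [402, 220], "axes": [30, 55]},
         "category": "human", "content": "cyclist"},
    ],
    # Optional labelling information of interference subjects associated with a target.
    "interference_subjects": [
        {"region": {"shape": "rectangle", "box": [520, 150, 80, 60]},
         "category": "vehicle", "content": "toy car"},
    ],
}
```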
In 630, the processing device 110 (e.g., the processing module 440) may generate an initial seed model based on the supervised labelled image data. For example, the initial seed model may include an input, an output, and multiple classifiers or layers in between. When an input A is provided, the initial seed model may output a B according to its structure and the weights of the classifiers. The classifiers may correspond to different perspectives regarding the image; for example, one classifier may focus on the size of objects, one classifier may focus on details, one classifier may focus on color, etc. When the supervised labelled image data is inputted to the initial seed model (or an untrained preliminary model), the input may be the unlabeled image data and the output may be the manually labelled image data, i.e., the supervised labelled image data. The model may "learn" why the images are labelled as such by changing its internal structure, factors, layers, or the weights of the classifiers inside the model to achieve a maximum score or a lowest loss. Merely by way of example, the initial seed model may learn that vehicles may have different colors and may reduce the weight of the color classifier (e.g., color will no longer strongly affect the identification of objects). As another example, the model may learn that each type of vehicle has a similar height-width ratio and increase the weight of the corresponding classifier. Overall, the initial seed model may learn which features are common in the labelled targets and increase the weights of the corresponding classifiers, and which features are less common in the labelled targets and reduce the weights of the corresponding classifiers.
It should be noted that the above description regarding the training of a model is merely illustrative and shall not limit the method of training the model nor the structure or type of the model. For example, the initial seed model (and/or other models in the present disclosure, e.g., the seed model, the preliminary model, the updated seed model, the target detection model) may include, but is not limited to, a neural network, an artificial neural network (ANN), a convolutional neural network (CNN), a you-only-look-once (YOLO) network, a tiny YOLO network, a support vector machine (SVM), a regions with convolutional neural network (R-CNN), a decision tree, a random forest, or the like, or any combination thereof. Referring to FIG. 7, the supervised labelled image data 740 may be used to train the initial seed model 750. The initial seed model 750 may be a temporary model of the target detection system 760.
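Merely by way of example, fitting an initial seed model to the supervised labelled image data might proceed as in the following PyTorch-style sketch; the loss function, the optimizer, and the form of the data loader are illustrative assumptions rather than the training procedure defined in the present disclosure.

```python
import torch
from torch import nn

def train_initial_seed_model(preliminary_model, supervised_loader, epochs=10, learning_rate=1e-3):
    """Fit an untrained preliminary model to the supervised labelled image data.

    `supervised_loader` is assumed to yield (image_batch, label_batch) pairs in
    which the labels encode the manually provided attribute information.
    """
    criterion = nn.CrossEntropyLoss()  # a simple classification-style loss for illustration
    optimizer = torch.optim.SGD(preliminary_model.parameters(), lr=learning_rate)
    preliminary_model.train()
    for _ in range(epochs):
        for images, labels in supervised_loader:
            optimizer.zero_grad()
            outputs = preliminary_model(images)   # the model's predicted labelling
            loss = criterion(outputs, labels)     # low loss = close to the manual labels
            loss.backward()                       # adjust classifier weights by gradient descent
            optimizer.step()
    return preliminary_model                      # the fitted model serves as the initial seed model
```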
In 640, the processing device 110 (e.g., the processing module 440) may execute the initial seed model to generate unsupervised labelled image data based on the pre-training image data. In some embodiments, the processing device 110 may preprocess the pre-training image data to generate the unsupervised labelled image data. The "preprocessing" of the pre-training image data may include a filtering process and a labelling process. For example, the initial seed model may select some or all of the pre-training image data that includes the target from the pre-training image data. The initial seed model may label the target in the selected pre-training image data. For example, the processing device 110 may input the pre-training image data to the initial seed model. The initial seed model may generate labelled pre-training image data as an output. The outputted image data may correspond to some or all of the inputted image data. Such outputted image data may be called the unsupervised labelled image data. Referring to FIG. 7, the processing device 110 may select some or all image data from the raw image data 710 as the pre-training image data 730. The pre-training image data 730 may be the same as or different from the pre-training image data 720. The processing device 110 may then input the selected pre-training image data 730 into the target detection system 760 to generate the unsupervised labelled image data 770 as an output.
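Merely by way of illustration, executing the initial seed model over the pre-training image data to produce unsupervised labelled image data might look as follows. The detector interface assumed here (a callable returning a list of detections, each with "box", "category", and "score" keys) is a simplification for the sketch, not the model interface defined in the present disclosure.

```python
import torch

def generate_unsupervised_labels(seed_model, pretraining_images):
    """Execute the seed model on pre-training image data and collect the
    machine-generated labels of every detected target."""
    seed_model.eval()
    unsupervised_labelled = []
    with torch.no_grad():
        for image_id, image in pretraining_images:
            # `seed_model(image)` is assumed to return a list of detections, each a
            # dict with "box", "category", and "score"; images without any detected
            # target contribute nothing, which implements the filtering step.
            for detection in seed_model(image):
                unsupervised_labelled.append({"image": image_id, **detection})
    return unsupervised_labelled
```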
In 650, the processing device 110 (e.g., the processing module 440) may process the unsupervised labelled image data to generate first unsupervised labelled image data. The supervised labelled image data and the unsupervised labelled image data (and/or the first unsupervised labelled image data) may be collectively called the training image data. In some embodiments, the initial seed model may not be very accurate, and the unsupervised labelled image data may include errors. For example, some of the targets that should be labelled may be missed by the initial seed model. As another example, the attribute information of a detected target may be wrong. Merely by way of example, the regional information (e.g., the shape or boundary) of the detected target may be wrong. In some embodiments, the processing device 110 may filter some of the unsupervised labelled image data according to a first preset condition. The retained unsupervised labelled image data may be referred to as the first unsupervised labelled image data. The first preset condition may include conditions that an average output score of the initial seed model is greater than a first threshold, that a size ratio of boundaries of a detection region satisfies a preset ratio, that a color of a detection region satisfies a preset color request, and/or that an angle of the target satisfies a preset angle. In some embodiments, the first preset condition may include any one, any two, any three, or all of the above conditions.
In some embodiments, the unsupervised labelled image data and/or the first unsupervised labelled image data may also include labelling information (e.g., location, category) of an interference subject associated with the target. The labeling information may be labelled manually or by using a second target examination model described in connection with FIGs. 12-14. As used herein, the second target examination model may be configured to provide labelling information of the target and labelling information of the interference subject associated with the target.
In some embodiments, when pre-training image data is inputted into the initial seed model, the initial seed model may also output an average output score of the unsupervised labelled image data. The average output score may indicate the probability that the unsupervised labelled image data is reliable. For example, the score of the supervised labelled image data may be 100%. As another example, if the pre-training image data 720 and the pre-training image data 730 are very different, the output score of the unsupervised labelled image data may be relatively low. In other words, the initial seed model is not so confident about its labelling of the image data. Such unsupervised labelled image data (if its score is lower than the first threshold) may be filtered out from the unsupervised labelled image data. The first threshold may be 80%, 90%, 95%, 99%, etc.
The detection region may be a region (usually a rectangle) where the target is labelled. In some embodiments, the size ratio of boundaries of the detection region may be determined. For example, if the target is labelled by a rectangle of 40*30 pixels, the  size ratio may be 1.33 (40/30) . In some embodiments, the size ratio of boundaries of the detection region may also include a ratio between the size of the boundaries of the detection region and the image data. For example, if the target is labelled by a rectangle of 40*30 and the size of the image is 500*800, the size ratio may be 0.08 (40/500) or 0.0375 (30/800) . The preset size ratio may depend on the selection of the target and is not limiting.
The color (or the grayscale value) of the detection region may include an average grayscale value, a maximum grayscale value, a minimum grayscale value, a difference between the maximum grayscale value and the minimum grayscale value, a difference between the average grayscale value in the detection region and the average grayscale value of pixels around the detection region. The preset color request may depend on the selection of the target and is not limiting.
The angle of the target may include a relative angle from the target to other objects in the image data or a relative position of the target in the image data. For example, vehicles or seats in front of a car may be at a fixed angle (e.g., perpendicular to the capturing angle) . If a seat of a vehicle is labelled in unsupervised labelled image data, the processing device 110 may determine a boundary of the labelled seat and determine whether certain points or lines along the boundary of the labelled seat satisfy a preset angle condition. For example, the processing device 110 may determine whether an upper edge of the boundary and/or the lower edge of the boundary are horizontal. The preset angle may depend on the selection of the target and is not limiting.
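Merely by way of example, filtering the unsupervised labelled image data according to a first preset condition might be sketched as follows; the score threshold, the aspect-ratio range, and the relative-size limit are illustrative values, not thresholds specified in the present disclosure, and each machine-labelled record is assumed to carry a bounding box and an output score.

```python
def filter_unsupervised_labels(unsupervised_labelled, image_sizes,
                               score_threshold=0.90,
                               aspect_ratio_range=(0.5, 2.0),
                               max_relative_width=0.8):
    """Retain only machine-labelled detections that satisfy the preset conditions;
    the retained records form the first unsupervised labelled image data."""
    first_unsupervised = []
    for record in unsupervised_labelled:
        x, y, w, h = record["box"]
        image_w = image_sizes[record["image"]][0]   # (width, height) of the source image
        aspect_ratio = w / h                        # e.g., 40/30 = 1.33
        relative_width = w / image_w                # e.g., 40/500 = 0.08
        if (record["score"] >= score_threshold
                and aspect_ratio_range[0] <= aspect_ratio <= aspect_ratio_range[1]
                and relative_width <= max_relative_width):
            first_unsupervised.append(record)
    return first_unsupervised
```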
In 660, the processing device 110 (e.g., the processing module 440) may generate a target detection model based on the training image data (e.g., the supervised labelled image data and the first unsupervised labelled image data) . In some embodiments, the initial seed model may be updated iteratively to generate the target detection model. More descriptions regarding the generation of the target detection model may be found elsewhere in the present disclosure, e.g., FIG. 8, FIG. 9 and the descriptions thereof. In  some embodiments, the target detection model may be used to detect targets in real images. An exemplary process of using the target detection model may be found in process 500 in FIG. 5.
FIG. 8 is a flowchart illustrating exemplary processes for generating a target detection model according to some embodiments of the present disclosure. In some embodiments, one or more operations of process 800 may be executed by the target detection system 100. For example, the process 800 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device 150, the ROM 330, the RAM 340) and invoked and/or executed by a processing device (e.g., the processing device 110, the processor 320 of the computing device 300, and/or the modules illustrated in FIG. 4). In some embodiments, the instructions may be transmitted in the form of electronic current or electrical signals. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 800 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed herein. Additionally, the order of the operations of the process as illustrated in FIG. 8 and described below is not intended to be limiting.
In some embodiments, operations 860-890 may correspond to operation 660 in FIG. 6. In some embodiments, FIG. 9 may be a schematic diagram illustrating an exemplary process for generating a target detection model according to some embodiments of the present disclosure. The schematic diagram of FIG. 9 may be an exemplary embodiment of the process 800.
In 810, the processing device 110 may obtain pre-training image data. Operation 810 may be similar to operation 610 and is not repeated herein. Referring to FIG. 9, the pre-training image data 920 may correspond to the pre-training image data 720 and/or the pre-training image data 730. Alternatively, the pre-training image data may be selected from the raw image data 710 and be different from the pre-training image data 720 and the pre-training image data 730.
In 820, the processing device 110 may label the pre-training image data to generate supervised labelled image data. Operation 820 may be similar to operation 620 and is not repeated herein. Referring to FIG. 9, the supervised labelled image data 910 may correspond to the supervised labelled image data 740.
In 830, the processing device 110 may generate an initial seed model based on supervised labelled image data. Operation 830 may be similar to the operation 630 and is not repeated herein.
In 840 and 850, the processing device 110 may generate first unsupervised labelled image data based on the initial seed model.  Operations  840 and 850 may be similar to the  operations  640 and 650 and are not repeated herein.
In 860, the processing device 110 may generate or update a seed model based on the training image data (e.g., the supervised labelled image data and the first unsupervised labelled image data). The training of the seed model may be similar to the training of the initial seed model. For example, the unlabeled image data may be provided to the seed model as an input, and the supervised labelled image data and the first unsupervised labelled image data may be provided to the seed model as an output. The internal structure, classifiers, and layers of the seed model may be updated according to the input and the output. Referring to FIG. 9, the seed model 950 may be updated based on the supervised labelled image data 910 and the unsupervised labelled image data 940. The updated seed model 950 may be a temporary model of the target detection system 930.
In 870, the processing device 110 may execute the seed model to update the first unsupervised labelled image data based on the pre-training image data. For example, the pre-training image data may be inputted to the seed model updated in the current iteration to generate the updated first unsupervised labelled image data. In the first iteration, the initial seed model may be updated based on the supervised labelled image data generated in 820 and the first unsupervised labelled image data generated in 850 to generate a first-iteration seed model. In the second iteration, the first-iteration seed model may be updated based on the supervised labelled image data generated in 820 and the first unsupervised labelled image data that are updated in the first iteration in 870. Similarly, the nth-iteration seed model may be updated in the (n+1) th iteration in 860 based on the supervised labelled image data generated in 820 and the first unsupervised labelled image data that are updated in the nth iteration in 870. Referring to FIG. 9, the unsupervised labelled image data 940 may be updated by the target detection system 930 in response to an input of the pre-training image data 920.
In 880, the processing device 110 may determine whether a preset condition is satisfied. In some embodiments, the preset condition may include the number of iterations being greater than a second threshold and/or an average output score of the updated first unsupervised labelled image data being greater than a third threshold. The second threshold may be 1, 2, 5, 10, 15, 20, etc. The third threshold may be 80%, 90%, 95%, 99%, etc. In some embodiments, the processing device 110 may obtain a set of testing image data and a reference detection result corresponding to the set of testing image data. The processing device 110 may generate a detection result corresponding to the set of testing image data using the seed model generated in the current iteration. The preset condition may include the difference between the detection result generated by the seed model in the current iteration and the reference detection result being less than a fourth threshold. In response to a determination that the preset condition is satisfied, the process 800 may proceed to 890; otherwise, the process 800 may proceed back to 860.
In 890, the processing device 110 may designate the seed model in the current iteration as a target detection model. In some embodiments, the target detection model may be used to detect targets in real images. An exemplary process of using the target detection model may be found in process 500 in FIG. 5.
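Merely by way of illustration, operations 860-890 might be combined into the following loop. The sketch reuses the `generate_unsupervised_labels` and `filter_unsupervised_labels` helpers sketched above, `retrain_seed_model` is a hypothetical callback standing in for a training step like the one sketched for the initial seed model, and the stopping thresholds (iteration cap and average output score) are illustrative assumptions.

```python
def build_target_detection_model(initial_seed_model, supervised_data,
                                 pretraining_images, image_sizes,
                                 retrain_seed_model,
                                 max_iterations=10, score_threshold=0.95):
    """Iteratively update the seed model until a preset condition is satisfied,
    then designate the current seed model as the target detection model."""
    seed_model = initial_seed_model
    first_unsupervised = filter_unsupervised_labels(
        generate_unsupervised_labels(seed_model, pretraining_images), image_sizes)
    for iteration in range(max_iterations):
        # 860: update the seed model with the supervised and first unsupervised labelled data.
        seed_model = retrain_seed_model(seed_model, supervised_data, first_unsupervised)
        # 870: re-execute the updated seed model on the pre-training image data.
        first_unsupervised = filter_unsupervised_labels(
            generate_unsupervised_labels(seed_model, pretraining_images), image_sizes)
        # 880: preset condition, here the average output score of the updated labels
        # (the iteration cap above plays the role of the second threshold).
        average_score = (sum(r["score"] for r in first_unsupervised)
                         / max(len(first_unsupervised), 1))
        if average_score > score_threshold:
            break
    # 890: the seed model of the current iteration is the target detection model.
    return seed_model
```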
FIG. 11 is a schematic diagram illustrating an exemplary structure of a model according to some embodiments of the present disclosure. The model 1100 may be an exemplary embodiment of the preliminary model, the initial seed model, the updated seed model, and/or the target detection model. In some embodiments, the model may be a convolutional neural network (CNN) . The CNN may be a multilayer neural network (e.g., including multiple layers) . The multiple layers may include at least one of a convolutional layer (CONV) , a Rectified Linear Unit (ReLU) layer, a pooling layer (POOL) , or a fully connected layer (FC) . The multiple layers of the CNN may correspond to neurons arranged in 3 dimensions: width, height, and depth. In some embodiments, the CNN may have an architecture such as [INPUT - CONV - RELU - POOL - FC] . In some embodiments, the INPUT [32x32x3] may hold the raw pixel values of the image, in this case an image of width 32, height 32, and with three color channels R, G, B. The CONV layer may compute the output of neurons that are connected to local regions in the input, each computing a dot product between their weights and a small region they are connected to in the input volume. This may result in a volume of [32x32x12] if 12 filters are used. The CONV layer may be the core building block of the CNN and may bear most of the computational load. The RELU layer may apply an elementwise activation function, such as the max (0, x) thresholding at zero. This may leave the size of the volume unchanged ( [32x32x12] ) . The POOL layer may perform a downsampling operation along the spatial dimensions (width, height) , resulting in a volume such as [16x16x12] . The function of the POOL layer may be to progressively reduce the spatial size of the representation to reduce the amount of parameters and computation in the network, and hence to also control overfitting. The POOL layer may operate independently on every depth slice of the input and resize it spatially, using the MAX operation. In some embodiments, the POOL layer with filters of size 2x2 applied with a stride of 2 may downsample each depth slice in the input by 2 along both width and height, discarding 75% of the activations. Each MAX operation may include taking a max over 4 numbers (e.g., a little 2x2 region in some depth slice) . The FC layer may compute the class scores, resulting in a volume of size [1x1x10] , where each of the 10 numbers corresponds to a class score. Each neuron in the FC layer may be connected to all the values in the previous volume. Each class score may correspond to a category, a type, or content information of a particular target. Different targets may correspond to different class scores.
In this way, the CNN may transform the original image layer by layer from the original pixel values to regional class scores. In particular, the CONV/FC layers perform transformations that may be a function of not only the activations in the input volume, but also of the parameters (for example, the weights and biases of the neurons) . In some embodiments, the RELU/POOL layers may implement a fixed function. In some embodiments, the parameters in the CONV/FC layers may be trained with gradient descent so that the class scores that the CNN computes may be consistent with the labelled image data in the output.
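For illustration only, the [INPUT - CONV - RELU - POOL - FC] architecture described above may be expressed, for example, in PyTorch as follows. PyTorch is merely an assumed framework; the 3x3 kernel with a padding of 1 is an assumption chosen so that the CONV output remains [32x32x12], and all other sizes follow the description above.

```python
# A minimal sketch of the [INPUT - CONV - RELU - POOL - FC] architecture:
# 32x32x3 input, 12 filters, 2x2 max pooling with stride 2, and 10 class scores.
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.conv = nn.Conv2d(in_channels=3, out_channels=12,
                              kernel_size=3, padding=1)   # [32x32x3] -> [32x32x12]
        self.relu = nn.ReLU()                              # elementwise max(0, x)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2)  # [32x32x12] -> [16x16x12]
        self.fc = nn.Linear(16 * 16 * 12, num_classes)     # class scores [1x1x10]

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(self.relu(self.conv(x)))
        return self.fc(x.flatten(start_dim=1))

# Example: a batch of four 32x32 RGB images produces four 10-way class score vectors.
scores = SimpleCNN()(torch.randn(4, 3, 32, 32))
print(scores.shape)  # torch.Size([4, 10])
```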
The CNN may be combined with reinforcement learning to improve the accuracy of target detection. The reinforcement learning may include a Markov Decision Process (MDP) , a Hidden Markov Model (HMM) , etc.
It should be noted that the model 1100 described in FIG. 11 is merely illustrative and shall not be limiting. Other types of models may be used in the present disclosure, including but not limited to a neural network, an artificial neural network (ANN) , a convolutional neural network (CNN) , a you-only-look-once (YOLO) network, a tiny YOLO network, a support vector machine (SVM) , a regions with convolutional neural network (R-CNN) , a decision tree, a random forest, or the like, or any combination thereof.
FIG. 12 is a block diagram illustrating an exemplary processing device according to some embodiments of the present disclosure. As shown in FIG. 12, the processing device 110 may include an obtaining module 1210, an examination module 1220, and a model training module 1230.
The obtaining module 1210 may be configured to obtain data. In some embodiments, the obtaining module 1210 may obtain the data from the target detection system 100 (e.g., the processing device 110, the terminal 120, the image capturing device 130, the storage device 150) or any device disclosed in the present disclosure. The data may include image data, video data, user instructions, algorithms, models, or the like, or any combination thereof.
In some embodiments, the obtaining module 1210 may obtain data to be examined. The data to be examined may include image data, video data, or the like, or any combination thereof. The image data may include one or more images. The video data may include a plurality of image frames constituting a video.
The examination module 1220 may be configured to examine the data to be examined and output an examination result of the data to be examined. As used herein, the examination result may include labelling information of a target to be examined and labelling information of an interference subject associated with the target to be examined. As used herein, the target to be examined may include a target that needs to be obtained from the data to be examined. A user of the target detection system 100 may be interested in the target. The interference subject may include a background target that is erroneously identified as the target to be examined in any target examination process.
In some embodiments, the interference subject may include, but is not limited to, a subject that has been erroneously identified as the target to be examined using a second target examination model described in the present disclosure, a subject that has been erroneously identified as the target to be examined using an existing target examination model in a target examination field, a subject that has been erroneously identified as the target to be examined using an examination algorithm and/or a model in other fields, a subject that has been erroneously identified as the target to be examined manually, or the like, or any combination thereof.
In some embodiments, the examination result (or the labelling information) may include a category of the target to be examined, a score of the target to be examined, a category of the interference subject, a score of the interference subject, etc. The examination module 1220 may label the category of the target to be examined or the interference subject with an examination box. As used herein, the examination box may refer to a bounding box of a subject, and an examination subject of the data to be examined may be included in the examination box. The score may refer to a probability that a subject in an examination box is identified as a target to be examined or an interference subject.
In some embodiments, the examination result (or the labelling information) may also include information (also referred to as a “category identifier” ) for distinguishing the category of the target to be examined from the category of the interference subject. In some embodiments, the examination module 1220 may distinguish the target to be examined and the interference subject using examination boxes (also referred to as “labelling boxes” ) with different colors or different shapes. For example, the examination module 1220 may use a green examination box to label the category of the target to be examined, and use a red examination box to label the category of the interference subject. In some embodiments, the examination module 1220 may add a text near the examination box to indicate whether the subject in the examination box is the target to be examined or the interference subject.
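One possible way to render such a category identifier is sketched below in Python, using OpenCV as an assumed drawing library (the present disclosure does not prescribe one); the box colors, label text, and dictionary fields are illustrative assumptions rather than features of the disclosure.

```python
# Minimal sketch: draw a green examination box for a target to be examined,
# a red examination box for an interference subject, and a text label with the score.
import cv2
import numpy as np

GREEN = (0, 255, 0)   # BGR: examination box for the target to be examined
RED = (0, 0, 255)     # BGR: examination box for the interference subject

def draw_examination_result(image, boxes):
    """boxes: list of dicts with 'bbox' (x1, y1, x2, y2), 'score', and
    'is_interference', as might be produced by an examination model."""
    for box in boxes:
        x1, y1, x2, y2 = box["bbox"]
        color = RED if box["is_interference"] else GREEN
        label = "interference" if box["is_interference"] else "target"
        cv2.rectangle(image, (x1, y1), (x2, y2), color, thickness=2)
        cv2.putText(image, f"{label} {box['score']:.2f}", (x1, max(y1 - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, color, 1)
    return image

# Example usage on a blank frame with one target and one interference subject.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
frame = draw_examination_result(frame, [
    {"bbox": (100, 200, 180, 420), "score": 0.98, "is_interference": False},
    {"bbox": (300, 150, 360, 400), "score": 0.85, "is_interference": True},
])
```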
In some embodiments, the examination module 1220 may output the examination result of the data to be examined using the second target examination model. Further, the examination module 1220 may obtain the target to be examined based on the examination result. As used herein, the second target examination model may be trained by training data including labelling information of a training target to be examined and labelling information of a training interference subject associated with the training target to be examined.
The model training module 1230 may be configured to generate the second target examination model. In some embodiments, the model training module 1230 may include a preliminary training unit 1232, a model examination unit 1234, and a second training unit 1236.
The preliminary training unit 1232 may be configured to obtain preliminary training data and train a preliminary model to obtain a preliminary target examination model using the preliminary training data. In some embodiments, the preliminary training data may include image data, video data, or the like, or a combination thereof. As used herein, the image data may include a plurality of images. The video data may include a plurality of images of video frames. In some embodiments, the preliminary training data may include labelling information of a preliminary training target to be examined. The preliminary training data may also include labelling information of a priori interference subject of the preliminary training targets to be examined. The priori interference subject may refer to a predetermined interference subject of a preliminary training target to be examined.
As used herein, the preliminary model may include a Deformable Part Model (DMP) , an OverFeat model, a Region-Convolutional Neural Network (R-CNN) model, a Spatial Pyramid Pooling Network (SPP-Net) model, a Fast R-CNN model, a Faster R-CNN model, a Region-based Fully Convolutional Network (R-FCN) model, a Deeply Supervised Object Detector (DSOD) model, or the like, or any combination thereof. More detailed descriptions of training the preliminary model can be found elsewhere in the present disclosure, e.g., FIG. 14 and the descriptions thereof.
The model examination unit 1234 may be configured to obtain preliminary data to be examined and determine preliminary target examination results of the preliminary data to be examined using the preliminary target examination model. Similar to the preliminary training data, the preliminary data to be examined may include image data and/or video data. The preliminary data to be examined may be different from the preliminary training data.
The preliminary target examination results may include outputs of the preliminary target examination model for input data, i.e., the preliminary data to be examined. In some embodiments, the preliminary target examination results may include one or more examination boxes added in the input data. Each of the examination boxes may include an examination subject, e.g., a preliminary target to be examined, a preliminary background target of the preliminary target to be examined, or a preliminary interference subject of the preliminary target to be examined. The preliminary target examination results may also not include any examination box added in the input data, i.e., the input data and the output of the preliminary model may be the same. In some embodiments, the preliminary target examination results may also include a score indicating a probability that the subject in the examination box is examined as the preliminary target to be examined. The score may be 1, 0.99, 0.98, 0.97, or any other value in the range of 0 to 1. For example, if the score is 0.98, it may indicate that the subject in the examination box has a probability of 98% of being the preliminary target to be examined.
The second training unit 1236 may be configured to generate training data at least based on the preliminary data to be examined and the preliminary target examination results, and train the preliminary target examination model to generate the second target examination model based on the training data. The training data may include labelling information (e.g., a category identifier, a category, a score, location information) of a training target to be examined and labelling information (e.g., a category identifier, a category, a score, location information) of a training interference subject associated with the training target to be examined.
In some embodiments, the second training unit 1236 may generate at least a portion of the training data based on the preliminary data to be examined and the preliminary target examination results. The at least a portion of the training data may include the labelling information of the priori interference subjects, the labelling information of the preliminary interference subjects, and the labelling information of the preliminary targets to be examined. In some embodiments, a second portion of the training data may be generated manually, and the second training unit 1236 may obtain the second portion of the training data. The second training unit 1236 may obtain a third portion of the training data based on an existing target examination model or an existing target examination algorithm described in connection with FIG. 13. More detailed descriptions of training the preliminary target examination model can be found elsewhere in the present disclosure, e.g., FIG. 14 and the descriptions thereof.
The modules and/or units in the processing device 110 may be connected to or communicate with each other via a wired connection or a wireless connection. The wired connection may include a metal cable, an optical cable, a hybrid cable, or the like, or any combination thereof. The wireless connection may include a Local Area Network (LAN) , a Wide Area Network (WAN) , a Bluetooth, a ZigBee, a Near Field Communication (NFC) , or the like, or any combination thereof. For example, the processing device 110 may include a storage module (not shown) which may be used to store data generated by the above-mentioned modules and/or units. As another example, the model training module 1230 may be unnecessary and the second target examination model may be obtained from a storage device (e.g., the storage device 150) , such as the ones disclosed elsewhere in the present disclosure.
FIG. 13 is a flowchart illustrating a process for determining an examination result of the data to be examined according to some embodiments of the present disclosure. In some embodiments, one or more operations of process 1300 may be executed by the target detection system 100. For example, the process 1300 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device 150, the ROM 330, the RAM 340) and invoked and/or executed by a processing device (e.g., the processing device 110, the processor 320 of the computing device 300, and/or the modules illustrated in FIG. 12) . In some embodiments, the instructions may be transmitted in the form of electronic current or electrical signals. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 1300 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed herein. Additionally, the order of the operations of the process as illustrated in FIG. 13 and described below is not intended to be limiting.
In 1310, the processing device 110 (e.g., the obtaining module 1210) may obtain data to be examined. As used herein, in some cases, the term “examine” may incorporate the meaning of “detect” , and vice versa. In some embodiments, the data to be examined may include image data, video data, or the like, or any combination thereof. The image data may include one or more images. The video data may include a plurality of image frames constituting a video. In some embodiments, the processing device 110 may designate the image or the video obtained in real-time by an obtaining apparatus (e.g., a camera) of the terminal 120, or by the image capturing device 130, as the data to be examined. Additionally or alternatively, the processing device 110 may access, via the network 140, the image or the video obtained by the terminal 120 or the image capturing device 130 and previously stored in a storage device (e.g., the storage device 150, the ROM 330, the RAM 340) or the image capturing device 130, and designate it as the data to be examined.
In 1320, the processing device 110 (e.g., the examination module 1220) may examine the data to be examined and output an examination result. As used herein, the examination result may include labelling information of a target to be examined and labelling information of an interference subject associated with the target to be examined. As used herein, the target to be examined may include a target that needs to be obtained from the data to be examined. A user of the target detection system 100 may be interested in the target to be examined. The interference subject may include a background target that has been erroneously identified as the target to be examined in a target examination process. The background target may include any target other than the target to be examined that does not need to be examined in the data to be examined. At least a portion of the background targets may have the same and/or similar physical properties (e.g., shape, color, texture, etc. ) as the target to be examined. For example, assuming that a seat belt needs to be examined from an image or a video of an in-vehicle scene, the processing device 110 may designate the seat belt as the target to be examined, and collectively designate targets other than the seat belt as the background targets. A portion of the background targets may be easily confused with the seat belt since the portion of the background targets may have a similar shape to the seat belt (e.g., a shape of a long strip) . For example, the portion of the background targets may include a tie, a lanyard of an accessory hanging on a rear-view mirror, a light shadow, a lane line, etc. Thus, the portion of the background targets may be likely to be erroneously identified as the seat belt in a target examination process, and the processing device 110 may designate the portion of the background targets as interference subjects of the seat belt.
In some embodiments, the interference subject may include, but is not limited to, a subject that has been erroneously identified as the target to be examined using a second target examination model described in the present disclosure, a subject that has been erroneously identified as the target to be examined using an existing target examination model in a target examination field, a subject that has been erroneously identified as the target to be examined using an examination algorithm in other fields, a subject that has been erroneously identified as the target to be examined manually, or the like, or any combination thereof. For example, the second target examination model disclosed in the present disclosure may erroneously identify the seat belt as the tie (i.e., the target to be examined) during a target examination process. The processing device 110 may designate the seat belt as the interference subject of the tie (i.e., the target to be examined) . As another example, the existing target examination model, e.g., an OverFeat model, may erroneously identify the seat belt as the tie (i.e., the target to be  examined) during a target examination process. The processing device 110 may designate the seat belt as the interference subject of the tie (i.e., the target to be examined) . As a further example, the seat belt may be erroneously identified as the tie (i.e., the target to be examined) using an adaptive boosting (Adaboost) algorithm during a target examination process. The processing device 110 may designate the seat belt as the interference subject of the tie (i.e., the target to be examined) . As a further example, when the target is classified or identified manually, the seat belt may be erroneously identified as the tie (i.e., the target to be examined) . The processing device 110 may designate the seat belt as the interference subject of the tie (i.e., the target to be examined) . For illustration purpose, the existing target examination model may also include a Deformable Part Model (DMP) , an OverFeat model, a Region-Convolutional Neural Network (R-CNN) model, a Spatial Pyramid Pooling Network (SPP-Net) model, a Fast R-CNN model, a Faster R-CNN model, a Region-based Fully Convolutional Network (R-FCN) model, a Deeply Supervised Object Detector (DSOD) model, or the like, or any combination thereof. The examination algorithm may also include a Support Vector Machine (SVM) algorithm, a Single Shot MultiBox Detector (SSD) algorithm, a you-only-look-once (YOLO) algorithm, or the like, or any combination thereof.
In some embodiments, there may be one or more targets to be examined. For example, when an image or video stream of an in-vehicle scene is examined, the target to be examined may include a seat belt, a face of a driver, a face of a passenger, or the like, or any combination thereof. The number of interference subjects of each target to be examined may be different and may depend on properties (e.g., physical properties) of the target to be examined. For example, if the target to be examined is the seat belt, the interference subjects of the seat belt may include a tie, a lanyard of an accessory hanging on a rear-view mirror, a light shadow, a lane line, or the like, or any combination thereof. As another example, if the target to be examined is the face of the driver, the interference subject may include a face of the passenger, a face of a pedestrian, or the like, or any combination thereof.
As used herein, the labelling information of the target to be examined may include a category, a score, location information of the target to be examined, etc. The labelling information of the interference subject may include a category, a score, location information of the interference subject, etc. The processing device 110 may label the category of the target to be examined or the interference subject with an examination box. As used herein, the examination box may refer to a bounding box of a subject, and an examination subject of the data to be examined may be included in the examination box. As used herein, the processing device 110 may collectively designate the targets to be examined, the interference subjects of the targets to be examined, and the background targets as the examination subjects. As used herein, the score may refer to a probability that a subject in an examination box is identified as a target to be examined or an interference subject. The score may be 1, 0.99, 0.98, 0.97, or any other value in the range of 0 to 1. For example, assuming that a score corresponding to a target within an examination box in the examination result is 0.98, it may indicate that the target within the examination box has a probability of 98% of being the target to be examined.
In some embodiments, the examination result may include one or more examination boxes. Each of the examination subjects may be included in one of the one or more examination boxes. In some embodiments, the examination result (or the labelling information) may also include information (also referred to as a “category identifier” ) for distinguishing the category of the target to be examined from the category of the interference subject. In some embodiments, the category identifier may include examination boxes (also referred to as “labelling boxes” ) with different colors or different shapes. Each of the different colors or different shapes may represent one category. For example, the processing device 110 may use a green examination box to label the category of the target to be examined, and use a red examination box to label the category of the interference subject. As another example, the processing device 110 may add a text near the examination box to indicate whether the subject in the examination box is the target to be examined or the interference subject.
As used herein, the location information may include a coordinate or a set of coordinates (also referred to as a “coordinate set” ) of the target to be examined and a coordinate or a coordinate set of the interference subject. For example, the coordinate may include a coordinate of a center of the target to be examined or a center of the interference subject. The coordinate set may include a plurality of coordinates associated with a plurality of locations of the target to be examined or a plurality of locations of the interference subject, e.g., a center, a vertex, a location of a boundary of the target to be examined or the interference subject, etc.
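For illustration only, the labelling information described above may be represented, for example, by a simple data structure such as the following Python sketch; the field names are illustrative assumptions rather than terms defined by the present disclosure.

```python
# Minimal sketch of labelling information: category, category identifier, score,
# and a location given as either a single center coordinate or a coordinate set.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Coordinate = Tuple[float, float]

@dataclass
class LabellingInformation:
    category: str                        # e.g., "seat belt" or "tie"
    is_interference: bool                # category identifier: target vs. interference subject
    score: float                         # probability in [0, 1]
    center: Optional[Coordinate] = None  # single coordinate of the subject's center
    coordinate_set: List[Coordinate] = field(default_factory=list)  # vertices, boundary points, etc.

# Example: a seat belt (target) and a tie (interference subject) in one image.
result = [
    LabellingInformation("seat belt", is_interference=False, score=0.98,
                         center=(320.0, 260.0)),
    LabellingInformation("tie", is_interference=True, score=0.85,
                         coordinate_set=[(300, 150), (360, 150), (360, 400), (300, 400)]),
]
```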
In some embodiments, the examination result may be obtained manually. In some embodiments, the processing device 110 may obtain the examination result using the second target examination model. The processing device 110 may input the data to be examined into the second target examination model to obtain the examination result. The processing device 110 may obtain the second target examination model by training a relevant model (e.g., a preliminary target examination model described in connection with FIGs. 14-15) using training data. The relevant model may include a classical learning model in the target examination field. For example, the relevant model may include a Deformable Part Model (DMP) , an OverFeat model, a Region-Convolutional Neural Network (R-CNN) model, a Spatial Pyramid Pooling Network (SPP-Net) model, a Fast R-CNN model, a Faster R-CNN model, a Region-based Fully Convolutional Network (R-FCN) model, a Deeply Supervised Object Detector (DSOD) model, or the like, or any combination thereof.
In some embodiments, the training data of the second target examination model may include image data, video data, or a combination thereof. The training data may include labelling information of training targets to be examined and labelling information of training interference subjects of the training targets to be examined. In some embodiments, for each image and/or each image frame in each video in the training data, an identifier may be used to identify location information and category information of the training targets to be examined and the training interference subjects. For example, the identifier may include a labelling box, a text, an arrow, or the like, or any combination thereof. As used herein, the location information may include a coordinate or a set of coordinates (also referred to as a “coordinate set” ) of the training target to be examined or a coordinate or a coordinate set of the training interference subject of the training target to be examined in the image. For example, the coordinate may include a coordinate of a center of the training target to be examined or a center of the training interference subject. The coordinate set may include a plurality of coordinates associated with a plurality of locations of the training target to be examined or a plurality of locations of the training interference subject, e.g., a center, a vertex, a location of a boundary of the training target to be examined or the training interference subject, etc.
As used herein, the category information may include information (also referred to as a “category identifier” ) for distinguishing the training target to be examined and the training interference subject of the training target to be examined. For example, the location information of the training target to be examined and the location information of the training interference subject of the training target to be examined may be respectively distinguished by examination boxes (also referred to as “labelling boxes” ) with different colors or different shapes. Additionally or alternatively, a text may be added near the examination box (also referred to as a “labelling box” ) to indicate whether the subject in the examination box is the training target to be examined or the training interference subject. The processing device 110 may collectively designate the location information and the category information (including the identifier, the category, and the score) as the labelling information. For example, for an image of an in-vehicle scene, assuming that the training target to be examined is a seat belt, and the training interference subject is a tie, the seat belt may be highlighted using a first labelling box (e.g., a green rectangular examination box) in the image. Coordinates of a range of a subject within the first labelling box may include a coordinate set of the seat belt. The tie may be highlighted using a second labelling box (e.g., a red rectangular examination box) in the image. Coordinates of a range of a subject within the second labelling box may include a coordinate set of the tie. The training target to be examined and the training interference subject of the training target to be examined may be labelled manually or by a high-precision classifier, and the present disclosure may be non-limiting. For example, the high-precision classifier may include a deformable part classifier, an OverFeat classifier, a Region-Convolutional Neural Network (R-CNN) classifier, a Spatial Pyramid Pooling Network (SPP-Net) classifier, a Fast R-CNN classifier, a Faster R-CNN classifier, a Region-based Fully Convolutional Network (R-FCN) classifier, a Deeply Supervised Object Detector (DSOD) classifier, or the like, or any combination thereof. A detailed description of generating the second target examination model can be found elsewhere in the present disclosure, e.g., FIG. 14 and the descriptions thereof, and is not repeated here.
In some embodiments, the processing device 110 may also filter the targets to be examined from the examination result based on the category of each of the examination subjects in the examination result. The category of an examination subject may be obtained by identifying the examination subject within the identifier. For example, the examination subject may be identified by a machine identification method (e.g., by the second target examination model described in the present disclosure) or manually. The processing device 110 may obtain the targets to be examined after the identification.
It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure.
FIG. 14 is a flowchart illustrating a process for generating a second target examination model according to some embodiments of the present disclosure. In some embodiments, one or more operations of process 1400 may be executed by the target detection system 100. For example, the process 1400 may be implemented as a set of instructions (e.g., an application) stored in a storage device (e.g., the storage device 150, the ROM 330, the RAM 340) and invoked and/or executed by a processing device (e.g., the processing device 110, the processor 320 of the computing device 300, and/or the modules illustrated in FIG. 12) . In some embodiments, the instructions may be transmitted in the form of electronic current or electrical signals. The operations of the illustrated process presented below are intended to be illustrative. In some embodiments, the process 1400 may be accomplished with one or more additional operations not described and/or without one or more of the operations discussed herein. Additionally, the order of the operations of the process as illustrated in FIG. 14 and described below is not intended to be limiting.
In 1410, the processing device 110 (e.g., the model training module 1230, the preliminary training unit 1232) may obtain preliminary training data (also referred to as “preliminary training samples” ) . In some embodiments, similar to the data to be examined, the preliminary training data may include image data, video data, or the like, or a combination thereof. As used herein, the image data may include a plurality of images. The video data may include a plurality of images of video frames. The processing device 110 may pre-obtain the preliminary training data or obtain the preliminary training data in real-time. In some embodiments, the processing device 110 may number the images and/or videos included in the preliminary training data using symbols, e.g., characters, letters, digits, codes, or a combination thereof.
In some embodiments, the preliminary training data may include labelling information of preliminary training targets to be examined. For illustration purpose, for any image of the image data or any image frame of the video, each category of the  preliminary training targets to be examined may be labelled with an identifier, such as a rendered box. For example, the identifier may include a square rendered box, a rectangular rendered box, a circular rendered box, a boundary rendered box, or the like, or any combination thereof. The identifier may also display a location (e.g., coordinate information, coordinate set information) of the preliminary training target to be examined in the image. For example, a first preliminary training target to be examined may be labelled with and included in a green rectangular rendered box. A second preliminary training target to be examined may be labelled with and included in a green circular rendered box. In some embodiments, the preliminary training targets to be examined may be labelled manually or by a high-precision classifier, and the present disclosure may be non-limiting. As described elsewhere in the present disclosure, the high-precision classifier may include a deformable part classifier, an OverFeat classifier, a Region-Convolutional Neural Network (R-CNN) classifier, a Spatial Pyramid Pooling Network (SPP-Net) classifier, a Fast R-CNN classifier, a Faster R-CNN classifier, a Region-based Fully Convolutional Network (R-FCN) classifier, a Deeply Supervised Object Detector (DSOD) classifier, or the like, or any combination thereof.
In some embodiments, the preliminary training data may also include labelling information of priori interference subjects of the preliminary training targets to be examined. The priori interference subject may refer to a predetermined interference subject of a preliminary training target to be examined. For example, assuming that the preliminary training target to be examined is a tie, a subject (e.g., a seat belt) having the same and/or similar properties as the tie may be designated as the priori interference subject of the tie and marked in the preliminary training data. The process for obtaining the priori interference subjects may be based on a statistical analysis of examination results of an existing target examination algorithm and/or model, or the like, and the present disclosure may be non-limiting. The process for labelling the priori interference subjects may be similar to the process for labelling the preliminary training targets to be examined described above, and is not repeated here.
In 1420, the processing device 110 (e.g., the model training module 1230, the preliminary training unit 1232) may train a preliminary model to obtain a preliminary target examination model using the preliminary training data. In some embodiments, the preliminary model may include a classical learning model. For example, the classical learning model may include a Deformable Part Model (DMP) , an OverFeat model, a Region-Convolutional Neural Network (R-CNN) model, a Spatial Pyramid Pooling Network (SPP-Net) model, a Fast R-CNN model, a Faster R-CNN model, a Region-based Fully Convolutional Network (R-FCN) model, a Deeply Supervised Object Detector (DSOD) model, a convolutional neural network model, an adaptive boosting model, a gradient boosting decision tree, or the like, or any combination thereof.
In some embodiments, the processing device 110 may input the preliminary training data into the preliminary model and determine preliminary training examination results. Further, the processing device 110 may determine whether the preliminary training examination results satisfy a predetermined condition. In response to a determination that the preliminary training examination results satisfy the predetermined condition, the processing device 110 may terminate the training and designate the preliminary model as the preliminary target examination model. In response to a determination that the preliminary training examination results do not satisfy the predetermined condition, the processing device 110 may adjust parameters of the preliminary model, and continue the training. The processing device 110 may generate an updated preliminary model and determine updated preliminary training examination results associated with the updated preliminary model.
In response to a determination that the updated preliminary training examination results satisfy the predetermined condition, the processing device 110 may terminate the training and designate the updated preliminary model as the preliminary target  examination model. In response to a determination that the updated preliminary training examination results do not satisfy the predetermined condition, the processing device 110 may continue to adjust parameters of the updated preliminary model and continue the training until newly determined preliminary training examination results associated with a newly determined updated preliminary model satisfy the predetermined condition.
Merely by way of example, the parameters of the (updated) preliminary model may include a learning rate, a hyper parameter, a weight matrix, a bias vector, etc. The predetermined condition may include the number of preliminary training samples reaching a predetermined threshold, a precision rate of the (updated) preliminary model being greater than a predetermined precision threshold, a value of a loss function of the (updated) preliminary model being less than a predetermined value, etc. In some embodiments, the processing device 110 may perform further processing, e.g., performing a performance test, for the preliminary target examination model. Specifically, the processing device 110 may validate the preliminary target examination model using preliminary validating data. The process for validating the preliminary target examination model using the preliminary validating data may be similar to the process for validating the second target examination model described in operation 1460, and is not repeated here.
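For illustration only, the iterative training of operation 1420 may be sketched in Python as follows. The callables train_one_round, precision_of, and loss_of are hypothetical placeholders supplied by the caller, and the thresholds are illustrative stand-ins for the predetermined condition described above.

```python
# Minimal sketch of operation 1420: train, evaluate the predetermined condition
# (a precision threshold or a loss threshold), and keep adjusting parameters
# until the condition is satisfied.
def train_until_condition(model,
                          preliminary_training_data,
                          train_one_round,       # (model, data) -> model with adjusted parameters
                          precision_of,          # (model, data) -> precision rate in [0, 1]
                          loss_of,               # (model, data) -> value of the loss function
                          precision_threshold=0.95,
                          loss_threshold=0.05,
                          max_rounds=100):
    for _ in range(max_rounds):
        model = train_one_round(model, preliminary_training_data)
        if (precision_of(model, preliminary_training_data) > precision_threshold
                or loss_of(model, preliminary_training_data) < loss_threshold):
            # Predetermined condition satisfied: designate the model as the
            # preliminary target examination model.
            break
    return model
```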
In 1430, the processing device 110 (e.g., the model training module 1230, the model examination unit 1234) may use the preliminary target examination model to perform target examination and output preliminary target examination results. Before performing the target examination, the processing device 110 may obtain preliminary data to be examined (also referred to as “preliminary examination samples” ) . Similar to the data to be examined or the preliminary training data, the preliminary data to be examined may include image data and/or video data. The preliminary data to be examined may be different from the data to be examined or the preliminary training data.
In some embodiments, the processing device 110 may obtain the preliminary data to be examined from the image capturing device 130 via the network 140 in real-time.  Additionally or alternatively, the processing device 110 may obtain the preliminary data to be examined from a storage device (e.g., the storage device 150, the ROM 330, the RAM 340) described elsewhere in the present disclosure or an external storage device.
The preliminary target examination results may include outputs of the preliminary target examination model for input data, i.e., the preliminary data to be examined. In some embodiments, the preliminary target examination results may include one or more examination boxes added in the input data. Each of the examination boxes may include an examination subject, e.g., a preliminary target to be examined, a preliminary background target of the preliminary target to be examined, or a preliminary interference subject of the preliminary target to be examined. The preliminary target examination results may also not include any examination box added in the input data, i.e., the input data and the output of the preliminary model may be the same. In some embodiments, the preliminary target examination results may also include a score indicating a probability that the subject in the examination box is examined as the preliminary target to be examined. The score may be 1, 0.99, 0.98, 0.97, or any other value in the range of 0 to 1. For example, if the score is 0.98, it may indicate that the subject in the examination box has a probability of 98% of being the preliminary target to be examined.
In 1440, the processing device 110 (e.g., the model training module 1230, the second training unit 1236) may obtain training data (also referred to as “training samples” ) . The training data may include labelling information (e.g., a category identifier, a category, a score, location information) of a training target to be examined and labelling information (e.g., a category identifier, a category, a score, location information) of a training interference subject associated with the training target to be examined.
In some embodiments, the processing device 110 may generate at least a portion of training data based on the preliminary data to be examined and the preliminary target examination results. A second portion of the training data may be generated manually and the processing device 110 may obtain the second portion of the training data. The  processing device 110 may generate a third portion of training data based on an existing target examination model or an existing target examination algorithm described in connection with FIG. 13.
Since subjects that have the same characteristics as the target (also referred to as “interference subjects” ) can interfere with the examination of the preliminary targets to be examined, the preliminary target examination model may be likely to erroneously examine the preliminary interference subjects as the preliminary targets to be examined. In some embodiments, a portion of the preliminary target examination results may be incorrect for various reasons, e.g., the structures, parameters, or training conditions of the preliminary target examination model. The processing device 110 may analyze the preliminary target examination results, obtain preliminary interference subjects of each preliminary target to be examined, and then generate the at least a portion of the training data.
In some embodiments, the processing device 110 may rank the preliminary interference subjects of the preliminary target to be examined based on a strategy. For illustration purpose, the strategy may be associated with scores of the preliminary target examination results, the number of a category of preliminary examination samples erroneously identified as a preliminary target to be examined, or the like, or a combination thereof. As used herein, the score may refer to a probability that a subject is examined as a preliminary target to be examined. The score may be any value in the range of 0 to 1. For example, assuming that the preliminary target to be examined is a seat belt, and that, for a preliminary examination sample, the preliminary target examination model examines a tie as the seat belt with a probability of 85% that the subject (i.e., the tie) is the seat belt, then 85% may be the score of the preliminary target examination result of the preliminary examination sample.
As used herein, the number of the category of preliminary examination samples erroneously identified as the preliminary target to be examined may refer to the number of times that a subject is erroneously examined as a preliminary target to be examined while the preliminary target examination model examines the preliminary examination samples. For example, assuming that the preliminary target to be examined is a seat belt, the number of preliminary examination samples with incorrect results is 5, and the preliminary target examination model examines a tie as the seat belt for 3 preliminary examination samples, a lanyard as the seat belt for 1 preliminary examination sample, and a light shadow as the seat belt for 1 preliminary examination sample, the numbers for the tie, the lanyard, and the light shadow may be 3, 1, and 1, respectively.
In some embodiments, the processing device 110 may rank the preliminary interference subjects according to the scores or the numbers, e.g., in descending order, to obtain a preliminary ranking result. The preliminary ranking result may represent a degree (also referred to as a “degree of false examination” ) to which the preliminary interference subject is erroneously examined as the preliminary target to be examined. For example, assuming that the preliminary target to be examined is a seat belt, and the preliminary ranking result is a tie, a lanyard, and a light shadow arranged in descending order according to the numbers, the tie may have the highest degree of false examination, followed by the lanyard and the light shadow.
In some embodiments, the processing device 110 may generate a set of the preliminary interference subjects (also referred to as “preliminary interference subject set” ) of the preliminary target to be examined by selecting the preliminary interference subjects partially or in whole according to the preliminary ranking result. For example, the processing device 110 may select the first two preliminary interference subjects in the preliminary ranking result, or select all of the preliminary interference subjects in the preliminary ranking result to generate the preliminary interference subject set. In some embodiments, the number of selected preliminary interference subjects may be a default value, or adjusted according to different preliminary targets to be examined. In some embodiments, the processing device 110 may express the preliminary interference subject set in a similar form of a vector, e.g., [ (seat belt, tie, lanyard) ] , or in the form of a list. The  expression of the preliminary interference subject set may be non-limiting in the present disclosure.
Accordingly, the at least a portion of the training data may include the labelling information of the priori interference subjects, the labelling information of the preliminary interference subject set, and the labelling information of the preliminary targets to be examined.
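For illustration only, the ranking strategy based on the numbers of false examinations may be sketched in Python as follows; the function name and the top-k selection count are illustrative assumptions, and the example values follow the seat-belt example above.

```python
# Minimal sketch: count how often each background subject is erroneously examined
# as the preliminary target, rank the counts in descending order, and keep the
# top-k subjects as the preliminary interference subject set.
from collections import Counter

def interference_subject_set(false_examinations, top_k=2):
    """false_examinations: list of category names that the preliminary target
    examination model erroneously examined as the preliminary target."""
    ranking = Counter(false_examinations).most_common()  # descending by count
    return [category for category, _count in ranking[:top_k]]

# Example from the text: 5 incorrect results for the target "seat belt".
false_hits = ["tie", "tie", "tie", "lanyard", "light shadow"]
print(interference_subject_set(false_hits, top_k=2))  # ['tie', 'lanyard']
```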
In 1450, the processing device 110 (e.g., the model training module 1230, the second training unit 1236) may train the preliminary target examination model using the training data and obtain a second target examination model. The processing device 110 may set parameters (e.g., a learning rate, a hyper parameter, a weight matrix, a bias vector) of the preliminary target examination model and output training target examination results of the training data. Further, the processing device 110 may determine whether the training target examination results satisfy a predetermined condition. In response to a determination that the training target examination results satisfy the predetermined condition, the processing device 110 may terminate the training and designate the preliminary target examination model as the second target examination model. In response to a determination that the training target examination results do not satisfy the predetermined condition, the processing device 110 may continue to adjust the parameters. Further, the processing device 110 may generate an updated preliminary target examination model and determine updated training target examination results associated with the updated preliminary target examination model.
In response to a determination that the updated training target examination results satisfy the predetermined condition, the processing device 110 may terminate the training and designate the updated preliminary target examination model as the second target examination model. In response to a determination that the updated training target examination results do not satisfy the predetermined condition, the processing device 110 may continue to adjust parameters of the updated preliminary target examination model and continue the training until newly determined training target examination results  associated with a newly determined updated preliminary target examination model satisfy the predetermined condition.
Merely by way of example, the predetermined condition may include the number of the training samples reaching a predetermined threshold, a precision rate of the (updated) preliminary target examination model being greater than a predetermined precision threshold, a value of a loss function of the (updated) preliminary target examination model being less than a predetermined value, etc. The precision rate may refer to a ratio of the number of training samples whose training target examination results are correct and include a training target to be examined to a total number of the training samples.
In some embodiments, the processing device 110 may validate the second target examination model, e.g., a precision rate of the second target examination model. The processing device 110 may determine validating examination results of validating data (also referred to as “validating samples” ) using the second target examination model, and determine the precision rate based on the validating examination results of the validating data. As used herein, the precision rate may refer to a ratio of the number of validating samples whose validating results are correct and include a validating target to be examined to a total number of the validating samples. For example, assuming that the validating samples include 100 images, a validating target to be examined is a seat belt, and the second target examination model correctly examines the seat belt from 97 images, the precision rate may be 97%. In some embodiments, the total number of the validating samples may be a default setting or adjusted according to practical demands.
Similar to the training data, the validating data may include image data, video data, etc. The validating data may include labelling information of a validating target to be examined and labelling information of a validating interference subject. In some embodiments, the validating data and the training data may be different and not have the same data.
In some embodiments, the processing device 110 may determine whether the precision rate is greater than or equal to a predetermined threshold. In response to a determination that the precision rate is greater than or equal to the predetermined threshold, the processing device 110 may consider that the precision rate of the second target examination model satisfies a predetermined requirement. The processing device 110 may use the second target examination model to perform target examination (e.g., the target examination described in operation 1320) . In response to a determination that the precision rate is smaller than the predetermined threshold, the processing device 110 may consider that the precision rate of the second target examination model does not satisfy the predetermined requirement, and the processing device 110 may obtain new training data to further train and update the second target examination model until a newly determined precision rate of a newly determined updated second target examination model satisfies the predetermined requirement. In some embodiments, the predetermined threshold may be a default setting or adjusted according to practical demands. In some embodiments, a precision rate for any validating target to be examined may be the same or different. For example, a precision rate of a validating target that is hard to examine may be smaller than a precision rate of a validating target that is easy to examine. Thus, the predetermined threshold may be adjusted accordingly.
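For illustration only, the validation step may be sketched in Python as follows; is_correct is a hypothetical callable that compares a validating examination result with its label, and the 97% threshold is an illustrative default.

```python
# Minimal sketch of the validation step: compute the precision rate as the ratio
# of validating samples whose results are correct and include the validating
# target, then decide whether further training is needed.
def precision_rate(validating_results, validating_labels, is_correct):
    correct = sum(1 for result, label in zip(validating_results, validating_labels)
                  if is_correct(result, label))
    return correct / len(validating_results) if validating_results else 0.0

def needs_more_training(validating_results, validating_labels, is_correct,
                        predetermined_threshold=0.97):
    # Example from the text: 97 correct examinations out of 100 images gives 97%.
    return precision_rate(validating_results, validating_labels,
                          is_correct) < predetermined_threshold
```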
It should be noted that the above description is merely provided for the purposes of illustration, and not intended to limit the scope of the present disclosure. For persons having ordinary skills in the art, multiple variations and modifications may be made under the teachings of the present disclosure. However, those variations and modifications do not depart from the scope of the present disclosure. For example, the processing device 110 may update the second target examination model at a certain time interval (e.g., per month, per two months) based on newly obtained training data.
Having thus described the basic concepts, it may be rather apparent to those skilled in the art after reading this detailed disclosure that the foregoing detailed disclosure  is intended to be presented by way of example only and is not limiting. Various alterations, improvements, and modifications may occur and are intended to those skilled in the art, though not expressly stated herein. These alterations, improvements, and modifications are intended to be suggested by this disclosure, and are within the spirit and scope of the exemplary embodiments of this disclosure.
Moreover, certain terminology has been used to describe embodiments of the present disclosure. For example, the terms “one embodiment, ” “an embodiment, ” and/or “some embodiments” mean that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Therefore, it is emphasized and should be appreciated that two or more references to “an embodiment, ” “one embodiment, ” or “an alternative embodiment” in various portions of this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures or characteristics may be combined as suitable in one or more embodiments of the present disclosure.
Further, it will be appreciated by one skilled in the art that aspects of the present disclosure may be illustrated and described herein in any of a number of patentable classes or contexts including any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof. Accordingly, aspects of the present disclosure may be implemented entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc. ) , or by combining software and hardware implementations that may all generally be referred to herein as a "block, " “module, ” “engine, ” “unit, ” “component, ” or “system. ” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable media having computer readable program code embodied thereon.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including  electro-magnetic, optical, or the like, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that may communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable signal medium may be transmitted using any appropriate medium, including wireless, wireline, optical fiber cable, RF, or the like, or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Scala, Smalltalk, Eiffel, JADE, Emerald, C++, C#, VB. NET, Python or the like, conventional procedural programming languages, such as the “C” programming language, Visual Basic, Fortran 2003, Perl, COBOL 2002, PHP, ABAP, dynamic programming languages such as Python, Ruby and Groovy, or other programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN) , or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider) or in a cloud computing environment or offered as a service such as a software as a service (SaaS) .
Furthermore, the recited order of processing elements or sequences, or the use of numbers, letters, or other designations, therefore, is not intended to limit the claimed processes and methods to any order except as may be specified in the claims. Although the above disclosure discusses through various examples what is currently considered to be a variety of useful embodiments of the disclosure, it is to be understood that such detail is solely for that purpose, and that the appended claims are not limited to the disclosed embodiments, but, on the contrary, are intended to cover modifications and equivalent arrangements that are within the spirit and scope of the disclosed embodiments. For example, although the implementation of various components described above may be embodied in a hardware device, it may also be implemented as a software-only solution, e.g., an installation on an existing server or mobile device.
Similarly, it should be appreciated that in the foregoing description of embodiments of the present disclosure, various features are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various embodiments. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed subject matter requires more features than are expressly recited in each claim. Rather, claimed subject matter may lie in less than all features of a single foregoing disclosed embodiment.
STATEMENT OF INVENTION
1. A target detection method, comprising:
obtaining a target detection model, wherein the target detection model is generated based on supervised labelled image data and unsupervised labelled image data of pre-training image data,
wherein the target detection model is configured to be a detection model for targets, and the supervised labelled image data and the unsupervised labelled image data are used to indicate attribute information of the targets in the pre-training image data;
determining image data-to-be-detected; and
executing the target detection model on the image data-to-be-detected to generate attribute information of the targets in the image data-to-be-detected.
2. The method of item 1, the obtaining the target detection model comprising:
generating an initial seed model based on the supervised labelled image data;
preprocessing the pre-training image data based on the initial seed model to generate unsupervised labelled image data; and
generating the target detection model based on the supervised labelled image data and the unsupervised labelled image data.
3. The method of item 2, further comprising:
before the target detection model is generated based on supervised labelled image data and unsupervised labelled image data of pre-training image data, performing a filtering process on the unsupervised labelled image data to obtain retained first unsupervised labelled image data, wherein the retained first unsupervised labelled image data satisfy a first preset condition; and
generating the target detection model based on supervised labelled image data and the first unsupervised labelled image data.
4. The method of item 3, wherein the first preset condition includes at least one of:
an average output score of the initial seed model being greater than a first threshold;
a size ratio of boundaries of a detection region satisfying a preset ratio;
a color of a detection region satisfying a preset color requirement; or
an angle of a target satisfying a preset angle.
5. The method of item 3, the generating the target detection model based on supervised labelled image data and the first unsupervised labelled image data comprising:
step A: generating a seed model based on the supervised labelled image data and the first unsupervised labelled image data;
step B: preprocessing the pre-training image data based on the seed model to generate updated first unsupervised labelled image data;
step C: training the seed model based on the supervised labelled image data and the updated first unsupervised labelled image data to generate an updated seed model; and
if the updated first unsupervised labelled image data does not satisfy a second preset condition,
designating the updated first unsupervised labelled image data as the first unsupervised labelled image data;
designating the updated seed model as the seed model;
performing step A and step B iteratively until the updated first unsupervised labelled image data satisfy the second preset condition; and
designating the updated seed model as the target detection model.
6. The method of item 5, wherein the second preset condition includes at least one of:
a count of the iterations being greater than a second threshold; or
an average score corresponding to the updated first unsupervised labelled image data being greater than a third threshold.
7. A target detection apparatus, comprising:
an acquisition unit, configured to obtain a target detection model, wherein the target detection model is generated based on supervised labelled image data and unsupervised labelled image data of pre-training image data, wherein
the target detection model indicates a detection model for targets and the supervised labelled image data and the unsupervised labelled image data indicate attribute information of the targets in the pre-training image data;
a determination unit, configured to determine image data-to-be-detected; and
a detection unit, configured to execute the target detection model on the image data-to-be-detected to generate attribute information of the targets in the image data-to-be-detected.
8. The apparatus of item 7, wherein the acquisition unit is further configured to:
generate an initial seed model based on the supervised labelled image data;
preprocess the pre-training image data based on the initial seed model to generate unsupervised labelled image data; and
generate the target detection model based on the supervised labelled image data and the unsupervised labelled image data.
9. The apparatus of item 8, further comprising:
a processing unit, configured to perform a filtering process on the unsupervised labelled image data to obtain retained first unsupervised labelled image data, wherein the retained first unsupervised labelled image data satisfy a first preset condition, wherein
the acquisition unit is further configured to generate the target detection model based on supervised labelled image data and the first unsupervised labelled image data.
10. The apparatus of item 9, wherein the first preset condition includes at least one of:
an average output score of the initial seed model being greater than a first threshold;
a size ratio of boundaries of a detection region satisfying a preset ratio;
a color of a detection region satisfying a preset color requirement; or
an angle of a target satisfying a preset angle.
11. The apparatus of item 9, wherein the acquisition unit is further configured to:
perform a step A: generating a seed model based on the supervised labelled image data and the first unsupervised labelled image data;
perform a step B: preprocessing the pre-training image data based on the seed model to generate updated first unsupervised labelled image data;
perform a step C: training the seed model based on the supervised labelled image data and the updated first unsupervised labelled image data to generate an updated seed model; and
if the updated first unsupervised labelled image data does not satisfy a second preset condition,
designate the updated first unsupervised labelled image data as the first unsupervised labelled image data;
designate the updated seed model as the seed model;
perform step A and step B iteratively until the updated first unsupervised labelled image data satisfy the second preset condition; and
designate the updated seed model as the target detection model.
12. The apparatus of item 11, wherein the second preset condition includes at least one of:
a count of the iterations being greater than a second threshold; or
an average score corresponding to the updated first unsupervised labelled image data being greater than a third threshold.
13. A computer device, comprising:
at least one computer-readable storage medium including a set of instructions; and
at least one processor in communication with the at least one computer-readable storage medium, wherein when executing the set of instructions, the at least one processor is directed to perform the target detection method of any one of items 1-6.
14. A non-transitory computer-readable storage medium embodying a computer program product, the computer program product comprising instructions and configured to cause a computing device to perform the target detection method of any one of items 1-6.
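
The semi-supervised training flow recited in items 2 through 6 above (and mirrored in claims 5-10 and 18-23 below) can be summarized, purely for illustration, by the following Python-style sketch. Every name in the sketch (PseudoLabel, train_detector, label_images, the threshold values, and so on) is a hypothetical placeholder introduced here only for readability and forms no part of the disclosure; the sketch merely restates the recited flow: train an initial seed model on the supervised labelled image data, use it to label the pre-training image data, retain only labels that satisfy a first preset condition, retrain, and stop once a second preset condition (an iteration count or an average score threshold) is met.

    from dataclasses import dataclass
    from typing import Callable, List, Sequence

    # Illustrative sketch only; all names are hypothetical placeholders and do not
    # limit or define the claimed subject matter.

    @dataclass
    class PseudoLabel:
        """An unsupervised label produced by a seed model for one pre-training image."""
        image_id: str
        box: tuple        # (x, y, width, height) of the detection region
        category: str
        score: float      # detection confidence reported by the seed model

    def satisfies_first_condition(label: PseudoLabel,
                                  score_threshold: float = 0.8,
                                  aspect_range: tuple = (0.2, 5.0)) -> bool:
        """First preset condition (item 4): retain a label only if its output score
        and the size ratio of its detection region meet preset requirements.
        Colour and angle checks would be added analogously."""
        _, _, width, height = label.box
        aspect = width / height if height else 0.0
        return (label.score >= score_threshold
                and aspect_range[0] <= aspect <= aspect_range[1])

    def train_target_detection_model(
            supervised_data: Sequence,          # supervised labelled image data
            pretraining_images: Sequence,       # pre-training image data
            train_detector: Callable,           # any detector training routine
            label_images: Callable,             # runs a model over images, returns PseudoLabels
            max_iterations: int = 10,           # second preset condition: iteration count
            avg_score_threshold: float = 0.9):  # second preset condition: average score
        # Item 2: generate an initial seed model from the supervised labelled data.
        seed_model = train_detector(supervised_data)

        for _ in range(max_iterations):
            # Label the pre-training images and keep only labels that satisfy
            # the first preset condition (items 3 and 4).
            pseudo_labels: List[PseudoLabel] = [
                label for label in label_images(seed_model, pretraining_images)
                if satisfies_first_condition(label)
            ]

            # Steps A-C (item 5): retrain on supervised plus retained unsupervised labels.
            seed_model = train_detector(list(supervised_data) + pseudo_labels)

            # Second preset condition (item 6): average score of the retained labels.
            avg_score = sum(l.score for l in pseudo_labels) / max(len(pseudo_labels), 1)
            if avg_score > avg_score_threshold:
                break

        # The most recently trained seed model is designated as the target detection model.
        return seed_model

A stopping test based on a held-out set of testing image data, as recited in claims 10 and 23, could replace or supplement the average-score check in the loop above.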

Claims (36)

  1. A system, comprising:
    at least one computer-readable storage medium including a set of instructions for generating a target detection model; and
    at least one processor in communication with the at least one computer-readable storage medium, wherein when executing the set of instructions, the at least one processor is directed to:
    obtain a set of training image data, including at least a set of supervised labelled image data based on one or more operations and a set of unsupervised labelled image data based on the set of supervised labelled image data; and
    generate a target detection model based on the set of supervised labelled image data and the set of unsupervised labelled image data.
  2. The system of claim 1, wherein the one or more operations are conducted by an operator and the one or more operations include at least one of:
    labelling a region of a target on a first set of pre-training image data;
    providing category information of a target in a first set of pre-training image data; or
    providing content information of a target in a first set of pre-training image data.
  3. The system of any one of claims 1-2, wherein the at least one processor is further directed to:
    input an image to the target detection model to generate attribute information of targets in the image.
  4. The system of claim 3, wherein the attribute information includes at least one of regional information of the targets, category information of the targets, or content information of the targets.
  5. The system of any one of claims 1-4, wherein to generate a set of unsupervised labelled image data based on the set of supervised labelled image data, the at least one processor is directed to:
    generate an initial seed model based on the set of supervised labelled image data; and
    process a second set of pre-training image data using the initial seed model to generate the set of unsupervised labelled image data.
  6. The system of any one of claims 1-5, wherein the at least one processor is further directed to:
    dispose at least one unsupervised labelled image data from the set of unsupervised labelled image data before training the target detection model, wherein the disposed unsupervised sample image does not satisfy at least one first preset condition.
  7. The system of claim 6, wherein the at least one first preset condition includes at least one of:
    an average output score of an initial seed model corresponding to the unsupervised sample image being greater than a first threshold;
    a size of a target detected in the unsupervised labelled image data satisfying a preset size condition;
    a color of a target detected in the unsupervised labelled image data satisfying a preset color condition; or
    an angle of a target detected in the unsupervised labelled image data satisfying a preset angle condition.
  8. The system of any one of claims 1-7, wherein to generate the set of unsupervised labelled image data based on the set of supervised labelled image data, the at least one processor is directed to:
    perform an iterative process for updating the set of unsupervised labelled image data, wherein in each iteration of the iterative process, the at least one processor is directed to:
    generate a seed model in a current iteration based on the set of supervised labelled image data and a set of unsupervised labelled image data generated in the preceding iteration;
    process at least some of the set of training image data based on the seed model in the current iteration to generate a set of unsupervised labelled image data in the current iteration;
    determine whether a second preset condition is satisfied; and
    in response to a determination that the second preset condition is satisfied,
    train the seed model in the current iteration based on the set of supervised labelled image data and the set of unsupervised labelled image data generated in the current iteration to generate a trained seed model;
    terminate the iterative process; and
    designate the trained seed model as the target detection model.
  9. The system of claim 8, wherein the second preset condition includes at least one of:
    a count of iterations of the iterative process exceeding a preset second threshold; or
    an output score of the seed model generated in the current iteration being greater than a preset third threshold.
  10. The system of claim 8, wherein the at least one processor is further directed to:
    in each iteration,
    obtain a set of testing image data;
    obtain a reference detection result corresponding to the set of testing image data; and
    generate a detection result corresponding to the set of testing image data using the seed model generated in the current iteration, wherein the second preset condition includes:
    the detection result corresponding to the set of testing image data generated in the current iteration using the seed model being similar to the reference detection result.
  11. The system of any one of claims 1-10, wherein the at least one processor is further directed to:
    process the set of training image data based on a second target examination model, wherein the second target examination model provides labelling information of the target and labelling information of an interference subject associated with the target; and
    generate the target detection model based on the set of processed training image data.
  12. The system of any one of claims 1-10, wherein the at least one processor is further directed to:
    process the set of supervised labelled image data based on a second target examination model, wherein the second target examination model provides labelling information of the target and labelling information of an interference subject associated with the target;
    obtain a set of processed training image data based on the set of processed supervised labelled image data; and
    generate the target detection model based on the set of processed training image data.
  13. The system of claim 11 or 12, wherein the labelling information of the target includes a category of the target and a score of the target, and the labelling information of the interference subject includes a category of the interference subject and a score of the interference subject.
  14. A method implemented on a computing device having at least one processor, at least one storage medium, and a communication platform connected to a network, the method comprising:
    obtaining a set of training image data, including at least a set of supervised labelled image data based on one or more operations and a set of unsupervised labelled image data based on the set of supervised labelled image data; and
    generating a target detection model based on the set of supervised labelled image data and the set of unsupervised labelled image data.
  15. The method of claim 14, wherein the one or more operations are conducted by an operator and the one or more operations include at least one of:
    labelling a region of a target on a first set of pre-training image data;
    providing category information of a target in a first set of pre-training image data; or
    providing content information of a target in a first set of pre-training image data.
  16. The method of any one of claims 14-15, further comprising:
    inputting an image to the target detection model to generate attribute information of targets in the image.
  17. The method of claim 16, wherein the attribute information includes at least one of regional information of the targets, category information of the targets, or content information of the targets.
  18. The method of any one of claims 14-17, wherein the generating a set of unsupervised labelled image data based on the set of supervised labelled image data includes:
    generating an initial seed model based on the set of supervised labelled image data; and
    processing a second set of pre-training image data using the initial seed model to generate the set of unsupervised labelled image data.
  19. The method of any one of claims 14-18, further comprising:
    disposing at least one unsupervised labelled image data from the set of unsupervised labelled image data before training the target detection model, wherein the disposed unsupervised sample image does not satisfy at least one first preset condition.
  20. The method of claim 19, wherein the at least one first preset condition includes at least one of:
    an average output score of an initial seed model corresponding to the unsupervised sample image being greater than a first threshold;
    a size of a target detected in the unsupervised labelled image data satisfying a preset size condition;
    a color of a target detected in the unsupervised labelled image data satisfying a preset color condition; or
    an angle of a target detected in the unsupervised labelled image data satisfying a preset angle condition.
  21. The method of any one of claims 14-20, wherein the generating the set of unsupervised labelled image data based on the set of supervised labelled image data includes:
    performing an iterative process for updating the set of unsupervised labelled image data, wherein each iteration of the iterative process includes:
    generating a seed model in a current iteration based on the set of supervised labelled image data and a set of unsupervised labelled image data generated in the preceding iteration;
    processing at least some of the set of training image data based on the seed model in the current iteration to generate a set of unsupervised labelled image data in the current iteration;
    determining whether a second preset condition is satisfied; and
    in response to a determination that the second preset condition is satisfied,
    training the seed model in the current iteration based on the set of supervised labelled image data and the set of unsupervised labelled image data generated in the current iteration to generate a trained seed model;
    terminating the iterative process; and
    designating the trained seed model as the target detection model.
  22. The method of claim 21, wherein the second preset condition includes at least one of:
    a count of iterations of the iterative process exceeding a preset second threshold; or
    an output score of the seed model generated in the current iteration being greater than a preset third threshold.
  23. The method of claim 21, further comprising:
    in each iteration,
    obtaining a set of testing image data;
    obtaining a reference detection result corresponding to the set of testing image data; and
    generating a detection result corresponding to the set of testing image data using the seed model generated in the current iteration, wherein the second preset condition includes:
    the detection result corresponding to the set of testing image data generated in the current iteration using the seed model being similar to the reference detection result.
  24. A non-transitory computer readable medium, comprising executable instructions that, when executed by at least one processor, direct the at least one processor to perform a method, the method comprising:
    obtaining a set of training image data, including at least a set of supervised labelled image data based on one or more operations and a set of unsupervised labelled image data based on the set of supervised labelled image data; and
    generating a target detection model based on the set of supervised labelled image data and the set of unsupervised labelled image data.
  25. A target examination method, comprising:
    obtaining data to be examined; and
    examining the data to be examined, outputting an examination result including a category identifier of a target to be examined and a category identifier of an interference subject, and obtaining the target to be examined; wherein:
    the interference subject is erroneously identified as the target to be examined at least once in a target examination process.
  26. The target examination method of claim 25, wherein the data to be examined includes image data, and the examination result is image data that uses different identification modes to identify the target to be examined and the interference subject.
  27. The target examination method of claim 25, wherein the examining the data to be examined includes:
    using a second target examination model to examine the data to be examined; wherein
    the second target examination model is obtained based on a training process below:
    training a preliminary model to generate the second target examination model using training data including labelling information of the target to be examined and labelling information of the interference subject, wherein:
    the interference subject is erroneously identified as the target to be examined at least once in the target examination process.
  28. The target examination method of claim 27, wherein the training process further includes:
    obtaining preliminary training data, and training the preliminary model to obtain a preliminary target examination model using the preliminary training data;
    using the preliminary target examination model to perform target examination and outputting examination results;
    obtaining training data labelled with the interference subject based on the examination results; and
    training the preliminary target examination model using the training data labelled with the interference subject, and obtaining the second target examination model; wherein
    the preliminary training data at least includes labelling information of the target to be examined.
  29. The target examination method of claim 28, the preliminary training data also comprising labelling information of a priori interference subject, wherein
    the a priori interference subject is erroneously identified as the target to be examined at least once in a target examination process other than the training process of the second target examination model.
  30. A target examination system, comprising an obtaining module and an examination module, wherein:
    the obtaining module is configured to obtain data to be examined; and
    the examination module is configured to examine the data to be examined, output an examination result including a category identifier of a target to be examined and a category identifier of an interference subject, and obtain the target to be examined; and wherein:
    the interference subject is erroneously identified as the target to be examined at least once in a target examination process.
  31. The target examination system of claim 30, wherein the data to be examined includes image data, and the examination result is image data that uses different identification modes to identify the target to be examined and the interference subject.
  32. The target examination system of claim 30, the examination module being further configured to use a second target examination model to examine the data to be examined, and the system further comprising a training module, wherein:
    the training module is configured to train a preliminary model to generate the second target examination model using training data including labelling information of the target to be examined and labelling information of the interference subject, and wherein:
    the interference subject is erroneously identified as the target to be examined at least once in the target examination process.
  33. The target examination system of claim 32, the training module further comprising:
    a preliminary training unit configured to train the preliminary model to obtain a preliminary target examination model using preliminary training data;
    a model examination unit configured to use the preliminary target examination model to perform target examination and output examination results;
    a second training unit configured to obtain training data labelled with the interference subject based on the examination results and train the preliminary target examination model using the training data labelled with the interference subject to obtain the second target examination model; and wherein
    the preliminary training data at least includes labelling information of the target to be examined.
  34. The target examination system of claim 33, wherein the preliminary training data further includes labelling information of a priori interference subject, and wherein
    the a priori interference subject is erroneously identified as the target to be examined at least once in a target examination process other than the training process of the second target examination model.
  35. A target examination device, comprising at least one processor and at least one storage device, wherein:
    the at least one storage device is configured to store a set of instructions; and
    the at least one processor is configured to execute at least part of the set of instructions to perform the method of any one of claims 25-29.
  36. A non-transitory computer readable medium, comprising executable instructions that, when executed by at least one processor, cause the at least one processor to perform the method of any one of claims 25-29.
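
As a non-limiting sketch of the interference-subject training flow recited in claims 27-29 (and mirrored in claims 32-34), the second target examination model may be viewed as the result of training a preliminary model on data labelled with the target, collecting detections that the preliminary model erroneously identified as the target at least once, labelling those as interference subjects, and retraining on both categories. All names below (build_second_examination_model, run_examination, is_true_target, the attribute names on detections, and so on) are hypothetical placeholders and form no part of the claims.

    # Illustrative sketch only; every name is a hypothetical placeholder and does not
    # limit or define the claimed subject matter.

    TARGET = "target"
    INTERFERENCE = "interference_subject"  # erroneously identified as the target at least once

    def build_second_examination_model(preliminary_training_data,
                                       examination_pool,
                                       train_model,
                                       run_examination,
                                       is_true_target):
        """Sketch of the training process of claims 27-29.

        preliminary_training_data : at least labels the target to be examined
                                    (optionally also a priori interference subjects, claim 29)
        examination_pool          : data on which the preliminary model performs examination
        train_model               : any detector/classifier training routine
        run_examination           : applies a model to the pool, returning detections that
                                    are assumed here to carry .region and .score attributes
        is_true_target            : ground-truth or operator check for a detection
        """
        # Claim 28: train the preliminary model to obtain a preliminary target examination model.
        preliminary_model = train_model(preliminary_training_data)

        # Claim 28: use the preliminary target examination model to perform target examination.
        examination_results = run_examination(preliminary_model, examination_pool)

        # Detections that are not actually the target were, by definition, erroneously
        # identified as the target at least once; label them as interference subjects.
        relabelled = []
        for detection in examination_results:
            category = TARGET if is_true_target(detection) else INTERFERENCE
            relabelled.append({"region": detection.region,
                               "category": category,
                               "score": detection.score})

        # Claim 28: retrain with labelling information of both the target and the interference
        # subject to obtain the second target examination model, whose output includes a
        # separate category identifier for each (claim 25).
        return train_model(list(preliminary_training_data) + relabelled)

At examination time (claims 25 and 26), outputs carrying the interference-subject category identifier can then be suppressed or rendered with a different identification mode, so that only the target to be examined is returned.
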
PCT/CN2019/087015 2018-05-24 2019-05-15 Target detection method and system WO2019223582A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201810510732.3A CN108805180B (en) 2018-05-24 2018-05-24 Target object detection method and device
CN201810510732.3 2018-05-24
CN201810547022.8A CN110555339A (en) 2018-05-31 2018-05-31 target detection method, system, device and storage medium
CN201810547022.8 2018-05-31

Publications (1)

Publication Number Publication Date
WO2019223582A1 true WO2019223582A1 (en) 2019-11-28

Family

ID=68616194

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/087015 WO2019223582A1 (en) 2018-05-24 2019-05-15 Target detection method and system

Country Status (1)

Country Link
WO (1) WO2019223582A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100250473A1 (en) * 2009-03-27 2010-09-30 Porikli Fatih M Active Learning Method for Multi-Class Classifiers
CN102033965A (en) * 2011-01-17 2011-04-27 安徽海汇金融投资集团有限公司 Method and system for classifying data based on classification model
CN102096823A (en) * 2011-02-12 2011-06-15 厦门大学 Face detection method based on Gaussian model and minimum mean-square deviation
CN102508907A (en) * 2011-11-11 2012-06-20 北京航空航天大学 Dynamic recommendation method based on training set optimization for recommendation system
CN108805180A (en) * 2018-05-24 2018-11-13 北京嘀嘀无限科技发展有限公司 The detection method and device of target object

Cited By (45)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113128302B (en) * 2019-12-30 2024-06-11 深圳云天励飞技术有限公司 Image detection method and related product
CN113128302A (en) * 2019-12-30 2021-07-16 深圳云天励飞技术有限公司 Image detection method and related product
CN111325278B (en) * 2020-02-26 2023-08-29 重庆金山医疗技术研究院有限公司 Image processing method, device and storage medium
CN111325278A (en) * 2020-02-26 2020-06-23 重庆金山医疗技术研究院有限公司 Image processing method, device and storage medium
CN111461182A (en) * 2020-03-18 2020-07-28 北京小米松果电子有限公司 Image processing method, image processing apparatus, and storage medium
CN111461182B (en) * 2020-03-18 2023-04-18 北京小米松果电子有限公司 Image processing method, image processing apparatus, and storage medium
CN111488821B (en) * 2020-04-08 2023-09-01 北京百度网讯科技有限公司 Method and device for identifying countdown information of traffic signal lamp
CN111488821A (en) * 2020-04-08 2020-08-04 北京百度网讯科技有限公司 Method and device for identifying traffic signal lamp countdown information
CN111539360A (en) * 2020-04-28 2020-08-14 重庆紫光华山智安科技有限公司 Safety belt wearing identification method and device and electronic equipment
CN111539360B (en) * 2020-04-28 2022-11-22 重庆紫光华山智安科技有限公司 Safety belt wearing identification method and device and electronic equipment
CN111860259A (en) * 2020-07-10 2020-10-30 东莞正扬电子机械有限公司 Training and using method, device, equipment and medium of driving detection model
CN111898547B (en) * 2020-07-31 2024-04-16 平安科技(深圳)有限公司 Training method, device, equipment and storage medium of face recognition model
CN111898547A (en) * 2020-07-31 2020-11-06 平安科技(深圳)有限公司 Training method, device and equipment of face recognition model and storage medium
CN112053325A (en) * 2020-08-12 2020-12-08 华东交通大学 Breast mass image processing and classifying system
CN112001449A (en) * 2020-08-27 2020-11-27 山东超越数控电子股份有限公司 Ship electric power system fault identification method
CN112070099B (en) * 2020-09-08 2024-06-07 江西财经大学 Image processing method based on machine learning
CN112070099A (en) * 2020-09-08 2020-12-11 江西财经大学 Image processing method based on machine learning
CN112115894B (en) * 2020-09-24 2023-08-25 北京达佳互联信息技术有限公司 Training method and device of hand key point detection model and electronic equipment
CN112115894A (en) * 2020-09-24 2020-12-22 北京达佳互联信息技术有限公司 Training method and device for hand key point detection model and electronic equipment
CN112364692A (en) * 2020-10-12 2021-02-12 特斯联科技集团有限公司 Image processing method and device based on monitoring video data and storage medium
CN112348035A (en) * 2020-11-11 2021-02-09 东软睿驰汽车技术(沈阳)有限公司 Vehicle key point detection method and device and electronic equipment
CN112348035B (en) * 2020-11-11 2024-05-24 东软睿驰汽车技术(沈阳)有限公司 Vehicle key point detection method and device and electronic equipment
CN112396002A (en) * 2020-11-20 2021-02-23 重庆邮电大学 Lightweight remote sensing target detection method based on SE-YOLOv3
CN112396002B (en) * 2020-11-20 2023-05-30 重庆邮电大学 SE-YOLOv 3-based lightweight remote sensing target detection method
CN112464879A (en) * 2020-12-10 2021-03-09 山东易视智能科技有限公司 Ocean target detection method and system based on self-supervision characterization learning
CN112464879B (en) * 2020-12-10 2022-04-01 山东易视智能科技有限公司 Ocean target detection method and system based on self-supervision characterization learning
CN112308054A (en) * 2020-12-29 2021-02-02 广东科凯达智能机器人有限公司 Automatic reading method of multifunctional digital meter based on target detection algorithm
CN112581472B (en) * 2021-01-26 2022-09-02 中国人民解放军国防科技大学 Target surface defect detection method facing human-computer interaction
CN112581472A (en) * 2021-01-26 2021-03-30 中国人民解放军国防科技大学 Target surface defect detection method facing human-computer interaction
CN112926414B (en) * 2021-02-05 2024-05-14 北京嘀嘀无限科技发展有限公司 Image processing method and device and electronic equipment
CN112926414A (en) * 2021-02-05 2021-06-08 北京嘀嘀无限科技发展有限公司 Image processing method and device and electronic equipment
CN113077225A (en) * 2021-03-12 2021-07-06 联想(北京)有限公司 Sampling inspection control method and device
CN113033465B (en) * 2021-04-13 2023-11-14 北京百度网讯科技有限公司 Living body detection model training method, device, equipment and storage medium
CN113033465A (en) * 2021-04-13 2021-06-25 北京百度网讯科技有限公司 Living body detection model training method, device, equipment and storage medium
CN113496253A (en) * 2021-04-22 2021-10-12 南京工程学院 Ship target detection method and system
CN113298767A (en) * 2021-05-19 2021-08-24 南京大学 Reliable go map recognition method capable of overcoming light reflection phenomenon
CN113436148A (en) * 2021-06-02 2021-09-24 范加利 Method and system for detecting critical points of ship-borne airplane wheel contour based on deep learning
CN113780480B (en) * 2021-11-11 2022-02-22 深圳佑驾创新科技有限公司 Method for constructing multi-target detection and category identification model based on YOLOv5
CN113780480A (en) * 2021-11-11 2021-12-10 深圳佑驾创新科技有限公司 Method for constructing multi-target detection and category identification model based on YOLOv5
CN114118413A (en) * 2021-11-30 2022-03-01 上海商汤临港智能科技有限公司 Network training and equipment control method, device, equipment and storage medium
CN114236276B (en) * 2021-12-07 2022-10-04 安徽中家智锐科技有限公司 Method and system for remotely testing electric appliance
CN114236276A (en) * 2021-12-07 2022-03-25 安徽中家智锐科技有限公司 Method and system for remotely testing electric appliance
CN114170421B (en) * 2022-02-10 2022-06-17 卡奥斯工业智能研究院(青岛)有限公司 Image detection method, device, equipment and storage medium
CN114170421A (en) * 2022-02-10 2022-03-11 青岛海尔工业智能研究院有限公司 Image detection method, device, equipment and storage medium
CN115273017A (en) * 2022-04-29 2022-11-01 桂林电子科技大学 Traffic sign detection recognition model training method and system based on Yolov5

Similar Documents

Publication Publication Date Title
WO2019223582A1 (en) Target detection method and system
Possatti et al. Traffic light recognition using deep learning and prior maps for autonomous cars
CN107576960B (en) Target detection method and system for visual radar space-time information fusion
CN110163187B (en) F-RCNN-based remote traffic sign detection and identification method
US10809081B1 (en) User interface and augmented reality for identifying vehicles and persons
US10837788B1 (en) Techniques for identifying vehicles and persons
WO2020264010A1 (en) Low variance region detection for improved detection
JP5922257B2 (en) Vehicle periphery monitoring device
CN109726627A (en) A kind of detection method of neural network model training and common ground line
CN106896353A (en) A kind of unmanned vehicle crossing detection method based on three-dimensional laser radar
CN105892471A (en) Automatic automobile driving method and device
CN110765906A (en) Pedestrian detection algorithm based on key points
Bhatt et al. Have i reached the intersection: A deep learning-based approach for intersection detection from monocular cameras
JP2014106685A (en) Vehicle periphery monitoring device
Mseddi et al. YOLOv5 based visual localization for autonomous vehicles
CN110555339A (en) target detection method, system, device and storage medium
CN111931683A (en) Image recognition method, image recognition device and computer-readable storage medium
Ahmed et al. A smart IoT enabled end-to-end 3D object detection system for autonomous vehicles
Garg et al. Deep learning for obstacle avoidance in autonomous driving
Avola et al. Automatic estimation of optimal UAV flight parameters for real-time wide areas monitoring
Liu et al. A novel trail detection and scene understanding framework for a quadrotor UAV with monocular vision
CN113052071B (en) Method and system for rapidly detecting distraction behavior of driver of hazardous chemical substance transport vehicle
CN114048536A (en) Road structure prediction and target detection method based on multitask neural network
CN115187959B (en) Method and system for landing flying vehicle in mountainous region based on binocular vision
Mitsudome et al. Autonomous mobile robot searching for persons with specific clothing on urban walkway

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19807090

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19807090

Country of ref document: EP

Kind code of ref document: A1