CN113887607A - Target object information processing method and device and computer program product

Info

Publication number
CN113887607A
Authority
CN
China
Prior art keywords
training
target object
target
network
sample images
Legal status
Pending
Application number
CN202111142661.4A
Other languages
Chinese (zh)
Inventor
王学占
孔德超
杜海
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111142661.4A
Publication of CN113887607A

Classifications

    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods

Abstract

The disclosure provides a target object information processing method and device and a computer program product, and particularly relates to technologies such as edge computing, cloud computing and deep learning, which can be used in target detection scenarios. The specific implementation scheme is as follows: acquiring a training set, wherein the training set comprises a plurality of training subsets which are unbalanced in number and correspond to different types of target objects, and the training samples included in each training subset comprise sample images and the position labels and type labels of the target objects in the sample images; training a target object determination network through the sample images and position labels in the training samples in the training set; training a target object classification network based on the sample images and type labels in the training samples in the plurality of training subsets; and combining the target object determination network and the target object classification network to obtain a target object detection model. The method and the device improve the practicability and applicable range of the training method and the detection accuracy of the target object detection model.

Description

Target object information processing method and device and computer program product
Technical Field
The present disclosure relates to the field of computer technologies, in particular to technologies such as edge computing, cloud computing and deep learning, and more particularly to a method and an apparatus for processing target object information, an electronic device, a storage medium and a computer program product, which may be used in target detection scenarios.
Background
In target detection algorithms, it is common to encounter data sets in which the numbers of samples of different classes differ greatly, that is, the samples are unbalanced. If a target object detection model is trained directly on such unbalanced training samples, the detection effect of the resulting model is generally not ideal, and is usually poor in particular for target objects with few samples. At present, in the case of sample imbalance, a training set is generally constructed by taking the class with a small data volume as a reference, and model training is performed on it. However, the amount of training sample data is then small, so the detection effect of the resulting target object detection model is poor.
Disclosure of Invention
The disclosure provides a target object information processing method and device, an electronic device, a storage medium and a computer program product.
According to a first aspect, there is provided a method for processing target object information, including: acquiring a training set, wherein the training set comprises a plurality of training subsets which are unbalanced in number and correspond to different types of target objects, and the training samples included in each training subset comprise sample images and the position labels and type labels of the target objects in the sample images; training a target object determination network through the sample images and position labels in the training samples in the training set; training a target object classification network based on the sample images and type labels in the training samples in the plurality of training subsets; and combining the target object determination network and the target object classification network to obtain a target object detection model.
According to a second aspect, there is provided a target object detection method, comprising: acquiring an image to be detected; and obtaining a detection result of the target object in the image to be detected through a target object detection model, wherein the target object detection model is obtained through the method described in any one of the implementation manners of the first aspect.
According to a third aspect, there is provided an apparatus for processing target object information, comprising: a first obtaining unit configured to obtain a training set, wherein the training set includes a plurality of training subsets having an unbalanced number and corresponding to different types of target objects, and a training sample included in each training subset includes a sample image and a position label and a type label of the target object in the sample image; the first training unit is configured to train to obtain a target object determination network through sample images and position labels in training samples in a training set; a second training unit configured to train to obtain a target object classification network based on sample images and type labels in training samples in the plurality of training subsets; and the obtaining unit is configured to combine the target object determination network and the target object classification network to obtain the target object detection model.
According to a fourth aspect, there is provided an apparatus for detecting a target object, comprising: a second acquisition unit configured to acquire an image to be detected; a detection unit configured to obtain a detection result of the target object in the image to be detected through a target object detection model, where the target object detection model is obtained through the apparatus described in any implementation manner of the third aspect.
According to a fifth aspect, there is provided an electronic device comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being executable by the at least one processor to enable the at least one processor to perform the method as described in any one of the implementations of the first aspect and the second aspect.
According to a sixth aspect, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method as described in any one of the implementations of the first and second aspects.
According to a seventh aspect, there is provided a computer program product comprising: a computer program which, when executed by a processor, implements a method as described in any of the implementations of the first aspect and the second aspect.
According to the technology of the present disclosure, in the case of unbalanced training samples, the target object detection model is divided into a determination network and a classification network; all types of target objects in all training subsets are regarded as one large class, and the target object determination network is trained through the position labels of the target objects; then, making use of the fact that the image classification task corresponding to the target object classification network is simple and requires only a small amount of training data, the target object classification network is trained based on the type labels of the sample images in each training subset, so as to obtain the final target object detection model. This improves the practicability and applicable range of the training method and the detection accuracy of the target object detection model.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is an exemplary system architecture diagram in which one embodiment according to the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a method of processing target object information, according to the present disclosure;
FIG. 3 is a schematic diagram of an application scenario of the processing method of the target object information according to the present embodiment;
FIG. 4 is a flow diagram of yet another embodiment of a method of processing target object information according to the present disclosure;
FIG. 5 is a flow chart of one embodiment of a method of target object detection according to the present disclosure;
FIG. 6 is a block diagram of one embodiment of a device for processing target object information according to the present disclosure;
FIG. 7 is a block diagram of one embodiment of a target object detection apparatus according to the present disclosure;
FIG. 8 is a schematic block diagram of a computer system suitable for use in implementing embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and other processing of the personal information of the users involved are all in accordance with the provisions of relevant laws and regulations and do not violate public order and good customs.
Fig. 1 shows an exemplary architecture 100 to which the target object information processing method and apparatus, the target object detection method and apparatus of the present disclosure can be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The communication connections between the terminal devices 101, 102, 103 form a topological network, and the network 104 serves to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The terminal devices 101, 102, 103 may be hardware devices or software that support network connections for data interaction and data processing. When the terminal devices 101, 102, and 103 are hardware, they may be various electronic devices supporting network connection, information acquisition, interaction, display, processing, and the like, including but not limited to a monitoring device, a smart phone, a tablet computer, an e-book reader, a laptop portable computer, a desktop computer, and the like. When the terminal apparatuses 101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented, for example, as multiple software or software modules to provide distributed services, or as a single software or software module. And is not particularly limited herein.
The server 105 may be a server providing various services, such as a background processing server for training a target object determination network and a target object classification network to obtain a target object detection model based on a training operation initiated by a user through a terminal device in the case of unbalanced training samples. After the trained target object detection model is obtained, the server can execute a target object detection task through the target object detection model. Optionally, the server may feed back the detection result to the terminal device. As an example, the server 105 may be a cloud server.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.
It should be further noted that the processing method of the target object information and the detection method of the target object provided by the embodiments of the present disclosure may be executed by the server, or may be executed by the server and the terminal device in cooperation with each other. Accordingly, the processing device of the target object information and each part (for example, each unit) included in the detection device of the target object may be provided in the server entirely or may be provided in the server and the terminal device, respectively.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. When the electronic device on which the processing method of the target object information and the detection method of the target object operate does not need to perform data transmission with other electronic devices, the system architecture may only include the electronic device (e.g., a server or a terminal device) on which the processing method of the target object information and the detection method of the target object operate.
Referring to fig. 2, fig. 2 is a flowchart of a method for processing target object information according to an embodiment of the disclosure, where the process 200 includes the following steps:
step 201, a training set is obtained.
In this embodiment, an execution subject (for example, a terminal device or a server in fig. 1) of the target object information processing method may obtain the training set from a remote location or from a local location through a wired network connection manner or a wireless network connection manner.
The training set includes a plurality of training subsets which are unbalanced in number and correspond to different types of target objects, and the training samples included in each training subset comprise sample images and the position labels and type labels of the target objects in the sample images. The position label is used to represent information such as the position and area of the target object in the sample image; for example, the position label of a target object may be the minimum bounding rectangle of the target object in the sample image. The type label is used to represent the type information of the target object; type labels may be represented in the form of numbers, text, characters and the like, and different types of target objects are given different type labels.
The types of target objects corresponding one-to-one to the training subsets may be subclasses of the same broad class that are similar and strongly correlated, for example subclasses of electronic products such as mobile phones, computers, digital cameras and smart speakers; or they may be mutually independent, weakly correlated types, for example electronic products, fresh products, clothes and the like.
Sample imbalance here means that the numbers of training samples in different training subsets differ greatly. For example, training subsets whose sample counts differ by a ratio exceeding 4:1 are generally considered unbalanced.
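For illustration only (not part of the original disclosure), a simple check of this 4:1 rule of thumb might look as follows; the function name and the dictionary layout are assumptions.

```python
# Hypothetical helper for the 4:1 imbalance rule of thumb mentioned above.
def is_unbalanced(subset_sizes, ratio_threshold=4.0):
    """subset_sizes: dict mapping target object type -> number of training samples."""
    counts = list(subset_sizes.values())
    return max(counts) / max(min(counts), 1) > ratio_threshold

# Example with the helmet-color subsets used later in this description.
print(is_unbalanced({"yellow": 4000, "red": 450, "blue": 450, "white": 100}))  # True: 4000/100 = 40 > 4
```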
Step 202, training to obtain a target object determination network through sample images and position labels in training samples in a training set.
In this embodiment, the execution subject obtains the target object determination network through training by using the sample images and the position labels in the training samples in the training set, where the target object determination network is used to determine the position information of the target object in an input image.
As an example, the executing body regards target objects corresponding to the plurality of training subsets as the same type of target object, so as to determine whether the target object exists in the sample image through the target object determination network; when it is determined that the target object exists in the sample image, position information of the target object in the image is determined.
Specifically, the execution subject first selects an untrained training sample from the training set, inputs the sample image in the training sample into an initial determination network, uses the position label in the training sample as the expected output of the initial determination network, and obtains the final target object determination network through an iterative training process. For example, the execution subject may compute a determination loss between the actual output and the expected output of the initial determination network, calculate a gradient from the determination loss, adjust the network parameters of the initial determination network along the gradient using a gradient descent algorithm, use the network obtained after each parameter adjustment as the initial determination network for the next iteration, and end training when a preset training end condition is satisfied, thereby obtaining the target object determination network.
The training end condition may be, for example, that the training time exceeds a preset time threshold, the training number exceeds a preset number threshold, the training loss tends to converge or is lower than a preset loss threshold, and the like.
The initial determination network may be any network having a target object determination function, for example, R-CNN (Region-based Convolutional Neural Network), Fast R-CNN, SSD (Single Shot MultiBox Detector), YOLO (You Only Look Once), and the like.
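For illustration, the following is a minimal PyTorch-style sketch of training such a determination network, here using torchvision's Faster R-CNN with two classes (background plus a single merged "target object" class, since all target object types are treated as one class); the data loader layout and hyperparameters are assumptions, and the actual network and training procedure of the disclosure may differ.

```python
import torch
import torchvision


def train_determination_network(train_loader, num_epochs=10, lr=0.005):
    """Train a network that only localizes target objects (all types merged into one class).

    train_loader is assumed to yield (list_of_image_tensors, list_of_box_tensors) batches,
    where each box tensor holds the position labels (xmin, ymin, xmax, ymax) of one image.
    """
    # 2 classes = background + the single merged "target object" class.
    model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights=None, num_classes=2)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    model.train()
    for _ in range(num_epochs):
        for images, boxes_batch in train_loader:
            targets = [
                {"boxes": boxes,
                 "labels": torch.ones(len(boxes), dtype=torch.int64)}  # every object gets the same label
                for boxes in boxes_batch
            ]
            loss_dict = model(images, targets)   # torchvision returns a dict of losses in train mode
            loss = sum(loss_dict.values())       # determination loss between actual and expected output
            optimizer.zero_grad()
            loss.backward()                      # gradient from the determination loss
            optimizer.step()                     # adjust parameters by gradient descent
    return model
```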
And step 203, training to obtain a target object classification network based on sample images and type labels in training samples in the training subsets.
In this embodiment, the executing entity may obtain a target object classification network through training based on sample images and type labels in training samples in a plurality of training subsets. The target object classification network is used for determining the type of the target object in the input image.
As an example, the executing entity may train the initial classification network with the sample images and type labels in all training samples in the plurality of training subsets. Specifically, the executing subject first selects untrained training samples from the plurality of training subsets, inputs the sample images in the training samples into an initial classification network, uses the type labels in the training samples as the expected outputs of the initial classification network, and obtains the final target object classification network through an iterative training process. For example, the execution subject may compute a classification loss between the actual output and the expected output of the initial classification network, calculate a gradient from the classification loss, adjust the network parameters of the initial classification network along the gradient using a gradient descent algorithm, use the network obtained after each parameter adjustment as the initial classification network for the next iteration, and end training when a preset training end condition is met, thereby obtaining the target object classification network.
It should be noted that the task of classifying the target object is simple and requires a small amount of training samples. As another example, the executing entity may extract the same number of training samples from each training subset, and then train the initial classification network with the extracted training samples to obtain the target object classification network.
The initial classification network may be any classification network having a target object classification function, such as a decision tree, a support vector machine, naïve Bayes, a random forest, and the like.
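As a companion illustration (again an assumption, not the disclosed implementation), a neural-network-based initial classification network could be trained roughly as follows; the data loader layout and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn
import torchvision


def train_classification_network(train_loader, num_types, num_epochs=10, lr=0.001):
    """Train a classifier that maps a sample image (or cropped object region) to a type label.

    train_loader is assumed to yield (image_batch, type_label_batch) pairs drawn from the
    training subsets; num_types is the number of target object types.
    """
    model = torchvision.models.resnet18(weights=None, num_classes=num_types)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(num_epochs):
        for images, type_labels in train_loader:
            logits = model(images)                 # actual output
            loss = criterion(logits, type_labels)  # classification loss vs. expected output (type labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```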
And step 204, combining the target object determination network and the target object classification network to obtain a target object detection model.
In this embodiment, the execution body may combine the target object determination network and the target object classification network to obtain the target object detection model.
Specifically, the target object determination network and the target object classification network are cascaded in sequence to obtain the target object detection model. The position information of the target object in the input image is determined by the target object determination network in the target object detection model, and the type of the target object in the image region indicated by the position information is determined by the target object classification network, thereby completing the target detection task for the input image.
The target object detection model can be applied to various fields. For example, in the field of face detection, a target object detection model is used for tasks such as face recognition, face authentication and the like; in the field of intelligent transportation, a target object detection model is used for tasks such as obstacle detection, traffic sign identification and the like.
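To make the combination step concrete, the sketch below (an illustrative assumption rather than the disclosed implementation) cascades a determination network and a classification network: the determination network produces candidate boxes, the image regions they indicate are cropped, and the classification network assigns a type to each region.

```python
import torch


class TargetObjectDetectionModel(torch.nn.Module):
    """Illustrative cascade of a determination network and a classification network."""

    def __init__(self, determination_net, classification_net, score_threshold=0.5):
        super().__init__()
        self.determination_net = determination_net    # e.g. the Faster R-CNN sketched above
        self.classification_net = classification_net  # e.g. the ResNet classifier sketched above
        self.score_threshold = score_threshold

    @torch.no_grad()
    def forward(self, image):
        """image: a single CxHxW tensor; returns a list of (box, type_id) detections."""
        self.determination_net.eval()
        self.classification_net.eval()
        prediction = self.determination_net([image])[0]       # position information of candidate objects
        detections = []
        for box, score in zip(prediction["boxes"], prediction["scores"]):
            if score < self.score_threshold:
                continue
            x1, y1, x2, y2 = [int(v) for v in box]
            region = image[:, y1:y2, x1:x2].unsqueeze(0)       # crop the region indicated by the box
            # Note: a real pipeline would resize/normalize the crop to the classifier's input size.
            type_id = self.classification_net(region).argmax(dim=1).item()
            detections.append((box, type_id))
        return detections
```

Cascading the two networks in this way keeps the localization stage unaffected by class imbalance, while the classification stage only needs a comparatively small amount of type-labelled data.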
With continued reference to fig. 3, fig. 3 is a schematic diagram 300 of an application scenario of the processing method of the target object information according to the present embodiment. In the application scenario of fig. 3, first, the server obtains a training set 301 from a database. The training set 301 includes a plurality of training subsets 3011, 3012, 3013, 3014 which are unbalanced in number and correspond to different types of target objects, and the training samples included in each training subset include sample images and the position labels and type labels of the target objects in the sample images. Then, the target object determination network 302 is obtained through training by using the sample images and the position labels in the training samples in the training set 301. Then, based on the sample images and type labels in the training samples in the training subsets 3011, 3012, 3013, 3014, the target object classification network 303 is obtained through training; finally, the target object detection model 304 is obtained by combining the target object determination network 302 and the target object classification network 303.
In this embodiment, under the condition of unbalanced training samples, the target object detection model is divided into a determination network and a classification network, all types of target objects in each training subset are regarded as the same large class, and the target object determination network is trained through the position labels of the target objects; then, making use of the fact that the image classification task corresponding to the target object classification network is simple and requires only a small amount of training data, the target object classification network is trained based on the type labels of the sample images in each training subset, so as to obtain the final target object detection model, which improves the practicability and applicable range of the training method and the detection accuracy of the target object detection model.
In some optional implementations of this embodiment, the executing main body may execute the step 203 by:
firstly, based on a plurality of training subsets, a preset mode is adopted to obtain a plurality of updated training subsets with balanced quantity.
The preset method may be any method for balancing the training subsets, including but not limited to data augmentation based on image rotation, translation and noise addition, resampling of the training subsets with less data, and cost-sensitive methods. A cost-sensitive method increases the penalty cost of misclassifying small-sample classes, and this cost is embodied directly in the objective function of model training. In this way, the attention the model pays to small-sample classes can be adjusted by optimizing the objective function.
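As an illustration of the cost-sensitive option (an assumption for illustration only; the disclosure does not fix a specific weighting formula), the class weights of a cross-entropy loss can be set inversely proportional to the per-type sample counts, so that misclassifying a type with few samples is penalized more heavily:

```python
import torch
import torch.nn as nn

# Illustrative cost-sensitive objective: weight each type inversely to its sample count
# so that errors on small-sample types carry a larger penalty cost.
subset_sizes = torch.tensor([4000.0, 450.0, 450.0, 100.0])  # e.g. yellow, red, blue, white helmet subsets
class_weights = subset_sizes.sum() / (len(subset_sizes) * subset_sizes)
criterion = nn.CrossEntropyLoss(weight=class_weights)  # used in place of an unweighted classification loss
```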
Second, the target object classification network is obtained by training on the sample images and type labels in the training samples of the plurality of updated training subsets.
Specifically, the executing entity may use a machine learning method to train to obtain the target object classification network by taking the sample images in the training samples in the plurality of updated training subsets as input and taking the type labels of the input sample images as expected output.
In this implementation, the target object classification model is obtained by training on a plurality of updated training subsets with balanced numbers of samples, which improves the classification accuracy of the target object classification model.
In some optional implementations of this embodiment, the executing body may execute the first step by:
and sampling training subsets except the target training subset by taking the target training subset with the minimum number of training samples in the plurality of training subsets as a reference to obtain a plurality of updated training subsets.
Specifically, for each training subset outside the target training subset, a number of training samples similar to or equal to the number of training samples of the target training subset are sampled to obtain a plurality of updated training subsets.
For example, consider identifying whether a worker in a construction-site scene is wearing a safety helmet and, if so, the color of the helmet. Depending on workers' roles on the site, safety helmets may be yellow, red, blue, white and the like. Generally, yellow helmets are worn by ordinary workers and are the most numerous, while white helmets are worn by managers and are the fewest. Suppose the training set contains 5000 sample images in total, of which 4000 contain yellow helmets, 100 contain white helmets, and 450 each contain blue helmets and red helmets. The samples of the training subsets corresponding to yellow, red, blue and white helmets are then unbalanced.
First, the execution subject regards the safety helmets in all 5000 sample images as one class, without considering the color differences, and trains a helmet determination network that can determine the position information of helmets in images. Then, 100 sample images are selected from each of the training subsets corresponding to the different helmet colors, and a helmet classification network capable of classifying helmet colors is trained on the selected sample images. Finally, the helmet determination network and the helmet classification network are combined to obtain a helmet detection model, which can detect the position and color of safety helmets in an input image.
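A minimal sketch of this sampling step, assuming each training subset is held as a Python list of training samples (the data structure and function name are assumptions):

```python
import random


def balance_by_downsampling(training_subsets, seed=0):
    """training_subsets: dict mapping target object type -> list of training samples.

    Every type keeps only as many samples as the smallest (target) training subset has.
    """
    rng = random.Random(seed)
    target_size = min(len(samples) for samples in training_subsets.values())
    return {obj_type: rng.sample(samples, target_size)   # sample without replacement
            for obj_type, samples in training_subsets.items()}


# Usage with the helmet example above: each color subset is reduced to 100 samples,
# the size of the smallest (white-helmet) subset.
subsets = {"yellow": list(range(4000)), "red": list(range(450)),
           "blue": list(range(450)), "white": list(range(100))}
balanced = balance_by_downsampling(subsets)
print({color: len(samples) for color, samples in balanced.items()})
```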
In the implementation mode, the characteristics that the classification network training is simple and the number of required training samples is small are utilized, so that a simple and effective sample balancing method is provided, and balanced samples can be obtained quickly and conveniently.
In some optional implementations of this embodiment, the executing main body may execute the step 202 by:
and training to obtain a plurality of target object determination networks with different network structures by using the sample images in the training samples in the training set as input and using the position labels corresponding to the input sample images as expected output by using a machine learning method.
The executing body may also use a machine learning method to train to obtain a plurality of target object classification networks with different network structures by taking the sample images in the training samples in the plurality of updated training subsets as input and taking the type labels corresponding to the input sample images as expected output.
The network models with different network structures can be the same type of network models with different network structures obtained by improving the same basic network, and can also be different types of network models. A plurality of target object determination networks with different network structures and a plurality of target object classification networks with different network structures are obtained through training, so that the selectivity of the determination networks and the classification networks is improved, and a network model with a better effect is obtained.
In some optional implementations of this embodiment, the executing main body may execute the step 204 by:
first, a target determination network is determined from a plurality of target object determination networks.
Then, a target classification network is determined from the plurality of target object classification networks.
And finally, combining the target determination network and the target classification network to obtain a target object detection model.
As an example, the executing entity may use a determination network with the highest accuracy among the plurality of target object determination networks as the target determination network, and use a classification network with the highest accuracy among the plurality of target object classification networks as the target classification network, so as to obtain the target object detection model by combining the target determination network and the target classification network. Thus, the detection accuracy of the obtained target object detection network is improved.
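For illustration (the evaluation procedure is an assumption, not specified by the disclosure), selecting the highest-accuracy candidates could be as simple as:

```python
def select_best(candidate_models, evaluate):
    """candidate_models: iterable of trained networks with different structures.

    evaluate is assumed to be a callable returning an accuracy score on a held-out
    validation set; the candidate with the highest score is selected.
    """
    return max(candidate_models, key=evaluate)

# Assumed usage:
# target_net = select_best(determination_candidates, evaluate_determination_accuracy)
# class_net  = select_best(classification_candidates, evaluate_classification_accuracy)
```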
With continued reference to FIG. 4, an exemplary flow 400 of one embodiment of a method for processing target object information in accordance with the methods of the present disclosure is shown. The process 400 includes the following steps:
step 401, a training set is obtained.
The training set comprises a plurality of training subsets which are unbalanced in number and correspond to different types of target objects, and the training samples in each training subset comprise sample images and position labels and type labels of the target objects in the sample images.
Step 402, training to obtain a target object determination network through sample images and position labels in training samples in a training set.
Step 403, taking the target training subset with the minimum number of training samples included in the plurality of training subsets as a reference, sampling the training subsets other than the target training subset to obtain a plurality of updated training subsets.
And step 404, training to obtain a target object classification network through the sample images and the type labels in the training samples in the plurality of updated training subsets.
And step 405, combining the target object determination network and the target object classification network to obtain a target object detection model.
As can be seen from this embodiment, compared with the embodiment corresponding to fig. 2, the process 400 of the target object information processing method in this embodiment describes the training processes of the target object determination network and the target object classification network in detail, and further improves the detection accuracy of the target object detection model through a simple and effective sample equalization manner, making use of the fact that the image classification task corresponding to the target object classification network is simple and requires only a small amount of training data.
With continuing reference to fig. 5, fig. 5 is a flowchart of a method for detecting a target object according to an embodiment of the disclosure, where the flowchart 500 includes the following steps:
and step 501, acquiring an image to be detected.
In this embodiment, an execution subject (for example, a server or a terminal device in fig. 1) of the target object detection method may obtain the image to be detected from a remote location or a local location through a wired network connection manner or a wireless network connection manner.
The image to be detected may be an image including a target object. For example, the image to be detected is an image including a human face.
Step 502, obtaining a detection result of the target object in the image to be detected through the target object detection model.
In this embodiment, the execution subject may obtain a detection result of the target object in the image to be detected through the target object detection model. The target object detection model is obtained through the embodiments 200 and 400.
Specifically, the execution body may determine, through a target object determination network in the target object detection model, position information of a target object in the image to be detected, and determine, through a target object classification network in the target object detection model, type information of the target object in the image to be detected, to obtain a detection result composed of the position information and the type information.
In this embodiment, an accurate detection result is obtained based on the target object detection model.
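As a usage illustration (file name and variable names are assumptions, reusing the cascade class sketched earlier in this description), running detection on an image to be detected might look like:

```python
import torchvision.transforms.functional as F
from PIL import Image

# Assumed usage of the illustrative TargetObjectDetectionModel sketched earlier:
# load an image to be detected, run the cascaded model, and print position + type.
image = F.to_tensor(Image.open("image_to_detect.jpg").convert("RGB"))
detector = TargetObjectDetectionModel(determination_net, classification_net)
for box, type_id in detector(image):
    print("position:", [round(float(v), 1) for v in box], "type:", type_id)
```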
With continuing reference to fig. 6, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of a device for processing target object information, which corresponds to the embodiment of the method shown in fig. 2, and which is particularly applicable to various electronic devices.
As shown in fig. 6, the apparatus for processing target object information includes: a first obtaining unit 601 configured to obtain a training set, where the training set includes a plurality of training subsets with unequal numbers and corresponding to different types of target objects, and a training sample included in each training subset includes a sample image and a position label and a type label of a target object in the sample image; a first training unit 602 configured to train to obtain a target object determination network through sample images and position labels in training samples in a training set; a second training unit 603 configured to train to obtain a target object classification network based on sample images and type labels in training samples in the plurality of training subsets; a deriving unit 604 configured to combine the target object determination network and the target object classification network to derive a target object detection model.
In some optional implementations of this embodiment, the second training unit 603 is further configured to: based on the training subsets, obtaining a plurality of updated training subsets with balanced quantity by adopting a preset mode; and training to obtain a target object classification network through the sample images and the type labels in the training samples in the plurality of updated training subsets.
In some optional implementations of this embodiment, the second training unit 603 is further configured to: and sampling training subsets except the target training subset by taking the target training subset with the minimum number of training samples in the plurality of training subsets as a reference to obtain a plurality of updated training subsets.
In some optional implementations of this embodiment, the second training unit 603 is further configured to: training to obtain a plurality of target object classification networks with different network structures by using sample images in training samples in a plurality of updated training subsets as input and using type labels corresponding to the input sample images as expected output by using a machine learning method; and a first training unit 602, further configured to: and training to obtain a plurality of target object determination networks with different network structures by using the sample images in the training samples in the training set as input and using the position labels corresponding to the input sample images as expected output by using a machine learning method.
In some optional implementations of this embodiment, the obtaining unit 604 is further configured to: determining a target determination network from a plurality of target object determination networks; determining a target classification network from a plurality of target object classification networks; and combining the target determination network and the target classification network to obtain a target object detection model.
In this embodiment, under the condition of unbalanced training samples, the target object detection model is divided into a determination network and a classification network, all types of target objects are regarded as the same large class, and the target object determination network is trained through the position labels of the target objects; then, making use of the fact that the image classification task corresponding to the target object classification network is simple and requires only a small amount of training data, the target object classification network is trained based on the type labels of the sample images in each training subset to obtain the final target object detection model, which improves the practicability and applicable range of the training device and the detection accuracy of the target object detection model.
With continuing reference to fig. 7, as an implementation of the methods illustrated in the above figures, the present disclosure provides an embodiment of an apparatus for detecting a target object, which corresponds to the embodiment of the method illustrated in fig. 5, and which may be applied in various electronic devices.
As shown in fig. 7, the target object detection apparatus includes: a second acquisition unit 701 configured to acquire an image to be detected; a detection unit 702 configured to obtain a detection result of the target object in the image to be detected through a target object detection model, where the target object detection model is obtained through the embodiment 600.
According to an embodiment of the present disclosure, the present disclosure also provides an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can implement the target object information processing method and the target object detection method described in any of the above embodiments.
According to an embodiment of the present disclosure, the present disclosure further provides a readable storage medium storing computer instructions for enabling a computer to implement the target object information processing method and the target object detection method described in any of the above embodiments when executed.
The embodiments of the present disclosure provide a computer program product, which when executed by a processor can implement the target object information processing method and the target object detection method described in any of the above embodiments.
FIG. 8 illustrates a schematic block diagram of an example electronic device 800 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 8, the device 800 includes a computing unit 801 that can perform various appropriate actions and processes according to a computer program stored in a Read Only Memory (ROM) 802 or a computer program loaded from a storage unit 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the device 800 can also be stored. The computing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to bus 804.
A number of components in the device 800 are connected to the I/O interface 805, including: an input unit 806, such as a keyboard, a mouse, or the like; an output unit 807 such as various types of displays, speakers, and the like; a storage unit 808, such as a magnetic disk, optical disk, or the like; and a communication unit 809 such as a network card, modem, wireless communication transceiver, etc. The communication unit 809 allows the device 800 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
Computing unit 801 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 801 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and the like. The calculation unit 801 executes the respective methods and processes described above, such as a processing method of target object information, a detection method of a target object. For example, in some embodiments, the processing of target object information, the detection of target objects, may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as storage unit 808. In some embodiments, part or all of the computer program can be loaded and/or installed onto device 800 via ROM 802 and/or communications unit 809. When the computer program is loaded into the RAM 803 and executed by the computing unit 801, one or more steps of the processing method of the target object information, the detection method of the target object described above may be executed. Alternatively, in other embodiments, the computing unit 801 may be configured to perform the processing method of the target object information, the detection method of the target object by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chips (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system that addresses the defects of difficult management and weak service expansibility existing in traditional physical hosts and Virtual Private Server (VPS) services; it may also be a server of a distributed system, or a server incorporating a blockchain.
According to the technical solution of the embodiments of the present disclosure, under the condition of unbalanced training samples, the target object detection model is divided into a determination network and a classification network, all types of target objects in all training subsets are regarded as the same large class, and the target object determination network is trained through the position labels of the target objects; then, making use of the fact that the image classification task corresponding to the target object classification network is simple and requires only a small amount of training data, the target object classification network is trained based on the type labels of the sample images in each training subset, so as to obtain the final target object detection model, which improves the practicability and applicable range of the training method and the detection accuracy of the target object detection model.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in this disclosure may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions provided by this disclosure can be achieved, and are not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. A method for processing target object information comprises the following steps:
acquiring a training set, wherein the training set comprises a plurality of training subsets which are unbalanced in number and correspond to different types of target objects, and training samples included in each training subset comprise sample images and position labels and type labels of the target objects in the sample images;
training to obtain a target object determination network through sample images and position labels in training samples in the training set;
training to obtain a target object classification network based on sample images and type labels in training samples in the training subsets;
and combining the target object determination network and the target object classification network to obtain the target object detection model.
2. The method of claim 1, wherein training a target object classification network based on sample images and type labels in training samples in the plurality of training subsets comprises:
based on the training subsets, obtaining a plurality of updated training subsets with balanced quantity by adopting a preset mode;
and training to obtain the target object classification network through sample images and type labels in training samples in the plurality of updated training subsets.
3. The method of claim 2, wherein the obtaining, in a preset manner and based on the plurality of training subsets, a plurality of updated training subsets with balanced numbers comprises:
and sampling training subsets except the target training subset by taking the target training subset with the minimum number of training samples in the plurality of training subsets as a reference to obtain the plurality of updated training subsets.
4. The method of claim 2 or 3, wherein the training of the target object classification network by sample images and type labels in training samples in the plurality of updated training subsets comprises:
training to obtain a plurality of target object classification networks with different network structures by using the sample images in the training samples in the plurality of updated training subsets as input and using the type labels corresponding to the input sample images as expected output by using a machine learning method; and
the training to obtain the target object determination network through the sample images and the position labels in the training samples in the training set comprises:
and training to obtain a plurality of target object determination networks with different network structures by using the sample images in the training samples in the training set as input and using the position labels corresponding to the input sample images as expected output by using a machine learning method.
5. The method of claim 4, wherein the combining the target object determination network and the target object classification network to obtain the target object detection model comprises:
determining a target determination network from the plurality of target object determination networks;
determining a target classification network from the plurality of target object classification networks; and
combining the target determination network and the target classification network to obtain the target object detection model.
6. A method for detecting a target object, comprising:
acquiring an image to be detected; and
obtaining a detection result of a target object in the image to be detected through a target object detection model, wherein the target object detection model is obtained by the method of any one of claims 1 to 5.
7. An apparatus for processing target object information, comprising:
a first acquiring unit configured to acquire a training set, wherein the training set comprises a plurality of training subsets that are unbalanced in number and correspond to different types of target objects, and each training sample in each training subset comprises a sample image and position labels and type labels of the target objects in the sample image;
a first training unit configured to train to obtain a target object determination network by using the sample images and position labels in the training samples of the training set;
a second training unit configured to train to obtain a target object classification network based on the sample images and type labels in the training samples of the plurality of training subsets; and
an obtaining unit configured to combine the target object determination network and the target object classification network to obtain a target object detection model.
8. The apparatus of claim 7, wherein the second training unit is further configured to:
obtain, in a preset manner and based on the plurality of training subsets, a plurality of updated training subsets that are balanced in number; and train to obtain the target object classification network by using the sample images and type labels in the training samples of the plurality of updated training subsets.
9. The apparatus of claim 8, wherein the second training unit is further configured to:
sample the training subsets other than a target training subset, with the target training subset having the smallest number of training samples among the plurality of training subsets as a reference, to obtain the plurality of updated training subsets.
10. The apparatus of claim 8 or 9, wherein the second training unit is further configured to:
train, with a machine learning method, a plurality of target object classification networks having different network structures, by using the sample images in the training samples of the plurality of updated training subsets as input and the type labels corresponding to the input sample images as expected output; and
the first training unit is further configured to:
train, with a machine learning method, a plurality of target object determination networks having different network structures, by using the sample images in the training samples of the training set as input and the position labels corresponding to the input sample images as expected output.
11. The apparatus of claim 10, wherein the obtaining unit is further configured to:
determine a target determination network from the plurality of target object determination networks; determine a target classification network from the plurality of target object classification networks; and combine the target determination network and the target classification network to obtain the target object detection model.
12. A target object detection apparatus, comprising:
a second acquiring unit configured to acquire an image to be detected; and
a detection unit configured to obtain a detection result of a target object in the image to be detected through a target object detection model, wherein the target object detection model is obtained by the apparatus of any one of claims 7 to 11.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer-readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-6.
15. A computer program product, comprising a computer program which, when executed by a processor, implements the method of any one of claims 1-6.
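
As an illustration of the resampling step recited in claims 2-3 and 8-9 (obtaining balanced updated training subsets by down-sampling to the size of the smallest subset), the following Python sketch is a minimal, non-normative example. The data layout (a dict mapping each type label to a list of training samples) and the function name balance_by_undersampling are assumptions made for the example and are not taken from the disclosure.

```python
import random

def balance_by_undersampling(training_subsets, seed=0):
    """Return updated training subsets that are balanced in number.

    `training_subsets` maps each type label to a list of training samples
    (here assumed to be (sample_image, position_label, type_label) tuples).
    The subset with the fewest samples is used as the reference; every other
    subset is randomly down-sampled to that size.
    """
    rng = random.Random(seed)
    reference_size = min(len(samples) for samples in training_subsets.values())
    updated = {}
    for type_label, samples in training_subsets.items():
        if len(samples) > reference_size:
            updated[type_label] = rng.sample(samples, reference_size)
        else:
            updated[type_label] = list(samples)
    return updated
```

The classification network would then be trained on the union of these updated subsets, so that each target object type contributes roughly the same number of samples per training pass.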
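Claims 1, 5, 6 and 11-12 describe combining a determination network (which localizes target objects) with a classification network (which assigns a type to each localized object) into a single detection model. The sketch below shows one plausible way to wire the two stages together; the class name, the callable interfaces of the two networks, and the PIL-style crop() call are assumptions for illustration only, not the disclosure's implementation.

```python
class TargetObjectDetectionModel:
    """Two-stage detector built from a determination network and a classification network.

    `determination_net` is assumed to be a callable mapping an image to a list
    of bounding boxes (left, upper, right, lower); `classification_net` is
    assumed to map an image crop to a type label.
    """

    def __init__(self, determination_net, classification_net):
        self.determination_net = determination_net
        self.classification_net = classification_net

    def detect(self, image):
        # Stage 1: determine candidate positions of target objects in the image.
        boxes = self.determination_net(image)
        # Stage 2: classify the image region inside each candidate box.
        results = []
        for box in boxes:
            crop = image.crop(box)  # assumes a PIL.Image-style crop(box) method
            results.append({"position": box, "type": self.classification_net(crop)})
        return results
```

Where several candidate networks with different structures have been trained (claims 4-5 and 10-11), the determination and classification networks plugged in here would be the ones selected from those candidates, for example by validation accuracy.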
CN202111142661.4A 2021-09-28 2021-09-28 Target object information processing method and device and computer program product Pending CN113887607A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111142661.4A CN113887607A (en) 2021-09-28 2021-09-28 Target object information processing method and device and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111142661.4A CN113887607A (en) 2021-09-28 2021-09-28 Target object information processing method and device and computer program product

Publications (1)

Publication Number Publication Date
CN113887607A (en) 2022-01-04

Family

ID=79007383

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111142661.4A Pending CN113887607A (en) 2021-09-28 2021-09-28 Target object information processing method and device and computer program product

Country Status (1)

Country Link
CN (1) CN113887607A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115374859A (en) * 2022-08-24 2022-11-22 东北大学 Method for classifying unbalanced and multi-class complex industrial data

Similar Documents

Publication Publication Date Title
CN113657465A (en) Pre-training model generation method and device, electronic equipment and storage medium
CN113065614B (en) Training method of classification model and method for classifying target object
CN113657269A (en) Training method and device for face recognition model and computer program product
CN113627361B (en) Training method and device for face recognition model and computer program product
CN114494784A (en) Deep learning model training method, image processing method and object recognition method
CN113657483A (en) Model training method, target detection method, device, equipment and storage medium
CN113780098A (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN112862005A (en) Video classification method and device, electronic equipment and storage medium
CN113705362A (en) Training method and device of image detection model, electronic equipment and storage medium
CN114581732A (en) Image processing and model training method, device, equipment and storage medium
CN113887630A (en) Image classification method and device, electronic equipment and storage medium
CN113887607A (en) Target object information processing method and device and computer program product
CN113657248A (en) Training method and device for face recognition model and computer program product
CN114612725B (en) Image processing method, device, equipment and storage medium
CN116342164A (en) Target user group positioning method and device, electronic equipment and storage medium
CN115273148A (en) Pedestrian re-recognition model training method and device, electronic equipment and storage medium
CN115527069A (en) Article identification and article identification system construction method and apparatus
CN115019057A (en) Image feature extraction model determining method and device and image identification method and device
CN114093006A (en) Training method, device and equipment of living human face detection model and storage medium
CN114417029A (en) Model training method and device, electronic equipment and storage medium
CN114330576A (en) Model processing method and device, and image recognition method and device
CN113936158A (en) Label matching method and device
CN114078274A (en) Face image detection method and device, electronic equipment and storage medium
CN114724090B (en) Training method of pedestrian re-identification model, and pedestrian re-identification method and device
CN113642495B (en) Training method, apparatus, and program product for evaluating model for time series nomination

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination