CN112949731A - Target detection method, device, storage medium and equipment based on multi-expert model - Google Patents

Target detection method, device, storage medium and equipment based on multi-expert model

Info

Publication number
CN112949731A
CN112949731A (application number CN202110266617.8A)
Authority
CN
China
Prior art keywords
expert
expert model
candidate feature
model
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110266617.8A
Other languages
Chinese (zh)
Inventor
王堃 (Wang Kun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Yuspace Intelligent Technology Co ltd
Original Assignee
Jiangsu Yu Space Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Yu Space Technology Co ltd filed Critical Jiangsu Yu Space Technology Co ltd
Priority to CN202110266617.8A priority Critical patent/CN112949731A/en
Publication of CN112949731A publication Critical patent/CN112949731A/en
Pending legal-status Critical Current

Classifications

    • G06F 18/22: Pattern recognition; analysing; matching criteria, e.g. proximity measures
    • G06F 18/24: Pattern recognition; analysing; classification techniques
    • G06N 3/045: Neural networks; architecture, e.g. interconnection topology; combinations of networks
    • G06V 10/267: Image preprocessing; segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V 10/32: Image preprocessing; normalisation of the pattern dimensions

Abstract

The application discloses a target detection method, device, storage medium and equipment based on a multi-expert model, belonging to the technical field of image processing. The method is used in a neural network model that comprises a fast regional convolutional neural network and a multi-expert model, and comprises the following steps: acquiring an image to be identified, wherein the image comprises at least one target object; processing the image through the fast regional convolutional neural network to obtain a plurality of candidate feature maps; determining the candidate feature map matching each expert model in the multi-expert model; and processing the matched candidate feature maps through each expert model to obtain the category and position of each target object. Because different candidate feature maps can be processed by different expert models, each candidate feature map is handled by the expert model that is good at processing its data region, which improves the recognition rate of target detection.

Description

Target detection method, device, storage medium and equipment based on multi-expert model
Technical Field
The embodiment of the application relates to the technical field of image processing, in particular to a target detection method, a target detection device, a storage medium and target detection equipment based on a multi-expert model.
Background
Object detection refers to the detection of objects of interest from an image, including the localization and classification of the objects. In recent years, the target detection algorithm has made a great breakthrough.
A currently popular class of algorithms works in two stages (two-stage): the neural network model first extracts a plurality of target candidate regions from the image, and then classifies and regresses those regions to obtain the categories and positions of the different target objects in the image.
As data sets grow in size, a single neural network model is often good at processing only part of the data; that is, a single network cannot accurately process all of the target candidate regions, which lowers the recognition rate of target detection.
Disclosure of Invention
The embodiment of the application provides a target detection method, device, storage medium and equipment based on a multi-expert model, which are used to solve the problem that a single neural network cannot accurately process a plurality of target candidate regions, thereby affecting the recognition rate of target detection. The technical scheme is as follows:
in one aspect, a multi-expert model-based target detection method is provided, and is used in a neural network model, where the neural network model includes a fast area convolutional neural network and a multi-expert model, and the method includes:
acquiring an image to be identified, wherein the image comprises at least one target object;
processing the image through the fast regional convolutional neural network to obtain a plurality of candidate feature maps;
determining candidate feature maps matching each expert model in the multiple expert models;
and processing the matched candidate feature map through each expert model to obtain the category and the position of each target object.
In one possible implementation, the determining, in the multi-expert model, a candidate feature map matching each expert model includes:
for each expert model, acquiring a weight coefficient and a plurality of candidate feature maps of the expert model;
for each candidate feature map, inputting the candidate feature map and the weight coefficient into a predetermined function to obtain the probability of matching the candidate feature map with the expert model;
and determining the candidate feature map corresponding to the maximum probability as the candidate feature map matched with the expert model.
In a possible implementation manner, the obtaining the weight coefficients and the plurality of candidate feature maps of the expert model includes:
acquiring a plurality of candidate feature maps, the weight of each expert model and an expert mark vector;
for each expert model, inputting the weight of the expert model and the candidate feature map into the predetermined function to obtain a first calculation result;
inputting the expert marking vector of the expert model and the first calculation result into a first loss function of the expert model to obtain a second calculation result;
and determining the weight coefficient of the expert model according to the second calculation result.
In a possible implementation manner, the processing the matched candidate feature map through each expert model to obtain the category and the position of each target object includes:
for each expert model, processing the matched candidate feature map through a second loss function of the expert model to obtain the category of the target object;
and processing the matched candidate feature map through a third loss function of the expert model to obtain the position offset of the target object, and determining the position of the target object according to the position offset.
In a possible implementation manner, the fast area convolutional neural network includes a feature extraction layer, an area candidate network layer, and an area-of-interest pooling layer, and the processing of the image by the fast area convolutional neural network to obtain a plurality of candidate feature maps includes:
extracting the features of the image through the feature extraction layer to obtain a feature map;
calculating the characteristic diagram through the regional candidate network layer to obtain a plurality of candidate regions;
and calculating the feature map and the candidate regions through the region-of-interest pooling layer to obtain a plurality of candidate feature maps.
In one aspect, a multi-expert-model-based target detection apparatus is provided, and is used in a neural network model, where the neural network model includes a fast area convolutional neural network and a multi-expert model, and the apparatus includes:
the device comprises an acquisition module, a recognition module and a recognition module, wherein the acquisition module is used for acquiring an image to be recognized, and the image comprises at least one target object;
the first processing module is used for processing the image through a fast regional convolutional neural network to obtain a plurality of candidate feature maps;
a determining module, configured to determine candidate feature maps matching each expert model in the multiple expert models;
and the second processing module is used for processing the matched candidate feature map through each expert model to obtain the category and the position of each target object.
In a possible implementation manner, the determining module is further configured to:
for each expert model, acquiring a weight coefficient and a plurality of candidate feature maps of the expert model;
for each candidate feature map, inputting the candidate feature map and the weight coefficient into a predetermined function to obtain the probability of matching the candidate feature map with the expert model;
and determining the candidate feature map corresponding to the maximum probability as the candidate feature map matched with the expert model.
In a possible implementation manner, the determining module is further configured to:
acquiring a plurality of candidate feature maps, the weight of each expert model and an expert mark vector;
for each expert model, inputting the weight of the expert model and the candidate feature map into the predetermined function to obtain a first calculation result;
inputting the expert marking vector of the expert model and the first calculation result into a first loss function of the expert model to obtain a second calculation result;
and determining the weight coefficient of the expert model according to the second calculation result.
In one aspect, a computer-readable storage medium is provided, in which at least one instruction is stored, and the at least one instruction is loaded and executed by a processor to implement the multi-expert model based object detection method as described above.
In one aspect, an electronic device is provided, which includes a processor and a memory, where at least one instruction is stored in the memory, and the instruction is loaded and executed by the processor to implement the multi-expert model based object detection method as described above.
The technical scheme provided by the embodiment of the application has the beneficial effects that at least:
because the neural network model comprises the fast regional convolutional neural network and the multi-expert model, the image can be processed through the fast regional convolutional neural network to obtain a plurality of candidate characteristic graphs, then the candidate characteristic graph matched with each expert model is determined in the multi-expert model, and finally the matched candidate characteristic graph is processed through each expert model to obtain the category and the position of each target object. Because each expert model in the multiple expert models has a data area which is good at processing, different candidate feature maps can be processed by using different expert models, so that the different candidate feature maps can be processed by the expert model which is good at processing the data area, and the recognition rate of target detection is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flowchart of a method for multi-expert model based target detection provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a multi-expert model-based target detection method according to an embodiment of the present application;
fig. 3 is a block diagram of a target detection apparatus based on a multi-expert model according to still another embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application more clear, the embodiments of the present application will be further described in detail with reference to the accompanying drawings.
Referring to fig. 1, a flowchart of a method for detecting an object based on a multi-expert model according to an embodiment of the present application is shown, where the method for detecting an object based on a multi-expert model can be applied to a neural network model, and the neural network model includes a fast area convolution neural network and a multi-expert model. The target detection method based on the multi-expert model can comprise the following steps:
step 101, acquiring an image to be identified, wherein the image comprises at least one target object.
The image to be recognized may be captured by the electronic device or acquired from another electronic device, and the source of the image is not limited in this embodiment.
The target object is an object that needs to be recognized from the image.
And 102, processing the image through the fast regional convolutional neural network to obtain a plurality of candidate feature maps.
The fast regional convolutional neural network may be a Faster R-CNN (Faster Region-based Convolutional Neural Network), which includes a feature extraction layer, a region candidate network layer (RPN, Region Proposal Network), and a region-of-interest pooling layer (RoI pooling layer).
Specifically, processing the image through the fast regional convolutional neural network to obtain a plurality of candidate feature maps may include the following sub-steps.
In sub-step 1021, feature extraction is performed on the image through the feature extraction layer to obtain a feature map.
The feature extraction layer is a group of basic conv (convolution) + relu (activation) + pooling layers; performing feature extraction on the image with this layer yields the feature map.
In one example, VGG16 may be employed as a feature extraction layer to extract a feature map of an image, and VGG16 includes 13 convolutional layers, 13 activation layers, and 4 pooling layers.
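As a concrete illustration of the conv + relu + pooling stage described above, the sketch below runs one such stage on a toy single-channel image in plain NumPy. The kernel, image, and sizes are invented for illustration and are unrelated to VGG16's actual weights.

```python
import numpy as np

def conv2d(x, kernel):
    """Valid 2-D cross-correlation of a single-channel image with one kernel."""
    kh, kw = kernel.shape
    h, w = x.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    return np.maximum(x, 0.0)

def max_pool2x2(x):
    h, w = x.shape
    h, w = h - h % 2, w - w % 2  # drop odd edge rows/cols
    return x[:h, :w].reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

# One conv + relu + pool stage on a toy 8x8 "image".
image = np.arange(64, dtype=float).reshape(8, 8)
kernel = np.array([[1.0, 0.0, -1.0]] * 3)  # a simple edge-like filter
feature_map = max_pool2x2(relu(conv2d(image, kernel)))
print(feature_map.shape)  # (3, 3)
```

A real feature extraction layer stacks many such stages over multi-channel tensors; this toy version only shows how each stage shrinks the spatial size while keeping activations non-negative.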
In sub-step 1022, the feature map is calculated by the regional candidate network layer to obtain a plurality of candidate regions.
The regional candidate network layer is used for realizing the target positioning function.
Specifically, the region candidate network layer evaluates all spatial positions in the image according to the feature map and generates anchor boxes (also called anchor frames) from preset anchor sizes. In the early stage, 9 anchor boxes are set at each anchor point of the feature map as initial detection boxes, and the detection boxes are corrected in a later stage. The region candidate network layer then judges whether each anchor belongs to a positive sample or a negative sample according to the Intersection over Union (IoU) of the windows, where IoU is the ratio of the intersection to the union of the predicted bounding box and the real bounding box. In this embodiment, anchors with IoU greater than 0.7 may be set as positive samples and anchors with IoU less than 0.3 as negative samples, while anchors with IoU between 0.3 and 0.7 are discarded, leaving only the clearly positive and negative samples. After obtaining the anchor boxes, the region candidate network layer further judges whether each anchor box contains an object to be recognized, that is, whether it belongs to the foreground or the background (a binary classification problem), and keeps the anchor boxes belonging to the foreground, which amounts to a preliminary extraction of the candidate regions of the target object. Finally, the region candidate network layer applies bounding box regression (translating and scaling the anchor boxes) to obtain more reasonable bounding boxes and the final candidate regions.
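The IoU-based anchor labelling described above can be sketched as follows. The 0.7 and 0.3 thresholds come from the embodiment; the box coordinates are invented for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_anchor(anchor, gt_box, pos_thresh=0.7, neg_thresh=0.3):
    """Label an anchor against a ground-truth box as in the embodiment:
    IoU > 0.7 is positive, IoU < 0.3 is negative, anything in between
    is discarded (takes no part in training)."""
    v = iou(anchor, gt_box)
    if v > pos_thresh:
        return "positive"
    if v < neg_thresh:
        return "negative"
    return "ignored"

gt = (10, 10, 50, 50)
print(label_anchor((12, 12, 52, 52), gt))      # prints "positive"
print(label_anchor((100, 100, 140, 140), gt))  # prints "negative"
```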
In sub-step 1023, the feature map and the candidate regions are calculated through the region-of-interest pooling layer to obtain a plurality of candidate feature maps.
Considering that the sizes of the anchor boxes are not uniform, a conventional solution is to crop or scale the candidate regions to a common size, but doing so destroys the complete structure of the image and its original shape information. The region-of-interest pooling layer avoids this: it divides each candidate region on the feature map into a fixed grid and max-pools each grid cell, so that candidate feature maps of a uniform size are obtained without cropping or scaling.
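A minimal sketch of region-of-interest pooling: each candidate region, whatever its shape, is max-pooled into the same fixed grid. The grid size and feature values here are arbitrary illustrations.

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=2):
    """Max-pool a region (x1, y1, x2, y2) of a 2-D feature map into a
    fixed output_size x output_size grid, regardless of region shape."""
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]
    h, w = region.shape
    out = np.full((output_size, output_size), -np.inf)
    ys = np.linspace(0, h, output_size + 1).astype(int)
    xs = np.linspace(0, w, output_size + 1).astype(int)
    for i in range(output_size):
        for j in range(output_size):
            cell = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            if cell.size:
                out[i, j] = cell.max()
    return out

fmap = np.arange(36, dtype=float).reshape(6, 6)
# Two differently sized candidate regions yield identically shaped outputs.
print(roi_pool(fmap, (0, 0, 4, 4)).shape)  # (2, 2)
print(roi_pool(fmap, (1, 2, 6, 5)).shape)  # (2, 2)
```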
And step 103, determining candidate feature maps matched with each expert model in the multiple expert models.
In the image field, the appearance of an object may change greatly with its shape and the observation angle. This embodiment therefore introduces the multi-expert model: each expert model has a data region it is good at processing, so different candidate feature maps can be processed by different expert models, each handled by the expert best suited to its data region, which improves the recognition rate of target detection.
The multi-expert model may also be referred to as an Expert Management Network (EMN), which comprises an expert assignment (Expert Assignment) module and a plurality of expert models (Experts); the expert management network contains convolutional layers and fully connected layers. Each expert model is connected to two loss functions, which respectively perform object classification and bounding box regression on the candidate feature maps to obtain the category and the position offset of the target object.
When the expert management network assigns expert models, probabilities may be calculated to select appropriate expert models for classification and bounding box regression of candidate feature maps. Specifically, determining the candidate feature map matching each expert model in the multiple expert models may include: for each expert model, obtaining a weight coefficient and a plurality of candidate feature maps of the expert model; for each candidate feature map, inputting the candidate feature map and the weight coefficient into a predetermined function to obtain the probability of matching the candidate feature map with the expert model; and determining the candidate feature map corresponding to the maximum probability as the candidate feature map matched with the expert model.
In one implementation, the expert management network may calculate the probability according to formula one:

p(e) = f(m, w)

wherein p(e) represents the probability of selecting the expert model e, f(·) represents the predetermined function, m represents a candidate feature map, and w represents the weight coefficient.
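Assuming the predetermined function scores a candidate feature map against each expert's weights and that the scores are normalised into probabilities via a softmax (the exact form of formula one is only available as an image in the source, so both are assumptions), the assignment can be sketched as:

```python
import numpy as np

def expert_probabilities(m, expert_weights):
    """Score a flattened candidate feature map m against each expert's
    weight vector and normalise the scores into selection probabilities.
    The dot-product score and softmax normalisation are assumptions."""
    scores = np.array([w @ m for w in expert_weights])  # f(m, w) per expert
    exp = np.exp(scores - scores.max())                 # numerically stable softmax
    return exp / exp.sum()

m = np.array([0.2, 0.9, 0.1])            # a flattened candidate feature map
experts = [np.array([1.0, 0.0, 0.0]),
           np.array([0.0, 1.0, 0.0]),
           np.array([0.0, 0.0, 1.0])]
p = expert_probabilities(m, experts)
print(p.argmax())  # expert 1 matches this feature map best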
Obtaining the weight coefficients and the plurality of candidate feature maps of the expert model may include: acquiring a plurality of candidate feature maps, the weight of each expert model and an expert mark vector; for each expert model, inputting the weight of the expert model and the candidate feature map into a predetermined function to obtain a first calculation result; inputting the expert marking vector and the first calculation result of the expert model into a first loss function of the expert model to obtain a second calculation result; and determining the weight coefficient of the expert model according to the second calculation result.
In one implementation, the expert management network may calculate the weight coefficient w according to formula two:

w = argmin_w Loss(y, f(m, w))

wherein Loss(·) represents the first loss function and y represents the preset expert mark vector.
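A sketch of how the weight coefficient might be determined from the second calculation result. The squared-error loss, the linear form of f, and the gradient-descent minimisation are all assumptions here, since the patent leaves the loss function and f rendered as images.

```python
import numpy as np

def fit_expert_weight(candidate_maps, y, lr=0.1, steps=200):
    """Determine an expert's weight vector by minimising a loss between
    the expert mark vector y and f(m, w) over the candidate feature maps.
    Squared error, linear f, and gradient descent are assumptions."""
    M = np.asarray(candidate_maps, dtype=float)  # one flattened map per row
    w = np.zeros(M.shape[1])
    for _ in range(steps):
        pred = M @ w                             # first calculation result f(m, w)
        grad = 2 * M.T @ (pred - y) / len(y)     # gradient of the second result
        w -= lr * grad
    return w

maps = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 0.0, 1.0])  # preset expert mark vector
w = fit_expert_weight(maps, y)
print(np.round(maps @ w, 2))   # approaches the expert mark vector
```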
And 104, processing the matched candidate feature map through each expert model to obtain the category and the position of each target object.
After each expert model is assigned a candidate feature map, the candidate feature map may be input into the corresponding expert model for processing. Specifically, the processing the matched candidate feature map through each expert model to obtain the category and the position of each target object may include: for each expert model, processing the matched candidate feature map through a second loss function of the expert model to obtain the category of the target object; and processing the matched candidate characteristic graph through a third loss function of the expert model to obtain the position offset of the target object, and determining the position of the target object according to the position offset.
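The position offsets mentioned above can be turned into a final position roughly as follows. The centre-and-size encoding is the usual bounding-box-regression convention and is an assumption here, since the patent does not spell out the encoding.

```python
import math

def apply_offsets(box, offsets):
    """Shift and scale a box (x1, y1, x2, y2) by predicted position
    offsets (dx, dy, dw, dh), using the common centre/size encoding
    (an assumed convention, not stated in the patent)."""
    x1, y1, x2, y2 = box
    dx, dy, dw, dh = offsets
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    w, h = x2 - x1, y2 - y1
    cx, cy = cx + dx * w, cy + dy * h          # translate the centre
    w, h = w * math.exp(dw), h * math.exp(dh)  # rescale width and height
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

print(apply_offsets((10, 10, 30, 30), (0.0, 0.0, 0.0, 0.0)))  # unchanged box
print(apply_offsets((10, 10, 30, 30), (0.5, 0.0, 0.0, 0.0)))  # shifted right
```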
Referring to fig. 2, the image is input into the Faster R-CNN network: the image is convolved, activated, and pooled by the feature extraction layer to obtain a feature map; softmax classification and bounding box regression are performed on the feature map through the region candidate network layer to obtain candidate regions (Proposals); and the feature map and the candidate regions are processed through the region-of-interest pooling layer to obtain candidate feature maps. The candidate feature maps are then input into the EMN network: each candidate feature map is first allocated to the corresponding expert model (Expert) through expert assignment, and each expert model then processes its candidate feature maps to obtain the category and position of the target object.
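The flow of fig. 2 can be summarised as a control-flow sketch in which every stage is a placeholder callable rather than a real network; all names here are illustrative, not an actual API.

```python
def detect(image, backbone, rpn, roi_pool, assign_expert, experts):
    """End-to-end flow of fig. 2 as plain Python: every stage is passed
    in as a callable, so this only demonstrates the data flow."""
    fmap = backbone(image)               # conv + relu + pooling
    proposals = rpn(fmap)                # candidate regions
    results = []
    for roi in proposals:
        candidate = roi_pool(fmap, roi)  # fixed-size candidate feature map
        expert = experts[assign_expert(candidate)]
        results.append(expert(candidate))  # (category, position)
    return results

# Toy stand-ins exercise the control flow without any real network.
out = detect(
    image="img",
    backbone=lambda im: "fmap",
    rpn=lambda f: ["roi1", "roi2"],
    roi_pool=lambda f, r: r,
    assign_expert=lambda c: 0,
    experts=[lambda c: ("cat", (0, 0, 1, 1))],
)
print(len(out))  # 2
```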
In summary, according to the target detection method based on multiple expert models provided in the embodiment of the present application, since the neural network model includes the fast regional convolutional neural network and the multiple expert models, the image may be processed through the fast regional convolutional neural network to obtain a plurality of candidate feature maps, the candidate feature map matched with each expert model is determined in the multiple expert models, and finally the matched candidate feature map is processed through each expert model to obtain the category and position of each target object. Because each expert model in the multiple expert models has a data area which is good at processing, different candidate feature maps can be processed by using different expert models, so that the different candidate feature maps can be processed by the expert model which is good at processing the data area, and the recognition rate of target detection is improved.
Referring to fig. 3, a block diagram of a multi-expert model based object detection apparatus provided in an embodiment of the present application is shown, where the multi-expert model based object detection apparatus can be applied to a neural network model, and the neural network model includes a fast area convolution neural network and a multi-expert model. The target detection device based on the multi-expert model can comprise:
an obtaining module 310, configured to obtain an image to be identified, where the image includes at least one target object;
the first processing module 320 is configured to process the image through a fast regional convolutional neural network to obtain a plurality of candidate feature maps;
a determining module 330, configured to determine candidate feature maps matching each expert model in the multiple expert models;
and the second processing module 340 is configured to process the matched candidate feature map through each expert model to obtain a category and a position of each target object.
In an optional embodiment, the determining module 330 is further configured to:
for each expert model, obtaining a weight coefficient and a plurality of candidate feature maps of the expert model;
for each candidate feature map, inputting the candidate feature map and the weight coefficient into a predetermined function to obtain the probability of matching the candidate feature map with the expert model;
and determining the candidate feature map corresponding to the maximum probability as the candidate feature map matched with the expert model.
In an optional embodiment, the determining module 330 is further configured to:
acquiring a plurality of candidate feature maps, the weight of each expert model and an expert mark vector;
for each expert model, inputting the weight of the expert model and the candidate feature map into a predetermined function to obtain a first calculation result;
inputting the expert marking vector and the first calculation result of the expert model into a first loss function of the expert model to obtain a second calculation result;
and determining the weight coefficient of the expert model according to the second calculation result.
In an alternative embodiment, the second processing module 340 is further configured to:
for each expert model, processing the matched candidate feature map through a second loss function of the expert model to obtain the category of the target object;
and processing the matched candidate characteristic graph through a third loss function of the expert model to obtain the position offset of the target object, and determining the position of the target object according to the position offset.
In an optional embodiment, the fast area convolutional neural network includes a feature extraction layer, an area candidate network layer, and a region-of-interest pooling layer, and the first processing module 320 is further configured to:
extracting the features of the image through a feature extraction layer to obtain a feature map;
calculating the characteristic diagram through a regional candidate network layer to obtain a plurality of candidate regions;
and calculating the feature map and the candidate regions through the region-of-interest pooling layer to obtain a plurality of candidate feature maps.
In summary, according to the target detection device based on multiple expert models provided in the embodiment of the present application, since the neural network model includes the fast regional convolutional neural network and the multiple expert models, the image may be processed through the fast regional convolutional neural network to obtain a plurality of candidate feature maps, the candidate feature map matched with each expert model is determined in the multiple expert models, and finally the matched candidate feature map is processed through each expert model to obtain the category and the position of each target object. Because each expert model in the multiple expert models has a data area which is good at processing, different candidate feature maps can be processed by using different expert models, so that the different candidate feature maps can be processed by the expert model which is good at processing the data area, and the recognition rate of target detection is improved.
One embodiment of the present application provides a computer-readable storage medium, in which at least one instruction is stored, and the at least one instruction is loaded and executed by a processor to implement the multi-expert model based object detection method as described above.
One embodiment of the present application provides an electronic device, which includes a processor and a memory, where the memory stores at least one instruction, and the instruction is loaded and executed by the processor to implement the multi-expert model based object detection method as described above.
It should be noted that: in the target detection device based on the multi-expert model provided in the above embodiment, when the target detection device based on the multi-expert model performs the target detection based on the multi-expert model, only the division of the functional modules is taken as an example, and in practical application, the function distribution may be completed by different functional modules according to needs, that is, the internal structure of the target detection device based on the multi-expert model is divided into different functional modules to complete all or part of the functions described above. In addition, the target detection device based on the multi-expert model provided by the above embodiment and the target detection method based on the multi-expert model belong to the same concept, and the specific implementation process thereof is detailed in the method embodiment and is not described herein again.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description should not be taken as limiting the embodiments of the present application, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the embodiments of the present application should be included in the scope of the embodiments of the present application.

Claims (10)

1. A target detection method based on a multi-expert model is used in a neural network model, wherein the neural network model comprises a fast regional convolutional neural network and the multi-expert model, and the method comprises the following steps:
acquiring an image to be identified, wherein the image comprises at least one target object;
processing the image through the fast regional convolutional neural network to obtain a plurality of candidate feature maps;
determining candidate feature maps matching each expert model in the multiple expert models;
and processing the matched candidate feature map through each expert model to obtain the category and the position of each target object.
2. The method of claim 1, wherein determining, for each expert model in the multi-expert model, the candidate feature map matching that expert model comprises:
for each expert model, acquiring the weight coefficient of the expert model and a plurality of candidate feature maps;
for each candidate feature map, inputting the candidate feature map and the weight coefficient into a predetermined function to obtain the probability that the candidate feature map matches the expert model; and
determining the candidate feature map with the highest probability as the candidate feature map matching the expert model.
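The matching of claim 2 can be illustrated with a small sketch. The claim does not name the predetermined function, so a softmax over the dot products of each flattened candidate feature map with the expert's weight coefficients is assumed here:

```python
import numpy as np

def match_candidate(expert_weights, candidates):
    """Pick the candidate feature map that best matches one expert model.

    expert_weights: (d,) weight coefficients of the expert model
    candidates: (n, d) flattened candidate feature maps
    Returns (index of best match, matching probabilities summing to 1).
    """
    scores = candidates @ expert_weights           # one score per candidate
    scores -= scores.max()                         # numerical stability
    probs = np.exp(scores) / np.exp(scores).sum()  # softmax: match probability
    return int(np.argmax(probs)), probs

# toy example: 3 candidate maps of dimension 4
cands = np.array([[1., 0., 0., 0.],
                  [0., 1., 0., 0.],
                  [0., 0., 1., 0.]])
w = np.array([0., 5., 0., 0.])  # an expert favouring the second map
best, p = match_candidate(w, cands)
```

Because the softmax normalises over candidates, the "maximum probability" of the claim coincides with the maximum raw score.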
3. The method of claim 2, wherein acquiring the weight coefficient of the expert model and the plurality of candidate feature maps comprises:
acquiring a plurality of candidate feature maps and, for each expert model, its weight and expert mark vector;
for each expert model, inputting the weight of the expert model and the candidate feature map into the predetermined function to obtain a first calculation result;
inputting the expert mark vector of the expert model and the first calculation result into a first loss function of the expert model to obtain a second calculation result; and
determining the weight coefficient of the expert model according to the second calculation result.
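Claim 3 leaves the first loss function unspecified. One plausible reading, sketched below under that assumption, is a logistic gate trained with binary cross-entropy against the expert mark: the predetermined function's output (first calculation result) and the mark enter the loss (second calculation result), and the weight coefficient is updated from its gradient:

```python
import numpy as np

def gate_update(w, x, target, lr=0.1):
    """One gradient step on an expert model's weight coefficients.

    w: (d,) current weights; x: (d,) a flattened candidate feature map;
    target: scalar in {0, 1}, the expert mark for this candidate.
    Returns (updated weights, loss value).
    """
    z = float(w @ x)
    p = 1.0 / (1.0 + np.exp(-z))       # predetermined function (logistic here)
    loss = -(target * np.log(p + 1e-12)
             + (1 - target) * np.log(1 - p + 1e-12))  # first loss (BCE)
    grad = (p - target) * x            # d(loss)/d(w) for logistic + BCE
    return w - lr * grad, loss
```

Repeating the step drives the gate's output toward the expert mark, which is the sense in which the second calculation result "determines" the weight coefficient.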
4. The method of claim 1, wherein processing the matched candidate feature map through each expert model to obtain the category and the position of each target object comprises:
for each expert model, processing the matched candidate feature map through a second loss function of the expert model to obtain the category of the target object; and
processing the matched candidate feature map through a third loss function of the expert model to obtain the position offset of the target object, and determining the position of the target object according to the position offset.
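The position offset of claim 4 is not parameterised in the claim; in Faster R-CNN-style detectors it is conventionally a (dx, dy, dw, dh) offset decoded against the candidate box, which the sketch below assumes:

```python
import numpy as np

def apply_offset(box, offset):
    """Decode a (dx, dy, dw, dh) offset against a candidate box.

    box: (x1, y1, x2, y2); offset: (dx, dy, dw, dh) as in Faster R-CNN.
    Returns the refined (x1, y1, x2, y2) position.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    dx, dy, dw, dh = offset
    cx, cy = cx + dx * w, cy + dy * h      # shift the box centre
    w, h = w * np.exp(dw), h * np.exp(dh)  # rescale width and height
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)
```

A zero offset leaves the candidate box unchanged, so the regression head only has to learn small corrections.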
5. The method according to any one of claims 1 to 4, wherein the faster region-based convolutional neural network comprises a feature extraction layer, a region proposal network layer and a region-of-interest pooling layer, and processing the image through the faster region-based convolutional neural network to obtain a plurality of candidate feature maps comprises:
extracting features of the image through the feature extraction layer to obtain a feature map;
computing a plurality of candidate regions from the feature map through the region proposal network layer; and
computing the plurality of candidate feature maps from the feature map and the candidate regions through the region-of-interest pooling layer.
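The region-of-interest pooling step of claim 5 crops each candidate region out of the shared feature map and max-pools it to a fixed output size, so every candidate feature map has the same shape regardless of the region's size. A minimal single-channel sketch with integer coordinates (real implementations handle channels and sub-pixel alignment):

```python
import numpy as np

def roi_pool(feature_map, roi, out_size=2):
    """Max-pool one region of a 2-D feature map to out_size x out_size.

    feature_map: (H, W) array; roi: (x1, y1, x2, y2) integer coordinates.
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2]   # crop the candidate region
    h, w = region.shape
    ys = np.linspace(0, h, out_size + 1).astype(int)  # row bin edges
    xs = np.linspace(0, w, out_size + 1).astype(int)  # column bin edges
    out = np.empty((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            out[i, j] = region[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return out
```

The fixed output shape is what lets the downstream expert models consume candidate regions of arbitrary size.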
6. A target detection device based on a multi-expert model, for use in a neural network model, the neural network model comprising a faster region-based convolutional neural network and the multi-expert model, the device comprising:
an acquisition module, configured to acquire an image to be recognized, the image comprising at least one target object;
a first processing module, configured to process the image through the faster region-based convolutional neural network to obtain a plurality of candidate feature maps;
a determining module, configured to determine, for each expert model in the multi-expert model, a candidate feature map matching that expert model; and
a second processing module, configured to process the matched candidate feature map through each expert model to obtain the category and the position of each target object.
7. The device of claim 6, wherein the determining module is further configured to:
for each expert model, acquire the weight coefficient of the expert model and a plurality of candidate feature maps;
for each candidate feature map, input the candidate feature map and the weight coefficient into a predetermined function to obtain the probability that the candidate feature map matches the expert model; and
determine the candidate feature map with the highest probability as the candidate feature map matching the expert model.
8. The device of claim 7, wherein the determining module is further configured to:
acquire a plurality of candidate feature maps and, for each expert model, its weight and expert mark vector;
for each expert model, input the weight of the expert model and the candidate feature map into the predetermined function to obtain a first calculation result;
input the expert mark vector of the expert model and the first calculation result into a first loss function of the expert model to obtain a second calculation result; and
determine the weight coefficient of the expert model according to the second calculation result.
9. A computer-readable storage medium having stored therein at least one instruction, the at least one instruction being loaded and executed by a processor to implement the target detection method based on a multi-expert model according to any one of claims 1 to 5.
10. An electronic device, comprising a processor and a memory, the memory storing at least one instruction, the instruction being loaded and executed by the processor to implement the target detection method based on a multi-expert model according to any one of claims 1 to 5.
CN202110266617.8A 2021-03-11 2021-03-11 Target detection method, device, storage medium and equipment based on multi-expert model Pending CN112949731A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110266617.8A CN112949731A (en) 2021-03-11 2021-03-11 Target detection method, device, storage medium and equipment based on multi-expert model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110266617.8A CN112949731A (en) 2021-03-11 2021-03-11 Target detection method, device, storage medium and equipment based on multi-expert model

Publications (1)

Publication Number Publication Date
CN112949731A true CN112949731A (en) 2021-06-11

Family

ID=76229451

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110266617.8A Pending CN112949731A (en) 2021-03-11 2021-03-11 Target detection method, device, storage medium and equipment based on multi-expert model

Country Status (1)

Country Link
CN (1) CN112949731A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114120045A (en) * 2022-01-25 2022-03-01 北京猫猫狗狗科技有限公司 Target detection method and device based on multi-gate control hybrid expert model

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106599827A (en) * 2016-12-09 2017-04-26 浙江工商大学 Small target rapid detection method based on deep convolution neural network
CN109740665A (en) * 2018-12-29 2019-05-10 珠海大横琴科技发展有限公司 Shielded image ship object detection method and system based on expertise constraint
EP3493104A1 (en) * 2017-12-03 2019-06-05 Facebook, Inc. Optimizations for dynamic object instance detection, segmentation, and structure mapping
CN110163275A (en) * 2019-05-16 2019-08-23 西安电子科技大学 SAR image objective classification method based on depth convolutional neural networks
CN110321811A (en) * 2019-06-17 2019-10-11 中国工程物理研究院电子工程研究所 Depth is against the object detection method in the unmanned plane video of intensified learning
CN110751214A (en) * 2019-10-21 2020-02-04 山东大学 Target detection method and system based on lightweight deformable convolution
CN112130216A (en) * 2020-08-19 2020-12-25 中国地质大学(武汉) Geological advanced fine forecasting method based on convolutional neural network multi-geophysical prospecting method coupling
CN112417961A (en) * 2020-10-20 2021-02-26 上海大学 Sea surface target detection method based on scene prior knowledge

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"ME R-CNN: Multi-Expert R-CNN for Object Detection", IEEE TRANSACTIONS ON IMAGE PROCESSING, vol. 29, pages 1030 - 1044 *
WANG YAN: "Research on Change Detection and Target Classification Methods for SAR Images with Limited Information", China Doctoral Dissertations Full-text Database, Information Science and Technology Series, pages 136 - 67 *

Similar Documents

Publication Publication Date Title
CN111784685B (en) Power transmission line defect image identification method based on cloud edge cooperative detection
CN108229509B (en) Method and device for identifying object class and electronic equipment
CN111027493B (en) Pedestrian detection method based on deep learning multi-network soft fusion
CN109583483B (en) Target detection method and system based on convolutional neural network
CN111080693A (en) Robot autonomous classification grabbing method based on YOLOv3
CN113076871B (en) Fish shoal automatic detection method based on target shielding compensation
CN111523414A (en) Face recognition method and device, computer equipment and storage medium
CN111223084A (en) Chromosome cutting data processing method, system and storage medium
CN112541508A (en) Fruit segmentation and recognition method and system and fruit picking robot
CN111931581A (en) Agricultural pest identification method based on convolutional neural network, terminal and readable storage medium
CN111768415A (en) Image instance segmentation method without quantization pooling
CN113221956B (en) Target identification method and device based on improved multi-scale depth model
CN110852327A (en) Image processing method, image processing device, electronic equipment and storage medium
CN111553414A (en) In-vehicle lost object detection method based on improved Faster R-CNN
CN109165658B (en) Strong negative sample underwater target detection method based on fast-RCNN
CN113850136A (en) Yolov5 and BCNN-based vehicle orientation identification method and system
CN113420827A (en) Semantic segmentation network training and image semantic segmentation method, device and equipment
CN113487610B (en) Herpes image recognition method and device, computer equipment and storage medium
CN114998595A (en) Weak supervision semantic segmentation method, semantic segmentation method and readable storage medium
CN112949731A (en) Target detection method, device, storage medium and equipment based on multi-expert model
CN113627481A (en) Multi-model combined unmanned aerial vehicle garbage classification method for smart gardens
CN107368847A (en) A kind of crop leaf diseases recognition methods and system
CN111091122A (en) Training and detecting method and device for multi-scale feature convolutional neural network
CN114494441B (en) Grape and picking point synchronous identification and positioning method and device based on deep learning
CN114639013A (en) Remote sensing image airplane target detection and identification method based on improved Orient RCNN model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220111

Address after: 2209-c1, No. 19, Erquan East Road, Huizhi enterprise center, Xishan District, Wuxi City, Jiangsu Province, 214000

Applicant after: Wuxi yuspace Intelligent Technology Co.,Ltd.

Address before: Room 1101, block C, Kangyuan smart port, No. 50, Jiangdong Street, Jialing, Jianye District, Nanjing City, Jiangsu Province, 210000

Applicant before: Jiangsu Yu Space Technology Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20220914

Address after: Room 1101, block C, Kangyuan smart port, No. 50, Jiangdong Street, Jialing, Jianye District, Nanjing City, Jiangsu Province, 210000

Applicant after: Jiangsu Yu Space Technology Co.,Ltd.

Address before: 2209-c1, No. 19, Erquan East Road, Huizhi enterprise center, Xishan District, Wuxi City, Jiangsu Province, 214000

Applicant before: Wuxi yuspace Intelligent Technology Co.,Ltd.

TA01 Transfer of patent application right

Effective date of registration: 20230829

Address after: 2209-c1, No. 19, Erquan East Road, Huizhi enterprise center, Xishan District, Wuxi City, Jiangsu Province, 214000

Applicant after: Wuxi yuspace Intelligent Technology Co.,Ltd.

Address before: Room 1101, block C, Kangyuan smart port, No. 50, Jiangdong Street, Jialing, Jianye District, Nanjing City, Jiangsu Province, 210000

Applicant before: Jiangsu Yu Space Technology Co.,Ltd.