CN110705479A - Model training method, target recognition method, device, equipment and medium

Model training method, target recognition method, device, equipment and medium

Info

Publication number
CN110705479A
Authority
CN
China
Prior art keywords
network
feature extraction
extraction network
feature
matching model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910945706.8A
Other languages
Chinese (zh)
Inventor
侯峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Orion Star Technology Co Ltd
Original Assignee
Beijing Orion Star Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Orion Star Technology Co Ltd filed Critical Beijing Orion Star Technology Co Ltd
Priority to CN201910945706.8A
Publication of CN110705479A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods
    • G06N 3/084 Backpropagation, e.g. using gradient descent
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a training method for a network matching model, together with a target recognition method, device, equipment and medium. When the network matching model is trained, a first feature map of a template picture of the target to be detected and a second feature map of a sample picture are obtained through a first feature extraction network and a second feature extraction network respectively. Parameter values of the parameters of the two feature extraction networks are then determined according to the first position information labeled in the sample picture and the second position information determined in the second feature map. The first feature extraction network thus learns the features of the template picture while the second feature extraction network learns those of the sample picture, which improves the discriminability of the features the two networks extract, allows the search range to be enlarged, and ensures the success rate of subsequent recognition.

Description

Model training method, target recognition method, device, equipment and medium
Technical Field
The invention relates to the technical field of computer vision, and in particular to a training method for a network matching model, a target recognition method, a device, equipment and a medium.
Background
With the development of computer vision, template matching has become a common way to realize target tracking; it is often used to locate feature points in a picture quickly. The principle of template matching is to use a template picture to find, within a whole picture, the region most similar to the template.
Template matching plays an important role in industrial vision. For single-target detection, for example, template matching can run faster than comparable detection methods while still guaranteeing precision. Many target detection methods are slow because they must search the whole picture for the target, yet in practical applications the target usually occupies only a small part of the picture. If the search can be locked onto a small area, the computation of target detection is greatly reduced and its speed rises accordingly. Template matching can locate a characteristic part anywhere in the image, and the region containing the target can then be cropped out according to the relative position of that characteristic part and the target area, which helps speed up target detection.
In the prior art, the traditional template matching method from the OpenCV standard library is usually adopted, but it can only match targets within a small area; if the search range is too large, it causes false detections or missed detections.
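For reference, a minimal sketch of this traditional approach (the file names are placeholders; cv2.matchTemplate and cv2.minMaxLoc are the standard OpenCV calls, and the normalized-correlation method constant is an illustrative choice):

```python
import cv2

# Minimal sketch of traditional OpenCV template matching: the template
# slides over the whole picture, a similarity score is computed at every
# position, and the peak of the response map is taken as the match.
image = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)        # picture to search
template = cv2.imread("template.png", cv2.IMREAD_GRAYSCALE)  # template picture

response = cv2.matchTemplate(image, template, cv2.TM_CCOEFF_NORMED)
_, max_val, _, max_loc = cv2.minMaxLoc(response)

h, w = template.shape
top_left = max_loc                                # best-match corner (x, y)
bottom_right = (max_loc[0] + w, max_loc[1] + h)   # matched region
```

Such pixel-level matching only works reliably when the search area is small and the target in the picture closely matches the size of the template, which is exactly the limitation described above.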
Disclosure of Invention
The embodiment of the invention provides a training method for a network matching model, a target recognition method, a device, equipment and a medium, which are used to solve the problem that targets are missed or falsely detected when the template matching search range is too large.
The embodiment of the invention provides a training method for a network matching model, in which, for any sample picture in a sample set, first position information of the area where the target to be detected is located is marked in the sample picture. The method comprises the following steps:
acquiring a first feature map of the template picture of the target to be detected through a first feature extraction network of a network matching model; acquiring a second feature map of the sample picture through a second feature extraction network of the network matching model;
performing convolution processing on the first feature map and the second feature map through a convolutional layer of the network matching model to determine second position information of the target to be detected in the second feature map;
and training the network matching model according to the first position information and the second position information so as to respectively determine parameter values of parameters of the first feature extraction network and the second feature extraction network.
In a possible implementation, the obtaining of the second feature map of the sample picture includes: acquiring the feature map output by the last convolutional layer in the second feature extraction network and the feature map output by at least one other convolutional layer in the second feature extraction network, and determining a corresponding second feature map from each of these feature maps;
the convolving the first feature map and the second feature map by the convolutional layer of the network matching model to determine second position information of the target to be detected in the second feature map includes: performing convolution processing on the first feature map and each second feature map through a convolution layer of the network matching model to determine second position information of the target to be detected in each second feature map;
the training the network matching model according to the first position information and the second position information includes: and training the network matching model according to the first position information and each piece of second position information.
In a possible implementation, the training the network matching model according to the first location information and each of the second location information includes:
determining the sum of the loss values between the first position information and each piece of second position information;
and training the first feature extraction network and the second feature extraction network according to the sum of the loss values so as to update the parameters of the first feature extraction network and the second feature extraction network.
In one possible embodiment, the number of second feature maps obtained is 3.
In a possible implementation manner, if the second feature extraction network includes seven convolutional layers, the obtained second feature maps are feature maps output by a second convolutional layer, a fourth convolutional layer and a seventh convolutional layer in the second feature extraction network, respectively.
The invention also provides a target recognition method based on the network matching model obtained by the above training method, which comprises the following steps:
respectively inputting a template picture of a target to be detected and the picture to be detected into a first feature extraction network and a second feature extraction network in a trained network matching model;
and identifying the information of the region where the target to be detected in the picture to be detected is located based on a third feature map and a fourth feature map which are respectively output by a first feature extraction network and a second feature extraction network in the trained network matching model, wherein the parameters in the first feature extraction network in the trained network matching model are different from the parameters in the second feature extraction network.
The embodiment of the present invention further provides a training device for a network matching model, where, for any sample picture in a sample set, first position information of the area where the target to be detected is located is marked in the sample picture. The device includes:
the first extraction module is used for obtaining, through a first feature extraction network of the network matching model, a first feature map of the template picture of the target to be detected;
the second extraction module is used for acquiring a second feature map of the sample picture through a second feature extraction network of the network matching model;
the matching module is used for performing convolution processing on the first feature map and the second feature map through the convolutional layer of the network matching model to determine second position information of the target to be detected in the second feature map;
and the analysis module is used for training the network matching model according to the first position information and the second position information so as to respectively determine parameter values of parameters of the first feature extraction network and the second feature extraction network.
In a possible implementation manner, the second extraction module is specifically configured to acquire the feature map output by the last convolutional layer in the second feature extraction network and the feature map output by at least one other convolutional layer in the second feature extraction network, and determine a corresponding second feature map from each feature map;
the matching module is specifically configured to perform convolution processing on the first feature map and each of the second feature maps through the convolution layer of the network matching model, and determine second position information of the target to be detected in each of the second feature maps;
the analysis module is specifically configured to train the network matching model according to the first location information and each piece of the second location information.
In a possible implementation, the analysis module is specifically configured to determine the sum of the loss values between the first position information and each piece of second position information, and to train the first feature extraction network and the second feature extraction network according to the sum of the loss values so as to update the parameters of the first feature extraction network and the second feature extraction network.
The embodiment of the invention provides a target recognition device based on the network matching model obtained by the above training device, which comprises:
the input module is used for respectively inputting the template picture of the target to be detected and the picture to be detected into a first feature extraction network and a second feature extraction network in the trained network matching model;
and the identification module is used for identifying the information of the region where the target to be detected is located in the picture to be detected based on a third feature map and a fourth feature map which are respectively output by a first feature extraction network and a second feature extraction network in the trained network matching model, wherein the parameters in the first feature extraction network in the trained network matching model are different from the parameters in the second feature extraction network.
An embodiment of the present invention provides an electronic device, which includes a processor configured to implement the steps of any one of the above training methods for a network matching model when executing a computer program stored in a memory.
An embodiment of the present invention provides an electronic device, which includes a processor, and the processor is configured to implement the steps of the target recognition method based on the training method of the network matching model when executing a computer program stored in a memory.
An embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the steps of any of the network matching model training methods.
An embodiment of the present invention provides a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the computer program implements the steps of the target recognition method based on the training method of the network matching model.
According to the embodiment of the invention, when the network matching model is trained, the first feature map of the template picture of the target to be detected and the second feature map of the sample picture are obtained through the first feature extraction network and the second feature extraction network respectively, and the parameter values of the parameters of the two networks are determined respectively according to the first position information labeled in the sample picture and the second position information determined in the second feature map. The first feature extraction network thus learns the features of the template picture while the second feature extraction network learns those of the sample picture, which improves the discriminability of the features the two networks extract, allows the search range to be enlarged, and ensures the success rate of subsequent recognition.
Drawings
Fig. 1 is a schematic diagram of a training process of a network matching model according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a connection structure of a network matching model according to an embodiment of the present invention;
fig. 3 is a flowchart of a specific training method of a network matching model according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a target recognition process of the training method based on the network matching model according to the embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a training apparatus for a network matching model according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of a target recognition device of a training device based on the network matching model according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to expand the search range of template matching and improve the robustness of template matching, the embodiment of the invention provides a training method of a network matching model, a target identification method, a device, equipment and a medium.
Example 1:
Fig. 1 is a schematic diagram of the training process of a network matching model according to an embodiment of the present invention, where the training process includes the following steps:
S101: acquiring a first feature map of the template picture of the target to be detected through the first feature extraction network of the network matching model.
In order to facilitate recognition of the target to be detected, the network matching model used for recognition needs to be trained in advance, on the basis of the template picture of the target to be detected.
The template picture is a picture of the target to be detected: it contains only the target itself, whereas a sample picture contains the target to be detected along with other content, such as surroundings or background. For example, the template picture may be a small five-pointed star, while the sample picture contains five-pointed, three-pointed and four-pointed stars of different sizes.
The first feature extraction network is used to extract features of the target to be detected from the template picture and output a first feature map. It comprises convolutional layers and pooling layers: the number of convolutional layers is set according to specific actual requirements and is generally at least two, and the pooling layers reduce the dimensionality of the feature map matrices, which speeds up the training of the network matching model and suppresses over-fitting.
For example, the first feature extraction network may consist of five convolutional layers and one pooling layer, or of seven convolutional layers and two pooling layers. These layer counts are given only for convenience of description; the number and structure of the layers can be set according to actual requirements.
As for the processing of the template picture in the first feature extraction network, a person skilled in the art can determine, based on the description of the embodiment of the present invention, how the first feature map of the template picture is obtained through the first feature extraction network; the details are not repeated here.
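As an illustration only (the framework choice, channel widths, kernel sizes and layer count below are assumptions, not prescribed by this disclosure), a feature extraction network of the kind described above might be sketched in PyTorch as follows:

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Sketch of a feature extraction network: stacked convolutional
    layers with a pooling layer, as described above. Channel widths,
    kernel sizes and layer count are illustrative assumptions."""
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # pooling reduces the feature map dimensions
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)

# Same structure, separately trained parameter values:
first_net = FeatureExtractor()   # learns features of the template picture
second_net = FeatureExtractor()  # learns features of the sample pictures
```

The two instances share one structure but keep separate parameters, matching the requirement below that the two networks end up with different parameter values.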
S102: acquiring a second feature map of the sample picture through the second feature extraction network of the network matching model.
The sample picture is any picture in the sample set. The sample pictures in the sample set are labeled in advance: each is marked with first position information of the area where the target to be detected is located, for example as a rectangular frame drawn in the sample picture.
The second feature extraction network and the first feature extraction network have the same number of convolutional layers and pooling layers, and the structures of the layers are the same, that is, the first feature extraction network and the second feature extraction network have the same network structure. The second feature extraction network is used for performing feature extraction on the sample picture so as to output a second feature map.
To improve robustness, in the embodiment of the present invention the two networks are allowed to end up with different network parameters when the network matching model is trained: the first feature extraction network learns the features of the template picture, and the second feature extraction network learns the features of the sample picture.
S103: performing convolution processing on the first feature map and the second feature map through the convolutional layer of the network matching model to determine second position information of the target to be detected in the second feature map.
In one possible implementation, in order to determine the second position information of the object to be detected in the second feature map, the first feature map is used as a convolution kernel to perform convolution processing on the second feature map, so that the second position information of the object to be detected in the sample picture is determined.
Specifically, the process of performing convolution processing on the first feature map and the second feature map to determine the second position information of the target to be detected in the sample picture can be implemented by a person skilled in the art based on the above description, and is not described here again.
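A sketch of this step (a minimal form, assuming batch size 1 and matching channel counts; F.conv2d with the template's feature map used as the kernel realizes the convolution described above):

```python
import torch
import torch.nn.functional as F

def locate(first_map: torch.Tensor, second_map: torch.Tensor):
    """Convolve the second feature map with the first feature map used
    as a kernel; the peak of the response map gives the second position
    information in feature-map coordinates.

    first_map:  (1, C, h, w) - first feature map of the template picture
    second_map: (1, C, H, W) - second feature map of the sample picture
    """
    response = F.conv2d(second_map, first_map)  # shape (1, 1, H-h+1, W-w+1)
    idx = int(torch.argmax(response))
    row, col = divmod(idx, response.shape[-1])  # peak = predicted position
    return response, (row, col)
```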
S104: training the network matching model according to the first position information and the second position information to respectively determine parameter values of parameters of the first feature extraction network and the second feature extraction network.
In specific implementation, after the second position information of the target to be detected in the second feature map is determined, a loss value can be calculated from the first position information (labeled in the sample picture in advance) and the second position information. The first feature extraction network and the second feature extraction network are then trained according to this loss value so as to update the parameter values of their parameters, with the result that the two networks end up with different parameter values.
In a specific implementation, when the first feature extraction network and the second feature extraction network are trained according to the loss value, a gradient descent algorithm may be adopted to back-propagate the gradients of the parameters in both networks so as to update their parameter values.
The sample set used to train the network matching model contains a large number of sample pictures. The above operations are carried out for each sample picture together with the template picture of the target to be detected, and training of the network matching model ends when a preset convergence condition is met.
The preset convergence condition may be, for example, that the loss value calculated from the first and second position information is smaller than a set threshold, or that the number of training iterations reaches a set maximum, and so on. This can be set flexibly in specific implementations and is not particularly limited here.
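The sketch below shows one such training iteration. The disclosure only requires a loss between the first and second position information and gradient-descent updates; encoding the labeled first position information as a target response map and using a cross-entropy loss is an illustrative, differentiable stand-in, and plain SGD with the given learning rate is likewise an assumption (first_net and second_net are the extractors sketched earlier):

```python
import torch
import torch.nn.functional as F

# Optimizer over the parameters of both feature extraction networks.
optimizer = torch.optim.SGD(
    list(first_net.parameters()) + list(second_net.parameters()), lr=1e-3)

def train_step(template: torch.Tensor, sample: torch.Tensor,
               target_map: torch.Tensor) -> float:
    """One gradient-descent update. target_map encodes the labeled first
    position information at response-map resolution (1 inside the target
    region, 0 elsewhere) so that the position loss is differentiable."""
    first_map = first_net(template)             # first feature map
    second_map = second_net(sample)             # second feature map
    response = F.conv2d(second_map, first_map)  # matching convolution
    loss = F.binary_cross_entropy_with_logits(response, target_map)
    optimizer.zero_grad()
    loss.backward()    # back-propagate gradients through both networks
    optimizer.step()   # the two networks' parameter values diverge
    return loss.item()
```

Training would loop over the sample set calling train_step until the loss falls below the chosen threshold or the maximum iteration count is reached, matching the convergence conditions above.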
According to the embodiment of the invention, when the network matching model is trained, the first feature map of the template picture of the target to be detected and the second feature map of the sample picture are obtained through the first feature extraction network and the second feature extraction network respectively, and the parameter values of the parameters of the two networks are determined respectively according to the first position information labeled in the sample picture and the second position information determined in the second feature map. The two networks thus separately learn the features of the template picture and of the sample picture, which improves the discriminability of the features they extract, allows the search range to be enlarged, and ensures the success rate of subsequent recognition.
Example 2:
When target recognition is carried out with a template matching model trained by an existing training method, a high similarity is required between the target to be detected in the picture and the target in the template picture: the target may be detected only if its size in the picture is identical, or nearly identical, to its size in the template picture, so robustness is poor. To improve the robustness of detection, on the basis of the above embodiment, in the embodiment of the present invention:
the obtaining of the second feature map of the sample picture includes: respectively acquiring a feature map output by the last convolutional layer in the second feature extraction network and a feature map output by at least one other convolutional layer in the second feature extraction network, and determining a second feature map corresponding to each feature map according to each feature map;
the convolving the first feature map and the second feature map by the convolutional layer of the network matching model to determine second position information of the target to be detected in the second feature map includes: performing convolution processing on the first feature map and each second feature map through a convolution layer of the network matching model to determine second position information of the target to be detected in each second feature map;
the training the network matching model according to the first position information and the second position information includes: and training the network matching model according to the first position information and each piece of second position information.
To further improve robustness, in the embodiment of the present invention at least two feature maps of the sample picture may be obtained when the sample picture passes through the second feature extraction network: the feature map output by the last convolutional layer is acquired, together with the feature map output by at least one other convolutional layer, and a corresponding second feature map is determined from each acquired feature map. Because the feature maps output by different convolutional layers of the second feature extraction network differ in length and width, that is, each convolutional layer has a different receptive field, acquiring the feature maps of at least two convolutional layers during training yields second feature maps of different sizes. After the first feature map is convolved with each second feature map by the network matching model, the second position information of the target to be detected can therefore be determined in second feature maps of different sizes.
Specifically, when each corresponding second feature map is determined from a feature map acquired in the second feature extraction network, the network structure of the second feature extraction network can be set flexibly as needed; for example, each acquired feature map may be used directly as a second feature map.
For another example, when the last layer of the second feature extraction network is not a convolutional layer, the last convolutional layer may be followed by further network layers such as a pooling layer or a fully connected layer. The feature map output by the last convolutional layer is then further processed by these subsequent layers, and the feature map output by the last network layer of the second feature extraction network is used as the second feature map determined from the feature map output by the last convolutional layer. The feature map output by at least one other convolutional layer can be handled in the same way, that is, processed by the subsequent network layers to obtain the corresponding second feature map; of course, it may also be used directly as a second feature map.
Fig. 2 is a schematic diagram of the connection structure of a network matching model according to an embodiment of the present invention. It shows the three convolutional layers of the second feature extraction network whose feature maps are output; the second feature extraction network may also contain other convolutional layers and pooling layers. For convenience of description, the first feature extraction network contains the same number of convolutional and pooling layers as the second and likewise comprises a plurality of convolutional layers. In the first feature extraction network, the first convolutional layer convolves the template picture of the target to be detected and sends the resulting feature map to the next convolutional layer; every subsequent convolutional layer receives the feature map output by the previous one and outputs its own feature map after convolution, and once the last convolutional layer has output its feature map, the corresponding first feature map is determined from it, in the same way as the feature extraction network of an existing network matching model outputs its feature map. Since the first feature map is determined only from the feature map output by the last convolutional layer of the first feature extraction network, Fig. 2 shows the first feature extraction network only schematically, without drawing its convolutional and pooling layers.
Note that convolutional layer C shown in Fig. 2 represents the last convolutional layer in the second feature extraction network.
For convenience of description, the first and second feature extraction networks shown in Fig. 2 are described as containing only three convolutional layers each: convolutional layer A, convolutional layer B and convolutional layer C, connected in sequence, where convolutional layer A is any convolutional layer of the second feature extraction network other than convolutional layer C, and convolutional layer B is any convolutional layer other than layers A and C. The convolutional layer that convolves the first feature map from the first feature extraction network with the second feature maps from the second feature extraction network is convolutional layer Z. When the network matching model is trained, the feature map output by the last convolutional layer of the first feature extraction network, that is, its convolutional layer C, is acquired, and the corresponding first feature map is determined from it.
To further improve robustness, the feature maps output by any at least two convolutional layers of the second feature extraction network are acquired. In the embodiment of the present invention, taking the network matching model shown in Fig. 2 as an example, where the last layer of each feature extraction network is a convolutional layer, the feature maps output by convolutional layers A, B and C of the second feature extraction network are acquired and a corresponding second feature map is determined from each. That is, after convolutional layer A outputs its feature map, a second feature map is determined from it, and the feature map is also sent on to convolutional layer B for further convolution; after convolutional layer B's convolution, a second feature map is likewise determined from its output, which is in turn sent to the last convolutional layer, convolutional layer C; after convolutional layer C's convolution, a second feature map is determined from its output as well. Three second feature maps of the sample picture are thus obtained through the second feature extraction network.
The acquired first feature map is then convolved with each of the three second feature maps, and second position information of the target to be detected is determined in each of them. In Fig. 2, the dotted lines indicate that the second feature maps determined from the outputs of convolutional layers A and B are each input to convolutional layer Z, so that convolutional layer Z performs convolution processing on the first feature map and each received second feature map.
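A sketch of this Fig. 2 arrangement (three convolutional layers only, as in the simplified figure; channel widths are assumptions, and the first feature map is assumed to have the same channel count as the second feature maps):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SecondFeatureExtractor(nn.Module):
    """Second feature extraction network of Fig. 2: convolutional layers
    A, B and C in sequence, keeping the output of each as a second
    feature map (channel widths here are illustrative)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.conv_a = nn.Conv2d(3, channels, 3, padding=1)
        self.conv_b = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv_c = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor):
        map_a = torch.relu(self.conv_a(x))
        map_b = torch.relu(self.conv_b(map_a))
        map_c = torch.relu(self.conv_c(map_b))  # last convolutional layer
        return [map_a, map_b, map_c]            # three second feature maps

def layer_z(first_map: torch.Tensor, second_maps) -> list:
    """Convolutional layer Z: convolve the first feature map with each
    second feature map, yielding one response map per scale."""
    return [F.conv2d(m, first_map) for m in second_maps]
```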
In order to further optimize parameters in the feature extraction network, on the basis of the above embodiment, in an embodiment of the present invention, the training the network matching model according to the first location information and each piece of the second location information specifically includes:
determining the sum of the loss values between the first position information and each piece of second position information;
and training the first feature extraction network and the second feature extraction network according to the sum of the loss values so as to update the parameters of the first feature extraction network and the second feature extraction network.
In specific implementation, since the sample picture is labeled in advance with the first position information of the target to be detected, a loss value can be determined between each piece of second position information and the labeled first position information. These loss values are summed, and the first feature extraction network and the second feature extraction network are trained according to the resulting sum, so that the two networks obtain different parameter values.
In a specific implementation, when the first feature extraction network and the second feature extraction network are trained according to the sum of the loss values, a gradient descent algorithm may be adopted to back-propagate the gradients of the parameters in both networks so as to update their parameter values.
Training the first and second feature extraction networks on the sum of the loss values obtained for each second feature map lets the feature extraction networks adjust their parameters according to multi-size information of the target to be detected relative to the template picture, which strengthens the robustness of the feature extraction networks to size differences between the target to be detected and the template picture.
Example 3:
In order to solve the problem that the traditional training algorithm for a network matching model cannot learn the multiple sizes of the target to be detected, on the basis of the above embodiments, in the embodiment of the present invention the number of second feature maps acquired is 3.
When the corresponding second feature maps are determined from the feature maps output by the convolutional layers of the second feature extraction network, each second feature map influences the matching result. The feature map output by every convolutional layer could be acquired, a corresponding second feature map determined from each, and every second feature map convolved with the first feature map. This would let the network matching model learn feature information at every size of the second feature extraction network; moreover, when the loss value is calculated, the second position information determined in the second feature maps at every size could be weighed against the manually labeled first position information, facilitating the subsequent adjustment of the parameter values of the convolutional layer parameters in both feature extraction networks.
However, since the change of scale between the feature maps of two adjacent convolutional layers is not especially large, second feature maps determined from the output of every single convolutional layer contribute little additional scale invariance. Conversely, if too few convolutional layers are selected to output second feature maps, the gradient feedback the network receives about scale invariance is not pronounced enough. Therefore, in the embodiment of the present invention, the number of second feature maps acquired is 3.
A convolutional layer that outputs a second feature map may be called a second feature extraction convolutional layer. When selecting these layers, non-adjacent convolutional layers of the second feature extraction network may be chosen, for example layers separated by one convolutional layer, or by two. If the second feature extraction network contains 7 convolutional layers, the second feature maps acquired may be those output by its second, fourth and seventh convolutional layers.
Specifically, when the second feature extraction network contains 7 convolutional layers, the feature maps output by the second, fourth and seventh convolutional layers can be acquired, and the corresponding second feature maps determined from them. Convolving each second feature map with the first feature map output by the first feature extraction network helps the network matching model learn multi-size feature information of the second feature extraction network and facilitates the subsequent adjustment of the parameters of every convolutional layer in both networks, while greatly reducing the amount of calculation.
For example, assume the second feature extraction network contains 7 convolutional layers connected in the order: first convolutional layer, second convolutional layer, pooling layer, third convolutional layer, fourth convolutional layer, pooling layer, fifth convolutional layer, sixth convolutional layer, pooling layer, seventh convolutional layer. An input sample picture is convolved by the first convolutional layer, which outputs feature map 1; feature map 1 is convolved by the second convolutional layer, which outputs feature map 2, and a corresponding second feature map is determined from feature map 2. Feature map 2 is pooled into feature map 3, which the third convolutional layer convolves into feature map 4; the fourth convolutional layer convolves feature map 4 into feature map 5, and a corresponding second feature map is determined from feature map 5. Feature map 5 is pooled into feature map 6, which the fifth convolutional layer convolves into feature map 7; the sixth convolutional layer convolves feature map 7 into feature map 8, which is pooled into feature map 9; finally, the seventh convolutional layer convolves feature map 9 into feature map 10, and a corresponding second feature map is determined from feature map 10.
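A sketch of exactly this connection order (channel widths are assumptions; the returned list corresponds to the second feature maps determined from feature maps 2, 5 and 10 above):

```python
import torch
import torch.nn as nn

class SevenConvExtractor(nn.Module):
    """Second feature extraction network with seven convolutional layers
    and three pooling layers in the order described above; the outputs
    of the second, fourth and seventh convolutional layers are kept as
    the three second feature maps."""
    def __init__(self, c: int = 64):
        super().__init__()
        self.conv1 = nn.Conv2d(3, c, 3, padding=1)
        self.conv2 = nn.Conv2d(c, c, 3, padding=1)
        self.conv3 = nn.Conv2d(c, c, 3, padding=1)
        self.conv4 = nn.Conv2d(c, c, 3, padding=1)
        self.conv5 = nn.Conv2d(c, c, 3, padding=1)
        self.conv6 = nn.Conv2d(c, c, 3, padding=1)
        self.conv7 = nn.Conv2d(c, c, 3, padding=1)
        self.pool = nn.MaxPool2d(2)

    def forward(self, x: torch.Tensor):
        f2 = torch.relu(self.conv2(torch.relu(self.conv1(x))))              # feature map 2
        f5 = torch.relu(self.conv4(torch.relu(self.conv3(self.pool(f2)))))  # feature map 5
        f8 = torch.relu(self.conv6(torch.relu(self.conv5(self.pool(f5)))))
        f10 = torch.relu(self.conv7(self.pool(f8)))                         # feature map 10
        return [f2, f5, f10]  # second feature maps at three scales
```

The three outputs differ in length and width (full, half and one-eighth resolution here), realizing the differing receptive fields discussed above.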
The following specific embodiment describes the training process of the network matching model of the embodiment of the present invention in detail. In this embodiment, the sample set contains a plurality of sample pictures, each marked with first position information of the area where the target to be detected is located. Fig. 3 is a flowchart of the specific training method, comprising:
S301: acquiring a template picture of the target to be detected.
S302: acquiring a first feature map of the template picture of the target to be detected through the first feature extraction network of the network matching model.
The following processing of S303 to S306 is performed for each sample picture in the sample set:
S303: obtaining, through the network matching model, the feature maps output by convolutional layer A, convolutional layer B and convolutional layer C of the second feature extraction network shown in Fig. 2, and determining a corresponding second feature map from each feature map.
S304: performing convolution processing on the first feature map and each second feature map, and determining second position information of the target to be detected in each second feature map.
S305: determining the sum of the loss values between the first position information and each piece of second position information.
S306: training the first feature extraction network and the second feature extraction network according to the sum of the loss values, for example with a gradient descent algorithm, so as to update the parameters of the first feature extraction network and the second feature extraction network.
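Putting S302 to S306 together for one sample picture (a sketch reusing the earlier illustrative loss formulation; second_net is assumed to return the three second feature maps as in the sketches above, and target_maps encodes the labeled first position information at each of the three scales):

```python
import torch
import torch.nn.functional as F

# Optimizer over both networks, as in the earlier sketch.
optimizer = torch.optim.SGD(
    list(first_net.parameters()) + list(second_net.parameters()), lr=1e-3)

def train_step_multiscale(template, sample, target_maps) -> float:
    first_map = first_net(template)    # S302: first feature map
    second_maps = second_net(sample)   # S303: three second feature maps
    losses = []
    for second_map, target in zip(second_maps, target_maps):
        response = F.conv2d(second_map, first_map)  # S304
        losses.append(F.binary_cross_entropy_with_logits(response, target))
    total = torch.stack(losses).sum()  # S305: sum of the loss values
    optimizer.zero_grad()
    total.backward()                   # S306: gradient descent update
    optimizer.step()
    return total.item()
```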
Example 4:
Fig. 4 is a schematic diagram of the target recognition process based on the trained network matching model according to an embodiment of the present invention, where the process includes the following steps:
S401: respectively inputting the template picture of the target to be detected and the picture to be detected into the first feature extraction network and the second feature extraction network of the trained network matching model.
S402: identifying the information of the area where the target to be detected is located in the picture to be detected, based on the third feature map and the fourth feature map output respectively by the first feature extraction network and the second feature extraction network of the network matching model, where the parameters of the first feature extraction network in the trained network matching model differ from those of the second feature extraction network.
Specifically, taking first and second feature extraction networks of 7 convolutional layers as an example: the template picture of the target to be detected and the picture to be detected are acquired; through the network matching model, the third feature map of the template picture and the fourth feature map output by the last convolutional layer of the second feature extraction network are obtained; and the third feature map is convolved with the fourth feature map to determine the information of the area where the target to be detected is located.
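A sketch of this recognition step (the stride value is an assumption tied to the number of pooling layers in the networks, three halvings giving a stride of 8 here, and is used only to map the response peak back to approximate picture coordinates):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def recognize(template, picture, first_net, second_net, stride: int = 8):
    """Inference with the trained model: the third feature map (template)
    is convolved with the fourth feature map (picture to be detected) and
    the response peak is mapped back to picture coordinates."""
    third_map = first_net(template)
    fourth_map = second_net(picture)
    if isinstance(fourth_map, (list, tuple)):  # multi-scale extractor
        fourth_map = fourth_map[-1]            # use the last conv layer's map
    response = F.conv2d(fourth_map, third_map)
    idx = int(torch.argmax(response))
    row, col = divmod(idx, response.shape[-1])
    return row * stride, col * stride          # approximate region location
```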
To better realize target recognition, the model used in this target recognition process is one trained by the training method of the network matching model in the above embodiments.
As for how the feature maps are extracted from the template picture and the picture to be detected, and how the information of the area where the target to be detected is located is identified, a person skilled in the art can determine, based on the description of the embodiments of the present invention, how to obtain the corresponding feature maps through the first and second feature extraction networks and recognize the target to be detected from them; the details are not repeated here.
In the target recognition method based on the above training method for a network matching model, the parameters of the first and second feature extraction networks are determined after the networks separately learn the features of the template picture and of the sample pictures, so the features the two networks extract are more discriminative, the search range during target recognition is wider, and the success rate of recognizing the target to be detected is ensured.
Example 5:
Fig. 5 is a schematic structural diagram of a training device for a network matching model according to an embodiment of the present invention. The training device includes:
a first extraction module 501, configured to obtain, through the first feature extraction network of the network matching model, a first feature map of the template picture of the target to be detected;
a second extraction module 502, configured to obtain a second feature map of the sample picture through the second feature extraction network of the network matching model;
the matching module 503 is configured to perform convolution processing on the first feature map and the second feature map through a convolution layer of the network matching model, and determine second position information of the target to be detected in the second feature map;
an analysis module 504, configured to train the network matching model according to the first location information and the second location information, so as to determine parameter values of parameters of the first feature extraction network and the second feature extraction network, respectively.
In a possible embodiment, the second extraction module 502 is specifically configured to obtain a feature map output by a last convolutional layer in the second feature extraction network and a feature map output by at least one other convolutional layer in the second feature extraction network, and determine each corresponding second feature map according to each feature map;
the matching module 503 is specifically configured to perform convolution processing on the first feature map and each of the second feature maps through the convolution layer of the network matching model, and determine second position information of the target to be detected in each of the second feature maps;
the analysis module 504 is specifically configured to train the network matching model according to the first location information and each piece of the second location information.
In a possible embodiment, the analysis module 504 is specifically configured to determine a sum of the loss values of the first location information and each second location information; and training the first feature extraction network and the second feature extraction network according to the sum of the loss values so as to update the parameters of the first feature extraction network and the second feature extraction network.
According to the embodiment of the invention, when the network matching model is trained, the first feature map of the template picture of the target to be detected and the second feature map of the sample picture are obtained through the first feature extraction network and the second feature extraction network respectively, and the parameter values of the parameters of the two networks are determined respectively according to the first position information labeled in the sample picture and the second position information determined in the second feature map. The two networks thus separately learn the features of the template picture and of the sample picture, which improves the discriminability of the features they extract, allows the search range to be enlarged, and ensures the success rate of subsequent recognition.
Example 6:
Fig. 6 is a schematic structural diagram of a target recognition device based on the network matching model according to an embodiment of the present invention. The target recognition device includes:
the input module 601 is configured to input a template picture of a target to be detected and a picture to be detected into a first feature extraction network and a second feature extraction network in a trained network matching model respectively;
the identifying module 602 is configured to identify information of an area where a target to be detected in the picture to be detected is located based on a third feature map and a fourth feature map output by a first feature extraction network and a second feature extraction network in the trained network matching model, where parameters in the first feature extraction network in the trained network matching model are different from parameters in the second feature extraction network.
In the target recognition device based on the above training method for a network matching model, the parameters of the first and second feature extraction networks are determined after the networks separately learn the features of the template picture and of the sample pictures, so the features the two networks extract are more discriminative, the search range during target recognition is wider, and the success rate of recognizing the target to be detected is ensured.
Example 7:
Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. On the basis of the foregoing embodiments, the electronic device according to an embodiment of the present invention further includes a processor 71 and a memory 72;
the processor 71 is adapted to carry out the steps of the above-described training method of the network matching model when executing a computer program stored in the memory 72.
The processor 71 may be a CPU (Central Processing Unit), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a CPLD (Complex Programmable Logic Device).
The processor 71 is configured to perform the following steps when executing a computer program stored in the memory 72:
acquiring a first feature map of the template picture of the target to be detected through a first feature extraction network of a network matching model; acquiring a second feature map of the sample picture through a second feature extraction network of the network matching model;
performing convolution processing on the first characteristic diagram and the second characteristic diagram through a convolution layer of the network matching model to determine second position information of the target to be detected in the second characteristic diagram;
and training the network matching model according to the first position information and the second position information so as to respectively determine parameter values of parameters of the first feature extraction network and the second feature extraction network.
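The "convolution processing" of the first feature map with the second feature map can be read as a cross-correlation in which the first (template) feature map acts as the convolution kernel sliding over the second (search) feature map. A minimal sketch under that assumption, with illustrative tensor shapes:

```python
import torch
import torch.nn.functional as F

def match(template_feat: torch.Tensor, search_feat: torch.Tensor) -> torch.Tensor:
    """Cross-correlate the first feature map over the second feature map.

    template_feat: (1, C, h, w) -- first feature map, used as the kernel
    search_feat:   (1, C, H, W) -- second feature map
    Returns a response map whose peak indicates the second position
    information of the target to be detected.
    """
    # F.conv2d takes its weight argument as the kernel, so the template
    # feature map plays the role of the convolution layer's kernel here.
    return F.conv2d(search_feat, template_feat)  # shape (1, 1, H-h+1, W-w+1)
```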
Based on any of the above embodiments, the processor 71 specifically executes the following steps:
acquiring a feature map output by the last convolutional layer in the second feature extraction network and a feature map output by at least one other convolutional layer in the second feature extraction network, and determining each corresponding second feature map according to each feature map;
based on any of the above embodiments, the processor 71 specifically executes the following steps:
performing convolution processing on the first feature map and each second feature map through a convolution layer of the network matching model to determine second position information of the target to be detected in each second feature map;
based on any of the above embodiments, the processor 71 specifically executes the following steps:
and training the network matching model according to the first position information and each piece of second position information.
Based on any of the above embodiments, the processor 71 further performs the following steps:
determining a sum of loss values between the first position information and each piece of second position information; and training the first feature extraction network and the second feature extraction network according to the sum of the loss values, so as to update the parameters of the first feature extraction network and the second feature extraction network.
Based on any of the above embodiments, the number of the acquired second feature maps is 3.
Based on any of the above embodiments, if the second feature extraction network includes seven convolutional layers, the obtained second feature maps are feature maps output by the second convolutional layer, the fourth convolutional layer, and the seventh convolutional layer in the second feature extraction network, respectively.
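As a purely hypothetical sketch of such a seven-convolutional-layer second feature extraction network, exposing the outputs of its second, fourth and seventh convolutional layers as the three second feature maps (channel widths, kernel sizes and activations are assumptions of this sketch, not taken from the embodiment):

```python
import torch
import torch.nn as nn

class SecondFeatureExtractionNet(nn.Module):
    """Seven convolutional layers; returns the feature maps output by
    the 2nd, 4th and 7th layers, i.e. three second feature maps."""

    def __init__(self, in_channels: int = 3):
        super().__init__()
        widths = [in_channels, 32, 64, 64, 128, 128, 256, 256]
        self.convs = nn.ModuleList(
            nn.Conv2d(widths[i], widths[i + 1], kernel_size=3, padding=1)
            for i in range(7)
        )
        self.taps = {1, 3, 6}  # zero-based indices of conv layers 2, 4 and 7

    def forward(self, x):
        second_feature_maps = []
        for i, conv in enumerate(self.convs):
            x = torch.relu(conv(x))
            if i in self.taps:
                second_feature_maps.append(x)
        return second_feature_maps  # three maps, shallow to deep
```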
According to the embodiment of the invention, when the network matching model is trained, the first feature map of the template picture of the target to be detected and the second feature map of the sample picture are obtained through the first feature extraction network and the second feature extraction network respectively, and the parameter values of the parameters of the two networks are determined according to the first position information annotated in the sample picture and the second position information determined in the second feature map. The first feature extraction network and the second feature extraction network therefore learn the features of the template picture and the sample picture respectively, which improves the degree of distinction of the features extracted by the two networks, allows the search range to be adjusted, and ensures the success rate of subsequent recognition.
Example 8:
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention. On the basis of the foregoing embodiments, the electronic device according to an embodiment of the present invention further includes a processor 81 and a memory 82; the processor 81 is arranged to carry out the steps of the above-described target recognition method when executing a computer program stored in the memory 82.
The processor 81 may be a CPU (Central Processing Unit), an ASIC (Application-Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array), or a CPLD (Complex Programmable Logic Device), among others.
The processor 81 is configured to perform the following steps when executing the computer program stored in the memory 82:
respectively inputting a template picture of a target to be detected and the picture to be detected into a first feature extraction network and a second feature extraction network in a trained network matching model;
and identifying the information of the region where the target to be detected in the picture to be detected is located based on a third feature map and a fourth feature map which are respectively output by a first feature extraction network and a second feature extraction network in the trained network matching model, wherein the parameters in the first feature extraction network in the trained network matching model are different from the parameters in the second feature extraction network.
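For illustration, the recognition step might take the peak of the response map produced by cross-correlating the third and fourth feature maps and map it back to a pixel region. The stride value and the single-map output of each branch at inference are assumptions of this sketch:

```python
import torch

@torch.no_grad()
def recognize(first_net, second_net, match, template, picture, stride: int = 8):
    """Locate the region of the target to be detected in the picture to
    be detected from the peak of the response map."""
    third_feat = first_net(template)    # third feature map (template branch)
    fourth_feat = second_net(picture)   # fourth feature map (search branch)

    response = match(third_feat, fourth_feat)
    _, _, _, width = response.shape
    row, col = divmod(response.flatten().argmax().item(), width)

    # Map the feature-map peak back to approximate pixel coordinates,
    # using the (assumed) cumulative stride of the feature extractors.
    _, _, h, w = template.shape
    top, left = row * stride, col * stride
    return (left, top, left + w, top + h)  # approximate region of the target
```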
In the target recognition based on the above network matching model training method, the parameters of the first feature extraction network and the second feature extraction network are determined by that training method after the two networks have respectively learned the features of the template pictures and sample pictures of the target to be detected, so the features extracted by the two networks are more discriminative, the search range during target recognition is wider, and the success rate of recognizing the target to be detected is ensured.
Example 9:
on the basis of the foregoing embodiments, an embodiment of the present invention provides a computer-readable storage medium, in which a computer program executable by an electronic device is stored, and when the program is run on the electronic device, the electronic device is caused to execute the following steps:
acquiring a first feature map of the template picture of the target to be detected through a first feature extraction network of a network matching model; acquiring a second feature map of the sample picture through a second feature extraction network of the network matching model;
performing convolution processing on the first characteristic diagram and the second characteristic diagram through a convolution layer of the network matching model to determine second position information of the target to be detected in the second characteristic diagram;
and training the network matching model according to the first position information and the second position information so as to respectively determine parameter values of parameters of the first feature extraction network and the second feature extraction network.
Wherein the obtaining of the second feature map in the second feature extraction network includes: acquiring a feature map output by the last convolutional layer in the second feature extraction network and a feature map output by at least one other convolutional layer in the second feature extraction network, and determining each corresponding second feature map according to each feature map;
the convolving the first feature map and the second feature map by the convolutional layer of the network matching model to determine second position information of the target to be detected in the second feature map includes: performing convolution processing on the first feature map and each second feature map through a convolution layer of the network matching model to determine second position information of the target to be detected in each second feature map;
the training the network matching model according to the first position information and the second position information includes: and training the network matching model according to the first position information and each piece of second position information.
In a possible embodiment, the training of the network matching model according to the first position information and each piece of second position information includes:
determining a sum of loss values between the first position information and each piece of second position information;
and training the first feature extraction network and the second feature extraction network according to the sum of the loss values so as to update the parameters of the first feature extraction network and the second feature extraction network.
In one possible embodiment, the number of the second feature maps obtained is 3.
In a possible embodiment, if the second feature extraction network includes seven convolutional layers, the obtained second feature maps are feature maps output by a second convolutional layer, a fourth convolutional layer and a seventh convolutional layer in the second feature extraction network, respectively.
According to the embodiment of the invention, when the network matching model is trained, the first feature map of the template picture of the target to be detected and the second feature map of the sample picture are obtained through the first feature extraction network and the second feature extraction network respectively, and the parameter values of the parameters of the two networks are determined according to the first position information annotated in the sample picture and the second position information determined in the second feature map. The first feature extraction network and the second feature extraction network therefore learn the features of the template picture and the sample picture respectively, which improves the degree of distinction of the features extracted by the two networks, allows the search range to be adjusted, and ensures the success rate of subsequent recognition.
Example 10:
on the basis of the foregoing embodiments, an embodiment of the present invention further provides a computer-readable storage medium, in which a computer program executable by an electronic device is stored, and when the program is run on the electronic device, the electronic device is caused to execute the following steps:
respectively inputting a template picture of a target to be detected and the picture to be detected into a first feature extraction network and a second feature extraction network in a trained network matching model;
and identifying the information of the region where the target to be detected in the picture to be detected is located based on a third feature map and a fourth feature map which are respectively output by a first feature extraction network and a second feature extraction network in the trained network matching model, wherein the parameters in the first feature extraction network in the trained network matching model are different from the parameters in the second feature extraction network.
In the target recognition based on the above network matching model training method, the parameters of the first feature extraction network and the second feature extraction network are determined by that training method after the two networks have respectively learned the features of the template pictures and sample pictures of the target to be detected, so the features extracted by the two networks are more discriminative, the search range during target recognition is wider, and the success rate of recognizing the target to be detected is ensured.
The computer-readable storage medium may be any available medium or data storage device that can be accessed by a processor in an electronic device, including but not limited to magnetic memory such as floppy disks, hard disks, magnetic tape and magneto-optical (MO) disks; optical memory such as CDs, DVDs, BDs and HVDs; and semiconductor memory such as ROM, EPROM, EEPROM, non-volatile memory (NAND flash) and solid-state disks (SSDs).
For the system/apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A training method of a network matching model is characterized in that for any sample picture in a sample set, first position information of an area where a target to be detected is located is marked in the sample picture, and the method comprises the following steps:
acquiring a first feature map of the template picture of the target to be detected through a first feature extraction network of a network matching model; acquiring a second feature map of the sample picture through a second feature extraction network of the network matching model;
performing convolution processing on the first characteristic diagram and the second characteristic diagram through a convolution layer of the network matching model to determine second position information of the target to be detected in the second characteristic diagram;
and training the network matching model according to the first position information and the second position information so as to respectively determine parameter values of parameters of the first feature extraction network and the second feature extraction network.
2. The method of claim 1, wherein:
the obtaining of the second feature map of the sample picture includes: respectively acquiring a feature map output by the last convolutional layer in the second feature extraction network and a feature map output by at least one other convolutional layer in the second feature extraction network, and determining each corresponding second feature map according to each feature map;
the convolving the first feature map and the second feature map by the convolutional layer of the network matching model to determine second position information of the target to be detected in the second feature map includes: performing convolution processing on the first feature map and each second feature map through a convolution layer of the network matching model to determine second position information of the target to be detected in each second feature map;
the training the network matching model according to the first position information and the second position information includes: and training the network matching model according to the first position information and each piece of second position information.
3. The method of claim 2, wherein the training of the network matching model according to the first position information and each piece of second position information comprises:
determining a sum of loss values between the first position information and each piece of second position information;
and training the first feature extraction network and the second feature extraction network according to the sum of the loss values so as to update the parameters of the first feature extraction network and the second feature extraction network.
4. The method of claim 2, wherein the number of second feature maps obtained is 3.
5. The method of claim 4, wherein if the second feature extraction network includes seven convolutional layers, the obtained second feature maps are feature maps output by a second convolutional layer, a fourth convolutional layer and a seventh convolutional layer in the second feature extraction network, respectively.
6. A method for identifying a target based on a network matching model trained by the method of any one of claims 1 to 5, the method comprising:
respectively inputting a template picture of a target to be detected and the picture to be detected into a first feature extraction network and a second feature extraction network in a trained network matching model;
and identifying the information of the region where the target to be detected in the picture to be detected is located based on a third feature map and a fourth feature map which are respectively output by a first feature extraction network and a second feature extraction network in the trained network matching model, wherein the parameters in the first feature extraction network in the trained network matching model are different from the parameters in the second feature extraction network.
7. A training device for a network matching model, characterized in that, for any sample picture in a sample set, first position information of an area where a target to be detected is located is marked in the sample picture, and the device comprises:
the first extraction module is used for obtaining a first feature map of the template picture of the target to be detected through a first feature extraction network of a network matching model;
the second extraction module is used for acquiring a second feature map of the sample picture through a second feature extraction network of the network matching model;
the matching module is used for performing convolution processing on the first characteristic diagram and the second characteristic diagram through the convolution layer of the network matching model to determine second position information of the target to be detected in the second characteristic diagram;
and the analysis module is used for training the network matching model according to the first position information and the second position information so as to respectively determine parameter values of parameters of the first feature extraction network and the second feature extraction network.
8. A target recognition apparatus based on the network matching model trained by the training device of claim 7, the apparatus comprising:
the input module is used for respectively inputting the template picture of the target to be detected and the picture to be detected into a first feature extraction network and a second feature extraction network in the trained network matching model;
and the identification module is used for identifying the information of the region where the target to be detected is located in the picture to be detected based on a third feature map and a fourth feature map which are respectively output by a first feature extraction network and a second feature extraction network in the trained network matching model, wherein the parameters in the first feature extraction network in the trained network matching model are different from the parameters in the second feature extraction network.
9. An electronic device, characterized in that the electronic device comprises a processor for implementing the steps of the method according to any of claims 1-6 when executing a computer program stored in a memory.
10. A computer-readable storage medium, characterized in that it stores a computer program which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN201910945706.8A 2019-09-30 2019-09-30 Model training method, target recognition method, device, equipment and medium Pending CN110705479A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910945706.8A CN110705479A (en) 2019-09-30 2019-09-30 Model training method, target recognition method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN110705479A true CN110705479A (en) 2020-01-17

Family

ID=69196601

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910945706.8A Pending CN110705479A (en) 2019-09-30 2019-09-30 Model training method, target recognition method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN110705479A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229455A (en) * 2017-02-23 2018-06-29 北京市商汤科技开发有限公司 Object detecting method, the training method of neural network, device and electronic equipment
CN108921159A (en) * 2018-07-26 2018-11-30 北京百度网讯科技有限公司 Method and apparatus for detecting the wear condition of safety cap
CN109726746A (en) * 2018-12-20 2019-05-07 浙江大华技术股份有限公司 A kind of method and device of template matching
CN110033473A (en) * 2019-04-15 2019-07-19 西安电子科技大学 Motion target tracking method based on template matching and depth sorting network
CN110245678A (en) * 2019-05-07 2019-09-17 华中科技大学 A kind of isomery twinned region selection network and the image matching method based on the network

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112926515A (en) * 2021-03-26 2021-06-08 支付宝(杭州)信息技术有限公司 Living body model training method and device

Similar Documents

Publication Publication Date Title
CN108038474B (en) Face detection method, convolutional neural network parameter training method, device and medium
US10936911B2 (en) Logo detection
CN108388896B (en) License plate identification method based on dynamic time sequence convolution neural network
CN112991447B (en) Visual positioning and static map construction method and system in dynamic environment
US20190279014A1 (en) Method and apparatus for detecting object keypoint, and electronic device
CN109658454B (en) Pose information determination method, related device and storage medium
JP2022534337A (en) Video target tracking method and apparatus, computer apparatus, program
CN107610146B (en) Image scene segmentation method and device, electronic equipment and computer storage medium
CN109410316B (en) Method for three-dimensional reconstruction of object, tracking method, related device and storage medium
CN112336342B (en) Hand key point detection method and device and terminal equipment
CN107679489B (en) Automatic driving processing method and device based on scene segmentation and computing equipment
CN105069799A (en) Angular point positioning method and apparatus
CN109858476B (en) Tag expansion method and electronic equipment
JP6997369B2 (en) Programs, ranging methods, and ranging devices
CN112085056B (en) Target detection model generation method, device, equipment and storage medium
CN110796135A (en) Target positioning method and device, computer equipment and computer storage medium
US20200005078A1 (en) Content aware forensic detection of image manipulations
CN110874170A (en) Image area correction method, image segmentation method and device
CN113378864B (en) Method, device and equipment for determining anchor frame parameters and readable storage medium
CN115098717A (en) Three-dimensional model retrieval method and device, electronic equipment and storage medium
CN113744280A (en) Image processing method, apparatus, device and medium
CN113256683A (en) Target tracking method and related equipment
CN110705479A (en) Model training method, target recognition method, device, equipment and medium
CN112434582A (en) Lane line color identification method and system, electronic device and storage medium
US11881016B2 (en) Method and system for processing an image and performing instance segmentation using affinity graphs

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200117)