CN111104942B - Template matching network training method, recognition method and device - Google Patents
- Publication number
- CN111104942B CN111104942B CN201911248538.3A CN201911248538A CN111104942B CN 111104942 B CN111104942 B CN 111104942B CN 201911248538 A CN201911248538 A CN 201911248538A CN 111104942 B CN111104942 B CN 111104942B
- Authority
- CN
- China
- Prior art keywords
- target
- target object
- frame
- training
- sample
- Prior art date
- Legal status
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/004—Artificial life, i.e. computing arrangements simulating life
- G06N3/006—Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
- G06T7/344—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods involving models
Abstract
The application discloses a template matching network training method, a recognition method and a device, belonging to the fields of computer vision and deep learning. The method comprises the following steps: obtaining a sample template; generating training data comprising a plurality of sample pictures based on the sample template, and recording the center point coordinates, rotation angle, scaling ratio and category information of each target object contained in each sample picture; normalizing the training data to obtain target training data, and extracting features from the target training data with a convolutional neural network to obtain feature data; training a proposal box network on the feature data, retaining the target proposal boxes that meet preset requirements, mapping each target proposal box to the corresponding position on the feature map, and rotating it to obtain a target feature map; and training a neural network on the target feature maps to obtain a template matching network. The application can accurately locate a target object and the pixel area it occupies.
Description
Technical Field
The application belongs to the fields of computer vision and deep learning, and particularly relates to a template matching network training method, a recognition method and a device.
Background
Traditional template matching, a common method in the vision field, is widely applied in simple scenarios with good illumination and a clear separation between foreground and background, but it cannot adapt to complex scenes. Deep-learning-based object detection can detect various objects in complex environments, classifying and localizing them with an axis-aligned bounding box (AABB) that envelops each detected object. However, the area outlined by the AABB box is typically much larger than the pixel area actually occupied by the target object, so in certain applications (e.g., unordered robotic grasping) the AABB box has little value as a positioning reference.
Disclosure of Invention
Aiming at the above defects or improvement demands of the prior art, the application provides a template matching network training method, a recognition method and a device, solving the technical problem that the existing AABB positioning approach cannot accurately locate a target object and the pixel area it occupies in a complex environment.
To achieve the above object, according to one aspect of the present application, there is provided a template matching network training method, including:
(1) Obtaining a sample template, wherein the sample template at least comprises a target object image and a background image, and the sample template comprises a target outline of a target object;
(2) Generating training data comprising a plurality of sample pictures based on the sample template, and recording center point coordinates of each target object contained in each sample picture, rotation angles of each target object, scaling ratios of each target object and category information of each target object;
(3) Performing normalization processing on the training data to obtain target training data, and performing feature extraction on the target training data by using a convolutional neural network to obtain feature data, wherein the feature data comprises gray scale of the target training data and contour feature data of each target object;
(4) Training a proposal box network according to the feature data, retaining target proposal boxes meeting preset requirements, mapping each target proposal box to the corresponding position on the feature map, and rotating it to obtain a target feature map;
(5) And training a neural network through the target feature map to obtain a template matching network, wherein the output of the template matching network is at least the category of the target object and the scaling ratio of the target object.
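The normalization of step (3) and a single convolutional feature-extraction layer can be emulated in NumPy as an illustrative sketch (not the patent's actual implementation); the `sobel_x` kernel is a hand-picked stand-in for the learned contour filters of a real backbone such as VGG or ResNet:

```python
import numpy as np

def normalize(img):
    """Zero-mean, unit-variance normalization of a picture, as in step (3)."""
    img = img.astype(np.float64)
    return (img - img.mean()) / (img.std() + 1e-8)

def conv2d(img, kernel):
    """Naive 'valid' 2-D cross-correlation; stands in for one CNN layer."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# Sobel-like kernel: responds to vertical contours, mimicking the contour
# feature data the feature extractor is described as producing.
sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
```

A real pipeline would stack many learned kernels per layer; the point here is only the normalize-then-convolve order of step (3).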
Preferably, step (2) comprises:
(2.1) randomly selecting target objects, randomly setting the scaling and the rotation angle of each target object based on the sample template;
and (2.2) generating training data comprising a plurality of sample pictures by adding a von neumann topological structure in a particle swarm algorithm, and recording the center point coordinates of each target object, the rotation angle of each target object, the scaling ratio of each target object and the category information of each target object in each sample picture.
Preferably, step (4) comprises:
(4.1) training a proposal box network according to the feature data, and predicting the axis-aligned proposal box and the rotation angle value of each target object, wherein the proposal box network comprises an RPN network and an angle classification network;
(4.2) obtaining the overlap ratio between each axis-aligned proposal box and the actual proposal box, and the difference between the rotation angle value of the target object in the axis-aligned proposal box and that in the actual proposal box; discarding axis-aligned proposal boxes whose overlap ratio is lower than a preset overlap threshold or whose angle difference exceeds a preset difference threshold, to obtain the retained target axis-aligned proposal boxes;
and (4.3) mapping each target axis-aligned proposal box to the corresponding position on the feature map, and performing a rotation transformation on it at that position based on the rotation angle value of the target object in the box, to obtain the target feature map.
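The retention test of step (4.2) can be sketched as follows; the box format, threshold values and helper names (`iou`, `filter_proposals`) are illustrative assumptions, not the patent's code:

```python
def iou(box_a, box_b):
    """Overlap ratio (IoU) of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def filter_proposals(proposals, gt_box, gt_angle, iou_thresh=0.5, angle_thresh=15):
    """Keep (box, angle) proposals whose overlap with the ground-truth box
    reaches the overlap threshold and whose predicted angle is close enough."""
    kept = []
    for box, angle in proposals:
        d = abs(angle - gt_angle) % 360
        d = min(d, 360 - d)  # wrap-around angle difference
        if iou(box, gt_box) >= iou_thresh and d <= angle_thresh:
            kept.append((box, angle))
    return kept
```

During training the retained boxes would then be mapped onto the feature map as in step (4.3).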
Preferably, in step (4.1), predicting the rotation angle value of the target object includes:
and classifying the angles, wherein the standard angle and several degrees to either side of it are assigned positive label values, while all remaining angle labels are set to 0, and wherein the standard angle is the direction from the centroid of the target object along the long side of the minimum rectangular envelope box.
According to another aspect of the present application, there is provided an identification method comprising:
inputting the picture to be identified into the template matching network trained by the template matching network training method of any one of the above, and carrying out identification processing to obtain category information and pose information of each target object contained in the picture to be identified.
According to another aspect of the present application, there is provided a template matching network training apparatus comprising:
the template acquisition module is used for acquiring a sample template, wherein the sample template at least comprises a target object image and a background image, and the sample template comprises a target outline of a target object;
the training data acquisition module is used for generating training data comprising a plurality of sample pictures based on the sample template and recording the center point coordinates of each target object contained in each sample picture, the rotation angle of each target object, the scaling ratio of each target object and the category information of each target object;
the feature extraction module is used for carrying out normalization processing on the training data to obtain target training data, and carrying out feature extraction on the target training data by utilizing a convolutional neural network to obtain feature data, wherein the feature data comprises the gray level of the target training data and the contour feature data of each target object;
the feature map acquisition module is used for training a proposal box network according to the feature data, retaining target proposal boxes meeting preset requirements, mapping each target proposal box to the corresponding position on the feature map, and rotating it to obtain a target feature map;
and the training module is used for training the neural network through the target feature map to obtain a template matching network, wherein the output of the template matching network is at least the category of the target object and the scaling ratio of the target object.
Preferably, the training data acquisition module includes:
the preprocessing module is used for randomly selecting target objects based on the sample template, and randomly setting the scaling and the rotation angle of each target object;
the training data acquisition sub-module is used for generating training data comprising a plurality of sample pictures in a mode of adding a von Neumann topological structure in a particle swarm algorithm, and recording center point coordinates of each target object contained in each sample picture, rotation angles of each target object, scaling ratios of each target object and category information of each target object.
Preferably, the feature map acquisition module includes:
the first training module is used for training a proposal box network according to the feature data and predicting the axis-aligned proposal box and the rotation angle value of each target object, wherein the proposal box network comprises an RPN network and an angle classification network;
the judging and processing module is used for obtaining the overlap ratio between each axis-aligned proposal box and the actual proposal box and the difference between the rotation angle value of the target object in the axis-aligned proposal box and that in the actual proposal box, and for discarding axis-aligned proposal boxes whose overlap ratio is lower than a preset overlap threshold or whose angle difference exceeds a preset difference threshold, to obtain the retained target axis-aligned proposal boxes;
and the feature map acquisition sub-module is used for mapping each target axis-aligned proposal box to the corresponding position on the feature map and then performing a rotation transformation on it at that position based on the rotation angle value of the target object in the box, to obtain the target feature map.
Preferably, predicting the rotation angle value of the target object includes:
and classifying the angles, wherein the standard angle and several degrees to either side of it are assigned positive label values, while all remaining angle labels are set to 0, and wherein the standard angle is the direction from the centroid of the target object along the long side of the minimum rectangular envelope box.
According to another aspect of the present application, there is provided an identification device comprising:
the recognition result acquisition module is used for inputting the picture to be recognized into the template matching network trained by the template matching network training device to perform recognition processing to obtain the category information and the pose information of each target object contained in the picture to be recognized.
In general, the above technical solutions conceived by the present application, compared with the prior art, enable the following beneficial effects to be obtained:
1. According to the application, through angle prediction, feature mapping, feature map rotation and the like, the center position of each of various target objects, its scaling ratio relative to the template, its rotation angle and other information can be detected under complex background conditions.
2. The RPN network is additionally given the function of predicting angles; angle prediction uses a classification method that divides angles into 360 categories. The standard angle and 5 degrees to either side of it receive positive label values, and all remaining angle labels are set to 0. Positive labels are assigned as follows: the label value of the "standard angle" is 1, decreasing by 0.2 for each degree of deviation. In addition, a symmetrical object has several "standard angles": for example, a rectangle has 2 standard angles, a square has 4, and for a circle every angle is a standard angle.
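The label-assignment rule above can be sketched with a hypothetical helper. Note a small wrinkle in the source: it says the standard angle and 5 degrees to either side are positive, yet 1 − 0.2·5 = 0, so strictly positive labels actually span ±4 degrees; the sketch follows the arithmetic:

```python
import numpy as np

def angle_labels(standard_angles, num_bins=360):
    """Soft classification labels over 360 one-degree bins: each standard
    angle gets label 1.0, decreasing by 0.2 per degree of deviation, and all
    bins farther than 4 degrees from every standard angle stay at 0."""
    labels = np.zeros(num_bins)
    for std in standard_angles:
        for off in range(-4, 5):  # offsets with 1.0 - 0.2*|off| > 0
            idx = (std + off) % num_bins
            labels[idx] = max(labels[idx], 1.0 - 0.2 * abs(off))
    return labels
```

A square would be encoded with four standard angles 90 degrees apart, e.g. `angle_labels([0, 90, 180, 270])`.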
3. The ROIAffine method is proposed. The conventional ROI pooling method maps each proposal box onto the feature map and cuts out a new feature map region (hereinafter new_featuremap), then predicts the length, width and category of the object from new_featuremap. Because the target object in new_featuremap has not been angle-corrected, the features used to compute length and width are degraded by the varied angles, so the predicted length and width of the target object are usually inaccurate. The ROIAffine proposed by the application performs cropping and angle correction simultaneously during feature mapping, so the length and width of the target object can be computed more accurately from the cropped feature map.
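The fused crop-plus-rotation idea behind ROIAffine can be illustrated with a nearest-neighbour sketch on a single-channel map (illustrative only; a real implementation would operate on batched multi-channel tensors with bilinear sampling):

```python
import numpy as np

def roi_affine(feature_map, cx, cy, w, h, angle_deg):
    """Crop a w x h window centred at (cx, cy) from feature_map while rotating
    it by angle_deg, in one step: crop and angle correction fused into a
    single inverse mapping, which is the core idea of ROIAffine."""
    theta = np.deg2rad(angle_deg)
    cos_t, sin_t = np.cos(theta), np.sin(theta)
    out = np.zeros((h, w), dtype=feature_map.dtype)
    for yo in range(h):
        for xo in range(w):
            # offset of the output pixel from the window centre
            dx, dy = xo - (w - 1) / 2.0, yo - (h - 1) / 2.0
            # rotate the offset back into feature-map coordinates
            xs = cx + dx * cos_t - dy * sin_t
            ys = cy + dx * sin_t + dy * cos_t
            xi, yi = int(round(xs)), int(round(ys))
            if 0 <= yi < feature_map.shape[0] and 0 <= xi < feature_map.shape[1]:
                out[yo, xo] = feature_map[yi, xi]
    return out
```

With `angle_deg=0` this reduces to a plain crop, which is what ordinary ROI pooling would produce.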
4. A method of generating training data based on an improved particle swarm algorithm is presented. The user only needs to provide the target object images and background images and outline each target's outer contour; training images are then generated automatically.
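A minimal sketch of a particle swarm optimizer with a von Neumann (4-neighbour grid) topology, the local-best variant the data generator builds on; the cost function, swarm size and coefficients here are illustrative assumptions, and the real generator would optimize object placements rather than a toy objective:

```python
import numpy as np

def von_neumann_neighbors(i, rows, cols):
    """Indices of the 4-connected grid neighbours of particle i (wrap-around)."""
    r, c = divmod(i, cols)
    return [((r - 1) % rows) * cols + c,
            ((r + 1) % rows) * cols + c,
            r * cols + (c - 1) % cols,
            r * cols + (c + 1) % cols]

def pso_minimize(f, dim, rows=4, cols=5, iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Local-best PSO: each particle follows the best of its von Neumann
    neighbourhood instead of a single global best, trading some convergence
    speed for better global search."""
    rng = np.random.default_rng(seed)
    n = rows * cols
    x = rng.uniform(-1.0, 1.0, (n, dim))
    v = np.zeros((n, dim))
    pbest = x.copy()
    pbest_val = np.array([f(p) for p in x])
    for _ in range(iters):
        for i in range(n):
            hood = von_neumann_neighbors(i, rows, cols) + [i]
            lbest = pbest[hood[np.argmin(pbest_val[hood])]]
            r1, r2 = rng.random(dim), rng.random(dim)
            v[i] = w * v[i] + c1 * r1 * (pbest[i] - x[i]) + c2 * r2 * (lbest - x[i])
            x[i] += v[i]
            val = f(x[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = x[i].copy(), val
    k = int(np.argmin(pbest_val))
    return pbest[k], pbest_val[k]
```

For a compact-placement objective, `f` would score the overlap and spread of the candidate object positions encoded in each particle.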
Drawings
FIG. 1 is a schematic flow chart of a training method of a template matching network according to an embodiment of the present application;
FIG. 2 is a result diagram of the proposal box network provided by an embodiment of the present application;
fig. 3 is a result diagram after rotating the proposal box regions in a picture, according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application. In addition, the technical features of the embodiments of the present application described below may be combined with each other as long as they do not collide with each other.
In order to accurately position a target object and a pixel area where the target object is located in a complex environment, the application provides a template matching network training method, a template matching method and a template matching device, which can detect the information such as the central position of various target objects, the scaling ratio and the rotation angle relative to a template and the like under the complex background condition.
Fig. 1 is a schematic flow chart of a training method for a template matching network according to an embodiment of the present application, including the following steps:
s1: obtaining a sample template, wherein the sample template at least comprises a target object graph and a background graph, and the sample template comprises a target outline of a target object;
in the step of making the template, a user only needs to provide all the object images and the background images of the target, outline the outer outline of the target, mark the template type and determine the angle orientation.
S2: processing the sample template obtained in the step S1 by using an improved particle swarm algorithm, generating sample picture training data, and recording the center point coordinates of each target object contained in each sample picture, the rotation angle of the target object, the scaling ratio of the target object and the category information of the target object;
in an embodiment of the application, training data is generated based on the improved particle swarm. Randomly selecting a target object, randomly setting the scaling and the rotation angle of the target object. In order to make the arrangement of the objects in each piece of picture data as compact as possible, in the embodiment of the present application, the object positions are arranged using an improved particle swarm algorithm. And automatically recording the center point, the angle, the scaling ratio and the category information of each target object when generating the picture. To balance the speed and global search capability at optimization iterations, von neumann topology is added to the particle swarm algorithm. In order to improve the convergence rate of the particle swarm algorithm, principal component analysis is performed on the generated picture in the first half period, and the search space in iteration is reduced.
S3: normalizing the training data to obtain target training data, and extracting features of the target training data by using a convolutional neural network to obtain feature data, wherein the feature data comprises gray scales of the target training data and contour feature data of each target object;
in the embodiment of the present application, the convolutional neural network may be vgg, acceptable v3, resnet, etc., and the embodiment of the present application is not limited to uniqueness.
S4: training a suggestion frame network according to the feature data, reserving a target suggestion frame meeting preset requirements, mapping the target suggestion frame to a position corresponding to the feature map, and performing rotation operation on the target suggestion frame to obtain a target feature map;
as an alternative embodiment, step S4 may be implemented in the following manner:
s4.1: training a proposal box network according to the feature data, and predicting the axis-aligned proposal box, rotation angle value and score value of each target object, wherein the proposal box network comprises an RPN network and an angle classification network; a result diagram of the proposal box network provided by the embodiment of the application is shown in fig. 2;
as an optional implementation, angle prediction adopts a classification method that divides angles into a number of categories, where the standard angle and several degrees around it receive positive label values and all remaining angle labels are set to 0; the standard angle is the direction from the centroid of the target object along the long side of the minimum rectangular envelope box.
In the embodiment of the application, how many degrees around the standard angle receive positive label values can be determined according to actual needs; the embodiment is not limited to a single choice. Preferably, the standard angle and about 5 degrees to either side of it receive positive values.
Positive labels are assigned as follows: the label value of the "standard angle" is 1, decreasing by 0.2 for each degree of deviation. In addition, a symmetrical object has several "standard angles": for example, a square has 4 standard angles, and for a circle every angle is a standard angle.
S4.2: obtaining the overlap ratio between each axis-aligned proposal box and the actual proposal box, and the difference between the rotation angle value of the target object in the axis-aligned proposal box and that in the actual proposal box; axis-aligned proposal boxes whose overlap ratio is lower than a preset overlap threshold or whose angle difference exceeds a preset difference threshold are discarded, yielding the retained target axis-aligned proposal boxes;
s4.3: mapping each target axis-aligned proposal box to the corresponding position on the feature map, and performing a rotation transformation on it at that position based on the rotation angle value of the target object in the box, to obtain the target feature map; fig. 3 shows the result after the proposal box regions in a picture have been rotated.
S5: and training the neural network through the target feature map to obtain a template matching network, wherein the output of the template matching network is at least the category of the target object and the scaling ratio of the target object.
In the embodiment of the application, the position and the angle value of the central point obtained in the step S4.2 can be finely adjusted through a template matching network.
In another embodiment of the present application, there is also provided an identification method including:
and inputting the picture to be identified into a trained template matching network for identification processing to obtain the category information and the pose information of each target object contained in the picture to be identified.
The pose information of the target object comprises a center position, a scaling ratio, a rotation angle and the like.
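The recognition output described above (a category plus pose information) might be organised as follows; `Match` and `summarize` are hypothetical names used only for illustration of the output structure, not the patent's API:

```python
from dataclasses import dataclass

@dataclass
class Match:
    """One recognised target as output by the template matching network:
    category plus pose (centre position, scaling ratio relative to the
    template, rotation angle)."""
    category: str
    cx: float
    cy: float
    scale: float
    angle_deg: float

def summarize(matches):
    """Group recognition results by category for downstream use, e.g. to
    feed grasp planning in unordered robotic grasping."""
    out = {}
    for m in matches:
        out.setdefault(m.category, []).append((m.cx, m.cy, m.scale, m.angle_deg))
    return out
```

A caller would build one `Match` per detected target from the network's outputs and pass the list to `summarize`.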
In another embodiment of the present application, there is also provided a template matching network training apparatus, including:
the template acquisition module is used for acquiring a sample template, wherein the sample template at least comprises a target object image and a background image, and the sample template comprises a target outline of a target object;
the training data acquisition module is used for generating training data comprising a plurality of sample pictures based on the sample template and recording the center point coordinates of each target object contained in each sample picture, the rotation angle of each target object, the scaling ratio of each target object and the category information of each target object;
the feature extraction module is used for carrying out normalization processing on the training data to obtain target training data, and carrying out feature extraction on the target training data by utilizing a convolutional neural network to obtain feature data, wherein the feature data comprises the gray level of the target training data and the contour feature data of each target object;
the feature map acquisition module is used for training a proposal box network according to the feature data, retaining target proposal boxes meeting preset requirements, mapping each target proposal box to the corresponding position on the feature map, and rotating it to obtain a target feature map;
and the training module is used for training the neural network through the target feature map to obtain a template matching network, wherein the output of the template matching network is at least the category of the target object and the scaling ratio of the target object.
In the embodiment of the present application, specific implementation manners of each module may refer to descriptions of method embodiments, and the embodiment of the present application will not be repeated.
In another embodiment of the present application, there is also provided an identification device including:
the recognition result acquisition module is used for inputting the picture to be recognized into the trained template matching network for recognition processing to obtain the category information and the pose information of each target object contained in the picture to be recognized.
The pose information of the target object comprises a center position, a scaling ratio, a rotation angle and the like.
It should be noted that each step/component described in the present application may be split into more steps/components, or two or more steps/components or part of operations of the steps/components may be combined into new steps/components, according to the implementation needs, to achieve the object of the present application.
It will be readily appreciated by those skilled in the art that the foregoing description is merely a preferred embodiment of the application and is not intended to limit the application, but any modifications, equivalents, improvements or alternatives falling within the spirit and principles of the application are intended to be included within the scope of the application.
Claims (10)
1. A template matching network training method, comprising:
(1) Obtaining a sample template, wherein the sample template at least comprises a target object image and a background image, and the sample template comprises a target outline of a target object;
(2) Generating training data comprising a plurality of sample pictures based on the sample template, and recording center point coordinates of each target object contained in each sample picture, rotation angles of each target object, scaling ratios of each target object and category information of each target object;
(3) Performing normalization processing on the training data to obtain target training data, and performing feature extraction on the target training data by using a convolutional neural network to obtain feature data, wherein the feature data comprises gray scale of the target training data and contour feature data of each target object;
(4) Training a proposal box network according to the feature data, retaining target proposal boxes meeting preset requirements, mapping each target proposal box to the corresponding position on the feature map, and rotating it to obtain a target feature map;
(5) And training a neural network through the target feature map to obtain a template matching network, wherein the output of the template matching network is at least the category of the target object and the scaling ratio of the target object.
2. The method of claim 1, wherein step (2) comprises:
(2.1) randomly selecting target objects based on the sample template, and randomly setting the scaling ratio and rotation angle of each target object;
(2.2) generating training data comprising a plurality of sample pictures by adding a von Neumann topology to a particle swarm algorithm, and recording the center point coordinates, rotation angle, scaling ratio and category information of each target object contained in each sample picture.
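Step (2.2) guides sample generation with a particle swarm algorithm whose particles communicate over a von Neumann topology, i.e. each particle is influenced only by its four grid neighbours rather than a global best. A minimal sketch of such a neighbourhood-guided PSO update follows; the wrapping grid layout, inertia/acceleration coefficients and minimisation convention are illustrative assumptions:

```python
import numpy as np

def von_neumann_neighbors(index, rows, cols):
    """Return the 4-connected (von Neumann) neighbours of a particle
    laid out on a wrapping rows x cols grid."""
    r, c = divmod(index, cols)
    return [((r - 1) % rows) * cols + c,   # up
            ((r + 1) % rows) * cols + c,   # down
            r * cols + (c - 1) % cols,     # left
            r * cols + (c + 1) % cols]     # right

def pso_step(positions, velocities, scores, best_pos, rows, cols,
             w=0.7, c1=1.4, c2=1.4, rng=None):
    """One particle-swarm update where each particle follows the best
    (lowest-score) particle in its von Neumann neighbourhood instead of
    a single global best."""
    rng = rng or np.random.default_rng(0)
    n, d = positions.shape
    for i in range(n):
        nbrs = von_neumann_neighbors(i, rows, cols) + [i]
        local_best = best_pos[min(nbrs, key=lambda j: scores[j])]
        r1, r2 = rng.random(d), rng.random(d)
        velocities[i] = (w * velocities[i]
                         + c1 * r1 * (best_pos[i] - positions[i])
                         + c2 * r2 * (local_best - positions[i]))
        positions[i] += velocities[i]
    return positions, velocities
```

The local topology slows the spread of the best solution across the swarm, which is commonly used to preserve diversity, here plausibly diversity of the generated sample layouts.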
3. The method according to claim 1 or 2, wherein step (4) comprises:
(4.1) training a suggestion frame network according to the feature data, and predicting the axis-aligned suggestion frame of each target object and the rotation angle value of each target object, wherein the suggestion frame network comprises an RPN (region proposal network) and an angle classification network;
(4.2) obtaining the overlap ratio between the axis-aligned suggestion frame and the actual suggestion frame, and obtaining the difference between the rotation angle value of the target object in the axis-aligned suggestion frame and the rotation angle value of the target object in the actual suggestion frame; discarding any axis-aligned suggestion frame whose overlap ratio is lower than a preset overlap threshold, and keeping as target axis-aligned suggestion frames those whose overlap ratio reaches the threshold and whose angle difference is smaller than a preset difference threshold;
(4.3) mapping the target axis-aligned suggestion frame to the corresponding position on the feature map, and performing a rotation transformation on it at that position, based on the rotation angle value of the target object in the frame, to obtain the target feature map.
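Step (4.2) keeps only proposals that agree well enough with the ground truth. A minimal sketch follows, assuming the claim's "overlap ratio" is ordinary intersection-over-union of axis-aligned boxes; the threshold values and the combined IoU-plus-angle keep rule are illustrative readings of the claim, not confirmed specifics:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def filter_proposals(proposals, gt_box, gt_angle,
                     iou_thresh=0.5, angle_thresh=15.0):
    """Discard proposals whose overlap with the ground-truth box falls
    below the IoU threshold; keep those that also predict a rotation
    angle close to the ground-truth angle."""
    kept = []
    for box, angle in proposals:
        if iou(box, gt_box) >= iou_thresh and abs(angle - gt_angle) <= angle_thresh:
            kept.append((box, angle))
    return kept
```

The retained boxes would then be mapped onto the feature map and rotated per step (4.3).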
4. The method according to claim 3, wherein predicting the rotation angle value of the target object in step (4.1) comprises:
classifying the angles, wherein the standard angle and several degrees to its left and right are given positive label values and all remaining angle labels are set to 0, the standard angle being the direction from the centroid of the target object along the long side of the minimum rectangular envelope frame.
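The angle classification in claim 4 can be read as building a soft label vector over angle bins in which the standard angle and a few neighbouring degrees carry positive values and everything else is 0. A sketch under that reading; the bin count, the half-width of the positive window and the linear decay of the label values are assumptions:

```python
import numpy as np

def angle_labels(standard_angle, half_width=5, num_bins=360):
    """Label vector over 1-degree bins: the standard angle gets label 1.0,
    bins within half_width degrees on either side get smaller positive
    values, and all remaining bins stay 0 (wrapping around 360)."""
    labels = np.zeros(num_bins)
    for offset in range(-half_width, half_width + 1):
        # linearly decaying positive value, illustrative choice
        labels[(standard_angle + offset) % num_bins] = 1.0 - abs(offset) / (half_width + 1)
    return labels
```

Soft neighbouring labels of this kind tolerate small angle errors during training instead of penalising a one-degree miss as hard as a ninety-degree miss.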
5. A method of identification, comprising:
inputting a picture to be recognized into the template matching network trained by the template matching network training method according to any one of claims 1 to 4, and performing recognition processing to obtain the category information and pose information of each target object contained in the picture to be recognized.
6. A template matching network training device, comprising:
the template acquisition module is used for acquiring a sample template, wherein the sample template at least comprises a target object diagram and a background diagram, and the sample template comprises a target outline of a target object;
the training data acquisition module is used for generating training data comprising a plurality of sample pictures based on the sample template and recording the center point coordinates of each target object contained in each sample picture, the rotation angle of each target object, the scaling ratio of each target object and the category information of each target object;
the feature extraction module is used for carrying out normalization processing on the training data to obtain target training data, and carrying out feature extraction on the target training data by utilizing a convolutional neural network to obtain feature data, wherein the feature data comprises the gray level of the target training data and the contour feature data of each target object;
the feature map acquisition module is used for training a suggestion frame network according to the feature data, reserving a target suggestion frame meeting preset requirements, mapping the target suggestion frame to a corresponding position of a feature map, and performing rotation operation on the target suggestion frame to obtain a target feature map;
and the training module is used for training the neural network through the target feature map to obtain a template matching network, wherein the output of the template matching network is at least the category of the target object and the scaling ratio of the target object.
7. The apparatus of claim 6, wherein the training data acquisition module comprises:
the preprocessing module is used for randomly selecting target objects based on the sample template, and randomly setting the scaling and the rotation angle of each target object;
the training data acquisition sub-module is used for generating training data comprising a plurality of sample pictures in a mode of adding a von Neumann topological structure in a particle swarm algorithm, and recording center point coordinates of each target object contained in each sample picture, rotation angles of each target object, scaling ratios of each target object and category information of each target object.
8. The apparatus according to claim 6 or 7, wherein the feature map acquisition module includes:
the first training module is used for training a suggestion frame network according to the feature data and predicting the axis-aligned suggestion frame of each target object and the rotation angle value of each target object, wherein the suggestion frame network comprises an RPN (region proposal network) and an angle classification network;
the judging and processing module is used for obtaining the overlap ratio between the axis-aligned suggestion frame and the actual suggestion frame and the difference between the rotation angle value of the target object in the axis-aligned suggestion frame and that in the actual suggestion frame, discarding any axis-aligned suggestion frame whose overlap ratio is lower than a preset overlap threshold, and keeping as target axis-aligned suggestion frames those whose overlap ratio reaches the threshold and whose angle difference is smaller than a preset difference threshold;
the feature map acquisition sub-module is used for mapping the target axis-aligned suggestion frame to the corresponding position on the feature map and performing a rotation transformation on it at that position, based on the rotation angle value of the target object in the frame, to obtain the target feature map.
9. The apparatus of claim 8, wherein predicting the rotation angle value of the target object comprises:
classifying the angles, wherein the standard angle and several degrees to its left and right are given positive label values and all remaining angle labels are set to 0, the standard angle being the direction from the centroid of the target object along the long side of the minimum rectangular envelope frame.
10. An identification device, comprising:
the recognition result obtaining module is configured to input a picture to be recognized into the template matching network trained by the template matching network training device according to any one of claims 6 to 9 and perform recognition processing to obtain the category information and pose information of each target object contained in the picture to be recognized.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911248538.3A CN111104942B (en) | 2019-12-09 | 2019-12-09 | Template matching network training method, recognition method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911248538.3A CN111104942B (en) | 2019-12-09 | 2019-12-09 | Template matching network training method, recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111104942A CN111104942A (en) | 2020-05-05 |
CN111104942B true CN111104942B (en) | 2023-11-03 |
Family
ID=70422155
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911248538.3A Active CN111104942B (en) | 2019-12-09 | 2019-12-09 | Template matching network training method, recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111104942B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111598094B (en) * | 2020-05-27 | 2023-08-18 | 深圳市铁越电气有限公司 | Angle regression instrument reading identification method, equipment and system based on deep learning |
CN111950567B (en) * | 2020-08-18 | 2024-04-09 | 创新奇智(成都)科技有限公司 | Extractor training method and device, electronic equipment and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106250812A (en) * | 2016-07-15 | 2016-12-21 | 汤平 | Vehicle model recognition method based on a fast R-CNN deep neural network |
CN107194318A (en) * | 2017-04-24 | 2017-09-22 | 北京航空航天大学 | Scene recognition method assisted by target detection |
CN109583445A (en) * | 2018-11-26 | 2019-04-05 | 平安科技(深圳)有限公司 | Character image correction processing method, device, equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9031317B2 (en) * | 2012-09-18 | 2015-05-12 | Seiko Epson Corporation | Method and apparatus for improved training of object detecting system |
2019-12-09: CN application CN201911248538.3A filed; patent CN111104942B, status Active
Non-Patent Citations (1)
Title |
---|
Sun Xian; Hu Yanfeng; Wang Hongqi. Multi-class target recognition method based on an active boundary primitive model. Journal of the Graduate University of the Chinese Academy of Sciences, 2009, (04), full text. *
Also Published As
Publication number | Publication date |
---|---|
CN111104942A (en) | 2020-05-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111563442B (en) | SLAM method and system fusing point cloud and camera image data based on laser radar | |
CN110232311B (en) | Method and device for segmenting hand image and computer equipment | |
CN111080693A (en) | Robot autonomous classification grabbing method based on YOLOv3 | |
CN107833213B (en) | Weak supervision object detection method based on false-true value self-adaptive method | |
TWI395145B (en) | Hand gesture recognition system and method | |
CN111401410B (en) | Traffic sign detection method based on improved cascade neural network | |
WO2019033574A1 (en) | Electronic device, dynamic video face recognition method and system, and storage medium | |
CN105512683A (en) | Target positioning method and device based on convolution neural network | |
CN110910350B (en) | Nut loosening detection method for wind power tower cylinder | |
CN111862119A (en) | Semantic information extraction method based on Mask-RCNN | |
CN112825192B (en) | Object identification system and method based on machine learning | |
CN105069774B (en) | The Target Segmentation method of optimization is cut based on multi-instance learning and figure | |
CN111104942B (en) | Template matching network training method, recognition method and device | |
CN111401449B (en) | Image matching method based on machine vision | |
US20180225799A1 (en) | System and method for scoring color candidate poses against a color image in a vision system | |
CN109767431A (en) | Accessory appearance defect inspection method, device, equipment and readable storage medium | |
CN113989604A (en) | Tire DOT information identification method based on end-to-end deep learning | |
CN112861870A (en) | Pointer instrument image correction method, system and storage medium | |
US9053383B2 (en) | Recognizing apparatus and method, program, and recording medium | |
CN114936997A (en) | Detection method, detection device, electronic equipment and readable storage medium | |
CN113780040A (en) | Lip key point positioning method and device, storage medium and electronic equipment | |
CN111738264A (en) | Intelligent acquisition method for data of display panel of machine room equipment | |
CN110689556A (en) | Tracking method and device and intelligent equipment | |
CN110378337A (en) | Metal cutting tool drawing identification information vision input method and system | |
CN109753981B (en) | Image recognition method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 2023-12-07
Address after: 1301, Building 5, Building C, Huaqiang Creative Park, Biyan Community, Guangming Street, Guangming District, Shenzhen, Guangdong Province, China, 518000
Patentee after: SHENZHEN ROBOT VISION TECHNOLOGY Co.,Ltd.
Address before: 703, 7th Floor, Zhongdian Difu Building, Zhenhua Road, Fuqiang Community, Huaqiang North Street, Futian District, Shenzhen, Guangdong Province, 518031
Patentee before: SHANGZHI TECHNOLOGY (SHENZHEN) Co.,Ltd.