CN115375976B - Image processing model training method, electronic device, and computer-readable storage medium - Google Patents

Image processing model training method, electronic device, and computer-readable storage medium

Info

Publication number
CN115375976B
CN115375976B (application CN202211311142.0A)
Authority
CN
China
Prior art keywords
key point
initial
initial key
target
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211311142.0A
Other languages
Chinese (zh)
Other versions
CN115375976A (en)
Inventor
马子昂
刘征宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Huacheng Software Technology Co Ltd
Original Assignee
Hangzhou Huacheng Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Huacheng Software Technology Co Ltd filed Critical Hangzhou Huacheng Software Technology Co Ltd
Priority to CN202211311142.0A
Publication of CN115375976A
Application granted
Publication of CN115375976B
Legal status: Active (granted)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/70: Determining position or orientation of objects or cameras
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image processing model training method, an electronic device, and a computer-readable storage medium. The image processing model training method comprises: inputting a training image corresponding to a training target into an image processing model, wherein the image processing model represents the training target with a plurality of initial key points and predicts the position of each initial key point to obtain a plurality of predicted key points; obtaining an aggregation key point corresponding to each initial key point based on the initial key point and its corresponding predicted key point; determining a target prediction box corresponding to the training target based on the plurality of aggregation key points; adjusting parameters of the image processing model based on a target ground-truth box and the target prediction box corresponding to the training target; and obtaining the trained image processing model in response to a preset convergence condition being met. This scheme can reduce the precision required of target labeling and improve the accuracy of visual feature extraction by the trained image processing model.

Description

Image processing model training method, electronic device, and computer-readable storage medium
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to an image processing model training method, an electronic device, and a computer-readable storage medium.
Background
Vision is the most important way for humans to obtain information, and visual feature extraction has accordingly become an important branch of computer vision. To improve the efficiency of obtaining visual features, image processing models have received increasing attention: training an image processing model yields a trained model that can effectively improve the efficiency of extracting visual features from an input image. In view of this, how to reduce the precision required of target labeling while improving the accuracy of visual feature extraction by the trained image processing model has become an urgent problem to be solved.
Disclosure of Invention
The technical problem mainly addressed by the present application is to provide an image processing model training method, an electronic device, and a computer-readable storage medium that can reduce the precision required of target labeling and improve the accuracy of visual feature extraction by the trained image processing model.
In order to solve the above technical problem, a first aspect of the present application provides an image processing model training method, comprising: inputting a training image corresponding to a training target into an image processing model, wherein the image processing model represents the training target with a plurality of initial key points and predicts the position of each initial key point to obtain a plurality of predicted key points; obtaining an aggregation key point corresponding to each initial key point based on the initial key point and its corresponding predicted key point; determining a target prediction box corresponding to the training target based on the aggregation key points; adjusting parameters of the image processing model based on a target ground-truth box and the target prediction box corresponding to the training target; and obtaining the trained image processing model in response to a preset convergence condition being met.
In order to solve the above technical problem, a second aspect of the present application provides an electronic device, including: a memory and a processor coupled to each other, wherein the memory stores program data, and the processor calls the program data to execute the method of the first aspect.
In order to solve the above technical problem, a third aspect of the present application provides a computer-readable storage medium storing program data that, when executed by a processor, implement the method of the first aspect.
According to the above scheme, a training image corresponding to a training target is input into an image processing model. The image processing model represents the training target with a plurality of initial key points and predicts their positions, obtaining a plurality of predicted key points. The aggregation key point corresponding to each initial key point is then determined from the initial key point and its corresponding predicted key point, which improves the accuracy of the aggregation key points and enhances their features. A target prediction box corresponding to the training target is determined from the plurality of aggregation key points, so that the target ground-truth box and target prediction box corresponding to the training target can be compared and the parameters of the image processing model adjusted. This reduces the precision required of target labeling and improves the accuracy of visual feature extraction by the trained image processing model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort. Wherein:
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of an image processing model training method according to the present application;
FIG. 2 is a schematic flow chart diagram illustrating another embodiment of a training method for an image processing model according to the present application;
FIG. 3 is a schematic diagram of an application scenario of an embodiment corresponding to step S204 in FIG. 2;
FIG. 4 is a schematic view of a topology of an embodiment of an image processing model of the present application;
FIG. 5 is a schematic structural diagram of an embodiment of an electronic device of the present application;
FIG. 6 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein is merely an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship. Further, the term "plurality" herein means two or more than two.
Referring to fig. 1, fig. 1 is a schematic flowchart of an embodiment of an image processing model training method according to the present application. The method includes:
s101: inputting a training image corresponding to a training target into an image processing model, wherein the image processing model represents the training target by using a plurality of initial key points, predicting the position of each initial key point to obtain a plurality of predicted key points, obtaining an aggregation key point corresponding to each initial key point based on each initial key point and the corresponding predicted key point thereof, and determining a target prediction frame corresponding to the training target based on the aggregation key points.
Specifically, a training image corresponding to a training target is input into an image processing model. The training image contains the training target, and the image processing model represents the training target with a plurality of initial key points; that is, the training target corresponds to a set of initial key points. The position of each initial key point is then predicted to obtain a plurality of predicted key points; in other words, the image processing model predicts the position of the training target.
Furthermore, the aggregation key point corresponding to each initial key point is obtained from the initial key point and its corresponding predicted key point, which improves the accuracy of the aggregation key points and enhances their features. The target prediction box corresponding to the training target is determined from the positions of the aggregation key points, so the predicted position of the training target can be identified with a rectangular box. Labeling the target ground-truth box for the training target therefore requires no pixel-level annotation, which reduces the precision required of target labeling.
In one application mode, a training image including a training target is obtained, the target ground-truth box of the training target is labeled on the training image, and the training image is input into the image processing model, so that the image processing model represents the training target with a plurality of initial key points and predicts the position of the training target, obtaining the predicted key point corresponding to each initial key point.
In another application mode, a training image including a training target is obtained, the target ground-truth box of the training target is labeled on the training image, and the training image is input into the image processing model, so that the image processing model represents the training target with a plurality of initial key points, predicts an offset value of the training target relative to the initial positions, and superposes the offset value on the initial key points to obtain the predicted key point corresponding to each initial key point.
Further, the features of each initial key point and its corresponding predicted key point are aggregated, the aggregation key point corresponding to each initial key point is determined from the aggregated features, and the target prediction box of the training target is determined from the positions of the aggregation key points; the image processing model then outputs the target prediction box corresponding to the training target.
In one application scenario, a pooling operation is performed on the features of each initial key point and its corresponding predicted key point to aggregate them, the aggregation key point corresponding to each initial key point is determined from the aggregated features, a rectangular box capable of enclosing all the aggregation key points is determined from the positions of the plurality of aggregation key points, and the target prediction box corresponding to the training target is output.
In another application scenario, each initial key point is connected to its corresponding predicted key point, a maximum pooling operation is performed on the pixel points along the connecting line to obtain the aggregation key point on that line and enhance its features, the minimum rectangular box capable of enclosing all the aggregation key points is determined from the positions of the aggregation key points, and the target prediction box corresponding to the training target is output.
S102: Adjust the parameters of the image processing model based on the target ground-truth box and target prediction box corresponding to the training target.
Specifically, the target ground-truth box and target prediction box corresponding to the training target are compared, and the parameters of the image processing model are adjusted based on the difference of the target prediction box relative to the target ground-truth box.
In one application mode, after the predicted key point corresponding to each initial key point is obtained, a rectangular box capable of enclosing all the predicted key points is determined from their positions to obtain an initial prediction box, and some parameters of the image processing model are adjusted based on the initial prediction box and the target ground-truth box. Once the adjustment exceeds a preset number of times, the aggregation key point corresponding to each initial key point is obtained from the initial key point and its corresponding predicted key point, the target prediction box corresponding to the training target is determined from them, and all parameters of the image processing model are adjusted based on the difference of the target prediction box relative to the target ground-truth box.
In another application mode, all modules in the image processing model are trained simultaneously: the aggregation key point corresponding to each initial key point is obtained from the initial key point and its corresponding predicted key point, the target prediction box corresponding to the training target is determined from them, and all parameters of the image processing model are adjusted based on the difference of the target prediction box relative to the target ground-truth box.
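The embodiments above leave the concrete form of the box comparison open. As one common choice, for illustration only, the difference between the target prediction box and the target ground-truth box can be measured with an IoU-style loss; the sketch below assumes axis-aligned [x_min, y_min, x_max, y_max] boxes and is not the loss fixed by the patent:

```python
import torch

def iou_loss(pred_box: torch.Tensor, gt_box: torch.Tensor) -> torch.Tensor:
    """1 - IoU between two [x_min, y_min, x_max, y_max] boxes.

    Using IoU as the box difference is an illustrative assumption,
    not the loss fixed by the patent.
    """
    # Intersection rectangle.
    ix1 = torch.max(pred_box[0], gt_box[0])
    iy1 = torch.max(pred_box[1], gt_box[1])
    ix2 = torch.min(pred_box[2], gt_box[2])
    iy2 = torch.min(pred_box[3], gt_box[3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    # Union = sum of areas minus intersection.
    area_pred = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    area_gt = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])
    union = area_pred + area_gt - inter
    return 1.0 - inter / union.clamp(min=1e-6)
```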
S103: Obtain the trained image processing model in response to the preset convergence condition being met.
Specifically, when a preset convergence condition is met, a trained image processing model is obtained.
In one application mode, the preset convergence condition is defined by the overlap ratio of the target prediction box with the target ground-truth box and by the confidence of the prediction result: when the overlap ratio exceeds an overlap threshold and the confidence exceeds a confidence threshold, the training process ends and the trained image processing model is obtained, yielding a model with high confidence.
In another application mode, the preset convergence condition is defined by the overlap ratio of the target prediction box with the target ground-truth box and by the iteration count: when the overlap ratio exceeds an overlap threshold and the iteration count exceeds a count threshold, the training process ends and the trained image processing model is obtained, yielding a model with high stability.
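For illustration, both convergence variants reduce to simple threshold tests, as in the following sketch; the threshold values are assumptions, not values fixed by the text:

```python
def converged_v1(overlap: float, confidence: float,
                 overlap_thr: float = 0.9, conf_thr: float = 0.8) -> bool:
    """Variant 1: stop when box overlap and prediction confidence both
    exceed their thresholds (threshold values are illustrative)."""
    return overlap > overlap_thr and confidence > conf_thr

def converged_v2(overlap: float, iterations: int,
                 overlap_thr: float = 0.9, iter_thr: int = 10000) -> bool:
    """Variant 2: stop when box overlap exceeds its threshold and the
    iteration count exceeds a count threshold."""
    return overlap > overlap_thr and iterations > iter_thr
```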
According to the above scheme, a training image corresponding to a training target is input into an image processing model; the model represents the training target with a plurality of initial key points and predicts their positions, obtaining a plurality of predicted key points. The aggregation key point corresponding to each initial key point is determined from the initial key point and its corresponding predicted key point, which improves the accuracy of the aggregation key points and enhances their features. A target prediction box corresponding to the training target is determined from the plurality of aggregation key points, so that the target ground-truth box and target prediction box corresponding to the training target can be compared and the parameters of the image processing model adjusted.
Referring to fig. 2, fig. 2 is a schematic flowchart of another embodiment of an image processing model training method according to the present application, where the image processing model includes a convolution module, a prediction module, and an aggregation module. The method includes:
s201: and inputting the training image corresponding to the training target into a convolution module so that the convolution module extracts the characteristics of the training image to obtain a convolution characteristic diagram corresponding to the training image.
Specifically, a training image including a training target is input into the convolution module, and the convolution module extracts features of the training image to obtain a convolution feature map corresponding to the training image, so that the features on the training image are fully extracted.
In one application scenario, the training image is input into a plurality of cascaded convolution modules of a convolutional neural network for convolution feature extraction; the output feature dimension is H×W×C, where H and W are the height and width of the convolution feature map and C is the number of channels of the convolution module.
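For illustration, a minimal PyTorch sketch of such a cascaded convolution module follows; the layer count, strides, and channel width C are assumptions, not the patented architecture:

```python
import torch
import torch.nn as nn

class ConvModule(nn.Module):
    """Cascaded convolution module: maps an input image to an H x W x C
    feature map. Depth, strides, and channel width are illustrative."""

    def __init__(self, in_channels: int = 3, out_channels: int = 64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, out_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H_in, W_in) -> convolution feature map: (B, C, H, W)
        return self.layers(image)
```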
S202: Represent the training target on the convolution feature map with a plurality of initial key points, obtaining an initial key point set composed of the initial key points.
Specifically, a plurality of pixel points are selected from the convolution feature map as initial key points, obtaining an initial key point set composed of these initial key points, and the training target is expressed as this set, which reduces the dependence of the image processing model's training effect on pixel-level labeling.
In one application mode, a second preset number of pixel points are selected from all pixel points of the convolution feature map as initial key points, obtaining an initial key point set composed of the second preset number of initial key points, and the initial position of each initial key point in the set is determined.
Specifically, every pixel point on the convolution feature map is treated as a candidate position for the training target. Based on the pixel features corresponding to the pixel points, a second preset number of them are selected from the candidate positions as initial key points, obtaining an initial key point set composed of the second preset number of initial key points; the training target is represented by this set, and the initial position of each initial key point in it is determined. The process is expressed by the following formula:
$R = \{(x_k, y_k)\}_{k=1}^{n}$ (1)
where $x_k = i$, $y_k = j$, $k = 1, 2, 3, \dots, n$; $(i, j)$ denotes the position of a pixel point, and $n$ denotes the second preset number of initial key points in the initial key point set. Representing the training target by an initial key point set of the second preset number of initial key points, and finally deriving the target prediction box from it, makes the data annotation independent of the pixel level: only a ground-truth box needs to be labeled on the training target, which reduces the image processing model's dependence on data annotation quality.
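A minimal sketch of this selection step follows; scoring candidate pixels by their feature norm and keeping the top n is one plausible reading of "based on the pixel features", not a detail fixed by the text:

```python
import torch

def select_initial_keypoints(feature_map: torch.Tensor, n: int) -> torch.Tensor:
    """Select n pixels of a (C, H, W) feature map as initial key points.

    Returns an (n, 2) tensor of (x_k, y_k) positions, i.e. the set R of
    equation (1). Scoring pixels by their feature norm is an assumption.
    """
    c, h, w = feature_map.shape
    scores = feature_map.norm(dim=0).flatten()       # one score per pixel
    top = scores.topk(n).indices                     # indices of the n best pixels
    ys = torch.div(top, w, rounding_mode="floor")    # flat index -> row (j)
    xs = top % w                                     # flat index -> column (i)
    return torch.stack([xs, ys], dim=1)              # (n, 2) as (x_k, y_k)
```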
S203: Input the initial key point set into a prediction module so that the prediction module predicts the position of each initial key point, obtaining a predicted key point set composed of the predicted key points corresponding to the initial key points.
Specifically, the initial key point set is input into the prediction module, and the prediction module predicts the position of each initial key point, i.e., predicts the position of the training target, obtaining a predicted key point set composed of the predicted key points corresponding to the initial key points. The predicted key point set is the position of the training target as predicted by the prediction module. This gives the prediction module good adaptivity to targets subject to interference factors such as motion blur, target pose change, ambient light change, and/or occlusion, so a prediction result can still be obtained for a disturbed training target.
In one application mode, the initial key point set is input into the prediction module so that the prediction module predicts the position corresponding to each initial key point, obtaining a position offset value; the position offset value is superposed on the initial position of the corresponding initial key point to obtain the predicted position corresponding to each initial key point; and the predicted key points are determined from these predicted positions, obtaining a predicted key point set composed of the second preset number of predicted key points.
Specifically, the initial key point set is input into the prediction module, which predicts the position of each initial key point to obtain the position offset value of each predicted position relative to its initial key point. Each position offset value is superposed on the position of the corresponding initial key point to obtain the predicted position corresponding to that initial key point, and the predicted key points are determined from the predicted positions, obtaining a predicted key point set composed of the second preset number of predicted key points. The process is expressed by the following formula:
$R_r = \{(x_k + \Delta x_k,\ y_k + \Delta y_k)\}_{k=1}^{n}$ (2)
where $R_r$ denotes the predicted key point set, $(x_k, y_k)$ with $k = 1, 2, 3, \dots, n$ denotes the position of each initial key point, $(\Delta x_k, \Delta y_k)$ denotes the position offset value, and $(x_k + \Delta x_k, y_k + \Delta y_k)$ denotes the position of each predicted key point. Correcting the positions of the initial key points in this way improves the accuracy of the predicted key point positions.
In a specific application scenario, the initial position comprises the two-dimensional coordinates corresponding to the initial key point, the number of channels corresponding to the prediction module is twice the second preset number, and the position offset value comprises the two-dimensional coordinate offset between each predicted key point and its corresponding initial key point.
Specifically, the prediction module does not change the spatial size of the input features, only the number of feature channels. Suppose the feature dimension obtained by the convolution module is H×W×C and the second preset number of initial key points in the initial key point set is n; since the number of channels of the prediction module is twice the second preset number, the feature dimension output by the prediction module is H×W×2n, where the 2n channels correspond to the two-dimensional coordinate offsets of the n initial key points.
Optionally, the prediction module consists of two convolutional layers connected in series, yielding a prediction module with 2n output channels.
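The following sketch matches this description: two convolutional layers in series whose output has 2n channels, read as one two-dimensional offset per initial key point; the hidden width is an assumption:

```python
import torch
import torch.nn as nn

class PredictionModule(nn.Module):
    """Two convolutional layers in series; the output has 2n channels,
    one (dx, dy) offset per initial key point. Spatial size is unchanged;
    the hidden width is an illustrative assumption."""

    def __init__(self, in_channels: int, n: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 2 * n, kernel_size=3, padding=1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) -> offsets: (B, 2n, H, W)
        return self.net(feats)

def apply_offsets(initial: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
    """Superpose offsets on initial positions, eq. (2):
    (x_k + dx_k, y_k + dy_k). Both arguments are (n, 2) tensors."""
    return initial + offsets
```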
S204: Input the initial key point set and the predicted key point set into an aggregation module so that the aggregation module aggregates the features of each initial key point and its corresponding predicted key point, obtaining an aggregation key point set composed of the aggregation key points corresponding to the initial key points.
Specifically, the initial key point set and the predicted key point set are input into the aggregation module, and the aggregation module aggregates the features of each initial key point and its corresponding predicted key point to obtain the aggregation key point corresponding to each initial key point; the plurality of aggregation key points form an aggregation key point set. Aggregating the features of the initial and predicted key points adapts the method to inaccurate target key point positions and improves the accuracy and robustness of key point feature extraction.
In one application mode, the connecting line between each initial key point and its corresponding predicted key point is obtained from their positions; each connecting line is divided into a first preset number of reference points, and a maximum pooling operation is performed on these reference points to aggregate their features, obtaining the aggregation key point corresponding to each connecting line and the aggregated feature corresponding to that aggregation key point; and the plurality of aggregation key points form an aggregation key point set, whose enhanced feature is obtained by concatenating the aggregated features of its aggregation key points.
Specifically, each initial key point is taken as a starting point and its corresponding predicted key point as an end point, and the two are connected to obtain the connecting line between them. Each connecting line is divided into a first preset number of reference points, and the features of these reference points are aggregated by a maximum pooling operation to obtain the aggregation key point corresponding to each initial key point; the plurality of aggregation key points form an aggregation key point set, and the aggregated features corresponding to the aggregation key points in the set are concatenated to obtain the enhanced feature corresponding to the set. For targets subject to interference factors such as motion blur, target pose change, ambient light change, and/or occlusion, the aggregation module thus adapts the proposed method to inaccurate target key point positions and improves the accuracy and robustness of key point feature extraction. The process is expressed by the following formula:
$F_m = \max_k I\big[x_i + k(x_m - x_i)/(N-1),\ y_i + k(y_m - y_i)/(N-1)\big]$ (3)
where $(x_i, y_i)$ denotes the position of the initial key point, $(x_m, y_m)$ denotes the position of the predicted key point, $I(x, y)$ denotes the response value of the corresponding pixel point in the convolution feature map output by the convolution module, and $k$ ranges over $0 \le k \le N-1$. The aggregated feature of an aggregation key point is $F_m \in \mathbb{R}^{H \times W \times C}$, and the enhanced feature obtained by concatenating the features of all aggregation key points is $F_a \in \mathbb{R}^{H \times W \times nC}$.
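A sketch of equation (3) follows: N reference points are sampled evenly along the segment from each initial key point to its predicted key point, the feature-map response is read at each, and a maximum pooling is taken over them; bilinear sampling at non-integer positions is an assumption:

```python
import torch
import torch.nn.functional as F

def aggregate_along_line(feature_map: torch.Tensor, initial: torch.Tensor,
                         predicted: torch.Tensor, n_ref: int = 5) -> torch.Tensor:
    """Max-pool features over n_ref points sampled evenly on the segment
    from each initial key point (x_i, y_i) to its predicted key point
    (x_m, y_m), following eq. (3).

    feature_map: (C, H, W); initial, predicted: (n, 2) in (x, y) pixel
    coordinates. Returns (n, C) aggregated features. Bilinear sampling
    at non-integer positions is an assumption.
    """
    c, h, w = feature_map.shape
    initial, predicted = initial.float(), predicted.float()
    t = torch.linspace(0.0, 1.0, n_ref).view(1, n_ref, 1)   # k/(N-1), k = 0..N-1
    pts = initial.unsqueeze(1) + t * (predicted - initial).unsqueeze(1)  # (n, N, 2)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.empty_like(pts)
    grid[..., 0] = 2.0 * pts[..., 0] / (w - 1) - 1.0
    grid[..., 1] = 2.0 * pts[..., 1] / (h - 1) - 1.0
    sampled = F.grid_sample(feature_map[None], grid[None], align_corners=True)
    # sampled: (1, C, n, N) -> maximum over the N reference points.
    return sampled[0].amax(dim=-1).transpose(0, 1)          # (n, C)
```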
In a specific application scenario, please refer to fig. 3, which is a schematic diagram of an application scenario of the embodiment corresponding to step S204 in fig. 2. The training target in fig. 3 is an automobile. Taking one initial key point of the training target as an example: the initial key point is the unfilled dot and is taken as the pole, and the predicted key point is the gray dot; the pole and the predicted key point are connected to obtain the polar axis corresponding to the initial key point; the polar axis is divided equally into N reference points, shown by the dashed line in fig. 3; and a maximum pooling operation is performed over the N reference points in total, including the initial key point and the predicted key point, aggregating their features to obtain the aggregation key point, shown as the black dot. This yields a more accurate position for the training target, adapts to targets subject to interference factors such as motion blur, target pose change, illumination change, and/or occlusion, and improves the robustness and accuracy of key point feature extraction. N can be any user-set value greater than 3.
S205: Determine, in the aggregation key point set composed of the plurality of aggregation key points, the corner positions corresponding to the aggregation key points at preset corners.
Specifically, the corner position corresponding to the aggregation key point at each preset corner is extracted from the aggregation key point set; when an aggregation key point corresponds to a two-dimensional coordinate, the corresponding corner position is that two-dimensional coordinate.
S206: Determine the target prediction box corresponding to the training target based on the corner positions.
Specifically, the minimum bounding rectangle capable of enclosing all the aggregation key points is determined from the corner positions, obtaining the target prediction box corresponding to the training target.
In one application mode, the preset corners comprise the lower-left and upper-right corners, and the minimum rectangular box corresponding to the target prediction box is determined from the positions of the aggregation key points at the lower-left and upper-right corners.
In another application mode, the preset corners comprise the upper-left and lower-right corners, and the minimum bounding rectangle corresponding to the target prediction box is determined from the positions of the aggregation key points at the upper-left and lower-right corners. The process is expressed by the following formula:
$B_r = \big[\min_k(x_k + \Delta x_k),\ \min_k(y_k + \Delta y_k),\ \max_k(x_k + \Delta x_k),\ \max_k(y_k + \Delta y_k)\big]$ (4)
where $(x_k + \Delta x_k,\ y_k + \Delta y_k)$ denotes the positions of the predicted key points; after these positions are obtained, the extreme abscissas and ordinates are extracted from them to determine the minimum bounding rectangle corresponding to the target prediction box. This rectangle can enclose all the aggregation key points, yielding a more accurately labeled target prediction box.
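Equation (4) reduces to coordinate-wise minima and maxima over the key point positions, as in this sketch:

```python
import torch

def target_prediction_box(points: torch.Tensor) -> torch.Tensor:
    """Minimum bounding rectangle enclosing all key points, eq. (4).

    points: (n, 2) tensor of (x_k + dx_k, y_k + dy_k) positions.
    Returns [x_min, y_min, x_max, y_max].
    """
    x, y = points[:, 0], points[:, 1]
    return torch.stack([x.min(), y.min(), x.max(), y.max()])
```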
S207: Adjust the parameters of the image processing model based on the target ground-truth box and target prediction box corresponding to the training target.
Specifically, the parameters of the image processing model are adjusted based on the overlap ratio of the target prediction box with the target ground-truth box.
In one application mode, after the predicted key point corresponding to each initial key point is obtained, a rectangular box capable of enclosing all the predicted key points is determined from their positions to obtain an initial prediction box, and the parameters of the convolution module and the prediction module are adjusted based on the initial prediction box and the target ground-truth box. Once the adjustment exceeds a preset number of times, the aggregation key point corresponding to each initial key point is obtained from the initial key point and its corresponding predicted key point, the target prediction box corresponding to the training target is determined from them, and the parameters of the convolution module, the prediction module, and the aggregation module are adjusted based on the overlap ratio of the target prediction box with the target ground-truth box.
In another application mode, all modules in the image processing model are trained simultaneously: the aggregation key point corresponding to each initial key point is obtained from the initial key point and its corresponding predicted key point, the target prediction box corresponding to the training target is determined from them, and the parameters of the convolution module, the prediction module, and the aggregation module are adjusted based on the overlap ratio of the target prediction box with the target ground-truth box.
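As a sketch of the staged variant described above, the loop below first adjusts the model against the initial prediction box and, after a preset number of steps, against the aggregated target prediction box; the optimizer, learning rate, step counts, and the assumption that the model returns both boxes are all illustrative (iou_loss is the sketch given earlier):

```python
import torch

def train_staged(model, loader, stage_switch: int = 1000, max_steps: int = 10000):
    """Two-stage schedule: early steps fit the initial prediction box
    (convolution + prediction modules); later steps fit the aggregated
    target prediction box (all three modules). All hyperparameters and
    the model's output signature are illustrative assumptions."""
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    for step, (image, gt_box) in enumerate(loader):
        if step >= max_steps:
            break
        init_box, target_box = model(image)    # assumed model outputs
        box = init_box if step < stage_switch else target_box
        loss = iou_loss(box, gt_box)           # sketch defined earlier
        opt.zero_grad()
        loss.backward()
        opt.step()
```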
S208: Obtain the trained image processing model in response to the preset convergence condition being met.
Specifically, when a preset convergence condition is met, a trained image processing model is obtained.
Further, the trained image processing model can label targets in an input image and extract the targets' visual features, and it can be used in different application scenarios such as target recognition, target detection, target tracking, and instance segmentation.
In this embodiment, the training target is represented by an initial key point set composed of a second preset number of initial key points, from which the target prediction box is finally obtained, so the data annotation is independent of the pixel level: only a ground-truth box needs to be labeled on the training target, which reduces the image processing model's dependence on data annotation quality. The aggregation module concatenates the aggregated features corresponding to the aggregation key points in the aggregation key point set to obtain the set's enhanced feature, allowing the proposed method to adapt to inaccurate target key point positions and improving the accuracy and robustness of key point feature extraction.
Referring to fig. 4, fig. 4 is a schematic diagram of the topology of an embodiment of an image processing model of the present application. The image processing model 40 includes a convolution module 400, a prediction module 402, and an aggregation module 404. A training image corresponding to a training target is input into the image processing model 40; the model represents the training target with a plurality of initial key points and predicts their positions to obtain a plurality of predicted key points; the aggregation key point corresponding to each initial key point is obtained from the initial key point and its corresponding predicted key point; the target prediction box corresponding to the training target is determined from the plurality of aggregation key points; the parameters of the image processing model 40 are adjusted based on the target ground-truth box and target prediction box corresponding to the training target; and the trained image processing model 40 is obtained in response to the preset convergence condition being met.
Specifically, the convolution module 400 extracts features of the training image to obtain a convolution feature map corresponding to the training image, and represents the training target on the convolution feature map by using a plurality of initial key points to obtain an initial key point set composed of a plurality of initial key points. The prediction module 402 predicts the positions of the initial key points to obtain a prediction key point set composed of the prediction key points corresponding to the initial key points.
Further, the aggregation module 404 aggregates the features of each initial key point and its corresponding predicted key point to obtain an aggregation key point set composed of the aggregation key points corresponding to the initial key points. The aggregation module 404 obtains the connecting line between each initial key point and its corresponding predicted key point from their positions; divides each connecting line into a first preset number of reference points and performs a maximum pooling operation on them to aggregate their features, obtaining the aggregation key point corresponding to each connecting line and the aggregated feature corresponding to that aggregation key point; and forms an aggregation key point set from the plurality of aggregation key points, concatenating their aggregated features to obtain the enhanced feature corresponding to the set.
It should be noted that the image processing model 40 in this embodiment can be trained by using the method described in any of the above embodiments.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an embodiment of an electronic device 50 of the present application, where the electronic device 50 includes a memory 501 and a processor 502 coupled to each other, where the memory 501 stores program data (not shown), and the processor 502 calls the program data to implement the method in any of the above embodiments, and for a description of relevant contents, reference is made to the detailed description of the above method embodiments, which is not repeated here.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an embodiment of a computer-readable storage medium 60 of the present application, the computer-readable storage medium 60 stores program data 600, and the program data 600 is executed by a processor to implement the method in any of the above embodiments, and the related contents are described in detail with reference to the above method embodiments, which are not repeated herein.
It should be noted that, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the essence of the technical solutions of the present application, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only an embodiment of the present application and is not intended to limit its scope. All equivalent structures or equivalent processes derived from the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the protection scope of the present application.

Claims (9)

1. A method for training an image processing model, the method comprising:
inputting a training image corresponding to a training target into an image processing model, wherein the image processing model represents the training target by using a plurality of initial key points, and predicting the position of each initial key point to obtain a plurality of predicted key points; obtaining an aggregation key point corresponding to each initial key point based on each initial key point and the predicted key point corresponding to the initial key point; and determining a target prediction box corresponding to the training target based on the aggregation key points;
adjusting parameters of the image processing model based on a target ground-truth box and the target prediction box corresponding to the training target;
obtaining the trained image processing model in response to the preset convergence condition being met;
wherein obtaining the aggregation key point corresponding to each initial key point based on each initial key point and the predicted key point corresponding to the initial key point comprises: obtaining a connecting line between each initial key point and the corresponding predicted key point based on the positions of the initial key point and the corresponding predicted key point; dividing each connecting line into a first preset number of reference points, and performing a maximum pooling operation on the first preset number of reference points to aggregate features of the first preset number of reference points, obtaining the aggregation key point corresponding to each connecting line and the aggregated feature corresponding to the aggregation key point; and forming an aggregation key point set from the aggregation key points, and concatenating the aggregated features corresponding to the aggregation key points to obtain an enhanced feature corresponding to the aggregation key point set.
2. The method of claim 1, wherein the image processing model comprises a convolution module and a prediction module, and the predicting the positions of the initial key points to obtain a plurality of predicted key points comprises:
inputting a training image corresponding to a training target into the convolution module so that the convolution module extracts features of the training image to obtain a convolution feature map corresponding to the training image;
representing the training target on the convolution feature map by using a plurality of initial key points to obtain an initial key point set composed of the initial key points;
and inputting the initial key point set into the prediction module so that the prediction module predicts the position of each initial key point to obtain a predicted key point set composed of the predicted key points corresponding to the initial key points.
3. The method according to claim 2, wherein the image processing model further comprises an aggregation module, and the aggregation module is configured to implement a step of obtaining an aggregation key point corresponding to each initial key point based on each initial key point and the predicted key point corresponding to the initial key point.
4. The method for training an image processing model according to claim 2, wherein the representing the training target on the convolution feature map by using a plurality of initial key points to obtain an initial key point set composed of a plurality of initial key points comprises:
selecting a second preset number of pixel points from all the pixel points of the convolution feature map as the initial key points to obtain an initial key point set composed of the second preset number of initial key points;
determining the initial position of each initial key point in the initial key point set.
5. The method of claim 4, wherein the inputting the initial key point set into the prediction module so that the prediction module predicts the position of each initial key point, obtaining a predicted key point set composed of the predicted key points corresponding to each initial key point, comprises:
inputting the initial key point set into the prediction module so that the prediction module predicts the position corresponding to each initial key point to obtain a position offset value;
superposing the position offset value on the initial position of the corresponding initial key point to obtain the predicted position corresponding to each initial key point;
and determining the predicted key points based on the predicted positions corresponding to the initial key points to obtain a predicted key point set composed of the second preset number of predicted key points.
6. The image processing model training method of claim 5,
the initial position comprises a two-dimensional coordinate corresponding to the initial key point, the number of channels corresponding to the prediction module is twice the second preset number, and the position offset value comprises a two-dimensional coordinate offset between each predicted key point and the corresponding initial key point.
7. The method for training an image processing model according to claim 1, wherein the determining a target prediction box corresponding to the training target based on the plurality of aggregation key points comprises:
determining corner positions corresponding to the aggregation key points of preset corners in an aggregation key point set consisting of a plurality of aggregation key points;
and determining a target prediction box corresponding to the training target based on the corner position.
8. An electronic device, comprising: a memory and a processor coupled to each other, wherein the memory stores program data that the processor calls to perform the method of any of claims 1-7.
9. A computer-readable storage medium, on which program data are stored, which program data, when being executed by a processor, carry out the method according to any one of claims 1-7.
CN202211311142.0A 2022-10-25 2022-10-25 Image processing model training method, electronic device, and computer-readable storage medium Active CN115375976B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211311142.0A CN115375976B (en) 2022-10-25 2022-10-25 Image processing model training method, electronic device, and computer-readable storage medium


Publications (2)

Publication Number Publication Date
CN115375976A CN115375976A (en) 2022-11-22
CN115375976B (en) 2023-02-10

Family

ID=84073258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211311142.0A Active CN115375976B (en) 2022-10-25 2022-10-25 Image processing model training method, electronic device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN115375976B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329888A (en) * 2020-11-26 2021-02-05 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, electronic device, and storage medium
CN115063656A (en) * 2022-05-31 2022-09-16 北京开拓鸿业高科技有限公司 Image detection method and device, computer readable storage medium and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11107238B2 (en) * 2019-12-13 2021-08-31 Zebra Technologies Corporation Method, system and apparatus for detecting item facings
CN114092963B (en) * 2021-10-14 2023-09-22 北京百度网讯科技有限公司 Method, device, equipment and storage medium for key point detection and model training
CN114022900A (en) * 2021-10-29 2022-02-08 北京百度网讯科技有限公司 Training method, detection method, device, equipment and medium for detection model
CN114549557A (en) * 2022-02-28 2022-05-27 重庆紫光华山智安科技有限公司 Portrait segmentation network training method, device, equipment and medium


Also Published As

Publication number Publication date
CN115375976A (en) 2022-11-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant