CN115375976B - Image processing model training method, electronic device, and computer-readable storage medium - Google Patents

Image processing model training method, electronic device, and computer-readable storage medium

Info

Publication number
CN115375976B
CN115375976B (application CN202211311142.0A)
Authority
CN
China
Prior art keywords
key point
initial
initial key
target
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211311142.0A
Other languages
Chinese (zh)
Other versions
CN115375976A (en)
Inventor
马子昂
刘征宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Huacheng Software Technology Co Ltd
Original Assignee
Hangzhou Huacheng Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Huacheng Software Technology Co Ltd filed Critical Hangzhou Huacheng Software Technology Co Ltd
Priority to CN202211311142.0A
Publication of CN115375976A
Application granted
Publication of CN115375976B
Legal status: Active (granted)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00: Image analysis
    • G06T7/70: Determining position or orientation of objects or cameras
    • G06T7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20081: Training; Learning
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image processing model training method, an electronic device, and a computer-readable storage medium. The image processing model training method comprises: inputting a training image corresponding to a training target into an image processing model, wherein the image processing model represents the training target with a plurality of initial key points and predicts the position of each initial key point to obtain a plurality of predicted key points; obtaining an aggregation key point corresponding to each initial key point based on the initial key point and its corresponding predicted key point; determining a target prediction box corresponding to the training target based on the plurality of aggregation key points; adjusting parameters of the image processing model based on a target ground-truth box and the target prediction box corresponding to the training target; and obtaining the trained image processing model in response to a preset convergence condition being met. This scheme can reduce the precision required of target labeling and improve the accuracy of visual feature extraction by the trained image processing model.

Description

Image processing model training method, electronic device, and computer-readable storage medium
Technical Field
The present application relates to the field of computer vision technologies, and in particular, to an image processing model training method, an electronic device, and a computer-readable storage medium.
Background
Vision is the most important way for humans to obtain information, and visual feature extraction has accordingly become an important branch of computer vision. To improve the efficiency of obtaining visual features, image processing models have received increasing attention: training an image processing model yields a trained model that can effectively improve the efficiency of extracting visual features from an input image. In view of this, how to reduce the precision required of target labeling while improving the accuracy of visual feature extraction by the trained image processing model has become an urgent problem to be solved.
Disclosure of Invention
The technical problem mainly addressed by the present application is to provide an image processing model training method, an electronic device, and a computer-readable storage medium that can reduce the precision required of target labeling and improve the accuracy of visual feature extraction by the trained image processing model.
In order to solve the above technical problem, a first aspect of the present application provides an image processing model training method, comprising: inputting a training image corresponding to a training target into an image processing model, wherein the image processing model represents the training target with a plurality of initial key points and predicts the position of each initial key point to obtain a plurality of predicted key points; obtaining an aggregation key point corresponding to each initial key point based on the initial key point and its corresponding predicted key point; determining a target prediction box corresponding to the training target based on the aggregation key points; adjusting parameters of the image processing model based on a target ground-truth box and the target prediction box corresponding to the training target; and obtaining the trained image processing model in response to a preset convergence condition being met.
In order to solve the above technical problem, a second aspect of the present application provides an electronic device, including: a memory and a processor coupled to each other, wherein the memory stores program data, and the processor calls the program data to execute the method of the first aspect.
In order to solve the above technical problem, a third aspect of the present application provides a computer-readable storage medium storing program data that, when executed by a processor, implement the method of the first aspect.
According to the above scheme, a training image corresponding to a training target is input into an image processing model. The image processing model represents the training target with a plurality of initial key points and predicts their positions, obtaining a plurality of predicted key points. The aggregation key point corresponding to each initial key point is then determined from the initial key point and its corresponding predicted key point, which improves the accuracy of the aggregation key points and enhances their features. A target prediction box corresponding to the training target is determined from the plurality of aggregation key points, so that the target ground-truth box and target prediction box corresponding to the training target can be compared and the parameters of the image processing model adjusted. This reduces the precision required of target labeling and improves the accuracy of visual feature extraction by the trained image processing model.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; those skilled in the art can obtain other drawings from them without creative effort. Wherein:
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of an image processing model training method according to the present application;
FIG. 2 is a schematic flow chart diagram illustrating another embodiment of a training method for an image processing model according to the present application;
FIG. 3 is a schematic diagram of an application scenario of an embodiment corresponding to step S204 in FIG. 2;
FIG. 4 is a schematic view of a topology of an embodiment of an image processing model of the present application;
FIG. 5 is a schematic structural diagram of an embodiment of an electronic device of the present application;
FIG. 6 is a schematic structural diagram of an embodiment of a computer-readable storage medium of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "system" and "network" are often used interchangeably herein. The term "and/or" herein is merely an association relationship describing an associated object, and means that there may be three relationships, for example, a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the former and latter related objects are in an "or" relationship. Further, the term "plurality" herein means two or more than two.
Referring to fig. 1, fig. 1 is a schematic flowchart of an embodiment of an image processing model training method according to the present application. The method includes:
s101: inputting a training image corresponding to a training target into an image processing model, wherein the image processing model represents the training target by using a plurality of initial key points, predicting the position of each initial key point to obtain a plurality of predicted key points, obtaining an aggregation key point corresponding to each initial key point based on each initial key point and the corresponding predicted key point thereof, and determining a target prediction frame corresponding to the training target based on the aggregation key points.
Specifically, a training image corresponding to a training target is input into an image processing model. The training image contains the training target, and the image processing model represents the training target with a plurality of initial key points; that is, the training target corresponds to a set of initial key points. The position of each initial key point is then predicted to obtain a plurality of predicted key points; in other words, the image processing model predicts the position of the training target.
Furthermore, the aggregation key point corresponding to each initial key point is obtained from the initial key point and its corresponding predicted key point, which improves the accuracy of the aggregation key points and enhances their features. The target prediction box corresponding to the training target is determined from the positions of the aggregation key points, so the predicted position of the training target can be identified with a rectangular box. Labeling the target ground-truth box for the training target therefore requires no pixel-level annotation, which reduces the precision required of target labeling.
In one application mode, a training image including a training target is obtained, the target ground-truth box of the training target is labeled on the training image, and the training image is input into the image processing model, so that the image processing model represents the training target with a plurality of initial key points and predicts the position of the training target, obtaining the predicted key point corresponding to each initial key point.
In another application mode, a training image including a training target is obtained, the target ground-truth box of the training target is labeled on the training image, and the training image is input into the image processing model, so that the image processing model represents the training target with a plurality of initial key points, predicts an offset value of the training target relative to the initial positions, and superposes the offset value on the initial key points to obtain the predicted key point corresponding to each initial key point.
Further, the features of each initial key point and its corresponding predicted key point are aggregated, the aggregation key point corresponding to each initial key point is determined from the aggregated features, and the target prediction box of the training target is determined from the positions of the aggregation key points; the image processing model then outputs the target prediction box corresponding to the training target.
In one application scenario, a pooling operation is performed on the features of each initial key point and its corresponding predicted key point to aggregate them, the aggregation key point corresponding to each initial key point is determined from the aggregated features, a rectangular box capable of enclosing all the aggregation key points is determined from the positions of the plurality of aggregation key points, and the target prediction box corresponding to the training target is output.
In another application scenario, each initial key point is connected to its corresponding predicted key point, a maximum pooling operation is performed on the pixel points along the connecting line to obtain the aggregation key point on that line and enhance its features, the minimum rectangular box capable of enclosing all the aggregation key points is determined from the positions of the aggregation key points, and the target prediction box corresponding to the training target is output.
S102: Adjust the parameters of the image processing model based on the target ground-truth box and target prediction box corresponding to the training target.
Specifically, the target ground-truth box and target prediction box corresponding to the training target are compared, and the parameters of the image processing model are adjusted based on the difference of the target prediction box relative to the target ground-truth box.
In one application mode, after the predicted key point corresponding to each initial key point is obtained, a rectangular box capable of enclosing all the predicted key points is determined from their positions to obtain an initial prediction box, and some parameters of the image processing model are adjusted based on the initial prediction box and the target ground-truth box. Once the adjustment exceeds a preset number of times, the aggregation key point corresponding to each initial key point is obtained from the initial key point and its corresponding predicted key point, the target prediction box corresponding to the training target is determined from them, and all parameters of the image processing model are adjusted based on the difference of the target prediction box relative to the target ground-truth box.
In another application mode, all modules in the image processing model are trained simultaneously: the aggregation key point corresponding to each initial key point is obtained from the initial key point and its corresponding predicted key point, the target prediction box corresponding to the training target is determined from them, and all parameters of the image processing model are adjusted based on the difference of the target prediction box relative to the target ground-truth box.
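The embodiments above leave the concrete form of the box comparison open. As one common choice, for illustration only, the difference between the target prediction box and the target ground-truth box can be measured with an IoU-style loss; the sketch below assumes axis-aligned [x_min, y_min, x_max, y_max] boxes and is not the loss fixed by the patent:

```python
import torch

def iou_loss(pred_box: torch.Tensor, gt_box: torch.Tensor) -> torch.Tensor:
    """1 - IoU between two [x_min, y_min, x_max, y_max] boxes.

    Using IoU as the box difference is an illustrative assumption,
    not the loss fixed by the patent.
    """
    # Intersection rectangle.
    ix1 = torch.max(pred_box[0], gt_box[0])
    iy1 = torch.max(pred_box[1], gt_box[1])
    ix2 = torch.min(pred_box[2], gt_box[2])
    iy2 = torch.min(pred_box[3], gt_box[3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    # Union = sum of areas minus intersection.
    area_pred = (pred_box[2] - pred_box[0]) * (pred_box[3] - pred_box[1])
    area_gt = (gt_box[2] - gt_box[0]) * (gt_box[3] - gt_box[1])
    union = area_pred + area_gt - inter
    return 1.0 - inter / union.clamp(min=1e-6)
```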
S103: Obtain the trained image processing model in response to the preset convergence condition being met.
Specifically, when a preset convergence condition is met, a trained image processing model is obtained.
In one application mode, the preset convergence condition is defined by the overlap ratio of the target prediction box with the target ground-truth box and by the confidence of the prediction result: when the overlap ratio exceeds an overlap threshold and the confidence exceeds a confidence threshold, the training process ends and the trained image processing model is obtained, yielding a model with high confidence.
In another application mode, the preset convergence condition is defined by the overlap ratio of the target prediction box with the target ground-truth box and by the iteration count: when the overlap ratio exceeds an overlap threshold and the iteration count exceeds a count threshold, the training process ends and the trained image processing model is obtained, yielding a model with high stability.
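For illustration, both convergence variants reduce to simple threshold tests, as in the following sketch; the threshold values are assumptions, not values fixed by the text:

```python
def converged_v1(overlap: float, confidence: float,
                 overlap_thr: float = 0.9, conf_thr: float = 0.8) -> bool:
    """Variant 1: stop when box overlap and prediction confidence both
    exceed their thresholds (threshold values are illustrative)."""
    return overlap > overlap_thr and confidence > conf_thr

def converged_v2(overlap: float, iterations: int,
                 overlap_thr: float = 0.9, iter_thr: int = 10000) -> bool:
    """Variant 2: stop when box overlap exceeds its threshold and the
    iteration count exceeds a count threshold."""
    return overlap > overlap_thr and iterations > iter_thr
```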
According to the above scheme, a training image corresponding to a training target is input into an image processing model; the model represents the training target with a plurality of initial key points and predicts their positions, obtaining a plurality of predicted key points. The aggregation key point corresponding to each initial key point is determined from the initial key point and its corresponding predicted key point, which improves the accuracy of the aggregation key points and enhances their features. A target prediction box corresponding to the training target is determined from the plurality of aggregation key points, so that the target ground-truth box and target prediction box corresponding to the training target can be compared and the parameters of the image processing model adjusted.
Referring to fig. 2, fig. 2 is a schematic flowchart of another embodiment of an image processing model training method according to the present application, where the image processing model includes a convolution module, a prediction module, and an aggregation module. The method includes:
s201: and inputting the training image corresponding to the training target into a convolution module so that the convolution module extracts the characteristics of the training image to obtain a convolution characteristic diagram corresponding to the training image.
Specifically, a training image including a training target is input into the convolution module, and the convolution module extracts features of the training image to obtain a convolution feature map corresponding to the training image, so that the features on the training image are fully extracted.
In one application scenario, the training image is input into a plurality of cascaded convolution modules of a convolutional neural network for convolution feature extraction; the output feature dimension is H×W×C, where H and W are the height and width of the convolution feature map and C is the number of channels of the convolution module.
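For illustration, a minimal PyTorch sketch of such a cascaded convolution module follows; the layer count, strides, and channel width C are assumptions, not the patented architecture:

```python
import torch
import torch.nn as nn

class ConvModule(nn.Module):
    """Cascaded convolution module: maps an input image to an H x W x C
    feature map. Depth, strides, and channel width are illustrative."""

    def __init__(self, in_channels: int = 3, out_channels: int = 64):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, out_channels, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # image: (B, 3, H_in, W_in) -> convolution feature map: (B, C, H, W)
        return self.layers(image)
```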
S202: Represent the training target on the convolution feature map with a plurality of initial key points, obtaining an initial key point set composed of the initial key points.
Specifically, a plurality of pixel points are selected from the convolution feature map as initial key points, obtaining an initial key point set composed of these initial key points, and the training target is expressed as this set, which reduces the dependence of the image processing model's training effect on pixel-level labeling.
In one application mode, a second preset number of pixel points are selected from all pixel points of the convolution feature map as initial key points, obtaining an initial key point set composed of the second preset number of initial key points, and the initial position of each initial key point in the set is determined.
Specifically, every pixel point on the convolution feature map is treated as a candidate position for the training target. Based on the pixel features corresponding to the pixel points, a second preset number of them are selected from the candidate positions as initial key points, obtaining an initial key point set composed of the second preset number of initial key points; the training target is represented by this set, and the initial position of each initial key point in it is determined. The process is expressed by the following formula:
$R = \{(x_k, y_k)\}_{k=1}^{n}$ (1)
where $x_k = i$, $y_k = j$, $k = 1, 2, 3, \dots, n$; $(i, j)$ denotes the position of a pixel point, and $n$ denotes the second preset number of initial key points in the initial key point set. Representing the training target by an initial key point set of the second preset number of initial key points, and finally deriving the target prediction box from it, makes the data annotation independent of the pixel level: only a ground-truth box needs to be labeled on the training target, which reduces the image processing model's dependence on data annotation quality.
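A minimal sketch of this selection step follows; scoring candidate pixels by their feature norm and keeping the top n is one plausible reading of "based on the pixel features", not a detail fixed by the text:

```python
import torch

def select_initial_keypoints(feature_map: torch.Tensor, n: int) -> torch.Tensor:
    """Select n pixels of a (C, H, W) feature map as initial key points.

    Returns an (n, 2) tensor of (x_k, y_k) positions, i.e. the set R of
    equation (1). Scoring pixels by their feature norm is an assumption.
    """
    c, h, w = feature_map.shape
    scores = feature_map.norm(dim=0).flatten()       # one score per pixel
    top = scores.topk(n).indices                     # indices of the n best pixels
    ys = torch.div(top, w, rounding_mode="floor")    # flat index -> row (j)
    xs = top % w                                     # flat index -> column (i)
    return torch.stack([xs, ys], dim=1)              # (n, 2) as (x_k, y_k)
```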
S203: Input the initial key point set into a prediction module so that the prediction module predicts the position of each initial key point, obtaining a predicted key point set composed of the predicted key points corresponding to the initial key points.
Specifically, the initial key point set is input into the prediction module, and the prediction module predicts the position of each initial key point, i.e., predicts the position of the training target, obtaining a predicted key point set composed of the predicted key points corresponding to the initial key points. The predicted key point set is the position of the training target as predicted by the prediction module. This gives the prediction module good adaptivity to targets subject to interference factors such as motion blur, target pose change, ambient light change, and/or occlusion, so a prediction result can still be obtained for a disturbed training target.
In one application mode, the initial key point set is input into the prediction module so that the prediction module predicts the position corresponding to each initial key point, obtaining a position offset value; the position offset value is superposed on the initial position of the corresponding initial key point to obtain the predicted position corresponding to each initial key point; and the predicted key points are determined from these predicted positions, obtaining a predicted key point set composed of the second preset number of predicted key points.
Specifically, the initial key point set is input into the prediction module, which predicts the position of each initial key point to obtain the position offset value of each predicted position relative to its initial key point. Each position offset value is superposed on the position of the corresponding initial key point to obtain the predicted position corresponding to that initial key point, and the predicted key points are determined from the predicted positions, obtaining a predicted key point set composed of the second preset number of predicted key points. The process is expressed by the following formula:
$R_r = \{(x_k + \Delta x_k,\ y_k + \Delta y_k)\}_{k=1}^{n}$ (2)
where $R_r$ denotes the predicted key point set, $(x_k, y_k)$ with $k = 1, 2, 3, \dots, n$ denotes the position of each initial key point, $(\Delta x_k, \Delta y_k)$ denotes the position offset value, and $(x_k + \Delta x_k, y_k + \Delta y_k)$ denotes the position of each predicted key point. Correcting the positions of the initial key points in this way improves the accuracy of the predicted key point positions.
In a specific application scenario, the initial position comprises the two-dimensional coordinates corresponding to the initial key point, the number of channels corresponding to the prediction module is twice the second preset number, and the position offset value comprises the two-dimensional coordinate offset between each predicted key point and its corresponding initial key point.
Specifically, the prediction module does not change the spatial size of the input features, only the number of feature channels. Suppose the feature dimension obtained by the convolution module is H×W×C and the second preset number of initial key points in the initial key point set is n; since the number of channels of the prediction module is twice the second preset number, the feature dimension output by the prediction module is H×W×2n, where the 2n channels correspond to the two-dimensional coordinate offsets of the n initial key points.
Optionally, the prediction module consists of two convolutional layers connected in series, yielding a prediction module with 2n output channels.
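The following sketch matches this description: two convolutional layers in series whose output has 2n channels, read as one two-dimensional offset per initial key point; the hidden width is an assumption:

```python
import torch
import torch.nn as nn

class PredictionModule(nn.Module):
    """Two convolutional layers in series; the output has 2n channels,
    one (dx, dy) offset per initial key point. Spatial size is unchanged;
    the hidden width is an illustrative assumption."""

    def __init__(self, in_channels: int, n: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, 2 * n, kernel_size=3, padding=1),
        )

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # feats: (B, C, H, W) -> offsets: (B, 2n, H, W)
        return self.net(feats)

def apply_offsets(initial: torch.Tensor, offsets: torch.Tensor) -> torch.Tensor:
    """Superpose offsets on initial positions, eq. (2):
    (x_k + dx_k, y_k + dy_k). Both arguments are (n, 2) tensors."""
    return initial + offsets
```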
S204: Input the initial key point set and the predicted key point set into an aggregation module so that the aggregation module aggregates the features of each initial key point and its corresponding predicted key point, obtaining an aggregation key point set composed of the aggregation key points corresponding to the initial key points.
Specifically, the initial key point set and the predicted key point set are input into the aggregation module, and the aggregation module aggregates the features of each initial key point and its corresponding predicted key point to obtain the aggregation key point corresponding to each initial key point; the plurality of aggregation key points form an aggregation key point set. Aggregating the features of the initial and predicted key points adapts the method to inaccurate target key point positions and improves the accuracy and robustness of key point feature extraction.
In one application mode, the connecting line between each initial key point and its corresponding predicted key point is obtained from their positions; each connecting line is divided into a first preset number of reference points, and a maximum pooling operation is performed on these reference points to aggregate their features, obtaining the aggregation key point corresponding to each connecting line and the aggregated feature corresponding to that aggregation key point; and the plurality of aggregation key points form an aggregation key point set, whose enhanced feature is obtained by concatenating the aggregated features of its aggregation key points.
Specifically, each initial key point is taken as a starting point and its corresponding predicted key point as an end point, and the two are connected to obtain the connecting line between them. Each connecting line is divided into a first preset number of reference points, and the features of these reference points are aggregated by a maximum pooling operation to obtain the aggregation key point corresponding to each initial key point; the plurality of aggregation key points form an aggregation key point set, and the aggregated features corresponding to the aggregation key points in the set are concatenated to obtain the enhanced feature corresponding to the set. For targets subject to interference factors such as motion blur, target pose change, ambient light change, and/or occlusion, the aggregation module thus adapts the proposed method to inaccurate target key point positions and improves the accuracy and robustness of key point feature extraction. The process is expressed by the following formula:
$F_m = \max_k I\big[x_i + k(x_m - x_i)/(N-1),\ y_i + k(y_m - y_i)/(N-1)\big]$ (3)
where $(x_i, y_i)$ denotes the position of the initial key point, $(x_m, y_m)$ denotes the position of the predicted key point, $I(x, y)$ denotes the response value of the corresponding pixel point in the convolution feature map output by the convolution module, and $k$ ranges over $0 \le k \le N-1$. The aggregated feature of an aggregation key point is $F_m \in \mathbb{R}^{H \times W \times C}$, and the enhanced feature obtained by concatenating the features of all aggregation key points is $F_a \in \mathbb{R}^{H \times W \times nC}$.
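A sketch of equation (3) follows: N reference points are sampled evenly along the segment from each initial key point to its predicted key point, the feature-map response is read at each, and a maximum pooling is taken over them; bilinear sampling at non-integer positions is an assumption:

```python
import torch
import torch.nn.functional as F

def aggregate_along_line(feature_map: torch.Tensor, initial: torch.Tensor,
                         predicted: torch.Tensor, n_ref: int = 5) -> torch.Tensor:
    """Max-pool features over n_ref points sampled evenly on the segment
    from each initial key point (x_i, y_i) to its predicted key point
    (x_m, y_m), following eq. (3).

    feature_map: (C, H, W); initial, predicted: (n, 2) in (x, y) pixel
    coordinates. Returns (n, C) aggregated features. Bilinear sampling
    at non-integer positions is an assumption.
    """
    c, h, w = feature_map.shape
    initial, predicted = initial.float(), predicted.float()
    t = torch.linspace(0.0, 1.0, n_ref).view(1, n_ref, 1)   # k/(N-1), k = 0..N-1
    pts = initial.unsqueeze(1) + t * (predicted - initial).unsqueeze(1)  # (n, N, 2)
    # Normalize pixel coordinates to [-1, 1] for grid_sample.
    grid = torch.empty_like(pts)
    grid[..., 0] = 2.0 * pts[..., 0] / (w - 1) - 1.0
    grid[..., 1] = 2.0 * pts[..., 1] / (h - 1) - 1.0
    sampled = F.grid_sample(feature_map[None], grid[None], align_corners=True)
    # sampled: (1, C, n, N) -> maximum over the N reference points.
    return sampled[0].amax(dim=-1).transpose(0, 1)          # (n, C)
```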
In a specific application scenario, please refer to fig. 3, which is a schematic diagram of an application scenario of the embodiment corresponding to step S204 in fig. 2. The training target in fig. 3 is an automobile. Taking one initial key point of the training target as an example: the initial key point is the unfilled dot and is taken as the pole, and the predicted key point is the gray dot; the pole and the predicted key point are connected to obtain the polar axis corresponding to the initial key point; the polar axis is divided equally into N reference points, shown by the dashed line in fig. 3; and a maximum pooling operation is performed over the N reference points in total, including the initial key point and the predicted key point, aggregating their features to obtain the aggregation key point, shown as the black dot. This yields a more accurate position for the training target, adapts to targets subject to interference factors such as motion blur, target pose change, illumination change, and/or occlusion, and improves the robustness and accuracy of key point feature extraction. N can be any user-set value greater than 3.
S205: Determine, in the aggregation key point set composed of the plurality of aggregation key points, the corner positions corresponding to the aggregation key points at preset corners.
Specifically, the corner position corresponding to the aggregation key point at each preset corner is extracted from the aggregation key point set; when an aggregation key point corresponds to a two-dimensional coordinate, the corresponding corner position is that two-dimensional coordinate.
S206: Determine the target prediction box corresponding to the training target based on the corner positions.
Specifically, the minimum bounding rectangle capable of enclosing all the aggregation key points is determined from the corner positions, obtaining the target prediction box corresponding to the training target.
In one application mode, the preset corners comprise the lower-left and upper-right corners, and the minimum rectangular box corresponding to the target prediction box is determined from the positions of the aggregation key points at the lower-left and upper-right corners.
In another application mode, the preset corners comprise the upper-left and lower-right corners, and the minimum bounding rectangle corresponding to the target prediction box is determined from the positions of the aggregation key points at the upper-left and lower-right corners. The process is expressed by the following formula:
$B_r = \big[\min_k(x_k + \Delta x_k),\ \min_k(y_k + \Delta y_k),\ \max_k(x_k + \Delta x_k),\ \max_k(y_k + \Delta y_k)\big]$ (4)
where $(x_k + \Delta x_k,\ y_k + \Delta y_k)$ denotes the positions of the predicted key points; after these positions are obtained, the extreme abscissas and ordinates are extracted from them to determine the minimum bounding rectangle corresponding to the target prediction box. This rectangle can enclose all the aggregation key points, yielding a more accurately labeled target prediction box.
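Equation (4) reduces to coordinate-wise minima and maxima over the key point positions, as in this sketch:

```python
import torch

def target_prediction_box(points: torch.Tensor) -> torch.Tensor:
    """Minimum bounding rectangle enclosing all key points, eq. (4).

    points: (n, 2) tensor of (x_k + dx_k, y_k + dy_k) positions.
    Returns [x_min, y_min, x_max, y_max].
    """
    x, y = points[:, 0], points[:, 1]
    return torch.stack([x.min(), y.min(), x.max(), y.max()])
```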
S207: Adjust the parameters of the image processing model based on the target ground-truth box and target prediction box corresponding to the training target.
Specifically, the parameters of the image processing model are adjusted based on the overlap ratio of the target prediction box with the target ground-truth box.
In one application mode, after the predicted key point corresponding to each initial key point is obtained, a rectangular box capable of enclosing all the predicted key points is determined from their positions to obtain an initial prediction box, and the parameters of the convolution module and the prediction module are adjusted based on the initial prediction box and the target ground-truth box. Once the adjustment exceeds a preset number of times, the aggregation key point corresponding to each initial key point is obtained from the initial key point and its corresponding predicted key point, the target prediction box corresponding to the training target is determined from them, and the parameters of the convolution module, the prediction module, and the aggregation module are adjusted based on the overlap ratio of the target prediction box with the target ground-truth box.
In another application mode, all modules in the image processing model are trained simultaneously: the aggregation key point corresponding to each initial key point is obtained from the initial key point and its corresponding predicted key point, the target prediction box corresponding to the training target is determined from them, and the parameters of the convolution module, the prediction module, and the aggregation module are adjusted based on the overlap ratio of the target prediction box with the target ground-truth box.
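As a sketch of the staged variant described above, the loop below first adjusts the model against the initial prediction box and, after a preset number of steps, against the aggregated target prediction box; the optimizer, learning rate, step counts, and the assumption that the model returns both boxes are all illustrative (iou_loss is the sketch given earlier):

```python
import torch

def train_staged(model, loader, stage_switch: int = 1000, max_steps: int = 10000):
    """Two-stage schedule: early steps fit the initial prediction box
    (convolution + prediction modules); later steps fit the aggregated
    target prediction box (all three modules). All hyperparameters and
    the model's output signature are illustrative assumptions."""
    opt = torch.optim.SGD(model.parameters(), lr=1e-3)
    for step, (image, gt_box) in enumerate(loader):
        if step >= max_steps:
            break
        init_box, target_box = model(image)    # assumed model outputs
        box = init_box if step < stage_switch else target_box
        loss = iou_loss(box, gt_box)           # sketch defined earlier
        opt.zero_grad()
        loss.backward()
        opt.step()
```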
S208: Obtain the trained image processing model in response to the preset convergence condition being met.
Specifically, when a preset convergence condition is met, a trained image processing model is obtained.
Further, the trained image processing model can label targets in an input image and extract the targets' visual features, and it can be used in different application scenarios such as target recognition, target detection, target tracking, and instance segmentation.
In this embodiment, the training target is represented by an initial key point set composed of a second preset number of initial key points, from which the target prediction box is finally obtained, so the data annotation is independent of the pixel level: only a ground-truth box needs to be labeled on the training target, which reduces the image processing model's dependence on data annotation quality. The aggregation module concatenates the aggregated features corresponding to the aggregation key points in the aggregation key point set to obtain the set's enhanced feature, allowing the proposed method to adapt to inaccurate target key point positions and improving the accuracy and robustness of key point feature extraction.
Referring to fig. 4, fig. 4 is a schematic diagram of the topology of an embodiment of an image processing model of the present application. The image processing model 40 includes a convolution module 400, a prediction module 402, and an aggregation module 404. A training image corresponding to a training target is input into the image processing model 40; the model represents the training target with a plurality of initial key points and predicts their positions to obtain a plurality of predicted key points; the aggregation key point corresponding to each initial key point is obtained from the initial key point and its corresponding predicted key point; the target prediction box corresponding to the training target is determined from the plurality of aggregation key points; the parameters of the image processing model 40 are adjusted based on the target ground-truth box and target prediction box corresponding to the training target; and the trained image processing model 40 is obtained in response to the preset convergence condition being met.
Specifically, the convolution module 400 extracts features of the training image to obtain a convolution feature map corresponding to the training image, and represents the training target on the convolution feature map by using a plurality of initial key points to obtain an initial key point set composed of a plurality of initial key points. The prediction module 402 predicts the positions of the initial key points to obtain a prediction key point set composed of the prediction key points corresponding to the initial key points.
Further, the aggregation module 404 aggregates the features of each initial key point and its corresponding predicted key point to obtain an aggregation key point set composed of the aggregation key points corresponding to the initial key points. The aggregation module 404 obtains the connecting line between each initial key point and its corresponding predicted key point from their positions; divides each connecting line into a first preset number of reference points and performs a maximum pooling operation on them to aggregate their features, obtaining the aggregation key point corresponding to each connecting line and the aggregated feature corresponding to that aggregation key point; and forms an aggregation key point set from the plurality of aggregation key points, concatenating their aggregated features to obtain the enhanced feature corresponding to the set.
It should be noted that the image processing model 40 in this embodiment can be trained by using the method described in any of the above embodiments.
Referring to fig. 5, fig. 5 is a schematic structural diagram of an embodiment of an electronic device 50 of the present application, where the electronic device 50 includes a memory 501 and a processor 502 coupled to each other, where the memory 501 stores program data (not shown), and the processor 502 calls the program data to implement the method in any of the above embodiments, and for a description of relevant contents, reference is made to the detailed description of the above method embodiments, which is not repeated here.
Referring to fig. 6, fig. 6 is a schematic structural diagram of an embodiment of a computer-readable storage medium 60 of the present application, the computer-readable storage medium 60 stores program data 600, and the program data 600 is executed by a processor to implement the method in any of the above embodiments, and the related contents are described in detail with reference to the above method embodiments, which are not repeated herein.
It should be noted that, units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit may be implemented in the form of hardware, or may also be implemented in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the essence of the technical solutions of the present application, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only an embodiment of the present application and is not intended to limit its scope. All equivalent structures or equivalent processes derived from the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the protection scope of the present application.

Claims (9)

1. A method for training an image processing model, the method comprising:
inputting a training image corresponding to a training target into an image processing model, wherein the image processing model represents the training target by using a plurality of initial key points, and predicting the position of each initial key point to obtain a plurality of predicted key points; obtaining an aggregation key point corresponding to each initial key point based on each initial key point and the predicted key point corresponding to the initial key point; and determining a target prediction box corresponding to the training target based on the aggregation key points;
adjusting parameters of the image processing model based on a target ground-truth box and the target prediction box corresponding to the training target;
obtaining the trained image processing model in response to the preset convergence condition being met;
wherein obtaining the aggregation key point corresponding to each initial key point based on each initial key point and the predicted key point corresponding to the initial key point comprises: obtaining a connecting line between each initial key point and the corresponding predicted key point based on the positions of the initial key point and the corresponding predicted key point; dividing each connecting line into a first preset number of reference points, and performing a maximum pooling operation on the first preset number of reference points to aggregate features of the first preset number of reference points, obtaining the aggregation key point corresponding to each connecting line and the aggregated feature corresponding to the aggregation key point; and forming an aggregation key point set from the aggregation key points, and concatenating the aggregated features corresponding to the aggregation key points to obtain an enhanced feature corresponding to the aggregation key point set.
2. The method of claim 1, wherein the image processing model comprises a convolution module and a prediction module, and the predicting the positions of the initial key points to obtain a plurality of predicted key points comprises:
inputting a training image corresponding to a training target into the convolution module so that the convolution module extracts features of the training image to obtain a convolution feature map corresponding to the training image;
representing the training target on the convolution feature map by using a plurality of initial key points to obtain an initial key point set composed of the initial key points;
and inputting the initial key point set into the prediction module so that the prediction module predicts the position of each initial key point to obtain a predicted key point set composed of the predicted key points corresponding to the initial key points.
3. The method according to claim 2, wherein the image processing model further comprises an aggregation module, and the aggregation module is configured to implement a step of obtaining an aggregation key point corresponding to each initial key point based on each initial key point and the predicted key point corresponding to the initial key point.
4. The method for training an image processing model according to claim 2, wherein the representing the training target on the convolution feature map by using a plurality of initial key points to obtain an initial key point set composed of a plurality of initial key points comprises:
selecting a second preset number of pixel points from all the pixel points of the convolution feature map as the initial key points to obtain an initial key point set composed of the second preset number of initial key points;
determining the initial position of each initial key point in the initial key point set.
5. The method of claim 4, wherein the inputting the initial key point set into the prediction module so that the prediction module predicts the position of each initial key point, obtaining a predicted key point set composed of the predicted key points corresponding to each initial key point, comprises:
inputting the initial key point set into the prediction module so that the prediction module predicts the position corresponding to each initial key point to obtain a position offset value;
superposing the position offset value on the initial position of the corresponding initial key point to obtain the predicted position corresponding to each initial key point;
and determining the predicted key points based on the predicted positions corresponding to the initial key points to obtain a predicted key point set composed of the second preset number of predicted key points.
6. The image processing model training method of claim 5,
the initial position comprises a two-dimensional coordinate corresponding to the initial key point, the number of channels corresponding to the prediction module is twice the second preset number, and the position offset value comprises a two-dimensional coordinate offset between each predicted key point and the corresponding initial key point.
7. The method for training an image processing model according to claim 1, wherein the determining a target prediction box corresponding to the training target based on the plurality of aggregation key points comprises:
determining corner positions corresponding to the aggregation key points of preset corners in an aggregation key point set consisting of a plurality of aggregation key points;
and determining a target prediction box corresponding to the training target based on the corner position.
8. An electronic device, comprising: a memory and a processor coupled to each other, wherein the memory stores program data that the processor calls to perform the method of any of claims 1-7.
9. A computer-readable storage medium, on which program data are stored, which program data, when being executed by a processor, carry out the method according to any one of claims 1-7.
CN202211311142.0A 2022-10-25 2022-10-25 Image processing model training method, electronic device, and computer-readable storage medium Active CN115375976B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211311142.0A CN115375976B (en) 2022-10-25 2022-10-25 Image processing model training method, electronic device, and computer-readable storage medium


Publications (2)

Publication Number Publication Date
CN115375976A CN115375976A (en) 2022-11-22
CN115375976B (en) 2023-02-10

Family

ID=84073258

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211311142.0A Active CN115375976B (en) 2022-10-25 2022-10-25 Image processing model training method, electronic device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN115375976B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112329888A (en) * 2020-11-26 2021-02-05 Oppo广东移动通信有限公司 Image processing method, image processing apparatus, electronic device, and storage medium
CN115063656A (en) * 2022-05-31 2022-09-16 北京开拓鸿业高科技有限公司 Image detection method and device, computer readable storage medium and electronic equipment

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11107238B2 (en) * 2019-12-13 2021-08-31 Zebra Technologies Corporation Method, system and apparatus for detecting item facings
CN114092963B (en) * 2021-10-14 2023-09-22 北京百度网讯科技有限公司 Method, device, equipment and storage medium for key point detection and model training
CN114022900A (en) * 2021-10-29 2022-02-08 北京百度网讯科技有限公司 Training method, detection method, device, equipment and medium for detection model
CN114549557A (en) * 2022-02-28 2022-05-27 重庆紫光华山智安科技有限公司 Portrait segmentation network training method, device, equipment and medium


Also Published As

Publication number Publication date
CN115375976A (en) 2022-11-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant