CN113033549A - Training method and device for positioning diagram acquisition model - Google Patents

Training method and device for positioning diagram acquisition model

Info

Publication number
CN113033549A
CN113033549A
Authority
CN
China
Prior art keywords
positioning
category
loss function
positioning diagram
sample image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110258523.6A
Other languages
Chinese (zh)
Other versions
CN113033549B (en)
Inventor
尚方信
杨叶辉
王磊
许言午
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110258523.6A priority Critical patent/CN113033549B/en
Publication of CN113033549A publication Critical patent/CN113033549A/en
Priority to PCT/CN2021/106885 priority patent/WO2022188327A1/en
Application granted granted Critical
Publication of CN113033549B publication Critical patent/CN113033549B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Abstract

The present disclosure discloses a training method and device for a positioning diagram acquisition model, and relates to the technical field of image processing, in particular to artificial intelligence fields such as computer vision and deep learning. The scheme is as follows: inputting a sample image into the positioning diagram acquisition model for category identification, and acquiring the positioning diagram of each identified category; obtaining the label information of the sample image, and obtaining the loss function corresponding to each category in combination with the pixel values of the positioning diagram of each category; and reversely adjusting the positioning diagram acquisition model based on the loss function corresponding to each category, and continuing to train the adjusted positioning diagram acquisition model with the next sample image until training ends and the target positioning diagram acquisition model is generated. In this method, the loss function of each category is determined through the positioning diagram of the category and the label information of the sample image, and the loss function of the model is then generated to reversely adjust the parameters of the model, guiding the model to screen regions with higher attention and obtain an optimized positioning diagram.

Description

Training method and device for positioning diagram acquisition model
Technical Field
The present disclosure relates to the field of image processing technology, and in particular, to the field of artificial intelligence such as computer vision and deep learning.
Background
The image recognition is an important field of artificial intelligence, in the development of the image recognition, the positioning diagram recognition is an important technology, the research on the aspect of the positioning diagram recognition is greatly advanced, and the positioning diagram recognition lays a foundation for further image recognition, analysis and understanding.
Disclosure of Invention
The present disclosure provides a training method for a positioning diagram acquisition model. The loss function of the model is ultimately determined from the positioning diagram of each category and the label information of the sample image, and is used to reversely adjust the parameters of the model, guiding the positioning diagram acquisition model to screen regions with higher attention and thereby optimizing the positioning diagram.
According to another aspect of the present disclosure, a training apparatus for a positioning diagram acquisition model is provided.
According to another aspect of the present disclosure, an electronic device is provided.
According to another aspect of the present disclosure, a non-transitory computer readable storage medium is provided.
According to another aspect of the present disclosure, a computer program product is provided.
To achieve the above object, an embodiment of a first aspect of the present disclosure provides a method for training a positioning diagram obtaining model, where the method includes:
inputting the sample image into a positioning diagram acquisition model for category identification, and acquiring a positioning diagram of each identified category;
acquiring label information of the sample image, and acquiring a loss function corresponding to each category according to the label information of the sample image and the pixel values of the positioning diagram of each category;
and reversely adjusting the positioning diagram acquisition model based on the loss function corresponding to each category, and returning to train the adjusted positioning diagram acquisition model by using the next sample image until the training is finished to generate the target positioning diagram acquisition model.
To achieve the above object, an embodiment of a second aspect of the present disclosure provides a training apparatus for a positioning diagram acquisition model, the apparatus including:
the first acquisition module is used for inputting the sample image into the positioning diagram acquisition model for category identification to acquire the positioning diagram of each identified category;
the second acquisition module is used for acquiring label information of the sample image and acquiring a loss function corresponding to each category according to the label information of the sample image and the pixel values of the positioning diagram of each category;
and the adjusting module is used for reversely adjusting the positioning diagram obtaining model based on the loss function corresponding to each category, and returning to continue training the adjusted positioning diagram obtaining model by using the next sample image until the training is finished to generate the target positioning diagram obtaining model.
To achieve the above object, an embodiment of a third aspect of the present disclosure provides an electronic device, which includes at least one processor, and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a method of training a mapping acquisition model as defined in the first aspect of the disclosure.
To achieve the above object, a fourth aspect of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions, where the computer instructions are used to cause a computer to execute the training method of the location map acquisition model according to the first aspect of the present disclosure.
To achieve the above object, a fifth aspect of the present disclosure provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the computer program implements the method for training a location graph acquisition model according to the first aspect of the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow diagram of a method of training a localization graph acquisition model according to one embodiment of the present disclosure;
FIG. 2 is a flow diagram of a method of training a localization graph acquisition model according to another embodiment of the present disclosure;
FIG. 3 is a flow diagram of a method of training a localization graph acquisition model according to another embodiment of the present disclosure;
FIG. 4 is a flow diagram of a method of training a localization graph acquisition model according to another embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a training method of a location graph acquisition model according to one embodiment of the present disclosure;
FIG. 6 is a block diagram of a training apparatus for a mapping acquisition model according to one embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an electronic device in which an embodiment of the present disclosure may be implemented.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Image Processing refers to techniques that analyze an image with a computer to achieve a desired result, and generally refers to digital image processing. A digital image is a large two-dimensional array of elements called pixels, whose values are called gray-scale values, captured by devices such as industrial cameras, video cameras and scanners. Image processing technology generally includes three parts: image compression; enhancement and restoration; and matching, description and recognition.
Deep Learning (DL) is a new research direction in the field of Machine Learning (ML); it was introduced into machine learning to bring it closer to its original goal, artificial intelligence. Deep learning learns the intrinsic laws and representation hierarchy of sample data, and the information obtained in the learning process is very helpful for the interpretation of data such as text, images and sounds. Its final aim is to enable machines to analyze and learn like humans, and to recognize data such as text, images and sounds. Deep learning is a complex machine learning algorithm whose results in speech and image recognition far exceed the prior related art.
Computer Vision is a science that studies how to make machines "see"; it uses cameras and computers instead of human eyes to identify, track and measure targets, and performs further image processing so that the processed image becomes more suitable for human eyes to observe or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and techniques, attempting to build artificial intelligence systems that can acquire "information" from images or multidimensional data. The information referred to here is information, as defined by Shannon, that can be used to help make a "decision". Because perception can be viewed as extracting information from sensory signals, computer vision can also be viewed as the science of how to make an artificial system "perceive" from images or multidimensional data.
Artificial Intelligence (AI) is the discipline of making computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking and planning), and involves both hardware-level and software-level technologies. Artificial intelligence technologies generally include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology, and the like.
Fig. 1 is a flowchart of a training method of a positioning diagram obtaining model according to an embodiment of the present disclosure, and as shown in fig. 1, the training method of the positioning diagram obtaining model includes the following steps:
s101, inputting the sample image into a positioning diagram acquisition model for category identification, and acquiring a positioning diagram of each identified category.
The sample image is preprocessed and then input into the positioning diagram acquisition model. Preprocessing the sample image can eliminate irrelevant noise in the image and simplify the data while enhancing the detectability of relevant information. Optionally, the preprocessing includes digitization, smoothing, restoration or enhancement, etc.
In the embodiment of the present disclosure, the positioning diagram obtaining model may include a classification network, and the classification of the input sample image is performed based on the classification network. Optionally, the classification network includes a feature extractor and a classifier, where:
the feature extractor comprises a convolution layer, a pooling layer and a normalization layer, and can be used for performing feature extraction on the sample image to obtain a feature vector corresponding to the sample image. Inputting the preprocessed sample image into a classification network, performing convolution operation on the sample image by a convolution layer in a feature extractor, and extracting a feature map of the sample image; then, performing pooling operation by a pooling layer, and reducing feature dimensionality while keeping the main features of the sample image so as to reduce the calculated amount; after the convolution operation and the pooling operation are performed on the sample image, the data distribution is likely to be changed, and in order to solve the problem that the middle layer data distribution is greatly changed in the training process, the characteristic diagram of the sample image needs to be normalized, and finally the characteristic vector of the sample image is extracted.
The classifier comprises a fully connected layer used to integrate the feature vectors: the fully connected layer performs a full connection operation on the feature vector output by the feature extractor, and the positioning diagram of the category corresponding to the sample image is then determined. Different sample images may correspond to different categories. In implementation, the positioning diagram of each category is determined based on the category recognition result of the classifier on the sample image; in the embodiment of the present disclosure, the positioning diagram may be understood as the Class Activation Map (CAM) of the category. The class activation map reflects the importance of each position in the sample image to the category, so whether a position belongs to the category can be determined based on its importance to the category, and the positioning target of the category can further be determined from the positioning diagram.
Alternatively, the classification network may be custom-built, or a network model such as a convolutional neural network, ResNet (residual network) or DenseNet (dense convolutional network) may be used.
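The class activation map described above is a weighted sum of the channel feature maps using the fully connected layer's weights for one category. A minimal NumPy sketch, with illustrative names and shapes (the patent does not fix an API):

```python
import numpy as np

def class_activation_map(feature_maps, fc_weights, class_idx):
    """Compute M_c(x, y): the weighted sum over the z channels of the
    feature maps, using the fully connected weights of class c.

    feature_maps: shape (z, H, W) -- one H x W map f_k per channel k.
    fc_weights:   shape (num_classes, z) -- weights w_k^c.
    """
    w_c = fc_weights[class_idx]                              # (z,)
    # Weighted sum over channel axis 0 at every position (x, y).
    return np.tensordot(w_c, feature_maps, axes=([0], [0]))  # (H, W)

# Toy example: z = 2 channels on a 2x2 grid, a single class.
f = np.array([[[1.0, 0.0], [0.0, 1.0]],
              [[0.0, 2.0], [2.0, 0.0]]])
w = np.array([[1.0, 0.5]])
cam = class_activation_map(f, w, class_idx=0)
```

Each position of `cam` scores how important that location is to the chosen category, which is exactly the role the positioning diagram plays in the text.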
S102, obtaining label information of the sample image, and obtaining a loss function corresponding to each category according to the label information of the sample image and the pixel value of the positioning image of each category.
Label information of the sample image is labeled in advance. In a multi-classification network, the label information of the sample image includes a label for each category, where the value of one label is 1 and the values of the labels of the remaining categories are 0; that is, the label information of the sample image may be represented as $y_n = \{0, 0, 1, \dots, 0\}$.
In the embodiment of the present disclosure, the obtained positioning map is actually a matrix, where elements in the matrix are position points on the positioning map, and values of the elements are pixel values of the positioning map. The pixel value of each location point on the location map may reflect the importance of the location point to the category. And the label information of the sample image can directly reflect whether the sample image belongs to a certain category or not. Therefore, in the localization image obtaining model, the loss function corresponding to each category may be obtained based on the label information of the sample image and the pixel value of the localization image of each category. For example, the sample image includes a category a, a category B, and a category C. Based on the label information of the sample image and the pixel value of the positioning map of the category a, the loss function corresponding to the category a can be obtained. Based on the label information of the sample image and the pixel value of the positioning map of the category B, the loss function corresponding to the category B can be acquired. Further, based on the label information of the sample image and the pixel value of the positioning map of the category C, the loss function corresponding to the category C may be obtained. That is, for each class identified in the sample image, the loss function corresponding to the class needs to be acquired.
According to the embodiment of the disclosure, the loss function of each category is constructed to train the positioning diagram obtaining model so as to reduce errors, and finally, the target positioning diagram obtaining model is generated so as to obtain the optimized positioning diagram.
And S103, reversely adjusting the positioning diagram acquisition model based on the loss function corresponding to each category, and returning to continue training the adjusted positioning diagram acquisition model by using the next sample image until the training is finished to generate the target positioning diagram acquisition model.
After the loss function of each category is obtained, since the positioning diagram acquisition model needs to identify every category, the loss functions of all categories must be considered comprehensively. Therefore, the loss functions of the categories may be summed, or weighted by category, to obtain the overall loss function of the positioning diagram acquisition model; the gradient information of the positioning diagram acquisition model is determined according to the overall loss function of the model, the gradient information is back-propagated to each layer of the positioning diagram acquisition model, and the parameters of each layer, such as the weights, are adjusted.
And after the adjustment is finished and before the condition of finishing the model training is not met, the next sample image is used for continuously training the adjusted positioning image acquisition model until the training is finished to generate the target positioning image acquisition model. Alternatively, the training end condition may be that a preset number of times of training is reached or that the error after training is smaller than a preset threshold.
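The training loop with its two end conditions (a preset number of iterations, or the error falling below a threshold) can be sketched as follows; `model_step` is a hypothetical callable that runs one forward pass, loss computation and backward adjustment and returns the current error:

```python
def train(model_step, samples, max_iters=1000, err_threshold=1e-3):
    """Run training until the preset iteration count is reached or the
    training error falls below the threshold -- the two end conditions
    named in the text."""
    err = float("inf")
    for i, sample in enumerate(samples):
        if i >= max_iters or err < err_threshold:
            break                  # training-end condition satisfied
        err = model_step(sample)   # adjust model, get current error
    return err

# Simulated run whose error drops below the threshold on the 3rd sample.
errors = iter([0.5, 0.1, 1e-4])
final_err = train(lambda s: next(errors), range(10))
```

The loop stops before consuming the remaining samples once either condition holds, matching the "until the training is finished" behavior described above.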
On the basis of the above example, after the target positioning diagram acquisition model is obtained, category recognition may be performed on any image to obtain its positioning diagram, and thus the target in the image. In the embodiment of the present disclosure, even though the positioning diagram acquisition model has no accurate reference, not only does the model output a more accurate and comprehensive positioning diagram, but the classification result of the model is also better; that is, the positioning diagram acquisition model is a "weakly supervised" model whose labeling precision is lower than its output precision.
The training method of the positioning diagram acquisition model provided by the embodiment of the present disclosure first inputs a sample image into the positioning diagram acquisition model for category identification and acquires the positioning diagram of each identified category; then, for each category, the loss function corresponding to the category is obtained according to the pixel values of the positioning diagram of the category and the label information of the sample image; finally, the positioning diagram acquisition model is reversely adjusted based on the loss function corresponding to each category, and training of the adjusted model continues with the next sample image until training ends and the target positioning diagram acquisition model is generated. The embodiment finally determines the loss function of the model through the positioning diagram of each category and the label information of the sample image, and reversely adjusts the parameters of the model accordingly, guiding the positioning diagram acquisition model to screen regions with higher attention so that the model does not focus only on the most discriminative region of the target, thereby optimizing the positioning diagram. Moreover, since the loss function is constructed based on the positioning diagram of each category, image information unrelated to the category is suppressed.
On the basis of the foregoing embodiment, the process of obtaining the loss function corresponding to the category, as shown in fig. 2, may include the following steps:
s201, aiming at each category, obtaining a pixel mean value of the positioning image according to the pixel value of the positioning image of the category.
In the embodiment of the disclosure, the length of the input sample image is H pixel points, the width is W pixel points, and the sample image has z features, where each feature may correspond to one channel. The positioning diagram acquisition model may output a positioning diagram for a category, which may be represented as:

$$M_c(x, y) = \sum_{k=1}^{z} w_k^{c} \, f_k(x, y)$$

where $M_c(x, y)$ represents the pixel value of the positioning diagram M of the c-th category at the position point (x, y); $w_k^{c}$ represents the weight of the fully connected layer for the k-th channel of the c-th category, with k ≤ z; and $f_k(x, y)$ represents the value of the feature map f corresponding to the sample image on the k-th channel at the position point (x, y).
After the pixel values of each position point of the positioning diagram are acquired, the pixel values at the position points may be averaged to acquire the pixel mean value of the positioning diagram. Optionally, in order to improve the model and reduce data distribution variation, the embodiment of the present disclosure may constrain the pixel value of each position point in the positioning diagram to the same target value range. In the disclosed embodiment, the value range of the pixel value of each position point in the positioning diagram is constrained from (−∞, +∞) to [0, +∞); for example, the square root of the square of the pixel value may be taken, or its absolute value, etc. Furthermore, based on a hyper-parameter preset for the positioning diagram acquisition model, the pixel value is restricted to the target value range.
Alternatively, the pixel values of the position points on the positioning diagram may be constrained based on the following formula:

$$\mathrm{CCAM}_{n,c}(x, y) = \min\left( \left| M_c(x, y) \right|, \ \eta \right)$$

where the absolute value $\left| M_c(x, y) \right|$ constrains the range of the pixel value at each position point (x, y) in the positioning diagram from (−∞, +∞) to [0, +∞); η is a hyper-parameter preset by the positioning diagram acquisition model; and min(·) is the minimum-value operation, which selects the smaller of $\left| M_c(x, y) \right|$ and the hyper-parameter η as the constrained pixel value at the position point (x, y). That is, the hyper-parameter η is the upper limit of the target value range, and the target range of the constrained pixel value is [0, η].
The pixel mean value $A_{n,c}$ of the positioning diagram is acquired based on the constrained pixel value of each position point and the resolution of the positioning diagram:

$$A_{n,c} = \frac{1}{u \times v} \sum_{x=1}^{u} \sum_{y=1}^{v} \mathrm{CCAM}_{n,c}(x, y)$$

where (u × v) is the resolution of the positioning diagram and $\mathrm{CCAM}_{n,c}$ is the constrained pixel value of the c-th category of the n-th positioning diagram.
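A minimal sketch of the constraint-and-average step, assuming the absolute value is used as the constraint operation and NumPy as the implementation language (names are illustrative):

```python
import numpy as np

def pixel_mean(cam, eta):
    """Constrain each pixel of a localization map to [0, eta], then
    average over the u x v map.

    The absolute value maps pixel values from (-inf, +inf) into
    [0, +inf), and the min with eta caps them at the target range's
    upper limit."""
    ccam = np.minimum(np.abs(cam), eta)
    return ccam.mean()            # A_{n,c}: sum of pixels / (u * v)

m = np.array([[-3.0, 0.5],
              [ 4.0, 1.5]])
a = pixel_mean(m, eta=2.0)        # (2.0 + 0.5 + 2.0 + 1.5) / 4 = 1.5
```

Because every constrained pixel lies in [0, η], the mean $A_{n,c}$ is itself bounded by [0, η], which keeps the per-class loss well behaved.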
S202, acquiring label values of the sample images on the categories based on the label information of the sample images.
The embodiment of the present disclosure illustrates how to obtain a label value based on the label information of a sample image. If all the sample images include the following categories in sequence: {rabbit, puppy, kitten, ..., bird}, then a sample image n labeled "kitten" has the label $y_n = \{0, 0, 1, \dots, 0\}$, i.e., the kitten category label value is 1 and the remaining category label values are set to 0. $y_{n,c}$ is the value of the label $y_n$ of the n-th sample image in the c-th category.
And S203, determining the loss function of the category according to the pixel mean value and the label value.
For the c-th category, a loss function of category c is constructed according to the pixel mean value $A_{n,c}$ of the positioning diagram of category c of the n-th sample image and the label value $y_{n,c}$ of the n-th sample image in category c:

$$L_c = \frac{1}{N} \sum_{n=1}^{N} \left( 1 - 2\, y_{n,c} \right) A_{n,c}$$

where N represents the total number of sample images in the data set.
The constructed loss function works as follows: when $y_{n,c} = 1$, the n-th sample image includes an image of category c, i.e., the positioning diagram of the n-th sample image is important with respect to category c; in this case the loss function of category c adjusts the pixel mean value $A_{n,c}$ of the positioning diagram in the increasing direction, i.e., increases the pixel values of the positioning diagram. When $y_{n,c} = 0$, the n-th sample image does not include an image of category c, i.e., the positioning diagram of the n-th sample image is not important with respect to category c; in this case the loss function of category c adjusts the pixel mean value $A_{n,c}$ in the decreasing direction, i.e., decreases the pixel values of the positioning diagram, so as to guide the positioning diagram acquisition model to select regions with high attention as far as possible and reduce the value of the loss function.
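The adjustment behavior just described can be sketched as a per-class loss. The specific functional form below, the mean of $(1 - 2y) \cdot A$, is an assumption chosen only because minimizing it raises $A_{n,c}$ for positive labels and lowers it for negative ones; the text describes the behavior, not this exact formula:

```python
import numpy as np

def class_loss(pixel_means, labels):
    """Loss for one category c over N samples.

    L_c = mean((1 - 2*y) * A): minimizing it raises A_{n,c} when
    y_{n,c} = 1 and lowers A_{n,c} when y_{n,c} = 0. (This functional
    form is an assumption; only the direction of the adjustment comes
    from the text.)"""
    a = np.asarray(pixel_means, dtype=float)  # pixel means A_{n,c}
    y = np.asarray(labels, dtype=float)       # label values y_{n,c}
    return float(np.mean((1.0 - 2.0 * y) * a))

# Positive sample rewarded for a high mean, negative penalized for one.
loss = class_loss([0.8, 0.1], [1, 0])         # (-0.8 + 0.1) / 2 = -0.35
```

Because $A_{n,c}$ is bounded by [0, η], this loss is bounded as well, so the gradient cannot push the pixel means without limit.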
On the basis of the above embodiment, the process of performing reverse adjustment on the location map acquisition model based on the class-based loss function, as shown in fig. 3, may include the following steps:
s301, summing the loss functions corresponding to all the categories to obtain a first loss function of the positioning diagram obtaining model.
Based on the loss function of each category obtained in step S203, the loss functions corresponding to all the categories are summed to serve as the first loss function of the positioning diagram acquisition model:

$$L_1 = \alpha \sum_{c=1}^{m} L_c$$

where m represents the number of categories in the data set, α is a preset parameter, and $L_c$ is the loss function of category c.
S302, a second loss function of the positioning diagram obtaining model is obtained.
The second loss function applies to the positioning diagrams of all categories and is the loss function commonly used by classification networks; optionally, a cross-entropy loss function may be used as the second loss function. During training, the second loss function may be obtained based on the training error.
S303, determining a total loss function of the positioning diagram acquisition model based on the first loss function and the second loss function.
The first loss function and the second loss function are summed to obtain the total loss function of the positioning diagram acquisition model:

$$L_{total} = L_1 + L_2$$

where $L_2$ represents the second loss function.
S304, determining gradient information of the positioning diagram acquisition model based on the total loss function, and reversely adjusting the positioning diagram acquisition model based on the gradient information.
And training the positioning diagram acquisition model by using a total loss function, determining gradient information of the positioning diagram acquisition model, reversely transmitting the gradient information to each layer of the positioning diagram acquisition model, and adjusting parameters, such as weight, of each layer of the positioning diagram acquisition model.
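Steps S301 to S304 can be sketched as follows, with plain gradient descent standing in for the unspecified optimizer (all names and the SGD choice are illustrative):

```python
import numpy as np

def total_loss(class_losses, ce_loss, alpha):
    """L_total = L1 + L2, with L1 = alpha * sum_c L_c (S301) and L2 the
    classification loss, e.g. cross-entropy (S302, S303)."""
    return alpha * float(np.sum(class_losses)) + ce_loss

def sgd_step(params, grads, lr=0.01):
    """S304 sketch: back-propagate gradients of the total loss and move
    every layer's weights against them (plain SGD as a stand-in for
    whatever optimizer the model actually uses)."""
    return [p - lr * g for p, g in zip(params, grads)]

# Three per-class losses, a cross-entropy term, and one weight update.
lt = total_loss([0.2, -0.1, 0.3], ce_loss=1.5, alpha=0.5)  # 0.5*0.4 + 1.5
new_w = sgd_step([1.0], [10.0])
```

In a real framework the gradient information would be produced by automatic differentiation of `lt` and propagated layer by layer; the arithmetic of the update is the same.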
Fig. 4 is another training method for a positioning diagram acquisition model provided in the embodiment of the present disclosure. The training method of the positioning diagram acquisition model comprises the following steps:
s401, inputting the sample image into the positioning diagram obtaining model for category identification, and obtaining the positioning diagram of each identified category.
S402, aiming at each category, obtaining a pixel mean value of the positioning image according to the pixel value of the positioning image of the category.
And S403, acquiring the label value of the sample image on the category based on the label information of the sample image.
S404, determining a loss function of the category according to the pixel mean value and the label value.
S405, summing the loss functions corresponding to all the categories to obtain a first loss function of the positioning diagram obtaining model.
S406, acquiring a second loss function of the positioning diagram acquisition model.
S407, determining a total loss function of the positioning diagram acquisition model based on the first loss function and the second loss function.
And S408, determining gradient information of the positioning diagram acquisition model based on the total loss function, and reversely adjusting the positioning diagram acquisition model based on the gradient information.
And S409, returning to continue training the adjusted positioning diagram acquisition model with the next sample image until the training is finished to generate the target positioning diagram acquisition model.
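The steps S401–S409 above can be sketched as a toy training loop. All names, the squared-error form of the per-category loss, and the analytic gradient are illustrative assumptions of this sketch; the disclosure does not fix the loss form, and a real model would back-propagate through a full network rather than a single weight vector:

```python
import numpy as np

rng = np.random.default_rng(0)

def positioning_map(weights, features):
    # S401: the positioning map of a category as a weighted sum of feature maps.
    # features: (K, H, W) feature maps; weights: (K,) classification weights.
    return np.tensordot(weights, features, axes=1)    # -> (H, W)

def first_loss(weights, features, label):
    # S402-S404: compare the pixel mean of the positioning map with the label value.
    return (positioning_map(weights, features).mean() - label) ** 2

def train(samples, epochs=200, lr=0.1):
    # S405-S409: compute the loss, determine gradient information, adjust the
    # weights in reverse, and move on to the next sample image.
    k = samples[0][0].shape[0]
    w = np.zeros(k)
    for _ in range(epochs):
        for features, label in samples:               # S409: next sample image
            err = positioning_map(w, features).mean() - label
            # analytic gradient of (mean(w . F) - y)^2 with respect to w
            grad = 2.0 * err * features.mean(axis=(1, 2))
            w -= lr * grad                            # S408: reverse adjustment
    return w

# toy samples: each category's evidence lives in a different feature channel
f0 = rng.normal(size=(4, 8, 8)); f0[0] += 2.0         # label 1.0: category present
f1 = rng.normal(size=(4, 8, 8)); f1[1] += 2.0         # label 0.0: category absent
w = train([(f0, 1.0), (f1, 0.0)])
```

After training, the pixel mean of the positioning map for the positive sample approaches the label value 1.0, which is the behavior the first loss function is designed to enforce.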
Fig. 5 is a schematic diagram of the training method of a positioning diagram acquisition model provided in an embodiment of the present disclosure, illustrated with an image of a "puppy" as the input. As shown in fig. 5, a feature extractor produces the feature maps, and a feature vector is fed into a classifier for category identification; a positioning diagram is then acquired for each identified category. Loss functions corresponding to the categories are obtained based on the label information and the pixel values of the positioning diagrams, the positioning diagram acquisition model is adjusted in reverse based on the loss functions of all the categories, and training continues with the next sample image until the training is finished and the target positioning diagram acquisition model is generated.
On the basis of the above examples, after the target positioning diagram acquisition model is obtained, category identification may be performed on any image to obtain the positioning diagram of that image, from which the target in the image can be located. In the embodiments of the present disclosure, even though the positioning diagram acquisition model has no accurate positioning reference during training, the model not only outputs more accurate and complete positioning diagrams but also achieves better classification results; that is, the positioning diagram acquisition model is a "weakly supervised" model whose annotation precision is lower than its output precision.
Fig. 6 is a block diagram of a training apparatus for a positioning diagram acquisition model according to an embodiment of the present disclosure. As shown in fig. 6, the training apparatus 600 for a positioning diagram acquisition model includes:
the first obtaining module 61 is configured to input the sample image into the positioning diagram obtaining model to perform category identification, and obtain a positioning diagram of each identified category;
a second obtaining module 62, configured to obtain label information of the sample image, and obtain a loss function corresponding to each category according to the label information of the sample image and the pixel value of each category-based positioning map;
and the adjusting module 63 is configured to perform reverse adjustment on the positioning diagram obtaining model based on the loss function corresponding to each category, and return to continue training the adjusted positioning diagram obtaining model by using the next sample image until the training is finished to generate the target positioning diagram obtaining model.
It should be noted that the explanation of the embodiments of the training method for a positioning diagram acquisition model also applies to the training apparatus for a positioning diagram acquisition model of this embodiment, and is not repeated here.
The training apparatus for a positioning diagram acquisition model provided by the embodiment of the present disclosure first inputs a sample image into the positioning diagram acquisition model for category identification and acquires a positioning diagram for each identified category; then, for each category, it obtains the loss function corresponding to the category from the pixel values of the positioning diagram of the category and the label information of the sample image; finally, it adjusts the positioning diagram acquisition model in reverse based on the loss function corresponding to each category and continues training the adjusted model with the next sample image until training is finished and the target positioning diagram acquisition model is generated. In the embodiments of the present disclosure, the loss function built from the category positioning diagrams and the label information of the sample image is driven toward smaller values, and the reverse adjustment guides the model to select regions of higher attention, thereby optimizing the positioning diagram.
Further, in a possible implementation manner of the embodiment of the present disclosure, the second obtaining module 62 is further configured to: for each category, acquire a pixel mean value of the positioning diagram according to the pixel values of the positioning diagram of the category; acquire the label value of the sample image for the category based on the label information of the sample image; and determine the loss function of the category according to the pixel mean value and the label value.
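The per-category computation described above can be sketched as follows: the loss of one category compares the pixel mean of that category's positioning diagram with the image-level label value, and the first loss function sums over all categories. The squared-error form is purely an illustrative assumption; the disclosure does not prescribe the exact functional form:

```python
import numpy as np

def category_loss(pos_map, label_value):
    # Compare the pixel mean of this category's positioning map with the
    # image-level label value (1.0: category present, 0.0: absent).
    # Squared error is an assumed, illustrative choice of loss form.
    return (pos_map.mean() - label_value) ** 2

def first_loss(pos_maps, label_values):
    # First loss function: the sum of the per-category losses.
    return sum(category_loss(m, y) for m, y in zip(pos_maps, label_values))

maps = [np.full((4, 4), 0.5), np.zeros((4, 4))]
l1 = first_loss(maps, [0.5, 1.0])                     # 0 + (0 - 1)^2 = 1
```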
Further, in a possible implementation manner of the embodiment of the present disclosure, the adjusting module 63 is further configured to: sum the loss functions corresponding to all the categories to obtain a first loss function of the positioning diagram acquisition model; acquire a second loss function of the positioning diagram acquisition model; determine a total loss function of the positioning diagram acquisition model based on the first loss function and the second loss function; and determine gradient information of the positioning diagram acquisition model based on the total loss function, and adjust the positioning diagram acquisition model in reverse based on the gradient information.
Further, in a possible implementation manner of the embodiment of the present disclosure, the second obtaining module 62 is further configured to: constrain the pixel value of each position point in the positioning diagram to be within a target value range; and acquire the pixel mean value of the positioning diagram based on the constrained pixel value corresponding to each position point within the target value range and the resolution of the positioning diagram.
Further, in a possible implementation manner of the embodiment of the present disclosure, the second obtaining module 62 is further configured to: for the pixel value of any position point on the positioning diagram, compare the pixel value with a specified hyper-parameter in the positioning diagram acquisition model and select the minimum of the two as the constrained pixel value for that position point, wherein the hyper-parameter determines the upper limit of the target value range.
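The constrained pixel mean described above might look as follows in NumPy; the name `tau` for the specified hyper-parameter that fixes the upper limit of the target value range is an assumption of this sketch:

```python
import numpy as np

def constrained_pixel_mean(pos_map, tau):
    # Clip every pixel to at most tau (min(p, tau) per position point), then
    # average over the positioning map's resolution (its total pixel count).
    clipped = np.minimum(pos_map, tau)
    return clipped.sum() / pos_map.size

pos_map = np.array([[0.2, 5.0],
                    [1.0, 3.0]])
mean_val = constrained_pixel_mean(pos_map, 1.0)       # pixels above 1.0 are clipped
```

Clipping before averaging keeps a few very bright pixels from dominating the mean, so the loss rewards broad activation over the object rather than a single peak.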
Further, in a possible implementation manner of the embodiment of the present disclosure, the first obtaining module 61 is further configured to: for each category, acquire the positioning diagram of the category based on the classification weight vector corresponding to the category in the model and the feature vector of the sample image.
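One possible reading of this step is a CAM-style construction, in which the positioning diagram of a category is the stack of feature maps weighted by that category's classification weight vector; treating the features as per-channel spatial maps is an assumption of this sketch:

```python
import numpy as np

def category_positioning_map(class_weights, feature_maps):
    # Positioning map of one category: the feature maps weighted by that
    # category's classification weight vector and summed over channels.
    # class_weights: (K,); feature_maps: (K, H, W) -> result: (H, W)
    return np.tensordot(class_weights, feature_maps, axes=1)

fmaps = np.stack([np.ones((2, 2)), 2.0 * np.ones((2, 2))])
pm = category_positioning_map(np.array([1.0, 0.5]), fmaps)   # 1*1 + 0.5*2 everywhere
```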
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
Fig. 7 is a schematic structural diagram of an electronic device provided in an embodiment of the present disclosure. As shown in fig. 7, the electronic device 700 includes a memory 71, a processor 72, and a computer program stored on the memory 71 and executable on the processor 72; when executing the computer program, the processor 72 implements the aforementioned training method for a positioning diagram acquisition model.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, a host product in a cloud computing service system that remedies the defects of high management difficulty and weak service scalability in traditional physical hosts and VPS (Virtual Private Server) services. The server may also be a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.

Claims (15)

1. A training method of a positioning diagram acquisition model comprises the following steps:
inputting the sample image into a positioning diagram acquisition model for category identification, and acquiring a positioning diagram of each identified category;
obtaining label information of the sample image, and obtaining a loss function corresponding to each category according to the label information of the sample image and the pixel value of the positioning image of each category;
and reversely adjusting the positioning diagram acquisition model based on the loss function corresponding to each category, and returning to use the next sample image to continuously train the adjusted positioning diagram acquisition model until the training is finished to generate the target positioning diagram acquisition model.
2. The method of claim 1, wherein the obtaining the loss function corresponding to each of the categories according to the label information of the sample image and the pixel value of the localization map of each of the categories comprises:
for each category, acquiring a pixel mean value of the positioning image according to the pixel value of the positioning image of the category;
acquiring a label value of the sample image on the category based on label information of the sample image;
and determining the loss function of the category according to the pixel mean value and the label value.
3. The method according to claim 1 or 2, wherein said reversely adjusting said positioning diagram acquisition model based on the loss function corresponding to each of said categories comprises:
summing the loss functions corresponding to all the categories to obtain a first loss function of the positioning diagram obtaining model;
acquiring a second loss function of the positioning diagram acquisition model;
determining a total loss function of the positioning diagram acquisition model based on the first loss function and the second loss function;
and determining the gradient information of the positioning diagram acquisition model based on the total loss function, and reversely adjusting the positioning diagram acquisition model based on the gradient information.
4. The method of claim 2, wherein the obtaining a pixel mean of the positioning map according to the pixel values of the positioning map comprises:
constraining the pixel value of each position point in the positioning graph to be in a target value range;
and acquiring a pixel mean value of the positioning map based on the constrained pixel value corresponding to each position point in the target value range and the resolution of the positioning map.
5. The method of claim 4, wherein the constraining the pixel value of each location point in the positioning map to be within a target range of values comprises:
and aiming at the pixel value of any position point on the positioning diagram, comparing the pixel value with a specified hyper-parameter in the positioning diagram acquisition model, selecting the minimum value between the pixel value and the hyper-parameter as the constrained pixel value corresponding to the any position point, wherein the hyper-parameter is used for determining the upper limit value of the target value range.
6. The method of claim 1, wherein the obtaining a location map for each identified category comprises:
and aiming at each category, acquiring a classification weight vector in a model and a feature vector of the sample image based on the positioning diagram corresponding to the category, and acquiring the positioning diagram of the category.
7. A training apparatus for a positioning diagram acquisition model, comprising:
the first acquisition module is used for inputting the sample image into the positioning diagram acquisition model for category identification to acquire the positioning diagram of each identified category;
the second obtaining module is used for obtaining the label information of the sample image and obtaining a loss function corresponding to each category according to the label information of the sample image and the pixel value of the positioning diagram of each category;
and the adjusting module is used for reversely adjusting the positioning diagram obtaining model based on the loss function corresponding to each category, and returning to continue training the adjusted positioning diagram obtaining model by using the next sample image until the training is finished to generate the target positioning diagram obtaining model.
8. The apparatus of claim 7, wherein the second obtaining means is further configured to:
for each category, acquiring a pixel mean value of the positioning image according to the pixel value of the positioning image of the category;
acquiring a label value of the sample image on the category based on label information of the sample image;
and determining the loss function of the category according to the pixel mean value and the label value.
9. The apparatus of claim 7 or 8, wherein the adjustment module is further configured to:
summing the loss functions corresponding to all the categories to obtain a first loss function of the positioning diagram obtaining model;
acquiring a second loss function of the positioning diagram acquisition model;
determining a total loss function of the positioning diagram acquisition model based on the first loss function and the second loss function;
and determining the gradient information of the positioning diagram acquisition model based on the total loss function, and reversely adjusting the positioning diagram acquisition model based on the gradient information.
10. The apparatus of claim 8, wherein the second obtaining means is further configured to:
constraining the pixel value of each position point in the positioning graph to be in a target value range;
and acquiring a pixel mean value of the positioning map based on the constrained pixel value corresponding to each position point in the target value range and the resolution of the positioning map.
11. The apparatus of claim 10, wherein the second obtaining means is further configured to:
and aiming at the pixel value of any position point on the positioning diagram, comparing the pixel value with a specified hyper-parameter in the positioning diagram acquisition model, selecting the minimum value between the pixel value and the hyper-parameter as the constrained pixel value corresponding to the any position point, wherein the hyper-parameter is used for determining the upper limit value of the target value range.
12. The apparatus of claim 7, wherein the first obtaining module is further configured to:
and aiming at each category, acquiring a classification weight vector in a model and a feature vector of the image based on the positioning diagram corresponding to the category, and acquiring the positioning diagram of the category.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-6.
15. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1-6.
CN202110258523.6A 2021-03-09 2021-03-09 Training method and device for positioning diagram acquisition model Active CN113033549B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110258523.6A CN113033549B (en) 2021-03-09 2021-03-09 Training method and device for positioning diagram acquisition model
PCT/CN2021/106885 WO2022188327A1 (en) 2021-03-09 2021-07-16 Method and apparatus for training positioning image acquisition model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110258523.6A CN113033549B (en) 2021-03-09 2021-03-09 Training method and device for positioning diagram acquisition model

Publications (2)

Publication Number Publication Date
CN113033549A true CN113033549A (en) 2021-06-25
CN113033549B CN113033549B (en) 2022-09-20

Family

ID=76468937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110258523.6A Active CN113033549B (en) 2021-03-09 2021-03-09 Training method and device for positioning diagram acquisition model

Country Status (2)

Country Link
CN (1) CN113033549B (en)
WO (1) WO2022188327A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113344822A (en) * 2021-06-29 2021-09-03 展讯通信(上海)有限公司 Image noise reduction method, device, terminal and storage medium
CN113642740A (en) * 2021-08-12 2021-11-12 百度在线网络技术(北京)有限公司 Model training method and device, electronic device and medium
CN113901911A (en) * 2021-09-30 2022-01-07 北京百度网讯科技有限公司 Image recognition method, image recognition device, model training method, model training device, electronic equipment and storage medium
CN114612732A (en) * 2022-05-11 2022-06-10 成都数之联科技股份有限公司 Sample data enhancement method, system and device, medium and target classification method
CN115049878A (en) * 2022-06-17 2022-09-13 平安科技(深圳)有限公司 Target detection optimization method, device, equipment and medium based on artificial intelligence
WO2022188327A1 (en) * 2021-03-09 2022-09-15 北京百度网讯科技有限公司 Method and apparatus for training positioning image acquisition model

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN115830640B (en) * 2022-12-26 2024-03-05 北京百度网讯科技有限公司 Human body posture recognition and model training method, device, equipment and medium

Citations (6)

Publication number Priority date Publication date Assignee Title
CN109784424A (en) * 2019-03-26 2019-05-21 腾讯科技(深圳)有限公司 A kind of method of image classification model training, the method and device of image procossing
CN110334807A (en) * 2019-05-31 2019-10-15 北京奇艺世纪科技有限公司 Training method, device, equipment and the storage medium of deep learning network
WO2019228358A1 (en) * 2018-05-31 2019-12-05 华为技术有限公司 Deep neural network training method and apparatus
CN111723815A (en) * 2020-06-23 2020-09-29 中国工商银行股份有限公司 Model training method, image processing method, device, computer system, and medium
CN111739027A (en) * 2020-07-24 2020-10-02 腾讯科技(深圳)有限公司 Image processing method, device and equipment and readable storage medium
CN112183635A (en) * 2020-09-29 2021-01-05 南京农业大学 Method for realizing segmentation and identification of plant leaf lesions by multi-scale deconvolution network

Family Cites Families (5)

Publication number Priority date Publication date Assignee Title
KR101879207B1 (en) * 2016-11-22 2018-07-17 주식회사 루닛 Method and Apparatus for Recognizing Objects in a Weakly Supervised Learning Manner
CN111950579A (en) * 2019-05-17 2020-11-17 北京京东尚科信息技术有限公司 Training method and training device for classification model
CN111046939B (en) * 2019-12-06 2023-08-04 中国人民解放军战略支援部队信息工程大学 Attention-based CNN class activation graph generation method
CN111639755B (en) * 2020-06-07 2023-04-25 电子科技大学中山学院 Network model training method and device, electronic equipment and storage medium
CN113033549B (en) * 2021-03-09 2022-09-20 北京百度网讯科技有限公司 Training method and device for positioning diagram acquisition model

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
WO2019228358A1 (en) * 2018-05-31 2019-12-05 华为技术有限公司 Deep neural network training method and apparatus
CN109784424A (en) * 2019-03-26 2019-05-21 腾讯科技(深圳)有限公司 A kind of method of image classification model training, the method and device of image procossing
CN110334807A (en) * 2019-05-31 2019-10-15 北京奇艺世纪科技有限公司 Training method, device, equipment and the storage medium of deep learning network
CN111723815A (en) * 2020-06-23 2020-09-29 中国工商银行股份有限公司 Model training method, image processing method, device, computer system, and medium
CN111739027A (en) * 2020-07-24 2020-10-02 腾讯科技(深圳)有限公司 Image processing method, device and equipment and readable storage medium
CN112183635A (en) * 2020-09-29 2021-01-05 南京农业大学 Method for realizing segmentation and identification of plant leaf lesions by multi-scale deconvolution network

Non-Patent Citations (1)

Title
Zhou Pengcheng et al., "Image Semantic Segmentation Based on Deep Feature Fusion", Computer Science, no. 02, 29 October 2019 (2019-10-29), pages 132-140 *

Cited By (8)

Publication number Priority date Publication date Assignee Title
WO2022188327A1 (en) * 2021-03-09 2022-09-15 北京百度网讯科技有限公司 Method and apparatus for training positioning image acquisition model
CN113344822A (en) * 2021-06-29 2021-09-03 展讯通信(上海)有限公司 Image noise reduction method, device, terminal and storage medium
CN113642740A (en) * 2021-08-12 2021-11-12 百度在线网络技术(北京)有限公司 Model training method and device, electronic device and medium
CN113642740B (en) * 2021-08-12 2023-08-01 百度在线网络技术(北京)有限公司 Model training method and device, electronic equipment and medium
CN113901911A (en) * 2021-09-30 2022-01-07 北京百度网讯科技有限公司 Image recognition method, image recognition device, model training method, model training device, electronic equipment and storage medium
CN114612732A (en) * 2022-05-11 2022-06-10 成都数之联科技股份有限公司 Sample data enhancement method, system and device, medium and target classification method
CN115049878A (en) * 2022-06-17 2022-09-13 平安科技(深圳)有限公司 Target detection optimization method, device, equipment and medium based on artificial intelligence
CN115049878B (en) * 2022-06-17 2024-05-03 平安科技(深圳)有限公司 Target detection optimization method, device, equipment and medium based on artificial intelligence

Also Published As

Publication number Publication date
WO2022188327A1 (en) 2022-09-15
CN113033549B (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN113033549B (en) Training method and device for positioning diagram acquisition model
CN110059694B (en) Intelligent identification method for character data in complex scene of power industry
CN111639710A (en) Image recognition model training method, device, equipment and storage medium
CN111652225B (en) Non-invasive camera shooting and reading method and system based on deep learning
CN113158909B (en) Behavior recognition light-weight method, system and equipment based on multi-target tracking
CN112949710A (en) Image clustering method and device
CN114612749B (en) Neural network model training method and device, electronic device and medium
CN113361578A (en) Training method and device of image processing model, electronic equipment and storage medium
CN113674421A (en) 3D target detection method, model training method, related device and electronic equipment
CN113869429A (en) Model training method and image processing method
US20220101642A1 (en) Method for character recognition, electronic device, and storage medium
CN112862006A (en) Training method and device for image depth information acquisition model and electronic equipment
CN113222149A (en) Model training method, device, equipment and storage medium
CN111709428B (en) Method and device for identifying positions of key points in image, electronic equipment and medium
CN113837308A (en) Knowledge distillation-based model training method and device and electronic equipment
JP2023166444A (en) Capture and storage of magnified images
CN113344862A (en) Defect detection method, defect detection device, electronic equipment and storage medium
CN113792742A (en) Semantic segmentation method of remote sensing image and training method of semantic segmentation model
CN111126155B (en) Pedestrian re-identification method for generating countermeasure network based on semantic constraint
CN112651451A (en) Image recognition method and device, electronic equipment and storage medium
CN113177449A (en) Face recognition method and device, computer equipment and storage medium
CN112837466B (en) Bill recognition method, device, equipment and storage medium
CN116030370A (en) Behavior recognition method and device based on multi-target tracking and electronic equipment
CN113592932A (en) Training method and device for deep completion network, electronic equipment and storage medium
CN109543716B (en) K-line form image identification method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant