WO2022188327A1 - Training method and apparatus for a positioning map acquisition model - Google Patents

Training Method and Apparatus for a Positioning Map Acquisition Model

Info

Publication number
WO2022188327A1
Authority
WO
WIPO (PCT)
Prior art keywords
positioning map
category
loss function
model
positioning
Prior art date
Application number
PCT/CN2021/106885
Other languages
English (en)
French (fr)
Inventor
尚方信
杨叶辉
王磊
许言午
Original Assignee
北京百度网讯科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京百度网讯科技有限公司
Publication of WO2022188327A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular, to the fields of artificial intelligence such as computer vision and deep learning.
  • Image recognition is an important field of artificial intelligence.
  • localization map recognition is one of the important technologies.
  • research on localization map recognition has made great progress and has laid the foundation for further image recognition, analysis, and understanding.
  • the present disclosure provides a training method for a positioning map acquisition model.
  • through the positioning map of each category and the label information of the sample image, the loss function of the model is finally determined to adjust the model parameters in reverse, so as to guide the positioning map acquisition model to select regions of higher attention and thereby optimize the positioning map.
  • according to another aspect of the present disclosure, a training apparatus for a positioning map acquisition model is provided.
  • an electronic device is provided.
  • a non-transitory computer-readable storage medium is provided.
  • a computer program product is provided.
  • an embodiment of the first aspect of the present disclosure proposes a training method for a positioning map acquisition model, and the method includes: inputting a sample image into the positioning map acquisition model for category recognition and obtaining a positioning map of each recognized category; obtaining label information of the sample image and, for each category, obtaining a loss function corresponding to the category according to the label information of the sample image and the pixel values of the positioning map of the category; and adjusting the positioning map acquisition model in reverse based on the loss function corresponding to each category, and returning to continue training the adjusted positioning map acquisition model with a next sample image until training ends and a target positioning map acquisition model is generated.
  • an embodiment of the second aspect of the present disclosure proposes a training apparatus for a positioning map acquisition model, comprising:
  • a first acquisition module configured to input the sample image into the positioning map acquisition model for category recognition and obtain the positioning map of each recognized category;
  • a second acquisition module configured to obtain the label information of the sample image and obtain the loss function corresponding to each category according to the label information of the sample image and the pixel values of the positioning map of the category;
  • an adjustment module configured to adjust the positioning map acquisition model in reverse based on the loss function corresponding to each category, and to return to continue training the adjusted positioning map acquisition model with the next sample image until training ends and the target positioning map acquisition model is generated.
  • an embodiment of the third aspect of the present disclosure provides an electronic device, comprising at least one processor and a memory communicatively connected to the at least one processor, wherein
  • the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the training method for the positioning map acquisition model according to the embodiment of the first aspect of the present disclosure.
  • an embodiment of the fourth aspect of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the training method for the positioning map acquisition model of the first aspect of the present disclosure.
  • the fifth aspect of the present disclosure provides a computer program product, including a computer program that, when executed by a processor, implements the training method for the positioning map acquisition model of the first aspect of the present disclosure.
  • FIG. 1 is a flowchart of a training method for a positioning map acquisition model according to an embodiment of the present disclosure
  • FIG. 2 is a flowchart of a training method for a positioning map acquisition model according to another embodiment of the present disclosure
  • FIG. 3 is a flowchart of a training method for a positioning map acquisition model according to another embodiment of the present disclosure
  • FIG. 4 is a flowchart of a training method for a positioning map acquisition model according to another embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of a training method for a positioning map acquisition model according to an embodiment of the present disclosure
  • FIG. 6 is a structural diagram of a training device for a positioning map acquisition model according to an embodiment of the present disclosure
  • FIG. 7 is a schematic structural diagram of an electronic device that can implement an embodiment of the present disclosure.
  • Image processing is a technique for analyzing images with a computer to achieve desired results; it is also called picture processing.
  • Image processing generally refers to digital image processing.
  • A digital image is a large two-dimensional array obtained by capture with industrial cameras, video cameras, scanners, and other equipment. The elements of the array are called pixels, and their values are called gray values.
  • Image processing technology generally includes three parts: image compression; enhancement and restoration; and matching, description, and recognition.
  • Deep learning (DL) is a new research direction in the field of machine learning (ML); it was introduced into machine learning to bring machine learning closer to its original goal, artificial intelligence. Deep learning learns the intrinsic laws and representation levels of sample data, and the information obtained in the learning process is of great help in the interpretation of data such as text, images, and sounds. Its ultimate goal is to enable machines to have the same analytical learning ability as humans and to recognize data such as text, images, and sounds. Deep learning is a complex machine learning algorithm whose results in speech and image recognition far exceed earlier related techniques.
  • Computer vision is a science that studies how to make machines 'see'; more specifically, it refers to machine vision in which cameras and computers replace human eyes to recognize, track, and measure targets, with further graphics processing so that the computer output becomes an image more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies and attempts to build artificial intelligence systems that can obtain 'information' from images or multi-dimensional data. The information here is information in Shannon's sense, information that can be used to help make a 'decision'. Since perception can be regarded as extracting information from sensory signals, computer vision can also be regarded as the science of how to make artificial systems 'perceive' from images or multi-dimensional data.
  • Artificial intelligence is a discipline that studies how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning); it involves both hardware-level and software-level technology. Artificial intelligence technology generally includes several major aspects such as computer vision technology, speech recognition technology, natural language processing technology, machine learning/deep learning, big data processing technology, and knowledge graph technology.
  • FIG. 1 is a flowchart of a training method for a positioning map acquisition model according to an embodiment of the present disclosure. As shown in FIG. 1 , the training method for the positioning map acquisition model includes the following steps:
  • S101: Input the sample image into the positioning map acquisition model for category recognition, and obtain the positioning map of each recognized category.
  • in the preprocessing process, the sample image is preprocessed and then input into the positioning map acquisition model. Preprocessing the sample image can eliminate irrelevant noise in the image, simplify the data, and enhance the detectability of the relevant information. Optionally, the preprocessing process includes digitization, smoothing, restoration, or enhancement.
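  • As a concrete illustration, a minimal preprocessing pipeline might look as follows; the transforms and constants are assumptions chosen for illustration, not steps fixed by the disclosure:

```python
import torchvision.transforms as T

# A sketch of sample-image preprocessing. The concrete steps (digitization,
# smoothing, restoration, enhancement) are application-specific; the
# transforms and constants below are illustrative assumptions.
preprocess = T.Compose([
    T.Resize((224, 224)),                    # unify the spatial size
    T.GaussianBlur(kernel_size=3),           # smoothing, suppresses noise
    T.ToTensor(),                            # digitize to a float tensor
    T.Normalize(mean=[0.485, 0.456, 0.406],  # normalize so the relevant
                std=[0.229, 0.224, 0.225]),  # information is detectable
])
```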
  • the positioning map acquisition model may include a classification network, and category recognition is performed on the input sample image based on the classification network.
  • the classification network includes a feature extractor and a classifier, wherein:
  • the feature extractor includes a convolution layer, a pooling layer, and a normalization layer.
  • the feature extractor can be used to extract features from the sample images to obtain feature vectors corresponding to the sample images.
  • the preprocessed sample image is input into the classification network; the convolution layer of the feature extractor performs convolution operations on the sample image to extract the feature map of the sample image; the pooling layer then performs pooling operations, retaining the main features of the sample image while reducing the feature dimension to reduce the amount of computation. Since convolution and pooling are likely to change the data distribution,
  • the feature map of the sample image also needs to be normalized to cope with large changes in the distribution of intermediate-layer data during training; finally, the feature vector of the sample image is extracted.
  • the classifier includes a fully connected layer for integrating feature vectors.
  • the fully connected layer performs a fully connected operation on the feature vector output by the feature extractor and further determines the positioning map of the category corresponding to the sample image.
  • Different sample images can correspond to different categories.
  • the positioning map of each category is determined based on the classifier's category recognition result for the sample image.
  • the positioning map can be understood as the class activation map (CAM) of the category.
  • the class activation map reflects the importance of each position in the sample image to the category; whether a position belongs to the category can therefore be determined based on its importance to the category, and the positioning target of the category can then be determined from the positioning map.
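  • A minimal sketch of locating the target of category c from its positioning map by thresholding the importance values; the 0.5 ratio is a hypothetical choice, not a value fixed by the disclosure:

```python
import torch

# Keep the positions whose importance to the category is high, then return
# a bounding box around them; m_c is the positioning map M_c, shape (H', W').
def locate_target(m_c: torch.Tensor, thresh_ratio: float = 0.5):
    mask = m_c >= thresh_ratio * m_c.max()       # high-importance positions
    ys, xs = torch.nonzero(mask, as_tuple=True)
    # bounding box (x_min, y_min, x_max, y_max) of the activated region
    return xs.min().item(), ys.min().item(), xs.max().item(), ys.max().item()
```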
  • the classification network can be constructed from scratch, or a network model such as a convolutional neural network, ResNet (residual network), or DenseNet (dense convolutional network) can be used.
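  • A minimal sketch of such a classification network, assuming the structure described above (convolution, pooling, and normalization layers in the feature extractor, a fully connected classifier, and global average pooling as in the standard CAM setup); the channel sizes are illustrative assumptions:

```python
import torch
import torch.nn as nn

class LocalizationNet(nn.Module):
    """Feature extractor plus fully connected classifier, as described."""

    def __init__(self, num_classes: int, channels: int = 64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, channels, kernel_size=3, padding=1),  # convolution layer
            nn.MaxPool2d(2),                                   # pooling layer
            nn.BatchNorm2d(channels),                          # normalization layer
            nn.ReLU(inplace=True),
        )
        self.gap = nn.AdaptiveAvgPool2d(1)           # global average pooling
        self.fc = nn.Linear(channels, num_classes)   # fully connected classifier

    def forward(self, x: torch.Tensor):
        f = self.features(x)           # feature maps f_k(x, y)
        v = self.gap(f).flatten(1)     # feature vector per image
        return self.fc(v), f           # class scores and the feature maps
```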
  • S102: Obtain the label information of the sample image, and obtain the loss function corresponding to each category according to the label information of the sample image and the pixel values of the positioning map of each category.
  • the label information of the sample image is pre-labeled.
  • in a multi-class network, the label information of the sample image includes a label for each category.
  • one label has the value 1 and the labels of the remaining categories are all 0; that is, the label information of the sample image can be expressed as y_n = {0, 0, 1, ..., 0}.
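  • For example, the label information can be built as a one-hot vector; the category count and index below are hypothetical:

```python
import torch

num_categories = 5           # hypothetical number of categories
class_index = 2              # hypothetical ground-truth category of the sample
y_n = torch.zeros(num_categories)
y_n[class_index] = 1.0       # y_n = [0., 0., 1., 0., 0.]
```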
  • the acquired positioning map is actually a matrix, wherein the elements in the matrix are the position points on the positioning map, and the values of the elements are the pixel values of the positioning map.
  • the pixel value of each location point on the location map can reflect the importance of the location point to the category.
  • the label information of the sample image can directly reflect whether the sample image belongs to a certain category. Therefore, in the positioning map acquisition model, the loss function corresponding to each category can be obtained based on the label information of the sample image and the pixel values of the positioning map of the category. For example, suppose the sample image involves category A, category B, and category C.
  • the loss function corresponding to category A can be obtained based on the label information of the sample image and the pixel values of the positioning map of category A. Based on the label information of the sample image and the pixel values of the positioning map of category B, the loss function corresponding to category B can be obtained. Further, based on the label information of the sample image and the pixel values of the positioning map of category C, the loss function corresponding to category C can be obtained. That is to say, for each category recognized in the sample image, the loss function corresponding to that category needs to be obtained.
  • the positioning map acquisition model is trained by constructing a loss function of each category to reduce errors, and finally a target positioning map acquisition model is generated to obtain an optimized positioning map.
  • after the loss function of each category is obtained, since the positioning map acquisition model needs to recognize every category, the loss functions of all categories must be considered together. The per-category loss functions can therefore be summed, or weighted by category,
  • to obtain the overall loss function of the positioning map acquisition model; the gradient information of the model is determined according to this overall loss function, the gradient information is back-propagated to every layer of the positioning map acquisition model, and the parameters of every layer, such as the weights, are adjusted.
  • the parameters of the positioning map acquisition model are adjusted after each training pass. After the adjustment is completed, and before the end condition of model training is met, the next sample image is used to continue training the adjusted model,
  • until training ends and the target positioning map acquisition model is generated.
  • the training end condition may be that a preset number of training iterations is reached or that the error after training is less than a preset threshold.
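  • A minimal training-loop sketch with the two end conditions named above; `model` (the LocalizationNet sketch), `loss_fn`, `optimizer`, and the iterable `samples` are assumed to be defined elsewhere, and both thresholds are hypothetical:

```python
max_steps = 10_000        # hypothetical preset number of training iterations
loss_threshold = 1e-3     # hypothetical preset error threshold

for step, (image, label) in enumerate(samples):
    logits, _ = model(image)           # category recognition on the sample
    loss = loss_fn(logits, label)
    optimizer.zero_grad()
    loss.backward()                    # back-propagate gradients to every layer
    optimizer.step()                   # adjust the weights of each layer
    if step + 1 >= max_steps or loss.item() < loss_threshold:
        break                          # training-end condition reached
```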
  • without precise annotation, the positioning map acquisition model not only outputs a more accurate and comprehensive positioning map but also produces better classification results; that is to say, the positioning map acquisition model is a kind of 'weakly supervised' model whose annotation precision is weaker than its output precision.
  • the training method for the positioning map acquisition model proposed by the embodiments of the present disclosure first inputs the sample image into the positioning map acquisition model for category recognition and obtains the positioning map of each recognized category; then, for each category, it obtains the loss function corresponding to the category according to the pixel values of the positioning map of the category and the label information of the sample image; finally, based on the loss function corresponding to each category, it adjusts the positioning map acquisition model in reverse and returns to continue training the adjusted model with the next sample image, until training ends and the target positioning map acquisition model is generated.
  • through the positioning map of each category and the label information of the sample image, the loss function of the model is finally determined and the model parameters are then adjusted in reverse, so as to guide the positioning map acquisition model to select regions of higher attention; the model thus no longer focuses only on
  • the most discriminative regions of the target, thereby enabling optimization of the positioning map.
  • moreover, building the loss function on the positioning map of each category suppresses image information that is irrelevant to the category.
  • the process of obtaining the loss function corresponding to the category may include the following steps:
  • the input sample image has a length of H pixels and a width of W pixels, and the sample image has z features in total, where each feature may correspond to one channel.
  • the positioning map acquisition model can output the positioning map of the category, which can be expressed as: M_c(x, y) = Σ_{k=1}^{z} w_k^c · f_k(x, y)
  • where M_c(x, y) represents the pixel value of the positioning map M of the c-th category at position point (x, y); w_k^c represents the weight of the k-th channel of the fully connected layer for category c, k ≤ z; and f_k(x, y) represents the value of the feature map f corresponding to the sample image on the k-th channel at position point (x, y).
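  • A sketch of this computation from the fully connected layer's weights and the feature maps; the shapes are assumptions matching the LocalizationNet sketch above:

```python
import torch

def positioning_map(f: torch.Tensor, fc_weight: torch.Tensor, c: int) -> torch.Tensor:
    # f: feature maps of shape (z, H', W'); fc_weight: shape (num_classes, z)
    w_c = fc_weight[c]                         # the weights w_k^c, k = 1..z
    return torch.einsum("k,khw->hw", w_c, f)   # M_c(x, y), shape (H', W')
```

  • Usage under these assumptions: `logits, f = model(img.unsqueeze(0))` followed by `M_c = positioning_map(f[0], model.fc.weight, c)`.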
  • the embodiment of the present disclosure may constrain the pixel value of each position point in the positioning map, so as to constrain the pixel value to be within the same target value range.
  • the value of the pixel value of each position point in the positioning map is constrained from (-∞, +∞) to [0, +∞).
  • for example, the square root of the squared pixel value may be taken, or its absolute value.
  • further, based on a set value, such as a hyperparameter of the positioning map acquisition model, the pixel value is then constrained to be within the target value range.
  • the pixel value of a position point on the positioning map can be constrained based on the following formula: CCAM_{n,c}(x, y) = min(|M_c(x, y)|, η)
  • where |M_c(x, y)| constrains the pixel value at position point (x, y) from (-∞, +∞) to [0, +∞); η is a hyperparameter preset by the positioning map acquisition model; and min(·) is the minimum-value operation, used to select
  • the minimum of |M_c(x, y)| and the hyperparameter η as the constrained pixel value at position point (x, y). That is, the hyperparameter η is the upper limit of the target value range, and the target value range
  • of the constrained pixel values is [0, η].
  • based on the constrained pixel value corresponding to each position point within the target value range and the resolution of the positioning map, the pixel mean A_{n,c} of the positioning map is obtained: A_{n,c} = (1 / (u × v)) Σ_{x=1}^{u} Σ_{y=1}^{v} CCAM_{n,c}(x, y)
  • where (u × v) is the resolution of the positioning map
  • and CCAM_{n,c} is the constrained pixel value of the c-th category of the n-th positioning map.
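  • The two formulas above can be sketched directly; `eta` is the preset hyperparameter η:

```python
import torch

def constrained_pixel_mean(m_c: torch.Tensor, eta: float) -> torch.Tensor:
    # CCAM_{n,c}(x, y) = min(|M_c(x, y)|, eta): first (-inf, +inf) -> [0, +inf)
    # via the absolute value, then capped at eta, i.e. the range [0, eta].
    ccam = torch.clamp(m_c.abs(), max=eta)
    return ccam.mean()    # A_{n,c}, the mean over the (u x v) positioning map
```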
  • S203 Determine the loss function of the category according to the pixel mean value and the label value.
  • the loss function of category c is constructed according to the pixel mean A_{n,c} of the positioning map of category c of the n-th sample image and the label value y_{n,c} of the n-th sample image on category c; its closed form is given by an equation image in the original publication.
  • N represents the total number of sample images in the dataset.
  • the constructed loss function achieves the following: when y_{n,c} is 1, the n-th sample image includes an image of category c, i.e., the positioning map of the n-th sample image is important to category c;
  • the above loss function of category c can then be used to adjust the pixel mean A_{n,c} of the positioning map in the increasing direction, i.e., to increase the pixel values of the positioning map. When y_{n,c} is 0, the n-th sample image does not include an image of category c, i.e., the positioning map of the n-th sample image is unimportant to category c.
  • the above loss function of category c can then be used to adjust
  • the pixel mean A_{n,c} in the decreasing direction, i.e., to reduce the pixel values of the positioning map, so as to guide the positioning map acquisition model to select high-attention regions as much as possible and reduce the value of the loss function.
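  • A hedged sketch of the per-category loss L_c. The disclosure fixes only its behavior (push A_{n,c} up when y_{n,c} is 1 and down when y_{n,c} is 0); the linear form below is one assumption with that behavior, not the patent's exact closed form:

```python
import torch

def category_loss(A: torch.Tensor, y: torch.Tensor, eta: float) -> torch.Tensor:
    # A, y: shape (N,), the pixel means A_{n,c} and the label values y_{n,c}.
    # y = 1 contributes (eta - A), minimized by increasing A; y = 0
    # contributes A, minimized by decreasing A; A lies in [0, eta].
    return (y * (eta - A) + (1.0 - y) * A).mean()
```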
  • the process of reversely adjusting the positioning map acquisition model may include the following steps:
  • based on the loss function of a given category obtained in step S203, the loss functions of all categories are computed, and the loss functions corresponding to all categories are summed to serve as the first loss function of the positioning map acquisition model: L_1 = α Σ_{c=1}^{m} L_c
  • where m indicates that there are m categories in the dataset,
  • α is a preset parameter,
  • and L_c is the first loss function of category c.
  • the second loss function must be applicable to the positioning map of every category and is a loss function commonly used by classification networks.
  • a cross-entropy loss function can be used as the second loss function.
  • the second loss function can be obtained based on the training error.
  • the first loss function and the second loss function are summed to serve as the total loss function of the positioning map acquisition model: L_total = L_1 + L_2
  • where L_2 represents the second loss function.
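  • Combining the pieces as described, with `category_loss` being the hedged sketch above and `alpha` the preset parameter:

```python
import torch.nn.functional as F

def total_loss(A, labels, logits, targets, eta: float, alpha: float = 1.0):
    # A, labels: shape (N, m) pixel means and label values; logits: (N, m)
    # class scores; targets: (N,) ground-truth category indices.
    m = labels.shape[1]
    L1 = alpha * sum(category_loss(A[:, c], labels[:, c], eta) for c in range(m))
    L2 = F.cross_entropy(logits, targets)   # the second (classification) loss
    return L1 + L2                          # L_total = L_1 + L_2
```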
  • FIG. 4 is another training method of a positioning map acquisition model provided in an embodiment of the present disclosure.
  • the training method of the positioning map acquisition model includes the following steps:
  • FIG. 5 is a schematic diagram of a training method for a positioning map acquisition model provided by an embodiment of the present disclosure.
  • as an example, an image of a 'puppy' is input into the positioning map acquisition model; as shown in FIG. 5, the feature extractor can be used to obtain the feature map, which is input into the classifier to obtain the feature vector; category recognition is then performed and the positioning map of each recognized category is obtained. Based on the label information and the pixel values of the positioning map, the loss function corresponding to each category is obtained; based on the loss functions corresponding to all categories, the positioning map acquisition model is adjusted in reverse, and the next sample image is used to continue training the adjusted positioning map acquisition model until training ends and the target positioning map acquisition model is generated.
  • without precise annotation, the positioning map acquisition model not only outputs a more accurate and comprehensive positioning map but also produces better classification results; that is to say, the positioning map acquisition model is a kind of 'weakly supervised' model whose annotation precision is weaker than its output precision.
  • a training apparatus 600 for a positioning map acquisition model includes:
  • a first acquisition module 61 configured to input the sample image into the positioning map acquisition model for category recognition and obtain the positioning map of each recognized category;
  • a second acquisition module 62 configured to obtain the label information of the sample image and obtain the loss function corresponding to each category according to the label information of the sample image and the pixel values of the positioning map of the category;
  • an adjustment module 63 configured to adjust the positioning map acquisition model in reverse based on the loss function corresponding to each category, and to return to continue training the adjusted positioning map acquisition model with the next sample image until training ends and the target positioning map acquisition model is generated.
  • the training apparatus for the positioning map acquisition model proposed by the embodiments of the present disclosure first inputs the sample image into the positioning map acquisition model for category recognition and obtains the positioning map of each recognized category; then, for each category, it obtains the loss function corresponding to the category according to the pixel values of the positioning map of the category and the label information of the sample image; finally, based on the loss function corresponding to each category, it adjusts the positioning map acquisition model in reverse and returns to continue training the adjusted model with the next sample image, until training ends and the target positioning map acquisition model is generated.
  • through the positioning map of each category and the label information of the sample image, the loss function of the model is optimized to reduce its value, and the reverse adjustment guides the positioning map acquisition model to select regions of higher attention, thereby optimizing the positioning map.
  • the second acquisition module 62 is further configured to: for each category, obtain the pixel mean of the positioning map according to the pixel values of the positioning map of the category; obtain the label value of the sample image on the category based on the label information of the sample image; and determine the loss function of the category according to the pixel mean and the label value.
  • the adjustment module 63 is further configured to: sum the loss functions corresponding to all categories to obtain the first loss function of the positioning map acquisition model; obtain the second loss function of the positioning map acquisition model; determine the total loss function of the positioning map acquisition model based on the first loss function and the second loss function; and determine the gradient information of the positioning map acquisition model based on the total loss function and adjust the positioning map acquisition model in reverse based on the gradient information.
  • the second acquisition module 62 is further configured to: constrain the pixel value of each position point in the positioning map to be within the target value range; and obtain the pixel mean of the positioning map based on the constrained pixel value corresponding to each position point within the target value range and the resolution of the positioning map.
  • the second acquisition module 62 is further configured to: for the pixel value of any position point on the positioning map, compare the pixel value with the hyperparameter specified in the positioning map acquisition model, and select the minimum of the pixel value and the hyperparameter as the constrained pixel value corresponding to that position point, wherein the hyperparameter is used to determine the upper limit of the target value range.
  • the first acquisition module 61 is further configured to: for each category, obtain the positioning map of the category based on the classification weight vector in the positioning map acquisition model corresponding to the category and the feature vector of the sample image.
  • the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • the electronic device 700 includes a memory 71, a processor 72, and a computer program stored in the memory 71 and runnable on the processor 72;
  • when the processor executes the computer program, the aforementioned training method for the positioning map acquisition model is implemented.
  • Various implementations of the systems and techniques described herein above may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), computer hardware, firmware, software, and/or combinations thereof.
  • These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor,
  • which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • More specific examples of machine-readable storage media would include electrical connections based on one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer.
  • Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic, voice, or tactile input).
  • the systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer with a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication (eg, a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), the Internet, and blockchain networks.
  • a computer system can include clients and servers. Clients and servers are generally remote from each other and usually interact through a communication network. The relationship of client and server arises by computer programs running on the respective computers and having a client-server relationship to each other.
  • the server can be a cloud server, also known as a cloud computing server or cloud host; it is a host product in the cloud computing service system intended to overcome the defects of difficult management and weak business scalability that exist in traditional physical hosts and VPS ('Virtual Private Server') services.
  • the server can also be a server of a distributed system, or a server combined with a blockchain.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure discloses a training method and apparatus for a positioning map acquisition model, relating to the technical field of image processing, and in particular to artificial intelligence fields such as computer vision and deep learning. The scheme is as follows: a sample image is input into the positioning map acquisition model for category recognition, and the positioning map of each recognized category is obtained; the label information of the sample image is obtained and, combined with the pixel values of the positioning map of each category, the loss function corresponding to each category is obtained; based on the loss function corresponding to each category, the positioning map acquisition model is adjusted in reverse, and the process returns to continue training the adjusted positioning map acquisition model with the next sample image until training ends and a target positioning map acquisition model is generated. Through the positioning map of each category and the label information of the sample image, the present disclosure determines the loss function of each category and in turn the loss function of the model, so as to adjust the model parameters in reverse, guide the model to select regions of higher attention, and obtain an optimized positioning map.

Description

Training Method and Apparatus for a Positioning Map Acquisition Model
CROSS-REFERENCE TO RELATED APPLICATIONS
The present disclosure claims priority to Chinese Patent Application No. 202110258523.6, filed on March 9, 2021, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to the technical field of image processing, and in particular to artificial intelligence fields such as computer vision and deep learning.
BACKGROUND
Image recognition is an important field of artificial intelligence. In the development of image recognition, positioning map recognition is one of the important technologies; research on positioning map recognition has made great progress and has laid the foundation for further image recognition, analysis, and understanding.
SUMMARY
The present disclosure provides a training method for a positioning map acquisition model. Through the positioning map of each category and the label information of the sample image, the loss function of the model is finally determined to adjust the model parameters in reverse, so as to guide the positioning map acquisition model to select regions of higher attention and thereby optimize the positioning map.
According to another aspect of the present disclosure, a training apparatus for a positioning map acquisition model is provided.
According to another aspect of the present disclosure, an electronic device is provided.
According to another aspect of the present disclosure, a non-transitory computer-readable storage medium is provided.
According to another aspect of the present disclosure, a computer program product is provided.
To achieve the above objectives, an embodiment of the first aspect of the present disclosure provides a training method for a positioning map acquisition model, the method comprising:
inputting a sample image into the positioning map acquisition model for category recognition, and obtaining a positioning map of each recognized category;
obtaining label information of the sample image and, for each category, obtaining a loss function corresponding to the category according to the label information of the sample image and the pixel values of the positioning map of the category;
adjusting the positioning map acquisition model in reverse based on the loss function corresponding to each category, and returning to continue training the adjusted positioning map acquisition model with a next sample image until training ends and a target positioning map acquisition model is generated.
To achieve the above objectives, an embodiment of the second aspect of the present disclosure provides a training apparatus for a positioning map acquisition model, the apparatus comprising:
a first acquisition module configured to input a sample image into the positioning map acquisition model for category recognition and obtain a positioning map of each recognized category;
a second acquisition module configured to obtain label information of the sample image and obtain a loss function corresponding to each category according to the label information of the sample image and the pixel values of the positioning map of the category;
an adjustment module configured to adjust the positioning map acquisition model in reverse based on the loss function corresponding to each category, and to return to continue training the adjusted positioning map acquisition model with a next sample image until training ends and a target positioning map acquisition model is generated.
To achieve the above objectives, an embodiment of the third aspect of the present disclosure provides an electronic device, comprising at least one processor and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the training method for the positioning map acquisition model of the embodiment of the first aspect of the present disclosure.
To achieve the above objectives, an embodiment of the fourth aspect of the present disclosure provides a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the training method for the positioning map acquisition model of the embodiment of the first aspect of the present disclosure.
To achieve the above objectives, an embodiment of the fifth aspect of the present disclosure provides a computer program product, comprising a computer program which, when executed by a processor, implements the training method for the positioning map acquisition model of the embodiment of the first aspect of the present disclosure.
It should be understood that the content described in this section is not intended to identify key or important features of the embodiments of the present disclosure, nor is it intended to limit the scope of the present disclosure. Other features of the present disclosure will become easy to understand from the following description.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings are used for a better understanding of the present solution and do not constitute a limitation of the present disclosure, in which:
FIG. 1 is a flowchart of a training method for a positioning map acquisition model according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of a training method for a positioning map acquisition model according to another embodiment of the present disclosure;
FIG. 3 is a flowchart of a training method for a positioning map acquisition model according to another embodiment of the present disclosure;
FIG. 4 is a flowchart of a training method for a positioning map acquisition model according to another embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a training method for a positioning map acquisition model according to an embodiment of the present disclosure;
FIG. 6 is a structural diagram of a training apparatus for a positioning map acquisition model according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of an electronic device that can implement embodiments of the present disclosure.
DETAILED DESCRIPTION
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings; various details of the embodiments of the present disclosure are included to facilitate understanding and should be considered merely exemplary. Accordingly, those of ordinary skill in the art should recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present disclosure. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted from the following description.
Image processing is the technique of analyzing an image with a computer to achieve a desired result; it is also called picture processing. Image processing generally refers to digital image processing. A digital image is a large two-dimensional array obtained by capture with industrial cameras, video cameras, scanners, and other equipment; the elements of the array are called pixels, and their values are called gray values. Image processing technology generally includes three parts: image compression; enhancement and restoration; and matching, description, and recognition.
Deep learning (DL) is a new research direction in the field of machine learning (ML); it was introduced into machine learning to bring machine learning closer to its original goal, artificial intelligence. Deep learning learns the intrinsic laws and representation levels of sample data, and the information obtained in the learning process is of great help in the interpretation of data such as text, images, and sounds. Its ultimate goal is to enable machines to have the same analytical learning ability as humans and to recognize data such as text, images, and sounds. Deep learning is a complex machine learning algorithm whose results in speech and image recognition far exceed earlier related techniques.
Computer vision is a science that studies how to make machines 'see'; more specifically, it refers to machine vision in which cameras and computers replace human eyes to recognize, track, and measure targets, with further graphics processing so that the computer output becomes an image more suitable for human observation or for transmission to instruments for detection. As a scientific discipline, computer vision studies related theories and technologies and attempts to build artificial intelligence systems that can obtain 'information' from images or multi-dimensional data. The information here is information in Shannon's sense, information that can be used to help make a 'decision'. Since perception can be regarded as extracting information from sensory signals, computer vision can also be regarded as the science of how to make artificial systems 'perceive' from images or multi-dimensional data.
Artificial intelligence (AI) is a discipline that studies how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning); it involves both hardware-level and software-level technology. Artificial intelligence technology generally includes several major aspects such as computer vision technology, speech recognition technology, natural language processing technology, machine learning/deep learning, big data processing technology, and knowledge graph technology.
FIG. 1 is a flowchart of a training method for a positioning map acquisition model according to an embodiment of the present disclosure. As shown in FIG. 1, the training method for the positioning map acquisition model includes the following steps:
S101: Input the sample image into the positioning map acquisition model for category recognition, and obtain the positioning map of each recognized category.
The sample image is preprocessed and then input into the positioning map acquisition model. Preprocessing the sample image can eliminate irrelevant noise in the image, simplify the data, and enhance the detectability of the relevant information. Optionally, the preprocessing includes digitization, smoothing, restoration, or enhancement.
In embodiments of the present disclosure, the positioning map acquisition model may include a classification network, and category recognition is performed on the input sample image based on the classification network. Optionally, the classification network includes a feature extractor and a classifier, wherein:
The feature extractor includes a convolution layer, a pooling layer, and a normalization layer; it can be used to extract features from the sample image to obtain the feature vector corresponding to the sample image. In implementation, the preprocessed sample image is input into the classification network; the convolution layer of the feature extractor performs convolution operations on the sample image to extract the feature map of the sample image; the pooling layer then performs pooling operations, retaining the main features of the sample image while reducing the feature dimension to reduce the amount of computation. Since convolution and pooling are likely to change the data distribution, and to cope with large changes in the data distribution of intermediate layers during training, the feature map of the sample image is also normalized; the feature vector of the sample image is finally extracted.
The classifier includes a fully connected layer used to integrate feature vectors; the fully connected layer performs a fully connected operation on the feature vector output by the feature extractor and further determines the positioning map of the category corresponding to the sample image. Different sample images can correspond to different categories. In implementation, the positioning map of each category is determined based on the classifier's recognition result for the sample image. In embodiments of the present disclosure, the positioning map can be understood as the class activation map (CAM) of the category. The class activation map reflects the importance of each position in the sample image to the category; whether a position belongs to the category can therefore be determined based on its importance to the category, and the positioning target of the category can then be determined from the positioning map.
Optionally, the classification network can be constructed from scratch, or a network model such as a convolutional neural network, ResNet (residual network), or DenseNet (dense convolutional network) can be used.
S102: Obtain the label information of the sample image, and obtain the loss function corresponding to each category according to the label information of the sample image and the pixel values of the positioning map of each category.
The label information of the sample image is labeled in advance. In a multi-class network, the label information of the sample image includes a label for each category; one label takes the value 1 and the labels of the remaining categories all take the value 0, i.e., the label information of a sample image can be expressed as y_n = {0, 0, 1, ..., 0}.
In embodiments of the present disclosure, the obtained positioning map is actually a matrix, in which the elements are the position points on the positioning map and the values of the elements are the pixel values of the positioning map. The pixel value of each position point on the positioning map reflects the importance of that position point to the category, while the label information of the sample image directly reflects whether the sample image belongs to a certain category. Therefore, in the positioning map acquisition model, the loss function corresponding to each category can be obtained based on the label information of the sample image and the pixel values of the positioning map of the category. For example, suppose the sample image involves category A, category B, and category C. Based on the label information of the sample image and the pixel values of the positioning map of category A, the loss function corresponding to category A can be obtained; based on the label information of the sample image and the pixel values of the positioning map of category B, the loss function corresponding to category B can be obtained; and further, based on the label information of the sample image and the pixel values of the positioning map of category C, the loss function corresponding to category C can be obtained. That is, for every category recognized in the sample image, the loss function corresponding to that category needs to be obtained.
In embodiments of the present disclosure, the positioning map acquisition model is trained by constructing a loss function for each category so as to reduce the error, and the target positioning map acquisition model is finally generated to obtain an optimized positioning map.
S103: Adjust the positioning map acquisition model in reverse based on the loss function corresponding to each category, and return to continue training the adjusted positioning map acquisition model with the next sample image until training ends and the target positioning map acquisition model is generated.
After the loss function of each category is obtained, since the positioning map acquisition model needs to recognize every category, the loss functions of all categories must be considered together; the per-category loss functions can therefore be summed, or the corresponding loss functions can be weighted by category, to obtain the overall loss function of the positioning map acquisition model. The gradient information of the positioning map acquisition model is determined according to this overall loss function, the gradient information is back-propagated to every layer of the positioning map acquisition model, and the parameters of every layer, such as the weights, are adjusted.
The parameters of the positioning map acquisition model are adjusted after each training pass; after an adjustment is completed, and before the training-end condition of the model is satisfied, the next sample image is used to continue training the adjusted positioning map acquisition model, until training ends and the target positioning map acquisition model is generated. Optionally, the training-end condition may be that a preset number of training iterations is reached or that the post-training error is smaller than a preset threshold.
On the basis of the above examples, after the target positioning map acquisition model is obtained, category recognition can be performed on an arbitrary image to obtain the positioning map of that image, from which the target in the image can then be obtained. Without precise annotation, the positioning map acquisition model of the embodiments of the present disclosure not only makes the model output a more accurate and comprehensive positioning map but also makes the classification results of the model better; that is, the positioning map acquisition model is a 'weakly supervised' model whose annotation precision is weaker than its output precision.
The training method for the positioning map acquisition model proposed by the embodiments of the present disclosure first inputs the sample image into the positioning map acquisition model for category recognition and obtains the positioning map of each recognized category; then, for each category, it obtains the loss function corresponding to the category according to the pixel values of the positioning map of the category and the label information of the sample image; finally, based on the loss function corresponding to each category, it adjusts the positioning map acquisition model in reverse and returns to continue training the adjusted model with the next sample image, until training ends and the target positioning map acquisition model is generated. Through the positioning map of each category and the label information of the sample image, the embodiments of the present disclosure finally determine the loss function of the model and then adjust the model parameters in reverse, so as to guide the positioning map acquisition model to select regions of higher attention; the model thus no longer focuses only on the most discriminative regions of the target, thereby optimizing the positioning map. Moreover, building the loss function on the positioning map of each category suppresses image information that is irrelevant to the category.
On the basis of the above embodiments, the process of obtaining the loss function corresponding to a category, as shown in FIG. 2, may include the following steps:
S201: For each category, obtain the pixel mean of the positioning map according to the pixel values of the positioning map of the category.
In embodiments of the present disclosure, the input sample image has a length of H pixels and a width of W pixels, and the sample image has z features in total, where each feature may correspond to one channel. The positioning map acquisition model can output the positioning map of a category, which can be expressed as:
M_c(x, y) = Σ_{k=1}^{z} w_k^c · f_k(x, y)
where M_c(x, y) denotes the pixel value of the positioning map M of the c-th category at position point (x, y); w_k^c denotes the weight of the k-th channel of the fully connected layer for category c, k ≤ z; and f_k(x, y) denotes the value of the feature map f corresponding to the sample image on the k-th channel at position point (x, y).
After the pixel value of each position point of the positioning map is obtained, the pixel values over the position points can be averaged to obtain the pixel mean of the positioning map. Optionally, to improve the model and reduce changes in the data distribution, embodiments of the present disclosure may constrain the pixel value of each position point in the positioning map so that the pixel values are constrained to the same target value range. In embodiments of the present disclosure, the pixel value of each position point in the positioning map is constrained from (-∞, +∞) to [0, +∞); for example, the square root of the squared pixel value may be taken, or its absolute value. Further, based on a set value, such as a hyperparameter of the positioning map acquisition model, the pixel value is then constrained to be within the target value range.
Optionally, the pixel value of a position point on the positioning map can be constrained based on the following formula:
CCAM_{n,c}(x, y) = min(|M_c(x, y)|, η)
where |M_c(x, y)| constrains the value range of the pixel value at each position point (x, y) of the positioning map from (-∞, +∞) to [0, +∞). Further, η is a hyperparameter preset by the positioning map acquisition model, and min(·) is the minimum-value operation, used to select the minimum of |M_c(x, y)| and the hyperparameter η as the constrained pixel value at position point (x, y); that is, the hyperparameter η is the upper limit of the target value range, and the target value range of the constrained pixel values is [0, η].
Based on the constrained pixel value corresponding to each position point within the target value range and the resolution of the positioning map, the pixel mean A_{n,c} of the positioning map is obtained:
A_{n,c} = (1 / (u × v)) Σ_{x=1}^{u} Σ_{y=1}^{v} CCAM_{n,c}(x, y)
where (u × v) is the resolution of the positioning map and CCAM_{n,c} is the constrained pixel value of the c-th category of the n-th positioning map.
S202: Obtain the label value of the sample image on the category based on the label information of the sample image.
An embodiment of the present disclosure illustrates how the label value is obtained from the label information of the sample image. If the categories contained in all sample images are arranged in order as {rabbit, puppy, kitten, ..., bird}, then the n-th sample image, labeled 'kitten', has the annotation label y_n = {0, 0, 1, ..., 0}; that is, the label value of the kitten category is 1 and the label values of the remaining categories are set to 0. y_{n,c} is the value of the annotation label y_n = {0, 0, 1, ..., 0} of the n-th sample image on the c-th category.
S203: Determine the loss function of the category according to the pixel mean and the label value.
For the c-th category, the loss function of category c is constructed from the pixel mean A_{n,c} of the positioning map of category c of the n-th sample image and the label value y_{n,c} of the n-th sample image on category c:
[equation image PCTCN2021106885-appb-000007: the closed form of the per-category loss function of category c over the N sample images]
where N denotes that the dataset contains N sample images in total.
In embodiments of the present disclosure, the constructed loss function achieves the following. When y_{n,c} is 1, the n-th sample image includes an image of category c, i.e., the positioning map of the n-th sample image is important to category c; the above loss function of category c can then be used to adjust the pixel mean A_{n,c} of the positioning map in the increasing direction, i.e., to increase the pixel values of the positioning map. When y_{n,c} is 0, the n-th sample image does not include an image of category c, i.e., the positioning map of the n-th sample image is unimportant to category c; the above loss function of category c can then be used to adjust the pixel mean A_{n,c} in the decreasing direction, i.e., to reduce the pixel values of the positioning map. In this way the positioning map acquisition model is guided to select high-attention regions as much as possible and the value of the loss function is reduced.
On the basis of the above embodiments, the process of adjusting the positioning map acquisition model in reverse based on the per-category loss functions, as shown in FIG. 3, may include the following steps:
S301: Sum the loss functions corresponding to all categories to obtain the first loss function of the positioning map acquisition model.
Based on the loss function of a given category obtained in step S203, the loss functions of all categories are computed, and the loss functions corresponding to all categories are summed to serve as the first loss function of the positioning map acquisition model:
L_1 = α Σ_{c=1}^{m} L_c
where m denotes that the dataset has m categories in total, α is a preset parameter, and L_c is the first loss function of category c.
S302: Obtain the second loss function of the positioning map acquisition model.
The second loss function must be applicable to the positioning map of every category and is a loss function commonly used by classification networks; optionally, a cross-entropy loss function can be used as the second loss function. During training, the second loss function can be obtained based on the training error.
S303: Determine the total loss function of the positioning map acquisition model based on the first loss function and the second loss function.
The first loss function and the second loss function are summed to serve as the total loss function of the positioning map acquisition model:
L_total = L_1 + L_2
where L_2 denotes the second loss function.
S304: Determine the gradient information of the positioning map acquisition model based on the total loss function, and adjust the positioning map acquisition model in reverse based on the gradient information.
The positioning map acquisition model is trained with the total loss function: the gradient information of the positioning map acquisition model is determined, the gradient information is back-propagated to every layer of the positioning map acquisition model, and the parameters of every layer, such as the weights, are adjusted.
FIG. 4 shows another training method for a positioning map acquisition model provided in an embodiment of the present disclosure. The training method includes the following steps:
S401: Input the sample image into the positioning map acquisition model for category recognition, and obtain the positioning map of each recognized category.
S402: For each category, obtain the pixel mean of the positioning map according to the pixel values of the positioning map of the category.
S403: Obtain the label value of the sample image on the category based on the label information of the sample image.
S404: Determine the loss function of the category according to the pixel mean and the label value.
S405: Sum the loss functions corresponding to all categories to obtain the first loss function of the positioning map acquisition model.
S406: Obtain the second loss function of the positioning map acquisition model.
S407: Determine the total loss function of the positioning map acquisition model based on the first loss function and the second loss function.
S408: Determine the gradient information of the positioning map acquisition model based on the total loss function, and adjust the positioning map acquisition model in reverse based on the gradient information.
S409: Return to continue training the adjusted positioning map acquisition model with the next sample image, until training ends and the target positioning map acquisition model is generated.
FIG. 5 is a schematic diagram of a training method for a positioning map acquisition model provided by an embodiment of the present disclosure. As an example, an image of a 'puppy' is input into the positioning map acquisition model; as shown in FIG. 5, the feature extractor can be used to obtain the feature map, which is input into the classifier to obtain the feature vector; category recognition is then performed, and the positioning map of each recognized category is obtained. Based on the label information and the pixel values of the positioning map, the loss function corresponding to each category is obtained; based on the loss functions corresponding to all categories, the positioning map acquisition model is adjusted in reverse, and the next sample image is used to continue training the adjusted positioning map acquisition model, until training ends and the target positioning map acquisition model is generated.
On the basis of the above examples, after the target positioning map acquisition model is obtained, category recognition can be performed on an arbitrary image to obtain the positioning map of that image, from which the target in the image can then be obtained. Without precise annotation, the positioning map acquisition model of the embodiments of the present disclosure not only makes the model output a more accurate and comprehensive positioning map but also makes the classification results of the model better; that is, the positioning map acquisition model is a 'weakly supervised' model whose annotation precision is weaker than its output precision.
FIG. 6 is a structural diagram of a training apparatus for a positioning map acquisition model according to an embodiment of the present disclosure. As shown in FIG. 6, the training apparatus 600 for the positioning map acquisition model includes:
a first acquisition module 61 configured to input the sample image into the positioning map acquisition model for category recognition and obtain the positioning map of each recognized category;
a second acquisition module 62 configured to obtain the label information of the sample image and obtain the loss function corresponding to each category according to the label information of the sample image and the pixel values of the positioning map of the category;
an adjustment module 63 configured to adjust the positioning map acquisition model in reverse based on the loss function corresponding to each category, and to return to continue training the adjusted positioning map acquisition model with the next sample image until training ends and the target positioning map acquisition model is generated.
It should be noted that the foregoing explanation of the embodiments of the training method for the positioning map acquisition model also applies to the training apparatus of this embodiment and is not repeated here.
The training apparatus for the positioning map acquisition model proposed by the embodiments of the present disclosure first inputs the sample image into the positioning map acquisition model for category recognition and obtains the positioning map of each recognized category; then, for each category, it obtains the loss function corresponding to the category according to the pixel values of the positioning map of the category and the label information of the sample image; finally, based on the loss function corresponding to each category, it adjusts the positioning map acquisition model in reverse and returns to continue training the adjusted model with the next sample image, until training ends and the target positioning map acquisition model is generated. Through the positioning map of each category and the label information of the sample image, the embodiments of the present disclosure optimize the loss function of the model to reduce its value, and the reverse adjustment guides the positioning map acquisition model to select regions of higher attention, thereby optimizing the positioning map.
Further, in a possible implementation of the embodiments of the present disclosure, the second acquisition module 62 is further configured to: for each category, obtain the pixel mean of the positioning map according to the pixel values of the positioning map of the category; obtain the label value of the sample image on the category based on the label information of the sample image; and determine the loss function of the category according to the pixel mean and the label value.
Further, in a possible implementation of the embodiments of the present disclosure, the adjustment module 63 is further configured to: sum the loss functions corresponding to all categories to obtain the first loss function of the positioning map acquisition model; obtain the second loss function of the positioning map acquisition model; determine the total loss function of the positioning map acquisition model based on the first loss function and the second loss function; and determine the gradient information of the positioning map acquisition model based on the total loss function and adjust the positioning map acquisition model in reverse based on the gradient information.
Further, in a possible implementation of the embodiments of the present disclosure, the second acquisition module 62 is further configured to: constrain the pixel value of each position point in the positioning map to be within the target value range; and obtain the pixel mean of the positioning map based on the constrained pixel value corresponding to each position point within the target value range and the resolution of the positioning map.
Further, in a possible implementation of the embodiments of the present disclosure, the second acquisition module 62 is further configured to: for the pixel value of any position point on the positioning map, compare the pixel value with the hyperparameter specified in the positioning map acquisition model, and select the minimum of the pixel value and the hyperparameter as the constrained pixel value corresponding to that position point, wherein the hyperparameter is used to determine the upper limit of the target value range.
Further, in a possible implementation of the embodiments of the present disclosure, the first acquisition module 61 is further configured to: for each category, obtain the positioning map of the category based on the classification weight vector in the positioning map acquisition model corresponding to the category and the feature vector of the sample image.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure. As shown in FIG. 7, the electronic device 700 includes a memory 71, a processor 72, and a computer program stored in the memory 71 and runnable on the processor 72; when the processor executes the computer program, the aforementioned training method for the positioning map acquisition model is implemented.
Various implementations of the systems and techniques described herein above can be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGA), application-specific integrated circuits (ASIC), application-specific standard products (ASSP), systems on chip (SOC), complex programmable logic devices (CPLD), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
Program code for implementing the methods of the present disclosure can be written in any combination of one or more programming languages. The program code can be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code can execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present disclosure, a machine-readable medium can be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium can be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium can include, but is not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing. More specific examples of the machine-readable storage medium would include electrical connections based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide interaction with a user, the systems and techniques described herein can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form (including acoustic, voice, or tactile input).
The systems and techniques described herein can be implemented in a computing system that includes back-end components (e.g., as a data server), or a computing system that includes middleware components (e.g., an application server), or a computing system that includes front-end components (e.g., a user computer with a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include local area networks (LAN), wide area networks (WAN), the Internet, and blockchain networks.
A computer system can include clients and servers. A client and a server are generally remote from each other and usually interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other. The server can be a cloud server, also known as a cloud computing server or cloud host; it is a host product in the cloud computing service system intended to overcome the defects of difficult management and weak business scalability that exist in traditional physical hosts and VPS ('Virtual Private Server') services. The server can also be a server of a distributed system or a server combined with a blockchain.
It should be understood that steps can be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present disclosure can be executed in parallel, sequentially, or in different orders, as long as the results expected by the technical solutions of the present disclosure can be achieved; no limitation is imposed herein.
The above specific implementations do not constitute a limitation on the protection scope of the present disclosure. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present disclosure shall be included within the protection scope of the present disclosure.

Claims (15)

  1. A training method for a positioning map acquisition model, comprising:
    inputting a sample image into the positioning map acquisition model for category recognition, and obtaining a positioning map of each recognized category;
    obtaining label information of the sample image, and obtaining a loss function corresponding to each of the categories according to the label information of the sample image and pixel values of the positioning map of each of the categories;
    adjusting the positioning map acquisition model in reverse based on the loss function corresponding to each of the categories, and returning to continue training the adjusted positioning map acquisition model with a next sample image until training ends and a target positioning map acquisition model is generated.
  2. The method according to claim 1, wherein obtaining the loss function corresponding to each of the categories according to the label information of the sample image and the pixel values of the positioning map of each of the categories comprises:
    for each of the categories, obtaining a pixel mean of the positioning map according to the pixel values of the positioning map of the category;
    obtaining a label value of the sample image on the category based on the label information of the sample image; and
    determining the loss function of the category according to the pixel mean and the label value.
  3. The method according to claim 1 or 2, wherein adjusting the positioning map acquisition model in reverse based on the loss function corresponding to each of the categories comprises:
    summing the loss functions corresponding to all of the categories to obtain a first loss function of the positioning map acquisition model;
    obtaining a second loss function of the positioning map acquisition model;
    determining a total loss function of the positioning map acquisition model based on the first loss function and the second loss function; and
    determining gradient information of the positioning map acquisition model based on the total loss function, and adjusting the positioning map acquisition model in reverse based on the gradient information.
  4. The method according to claim 2, wherein obtaining the pixel mean of the positioning map according to the pixel values of the positioning map comprises:
    constraining the pixel value of each position point in the positioning map to be within a target value range; and
    obtaining the pixel mean of the positioning map based on the constrained pixel value corresponding to each position point within the target value range and a resolution of the positioning map.
  5. The method according to claim 4, wherein constraining the pixel value of each position point in the positioning map to be within the target value range comprises:
    for the pixel value of any position point on the positioning map, comparing the pixel value with a hyperparameter specified in the positioning map acquisition model, and selecting the minimum of the pixel value and the hyperparameter as the constrained pixel value corresponding to the position point, wherein the hyperparameter is used to determine an upper limit of the target value range.
  6. The method according to claim 1, wherein obtaining the positioning map of each recognized category comprises:
    for each category, obtaining the positioning map of the category based on a classification weight vector in the positioning map acquisition model corresponding to the category and a feature vector of the sample image.
  7. A training apparatus for a positioning map acquisition model, comprising:
    a first acquisition module configured to input a sample image into the positioning map acquisition model for category recognition and obtain a positioning map of each recognized category;
    a second acquisition module configured to obtain label information of the sample image and obtain a loss function corresponding to each of the categories according to the label information of the sample image and pixel values of the positioning map of each of the categories; and
    an adjustment module configured to adjust the positioning map acquisition model in reverse based on the loss function corresponding to each of the categories, and to return to continue training the adjusted positioning map acquisition model with a next sample image until training ends and a target positioning map acquisition model is generated.
  8. The apparatus according to claim 7, wherein the second acquisition module is further configured to:
    for each of the categories, obtain a pixel mean of the positioning map according to the pixel values of the positioning map of the category;
    obtain a label value of the sample image on the category based on the label information of the sample image; and
    determine the loss function of the category according to the pixel mean and the label value.
  9. The apparatus according to claim 7 or 8, wherein the adjustment module is further configured to:
    sum the loss functions corresponding to all of the categories to obtain a first loss function of the positioning map acquisition model;
    obtain a second loss function of the positioning map acquisition model;
    determine a total loss function of the positioning map acquisition model based on the first loss function and the second loss function; and
    determine gradient information of the positioning map acquisition model based on the total loss function, and adjust the positioning map acquisition model in reverse based on the gradient information.
  10. The apparatus according to claim 8, wherein the second acquisition module is further configured to:
    constrain the pixel value of each position point in the positioning map to be within a target value range; and
    obtain the pixel mean of the positioning map based on the constrained pixel value corresponding to each position point within the target value range and a resolution of the positioning map.
  11. The apparatus according to claim 10, wherein the second acquisition module is further configured to:
    for the pixel value of any position point on the positioning map, compare the pixel value with a hyperparameter specified in the positioning map acquisition model, and select the minimum of the pixel value and the hyperparameter as the constrained pixel value corresponding to the position point, wherein the hyperparameter is used to determine an upper limit of the target value range.
  12. The apparatus according to claim 7, wherein the first acquisition module is further configured to:
    for each category, obtain the positioning map of the category based on a classification weight vector in the positioning map acquisition model corresponding to the category and a feature vector of the sample image.
  13. An electronic device, comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to execute the method according to any one of claims 1-6.
  14. A non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause a computer to execute the method according to any one of claims 1-6.
  15. A computer program product, comprising a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-6.
PCT/CN2021/106885 2021-03-09 2021-07-16 Training method and apparatus for positioning map acquisition model WO2022188327A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110258523.6 2021-03-09
CN202110258523.6A CN113033549B (zh) 2021-03-09 2021-03-09 Training method and apparatus for positioning map acquisition model

Publications (1)

Publication Number Publication Date
WO2022188327A1 (zh)

Family

ID=76468937

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/106885 WO2022188327A1 (zh) 2021-03-09 2021-07-16 Training method and apparatus for positioning map acquisition model

Country Status (2)

Country Link
CN (1) CN113033549B (zh)
WO (1) WO2022188327A1 (zh)


Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113033549B (zh) 2021-03-09 2022-09-20 北京百度网讯科技有限公司 Training method and apparatus for positioning map acquisition model
CN113344822B (zh) 2021-06-29 2022-11-18 展讯通信(上海)有限公司 Image noise reduction method, apparatus, terminal, and storage medium
CN113642740B (zh) 2021-08-12 2023-08-01 百度在线网络技术(北京)有限公司 Model training method and apparatus, electronic device, and medium
CN113901911B (zh) 2021-09-30 2022-11-04 北京百度网讯科技有限公司 Image recognition and model training methods and apparatuses, electronic device, and storage medium
CN114612732A (zh) 2022-05-11 2022-06-10 成都数之联科技股份有限公司 Sample data augmentation method, system, apparatus, and medium, and target classification method
CN115049878B (zh) 2022-06-17 2024-05-03 平安科技(深圳)有限公司 Artificial-intelligence-based target detection optimization method, apparatus, device, and medium


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902798A (zh) 2018-05-31 2019-06-18 华为技术有限公司 Training method and apparatus for a deep neural network
CN109784424B (zh) 2019-03-26 2021-02-09 腾讯科技(深圳)有限公司 Image classification model training method, image processing method, and apparatus
CN110334807B (zh) 2019-05-31 2021-09-28 北京奇艺世纪科技有限公司 Training method, apparatus, device, and storage medium for a deep learning network
CN111723815B (zh) 2020-06-23 2023-06-30 中国工商银行股份有限公司 Model training method, image processing method, apparatus, computer system, and medium
CN111739027B (zh) 2020-07-24 2024-04-26 腾讯科技(深圳)有限公司 Image processing method, apparatus, device, and readable storage medium
CN112183635A (zh) 2020-09-29 2021-01-05 南京农业大学 Method for plant leaf lesion segmentation and recognition using a multi-scale deconvolution network

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10102444B2 (en) * 2016-11-22 2018-10-16 Lunit Inc. Object recognition method and apparatus based on weakly supervised learning
CN111950579A (zh) 2019-05-17 2020-11-17 北京京东尚科信息技术有限公司 Training method and training apparatus for a classification model
CN111046939A (zh) 2019-12-06 2020-04-21 中国人民解放军战略支援部队信息工程大学 Attention-based CNN class activation map generation method
CN111639755A (zh) 2020-06-07 2020-09-08 电子科技大学中山学院 Network model training method and apparatus, electronic device, and storage medium
CN113033549A (zh) 2021-03-09 2021-06-25 北京百度网讯科技有限公司 Training method and apparatus for positioning map acquisition model

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115830640A (zh) 2022-12-26 2023-03-21 北京百度网讯科技有限公司 Human body posture recognition and model training method, apparatus, device, and medium
CN115830640B (zh) 2022-12-26 2024-03-05 北京百度网讯科技有限公司 Human body posture recognition and model training method, apparatus, device, and medium

Also Published As

Publication number Publication date
CN113033549A (zh) 2021-06-25
CN113033549B (zh) 2022-09-20

Similar Documents

Publication Publication Date Title
WO2022188327A1 (zh) Training method and apparatus for positioning map acquisition model
US11977967B2 (en) Memory augmented generative temporal models
CN109993102B (zh) Similar face retrieval method, apparatus, and storage medium
WO2019155064A1 (en) Data compression using jointly trained encoder, decoder, and prior neural networks
CN113361578B (zh) Training method and apparatus for an image processing model, electronic device, and storage medium
CN112949710A (zh) Image clustering method and apparatus
CN108564102A (zh) Method and apparatus for evaluating image clustering results
US20220188636A1 (en) Meta pseudo-labels
WO2022227759A1 (zh) Image category recognition method and apparatus, and electronic device
CN116596916B (zh) Defect detection model training and defect detection method and apparatus
CN113963148B (zh) Object detection method, and training method and apparatus for an object detection model
CN114842343A (zh) ViT-based aerial image recognition method
CN113947140A (zh) Training method for a facial feature extraction model and facial feature extraction method
CN114971375A (zh) Artificial-intelligence-based assessment data processing method, apparatus, device, and medium
CN114494776A (zh) Model training method, apparatus, device, and storage medium
CN114972910A (zh) Training method and apparatus for an image-text recognition model, electronic device, and storage medium
JP7081454B2 (ja) Processing apparatus, processing method, and processing program
CN114913339A (zh) Training method and apparatus for a feature map extraction model
CN114067099A (zh) Training method for a student image recognition network and image recognition method
WO2023273695A1 (zh) Method and apparatus for identifying missed product defects, electronic device, and storage medium
Varshneya et al. Learning interpretable concept groups in CNNs
CN114881227A (zh) Model compression method, image processing method, apparatus, and electronic device
JP7081455B2 (ja) Learning apparatus, learning method, and learning program
CN110276760B (zh) Image scene segmentation method, terminal, and storage medium
CN113205131A (zh) Image data processing method and apparatus, roadside device, and cloud control platform

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21929796

Country of ref document: EP

Kind code of ref document: A1