WO2023056802A1 - Image classification method, device, medium and system for maximizing mutual information - Google Patents

Info

Publication number
WO2023056802A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
image classification
mutual information
image
parameters
Prior art date
Application number
PCT/CN2022/116045
Other languages
English (en)
French (fr)
Inventor
戴文睿
王曜明
刘育辰
李成林
邹君妮
熊红凯
Original Assignee
上海交通大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海交通大学
Publication of WO2023056802A1 (zh)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Definitions

  • the invention relates to the technical fields of artificial intelligence and image processing, in particular to an image classification method for maximizing mutual information, computer equipment and a readable storage medium thereof.
  • the big data-driven approach enables the network to automatically learn the most suitable feature extraction filter bank for image classification targets.
  • the emergence of neural networks has abandoned the manual design of feature extraction operators, and has achieved performance beyond human experts in image classification tasks.
  • although the emergence of neural networks eliminates the need to design feature extraction operators by hand, reduces human effort, and improves image classification performance, the architecture of the neural network itself still relies on manual design and construction.
  • the image classification method that maximizes mutual information gives the machine the ability to design, on its own, a neural network for a specific image classification task, and it can automatically design a well-performing neural network model in a very short time.
  • this provides a more efficient and convenient way to build neural networks for image classification tasks in industrial applications, for example efficient application to different kinds of image data (natural images, medical images, etc.) and efficient deployment on devices with different computing resources (central servers, edge devices, etc.).
  • the existing image classification methods based on neural networks still have the following deficiencies: (1) for specific image data, experts design neural-network-based image classification methods by hand, and the design process is complicated and consumes a lot of human and computing resources; (2) because expert knowledge is limited, a manually designed neural-network-based image classification method is not the best possible neural network, and its image classification performance leaves considerable room for improvement; (3) existing image classification methods that design neural networks automatically have a high computational cost, take a long time, and also leave room for improvement in performance.
  • the present invention provides an image classification method that maximizes mutual information, which can automatically determine the network structure and parameters of the neural network according to the given image data for image classification, greatly reducing the design time and human resource consumption, while achieving higher image classification accuracy.
  • an image classification method that maximizes mutual information, comprising: collecting training images; maximizing the mutual information between the training images and the neural network structure to automatically determine the network structure and parameters of the neural network;
  • and using the obtained neural network to process the image data to be classified to obtain an image classification result.
  • the collected training images are divided into two parts; maximizing the mutual information between the training images and the neural network structure to automatically determine the network structure and parameters of the neural network includes:
  • constructing a supernetwork and a structure generation network, performing data processing on each to obtain the network parameters of the supernetwork and the parameters of the structure generation network, and constructing the target network;
  • said constructing of the supernetwork and the structure generation network, performing data processing on each to obtain the network parameters of the supernetwork and the parameters of the structure generation network, and constructing the target network, includes:
  • the supernetwork is formed by stacking basic units that contain all possible image classification operations;
  • the output of the structure generation network and the sampled noise are summed and used as the structural parameters of the supernetwork;
  • the lower bound of the mutual information is the cross-entropy loss between the posterior distribution of the structural parameters and the posterior distribution of the image data; this cross-entropy loss is computed, and the parameters of the structure generation network are updated by gradient descent;
  • the structure generation network is a neural network formed by stacking convolutional layers, rectified linear units (ReLU), and batch normalization layers.
  • the input of the structure generation network is a sample ∈ ~ N(0,1) drawn from the standard Gaussian distribution;
  • the output of the structure generation network is obtained by feeding the sample ∈ into the structure generation network and running forward propagation.
  • the structural parameters of the supernetwork are obtained by sampling noise ε ~ N(0,1) from the standard Gaussian distribution and adding it to the output of the structure generation network. That is, the structural parameters of the supernetwork are the sum of two parts: one part is generated by the structure generation network from a sample of the standard Gaussian distribution with mean 0 and variance 1; the other part is noise sampled from the standard Gaussian distribution.
  • the mutual information between the image data and the structural parameters of the supernetwork is defined by a formula (not reproduced in this extract) in which D denotes the given training image data, A denotes the structural parameters of the supernetwork, and a further symbol denotes the network parameters that describe the dataset D given the structural parameters A.
  • the lower bound of the mutual information is given by a formula (not reproduced in this extract) in which:
  • H(D) denotes the information entropy of the training image data D;
  • a joint probability distribution of the training data and the structural parameters of the supernetwork appears;
  • q_θ(D|A) denotes the variational distribution used to approximate the true posterior distribution of the data.
  • the cross-entropy loss between the posterior distribution of the structural parameters and the posterior distribution of the image data is given by a formula (not reproduced in this extract) that involves the conditional probability distribution of the structural parameters of the supernetwork given the training image data D, and q_θ(D|A), the variational conditional probability distribution of the training image data given the structural parameters of the supernetwork.
  • stacking the updated basic units to construct the target network includes: drawing a sample from the standard Gaussian distribution, feeding it into the structure generation network, obtaining the structural parameters of the supernetwork as output, and, according to these structural parameters, selecting from the supernetwork built from the basic units a specified number of image classification operations with the largest corresponding parameter values, which finally yields the target network.
  • a computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the above image classification method that maximizes mutual information is implemented when the processor executes the program.
  • a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the above-mentioned image classification method for maximizing mutual information is realized.
  • a chip system including a processor coupled to a memory, the memory storing program instructions, wherein the above image classification method that maximizes mutual information is implemented when the program instructions stored in the memory are executed by the processor.
  • the present invention has at least one beneficial effect as follows:
  • based on the given image data, the image classification method of the present invention automatically designs and determines the network structure and parameters of the neural network by maximizing mutual information, for use in image classification, without complicated manual design, and it saves human and computing resources.
  • the above-mentioned image classification method for maximizing mutual information of the present invention can automatically design and obtain an image classification method based on a neural network in a very short period of time, and at the same time can achieve a higher accuracy rate of image classification.
  • Fig. 1 is a flowchart of an image classification method for maximizing mutual information in an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a supernetwork of basic unit (cell) stacking according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of a target structural unit finally obtained through continuous training and iterative updating of network parameters of a hypernetwork and parameters of a structure generation network according to an embodiment of the present invention.
  • the present invention provides an embodiment, an image classification method that maximizes mutual information, including:
  • the obtained neural network is used to process the image data to be classified to obtain the image classification result.
  • maximizing the mutual information between the training images and the neural network structure to automatically determine the network structure and parameters of the neural network includes:
  • constructing a supernetwork and a structure generation network, performing data processing on each to obtain the network parameters of the supernetwork and the parameters of the structure generation network, and constructing the target network;
  • inputting all training images into the target network to generate predicted image category labels, computing the image classification cross-entropy loss from the predicted and true image category labels, and training the target network until convergence, for use in image classification.
  • constructing the supernetwork and the structure generation network, performing data processing on each to obtain the network parameters of the supernetwork and the parameters of the structure generation network, and constructing the target network, includes:
  • S1 according to all possible operations of image classification, use basic units to construct a super network, where the super network is stacked by basic units that include all possible image classification operations;
  • the super network is composed of a series of basic units stacked according to certain rules.
  • Each basic unit includes multiple nodes.
  • for any two connected nodes, one node serves as the input, and its output is passed to the connected node through the operation on the connecting edge.
  • each basic unit contains two input nodes, four intermediate nodes and one output node. Except that the two input nodes are not connected to each other and the output node is the concatenation of the four intermediate nodes, all nodes are pairwise connected, giving fourteen candidate edges, each of which carries all eight candidate operations.
  • based on the input image data, the mutual-information-maximization method obtains the cross-entropy loss between the posterior distribution of the structural parameters and the posterior distribution of the image data, iteratively updates the parameters of the structure generation network through training, and finally determines the structure of the basic unit (the choice of candidate operations) from the supernetwork.
  • the basic units are stacked into a super network as shown in FIG. 2 , wherein the basic units are divided into two types: Normal units and Reduction units.
  • all candidate operations in a Normal unit have stride 1, so the output of a Normal unit has the same dimensions as its input; the operations adjacent to the input nodes in a Reduction unit have stride 2, so the input dimensions are halved after passing through the unit.
  • a reasonable overall structure is set for the super network, and after the basic units are obtained through network structure search, they are stacked to form a final usable overall network structure.
  • the convolution kernel slides over the input image to perform the convolution operation, and new image features are extracted by the convolution filter; multiple convolution kernels can be used to extract image features from multiple perspectives.
  • the resulting feature map is fed into a fully connected layer, which maps the extracted image features to the corresponding image category labels and yields the predicted category label of the image; the image classification cross-entropy loss is computed against the true category label of the image, and the network parameters of the supernetwork are then updated by gradient descent.
  • Network parameters refer to the parameters of all image classification operations in all basic units, such as the weight parameters of convolution kernels, etc.
  • S4, input the second part of the training images into the supernetwork, maximize the mutual information between the image data and the structural parameters of the supernetwork, and determine the lower bound of the mutual information, wherein: the lower bound of the mutual information is the cross-entropy loss between the posterior distribution of the structural parameters and the posterior distribution of the image data; compute this cross-entropy loss and update the parameters of the structure generation network by gradient descent.
  • the structure generation network is a neural network formed by stacking convolutional layers, rectified linear units (ReLU), and batch normalization layers. A sample ∈ ~ N(0,1) is drawn from the standard Gaussian distribution N(0,1) and used as the input of the structure generation network, whose output is obtained by forward propagation. Noise ε ~ N(0,1) is then sampled from the standard Gaussian distribution and added to the output of the structure generation network; the sum takes part in training as the structural parameters of the supernetwork.
  • the mutual information between the image data and the structural parameters of the supernetwork is defined by a formula (not reproduced in this extract) in which D denotes the given training image data, A denotes the structural parameters of the supernetwork, and a further symbol denotes the network parameters that describe the dataset D given the structural parameters A.
  • the lower bound of the mutual information is given by a formula (not reproduced in this extract) in which H(D) denotes the information entropy of the training image data D, a joint probability distribution of the training data and the structural parameters of the supernetwork appears, and q_θ(D|A) denotes the variational distribution used to approximate the true posterior distribution of the data.
  • the cross-entropy loss between the posterior distribution of the structural parameters and the posterior distribution of the image data is given by a formula (not reproduced in this extract) that involves the conditional probability distribution of the structural parameters of the supernetwork given the training image data D, and q_θ(D|A), the variational conditional probability distribution of the training image data given the structural parameters of the supernetwork.
  • stacking the updated basic units to build the target network includes: drawing a sample from the standard Gaussian distribution, feeding it into the structure generation network, and obtaining the structural parameters of the supernetwork as output; according to these structural parameters, a specified number of image classification operations with the largest corresponding parameter values are selected from the supernetwork built from the basic units, which finally yields the target network.
  • a computer device including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the above image classification method that maximizes mutual information is implemented when the processor executes the program.
  • a computer-readable storage medium on which a computer program is stored, which is characterized in that, when the program is executed by a processor, the above-mentioned image classification method for maximizing mutual information is realized.
  • a chip system including a processor coupled to a memory, the memory storing program instructions, characterized in that the above image classification method that maximizes mutual information is implemented when the program instructions stored in the memory are executed by the processor.
  • the present invention also provides a schematic flowchart of another preferred embodiment, refer to FIG. 1 .
  • Image classification methods that maximize mutual information including:
  • S1 according to all possible operations of image classification, use basic units to construct a super network, where the super network is stacked by basic units that include all possible image classification operations;
  • the mutual information between the image data and the structural parameters of the supernetwork is defined by a formula (not reproduced in this extract) in which D denotes the given training image data, A denotes the structural parameters of the supernetwork, and a further symbol denotes the network parameters that can accurately describe the dataset D given the structural parameters A.
  • an equivalent rewriting of the formula gives an expression (not reproduced in this extract) in which:
  • H(D) denotes the information entropy of the training image data D;
  • a joint probability distribution of the training data and the structural parameters of the supernetwork appears;
  • q_θ(D|A) denotes the variational distribution used to approximate the true posterior distribution of the data.
  • regarding the cross-entropy loss between the posterior distribution of the structural parameters and the posterior distribution of the image data: for given training image data, the information entropy H(D) is a constant that does not depend on the model parameters, so optimizing the lower bound of the mutual information is equivalent to optimizing a single expectation term (formula not reproduced in this extract).
  • the formula is the cross-entropy loss of the posterior distribution of the structural parameters and the posterior distribution of the image data.
  • the cross-entropy loss is optimized using gradient descent to update the parameters of the structure generation network.
  • S5 input all the training images into the target network, generate predicted image category labels, calculate the cross-entropy loss of image classification according to the predicted image category labels and real image category labels, train the target network until convergence, and use it for image classification.
  • the target network is trained based on the criterion of reducing the cross-entropy loss, and the target network with the smallest cross-entropy loss is finally obtained through training.
  • the target structural unit is finally obtained: the left is the normal cell (Normal cell), and the right is the reduction cell (Reduction cell).
  • the target structural unit is output to the image classification module as a result of maximizing mutual information.
  • the target structural unit shown in Fig. 3 is the converged structure obtained after the image classification method that maximizes mutual information has been trained for 1 epoch; the method needs only 0.007 GPU-days to automatically design the target optimized network for image classification.
  • the target optimized network achieves an image classification test error (%) of 2.45 ± 0.04 on CIFAR-10 with 3.9M parameters, and reaches image classification test error rates of 15.80% and 24.0% when transferred to the CIFAR-100 and ImageNet datasets, respectively.
  • the automatic design process of this image classification method runs in about 0.09 hours on a single Nvidia 1080Ti GPU. Compared with DARTS (9.6 hours) and SNAS (36 hours), this significantly improves the design efficiency of the image classification method while also achieving high image classification accuracy, which supports the practicality and stability of the method.
  • the detailed comparison results are shown in Tables 1 to 4: Table 1 compares the search results of the embodiment of the present invention on the CIFAR-10 dataset, Table 2 on the ImageNet dataset, Table 3 on the NAS-Bench-201 benchmark, and Table 4 on the search spaces S1-S4. In the tables, a lower error rate means better performance, and a smaller parameter count and computation cost mean a more efficient structure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

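The present invention provides an image classification method, device, medium and system for maximizing mutual information, comprising: collecting training images; maximizing the mutual information between the training images and the neural network structure to automatically determine the network structure and parameters of the neural network; and processing the image data to be classified with the obtained neural network to obtain an image classification result. Based on the given image data, the invention automatically designs and determines the network structure and parameters of the neural network by maximizing mutual information, for use in image classification, without complicated manual design, saving human and computing resources. The invention can automatically design a neural-network-based image classification method in a very short time while achieving high image classification accuracy.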

Description

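Image classification method, device, medium and system for maximizing mutual information
Technical Field
The present invention relates to the technical fields of artificial intelligence and image processing, and in particular to an image classification method that maximizes mutual information, together with a computer device and a readable storage medium therefor.
Background Art
With the growth of big data and computing power, artificial intelligence has advanced rapidly in recent years, and industrial demand for image processing applications such as image classification keeps rising. Early image processing was limited by acquisition equipment and worked with low-quality images. It relied mainly on handcrafted feature extraction: handcrafted feature extraction operators such as HoG and signal processing methods such as wavelet analysis drove the early development of the field. However, handcrafted operators are designed by scientists from their prior understanding and analysis of the given images, so they inevitably carry the bias of human priors, and their performance on image classification tasks could never surpass that of humans. Neural networks instead train the feature extractor and the classifier jointly, end to end, and the data-driven approach lets the network automatically learn the feature extraction filter bank best suited to the image classification objective. The emergence of neural networks did away with handcrafted feature extraction operators and achieved performance beyond human experts on image classification tasks. Although neural networks removed the need to design feature extraction operators by hand, reduced human effort, and improved image classification performance, the architecture of the neural network itself still relies on manual design and construction.
Over the past decade or so, manually designed neural networks have performed well on image processing tasks such as image classification, but increasingly complex networks and increasingly diverse classification requirements have made neural network design tedious, inefficient, and costly in human and computing resources. The image classification method that maximizes mutual information gives the machine the ability to design, on its own, a neural network for a specific image classification task, and it can automatically design a well-performing neural network model in a very short time. This provides a more efficient and convenient way to build neural networks for image classification tasks in industrial applications, for example efficient application to different kinds of image data (natural images, medical images, etc.) and efficient deployment on devices with different computing resources (central servers, edge devices, etc.). With the growth of big data and computing power, neural-network-based image classification methods have attracted attention and developed in recent years. For example, architectures such as Googlenet, Resnet and Densenet already exceed human image classification accuracy on the natural image dataset ImageNet, but the heavy human effort, the long network design time, and the tie to a specific dataset make neural-network-based image classification methods poorly suited to practical use. A new neural-network-based image classification method is therefore needed that greatly shortens neural network design time while maintaining high performance, so that image classification methods can be deployed efficiently in real industrial applications.
In addition, existing neural-network-based image classification methods have the following deficiencies: (1) for specific image data, experts design neural-network-based image classification methods by hand, and the design process is complicated and consumes a lot of human and computing resources; (2) because expert knowledge is limited, a manually designed neural-network-based image classification method is not the best possible neural network, and its image classification performance leaves considerable room for improvement; (3) existing image classification methods that design neural networks automatically have a high computational cost, take a long time, and also leave room for improvement in performance.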
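Summary of the Invention
In view of the deficiencies of the prior art, the present invention provides an image classification method that maximizes mutual information, which can automatically determine the network structure and parameters of a neural network from the given image data for image classification, greatly reducing design time and human effort while achieving higher image classification accuracy.
According to one aspect of the present invention, an image classification method that maximizes mutual information is provided, comprising:
collecting training images;
maximizing the mutual information between the training images and the neural network structure to automatically determine the network structure and parameters of the neural network;
processing the image data to be classified with the obtained neural network to obtain an image classification result.
Preferably, the collected training images are divided into two parts; maximizing the mutual information between the training images and the neural network structure to automatically determine the network structure and parameters of the neural network comprises:
constructing a supernetwork and a structure generation network, performing data processing on each to obtain the network parameters of the supernetwork and the parameters of the structure generation network, and constructing a target network;
inputting all training images into the target network to generate predicted image category labels, computing the image classification cross-entropy loss from the predicted and true image category labels, and training the target network until convergence, for use in image classification.
Preferably, constructing the supernetwork and the structure generation network, performing data processing on each to obtain the network parameters of the supernetwork and the parameters of the structure generation network, and constructing the target network comprises:
S1, constructing basic units based on all possible operations for image classification;
then constructing the supernetwork from the basic units, wherein:
the supernetwork is formed by stacking basic units that contain all possible image classification operations;
S2, constructing the structure generation network based on a convolutional neural network, sampling from the standard Gaussian distribution, using the sampled value as the input of the structure generation network, and obtaining the output of the structure generation network by forward propagation;
then sampling noise from the standard Gaussian distribution;
summing the output of the structure generation network and the sampled noise to obtain the structural parameters of the supernetwork;
S3, inputting the first part of the training images into the supernetwork to generate predicted category labels;
computing the image classification cross-entropy loss from the predicted category labels and the true category labels;
updating the network parameters of the supernetwork by gradient descent according to the image classification cross-entropy loss;
S4, inputting the second part of the training images into the supernetwork, maximizing the mutual information between the image data and the structural parameters of the supernetwork, and determining a lower bound of the mutual information, wherein:
the lower bound of the mutual information is the cross-entropy loss between the posterior distribution of the structural parameters and the posterior distribution of the image data; this cross-entropy loss is computed, and the parameters of the structure generation network are updated by gradient descent;
repeating S2-S4 to iteratively update the network parameters of the supernetwork and the parameters of the structure generation network until convergence, and stacking the updated basic units to construct the target network.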
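Preferably, the structure generation network is a neural network formed by stacking convolutional layers, rectified linear units (ReLU), and batch normalization layers
Figure PCTCN2022116045-appb-000001
The input of the structure generation network is a sample ∈ ~ N(0,1) drawn from the standard Gaussian distribution;
the output of the structure generation network is obtained by feeding the sample ∈ into the structure generation network
Figure PCTCN2022116045-appb-000002
whose forward propagation gives the output
Figure PCTCN2022116045-appb-000003
The structural parameters of the supernetwork are obtained by sampling noise ε ~ N(0,1) from the standard Gaussian distribution and adding it to the output of the structure generation network, giving
Figure PCTCN2022116045-appb-000004
That is, the structural parameters of the supernetwork are the sum of two parts: one part is generated by the structure generation network from a sample of the standard Gaussian distribution with mean 0 and variance 1; the other part is noise sampled from the standard Gaussian distribution.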
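Preferably, the mutual information between the image data and the structural parameters of the supernetwork is
Figure PCTCN2022116045-appb-000005
where D denotes the given training image data, A denotes the structural parameters of the supernetwork, and
Figure PCTCN2022116045-appb-000006
denotes the network parameters that describe the dataset D given the structural parameters A.
Preferably, the lower bound of the mutual information is
Figure PCTCN2022116045-appb-000007
where:
H(D) denotes the information entropy of the training image data D,
Figure PCTCN2022116045-appb-000008
denotes the joint probability distribution of the training data and the structural parameters of the supernetwork, and q_θ(D|A) denotes the variational distribution used to approximate the true posterior distribution of the data
Figure PCTCN2022116045-appb-000009
.
Preferably, the cross-entropy loss between the posterior distribution of the structural parameters and the posterior distribution of the image data is
Figure PCTCN2022116045-appb-000010
where
Figure PCTCN2022116045-appb-000011
denotes the conditional probability distribution of the structural parameters of the supernetwork given the training image data D, and q_θ(D|A) denotes the variational conditional probability distribution of the training image data given the structural parameters of the supernetwork.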
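Preferably, stacking the updated basic units to construct the target network comprises: drawing a sample from the standard Gaussian distribution, feeding it into the structure generation network, obtaining the structural parameters of the supernetwork as output, and, according to these structural parameters, selecting from the supernetwork built from the basic units a specified number of image classification operations with the largest corresponding parameter values, which finally yields the target network.
According to a second aspect of the present invention, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the above image classification method that maximizes mutual information when executing the program.
According to a third aspect of the present invention, a computer-readable storage medium is provided, on which a computer program is stored, wherein the program implements the above image classification method that maximizes mutual information when executed by a processor.
According to a fourth aspect of the present invention, a chip system is provided, comprising a processor coupled to a memory, the memory storing program instructions, wherein the above image classification method that maximizes mutual information is implemented when the program instructions stored in the memory are executed by the processor.
Compared with the prior art, the present invention has at least one of the following beneficial effects:
Based on the given image data, the image classification method of the present invention automatically designs and determines the network structure and parameters of the neural network by maximizing mutual information, for use in image classification, without complicated manual design, and it saves human and computing resources.
The image classification method of the present invention can automatically design a neural-network-based image classification method in a very short time while achieving high image classification accuracy.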
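Brief Description of the Drawings
Other features, objects and advantages of the present invention will become more apparent from the detailed description of non-limiting embodiments given with reference to the following drawings:
Fig. 1 is a flowchart of the image classification method that maximizes mutual information in an embodiment of the present invention;
Fig. 2 is a schematic diagram of a supernetwork formed by stacking basic units (cells) according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the target structural units finally obtained by iteratively updating, through continued training, the network parameters of the supernetwork and the parameters of the structure generation network according to an embodiment of the present invention.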
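Detailed Description
The present invention is described in detail below with reference to specific embodiments. The following embodiments will help those skilled in the art to further understand the present invention, but do not limit it in any way. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of the present invention, all of which fall within its scope of protection.
The present invention provides an embodiment, an image classification method that maximizes mutual information, comprising:
collecting training images;
maximizing the mutual information between the training images and the neural network structure to automatically determine the network structure and parameters of the neural network;
processing the image data to be classified with the obtained neural network to obtain an image classification result.
As a preferred embodiment, maximizing the mutual information between the training images and the neural network structure to automatically determine the network structure and parameters of the neural network comprises:
constructing a supernetwork and a structure generation network, performing data processing on each to obtain the network parameters of the supernetwork and the parameters of the structure generation network, and constructing a target network;
inputting all training images into the target network to generate predicted image category labels, computing the image classification cross-entropy loss from the predicted and true image category labels, and training the target network until convergence, for use in image classification.
As a preferred embodiment, constructing the supernetwork and the structure generation network, performing data processing on each to obtain the network parameters of the supernetwork and the parameters of the structure generation network, and constructing the target network comprises:
S1, using basic units to construct the supernetwork according to all possible operations for image classification, wherein the supernetwork is formed by stacking basic units that contain all possible image classification operations;
The supernetwork is built by stacking a series of basic units according to certain rules. Each basic unit contains multiple nodes; all possible inter-layer connections (edges) exist between nodes, and all possible image classification operations are defined on each edge, including convolution, pooling, skip-connect, and so on. For any two connected nodes, one node serves as the input, and its output is passed to the connected node through the operation on the connecting edge. In the supernetwork, each basic unit contains two input nodes, four intermediate nodes and one output node. Except that the two input nodes are not connected to each other and the output node is the concatenation of the four intermediate nodes, all nodes are pairwise connected, giving fourteen candidate edges, each of which carries all eight candidate operations. Based on the input image data, the mutual-information-maximization method obtains the cross-entropy loss between the posterior distribution of the structural parameters and the posterior distribution of the image data, iteratively updates the parameters of the structure generation network through training, and finally determines the structure of the basic unit (the choice of candidate operations) from the supernetwork.
As shown in Fig. 2, in one embodiment the basic units are stacked into a supernetwork as illustrated, where the basic units fall into two types: Normal units and Reduction units. All candidate operations in a Normal unit have stride 1, so the output of a Normal unit has the same dimensions as its input; the operations adjacent to the input nodes in a Reduction unit have stride 2, so the input dimensions are halved after passing through the unit. This embodiment sets a reasonable overall structure for the supernetwork; once the basic units have been obtained by network structure search, they are stacked into the final usable overall network structure.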
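The edge count stated for one basic unit can be checked with a short sketch. The node labels and the helper `cell_edges` are illustrative assumptions, not identifiers from the patent; the sketch assumes each intermediate node receives an edge from both input nodes and from every earlier intermediate node, which is one reading of "pairwise connected" that reproduces the stated fourteen edges.

```python
def cell_edges(num_inputs=2, num_intermediate=4):
    """Enumerate candidate edges (src, dst) of one basic unit (cell).

    Assumption: every intermediate node receives an edge from each input
    node and from each earlier intermediate node. Input nodes are not
    connected to each other, and the output node only concatenates the
    intermediate nodes, so it contributes no searchable edges.
    """
    nodes = [f"in{i}" for i in range(num_inputs)]
    edges = []
    for j in range(num_intermediate):
        dst = f"mid{j}"
        edges.extend((src, dst) for src in nodes)
        nodes.append(dst)  # later intermediate nodes may read from this one
    return edges


NUM_OPS = 8  # candidate operations per edge, as stated in the text

edges = cell_edges()
num_structure_params = len(edges) * NUM_OPS  # one parameter per (edge, operation)
print(len(edges), num_structure_params)
```

Under this assumption each cell type carries one structural parameter per edge and operation, so the structure generation network would need to emit that many values per cell.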
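S2, constructing the structure generation network based on a convolutional neural network, sampling from the standard Gaussian distribution, using the sampled value as the input of the structure generation network, and obtaining the output of the structure generation network by forward propagation; then sampling noise from the standard Gaussian distribution; summing the output of the structure generation network and the sampled noise to obtain the structural parameters of the supernetwork;
S3, inputting the first part of the training images into the supernetwork to generate predicted category labels; computing the image classification cross-entropy loss from the predicted and true category labels; updating the network parameters of the supernetwork by gradient descent according to the image classification cross-entropy loss;
The image data is first passed through the convolutional layers, rectified linear units (ReLU), and batch normalization layers of the neural network to obtain a feature map: the convolution kernel slides over the input image to perform the convolution operation, and new image features are extracted by the convolution filter, where multiple convolution kernels can be used to extract image features from multiple perspectives. The resulting feature map is then fed into a fully connected layer, which maps the extracted image features to the corresponding image category labels and yields the predicted category label of the image. The image classification cross-entropy loss is computed against the true category label of the image, and the network parameters of the supernetwork are then updated by gradient descent.
Network parameters are the parameters of all image classification operations in all basic units, such as the weights of the convolution kernels.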
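As a concrete illustration of the image classification cross-entropy loss used in S3, the following is a minimal sketch in plain Python. It assumes the fully connected layer outputs one raw score (logit) per class and that labels are integer class indices; the function names are hypothetical and the patent does not prescribe this exact formulation.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class index."""
    return -math.log(softmax(logits)[label])


def batch_cross_entropy(batch_logits, labels):
    """Mean cross-entropy over a batch of (logits, label) pairs."""
    losses = [cross_entropy(z, y) for z, y in zip(batch_logits, labels)]
    return sum(losses) / len(losses)
```

A confident correct prediction gives a loss near zero, while a uniform prediction over two classes gives log 2; gradient descent on the network parameters lowers this quantity.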
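S4, inputting the second part of the training images into the supernetwork, maximizing the mutual information between the image data and the structural parameters of the supernetwork, and determining a lower bound of the mutual information, wherein: the lower bound of the mutual information is the cross-entropy loss between the posterior distribution of the structural parameters and the posterior distribution of the image data; this cross-entropy loss is computed, and the parameters of the structure generation network are updated by gradient descent;
repeating S2-S4 to iteratively update the network parameters of the supernetwork and the parameters of the structure generation network until convergence, and stacking the updated basic units to construct the target network.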
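The alternation of S2 to S4 over the two parts of the training data can be sketched as a plain control loop. Everything named here (`split_in_two`, `alternate_search`, the three callbacks, and the fixed step count standing in for a convergence test) is an assumption made for illustration; the actual gradient updates are left to the callbacks.

```python
import random


def split_in_two(samples, seed=0):
    """Shuffle the training samples and split them into two parts."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def alternate_search(part_one, part_two, sample_alpha, update_weights,
                     update_generator, num_steps):
    """Alternate the S2-S4 updates for a fixed number of steps.

    Assumption: a fixed step budget replaces the convergence check.
    """
    history = []
    for step in range(num_steps):
        alpha = sample_alpha()                    # S2: draw structural parameters
        batch_w = part_one[step % len(part_one)]
        update_weights(batch_w, alpha)            # S3: update supernetwork weights
        batch_a = part_two[step % len(part_two)]
        update_generator(batch_a, alpha)          # S4: update the generator
        history.append((batch_w, batch_a))
    return history
```

Keeping the two data parts disjoint means the structure generation network is updated on images that the supernetwork weights were not fitted to in the same step.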
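As a preferred embodiment, the structure generation network is a neural network formed by stacking convolutional layers, rectified linear units (ReLU), and batch normalization layers
Figure PCTCN2022116045-appb-000012
A sample ∈ ~ N(0,1) is drawn from the standard Gaussian distribution N(0,1) and used as the input of the structure generation network, whose output obtained by forward propagation is
Figure PCTCN2022116045-appb-000013
Noise ε ~ N(0,1) is then sampled from the standard Gaussian distribution and added to the output of the structure generation network, giving
Figure PCTCN2022116045-appb-000014
which takes part in training as the structural parameters of the supernetwork.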
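The sampling path for the structural parameters can be sketched as follows. This is only a stand-in under stated assumptions: a single random linear layer followed by ReLU replaces the stacked convolution, ReLU and batch normalization layers of the patent, and the latent size of 16 and the 14 x 8 output shape (edges times operations for one cell) are chosen purely for illustration.

```python
import random

NUM_EDGES = 14  # candidate edges per cell, from the text
NUM_OPS = 8     # candidate operations per edge, from the text


def make_generator(latent_dim, out_dim, rng):
    """Build a toy generator: one linear layer followed by ReLU.

    Assumption: this replaces the conv/ReLU/batch-norm stack of the patent.
    """
    weights = [[rng.gauss(0.0, 0.1) for _ in range(latent_dim)]
               for _ in range(out_dim)]

    def generator(eps_in):
        pre = [sum(w * e for w, e in zip(row, eps_in)) for row in weights]
        return [max(0.0, v) for v in pre]  # ReLU

    return generator


def sample_structure_params(generator, latent_dim, rng):
    """Return structural parameters: generator output plus Gaussian noise."""
    eps_in = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]  # input sample
    g_out = generator(eps_in)                                  # forward pass
    noise = [rng.gauss(0.0, 1.0) for _ in g_out]               # additive noise
    flat = [g + n for g, n in zip(g_out, noise)]
    return [flat[i * NUM_OPS:(i + 1) * NUM_OPS] for i in range(NUM_EDGES)]


rng = random.Random(0)
generator = make_generator(latent_dim=16, out_dim=NUM_EDGES * NUM_OPS, rng=rng)
alpha = sample_structure_params(generator, latent_dim=16, rng=rng)
```

Each call draws a fresh input sample and fresh noise, so repeated calls yield different structural parameters from the same generator, which is what makes the result a distribution over structures rather than a single point.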
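As a preferred embodiment, the mutual information between the image data and the structural parameters of the supernetwork is
Figure PCTCN2022116045-appb-000015
where D denotes the given training image data, A denotes the structural parameters of the supernetwork, and
Figure PCTCN2022116045-appb-000016
denotes the network parameters that describe the dataset D given the structural parameters A.
As a preferred embodiment, the lower bound of the mutual information is
Figure PCTCN2022116045-appb-000017
where: H(D) denotes the information entropy of the training image data D,
Figure PCTCN2022116045-appb-000018
denotes the joint probability distribution of the training data and the structural parameters of the supernetwork, and q_θ(D|A) denotes the variational distribution used to approximate the true posterior distribution of the data
Figure PCTCN2022116045-appb-000019
.
As a preferred embodiment, the cross-entropy loss between the posterior distribution of the structural parameters and the posterior distribution of the image data is
Figure PCTCN2022116045-appb-000020
where
Figure PCTCN2022116045-appb-000021
denotes the conditional probability distribution of the structural parameters of the supernetwork given the training image data D, and q_θ(D|A) denotes the variational conditional probability distribution of the training image data given the structural parameters of the supernetwork.
As a preferred embodiment, stacking the updated basic units to construct the target network comprises: drawing a sample from the standard Gaussian distribution, feeding it into the structure generation network, obtaining the structural parameters of the supernetwork as output, and, according to these structural parameters, selecting from the supernetwork built from the basic units a specified number of image classification operations with the largest corresponding parameter values, which finally yields the target network.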
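One possible reading of the selection step is sketched below: on each edge the operation with the largest structural parameter is kept, and then only a specified number of the strongest edges survive. The function name, the dictionary layout, and the edge-ranking rule are assumptions; the text only states that a specified number of operations with the largest parameter values are selected.

```python
def select_operations(alpha, op_names, edges_to_keep):
    """Pick the strongest operation per edge, then keep the strongest edges.

    alpha maps an edge (src, dst) to a list of structural parameter values,
    one per candidate operation, in the same order as op_names.
    """
    best = {}
    for edge, values in alpha.items():
        k = max(range(len(values)), key=values.__getitem__)  # argmax over ops
        best[edge] = (op_names[k], values[k])
    # Rank edges by the value of their winning operation (assumed rule).
    ranked = sorted(best.items(), key=lambda item: item[1][1], reverse=True)
    return {edge: op for edge, (op, _) in ranked[:edges_to_keep]}


example_alpha = {
    ("in0", "mid0"): [0.1, 0.9],
    ("in1", "mid0"): [0.5, 0.2],
    ("in0", "mid1"): [0.3, 0.05],
}
chosen = select_operations(example_alpha, ["op_a", "op_b"], edges_to_keep=2)
```

With the toy values above, the two retained edges are the ones whose winning operations score 0.9 and 0.5; the third edge is dropped.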
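Based on the same concept of the present invention, a computer device is provided, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the above image classification method that maximizes mutual information when executing the program.
Based on the same concept of the present invention, a computer-readable storage medium is provided, on which a computer program is stored, characterized in that the program implements the above image classification method that maximizes mutual information when executed by a processor.
Based on the same concept of the present invention, a chip system is provided, comprising a processor coupled to a memory, the memory storing program instructions, characterized in that the above image classification method that maximizes mutual information is implemented when the program instructions stored in the memory are executed by the processor.
The present invention also provides a schematic flowchart of another preferred embodiment; see Fig. 1. The image classification method that maximizes mutual information comprises:
S1, using basic units to construct the supernetwork according to all possible operations for image classification, wherein the supernetwork is formed by stacking basic units that contain all possible image classification operations;
S2, constructing the structure generation network based on a convolutional neural network, sampling from the standard Gaussian distribution, using the sampled value as the input of the structure generation network, obtaining the output of the structure generation network by forward propagation, then sampling noise from the standard Gaussian distribution, and summing the output and the sampled noise to obtain the structural parameters of the supernetwork;
S3, inputting the first part of the training images into the supernetwork to generate predicted category labels; computing the image classification cross-entropy loss from the predicted and true category labels; updating the network parameters of the supernetwork by gradient descent according to the image classification cross-entropy loss;
S4, inputting the second part of the training images into the supernetwork and maximizing the mutual information between the image data and the structural parameters of the supernetwork; the lower bound of the mutual information is derived to be the cross-entropy loss between the posterior distribution of the structural parameters and the posterior distribution of the image data; this cross-entropy loss is computed, and the parameters of the structure generation network are updated by gradient descent;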
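In this step, the mutual information between the image data and the structural parameters of the supernetwork is
Figure PCTCN2022116045-appb-000022
where D denotes the given training image data, A denotes the structural parameters of the supernetwork, and
Figure PCTCN2022116045-appb-000023
denotes the network parameters that can accurately describe the dataset D given the structural parameters A. An equivalent rewriting of the formula gives
Figure PCTCN2022116045-appb-000024
where H(D) denotes the information entropy of the training image data D,
Figure PCTCN2022116045-appb-000025
denotes the joint probability distribution of the training data and the structural parameters of the supernetwork, and q_θ(D|A) denotes the variational distribution used to approximate the true posterior distribution of the data
Figure PCTCN2022116045-appb-000026
. Because the KL divergence is always greater than or equal to 0, that is,
Figure PCTCN2022116045-appb-000027
always holds, the lower bound of the mutual information is
Figure PCTCN2022116045-appb-000028
Regarding the cross-entropy loss between the posterior distribution of the structural parameters and the posterior distribution of the image data: for given training image data, the information entropy H(D) is a constant that does not depend on the model parameters, so optimizing the lower bound of the mutual information is equivalent to optimizing
Figure PCTCN2022116045-appb-000029
which is exactly the cross-entropy loss between the posterior distribution of the structural parameters and the posterior distribution of the image data.
This cross-entropy loss is optimized by gradient descent to update the parameters of the structure generation network.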
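The formulas of this step appear only as figure images in this text, so the following is a hedged reconstruction of the standard variational lower bound that the surrounding prose describes, not a transcription of the patent's equations. The symbols p(A, D), p(D | A) and p(A | D) are assumed notation for the joint, the true data posterior and the structural-parameter posterior; only H(D) and q_θ(D|A) are taken from the text.

```latex
\begin{align}
I(D; A) &= H(D) - H(D \mid A) \\
        &= H(D) + \mathbb{E}_{p(A, D)}\bigl[\log p(D \mid A)\bigr] \\
        &= H(D) + \mathbb{E}_{p(A, D)}\bigl[\log q_\theta(D \mid A)\bigr]
           + \mathbb{E}_{p(A)}\Bigl[\mathrm{KL}\bigl(p(D \mid A)\,\|\,q_\theta(D \mid A)\bigr)\Bigr] \\
        &\ge H(D) + \mathbb{E}_{p(A, D)}\bigl[\log q_\theta(D \mid A)\bigr].
\end{align}
% H(D) does not depend on the model parameters, so maximizing the bound
% is equivalent to minimizing the cross-entropy term
\begin{equation}
\mathcal{L} = -\,\mathbb{E}_{p(D)}\,\mathbb{E}_{p(A \mid D)}\bigl[\log q_\theta(D \mid A)\bigr].
\end{equation}
```

The inequality uses only the non-negativity of the KL divergence, which matches the argument in the text; the last line rewrites the joint expectation by factoring p(A, D) = p(D) p(A | D).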
重复上述S2~S4,不断迭代更新超网络的网络参数和结构参数,直到训练至收敛,最终得到目标网络。
S5,将全部训练图像输入目标网络,生成预测的图像类别标签,根据预测的图像类别标签与真实的图像类别标签,计算图像分类的交叉熵损失,训练目标网络直至收敛,用于图像分类。本实施例中,基于降低交叉熵损失的准则,来训练目标网络,训练最终得到交叉熵损失最小的目标网络。
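As shown in Figure 3, the target structure units finally obtained by iteratively updating the network parameters and structure parameters during training are the normal cell on the left and the reduction cell on the right. These target structure units, as the result of maximizing mutual information, are output to the image classification module. To illustrate the technical effect of the above method, in this embodiment the target structure units shown in Figure 3 are the converged structure obtained after the image classification method for maximizing mutual information was trained for 1 epoch; the method needs only a very short 0.007 GPU-days to automatically design the target optimized network for image classification. Most neural networks used for image classification need a longer training time, typically reaching convergence after tens to hundreds of epochs depending on the convergence criterion. The target optimized network achieves an image classification test error of 2.45±0.04 (%) on CIFAR-10 with 3.9M parameters, and when transferred to the CIFAR-100 and ImageNet datasets it achieves image classification test error rates of 15.80% and 24.0%, respectively. The automatic design process of this image classification method runs in about 0.09 hours on a single Nvidia 1080Ti GPU. Compared with DARTS (9.6 hours) and SNAS (36 hours), this result significantly improves the design efficiency of the image classification method while also achieving high image classification accuracy, ensuring the practicality and stability of the image classification method.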
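As a quick arithmetic check of the search-time comparison, the following sketch computes the ratios implied by the hour figures quoted above. It uses only those three numbers; the dictionary keys and variable names are illustrative choices.

```python
# Search times in hours on a single GPU, as quoted in the paragraph above.
search_hours = {"this method": 0.09, "DARTS": 9.6, "SNAS": 36.0}

baseline = search_hours["this method"]
# Ratio of each baseline's search time to the search time of this method.
speedups = {
    name: hours / baseline
    for name, hours in search_hours.items()
    if name != "this method"
}

for name, ratio in speedups.items():
    print(f"{name}: search takes about {ratio:.0f}x longer")
```

On these figures the search is roughly two orders of magnitude faster than DARTS and several hundred times faster than SNAS.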
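The detailed comparison results are shown in Tables 1 to 4 below. Table 1 compares the evaluation of the search results of the embodiment of the present invention on the CIFAR-10 dataset; Table 2 compares them on the ImageNet dataset; Table 3 compares them on the NAS-Bench-201 benchmark; and Table 4 compares them on the search spaces S1-S4. In the tables, a lower error rate indicates better performance, and a smaller parameter count and computational cost indicate a more efficient structure.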
Table 1
Figure PCTCN2022116045-appb-000030
Table 2
Figure PCTCN2022116045-appb-000031
Table 3
Figure PCTCN2022116045-appb-000032
Table 4
Figure PCTCN2022116045-appb-000033
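The above experiments show that the image classification method for maximizing mutual information proposed by the embodiments of the present invention can improve the performance of image classification under limited computing resources and achieve higher accuracy, and can automatically design and tune a neural-network-based image classification method in a short time, and therefore has a wider range of application scenarios.
Specific embodiments of the present invention have been described above. It should be understood that the present invention is not limited to the specific embodiments described above; those skilled in the art may make various variations or modifications within the scope of the claims without affecting the substance of the invention. The preferred features above may be used in any combination provided they do not conflict with one another.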

Claims (11)

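  1. An image classification method for maximizing mutual information, characterized by comprising:
    acquiring training images;
    maximizing the mutual information between the training images and the neural network structure to automatically determine the network structure and parameters of the neural network;
    processing the image data to be classified with the resulting neural network to obtain an image classification result.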
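  2. The image classification method for maximizing mutual information according to claim 1, characterized in that the acquired training images are divided into two parts; and said maximizing the mutual information between the training images and the neural network structure to automatically determine the network structure and parameters of the neural network comprises:
    constructing a supernetwork and a structure generation network, performing data processing on each to obtain the network parameters of the supernetwork and the parameters of the structure generation network, and constructing a target network;
    feeding all training images into the target network to generate predicted image class labels, computing the image classification cross-entropy loss from the predicted image class labels and the true image class labels, and training the target network until convergence, for use in image classification.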
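  3. The image classification method for maximizing mutual information according to claim 2, characterized in that said constructing a supernetwork and a structure generation network, performing data processing on each to obtain the network parameters of the supernetwork and the parameters of the structure generation network, and constructing a target network comprises:
    S1, constructing basic units based on all possible operations for image classification;
    then building a supernetwork from the basic units, wherein:
    the supernetwork is formed by stacking basic units that contain all possible image classification operations;
    S2, building a structure generation network based on a convolutional neural network, sampling from the standard Gaussian distribution, using the sample as the input of the structure generation network, and obtaining the output of the structure generation network by forward propagation;
    then sampling noise from the standard Gaussian distribution;
    summing the output of the structure generation network and the sampled noise to obtain the structure parameters of the supernetwork;
    S3, feeding the first part of the training images into the supernetwork to generate predicted class labels;
    computing the image classification cross-entropy loss from the predicted class labels and the true class labels;
    updating the network parameters of the supernetwork by gradient descent according to the image classification cross-entropy loss;
    S4, feeding the second part of the training images into the supernetwork, maximizing the mutual information between the image data and the structure parameters of the supernetwork, and determining the lower bound of the mutual information, wherein:
    the lower bound of the mutual information is the cross-entropy loss between the posterior distribution of the structure parameters and the posterior distribution of the image data; computing the cross-entropy loss and updating the parameters of the structure generation network by gradient descent;
    repeating S2-S4 to iteratively update the network parameters of the supernetwork and the parameters of the structure generation network until convergence, and stacking the updated basic units to build the target network.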
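  4. The image classification method for maximizing mutual information according to claim 3, characterized in that the structure generation network is a neural network formed by stacking convolutional neural networks, rectified linear units and batch normalization layers
    Figure PCTCN2022116045-appb-100001
    the input of the structure generation network is a sample ϵ~N(0,1) drawn from the standard Gaussian distribution;
    the output of the structure generation network is obtained by feeding the sample ϵ into the structure generation network
    Figure PCTCN2022116045-appb-100002
    whose forward propagation yields the output
    Figure PCTCN2022116045-appb-100003
    the structure parameters of the supernetwork are obtained by sampling noise ε~N(0,1) from the standard Gaussian distribution and summing it with the output of the structure generation network, giving
    Figure PCTCN2022116045-appb-100004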
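  5. The image classification method for maximizing mutual information according to claim 3, characterized in that the mutual information between the image data and the structure parameters of the supernetwork is
    Figure PCTCN2022116045-appb-100005
    where D denotes the given training image data, A denotes the structure parameters of the supernetwork, and
    Figure PCTCN2022116045-appb-100006
    denotes the network parameters that describe the dataset D given the structure parameters A.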
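  6. The image classification method for maximizing mutual information according to claim 3, characterized in that the lower bound of the mutual information is
    Figure PCTCN2022116045-appb-100007
    where:
    H(D) denotes the information entropy of the training image data D,
    Figure PCTCN2022116045-appb-100008
    denotes the joint probability distribution of the training data and the structure parameters of the supernetwork, and q_θ(D|A) denotes the variational distribution used to approximate the true posterior distribution of the data, written as
    Figure PCTCN2022116045-appb-100009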
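  7. The image classification method for maximizing mutual information according to claim 6, characterized in that the cross-entropy loss between the posterior distribution of the structure parameters and the posterior distribution of the image data is
    Figure PCTCN2022116045-appb-100010
    where
    Figure PCTCN2022116045-appb-100011
    denotes the conditional probability distribution of the structure parameters of the supernetwork given the training image data D, and q_θ(D|A) denotes the variational conditional probability distribution of the training image data given the structure parameters of the supernetwork.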
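  8. The image classification method for maximizing mutual information according to claim 3, characterized in that said stacking the updated basic units to build the target network comprises:
    drawing a sample from the standard Gaussian distribution, feeding it into the structure generation network to obtain the structure parameters of the supernetwork as output, and, according to the structure parameters of the supernetwork, selecting from the supernetwork built from the basic units a specified number of image classification operations whose corresponding parameter values are the largest, finally obtaining the target network.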
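  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the program, implements the image classification method for maximizing mutual information according to any one of claims 1 to 8.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the image classification method for maximizing mutual information according to any one of claims 1 to 8.
  11. A chip system, comprising a processor coupled to a memory, the memory storing program instructions, characterized in that the program instructions stored in the memory, when executed by the processor, implement the image classification method for maximizing mutual information according to any one of claims 1 to 8.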
PCT/CN2022/116045 2021-10-08 2022-08-31 Image classification method, device, medium and system for maximizing mutual information WO2023056802A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111170350.9 2021-10-08
CN202111170350.9A CN113936173A (zh) 2021-10-08 2021-10-08 Image classification method, device, medium and system for maximizing mutual information

Publications (1)

Publication Number Publication Date
WO2023056802A1 true WO2023056802A1 (zh) 2023-04-13

Family

ID=79278103

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/116045 WO2023056802A1 (zh) 2021-10-08 2022-08-31 Image classification method, device, medium and system for maximizing mutual information

Country Status (2)

Country Link
CN (1) CN113936173A (zh)
WO (1) WO2023056802A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113936173A (zh) * 2021-10-08 2022-01-14 上海交通大学 Image classification method, device, medium and system for maximizing mutual information

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
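CN112529806A (zh) * 2020-12-15 2021-03-19 哈尔滨工程大学 SAR image data augmentation method based on information maximization of generative adversarial networks
CN112784961A (zh) * 2021-01-21 2021-05-11 北京百度网讯科技有限公司 Supernetwork training method and apparatus, electronic device and storage medium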
US20210209775A1 (en) * 2018-12-19 2021-07-08 Shanghai Sensetime Intelligent Technology Co., Ltd. Image Processing Method and Apparatus, and Computer Readable Storage Medium
US20210287099A1 (en) * 2020-03-09 2021-09-16 International Business Machines Corporation Mutual Information Neural Estimation with Eta-Trick
CN113936173A (zh) * 2021-10-08 2022-01-14 上海交通大学 Image classification method, device, medium and system for maximizing mutual information

Also Published As

Publication number Publication date
CN113936173A (zh) 2022-01-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22877824

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE