WO2023125456A1 - Hyperspectral image feature extraction method based on a multi-level variational autoencoder - Google Patents

基于多层次变分自动编码器的高光谱图像特征提取方法 (Hyperspectral image feature extraction method based on a multi-level variational autoencoder)

Info

Publication number: WO2023125456A1
Authority: WO (WIPO, PCT)
Prior art keywords: feature extraction, feature, hyperspectral image, size, input
Application number: PCT/CN2022/142106
Other languages: English (en), French (fr)
Inventors: 于文博 (Yu Wenbo), 黄鹤 (Huang He), 沈纲祥 (Shen Gangxiang)
Original assignee: 苏州大学 (Soochow University)
Priority date: 2021-12-28 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Filing date: 2022-12-26
Application filed by 苏州大学 (Soochow University)
Publication of WO2023125456A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/52: Scale-space analysis, e.g. wavelet analysis
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/58: Extraction of image or video features relating to hyperspectral data
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A: TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A40/00: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production
    • Y02A40/10: Adaptation technologies in agriculture, forestry, livestock or agroalimentary production in agriculture

Definitions

  • The present application relates to the technical field of spectral imaging, and in particular to a hyperspectral image feature extraction method based on a multi-level variational autoencoder.
  • Hyperspectral images contain rich spatial and spectral features: spatial features refer to the spatial position information of the pixels at each wavelength, and spectral features refer to the spectral curve formed by the spectral reflectance of a single pixel across the wavelengths.
  • From the perspective of information sources, hyperspectral image feature extraction methods can be divided into methods based on spectral features and methods based on spatial-spectral features.
  • Feature extraction methods based on spectral features construct a feature extractor from the individual spectral curves in the hyperspectral image and ignore the position information of different pixels in the spatial dimension.
  • Early widely used methods include principal component analysis (PCA), minimum noise fraction (MNF), linear discriminant analysis (LDA), and so on. These methods generally consider the internal discriminative information of hyperspectral pixels to ensure classifiability.
  • A hyperspectral image is a typical three-dimensional data cube that combines the spatial information of the target objects with the spectral information at each wavelength, jointly reflected in the complete data. A hyperspectral image therefore has the characteristic of map-spectrum integration, that is, its spatial information is consistent with its spectral information.
  • The spatial information of a hyperspectral image can be understood as the local spatial neighborhood of a single pixel in the spatial dimension.
  • This application proposes a hyperspectral image feature extraction method for hyperspectral images based on a multi-level variational autoencoder (multi-level VAE).
  • The method uses the variational autoencoder as its basic framework and takes the finally corrected fusion feature as the joint spatial-spectral feature output after training.
  • A hyperspectral image feature extraction method based on a multi-level variational autoencoder, characterized in that the method comprises the following steps:
  • the size of the hyperspectral image is X×Y×B, where X and Y are the spatial dimensions of the hyperspectral image at each wavelength and B is the number of wavelengths of the hyperspectral image;
  • neighborhood information is configured for each hyperspectral pixel in the hyperspectral image, that is, neighborhood pixels of size s×s around the hyperspectral pixel are selected as the neighborhood information of that pixel, and the size of the neighborhood information is s×s×B, where the neighborhood information refers to the square region centered on the hyperspectral pixel, the side length is s, and s is an odd number;
  • a second sample consisting of a hyperspectral pixel of size 1×B is used as the input Input^q of the spectral feature extraction module; the first samples and the second samples are equal in number and correspond one to one;
  • feature splicing and computation of the mean feature μ: in the spatial feature extraction module, the input of the i-th layer is the output of the (i-1)-th layer; in the spectral feature extraction module, the input of the i-th layer is the splicing of the outputs of the i-th spatial layer and the (i-1)-th spectral layer, according to the formula Input_i^q = Concat(Output_i^p, Output_{i-1}^q), where 1 < i < m; the mean feature μ is then obtained as the splicing of the final-layer outputs of the two modules, μ = Concat(Output_m^p, Output_m^q), and the size of μ is bs×s²×d;
  • the fused feature O is obtained and input to the decoder module, and the decoder is used for data reconstruction of the fused feature O;
  • Σ(·) means summing everything inside the brackets;
  • Γ_R uses the Euclidean distance to compute the similarity between the input of the spectral feature extraction module and the output of the encoder;
  • Γ_KL is the loss function of the variational autoencoder (VAE) and uses the KL divergence to compute the similarity between the Gaussian distribution and the distribution of the embedded features;
  • Γ_Homo uses the spectral angle distance to compute the similarity between the outputs of all spatial feature extraction module layers and the outputs of the corresponding spectral feature extraction module layers.
  • Step S1 also includes:
  • Step S4 includes: randomly selecting mini-batches from the X×Y first samples of size 1×s²×B and the X×Y second samples of size 1×B and inputting them into the deep neural network for its training;
  • the mini-batch pixel count is bs for both the first samples and the second samples, and the activation function throughout the network is the Tanh activation function.
  • The hyperspectral image feature extraction method based on a multi-level variational autoencoder also includes: normalizing all pixels to the range -1 to 1 via x̃ = 2(x - x_min)/(x_max - x_min) - 1, where
  • x_min represents the minimum value in the pixel data
  • and x_max represents the maximum value.
  • The Tanh activation function is computed as tanh(x) = (e^x - e^{-x}) / (e^x + e^{-x}).
  • Step S8 also includes: using the loss function, an Adam optimizer with a step size of 10⁻³ is selected to optimize the deep network model, and after the model reaches a stable level, the pooled mean feature is taken as the output.
  • Step S3 includes:
  • if B is not an integer multiple of s², removing ε wavelengths so that B-ε is an integer multiple of s², where the superscript q refers to the relevant variables of the spectral feature extraction module.
  • Step S6 also includes:
  • the input of the long short-term memory network layer L is the spliced final-layer output of the two feature extraction modules, the number of nodes is d, and the size of δ is bs×s²×d.
  • The decoder module includes:
  • n fully connected network layers {d_1, d_2 … d_n},
  • with inputs {ind_1, ind_2 … ind_n}
  • and outputs {outd_1, outd_2 … outd_n};
  • the number of nodes in the last layer is B
  • and the number of nodes in the other network layers is d.
  • Step S7 also includes:
  • obtaining the fused feature O = μ̄ + γ ⊙ δ̄, where γ is a randomly generated noise matrix that follows a Gaussian distribution and has size bs×d.
  • The multi-continuous-feature integration method for hyperspectral images of the embodiments of the present application is used to extract the spatial features and the spectral features of the hyperspectral image.
  • The method considers the two kinds of continuous information contained in the hyperspectral image and, by designing a deep network model, achieves the purpose of describing that information from multiple angles.
  • The spatial features of the different stages are used to sequentially correct the information of the spectral features of the corresponding stages, improving the cooperation of the two kinds of features during extraction.
  • The multi-level spatial-spectral feature homology promotion method based on the spectral angle distance uses the spectral angle distance to gradually improve the homology between the spatial and spectral features at each level and improves the correlation between spatial features and spectral features in the feature mapping stage, solving the problem that the classification accuracy is difficult to improve because the distributions of the two kinds of feature data differ greatly.
  • Fig. 1 is a schematic flow chart of the feature extraction method of an embodiment of the present application.
  • Fig. 2 is a schematic diagram of the deep network structure of an embodiment of the present application.
  • This application provides a hyperspectral image feature extraction method based on a multi-level variational autoencoder, which uses long short-term memory network layers to extract the continuous features of a pixel at the spatial level and at the spectral level, and uses splicing layers to fuse the two kinds of continuous features, solving the problem that traditional feature extraction algorithms capture only a single kind of continuous information.
  • The method's multi-level spatial-spectral feature homology promotion method based on the spectral angle distance uses the spectral angle distance to compute and increase the homology between the spatial-spectral features of each stage, solving the problem that the large difference in data distribution between spatial features and spectral features makes the subsequent classification accuracy difficult to improve, and improving the correlation of the two kinds of features in the feature mapping stage.
  • The size of the hyperspectral image is X×Y×B, where X and Y are the spatial dimensions of the hyperspectral image at each wavelength, and B is the number of wavelengths of the hyperspectral image.
  • The hyperspectral image is normalized as preprocessing, and the neighborhood size s, the number m of network layers in the spatial and spectral feature extraction modules, the number n of network layers in the decoder module, and the embedded feature dimension d are set, where d is required to be an even number greater than 0.
  • The deep network model transforms the neighborhood information of a pixel to obtain a first sample of size 1×s²×B, used as the input Input^p of the spatial feature extraction module. The hyperspectral pixel x of size 1×B is transformed to obtain a second sample of size 1×s²×(B/s²), used as the input Input^q of the spectral feature extraction module; if B is not an integer multiple of s², ε wavelengths are removed so that B-ε is an integer multiple of s².
  • The superscript p refers to the relevant variables of the spatial feature extraction module,
  • and the superscript q refers to the relevant variables of the spectral feature extraction module.
  • The first samples and the second samples are equal in number and correspond one to one.
  • The deep network model includes a spatial feature extraction module, a spectral feature extraction module, a feature fusion module, and a decoder module.
  • Concat(·) is a splicing operation; the two features are spliced in the third dimension to obtain an output of size bs×s²×2d, which is used as the input of the corresponding layer of the spectral feature extraction module.
  • In the spatial feature extraction module, the input of the i-th layer is the output of the (i-1)-th layer; in the spectral feature extraction module, the input of the i-th layer is the splicing of the outputs of the i-th spatial layer and the (i-1)-th spectral layer, Input_i^q = Concat(Output_i^p, Output_{i-1}^q), where 1 < i < m.
  • The mean feature μ is obtained as μ = Concat(Output_m^p, Output_m^q), and the size of μ is bs×s²×d.
  • This step also includes: obtaining the standard deviation feature δ with a long short-term memory network layer L whose input is the spliced final-layer output of the two feature extraction modules; the number of nodes is d, and the size of δ is bs×s²×d.
  • Σ(·) means summing everything inside the brackets.
  • Γ_R uses the Euclidean distance to compute the similarity between the input of the spectral feature extraction module and the output of the encoder;
  • Γ_KL is the loss function of the variational autoencoder (VAE) and uses the KL divergence to compute the similarity between the Gaussian distribution and the distribution of the embedded features;
  • Γ_Homo uses the spectral angle distance to compute the similarity between the outputs of all spatial feature extraction module layers and the outputs of the corresponding spectral feature extraction module layers (a small numerical sketch of this distance is given at the end of this section).
  • The spatial feature extraction module includes m long short-term memory (LSTM) network layers {L_1^p, L_2^p … L_m^p}, with inputs {Input_1^p, Input_2^p … Input_m^p} and outputs {Output_1^p, Output_2^p … Output_m^p}; the number of nodes in the last layer is d/2, and the number of nodes in the other network layers is d.
  • The spectral feature extraction module includes m long short-term memory network layers {L_1^q, L_2^q … L_m^q}, with inputs {Input_1^q, Input_2^q … Input_m^q} and outputs {Output_1^q, Output_2^q … Output_m^q}; the number of nodes in the last layer is d/2, and the number of nodes in the other network layers is d.
  • The decoder module consists of n fully connected network layers {d_1, d_2 … d_n}, with inputs {ind_1, ind_2 … ind_n} and outputs {outd_1, outd_2 … outd_n}; the number of nodes in the last layer is B, and the number of nodes in the other network layers is d.
  • The decoder is used to reconstruct the data of the fused feature O, forming an autoencoder-like structure, which helps to ensure the consistency of the sample information.
  • It also includes: using the above loss function, an Adam optimizer with a step size of 10⁻³ is selected to optimize the deep network model; after the model reaches a steady state, the pooled mean feature μ̄ is taken as the output, and all first and second samples are used as test samples to obtain the desired embedded features.
  • All X×Y hyperspectral pixels need to be divided into a training set and a test set according to a certain ratio and normalized so that their values lie between -1 and 1, with the normalization formula x̃ = 2(x - x_min)/(x_max - x_min) - 1, where
  • x_min represents the minimum value in the pixel data
  • and x_max represents the maximum value.
  • the hyperspectral image feature extraction method based on multi-level variational autoencoder is used to extract the spatial spectral features in the hyperspectral image and used for subsequent classification research.
  • the image size is 145 ⁇ 145 ⁇ 200, contains a total of 21025 pixels, each pixel contains 200 spectral wavelengths, and the entire dataset contains 16 effective categories and background noise categories. After removing the pixels belonging to the background noise category, there are 10366 valid pixels remaining.
  • the deep network structure is shown in Figure 2:
  • the input hyperspectral image is an image of size 145 ⁇ 145 ⁇ 200.
  • the neighborhood size is 5, the number of network layers in the spatial feature extraction module and the spectral feature extraction module is 3, the number of network layers in the decoder module is 3, and the embedded feature dimension is 40.
  • the neighborhood information is selected, and for each pixel, the neighborhood information with a size of 5 ⁇ 5 ⁇ 200 is obtained, and the pixel and neighborhood information are input into the deep network for training.
  • the 10366 training set data 40% of the samples are selected for training the deep network model, these samples are randomly sorted and packaged, and the number of pixels in the mini-batch is 512. Only one of the sample packets is used for each training session. After the training is over, all 10366 training set data are input into the deep model for testing, and the embedded features with a size of 10366 ⁇ 40 are obtained, and finally the SVM classifier is used for classification. Randomly select 10% of the samples to train the SVM classifier, and use the remaining 90% of the samples for testing, and finally obtain the classification results, and use the overall classification accuracy and average classification accuracy to evaluate the classification results.
  • the overall classification result refers to the ratio of the number of correctly classified samples divided by the number of all samples in all samples.
  • the average classification accuracy first divides the number of correctly classified samples in each class by the ratio of the number of samples in this class, and calculates the average value of each type of ratio.
  • the ordinary variational autoencoder includes an encoder, a feature fusion module and a decoder, wherein the encoder consists of 3 layer fully connected layer, the decoder is composed of 3 fully connected layers, the number of nodes in the network layer and the structure of the feature fusion module are the same as the method implemented in this application) the obtained classification results are shown in the table below.
  • the method of this application can better improve the classification performance of embedded features, and has fewer misclassified samples.
  • the overall classification accuracy obtained is 81.4% (the overall classification accuracy reaches 85.3% when no random Gaussian noise is added), it can be seen that the application The method has strong anti-noise ability. Therefore, the method of this application can effectively improve the classifiability and classification accuracy of embedded features, and can also improve the anti-noise interference ability of the model.
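As an illustration of the spectral angle distance on which Γ_Homo relies, here is a minimal NumPy sketch; the function and variable names are ours, not the patent's:

```python
import numpy as np

def spectral_angle_distance(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Angle (radians) between corresponding vectors along the last axis."""
    dot = np.sum(a * b, axis=-1)
    norms = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1)
    cos = np.clip(dot / np.maximum(norms, 1e-12), -1.0, 1.0)
    return np.arccos(cos)

# Toy stand-ins for one spatial-module output and the matching
# spectral-module output (bs=4, sequence length s^2=25, d=40).
out_p = np.random.randn(4, 25, 40)
out_q = np.random.randn(4, 25, 40)
gamma_homo = spectral_angle_distance(out_p, out_q).sum()
```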

Abstract

This application proposes a hyperspectral image feature extraction method for hyperspectral images based on a multi-level variational autoencoder. The method uses the variational autoencoder as its basic framework and takes the finally corrected fusion feature as the joint spatial-spectral feature output after training. The method can better extract the important discriminative information in the data, improve the classifiability and classification accuracy of the pixels, reduce misclassification in subsequent classification tasks, and improve the model's resistance to noise interference.

Description

Hyperspectral image feature extraction method based on a multi-level variational autoencoder

Technical Field

This application relates to the technical field of spectral imaging, and in particular to a hyperspectral image feature extraction method based on a multi-level variational autoencoder.

Background Art

In the field of remote sensing, hyperspectral imaging is widely used in a variety of studies. Hyperspectral images contain rich spatial and spectral features: spatial features refer to the spatial position information of the pixels at each wavelength, and spectral features refer to the spectral curve formed by the spectral reflectance of a single pixel across the wavelengths. Feature extraction on a hyperspectral image yields low-dimensional embedded features rich in discriminative information, reduces the redundant information in the image, and improves the recognition accuracy of subsequent classification research. Early hyperspectral image feature extraction methods mainly extracted the spectral features of the pixels without considering the position information between pixels, and therefore struggled to obtain good results. With the growth of computing power and the deepening of deep learning research, methods that train neural networks to extract spatial-spectral features have been proposed one after another. These methods introduce the idea of multi-sensor data fusion: spatial features and spectral features are extracted separately and then fused, which avoids loss of information and improves algorithm performance.

From the perspective of information sources, hyperspectral image feature extraction methods can be divided into methods based on spectral features and methods based on spatial-spectral features.

Methods based on spectral features construct a feature extractor from the individual spectral curves of the hyperspectral image and ignore the position information of the different pixels in the spatial dimension. Early widely used methods include principal component analysis (PCA), minimum noise fraction (MNF), linear discriminant analysis (LDA), and so on. These methods generally consider the internal discriminative information of the hyperspectral pixels to ensure classifiability. As deep learning research has deepened, deep network models have also been applied to hyperspectral image feature extraction, including the autoencoder (AE), the variational autoencoder (VAE), long short-term memory networks (LSTM), and so on. However, such methods do not consider the positional relations between different pixels: they describe the information in the image only at the spectral level and fail to exploit the "map-spectrum integration" of hyperspectral images, that is, the unity and synergy between the spatial information and the spectral information of the image. The mainstream methods in this research field are now those based on spatial-spectral features.

As for methods based on spatial-spectral features: a hyperspectral image is a typical three-dimensional data cube that combines the spatial information of the target ground objects with the spectral information at each wavelength, jointly reflected in the complete data; a hyperspectral image therefore has the characteristic of map-spectrum integration, that is, its spatial information is consistent with its spectral information. At the same time, because of the variable acquisition environments of hyperspectral images and the influence of external interference, the phenomena of "one object, different spectra" and "one spectrum, different objects" still occur in the images, and these phenomena also interfere with the results of hyperspectral analysis. The spatial information of a hyperspectral image can be understood as the local spatial neighborhood of a single pixel in the spatial dimension; this definition assumes that every pixel has a certain relation with its spatial neighbors, so a pixel's position within the real ground objects can be captured by learning the distribution of its local spatial neighborhood. The currently common approach is to extract features from the local information of a pixel with a convolutional neural network, extract the spectral features of the pixel with fully connected layers, and finally use a splicing layer to join the spatial and spectral features. For spectral feature extraction, some studies also use long short-term memory networks to learn the continuous information within a pixel's spectral curve. But such methods have certain defects: ① they do not consider the continuous information in a hyperspectral image from multiple aspects, describing it only from the angle of the spectral curve; ② when the spectral and spatial features are extracted separately, the correlation and cooperation between the two mappings are poor, and feature fusion happens only in the last step via a splicing layer, which cannot adequately match the distributions of the two features; ③ they do not consider the homology between the spatial and spectral features; although the two kinds of features describe the information of the hyperspectral image at different levels, both describe the same hyperspectral pixel, so homology between them is inevitable.

In short, current hyperspectral image feature extraction methods have certain shortcomings:

① Current spatial-spectral feature extraction methods do not consider the continuous information of a pixel at multiple levels; most describe the continuous information only from the angle of the spectral curve, weakening the multiplicity of the data;

② In current spatial-spectral feature extraction methods, the cooperation between the spatial feature mapping and the spectral feature mapping is poor: the two mappings are almost completely separated, each performs feature extraction on its own, and a splicing layer then fuses the features; but the data distributions of the spatial and spectral features themselves differ considerably, so rigid splicing can hardly achieve the desired goal;

③ Current spatial-spectral feature extraction methods do not consider the homology between the spatial and spectral features, ignoring the fact that both kinds of features are extracted from the same hyperspectral pixel; this further enlarges the difference between the data distributions of the two kinds of features, which harms fused feature representation as well as subsequent classification research.

Summary of the Invention

To overcome the above defects, the purpose of this application is as follows: this application proposes a hyperspectral image feature extraction method for hyperspectral images based on a multi-level variational autoencoder (multi-level VAE). The method uses the variational autoencoder as its basic framework and takes the finally corrected fusion feature as the joint spatial-spectral feature output after training.

To achieve the above purpose, this application adopts the following technical solution:

A hyperspectral image feature extraction method based on a multi-level variational autoencoder, characterized in that the method comprises the following steps:
S1. Select a hyperspectral image, the size of the hyperspectral image being X×Y×B, where X and Y are the spatial dimensions of the hyperspectral image at each wavelength and B is the number of wavelengths of the hyperspectral image.

S2. Configure neighborhood information for each hyperspectral pixel in the hyperspectral image, that is, select neighborhood pixels of size s×s around the hyperspectral pixel as the neighborhood information of the pixel; the size of the neighborhood information is s×s×B, where the neighborhood information refers to the square region centered on the hyperspectral pixel, the side length is s, and s is an odd number.

S3. Transform the neighborhood information based on the deep network model to obtain a first sample of size 1×s²×B, the first sample serving as the input Input^p of the spatial feature extraction module; take a second sample consisting of a hyperspectral pixel of size 1×B, the second sample serving as the input Input^q of the spectral feature extraction module; the first samples and the second samples are equal in number and correspond one to one.

S4. Train the deep neural network.

S5. Feature splicing and computation of the mean feature μ: in the spatial feature extraction module, the input Input_i^p of the i-th layer is the output Output_{i-1}^p of the (i-1)-th layer; in the spectral feature extraction module, the input Input_i^q of the i-th layer is the splicing of the outputs of the i-th spatial layer and the (i-1)-th spectral layer, according to the formula

Input_i^q = Concat(Output_i^p, Output_{i-1}^q), where 1 < i < m.

The mean feature μ is then obtained according to the formula

μ = Concat(Output_m^p, Output_m^q),

and the size of μ is bs×s²×d.
S6. Pooling, that is, pool the mean feature μ and the standard deviation feature δ with an average pooling layer to obtain the pooled mean feature μ̄ and the pooled standard deviation feature δ̄, both of size bs×d.

S7. Obtain the fused feature O based on the feature fusion module and input the fused feature O to the decoder module; the decoder is used for data reconstruction of the fused feature O.

S8. Network optimization, that is, construct the loss function for network model training according to the formula Γ = Γ_R + Γ_KL + Γ_Homo, where Σ(·) means summing everything inside the brackets, Γ_R uses the Euclidean distance to compute the similarity between the input of the spectral feature extraction module and the output of the encoder, Γ_KL is the loss function of the variational autoencoder (VAE) and uses the KL divergence to compute the similarity between the Gaussian distribution and the distribution of the embedded features, and Γ_Homo uses the spectral angle distance to compute the similarity between the outputs of all spatial feature extraction module layers and the outputs of the corresponding spectral feature extraction module layers.
Preferably, step S1 further includes:

normalizing the hyperspectral image as preprocessing, and setting the neighborhood size s, the number m of network layers in the spatial and spectral feature extraction modules, the number n of network layers in the decoder module, and the embedded feature dimension d, where d is an even number greater than 0.

Preferably, step S4 includes: randomly selecting mini-batches from the X×Y first samples of size 1×s²×B and the X×Y second samples of size 1×B and inputting them into the deep neural network for its training; the mini-batch pixel count is bs for both the first and second samples, and the activation function throughout the network is the Tanh activation function.

Preferably, the hyperspectral image feature extraction method based on a multi-level variational autoencoder further includes:

normalizing all X×Y hyperspectral pixels so that their values lie between -1 and 1, with the normalization formula

x̃ = 2(x - x_min)/(x_max - x_min) - 1,

where x_min represents the minimum value in the pixel data and x_max the maximum value.

Preferably, the Tanh activation function is computed as

tanh(x) = (e^x - e^{-x}) / (e^x + e^{-x}).

Preferably, step S8 further includes: using the loss function, selecting an Adam optimizer with a step size of 10⁻³ to optimize the deep network model, and after the model reaches a stable level, taking the pooled mean feature μ̄ as the output.
Preferably, step S3 includes:

transforming the hyperspectral pixel x of size 1×B to obtain a second sample of size 1×s²×((B-ε)/s²), the second sample serving as the input Input^q of the spectral feature extraction module;

if B is not an integer multiple of s², removing ε wavelengths so that B-ε is an integer multiple of s², where q refers to the relevant variables of the spectral feature extraction module.

Preferably, step S6 further includes:

obtaining the standard deviation feature δ with a long short-term memory network layer L, the input of the long short-term memory network layer L being the spliced final-layer output of the two feature extraction modules; the number of nodes is d, and the size of δ is bs×s²×d.

Preferably, in step S7, the decoder module includes:

n fully connected network layers, where the network layers are {d_1, d_2 … d_n}, their inputs are {ind_1, ind_2 … ind_n}, their outputs are {outd_1, outd_2 … outd_n}, the number of nodes in the last layer is B, and the number of nodes in the other network layers is d.

Preferably, step S7 further includes:

obtaining the fused feature O according to the formula

O = μ̄ + γ ⊙ δ̄,

where ⊙ denotes element-wise multiplication and γ is a randomly generated noise matrix that follows a Gaussian distribution and has size bs×d.
Beneficial Effects

Compared with the prior art, the multi-continuous-feature integration method for hyperspectral images of the embodiments of this application is used to extract the spatial features and spectral features of a hyperspectral image. The method considers the two kinds of continuous information contained in the hyperspectral image and, by designing a deep network model, achieves the purpose of describing the information from multiple angles. In addition, through the multi-level spatial feature correction method, the spatial features of the different stages are used to sequentially correct the information of the spectral features of the corresponding stages, improving the cooperation of the two kinds of features during extraction. The method's multi-level spatial-spectral feature homology promotion method based on the spectral angle distance uses the spectral angle distance to gradually improve the homology between the spatial and spectral features at each level, improving the correlation of the spatial and spectral features in the feature mapping stage and solving the problem that the classification accuracy is difficult to improve because the distributions of the two kinds of feature data differ greatly.
Brief Description of the Drawings

Fig. 1 is a schematic flow chart of the feature extraction method of an embodiment of this application.

Fig. 2 is a schematic diagram of the deep network structure of an embodiment of this application.
Detailed Description

The above solution is further described below with reference to specific embodiments. It should be understood that these embodiments are intended to illustrate this application and not to limit its scope. The implementation conditions used in the embodiments may be further adjusted, for example to the conditions of a specific manufacturer; implementation conditions that are not specified are the usual conditions of routine experiments.

This application provides a hyperspectral image feature extraction method based on a multi-level variational autoencoder, which uses long short-term memory network layers to extract the continuous features of a pixel at the spatial level and at the spectral level and uses splicing layers to fuse the two kinds of continuous features, solving the problem that traditional feature extraction algorithms capture only a single kind of continuous information. The method's multi-level spatial-spectral feature homology promotion method based on the spectral angle distance uses the spectral angle distance to compute and increase the homology between the spatial-spectral features of each stage, solving the problem that the large difference in data distribution between the spatial and spectral features makes the subsequent classification accuracy difficult to improve, and improving the correlation of the two kinds of features in the feature mapping stage.

The hyperspectral image feature extraction method based on a multi-level variational autoencoder proposed by this application is described below with reference to the drawings.

Fig. 1 is a schematic flow chart of the feature extraction method; the method includes:

S1. Select/acquire a hyperspectral image of size X×Y×B, where X and Y are the spatial dimensions of the hyperspectral image at each wavelength and B is the number of wavelengths. Normalize the hyperspectral image as preprocessing, and set the neighborhood size s, the number m of network layers in the spatial and spectral feature extraction modules, the number n of network layers in the decoder module, and the embedded feature dimension d, where d is required to be an even number greater than 0.

S2. Set neighborhood information for each hyperspectral pixel (there are X×Y hyperspectral pixels in total): select the neighborhood pixels of size s×s around it as the neighborhood information of that pixel; the size of the neighborhood information is s×s×B.

S3. Construct the deep network model. Transform the neighborhood information of a pixel to obtain a first sample of size 1×s²×B used as the input Input^p of the spatial feature extraction module. Transform the hyperspectral pixel x of size 1×B to obtain a second sample of size 1×s²×(B/s²) used as the input Input^q of the spectral feature extraction module; if B is not an integer multiple of s², remove ε wavelengths so that B-ε is an integer multiple of s². The superscript p refers to the relevant variables of the spatial feature extraction module, and the superscript q refers to the relevant variables of the spectral feature extraction module. The first samples and the second samples are equal in number and correspond one to one. The deep network model includes a spatial feature extraction module, a spectral feature extraction module, a feature fusion module, and a decoder module.
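To make S2 and S3 concrete, here is a minimal NumPy sketch; the border handling (`mode="reflect"`) and the exact reshape layout of the two samples are our assumptions, since the text does not specify them:

```python
import numpy as np

def build_samples(img: np.ndarray, s: int):
    """img: (X, Y, B) hyperspectral cube; s: odd neighborhood size.

    Returns first samples (X*Y, s*s, B) for the spatial module and
    second samples (X*Y, s*s, (B - eps) // s^2) for the spectral module.
    """
    X, Y, B = img.shape
    r = s // 2
    padded = np.pad(img, ((r, r), (r, r), (0, 0)), mode="reflect")
    firsts = np.empty((X * Y, s * s, B), dtype=img.dtype)
    for i in range(X):
        for j in range(Y):
            patch = padded[i:i + s, j:j + s, :]           # s x s x B neighborhood
            firsts[i * Y + j] = patch.reshape(s * s, B)   # flatten the spatial grid
    eps = B % (s * s)                                     # trim so B - eps divides s^2
    seconds = img.reshape(X * Y, B)[:, :B - eps].reshape(X * Y, s * s, -1)
    return firsts, seconds

firsts, seconds = build_samples(np.random.rand(6, 6, 200), s=5)  # toy 6x6x200 cube
```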
S4. Train the deep neural network: randomly select mini-batches from all X×Y hyperspectral pixels and feed them into the deep neural network; the mini-batch pixel count is bs, the activation function throughout the network is the Tanh activation function, and every network layer except the last layer of the decoder module is followed by a batch normalization layer.
S5. Feature splicing operation:

Feed Input^p as Input_1^p into the first layer of the spatial feature extraction module to obtain the output Output_1^p, and feed Input^q as Input_1^q into the first layer of the spectral feature extraction module to obtain the output Output_1^q, where Output_1^p and Output_1^q both have size bs×s²×d. Take Output_1^p as the input Input_2^p of the second layer of the spatial feature extraction module, and use the formula

Input_i^q = Concat(Output_i^p, Output_{i-1}^q),

where Concat(·) is the splicing operation: the two features are spliced in the third dimension to obtain an output of size bs×s²×2d, which is used as the input of the corresponding layer of the spectral feature extraction module. In the spatial feature extraction module, the input Input_i^p of the i-th layer is the output Output_{i-1}^p of the (i-1)-th layer; in the spectral feature extraction module, the input Input_i^q of the i-th layer is the splicing of the outputs of the i-th spatial layer and the (i-1)-th spectral layer, as shown above, where 1 < i < m. According to the formula

μ = Concat(Output_m^p, Output_m^q),

the mean feature μ is obtained, and the size of μ is bs×s²×d.
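The layer-wise coupling of S5 can be sketched in PyTorch as below. The layer widths (d for intermediate layers, d/2 for the last) follow the text; the class name, the choice of `nn.LSTM`, and the treatment of the final layer's splicing are assumptions of this sketch, not the patent's specification:

```python
import torch
import torch.nn as nn

class MultiLevelEncoder(nn.Module):
    """m coupled LSTM stacks: spatial layer i feeds spectral layer i (S5)."""

    def __init__(self, in_p: int, in_q: int, d: int, m: int):
        super().__init__()
        sizes = [d] * (m - 1) + [d // 2]   # last layer d/2 nodes, the others d
        self.spatial = nn.ModuleList()
        self.spectral = nn.ModuleList()
        for i, h in enumerate(sizes):
            self.spatial.append(nn.LSTM(in_p, h, batch_first=True))
            # Spectral layer i > 0 sees Concat(Output_i^p, Output_{i-1}^q).
            q_in = in_q if i == 0 else h + sizes[i - 1]
            self.spectral.append(nn.LSTM(q_in, h, batch_first=True))
            in_p = h
        self.act = nn.Tanh()

    def forward(self, x_p, x_q):
        # x_p: (bs, s^2, B) first samples; x_q: (bs, s^2, B/s^2) second samples
        out_q = None
        for i, (lp, lq) in enumerate(zip(self.spatial, self.spectral)):
            x_p, _ = lp(x_p)
            x_p = self.act(x_p)                        # Output_i^p
            q_in = x_q if i == 0 else torch.cat([x_p, out_q], dim=2)
            out_q, _ = lq(q_in)
            out_q = self.act(out_q)                    # Output_i^q
        return torch.cat([x_p, out_q], dim=2)          # mean feature mu: (bs, s^2, d)

# Toy shapes: bs=4, s^2=25, B=200, d=40, m=3.
mu = MultiLevelEncoder(200, 8, d=40, m=3)(torch.randn(4, 25, 200),
                                          torch.randn(4, 25, 8))
```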
S6. Pooling operation: use an average pooling layer to pool the mean feature μ and the standard deviation feature δ, obtaining the pooled mean feature μ̄ and the pooled standard deviation feature δ̄, both of size bs×d. This step also includes: obtain the standard deviation feature δ with a long short-term memory network layer L whose input is the spliced final-layer output of the two feature extraction modules; the number of nodes is d, and the size of δ is bs×s²×d.

S7. Obtain the fused feature O based on the feature fusion module and take the fused feature O as the input of the decoder module. In this step the fused feature O is obtained in the feature fusion module according to the formula

O = μ̄ + γ ⊙ δ̄,

where ⊙ denotes element-wise multiplication and γ is a randomly generated noise matrix that follows a Gaussian distribution and has size bs×d.
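Read as the standard VAE reparameterization, steps S6 and S7 reduce to a few lines; the fusion formula O = μ̄ + γ ⊙ δ̄ above and below is a reconstruction (the original formula is an inline image in the source):

```python
import torch

def fuse(mu: torch.Tensor, delta: torch.Tensor) -> torch.Tensor:
    """mu, delta: (bs, s^2, d) mean / standard-deviation features."""
    mu_bar = mu.mean(dim=1)                # average pooling over s^2 -> (bs, d)
    delta_bar = delta.mean(dim=1)
    gamma = torch.randn_like(mu_bar)       # Gaussian noise matrix, size bs x d
    return mu_bar + gamma * delta_bar      # fused feature O, fed to the decoder

O = fuse(torch.randn(4, 25, 40), torch.rand(4, 25, 40))
```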
S8. Construct the loss function for network model training according to the formula Γ = Γ_R + Γ_KL + Γ_Homo, where Σ(·) means summing everything inside the brackets, Γ_R uses the Euclidean distance to compute the similarity between the input of the spectral feature extraction module and the output of the encoder, Γ_KL is the loss function of the variational autoencoder (VAE) and uses the KL divergence to compute the similarity between the Gaussian distribution and the distribution of the embedded features, and Γ_Homo uses the spectral angle distance to compute the similarity between the outputs of all spatial feature extraction module layers and the outputs of the corresponding spectral feature extraction module layers.

In this embodiment, the spatial feature extraction module includes m long short-term memory (LSTM) network layers {L_1^p, L_2^p … L_m^p}, whose inputs are {Input_1^p, Input_2^p … Input_m^p} and whose outputs are {Output_1^p, Output_2^p … Output_m^p}; the number of nodes in the last layer is d/2, and the number of nodes in the other network layers is d. The spectral feature extraction module consists of m long short-term memory network layers {L_1^q, L_2^q … L_m^q}, whose inputs are {Input_1^q, Input_2^q … Input_m^q} and whose outputs are {Output_1^q, Output_2^q … Output_m^q}; the number of nodes in the last layer is d/2, and the number of nodes in the other network layers is d. The decoder module consists of n fully connected network layers {d_1, d_2 … d_n}, whose inputs are {ind_1, ind_2 … ind_n} and whose outputs are {outd_1, outd_2 … outd_n}; the number of nodes in the last layer is B, and the number of nodes in the other network layers is d. The decoder is used to reconstruct the data of the fused feature O, forming an autoencoder-like structure, which helps to ensure the consistency of the sample information.
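A hedged sketch of the composite loss follows: the right-hand sides of Γ_R, Γ_KL, and Γ_Homo are inline images in the source, so the squared Euclidean distance, the Gaussian KL term against N(0, I), and the summed spectral angles below are assumed standard forms consistent with the surrounding text:

```python
import torch
import torch.nn.functional as F

def total_loss(input_q, recon, mu_bar, delta_bar, outs_p, outs_q):
    """Gamma = Gamma_R + Gamma_KL + Gamma_Homo (standard forms assumed)."""
    # Gamma_R: squared Euclidean distance between the spectral-module input
    # and the reconstruction produced by the decoder from the fused feature.
    gamma_r = torch.sum((input_q.reshape(recon.shape) - recon) ** 2)
    # Gamma_KL: KL divergence between N(mu_bar, delta_bar^2) and N(0, I).
    var = delta_bar ** 2
    gamma_kl = -0.5 * torch.sum(1 + torch.log(var + 1e-12) - mu_bar ** 2 - var)
    # Gamma_Homo: spectral angle between each spatial/spectral output pair.
    gamma_homo = input_q.new_zeros(())
    for p, q in zip(outs_p, outs_q):
        cos = F.cosine_similarity(p, q, dim=-1).clamp(-1 + 1e-7, 1 - 1e-7)
        gamma_homo = gamma_homo + torch.arccos(cos).sum()
    return gamma_r + gamma_kl + gamma_homo
```

The Adam step size of 10⁻³ named below would correspond to `torch.optim.Adam(model.parameters(), lr=1e-3)`.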
In one embodiment, the method further includes: using the above loss function, select an Adam optimizer with a step size of 10⁻³ to optimize the deep network model; after the model reaches a steady state, take the pooled mean feature μ̄ as the output and use all first and second samples as test samples to obtain the desired embedded features.
Preferably, in step S4, all X×Y hyperspectral pixels need to be divided into a training set and a test set according to a certain ratio and normalized so that their values lie between -1 and 1, with the normalization formula

x̃ = 2(x - x_min)/(x_max - x_min) - 1,

where x_min represents the minimum value in the pixel data and x_max the maximum value. The training pixels are then randomly ordered and packed, that is, divided into multiple sample packets, each containing bs pixels; each optimization iteration feeds only one of the sample packets into the neural network, and a different packet is selected each time. The Tanh activation function used is computed as

tanh(x) = (e^x - e^{-x}) / (e^x + e^{-x}).
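The normalization and mini-batch packing just described, as a small NumPy sketch (helper names are ours):

```python
import numpy as np

def normalize_minus1_1(x: np.ndarray) -> np.ndarray:
    """Min-max normalization of the pixel data to the range [-1, 1]."""
    x_min, x_max = x.min(), x.max()
    return 2.0 * (x - x_min) / (x_max - x_min) - 1.0

def make_packets(n_samples: int, bs: int, seed: int = 0):
    """Randomly order the training pixels and pack them into batches of bs."""
    order = np.random.default_rng(seed).permutation(n_samples)
    return [order[i:i + bs] for i in range(0, n_samples, bs)]

packets = make_packets(10366, bs=512)   # one packet per optimization iteration
```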
The above method is verified next with a specific embodiment.

The hyperspectral image feature extraction method based on a multi-level variational autoencoder is used to extract the spatial-spectral features of a hyperspectral image for subsequent classification research, taking the Indian Pines dataset as an example. The image size is 145×145×200; it contains 21025 pixels in total, each pixel covers 200 spectral wavelengths, and the whole dataset contains 16 effective categories plus a background-noise category. After removing the pixels belonging to the background-noise category, 10366 valid pixels remain. The deep network structure is shown in Fig. 2.

Input: the input hyperspectral image is an image of size 145×145×200.

Parameter settings: the neighborhood size is 5, the number of network layers in the spatial and spectral feature extraction modules is 3, the number of network layers in the decoder module is 3, and the embedded feature dimension is 40.

Neighborhood information is selected: for each pixel, neighborhood information of size 5×5×200 is obtained, and the pixel and its neighborhood information are fed into the deep network for training.

Training the deep network:

From the 10366 pixels of the dataset, 40% of the samples are selected for training the deep network model; these samples are randomly ordered and packed, with 512 pixels per mini-batch, and only one sample packet is used per training step. After training, all 10366 pixels are fed into the deep model for testing, yielding embedded features of size 10366×40, which are finally classified with an SVM classifier. 10% of the samples are randomly selected to train the SVM classifier and the remaining 90% are used for testing, yielding the final classification results; the overall classification accuracy and the average classification accuracy are used to evaluate the results. The overall classification accuracy is the number of correctly classified samples divided by the total number of samples. The average classification accuracy first divides, for each class, the number of correctly classified samples by the number of samples in that class, and then averages these per-class ratios.
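The evaluation protocol (SVM on the embedded features, overall and average classification accuracy) can be reproduced with scikit-learn along the following lines; the random arrays are placeholders for the 10366×40 embedded features, not the patent's data:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def overall_and_average_accuracy(y_true, y_pred):
    oa = float(np.mean(y_true == y_pred))                 # correct / total
    classes = np.unique(y_true)
    per_class = [float(np.mean(y_pred[y_true == c] == c)) for c in classes]
    return oa, float(np.mean(per_class))                  # OA, AA

# Random placeholders for the embedded features and the 16-class labels.
emb = np.random.rand(1000, 40)
labels = np.random.randint(0, 16, size=1000)
X_tr, X_te, y_tr, y_te = train_test_split(
    emb, labels, train_size=0.1, stratify=labels, random_state=0)
oa, aa = overall_and_average_accuracy(y_te, SVC().fit(X_tr, y_tr).predict(X_te))
```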
The classification results obtained with the hyperspectral image feature extraction method based on a multi-level variational autoencoder proposed by this application and with an ordinary variational autoencoder (the ordinary variational autoencoder includes an encoder, a feature fusion module, and a decoder, where the encoder consists of 3 fully connected layers and the decoder of 3 fully connected layers; the numbers of network-layer nodes and the structure of the feature fusion module are the same as in the method of this application) are shown in the table below.

                                    Overall accuracy   Average accuracy
Method of this application               85.3%              79.1%
With random Gaussian noise added         81.4%              72.3%
Ordinary variational autoencoder         76.7%              66.3%
As the table shows, the method of this application improves the classification performance of the embedded features and produces fewer misclassified samples. In addition, by adding a certain amount of random Gaussian noise to the original hyperspectral image and repeating the above experiment, an overall classification accuracy of 81.4% is obtained (the overall classification accuracy reaches 85.3% when no random Gaussian noise is added), which shows that the method of this application has a strong ability to resist noise interference. The method of this application can therefore effectively improve the classifiability and classification accuracy of the embedded features while also improving the model's resistance to noise interference.

The above embodiments only illustrate the technical concept and characteristics of this application; their purpose is to enable those familiar with this technology to understand the content of this application and implement it accordingly, and they cannot limit the protection scope of this application. Any equivalent transformation or modification made according to the spirit of this application shall fall within the protection scope of this application.

Claims (10)

  1. A hyperspectral image feature extraction method based on a multi-level variational autoencoder, characterized in that the method comprises the following steps:
    S1. selecting a hyperspectral image, the size of the hyperspectral image being X×Y×B, where X and Y are the spatial dimensions of the hyperspectral image at each wavelength and B is the number of wavelengths of the hyperspectral image;
    S2. configuring neighborhood information for each hyperspectral pixel in the hyperspectral image, that is, selecting the neighborhood pixels of size s×s around the hyperspectral pixel as the neighborhood information of the pixel, the size of the neighborhood information being s×s×B, where the neighborhood information refers to the square region centered on the hyperspectral pixel, the side length is s, and s is an odd number;
    S3. constructing a deep network model and transforming the neighborhood information based on the deep network model to obtain a first sample of size 1×s²×B, the first sample serving as the input Input^p of the spatial feature extraction module;
    taking a second sample consisting of a hyperspectral pixel of size 1×B, the second sample serving as the input Input^q of the spectral feature extraction module, the first samples and the second samples being equal in number and corresponding one to one;
    S4. training the deep neural network;
    S5. feature splicing and computation of the mean feature μ: in the spatial feature extraction module, the input Input_i^p of the i-th layer is the output Output_{i-1}^p of the (i-1)-th layer; in the spectral feature extraction module, the input Input_i^q of the i-th layer is the splicing of the outputs of the i-th spatial layer and the (i-1)-th spectral layer, according to the formula Input_i^q = Concat(Output_i^p, Output_{i-1}^q), where 1 < i < m; the mean feature μ is obtained according to the formula μ = Concat(Output_m^p, Output_m^q), the size of μ being bs×s²×d;
    S6. pooling, that is, pooling the mean feature μ and the standard deviation feature δ with an average pooling layer to obtain the pooled mean feature μ̄ and the pooled standard deviation feature δ̄, both of size bs×d;
    S7. obtaining the fused feature O based on the feature fusion module and inputting the fused feature O to the decoder module, the decoder being used for data reconstruction of the fused feature O;
    S8. network optimization, that is, constructing the loss function for network model training according to the formula Γ = Γ_R + Γ_KL + Γ_Homo, where Γ_R uses the Euclidean distance to compute the similarity between the input of the spectral feature extraction module and the output of the encoder, Γ_KL is the loss function of the variational autoencoder (VAE) and uses the KL divergence to compute the similarity between the Gaussian distribution and the distribution of the embedded features, Γ_Homo uses the spectral angle distance to compute the similarity between the outputs of all spatial feature extraction module layers and the outputs of the corresponding spectral feature extraction module layers, and Σ(·) means summing everything inside the brackets.
  2. The hyperspectral image feature extraction method based on a multi-level variational autoencoder according to claim 1, characterized in that step S1 further comprises:
    normalizing the hyperspectral image as preprocessing, and setting the neighborhood size s, the number m of network layers in the spatial and spectral feature extraction modules, the number n of network layers in the decoder module, and the embedded feature dimension d, where d is an even number greater than 0.
  3. The hyperspectral image feature extraction method based on a multi-level variational autoencoder according to claim 1, characterized in that step S4 comprises:
    randomly selecting mini-batches from the X×Y first samples of size 1×s²×B and the X×Y second samples of size 1×B and inputting them into the deep neural network for its training, the mini-batch pixel count being bs for both the first and second samples, and the activation function throughout the network being the Tanh activation function.
  4. The hyperspectral image feature extraction method based on a multi-level variational autoencoder according to claim 3, characterized in that it further comprises:
    normalizing all X×Y hyperspectral pixels so that their values lie between -1 and 1, the normalization formula being x̃ = 2(x - x_min)/(x_max - x_min) - 1, where x_min represents the minimum value in the pixel data and x_max the maximum value.
  5. The hyperspectral image feature extraction method based on a multi-level variational autoencoder according to claim 3, characterized in that
    the Tanh activation function is computed as tanh(x) = (e^x - e^{-x}) / (e^x + e^{-x}).
  6. The hyperspectral image feature extraction method based on a multi-level variational autoencoder according to claim 1, characterized in that
    step S8 further comprises: using the loss function, selecting an Adam optimizer with a step size of 10⁻³ to optimize the constructed network model, and after the model reaches a steady state, taking the pooled mean feature μ̄ as the output and using the first samples and the second samples as test samples to obtain the desired embedded features.
  7. The hyperspectral image feature extraction method based on a multi-level variational autoencoder according to claim 1, characterized in that
    step S3 comprises:
    transforming the hyperspectral pixel x of size 1×B to obtain a second sample of size 1×s²×((B-ε)/s²), the second sample serving as the input Input^q of the spectral feature extraction module;
    if B is not an integer multiple of s², removing ε wavelengths so that B-ε is an integer multiple of s², where q refers to the relevant variables of the spectral feature extraction module.
  8. The hyperspectral image feature extraction method based on a multi-level variational autoencoder according to claim 1, characterized in that
    step S6 further comprises:
    obtaining the standard deviation feature δ with a long short-term memory network layer L, the input of the long short-term memory network layer L being the spliced final-layer output of the two feature extraction modules, the number of nodes being d, and the size of δ being bs×s²×d.
  9. The hyperspectral image feature extraction method based on a multi-level variational autoencoder according to claim 1, characterized in that
    in step S7, the decoder module comprises:
    n fully connected network layers, where the network layers are {d_1, d_2 … d_n}, the inputs of the network layers are {ind_1, ind_2 … ind_n}, the outputs of the network layers are {outd_1, outd_2 … outd_n}, the number of nodes in the last layer is B, and the number of nodes in the other network layers is d.
  10. The hyperspectral image feature extraction method based on a multi-level variational autoencoder according to claim 1, characterized in that step S7 further comprises:
    obtaining the fused feature O according to the formula O = μ̄ + γ ⊙ δ̄, where ⊙ denotes element-wise multiplication and γ is a randomly generated noise matrix that follows a Gaussian distribution and has size bs×d.
PCT/CN2022/142106 2021-12-28 2022-12-26 Hyperspectral image feature extraction method based on a multi-level variational autoencoder WO2023125456A1 (zh)

Applications Claiming Priority (2)

Application Number: CN202111627432.1 / CN202111627432.1A
Priority Date: 2021-12-28
Title: Hyperspectral image feature extraction method based on a multi-level variational autoencoder

Publications (1)

Publication Number
WO2023125456A1

Family

ID=86997859

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/142106 WO2023125456A1 (zh) 2021-12-28 2022-12-26 Hyperspectral image feature extraction method based on a multi-level variational autoencoder

Country Status (2)

Country Link
CN (1) CN116416441A (zh)
WO (1) WO2023125456A1 (zh)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190026586A1 (en) * 2017-07-19 2019-01-24 Vispek Inc. Portable substance analysis based on computer vision, spectroscopy, and artificial intelligence
CN111160273A (zh) * 2019-12-31 2020-05-15 北京云智空间科技有限公司 Hyperspectral image spatial-spectral joint classification method and device
CN111914907A (zh) * 2020-07-13 2020-11-10 河海大学 Hyperspectral image classification method based on a deep-learning spatial-spectral joint network
CN112101271A (zh) * 2020-09-23 2020-12-18 台州学院 Hyperspectral remote sensing image classification method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117115553A (zh) * 2023-09-13 2023-11-24 南京审计大学 Hyperspectral remote sensing image classification method based on masked spectral-spatial feature prediction
CN117115553B (zh) * 2023-09-13 2024-01-30 南京审计大学 Hyperspectral remote sensing image classification method based on masked spectral-spatial feature prediction

Also Published As

Publication number Publication date
CN116416441A (zh) 2023-07-11


Legal Events

Code 121: Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22914710; Country of ref document: EP; Kind code of ref document: A1)