CN115936136A

CN115936136A - Data recovery method and system based on low-rank structure

Info

Publication number: CN115936136A
Application number: CN202210434699.7A
Authority: CN
Inventors: 罗廷金; 刘玥瑛; 侯臣平
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2022-04-24
Filing date: 2022-04-24
Publication date: 2023-04-07

Abstract

The present application relates to a data restoration method and system based on a low-rank structure. The method includes: collecting an image data set and analyzing the impact of target occlusion in different regions and positions of the image data set on the quality and signal-to-noise ratio of the target data; according to the analysis result, constructing a low-rank matrix data restoration model based on half-quadratic minimization for the missing occluded target data in the image data set; and, through the low-rank matrix data restoration model and according to half-quadratic minimization theory, restoring the missing part of the target data of the image data set. To address the problem of missing occluded target data, the embodiments of the present invention introduce a robust factor into the loss function, design a low-rank matrix data restoration method based on half-quadratic minimization, and solve the resulting optimization problem with half-quadratic minimization theory, thereby restoring the occluded target data matrix while effectively reducing the interference of missing data and noisy sample points.

Description

A data restoration method and system based on low-rank structure

Technical Field

The present application relates to the field of data restoration, and in particular to a data restoration method and system based on a low-rank structure.

Background Art

Target occlusion amounts to missing data features and is one of the typical scenarios of incomplete data analysis in machine learning and signal processing. Learning methods for data with incomplete information are widely used in fields such as recommendation systems, medical data analysis, image restoration, and image super-resolution.

To address incomplete label information, methods such as semi-supervised learning and weakly supervised learning are often used to augment the label information. For example, semi-supervised classification based on generative models is one of the earlier approaches. Its core idea is to adopt the cluster assumption, combine labeled and unlabeled data to build a joint probability distribution model of the data, and, under the assumed distribution, compute the posterior probability that a sample belongs to a given class. The drawback of this type of method is that the probability distribution of the data must be assumed before the model is built, and in practice, because the data are sparse, the assumed distribution is often inaccurate. The core idea of graph-based semi-supervised classification is label propagation: a weighted undirected graph is constructed to express the pairwise relationships between samples, and label information is propagated from labeled to unlabeled data according to the weighted connections in the graph. When the graph matrix is of high quality, this type of method achieves good results; its drawback is that it is a transductive model, so the model must be retrained to handle newly arriving data.

Discriminative semi-supervised classification learns a classification hyperplane that maximizes the margin to the nearest samples. Self-training semi-supervised classification first builds a classifier from the labeled samples to predict the unlabeled samples, and then adds the unlabeled samples with high confidence, together with their predicted labels, to the training set as labeled data. This method must repeatedly add new data and retrain the classifier, so it is time-consuming; moreover, if the initial classifier is biased, the errors keep accumulating and the final model becomes inaccurate.

Summary of the Invention

Based on this, it is necessary to provide a data restoration method and system based on a low-rank structure to address the above technical problems.

In a first aspect, an embodiment of the present invention provides a data restoration method based on a low-rank structure, the method comprising:

collecting an image data set, and analyzing the impact of target occlusion in different regions and positions of the image data set on the quality and signal-to-noise ratio of the target data;

according to the analysis result, constructing a low-rank matrix data restoration model based on half-quadratic minimization for the missing occluded target data in the image data set;

restoring the missing part of the target data of the image data set through the low-rank matrix data restoration model and according to half-quadratic minimization theory.

Further, the collecting of the image data set and the analyzing of the impact of target occlusion in different regions and positions of the image data set on the quality and signal-to-noise ratio of the target data comprise:

converting the missing data in the image data set into a high-dimensional missing data matrix X, where X is generated by mixing a real data matrix Z and noise E;

decomposing the real data matrix Z into the product of two low-rank matrices, that is, Z = UW, where W ∈ R^{k×n} is the low-dimensional representation matrix and U ∈ R^{d×k} is the basis matrix of the low-dimensional representation subspace.

Further, the constructing, according to the analysis result, of a low-rank matrix data restoration model based on half-quadratic minimization for the missing occluded target data in the image data set comprises:

The constructed low-rank matrix data restoration model is:

where f(Z, U, W) denotes the loss function of the low-rank approximation, φ(U) and the corresponding term on W are the regularization terms applied to the low-dimensional representation subspace basis matrix U and the low-dimensional representation matrix W, respectively, and λ and γ are hyperparameters greater than zero.

Further, the constructing, according to the analysis result, of a low-rank matrix data restoration model based on half-quadratic minimization for the missing occluded target data in the image data set also comprises:

enhancing the robustness of the learning model by means of a half-quadratic robust estimation function, and expressing the loss function as:

introducing a robust estimator with the l₁-l₂ robust factor, which assigns smaller weights to missing samples and outliers in the loss function, and converting the loss function again to:

introducing a potential function ρ(t) into the loss function and equivalently converting the expression in the potential function into a form in which s is an auxiliary variable and ψ(s) is the dual function of the robust factor; the loss function is then further converted to:

In another aspect, an embodiment of the present invention further provides a data restoration system based on a low-rank structure, comprising:

an image missing analysis module, used to collect an image data set and analyze the impact of target occlusion in different regions and positions of the image data set on the quality and signal-to-noise ratio of the target data;

a data restoration model module, used to construct, according to the analysis result, a low-rank matrix data restoration model based on half-quadratic minimization for the missing occluded target data in the image data set;

an image restoration module, used to restore the missing part of the target data of the image data set through the low-rank matrix data restoration model and according to half-quadratic minimization theory.

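Further, the image missing analysis module includes a low-rank representation unit, which is used to perform the low-rank representation described for the method above, that is, to convert the missing data in the image data set into the high-dimensional missing data matrix X and to decompose the real data matrix Z as Z = UW.

Further, the data restoration model module includes an objective function unit, which is used to construct the low-rank matrix data restoration model, where f(Z, U, W) denotes the loss function of the low-rank approximation and φ(U) and the corresponding term on W are the regularization terms.

Further, the data restoration model module also includes a loss function unit, which is used to introduce a potential function ρ(t) into the loss function and to convert the expression in the potential function into an equivalent form.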

Further, the data restoration model module also includes a model solving unit, which is used to:

add a smoothing term to the low-dimensional representation subspace basis matrix U;

add a nuclear-norm regularization term to the low-dimensional representation matrix W;

convert the constructed low-rank matrix data restoration model into:

reduce the influence of abnormal or missing samples on the model according to the robust factor, and convert the low-rank matrix data restoration model again into:

solve the model with an alternating minimization strategy, in which one group of variables is updated at a time while the other variables are treated as known and held fixed, to obtain the solution of the low-rank matrix data restoration model.

In the above data restoration method and system based on a low-rank structure, the method includes: collecting an image data set and analyzing the impact of target occlusion in different regions and positions of the image data set on the quality and signal-to-noise ratio of the target data; according to the analysis result, constructing a low-rank matrix data restoration model based on half-quadratic minimization for the missing occluded target data in the image data set; and restoring the missing part of the target data of the image data set through the low-rank matrix data restoration model and according to half-quadratic minimization theory. To address the problem of missing occluded target data, the embodiments of the present invention introduce a robust factor into the loss function, design a low-rank matrix data restoration method based on half-quadratic minimization, and solve the resulting optimization problem with half-quadratic minimization theory, thereby restoring the occluded target data matrix while effectively reducing the interference of missing data and noisy sample points. Finally, in a comparison with traditional methods, numerical experiments verify the effectiveness of the proposed method, which significantly improves target data quality and intelligent target recognition performance under occlusion.

Brief Description of the Drawings

FIG. 1 is a schematic flowchart of a data restoration method based on a low-rank structure in one embodiment;

FIG. 2 is a schematic flowchart of performing low-rank representation on missing data in one embodiment;

FIG. 3 is a structural block diagram of a data restoration system based on a low-rank structure in one embodiment.

Detailed Description

To make the purpose, technical solution, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.

During data collection, factors such as decoy interference or occlusion can leave the collected data features incomplete, which is the phenomenon of missing data features. Missing data features make the target information incomplete, which in turn greatly degrades subsequent target recognition performance. Analysis shows that common visible-light images have a low-rank prior structure. It is therefore necessary to model the low-rank structure of the data and to use this prior structure to complete the missing features, thereby supplementing and enhancing the target data information.

In one embodiment, as shown in FIG. 1, a data restoration method based on a low-rank structure is provided, the method comprising:

Step 101: collecting an image data set, and analyzing the impact of target occlusion in different regions and positions of the image data set on the quality and signal-to-noise ratio of the target data;

Step 102: according to the analysis result, constructing a low-rank matrix data restoration model based on half-quadratic minimization for the missing occluded target data in the image data set;

Step 103: restoring the missing part of the target data of the image data set through the low-rank matrix data restoration model and according to half-quadratic minimization theory.

Specifically, to address the data loss caused by target occlusion, simulation experiments are first used to analyze the impact of target occlusion in different regions and positions on the quality and signal-to-noise ratio of the target data, and the signal characteristics and effects of different types of occlusion position are analyzed in depth. At the same time, experimental analysis verifies that the low-rank assumption is reasonable for occluded target data, which supports the construction of models and methods for restoring occluded data features based on a low-rank structure.
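The kind of simulation described above can be sketched as follows. This is only an illustration under stated assumptions: the synthetic image, the zero-filled square occlusion, the use of PSNR as the quality measure, and the helper names `psnr`, `occlude`, and `energy_rank` are all hypothetical and are not taken from the embodiment.

```python
import numpy as np

def psnr(reference, degraded, peak=1.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    mse = np.mean((reference - degraded) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def occlude(image, top, left, size):
    """Return a copy of `image` with a size x size block set to zero."""
    out = image.copy()
    out[top:top + size, left:left + size] = 0.0
    return out

def energy_rank(image, energy=0.99):
    """Smallest number of singular values whose squared sum reaches `energy`."""
    sv = np.linalg.svd(image, compute_uv=False)
    cumulative = np.cumsum(sv ** 2) / np.sum(sv ** 2)
    return int(np.searchsorted(cumulative, energy) + 1)

# Synthetic stand-in for a real image (an assumption): a smooth rank-2 pattern in [0, 1].
rows = np.linspace(0.0, 1.0, 64)
image = 0.5 * np.outer(rows, rows) + 0.5 * np.outer(1.0 - rows, np.ones(64))

# Compare the same 16 x 16 occlusion placed at different positions.
positions = {"top-left": (0, 0), "centre": (24, 24), "bottom-left": (48, 0)}
for name, (top, left) in positions.items():
    print(name, round(psnr(image, occlude(image, top, left, 16)), 2), "dB")

# A small value here supports a low-rank assumption for this image.
print("singular values needed for 99% energy:", energy_rank(image))
```

Occluding a bright region removes more signal energy than occluding a dark one, so the PSNR depends on where the occlusion falls, which is the position dependence the analysis step examines.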

To address the problem of missing occluded target data, this embodiment introduces a robust factor into the loss function, designs a low-rank matrix data restoration method based on half-quadratic minimization, and solves the resulting optimization problem with half-quadratic minimization theory, thereby restoring the occluded target data matrix while effectively reducing the interference of missing data and noisy sample points. Finally, in a comparison with traditional methods, numerical experiments verify the effectiveness of the proposed method, which significantly improves target data quality and intelligent target recognition performance under occlusion.

In one embodiment, as shown in FIG. 2, the process of performing low-rank representation on missing data includes the following steps:

Step 201: converting the missing data in the image data set into a high-dimensional missing data matrix X, where X is generated by mixing a real data matrix Z and noise E;

Step 202: decomposing the real data matrix Z into the product of two low-rank matrices, that is, Z = UW, where W ∈ R^{k×n} is the low-dimensional representation matrix and U ∈ R^{d×k} is the basis matrix of the low-dimensional representation subspace.
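A minimal sketch of this data model is given below. The dimensions, the Gaussian noise, the random observation mask, and the zero-filling of occluded entries are assumptions made for illustration; the source only states that X is produced by mixing Z and E and that Z = UW.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n, k = 50, 40, 3                      # feature dimension, sample count, latent rank

U = rng.standard_normal((d, k))          # basis of the low-dimensional subspace
W = rng.standard_normal((k, n))          # low-dimensional representation of each sample
Z = U @ W                                # real (clean) data matrix, rank at most k

E = 0.01 * rng.standard_normal((d, n))   # small dense noise (assumed Gaussian)
mask = rng.random((d, n)) > 0.2          # True where an entry is observed (hypothetical)
X = np.where(mask, Z + E, 0.0)           # occluded entries are zero-filled (assumption)

print("rank of Z:", np.linalg.matrix_rank(Z))
print("fraction of observed entries:", mask.mean())
```

Because every column of Z lies in the k-dimensional column space of U, a d × n matrix is described by only (d + n)·k numbers, which is what makes the missing entries recoverable in principle.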

In one embodiment, the constructing, according to the analysis result, of a low-rank matrix data restoration model based on half-quadratic minimization for the missing occluded target data in the image data set comprises constructing a model in which f(Z, U, W) denotes the loss function of the low-rank approximation and φ(U) and the corresponding term on W are the regularization terms.

Specifically, to improve robustness to missing data, a low-rank matrix restoration method based on half-quadratic minimization (IMLHM) is designed on the basis of half-quadratic minimization theory. Given a high-dimensional missing data matrix X, X can usually be regarded as generated by mixing Z and noise E. Under the low-rank assumption on the data, the real data matrix Z is decomposed into the product of two low-rank matrices, that is, Z = UW, where W ∈ R^{k×n} is the low-dimensional representation matrix and U ∈ R^{d×k} is the basis matrix of the low-dimensional representation subspace. The low-rank matrix restoration framework based on half-quadratic minimization can then be described as
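A plausible form of this framework, reconstructed only from the symbols defined in the surrounding text, is sketched below. The symbol \tilde{\phi}(W) for the regularization term on W and the pairing of λ with U and γ with W are assumptions; the original formula is not reproduced in this text.

```latex
\min_{U,\,W}\; f(Z, U, W) \;+\; \lambda\,\phi(U) \;+\; \gamma\,\tilde{\phi}(W),
\qquad \lambda > 0,\; \gamma > 0
```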

In one embodiment, the constructing, according to the analysis result, of a low-rank matrix data restoration model based on half-quadratic minimization for the missing occluded target data in the image data set further comprises:

introducing a potential function ρ(t) into the loss function and converting the expression in the potential function into an equivalent form.

Specifically, many traditional learning methods are special cases of this framework, differing only in the loss function or regularization terms they use. For the loss function, a norm of the residual is commonly used to measure the fitting error. However, such a loss is very sensitive to data noise, missing data, and outliers, which in turn makes traditional methods sensitive to missing data. A half-quadratic robust estimation function is a common way to enhance the robustness of a learning model, so the loss function can be transformed accordingly.

In addition, introducing the robust estimator is equivalent to assigning smaller weights to missing samples and outliers and larger weights to important points. Specifically, the weight assigned to each sample x_i is θ(||z_i − Uw_i||_F). Among the robust estimation functions above, the l₁-l₂ robust factor has the simplest form and performs well, so l₁-l₂ is chosen as part of the loss function of this method.

A smoothing term is added to the low-dimensional representation subspace basis matrix U, a nuclear-norm regularization term is added to the low-dimensional representation matrix W, and the low-rank matrix data restoration model is converted again accordingly.

Specifically, to give the low-dimensional representation a low-rank clustering structure and to avoid overfitting, a smoothing term is added to U and a nuclear-norm regularization term is added to W. This yields the objective function of the proposed low-rank matrix restoration method based on half-quadratic minimization.
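The nuclear norm (the sum of singular values) is a standard convex surrogate for rank. One common way to handle such a term in an iterative solver is singular value thresholding, the proximal operator of the nuclear norm. The sketch below shows that operator only; whether the embodiment uses it is not stated in the source, and the function name and the parameter `tau` are hypothetical.

```python
import numpy as np

def singular_value_threshold(M, tau):
    """Proximal operator of tau * (nuclear norm): soft-threshold the singular values of M."""
    left, sv, right_t = np.linalg.svd(M, full_matrices=False)
    shrunk = np.maximum(sv - tau, 0.0)     # singular values below tau become zero
    return (left * shrunk) @ right_t       # scale column j of `left` by shrunk[j]

# Example: shrinking a noisy matrix lowers its rank.
rng = np.random.default_rng(0)
W_noisy = rng.standard_normal((3, 1)) @ rng.standard_normal((1, 8)) \
    + 0.05 * rng.standard_normal((3, 8))
W_low_rank = singular_value_threshold(W_noisy, tau=0.5)
print(np.linalg.matrix_rank(W_noisy), "->", np.linalg.matrix_rank(W_low_rank))
```

Larger values of `tau` remove more singular values, so the threshold plays the role of the regularization strength on W.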

Introducing the robust estimation function makes the optimization problem harder to solve. According to half-quadratic minimization theory, the expression in the potential function can be equivalently converted into a form with an auxiliary variable s, where ψ(s) is the dual function of the robust factor. The loss function in the formula can therefore be equivalently converted to:
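For reference, the multiplicative half-quadratic identity commonly used in the literature is sketched below. The factor 1/2, the norm used for the residual, and the per-sample indexing are assumptions of this sketch and may differ from the formulas of the embodiment, which are not reproduced in this text.

```latex
\rho(t) \;=\; \min_{s}\;\Big\{ \tfrac{1}{2}\, s\, t^{2} + \psi(s) \Big\},
\qquad
\sum_{i=1}^{n} \rho\big(\lVert z_i - U w_i \rVert\big)
\;=\;
\min_{s_1,\dots,s_n}\; \sum_{i=1}^{n} \Big\{ \tfrac{1}{2}\, s_i\, \lVert z_i - U w_i \rVert^{2} + \psi(s_i) \Big\}
```

With s held fixed, the right-hand side is a weighted least-squares problem in U when W is fixed, and in W when U is fixed, which is what makes alternating updates tractable.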

In addition, different robust estimators have different dual functions ψ(s), and different dual functions ψ(s) correspond to different minimizer functions θ(t); the minimizer function θ(t) in turn determines the auxiliary variable s. The minimizer function takes one of two forms, additive or multiplicative. Because the multiplicative form needs fewer iterations to converge than the additive form, the multiplicative form is generally used. For the robust factor of the l₁-l₂ function, the larger the difference between z_i and Uw_i, the smaller the sample weight s_i, which effectively reduces the influence of abnormal or missing samples on the model. The value and update of s are determined by the minimizer function θ(t); the explicit form of ψ(s) need not be known and does not affect the update of the model parameters, so it can be ignored.
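The weighting behaviour described above can be illustrated with a commonly used multiplicative form for the l₁-l₂ function, ρ(t) = sqrt(c + t²), whose weight is 1 / sqrt(c + t²). The constant `c`, the function name, and the choice of this exact form are assumptions; the embodiment's own formula is not reproduced in this text.

```python
import numpy as np

def l1_l2_weights(Z, U, W, c=1.0):
    """Per-sample weights s_i = 1 / sqrt(c + r_i^2), where r_i is the
    Euclidean norm of the i-th column residual z_i - U w_i."""
    residual_norms = np.linalg.norm(Z - U @ W, axis=0)
    return 1.0 / np.sqrt(c + residual_norms ** 2)

# Three samples that fit the model exactly, then one corrupted sample.
U = np.eye(2)
W = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, 3.0]])
Z = U @ W
Z[:, 2] += 10.0                     # corrupt the third sample
print(l1_l2_weights(Z, U, W))       # the corrupted sample receives the smallest weight
```

The weight decreases monotonically with the residual norm, so a sample that the current U and W explain poorly contributes less to the next update.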

In the solution process, auxiliary notation is introduced, and the matrix form of the problem is:

The specific iterative steps are given in the algorithm.
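The algorithm itself is not reproduced in this text. The following is a simplified sketch of an alternating scheme in the same spirit, not the embodiment's algorithm: it swaps in plain squared Frobenius-norm (ridge) penalties on both U and W in place of the smoothing term and the nuclear norm, treats columns of X as samples, initializes from a truncated SVD, and uses the weight 1 / sqrt(c + r²). The function name and all parameters are hypothetical.

```python
import numpy as np

def robust_low_rank_fit(X, k, lam=1e-2, gamma=1e-2, c=1.0, iters=50):
    """Alternately update W, U, and the sample weights s for
    sum_i s_i * ||x_i - U w_i||^2 + lam * ||U||_F^2 + gamma * ||W||_F^2."""
    n = X.shape[1]
    left, sv, right_t = np.linalg.svd(X, full_matrices=False)
    U = left[:, :k] * np.sqrt(sv[:k])              # truncated-SVD initialization
    W = np.sqrt(sv[:k])[:, None] * right_t[:k]
    s = np.ones(n)
    eye = np.eye(k)
    for _ in range(iters):
        # W-step: with U and s fixed, each column solves a small ridge problem.
        gram = U.T @ U
        for i in range(n):
            W[:, i] = np.linalg.solve(s[i] * gram + gamma * eye,
                                      s[i] * (U.T @ X[:, i]))
        # U-step: with W and s fixed, solve a weighted ridge regression.
        WS = W * s                                 # scales column i of W by s_i
        U = np.linalg.solve(WS @ W.T + lam * eye, WS @ X.T).T
        # s-step: with U and W fixed, recompute the multiplicative weights.
        r = np.linalg.norm(X - U @ W, axis=0)
        s = 1.0 / np.sqrt(c + r ** 2)
    return U, W, s

rng = np.random.default_rng(0)
Z_true = rng.standard_normal((20, 2)) @ rng.standard_normal((2, 60))
X_obs = Z_true.copy()
X_obs[:, 0] += 2.0 * rng.standard_normal(20)      # one corrupted sample
U_hat, W_hat, s_hat = robust_low_rank_fit(X_obs, k=2)
print("weight of corrupted sample:", s_hat[0], "smallest clean weight:", s_hat[1:].min())
```

Each step fixes all but one group of variables, which matches the alternating minimization strategy described in the summary; replacing the ridge penalty on W with a nuclear-norm step would change only the W update.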

It should be understood that, although the steps in the above flowcharts are shown in sequence as indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, there is no strict restriction on the order of execution, and the steps may be executed in other orders. Moreover, at least some of the steps in the above flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same time but may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in FIG. 3, a data restoration system based on a low-rank structure is provided, comprising:

an image missing analysis module 301, used to collect an image data set and analyze the impact of target occlusion in different regions and positions of the image data set on the quality and signal-to-noise ratio of the target data;

a data restoration model module 302, used to construct, according to the analysis result, a low-rank matrix data restoration model based on half-quadratic minimization for the missing occluded target data in the image data set;

an image restoration module 303, used to restore the missing part of the target data of the image data set through the low-rank matrix data restoration model and according to half-quadratic minimization theory.

In one embodiment, as shown in FIG. 3, the image missing analysis module 301 includes a low-rank representation unit 3011, which is used to perform the low-rank representation of steps 201 and 202, that is, to convert the missing data in the image data set into the high-dimensional missing data matrix X and to decompose the real data matrix Z as Z = UW.

In one embodiment, as shown in FIG. 3, the data restoration model module 302 includes an objective function unit 3021, which is used to construct the low-rank matrix data restoration model, where f(Z, U, W) denotes the loss function of the low-rank approximation and φ(U) and the corresponding term on W are the regularization terms.

In one embodiment, as shown in FIG. 3, the data restoration model module 302 further includes a loss function unit 3022, which is used to introduce a potential function ρ(t) into the loss function and to convert the expression in the potential function into an equivalent form.

In one embodiment, as shown in FIG. 3, the data restoration model module 302 further includes a model solving unit 3023, which is used to add a smoothing term to the low-dimensional representation subspace basis matrix U, add a nuclear-norm regularization term to the low-dimensional representation matrix W, and convert the low-rank matrix data restoration model again accordingly.

For the specific limitations of the data restoration system based on a low-rank structure, refer to the limitations of the data restoration method based on a low-rank structure above, which are not repeated here. Each module in the above data restoration system may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of a computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call and execute the operations corresponding to each module.

Those skilled in the art will understand that the structure shown in FIG. 3 is merely a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

A person of ordinary skill in the art will understand that all or part of the processes in the above method embodiments can be implemented by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments.

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination contains no contradiction, it should be considered within the scope of this specification.

The above embodiments express only several implementations of the present application. Their description is relatively specific and detailed, but it should not be understood as limiting the scope of the invention patent. It should be noted that a person of ordinary skill in the art can make several variations and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims

1. A data recovery method based on a low-rank structure, the method comprising:

acquiring an image data set, and analyzing the influence of target occlusion in different areas and positions in the image data set on the quality and signal-to-noise ratio of target data;

according to the analysis result, constructing a low-rank matrix data restoration model based on half-quadratic minimization for the missing occluded target data in the image data set; and

performing data restoration on the missing part of the target data of the image data set through the low-rank matrix data restoration model according to half-quadratic minimization theory.

2. The data recovery method based on a low-rank structure according to claim 1, wherein the acquiring of the image data set and the analyzing of the influence of target occlusion in different areas and positions in the image data set on the quality and signal-to-noise ratio of the target data comprise:

converting missing data in the image data set into a high-dimensional missing data matrix X, wherein the high-dimensional missing data matrix X is generated by mixing a real data matrix Z and noise E;

decomposing the real data matrix Z into the product of two low-rank matrices, that is, Z = UW, where W ∈ R^{k×n} is a low-dimensional representation matrix and U ∈ R^{d×k} is a low-dimensional representation subspace basis matrix.

3. The data recovery method based on a low-rank structure according to claim 1, wherein the constructing, according to the analysis result, of a low-rank matrix data restoration model based on half-quadratic minimization for the missing occluded target data in the image data set comprises:

the constructed low-rank matrix data restoration model is:

where f(Z, U, W) denotes the loss function of the low-rank approximation, φ(U) and the corresponding term on W are regularization terms applied to the low-dimensional representation subspace basis matrix U and the low-dimensional representation matrix W, and λ and γ are hyperparameters greater than zero.

4. The data recovery method based on a low-rank structure according to claim 2, wherein the constructing, according to the analysis result, of a low-rank matrix data restoration model based on half-quadratic minimization for the missing occluded target data in the image data set further comprises:

enhancing the robustness of the learning model by means of a half-quadratic robust estimation function, and expressing the loss function as:

introducing a robust estimator with the l₁-l₂ robust factor, which assigns smaller weights to missing samples and outliers in the loss function, and converting the loss function again to:

introducing a potential function ρ(t) into the loss function and equivalently converting the expression in the potential function into a form in which s is an auxiliary variable and ψ(s) is the dual function of the robust factor, whereby the loss function is further converted to:

5. The data recovery method based on a low-rank structure according to claim 4, wherein the constructing, according to the analysis result, of a low-rank matrix data restoration model based on half-quadratic minimization for the missing occluded target data in the image data set further comprises:

adding a smoothing term to the low-dimensional representation subspace basis matrix U;

adding a nuclear-norm regularization term to the low-dimensional representation matrix W;

converting the constructed low-rank matrix data restoration model into:

reducing the influence of abnormal or missing samples on the model according to the robust factor, and converting the low-rank matrix data restoration model again into:

solving with an alternating minimization strategy, in which one group of variables is updated while the other variables are treated as known and held fixed, to obtain the solution of the low-rank matrix data restoration model.

6. A data recovery system based on a low-rank structure, comprising:

an image missing analysis module, used for acquiring an image data set and analyzing the influence of target occlusion in different areas and positions in the image data set on the quality and signal-to-noise ratio of target data;

a data restoration model module, used for constructing, according to the analysis result, a low-rank matrix data restoration model based on half-quadratic minimization for the missing occluded target data in the image data set; and

an image restoration module, used for restoring the missing part of the target data of the image data set through the low-rank matrix data restoration model according to half-quadratic minimization theory.

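7. The data recovery system based on a low-rank structure according to claim 6, wherein the image missing analysis module comprises a low-rank representation unit for converting missing data in the image data set into a high-dimensional missing data matrix X generated by mixing a real data matrix Z and noise E, and for decomposing the real data matrix Z as Z = UW.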

8. The data recovery system based on a low-rank structure according to claim 6, wherein the data restoration model module comprises an objective function unit for:

constructing the low-rank matrix data restoration model as:

where the regularization terms are applied to the low-dimensional representation subspace basis matrix U and the low-dimensional representation matrix W, and λ and γ are hyperparameters greater than zero.

9. The data recovery system based on a low-rank structure according to claim 6, wherein the data restoration model module further comprises a loss function unit for:

enhancing the robustness of the learning model by means of a half-quadratic robust estimation function, and expressing the loss function as:

introducing a robust estimator with the l₁-l₂ robust factor, which assigns smaller weights to missing samples and outliers in the loss function, and converting the loss function again into an equivalent form.

10. The data recovery system based on a low-rank structure according to claim 6, wherein the data restoration model module further comprises a model solving unit for:

converting the constructed low-rank matrix data restoration model into:

reducing the influence of abnormal or missing samples on the model according to the robust factor, and converting the low-rank matrix data restoration model again into: