CN104318046A - System and method for incrementally converting high dimensional data into low dimensional data
- Publication number: CN104318046A (application CN201410407101.0A)
- Authority: CN (China)
- Prior art keywords: dimensional data, data, dimensional, new, dimension
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Description
Technical field
The invention belongs to the technical field of incremental high-dimensional data processing, and in particular relates to a system and method for incrementally converting high-dimensional data into low-dimensional data.
Background
In scientific research and engineering applications, much of the data collected by data acquisition systems, such as video, audio, climate and image data, is high-dimensional. Such data can provide rich and detailed information, but its large dimensionality often leads to very long processing times. As a result, the hardware of the data processing system works inefficiently, processing costs time and effort, and under concurrent execution the system may in severe cases crash or stall. When high-dimensional data is processed in real time, this can cause data loss, so that real-time status information is no longer reflected correctly.
Summary of the invention
The object of the invention is to provide a system and method for incrementally converting high-dimensional data into low-dimensional data. The system includes a high-dimensional data acquisition system connected to a data processing system; the data processing system includes a module for incrementally converting high-dimensional data into low-dimensional data and a queue for storing high-dimensional data. This structure, combined with its method, avoids the defects of the prior art: low processing efficiency of the data processing system's hardware, high cost in time and effort, crashes or stalled processing under concurrent execution, and, when high-dimensional data is processed in real time, data loss and failure to reflect real-time status information correctly.
To overcome these deficiencies of the prior art, the invention provides the following solution for a system and method for incrementally converting high-dimensional data into low-dimensional data:
A system for incrementally converting high-dimensional data into low-dimensional data includes a high-dimensional data acquisition system 1 connected to a data processing system 2. The data processing system 2 includes a module 3 for incrementally converting high-dimensional data into low-dimensional data and a queue 4 for storing high-dimensional data.
The method of the system for incrementally converting high-dimensional data into low-dimensional data is as follows:
Step 1: The high-dimensional data acquisition system 1 collects high-dimensional data such as video, audio, climate or image data, and sends the collected data to the data processing system 2.
Step 2: On receiving the high-dimensional data, the data processing system 2 stores it, in order of receipt, in the queue 4 for storing high-dimensional data, and starts the module 3 for incrementally converting high-dimensional data into low-dimensional data. Module 3 sets up an n-dimensional space object V that contains a k-dimensional space object S, with k initially set to 0.
Step 3: The data processing system 2 takes the high-dimensional data out of queue 4 one item at a time, in order. Each time an item X is taken out, its feature components are extracted and its dimension is reduced. X is written as (x_1, x_2, …, x_n), where n is the dimension of the high-dimensional data.
Step 4: To extract the feature components and reduce the dimension, module 3 first projects X into the k-dimensional space represented by the space object S. The projection uses an iterative method according to formula (1) to obtain the result vector r_k.
Here i ranges from 1 to k, where k is the current dimension of the space object S and therefore also the current dimension of the reduced data. Set r_0 = X; the r_i are intermediate vectors and r_k is the result vector. T_i is the i-th threshold, and X_max is the high-dimensional data item with the greatest length in queue 4. When the length ||r_k||_2 of the result vector is less than T_k, the value of k and the space object S remain unchanged. When ||r_k||_2 is greater than or equal to T_k, the (k+1)-th coefficient b_{k+1} is computed and added to the k-dimensional space object S as a new basis vector, so that the dimension of S increases by 1 (k = k + 1).
Step 5: From the resulting b_1, b_2, …, b_k, formula (2) gives Y, the non-zero part of the reduced data vector for X.
When the whole queue of high-dimensional data has been processed, the non-zero part of each reduced data vector is padded with zero components according to the final value of k, so that every reduced data vector is expressed uniformly as a k-dimensional vector.
Step 6: After the reduced data vector has been obtained for every high-dimensional data item, the high-dimensional data acquisition system 1 may continue collecting high-dimensional data, in which case it sends the newly collected data to the data processing system 2.
Step 7: On receiving the newly collected high-dimensional data, the data processing system 2 stores it, in order of receipt, in queue 4.
Step 8: The data processing system 2 then takes the newly collected data out of queue 4 one item at a time, in order. Each time a new item X_new is taken out, its feature components are extracted and its dimension is reduced. X_new is written as (x'_1, x'_2, …, x'_n), where n is the dimension of the new high-dimensional data.
Step 9: To extract the feature components and reduce the dimension, module 3 first projects X_new into the k-dimensional space spanned by the basis b_1, b_2, …, b_k. The projection uses an iterative method according to formula (3) to obtain the result vector r_k.
Here i ranges from 1 to k, where k is the current dimension of the space object S and therefore also the current dimension of the reduced data. Set r_0 = X_new; the r_i are intermediate vectors and r_k is the result vector. T_i is the i-th threshold, and X_max is the item with the greatest length among the high-dimensional data that has passed through queue 4 so far. When the length ||r_k||_2 of the result vector is less than T_k, the value of k and the space object S remain unchanged. When ||r_k||_2 is greater than or equal to T_k, the (k+1)-th coefficient b_{k+1} is computed and added to the k-dimensional space object S as a new basis vector, so that the dimension of S increases by 1 (k = k + 1).
Step 10: From the resulting b_1, b_2, …, b_k, formula (4) gives Y_new, the non-zero part of the reduced data vector for X_new.
When the whole queue of high-dimensional data has been processed, zero components are appended to the end of the reduced data vectors of all processed high-dimensional data according to the final value of k, so that they are expressed uniformly as k-dimensional reduced data vectors.
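The steps above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the patent's exact procedure: the bodies of formulas (1) to (4) do not appear in this text, so the residual update (subtracting the component along each basis vector in turn), the normalisation of b_{k+1}, and the threshold `eps * ||X_max||` are all assumptions, and `IncrementalReducer`, `add`, `padded` and `eps` are hypothetical names. For simplicity X_max is tracked as the longest item seen so far.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _norm(u):
    return math.sqrt(_dot(u, u))


class IncrementalReducer:
    """Hypothetical sketch of module 3: grows an orthonormal basis item by item."""

    def __init__(self, eps=0.1):
        self.eps = eps        # assumed relative threshold factor
        self.basis = []       # b_1 .. b_k; k starts at 0
        self.max_norm = 0.0   # length of the longest item seen so far (X_max)
        self.reduced = []     # non-zero parts Y, one per processed item

    def add(self, x):
        """Process one item X (or X_new) and return its non-zero part Y."""
        self.max_norm = max(self.max_norm, _norm(x))
        threshold = self.eps * self.max_norm      # assumed form of T_k
        r = list(x)                               # r_0 = X
        for b in self.basis:                      # i = 1 .. k
            c = _dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        r_norm = _norm(r)                         # ||r_k||_2
        if r_norm > 0.0 and r_norm >= threshold:
            # Residual is large: add b_{k+1} and let k grow by 1.
            self.basis.append([ri / r_norm for ri in r])
        y = [_dot(x, b) for b in self.basis]      # assumed form of formulas (2)/(4)
        self.reduced.append(y)
        return y

    def padded(self):
        """Pad every stored Y with trailing zeros up to the final k."""
        k = len(self.basis)
        return [y + [0.0] * (k - len(y)) for y in self.reduced]


reducer = IncrementalReducer(eps=0.1)
for item in ([3.0, 0.0], [3.0, 0.1], [0.0, 2.0]):
    reducer.add(item)
print(len(reducer.basis), reducer.padded())
```

Because the same `add` call serves both the first batch and later items, newly acquired data reuses the basis already built, which is the incremental behaviour steps 6 to 10 describe.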
Applying the above scheme also removes redundancy from high-dimensional data such as the video, audio, climate and image data collected by the data acquisition system, reduces the complexity of the data, reveals its internal structure and relationships, improves the efficiency of processing it, makes the reduced data easier to understand, and improves how accurately the reduced data reflects the original high-dimensional data.
Description of drawings
Fig. 1 is a schematic diagram of the structure of the invention.
Fig. 2 compares the reconstruction error of the method of the invention with that of two prior-art methods on the first set of incremental high-dimensional data.
Fig. 3 compares the reconstruction error of the method of the invention with that of two prior-art methods on the second set of incremental high-dimensional data.
Fig. 4 compares the reconstruction error of the method of the invention with that of two prior-art methods on the third set of incremental high-dimensional data.
Fig. 5 compares the reconstruction error of the method of the invention with that of two prior-art methods on the fourth set of incremental high-dimensional data.
Fig. 6 compares the reconstruction error of the method of the invention with that of two prior-art methods on the fifth set of incremental high-dimensional data.
Fig. 7 compares the reconstruction error of the method of the invention with that of two prior-art methods on the sixth set of incremental high-dimensional data.
Fig. 8 compares the dimensionality-reduction time of the method of the invention with that of two prior-art methods on the first set of incremental high-dimensional data.
Fig. 9 compares the dimensionality-reduction time of the method of the invention with that of two prior-art methods on the second set of incremental high-dimensional data.
Fig. 10 compares the dimensionality-reduction time of the method of the invention with that of two prior-art methods on the third set of incremental high-dimensional data.
Fig. 11 compares the dimensionality-reduction time of the method of the invention with that of two prior-art methods on the fourth set of incremental high-dimensional data.
Fig. 12 compares the dimensionality-reduction time of the method of the invention with that of two prior-art methods on the fifth set of incremental high-dimensional data.
Fig. 13 compares the dimensionality-reduction time of the method of the invention with that of two prior-art methods on the sixth set of incremental high-dimensional data.
Detailed description
Most existing dimensionality reduction methods require the user to set the dimension of the feature space (the target dimension). Many of them cannot be extended incrementally online, so the reduction steps have to be repeated, which occupies more system resources and increases time complexity. Many traditional dimension reduction methods also need to derive eigenvectors or invert matrices, which adds further time complexity or makes the algorithm unstable.
The invention is further described below with reference to the drawings:
Referring to Fig. 1, the system for incrementally converting high-dimensional data into low-dimensional data includes a high-dimensional data acquisition system 1 connected to a data processing system 2. The data processing system 2 includes a module 3 for incrementally converting high-dimensional data into low-dimensional data and a queue 4 for storing high-dimensional data.
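The structure in Fig. 1 can be pictured as three cooperating objects. The sketch below is only an illustration of the wiring: the class and method names (`AcquisitionSystem`, `DataProcessingSystem`, `ReductionModule`, `receive`, `drain`) are hypothetical, and the reduction logic itself is replaced by a stand-in that records what it was given.

```python
from collections import deque


class ReductionModule:
    """Stand-in for module 3; it only records the items it is asked to process."""

    def __init__(self):
        self.processed = []

    def process(self, x):
        self.processed.append(x)


class DataProcessingSystem:
    """Stand-in for data processing system 2, holding queue 4 and module 3."""

    def __init__(self):
        self.queue = deque()             # queue 4: first in, first out
        self.module = ReductionModule()  # module 3

    def receive(self, x):
        # Store incoming data in order of receipt.
        self.queue.append(x)

    def drain(self):
        # Take items out one at a time, oldest first, and hand them to module 3.
        while self.queue:
            self.module.process(self.queue.popleft())


class AcquisitionSystem:
    """Stand-in for acquisition system 1, connected to a processing system."""

    def __init__(self, processing_system):
        self.processing_system = processing_system

    def acquire(self, samples):
        for x in samples:
            self.processing_system.receive(x)


processing = DataProcessingSystem()
acquisition = AcquisitionSystem(processing)
acquisition.acquire([[1.0, 2.0], [3.0, 4.0]])
processing.drain()
```

A `deque` is used here because it gives constant-time appends and pops at both ends, which matches the first-in, first-out order the description requires.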
The method of the system for incrementally converting high-dimensional data into low-dimensional data is as follows:
Step 1: The high-dimensional data acquisition system 1 collects high-dimensional data such as video, audio, climate or image data, and sends the collected data to the data processing system 2.
Step 2: On receiving the high-dimensional data, the data processing system 2 stores it, in order of receipt, in the queue 4 for storing high-dimensional data, and starts the module 3 for incrementally converting high-dimensional data into low-dimensional data. Module 3 sets up an n-dimensional space object V that contains a k-dimensional space object S, with k initially set to 0.
Step 3: The data processing system 2 takes the high-dimensional data out of queue 4 one item at a time, in order. Each time an item X is taken out, its feature components are extracted and its dimension is reduced. X is written as (x_1, x_2, …, x_n), where n is the dimension of the high-dimensional data.
Step 4: To extract the feature components and reduce the dimension, module 3 first projects X into the k-dimensional space represented by the space object S. The projection uses an iterative method according to formula (1) to obtain the result vector r_k.
Here i ranges from 1 to k, where k is the current dimension of the space object S and therefore also the current dimension of the reduced data. Set r_0 = X; the r_i are intermediate vectors and r_k is the result vector. T_i is the i-th threshold, and X_max is the high-dimensional data item with the greatest length in queue 4. When the length ||r_k||_2 of the result vector is less than T_k, the value of k and the space object S remain unchanged. When ||r_k||_2 is greater than or equal to T_k, the (k+1)-th coefficient b_{k+1} is computed and added to the k-dimensional space object S as a new basis vector, so that the dimension of S increases by 1 (k = k + 1).
Step 5: From the resulting b_1, b_2, …, b_k, formula (2) gives Y, the non-zero part of the reduced data vector for X.
When the whole queue of high-dimensional data has been processed, the non-zero part of each reduced data vector is padded with zero components according to the final value of k, so that every reduced data vector is expressed uniformly as a k-dimensional vector.
Step 6: After the reduced data vector has been obtained for every high-dimensional data item, the high-dimensional data acquisition system 1 may continue collecting high-dimensional data, in which case it sends the newly collected data to the data processing system 2.
Step 7: On receiving the newly collected high-dimensional data, the data processing system 2 stores it, in order of receipt, in queue 4.
Step 8: The data processing system 2 then takes the newly collected data out of queue 4 one item at a time, in order. Each time a new item X_new is taken out, its feature components are extracted and its dimension is reduced. X_new is written as (x'_1, x'_2, …, x'_n), where n is the dimension of the new high-dimensional data.
Step 9: To extract the feature components and reduce the dimension, module 3 first projects X_new into the k-dimensional space spanned by the basis b_1, b_2, …, b_k. The projection uses an iterative method according to formula (3) to obtain the result vector r_k.
Here i ranges from 1 to k, where k is the current dimension of the space object S and therefore also the current dimension of the reduced data. Set r_0 = X_new; the r_i are intermediate vectors and r_k is the result vector. T_i is the i-th threshold, and X_max is the item with the greatest length among the high-dimensional data that has passed through queue 4 so far. When the length ||r_k||_2 of the result vector is less than T_k, the value of k and the space object S remain unchanged. When ||r_k||_2 is greater than or equal to T_k, the (k+1)-th coefficient b_{k+1} is computed and added to the k-dimensional space object S as a new basis vector, so that the dimension of S increases by 1 (k = k + 1).
Step 10: From the resulting b_1, b_2, …, b_k, formula (4) gives Y_new, the non-zero part of the reduced data vector for X_new.
When the whole queue of high-dimensional data has been processed, zero components are appended to the end of the reduced data vectors of all processed high-dimensional data according to the final value of k, so that they are expressed uniformly as k-dimensional reduced data vectors.
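Formulas (1) to (4) are referenced in the steps above, but their bodies do not appear in this text. One reading that is consistent with the surrounding definitions (r_0 = X, intermediate vectors r_i, a result vector r_k compared against T_k, and orthogonal basis vectors b_i) is a Gram-Schmidt-style residual update. The block below is offered only as that assumption, not as the patent's own formulas; the form of the threshold T_k cannot be recovered from the text and is left symbolic.

```latex
\begin{aligned}
r_0 &= X, \\
r_i &= r_{i-1} - \langle r_{i-1}, b_i \rangle\, b_i, \qquad i = 1, \dots, k, \\
b_{k+1} &= \frac{r_k}{\lVert r_k \rVert_2} \quad \text{when } \lVert r_k \rVert_2 \ge T_k, \\
Y &= \bigl( \langle X, b_1 \rangle, \langle X, b_2 \rangle, \dots, \langle X, b_k \rangle \bigr).
\end{aligned}
```

Under this reading, formulas (3) and (4) would be the same expressions with X replaced by X_new and Y by Y_new.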
The method of the invention overcomes the shortcomings of most existing dimensionality reduction methods. It determines the target dimension adaptively, can be extended incrementally online, and obtains orthogonal component vectors with as little derived data as possible, without deriving eigenvectors, solving a characteristic equation or inverting a matrix. The computational complexity of IOCA is O(Ndk), where N is the number of data items, d is the dimension of the original data and k is the target dimension. IOCA needs only a single pass over the data to obtain both the orthogonal components and the low-dimensional representation of the data. If b_1, b_2, ..., b_k is the finally obtained orthogonal basis, the invention guarantees that, for every high-dimensional data item, the error between the original data and the data reconstructed from the reduced result stays below a guaranteed bound. Figs. 2 to 7 compare, for six sets of incremental high-dimensional data, the reconstruction error after dimensionality reduction of IOCA, the method of the invention, with that of the prior-art IPCA and CCIPCA methods. The figures show that, with its automatically determined reduced dimension, the method of the invention has a small error. The prior-art methods leave the reduced dimension undetermined, cannot guarantee that the reduced data accurately reproduces the original, and cannot guarantee the orthogonality of the basis vectors as the method of the invention does.
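The orthogonality and reconstruction-error claims can be checked numerically on a toy data set. This is a sketch under assumptions: the basis is built with a Gram-Schmidt-style residual update and a single threshold `eps * ||X_max||`, neither of which is confirmed by this text, and `fit`, `reconstruction_error` and `eps` are hypothetical names. Under those assumptions every item's reconstruction error should fall below the threshold, because an item is either within the threshold of the current basis or becomes a basis vector itself.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(dot(u, u))


def fit(data, eps):
    """Single pass over the data; returns the basis and the threshold used."""
    threshold = eps * max(norm(x) for x in data)   # assumed form of T_k
    basis = []
    for x in data:
        r = list(x)
        for b in basis:
            c = dot(r, b)
            r = [ri - c * bi for ri, bi in zip(r, b)]
        r_norm = norm(r)
        if r_norm > 0.0 and r_norm >= threshold:
            basis.append([ri / r_norm for ri in r])
    return basis, threshold


def reconstruction_error(x, basis):
    """Distance between x and its reconstruction from coordinates on the basis."""
    approx = [0.0] * len(x)
    for b in basis:
        c = dot(x, b)
        approx = [a + c * bi for a, bi in zip(approx, b)]
    return norm([xi - ai for xi, ai in zip(x, approx)])


data = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.05],
    [0.5, 0.5, 0.5],
]
basis, threshold = fit(data, eps=0.5)
errors = [reconstruction_error(x, basis) for x in data]
print(len(basis), threshold, errors)
```

The inner loops perform about k inner products of length d for each of the N items, which is where the O(Ndk) cost stated above comes from.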
Figs. 8 to 13 compare, for the same six sets of incremental high-dimensional data, the time taken for dimensionality reduction by IOCA, the method of the invention, with that of the prior-art IPCA and CCIPCA methods. The figures show that the incremental dimensionality reduction of the method of the invention takes far less time than either prior-art method.
The above is only a preferred embodiment of the invention and does not limit the invention in any form. Although the invention has been disclosed above by way of a preferred embodiment, that embodiment is not intended to limit it. Anyone skilled in the art may, without departing from the scope of the technical solution of the invention, use the technical content disclosed above to make minor changes or modifications that amount to equivalent embodiments. Any simple modification, equivalent replacement or improvement made to the above embodiment in accordance with the technical essence of the invention, and within its spirit and principles, still falls within the scope of protection of the technical solution of the invention.
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410407101.0A CN104318046A (en) | 2014-08-18 | 2014-08-18 | System and method for incrementally converting high dimensional data into low dimensional data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410407101.0A CN104318046A (en) | 2014-08-18 | 2014-08-18 | System and method for incrementally converting high dimensional data into low dimensional data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104318046A true CN104318046A (en) | 2015-01-28 |
Family
ID=52373277
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410407101.0A Pending CN104318046A (en) | 2014-08-18 | 2014-08-18 | System and method for incrementally converting high dimensional data into low dimensional data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104318046A (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101546332A (en) * | 2009-05-07 | 2009-09-30 | 哈尔滨工程大学 | Manifold dimension-reducing medical image search method based on quantum genetic optimization |
CN103258001A (en) * | 2013-04-01 | 2013-08-21 | 哈尔滨工业大学 | Local linear embedded algorithm based radio frequency map unsupervised classifying method |
CN103336960A (en) * | 2013-07-26 | 2013-10-02 | 电子科技大学 | Human face identification method based on manifold learning |
CN103605889A (en) * | 2013-11-13 | 2014-02-26 | 浙江工业大学 | Data dimension reduction method based on data global-local structure preserving projections |
CN103761426A (en) * | 2014-01-02 | 2014-04-30 | 中国科学院数学与系统科学研究院 | Method and system for quickly recognizing feature combinations in high-dimensional data |
Non-Patent Citations (6)
Title |
---|
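NICOLAS DOBIGEON et al.: "Bayesian Orthogonal Component Analysis for Sparse Representation", 《IEEE TRANSACTIONS ON SIGNAL PROCESSING》 * |
YE XU et al.: "To Obtain Orthogonal Feature Extraction Using Training Data Selection", 《PROCEEDINGS OF THE 18TH ACM CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT》 * |
HOU YONGJIA et al.: "FPGA Implementation of the Principal Component Analysis Algorithm", 《Journal of Mechanical & Electrical Engineering》 * |
Science and Technology Department of Nanjing University of Aeronautics and Astronautics (ed.): "《Collected Papers of Nanjing University of Aeronautics and Astronautics, 2008, Vol. 15, College of Automation Engineering, Part 1》", 31 May 2006, Science and Technology Department of Nanjing University of Aeronautics and Astronautics * |
ZHU ZHENFENG et al.: "Comparison of Incremental Least-Squares Classifiers and Incremental Support Vector Machines", 《Journal of Chinese Computer Systems》 * |
LI YULIN et al.: "Research on Dimensionality Reduction in High-Dimensional Data Analysis", 《Computer CD Software and Applications》 * |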
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105160154A (en) * | 2015-08-07 | 2015-12-16 | 武汉大学 | Parallel factor based multidimensional data analysis method |
CN108573262A (en) * | 2018-05-08 | 2018-09-25 | 南京大学 | A High Dimensional Sparse Vector Reconstruction Method Based on IGR_OMP |
CN108664448A (en) * | 2018-05-08 | 2018-10-16 | 南京大学 | A kind of higher-dimension sparse vector reconstructing method based on IQR_OMP |
CN108664448B (en) * | 2018-05-08 | 2021-06-01 | 南京大学 | High-dimensional sparse vector reconstruction method based on IQR_OMP |
CN110415270A (en) * | 2019-06-17 | 2019-11-05 | 广东第二师范学院 | A kind of human motion form evaluation method based on double study mapping increment dimensionality reduction models |
CN117390013A (en) * | 2023-09-12 | 2024-01-12 | 博瀚智能(深圳)有限公司 | Data storage methods, retrieval methods, systems, equipment and storage media |
CN117390013B (en) * | 2023-09-12 | 2024-11-26 | 博瀚智能(深圳)有限公司 | Data storage method, retrieval method, system, device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104318046A (en) | System and method for incrementally converting high dimensional data into low dimensional data | |
JP2022543232A (en) | Pipelined Hardware Decoder for Quantum Computing Devices | |
CN101751697B (en) | Three-dimensional scene reconstruction method based on statistical model | |
CN107192878A (en) | A kind of trend of harmonic detection method of power and device based on compressed sensing | |
WO2018027584A1 (en) | Method and system for restoring image using target attribute assisted compression perception | |
CN102624399B (en) | Reconfiguration method for compression sensing signal | |
CN103746703A (en) | Segmented self-adaptive regularized matching pursuit reconstruction method based on threshold | |
CN105515585A (en) | Compressed sensing reconstruction method for signals with unknown sparseness | |
CN109756740B (en) | Half-tensor image compression method and image restoration method based on optimal measurement matrix | |
JP5220071B2 (en) | How to reconstruct a sparse signal from sign measurements using greedy search | |
CN114492823B (en) | Method and device for eliminating quantum noise, electronic equipment and medium | |
CN110751599A (en) | Visual tensor data completion method based on truncated nuclear norm | |
WO2017128632A1 (en) | Method, apparatus and system for image compression and image reconstruction | |
CN105630741A (en) | Improved module for solving inverse matrixes of matrixes according to bit replacement method | |
CN104090952A (en) | Method and system for estimating average value of data flow under sliding window | |
CN108846430B (en) | A Sparse Representation Method of Image Signal Based on Polyatomic Dictionary | |
CN102521299A (en) | Method for processing data of resource description framework | |
CN102163338B (en) | Efficient reconstruction method in compression perceptual system | |
CN113947237B (en) | A wind power error correction method based on AMS-TCN | |
CN109584320A (en) | A kind of Low coherence observing matrix building method | |
CN105846826B (en) | Compressed sensing signal reconfiguring method based on approximate smooth L0 norm | |
CN108629371B (en) | A Data Dimensionality Reduction Method for Two-Dimensional Time-Frequency Data | |
CN104537377A (en) | Image data dimension reduction method based on two-dimensional kernel entropy component analysis | |
CN104317792B (en) | The high dimensional data of collection is converted to the system and method for low-dimensional data | |
CN107809253A (en) | Compressed sensing data reconstruction method based on random Kaczmarz iteration |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20150128 |