CN104346520A - Neural network based data dimension reduction system and dimension reducing method thereof - Google Patents

Neural network based data dimension reduction system and dimension reducing method thereof

Info

Publication number
CN104346520A
Authority
CN
China
Prior art keywords
reference point
data
dimensionality reduction
neuroid
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410362559.9A
Other languages
Chinese (zh)
Other versions
CN104346520B (en)
Inventor
申富饶
干强
赵金熙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University
Original Assignee
Nanjing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University filed Critical Nanjing University
Priority to CN201410362559.9A priority Critical patent/CN104346520B/en
Publication of CN104346520A publication Critical patent/CN104346520A/en
Application granted granted Critical
Publication of CN104346520B publication Critical patent/CN104346520B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a neural network based data dimensionality reduction system and a dimensionality reduction method thereof. The system comprises a data acquisition system connected to a control system; the control system contains a neural network based data dimensionality reduction module. Combined with the dimensionality reduction method, the system effectively overcomes the prior-art defects of high computational cost, uncertain neighborhood determination and lack of practicality.

Description

A neural network based data dimensionality reduction system and dimensionality reduction method thereof
Technical field
The invention belongs to the technical field of data dimensionality reduction, and specifically relates to a neural network based data dimensionality reduction system and a dimensionality reduction method thereof.
Background art
At present, when images, video and certain complex communication signals are transferred from a data acquisition system to a control system, they are usually stored as high-dimensional data. In use, this occupies too many control system resources and makes computation heavy and very time-consuming; in severe cases it can even cause the control system to crash.
Existing control systems therefore generally apply dimensionality reduction to the high-dimensional data formed from such images, video and complex communication signals obtained from the data acquisition system before using it. However, existing dimensionality reduction methods commonly have the following problems:
(1) The computational cost is still high: in existing dimensionality reduction algorithms, the step that computes geodesic distances on the k-nearest-neighbor graph has time complexity $O(kN^2 \log N)$, and the distance-preserving mapping step has time complexity $O(N^3)$, so the overall time complexity is very high;
(2) Neighborhood determination is uncertain: existing dimensionality reduction algorithms use a k-nearest-neighbor graph to compute geodesic distances, but an article published in Science as early as 2002 questioned this approach: too large a k causes short-circuit errors, and too small a k causes fragmentation. The only remedy is to try to select a suitable k, which makes the reduction uncertain and further increases the computational cost; the dimensionality reduction result often deviates too much from the original high-dimensional data, or is even completely distorted;
(3) Lack of practicality: a newly arriving high-dimensional data point can change the whole k-nearest-neighbor graph, so everything must be recomputed; online processing is therefore difficult and the methods are impractical.
Summary of the invention
The object of the invention is to provide a neural network based data dimensionality reduction system and a dimensionality reduction method thereof. The system comprises a data acquisition system connected to a control system, and the control system contains a neural network based data dimensionality reduction module. Combined with its dimensionality reduction method, the system effectively avoids the prior-art defects of high computational cost, uncertain neighborhood determination and lack of practicality.
To overcome the deficiencies of the prior art, the invention provides a neural network based data dimensionality reduction system and a dimensionality reduction method thereof, as follows:
A neural network based data dimensionality reduction system comprises a data acquisition system 1; the data acquisition system 1 is connected to a control system 2, and the control system 2 contains a neural network based data dimensionality reduction module 3.
The dimensionality reduction method of the neural network based data dimensionality reduction system comprises the following steps:
Step 1: The data acquisition system 1 first sends the collected signal data, such as images or video, to the control system 2. The control system 2 then starts the neural network based data dimensionality reduction module 3, which organizes the received signal data into a high-dimensional data set and stores it;
Step 2: The neural network based data dimensionality reduction module 3 then determines the manifold topology reference points of the high-dimensional data. This process begins with initialization. First, the reference point set $A = \{L_1, L_2\}$ is set, where $L_1$ is the first reference point and $L_2$ is the second reference point, both chosen at random from the high-dimensional data set. The module then sets the edge set $C \subseteq A \times A$, two activation count variables with initial value 0, two distance threshold variables with initial value $\|L_1 - L_2\|$, and a first connection age variable with initial value 0. The initial value of $C$ is the empty set; $A \times A$ represents the connection relations between the reference points of the reference point set, and the empty initial value means that the first and second reference points are initially unconnected. The two activation count variables are $M_{L_1}$ for the first reference point and $M_{L_2}$ for the second reference point. The two distance threshold variables are the first distance threshold $T_{L_1}$ and the second distance threshold $T_{L_2}$. The first connection age variable $age_{(L_1, L_2)}$ represents the connection duration between the first and second reference points;
Step 3: The input and competition stage follows. The data acquisition system 1 continues to collect one item of signal data, such as an image or a video, and sends it to the control system 2. The neural network based data dimensionality reduction module 3 in the control system stores the received item as one high-dimensional datum, which serves as a new data sample $\xi \in \mathbb{R}^D$, where $\mathbb{R}^D$ denotes the high-dimensional real space, $\mathbb{R}$ the real numbers and $D$ the dimension of the high-dimensional data. The Euclidean distance between each reference point in $A$ and the new data sample $\xi$ is then computed. The reference points with the smallest and the second-smallest Euclidean distance are the winner reference point $s_1$ and the runner-up reference point $s_2$, as given by formula (1) and formula (2):
$$s_1 = \arg\min_{x \in A} \|\xi - x\| \tag{1}$$
$$s_2 = \arg\min_{x \in A \setminus \{s_1\}} \|\xi - x\| \tag{2}$$
The winner reference point $s_1$ and the runner-up reference point $s_2$ are thus the two reference points most similar to $\xi$. The reference point update stage follows: the neural network based data dimensionality reduction module 3 judges whether $\|\xi - s_1\| > T_{s_1}$ or $\|\xi - s_2\| > T_{s_2}$ holds. If so, the new data sample $\xi$ is added to the reference point set $A$ as a new reference point with value $\xi$, i.e. $A = A \cup \{\xi\}$, and execution returns to step 3;
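The competition in formulas (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the helper names `distance`, `find_winners` and `is_new_reference` are hypothetical, the toy reference points and thresholds are made up, and the insertion test (the sample lies outside the distance threshold of either winner) is an assumption about the condition described in step 3.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_winners(xi, refs):
    """Formulas (1) and (2): indices of the nearest and second-nearest reference points."""
    order = sorted(range(len(refs)), key=lambda i: distance(xi, refs[i]))
    return order[0], order[1]

def is_new_reference(xi, refs, thresholds, s1, s2):
    """Assumed insertion test: xi lies outside the threshold of s1 or of s2."""
    return (distance(xi, refs[s1]) > thresholds[s1]
            or distance(xi, refs[s2]) > thresholds[s2])

refs = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]   # toy reference point set A
thresholds = [1.0, 1.0, 1.0]                  # toy distance thresholds
s1, s2 = find_winners((0.2, 0.1), refs)
print(s1, s2, is_new_reference((0.2, 0.1), refs, thresholds, s1, s2))
```

A sample that passes the test would be appended to `refs` as a new reference point; otherwise processing continues with step 4.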
Step 4: If no connection exists between $s_1$ and $s_2$, the operation $C = C \cup \{(s_1, s_2)\}$ is performed, connecting the two most similar reference points, and the second age variable $age_{(s_1, s_2)}$ is reset to its initial value 0; the second age variable represents the connection duration between the winner reference point $s_1$ and the runner-up reference point $s_2$. Then, for every $L_i$ with $(s_1, L_i) \in C$, the operation $age_{(s_1, L_i)} = age_{(s_1, L_i)} + 1$ is performed, i.e. the connection duration of every reference point connected to $s_1$ is increased by 1. Here $age_{(s_1, L_i)}$ is the third age variable, which represents the connection duration between the winner reference point $s_1$ and each reference point $L_i$ connected to it, and $i$ is a natural-number index. The activation count variable $M_{s_1}$ for the winner reference point $s_1$ is set and the operation $M_{s_1} = M_{s_1} + 1$ is performed, so that its value increases from 0. The operations $s_1 = s_1 + \varepsilon(t)\|\xi - s_1\|$ and $s_2 = s_2 + \varepsilon'(t)\|\xi - s_2\|$ are then performed, moving $s_1$ and $s_2$ toward the new data sample, where $t$ is the running time of the neural network based data dimensionality reduction system;
Step 5: The neural network based data dimensionality reduction module 3 checks every connection $(L_i, L_j) \in C$ between reference points together with its current age parameter $age_{(L_i, L_j)}$. If $age_{(L_i, L_j)} > age_{max}$, the connection is removed from $C$, where $age_{max}$ is the predefined maximum connection duration, $i$ and $j$ are unequal natural numbers, and $age_{(L_i, L_j)}$ is the connection duration between $L_i$ and $L_j$;
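Steps 4 and 5 amount to a small amount of bookkeeping on the connection graph, sketched below. The function name `adapt`, the dictionary-of-edges layout and the constant learning rates `eps1` and `eps2` are hypothetical choices. The sketch also assumes that the winners move along the vector from the reference point to the sample, scaled by the learning rate, which is one reading of the update described in step 4.

```python
def adapt(xi, refs, ages, counts, s1, s2, eps1, eps2, age_max):
    """Apply steps 4 and 5 in place for one sample xi with winners s1 and s2."""
    edge = frozenset((s1, s2))
    if edge not in ages:          # connect the two most similar reference points
        ages[edge] = 0
    for e in ages:                # age every connection incident to the winner
        if s1 in e:
            ages[e] += 1
    counts[s1] += 1               # activation count of the winner
    # Assumption: move along the vector (xi - s), scaled by the learning rate.
    refs[s1] = [r + eps1 * (x - r) for r, x in zip(refs[s1], xi)]
    refs[s2] = [r + eps2 * (x - r) for r, x in zip(refs[s2], xi)]
    # Step 5: drop every connection older than age_max.
    for e in [e for e, a in ages.items() if a > age_max]:
        del ages[e]

refs = [[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]]   # toy reference points
ages = {frozenset((0, 2)): 3}                 # one existing connection, age 3
counts = [0, 0, 0]
adapt([1.0, 0.0], refs, ages, counts, s1=0, s2=1, eps1=0.5, eps2=0.1, age_max=3)
print(ages, counts, refs)
```

In this toy run the old connection between points 0 and 2 exceeds `age_max` and is removed, while the new connection between the two winners survives.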
Step 6: The neural network based data dimensionality reduction module 3 then performs the distance threshold update stage, in which the distance thresholds $T_{s_1}$ and $T_{s_2}$ of $s_1$ and $s_2$ are updated by formula (3) and formula (4) to the maximum distance between $s_1$, respectively $s_2$, and its adjacent reference points:
$$T_{s_1} = \max_{(x, s_1) \in C} \|x - s_1\| \tag{3}$$
$$T_{s_2} = \max_{(x, s_2) \in C} \|x - s_2\| \tag{4}$$
$T_{s_1}$ and $T_{s_2}$ are the distance thresholds of the winner reference point $s_1$ and the runner-up reference point $s_2$ respectively. The denoising stage follows: the neural network based data dimensionality reduction module 3 judges whether the total number of data samples input so far is an integer multiple of the preset value $\lambda$. If so, all reference points in the reference point set $A$ are checked, and any reference point $L_i$ that has only one connected reference point and whose activation count $M_{L_i}$ is less than the preset minimum activation count $M_{min}$ is deleted from $A$, where $M_{L_i}$ is the activation count variable of $L_i$. Execution then returns to step 2;
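The threshold update of formulas (3) and (4) and the denoising rule can be sketched as follows. The names `neighbours`, `update_threshold` and `denoise` are hypothetical, and reference points are stored in dictionaries keyed by an integer id so that deletion is simple. Two details are assumptions not stated in the text: a threshold is left unchanged when a point has no neighbours, and connections attached to a deleted point are removed with it.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neighbours(i, edges):
    """Ids of all reference points connected to reference point i."""
    return [j for e in edges if i in e for j in e if j != i]

def update_threshold(i, refs, edges, thresholds):
    """Formulas (3)/(4): threshold becomes the largest distance to a neighbour."""
    nbrs = neighbours(i, edges)
    if nbrs:  # assumption: keep the old threshold when there are no neighbours
        thresholds[i] = max(distance(refs[i], refs[j]) for j in nbrs)

def denoise(refs, edges, counts, thresholds, m_min):
    """Delete points with exactly one neighbour and activation count below m_min."""
    noisy = [i for i in refs if len(neighbours(i, edges)) == 1 and counts[i] < m_min]
    for i in noisy:
        for table in (refs, counts, thresholds):
            del table[i]
    # Assumption: connections of deleted points are dropped as well.
    for e in [e for e in edges if any(i in e for i in noisy)]:
        edges.discard(e)
    return noisy

refs = {0: (0.0, 0.0), 1: (3.0, 0.0), 2: (3.0, 4.0)}
edges = {frozenset((0, 1)), frozenset((1, 2))}
counts = {0: 5, 1: 9, 2: 1}
thresholds = {0: 1.0, 1: 1.0, 2: 1.0}
update_threshold(1, refs, edges, thresholds)
removed = denoise(refs, edges, counts, thresholds, m_min=3)
print(thresholds, removed, edges)
```

In a full training loop `denoise` would only be called when the number of samples seen so far is a multiple of $\lambda$.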
Step 7: The neural network based data dimensionality reduction module 3 then enters the reference point similarity computation stage;
Step 8: The natural-number variable $i$ is increased by 1 and the reference point $L_i$ ($i = 1, \ldots, n$) is extracted, where $n$ is the number of reference points in the reference point set $A$. For this reference point, the initialization phase of the similarity computation is entered: the operations $S = \{L_i\}$ and $U = A - \{L_i\}$ are performed, where $S$ is the first intermediate set and $U$ is the second intermediate set. In the $n \times n$ similarity matrix $D_G$, the element $D_G(i, i)$, which represents the similarity of $L_i$ with itself, is set to 0. For each reference point $L_j \in U$: if $L_i$ and $L_j$ are connected, i.e. $(L_i, L_j) \in C$, then $D_G(i, j)$ is set to $\|L_i - L_j\|$; otherwise $D_G(i, j)$ is set to $\infty$. The element $D_G(i, j)$ represents the similarity value between $L_i$ and the element $L_j$ of $U$;
Step 9: The intermediate point selection stage follows: the reference point $L_{min}$ whose similarity value to $L_i$ is smallest is chosen from $U$, i.e. $L_{min} = \arg\min_{L_j \in U} D_G(i, j)$, and is moved into $S$, i.e. $S = S \cup \{L_{min}\}$, $U = U - \{L_{min}\}$;
Step 10: The edge expansion stage follows: for each reference point $L_k \in U$, where $k$ is a natural number, if $L_{min}$ and $L_k$ are connected, i.e. $(L_{min}, L_k) \in C$, and $D_G(i, min) + \|L_{min} - L_k\| < D_G(i, k)$, where $min$ is the index of $L_{min}$, then the update of formula (5) is performed:
$$D_G(i, k) = D_G(i, min) + \|L_{min} - L_k\| \tag{5}$$
Steps 9 and 10 are then repeated until $S = A$;
Step 11: Execution returns to step 8. When $i$ reaches $n$, all reference points in the reference point set $A$ have been processed and the $n \times n$ similarity matrix $D_G$ is obtained;
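Steps 8 to 11 describe a single-source shortest-path search repeated from every reference point, which is Dijkstra's algorithm on the connection graph. The sketch below uses a binary heap instead of the linear scan over $U$ described in step 9; this changes the running time but not the resulting matrix. The function name `geodesic_matrix` and the toy graph are hypothetical.

```python
import heapq
import math

def distance(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def geodesic_matrix(refs, edges):
    """Return D_G, the matrix of shortest-path lengths between reference points."""
    n = len(refs)
    adj = {i: [] for i in range(n)}
    for i, j in edges:                       # edge weight = Euclidean length
        w = distance(refs[i], refs[j])
        adj[i].append((j, w))
        adj[j].append((i, w))
    D = [[math.inf] * n for _ in range(n)]
    for src in range(n):
        D[src][src] = 0.0
        heap = [(0.0, src)]
        done = set()                         # plays the role of the set S
        while heap:
            d, u = heapq.heappop(heap)       # step 9: closest unfinished point
            if u in done:
                continue
            done.add(u)
            for v, w in adj[u]:              # step 10: relax edges, formula (5)
                if d + w < D[src][v]:
                    D[src][v] = d + w
                    heapq.heappush(heap, (d + w, v))
    return D

refs = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0), (9.0, 9.0)]
edges = [(0, 1), (1, 2)]                     # point 3 is not connected
D_G = geodesic_matrix(refs, edges)
print(D_G[0])
```

Unconnected reference points keep an infinite similarity value, so a later mapping stage would need a connected graph or some handling of such entries.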
Step 12: The neural network based data dimensionality reduction module 3 then enters the reference point dimensionality reduction mapping stage, which begins by computing the squared distance matrix $\Delta_n(i, j)$ by formula (6):
$$\Delta_n(i, j) = D_G(i, j) \cdot D_G(i, j), \quad i, j = 1, \ldots, n \tag{6}$$
The mean vector $\vec{\delta}_\mu$ is then computed by formula (7):
$$\vec{\delta}_\mu = (\vec{\delta}_1 + \vec{\delta}_2 + \cdots + \vec{\delta}_n) / n \tag{7}$$
where $\vec{\delta}_i$ denotes the $i$-th column of $\Delta_n(i, j)$ and $i$ ranges from 1 to $n$;
Step 13: Compute the mean centering matrix $H_n$ by formula (8):
$$H_n(i, j) = \delta(i, j) - \frac{1}{n} \tag{8}$$
where $\delta(i, j)$ is an intermediate parameter, generally taken as 1, and $H_n(i, j)$ is the element in row $i$ and column $j$ of the mean centering matrix $H_n$;
Step 14: Compute the inner product matrix $B_n$ by formula (9):
$$B_n = -\frac{1}{2} H_n \Delta_n H_n \tag{9}$$
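Formulas (6) to (9) can be checked on a tiny example. The sketch below is an illustration under one explicit assumption: $\delta(i, j)$ in formula (8) is treated as the Kronecker delta (1 when $i = j$, 0 otherwise), which is the usual choice in classical multidimensional scaling and the one for which $H_n$ actually centers the data. The function names and the three-point example are hypothetical.

```python
def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*Y)] for row in X]

def inner_product_matrix(D):
    """Return (B_n, delta_mu) computed from a similarity matrix D_G."""
    n = len(D)
    delta = [[D[i][j] ** 2 for j in range(n)] for i in range(n)]           # (6)
    delta_mu = [sum(delta[i][j] for j in range(n)) / n for i in range(n)]  # (7)
    # (8), assuming delta(i, j) is the Kronecker delta.
    H = [[(1.0 if i == j else 0.0) - 1.0 / n for j in range(n)] for i in range(n)]
    HDH = matmul(matmul(H, delta), H)
    B = [[-0.5 * v for v in row] for row in HDH]                           # (9)
    return B, delta_mu

# Three points on a line at positions 0, 1 and 3.
D_G = [[0.0, 1.0, 3.0],
       [1.0, 0.0, 2.0],
       [3.0, 2.0, 0.0]]
B, delta_mu = inner_product_matrix(D_G)
print(B[0], delta_mu)
```

For exact Euclidean distances, as in this example, $B_n$ equals the matrix of inner products of the mean-centered coordinates, here $(-4/3, -1/3, 5/3)$.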
Step 15: Compute the eigenvalues and eigenvectors: compute the $d$ largest positive eigenvalues $\lambda_1, \ldots, \lambda_d$ of $B_n$ and their corresponding eigenvectors $\vec{v}_1, \ldots, \vec{v}_d$, where $d$ is the target dimension of the dimensionality reduction;
Step 16: Enter the dimensionality reduction mapping stage of the reference points, in which the matrix $L$ of the dimensionality reduction mapping of the reference points is obtained by formula (10):
$$L = \begin{bmatrix} \lambda_1 \cdot \vec{v}_1^{T} \\ \lambda_2 \cdot \vec{v}_2^{T} \\ \vdots \\ \lambda_d \cdot \vec{v}_d^{T} \end{bmatrix} \tag{10}$$
The $n$ column vectors of dimension $d$ of the matrix $L$ are the coordinates of the $n$ reference points in the $d$-dimensional space;
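Steps 15 and 16 can be illustrated with a small pure-Python eigen-solver. Several assumptions are made here and should be read as such: power iteration with deflation stands in for a library eigen-solver and is only suitable when the dominant eigenvalues are positive and well separated; the helper names are hypothetical; and each eigenvector is scaled by the square root of its eigenvalue, as in classical multidimensional scaling, since the extracted formula (10) may have lost a root sign.

```python
import math

def matvec(B, v):
    return [sum(b * x for b, x in zip(row, v)) for row in B]

def top_eigenpairs(B, d, iters=1000):
    """Approximate the d dominant eigenpairs of a symmetric matrix B."""
    n = len(B)
    B = [row[:] for row in B]                 # work on a copy
    pairs = []
    for k in range(d):
        v = [math.sin(i + k + 1.0) for i in range(n)]   # fixed, non-constant start
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = matvec(B, v)
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:                  # nothing left to extract
                break
            v = [x / norm for x in w]
        lam = sum(x * y for x, y in zip(v, matvec(B, v)))
        pairs.append((lam, v))
        for i in range(n):                    # deflation: remove the found component
            for j in range(n):
                B[i][j] -= lam * v[i] * v[j]
    return pairs

def embedding(pairs):
    """Rows sqrt(lambda_k) * v_k^T (assumed scaling); columns are point coordinates."""
    return [[math.sqrt(lam) * x for x in v] for lam, v in pairs if lam > 0]

# Inner product matrix of three points on a line at positions 0, 1 and 3.
B = [[16 / 9, 4 / 9, -20 / 9],
     [4 / 9, 1 / 9, -5 / 9],
     [-20 / 9, -5 / 9, 25 / 9]]
pairs = top_eigenpairs(B, d=1)
L = embedding(pairs)
print(pairs[0][0], L[0])
```

The recovered coordinates are determined only up to sign, so the check that matters is that distances between embedded points match the input distances.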
Step 17: Enter the online data dimensionality reduction mapping stage. First determine the reference point to which the new data point belongs, i.e. determine the reference point $L_\alpha$ nearest to the new data sample $\xi$ by formula (11):
$$L_\alpha = \arg\min_{x \in A} \|\xi - x\| \tag{11}$$
Step 18: Obtain the similarity $D_S(\xi, L_i)$ between the new data sample $\xi$ and every reference point according to formula (12):
$$D_S(\xi, L_i) = \|\xi - L_\alpha\| + D_G(\alpha, i) \tag{12}$$
Step 19: Obtain the squared distance vector $\vec{\delta}_\xi$ according to formula (13):
$$\vec{\delta}_\xi(i) = D_S(\xi, L_i) \cdot D_S(\xi, L_i) \tag{13}$$
Step 20: Obtain the pseudo-inverse transposed matrix according to formula (14), where $L^{\#}$ denotes the pseudo-inverse transpose of the matrix $L$ of the dimensionality reduction mapping of the reference points:
$$L^{\#} = \begin{bmatrix} \vec{v}_1^{T} / \lambda_1 \\ \vec{v}_2^{T} / \lambda_2 \\ \vdots \\ \vec{v}_d^{T} / \lambda_d \end{bmatrix} \tag{14}$$
Step 21: Map the new data sample $\xi$ to low dimension according to formula (15) to obtain the low-dimensional mapping vector $l_\xi$:
$$l_\xi = -\frac{1}{2} L^{\#} (\vec{\delta}_\xi - \vec{\delta}_\mu) \tag{15}$$
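The online mapping of steps 17 to 21 needs only the stored reference points, the similarity matrix, the eigenpairs and the mean vector. The sketch below works through a hand-computed toy case; the function name `map_new_point` is hypothetical, and the rows of $L^{\#}$ are taken as $\vec{v}_k^{T} / \sqrt{\lambda_k}$, which assumes the square-root scaling of classical landmark multidimensional scaling rather than the literal form of the extracted formula (14).

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length coordinate sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def map_new_point(xi, refs, D_G, pairs, delta_mu):
    """Formulas (11)-(15): low-dimensional coordinates of a new sample xi."""
    n = len(refs)
    alpha = min(range(n), key=lambda i: distance(xi, refs[i]))          # (11)
    d_alpha = distance(xi, refs[alpha])
    delta_xi = [(d_alpha + D_G[alpha][i]) ** 2 for i in range(n)]       # (12), (13)
    diff = [a - b for a, b in zip(delta_xi, delta_mu)]
    # (14), assumed scaling: row k of L# is v_k^T / sqrt(lambda_k); then apply (15).
    return [-0.5 * sum(x * y for x, y in zip(v, diff)) / math.sqrt(lam)
            for lam, v in pairs]

# Toy model: reference points on a line at 0, 1 and 3, values worked out by hand.
refs = [(0.0,), (1.0,), (3.0,)]
D_G = [[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]]
centered = [-4 / 3, -1 / 3, 5 / 3]               # mean-centered positions
lam = sum(c * c for c in centered)               # the single positive eigenvalue
pairs = [(lam, [c / math.sqrt(lam) for c in centered])]
delta_mu = [10 / 3, 5 / 3, 13 / 3]               # mean column of squared distances
coords = map_new_point((3.5,), refs, D_G, pairs, delta_mu)
print(coords)
```

The new sample at 3.5 lands at its mean-centered position $3.5 - 4/3$, and nothing computed for the reference points has to be recomputed, which is what makes the mapping usable online.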
With these technical features, the dimensionality reduction method of the invention overcomes the shortcoming of conventional linear dimensionality reduction methods, which use Euclidean distance to represent similarity, by using geodesic distance to measure similarity instead. It thereby obtains a good dimensionality reduction result and provides reliable preprocessing for subsequent data analysis.
Brief description of the drawings
Fig. 1 is a schematic diagram of the connection structure of the neural network based data dimensionality reduction system of the invention.
Fig. 2 is a front view of the result of determining the manifold topology reference points of the high-dimensional data by the neural network based data dimensionality reduction module in Embodiment 1 of the invention.
Fig. 3 is a top view of the result of determining the manifold topology reference points of the high-dimensional data by the neural network based data dimensionality reduction module in Embodiment 1 of the invention.
Fig. 4 shows the result of the low-dimensional mapping after the low-dimensional mapping vectors are obtained in Embodiment 1 of the invention.
Fig. 5 is a front view of the result of determining the manifold topology reference points of the high-dimensional data by the neural network based data dimensionality reduction module in Embodiment 2 of the invention.
Fig. 6 is a top view of the result of determining the manifold topology reference points of the high-dimensional data by the neural network based data dimensionality reduction module in Embodiment 2 of the invention.
Fig. 7 shows the result of the low-dimensional mapping after the low-dimensional mapping vectors are obtained in Embodiment 2 of the invention.
Fig. 8 is a front view of the result of determining the manifold topology reference points of the high-dimensional data by the neural network based data dimensionality reduction module in Embodiment 3 of the invention.
Fig. 9 is a top view of the result of determining the manifold topology reference points of the high-dimensional data by the neural network based data dimensionality reduction module in Embodiment 3 of the invention.
Fig. 10 shows the result of the low-dimensional mapping after the low-dimensional mapping vectors are obtained in Embodiment 3 of the invention.
Detailed description of the embodiments
The object of the invention is to develop an automated, efficient neural network based data dimensionality reduction system and a dimensionality reduction method thereof, which are further described below with reference to the drawings and embodiments:
Embodiment 1:
In this embodiment, the high-dimensional data set constructed from the transmitted signal data, such as images or video, is the swiss_roll data set. 15000 data points of the swiss_roll data set are used to determine the reference points, and the other 5000 data points are used to obtain the low-dimensional mapping vectors $l_\xi$, as follows:
As shown in Fig. 1, Fig. 2, Fig. 3 and Fig. 4, the neural network based data dimensionality reduction system comprises a data acquisition system 1; the data acquisition system 1 is connected to a control system 2, and the control system 2 contains a neural network based data dimensionality reduction module 3.
The dimensionality reduction method of the neural network based data dimensionality reduction system comprises the following steps:
Step 1: The data acquisition system 1 first sends the collected signal data, such as images or video, to the control system 2. The control system 2 then starts the neural network based data dimensionality reduction module 3, which organizes the received signal data into a high-dimensional data set and stores it;
Step 2: The neural network based data dimensionality reduction module 3 then determines the manifold topology reference points of the high-dimensional data. The goal of this process is to train a self-organizing neural network on the training data so that the training result represents the topology of the original data set and yields the required reference points and connections. The process begins with initialization. First, the reference point set $A = \{L_1, L_2\}$ is set, where $L_1$ is the first reference point and $L_2$ is the second reference point, both chosen at random from the high-dimensional data set. The module then sets the edge set $C \subseteq A \times A$, two activation count variables with initial value 0, two distance threshold variables with initial value $\|L_1 - L_2\|$, and a first connection age variable with initial value 0. The initial value of $C$ is the empty set; $A \times A$ represents the connection relations between the reference points of the reference point set, and the empty initial value means that the first and second reference points are initially unconnected. The two activation count variables are $M_{L_1}$ for the first reference point and $M_{L_2}$ for the second reference point. The two distance threshold variables are the first distance threshold $T_{L_1}$ and the second distance threshold $T_{L_2}$. The first connection age variable $age_{(L_1, L_2)}$ represents the connection duration between the first and second reference points;
Step 3: The input and competition stage follows. The data acquisition system 1 continues to collect one item of signal data, such as an image or a video, and sends it to the control system 2. The neural network based data dimensionality reduction module 3 in the control system stores the received item as one high-dimensional datum, which serves as a new data sample $\xi \in \mathbb{R}^D$, where $\mathbb{R}^D$ denotes the high-dimensional real space, $\mathbb{R}$ the real numbers and $D$ the dimension of the high-dimensional data. The Euclidean distance between each reference point in $A$ and the new data sample $\xi$ is then computed. The reference points with the smallest and the second-smallest Euclidean distance are the winner reference point $s_1$ and the runner-up reference point $s_2$, as given by formula (1) and formula (2):
$$s_1 = \arg\min_{x \in A} \|\xi - x\| \tag{1}$$
$$s_2 = \arg\min_{x \in A \setminus \{s_1\}} \|\xi - x\| \tag{2}$$
The winner reference point $s_1$ and the runner-up reference point $s_2$ are thus the two reference points most similar to $\xi$. The reference point update stage follows: the neural network based data dimensionality reduction module 3 judges whether $\|\xi - s_1\| > T_{s_1}$ or $\|\xi - s_2\| > T_{s_2}$ holds. If so, the new data sample $\xi$ is added to the reference point set $A$ as a new reference point with value $\xi$, i.e. $A = A \cup \{\xi\}$, and execution returns to step 3;
Step 4: If no connection exists between $s_1$ and $s_2$, the operation $C = C \cup \{(s_1, s_2)\}$ is performed, connecting the two most similar reference points, and the second age variable $age_{(s_1, s_2)}$ is reset to its initial value 0; the second age variable represents the connection duration between the winner reference point $s_1$ and the runner-up reference point $s_2$. Then, for every $L_i$ with $(s_1, L_i) \in C$, the operation $age_{(s_1, L_i)} = age_{(s_1, L_i)} + 1$ is performed, i.e. the connection duration of every reference point connected to $s_1$ is increased by 1. Here $age_{(s_1, L_i)}$ is the third age variable, which represents the connection duration between the winner reference point $s_1$ and each reference point $L_i$ connected to it, and $i$ is a natural-number index. The activation count variable $M_{s_1}$ for the winner reference point $s_1$ is set and the operation $M_{s_1} = M_{s_1} + 1$ is performed, so that its value increases from 0. The operations $s_1 = s_1 + \varepsilon(t)\|\xi - s_1\|$ and $s_2 = s_2 + \varepsilon'(t)\|\xi - s_2\|$ are then performed, moving $s_1$ and $s_2$ toward the new data sample, where $t$ is the running time of the neural network based data dimensionality reduction system;
Step 5: The neural network based data dimensionality reduction module 3 checks every connection $(L_i, L_j) \in C$ between reference points together with its current age parameter $age_{(L_i, L_j)}$. If $age_{(L_i, L_j)} > age_{max}$, the connection is removed from $C$, where $age_{max}$ is the predefined maximum connection duration, $i$ and $j$ are unequal natural numbers, and $age_{(L_i, L_j)}$ is the connection duration between $L_i$ and $L_j$;
Step 6: The neural network based data dimensionality reduction module 3 then performs the distance threshold update stage, in which the distance thresholds $T_{s_1}$ and $T_{s_2}$ of $s_1$ and $s_2$ are updated by formula (3) and formula (4) to the maximum distance between $s_1$, respectively $s_2$, and its adjacent reference points:
$$T_{s_1} = \max_{(x, s_1) \in C} \|x - s_1\| \tag{3}$$
$$T_{s_2} = \max_{(x, s_2) \in C} \|x - s_2\| \tag{4}$$
$T_{s_1}$ and $T_{s_2}$ are the distance thresholds of the winner reference point $s_1$ and the runner-up reference point $s_2$ respectively. The denoising stage follows: the neural network based data dimensionality reduction module 3 judges whether the total number of data samples input so far is an integer multiple of the preset value $\lambda$. If so, all reference points in the reference point set $A$ are checked, and any reference point $L_i$ that has only one connected reference point and whose activation count $M_{L_i}$ is less than the preset minimum activation count $M_{min}$ is deleted from $A$, where $M_{L_i}$ is the activation count variable of $L_i$. Execution then returns to step 2. Once all training data samples have been input, the required manifold topology reference point set $A$ and the connections $C$ between the reference points are obtained.
Step 7: The neural network based data dimensionality reduction module 3 then enters the reference point similarity computation stage. This stage uses the topology graph produced in the preceding steps, i.e. the reference points and their connections, and represents the similarity between reference points by the shortest path between them in the graph. With $n$ reference points produced, this stage computes the shortest path from each reference point to all other reference points, producing the $n \times n$ similarity matrix $D_G$. The stage begins by setting the natural-number variable $i$ to 0;
Step 8: The natural-number variable $i$ is increased by 1 and the reference point $L_i$ ($i = 1, \ldots, n$) is extracted, where $n$ is the number of reference points in the reference point set $A$. For this reference point, the initialization phase of the similarity computation is entered: the operations $S = \{L_i\}$ and $U = A - \{L_i\}$ are performed, where $S$ is the first intermediate set and $U$ is the second intermediate set. In the $n \times n$ similarity matrix $D_G$, the element $D_G(i, i)$, which represents the similarity of $L_i$ with itself, is set to 0. For each reference point $L_j \in U$: if $L_i$ and $L_j$ are connected, i.e. $(L_i, L_j) \in C$, then $D_G(i, j)$ is set to $\|L_i - L_j\|$; otherwise $D_G(i, j)$ is set to $\infty$. The element $D_G(i, j)$ represents the similarity value between $L_i$ and the element $L_j$ of $U$;
Step 9: The intermediate point selection stage follows: the reference point $L_{min}$ whose similarity value to $L_i$ is smallest is chosen from $U$, i.e. $L_{min} = \arg\min_{L_j \in U} D_G(i, j)$, and is moved into $S$, i.e. $S = S \cup \{L_{min}\}$, $U = U - \{L_{min}\}$;
Step 10: The edge expansion stage follows: for each reference point $L_k \in U$, where $k$ is a natural number, if $L_{min}$ and $L_k$ are connected, i.e. $(L_{min}, L_k) \in C$, and $D_G(i, min) + \|L_{min} - L_k\| < D_G(i, k)$, where $min$ is the index of $L_{min}$, then the update of formula (5) is performed:
$$D_G(i, k) = D_G(i, min) + \|L_{min} - L_k\| \tag{5}$$
Steps 9 and 10 are then repeated until $S = A$;
Step 11: Execution returns to step 8. When $i$ reaches $n$, all reference points in the reference point set $A$ have been processed and the $n \times n$ similarity matrix $D_G$ is obtained;
Step 12: The neural network based data dimensionality reduction module 3 then enters the reference point dimensionality reduction mapping stage. The purpose of this stage is to map the reference points to a lower dimension while preserving the $n \times n$ similarity matrix $D_G$; the optimization goal is to find coordinates of the reference points in the low-dimensional space such that the Euclidean distances in the low-dimensional space are as close as possible to the similarities, i.e. the error between them is minimized. The stage begins by computing the squared distance matrix $\Delta_n(i, j)$ by formula (6):
$$\Delta_n(i, j) = D_G(i, j) \cdot D_G(i, j), \quad i, j = 1, \ldots, n \tag{6}$$
The mean vector $\vec{\delta}_\mu$ is then computed by formula (7):
$$\vec{\delta}_\mu = (\vec{\delta}_1 + \vec{\delta}_2 + \cdots + \vec{\delta}_n) / n \tag{7}$$
where $\vec{\delta}_i$ denotes the $i$-th column of $\Delta_n(i, j)$ and $i$ ranges from 1 to $n$;
Step 13: Compute the mean centering matrix $H_n$ by formula (8):
$$H_n(i, j) = \delta(i, j) - \frac{1}{n} \tag{8}$$
where $\delta(i, j)$ is an intermediate parameter, generally taken as 1, and $H_n(i, j)$ is the element in row $i$ and column $j$ of the mean centering matrix $H_n$;
Step 14: Compute the inner product matrix $B_n$ by formula (9):
$$B_n = -\frac{1}{2} H_n \Delta_n H_n \tag{9}$$
Step 15: calculate eigenwert proper vector, described calculating eigenwert proper vector comprises calculating B nmaximum d positive eigenvalue λ 1... λ dwith its characteristic of correspondence vector wherein d is the target dimension of dimensionality reduction;
Step 16: the dimensionality reduction mapping phase entering reference point, described dimensionality reduction mapping phase comprises and obtains by formula (10) matrix L that maps for the dimensionality reduction of reference point:
L = &lambda; 1 &CenterDot; v 1 &RightArrow; T &lambda; 2 &CenterDot; v 2 &RightArrow; T . . . &lambda; d &CenterDot; v d &RightArrow; T - - - ( 10 ) , The column vector of n d dimension of the matrix L mapped for the dimensionality reduction of reference point be respectively the coordinate of n reference point at d dimension space, since then we obtain high dimensional data lower dimensional space map needed for topological structure represent information;
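Steps 12 to 16 follow the pattern of classical multidimensional scaling. The sketch below is illustrative only and rests on two assumptions that the text does not spell out: that $\delta(i,j)$ in formula (8) acts as the Kronecker delta (so that $H_n$ centers the matrix), and that each row of $L$ is an eigenvector scaled by the square root of its eigenvalue. To stay dependency-free it uses power iteration with deflation, which is adequate for small examples but not what one would use on thousands of reference points. All function names are hypothetical.

```python
import math

def top_eigenpairs(matrix, d, iterations=500):
    """d largest eigenpairs of a symmetric matrix via power iteration and deflation."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(d):
        v = [float(k + 1) for k in range(n)]     # fixed, non-constant start vector
        for _ in range(iterations):
            w = [sum(work[r][c] * v[c] for c in range(n)) for r in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[r] * sum(work[r][c] * v[c] for c in range(n)) for r in range(n))
        pairs.append((lam, v))
        for r in range(n):                       # deflate: remove the component just found
            for c in range(n):
                work[r][c] -= lam * v[r] * v[c]
    return pairs

def classical_mds(d_g, d):
    """Return L as d rows of length n; column i is the low-dimensional coordinate of L_i."""
    n = len(d_g)
    delta = [[d_g[i][j] ** 2 for j in range(n)] for i in range(n)]      # formula (6)
    # Formulas (8)-(9) element-wise, assuming H = I - (1/n) * ones.
    row_mean = [sum(row) / n for row in delta]
    col_mean = [sum(delta[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    b = [[-0.5 * (delta[i][j] - row_mean[i] - col_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    # Formula (10), assuming rows of the form sqrt(lambda_k) * v_k^T.
    return [[math.sqrt(max(lam, 0.0)) * x for x in v]
            for lam, v in top_eigenpairs(b, d)]

# Geodesic distances of three points lying on a line at positions 0, 1 and 3.
d_g = [[0.0, 1.0, 3.0],
       [1.0, 0.0, 2.0],
       [3.0, 2.0, 0.0]]
L = classical_mds(d_g, 1)
```

For this one-dimensional example the recovered coordinates reproduce the input distances up to rounding, which is the property step 12 asks for.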
Step 17: Enter the online data dimension reduction mapping phase. The goal of this phase is to use the information obtained in the preceding steps to map new high-dimensional data to the low-dimensional space online. The phase first determines the reference point to which the new data point belongs, i.e. the reference point $L_\alpha$ nearest to the new data sample $\xi$, by formula (11):
$$L_\alpha = \arg\min_{x \in A} \|\xi - x\| \tag{11}$$
Step 18: Obtain the similarity $D_S(\xi, L_i)$ between the new data sample $\xi$ and every reference point by formula (12):
$$D_S(\xi, L_i) = \|\xi - L_\alpha\| + D_G(\alpha, i) \tag{12}$$
Step 19: Obtain the squared distance vector $\vec{\delta}_\xi$ by formula (13):
$$\vec{\delta}_\xi(i) = D_S(\xi, L_i) \cdot D_S(\xi, L_i) \tag{13}$$
Step 20: Obtain the pseudoinverse transpose matrix by formula (14), where $L^{\#}$ denotes the pseudoinverse transpose of the matrix $L$ of the reduced reference points:
$$L^{\#} = \begin{bmatrix} \vec{v}_1^{\,T} / \sqrt{\lambda_1} \\ \vec{v}_2^{\,T} / \sqrt{\lambda_2} \\ \vdots \\ \vec{v}_d^{\,T} / \sqrt{\lambda_d} \end{bmatrix} \tag{14}$$
Step 21: Map the new data sample $\xi$ to the low-dimensional space by formula (15) to obtain the low-dimensional mapping vector $l_\xi$:
$$l_\xi = -\frac{1}{2} L^{\#} \left( \vec{\delta}_\xi - \vec{\delta}_\mu \right) \tag{15}$$
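The online mapping of steps 17 to 21 can be sketched in a few lines. This is a hedged illustration, not the patented code: it assumes the rows of $L$ are eigenvectors scaled by the square root of their eigenvalues, so that each eigenvalue can be recovered as the squared norm of its row and the corresponding row of $L^{\#}$ is simply the row of $L$ divided by that eigenvalue. The function name `map_new_point` and the sample data are made up for the example.

```python
import math

def map_new_point(xi, points, d_g, l_rows):
    """Low-dimensional coordinates of a new sample xi, following formulas (11)-(15)."""
    n = len(points)
    # Formula (11): index of the reference point nearest to xi.
    alpha = min(range(n), key=lambda i: math.dist(xi, points[i]))
    offset = math.dist(xi, points[alpha])
    # Formulas (12) and (13): approximate geodesic distance to each reference point, squared.
    delta_xi = [(offset + d_g[alpha][i]) ** 2 for i in range(n)]
    # Formula (7): mean of the columns of the squared distance matrix.
    delta_mu = [sum(d_g[i][j] ** 2 for j in range(n)) / n for i in range(n)]
    diff = [a - b for a, b in zip(delta_xi, delta_mu)]
    coords = []
    for row in l_rows:
        lam = sum(x * x for x in row)            # eigenvalue, because the eigenvector has unit norm
        # Formula (14): row of L# equals v_k^T / sqrt(lambda_k), i.e. row / lambda_k.
        # Formula (15): l_xi = -1/2 * L# * (delta_xi - delta_mu).
        coords.append(-0.5 * sum((x / lam) * t for x, t in zip(row, diff)))
    return coords

# Reference points on a line at 0, 1 and 3, with their centered 1-D embedding.
points = [(0.0,), (1.0,), (3.0,)]
d_g = [[0.0, 1.0, 3.0],
       [1.0, 0.0, 2.0],
       [3.0, 2.0, 0.0]]
l_rows = [[-4.0 / 3.0, -1.0 / 3.0, 5.0 / 3.0]]

on_reference = map_new_point((1.0,), points, d_g, l_rows)
beyond_end = map_new_point((4.0,), points, d_g, l_rows)
```

A sample that coincides with a reference point lands on that point's low-dimensional coordinate, and a sample one unit past the last reference point lands one unit past it in the embedding.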
As can be seen from the drawings for Embodiment 1, the present embodiment does overcome the shortcoming of conventional linear dimension reduction methods, which use Euclidean distance to represent similarity. It instead uses geodesic distance to measure similarity, thereby obtaining a satisfactory dimension reduction result and providing reliable preprocessing for subsequent data analysis.
Embodiment 2:
In this embodiment, the high-dimensional data set constructed from the received image or video signal data is the swiss_roll data set. The swiss_roll data set contains 200 Gaussian noise data points; in addition, 15000 data points of the swiss_roll data set are used to determine the reference points, and another 5000 data points are used to obtain the low-dimensional mapping vectors $l_\xi$. The details are as follows:
As shown in Fig. 1, Fig. 5, Fig. 6 and Fig. 7, the neural-network-based data dimension reduction system comprises a data acquisition system 1; the data acquisition system 1 is connected to a control system 2, and the control system 2 contains a neural-network-based data dimension reduction module 3.
The dimension reduction method of the neural-network-based data dimension reduction system comprises the following steps:
Step 1: The data acquisition system first sends the acquired signal data, such as images or video, to the control system 2. The control system 2 then starts the neural-network-based data dimension reduction module 3, which assembles the received signal data into a high-dimensional data set and stores it;
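A data set of the kind used in this embodiment could be prepared along the following lines. This is only a sketch: the text gives the point counts (15000, 5000 and 200) but not the swiss-roll parametrization, the noise scale or a random seed, so those values are assumptions chosen for illustration, as are all variable names.

```python
import math
import random

def swiss_roll(n, rng):
    """Sample n points from a 3-D swiss roll (assumed parametrization)."""
    pts = []
    for _ in range(n):
        t = 1.5 * math.pi * (1.0 + 2.0 * rng.random())   # angle along the spiral
        height = 21.0 * rng.random()                     # position along the roll axis
        pts.append((t * math.cos(t), height, t * math.sin(t)))
    return pts

rng = random.Random(0)                                   # fixed seed, for repeatability only
train = swiss_roll(15000, rng)                           # used to determine reference points
noise = [tuple(rng.gauss(0.0, 5.0) for _ in range(3))    # 200 Gaussian noise points
         for _ in range(200)]
train_with_noise = train + noise
rng.shuffle(train_with_noise)                            # present samples in random order
test = swiss_roll(5000, rng)                             # mapped online after training
```

The training samples, including the noise points, would be fed one at a time to the steps that follow, and the 5000 held-out points would then be mapped with the online phase.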
Step 2: The neural-network-based data dimension reduction module 3 then determines the manifold topology reference points from the high-dimensional data. The goal of this process is to train a self-organizing neural network on the training data so that the training result represents the topological structure of the original data set, producing the required reference points and connections. Specifically, initialization is performed first: set the reference point set $A = \{L_1, L_2\}$, where $L_1$ is the first reference point and $L_2$ is the second reference point, both being high-dimensional data chosen at random from the high-dimensional data set. The module 3 then sets the edge set $C$, two activation count variables with initial value 0, two distance threshold variables with initial value $\|L_1 - L_2\|$, and a first connection age variable with initial value 0. Here $C \subseteq A \times A$ and its initial value is the empty set; $A \times A$ represents the connection relation between the reference points of the reference point set, and the empty initial value means that the first reference point and the second reference point are initially unconnected. The two activation count variables are the activation count for the first reference point and the activation count for the second reference point; the two distance threshold variables are the first distance threshold and the second distance threshold; and the first connection age variable represents the connection duration between the first reference point and the second reference point;
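The state set up in step 2 can be pictured as a small record. The sketch below is one possible layout, not the patent's data structure: the class name `Topology`, its field names, and the choice to store connection ages in a dictionary that starts empty (rather than holding a single age variable with value 0) are all assumptions.

```python
import math
import random
from dataclasses import dataclass, field

@dataclass
class Topology:
    nodes: list                                   # reference point set A
    thresholds: list                              # distance threshold per reference point
    wins: list                                    # activation count per reference point
    edges: set = field(default_factory=set)       # connection set C, initially empty
    ages: dict = field(default_factory=dict)      # connection age per edge

def initialize(data, rng):
    """Pick two random samples as L_1 and L_2 and set the initial variables."""
    l1, l2 = rng.sample(data, 2)
    t0 = math.dist(l1, l2)                        # both thresholds start at ||L_1 - L_2||
    return Topology(nodes=[list(l1), list(l2)], thresholds=[t0, t0], wins=[0, 0])

data = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
topo = initialize(data, random.Random(1))
```

Keeping the reference points as mutable lists makes the later movement of the winner and runner-up toward each new sample straightforward.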
Step 3: Enter the input and competition phase. The data acquisition system continues to acquire one item of signal data, such as an image or video, and sends it to the control system. The neural-network-based data dimension reduction module 3 in the control system stores the received signal data as one high-dimensional datum, which serves as a new data sample $\xi \in R^D$, where $R^D$ denotes the high-dimensional real space, $R$ the real numbers, and $D$ the dimension of the high-dimensional data. Then compute the Euclidean distance between each reference point in $A$ and the new data sample $\xi$. The reference points with the smallest and the second-smallest Euclidean distance are the winner reference point $s_1$ and the runner-up reference point $s_2$ respectively, as given by formulas (1) and (2):
$$s_1 = \arg\min_{x \in A} \|\xi - x\| \tag{1}$$
$$s_2 = \arg\min_{x \in A \setminus \{s_1\}} \|\xi - x\| \tag{2}$$
The winner reference point $s_1$ and the runner-up reference point $s_2$ are thus the two most similar reference points. Then enter the reference point update phase: the module 3 judges whether the distance from $\xi$ to $s_1$ exceeds the distance threshold of $s_1$, or the distance from $\xi$ to $s_2$ exceeds the distance threshold of $s_2$. If either holds, the new data sample $\xi$ is placed into the reference point set $A$ as a new reference point with value $\xi$, i.e. $A = A \cup \{\xi\}$, and execution returns to step 3;
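The competition of step 3 can be sketched as follows. The insertion test shown here (the sample lies farther from the winner or the runner-up than that point's distance threshold) is an assumption: it is the usual criterion for self-organizing incremental networks, and the names `find_winners` and `should_insert` are illustrative.

```python
import math

def find_winners(xi, nodes):
    """Indices of the nearest and second-nearest reference points, formulas (1)-(2)."""
    order = sorted(range(len(nodes)), key=lambda i: math.dist(xi, nodes[i]))
    return order[0], order[1]

def should_insert(xi, nodes, thresholds, s1, s2):
    """Assumed criterion: xi lies outside the distance threshold of s1 or of s2."""
    return (math.dist(xi, nodes[s1]) > thresholds[s1]
            or math.dist(xi, nodes[s2]) > thresholds[s2])

nodes = [(0.0, 0.0), (1.0, 0.0), (4.0, 0.0)]
thresholds = [1.0, 1.0, 3.0]

near = (0.9, 0.0)                                 # close to existing reference points
s1_near, s2_near = find_winners(near, nodes)
insert_near = should_insert(near, nodes, thresholds, s1_near, s2_near)

far = (10.0, 0.0)                                 # far from every reference point
s1_far, s2_far = find_winners(far, nodes)
insert_far = should_insert(far, nodes, thresholds, s1_far, s2_far)
```

A sample close to the existing reference points is absorbed by them, whereas a distant sample becomes a new reference point.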
Step 4: If no connection exists between $s_1$ and $s_2$, perform $C = C \cup \{(s_1, s_2)\}$, i.e. connect the two most similar reference points, and reset a second age variable to its initial value 0; the second age variable represents the connection duration between the winner reference point $s_1$ and the runner-up reference point $s_2$. Then, for every $(s_1, L_i) \in C$, increase the third age variable by 1, i.e. the connection duration of every reference point connected to $s_1$ is increased by 1; the third age variable represents the connection duration between the winner reference point $s_1$ and each reference point $L_i$ connected to it, where $i$ is a natural-number variable. Set an activation count variable for the winner reference point $s_1$ and increase it by 1, so that its value counts up from 0. Then perform $s_1 = s_1 + \varepsilon(t)(\xi - s_1)$ and $s_2 = s_2 + \varepsilon'(t)(\xi - s_2)$, i.e. move $s_1$ and $s_2$ toward the new data sample, where $t$ is the running time of the neural-network-based data dimension reduction system;
Step 5: The neural-network-based data dimension reduction module 3 checks every connection $(L_i, L_j) \in C$ between reference points together with its current age. If the age of a connection exceeds $age_{max}$, that connection is removed from $C$. Here $age_{max}$ is the predefined maximum connection duration, $i$ and $j$ are unequal natural numbers, and the age of $(L_i, L_j)$ is the connection duration between $L_i$ and $L_j$;
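Steps 4 and 5 together form one update per input sample. The sketch below is illustrative and makes several assumptions: edges are stored as `frozenset` index pairs, the age of the winner/runner-up connection is reset on every call, the reference points move along the vector $\xi - s$, and the learning rates are passed in as plain numbers `eps1` and `eps2` because the text does not define $\varepsilon(t)$ and $\varepsilon'(t)$.

```python
def adapt_and_age(xi, nodes, edges, ages, wins, s1, s2, eps1, eps2, age_max):
    """One pass of steps 4 and 5; all containers are updated in place."""
    edge = frozenset((s1, s2))
    edges.add(edge)                               # connect winner and runner-up
    ages[edge] = 0                                # reset the age of that connection
    for e in edges:                               # age every connection that touches s1
        if s1 in e:
            ages[e] += 1
    wins[s1] += 1                                 # activation count of the winner
    # Move winner and runner-up toward the sample.
    nodes[s1] = [a + eps1 * (x - a) for a, x in zip(nodes[s1], xi)]
    nodes[s2] = [a + eps2 * (x - a) for a, x in zip(nodes[s2], xi)]
    # Step 5: drop connections older than age_max.
    for e in [e for e in edges if ages[e] > age_max]:
        edges.discard(e)
        del ages[e]

nodes = [[0.0, 0.0], [2.0, 0.0], [5.0, 0.0]]
edges = {frozenset((0, 2))}
ages = {frozenset((0, 2)): 3}
wins = [0, 0, 0]
adapt_and_age((1.0, 0.0), nodes, edges, ages, wins,
              s1=0, s2=1, eps1=0.5, eps2=0.1, age_max=3)
```

In the example, the old connection between points 0 and 2 ages past the limit and is removed, while the fresh connection between points 0 and 1 survives with age 1.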
Step 6: The neural-network-based data dimension reduction module 3 then performs the distance threshold update phase for the reference points: the distance thresholds $T_{s_1}$ and $T_{s_2}$ of $s_1$ and $s_2$ are updated by formulas (3) and (4) to the maximum distance between $s_1$, respectively $s_2$, and its adjacent reference points:
$$T_{s_1} = \max_{(x, s_1) \in C} \|x - s_1\| \tag{3}$$
$$T_{s_2} = \max_{(x, s_2) \in C} \|x - s_2\| \tag{4}$$
where $T_{s_1}$ and $T_{s_2}$ are the distance thresholds of the winner reference point $s_1$ and the runner-up reference point $s_2$ respectively. Then enter the denoising phase: the module 3 judges whether the total number of data samples input so far is an integer multiple of the preset value $\lambda$. If so, all reference points in the reference point set $A$ are checked, and any reference point $L_i$ that has only one connected reference point and whose activation count is less than the preset minimum activation count $M_{min}$ is deleted from the reference point set $A$. Execution then returns to step 2. After all training data samples have been input, the required manifold topology reference point set $A$ and the connections $C$ between reference points are obtained.
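The threshold update and the denoising check of step 6 might look like the sketch below. It is a hedged illustration: edges are assumed to be `frozenset` index pairs, a reference point without neighbours simply keeps its previous threshold (the text does not cover that case), and `prune_noise` only reports which indices would be deleted rather than renumbering the set.

```python
import math

def update_threshold(node, nodes, edges, thresholds):
    """Formulas (3)-(4): threshold becomes the largest distance to a connected neighbour."""
    neighbours = [j for e in edges if node in e for j in e if j != node]
    if neighbours:                                # isolated points keep their old threshold
        thresholds[node] = max(math.dist(nodes[node], nodes[j]) for j in neighbours)

def prune_noise(nodes, edges, wins, m_min):
    """Indices of reference points with exactly one neighbour and fewer than m_min wins."""
    degree = {i: 0 for i in range(len(nodes))}
    for e in edges:
        for i in e:
            degree[i] += 1
    return [i for i in degree if degree[i] == 1 and wins[i] < m_min]

nodes = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
edges = {frozenset((0, 1)), frozenset((1, 2))}
thresholds = [0.0, 0.0, 0.0]
update_threshold(1, nodes, edges, thresholds)

wins = [1, 10, 10]
to_delete = prune_noise(nodes, edges, wins, m_min=3)
```

In a full training loop `prune_noise` would be called only when the number of processed samples is a multiple of the preset period, and the reported points and their connections would then be removed.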
Step 7: The neural-network-based data dimension reduction module 3 then enters the reference point similarity computation phase. Using the topology graph produced by the preceding steps, i.e. the reference points and their connections, the shortest path between reference points in the graph is used to represent their similarity. With $n$ reference points produced, this phase computes the shortest path from each reference point to all other reference points, thereby producing the $n \times n$ similarity matrix $D_G$. The phase begins by setting the natural-number variable $i$ to 0;
Step 8: Increment the natural-number variable $i$ by 1 and take the reference point $L_i$ ($i = 1, \ldots, n$), where $n$ is the number of reference points in the reference point set $A$. For this reference point, enter the initialization phase of the similarity computation: first set $S = \{L_i\}$ and $U = A - \{L_i\}$, where $S$ is the first intermediate set and $U$ is the second intermediate set. Then set element $D_G(i,i)$ of the $n \times n$ similarity matrix $D_G$ to 0; $D_G(i,i)$ represents the similarity of $L_i$ with itself. For each reference point $L_j \in U$: if $L_i$ and $L_j$ are connected, i.e. $(L_i, L_j) \in C$, set $D_G(i,j) = \|L_i - L_j\|$; otherwise set $D_G(i,j) = \infty$. $D_G(i,j)$ represents the similarity between $L_i$ and the element $L_j$ of $U$;
Step 9: Enter the intermediate-point selection phase: choose from $U$ the reference point $L_{\min}$ whose similarity value to $L_i$ is smallest, i.e. $L_{\min} = \arg\min_{L_j \in U} D_G(i,j)$, and move $L_{\min}$ into $S$, namely
$$S = S \cup \{L_{\min}\}, \quad U = U - \{L_{\min}\};$$
Step 10: Enter the edge expansion phase: for each reference point $L_k \in U$, where $k$ is a natural number, if $L_{\min}$ and $L_k$ are connected, i.e. $(L_{\min}, L_k) \in C$, and $D_G(i,\min) + \|L_{\min} - L_k\| < D_G(i,k)$, where $\min$ is the index of $L_{\min}$, perform the update shown in formula (5):
$$D_G(i,k) = D_G(i,\min) + \|L_{\min} - L_k\| \tag{5}$$
Then repeat step 9 and step 10 until $S = A$;
Step 11: Return to step 8. Once $i$ reaches $n$, all reference points in the reference point set $A$ have been processed and the $n \times n$ similarity matrix $D_G$ is obtained;
Step 12: The neural-network-based data dimension reduction module 3 then enters the reference point dimension reduction mapping phase. The purpose of this phase is to map the reference points to a lower dimension while preserving the $n \times n$ similarity matrix $D_G$. The optimization objective is to find coordinates of the reference points in the low-dimensional space such that the Euclidean distances in the low-dimensional space are as close as possible to the similarities, that is, to minimize the error. The phase comprises computing the squared distance matrix $\Delta_n$ by formula (6):
$$\Delta_n(i,j) = D_G(i,j) \cdot D_G(i,j), \quad i,j = 1,\ldots,n \tag{6}$$
Then compute the mean vector $\vec{\delta}_\mu$ by formula (7):
$$\vec{\delta}_\mu = (\vec{\delta}_1 + \vec{\delta}_2 + \cdots + \vec{\delta}_n)/n \tag{7}$$
where $\vec{\delta}_i$ denotes the $i$-th column of $\Delta_n$, with $i$ running from 1 to $n$;
Step 13: Compute the mean centering matrix $H_n$ by formula (8):
$$H_n(i,j) = \delta(i,j) - \frac{1}{n} \tag{8}$$
where $\delta(i,j)$ is an intermediate parameter, generally taken as 1, and $H_n(i,j)$ is the element in row $i$ and column $j$ of the mean centering matrix $H_n$;
Step 14: Compute the inner product matrix $B_n$ by formula (9):
$$B_n = -\frac{1}{2} H_n \Delta_n H_n \tag{9}$$
Step 15: Compute eigenvalues and eigenvectors: find the $d$ largest positive eigenvalues $\lambda_1, \ldots, \lambda_d$ of $B_n$ and their corresponding eigenvectors $\vec{v}_1, \ldots, \vec{v}_d$, where $d$ is the target dimension of the reduction;
Step 16: Enter the dimension reduction mapping phase for the reference points, which obtains the matrix $L$ of the reduced reference points by formula (10):
$$L = \begin{bmatrix} \sqrt{\lambda_1} \cdot \vec{v}_1^{\,T} \\ \sqrt{\lambda_2} \cdot \vec{v}_2^{\,T} \\ \vdots \\ \sqrt{\lambda_d} \cdot \vec{v}_d^{\,T} \end{bmatrix} \tag{10}$$
The $n$ column vectors of $L$, each of dimension $d$, are the coordinates of the $n$ reference points in the $d$-dimensional space. This yields the topological representation needed to map the high-dimensional data into the low-dimensional space;
Step 17: Enter the online data dimension reduction mapping phase. The goal of this phase is to use the information obtained in the preceding steps to map new high-dimensional data to the low-dimensional space online. The phase first determines the reference point to which the new data point belongs, i.e. the reference point $L_\alpha$ nearest to the new data sample $\xi$, by formula (11):
$$L_\alpha = \arg\min_{x \in A} \|\xi - x\| \tag{11}$$
Step 18: Obtain the similarity $D_S(\xi, L_i)$ between the new data sample $\xi$ and every reference point by formula (12):
$$D_S(\xi, L_i) = \|\xi - L_\alpha\| + D_G(\alpha, i) \tag{12}$$
Step 19: Obtain the squared distance vector $\vec{\delta}_\xi$ by formula (13):
$$\vec{\delta}_\xi(i) = D_S(\xi, L_i) \cdot D_S(\xi, L_i) \tag{13}$$
Step 20: Obtain the pseudoinverse transpose matrix by formula (14), where $L^{\#}$ denotes the pseudoinverse transpose of the matrix $L$ of the reduced reference points:
$$L^{\#} = \begin{bmatrix} \vec{v}_1^{\,T} / \sqrt{\lambda_1} \\ \vec{v}_2^{\,T} / \sqrt{\lambda_2} \\ \vdots \\ \vec{v}_d^{\,T} / \sqrt{\lambda_d} \end{bmatrix} \tag{14}$$
Step 21: Map the new data sample $\xi$ to the low-dimensional space by formula (15) to obtain the low-dimensional mapping vector $l_\xi$:
$$l_\xi = -\frac{1}{2} L^{\#} \left( \vec{\delta}_\xi - \vec{\delta}_\mu \right) \tag{15}$$
As can be seen from the drawings for Embodiment 1, the present embodiment does overcome the shortcoming of conventional linear dimension reduction methods, which use Euclidean distance to represent similarity. It instead uses geodesic distance to measure similarity, thereby obtaining a satisfactory dimension reduction result and providing reliable preprocessing for subsequent data analysis.
Embodiment 3:
In this embodiment, the high-dimensional data set constructed from the received image or video signal data is the swiss_roll data set. The swiss_roll data set contains 100 uniform noise data points; in addition, 15000 data points of the swiss_roll data set are used to determine the reference points, and another 5000 data points are used to obtain the low-dimensional mapping vectors $l_\xi$. The details are as follows:
As shown in Fig. 1, Fig. 8, Fig. 9 and Fig. 10, the neural-network-based data dimension reduction system comprises a data acquisition system 1; the data acquisition system 1 is connected to a control system 2, and the control system 2 contains a neural-network-based data dimension reduction module 3.
The dimension reduction method of the neural-network-based data dimension reduction system comprises the following steps:
Step 1: The data acquisition system first sends the acquired signal data, such as images or video, to the control system 2. The control system 2 then starts the neural-network-based data dimension reduction module 3, which assembles the received signal data into a high-dimensional data set and stores it;
Step 2: The neural-network-based data dimension reduction module 3 then determines the manifold topology reference points from the high-dimensional data. The goal of this process is to train a self-organizing neural network on the training data so that the training result represents the topological structure of the original data set, producing the required reference points and connections. Specifically, initialization is performed first: set the reference point set $A = \{L_1, L_2\}$, where $L_1$ is the first reference point and $L_2$ is the second reference point, both being high-dimensional data chosen at random from the high-dimensional data set. The module 3 then sets the edge set $C$, two activation count variables with initial value 0, two distance threshold variables with initial value $\|L_1 - L_2\|$, and a first connection age variable with initial value 0. Here $C \subseteq A \times A$ and its initial value is the empty set; $A \times A$ represents the connection relation between the reference points of the reference point set, and the empty initial value means that the first reference point and the second reference point are initially unconnected. The two activation count variables are the activation count for the first reference point and the activation count for the second reference point; the two distance threshold variables are the first distance threshold and the second distance threshold; and the first connection age variable represents the connection duration between the first reference point and the second reference point;
Step 3: Enter the input and competition phase. The data acquisition system continues to acquire one item of signal data, such as an image or video, and sends it to the control system. The neural-network-based data dimension reduction module 3 in the control system stores the received signal data as one high-dimensional datum, which serves as a new data sample $\xi \in R^D$, where $R^D$ denotes the high-dimensional real space, $R$ the real numbers, and $D$ the dimension of the high-dimensional data. Then compute the Euclidean distance between each reference point in $A$ and the new data sample $\xi$. The reference points with the smallest and the second-smallest Euclidean distance are the winner reference point $s_1$ and the runner-up reference point $s_2$ respectively, as given by formulas (1) and (2):
$$s_1 = \arg\min_{x \in A} \|\xi - x\| \tag{1}$$
$$s_2 = \arg\min_{x \in A \setminus \{s_1\}} \|\xi - x\| \tag{2}$$
The winner reference point $s_1$ and the runner-up reference point $s_2$ are thus the two most similar reference points. Then enter the reference point update phase: the module 3 judges whether the distance from $\xi$ to $s_1$ exceeds the distance threshold of $s_1$, or the distance from $\xi$ to $s_2$ exceeds the distance threshold of $s_2$. If either holds, the new data sample $\xi$ is placed into the reference point set $A$ as a new reference point with value $\xi$, i.e. $A = A \cup \{\xi\}$, and execution returns to step 3;
Step 4: If no connection exists between $s_1$ and $s_2$, perform $C = C \cup \{(s_1, s_2)\}$, i.e. connect the two most similar reference points, and reset a second age variable to its initial value 0; the second age variable represents the connection duration between the winner reference point $s_1$ and the runner-up reference point $s_2$. Then, for every $(s_1, L_i) \in C$, increase the third age variable by 1, i.e. the connection duration of every reference point connected to $s_1$ is increased by 1; the third age variable represents the connection duration between the winner reference point $s_1$ and each reference point $L_i$ connected to it, where $i$ is a natural-number variable. Set an activation count variable for the winner reference point $s_1$ and increase it by 1, so that its value counts up from 0. Then perform $s_1 = s_1 + \varepsilon(t)(\xi - s_1)$ and $s_2 = s_2 + \varepsilon'(t)(\xi - s_2)$, i.e. move $s_1$ and $s_2$ toward the new data sample, where $t$ is the running time of the neural-network-based data dimension reduction system;
Step 5: The neural-network-based data dimension reduction module 3 checks every connection $(L_i, L_j) \in C$ between reference points together with its current age. If the age of a connection exceeds $age_{max}$, that connection is removed from $C$. Here $age_{max}$ is the predefined maximum connection duration, $i$ and $j$ are unequal natural numbers, and the age of $(L_i, L_j)$ is the connection duration between $L_i$ and $L_j$;
Step 6: The neural-network-based data dimension reduction module 3 then performs the distance threshold update phase for the reference points: the distance thresholds $T_{s_1}$ and $T_{s_2}$ of $s_1$ and $s_2$ are updated by formulas (3) and (4) to the maximum distance between $s_1$, respectively $s_2$, and its adjacent reference points:
$$T_{s_1} = \max_{(x, s_1) \in C} \|x - s_1\| \tag{3}$$
$$T_{s_2} = \max_{(x, s_2) \in C} \|x - s_2\| \tag{4}$$
where $T_{s_1}$ and $T_{s_2}$ are the distance thresholds of the winner reference point $s_1$ and the runner-up reference point $s_2$ respectively. Then enter the denoising phase: the module 3 judges whether the total number of data samples input so far is an integer multiple of the preset value $\lambda$. If so, all reference points in the reference point set $A$ are checked, and any reference point $L_i$ that has only one connected reference point and whose activation count is less than the preset minimum activation count $M_{min}$ is deleted from the reference point set $A$. Execution then returns to step 2. After all training data samples have been input, the required manifold topology reference point set $A$ and the connections $C$ between reference points are obtained.
Step 7: The neural-network-based data dimension reduction module 3 then enters the reference point similarity computation phase. Using the topology graph produced by the preceding steps, i.e. the reference points and their connections, the shortest path between reference points in the graph is used to represent their similarity. With $n$ reference points produced, this phase computes the shortest path from each reference point to all other reference points, thereby producing the $n \times n$ similarity matrix $D_G$. The phase begins by setting the natural-number variable $i$ to 0;
Step 8: Increment the natural-number variable $i$ by 1 and take the reference point $L_i$ ($i = 1, \ldots, n$), where $n$ is the number of reference points in the reference point set $A$. For this reference point, enter the initialization phase of the similarity computation: first set $S = \{L_i\}$ and $U = A - \{L_i\}$, where $S$ is the first intermediate set and $U$ is the second intermediate set. Then set element $D_G(i,i)$ of the $n \times n$ similarity matrix $D_G$ to 0; $D_G(i,i)$ represents the similarity of $L_i$ with itself. For each reference point $L_j \in U$: if $L_i$ and $L_j$ are connected, i.e. $(L_i, L_j) \in C$, set $D_G(i,j) = \|L_i - L_j\|$; otherwise set $D_G(i,j) = \infty$. $D_G(i,j)$ represents the similarity between $L_i$ and the element $L_j$ of $U$;
Step 9: Enter the intermediate-point selection phase: choose from $U$ the reference point $L_{\min}$ whose similarity value to $L_i$ is smallest, i.e. $L_{\min} = \arg\min_{L_j \in U} D_G(i,j)$, and move $L_{\min}$ into $S$, namely
$$S = S \cup \{L_{\min}\}, \quad U = U - \{L_{\min}\};$$
Step 10: Enter the edge expansion phase: for each reference point $L_k \in U$, where $k$ is a natural number, if $L_{\min}$ and $L_k$ are connected, i.e. $(L_{\min}, L_k) \in C$, and $D_G(i,\min) + \|L_{\min} - L_k\| < D_G(i,k)$, where $\min$ is the index of $L_{\min}$, perform the update shown in formula (5):
$$D_G(i,k) = D_G(i,\min) + \|L_{\min} - L_k\| \tag{5}$$
Then repeat step 9 and step 10 until $S = A$;
Step 11: Return to step 8. Once $i$ reaches $n$, all reference points in the reference point set $A$ have been processed and the $n \times n$ similarity matrix $D_G$ is obtained;
Step 12: The neural-network-based data dimension reduction module 3 then enters the reference point dimension reduction mapping phase. The purpose of this phase is to map the reference points to a lower dimension while preserving the $n \times n$ similarity matrix $D_G$. The optimization objective is to find coordinates of the reference points in the low-dimensional space such that the Euclidean distances in the low-dimensional space are as close as possible to the similarities, that is, to minimize the error. The phase comprises computing the squared distance matrix $\Delta_n$ by formula (6):
$$\Delta_n(i,j) = D_G(i,j) \cdot D_G(i,j), \quad i,j = 1,\ldots,n \tag{6}$$
Then compute the mean vector $\vec{\delta}_\mu$ by formula (7):
$$\vec{\delta}_\mu = (\vec{\delta}_1 + \vec{\delta}_2 + \cdots + \vec{\delta}_n)/n \tag{7}$$
where $\vec{\delta}_i$ denotes the $i$-th column of $\Delta_n$, with $i$ running from 1 to $n$;
Step 13: Compute the mean centering matrix $H_n$ by formula (8):
$$H_n(i,j) = \delta(i,j) - \frac{1}{n} \tag{8}$$
where $\delta(i,j)$ is an intermediate parameter, generally taken as 1, and $H_n(i,j)$ is the element in row $i$ and column $j$ of the mean centering matrix $H_n$;
Step 14: Compute the inner product matrix $B_n$ by formula (9):
$$B_n = -\frac{1}{2} H_n \Delta_n H_n \tag{9}$$
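Formula (9) can also be written element by element, which avoids forming $H_n$ explicitly. The expansion below is a sketch that assumes $\delta(i,j)$ in formula (8) acts as the Kronecker delta (1 when $i = j$, 0 otherwise), so that $H_n = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$; the symbol $\mathbf{1}$ for the all-ones vector is introduced here for illustration.

```latex
\begin{aligned}
H_n \Delta_n H_n
  &= \Delta_n
     - \tfrac{1}{n}\,\mathbf{1}\mathbf{1}^{T}\Delta_n
     - \tfrac{1}{n}\,\Delta_n\mathbf{1}\mathbf{1}^{T}
     + \tfrac{1}{n^{2}}\,\mathbf{1}\mathbf{1}^{T}\Delta_n\mathbf{1}\mathbf{1}^{T}, \\[4pt]
B_n(i,j)
  &= -\frac{1}{2}\Big(
       \Delta_n(i,j)
       - \frac{1}{n}\sum_{k=1}^{n}\Delta_n(k,j)
       - \frac{1}{n}\sum_{k=1}^{n}\Delta_n(i,k)
       + \frac{1}{n^{2}}\sum_{k=1}^{n}\sum_{l=1}^{n}\Delta_n(k,l)
     \Big).
\end{aligned}
```

In words, each squared distance has its column mean and its row mean subtracted and the grand mean added back before the factor $-\frac{1}{2}$ is applied.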
Step 15: Compute eigenvalues and eigenvectors: find the $d$ largest positive eigenvalues $\lambda_1, \ldots, \lambda_d$ of $B_n$ and their corresponding eigenvectors $\vec{v}_1, \ldots, \vec{v}_d$, where $d$ is the target dimension of the reduction;
Step 16: Enter the dimension reduction mapping phase for the reference points, which obtains the matrix $L$ of the reduced reference points by formula (10):
$$L = \begin{bmatrix} \sqrt{\lambda_1} \cdot \vec{v}_1^{\,T} \\ \sqrt{\lambda_2} \cdot \vec{v}_2^{\,T} \\ \vdots \\ \sqrt{\lambda_d} \cdot \vec{v}_d^{\,T} \end{bmatrix} \tag{10}$$
The $n$ column vectors of $L$, each of dimension $d$, are the coordinates of the $n$ reference points in the $d$-dimensional space. This yields the topological representation needed to map the high-dimensional data into the low-dimensional space;
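One way to check how well the coordinates in $L$ preserve the similarity matrix is to sum the squared differences between low-dimensional Euclidean distances and the entries of $D_G$. The sketch below is only an illustrative diagnostic under that assumption; the text does not give its error formula, and the name `embedding_error` is hypothetical. Pairs with infinite geodesic distance are skipped.

```python
import math

def embedding_error(l_rows, d_g):
    """Sum of squared differences between embedded distances and D_G entries."""
    n = len(d_g)
    coords = [[row[i] for row in l_rows] for i in range(n)]   # column i of L
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            if math.isinf(d_g[i][j]):
                continue                                      # disconnected pair
            total += (math.dist(coords[i], coords[j]) - d_g[i][j]) ** 2
    return total

d_g = [[0.0, 1.0, 3.0],
       [1.0, 0.0, 2.0],
       [3.0, 2.0, 0.0]]
exact = embedding_error([[-4.0 / 3.0, -1.0 / 3.0, 5.0 / 3.0]], d_g)
stretched = embedding_error([[0.0, 1.0, 4.0]], d_g)
```

An embedding that reproduces every distance gives an error of zero, while the stretched layout in the example is penalized for the two distances it gets wrong.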
Step 17: enter online data dimensionality reduction mapping phase, the target in this stage is the information obtained according to above-mentioned steps, with online mode, dimensionality reduction mapping is carried out to higher-dimension new data, described online data dimensionality reduction mapping phase comprises determines reference point belonging to new data point, determines the reference point L nearest apart from new data model ξ by formula (11) α:
L &alpha; = arg min x &Element; A | | &xi; - x | | - - - ( 11 )
Step 18: obtain new data model ξ and the similarity D of all reference points according to formula (12) s(ξ, L i):
D S(ξ,L i)=||ξ-L α||+D G(α,i) (12)
Step 19: obtain square distance vector according to formula (13)
&delta; &RightArrow; &xi; ( i ) = D S ( &xi; , L i ) * D S ( &xi; , L i ) - - - ( 13 )
Step 20: Obtain the pseudoinverse transpose matrix according to formula (14), where $L^{\#}$ denotes the pseudoinverse transpose of the matrix L of the dimensionality-reduced mapping of the reference points:
$$L^{\#} = \begin{bmatrix} \vec{v}_1^{\,T} / \sqrt{\lambda_1} \\ \vec{v}_2^{\,T} / \sqrt{\lambda_2} \\ \vdots \\ \vec{v}_d^{\,T} / \sqrt{\lambda_d} \end{bmatrix} \tag{14}$$
Step 21: Map the new data sample ξ to the low-dimensional space according to formula (15), obtaining the low-dimensional mapping vector $l_\xi$:
$$l_\xi = -\frac{1}{2} L^{\#} \left( \vec{\delta}_\xi - \vec{\delta}_\mu \right) \tag{15}$$
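The reference point embedding and the online mapping of steps 12 to 21 can be sketched in Python with NumPy as follows. This is a minimal illustration and not the patented implementation. It assumes that the squared-distance matrix, mean vector and centering matrix of formulas (6) to (8) take their usual classical MDS forms, that formulas (10) and (14) use the square-root eigenvalue scaling of classical landmark MDS, and that the similarity matrix `D_G` is finite (a connected reference point graph). The function names `embed_reference_points` and `map_new_point` are hypothetical.

```python
import numpy as np


def embed_reference_points(D_G, d):
    """Steps 12 to 16: embed n reference points given their n x n similarity matrix D_G."""
    n = D_G.shape[0]
    Delta = D_G ** 2                          # squared distances (assumed form of formula (6))
    delta_mu = Delta.mean(axis=1)             # mean squared-distance vector (assumed form of formula (7))
    H = np.eye(n) - np.full((n, n), 1.0 / n)  # centering matrix (assumed form of formula (8))
    B = -0.5 * H @ Delta @ H                  # inner product matrix, formula (9)
    eigvals, eigvecs = np.linalg.eigh(B)      # B is symmetric; eigenvalues come back ascending
    top = np.argsort(eigvals)[::-1][:d]       # indices of the d largest eigenvalues
    lam, V = eigvals[top], eigvecs[:, top]
    if np.any(lam <= 0):
        raise ValueError("fewer than d positive eigenvalues")
    L = np.sqrt(lam)[:, None] * V.T           # formula (10): d x n, columns are coordinates
    L_sharp = V.T / np.sqrt(lam)[:, None]     # formula (14): pseudoinverse transpose of L
    return L, L_sharp, delta_mu


def map_new_point(xi, refs, D_G, L_sharp, delta_mu):
    """Steps 17 to 21: map one new sample xi; refs is the n x D array of reference points."""
    dists = np.linalg.norm(refs - xi, axis=1)
    alpha = int(np.argmin(dists))             # nearest reference point, formula (11)
    D_S = dists[alpha] + D_G[alpha]           # similarity to every reference point, formula (12)
    delta_xi = D_S * D_S                      # squared-distance vector, formula (13)
    return -0.5 * L_sharp @ (delta_xi - delta_mu)  # low-dimensional vector, formula (15)
```

Under these assumptions, a sample that coincides with a reference point is mapped onto that reference point's column of L, which gives a simple consistency check between the offline and online stages.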
As the accompanying drawings for Embodiment 1 show, this embodiment overcomes the shortcoming of conventional linear dimensionality reduction methods, which use Euclidean distance to represent similarity. It instead uses geodesic distance to measure similarity, thereby obtaining a desirable dimensionality reduction result and providing reliable preprocessing for subsequent data analysis.
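The difference between Euclidean and geodesic similarity can be illustrated with a short Python sketch that computes shortest-path distances over the connections between reference points. It is an illustration only: it uses Dijkstra's algorithm with a binary heap in place of the linear selection of the nearest unvisited reference point described in the method, and the function name `geodesic_distances` and the sample data are hypothetical.

```python
import heapq
import math


def geodesic_distances(points, edges):
    """Return the n x n matrix of shortest-path lengths along the given connections."""
    n = len(points)
    adjacency = {i: [] for i in range(n)}
    for i, j in edges:
        weight = math.dist(points[i], points[j])  # edge length is the Euclidean distance
        adjacency[i].append((j, weight))
        adjacency[j].append((i, weight))
    D = [[math.inf] * n for _ in range(n)]        # unconnected pairs stay at infinity
    for source in range(n):
        D[source][source] = 0.0
        heap = [(0.0, source)]
        while heap:
            dist, u = heapq.heappop(heap)
            if dist > D[source][u]:
                continue                          # stale heap entry, already improved
            for v, weight in adjacency[u]:
                candidate = dist + weight
                if candidate < D[source][v]:
                    D[source][v] = candidate
                    heapq.heappush(heap, (candidate, v))
    return D


# Four reference points along a U-shaped path, connected in a chain.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0)]
edges = [(0, 1), (1, 2), (2, 3)]
D = geodesic_distances(points, edges)
```

In this example the two end points are 1 unit apart in Euclidean terms, but 3 units apart along the chain of connections, which is the distance the similarity matrix records.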
The above is only a preferred embodiment of the present invention and does not limit the invention in any form. Although the invention has been disclosed above by way of a preferred embodiment, this is not intended to limit it. Any person skilled in the art may, without departing from the scope of the technical solution of the invention, use the technical content disclosed above to make minor changes or modifications that yield equivalent embodiments. Any simple amendment, equivalent replacement, or improvement made to the above embodiment in accordance with the technical essence of the invention, within its spirit and principles and without departing from the content of its technical solution, still falls within the protection scope of the technical solution of the invention.

Claims (2)

1. A data dimensionality reduction system based on a neural network, comprising a data acquisition system, wherein the data acquisition system is connected to a control system, and the control system contains a data dimensionality reduction module based on a neural network.
2. A dimensionality reduction method for the data dimensionality reduction system based on a neural network according to claim 1, characterized in that the steps are as follows:
Step 1: First, the data acquisition system sends the collected signal data, such as images or video, to the control system; the control system then starts the neural-network-based data dimensionality reduction module, which first organizes the received signal data into a high-dimensional data set and stores it;
Step 2: The neural-network-based data dimensionality reduction module then determines the manifold topology reference points for the high-dimensional data. Specifically, the high-dimensional data is first initialized: the reference point set is set to $A = \{L_1, L_2\}$, where A is the reference point set, $L_1$ is the first reference point, and $L_2$ is the second reference point; the first and second reference points are two high-dimensional data items chosen at random from the high-dimensional data set. The module then sets up an edge set C, two activation count variables with initial value 0, two range threshold variables with initial value $\|L_1 - L_2\|$, and a first connection age variable with initial value 0. The edge set C is initially empty; A × A represents the connection relations between the reference points of the reference point set, and the empty initial value indicates that the first and second reference points are initially unconnected. The two activation count variables are the activation count variable for the first reference point and the activation count variable for the second reference point; the two range threshold variables are the first range threshold variable and the second range threshold variable; the first connection age variable represents how long the connection between the first and second reference points has lasted;
Step 3: Enter the input and competition stage. The data acquisition system continues to collect one item of signal data, such as an image or video, and sends it to the control system; the neural-network-based data dimensionality reduction module 3 in the control system first stores the received signal data as one high-dimensional data item, which serves as a new data sample $\xi \in R^D$, where $R^D$ denotes the high-dimensional real space, R denotes the real numbers, and D is the dimension of the high-dimensional data. The Euclidean distance between each reference point in A and the new data sample ξ is then calculated; the reference points with the smallest and second-smallest Euclidean distance are the winner reference point $s_1$ and the runner-up reference point $s_2$, respectively, that is, the $s_1$ and $s_2$ given by formula (1) and formula (2):
The winner reference point $s_1$ and the runner-up reference point $s_2$ are thus the two most similar reference points. Next, enter the reference point update stage, in which the neural-network-based data dimensionality reduction module 3 judges whether either of the two conditions holds; if so, the new data sample ξ is added to the reference point set A as a new reference point with value ξ, i.e. $A = A \cup \{\xi\}$, and execution returns to step 3;
Step 4: If no connection exists between $s_1$ and $s_2$, perform $C = C \cup \{(s_1, s_2)\}$, i.e. connect the two most similar reference points, and reset to 0 the second age variable, which represents how long the connection between the winner reference point $s_1$ and the runner-up reference point $s_2$ has lasted. Then, for every $L_i$ with $(s_1, L_i) \in C$, increase by 1 the third age variable, which represents how long the connection between the winner reference point $s_1$ and each reference point $L_i$ connected to it has lasted, where i is a natural-number variable. Set the activation count variable for the winner reference point $s_1$ and increment it; its value increases from 0. Then perform $s_1 = s_1 + \varepsilon(t)\,\|\xi - s_1\|$ and $s_2 = s_2 + \varepsilon'(t)\,\|\xi - s_2\|$, i.e. move $s_1$ and $s_2$ toward the new data sample, where t is the running time of the neural-network-based data dimensionality reduction system;
Step 5: The neural-network-based data dimensionality reduction module checks every connection $(L_i, L_j) \in C$ between reference points together with its current age parameter; if the age exceeds $age_{max}$, the connection is removed from C, where $age_{max}$ is the predefined maximum connection duration, i and j are unequal natural numbers, and the age parameter is the duration of the connection between $L_i$ and $L_j$;
Step 6: The neural-network-based data dimensionality reduction module then performs the update stage for the range thresholds of the reference points, in which the range thresholds of $s_1$ and $s_2$ are updated by formula (3) and formula (4), respectively, to the maximum distance between $s_1$ or $s_2$ and its adjacent reference points;
The two thresholds are the range threshold for the winner reference point $s_1$ and the range threshold for the runner-up reference point $s_2$, respectively. The denoising stage is then entered: if the module judges that the total number of data samples input so far is an integer multiple of the preset value λ, it checks all reference points in the reference point set A; if some reference point $L_i$ has only one connected reference point and its activation count variable is less than the preset minimum activation count $M_{min}$, this reference point $L_i$ is deleted from the reference point set A; execution then returns and continues;
Step 7: The neural-network-based data dimensionality reduction module then enters the reference point similarity calculation stage;
Step 8: Increase the natural-number variable i by 1 and take the reference point $L_i$ ($i = 1, \ldots, n$), where n is the number of reference points in the reference point set A. For this reference point $L_i$, enter the initialization phase of the similarity calculation stage: first perform $S = \{L_i\}$, $U = A - \{L_i\}$, where S is the first intermediate set and U is the second intermediate set; then set the element $D_G(i, i)$ of the n × n similarity matrix $D_G$ to 0, where $D_G(i, i)$ is the similarity value of $L_i$ with itself. For each reference point $L_j \in U$, if $L_i$ and $L_j$ are connected, i.e. $(L_i, L_j) \in C$, set $D_G(i, j)$ to $\|L_i - L_j\|$; otherwise set $D_G(i, j)$ to ∞, where $D_G(i, j)$ is the similarity value between $L_i$ and the element $L_j$ of U;
Step 9: Enter the intermediate point selection stage: choose from U the reference point $L_{min}$ with the smallest similarity value to $L_i$, i.e. $L_{min} = \arg\min D_G(i, j)$ with $L_{min} \in U$, and move $L_{min}$ into S, i.e. $S = S \cup \{L_{min}\}$, $U = U - \{L_{min}\}$;
Step 10: Then enter the edge expansion stage: for each reference point $L_k \in U$, where k is a natural number, if $L_{min}$ and $L_k$ are connected, i.e. $(L_{min}, L_k) \in C$, and $D_G(i, min) + \|L_{min} - L_k\| < D_G(i, k)$, where min is the index of $L_{min}$, perform the update shown in formula (5):
$$D_G(i, k) = D_G(i, min) + \|L_{min} - L_k\| \tag{5}$$
Then repeat step 9 and step 10 until S = A;
Step 11: Return to step 8. When i reaches n, all reference points in the reference point set A have been processed, and the n × n similarity matrix $D_G$ is obtained;
Step 12: The neural-network-based data dimensionality reduction module then enters the reference point dimensionality reduction mapping stage, in which the squared-distance matrix $\Delta_n(i, j)$ is calculated by formula (6):
Then the mean vector $\vec{\delta}_\mu$ is calculated by formula (7):
In formula (7), each averaged vector is the i-th row of $\Delta_n$, for i from 1 to n;
Step 13: Calculate the mean-centering matrix $H_n$ by formula (8):
where δ(i, j) is an intermediate parameter, generally taken as 1, and $H_n(i, j)$ denotes the element in row i, column j of the mean-centering matrix $H_n$;
Step 14: Calculate the inner product matrix $B_n$ by formula (9):
$$B_n = -\frac{1}{2} H_n \Delta_n H_n \tag{9}$$
Step 15: Compute the eigenvalues and eigenvectors: calculate the d largest positive eigenvalues $\lambda_1, \ldots, \lambda_d$ of $B_n$ and their corresponding eigenvectors $\vec{v}_1, \ldots, \vec{v}_d$, where d is the target dimension of the dimensionality reduction;
Step 16: Enter the dimensionality reduction mapping stage for the reference points, in which the matrix L of the dimensionality-reduced mapping of the reference points is obtained by formula (10):
$$L = \begin{bmatrix} \sqrt{\lambda_1}\,\vec{v}_1^{\,T} \\ \sqrt{\lambda_2}\,\vec{v}_2^{\,T} \\ \vdots \\ \sqrt{\lambda_d}\,\vec{v}_d^{\,T} \end{bmatrix} \tag{10}$$
The n d-dimensional column vectors of L are the coordinates of the n reference points in the d-dimensional space;
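Step 17: Enter the online data dimensionality reduction mapping stage, which first determines the reference point to which a new data point belongs: the reference point $L_\alpha$ nearest to the new data sample ξ is determined by formula (11):
$$L_\alpha = \arg\min_{x \in A} \|\xi - x\| \tag{11}$$
Step 18: Obtain the similarity $D_S(\xi, L_i)$ between the new data sample ξ and every reference point $L_i$ according to formula (12):
$$D_S(\xi, L_i) = \|\xi - L_\alpha\| + D_G(\alpha, i) \tag{12}$$
Step 19: Obtain the squared-distance vector $\vec{\delta}_\xi$ according to formula (13):
$$\vec{\delta}_\xi(i) = D_S(\xi, L_i) \cdot D_S(\xi, L_i) \tag{13}$$
Step 20: Obtain the pseudoinverse transpose matrix according to formula (14), where $L^{\#}$ denotes the pseudoinverse transpose of the matrix L of the dimensionality-reduced mapping of the reference points:
$$L^{\#} = \begin{bmatrix} \vec{v}_1^{\,T} / \sqrt{\lambda_1} \\ \vec{v}_2^{\,T} / \sqrt{\lambda_2} \\ \vdots \\ \vec{v}_d^{\,T} / \sqrt{\lambda_d} \end{bmatrix} \tag{14}$$
Step 21: Map the new data sample ξ to the low-dimensional space according to formula (15), obtaining the low-dimensional mapping vector $l_\xi$:
$$l_\xi = -\frac{1}{2} L^{\#} \left( \vec{\delta}_\xi - \vec{\delta}_\mu \right) \tag{15}$$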
CN201410362559.9A 2014-07-28 2014-07-28 A kind of Data Dimensionality Reduction system and its dimension reduction method based on neuroid Active CN104346520B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410362559.9A CN104346520B (en) 2014-07-28 2014-07-28 A kind of Data Dimensionality Reduction system and its dimension reduction method based on neuroid

Publications (2)

Publication Number Publication Date
CN104346520A true CN104346520A (en) 2015-02-11
CN104346520B CN104346520B (en) 2017-10-13

Family

ID=52502108

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410362559.9A Active CN104346520B (en) 2014-07-28 2014-07-28 A kind of Data Dimensionality Reduction system and its dimension reduction method based on neuroid

Country Status (1)

Country Link
CN (1) CN104346520B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108388869A (en) * 2018-02-28 2018-08-10 苏州大学 A kind of hand-written data sorting technique and system based on multiple manifold
CN110955809A (en) * 2019-11-27 2020-04-03 南京大学 High-dimensional data visualization method supporting topology structure maintenance

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101546332A (en) * 2009-05-07 2009-09-30 哈尔滨工程大学 Manifold dimension-reducing medical image search method based on quantum genetic optimization
CN101807245A (en) * 2010-03-02 2010-08-18 天津大学 Artificial neural network-based multi-source gait feature extraction and identification method
CN102269972A (en) * 2011-03-29 2011-12-07 东北大学 Method and device for compensating pipeline pressure missing data based on genetic neural network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
MUKUND BALASUBRAMANIAN 等: "The Isomap Algorithm and", 《SCIENCE》 *
吴证 等: "结合主元成分分析的受限玻耳兹曼机神经网络的降维方法", 《上海交通大学学报》 *
王建中: "基于流形学习的数据降维方法及其在人脸识别中的应用", 《中国博士学位论文全文数据库(信息科技辑)》 *
钱晓东 等: "基于信号传递的神经网络文本降维算法", 《计算机工程》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108388869A (en) * 2018-02-28 2018-08-10 苏州大学 A kind of hand-written data sorting technique and system based on multiple manifold
CN108388869B (en) * 2018-02-28 2021-11-05 苏州大学 Handwritten data classification method and system based on multiple manifold
CN110955809A (en) * 2019-11-27 2020-04-03 南京大学 High-dimensional data visualization method supporting topology structure maintenance
CN110955809B (en) * 2019-11-27 2023-03-31 南京大学 High-dimensional data visualization method supporting topology structure maintenance

Also Published As

Publication number Publication date
CN104346520B (en) 2017-10-13

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant