CN109948646A - Time series data similarity measurement method and measurement system - Google Patents
- Publication number: CN109948646A (application CN201910067744.8A)
- Authority
- CN
- China
- Prior art keywords
- series data
- time series
- vector
- similarity
- event
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a time series data similarity measurement method and measurement system. The method comprises the following steps: first, for the events in all time series data, a vector representation of each event is learned; second, the occurrence time of each event is mapped to a vector with the same dimension as the event vector and embedded into the event vector by vector addition; finally, the resulting event sequence representation is fed into a convolutional neural network for supervised learning, yielding a robust time series data similarity measurement model, which is then used to measure similarity. The invention represents time series data more reasonably and effectively, and can thereby improve the accuracy of time series data similarity measurement.
Description
Technical field
The invention belongs to the technical field of time series data similarity, and in particular relates to a time series data similarity measurement method and measurement system.
Background art
Data similarity measurement is a fundamental problem in data science, with applications in natural language processing, data retrieval, cohort analysis, and many other fields. Real-world scenarios contain large amounts of time series data, which are typically temporally ordered, high-dimensional, heterogeneous, sparse, of unequal dimensions, and irregular.
At present, sequence representations based on one-hot vectors are commonly used. Because such representations are sparse and high-dimensional, they seriously reduce the efficiency and accuracy of similarity computation. In addition, existing methods usually aggregate the events of a sequence within a specific time period, ignoring both the relative relationships among the events in the sequence and the relationship between each event and its occurrence time, which leads to a loss of temporal information. In real-world scenarios, most events in a sequence change over time, and the correlations among events change as the events change, so temporal information is very important for representing an event sequence.
In summary, a new time series data similarity measurement method is needed.
Summary of the invention
The purpose of the present invention is to provide a time series data similarity measurement method and measurement system that solve the above problems. By representing time series data effectively, the invention remedies the failure of conventional methods to account for the relative relationships among events and between events and their occurrence times, and enables effective measurement of time series data similarity.
To achieve the above purpose, the invention adopts the following technical solution:
A time series data similarity measurement method, comprising the following steps:
Step 1: collect a preset number of sample time series data; taking into account the relative relationships among the events in each sample and the relationship between each event and its occurrence time, map the data from a high-dimensional space to a low-dimensional space and construct a representation of each sample;
Step 2: input the representations of all samples obtained in step 1 into a preset convolutional neural network model, perform feature extraction on each representation, and obtain a feature vector for each sample;
Step 3: from the feature vectors obtained in step 2, compute the similarity between samples based on a similarity matrix;
Step 4: use the feature vectors obtained in step 2 and the similarities obtained in step 3 to train the preset convolutional neural network model until a preset convergence condition is met, obtaining a trained similarity measurement model;
Step 5: construct the representation of the time series data to be measured by the method of step 1 and input it into the trained similarity measurement model obtained in step 4 to obtain the similarity measurement result for the time series data to be measured.
Further, step 1 specifically comprises:
Step 1.1: convert each sample time series data matrix into an event sequence, ordering the events by their relative time; events that occur at the same time are ordered arbitrarily;
Step 1.2: map each event to a fixed-length vector using word2vec, obtaining for each event in the sequence a vector representation that encodes its relative relationships to other events;
Step 1.3: map the occurrence time of each event in the sequence, using word2vec, to a vector with the same dimension as the event vector, obtaining a vector representation of the event occurrence time;
Step 1.4: embed the vector representation of each event's occurrence time into the corresponding event representation by vector addition, obtaining the representation of the sample time series data.
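Steps 1.2 to 1.4 can be illustrated with a minimal sketch. It assumes the event and time vectors have already been learned (for example by word2vec); here they are small hand-written lookup tables, and the names `event_vectors`, `time_vectors`, and `embed_sequence` are hypothetical and do not come from the patent.

```python
# Hypothetical lookup tables standing in for vectors learned by word2vec.
# Both tables use the same dimension so that the vectors can be added.
event_vectors = {
    "A": [0.2, 0.1, 0.0],
    "B": [0.3, 0.0, 0.1],
    "C": [0.0, 0.5, 0.2],
}
time_vectors = {
    0: [0.0, 0.0, 0.1],  # relative time step 0
    1: [0.1, 0.0, 0.0],  # relative time step 1
}


def embed_sequence(events):
    """Return one vector per (event, time) pair: event vector + time vector."""
    rows = []
    for name, t in events:
        event_vec = event_vectors[name]
        time_vec = time_vectors[t]
        rows.append([a + b for a, b in zip(event_vec, time_vec)])
    return rows


# An event sequence already ordered by relative time (step 1.1).
sequence = [("A", 0), ("B", 0), ("C", 1)]
E = embed_sequence(sequence)
```

Each row of `E` keeps the dimension of the event vectors, so the time information is added without enlarging the representation.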
Further, in step 2, equal-length feature vectors of the sample time series data are extracted by the convolutional neural network;
The convolutional neural network structure used comprises:
a convolutional layer, for receiving the input data and outputting feature maps;
a sampling layer, for receiving the feature maps output by the convolutional layer and outputting a fixed-length feature vector of the time series data.
Further, in the convolutional neural network structure used:
the convolution of the convolutional layer is unidirectional;
in the sampling layer, max sampling is applied: each feature map is reduced to a single value, finally yielding a fixed-length vector representation of the time series data.
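The following sketch shows one way to read the unidirectional convolution followed by max sampling. It assumes each kernel spans the full vector dimension and slides along the time axis only, and it omits bias terms and activation functions; the function names and kernel values are hypothetical.

```python
def conv_over_time(seq, kernel):
    """Slide a kernel along the time axis only.

    seq:    list of L vectors, each of dimension d
    kernel: list of w vectors, each of dimension d (w <= L)
    Returns a feature map of length L - w + 1.
    """
    width = len(kernel)
    feature_map = []
    for start in range(len(seq) - width + 1):
        total = 0.0
        for i in range(width):
            total += sum(x * k for x, k in zip(seq[start + i], kernel[i]))
        feature_map.append(total)
    return feature_map


def extract_features(seq, kernels):
    """Max-sample each feature map to one value: one output per kernel."""
    return [max(conv_over_time(seq, kernel)) for kernel in kernels]


# Two hypothetical width-2 kernels over 2-dimensional event vectors.
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.5], [0.5, 0.5]],
]
short_seq = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]]
long_seq = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [1.0, 1.0], [0.0, 3.0]]

f_short = extract_features(short_seq, kernels)
f_long = extract_features(long_seq, kernels)
```

Although the two sequences have different lengths, both feature vectors have one entry per kernel, which is what makes pairwise comparison possible.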
Further, step 4 specifically comprises:
Step 4.1: merge the feature vectors with the computed similarity, concatenating them into one vector;
Step 4.2: map the vector obtained in step 4.1 to a two-dimensional vector, whose first dimension represents the magnitude of the two time series having similarity 1 and whose second dimension represents the magnitude of the two time series having similarity 0;
Step 4.3: compute the similarity of the two time series with Softmax;
Step 4.4: construct a loss function and train the preset convolutional neural network to obtain the trained similarity measurement model.
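Steps 4.1 to 4.3 can be sketched as a concatenation, one linear (fully connected) layer, and a softmax. The weights, bias, and input values below are made up for illustration, and `classify_pair` is a hypothetical name.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_pair(x_a, x_b, s, weights, bias):
    """Concatenate [x_a, x_b, s], apply one linear layer, then softmax."""
    joined = x_a + x_b + [s]
    logits = [
        sum(w * v for w, v in zip(row, joined)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


x_a, x_b, s = [1.0, 0.0], [0.5, 0.5], 0.8
# Hypothetical 2 x 5 weight matrix and bias of the fully connected layer.
weights = [
    [0.1, 0.2, 0.3, 0.4, 1.0],
    [0.0, 0.1, 0.0, 0.1, -1.0],
]
bias = [0.0, 0.0]
probs = classify_pair(x_a, x_b, s, weights, bias)
```

Here `probs[0]` plays the role of the "similar" score and `probs[1]` the "not similar" score, and the two sum to 1.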
Further, in step 4.4, an objective function is first constructed and the loss of each iteration is computed from it; the partial derivative of the objective function with respect to each parameter is taken, and each parameter is updated in the negative direction of its derivative to reduce the loss, so that the parameters are optimized continually;
The loss function is formally expressed as:
$L(S_1, S_2, y) = (y - M(S_1, S_2))^2$;
where $S_1$ and $S_2$ denote the input data pair.
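As a worked example of the squared loss and one update in the negative gradient direction, the sketch below replaces the network with a toy one-parameter model, prediction = w * s. This model, the learning rate, and the function names are assumptions made purely for illustration.

```python
def squared_loss(y, prediction):
    """Squared difference between the label and the model output."""
    return (y - prediction) ** 2


def gradient_step(w, s, y, learning_rate):
    """One gradient-descent update for the toy model prediction = w * s."""
    prediction = w * s
    grad = -2.0 * (y - prediction) * s  # dL/dw for L = (y - w * s)^2
    return w - learning_rate * grad


w, s, y = 0.0, 2.0, 1.0
loss_before = squared_loss(y, w * s)
w = gradient_step(w, s, y, learning_rate=0.1)
loss_after = squared_loss(y, w * s)
```

With these numbers the gradient is -4, so `w` moves from 0 to 0.4 and the loss drops from 1 to about 0.04.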
Further, the similarity in step 3 is computed as follows: a matrix M is randomly initialized, and the similarity S of the two time series is computed from their feature vectors $X_a$, $X_b$ and M.
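The text does not state how S is formed from the two feature vectors and M. A common choice, assumed here and not confirmed by the source, is the bilinear form S = X_a^T M X_b; `bilinear_similarity` is a hypothetical name.

```python
import random


def bilinear_similarity(x_a, m, x_b):
    """Compute x_a^T M x_b (an assumed form of the similarity)."""
    return sum(
        x_a[i] * m[i][j] * x_b[j]
        for i in range(len(x_a))
        for j in range(len(x_b))
    )


random.seed(0)
p = 3  # feature vector dimension
# Randomly initialized matrix; in training it would be learned.
M = [[random.uniform(-1.0, 1.0) for _ in range(p)] for _ in range(p)]
x_a = [1.0, 0.0, 2.0]
x_b = [0.5, 1.0, 0.0]
S = bilinear_similarity(x_a, M, x_b)
```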
A time series data similarity measurement system, comprising:
a time series data representation construction module, for collecting a preset number of sample time series data and, taking into account the relative relationships among the events in each sample and the relationship between each event and its occurrence time, mapping the data from a high-dimensional space to a low-dimensional space to construct a representation of each sample;
a similarity measurement network module, for performing feature extraction on the representation of each sample constructed by the time series data representation construction module to obtain a feature vector for each sample; for computing the similarity between samples from the obtained feature vectors based on a similarity matrix; and for training the preset convolutional neural network model with the feature vectors obtained by the feature vector extraction module and the similarities between samples obtained by the similarity calculation module until a preset convergence condition is met, obtaining a trained similarity measurement model;
the similarity measurement of the time series data to be measured is performed by the trained similarity measurement model.
Compared with the prior art, the invention has the following advantages:
Existing methods consider only the features of the events within a specific time period of a sequence and ignore their relationship to the occurrence time. In contrast, the present invention provides a reasonable and effective similarity measurement method for time series data that addresses their sparsity, high dimensionality, unequal dimensions, temporal ordering, and irregularity. In the method of the invention: first, for the events in all time series data, a vector representation of each event is constructed, so that distances in the vector space effectively express the relative relationships among a patient's events; second, the occurrence time of each event is mapped to a vector with the same dimension as the event vector and embedded into the event vector by vector addition; finally, the resulting event sequence representation is fed into a convolutional neural network for supervised learning, yielding a robust time series data similarity measurement model, which is then used to measure similarity. By representing time series data effectively, the invention remedies the failure of conventional methods to account for the relative relationships among events and between events and their occurrence times, and solves the problem that time series data similarity could not be measured effectively; it represents time series data more reasonably and effectively and can thereby improve the accuracy of similarity measurement. In the invention, the resulting time series representation is dense and low-dimensional, which makes computation efficient; it embeds temporal information, which makes the representation more reasonable; and on the basis of this representation, a convolutional neural network extracts features, computes similarity, and is trained end to end with supervision, achieving a reasonable representation and efficient feature extraction at the same time, so that measurement accuracy can be improved.
Further: 1) the sparse time series data matrix is converted into dense event vectors, achieving non-sparsity; 2) the high-dimensional event representation is mapped by word2vec into a low-dimensional vector space, achieving low dimensionality; 3) the final event sequence representation fuses the relative relationships among events with the relationships between events and their occurrence times.
Further, unlike image analysis, where convolution is applied along both directions of the matrix, convolution on time series data is meaningful only along the time direction, so the convolution of the convolutional layer is unidirectional.
Further, for the extracted feature vectors, the similarity between two feature vectors is computed based on a similarity matrix. Because swapping the positions of the two data items should leave the similarity unchanged, the similarity matrix is constrained to be symmetric.
Brief description of the drawings
Fig. 1 is a schematic flow diagram of a time series data similarity measurement method of the present invention;
Fig. 2 is a schematic diagram of the dimension reduction of the event sequence matrix in a time series data similarity measurement method of the present invention;
Fig. 3 is a schematic diagram of the convolutional neural network structure in a time series data similarity measurement method of the present invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, a time series data similarity measurement method of the present invention comprises the following steps:
Step 1: construct an effective representation of the time series data. This requires making the sparse time series data dense, taking into account the relative relationships among the events in the sequence and the relationship between each event and its occurrence time, and mapping the data from a high-dimensional space to a low-dimensional space.
Step 1 specifically comprises the following steps:
Step 1.1: the time series data matrix is too sparse, and analyzing it directly would incur a huge computational cost. The sparse matrix is therefore made dense and the high-dimensional matrix is reduced in dimension. Referring to Fig. 2, each time series data matrix is converted into an event sequence, with events ordered by their relative time; events that occur at the same time are ordered arbitrarily;
Step 1.2: map each event to a fixed-length vector using word2vec, obtaining for each event in the sequence a vector representation that encodes its relative relationships to other events;
Step 1.3: map the occurrence time of each event, using word2vec, to a vector with the same dimension as the event vector, obtaining a vector representation of the event occurrence time;
Step 1.4: embed the vector representation of each event's occurrence time into the corresponding event representation by vector addition.
The time series data representation method of the invention has the following characteristics: 1) the sparse time series data matrix is converted into dense event vectors (non-sparsity); 2) the high-dimensional event representation is mapped by word2vec into a low-dimensional vector space (low dimensionality); 3) the final event sequence representation fuses the relative relationships among events with the relationships between events and their occurrence times.
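The conversion in step 1.1 can be sketched as follows. The layout of the sparse matrix is an assumption (rows are relative time steps, columns are event types, entries are 0 or 1), and `matrix_to_sequence` is a hypothetical name. Events that share a time step are emitted in column order, which is one of the arbitrary orders the text allows.

```python
def matrix_to_sequence(matrix, event_names):
    """Turn a sparse time-by-event 0/1 matrix into an ordered event sequence.

    Rows are relative time steps; columns are event types.
    Returns a list of (event_name, relative_time) pairs ordered by time.
    """
    sequence = []
    for t, row in enumerate(matrix):
        for col, flag in enumerate(row):
            if flag:
                sequence.append((event_names[col], t))
    return sequence


event_names = ["A", "B", "C", "D"]
matrix = [
    [1, 0, 0, 0],  # time 0: event A
    [0, 0, 0, 0],  # time 1: nothing happens
    [0, 1, 0, 1],  # time 2: events B and D
]
seq = matrix_to_sequence(matrix, event_names)
```

The 12-entry matrix collapses to a 3-element sequence, which is the sense in which the representation becomes dense.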
Step 2: effectively extract the features of the time series data.
Features must be extracted from the time series representation in order to measure similarity effectively. In addition, because each time series contains a different number of events, the event sequence representations obtained in the previous step contain different numbers of vectors. To make it convenient to compute the similarity of a data pair, sequence representations of different lengths must be mapped to equal-length feature representations. The method uses an improved convolutional neural network to extract equal-length feature representations of the time series data.
Referring to Fig. 3, the convolutional neural network structure of the invention comprises:
1) Convolutional layer: unlike image analysis, where convolution is applied along both directions of the matrix, convolution on time series data is meaningful only along the time direction, so the convolution of the convolutional layer is unidirectional. This layer uses multiple kernel functions as filters to extract different features, producing multiple feature maps.
2) Sampling layer: simple max sampling is applied; each feature map is reduced to a single value, finally yielding a fixed-length vector representation of the time series data.
Step 3: compute the similarity and train the network.
For the feature vectors extracted in the previous step, the similarity between two feature vectors is computed based on a similarity matrix. Because swapping the positions of the two data items should leave the similarity unchanged, the similarity matrix is constrained to be symmetric. The loss is computed from the calculated similarity, and the network is trained.
Step 3 specifically comprises the following steps:
Step 3.1: compute the similarity between time series data based on a similarity matrix; a matrix M is randomly initialized, and the similarity S is computed from M and the two feature vectors $X_a$, $X_b$ obtained in the previous step.
Step 3.2: merge the feature vectors with the computed similarity; the two feature vectors $X_a$, $X_b$ and the similarity S computed in step 3.1 are concatenated into one vector.
Step 3.3: output the classification through a fully connected layer; the vector obtained in step 3.2 is mapped to a two-dimensional vector, whose first dimension represents the magnitude of the two data items having similarity 1 and whose second dimension represents the magnitude of their having similarity 0.
Step 3.4: compute the similarity of the two data items with Softmax.
Step 3.5: construct the loss function and train the network.
An objective function is first constructed and the loss of each iteration is computed from it; the partial derivative of the objective function with respect to each parameter is taken, and each parameter is updated in the negative direction of its derivative (gradient) to reduce the loss, so that the parameters are optimized continually. The loss function can be formally expressed as:
$L(S_1, S_2, y) = (y - M(S_1, S_2))^2$;
where $S_1$ and $S_2$ denote the input data pair.
The time-embedding-based data similarity measurement method of the invention overcomes the shortcomings of existing representations, such as ignoring temporal information, high dimensionality, and sparsity; it obtains an effective representation of time series data and computes similarity reasonably and accurately. The time series representation obtained by the invention is dense and low-dimensional, so computation is efficient; it embeds temporal information, so the representation is more reasonable; and accuracy is high: on the basis of this representation, a convolutional neural network extracts features, computes similarity, and is trained end to end with supervision, achieving a reasonable representation and efficient feature extraction at the same time and thereby improving measurement accuracy.
A time series data similarity measurement system, comprising:
a time series data representation construction module, for collecting a preset number of sample time series data and, taking into account the relative relationships among the events in each sample and the relationship between each event and its occurrence time, mapping the data from a high-dimensional space to a low-dimensional space to construct a representation of each sample;
a feature vector extraction module, for performing feature extraction on the representation of each sample constructed by the time series data representation construction module, obtaining a feature vector for each sample;
a similarity calculation module, for computing the similarity between samples, based on a similarity matrix, from the feature vectors obtained by the feature vector extraction module;
a similarity measurement network module, for training the preset convolutional neural network model with the feature vectors obtained by the feature vector extraction module and the similarities between samples obtained by the similarity calculation module until a preset convergence condition is met, obtaining a trained similarity measurement model;
an input/output module, for constructing the representation of the time series data to be measured, extracting its feature vector, inputting it into the similarity measurement network module, and outputting the similarity measurement result for the time series data to be measured.
Embodiment
Referring to Fig. 1, a time series data similarity measurement method of an embodiment of the present invention is applied to similarity measurement of electronic medical records and comprises the following steps:
S101. Construct an effective representation of the medical event sequence in an electronic medical record.
Step1. The electronic medical record (EMR) matrix is too sparse, so the first task is to make the sparse matrix dense and reduce the dimension of the high-dimensional matrix. Referring to Fig. 2, each EMR matrix is converted into an event sequence, with events ordered by their relative time; events that occur on the same day are ordered arbitrarily. This finally yields a vector H;
Step2. Each medical event in the EMR is mapped to a fixed-length vector using word2vec, in order to capture the relative relationships among the medical events in the record. word2vec is an effective model, drawing on the ideas of deep learning, that represents words as vectors. If words are regarded as features, word2vec can be understood as mapping features into a K-dimensional vector space: by mapping each word to a K-dimensional vector, text processing is reduced to vector operations in that space, and the similarity of vectors can be used to express the semantic similarity of text. Accordingly, the medical event sequence in each EMR is treated as a sentence and each event in the sequence as a word. After word2vec, each event is mapped to a fixed-length vector whose length is a parameter, denoted $dim_v$. Mapping each event of the vector H from Step1 to a vector yields a sequence matrix K;
Step3. The occurrence time of each medical event in the EMR is mapped, using word2vec, to a vector of the same length as the medical event vector, giving a vector representation of the medical event occurrence time and finally a matrix T;
Step4. The vector representation of the occurrence time of each medical event in the EMR medical event sequence is embedded into the corresponding medical event representation by vector addition, which can be formalized as:
E = K + T
where E is the final time-embedded medical event sequence representation matrix, K is the sequence matrix generated in Step2, and T is the time sequence matrix generated in Step3.
Specifically, the above medical event sequence representation method has the following characteristics: 1) the sparse medical event matrix in the EMR is converted into dense medical event vectors, giving non-sparsity; 2) the high-dimensional event representation is mapped by word2vec into a low-dimensional vector space, giving low dimensionality; 3) the final patient event sequence representation fuses the relative relationships among medical events with the relationships between medical events and their occurrence times.
S102. Effectively extract the feature representation of the medical event sequence in the EMR.
Features must be extracted from the representation of a patient's medical event sequence in order to measure similarity effectively. In addition, because each EMR contains a different number of medical events, the resulting medical event sequence representations contain different numbers of vectors. To make it convenient to compute the similarity of a patient pair, patient event sequence representations of different lengths must be mapped to equal-length feature representations. The method of the invention uses an improved convolutional neural network to extract equal-length patient feature representations.
The specific network structure is as follows:
1) Convolutional layer: unlike image analysis, where convolution is applied along both directions of the matrix, convolution on EMR data is meaningful only along the time direction, so the convolution of the convolutional layer is unidirectional. This layer uses multiple kernel functions as filters to extract different features, producing multiple feature maps;
As shown in Fig. 2, the input of the convolutional layer is two EMR sequence representation matrices of size $dim_v \times L$. Because what the invention ultimately trains is the similarity of two EMRs, the input is the information of two EMRs. After convolution with c convolution kernels, c vectors of identical size are generated, i.e. multiple feature maps. The convolution is applied to both EMRs with the same convolution parameters. For any two EMRs A and B, sim(A, B) and sim(B, A) should be equal, so the positions of A and B are symmetric and identical convolutional layer parameters should be used. Likewise, the subsequent similarity matching layer uses a symmetric similarity matrix. Together, these two points guarantee that computing sim(A, B) and sim(B, A) gives equal values.
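The effect of the symmetry constraint can be checked numerically. The sketch below assumes the similarity is the bilinear form x_a^T M x_b and enforces symmetry by averaging M with its transpose; both choices are illustrative assumptions, since the text only states that the matrix must be symmetric.

```python
def symmetrize(m):
    """Average a square matrix with its transpose so that m[i][j] == m[j][i]."""
    n = len(m)
    return [[(m[i][j] + m[j][i]) / 2.0 for j in range(n)] for i in range(n)]


def sim(x_a, x_b, m):
    """Assumed bilinear similarity x_a^T M x_b."""
    return sum(
        x_a[i] * m[i][j] * x_b[j]
        for i in range(len(x_a))
        for j in range(len(x_b))
    )


M_raw = [[1.0, 2.0], [0.0, 3.0]]  # not symmetric
M_sym = symmetrize(M_raw)         # [[1.0, 1.0], [1.0, 3.0]]
a, b = [1.0, 2.0], [3.0, 1.0]

asymmetric_gap = sim(a, b, M_raw) - sim(b, a, M_raw)
symmetric_gap = sim(a, b, M_sym) - sim(b, a, M_sym)
```

With the raw matrix the two orderings disagree, while the symmetrized matrix gives the same value for sim(A, B) and sim(B, A).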
After the convolutional layer, two matrices of size $c \times dim_m$ are obtained. These would originally be c × (T+8) matrices; for convenience of computation, the EMR matrices of patients with fewer than L events are padded with zeros at the end when input to the convolutional layer. Consequently, part of the tail of each resulting $c \times dim_m$ matrix may still be 0. Although the dimensions are fixed, the matrices are in effect still of unequal dimensions, and further processing is needed before similarity analysis; this is the role of the next layer, the sampling layer.
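The zero padding described above can be sketched as follows, assuming each EMR is stored as a non-empty list of event vectors (one row per event); `pad_sequence` and the sample values are hypothetical.

```python
def pad_sequence(rows, target_len):
    """Append zero vectors until the sequence has target_len rows.

    Assumes rows is non-empty and every row has the same dimension.
    """
    if len(rows) > target_len:
        raise ValueError("sequence is longer than the target length")
    dim = len(rows[0])
    padding = [[0.0] * dim for _ in range(target_len - len(rows))]
    return rows + padding


emr = [[0.2, 0.1], [0.4, 0.3]]  # an EMR with only two events
padded = pad_sequence(emr, target_len=4)
```

The padded rows contribute zeros to the tail of the feature maps, which is why a sampling step is still needed afterwards.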
2) Sampling layer: simple max sampling is applied, reducing each feature map to a single value, so that each EMR is represented by a fixed-length vector. Unlike average sampling, taking the maximum of the sampling region finds the point in the region that best expresses the feature. After the sampling layer, each EMR yields a fixed-length vector, here of dimension p, that represents the features of that EMR.
S103. Compute the similarity and train the network.
For the EMR feature vectors extracted in the previous step, the similarity between two feature vectors is computed based on a similarity matrix. Because swapping the positions of the two patients should leave the similarity unchanged, the similarity matrix is constrained to be symmetric. The loss is computed from the calculated similarity, and the network is trained.
This specifically comprises the following steps:
Step1. Compute the patient similarity based on a similarity matrix;
A matrix M is randomly initialized, and the similarity S is computed from M and the two feature vectors $X_a$, $X_b$ obtained in the previous step.
Step2. Merge the feature vectors with the computed similarity;
The two feature vectors $X_a$, $X_b$ and the similarity S computed in Step1 are concatenated into one vector.
Step3. Output the classification through a fully connected layer;
The vector obtained in Step2 is mapped to a two-dimensional vector, whose first dimension represents the magnitude of the two patients belonging to the same cohort and whose second dimension represents the magnitude of the two patients belonging to different cohorts.
Step4. Compute, with Softmax, the probability that the two original data items belong to the same cohort.
Step5. Construct the loss function and train the network;
An objective function is first constructed and the loss of each iteration is computed from it; the partial derivative of the objective function with respect to each parameter is taken, and each parameter is updated in the negative direction of its derivative (gradient) to reduce the loss, so that the parameters are optimized continually. The loss function can be formally expressed as:
$L(S_1, S_2, y) = (y - M(S_1, S_2))^2$.
When the network parameters converge, training stops and the final EMR similarity measurement model is obtained.
In summary, the method of the invention is a time series data similarity measurement method based on a time-embedded representation, which mainly addresses the difficulty of measuring similarity between sequences effectively and reasonably in large volumes of heterogeneous, high-dimensional time series. It comprises the following steps: first, an effective representation of the time series data is constructed by mapping the high-dimensional, sparse temporal event sequence to a low-dimensional space, obtaining a low-dimensional vector representation of each event with word2vec, and embedding the temporal information. Second, a customized convolutional neural network is built to extract effective features of the time series data, giving fixed-length feature representations of time series of unequal length. Finally, the fixed-length representations are used to compute similarity, the objective function is computed, and the network is trained to obtain the final time series data similarity measurement model. Existing methods consider only the features of the events within a specific time period of a sequence and ignore their relationship to the occurrence time. In contrast, the invention discloses a time series similarity measurement method based on a time-embedded representation: the event vector and time vector of each event are constructed with word2vec, the time vector is embedded into the event vector, and a convolutional neural network is trained by supervised learning, finally yielding a robust patient similarity measurement model. The method remedies the failure of existing methods to account for the relative relationships among events and between events and their occurrence times, and represents time series data more reasonably and effectively, thereby improving the accuracy of similarity measurement.
Those skilled in the art will readily understand that the foregoing is merely an embodiment of the method of the invention and is not intended to limit the invention; any modification, equivalent substitution, or improvement made within the spirit and principles of the invention shall fall within the protection scope of the invention.
Claims (10)
1. A time series data similarity measurement method, characterized by comprising the following steps:
Step 1: collect a preset number of sample time series data; taking into account the relative relationships among the events in each sample and the relationship between each event and its occurrence time, map the data from a high-dimensional space to a low-dimensional space and construct a representation of each sample;
Step 2: input the representations of all samples obtained in step 1 into a preset convolutional neural network model, perform feature extraction on each representation, and obtain a feature vector for each sample;
Step 3: from the feature vectors obtained in step 2, compute the similarity between samples based on a similarity matrix;
Step 4: use the feature vectors obtained in step 2 and the similarities obtained in step 3 to train the preset convolutional neural network model until a preset convergence condition is met, obtaining a trained similarity measurement model;
Step 5: construct the representation of the time series data to be measured by the method of step 1 and input it into the trained similarity measurement model obtained in step 4 to obtain the similarity measurement result for the time series data to be measured.
2. The time series data similarity measurement method according to claim 1, characterized in that step 1 specifically comprises:
Step 1.1, converting each sample time series data matrix into an event sequence, ordering the events by their relative time, with events that occur at the same time placed in arbitrary order;
Step 1.2, mapping each event to a fixed-length vector using word2vec, obtaining for each event in the event sequence a vector representation that contains relative relationship information;
Step 1.3, mapping the time at which each event in the event sequence occurs to a vector of the same dimension as the event vector using word2vec, obtaining a vector representation of the event occurrence time;
Step 1.4, embedding the vector representation of each event's occurrence time into the corresponding event representation by vector addition, obtaining the representation of the sample time series data.
3. The time series data similarity measurement method according to claim 1, characterized in that, in step 2, feature vectors of equal length are extracted from the sample time series data by a convolutional neural network;
the convolutional neural network structure used comprises:
a convolutional layer, for receiving the input data and outputting feature maps;
a sampling layer, for receiving the feature maps output by the convolutional layer and outputting a fixed-length feature vector of the time series data.
4. The time series data similarity measurement method according to claim 3, characterized in that, in the convolutional neural network structure used:
the convolution in the convolutional layer is unidirectional;
in the convolutional layer, max sampling is applied, reducing each feature map to a single value, so that a fixed-length vector representation of the time series data is finally obtained.
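The structure in claims 3 and 4 can be illustrated with a small sketch. It assumes that "unidirectional" means the filter slides along the time axis only and spans the full embedding width, and that max sampling is global max pooling over each feature map. The ReLU activation, the filter values, the window width, and the function names are hypothetical.

```python
def conv_time_axis(seq, kernel, bias=0.0):
    """Slide one filter along the time axis only.

    seq:    list of T vectors, each of length D
    kernel: list of W vectors, each of length D (spans the full width D)
    Returns a feature map with T - W + 1 values, passed through ReLU.
    """
    width = len(kernel)
    feature_map = []
    for start in range(len(seq) - width + 1):
        total = bias
        for offset in range(width):
            row, k_row = seq[start + offset], kernel[offset]
            total += sum(x * w for x, w in zip(row, k_row))
        feature_map.append(max(0.0, total))  # ReLU, an assumed activation
    return feature_map

def extract_features(seq, kernels):
    """Max sampling: reduce every feature map to its single largest value.

    Assumes each sequence has at least as many events as the filter width.
    """
    return [max(conv_time_axis(seq, k)) for k in kernels]

kernels = [
    [[1.0, 0.0], [0.0, 1.0]],    # hypothetical filter 1, window of 2 events
    [[0.5, 0.5], [-0.5, -0.5]],  # hypothetical filter 2
]
short_seq = [[1.0, 2.0], [3.0, 4.0], [0.0, 1.0]]
long_seq = short_seq + [[2.0, 2.0], [1.0, 0.0]]

# Sequences of different lengths yield vectors of the same length,
# one entry per filter.
print(extract_features(short_seq, kernels))
print(extract_features(long_seq, kernels))
```

The output length equals the number of filters, which is what makes the feature vectors of records with different numbers of events directly comparable.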
5. The time series data similarity measurement method according to claim 1, characterized in that the similarity matrix used in step 3 is symmetric.
6. The time series data similarity measurement method according to claim 1, characterized in that step 4 specifically comprises:
Step 4.1, merging the feature vectors with the computed similarity by concatenating them into one vector;
Step 4.2, mapping the vector obtained in step 4.1 to a two-dimensional vector, in which the first dimension represents the magnitude for the similarity of the two time series being 1, and the second dimension represents the magnitude for the similarity of the two time series being 0;
Step 4.3, computing the similarity of the two time series by Softmax;
Step 4.4, constructing a loss function and training the preset convolutional neural network, obtaining the trained similarity measurement model.
7. The time series data similarity measurement method according to claim 6, characterized in that, in step 4.4, an objective function is first constructed and the loss of each iteration is computed from it; the partial derivative of the objective function with respect to each parameter is taken, and each parameter is updated in the direction opposite to its derivative to reduce the loss, so that the parameters are continually optimized;
the loss function is formally expressed as:
$L(S_1, S_2, y) = (y - M(S_1, S_2))^2$;
where $S_1$ and $S_2$ denote the input data pair.
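The update rule in claim 7 is ordinary gradient descent on a squared error. The sketch below applies it to a deliberately tiny stand-in model with one parameter, `m = w * x`, rather than to the convolutional network. The model, the learning rate, and the data values are assumptions chosen only to make the arithmetic visible.

```python
def loss(y, m):
    """Squared error between the label y and the model output m."""
    return (y - m) ** 2

def grad_w(y, w, x):
    """dL/dw for the stand-in model m = w * x, i.e. -2 * (y - w*x) * x."""
    return -2.0 * (y - w * x) * x

def train(x, y, w=0.0, lr=0.1, steps=20):
    """Move w against its derivative at every iteration."""
    history = []
    for _ in range(steps):
        history.append(loss(y, w * x))
        w -= lr * grad_w(y, w, x)  # step in the negative gradient direction
    return w, history

w_final, history = train(x=1.0, y=1.0)
print(history[0], history[-1])  # loss at the first and last recorded iteration
```

With these values the error shrinks by a constant factor at each step, so the recorded loss decreases monotonically.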
8. The time series data similarity measurement method according to claim 1, characterized in that the similarity in step 3 is computed as follows: a matrix $M$ is randomly initialized, and the similarity $S$ of two time series is computed from their feature vectors $X_a$ and $X_b$ together with $M$.
9. The time series data similarity measurement method according to claim 8, characterized in that step 4 specifically comprises:
Step 4.1, concatenating the feature vectors $X_a$ and $X_b$ of the two time series with the similarity $S$ computed in step 3 into one vector;
Step 4.2, mapping the vector obtained in step 4.1 to a two-dimensional vector, in which the first dimension represents the magnitude for the similarity of the two time series being 1, and the second dimension represents the magnitude for the similarity of the two time series being 0;
Step 4.3, computing the similarity of the two time series by Softmax;
Step 4.4, constructing a loss function and training the preset convolutional neural network, obtaining the trained similarity measurement model.
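Claims 8 and 9 can be illustrated as below. The claims state only that the similarity is computed from the two feature vectors and the matrix M; the bilinear form S = X_a^T M X_b used here is an assumption, as are the concatenation order, the weight matrix `w`, the bias `b`, and all function names. M is symmetrised so that S does not depend on the order of the pair, which is consistent with claim 5.

```python
import math
import random

def symmetric_matrix(dim, rng):
    """Randomly initialise M, then symmetrise it: M = (A + A^T) / 2."""
    a = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(dim)]
    return [[(a[i][j] + a[j][i]) / 2.0 for j in range(dim)] for i in range(dim)]

def bilinear_similarity(xa, m, xb):
    """Assumed form of the score: S = xa^T M xb."""
    return sum(xa[i] * m[i][j] * xb[j]
               for i in range(len(xa)) for j in range(len(xb)))

def softmax(z):
    """Numerically stable softmax."""
    peak = max(z)
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_head(xa, xb, m, w, b):
    """Steps 4.1 to 4.3: concatenate, map to two dimensions, apply softmax."""
    s = bilinear_similarity(xa, m, xb)
    joined = xa + [s] + xb  # step 4.1 (assumed order)
    logits = [sum(wi * v for wi, v in zip(row, joined)) + bi  # step 4.2
              for row, bi in zip(w, b)]
    return softmax(logits)  # step 4.3

rng = random.Random(0)
xa, xb = [0.2, 0.7, 0.1], [0.3, 0.6, 0.2]  # hypothetical feature vectors
m = symmetric_matrix(len(xa), rng)
w = [[rng.uniform(-1.0, 1.0) for _ in range(2 * len(xa) + 1)] for _ in range(2)]
b = [0.0, 0.0]

p_similar, p_dissimilar = similarity_head(xa, xb, m, w, b)
print(p_similar, p_dissimilar)  # the two entries sum to 1
```

In training, `m`, `w`, and `b` would be learned together with the convolution filters; here they are left at their random initial values.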
10. A time series data similarity measurement system, characterized by comprising:
a time series data representation construction module, for acquiring a preset number of sample time series data and, taking into account the relative relationships among the events in each sample time series and between each event and its time of occurrence, mapping the data from a high-dimensional space to a low-dimensional space to construct a representation of each sample time series;
a similarity measurement network module, for performing feature extraction on the representations constructed by the time series data representation construction module to obtain a feature vector for each sample time series; for computing, based on a similarity matrix, the similarity between the sample time series from the obtained feature vectors; and for training the preset convolutional neural network model with the feature vectors obtained by the feature vector extraction module and the similarities obtained by the similarity calculation module until a preset convergence condition is met, obtaining a trained similarity measurement model;
wherein the similarity measurement of the time series data to be measured is completed by the trained similarity measurement model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910067744.8A CN109948646A (en) | 2019-01-24 | 2019-01-24 | A kind of time series data method for measuring similarity and gauging system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109948646A (en) | 2019-06-28 |
Family
ID=67007417
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910067744.8A Pending CN109948646A (en) | 2019-01-24 | 2019-01-24 | A kind of time series data method for measuring similarity and gauging system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109948646A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110766060A (en) * | 2019-10-14 | 2020-02-07 | 中山大学 | Time series similarity calculation method, system and medium based on deep learning |
CN111309900A (en) * | 2020-01-17 | 2020-06-19 | 中国科学技术大学 | Legal class similarity judging and pushing method |
CN112115184A (en) * | 2020-09-18 | 2020-12-22 | 平安科技(深圳)有限公司 | Time series data detection method and device, computer equipment and storage medium |
CN112966808A (en) * | 2021-01-25 | 2021-06-15 | 咪咕音乐有限公司 | Data analysis method, device, server and readable storage medium |
CN113239990A (en) * | 2021-04-27 | 2021-08-10 | 中国银联股份有限公司 | Method and device for performing feature processing on sequence data and storage medium |
CN113377909A (en) * | 2021-06-09 | 2021-09-10 | 平安科技(深圳)有限公司 | Paraphrase analysis model training method and device, terminal equipment and storage medium |
CN114528334A (en) * | 2022-02-18 | 2022-05-24 | 重庆伏特猫科技有限公司 | Rapid similarity searching method in time sequence database |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2016187711A1 (en) * | 2015-05-22 | 2016-12-01 | Csts Health Care Inc. | Biomarker-driven molecularly targeted combination therapies based on knowledge representation pathway analysis |
CN106446081A (en) * | 2016-09-09 | 2017-02-22 | 西安交通大学 | Method for mining association relationship of time series data based on change consistency |
CN107239445A (en) * | 2017-05-27 | 2017-10-10 | 中国矿业大学 | The method and system that a kind of media event based on neutral net is extracted |
Non-Patent Citations (2)
Title |
---|
ZIHAO ZHU ET AL.: "Measuring Patient Similarities via a Deep Architecture with Medical Concept Embedding", 《2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM)》 * |
贾峥等: "基于电子病历的患者相似性分析综述", 《中国生物医学工程学报》 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110766060A (en) * | 2019-10-14 | 2020-02-07 | 中山大学 | Time series similarity calculation method, system and medium based on deep learning |
CN111309900A (en) * | 2020-01-17 | 2020-06-19 | 中国科学技术大学 | Legal class similarity judging and pushing method |
CN111309900B (en) * | 2020-01-17 | 2022-09-06 | 中国科学技术大学 | Legal class similarity judging and pushing method |
CN112115184A (en) * | 2020-09-18 | 2020-12-22 | 平安科技(深圳)有限公司 | Time series data detection method and device, computer equipment and storage medium |
CN112966808A (en) * | 2021-01-25 | 2021-06-15 | 咪咕音乐有限公司 | Data analysis method, device, server and readable storage medium |
CN113239990A (en) * | 2021-04-27 | 2021-08-10 | 中国银联股份有限公司 | Method and device for performing feature processing on sequence data and storage medium |
CN113377909A (en) * | 2021-06-09 | 2021-09-10 | 平安科技(深圳)有限公司 | Paraphrase analysis model training method and device, terminal equipment and storage medium |
CN113377909B (en) * | 2021-06-09 | 2023-07-11 | 平安科技(深圳)有限公司 | Paraphrasing analysis model training method and device, terminal equipment and storage medium |
CN114528334A (en) * | 2022-02-18 | 2022-05-24 | 重庆伏特猫科技有限公司 | Rapid similarity searching method in time sequence database |
CN114528334B (en) * | 2022-02-18 | 2022-10-18 | 重庆伏特猫科技有限公司 | Rapid similarity searching method in time sequence database |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109948646A (en) | A kind of time series data method for measuring similarity and gauging system | |
Gadosey et al. | SD-UNET: Stripping down U-net for segmentation of biomedical images on platforms with low computational budgets | |
Li et al. | Comparison of feature learning methods for human activity recognition using wearable sensors | |
Yang et al. | EANet: Edge-aware network for the extraction of buildings from aerial images | |
Li et al. | Model compression for deep neural networks: A survey | |
Huang et al. | City3D: Large-scale building reconstruction from airborne LiDAR point clouds | |
Zhang et al. | Overview of multi-modal brain tumor mr image segmentation | |
CN109471895A (en) | The extraction of electronic health record phenotype, phenotype name authority method and system | |
CN106909537B (en) | One-word polysemous analysis method based on topic model and vector space | |
CN109325513A (en) | A kind of image classification network training method based on magnanimity list class single image | |
Luo et al. | Pine cone detection using boundary equilibrium generative adversarial networks and improved YOLOv3 model | |
Lyu et al. | Cirrus detection based on RPCA and fractal dictionary learning in infrared imagery | |
Ma et al. | An improved ball pivot algorithm-based ground filtering mechanism for LiDAR data | |
CN205015889U (en) | Definite system of traditional chinese medical science lingual diagnosis model based on convolution neuroid | |
Wang et al. | A deformable convolutional neural network with spatial-channel attention for remote sensing scene classification | |
Xu et al. | A combination of lie group machine learning and deep learning for remote sensing scene classification using multi-layer heterogeneous feature extraction and fusion | |
Chai et al. | Compact cloud detection with bidirectional self-attention knowledge distillation | |
Mou et al. | YOLO-FR: A YOLOv5 infrared small target detection algorithm based on feature reassembly sampling method | |
Li et al. | Method of building detection in optical remote sensing images based on segformer | |
Qin et al. | PointSkelCNN: Deep Learning‐Based 3D Human Skeleton Extraction from Point Clouds | |
Zhang et al. | Global random graph convolution network for hyperspectral image classification | |
Perko et al. | Critical aspects of person counting and density estimation | |
Wang et al. | SCA-Net: multiscale contextual information network for building extraction based on high-resolution remote sensing images | |
Chen et al. | A novel deep nearest neighbor neural network for few-shot remote sensing image scene classification | |
Zhang et al. | SaltISNet3D: Interactive salt segmentation from 3D seismic images using deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190628 |