CN109948646A - Time series data similarity measurement method and measurement system - Google Patents

Time series data similarity measurement method and measurement system

Info

Publication number
CN109948646A
Authority
CN
China
Prior art keywords
series data
time series
vector
similarity
event
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910067744.8A
Other languages
Chinese (zh)
Inventor
钱步月
张先礼
陆亮
王谞动
刘小彤
李扬
卫荣
郑庆华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Jiaotong University
Priority to CN201910067744.8A
Publication of CN109948646A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a time series data similarity measurement method and measurement system. The method comprises the following steps: first, for the events in all time series data, a vector representation of each event is learned; second, the occurrence time of each event is mapped to a vector with the same dimension as the event vector and embedded into the event vector by vector addition; finally, the resulting event sequence representation is fed into a convolutional neural network for supervised learning, yielding a robust time series data similarity measurement model, and similarity is measured with the obtained model. The invention represents time series data more reasonably and effectively, and can thereby improve the accuracy of time series data similarity measurement.

Description

Time series data similarity measurement method and measurement system
Technical field
The invention belongs to the technical field of time series data similarity, and in particular relates to a time series data similarity measurement method and measurement system.
Background
Data similarity measurement is a fundamental problem in data science. It bears on many application fields, such as natural language processing, data retrieval and cohort analysis. Real-world scenarios contain large amounts of time series data, which are typically temporal, high-dimensional, heterogeneous, sparse, of unequal dimensions and irregular.
At present, sequence representations based on one-hot vectors are commonly used. Because such representations are sparse and high-dimensional, they seriously reduce the efficiency and accuracy of similarity computation. In addition, existing methods usually aggregate the events of a sequence within a specific time period, ignoring both the relative relationships among the events in the sequence and the relative relationship between each event and its occurrence time, which leads to a loss of temporal information. In real-world scenarios, most events in a sequence change over time, and the correlations among events change as well, so temporal information is very important for representing event sequences.
In summary, a new time series data similarity measurement method is needed.
Summary of the invention
The purpose of the present invention is to provide a time series data similarity measurement method and measurement system that solve the above problems. By representing time series data effectively, the invention remedies the defect that conventional methods ignore the relative relationships among events and between events and their occurrence times, and can measure time series data similarity effectively.
To achieve the above purpose, the invention adopts the following technical solution:
A time series data similarity measurement method, comprising the following steps:
Step 1: collect a preset number of sample time series data; taking into account the relative relationships among the events in each sample and the relative relationship between each event and its occurrence time, map the data from the high-dimensional space to a low-dimensional space and construct a representation of each sample time series data;
Step 2: input the representations of all sample time series data obtained in step 1 into a preset convolutional neural network model, perform feature extraction on the representation of each sample, and obtain the feature vector of each sample time series data;
Step 3: from the feature vectors obtained in step 2, compute the similarity between sample time series data based on a similarity matrix;
Step 4: train the preset convolutional neural network model using the feature vectors obtained in step 2 and the similarities between sample time series data obtained in step 3, until a preset convergence condition is reached, obtaining a trained similarity measurement model;
Step 5: construct the representation of the time series data to be measured by the method of step 1 and input it into the trained similarity measurement model obtained in step 4 to obtain the similarity measurement result for the time series data to be measured.
Further, step 1 specifically includes:
Step 1.1: convert each sample time series data matrix into an event sequence, ordering the events by their relative time; events that occur at the same time are ordered arbitrarily;
Step 1.2: map each event to a fixed-length vector using word2vec, obtaining for each event in the event sequence a vector representation that contains relative relationship information;
Step 1.3: using word2vec, map the occurrence time of each event in the event sequence to a vector with the same dimension as the event vector, obtaining a vector representation of the event occurrence time;
Step 1.4: embed the vector representation of each event occurrence time into the corresponding event representation by vector addition, obtaining the representation of the sample time series data.
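As an illustration of step 1.1 above, the following minimal sketch flattens a sparse time-by-event matrix into an event sequence ordered by relative time. The function name `matrix_to_event_sequence` and the layout of the matrix (rows as time steps, columns as event types) are assumptions made for this example; the patent does not specify a data layout.

```python
from typing import List, Tuple


def matrix_to_event_sequence(matrix: List[List[int]]) -> List[Tuple[int, int]]:
    """Flatten a sparse time-by-event matrix into (time_index, event_id) pairs.

    Assumed layout: rows are time steps in chronological order, columns are
    event types, and a non-zero cell means the event occurred at that time.
    """
    sequence = []
    for t, row in enumerate(matrix):
        for event_id, flag in enumerate(row):
            if flag:
                # Events sharing a time step are emitted in column order,
                # which is one arbitrary but deterministic choice.
                sequence.append((t, event_id))
    return sequence


# Hypothetical record: 3 time steps, 4 event types, mostly zeros.
record = [
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [1, 0, 0, 1],
]
print(matrix_to_event_sequence(record))  # [(0, 1), (2, 0), (2, 3)]
```

The empty time step disappears from the output, which is how the dense sequence avoids storing the zeros of the original matrix.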
Further, in step 2, equal-length feature vectors of the sample time series data are extracted by the convolutional neural network;
The convolutional neural network structure used includes:
a convolutional layer, for receiving the input data and outputting feature maps;
a sampling layer, for receiving the feature maps output by the convolutional layer and outputting a fixed-length feature vector of the time series data.
Further, in the convolutional neural network structure used:
The convolution of the convolutional layer is one-directional.
In the sampling layer, max sampling is used: each feature map is sampled down to a single value, finally yielding a fixed-length vector representation of the time series data.
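The one-directional convolution followed by max sampling can be sketched as below. This is a NumPy illustration under stated assumptions, not the patent's implementation: the function name `temporal_conv_max_pool`, the kernel width and the array shapes are all chosen for the example, and each kernel is assumed to span every embedding dimension so that it slides along time only.

```python
import numpy as np


def temporal_conv_max_pool(E: np.ndarray, kernels: np.ndarray) -> np.ndarray:
    """Convolve along time only, then max-sample each feature map.

    E:       (dim_v, L) sequence representation, one column per event.
    kernels: (c, dim_v, w) filters covering all dim_v rows and w time steps.
    Returns a vector of length c, independent of the sequence length L.
    """
    c, dim_v, w = kernels.shape
    n_positions = E.shape[1] - w + 1  # requires L >= w
    feature_maps = np.empty((c, n_positions))
    for j in range(n_positions):
        window = E[:, j:j + w]
        # Inner product of every kernel with the current time window.
        feature_maps[:, j] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return feature_maps.max(axis=1)


rng = np.random.default_rng(0)
kernels = rng.normal(size=(4, 8, 3))   # c = 4 filters, dim_v = 8, width 3
short_seq = rng.normal(size=(8, 5))    # 5 events
long_seq = rng.normal(size=(8, 12))    # 12 events
print(temporal_conv_max_pool(short_seq, kernels).shape)
print(temporal_conv_max_pool(long_seq, kernels).shape)
```

Both sequences yield a vector of length 4, which is the property that lets records with different numbers of events be compared directly.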
Further, step 4 specifically includes:
Step 4.1: merge the feature vectors with the computed similarity by concatenating them into a single vector;
Step 4.2: map the vector obtained in step 4.1 to a two-dimensional vector, in which the first dimension represents the score that the similarity of the two time series data is 1 and the second dimension represents the score that their similarity is 0;
Step 4.3: compute the similarity of the two time series data by Softmax;
Step 4.4: construct a loss function and train the preset convolutional neural network to obtain the trained similarity measurement model.
Further, in step 4.4, an objective function is constructed first and the loss of each iteration is computed from it; the partial derivative of the objective function with respect to each parameter is taken, and each parameter is updated in the direction of its negative derivative to reduce the loss, so that the parameters are continually optimized;
The loss function is formally expressed as:
L(S_1, S_2, y) = (y - M(S_1, S_2))^2
In the formula, S_1 and S_2 denote the input data pair.
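To make the update rule concrete, here is a minimal sketch of one gradient-descent step on the squared loss above. The model is deliberately simplified to a linear score `w @ f` over a hypothetical pair-feature vector `f`; this stand-in, the learning rate and the names `squared_loss` and `gradient_step` are assumptions for illustration and are not the network described in the patent.

```python
import numpy as np


def squared_loss(y: float, pred: float) -> float:
    """L = (y - pred)^2, matching the form of the loss given above."""
    return (y - pred) ** 2


def gradient_step(w: np.ndarray, f: np.ndarray, y: float, lr: float) -> np.ndarray:
    """One update of a toy linear model pred = w @ f.

    dL/dw = -2 * (y - pred) * f, and the parameters move in the
    negative-gradient direction to reduce the loss.
    """
    pred = w @ f
    grad = -2.0 * (y - pred) * f
    return w - lr * grad


f = np.array([0.2, 0.5, 0.1])  # hypothetical features of one data pair
y = 1.0                        # label: the pair is similar
w = np.zeros(3)

loss_before = squared_loss(y, w @ f)
w = gradient_step(w, f, y, lr=0.5)
loss_after = squared_loss(y, w @ f)
print(loss_before, loss_after)
```

In a full network the same rule is applied to every parameter by backpropagation, and training stops once the preset convergence condition is met.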
Further, the similarity in step 3 is computed as follows: a matrix M is randomly initialized, and the similarity S of two time series data is computed from their feature vectors X_a and X_b together with M.
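The text does not give the formula that combines X_a, X_b and M. A common choice, assumed here purely for illustration, is the bilinear form S = X_a^T M X_b. The sketch below also symmetrizes M as (M + M^T) / 2, which is one simple way to obtain the symmetric similarity matrix the description calls for; the function name `symmetric_similarity` is hypothetical.

```python
import numpy as np


def symmetric_similarity(x_a: np.ndarray, x_b: np.ndarray, M: np.ndarray) -> float:
    """Bilinear similarity x_a^T M_sym x_b with a symmetrized matrix.

    Assumption: the bilinear form and the (M + M.T) / 2 symmetrization are
    illustrative choices, not taken from the patent text.
    """
    M_sym = 0.5 * (M + M.T)
    return float(x_a @ M_sym @ x_b)


rng = np.random.default_rng(1)
p = 4
M = rng.normal(size=(p, p))  # randomly initialized, as in the description
x_a = rng.normal(size=p)
x_b = rng.normal(size=p)

s_ab = symmetric_similarity(x_a, x_b, M)
s_ba = symmetric_similarity(x_b, x_a, M)
print(np.isclose(s_ab, s_ba))
```

Because M_sym equals its own transpose, swapping the two inputs leaves the score unchanged, so the order of the two records does not matter.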
A time series data similarity measurement system, comprising:
a time series data representation construction module, for collecting a preset number of sample time series data and, taking into account the relative relationships among the events in each sample and the relative relationship between each event and its occurrence time, mapping the data from the high-dimensional space to a low-dimensional space and constructing a representation of each sample time series data;
a similarity measurement network module, for performing feature extraction on the representations of the sample time series data built by the representation construction module to obtain the feature vector of each sample; for computing, from the obtained feature vectors, the similarity between sample time series data based on a similarity matrix; and for training the preset convolutional neural network model using the feature vectors obtained by the feature vector extraction module and the similarities obtained by the similarity computation module, until a preset convergence condition is reached, obtaining a trained similarity measurement model;
The similarity measurement of the time series data to be measured is completed by the trained similarity measurement model.
Compared with the prior art, the invention has the following beneficial effects:
Existing methods consider only the features of the events themselves within a specific time period of a sequence and ignore their relative relationship to the occurrence time. In contrast, the invention provides a reasonable and effective time series data similarity measurement method that addresses the sparsity, high dimensionality, unequal dimensions, temporal ordering and irregularity of time series data. In the method of the invention: first, for the events in all time series data, a vector representation of each event is constructed, so that distances in the vector space effectively express the relative relationships among a patient's events; second, the occurrence time of each event is mapped to a vector with the same dimension as the event vector and embedded into the event vector by vector addition; finally, the resulting event sequence representation is fed into a convolutional neural network for supervised learning, yielding a robust time series data similarity measurement model, which is then used to measure similarity. By representing time series data effectively, the invention remedies the problem that conventional methods ignore the relative relationships among events and between events and their occurrence times, and solves the problem that time series data similarity could not be measured effectively. The invention represents time series data more reasonably and effectively and can thereby improve the accuracy of similarity measurement. In the invention, the obtained time series data representation is dense and low-dimensional, which makes computation efficient; the representation embeds temporal information, which makes it more reasonable; and, on the basis of this representation, a convolutional neural network extracts features, computes similarity and is trained end to end under supervision, achieving a reasonable representation and efficient feature extraction at the same time, so that measurement accuracy can be improved.
Further: 1) the sparse time series data matrix is turned into dense event vectors, achieving non-sparsity; 2) high-dimensional event representations are mapped to a low-dimensional vector space by word2vec, achieving low dimensionality; 3) the final event sequence representation fuses the relative relationships among events and the relative relationships between events and their occurrence times.
Further, unlike image analysis, where convolution is performed along both directions of the matrix, convolution over time series data is meaningful only along the time direction, so the convolution of the convolutional layer is one-directional.
Further, for the extracted feature vectors, the similarity between two feature vectors is computed based on a similarity matrix. Because the similarity should be unchanged when the positions of the two data items are swapped, the similarity matrix is constrained to be symmetric.
Description of the drawings
Fig. 1 is a schematic flow chart of the time series data similarity measurement method of the invention;
Fig. 2 is a schematic diagram of the dimension reduction of the event sequence matrix in the time series data similarity measurement method of the invention;
Fig. 3 is a schematic diagram of the convolutional neural network structure in the time series data similarity measurement method of the invention.
Detailed description of embodiments
The invention is described in further detail below with reference to the drawings and specific embodiments.
Referring to Fig. 1, the time series data similarity measurement method of the invention comprises the following steps:
Step 1: construct an effective representation of the time series data. This requires making the sparse time series data dense, taking into account the relative relationships among the events in the sequence and the relative relationship between each event and its occurrence time, and mapping the data from the high-dimensional space to a low-dimensional space.
Step 1 specifically includes the following steps:
Step 1.1: the time series data matrix is very sparse, and analyzing it directly would be computationally very expensive. The sparse matrix is therefore made dense, and the high-dimensional matrix is reduced in dimension. Referring to Fig. 2, each time series data matrix is converted into an event sequence, with the events ordered by their relative time; events that occur at the same time are ordered arbitrarily;
Step 1.2: map each event to a fixed-length vector using word2vec, obtaining for each event in the sequence a vector representation that contains relative relationship information;
Step 1.3: using word2vec, map the occurrence time of each event to a vector with the same dimension as the event vector, obtaining a vector representation of the event occurrence time;
Step 1.4: embed the vector representation of each event occurrence time into the corresponding event representation by vector addition.
The time series data representation method of the invention has the following characteristics: 1) the sparse time series data matrix is turned into dense event vectors, giving non-sparsity; 2) high-dimensional event representations are mapped to a low-dimensional vector space by word2vec, giving low dimensionality; 3) the final event sequence representation fuses the relative relationships among events and the relative relationships between events and their occurrence times.
Step 2: extract the features of the time series data effectively.
Features need to be extracted from the time series data representation in order to measure similarity effectively. In addition, because the number of events differs from one time series data item to another, the number of vectors in the event sequence representations obtained in the previous step also differs. To make it convenient to compute the similarity of data pairs, sequence representations of different lengths need to be mapped to feature representations of equal length. This method uses an improved convolutional neural network to extract equal-length time series data feature representations.
Referring to Fig. 3, the convolutional neural network structure of the invention includes:
1) Convolutional layer: unlike image analysis, where convolution is performed along both directions of the matrix, convolution over time series data is meaningful only along the time direction, so the convolution of the convolutional layer is one-directional. This layer has multiple kernel functions acting as filters, which extract different features and produce multiple feature maps.
2) Sampling layer: simple max sampling is used; each feature map is sampled down to a single value, finally yielding a fixed-length vector representation of the time series data.
Step 3: compute the similarity and train the network.
For the feature vectors extracted in the previous step, the similarity between two feature vectors is computed based on a similarity matrix. Because the similarity should be unchanged when the positions of the two data items are swapped, the similarity matrix is constrained to be symmetric. The loss is computed from the calculated similarity, and the network is trained.
Step 3 specifically includes the following steps:
Step 3.1: compute the similarity between time series data based on the similarity matrix. A matrix M is randomly initialized, and the similarity S is computed from the two feature vectors X_a and X_b obtained in the previous step together with M.
Step 3.2: merge the feature vectors with the computed similarity. The two feature vectors X_a and X_b and the similarity S computed in step 3.1 are concatenated into a single vector.
Step 3.3: output the classification through a fully connected layer. The vector obtained in step 3.2 is mapped to a two-dimensional vector, in which the first dimension represents the score that the similarity of the two data items is 1 and the second dimension represents the score that their similarity is 0.
Step 3.4: compute the similarity of the two data items by Softmax.
Step 3.5: construct the loss function and train the network.
An objective function is constructed first and the loss of each iteration is computed from it; the partial derivative of the objective function with respect to each parameter is taken, and each parameter is updated in the direction of its negative derivative (gradient) to reduce the loss, so that the parameters are continually optimized. The loss function can be formally expressed as:
L(S_1, S_2, y) = (y - M(S_1, S_2))^2
In the formula, S_1 and S_2 denote the input data pair.
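Steps 3.2 to 3.4 above can be sketched as a concatenation, a single fully connected layer and a Softmax. The weights `W` and bias `b` below are random placeholders rather than trained parameters, and the function names `softmax` and `pair_probability` are assumptions made for this example.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(z - z.max())
    return e / e.sum()


def pair_probability(x_a, x_b, s, W, b):
    """Concatenate [x_a, x_b, s], apply one fully connected layer, then softmax.

    Returns two probabilities: index 0 for "similar" (label 1),
    index 1 for "not similar" (label 0).
    """
    v = np.concatenate([x_a, x_b, [s]])  # step 3.2: length 2p + 1
    scores = W @ v + b                   # step 3.3: two-dimensional scores
    return softmax(scores)               # step 3.4


rng = np.random.default_rng(2)
p = 4
x_a, x_b = rng.normal(size=p), rng.normal(size=p)
s = 0.3                                  # hypothetical similarity from step 3.1
W = rng.normal(size=(2, 2 * p + 1))      # untrained placeholder weights
b = np.zeros(2)

probs = pair_probability(x_a, x_b, s, W, b)
print(probs, probs.sum())
```

The two outputs are non-negative and sum to 1, so the first entry can be read as the predicted probability that the pair is similar.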
The data similarity measurement method based on a time-embedded representation of the invention remedies shortcomings of existing representation methods, such as ignoring temporal information, high dimensionality and sparsity; it obtains an effective representation of time series data and computes similarity reasonably and accurately. The time series data representation obtained by the invention is dense and low-dimensional, which makes computation efficient; it embeds temporal information, which makes the representation more reasonable; and accuracy is high, because on the basis of this representation a convolutional neural network extracts features, computes similarity and is trained end to end under supervision, achieving a reasonable representation and efficient feature extraction at the same time and thereby improving measurement accuracy.
A time series data similarity measurement system, comprising:
a time series data representation construction module, for collecting a preset number of sample time series data and, taking into account the relative relationships among the events in each sample and the relative relationship between each event and its occurrence time, mapping the data from the high-dimensional space to a low-dimensional space and constructing a representation of each sample time series data;
a feature vector extraction module, for performing feature extraction on the representations of the sample time series data built by the representation construction module and obtaining the feature vector of each sample time series data;
a similarity computation module, for computing, from the feature vectors obtained by the feature vector extraction module, the similarity between sample time series data based on a similarity matrix;
a similarity measurement network module, for training the preset convolutional neural network model using the feature vectors obtained by the feature vector extraction module and the similarities between sample time series data obtained by the similarity computation module, until a preset convergence condition is reached, obtaining a trained similarity measurement model;
an input/output module, for constructing the representation of the time series data to be measured, extracting its feature vector, inputting it into the similarity measurement network module, and outputting the similarity measurement result for the time series data to be measured.
Embodiment
Referring to Fig. 1, a time series data similarity measurement method according to an embodiment of the invention is applied to similarity measurement of electronic medical records and comprises the following steps:
S101. Construct an effective representation of the medical event sequences in electronic medical records.
Step1. The electronic medical record (EMR) matrix is very sparse, so the first task is to make the sparse matrix dense and reduce the dimension of the high-dimensional matrix. Referring to Fig. 2, each EMR matrix is converted into an event sequence, with the events ordered by their relative time; events that occur on the same day are ordered arbitrarily. This finally yields a vector H;
Step2. Each medical event in the electronic medical record is mapped to a fixed-length vector using word2vec, so as to capture the relative relationships among the medical events in the record. word2vec is an effective model that draws on ideas from deep learning to represent words as vectors. If words are regarded as features, word2vec can be understood as mapping features into a K-dimensional vector space: each word is mapped to a K-dimensional vector, so that processing text reduces to vector operations in that space, and the similarity of vectors can be used to represent the semantic similarity of text. Accordingly, the medical event sequence in each electronic medical record is treated as a sentence and each event in the sequence as a word. After word2vec, each event is mapped to a fixed-length vector whose length is a parameter, denoted dim_v. Mapping each event in the vector H from Step1 to a vector yields a sequence matrix K;
Step3. Using word2vec, the occurrence time of each medical event in the EMR is mapped to a vector of the same length as the medical event vector, giving a vector representation of the medical event occurrence time and finally yielding a matrix T;
Step4. The vector representation of the occurrence time of each medical event in the EMR medical event sequence is embedded into the corresponding medical event representation by vector addition, which can be formalized as follows:
E = K + T
where E is the final time-embedded medical event sequence representation matrix, K is the sequence matrix generated in Step2, and T is the time series matrix generated in Step3.
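The addition E = K + T can be illustrated with two lookup tables. In the sketch below, `event_table` and `time_table` are random placeholders standing in for vectors that word2vec would have learned; the table sizes, the value of dim_v and the function name `embed_sequence` are assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_v = 4
# Random stand-ins for trained word2vec tables (assumption for illustration).
event_table = rng.normal(size=(10, dim_v))  # one row per medical event type
time_table = rng.normal(size=(30, dim_v))   # one row per relative day


def embed_sequence(sequence):
    """Build E = K + T for a list of (day, event_id) pairs.

    K stacks the event vectors, T stacks the time vectors of the same
    dimension, and the result is returned as a dim_v x L matrix.
    """
    days = [d for d, _ in sequence]
    events = [e for _, e in sequence]
    K = event_table[events]  # (L, dim_v)
    T = time_table[days]     # (L, dim_v)
    return (K + T).T         # (dim_v, L)


emr_sequence = [(0, 1), (2, 0), (2, 3)]  # hypothetical record with L = 3 events
E = embed_sequence(emr_sequence)
print(E.shape)  # (4, 3)
```

Because the time vector has the same dimension as the event vector, the same event occurring on different days receives a different final representation without enlarging the matrix.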
Specifically, the above medical event sequence representation method has the following characteristics: 1) the sparse medical event matrix in the EMR is turned into dense medical event vectors, giving non-sparsity; 2) high-dimensional event representations are mapped to a low-dimensional vector space by word2vec, giving low dimensionality; 3) the final patient event sequence representation fuses the relative relationships among medical events and the relative relationships between medical events and their occurrence times.
S102. Effectively extract the feature representation of the medical sequences in the EMR.
Features need to be extracted from the representation of a patient's medical event sequence in order to measure similarity effectively. In addition, because the number of medical events differs from one EMR to another, the number of vectors in the resulting medical event sequence representations also differs. To make it convenient to compute the similarity of patient pairs, patient event sequence representations of different lengths need to be mapped to feature representations of equal length. The method of the invention uses an improved convolutional neural network to extract equal-length patient feature representations.
The specific network structure is as follows:
1) Convolutional layer: unlike image analysis, where convolution is performed along both directions of the matrix, convolution over EMR data is meaningful only along the time direction, so the convolution of the convolutional layer is one-directional. This layer has multiple kernel functions acting as filters, which extract different features and produce multiple feature maps;
As shown in Fig. 2, the input to the convolutional layer is two EMR sequence representation matrices of size dim_v × L. What the invention ultimately learns is the similarity of two EMRs, so the input is the information of two EMRs. After convolution with c kernels, c vectors of the same size are generated, i.e. multiple feature maps. The same convolution parameters are used for both EMRs: for any two EMRs A and B, sim(A, B) and sim(B, A) should be equal, so A and B are interchangeable in position and should share the same convolutional layer parameters. Likewise, a symmetric similarity matrix is used in the subsequent similarity matching layer. Together these two points ensure that sim(A, B) and sim(B, A) yield equal values.
After the convolutional layer, two c × dim_m matrices are obtained. These would originally be c × (T+8) matrices; for computational convenience, patient EMR matrices with fewer than L events are padded with 0 at the end when they are input to the convolutional layer. The c × dim_m matrices obtained here may therefore still contain zeros at the end. Although the dimensions are fixed, these are in effect still matrices of unequal dimensions, and further processing is needed before similarity analysis; the next step is the sampling layer.
2) Sampling layer: simple max sampling is used; each feature map is sampled down to a single value, so that each EMR is represented by a fixed-length vector. Unlike average sampling, taking the maximum of the sampled region finds the point in the region that best expresses the feature. After the sampling layer, each EMR yields a fixed-length vector, here a p-dimensional vector, which represents the features of that EMR.
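The following small sketch illustrates one consequence of combining zero padding with max sampling, using made-up activation values. Appending zeros to a feature map leaves its maximum unchanged as long as the largest activation is non-negative, whereas the average is diluted by the padding. The variable names and numbers are hypothetical and chosen only for this example.

```python
import numpy as np

# Hypothetical feature map for a short record with three valid positions.
feature_map = np.array([0.2, 0.9, 0.4])

# The same map when the record is zero-padded to a longer common length.
padded_map = np.concatenate([feature_map, np.zeros(5)])

max_unpadded, max_padded = feature_map.max(), padded_map.max()
mean_unpadded, mean_padded = feature_map.mean(), padded_map.mean()

print(max_unpadded, max_padded)    # the strongest response is preserved
print(mean_unpadded, mean_padded)  # the average shrinks with padding
```

This is consistent with the description's preference for max sampling over average sampling when records contain different numbers of events.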
S103. Compute the similarity and train the network.
For the EMR feature vectors extracted in the previous step, the similarity between two feature vectors is computed based on a similarity matrix. Because the similarity should be unchanged when the positions of the two patients are swapped, the similarity matrix is constrained to be symmetric. The loss is computed from the calculated similarity, and the network is trained.
This specifically includes the following steps:
Step1. Compute patient similarity based on the similarity matrix;
A matrix M is randomly initialized, and the similarity S is computed from the two feature vectors X_a and X_b obtained in the previous step together with M.
Step2. Merge the feature vectors with the computed similarity;
The two feature vectors X_a and X_b and the similarity S computed in Step1 are concatenated into a single vector.
Step3. Output the classification through a fully connected layer;
The vector obtained in Step2 is mapped to a two-dimensional vector, in which the first dimension represents the score that the two patients belong to the same cohort and the second dimension represents the score that the two patients belong to different cohorts.
Step4. Compute, by Softmax, the probability that the two original data items belong to the same cohort.
Step5. Construct the loss function and train the network;
An objective function is constructed first and the loss of each iteration is computed from it; the partial derivative of the objective function with respect to each parameter is taken, and each parameter is updated in the direction of its negative derivative (gradient) to reduce the loss, so that the parameters are continually optimized. The loss function can be formally expressed as:
L(S_1, S_2, y) = (y - M(S_1, S_2))^2
When the network parameters converge, training stops and the final EMR similarity measurement model is obtained.
In summary, the method of the invention is a time series data similarity measurement method based on a time-embedded representation. It mainly addresses the difficulty of measuring the similarity between sequences reasonably and effectively when there are large numbers of heterogeneous time series of unequal dimensions. It specifically comprises the following steps. First, an effective representation of the time series data is constructed: the high-dimensional, sparse temporal event sequence is mapped to a low-dimensional space, a vector representation of each event in that space is obtained by word2vec, and temporal information is embedded. Second, a customized convolutional neural network is built to extract effective features of the time series data, giving fixed-length feature representations of time series data of unequal length. Finally, the fixed-length representations are used to compute similarity, evaluate the objective function and train the network, yielding the final time series data similarity measurement model. Existing methods consider only the features of the events themselves within a specific time period of a sequence and ignore their relative relationship to the occurrence time. In contrast, the invention discloses a time series similarity measurement method based on a time-embedded representation: the event vector and time vector of each event are constructed by word2vec, the time vector is embedded into the event vector, and a convolutional neural network is trained by supervised learning, finally yielding a robust patient similarity measurement model. The method remedies the problem that existing methods ignore the relative relationships among events and between events and their occurrence times, and represents time series data more reasonably and effectively, thereby improving the accuracy of similarity measurement.
Those skilled in the art will readily understand that the foregoing is merely an embodiment of the method of the invention and is not intended to limit the invention; any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.

Claims (10)

1. A time series data similarity measurement method, characterized by comprising the following steps:
Step 1, acquire a preset number of sample time series data; taking into account the relative relationships among the events in each sample time series data and the relationship between each event and its time of occurrence, map the data from a high-dimensional space to a low-dimensional space and construct a representation of each sample time series data;
Step 2, input the representations of all sample time series data obtained in step 1 into a preset convolutional neural network model, perform feature extraction on each representation, and obtain a feature vector for each sample time series data;
Step 3, from the feature vectors of the sample time series data obtained in step 2, compute the similarity between the sample time series data based on a similarity matrix;
Step 4, train the preset convolutional neural network model with the feature vectors obtained in step 2 and the similarities obtained in step 3 until a preset convergence condition is reached, obtaining a trained similarity measurement model;
Step 5, construct a representation of the time series data to be measured by the method of step 1 and input it into the trained similarity measurement model obtained in step 4, obtaining the similarity measurement result for the time series data to be measured.
2. The time series data similarity measurement method according to claim 1, characterized in that step 1 specifically comprises:
Step 1.1, convert the data matrix of each sample time series into an event sequence in which events are ordered by their relative time of occurrence, and events that occur at the same time are ordered arbitrarily;
Step 1.2, map each event to a fixed-length vector using word2vec, obtaining for each event in the event sequence a vector representation that contains relative relationship information;
Step 1.3, map the time at which each event in the event sequence occurs to a vector of the same dimension as the event vector using word2vec, obtaining a vector representation of the event occurrence time;
Step 1.4, embed the vector representation of each event occurrence time in the time series data into the corresponding event representation by vector addition, obtaining the representation of the sample time series data.
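As a minimal illustrative sketch of steps 1.1 to 1.4, the following Python fragment orders a toy record by relative time, looks up an event vector and a time vector of equal dimension, and adds them. The record contents, the dimension `DIM`, and the helper `random_table` are assumptions introduced only for illustration; in particular, the random lookup tables merely stand in for vectors that would be learned with word2vec.

```python
import random

random.seed(0)
DIM = 4  # shared size of event and time vectors (illustrative value)

def random_table(keys, dim):
    """Stand-in for a trained word2vec lookup table: key -> vector of length dim."""
    return {k: [random.uniform(-1.0, 1.0) for _ in range(dim)] for k in sorted(keys)}

# Hypothetical sample: (relative time, event code) pairs, not yet ordered.
record = [(2, "lab_test"), (0, "admission"), (2, "drug_A"), (5, "discharge")]

# Step 1.1: order events by relative time. sorted() is stable, so events
# sharing a timestamp simply keep their input order (any order is allowed).
sequence = sorted(record, key=lambda pair: pair[0])

# Steps 1.2 and 1.3: one vector per event and one per time, same dimension.
event_vecs = random_table({e for _, e in record}, DIM)
time_vecs = random_table({t for t, _ in record}, DIM)

# Step 1.4: embed the time information by element-wise vector addition.
representation = [
    [ev + tv for ev, tv in zip(event_vecs[e], time_vecs[t])]
    for t, e in sequence
]
```

The result is one `DIM`-dimensional vector per event, so sequences with different numbers of events yield representations of different lengths.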
3. The time series data similarity measurement method according to claim 1, characterized in that, in step 2, a feature vector of equal length is extracted from each sample time series data by the convolutional neural network;
the convolutional neural network structure used comprises:
a convolutional layer, for receiving the input data and outputting feature maps;
a sampling layer, for receiving the feature maps output by the convolutional layer and outputting the fixed-length feature vector of the time series data.
4. The time series data similarity measurement method according to claim 3, characterized in that, in the convolutional neural network structure used:
the convolution of the convolutional layer is unidirectional;
max sampling (max pooling) is applied to the feature maps output by the convolutional layer, reducing each feature map to a single value and finally yielding the fixed-length vector representation of the time series data.
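The way max sampling turns sequences of unequal length into vectors of equal length can be sketched as follows. This is a simplified illustration, assuming that "unidirectional" means the kernel slides along the time axis only and that a ReLU activation is used; the function names, kernels, and input values are hypothetical.

```python
def conv1d(sequence, kernel):
    """Slide `kernel` (width x dim) along the time axis only; return one feature map.

    Assumes len(sequence) >= len(kernel).
    """
    width = len(kernel)
    feature_map = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = sum(
            w * x
            for k_row, x_row in zip(kernel, window)
            for w, x in zip(k_row, x_row)
        )
        feature_map.append(max(0.0, total))  # ReLU, an assumed activation
    return feature_map

def extract_features(sequence, kernels):
    """Max-pool each feature map to one number: output length == len(kernels)."""
    return [max(conv1d(sequence, k)) for k in kernels]

# Two hypothetical kernels of width 2 over 2-dimensional event vectors.
kernels = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[-1.0, 1.0], [1.0, -1.0]],
]
short_seq = [[1.0, 2.0], [3.0, 4.0], [0.5, 0.5]]
long_seq = short_seq + [[2.0, 0.0], [0.0, 3.0]]

short_features = extract_features(short_seq, kernels)  # [5.0, 1.0]
long_features = extract_features(long_seq, kernels)    # [5.0, 2.0]
```

Both outputs have one entry per kernel regardless of the sequence length, which is what allows the later similarity computation to operate on fixed-length vectors.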
5. The time series data similarity measurement method according to claim 1, characterized in that the similarity matrix used in step 3 is symmetric.
6. The time series data similarity measurement method according to claim 1, characterized in that step 4 specifically comprises:
Step 4.1, merge the feature vectors with the computed similarity by concatenating them into one vector;
Step 4.2, map the vector obtained in step 4.1 to a two-dimensional vector, whose first dimension represents the degree to which the similarity of the two time series data is 1 and whose second dimension represents the degree to which their similarity is 0;
Step 4.3, compute the similarity of the two time series data by Softmax;
Step 4.4, construct a loss function and train the preset convolutional neural network, obtaining the trained similarity measurement model.
7. The time series data similarity measurement method according to claim 6, characterized in that, in step 4.4, an objective function is first constructed and the loss of each iteration is computed from it; the partial derivative of the objective function with respect to each parameter is taken, and each parameter is updated in the direction of its negative derivative so as to reduce the loss, thereby continually optimizing the parameters;
the loss function is formally expressed as:
L(S_1, S_2, y) = (y - M(S_1, S_2))^2
where S_1 and S_2 denote the input data pair.
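The update rule described in this claim can be illustrated with a small gradient-descent sketch on the squared loss. It assumes that y is the ground-truth label of the pair and that M(S_1, S_2) is the model's predicted similarity; the logistic unit used as the predictor here, the feature values, and the learning rate are hypothetical stand-ins, not the network of the invention.

```python
import math

def predict(weights, pair_features):
    """Hypothetical stand-in for M(S_1, S_2): a logistic unit on pair features."""
    z = sum(w * f for w, f in zip(weights, pair_features))
    return 1.0 / (1.0 + math.exp(-z))

def squared_loss(weights, pair_features, y):
    """L = (y - M)^2 for one labelled pair."""
    return (y - predict(weights, pair_features)) ** 2

def gradient(weights, pair_features, y):
    """Partial derivatives of (y - p)^2 with respect to each weight."""
    p = predict(weights, pair_features)
    common = -2.0 * (y - p) * p * (1.0 - p)
    return [common * f for f in pair_features]

def train(weights, pair_features, y, learning_rate=0.5, steps=50):
    """Move every parameter against its derivative, one step per iteration."""
    for _ in range(steps):
        grads = gradient(weights, pair_features, y)
        weights = [w - learning_rate * g for w, g in zip(weights, grads)]
    return weights

initial = [0.0, 0.0, 0.0]
features = [1.0, -0.5, 2.0]  # made-up features of one data pair
label = 1.0                  # the pair is labelled as similar
trained = train(initial, features, label)
```

With these values the loss starts at 0.25 and decreases with every step, because each update moves the prediction toward the label.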
8. The time series data similarity measurement method according to claim 1, characterized in that the similarity in step 3 is computed as follows: a matrix M is randomly initialized, and the similarity S of two time series data is computed from their feature vectors X_a and X_b and the matrix M.
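The claim does not state how S is computed from X_a, X_b, and M. One common choice, assumed here purely for illustration, is the bilinear form S = X_a^T M X_b; with a symmetric M this gives the same score regardless of the order of the two inputs. The function names and vectors below are hypothetical.

```python
import random

def random_symmetric_matrix(dim, seed=0):
    """Randomly initialise a dim x dim matrix with m[i][j] == m[j][i]."""
    rng = random.Random(seed)
    m = [[0.0] * dim for _ in range(dim)]
    for i in range(dim):
        for j in range(i, dim):
            value = rng.uniform(-1.0, 1.0)
            m[i][j] = value
            m[j][i] = value
    return m

def bilinear_similarity(x_a, m, x_b):
    """Assumed form of the similarity: S = x_a^T M x_b."""
    return sum(
        x_a[i] * m[i][j] * x_b[j]
        for i in range(len(x_a))
        for j in range(len(x_b))
    )

x_a = [5.0, 1.0, 0.5]
x_b = [5.0, 2.0, 0.0]
m = random_symmetric_matrix(3)

s_ab = bilinear_similarity(x_a, m, x_b)
s_ba = bilinear_similarity(x_b, m, x_a)  # equal to s_ab up to rounding
```

In training, the entries of M would be treated as learnable parameters alongside the network weights.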
9. The time series data similarity measurement method according to claim 8, characterized in that step 4 specifically comprises:
Step 4.1, concatenate the feature vectors X_a and X_b of the two time series data and the similarity S computed in step 3 into one vector;
Step 4.2, map the vector obtained in step 4.1 to a two-dimensional vector, whose first dimension represents the degree to which the similarity of the two time series data is 1 and whose second dimension represents the degree to which their similarity is 0;
Step 4.3, compute the similarity of the two time series data by Softmax;
Step 4.4, construct a loss function and train the preset convolutional neural network, obtaining the trained similarity measurement model.
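Steps 4.1 to 4.3 can be sketched as a small output head: concatenate X_a, X_b, and S, apply a linear map to two values, and normalise with Softmax. The linear map with `weight` and `bias` is an assumption about how the mapping of step 4.2 is realised, and all numeric values are made up for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of numbers."""
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def similarity_head(x_a, x_b, s, weight, bias):
    """Return [P(similarity = 1), P(similarity = 0)] for one pair."""
    joined = x_a + x_b + [s]                       # step 4.1: concatenation
    logits = [
        sum(w * v for w, v in zip(row, joined)) + b
        for row, b in zip(weight, bias)
    ]                                              # step 4.2: map to 2 values
    return softmax(logits)                         # step 4.3: Softmax

x_a = [1.0, 0.0]
x_b = [0.5, 0.5]
s = 0.5

# Hypothetical parameters: 2 rows, one weight per entry of the joined vector.
weight = [
    [0.2, -0.1, 0.3, 0.0, 1.0],
    [-0.2, 0.1, -0.3, 0.0, -1.0],
]
bias = [0.0, 0.0]

probs = similarity_head(x_a, x_b, s, weight, bias)
untrained = similarity_head(x_a, x_b, s, [[0.0] * 5, [0.0] * 5], bias)  # [0.5, 0.5]
```

The first entry of `probs` can then be read as the predicted similarity that enters the loss function of step 4.4.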
10. A time series data similarity measurement system, characterized by comprising:
a time series data representation construction module, for acquiring a preset number of sample time series data and, taking into account the relative relationships among the events in each sample time series data and the relationship between each event and its time of occurrence, mapping the data from a high-dimensional space to a low-dimensional space and constructing a representation of each sample time series data;
a similarity measurement network module, for performing feature extraction on the representation of each sample time series data constructed by the time series data representation construction module to obtain a feature vector of each sample time series data; for computing, from the obtained feature vectors, the similarity between the sample time series data based on a similarity matrix; and for training the preset convolutional neural network model with the feature vectors obtained by the feature vector extraction module and the similarities obtained by the similarity calculation module until a preset convergence condition is reached, obtaining a trained similarity measurement model;
wherein the similarity measurement of the time series data to be measured is performed by the trained similarity measurement model.
CN201910067744.8A 2019-01-24 2019-01-24 A kind of time series data method for measuring similarity and gauging system Pending CN109948646A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910067744.8A CN109948646A (en) 2019-01-24 2019-01-24 A kind of time series data method for measuring similarity and gauging system

Publications (1)

Publication Number Publication Date
CN109948646A true CN109948646A (en) 2019-06-28

Family

ID=67007417

Country Status (1)

Country Link
CN (1) CN109948646A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016187711A1 (en) * 2015-05-22 2016-12-01 Csts Health Care Inc. Biomarker-driven molecularly targeted combination therapies based on knowledge representation pathway analysis
CN106446081A (en) * 2016-09-09 2017-02-22 西安交通大学 Method for mining association relationship of time series data based on change consistency
CN107239445A (en) * 2017-05-27 2017-10-10 中国矿业大学 The method and system that a kind of media event based on neural network is extracted

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZIHAO ZHU ET AL.: "Measuring Patient Similarities via a Deep Architecture with Medical Concept Embedding", 《2016 IEEE 16TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM)》 *
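Jia Zheng et al.: "A Review of Patient Similarity Analysis Based on Electronic Medical Records", 《Chinese Journal of Biomedical Engineering》 *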

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110766060A (en) * 2019-10-14 2020-02-07 中山大学 Time series similarity calculation method, system and medium based on deep learning
CN111309900A (en) * 2020-01-17 2020-06-19 中国科学技术大学 Legal class similarity judging and pushing method
CN111309900B (en) * 2020-01-17 2022-09-06 中国科学技术大学 Legal class similarity judging and pushing method
CN112115184A (en) * 2020-09-18 2020-12-22 平安科技(深圳)有限公司 Time series data detection method and device, computer equipment and storage medium
CN112966808A (en) * 2021-01-25 2021-06-15 咪咕音乐有限公司 Data analysis method, device, server and readable storage medium
CN113239990A (en) * 2021-04-27 2021-08-10 中国银联股份有限公司 Method and device for performing feature processing on sequence data and storage medium
CN113377909A (en) * 2021-06-09 2021-09-10 平安科技(深圳)有限公司 Paraphrase analysis model training method and device, terminal equipment and storage medium
CN113377909B (en) * 2021-06-09 2023-07-11 平安科技(深圳)有限公司 Paraphrasing analysis model training method and device, terminal equipment and storage medium
CN114528334A (en) * 2022-02-18 2022-05-24 重庆伏特猫科技有限公司 Rapid similarity searching method in time sequence database
CN114528334B (en) * 2022-02-18 2022-10-18 重庆伏特猫科技有限公司 Rapid similarity searching method in time sequence database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190628