CN106202128A - Classification method and classification system for time-series files - Google Patents


Publication number
CN106202128A
Authority
CN
China
Prior art keywords
statistic
matrix
sequential file
sequential
file
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510232775.6A
Other languages
Chinese (zh)
Inventor
杨铭
夏迎炬
孙健
侯翠琴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujitsu Ltd
Original Assignee
Fujitsu Ltd
Application filed by Fujitsu Ltd
Priority to CN201510232775.6A
Publication of CN106202128A

Abstract

The invention discloses a classification method and a classification system for time-series files. According to one aspect of the invention, the classification method includes: extracting temporal features from a plurality of time-series files; calculating a state statistics vector for each time-series file from the extracted temporal features, wherein the elements of the state statistics vector reflect the state statistics results of the corresponding time-series file; constructing a feature matrix from the state statistics vectors of the plurality of time-series files; and classifying the plurality of time-series files according to the feature matrix. Time-series files can thus be classified using state statistics information, which preserves the reliability of the classification results while reducing computational complexity, enabling fast and accurate classification.

Description

Classification method and classification system for time-series files
Technical field
The present invention relates to the field of computer file classification, and in particular to a classification method and a classification system for time-series files.
Background
With the development of multimedia applications, a large number of time-series file resources, typified by music files, have appeared. In recent years, how to organize and manage time-series files effectively has attracted increasing attention.
Taking music files as an example, labeling music files with categories is an important means of organizing and managing them effectively. For instance, labeling music files by musical genre (such as jazz, blues, classical, country, or rock) is one important aspect of category labeling. Because the number of music files is very large, manual labeling is time-consuming and laborious, and its accuracy is not high: errors readily arise from human carelessness or limited knowledge.
To solve this problem, the prior art has proposed a variety of methods for classifying time-series files automatically. Correct automatic classification of time-series files hinges on two main points: how to design and select a suitable classifier for the time-series files, and which features to extract from the original time-series files for use in classification.
On the one hand, the prior art already offers many solutions for designing and selecting a suitable classifier. On the other hand, which features to extract from the original time-series files for classification has long been a difficult research problem, and the prior art still lacks a satisfactory solution.
Summary of the invention
In view of this, the present invention proposes a classification method and a classification system for time-series files that classify time-series files using state statistics information.
According to one aspect of the invention, a classification method for time-series files is provided, including: extracting temporal features from a plurality of time-series files; calculating a state statistics vector for each time-series file from the extracted temporal features, wherein the elements of the state statistics vector reflect the state statistics results of the corresponding time-series file; constructing a feature matrix from the state statistics vectors of the plurality of time-series files; and classifying the plurality of time-series files according to the feature matrix.
According to another aspect of the invention, a classification system for time-series files is provided, including: a feature extraction device that extracts temporal features from a plurality of time-series files; a computing device that calculates a state statistics vector for each time-series file from the temporal features extracted by the feature extraction device, wherein the elements of the state statistics vector reflect the state statistics results of the corresponding time-series file; a matrix construction device that constructs a feature matrix from the state statistics vectors of the plurality of time-series files calculated by the computing device; and a classifier that classifies the plurality of time-series files according to the feature matrix constructed by the matrix construction device.
With the technical solution provided by the invention, time-series files can be classified using state statistics information. This is less computationally complex than working with all of the extracted temporal features, yet it retains sufficient statistical feature information for the classification computation. It therefore preserves the reliability of the classification results while reducing computational complexity, enabling fast and accurate classification.
Brief description of the drawings
Other features and advantages of the invention will be better understood by reading the embodiments with reference to the drawings. The drawings described here serve only to illustrate embodiments of the invention schematically; they do not show all possible implementations and are not intended to limit the scope of the invention. In the drawings:
Fig. 1 is a flowchart of a classification method for time-series files according to one embodiment of the invention;
Fig. 2 is a schematic diagram of extracting temporal features from an original time-series file;
Fig. 3 is a flowchart of calculating the state statistics vector of each time-series file from the extracted temporal features according to one embodiment of the invention;
Fig. 4 is a schematic diagram of clustering the temporal features of N time-series files according to one embodiment of the invention;
Fig. 5 is a flowchart of calculating the state statistics vector of each time-series file according to one embodiment of the invention;
Fig. 6 is a schematic diagram of generating a cluster state matrix from the clustering result of the temporal features of the time-series file shown in Fig. 4;
Fig. 7 shows an example of a row combination chosen from the cluster state matrix shown in Fig. 6;
Fig. 8 is a flowchart of performing double-sliding-window state statistics in each of a plurality of row combinations to generate the combination state statistics matrix of a time-series file, according to one embodiment of the invention;
Fig. 9 shows an example of setting an outer window and an inner window in the row combination shown in Fig. 7;
Fig. 10 shows an example of the combination state statistics matrix according to this embodiment;
Fig. 11 is a flowchart of constructing a feature matrix from the state statistics vectors of a plurality of time-series files according to one embodiment of the invention;
Fig. 12 is a block diagram of a classification system for time-series files according to one embodiment of the invention;
Fig. 13 is a block diagram of the computing device according to one embodiment of the invention;
Fig. 14 is a block diagram of the computing unit according to one embodiment of the invention;
Fig. 15 is a block diagram of the statistics subunit according to one embodiment of the invention;
Fig. 16 is a block diagram of the matrix construction device according to one embodiment of the invention; and
Fig. 17 is a schematic block diagram of a computer that can be used to implement the method and system according to embodiments of the invention.
Detailed description
Embodiments of the invention are now described in detail with reference to the drawings. Note that the following description is merely exemplary and is not intended to limit the invention. In the description below, the same reference numerals denote the same or similar components in different drawings. Different features of the different embodiments described below may be combined with one another to form further embodiments within the scope of the invention.
Fig. 1 shows a flowchart of a classification method for time-series files according to one embodiment of the invention. As shown in Fig. 1, the classification method 1000 may include steps S1100 to S1400.
When time-series files need to be classified, temporal features are extracted from a plurality of time-series files in step S1100. Any method known in the art may be used to extract temporal features from the original time-series files. The extracted temporal features may be any known features, such as MFCC (Mel-frequency cepstral coefficient) features or FFT (fast Fourier transform) features. The following description uses MFCC features as an example, but the application is not limited to them; other known temporal features, such as FFT features, are equally applicable.
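As a rough illustration of step S1100, the sketch below splits a signal into frames and computes a magnitude spectrum per frame, standing in for the FFT-style features that the text says are applicable. It is a minimal pure-Python sketch: the function names, the frame length, the hop size, and the direct DFT are illustrative assumptions rather than details from the patent, and a practical system would more likely call an MFCC or FFT routine from an audio library.

```python
import cmath
import math

def frame_signal(samples, frame_len, hop):
    """Split a 1-D signal into overlapping frames (trailing partial frame dropped)."""
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]

def dft_magnitudes(frame):
    """Magnitude spectrum of one frame via a direct O(n^2) DFT (bins 0..n/2)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def extract_temporal_features(samples, frame_len=8, hop=4):
    """Return one feature vector (spectral magnitudes) per frame."""
    return [dft_magnitudes(f) for f in frame_signal(samples, frame_len, hop)]

# Toy signal (assumed data): a sinusoid with exactly 2 cycles per 8-sample frame.
signal = [math.sin(2 * math.pi * 2 * t / 8) for t in range(32)]
features = extract_temporal_features(signal)
```

Each entry of `features` plays the role of one column (one frame) in the temporal features described for Fig. 2.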
Fig. 2 shows a schematic diagram of extracting temporal features from an original time-series file. As shown in Fig. 2, the features extracted from the original time-series file (for example, a music file) are temporal features; each column on the right side of Fig. 2 represents one frame of the temporal features extracted from the file.
Returning to Fig. 1, in step S1200 the state statistics vector of each time-series file is calculated from the extracted temporal features. The state statistics vector characterizes the state statistics of a time-series file, and its elements reflect the state statistics results for that file. The calculation procedure and an example of a state statistics vector are described below.
In step S1300, a feature matrix is constructed from the calculated state statistics vectors of the time-series files. Then, in step S1400, the plurality of time-series files are classified according to this feature matrix; any suitable known classifier may be used. Because the feature matrix is built from the state statistics vectors of the time-series files, it contains the state statistics information of every file. Classifying time-series files with this information is less computationally complex than working with all of the extracted temporal features, yet it retains sufficient statistical feature information for the classification computation. It therefore preserves the reliability of the classification results while reducing computational complexity, enabling fast and accurate classification.
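Step S1400 leaves the choice of classifier open. As one possible illustration, the sketch below classifies a new feature-matrix row with a 1-nearest-neighbor rule. The classifier choice, the function names, and the toy feature values and genre labels are all assumptions made for this example; the patent does not prescribe any of them.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def classify_nearest(train_rows, train_labels, query_row):
    """Return the label of the training row closest to query_row."""
    best = min(range(len(train_rows)),
               key=lambda i: squared_distance(train_rows[i], query_row))
    return train_labels[best]

# Toy feature matrix: one row per labeled file (values are made up).
feature_matrix = [[0.9, 0.1, 0.0],
                  [0.8, 0.2, 0.1],
                  [0.1, 0.7, 0.9]]
genres = ["jazz", "jazz", "rock"]

predicted = classify_nearest(feature_matrix, genres, [0.2, 0.8, 0.8])
```

Any other standard classifier could take the place of `classify_nearest`, since the method only requires that the feature matrix rows serve as the classifier's input.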
Fig. 3 shows a flowchart of calculating the state statistics vector of each time-series file from the extracted temporal features according to one embodiment of the invention. As shown in Fig. 3, step S1200 may include sub-steps S1210 and S1220. In sub-step S1210, the temporal features extracted from each time-series file are clustered. Because it is very unlikely that two frames of a file's temporal features (e.g., MFCC features) are exactly identical, the application simplifies the classification computation by treating similar frames as belonging to the same cluster; each frame of the temporal features is thus clustered for use in the subsequent statistical computation.
Fig. 4 shows a schematic diagram of clustering the temporal features of N time-series files according to one embodiment of the invention. As shown in Fig. 4, the temporal features of each of the time-series files 1 to N comprise multiple frames, each represented by one column on the left side of Fig. 4. Clustering the temporal features assigns each frame to one of a preset number of classes. Any clustering method known in the prior art may be used, such as mean-shift clustering or mini-batch k-means clustering (MiniBatchKMeans). The example in Fig. 4 assumes 10 preset classes, so each frame of each time-series file is assigned to one of classes 1 to 10, as shown on the right side of Fig. 4. Cluster labels thus replace the original features extracted from the time-series files, which simplifies the subsequent statistical computation. Those skilled in the art will appreciate that the preset number of clusters can be chosen according to actual needs and is roughly comparable to the number of time-series file classes. For convenience, the description below assumes 10 preset clusters throughout.
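The clustering in sub-step S1210 can be sketched as follows. The text names mean-shift and mini-batch k-means as options; this example swaps in plain Lloyd's k-means written in pure Python so that it stays self-contained. The function names, the pooling of frames across files, the seed, and the toy two-dimensional frames are assumptions for illustration only.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_labels(frames, n_clusters, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns one 0-based cluster label per frame."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(frames, n_clusters)]
    labels = [0] * len(frames)
    for _ in range(n_iter):
        # Assignment step: each frame goes to its nearest centroid.
        labels = [min(range(n_clusters),
                      key=lambda c: squared_distance(f, centroids[c]))
                  for f in frames]
        # Update step: centroid = mean of its members (unchanged if empty).
        for c in range(n_clusters):
            members = [f for f, lab in zip(frames, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy frames from two files (assumed data), pooled so all files share clusters.
files = {"file_1": [[0.0, 0.1], [0.1, 0.0], [5.0, 5.1]],
         "file_2": [[5.1, 4.9], [0.0, 0.0]]}
pooled = [frame for frames in files.values() for frame in frames]
labels = kmeans_labels(pooled, n_clusters=2)
```

After this step each frame is represented only by its cluster label, which is what the later statistics operate on.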
Returning to Fig. 3, in sub-step S1220 the state statistics vector of each time-series file is calculated from the clustering result of its temporal features. According to one embodiment of the application, sub-step S1220 calculates the state statistics vector of each time-series file using a double-sliding-window statistical model.
Fig. 5 shows a flowchart of calculating the state statistics vector of each time-series file according to one embodiment of the invention. As shown in Fig. 5, sub-step S1220 may include sub-steps S1221 to S1224. In sub-step S1221, the cluster state matrix of each time-series file is generated from the clustering result of its temporal features.
Fig. 6 shows a schematic diagram of generating a cluster state matrix from the clustering result of the temporal features of time-series file 1 shown in Fig. 4. As shown in Fig. 6, the cluster state matrix of time-series file 1 (for example, a music file) is generated from the clustering result of its temporal features. The number of rows of the matrix equals the number of clusters preset when the temporal features were clustered; in the example of Fig. 6, 10 classes were preset, so the cluster state matrix has 10 rows. Each column of the cluster state matrix in Fig. 6 represents the clustering result of one frame of the temporal features of time-series file 1. For example, the first column of the matrix is (1, 0, 0, 0, 0, 0, 0, 0, 0, 0)^T: only the element in the first row is 1 and all other elements are 0, indicating that the first frame was assigned to class 1. The remaining columns are marked in the same way. For example, the third column is (0, 1, 0, 0, 0, 0, 0, 0, 0, 0)^T: only the element in the second row is 1 and all other elements are 0, indicating that the third frame was assigned to class 2. The clustering information of the temporal features of a time-series file is thus represented in matrix form. The cluster state matrix may also be designed differently according to actual needs, as long as it represents the clustering information of the file's temporal features.
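The cluster state matrix described for Fig. 6 is a one-hot encoding of the per-frame cluster labels. A minimal sketch, assuming 0-based labels and a list-of-rows matrix layout (both choices are illustrative, as are the function name and the toy labels):

```python
def cluster_state_matrix(labels, n_clusters):
    """Rows = clusters, columns = frames; entry is 1 where the frame has that label."""
    return [[1 if label == row else 0 for label in labels]
            for row in range(n_clusters)]

# Toy 0-based labels: frames 1 and 2 fall in class 1, frame 3 in class 2, and so on.
labels = [0, 0, 1, 2, 1]
matrix = cluster_state_matrix(labels, n_clusters=10)

first_column = [row[0] for row in matrix]
third_column = [row[2] for row in matrix]
```

With these toy labels, `first_column` and `third_column` reproduce the two example columns given in the text.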
Returning to Fig. 5, in sub-step S1222 multiple row combinations are chosen from the cluster state matrix. Taking the cluster state matrix of Fig. 6 as an example, it has 10 rows, from which multiple row combinations can be chosen for subsequent processing.
Fig. 7 shows an example of a row combination chosen from the cluster state matrix shown in Fig. 6. The row combination in Fig. 7 consists of rows 1 to 3 of the cluster state matrix of Fig. 6. A solid circle in Fig. 7 indicates that the matrix element at that position is 1; positions without a solid circle have the value 0. To simplify the subsequent statistical computation, every row combination may contain the same number of rows. For example, as shown in Fig. 7, each row combination of the cluster state matrix may contain 3 rows, which gives $C_{10}^{3} = 120$ row combinations. According to another embodiment of the invention, if the order of the rows within a combination is taken into account, so that, for example, rows 1-2-3 and rows 3-2-1 are different row combinations, there are $C_{10}^{3} \cdot P_{3}^{3} = 720$ row combinations.
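The two counts of row combinations can be checked by enumeration. The sketch below uses Python's `itertools` with 1-based row numbers to match the figures; the variable names are illustrative.

```python
from itertools import combinations, permutations

n_clusters, n_dims = 10, 3
rows = range(1, n_clusters + 1)

# Unordered choice of 3 rows out of 10.
unordered = list(combinations(rows, n_dims))
# Ordered choice, where (1, 2, 3) and (3, 2, 1) count as different combinations.
ordered = list(permutations(rows, n_dims))

print(len(unordered), len(ordered))
```

The first enumeration corresponds to the embodiment that ignores row order, the second to the embodiment that distinguishes it.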
Returning to Fig. 5, in sub-step S1223 double-sliding-window state statistics are performed in each of the row combinations chosen from the cluster state matrix of each time-series file, to generate the combination state statistics matrix of that file. This yields the state statistics information of each file's temporal features for use in classification. Then, in sub-step S1224, the combination state statistics matrix of each time-series file is flattened to obtain the state statistics vector of that file. The double-sliding-window state statistics, the combination state statistics matrix, and the flattening process are described in detail below.
Fig. 8 shows a flowchart of performing double-sliding-window state statistics in each of a plurality of row combinations to generate the combination state statistics matrix of a time-series file, according to one embodiment of the invention. As shown in Fig. 8, sub-step S1223 may include sub-steps S1223a to S1223e. In sub-step S1223a, an outer window is set in each row combination, and an inner window is set within the outer window.
Fig. 9 shows an example of setting an outer window and an inner window in the row combination shown in Fig. 7. As shown in Fig. 9, an outer window Wout is set in row combination 123. To capture the statistics within this row combination, the height of the outer window Wout must cover every row of the combination (3 rows in this example). Its length should be greater than the number of rows in the row combination and less than the total length of the temporal features; those skilled in the art can set the length of Wout according to actual needs, for example to 20. As also shown in Fig. 9, an inner window Winner is set within the outer window Wout. The height of Winner likewise covers every row of the combination and, to simplify the subsequent statistical computation, its length is also 3.
Returning to Fig. 8, in sub-step S1223b the inner window Winner is slid within the outer window Wout to obtain multiple state observations. With the outer window Wout held in place, each time the inner window Winner slides one cell within Wout, one state observation of the time-series file is obtained. A state observation is a vector that characterizes the values of the cluster state matrix elements covered by the inner window. For example, the state observation obtained from Winner in Fig. 9 is (1, 1, 0)^T: the 1s in the first and second positions indicate that a solid circle (a cluster state matrix element equal to 1) appears in the first and second rows of the inner window, and the 0 in the third position indicates that no solid circle appears in its third row. In other words, if a solid circle appears in a given row of the inner window, the corresponding element of the state observation vector is 1; otherwise it is 0. A state observation of Winner thus characterizes the state of the file's temporal features over a few frames. In the example of Fig. 9, a state observation can take the following 8 possible values:
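$$\begin{pmatrix}0\\0\\0\end{pmatrix},\ \begin{pmatrix}0\\0\\1\end{pmatrix},\ \begin{pmatrix}0\\1\\0\end{pmatrix},\ \begin{pmatrix}1\\0\\0\end{pmatrix},\ \begin{pmatrix}0\\1\\1\end{pmatrix},\ \begin{pmatrix}1\\0\\1\end{pmatrix},\ \begin{pmatrix}1\\1\\0\end{pmatrix},\ \begin{pmatrix}1\\1\\1\end{pmatrix}$$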
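The inner-window observation of sub-step S1223b amounts to a per-row logical OR over the columns the inner window covers. The sketch below illustrates this under stated assumptions: the function names, the one-frame sliding step, and the toy rows are illustrative and not taken from the patent.

```python
def state_observation(rows, start, inner_len):
    """Per-row OR of the cells covered by the inner window starting at `start`."""
    return tuple(1 if any(row[start:start + inner_len]) else 0 for row in rows)

def observations_in_outer_window(rows, outer_start, outer_len, inner_len):
    """Slide the inner window one frame at a time inside a fixed outer window."""
    last_start = outer_start + outer_len - inner_len
    return [state_observation(rows, s, inner_len)
            for s in range(outer_start, last_start + 1)]

# Toy row combination (3 rows of a cluster state matrix, 6 frames).
rows = [[1, 0, 0, 0, 0, 1],
        [0, 1, 1, 0, 0, 0],
        [0, 0, 0, 1, 0, 0]]

observations = observations_in_outer_window(rows, outer_start=0,
                                            outer_len=6, inner_len=3)
```

The first observation here is (1, 1, 0), the same value the text reads off Fig. 9.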
Returning to Fig. 8, in sub-step S1223c the state statistics value of the outer window is computed from the multiple state observations obtained by sliding the inner window within it. As described in the example above, each state observation of the inner window takes one of 8 possible values. Sliding the inner window within the outer window yields multiple state observations, which are then tallied: each of the 8 possible state observations either occurred or did not occur. The result of this tally, the state statistics value, therefore has $2^8 = 256$ possible outcomes, and the state statistics value of the outer window is one of them.
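One way to represent the outer window's state statistics value is as an 8-bit mask, with one bit per possible observation. This encoding is an assumption chosen for illustration; the text only states that there are 256 possible outcomes. The names `ALL_OBSERVATIONS` and `outer_window_statistic` and the toy observations are likewise hypothetical.

```python
from itertools import product

# The 8 possible observation vectors for a 3-row combination.
ALL_OBSERVATIONS = list(product((0, 1), repeat=3))

def outer_window_statistic(observations):
    """Integer in [0, 255]: bit k is set iff ALL_OBSERVATIONS[k] occurred."""
    seen = set(observations)
    return sum(1 << k for k, obs in enumerate(ALL_OBSERVATIONS) if obs in seen)

# Toy observations collected by the inner window inside one outer window.
observed = [(1, 1, 0), (0, 1, 1), (0, 1, 1), (1, 0, 1)]
value = outer_window_statistic(observed)
```

Repeated observations do not change the value, because the statistic records only whether each observation occurred, not how often.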
In sub-step S1223d, the outer window is slid within the row combination to collect multiple state statistics values. Continuing the example, each position of the outer window yields one state statistics value, which is one of the 256 possible outcomes above. Sliding the outer window repeatedly within the row combination therefore yields multiple state statistics values (each time the outer window slides, the inner window must complete its own sliding pass within it so that the state statistics value can be computed from the resulting state observations). The number of times each of the 256 possible outcomes occurs among these state statistics values is then counted. Table 1 below shows an example of the result of tallying the state statistics values obtained in one row combination. Each row of Table 1 represents one state; within a row, a 0 indicates that the corresponding vector did not occur among the state observations making up the outer window's state statistics value, and a 1 indicates that it did. The rightmost column of Table 1 gives the number of times each state occurs among the state statistics values.
Table 1
The benefit of the double-sliding-window statistical model is as follows. With a single sliding window, only one cluster can take the value 1 at any given frame of the temporal features (see Fig. 6), so the state features collected are too rigid. With two sliding windows, the statistics are richer and more comprehensive: for each frame, its various possible states are all tallied, rather than exactly one state being 1 and all others 0. The double-sliding-window statistical model therefore both simplifies the original temporal features through clustering, which improves processing speed, and retains sufficient statistical information to ensure correct classification.
In sub-step S1223e, the state statistics of the multiple row combinations are put together to form the combination state statistics matrix of the time-series file. Continuing the example, Fig. 10 shows an example of the combination state statistics matrix according to this embodiment. As shown in Fig. 10, the state statistics of the multiple row combinations of one time-series file are put together to form the combination state statistics matrix of that file. Each row of this matrix consists of the state statistics of one row combination of the file.
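Sub-steps S1223a to S1223e can be put together in a short sketch: for every row combination, slide the outer window, compute its state statistics value from the inner-window observations, and count how often each of the possible values occurs. The bit-mask encoding of the statistics value, the one-frame step for both windows, the function name, and the toy sizes (4 clusters, outer length 5) are assumptions for illustration.

```python
from itertools import combinations, product

def combination_state_matrix(cluster_matrix, n_dims=3, outer_len=20, inner_len=3):
    """One histogram row (over all possible statistics values) per row combination."""
    obs_index = {obs: k for k, obs in enumerate(product((0, 1), repeat=n_dims))}
    n_frames = len(cluster_matrix[0])
    result = []
    for combo in combinations(range(len(cluster_matrix)), n_dims):
        rows = [cluster_matrix[r] for r in combo]
        histogram = [0] * (2 ** len(obs_index))
        # Slide the outer window one frame at a time over the row combination.
        for outer_start in range(n_frames - outer_len + 1):
            value = 0
            # Slide the inner window inside the current outer window.
            for s in range(outer_start, outer_start + outer_len - inner_len + 1):
                obs = tuple(1 if any(row[s:s + inner_len]) else 0 for row in rows)
                value |= 1 << obs_index[obs]
            histogram[value] += 1
        result.append(histogram)
    return result

# Toy input: 8 frames, 4 clusters, 0-based labels.
labels = [0, 0, 1, 2, 1, 3, 0, 2]
cluster_matrix = [[1 if lab == r else 0 for lab in labels] for r in range(4)]
matrix = combination_state_matrix(cluster_matrix, n_dims=3, outer_len=5, inner_len=3)
```

With these toy sizes the result has one row per 3-row combination (4 rows) and 256 columns, and each row sums to the number of outer-window positions.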
After sub-steps S1223a to S1223e are completed, sub-step S1224 is carried out. Taking the combination state statistics matrix shown in Fig. 10 as an example, flattening this matrix gives the following state statistics vector for the time-series file:
(203, ..., 2901, 127, ..., 321, 29, ..., 92, ... ..., 231, ..., 102)
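Flattening in sub-step S1224 simply concatenates the rows of the combination state statistics matrix. A minimal sketch, with a made-up 3 x 2 matrix and an illustrative function name:

```python
def flatten(matrix):
    """Concatenate the rows of a matrix into a single vector (row-major order)."""
    return [value for row in matrix for value in row]

# Toy combination state statistics matrix (values are made up).
toy_matrix = [[4, 0],
              [1, 3],
              [2, 2]]
state_vector = flatten(toy_matrix)
```

The length of the resulting vector is the number of rows times the number of columns of the matrix.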
Fig. 11 shows a flowchart of constructing a feature matrix from the state statistics vectors of a plurality of time-series files according to one embodiment of the invention. As shown in Fig. 11, step S1300 may include sub-steps S1310 and S1320. In sub-step S1310, the state statistics vectors of the multiple time-series files are combined into a state statistics matrix. Continuing the example, the state statistics matrix M has dimension m × n, where m is the number of time-series files and n is the length of each state statistics vector. With nDims denoting the number of rows in each row combination, n is the product of two factors: the size of the state statistics result of one row combination (that is, one row in Fig. 10) and the number of row combinations of a time-series file. An example of the state statistics matrix M is as follows:
$$M_{mn} = \begin{pmatrix} M_{11} & \cdots & M_{1n} \\ \vdots & M_{ij} & \vdots \\ M_{m1} & \cdots & M_{mn} \end{pmatrix}$$
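The sketch below stacks per-file state statistics vectors into the matrix M and works out the vector length n for the running example. The closed form used for n is an assumption inferred from the example (256 possible statistics values per row combination, and 120 unordered 3-row combinations out of 10 rows); the function names and the toy vectors are hypothetical.

```python
import math

def vector_length(n_clusters=10, n_dims=3):
    """Assumed length n of a state statistics vector in the running example."""
    per_combination = 2 ** (2 ** n_dims)             # 256 possible statistics values
    n_combinations = math.comb(n_clusters, n_dims)   # 120 unordered row combinations
    return per_combination * n_combinations

def build_statistic_matrix(vectors):
    """Stack equal-length state statistics vectors, one row per file."""
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all state statistics vectors must have the same length")
    return [list(v) for v in vectors]

n = vector_length()
M = build_statistic_matrix([[1, 2, 3], [4, 5, 6]])   # toy vectors, m = 2 files
```

Under these assumptions n comes to 256 × 120 = 30720 for 10 clusters and 3-row combinations.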
In sub-step S1320, a weight transformation is applied to the state statistics matrix to form the feature matrix. According to one embodiment of the application, the weight transformation is performed by calculating the difference between each element of the state statistics matrix and the other elements of its column.
According to another embodiment, the time-series files are music files. For music files, the weight transformation can be performed by calculating the ratio of the entropy of each element of the state statistics matrix to the marginal entropy of its column. For example, for the state statistics matrix M, the entropy of its element $M_{ij}$ can be calculated by formula (1) below, and the marginal entropy of a column of M by formula (2) below.
$$H(M_{ij}) = -\frac{M_{ij}+1}{\sum_{k=1}^{m} M_{kj}+2}\cdot\log\left(\frac{M_{ij}+1}{\sum_{k=1}^{m} M_{kj}+2}\right) - \frac{\sum_{k=1}^{m} M_{kj}-M_{ij}+1}{\sum_{k=1}^{m} M_{kj}+2}\cdot\log\left(\frac{\sum_{k=1}^{m} M_{kj}-M_{ij}+1}{\sum_{k=1}^{m} M_{kj}+2}\right) \tag{1}$$
$$H(M_{\cdot j}) = \sum_{o=1}^{m} \frac{M_{oj}+1}{\sum_{k=1}^{m} M_{kj}+2}\cdot\log\left(\frac{M_{oj}+1}{\sum_{k=1}^{m} M_{kj}+2}\right) \tag{2}$$
The ratio of $H(M_{ij})$ to $H(M_{\cdot j})$ can then be used as the element $F_{ij}$ of the feature matrix, as expressed by formula (3) below.
$$F_{ij} = \frac{H(M_{ij})}{H(M_{\cdot j})} \tag{3}$$
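Formulas (1) to (3) can be sketched directly in code. The version below implements them as printed, including formula (2) without a leading minus sign, so every ratio comes out negative; whether a sign was lost in the printed formula is not something the text settles. The natural logarithm is an assumption, although the ratio in formula (3) does not depend on the base. The function name and the toy counts are illustrative.

```python
import math

def entropy_weight(M):
    """Feature matrix F with F[i][j] = H(M_ij) / H(M_.j), per formulas (1)-(3)."""
    m, n = len(M), len(M[0])
    F = [[0.0] * n for _ in range(m)]
    for j in range(n):
        denom = sum(M[k][j] for k in range(m)) + 2
        # Formula (2), implemented as printed (no leading minus sign).
        h_col = sum(((M[o][j] + 1) / denom) * math.log((M[o][j] + 1) / denom)
                    for o in range(m))
        for i in range(m):
            p = (M[i][j] + 1) / denom             # smoothed share of element M_ij
            q = (denom - 2 - M[i][j] + 1) / denom  # smoothed share of the rest
            h_ij = -p * math.log(p) - q * math.log(q)   # formula (1)
            F[i][j] = h_ij / h_col                       # formula (3)
    return F

# Toy state statistics matrix: 3 files, 3 statistics columns (made-up counts).
M = [[3, 0, 2],
     [1, 4, 2],
     [0, 1, 5]]
F = entropy_weight(M)
```

For non-negative counts both smoothed shares stay strictly positive, so no logarithm of zero occurs.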
The feature matrix F is thus obtained. It serves as the input features with which the classifier classifies the music files. Classifying music files with the feature matrix F is less computationally complex than working with the original extracted temporal audio features, yet it retains sufficient statistical feature information for the classification computation. It therefore preserves the reliability of the classification results while reducing computational complexity, enabling fast and accurate classification.
Fig. 12 shows a block diagram of a classification system for time-series files according to one embodiment of the invention. As shown in Fig. 12, the classification system 1200 can include a feature extraction device 1210, a computing device 1220, a matrix construction device 1230, and a classifier 1240. The feature extraction device 1210 can extract temporal features from a plurality of time-series files. The computing device 1220 can calculate the state statistics vector of each time-series file from the temporal features extracted by the feature extraction device 1210, wherein the elements of the state statistics vector reflect the state statistics results of the corresponding time-series file. The matrix construction device 1230 can construct a feature matrix from the state statistics vectors of the plurality of time-series files calculated by the computing device 1220. The classifier 1240 can classify the plurality of time-series files according to the feature matrix constructed by the matrix construction device 1230.
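The data flow between the four components of system 1200 can be sketched as a small pipeline object. The class name, the field names, and the stub stages are hypothetical and exist only to show how the output of each device feeds the next; they are not an implementation of the devices themselves.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ClassificationSystem:
    extract_features: Callable      # role of feature extraction device 1210
    compute_vectors: Callable       # role of computing device 1220
    build_feature_matrix: Callable  # role of matrix construction device 1230
    classify: Callable              # role of classifier 1240

    def run(self, files):
        features = [self.extract_features(f) for f in files]
        vectors = self.compute_vectors(features)
        matrix = self.build_feature_matrix(vectors)
        return self.classify(matrix)

# Stub stages (assumed, trivial): they only demonstrate the data flow.
system = ClassificationSystem(
    extract_features=lambda f: [[float(x)] for x in f],
    compute_vectors=lambda feats: [[len(frames), sum(fr[0] for fr in frames)]
                                   for frames in feats],
    build_feature_matrix=lambda vectors: vectors,
    classify=lambda matrix: ["long" if row[0] > 2 else "short" for row in matrix],
)
result = system.run([[1, 2, 3], [4, 5]])
```

The computing stage receives the features of all files at once, reflecting that clustering is performed over the temporal features of the whole set of files.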
Figure 13 shows the block diagram calculating device according to one embodiment of the present invention.Such as Figure 13 Shown in, calculate device 1220 and can include cluster cell 1221 and computing unit 1222.Cluster cell The temporal aspect that feature deriving means 1210 extracts from each sequential file can be gathered by 1221 Class.Computing unit 1222 can be according to poly-to the temporal aspect of each sequential file of cluster cell 1221 Class result calculates the statistic vector of each sequential file.
According to an embodiment of the invention, computing unit 1222 can be added up based on double sliding windows Model calculates the statistic vector of each sequential file.
Figure 14 shows the block diagram of the computing unit according to one embodiment of the present invention.Such as Figure 14 Shown in, computing unit 1222 can include matrix generate subelement 1222a, combination subelement 1222b, Statistics subelement 1222c and smooth beggar unit 1222d.Matrix generates subelement 1222a can basis Cluster cell 1221 generates the poly-of sequential file to the cluster result of the temporal aspect of each sequential file Class state matrix.Combination subelement 1222b can generate, from matrix, the cluster shape that subelement 1222a generates State matrix is chosen the combination of multiple row.Statistics subelement 1222c can be selected by combination subelement 1222b The multiple row combination taken carries out double sliding window statistic, respectively to generate the combination of sequential file Statistic matrix.This assembled state statistical matrix can be planarized by planarization subelement 1222d, with Obtain the statistic vector of sequential file.
Figure 15 shows the block diagram of the statistics subelement according to one embodiment of the present invention.Such as Figure 15 Shown in, statistics subelement 1222c can include window controlling module 1222c1, logging modle 1222c2, Statistical module 1222c3 and matrix constitute module 1222c4.Window controlling module 1222c1 can be each Row combination arranges outer window, set outer window arranges interior window, and controls this interior window Slip in window outside and this outer window be expert at combination in slip.Logging modle 1222c2 can The State Viewpoint measured value of window in record.Statistical module 1222c3 can be according to logging modle 1222c2 record Multiple State Viewpoint measured values carry out statistic behavior statistical value.Matrix constitutes module 1222c4 can be by multiple row The statistic value of combination constitutes combinations thereof statistic matrix.
According to an embodiment of the invention, matrix generates each sequential that subelement 1222a generates The line number of the cluster state matrix of file is equal to when temporal aspect is clustered by cluster cell 1221 The quantity of cluster preset, and the every string clustering state matrix all represents that the sequential of sequential file is special The cluster result of the frame in levying.
Figure 16 shows the block diagram of the matrix construction device according to one embodiment of the present invention.Such as figure Shown in 16, matrix construction device 1230 can include that matrix forms unit 1231 and weight converting unit 1232.Matrix forms unit 1231 can will calculate the state of multiple sequential files that device 1220 calculates Statistical vector is combined into statistic matrix.Weight converting unit 1232 can form unit to matrix The 1231 statistic matrixes formed carry out weight conversion, to form eigenmatrix.
According to one embodiment of the invention, weight conversion unit 1232 may perform the weight conversion by calculating the difference between each element of the state statistics matrix and the other elements of the column in which it lies.
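The sentence above does not say how the differences to the other elements are combined. One simple interpretation, shown here purely as an assumption, subtracts from each element the mean of the other elements in the same column. The sketch also assumes that each row of the state statistics matrix holds the vector of one file; `difference_weight` is a hypothetical name.

```python
def difference_weight(matrix):
    """Replace each element by its difference from the mean of the other
    elements in its column (one assumed reading of the weight conversion)."""
    n_rows = len(matrix)
    if n_rows < 2:
        raise ValueError("at least two rows are needed to compare elements")
    column_sums = [sum(column) for column in zip(*matrix)]
    return [
        [value - (column_sums[j] - value) / (n_rows - 1)
         for j, value in enumerate(row)]
        for row in matrix
    ]


# Three files (rows), two statistics each (columns).
state_matrix = [[1.0, 4.0],
                [3.0, 0.0],
                [2.0, 2.0]]
feature_matrix = difference_weight(state_matrix)
```

Under this reading a statistic that is typical for its column maps to a value near zero, while one that stands out from the other files keeps a large magnitude.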
According to another embodiment of the invention, the multiple sequential files may be multiple music files. In that case, weight conversion unit 1232 may perform the weight conversion by calculating the ratio of the entropy of each element of the state statistics matrix to the marginal entropy of the column in which it lies.
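The entropy of a single element is not defined in this passage, so the Python sketch below fixes one plausible reading as an assumption: each column is normalized into a probability distribution, the entropy of an element is its term -p log p, and the marginal entropy of the column is the sum of those terms. Non-negative matrix entries are assumed, and `entropy_ratio_weight` is a hypothetical name.

```python
import math


def entropy_ratio_weight(matrix):
    """Weight each element by (its entropy term) / (entropy of its column),
    with columns normalized to probabilities (assumed interpretation)."""
    weighted_columns = []
    for column in zip(*matrix):
        if any(value < 0 for value in column):
            raise ValueError("entries must be non-negative")
        total = sum(column)
        terms = [
            -(value / total) * math.log(value / total)
            if total > 0 and value > 0 else 0.0
            for value in column
        ]
        column_entropy = sum(terms)
        weighted_columns.append(
            [term / column_entropy if column_entropy > 0 else 0.0
             for term in terms]
        )
    # Transpose back so that rows are files again.
    return [list(row) for row in zip(*weighted_columns)]


# Two files (rows), two statistics each (columns).
state_matrix = [[1.0, 0.0],
                [1.0, 2.0]]
weights = entropy_ratio_weight(state_matrix)
```

In this sketch a column whose mass sits entirely in one file has zero entropy, and all of its weights are set to zero rather than dividing by zero.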
It should also be noted that each component of the above system may be configured by software, firmware, hardware, or a combination thereof. The specific means or manner of configuration are well known to those skilled in the art and are not described here. When implemented by software or firmware, a program constituting the software is installed from a storage medium or a network onto a computer having a dedicated hardware structure (for example, the general-purpose computer 1700 shown in Figure 17); with the various programs installed, the computer can perform various functions.
Figure 17 shows a schematic block diagram of a computer that can be used to implement the method and system according to embodiments of the present invention.
In Figure 17, a central processing unit (CPU) 1701 performs various processes according to a program stored in read-only memory (ROM) 1702 or a program loaded from storage section 1708 into random access memory (RAM) 1703. RAM 1703 also stores, as needed, the data required when CPU 1701 performs the various processes. CPU 1701, ROM 1702, and RAM 1703 are connected to one another via bus 1704. Input/output interface 1705 is also connected to bus 1704.
The following components are connected to input/output interface 1705: input section 1706 (including a keyboard, a mouse, etc.), output section 1707 (including a display, such as a cathode ray tube (CRT) or liquid crystal display (LCD), and a speaker, etc.), storage section 1708 (including a hard disk, etc.), and communication section 1709 (including a network interface card such as a LAN card, a modem, etc.). Communication section 1709 performs communication processing via a network such as the Internet. Drive 1710 may also be connected to input/output interface 1705 as needed. Removable medium 1711, such as a magnetic disk, optical disc, magneto-optical disk, or semiconductor memory, is mounted on drive 1710 as needed, so that a computer program read from it is installed into storage section 1708 as needed.
When the above series of processes is implemented by software, the program constituting the software is installed from a network such as the Internet or from a storage medium such as removable medium 1711.
Those skilled in the art will understand that this storage medium is not limited to the removable medium 1711 shown in Figure 17, which stores the program and is distributed separately from the device in order to provide the program to the user. Examples of removable medium 1711 include magnetic disks (including floppy disks (registered trademark)), optical discs (including compact disc read-only memory (CD-ROM) and digital versatile discs (DVD)), magneto-optical disks (including MiniDisc (MD) (registered trademark)), and semiconductor memory. Alternatively, the storage medium may be ROM 1702, a hard disk contained in storage section 1708, or the like, in which the program is stored and which is distributed to the user together with the device containing it.
The present invention also proposes a program product storing machine-readable instruction code. When the instruction code is read and executed by a machine, the above method according to embodiments of the invention can be performed.
Accordingly, a storage medium carrying the above program product storing machine-readable instruction code is also included within the scope of the present invention. The storage medium includes, but is not limited to, floppy disks, optical discs, magneto-optical disks, memory cards, memory sticks, and the like.
It should be noted that the method of the present invention is not limited to being performed in the chronological order described in the specification; it may also be performed in another order, in parallel, or independently. Therefore, the order of execution of the method described in this specification does not limit the technical scope of the present invention.
The above description of the embodiments is intended to aid understanding of the invention; it is exemplary only and is not intended to limit the invention. It should be noted that, in the above description, features described and/or illustrated for one embodiment may be used in the same or a similar way in one or more other embodiments, combined with features of other embodiments, or substituted for features of other embodiments. Those skilled in the art will understand that variations and modifications made to the embodiments described above without departing from the inventive concept fall within the scope of the invention.
In summary, embodiments according to the present invention provide the following technical schemes.
Scheme 1. A classification method for sequential files, comprising:
extracting temporal features from multiple sequential files;
calculating a state statistics vector for each sequential file from the extracted temporal features, wherein the elements of the state statistics vector reflect the state statistics results of the corresponding sequential file;
constructing a feature matrix from the state statistics vectors of the multiple sequential files; and
classifying the multiple sequential files according to the feature matrix.
Scheme 2. The method of scheme 1, wherein calculating the state statistics vector of each sequential file from the extracted temporal features comprises:
clustering the temporal features extracted from each sequential file; and
calculating the state statistics vector of each sequential file from the clustering result of the temporal features of that file.
Scheme 3. The method of scheme 2, wherein the state statistics vector of each sequential file is calculated based on a double sliding window statistical model.
Scheme 4. The method of scheme 2, wherein the state statistics vector of each sequential file is calculated by the following steps:
generating a cluster state matrix of the sequential file from the clustering result of the temporal features of each sequential file;
selecting multiple row combinations from the cluster state matrix;
performing double sliding window state statistics in each of the multiple row combinations to generate a combined state statistics matrix of the sequential file; and
flattening the combined state statistics matrix to obtain the state statistics vector of the sequential file.
Scheme 5. The method of scheme 4, wherein performing double sliding window state statistics in each of the multiple row combinations to generate the combined state statistics matrix of the sequential file comprises:
setting an outer window in each row combination, and setting an inner window within the outer window;
sliding the inner window within the outer window to obtain multiple state observation values;
computing a state statistic value from the multiple state observation values;
sliding the outer window along the row combination to compute multiple state statistic values; and
forming the combined state statistics matrix of the sequential file from the state statistic values of the multiple row combinations.
Scheme 6. The method of scheme 4 or 5, wherein the number of rows of the cluster state matrix of each sequential file equals the number of clusters preset when clustering the temporal features, and each column of the cluster state matrix represents the clustering result of one frame of the temporal features of the sequential file.
Scheme 7. The method of any one of schemes 1-6, wherein constructing the feature matrix from the state statistics vectors of the multiple sequential files comprises:
combining the state statistics vectors of the multiple sequential files into a state statistics matrix; and
applying weight conversion to the state statistics matrix to form the feature matrix.
Scheme 8. The method of scheme 7, wherein the weight conversion is performed by calculating the difference between each element of the state statistics matrix and the other elements of the column in which it lies.
Scheme 9. The method of scheme 7, wherein the multiple sequential files are multiple music files.
Scheme 10. The method of scheme 9, wherein the weight conversion is performed by calculating the ratio of the entropy of each element of the state statistics matrix to the marginal entropy of the column in which it lies.
Scheme 11. A classification system for sequential files, comprising:
a feature extraction device that extracts temporal features from multiple sequential files;
a calculation device that calculates a state statistics vector for each sequential file from the temporal features extracted by the feature extraction device, wherein the elements of the state statistics vector reflect the state statistics results of the corresponding sequential file;
a matrix construction device that constructs a feature matrix from the state statistics vectors of the multiple sequential files calculated by the calculation device; and
a classifier that classifies the multiple sequential files according to the feature matrix constructed by the matrix construction device.
Scheme 12. The system of scheme 11, wherein the calculation device comprises:
a clustering unit that clusters the temporal features extracted by the feature extraction device from each sequential file; and
a computing unit that calculates the state statistics vector of each sequential file from the clustering unit's clustering result for the temporal features of that file.
Scheme 13. The system of scheme 12, wherein the computing unit calculates the state statistics vector of each sequential file based on a double sliding window statistical model.
Scheme 14. The system of scheme 12, wherein the computing unit comprises:
a matrix generation subunit that generates the cluster state matrix of the sequential file from the clustering unit's clustering result for the temporal features of each sequential file;
a combination subunit that selects multiple row combinations from the cluster state matrix generated by the matrix generation subunit;
a statistics subunit that performs double sliding window state statistics in each of the multiple row combinations selected by the combination subunit to generate the combined state statistics matrix of the sequential file; and
a flattening subunit that flattens the combined state statistics matrix to obtain the state statistics vector of the sequential file.
Scheme 15. The system of scheme 14, wherein the statistics subunit comprises:
a window control module that sets an outer window in each row combination, sets an inner window within the outer window, and controls the sliding of the inner window within the outer window and the sliding of the outer window along the row combination;
a recording module that records the state observation values of the inner window;
a statistics module that computes state statistic values from the multiple state observation values recorded by the recording module; and
a matrix composition module that forms the combined state statistics matrix from the state statistic values of the multiple row combinations.
Scheme 16. The system of scheme 14 or 15, wherein the number of rows of the cluster state matrix that the matrix generation subunit generates for each sequential file equals the number of clusters preset when the clustering unit clusters the temporal features, and each column of the cluster state matrix represents the clustering result of one frame of the temporal features of the sequential file.
Scheme 17. The system of any one of schemes 11-16, wherein the matrix construction device comprises:
a matrix forming unit that combines the state statistics vectors of the multiple sequential files calculated by the calculation device into a state statistics matrix; and
a weight conversion unit that applies weight conversion to the state statistics matrix formed by the matrix forming unit to form the feature matrix.
Scheme 18. The system of scheme 17, wherein the weight conversion unit performs the weight conversion by calculating the difference between each element of the state statistics matrix and the other elements of the column in which it lies.
Scheme 19. The system of scheme 17, wherein the multiple sequential files are multiple music files.
Scheme 20. The system of scheme 19, wherein the weight conversion unit performs the weight conversion by calculating the ratio of the entropy of each element of the state statistics matrix to the marginal entropy of the column in which it lies.
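The schemes above leave the type of classifier open. As a hedged illustration of the final step only, the Python sketch below classifies rows of a feature matrix with a nearest-centroid rule. The choice of classifier, the use of labeled training rows, and the name `nearest_centroid` are all assumptions made for this example and are not taken from the source.

```python
def nearest_centroid(train_rows, train_labels, rows):
    """Assign each row to the label whose training centroid is closest
    in squared Euclidean distance (a stand-in classifier, not the source's)."""
    sums, counts = {}, {}
    for row, label in zip(train_rows, train_labels):
        accumulator = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            accumulator[j] += value
        counts[label] = counts.get(label, 0) + 1
    centroids = {
        label: [total / counts[label] for total in accumulator]
        for label, accumulator in sums.items()
    }

    def squared_distance(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(centroids, key=lambda label: squared_distance(row, centroids[label]))
        for row in rows
    ]


# Rows of a toy feature matrix with known classes, then two rows to classify.
train_rows = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
train_labels = ["a", "a", "b", "b"]
predicted = nearest_centroid(train_rows, train_labels, [[1.0, 0.0], [5.0, 6.0]])
```

Any classifier that accepts one fixed-length row per file could take the place of this rule, since the feature matrix already gives every file a vector of the same length.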

Claims (10)

1. A classification method for sequential files, comprising:
extracting temporal features from multiple sequential files;
calculating a state statistics vector for each sequential file from the extracted temporal features, wherein the elements of the state statistics vector reflect the state statistics results of the corresponding sequential file;
constructing a feature matrix from the state statistics vectors of the multiple sequential files; and
classifying the multiple sequential files according to the feature matrix.
2. The method of claim 1, wherein calculating the state statistics vector of each sequential file from the extracted temporal features comprises:
clustering the temporal features extracted from each sequential file; and
calculating the state statistics vector of each sequential file from the clustering result of the temporal features of that file.
3. The method of claim 2, wherein the state statistics vector of each sequential file is calculated based on a double sliding window statistical model.
4. The method of claim 2, wherein the state statistics vector of each sequential file is calculated by the following steps:
generating a cluster state matrix of the sequential file from the clustering result of the temporal features of each sequential file;
selecting multiple row combinations from the cluster state matrix;
performing double sliding window state statistics in each of the multiple row combinations to generate a combined state statistics matrix of the sequential file; and
flattening the combined state statistics matrix to obtain the state statistics vector of the sequential file.
5. The method of claim 4, wherein performing double sliding window state statistics in each of the multiple row combinations to generate the combined state statistics matrix of the sequential file comprises:
setting an outer window in each row combination, and setting an inner window within the outer window;
sliding the inner window within the outer window to obtain multiple state observation values;
computing a state statistic value from the multiple state observation values;
sliding the outer window along the row combination to compute multiple state statistic values; and
forming the combined state statistics matrix of the sequential file from the state statistic values of the multiple row combinations.
6. The method of claim 4, wherein the number of rows of the cluster state matrix of each sequential file equals the number of clusters preset when clustering the temporal features, and each column of the cluster state matrix represents the clustering result of one frame of the temporal features of the sequential file.
7. The method of any one of claims 1-6, wherein constructing the feature matrix from the state statistics vectors of the multiple sequential files comprises:
combining the state statistics vectors of the multiple sequential files into a state statistics matrix; and
applying weight conversion to the state statistics matrix to form the feature matrix.
8. The method of claim 7, wherein the weight conversion is performed by calculating the difference between each element of the state statistics matrix and the other elements of the column in which it lies.
9. The method of claim 7, wherein the multiple sequential files are multiple music files, and the weight conversion is performed by calculating the ratio of the entropy of each element of the state statistics matrix to the marginal entropy of the column in which it lies.
10. A classification system for sequential files, comprising:
a feature extraction device that extracts temporal features from multiple sequential files;
a calculation device that calculates a state statistics vector for each sequential file from the temporal features extracted by the feature extraction device, wherein the elements of the state statistics vector reflect the state statistics results of the corresponding sequential file;
a matrix construction device that constructs a feature matrix from the state statistics vectors of the multiple sequential files calculated by the calculation device; and
a classifier that classifies the multiple sequential files according to the feature matrix constructed by the matrix construction device.
CN201510232775.6A 2015-05-08 2015-05-08 The sorting technique of sequential file and categorizing system Pending CN106202128A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510232775.6A CN106202128A (en) 2015-05-08 2015-05-08 The sorting technique of sequential file and categorizing system

Publications (1)

Publication Number Publication Date
CN106202128A true CN106202128A (en) 2016-12-07

Family

ID=57459878

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510232775.6A Pending CN106202128A (en) 2015-05-08 2015-05-08 The sorting technique of sequential file and categorizing system

Country Status (1)

Country Link
CN (1) CN106202128A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101599271A (en) * 2009-07-07 2009-12-09 华中科技大学 A kind of recognition methods of digital music emotion
CN102129456A (en) * 2011-03-09 2011-07-20 天津大学 Method for monitoring and automatically classifying music factions based on decorrelation sparse mapping
CN102842310A (en) * 2012-08-10 2012-12-26 上海协言科学技术服务有限公司 Method for extracting and utilizing audio features for repairing Chinese national folk music audios

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110134839A (en) * 2019-03-27 2019-08-16 平安科技(深圳)有限公司 Time series data characteristic processing method, apparatus and computer readable storage medium
CN110134839B (en) * 2019-03-27 2023-06-06 平安科技(深圳)有限公司 Time sequence data characteristic processing method and device and computer readable storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20161207

WD01 Invention patent application deemed withdrawn after publication