CN112131322A - Time series classification method and device - Google Patents
- Publication number
- CN112131322A CN202011003407.1A
- Authority
- CN
- China
- Prior art keywords
- classified
- sliding window
- time series
- subset
- subsequences
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/70—Game security or game management aspects
- A63F13/79—Game security or game management aspects involving player-related data, e.g. identities, accounts, preferences or play histories
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Computer Security & Cryptography (AREA)
- General Business, Economics & Management (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiments of the application provide a time series classification method and device. The time series classification method comprises: acquiring a time series to be classified and a classified time series set, wherein the classified time series set comprises a plurality of classified time series and the category of each classified time series, and each classified time series comprises a plurality of first subsequences obtained by dividing it according to a target sliding window length; dividing the time series to be classified according to the target sliding window length to obtain a plurality of second subsequences; calculating the similarity between the time series to be classified and each classified time series according to the plurality of second subsequences and the plurality of first subsequences contained in each classified time series; and determining the category of the time series to be classified according to the similarity between the time series to be classified and each classified time series. The technical solution of the embodiments of the application can improve classification accuracy.
Description
Technical Field
The application relates to the technical field of data mining, in particular to a time series classification method and device.
Background
A time series is an ordered sequence formed by arranging the values of a phenomenon or statistical index at different time points in chronological order. Time series classification has long been a major concern for researchers in time series data mining. However, existing time series classification algorithms usually assume that the time series data to be classified has already been preprocessed, whereas real-world time series are often of unequal length and contain missing and abnormal values. As a result, these algorithms are difficult to apply simply and effectively to data from real environments, and their classification accuracy is difficult to guarantee.
Disclosure of Invention
The embodiment of the application provides a time series classification method and device, and classification accuracy can be improved at least to a certain extent.
Other features and advantages of the present application will be apparent from the following detailed description, or may be learned by practice of the application.
According to an aspect of an embodiment of the present application, there is provided a time series classification method, including: acquiring a time sequence to be classified and a classified time sequence set, wherein the classified time sequence set comprises a plurality of classified time sequences and the classes of the classified time sequences, and each classified time sequence comprises a plurality of first subsequences obtained by dividing through the length of a target sliding window; dividing the time sequence to be classified according to the length of the target sliding window to obtain a plurality of second subsequences; calculating the similarity between the time sequence to be classified and each classified time sequence according to the plurality of second subsequences and the plurality of first subsequences contained in each classified time sequence; and determining the category of the time sequence to be classified according to the similarity between the time sequence to be classified and each classified time sequence.
According to an aspect of an embodiment of the present application, there is provided a time series classification apparatus, including: an acquisition unit configured to acquire a time series to be classified and a classified time series set, wherein the classified time series set comprises a plurality of classified time series and the categories of the classified time series, and each classified time series comprises a plurality of first subsequences obtained by dividing according to a target sliding window length; a first dividing unit configured to divide the time series to be classified according to the target sliding window length to obtain a plurality of second subsequences; a calculating unit configured to calculate a similarity between the time series to be classified and each classified time series according to the plurality of second subsequences and the plurality of first subsequences contained in each classified time series; and a first determining unit configured to determine the category of the time series to be classified according to the similarity between the time series to be classified and each classified time series.
In some embodiments of the present application, based on the foregoing scheme, the first determining unit is configured to: acquire the maximum similarity among the similarities between the time series to be classified and each classified time series, and take the category of the classified time series corresponding to the maximum similarity as the category of the time series to be classified.
In some embodiments of the present application, based on the foregoing solution, the apparatus further includes: a generating unit configured to divide the classified time series set into a first subset and a second subset, and generate a plurality of sliding window lengths according to the sequence length of each classified time series; a second dividing unit configured to divide the classified time series contained in the first subset according to each sliding window length to obtain a plurality of third subsequences corresponding to each sliding window length, and divide the classified time series contained in the second subset according to each sliding window length to obtain a plurality of fourth subsequences corresponding to each sliding window length; a second determining unit configured to determine, according to the plurality of third subsequences and the plurality of fourth subsequences, the classification accuracy corresponding to each sliding window length; and a third determining unit configured to determine the target sliding window length according to the classification accuracy corresponding to each sliding window length.
In some embodiments of the present application, based on the foregoing scheme, the third determining unit is configured to: acquire the maximum classification accuracy among the classification accuracies corresponding to the sliding window lengths, and take the sliding window length corresponding to the maximum classification accuracy as the target sliding window length.
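The selection rule described above can be sketched as follows. This is a minimal illustration only: the function name `select_target_window_length`, the dictionary layout, and the accuracy values are hypothetical, and the tie-breaking behavior is an assumption because the text does not specify one.

```python
def select_target_window_length(accuracy_by_length):
    """Return the sliding window length whose classification accuracy is highest."""
    if not accuracy_by_length:
        raise ValueError("no candidate sliding window lengths")
    # max() keeps the first maximal key it meets, so ties go to the
    # candidate listed first (an assumption, not stated in the text).
    return max(accuracy_by_length, key=accuracy_by_length.get)


# Illustrative values only: window length -> classification accuracy.
accuracy_by_length = {2: 0.71, 4: 0.86, 8: 0.79}
target_length = select_target_window_length(accuracy_by_length)
print(target_length)
```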
In some embodiments of the present application, based on the foregoing scheme, the second determining unit includes: a calculating subunit configured to calculate, according to the plurality of third subsequences and the plurality of fourth subsequences, the similarities between the classified time series contained in the first subset and the classified time series contained in the second subset for each sliding window length; a first determining subunit configured to determine, according to those similarities, a reference category of the classified time series contained in the second subset for each sliding window length; and a second determining subunit configured to determine the classification accuracy corresponding to each sliding window length according to the categories of the classified time series contained in the second subset and their reference categories for each sliding window length.
In some embodiments of the present application, based on the foregoing scheme, the first determining subunit is configured to: acquire, for each sliding window length, the maximum similarity among the similarities between the classified time series contained in the first subset and the classified time series contained in the second subset, and take the category of the classified time series corresponding to the maximum similarity as the reference category of the classified time series contained in the second subset for that sliding window length.
In some embodiments of the present application, based on the foregoing scheme, the second determining subunit is configured to: determining the classification accuracy of the classified time series contained in the second subset relative to the lengths of the sliding windows according to the classes of the classified time series contained in the second subset and the reference classes of the classified time series contained in the second subset relative to the lengths of the sliding windows; and determining the classification accuracy corresponding to each sliding window length according to the classification accuracy of the classified time sequences contained in the second subset relative to each sliding window length and the number of the classified time sequences contained in the second subset.
In some embodiments of the present application, based on the foregoing scheme, the second determining subunit is configured to: calculate the ratio of the sum of the classification accuracies of the classified time series contained in the second subset for each sliding window length to the number of classified time series contained in the second subset, and take the ratio as the classification accuracy corresponding to that sliding window length.
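The ratio described above can be sketched for a single sliding window length as follows. The sketch assumes that the per-series classification accuracy is 1 when the reference category equals the known category and 0 otherwise; that scoring, the function name `window_accuracy`, and the category labels are illustrative assumptions.

```python
def window_accuracy(true_categories, reference_categories):
    """Sum of per-series accuracies divided by the number of series in the second subset."""
    if len(true_categories) != len(reference_categories):
        raise ValueError("category lists must have the same length")
    if not true_categories:
        raise ValueError("the second subset is empty")
    # Assumed scoring: 1 for a matching reference category, 0 otherwise.
    per_series = [1 if t == r else 0
                  for t, r in zip(true_categories, reference_categories)]
    return sum(per_series) / len(per_series)


# Hypothetical categories for four series in the second subset.
known = ["minor", "adult", "adult", "minor"]
reference = ["minor", "adult", "minor", "minor"]
acc = window_accuracy(known, reference)
print(acc)
```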
In some embodiments of the present application, based on the foregoing scheme, the second determining unit is configured to: calculate, a plurality of times, the similarity between the classified time series contained in the first subset and the classified time series contained in the second subset for each sliding window length according to the plurality of third subsequences and the plurality of fourth subsequences; determine, from the similarity obtained in each calculation, a single-run reference category of the classified time series contained in the second subset for each sliding window length; determine a plurality of classification accuracies corresponding to each sliding window length according to the categories of the classified time series contained in the second subset and their single-run reference categories for each sliding window length; and calculate the ratio of the sum of the plurality of classification accuracies corresponding to each sliding window length to the number of calculations, and take the calculated ratio as the classification accuracy corresponding to that sliding window length.
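The averaging over repeated calculations can be sketched as follows. The function name `average_accuracy_per_length`, the data layout (one accuracy per repetition for each window length), and the numbers are hypothetical and only illustrate the ratio of the summed accuracies to the number of repetitions.

```python
def average_accuracy_per_length(run_accuracies):
    """Map each sliding window length to the mean of its per-run accuracies."""
    averaged = {}
    for length, accuracies in run_accuracies.items():
        if not accuracies:
            raise ValueError(f"no runs recorded for window length {length}")
        # Ratio of the sum of the accuracies to the number of repetitions.
        averaged[length] = sum(accuracies) / len(accuracies)
    return averaged


# Illustrative values only: two repetitions for each candidate window length.
runs = {2: [0.5, 0.75], 4: [0.75, 1.0], 8: [0.5, 0.5]}
averaged = average_accuracy_per_length(runs)
print(averaged)
```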
In the technical solutions provided in some embodiments of the present application, a time series to be classified and a classified time series set are obtained, where each classified time series includes a plurality of first subsequences obtained by dividing according to a target sliding window length; the time series to be classified is divided according to the target sliding window length to obtain a plurality of second subsequences; the similarity between the time series to be classified and each classified time series is then calculated from the plurality of second subsequences and the plurality of first subsequences; and the category of the time series to be classified is determined according to the similarity. The technical solution can be applied directly to raw time series data from real scenarios without additional preprocessing. Because the time series data is divided into subsequences using the target sliding window length, the influence of unequal lengths, missing values, and abnormal values is effectively avoided, and any similarity measure can be used, so time series can be classified effectively and classification accuracy and efficiency are improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 is a diagram illustrating an exemplary system architecture to which aspects of embodiments of the present application may be applied;
FIG. 2 shows a flow diagram of a method of time series classification according to an embodiment of the present application;
FIG. 3 illustrates a flow chart for determining a target sliding window length according to an embodiment of the present application;
FIG. 4 shows a detailed flowchart of step S330 according to an embodiment of the present application;
FIG. 5 shows a detailed flowchart of step S3303 according to an embodiment of the present application;
FIG. 6 shows a detailed flowchart of step S330 according to another embodiment of the present application;
FIG. 7 shows a block diagram of a time series classification apparatus according to an embodiment of the present application;
FIG. 8 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
It is to be noted that the terms used in the specification and claims of the present application and the above-described drawings are only for describing the embodiments and are not intended to limit the scope of the present application. It will be understood that the terms "comprises," "comprising," "includes," "including," "has," "having," and the like, when used herein, specify the presence of stated features, integers, steps, operations, elements, components, and/or groups thereof, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It will be further understood that, although the terms first, second, third, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element without departing from the scope of the present invention. Similarly, a second element may be termed a first element. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more.
Before further detailed description of the embodiments of the present application, terms and expressions referred to in the embodiments of the present application will be described, and the terms and expressions referred to in the embodiments of the present application will be used for the following explanation.
Time series: the numerical values of a certain phenomenon or statistical index at different time points are arranged according to the time sequence to form an ordered sequence.
User portrait: a characterization of users obtained by clustering or classifying them based on their behavioral feature data (usually time series data).
Fig. 1 shows a schematic diagram of an exemplary system architecture to which the technical solution of the embodiments of the present application can be applied.
As shown in fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few. The terminal devices 101, 102, 103 may be various electronic devices having a display screen, including but not limited to desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative, and that there may be any number of terminal devices, networks, and servers, as desired for an implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
The time-series classification method provided by the embodiment of the present application is generally executed by the server 105, and accordingly, the time-series classification apparatus is generally disposed in the server 105. However, it is easily understood by those skilled in the art that the time-series classification method provided in the embodiment of the present application may also be executed by the terminal devices 101, 102, and 103, and accordingly, the time-series classification apparatus may also be disposed in the terminal devices 101, 102, and 103, which is not particularly limited in this exemplary embodiment. For example, in an exemplary embodiment, the user may upload the time series to the server 105 through the terminal devices 101, 102, and 103, and the server 105 processes the time series through the time series classification method provided in the embodiment of the present application, and sends the obtained classification result to the terminal devices 101, 102, and 103.
It can be understood that the time series classification method can be applied to any real-world time series classification scenario, such as abnormal electrocardiogram (ECG) signal classification and sensor-based action classification. In addition, the time series classification method provided by the embodiments of the application can be applied to user portrait analysis, where users are effectively classified through time series classification in order to characterize different types of users.
In the application scenario of game user portrait analysis, identifying minors among game users is an important issue for protecting their physical and mental health. Although identity verification of minors is already performed, many minors still play games on their parents' mobile phones. On the one hand, excessive play can affect the healthy growth of minors; on the other hand, it can lead to minors making in-game payments through a parent's mobile phone, causing refund complaints and negative public opinion. Therefore, how to identify underage game users remains an important problem worthy of study.
In an application scenario of game user portrait analysis, the server 105 may be a game server, and the terminal devices 101, 102, and 103 may be terminal devices on which a game application is installed and in which a game account is registered. The game data generated when a user plays through the game account is time-stamped and therefore forms a time series, so whether the user is a minor can be identified by classifying that time series.
The user to be identified is a user for whom it must be determined whether he or she is a minor. The user to be identified performs game behaviors through the game application on a terminal device, forming a time series to be classified. The game server can obtain the time series to be classified as well as a classified time series set, where the classified time series set includes a plurality of classified time series and their categories; the category of each classified time series can be based on the user's refund behavior, and each classified time series includes a plurality of first subsequences obtained by dividing according to the target sliding window length. The game server can then divide the time series to be classified according to the target sliding window length to obtain a plurality of second subsequences. Finally, the game server determines the category of the time series to be classified according to the similarity between the time series to be classified and each classified time series, thereby building the user portrait of the user to be identified and judging whether that user is a minor.
Through this time series classification scheme, game users can be effectively analyzed and profiled, underage users in the game can be identified, and the phenomenon of minors playing games can be investigated, thereby avoiding potential refund complaints and negative public opinion.
It should be noted that the above application scenario is only an illustrative example, and does not constitute a limitation on the application scenario of the technical solution of the embodiment of the present application, and the technical solution of the embodiment of the present application may be applied to any classification scenario of time series data.
The implementation details of the technical solution of the embodiment of the present application are set forth in detail below:
Methods for classifying time series mainly include the nearest neighbor classification method, the Shapelet sequence analysis method, the Bag of Patterns (BoP) method, and the Hierarchical Vote Collective of Transformation-based Ensembles (HIVE-COTE) method.
The nearest neighbor classification method measures the similarity between time series with a measure such as Euclidean distance and then selects the category of the most similar time series as the classification result. The Shapelet sequence analysis method searches for specific subsequences and uses the presence of such a subsequence, determined by Euclidean distance, as the key feature for distinguishing sequences of different categories. The Bag of Patterns method first converts a real-valued sequence into a symbol sequence using Symbolic Aggregate approXimation (SAX), then, after fixing the word length, builds a dictionary from the words appearing in the symbol sequence, records the frequency of each word, and finally classifies using these frequencies. The transformation-based hierarchical voting ensemble method aggregates more than 30 independent classifiers, including the nearest neighbor, Shapelet, and Bag of Patterns methods, and its classification accuracy may be higher than that of any of those methods alone.
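To make the Bag of Patterns idea concrete, the following is a simplified sketch of Symbolic Aggregate approXimation (SAX) followed by word counting. It assumes a three-letter alphabet with the usual standard-normal breakpoints (-0.43 and 0.43), a window length divisible by the number of segments, and a stride of one; the function names and parameters are hypothetical and this is not presented as the implementation used by any particular library.

```python
import statistics
from bisect import bisect_right
from collections import Counter

BREAKPOINTS = [-0.43, 0.43]  # standard-normal tertiles for a 3-letter alphabet
ALPHABET = "abc"


def sax_word(window, segments):
    """Convert one real-valued window into a SAX word of `segments` letters."""
    if len(window) % segments != 0:
        raise ValueError("window length must be divisible by segments (assumption)")
    mean = statistics.fmean(window)
    std = statistics.pstdev(window)
    # z-normalise; a constant window maps to all zeros
    z = [(v - mean) / std if std > 0 else 0.0 for v in window]
    seg_len = len(z) // segments
    # piecewise aggregate approximation: mean of each segment
    paa = [statistics.fmean(z[i * seg_len:(i + 1) * seg_len])
           for i in range(segments)]
    return "".join(ALPHABET[bisect_right(BREAKPOINTS, p)] for p in paa)


def bag_of_patterns(series, window_length, segments):
    """Count how often each SAX word occurs over all stride-1 windows."""
    words = (sax_word(series[i:i + window_length], segments)
             for i in range(len(series) - window_length + 1))
    return Counter(words)


bag = bag_of_patterns([1, 2, 3, 4, 5, 6], window_length=4, segments=2)
print(bag)
```

The resulting word frequencies form the feature vector on which a Bag of Patterns classifier would then operate.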
In addition, time series can be classified by applying a deep neural network to the real-valued series, for example a Residual Neural Network (ResNet) that contains 9 residual convolutional layers and more than 500,000 network parameters.
The time series classification methods above assume by default that the time series data has been preprocessed into sequences of equal length with no missing values and no abnormal values. In practice, however, data such as user payments and activity in a game application is time-stamped and is therefore typical time series data. When portrait analysis is performed on users, time series classification can effectively classify users and characterize different types of users. However, different users generate time series of different lengths because they play at different frequencies, and missing values and abnormal values often appear in the data because of processing errors or missed records. Real time series data therefore usually contains missing and abnormal values and is not of equal length.
Although time series can be forced to equal length by simple truncation or sampling, doing so discards part of the information in the series, and some time series are naturally of unequal length. For example, heartbeat time series are typically of unequal length because heartbeats are not perfectly evenly spaced. Converting time series to equal length may itself distort the data and cause problems for subsequent analysis.
Missing values can be filled in by interpolation or similar methods, but these methods suit only a small number of isolated missing values, not runs of consecutive missing values. Consecutive missing values are very common and are often caused by network transmission problems. Moreover, even when values can be filled in, their validity is hard to guarantee, which may affect subsequent analysis.
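The limitation of interpolation can be illustrated with a small sketch that fills only short gaps by linear interpolation and leaves longer runs of missing values untouched. Representing a missing value as `None`, the function name `interpolate_short_gaps`, and the `max_gap` threshold are assumptions made for this illustration.

```python
def interpolate_short_gaps(series, max_gap=1):
    """Linearly fill runs of None no longer than `max_gap`; leave longer runs as None."""
    out = list(series)
    n = len(out)
    i = 0
    while i < n:
        if out[i] is not None:
            i += 1
            continue
        j = i
        while j < n and out[j] is None:
            j += 1  # j ends at the first known value after the gap (or at n)
        gap = j - i
        # Fill only gaps that are short and have a known value on both sides.
        if i > 0 and j < n and gap <= max_gap:
            left, right = out[i - 1], out[j]
            for k in range(i, j):
                out[k] = left + (right - left) * (k - i + 1) / (gap + 1)
        i = j
    return out


filled = interpolate_short_gaps([1.0, None, 3.0, None, None, None, 7.0])
print(filled)
```

The isolated gap is filled, while the run of three consecutive missing values remains, which is the case the paragraph above describes as poorly served by interpolation.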
Outliers can be identified by anomaly detection algorithms, but how to handle them afterwards remains an open problem. A common approach is to treat identified outliers as missing values, but this runs into the missing-value handling problem again.
Among the time series classification methods above, the nearest neighbor classification method can measure the similarity of two time series of unequal length by using a dynamic time warping algorithm, but it cannot handle missing values and abnormal values, and dynamic time warping has high time complexity, which makes it hard to apply to large-scale data.
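For reference, dynamic time warping can be sketched with the textbook dynamic program below. Using the absolute difference as the local cost is an assumption, and the function name `dtw_distance` is hypothetical. The table has one cell per pair of positions, so the running time grows with the product of the two series lengths, which is the cost referred to above.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two series of possibly unequal length."""
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(cost[i - 1][j],      # a[i-1] matched again
                                     cost[i][j - 1],      # b[j-1] matched again
                                     cost[i - 1][j - 1])  # both series advance
    return cost[n][m]


print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))
```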
The Shapelet sequence analysis method can avoid the problems of unequal length, missing values, and abnormal values, but because it traverses all possible subsequences its time complexity is extremely high, which makes it impractical. In addition, its actual classification performance is poor and significantly lower than that of the nearest neighbor method. The Bag of Patterns method can be applied to sequences of unequal length, but missing values and abnormal values must be preprocessed.
The transformation-based hierarchical voting ensemble method is built on base algorithms such as nearest neighbor, Shapelet, and Bag of Patterns, so it inherits the same problems, and deep learning methods can train an effective model only if the time series are preprocessed in advance.
In summary, existing time series classification algorithms usually assume that the time series data to be classified has already been preprocessed, whereas real-world time series are often of unequal length and contain missing and abnormal values, so these algorithms are difficult to apply simply and effectively to data from real environments. In addition, a traditional preprocessing pipeline can hardly guarantee the quality of the processed data, which creates potential problems for subsequent data analysis.
In view of the above, an embodiment of the present application provides a time series classification method. A time series to be classified and a classified time series set are obtained, where the classified time series set includes a plurality of classified time series and their categories, and each classified time series includes a plurality of first subsequences obtained by dividing according to a target sliding window length. The time series to be classified is then divided according to the target sliding window length to obtain a plurality of second subsequences. Next, the similarity between the time series to be classified and each classified time series is calculated from the plurality of second subsequences and the plurality of first subsequences. Finally, the category of the time series to be classified is determined according to the similarity between the time series to be classified and each classified time series. The technical solution can be applied directly to raw time series data from real scenarios without additional preprocessing. Because the time series data is divided into subsequences using the target sliding window length, the influence of unequal lengths, missing values, and abnormal values is effectively avoided, and any similarity measure can be used, so time series can be classified effectively and classification accuracy and efficiency are improved.
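The overall flow can be sketched end to end as follows. Several details are assumptions introduced only for illustration, because this passage does not fix them: windows advance one position at a time, windows containing a missing value (represented as `None`) are discarded, and the similarity is the negative mean Euclidean distance from each window of the series to be classified to its nearest window in a classified series. All function names and sample data are hypothetical.

```python
import math


def sliding_windows(series, w):
    """Stride-1 windows of length w; windows with a missing value are dropped (assumption)."""
    wins = [tuple(series[i:i + w]) for i in range(len(series) - w + 1)]
    return [win for win in wins if all(v is not None for v in win)]


def similarity(query_windows, ref_windows):
    """Assumed measure: negative mean distance to the nearest reference window."""
    if not query_windows or not ref_windows:
        return -math.inf
    total = sum(min(math.dist(q, r) for r in ref_windows) for q in query_windows)
    return -total / len(query_windows)


def classify(query, classified_set, w):
    """Return the category of the classified series most similar to `query`."""
    q_wins = sliding_windows(query, w)
    scored = [(similarity(q_wins, sliding_windows(series, w)), category)
              for series, category in classified_set]
    return max(scored, key=lambda pair: pair[0])[1]


# Hypothetical data: series of unequal length, each with missing values.
classified_set = [
    ([1, 2, 3, 4, 5, 6], "rising"),
    ([6, 5, None, 3, 2], "falling"),
]
predicted = classify([2, 3, 4, None, 6, 7], classified_set, w=2)
print(predicted)
```

Because only complete windows are compared, the series need not be of equal length and gaps do not have to be filled before classification; any other similarity measure could be substituted for the assumed one.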
Fig. 2 shows a flowchart of a time series classification method according to an embodiment of the present application, which may be performed by a server, which may be the server 105 shown in fig. 1, but may also be performed by a terminal device, such as the terminal 101 shown in fig. 1. Referring to fig. 2, the time series classification method includes:
step S210, a time sequence to be classified and a classified time sequence set are obtained, wherein the classified time sequence set comprises a plurality of classified time sequences and the classes of the classified time sequences, and each classified time sequence comprises a plurality of first subsequences obtained by dividing through the length of a target sliding window;
step S220, dividing the time sequence to be classified according to the length of the target sliding window to obtain a plurality of second subsequences;
step S230, calculating a similarity between the time sequence to be classified and each classified time sequence according to the plurality of second subsequences and the plurality of first subsequences included in each classified time sequence;
step S240, determining the category of the time sequence to be classified according to the similarity between the time sequence to be classified and each classified time sequence.
These steps are described in detail below.
In step S210, a time series to be classified and a classified time series set are obtained, where the classified time series set includes a plurality of classified time series and the categories of the classified time series, and each classified time series includes a plurality of first subsequences obtained by dividing according to the target sliding window length.
In this embodiment, the time series to be classified is a time series whose category has not yet been determined; the classified time series set is a set formed by a plurality of classified time series, and the set also includes the category of each classified time series.
Each classified time series is divided into a plurality of first subsequences based on the target sliding window length, and the length of each first subsequence equals the target sliding window length. For example, if the classified time series is T1 = (t1, t2, t3, t4, t5) and the target sliding window length is 2, T1 can be divided into 4 first subsequences: T11 = (t1, t2), T12 = (t2, t3), T13 = (t3, t4), T14 = (t4, t5).
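The division can be sketched in Python as follows. This is only an illustrative sketch: the function name `sliding_subsequences` is hypothetical, and a step of one element between consecutive windows is assumed because that is what the examples in this description show.

```python
def sliding_subsequences(series, window):
    """Return all contiguous subsequences of length `window`, moving one step at a time.

    A series of length n yields n - window + 1 subsequences.
    An empty list is returned when the window does not fit (an assumption of this sketch).
    """
    if window < 1 or window > len(series):
        return []
    return [tuple(series[i:i + window]) for i in range(len(series) - window + 1)]


T1 = ["t1", "t2", "t3", "t4", "t5"]
first_subsequences = sliding_subsequences(T1, 2)
print(first_subsequences)
# [('t1', 't2'), ('t2', 't3'), ('t3', 't4'), ('t4', 't5')]
```

Because each window is cut from the series itself, series of different lengths simply yield different numbers of subsequences, and no padding or truncation is involved.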
In step S220, the time sequence to be classified is divided according to the length of the target sliding window, so as to obtain a plurality of second subsequences.
Just as each classified time series is divided according to the target sliding window length to obtain a plurality of first subsequences, the time series to be classified is likewise divided according to the target sliding window length to obtain a plurality of second subsequences.
In step S230, a similarity between the time series to be classified and each classified time series is calculated according to the plurality of second sub-series and the plurality of first sub-series included in each classified time series.
After the time sequence to be classified is obtained and the time sequence to be classified is divided to obtain a plurality of second subsequences, the similarity between the time sequence to be classified and each classified time sequence can be calculated and obtained through the plurality of second subsequences and the plurality of first subsequences contained in each classified time sequence. The method for calculating the similarity may include euclidean distance, dynamic time warping, and the like, and the method for calculating the similarity is not specifically limited in this embodiment of the application.
For example, assume that the classified time series set includes 4 classified time series, T1 = (t1, t2, t3, t4), T2 = (t5, t6), T3 = (t7, t8) and T4 = (t9, t10, t11), that the time series to be classified is P1 = (p1, p2, p3), and that the target sliding window length is 2. Based on the target sliding window length, the classified time series are divided into the first subsequences T11 = (t1, t2), T12 = (t2, t3), T13 = (t3, t4), T21 = (t5, t6), T31 = (t7, t8), T41 = (t9, t10) and T42 = (t10, t11), and the time series to be classified is divided into the second subsequences P11 = (p1, p2) and P12 = (p2, p3).
To calculate the similarity between the classified time series T1 and the time series to be classified P1, the similarity of every pair formed by a first subsequence of T1 and a second subsequence of P1 is calculated: S1 between T11 = (t1, t2) and P11 = (p1, p2), S2 between T11 and P12 = (p2, p3), S3 between T12 = (t2, t3) and P11, S4 between T12 and P12, S5 between T13 = (t3, t4) and P11, and S6 between T13 and P12. After the six similarities S1, S2, S3, S4, S5 and S6 are obtained, their maximum value can be used as the similarity between the classified time series T1 and the time series to be classified P1. Similarly, the similarity between T2 and P1, the similarity between T3 and P1, and the similarity between T4 and P1 can be calculated.
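The pairwise comparison described above can be sketched in Python as follows. The sketch is illustrative only: the helper names are hypothetical, and the subsequence similarity 1 / (1 + Euclidean distance) is an assumption chosen for the example, since the embodiment does not restrict the similarity measure.

```python
import math


def sliding_subsequences(series, window):
    # All contiguous windows of the given length, moving one step at a time.
    return [tuple(series[i:i + window]) for i in range(len(series) - window + 1)]


def subsequence_similarity(a, b):
    # Assumed measure: maps Euclidean distance 0 to similarity 1.0,
    # and larger distances to values closer to 0.
    return 1.0 / (1.0 + math.dist(a, b))


def series_similarity(classified, to_classify, window):
    """Maximum similarity over every (first subsequence, second subsequence) pair."""
    firsts = sliding_subsequences(classified, window)
    seconds = sliding_subsequences(to_classify, window)
    # default=0.0 covers a series shorter than the window (an assumption of this sketch).
    return max(
        (subsequence_similarity(f, s) for f in firsts for s in seconds),
        default=0.0,
    )


T1 = [1.0, 2.0, 3.0, 4.0]   # a classified time series (made-up values)
P1 = [2.0, 3.0, 9.0]        # the time series to be classified (made-up values)
print(series_similarity(T1, P1, 2))  # the shared window (2.0, 3.0) gives 1.0
```

Taking the maximum means that one well-matching local pattern is enough for two series to be considered similar, even if the remaining values, such as the outlier 9.0 here, differ widely.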
In step S240, determining the category of the time series to be classified according to the similarity between the time series to be classified and each classified time series.
The similarity describes the degree of similarity of local features between the time series to be classified and each classified time series, and therefore, after the similarity between the time series to be classified and each classified time series is calculated in step S230, the category of the time series to be classified can be determined accordingly.
In an embodiment of the present application, after the similarity between the time series to be classified and each classified time series is obtained through calculation, the maximum similarity among the similarities between the time series to be classified and each classified time series may be obtained, and the category of the classified time series corresponding to the maximum similarity is taken as the category of the time series to be classified.
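A minimal Python sketch of this maximum-similarity decision is given below. All names are hypothetical, the data are made up, and the similarity 1 / (1 + Euclidean distance) between subsequences is an assumed measure used only for illustration.

```python
import math


def sliding_subsequences(series, window):
    return [tuple(series[i:i + window]) for i in range(len(series) - window + 1)]


def series_similarity(a, b, window):
    # Maximum over all subsequence pairs; 0.0 if either series is shorter than the window.
    return max(
        (1.0 / (1.0 + math.dist(x, y))
         for x in sliding_subsequences(a, window)
         for y in sliding_subsequences(b, window)),
        default=0.0,
    )


def classify(to_classify, classified_set, window):
    """Return the category of the classified series with the maximum similarity.

    `classified_set` is a list of (series, category) pairs.
    Ties are resolved in favour of the earliest entry (an assumption of this sketch).
    """
    _, category = max(
        classified_set,
        key=lambda item: series_similarity(item[0], to_classify, window),
    )
    return category


classified_set = [
    ([1.0, 2.0, 3.0, 4.0], "A"),
    ([10.0, 11.0], "B"),
    ([20.0, 21.0, 22.0], "C"),
]
print(classify([10.5, 11.0, 30.0], classified_set, window=2))  # B
```

The series have lengths 4, 2 and 3, and the series to be classified contains an outlying value, yet no preprocessing step is applied before the comparison.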
Based on the technical scheme of the embodiment, the subsequence is obtained by dividing the time sequence data by using the length of the target sliding window, so that the influence of unequal length, missing values or abnormal values of the time sequence is effectively avoided, the time sequence is not required to be additionally preprocessed, the method can be directly applied to the original time sequence data existing in a real scene, and any similarity measurement method is compatible, so that the time sequence can be effectively classified, and the classification accuracy and efficiency are improved.
Fig. 3 shows a flowchart for determining the length of the target sliding window according to an embodiment of the present application, and as shown in fig. 3, the method may specifically include steps S310 to S340, which are described in detail as follows:
step S310, the classified time sequence set is divided into a first subset and a second subset, and a plurality of sliding window lengths are generated according to the sequence length of each classified time sequence.
In this embodiment, to determine the target sliding window length, the classified time-series set may be first divided, specifically, the classified time-series set may be divided into a first subset and a second subset in a manner of a certain number proportion, and the first subset and the second subset respectively include different numbers of classified time-series. Meanwhile, a plurality of sliding window lengths may be generated according to the sequence length of each classified time series.
For example, assuming that the sorted time series set includes 5 sorted time series, the sequence lengths of the 5 sorted time series are 10, 6, 12, 4, and 5, and since the length of the shortest sequence in the sorted time series set is 4, the lengths of the plurality of sliding windows may include 1, 2, 3, and 4.
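The generation of candidate sliding window lengths in this example can be sketched in Python as follows. The function name is hypothetical, and enumerating every integer from 1 up to the shortest sequence length is an assumption that follows the example above.

```python
def candidate_window_lengths(classified_series):
    """Candidate lengths 1..L, where L is the length of the shortest classified series."""
    shortest = min(len(series) for series in classified_series)
    return list(range(1, shortest + 1))


# Five placeholder series with the sequence lengths used in the example: 10, 6, 12, 4, 5.
series_set = [[0.0] * n for n in (10, 6, 12, 4, 5)]
print(candidate_window_lengths(series_set))  # [1, 2, 3, 4]
```

Bounding the candidates by the shortest series ensures that every classified time series yields at least one subsequence for every candidate length.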
Here, the first subset may be used as a training subset for training a classification of the time series; and the second subset is used as a verification subset for verifying whether the training subset classifies the time series correctly or not. Therefore, in order to ensure the classification effect, the number of the classified time series contained in the first subset may be greater than the number of the classified time series contained in the second subset, for example, the ratio of the number of the first subset to the second subset is 7: 3.
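One possible way to perform such a 7:3 division is sketched below in Python. The function name, the random shuffling and the fixed seed are assumptions of the sketch; the embodiment only requires that the set be divided in a certain number proportion.

```python
import random


def split_classified_set(classified_set, first_fraction=0.7, seed=0):
    """Split (series, category) pairs into a first (training) and second (verification) subset.

    Shuffling with a seeded generator is an assumption made for reproducibility.
    The set must contain at least two items so that both subsets are non-empty.
    """
    items = list(classified_set)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * first_fraction)
    cut = min(max(cut, 1), len(items) - 1)  # keep both subsets non-empty
    return items[:cut], items[cut:]


# Ten placeholder series with made-up categories.
data = [([float(i), float(i + 1)], f"c{i % 3}") for i in range(10)]
first_subset, second_subset = split_classified_set(data)
print(len(first_subset), len(second_subset))  # 7 3
```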
step S320, dividing the sorted time sequence included in the first subset according to the length of each sliding window to obtain a plurality of third subsequences corresponding to the lengths of each sliding window, and dividing the sorted time sequence included in the second subset according to the lengths of each sliding window to obtain a plurality of fourth subsequences corresponding to the lengths of each sliding window.
After the plurality of sliding window lengths are generated, further, the sorted time sequences included in the first subset may be divided according to the respective sliding window lengths to obtain a plurality of third subsequences corresponding to the respective sliding window lengths, and meanwhile, the sorted time sequences included in the second subset may also be divided according to the respective sliding window lengths to obtain a plurality of fourth subsequences corresponding to the respective sliding window lengths.
For example, assume that the first subset includes 3 classified time series T1, T2 and T3, where T1 = (t1, t2), T2 = (t3, t4, t5, t6) and T3 = (t7, t8, t9), and that the second subset includes 2 classified time series T4 and T5, where T4 = (t10, t11, t12, t13, t14, t15) and T5 = (t16, t17, t18). Since the shortest sequence length in the classified time series set is 2, two sliding window lengths w1 and w2 can be generated, with w1 = 1 and w2 = 2.
Thus, when w1 = 1, the 3 classified time series included in the first subset may be divided into 9 third subsequences: T111 = (t1), T112 = (t2), T121 = (t3), T122 = (t4), T123 = (t5), T124 = (t6), T131 = (t7), T132 = (t8), T133 = (t9). When w2 = 2, the 3 classified time series included in the first subset may be divided into 6 third subsequences: T211 = (t1, t2), T221 = (t3, t4), T222 = (t4, t5), T223 = (t5, t6), T231 = (t7, t8), T232 = (t8, t9).
In the same way, when w1 = 1, the 2 classified time series included in the second subset are divided into 9 fourth subsequences: T141 = (t10), T142 = (t11), T143 = (t12), T144 = (t13), T145 = (t14), T146 = (t15), T151 = (t16), T152 = (t17), T153 = (t18). When w2 = 2, the 2 classified time series included in the second subset are divided into 7 fourth subsequences: T241 = (t10, t11), T242 = (t11, t12), T243 = (t12, t13), T244 = (t13, t14), T245 = (t14, t15), T251 = (t16, t17), T252 = (t17, t18).
And step S330, determining the classification accuracy corresponding to each sliding window length according to the plurality of third subsequences and the plurality of fourth subsequences.
Since the third subsequences are obtained by dividing the first subset, and the first subset serves as a training subset for training the classification of time series, while the fourth subsequences are obtained by dividing the second subset, and the second subset serves as a verification subset for verifying whether the training subset classifies time series correctly, the classification accuracy corresponding to each sliding window length can be determined through classification based on the plurality of third subsequences and verification based on the plurality of fourth subsequences.
In an embodiment of the present application, as shown in fig. 4, step S330 specifically includes steps S410 to S430, which are specifically described as follows:
step S410, calculating similarities between the sorted time series included in the first subset and the sorted time series included in the second subset with respect to the lengths of the sliding windows according to the third subsequences and the fourth subsequences.
In this embodiment, in order to determine the classification accuracy corresponding to each sliding window length, firstly, the similarity between the classified time series included in the first subset and the classified time series included in the second subset with respect to each sliding window length may be calculated according to the plurality of third subsequences and the plurality of fourth subsequences.
Continuing with the example in step S320, when the sliding window length w1 = 1, the 3 classified time series included in the first subset are divided into 9 third subsequences: T111 = (t1), T112 = (t2), T121 = (t3), T122 = (t4), T123 = (t5), T124 = (t6), T131 = (t7), T132 = (t8), T133 = (t9), and the 2 classified time series included in the second subset are divided into 9 fourth subsequences: T141 = (t10), T142 = (t11), T143 = (t12), T144 = (t13), T145 = (t14), T146 = (t15), T151 = (t16), T152 = (t17), T153 = (t18). Thus, by comparing each of the third subsequences T111 = (t1) and T112 = (t2) with each of the fourth subsequences T141 = (t10), T142 = (t11), T143 = (t12), T144 = (t13), T145 = (t14) and T146 = (t15), a plurality of similarities may be calculated, and the maximum value of these similarities may be taken as the similarity S11 between T1 and T4. Similarly, from the plurality of third subsequences and the plurality of fourth subsequences, the similarity S12 between T2 and T4, the similarity S13 between T3 and T4, the similarity S14 between T1 and T5, the similarity S15 between T2 and T5, and the similarity S16 between T3 and T5 can be calculated.
When the sliding window length w2 = 2, the 3 classified time series included in the first subset are divided into 6 third subsequences: T211 = (t1, t2), T221 = (t3, t4), T222 = (t4, t5), T223 = (t5, t6), T231 = (t7, t8), T232 = (t8, t9), and the 2 classified time series included in the second subset are divided into 7 fourth subsequences: T241 = (t10, t11), T242 = (t11, t12), T243 = (t12, t13), T244 = (t13, t14), T245 = (t14, t15), T251 = (t16, t17), T252 = (t17, t18). Thus, by comparing the third subsequence T211 = (t1, t2) with each of the fourth subsequences T241 = (t10, t11), T242 = (t11, t12), T243 = (t12, t13), T244 = (t13, t14) and T245 = (t14, t15), a plurality of similarities may be calculated, and the maximum value of these similarities may be taken as the similarity S21 between T1 and T4. Similarly, from the plurality of third subsequences and the plurality of fourth subsequences, the similarity S22 between T2 and T4, the similarity S23 between T3 and T4, the similarity S24 between T1 and T5, the similarity S25 between T2 and T5, and the similarity S26 between T3 and T5 can be calculated.
Step S420, determining a reference category of the sorted time series contained in the second subset relative to the lengths of the sliding windows according to the similarity between the sorted time series contained in the first subset and the sorted time series contained in the second subset relative to the lengths of the sliding windows.
After calculating the similarity of the sorted time series contained in the first subset and the sorted time series contained in the second subset with respect to the respective sliding window lengths, i.e. describing the degree of similarity of the local features between the sorted time series contained in the first subset and the sorted time series contained in the second subset by the similarity, the reference category of the sorted time series contained in the second subset with respect to the respective sliding window lengths can be determined accordingly.
In an embodiment of the present application, step S420 may specifically include:
and acquiring the maximum similarity of the classified time series contained in the first subset and the classified time series contained in the second subset relative to the lengths of the sliding windows, and taking the category of the classified time series corresponding to the maximum similarity as the reference category of the classified time series contained in the second subset relative to the lengths of the sliding windows.
Specifically, the maximum similarity describes the maximum degree of similarity between the sorted time series included in the first subset and the sorted time series included in the second subset, and therefore, the category of the sorted time series corresponding to the maximum similarity may be used as the reference category of the sorted time series included in the second subset with respect to the length of each sliding window.
Continuing with the example in step S410, when the sliding window length w1 = 1, the similarity S11 between T1 and T4, the similarity S12 between T2 and T4, the similarity S13 between T3 and T4, the similarity S14 between T1 and T5, the similarity S15 between T2 and T5, and the similarity S16 between T3 and T5 can be calculated. If S12 is the maximum of S11, S12 and S13, the category of T2 may be taken as the reference category of T4; if S16 is the maximum of S14, S15 and S16, the category of T3 may be taken as the reference category of T5.
When the sliding window length w2 = 2, the similarity S21 between T1 and T4, the similarity S22 between T2 and T4, the similarity S23 between T3 and T4, the similarity S24 between T1 and T5, the similarity S25 between T2 and T5, and the similarity S26 between T3 and T5 can likewise be calculated. If S23 is the maximum of S21, S22 and S23, the category of T3 may be taken as the reference category of T4; if S25 is the maximum of S24, S25 and S26, the category of T2 may be taken as the reference category of T5.
Step S430, determining the classification accuracy corresponding to each sliding window length according to the class of the classified time series included in the second subset and the reference class of the classified time series included in the second subset relative to each sliding window length.
It is understood that if the category of the classified time series is the same as the reference category of the classified time series, the classification accuracy of the classified time series may be determined to be 100%, and conversely, if not the same, the classification accuracy of the classified time series may be determined to be 0%.
Specifically, in this step, if the reference category of the sorted time series contained in the second subset with respect to each sliding window length is the same as the category of the sorted time series contained in the second subset, the sorting accuracy of the sorted time series contained in the second subset with respect to each sliding window length may be determined to be 100%, and conversely, the sorting accuracy of the sorted time series contained in the second subset with respect to each sliding window length may be determined to be 0%.
Furthermore, according to the classification accuracy of the classified time series contained in the second subset relative to each sliding window length, the classification accuracy corresponding to each sliding window length can be determined.
In an embodiment of the present application, step S430 may specifically include:
and calculating the ratio of the sum of the classification accuracy of the classified time sequences contained in the second subset relative to the lengths of the sliding windows to the number of the classified time sequences contained in the second subset, and taking the ratio as the classification accuracy corresponding to the lengths of the sliding windows.
In this embodiment, the ratio of the sum of the classification accuracy rates of the classified time series contained in the second subset with respect to each sliding window length to the number of the classified time series contained in the second subset may be used as the classification accuracy rate corresponding to each sliding window length.
Continuing with the example in step S420, when the sliding window length w1 = 1, the category of T2 is taken as the reference category of T4 and the category of T3 is taken as the reference category of T5. If the category of T2 is c1, the reference category of T4 is c1, and the category of T4 is also c1, so the classification accuracy of T4 with respect to the sliding window length w1 = 1 is 100%. If the category of T3 is c2, the reference category of T5 is c2, whereas the category of T5 is c3, so the classification accuracy of T5 with respect to the sliding window length w1 = 1 is 0%. The classification accuracy corresponding to the sliding window length w1 is therefore (100% + 0%)/2 = 50%.
When the sliding window length w2 = 2, the category of T3 is taken as the reference category of T4 and the category of T2 is taken as the reference category of T5. The category of T3 is c2, so the reference category of T4 is c2, whereas the category of T4 is c1, so the classification accuracy of T4 with respect to the sliding window length w2 = 2 is 0%. The category of T2 is c1, so the reference category of T5 is c1, whereas the category of T5 is c3, so the classification accuracy of T5 with respect to the sliding window length w2 = 2 is 0%. The classification accuracy corresponding to the sliding window length w2 is therefore (0% + 0%)/2 = 0%.
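The accuracy calculation for one sliding window length can be sketched in Python as follows. The helper names and numeric data are hypothetical, and the subsequence similarity 1 / (1 + Euclidean distance) is an assumed measure; the data are chosen only so that one verification series is classified correctly and the other is not.

```python
import math


def sliding_subsequences(series, window):
    return [tuple(series[i:i + window]) for i in range(len(series) - window + 1)]


def series_similarity(a, b, window):
    # Maximum similarity over all subsequence pairs (0.0 if a window does not fit).
    return max(
        (1.0 / (1.0 + math.dist(x, y))
         for x in sliding_subsequences(a, window)
         for y in sliding_subsequences(b, window)),
        default=0.0,
    )


def accuracy_for_window(first_subset, second_subset, window):
    """Fraction of second-subset series whose reference category equals their own category."""
    correct = 0
    for series, category in second_subset:
        # Reference category: category of the most similar series in the first subset.
        _, reference = max(
            first_subset,
            key=lambda item: series_similarity(item[0], series, window),
        )
        correct += int(reference == category)  # 100% or 0% for this series
    return correct / len(second_subset)


first_subset = [([0.0, 0.0], "c0"), ([1.0, 2.0, 1.0, 2.0], "c1"), ([5.0, 6.0, 5.0], "c2")]
second_subset = [([1.0, 2.0, 1.0, 2.0, 1.0, 2.0], "c1"), ([5.0, 6.0, 7.0], "c3")]
print(accuracy_for_window(first_subset, second_subset, window=1))  # 0.5
```

In this made-up data the second verification series belongs to a category, c3, that does not occur in the first subset, so it can never be matched correctly, which mirrors the 0% case for T5 in the example above.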
In an embodiment of the present application, as shown in fig. 6, step S330 may further specifically include step S610 to step S640, which are specifically described as follows:
step S610, according to the third subsequences and the fourth subsequences, calculating a similarity between the sorted time sequence included in the first subset and the sorted time sequence included in the second subset with respect to the length of each sliding window.
In this embodiment, in order to determine the classification accuracy corresponding to each sliding window length, firstly, according to the plurality of third subsequences and the plurality of fourth subsequences, the similarity between the classified time sequence included in the first subset and the classified time sequence included in the second subset with respect to each sliding window length may be calculated, and the calculation may be performed multiple times. Wherein, the number of times of the multiple calculations may be inversely proportional to the calculated data amount, i.e. if the calculated data amount is large, the number of times may be reduced; if the calculated data amount is small, the number of times can be increased, and the determination can be specifically carried out according to the actual situation.
Step S620, determining a single reference category of the sorted time series contained in the second subset relative to the length of each sliding window according to the similarity obtained by each calculation.
After the multiple calculations are performed, a single reference category of the classified time series contained in the second subset with respect to each sliding window length may be determined according to the similarity obtained from each calculation. The specific determination method is the same as in step S420.
Step S630, determining a plurality of classification accuracies corresponding to each sliding window length according to the class of the classified time series included in the second subset and the single reference class of the classified time series included in the second subset relative to each sliding window length.
Specifically, if the category of the sorted time series contained in the second subset and the single reference category of the sorted time series contained in the second subset with respect to each sliding window length are the same, the single sorting accuracy of the sorted time series contained in the second subset with respect to each sliding window length may be determined to be 100%, whereas the single sorting accuracy of the sorted time series contained in the second subset with respect to each sliding window length may be determined to be 0%.
Furthermore, according to the single classification accuracy of the classified time series contained in the second subset relative to each sliding window length, the single classification accuracy corresponding to each sliding window length can be determined. For example, a ratio of the sum of the single classification accuracy rates of the classified time series contained in the second subset with respect to the respective sliding window lengths to the number of the classified time series contained in the second subset is calculated, and the ratio is used as the single classification accuracy rate corresponding to the respective sliding window lengths.
After the single classification accuracy corresponding to each sliding window length is obtained, a plurality of classification accuracies corresponding to each sliding window length are also obtained.
Continuing with the example in step S430, the classification accuracy (100% + 0%)/2 = 50% corresponding to the sliding window length w1 and the classification accuracy (0% + 0%)/2 = 0% corresponding to the sliding window length w2, both calculated in step S430, are the result of one calculation. If 5 calculations are performed in this embodiment, 5 classification accuracies are obtained for each sliding window length; illustratively, the 5 calculation results may be as shown in Table 1.
| Calculation | w1 = 1 | w2 = 2 |
| --- | --- | --- |
| First | 50% | 0% |
| Second | 50% | 50% |
| Third | 50% | 0% |
| Fourth | 0% | 0% |
| Fifth | 50% | 0% |
Table 1
And step S640, calculating the ratio of the sum of the classification accuracy rates corresponding to the sliding window lengths to the times, and taking the calculated ratio as the classification accuracy rate corresponding to the sliding window lengths.
After determining the plurality of classification accuracy rates corresponding to the lengths of the sliding windows, the ratio between the sum of the plurality of classification accuracy rates and the number of times can be further calculated, so that the calculated ratio is used as the classification accuracy rate corresponding to each length of the sliding window.
For example, assuming that the classification accuracies shown in Table 1 above are obtained, the classification accuracy corresponding to the sliding window length w1 is (50% + 50% + 50% + 0% + 50%)/5 = 40%, and the classification accuracy corresponding to the sliding window length w2 is (0% + 50% + 0% + 0% + 0%)/5 = 10%.
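The averaging over repeated calculations can be sketched in Python with the values of Table 1. The variable and function names are hypothetical.

```python
def mean_accuracy(accuracies):
    """Ratio of the sum of the per-calculation accuracies to the number of calculations."""
    return sum(accuracies) / len(accuracies)


# Per-calculation accuracies from Table 1, keyed by sliding window length.
accuracies_per_window = {
    1: [0.5, 0.5, 0.5, 0.0, 0.5],
    2: [0.0, 0.5, 0.0, 0.0, 0.0],
}
averaged = {w: mean_accuracy(values) for w, values in accuracies_per_window.items()}
print(averaged)  # {1: 0.4, 2: 0.1}
```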
With continued reference to fig. 3, in step S340, the target sliding window length is determined according to the classification accuracy corresponding to each sliding window length.
After the classification accuracy rates corresponding to the sliding window lengths are obtained through the above embodiment, the target sliding window length may be determined according to the classification accuracy rates corresponding to the sliding window lengths, for example, the sliding window length corresponding to the classification accuracy rate greater than a preset threshold value in the classification accuracy rates corresponding to the sliding window lengths may be used as the target sliding window length.
In an embodiment of the present application, after determining the classification accuracy corresponding to each sliding window length, the maximum classification accuracy of the classification accuracies corresponding to each sliding window length may also be obtained, and the sliding window length corresponding to the maximum classification accuracy is taken as the target sliding window length.
In this embodiment, the sliding window length corresponding to the maximum classification accuracy may be used as the target sliding window length. For example, among the classification accuracies obtained in step S640, the classification accuracy corresponding to the sliding window length w1 is the maximum, so the sliding window length w1 can be taken as the target sliding window length.
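Both selection rules, the preset threshold and the maximum classification accuracy, can be sketched in Python as follows. The function names and the threshold value are hypothetical, and resolving ties in favour of the first window length encountered is an assumption of the sketch.

```python
def select_by_maximum(accuracy_by_window):
    """Sliding window length with the maximum classification accuracy."""
    # max() over a dict iterates its keys; ties go to the first key inserted.
    return max(accuracy_by_window, key=accuracy_by_window.get)


def select_by_threshold(accuracy_by_window, threshold):
    """All sliding window lengths whose classification accuracy exceeds the threshold."""
    return sorted(w for w, acc in accuracy_by_window.items() if acc > threshold)


accuracy_by_window = {1: 0.4, 2: 0.1}  # averaged accuracies from the example above
print(select_by_maximum(accuracy_by_window))         # 1
print(select_by_threshold(accuracy_by_window, 0.3))  # [1]
```

The threshold rule may return several lengths or none at all, whereas the maximum rule always returns exactly one length for a non-empty input.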
Embodiments of the apparatus of the present application are described below, which may be used to perform the time series classification method in the above-described embodiments of the present application. For details that are not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the time-series classification method described above in the present application.
Fig. 7 is a block diagram illustrating a time-series classification apparatus according to an embodiment of the present application, and referring to fig. 7, a time-series classification apparatus 700 according to an embodiment of the present application includes: an acquisition unit 702, a first division unit 704, a calculation unit 706, and a first determination unit 708.
The obtaining unit 702 is configured to obtain a time series to be classified and a classified time series set, where the classified time series set includes a plurality of classified time series and the categories of the classified time series, and each classified time series includes a plurality of first subsequences obtained by dividing according to a target sliding window length; the first dividing unit 704 is configured to divide the time series to be classified according to the target sliding window length to obtain a plurality of second subsequences; the calculating unit 706 is configured to calculate the similarity between the time series to be classified and each classified time series according to the plurality of second subsequences and the plurality of first subsequences included in each classified time series; and the first determining unit 708 is configured to determine the category of the time series to be classified according to the similarity between the time series to be classified and each classified time series.
In some embodiments of the present application, the first determining unit 708 is configured to: and acquiring the maximum similarity in the similarities between the time sequence to be classified and each classified time sequence, and taking the category of the classified time sequence corresponding to the maximum similarity as the category of the time sequence to be classified.
In some embodiments of the present application, the apparatus further comprises: a generating unit configured to divide the classified time series set into a first subset and a second subset, and generate a plurality of sliding window lengths according to the sequence length of each classified time series; a second dividing unit, configured to divide the sorted time series included in the first subset according to each sliding window length to obtain a plurality of third subsequences corresponding to each sliding window length, and divide the sorted time series included in the second subset according to each sliding window length to obtain a plurality of fourth subsequences corresponding to each sliding window length; a second determining unit, configured to determine, according to the plurality of third subsequences and the plurality of fourth subsequences, classification accuracy rates corresponding to the respective sliding window lengths; and the third determining unit is configured to determine the length of the target sliding window according to the classification accuracy corresponding to each sliding window length.
In some embodiments of the present application, the third determining unit is configured to: and acquiring the maximum classification accuracy rate of the classification accuracy rates corresponding to the lengths of the sliding windows, and taking the length of the sliding window corresponding to the maximum classification accuracy rate as the length of the target sliding window.
In some embodiments of the present application, the second determination unit includes: a calculating subunit configured to calculate, according to the plurality of third subsequences and the plurality of fourth subsequences, similarities between the sorted time sequences contained in the first subset and the sorted time sequences contained in the second subset with respect to the respective sliding window lengths; a first determining subunit, configured to determine a reference category of the sorted time series contained in the second subset with respect to the respective sliding window lengths according to a similarity between the sorted time series contained in the first subset and the sorted time series contained in the second subset with respect to the respective sliding window lengths; a second determining subunit, configured to determine, according to the category of the sorted time series included in the second subset and the reference category of the sorted time series included in the second subset relative to each sliding window length, a sorting accuracy corresponding to each sliding window length.
In some embodiments of the present application, the first determining subunit is configured to: acquire the maximum similarity between the classified time series contained in the first subset and the classified time series contained in the second subset with respect to each sliding window length, and take the category of the classified time series corresponding to the maximum similarity as the reference category of the classified time series contained in the second subset with respect to that sliding window length.
In some embodiments of the present application, the second determining subunit is configured to: determine the classification accuracy of each classified time series contained in the second subset with respect to each sliding window length according to the category of that classified time series and its reference category with respect to each sliding window length; and determine the classification accuracy corresponding to each sliding window length according to the classification accuracies of the classified time series contained in the second subset with respect to that sliding window length and the number of classified time series contained in the second subset.
In some embodiments of the present application, the second determining subunit is configured to: calculate the ratio of the sum of the classification accuracies of the classified time series contained in the second subset with respect to each sliding window length to the number of classified time series contained in the second subset, and take the ratio as the classification accuracy corresponding to that sliding window length.
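The ratio described above can be sketched for a single sliding window length. The sketch assumes that the per-series classification accuracy is 1 when the reference category equals the known category and 0 otherwise; the text does not spell this out, and the function name `window_accuracy` is hypothetical.

```python
from typing import Hashable, Sequence


def window_accuracy(
    true_categories: Sequence[Hashable],
    reference_categories: Sequence[Hashable],
) -> float:
    """Accuracy for one sliding window length over the second subset.

    Assumption: each series contributes 1.0 if its reference category matches
    its known category and 0.0 otherwise. The result is the sum of these
    per-series values divided by the number of series in the second subset.
    """
    if len(true_categories) != len(reference_categories):
        raise ValueError("both sequences must describe the same series")
    if not true_categories:
        raise ValueError("the second subset is empty")
    per_series = [
        1.0 if known == reference else 0.0
        for known, reference in zip(true_categories, reference_categories)
    ]
    return sum(per_series) / len(per_series)


# Three of four reference categories match the known categories.
accuracy = window_accuracy(["a", "b", "a", "c"], ["a", "b", "c", "c"])
```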
In some embodiments of the present application, the second determining unit is configured to: calculate, multiple times, the similarity between the classified time series contained in the first subset and the classified time series contained in the second subset with respect to each sliding window length, according to the plurality of third subsequences and the plurality of fourth subsequences; determine, from the similarity obtained in each calculation, a single-run reference category of the classified time series contained in the second subset with respect to each sliding window length; determine a plurality of classification accuracies corresponding to each sliding window length according to the category of the classified time series contained in the second subset and the single-run reference categories of the classified time series contained in the second subset with respect to each sliding window length; and calculate the ratio of the sum of the plurality of classification accuracies corresponding to each sliding window length to the number of calculations, and take the calculated ratio as the classification accuracy corresponding to that sliding window length.
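The averaging over repeated calculations can be sketched as below. The callable `evaluate_once` is a hypothetical stand-in for one full pass (similarity, single-run reference categories, accuracy per window length); why the passes differ, for example a random split into subsets, is an assumption and is not stated in the text.

```python
import random
from typing import Callable, Dict


def averaged_accuracy(
    evaluate_once: Callable[[random.Random], Dict[int, float]],
    times: int,
    seed: int = 0,
) -> Dict[int, float]:
    """Average the per-window-length accuracy over `times` evaluation runs.

    `evaluate_once` (hypothetical) performs one run and returns a mapping from
    sliding window length to the accuracy measured in that run. Every run is
    assumed to report the same set of window lengths.
    """
    if times <= 0:
        raise ValueError("times must be positive")
    rng = random.Random(seed)  # shared generator so the runs are reproducible
    totals: Dict[int, float] = {}
    for _ in range(times):
        for length, accuracy in evaluate_once(rng).items():
            totals[length] = totals.get(length, 0.0) + accuracy
    # Ratio of the summed accuracies to the number of runs.
    return {length: total / times for length, total in totals.items()}
```

For instance, two runs that report `{4: 0.5, 8: 1.0}` and `{4: 1.0, 8: 0.5}` average to 0.75 for both window lengths.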
FIG. 8 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
It should be noted that the computer system 800 of the electronic device shown in FIG. 8 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 8, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes, such as the methods described in the above embodiments, according to a program stored in a Read-Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The RAM 803 also stores various programs and data necessary for system operation. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An Input/Output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a Local Area Network (LAN) card, a modem, or the like. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read from it can be installed into the storage section 808 as necessary.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. When the computer program is executed by the Central Processing Unit (CPU) 801, the various functions defined in the system of the present application are executed.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal that carries a computer program, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. The computer program embodied on the computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wired, or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented in software or in hardware, and the described units may also be disposed in a processor. In some cases, the names of the units do not constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by an electronic device, cause the electronic device to implement the method described in the above embodiments.
It should be noted that although several modules or units of the device for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that enable a computing device (such as a personal computer, a server, a touch terminal, or a network device) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
Claims (10)
1. A method of time series classification, the method comprising:
acquiring a time series to be classified and a classified time series set, wherein the classified time series set comprises a plurality of classified time series and the category of each classified time series, and each classified time series comprises a plurality of first subsequences obtained by dividing it according to a target sliding window length;
dividing the time series to be classified according to the target sliding window length to obtain a plurality of second subsequences;
calculating the similarity between the time series to be classified and each classified time series according to the plurality of second subsequences and the plurality of first subsequences contained in each classified time series; and
determining the category of the time series to be classified according to the similarity between the time series to be classified and each classified time series.
2. The method according to claim 1, wherein determining the category of the time series to be classified according to the similarity between the time series to be classified and each classified time series comprises:
acquiring the maximum similarity among the similarities between the time series to be classified and each classified time series, and taking the category of the classified time series corresponding to the maximum similarity as the category of the time series to be classified.
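Claims 1 and 2 can be illustrated with the following sketch. Several choices here are assumptions, since the claims do not fix them: the window slides with a step of 1, the similarity is derived from the average nearest-subsequence Euclidean distance (mapped to the range (0, 1]), and the first subsequences are computed on the fly rather than stored with each classified time series. The names `split_into_subsequences`, `similarity`, and `classify` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def split_into_subsequences(series: Sequence[float], window_length: int) -> List[List[float]]:
    """Divide a series into subsequences with a sliding window (assumed step of 1)."""
    if window_length <= 0 or window_length > len(series):
        raise ValueError("window length must be between 1 and the series length")
    return [list(series[i:i + window_length]) for i in range(len(series) - window_length + 1)]


def similarity(second: List[List[float]], first: List[List[float]]) -> float:
    """Hypothetical similarity: for each second subsequence take the Euclidean
    distance to its closest first subsequence, average those distances, and
    map the average d to 1 / (1 + d) so that larger means more similar."""
    total = sum(min(math.dist(a, b) for b in first) for a in second)
    return 1.0 / (1.0 + total / len(second))


def classify(
    series: Sequence[float],
    classified: List[Tuple[Sequence[float], str]],
    window_length: int,
) -> str:
    """Return the category of the classified series with the maximum similarity."""
    if not classified:
        raise ValueError("the classified time series set is empty")
    second = split_into_subsequences(series, window_length)
    best_category, best_similarity = "", -1.0
    for reference, category in classified:
        first = split_into_subsequences(reference, window_length)
        score = similarity(second, first)
        if score > best_similarity:
            best_category, best_similarity = category, score
    return best_category


labeled = [([0, 0, 0, 0, 0, 0], "flat"), ([0, 1, 2, 3, 4, 5], "rising")]
predicted = classify([0.1, 1.1, 2.0, 3.2, 3.9, 5.1], labeled, window_length=3)
```

In practice the first subsequences would be computed once and kept with the classified set, as the claim wording suggests, so that only the series to be classified needs to be divided at query time.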
3. The method of claim 1, further comprising:
dividing the classified time series set into a first subset and a second subset, and generating a plurality of sliding window lengths according to the sequence length of each classified time series;
dividing the classified time series contained in the first subset according to each sliding window length to obtain a plurality of third subsequences corresponding to each sliding window length, and dividing the classified time series contained in the second subset according to each sliding window length to obtain a plurality of fourth subsequences corresponding to each sliding window length;
determining the classification accuracy corresponding to each sliding window length according to the plurality of third subsequences and the plurality of fourth subsequences; and
determining the target sliding window length according to the classification accuracy corresponding to each sliding window length.
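The first step of claim 3 can be sketched as below. The claim only says that the window lengths are generated "according to the sequence length", so deriving candidates as integer fractions of the shortest series is an assumption, as are the random split, the split ratio, and the names `candidate_window_lengths` and `split_into_subsets`.

```python
import random
from typing import List, Sequence, Tuple

Labeled = Tuple[Sequence[float], str]  # (classified time series, category)


def candidate_window_lengths(
    classified: List[Labeled],
    divisors: Sequence[int] = (10, 5, 3, 2),
) -> List[int]:
    """Generate candidate sliding window lengths from the shortest series.

    Assumption: candidates are the shortest sequence length divided by each
    divisor, keeping only lengths of at least 2.
    """
    if not classified:
        raise ValueError("the classified time series set is empty")
    shortest = min(len(series) for series, _ in classified)
    lengths = {shortest // divisor for divisor in divisors}
    return sorted(length for length in lengths if length >= 2)


def split_into_subsets(
    classified: List[Labeled],
    first_fraction: float = 0.5,
    seed: int = 0,
) -> Tuple[List[Labeled], List[Labeled]]:
    """Randomly divide the classified set into a first and a second subset."""
    if len(classified) < 2:
        raise ValueError("at least two classified series are needed")
    shuffled = list(classified)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one series in each subset.
    cut = max(1, min(len(shuffled) - 1, int(len(shuffled) * first_fraction)))
    return shuffled[:cut], shuffled[cut:]
```

Each subset would then be divided with every candidate length to obtain the third and fourth subsequences, and the accuracy measured per length decides the target sliding window length.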
4. The method of claim 3, wherein determining the target sliding window length according to the classification accuracy corresponding to each sliding window length comprises:
acquiring the maximum classification accuracy among the classification accuracies corresponding to the sliding window lengths, and taking the sliding window length corresponding to the maximum classification accuracy as the target sliding window length.
5. The method of claim 3, wherein determining the classification accuracy corresponding to each sliding window length according to the third subsequences and the fourth subsequences comprises:
calculating the similarity between the classified time series contained in the first subset and the classified time series contained in the second subset with respect to each sliding window length according to the plurality of third subsequences and the plurality of fourth subsequences;
determining a reference category of the classified time series contained in the second subset with respect to each sliding window length according to the similarity between the classified time series contained in the first subset and the classified time series contained in the second subset with respect to each sliding window length; and
determining the classification accuracy corresponding to each sliding window length according to the category of the classified time series contained in the second subset and the reference category of the classified time series contained in the second subset with respect to each sliding window length.
6. The method of claim 5, wherein determining the reference category of the classified time series contained in the second subset with respect to each sliding window length according to the similarity between the classified time series contained in the first subset and the classified time series contained in the second subset with respect to each sliding window length comprises:
acquiring the maximum similarity between the classified time series contained in the first subset and the classified time series contained in the second subset with respect to each sliding window length, and taking the category of the classified time series corresponding to the maximum similarity as the reference category of the classified time series contained in the second subset with respect to that sliding window length.
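For one sliding window length, the assignment of reference categories in claim 6 can be sketched from a precomputed similarity table. The table layout (one row per second-subset series, one column per first-subset series) and the name `reference_categories` are assumptions made for this illustration.

```python
from typing import List


def reference_categories(
    similarities: List[List[float]],
    first_subset_categories: List[str],
) -> List[str]:
    """Pick a reference category for every series in the second subset.

    similarities[i][j] is assumed to hold the similarity between the i-th
    series of the second subset and the j-th series of the first subset for
    one sliding window length. Each second-subset series receives the
    category of the first-subset series with the maximum similarity.
    """
    references = []
    for row in similarities:
        if len(row) != len(first_subset_categories):
            raise ValueError("each row needs one similarity per first-subset series")
        best_index = max(range(len(row)), key=row.__getitem__)
        references.append(first_subset_categories[best_index])
    return references


# Two second-subset series compared against two first-subset series.
refs = reference_categories([[0.2, 0.9], [0.7, 0.1]], ["walk", "run"])
```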
7. The method of claim 5, wherein determining the classification accuracy corresponding to each sliding window length according to the category of the classified time series contained in the second subset and the reference category of the classified time series contained in the second subset with respect to each sliding window length comprises:
determining the classification accuracy of each classified time series contained in the second subset with respect to each sliding window length according to the category of that classified time series and its reference category with respect to each sliding window length; and
determining the classification accuracy corresponding to each sliding window length according to the classification accuracies of the classified time series contained in the second subset with respect to that sliding window length and the number of classified time series contained in the second subset.
8. The method of claim 7, wherein determining the classification accuracy corresponding to each sliding window length according to the classification accuracy of the classified time series contained in the second subset relative to each sliding window length and the number of the classified time series contained in the second subset comprises:
calculating the ratio of the sum of the classification accuracies of the classified time series contained in the second subset with respect to each sliding window length to the number of classified time series contained in the second subset, and taking the ratio as the classification accuracy corresponding to that sliding window length.
9. The method of claim 3, wherein determining the classification accuracy corresponding to each sliding window length according to the third subsequences and the fourth subsequences comprises:
calculating, multiple times, the similarity between the classified time series contained in the first subset and the classified time series contained in the second subset with respect to each sliding window length according to the plurality of third subsequences and the plurality of fourth subsequences;
determining, from the similarity obtained in each calculation, a single-run reference category of the classified time series contained in the second subset with respect to each sliding window length;
determining a plurality of classification accuracies corresponding to each sliding window length according to the category of the classified time series contained in the second subset and the single-run reference categories of the classified time series contained in the second subset with respect to each sliding window length; and
calculating the ratio of the sum of the plurality of classification accuracies corresponding to each sliding window length to the number of calculations, and taking the calculated ratio as the classification accuracy corresponding to that sliding window length.
10. A time series classification apparatus, characterized in that the apparatus comprises:
an acquisition unit configured to acquire a time series to be classified and a classified time series set, wherein the classified time series set comprises a plurality of classified time series and the category of each classified time series, and each classified time series comprises a plurality of first subsequences obtained by dividing it according to a target sliding window length;
a first dividing unit configured to divide the time series to be classified according to the target sliding window length to obtain a plurality of second subsequences;
a calculating unit configured to calculate the similarity between the time series to be classified and each classified time series according to the plurality of second subsequences and the plurality of first subsequences contained in each classified time series; and
a first determining unit configured to determine the category of the time series to be classified according to the similarity between the time series to be classified and each classified time series.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011003407.1A CN112131322B (en) | 2020-09-22 | 2020-09-22 | Time sequence classification method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112131322A (en) | 2020-12-25 |
CN112131322B (en) | 2023-10-10 |
Family
ID=73842422
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011003407.1A Active CN112131322B (en) | 2020-09-22 | 2020-09-22 | Time sequence classification method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112131322B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103136327A (en) * | 2012-12-28 | 2013-06-05 | 中国矿业大学 | Time series symbolization method based on local feature clustering |
CN105224543A (en) * | 2014-05-30 | 2016-01-06 | 国际商业机器公司 | Method and apparatus for processing time series |
CN104657749A (en) * | 2015-03-05 | 2015-05-27 | 苏州大学 | Method and device for classifying time series |
CN111291824A (en) * | 2020-02-24 | 2020-06-16 | 网易(杭州)网络有限公司 | Time sequence processing method and device, electronic equipment and computer readable medium |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112965963A (en) * | 2021-02-05 | 2021-06-15 | 同盾科技有限公司 | Information processing method |
CN113821574A (en) * | 2021-08-31 | 2021-12-21 | 北京达佳互联信息技术有限公司 | User behavior classification method and device and storage medium |
CN113821574B (en) * | 2021-08-31 | 2024-07-30 | 北京达佳互联信息技术有限公司 | User behavior classification method and device and storage medium |
CN113836240A (en) * | 2021-09-07 | 2021-12-24 | 招商银行股份有限公司 | Time sequence data classification method and device, terminal equipment and storage medium |
CN113836240B (en) * | 2021-09-07 | 2024-02-20 | 招商银行股份有限公司 | Time sequence data classification method, device, terminal equipment and storage medium |
CN116541784A (en) * | 2023-07-04 | 2023-08-04 | 乐山师范学院 | Time sequence classification method and device based on dictionary tree and coverage |
CN116541784B (en) * | 2023-07-04 | 2023-09-26 | 乐山师范学院 | Time sequence classification method and device based on dictionary tree and coverage |
Also Published As
Publication number | Publication date |
---|---|
CN112131322B (en) | 2023-10-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Bolón-Canedo et al. | Feature selection for high-dimensional data | |
CN110377740B (en) | Emotion polarity analysis method and device, electronic equipment and storage medium | |
CN108197652B (en) | Method and apparatus for generating information | |
CN112131322B (en) | Time sequence classification method and device | |
WO2021174944A1 (en) | Message push method based on target activity, and related device | |
CN108540826B (en) | Bullet screen pushing method and device, electronic equipment and storage medium | |
CN110069709B (en) | Intention recognition method, device, computer readable medium and electronic equipment | |
CN112528025A (en) | Text clustering method, device and equipment based on density and storage medium | |
CN109033408B (en) | Information pushing method and device, computer readable storage medium and electronic equipment | |
CN111785384B (en) | Abnormal data identification method based on artificial intelligence and related equipment | |
CN110390408A (en) | Trading object prediction technique and device | |
CN108121699B (en) | Method and apparatus for outputting information | |
CN109214501B (en) | Method and apparatus for identifying information | |
CN113688310B (en) | Content recommendation method, device, equipment and storage medium | |
US11017572B2 (en) | Generating a probabilistic graphical model with causal information | |
CN116402166B (en) | Training method and device of prediction model, electronic equipment and storage medium | |
CN117421491A (en) | Method and device for quantifying social media account running data and electronic equipment | |
Zhang et al. | A generative adversarial network–based method for generating negative financial samples | |
Xu | Machine Learning for Flavor Development | |
JP7288062B2 (en) | Methods and devices for outputting information, electronic devices, storage media, and computer programs | |
CN111275683B (en) | Image quality grading processing method, system, device and medium | |
CN112541069A (en) | Text matching method, system, terminal and storage medium combined with keywords | |
CN110852078A (en) | Method and device for generating title | |
CN110162714A (en) | Content delivery method, calculates equipment and computer readable storage medium at device | |
CN116340864B (en) | Model drift detection method, device, equipment and storage medium thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||