CN113688929B - Prediction model determining method, apparatus, electronic device and computer storage medium - Google Patents

Info

Publication number
CN113688929B
Authority
CN
China
Prior art keywords
data
numerical
type
threshold
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111019064.2A
Other languages
Chinese (zh)
Other versions
CN113688929A (en)
Inventor
张发恩
刘祝崧
姜勇越
王菲
王建华
徐辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ruiyun Qizhi Chongqing Technology Co ltd
Original Assignee
Ruiyun Qizhi Chongqing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ruiyun Qizhi Chongqing Technology Co ltd filed Critical Ruiyun Qizhi Chongqing Technology Co ltd
Priority to CN202111019064.2A priority Critical patent/CN113688929B/en
Publication of CN113688929A publication Critical patent/CN113688929A/en
Application granted granted Critical
Publication of CN113688929B publication Critical patent/CN113688929B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/20 Administration of product repair or maintenance

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Operations Research (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Marketing (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Development Economics (AREA)
  • Game Theory and Decision Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a prediction model determining method, a device, electronic equipment and a computer storage medium, wherein the method comprises the following steps: extracting features of the received data to obtain corresponding data features; according to the data characteristics, carrying out predictable type classification on the data to obtain a classification result; determining a prediction model corresponding to the data according to the classification result; wherein, each classification result is matched with a corresponding prediction model in advance. By the method, the prediction scheme corresponding to the time sequence data can be rapidly and automatically determined, so that the labor cost is saved and the prediction efficiency is improved.

Description

Prediction model determining method, apparatus, electronic device and computer storage medium
Technical Field
The application belongs to the field of data processing, and particularly relates to a prediction model determining method, a prediction model determining device, electronic equipment and a computer storage medium.
Background
Most of the data collected by an operation and maintenance system is structured time series data, such as CPU utilization at different time points and other machine metrics. During operation, such data often needs to be predicted.
In the prior art, no general algorithm can reliably predict every type of time-series data generated in every scenario, so an engineer typically selects a suitable prediction algorithm manually, based on experience, for the time-series data of each scenario. This process is labor-intensive and time-consuming.
Disclosure of Invention
In view of the foregoing, an object of the present application is to provide a prediction model determining method, apparatus, electronic device, and computer storage medium, so as to quickly and automatically determine a prediction scheme corresponding to time series data, thereby saving labor cost and improving prediction efficiency.
Embodiments of the present application are implemented as follows:
In a first aspect, an embodiment of the present application provides a method for determining a prediction model, where the method includes:
extracting features of the received data to obtain corresponding data features; according to the data characteristics, carrying out predictable type classification on the data to obtain a classification result; determining a prediction model corresponding to the data according to the classification result; wherein, each classification result is matched with a corresponding prediction model in advance. By the method, the prediction scheme corresponding to the time sequence data can be automatically determined, and the prediction scheme is not required to be manually determined, so that the labor cost is saved and the prediction efficiency is improved.
With reference to the first aspect embodiment, in a possible implementation manner, performing feature extraction on the received data includes: acquiring a service type corresponding to the data, and determining a value range type of the data according to the service type; calculating the numerical fluctuation amplitude of the data according to the value range type; determining a numerical banded distribution, a numerical continuity, a numerical periodicity, and a numerical complexity of the data; the data features include the value range type, the numerical fluctuation amplitude, the numerical banded distribution, the numerical continuity, the numerical periodicity, and the numerical complexity.
With reference to the first aspect embodiment, in one possible implementation manner, each data is time-series data, the value range type is a fixed value range or a non-fixed value range, and calculating the numerical fluctuation amplitude of the data according to the value range type includes: when the value range type is determined to be the fixed value range, calculating the numerical fluctuation amplitude according to a first formula [formula missing from the source text];
alternatively, when the value range type is determined to be the non-fixed value range, calculating the numerical fluctuation amplitude according to a second formula [formula missing from the source text]; wherein $A$ is the numerical fluctuation amplitude, $N$ is the number of data points included in the time series data, $x_i$ is the data value of the $i$-th data point in the time series data, and $\bar{x}$ is the average data value of the data points included in the time series data.
With reference to the first aspect embodiment, in a possible implementation manner, each data is time-series data, and determining the numerical banded distribution, the numerical continuity, the numerical periodicity, and the numerical complexity of the data includes: calculating the numerical banded distribution according to the formula $B = M_1 + M_2 + M_3$, where $B$ is the numerical banded distribution, $M_1$ is the proportion of the first mode in the data, $M_2$ is the proportion of the second mode in the data, and $M_3$ is the proportion of the third mode in the data; calculating the numerical continuity according to the formula $C = \mathrm{Acf}(1)$, where $C$ is the numerical continuity and $\mathrm{Acf}(1)$ is the autocorrelation coefficient with parameter 1; calculating the numerical periodicity according to the formula $D = \mathrm{Acf}(T)$, where $D$ is the numerical periodicity and $\mathrm{Acf}(T)$ is the autocorrelation coefficient with parameter $T$; and calculating the numerical complexity according to the formula $E = \mathrm{ApEn} = \Phi^{m}(r) - \Phi^{m+1}(r)$, where $E$ is the numerical complexity, ApEn is the approximate entropy, [formula missing from the source text], $N$ is the number of data points included in the time series data, $r$ is a real number, and $m$ is a positive integer less than $N$.
With reference to the first aspect embodiment, in a possible implementation manner, the data features include the numerical continuity of the data and the numerical complexity of the data; classifying the data by predictable type according to the data features includes: judging whether the numerical continuity is smaller than a first threshold; when the numerical continuity is smaller than the first threshold, judging whether the numerical complexity is greater than a second threshold; and when the numerical complexity is greater than the second threshold, determining that the type of the data is an unpredictable type, the prediction model corresponding to the unpredictable type being null.
With reference to the first aspect embodiment, in a possible implementation manner, classifying the data by predictable type according to the data features further includes: when the numerical complexity is not greater than the second threshold, determining that the type of the data is a first predictable type, the prediction model corresponding to the first predictable type being a deep learning model.
With reference to the first aspect embodiment, in a possible implementation manner, the data features further include a value range type of the data, a value fluctuation range of the data, a value band distribution of the data, and a value periodicity of the data; the classifying the data according to the data characteristics, further includes: when the numerical continuity is not smaller than the first threshold, if the numerical periodicity is determined to be smaller than a third threshold and the numerical fluctuation amplitude is greater than a fourth threshold, determining that the type of the data is a second predictable type, and determining that a prediction model corresponding to the second predictable type is an ARIMA model; or,
and when the numerical continuity is not smaller than the first threshold, if the numerical periodicity is determined to be larger than or equal to the third threshold and the numerical banded distribution is determined to be smaller than a fifth threshold, determining the type of the data to be a third predictable type, and determining a prediction model corresponding to the third predictable type to be a Prophet model.
In a second aspect, an embodiment of the present application provides a prediction model determining apparatus, including: the device comprises a feature extraction module, a classification module and a determination module.
The feature extraction module is used for carrying out feature extraction on the received data to obtain corresponding data features;
the classification module is used for classifying the data in a predictable type according to the data characteristics to obtain a classification result;
the determining module is used for determining a prediction model corresponding to the data according to the classification result;
wherein, each classification result is matched with a corresponding prediction model in advance.
With reference to the second aspect of the embodiment, in a possible implementation manner, the feature extraction module is configured to obtain a service type corresponding to the data, and determine a value range type of the data according to the service type; calculating the numerical fluctuation amplitude of the data according to the value range type; determining a numerical banding distribution, a numerical continuity, a numerical periodicity, and a numerical complexity of the data; the data characteristics include the value range type, the value fluctuation amplitude, the value banding distribution, the value continuity, the value periodicity, and the value complexity.
With reference to the second aspect of the embodiment, in a possible implementation manner, each data is time-series data, the value range type is a fixed value range or a non-fixed value range, and the feature extraction module is configured to, when determining that the value range type is the fixed value range, calculate the numerical fluctuation amplitude according to a first formula [formula missing from the source text]; or alternatively,
when the value range type is determined to be the non-fixed value range, calculate the numerical fluctuation amplitude according to a second formula [formula missing from the source text];
wherein $A$ is the numerical fluctuation amplitude, $N$ is the number of data points included in the time series data, $x_i$ is the data value of the $i$-th data point in the time series data, and $\bar{x}$ is the average data value of the data points included in the time series data.
With reference to the second aspect of the embodiment, in one possible implementation manner, each data is time-series data, and the feature extraction module is configured to calculate the numerical banded distribution according to the formula $B = M_1 + M_2 + M_3$, where $B$ is the numerical banded distribution, $M_1$ is the proportion of the first mode in the data, $M_2$ is the proportion of the second mode in the data, and $M_3$ is the proportion of the third mode in the data;
calculate the numerical continuity according to the formula $C = \mathrm{Acf}(1)$, where $C$ is the numerical continuity and $\mathrm{Acf}(1)$ is the autocorrelation coefficient with parameter 1;
calculate the numerical periodicity according to the formula $D = \mathrm{Acf}(T)$, where $D$ is the numerical periodicity and $\mathrm{Acf}(T)$ is the autocorrelation coefficient with parameter $T$;
and calculate the numerical complexity according to the formula $E = \mathrm{ApEn} = \Phi^{m}(r) - \Phi^{m+1}(r)$, where $E$ is the numerical complexity, ApEn is the approximate entropy, [formula missing from the source text], $N$ is the number of data points included in the time series data, $r$ is a real number, and $m$ is a positive integer less than $N$.
With reference to the second aspect embodiment, in a possible implementation manner, the data features include the numerical continuity of the data and the numerical complexity of the data; the classification module is configured to judge whether the numerical continuity is smaller than a first threshold; when the numerical continuity is smaller than the first threshold, judge whether the numerical complexity is greater than a second threshold; and when the numerical complexity is greater than the second threshold, determine that the type of the data is an unpredictable type, the prediction model corresponding to the unpredictable type being null.
With reference to the second aspect of the embodiment, in a possible implementation manner, the classification module is further configured to determine, when the numerical complexity is not greater than the second threshold, that the type of the data is a first predictable type, the prediction model corresponding to the first predictable type being a DeepAR deep learning model.
With reference to the second aspect embodiment, in a possible implementation manner, the data features further include a value range type of the data, a value fluctuation range of the data, a value banding distribution of the data, and a value periodicity of the data; the classification module is further configured to determine, when the numerical continuity is not less than the first threshold, that the type of the data is a second predictable type and that a prediction model corresponding to the second predictable type is an ARIMA model if it is determined that the numerical periodicity is less than a third threshold and that the numerical fluctuation range is greater than a fourth threshold; or,
and when the numerical continuity is not smaller than the first threshold, if the numerical periodicity is determined to be larger than or equal to the third threshold and the numerical banded distribution is determined to be smaller than a fifth threshold, determining the type of the data to be a third predictable type, and determining a prediction model corresponding to the third predictable type to be a Prophet model.
In a third aspect, embodiments of the present application further provide an electronic device, including: the device comprises a memory and a processor, wherein the memory is connected with the processor; the memory is used for storing programs; the processor invokes a program stored in the memory to perform the above-described first aspect embodiment and/or the method provided in connection with any one of the possible implementations of the first aspect embodiment.
In a fourth aspect, the embodiments of the present application further provide a non-volatile computer readable storage medium (hereinafter referred to as computer storage medium), on which a computer program is stored, which when executed by a computer performs the above-described embodiments of the first aspect and/or the method provided in connection with any one of the possible implementations of the embodiments of the first aspect.
Additional features and advantages of the application will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the embodiments of the application. The objects and other advantages of the present application may be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art. The above and other objects, features and advantages of the present application will become more apparent from the accompanying drawings. Like reference numerals refer to like parts throughout the several views of the drawings. The drawings are not intended to be drawn to scale, with emphasis instead being placed upon illustrating the principles of the present application.
Fig. 1 shows a flowchart of a prediction model determining method provided in an embodiment of the present application.
Fig. 2 shows a schematic diagram of a classification rule according to an embodiment of the present application.
Fig. 3 shows a block diagram of a prediction model determining apparatus according to an embodiment of the present application.
Fig. 4 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
Reference numerals: 100-electronic device; 110-processor; 120-memory; 400-prediction model determining apparatus; 410-feature extraction module; 420-classification module; 430-determination module.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Also, relational terms such as "first," "second," and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Furthermore, the term "and/or" in this application merely describes an association between associated objects and indicates that three relations may exist. For example, A and/or B may indicate: A exists alone, A and B exist together, or B exists alone.
Furthermore, the defects identified above in the prior-art approach to determining a prediction model (it requires labor and is time-consuming) were found by the applicant through practice and careful study. The process of discovering these defects, and the solutions proposed for them in the embodiments below, should therefore be regarded as contributions of the applicant to the present application.
In order to solve the above problems, embodiments of the present application provide a prediction model determining method, apparatus, electronic device, and computer storage medium, which can quickly and automatically determine a prediction scheme corresponding to time series data, thereby saving labor cost and improving prediction efficiency.
The technology can be realized by adopting corresponding software, hardware and a combination of the software and the hardware. The following describes embodiments of the present application in detail.
The following description will be made with respect to a prediction model determination method provided in the present application.
Referring to Fig. 1, an embodiment of the present application provides a prediction model determining method, which determines which prediction model should be used to predict given data. The steps involved are described below in connection with Fig. 1.
Step S110: extracting features from the received data to obtain corresponding data features.
In the embodiment of the application, feature extraction can be performed on the data received by the operation and maintenance system, so that the data features of the data can be fully obtained for classifying the data based on the data features.
Generally, the data acquired by the operation and maintenance system is time series data. One time series data includes data points corresponding to the same index at different time points. Based on this, the data characteristics of the data may include: at least one of the value range type of the data, the value band distribution, the value continuity, the value periodicity, the value complexity and the like.
The value range type included in the data feature is used for describing whether the numerical value of each data point included in the time series data has a numerical upper limit and a numerical lower limit, and the value range type includes two kinds of fixed value ranges and non-fixed value ranges.
In the embodiment of the present application, if the value of each data point included in a certain time series data has an upper numerical limit and a lower numerical limit, the value range type of the data is a fixed value range; if the value of each data point included in a certain time series data does not have a value upper limit or a value lower limit, the value range type of the data is a non-fixed value range.
For example, if the time-series data is the usage rate of the CPU at different time points, the maximum value of the usage rate cannot be higher than 100% and the minimum value cannot be lower than 0, so that the time-series data has an upper value limit and a lower value limit, and the value range type is a fixed value range.
For another example, if the time-series data is the cumulative number of accesses at different time points, the number of accesses is a value gradually increasing with time, and there is no maximum upper limit, so the value range type of the time-series data is a non-fixed value range.
As for the determination of the value range type of the data, it may be determined according to the service type to which the data corresponds.
The service type corresponding to the data refers to a specific service for generating the data.
In the embodiment of the application, the value range type of the generated data can be configured in advance for each service type, forming a mapping table. When data produced by a particular service type is subsequently received, the value range type corresponding to that service type can be determined by querying the mapping table. That value range type is the value range type of the data.
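The mapping-table lookup described above could be sketched as follows. The service-type names and the table contents are hypothetical and are chosen only to mirror the CPU-usage and cumulative-access examples; the source does not specify how the table is stored.

```python
from enum import Enum


class ValueRangeType(Enum):
    FIXED = "fixed value range"
    NON_FIXED = "non-fixed value range"


# Hypothetical mapping table, configured in advance per service type.
VALUE_RANGE_BY_SERVICE = {
    # CPU usage is bounded between 0 and 100%.
    "cpu_usage": ValueRangeType.FIXED,
    # A cumulative access count grows over time and has no upper limit.
    "cumulative_access_count": ValueRangeType.NON_FIXED,
}


def value_range_type(service_type: str) -> ValueRangeType:
    """Look up the value range type for the service that generated the data."""
    try:
        return VALUE_RANGE_BY_SERVICE[service_type]
    except KeyError:
        raise ValueError(
            f"no value range type configured for {service_type!r}"
        ) from None
```

Raising on an unknown service type is a design choice of this sketch; a deployment might instead fall back to a default value range type.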
The numerical fluctuation amplitude included in the data features describes how strongly the values of the data points in the data fluctuate. In the embodiment of the application, different calculation modes can be adopted to determine the numerical fluctuation amplitude depending on the value range type of the data.
Optionally, when the value range type of the data is determined to be a fixed value range, the numerical fluctuation amplitude of the data may be calculated according to a first formula [formula missing from the source text].
Optionally, when the value range type of the data is determined to be a non-fixed value range, the numerical fluctuation amplitude of the data may be calculated according to a second formula [formula missing from the source text].
In the above formulas, $A$ is the numerical fluctuation amplitude of the data, $N$ is the number of data points included in the data (i.e., the time series data), $x_i$ is the data value of the $i$-th data point in the time series data, and $\bar{x}$ is the average data value of all data points included in the time series data.
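The two formulas for A are not legible in this text, so the following sketch is an illustrative stand-in only and should not be read as the patent's formulas. It assumes, purely for illustration, that a fixed value range uses the population standard deviation of the series and that a non-fixed value range normalizes that deviation by the mean (a coefficient of variation), so that series on very different scales stay comparable. Both choices use only the symbols N, x_i and the mean that the text defines.

```python
from statistics import fmean, pstdev


def fluctuation_amplitude(values, fixed_range: bool) -> float:
    """ASSUMED stand-in for A; the source formulas are not available.

    fixed_range=True  -> population standard deviation of the series
    fixed_range=False -> standard deviation divided by |mean|
    """
    if len(values) == 0:
        raise ValueError("time series must contain at least one data point")
    spread = pstdev(values)
    if fixed_range:
        return spread
    mean = fmean(values)
    if mean == 0:
        raise ValueError("relative fluctuation is undefined for a zero mean")
    return spread / abs(mean)
```

Whatever the actual formulas are, the branching on value range type would have the same shape as this function.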
The numerical banded distribution included in the data features describes how strongly the values of the data points in the time series data concentrate on a few modes.
Specifically, the numerical banded distribution of the data may be calculated according to the formula $B = M_1 + M_2 + M_3$.
In the above formula, $B$ is the numerical banded distribution, $M_1$ is the proportion of data points in the time series data whose value equals the first mode, $M_2$ is the proportion whose value equals the second mode, and $M_3$ is the proportion whose value equals the third mode.
The first mode is the numerical value with the highest occurrence frequency among the numerical values of all the data points included in the time sequence data; the second mode is the value with the next highest occurrence frequency among the values of all the data points included in the time series data; the third mode is the value with the third highest occurrence frequency among the values of the data points included in the time series data.
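A minimal sketch of the banded-distribution calculation follows. It assumes values are compared by exact equality; for real-valued metrics some rounding or binning step would probably be needed first, which the source does not describe.

```python
from collections import Counter


def banded_distribution(values, top_k: int = 3) -> float:
    """B = M1 + M2 + M3: combined share of the three most frequent values."""
    if len(values) == 0:
        raise ValueError("time series must contain at least one data point")
    counts = Counter(values)
    # most_common returns fewer than top_k entries when the series has
    # fewer than top_k distinct values, so B is then simply 1.0.
    top = counts.most_common(top_k)
    return sum(count for _, count in top) / len(values)
```

For the series `[0, 0, 0, 0, 1, 1, 1, 2, 2, 3]` the three modes cover 4, 3 and 2 of the 10 points, so B is 0.9.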
The numerical continuity included in the data features describes the continuity between the values of successive data points in the time series data.
In the embodiment of the present application, the numerical continuity of the time-series data may be calculated according to the formula $C = \mathrm{Acf}(1)$.
In the above formula, $C$ is the numerical continuity of the time series data and Acf(Lag) is the autocorrelation function, which is well-known prior art; $\mathrm{Acf}(1)$ denotes the autocorrelation coefficient calculated at Lag = 1.
The numerical periodicity included in the data features describes the periodicity of the values of the data points in the time series data.
In the embodiment of the present application, the numerical periodicity of the time-series data may be calculated according to the formula $D = \mathrm{Acf}(T)$.
Similarly, $D$ characterizes the numerical periodicity of the time series data and Acf(Lag) is the autocorrelation function, which is well-known prior art; $\mathrm{Acf}(T)$ denotes the autocorrelation coefficient calculated at Lag = $T$, where $T$ is an approximate period determined from the specific data points of the time series data.
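Both C and D are autocorrelation coefficients at different lags, so one helper covers them. The sketch below assumes the usual sample autocorrelation estimator (lagged covariance divided by the lag-0 sum of squares); the source does not say which estimator is intended, nor how the approximate period T is found, so T is simply given here for a synthetic sine series.

```python
import math


def acf(values, lag: int) -> float:
    """Sample autocorrelation coefficient of `values` at the given lag."""
    n = len(values)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(values)")
    mean = sum(values) / n
    denom = sum((x - mean) ** 2 for x in values)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return num / denom


# Synthetic series with a known period of 24 samples (assumed for the demo).
T = 24
series = [math.sin(2 * math.pi * i / T) for i in range(T * 10)]

C = acf(series, 1)  # numerical continuity
D = acf(series, T)  # numerical periodicity
```

With this estimator D stays somewhat below 1 even for a perfectly periodic series, because fewer terms contribute to the numerator as the lag grows.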
The numerical complexity included in the data features describes how complex the values of the data points included in the time series data are.
In the embodiment of the present application, the numerical complexity of the time series data may be calculated according to the formula $E = \mathrm{ApEn} = \Phi^{m}(r) - \Phi^{m+1}(r)$.
In the above formula, $E$ is the numerical complexity of the time-series data, ApEn is the approximate entropy, [formula missing from the source text], $N$ is the number of data points included in the time series data, $r$ is a real number, $m$ is a positive integer less than $N$, and $\Phi^{m}(r)$ denotes the $m$-order entropy value calculated for the time-series data.
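The definition of Φ is not legible in this text. The sketch below assumes the standard definition of approximate entropy: the series is cut into N − m + 1 overlapping vectors of length m, $C_i^m(r)$ is the fraction of those vectors within Chebyshev distance r of the i-th vector, and $\Phi^{m}(r) = (N - m + 1)^{-1} \sum_i \ln C_i^m(r)$. The default values of m and r are illustrative assumptions, not values from the source.

```python
import math


def _phi(values, m: int, r: float) -> float:
    """Phi^m(r): mean log-fraction of length-m windows within distance r."""
    n_vectors = len(values) - m + 1
    vectors = [values[i:i + m] for i in range(n_vectors)]
    total = 0.0
    for a in vectors:
        matches = sum(
            1
            for b in vectors
            if max(abs(x - y) for x, y in zip(a, b)) <= r
        )
        # Every window matches itself, so matches >= 1 and the log is finite.
        total += math.log(matches / n_vectors)
    return total / n_vectors


def approximate_entropy(values, m: int = 2, r: float = 0.2) -> float:
    """E = ApEn = Phi^m(r) - Phi^(m+1)(r), under the standard definition."""
    if not 0 < m < len(values):
        raise ValueError("m must be a positive integer less than N")
    return _phi(values, m, r) - _phi(values, m + 1, r)
```

This direct implementation compares every pair of windows, so its cost grows quadratically with N; that is acceptable for a sketch but may matter for long series. In practice r is often chosen relative to the spread of the series rather than as a fixed constant.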
Step S120: classifying the data by predictable type according to the data features to obtain a classification result.
Step S130: determining a prediction model corresponding to the data according to the classification result.
In the embodiment of the application, after the data characteristics of the data are obtained, the data can be classified according to the data characteristics of the data to obtain a classification result.
The classification result may be one of predictable, unpredictable, and low predictive value.
The predictable class can be further divided into a first predictable type, a second predictable type, and a third predictable type.
In this embodiment of the present application, the classification result to which the data belongs may be determined from the data features of the data in the manner shown in Fig. 2.
Specifically, for each data, it may be first determined whether the numerical continuity of the data is smaller than the first threshold. The first threshold is a constant that is empirically set, and may be, for example, 0.8, although in other embodiments, the first threshold may be set to other values.
When the numerical continuity is smaller than the first threshold, it is further judged whether the numerical complexity of the data is greater than the second threshold. If the numerical complexity is greater than the second threshold, the data is considered highly complex; otherwise, the data is considered to have low complexity.
Wherein the second threshold is an empirically set constant and may be modified according to the circumstances.
When the numerical complexity of the data is greater than the second threshold, that is, the data complexity of the data is high, the data is not suitable for prediction in the specification, and then the classification result of the data is determined to be an unpredictable type.
In some embodiments, the classification result of the data is determined to be a first one of the predictability types when the numerical complexity of the data is not greater than a second threshold, i.e., the data complexity of the data is low.
Further, in some embodiments, when it is determined that the numerical continuity of the data is not less than the first threshold, the magnitude relationship between the numerical periodicity of the data and the third threshold may continue to be determined. The third threshold may be 0.6, although in other embodiments, the third threshold may be set to other values.
If the numerical periodicity of the data is smaller than the third threshold, the numerical fluctuation amplitude of the data is then compared with the fourth threshold. The fourth threshold may be set and adjusted according to the actual situation.
If the numerical fluctuation amplitude is greater than the fourth threshold, the numerical fluctuation of the data is large; otherwise, the numerical fluctuation of the data is small.
Optionally, if the numerical fluctuation amplitude of the data is greater than the fourth threshold, the classification result of the data is determined to be the second predictable type.
Optionally, if the numerical fluctuation amplitude of the data is not greater than the fourth threshold, the classification result of the data is determined to be low predictive value.
Further, in some embodiments, when it is determined that the numerical continuity of the data is not less than the first threshold and the numerical periodicity of the data is greater than or equal to the third threshold, the numerical banding distribution of the data is then compared with the fifth threshold. The fifth threshold may be 0.8, although in other embodiments it may be set to other values.
Optionally, if the numerical banding distribution of the data is less than the fifth threshold, the classification result of the data is determined to be the third predictable type.
Optionally, if the numerical banding distribution of the data is greater than or equal to the fifth threshold, the classification result of the data is determined to be low predictive value.
It should be noted that, in the embodiment of the present application, a corresponding prediction model is matched in advance for each classification result.
The prediction model corresponding to each of the two classification results "unpredictable" and "low predictive value" is empty; the prediction model corresponding to the first predictable type is a deep learning model, the prediction model corresponding to the second predictable type is an ARIMA model, and the prediction model corresponding to the third predictable type is a Prophet model.
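The threshold comparisons and the model mapping described above can be sketched as a small decision function. This is a minimal illustration, not the application's implementation: the names `Features`, `Thresholds`, `classify`, and `MODEL_FOR_CLASS` and the class labels are hypothetical, and the defaults for the second and fourth thresholds are placeholders, since the text gives example values only for the first (0.8), third (0.6), and fifth (0.8) thresholds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Features:
    continuity: float   # C = Acf(1)
    periodicity: float  # D = Acf(T)
    complexity: float   # E = ApEn
    fluctuation: float  # A, numerical fluctuation amplitude
    banding: float      # B = M1 + M2 + M3


@dataclass(frozen=True)
class Thresholds:
    first: float = 0.8
    second: float = 1.0   # placeholder: no example value in the text
    third: float = 0.6
    fourth: float = 0.1   # placeholder: no example value in the text
    fifth: float = 0.8


# None stands for the "empty" prediction model.
MODEL_FOR_CLASS = {
    "unpredictable": None,
    "low_predictive_value": None,
    "predictable_1": "deep learning model",
    "predictable_2": "ARIMA",
    "predictable_3": "Prophet",
}


def classify(f: Features, t: Thresholds = Thresholds()) -> str:
    if f.continuity < t.first:
        # Low continuity: decide on complexity alone.
        return "unpredictable" if f.complexity > t.second else "predictable_1"
    if f.periodicity < t.third:
        # Continuous but not periodic: decide on fluctuation amplitude.
        return "predictable_2" if f.fluctuation > t.fourth else "low_predictive_value"
    # Continuous and periodic: decide on banding distribution.
    return "predictable_3" if f.banding < t.fifth else "low_predictive_value"
```

For example, `MODEL_FOR_CLASS[classify(Features(0.9, 0.7, 0.0, 0.0, 0.5))]` follows the continuous, periodic, weakly banded branch and selects the Prophet entry.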
According to the prediction model determining method provided by the embodiment of the application, the data characteristics of the data are extracted, the data are subjected to predictable type classification according to the data characteristics, and then the prediction model corresponding to the data is automatically determined according to the classification result. In the process, human participation is not needed, and the whole process is automated, so that the prediction efficiency can be improved while the labor cost is saved.
As shown in fig. 3, the embodiment of the present application further provides a prediction model determining apparatus 400, where the prediction model determining apparatus 400 may include: a feature extraction module 410, a classification module 420, and a determination module 430.
The feature extraction module 410 is configured to perform feature extraction on the received data to obtain corresponding data features;
the classification module 420 is configured to classify the data according to the data characteristics, so as to obtain a classification result;
a determining module 430, configured to determine a prediction model corresponding to the data according to the classification result;
wherein, each classification result is matched with a corresponding prediction model in advance.
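The division of the apparatus into three modules can be pictured as three composed steps. The sketch below is only an illustration of that structure: `make_model_determiner` and the toy stand-in functions are hypothetical names, and the stand-ins do not compute the features or thresholds described in the method embodiment.

```python
from typing import Any, Callable, Dict, Optional, Sequence


def make_model_determiner(
    extract: Callable[[Sequence[float]], Any],
    classify: Callable[[Any], str],
    model_table: Dict[str, Optional[str]],
) -> Callable[[Sequence[float]], Optional[str]]:
    """Compose the three roles into one callable."""

    def determine(data: Sequence[float]) -> Optional[str]:
        features = extract(data)     # role of feature extraction module 410
        label = classify(features)   # role of classification module 420
        return model_table[label]    # role of determining module 430

    return determine


# Toy stand-ins, purely illustrative.
toy_determiner = make_model_determiner(
    extract=lambda data: {"spread": max(data) - min(data)},
    classify=lambda f: "predictable" if f["spread"] > 0 else "unpredictable",
    model_table={"predictable": "ARIMA", "unpredictable": None},
)
```

Keeping the model table as plain data reflects the statement that each classification result is matched with a prediction model in advance.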
In a possible implementation manner, the feature extraction module 410 is configured to obtain a service type corresponding to the data, and determine a value range type of the data according to the service type; calculating the numerical fluctuation amplitude of the data according to the value range type; determining a numerical banding distribution, a numerical continuity, a numerical periodicity, and a numerical complexity of the data; the data characteristics include the value range type, the value fluctuation amplitude, the value banding distribution, the value continuity, the value periodicity, and the value complexity.
In a possible implementation manner, each of the data is time-series data, the value range type is a fixed value range or a non-fixed value range, and the feature extraction module 410 is configured to, when determining that the value range type is the fixed value range, calculate the numerical fluctuation amplitude according to the formula for the fixed value range; or,
when determining that the value range type is the non-fixed value range, calculate the numerical fluctuation amplitude according to the formula for the non-fixed value range;
wherein A is the numerical fluctuation amplitude, N is the number of data points included in the time series data, $x_i$ is the data value of the i-th data point in the time series data, and $\bar{x}$ is the average data value of the data points included in the time series data.
In a possible implementation manner, each of the data is time-series data, and the feature extraction module 410 is configured to calculate the numerical banding distribution according to the formula $B = M_1 + M_2 + M_3$; B is the numerical banding distribution, $M_1$ is the first mode ratio in the data, $M_2$ is the second mode ratio in the data, and $M_3$ is the third mode ratio in the data;
calculate the numerical continuity according to the formula $C = \mathrm{Acf}(1)$; C is the numerical continuity, and Acf(1) is the autocorrelation coefficient with a parameter of 1;
calculate the numerical periodicity according to the formula $D = \mathrm{Acf}(T)$; D is the numerical periodicity, and Acf(T) is the autocorrelation coefficient with a parameter of T;
calculate the numerical complexity according to the formula $E = \mathrm{ApEn} = \Phi^{m}(r) - \Phi^{m+1}(r)$; E is the numerical complexity, ApEn is the approximate entropy, N is the number of data points included in the time series data, r is a real number, and m is a positive integer less than N.
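The banding, continuity, and periodicity features can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the "mode ratios" are read as the shares of the three most frequent values in the series, Acf is taken to be the usual sample autocorrelation coefficient at a given lag, and the function names `banding_distribution` and `acf` are illustrative only. Returning 0.0 for a constant series or an out-of-range lag is likewise an assumption.

```python
from collections import Counter

import numpy as np


def banding_distribution(series, top_k=3):
    """B = M1 + M2 + M3: combined share of the top_k most frequent values.

    Assumes the values are discrete (or already rounded), so that
    equal readings compare equal.
    """
    counts = Counter(series)
    return sum(count for _, count in counts.most_common(top_k)) / len(series)


def acf(series, lag):
    """Sample autocorrelation coefficient at the given non-negative lag."""
    x = np.asarray(series, dtype=float)
    centered = x - x.mean()
    denom = float(np.sum(centered ** 2))
    if denom == 0.0 or lag >= len(x):
        return 0.0  # assumption: undefined cases are reported as 0
    return float(np.sum(centered[:len(x) - lag] * centered[lag:]) / denom)
```

With these helpers, the numerical continuity would be `acf(series, 1)` and the numerical periodicity `acf(series, T)` for a candidate period T.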
In one possible implementation, the data characteristics include a numerical continuity of the data and a numerical complexity of the data; the classification module 420 is configured to determine whether the numerical continuity is less than a first threshold; judging whether the numerical complexity is larger than a second threshold value or not when the numerical continuity is smaller than the first threshold value; and when the numerical complexity is greater than the second threshold, determining that the type of the data is an unpredictable type, and enabling a prediction model corresponding to the unpredictable type to be null.
In a possible implementation manner, the classification module 420 is further configured to determine that the type of the data is a first predictable type when the numerical complexity is not greater than the second threshold, and the prediction model corresponding to the first predictable type is a DeepAR deep learning model.
In one possible embodiment, the data characteristics further include a value range type of the data, a magnitude of a numerical fluctuation of the data, a numerical banding distribution of the data, and a numerical periodicity of the data; the classification module 420 is further configured to determine that the type of the data is a second predictable type and a prediction model corresponding to the second predictable type is an ARIMA model if the numerical value periodicity is determined to be less than a third threshold and the numerical value fluctuation range is greater than a fourth threshold when the numerical value continuity is not less than the first threshold; or,
and when the numerical continuity is not smaller than the first threshold, if the numerical periodicity is determined to be larger than or equal to the third threshold and the numerical banded distribution is determined to be smaller than a fifth threshold, determining the type of the data to be a third predictable type, and determining a prediction model corresponding to the third predictable type to be a Prophet model.
The prediction model determining device 400 provided in the embodiment of the present application has the same implementation principle and technical effects as those of the foregoing method embodiment, and for the sake of brevity, reference may be made to the corresponding content in the foregoing method embodiment where the device embodiment portion is not mentioned.
Furthermore, the embodiment of the present application further provides a computer storage medium, where a computer program is stored, and the computer program when executed by a computer performs the steps included in the prediction model determining method described above.
In addition, referring to fig. 4, an electronic device 100 for implementing the method and the apparatus for determining a prediction model according to the embodiments of the present application is provided in the embodiments of the present application.
Alternatively, the electronic device 100 may be, but is not limited to, a personal computer (Personal computer, PC), a smart phone, a tablet computer, a mobile Internet device (Mobile Internet Device, MID), a personal digital assistant, a server, and the like. The server may be, but is not limited to, a web server, a database server, a cloud server, etc.
Wherein the electronic device 100 may include: a processor 110, a memory 120.
It should be noted that the components and structures of the electronic device 100 shown in fig. 4 are exemplary only and not limiting, as the electronic device 100 may have other components and structures as desired.
The processor 110, the memory 120, and other components that may be present in the electronic device 100 are electrically connected to each other, either directly or indirectly, to enable transmission or interaction of data. For example, the processor 110, the memory 120, and possibly other components may be electrically connected to each other by one or more communication buses or signal lines.
The memory 120 is used for storing programs, for example, a program corresponding to the previously-presented prediction model determination method or previously-presented prediction model determination means. Alternatively, when the predictive model determination means is stored in the memory 120, the predictive model determination means includes at least one software functional module that may be stored in the memory 120 in the form of software or firmware (firmware).
Alternatively, the software function module included in the predictive model determination apparatus may be solidified in an Operating System (OS) of the electronic device 100.
The processor 110 is configured to execute executable modules stored in the memory 120, such as software functional modules or computer programs included in the predictive model determination device. When the processor 110 receives the execution instructions, it may execute a computer program, for example, to perform: extracting features of the received data to obtain corresponding data features; according to the data characteristics, carrying out predictable type classification on the data to obtain a classification result; determining a prediction model corresponding to the data according to the classification result; wherein, each classification result is matched with a corresponding prediction model in advance.
Of course, the methods disclosed in any of the embodiments of the present application may be applied to the processor 110 or implemented by the processor 110.
In summary, the method, the device, the electronic device and the computer storage medium for determining the prediction model provided by the embodiment of the invention comprise the following steps: extracting features of the received data to obtain corresponding data features; according to the data characteristics, carrying out predictable type classification on the data to obtain a classification result; determining a prediction model corresponding to the data according to the classification result; wherein, each classification result is matched with a corresponding prediction model in advance. By the method, the prediction scheme corresponding to the time sequence data can be automatically determined, and the prediction scheme is not required to be manually determined, so that the labor cost is saved and the prediction efficiency is improved.
It should be noted that the embodiments in the present specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are identical or similar, the embodiments may be referred to one another.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners as well. The apparatus embodiments described above are merely illustrative, for example, flow diagrams and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part that contributes over the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a notebook computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The foregoing is merely specific embodiments of the present application, but the scope of protection of the present application is not limited thereto. Any changes or substitutions that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application are intended to be covered by the scope of protection of the present application.

Claims (7)

1. A method of predictive model determination, the method comprising:
extracting features of the received data to obtain corresponding data features; the received data is time sequence data;
according to the data characteristics, carrying out predictable type classification on the data to obtain a classification result;
determining a prediction model corresponding to the data according to the classification result;
wherein, each classification result is matched with a corresponding prediction model in advance;
the data features include a numerical continuity of the data, a numerical complexity of the data, a numerical fluctuation amplitude of the data, a numerical banding distribution of the data, and a numerical periodicity of the data; the classifying the data according to the data characteristics, including:
judging whether the numerical continuity is smaller than a first threshold value;
judging whether the numerical complexity is larger than a second threshold value or not when the numerical continuity is smaller than the first threshold value;
when the numerical complexity is greater than the second threshold, determining that the type of the data is an unpredictable type, and a prediction model corresponding to the unpredictable type is empty;
when the numerical complexity is not greater than the second threshold, determining that the type of the data is a first predictable type, and determining that a prediction model corresponding to the first predictable type is a deep learning model;
when the numerical continuity is not smaller than the first threshold, if the numerical periodicity is determined to be smaller than a third threshold and the numerical fluctuation amplitude is greater than a fourth threshold, determining that the type of the data is a second predictable type, and determining that a prediction model corresponding to the second predictable type is an ARIMA model; or,
and when the numerical continuity is not smaller than the first threshold, if the numerical periodicity is determined to be larger than or equal to the third threshold and the numerical banded distribution is determined to be smaller than a fifth threshold, determining the type of the data to be a third predictable type, and determining a prediction model corresponding to the third predictable type to be a Prophet model.
2. The method of claim 1, wherein the feature extraction of the received data comprises:
acquiring a service type corresponding to the data, and determining a value range type of the data according to the service type;
calculating the numerical fluctuation amplitude of the data according to the value range type;
determining a numerical banding distribution, a numerical continuity, a numerical periodicity, and a numerical complexity of the data;
the data characteristics include the value range type, the value fluctuation amplitude, the value banding distribution, the value continuity, the value periodicity, and the value complexity.
3. The method of claim 2, wherein each of the data is time-series data, the value range type is a fixed value range or a non-fixed value range, and the calculating the numerical fluctuation amplitude of the data according to the value range type comprises:
when the value range type is determined to be the fixed value range, calculating the numerical fluctuation amplitude according to the formula for the fixed value range; or,
when the value range type is determined to be the non-fixed value range, calculating the numerical fluctuation amplitude according to the formula for the non-fixed value range;
wherein A is the numerical fluctuation amplitude, N is the number of data points included in the time series data, $x_i$ is the data value of the i-th data point in the time series data, and $\bar{x}$ is the average data value of the data points included in the time series data.
4. The method of claim 2, wherein each of the data is time-series data, and wherein the determining the numerical banding distribution, numerical continuity, numerical periodicity, and numerical complexity of the data comprises:
calculating the numerical banding distribution according to the formula $B = M_1 + M_2 + M_3$; B is the numerical banding distribution, $M_1$ is the first mode ratio in the data, $M_2$ is the second mode ratio in the data, and $M_3$ is the third mode ratio in the data;
calculating the numerical continuity according to the formula $C = \mathrm{Acf}(1)$; C is the numerical continuity, and Acf(1) is the autocorrelation coefficient with a parameter of 1;
calculating the numerical periodicity according to the formula $D = \mathrm{Acf}(T)$; D is the numerical periodicity, and Acf(T) is the autocorrelation coefficient with a parameter of T;
calculating the numerical complexity according to the formula $E = \mathrm{ApEn} = \Phi^{m}(r) - \Phi^{m+1}(r)$; E is the numerical complexity, ApEn is the approximate entropy, N is the number of data points included in the time series data, r is a real number, and m is a positive integer less than N.
5. A predictive model determination device, the device comprising:
the feature extraction module is used for carrying out feature extraction on the received data to obtain corresponding data features; the received data is time sequence data;
the classification module is used for classifying the data in a predictable type according to the data characteristics to obtain a classification result; the data features include a numerical continuity of the data, a numerical complexity of the data, a numerical fluctuation amplitude of the data, a numerical banding distribution of the data, and a numerical periodicity of the data; judging whether the numerical continuity is smaller than a first threshold value; judging whether the numerical complexity is larger than a second threshold value or not when the numerical continuity is smaller than the first threshold value; when the numerical complexity is greater than the second threshold, determining that the type of the data is an unpredictable type, and a prediction model corresponding to the unpredictable type is empty; when the numerical complexity is not greater than the second threshold, determining that the type of the data is a first predictable type, and determining that a prediction model corresponding to the first predictable type is a deep learning model; when the numerical continuity is not smaller than the first threshold, if the numerical periodicity is determined to be smaller than a third threshold and the numerical fluctuation amplitude is greater than a fourth threshold, determining that the type of the data is a second predictable type, and determining that a prediction model corresponding to the second predictable type is an ARIMA model; or when the numerical continuity is not smaller than the first threshold, if the numerical periodicity is determined to be greater than or equal to the third threshold and the numerical banding distribution is determined to be smaller than a fifth threshold, determining the type of the data to be a third predictable type, and determining a prediction model corresponding to the third predictable type to be a Prophet model;
the determining module is used for determining a prediction model corresponding to the data according to the classification result;
wherein, each classification result is matched with a corresponding prediction model in advance.
6. An electronic device, comprising: the device comprises a memory and a processor, wherein the memory is connected with the processor;
the memory is used for storing programs;
the processor invokes a program stored in the memory to perform the method of any one of claims 1-4.
7. A computer storage medium, characterized in that it has stored thereon a computer program which, when executed by a computer, performs the method according to any of claims 1-4.
CN202111019064.2A 2021-09-01 2021-09-01 Prediction model determining method, apparatus, electronic device and computer storage medium Active CN113688929B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111019064.2A CN113688929B (en) 2021-09-01 2021-09-01 Prediction model determining method, apparatus, electronic device and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111019064.2A CN113688929B (en) 2021-09-01 2021-09-01 Prediction model determining method, apparatus, electronic device and computer storage medium

Publications (2)

Publication Number Publication Date
CN113688929A CN113688929A (en) 2021-11-23
CN113688929B true CN113688929B (en) 2024-02-23

Family

ID=78584617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111019064.2A Active CN113688929B (en) 2021-09-01 2021-09-01 Prediction model determining method, apparatus, electronic device and computer storage medium

Country Status (1)

Country Link
CN (1) CN113688929B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104750830A (en) * 2015-04-01 2015-07-01 东南大学 Cycle mining method of time series data
CN104866932A (en) * 2015-06-12 2015-08-26 哈尔滨工业大学 Time series prediction method based on prediction model applicability judgment
CN111199018A (en) * 2019-12-27 2020-05-26 东软集团股份有限公司 Abnormal data detection method and device, storage medium and electronic equipment
CN111258866A (en) * 2020-01-14 2020-06-09 中国平安人寿保险股份有限公司 Computer performance prediction method, device, equipment and readable storage medium
CN111931860A (en) * 2020-09-01 2020-11-13 腾讯科技(深圳)有限公司 Abnormal data detection method, device, equipment and storage medium
CN112685273A (en) * 2020-12-29 2021-04-20 京东数字科技控股股份有限公司 Anomaly detection method and device, computer equipment and storage medium
CN113254877A (en) * 2021-05-18 2021-08-13 北京达佳互联信息技术有限公司 Abnormal data detection method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11526370B2 (en) * 2019-03-10 2022-12-13 Microsoft Technology Licensing, Llc. Cloud resource management using machine learning

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Improving precise point positioning performance based on Prophet model;Liao, SJ.et al;《PLOS ONE》;第16卷(第01期);全文 *
Prophet框架下生产物流运输节点的阻塞预警方法;曹小华等;《武汉理工大学学报(交通科学与工程版)》;全文 *
基于LSTM-Prophet非线性组合的时间序列预测模型;赵英等;《计算机与现代化》(第09期);全文 *

Also Published As

Publication number Publication date
CN113688929A (en) 2021-11-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40064504

Country of ref document: HK

GR01 Patent grant