CN113705910A - Data sample expansion method, device, equipment and medium - Google Patents

Info

Publication number
CN113705910A
Authority
CN
China
Prior art keywords
data
periodic component
enhancement
original time
time sequence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111017164.1A
Other languages
Chinese (zh)
Inventor
张穗辉
匡文彬
陈晓帆
郜振锋
古亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd filed Critical Sangfor Technologies Co Ltd
Priority to CN202111017164.1A priority Critical patent/CN113705910A/en
Publication of CN113705910A publication Critical patent/CN113705910A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Game Theory and Decision Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Complex Calculations (AREA)

Abstract

The embodiments of the present application disclose a method, an apparatus, a device, and a medium for expanding data samples. Original time series data is acquired and decomposed to obtain periodic component data and trend component data. Because diversity in the periodic component data improves how well the data set covers the business scenario, the periodic component data is enhanced according to its proportion in the original time series data to obtain periodic component enhancement data. The periodic component enhancement data and the trend component data are then superimposed to obtain expanded time series data. Because the enhancement mode is chosen according to the proportion of the periodic component data in the original time series data, the resulting data is sufficiently varied and better covers the characteristics of the service scenario, which ensures the high quality of the expanded time series data.

Description

Data sample expansion method, device, equipment and medium
Technical Field
The present application relates to the field of model prediction technologies, and in particular, to a method, an apparatus, a device, and a computer-readable storage medium for expanding data samples.
Background
Time series data is data collected at different times, and such data reflects the changing state or degree of a certain thing, phenomenon, etc. over time. Time series analysis is a statistical method for processing time series data, and the method is based on a random process theory and a mathematical statistics method, and researches a statistical rule followed by the time series data so as to solve a practical problem.
The task of time series prediction is to construct a model based on historical data and predict the future trend of the series. In order for the model to be able to predict future trends more accurately, a sufficient data set must be collected to train the model, which requires a sufficient number of samples for the data set and sufficient differences between the samples to cover all scenarios. However, in an actual environment, the collected data sets are often insufficient in quantity, single in data characteristics and poor in service scene coverage, so that the model obtained based on the data set training is poor in prediction effect.
Therefore, how to acquire a high-quality data set is a problem to be solved by those skilled in the art.
Disclosure of Invention
The embodiment of the application aims to provide a method, a device, equipment and a computer-readable storage medium for expanding data samples, which can acquire a high-quality data set.
To solve the foregoing technical problem, an embodiment of the present application provides an expansion method for data samples, including:
acquiring original time sequence data;
decomposing the original time sequence data to obtain periodic component data and trend component data;
according to the ratio of the periodic component data in the original time sequence data, performing enhancement processing on the periodic component data to obtain periodic component enhancement data;
and superposing the periodic component enhancement data and the trend component data to obtain expanded time sequence data.
Optionally, the performing enhancement processing on the periodic component data according to the proportion of the periodic component data in the original time series data to obtain periodic component enhancement data includes:
and under the condition that the ratio is greater than or equal to a preset threshold value, scaling and adding noise to the periodic component data to obtain periodic component enhancement data.
Optionally, the performing enhancement processing on the periodic component data according to the proportion of the periodic component data in the original time series data to obtain periodic component enhancement data includes:
and under the condition that the ratio is smaller than a preset threshold value, carrying out slice enhancement processing on the periodic component data according to a set service distribution rule to obtain periodic component enhancement data.
Optionally, the slice enhancing processing on the periodic component data according to a set service distribution rule to obtain periodic component enhanced data includes:
intercepting data segments from the periodic component data according to a set service distribution rule;
extending the data fragments according to the set data demand;
and scaling and adding noise to the extended data segment to obtain periodic component enhancement data.
Optionally, the extending the data segment according to the set data demand includes:
determining a continuation multiple based on the data demand and the data length of the data fragment;
and repeatedly copying the data fragment according to the continuation multiple to obtain a prolonged data fragment.
Optionally, determining the proportion of the periodic component data in the original time series data comprises:
acquiring a first range value of the periodic component data and a second range value of the original time series data;
taking the sum of the first range value and the second range value as a total range value; and taking the ratio of the first range value to the total range value as the proportion of the periodic component data in the original time series data.
Optionally, the original time-series data is device operation state data; the method further comprises the following steps:
and training the initial equipment state classification model by using the expanded time sequence data to obtain an equipment state classification model meeting the accuracy requirement, so as to determine the equipment state corresponding to the newly acquired equipment running state data according to the equipment state classification model.
The embodiment of the application also provides an expansion apparatus for data samples, which comprises an acquisition unit, a decomposition unit, an enhancement unit and a superposition unit;
the acquisition unit is used for acquiring original time series data;
the decomposition unit is used for decomposing the original time series data to obtain periodic component data and trend component data;
the enhancement unit is used for enhancing the periodic component data according to the proportion of the periodic component data in the original time sequence data to obtain periodic component enhanced data;
the superposition unit is used for superposing the periodic component enhancement data and the trend component data to obtain expanded time series data.
Optionally, the enhancement unit is configured to, when the ratio is greater than or equal to a preset threshold, perform scaling and noise adding processing on the periodic component data to obtain periodic component enhancement data.
Optionally, the enhancing unit is configured to perform slice enhancement processing on the periodic component data according to a set service distribution rule under the condition that the ratio is smaller than a preset threshold, so as to obtain periodic component enhancement data.
Optionally, the enhancement unit includes an intercepting subunit, a continuation subunit, and a noise adding subunit;
the intercepting subunit is configured to intercept data segments from the periodic component data according to a set service distribution rule;
the continuation subunit is used for extending the data segments according to the set data demand;
and the noise adding subunit is used for scaling and adding noise to the extended data segment to obtain the periodic component enhancement data.
Optionally, the continuation subunit is configured to determine a continuation multiple based on the data demand and the data length of the data segment; and repeatedly copying the data fragment according to the continuation multiple to obtain a prolonged data fragment.
Optionally, for determining the proportion of the periodic component data in the original time series data, the apparatus further includes a determining unit;
the acquisition unit is used for acquiring a first range value of the periodic component data and a second range value of the original time series data;
the determining unit is used for taking the sum of the first range value and the second range value as a total range value, and taking the ratio of the first range value to the total range value as the proportion of the periodic component data in the original time series data.
Optionally, the original time-series data is device operation state data; the apparatus further comprises a training unit;
the training unit is used for training the initial equipment state classification model by using the expanded time series data to obtain an equipment state classification model meeting the accuracy requirement, so that the equipment state corresponding to the newly acquired equipment running state data can be determined according to the equipment state classification model.
The embodiment of the present application further provides an expansion device for data samples, including:
a memory for storing a computer program;
a processor for executing the computer program to implement the steps of the method for augmenting a data sample as described in any of the above.
The present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the data sample expansion method according to any one of the above items.
According to the technical scheme, the original time sequence data is obtained; decomposing the original time sequence data to obtain periodic component data and trend component data; the diversity of the periodic component data can improve the coverage of the data set for the service scene and improve the data quality of the data set, so that in the technical scheme, the periodic component data can be processed, and the purpose of expanding the data set is achieved. According to the proportion of the periodic component data in the original time sequence data, performing enhancement processing on the periodic component data to obtain periodic component enhancement data; and superposing the periodic component enhancement data and the trend component data to obtain expanded time sequence data. According to the technical scheme, the periodic component data can be subjected to enhancement processing in different modes according to the proportion of the periodic component data in the original time sequence data, so that data with enough difference can be obtained, the obtained data can better cover the characteristics of a service scene, and the high quality of the expanded time sequence data is ensured.
Drawings
In order to more clearly illustrate the embodiments of the present application, the drawings needed for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is a schematic view of a data sample expansion scenario provided in an embodiment of the present application;
FIG. 2 is a flowchart of a method for expanding data samples according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of an expansion device for data samples according to an embodiment of the present disclosure;
FIG. 4 is a schematic structural diagram of an expansion device for data samples according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any creative effort belong to the protection scope of the present application.
In order that those skilled in the art will better understand the disclosure, the following detailed description will be given with reference to the accompanying drawings.
Conventionally, for a model to predict results more accurately, a sufficiently large data set must be collected to train it. However, collected data sets often suffer from insufficient quantity, homogeneous data characteristics, and poor coverage of service scenarios, so a model trained on such a data set predicts poorly.
Therefore, the embodiments of the present application provide a method, an apparatus, a device, and a medium for expanding data samples, in which original time series data is acquired and decomposed to obtain periodic component data and trend component data. Referring to the scenario diagram of data sample expansion shown in FIG. 1, the expansion of the original time series data may be carried out on a computer device, on which the manner of seasonally decomposing the original time series data and the manner of enhancing the periodic component data may be configured. Through seasonal decomposition, the original time series data can be separated into periodic component data, random component data, and trend component data. Diversity in the periodic component data improves the data set's coverage of the service scenario and the data quality of the data set. Therefore, in the embodiments of the present application, the periodic component data may be enhanced according to its proportion in the original time series data to obtain periodic component enhancement data. Different proportions call for different enhancement modes: scaling and noise addition may be used when the proportion is high, and slice enhancement may be used when the proportion is low.
The scaling may increase or decrease the amplitude of the periodic component data, and the noise addition may add a noise signal to the scaled periodic component data. Scaling the periodic component data yields periodic component data that differs from the original. Adding noise on top of this introduces randomness into the periodic component data, which ensures that the resulting periodic component enhancement data differs from the original periodic component data, covers the characteristics of the service scenario, and meets the required number of samples.
The slice enhancement processing may intercept, from the periodic component data, a data segment that better reflects the characteristics of the service scenario, extend that data segment, and then scale and add noise to the extended data segment. Because the intercepted data segment is often small, it can be extended so that it reflects the characteristics of the service scenario more fully; the extension mainly consists of repeating the data segment.
The expanded time series data is obtained by superimposing the periodic component enhancement data and the trend component data. The expanded time series data is sufficiently varied and better covers the characteristics of the service scenario, which ensures its high quality.
Next, a method for expanding data samples provided in the embodiments of the present application will be described in detail. Fig. 2 is a flowchart of a method for expanding a data sample according to an embodiment of the present application, where the method includes:
S201: original time series data is acquired.
The form of the original time series data is various, and the form of the original time series data required to be acquired can be determined by combining with the actual service scene. For example, when the occupancy of the system resource needs to be predicted, the original time series data may be data of the occupancy of the system resource by each device within a preset time period; when the operation states of the devices need to be classified, the original time series data may be operation state data corresponding to the devices within a preset time period.
In the embodiment of the present application, the specific form of the original time series data is not limited, and may be determined according to an actual service scenario.
S202: the raw time series data is decomposed to obtain periodic component data and trend component data.
The original time series data are decomposed mainly to acquire data capable of reflecting service scene characteristics.
In the embodiment of the present application, the periodic component data, the trend component data, and the random component data may be separated from the raw time-series data by using seasonal decomposition. In practical applications, no processing is performed on the random component data, and therefore only the periodic component data and the trend component data are mentioned in S202.
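The text names seasonal decomposition but does not fix a particular algorithm. As a minimal sketch, assuming a classical additive model (series = trend + periodic + random) with a centered moving average for the trend, the separation in S202 could look like the following. The function name `decompose_additive` and the `period` parameter are hypothetical and are not taken from the application.

```python
def decompose_additive(series, period):
    """Split a series into trend, periodic, and random parts (additive model).

    Assumption: classical moving-average decomposition; the source text only
    says "seasonal decomposition" and does not prescribe this algorithm.
    """
    n = len(series)
    if period < 2 or n < 2 * period:
        raise ValueError("need at least two full periods of data")
    half = period // 2
    trend = [None] * n  # the moving average is undefined near both ends
    for i in range(half, n - half):
        window = series[i - half:i + half + 1]
        if period % 2:
            # Odd period: plain centered window of `period` points.
            trend[i] = sum(window) / period
        else:
            # Even period: half weight on the two end points of the window.
            trend[i] = (sum(window) - 0.5 * (window[0] + window[-1])) / period
    # Average the detrended values at each position within the period.
    sums = [0.0] * period
    counts = [0] * period
    for i in range(n):
        if trend[i] is not None:
            sums[i % period] += series[i] - trend[i]
            counts[i % period] += 1
    means = [s / c for s, c in zip(sums, counts)]
    center = sum(means) / period
    periodic = [means[i % period] - center for i in range(n)]
    # The random part is whatever the trend and periodic parts leave over.
    random_part = [
        None if trend[i] is None else series[i] - trend[i] - periodic[i]
        for i in range(n)
    ]
    return trend, periodic, random_part
```

Only the periodic and trend outputs are used in the later steps; the random part is returned so that the proportion of the periodic component can be computed from all three components.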
S203: according to the proportion of the periodic component data in the original time series data, enhancement processing is performed on the periodic component data to obtain periodic component enhancement data.
The diversity of the periodic component data can improve the coverage of the data set for the service scene and improve the data quality of the data set, so that in the embodiment of the application, the periodic component data can be enhanced, and the purpose of expanding the data set is achieved.
In consideration of practical application, the proportion of the periodic component data in the original time sequence data is different under different service scenes. Therefore, in the embodiment of the present application, different enhancing modes can be set for different ratios. When the ratio of the periodic component data in the original time-series data is high, the periodic component data can be directly subjected to scaling and noise-adding processing. When the proportion of the periodic component data in the original time sequence data is low, in order to effectively enhance the data reflecting the service scene characteristics, a data fragment capable of reflecting the service scene characteristics can be intercepted from the periodic component data based on the service distribution rule of the service scene, and the data fragment is extended, so that the extended data fragment is subjected to scaling and noise adding processing.
S204: the periodic component enhancement data and the trend component data are superimposed to obtain expanded time series data.
After the periodic component enhancement data is obtained, the periodic component enhancement data and the trend component data may be superimposed to obtain augmented time-series data.
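The superposition in S204 can be sketched as a point-by-point sum, assuming an additive decomposition and components of equal length. Both assumptions, and the function name `superpose`, are illustrative and not stated in the application.

```python
def superpose(periodic_enhanced, trend):
    """Add the enhanced periodic component to the trend component, point by point."""
    # Assumption: both components cover the same time points.
    if len(periodic_enhanced) != len(trend):
        raise ValueError("components must have the same length")
    return [p + t for p, t in zip(periodic_enhanced, trend)]


trend = [10.0, 10.5, 11.0, 11.5]
periodic_enhanced = [1.0, -1.0, 1.0, -1.0]
expanded = superpose(periodic_enhanced, trend)
print(expanded)  # [11.0, 9.5, 12.0, 10.5]
```

How to align the lengths when the periodic component has been sliced and extended is not specified in the text, so the sketch simply rejects mismatched inputs.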
According to the technical scheme, the original time sequence data is obtained; decomposing the original time sequence data to obtain periodic component data and trend component data; the diversity of the periodic component data can improve the coverage of the data set for the service scene and improve the data quality of the data set, so that in the technical scheme, the periodic component data can be processed, and the purpose of expanding the data set is achieved. According to the proportion of the periodic component data in the original time sequence data, performing enhancement processing on the periodic component data to obtain periodic component enhancement data; and superposing the periodic component enhancement data and the trend component data to obtain expanded time sequence data. According to the technical scheme, the periodic component data can be subjected to enhancement processing in different modes according to the proportion of the periodic component data in the original time sequence data, so that data with enough difference can be obtained, the obtained data can better cover the characteristics of a service scene, and the high quality of the expanded time sequence data is ensured.
In the embodiment of the present application, the periodic component data may be subjected to enhancement processing according to the proportion of the periodic component data in the original time series data to obtain periodic component enhancement data. In a specific implementation, the proportion can be classified against a preset threshold.
The proportion of the periodic component data in the original time series data may be determined by acquiring a first range value of the periodic component data and a second range value of the original time series data; taking the sum of the first range value and the second range value as a total range value; and taking the ratio of the first range value to the total range value as the proportion of the periodic component data in the original time series data.
The first range value represents the difference between the maximum value and the minimum value of the periodic component data; the second range value represents the sum of the differences between the respective maximum and minimum values of the periodic component data, the trend component data, and the random component data in the original time series data.
The way of calculating the ratio of the periodic component data in the original time-series data can be seen in the following formula,
Figure BDA0003240304770000091
where $\eta_p$ represents the proportion of the periodic component data in the original time series data; $\max(p)$ and $\min(p)$ represent the maximum and minimum values of the periodic component data, so that $\max(p) - \min(p)$ is the first range value of the periodic component data; when $i = p$, $\max(i)$ and $\min(i)$ represent the maximum and minimum values of the periodic component data; when $i = t$, $\max(i)$ and $\min(i)$ represent the maximum and minimum values of the trend component data; when $i = r$, $\max(i)$ and $\min(i)$ represent the maximum and minimum values of the random component data; and $\sum_{i \in \{p, t, r\}} (\max(i) - \min(i))$ represents the second range value of the original time series data.
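Since the formula image is not reproduced in this text, the following sketch follows the written description literally: the first range value is the range of the periodic component, the second range value is the summed ranges of the periodic, trend, and random components, and the proportion is the first range value divided by the sum of the two. This reading is an assumption; if the denominator is meant to be the summed component ranges alone, `first` would be dropped from `total`. The function names are hypothetical.

```python
def value_range(values):
    """Range (maximum minus minimum) of a sequence of numbers."""
    return max(values) - min(values)


def periodic_proportion(periodic, trend, random_part):
    """Proportion of the periodic component, read literally from the description."""
    first = value_range(periodic)                      # first range value
    second = (value_range(periodic)                    # second range value:
              + value_range(trend)                     # summed ranges of the
              + value_range(random_part))              # three components
    total = first + second                             # total range value
    return first / total if total else 0.0


ratio = periodic_proportion(
    periodic=[-2.0, 2.0, -2.0, 2.0],
    trend=[0.0, 1.0, 3.0, 5.0],
    random_part=[-0.5, 0.5, 0.0, 0.0],
)
# The result would then be compared with the preset threshold to pick the
# enhancement mode (the threshold value itself is not given in the text).
```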
Under the condition that the ratio is greater than or equal to the preset threshold, the original time sequence data shows obvious periodicity, so that the periodic component data can be directly subjected to scaling and noise adding processing to obtain periodic component enhanced data.
In practical application, the periodic component data can be scaled and have noise added according to the following formula:
$$P_{aug} = \alpha \cdot P_{ori} + \varepsilon$$
where $P_{ori}$ represents the periodic component data; $P_{aug}$ represents the periodic component enhancement data; $\alpha$ is a scaling factor that can be used to control the amplitude of the periodic component enhancement data; and $\varepsilon$ is a noise signal that follows a normal distribution with mean $\mu$ and standard deviation $\sigma$.
In the embodiment of the present application, in order to ensure that the periodic component enhancement data and the periodic component data are different but do not deviate too much, the scaling factor may be controlled within a suitable range. For example, the scaling factor may take a value in the range [0.9, 1.1].
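A minimal sketch of the scaling and noise-adding step, assuming the noise is drawn independently for every data point and the scaling factor is drawn uniformly from the example range [0.9, 1.1]. The function name, the default `sigma`, and the sampling choices are assumptions made for illustration.

```python
import random


def scale_and_add_noise(periodic, alpha, mu=0.0, sigma=0.05, rng=None):
    """Return alpha * P_ori + epsilon, with epsilon ~ Normal(mu, sigma) per point."""
    rng = rng or random.Random()
    return [alpha * value + rng.gauss(mu, sigma) for value in periodic]


rng = random.Random(42)                  # fixed seed so the sketch is repeatable
p_ori = [0.0, 1.0, 0.0, -1.0] * 3        # toy periodic component
alpha = rng.uniform(0.9, 1.1)            # scaling factor from the example range
p_aug = scale_and_add_noise(p_ori, alpha, mu=0.0, sigma=0.05, rng=rng)
```

Calling the function repeatedly with different values of `alpha` and fresh noise yields several distinct enhanced copies of the same periodic component.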
In the case that the proportion is smaller than the preset threshold, the original time series data shows no obvious periodic pattern. In practical application, the length of the acquired original time series data is limited while its period may be long, so a periodic pattern may not be discoverable from the currently acquired original time series data. Therefore, when the proportion is smaller than the preset threshold, slice enhancement processing can be performed on the periodic component data according to a set service distribution rule to obtain the periodic component enhancement data.
In practical application, the extended data segment can be scaled and have noise added according to the following formula:
$$P_{aug} = \alpha \cdot P'_{ori} + \varepsilon$$
where $P'_{ori}$ represents the extended data segment; $P_{aug}$ represents the periodic component enhancement data; $\alpha$ is a scaling factor that can be used to control the amplitude of the periodic component enhancement data; and $\varepsilon$ is a noise signal that follows a normal distribution with mean $\mu$ and standard deviation $\sigma$.
In the embodiment of the application, a data segment can be intercepted from the periodic component data according to a set service distribution rule; the data segment is extended according to the set data demand; and the extended data segment is scaled and has noise added to obtain the periodic component enhancement data.
The data demand can be set according to the data amount required by model training in the current service scene.
The continuation of the data segment is mainly to repeat the data segment. Based on the data demand and the data length of the data segment, a continuation multiple can be determined; and repeatedly copying the data fragment according to the continuation multiple to obtain the extended data fragment.
The service distribution rule refers to the periodicity of the service itself in the current service scenario, for example, some services have obvious periodicity of the week, and some services have obvious periodicity of the month.
For example, suppose the service distribution rule is a cycle of 7 days and the currently acquired original time series data covers 10 days. In this case, 7 days of data can be intercepted from the periodic component data as the data segment. If the determined continuation multiple is 3, the data segment is repeated twice more, giving 3 identical data segments; these 3 data segments together form the extended data segment.
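The interception and continuation steps can be sketched as follows. The text says only that the continuation multiple is determined from the data demand and the segment length, so rounding the quotient up is an assumption, as is taking the first full period as the segment. Function names are hypothetical.

```python
import math


def continuation_multiple(data_demand, segment_length):
    """Number of copies needed so the extended segment meets the data demand."""
    # Assumption: round up, and always keep at least one copy.
    return max(1, math.ceil(data_demand / segment_length))


def slice_and_extend(periodic, period_length, data_demand):
    """Intercept one period from the periodic component and repeat it."""
    if len(periodic) < period_length:
        raise ValueError("not enough data for one full period")
    segment = periodic[:period_length]   # assumption: take the first full period
    multiple = continuation_multiple(data_demand, len(segment))
    return segment * multiple            # list repetition copies the segment


periodic = [float(day % 7) for day in range(10)]   # 10 days, one value per day
extended = slice_and_extend(periodic, period_length=7, data_demand=21)
print(len(extended))  # 21
```

With a 7-day segment and a demand of 21 points, the multiple is 3, which matches the example of three identical segments. The extended segment would then be scaled and have noise added in the same way as in the high-proportion case.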
In the embodiment of the application, when the proportion of the periodic component data in the original time series data is greater than or equal to the preset threshold, the periodic component data can represent the characteristics of the current service scenario; in this case, the periodic component data can be directly scaled and have noise added to obtain periodic component enhancement data, and the periodic component enhancement data and the trend component data are superimposed to obtain expanded time series data. When the proportion of the periodic component data in the original time series data is smaller than the preset threshold, the periodic component accounts for only a small part of the data; to improve the ability of the periodic component data to represent the characteristics of the current service scenario, slice enhancement processing can be performed on the periodic component data according to a set service distribution rule to obtain periodic component enhancement data, and the periodic component enhancement data and the trend component data are superimposed to obtain expanded time series data. In this technical scheme, the periodic component data is processed in different modes according to its proportion in the original time series data, so that sufficiently varied data can be obtained and the obtained data better covers the characteristics of the service scenario, which ensures the high quality of the expanded time series data.
In the embodiment of the present application, the expanded time series data can be used for training a model. Taking the case where the original time series data is device operation state data as an example, in a specific implementation the expanded time series data can be used to train an initial device state classification model to obtain a device state classification model meeting the accuracy requirement, so that the device state corresponding to newly acquired device operation state data can be determined according to the device state classification model.
The device states may include a device normal load state, a device overload state, and a device idle state.
In the embodiment of the application, the expanded time series data is sufficiently diverse while still representing the characteristics of the service scene, which effectively solves the problems of an insufficient number of data samples, uniform data characteristics, and weak coverage of service scenes. Training the model on the expanded time series data gives the trained model better accuracy and improves the accuracy of its predictions.
Fig. 3 is a schematic structural diagram of an expansion apparatus for data samples according to an embodiment of the present application, including an acquisition unit 31, a decomposition unit 32, an enhancement unit 33, and a superposition unit 34;
an acquisition unit 31 for acquiring original time-series data;
a decomposition unit 32 for decomposing the original time-series data to obtain periodic component data and trend component data;
an enhancement unit 33, configured to perform enhancement processing on the periodic component data according to a ratio of the periodic component data in the original time series data to obtain periodic component enhancement data;
and the superposition unit 34 is used for superposing the periodic component enhancement data and the trend component data to obtain expanded time series data.
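The cooperation of the four units can be sketched as four small functions. This is only an illustration under stated assumptions: the decomposition here uses a simple centered moving average as the trend and treats the remainder as the periodic component, which is a stand-in and not necessarily the decomposition used in the application; all function names, the window size, and the parameter defaults are hypothetical.

```python
import random


def moving_average_trend(series, window):
    """Centered moving average; the window shrinks at both ends of the series."""
    half = window // 2
    trend = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    return trend


def decompose(series, window):
    """Decomposition unit: split the series into periodic and trend parts.

    Assumption: everything that is not trend is treated as periodic.
    """
    trend = moving_average_trend(series, window)
    periodic = [x - t for x, t in zip(series, trend)]
    return periodic, trend


def enhance(periodic, scale, noise_std, rng):
    """Enhancement unit (scaling and noise-adding mode only)."""
    return [p * scale + rng.gauss(0.0, noise_std) for p in periodic]


def superpose(periodic_enhanced, trend):
    """Superposition unit: add the two components point by point."""
    if len(periodic_enhanced) != len(trend):
        raise ValueError("components must have the same length")
    return [p + t for p, t in zip(periodic_enhanced, trend)]


def expand(series, window=7, scale=1.2, noise_std=0.1, seed=0):
    """Run decompose -> enhance -> superpose on already acquired data."""
    rng = random.Random(seed)
    periodic, trend = decompose(series, window)
    return superpose(enhance(periodic, scale, noise_std, rng), trend)


# Acquisition unit stand-in: hypothetical original time series data.
original = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0, 16.0, 18.0, 17.0, 19.0]
expanded = expand(original)
print(len(expanded) == len(original))  # True
```

With `scale=1.0` and `noise_std=0.0` the enhancement leaves the periodic component unchanged, so the superposed result reproduces the original series; this is a convenient sanity check for the decomposition and superposition steps.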
Optionally, the enhancement unit is configured to, in a case that the proportion is greater than or equal to a preset threshold, perform scaling and noise-adding processing on the periodic component data to obtain periodic component enhancement data.
Optionally, the enhancement unit is configured to perform slice enhancement processing on the periodic component data according to a set service distribution rule under the condition that the proportion is smaller than a preset threshold, so as to obtain the periodic component enhancement data.
Optionally, the enhancement unit includes an interception subunit, a continuation subunit, and a noise addition subunit;
the intercepting subunit is used for intercepting data fragments from the periodic component data according to a set service distribution rule;
the continuation subunit is used for extending the data fragments according to the set data demand quantity;
and the noise adding subunit is used for carrying out scaling and noise adding processing on the extended data fragment to obtain the periodic component enhanced data.
Optionally, the continuation subunit is configured to determine a continuation multiple based on the data demand and the data length of the data segment; and repeatedly copying the data fragment according to the continuation multiple to obtain the extended data fragment.
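One plausible way to determine the continuation multiple is sketched below. The text states only that the multiple is based on the data demand and the data length of the data fragment; rounding the quotient up so that the extended fragment is at least as long as the demand is an assumption, as are the function names and sample values.

```python
import math


def continuation_multiple(data_demand, fragment_len):
    """Smallest whole number of copies covering `data_demand` points.

    Assumption: the multiple is ceil(data_demand / fragment_len), at least 1.
    """
    if fragment_len <= 0:
        raise ValueError("fragment length must be positive")
    return max(1, math.ceil(data_demand / fragment_len))


def extend_fragment(fragment, data_demand):
    """Repeat the fragment according to the continuation multiple."""
    multiple = continuation_multiple(data_demand, len(fragment))
    return fragment * multiple


fragment = [3, 1, 4, 1, 5, 9, 2]           # hypothetical 7-point fragment
extended = extend_fragment(fragment, data_demand=20)
print(continuation_multiple(20, 7), len(extended))  # 3 21
```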
Optionally, for determining the proportion of the periodic component data in the original time series data, the apparatus includes an acquisition unit and a designating unit;
the acquisition unit is used for acquiring a first range value of the periodic component data and a second range value of the original time series data;
the designating unit is used for taking the sum of the first range value and the second range value as a total range value, and taking the ratio of the first range value to the total range value as the proportion of the periodic component data in the original time series data.
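The proportion calculation described above can be sketched as follows, taking the range value of a series to be its maximum minus its minimum. The function names and sample values are hypothetical, and returning 0.0 when both range values are zero is an assumption, since the text does not address that case.

```python
def range_value(values):
    """Range of a series: maximum minus minimum."""
    return max(values) - min(values)


def periodic_proportion(periodic, original):
    """Ratio of the first range value to the total range value."""
    first = range_value(periodic)     # first range value (periodic component)
    second = range_value(original)    # second range value (original series)
    total = first + second            # total range value
    if total == 0:
        return 0.0                    # assumption: constant data has no periodic share
    return first / total


periodic = [-1.0, 1.0, -1.0, 1.0]     # range 2
original = [10.0, 14.0, 12.0, 18.0]   # range 8
print(periodic_proportion(periodic, original))  # 0.2
```

Because the first range value also appears in the denominator, the proportion always lies between 0 and 1, so a preset threshold can be chosen on that scale.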
Optionally, the original time-series data is device operation state data; the device further comprises a training unit;
and the training unit is used for training the initial equipment state classification model by using the expanded time series data to obtain an equipment state classification model meeting the accuracy requirement so as to determine the equipment state corresponding to the newly acquired equipment operation state data according to the equipment state classification model.
The description of the features in the embodiment corresponding to fig. 3 may refer to the related description of the embodiment corresponding to fig. 2, and is not repeated here.
According to the technical scheme, original time series data is acquired and decomposed to obtain periodic component data and trend component data. Because diversity in the periodic component data improves how well the data set covers the service scene and improves the data quality of the data set, the technical scheme processes the periodic component data in order to expand the data set. Enhancement processing is performed on the periodic component data according to the proportion of the periodic component data in the original time series data to obtain periodic component enhancement data, and the periodic component enhancement data and the trend component data are superposed to obtain expanded time series data. Because the enhancement mode depends on that proportion, the resulting data is sufficiently diverse and better covers the characteristics of the service scene, which ensures the quality of the expanded time series data.
Fig. 4 is a schematic structural diagram of an expansion device 40 for data samples according to an embodiment of the present application, including:
a memory 41 for storing a computer program;
a processor 42 for executing a computer program to implement the steps of the method of augmenting a data sample as described above.
The embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of any one of the above methods for expanding data samples are implemented.
The method, apparatus, device, and computer-readable storage medium for expanding data samples provided in the embodiments of the present application are described in detail above. The embodiments in the specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to each other. Because the apparatus disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is brief, and the relevant points can be found in the description of the method. It should be noted that those skilled in the art can make several improvements and modifications to the present application without departing from its principle, and such improvements and modifications also fall within the scope of the claims of the present application.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Claims (10)

1. A method for augmenting data samples, comprising:
acquiring original time sequence data;
decomposing the original time sequence data to obtain periodic component data and trend component data;
according to the ratio of the periodic component data in the original time sequence data, performing enhancement processing on the periodic component data to obtain periodic component enhancement data;
and superposing the periodic component enhancement data and the trend component data to obtain expanded time sequence data.
2. The method for expanding data samples according to claim 1, wherein the enhancing the periodic component data according to the ratio of the periodic component data in the original time-series data to obtain periodic component enhanced data comprises:
and under the condition that the ratio is greater than or equal to a preset threshold value, scaling and adding noise to the periodic component data to obtain periodic component enhancement data.
3. The method for expanding data samples according to claim 1, wherein the enhancing the periodic component data according to the ratio of the periodic component data in the original time-series data to obtain periodic component enhanced data comprises:
and under the condition that the ratio is smaller than a preset threshold value, carrying out slice enhancement processing on the periodic component data according to a set service distribution rule to obtain periodic component enhancement data.
4. The method for expanding data samples according to claim 3, wherein the slice enhancement processing on the periodic component data according to the set service distribution rule to obtain the periodic component enhancement data comprises:
intercepting a data fragment from the periodic component data according to the set service distribution rule;
extending the data fragment according to a set data demand;
and scaling and adding noise to the extended data fragment to obtain the periodic component enhancement data.
5. The method for expanding data samples according to claim 4, wherein the extending the data fragment according to the set data demand comprises:
determining a continuation multiple based on the data demand and the data length of the data fragment;
and repeatedly copying the data fragment according to the continuation multiple to obtain the extended data fragment.
6. The method for expanding data samples according to claim 1, wherein determining the proportion of the periodic component data in the original time series data comprises:
acquiring a first range value of the periodic component data and a second range value of the original time series data;
taking the sum of the first range value and the second range value as a total range value; and taking the ratio of the first range value to the total range value as the proportion of the periodic component data in the original time series data.
7. The method according to any one of claims 1 to 6, wherein the original time series data is device operation state data; the method further comprises the following steps:
and training an initial equipment state classification model by using the expanded time sequence data to obtain an equipment state classification model meeting the accuracy requirement, so as to determine the equipment state corresponding to the newly acquired equipment running state data according to the equipment state classification model.
8. An expansion apparatus for data samples, comprising an acquisition unit, a decomposition unit, an enhancement unit, and a superposition unit;
the acquisition unit is used for acquiring original time series data;
the decomposition unit is used for decomposing the original time series data to obtain periodic component data and trend component data;
the enhancement unit is used for enhancing the periodic component data according to the proportion of the periodic component data in the original time sequence data to obtain periodic component enhanced data;
the superposition unit is used for superposing the periodic component enhancement data and the trend component data to obtain expanded time series data.
9. An expansion device for data samples, comprising:
a memory for storing a computer program;
a processor for executing said computer program for carrying out the steps of the method of augmenting a data sample according to any one of claims 1 to 7.
10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out the steps of the method for augmenting a data sample according to any one of claims 1 to 7.
CN202111017164.1A 2021-08-31 2021-08-31 Data sample expansion method, device, equipment and medium Pending CN113705910A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111017164.1A CN113705910A (en) 2021-08-31 2021-08-31 Data sample expansion method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN113705910A true CN113705910A (en) 2021-11-26

Family

ID=78658369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111017164.1A Pending CN113705910A (en) 2021-08-31 2021-08-31 Data sample expansion method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN113705910A (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2016045917A (en) * 2014-08-27 2016-04-04 株式会社日立ソリューションズ西日本 Device for tendency extraction and evaluation of time series data
CN111080487A (en) * 2020-01-17 2020-04-28 广东电网有限责任公司 Electricity sales market electricity quantity prediction method and device
CN111126694A (en) * 2019-12-23 2020-05-08 北京工商大学 Time series data prediction method, system, medium and device
CN111160651A (en) * 2019-12-31 2020-05-15 福州大学 STL-LSTM-based subway passenger flow prediction method
CN111612628A (en) * 2020-05-28 2020-09-01 深圳博普科技有限公司 Method and system for classifying unbalanced data sets
CN112036515A (en) * 2020-11-04 2020-12-04 北京淇瑀信息科技有限公司 Oversampling method and device based on SMOTE algorithm and electronic equipment
CN112256550A (en) * 2020-11-19 2021-01-22 深信服科技股份有限公司 Storage capacity prediction model generation method and storage capacity prediction method
CN112801150A (en) * 2021-01-15 2021-05-14 清华大学 Method and device for expanding periodic video data
CN112836869A (en) * 2021-01-22 2021-05-25 新华三技术有限公司 KPI prediction method, KPI prediction device and storage medium
CN113254877A (en) * 2021-05-18 2021-08-13 北京达佳互联信息技术有限公司 Abnormal data detection method and device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
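Jia Zijian; Song Tengwei; Wang Jianxin: "Missing-value completion algorithm for periodic time series data based on the Fourier transform and kNNI", Software Engineering (软件工程), no. 03 *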

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination