CN110874645A - Data reduction method - Google Patents
- Publication number
- CN110874645A (application CN201911113392.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- feature set
- data feature
- features
- memory
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention discloses a data restoration method in the technical field of computers. By using a machine learning model and a regression model, the method can effectively restore lost service data of various types.
Description
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a data restoration method.
Background
At present, some types of data are collected through event-tracking instrumentation (tracking points embedded in an application). Data collected in this way must be transmitted through a plurality of nodes, and a transmission failure at any one node can cause data to be lost. There is currently no corresponding measure for recovering the lost data.
Disclosure of Invention
To address the deficiencies of the prior art, an embodiment of the invention provides a data restoration method comprising the following steps:
before data loss, acquiring data features of various types of service data to generate a first data feature set;
extracting a subset of the data features from the first data feature set to generate a second data feature set;
inputting the first data feature set and the second data feature set respectively into a machine learning model, and calculating the correlation coefficients between the data features of the first data feature set and the data features of the second data feature set;
selecting, from the second data feature set, the data features having the largest correlation coefficient with each data feature of the first data feature set, to generate a third data feature set;
after data loss, inputting the remaining data features of the first data feature set together with the third data feature set into a regression model, and restoring the lost service data of the first data feature set.
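The source does not say which correlation coefficient is computed. As one possible reading, and purely as an assumption, the Pearson coefficient could be used. In the notation below (introduced here, not taken from the patent), $x_{i,t}$ is the value of the $i$-th feature of the first data feature set at sample $t$, $z_{j,t}$ is the value of the $j$-th feature of the second data feature set, and $T$ is the number of samples collected before the loss:

```latex
r_{ij} = \frac{\sum_{t=1}^{T} (x_{i,t}-\bar{x}_i)(z_{j,t}-\bar{z}_j)}
              {\sqrt{\sum_{t=1}^{T} (x_{i,t}-\bar{x}_i)^2}\,
               \sqrt{\sum_{t=1}^{T} (z_{j,t}-\bar{z}_j)^2}},
\qquad
j^{*}(i) = \arg\max_{j} r_{ij},
\qquad
F_3 = \{\, z_{j^{*}(i)} : i = 1, \dots, n \,\}
```

Under this reading, the third data feature set $F_3$ collects, for each of the $n$ features of the first data feature set, the feature of the second data feature set with the largest coefficient.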
Preferably, inputting the remaining data features of the first data feature set together with the third data feature set into the regression model and restoring the lost service data of the first data feature set comprises:
predicting, by the regression model, the service data corresponding to each data feature of the first data feature set from the trend of each data feature in the third data feature set, to obtain the lost service data of each data feature of the first data feature set.
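One way to read "predicting from the trend" is a linear regression fitted on samples collected before the loss. This is an assumption; the source does not specify the form of the regression model. With $y_{i,t}$ the service data of the $i$-th feature of the first data feature set at sample $t$, $z_{j,t}$ the $j$-th feature of the third data feature set $F_3$, and $\beta$ the fitted coefficients (all notation introduced here):

```latex
\hat{\beta}_i = \arg\min_{\beta} \sum_{t \in \text{pre-loss}}
  \Bigl( y_{i,t} - \beta_{0} - \sum_{j \in F_3} \beta_{j}\, z_{j,t} \Bigr)^{2},
\qquad
\hat{y}_{i,t} = \hat{\beta}_{i,0} + \sum_{j \in F_3} \hat{\beta}_{i,j}\, z_{j,t}
\quad \text{for each lost sample } t
```

The remaining (not lost) data features of the first data feature set could be added as further regressors in the same way.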
The data restoration method provided by the embodiment of the invention has the following beneficial effect:
by using the machine learning model and the regression model, lost service data of various types can be effectively restored.
Detailed Description
The present invention will be described in detail with reference to the following embodiments.
The data restoration method provided by the embodiment of the invention comprises the following steps:
S101: before data loss, acquire data features of various types of service data to generate a first data feature set.
In a specific embodiment for the car-sharing industry, the first data feature set contains 9 data features: the number of stations within 5 kilometers, the number of stations within 1 kilometer, the total number of vehicles at the stations, the number of vehicles available for rent at each station, the total number of orders placed at each time point, the mileage traveled by each vehicle, the mileage of each order, the order duration, and the number of vehicles scanned at each time point.
S102: extract a subset of the data features from the first data feature set to generate a second data feature set.
In a specific embodiment for the car-sharing industry, the second data feature set contains 6 data features: the number of vehicles available for rent at each station, the total number of orders placed at each time point, the mileage traveled by each vehicle, the mileage of each order, the order duration, and the number of vehicles scanned at each time point.
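The two feature sets of the car-sharing embodiment can be sketched as follows. This is only an illustration: the column names, the record layout (one dict per sampling time), and the function `build_feature_sets` are hypothetical and do not come from the patent.

```python
# Hypothetical column names for the nine features of the first data feature set.
FIRST_FEATURES = (
    "stations_within_5km",
    "stations_within_1km",
    "vehicles_at_stations_total",
    "vehicles_for_rent_per_station",
    "orders_per_time_point",
    "mileage_per_vehicle",
    "mileage_per_order",
    "order_duration",
    "scans_per_time_point",
)

# The six features the embodiment places in the second data feature set
# (the last six entries of the list above).
SECOND_FEATURES = FIRST_FEATURES[3:]


def build_feature_sets(records):
    """Return (first_set, second_set) as dicts: feature name -> list of values.

    `records` is a list of dicts, one per sampling time, keyed by FIRST_FEATURES.
    """
    first = {name: [float(r[name]) for r in records] for name in FIRST_FEATURES}
    # The second set is a subset of the first, so it reuses the same columns.
    second = {name: first[name] for name in SECOND_FEATURES}
    return first, second
```

Storing each feature as a column keeps the later correlation and regression steps simple, since both operate on whole series rather than on individual records.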
S103: input the first data feature set and the second data feature set respectively into the machine learning model, and calculate the correlation coefficients between the data features of the first data feature set and the data features of the second data feature set.
S104: select, from the second data feature set, the data features having the largest correlation coefficient with the data features in the first data feature set, to generate a third data feature set.
In a specific embodiment, the data features in the second data feature set that have the highest correlation coefficient with the data features of the first data feature set are: the number of vehicles available for rent at each station, the total number of orders placed at each time point, the mileage traveled by each vehicle, and the order duration. The generated third data feature set therefore contains these 4 data features.
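Steps S103 and S104 might look like the sketch below. Several things here are assumptions rather than the patent's method: the Pearson coefficient stands in for the unspecified correlation coefficient, a feature is never matched with itself (otherwise every feature of the second set would trivially select itself with coefficient 1), and the names `pearson` and `select_third_set` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / sqrt(vx * vy)


def select_third_set(first, second_names):
    """For each first-set feature, pick the best-correlated second-set feature.

    Returns (best_match, third_set). best_match maps each first-set feature to
    its chosen second-set feature; third_set lists the distinct choices in the
    order they appear in second_names.
    """
    best_match = {}
    for target, series in first.items():
        # Assumption: a feature is not allowed to select itself.
        candidates = [n for n in second_names if n != target]
        if not candidates:
            continue
        best_match[target] = max(
            candidates, key=lambda n: pearson(series, first[n])
        )
    chosen = set(best_match.values())
    third_set = [n for n in second_names if n in chosen]
    return best_match, third_set
```

Because several first-set features can select the same second-set feature, the third set may be smaller than the second, which is consistent with the embodiment's reduction from 6 features to 4.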
S105: after data loss, input the remaining data features of the first data feature set together with the third data feature set into a regression model, and restore the lost service data of the first data feature set.
Optionally, inputting the remaining data features of the first data feature set together with the third data feature set into the regression model and restoring the lost service data of the first data feature set comprises:
predicting, by the regression model, the service data corresponding to each data feature of the first data feature set from the trend of each data feature in the third data feature set, to obtain the lost service data of each data feature of the first data feature set.
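A minimal sketch of step S105 follows, under loud assumptions: the regression model is taken to be a one-variable ordinary least squares line, each lost feature is predicted from a single feature of the third data feature set, and lost values are marked with `None`. The patent does not specify any of these, and `fit_line` and `restore_lost` are hypothetical names.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("predictor is constant; slope is undefined")
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def restore_lost(target, predictor):
    """Fill the None entries of `target` from `predictor`.

    The line is fitted only on positions where `target` was observed, then
    applied to the positions where it was lost. Observed values are kept.
    """
    observed = [(p, t) for p, t in zip(predictor, target) if t is not None]
    intercept, slope = fit_line(
        [p for p, _ in observed], [t for _, t in observed]
    )
    return [
        t if t is not None else intercept + slope * p
        for p, t in zip(predictor, target)
    ]
```

For example, with `predictor = [1, 2, 3, 4, 5]` and `target = [3, 5, 7, None, None]`, the observed pairs lie on the line y = 1 + 2x, so the two lost values are filled in from that line. A multi-feature regression over the whole third data feature set would follow the same fit-then-predict pattern.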
By using the machine learning model and the regression model, the data restoration method provided by the embodiment of the invention can effectively restore lost service data of various types.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be appreciated that related features of the method and apparatus described above may be cross-referenced. It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In addition, the memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory includes at least one memory chip.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media include persistent and non-persistent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
It should be noted that the above-mentioned embodiments do not limit the present invention in any way, and all technical solutions obtained by using equivalent alternatives or equivalent variations fall within the protection scope of the present invention.
Claims (3)
1. A data restoration method, comprising:
before data loss, acquiring data features of various types of service data to generate a first data feature set;
extracting a subset of the data features from the first data feature set to generate a second data feature set;
inputting the first data feature set and the second data feature set respectively into a machine learning model, and calculating the correlation coefficients between the data features of the first data feature set and the data features of the second data feature set;
selecting, from the second data feature set, the data features having the largest correlation coefficient with each data feature of the first data feature set, to generate a third data feature set;
after data loss, inputting the remaining data features of the first data feature set together with the third data feature set into a regression model, and restoring the lost service data of the first data feature set.
2. The data restoration method according to claim 1, wherein inputting the remaining data features of the first data feature set together with the third data feature set into the regression model and restoring the lost service data of the first data feature set comprises:
predicting, by the regression model, the service data corresponding to each data feature of the first data feature set from the trend of each data feature in the third data feature set, to obtain the lost service data of each data feature of the first data feature set.
3. A computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1-2 when executing the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911113392.1A CN110874645A (en) | 2019-11-14 | 2019-11-14 | Data reduction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911113392.1A CN110874645A (en) | 2019-11-14 | 2019-11-14 | Data reduction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110874645A (en) | 2020-03-10 |
Family
ID=69717216
Family Applications (1)
Application Number | Status | Publication Number | Title | Priority Date | Filing Date |
---|---|---|---|---|---|
CN201911113392.1A | Pending | CN110874645A (en) | Data reduction method | 2019-11-14 | 2019-11-14 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110874645A (en) |
Citations (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2001224023A (en) * | 2000-02-10 | 2001-08-17 | Sony Corp | Information processing unit and method, and recording medium |
JP2006251997A (en) * | 2005-03-09 | 2006-09-21 | Toyo Electric Mfg Co Ltd | Method for interpolating missing data |
US20130067181A1 (en) * | 2011-09-09 | 2013-03-14 | Nokia Corporation | Method and apparatus for providing criticality based data backup |
CN103368787A (en) * | 2012-03-28 | 2013-10-23 | 索尼公司 | Information processing device, information processing method, and program |
US20140012553A1 (en) * | 2012-04-20 | 2014-01-09 | United Parcel Service Of America, Inc. | Systems and methods for aggregating and evaluating environmental data |
CN103678869A (en) * | 2013-09-17 | 2014-03-26 | 中国人民解放军海军航空工程学院青岛校区 | Prediction and estimation method of flight parameter missing data |
CN103971520A (en) * | 2014-04-17 | 2014-08-06 | 浙江大学 | Traffic flow data recovery method based on space-time correlation |
CN104240715A (en) * | 2013-06-21 | 2014-12-24 | 华为技术有限公司 | Method and device for recovering lost data |
US20150237157A1 (en) * | 2014-02-18 | 2015-08-20 | Salesforce.Com, Inc. | Transparent sharding of traffic across messaging brokers |
CN105469123A (en) * | 2015-12-30 | 2016-04-06 | 华东理工大学 | Missing data completion method based on k plane regression |
CN105679022A (en) * | 2016-02-04 | 2016-06-15 | 北京工业大学 | Multi-source traffic data complementing method based on low rank |
CN105893610A (en) * | 2016-04-26 | 2016-08-24 | 中国科学院信息工程研究所 | Deficiency-source completion method of multi-source heterogeneous large data |
WO2017202006A1 (en) * | 2016-05-25 | 2017-11-30 | 腾讯科技(深圳)有限公司 | Data processing method and device, and computer storage medium |
US20180081914A1 (en) * | 2016-09-16 | 2018-03-22 | Oracle International Corporation | Method and system for adaptively imputing sparse and missing data for predictive models |
CN108289285A (en) * | 2018-01-12 | 2018-07-17 | 上海海事大学 | A kind of ocean wireless sensor network is lost data and is restored and reconstructing method |
US20180336484A1 (en) * | 2017-05-18 | 2018-11-22 | Sas Institute Inc. | Analytic system based on multiple task learning with incomplete data |
WO2019003234A1 (en) * | 2017-06-26 | 2019-01-03 | Telefonaktiebolaget Lm Ericsson (Publ) | Network node and method performed therein for generating a missing data value of a set of data from one or more devices |
CN109472404A (en) * | 2018-10-31 | 2019-03-15 | 山东大学 | A kind of Short-Term Load Forecasting of Electric Power System, model, apparatus and system |
CN109618400A (en) * | 2019-01-28 | 2019-04-12 | 南京邮电大学 | Wireless sensor network data transmission method, readable storage medium storing program for executing and terminal |
Non-Patent Citations (1)
Title |
---|
马茜; 谷峪; 李芳芳; 于戈: "顺序敏感的多源感知数据填补技术" (Order-sensitive imputation technique for multi-source sensing data) * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||