CN110874645A - Data reduction method - Google Patents

Publication number
CN110874645A
Authority
CN
China
Prior art keywords
data
feature set
data feature
features
memory
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911113392.1A
Other languages
Chinese (zh)
Inventor
周金龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Shouqi Zhixing Technology Co Ltd
Original Assignee
Beijing Shouqi Zhixing Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Shouqi Zhixing Technology Co Ltd
Priority to CN201911113392.1A
Publication of CN110874645A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00: Machine learning

Abstract

The invention discloses a data reduction method in the technical field of computers. By adopting a machine learning model and a regression model, the method can effectively restore various types of lost service data.

Description

Data reduction method
Technical Field
The invention belongs to the technical field of computers, and particularly relates to a data reduction method.
Background
At present, some types of data are collected through event tracking (tracking points embedded in the application). Data collected in this way must be transmitted through a plurality of nodes, and a transmission failure at any one node can cause data to be lost. There is currently no corresponding measure for recovering the lost data.
Disclosure of Invention
In view of the defects in the prior art, an embodiment of the invention provides a data reduction method, which comprises the following steps:
before data loss, acquiring data features of various types of service data to generate a first data feature set;
extracting some of the data features from the first data feature set to generate a second data feature set;
inputting the first data feature set and the second data feature set into a machine learning model, and calculating the correlation coefficient between each data feature of the first data feature set and each data feature of the second data feature set;
selecting, from the second data feature set, the data features having the largest correlation coefficient with each data feature of the first data feature set, to generate a third data feature set;
after data loss, inputting the remaining data features of the first data feature set together with the third data feature set into a regression model, and restoring the lost service data of the first data feature set.
Preferably, inputting the remaining data features of the first data feature set together with the third data feature set into the regression model and restoring the lost service data of the first data feature set comprises:
predicting, by the regression model, the service data corresponding to each data feature of the first data feature set from the trend of each data feature in the third data feature set, to obtain the lost service data of each data feature of the first data feature set.
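The source does not name a specific correlation coefficient or regression form. As one possible reading, assumed here purely for illustration, the correlation between a feature $a$ of the first set and a feature $b$ of the second set, observed at moments $t = 1, \dots, n$ before the loss, could be the Pearson coefficient:

$$
r_{ab} = \frac{\sum_{t=1}^{n} (a_t - \bar{a})(b_t - \bar{b})}{\sqrt{\sum_{t=1}^{n} (a_t - \bar{a})^2}\,\sqrt{\sum_{t=1}^{n} (b_t - \bar{b})^2}}
$$

Under the same assumption, a linear regression model would estimate a lost value $\hat{y}_t$ of a first-set feature from the values $x_{1,t}, \dots, x_{k,t}$ of the remaining features and the third-set features at the same moment:

$$
\hat{y}_t = \beta_0 + \sum_{j=1}^{k} \beta_j x_{j,t}
$$

where the coefficients $\beta_0, \dots, \beta_k$ are fitted on data recorded before the loss. The symbols $a$, $b$, $x$, $y$ and $\beta$ are introduced here and do not appear in the source.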
The data reduction method provided by the embodiment of the invention has the following beneficial effects:
by adopting the machine learning model and the regression model, various types of business data can be effectively restored.
Detailed Description
The present invention will be described in detail with reference to the following embodiments.
The data recovery method provided by the embodiment of the invention comprises the following steps:
S101, before data loss, acquiring data features of various types of service data to generate a first data feature set.
As a specific embodiment, for the car-sharing industry, the first data feature set includes 9 data features: the number of outlets within 5 kilometers, the number of outlets within 1 kilometer, the total number of vehicles at the outlets, the number of vehicles available for rent at each outlet, the total number of orders placed at each moment, the mileage traveled by each vehicle, the order mileage, the order duration, and the number of vehicles scanned at each moment.
S102, extracting some of the data features from the first data feature set to generate a second data feature set.
As a specific embodiment, for the car-sharing industry, the second data feature set includes 6 data features: the number of vehicles available for rent at each outlet, the total number of orders placed at each moment, the mileage traveled by each vehicle, the order mileage, the order duration, and the number of vehicles scanned at each moment.
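As a rough illustration of S101 and S102, the two feature sets can be held as mappings from a feature name to its time-ordered values. The following is a minimal Python sketch; the identifier names, the `build_feature_set` and `extract_subset` helpers, and the sample records are all assumptions made for illustration and are not taken from the patent.

```python
# Hypothetical identifiers for the 9 features of the car-sharing embodiment.
FIRST_FEATURES = [
    "outlets_within_5km",
    "outlets_within_1km",
    "total_vehicles_at_outlets",
    "rentable_vehicles_per_outlet",
    "orders_placed_per_moment",
    "mileage_per_vehicle",
    "order_mileage",
    "order_duration",
    "vehicles_scanned_per_moment",
]

# The 6 features the embodiment keeps in the second set (last six above).
SECOND_FEATURES = FIRST_FEATURES[3:]


def build_feature_set(records, names):
    """S101: collect one time-ordered series per feature from record dicts."""
    return {name: [record[name] for record in records] for name in names}


def extract_subset(feature_set, names):
    """S102: copy the chosen features out of an existing feature set."""
    missing = [name for name in names if name not in feature_set]
    if missing:
        raise KeyError(f"features not in source set: {missing}")
    return {name: list(feature_set[name]) for name in names}


# Made-up records, one per sampling moment, captured before any loss.
records = [
    {name: float(10 * t + i) for i, name in enumerate(FIRST_FEATURES)}
    for t in range(4)
]

first_set = build_feature_set(records, FIRST_FEATURES)
second_set = extract_subset(first_set, SECOND_FEATURES)
```

Keeping every series aligned on the same sampling moments is what later allows features to be compared pairwise.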
S103, inputting the first data feature set and the second data feature set into the machine learning model, and calculating the correlation coefficient between each data feature of the first data feature set and each data feature of the second data feature set.
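The patent does not say which correlation coefficient the machine learning model computes. The sketch below assumes the Pearson coefficient and plain Python lists; the function names `pearson` and `correlation_table`, the feature identifiers, and the sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / math.sqrt(var_x * var_y)


def correlation_table(first_set, second_set):
    """Correlate every first-set feature with every second-set feature."""
    return {
        a: {b: pearson(xs, ys) for b, ys in second_set.items()}
        for a, xs in first_set.items()
    }


# Made-up pre-loss series; names are illustrative only.
first_set = {
    "outlets_within_1km": [2.0, 3.0, 4.0, 5.0],
    "orders_placed_per_moment": [10.0, 14.0, 19.0, 25.0],
    "order_duration": [30.0, 28.0, 27.0, 24.0],
}
second_set = {
    name: first_set[name]
    for name in ("orders_placed_per_moment", "order_duration")
}
table = correlation_table(first_set, second_set)
```

The result is a nested mapping, so `table[a][b]` holds the coefficient between first-set feature `a` and second-set feature `b`.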
S104, selecting, from the second data feature set, the data features having the largest correlation coefficient with each data feature of the first data feature set, and generating a third data feature set.
As a specific embodiment, the data features in the second data feature set that have the highest correlation coefficient with the data features of the first data feature set are the number of vehicles available for rent at each outlet, the total number of orders placed at each moment, the mileage traveled by each vehicle, and the order duration; the generated third data feature set therefore includes these 4 data features.
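One way to read S104 is as a per-feature argmax over the correlation table. The sketch below makes two assumptions the source does not state: a feature is never matched with itself (otherwise every second-set feature would trivially be selected with a coefficient of 1), and the raw coefficient rather than its absolute value is compared. The function name `select_third_set` and the table values are made up.

```python
def select_third_set(corr):
    """For each first-set feature, keep its best-correlated second-set feature.

    `corr[a][b]` is the coefficient between first-set feature `a` and
    second-set feature `b`. Self-pairs are skipped (an assumption).
    """
    chosen = []
    for target, row in corr.items():
        candidates = {name: r for name, r in row.items() if name != target}
        if not candidates:
            continue
        best = max(candidates, key=candidates.get)
        if best not in chosen:
            chosen.append(best)  # keep first-seen order, no duplicates
    return chosen


# Illustrative coefficients only; "a".."d" stand in for real feature names.
corr = {
    "a": {"c": 0.9, "d": 0.2},
    "b": {"c": 0.8, "d": 0.1},
    "c": {"c": 1.0, "d": 0.3},
    "d": {"c": 0.3, "d": 1.0},
}
third_set_names = select_third_set(corr)
```

Because several first-set features can share the same best match, the third set can be smaller than the second set, as in the embodiment where 6 features are narrowed to 4.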
S105, after data loss, inputting the remaining data features of the first data feature set together with the third data feature set into a regression model, and restoring the lost service data of the first data feature set.
Optionally, inputting the remaining data features of the first data feature set together with the third data feature set into the regression model and restoring the lost service data of the first data feature set includes:
predicting, by the regression model, the service data corresponding to each data feature of the first data feature set from the trend of each data feature in the third data feature set, to obtain the lost service data of each data feature of the first data feature set.
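The regression model is not specified in the source. As a deliberately simplified sketch, the code below fits an ordinary least-squares line from a single third-set feature to one first-set feature using the values that survived, then fills in the lost values. The patent feeds several features into the model at once; the single predictor, the function names `fit_line` and `restore`, and the sample numbers are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predictor is constant; cannot fit a line")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def restore(target, predictor):
    """Fill None entries of `target` using a line fitted on known pairs."""
    known = [(x, y) for x, y in zip(predictor, target) if y is not None]
    slope, intercept = fit_line([x for x, _ in known], [y for _, y in known])
    return [
        y if y is not None else slope * x + intercept
        for x, y in zip(predictor, target)
    ]


# Made-up series: the predictor survived, the target lost its last two values.
predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
target = [3.0, 5.0, 7.0, None, None]
restored = restore(target, predictor)
```

Values that were never lost are passed through unchanged; only the `None` positions are replaced by predictions.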
By adopting the machine learning model and the regression model, the data reduction method provided by the embodiment of the invention can effectively restore various types of service data.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
It will be appreciated that the related features of the method and apparatus described above may be referred to one another. It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not described again here.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In addition, the memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM), and the memory includes at least one memory chip.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.
It should be noted that the above-mentioned embodiments do not limit the present invention in any way, and all technical solutions obtained by using equivalent alternatives or equivalent variations fall within the protection scope of the present invention.

Claims (3)

1. A method of data reduction, comprising:
before data loss, acquiring data features of various types of service data to generate a first data feature set;
extracting some of the data features from the first data feature set to generate a second data feature set;
inputting the first data feature set and the second data feature set into a machine learning model, and calculating the correlation coefficient between each data feature of the first data feature set and each data feature of the second data feature set;
selecting, from the second data feature set, the data features having the largest correlation coefficient with each data feature of the first data feature set, to generate a third data feature set;
after data loss, inputting the remaining data features of the first data feature set together with the third data feature set into a regression model, and restoring the lost service data of the first data feature set.
2. The data reduction method according to claim 1, wherein inputting the remaining data features of the first data feature set together with the third data feature set into the regression model and restoring the lost service data of the first data feature set comprises:
predicting, by the regression model, the service data corresponding to each data feature of the first data feature set from the trend of each data feature in the third data feature set, to obtain the lost service data of each data feature of the first data feature set.
3. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of claim 1 or 2 are implemented when the computer program is executed by the processor.
CN201911113392.1A 2019-11-14 2019-11-14 Data reduction method Pending CN110874645A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911113392.1A CN110874645A (en) 2019-11-14 2019-11-14 Data reduction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911113392.1A CN110874645A (en) 2019-11-14 2019-11-14 Data reduction method

Publications (1)

Publication Number Publication Date
CN110874645A true CN110874645A (en) 2020-03-10

Family

ID=69717216

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911113392.1A Pending CN110874645A (en) 2019-11-14 2019-11-14 Data reduction method

Country Status (1)

Country Link
CN (1) CN110874645A (en)

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2001224023A (en) * 2000-02-10 2001-08-17 Sony Corp Information processing unit and method, and recording medium
JP2006251997A (en) * 2005-03-09 2006-09-21 Toyo Electric Mfg Co Ltd Method for interpolating missing data
US20130067181A1 (en) * 2011-09-09 2013-03-14 Nokia Corporation Method and apparatus for providing criticality based data backup
CN103368787A (en) * 2012-03-28 2013-10-23 索尼公司 Information processing device, information processing method, and program
US20140012553A1 (en) * 2012-04-20 2014-01-09 United Parcel Service Of America, Inc. Systems and methods for aggregating and evaluating environmental data
CN103678869A (en) * 2013-09-17 2014-03-26 中国人民解放军海军航空工程学院青岛校区 Prediction and estimation method of flight parameter missing data
CN103971520A (en) * 2014-04-17 2014-08-06 浙江大学 Traffic flow data recovery method based on space-time correlation
CN104240715A (en) * 2013-06-21 2014-12-24 华为技术有限公司 Method and device for recovering lost data
US20150237157A1 (en) * 2014-02-18 2015-08-20 Salesforce.Com, Inc. Transparent sharding of traffic across messaging brokers
CN105469123A (en) * 2015-12-30 2016-04-06 华东理工大学 Missing data completion method based on k plane regression
CN105679022A (en) * 2016-02-04 2016-06-15 北京工业大学 Multi-source traffic data complementing method based on low rank
CN105893610A (en) * 2016-04-26 2016-08-24 中国科学院信息工程研究所 Deficiency-source completion method of multi-source heterogeneous large data
WO2017202006A1 (en) * 2016-05-25 2017-11-30 腾讯科技(深圳)有限公司 Data processing method and device, and computer storage medium
US20180081914A1 (en) * 2016-09-16 2018-03-22 Oracle International Corporation Method and system for adaptively imputing sparse and missing data for predictive models
CN108289285A (en) * 2018-01-12 2018-07-17 上海海事大学 A kind of ocean wireless sensor network is lost data and is restored and reconstructing method
US20180336484A1 (en) * 2017-05-18 2018-11-22 Sas Institute Inc. Analytic system based on multiple task learning with incomplete data
WO2019003234A1 (en) * 2017-06-26 2019-01-03 Telefonaktiebolaget Lm Ericsson (Publ) Network node and method performed therein for generating a missing data value of a set of data from one or more devices
CN109472404A (en) * 2018-10-31 2019-03-15 山东大学 A kind of Short-Term Load Forecasting of Electric Power System, model, apparatus and system
CN109618400A (en) * 2019-01-28 2019-04-12 南京邮电大学 Wireless sensor network data transmission method, readable storage medium storing program for executing and terminal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
马茜; 谷峪; 李芳芳; 于戈: "Order-sensitive imputation technique for multi-source sensing data" *

Similar Documents

Publication Publication Date Title
JP6388339B2 (en) Distributed caching and cache analysis
CN106897342B (en) Data verification method and equipment
CN109033365B (en) Data processing method and related equipment
CN106682167B (en) Statistical device and method for user behavior data
CN111524384B (en) Parking lot parking space occupation and release judgment method
CN110838955B (en) Method and device for debugging ETC online application function
CN114138468B (en) Self-adaptive distribution method and device for packaging task amount and storage medium
CN106878365B (en) data synchronization method and device
CN110019347B (en) Data processing method and device of block chain and terminal equipment
CN114138756B (en) Data deduplication method, node and computer-readable storage medium
CN110874645A (en) Data reduction method
CN106156185B (en) Method, device and system for inquiring service request execution state
CN111637897B (en) Map updating method, map updating device, storage medium, and processor
CN111639998A (en) Method, device and medium for guaranteeing user deposit rights and interests based on block chain
CN111369282B (en) Resource processing method and device
CN106888244A (en) A kind of method for processing business and device
CN109426559B (en) Command issuing method and device, storage medium and processor
CN111126624A (en) Method for judging validity of model prediction result
CN112509164A (en) Attendance card-punching method, attendance card-punching device, attendance card-punching equipment and storage medium
CN111369040A (en) Road condition information updating method
CN110766546A (en) Bank account management method
CN110659170A (en) Vehicle-mounted T-BOX test system
WO2019041826A1 (en) Breakpoint list cleaning method and apparatus, storage medium, and server
CN116629386B (en) Model training method and device
CN112685046A (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination