CN108563789A - Data cleaning method based on Spark frames and device - Google Patents


Publication number
CN108563789A
Authority
CN
China
Prior art keywords
data
attribute
cleaned
stored
demand
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810398800.1A
Other languages
Chinese (zh)
Inventor
姜光植
严雪枫
谢川
黄瀚林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CHENGDU ZHIYUN SCIENCE & TECHNOLOGY Co Ltd
Original Assignee
CHENGDU ZHIYUN SCIENCE & TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the present invention provides a data cleaning method and device based on the Spark framework. The method includes: obtaining data to be cleaned; judging whether the data to be cleaned meets a preset requirement and, if it does not, cleaning the data and taking the cleaned data as data to be stored; calculating the data attributes of the data to be stored and writing the calculated attributes into a property file; and saving the data to be stored together with the property file. The invention can effectively improve data cleaning efficiency and help ensure the authenticity, integrity, and reasonableness of the data.

Description

Data cleaning method and device based on the Spark framework
Technical Field
The present invention relates to the technical field of data processing, and in particular to a data cleaning method and device based on the Spark framework.
Background
Most existing mainstream data cleaning methods are based on MapReduce programs. However, when big data is cleaned with MapReduce, a large amount of intermediate results must be written to local disk, so MapReduce-based cleaning is time-consuming and inefficient.
Summary of the Invention
In view of this, embodiments of the present invention provide a data cleaning method and device based on the Spark framework that can effectively solve the above problem and improve data cleaning efficiency.
A preferred embodiment of the present invention provides a data cleaning method based on the Spark framework, the method including:
Obtain data to be cleaned;
Judge whether the data to be cleaned meets a preset requirement; if it does not, clean the data and take the cleaned data as data to be stored;
Calculate the data attributes of the data to be stored and write the calculated attributes into a property file;
Save the data to be stored and the property file.
In an option of the preferred embodiment, the step of judging whether the data to be cleaned meets the preset requirement includes:
Judge whether the data to be cleaned meets one or more of a data integrity requirement, a data consistency requirement, a data validity requirement, and a data uniqueness requirement.
In an option of the preferred embodiment, the step of calculating the data attributes of the data to be stored includes:
Call the configuration file that matches the data format of the data to be stored;
Calculate the data attributes of the data to be stored according to the data attribute calculation rule preset in the configuration file.
In an option of the preferred embodiment, before the step of calculating the data attributes of the data to be stored is executed, the method further includes:
Determine the data formats of the preprocessed data;
Determine the corresponding data attributes for each data format, and save a file containing those data attributes as the configuration file associated with that data format.
In an option of the preferred embodiment, the data attributes include a KEY that characterizes the meaning of a data field and a Value that characterizes the field's threshold.
In an option of the preferred embodiment, the step of obtaining data to be cleaned includes:
Collect log data from a data source at a preset time interval using a preset data collection tool, and use the log data as the data to be cleaned.
A preferred embodiment of the present invention also provides a data cleaning device based on the Spark framework, the device including:
A data acquisition module, for obtaining data to be cleaned;
A data judgment module, for judging whether the data to be cleaned meets a preset requirement and, if it does not, cleaning the data and taking the cleaned data as data to be stored;
An attribute computing module, for calculating the data attributes of the data to be stored and writing the calculated attributes into a property file;
A data storage module, for saving the data to be stored and the property file.
In an option of the preferred embodiment, the data judgment module is further used to:
Judge whether the data to be cleaned meets one or more of a data integrity requirement, a data consistency requirement, a data validity requirement, and a data uniqueness requirement.
In an option of the preferred embodiment, the attribute computing module includes:
A configuration file acquiring unit, for calling the configuration file that matches the data format of the data to be stored;
An attribute computing unit, for calculating the data attributes of the data to be stored according to the data attribute calculation rule preset in the configuration file.
In an option of the preferred embodiment, the device further includes:
A demand determining module, for determining the data formats of the preprocessed data;
A file configuration module, for determining the corresponding data attributes for each data format and saving a file containing those data attributes as the configuration file associated with that data format.
Compared with the prior art, embodiments of the present invention provide a data cleaning method and device based on the Spark framework. Because data cleaning is implemented on the Spark framework, the problem of having to write intermediate data to local disk during data processing can be effectively avoided, which can greatly improve data cleaning efficiency. The invention can also help ensure accuracy and authenticity during data cleaning.
Brief Description of the Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings show only certain embodiments of the present invention and should therefore not be regarded as limiting its scope; a person of ordinary skill in the art can derive other related drawings from them without creative effort.
Fig. 1 is a schematic block diagram of a terminal device to which the data cleaning method and device based on the Spark framework provided in an embodiment of the present invention are applied.
Fig. 2 is a flow diagram of the data cleaning method based on the Spark framework provided in an embodiment of the present invention.
Fig. 3 is a sub-flow diagram of step S12 shown in Fig. 2.
Fig. 4 is another flow diagram of the data cleaning method based on the Spark framework provided in an embodiment of the present invention.
Fig. 5 is a schematic block diagram of the data cleaning device based on the Spark framework provided in an embodiment of the present invention.
Reference numerals: 10 - terminal device; 100 - data cleaning device based on the Spark framework; 110 - data acquisition module; 120 - data judgment module; 130 - attribute computing module; 131 - configuration file acquiring unit; 132 - attribute computing unit; 140 - data storage module; 150 - demand determining module; 160 - file configuration module; 200 - memory; 300 - storage controller; 400 - processor.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present invention. The components of the embodiments generally described and illustrated in the drawings can be arranged and designed in a variety of different configurations.
Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the present invention.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it need not be defined or explained again in subsequent drawings.
Fig. 1 is a schematic block diagram of a terminal device 10 to which the data cleaning method and device based on the Spark framework provided in an embodiment of the present invention are applied. The terminal device 10 includes a data cleaning device 100 based on the Spark framework, a memory 200, a storage controller 300, and a processor 400. The memory 200, storage controller 300, and processor 400 are electrically connected to one another, directly or indirectly, to transmit or exchange data; for example, these elements are electrically connected through one or more communication buses or signal lines. The data cleaning device 100 includes at least one software function module that can be stored in the memory 200 in the form of software or firmware, or built into the operating system of the terminal device 10. Under the control of the storage controller 300, the processor 400 accesses the memory 200 to execute the executable modules stored there, such as the software function modules and computer programs included in the data cleaning device 100.
It will be understood that the structure shown in Fig. 1 is only illustrative; the terminal device 10 may include more or fewer components than shown in Fig. 1, or have a different configuration. Each component shown in Fig. 1 may be implemented in hardware, software, or a combination thereof. The terminal device 10 may be, but is not limited to, a smartphone, a personal computer (PC), a tablet computer, a personal digital assistant (PDA), a mobile Internet device (MID), a cloud server, a minicomputer, etc.
Further, Fig. 2 is a flow diagram of the data cleaning method based on the Spark framework provided in an embodiment of the present invention. The method is applied to the terminal device 10 described above, and its specific steps and flow are described in detail below with reference to Fig. 2. It should be understood that the method provided in this embodiment is not limited to the order of the steps and flow described below.
Step S10: obtain data to be cleaned;
Step S11: judge whether the data to be cleaned meets a preset requirement; if it does not, clean the data and take the cleaned data as data to be stored;
Step S12: calculate the data attributes of the data to be stored and write the calculated attributes into a property file;
Step S13: save the data to be stored and the property file.
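Steps S10 to S13 can be sketched as a small pipeline. The following is a minimal illustration in plain Python, not the patented implementation: the requirement check, the cleaning rule, and the attribute summary are all hypothetical choices made for the example. In a Spark job the per-record functions would typically be applied through RDD `map` and `filter` operations rather than a local loop.

```python
import json


def meets_requirements(record):
    # Hypothetical preset requirement: a non-empty "user" and a present "ts".
    return bool(record.get("user")) and record.get("ts") is not None


def clean(record):
    # Hypothetical cleaning rule: records without a timestamp cannot be
    # repaired and are dropped; a missing user is replaced by a placeholder.
    if record.get("ts") is None:
        return None
    fixed = dict(record)
    if not fixed.get("user"):
        fixed["user"] = "unknown"
    return fixed


def compute_attributes(records):
    # Hypothetical attribute summary: record count and the set of field names.
    return {
        "record_count": len(records),
        "fields": sorted({name for rec in records for name in rec}),
    }


def run_pipeline(raw_records):
    to_store = []
    for rec in raw_records:                      # S10: data to be cleaned
        if not meets_requirements(rec):          # S11: check, clean if needed
            rec = clean(rec)
        if rec is not None:
            to_store.append(rec)
    attributes = compute_attributes(to_store)    # S12: data attributes
    # S13 would persist both values; here they are simply returned.
    return to_store, json.dumps(attributes, sort_keys=True)
```

Returning the property file content as a JSON string keeps the sketch free of any storage dependency; the target store is left open here.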
It should first be noted that Spark is a popular general-purpose parallel computing framework in cloud computing, and a scalable cluster data analysis platform based on in-memory computing. That is, Spark builds on in-memory distributed datasets and optimizes iterative workloads and interactive queries, which can greatly improve the speed and efficiency of big data computation. In this embodiment, steps S10 to S13 perform data cleaning on the Spark framework and so make use of its in-memory computing: memory serves as the storage medium during distributed computation, and memory reads and writes are far faster than disk reads and writes. Existing cleaning based on MapReduce programs must write a large amount of intermediate results to local disk, which makes cleaning slow and inefficient; the Spark-based method of this application can effectively avoid that problem and can help ensure the integrity and authenticity of the data.
In detail, in step S10 the data to be cleaned may be, but is not limited to, log data. The data may be obtained from different types of data sources, such as an FTP (File Transfer Protocol) server, Tomcat, or Nginx. In other embodiments, the data sources may also be extended according to the specific business; adding a data source type only requires adding a corresponding atomic data-loading process or tracking point. Loading from the data sources is also executed in a distributed, parallel manner.
In addition, when the data to be cleaned is actually obtained, log data can be collected from the data source at a preset time interval by a preset data collection tool and used as the data to be cleaned. The preset data collection tool may be, but is not limited to, a Shell script, a Python script, a Java program, or an existing log collection tool such as Flume.
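As one way to picture interval-based collection with a Python script, the sketch below reads only the log lines appended since the previous run. The function name and the byte-offset bookkeeping are assumptions for illustration; the source does not describe how its collection tool tracks progress.

```python
def collect_new_lines(log_path, offset):
    """Return the lines appended after byte `offset`, plus the new offset.

    Assumes the log is UTF-8 and is only ever appended to; a line that is
    still being written when the file is read would be returned incomplete.
    """
    with open(log_path, "rb") as f:
        f.seek(offset)
        chunk = f.read()
    return chunk.decode("utf-8").splitlines(), offset + len(chunk)
```

A scheduler such as cron, or a loop that sleeps for the preset interval, would call this function repeatedly and pass the returned offset to the next call.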
Further, in step S11, the preset requirement can be designed flexibly according to actual conditions. In this example, the preset requirement may be one or more of the integrity, consistency, validity, and uniqueness of the data. That is, when judging whether the data to be cleaned meets the preset requirement, it can be judged whether the data meets one or more of a data integrity requirement, a data consistency requirement, a data validity requirement, and a data uniqueness requirement, and data cleaning is carried out when the data does not meet the preset requirement.
For example, when the data to be cleaned does not meet the integrity requirement, the missing information can be completed. For missing values in a time series, the mean of the preceding and following values can be used; if many values are missing, smoothing or similar processing can be used. If the data cannot be completed, it can simply be discarded.
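The neighbor-mean completion described above can be sketched as follows. This is an illustrative reading, assuming missing points are marked as `None` and that "preceding and following values" means the nearest valid value on each side; the function name is hypothetical.

```python
def fill_gaps(series):
    """Fill None entries with the mean of the nearest valid neighbours.

    Returns None when the series holds no valid value at all, signalling
    that it cannot be completed and should be discarded by the caller.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        return None
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        prev = max((k for k in known if k < i), default=None)
        nxt = min((k for k in known if k > i), default=None)
        # At the edges only one neighbour exists, so its value is reused.
        neighbours = [series[k] for k in (prev, nxt) if k is not None]
        filled[i] = sum(neighbours) / len(neighbours)
    return filled
```

For long runs of missing values this simple rule flattens the gap to a single number, which is one reason the text suggests smoothing when much is missing.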
As another example, when the data to be cleaned does not meet the uniqueness requirement, duplicates can be removed according to preset rules or with tools such as SQL or Excel.
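A rule-based deduplication step might look like the sketch below, which keeps the first record seen for each combination of key fields. The choice of key fields is a hypothetical "preset rule"; on a Spark DataFrame a comparable effect is usually obtained with `dropDuplicates` on a subset of columns.

```python
def deduplicate(records, key_fields):
    """Keep the first record for each distinct tuple of key field values."""
    seen = set()
    unique = []
    for rec in records:
        key = tuple(rec.get(field) for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique
```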
As a further example, when the data to be cleaned does not meet the validity requirement, validity can be judged according to preset validity rules and the data cleaned according to the result, for example by deleting invalid data.
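Preset validity rules can be expressed as one predicate per field. The sketch below is only an example of that idea: the `ip` and `status` fields and their rules are invented for illustration and do not come from the source.

```python
import re

# Hypothetical validity rules, one predicate per field.
RULES = {
    # Shape check only; it does not verify that each octet is <= 255.
    "ip": lambda v: isinstance(v, str)
    and re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", v) is not None,
    "status": lambda v: isinstance(v, int) and 100 <= v <= 599,
}


def is_valid(record, rules=RULES):
    """A record is valid when every ruled field is present and passes."""
    return all(
        field in record and check(record[field])
        for field, check in rules.items()
    )


def drop_invalid(records, rules=RULES):
    """Delete records that fail any validity rule."""
    return [rec for rec in records if is_valid(rec, rules)]
```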
It should be understood that, in actual implementation, the preset requirements may be, but are not limited to, those listed above. In addition, data cleaning should take into account why erroneous data arises. Erroneous data produced purely by user misoperation or by the intervention of third-party software can be filtered out directly. Where part of the log data is missing or redundant because of problems on the client itself or network problems, the data can be processed with, for example, integrity rules and consistency rules to keep most of it usable and complete, which substantially improves the integrity and reliability of the data.
Further, the data attributes in step S12 include a KEY that characterizes the meaning of a data field and a Value that characterizes the field's threshold. In one embodiment, as shown in Fig. 3, the calculation of the data attributes of the data to be stored in step S12 can be realized through steps S120 and S121, as follows.
Step S120: call the configuration file that matches the data format of the data to be stored;
Step S121: calculate the data attributes of the data to be stored according to the data attribute calculation rule preset in the configuration file.
Because the data to be cleaned is dirty data pulled from different servers, its variability must be taken into account. With log data, for example, business adjustments may require fields to be added to or removed from some logs, so the log format changes between new and old versions. Moreover, whether users update to the latest client in time cannot be controlled, so the logs collected from clients will be inconsistent, and the processing must accommodate variable log formats. It is therefore necessary to calculate data attributes separately for data in different formats, as in steps S120 and S121, which can effectively improve the practicability and accuracy of the data cleaning process.
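Steps S120 and S121 can be illustrated with a lookup of a per-format configuration followed by a rule-driven calculation. Everything specific here is an assumption: the format name `nginx_access`, the field names, and the reading of "Value" as the observed minimum and maximum of the field are hypothetical choices for the sketch.

```python
# Hypothetical configurations, keyed by data format name.
CONFIGS = {
    "nginx_access": {
        "fields": {
            "status": "HTTP status code",
            "bytes": "response size in bytes",
        },
    },
}


def load_config(data_format, configs=CONFIGS):
    # S120: pick the configuration that matches the data format.
    return configs[data_format]


def compute_attributes(records, config):
    # S121: for each configured field, emit a KEY (field meaning) and a
    # Value (here assumed to be the observed [min, max] of the field).
    attributes = {}
    for field, meaning in config["fields"].items():
        values = [rec[field] for rec in records if field in rec]
        if values:
            attributes[field] = {
                "KEY": meaning,
                "Value": [min(values), max(values)],
            }
    return attributes
```

Because the configuration is looked up by format, a new log version only needs a new configuration entry rather than a change to the calculation code.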
In step S13, the data for which attribute calculation has been completed and the corresponding property file may be saved in HDFS (Hadoop Distributed File System) or in a Hive data warehouse.
Further, in one embodiment, before step S12 is executed, configuration files can be prepared for the different data formats that may be required, so that the data attribute calculation can be carried out directly according to the corresponding configuration file. As shown in Fig. 4, in actual implementation the configuration files can be set up through steps S14 and S15, as follows.
Step S14: determine the preprocessed data formats;
Step S15: determine the corresponding data attributes according to each preprocessed data format, and save a file containing those data attributes as the configuration file associated with that data format.
The preprocessed data formats in steps S14 and S15 can be determined according to actual demand, or the corresponding data formats can be entered based on experience; this embodiment places no limitation on this.
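Steps S14 and S15 amount to saving one configuration file per data format and retrieving it later by format. The sketch below assumes JSON files named after the format in a local directory; the file naming, the JSON layout, and the function names are illustrative assumptions, not details from the source.

```python
import json
import os


def save_config(config_dir, data_format, attributes):
    """S15: store the attributes for one data format as <format>.json."""
    os.makedirs(config_dir, exist_ok=True)
    path = os.path.join(config_dir, data_format + ".json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"format": data_format, "attributes": attributes}, f, indent=2)
    return path


def load_config(config_dir, data_format):
    """Read back the configuration saved for the given data format."""
    path = os.path.join(config_dir, data_format + ".json")
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Naming each file after its format makes the association between format and configuration explicit, so the lookup in the attribute calculation step is a simple path construction.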
Further, Fig. 5 is a schematic block diagram of the data cleaning device 100 based on the Spark framework provided in an embodiment of the present invention. The data cleaning device 100 includes a data acquisition module 110, a data judgment module 120, an attribute computing module 130, a data storage module 140, a demand determining module 150, and a file configuration module 160.
The data acquisition module 110 is used to obtain data to be cleaned. In this embodiment, for a description of the data acquisition module 110, refer to the detailed description of step S10 above; that is, step S10 can be executed by the data acquisition module 110, so it is not described further here.
The data judgment module 120 is used to judge whether the data to be cleaned meets a preset requirement and, if it does not, to clean the data and take the cleaned data as data to be stored. In this embodiment, for a description of the data judgment module 120, refer to the detailed description of step S11 above; that is, step S11 can be executed by the data judgment module 120, so it is not described further here.
The attribute computing module 130 is used to calculate the data attributes of the data to be stored and write the calculated attributes into a property file. In this embodiment, for a description of the attribute computing module 130, refer to the detailed description of step S12 above; that is, step S12 can be executed by the attribute computing module 130, so it is not described further here. Optionally, in this embodiment, the attribute computing module 130 includes a configuration file acquiring unit 131 and an attribute computing unit 132.
The configuration file acquiring unit 131 is used to call the configuration file that matches the data format of the data to be stored. In this embodiment, for a description of the configuration file acquiring unit 131, refer to the detailed description of step S120 above; that is, step S120 can be executed by the configuration file acquiring unit 131, so it is not described further here.
The attribute computing unit 132 is used to calculate the data attributes of the data to be stored according to the data attribute calculation rule preset in the configuration file. In this embodiment, for a description of the attribute computing unit 132, refer to the detailed description of step S121 above; that is, step S121 can be executed by the attribute computing unit 132, so it is not described further here.
The data storage module 140 is used to save the data to be stored and the property file. In this embodiment, for a description of the data storage module 140, refer to the detailed description of step S13 above; that is, step S13 can be executed by the data storage module 140, so it is not described further here.
The demand determining module 150 is used to determine the data formats of the preprocessed data. In this embodiment, for a description of the demand determining module 150, refer to the detailed description of step S14 above; that is, step S14 can be executed by the demand determining module 150, so it is not described further here.
The file configuration module 160 is used to determine the corresponding data attributes for each data format and to save a file containing those data attributes as the configuration file associated with that data format. In this embodiment, for a description of the file configuration module 160, refer to the detailed description of step S15 above; that is, step S15 can be executed by the file configuration module 160, so it is not described further here.
In conclusion, in the data cleaning method and device based on the Spark framework provided in the embodiments of the present invention, data cleaning is implemented on the Spark framework, which can effectively avoid the problem of having to write intermediate data to local disk during data processing and can greatly improve data cleaning efficiency. The invention can also help ensure accuracy and authenticity during data cleaning.
In the description of the present invention, the terms "arranged", "connected", and "connection" shall be understood in a broad sense. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediary, or internal between two elements. A person of ordinary skill in the art can understand the specific meaning of these terms in the present invention according to the specific situation.
In the several embodiments provided herein, it should be understood that the disclosed device and method may also be implemented in other ways. The device and method embodiments described above are only schematic. For example, the flowcharts and block diagrams in the drawings show the architecture, functions, and operations that devices, methods, and computer program products according to multiple embodiments of the present invention may implement. In this regard, each box in a flowchart or block diagram may represent a module, a program segment, or a part of code, which includes one or more executable instructions for implementing the specified logical function.
It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two consecutive boxes may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and any combination of such boxes, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
The foregoing is only a preferred embodiment of the present invention and is not intended to limit it; for those skilled in the art, the invention may be modified and varied in various ways. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims (10)

1. A data cleaning method based on the Spark framework, characterized in that the method includes:
Obtaining data to be cleaned;
Judging whether the data to be cleaned meets a preset requirement and, if it does not, cleaning the data to be cleaned and taking the cleaned data as data to be stored;
Calculating the data attributes of the data to be stored and writing the calculated data attributes into a property file;
Saving the data to be stored and the property file.
2. The data cleaning method based on the Spark framework according to claim 1, characterized in that the step of judging whether the data to be cleaned meets the preset requirement includes:
Judging whether the data to be cleaned meets one or more of a data integrity requirement, a data consistency requirement, a data validity requirement, and a data uniqueness requirement.
3. The data cleaning method based on the Spark framework according to claim 1, characterized in that the step of calculating the data attributes of the data to be stored includes:
Calling the configuration file that matches the data format of the data to be stored;
Calculating the data attributes of the data to be stored according to the data attribute calculation rule preset in the configuration file.
4. The data cleaning method based on the Spark framework according to claim 3, characterized in that, before the step of calculating the data attributes of the data to be stored is executed, the method further includes:
Determining the data formats of the preprocessed data;
Determining the corresponding data attributes for each data format, and saving a file containing the data attributes as the configuration file associated with that data format.
5. The data cleaning method based on the Spark framework according to claim 4, characterized in that the data attributes include a KEY that characterizes the meaning of a data field and a Value that characterizes the field's threshold.
6. The data cleaning method based on the Spark framework according to claim 3, characterized in that the step of obtaining data to be cleaned includes:
Collecting log data from a data source at a preset time interval using a preset data collection tool, and using the log data as the data to be cleaned.
7. A data cleaning device based on the Spark framework, characterized in that the device includes:
A data acquisition module, for obtaining data to be cleaned;
A data judgment module, for judging whether the data to be cleaned meets a preset requirement and, if it does not, cleaning the data to be cleaned and taking the cleaned data as data to be stored;
An attribute computing module, for calculating the data attributes of the data to be stored and writing the calculated data attributes into a property file;
A data storage module, for saving the data to be stored and the property file.
8. The data cleaning device based on the Spark framework according to claim 7, characterized in that the data judgment module is further used to:
Judge whether the data to be cleaned meets one or more of a data integrity requirement, a data consistency requirement, a data validity requirement, and a data uniqueness requirement.
9. The data cleaning device based on the Spark framework according to claim 7, characterized in that the attribute computing module includes:
A configuration file acquiring unit, for calling the configuration file that matches the data format of the data to be stored;
An attribute computing unit, for calculating the data attributes of the data to be stored according to the data attribute calculation rule preset in the configuration file.
10. The data cleaning device based on the Spark framework according to claim 9, characterized in that the device further includes:
A demand determining module, for determining the data formats of the preprocessed data;
A file configuration module, for determining the corresponding data attributes for each data format and saving a file containing the data attributes as the configuration file associated with that data format.
CN201810398800.1A 2018-04-28 2018-04-28 Data cleaning method based on Spark frames and device Pending CN108563789A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810398800.1A CN108563789A (en) 2018-04-28 2018-04-28 Data cleaning method based on Spark frames and device

Publications (1)

Publication Number Publication Date
CN108563789A true CN108563789A (en) 2018-09-21

Family

ID=63537449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810398800.1A Pending CN108563789A (en) 2018-04-28 2018-04-28 Data cleaning method based on Spark frames and device

Country Status (1)

Country Link
CN (1) CN108563789A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109684082A (en) * 2018-12-11 2019-04-26 中科恒运股份有限公司 The data cleaning method and system of rule-based algorithm
CN109753496A (en) * 2018-11-27 2019-05-14 天聚地合(苏州)数据股份有限公司 A kind of data cleaning method for big data
CN110597793A (en) * 2019-07-30 2019-12-20 深圳市华傲数据技术有限公司 Data management method and device, electronic equipment and computer readable storage medium
CN111651509A (en) * 2020-04-30 2020-09-11 中国平安财产保险股份有限公司 Data importing method and device based on Hbase database, electronic device and medium
CN112633206A (en) * 2020-12-28 2021-04-09 上海眼控科技股份有限公司 Dirty data processing method, device, equipment and storage medium
CN113821503A (en) * 2021-09-23 2021-12-21 北京金山云网络技术有限公司 Medical data processing method and device and edge server

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090241520A1 (en) * 2008-03-31 2009-10-01 Woodward Governor Company Diesel Exhaust Soot Sensor System and Method
CN106446277A (en) * 2016-08-21 2017-02-22 宁化宽信科技服务有限公司 Big data storage system
CN106682213A (en) * 2016-12-30 2017-05-17 Tcl集团股份有限公司 Internet-of-things task customizing method and system based on Hadoop platform
CN107229621A (en) * 2016-03-23 2017-10-03 北大方正集团有限公司 The cleaning method and device of variance data
CN107491381A (en) * 2017-07-04 2017-12-19 广西电网有限责任公司电力科学研究院 A kind of equipment condition monitoring quality of data evaluating system
CN107493257A (en) * 2016-06-13 2017-12-19 大唐移动通信设备有限公司 A kind of frame data compression transmission, frame data decompression method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753496A (en) * 2018-11-27 2019-05-14 天聚地合(苏州)数据股份有限公司 A kind of data cleaning method for big data
CN109684082A (en) * 2018-12-11 2019-04-26 中科恒运股份有限公司 The data cleaning method and system of rule-based algorithm
CN110597793A (en) * 2019-07-30 2019-12-20 深圳市华傲数据技术有限公司 Data management method and device, electronic equipment and computer readable storage medium
CN111651509A (en) * 2020-04-30 2020-09-11 中国平安财产保险股份有限公司 Data importing method and device based on Hbase database, electronic device and medium
CN111651509B (en) * 2020-04-30 2024-04-02 中国平安财产保险股份有限公司 Hbase database-based data importing method and device, electronic equipment and medium
CN112633206A (en) * 2020-12-28 2021-04-09 上海眼控科技股份有限公司 Dirty data processing method, device, equipment and storage medium
CN113821503A (en) * 2021-09-23 2021-12-21 北京金山云网络技术有限公司 Medical data processing method and device and edge server

Similar Documents

Publication Publication Date Title
CN108563789A (en) Data cleaning method based on Spark frames and device
CN106850746B (en) The method and device of smooth service upgrading
CN106067080B (en) Configurable workflow capabilities are provided
CN107832406A (en) Duplicate removal storage method, device, equipment and the storage medium of massive logs data
CN111046072A (en) Data query method, system, heterogeneous computing acceleration platform and storage medium
CN108287708A (en) A kind of data processing method, device, server and computer readable storage medium
CN110134516A (en) Finance data processing method, device, equipment and computer readable storage medium
CN108920948A (en) A kind of anti-fraud streaming computing device and method
CN110262889A (en) A kind of link tracing method and device
CN109933585A (en) Data query method and data query system
CN108363741A (en) Big data unified interface method, apparatus, equipment and storage medium
US20190080248A1 (en) System and method for facilitating model-based classification of transactions
CN108564462A (en) Acquisition methods, terminal device and the medium of collage-credit data
CN106919438A (en) Workflow processing method and framework in a kind of virtualized environment
CN110209467A (en) A kind of flexible resource extended method and system based on machine learning
CN107783850A (en) A kind of node tree chooses analytic method, device, server and the system of record
CN110727664A (en) Method and device for executing target operation on public cloud data
CN110222790A (en) Method for identifying ID, device and server
US11082526B2 (en) Optimizing large parameter passing in a service mesh
CN109298899A (en) A kind of file automating application method of juvenile's game configuration and electronic equipment
CN108932241A (en) Daily record data statistical method, device and node
CN112860506B (en) Method, device, system and storage medium for processing monitoring data
JP5475602B2 (en) Asynchronous processing service management system
US11538586B2 (en) Clinical decision support
CN107689969A (en) A kind of determination method and device of cache policy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180921