CN107784070A - Method, apparatus and device for improving data cleansing efficiency - Google Patents

Method, apparatus and device for improving data cleansing efficiency

Info

Publication number
CN107784070A
CN107784070A CN201710834301.8A CN201710834301A CN107784070A CN 107784070 A CN107784070 A CN 107784070A CN 201710834301 A CN201710834301 A CN 201710834301A CN 107784070 A CN107784070 A CN 107784070A
Authority
CN
China
Prior art keywords
inquiry
data
key word
data cleansing
field
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710834301.8A
Other languages
Chinese (zh)
Other versions
CN107784070B (en
Inventor
李治
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201710834301.8A priority Critical patent/CN107784070B/en
Publication of CN107784070A publication Critical patent/CN107784070A/en
Priority to PCT/CN2018/082314 priority patent/WO2019052162A1/en
Application granted granted Critical
Publication of CN107784070B publication Critical patent/CN107784070B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2453: Query optimisation
    • G06F16/24534: Query rewriting; Transformation
    • G06F16/24549: Run-time optimisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

A method for improving data cleansing efficiency includes: receiving a data cleansing request that contains query keywords input by a user; determining the similarity between a query keyword and a query field from the ratio of the number of characters they have in common to the total number of characters of the query field, and determining the query field corresponding to the query keyword according to the similarity; sorting the query keywords according to the search order of the query fields in a preset joint index; and querying and cleansing data according to the sorted query keywords. Because the user can enter query keywords freely and the data lookup then runs quickly and automatically, lookups become more convenient, lookup time is saved, and cleansing efficiency improves.

Description

Method, apparatus and device for improving data cleansing efficiency
Technical Field
The invention belongs to the field of data processing, and more particularly relates to a method, apparatus and device for improving data cleansing efficiency.
Background
In the insurance industry, data must be cleansed at regular intervals, that is, re-examined and verified, in order to delete duplicate information, correct existing errors, and ensure data consistency. For example, insurance payout data is cleansed in the middle and at the end of every month. The dividend dates of different insurance policies differ, and the number of policies is huge. For example, the number of monthly insurance dividend records at Ping An Insurance Company exceeds 300 million, and it grows by as many as several million each month.
When cleansing such a huge volume of data, testers must look up and comparatively analyze the data many times to find anomalies in the stored data. Because the data volume is so large, testers spend a long time looking up data, the lookups are cumbersome, and search efficiency is very low.
Summary of the Invention
In view of this, the embodiments of the present application provide a method, apparatus and device for improving data cleansing efficiency, to solve the problem in the prior art that, because the data volume is too large, testers spend a long time looking up data, the lookups are cumbersome, and efficiency is very low.
A first aspect of the embodiments of the present application provides a method for improving data cleansing efficiency, the method including:
receiving a data cleansing request, the data cleansing request including query keywords input by a user;
determining the similarity between a query keyword and a query field from the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters of the query field, and determining the query field corresponding to the query keyword according to the similarity;
sorting the query keywords according to the search order of the query fields in a preset joint index;
querying and cleansing data according to the sorted query keywords.
With reference to the first aspect, in a first possible implementation of the first aspect, the step of querying and cleansing data includes:
counting the data results found for the same type of business to obtain the proportion of records accounted for by each distinct data result;
comparing each counted proportion with a predetermined proportion threshold, and recording the order numbers of the data results whose proportion is less than the predetermined proportion threshold.
With reference to the first aspect, in a second possible implementation of the first aspect, the step of querying and cleansing data includes:
setting a standard data result corresponding to the input query keyword;
comparing the data results corresponding to the query keyword with the standard data result, and recording the order numbers of the data results that differ.
With reference to the first aspect, in a third possible implementation of the first aspect, the step of querying and cleansing data includes:
obtaining the request time at which the data cleansing request is received;
looking up the predetermined-period data corresponding to the data cleansing request according to a preset correspondence between request times and predetermined-period data;
analyzing the predetermined-period data.
With reference to the first aspect, the first possible implementation of the first aspect, the second possible implementation of the first aspect, or the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the method further includes:
counting the query frequency of the query field corresponding to the query keyword;
adjusting the order of the query fields in the joint index according to the query frequency of the query fields.
A second aspect of the embodiments of the present application provides an apparatus for improving data cleansing efficiency, the apparatus including:
a request receiving unit, configured to receive a data cleansing request, the data cleansing request including query keywords input by a user;
a query field lookup unit, configured to determine the similarity between a query keyword and a query field from the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters of the query field, and to determine the query field corresponding to the query keyword according to the similarity;
a sorting unit, configured to sort the query keywords according to the search order of the query fields in a preset joint index;
a cleaning unit, configured to query and cleanse data according to the sorted query keywords.
With reference to the second aspect, in a first possible implementation of the second aspect, the cleaning unit includes:
a proportion determining subunit, configured to count the data results found for the same type of business to obtain the proportion of records accounted for by each distinct data result;
a first order number recording subunit, configured to compare each counted proportion with a predetermined proportion threshold and record the order numbers of the data results whose proportion is less than the predetermined proportion threshold.
With reference to the second aspect, in a second possible implementation of the second aspect, the cleaning unit includes:
a standard data result setting subunit, configured to set a standard data result corresponding to the input query keyword;
a second order number recording subunit, configured to compare the data results corresponding to the query keyword with the standard data result and record the order numbers of the data results that differ.
A third aspect of the embodiments of the present application provides a device for improving data cleansing efficiency, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method for improving data cleansing efficiency according to any implementation of the first aspect.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method for improving data cleansing efficiency according to any implementation of the first aspect.
Compared with the prior art, the embodiments of the present application have the following beneficial effects: the query keywords contained in the data cleansing request are received and sorted according to the correspondence between query keywords and query fields, so that the required data can be found quickly in the corresponding query fields using the sorted keywords and then cleansed. Because the user can freely enter query keywords for the data to be queried during cleansing, and the data lookup then runs quickly and automatically, lookups become more convenient, lookup time is saved, and cleansing efficiency improves.
Brief Description of the Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for improving data cleansing efficiency provided by an embodiment of the present application;
Fig. 2 is a schematic flowchart of a method for improving data cleansing efficiency provided by an embodiment of the present application;
Fig. 3 is a schematic flowchart of a method for improving data cleansing efficiency provided by an embodiment of the present application;
Fig. 4 is a schematic flowchart of a method for improving data cleansing efficiency provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of an apparatus for improving data cleansing efficiency provided by an embodiment of the present application;
Fig. 6 is a schematic diagram of a device for improving data cleansing efficiency provided by an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details such as particular system structures and techniques are set forth in order to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to those skilled in the art that the present invention may be practiced in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, apparatuses, circuits, and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
The technical solutions of the present invention are illustrated below through specific embodiments.
Fig. 1 shows the flow of the method for improving data cleansing efficiency provided by an embodiment of the present application, detailed as follows:
In step S101, a data cleansing request is received, the data cleansing request including query keywords input by a user.
Specifically, the data cleansing request may be an abnormal-data query request triggered by the user after entering query keywords.
The query keyword may be content contained in a query field. For example, a query field for company names contains multiple specific company names, and the query keyword may be one specific company name in that field.
Data cleansing refers to filtering out data that does not meet requirements, and looking up and recording the filtered results so that it can be confirmed whether to filter the data out, or so that the data can be corrected by the business unit and then extracted again. Data that does not meet requirements mainly includes incomplete data, erroneous data, and duplicate data.
For cleansing very large volumes of data, the data can be divided by predetermined time periods, and within one period a fixed segment of historical data is cleansed. This avoids the sharp increase in cleansing runs that growing data causes when data is cleansed in real time, and avoids cleansing the same data repeatedly.
In step S102, the similarity between a query keyword and a query field is determined from the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters of the query field, and the query field corresponding to the query keyword is determined according to the similarity.
Specifically, to establish the correspondence between query keywords and query fields, the specific names contained in each query field can be obtained, and a correspondence between the query field and all of its specific names established. When the user inputs a query keyword, the keyword is matched against the specific names; when the similarity between the two exceeds a certain value, the query field corresponding to the input keyword can be determined.
The similarity between the query keyword and a specific name under a query field can be determined from the ratio of the number of characters that the specific name and the query keyword have in common to the total number of characters of the specific name. Using a similarity threshold, multiple specific names that are similar to the input query keyword can be found, and the data related to those specific names can then be obtained.
Alternatively, a first similarity determined from the character count of the query keyword may be combined with a second similarity between the meaning of the query keyword and the meaning of the query field to determine the query field corresponding to the query keyword. The meanings of query keywords and query fields may be preset; the meaning of a query keyword, and likewise of a query field, can then be determined from the preset correspondence between keywords, fields, and meanings.
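The character-ratio similarity described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the text does not define exactly how "characters in common" are counted, so the sketch assumes a multiset intersection of characters, and the names `char_overlap_similarity`, `match_query_field`, `FIELD_NAMES`, and the threshold value 0.6 are all hypothetical.

```python
from collections import Counter

def char_overlap_similarity(keyword: str, name: str) -> float:
    """Characters shared by keyword and name, divided by the length of name.

    Assumption: "shared characters" is taken as the multiset intersection.
    """
    if not name:
        return 0.0
    shared = Counter(keyword) & Counter(name)  # per-character minimum counts
    return sum(shared.values()) / len(name)

def match_query_field(keyword, field_names, threshold=0.6):
    """Return (field, name, similarity) for the best match at or above the
    threshold, or None when no specific name is similar enough."""
    best = None
    for field, names in field_names.items():
        for name in names:
            score = char_overlap_similarity(keyword, name)
            if score >= threshold and (best is None or score > best[2]):
                best = (field, name, score)
    return best

# Hypothetical mapping from query fields to the specific names they contain.
FIELD_NAMES = {
    "company_name": ["Ping An Life", "Ping An Bank"],
    "business_type": ["dividend", "claim"],
}

# A misspelled keyword still resolves to the business_type field.
match = match_query_field("dividnd", FIELD_NAMES)
```

Dividing by the length of the stored name rather than the keyword follows the ratio stated in the text; a symmetric measure would be an equally plausible design choice.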
A query field represents a column of data stored in a data storage table. Each column of data in the table corresponds to one query field; for example, the table may include a serial number query field, a name query field, a business type query field, a data result query field, and so on.
In step S103, the query keywords are sorted according to the search order of the query fields in a preset joint index.
The data table to be cleansed is stored in advance using a joint index, which includes the search order of the query fields. The query keywords are reordered according to the order of the query fields in the joint index, so that the target result data can be found more accurately and quickly.
For example, the data table to be cleansed stores six query fields A, B, C, D, E, and F, and the order of the query fields in the index is F, E, D, C, B, A. When the query keywords input by the user correspond to E-F, D-E-F, C-D-E-F, B-C-D-E-F, or A-B-C-D-E-F, the query fields of the data cleansing request can be sorted according to the index order into F-E, F-E-D, F-E-D-C, F-E-D-C-B, or F-E-D-C-B-A, respectively. When the query fields corresponding to the input keywords are missing some fields in the middle of the query field sequence, the missing fields can be filled in automatically as match-all lookups, which further improves the adaptability and flexibility of queries.
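The reordering and gap filling in this example can be sketched as below. This is an illustrative sketch under stated assumptions: the function name `order_conditions`, the `MATCH_ALL` marker, and the dictionary-based input are hypothetical, and the sketch fills only gaps between the first and last matched field, as the text describes.

```python
INDEX_ORDER = ["F", "E", "D", "C", "B", "A"]  # field order from the example
MATCH_ALL = None  # hypothetical marker meaning "no filter on this field"

def order_conditions(conditions, index_order=INDEX_ORDER):
    """Sort {field: keyword} pairs into joint-index order.

    Fields missing between the first and last matched field are filled
    in with MATCH_ALL so the sequence has no holes in the middle.
    """
    positions = {field: i for i, field in enumerate(index_order)}
    unknown = [f for f in conditions if f not in positions]
    if unknown:
        raise KeyError(f"fields not in joint index: {unknown}")
    if not conditions:
        return []
    used = sorted(positions[f] for f in conditions)
    first, last = used[0], used[-1]
    return [(f, conditions.get(f, MATCH_ALL))
            for f in index_order[first:last + 1]]

# Keywords arriving in the order D, E, F are rearranged to F, E, D.
full = order_conditions({"D": "d1", "E": "e1", "F": "f1"})
# With E missing, a match-all placeholder is inserted between F and D.
gapped = order_conditions({"D": "d1", "F": "f1"})
```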
In addition, the query results that a user selects after inputting a given query keyword can be recorded. The full set of retrieved data is then sorted by how many times each result has been selected, and the ordering is refined continuously as more selections accumulate during use, making it easier for users to find the data they need.
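A minimal sketch of such selection-based ranking follows. The class name `SelectionRanker` and its methods are hypothetical, and the sketch assumes selections are counted per (keyword, result) pair, which the text does not specify.

```python
from collections import Counter

class SelectionRanker:
    """Ranks results for a keyword by how often users selected them."""

    def __init__(self):
        self._counts = Counter()  # (keyword, result_id) -> selection count

    def record_selection(self, keyword, result_id):
        self._counts[(keyword, result_id)] += 1

    def rank(self, keyword, result_ids):
        # sorted() is stable, so results with equal counts keep their order.
        return sorted(result_ids,
                      key=lambda r: -self._counts[(keyword, r)])

ranker = SelectionRanker()
ranker.record_selection("dividend", "r3")
ranker.record_selection("dividend", "r3")
ranker.record_selection("dividend", "r2")
ranked = ranker.rank("dividend", ["r1", "r2", "r3"])
```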
In step S104, data is queried and cleansed according to the sorted query keywords.
When querying data with the sorted query keywords, the differences among the returned result data can be analyzed, and result data that differs and accounts for only a small proportion is recorded as abnormal result data. Alternatively, a standard data result can be set for the query keyword, and the queried data results compared with the standard data result to determine which data results are abnormal.
The query keywords contained in the data cleansing request are received and sorted according to the correspondence between query keywords and query fields, so that the required data can be found quickly in the corresponding query fields using the sorted keywords and then cleansed. Because the user can freely enter query keywords for the data to be queried during cleansing, and the data lookup then runs quickly and automatically, lookups become more convenient, lookup time is saved, and cleansing efficiency improves.
Fig. 2 shows the flow of another method for improving data cleansing efficiency provided by an embodiment of the present application, detailed as follows:
In step S201, a data cleansing request is received, the data cleansing request including query keywords input by a user.
In step S202, the similarity between a query keyword and a query field is determined from the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters of the query field, and the query field corresponding to the query keyword is determined according to the similarity.
In step S203, the query keywords are sorted according to the search order of the query fields in a preset joint index.
Steps S201-S203 are essentially the same as steps S101-S103 described for Fig. 1.
In step S204, according to the sorted query keywords, the data results found for the same type of business are counted to obtain the proportion of records accounted for by each distinct data result.
After the query keywords input by the user have been matched to their query fields and sorted, the data table is queried with the sorted keywords to find the corresponding data results. By comparing and analyzing the returned data, the number of data results with each distinct value, and the share of that number in the total number of data results, can be obtained.
For example, suppose the query returns three kinds of data result, X, Y, and Z, with 588, 658, and 54 records respectively. The proportions of X, Y, and Z are then 588/1300, 658/1300, and 54/1300.
In step S205, each counted proportion is compared with a predetermined proportion threshold, and the order numbers of the data results whose proportion is less than the predetermined proportion threshold are recorded.
Because abnormal data results occur with low probability, a proportion threshold can be set, for example 5%. Data results whose proportion is below the preset threshold are recorded as abnormal data results. Comparing proportions in this way allows abnormal data to be identified automatically, which makes the system's search for abnormal result data more intelligent and more efficient.
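Steps S204 and S205 can be sketched with the X, Y, Z figures from the example. This is a sketch under assumptions: records are modeled as hypothetical (order number, data result) pairs, and the function name `find_rare_results` is invented for illustration.

```python
from collections import Counter

def find_rare_results(records, threshold=0.05):
    """Return (ratios, flagged_order_numbers).

    records: iterable of (order_number, data_result) pairs.
    A data result is treated as abnormal when its share of all records
    is below the threshold.
    """
    records = list(records)
    total = len(records)
    if total == 0:
        return {}, []
    counts = Counter(result for _, result in records)
    ratios = {result: n / total for result, n in counts.items()}
    rare = {result for result, ratio in ratios.items() if ratio < threshold}
    flagged = [order_no for order_no, result in records if result in rare]
    return ratios, flagged

# Reproduce the example: 588 X, 658 Y and 54 Z results (1300 in total).
records = ([(f"X{i}", "X") for i in range(588)]
           + [(f"Y{i}", "Y") for i in range(658)]
           + [(f"Z{i}", "Z") for i in range(54)])
ratios, flagged = find_rare_results(records)
```

With a 5% threshold, only Z (54/1300, about 4.2%) falls below the cutoff, so the 54 order numbers carrying a Z result are the ones recorded.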
Fig. 3 shows the flow of yet another method for improving data cleansing efficiency provided by an embodiment of the present invention, detailed as follows:
In step S301, a data cleansing request is received, the data cleansing request including query keywords input by a user.
In step S302, the similarity between a query keyword and a query field is determined from the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters of the query field, and the query field corresponding to the query keyword is determined according to the similarity.
In step S303, the query keywords are sorted according to the search order of the query fields in a preset joint index.
Steps S301-S303 are essentially the same as steps S101-S103 described for Fig. 1.
In step S304, a standard data result corresponding to the input query keyword is set.
According to the sorted query keywords, the data results corresponding to the query keyword are compared with the standard data result, and the order numbers of the data results that differ are recorded.
The standard data result can be set by staff, or the data result with the largest proportion among the results computed by the system can be selected as the standard data result. Comparing the standard data result with the other data results allows abnormal data results to be located quickly. This lookup method is especially helpful for search efficiency when there are several kinds of abnormal data result and many abnormal records.
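The comparison against a standard data result can be sketched as follows. The function names and the (order number, data result) record shape are hypothetical; when no standard is supplied, the sketch falls back to the most frequent result, matching the second option described above.

```python
from collections import Counter

def most_common_result(records):
    """Pick the data result with the largest share as the standard."""
    return Counter(result for _, result in records).most_common(1)[0][0]

def find_deviations(records, standard=None):
    """Return order numbers whose data result differs from the standard.

    records: iterable of (order_number, data_result) pairs.
    standard: value set by staff; if None, the most frequent result is used.
    """
    records = list(records)
    if not records:
        return []
    if standard is None:
        standard = most_common_result(records)
    return [order_no for order_no, result in records if result != standard]

sample = [("P001", 100), ("P002", 100), ("P003", 95),
          ("P004", 100), ("P005", 110)]
auto = find_deviations(sample)                # standard inferred as 100
manual = find_deviations(sample, standard=95)  # standard set by staff
```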
Fig. 4 shows the flow of yet another method for improving data cleansing efficiency provided by an embodiment of the present invention, detailed as follows:
In step S401, a data cleansing request is received, the data cleansing request including query keywords input by a user.
In step S402, the similarity between a query keyword and a query field is determined from the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters of the query field, and the query field corresponding to the query keyword is determined according to the similarity.
In step S403, the query keywords are sorted according to the search order of the query fields in a preset joint index.
Steps S401-S403 are essentially the same as steps S101-S103 described for Fig. 1.
In step S404, the request time at which the data cleansing request is received is obtained.
The request time at which the data cleansing request is received may be the time at which the user triggers the cleansing request. If the user triggers the request and schedules the data cleansing for a predetermined point in time, the request time of the data cleansing request may instead be the time at which the data is actually cleansed.
In step S405, the predetermined-period data corresponding to the data cleansing request is looked up according to a preset correspondence between request times and predetermined-period data.
The correspondence between request times and predetermined-period data can be set according to cleansing requirements. For example, when data cleansing is carried out from the 1st to the 19th of a month, the recorded execution date is the last day of the previous month, and the corresponding data is the data from the 20th to the last day of the previous month. When data cleansing is carried out from the 20th to the last day of a month, the recorded execution date is the 20th of the current month, and the corresponding data is the data from the 1st to the 19th. The correspondence among execution time, recorded execution date, and data can be as shown in the following table:
Execution time | Recorded execution date | Corresponding data
1st to 19th of the month | Last day of the previous month | 20th to the last day of the previous month
20th to the last day of the month | 20th of the current month | 1st to 19th of the current month
Of course, the execution dates and periods above are only one possible way of dividing the data.
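The example split described above (runs on the 1st to 19th versus runs from the 20th onward) can be expressed as a small date function. This is only a sketch of that one dividing scheme; the function name `cleansing_period` and the tuple it returns are hypothetical.

```python
from datetime import date, timedelta

def cleansing_period(request_date: date):
    """Map a request date to (recorded_execution_date, period_start, period_end).

    Runs on the 1st-19th cover the 20th to the last day of the previous month;
    runs from the 20th onward cover the 1st-19th of the current month.
    """
    if request_date.day <= 19:
        # Last day of the previous month: step back one day from the 1st.
        last_of_prev = request_date.replace(day=1) - timedelta(days=1)
        return last_of_prev, last_of_prev.replace(day=20), last_of_prev
    return (request_date.replace(day=20),
            request_date.replace(day=1),
            request_date.replace(day=19))

early = cleansing_period(date(2017, 9, 10))
late = cleansing_period(date(2017, 9, 25))
```

Stepping back from the first of the month avoids hard-coding month lengths, so year boundaries and February are handled by the `datetime` arithmetic.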
In step S406, the predetermined-period data is analyzed.
By establishing a correspondence between the data of a predetermined period and the cleansing time, the user can cleanse and verify the same data repeatedly at different points in time, while avoiding the interference that introducing different data would bring to the cleansing. This helps improve the accuracy of cleansing.
In addition, as an optimized embodiment of the present application, the method may further include a step of optimizing the order of the query fields in the joint index, including:
counting the query frequency of the query field corresponding to the query keyword;
adjusting the order of the query fields in the joint index according to the query frequency of the query fields.
By counting the query keywords that users input, the most frequently entered query keywords can be recorded, and the query fields corresponding to those keywords can be placed toward the front of the index. This allows users to find the required data more quickly and further improves search efficiency.
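The frequency-based adjustment can be sketched as a stable sort of the index order by query count. The function name `reorder_index` and the idea of a flat query log of field names are assumptions made for illustration; how often and when a real index would be rebuilt is outside what the text specifies.

```python
from collections import Counter

def reorder_index(index_order, query_log):
    """Return the index fields sorted by descending query frequency.

    query_log: sequence of field names, one entry per matched keyword.
    Fields with equal frequency keep their current relative order
    because sorted() is stable.
    """
    freq = Counter(field for field in query_log if field in index_order)
    return sorted(index_order, key=lambda field: -freq[field])

current_order = ["F", "E", "D", "C", "B", "A"]
log = ["D", "D", "E", "D", "E", "A"]
new_order = reorder_index(current_order, log)
```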
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Fig. 5 is a structural schematic diagram of the apparatus for improving data cleansing efficiency provided by an embodiment of the present application, detailed as follows:
The apparatus for improving data cleansing efficiency described in the present application includes:
a request receiving unit 501, configured to receive a data cleansing request, the data cleansing request including query keywords input by a user;
a query field lookup unit 502, configured to determine the similarity between a query keyword and a query field from the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters of the query field, and to determine the query field corresponding to the query keyword according to the similarity;
a sorting unit 503, configured to sort the query keywords according to the search order of the query fields in a preset joint index;
a cleaning unit 504, configured to query and cleanse data according to the sorted query keywords.
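One way to picture how the four units cooperate is a small pipeline object that calls them in sequence. This is purely an illustrative sketch: the class `CleansingApparatus`, its attribute names, and the stand-in lambdas are hypothetical and say nothing about how the units are actually implemented.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class CleansingApparatus:
    receive: Callable[[dict], list]          # request receiving unit 501
    match_fields: Callable[[list], dict]     # query field lookup unit 502
    sort_conditions: Callable[[dict], list]  # sorting unit 503
    cleanse: Callable[[list], Any]           # cleaning unit 504

    def handle(self, request: dict) -> Any:
        keywords = self.receive(request)
        conditions = self.match_fields(keywords)
        ordered = self.sort_conditions(conditions)
        return self.cleanse(ordered)

# Trivial stand-ins for each unit, just to show the data flow.
apparatus = CleansingApparatus(
    receive=lambda req: req["keywords"],
    match_fields=lambda kws: {kw[0].upper(): kw for kw in kws},
    sort_conditions=lambda conds: sorted(conds.items(), reverse=True),
    cleanse=lambda ordered: [field for field, _ in ordered],
)
result = apparatus.handle({"keywords": ["east", "france"]})
```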
Preferably, the cleaning unit includes:
a proportion determining subunit, configured to count the data results found for the same type of business to obtain the proportion of records accounted for by each distinct data result;
a first order number recording subunit, configured to compare each counted proportion with a predetermined proportion threshold and record the order numbers of the data results whose proportion is less than the predetermined proportion threshold.
Preferably, the cleaning unit includes:
a standard data result setting subunit, configured to set a standard data result corresponding to the input query keyword;
a second order number recording subunit, configured to compare the data results corresponding to the query keyword with the standard data result and record the order numbers of the data results that differ.
The apparatus for improving data cleansing efficiency shown in Fig. 5 corresponds to the methods for improving data cleansing efficiency described for Figs. 1-4.
Fig. 6 is a schematic diagram of the equipment for improving data cleansing efficiency provided by an embodiment of the present invention. As shown in Fig. 6, the equipment 6 for improving data cleansing efficiency of this embodiment includes a processor 60, a memory 61, and a computer program 62 that is stored in the memory 61 and can run on the processor 60, for example a program for improving data cleansing efficiency. When executing the computer program 62, the processor 60 implements the steps of the above method embodiments for improving data cleansing efficiency, for example steps 101 to 104 shown in Fig. 1. Alternatively, when executing the computer program 62, the processor 60 implements the functions of the modules/units in the above device embodiments, for example the functions of modules 501 to 504 shown in Fig. 5.
By way of example, the computer program 62 may be divided into one or more modules/units, which are stored in the memory 61 and executed by the processor 60 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments describing the execution of the computer program 62 in the equipment 6 for improving data cleansing efficiency. For example, the computer program 62 may be divided into a request receiving unit, a query field lookup unit, a sorting unit, and a cleaning unit, whose specific functions are as follows:
The request receiving unit is configured to receive a data cleansing request, the data cleansing request including a query keyword input by a user;
The query field lookup unit is configured to determine the similarity between the query keyword and a query field according to the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters in the query field, and to determine the query field corresponding to the query keyword according to the similarity;
The sorting unit is configured to sort the query keywords according to the lookup order of the query fields in a preset joint index;
The cleaning unit is configured to query and clean data according to the sorted query keywords.
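The sorting step, and the way its output might feed the query issued by the cleaning unit, can be sketched as follows. The helper names, the example field names, and the parameterized `WHERE` clause are assumptions made for illustration; the original text does not specify a query language.

```python
from typing import Dict, List, Sequence, Tuple


def order_keywords(keyword_to_field: Dict[str, str],
                   index_fields: Sequence[str]) -> List[Tuple[str, str]]:
    """Sort (keyword, field) pairs by the field's position in the joint index."""
    position = {field: i for i, field in enumerate(index_fields)}
    # Keywords whose field is not part of the index are left out of the ordering.
    matched = [(kw, f) for kw, f in keyword_to_field.items() if f in position]
    return sorted(matched, key=lambda pair: position[pair[1]])


def build_where_clause(ordered: List[Tuple[str, str]]) -> Tuple[str, List[str]]:
    """Build a parameterized condition whose fields follow the index order."""
    # Field names come from the index definition; keyword values stay parameters.
    clause = " AND ".join(f"{field} = ?" for _, field in ordered)
    params = [kw for kw, _ in ordered]
    return clause, params
```

Given an index on `("policy_no", "customer_id", "region")` and the keywords `{"east": "region", "P001": "policy_no"}`, the condition is emitted as `policy_no = ? AND region = ?`, matching the field order of the index.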
The equipment 6 for improving data cleansing efficiency may be a computing device such as a desktop computer, a notebook computer, a palmtop computer, or a cloud server. The equipment may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will understand that Fig. 6 is merely an example of the equipment 6 for improving data cleansing efficiency and does not limit it; the equipment may include more or fewer components than shown, combine certain components, or use different components. For example, the equipment for improving data cleansing efficiency may also include input/output devices, network access devices, a bus, and the like.
The processor 60 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 61 may be an internal storage unit of the equipment 6 for improving data cleansing efficiency, such as a hard disk or internal memory of the equipment 6. The memory 61 may also be an external storage device of the equipment 6 for improving data cleansing efficiency, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card fitted to the equipment 6. Further, the memory 61 may include both an internal storage unit and an external storage device of the equipment 6 for improving data cleansing efficiency. The memory 61 is used to store the computer program and the other programs and data required by the equipment for improving data cleansing efficiency. The memory 61 may also be used to temporarily store data that has been output or is about to be output.
It will be clear to those skilled in the art that, for convenience and brevity of description, the division into the functional units and modules above is used only as an example. In practical applications, the above functions may be allocated to different functional units or modules as required; that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit; the integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the protection scope of the present application. For the specific working process of the units and modules in the above system, reference may be made to the corresponding process in the foregoing method embodiments, which is not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not described or recorded in detail in one embodiment, reference may be made to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed device/terminal equipment and method may be implemented in other ways. For example, the device/terminal equipment embodiments described above are merely illustrative; the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated in one processing unit, each unit may exist physically on its own, or two or more units may be integrated in one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the flow of the above method embodiments may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electric carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content included in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electric carrier signals and telecommunication signals.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims (10)

  1. A method for improving data cleansing efficiency, characterized in that the method for improving data cleansing efficiency includes:
    Receiving a data cleansing request, the data cleansing request including a query keyword input by a user;
    Determining the similarity between the query keyword and a query field according to the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters in the query field, and determining the query field corresponding to the query keyword according to the similarity;
    Sorting the query keywords according to the lookup order of the query fields in a preset joint index;
    Querying and cleaning data according to the sorted query keywords.
  2. The method for improving data cleansing efficiency according to claim 1, characterized in that the step of querying and cleaning data includes:
    Compiling statistics on the data results found for business of the same type, obtaining the proportion of entries accounted for by each distinct data result;
    Comparing each computed ratio with a predetermined ratio threshold, and recording the order numbers of the data results whose ratio is less than the predetermined ratio threshold.
  3. The method for improving data cleansing efficiency according to claim 1, characterized in that the step of querying and cleaning data includes:
    Setting the standard data result corresponding to the input query keyword;
    Comparing the data result corresponding to the query keyword with the standard data result, and recording the order numbers of the data results that differ.
  4. The method for improving data cleansing efficiency according to claim 1, characterized in that the step of querying and cleaning data includes:
    Obtaining the request time at which the data cleansing request is received;
    Looking up the predetermined period data corresponding to the data cleansing request according to a preset correspondence between request times and predetermined period data;
    Analyzing the predetermined period data.
  5. The method for improving data cleansing efficiency according to any one of claims 1-4, characterized in that the method further includes:
    Counting the query frequency of the query field corresponding to the query keyword;
    Adjusting the order of the query fields in the joint index according to the query frequency of the query fields.
  6. A device for improving data cleansing efficiency, characterized in that the device for improving data cleansing efficiency includes:
    A request receiving unit, configured to receive a data cleansing request, the data cleansing request including a query keyword input by a user;
    A query field lookup unit, configured to determine the similarity between the query keyword and a query field according to the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters in the query field, and to determine the query field corresponding to the query keyword according to the similarity;
    A sorting unit, configured to sort the query keywords according to the lookup order of the query fields in a preset joint index;
    A cleaning unit, configured to query and clean data according to the sorted query keywords.
  7. The device for improving data cleansing efficiency according to claim 6, characterized in that the cleaning unit includes:
    A ratio determination subunit, configured to compile statistics on the data results found for business of the same type, obtaining the proportion of entries accounted for by each distinct data result;
    A first order number recording subunit, configured to compare each computed ratio with a predetermined ratio threshold and to record the order numbers of the data results whose ratio is less than the predetermined ratio threshold.
  8. The device for improving data cleansing efficiency according to claim 6, characterized in that the cleaning unit includes:
    A standard data result setting subunit, configured to set the standard data result corresponding to the input query keyword;
    A second order number recording subunit, configured to compare the data result corresponding to the query keyword with the standard data result and to record the order numbers of the data results that differ.
  9. Equipment for improving data cleansing efficiency, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method for improving data cleansing efficiency according to any one of claims 1 to 5.
  10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method for improving data cleansing efficiency according to any one of claims 1 to 5.
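As an illustration of claims 4 and 5 only, the sketch below shows one way the request-time lookup and the frequency-based reordering of index fields might be written. The schedule layout, the period labels, and both function names are hypothetical; the claims do not prescribe any particular data structure.

```python
from collections import Counter
from datetime import time
from typing import Iterable, List, Optional, Sequence, Tuple


def period_for_request(request_time: time,
                       schedule: Sequence[Tuple[time, time, str]]) -> Optional[str]:
    """Return the label of the predetermined period whose window
    [start, end) contains the request time, or None if none matches."""
    for start, end, label in schedule:
        if start <= request_time < end:
            return label
    return None


def reorder_index_fields(index_fields: Sequence[str],
                         queried_fields: Iterable[str]) -> List[str]:
    """Place the most frequently queried fields first.
    sorted() is stable, so fields with equal frequency keep their order."""
    freq = Counter(queried_fields)
    return sorted(index_fields, key=lambda field: -freq[field])
```

A field that never appears in the query log has a frequency of zero and therefore moves toward the end of the index order.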
CN201710834301.8A 2017-09-15 2017-09-15 Method, device and equipment for improving data cleaning efficiency Active CN107784070B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201710834301.8A CN107784070B (en) 2017-09-15 2017-09-15 Method, device and equipment for improving data cleaning efficiency
PCT/CN2018/082314 WO2019052162A1 (en) 2017-09-15 2018-04-09 Method, apparatus and device for improving data cleaning efficiency, and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710834301.8A CN107784070B (en) 2017-09-15 2017-09-15 Method, device and equipment for improving data cleaning efficiency

Publications (2)

Publication Number Publication Date
CN107784070A true CN107784070A (en) 2018-03-09
CN107784070B CN107784070B (en) 2020-10-30

Family

ID=61438075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710834301.8A Active CN107784070B (en) 2017-09-15 2017-09-15 Method, device and equipment for improving data cleaning efficiency

Country Status (2)

Country Link
CN (1) CN107784070B (en)
WO (1) WO2019052162A1 (en)

Also Published As

Publication number Publication date
WO2019052162A1 (en) 2019-03-21
CN107784070B (en) 2020-10-30

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant