CN107784070A - Method, apparatus and device for improving data cleansing efficiency - Google Patents
- Publication number
- CN107784070A (CN201710834301.8A)
- Authority
- CN
- China
- Prior art keywords
- inquiry
- data
- key word
- data cleansing
- field
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24549—Run-time optimisation
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Quality & Reliability (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
A method for improving data cleansing efficiency includes: receiving a data cleansing request, the data cleansing request including a query keyword input by a user; determining the similarity between the query keyword and a query field according to the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters of the query field, and determining the query field corresponding to the query keyword according to the similarity; sorting the query keywords according to the search order of the query fields in a preset composite index; and querying and cleansing data according to the sorted query keywords. Because the user can enter query keywords freely and the data search is then carried out quickly and automatically, the method makes searching more convenient, saves search time, and improves cleansing efficiency.
Description
Technical field
The invention belongs to the field of data processing, and more particularly relates to a method, apparatus and device for improving data cleansing efficiency.
Background art
In the insurance industry, data must be cleansed at regular intervals, that is, re-examined and verified, in order to delete duplicate information, correct existing errors, and ensure data consistency. For example, insurance payout data is cleansed in the middle and at the end of every month. The dividend dates of different insurance products differ, and the number of insurance policies is huge. For example, the monthly insurance dividend records of Ping An Insurance Company number more than 300 million, and over time the monthly increase reaches several million.
When such a huge volume of data is cleansed, testers have to search the data repeatedly and compare and analyze it in order to find anomalies in the stored data. Because the data volume is so large, searching the data takes testers a very long time, the search is laborious, and search efficiency is very low.
Summary of the invention
In view of this, the embodiments of the present application provide a method, apparatus and device for improving data cleansing efficiency, to solve the problem in the prior art that, because the data volume is too large, searching the data takes testers a very long time, is cumbersome, and is very inefficient.
A first aspect of the embodiments of the present application provides a method for improving data cleansing efficiency, the method including:
receiving a data cleansing request, the data cleansing request including a query keyword input by a user;
determining the similarity between the query keyword and a query field according to the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters of the query field, and determining the query field corresponding to the query keyword according to the similarity;
sorting the query keywords according to the search order of the query fields in a preset composite index;
querying and cleansing data according to the sorted query keywords.
With reference to the first aspect, in a first possible implementation of the first aspect, the step of querying and cleansing the data includes:
compiling statistics on the retrieved data results of the same type of business, to obtain the proportion of records accounted for by each distinct data result;
comparing each proportion with a predetermined proportion threshold, and recording the record numbers of the data results whose proportion is below the predetermined proportion threshold.
With reference to the first aspect, in a second possible implementation of the first aspect, the step of querying and cleansing the data includes:
setting a standard data result corresponding to the input query keyword;
comparing the data result corresponding to the query keyword with the standard data result, and recording the record numbers of the data results that differ.
With reference to the first aspect, in a third possible implementation of the first aspect, the step of querying and cleansing the data includes:
obtaining the request time at which the data cleansing request is received;
looking up the predetermined-period data corresponding to the data cleansing request according to a preset correspondence between request times and predetermined-period data;
analyzing the predetermined-period data.
With reference to the first aspect, the first possible implementation of the first aspect, the second possible implementation of the first aspect, or the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the method further includes:
counting the query frequency of the query field corresponding to the query keyword;
adjusting the order of the query fields in the composite index according to the query frequency of the query fields.
A second aspect of the embodiments of the present application provides an apparatus for improving data cleansing efficiency, the apparatus including:
a request receiving unit, configured to receive a data cleansing request, the data cleansing request including a query keyword input by a user;
a query field lookup unit, configured to determine the similarity between the query keyword and a query field according to the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters of the query field, and to determine the query field corresponding to the query keyword according to the similarity;
a sorting unit, configured to sort the query keywords according to the search order of the query fields in a preset composite index;
a cleansing unit, configured to query and cleanse data according to the sorted query keywords.
With reference to the second aspect, in a first possible implementation of the second aspect, the cleansing unit includes:
a proportion determination subunit, configured to compile statistics on the retrieved data results of the same type of business, to obtain the proportion of records accounted for by each distinct data result;
a first record number recording subunit, configured to compare each proportion with a predetermined proportion threshold, and to record the record numbers of the data results whose proportion is below the predetermined proportion threshold.
With reference to the second aspect, in a second possible implementation of the second aspect, the cleansing unit includes:
a standard data result setting subunit, configured to set a standard data result corresponding to the input query keyword;
a second record number recording subunit, configured to compare the data result corresponding to the query keyword with the standard data result, and to record the record numbers of the data results that differ.
A third aspect of the embodiments of the present application provides a device for improving data cleansing efficiency, including a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method for improving data cleansing efficiency according to any one of the first aspect.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method for improving data cleansing efficiency according to any one of the first aspect.
Compared with the prior art, the embodiments of the present application have the following beneficial effects: the query keywords included in the data cleansing request are received and sorted according to the correspondence between query keywords and query fields, so that the data needed for cleansing can be found quickly in the corresponding query fields using the sorted keywords. Because the user can freely enter the data to be queried as query keywords during data cleansing, and the data search is then carried out quickly and automatically, searching becomes more convenient, search time is saved, and cleansing efficiency is improved.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present application more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a method for improving data cleansing efficiency provided by an embodiment of the present application;
Fig. 2 is a schematic flowchart of a method for improving data cleansing efficiency provided by an embodiment of the present application;
Fig. 3 is a schematic flowchart of a method for improving data cleansing efficiency provided by an embodiment of the present application;
Fig. 4 is a schematic flowchart of a method for improving data cleansing efficiency provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of an apparatus for improving data cleansing efficiency provided by an embodiment of the present application;
Fig. 6 is a schematic diagram of a device for improving data cleansing efficiency provided by an embodiment of the present application.
Detailed description of the embodiments
In the following description, specific details such as particular system structures and techniques are set forth for purposes of illustration rather than limitation, in order to provide a thorough understanding of the embodiments of the present application. However, it will be clear to those skilled in the art that the present invention can also be practiced in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, apparatuses, circuits and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
The technical solutions of the present invention are illustrated below by means of specific embodiments.
Fig. 1 shows the implementation flow of the method for improving data cleansing efficiency provided by an embodiment of the present application, detailed as follows:
In step S101, a data cleansing request is received, the data cleansing request including a query keyword input by a user.
Specifically, the data cleansing request may be an abnormal-data query request triggered by the user after entering query keywords.
The query keyword may be content contained in a query field. For example, a query field for company names contains multiple specific company names, and the query keyword may be a specific company name within that query field.
Data cleansing refers to filtering out data that does not meet requirements, and looking up and recording the filtered results so that it can be confirmed whether the data should be filtered out, or should be corrected by the business unit and then extracted again. Data that does not meet requirements mainly includes incomplete data, erroneous data, duplicate data, and the like.
When the data volume to be cleansed is very large, the data can be divided according to predetermined time periods, and within one period a fixed segment of historical data is cleansed. This avoids the marked increase in the number of cleansing runs that data growth would cause if data were cleansed in real time, and avoids cleansing the same data repeatedly.
In step S102, the similarity between the query keyword and a query field is determined according to the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters of the query field, and the query field corresponding to the query keyword is determined according to the similarity.
Specifically, for the correspondence between query keywords and query fields, the specific names included under each query field can be obtained, and a correspondence between the query field and all of its specific names established. When the user inputs a query keyword, the query keyword is matched against the specific names; when the similarity between the two exceeds a certain value, the query field corresponding to the input query keyword can be determined.
The similarity between the query keyword and a specific name under a query field can be determined from the ratio of the number of characters that the specific name and the query keyword have in common to the total number of characters of the specific name. A similarity threshold can be used to find multiple specific names that are relatively similar to the input query keyword, and the related data corresponding to those specific names is then obtained.
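The character-overlap similarity described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `char_overlap_similarity` and `match_field`, the multiset way of counting shared characters, the sample field contents, and the 0.6 threshold are all chosen for the example and do not come from the source.

```python
from collections import Counter


def char_overlap_similarity(keyword: str, name: str) -> float:
    """Shared characters (counted as a multiset) divided by the length of `name`."""
    if not name:
        return 0.0
    shared = Counter(keyword) & Counter(name)  # per-character minimum of the counts
    return sum(shared.values()) / len(name)


def match_field(keyword, field_names, threshold=0.6):
    """Return (field, name, similarity) for the best match at or above the
    threshold, or None if no specific name is similar enough.

    `field_names` maps a query field to the specific names stored under it.
    The threshold value is an assumption made for this sketch.
    """
    best = None
    for field, names in field_names.items():
        for name in names:
            score = char_overlap_similarity(keyword, name)
            if score >= threshold and (best is None or score > best[2]):
                best = (field, name, score)
    return best


# Hypothetical field contents, for illustration only.
fields = {
    "company_name": ["Acme Life", "Zenith Re"],
    "business_type": ["dividend"],
}
print(match_field("Acme Lif", fields))
```

Counting shared characters as a multiset is only one reading of "identical characters"; a position-sensitive measure would behave differently for reordered input.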
Alternatively, a first similarity determined from the character count of the query keyword can be combined with a second similarity between the meaning of the query keyword and the meaning of the query field, to determine the query field corresponding to the query keyword. The meanings of query keywords and query fields can be preset; the meaning corresponding to a query keyword, and likewise the meaning of a query field, can be determined from the correspondence between query keywords, query fields and meanings.
The query field denotes a column of data stored in a data storage table. Each column in the data storage table corresponds to one query field; for example, the table may include a serial number query field, a name query field, a business type query field, a data result query field, and so on.
In step S103, the query keywords are sorted according to the search order of the query fields in a preset composite index.
The data table to be cleansed is stored in advance using a composite index. The composite index includes the search order of the query fields. According to the ordering information of the query fields in the composite index, the corresponding query keywords are reordered to match, so that the target result data can be found more accurately and quickly.
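To make the role of field order in a composite index concrete, the following sketch builds a small table with a three-column index and asks the database how it would execute two queries. The source does not name a database; SQLite (via Python's standard `sqlite3` module) is used here purely for illustration, and the table name `payout`, the columns `f`, `e`, `d`, and the index name `idx_fed` are hypothetical.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return the human-readable plan that SQLite reports for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of every plan row holds the textual description.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE payout ("
    "record_no INTEGER PRIMARY KEY, f TEXT, e TEXT, d TEXT, result TEXT)"
)
# Composite index whose column order fixes the search order f, e, d.
conn.execute("CREATE INDEX idx_fed ON payout (f, e, d)")

# Conditions on the leading index columns can be answered through the index.
plan_prefix = query_plan(
    conn, "SELECT * FROM payout WHERE f = ? AND e = ?", ("x", "y")
)
# A condition on a trailing column alone cannot use the index in this setup.
plan_no_prefix = query_plan(conn, "SELECT * FROM payout WHERE d = ?", ("z",))

print(plan_prefix)
print(plan_no_prefix)
```

The exact wording of the plan text differs between SQLite versions, so the sketch only prints it. The point is that the leading columns of a composite index determine which lookups it can serve.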
For example, six query fields A, B, C, D, E and F are stored in the data table to be cleansed, and the order of the query fields in the index is F, E, D, C, B, A. When the query keywords input by the user correspond to E-F, D-E-F, C-D-E-F, B-C-D-E-F or A-B-C-D-E-F, the query fields corresponding to the data cleansing request can be sorted, according to the order of the query fields, as F-E, F-E-D, F-E-D-C, F-E-D-C-B or F-E-D-C-B-A, respectively. When the query fields corresponding to the query keywords input by the user omit some query fields in the middle of the query field order, the missing query fields can be filled in automatically as match-all lookups, which further improves the adaptability and flexibility of the query.
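The reordering and gap filling in this example can be sketched as follows. The function name `order_fields` and the `fill_gaps` flag are assumptions made for illustration; the field letters and the index order F, E, D, C, B, A come from the example above.

```python
INDEX_ORDER = ["F", "E", "D", "C", "B", "A"]  # field order from the example


def order_fields(requested, index_order=INDEX_ORDER, fill_gaps=True):
    """Sort the requested query fields into composite-index order.

    With fill_gaps=True, fields missing between the first and last requested
    field are added, standing for the match-all lookups described in the text.
    """
    rank = {field: pos for pos, field in enumerate(index_order)}
    unknown = [field for field in requested if field not in rank]
    if unknown:
        raise ValueError(f"fields not in the composite index: {unknown}")
    ordered = sorted(set(requested), key=rank.__getitem__)
    if fill_gaps and ordered:
        lo, hi = rank[ordered[0]], rank[ordered[-1]]
        return list(index_order[lo:hi + 1])
    return ordered


print(order_fields(["C", "D", "E", "F"]))
print(order_fields(["D", "F"]))
```

In this sketch a filled-in field is only named in the result; how a match-all condition is expressed for it depends on the storage engine and is left open here.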
In addition, the query results that users select after entering a preset query keyword can be recorded, and all retrieved data can be sorted by the number of times each result has been selected. The ordering is optimized continuously during use according to these selections, making it easier for users to find the data they need.
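A minimal sketch of ordering results by how often users have picked them is shown below. The function name `rank_by_selection` and the idea of keeping the counts in a `Counter` are assumptions for illustration; the source does not say how selections are stored.

```python
from collections import Counter


def rank_by_selection(results, selection_counts):
    """Order results by descending selection count.

    `sorted` is stable, so results with equal counts keep their original order.
    """
    return sorted(results, key=lambda item: -selection_counts.get(item, 0))


# Hypothetical selection history: result "b" was chosen three times, "c" once.
history = Counter({"b": 3, "c": 1})
print(rank_by_selection(["a", "b", "c"], history))
```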
In step S104, data is queried and cleansed according to the sorted query keywords.
When data is queried according to the sorted query keywords, the differences among the returned result data can be analyzed, and result data that differs and accounts for only a small proportion is recorded as abnormal result data. Alternatively, a standard data result can be set for the query keyword, and the queried data results are compared with the standard data result to determine which data results are abnormal.
The query keywords included in the data cleansing request are received and sorted according to the correspondence between query keywords and query fields, so that the data needed for cleansing can be found quickly in the corresponding query fields using the sorted keywords. Because the user can freely enter the data to be queried as query keywords during data cleansing, and the data search is then carried out quickly and automatically, searching becomes more convenient, search time is saved, and cleansing efficiency is improved.
Fig. 2 shows the implementation flow of another method for improving data cleansing efficiency provided by an embodiment of the present application, detailed as follows:
In step S201, a data cleansing request is received, the data cleansing request including a query keyword input by a user.
In step S202, the similarity between the query keyword and a query field is determined according to the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters of the query field, and the query field corresponding to the query keyword is determined according to the similarity.
In step S203, the query keywords are sorted according to the search order of the query fields in a preset composite index.
Steps S201 to S203 are essentially the same as steps S101 to S103 described for Fig. 1.
In step S204, according to the sorted query keywords, statistics are compiled on the retrieved data results of the same type of business, to obtain the proportion of records accounted for by each distinct data result.
After the query keywords input by the user are matched to their query fields, the query keywords are sorted, and the data table is queried using the sorted query keywords to find the corresponding data results. By comparing and analyzing the data, the number of data results for each distinct value can be obtained, along with the ratio of that number to the total number of data results.
For example, the data results obtained by the query are of three kinds, X, Y and Z, with 588, 658 and 54 records respectively; the proportions of the X, Y and Z data results are then 588/1300, 658/1300 and 54/1300, respectively.
In step S205, each proportion is compared with a predetermined proportion threshold, and the record numbers of the data results whose proportion is below the predetermined proportion threshold are recorded.
Because abnormal data results occur with low probability, a proportion threshold can be set, for example 5%. Data results whose proportion is below the preset proportion threshold are recorded as abnormal data results. By comparing proportions in this way, abnormal data can be identified automatically, which makes the system's search for abnormal result data more intelligent and more efficient.
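The proportion check of steps S204 and S205 can be sketched as follows, reusing the counts 588, 658 and 54 and the 5% threshold from the example. The function name `flag_rare_results` and the representation of a row as a `(record_no, result)` pair are assumptions made for this sketch.

```python
from collections import Counter


def flag_rare_results(rows, threshold=0.05):
    """Return (proportion per result value, record numbers flagged as abnormal).

    `rows` is an iterable of (record_no, result) pairs. A result value is
    treated as abnormal when its share of all rows is below `threshold`.
    """
    rows = list(rows)
    counts = Counter(result for _, result in rows)
    total = len(rows)
    ratios = {value: count / total for value, count in counts.items()}
    rare = {value for value, ratio in ratios.items() if ratio < threshold}
    flagged = [record_no for record_no, result in rows if result in rare]
    return ratios, flagged


# Synthetic rows matching the example: 588 X, 658 Y and 54 Z results.
rows = (
    [(i, "X") for i in range(588)]
    + [(588 + i, "Y") for i in range(658)]
    + [(1246 + i, "Z") for i in range(54)]
)
ratios, flagged = flag_rare_results(rows)
print({value: round(ratio, 4) for value, ratio in ratios.items()})
print(len(flagged))
```

With these numbers only Z falls below 5% (54/1300 is about 4.15%), so the 54 Z records are the ones flagged.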
Fig. 3 shows the implementation flow of yet another method for improving data cleansing efficiency provided by an embodiment of the present invention, detailed as follows:
In step S301, a data cleansing request is received, the data cleansing request including a query keyword input by a user.
In step S302, the similarity between the query keyword and a query field is determined according to the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters of the query field, and the query field corresponding to the query keyword is determined according to the similarity.
In step S303, the query keywords are sorted according to the search order of the query fields in a preset composite index.
Steps S301 to S303 are essentially the same as steps S101 to S103 described for Fig. 1.
In step S304, a standard data result corresponding to the input query keyword is set;
according to the sorted query keywords, the data result corresponding to the query keyword is compared with the standard data result, and the record numbers of the data results that differ are recorded.
The standard data result can be set by staff, or the data result with the largest proportion among the data results computed by the system can be selected as the standard data result. Comparing the standard data result with the other data results allows abnormal data results to be located quickly. This lookup approach is especially helpful for search efficiency when there are several kinds of abnormal data results and a large number of them.
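The comparison against a standard data result can be sketched as follows. The function name `flag_deviations` and the `(record_no, result)` row shape are assumptions for illustration. When no standard is supplied, the sketch falls back to the most frequent result, which corresponds to the option of choosing the data result with the largest proportion.

```python
from collections import Counter


def flag_deviations(rows, standard=None):
    """Return (standard result, record numbers whose result differs from it).

    `rows` is an iterable of (record_no, result) pairs. If `standard` is None,
    the most frequent result value is used as the standard.
    """
    rows = list(rows)
    if standard is None:
        if not rows:
            return None, []
        standard = Counter(result for _, result in rows).most_common(1)[0][0]
    return standard, [record_no for record_no, result in rows if result != standard]


sample = [(1, "ok"), (2, "ok"), (3, "bad"), (4, "worse")]
print(flag_deviations(sample))
print(flag_deviations(sample, standard="bad"))
```

Unlike the proportion threshold, this check flags every deviating record in one pass, however many distinct abnormal values there are.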
Fig. 4 shows the implementation flow of yet another method for improving data cleansing efficiency provided by an embodiment of the present invention, detailed as follows:
In step S401, a data cleansing request is received, the data cleansing request including a query keyword input by a user.
In step S402, the similarity between the query keyword and a query field is determined according to the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters of the query field, and the query field corresponding to the query keyword is determined according to the similarity.
In step S403, the query keywords are sorted according to the search order of the query fields in a preset composite index.
Steps S401 to S403 are essentially the same as steps S101 to S103 described for Fig. 1.
In step S404, the request time at which the data cleansing request is received is obtained.
The request time of the data cleansing request may be the time at which the user triggers the cleansing request. If the user triggers the cleansing request and schedules the data cleansing for a predetermined point in time, the request time of the data cleansing request may be the time at which the data is actually cleansed.
In step S405, the predetermined-period data corresponding to the data cleansing request is looked up according to a preset correspondence between request times and predetermined-period data.
The correspondence between request times and predetermined-period data can be set according to cleansing requirements. For example, when data cleansing is carried out from the 1st to the 19th of a month, the recorded execution date is the last day of the previous month, and the corresponding data is the data from the 20th to the last day of the previous month. When data cleansing is carried out from the 20th to the last day of a month, the recorded execution date is the 20th of the current month, and the corresponding data is the data from the 1st to the 19th. The correspondence between the execution time, the recorded execution date and the data can be as shown in the following table:
Execution time | Recorded execution date | Corresponding data
1st to 19th of the month | Last day of the previous month | 20th to last day of the previous month
20th to last day of the month | 20th of the current month | 1st to 19th of the current month
Of course, the execution dates and periods above are only one possible way of dividing the data.
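The date rule in this example can be sketched as a small function. The name `cleansing_period` and the returned tuple layout are assumptions for illustration; the cut-off on the 20th follows the example above, which the text describes as only one possible division.

```python
from datetime import date, timedelta


def cleansing_period(request_day: date):
    """Map a request date to (recorded execution date, period start, period end).

    Requests on the 1st to 19th cover the 20th to month end of the previous
    month; requests on or after the 20th cover the 1st to 19th of this month.
    """
    first_of_month = request_day.replace(day=1)
    if request_day.day < 20:
        prev_month_end = first_of_month - timedelta(days=1)
        return prev_month_end, prev_month_end.replace(day=20), prev_month_end
    return request_day.replace(day=20), first_of_month, request_day.replace(day=19)


print(cleansing_period(date(2017, 9, 10)))
print(cleansing_period(date(2017, 9, 25)))
```

Subtracting one day from the first of the month gives the previous month's last day without any per-month length table, so February and year boundaries are handled by `datetime` itself.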
In step S406, the predetermined-period data is analyzed.
By establishing a correspondence between the data of a predetermined period and the cleansing time, the user can cleanse and verify the same data repeatedly at different points in time, and interference caused by newly introduced, different data is avoided, which helps improve cleansing accuracy.
In addition, as an optimized embodiment of the present application, the method may further include a step of optimizing the order of the query fields in the composite index, including:
counting the query frequency of the query field corresponding to the query keyword;
adjusting the order of the query fields in the composite index according to the query frequency of the query fields.
By compiling statistics on the query keywords input by users, the query keywords entered most often can be recorded, and the query fields corresponding to the most frequent keywords are placed toward the front, so that users can find the data they need more quickly, further improving search efficiency.
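The frequency-based reordering can be sketched as follows. The function name `reorder_index_fields` and the idea of logging one field name per query are assumptions made for illustration; the source states only that more frequently queried fields move toward the front.

```python
from collections import Counter


def reorder_index_fields(index_order, queried_fields_log):
    """Return the index fields sorted by descending query frequency.

    The sort is stable, so fields with equal frequency (including fields that
    were never queried) keep their current relative order.
    """
    freq = Counter(queried_fields_log)
    return sorted(index_order, key=lambda field: -freq[field])


current = ["F", "E", "D", "C", "B", "A"]
log = ["D", "D", "F", "D", "A", "F"]  # hypothetical query history
print(reorder_index_fields(current, log))
```

The sketch computes only the new field order. In a typical relational database, applying a new column order to a composite index means dropping and recreating the index, so how often to apply the adjustment is a separate design decision.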
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
Fig. 5 is a schematic structural diagram of the apparatus for improving data cleansing efficiency provided by an embodiment of the present application, detailed as follows:
The apparatus for improving data cleansing efficiency described in the present application includes:
a request receiving unit 501, configured to receive a data cleansing request, the data cleansing request including a query keyword input by a user;
a query field lookup unit 502, configured to determine the similarity between the query keyword and a query field according to the ratio of the number of characters that the query keyword and the query field have in common to the total number of characters of the query field, and to determine the query field corresponding to the query keyword according to the similarity;
a sorting unit 503, configured to sort the query keywords according to the search order of the query fields in a preset composite index;
a cleansing unit 504, configured to query and cleanse data according to the sorted query keywords.
Preferably, the cleansing unit includes:
a proportion determination subunit, configured to compile statistics on the retrieved data results of the same type of business, to obtain the proportion of records accounted for by each distinct data result;
a first record number recording subunit, configured to compare each proportion with a predetermined proportion threshold, and to record the record numbers of the data results whose proportion is below the predetermined proportion threshold.
Preferably, the cleansing unit includes:
a standard data result setting subunit, configured to set a standard data result corresponding to the input query keyword;
a second record number recording subunit, configured to compare the data result corresponding to the query keyword with the standard data result, and to record the record numbers of the data results that differ.
The apparatus for improving data cleansing efficiency shown in Fig. 5 corresponds to the method for improving data cleansing efficiency described in Figs. 1 to 4.
Fig. 6 is a schematic diagram of the device for improving data cleansing efficiency provided by an embodiment of the present invention. As shown in Fig. 6, the device 6 for improving data cleansing efficiency of this embodiment includes a processor 60, a memory 61, and a computer program 62 stored in the memory 61 and executable on the processor 60, for example a program for improving data cleansing efficiency. When executing the computer program 62, the processor 60 implements the steps in the above embodiments of the method for improving data cleansing efficiency, for example steps S101 to S104 shown in Fig. 1. Alternatively, when executing the computer program 62, the processor 60 implements the functions of the modules/units in the above apparatus embodiments, for example the functions of units 501 to 504 shown in Fig. 5.
By way of example, the computer program 62 may be divided into one or more modules/units, which are stored in the memory 61 and executed by the processor 60 to carry out the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, the instruction segments being used to describe the execution of the computer program 62 in the device 6 for improving data cleansing efficiency. For example, the computer program 62 may be divided into a request receiving unit, a query field lookup unit, a sorting unit and a cleansing unit, whose specific functions are as follows:
A request receiving unit, configured to receive a data cleansing request, the data cleansing request including a query keyword input by a user;
A query field lookup unit, configured to determine the similarity between the query keyword and a query field according to the ratio of the number of identical characters shared by the query keyword and the query field to the total number of characters of the query field, and to determine the query field corresponding to the query keyword according to the similarity;
A sorting unit, configured to sort the query keywords according to the lookup order of the query fields in a preset joint index;
A cleaning unit, configured to query and cleanse data according to the sorted query keywords.
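The lookup and sorting units listed above can be sketched roughly as follows in Python. The sketch assumes that "identical characters" means the multiset of characters shared by the keyword and the field name, and that each keyword is mapped to the field with the highest similarity; the function names and the sample field names are hypothetical and do not come from the patent.

```python
from collections import Counter


def similarity(keyword, field):
    """Shared character count divided by the total character count of the field."""
    if not field:
        return 0.0
    # Multiset intersection: each character counts as often as it occurs in both.
    shared = sum((Counter(keyword) & Counter(field)).values())
    return shared / len(field)


def match_field(keyword, fields):
    """Pick the query field most similar to the keyword (assumed selection rule)."""
    return max(fields, key=lambda field: similarity(keyword, field))


def order_keywords(keywords, index_fields):
    """Sort keywords by the position of their matched field in the joint index.

    index_fields -- field names in the lookup order of the preset joint index
    Returns a list of (keyword, matched_field) pairs in index order.
    """
    matched = [(kw, match_field(kw, index_fields)) for kw in keywords]
    return sorted(matched, key=lambda pair: index_fields.index(pair[1]))
```

Ordering the keywords to follow the field order of the joint index presumably lets the resulting query conditions line up with the leading fields of that index, which is the property the sorting unit relies on. Note that dividing by the field length favors short field names, so a production system would likely need a more robust similarity measure.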
The equipment 6 for improving data cleansing efficiency may be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The equipment may include, but is not limited to, the processor 60 and the memory 61. Those skilled in the art will understand that Fig. 6 is merely an example of the equipment 6 and does not limit it: the equipment may include more or fewer components than shown, combine certain components, or use different components. For example, the equipment may further include input/output devices, network access devices, a bus and so on.
The processor 60 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
The memory 61 may be an internal storage unit of the equipment 6 for improving data cleansing efficiency, such as a hard disk or internal memory of the equipment 6. The memory 61 may also be an external storage device of the equipment 6, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card fitted to the equipment 6. Further, the memory 61 may include both an internal storage unit and an external storage device of the equipment 6. The memory 61 is used to store the computer program and the other programs and data required by the equipment. The memory 61 may also be used to temporarily store data that has been output or is about to be output.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the functional units and modules above is given only as an example. In practical applications, the functions may be assigned to different functional units and modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above. The functional units and modules in the embodiments may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules serve only to distinguish them from one another and do not limit the protection scope of the present application. For the specific working process of the units and modules in the above system, refer to the corresponding process in the foregoing method embodiments; it is not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not detailed or recorded in one embodiment, refer to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Skilled practitioners may use different methods to implement the described functions for each specific application, but such implementations should not be considered to go beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative. The division into modules or units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses or units, and may be electrical, mechanical or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, may each exist physically on their own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, all or part of the processes in the methods of the above embodiments may also be completed by a computer program instructing the relevant hardware. The computer program may be stored in a computer-readable storage medium, and when executed by a processor it implements the steps of the method embodiments described above. The computer program includes computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and so on. The computer-readable medium may include: any entity or apparatus capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content covered by the computer-readable medium may be expanded or reduced as appropriate according to the requirements of legislation and patent practice in a given jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.
The embodiments described above are intended only to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent substitutions for some of the technical features. Such modifications or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.
Claims (10)
- 1. A method for improving data cleansing efficiency, characterized in that the method comprises: receiving a data cleansing request, the data cleansing request including a query keyword input by a user; determining the similarity between the query keyword and a query field according to the ratio of the number of identical characters shared by the query keyword and the query field to the total number of characters of the query field, and determining the query field corresponding to the query keyword according to the similarity; sorting the query keywords according to the lookup order of the query fields in a preset joint index; and querying and cleansing data according to the sorted query keywords.
- 2. The method for improving data cleansing efficiency according to claim 1, characterized in that the step of querying and cleansing data comprises: counting the retrieved data results of the same type of business to obtain the proportion of entries accounted for by each distinct data result; and comparing the computed proportion values with a predetermined proportion threshold, and recording the order numbers of the data results whose proportion value is below the predetermined proportion threshold.
- 3. The method for improving data cleansing efficiency according to claim 1, characterized in that the step of querying and cleansing data comprises: setting the standard data result corresponding to the input query keyword; and comparing the data result corresponding to the query keyword with the standard data result, and recording the order number of each data result that differs.
- 4. The method for improving data cleansing efficiency according to claim 1, characterized in that the step of querying and cleansing data comprises: obtaining the request time at which the data cleansing request is received; looking up the predetermined-period data corresponding to the data cleansing request according to a preset correspondence between request times and predetermined-period data; and analyzing the predetermined-period data.
- 5. The method for improving data cleansing efficiency according to any one of claims 1 to 4, characterized in that the method further comprises: counting the query frequency of the query field corresponding to the query keyword; and adjusting the order of the query fields in the joint index according to the query frequency of the query fields.
- 6. An apparatus for improving data cleansing efficiency, characterized in that the apparatus comprises: a request receiving unit, configured to receive a data cleansing request, the data cleansing request including a query keyword input by a user; a query field lookup unit, configured to determine the similarity between the query keyword and a query field according to the ratio of the number of identical characters shared by the query keyword and the query field to the total number of characters of the query field, and to determine the query field corresponding to the query keyword according to the similarity; a sorting unit, configured to sort the query keywords according to the lookup order of the query fields in a preset joint index; and a cleaning unit, configured to query and cleanse data according to the sorted query keywords.
- 7. The apparatus for improving data cleansing efficiency according to claim 6, characterized in that the cleaning unit comprises: a proportion value determination subunit, configured to count the retrieved data results of the same type of business to obtain the proportion of entries accounted for by each distinct data result; and a first order number recording subunit, configured to compare the computed proportion values with a predetermined proportion threshold and to record the order numbers of the data results whose proportion value is below the predetermined proportion threshold.
- 8. The apparatus for improving data cleansing efficiency according to claim 6, characterized in that the cleaning unit comprises: a standard data result setting subunit, configured to set the standard data result corresponding to the input query keyword; and a second order number recording subunit, configured to compare the data result corresponding to the query keyword with the standard data result and to record the order number of each data result that differs.
- 9. Equipment for improving data cleansing efficiency, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method for improving data cleansing efficiency according to any one of claims 1 to 5.
- 10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method for improving data cleansing efficiency according to any one of claims 1 to 5.
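Claim 5 leaves open how the field order is adjusted. One plausible reading, sketched below in Python, moves more frequently queried fields toward the front of the joint index. The function name `adjust_index_order`, the query log format and the descending-frequency rule are assumptions for illustration only.

```python
from collections import Counter


def adjust_index_order(index_fields, query_log):
    """Reorder joint index fields so frequently queried fields come first.

    index_fields -- current field order of the joint index
    query_log    -- list of field names, one entry per query that used the field
                    (hypothetical format, chosen for this sketch)
    """
    # Count how often each indexed field was queried; other names are ignored.
    freq = Counter(field for field in query_log if field in index_fields)

    # sorted() is stable, so fields with equal frequency keep their current order.
    return sorted(index_fields, key=lambda field: -freq[field])
```

In a real database, changing the field order of a composite index means rebuilding the index, so such an adjustment would presumably be applied periodically rather than after every query.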
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710834301.8A CN107784070B (en) | 2017-09-15 | 2017-09-15 | Method, device and equipment for improving data cleaning efficiency |
PCT/CN2018/082314 WO2019052162A1 (en) | 2017-09-15 | 2018-04-09 | Method, apparatus and device for improving data cleaning efficiency, and readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710834301.8A CN107784070B (en) | 2017-09-15 | 2017-09-15 | Method, device and equipment for improving data cleaning efficiency |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107784070A (en) | 2018-03-09 |
CN107784070B (en) | 2020-10-30 |
Family
ID=61438075
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710834301.8A Active CN107784070B (en) | 2017-09-15 | 2017-09-15 | Method, device and equipment for improving data cleaning efficiency |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN107784070B (en) |
WO (1) | WO2019052162A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108984708A (en) * | 2018-07-06 | 2018-12-11 | 蔚来汽车有限公司 | Dirty data recognition methods and device, data cleaning method and device, controller |
CN109271379A (en) * | 2018-10-11 | 2019-01-25 | 北京奇艺世纪科技有限公司 | A kind of data clearing method and device |
CN109299233A (en) * | 2018-09-19 | 2019-02-01 | 平安科技(深圳)有限公司 | Text data processing method, device, computer equipment and storage medium |
CN109492089A (en) * | 2018-10-18 | 2019-03-19 | 上海连尚网络科技有限公司 | Method and apparatus for output information |
WO2019052162A1 (en) * | 2017-09-15 | 2019-03-21 | 平安科技(深圳)有限公司 | Method, apparatus and device for improving data cleaning efficiency, and readable storage medium |
CN109947770A (en) * | 2018-08-14 | 2019-06-28 | 武汉斗鱼网络科技有限公司 | A kind of data base query method, terminal device and storage medium |
CN113326261A (en) * | 2021-04-29 | 2021-08-31 | 上海淇馥信息技术有限公司 | Data blood relationship extraction method and device and electronic equipment |
CN117171153A (en) * | 2023-09-11 | 2023-12-05 | 北京三维天地科技股份有限公司 | Visual data cleaning method and system supporting custom cleaning flow |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110457704B (en) * | 2019-08-12 | 2022-11-15 | 北京明略软件系统有限公司 | Target field determination method and device, storage medium and electronic device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101650734A (en) * | 2009-08-17 | 2010-02-17 | 金蝶软件(中国)有限公司 | Menu filter method, menu filter device, menu processing system and information processing equipment |
CN103514201A (en) * | 2012-06-27 | 2014-01-15 | 阿里巴巴集团控股有限公司 | Method and device for querying data in non-relational database |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040044679A1 (en) * | 2002-08-30 | 2004-03-04 | Kuo-Chin Chang | System and method for remotely generating reports |
CN102542071B (en) * | 2012-01-17 | 2014-02-26 | 深圳市龙视传媒有限公司 | Distributed data processing system and method |
CN104268216A (en) * | 2014-09-24 | 2015-01-07 | 江苏名通信息科技有限公司 | Data cleaning system based on internet information |
CN107784070B (en) * | 2017-09-15 | 2020-10-30 | 平安科技(深圳)有限公司 | Method, device and equipment for improving data cleaning efficiency |
- 2017-09-15: CN application CN201710834301.8A (CN107784070B), status: Active
- 2018-04-09: WO application PCT/CN2018/082314 (WO2019052162A1), status: Application Filing
Non-Patent Citations (2)
Title |
---|
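Jiao Ang: "An optimized K-MEANS clustering algorithm for data containing outliers", China Master's Theses Full-text Database, Information Science and Technology * |
User MECHESS: "When writing an SQL WHERE clause, does the field order need to match the order in which the index was created for the query to be faster?", Baidu Zhidao * |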
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019052162A1 (en) * | 2017-09-15 | 2019-03-21 | 平安科技(深圳)有限公司 | Method, apparatus and device for improving data cleaning efficiency, and readable storage medium |
CN108984708A (en) * | 2018-07-06 | 2018-12-11 | 蔚来汽车有限公司 | Dirty data recognition methods and device, data cleaning method and device, controller |
CN108984708B (en) * | 2018-07-06 | 2022-02-01 | 蔚来(安徽)控股有限公司 | Dirty data identification method and device, data cleaning method and device, and controller |
CN109947770A (en) * | 2018-08-14 | 2019-06-28 | 武汉斗鱼网络科技有限公司 | A kind of data base query method, terminal device and storage medium |
CN109299233A (en) * | 2018-09-19 | 2019-02-01 | 平安科技(深圳)有限公司 | Text data processing method, device, computer equipment and storage medium |
CN109299233B (en) * | 2018-09-19 | 2024-03-01 | 平安科技(深圳)有限公司 | Text data processing method, device, computer equipment and storage medium |
CN109271379A (en) * | 2018-10-11 | 2019-01-25 | 北京奇艺世纪科技有限公司 | A kind of data clearing method and device |
CN109492089A (en) * | 2018-10-18 | 2019-03-19 | 上海连尚网络科技有限公司 | Method and apparatus for output information |
CN113326261A (en) * | 2021-04-29 | 2021-08-31 | 上海淇馥信息技术有限公司 | Data blood relationship extraction method and device and electronic equipment |
CN113326261B (en) * | 2021-04-29 | 2024-03-08 | 奇富数科(上海)科技有限公司 | Data blood relationship extraction method and device and electronic equipment |
CN117171153A (en) * | 2023-09-11 | 2023-12-05 | 北京三维天地科技股份有限公司 | Visual data cleaning method and system supporting custom cleaning flow |
Also Published As
Publication number | Publication date |
---|---|
WO2019052162A1 (en) | 2019-03-21 |
CN107784070B (en) | 2020-10-30 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107784070A (en) | A kind of method, apparatus and equipment for improving data cleansing efficiency | |
CN107229730A (en) | Data query method and device | |
CN108388675A (en) | Circulation method and terminal device are drawn in a kind of identity | |
CN108304522A (en) | Comparison method, device and the terminal device of difference between a kind of database | |
CN104298736B (en) | Data acquisition system connection method, device and Database Systems | |
US20210233027A1 (en) | Method for conducting statistics on insurance type state information of policy, terminal device and storage medium | |
CN107728878A (en) | Display methods, terminal and the computer-readable recording medium of application icon | |
CN108255909A (en) | Tables of data backup method and server based on oracle database | |
CN107818168A (en) | Topic searching method, device and equipment | |
CN112100219A (en) | Report generation method, device, equipment and medium based on database query processing | |
CN108764633A (en) | A kind of method for allocating tasks, system and terminal device | |
CN110909129B (en) | Abnormal complaint event identification method and device | |
CN108228634A (en) | A kind of data processing method and device | |
CN109189790A (en) | Data managing method, device, computer equipment and storage medium | |
CN108197338A (en) | A kind of browser bookmark generation method, system and terminal device | |
CN107491484A (en) | A kind of data matching method, device and equipment | |
CN107451204A (en) | A kind of data query method, apparatus and equipment | |
CN109558462A (en) | Data statistical approach and device | |
CN107528969A (en) | Management method, managing device and the terminal device of telephone call time | |
CN111695077A (en) | Asset information pushing method, terminal equipment and readable storage medium | |
CN109462635B (en) | Information pushing method, computer readable storage medium and server | |
CN109377391B (en) | Information tracking method, storage medium and server | |
CN109450963A (en) | Information push method and terminal device | |
CN114116799A (en) | Abnormal transaction loop identification method, device, terminal and storage medium | |
CN112948460A (en) | Method and device for screening network flow data and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||