CN110399428A - Data verification method, device and electronic equipment - Google Patents


Info

Publication number
CN110399428A
Authority
CN
China
Prior art keywords
data
field
value
probability
sample data
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910684893.9A
Other languages
Chinese (zh)
Other versions
CN110399428B (en)
Inventor
赵鸿楠
艾国信
周志成
宋超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201910684893.9A
Publication of CN110399428A
Application granted
Publication of CN110399428B
Current legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/273 Asynchronous replication or reconciliation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The present invention relates to a data verification method, device and electronic equipment. The method comprises: obtaining a preset check field, where the probability that an error in the preset check field causes a data error satisfies a preset condition; obtaining, from a second database, data to be verified that has been synchronized from a first database; and verifying the preset check field in the data to be verified. Because only the preset check fields of the data to be verified are checked, this technical solution avoids the long run time of full-field verification. And because the preset check fields are the fields most likely to cause data errors, checking them finds erroneous data faster and in a more targeted way, improving the efficiency of data verification. The solution also further reduces how often the database is accessed.

Description

Data verification method, device and electronic equipment
Technical field
The present invention relates to the field of data processing, and in particular to a data verification method, device and electronic equipment.
Background art
When data is synchronized among multiple databases, data consistency becomes an issue. Although synchronization systems incorporate techniques to guarantee consistency, in practice data consistency still needs to be verified so that data problems can be found early.
In the past, when the number of users was small and the data volume relatively low, a full-field scan of the data could be run on a daily schedule for verification. However, as the user base keeps growing rapidly, the original full-field verification method becomes inefficient and time-consuming, and it increases the access load on the database.
Summary of the invention
To solve, or at least partially solve, the above technical problem, the present invention provides a data verification method, device and electronic equipment.
In a first aspect, the present invention provides a data verification method, comprising:
obtaining a preset check field, where the probability that an error in the preset check field causes a data error satisfies a preset condition;
obtaining, from a second database, data to be verified that has been synchronized from a first database;
verifying the preset check field in the data to be verified.
Optionally, before the preset check field is obtained, the method further comprises:
obtaining the number of sample data items as a first value;
extracting error sample data from the sample data, the error sample data being sample data in which an error occurred, and the number of error sample data items being a second value;
analyzing the error sample data to determine the fields in which errors occurred;
counting the number of error sample data items corresponding to each such field as a third value;
calculating, from the first value, the second value and the third value, the probability that the field causes a sample data error;
judging whether the probability satisfies the preset condition, where satisfying the preset condition includes at least one of the following: the probability is greater than or equal to a preset probability threshold; the probability is among the first preset number of probabilities after the probabilities are sorted in descending order;
setting the field corresponding to a probability that satisfies the preset condition as a preset check field.
Optionally, calculating, from the first value, the second value and the third value, the probability that the field causes a sample data error comprises:
calculating, from the first value and the second value, a first ratio: the proportion of error sample data among all sample data;
calculating, from the second value and the third value, a second ratio: the proportion of the field's error sample data among all error sample data;
calculating, from the first value and the third value, a third ratio: the proportion of the field's error sample data among all sample data;
based on an improved naive Bayes algorithm, calculating the probability from the first ratio, the second ratio and the third ratio as:
[Formula not reproduced in the source text]
where P(a|B) denotes the probability, P(B) denotes the first ratio, P(a) denotes the second ratio, and P(B|a) denotes the third ratio.
Optionally, calculating, from the first value, the second value and the third value, the probability that the field causes a sample data error comprises:
obtaining the verification coefficient corresponding to the field;
calculating, from the first value, the second value, the third value and the verification coefficient, the probability that the field causes a sample data error as:
[Formula not reproduced in the source text]
where h denotes the verification coefficient.
Optionally, the method further comprises:
when the verification result is that the preset check field is erroneous, obtaining the data identifier of the data to be verified;
looking up the original data corresponding to the data identifier in the first database;
correcting the preset check field of the data to be verified according to the original data.
In a second aspect, the present invention provides a data verification device, comprising:
a field obtaining module, configured to obtain a preset check field, where the probability that an error in the preset check field causes a data error satisfies a preset condition;
a data obtaining module, configured to obtain, from a second database, data to be verified that has been synchronized from a first database;
a verification module, configured to verify the preset check field in the data to be verified.
Optionally, the device further comprises:
a quantity obtaining module, configured to obtain, before the preset check field is obtained, the number of sample data items as a first value;
an extraction module, configured to extract error sample data from the sample data, the error sample data being sample data in which an error occurred, and the number of error sample data items being a second value;
an analysis module, configured to analyze the error sample data to determine the fields in which errors occurred;
a statistics module, configured to count the number of error sample data items corresponding to each such field as a third value;
a computing module, configured to calculate, from the first value, the second value and the third value, the probability that the field causes a sample data error;
a judgment module, configured to judge whether the probability satisfies the preset condition, where satisfying the preset condition includes at least one of the following: the probability is greater than or equal to a preset probability threshold; the probability is among the first preset number of probabilities after the probabilities are sorted in descending order;
a setting module, configured to set the field corresponding to a probability that satisfies the preset condition as a preset check field.
Optionally, the computing module is specifically configured to: calculate, from the first value and the second value, a first ratio, the proportion of error sample data among all sample data; calculate, from the second value and the third value, a second ratio, the proportion of the field's error sample data among all error sample data; calculate, from the first value and the third value, a third ratio, the proportion of the field's error sample data among all sample data; and, based on an improved naive Bayes algorithm, calculate the probability from the first ratio, the second ratio and the third ratio as:
[Formula not reproduced in the source text]
where P(a|B) denotes the probability, P(B) denotes the first ratio, P(a) denotes the second ratio, and P(B|a) denotes the third ratio.
Optionally, the computing module is further configured to obtain the verification coefficient corresponding to the field, and to calculate, from the first value, the second value, the third value and the verification coefficient, the probability that the field causes a sample data error as:
[Formula not reproduced in the source text]
where h denotes the verification coefficient.
Optionally, the device further comprises:
an identifier obtaining module, configured to obtain the data identifier of the data to be verified when the verification result is that the preset check field is erroneous;
a lookup module, configured to look up the original data corresponding to the data identifier in the first database;
a correction module, configured to correct the preset check field of the data to be verified according to the original data.
In a third aspect, the present invention provides an electronic device, comprising a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the above method steps when executing the computer program.
Compared with the prior art, the above technical solution provided by the embodiments of the present invention has the following advantages:
Only the preset check fields of the data to be verified are checked, which avoids the long run time of full-field verification. Because the preset check fields are the fields most likely to cause data errors, checking them finds erroneous data faster and in a more targeted way, improving the efficiency of data verification. The solution also further reduces how often the database is accessed.
Description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, show embodiments consistent with the present invention and, together with the specification, serve to explain its principles.
Fig. 1 is a flowchart of a data verification method according to an embodiment of the present invention;
Fig. 2 is a flowchart of a data verification method according to another embodiment of the present invention;
Fig. 3 is a flowchart of a data verification method according to another embodiment of the present invention;
Fig. 4 is a flowchart of a data verification method according to another embodiment of the present invention;
Fig. 5 is a block diagram of a data verification device according to an embodiment of the present invention;
Fig. 6 is a block diagram of a data verification device according to another embodiment of the present invention;
Fig. 7 is a block diagram of a data verification device according to another embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings.
Full-field verification in the prior art takes a long time for very large data volumes. Moreover, because erroneous data makes up only a small share of all data (that is, most data is correct), full-field verification is inefficient and of limited value.
In the technical solution of the embodiments, preset check fields are selected in advance according to the probability that an error in a field causes a data error, and during verification only the preset check fields of the data are checked.
The data verification method provided by an embodiment of the present invention is introduced first.
The method can be applied to any electronic device that needs to verify data, for example a server or a terminal; this is not specifically limited here. For ease of description, it is referred to below simply as the electronic device.
Fig. 1 is a flowchart of a data verification method according to an embodiment of the present invention. As shown in Fig. 1, the method includes the following steps:
Step S11: obtain a preset check field, where the probability that an error in the preset check field causes a data error satisfies a preset condition.
The preset condition may be that the probability is greater than or equal to a preset probability threshold, or that the probability is among the first preset number of probabilities after the probabilities are sorted in descending order.
Step S12: obtain, from the second database, the data to be verified that was synchronized from the first database.
Step S13: verify the preset check field in the data to be verified.
When data is synchronized from the first database to the second database, errors may occur in the synchronized data, so that the data in the second database becomes inconsistent with the data in the first database. The synchronized data in the second database therefore needs to be verified, and the verification can be limited to the preset check fields.
For example, user data contains many fields, such as user ID, mobile phone number, login information, account information, authentication status and history records. Of these, the account information field and the authentication status field are relatively likely to cause user data errors, so the preset check fields can be set to these two fields, and only the account information and authentication status fields of each user record are verified.
The preset check fields can be verified by comparing the corresponding data in the two databases. Optionally, the verification results can also be sent to the terminal of the database maintenance staff, so that they can correct the data according to the results.
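The comparison of corresponding data in the two databases can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: both databases are modelled as in-memory dictionaries keyed by a record ID, and the function name `verify_preset_fields` and the field names are hypothetical. Only the preset check fields are compared, so a difference in any other field (here, `phone`) goes unreported.

```python
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]


def verify_preset_fields(
    primary: Dict[str, Record],
    replica: Dict[str, Record],
    preset_fields: Iterable[str],
) -> List[Tuple[str, str]]:
    """Return (record_id, field) pairs whose preset field differs between the databases.

    Only the preset check fields are compared; all other fields are ignored.
    """
    fields = list(preset_fields)
    mismatches: List[Tuple[str, str]] = []
    for record_id, synced in replica.items():
        source = primary.get(record_id)
        if source is None:
            continue  # records missing upstream are out of scope for this sketch
        for field in fields:
            if synced.get(field) != source.get(field):
                mismatches.append((record_id, field))
    return mismatches


# Hypothetical user records keyed by user ID (first database = primary).
primary = {
    "u1": {"phone": "100", "account": "gold", "auth_status": "verified"},
    "u2": {"phone": "200", "account": "basic", "auth_status": "pending"},
}
# The second database after synchronization: u1.phone and u2.auth_status drifted.
replica = {
    "u1": {"phone": "999", "account": "gold", "auth_status": "verified"},
    "u2": {"phone": "200", "account": "basic", "auth_status": "verified"},
}
print(verify_preset_fields(primary, replica, ["account", "auth_status"]))
# → [('u2', 'auth_status')]
```

Against real databases the two dictionary lookups would become queries; the point of the sketch is that the work grows with the number of preset fields rather than with the number of all fields.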
In this embodiment, only the preset check fields of the data to be verified are checked, which avoids the long run time of full-field verification. Because the preset check fields are the fields most likely to cause data errors, checking them finds erroneous data faster and in a more targeted way, improving the efficiency of data verification. It also further reduces how often the database is accessed.
Fig. 2 is a flowchart of a data verification method according to another embodiment of the present invention. As shown in Fig. 2, before the preset check field is obtained, the method further includes a process of determining the preset check fields by analysis, which can be implemented by the following steps:
Step S21: obtain the number of sample data items as the first value.
The sample data may be historical data in the database.
Step S22: extract error sample data from the sample data, where error sample data are sample data in which an error occurred; the number of error sample data items is the second value.
Step S23: analyze the error sample data to determine the fields in which errors occurred.
Step S24: count the number of error sample data items corresponding to each field as the third value.
Step S25: calculate, from the first value, the second value and the third value, the probability that the field causes a sample data error.
The probability can be calculated with an improved Bayesian algorithm.
Step S26: judge whether the probability satisfies the preset condition; if so, execute step S27; otherwise, end.
The probability satisfies the preset condition if at least one of the following holds: the probability is greater than or equal to the preset probability threshold; the probability is among the first preset number of probabilities after the probabilities are sorted in descending order.
Step S27: set the field corresponding to a probability that satisfies the preset condition as a preset check field.
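Steps S26 and S27 can be sketched as a selection function over per-field probabilities. The function name and the probabilities in the usage example are hypothetical. The source does not say how the two criteria combine when both are configured, so this sketch treats a field as qualifying if it meets any configured criterion by default, and offers `require_all=True` for the stricter reading.

```python
from typing import Callable, Dict, List, Optional


def select_preset_fields(
    probabilities: Dict[str, float],
    threshold: Optional[float] = None,
    top_n: Optional[int] = None,
    require_all: bool = False,
) -> List[str]:
    """Return fields whose error-causing probability meets the preset condition,
    ordered from most to least likely to cause a data error."""
    ranked = sorted(probabilities, key=lambda f: probabilities[f], reverse=True)
    criteria: List[Callable[[str], bool]] = []
    if threshold is not None:
        # Criterion 1: probability >= preset probability threshold.
        criteria.append(lambda f: probabilities[f] >= threshold)
    if top_n is not None:
        # Criterion 2: among the first top_n probabilities in descending order.
        top = set(ranked[:top_n])
        criteria.append(lambda f: f in top)
    if not criteria:
        return []  # no preset condition configured
    combine = all if require_all else any
    return [f for f in ranked if combine(c(f) for c in criteria)]


# Hypothetical probabilities, not values from the source.
probs = {"account": 0.30, "auth_status": 0.25, "phone": 0.05, "history": 0.01}
print(select_preset_fields(probs, threshold=0.2))            # → ['account', 'auth_status']
print(select_preset_fields(probs, threshold=0.2, top_n=3))   # → ['account', 'auth_status', 'phone']
```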
The probability in step S25 is calculated as follows:
Step A1: calculate, from the first value and the second value, the first ratio: the proportion of error sample data among all sample data;
Step A2: calculate, from the second value and the third value, the second ratio: the proportion of the field's error sample data among all error sample data;
Step A3: calculate, from the first value and the third value, the third ratio: the proportion of the field's error sample data among all sample data;
Step A4: based on the improved naive Bayes algorithm, calculate the probability from the first ratio, the second ratio and the third ratio as:
[Formula not reproduced in the source text]
where P(a|B) denotes the probability, P(B) denotes the first ratio, P(a) denotes the second ratio, and P(B|a) denotes the third ratio.
Steps A1, A2 and A3 can be executed in any order, or simultaneously.
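Steps A1 to A3 depend only on the three counts, so they can be sketched directly. The sketch stops at the ratios because the improved naive Bayes formula of step A4 is not reproduced in the source text. The function and type names are hypothetical; the inputs are the counts from the worked example in this description (10,000 samples, 100 error samples, and 20, 30 and 50 error samples for fields a1, a2 and a3).

```python
from typing import Dict, NamedTuple


class FieldRatios(NamedTuple):
    first: float   # A1: error samples / all samples
    second: float  # A2: this field's error samples / all error samples
    third: float   # A3: this field's error samples / all samples


def field_ratios(total: int, errors: int, per_field: Dict[str, int]) -> Dict[str, FieldRatios]:
    """Compute the three ratios of steps A1-A3 for every field that had errors."""
    if total <= 0 or errors <= 0:
        raise ValueError("need at least one sample and one error sample")
    first = errors / total  # identical for every field
    return {
        field: FieldRatios(first=first, second=count / errors, third=count / total)
        for field, count in per_field.items()
    }


ratios = field_ratios(10_000, 100, {"a1": 20, "a2": 30, "a3": 50})
for field, r in ratios.items():
    print(f"{field}: first={r.first:.1%} second={r.second:.0%} third={r.third:.1%}")
# → a1: first=1.0% second=20% third=0.2%
# → a2: first=1.0% second=30% third=0.3%
# → a3: first=1.0% second=50% third=0.5%
```

Because the first ratio does not depend on the field, it is computed once and shared across all fields.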
The probability calculation is illustrated below with a specific example.
For example, there are 10,000 sample data items, of which 100 are error sample data items for which fault reports were received. Analysis shows that the errors occurred in fields a1, a2 and a3, with 20 error sample data items for a1, 30 for a2 and 50 for a3.
The proportion of error sample data among all sample data is P(B) = 1%.
The proportions of each field's error sample data among all error sample data are P(B|a1) = 20%, P(B|a2) = 30% and P(B|a3) = 50%.
The proportions of each field's error sample data among all sample data are P(a1) = 0.2%, P(a2) = 0.3% and P(a3) = 0.5%.
Therefore, based on the improved naive Bayes algorithm [formula not reproduced in the source text], the probabilities that the fields in which errors occurred cause sample data errors are:
[Values not reproduced in the source text]
Further, step S25 may also include: obtaining the verification coefficient corresponding to the field; and calculating, from the first value, the second value, the third value and the verification coefficient, the probability that the field causes a sample data error as:
[Formula not reproduced in the source text]
where h denotes the verification coefficient.
The verification coefficient is set in advance according to how much attention the field receives, and it increases with that attention: the higher the attention, the larger the coefficient. For example, the coefficient is set to 0 for a field that is of no concern, to a number between 0 and 1 for a field of low concern, and to a number greater than 1 for a field of high concern.
The attention paid to a field also depends on the field's importance and its impact on system business. For example, the field reflecting the user's authentication status is important in user data, and an error in it strongly affects system business, so its verification coefficient can be set greater than 1.
For example, in the specific example above, the verification coefficients of fields a1, a2 and a3 are h1 = 0, h2 = 0.8 and h3 = 1.5.
The probabilities that these three fields cause sample data errors are then:
[Values not reproduced in the source text]
It can be seen that field a1 is not an important field: an error in it has very little impact on system business, so it is not verified. When selecting the preset check fields, a2 and a3 can be chosen as the fields to verify.
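The effect of the verification coefficient can be illustrated as follows, but only under an explicit assumption. The source text does not reproduce the weighted formula, so this sketch assumes the coefficient simply multiplies a base probability. The base probabilities below are placeholders, not values from the source; only the coefficients h1 = 0, h2 = 0.8 and h3 = 1.5 come from the example. Under this assumption a coefficient of 0 removes a field from verification regardless of its base probability, which matches the outcome described for a1.

```python
from typing import Dict, List


def weight_by_coefficient(base: Dict[str, float], coeff: Dict[str, float]) -> Dict[str, float]:
    # ASSUMPTION (not from the source): weighted probability = h * base probability.
    # Fields without a configured coefficient keep their base value (h = 1).
    return {field: p * coeff.get(field, 1.0) for field, p in base.items()}


def fields_worth_verifying(weighted: Dict[str, float]) -> List[str]:
    # Drop fields whose weighted probability is zero; order the rest by weight.
    kept = [f for f, p in weighted.items() if p > 0]
    return sorted(kept, key=lambda f: weighted[f], reverse=True)


base = {"a1": 0.4, "a2": 0.3, "a3": 0.3}   # placeholder base probabilities
coeff = {"a1": 0.0, "a2": 0.8, "a3": 1.5}  # h1, h2, h3 from the example
weighted = weight_by_coefficient(base, coeff)
print(fields_worth_verifying(weighted))    # → ['a3', 'a2']
```

Here a1 has the highest placeholder base probability yet is still excluded, which shows how a coefficient of 0 encodes "of no concern" independently of the error statistics.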
In another embodiment of the present invention, to further reduce verification time, a threshold can be set on the volume of data to be verified, for example verifying only 2,000,000 records per day. Step S12 then includes: obtaining a first preset threshold; and selecting from the second database, up to the first preset threshold in quantity, data to be verified that was synchronized from the first database.
Fig. 3 is a flowchart of a data verification method according to another embodiment of the present invention. As shown in Fig. 3, the data verification method includes the following steps:
Step S31: fetch the data to be verified from the second database according to the preset check fields.
Not all data in the database contains the preset check fields, so records that do contain them can be chosen when extracting the data to be verified. Furthermore, a preset check field that has been modified is more likely to be erroneous, so records whose preset check field has been modified can be chosen as the data to be verified.
Step S32: judge whether the volume of data to be verified exceeds the first preset threshold; if so, execute step S33; otherwise, execute step S34.
Step S33: truncate the data currently to be verified.
Step S34: verify the preset check fields of the data to be verified.
Step S35: output the verification result.
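Steps S31 to S33 can be sketched as a filter followed by a cap. The record layout, the `_modified` key (a hypothetical list of fields changed since the last synchronization) and the function name are assumptions for illustration; a production system would more likely push the filter and the limit into the database query than scan records in memory.

```python
from typing import Any, Dict, Iterable, List, Set

Record = Dict[str, Any]


def pick_records_to_verify(
    records: Iterable[Record],
    preset_fields: Set[str],
    limit: int,
    modified_only: bool = False,
) -> List[Record]:
    """Select records that carry a preset check field, capped at `limit` (first preset threshold)."""
    picked: List[Record] = []
    for rec in records:
        if len(picked) >= limit:
            break  # steps S32/S33: truncate once the first preset threshold is reached
        present = preset_fields.intersection(rec)  # preset fields this record has
        if not present:
            continue  # step S31: skip records without any preset check field
        if modified_only and not present.intersection(rec.get("_modified", ())):
            continue  # optionally keep only records whose preset field was modified
        picked.append(rec)
    return picked


records = [
    {"id": 1, "account": "gold"},
    {"id": 2, "phone": "200"},
    {"id": 3, "auth_status": "ok", "_modified": ["auth_status"]},
    {"id": 4, "account": "basic", "_modified": ["phone"]},
]
print([r["id"] for r in pick_records_to_verify(records, {"account", "auth_status"}, limit=2)])
# → [1, 3]
```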
In addition, because there can be multiple preset check fields and checking each field requires one scan over the data, checking 2 fields requires scanning the data twice. The volume actually verified therefore equals the number of data items to be verified multiplied by the number of preset check fields.
In another embodiment of the present invention, a threshold can also be set on the verification volume. Step S13 then includes:
obtaining a second preset threshold;
counting the verification volume: when there is more than one preset check field, the verification volume is incremented by 1 for each preset check field verified;
stopping verification when the verification volume exceeds the second preset threshold.
For example, the second preset threshold is set to 5,000,000, and the verification volume increases by 1 for each preset check field verified. If the verification volume reaches 5,000,000 while some preset check field of some record is being verified, verification is truncated at that point and the remaining checks are skipped.
This embodiment limits the verification volume, which prevents large-scale verification from occupying system resources and affecting the system's processing of other work.
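The verification-volume limit can be sketched as a counter charged once per preset field checked. The names and the in-memory data are hypothetical. The text says verification stops when the volume exceeds the second preset threshold, while the example truncates once the volume reaches it; this sketch follows the example and never performs more than `budget` field checks.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple

Record = Dict[str, Any]


def verify_with_budget(
    primary: Dict[str, Record],
    synced: Iterable[Tuple[str, Record]],
    preset_fields: Sequence[str],
    budget: int,
) -> Tuple[List[Tuple[str, str]], int]:
    """Compare preset fields record by record, stopping after `budget` field checks.

    Returns the mismatching (record_id, field) pairs and the number of checks performed.
    """
    checks = 0
    mismatches: List[Tuple[str, str]] = []
    for record_id, record in synced:
        source = primary.get(record_id, {})  # a missing upstream record compares as empty
        for field in preset_fields:
            if checks >= budget:
                return mismatches, checks  # second preset threshold reached: truncate
            checks += 1  # each preset field checked adds 1 to the verification volume
            if record.get(field) != source.get(field):
                mismatches.append((record_id, field))
    return mismatches, checks


primary = {"u1": {"a": 1, "b": 2}, "u2": {"a": 1, "b": 2}}
synced = [("u1", {"a": 1, "b": 3}), ("u2", {"a": 9, "b": 2})]
print(verify_with_budget(primary, synced, ["a", "b"], budget=3))
# → ([('u1', 'b'), ('u2', 'a')], 3)
```

With two preset fields, each record costs two units of budget, which is the multiplication described for the actual verification volume.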
In another embodiment, erroneous data can be corrected automatically. Fig. 4 is a flowchart of a data verification method according to another embodiment of the present invention. As shown in Fig. 4, the method further includes the following steps:
Step S41: when the verification result is that a preset check field is erroneous, obtain the data identifier of the data to be verified;
Step S42: look up the original data corresponding to the data identifier in the first database;
Step S43: correct the preset check field of the data to be verified according to the original data.
In this embodiment, erroneous data is corrected automatically rather than manually, which improves correction efficiency and reduces manual work.
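Steps S41 to S43 can be sketched by feeding the (record ID, field) pairs reported by a verification pass into a correction step. The in-memory dictionaries stand in for the two databases and the function name is hypothetical; against real databases, step S43 would be an update statement on the second database.

```python
from typing import Any, Dict, Iterable, Tuple

Record = Dict[str, Any]


def correct_mismatches(
    primary: Dict[str, Record],
    replica: Dict[str, Record],
    mismatches: Iterable[Tuple[str, str]],
) -> int:
    """Overwrite mismatched preset fields in `replica` with values from `primary`.

    Returns the number of fields corrected.
    """
    corrected = 0
    for record_id, field in mismatches:  # S41: data identifier of each erroneous record
        original = primary.get(record_id)  # S42: look up the original data by identifier
        if original is None or field not in original or record_id not in replica:
            continue  # nothing trustworthy to copy; leave for manual review
        replica[record_id][field] = original[field]  # S43: correct the preset field
        corrected += 1
    return corrected


primary = {"u2": {"auth_status": "pending"}}
replica = {"u2": {"auth_status": "verified"}}
print(correct_mismatches(primary, replica, [("u2", "auth_status"), ("u9", "account")]))
# → 1
```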
The following are device embodiments of the present disclosure, which can be used to execute the method embodiments of the present disclosure.
Fig. 5 is a block diagram of a data verification device according to an embodiment of the present invention. The device can be implemented as part or all of an electronic device through software, hardware, or a combination of both. As shown in Fig. 5, the data verification device includes:
a field obtaining module 501, configured to obtain a preset check field, where the probability that an error in the preset check field causes a data error satisfies a preset condition;
a data obtaining module 502, configured to obtain, from the second database, the data to be verified that was synchronized from the first database;
a verification module 503, configured to verify the preset check field in the data to be verified.
Fig. 6 is a block diagram of a data verification device according to another embodiment of the present invention. As shown in Fig. 6, the device optionally further includes:
a quantity obtaining module 504, configured to obtain, before the preset check field is obtained, the number of sample data items as the first value;
an extraction module 505, configured to extract error sample data from the sample data, the error sample data being sample data in which an error occurred, and the number of error sample data items being the second value;
an analysis module 506, configured to analyze the error sample data to determine the fields in which errors occurred;
a statistics module 507, configured to count the number of error sample data items corresponding to each such field as the third value;
a computing module 508, configured to calculate, from the first value, the second value and the third value, the probability that the field causes a sample data error;
a judgment module 509, configured to judge whether the probability satisfies the preset condition, where satisfying the preset condition includes at least one of the following: the probability is greater than or equal to the preset probability threshold; the probability is among the first preset number of probabilities after the probabilities are sorted in descending order;
a setting module 510, configured to set the field corresponding to a probability that satisfies the preset condition as a preset check field.
Optionally, the computing module 508 is specifically configured to: calculate, from the first value and the second value, the first ratio, the proportion of error sample data among all sample data; calculate, from the second value and the third value, the second ratio, the proportion of the field's error sample data among all error sample data; calculate, from the first value and the third value, the third ratio, the proportion of the field's error sample data among all sample data; and, based on the improved naive Bayes algorithm, calculate the probability from the first ratio, the second ratio and the third ratio as:
[Formula not reproduced in the source text]
where P(a|B) denotes the probability, P(B) denotes the first ratio, P(a) denotes the second ratio, and P(B|a) denotes the third ratio.
Optionally, the computing module 508 is further configured to obtain the verification coefficient corresponding to the field, and to calculate, from the first value, the second value, the third value and the verification coefficient, the probability that the field causes a sample data error as:
[Formula not reproduced in the source text]
where h denotes the verification coefficient.
Fig. 7 is a block diagram of a data verification device according to another embodiment of the present invention. As shown in Fig. 7, the device further includes:
an identifier obtaining module 511, configured to obtain the data identifier of the data to be verified when the verification result is that the preset check field is erroneous;
a lookup module 512, configured to look up the original data corresponding to the data identifier in the first database;
a correction module 513, configured to correct the preset check field of the data to be verified according to the original data.
An embodiment of the present invention also provides an electronic device. As shown in Fig. 8, the electronic device may include a processor 1501, a communication interface 1502, a memory 1503 and a communication bus 1504, where the processor 1501, the communication interface 1502 and the memory 1503 communicate with one another through the communication bus 1504;
the memory 1503 is configured to store a computer program;
the processor 1501 is configured to implement the steps of the above method embodiments when executing the computer program stored in the memory 1503.
The communication bus of the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus and so on. For ease of illustration, only one thick line is drawn in the figure, which does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include random access memory (RAM), and may also include non-volatile memory (NVM), for example at least one disk storage. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method embodiments.
It should be noted that the device, electronic device and computer-readable storage medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for the relevant details, refer to the corresponding description of the method embodiments.
It should further be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The above are only specific embodiments of the present invention, provided to enable those skilled in the art to understand or implement the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A data verification method, characterized by comprising:
obtaining a preset check field, wherein the probability that an error in the preset check field causes an error in the data satisfies a preset condition;
obtaining data to be verified that has been synchronized from a first database into a second database;
verifying the preset check field in the data to be verified.
2. The method according to claim 1, characterized in that, before obtaining the preset check field, the method further comprises:
obtaining the number of sample data items as a first value;
extracting error sample data from the sample data, the error sample data being sample data in which an error has occurred, and the number of error sample data items being a second value;
analyzing the error sample data to determine the field in which the error occurred;
counting the number of error sample data items corresponding to the field as a third value;
calculating, according to the first value, the second value and the third value, the probability that the field causes an error in the sample data;
determining whether the probability satisfies the preset condition, where the probability satisfying the preset condition includes at least one of the following: the probability is greater than or equal to a preset probability threshold; or, after all probabilities are sorted in descending order, the probability is among the first preset number of probabilities;
setting the field corresponding to a probability that satisfies the preset condition as the preset check field.
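The field-selection procedure of claim 2 can be sketched in Python as follows. This is a minimal, non-authoritative sketch under stated assumptions: each sample is modeled as the set of field names found to be erroneous in it, "at least one of the following" is read as the union of the threshold test and the top-N test, and the scoring function prob_fn is a hypothetical placeholder because claim 2 does not fix the formula. The names field_error_probabilities and select_check_fields are illustrative, not taken from the patent.

```python
from collections import Counter
from typing import Callable, Dict, List, Optional, Set

# Assumption: a sample is represented by the set of field names that were
# found to be erroneous in it; an empty set means the sample is correct.
Sample = Set[str]


def field_error_probabilities(
    samples: List[Sample],
    prob_fn: Callable[[int, int, int], float],  # hypothetical scoring function
) -> Dict[str, float]:
    """Count the first, second and third values and score every field."""
    first = len(samples)                         # first value: all samples
    error_samples = [s for s in samples if s]    # samples in which an error occurred
    second = len(error_samples)                  # second value: error samples
    # Third value per field: number of error samples involving that field.
    third_by_field = Counter(f for s in error_samples for f in s)
    return {f: prob_fn(first, second, third) for f, third in third_by_field.items()}


def select_check_fields(
    probs: Dict[str, float],
    threshold: Optional[float] = None,
    top_n: Optional[int] = None,
) -> List[str]:
    """Keep fields meeting the threshold and/or ranking in the top N (assumed union)."""
    ranked = sorted(probs, key=lambda f: probs[f], reverse=True)
    selected = set()
    if threshold is not None:
        selected.update(f for f in ranked if probs[f] >= threshold)
    if top_n is not None:
        selected.update(ranked[:top_n])
    return [f for f in ranked if f in selected]  # preserve descending order
```

For example, with six samples of which four contain errors (three involving price, two involving title), plugging in the illustrative score third value / second value gives price 0.75 and title 0.5, so a threshold of 0.6 selects only price.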
3. The method according to claim 2, characterized in that calculating, according to the first value, the second value and the third value, the probability that the field causes an error in the sample data comprises:
calculating, according to the first value and the second value, a first ratio of the error sample data to all sample data;
calculating, according to the second value and the third value, a second ratio of the error sample data corresponding to the field to all error sample data;
calculating, according to the first value and the third value, a third ratio of the error sample data corresponding to the field to all sample data;
based on an improved naive Bayes algorithm, calculating the probability from the first ratio, the second ratio and the third ratio as:
where P(a|B) denotes the probability, P(B) denotes the first ratio, P(a) denotes the second ratio, and P(B|a) denotes the third ratio.
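The three ratios of claim 3 follow directly from the three counts. The improved naive Bayes formula itself is not reproduced in this text, so the sketch below computes the ratios exactly as the claim labels them and then, as a clearly marked stand-in, combines them with the textbook Bayes' rule P(a|B) = P(B|a)P(a)/P(B). That combination is an assumption for illustration only and is not the patent's improved formula; with the claim's labels it simplifies to (third value / second value)^2. The function names are hypothetical.

```python
from fractions import Fraction


def claim3_ratios(first: int, second: int, third: int):
    """Return (P(B), P(a), P(B|a)) as labelled in claim 3; first and second must be > 0."""
    p_b = Fraction(second, first)         # first ratio: error samples / all samples
    p_a = Fraction(third, second)         # second ratio: field's error samples / all error samples
    p_b_given_a = Fraction(third, first)  # third ratio: field's error samples / all samples
    return p_b, p_a, p_b_given_a


def bayes_stand_in(first: int, second: int, third: int) -> Fraction:
    """Textbook Bayes' rule P(a|B) = P(B|a) * P(a) / P(B).

    Stand-in only: this is NOT the patent's improved naive Bayes formula.
    """
    p_b, p_a, p_b_given_a = claim3_ratios(first, second, third)
    return p_b_given_a * p_a / p_b
```

Using exact fractions keeps the ratios free of rounding, which makes it easy to check that, for 6 samples, 4 error samples and 3 errors involving the field, the ratios are 2/3, 3/4 and 1/2.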
4. The method according to claim 3, characterized in that calculating, according to the first value, the second value and the third value, the probability that the field causes an error in the sample data further comprises:
obtaining a verification coefficient corresponding to the field;
calculating, according to the first value, the second value, the third value and the verification coefficient, the probability that the field causes an error in the sample data as:
where h denotes the verification coefficient.
5. The method according to claim 1, characterized in that the method further comprises:
when the verification result is that the preset check field is erroneous, obtaining the data identifier of the data to be verified;
looking up the original data corresponding to the data identifier in the first database;
correcting the preset check field of the data to be verified according to the original data.
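Claims 1 and 5 together describe checking only the preset check fields of synchronized data and repairing erroneous ones from the source. A minimal sketch follows, assuming that in-memory dictionaries keyed by data identifier stand in for the first (source) and second (target) databases, and that a field counts as erroneous when it differs from the original record; the claims do not state what the check compares against, so that criterion is an assumption. The function name verify_and_correct is hypothetical.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]


def verify_and_correct(
    source_db: Dict[str, Record],  # hypothetical first database: data ID -> original record
    target_db: Dict[str, Record],  # hypothetical second database: data ID -> synchronized record
    check_fields: Iterable[str],   # preset check fields
) -> List[str]:
    """Compare only the preset check fields and repair mismatches from the source.

    Returns the data identifiers of the records that were corrected.
    """
    fields = list(check_fields)
    corrected = []
    for data_id, record in target_db.items():
        original = source_db.get(data_id)  # look up the original data by its identifier
        if original is None:
            continue                       # no original record to correct against
        # Assumed criterion: a check field is erroneous if it differs from the source.
        bad = [f for f in fields if record.get(f) != original.get(f)]
        for f in bad:
            record[f] = original.get(f)    # overwrite with the original value
        if bad:
            corrected.append(data_id)
    return corrected
```

Only the fields passed in check_fields are compared, so the cost of a pass grows with the number of preset check fields rather than with the full width of each record.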
6. A data verification device, characterized by comprising:
a field obtaining module, configured to obtain a preset check field, wherein the probability that an error in the preset check field causes an error in the data satisfies a preset condition;
a data acquisition module, configured to obtain data to be verified that has been synchronized from a first database into a second database;
a verification module, configured to verify the preset check field in the data to be verified.
7. The device according to claim 6, characterized in that the device further comprises:
a quantity obtaining module, configured to obtain, before the preset check field is obtained, the number of sample data items as a first value;
an extraction module, configured to extract error sample data from the sample data, the error sample data being sample data in which an error has occurred, and the number of error sample data items being a second value;
an analysis module, configured to analyze the error sample data to determine the field in which the error occurred;
a statistics module, configured to count the number of error sample data items corresponding to the field as a third value;
a computing module, configured to calculate, according to the first value, the second value and the third value, the probability that the field causes an error in the sample data;
a judgment module, configured to determine whether the probability satisfies the preset condition, where the probability satisfying the preset condition includes at least one of the following: the probability is greater than or equal to a preset probability threshold; or, after all probabilities are sorted in descending order, the probability is among the first preset number of probabilities;
a setting module, configured to set the field corresponding to a probability that satisfies the preset condition as the preset check field.
8. The device according to claim 7, characterized in that the computing module is specifically configured to: calculate, according to the first value and the second value, a first ratio of the error sample data to all sample data; calculate, according to the second value and the third value, a second ratio of the error sample data corresponding to the field to all error sample data; calculate, according to the first value and the third value, a third ratio of the error sample data corresponding to the field to all sample data; and, based on an improved naive Bayes algorithm, calculate the probability from the first ratio, the second ratio and the third ratio as:
where P(a|B) denotes the probability, P(B) denotes the first ratio, P(a) denotes the second ratio, and P(B|a) denotes the third ratio.
9. The device according to claim 8, characterized in that the computing module is further configured to obtain a verification coefficient corresponding to the field, and to calculate, according to the first value, the second value, the third value and the verification coefficient, the probability that the field causes an error in the sample data as:
where h denotes the verification coefficient.
10. The device according to claim 6, characterized in that the device further comprises:
an identifier obtaining module, configured to obtain the data identifier of the data to be verified when the verification result is that the preset check field is erroneous;
a lookup module, configured to look up the original data corresponding to the data identifier in the first database;
a correction module, configured to correct the preset check field of the data to be verified according to the original data.
11. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another via the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method steps of any one of claims 1 to 5 when executing the computer program.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910684893.9A CN110399428B (en) 2019-07-26 2019-07-26 Data verification method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN110399428A (en) 2019-11-01
CN110399428B (en) 2022-02-11

Family

ID=68326253

Country Status (1)

Country Link
CN (1) CN110399428B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100299314A1 (en) * 2005-03-18 2010-11-25 Arijit Sengupta Identifying and using critical fields in quality management
US20130279516A1 (en) * 2010-12-17 2013-10-24 Zte Corporation Method and Device for Improving Robustness of Context Update Message in Robust Header Compression
CN106802898A (en) * 2015-11-26 2017-06-06 北大方正集团有限公司 Data entry method and device
CN109635300A (en) * 2018-12-14 2019-04-16 泰康保险集团股份有限公司 Data verification method and device

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241328A (en) * 2020-09-10 2021-01-19 长沙市到家悠享网络科技有限公司 Data processing method, device and system
CN112241328B (en) * 2020-09-10 2024-01-23 长沙市到家悠享网络科技有限公司 Data processing method, device and system

Similar Documents

Publication Publication Date Title
TWI603220B (en) Method and device for network verification information
US8117609B2 (en) System and method for optimizing changes of data sets
WO2017113677A1 (en) User behavior data processing method and system
US11372699B1 (en) Method and system for detecting system outages using application event logs
CN106709805B (en) User income data acquisition method and system
CN109299193A (en) Method of data synchronization and relevant device
EP4327210A1 (en) Systems and methods for predicting correct or missing data and data anomalies
US20110282813A1 (en) System and method for using pattern recognition to monitor and maintain status quo
US20190050672A1 (en) INCREMENTAL AUTOMATIC UPDATE OF RANKED NEIGHBOR LISTS BASED ON k-th NEAREST NEIGHBORS
CN110399428A (en) A kind of data verification method, device and electronic equipment
WO2019056501A1 (en) Personalized wifi hotspot pushing method, device, and storage medium
CN106304084B (en) Information processing method and device
CN112181794A (en) Page monitoring method and device, computer equipment and storage medium
CN113535449B (en) Abnormal event restoration processing method and device, computer equipment and storage medium
CN109783721A (en) A kind of intelligence questionnaire method for pushing and system
US10803053B2 (en) Automatic selection of neighbor lists to be incrementally updated
CN106604072B (en) The difference analysis method and device of Web TV data
CN114138813A (en) Attribute configuration method and related device
CN113691548A (en) Data acquisition and classified storage method and system thereof
CN113360172A (en) Application deployment method and device, computer equipment and storage medium
CN106933694A (en) Application error localization method and device
CN111736939A (en) Page self-adaptive adjusting method and device, storage medium and computer equipment
CN111371900B (en) Method and system for monitoring health state of synchronous link
US11928622B2 (en) Systems and methods for failure detection tools in large scale maintenance operations
CN110349025B (en) Method and device for preventing loss of contract assets based on non-cost transaction output

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant