CN109635256A - Method and apparatus for verifying data - Google Patents

Method and apparatus for verifying data

Info

Publication number
CN109635256A
Authority
CN
China
Prior art keywords
data
data set
character string
character
subset
Prior art date
Legal status
Granted
Application number
CN201811562212.3A
Other languages
Chinese (zh)
Other versions
CN109635256B (en)
Inventor
徐飞
Current Assignee
Shanghai Zhangmen Science and Technology Co Ltd
Original Assignee
Shanghai Zhangmen Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Zhangmen Science and Technology Co Ltd
Priority to CN201811562212.3A
Publication of CN109635256A
Application granted
Publication of CN109635256B
Legal status: active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

Embodiments of the present application disclose a method and apparatus for verifying data. One specific implementation of the method includes: acquiring a first data set and a second data set to be verified, where the data in the first data set correspond one-to-one to the data in the second data set; generating a first character string set and a second character string set from the first data set and the second data set, where each character string in the first character string set contains data from the first data set, each character string in the second character string set contains data from the second data set, the character strings in the first character string set correspond one-to-one to those in the second character string set, and the data contained in each pair of corresponding character strings correspond one-to-one and are arranged in the same order; and determining, based on the first character string set and the second character string set, whether the first data set and the second data set are identical, and outputting verification result information indicating whether they are identical. This implementation enables verification of two data sets.

Description

Method and apparatus for verifying data
Technical field
Embodiments of the present application relate to the field of computer technology, and in particular to a method and apparatus for verifying data.
Background
With the rapid development of Internet technology, large amounts of data are generated at every moment, and massive amounts of data are transmitted and stored. Data generation, transmission, and storage are subject to many influences, some of which cause errors in the data. Data verification is therefore a common means of ensuring data integrity and consistency.
In many cases the same data have multiple stored copies. For example, a data set returned by an interface of a server may be synchronized to multiple clients at the same time. As another example, the same data set sometimes needs to be stored in different storage formats or in different data storage facilities. In all of these cases the same data set is stored at different locations, and one must consider how to verify whether the data in the data sets stored at these locations are identical.
Currently, common methods for verifying the consistency of multiple data sets at different storage locations include direct verification and checksums. Direct verification compares the data sets item by item. The checksum approach processes each data set with an algorithm such as a message digest algorithm, and verifies consistency by checking whether the hash values of the data sets are identical.
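The two conventional approaches described above can be contrasted in a minimal Python sketch. It assumes each data set is an in-memory list of strings and uses MD5 from the standard `hashlib` module as the message digest algorithm; the function names and the newline-joined serialization are illustrative assumptions, not taken from the source.

```python
import hashlib
from typing import Sequence


def direct_verify(first: Sequence[str], second: Sequence[str]) -> bool:
    """Direct verification: compare the two data sets item by item."""
    if len(first) != len(second):
        return False
    return all(a == b for a, b in zip(first, second))


def digest_of(data: Sequence[str]) -> str:
    """Checksum: hash a serialization of the whole data set with MD5.

    Joining with a newline is an assumed serialization; it is unambiguous
    only if no data item itself contains a newline.
    """
    return hashlib.md5("\n".join(data).encode("utf-8")).hexdigest()


def checksum_verify(first: Sequence[str], second: Sequence[str]) -> bool:
    """Checksum verification: compare the digests of the two data sets."""
    return digest_of(first) == digest_of(second)


stored_a = ["alice", "bob", "carol"]
stored_b = ["alice", "bob", "carol"]
print(direct_verify(stored_a, stored_b), checksum_verify(stored_a, stored_b))  # True True
```

Direct verification touches every pair of items, while the checksum approach compares a single digest per data set but can only say whether the sets differ, not where.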
Summary of the invention
Embodiments of the present application provide a method and apparatus for verifying data.
In a first aspect, an embodiment of the present application provides a method for verifying data. The method includes: acquiring a first data set and a second data set to be verified, where the data in the first data set correspond one-to-one to the data in the second data set; generating a first character string set and a second character string set from the first data set and the second data set, where each character string in the first character string set contains data from the first data set, each character string in the second character string set contains data from the second data set, the character strings in the first character string set correspond one-to-one to those in the second character string set, and the data contained in each pair of corresponding character strings correspond one-to-one and are arranged in the same order; and determining, based on the first character string set and the second character string set, whether the first data set and the second data set are identical, and outputting verification result information indicating whether they are identical.
In a second aspect, an embodiment of the present application provides an apparatus for verifying data. The apparatus includes: an acquisition unit configured to acquire a first data set and a second data set to be verified, where the data in the first data set correspond one-to-one to the data in the second data set; a generation unit configured to generate a first character string set and a second character string set from the first data set and the second data set, where each character string in the first character string set contains data from the first data set, each character string in the second character string set contains data from the second data set, the character strings in the first character string set correspond one-to-one to those in the second character string set, and the data contained in each pair of corresponding character strings correspond one-to-one and are arranged in the same order; and a determination unit configured to determine, based on the first character string set and the second character string set, whether the first data set and the second data set are identical, and to output verification result information indicating whether they are identical.
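The division of the apparatus into three units can be illustrated with a small Python sketch. The class name `VerificationApparatus`, the injected callables, and the text-valued result are hypothetical choices for illustration; the source does not prescribe any particular interface.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# Hypothetical signatures for the acquisition and generation units.
AcquisitionUnit = Callable[[], Tuple[List[str], List[str]]]
GenerationUnit = Callable[[Sequence[str]], List[str]]


@dataclass
class VerificationApparatus:
    """Sketch of the apparatus: acquisition, generation and determination units."""

    acquire: AcquisitionUnit
    generate: GenerationUnit  # applied to both data sets with the same ordering rules

    def determine(self, first_strings: List[str], second_strings: List[str]) -> bool:
        # Corresponding character strings are compared pairwise.
        return len(first_strings) == len(second_strings) and all(
            a == b for a, b in zip(first_strings, second_strings)
        )

    def run(self) -> str:
        first, second = self.acquire()
        same = self.determine(self.generate(first), self.generate(second))
        # Verification result information; a text message is one possible form.
        return "identical" if same else "different"


apparatus = VerificationApparatus(
    acquire=lambda: (["A", "B2"], ["A", "B2"]),
    generate=lambda data: ["-".join(data)],
)
print(apparatus.run())  # identical
```

Injecting the units as callables keeps the acquisition source (database, client, local file) independent of the string-generation and comparison logic.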
In a third aspect, an embodiment of the present application provides an electronic device including: one or more processors; and a storage device for storing one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the method described in any implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable medium storing a computer program which, when executed by a processor, implements the method described in any implementation of the first aspect.
The method and apparatus for verifying data provided by the embodiments of the present application acquire a first data set and a second data set to be verified, where the data in the first data set correspond one-to-one to the data in the second data set; generate from them a first character string set and a second character string set, where each character string in the first set contains data from the first data set, each character string in the second set contains data from the second data set, the character strings of the two sets correspond one-to-one, and the data contained in each pair of corresponding character strings correspond one-to-one and are arranged in the same order; and determine, based on the two character string sets, whether the first and second data sets are identical, outputting verification result information indicating whether they are identical. The two data sets are thus verified by means of multiple character strings generated by arranging the data of each data set in corresponding orders. On the one hand, this largely avoids having to check the corresponding data of the two data sets one pair at a time, which helps speed up verification. On the other hand, because multiple character strings are generated for each data set, corresponding pairs of character strings can be verified individually, or the character strings of each data set can be verified as a whole, which makes the verification method more flexible.
Brief description of the drawings
Other features, objects, and advantages of the present application will become more apparent from the following detailed description of non-limiting embodiments, read with reference to the accompanying drawings:
Fig. 1 is an exemplary system architecture diagram to which an embodiment of the present application may be applied;
Fig. 2 is a flowchart of one embodiment of the method for verifying data according to the present application;
Fig. 3 is a flowchart of another embodiment of the method for verifying data according to the present application;
Fig. 4 is a schematic diagram of an application scenario of the method for verifying data according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of one embodiment of the apparatus for verifying data according to the present application;
Fig. 6 is a schematic structural diagram of a computer system suitable for implementing the electronic device of an embodiment of the present application.
Detailed description of embodiments
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the related invention, not to limit it. It should also be noted that, for ease of description, only the parts related to the invention are shown in the drawings.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in them may be combined with one another. The present application is described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
Fig. 1 shows an exemplary architecture 100 to which embodiments of the method for verifying data or of the apparatus for verifying data of the present application may be applied.
As shown in Fig. 1, the system architecture 100 may include a server 101 and databases 102 and 103 communicatively connected to the server 101. A database management language may be installed on the server 101 for controlling the databases 102 and 103.
The databases 102 and 103 may be used to store the same data. The way the data are stored in the databases 102 and 103 (for example, storage format or storage order) may differ.
The server 101 may be a server providing various services, for example a data processing server that verifies the data stored in the databases 102 and 103. The data processing server may obtain the corresponding data sets from the databases 102 and 103 respectively, perform a consistency check on the two data sets, and present the verification result to a user.
It should be noted that the databases 102 and 103 may also be installed directly on the server 101. In this case, the server 101 may obtain the corresponding data sets directly from the two local databases and perform the consistency check.
The server 101 may also be a data processing server that verifies the same data received and stored by two different clients. The data processing server may obtain the corresponding data sets from the two clients respectively and perform a consistency check on them. In this case, the databases 102 and 103 may be absent.
It should be noted that the databases 102 and 103 may be installed on two clients respectively. In this case, the server may obtain the corresponding data sets from the databases installed on the two clients and perform the consistency check.
It should be noted that the server 101 may be hardware or software. When the server 101 is hardware, it may be implemented as a distributed server cluster composed of multiple servers or as a single server. When the server 101 is software, it may be implemented as multiple pieces of software or software modules (for example, for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
It should be noted that the above clients may be hardware or software. When a client is hardware, it may be any of various electronic devices, including but not limited to smartphones, tablet computers, e-book readers, laptop computers, and desktop computers. When a client is software, it may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (for example, for providing distributed services) or as a single piece of software or software module. No specific limitation is imposed here.
It should be noted that the method for verifying data provided by the embodiments of the present application is generally executed by the server 101; accordingly, the apparatus for verifying data is generally disposed in the server 101.
It should be understood that the numbers of servers and databases in Fig. 1 are merely illustrative. There may be any number of servers and databases, depending on implementation needs.
With continued reference to Fig. 2, a flow 200 of one embodiment of the method for verifying data according to the present application is shown. The method for verifying data includes the following steps:
Step 201: acquire a first data set and a second data set to be verified.
In this embodiment, the execution body of the method for verifying data (for example, the server 101 shown in Fig. 1) may first acquire the first data set and the second data set to be verified, either locally or from another storage device, through a wired or wireless connection. Here, data may refer to any kind of data a computer can store, such as numbers, text, or character strings.
The data in the first data set correspond one-to-one to the data in the second data set. The correspondence here may refer to a verification correspondence, that is, which item of one data set is checked against which item of the other. It should be understood that when the first data set and the second data set are identical, each item in the first data set is identical to its corresponding item in the second data set. Verifying whether the first data set and the second data set are identical is therefore essentially verifying whether each pair of corresponding data in the two data sets is identical.
In some optional implementations of this embodiment, the data in the first data set and the second data set may be data records obtained from databases. In this case, the first data set and the second data set may be determined as follows:
Step 1: query a first target database according to a preset query condition to obtain a returned first record set, and query a second target database according to the same query condition to obtain a returned second record set.
In this step, the query condition may be set in advance by a technician according to the specific application scenario. For example, to verify the data records of the three days preceding the current time, the query condition may select the records dated within the three days before the current time. The first target database and the second target database may likewise be specified in advance by a technician according to the specific application scenario.
Optionally, the query condition may also restrict the fields to be verified. For each data record, only the fields that need to be verified may be selected, which makes the verification method more flexible and also helps speed up verification.
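Step 1 can be sketched with Python's built-in `sqlite3` module. Two in-memory databases stand in for the first and second target databases; the `orders` table, its columns, the fixed "current time", and the three-day window are hypothetical values that illustrate a preset query condition together with the optional restriction to the fields being verified.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import List, Sequence, Tuple

# Hypothetical schema shared by both target databases.
SCHEMA = "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount TEXT, note TEXT, created TEXT)"


def make_db(rows: Sequence[Tuple[int, str, str, str]]) -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    return conn


def query_records(conn: sqlite3.Connection, fields: Sequence[str], since: str) -> List[tuple]:
    """Apply the same preset query condition to one target database.

    Only the listed fields are selected, so unverified columns are skipped.
    Field names come from a trusted preset list, never from user input.
    ISO dates stored as text compare correctly as strings. ORDER BY id is an
    assumed way to make both record sets come back in the same order.
    """
    sql = f"SELECT {', '.join(fields)} FROM orders WHERE created >= ? ORDER BY id"
    return conn.execute(sql, (since,)).fetchall()


now = datetime(2024, 1, 10)  # hypothetical current time
since = (now - timedelta(days=3)).strftime("%Y-%m-%d")  # "2024-01-07"
rows = [(1, "9.90", "old", "2024-01-01"), (2, "5.00", "new", "2024-01-08")]
first_db, second_db = make_db(rows), make_db(rows)
fields = ["id", "amount"]  # the fields to verify
first_records = query_records(first_db, fields, since)
second_records = query_records(second_db, fields, since)
print(first_records)  # [(2, '5.00')]
```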
Step 2: in response to determining that the number of records in the first record set equals the number of records in the second record set, take the first record set as the first data set and the second record set as the second data set.
If the first data set and the second data set are identical, the number of data items in the first data set must equal the number in the second data set. Therefore, if the first record set and the second record set, queried from the first and second target databases using the same query condition, contain different numbers of records, the two record sets are certainly different, and such cases can be filtered out first.
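A minimal Python sketch of this count check follows. The function name `to_data_sets` and the use of `None` to signal "certainly different" are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def to_data_sets(
    first_records: Sequence[tuple], second_records: Sequence[tuple]
) -> Optional[Tuple[List[tuple], List[tuple]]]:
    """Step 2: accept the record sets as data sets only if their sizes match.

    Returns None when the counts differ: the data sets then cannot be
    identical, and the costlier string-based verification can be skipped.
    """
    if len(first_records) != len(second_records):
        return None
    return list(first_records), list(second_records)


print(to_data_sets([(1, "a")], [(1, "a"), (2, "b")]))  # None
```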
Step 202: generate a first character string set and a second character string set from the first data set and the second data set.
In this step, each character string in the first character string set contains data from the first data set, each character string in the second character string set contains data from the second data set, the character strings in the first character string set correspond one-to-one to those in the second character string set, and the data contained in each pair of corresponding character strings correspond one-to-one and are arranged in the same order.
In this way, verifying whether the first data set and the second data set are identical is converted into verifying whether the first character string set and the second character string set are identical, which in turn amounts to verifying whether each pair of corresponding character strings in the two sets is identical.
It should be understood that verification is meaningful only if the data in each pair of corresponding character strings are arranged in the same order. Otherwise, the two character strings are very likely to differ even when the underlying data are the same, and verification errors are very likely to occur.
That the data in two character strings are arranged in the same order may mean that corresponding data occupy corresponding positions in the two strings. As an example, let the first character string be "A1B1", containing the data "A1" and "B1", and let the second character string be "A2B2", containing the data "A2" and "B2", where "A1" corresponds to "A2" and "B1" corresponds to "B2". In the first character string, "A1" precedes "B1". In the second character string, "A2" (corresponding to "A1") likewise precedes "B2" (corresponding to "B1"). The data in the first and second character strings are therefore arranged in the same order. If a third character string is "B2A2", the data in the first character string and those in the third character string are not arranged in the same order.
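The "A1B1" / "A2B2" / "B2A2" example above can be checked mechanically. This Python sketch assumes the data items of each character string are still available as a list (before concatenation) and that the correspondence is given as a dictionary; both are illustrative assumptions.

```python
from typing import Dict, Sequence


def same_order(first: Sequence[str], second: Sequence[str], corresponds: Dict[str, str]) -> bool:
    """Check that corresponding data sit at corresponding positions.

    `corresponds` maps each item of the first string to its counterpart
    in the second string (for example "A1" -> "A2").
    """
    if len(first) != len(second):
        return False
    return all(corresponds.get(a) == b for a, b in zip(first, second))


mapping = {"A1": "A2", "B1": "B2"}
print(same_order(["A1", "B1"], ["A2", "B2"], mapping))  # True
print(same_order(["A1", "B1"], ["B2", "A2"], mapping))  # False
```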
In this embodiment, the data in the first data set may be concatenated in sequence according to a preset order to obtain a first character string, which is taken as the first character string set; the data in the second data set are concatenated in sequence according to the order corresponding to the preset order to obtain a second character string, which is taken as the second character string set. The order of the data may be specified by a technician.
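A minimal Python sketch of this single-string variant follows. It assumes, for illustration only, that each data set maps an item key to its value, so that the same preset key order can be applied to both data sets even when the two stores return items in different orders.

```python
from typing import Mapping, Sequence


def concatenate(data: Mapping[str, str], order: Sequence[str]) -> str:
    """Concatenate the values of a data set in a preset order.

    `data` maps a hypothetical item key to its value; using the same `order`
    for both data sets puts corresponding items at the same positions.
    """
    return "".join(data[key] for key in order)


order = ["name", "city"]  # preset order chosen by the technician
first = {"name": "alice", "city": "paris"}
second = {"city": "paris", "name": "alice"}  # stored in a different order
first_set, second_set = [concatenate(first, order)], [concatenate(second, order)]
print(first_set, first_set == second_set)  # ['aliceparis'] True
```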
Optionally, the data in the first data set may be concatenated in sequence according to the preset order to obtain a first character string, and the data in the second data set may be concatenated in sequence according to the corresponding order to obtain a second character string. A preset character may then be inserted between every two adjacent data items in the first character string, with the resulting new first character string taken as the first character string set; likewise, the preset character is inserted between every two adjacent data items in the second character string, with the resulting new second character string taken as the second character string set.
The preset character may be a character specified in advance by a technician. Examples of the preset character include, but are not limited to, "-", ",", "*", and "+". The preset character serves as a separator between two adjacently concatenated data items.
As an example, consider the character string "AB2B2C", which contains the data "A", "B2", "B", and "2C". After the preset character "-" is inserted between every two adjacent data items, the new character string is "A-B2-B-2C".
Inserting a preset character between every two adjacently concatenated data items separates the individual data items clearly, preventing the concatenation of two adjacent data items from producing a string that contains some other data item of the data set.
Take the above character string "AB2B2C", containing the data "A", "B2", "B", and "2C", as an example. Without the preset character, concatenating "B" and "2C" yields "B2C", which contains "B2" and is easily confused with the data item "B2" that was originally present, which can easily cause verification errors. With the preset character added, it serves as a separator between data items, reducing verification errors and improving verification accuracy.
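The ambiguity described above can be reproduced in a few lines of Python. The second data set `other` is a hypothetical example chosen so that its plain concatenation collides with that of "A", "B2", "B", "2C". The guard that rejects items containing the separator is an added assumption: the source does not say how such items are handled, and escaping them would be another option.

```python
from typing import Sequence

SEPARATOR = "-"  # preset character; assumed never to occur inside a data item


def join_plain(items: Sequence[str]) -> str:
    return "".join(items)


def join_with_separator(items: Sequence[str], sep: str = SEPARATOR) -> str:
    # Guard the assumption that makes the separator unambiguous.
    if any(sep in item for item in items):
        raise ValueError(f"data item contains the separator {sep!r}")
    return sep.join(items)


stored = ["A", "B2", "B", "2C"]
other = ["A", "B", "2B", "2C"]  # different data, same plain concatenation
print(join_plain(stored), join_plain(other))  # AB2B2C AB2B2C
print(join_with_separator(stored), join_with_separator(other))  # A-B2-B-2C A-B-2B-2C
```

Without the separator the two different data sets produce the same string and would pass verification; with it they are told apart.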
Step 203: determine, based on the first character string set and the second character string set, whether the first data set and the second data set are identical, and output verification result information indicating whether they are identical.
In this embodiment, if the first data set and the second data set are identical, the first character string set and the second character string set are also identical. Whether the first and second data sets are identical can therefore be judged by verifying whether the first and second character string sets are identical.
When the first character string set and the second character string set each contain only one character string, any existing string verification method (such as a cyclic redundancy check or a hash check) may be used to verify the two character string sets and determine whether they are identical, and thereby whether the first and second data sets are identical.
The verification result information may take any form, such as a number, characters, text, an image, a video, or a signal.
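For the single-string case, both checks mentioned above are available in Python's standard library: `zlib.crc32` for a cyclic redundancy check and `hashlib.sha256` for a hash check. The dictionary returned by the hypothetical `result_info` helper is just one possible form of verification result information. Note that matching CRC-32 values or digests indicate equality only with high probability; CRC-32 in particular is designed to detect accidental errors, not to rule out collisions.

```python
import hashlib
import zlib


def crc_check(first: str, second: str) -> bool:
    """Cyclic redundancy check: compare CRC-32 values of the two strings."""
    return zlib.crc32(first.encode("utf-8")) == zlib.crc32(second.encode("utf-8"))


def _sha256(s: str) -> bytes:
    return hashlib.sha256(s.encode("utf-8")).digest()


def hash_check(first: str, second: str) -> bool:
    """Hash check: compare SHA-256 digests of the two strings."""
    return _sha256(first) == _sha256(second)


def result_info(same: bool) -> dict:
    # One possible form of verification result information: a small record.
    return {"identical": same, "message": "data sets identical" if same else "data sets differ"}


first_string, second_string = "A-B2-B-2C", "A-B2-B-2C"
print(result_info(hash_check(first_string, second_string)))
# {'identical': True, 'message': 'data sets identical'}
```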
It should be noted that in this application the two data sets to be verified are named the first data set and the second data set merely for ease of description; those skilled in the art will understand that "first" and "second" here impose no particular limitation on the data sets. Similarly, "first" and "second" in the first and second character string sets, first and second target databases, first and second record sets, first and second character strings, and first and second hash value sets, above or below, impose no particular limitation either.
The method provided by the above embodiment of the present application verifies whether two data sets are identical by means of the character string sets corresponding to the two data sets. Compared with verifying the data of the two data sets one by one, this helps speed up verification. In addition, by ensuring that the data contained in corresponding character strings of the two character string sets are arranged in the same order, it reduces verification errors caused by non-corresponding data, which helps improve verification accuracy.
With further reference to Fig. 3, a flow 300 of another embodiment of the method for verifying data is shown. The flow 300 of the method for verifying data includes the following steps:
Step 301: acquire a first data set and a second data set to be verified.
For the specific implementation of step 301, refer to the description of step 201 in the embodiment corresponding to Fig. 2; it is not repeated here.
Step 302: split the first data set and the second data set into at least two data subsets each.
In this step, the data subsets of the first data set correspond one-to-one to the data subsets of the second data set, and the data contained in each pair of corresponding data subsets correspond one-to-one. In other words, the first and second data sets are each split into at least two data subsets using corresponding splitting schemes, ensuring that corresponding data fall into corresponding data subsets. The specific splitting scheme may be preset by a technician or determined according to actual application needs.
Optionally, when the first data set consists of data in a first target database and the second data set consists of data in a second target database, the data in each data subset of the first data set belong to the same record in the first target database, and the data in each data subset of the second data set belong to the same record in the second target database.
In other words, the first and second data sets are split according to the records to which the data belong. When the data in the first and second data sets are data records obtained from databases, each data record in a database may be taken as one data subset.
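Splitting by record can be sketched as follows. The record layout `(id, name, city)` is hypothetical, and sorting by the first field assumes it is a primary key shared by both databases; that is one way, not necessarily the source's way, to make the i-th subset of one data set correspond to the i-th subset of the other.

```python
from typing import List, Sequence, Tuple

Record = Tuple[int, str, str]  # hypothetical (id, name, city) record


def split_by_record(records: Sequence[Record]) -> List[Record]:
    """Split a data set into data subsets, one per database record.

    Sorting by the primary key (assumed to be the first field) aligns the
    subsets of the two data sets even if the databases returned their rows
    in different orders.
    """
    return sorted(records, key=lambda record: record[0])


first = [(2, "bob", "rome"), (1, "alice", "paris")]
second = [(1, "alice", "paris"), (2, "bob", "rome")]
print(split_by_record(first) == split_by_record(second))  # True
```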
Step 303: for each data subset of the first data set, generate a first character string of that data subset from the data subset; for each data subset of the second data set, generate a second character string of that data subset from the data subset.
In this embodiment, for each data subset of the first data set, the data in the data subset may be concatenated in sequence according to a preset order to obtain the first character string of the data subset. For each data subset of the second data set, the data in the data subset may be concatenated in sequence according to the order corresponding to the preset order to obtain the second character string of the data subset.
The order may be specified by a technician. It should be understood that when the data in the data sets are data records obtained from databases, the order includes the order of the fields of each data record. This ensures that subsequent verification compares corresponding data, and further avoids verification errors caused by inconsistent field order.
Optionally, for each data subset of the first data set, the data in the data subset may be concatenated in sequence according to the preset order to obtain an initial first character string of the data subset; a preset character is then inserted between every two adjacent data items in the initial first character string, and the resulting new string is taken as the first character string of the data subset. Likewise, for each data subset of the second data set, the data in the data subset may be concatenated in sequence according to the order corresponding to the preset order to obtain an initial second character string of the data subset; a preset character is inserted between every two adjacent data items in the initial second character string, and the resulting new string is taken as the second character string of the data subset.
As with step 202 in the embodiment corresponding to Fig. 2, inserting a preset character between every two adjacently concatenated data items separates the data items clearly, preventing the concatenation of two adjacent data items from producing a string that contains some other data item of the data set.
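A per-record character string with a preset field order and a preset separator can be sketched in Python as below. `FIELD_ORDER`, `SEPARATOR`, and the dictionary-shaped rows are hypothetical; converting values with `str()` is also an assumption, and it only yields matching strings if both databases return values of the same type (for example, `1` and `1.0` render differently).

```python
from typing import Mapping, Sequence

FIELD_ORDER = ["id", "name", "city"]  # hypothetical preset field order
SEPARATOR = "|"  # hypothetical preset character


def record_to_string(
    record: Mapping[str, object],
    field_order: Sequence[str] = FIELD_ORDER,
    sep: str = SEPARATOR,
) -> str:
    """Build the character string of one data subset (one record).

    Fields are emitted in the preset order, so a column order that differs
    between the two databases does not cause a false mismatch.
    """
    return sep.join(str(record[field]) for field in field_order)


row_in_first_db = {"id": 1, "name": "alice", "city": "paris"}
row_in_second_db = {"city": "paris", "id": 1, "name": "alice"}
print(record_to_string(row_in_first_db))  # 1|alice|paris
print(record_to_string(row_in_first_db) == record_to_string(row_in_second_db))  # True
```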
Step 304, corresponding first word of data subset at least two data subsets of the first data set is utilized Symbol string forms the first character trail, and right respectively using the data subset at least two data subsets of the second data set The second character string answered forms the second character trail.
Step 305: based on a preset hash algorithm, process each character string in the first character string set to obtain a first hash value set corresponding to the first character string set, and, based on the same hash algorithm, process each character string in the second character string set to obtain a second hash value set corresponding to the second character string set.
In this step, the hash algorithm can be specified by technical staff and applied to each character string in the first and second character string sets, yielding a hash value for every character string in the two sets.
Step 306: determine, based on the first hash value set and the second hash value set, whether the first data set and the second data set are identical.
In this embodiment, since the character strings in the first character string set correspond one-to-one with those in the second character string set, the hash values in the resulting first hash value set also correspond one-to-one with those in the second hash value set. Therefore, whether the first data set and the second data set are identical can be determined by comparing each pair of corresponding hash values. When the first data set and the second data set are identical, every two corresponding hash values should also be identical.
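A minimal Python sketch of the one-to-one comparison, assuming SHA-256 (via the standard `hashlib` module) as the "preset hash algorithm"; the patent does not name an algorithm, and the function names and sample strings are hypothetical. It relies on the two character string sets being aligned position by position:

```python
import hashlib


def hash_string(s):
    """Hash one character string. SHA-256 is an assumed choice of 'preset hash algorithm'."""
    return hashlib.sha256(s.encode("utf-8")).hexdigest()


def pairwise_identical(first_strings, second_strings):
    """Compare the hash values of corresponding character strings, position by position."""
    if len(first_strings) != len(second_strings):
        return False
    first_hashes = [hash_string(s) for s in first_strings]
    second_hashes = [hash_string(s) for s in second_strings]
    return all(h1 == h2 for h1, h2 in zip(first_hashes, second_hashes))


first_set = ["K1-F1", "K2-F2"]   # hypothetical first character string set
second_set = ["K1-F1", "K2-F2"]  # hypothetical second character string set
print(pairwise_identical(first_set, second_set))          # True
print(pairwise_identical(first_set, ["K1-F1", "K2-FX"]))  # False
```

Because the comparison is positional, the same strings in a different order would be reported as different. The sum-based variant described next avoids that dependency.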
Optionally, the sum or the product of the hash values in the first hash value set, and the sum or the product of the hash values in the second hash value set, may first be determined. In response to determining that the sum corresponding to the first hash value set equals the sum corresponding to the second hash value set, or that the product corresponding to the first hash value set equals the product corresponding to the second hash value set, the first data set and the second data set are determined to be identical.
This works because, when the first data set and the second data set are identical, each hash value in the first hash value set is identical to its corresponding hash value in the second hash value set. Hence the sum of the hash values in the first hash value set should equal the sum of those in the second hash value set, and the product of the hash values in the first hash value set should equal the product of those in the second hash value set. Whether the two data sets are identical can therefore be determined by comparing the sums, or the products, of the two hash value sets.
Compared with comparing the corresponding hash values of the two hash value sets one by one, computing the sums requires no additional time to establish which hash values correspond to each other: even if the hash values in the two sets are not arranged in corresponding order, their sums are the same. This further increases verification speed.
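A minimal Python sketch of the order-independent sum check. Two assumptions go beyond the text: each hash value is the first 8 bytes of SHA-256 read as an unsigned integer, and the sum wraps around modulo 2**64 so it stays a fixed-width value. Equal sums are a necessary condition for identical data sets but not strictly a sufficient one, since two different collections of strings can, with very small probability, produce the same sum.

```python
import hashlib

MASK_64 = (1 << 64) - 1  # assumed: sums wrap around modulo 2**64


def hash64(s):
    """Take the first 8 bytes of SHA-256 as an unsigned 64-bit hash value (assumed choice)."""
    return int.from_bytes(hashlib.sha256(s.encode("utf-8")).digest()[:8], "big")


def hash_sum(strings):
    """Sum the hash values of all strings; the result does not depend on their order."""
    total = 0
    for s in strings:
        total = (total + hash64(s)) & MASK_64
    return total


first_set = ["K1-F1", "K2-F2"]
second_set_shuffled = ["K2-F2", "K1-F1"]  # same strings, different order
print(hash_sum(first_set) == hash_sum(second_set_shuffled))  # True
```

Since addition is commutative, no step is needed to align corresponding hash values before comparing.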
With continued reference to Fig. 4, Fig. 4 is a schematic diagram 400 of an application scenario of the method for verifying data according to this embodiment. In the application scenario of Fig. 4, a first data set 402 to be verified may first be obtained from a first database 401. As shown in the figure, each data record in database 401 has a key and a field 1. The first data set 402 contains two data records: one whose key is "K11" and whose field 1 value is "F11", and one whose key is "K12" and whose field 1 value is "F12".
Likewise, a second data set 404 to be verified may be obtained from a second database 403. As shown in the figure, each data record in database 403 likewise has a key and a field 1. The second data set 404 contains two data records: one whose key is "K21" and whose field 1 value is "F21", and one whose key is "K22" and whose field 1 value is "F22".
Then, as indicated by reference numeral 405, the first data set 402 may be split into two data subsets, specifically by treating each data record as one data subset. As shown in the figure, the first data subset is {(K11, F11)} and the second data subset is {(K12, F12)}.
Similarly, as indicated by reference numeral 406, the second data set 404 may be split into two data subsets, again treating each data record as one data subset. As shown in the figure, the first data subset is {(K21, F21)} and the second data subset is {(K22, F22)}.
Next, for data subset {(K11, F11)}, the key and field 1 values in the data subset may be joined with the character "-" to obtain the corresponding first character string "K11-F11" 407. Similarly, for data subset {(K12, F12)}, the key and field 1 values may be joined with "-" to obtain the corresponding first character string "K12-F12" 408.
Similarly, for data subset {(K21, F21)}, the key and field 1 values in the data subset may be joined with "-" to obtain the corresponding second character string "K21-F21" 409, and for data subset {(K22, F22)}, the key and field 1 values may be joined with "-" to obtain the corresponding second character string "K22-F22" 410.
The first character strings "K11-F11" 407 and "K12-F12" 408 form the first character string set, and the second character strings "K21-F21" 409 and "K22-F22" 410 form the second character string set.
Then a hash algorithm may be used to obtain hash value H11 for the first character string "K11-F11" 407 and hash value H12 for the first character string "K12-F12" 408 (reference numerals 411 and 412). The same hash algorithm is used to obtain hash value H21 for the second character string "K21-F21" 409 and hash value H22 for the second character string "K22-F22" 410 (reference numerals 413 and 414).
Next, the sum H1 of hash values H11 and H12, corresponding to the first character strings "K11-F11" 407 and "K12-F12" 408, may be calculated (reference numeral 415). Likewise, the sum H2 of hash values H21 and H22, corresponding to the second character strings "K21-F21" 409 and "K22-F22" 410, is calculated (reference numeral 416).
Finally, the two sums H1 and H2 may be compared to obtain check result information 417. If H1 and H2 are identical, check result information 417 may indicate that the first data set and the second data set are identical; if H1 and H2 differ, check result information 417 may indicate that the two data sets are not identical.
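The Fig. 4 scenario can be sketched end to end in Python as follows. This is an illustrative sketch, not the patent's implementation. The hash function (first 8 bytes of SHA-256), the modulo-2**64 sum, the function names and the sample records are all assumptions; the records stand in for data sets 402 and 404, with each record treated as its own data subset and joined with "-":

```python
import hashlib

SEPARATOR = "-"           # preset character used in the Fig. 4 scenario
MASK_64 = (1 << 64) - 1   # assumed: sums are taken modulo 2**64


def split_into_subsets(records):
    """Treat each data record as its own data subset (cf. reference numerals 405 and 406)."""
    return [[record] for record in records]


def subset_to_string(subset):
    """Join every value of every record in the subset with the separator, e.g. 'K11-F11'."""
    return SEPARATOR.join(str(value) for record in subset for value in record)


def hash64(s):
    """First 8 bytes of SHA-256 as an unsigned integer (assumed hash algorithm)."""
    return int.from_bytes(hashlib.sha256(s.encode("utf-8")).digest()[:8], "big")


def verify(first_records, second_records):
    """Return check result information comparing the hash-value sums H1 and H2."""
    first_strings = [subset_to_string(s) for s in split_into_subsets(first_records)]
    second_strings = [subset_to_string(s) for s in split_into_subsets(second_records)]
    h1 = sum(hash64(s) for s in first_strings) & MASK_64
    h2 = sum(hash64(s) for s in second_strings) & MASK_64
    return {"identical": h1 == h2, "H1": h1, "H2": h2}


# Hypothetical (key, field 1) records; the second data set lists them in a different order.
first_data_set = [("K1", "F1"), ("K2", "F2")]
second_data_set = [("K2", "F2"), ("K1", "F1")]
print(verify(first_data_set, second_data_set)["identical"])  # True
```

Returning H1 and H2 alongside the boolean is only one possible shape for the check result information 417.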
The method provided by the above embodiment of the application splits each of the two data sets to be verified into at least two data subsets and generates a character string for each data subset, yielding a character string set for each data set. It then calculates a hash value for each character string in each set and verifies the two data sets based on their respective hash value sets. The problem of verifying two data sets is thus converted into verifying the hash values of the corresponding character strings (at least two per data set), rather than comparing each data set as a single whole verification object. This helps reduce the time and space costs that sorting would otherwise introduce during verification, while helping improve verification accuracy.
With further reference to Fig. 5, as an implementation of the methods shown in the above figures, the application provides an embodiment of an apparatus for verifying data. This apparatus embodiment corresponds to the method embodiment shown in Fig. 2, and the apparatus may be applied to various electronic devices.
As shown in Fig. 5, the apparatus 500 for verifying data provided in this embodiment includes an acquiring unit 501, a generation unit 502, and a determination unit 503. The acquiring unit 501 is configured to obtain a first data set and a second data set to be verified, where the data in the first data set correspond one-to-one with the data in the second data set. The generation unit 502 is configured to generate a first character string set and a second character string set from the first data set and the second data set, where the character strings in the first character string set contain data from the first data set, the character strings in the second character string set contain data from the second data set, the character strings in the two sets correspond one-to-one, and the data contained in two corresponding character strings correspond one-to-one and are in the same order. The determination unit 503 is configured to determine, based on the first character string set and the second character string set, whether the first data set and the second data set are identical, and to output check result information indicating whether they are identical.
In this embodiment, for the specific processing of the acquiring unit 501, the generation unit 502, and the determination unit 503 of the apparatus 500 and the technical effects they bring, reference may be made to the descriptions of step 201, step 202, and step 203 in the embodiment corresponding to Fig. 2; details are not repeated here.
In the apparatus provided by the above embodiment of the application, the acquiring unit obtains a first data set and a second data set to be verified, where the data in the two data sets correspond one-to-one. The generation unit generates a first character string set and a second character string set from the two data sets, where each character string contains data from its respective data set, the character strings of the two sets correspond one-to-one, and the data in two corresponding character strings correspond one-to-one and are in the same order. The determination unit determines, based on the two character string sets, whether the two data sets are identical and outputs check result information indicating the result. Verification of the two data sets is thus achieved through multiple character strings generated from their data arranged in corresponding order. On the one hand, this largely avoids comparing the data of the two data sets item by item, which helps improve verification speed. On the other hand, because multiple character strings are generated for each data set, corresponding pairs of character strings can later be verified individually, or the multiple character strings of each data set can be verified as a whole, which makes the verification approach more flexible.
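The unit structure of apparatus 500 can be sketched in Python as three small collaborating classes. This is a hypothetical illustration: the class and method names, the injected fetch callables, and the choice to compare the character strings directly (rather than via hashes) are assumptions, not details from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Tuple[str, ...]


@dataclass
class AcquiringUnit:
    """Obtains the first and second data sets (cf. acquiring unit 501)."""
    fetch_first: Callable[[], List[Record]]
    fetch_second: Callable[[], List[Record]]

    def acquire(self) -> Tuple[List[Record], List[Record]]:
        return self.fetch_first(), self.fetch_second()


@dataclass
class GenerationUnit:
    """Builds one character string per record, keeping field order (cf. generation unit 502)."""
    separator: str = "-"

    def generate(self, first, second):
        def to_string(record):
            return self.separator.join(record)
        return [to_string(r) for r in first], [to_string(r) for r in second]


class DeterminationUnit:
    """Compares corresponding character strings and reports the result (cf. determination unit 503)."""

    def determine(self, first_strings, second_strings) -> Dict[str, bool]:
        return {"identical": first_strings == second_strings}


@dataclass
class VerificationApparatus:
    acquiring: AcquiringUnit
    generation: GenerationUnit
    determination: DeterminationUnit

    def run(self) -> Dict[str, bool]:
        first, second = self.acquiring.acquire()
        first_strings, second_strings = self.generation.generate(first, second)
        return self.determination.determine(first_strings, second_strings)


apparatus = VerificationApparatus(
    AcquiringUnit(lambda: [("K1", "F1")], lambda: [("K1", "F1")]),
    GenerationUnit(),
    DeterminationUnit(),
)
print(apparatus.run())  # {'identical': True}
```

Keeping the units separate means, for example, a hash-based determination unit could replace the direct string comparison without touching the other two.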
Referring now to Fig. 6, a structural schematic diagram of a computer system 600 of an electronic device suitable for implementing embodiments of the application is shown. The electronic device shown in Fig. 6 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the application.
As shown in Fig. 6, the computer system 600 includes a central processing unit (CPU) 601, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage portion 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, ROM 602, and RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, and the like; an output portion 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage portion 608 including a hard disk and the like; and a communication portion 609 including a network interface card such as a LAN card or a modem. The communication portion 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage portion 608 as needed.
In particular, according to embodiments of the disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the disclosure includes a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 609, and/or installed from the removable medium 611. When the computer program is executed by the central processing unit (CPU) 601, the above functions defined in the methods of the application are performed.
It should be noted that the computer-readable medium of the application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium may be transmitted using any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the application. In this regard, each block in a flowchart or block diagram may represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in reverse order, depending on the functions involved. It should further be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the application may be implemented in software or in hardware. The described units may also be provided in a processor; for example, it may be described as: a processor including an acquiring unit, a generation unit, and a determination unit. The names of these units do not, in some cases, constitute a limitation on the units themselves; for example, the acquiring unit may also be described as "a unit that obtains a first data set and a second data set to be verified, where the data in the first data set correspond one-to-one with the data in the second data set".
In another aspect, the application further provides a computer-readable medium, which may be included in the electronic device described in the above embodiments, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain a first data set and a second data set to be verified, where the data in the first data set correspond one-to-one with the data in the second data set; generate a first character string set and a second character string set from the first data set and the second data set, where the character strings in the first character string set contain data from the first data set, the character strings in the second character string set contain data from the second data set, the character strings in the two sets correspond one-to-one, and the data contained in two corresponding character strings correspond one-to-one and are in the same order; and determine, based on the first character string set and the second character string set, whether the first data set and the second data set are identical, and output check result information indicating whether they are identical.
The above description covers only preferred embodiments of the application and an explanation of the technical principles employed. Those skilled in the art should understand that the scope of the invention involved in the application is not limited to technical solutions formed by the specific combination of the above technical features, and should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the application.

Claims (12)

1. A method for verifying data, comprising:
Obtaining a first data set and a second data set to be verified, wherein the data in the first data set correspond one-to-one with the data in the second data set;
Generating, according to the first data set and the second data set, a first character string set and a second character string set, wherein the character strings in the first character string set contain data from the first data set, the character strings in the second character string set contain data from the second data set, the character strings in the first character string set correspond one-to-one with the character strings in the second character string set, and the data contained in two corresponding character strings correspond one-to-one and are in the same order; and
Determining, based on the first character string set and the second character string set, whether the first data set and the second data set are identical, and outputting check result information indicating whether the first data set and the second data set are identical.
2. The method according to claim 1, wherein the first data set and the second data set are determined as follows:
Querying a first target database according to a preset query condition to obtain a returned first record set, and querying a second target database according to the query condition to obtain a returned second record set; and
In response to determining that the number of records in the first record set equals the number of records in the second record set, determining the first record set as the first data set and the second record set as the second data set.
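A minimal Python sketch of claim 2 using the standard `sqlite3` module, with two in-memory databases standing in for the target databases. The table schema, the query text, and the choice to return `None` when the record counts differ are hypothetical; the `ORDER BY` clause is an assumption added so that the two result lists line up record by record, as the one-to-one correspondence requires.

```python
import sqlite3

# Hypothetical preset query condition; ORDER BY keeps the two result lists aligned.
QUERY = "SELECT id, name FROM users WHERE active = 1 ORDER BY id"


def make_database(rows):
    """Create an in-memory stand-in for a target database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?, ?)", rows)
    return conn


def acquire_data_sets(first_db, second_db, query=QUERY):
    """Run the same query on both databases; accept the record sets only if their sizes match."""
    first_records = first_db.execute(query).fetchall()
    second_records = second_db.execute(query).fetchall()
    if len(first_records) != len(second_records):
        return None  # assumed handling: differing counts mean the sets cannot be identical
    return first_records, second_records


rows = [(1, "alice", 1), (2, "bob", 1), (3, "carol", 0)]
print(acquire_data_sets(make_database(rows), make_database(rows)))
# ([(1, 'alice'), (2, 'bob')], [(1, 'alice'), (2, 'bob')])
```

Checking the record counts first is cheap and rules out obviously different results before any character strings or hash values are generated.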
3. The method according to claim 1, wherein generating the first character string set and the second character string set according to the first data set and the second data set comprises:
Concatenating the data in the first data set in sequence according to a preset arrangement order to obtain a first character string, and determining the first character string as the first character string set; and
Concatenating the data in the second data set in sequence according to an arrangement order corresponding to the preset arrangement order to obtain a second character string, and determining the second character string as the second character string set.
4. The method according to claim 1, wherein generating the first character string set and the second character string set according to the first data set and the second data set comprises:
Concatenating the data in the first data set in sequence according to a preset arrangement order to obtain a first character string, and concatenating the data in the second data set in sequence according to an arrangement order corresponding to the preset arrangement order to obtain a second character string;
Inserting a preset character between every two data items contained in the first character string, and determining the resulting new first character string as the first character string set; and
Inserting the preset character between every two data items contained in the second character string, and determining the resulting new second character string as the second character string set.
5. The method according to claim 1, wherein generating the first character string set and the second character string set according to the first data set and the second data set comprises:
Splitting the first data set and the second data set each into at least two data subsets, wherein the data subsets of the first data set correspond one-to-one with the data subsets of the second data set, and the data contained in two corresponding data subsets correspond one-to-one;
For each data subset among the at least two data subsets of the first data set, generating a first character string of the data subset according to the data subset;
For each data subset among the at least two data subsets of the second data set, generating a second character string of the data subset according to the data subset; and
Forming the first character string set from the first character strings corresponding to the data subsets of the first data set, and forming the second character string set from the second character strings corresponding to the data subsets of the second data set.
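Claim 5 does not fix how a data set is split; claim 6 uses one record per subset. The Python sketch below uses a different, hypothetical rule, consecutive batches of `size` items, to show how one character string per subset lets corresponding subsets be checked individually and a mismatch be located. The names, the "|" separator, and the batching rule are all assumptions.

```python
def split_into_subsets(data, size):
    """Split a data set into consecutive subsets of at most `size` items (hypothetical split rule)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def subset_strings(data, size, separator="|"):
    """Generate one character string per data subset, preserving the order of the data."""
    return [separator.join(str(item) for item in subset) for subset in split_into_subsets(data, size)]


def differing_subsets(first, second, size, separator="|"):
    """Return the indexes of corresponding subsets whose character strings differ."""
    first_strings = subset_strings(first, size, separator)
    second_strings = subset_strings(second, size, separator)
    if len(first_strings) != len(second_strings):
        raise ValueError("data sets must split into the same number of subsets")
    return [i for i, (a, b) in enumerate(zip(first_strings, second_strings)) if a != b]


first = ["a1", "a2", "a3", "a4", "a5"]
second = ["a1", "a2", "a3", "b4", "a5"]
print(subset_strings(first, 2))             # ['a1|a2', 'a3|a4', 'a5']
print(differing_subsets(first, second, 2))  # [1]
```

Here only the second subset ("a3|a4" versus "a3|b4") differs, so a follow-up check could be limited to that batch.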
6. The method according to claim 5, wherein the first data set is data in a first target database and the second data set is data in a second target database; and
The data in each data subset among the at least two data subsets of the first data set belong to the same record of the first target database, and the data in each data subset among the at least two data subsets of the second data set belong to the same record of the second target database.
7. The method according to claim 5, wherein generating, for each data subset among the at least two data subsets of the first data set, the first character string of the data subset according to the data subset comprises:
Concatenating the data in the data subset in sequence according to a preset arrangement order to obtain the first character string of the data subset; and
Generating, for each data subset among the at least two data subsets of the second data set, the second character string of the data subset according to the data subset comprises:
Concatenating the data in the data subset in sequence according to an arrangement order corresponding to the preset arrangement order to obtain the second character string of the data subset.
8. The method according to claim 5, wherein generating, for each data subset among the at least two data subsets of the first data set, the first character string of the data subset according to the data subset comprises:
Concatenating the data in the data subset in sequence according to a preset arrangement order to obtain an initial first character string of the data subset, inserting a preset character between every two data items contained in the initial first character string, and determining the resulting new initial first character string as the first character string of the data subset; and
Generating, for each data subset among the at least two data subsets of the second data set, the second character string of the data subset according to the data subset comprises:
Concatenating the data in the data subset in sequence according to an arrangement order corresponding to the preset arrangement order to obtain an initial second character string of the data subset, inserting the preset character between every two data items contained in the initial second character string, and determining the resulting new initial second character string as the second character string of the data subset.
9. The method according to any one of claims 1-8, wherein determining, based on the first character string set and the second character string set, whether the first data set and the second data set are identical comprises:
Processing each character string in the first character string set based on a preset hash algorithm to obtain a first hash value set corresponding to the first character string set, and processing each character string in the second character string set based on the hash algorithm to obtain a second hash value set corresponding to the second character string set; and
Determining, based on the first hash value set and the second hash value set, whether the first data set and the second data set are identical.
10. The method according to claim 9, wherein determining, based on the first hash value set and the second hash value set, whether the first data set and the second data set are identical comprises:
Determining the sum or product of the hash values in the first hash value set and the sum or product of the hash values in the second hash value set; and
In response to determining that the sum corresponding to the first hash value set equals the sum corresponding to the second hash value set, or in response to determining that the product corresponding to the first hash value set equals the product corresponding to the second hash value set, determining that the first data set and the second data set are identical.
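The product variant of claim 10 can be sketched in Python as below. The details are assumptions: products are taken modulo the Mersenne prime 2**61 - 1, and each hash value (derived from SHA-256) is mapped into the range 1 to 2**61 - 2 so that no single factor can be zero and force the whole product to zero. Modulo a prime, a product of nonzero factors is never zero. As with the sum, equal products are necessary but not strictly sufficient for identical data sets.

```python
import hashlib

PRIME = (1 << 61) - 1  # 2**61 - 1, a Mersenne prime; assumed modulus for the product


def hash_nonzero(s):
    """Map a string to an integer in [1, PRIME - 1] so that no factor can zero the product."""
    h = int.from_bytes(hashlib.sha256(s.encode("utf-8")).digest()[:8], "big")
    return h % (PRIME - 1) + 1


def hash_product(strings):
    """Multiply the hash values modulo PRIME; like the sum, this ignores string order."""
    product = 1
    for s in strings:
        product = product * hash_nonzero(s) % PRIME
    return product


first = ["K1-F1", "K2-F2"]
second = ["K2-F2", "K1-F1"]
print(hash_product(first) == hash_product(second))  # True
```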
11. An electronic device, comprising:
One or more processors; and
A storage device having one or more programs stored thereon;
Wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-10.
12. A computer-readable medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the method according to any one of claims 1-10.
CN201811562212.3A 2018-12-20 2018-12-20 Method and device for verifying data Active CN109635256B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811562212.3A CN109635256B (en) 2018-12-20 2018-12-20 Method and device for verifying data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811562212.3A CN109635256B (en) 2018-12-20 2018-12-20 Method and device for verifying data

Publications (2)

Publication Number Publication Date
CN109635256A true CN109635256A (en) 2019-04-16
CN109635256B CN109635256B (en) 2023-07-11

Family

ID=66075699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811562212.3A Active CN109635256B (en) 2018-12-20 2018-12-20 Method and device for verifying data

Country Status (1)

Country Link
CN (1) CN109635256B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105786911A (en) * 2014-12-25 2016-07-20 阿里巴巴集团控股有限公司 Application data checking method and device
CN106899411A (en) * 2016-12-08 2017-06-27 阿里巴巴集团控股有限公司 Verification method and device based on verification code

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110362722A (en) * 2019-06-04 2019-10-22 苏州神州数码捷通科技有限公司 Big-data-based handbook data verification method
CN110377883A (en) * 2019-06-04 2019-10-25 苏州神州数码捷通科技有限公司 Big-data-based deep-processing data verification method
CN110459098A (en) * 2019-08-14 2019-11-15 毕莘教育咨询(深圳)有限公司 Method for judging identity of on-machine programming questions, and identification generation method and system
CN110459098B (en) * 2019-08-14 2021-09-21 毕莘教育咨询(深圳)有限公司 Method for judging identity of on-machine programming questions, and identification generation method and system
CN111064697A (en) * 2019-10-21 2020-04-24 上海百事通信息技术股份有限公司 Data transmission method, device, storage medium and terminal
CN112307489A (en) * 2020-06-24 2021-02-02 神州融安科技(北京)有限公司 Character display method, device, electronic equipment and computer readable storage medium
CN112307489B (en) * 2020-06-24 2024-03-22 神州融安科技(北京)有限公司 Character display method, device, electronic equipment and computer readable storage medium
CN112182120A (en) * 2020-10-14 2021-01-05 瀚高基础软件股份有限公司 Data table processing method and device and storage medium

Also Published As

Publication number Publication date
CN109635256B (en) 2023-07-11

Similar Documents

Publication Publication Date Title
CN109635256A (en) Method and apparatus for verifying data
CN108880931B (en) Method and apparatus for outputting information
CN108985208A (en) Method and apparatus for generating an image detection model
CN108933695B (en) Method and apparatus for processing information
CN109062563A (en) Method and apparatus for generating a page
CN110213614A (en) Method and apparatus for extracting key frames from a video file
CN109002385A (en) Stress testing method and device for a data flow system
US9934291B2 (en) Dynamic presentation of a results set by a form-based software application
CN108882025A (en) Video frame processing method and apparatus
CN109284367A (en) Method and apparatus for processing text
US20170199912A1 (en) Behavior topic grids
CN111752834A (en) Automatic testing method and device
CN109614327A (en) Method and apparatus for outputting information
CN109885564A (en) Method and apparatus for sending information
CN109325227A (en) Method and apparatus for generating corrected sentences
CN110852057A (en) Method and device for calculating text similarity
CN109840072B (en) Information processing method and device
CN111949655A (en) Form display method and device, electronic equipment and medium
CN109242892B (en) Method and apparatus for determining the geometric transformation relationship between images
CN115563942A (en) Contract generation method and device, electronic equipment and computer readable medium
CN109271397A (en) Method and apparatus for processing information
CN110019531A (en) Method and apparatus for obtaining a set of similar objects
CN110020040A (en) Data query method, device and system
CN109597819A (en) Method and apparatus for updating a database
CN113760695A (en) Method and device for positioning problem code

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant