CN112711683A - Data comparison method and device and computer equipment - Google Patents

Publication number
CN112711683A
Authority
CN
China
Prior art keywords
data set
bitmap
data
determining
elements
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110211405.XA
Other languages
Chinese (zh)
Inventor
肖俊贤
段夕华
王帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Koubei Network Technology Co Ltd
Original Assignee
Zhejiang Koubei Network Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G06F16/90344 Query processing by using string matching techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9014 Indexing; Data structures therefor; Storage structures hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9038 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services

Abstract

The embodiments of this specification provide a data comparison method, a data comparison apparatus, and a computer device. For a first data set and a second data set that need to be compared, bitmap processing establishes a correspondence between each data set and a bitmap. Because the first bitmap corresponding to the first data set and the second bitmap corresponding to the second data set cover the same value range, the two bitmaps can be compared quickly by checking, position by position, whether the value represented by each element is marked as present, and the difference between the first data set and the second data set can thus be determined quickly.

Description

Data comparison method and device and computer equipment
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to a data comparison method, an apparatus, and a computer device.
Background
Data comparison is a common operation in data processing and related technical fields. It refers to comparing two data sets to determine whether they are identical, whether they differ, or whether they share a common part. How to improve the efficiency of comparing data sets is therefore a technical problem that urgently needs to be solved.
Disclosure of Invention
In order to overcome the problems in the related art, the specification provides a data comparison method, a data comparison device and computer equipment.
According to a first aspect of embodiments herein, there is provided a data comparison method, comprising:
after determining that a first data set and a second data set need to be compared, acquiring a first bitmap corresponding to the first data set and a second bitmap corresponding to the second data set; the first bitmap and the second bitmap correspond to the same value range, each bit element in the first bitmap and the second bitmap represents whether the value corresponding to that element exists, and the value corresponds to original data of a data set;
and comparing the elements in the first bitmap with the elements at the corresponding positions in the second bitmap, and determining a comparison result of the first data set and the second data set according to the result of comparing the elements.
In some examples, the first bitmap and the second bitmap are obtained by performing bitmap processing on the corresponding data sets, and the bitmap processing includes:
determining a numerical value corresponding to each original data in a data set, determining a numerical range corresponding to the data set, and generating a bitmap from the numerical values corresponding to the original data, arranged in a set numerical order, wherein the elements contained in the bitmap correspond to the numerical range.
In some examples, the determining the range of values to which the data set corresponds includes:
determining the maximum value among the values corresponding to the original data in the first data set and the original data in the second data set, and determining the minimum value among the values corresponding to the original data in the first data set and the original data in the second data set;
determining the range of values based on the maximum value and the minimum value.
In some examples, the raw data is data of a non-numeric type, and the numeric value corresponding to the raw data is determined by converting each raw data in the data set into a numeric value.
In some examples, the determining the comparison result of the first data set and the second data set according to the result of comparing the elements comprises:
if an element of the first bitmap is the same as the element at the corresponding position in the second bitmap, determining, according to whether the element indicates that the value exists, that the first data set and the second data set either both contain or both do not contain the original data corresponding to the value;
and if an element of the first bitmap is different from the element at the corresponding position in the second bitmap, determining the difference between the first data set and the second data set with respect to the original data corresponding to the value represented by the element, according to whether the element of the first bitmap indicates that the value exists and whether the element of the second bitmap indicates that the value exists.
In some examples, each raw data of the data set is a mobile phone number;
the determining a numerical value corresponding to each original data in the data set and determining a numerical range corresponding to the data set includes:
removing the first digit of each mobile phone number in the data set, determining the value obtained after removal as the value corresponding to the original data, and determining the value range corresponding to the data set according to the values corresponding to the original data.
In some examples, the numerical range corresponding to the data set is the numerical range that remains after the number segments of unopened mobile phone numbers are removed.
According to a second aspect of embodiments herein, there is provided a data comparison apparatus, comprising:
an acquisition module configured to: after determining that a first data set and a second data set need to be compared, acquire a first bitmap corresponding to the first data set and a second bitmap corresponding to the second data set; the first bitmap and the second bitmap correspond to the same value range, each bit element in the first bitmap and the second bitmap represents whether the value corresponding to that element exists, and the value corresponds to original data of a data set;
a comparison module configured to: compare the elements in the first bitmap with the elements at the corresponding positions in the second bitmap, and determine a comparison result of the first data set and the second data set according to the result of comparing the elements.
In some examples, the first bitmap and the second bitmap are obtained by performing bitmap processing on the corresponding data sets, and the bitmap processing includes:
determining a numerical value corresponding to each original data in a data set, determining a numerical range corresponding to the data set, and generating a bitmap from the numerical values corresponding to the original data, arranged in a set numerical order, wherein the elements contained in the bitmap correspond to the numerical range, and each bit element in the bitmap represents whether the numerical value corresponding to that element exists.
In some examples, the determining the range of values to which the data set corresponds includes:
determining the maximum value among the values corresponding to the original data in the first data set and the original data in the second data set, and determining the minimum value among the values corresponding to the original data in the first data set and the original data in the second data set;
determining the range of values based on the maximum value and the minimum value.
In some examples, the raw data is data of a non-numeric type, and the numeric value corresponding to the raw data is determined by converting each raw data in the data set into a numeric value.
In some examples, the determining the comparison result of the first data set and the second data set according to the result of comparing the elements comprises:
if an element of the first bitmap is the same as the element at the corresponding position in the second bitmap, determining, according to whether the element indicates that the value exists, that the first data set and the second data set either both contain or both do not contain the original data corresponding to the value;
and if an element of the first bitmap is different from the element at the corresponding position in the second bitmap, determining the difference between the first data set and the second data set with respect to the original data corresponding to the value represented by the element, according to whether the element of the first bitmap indicates that the value exists and whether the element of the second bitmap indicates that the value exists.
In some examples, each raw data of the data set is a mobile phone number;
the determining a numerical value corresponding to each original data in the data set and determining a numerical range corresponding to the data set includes:
removing the first digit of each mobile phone number in the data set, determining the value obtained after removal as the value corresponding to the original data, and determining the value range corresponding to the data set according to the values corresponding to the original data.
In some examples, the numerical range corresponding to the data set is the numerical range that remains after the number segments of unopened mobile phone numbers are removed.
According to a third aspect of embodiments herein, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the following method when executing the program:
after determining that a first data set and a second data set need to be compared, acquiring a first bitmap corresponding to the first data set and a second bitmap corresponding to the second data set; the first bitmap and the second bitmap correspond to the same value range, each bit element in the first bitmap and the second bitmap represents whether the value corresponding to that element exists, and the value corresponds to original data of a data set;
and comparing the elements in the first bitmap with the elements at the corresponding positions in the second bitmap, and determining a comparison result of the first data set and the second data set according to the result of comparing the elements.
The technical scheme provided by the embodiment of the specification can have the following beneficial effects:
In the embodiments of this specification, for a first data set and a second data set that need to be compared, bitmap processing establishes a correspondence between each data set and a bitmap. Because the first bitmap corresponding to the first data set and the second bitmap corresponding to the second data set cover the same value range, the two bitmaps can be compared quickly by checking, position by position, whether the value represented by each element is marked as present, and the difference between the first data set and the second data set can thus be determined quickly.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the specification.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present specification and together with the description, serve to explain the principles of the specification.
FIG. 1A is a flowchart of a data comparison method according to an exemplary embodiment of this specification.
FIG. 1B is a schematic diagram of a bitmap according to an exemplary embodiment of this specification.
FIG. 1C is a schematic diagram of the bitmap of a data set according to an exemplary embodiment of this specification.
FIG. 2A is a schematic diagram of a bitmap corresponding to mobile phone numbers according to an exemplary embodiment of this specification.
FIG. 2B is a schematic diagram of a data set comparison according to an exemplary embodiment of this specification.
FIG. 3 is a hardware structure diagram of a computer device in which a data comparison apparatus according to an exemplary embodiment of this specification is located.
FIG. 4 is a block diagram of a data comparison apparatus according to an exemplary embodiment of this specification.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the present specification. The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination", depending on the context.
Data comparison is needed in many technical fields. Take the internet field as an example: internet technologies and applications have developed for many years and now affect every aspect of people's work and life. Internet business parties have many ways to acquire users and promote their business, some of which involve cooperation between different business parties, such as collaborative updates and precise advertisement placement based on user profiles, and all of these involve cooperation on business data. During such cooperation, the business data of the two business parties need to be compared. Data comparison is of course also needed in other business scenarios. It is therefore desirable to provide a fast and efficient data comparison scheme.
An embodiment of this specification provides a data comparison scheme. FIG. 1A is a flowchart of a data comparison method according to an exemplary embodiment of this specification, which includes the following steps:
In step 102, after it is determined that a first data set and a second data set need to be compared, a first bitmap corresponding to the first data set and a second bitmap corresponding to the second data set are obtained.
The first bitmap and the second bitmap correspond to the same value range, each bit element in the first bitmap and the second bitmap represents whether a value corresponding to the element exists or not, and the value corresponds to original data of the data set.
In step 104, the elements in the first bitmap are compared with the elements in the corresponding positions in the second bitmap, and the comparison result between the first data set and the second data set is determined according to the comparison result.
The bitmap of this embodiment is a data structure that includes at least one element. The elements are arranged in sequence, and each element uses "0" or "1" to indicate that its corresponding value does not exist or exists. FIG. 1B is a schematic diagram of a bitmap in this embodiment, taking a bitmap of 16 elements as an example.
In this embodiment, the first data set and the second data set are the two data sets that need to be compared, and each data set includes at least one original data. In practical applications, how it is determined that the first data set and the second data set need to be compared may differ between service scenarios. For example, a comparison request sent by a user may be received, where the comparison request indicates the first data set and the second data set that need to be compared. The comparison request for the first data set and the second data set may also be initiated by different users; for example, one user indicates the first data set and another user indicates the second data set. A data set may be carried in the comparison request initiated by the user, or may be stored locally, in which case the comparison request indicates which of the locally stored data are to be compared as the data set.
In this embodiment, the comparison is performed by means of bitmaps, so a first bitmap corresponding to the first data set and a second bitmap corresponding to the second data set need to be obtained. The first bitmap and the second bitmap may be obtained by performing bitmap processing on the corresponding data sets.
The bitmap processing comprises:
determining a numerical value corresponding to each original data in a data set, determining a numerical range corresponding to the data set, and generating a bitmap from the numerical values corresponding to the original data, arranged in a set numerical order, wherein the elements contained in the bitmap correspond to the numerical range, and each bit element in the bitmap represents whether the numerical value corresponding to that element exists.
In this embodiment, when a bitmap is used to represent a data set, the bitmap does not record the original data itself; instead, each element represents whether a piece of data exists. It is therefore necessary to determine which data in the data set each bit element in the bitmap corresponds to. In this embodiment, the value range corresponding to the data set is determined, and the values corresponding to the raw data are arranged in a set numerical order; that is, there is a mapping relationship between the data set and the values, and this embodiment can store that mapping relationship. Because the elements contained in the bitmap correspond to the value range, it can be determined which value in the value range each bit element represents, and the correspondence between the data set and the bitmap is thereby established.
FIG. 1C shows a bitmap for the data set [5, 1, 7, 15, 0, 4, 6, 10], corresponding to the value range 0 to 15. The value range of the bitmap in FIG. 1C is merely illustrative. In this embodiment, the two data sets are compared through the bitmap corresponding to the first data set and the bitmap corresponding to the second data set, which cover the same value range. Elements at the same position in the two bitmaps therefore correspond to the same value, so whether both data sets contain that value can be determined quickly by checking whether the elements at that position mark the value as present. The two bitmaps can thus be compared quickly, and the difference between the first data set and the second data set can be determined quickly.
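The bitmap construction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `build_bitmap` and the use of a Python list of 0/1 flags are assumptions made for readability, and the values are assumed to be arranged in ascending order.

```python
def build_bitmap(values, lo, hi):
    """Return one 0/1 flag per value in [lo, hi], in ascending order.

    Position i of the result stands for the value lo + i
    (hypothetical helper, for illustration only).
    """
    bits = [0] * (hi - lo + 1)
    for v in values:
        if not lo <= v <= hi:
            raise ValueError(f"{v} is outside the value range [{lo}, {hi}]")
        bits[v - lo] = 1  # mark the value as present
    return bits


# The data set and value range used in the FIG. 1C example.
data_set = [5, 1, 7, 15, 0, 4, 6, 10]
bitmap = build_bitmap(data_set, 0, 15)
print("".join(map(str, bitmap)))  # prints 1100111100100001
```

The order of the original data does not matter: only the presence of each value within the shared range is recorded.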
There are many ways to make the first bitmap and the second bitmap correspond to the same value range. In some examples, the value range may be determined by the largest and the smallest of the values corresponding to the raw data of the first data set and the raw data of the second data set. For example, suppose the maximum value corresponding to the raw data of the first data set is 999 and the minimum value is 30, and the maximum value corresponding to the raw data of the second data set is 900 and the minimum value is 25. The value range may then be determined based on 25 and 999: it may be exactly 25 to 999, or it may be a range larger than [25, 999]. In other words, the value range shared by the first bitmap and the second bitmap either equals or covers the range from the overall minimum value to the overall maximum value. In some examples, the maximum value and the minimum value of the two data sets may be obtained by traversing the first data set and the second data set.
The above example allows the value range to be identified automatically. The value range can also be determined in various other ways; for instance, the value range corresponding to a data set can be determined from the value range covered by all possible valid representations of the data. As an example, suppose the first data set and the second data set that need to be compared contain user IDs belonging to the same service system, and the service party has set the range of user IDs to 00000 to 99999; this range can then be used as the value range corresponding to the data sets. In other examples, instead of being identified automatically, the value range may be entered by a user; that is, the value range is determined by receiving user input, which can be faster.
In practical applications, the original data in the first data set and the second data set may be of various data types, and the value corresponding to each original data can be determined in various ways. For raw data of a numerical type, the raw data itself is the value. For data of a non-numerical type, the value can be obtained by conversion, that is, by converting each original data in the data set into a numerical value. As an example, non-numerical data may be converted into a numerical value by an algorithm; for instance, for original data of a character string type, the character string may be converted into a numerical value by a string-to-number conversion algorithm. Optionally, the original data in the data set may first be sorted in a set order and then converted into numerical values, or the original data may first be converted into numerical values and the numerical values then sorted, which facilitates the subsequent determination of the value range, numerical comparison, and other processing.
In some examples, the value determined for each original data in the data set may be an integer, which facilitates subsequent sorting and comparison of the values. If the original data in the data set is integer-type data, the value corresponding to the original data can be determined directly; if the original data is of a non-integer type, it can be converted into integer-type data. For example, a decimal may be converted into integer-type data by multiplying it by a scale factor. As another example, for raw data of a non-numeric type, the value corresponding to each raw data may be determined by sorting the finite set of raw data and converting the sorted data into sequence numbers.
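A minimal sketch of the automatic range determination, assuming both data sets are non-empty and already hold integer values. The helper name `shared_range` and the sample sets are hypothetical; only the bounds 25, 30, 900, and 999 come from the example above.

```python
def shared_range(first, second):
    """Return (lowest, highest) over both data sets in a single pass."""
    values = list(first) + list(second)
    return min(values), max(values)


# Sample sets chosen to match the bounds in the example above:
# the first set spans 30..999, the second spans 25..900.
first = [30, 512, 999]
second = [25, 400, 900]
lo, hi = shared_range(first, second)
print(lo, hi)        # prints 25 999
print(hi - lo + 1)   # number of bitmap elements needed: 975
```

Both bitmaps are then built over `[lo, hi]` (or any wider range), so that position `i` stands for the value `lo + i` in each of them.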
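The sequence-number conversion for non-numeric data can be sketched as below. This is an illustration under stated assumptions: the helper `ordinal_mapping`, the sample names, and the choice to sort the union of both data sets (so that both share one mapping) are not taken from the source.

```python
def ordinal_mapping(first, second):
    """Sort the distinct raw data of both sets and number them from 0.

    Returns the raw-data-to-value mapping and the sorted list, which
    serves as the stored value-to-raw-data mapping.
    """
    universe = sorted(set(first) | set(second))
    to_value = {item: index for index, item in enumerate(universe)}
    return to_value, universe


first = ["carol", "alice", "dave"]
second = ["bob", "alice"]
to_value, universe = ordinal_mapping(first, second)

first_values = sorted(to_value[item] for item in first)
second_values = sorted(to_value[item] for item in second)
print(first_values)   # prints [0, 2, 3]
print(second_values)  # prints [0, 1]
print(universe[2])    # value 2 maps back to 'carol'
```

Keeping `universe` around is what allows a bitmap comparison result to be translated back into the original data afterwards.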
After the value range is determined, a bitmap can be generated from the values corresponding to the original data, arranged in a set numerical order; the order may be ascending or descending, as required. In practical applications, as described above, because the mapping relationship between the data set and the values is stored, after the first bitmap and the second bitmap are compared, the comparison result of the first data set and the second data set can be determined from the comparison result of the two bitmaps combined with the stored mapping relationship.
In this way, elements at the same position in the first bitmap and the second bitmap correspond to the same value, and data can be compared quickly by comparing the bits of the bitmaps according to whether each element marks its value as present. During the comparison, each element in the first bitmap is compared with the element at the corresponding position in the second bitmap. If the two elements are the same, the state of the corresponding value is the same in both bitmaps, that is, the value either exists in both or exists in neither; in other words, the first data set and the second data set either both contain the data corresponding to the value or both do not contain it.
As an example, assume that 0 indicates absence and 1 indicates presence. In practical applications other representations may also be adopted, for example 0 for presence and 1 for absence, or other numerical values to represent the state, and this embodiment does not limit this. If the element at the first position of the first bitmap and the element at the first position of the second bitmap are both 0, the value ranked first in the value range does not exist in either bitmap, that is, neither the first data set nor the second data set contains the data corresponding to that value. If both elements are 1, the value ranked first in the value range exists in both bitmaps, that is, the first data set and the second data set both contain the data corresponding to that value.
Similarly, if the elements at the same position in the first bitmap and the second bitmap are different, the state of the corresponding value differs between the two bitmaps, and the first data set and the second data set differ with respect to the data corresponding to that value.
For example, if the element at the first position of the first bitmap is 1 and the element at the first position of the second bitmap is 0, the value ranked first in the value range exists in the first bitmap but not in the second bitmap; the first data set therefore contains the data corresponding to that value, and the second data set does not. The reverse case is handled in the same way.
In this way, basic operations between data sets, such as intersection, union, and complement, can be implemented. Efficient comparison can therefore be achieved in the data comparison process by means of bitmaps.
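The position-by-position comparison and the set operations mentioned above can be sketched with bitwise operators. This sketch makes several assumptions not stated in the source: each bitmap is held in a single Python integer (bit `i` stands for the value `lo + i`), 1 means present, and the helper names `to_bitmap` and `from_bitmap` and the second sample set are hypothetical.

```python
def to_bitmap(values, lo, hi):
    """Pack the values into an integer: bit (v - lo) is 1 when v is present."""
    bitmap = 0
    for v in values:
        if not lo <= v <= hi:
            raise ValueError(f"{v} is outside the value range [{lo}, {hi}]")
        bitmap |= 1 << (v - lo)
    return bitmap


def from_bitmap(bitmap, lo):
    """List, in ascending order, the values whose bits are set."""
    values, offset = [], 0
    while bitmap:
        if bitmap & 1:
            values.append(lo + offset)
        bitmap >>= 1
        offset += 1
    return values


lo, hi = 0, 15
mask = (1 << (hi - lo + 1)) - 1          # one bit per value in the range
a = to_bitmap([5, 1, 7, 15, 0, 4, 6, 10], lo, hi)
b = to_bitmap([1, 2, 7, 10, 12], lo, hi)

in_both = from_bitmap(a & b, lo)         # intersection: both elements are 1
in_either = from_bitmap(a | b, lo)       # union
differing = from_bitmap(a ^ b, lo)       # positions where the elements differ
only_first = from_bitmap(a & ~b, lo)     # 1 in the first bitmap, 0 in the second
only_second = from_bitmap(b & ~a, lo)    # 0 in the first bitmap, 1 in the second
absent_first = from_bitmap(~a & mask, lo)  # complement of the first set in the range

print(in_both)      # prints [1, 7, 10]
print(only_first)   # prints [0, 4, 5, 6, 15]
print(only_second)  # prints [2, 12]
```

Each operator processes many positions per machine operation, which is where the speed advantage over comparing the original data item by item comes from.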
In the scheme of the embodiment, the bitmap is used for representing the data which can be sequenced in sequence, the full amount of the data can be used as a full amount bitmap according to the numerical range corresponding to the data set, each bit represents the existence of a data item, and therefore each data is compressed to be represented by one bit, and the data representation achieves a very good effect. The scheme of the embodiment can be applied to various application scenarios, for example, comparison of data such as a user mobile phone, a data ID, a staff number and the like, any character string can be converted into an integer of 26 systems, and data of a character string type can be converted into a numerical value and displayed as a bitmap by adopting the scheme of the embodiment for data comparison.
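One possible base-26 conversion is sketched below. The source does not specify the algorithm, so everything here is an assumption: the input is restricted to the lowercase letters a to z, and the letters are numbered 1 to 26 rather than 0 to 25 so that strings such as "a" and "aa" do not collapse to the same integer. The name `string_to_int` is hypothetical.

```python
def string_to_int(text):
    """Convert a lowercase a-z string to an integer, base 26 with digits 1..26."""
    value = 0
    for ch in text:
        if not "a" <= ch <= "z":
            raise ValueError(f"unsupported character: {ch!r}")
        value = value * 26 + (ord(ch) - ord("a") + 1)
    return value


print(string_to_int("a"))   # prints 1
print(string_to_int("z"))   # prints 26
print(string_to_int("aa"))  # prints 27
print(string_to_int("ba"))  # prints 53
```

The resulting integers grow quickly with string length, so in practice the value range (and therefore the bitmap size) limits how long the strings can be.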
As an example, in one application scenario two business parties need to compare their target user groups. Users are identified by mobile phone number, so comparing the target user groups of the two business parties means comparing the users' mobile phone numbers.
If the plaintext of the mobile phone number is replaced by a hash, taking MD5 (Message-Digest Algorithm 5) as an example, each plaintext mobile phone number occupies 11 Bytes, which is 5 Bytes less than the 16 Bytes of its MD5 representation.
If the plaintext of the mobile phone number is represented as an integer, the mobile phone number is a large 11-digit integer. In some examples, according to the actual design of mobile phone numbers, the highest digit of every mobile phone number is fixed to 1. Based on this, in this embodiment the first digit of each mobile phone number in the data set may be removed, the value obtained after removal may be determined as the value corresponding to the original data, and the value range corresponding to the data set may be determined according to the values corresponding to the original data. After the first digit is removed, a large 10-digit integer remains, and the maximum number 99-9999-. If the full set of mobile phone numbers is stored, the total storage needed is 100-. As can be seen, the data storage amount can be significantly reduced by the above processing.
When the bitmap of this embodiment is used to represent the mobile phone numbers, 100-0000-0000 / 8 ≈ 9536 × 1024 × 1024 / 8 ≈ 1192M Bytes are needed; clearly, representing the full set of mobile phone numbers with a bitmap significantly reduces the storage amount. Optionally, in practice, under the rules by which the number resources of the telecommunication network are opened, some number segments are not opened; for example, mobile phone numbers in segments such as 142 and 168 are not available to users. From the user side, mobile phone numbers in these segments do not exist, and the mobile phone numbers contained in a user data set do not involve these segments; there are actually only fifty number segments. The value range can therefore be reduced as needed: the value range corresponding to the data set is the value range with the unopened mobile phone number segments removed, so the bitmap can be compressed further, and in practical application it can be compressed to 547M Bytes. Compared with the data size 50-0000 16Bytes = 800GBytes of the original MD5 HASH approach, this is a reduction of more than 1000 times, and a data size that cannot be handled in practice becomes a completely feasible one.
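The size of the full bitmap over all ten-digit values can be checked with a few lines of arithmetic. The sketch below only reproduces the full-range figure; the sample phone number is hypothetical and is used solely to confirm the 16-byte MD5 digest length.

```python
import hashlib

FULL_RANGE = 10**10        # ten-digit values 00-0000-0000 .. 99-9999-9999
MIB = 1024 * 1024          # bytes per "M Bytes" as used in the text

# One bit per possible value, eight bits per byte.
bitmap_bytes = FULL_RANGE // 8
bitmap_mib = bitmap_bytes / MIB

# Per-item sizes of the alternatives discussed above.
sample_phone = b"13800000000"                       # hypothetical number
plaintext_bytes = len(sample_phone)                 # 11 bytes as text
md5_bytes = len(hashlib.md5(sample_phone).digest()) # 16 bytes as an MD5 digest

print(f"full bitmap: {bitmap_bytes} bytes, about {bitmap_mib:.1f} MiB")
print(f"per item: plaintext {plaintext_bytes} B, MD5 {md5_bytes} B, bitmap 1 bit")
```

The bitmap size depends only on the width of the value range, not on how many phone numbers are actually present, whereas the plaintext and MD5 representations grow linearly with the number of items.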
Fig. 2A is a schematic diagram of the bitmap of three mobile phone numbers. Because of the design of mobile phone numbers, the numbers of some segments are absent, and the bitmap consists of alternating runs of 0s and 1s. Existing compression algorithms can be used to compress the bitmap further, which reduces its data volume during network transmission and persistent storage and improves data loading efficiency.
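The effect of compressing a sparse bitmap can be illustrated with a general-purpose compressor. The sketch below uses zlib purely as an example; the patent does not name a specific algorithm, and the bitmap size and the three offsets are arbitrary values chosen for the illustration.

```python
import zlib

SIZE_BITS = 8_000_000                 # hypothetical value range
bitmap = bytearray(SIZE_BITS // 8)    # starts as all zeros

# Mark three values as present; everything else stays 0.
for offset in (12, 3_456_789, 7_999_999):
    bitmap[offset // 8] |= 1 << (offset % 8)

raw = bytes(bitmap)
compressed = zlib.compress(raw, 9)    # 9 = highest compression level
restored = zlib.decompress(compressed)

print(f"raw: {len(raw)} bytes, compressed: {len(compressed)} bytes")
```

Because the bitmap is dominated by long runs of zero bytes, the compressed form is a small fraction of the raw size, and decompression restores the bitmap exactly.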
The scheme of this embodiment is suitable for data cooperation scenarios, which involve data cooperation among different business parties; for example, jointly acquiring new users and accurately placing advertisements based on user portraits both involve cooperation on user data. Data cooperation often requires a large amount of data comparison processing, and data security must also be considered. In this embodiment, in the data cooperation scenario, a decentralized data collaboration service platform may provide data collaboration services to the business parties. The service platform may store a business party's data in a data trusted domain, which may use a remotely attestable hardware trusted execution environment to store a data source internally, such as a data source provided by a business party. The data trusted domain may include at least one adapter node for processing the data source.
As an example, Fig. 2B shows the process by which two business parties compare data sets:
① Bitmap generation: as an example, the business party imports an internal data source to the service platform. The data source is first loaded into an adapter node of the data trusted domain, and during this process it may be converted into a bitmap. The conversion may include: determining a numerical range according to the data source; obtaining the numerical values after arranging the data source in sequence; generating a full Bitmap according to the numerical range and initializing it to all 0, indicating that the data set corresponding to the Bitmap is empty; and then reading each data item in the data source and setting the corresponding bit in the full Bitmap to 1 according to the offset position of the data item. In this way, regardless of the original data volume, the distribution of the serialized data can be mapped into the full Bitmap. In this embodiment, the service platform provides services for at least one business party, and different business parties have different data trusted domains. Within its data trusted domain, a business party's data can be considered safe, and the process of generating a bitmap from the business party's data set can be executed within that domain.
② Bitmap transmission: when a business party's data is transmitted out of the data trusted domain, it faces an untrusted environment. Because the Bitmap corresponds to the original data, the Bitmap can be encrypted and imported into a trusted memory. Optionally, the data may be transmitted from the trusted domain to the memory in the untrusted domain through an encrypted transmission channel, so as to protect data security. In addition, the generated full Bitmap may still be relatively large, so compression may be considered as needed. The bitmap itself is pure binary data, and particularly when the original data volume is relatively small it may contain long runs of consecutive 0s or consecutive 1s; a suitable algorithm can be selected from the many available data compression algorithms, the compression ratio is relatively high, and the data volume can be significantly reduced.
③ Bitmap calculation: for operations such as intersection or union, no matter how much data the participating data sources contain, only a bitwise AND or OR needs to be carried out over the full Bitmaps corresponding to the data sources, and the calculation result is again one full Bitmap, so the calculation process is independent of the data volume. For example, in a bitwise AND operation the result is 1 only when both bits are 1, and 0 otherwise. In a bitwise OR operation the result is 1 as long as at least one of the two participating bits is 1.
④ Bitmap interpretation: the comparison result of two full Bitmaps can itself still be a Bitmap. In some examples, each bit element in the result Bitmap may indicate whether the original data of the two data sets differ at that position. In some examples, the Bitmap may be exported from the encrypted memory in the untrusted domain and transmitted to the two business parties so that they can determine the comparison result of the data sets. Because the comparison result of two full Bitmaps is still a Bitmap and may contain long runs of consecutive 0s or consecutive 1s, it can be compressed by various data compression algorithms with a high compression ratio, so the data volume can be significantly reduced. When a Bitmap is generated, the correspondence between the original data and the value at each position of the Bitmap is recorded; using this reverse mapping between the Bitmap and the original data, the comparison result computed on the Bitmaps can be mapped directly into the comparison result of the original data. For example, for the example of the mobile phone number, according to the mapping relationship between the stored bitmap and the data set, the mobile phone number represented correspondingly can be obtained only by the offset of the bit corresponding to 1 according to 100-.
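The generation, calculation, and interpretation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the platform's implementation: the transmission step (encryption and trusted memory) is omitted, the data items are plain integers in a small hypothetical range, and the function names `build_bitmap`, `bitwise`, and `interpret` are invented for the example.

```python
import operator


def build_bitmap(values, low, high):
    """Generation: all-zero bitmap over [low, high], then one bit set per item."""
    bitmap = bytearray((high - low + 1 + 7) // 8)
    for v in values:
        if not low <= v <= high:
            raise ValueError(f"value {v} outside range [{low}, {high}]")
        offset = v - low                      # offset position of the data item
        bitmap[offset // 8] |= 1 << (offset % 8)
    return bitmap


def bitwise(op, a, b):
    """Calculation: combine two equally sized bitmaps byte by byte."""
    if len(a) != len(b):
        raise ValueError("bitmaps must cover the same value range")
    return bytearray(op(x, y) for x, y in zip(a, b))


def interpret(bitmap, low):
    """Interpretation: map every 1 bit back to its original value via the offset."""
    return [low + i * 8 + j
            for i, byte in enumerate(bitmap)
            for j in range(8)
            if (byte >> j) & 1]


LOW, HIGH = 1000, 1063                        # hypothetical shared value range
party_a = build_bitmap([1001, 1010, 1042], LOW, HIGH)
party_b = build_bitmap([1010, 1042, 1060], LOW, HIGH)

common = interpret(bitwise(operator.and_, party_a, party_b), LOW)
either = interpret(bitwise(operator.or_, party_a, party_b), LOW)
```

The cost of `bitwise` depends only on the width of the value range, which mirrors the statement that the calculation is independent of how many items each data source contains.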
Corresponding to the embodiment of the data comparison method, the specification also provides an embodiment of a data comparison device and a computer device applied by the data comparison device.
The embodiments of the data comparison apparatus in this specification can be applied to computer devices, such as a server or a terminal device. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, the apparatus, as a logical apparatus, is formed when the processor of the device where it is located reads the corresponding computer program instructions from the nonvolatile memory into the memory and runs them. In terms of hardware, Fig. 3 is a hardware structure diagram of the computer device where the data comparison apparatus of this specification is located. In addition to the processor 310, the memory 330, the network interface 320, and the nonvolatile memory 340 shown in Fig. 3, in an embodiment the computer device where the data comparison apparatus 331 is located may also include other hardware according to the actual function of the computer device, which is not described again here.
As shown in fig. 4, fig. 4 is a block diagram of a data comparison apparatus according to an exemplary embodiment, the apparatus includes:
an obtaining module 41, configured to: after a first data set and a second data set need to be compared, acquiring a first bitmap corresponding to the first data set and a second bitmap corresponding to the second data set; the first bitmap and the second bitmap correspond to the same value range, each bit element in the first bitmap and the second bitmap represents whether a value corresponding to the element exists or not, and the value corresponds to original data of a data set;
a comparison module 42, configured to: compare the elements in the first bitmap with the elements at the corresponding positions in the second bitmap, and determine a comparison result of the first data set and the second data set according to the result of comparing the elements.
In some examples, the first bitmap and the second bitmap are obtained by performing bitmap processing on the corresponding data sets, and the bitmap processing includes:
determining a numerical value corresponding to each original data in a data set, determining a numerical range corresponding to the data set, and generating a bitmap according to the numerical value corresponding to each original data according to a set numerical value sequence, wherein elements contained in the bitmap correspond to the numerical range, and each bit element in the bitmap represents whether the numerical value corresponding to the element exists or not.
In some examples, the determining the range of values to which the data set corresponds includes:
determining maximum values in each original data in the first data set and each original data in the second data set, and determining minimum values in each original data in the first data set and each original data in the second data set;
determining the range of values based on the maximum and minimum values.
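A minimal sketch of this range determination is given below, assuming the original data have already been converted to integers. The function name `shared_range` and the sample values are hypothetical.

```python
from itertools import chain


def shared_range(first, second):
    """Return (low, high, size) of the value range covering both data sets."""
    combined = list(chain(first, second))
    if not combined:
        raise ValueError("both data sets are empty")
    low, high = min(combined), max(combined)
    # size is the number of bits a bitmap over [low, high] needs
    return low, high, high - low + 1


low, high, size = shared_range([105, 230], [98, 180])
```

Using one range for both data sets ensures that the same bit position refers to the same value in the first bitmap and in the second bitmap.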
In some examples, the raw data is a non-numeric type of data, and the numeric value corresponding to the raw data is determined by converting each raw data in the data set into a numeric value.
In some examples, the determining a comparison result of the first data set and the second data set according to the comparison result includes:
if an element of the first bitmap is the same as the element at the corresponding position in the second bitmap, determining, according to whether the value represented by the element exists, that the first data set and the second data set both contain or both do not contain the original data corresponding to that value;
and if an element of the first bitmap is different from the element at the corresponding position in the second bitmap, determining how the first data set and the second data set differ in the original data corresponding to the value represented by the elements, according to whether the value exists in the first bitmap and whether it exists in the second bitmap.
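The two cases above can be sketched as a position-by-position classification. The example assumes that 1 indicates presence and 0 indicates absence, represents each bitmap as a list of bits for readability, and uses a hypothetical function name and result keys.

```python
def classify(first_bits, second_bits):
    """Group bit positions by how the two bitmaps relate at that position."""
    if len(first_bits) != len(second_bits):
        raise ValueError("bitmaps must cover the same value range")
    result = {"both": [], "neither": [], "only_first": [], "only_second": []}
    for pos, (a, b) in enumerate(zip(first_bits, second_bits)):
        if a == b:
            # Same element: both sets contain the value, or both lack it.
            result["both" if a == 1 else "neither"].append(pos)
        elif a == 1:
            # Different elements: the value exists only in the first set.
            result["only_first"].append(pos)
        else:
            # Different elements: the value exists only in the second set.
            result["only_second"].append(pos)
    return result


outcome = classify([1, 0, 1, 0, 1], [1, 1, 0, 0, 1])
```

Each position index in the result can then be mapped back to the original data through the correspondence recorded when the bitmaps were generated.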
In some examples, each raw data of the data set is a mobile phone number;
the determining a numerical value corresponding to each original data in the data set and determining a numerical range corresponding to the data set includes:
and removing the first digit of each mobile phone number in the data set, determining the value obtained after removal as the value corresponding to the original data, and determining the value range corresponding to the data set according to the value corresponding to the original data.
In some examples, the numerical range corresponding to the data set refers to a numerical range from which unopened mobile phone number segments are removed.
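One way such a reduced value range could be laid out is sketched below: each opened three-digit number segment is given a consecutive index, and the remaining eight digits form the position within the segment. The segment list is hypothetical and far shorter than any real one, the three-digit segment handling is an assumption of this example, and the function names are invented for illustration.

```python
OPENED_SEGMENTS = ["130", "131", "138", "139"]   # hypothetical opened segments
SEGMENT_SIZE = 10**8                             # eight digits follow the segment
SEGMENT_INDEX = {seg: i for i, seg in enumerate(OPENED_SEGMENTS)}

# Number of bits the compact bitmap needs: opened segments only.
COMPACT_BITS = len(OPENED_SEGMENTS) * SEGMENT_SIZE


def phone_to_offset(phone):
    """Map an 11-digit number to its bit offset in the compact value range."""
    if len(phone) != 11 or not phone.isdigit():
        raise ValueError(f"unexpected phone number format: {phone!r}")
    segment = phone[:3]
    if segment not in SEGMENT_INDEX:
        raise ValueError(f"segment {segment} is not opened")
    return SEGMENT_INDEX[segment] * SEGMENT_SIZE + int(phone[3:])


def offset_to_phone(offset):
    """Reverse mapping from a bit offset back to the 11-digit number."""
    if not 0 <= offset < COMPACT_BITS:
        raise ValueError(f"offset {offset} outside the compact range")
    index, rest = divmod(offset, SEGMENT_SIZE)
    return f"{OPENED_SEGMENTS[index]}{rest:08d}"
```

Numbers in unopened segments have no offset at all, so the bitmap spends no bits on them, which is the source of the additional size reduction.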
Accordingly, the present specification also provides a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the following method when executing the program:
after a first data set and a second data set need to be compared, acquiring a first bitmap corresponding to the first data set and a second bitmap corresponding to the second data set; the first bitmap and the second bitmap correspond to the same value range, each bit element in the first bitmap and the second bitmap represents whether a value corresponding to the element exists or not, and the value corresponds to original data of a data set;
and comparing the elements in the first bitmap with the elements in the corresponding positions in the second bitmap, and determining a comparison result of the first data set and the second data set according to the comparison result.
The implementation process of the functions and actions of each module in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Other embodiments of the present description will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following, in general, the principles of the specification and including such departures from the present disclosure as come within known or customary practice within the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.
It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.
The above description is only a preferred embodiment of the present disclosure, and should not be taken as limiting the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (15)

1. A data comparison method, comprising:
after a first data set and a second data set need to be compared, acquiring a first bitmap corresponding to the first data set and a second bitmap corresponding to the second data set; the first bitmap and the second bitmap correspond to the same value range, each bit element in the first bitmap and the second bitmap represents whether a value corresponding to the element exists or not, and the value corresponds to original data of a data set;
and comparing the elements in the first bitmap with the elements in the corresponding positions in the second bitmap, and determining a comparison result of the first data set and the second data set according to the comparison result.
2. The method of claim 1, wherein the first bitmap and the second bitmap are obtained by performing bitmap processing on the respective data sets, and the bitmap processing comprises:
determining a numerical value corresponding to each original data in a data set, determining a numerical range corresponding to the data set, and generating a bitmap according to the numerical value corresponding to each original data according to a set numerical value sequence, wherein elements contained in the bitmap correspond to the numerical range.
3. The method of claim 1, the determining a numerical range to which the data set corresponds, comprising:
determining maximum values in each original data in the first data set and each original data in the second data set, and determining minimum values in each original data in the first data set and each original data in the second data set;
determining the range of values based on the maximum and minimum values.
4. The method of claim 1, wherein the raw data is a non-numeric type of data, and the numeric value corresponding to the raw data is determined by converting each raw data in the data set into a numeric value.
5. The method of claim 1, the determining a comparison result of the first data set and the second data set according to the comparison result, comprising:
if the elements of the first bitmap are the same as the elements of the corresponding positions in the second bitmap, determining that the first data set and the second data set both contain or do not contain original data corresponding to the values according to whether the values represented by the elements exist;
and if the elements of the first bitmap are different from the elements of the corresponding positions in the second bitmap, determining the difference between the original data corresponding to the values of the first data set and the second data set represented by the elements according to whether the values of the elements of the first bitmap are present and whether the values of the elements of the second bitmap are present.
6. The method of claim 1, wherein each raw data of the data set is a cell phone number;
the determining a numerical value corresponding to each original data in the data set and determining a numerical range corresponding to the data set includes:
and removing the first digit of each mobile phone number in the data set, determining the value obtained after removal as the value corresponding to the original data, and determining the value range corresponding to the data set according to the value corresponding to the original data.
7. The method of claim 6, wherein the range of values corresponding to the data set is a range of values from which unopened mobile phone number segments are removed.
8. A data comparison apparatus, comprising:
an acquisition module to: after a first data set and a second data set need to be compared, acquiring a first bitmap corresponding to the first data set and a second bitmap corresponding to the second data set; the first bitmap and the second bitmap correspond to the same value range, each bit element in the first bitmap and the second bitmap represents whether a value corresponding to the element exists or not, and the value corresponds to original data of a data set;
a comparison module for: and comparing the elements in the first bitmap with the elements in the corresponding positions in the second bitmap, and determining a comparison result of the first data set and the second data set according to the comparison result.
9. The apparatus of claim 8, the acquisition module to: the bitmap processing is performed by:
determining a numerical value corresponding to each original data in a data set, determining a numerical range corresponding to the data set, and generating a bitmap according to the numerical value corresponding to each original data according to a set numerical value sequence, wherein elements contained in the bitmap correspond to the numerical range.
10. The apparatus of claim 8, the determining a range of values to which the data set corresponds, comprising:
determining maximum values in each original data in the first data set and each original data in the second data set, and determining minimum values in each original data in the first data set and each original data in the second data set;
determining the range of values based on the maximum and minimum values.
11. The apparatus of claim 8, wherein the raw data is non-numeric data, and the numeric value corresponding to the raw data is determined by sorting and converting the raw data in the data set into a numeric value.
12. The apparatus of claim 8, the determining a comparison result of the first data set and the second data set according to the comparison result comprises:
if the elements of the first bitmap are the same as the elements of the corresponding positions in the second bitmap, determining that the first data set and the second data set both contain or do not contain original data corresponding to the values according to whether the values represented by the elements exist;
and if the elements of the first bitmap are different from the elements of the corresponding positions in the second bitmap, determining the difference between the original data corresponding to the values of the first data set and the second data set represented by the elements according to whether the values of the elements of the first bitmap are present and whether the values of the elements of the second bitmap are present.
13. The apparatus of claim 8, wherein each raw data of the data set is a cell phone number;
the acquisition module is further configured to:
and removing the first digit of each mobile phone number in the data set, determining the value obtained after removal as the value corresponding to the original data, and determining the value range corresponding to the data set according to the value corresponding to the original data.
14. The apparatus of claim 13, wherein the range of values corresponding to the data set is a range of values of mobile phone number segments from which unopened resources are removed.
15. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the following method when executing the program:
after a first data set and a second data set need to be compared, acquiring a first bitmap corresponding to the first data set and a second bitmap corresponding to the second data set; the first bitmap and the second bitmap correspond to the same value range, each bit element in the first bitmap and the second bitmap represents whether a value corresponding to the element exists or not, and the value corresponds to original data of a data set;
and comparing the elements in the first bitmap with the elements in the corresponding positions in the second bitmap, and determining a comparison result of the first data set and the second data set according to the comparison result.
CN202110211405.XA 2021-02-25 2021-02-25 Data comparison method and device and computer equipment Pending CN112711683A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110211405.XA CN112711683A (en) 2021-02-25 2021-02-25 Data comparison method and device and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110211405.XA CN112711683A (en) 2021-02-25 2021-02-25 Data comparison method and device and computer equipment

Publications (1)

Publication Number Publication Date
CN112711683A true CN112711683A (en) 2021-04-27

Family

ID=75550191

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110211405.XA Pending CN112711683A (en) 2021-02-25 2021-02-25 Data comparison method and device and computer equipment

Country Status (1)

Country Link
CN (1) CN112711683A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104239301A (en) * 2013-06-06 2014-12-24 阿里巴巴集团控股有限公司 Data comparing method and device
US20180268019A1 (en) * 2017-03-16 2018-09-20 International Business Machines Corporation Comparison of block based volumes with ongoing inputs and outputs
CN110018996A (en) * 2018-07-23 2019-07-16 郑州云海信息技术有限公司 A kind of the snapshot rollback method and relevant apparatus of distributed memory system

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115357384A (en) * 2022-08-17 2022-11-18 广州鼎甲计算机科技有限公司 Space recovery method and device of data de-duplication storage system
CN115357384B (en) * 2022-08-17 2024-02-02 广州鼎甲计算机科技有限公司 Space reclamation method and device for repeated data deleting storage system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210427