CN113760907A

CN113760907A - Data uniqueness identification method in database

Info

Publication number: CN113760907A
Application number: CN202110883879.9A
Authority: CN
Inventors: 王锦胤; 刘海涛; 史延莹
Original assignee: Zijincheng Credit Investigation Co ltd
Current assignee: Zijincheng Credit Investigation Co ltd
Priority date: 2021-08-02
Filing date: 2021-08-02
Publication date: 2021-12-07

Abstract

The application discloses a method for identifying data uniqueness in a database. The method comprises: defining a data identification column, which includes selecting the original data fields (the columns whose uniqueness must be judged) and defining a data identification field (a newly added identification column); generating a unique data identifier for the selected columns and storing the result in the newly added identification column for later use; judging data uniqueness by comparing the generated unique identifier with the unique identifier column of the data already in the database, wherein identical data is considered to exist if the same unique identifier exists and not to exist otherwise; when identical data exists, processing the duplicate data according to the needs of the service scenario; and when identical data does not exist, storing the new data and adding an initial data version identifier. The method solves the technical problem that existing approaches can only keep data records unique and cannot effectively identify the uniqueness of the recorded data content.

Description

Data uniqueness identification method in database

Technical Field

The application relates to the field of front-end development, in particular to a data uniqueness identification method in a database.

Background

Current data uniqueness identification methods mainly include numeric serial numbers, Universally Unique Identifiers (UUID), Globally Unique Identifiers (GUID), timestamps, and the like.

At present, the prior art on the market has the following defects:

only the uniqueness of the data record can be guaranteed; the uniqueness of the recorded data content cannot be effectively identified.

No effective solution has yet been proposed for the problem in the related art that the uniqueness of recorded data content cannot be effectively identified.

Disclosure of Invention

The present application mainly aims to provide a method for identifying uniqueness of data in a database, so as to solve the above problems.

In order to achieve the above object, according to one aspect of the present application, there is provided a data uniqueness identifying method in a database.

The data uniqueness identification method in the database comprises the following steps:

defining a data identification column, selecting a related column needing to judge uniqueness, and adding a new data identification column;

generating a data uniqueness identifier, and storing a generated result in a newly added identifier column for later use;

judging the uniqueness of the data by comparing its unique identifier with the unique identifier column of the data already in the database;

when the same unique identifier is judged to exist, processing the duplicate data according to the needs of the service scenario;

and when the same unique identifier does not exist, storing the new data and adding an initial data version identifier.

Further, defining the data identification column specifically includes:

selecting the original data fields;

defining the data identification field.

Further, selecting the original data fields means selecting the data columns whose uniqueness needs to be distinguished; one field, several fields, or all fields may be selected depending on the specific service scenario.

Further, the data identification field is defined to store the result of the data identification calculation.
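The column definition step can be sketched as follows. This is a minimal illustration using Python's built-in sqlite3 module; the table name customer, its columns, the identifier column name data_uid, and the index are all assumptions made for the example and do not come from the application.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical business table; its columns are illustrative only.
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT, id_number TEXT, phone TEXT)"
)

# Original data field selection: the columns whose content decides uniqueness.
selected_fields = ["name", "id_number"]

# Data identification field definition: one new column for the computed identifier.
conn.execute("ALTER TABLE customer ADD COLUMN data_uid TEXT")

# Assumption: an index on the new column keeps later lookups from scanning the table.
conn.execute("CREATE INDEX idx_customer_data_uid ON customer (data_uid)")

# Column names as reported by SQLite (second element of each table_info row).
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
```

The existing primary key id is left untouched, so the new column acts as an auxiliary identifier next to it.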

Further, the specific step of generating the data unique identifier includes:

serializing each selected field into a character string;

sorting the selected fields by field name;

concatenating the serialized character strings in the sorted order;

and performing a hash calculation on the concatenated string, the result of which is used as the unique data identifier.
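The four steps above can be sketched in Python. The application does not name a serialization format, a separator, or a hash function, so the choices here (JSON serialization, a unit-separator character between fields, field names included in the concatenated string, and SHA-256) are assumptions for illustration. The function name make_data_uid is hypothetical.

```python
import hashlib
import json


def make_data_uid(record, selected_fields):
    """Return a content-based identifier for the selected fields of a record."""
    # Step 1: serialize each selected field into a string.
    # Assumption: JSON is used so that the number 1 and the string "1" differ.
    serialized = {
        name: json.dumps(record[name], ensure_ascii=False, sort_keys=True)
        for name in selected_fields
    }
    # Step 2: sort the selected fields by field name.
    ordered_names = sorted(serialized)
    # Step 3: concatenate the serialized strings in the sorted order.
    # Assumption: a separator and the field name are included in the string.
    joined = "\x1f".join(f"{name}={serialized[name]}" for name in ordered_names)
    # Step 4: hash the concatenated string (SHA-256 is an assumed choice).
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# The same content yields the same identifier regardless of field order.
a = make_data_uid({"name": "Li Lei", "id_number": "110101"}, ["name", "id_number"])
b = make_data_uid({"id_number": "110101", "name": "Li Lei"}, ["id_number", "name"])
c = make_data_uid({"name": "Li Lei", "id_number": "110102"}, ["name", "id_number"])
```

Sorting by field name is what makes the identifier independent of the order in which columns happen to be listed or stored.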

Further, in judging the uniqueness of the data, identical data is considered to exist if the same unique identifier exists, and not to exist otherwise.

Further, when the same unique identifier is judged to exist, processing the duplicate data according to the needs of the service scenario specifically includes:

when the data needs to be retained, storing the data and updating the data version identifier as required;

when the data does not need to be retained, discarding the data.
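One possible reading of the duplicate handling rules is sketched below with an in-memory dictionary standing in for the database. The function name handle_incoming, the keep_duplicates flag, and the use of an integer that starts at 1 and increments as the version identifier are assumptions; the application only states that a version identifier is added and updated.

```python
def handle_incoming(store, uid, record, keep_duplicates):
    """Apply the retain/discard rule; store maps identifier -> entry."""
    existing = store.get(uid)
    if existing is None:
        # No identical identifier: store the new data with an initial version.
        store[uid] = {"record": record, "version": 1}
        return "inserted"
    if keep_duplicates:
        # Identical data that must be retained: store it and update the version.
        existing["record"] = record
        existing["version"] += 1
        return "version_updated"
    # Identical data that need not be retained: discard it.
    return "discarded"


store = {}
r1 = handle_incoming(store, "u1", {"name": "Li Lei"}, keep_duplicates=False)
r2 = handle_incoming(store, "u1", {"name": "Li Lei"}, keep_duplicates=False)
r3 = handle_incoming(store, "u1", {"name": "Li Lei"}, keep_duplicates=True)
```

Whether a retained duplicate overwrites the stored row or is kept as an additional row is a business decision that the application leaves to the service scenario.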

In the embodiment of the application, a data identification column is defined, a unique data identifier is generated, and the result is stored in the newly added identification column for later use. The identifier is then compared with the unique identifier column of the data already in the database. When an identical identifier is found, the duplicate data is processed according to the needs of the service scenario; otherwise, the new data is stored and an initial data version identifier is added. This achieves the technical effects of full compatibility with existing database data identifier (ID) technology, wide applicability, high efficiency, and resource savings, and solves the technical problem in the prior art that the uniqueness of recorded data content cannot be effectively identified.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this application, serve to provide a further understanding of the application and to enable other features, objects, and advantages of the application to be more apparent. The drawings and their description illustrate the embodiments of the invention and do not limit it. In the drawings:

FIG. 1 is a flow chart of a method for identifying uniqueness of data in a database according to an embodiment of the application;

FIG. 2 is a flow chart of generating a data unique identifier according to an embodiment of the present application.

Detailed Description

In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

In this application, the terms "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "lateral", "longitudinal", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings. These terms are used primarily to better describe the invention and its embodiments and are not intended to limit the indicated devices, elements or components to a particular orientation or to be constructed and operated in a particular orientation.

Moreover, some of the above terms may be used to indicate other meanings besides the orientation or positional relationship, for example, the term "on" may also be used to indicate some kind of attachment or connection relationship in some cases. The specific meanings of these terms in the present invention can be understood by those skilled in the art as appropriate.

Furthermore, the terms "mounted," "disposed," "provided," "connected," and "sleeved" are to be construed broadly. For example, it may be a fixed connection, a removable connection, or a unitary construction; can be a mechanical connection, or an electrical connection; may be directly connected, or indirectly connected through intervening media, or may be in internal communication between two devices, elements or components. The specific meanings of the above terms in the present invention can be understood by those of ordinary skill in the art according to specific situations.

It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.

According to an embodiment of the present invention, as shown in fig. 1, there is provided a method for identifying uniqueness of data in a database, the method including the following steps:

Defining a data identification column by selecting the related columns whose uniqueness needs to be judged and adding a new data identification column, which specifically comprises the following steps:

selecting the original data fields, that is, the data columns (data fields) whose uniqueness needs to be distinguished; one field, several fields, or all fields may be selected depending on the specific service scenario;

defining the data identification field, which is used for storing the result of the data identification calculation.

Generating a unique data identifier and storing the result in the newly added identification column. As shown in FIG. 2, generating the unique data identifier from the selected columns specifically includes:

serializing each selected field into a character string;

sorting the selected fields by field name;

concatenating the serialized character strings in the sorted order;

and performing a hash calculation on the concatenated string, the result of which is used as the unique data identifier.
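The application does not state whether a separator is used when the serialized strings are joined. The sketch below, with hypothetical helper names, shows why this detail matters: joining values with no boundary marker lets two different rows produce the same input to the hash, while a length prefix on each value (an assumed remedy, one of several possible) keeps them distinct.

```python
import hashlib


def uid_plain(values):
    # Join with no separator, then hash.
    return hashlib.sha256("".join(values).encode("utf-8")).hexdigest()


def uid_length_prefixed(values):
    # Prefix each value with its length so field boundaries survive joining.
    joined = "".join(f"{len(v)}:{v}" for v in values)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


row_a = ["ab", "c"]
row_b = ["a", "bc"]

# Both rows join to "abc", so the plain identifiers are equal.
same_plain = uid_plain(row_a) == uid_plain(row_b)

# "2:ab1:c" and "1:a2:bc" differ, so the prefixed identifiers differ.
same_prefixed = uid_length_prefixed(row_a) == uid_length_prefixed(row_b)
```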

Judging the uniqueness of the data by comparing the generated unique identifier with the unique identifier column of the data already in the database: if the same unique identifier exists, identical data is considered to exist; otherwise, identical data is considered not to exist.

When identical data is judged to exist, the duplicate data is processed according to the needs of the service scenario, which specifically comprises:

when the data needs to be retained, storing the data and updating the data version identifier as required;

when the data does not need to be retained, discarding the data.

When identical data does not exist, the new data is stored and an initial data version identifier is added.
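The embodiment as a whole can be sketched against a real database engine. The example below uses Python's built-in sqlite3 module; the table layout, the column names data_uid and version, the selected fields, SHA-256, and the initial version value 1 are assumptions for illustration. The duplicate branch simply rejects the record, leaving the retain-or-discard decision to the caller.

```python
import hashlib
import json
import sqlite3

FIELDS = ["name", "id_number"]  # hypothetical selected fields


def make_data_uid(record):
    # Serialize, sort by field name, concatenate, then hash.
    joined = "\x1f".join(
        f"{k}={json.dumps(record[k], ensure_ascii=False)}" for k in sorted(FIELDS)
    )
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY, name TEXT, id_number TEXT, "
    "data_uid TEXT, version INTEGER)"
)
conn.execute("CREATE INDEX idx_uid ON customer (data_uid)")


def ingest(record):
    """Insert the record unless its identifier is already present."""
    uid = make_data_uid(record)
    found = conn.execute(
        "SELECT 1 FROM customer WHERE data_uid = ? LIMIT 1", (uid,)
    ).fetchone()
    if found:
        return False  # identical content exists; duplicate handling goes here
    conn.execute(
        "INSERT INTO customer (name, id_number, data_uid, version) "
        "VALUES (?, ?, ?, 1)",
        (record["name"], record["id_number"], uid),
    )
    return True


first = ingest({"name": "Li Lei", "id_number": "110101"})
second = ingest({"name": "Li Lei", "id_number": "110101"})
third = ingest({"name": "Han Meimei", "id_number": "110102"})
row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

The uniqueness check is a single indexed lookup on one column rather than a comparison of every selected field.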

From the above description, it can be seen that the present invention achieves the following technical effects:

the method is compatible with existing database data identifier (ID) technology: it can replace the data identifier used by the existing database, or it can be added as a new field that serves as an auxiliary column alongside the existing data identifier;

the method has wide application scenarios: it can be used in database applications such as relational databases, document databases, and distributed databases, and also in big data scenarios such as data cleaning and data deduplication;

the method is efficient and saves resources: only one identification column of 32 to 256 bits needs to be added, and uniqueness can be judged without comparing every selected field.
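The 32 to 256 bit range corresponds to the output widths of common checksum and hash functions. The sketch below lists a few from Python's standard library; which function the applicants had in mind is not stated, so these are examples only, and the sample payload is made up.

```python
import hashlib
import zlib

payload = "id_number=110101\x1fname=Li Lei".encode("utf-8")

# zlib.crc32 returns an unsigned 32-bit integer in Python 3.
crc_value = zlib.crc32(payload)

# Identifier widths in bits for a few standard-library functions.
bits = {
    "crc32": 32,
    "blake2b_64": hashlib.blake2b(payload, digest_size=8).digest_size * 8,
    "blake2b_128": hashlib.blake2b(payload, digest_size=16).digest_size * 8,
    "sha256": hashlib.sha256(payload).digest_size * 8,
}
```

A narrower identifier saves storage but raises the chance that two different records share an identifier. With a 32-bit value, such collisions become likely once the table holds on the order of tens of thousands of distinct rows, so treating equal identifiers as equal data is safer with wider digests or with a follow-up field comparison.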

It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.

It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.

The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims

1. A method for uniquely identifying data in a database is characterized by comprising the following steps:

2. The method according to claim 1, wherein defining the data identification column specifically includes:

selecting the original data fields;

defining the data identification field.

3. The method according to claim 2, wherein selecting the original data fields means selecting the data columns whose uniqueness needs to be distinguished, and one field, several fields, or all fields can be selected depending on the specific service scenario.

4. The method for uniquely identifying data in a database according to claim 2, wherein the data identification field is defined for storing a result of the data identification calculation.

5. The method for uniquely identifying data in a database according to claim 1, wherein the step of generating the unique data identification comprises:

serializing each selected field into a character string;

sorting the selected fields by field name;

6. The method according to claim 1, wherein, in judging the uniqueness of the data, identical data is considered to exist if the same unique identifier exists, and not to exist otherwise.

7. The method for identifying uniqueness of data in a database according to claim 1, wherein when the same unique identifier is judged to exist, processing the duplicate data according to the needs of the service scenario specifically comprises:

when the data does not need to be retained, discarding the data.