CN113760907A - Data uniqueness identification method in database - Google Patents

Info

Publication number
CN113760907A
Authority
CN
China
Prior art keywords
data
uniqueness
column
identification
identifier
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110883879.9A
Other languages
Chinese (zh)
Inventor
王锦胤
刘海涛
史延莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zijincheng Credit Investigation Co ltd
Original Assignee
Zijincheng Credit Investigation Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2021-08-02
Filing date
2021-08-02
Publication date
2021-12-07
Application filed by Zijincheng Credit Investigation Co ltd
Priority to CN202110883879.9A
Publication of CN113760907A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22: Indexing; Data structures therefor; Storage structures
    • G06F16/221: Column-oriented storage; Management thereof

Abstract

The application discloses a method for identifying data uniqueness in a database. The method comprises: defining a data identification column, which includes selecting the original data fields (the columns whose uniqueness is to be judged) and defining a data identification field (a newly added identification column); generating a unique data identifier from the selected columns and storing the result in the newly added identification column for later use; and judging data uniqueness by comparing the generated identifier with the identifier column of the data already in the database, where a matching identifier means that identical data exists and no match means that it does not. When identical data exists, the duplicate data is processed according to the requirements of the business scenario; when it does not, the new data is stored in the database and an initial data version identifier is added. The method solves the technical problem that existing approaches can only keep data records unique and cannot effectively identify whether the recorded data content is unique.

Description

Data uniqueness identification method in database
Technical Field
The application relates to the field of front-end development, in particular to a data uniqueness identification method in a database.
Background
Current ways of identifying data uniqueness mainly include numeric serial numbers, universally unique identifiers (UUIDs), globally unique identifiers (GUIDs), timestamps, and the like.
The prior art currently on the market has the following defect:
it can only keep each data record unique and cannot effectively identify whether the recorded data content is unique.
No effective solution has yet been proposed for the problem in the related art that the uniqueness of the recorded data content cannot be effectively identified.
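The defect described above can be illustrated with a short Python sketch. The field names and values are hypothetical and are not taken from the application; the point is only that a per-record identifier such as a UUID stays unique even when two records carry identical content.

```python
import uuid

# Two incoming records with exactly the same content (hypothetical fields).
record_a = {"name": "Alice", "phone": "555-0100"}
record_b = {"name": "Alice", "phone": "555-0100"}

# A record-level identifier: every stored row receives a fresh UUID.
row_a = {"row_id": str(uuid.uuid4()), **record_a}
row_b = {"row_id": str(uuid.uuid4()), **record_b}

# The row identifiers differ although the content is the same, so the
# identifier alone cannot reveal that the content is duplicated.
ids_differ = row_a["row_id"] != row_b["row_id"]
content_equal = record_a == record_b
print(ids_differ, content_equal)
```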
Disclosure of Invention
The present application mainly aims to provide a method for identifying uniqueness of data in a database, so as to solve the above problems.
In order to achieve the above object, according to one aspect of the present application, there is provided a data uniqueness identifying method in a database.
The method for identifying data uniqueness in a database comprises the following steps:
defining a data identification column: selecting the columns whose uniqueness needs to be judged, and adding a new data identification column;
generating a unique data identifier and storing the result in the newly added identification column for later use;
judging data uniqueness by comparing the identifier with the identifier column of the data already in the database;
when an identical unique identifier is found, processing the duplicate data according to the requirements of the business scenario;
and when no identical unique identifier is found, storing the new data in the database and adding an initial data version identifier.
Further, defining the data identification column specifically includes:
selecting the original data fields;
defining the data identification field.
Further, selecting the original data fields means selecting the data columns whose uniqueness needs to be distinguished; one field, several fields, or all fields may be selected depending on the specific business scenario.
Further, the data identification field is defined to store the result of the data identifier calculation.
Further, generating the unique data identifier specifically includes:
serializing each selected field into a character string;
sorting the selected fields by field name;
concatenating the serialized character strings in the sorted order;
and performing a hash calculation on the resulting concatenated string, the result of which is used as the unique data identifier.
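The application does not say how field boundaries are represented when the serialized strings are concatenated. The Python sketch below is an illustration rather than the applicants' implementation: it shows why an implementer may want an explicit boundary. The function names, the use of SHA-256, and the JSON array encoding are all assumptions made for this example.

```python
import hashlib
import json

def naive_id(values):
    # Plain concatenation with nothing marking where one field ends.
    return hashlib.sha256("".join(values).encode("utf-8")).hexdigest()

def delimited_id(values):
    # Encoding the values as a JSON array keeps field boundaries explicit.
    encoded = json.dumps(values, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()

left = ["ab", "c"]   # field 1 = "ab", field 2 = "c"
right = ["a", "bc"]  # field 1 = "a",  field 2 = "bc"

# Both lists concatenate to "abc", so the naive identifiers coincide
# even though the field contents differ.
naive_collides = naive_id(left) == naive_id(right)
delimited_collides = delimited_id(left) == delimited_id(right)
print(naive_collides, delimited_collides)
```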
Further, in judging data uniqueness, identical data is considered to exist if an identical unique identifier exists, and is considered not to exist otherwise.
Further, when an identical unique identifier is found, processing the duplicate data according to the requirements of the business scenario specifically includes:
when the data needs to be retained, storing it in the database and updating the data version identifier as required;
when the data does not need to be retained, discarding it.
In the embodiment of the application, a data identification column is defined, a unique data identifier is generated, and the result is stored in the newly added identification column for later use. The identifier is then compared with the unique identifier column of the data already in the database. When an identical identifier is found, the duplicate data is processed according to the requirements of the business scenario; otherwise, the new data is stored in the database and an initial data version identifier is added. This achieves the technical effects of compatibility with existing database data identifier (ID) techniques, a wide range of application scenarios, high efficiency, and resource savings, and solves the technical problem that the prior art cannot effectively identify whether the recorded data content is unique.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, serve to provide a further understanding of the application and to enable other features, objects, and advantages of the application to be more apparent. The drawings and their description illustrate the embodiments of the invention and do not limit it. In the drawings:
FIG. 1 is a flow chart of a method for identifying data uniqueness in a database according to an embodiment of the application;
FIG. 2 is a flow chart of generating a unique data identifier according to an embodiment of the application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In this application, the terms "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "lateral", "longitudinal", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings. These terms are used primarily to better describe the invention and its embodiments and are not intended to limit the indicated devices, elements or components to a particular orientation or to be constructed and operated in a particular orientation.
Moreover, some of the above terms may be used to indicate other meanings besides the orientation or positional relationship, for example, the term "on" may also be used to indicate some kind of attachment or connection relationship in some cases. The specific meanings of these terms in the present invention can be understood by those skilled in the art as appropriate.
Furthermore, the terms "mounted," "disposed," "provided," "connected," and "sleeved" are to be construed broadly. For example, it may be a fixed connection, a removable connection, or a unitary construction; can be a mechanical connection, or an electrical connection; may be directly connected, or indirectly connected through intervening media, or may be in internal communication between two devices, elements or components. The specific meanings of the above terms in the present invention can be understood by those of ordinary skill in the art according to specific situations.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
According to an embodiment of the present invention, as shown in FIG. 1, a method for identifying data uniqueness in a database is provided, which includes the following steps:
Defining a data identification column: selecting the columns whose uniqueness needs to be judged and adding a new data identification column, which specifically includes:
selecting the original data fields, that is, the data columns (data fields) whose uniqueness needs to be distinguished; one field, several fields, or all fields may be selected depending on the specific business scenario;
defining the data identification field, which is used to store the result of the data identifier calculation.
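As a minimal sketch of this step, the following Python snippet uses the standard `sqlite3` module. The table `customer`, its columns, the column name `content_id`, and the index are hypothetical; the application prescribes neither a schema nor a database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# A hypothetical existing table that already has a record-level ID.
conn.execute(
    "CREATE TABLE customer ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, name TEXT, phone TEXT, city TEXT)"
)

# Original data field selection: the columns whose content must be unique.
# Here two fields are chosen; one field or all fields would work the same way.
ID_FIELDS = ("name", "phone")

# Data identification field definition: a new column for the calculated result.
conn.execute("ALTER TABLE customer ADD COLUMN content_id TEXT")

# Assumption: an index is added so that later comparisons are fast.
conn.execute("CREATE INDEX idx_customer_content_id ON customer (content_id)")

# PRAGMA table_info returns one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
print(columns)
```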
Generating a unique data identifier and storing the result in the newly added identification column. As shown in FIG. 2, generating the unique data identifier from the selected columns specifically includes:
serializing each selected field into a character string;
sorting the selected fields by field name;
concatenating the serialized character strings in the sorted order;
performing a hash calculation on the resulting concatenated string, the result of which is used as the unique data identifier.
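The four steps above can be sketched in Python as follows. This is one possible reading, not the applicants' implementation: the function name `content_id`, the use of JSON for serialization, the newline separator, and SHA-256 as the hash function are all assumptions, since the application names none of them.

```python
import hashlib
import json

def content_id(record, fields):
    """Return a content-based identifier for the selected fields of a record."""
    # Step 1: serialize each selected field into a character string.
    # JSON escapes control characters, so a newline never appears in a value.
    serialized = {
        name: json.dumps(record[name], ensure_ascii=False, sort_keys=True)
        for name in fields
    }
    # Step 2: sort the selected fields by field name.
    ordered_names = sorted(serialized)
    # Step 3: concatenate the serialized strings in the sorted order
    # (assumption: a newline marks the boundary between fields).
    joined = "\n".join(serialized[name] for name in ordered_names)
    # Step 4: hash the concatenated string; the digest is the identifier.
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

fields = ("phone", "name")
first = {"name": "Alice", "phone": "555-0100", "city": "Nanjing"}
second = {"phone": "555-0100", "city": "Suzhou", "name": "Alice"}
third = {"name": "Alice", "phone": "555-0199", "city": "Nanjing"}

same = content_id(first, fields) == content_id(second, fields)
different = content_id(first, fields) != content_id(third, fields)
print(same, different)
```

Because the fields are sorted by name before concatenation, the identifier does not depend on the order in which the fields arrive, and fields outside the selection (here `city`) do not affect it.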
Judging data uniqueness: the generated unique identifier is compared with the unique identifier column of the data already in the database. If an identical unique identifier exists, identical data is considered to exist; otherwise, no identical data is considered to exist.
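The comparison itself reduces to a single equality test on the identifier. The sketch below keeps the existing identifiers in an in-memory set purely for illustration; the helper `content_id`, the field names, and SHA-256 are assumptions of this example.

```python
import hashlib
import json

FIELDS = ("name", "phone")  # hypothetical selected fields

def content_id(record):
    # Compact stand-in for the identifier generation step.
    parts = [json.dumps(record[name], ensure_ascii=False) for name in sorted(FIELDS)]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()

# Identifier column of the data already in the database.
existing_rows = [
    {"name": "Alice", "phone": "555-0100"},
    {"name": "Bob", "phone": "555-0101"},
]
existing_ids = {content_id(row) for row in existing_rows}

def exists_in_database(record):
    # One lookup replaces a field-by-field comparison against every row.
    return content_id(record) in existing_ids

duplicate = exists_in_database({"phone": "555-0100", "name": "Alice"})
fresh = exists_in_database({"name": "Carol", "phone": "555-0102"})
print(duplicate, fresh)
```

A matching hash implies identical content only with very high probability, not with certainty. An implementation that cannot tolerate a hash collision could compare the selected fields directly whenever the identifiers match; the application does not describe such a check.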
When identical data is judged to exist, the duplicate data is processed according to the requirements of the business scenario, which specifically includes:
when the data needs to be retained, storing it in the database and updating the data version identifier as required;
when the data does not need to be retained, discarding it.
When identical data does not exist, the new data is stored in the database and an initial data version identifier is added.
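The branching described above can be sketched with Python's `sqlite3` module. The application does not define the form of the version identifier, so this example assumes an integer that starts at 1 and increases by one each time duplicate content is retained; the table layout, `ingest`, and `keep_duplicates` are likewise hypothetical.

```python
import hashlib
import json
import sqlite3

FIELDS = ("name", "phone")  # hypothetical selected fields

def content_id(record):
    parts = [json.dumps(record[name], ensure_ascii=False) for name in sorted(FIELDS)]
    return hashlib.sha256("\n".join(parts).encode("utf-8")).hexdigest()

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (name TEXT, phone TEXT, content_id TEXT, version INTEGER)"
)

def ingest(record, keep_duplicates):
    cid = content_id(record)
    # MAX() over no rows yields NULL, which sqlite3 returns as None.
    latest = conn.execute(
        "SELECT MAX(version) FROM customer WHERE content_id = ?", (cid,)
    ).fetchone()[0]
    if latest is None:
        version = 1              # no identical data: add the initial version
    elif keep_duplicates:
        version = latest + 1     # identical data, retained: update the version
    else:
        return "discarded"       # identical data, not retained
    conn.execute(
        "INSERT INTO customer VALUES (?, ?, ?, ?)",
        (record["name"], record["phone"], cid, version),
    )
    return f"stored v{version}"

record = {"name": "Alice", "phone": "555-0100"}
results = [
    ingest(record, keep_duplicates=False),
    ingest(record, keep_duplicates=True),
    ingest(record, keep_duplicates=False),
]
print(results)
```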
From the above description, it can be seen that the present invention achieves the following technical effects:
in the embodiment of the application, the method is compatible with existing database data identifier (ID) techniques: the identifier can replace the data identifier of the existing database technology, or it can be added as a new field that serves as an auxiliary column alongside the existing data identifier;
the method has a wide range of application scenarios: it can be used in database applications such as relational databases, document databases, and distributed data, and also in big data scenarios such as data cleaning and data deduplication;
the method is efficient and resource-saving: only one identification column of 32 to 256 bits needs to be added, and uniqueness can be judged without comparing all of the required fields.
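The choice of identifier width within the 32 to 256 bit range trades storage against collision risk. The sketch below is illustrative only: it uses BLAKE2b from Python's `hashlib` because its `digest_size` parameter can produce every width in that range, and it estimates the chance of at least one accidental collision with the standard birthday approximation. Neither the hash function nor the record count comes from the application.

```python
import hashlib
import math

payload = '"Alice"\n"555-0100"'.encode("utf-8")

# BLAKE2b accepts digest sizes from 1 to 64 bytes; 4 to 32 bytes gives
# identifiers of 32 to 256 bits.
digest_bits = [
    len(hashlib.blake2b(payload, digest_size=size).digest()) * 8
    for size in (4, 8, 16, 32)
]

def collision_probability(n_records, bits):
    # Birthday approximation: p = 1 - exp(-n(n-1) / (2 * 2^bits)).
    # expm1 keeps precision when the probability is extremely small.
    return -math.expm1(-n_records * (n_records - 1) / 2 / 2 ** bits)

for bits in digest_bits:
    print(bits, collision_probability(100_000, bits))
```

Under this approximation a 32-bit identifier is more likely than not to collide somewhere among 100,000 distinct records, while a 256-bit identifier makes an accidental collision negligible, so the narrow end of the range suits only small data sets or cases where matches are verified against the original fields.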
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (7)

1. A method for identifying data uniqueness in a database, characterized by comprising the following steps:
defining a data identification column: selecting the columns whose uniqueness needs to be judged, and adding a new data identification column;
generating a unique data identifier and storing the result in the newly added identification column for later use;
judging data uniqueness by comparing the identifier with the identifier column of the data already in the database;
when an identical unique identifier is found, processing the duplicate data according to the requirements of the business scenario;
and when no identical unique identifier is found, storing the new data in the database and adding an initial data version identifier.
2. The method according to claim 1, wherein defining the data identification column specifically includes:
selecting the original data fields;
defining the data identification field.
3. The method according to claim 2, wherein selecting the original data fields means selecting the data columns whose uniqueness needs to be distinguished, and one field, several fields, or all fields may be selected depending on the specific business scenario.
4. The method according to claim 2, wherein the data identification field is defined to store the result of the data identifier calculation.
5. The method according to claim 1, wherein generating the unique data identifier specifically includes:
serializing each selected field into a character string;
sorting the selected fields by field name;
concatenating the serialized character strings in the sorted order;
and performing a hash calculation on the resulting concatenated string, the result of which is used as the unique data identifier.
6. The method according to claim 1, wherein, in judging data uniqueness, identical data is considered to exist if an identical unique identifier exists, and is considered not to exist otherwise.
7. The method according to claim 1, wherein, when an identical unique identifier is found, processing the duplicate data according to the requirements of the business scenario specifically includes:
when the data needs to be retained, storing it in the database and updating the data version identifier as required;
when the data does not need to be retained, discarding it.
CN202110883879.9A 2021-08-02 2021-08-02 Data uniqueness identification method in database Pending CN113760907A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110883879.9A CN113760907A (en) 2021-08-02 2021-08-02 Data uniqueness identification method in database

Publications (1)

Publication Number Publication Date
CN113760907A (en) 2021-12-07

Family

ID=78788344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110883879.9A Pending CN113760907A (en) 2021-08-02 2021-08-02 Data uniqueness identification method in database

Country Status (1)

Country Link
CN (1) CN113760907A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210073196A1 (en) * 2019-09-09 2021-03-11 Sap Se Semantic, single-column identifiers for data entries
CN112579623A (en) * 2019-09-29 2021-03-30 北京国双科技有限公司 Method, device, storage medium and equipment for storing data
CN112035571A (en) * 2020-08-19 2020-12-04 深圳乐信软件技术有限公司 Data synchronization method, device, equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385157A (en) * 2023-06-05 2023-07-04 紫金诚征信有限公司 Data processing method and device for credit investigation credit principal identification
CN116385157B (en) * 2023-06-05 2023-08-15 紫金诚征信有限公司 Data processing method and device for credit investigation credit principal identification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination