CN113760907A - Data uniqueness identification method in database - Google Patents
Data uniqueness identification method in database
- Publication number
- CN113760907A
- Authority
- CN
- China
- Prior art keywords
- data
- uniqueness
- column
- identification
- identifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/221—Column-oriented storage; Management thereof
Abstract
The application discloses a data uniqueness identification method in a database. The method comprises: defining a data identification column, which includes selecting the original data fields (the columns whose uniqueness needs to be judged) and defining a data identification field (a newly added data identification column); generating a data uniqueness identifier from the selected columns and storing the result in the newly added identification column for later use; judging the uniqueness of the data by comparing the generated uniqueness identifier with the uniqueness identification column of the existing data in the database, wherein identical data is considered to exist if the same uniqueness identifier exists and is considered not to exist otherwise; when identical data exists, processing the duplicate data according to the requirements of the service scenario; and when identical data does not exist, storing the new data and adding an initial data version identifier. The method solves the technical problem that existing approaches can only keep data records unique and cannot effectively identify the uniqueness of the recorded data content.
Description
Technical Field
The application relates to the field of front-end development, in particular to a data uniqueness identification method in a database.
Background
Current approaches to data uniqueness identification mainly include numeric serial numbers, Universally Unique Identifiers (UUID), Globally Unique Identifiers (GUID), timestamps, and the like.
The prior art currently on the market has the following defect:
only the uniqueness of the data record can be maintained; the uniqueness of the recorded data content cannot be effectively identified.
No effective solution has yet been proposed for the problem in the related art that the uniqueness of the recorded data content cannot be effectively identified.
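The limitation described above can be illustrated with a minimal Python sketch. The record fields and values are hypothetical and are not taken from the application; the sketch only shows that record-level identifiers such as UUIDs differ even when the recorded content is identical.

```python
import uuid

# Two records with identical content (hypothetical example data).
record_a = {"name": "Alice", "city": "Beijing"}
record_b = {"name": "Alice", "city": "Beijing"}

# Record-level identifiers: each record receives its own random UUID.
id_a = uuid.uuid4()
id_b = uuid.uuid4()

# The identifiers distinguish the two records, but they carry no
# information about whether the recorded content is the same.
content_equal = record_a == record_b
ids_equal = id_a == id_b
```

Here `content_equal` is true while `ids_equal` is false, so the identifier alone cannot reveal that the same content was recorded twice.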
Disclosure of Invention
The present application mainly aims to provide a method for identifying uniqueness of data in a database, so as to solve the above problems.
In order to achieve the above object, according to one aspect of the present application, there is provided a data uniqueness identifying method in a database.
The data uniqueness identification method in the database comprises the following steps:
defining a data identification column: selecting the related columns whose uniqueness needs to be judged, and adding a new data identification column;
generating a data uniqueness identifier, and storing the generated result in the newly added identification column for later use;
judging the uniqueness of the data by comparing its uniqueness identifier with the uniqueness identification column of the existing data in the database;
when the same uniqueness identifier is judged to exist, processing the duplicate data according to the requirements of the service scenario;
and when the same uniqueness identifier does not exist, storing the new data and adding an initial data version identifier.
Further, defining the data identification column specifically includes:
selecting the original data fields;
defining the data identification field.
Further, selecting the original data fields means selecting the data columns whose uniqueness needs to be distinguished; one field, several fields, or all fields may be selected depending on the specific service scenario.
Further, the data identification field is defined to store the result of the data identification calculation.
Further, the specific steps of generating the data uniqueness identifier include:
serializing each selected field into a character string;
sorting the selected fields by field name;
concatenating the serialized character strings in the sorted order;
and performing a hash calculation on the concatenated result, the calculation result being used as the data uniqueness identifier.
Further, in judging the uniqueness of the data, identical data is considered to exist if the same uniqueness identifier exists, and is considered not to exist otherwise.
Further, when the same uniqueness identifier is judged to exist, processing the duplicate data according to the requirements of the service scenario specifically includes:
when the data needs to be retained, storing the data and updating the data version identifier as required;
when the data does not need to be retained, discarding the data.
In the embodiments of the application, a data identification column is defined, a data uniqueness identifier is generated and stored in the newly added identification column for later use, and that column is compared with the uniqueness identification column of the existing data in the database. When the same identifier is found, the duplicate data is processed according to the requirements of the service scenario; otherwise, the new data is stored and an initial data version identifier is added. This achieves the technical effects of compatibility with existing database data identifier (ID) technology, wide applicability, high efficiency, and resource savings, and solves the technical problem in the prior art that the uniqueness of recorded data content cannot be effectively identified.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this application, serve to provide a further understanding of the application and to enable other features, objects, and advantages of the application to be more apparent. The drawings and their description illustrate the embodiments of the invention and do not limit it. In the drawings:
FIG. 1 is a flow chart of a method for identifying uniqueness of data in a database according to an embodiment of the application;
FIG. 2 is a flow chart of generating a data uniqueness identifier according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be used. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In this application, the terms "upper", "lower", "left", "right", "front", "rear", "top", "bottom", "inner", "outer", "middle", "vertical", "horizontal", "lateral", "longitudinal", and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings. These terms are used primarily to better describe the invention and its embodiments and are not intended to limit the indicated devices, elements or components to a particular orientation or to be constructed and operated in a particular orientation.
Moreover, some of the above terms may be used to indicate other meanings besides the orientation or positional relationship, for example, the term "on" may also be used to indicate some kind of attachment or connection relationship in some cases. The specific meanings of these terms in the present invention can be understood by those skilled in the art as appropriate.
Furthermore, the terms "mounted," "disposed," "provided," "connected," and "sleeved" are to be construed broadly. For example, it may be a fixed connection, a removable connection, or a unitary construction; can be a mechanical connection, or an electrical connection; may be directly connected, or indirectly connected through intervening media, or may be in internal communication between two devices, elements or components. The specific meanings of the above terms in the present invention can be understood by those of ordinary skill in the art according to specific situations.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
According to an embodiment of the present invention, as shown in FIG. 1, there is provided a method for identifying uniqueness of data in a database, the method including the following steps:
defining a data identification column, that is, selecting the related columns whose uniqueness needs to be judged and adding a new data identification column, which specifically comprises the following steps:
selecting the original data fields: selecting the data columns (data fields) whose uniqueness needs to be distinguished, where one field, several fields, or all fields may be selected depending on the specific service scenario;
defining the data identification field, which is used to store the result of the data identification calculation.
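The column definition step can be sketched as follows, using Python's built-in `sqlite3` module with an in-memory database. The table name `customer`, the selected fields, the column name `content_id`, and the index are all assumptions made for illustration; the application does not prescribe a database engine or naming.

```python
import sqlite3

# Hypothetical table: "name" and "city" are the columns whose content
# must be judged for uniqueness; "id" is an ordinary record identifier.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT, note TEXT)"
)

# Select the original data fields (one, several, or all fields).
selected_fields = ["name", "city"]

# Add the new data identification column that will hold the computed
# identifier. The index is an optional assumption to speed up lookups.
conn.execute("ALTER TABLE customer ADD COLUMN content_id TEXT")
conn.execute("CREATE INDEX idx_customer_content_id ON customer (content_id)")

# Read back the column names to confirm the schema change.
columns = [row[1] for row in conn.execute("PRAGMA table_info(customer)")]
```

Keeping the existing `id` column alongside `content_id` corresponds to using the identifier as an auxiliary column rather than as a replacement for the record identifier.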
Generating a data uniqueness identifier and storing the generated result in the newly added identification column. As shown in FIG. 2, the specific steps of generating the data uniqueness identifier from the selected columns include:
serializing each selected field into a character string;
sorting the selected fields by field name;
concatenating the serialized character strings in the sorted order;
and performing a hash calculation on the concatenated result, the calculation result being used as the data uniqueness identifier.
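The four generation steps above can be sketched in Python as follows. The application does not name a serialization format, a separator, or a hash algorithm, so the use of JSON, the unit-separator character, and SHA-256, as well as the function name `content_identifier`, are assumptions.

```python
import hashlib
import json


def content_identifier(record, selected_fields):
    """Compute a content identifier from the selected fields of a record."""
    # Sort the selected fields by field name so that the order in which
    # fields are listed does not affect the result.
    ordered = sorted(selected_fields)
    # Serialize each selected field value to a string (JSON is assumed).
    parts = [
        json.dumps(record[name], ensure_ascii=False, sort_keys=True)
        for name in ordered
    ]
    # Concatenate in sorted order; the separator is an assumption that
    # keeps adjacent values from running together ambiguously.
    joined = "\x1f".join(parts)
    # Hash the concatenated string (SHA-256 is assumed).
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


a = content_identifier({"name": "Alice", "city": "Beijing", "note": "x"}, ["name", "city"])
b = content_identifier({"city": "Beijing", "name": "Alice", "note": "y"}, ["city", "name"])
c = content_identifier({"name": "Alice", "city": "Shanghai", "note": "x"}, ["name", "city"])
```

In this sketch `a` and `b` are equal because only the selected fields contribute and their order is normalized, while `c` differs because one selected value changed.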
Judging the uniqueness of the data: the generated uniqueness identifier is compared with the uniqueness identification column of the existing data in the database. If the same uniqueness identifier exists, identical data is considered to exist; if it does not, identical data is considered not to exist.
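The comparison step can be sketched as a single lookup on the identification column. The table layout, the column name `content_id`, the helper `content_exists`, and the placeholder identifier strings are hypothetical; a real identifier would be the hash computed in the previous step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, content_id TEXT)"
)
# One existing row with a placeholder identifier value.
conn.execute(
    "INSERT INTO customer (name, content_id) VALUES (?, ?)",
    ("Alice", "hash-of-alice"),
)


def content_exists(conn, content_id):
    """Return True if a row with the same content identifier is stored."""
    row = conn.execute(
        "SELECT 1 FROM customer WHERE content_id = ? LIMIT 1", (content_id,)
    ).fetchone()
    return row is not None


found = content_exists(conn, "hash-of-alice")
missing = content_exists(conn, "hash-of-bob")
```

Because the check compares hash values, two different contents that happen to share a hash would be treated as identical; with a cryptographic hash this is improbable but not impossible.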
When identical data is judged to exist, the duplicate data is processed according to the requirements of the service scenario, which specifically comprises:
when the data needs to be retained, storing the data and updating the data version identifier as required;
when the data does not need to be retained, discarding the data.
When identical data does not exist, the new data is stored and an initial data version identifier is added.
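The three outcomes described above (store with an initial version, store with an updated version, or discard) can be sketched with an in-memory dictionary. The function name `ingest`, the `keep_duplicates` flag, and the choice of integer versions starting at 1 are assumptions; the application leaves the form of the version identifier open.

```python
def ingest(store, content_id, record, keep_duplicates):
    """Apply the duplicate-handling rules to one incoming record.

    store maps content_id -> list of (version, record) tuples.
    """
    versions = store.get(content_id)
    if versions is None:
        # No identical data: store it with an initial version identifier.
        store[content_id] = [(1, record)]
        return "stored"
    if keep_duplicates:
        # Identical data that must be retained: store it and update the
        # version identifier.
        next_version = versions[-1][0] + 1
        versions.append((next_version, record))
        return "versioned"
    # Identical data that need not be retained: discard it.
    return "discarded"


store = {}
r1 = ingest(store, "h1", {"name": "Alice"}, keep_duplicates=True)
r2 = ingest(store, "h1", {"name": "Alice"}, keep_duplicates=True)
r3 = ingest(store, "h1", {"name": "Alice"}, keep_duplicates=False)
stored_versions = [version for version, _ in store["h1"]]
```

Whether duplicates are retained is a per-scenario decision, which is why it is passed in as a parameter rather than fixed in the function.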
From the above description, it can be seen that the present invention achieves the following technical effects:
the method is compatible with existing database data identifier (ID) technology: it can replace the data identifier of the existing database technology, or be added as a new field that serves as an auxiliary column to the existing data identifier;
the method has a wide range of application scenarios: it can be used in database applications such as relational databases, document databases, and distributed databases, and in big data scenarios such as data cleaning and data deduplication;
the method is efficient and saves resources: only one identification column of 32 to 256 bits needs to be added, and the required fields do not all have to be compared one by one.
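The 32 to 256 bit range can be related to common hash functions with a short sketch. The application does not name specific algorithms, so CRC-32, BLAKE2b with a 16-byte digest, and SHA-256 are chosen here only as assumed examples of 32-bit, 128-bit, and 256-bit identifiers.

```python
import hashlib
import zlib

# Hypothetical concatenated field string to be hashed.
payload = "Alice\x1fBeijing".encode("utf-8")

# 32-bit identifier: CRC-32 returns an unsigned 32-bit integer.
crc32_value = zlib.crc32(payload)

# 128-bit identifier: BLAKE2b configured for a 16-byte digest.
blake2_bits = hashlib.blake2b(payload, digest_size=16).digest_size * 8

# 256-bit identifier: SHA-256 produces a 32-byte digest.
sha256_bits = hashlib.sha256(payload).digest_size * 8
```

Shorter identifiers take less storage but make accidental collisions between different contents more likely, so the width is a trade-off to be chosen per scenario.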
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowcharts, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and they may alternatively be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, or fabricated separately as individual integrated circuit modules, or fabricated as a single integrated circuit module from multiple modules or steps. Thus, the present invention is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present application and is not intended to limit the present application, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present application shall be included in the protection scope of the present application.
Claims (7)
1. A data uniqueness identification method in a database, characterized by comprising the following steps:
defining a data identification column: selecting the related columns whose uniqueness needs to be judged, and adding a new data identification column;
generating a data uniqueness identifier, and storing the generated result in the newly added identification column for later use;
judging the uniqueness of the data by comparing its uniqueness identifier with the uniqueness identification column of the existing data in the database;
when the same uniqueness identifier is judged to exist, processing the duplicate data according to the requirements of the service scenario;
and when the same uniqueness identifier does not exist, storing the new data and adding an initial data version identifier.
2. The method according to claim 1, wherein defining the data identification column specifically includes:
selecting the original data fields;
defining the data identification field.
3. The method according to claim 2, wherein selecting the original data fields means selecting the data columns whose uniqueness needs to be distinguished, and one field, multiple fields, or all fields can be selected according to the specific service scenario.
4. The method according to claim 2, wherein the data identification field is defined to store the result of the data identification calculation.
5. The method according to claim 1, wherein the specific steps of generating the data uniqueness identifier comprise:
serializing each selected field into a character string;
sorting the selected fields by field name;
concatenating the serialized character strings in the sorted order;
and performing a hash calculation on the concatenated result, the calculation result being used as the data uniqueness identifier.
6. The method according to claim 1, wherein, in judging the uniqueness of the data, identical data is considered to exist if the same uniqueness identifier exists, and is considered not to exist otherwise.
7. The method according to claim 1, wherein, when the same uniqueness identifier is judged to exist, processing the duplicate data according to the requirements of the service scenario specifically comprises:
when the data needs to be retained, storing the data and updating the data version identifier as required;
when the data does not need to be retained, discarding the data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110883879.9A CN113760907A (en) | 2021-08-02 | 2021-08-02 | Data uniqueness identification method in database |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113760907A (en) | 2021-12-07 |
Family
ID=78788344
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110883879.9A Pending CN113760907A (en) | 2021-08-02 | 2021-08-02 | Data uniqueness identification method in database |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113760907A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112035571A (en) * | 2020-08-19 | 2020-12-04 | 深圳乐信软件技术有限公司 | Data synchronization method, device, equipment and storage medium |
US20210073196A1 (en) * | 2019-09-09 | 2021-03-11 | Sap Se | Semantic, single-column identifiers for data entries |
CN112579623A (en) * | 2019-09-29 | 2021-03-30 | 北京国双科技有限公司 | Method, device, storage medium and equipment for storing data |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116385157A (en) * | 2023-06-05 | 2023-07-04 | 紫金诚征信有限公司 | Data processing method and device for credit investigation credit principal identification |
CN116385157B (en) * | 2023-06-05 | 2023-08-15 | 紫金诚征信有限公司 | Data processing method and device for credit investigation credit principal identification |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111459985B (en) | | Identification information processing method and device |
EP2924594A1 (en) | | Data encoding and corresponding data structure in a column-store database |
CN104794123A (en) | | Method and device for establishing NoSQL database index for semi-structured data |
CN106874281B (en) | | Method and device for realizing database read-write separation |
CN107092686B (en) | | File management method and device based on cloud storage platform |
CN110490761B (en) | | Power grid distribution network equipment ledger data model modeling method |
CN113326264A (en) | | Data processing method, server and storage medium |
CN113760907A (en) | | Data uniqueness identification method in database |
CN112307318A (en) | | Content publishing method, system and device |
CN110018845A (en) | | Metadata version control methods and device |
CN107291938A (en) | | Order Query System and method |
CN111897837B (en) | | Data query method, device, equipment and medium |
CN109672608B (en) | | Method for transmitting messages according to time |
CN105740251B (en) | | Method and system for integrating different content sources in bus mode |
CN116521956A (en) | | Graph database query method and device, electronic equipment and storage medium |
CN110659393A (en) | | Method and system for generating xml code |
CN105426676A (en) | | Drilling data processing method and system |
CN111563123B (en) | | Real-time synchronization method for hive warehouse metadata |
CN114936269A (en) | | Document searching platform, searching method, device, electronic equipment and storage medium |
CN111666278B (en) | | Data storage method, data retrieval method, electronic device and storage medium |
CN112905847A (en) | | Tree structure construction method and device |
CN107888415B (en) | | Network management system data maintenance method |
CN111258955A (en) | | File reading method and system, storage medium and computer equipment |
CN111782886A (en) | | Method and device for managing metadata |
CN103823671A (en) | | Control establishing method, control invoking method and control invoking system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||