CN116010523A - System and method for storing and retrieving mass electronic certificates in distributed database - Google Patents


Info

Publication number
CN116010523A
Authority
CN
China
Prior art keywords
data
distributed database
license
electronic
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310028091.9A
Other languages
Chinese (zh)
Inventor
吴志雄
吴浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Linewell Software Co Ltd
Original Assignee
Linewell Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Linewell Software Co Ltd filed Critical Linewell Software Co Ltd
Priority to CN202310028091.9A priority Critical patent/CN116010523A/en
Publication of CN116010523A publication Critical patent/CN116010523A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a system and a method for storing and retrieving mass electronic licenses in a distributed database, comprising an electronic license data storage step, an electronic license data retrieval step, and an electronic license data consistency maintenance step. Because electronic license data is large in scale, the invention stores the data in a distributed database and the index data in Elasticsearch, so that storage and retrieval are handled by different components. This combines the strengths of the distributed database (complete stored data, easy maintenance, and transaction management support) with those of Elasticsearch (automatic indexing of all fields and fast retrieval and statistics).

Description

System and method for storing and retrieving mass electronic certificates in distributed database
Technical Field
The invention relates to a system and a method for storing and retrieving mass electronic certificates in a distributed database.
Background
An electronic license is any legally valid electronic document issued by an authority, such as an identity card, a marriage certificate, a bank repayment statement certificate, or a business license. The volume of electronic license data in a region depends mainly on the number of natural persons and legal persons in that region. A typical city has a population of about 4 million to 10 million and about 500,000 legal entities, which gives roughly 40 million electronic licenses. A provincial electronic license platform is estimated to hold about 400 million licenses, and a very large province can reach 1 billion. A single database cannot manage data volumes of 40 million to 1 billion records, so a distributed database is used for storage and management. A distributed database system typically uses smaller computer systems, each of which may be located at a separate site. Each computer may hold a full or partial copy of the DBMS and its own local database, and the computers at the different sites are interconnected by a network to form a complete, global database that is logically centralized and physically distributed. As a rule of thumb, one physical table should store 5 million to 8 million records. Storing 40 million records therefore requires 5 to 8 tables, and storing 1 billion records requires 125 to 200 tables.
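The table counts quoted above follow from dividing the total record count by the per-table capacity and rounding up. A minimal sketch of that arithmetic, in which the function name `tables_needed` and the labels are illustrative assumptions rather than anything defined in the application:

```python
def tables_needed(total_rows: int, rows_per_table: int) -> int:
    """Number of physical tables needed to hold total_rows (ceiling division)."""
    return (total_rows + rows_per_table - 1) // rows_per_table

# Per-table capacity range quoted in the text: 5 to 8 million records.
LOW_CAPACITY, HIGH_CAPACITY = 5_000_000, 8_000_000

for label, total in [("city", 40_000_000), ("very large province", 1_000_000_000)]:
    fewest = tables_needed(total, HIGH_CAPACITY)  # fuller tables, fewer of them
    most = tables_needed(total, LOW_CAPACITY)     # emptier tables, more of them
    print(f"{label}: {fewest} to {most} tables")
# city: 5 to 8 tables
# very large province: 125 to 200 tables
```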
The closest prior patent applications are Nos. 201410283343.3 and 202110118827.2, which each disclose a method and device for querying or accessing a distributed database. Their main technique is to parse the query statement, reorganize the SQL query, dispatch it to each distributed database, and finally aggregate the query results. In this approach the data is stored in a distributed database and the retrieval problem is addressed by optimizing query statements, mainly by deconstructing and recombining them through syntax analysis. Because it approaches distributed retrieval from the query side, it depends on database indexes being set up correctly and requires operations staff to maintain those indexes continuously. In practice, however, many query problems are hard to solve from the query side alone, because database search speed depends heavily on the database indexes: if the queried fields cannot use an index, a full table scan and comparison is required. In addition, the distributed data is usually not stored in any particular order. If the query requires sorting, each table must first be sorted and the results then merged and sorted in memory, which is very inefficient and may fail to meet the retrieval requirement at all. Moreover, a modern information system must continually meet new user requirements: new fields, new queries, and new statistics are unavoidable, so operations staff must frequently add indexes to the database. Without the index, the search query cannot be completed, and adding indexes to distributed tables is laborious, often causes confusion, and may even require taking the service offline for special handling.
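The sorting cost attributed to the query-side approach can be illustrated with a small scatter-gather sketch. The shard contents and the function name `scatter_gather_sorted` are hypothetical; the point is only that every shard must be sorted before the coordinator can merge the results in memory.

```python
import heapq
from itertools import islice

# Hypothetical shards: each holds (issue_date, license_id) rows in arbitrary order.
shards = [
    [("2023-01-05", "A3"), ("2023-01-01", "A1")],
    [("2023-01-04", "B2"), ("2023-01-02", "B1")],
    [("2023-01-03", "C1")],
]

def scatter_gather_sorted(shards, limit):
    """Sort every shard locally, then k-way merge the sorted results in memory."""
    locally_sorted = [sorted(rows) for rows in shards]  # one sort per shard
    merged = heapq.merge(*locally_sorted)               # coordinator-side merge
    return list(islice(merged, limit))

first_page = scatter_gather_sorted(shards, 3)
print(first_page)
```

Even to return three rows, all shards are sorted in full, which is the inefficiency the passage describes.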
Elasticsearch is an open-source distributed search engine written in Java and built on Apache Lucene. Elasticsearch encapsulates and extends Lucene, making storage, indexing, and searching faster and easier. The prior patents CN201910201797 and CN201910792472 both store data directly in Elasticsearch and retrieve it through Elasticsearch. The drawback of this approach is that Elasticsearch is the only data store, so the solution is bound by the characteristics of Elasticsearch, which is weak in data association, transaction management, and maintainability. Because associations are lacking, all related data generally has to be placed in the same index, which makes individual documents very large, produces a great deal of redundant data, and makes updates difficult: sometimes a simple change to mapped data requires traversing and rewriting the entire index. Because Elasticsearch does not support transactions, and because associations are weak, all information has to be written into one index to sidestep transaction consistency problems. And because data management permissions are coarse-grained (visitor authentication is generally done only through X-Pack, and the actual data management permissions are less complete than those of a relational database), maintainability is insufficient.
Disclosure of Invention
The invention aims to provide a system and a method for storing and retrieving mass electronic licenses in a distributed database, which store the data in a distributed database and the index in Elasticsearch, so that storage and retrieval are handled by different components. This preserves the integrity, maintainability, and transaction management of the data store while making retrieval and statistics easy.
The invention relates to a method for storing and retrieving mass electronic certificates in a distributed database, which comprises the following steps:
Step 1: electronic license data storage
The electronic license data comprises 13 basic metadata items, several management control fields, and the face data and detail data of the electronic license. All of these are stored in a distributed database as the stored data, which is hash-distributed across the distributed database by the license primary key. At the same time, the stored data is packaged into asynchronous messages and sent to a Kafka queue, forming an asynchronous message queue. After receiving an asynchronous message, the queue consumer extracts the index fields of the stored data and writes them into Elasticsearch to form an index table. The index fields comprise the license data primary key, the 13 basic metadata items, and some of the management control fields.
Step 2: electronic license data retrieval
When a retrieval requester performs a search, a query statement is executed in Elasticsearch to obtain a list containing the license data primary key, the 13 basic metadata items, and some of the management control fields. The stored data for each electronic license is then fetched from the distributed database by the license data primary key and returned to the retrieval requester, who thereby obtains the complete license data.
Step 3: electronic license data consistency maintenance
The data from a preset number of preceding days in the distributed database and in Elasticsearch is compared periodically. The comparison result is analyzed and data consistency processing is performed: surplus data is deleted, missing data is added, and inconsistent data is updated.
The invention relates to a system for storing and retrieving mass electronic licenses in a distributed database, comprising a storage module, a retrieval module, and a data consistency maintenance module, wherein:
The storage module comprises a distributed database and Elasticsearch. The 13 basic metadata items, the management control fields, the face data, and the detail data of the electronic license data are stored in the distributed database as the stored data, which is hash-distributed across the distributed database by the license primary key. At the same time, the stored data is packaged into asynchronous messages and sent to a Kafka queue, forming an asynchronous message queue. After receiving an asynchronous message, the queue consumer extracts the index fields of the stored data and writes them into Elasticsearch to form an index table. The index fields comprise the license data primary key, the 13 basic metadata items, and some of the management control fields.
The retrieval module obtains the search request submitted by a retrieval requester, executes a query statement in Elasticsearch, and obtains a list containing the license data primary key, the 13 basic metadata items, and some of the management control fields. It then fetches the stored data for each electronic license from the distributed database by the license data primary key and returns the stored data to the retrieval requester.
The data consistency maintenance module periodically compares the data from a preset number of preceding days in the distributed database and in Elasticsearch, analyzes the comparison result, and performs data consistency processing: surplus data is deleted, missing data is added, and inconsistent data is updated.
A computer device comprising a processor and a memory, the memory storing a computer program, characterized in that the computer program, when executed by the processor, implements the method for storing and retrieving mass electronic licenses in a distributed database.
A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for storing and retrieving mass electronic licenses in a distributed database.
Because electronic license data is large in scale, the invention stores the data in a distributed database and the index data in Elasticsearch, so that storage and retrieval are handled by different components. This combines the strengths of the distributed database (complete stored data; easy maintenance, including backup, recovery, and expansion; and transaction management support) with those of Elasticsearch (automatic indexing of all fields and fast retrieval and statistics).
Drawings
FIG. 1 is a flow chart of the storage module of the present invention;
FIG. 2 is a flow chart of the search module of the present invention;
FIG. 3 is a flow chart of a data consistency maintenance module of the present invention.
The invention is described in further detail below with reference to the drawings and the specific examples.
Detailed Description
It is to be understood that the specific embodiments described herein merely illustrate the invention and do not limit it. For convenience of description, only the portions related to the invention are shown in the drawings. Where no conflict arises, the embodiments and the features of the embodiments may be combined with each other. Any combination of these technical features that involves no contradiction should be considered within the scope of this specification.
Example 1
Referring to FIGS. 1-3, in a first embodiment, the present invention provides a method for storing and retrieving mass electronic licenses in a distributed database, comprising the following steps:
Step 1: electronic license data storage
The electronic license data comprises 13 basic metadata items, several management control fields, and the face data and detail data of the electronic license. Retrieval relies mainly on the 13 basic metadata items and the management control fields; the face data is generally not searched, because the face data and detail data differ greatly between license types (for example marriage certificates, birth certificates, and identity cards). The 13 basic metadata items, the management control fields, the face data, and the detail data are stored in a distributed database as the stored data, which is hash-distributed across the distributed database by the license primary key. At the same time, the stored data is packaged into asynchronous messages and sent to a Kafka queue, forming an asynchronous message queue. After receiving an asynchronous message, the queue consumer extracts the index fields of the stored data and writes them into Elasticsearch to form an index table. The index fields comprise the license data primary key, the 13 basic metadata items, and some of the management control fields (not all of them; which ones are included is determined by the business characteristics of the system).
Step 2: electronic license data retrieval
When a retrieval requester performs a search, a query statement is executed in Elasticsearch to obtain a list containing the license data primary key, the 13 basic metadata items, and some of the management control fields. The stored data for each electronic license is then fetched from the distributed database by the license data primary key and returned to the retrieval requester, who thereby obtains the complete license data.
Step 3: electronic license data consistency maintenance
The data of the previous day (the period can be preset) in the distributed database and in Elasticsearch is compared periodically. The comparison result is analyzed; it may show surplus data, missing data, or inconsistent data content. Data consistency processing is then performed: surplus data is deleted, missing data is added, and inconsistent data is updated.
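The write path of step 1 (hash distribution by primary key, followed by an asynchronous hand-off to the indexer) can be sketched in memory as follows. This is an illustration under stated assumptions, not the applicant's implementation: a `dict` per shard stands in for the distributed database, `queue.Queue` stands in for the Kafka queue, and the shard count, the field names, and the functions `shard_for`, `store_license`, and `drain_queue` are all hypothetical.

```python
import hashlib
import json
import queue

NUM_SHARDS = 8  # assumed shard count; the text does not specify one

def shard_for(license_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a license primary key to a shard using a stable hash."""
    digest = hashlib.md5(license_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

shards = {i: {} for i in range(NUM_SHARDS)}  # stand-in for the distributed database
index_queue = queue.Queue()                  # stand-in for the Kafka queue

def store_license(record: dict) -> int:
    """Write the full record to its shard, then enqueue it for indexing."""
    shard_id = shard_for(record["license_id"])
    shards[shard_id][record["license_id"]] = record
    index_queue.put(json.dumps(record))      # asynchronous message
    return shard_id

def drain_queue() -> list:
    """Consumer side: read every pending message (the index write is omitted)."""
    messages = []
    while not index_queue.empty():
        messages.append(json.loads(index_queue.get()))
    return messages

record = {"license_id": "L-0001", "holder_name": "Zhang San",
          "face_data": {"address": "example"}}
sid = store_license(record)
pending = drain_queue()
```

A cryptographic digest is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, so it would not route the same key to the same shard across restarts.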
Example 2
In a second embodiment, the invention provides a system for storing and retrieving mass electronic licenses in a distributed database, comprising a storage module, a retrieval module, and a data consistency maintenance module, wherein:
The storage module comprises a distributed database and Elasticsearch. The 13 basic metadata items and the management control fields used for searching, together with the face data and detail data of the electronic license, are stored in the distributed database as the stored data, which is hash-distributed across the distributed database by the license primary key. At the same time, the stored data is packaged into asynchronous messages and sent to a Kafka queue, forming an asynchronous message queue. After receiving an asynchronous message, the queue consumer extracts the index fields of the stored data and writes them into Elasticsearch to form an index table. The index fields comprise the license data primary key, the 13 basic metadata items, and some of the management control fields.
The retrieval module obtains the search request submitted by a retrieval requester, executes a query statement in Elasticsearch, and obtains a list containing the license data primary key, the 13 basic metadata items, and some of the management control fields. It then fetches the stored data for each electronic license from the distributed database by the license data primary key and returns the stored data to the retrieval requester.
The data consistency maintenance module periodically compares the data from a preset number of preceding days in the distributed database and in Elasticsearch, analyzes the comparison result, and performs data consistency processing: surplus data is deleted, missing data is added, and inconsistent data is updated.
Example 3
In a third embodiment, the present invention provides a computer device comprising a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities, the memory comprising a non-volatile storage medium, an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The computer program, when executed by the processor, implements the method for storing and retrieving the massive electronic certificates in the distributed database according to the first embodiment. The first and third embodiments have the same functions and advantageous effects.
Example 4
In a fourth embodiment, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the method of the first embodiment for storing and retrieving mass electronic licenses in a distributed database. The first and fourth embodiments have the same functions and advantageous effects.
The total volume of stored electronic license data is between 40 million and 1 billion records. This is too large for a single database table but not large enough to warrant a big-data Hadoop cluster, so a distributed database is the most suitable store.
The electronic license data in the invention comprises 13 basic metadata items, several management control fields, and the face data and detail data of the license. Retrieval relies mainly on the 13 basic metadata items and some of the management control fields. The face data is generally not searched, because the face data and detail data differ greatly between license types (for example marriage certificates, birth certificates, and identity cards). The face data is also often large. If all of it were stored in Elasticsearch, every field would be indexed, which would degrade insert performance and real-time behavior. The invention therefore stores only the fields that need to be searched in Elasticsearch, forming an index table, so as to lighten the data to be searched.
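Keeping only the searchable fields amounts to projecting each stored record onto a fixed field list before it is indexed. The sketch below shows one way to do that; the field names are hypothetical, since the application does not enumerate the 13 basic metadata items or the management control fields, and `to_index_document` is an illustrative helper.

```python
# Hypothetical searchable fields; the application does not list the real ones.
INDEX_FIELDS = ("license_id", "license_type", "holder_name",
                "issuer", "issue_date", "status")

def to_index_document(record: dict, index_fields=INDEX_FIELDS) -> dict:
    """Keep only the searchable fields; face data and detail data are dropped."""
    return {name: record[name] for name in index_fields if name in record}

record = {
    "license_id": "L-0001",
    "license_type": "marriage_certificate",
    "holder_name": "Zhang San",
    "issuer": "Civil Affairs Bureau",
    "issue_date": "2023-01-09",
    "status": "valid",
    "face_data": {"spouse_name": "Li Si", "registration_no": "example"},
    "detail_data": {"notes": "example"},
}
doc = to_index_document(record)
```

The resulting document carries the primary key, so a search hit can always be traced back to the full record in the database.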
Although Elasticsearch is strong at indexed queries, it has drawbacks: it supports neither transaction management nor table association. License data, as important factual data, certainly requires transaction management; it also has association relationships with other tables in the database, so association queries must be supported. The invention therefore queries Elasticsearch for the license data primary key and then queries the distributed database by that primary key to obtain the complete license data. Some management control fields relate to specific services, and the various management control schemes cannot all be listed here. As an example of an association relationship, an electronic license may have several related photographs; these are not held in Elasticsearch but in a separate file table, and once the license primary key is obtained, all of the photographs can be queried through the association table.
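The two-phase lookup described here (search the index for primary keys, then fetch complete records by key) can be sketched with plain dictionaries. The `index` dictionary stands in for Elasticsearch and `database` for the distributed database; the exact-match filter, the sample data, and the function names are assumptions made for illustration only.

```python
def search_index(index: dict, **criteria) -> list:
    """Phase 1: return the primary keys of index documents matching all criteria."""
    return [key for key, doc in index.items()
            if all(doc.get(field) == value for field, value in criteria.items())]

def fetch_full_records(database: dict, keys: list) -> list:
    """Phase 2: look up the complete stored records by primary key."""
    return [database[key] for key in keys if key in database]

def retrieve(index: dict, database: dict, **criteria) -> list:
    """Run both phases and return complete license records."""
    return fetch_full_records(database, search_index(index, **criteria))

# Hypothetical sample data: the index holds only searchable fields.
index = {
    "L-1": {"license_type": "business", "holder_name": "Acme"},
    "L-2": {"license_type": "marriage", "holder_name": "Li Si"},
    "L-3": {"license_type": "business", "holder_name": "Beta"},
}
database = {
    key: {"license_id": key, **doc, "face_data": {"note": "example"}}
    for key, doc in index.items()
}
results = retrieve(index, database, license_type="business")
```

Associated rows, such as the photographs mentioned above, would be fetched in the same second phase by joining on the license primary key.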
Because the complete data is stored in the distributed database and the index is stored in Elasticsearch, the data on the two sides must be kept consistent, which is why a data consistency maintenance module is provided.
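The consistency processing (delete surplus data, add missing data, update inconsistent data) can be sketched as a set comparison over primary keys. The sketch assumes the distributed database is the source of truth, which the text implies but does not state outright; the functions `project` and `reconcile` and the sample rows are hypothetical.

```python
def project(row: dict) -> dict:
    """Hypothetical projection of a database row onto its indexed fields."""
    return {name: row[name] for name in ("license_id", "status")}

def reconcile(db_rows: dict, index_docs: dict, project) -> dict:
    """Bring index_docs in line with db_rows and report the keys touched."""
    expected = {key: project(row) for key, row in db_rows.items()}
    extra = sorted(index_docs.keys() - expected.keys())    # only in the index
    missing = sorted(expected.keys() - index_docs.keys())  # only in the database
    stale = sorted(key for key in expected.keys() & index_docs.keys()
                   if index_docs[key] != expected[key])    # content differs
    for key in extra:
        del index_docs[key]
    for key in missing + stale:
        index_docs[key] = expected[key]
    return {"deleted": extra, "added": missing, "updated": stale}

db_rows = {
    "L-1": {"license_id": "L-1", "status": "valid", "face_data": {"a": 1}},
    "L-2": {"license_id": "L-2", "status": "revoked", "face_data": {"a": 2}},
    "L-3": {"license_id": "L-3", "status": "valid", "face_data": {"a": 3}},
}
index_docs = {
    "L-1": {"license_id": "L-1", "status": "valid"},
    "L-2": {"license_id": "L-2", "status": "valid"},  # stale status
    "L-9": {"license_id": "L-9", "status": "valid"},  # no longer in the database
}
report = reconcile(db_rows, index_docs, project)
```

In a periodic job, `db_rows` and `index_docs` would be restricted to records changed within the preset comparison window rather than the full data set.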
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the program may carry out the steps of the method embodiments described above. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above examples represent only a few embodiments of the present application. They are described in some detail, but this is not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the spirit of the present application, and these all fall within its scope of protection. Accordingly, the scope of protection of the present application is determined by the appended claims.

Claims (4)

1. A method for storing and retrieving mass electronic licenses in a distributed database, characterized by comprising the following steps:
Step 1: electronic license data storage
The electronic license data comprises 13 basic metadata items, several management control fields, and the face data and detail data of the electronic license, wherein the 13 basic metadata items, the management control fields, the face data, and the detail data are stored in a distributed database as the stored data, and the stored data is hash-distributed across the distributed database by the license primary key; at the same time, the stored data is packaged into asynchronous messages and sent to a Kafka queue, forming an asynchronous message queue; after receiving an asynchronous message, the queue consumer extracts the index fields of the stored data and writes them into Elasticsearch to form an index table, wherein the index fields comprise the license data primary key, the 13 basic metadata items, and some of the management control fields;
Step 2: electronic license data retrieval
When a retrieval requester performs a search, a query statement is executed in Elasticsearch to obtain a list containing the license data primary key, the 13 basic metadata items, and some of the management control fields; the stored data for each electronic license is fetched from the distributed database by the license data primary key and returned to the retrieval requester, who thereby obtains the complete license data;
Step 3: electronic license data consistency maintenance
The data from a preset number of preceding days in the distributed database and in Elasticsearch is compared periodically; the comparison result is analyzed and data consistency processing is performed, in which surplus data is deleted, missing data is added, and inconsistent data is updated.
2. A system for storing and retrieving mass electronic licenses in a distributed database, characterized by comprising a storage module, a retrieval module, and a data consistency maintenance module, wherein:
the storage module comprises a distributed database and Elasticsearch; the 13 basic metadata items, the management control fields, the face data, and the detail data of the electronic license data are stored in the distributed database as the stored data, and the stored data is hash-distributed across the distributed database by the license primary key; at the same time, the stored data is packaged into asynchronous messages and sent to a Kafka queue, forming an asynchronous message queue; after receiving an asynchronous message, the queue consumer extracts the index fields of the stored data and writes them into Elasticsearch to form an index table, wherein the index fields comprise the license data primary key, the 13 basic metadata items, and some of the management control fields;
the retrieval module is used for obtaining the search request submitted by a retrieval requester, executing a query statement in Elasticsearch, and obtaining a list containing the license data primary key, the 13 basic metadata items, and some of the management control fields; fetching the stored data for each electronic license from the distributed database by the license data primary key; and returning the stored data to the retrieval requester;
the data consistency maintenance module is used for periodically comparing the data from a preset number of preceding days in the distributed database and in Elasticsearch, analyzing the comparison result, and performing data consistency processing, in which surplus data is deleted, missing data is added, and inconsistent data is updated.
3. A computer device comprising a processor and a memory, the memory storing a computer program, characterized in that the computer program, when executed by the processor, implements the method for storing and retrieving mass electronic licenses in a distributed database according to claim 1.
4. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for storing and retrieving mass electronic licenses in a distributed database according to claim 1.
CN202310028091.9A 2023-01-09 2023-01-09 System and method for storing and retrieving mass electronic certificates in distributed database Pending CN116010523A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310028091.9A CN116010523A (en) 2023-01-09 2023-01-09 System and method for storing and retrieving mass electronic certificates in distributed database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310028091.9A CN116010523A (en) 2023-01-09 2023-01-09 System and method for storing and retrieving mass electronic certificates in distributed database

Publications (1)

Publication Number Publication Date
CN116010523A true CN116010523A (en) 2023-04-25

Family

ID=86024584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310028091.9A Pending CN116010523A (en) 2023-01-09 2023-01-09 System and method for storing and retrieving mass electronic certificates in distributed database

Country Status (1)

Country Link
CN (1) CN116010523A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination