CN115291838A - Multi-user electronic mailbox mail indexing method based on KV system - Google Patents

Info

Publication number
CN115291838A
Authority
CN
China
Prior art keywords
mail
information
data
key
folder
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210872986.6A
Other languages
Chinese (zh)
Inventor
佟路林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING EYOU INFORMATION TECHNOLOGY CO LTD
Original Assignee
BEIJING EYOU INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING EYOU INFORMATION TECHNOLOGY CO LTD
Priority to CN202210872986.6A
Publication of CN115291838A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/107 Computer-aided management of electronic mailing [e-mailing]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Strategic Management (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Computer Hardware Design (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

The invention provides a multi-user electronic mailbox mail indexing method based on a KV system, comprising: constructing a Server program architecture; constructing a Key-List mailbox structure based on the KV system according to the Server program architecture; constructing a memory-based Map-structure thread lock bucket; and storing a plurality of fields of mail information using a ProtoBuf structure. Adding a new mail information field is flexible and efficient, backward/forward compatibility of the mail information field structure is effectively achieved, and both the folder summary information of user mailboxes and the mail information are indexed.

Description

Multi-user electronic mailbox mail indexing method based on KV system
Technical Field
The invention relates to the field of email servers, and in particular to a multi-user electronic mailbox mail indexing method based on a KV system.
Background
One important function of an email server is to store the mailbox information and mail original text of many users. Users access the email server through MUA clients (e.g. webmail, POP, IMAP) to read the summary information and mail contents of their personal mailboxes. As the number of users and the number of their mails grow dramatically, the email server becomes slower and slower at displaying the summary information of a personal mailbox (folder information and mail counts) and at reading the mail list; under high concurrency and high load, the server may even go down or become unable to serve requests.
For these reasons, the mailbox summary information and the mail list information of each user need to be cached or indexed, so that this information can be displayed quickly when the user receives mail through an MUA client, the mail list can be traversed faster, and the mailbox summary information and mail list information can be updated quickly whenever the mails in the user's mailbox change (for example, a new mail is received, a mail is deleted, or a mail is moved from one folder to another).
The mailbox formats commonly used by email servers are mbox and maildir, which organize the storage of mail original text as a single file and as a directory structure of files, respectively. Neither format has a cache or index of mailbox summary information. To display the summary information, the mbox format has to scan the single file and tally all the mail original texts and their number, while the maildir format has to count the number and sizes of the files in the whole directory. Changing or moving mails is also complicated, so efficiency is usually low.
Most improved email servers still store mail original text in maildir format or in cloud storage, but add an indexing device that stores the mailbox summary information and the mail list information so as to speed up reading.
One such indexing device stores the mail information of a user in a binary structured index file. Each fixed-size structure block in the file records the relevant information of one mail (sender, recipient, subject, size, file ID, and the like). Summary information is obtained by counting the structure blocks in the file and traversing them, and the mail list is displayed quickly by reading the structure blocks directly. However, because the data structure of a block is fixed, adding a mail information field requires changing the whole structured file, which is a very large amount of work whether done online or offline.
Another indexing device stores the mail information of users in MySQL database tables: a mail folder information table and a mail information table. The folder information of all users (folder name, number of mails, and the like) and the relevant information of each mail (sender, recipient, subject, size, folder ID, file ID, and the like) are stored as table fields in database records, and are queried and traversed with SQL statements. When data changes, for example when a mail is added or deleted, the folder information table and the mail information table must be updated together, so a database transaction is required to guarantee atomicity, and the performance of database transactions is relatively low. Meanwhile, as the volume of user mail data grows, database query performance can become a bottleneck. Adding a mail information field requires a table lock, and when the data volume is very large the table must stay locked for a long time while the new field is added.
In summary, the indexing devices in the prior art work inefficiently.
Disclosure of Invention
In view of the above problems, the present invention has been made to provide a multi-user email box mail indexing method based on KV system that overcomes or at least partially solves the above problems.
According to one aspect of the invention, the method for indexing the multi-user email box mail based on the KV system comprises the following steps:
constructing a Server program architecture;
constructing a Key-List mailbox structure based on a KV system according to the Server program architecture;
constructing a memory-based Map-structure thread lock bucket;
and storing a plurality of field information of the mail by adopting a ProtoBuf structure.
Optionally, the building of the Server program architecture specifically includes:
the Server program is a multi-threaded service program comprising a network IO thread pool and a working thread pool, wherein the network IO thread pool provides a RESTful interface access service over the HTTP protocol, and the working thread pool performs the data query and update services of the KV storage system;
and the network IO thread pool and the working thread pool are interacted through a message queue.
Optionally, the constructing a Key-List mailbox structure based on the KV system according to the Server program architecture specifically includes:
the method comprises the steps that a working thread pool thread utilizes a plurality of pieces of Key-Value data based on a KV storage system to construct a mail folder information and mail information List structure of a Key-List, one piece of Key-Value data is mail folder information data, a plurality of pieces of Key-Value data are mail information data, each piece of Key-Value data represents one piece of mail data, and relevant information of a mail is stored;
planning a content format of the Key in the Key-Value data, wherein the Key comprises a prefix portion containing the user ID, a TAG identification field indicating the type of the Key, a mail folder ID field and a mail ID field;
the Value comprises a plurality of fields, the Value of the mail folder information comprises a mail folder update time stamp, the number of mails in the mail folder, the total Size of the mails in the mail folder, the maximum mail ID and the minimum mail ID in the mail folder, and the Value of the mail information comprises the time stamp of the mail, the Size of the mail and the ProtoBuf serialization data of the summary information of the mail.
Optionally, the constructing a Map structure thread lock bucket based on a memory specifically includes:
when the thread of the working thread pool updates the mail folder information and the mail information of a user, a user-level thread mutual exclusion lock is used for ensuring the atomicity operation of data;
a memory-based Map-structure thread lock bucket is used;
when the KV data is updated, a KV system batch operation interface is adopted.
Optionally, the storing the multiple pieces of field information of the mail by using the ProtoBuf structure specifically includes:
the ProtoBuf structure stores a plurality of fields of mail information, when the fields are stored in Key-Value of a KV system, the whole structure is serialized into a character string, and when the fields are read, the character string is deserialized, so that the whole process is transparent to a user;
if a new mail information field is needed, only one field of the ProtoBuf structure needs to be added, the newly added field can automatically adopt a default value in the de-serialization process of old data, and the newly added field can be automatically serialized into a character string during updating, so that the whole process has no influence on the whole data, the data conversion is flexible and efficient, and the backward/forward compatibility of the data structure is realized.
The invention provides a multi-user electronic mailbox mail indexing method based on a KV system, comprising: constructing a Server program architecture; constructing a Key-List mailbox structure based on the KV system according to the Server program architecture; constructing a memory-based Map-structure thread lock bucket; and storing a plurality of fields of mail information using a ProtoBuf structure. Adding a new mail information field is flexible and efficient, backward/forward compatibility of the mail information field structure is effectively achieved, and both the folder summary information of user mailboxes and the mail information are indexed.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a diagram of a Server program architecture according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a Key-List mailbox structure based on the KV system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a memory-based Map-structure thread lock bucket according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The terms "comprises" and "comprising," and any variations thereof, in the present description and claims and drawings are intended to cover a non-exclusive inclusion, such as a list of steps or elements.
The technical solution of the present invention is further described in detail with reference to the accompanying drawings and embodiments.
As shown in FIG. 1, a Server program architecture is constructed. The Server program is a multi-threaded service program consisting mainly of two thread pools: a network IO thread pool, which provides a RESTful interface access service over the HTTP protocol, and a working thread pool, which performs the data query and update services of the KV storage system. The network IO thread pool and the working thread pool interact through a message queue.
A KV storage engine is integrated into the Server program, and index data such as the mailbox summary information and mail information of many users is stored in it. The KV system is a fast NoSQL storage system that can store massive amounts of data and provide quick access to them.
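The hand-off between the two thread pools can be sketched as follows. This is a minimal illustration only: it assumes Python's standard `queue.Queue` as the message queue and a plain `dict` standing in for the KV storage engine, and the function names `start_workers`, `handle_request` and `stop_workers` are hypothetical, not taken from the patent.

```python
import queue
import threading

def start_workers(requests, kv, n_workers=4):
    """Start working threads that serve KV lookups taken from the message queue."""
    def worker():
        while True:
            item = requests.get()
            if item is None:           # sentinel: shut this worker down
                break
            key, reply = item
            reply.put(kv.get(key))     # query the KV store and hand the result back

    threads = [threading.Thread(target=worker, daemon=True) for _ in range(n_workers)]
    for t in threads:
        t.start()
    return threads

def handle_request(requests, key):
    """What a network IO thread would do: enqueue the job, then wait for the reply."""
    reply = queue.Queue(maxsize=1)
    requests.put((key, reply))
    return reply.get(timeout=5)

def stop_workers(requests, threads):
    """Send one sentinel per worker and wait for all of them to exit."""
    for _ in threads:
        requests.put(None)
    for t in threads:
        t.join()
```

In this sketch the IO side never touches the store directly; it only exchanges messages with the working threads, which is the separation the architecture describes.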
As shown in FIG. 2, constructing the Key-List mailbox structure based on the KV system includes:
A thread of the working thread pool uses a plurality of pieces of Key-Value data in the KV storage system to construct a Key-List structure of mail folder information and mail information. One piece of KV data holds the mail folder information; each of the other pieces of KV data represents one mail and stores the relevant information of that mail. To associate the mail folder information with the mail information, the content format of the Key in the KV data needs to be planned. The Key includes multiple fields: a prefix portion containing the user ID, so that all data of the same user shares the same prefix; a TAG identification field indicating the type of the Key (mail folder information or mail information); and the mail folder ID and mail ID fields. The Value also includes multiple fields. The Value of the mail folder information mainly includes the folder update timestamp, the number of mails in the folder, the total size of the mails in the folder, and the maximum and minimum mail IDs in the folder. The Value of the mail information mainly includes the mail timestamp, the mail size, and the ProtoBuf-serialized summary information of the mail.
When a mail is added or deleted, one piece of mail Key-Value data is added or deleted, and the Count, bytes, maxID and MinID fields in the Value of the mail folder Key-Value data are updated at the same time.
When the mail folder information is read, only the mail folder Key-Value data needs to be read. This is a very fast read in a NoSQL system and yields the mail folder information directly.
When the mails in a mail folder are traversed, the Key-Value data is read sequentially from the NoSQL system by UID prefix and in MailID order. With the KV traversal pointer of the NoSQL system, the data can be read quickly even among massive amounts of data.
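One way to make such prefix traversal work is to encode every Key field at a fixed width in big-endian byte order, so that the byte order of the keys matches the numeric order of the mail IDs. The sketch below assumes this layout (8-byte UID, 1-byte TAG, 4-byte folder ID, 8-byte mail ID); the widths, the `F`/`M` tag values and the function names are illustrative assumptions, and a sorted Python list stands in for the ordered key space of the KV system.

```python
import bisect
import struct

TAG_FOLDER = b"F"   # hypothetical tag value for mail folder information
TAG_MAIL = b"M"     # hypothetical tag value for mail information

def folder_key(uid, folder_id):
    """Key of the single folder-summary record: UID | TAG | folder ID."""
    return struct.pack(">Q", uid) + TAG_FOLDER + struct.pack(">I", folder_id)

def mail_key(uid, folder_id, mail_id):
    """Key of one mail record: UID | TAG | folder ID | mail ID."""
    return struct.pack(">Q", uid) + TAG_MAIL + struct.pack(">IQ", folder_id, mail_id)

def scan_folder(sorted_keys, uid, folder_id):
    """Return the mail IDs of one folder in ascending order via a prefix scan."""
    prefix = struct.pack(">Q", uid) + TAG_MAIL + struct.pack(">I", folder_id)
    i = bisect.bisect_left(sorted_keys, prefix)   # seek to the first key >= prefix
    ids = []
    while i < len(sorted_keys) and sorted_keys[i].startswith(prefix):
        ids.append(struct.unpack(">Q", sorted_keys[i][len(prefix):])[0])
        i += 1
    return ids
```

The seek followed by a sequential read is the same access pattern an ordered KV engine's iterator would provide; mails of other folders or other users are never visited.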
The Key-List structure is thus built from a plurality of pieces of Key-Value data in the KV system: the Key is the single record holding the mail folder information of a user, and the List is a plurality of records, each holding the information of one mail of that user. This forms the mailbox index structure of the user. Reading the mail folder information requires reading only one piece of Key-Value data, which the NoSQL KV storage system can locate and read very quickly even among massive amounts of data. Adding or deleting a mail requires updating only two records at the same time: the mail folder information and the mail information of the added or deleted mail. No transaction is needed; instead, a user-level in-memory Mutex thread lock is taken and the update is completed in one step through the batch operation (BatchPut) interface of the KV storage system, which is fast and efficient.
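The two-record update can be sketched as below. This is an illustration under stated assumptions, not the patented implementation: `MemoryKV` is a hypothetical in-memory stand-in for the KV engine, its `batch_put` method plays the role of the BatchPut interface, tuple keys replace the real Key encoding, and the folder fields are spelled `Count`, `Bytes`, `MaxID` and `MinID` after the fields named in the description.

```python
import threading

class MemoryKV:
    """In-memory stand-in for a KV engine (hypothetical, for illustration only)."""

    def __init__(self):
        self._data = {}
        self._guard = threading.Lock()

    def get(self, key):
        with self._guard:
            return self._data.get(key)

    def batch_put(self, items):
        # Apply every write of the batch under one internal lock,
        # so readers of this toy store never observe half of a batch.
        with self._guard:
            self._data.update(items)

def add_mail(kv, user_lock, uid, folder_id, mail_id, size, summary):
    """Add one mail: rewrite the folder summary and write the mail record together."""
    fkey = ("F", uid, folder_id)
    mkey = ("M", uid, folder_id, mail_id)
    with user_lock:   # user-level mutex: serializes read-modify-write for this user
        old = kv.get(fkey) or {"Count": 0, "Bytes": 0, "MaxID": None, "MinID": None}
        folder = dict(old)
        folder["Count"] += 1
        folder["Bytes"] += size
        folder["MaxID"] = mail_id if old["MaxID"] is None else max(old["MaxID"], mail_id)
        folder["MinID"] = mail_id if old["MinID"] is None else min(old["MinID"], mail_id)
        # Exactly two records change, and both go out in a single batch.
        kv.batch_put({fkey: folder, mkey: {"Size": size, "Summary": summary}})
```

The mutex covers the read-modify-write of the folder summary, while the batch write keeps the two records consistent with each other; deletion would follow the same pattern with the counters decreased.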
As shown in FIG. 3, constructing the memory-based Map-structure thread lock bucket includes the following. When a thread of the working thread pool updates the mail folder information and mail information of a user, a user-level (UID) thread mutual exclusion lock is used to ensure that the update is atomic. To allow a large number of concurrent users to update data at the same time, a memory-based Map-structure thread lock bucket is used, with 10000 buckets pre-allocated. The key of each Map element is a hash value of the UID, and the corresponding data is a thread lock (Mutex) object. Owing to the characteristics of the Map data structure, the time complexity of obtaining the lock for a UID is O(1) or O(log n). When KV data is updated, the batch operation interface (BatchPut) of the KV system is used, which makes the update both fast and atomic. In addition, the Mutex lock is released automatically in the working thread, which effectively avoids system deadlock.
To ensure that updates to Key-List data are atomic, the Server program uses the memory-based Map-structure thread lock bucket and distributes keys across different buckets to obtain a thread lock. When the number of elements in the Map (that is, the number of buckets) is set to 10000 or more, a larger number of concurrent user access requests can be served. Locating an element in the Map takes O(1) or O(log n) time, so a specific lock can be found quickly and data can be updated concurrently at high speed.
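A lock bucket of this kind can be sketched in a few lines. The example assumes a Python `dict` as the Map, CRC32 as the hash of the UID, and string UIDs; the class name `LockBuckets` and the choice of hash are assumptions made for illustration only.

```python
import threading
import zlib

class LockBuckets:
    """Pre-allocated map from bucket index to mutex; UIDs hash onto the buckets."""

    def __init__(self, n_buckets=10000):
        self._n = n_buckets
        self._locks = {i: threading.Lock() for i in range(n_buckets)}

    def lock_for(self, uid):
        # The same UID always maps to the same bucket; the dict lookup is O(1) on average.
        return self._locks[zlib.crc32(uid.encode("utf-8")) % self._n]
```

Used as `with buckets.lock_for(uid): ...`, the lock is released automatically when the block exits, even if an exception is raised. Two different users may occasionally share a bucket, which costs some concurrency but never correctness.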
Storing a plurality of fields of mail information using the ProtoBuf structure includes the following. The ProtoBuf structure holds the fields of the mail information. When stored in a Key-Value of the KV system, the whole structure is serialized into a character string, and when read, the string is deserialized, so the whole process is transparent to the user. When a new mail information field is needed, only one field has to be added to the ProtoBuf structure. When old data is deserialized, the new field automatically takes its default value, so data integrity is not affected, and when the data is updated the new field is automatically serialized into the string.
For a piece of mail information Key-Value data, the fields of the mail information are serialized with the ProtoBuf structure and stored in the Value, and are deserialized on reading. When a field is added, the stored mail information does not all need to be updated, because the ProtoBuf structure supplies a default value for the new field. The stored ProtoBuf data of a mail is rewritten only when that piece of Key-Value data needs to be updated, one record at a time, which does not affect the rest of the stored data.
Beneficial effects: through a plurality of pieces of Key-Value data in the KV system, a mailbox index structure in Key-List form is constructed for a plurality of users, indexing both the folder summary information and the mail information of user mailboxes and enabling quick query/update of mail folder information and quick addition/deletion/update of mail information.
Through the memory-level Map-structure thread lock bucket, the mail folder information and mail information in the Key-List structures of a plurality of users are updated quickly and atomically under high concurrency.
The fields of the mail information are stored in ProtoBuf form in the Value of the Key-Value data, so adding a new mail information field is flexible and efficient, and backward/forward compatibility of the mail information field structure is effectively achieved.
The above embodiments are provided to further explain the objects, technical solutions and advantages of the present invention in detail, it should be understood that the above embodiments are merely exemplary embodiments of the present invention and are not intended to limit the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (5)

1. A multi-user E-mail indexing method based on a KV system is characterized by comprising the following steps:
constructing a Server program architecture;
constructing a Key-List mailbox structure based on a KV system according to the Server program architecture;
constructing a memory-based Map-structure thread lock bucket;
and storing a plurality of field information of the mail by adopting a ProtoBuf structure.
2. The method according to claim 1, wherein the constructing of the Server program architecture specifically comprises:
the Server program is a multi-threaded service program comprising a network IO thread pool and a working thread pool, wherein the network IO thread pool provides a RESTful interface access service over the HTTP protocol, and the working thread pool performs the data query and update services of the KV storage system;
and the network IO thread pool and the work thread pool are interacted through a message queue.
3. The method according to claim 2, wherein the constructing a Key-List mailbox structure based on the KV system according to the Server program architecture specifically comprises:
the method comprises the steps that a working thread pool thread utilizes a plurality of pieces of Key-Value data based on a KV storage system to construct a mail folder information and mail information List structure of a Key-List, one piece of Key-Value data is mail folder information data, a plurality of pieces of Key-Value data are mail information data, each piece of Key-Value data represents one piece of mail data, and relevant information of a mail is stored;
planning a content format of the Key in the Key-Value data, wherein the Key comprises a prefix portion containing the user ID, a TAG identification field indicating the type of the Key, a mail folder ID field and a mail ID field;
the Value includes a plurality of fields, the Value of the folder information includes a folder update time stamp, the number of mails in the folder, a total Size of the mails in the folder, a maximum mail ID and a minimum mail ID in the folder, and the Value of the mail information includes a time stamp of the mail, a Size of the mail and summary information ProtoBuf serialization data of the mail.
4. The method for indexing the multi-user electronic mailbox mail based on the KV system as claimed in claim 1, wherein the constructing of the Map-structured thread lock bucket based on the memory specifically comprises:
when the thread of the working thread pool updates the mail folder information and the mail information of a user, a user-level thread mutual exclusion lock is used for ensuring the atomicity operation of data;
a memory-based Map-structure thread lock bucket is used;
when the KV data is updated, a KV system batch operation interface is adopted.
5. The method according to claim 1, wherein the storing of the information on the plurality of fields of the mail by using the ProtoBuf structure specifically comprises:
the ProtoBuf structure stores a plurality of fields of mail information, when the fields are stored in Key-Value of a KV system, the whole structure is serialized into a character string, and when the fields are read, the character string is deserialized, so that the whole process is transparent to a user;
if a new mail information field is needed, only one field of the ProtoBuf structure needs to be added, the newly added field can automatically adopt a default value in the de-serialization process of old data, and the newly added field can be automatically serialized into a character string during updating, so that the whole process has no influence on the whole data, the data conversion is flexible and efficient, and the backward/forward compatibility of the data structure is realized.
CN202210872986.6A 2022-07-22 2022-07-22 Multi-user electronic mailbox mail indexing method based on KV system Pending CN115291838A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210872986.6A CN115291838A (en) 2022-07-22 2022-07-22 Multi-user electronic mailbox mail indexing method based on KV system

Publications (1)

Publication Number Publication Date
CN115291838A true CN115291838A (en) 2022-11-04

Family

ID=83825141

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210872986.6A Pending CN115291838A (en) 2022-07-22 2022-07-22 Multi-user electronic mailbox mail indexing method based on KV system

Country Status (1)

Country Link
CN (1) CN115291838A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination