CN116594962A - Access request processing method and device and forward index system - Google Patents

Info

Publication number
CN116594962A
Authority
CN
China
Prior art keywords
index system
field
target document
stored
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310402027.2A
Other languages
Chinese (zh)
Inventor
朱学敏
李贺亭
张书尧
卢嘉龙
段雪涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202310402027.2A
Publication of CN116594962A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/134 Distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides an access request processing method, an access request processing device and a forward index system, and relates to artificial intelligence fields such as distributed storage and big data processing. The method may comprise: obtaining an access request for a forward index system, wherein the forward index system comprises a primary index system using a disk as its storage medium and a secondary index system using a memory and a disk as its storage media; if the target document requested to be accessed is determined to be stored in the secondary index system, generating an access result according to the target document content stored in the storage medium corresponding to the target field requested to be accessed, wherein the fields included in the documents are divided into two types, hot fields and cold fields, the memory is used for storing the hot fields, and the disk is used for storing the cold fields; otherwise, generating the access result according to the target document content stored in the primary index system, and storing the target document content in the secondary index system. Applying the disclosed scheme can reduce access latency, among other benefits.

Description

Access request processing method and device and forward index system
Technical Field
The disclosure relates to the technical field of artificial intelligence, in particular to an access request processing method, an access request processing device and a forward index system in the fields of distributed storage, big data processing and the like.
Background
The forward index refers to an index for storing corresponding values of certain specific fields of a document (doc), and is widely applied in the fields of searching, recommending and the like.
Disclosure of Invention
The disclosure provides an access request processing method, an access request processing device and a forward index system.
An access request processing method, comprising:
obtaining an access request for a forward index system, wherein the forward index system comprises: a primary index system using a disk as its storage medium and a secondary index system using a memory and a disk as its storage media;
in response to determining that the target document requested to be accessed is stored in the secondary index system, generating an access result corresponding to the access request according to the target document content stored in the storage medium corresponding to the target field requested to be accessed, wherein the fields included in the documents are divided into two types, hot fields and cold fields, the memory is used for storing the hot fields, and the disk is used for storing the cold fields;
in response to determining that the target document is not stored in the secondary index system, generating the access result according to the target document content stored in the primary index system, and storing the target document content in the secondary index system.
An access request processing apparatus comprising: a request acquisition module and a result generation module;
the request acquisition module is configured to acquire an access request for a forward index system, where the forward index system includes: a primary index system using a disk as its storage medium and a secondary index system using a memory and a disk as its storage media;
the result generation module is configured to: in response to determining that the target document requested to be accessed is stored in the secondary index system, generate an access result corresponding to the access request according to the target document content stored in the storage medium corresponding to the target field requested to be accessed, wherein the fields included in the documents are divided into two types, hot fields and cold fields, the memory is used for storing the hot fields, and the disk is used for storing the cold fields; and, in response to determining that the target document is not stored in the secondary index system, generate the access result according to the target document content stored in the primary index system, and store the target document content in the secondary index system.
A forward indexing system comprising:
a primary index system using a disk as its storage medium and a secondary index system using a memory and a disk as its storage media;
the primary index system is used for storing the content of different documents;
the secondary index system is configured to store hot document content, where the hot document content includes: target document content that is added to the secondary index system after an access result has been generated according to the target document content, stored in the primary index system, that corresponds to an access request, wherein the target document is the document requested to be accessed, and the access result is generated after the secondary index system has been queried first and it has been determined that the target document is not stored in the secondary index system;
each field in any document stored in the secondary index system belongs to one of two types, hot fields and cold fields, the memory is used for storing the hot fields, the disk is used for storing the cold fields, and the access result further comprises: an access result generated, when it is determined that the target document is stored in the secondary index system, according to the target document content stored in the storage medium corresponding to the target field requested to be accessed.
An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method as described above.
A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform a method as described above.
A computer program product comprising computer programs/instructions which when executed by a processor implement a method as described above.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a flow chart of an embodiment of an access request processing method according to the present disclosure;
FIG. 2 is a schematic diagram illustrating an implementation process of an access request processing method based on a primary index system and a secondary index system according to the present disclosure;
FIG. 3 is a schematic diagram of the structure of an embodiment 300 of an access request processing apparatus according to the present disclosure;
FIG. 4 is a schematic diagram of the structure of a forward index system 400 according to the present disclosure;
FIG. 5 shows a schematic block diagram of an electronic device 500 that may be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In addition, it should be understood that the term "and/or" herein is merely one association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
Fig. 1 is a flowchart of an embodiment of an access request processing method according to the present disclosure. As shown in fig. 1, the following detailed implementation is included.
In step 101, an access request for a forward index system is obtained, where the forward index system includes: a primary index system using a disk as a storage medium and a secondary index system using a memory and a disk as storage media.
In step 102, in response to determining that the target document requested to be accessed is stored in the secondary index system, generating an access result corresponding to the access request according to the content of the target document stored in the storage medium corresponding to the target field requested to be accessed, wherein all fields included in different documents are divided into two types of hot fields and cold fields, the memory is used for storing the hot fields, and the disk is used for storing the cold fields.
In step 103, in response to determining that the target document is not stored in the secondary index system, the access result is generated from the target document content stored in the primary index system and the target document content is stored in the secondary index system.
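Steps 101 to 103 can be illustrated with a minimal sketch. It is not the patented implementation: the class name `TwoTierForwardIndex` and the use of plain in-process dictionaries in place of the disk-backed primary index system and the memory-plus-disk secondary index system are assumptions made only for illustration, and the hot/cold field split is ignored here.

```python
from typing import Dict, List

Document = Dict[str, str]  # field name -> field value


class TwoTierForwardIndex:
    """Hypothetical two-level lookup: secondary first, primary on a miss."""

    def __init__(self, primary: Dict[str, Document]) -> None:
        # Stand-in for the disk-backed primary index system.
        self.primary = primary
        # Stand-in for the secondary index system (hot documents only).
        self.secondary: Dict[str, Document] = {}

    def get(self, doc_id: str, target_fields: List[str]) -> Document:
        doc = self.secondary.get(doc_id)       # step 102: query the secondary system first
        if doc is None:
            doc = self.primary[doc_id]         # step 103: fall through to the primary system
            self.secondary[doc_id] = doc       # step 103: store the content in the secondary system
        return {name: doc[name] for name in target_fields}
```

After the first miss, a repeated request for the same document is answered by the secondary structure without touching the primary one.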
In fields such as search and recommendation, there are usually hundreds of millions of concurrent accesses. The access pressure is huge, and a single machine cannot bear an access volume of this scale, so the traditional approach is as follows: the forward index is sharded by document, the hundred-million-scale concurrent access is absorbed by increasing the number of replicas within each shard, and after sharding each machine stores its data on a single storage medium such as a disk. This approach provides a higher single-machine capacity and has a lower implementation cost, but it introduces a larger access latency.
In the scheme of this method embodiment, a two-level index design comprising a primary index system and a secondary index system is adopted. The secondary index system uses a hybrid storage medium of memory and disk and stores document content that has been frequently accessed recently, that is, hot document content; the hot fields of those documents are stored in the memory and the cold fields are stored on the disk. Accordingly, a considerable portion of access requests hit document content held in the memory of the secondary index system, which reduces access latency and improves throughput.
The primary index system may be a distributed forward index system based on Key-Value (KV) storage, and each document may be serialized into a KV pair and then stored, where a Key may be an identification (id) of the document, and Value may be each field included in the document, a corresponding field Value, and so on.
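As an illustration of serializing a document into a KV pair, the sketch below uses the document id as the Key and a JSON encoding of the field map as the Value. The disclosure does not specify a serialization format; JSON and the function names `serialize_document` and `deserialize_document` are assumptions chosen for readability.

```python
import json
from typing import Dict, Tuple


def serialize_document(doc_id: str, fields: Dict[str, str]) -> Tuple[bytes, bytes]:
    """Turn one document into a (Key, Value) pair of byte strings."""
    key = doc_id.encode("utf-8")                                 # Key: the document identifier
    value = json.dumps(fields, sort_keys=True).encode("utf-8")   # Value: fields and their values
    return key, value


def deserialize_document(key: bytes, value: bytes) -> Tuple[str, Dict[str, str]]:
    """Inverse of serialize_document."""
    return key.decode("utf-8"), json.loads(value.decode("utf-8"))
```

A pair produced this way could be written to any key-value engine that accepts byte keys and values.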
The primary index system may employ a single storage medium such as a disk. Here, "disk" is meant in a broad sense: it may be, for example, RocksDB, a high-performance Key-Value persistent storage engine, or a solid state disk (SSD, Solid State Disk).
The primary index system has a lower implementation cost. Preferably, it supports sharded storage and supports adding replicas within a shard (that is, one or more replicas are added for each shard). Its single-machine data storage capacity can reach several terabytes (TB), and it can provide a certain throughput within a reasonable access latency range. It can therefore cope effectively with the continuously growing data scale of a large-scale forward index system, and its capacity is easy to expand.
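The sharding and replica arrangement can be sketched as follows. The disclosure only states that the index is sharded by document and that replicas are added within each shard; hashing the document id with CRC32 and choosing a replica at random are assumptions of this sketch, and the names `shard_for` and `pick_replica` are hypothetical.

```python
import random
import zlib
from typing import List


def shard_for(doc_id: str, num_shards: int) -> int:
    """Map a document id to a shard number (assumed hash partitioning)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


def pick_replica(doc_id: str, replicas_per_shard: List[List[str]]) -> str:
    """Choose one replica of the shard that owns the document.

    replicas_per_shard[i] lists the replica addresses of shard i.
    """
    shard = shard_for(doc_id, len(replicas_per_shard))
    return random.choice(replicas_per_shard[shard])
```

Adding an address to one of the inner lists raises the read capacity of that shard without moving any data between shards.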
In the scheme of the disclosure, a secondary index system is further built on top of the primary index system. The secondary index system is a distributed forward index system based on a hybrid storage medium, namely a memory and a disk, and the amount of data it stores can be far smaller than that of the primary index system. It is mainly used for storing document content that has been frequently accessed recently: in scenarios such as search and recommendation, a small amount of document content is accessed far more frequently than the rest, and that document content can accordingly be stored in the secondary index system.
In order to meet the high-frequency field access requirement and reduce the consumption of expensive resources such as a memory, hot fields with frequent access can be stored in the memory, and cold fields can be stored in a disk, so that low-delay and high-throughput services can be provided at lower cost.
Preferably, the secondary index system may also support sharded storage and support adding replicas within a shard. In addition, storing hot fields and cold fields separately reduces the amount of memory occupied by a single document, which increases the number of documents a single instance can hold, reduces the number of shards in the system, reduces the system's fan-out, and in turn reduces access latency.
Preferably, the document content stored in the secondary index system may be evicted in a least recently used (LRU, Least Recently Used) manner.
LRU is a mature algorithm. Eviction releases storage space in the secondary index system in a timely manner, so that document content with a higher access frequency is retained in or added to the secondary index system, which improves the space utilization of the secondary index system, the hit rate of access requests, and so on.
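A minimal sketch of LRU eviction for the documents held in the secondary index system is given below. It is an illustration only: the class name `LRUDocumentCache`, the fixed document-count capacity, and the use of Python's `collections.OrderedDict` are assumptions, and the disclosure does not say how capacity is measured.

```python
from collections import OrderedDict
from typing import Dict, Optional

Document = Dict[str, str]


class LRUDocumentCache:
    """Hypothetical LRU store: the least recently used document is evicted first."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._docs: "OrderedDict[str, Document]" = OrderedDict()

    def get(self, doc_id: str) -> Optional[Document]:
        doc = self._docs.get(doc_id)
        if doc is not None:
            self._docs.move_to_end(doc_id)   # mark as most recently used
        return doc

    def put(self, doc_id: str, doc: Document) -> None:
        self._docs[doc_id] = doc
        self._docs.move_to_end(doc_id)       # newly stored content is most recent
        if len(self._docs) > self.capacity:
            self._docs.popitem(last=False)   # evict the least recently used document
```

With a capacity of two, storing documents A and B, reading A, and then storing C evicts B, because A was touched more recently.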
Preferably, the access request may carry the identifier of the target document and the target field information. Accordingly, when the access request is acquired, the secondary index system may be queried first, that is, queried according to the identifier of the target document carried in the access request, to determine whether the target document is stored in the secondary index system. If it is determined that the target document is stored in the secondary index system, the access result corresponding to the access request may be generated and returned according to the target document content stored in the storage medium corresponding to the target field. If it is determined that the target document is not stored in the secondary index system, it may be determined that the target document is stored in the primary index system, and the access request may be passed through to the primary index system; accordingly, the access result may be generated and returned according to the target document content stored in the primary index system, and the target document content may additionally be stored in the secondary index system. That is, the target document content may or may not be stored in the secondary index system, but the secondary index system is always queried first.
Preferably, if it is determined that the target document is stored in the secondary index system, a field selector stored in the memory may be queried. The field selector stores a hot field list, which includes information on the fields belonging to the hot fields, and a cold field list, which includes information on the fields belonging to the cold fields. In response to determining from the query result that the target fields are all hot fields, the access result may be generated according to the target document content stored in the memory. In response to determining from the query result that the target fields are all cold fields, the access result may be generated according to the target document content stored in the disk. In response to determining from the query result that some of the target fields are hot fields and the others are cold fields, the access result may be generated according to the target document content stored in both the memory and the disk.
The specific forms of the hot field list and the cold field list are not limited as long as it can embody which fields belong to the hot field and which fields belong to the cold field.
Therefore, the storage medium corresponding to the target field, namely the storage medium where the target field is located, can be determined efficiently and accurately by querying the field selector, and further, a required access result can be generated according to the content of the target document stored in the corresponding storage medium.
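The routing performed by the field selector can be sketched as below. The disclosure leaves the form of the hot and cold field lists open, so representing them as sets, and the names `FieldSelector`, `split` and `read_fields`, are assumptions; the two dictionaries passed to `read_fields` merely stand in for the memory and the disk.

```python
from typing import Dict, Iterable, List, Tuple


class FieldSelector:
    """Hypothetical field selector holding a hot field list and a cold field list."""

    def __init__(self, hot_fields: Iterable[str], cold_fields: Iterable[str]) -> None:
        self.hot = set(hot_fields)
        self.cold = set(cold_fields)

    def split(self, target_fields: Iterable[str]) -> Tuple[List[str], List[str]]:
        """Partition the requested fields into (hot, cold)."""
        targets = list(target_fields)
        hot = [name for name in targets if name in self.hot]
        cold = [name for name in targets if name in self.cold]
        return hot, cold


def read_fields(selector: FieldSelector,
                memory_doc: Dict[str, str],
                disk_doc: Dict[str, str],
                target_fields: Iterable[str]) -> Dict[str, str]:
    """Read each requested field from the medium the selector assigns it to."""
    hot, cold = selector.split(target_fields)
    result = {name: memory_doc[name] for name in hot}    # hot fields come from memory
    result.update({name: disk_doc[name] for name in cold})  # cold fields come from disk
    return result
```

When every requested field is hot, the cold list is empty and the disk stand-in is never consulted, which is the case the design aims to make common.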
In addition, preferably, when the access result is required to be generated according to the content of the target document stored in the memory and the disk, the document version (version) information of the target document stored in the memory and the document version information of the target document stored in the disk may be acquired respectively, and in response to the acquired document version information being consistent, the access result may be generated according to the content of the target document stored in the memory and the disk.
That is, for the document contents stored in the memory and the disk, the corresponding document version information may also be stored at the same time. Therefore, when the hot field content and the cold field content of the target document are required to be accessed simultaneously, the document version information corresponding to the two parts of content can be compared, and if the two parts of content are consistent, an access result can be generated according to the target document content stored in the memory and the disk, so that the accuracy and the like of the generated access result are improved.
Preferably, in response to the acquired document version information being inconsistent, an access result may be generated from the target document content stored in the primary index system, and the target document content stored in the secondary index system may be replaced with the target document content stored in the primary index system.
If the document version information is inconsistent, the access request can be passed through to the primary index system and the access result generated based on the primary index system, which improves the access success rate. The secondary index system can also be updated in a timely manner with the target document content stored in the primary index system, which improves the rate at which subsequent access requests hit the secondary index system.
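The document version comparison for a request that touches both hot and cold fields can be sketched as follows. The `StoredPart` type, the integer version numbers and the function name `read_hot_and_cold` are assumptions of this sketch; returning `None` stands for "pass the request through to the primary index system and refresh the secondary index system", which the caller would carry out.

```python
from typing import Dict, List, NamedTuple, Optional


class StoredPart(NamedTuple):
    """The part of a document held on one medium, with its document version."""
    doc_version: int
    fields: Dict[str, str]


def read_hot_and_cold(memory_part: StoredPart,
                      disk_part: StoredPart,
                      hot_fields: List[str],
                      cold_fields: List[str]) -> Optional[Dict[str, str]]:
    """Merge hot and cold fields only when both parts carry the same document version."""
    if memory_part.doc_version != disk_part.doc_version:
        return None  # inconsistent versions: caller falls back to the primary index system
    result = {name: memory_part.fields[name] for name in hot_fields}
    result.update({name: disk_part.fields[name] for name in cold_fields})
    return result
```

The check guards against combining a freshly updated hot part with a stale cold part of the same document, or the reverse.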
In addition, preferably, if it is determined that the target document is stored in the secondary index system, the field selector may be queried to obtain the field version information stored in it; the field version information in the field selector may be updated whenever the hot field list and/or the cold field list is updated. The field version information of the target document may also be obtained from the storage medium corresponding to the target field. In response to the two pieces of field version information being consistent, the access result may be generated according to the target document content stored in the storage medium corresponding to the target field.
The hot field list and the cold field list are not invariable and can be dynamically adjusted according to actual needs. For example, the original hot field list includes 4 fields, i.e., a field a, a field b, a field c, and a field d, and the cold field list includes 3 fields, i.e., a field e, a field f, and a field g, so that the type of one or some fields can be adjusted according to the actual access frequency, for example, the field e is adjusted from a cold field to a hot field, and correspondingly, the adjusted hot field list includes 5 fields, i.e., a field a, a field b, a field c, a field d, and a field e, and the cold field list includes 2 fields, i.e., a field f, and a field g. For another example, if a new document field, such as field h, is added, field h may be added to a corresponding list, such as a hot field list, and accordingly, the adjusted hot field list includes 5 fields, such as field a, field b, field c, field d, and field h, and the cold field list includes 3 fields, such as field e, field f, and field g. Each adjustment may correspond to updating field version information in the field selector.
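The two adjustments in this example, moving field e from cold to hot and adding a new field h, can be sketched with a selector that bumps its field version on every change. The class name `VersionedFieldSelector`, the method names and the use of an integer counter as the field version information are assumptions for illustration.

```python
from typing import Iterable


class VersionedFieldSelector:
    """Hypothetical field selector whose field version changes with every list update."""

    def __init__(self, hot_fields: Iterable[str], cold_fields: Iterable[str]) -> None:
        self.hot = set(hot_fields)
        self.cold = set(cold_fields)
        self.field_version = 1

    def promote(self, name: str) -> None:
        """Move an existing cold field to the hot field list."""
        self.cold.remove(name)
        self.hot.add(name)
        self.field_version += 1   # each adjustment updates the field version

    def add_field(self, name: str, hot: bool) -> None:
        """Register a newly added document field as hot or cold."""
        (self.hot if hot else self.cold).add(name)
        self.field_version += 1
```

Starting from hot fields a, b, c, d and cold fields e, f, g, calling `promote("e")` yields the first adjusted lists in the example; on a fresh selector, `add_field("h", hot=True)` yields the second.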
It can be seen that, for any document stored in the secondary index system, the corresponding document version information and field version information both need to be stored. Comparing the field version information in the field selector with the field version information stored in the storage medium further improves the accuracy of the generated access result.
In addition, preferably, in response to the acquired field version information being inconsistent, the access result may be generated according to the target document content stored in the primary index system, and the target document content stored in the secondary index system may be replaced with the target document content stored in the primary index system.
If the field version information is inconsistent, the target document content in the secondary index system can be regarded as expired data. Accordingly, the access request can be passed through to the primary index system and the access result generated based on the primary index system, which improves the access success rate.
The primary index system stores the latest document version information and the latest field version information corresponding to each document content, so that the secondary index system can be updated in time by utilizing the target document content stored in the primary index system.
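The field version check and the refresh from the primary index system can be sketched together. This is a simplified illustration: the `Entry` type, the integer versions and the function name `lookup` are assumptions, and the secondary index system is modelled as a single dictionary rather than as separate memory and disk stores.

```python
from typing import Dict, List, NamedTuple


class Entry(NamedTuple):
    """A stored document together with the field version it was written under."""
    field_version: int
    fields: Dict[str, str]


def lookup(doc_id: str,
           target_fields: List[str],
           selector_field_version: int,
           secondary: Dict[str, Entry],
           primary: Dict[str, Entry]) -> Dict[str, str]:
    """Serve from the secondary store unless the entry is missing or its field version is stale."""
    entry = secondary.get(doc_id)
    if entry is None or entry.field_version != selector_field_version:
        entry = primary[doc_id]        # primary holds the latest content and versions
        secondary[doc_id] = entry      # replace the expired content in the secondary store
    return {name: entry.fields[name] for name in target_fields}
```

After one stale read the secondary entry carries the current field version, so the next request for the same document is a hit.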
Based on the above description, fig. 2 is a schematic implementation process of the access request processing method based on the primary index system and the secondary index system according to the present disclosure.
As shown in Fig. 2, the forward index system includes a primary index system and a secondary index system. The primary index system may use a disk as its storage medium, that is, it may be a disk-type distributed forward index system. The secondary index system may use a memory and a disk as its storage media, that is, it may be a hybrid-type distributed forward index system. The secondary index system may store document content that has been frequently accessed recently, such as document 1, document 2 and document 3 shown in Fig. 2, using the memory to store their hot fields and the disk to store their cold fields. For example, the hot fields of document 1 include field 1 and field 2 and its cold fields include field 4 and field 5; the hot fields of document 2 include field 2 and field 3 and its cold fields include field 4 and field 5; and the hot fields of document 3 include field 1 and field 3 and its cold fields include field 4 and field 5.
As shown in Fig. 2, suppose that for one obtained access request the target document is document 1 and the target fields are field 1 and field 2. The secondary index system is queried first, and it is determined that document 1 is stored there, so the access result corresponding to the access request can be generated and returned according to the content of document 1 stored in the memory of the secondary index system. Suppose that for another obtained access request the target document is document 4 and the target fields are field 1 and field 3. The secondary index system is again queried first, and it is determined that document 4 is not stored there. Accordingly, it can be determined that document 4 is stored in the primary index system, the access request can be passed through to the primary index system, and the access result corresponding to the access request can be generated and returned according to the content of document 4 stored in the primary index system.
In practical applications, the data size of the secondary index system is far smaller than that of the primary index system, and both systems can dynamically adjust their number of shards, number of replicas and so on according to actual requirements. Generally, the secondary index system has fewer shards than the primary index system but more replicas per shard, because it stores less data but receives more accesses. Accordingly, compared with the traditional approach, the number of replicas per shard in the primary index system can be reduced, which further lowers the implementation cost of the primary index system.
It should be noted that, for the sake of simplicity of description, the foregoing method embodiments are expressed as a series of combinations of actions, but it should be understood by those skilled in the art that the present disclosure is not limited by the order of actions described, as some steps may be performed in other order or simultaneously in accordance with the present disclosure. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all of the preferred embodiments, and that the acts and modules referred to are not necessarily required by the present disclosure.
In short, the scheme of the method embodiment adopts a two-level index design comprising a primary index system and a secondary index system, which strikes a good balance between system performance and cost, allows various dynamic adjustments and optimizations according to actual needs, is flexible and convenient, and lets the system run stably while meeting large-scale access requirements.
The foregoing is a description of the method embodiments; the scheme of the present disclosure is further described below through apparatus embodiments.
Fig. 3 is a schematic diagram of the structure of an embodiment 300 of an access request processing apparatus according to the present disclosure. As shown in Fig. 3, the apparatus includes: a request acquisition module 301 and a result generation module 302.
The request obtaining module 301 is configured to obtain an access request for a forward index system, where the forward index system includes: a primary index system using a disk as a storage medium and a secondary index system using a memory and a disk as storage media.
The result generation module 302 is configured to: in response to determining that the target document requested to be accessed is stored in the secondary index system, generate an access result corresponding to the access request according to the target document content stored in the storage medium corresponding to the target field requested to be accessed, where the fields included in the documents are divided into two types, hot fields and cold fields, the memory is used for storing the hot fields, and the disk is used for storing the cold fields; and, in response to determining that the target document is not stored in the secondary index system, generate the access result according to the target document content stored in the primary index system, and store the target document content in the secondary index system.
In the scheme of this apparatus embodiment, a two-level index design comprising a primary index system and a secondary index system is adopted. The secondary index system uses a hybrid storage medium of memory and disk and stores document content that has been frequently accessed recently; the hot fields of those documents are stored in the memory and the cold fields are stored on the disk. Accordingly, a portion of access requests hit document content held in the memory of the secondary index system, which reduces access latency and improves throughput. The primary index system can use a disk as its storage medium, which lowers the implementation cost, and it serves as a fallback when the secondary index system cannot provide an access result, which improves the access success rate.
The primary index system can be a distributed forward index system based on KV storage, each document can be serialized into a KV pair and then stored, wherein Key can be the identification of the document, and Value can be each field included in the document, a corresponding field Value and the like. In addition, the primary indexing system may employ a single storage medium, such as a disk.
In the scheme of the present disclosure, a secondary index system is further built on top of the primary index system. The secondary index system is a distributed forward index system based on a hybrid storage medium, and the amount of data it stores can be far smaller than that of the primary index system. It is mainly used for storing document content that has been frequently accessed recently: in scenarios such as search and recommendation, a small amount of document content is accessed far more often than the rest, so that content can be stored in the secondary index system. The hybrid storage medium can consist of a memory and a disk.
In addition, to meet the access requirements of high-frequency fields while reducing the consumption of expensive resources such as memory, frequently accessed hot fields can be stored in the memory and cold fields can be stored on the disk, so that a low-latency, high-throughput service can be provided at lower cost.
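The hot/cold placement described above can be sketched as follows. This is a minimal illustration, assuming that two plain dictionaries stand in for the memory tier and the disk tier and that the set of hot field names is already known; `store_document` is a hypothetical helper name.

```python
def store_document(doc_id, fields, hot_names, memory, disk):
    """Split one document's fields between the two tiers.

    memory and disk are dictionaries standing in for the real storage media
    (an assumption of this sketch); hot_names is the set of hot field names.
    Any field not listed as hot is treated as cold here.
    """
    memory[doc_id] = {n: v for n, v in fields.items() if n in hot_names}
    disk[doc_id] = {n: v for n, v in fields.items() if n not in hot_names}


# Hypothetical document: "title" is read often, "body" rarely.
memory, disk = {}, {}
store_document("doc-1", {"title": "hello", "body": "long text"}, {"title"}, memory, disk)
```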
Preferably, the result generation module 302 may evict document content stored in the secondary index system in a least recently used (LRU) manner.
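One possible way to realize the LRU eviction mentioned above is sketched below using Python's `collections.OrderedDict`. The class name `LRUDocumentCache` and the fixed document-count capacity are assumptions for illustration; the disclosure does not state how the capacity of the secondary index system is bounded.

```python
from collections import OrderedDict


class LRUDocumentCache:
    """Keeps at most `capacity` documents; evicts the least recently used one."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._docs = OrderedDict()  # oldest entry first, newest entry last

    def get(self, doc_id):
        if doc_id not in self._docs:
            return None
        self._docs.move_to_end(doc_id)  # a hit makes the document most recent
        return self._docs[doc_id]

    def put(self, doc_id, content):
        self._docs[doc_id] = content
        self._docs.move_to_end(doc_id)
        while len(self._docs) > self.capacity:
            self._docs.popitem(last=False)  # drop the least recently used document


cache = LRUDocumentCache(capacity=2)
cache.put("a", {"title": "A"})
cache.put("b", {"title": "B"})
cache.get("a")                  # "a" becomes the most recently used entry
cache.put("c", {"title": "C"})  # capacity exceeded, so "b" is evicted
```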
In addition, preferably, the access request may carry the identifier of the target document and target field information. Accordingly, when the access request is acquired, the result generating module 302 may query the secondary index system first, that is, query the secondary index system according to the identifier of the target document carried in the access request. If it is determined that the target document is stored in the secondary index system, the access result corresponding to the access request may be generated according to the target document content stored in the storage medium corresponding to the target field, and returned. If it is determined that the target document is not stored in the secondary index system, the target document may be taken to be stored in the primary index system and the access request may be passed through to the primary index system; accordingly, the access result may be generated according to the target document content stored in the primary index system and returned, and the target document content may further be stored in the secondary index system, so that subsequent requests for the document can be served by the secondary index system, which is queried first.
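The query order described above (secondary index system first, pass-through to the primary index system on a miss, then write-back) can be sketched as follows. The class `TwoLevelIndex`, the use of dictionaries for both index systems, and the `primary_reads` counter are assumptions made for illustration only.

```python
class TwoLevelIndex:
    """Sketch of the two-level lookup; both levels are plain dicts here."""

    def __init__(self, primary):
        self.primary = primary    # doc_id -> {field name: value}
        self.secondary = {}       # recently accessed documents
        self.primary_reads = 0    # counts pass-through requests (for illustration)

    def access(self, doc_id, target_fields):
        doc = self.secondary.get(doc_id)  # the secondary index is queried first
        if doc is None:
            # Miss: pass the request through to the primary index. This sketch
            # assumes every requested document exists in the primary index.
            self.primary_reads += 1
            doc = self.primary[doc_id]
            self.secondary[doc_id] = dict(doc)  # write back for later requests
        return {name: doc[name] for name in target_fields}


index = TwoLevelIndex({"doc-1": {"title": "hello", "body": "text"}})
first = index.access("doc-1", ["title"])   # served by the primary index
second = index.access("doc-1", ["body"])   # served by the secondary index
```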
Preferably, if the result generating module 302 determines that the target document is stored in the secondary index system, it may query the field selector stored in the memory. The field selector stores a hot field list and a cold field list: the hot field list includes information on the fields belonging to the hot fields, and the cold field list includes information on the fields belonging to the cold fields. In response to determining, according to the query result, that the target fields are all hot fields, the access result may be generated according to the target document content stored in the memory. In response to determining, according to the query result, that the target fields are all cold fields, the access result may be generated according to the target document content stored in the disk. In response to determining, according to the query result, that part of the target fields are hot fields, the access result may be generated according to the target document content stored in the memory and the disk.
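The three cases handled by the field selector can be illustrated with the following sketch. The `FieldSelector` dataclass, the `read_fields` function, and the returned list of media are hypothetical names chosen for this example; the memory and disk copies of the document are represented by dictionaries.

```python
from dataclasses import dataclass


@dataclass
class FieldSelector:
    hot_fields: set   # names in the hot field list
    cold_fields: set  # names in the cold field list


def read_fields(selector, memory_doc, disk_doc, target_fields):
    """Return (access result, media that had to be read) for one document."""
    unknown = set(target_fields) - selector.hot_fields - selector.cold_fields
    if unknown:
        raise KeyError(f"fields in neither list: {sorted(unknown)}")
    hot = [f for f in target_fields if f in selector.hot_fields]
    cold = [f for f in target_fields if f in selector.cold_fields]
    result, media = {}, []
    if hot:   # all-hot or mixed request: read the memory copy
        result.update({f: memory_doc[f] for f in hot})
        media.append("memory")
    if cold:  # all-cold or mixed request: read the disk copy
        result.update({f: disk_doc[f] for f in cold})
        media.append("disk")
    return result, media


selector = FieldSelector(hot_fields={"title"}, cold_fields={"body"})
memory_doc, disk_doc = {"title": "hello"}, {"body": "long text"}
result, media = read_fields(selector, memory_doc, disk_doc, ["title", "body"])
```

In this sketch an all-hot request never touches the disk copy, which is the source of the latency benefit the description attributes to the hybrid medium.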
In addition, preferably, when the access result needs to be generated according to the content of the target document stored in the memory and the disk, the result generating module 302 may acquire the document version information of the target document stored in the memory and the document version information of the target document stored in the disk, respectively, and in response to the acquired document version information being consistent, may generate the access result according to the content of the target document stored in the memory and the disk.
Preferably, in response to the acquired document version information being inconsistent, the result generation module 302 may generate the access result according to the target document content stored in the primary index system, and may replace the target document content stored in the secondary index system with the target document content stored in the primary index system.
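The document version check for requests that span both media can be sketched as follows. The record layout (`{"version": ..., "fields": ...}`), the integer versions, and the function name `read_hot_and_cold` are assumptions made for illustration; the disclosure does not specify how document version information is represented.

```python
def read_hot_and_cold(memory, disk, primary, doc_id, hot_names, cold_names):
    """Read a document whose fields are split across memory and disk.

    Each stored copy is assumed to look like {"version": int, "fields": dict}.
    If the two secondary copies disagree on the version, the result is built
    from the primary index and both secondary copies are replaced.
    """
    mem_copy, disk_copy = memory[doc_id], disk[doc_id]
    if mem_copy["version"] != disk_copy["version"]:
        source = primary[doc_id]  # fall back to the primary index system
        mem_copy = {"version": source["version"],
                    "fields": {n: source["fields"][n] for n in hot_names}}
        disk_copy = {"version": source["version"],
                     "fields": {n: source["fields"][n] for n in cold_names}}
        memory[doc_id], disk[doc_id] = mem_copy, disk_copy  # replace stale content
    return {**mem_copy["fields"], **disk_copy["fields"]}


# Hypothetical state: the disk copy was refreshed to version 2, memory was not.
primary = {"doc-1": {"version": 2, "fields": {"title": "new", "body": "new text"}}}
memory = {"doc-1": {"version": 1, "fields": {"title": "old"}}}
disk = {"doc-1": {"version": 2, "fields": {"body": "new text"}}}
result = read_hot_and_cold(memory, disk, primary, "doc-1", ["title"], ["body"])
```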
In addition, preferably, if it is determined that the target document is stored in the secondary index system, the result generating module 302 may query the field selector to obtain the field version information stored in the field selector, where the field version information in the field selector may be updated in response to the hot field list and/or the cold field list being updated. The module may also obtain the field version information of the target document from the storage medium corresponding to the target field and, in response to the obtained pieces of field version information being consistent, generate the access result according to the target document content stored in the storage medium corresponding to the target field.
Preferably, in response to the obtained pieces of field version information being inconsistent, the result generation module 302 may generate the access result according to the target document content stored in the primary index system, and may replace the target document content stored in the secondary index system with the target document content stored in the primary index system.
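The field version check can be sketched as below. The sketch assumes that the field version is an integer incremented on every update of the hot or cold field list, and that each stored document records the field version under which it was partitioned; the names `VersionedFieldSelector`, `update_lists`, and `layout_is_current` are hypothetical.

```python
class VersionedFieldSelector:
    """Field selector whose field version changes whenever the lists change."""

    def __init__(self, hot, cold):
        self.hot, self.cold = set(hot), set(cold)
        self.field_version = 1  # assumed representation: a counter

    def update_lists(self, hot, cold):
        self.hot, self.cold = set(hot), set(cold)
        self.field_version += 1  # any list update yields a new field version


def layout_is_current(selector, stored_field_version):
    """True if a stored document was partitioned under the current lists."""
    return selector.field_version == stored_field_version


selector = VersionedFieldSelector(hot={"title"}, cold={"body"})
stored_version = selector.field_version        # recorded when the document was stored
selector.update_lists(hot={"title", "body"}, cold=set())
stale = not layout_is_current(selector, stored_version)
```

When the check fails, the stored hot/cold split no longer matches the lists, which is why the description falls back to the primary index system and replaces the secondary copy.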
In addition, preferably, the primary index system and the secondary index system both support sharded storage, and both support adding replicas within a shard.
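Sharded storage with replicas added inside a shard might look like the following sketch. Placing documents by a CRC32 hash of the identifier, copying the first replica when a new one is added, and the names `shard_for` and `ShardedIndex` are all assumptions for illustration; the disclosure does not describe the placement or replication mechanism.

```python
import zlib


def shard_for(doc_id, num_shards):
    """Pick a shard by hashing the document identifier (assumed placement rule)."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


class ShardedIndex:
    def __init__(self, num_shards, replicas_per_shard=1):
        # shards[i] is the list of replicas of shard i; each replica is a dict.
        self.shards = [[{} for _ in range(replicas_per_shard)]
                       for _ in range(num_shards)]

    def put(self, doc_id, content):
        for replica in self.shards[shard_for(doc_id, len(self.shards))]:
            replica[doc_id] = content  # every replica of the shard gets the write

    def get(self, doc_id, replica=0):
        return self.shards[shard_for(doc_id, len(self.shards))][replica].get(doc_id)

    def add_replica(self, shard_id):
        # A new replica starts as a copy of an existing one, e.g. to absorb
        # extra read traffic on a heavily accessed shard.
        self.shards[shard_id].append(dict(self.shards[shard_id][0]))


index = ShardedIndex(num_shards=4)
index.put("doc-1", {"title": "hello"})
index.add_replica(shard_for("doc-1", 4))
```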
Fig. 4 is a schematic diagram of the composition of an embodiment 400 of the forward index system according to the present disclosure. As shown in Fig. 4, the system includes: a primary index system 401 that employs a disk as its storage medium, and a secondary index system 402 that employs a memory and a disk as its storage media.
A primary indexing system 401 for storing different document content.
A secondary index system 402 for storing popular document content, where the popular document content includes target document content that is added to the secondary index system 402 after an access result is generated according to the target document content, corresponding to an access request, stored in the primary index system 401. The target document is the document requested to be accessed, and that access result is generated after the secondary index system 402 is queried first and it is determined that the target document is not stored in the secondary index system 402.
Each field in any document stored in the secondary index system 402 belongs to one of two types, hot fields and cold fields; the memory is used for storing the hot fields and the disk is used for storing the cold fields. The access result further includes: an access result generated, when it is determined that the target document is stored in the secondary index system 402, according to the target document content stored in the storage medium corresponding to the target field requested to be accessed.
The scheme of this system embodiment adopts a two-level index design consisting of a primary index system and a secondary index system. The secondary index system uses a hybrid storage medium of memory and disk and stores document content that has been frequently accessed recently, with the hot fields of each document stored in the memory and the cold fields stored on the disk. Accordingly, some access requests hit document content held in the memory of the secondary index system, which reduces access latency and improves throughput. The primary index system can use a disk as its storage medium, which reduces implementation cost, and it serves as a fallback when the secondary index system cannot provide an access result, which improves the access success rate.
The specific workflow of the above apparatus and system embodiments may refer to the related descriptions in the foregoing method embodiments, which are not repeated.
In summary, the schemes of the apparatus and system embodiments of the present disclosure adopt a two-level index design of a primary index system and a secondary index system, which strikes a good balance between system performance, cost, and other considerations. Various dynamic adjustments and optimizations can be made according to actual needs, which makes the scheme flexible and convenient, and the system can operate stably and meet large-scale access requirements.
The scheme of the present disclosure can be applied to the field of artificial intelligence, and in particular to fields such as distributed storage and big data processing. Artificial intelligence is the discipline that studies how to make computers simulate certain human thinking processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it involves technologies at both the hardware and software levels. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technologies, and the like.
In addition, the documents and the like in the embodiments of the present disclosure are not specific to any particular user and cannot reflect the personal information of any particular user. In the technical scheme of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of users' personal information comply with relevant laws and regulations and do not violate public order and good customs.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 5 shows a schematic block diagram of an electronic device 500 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile apparatuses, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing apparatuses. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in Fig. 5, the device 500 includes a computing unit 501 that can perform various suitable actions and processes according to a computer program stored in a Read Only Memory (ROM) 502 or a computer program loaded from a storage unit 508 into a Random Access Memory (RAM) 503. The RAM 503 can also store various programs and data required for the operation of the device 500. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Various components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, etc.; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508 such as a magnetic disk, an optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, etc. The computing unit 501 performs the various methods and processes described above, such as the methods described in this disclosure. For example, in some embodiments, the methods described in the present disclosure may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the methods described in the present disclosure may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the methods described in the present disclosure by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems On Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recited in the present disclosure may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved; no limitation is imposed herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (22)

1. An access request processing method, comprising:
obtaining an access request for a forward index system, wherein the forward index system comprises: a primary index system using a disk as a storage medium and a secondary index system using a memory and a disk as storage media;
in response to determining that a target document requested to be accessed is stored in the secondary index system, generating an access result corresponding to the access request according to target document content stored in the storage medium corresponding to a target field requested to be accessed, wherein the fields included in different documents are divided into two types, hot fields and cold fields, the memory is used for storing the hot fields, and the disk is used for storing the cold fields; and
in response to determining that the target document is not stored in the secondary index system, generating the access result according to the target document content stored in the primary index system, and storing the target document content in the secondary index system.
2. The method of claim 1, wherein,
the access request carries the identification of the target document and the target field information;
the determining that the target document requested to be accessed is stored in the secondary index system comprises: determining, according to the identification, whether the target document is stored in the secondary index system.
3. The method of claim 1, further comprising:
evicting the document content stored in the secondary index system in a least recently used manner.
4. The method of claim 1, wherein,
the generating the access result corresponding to the access request according to the target document content stored in the storage medium corresponding to the target field of the access request comprises:
inquiring a field selector stored in the memory, wherein a hot field list and a cold field list are stored in the field selector, the hot field list comprises field information belonging to the hot field, and the cold field list comprises field information belonging to the cold field;
in response to determining, according to the query result, that the target fields are all hot fields, generating the access result according to the target document content stored in the memory;
in response to determining, according to the query result, that the target fields are all cold fields, generating the access result according to the target document content stored in the disk; and
in response to determining, according to the query result, that part of the target fields are hot fields, generating the access result according to the target document content stored in the memory and the disk.
5. The method of claim 4, wherein,
the generating the access result according to the target document content stored in the memory and the disk comprises:
respectively acquiring the document version information of the target document stored in the memory and the document version information of the target document stored in the disk; and
in response to the acquired document version information being consistent, generating the access result according to the target document content stored in the memory and the disk.
6. The method of claim 5, further comprising:
in response to the acquired document version information being inconsistent, generating the access result according to the target document content stored in the primary index system, and replacing the target document content stored in the secondary index system with the target document content stored in the primary index system.
7. The method of claim 4, wherein,
the generating the access result corresponding to the access request according to the target document content stored in the storage medium corresponding to the target field of the access request comprises:
querying the field selector to obtain field version information stored in the field selector, wherein the field version information in the field selector is updated in response to updating of the hot field list and/or the cold field list;
acquiring field version information of the target document from a storage medium corresponding to the target field;
in response to the acquired field version information being consistent, generating the access result according to the target document content stored in the storage medium corresponding to the target field.
8. The method of claim 7, further comprising:
in response to the acquired field version information being inconsistent, generating the access result according to the target document content stored in the primary index system, and replacing the target document content stored in the secondary index system with the target document content stored in the primary index system.
9. The method according to any one of claims 1 to 8, wherein,
the primary index system and the secondary index system both support sharded storage, and both support adding replicas within a shard.
10. An access request processing apparatus comprising: a request acquisition module and a result generation module;
the request acquisition module is configured to acquire an access request for a forward index system, wherein the forward index system comprises: a primary index system using a disk as a storage medium and a secondary index system using a memory and a disk as storage media;
the result generation module is configured to: in response to determining that a target document requested to be accessed is stored in the secondary index system, generate an access result corresponding to the access request according to target document content stored in the storage medium corresponding to a target field requested to be accessed, wherein the fields included in different documents are divided into two types, hot fields and cold fields, the memory is used for storing the hot fields, and the disk is used for storing the cold fields; and, in response to determining that the target document is not stored in the secondary index system, generate the access result according to the target document content stored in the primary index system, and store the target document content in the secondary index system.
11. The apparatus of claim 10, wherein,
the access request carries the identification of the target document and the target field information;
the result generation module determines whether the target document is stored in the secondary index system according to the identification.
12. The apparatus of claim 10, wherein,
the result generation module is further configured to evict the document content stored in the secondary index system in a least recently used manner.
13. The apparatus of claim 10, wherein,
the result generation module queries a field selector stored in the memory, wherein a hot field list and a cold field list are stored in the field selector, the hot field list comprises field information belonging to the hot fields, and the cold field list comprises field information belonging to the cold fields; in response to determining, according to the query result, that the target fields are all hot fields, generates the access result according to the target document content stored in the memory; in response to determining, according to the query result, that the target fields are all cold fields, generates the access result according to the target document content stored in the disk; and, in response to determining, according to the query result, that part of the target fields are hot fields, generates the access result according to the target document content stored in the memory and the disk.
14. The apparatus of claim 13, wherein,
the result generation module respectively acquires the document version information of the target document stored in the memory and the document version information of the target document stored in the disk, and generates the access result according to the contents of the target document stored in the memory and the disk in response to the consistency of the acquired document version information.
15. The apparatus of claim 14, wherein,
the result generation module is further used for generating the access result according to the target document content stored in the primary index system in response to the inconsistency of the obtained document version information, and replacing the target document content stored in the secondary index system by the target document content stored in the primary index system.
16. The apparatus of claim 13, wherein,
the result generation module queries the field selector to acquire field version information stored in the field selector, wherein the field version information in the field selector is updated in response to updating of the hot field list and/or the cold field list; acquires the field version information of the target document from the storage medium corresponding to the target field; and, in response to the acquired field version information being consistent, generates the access result according to the target document content stored in the storage medium corresponding to the target field.
17. The apparatus of claim 16, wherein,
the result generation module is further configured to generate the access result according to the target document content stored in the primary index system in response to inconsistency of the obtained version information of each field, and replace the target document content stored in the secondary index system with the target document content stored in the primary index system.
18. The device according to any one of claims 10 to 17, wherein,
the primary index system and the secondary index system both support sharded storage, and both support adding replicas within a shard.
19. A forward indexing system comprising:
a primary index system using a disk as a storage medium and a secondary index system using a memory and a disk as storage media;
the primary index system is configured to store different document contents;
the secondary index system is configured to store popular document content, wherein the popular document content comprises: target document content that is added to the secondary index system after an access result is generated according to the target document content, corresponding to an access request, stored in the primary index system, wherein the target document is the document requested to be accessed, and the access result is generated after the secondary index system is queried first and it is determined that the target document is not stored in the secondary index system;
each field in any document stored in the secondary index system belongs to one of two types, hot fields and cold fields, the memory is used for storing the hot fields, the disk is used for storing the cold fields, and the access result further comprises: an access result generated, when it is determined that the target document is stored in the secondary index system, according to the target document content stored in the storage medium corresponding to the target field requested to be accessed.
20. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-9.
21. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-9.
22. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the method of any of claims 1-9.
CN202310402027.2A 2023-04-14 2023-04-14 Access request processing method and device and forward index system Pending CN116594962A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310402027.2A CN116594962A (en) 2023-04-14 2023-04-14 Access request processing method and device and forward index system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310402027.2A CN116594962A (en) 2023-04-14 2023-04-14 Access request processing method and device and forward index system

Publications (1)

Publication Number Publication Date
CN116594962A true CN116594962A (en) 2023-08-15

Family

ID=87588867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310402027.2A Pending CN116594962A (en) 2023-04-14 2023-04-14 Access request processing method and device and forward index system

Country Status (1)

Country Link
CN (1) CN116594962A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118069590A (en) * 2024-04-22 2024-05-24 极限数据(北京)科技有限公司 Forward index processing method, device, medium and equipment for searching database

Similar Documents

Publication Publication Date Title
JP2022191412A (en) Method for training multi-target image-text matching model and image-text retrieval method and apparatus
WO2023093245A1 (en) Metadata query method based on distributed file system, and device and storage medium
JP2022137281A (en) Data query method, device, electronic device, storage medium, and program
CN116594962A (en) Access request processing method and device and forward index system
CN114817651B (en) Data storage method, data query method, device and equipment
CN113722600B (en) Data query method, device, equipment and product applied to big data
CN109947736B (en) Method and system for real-time computing
CN111949648B (en) Memory data caching system and data indexing method
US12007965B2 (en) Method, device and storage medium for deduplicating entity nodes in graph database
CN114491253B (en) Method and device for processing observation information, electronic equipment and storage medium
US10769214B2 (en) Encoding and decoding files for a document store
CN115617859A (en) Data query method and device based on knowledge graph cluster
CN113240089B (en) Graph neural network model training method and device based on graph retrieval engine
KR20240010581A (en) Machine learning hyperparameter tuning
US11599583B2 (en) Deep pagination system
CN111639099A (en) Full-text indexing method and system
CN110765237A (en) Document processing method, document processing device, storage medium and electronic equipment
CN113449155B (en) Method, apparatus, device and medium for feature representation processing
CN115759233B (en) Model training method, graph data processing device and electronic equipment
EP4131017A2 (en) Distributed data storage
EP3961432A2 (en) Method and apparatus of optimizing search system
CN113032402B (en) Method, device, equipment and storage medium for storing data and acquiring data
CN117112546A (en) Data operation method, device, equipment and storage medium
Wei et al. A Highly Accurate Data Synchronization and Full-text Search Algorithm for Canal and Elasticsearch
CN116383333A (en) Data storage method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination