CN102169507A - Distributed real-time search engine - Google Patents


Info

Publication number
CN102169507A
Authority
CN
China
Prior art keywords
index
burst
node
center control
control nodes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN 201110137785
Other languages
Chinese (zh)
Other versions
CN102169507B (en)
Inventor
程行荣
季刚
陈青溪
时宜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Yaxon Networks Co Ltd
Original Assignee
Xiamen Yaxon Networks Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Yaxon Networks Co Ltd filed Critical Xiamen Yaxon Networks Co Ltd
Priority to CN 201110137785 priority Critical patent/CN102169507B/en
Publication of CN102169507A publication Critical patent/CN102169507A/en
Application granted granted Critical
Publication of CN102169507B publication Critical patent/CN102169507B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of search engines, specifically relating to a distributed real-time search engine. A system construction and operation method of the search engine at least comprises the following steps: A, designing a functional structure of a system; B, designing a data index structure of the system; C, creating an index; D, updating the index; and E, searching the index. The distributed real-time search engine can construct an updating index and a combining index simultaneously in the memory of the system, and can access the updating index and the combining index simultaneously while searching the index; when the number of the documents of the updating index is accumulated to a threshold value, the updating index is submitted to a disk index and changed as a combining index, and the original combining index is changed as a new updating index; and therefore, the updating data can be searched, and the real time property of the retrieval data of the search engine can be improved.

Description

A distributed real-time search engine
Technical field
The present invention relates to the technical field of search engines, and in particular to a distributed real-time search engine.
Background art
With the arrival of the knowledge era, information on the Internet is growing explosively. What people face today is not a lack of information but a flood of it that they have no means of filtering. Obtaining the needed information accurately, quickly and in a timely manner is therefore the problem a search engine has to solve.
A search engine is a system that collects information from a particular network, such as the Internet, according to a certain strategy and using specific computer programs, organizes and processes that information, provides a retrieval service to users, and presents to the user the information relevant to the user's query.
Traditional search engines such as Google, Baidu and Yahoo process huge volumes of data, reaching the TB level, but their data comes mainly from conventional websites such as portals, forums and e-government sites. Sites of this kind are updated infrequently and each update is small, so processing their information places no high real-time demand on the search engine.
With the rise of social media such as microblogs and social networking sites, the "micro-messages" created by Internet users have appeared in large numbers, producing massive real-time data. In addition, with the rapid development of enterprise mobile applications such as mobile CRM systems and handheld terminals, users place higher demands on the speed and timeliness of information queries, and traditional search engines cannot meet the demands of massive real-time data processing and real-time search. Massive real-time data is characterized by a high update frequency, large updates and a large accumulated volume, usually hundreds of GB and sometimes reaching the TB or PB level. A real-time search engine places very high demands on both the timeliness of mass data processing and query response. When the data volume reaches the TB level, the frequency of data updates conflicts sharply with the speed of query response: when both the accumulated data and the updates are large, building and maintaining the index takes so long that timeliness cannot be guaranteed. That is, when an existing search engine scheme adopts an incremental indexing mechanism, index construction and retrieval are carried out separately: the index construction logic submits a new segment to the index shard, for use by the index retrieval logic, only after the number of documents accumulated in the new segment reaches a threshold (e.g. 10000) or the elapsed interval reaches a threshold (e.g. 5 minutes). Consequently there is a delay between the submission of a document and the moment it can be retrieved, usually from a few minutes to tens of minutes, and such a long delay is intolerable in real-time retrieval.
Summary of the invention
In view of the deficiencies of the prior art, the present invention proposes a distributed real-time search engine that overcomes the conflict between the incremental indexing mechanism and index timeliness through the cooperation of an updating index and a combining index in system memory with a disk index.
The technical solution adopted by the present invention is as follows:
A distributed real-time search engine, whose system construction and operation comprise at least the following steps:
A. Design the functional structure of the system. The functional structure is created in a Master/Slave-based cluster system and comprises the following functional nodes: a central control node, index data storage nodes and external service nodes. The central control node is created in the Master system, and the index data storage nodes and external service nodes are created in the Slave system. The central control node stores and maintains the attribute information of the indexes in the data index structure and the attribute information of the index data storage nodes. The index data storage nodes create, update and retrieve the index shards of the data index structure. The external service nodes receive index creation, update and retrieval requests and forward them to the central control node for processing;
B. Design the data index structure of the system. The index structure is a tree whose levels, from top to bottom, are: index, index shard, segment, document and field. A system can hold multiple indexes. An index shard is one data block obtained by partitioning an index, and each index shard belonging to the same index is stored on an index data storage node. An index shard consists of one or more segments, and a segment consists of one or more documents; the documents in a segment may be of different data object types. Each document has a key value that uniquely identifies it across the whole system, and the structure of a document comprises fields that describe the document type;
C. Creation of an index, comprising the following steps:
C1. After receiving an index creation request, the external service node forwards it to the central control node. The central control node parses the request, extracts the attribute information of the index to be created, and verifies whether that attribute information is complete and valid. If it is complete and valid, step C2 is carried out; if it is incomplete or invalid, a failure response is sent to the external service node;
C2. The central control node divides the index to be created into a number of shards according to the index shard count in the attribute information obtained in step C1. At the same time, based on the attribute information of the index data storage nodes held in the central control node, it evaluates the state and load of each index data storage node, decides accordingly on which index data storage node each index shard is to be created and stored, and then sends the attribute information of the index to be created to each corresponding index data storage node. Each index data storage node, according to the attribute information it receives, builds on itself the index shards of the index to be created that the central control node assigned to it. If an index data storage node fails to create an index shard, the central control node assigns that shard to another index data storage node that is in good condition and comparatively lightly loaded. Once all index shards of the index to be created have either been created on index data storage nodes or have failed to be created, step C3 is carried out;
C3. If all index shards of the index to be created were created successfully on index data storage nodes in step C2, the central control node updates the index data storage node attribute information it holds and sends a response to the external service node reporting that the index shards were created successfully; if the index shards of the index to be created failed to be created on the index data storage nodes in step C2, a response reporting that index creation failed is sent to the external service node;
D. Update of an index, comprising the following steps:
D1. After receiving an index update request, the external service node forwards it to the central control node. Based on the index attribute information and index data storage node attribute information it holds, the central control node sends the update request to the index data storage node holding the relevant index shard of that index;
D2. According to the index update request it receives, the index data storage node stores the updated document in a new segment on the index shard of the index to be updated. If the updated document is stored successfully, the old document corresponding to it is marked as deleted in the new segment and an update-success message is returned to the central control node; if storing fails, an update-failure message is returned to the central control node. Finally, the central control node sends the success or failure message to the external service node;
The index update of step D further comprises a document deletion step: when an index update request is only a delete-document command, the document is marked as deleted in a new segment on the shard, on the index data storage node, that holds the document to be deleted;
The index update of step D further comprises a step of building a real-time index: an updating index and a combining index are built simultaneously in system memory, and index retrieval accesses both the updating index and the combining index. During an index update, the index being updated is the updating index. When the number of documents in the updating index reaches a threshold, or the update time of the updating index reaches a threshold, the system commits the updating index to the disk index, then changes the updating index into the combining index and, at the same time, changes the former combining index into the new updating index;
E. Retrieval from an index, comprising the following steps:
E1. After receiving an index retrieval request, the external service node sends it to the central control node. The central control node parses the request and determines the target index it addresses, then, based on the index data storage node attribute information and the attribute information of the target index, looks up all index shards of the target index and dispatches the retrieval request to the index data storage node storing each shard;
E2. According to the retrieval request it receives, each index data storage node retrieves the relevant documents from the corresponding index shard it stores, sorts the results, and sends them to the external service node;
E3. The external service node integrates and sorts the results received from the index data storage nodes and sends them to the client.
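The retrieval flow of steps E1 to E3 can be illustrated with a minimal Python sketch. It is not the patented implementation: the shard representation (a dictionary from document key to text), the term-count scoring and the names `search_shard` and `scatter_gather` are all assumptions made for illustration. Each shard returns its hits already sorted, and the merging step plays the role of the external service node.

```python
import heapq
from typing import Dict, List, Tuple

Hit = Tuple[float, str]  # (score, document key); an illustrative result type


def search_shard(shard: Dict[str, str], term: str) -> List[Hit]:
    """Step E2 (sketch): score the documents of one shard, best first."""
    hits = [(float(text.split().count(term)), key) for key, text in shard.items()]
    hits = [h for h in hits if h[0] > 0]
    # Sort by descending score, then by key so ties are deterministic.
    return sorted(hits, key=lambda h: (-h[0], h[1]))


def scatter_gather(shards: List[Dict[str, str]], term: str, top_k: int) -> List[Hit]:
    """Steps E1 and E3 (sketch): query every shard, merge the sorted partial results."""
    partials = [search_shard(shard, term) for shard in shards]
    # heapq.merge needs each input sorted by the same key used here.
    merged = heapq.merge(*partials, key=lambda h: (-h[0], h[1]))
    return list(merged)[:top_k]
```

Because every shard sorts its own results, the final integration is a merge of already ordered lists rather than a full re-sort, which is one plausible reason for sorting on the storage nodes first.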
Further, the functional structure of the system described in step A also comprises a standby central control node. The central control node synchronizes the data it stores to the standby central control node in real time. When the central control node fails, the standby central control node becomes the central control node, and when the former central control node recovers from the failure, it becomes the new standby central control node.
Further, the index data storage nodes and external service nodes periodically send heartbeat signals describing their status to the central control node. If the central control node does not receive a heartbeat signal within a preset time, it marks the index data storage node or external service node concerned as dead. At the same time, for every index shard stored on an index data storage node marked as dead, the central control node makes one further copy, taken from the copies of that shard held on other index data storage nodes, onto any other index data storage node that does not yet store a copy of that shard, so that the number of copies of each index shard remains unchanged and the index shards are available at all times.
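A minimal Python sketch of this heartbeat check and re-replication follows. The class `HeartbeatMonitor`, the function `rereplicate` and the placement map (shard id to the set of nodes holding a copy) are hypothetical names introduced for illustration; timestamps are passed in explicitly so the behaviour is deterministic.

```python
from typing import Dict, List, Set


class HeartbeatMonitor:
    """Tracks the last heartbeat per node (sketch of the central control node's view)."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, node_id: str, now: float) -> None:
        self.last_seen[node_id] = now

    def dead_nodes(self, now: float) -> List[str]:
        # A node is considered dead once no heartbeat arrived within the timeout.
        return sorted(n for n, t in self.last_seen.items() if now - t > self.timeout)


def rereplicate(placement: Dict[str, Set[str]], dead: str, live: List[str]) -> None:
    """Restore the copy count of every shard that had a copy on the dead node."""
    for holders in placement.values():
        if dead not in holders:
            continue
        holders.discard(dead)
        candidates = [n for n in live if n not in holders]
        # A surviving copy is needed as the source of the new copy.
        if holders and candidates:
            holders.add(candidates[0])
```

Choosing the first eligible live node is a simplification; the text suggests that load information would normally guide the choice.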
Further, the heartbeat signal that an index data storage node sends to the central control node includes the load information of that node. During index creation, the central control node assigns index shards, as far as possible, to lightly loaded index data storage nodes. Likewise, during index retrieval, the central control node submits retrieval requests, as far as possible, to the lightly loaded index data storage node holding the index shard or a copy of it.
Further, the index data storage node attribute information comprises: the node ID, node name, node type, node state, node load and node location. The index attribute information comprises: the name of the index, the structure definition of the documents in the index, the number of shards of the index, the number of copies of each index shard, and the storage node IDs of the index shards and index shard copies.
Further, in the data index structure of the system described in step B, each index shard also has multiple index shard copies. An index shard copy is created when the index is created as described in step C, is updated asynchronously after the original index shard is updated as described in step D, and is stored on a different index data storage node from the original index shard. The index data storage node holding the original index shard handles the update requests for that shard; after the original index shard has been updated, that node asynchronously sends the update request to the index data storage nodes holding the corresponding index shard copies, which then update the copies. Index shard copies and their original index shards all support index retrieval: based on the load of the index data storage nodes holding the original index shard and its copies, the central control node submits each index retrieval request to the lightly loaded node among them.
Further, the central control node regularly checks the number of index shard copies of every index. When the number of copies of an index shard falls below the preset number, copies of that shard are automatically replicated to other data nodes in the system. When the index data storage node storing an original index shard fails, the system selects one of the corresponding index shard copies to take over the index update work of the original index shard; this copy becomes the new original index shard, and a further index shard copy is then generated on another index data storage node so that the number of copies of the shard remains unchanged. When an index data storage node storing an index shard copy fails, the system generates a copy identical to the original index shard on another index data storage node so that the number of copies of the shard remains unchanged.
Further, the index shards and index shard copies of the same index are created and stored on the index data storage nodes according to the following strategy: based on the node load information in the attribute information of the index data storage nodes, the central control node assigns index shards and index shard copies to the most lightly loaded index data storage nodes. When the number of available index data storage nodes is smaller than the number of index shards, the central control node assigns multiple index shards to the same index data storage node and does not allocate any index shard copies. When the number of available index data storage nodes is larger than the number of index shards, the central control node allocates index shard copies of some or all of the index shards to the remaining index data storage nodes.
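One way to read this placement strategy is sketched below in Python. The function `place_shards`, the shard labels (`s0`, `s0-copy`) and the rule that each placed shard adds one unit of load are assumptions for illustration, as is the interpretation of "remaining nodes" as nodes that received no original shard.

```python
from typing import Dict, List


def place_shards(node_loads: Dict[str, int], shard_count: int) -> Dict[str, List[str]]:
    """Assign shards to the lightest node; add copies only if spare nodes remain."""
    loads = dict(node_loads)  # work on a copy of the reported loads
    placement: Dict[str, List[str]] = {node: [] for node in loads}
    for i in range(shard_count):
        # Lightest node first; the node id breaks ties deterministically.
        node = min(loads, key=lambda n: (loads[n], n))
        placement[node].append(f"s{i}")
        loads[node] += 1  # assumption: each shard adds one unit of load
    if len(loads) > shard_count:
        # More nodes than shards: put copies on nodes that hold no original shard.
        spare = sorted((n for n in placement if not placement[n]),
                       key=lambda n: (loads[n], n))
        for i, node in zip(range(shard_count), spare):
            placement[node].append(f"s{i}-copy")
    return placement
```

With two nodes and four shards the sketch doubles up shards and creates no copies; with three nodes and two shards the third node receives a copy of the first shard.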
Further, the index update of step D also comprises a segment merging step: when the number of segments in an index shard of the index being updated reaches a threshold, or the time since the last index merge reaches a threshold, the index data storage node holding that index shard reads the documents in several of the smaller segments and stores them in a new segment, then physically deletes those smaller segments.
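The segment merge can be sketched in Python as follows. Only the segment-count trigger is modelled, not the time-based one. The `Doc` type, the function name and the decision to drop documents marked as deleted during the merge are assumptions; the text itself only states that small segments are rewritten into a new one and then physically removed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Doc:
    key: str
    text: str
    deleted: bool = False  # deletion mark, as set by an update or delete request


def merge_small_segments(segments: List[List[Doc]], max_segments: int,
                         merge_n: int) -> List[List[Doc]]:
    """Merge the merge_n smallest segments once max_segments is reached."""
    if len(segments) < max_segments:
        return segments  # threshold not reached, nothing to do
    by_size = sorted(range(len(segments)), key=lambda i: len(segments[i]))
    small = set(by_size[:merge_n])
    # Assumption: documents marked as deleted are dropped while merging.
    merged = [d for i in sorted(small) for d in segments[i] if not d.deleted]
    kept = [seg for i, seg in enumerate(segments) if i not in small]
    return kept + [merged]  # the small segments are gone; one new segment replaces them
```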
Further, the updated document of step D is stored on an index shard as follows: the hash value of the key value of the updated document is computed, this hash value is taken modulo the number of index shards of the index to which the document belongs, and the document is assigned to, and stored on, the index shard whose number equals the result.
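This hash-modulo routing is small enough to show directly. The sketch below uses CRC32 as the hash function, which is an assumption: the text does not name a hash function. Python's built-in `hash()` is avoided because it is randomized per process for strings, so it would not route the same key to the same shard across restarts.

```python
import zlib


def shard_for(key: str, shard_count: int) -> int:
    """Return the number of the index shard that stores the document with this key."""
    # CRC32 is a stand-in hash; any stable hash of the key would serve.
    return zlib.crc32(key.encode("utf-8")) % shard_count
```

For example, `shard_for("doc-42", 4)` always yields the same value in the range 0 to 3, so updates and deletions of a document reach the shard that holds its earlier version.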
Further, the different data object types of the documents described in step B comprise: text data objects, image data objects, audio data objects, video data objects and executable program data objects; the attribute information of each data object type is stored in the field structure of the document.
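The index hierarchy of step B (index, index shard, segment, document, field) can be modelled with a few Python dataclasses. This is only an illustrative data model: the class names, the representation of fields as a name-to-value dictionary and the helper `count_documents` are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Document:
    key: str                 # key value, unique across the whole system
    fields: Dict[str, str]   # field name -> value, including data object type attributes


@dataclass
class Segment:
    documents: List[Document] = field(default_factory=list)


@dataclass
class IndexShard:
    segments: List[Segment] = field(default_factory=list)


@dataclass
class Index:
    name: str
    shards: List[IndexShard] = field(default_factory=list)


def count_documents(index: Index) -> int:
    """Walk the tree from index down to documents."""
    return sum(len(seg.documents) for shard in index.shards for seg in shard.segments)
```

A segment may mix data object types, for example one document whose fields describe a text object and another whose fields describe an image.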
By adopting the above technical solution, the present invention has the following beneficial effects:
1. An updating index and a combining index are built simultaneously in system memory, and both are accessed simultaneously during index retrieval. Once the number of documents in the updating index reaches the threshold, the updating index is committed to the disk index and becomes the combining index, and the original combining index becomes the new updating index. This ensures that updated data can also be retrieved and improves the timeliness of the data retrievable by the search engine;
2. The central control node, standby central control node, external service nodes and index data storage nodes of the system are created in a Master/Slave-based cluster system, which is highly fault-tolerant, suitable for deployment on inexpensive machines, and able to provide high-throughput data access;
3. Creating index shard copies for the index shards stored on the index data storage nodes enhances the fault tolerance of the system.
Description of the drawings
Fig. 1 is a schematic diagram of the functional structure of one embodiment of the present invention.
Fig. 2 is a schematic diagram of the data index structure of the present invention.
Fig. 3 is a schematic diagram of an embodiment of the storage strategy for index shards and index shard copies of the present invention.
Detailed description of the embodiments
The present invention is now further described with reference to the drawings and embodiments.
A distributed real-time search engine, whose system construction and operation consist of the following steps:
Steps A: the functional structure of design system, consult shown in the accompanying drawing 1, this functional structure is to create in the concentrating type system based on Master/Slave, comprise following functional node: center control nodes, index datastore node and external service node, wherein, described center control nodes is created in the Master system, described index datastore node and external service node are created in the Slave system, described center control nodes is host node in system, the storage and maintenance that is used for the attribute information of data directory structure index, and the storage and maintenance of the attribute information of index datastore node, described index datastore node is back end in system, be used for the establishment of data directory structure index sliced layer, upgrade and retrieval, described external service node is a client node in system, is used for the establishment of reception hint, renewal and retrieval request also are forwarded to center control nodes with this request and handle;
Step B: Design the data index structure of the system, as shown in Fig. 2. The index structure is a tree whose levels, from top to bottom, are: index, index shard, segment, document and field. A system can hold multiple indexes. An index shard is one data block obtained by partitioning an index, and each index shard belonging to the same index is stored on an index data storage node. An index shard consists of one or more segments, and a segment consists of one or more documents; the documents in a segment may be of different data object types. Each document has a key value that uniquely identifies it across the whole system, and the structure of a document comprises fields that describe the different attributes of the document. An index is a collection of data objects of various types for which retrieval is supported, and the index shards are distributed across the index data storage nodes of the system, which improves the data retrieval efficiency of the system;
Step C: Creation of an index, consisting of the following steps:
C1. After receiving an index creation request, the external service node forwards it to the central control node. The central control node parses the request, extracts the attribute information of the index to be created, and verifies whether that attribute information is complete and valid. If it is complete and valid, step C2 is carried out; if it is incomplete or invalid, a failure response is sent to the external service node;
C2. The central control node divides the index to be created into a number of shards according to the index shard count in the attribute information obtained in step C1. At the same time, based on the attribute information of the index data storage nodes held in the central control node, it evaluates the state and load of each index data storage node, decides accordingly on which index data storage node each index shard is to be created and stored, and then sends the attribute information of the index to be created to each corresponding index data storage node. Each index data storage node, according to the attribute information it receives, builds on itself the index shards of the index to be created that the central control node assigned to it. If an index data storage node fails to create an index shard, the central control node assigns that shard to another index data storage node that is in good condition and comparatively lightly loaded. Once all index shards of the index to be created have either been created on index data storage nodes or have failed to be created, step C3 is carried out;
C3. If all index shards of the index to be created were created successfully on index data storage nodes in step C2, the central control node updates the index data storage node attribute information it holds and sends a response to the external service node reporting that the index shards were created successfully; if the index shards of the index to be created failed to be created on the index data storage nodes in step C2, a response reporting that index creation failed is sent to the external service node;
Step D: Update of an index, consisting of the following steps:
D1. After receiving an index update request, the external service node forwards it to the central control node. Based on the index attribute information and index data storage node attribute information it holds, the central control node sends the update request to the index data storage node holding the relevant index shard of that index;
D2. According to the index update request it receives, the index data storage node stores the updated document in a new segment on the index shard of the index to be updated. If the updated document is stored successfully, the old document corresponding to it is marked as deleted in the new segment and an update-success message is returned to the central control node; if storing fails, an update-failure message is returned to the central control node. Finally, the central control node sends the success or failure message to the external service node;
The index update of step D further comprises a document deletion step: when an index update request is only a delete-document command, the document is marked as deleted in a new segment on the shard, on the index data storage node, that holds the document to be deleted;
The index update of step D further comprises a step of building a real-time index: an updating index and a combining index are built simultaneously in system memory, and index retrieval accesses both the updating index and the combining index. During an index update, the index being updated is the updating index. When the number of documents in the updating index reaches a threshold, or the update time of the updating index reaches a threshold, the system commits the updating index to the disk index, then changes the updating index into the combining index and, at the same time, changes the former combining index into the new updating index;
Step E: Retrieval from an index, consisting of the following steps:
E1. After receiving an index retrieval request, the external service node sends it to the central control node. The central control node parses the request and determines the target index it addresses, then, based on the index data storage node attribute information and the attribute information of the target index, looks up all index shards of the target index and dispatches the retrieval request to the index data storage node storing each shard;
E2. According to the retrieval request it receives, each index data storage node retrieves the relevant documents from the corresponding index shard it stores, sorts the results, and sends them to the external service node;
E3. The external service node integrates and sorts the results received from the index data storage nodes and sends them to the client.
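The real-time index mechanism of step D (two in-memory indexes that swap roles, searched together with the disk index) can be sketched in Python as below. This is a simplified model under stated assumptions, not the patented implementation: the class `RealTimeIndex` is hypothetical, the disk index is stood in for by a dictionary, the commit runs synchronously, only the document-count threshold is modelled (not the time threshold), and the former combining index is assumed to be empty-able at swap time because its content is already on disk.

```python
from typing import Dict, List


class RealTimeIndex:
    """Two in-memory indexes plus a disk index, all visible to search."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.updating: Dict[str, str] = {}   # receives newly updated documents
        self.combining: Dict[str, str] = {}  # previously committed batch, still searchable
        self.disk: Dict[str, str] = {}       # stand-in for the on-disk index

    def add(self, key: str, text: str) -> None:
        self.updating[key] = text
        if len(self.updating) >= self.threshold:
            self._commit()

    def _commit(self) -> None:
        self.disk.update(self.updating)  # commit the updating index to the disk index
        # Assumption: the old combining index is already on disk, so it can be emptied.
        self.combining.clear()
        # Swap roles: updating -> combining, former combining -> new updating.
        self.updating, self.combining = self.combining, self.updating

    def search(self, term: str) -> List[str]:
        hits = set()
        for part in (self.updating, self.combining, self.disk):
            hits.update(k for k, text in part.items() if term in text.split())
        return sorted(hits)
```

The point of the design is visible in `search`: a document is retrievable as soon as `add` returns, without waiting for the threshold that triggers the commit to disk.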
As a preferred embodiment, the functional structure of the system described in step A also comprises a standby central control node. The central control node synchronizes the data it stores to the standby central control node in real time. When the central control node fails, the standby central control node becomes the central control node, and when the former central control node recovers from the failure, it becomes the new standby central control node. Because the central control node is the master node of the system, a failure of this node would paralyze the whole system; adding a standby central control node therefore makes failover of the central control node possible and improves the fault tolerance of the system.
As a preferred embodiment, the index data storage nodes and external service nodes periodically send heartbeat signals describing their status to the central control node. If the central control node does not receive a heartbeat signal within a preset time, it marks the index data storage node or external service node concerned as dead. At the same time, for every index shard stored on an index data storage node marked as dead, the central control node makes one further copy, taken from the copies of that shard held on other index data storage nodes, onto any other index data storage node that does not yet store a copy of that shard, so that the number of copies of each index shard remains unchanged and the index shards are available at all times.
As a preferred embodiment, the heartbeat signal that an index data storage node sends to the central control node includes the load information of that node. During index creation, the central control node assigns index shards, as far as possible, to lightly loaded index data storage nodes. Likewise, during index retrieval, the central control node submits retrieval requests, as far as possible, to the lightly loaded index data storage node holding the index shard or a copy of it.
As a preferred embodiment, the index data storage node attribute information comprises: the node ID, node name, node type, node state, node load and node location. The index attribute information comprises: the name of the index, the structure definition of the documents in the index, the number of shards of the index, the number of copies of each index shard, and the storage node IDs of the index shards and index shard copies. The index data storage node attribute information and the index attribute information constitute the metadata of the system. This metadata is stored on the central control node, and the central control node, index data storage nodes and external service nodes can derive the location of each index shard in the cluster from it.
In a preferred embodiment, in the data index structure of step B, each index shard also has multiple index shard replicas. A replica is created during the index creation of step C, is updated asynchronously after the original index shard has been updated during the index update of step D, and is stored on a different index data storage node from the original shard. The index data storage node holding the original index shard handles the update requests for that shard; once the original shard has been updated, that node asynchronously forwards the update request to the index data storage nodes holding the corresponding replicas, which then update the replicas. A replica supports index retrieval just as its original shard does: based on the load of the index data storage nodes holding the original shard and its replicas, the central control node submits each retrieval request to the least loaded of them.
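The ordering of updates (original shard first, replicas later) can be illustrated with a small sketch. The classes `ShardCopy` and `PrimaryShard` and the explicit `flush` call are assumptions; in a real system the forwarding would run in the background and cross the network.

```python
from collections import deque


class ShardCopy:
    """A shard copy reduced to a key -> document map for illustration."""

    def __init__(self):
        self.docs = {}

    def apply(self, key, doc):
        self.docs[key] = doc


class PrimaryShard(ShardCopy):
    """The original shard: applies updates itself, then forwards them."""

    def __init__(self, replicas):
        super().__init__()
        self.replicas = replicas
        self.outbox = deque()   # updates not yet forwarded to the replicas

    def update(self, key, doc):
        self.apply(key, doc)              # the original shard is updated first
        self.outbox.append((key, doc))    # forwarding is deferred

    def flush(self):
        """Forward every queued update to every replica."""
        while self.outbox:
            key, doc = self.outbox.popleft()
            for replica in self.replicas:
                replica.apply(key, doc)
```

Between `update` and `flush` a replica may briefly return older results than the original shard, which is the trade-off of asynchronous replication.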
Further, the central control node periodically checks the number of replicas of every index shard of each index. When the number of replicas of an index shard falls below the preset number, the system automatically copies a replica of that shard to another data node. When the index data storage node storing an original index shard fails, the system selects one of the corresponding replicas to take over the index update work of the original shard; that replica becomes the new original index shard, and a new replica is then generated on another index data storage node so that the number of replicas of the shard remains unchanged. When an index data storage node storing a replica fails, the system generates, on another index data storage node, a replica identical to the original index shard, again keeping the number of replicas unchanged.
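A minimal sketch of this failover rule for a single shard follows. It assumes the copies of a shard are tracked as an ordered list whose first entry is the original shard; the function name `handle_node_failure` and the choice of the first surviving replica for promotion are illustrative assumptions.

```python
def handle_node_failure(copies, failed_node, candidate_nodes):
    """Return the new holder list of one shard after a node failure.

    copies: node ids holding the shard, original shard first.
    candidate_nodes: live nodes that may receive a new replica.
    """
    if failed_node not in copies:
        return list(copies)
    survivors = [n for n in copies if n != failed_node]
    if not survivors:
        raise RuntimeError("no surviving copy of the shard")
    # If the original failed, survivors[0] is a former replica and is now
    # first in the list, i.e. it has been promoted to original shard.
    for node in candidate_nodes:
        if node != failed_node and node not in survivors:
            survivors.append(node)   # new replica restores the copy count
            break
    return survivors
```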
Further, the index shards and index shard replicas of the same index are created and stored on the index data storage nodes according to the following strategy. Based on the load information in the attribute information of the index data storage nodes, the central control node assigns the index shards and replicas to the most lightly loaded index data storage nodes. When the number of available index data storage nodes is smaller than the number of index shards, the central control node assigns several index shards to the same node and does not assign any replicas. When the number of available nodes is greater than the number of index shards, the central control node assigns replicas of some or all of the index shards to the remaining nodes. Figure 3 illustrates this strategy for an index with 2 index shards and 1 replica per shard. When the system has 1 index data storage node, index shard 1 and index shard 2 are both stored on index data storage node 1, and neither shard has a replica, because a replica contributes to the availability and reliability of the system only if it is stored on a different node from its original shard. When the system has 2 index data storage nodes, index shard 1 and index shard 2, stored on node 1, have replicas 1' and 2' stored on node 2; node 2 can provide the same service as node 1, so adding index data storage nodes scales the service performance of the system. When the system has 4 index data storage nodes, index shard 1, index shard 2, replica 1', and replica 2' are each stored on a separate one of the 4 nodes.
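The placement rules can be sketched as a greedy assignment. This is an illustration under assumptions, not the patented algorithm: load is approximated by the number of copies a node already holds, ties go to the node listed first, and the function name `place_shards` is hypothetical. With these tie-breaking rules the sketch reproduces the 1-node and 4-node layouts described for Figure 3; for 2 nodes it spreads the original shards over both nodes instead, while still keeping each replica on a different node from its original shard.

```python
def place_shards(shard_count, replica_count, nodes):
    """Return {node: [copy labels]}; "1" is an original shard, "1'" a replica."""
    if not nodes:
        raise ValueError("no index data storage nodes available")
    if len(nodes) < shard_count:
        replica_count = 0   # fewer nodes than shards: no replicas are assigned
    layout = {n: [] for n in nodes}
    holders = {}            # shard number -> nodes holding a copy of it

    def lightest(exclude):
        eligible = [n for n in nodes if n not in exclude]
        if not eligible:
            return None
        return min(eligible, key=lambda n: len(layout[n]))

    for s in range(1, shard_count + 1):
        node = lightest(exclude=set())
        layout[node].append(str(s))
        holders[s] = {node}
    for _ in range(replica_count):
        for s in range(1, shard_count + 1):
            node = lightest(exclude=holders[s])
            if node is None:     # every node already holds this shard
                continue
            layout[node].append(f"{s}'")
            holders[s].add(node)
    return layout
```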
In a preferred embodiment, the index update of step D further comprises a segment merging step: when the number of segments in an index shard of the index being updated reaches a threshold, or the time elapsed since the last index merge reaches a threshold, the index data storage node holding that index shard reads the documents in several of the smaller segments, stores them in a new segment, and then physically deletes those smaller segments. New segments are produced continuously while an index is being built, and too many segments in an index shard degrade the efficiency of index retrieval. This step therefore merges multiple small segments into one large segment and discards the data marked as deleted, which optimizes the storage space of the index and reduces the number of segments that the retrieval logic must operate on at once, thereby improving retrieval efficiency.
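The merge step can be sketched as below. The representation is a simplification assumed for the example: a segment is a list of `(key, document, deleted)` tuples, so the deletion mark travels with each entry. The function name and the parameters `max_segments`, `max_interval_s`, and `merge_count` are hypothetical.

```python
def merge_small_segments(segments, max_segments, merge_count,
                         seconds_since_last_merge=0.0,
                         max_interval_s=float("inf")):
    """Merge the smallest segments into one new segment when a threshold is hit.

    Entries marked as deleted are dropped during the merge, and the merged
    source segments are removed from the returned list.
    """
    too_many = len(segments) >= max_segments
    too_old = seconds_since_last_merge >= max_interval_s
    if not (too_many or too_old):
        return segments
    # Pick the merge_count smallest segments by number of entries.
    by_size = sorted(range(len(segments)), key=lambda i: len(segments[i]))
    chosen = set(by_size[:merge_count])
    merged = [entry
              for i in sorted(chosen)
              for entry in segments[i]
              if not entry[2]]          # skip entries marked as deleted
    kept = [seg for i, seg in enumerate(segments) if i not in chosen]
    return kept + [merged]
```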
In a preferred embodiment, an updated document in step D is assigned to an index shard as follows: the hash value of the document's key is computed, this hash value is taken modulo the number of index shards of the index to which the document belongs, and the document is stored on the index shard whose number corresponds to the result.
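A minimal sketch of this routing rule follows. The text does not name a hash function; SHA-256 from the Python standard library is used here only because it yields the same value on every machine and in every process (Python's built-in `hash` does not for strings). The function name `shard_for_key` and the zero-based shard numbering are assumptions.

```python
import hashlib


def shard_for_key(key, shard_count):
    """Map a document key to a shard number in the range [0, shard_count)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    hash_value = int.from_bytes(digest[:8], "big")   # first 64 bits of the digest
    return hash_value % shard_count
```

Because the result depends on `shard_count`, the same key maps to the same shard only as long as the number of shards of the index stays fixed.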
In a preferred embodiment, the data object types of the documents of step B are: text data objects, image data objects, audio data objects, video data objects, and executable program data objects. The attribute information of each data object type is stored in the fields of the document. For example, a text document may include the following information: file name, keywords, author, file size, category, file description, and so on; an audio document may include: file name, bit rate (bps), file size, duration, author or artist name, song title, genre, album name, and so on.
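As an illustration of type-specific fields, the sketch below builds documents for the two example types. The field lists follow the examples in the text; the identifiers, the dictionary layout, and the rejection of unknown fields are assumptions made for the example.

```python
FIELDS_BY_TYPE = {
    "text": {"file_name", "keywords", "author", "file_size",
             "category", "description"},
    "audio": {"file_name", "bit_rate_bps", "file_size", "duration",
              "artist", "song_title", "genre", "album"},
}


def make_document(key, doc_type, **fields):
    """Build a document whose fields are restricted to those of its type."""
    allowed = FIELDS_BY_TYPE[doc_type]
    unknown = set(fields) - allowed
    if unknown:
        raise ValueError(f"unknown fields for {doc_type}: {sorted(unknown)}")
    return {"key": key, "type": doc_type, "fields": fields}
```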
Although the present invention has been specifically shown and described in connection with preferred embodiments, those skilled in the art should understand that various changes in form and detail may be made to the invention without departing from the spirit and scope of the invention as defined by the appended claims, and that such changes fall within the protection scope of the present invention.

Claims (10)

1. A distributed real-time search engine, the construction and operation of whose system comprise at least the following steps:
A. Designing the functional structure of the system. The functional structure is created in a Master/Slave cluster system and comprises the following functional nodes: a central control node, index data storage nodes, and external service nodes, wherein the central control node is created in the Master system, and the index data storage nodes and external service nodes are created in the Slave system; the central control node is used to store and maintain the attribute information of the indexes in the data index structure and the attribute information of the index data storage nodes; the index data storage nodes are used to create, update, and retrieve the index shards of the data index structure; and the external service nodes are used to receive index creation, update, and retrieval requests and forward them to the central control node for processing;
B. Designing the data index structure of the system. The index structure is a tree hierarchy consisting, from top to bottom, of: index, index shard, segment, document, and field, wherein a system may contain multiple indexes; an index shard is one of the data blocks into which an index is partitioned, and the index shards belonging to the same index are stored on the index data storage nodes; an index shard consists of one or more segments; a segment consists of one or more documents, and the documents in a segment may be of different data object types; a document has a key that uniquely identifies it throughout the system; and the structure of a document comprises fields used to describe the document type;
C. Creating an index, comprising the following steps:
C1. After receiving an index creation request, the external service node forwards the request to the central control node. The central control node parses the request, extracts the attribute information of the index to be created, and verifies whether this attribute information is complete and valid; if it is complete and valid, step C2 is performed; if it is incomplete or invalid, a failure response is sent to the external service node;
C2. The central control node divides the index to be created into a number of shards according to the shard count in the attribute information extracted in step C1. At the same time, based on the attribute information of the index data storage nodes stored in the central control node, it judges the state and load of each index data storage node, determines accordingly on which index data storage node each index shard is to be created and stored, and then sends the attribute information of the index to be created to each corresponding index data storage node;
According to the received attribute information of the index to be created, each index data storage node builds the index shard assigned to it by the central control node. If an index data storage node fails to create its index shard, the central control node reassigns that shard to another index data storage node that is in good condition and lightly loaded. Once all index shards of the index to be created have either been created on the index data storage nodes or failed to be created, step C3 is performed;
C3. If all index shards of the index to be created were created successfully on the index data storage nodes in step C2, the central control node updates the index data storage node attribute information stored in it and sends a response indicating successful creation of the index shards to the external service node; if the creation of the index shards of the index to be created failed in step C2, a response indicating that index creation failed is sent to the external service node;
D. Updating an index, comprising the following steps:
D1. After receiving an index update request, the external service node forwards the request to the central control node. According to the index attribute information and the index data storage node attribute information stored in it, the central control node sends the index update request to the index data storage node holding the relevant index shard of the index;
D2. According to the received index update request, the index data storage node stores the updated document in a new segment of the index shard of the index to be updated. If the updated document is stored successfully, the old document corresponding to it is marked as deleted in the new segment and an update-success message is returned to the central control node; if storing the updated document fails, an update-failure message is returned to the central control node. Finally, the central control node sends the update success or failure message to the external service node;
The index update of step D further comprises a document deletion step: when the index update request is only a command to delete a document, the document is marked as deleted in a new segment on the index shard of the index data storage node holding the document to be deleted;
The index update of step D further comprises a step of building a real-time index: an updating index and a combining index are built simultaneously in the memory of the system, and index retrieval is performed by accessing both the updating index and the combining index; during an index update, the index being updated is the updating index; when the number of documents in the updating index reaches a threshold, or the update time of the updating index reaches a threshold, the system commits the updating index to the disk index, and at the same time the updating index becomes the combining index and the former combining index becomes the new updating index;
E. Retrieving from an index, comprising the following steps:
E1. After receiving an index retrieval request, the external service node sends it to the central control node. The central control node parses the retrieval request, determines the target index, then looks up all index shards of the target index according to the index data storage node attribute information and the attribute information of the target index, and dispatches the retrieval request to the index data storage nodes storing each shard;
E2. According to the received retrieval request, each index data storage node retrieves the relevant documents from the corresponding index shard that it stores, sorts the retrieval results, and sends them to the external service node;
E3. The external service node merges and sorts the retrieval results received from the index data storage nodes and sends them to the client.
2. The distributed real-time search engine as claimed in claim 1, characterized in that: the functional structure of the system of step A further comprises a standby central control node; the central control node synchronously backs up the data it stores to the standby central control node in real time; when the central control node fails, the standby central control node becomes the central control node, and when the former central control node recovers from the failure, it becomes the new standby central control node.
3. The distributed real-time search engine as claimed in claim 1, characterized in that: the index data storage nodes and the external service nodes periodically send heartbeat signals carrying their status information to the central control node; if the central control node does not receive a heartbeat from a node within a preset time, it marks that index data storage node or external service node as dead; for every index shard stored on an index data storage node marked as dead, the central control node then copies a replica of that shard from another index data storage node to some other index data storage node that does not yet hold a copy of it, so that the number of replicas of each index shard remains unchanged and every index shard is available at all times; the heartbeat signal that an index data storage node sends to the central control node includes the load information of that node; during index creation, the central control node assigns index shards to lightly loaded index data storage nodes wherever possible; likewise, during index retrieval, it submits each retrieval request to the less loaded of the index data storage nodes holding the index shard or one of its replicas.
4. The distributed real-time search engine as claimed in claim 1, characterized in that: the attribute information of an index data storage node comprises the node ID, node name, node type, node state, node load, and node location; and the attribute information of an index comprises the index name, the structure definition of the documents in the index, the number of shards of the index, the number of replicas of each index shard, and the IDs of the nodes storing the index shards and their replicas.
5. The distributed real-time search engine as claimed in claim 1, characterized in that: in the data index structure of the system of step B, each index shard also has multiple index shard replicas; a replica is created during the index creation of step C, is updated asynchronously after the original index shard has been updated during the index update of step D, and is stored on a different index data storage node from the original shard; the index data storage node holding the original index shard handles the update requests for that shard, and once the original shard has been updated, that node asynchronously forwards the update request to the index data storage nodes holding the corresponding replicas, which then update the replicas; a replica supports index retrieval just as its original shard does, and based on the load of the index data storage nodes holding the original shard and its replicas, the central control node submits each retrieval request to the least loaded of them.
6. The distributed real-time search engine as claimed in claim 5, characterized in that: the central control node periodically checks the number of replicas of every index shard of each index; when the number of replicas of an index shard falls below the preset number, the system automatically copies a replica of that shard to another data node; when the index data storage node storing an original index shard fails, the system selects one of the corresponding replicas to take over the index update work of the original shard, that replica becomes the new original index shard, and a new replica is then generated on another index data storage node so that the number of replicas of the shard remains unchanged; when an index data storage node storing a replica fails, the system generates, on another index data storage node, a replica identical to the original index shard, so that the number of replicas of the shard remains unchanged.
7. The distributed real-time search engine as claimed in claim 5, characterized in that: the index shards and index shard replicas of the same index are created and stored on the index data storage nodes according to the following strategy: based on the load information in the attribute information of the index data storage nodes, the central control node assigns the index shards and replicas to the most lightly loaded index data storage nodes; when the number of available index data storage nodes is smaller than the number of index shards, the central control node assigns several index shards to the same index data storage node and does not assign any replicas; when the number of available index data storage nodes is greater than the number of index shards, the central control node assigns replicas of some or all of the index shards to the remaining index data storage nodes.
8. The distributed search engine as claimed in claim 1, characterized in that: the index update of step D further comprises a segment merging step: when the number of segments in an index shard of the index being updated reaches a threshold, or the time elapsed since the last index merge reaches a threshold, the index data storage node holding that index shard reads the documents in several of the smaller segments, stores them in a new segment, and then physically deletes those smaller segments.
9. The distributed real-time search engine as claimed in claim 1, characterized in that: an updated document in step D is assigned to an index shard by computing the hash value of the document's key, taking this hash value modulo the number of index shards of the index to which the document belongs, and storing the document on the index shard whose number corresponds to the result.
10. The distributed real-time search engine as claimed in claim 1, characterized in that: the data object types of the documents of step B comprise: text data objects, image data objects, audio data objects, video data objects, and executable program data objects, and the attribute information of each data object type is stored in the fields of the document.
CN 201110137785 2011-05-26 2011-05-26 Implementation method of distributed real-time search engine Active CN102169507B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110137785 CN102169507B (en) 2011-05-26 2011-05-26 Implementation method of distributed real-time search engine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110137785 CN102169507B (en) 2011-05-26 2011-05-26 Implementation method of distributed real-time search engine

Publications (2)

Publication Number Publication Date
CN102169507A true CN102169507A (en) 2011-08-31
CN102169507B CN102169507B (en) 2013-03-20

Family

ID=44490669

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110137785 Active CN102169507B (en) 2011-05-26 2011-05-26 Implementation method of distributed real-time search engine

Country Status (1)

Country Link
CN (1) CN102169507B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106649804A (en) * 2016-12-29 2017-05-10 深圳市优必选科技有限公司 Data processing method, data processing device and data processing system for data query server

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090119261A1 (en) * 2005-12-05 2009-05-07 Collarity, Inc. Techniques for ranking search results
CN101677328A (en) * 2008-09-19 2010-03-24 中兴通讯股份有限公司 Content-fragment based multimedia distributing system and content-fragment based multimedia distributing method
CN101727460A (en) * 2008-10-31 2010-06-09 中兴通讯股份有限公司 Method and system for positioning content fragment
CN101853283A (en) * 2010-05-21 2010-10-06 南京邮电大学 Construction method for multidimensional data-oriented semantic indexing peer-to-peer network

Cited By (79)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102394922A (en) * 2011-10-27 2012-03-28 上海文广互动电视有限公司 Distributed cluster file system and file access method thereof
CN102523480A (en) * 2011-12-08 2012-06-27 成都东方盛行电子有限责任公司 Recording system and method based on active-standby and cache technology
CN103309903A (en) * 2012-03-16 2013-09-18 刘龙 Position search system and method based on cloud computing
CN102779185A (en) * 2012-06-29 2012-11-14 浙江大学 High-availability distribution type full-text index method
CN102779185B (en) * 2012-06-29 2014-11-12 浙江大学 High-availability distribution type full-text index method
CN103685429A (en) * 2012-09-25 2014-03-26 阿里巴巴集团控股有限公司 A method and an apparatus for demonstrating information
CN103685429B (en) * 2012-09-25 2017-09-26 阿里巴巴集团控股有限公司 A kind of method and apparatus of information displaying
CN103106233A (en) * 2012-11-02 2013-05-15 北京邮电大学 Asynchronous index and read-write method of massive files applied to search engine
CN103853802B (en) * 2012-12-04 2018-07-31 微软技术许可有限责任公司 Device and method for indexing digital content
CN103853802A (en) * 2012-12-04 2014-06-11 邻客音公司 Apparatus and method for indexing electronic content
CN102984762A (en) * 2012-12-12 2013-03-20 中国联合网络通信集团有限公司 Method and device for function allocation of IMS
CN103914483B (en) * 2013-01-07 2018-09-25 深圳市腾讯计算机系统有限公司 File memory method, device and file reading, device
CN103914483A (en) * 2013-01-07 2014-07-09 深圳市腾讯计算机系统有限公司 File storage method and device and file reading method and device
CN103067525A (en) * 2013-01-18 2013-04-24 广东工业大学 Cloud storage data backup method based on characteristic codes
CN103067525B (en) * 2013-01-18 2015-11-25 广东工业大学 A kind of cloud storing data backup method of feature based code
CN103198108B (en) * 2013-03-27 2016-08-10 新浪网技术(中国)有限公司 A kind of index data update method, retrieval server and system
CN103198108A (en) * 2013-03-27 2013-07-10 新浪网技术(中国)有限公司 Index data updating method, retrieval server and index data updating system
CN103258036A (en) * 2013-05-15 2013-08-21 广州一呼百应网络技术有限公司 Distributed real-time search engine based on p2p
CN103310023A (en) * 2013-07-05 2013-09-18 深圳中兴网信科技有限公司 Distributed searching system and method
CN104298692A (en) * 2013-07-19 2015-01-21 深圳中兴网信科技有限公司 Distributed searching method and system
CN104298692B (en) * 2013-07-19 2017-11-24 深圳中兴网信科技有限公司 A kind of method and system of distributed search
CN103488687A (en) * 2013-09-02 2014-01-01 用友软件股份有限公司 Searching system and searching method of big data
CN104239377A (en) * 2013-11-12 2014-12-24 新华瑞德(北京)网络科技有限公司 Cross-platform data retrieval method and device
CN103699648A (en) * 2013-12-26 2014-04-02 成都市卓睿科技有限公司 Tree-form data structure used for quick retrieval and implementation method of tree-form data structure
CN104092735A (en) * 2014-06-23 2014-10-08 吕志雪 Cloud computing data access method and system based on binary tree
CN104252537A (en) * 2014-09-18 2014-12-31 深圳市彩讯科技有限公司 Index fragmentation method based on mail characteristics
CN104252537B (en) * 2014-09-18 2019-05-21 彩讯科技股份有限公司 Index sharding method based on mail features
CN104361009A (en) * 2014-10-11 2015-02-18 北京中搜网络技术股份有限公司 Real-time indexing method based on reverse index
CN104361009B (en) * 2014-10-11 2017-10-31 北京中搜网络技术股份有限公司 Real-time indexing method based on inverted index
CN104820693B (en) * 2015-04-28 2018-07-24 广东小天才科技有限公司 Method and device for data search
CN104820693A (en) * 2015-04-28 2015-08-05 广东小天才科技有限公司 Method and device for data search
CN105045684A (en) * 2015-07-16 2015-11-11 北京京东尚科信息技术有限公司 Method and device for switching and controlling indexes
CN105045684B (en) * 2015-07-16 2018-06-15 北京京东尚科信息技术有限公司 Method and device for index switching and index control
CN105138669A (en) * 2015-09-07 2015-12-09 天脉聚源(北京)传媒科技有限公司 Method and device for combining incremental indexes with general indexes
CN105373835B (en) * 2015-10-14 2021-07-02 国网湖北省电力公司 Link information management method based on structure tree model
CN105373835A (en) * 2015-10-14 2016-03-02 国网湖北省电力公司 Link information management method based on tree model construction
CN106598990A (en) * 2015-10-16 2017-04-26 卓望数码技术(深圳)有限公司 Search method and system
CN106598990B (en) * 2015-10-16 2020-06-19 卓望数码技术(深圳)有限公司 Searching method and system
CN105843933A (en) * 2016-03-30 2016-08-10 电子科技大学 Index building method for distributed memory columnar database
CN105843933B (en) * 2016-03-30 2019-01-29 电子科技大学 Index building method for distributed memory columnar database
GB2567106B (en) * 2016-07-12 2021-07-14 Ibm Manipulating distributed agreement protocol to identify desired set of storage units
GB2567106A (en) * 2016-07-12 2019-04-03 Ibm Manipulating distributed agreement protocol to identify desired set of storage units
US10942806B2 (en) 2016-07-12 2021-03-09 International Business Machines Corporation Manipulating a distributed agreement protocol to identify a desired set of storage units
WO2018011670A1 (en) * 2016-07-12 2018-01-18 International Business Machines Corporation Manipulating distributed agreement protocol to identify desired set of storage units
CN106294721B (en) * 2016-08-08 2020-05-19 无锡天脉聚源传媒科技有限公司 Cluster data counting and exporting methods and devices
CN106294721A (en) * 2016-08-08 2017-01-04 无锡天脉聚源传媒科技有限公司 Cluster data statistics and export method and device
CN108509438A (en) * 2017-02-24 2018-09-07 南京烽火星空通信发展有限公司 ElasticSearch shard expansion method
CN108509438B (en) * 2017-02-24 2021-08-31 南京烽火星空通信发展有限公司 ElasticSearch shard expansion method
CN108694188A (en) * 2017-04-07 2018-10-23 腾讯科技(深圳)有限公司 Index data update method and related apparatus
CN107133350A (en) * 2017-05-25 2017-09-05 努比亚技术有限公司 Data-updating method, mobile terminal and storage medium based on search engine
CN107220347A (en) * 2017-05-27 2017-09-29 国家计算机网络与信息安全管理中心 Lucene-based custom relevance ranking algorithm supporting expressions
CN107220347B (en) * 2017-05-27 2020-07-03 国家计算机网络与信息安全管理中心 Lucene-based custom relevance ranking algorithm supporting expressions
CN109002448A (en) * 2017-06-07 2018-12-14 中国移动通信集团甘肃有限公司 Report statistics method, apparatus and system
CN109120885A (en) * 2017-06-26 2019-01-01 杭州海康威视数字技术股份有限公司 Video data acquisition methods and device
CN107436923A (en) * 2017-07-07 2017-12-05 北京奇虎科技有限公司 Method and apparatus for index search in a big data cluster
CN111492624B (en) * 2017-10-23 2022-09-23 西门子股份公司 Method and control system for controlling and/or monitoring a device
CN111492624A (en) * 2017-10-23 2020-08-04 西门子股份公司 Method and control system for controlling and/or monitoring a device
US11615007B2 (en) 2017-10-23 2023-03-28 Siemens Aktiengesellschaft Method and control system for controlling and/or monitoring devices
CN108804502A (en) * 2018-04-09 2018-11-13 中国平安人寿保险股份有限公司 Big data inquiry system, method, computer equipment and storage medium
CN108681592B (en) * 2018-05-15 2021-05-25 北京三快在线科技有限公司 Index switching method, device and system and index switching central control device
CN108681592A (en) * 2018-05-15 2018-10-19 北京三快在线科技有限公司 Index switching method, device, system and index switching control device
CN110609844A (en) * 2018-05-29 2019-12-24 优信拍(北京)信息科技有限公司 Data updating method, device and system
CN110609844B (en) * 2018-05-29 2022-05-13 优信拍(北京)信息科技有限公司 Data updating method, device and system
CN108959640B (en) * 2018-07-26 2021-02-12 浙江数链科技有限公司 ES index rapid construction method and device
CN108959640A (en) * 2018-07-26 2018-12-07 浙江数链科技有限公司 ES index fast construction method and device
CN109086409A (en) * 2018-08-02 2018-12-25 泰康保险集团股份有限公司 Microservice data processing method and device, electronic equipment and computer readable medium
CN109086409B (en) * 2018-08-02 2021-10-08 泰康保险集团股份有限公司 Microservice data processing method and device, electronic equipment and computer readable medium
CN109767247A (en) * 2019-01-15 2019-05-17 武汉费米坊科技有限公司 Distributed commodity traceability system and tracing method
CN109726264B (en) * 2019-01-16 2022-02-25 北京百度网讯科技有限公司 Method, apparatus, device and medium for index information update
CN109726264A (en) * 2019-01-16 2019-05-07 北京百度网讯科技有限公司 Method, apparatus, device and medium for index information update
CN110209910A (en) * 2019-05-20 2019-09-06 无线生活(杭州)信息科技有限公司 Index switching dispatching method and dispatching device
CN110209910B (en) * 2019-05-20 2021-06-04 无线生活(杭州)信息科技有限公司 Index switching scheduling method and scheduling device
CN110175151A (en) * 2019-05-22 2019-08-27 中国农业科学院农业信息研究所 Processing method, apparatus, device and storage medium for agricultural big data
CN110704453B (en) * 2019-10-15 2022-05-06 腾讯音乐娱乐科技(深圳)有限公司 Data query method and device, storage medium and electronic equipment
CN110704453A (en) * 2019-10-15 2020-01-17 腾讯音乐娱乐科技(深圳)有限公司 Data query method and device, storage medium and electronic equipment
CN111324767A (en) * 2020-02-17 2020-06-23 厦门快商通科技股份有限公司 Distributed audio fingerprint engine system
CN112527210A (en) * 2020-12-22 2021-03-19 南京中兴力维软件有限公司 Storage method and device of full data and computer readable storage medium
CN113535730A (en) * 2021-07-21 2021-10-22 挂号网(杭州)科技有限公司 Index updating method and system for search engine, electronic equipment and storage medium
CN114020986A (en) * 2022-01-05 2022-02-08 深圳思谋信息科技有限公司 Content retrieval system

Also Published As

Publication number Publication date
CN102169507B (en) 2013-03-20

Similar Documents

Publication Publication Date Title
CN102169507B (en) Implementation method of distributed real-time search engine
JP7410181B2 (en) Hybrid indexing methods, systems, and programs
CN106484877B (en) Document retrieval system based on HDFS
US9710535B2 (en) Object storage system with local transaction logs, a distributed namespace, and optimized support for user directories
CN104714755B (en) Snapshot management method and device
US9195666B2 (en) Location independent files
CN104679898A (en) Big data access method
CN104778270A (en) Storage method for multiple files
CN104536959A (en) Optimized method for accessing lots of small files for Hadoop
CN104301360A (en) Method, log server and system for recording log data
CN103595797B (en) Caching method for distributed storage system
CN109376121B (en) File indexing system and method based on elastic search full-text retrieval
CN103488687A (en) Searching system and searching method of big data
JP2022500727A (en) Systems and methods for early removal of tombstone records in databases
CN105303456A (en) Method for processing monitoring data of electric power transmission equipment
CN108614837B (en) File storage and retrieval method and device
CN103067461A (en) Metadata management system of document and metadata management method thereof
CN102930060A (en) Method and device for performing fast indexing of database
CN104881466A (en) Method and device for processing data fragments and deleting garbage files
CN103049574B (en) Key-value file system and method for realizing dynamic file replicas
CN103795811A (en) Information storage and data statistical management method based on meta data storage
CN103353901A (en) Orderly table data management method and system based on Hadoop distributed file system (HDFS)
CN102523301A (en) Method for caching data on client in cloud storage
CN102955808A (en) Data acquisition method and distributed file system
CN103778219A (en) HBase-based method for updating incremental indexes

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant