CN103310023A

CN103310023A - Distributed searching system and method

Info

Publication number: CN103310023A
Application number: CN2013102818388A
Authority: CN
Inventors: 赵兴成; 刘亚军; 杨景慧; 周辉; 黄韶军; 姜佰胜
Original assignee: ZTE ICT Technologies Co Ltd
Current assignee: ZTE ICT Technologies Co Ltd
Priority date: 2013-07-05
Filing date: 2013-07-05
Publication date: 2013-09-18

Abstract

The invention provides a distributed search system comprising an index creation unit, an index sharding unit and a shard search unit. The index creation unit creates an index for specified data according to a received index creation instruction. The index sharding unit divides the index into a plurality of shards according to received shard configuration data and records the shard information of each shard. The shard search unit determines at least one piece of shard information according to a received search condition, looks up at least one target shard among the plurality of shards according to that shard information, and returns the data corresponding to each target shard to the user. The application further provides a distributed search method. With this technical solution, an index can be sharded, and at search time one or more corresponding shards can be looked up according to the search condition; searching shards rather than the whole index improves the response speed of the search.

Description

Distributed search system and distributed search method

Technical field

The present invention relates to the technical field of data search, and in particular to a distributed search system and a distributed search method.

Background

With the explosive growth of network data, big data processing has become a necessary part of data handling. Among these technologies, Hadoop, owing to its high stability, reliability and scalability, has gradually become the standard of the big data industry. Hadoop is weak at real-time processing, however, and cannot serve scenarios with strict real-time requirements. When a mass data search is carried out in a Hadoop database, the search is slow, and the display of the search results lags noticeably behind the input of the search operation, so it is difficult to meet users' demand for real-time search.

A new search technique is therefore needed that improves the response speed, and hence the real-time performance, of mass data search.

Summary of the invention

In view of the above problems, the present invention proposes a search technique that improves the response speed, and hence the real-time performance, of mass data search.

To this end, the present invention proposes a distributed search system, comprising: an index creation unit, configured to create an index for specified data according to a received index creation instruction; an index sharding unit, configured to divide the index into a plurality of shards according to received shard configuration data, and to record the shard information of each of the plurality of shards; and a shard search unit, configured to determine at least one piece of shard information according to a received search condition, to look up at least one target shard among the plurality of shards according to the at least one piece of shard information, and to return the data corresponding to each of the at least one target shard to the user.

In this technical solution, after an index has been created for certain data, the index can be divided into a plurality of shards. When searching, the user can enter the shard information to be searched directly in the search condition. Because one index corresponds to a plurality of shards, the data volume corresponding to each shard is small. Searching a shard therefore finds its small block of data faster than searching the large block of data corresponding to the whole index, and returning the small blocks corresponding to several shards is also faster than returning the large block corresponding to the index. This speeds up both querying and returning results, and improves the real-time performance of the search.

In the above technical solution, preferably, the index creation unit is further configured to generate an index database according to the metadata information produced while the index is created; the index sharding unit is configured to record the shard information of each shard in the index database; and the shard search unit is configured to determine the at least one piece of shard information in the index database according to the search condition.

In this technical solution, each index creation process generates corresponding metadata information that records the details of the index, such as the data the index corresponds to, the location of the index and the ID of the index. An index database can be created from this metadata, so that when the user enters a search condition, the corresponding index can be found conveniently and quickly in the index database. After the index has been sharded, the shard information of each shard can also be stored in the index database, so that the corresponding shard information can be determined quickly and accurately from the search condition and the corresponding shard can then be looked up.

In the above technical solution, preferably, the shard configuration data comprises a shard quantity and/or a shard node, and the shard information comprises a shard identifier and/or the shard node.

In this technical solution, the user can set the shard configuration data as required, thereby setting the number of shards of the index and the node to which each shard is distributed (that is, the position of the shard within the server). The shard information can comprise a shard identifier and/or a shard node; in other words, the user can find the corresponding shard by entering, in the search instruction, the identifier and/or the position information of the shard to be searched.

In the above technical solution, preferably, the system further comprises a shard storage unit, configured to place each shard on its corresponding shard node for storage according to a preset algorithm.

In this technical solution, after the index has been sharded, each shard needs to be placed on a node of the server. For example, when an index in Hadoop is sharded, the shards of the index can be placed on several nodes of the Hadoop server according to Hadoop's built-in algorithm, which completes the storage of the shards.

In any of the above technical solutions, preferably, the index sharding unit is further configured to divide a shard to be expanded among the plurality of shards into a plurality of sub-shards according to a received shard expansion instruction, and to record the shard information of each of the plurality of sub-shards.

In this technical solution, the number of shards of the index can be changed as required. The shard quantity of the index can be adjusted and the index sharded again, or one or more shards can be divided further into a plurality of sub-shards, so that the user can search shards corresponding to a larger or a smaller data volume.

The application also proposes a distributed search method, comprising: step 202, creating an index for specified data according to a received index creation instruction; step 204, dividing the index into a plurality of shards according to received shard configuration data, and recording the shard information of each of the plurality of shards; and step 206, determining at least one piece of shard information according to a received search condition, looking up at least one target shard among the plurality of shards according to the at least one piece of shard information, and returning the data corresponding to each of the at least one target shard to the user.

In the above technical solution, preferably, step 202 further comprises: generating an index database according to the metadata information produced while the index is created; step 204 then comprises: recording the shard information of each shard in the index database; and step 206 comprises: determining the at least one piece of shard information in the index database according to the search condition.

In the above technical solution, preferably, step 204 further comprises: placing each shard on its corresponding shard node for storage according to a preset algorithm.

In the above technical solution, preferably, the method further comprises: dividing a shard to be expanded among the plurality of shards into a plurality of sub-shards according to a received shard expansion instruction, and recording the shard information of each of the plurality of sub-shards.

With the above technical solution, once an index has been created for the data, the index can be sharded, and a search operation can then look up the one or more shards that correspond to the search condition. Because the data volume corresponding to each shard is small relative to the data volume corresponding to the whole index, searching shards improves the response speed of the search.

Brief description of the drawings

Fig. 1 shows a block diagram of a distributed search system according to an embodiment of the invention;

Fig. 2 shows a flowchart of a distributed search method according to an embodiment of the invention;

Fig. 3 shows a schematic diagram of creating shards and searching shards according to an embodiment of the invention;

Fig. 4A to Fig. 4C show schematic diagrams of expanding shards according to an embodiment of the invention.

Detailed description of the embodiments

So that the above objects, features and advantages of the present invention can be understood more clearly, the invention is described in further detail below with reference to the drawings and specific embodiments. It should be noted that, provided they do not conflict, the embodiments of the application and the features in the embodiments can be combined with one another.

Many specific details are set forth in the following description to provide a full understanding of the present invention. The invention can, however, also be implemented in ways other than those described here, so the scope of protection of the invention is not limited by the specific embodiments disclosed below.

Fig. 1 shows a block diagram of a distributed search system according to an embodiment of the invention.

As shown in Fig. 1, a distributed search system 100 according to an embodiment of the invention comprises: an index creation unit 102, configured to create an index for specified data according to a received index creation instruction; an index sharding unit 104, configured to divide the index into a plurality of shards according to received shard configuration data, and to record the shard information of each of the plurality of shards; and a shard search unit 106, configured to determine at least one piece of shard information according to a received search condition, to look up at least one target shard among the plurality of shards according to the at least one piece of shard information, and to return the data corresponding to each of the at least one target shard to the user.

After an index has been created for certain data, the index can be divided into a plurality of shards. When searching, the user can enter the shard information to be searched directly in the search condition. Because one index corresponds to a plurality of shards, the data volume corresponding to each shard is small. Searching a shard therefore finds its small block of data faster than searching the large block of data corresponding to the whole index, and returning the small blocks corresponding to several shards is also faster than returning the large block corresponding to the index. This speeds up both querying and returning results, and improves the real-time performance of the search.

For example, if an index is created for data of size 10 GB, the data corresponding to that index is 10 GB. If the index is divided evenly into 5 shards, the data corresponding to each shard is 2 GB. The user can search one or several of the 5 shards as required. Compared with searching 10 GB of data, searching between one and five 2 GB blocks responds faster, and the search precision better meets the user's needs.
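The arithmetic of this example can be sketched in a few lines of Python. This is a minimal illustration only: the function names `split_evenly` and `scanned_gb` are hypothetical and do not come from the patent, and the sketch models data volume, not actual search time.

```python
def split_evenly(total_gb, shard_count):
    """Return the data volume (in GB) covered by each of shard_count equal shards."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    return [total_gb / shard_count] * shard_count


def scanned_gb(shard_sizes, selected):
    """Data volume touched when only the shards at the selected positions are searched."""
    return sum(shard_sizes[i] for i in selected)


# The 10 GB index from the example, divided evenly into 5 shards.
sizes = split_evenly(10, 5)
one_shard = scanned_gb(sizes, [0])           # search a single shard
three_shards = scanned_gb(sizes, [0, 2, 3])  # search three of the five shards
print(sizes, one_shard, three_shards)
```

Searching one shard touches 2 GB and searching three shards touches 6 GB, in both cases less than the 10 GB behind the unsharded index.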

Preferably, the index creation unit 102 is further configured to generate an index database according to the metadata information produced while the index is created; the index sharding unit 104 is configured to record the shard information of each shard in the index database; and the shard search unit 106 is configured to determine at least one piece of shard information in the index database according to the search condition.

Each index creation process generates corresponding metadata information that records the details of the index, such as the data the index corresponds to, the location of the index and the ID of the index. An index database can be created from this metadata, so that when the user enters a search condition, the corresponding index can be found conveniently and quickly in the index database. After the index has been sharded, the shard information of each shard can also be stored in the index database, so that the corresponding shard information can be determined quickly and accurately from the search condition and the corresponding shard can then be looked up.

Continuing the previous example, the index database for the 10 GB of data contains the shard information of each of the 5 shards. For instance, the shard information of the first shard comprises: ID 10.1, located on the 16th node of the 2nd server; the shard information of the second shard comprises: ID 10.2, located on the 17th node of the 2nd server; and so on. The index database thus records the shard information of all 5 shards of the index. The user can enter the IDs of the shards to be queried directly; if the search condition contains 10.1 and 10.2, for example, the data corresponding to the first and second shards is looked up.
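One way to picture the index database in this example is as a lookup table from shard ID to shard information. The sketch below is an assumption-laden illustration: the `ShardInfo` and `IndexDatabase` names and their fields are hypothetical, and the node numbers for shards 3 to 5 simply continue the "and so on" pattern of the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShardInfo:
    shard_id: str  # shard identifier, e.g. "10.1"
    server: int    # server holding the shard
    node: int      # node on that server


class IndexDatabase:
    """Hypothetical in-memory stand-in for the index database."""

    def __init__(self):
        self._shards = {}

    def record(self, info):
        """Record the shard information of one shard."""
        self._shards[info.shard_id] = info

    def resolve(self, shard_ids):
        """Return the shard information for every known ID in the search condition."""
        return [self._shards[s] for s in shard_ids if s in self._shards]


db = IndexDatabase()
for i in range(1, 6):
    # Shard 10.1 sits on node 16 of server 2, shard 10.2 on node 17, and so on.
    db.record(ShardInfo(shard_id=f"10.{i}", server=2, node=15 + i))

targets = db.resolve(["10.1", "10.2"])
print(targets)
```

A search condition that names 10.1 and 10.2 resolves to nodes 16 and 17 of server 2; the other three shards are never consulted.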

Preferably, the shard configuration data comprises a shard quantity and/or a shard node, and the shard information comprises a shard identifier and/or the shard node.

The user can set the shard configuration data as required, thereby setting the number of shards of the index and the node to which each shard is distributed (that is, the position of the shard within the server). The shard information can comprise a shard identifier and/or a shard node; in other words, the user can find the corresponding shard by entering, in the search instruction, the identifier and/or the position information of the shard to be searched.

Preferably, the system further comprises a shard storage unit 108, configured to place each shard on its corresponding shard node for storage according to a preset algorithm.

After the index has been sharded, each shard needs to be placed on a node of the server. For example, when an index in Hadoop is sharded, the shards of the index can be placed on several nodes of the Hadoop server according to Hadoop's built-in algorithm, which completes the storage of the shards.
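The patent does not specify the preset algorithm and refers only to Hadoop's built-in placement. As a stand-in, the sketch below uses simple round-robin assignment, which is an assumption made purely for illustration; the function name `assign_round_robin` and the node names are hypothetical.

```python
def assign_round_robin(shard_ids, nodes):
    """Map each shard ID to a node, cycling through the nodes in order.

    Round-robin is an illustrative substitute for the unspecified preset algorithm.
    """
    if not nodes:
        raise ValueError("at least one node is required")
    return {sid: nodes[i % len(nodes)] for i, sid in enumerate(shard_ids)}


placement = assign_round_robin(
    ["A.1", "A.2", "A.3", "A.4", "A.5"],
    ["node-1", "node-2", "node-3"],
)
print(placement)
```

The resulting mapping is the shard node that would be recorded in each shard's shard information, so that a later search can go straight to the right node.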

Preferably, the index sharding unit 104 is further configured to divide a shard to be expanded among the plurality of shards into a plurality of sub-shards according to a received shard expansion instruction, and to record the shard information of each of the plurality of sub-shards.

The number of shards of the index can be changed as required. The shard quantity of the index can be adjusted and the index sharded again, or one or more shards can be divided further into a plurality of sub-shards, so that the user can search shards corresponding to a larger or a smaller data volume.

Continuing the previous example, each of the 5 shards can be divided further. If the first shard is divided into 3 sub-shards, for instance, the IDs of the sub-shards can be 10.1.1, 10.1.2 and 10.1.3, and the user can enter the shard information of a sub-shard in the search instruction to search that sub-shard, further improving search precision. Alternatively, the index of the 10 GB of data can be repartitioned: the shards corresponding to the index are first deleted, and the index is then divided again according to the shard quantity in the received shard configuration data (which can be a configuration file set by the user). If it is divided into 3 shards, for instance, the data volumes corresponding to the shards are 3 GB, 3 GB and 4 GB respectively, so that the user can search shards of larger data volume.

Fig. 2 shows a flowchart of a distributed search method according to an embodiment of the invention.

As shown in Fig. 2, a distributed search method according to an embodiment of the invention comprises: step 202, creating an index for specified data according to a received index creation instruction; step 204, dividing the index into a plurality of shards according to received shard configuration data, and recording the shard information of each of the plurality of shards; and step 206, determining at least one piece of shard information according to a received search condition, looking up at least one target shard among the plurality of shards according to the at least one piece of shard information, and returning the data corresponding to each of the at least one target shard to the user.

Preferably, step 202 further comprises: generating an index database according to the metadata information produced while the index is created; step 204 then comprises: recording the shard information of each shard in the index database; and step 206 comprises: determining at least one piece of shard information in the index database according to the search condition.

Preferably, step 204 further comprises: placing each shard on its corresponding shard node for storage according to a preset algorithm.

Preferably, the method further comprises: dividing a shard to be expanded among the plurality of shards into a plurality of sub-shards according to a received shard expansion instruction, and recording the shard information of each of the plurality of sub-shards.
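Steps 202 to 206 can be sketched end to end as follows. This is a toy model under stated assumptions, not the patented implementation: the index is taken to be an inverted index over whitespace-separated terms, shards are formed by distributing documents across shards in turn, and all function names and the `doc_count` field are hypothetical.

```python
from collections import defaultdict


def create_index(documents):
    """Step 202: build an inverted index mapping each term to a set of document IDs."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def shard_index(index, doc_ids, shard_count, index_id="A"):
    """Step 204: divide the index into shards and record shard information."""
    ordered = sorted(doc_ids)
    shards, info = {}, {}
    for k in range(shard_count):
        shard_id = f"{index_id}.{k + 1}"
        members = set(ordered[k::shard_count])  # documents covered by this shard
        shards[shard_id] = {
            term: ids & members for term, ids in index.items() if ids & members
        }
        info[shard_id] = {"doc_count": len(members)}
    return shards, info


def search(shards, info, shard_ids, term):
    """Step 206: look only in the shards named by the search condition."""
    hits = set()
    for shard_id in shard_ids:
        if shard_id in info:  # skip IDs with no recorded shard information
            hits |= shards[shard_id].get(term.lower(), set())
    return sorted(hits)


docs = {1: "hadoop search", 2: "shard search", 3: "index shard", 4: "hadoop index"}
index = create_index(docs)
shards, info = shard_index(index, docs.keys(), shard_count=2)
only_first = search(shards, info, ["A.1"], "search")
both = search(shards, info, ["A.1", "A.2"], "search")
print(only_first, both)
```

With two shards, shard A.1 covers documents 1 and 3 and shard A.2 covers documents 2 and 4, so a search for "search" restricted to A.1 returns only document 1, while naming both shards returns documents 1 and 2.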

Fig. 3 shows a schematic diagram of creating shards and searching shards according to an embodiment of the invention.

As shown in Fig. 3, the distributed search method shown in Fig. 2 can be applied in Hadoop, which can comprise an index server 304 and a search server 306. The master node 302 receives user instructions through the front-end interface and returns the search content to the front end. It also controls the index creation operation of the index server 304 through the index management interface, and controls the search operation of the search server 306 through the search management interface. The namenode provides the metadata service and sends control commands to the datanodes on the nodes in the index server 304 and in the search server 306.

An index is first created for the target data and placed in the index server 304 of Hadoop. According to the shard configuration data entered by the user, the index is divided into a plurality of shards, which are placed on a plurality of nodes in the index server 304 for storage; for example, N shards are saved on index node 1 to index node N respectively. Index node 1 to index node N have corresponding nodes in the search server 306, namely node (N+1) to node (N+N), and the shard information of each shard likewise corresponds to node (N+1) to node (N+N). When the user enters a search request, the search server 306 processes it and determines the shard information contained in the search request (that is, the search condition). For example, from the shard node in the search request it determines the node on which the shard to be looked up resides, then queries node (N+1) to node (N+N) for the information containing that shard node, and thereby finds the corresponding shard, and the data corresponding to the shard, on the corresponding node of the index server 304.
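The correspondence between index nodes 1 to N and search-server nodes (N+1) to (N+N) can be sketched as a pair of lookup tables. The `Cluster` class below is a hypothetical single-process model written only to illustrate the routing; it assumes one shard per index node and does not use any Hadoop API.

```python
class Cluster:
    """Toy model: index nodes 1..N hold shards, search nodes N+1..2N hold shard info."""

    def __init__(self, n):
        self.n = n
        self.index_nodes = {}   # index node number -> shard data
        self.search_nodes = {}  # search node number -> shard information

    def store(self, index_node, shard_id, data):
        """Save a shard on an index node and its information on the paired search node."""
        if not 1 <= index_node <= self.n:
            raise ValueError("index node out of range")
        self.index_nodes[index_node] = data
        self.search_nodes[self.n + index_node] = {
            "shard_id": shard_id,
            "index_node": index_node,
        }

    def search(self, shard_node):
        """Resolve a shard node from the search condition via the paired search node."""
        info = self.search_nodes.get(self.n + shard_node)
        if info is None:
            return None  # no shard recorded for that node
        return info["shard_id"], self.index_nodes[info["index_node"]]


cluster = Cluster(n=3)
cluster.store(2, "A.2", ["record-x", "record-y"])
found = cluster.search(2)
missing = cluster.search(1)
print(found, missing)
```

With N = 3, the shard stored on index node 2 has its information on search node 5, and a request naming shard node 2 is answered by consulting node 5 first and then fetching the data from index node 2.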

The data found can be returned to the user through the front-end interface. The slave nodes can store the shard information held in the index server 304 and/or the search server 306, and can also be used to store shards.

As shown in Fig. 4A, data A has a size of 10 GB, and an index with index ID A is created for it. According to the user's settings, this index is divided evenly into 5 shards, so the data corresponding to each shard is 2 GB, and the IDs of the 5 shards are A.1, A.2, A.3, A.4 and A.5 respectively. The user can enter the IDs of the shards to be searched directly and look up the corresponding shards. If the shard IDs A.1, A.3 and A.4 are searched, for example, the data corresponding to these three shards is returned to the user as the search result.

As shown in Fig. 4B, one or more of the 5 existing shards can be divided further according to the user's expansion instruction. For example, shard A.1 is divided into 4 sub-shards whose IDs are A.1.1, A.1.2, A.1.3 and A.1.4 in turn. This is equivalent to dividing index A into 8 shards whose corresponding data sizes are 0.5 GB, 0.5 GB, 0.5 GB, 0.5 GB, 2 GB, 2 GB, 2 GB and 2 GB, so the user can enter the ID of a sub-shard to search the data corresponding to that sub-shard.
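The expansion in Fig. 4B can be sketched as replacing one entry of a shard table with its sub-shards. The function name `expand_shard` and the representation of the shard table as a dictionary from shard ID to data size in GB are assumptions made for illustration; the even split of the parent shard follows the sizes given in the text.

```python
def expand_shard(shards, shard_id, parts):
    """Return a new shard table in which shard_id is replaced by equal sub-shards.

    Sub-shards are named shard_id.1 .. shard_id.parts, following the ID scheme
    of the example.
    """
    if shard_id not in shards:
        raise KeyError(shard_id)
    if parts < 2:
        raise ValueError("a shard must be divided into at least 2 sub-shards")
    result = {}
    for sid, size in shards.items():
        if sid == shard_id:
            for k in range(1, parts + 1):
                result[f"{sid}.{k}"] = size / parts
        else:
            result[sid] = size
    return result


# Fig. 4A state: index A divided into five 2 GB shards.
shards = {f"A.{k}": 2.0 for k in range(1, 6)}
expanded = expand_shard(shards, "A.1", 4)
print(expanded)
```

The expanded table has 8 entries, four of 0.5 GB and four of 2 GB, and the total data volume is unchanged at 10 GB.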

As shown in Fig. 4C, the user can also shard index A again as required. The shard information already present in the server is first deleted, and index A is then divided into 3 shards according to the newly entered shard configuration data. The IDs of the shards are A.11, A.12 and A.13 respectively, and the corresponding data sizes are 3 GB, 3 GB and 4 GB respectively, so the user can look up data of the required size as needed.
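The re-sharding in Fig. 4C can be sketched as deleting the old entries and recording new ones. The patent does not say how the 3 GB, 3 GB and 4 GB sizes are chosen; the rule below (whole-GB shards with the remainder placed on the last shard) is an assumption that happens to reproduce those figures. The names `repartition` and `reshard` are hypothetical, and the new shard IDs are passed in by the caller rather than generated.

```python
def repartition(total_gb, shard_count):
    """Split total_gb into whole-GB shard sizes, putting any remainder on the last shard."""
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    base, remainder = divmod(total_gb, shard_count)
    sizes = [base] * shard_count
    sizes[-1] += remainder
    return sizes


def reshard(shard_table, index_id, total_gb, new_ids):
    """Delete the existing shard entries of index_id, then record the new shards."""
    for sid in [s for s in shard_table if s.startswith(index_id + ".")]:
        del shard_table[sid]
    for sid, size in zip(new_ids, repartition(total_gb, len(new_ids))):
        shard_table[sid] = size
    return shard_table


table = {f"A.{k}": 2 for k in range(1, 6)}  # Fig. 4A state, sizes in GB
reshard(table, "A", 10, ["A.11", "A.12", "A.13"])
print(table)
```

After the call, the table holds only A.11, A.12 and A.13 with sizes 3, 3 and 4 GB, matching the figure.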

The technical solution of the present invention has been described above with reference to the drawings. In the related art, when a mass data search is carried out in a Hadoop database, the search is slow, and the display of the search results lags noticeably behind the input of the search operation, so it is difficult to meet users' demand for real-time search. The technical solution of the present invention improves the response speed, and hence the real-time performance, of mass data search.

In the present invention, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance. The term "a plurality of" means two or more, unless expressly limited otherwise.

The above are merely preferred embodiments of the present invention and are not intended to limit the invention; for a person skilled in the art, the invention can have various modifications and variations. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the invention.

Claims

1. A distributed search system, characterized by comprising:

an index creation unit, configured to create an index for specified data according to a received index creation instruction;

an index sharding unit, configured to divide the index into a plurality of shards according to received shard configuration data, and to record the shard information of each of the plurality of shards; and

a shard search unit, configured to determine at least one piece of shard information according to a received search condition, to look up at least one target shard among the plurality of shards according to the at least one piece of shard information, and to return the data corresponding to each of the at least one target shard to the user.

2. The distributed search system according to claim 1, characterized in that the index creation unit is further configured to generate an index database according to the metadata information produced while the index is created; the index sharding unit is configured to record the shard information of each shard in the index database; and the shard search unit is configured to determine the at least one piece of shard information in the index database according to the search condition.

3. The distributed search system according to claim 1, characterized in that the shard configuration data comprises a shard quantity and/or a shard node, and the shard information comprises a shard identifier and/or the shard node.

4. The distributed search system according to claim 3, characterized by further comprising:

a shard storage unit, configured to place each shard on its corresponding shard node for storage according to a preset algorithm.

5. The distributed search system according to any one of claims 1 to 4, characterized in that the index sharding unit is further configured to divide a shard to be expanded among the plurality of shards into a plurality of sub-shards according to a received shard expansion instruction, and to record the shard information of each of the plurality of sub-shards.

6. A distributed search method, characterized by comprising:

step 202, creating an index for specified data according to a received index creation instruction;

step 204, dividing the index into a plurality of shards according to received shard configuration data, and recording the shard information of each of the plurality of shards; and

step 206, determining at least one piece of shard information according to a received search condition, looking up at least one target shard among the plurality of shards according to the at least one piece of shard information, and returning the data corresponding to each of the at least one target shard to the user.

7. The distributed search method according to claim 6, characterized in that step 202 further comprises: generating an index database according to the metadata information produced while the index is created; step 204 then comprises: recording the shard information of each shard in the index database; and step 206 comprises: determining the at least one piece of shard information in the index database according to the search condition.

8. The distributed search method according to claim 6, characterized in that the shard configuration data comprises a shard quantity and/or a shard node, and the shard information comprises a shard identifier and/or the shard node.

9. The distributed search method according to claim 8, characterized in that step 204 further comprises: placing each shard on its corresponding shard node for storage according to a preset algorithm.

10. The distributed search method according to any one of claims 6 to 9, characterized by further comprising: dividing a shard to be expanded among the plurality of shards into a plurality of sub-shards according to a received shard expansion instruction, and recording the shard information of each of the plurality of sub-shards.