CN103488687A - Searching system and searching method of big data - Google Patents

Publication number
CN103488687A
Authority
CN
China
Prior art keywords
burst
index
search
data
resource data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201310392278.3A
Other languages
Chinese (zh)
Inventor
郭辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yonyou Software Co Ltd
Original Assignee
Yonyou Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yonyou Software Co Ltd
Priority to CN201310392278.3A
Publication of CN103488687A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40: Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43: Querying
    • G06F16/41: Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a search system for big data. The search system comprises a grouping unit, a shard creation unit, and a search unit. The grouping unit divides the index files of the big data into one or more source groups, where the index files of each source group contain resource data of the same type. The shard creation unit performs a sharding operation on each source group to obtain multiple shard index files and creates a corresponding index shard from each shard index file. The search unit, according to a received search instruction, executes concurrent search operations in the shard search files corresponding to one or more specified index shards, and obtains and returns the corresponding search results. The invention also provides a search method for big data. The technical solution of the invention implements a distributed index file search method, helps improve search speed, and addresses the search efficiency bottleneck for enterprise big data.

Description

Search system and search method for big data
Technical field
The present invention relates to the field of data search technology, and in particular to a search system for big data and a search method for big data.
Background
Enterprise big data, also called massive data, refers to data involved in enterprise processes such as production and sales whose volume is so large that current mainstream software tools cannot capture, manage, process, and organize it within a reasonable time into information that supports business decisions. The widespread use of technologies such as the Internet of Things, cloud computing, the mobile Internet, and the Internet of Vehicles in enterprise information management has generated a large volume of internal information resources. According to statistics, enterprise data grows at a rate of 200% per year, and 80% of it is stored in enterprise computer systems in unstructured forms such as files, emails, pictures, and audio. Database management systems are not suited to retrieving and processing this data, yet for an enterprise this large, relatively scattered body of data is like a huge underground gold mine. Big data search can become a means for the enterprise to dig for gold in that mine, and a technical solution for big data search has become an urgent problem for enterprises.
Enterprise search is an important technical means of processing unstructured data inside an enterprise. In the big data era, however, data volume keeps expanding and index files grow too quickly, so search performance keeps declining, which has become a new bottleneck for the availability and efficiency of enterprise search applications.
In the prior art, there are currently two main approaches to enterprise big data search:
One, solving the storage problem of big data with the open-source Apache project Hadoop;
Two, controlling the size of the index: when indexes are added incrementally, some inactive indexes are deleted to keep the index files small.
In practice, however, both approaches have drawbacks. In approach one, Hadoop has efficiency problems with real-time search of enterprise big data: its strength is write-once, read-many storage, and frequent modification of enterprise data seriously hurts efficiency. Approach two is clearly a forced measure that improves search efficiency at the cost of data volume.
Therefore, how to improve the search efficiency of enterprise big data has become an urgent technical problem.
Summary of the invention
Based on the above problems, the invention proposes a new big data search technique that implements a distributed index file search method, helps improve search speed, and addresses the search efficiency bottleneck for enterprise big data.
In view of this, the invention proposes a search system for big data, comprising: a grouping unit, configured to divide the index files of the big data into one or more source groups, where the index files in each source group contain resource data of the same type; a shard creation unit, configured to perform a sharding operation on each source group to obtain multiple shard index files, and to create a corresponding index shard from each shard index file; and a search unit, configured to execute, according to a received search instruction, concurrent search operations in the shard search files corresponding to one or more specified index shards, so as to obtain and return the corresponding search results.
In this technical solution, sharding the index files allows concurrent search operations to run on multiple index shards when a search is performed, which effectively shortens the time needed to search all index files and improves search efficiency. Generating different source groups according to the type of resource data makes it easier for users, when retrieving from the index shards, to specify only the relevant index shards according to their own needs rather than retrieving from all index shards, which helps improve retrieval efficiency and reduces the power and computing resources consumed by retrieval. Here, an index file comprises the concrete resource data and the index generated from that resource data, and the "one or more specified index shards" may be specified by the user according to their needs or may be some or all of the index shards by default.
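The concurrent search over specified index shards can be sketched as follows. This is a minimal illustration rather than the patented implementation: the in-memory `SHARDS` dictionary, the shard names, and the substring match are all assumptions made for the example, and a thread pool stands in for whatever concurrency mechanism a real deployment would use.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory index shards: shard name -> {resource id: text}.
SHARDS = {
    "office-docs-1": {"d1": "quarterly sales report", "d2": "travel policy"},
    "office-docs-2": {"d3": "sales forecast for next quarter"},
    "email-1": {"m1": "re: sales meeting moved to friday"},
}


def search_shard(shard_name, keyword):
    """Search one shard; return (shard name, resource id) pairs containing the keyword."""
    docs = SHARDS[shard_name]
    return [(shard_name, rid) for rid, text in docs.items() if keyword in text]


def concurrent_search(keyword, shard_names=None):
    """Search the specified shards in parallel; default to all shards."""
    names = list(shard_names) if shard_names else list(SHARDS)
    with ThreadPoolExecutor(max_workers=len(names)) as pool:
        per_shard = list(pool.map(lambda name: search_shard(name, keyword), names))
    # Flatten the per-shard hit lists into one sorted result list.
    return sorted(hit for hits in per_shard for hit in hits)


all_hits = concurrent_search("sales")
email_hits = concurrent_search("sales", ["email-1"])
```

Passing a list of shard names corresponds to the user specifying part of the index shards; omitting it corresponds to the default of searching all of them.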
In the above technical solution, preferably, the shard creation unit is configured to: divide the resource data in the same source group, according to the server on which it resides, into multiple shard index files in one-to-one correspondence with the servers, and create the corresponding index shards.
In this technical solution, for resource data that is already stored across multiple servers, the resource data on each server can be built into a corresponding index shard; for resource data that is already stored on a single server, it can be built into one corresponding index shard, or grouped and then built into multiple index shards. Building index shards per server keeps movement of resource data to a minimum, which helps reduce the computing resources used and avoids risks such as data loss that data transfer may cause.
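Server-based sharding amounts to grouping resources by the server that already holds them. The sketch below assumes each resource is described by a hypothetical `(resource_id, server)` pair; the names are illustrative only.

```python
from collections import defaultdict


def shard_by_server(resources):
    """Group (resource_id, server) pairs into one shard per server, without moving data."""
    shards = defaultdict(list)
    for resource_id, server in resources:
        shards[server].append(resource_id)
    return dict(shards)


# Hypothetical resources of one source group, spread over two servers.
resources = [("r1", "srv-a"), ("r2", "srv-b"), ("r3", "srv-a")]
server_shards = shard_by_server(resources)
```

Because the shard key is the existing location, no resource has to be transferred to build its shard.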
In any of the above technical solutions, preferably, the shard creation unit is further configured to: divide the resource data on the same server into multiple shard index files according to how closely the data is related, and create the corresponding index shards.
In this technical solution, closeness refers to whether items of resource data jointly satisfy certain preset conditions. When they satisfy one or several of these conditions, the items can be considered closely related and treated as resource data of the same type, to be stored in the same index shard. Specifically, closely related data includes, for example, data that is always called or edited together (the number of times is greater than or equal to a preset count threshold), or data that all relates to the same user, company, and so on.
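One way to turn the co-access condition into shards is to count how often pairs of resources appear in the same access event and merge pairs that reach the threshold. The sketch below is an assumption-laden illustration: the access log format, the `closeness_groups` name, and the use of union-find to merge related pairs transitively are choices made for the example, not details given in the source.

```python
from collections import Counter
from itertools import combinations


def closeness_groups(access_log, threshold):
    """Group resources that were accessed together at least `threshold` times."""
    pair_counts = Counter()
    for accessed_together in access_log:
        for a, b in combinations(sorted(set(accessed_together)), 2):
            pair_counts[(a, b)] += 1

    # Union-find over all resources: closely related pairs end up in one group.
    parent = {r: r for event in access_log for r in event}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for (a, b), count in pair_counts.items():
        if count >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for r in parent:
        groups.setdefault(find(r), set()).add(r)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical access log: each entry lists resources called or edited together.
log = [["a", "b"], ["a", "b", "c"], ["c", "d"], ["c", "d"], ["e"]]
groups = closeness_groups(log, threshold=2)
```

Each resulting group would be placed in its own index shard; resources that never reach the threshold with any other resource remain in singleton groups.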
In any of the above technical solutions, preferably, the search unit is further configured to: obtain the shard search results produced by each index shard; select, from all shard search results, a preset number of items with the highest matching degree as the final search result; and return the final search result.
In this technical solution, based on the keyword entered by the user, a search operation is executed on each specified index shard; the shard search results from all index shards are then combined, and a preset number of items with the highest matching degree are selected from them, thereby merging the shard search results obtained from multiple index shards.
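The merge step can be sketched as a top-k selection over the combined shard results. The example assumes each shard returns hypothetical `(score, resource_id)` pairs where a higher score means a better match; the scoring itself is outside the scope of the sketch.

```python
import heapq


def merge_top_k(shard_results, k):
    """Merge per-shard (score, resource_id) hits and keep the k best overall."""
    all_hits = (hit for hits in shard_results for hit in hits)
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[0])


# Hypothetical results returned by three index shards.
shard_results = [
    [(0.91, "d1"), (0.40, "d2")],
    [(0.75, "d3")],
    [(0.88, "m1"), (0.10, "m2")],
]
top = merge_top_k(shard_results, k=3)
```

Here `k` plays the role of the preset number of results, and `heapq.nlargest` avoids fully sorting the combined list when only a few results are needed.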
In any of the above technical solutions, preferably, the system further comprises: a relationship storage unit, configured to store the correspondence between each shard index file and the resource data it contains; where the search unit is further configured to: on receiving an edit instruction for specified resource data, determine from the correspondence the index shard that contains the specified resource data, search for the specified resource data in the determined index shard, and perform the edit operation.
In this technical solution, establishing the correspondence between shard index files and the resource data they contain means that when, for example, a user wishes to update resource data and the original resource data must be edited, the index shard to which the resource data belongs can be found directly from the correspondence. Only that index shard then needs to be searched for the resource data to be edited, and no search is performed on other index shards, which helps reduce computational load and improve processing efficiency.
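The stored correspondence can be pictured as a lookup table from resource to shard that is consulted before any edit. In the sketch below the `ShardedIndex` class, its method names, and the shard names are hypothetical; it only illustrates that an edit touches exactly one shard.

```python
class ShardedIndex:
    """Toy index that records which shard holds each resource."""

    def __init__(self):
        self.shards = {}             # shard name -> {resource id: content}
        self.resource_to_shard = {}  # resource id -> shard name (the stored correspondence)

    def add(self, shard, resource_id, content):
        self.shards.setdefault(shard, {})[resource_id] = content
        self.resource_to_shard[resource_id] = shard

    def edit(self, resource_id, new_content):
        """Look up the owning shard and edit the resource there, leaving other shards untouched."""
        shard = self.resource_to_shard[resource_id]
        self.shards[shard][resource_id] = new_content
        return shard


index = ShardedIndex()
index.add("shard-1", "d1", "draft contract")
index.add("shard-2", "d3", "old price list")
edited_in = index.edit("d3", "new price list")
```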
According to another aspect of the invention, a search method for big data is also proposed, comprising: step 202, dividing the index files of the big data into one or more source groups, where the index files in each source group contain resource data of the same type; step 204, performing a sharding operation on each source group to obtain multiple shard index files, and creating a corresponding index shard from each shard index file; and step 206, according to a received search instruction, executing concurrent search operations in the shard search files corresponding to one or more specified index shards, so as to obtain and return the corresponding search results.
In this technical solution, sharding the index files allows concurrent search operations to run on multiple index shards when a search is performed, which effectively shortens the time needed to search all index files and improves search efficiency. Generating different source groups according to the type of resource data makes it easier for users, when retrieving from the index shards, to specify only the relevant index shards according to their own needs rather than retrieving from all index shards, which helps improve retrieval efficiency and reduces the power and computing resources consumed by retrieval. Here, an index file comprises the concrete resource data and the index generated from that resource data, and the "one or more specified index shards" may be specified by the user according to their needs or may be some or all of the index shards by default.
In the above technical solution, preferably, step 204 comprises: dividing the resource data in the same source group, according to the server on which it resides, into multiple shard index files in one-to-one correspondence with the servers, and creating the corresponding index shards.
In this technical solution, for resource data that is already stored across multiple servers, the resource data on each server can be built into a corresponding index shard; for resource data that is already stored on a single server, it can be built into one corresponding index shard, or grouped and then built into multiple index shards. Building index shards per server keeps movement of resource data to a minimum, which helps reduce the computing resources used and avoids risks such as data loss that data transfer may cause.
In any of the above technical solutions, preferably, step 204 further comprises: dividing the resource data on the same server into multiple shard index files according to how closely the data is related, and creating the corresponding index shards.
In this technical solution, closeness refers to whether items of resource data jointly satisfy certain preset conditions. When they satisfy one or several of these conditions, the items can be considered closely related and treated as resource data of the same type, to be stored in the same index shard. Specifically, closely related data includes, for example, data that is always called or edited together (the number of times is greater than or equal to a preset count threshold), or data that all relates to the same user, company, and so on.
In any of the above technical solutions, preferably, step 206 further comprises: obtaining the shard search results produced by each index shard; selecting, from all shard search results, a preset number of items with the highest matching degree as the final search result; and returning the final search result.
In this technical solution, based on the keyword entered by the user, a search operation is executed on each specified index shard; the shard search results from all index shards are then combined, and a preset number of items with the highest matching degree are selected from them, thereby merging the shard search results obtained from multiple index shards.
In any of the above technical solutions, preferably, the method further comprises: storing the correspondence between each index shard and the resource data it contains; on receiving an edit instruction for specified resource data, determining from the correspondence the index shard that contains the specified resource data; and searching for the specified resource data in the determined index shard and performing the edit operation.
In this technical solution, establishing the correspondence between shard index files and the resource data they contain means that when, for example, a user wishes to update resource data and the original resource data must be edited, the index shard to which the resource data belongs can be found directly from the correspondence. Only that index shard then needs to be searched for the resource data to be edited, and no search is performed on other index shards, which helps reduce computational load and improve processing efficiency.
The above technical solutions implement a distributed index file search method, help improve search speed, and address the search efficiency bottleneck for enterprise big data.
Brief description of the drawings
Fig. 1 shows a schematic block diagram of a search system for big data according to an embodiment of the invention;
Fig. 2 shows a schematic flowchart of a search method for big data according to an embodiment of the invention;
Fig. 3 shows a schematic diagram of the framework for searching big data according to an embodiment of the invention;
Fig. 4 shows a schematic flowchart of performing index sharding according to an embodiment of the invention.
Detailed description
So that the above objects, features, and advantages of the invention can be understood more clearly, the invention is described in further detail below with reference to the drawings and specific embodiments. It should be noted that, where they do not conflict, the embodiments of this application and the features in the embodiments can be combined with one another.
Many specific details are set out in the following description to give a full understanding of the invention. The invention can, however, also be implemented in ways other than those described here, so the scope of protection of the invention is not limited by the specific embodiments disclosed below.
Fig. 1 shows a schematic block diagram of a search system for big data according to an embodiment of the invention.
As shown in Fig. 1, the search system 100 for big data according to an embodiment of the invention comprises: a grouping unit 102, configured to divide the index files of the big data into one or more source groups, where the index files in each source group contain resource data of the same type; a shard creation unit 104, configured to perform a sharding operation on each source group to obtain multiple shard index files, and to create a corresponding index shard from each shard index file; and a search unit 106, configured to execute, according to a received search instruction, concurrent search operations in the shard search files corresponding to one or more specified index shards, so as to obtain and return the corresponding search results.
In this technical solution, sharding the index files allows concurrent search operations to run on multiple index shards when a search is performed, which effectively shortens the time needed to search all index files and improves search efficiency. Generating different source groups according to the type of resource data makes it easier for users, when retrieving from the index shards, to specify only the relevant index shards according to their own needs rather than retrieving from all index shards, which helps improve retrieval efficiency and reduces the power and computing resources consumed by retrieval. Here, an index file comprises the concrete resource data and the index generated from that resource data, and the "one or more specified index shards" may be specified by the user according to their needs or may be some or all of the index shards by default.
In the above technical solution, preferably, the shard creation unit 104 is configured to: divide the resource data in the same source group, according to the server on which it resides, into multiple shard index files in one-to-one correspondence with the servers, and create the corresponding index shards.
In this technical solution, for resource data that is already stored across multiple servers, the resource data on each server can be built into a corresponding index shard; for resource data that is already stored on a single server, it can be built into one corresponding index shard, or grouped and then built into multiple index shards. Building index shards per server keeps movement of resource data to a minimum, which helps reduce the computing resources used and avoids risks such as data loss that data transfer may cause.
In any of the above technical solutions, preferably, the shard creation unit 104 is further configured to: divide the resource data on the same server into multiple shard index files according to how closely the data is related, and create the corresponding index shards.
In this technical solution, closeness refers to whether items of resource data jointly satisfy certain preset conditions. When they satisfy one or several of these conditions, the items can be considered closely related and treated as resource data of the same type, to be stored in the same index shard. Specifically, closely related data includes, for example, data that is always called or edited together (the number of times is greater than or equal to a preset count threshold), or data that all relates to the same user, company, and so on.
In any of the above technical solutions, preferably, the search unit 106 is further configured to: obtain the shard search results produced by each index shard; select, from all shard search results, a preset number of items with the highest matching degree as the final search result; and return the final search result.
In this technical solution, based on the keyword entered by the user, a search operation is executed on each specified index shard; the shard search results from all index shards are then combined, and a preset number of items with the highest matching degree are selected from them, thereby merging the shard search results obtained from multiple index shards.
In any of the above technical solutions, preferably, the system further comprises: a relationship storage unit 108, configured to store the correspondence between each shard index file and the resource data it contains; where the search unit 106 is further configured to: on receiving an edit instruction for specified resource data, determine from the correspondence the index shard that contains the specified resource data, search for the specified resource data in the determined index shard, and perform the edit operation.
In this technical solution, establishing the correspondence between shard index files and the resource data they contain means that when, for example, a user wishes to update resource data and the original resource data must be edited, the index shard to which the resource data belongs can be found directly from the correspondence. Only that index shard then needs to be searched for the resource data to be edited, and no search is performed on other index shards, which helps reduce computational load and improve processing efficiency.
With reference to the search system 100 shown in Fig. 1, the big data search process of the invention is described in detail below in conjunction with Fig. 2 to Fig. 4.
As shown in Fig. 2, the search method for big data according to an embodiment of the invention comprises: step 202, dividing the index files of the big data into one or more source groups, where the index files in each source group contain resource data of the same type; step 204, performing a sharding operation on each source group to obtain multiple shard index files, and creating a corresponding index shard from each shard index file; and step 206, according to a received search instruction, executing concurrent search operations in the shard search files corresponding to one or more specified index shards, so as to obtain and return the corresponding search results.
In this technical solution, sharding the index files allows concurrent search operations to run on multiple index shards when a search is performed, which effectively shortens the time needed to search all index files and improves search efficiency. Generating different source groups according to the type of resource data makes it easier for users, when retrieving from the index shards, to specify only the relevant index shards according to their own needs rather than retrieving from all index shards, which helps improve retrieval efficiency and reduces the power and computing resources consumed by retrieval. Here, an index file comprises the concrete resource data and the index generated from that resource data, and the "one or more specified index shards" may be specified by the user according to their needs or may be some or all of the index shards by default.
As shown in Fig. 3, the search sources are all of the index files described above, including enterprise databases, web data, file systems, audio data, video data, and so on. All of these files can be classified by type, for example standard basic file types, document types, Office document types, email types, and so on.
For example, if these different types are called "source types", resource data of the same source type can be placed in the same collection, namely a "search source". The source types of the resource data in different search sources may be the same or different. All search sources of the same type can form one "source group", or they can be split into several source groups, in which case the resource data in those source groups may share the same source type.
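The relationship between source types, search sources, and source groups can be sketched as a small data model. The `SearchSource` class, its field names, and the example sources are hypothetical, and the sketch takes the simplest case in which all search sources of one type form a single source group.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchSource:
    """A collection of resource data that shares one source type."""
    name: str
    source_type: str


def build_source_groups(sources):
    """Put all search sources of the same source type into one source group."""
    groups = defaultdict(list)
    for source in sources:
        groups[source.source_type].append(source.name)
    return dict(groups)


# Hypothetical search sources.
sources = [
    SearchSource("hr-share", "office"),
    SearchSource("mail-archive", "email"),
    SearchSource("finance-share", "office"),
]
source_groups = build_source_groups(sources)
```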
With the source group as the smallest unit of physical isolation, the resource data of all the big data is sharded. One source group can form multiple shard index files and therefore multiple index shards. Each index shard can be called a search core. Multiple search cores can be configured and run on each server, and the search cores belonging to one source group can also be configured and run on multiple servers. For example, when a source group is divided into 3 search cores, all 3 search cores can be configured and run on one server, or one search core can run on each of 3 servers, or 1 search core can run on one server and 2 on another, and so on; the layout can be configured and adjusted according to actual conditions.
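The placement options for search cores can be illustrated with a simple assignment function. Round-robin placement is an assumption chosen for the sketch, since the text leaves the layout to configuration; the core and server names are hypothetical.

```python
def place_cores(cores, servers):
    """Assign search cores to servers round-robin (one possible layout among many)."""
    placement = {server: [] for server in servers}
    for i, core in enumerate(cores):
        placement[servers[i % len(servers)]].append(core)
    return placement


# A source group split into 3 search cores, placed on 1, 2, or 3 servers.
cores = ["core-1", "core-2", "core-3"]
on_one = place_cores(cores, ["srv-a"])
on_two = place_cores(cores, ["srv-a", "srv-b"])
on_three = place_cores(cores, ["srv-a", "srv-b", "srv-c"])
```

The three calls reproduce the three layouts mentioned in the text: all cores on one server, an uneven split across two servers, and one core per server.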
When index shards are created, the index crawler plug-in crawls the resource data in the search source, and the preset sharding rules determine how the sharding is carried out and which index shards are created.
In the above technical solution, preferably, step 204 comprises: dividing the resource data in the same source group, according to the server on which it resides, into multiple shard index files in one-to-one correspondence with the servers, and creating the corresponding index shards.
In this technical solution, for resource data that is already stored across multiple servers, the resource data on each server can be built into a corresponding index shard; for resource data that is already stored on a single server, it can be built into one corresponding index shard, or grouped and then built into multiple index shards. Building index shards per server keeps movement of resource data to a minimum, which helps reduce the computing resources used and avoids risks such as data loss that data transfer may cause.
In any of the above technical solutions, preferably, step 204 further comprises: dividing the resource data on the same server into multiple shard index files according to how closely the data is related, and creating the corresponding index shards.
In this technical solution, closeness refers to whether items of resource data jointly satisfy certain preset conditions. When they satisfy one or several of these conditions, the items can be considered closely related and treated as resource data of the same type, to be stored in the same index shard. Specifically, closely related data includes, for example, data that is always called or edited together (the number of times is greater than or equal to a preset count threshold), or data that all relates to the same user, company, and so on.
In the embodiment shown in Fig. 3, the sharding operation can be performed by the index distribution module. Specifically, this application proposes several ways of performing the sharding operation, for example:
In the first case, horizontal index sharding can be used. For the resource data of one source group, multiple search cores are established and configured on multiple different servers. Each server can host one or more search cores.
If the resource data in the same source group is already located on multiple servers, this approach can be used to create the corresponding search cores directly.
In the second case, vertical index sharding can be used. The resource data of one source group is all stored on the same server, and multiple search cores are established there.
If the resource data in the same source group is already located on the same server, this approach can be used to create the corresponding search cores directly.
In the third case, intelligent horizontal and vertical sharding can be used. Enterprise data usually follows business rules, so closely related data can be identified by computing data relationships and then distributed to the same index shard. At present, the data in a large enterprise's database is in many cases partitioned logically by group or company. It is common for the data of certain groups or companies to be accessed together, while the data of other groups or companies is accessed together separately. For this situation, the application proposes an intelligent, business-level sharding strategy: all data sharing the same business association is placed in the same index shard. The data may reside on one server or on several. Depending on actual conditions, it can be deployed on a single server, where one or more corresponding search cores are created, or on different servers, with one or more corresponding search cores created on each.
By presetting which sharding mode to use and the specific sharding strategy, index files can be sharded automatically.
Based on the above processing, the persistent data shown in Fig. 3 is obtained, such as index files, search source types, search source groups, search source information, and index sharding strategies, for use when users perform searches.
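As a rough illustration of how a preset sharding mode could drive automatic sharding, the Python sketch below assigns the records of one source group to shards under three modes. The function `assign_shards`, the record keys (`id`, `server`, `company`), the mode names and the modulo rule used to spread records over the search cores of one server are assumptions made for this example; the patent does not specify them.

```python
from collections import defaultdict


def assign_shards(records, strategy, cores_per_server=2):
    """Return {shard key: [record ids]} for the records of one source group.

    records:  list of dicts with hypothetical keys "id" (int),
              "server" (where the data already lives) and "company".
    strategy: "horizontal", "vertical" or "business" (assumed names).
    """
    shards = defaultdict(list)
    for rec in records:
        if strategy == "horizontal":
            # One shard per server that already holds the data.
            key = (rec["server"], 0)
        elif strategy == "vertical":
            # Several search cores on the same server; the modulo rule
            # is an arbitrary choice for this sketch.
            key = (rec["server"], rec["id"] % cores_per_server)
        elif strategy == "business":
            # Data with the same business association shares a shard.
            key = ("business", rec["company"])
        else:
            raise ValueError(f"unknown sharding strategy: {strategy!r}")
        shards[key].append(rec["id"])
    return dict(shards)
```

Because the mode is just a parameter, changing the preset strategy re-partitions the same records without touching the calling code.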
Fig. 4 is a schematic flowchart of index sharding according to an embodiment of the invention.
As shown in Fig. 4, the index sharding process according to an embodiment of the invention comprises:
Step 402: the source group's crawler plug-in crawls the index data (that is, the resource data described above) from the search source; the result is denoted list, with int i=0.
Step 404: obtain the i-th item of index data.
Step 406: according to the preset sharding strategy, determine whether a new index shard needs to be created. If so, go to step 408; otherwise, go to step 410.
Step 408: according to the sharding strategy obtained, such as the horizontal or vertical sharding strategy described above, forward the index data to the shard server, that is, the server that manages the sharding operation.
Step 410: determine the index file that needs to be updated.
Step 412: update the index shard according to the lookup table.
Step 414: determine whether i<list.size(). If so, unprocessed index data remains, so go to step 416; otherwise, end.
Step 416: i++, that is, increment i by 1, then return to step 404.
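The loop of steps 402 to 416 can be sketched in Python as follows. This is a simplified illustration, not the patented implementation: the function name `build_shards`, the tuple format of the crawled data, the shard naming scheme and the "open a new shard when the current one is full" rule standing in for the preset sharding strategy are all assumptions. The shard server of step 408 is reduced to an in-memory dictionary.

```python
from collections import defaultdict


def build_shards(index_data, max_items_per_shard=2):
    """Place each crawled item in an index shard and record where it went.

    index_data: list of (source_group, source, data_id) tuples standing in
                for the crawler plug-in output of step 402 (assumed format).
    Returns (shards, lookup): shards maps shard name -> list of data IDs,
    lookup holds one row per item with the shard it was stored in.
    """
    shards = {}
    lookup = []
    current = {}                 # source group -> shard currently being filled
    created = defaultdict(int)   # source group -> number of shards created

    for source_group, source, data_id in index_data:   # steps 404, 414, 416
        shard = current.get(source_group)
        # Step 406: the assumed strategy opens a new shard when the group
        # has none yet or its current shard is full.
        if shard is None or len(shards[shard]) >= max_items_per_shard:
            shard = f"{source_group}-{created[source_group]}"   # step 408
            created[source_group] += 1
            shards[shard] = []
            current[source_group] = shard
        # Steps 410 and 412: update the chosen shard and the lookup rows.
        shards[shard].append(data_id)
        lookup.append({"sourcegroup": source_group, "source": source,
                       "ID": data_id, "Shard": shard})
    return shards, lookup
```

Iterating directly over the list replaces the explicit counter `i` of the flowchart, so the loop ends as soon as the last item has been processed.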
Step 412 above involves a "lookup table". The lookup table records the correspondence between each index shard and the resource data it contains. When a user needs to operate on some resource data, for example to delete or update it, the system can use the lookup table to determine which index shard holds that resource data and search only within that shard. Unrelated index shards need not be searched, which reduces the operating load and improves retrieval efficiency.
| Index field | Field description |
| --- | --- |
| sourcegroup | Search source group identifier |
| source | Search source identifier |
| ID | Data ID |
| Shard | Index shard |

Table 1
Table 1 shows one possible format of the lookup table, which contains information such as the search source group identifier, the search source identifier, the data ID (the unique identifier of the resource data), and the shard (the index shard to which the data belongs).
In any of the above technical solutions, preferably, the method further comprises: storing the correspondence between each index shard and the resource data it contains; upon receiving an edit instruction for specified resource data, determining from the correspondence the index shard that contains the specified resource data; and searching for the specified resource data within the determined index shard and performing the edit operation on it.
In this technical solution, the correspondence between each shard index file and the resource data it contains is established. When, for example, a user wants to update resource data and therefore needs to edit the original resource data, the index shard to which the data belongs can be found directly from this correspondence. The resource data then only needs to be searched for and edited within that index shard, without searching the other index shards, which reduces the computational load and improves processing efficiency.
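A minimal Python sketch of this edit path is shown below, assuming the correspondence is held in memory as a dictionary from data ID to shard name. The class `ShardedIndex`, its method names and the `searched` list (kept only to show which shards were touched) are hypothetical and serve purely to illustrate that an edit or delete reaches a single shard.

```python
class ShardedIndex:
    """Toy in-memory model of index shards plus the ID-to-shard mapping."""

    def __init__(self):
        self.shards = {}     # shard name -> {data_id: document}
        self.mapping = {}    # data_id -> shard name
        self.searched = []   # shards touched by edits, for illustration only

    def add(self, shard, data_id, document):
        self.shards.setdefault(shard, {})[data_id] = document
        self.mapping[data_id] = shard

    def edit(self, data_id, new_document):
        # Resolve the owning shard first; raises KeyError for unknown IDs.
        shard = self.mapping[data_id]
        self.searched.append(shard)
        self.shards[shard][data_id] = new_document
        return shard

    def delete(self, data_id):
        # Remove the mapping row and the document from its single shard.
        shard = self.mapping.pop(data_id)
        self.searched.append(shard)
        del self.shards[shard][data_id]
        return shard
```

For example, after `idx.add("hr-0", "d1", "old")`, a call to `idx.edit("d1", "new")` resolves the shard with one dictionary lookup and never inspects any other shard.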
In Fig. 3, the "index file distribution framework" also contains several functional modules, as follows:
Cluster shard information acquisition module: the control center for all cluster shard information. The cluster shard information serves as the data domain for shard information. This module exposes the shard configuration strategies of the source groups and search sources, and provides a service for changing the sharding strategies described above.
Index sharding and management module: using the index data splitting algorithm configured for the source group, this module keeps global statistics on the number of index entries in each index shard and, based on those counts, provides the basis for issuing index distribution commands. It is also responsible for loading and managing the lookup table (as shown in Table 1) for use by the crawler and by search queries. This module can adopt a caching strategy in which the active index keys are loaded into memory and stored in a hash table to speed up retrieval. This way of managing the index also makes it easier to add shards dynamically.
Dynamic configuration module: mainly handles the dynamic addition of search cluster servers and the dynamic configuration management of index sharding strategies and search strategies.
Specifically, as the volume of enterprise data grows, search servers must be added, or more search shards must be added for certain source groups, in order to maintain performance and improve search efficiency; dynamic expansion is therefore an important function of cluster management. For a server in the cluster that already hosts index shards, a mirror copy is made: the copy keeps the same server code as the original server, and service is switched dynamically to the new, higher-performance server. A newly added server is assigned a code by the dynamic configuration module. Server codes must be unique across the whole cluster, and once a code has been put into use it cannot be changed. The index sharding strategies of the source groups and search sources that are to be sharded onto this server then need to be adjusted. In addition, the index sharding and management module can use load balancing to create new index shards preferentially on the newly added servers.
Search merge engine: when a user submits a search request to a server, the search merge engine searches the user-specified index shards concurrently. Once the searches have finished, it merges the search results and returns them to the user. If no index shard is specified, all index shards of the source group are searched by default, and the merged search results are returned to the user.
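The server-code rules and the preference for newly added servers described for the dynamic configuration module can be sketched in Python as follows. The class `ClusterConfig` and its methods are hypothetical names chosen for this example, and "load balancing" is interpreted here, as an assumption, as placing each new shard on the enabled server that currently holds the fewest shards, which naturally favors a freshly added server.

```python
class ClusterConfig:
    """Toy registry of search servers keyed by their cluster-wide code."""

    def __init__(self):
        # server code -> {"host": str, "enabled": bool, "shards": int}
        self.servers = {}

    def add_server(self, code, host):
        # Codes must be unique within the whole cluster.
        if code in self.servers:
            raise ValueError(f"server code {code!r} already exists")
        self.servers[code] = {"host": host, "enabled": False, "shards": 0}

    def enable(self, code):
        self.servers[code]["enabled"] = True

    def change_code(self, old_code, new_code):
        # Once a code is in use it may no longer be changed.
        if self.servers[old_code]["enabled"]:
            raise ValueError("an enabled server code cannot be changed")
        if new_code in self.servers:
            raise ValueError(f"server code {new_code!r} already exists")
        self.servers[new_code] = self.servers.pop(old_code)

    def switch_host(self, code, new_host):
        # Mirror-copy upgrade: same code, new (faster) machine.
        self.servers[code]["host"] = new_host

    def place_shard(self):
        # Pick the enabled server with the fewest shards; ties go to the
        # lexicographically smallest code so the choice is deterministic.
        enabled = [c for c, s in self.servers.items() if s["enabled"]]
        code = min(enabled, key=lambda c: (self.servers[c]["shards"], c))
        self.servers[code]["shards"] += 1
        return code
```

With this policy, a server that joins an already loaded cluster receives new shards until its count catches up with the others.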
In any of the above technical solutions, preferably, step 206 further comprises: obtaining the shard search results produced by each index shard; and selecting, from all the shard search results, a preset number of data items with the highest matching degree as the final search results, and returning the final search results.
In this technical solution, based on the keyword entered by the user, the corresponding search is performed on every specified index shard. The shard search results from all the index shards are then combined, and a preset number of data items with the highest matching degree are selected from them, thereby merging the shard search results obtained from multiple index shards.
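The concurrent search and merge step can be illustrated with the following Python sketch. It assumes each shard is a dictionary of documents held in memory and uses a toy matching degree (the number of times the keyword occurs); the function names `search_shard` and `merged_search`, the parameters `top_n` and `shard_names`, and the scoring rule are all assumptions for illustration, not the patent's ranking method.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def search_shard(shard_docs, keyword):
    """Return (score, data_id) pairs for one shard; score is a toy term count."""
    hits = []
    for data_id, text in shard_docs.items():
        score = text.lower().split().count(keyword.lower())
        if score > 0:
            hits.append((score, data_id))
    return hits


def merged_search(shards, keyword, top_n=3, shard_names=None):
    """Search the chosen shards concurrently and keep the top_n best matches.

    shards:      dict mapping shard name -> {data_id: text}.
    shard_names: shards to search; all shards are searched when omitted.
    """
    names = list(shards) if shard_names is None else list(shard_names)
    with ThreadPoolExecutor(max_workers=max(1, len(names))) as pool:
        per_shard = list(pool.map(lambda n: search_shard(shards[n], keyword), names))
    all_hits = [hit for hits in per_shard for hit in hits]
    # Highest score first; ties are broken by data ID for a stable order.
    best = heapq.nsmallest(top_n, all_hits, key=lambda h: (-h[0], h[1]))
    return [data_id for _, data_id in best]
```

Each shard returns its own hits independently, so the merge only has to compare scores; in a real deployment the per-shard search would be a remote call to a search core rather than a local function.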
The technical solution of the invention has been described above with reference to the accompanying drawings. It can achieve the following:
1. Sharding the index files in an enterprise cluster environment greatly improves the capacity to process big data, raising an enterprise's data processing capacity from the GB scale to the TB scale.
2. Multi-core parallel search improves the search speed for big data; for data sets on the order of millions of records, the speed can improve by several orders of magnitude compared with not using this technique.
3. Support for horizontal scaling makes it easy for an enterprise to add search servers.
The above is only a preferred embodiment of the invention and is not intended to limit the invention. For those skilled in the art, the invention may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. A search system for big data, characterized in that it comprises:
A grouping unit, configured to divide the index files of the big data into one or more source groups, the index files in each source group containing resource data of the same type;
A shard creation unit, configured to perform a sharding operation on each source group to obtain multiple shard index files, and to create a corresponding index shard from each shard index file;
A search unit, configured to perform, according to a received search instruction, a concurrent search on the corresponding shard index files in one or more specified index shards, so as to obtain and return the corresponding search results.
2. The search system for big data according to claim 1, characterized in that the shard creation unit is configured to:
Divide the resource data in the same source group, according to the servers on which it resides, into multiple shard index files in one-to-one correspondence with the servers, and create the corresponding index shards.
3. The search system for big data according to claim 2, characterized in that the shard creation unit is further configured to:
Divide the resource data on the same server into multiple shard index files according to how closely the data are related, and create the corresponding index shards.
4. The search system for big data according to claim 1, characterized in that the search unit is further configured to:
Obtain the shard search results produced by each index shard;
Select, from all the shard search results, a preset number of data items with the highest matching degree as the final search results, and return the final search results.
5. The search system for big data according to any one of claims 1 to 4, characterized in that it further comprises:
A relationship storage unit, configured to store the correspondence between each shard index file and the resource data it contains;
Wherein the search unit is further configured to: upon receiving an edit instruction for specified resource data, determine from the correspondence the index shard that contains the specified resource data, and search for the specified resource data within the determined index shard and perform the edit operation on it.
6. A search method for big data, characterized in that it comprises:
Step 202: dividing the index files of the big data into one or more source groups, the index files in each source group containing resource data of the same type;
Step 204: performing a sharding operation on each source group to obtain multiple shard index files, and creating a corresponding index shard from each shard index file;
Step 206: according to a received search instruction, performing a concurrent search on the corresponding shard index files in one or more specified index shards, so as to obtain and return the corresponding search results.
7. The search method for big data according to claim 6, characterized in that step 204 comprises:
Dividing the resource data in the same source group, according to the servers on which it resides, into multiple shard index files in one-to-one correspondence with the servers, and creating the corresponding index shards.
8. The search method for big data according to claim 7, characterized in that step 204 further comprises:
Dividing the resource data on the same server into multiple shard index files according to how closely the data are related, and creating the corresponding index shards.
9. The search method for big data according to claim 6, characterized in that step 206 further comprises:
Obtaining the shard search results produced by each index shard;
Selecting, from all the shard search results, a preset number of data items with the highest matching degree as the final search results, and returning the final search results.
10. The search method for big data according to any one of claims 6 to 9, characterized in that it further comprises:
Storing the correspondence between each index shard and the resource data it contains;
Upon receiving an edit instruction for specified resource data, determining from the correspondence the index shard that contains the specified resource data;
Searching for the specified resource data within the determined index shard and performing the edit operation on it.
CN201310392278.3A 2013-09-02 2013-09-02 Searching system and searching method of big data Pending CN103488687A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310392278.3A CN103488687A (en) 2013-09-02 2013-09-02 Searching system and searching method of big data

Publications (1)

Publication Number Publication Date
CN103488687A true CN103488687A (en) 2014-01-01

Family

ID=49828913

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310392278.3A Pending CN103488687A (en) 2013-09-02 2013-09-02 Searching system and searching method of big data

Country Status (1)

Country Link
CN (1) CN103488687A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102169507A (en) * 2011-05-26 2011-08-31 厦门雅迅网络股份有限公司 Distributed real-time search engine
CN102332004A (en) * 2011-07-29 2012-01-25 中国科学院计算技术研究所 Data processing method and system for managing mass data
CN102779185A (en) * 2012-06-29 2012-11-14 浙江大学 High-availability distribution type full-text index method

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104252537A (en) * 2014-09-18 2014-12-31 深圳市彩讯科技有限公司 Index fragmentation method based on mail characteristics
CN104252537B (en) * 2014-09-18 2019-05-21 彩讯科技股份有限公司 Index sharding method based on mail features
CN104462381A (en) * 2014-12-11 2015-03-25 北京中细软移动互联科技有限公司 Trademark image retrieval method
CN104462381B (en) * 2014-12-11 2019-03-19 中细软移动互联科技有限公司 Trademark image retrieval method
CN105045684B (en) * 2015-07-16 2018-06-15 北京京东尚科信息技术有限公司 Index switching and the method and device of index control
CN105045684A (en) * 2015-07-16 2015-11-11 北京京东尚科信息技术有限公司 Method and device for switching and controlling indexes
CN106021440B (en) * 2016-05-16 2019-10-18 中国建设银行股份有限公司 A kind of searching method and device
CN106021440A (en) * 2016-05-16 2016-10-12 中国建设银行股份有限公司 Search method and device
CN110990399B (en) * 2016-09-12 2023-04-28 杭州数梦工场科技有限公司 Reconstruction index method and device
CN110990399A (en) * 2016-09-12 2020-04-10 杭州数梦工场科技有限公司 Index reconstruction method and device
CN106528683A (en) * 2016-10-25 2017-03-22 深圳市盛凯信息科技有限公司 Index segmenting equalization based big data cloud search platform and method thereof
CN107315761B (en) * 2017-04-17 2020-08-04 阿里巴巴集团控股有限公司 Data updating method, data query method and device
CN107315761A (en) * 2017-04-17 2017-11-03 阿里巴巴集团控股有限公司 A kind of data-updating method, data query method and device
CN109002448A (en) * 2017-06-07 2018-12-14 中国移动通信集团甘肃有限公司 A kind of report form statistics method, apparatus and system
CN108197296A (en) * 2018-01-23 2018-06-22 马上消费金融股份有限公司 Date storage method based on Elasticsearch indexes
CN110309390A (en) * 2018-03-15 2019-10-08 广东神马搜索科技有限公司 Index column indention method, apparatus and server suitable for search
CN108984659A (en) * 2018-06-28 2018-12-11 山东浪潮商用系统有限公司 A kind of file equalization methods for IDFS
CN109496420A (en) * 2018-08-22 2019-03-19 袁振南 Cyclic annular server set group managing means, device and computer storage medium
CN109496420B (en) * 2018-08-22 2021-02-23 袁振南 Ring server cluster management method, device and computer storage medium
CN109656978A (en) * 2018-12-24 2019-04-19 泰华智慧产业集团股份有限公司 The optimization method of near real-time search service
CN110674108A (en) * 2019-08-30 2020-01-10 中国人民财产保险股份有限公司 Data processing method and device
CN110795626A (en) * 2019-10-28 2020-02-14 南京弹跳力信息技术有限公司 Big data processing method and system
CN116991892A (en) * 2023-07-08 2023-11-03 上海螣龙科技有限公司 Network asset data query method, system, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN103488687A (en) Searching system and searching method of big data
CN102169507B (en) Implementation method of distributed real-time search engine
CN102725755B (en) Method and system of file access
US9576013B2 (en) Optimizing update operations in in-memory database systems
CN107515878B (en) Data index management method and device
CN110147407B (en) Data processing method and device and database management server
CN103020255B (en) Classification storage means and device
CN106919675B (en) Data storage method and device
CN104778270A (en) Storage method for multiple files
CN104679778A (en) Search result generating method and device
CN103473239A (en) Method and device for updating data of non relational database
CN106294695A (en) A kind of implementation method towards the biggest data search engine
CN101790257A (en) Method for memorizing data and network management system
KR20130049111A (en) Forensic index method and apparatus by distributed processing
CN104239377A (en) Platform-crossing data retrieval method and device
CN105100050A (en) User permission management method and system
CN104111924A (en) Database system
CN113468199B (en) Index updating method and system
CN104111936A (en) Method and system for querying data
CN106155934A (en) Based on the caching method repeating data under a kind of cloud environment
CN102724301B (en) Cloud database system and method and equipment for reading and writing cloud data
CN105095515A (en) Bucket dividing method, device and equipment supporting fast query of Map-Reduce output result
CN103365923A (en) Method and device for assessing partition schemes of database
KR101666440B1 (en) Data processing method in In-memory Database System based on Circle-Queue
CN112000703B (en) Data warehousing processing method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 100094 Haidian District North Road, Beijing, No. 68

Applicant after: Yonyou Network Technology Co., Ltd.

Address before: 100094 Beijing city Haidian District North Road No. 68, UFIDA Software Park

Applicant before: UFIDA Software Co., Ltd.

COR Change of bibliographic data
RJ01 Rejection of invention patent application after publication

Application publication date: 20140101