CN102576360B - Search device and system - Google Patents


Publication number
CN102576360B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN200980161042.0A
Other languages
Chinese (zh)
Other versions
CN102576360A (en
Inventor
新名博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Toshiba Digital Solutions Corp
Original Assignee
Toshiba Corp
Toshiba Solutions Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toshiba Corp, Toshiba Solutions Corp filed Critical Toshiba Corp
Publication of CN102576360A publication Critical patent/CN102576360A/en
Application granted granted Critical
Publication of CN102576360B publication Critical patent/CN102576360B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Command reception unit (12) receives a search command from a client device (5). An analysis unit (30) analyzes the search command and generates search requests for multiple management devices (3) that disperse and manage search target data. A first switching unit (32) switches between a first permitted state and a first prohibited state according to the amount of search result data stored in a result storage unit (22), which temporarily stores a search result for a search request. A request transmission unit (14) transmits search requests to the respective management devices (3) and, when the first permitted state exists, sequentially transmits a result request that requests a search result of a prescribed size to the respective management devices (3). A result reception unit (16) sequentially receives the search results of the prescribed size from the respective management devices (3), storing the results in the result storage unit (22). A compiling unit (34) compiles the search results stored in the result storage unit (22) to generate a compiled result, and deletes the search results, the compiling of which has been completed, from the result storage unit (22). A compiled result transmission unit (18) transmits the compiled result to the client device (5).

Description

Search device and system
Technical field
The present invention relates to retrieval.
Background art
A distributed search system with a shared-nothing architecture is conventionally known. Such a system has multiple management devices that manage search target data in a distributed manner, and a search device that requests a search from each management device and compiles the search results of the management devices (see, for example, Non-patent literature 1).
When a large amount of search target data is searched in response to a search request from a client device, holding all the search results of every management device in the search device requires a large memory capacity. Therefore, in a conventional distributed search system, each time the client device requests a compiled result, the search device requests search results from each management device, compiles the requested search results, and returns them to the client device.
Prior art document:
Non-patent literature:
Non-patent literature 1: M. Tamer Özsu et al., "Principles of Distributed Database Systems", Prentice Hall, Second Edition, 1999, Section 13.2.3 "Parallel System Architectures", pp. 424-430
Summary of the invention
Problem to be solved by the invention
However, in the conventional search system, obtaining the compiled result takes time.
The present invention has been made in view of the above, and its object is to provide a search device and a system that shorten the time needed to obtain a compiled result without increasing memory capacity.
Means for solving the problem
To solve the above problem and achieve the object, a search device according to one aspect of the present invention includes: a command reception unit that receives a search command from a client device; an analysis unit that analyzes the search command and generates a search request for each of multiple management devices that manage search target data in a distributed manner; a first switching unit that switches between a first permitted state, in which requesting search results is permitted, and a first prohibited state, in which requesting search results is prohibited, according to the amount of search result data for the search requests stored in a result storage unit that temporarily stores the search results; a request transmission unit that transmits the search requests to the respective management devices and, in the first permitted state, sequentially transmits to the respective management devices a result request that requests search results of a prescribed size; a result reception unit that sequentially receives the search results of the prescribed size from the respective management devices and stores them in the result storage unit; a compiling unit that compiles the search results stored in the result storage unit to generate a compiled result, and deletes from the result storage unit the search results that have been compiled; and a compiled result transmission unit that transmits the compiled result to the client device.
A search system according to another aspect of the present invention includes the search device and multiple management devices connected to the search device via a network. Each management device includes: a request reception unit that receives the search request from the search device and sequentially receives the result requests; a search unit that sequentially retrieves, from a data storage unit storing the search target data, addresses indicating the storage locations of the search target data that satisfies the search request; a second switching unit that switches between a second permitted state, in which acquiring the search target data is permitted, and a second prohibited state, in which acquiring the search target data is prohibited, according to the amount of search target data stored in a temporary storage unit that temporarily stores the search target data acquired from the data storage unit; an acquisition unit that, in the second permitted state, acquires from the data storage unit the search target data indicated by the retrieved addresses and stores it in the temporary storage unit; and a search result transmission unit that, each time a result request is received, takes search target data of the prescribed size from the temporary storage unit, transmits it to the search device as the search result, and deletes the transmitted search target data from the temporary storage unit.
Effect of the invention
According to the present invention, the time needed to obtain a compiled result can be shortened without increasing memory capacity.
Brief description of the drawings
Fig. 1 is a block diagram showing an outline of the search system of the present embodiment.
Fig. 2 is a block diagram showing details of the search system of the present embodiment.
Fig. 3 is a state transition diagram of the search device of the present embodiment.
Fig. 4 is a state transition diagram of the management device of the present embodiment.
Fig. 5 is a flowchart showing the processing procedure of the search device of the present embodiment.
Fig. 6 is a flowchart showing the processing procedure of the management device of the present embodiment.
Embodiment
An embodiment of the search system of the present invention is described in detail below with reference to the accompanying drawings.
Fig. 1 is a block diagram showing the outline configuration of the search system 1 of the present embodiment. As shown in Fig. 1, the search system 1 includes multiple management devices 3 that manage search target data in a distributed manner, and a search device 2 that requests a search from each management device 3 and compiles the search results of the management devices 3. The search device 2 and the management devices 3 are connected via a network 4.
The search system 1 of the present embodiment is a so-called shared-nothing distributed search system, in which each management device 3 has its own computer resources. That is, each management device 3 does not share computer resources with the other management devices and manages its search target data independently. The search target data managed by each management device 3 is a subset obtained by partitioning the whole set of search target data managed by the search system 1. The present embodiment is described for the case of one search device 2 and 100 management devices 3, but the numbers of search devices 2 and management devices 3 are not limited to these.
A client device 5 is also connected to the network 4. The client device 5 instructs the search device 2 to perform a search and receives the compiled result from the search device 2. The network 4 may be wireless or wired, a LAN (Local Area Network), or a public communication line; any type of network may be used.
Fig. 2 is a block diagram showing an example of the detailed configuration of the search system 1 of the present embodiment. As shown in Fig. 2, the search device 2 includes a communication unit 10, a storage unit 20, an analysis unit 30, a first switching unit 32, and a compiling unit 34.
The communication unit 10 communicates with the client device 5 and the management devices 3 via the network 4. It can be realized, for example, by an existing communication device such as a communication interface, or by an existing control device such as a CPU (Central Processing Unit). The communication unit 10 includes a command reception unit 12, a request transmission unit 14, a result reception unit 16, and a compiled result transmission unit 18. Details of these units are described later.
The storage unit 20 stores the various programs executed by the search device 2 and the information used in the various processes performed by the search device 2. The storage unit 20 can be realized by an existing storage device that stores data magnetically, optically, or electrically, such as an HDD (Hard Disk Drive), an SSD (Solid State Drive), a memory card, an optical disc, a ROM (Read Only Memory), or a RAM (Random Access Memory). The storage unit 20 includes a result storage unit 22. Details of the result storage unit 22 are described later.
The command reception unit 12 receives a search command from the client device 5. The search command is a command whose search target is all the search target data (the whole set) managed by the search system 1. The command reception unit 12 also receives from the client device 5 a transmission command, which instructs transmission of the compiled result produced by the compiling unit 34 described later.
The analysis unit 30 analyzes the search command received by the command reception unit 12 and generates a search request for each management device 3. That is, the analysis unit 30 divides the search command into search requests for the search target data (subsets) managed by the individual management devices 3. For example, suppose the analysis of the search command shows that all the search target data stored in the database named "patent" is needed. In this case, the analysis unit 30 generates the search expression db("patent")/patent as the search request for the management device 3 that manages the database named "patent". In this example, the root folder (root directory) of the database named "patent" is patent.
The analysis unit 30 also generates compiling rules for compiling the search results for the generated search requests.
The result storage unit 22 temporarily stores the search results for the search requests generated by the analysis unit 30. In the present embodiment, it is realized by a RAM.
The first switching unit 32 switches between a first permitted state, in which requesting search results is permitted, and a first prohibited state, in which requesting search results is prohibited, according to the amount of search result data stored in the result storage unit 22.
Specifically, when the device is in the first permitted state and the amount of search result data stored in the result storage unit 22 is equal to or greater than a first threshold, the first switching unit 32 switches from the first permitted state to the first prohibited state. When the device is in the first prohibited state and the amount of search result data stored in the result storage unit 22 is equal to or less than a second threshold, which is smaller than the first threshold, the first switching unit 32 switches from the first prohibited state to the first permitted state.
In the first permitted state, the first switching unit 32 also refers to the amount of search result data stored in the result storage unit 22 and determines the request size of the search results to be requested by the request transmission unit 14 described later. Specifically, the first switching unit 32 determines the request size from the free capacity obtained by subtracting the amount of search result data stored in the result storage unit 22 from the first threshold. For example, when the first threshold is 1 GB and the amount of search result data stored in the result storage unit 22 is 0 B, the first switching unit 32 sets the request size for each of the 100 management devices 3 to 10 MB.
In the present embodiment, a flag or the like stored in the storage unit 20 records whether the device is in the first permitted state or the first prohibited state, and the first switching unit 32 switches between the two states by changing this flag. The first threshold and the second threshold are stored in advance in the storage unit 20 or the like.
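The 1 GB / 100 devices / 10 MB example can be reproduced with a short Python sketch. It assumes that the free capacity is divided evenly among the management devices that are still request targets and that decimal units are used (1 GB = 10^9 bytes); the text only states that the free capacity is referred to, so the even split and the function name `request_size` are illustrative assumptions.

```python
MB = 10**6
GB = 10**9


def request_size(first_threshold: int, stored: int, active_devices: int) -> int:
    """Return the number of bytes to request from each management device.

    Hypothetical helper: splits the free capacity (first threshold minus
    the amount already stored) evenly among the active devices.
    """
    if active_devices <= 0:
        return 0  # no management device remains as a request target
    free = max(first_threshold - stored, 0)
    return free // active_devices


# Example from the text: 1 GB threshold, empty result storage, 100 devices.
size = request_size(1 * GB, 0, 100)
print(size // MB, "MB per device")
```

Under the same assumption, the per-device request size shrinks as the result storage fills up and reaches zero once the stored amount reaches the first threshold.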
The request transmission unit 14 transmits the search requests generated by the analysis unit 30 to the respective management devices 3. In the first permitted state, the request transmission unit 14 also sequentially transmits to each management device 3 a result request that requests search results of a prescribed size. As described above, the prescribed size (request size) is determined by the first switching unit 32.
The result reception unit 16 sequentially receives search results of the prescribed size from each management device 3 and stores them in the result storage unit 22. When a management device 3 has transmitted all of its search results, the result reception unit 16 receives from that management device 3 a transmission-end message indicating that all search results have been transmitted. When the result reception unit 16 receives a transmission-end message, the first switching unit 32 excludes the management device 3 that sent it from the request targets for search results.
The compiling unit 34 compiles the search results stored in the result storage unit 22 to generate a compiled result, and deletes from the result storage unit 22 the search results that have been compiled. Specifically, each time the command reception unit 12 receives a transmission command, the compiling unit 34 compiles the search results according to the compiling rules generated by the analysis unit 30.
The compiled result transmission unit 18 transmits the compiled result generated by the compiling unit 34 to the client device 5.
The analysis unit 30, the first switching unit 32, and the compiling unit 34 can be realized, for example, by an existing control device. In the present embodiment, the result reception unit 16 and the compiling unit 34 are configured to operate in parallel.
Fig. 3 is a state transition diagram showing an example of the state transitions of the search device 2 of the present embodiment. The example in Fig. 3 shows the transitions when the first threshold is 1 GB and the second threshold is 700 MB.
First, when the search device 2 starts processing, no search results are stored in the result storage unit 22. The first switching unit 32 therefore determines that the amount of search result data stored in the result storage unit 22 is less than 1 GB and sets the first permitted state (see arrow 40).
In the first permitted state, requesting search results is permitted, so the request transmission unit 14 sequentially transmits result requests to each management device 3. In response, the result reception unit 16 sequentially receives search results from each management device 3 and stores them in the result storage unit 22. Meanwhile, in response to transmission commands from the client device 5, the compiling unit 34 compiles the search results stored in the result storage unit 22 to generate a compiled result, and deletes the compiled search results from the result storage unit 22. However, transmission commands from the client device 5 do not necessarily arrive in succession. Therefore, in the first permitted state, more search results tend to be newly stored in the result storage unit 22 than are deleted from it, so the amount of search result data stored in the result storage unit 22 tends to increase.
While the first switching unit 32 determines that the amount of search result data stored in the result storage unit 22 is less than 1 GB, the first permitted state continues (see arrow 41). When the first switching unit 32 determines that the amount is 1 GB or more, it switches from the first permitted state to the first prohibited state (see arrow 42).
In the first prohibited state, requesting search results is prohibited, so the request transmission unit 14 stops transmitting result requests to the management devices 3. As a result, the management devices 3 stop transmitting search results, and the result reception unit 16 stops storing search results in the result storage unit 22. Meanwhile, even in the first prohibited state, the compiling unit 34 compiles the search results stored in the result storage unit 22 in response to transmission commands from the client device 5, generates a compiled result, and deletes the compiled search results from the result storage unit 22. Therefore, in the first prohibited state, the amount of search result data stored in the result storage unit 22 tends to decrease.
While the first switching unit 32 determines that the amount of search result data stored in the result storage unit 22 exceeds 700 MB, the first prohibited state continues (see arrow 43). When the first switching unit 32 determines that the amount is 700 MB or less, it switches from the first prohibited state to the first permitted state (see arrow 44).
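The transitions of Fig. 3 amount to a two-threshold (hysteresis) switch. The following Python sketch illustrates that behavior under stated assumptions: sizes are tracked in megabytes, 1 GB is taken as 1000 MB, and the class name `HysteresisSwitch` and its `update` method are hypothetical names, not terms from the patent.

```python
from dataclasses import dataclass


@dataclass
class HysteresisSwitch:
    """Two-threshold switch modeled on the first switching unit (a sketch)."""

    upper: int              # first threshold: prohibit at or above this amount
    lower: int              # second threshold: permit again at or below this
    permitted: bool = True  # processing starts in the permitted state

    def __post_init__(self) -> None:
        if self.lower >= self.upper:
            raise ValueError("lower threshold must be smaller than upper")

    def update(self, stored: int) -> bool:
        """Re-evaluate the state for the current stored amount."""
        if self.permitted and stored >= self.upper:
            self.permitted = False   # corresponds to arrow 42
        elif not self.permitted and stored <= self.lower:
            self.permitted = True    # corresponds to arrow 44
        return self.permitted


switch = HysteresisSwitch(upper=1000, lower=700)  # values in MB
for stored_mb in (0, 999, 1000, 800, 701, 700):
    print(stored_mb, switch.update(stored_mb))
```

Because the two thresholds differ, the state does not flip back and forth when the stored amount hovers around a single value; requests resume only after the compiling unit has freed a meaningful amount of space.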
Returning to Fig. 2, the management device 3 includes a communication unit 50, a storage unit 60, a search unit 70, a second switching unit 72, and an acquisition unit 74.
The communication unit 50 communicates with the search device 2 via the network 4 and can be realized, for example, by an existing communication device or an existing control device. The communication unit 50 includes a request reception unit 52 and a search result transmission unit 54. Details of these units are described later.
The storage unit 60 stores the various programs executed by the management device 3 and the information used in the various processes performed by the management device 3. As in the search device 2, it can be realized by an existing storage device. The storage unit 60 includes a data storage unit 62 and a temporary storage unit 64. Details of these units are described later.
The request reception unit 52 receives the search request from the search device 2. The request reception unit 52 also sequentially receives result requests from the search device 2.
The data storage unit 62 stores the search target data and can be realized, for example, by an HDD. The data storage unit 62 functions as a database that manages structured documents such as XML (Extensible Markup Language) files, an RDB (relational database), or the like.
The search unit 70 sequentially retrieves, from the data storage unit 62, addresses indicating the storage locations of the search target data that satisfies the search request. For example, when the request reception unit 52 receives the search request db("patent")/patent, the search unit 70 sequentially retrieves from the data storage unit 62 the addresses of the search target data that satisfies this search request.
The temporary storage unit 64 temporarily stores the search target data acquired from the data storage unit 62 by the acquisition unit 74 described later. In the present embodiment, it can be realized by a RAM.
The second switching unit 72 switches between a second permitted state, in which acquiring search target data is permitted, and a second prohibited state, in which acquiring search target data is prohibited, according to the amount of search target data stored in the temporary storage unit 64.
Specifically, when the device is in the second permitted state and the amount of search target data stored in the temporary storage unit 64 is equal to or greater than a third threshold, the second switching unit 72 switches from the second permitted state to the second prohibited state. When the device is in the second prohibited state and the amount of search target data stored in the temporary storage unit 64 is equal to or less than a fourth threshold, which is smaller than the third threshold, the second switching unit 72 switches from the second prohibited state to the second permitted state. In the present embodiment, the third threshold and the fourth threshold are set in advance.
In the present embodiment, a flag or the like stored in the storage unit 60 records whether the device is in the second permitted state or the second prohibited state, and the second switching unit 72 switches between the two states by changing this flag. The third threshold and the fourth threshold are stored in advance in the storage unit 60.
In the second permitted state, the acquisition unit 74 acquires from the data storage unit 62 the search target data indicated by the addresses retrieved by the search unit 70 and stores it in the temporary storage unit 64. The acquisition unit 74 acquires search target data only in the second permitted state, whereas the search unit 70 continues retrieving addresses until it finishes, regardless of whether the device is in the second permitted state or the second prohibited state. The size of the search target data is hundreds to millions of times larger than the size of an address. In particular, when the data storage unit 62 is a database that manages structured documents, the search target data tends to be far larger than the addresses.
Each time the request reception unit 52 receives a result request, the search result transmission unit 54 takes from the temporary storage unit 64 search target data of the prescribed size specified by the result request, transmits it to the search device 2 as the search result, and deletes the transmitted search target data from the temporary storage unit 64. For example, when the request reception unit 52 receives a result request for 10 MB of search results, the search result transmission unit 54 takes 10 MB of search target data from the temporary storage unit 64, transmits it to the search device 2, and deletes the transmitted search target data from the temporary storage unit 64. When the search result transmission unit 54 has transmitted all the search target data acquired by the acquisition unit 74, it transmits to the search device 2 a transmission-end message indicating that all search results have been transmitted.
The search unit 70, the second switching unit 72, and the acquisition unit 74 can be realized, for example, by an existing control device. In the present embodiment, the search unit 70, the acquisition unit 74, and the search result transmission unit 54 are configured to operate in parallel.
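The handling of a single result request (take the requested amount, send it, delete it, and report when everything has been sent) can be sketched in Python as follows. The sketch assumes an in-memory byte buffer as the temporary storage; `TemporaryStore`, `handle_result_request`, and the `acquisition_done` flag are hypothetical names introduced for illustration, and actual network transmission is omitted.

```python
class TemporaryStore:
    """In-memory stand-in for the temporary storage unit (an assumption)."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def put(self, data: bytes) -> None:
        """Append search target data acquired from the data storage."""
        self._buf += data

    def take(self, size: int) -> bytes:
        """Remove and return up to `size` bytes from the front of the buffer."""
        chunk = bytes(self._buf[:size])
        del self._buf[:size]  # delete what is about to be transmitted
        return chunk

    def __len__(self) -> int:
        return len(self._buf)


def handle_result_request(store: TemporaryStore, size: int,
                          acquisition_done: bool) -> tuple[bytes, bool]:
    """Serve one result request.

    Returns the chunk to transmit and a flag that is True when a
    transmission-end message should follow (nothing left to send).
    """
    chunk = store.take(size)
    finished = acquisition_done and len(store) == 0
    return chunk, finished


store = TemporaryStore()
store.put(b"abcdefghij")
print(handle_result_request(store, 4, acquisition_done=True))
```

Deleting each chunk as soon as it is taken is what keeps the temporary storage small: the buffer only ever holds data that has been acquired but not yet requested.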
Fig. 4 is a state transition diagram showing an example of the state transitions of the management device 3 of the present embodiment. The example in Fig. 4 shows the transitions when the third threshold is 100 MB and the fourth threshold is 50 MB.
First, when the management device 3 starts processing, no search target data is stored in the temporary storage unit 64. The second switching unit 72 therefore determines that the amount of search target data stored in the temporary storage unit 64 is less than 100 MB and sets the second permitted state (see arrow 80).
In the second permitted state, acquiring search target data is permitted, so the acquisition unit 74 sequentially acquires from the data storage unit 62 the search target data indicated by the addresses retrieved by the search unit 70 and stores it in the temporary storage unit 64. Meanwhile, each time the request reception unit 52 receives a result request, the search result transmission unit 54 takes search target data of the prescribed size from the temporary storage unit 64, transmits it to the search device 2 as the search result, and deletes the transmitted search target data from the temporary storage unit 64. However, when the search device 2 is in the first prohibited state, it does not transmit result requests, so the request reception unit 52 receives none. Therefore, in the second permitted state, more search target data tends to be newly stored in the temporary storage unit 64 than is deleted from it, so the amount of search target data stored in the temporary storage unit 64 tends to increase.
While the second switching unit 72 determines that the amount of search target data stored in the temporary storage unit 64 is less than 100 MB, the second permitted state continues (see arrow 81). When the second switching unit 72 determines that the amount is 100 MB or more, it switches from the second permitted state to the second prohibited state (see arrow 82).
In the second prohibited state, acquiring search target data is prohibited, so the acquisition unit 74 stops acquiring search target data from the data storage unit 62 and storing it in the temporary storage unit 64. Meanwhile, even in the second prohibited state, when the request reception unit 52 receives a result request, the search result transmission unit 54 takes search target data of the prescribed size from the temporary storage unit 64, transmits it to the search device 2 as the search result, and deletes the transmitted search target data from the temporary storage unit 64. Therefore, in the second prohibited state, the amount of search target data stored in the temporary storage unit 64 tends to decrease.
While the second switching unit 72 determines that the amount of search target data stored in the temporary storage unit 64 exceeds 50 MB, the second prohibited state continues (see arrow 83). When the second switching unit 72 determines that the amount is 50 MB or less, it switches from the second prohibited state to the second permitted state (see arrow 84).
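How the buffered amount on a management device rises and falls under these rules can be seen in a small tick-based simulation. This Python sketch is illustrative only: it assumes discrete time steps, a fixed acquisition amount per step, and sizes in megabytes, none of which are specified in the text; the function name `simulate` is hypothetical.

```python
def simulate(fetch_per_tick: int, requests: list[int],
             third_threshold: int = 100, fourth_threshold: int = 50):
    """Simulate the temporary storage of one management device.

    `requests[t]` is the size (MB) of the result request arriving at
    tick t, or 0 when no request arrives (for example, while the search
    device is in its first prohibited state).
    Returns a list of (stored_mb, permitted) pairs, one per tick.
    """
    stored = 0
    permitted = True  # second permitted state at start
    history = []
    for requested in requests:
        # Acquisition unit: fetch only in the second permitted state.
        if permitted:
            stored += fetch_per_tick
        # Search result transmission unit: send and delete what was asked.
        stored -= min(requested, stored)
        # Second switching unit: re-evaluate the state.
        if permitted and stored >= third_threshold:
            permitted = False
        elif not permitted and stored <= fourth_threshold:
            permitted = True
        history.append((stored, permitted))
    return history


# No requests for three ticks, then 30 MB requested on every tick.
for step in simulate(40, [0, 0, 0, 30, 30, 30, 30]):
    print(step)
```

In this run the buffer grows while no requests arrive, acquisition pauses once 100 MB is reached, and it resumes only after transmission has drained the buffer to 50 MB or less.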
Fig. 5 is a flowchart showing an example of the processing sequence carried out in indexing unit 2 of the present embodiment.
In step S10, instruction reception portion 12 receives a search instruction from client terminal device 5.
In step S12, analysis portion 30 analyzes the search instruction received by instruction reception portion 12 and creates a retrieval request for each management device 3.
In step S14, request sending part 14 sends the retrieval requests created by analysis portion 30 to the respective management devices 3.
In step S16, first switching part 32 checks whether the current state is the first License Status or the first illegal state. In the first License Status ("Yes" in step S16), processing proceeds to step S18; in the first illegal state ("No" in step S16), processing proceeds to step S24.
In step S18, first switching part 32 refers to the transmission ending messages received by result acceptance division 16 (see step S24) and checks whether there is any management device 3 from which results for retrieval should still be requested. If there is such a management device 3 ("Yes" in step S18), processing proceeds to step S20; if there is none ("No" in step S18), processing proceeds to step S24.
In step S20, first switching part 32 refers to the data volume of the results for retrieval stored in result storage part 22 and determines the request size of the result for retrieval to be requested from each management device 3 that is a request target. Specifically, first switching part 32 determines the request size from the number of management devices 3 that have not yet finished sending their results for retrieval and from the data volume of the results for retrieval stored in result storage part 22.
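Step S20 names the two inputs of the request size (the number of management devices that have not finished sending, and the volume already stored) but this passage does not give a formula. The sketch below shows one plausible rule, assumed here for illustration only: divide the remaining headroom below the first threshold equally among the pending management devices. The function name `request_size` and its parameters are hypothetical.

```python
def request_size(first_threshold: int, stored_bytes: int, pending_devices: int) -> int:
    """Bytes to request from each pending management device (assumed rule).

    The headroom left below the first threshold is split equally, so that the
    replies to one round of result requests together stay within that headroom.
    """
    if pending_devices <= 0:
        return 0  # every management device has sent its transmission ending message
    headroom = max(first_threshold - stored_bytes, 0)
    return headroom // pending_devices


MB = 1024 * 1024
# 1GB threshold, 700MB already stored, 100 management devices still sending
print(request_size(1024 * MB, 700 * MB, 100) / MB)  # 3.24
```

Under this assumed rule the request size shrinks as the result storage part fills up and grows as management devices finish sending.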
In step S22, request sending part 14 sends a result request to each management device 3.
In step S24, result acceptance division 16 checks whether a result for retrieval has been received from a management device 3. If one has been received ("Yes" in step S24), processing proceeds to step S26; if not ("No" in step S24), processing proceeds to step S33. When a management device 3 has sent all of its results for retrieval, result acceptance division 16 receives from that management device 3 a transmission ending message indicating that all results for retrieval have been sent.
In step S26, result acceptance division 16 stores the result for retrieval of the given size received from management device 3 in result storage part 22.
In step S28, first switching part 32 checks whether the current state is the first License Status or the first illegal state. In the first License Status ("Yes" in step S28), processing proceeds to step S30; in the first illegal state ("No" in step S28), processing proceeds to step S34.
In step S30, first switching part 32 checks whether the data volume of the results for retrieval stored in result storage part 22 is equal to or greater than the first threshold. If it is ("Yes" in step S30), processing proceeds to step S32; if it is not ("No" in step S30), processing proceeds to step S34.
In step S32, first switching part 32 switches from the first License Status to the first illegal state.
In step S33, instruction reception portion 12 checks whether a transmission instruction ordering the sending of a concentrated result has been received from client terminal device 5. If it has been received ("Yes" in step S33), processing proceeds to step S34; if not ("No" in step S33), processing returns to step S16.
In step S34, concentrate portion 34 concentrates the results for retrieval stored in result storage part 22 according to the concentration rule created by analysis portion 30, and generates a concentrated result.
In step S35, concentrate result sending part 18 sends the concentrated result generated by concentrate portion 34 to client terminal device 5.
In step S36, concentrate portion 34 deletes the results for retrieval that have been concentrated from result storage part 22.
In step S38, first switching part 32 checks whether the current state is the first illegal state or the first License Status. In the first illegal state ("Yes" in step S38), processing proceeds to step S40; in the first License Status ("No" in step S38), processing proceeds to step S44.
In step S40, first switching part 32 checks whether the data volume of the results for retrieval stored in result storage part 22 is equal to or less than the second threshold. If it is ("Yes" in step S40), processing proceeds to step S42; if it is not ("No" in step S40), processing proceeds to step S44.
In step S42, first switching part 32 switches from the first illegal state to the first License Status.
In step S44, indexing unit 2 checks whether all concentrated results have been sent. If all have been sent ("Yes" in step S44), processing ends; if not ("No" in step S44), processing returns to step S16.
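The loop of Fig. 5 from step S16 onward can be sketched as a single-threaded simulation. This is an illustration under simplifying assumptions, not the patented implementation: data volume is counted in items rather than bytes, each management device is modelled as an in-memory queue, "concentration" is reduced to taking a batch of at most `batch_size` items, and the client is assumed to ask for a concentrated result on every pass. The function name and all parameters are hypothetical.

```python
from collections import deque


def run_indexing_unit(device_results, first_threshold, second_threshold,
                      request_size, batch_size):
    """Simulate the request/concentrate loop; returns (delivered batches, peak size)."""
    pending = [deque(r) for r in device_results]  # unsent results per management device
    store = []        # stands in for result storage part 22
    permitted = True  # True: first License Status, False: first illegal state
    delivered = []    # concentrated results handed to the client terminal device
    peak = 0          # largest number of items held in the store at once

    while store or any(pending):
        if permitted:                                  # S16
            for dev in pending:                        # S18 to S26: request and store
                for _ in range(min(request_size, len(dev))):
                    store.append(dev.popleft())
            peak = max(peak, len(store))
            if len(store) >= first_threshold:          # S28, S30
                permitted = False                      # S32
        # S34 to S36: concentrate a batch, send it, delete it from the store
        batch, store = store[:batch_size], store[batch_size:]
        delivered.append(batch)
        if not permitted and len(store) <= second_threshold:  # S38, S40
            permitted = True                           # S42
    return delivered, peak


results = [[f"dev{d}-item{i}" for i in range(10)] for d in range(3)]
delivered, peak = run_indexing_unit(results, first_threshold=8, second_threshold=4,
                                    request_size=2, batch_size=3)
print(sum(len(b) for b in delivered), peak)
```

In this sketch the store never holds more than `first_threshold - 1 + len(device_results) * request_size` items, however many results the management devices hold in total.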
Fig. 6 is a flowchart showing an example of the processing sequence carried out in management device 3 of the present embodiment.
In step S60, request receiving portion 52 receives a retrieval request from indexing unit 2.
In step S62, search part 70 successively retrieves from data store 62 the addresses of the searching object data that satisfy the retrieval request.
In step S64, obtaining section 74 checks whether the current state is the second License Status or the second illegal state. In the second License Status ("Yes" in step S64), processing proceeds to step S66; in the second illegal state ("No" in step S64), processing proceeds to step S72.
In step S66, obtaining section 74 obtains from data store 62 the searching object data indicated by the address retrieved by search part 70 and stores it in temporary storage part 64.
In step S68, second switching part 72 checks whether the data volume of the searching object data stored in temporary storage part 64 is equal to or greater than the third threshold. If it is ("Yes" in step S68), processing proceeds to step S70; if it is not ("No" in step S68), processing proceeds to step S72.
In step S70, second switching part 72 switches from the second License Status to the second illegal state.
In step S72, request receiving portion 52 checks whether a result request has been received from indexing unit 2. If one has been received ("Yes" in step S72), processing proceeds to step S74; if not ("No" in step S72), processing proceeds to step S84.
In step S74, since request receiving portion 52 has received a result request, result for retrieval sending part 54 obtains searching object data from temporary storage part 64, up to the size specified by the result request, and sends the obtained data to indexing unit 2 as a result for retrieval. When result for retrieval sending part 54 has sent all the searching object data obtained by obtaining section 74, it also sends indexing unit 2 a transmission ending message indicating that all results for retrieval have been sent.
In step S76, result for retrieval sending part 54 deletes the sent searching object data from temporary storage part 64.
In step S78, second switching part 72 checks whether the current state is the second illegal state or the second License Status. In the second illegal state ("Yes" in step S78), processing proceeds to step S80; in the second License Status ("No" in step S78), processing proceeds to step S84.
In step S80, second switching part 72 checks whether the data volume of the searching object data stored in temporary storage part 64 is equal to or less than the fourth threshold. If it is ("Yes" in step S80), processing proceeds to step S82; if it is not ("No" in step S80), processing proceeds to step S84.
In step S82, second switching part 72 switches from the second illegal state to the second License Status.
In step S84, management device 3 checks whether all results for retrieval (searching object data) have been sent. If all have been sent ("Yes" in step S84), processing ends; if not ("No" in step S84), processing returns to step S62.
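The management-device side of Fig. 6 can be sketched in the same spirit. The class below is a hypothetical illustration, not the patented implementation: the data store is an in-memory list of byte strings, an address is simply a list index, and each record is assumed to be no larger than the requested size.

```python
from collections import deque


class ManagementDeviceSketch:
    """Hypothetical stand-in for management device 3."""

    def __init__(self, records, third_threshold, fourth_threshold):
        self.data_store = list(records)                     # data store 62
        self.addresses = iter(range(len(self.data_store)))  # addresses found by the search part (S62)
        self.temp = deque()                                 # temporary storage part 64
        self.temp_bytes = 0
        self.peak_bytes = 0
        self.third = third_threshold
        self.fourth = fourth_threshold
        self.permitted = True                               # True: second License Status
        self.exhausted = False                              # True once every address was used

    def acquire_one(self):
        """S64 to S70: obtain one record if permitted; returns True if one was stored."""
        if not self.permitted or self.exhausted:
            return False
        addr = next(self.addresses, None)
        if addr is None:
            self.exhausted = True
            return False
        record = self.data_store[addr]
        self.temp.append(record)
        self.temp_bytes += len(record)
        self.peak_bytes = max(self.peak_bytes, self.temp_bytes)
        if self.temp_bytes >= self.third:
            self.permitted = False                          # S70
        return True

    def handle_result_request(self, max_bytes):
        """S74 to S82: send up to max_bytes; returns (records, finished flag)."""
        sent, size = [], 0
        while self.temp and size + len(self.temp[0]) <= max_bytes:
            record = self.temp.popleft()                    # S74 and S76: send, then delete
            size += len(record)
            sent.append(record)
        self.temp_bytes -= size
        if not self.permitted and self.temp_bytes <= self.fourth:
            self.permitted = True                           # S82
        return sent, self.exhausted and not self.temp


device = ManagementDeviceSketch([b"x" * 10] * 20, third_threshold=50, fourth_threshold=20)
received, finished = [], False
while not finished:
    while device.acquire_one():
        pass
    chunk, finished = device.handle_result_request(30)
    received.extend(chunk)
print(len(received), device.peak_bytes)  # 20 50
```

The `finished` flag plays the role of the transmission ending message: it becomes true only after every address has been consumed and the temporary storage has been drained.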
In addition, the indexing unit and each management device of the present embodiment have a hardware configuration that uses an ordinary computer, comprising a control device such as a CPU, storage devices such as ROM and RAM, an external storage device such as an HDD or a removable drive, a display device such as a display, and input devices such as a keyboard and a mouse.
As described above, the indexing unit of the present embodiment requests results for retrieval from each management device so that the data volume of the results for retrieval stored in the result storage part corresponds to the memory capacity of the result storage part. Furthermore, when the client terminal device orders it to obtain a concentrated result, the indexing unit of the present embodiment does not request results for retrieval from the management devices; instead, it concentrates the results for retrieval already stored in the result storage part and returns the concentrated result to the client terminal device.
Therefore, with the indexing unit of the present embodiment, even when a large number of results for retrieval are retrieved, the results can be processed successively without causing a buffer overflow or the like, and the time the client terminal device needs to obtain the concentrated result can be shortened without increasing memory capacity.
For example, suppose the indexing unit of the present embodiment concentrates the results for retrieval of 100 management devices and each management device sends 100GB of results for retrieval, for a total of 10TB. Even in this case, the indexing unit requests results for retrieval from each management device, successively concentrates the results stored in the result storage part, and returns them to the client terminal device, so that the data volume of the results for retrieval stored in the result storage part stays within a range of roughly 700MB to 1GB. Therefore, even when 10TB of results for retrieval are retrieved, the indexing unit of the present embodiment can process them successively and obtain the above effect.
In addition, each management device of the present embodiment obtains searching object data so that the data volume of the searching object data stored in the temporary storage part corresponds to the memory capacity of the temporary storage part. Furthermore, when the indexing unit requests a result for retrieval, each management device of the present embodiment sends the searching object data stored in the temporary storage part to the indexing unit.
Therefore, with each management device of the present embodiment, even when a large number of results are retrieved and a large amount of searching object data is obtained as results for retrieval, the obtained searching object data can be processed successively without causing a buffer overflow or the like. In addition, the time the indexing unit needs to obtain the searching object data can be shortened without increasing memory capacity.
For example, suppose the management device of the present embodiment manages 1,000,000 items of searching object data with an average size of 100KB. When the indexing unit issues a retrieval request that retrieves all the searching object data managed by the management device, the searching object data to be stored in the temporary storage part amounts to 100GB. Even in this case, the management device obtains searching object data and, in response to requests from the indexing unit, successively sends the searching object data stored in the temporary storage part to the indexing unit, so that the data volume of the searching object data stored in the temporary storage part stays within a range of roughly 50MB to 100MB. Therefore, even when 100GB of searching object data is obtained, the management device of the present embodiment can process the obtained data successively and obtain the above effect.
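The totals in the two worked examples can be checked with a few lines of arithmetic. The sketch assumes decimal units (1KB = 1000 bytes and so on), which the description does not specify, and takes the upper ends of the stated ranges (1GB and 100MB) as the peak buffer sizes.

```python
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12  # decimal units (assumption)

# Indexing unit example: 100 management devices, 100GB of results each
total_indexing = 100 * 100 * GB
# Management device example: 1,000,000 items, 100KB on average
total_management = 1_000_000 * 100 * KB

# How many times larger the total is than the largest buffered volume
ratio_indexing = total_indexing // (1 * GB)         # result storage part: about 1GB
ratio_management = total_management // (100 * MB)   # temporary storage part: about 100MB

print(total_indexing == 10 * TB, total_management == 100 * GB)  # True True
print(ratio_indexing, ratio_management)                         # 10000 1000
```

Under these assumptions the indexing unit buffers at most about 1/10,000 of the total result volume at any time, and each management device about 1/1,000.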
In addition, in the management device of the present embodiment, the search part, the obtaining section, and the result for retrieval sending part are configured to operate in parallel, which improves the processing speed of obtaining searching object data. In particular, the management device of the present embodiment does not control the parallel operation of the search part, the obtaining section, and the result for retrieval sending part as a whole; it controls only the obtaining of searching object data by the obtaining section, according to the data volume of the searching object data stored in the temporary storage part. The processing speed of obtaining searching object data can therefore be improved without complicated control.
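The parallel arrangement described above can be sketched with three threads, of which only the obtaining thread is gated by the buffer level. This is an illustrative sketch, not the patented implementation: data volume is counted in items rather than bytes, the sending thread drains the buffer without waiting for real result requests, and all names (`run_parallel`, `upper`, `lower`, `request_size`) are hypothetical.

```python
import queue
import threading


def run_parallel(records, upper, lower, request_size):
    """Run search, obtain, and send in parallel; returns (received items, peak buffer size)."""
    addresses = queue.Queue()   # search part -> obtaining section
    temp = []                   # stands in for the temporary storage part
    cond = threading.Condition()
    state = {"permitted": True, "done": False, "peak": 0}
    received = []

    def search():
        for addr in range(len(records)):
            addresses.put(addr)
        addresses.put(None)     # sentinel: no more addresses

    def obtain():
        while True:
            addr = addresses.get()
            with cond:
                if addr is None:
                    state["done"] = True
                    cond.notify_all()
                    return
                # Only this thread is gated by the buffer level.
                cond.wait_for(lambda: state["permitted"])
                temp.append(records[addr])
                state["peak"] = max(state["peak"], len(temp))
                if len(temp) >= upper:
                    state["permitted"] = False
                cond.notify_all()

    def send():
        while True:
            with cond:
                cond.wait_for(lambda: temp or state["done"])
                if not temp and state["done"]:
                    return
                received.extend(temp[:request_size])
                del temp[:request_size]
                if not state["permitted"] and len(temp) <= lower:
                    state["permitted"] = True
                cond.notify_all()

    threads = [threading.Thread(target=f) for f in (search, obtain, send)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return received, state["peak"]


items, peak = run_parallel(list(range(200)), upper=10, lower=5, request_size=3)
print(len(items), peak <= 10)  # 200 True
```

The search and send threads never check the permitted flag, which mirrors the point made above: a single gate on the obtaining step is enough to bound the buffer.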
In addition, the present invention is not limited to the above embodiment as it is; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the above embodiment. For example, some constituent elements may be deleted from all the constituent elements shown in the embodiment.
For example, the functions of the indexing unit and the management device of the present embodiment may each be realized by executing a program.
In this case, the programs executed by the indexing unit and the management device of the above embodiment are each stored on a computer-readable storage medium as a file in an installable or executable format and provided as a computer program product. Alternatively, the programs executed by the indexing unit and the management device of the above embodiment may be provided by being embedded in a ROM or the like in advance.
The programs executed by the indexing unit and the management device of the above embodiment have a modular structure for realizing each of the above parts on a computer. In terms of actual hardware, the CPU reads each program from the HDD or the like into RAM and executes it, whereby each of the above parts is realized on the computer.
Industrial applicability
As described above, the indexing unit and system according to the present invention are suitable for a distributed retrieval system that retrieves a large amount of searching object data.
Symbol description:
2 indexing unit
3 management device
5 client terminal device
12 instruction reception portion
14 request sending part
16 result acceptance division
18 concentrate result sending part
22 result storage part
30 analysis portion
32 first switching part
34 concentrate portion

Claims (5)

1. An indexing unit, characterized by comprising:
an instruction reception portion that receives a search instruction from a client terminal device;
an analysis portion that analyzes the search instruction and creates a retrieval request for each of a plurality of management devices, the plurality of management devices managing searching object data in a distributed manner;
a first switching part that, according to the data volume of the results for retrieval for the retrieval request stored in a result storage part, switches between a first License Status in which requesting the result for retrieval is permitted and a first illegal state in which requesting the result for retrieval is prohibited, the result storage part temporarily storing the result for retrieval;
a request sending part that sends the retrieval request to each of the management devices and, in the first License Status, successively sends to each of the management devices a result request requesting the result for retrieval of a given size;
a result acceptance division that successively receives the result for retrieval of the given size from each of the management devices and stores it in the result storage part;
a concentrate portion that concentrates the results for retrieval stored in the result storage part to generate a concentrated result, and deletes the results for retrieval that have been concentrated from the result storage part; and
a concentrate result sending part that sends the concentrated result to the client terminal device.
2. The indexing unit according to claim 1, characterized in that
the first switching part switches from the first License Status to the first illegal state when, in the first License Status, the data volume of the results for retrieval stored in the result storage part is equal to or greater than a first threshold, and switches from the first illegal state to the first License Status when, in the first illegal state, the data volume of the results for retrieval stored in the result storage part is equal to or less than a second threshold that is smaller than the first threshold.
3. The indexing unit according to claim 1, characterized in that
in the first License Status, the first switching part determines the given size with reference to the data volume of the results for retrieval stored in the result storage part.
4. A searching system comprising the indexing unit according to claim 1 and a plurality of the management devices connected to the indexing unit via a network, characterized in that
each of the management devices comprises:
a request receiving portion that receives the retrieval request from the indexing unit and successively receives the result request;
a search part that successively retrieves, from a data store storing the searching object data, addresses indicating the storage locations of the searching object data that satisfy the retrieval request;
a second switching part that, according to the data volume of the searching object data stored in a temporary storage part, switches between a second License Status in which obtaining the searching object data is permitted and a second illegal state in which obtaining the searching object data is prohibited, the temporary storage part temporarily storing the searching object data obtained from the data store;
an obtaining section that, in the second License Status, obtains from the data store the searching object data indicated by the retrieved address and stores it in the temporary storage part; and
a result for retrieval sending part that, each time the result request is received, obtains the searching object data of the given size from the temporary storage part, sends the obtained searching object data to the indexing unit as the result for retrieval, and deletes the sent searching object data from the temporary storage part.
5. The searching system according to claim 4, characterized in that
the second switching part switches from the second License Status to the second illegal state when, in the second License Status, the data volume of the searching object data stored in the temporary storage part is equal to or greater than a third threshold, and switches from the second illegal state to the second License Status when, in the second illegal state, the data volume of the searching object data stored in the temporary storage part is equal to or less than a fourth threshold that is smaller than the third threshold.
CN200980161042.0A 2009-09-29 2009-09-29 Search device and system Active CN102576360B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/JP2009/066959 WO2011039841A1 (en) 2009-09-29 2009-09-29 Search device and system

Publications (2)

Publication Number Publication Date
CN102576360A CN102576360A (en) 2012-07-11
CN102576360B true CN102576360B (en) 2015-04-01

Family

ID=43825698

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200980161042.0A Active CN102576360B (en) 2009-09-29 2009-09-29 Search device and system

Country Status (3)

Country Link
JP (1) JP5514220B2 (en)
CN (1) CN102576360B (en)
WO (1) WO2011039841A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI607331B (en) 2015-09-23 2017-12-01 財團法人工業技術研究院 Method and device for analyzing data
JP6720792B2 (en) * 2016-09-15 2020-07-08 セイコーエプソン株式会社 Device management device and device management program
JP6468268B2 (en) * 2016-09-23 2019-02-13 カシオ計算機株式会社 Information search device, information search system, information search method, and program

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101496012A (en) * 2006-07-26 2009-07-29 微软公司 Data processing over very large databases

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3004102B2 (en) * 1991-10-04 2000-01-31 株式会社東芝 Database processing unit
JP4021287B2 (en) * 2002-09-09 2007-12-12 日立ソフトウエアエンジニアリング株式会社 Database search program, database search method and database search device
WO2005106713A1 (en) * 2004-04-28 2005-11-10 Shinji Furusho Information processing method and information processing system

Also Published As

Publication number Publication date
WO2011039841A1 (en) 2011-04-07
CN102576360A (en) 2012-07-11
JPWO2011039841A1 (en) 2013-02-21
JP5514220B2 (en) 2014-06-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant