CN105468758A - Data retrieval method and device - Google Patents

Data retrieval method and device

Info

Publication number
CN105468758A
Authority
CN
China
Prior art keywords
index
data
retrieval
base
currently
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510857487.XA
Other languages
Chinese (zh)
Other versions
CN105468758B (en)
Inventor
虞航仲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Kingsoft Internet Security Software Co Ltd
Original Assignee
Beijing Kingsoft Internet Security Software Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kingsoft Internet Security Software Co Ltd
Priority to CN201510857487.XA
Publication of CN105468758A
Application granted
Publication of CN105468758B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/93 Document management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a data retrieval method and device. The method is applied to a data retrieval device and comprises: constructing two identical index libraries corresponding to basic data serving as the retrieval object; when auxiliary data serving as a retrieval object is obtained, determining a first index library to be reconstructed from the at least one index library on which data retrieval is not currently being performed; and reconstructing the first index library according to the retrieval object currently corresponding to the first index library and the auxiliary data obtained this time. The method further comprises: when a data retrieval request is obtained, determining a second index library to be utilized from the at least one index library on which a reconstruction operation is not currently being performed; and determining a retrieval result corresponding to the data retrieval request based on the second index library and its corresponding retrieval object. With this scheme, updating the index library with auxiliary data does not affect the response to data retrieval requests.

Description

Data retrieval method and device
Technical Field
The present invention relates to the field of data retrieval technologies, and in particular, to a data retrieval method and apparatus.
Background
To improve retrieval efficiency, a data retrieval device usually constructs an index library for the retrieval object and then performs data retrieval based on that index library. The index library usually consists of information extracted from the retrieval object and organized as index information. For example, for a document, the corresponding index information is the text content extracted from the document or the attribute parameters of the document, such as the author name or the document category.
When the data retrieval device is started, it constructs an index library corresponding to the basic data that currently exists as the retrieval object, and subsequent data retrieval is performed based on this index library. As data resources keep growing, auxiliary data needs to be added to supplement the existing retrieval object, and the corresponding index library then also needs to be updated. For example, for the data retrieval devices of Baidu, Google and the like, network resources increase daily, so auxiliary data must be added to supplement the retrieval object and the corresponding index library must be updated continuously. In the prior art, each time added auxiliary data is obtained, the data retrieval device reconstructs the currently existing index library based on the added auxiliary data and the currently existing retrieval object, and after the reconstruction is completed, subsequent data retrieval is performed based on the reconstructed index library.
Although the existing method keeps the index library consistent with the retrieval object, the old index library is reconstructed every time auxiliary data is obtained, so the response to data retrieval requests is affected while the update is in progress.
Disclosure of Invention
Embodiments of the present invention provide a data retrieval method and apparatus, so as to avoid influencing a response to a data retrieval request during updating an index database with auxiliary data. The specific technical scheme is as follows:
In a first aspect, an embodiment of the present invention provides a data retrieval method, which is applied to a data retrieval device, and the method includes:
constructing two identical index libraries corresponding to basic data serving as retrieval objects;
when auxiliary data serving as a retrieval object is obtained, determining a first index base to be reconstructed from at least one index base which is not currently subjected to data retrieval;
reconstructing the first index base according to the current corresponding retrieval object of the first index base and the auxiliary data obtained this time;
the method further comprises the following steps:
when a data retrieval request is obtained, determining a second index library to be utilized from at least one index library which is not currently executed with reconstruction operation;
and determining a retrieval result corresponding to the data retrieval request based on the second index library and the corresponding retrieval object.
Optionally, the determining a first index base to be reconstructed from at least one index base currently not subjected to data retrieval includes:
if the number of the index bases which are not subjected to data retrieval is two, randomly selecting one index base as a first index base to be reconstructed;
if the index base which is not subjected to data retrieval currently is one, taking the index base which is not subjected to data retrieval currently as a first index base to be reconstructed;
the determining a second index library to be utilized from at least one index library which is not currently executed with reconstruction operation includes:
if the number of the index libraries which are not currently subjected to the reconstruction operation is two, one index library is randomly selected as a second index library to be utilized;
and if the index base which is not currently subjected to the reconstruction operation is one, taking the index base which is not currently subjected to the reconstruction operation as a second index base to be utilized.
Optionally, the determining a first index base to be reconstructed from at least one index base currently not subjected to data retrieval includes:
if the number of the index libraries which are not subjected to data retrieval at present is two, taking the index library which is not subjected to reconstruction operation according to the auxiliary data obtained last time as a first index library to be reconstructed;
if the index base which is not subjected to data retrieval currently is one, taking the index base which is not subjected to data retrieval currently as a first index base to be reconstructed;
the determining a second index library to be utilized from at least one index library which is not currently executed with reconstruction operation includes:
if the number of the index libraries which are not subjected to the reconstruction operation is two, judging whether the two index libraries which are not subjected to the reconstruction operation are not reconstructed, if so, randomly selecting one index library as a second index library to be utilized, and otherwise, taking the index library which is subjected to the reconstruction operation according to the auxiliary data obtained last time as the second index library to be utilized;
and if the index base which is not currently subjected to the reconstruction operation is one, taking the index base which is not currently subjected to the reconstruction operation as a second index base to be utilized.
Optionally, the determining, based on the second index library and the corresponding retrieval object, a retrieval result corresponding to the data retrieval request includes:
and determining whether index information matched with the search word carried by the data search request exists in the second index database, and if so, determining a search result corresponding to the data search request from a search object currently corresponding to the second index database.
Optionally, the manner of obtaining the auxiliary data as the retrieval object includes:
acquiring auxiliary data serving as a retrieval object based on a mode of uploading data by a web crawler at regular time;
or,
auxiliary data as a retrieval object is obtained based on the manner of requesting data from the web crawler at regular time.
Optionally, the manner of obtaining the auxiliary data as the retrieval object includes:
auxiliary data to be retrieved is obtained based on a manual data importing method.
Optionally, a manner of reconstructing the first index library is the same as a manner of constructing two identical index libraries corresponding to the basic data as the retrieval object.
Optionally, both the manner of reconstructing the first index library and the manner of constructing two identical index libraries corresponding to the basic data as the retrieval object are: the inverted list manner.
In a second aspect, an embodiment of the present invention provides a data retrieval apparatus, including:
the index database construction module is used for constructing two same index databases corresponding to basic data serving as a retrieval object;
a to-be-reconstructed index base determination module configured to determine, when auxiliary data serving as a retrieval target is obtained, a first index base to be reconstructed from at least one index base on which data retrieval is not currently performed;
the index database reconstruction module is used for reconstructing the first index database according to the current corresponding retrieval object of the first index database and the auxiliary data obtained this time;
the device further comprises:
the to-be-utilized index base determining module is used for determining a second index base to be utilized from at least one index base which is not currently executed with reconstruction operation when the data retrieval request is obtained;
and the index result determining module is used for determining a retrieval result corresponding to the data retrieval request based on the second index library and the corresponding retrieval object.
Optionally, the module for determining the index library to be reconstructed includes:
a first to-be-reconstructed index base determination unit configured to, when the auxiliary data as the retrieval object is obtained, randomly select one index base as a first index base to be reconstructed if there are two index bases on which data retrieval is not currently performed; if the index base which is not subjected to data retrieval currently is one, taking the index base which is not subjected to data retrieval currently as a first index base to be reconstructed;
the to-be-utilized index base determination module comprises:
a first to-be-utilized index base determining unit configured to, when a data retrieval request is obtained, randomly select one index base as a second index base to be utilized if there are two index bases on which a reconstruction operation is not currently performed; and if the index base which is not currently subjected to the reconstruction operation is one, taking the index base which is not currently subjected to the reconstruction operation as a second index base to be utilized.
Optionally, the module for determining the index library to be reconstructed includes:
a second to-be-reconstructed index base determination unit configured to, when the auxiliary data as the retrieval target is obtained, if there are two index bases for which data retrieval is not currently performed, take an index base for which a reconstruction operation is not performed based on the auxiliary data obtained last time as a first index base to be reconstructed; if the index base which is not subjected to data retrieval currently is one, taking the index base which is not subjected to data retrieval currently as a first index base to be reconstructed;
the to-be-utilized index base determination module comprises:
a second to-be-utilized index base determining unit, configured to, when a data retrieval request is obtained, if there are two index bases on which a reconstruction operation is not currently performed, determine whether neither of the two index bases has been reconstructed; if so, randomly select one index base as a second index base to be utilized, and otherwise, select the index base that was reconstructed according to the auxiliary data obtained last time as the second index base to be utilized; and if the index base which is not currently subjected to the reconstruction operation is one, taking the index base which is not currently subjected to the reconstruction operation as a second index base to be utilized.
Optionally, the index result determining module includes:
and the index result determining unit is used for determining whether index information matched with the search word carried by the data search request exists in the second index library, and if so, determining a search result corresponding to the data search request from a search object currently corresponding to the second index library.
Optionally, the manner for obtaining the auxiliary data as the retrieval object by the to-be-reconstructed index base determining module includes:
acquiring auxiliary data serving as a retrieval object based on a mode of uploading data by a web crawler at regular time;
or,
auxiliary data as a retrieval object is obtained based on the manner of requesting data from the web crawler at regular time.
Optionally, the manner for obtaining the auxiliary data as the retrieval object by the to-be-reconstructed index base determining module includes:
auxiliary data to be retrieved is obtained based on a manual data importing method.
Optionally, the index library reconstructing module reconstructs the first index library in the same manner as that in which the index library constructing module constructs two identical index libraries corresponding to the basic data to be retrieved.
Optionally, the manner in which the index library reconstruction module reconstructs the first index library and the manner in which the index library construction module constructs two identical index libraries corresponding to the basic data as the retrieval object are both: the inverted list manner.
Compared with the prior art, in this scheme two index libraries are constructed in advance. After auxiliary data is obtained, the at least one index library on which data retrieval is not currently being performed is identified among the two index libraries, a first index library to be reconstructed is determined from it, and the first index library is reconstructed according to the retrieval object currently corresponding to the first index library and the auxiliary data obtained this time. Further, when a data retrieval request is obtained, the at least one index library on which a reconstruction operation is not currently being performed is identified among the two index libraries, a second index library to be utilized is determined from it, and a retrieval result corresponding to the data retrieval request is determined based on the second index library and its corresponding retrieval object. In this way, updating the index library with auxiliary data does not affect the response to data retrieval requests.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a flowchart of a data retrieval method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a data retrieval device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to avoid influencing the response to the data retrieval request in the process of updating the index database by using auxiliary data, the embodiment of the invention provides a data retrieval method and a data retrieval device.
First, a data retrieval method provided by an embodiment of the present invention is described below.
It should be noted that the data retrieval method provided by the embodiment of the present invention is applied to a data retrieval device.
As shown in fig. 1, a data retrieval method provided in an embodiment of the present invention may include the following steps:
S101, constructing two identical index libraries corresponding to basic data serving as retrieval objects;
When the data retrieval device is started, a predetermined construction mode can be adopted to construct two identical index libraries corresponding to the basic data serving as the retrieval object. It can be understood that the basic data can be obtained in any manner known in the prior art, for example by manual import or by crawling with a web crawler; both are reasonable.
The predetermined construction mode may be an existing one, for example the inverted list mode.
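As a minimal sketch of step S101, assuming the inverted list mode and a retrieval object made of plain-text documents, the two identical index libraries could be built as follows. The function name `build_inverted_index`, the whitespace tokenization, and the sample data are illustrative assumptions, not details given in the embodiment.

```python
import copy
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lower-cased, whitespace-separated term to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


# Hypothetical basic data: document id -> extracted text content.
basic_data = {1: "cloud note service", 2: "notebook rank"}

index_a = build_inverted_index(basic_data)
index_b = copy.deepcopy(index_a)  # second, independent but identical index library
```

The deep copy keeps the two libraries independent, so that reconstructing one later does not alter the other.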
It should be emphasized that the "base" in the "base data" and the "auxiliary" in the "auxiliary data" are only data to be searched that exist at different times, and do not have any limiting meaning; similarly, the "first" in the "first index repository" and the "second" in the "second index repository" are only used to distinguish the index repository to be reconstructed from the index repository to be utilized, and are not meant in any limiting sense.
S102, when auxiliary data serving as a retrieval object is obtained, determining a first index library to be reconstructed from at least one index library which is not subjected to data retrieval currently;
To enrich the retrieval object, auxiliary data serving as a retrieval object may be obtained multiple times. Each time such auxiliary data is obtained, a first index library to be reconstructed may be determined from the at least one index library on which data retrieval is not currently being performed, so that data retrieval and index library reconstruction do not affect each other. Because the first index library to be reconstructed is one on which data retrieval is not currently being performed, the other index library can serve data retrieval requests while the first index library is being reconstructed, which prevents the update with auxiliary data from affecting the response to data retrieval requests. Further, each index library may be provided with a state identifier that indicates whether the index library has ever been reconstructed and what its current state is. The current state is one of: being reconstructed, performing data retrieval, or idle, where idle means that the index library is neither being reconstructed nor performing data retrieval. For example: the state identifier 000 indicates that the index library has never been reconstructed and is currently idle; 010 indicates that it has never been reconstructed and is currently performing data retrieval; 100 indicates that it has been reconstructed and is currently idle; 110 indicates that it has been reconstructed and is currently performing data retrieval; 101 indicates that it has been reconstructed and is currently being reconstructed; and so on.
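The state identifier can be illustrated with a small decoder. The bit layout used here (first digit: reconstructed at least once; second digit: data retrieval in progress; third digit: reconstruction in progress) is inferred from the five example identifiers and is an assumption; the names `IndexState` and `decode_state` are hypothetical.

```python
from typing import NamedTuple


class IndexState(NamedTuple):
    ever_reconstructed: bool   # first digit
    serving_retrieval: bool    # second digit
    being_reconstructed: bool  # third digit

    @property
    def idle(self):
        """Idle: neither being reconstructed nor performing data retrieval."""
        return not (self.serving_retrieval or self.being_reconstructed)


def decode_state(identifier):
    """Decode a three-digit state identifier such as '110' (assumed bit layout)."""
    if len(identifier) != 3 or set(identifier) - {"0", "1"}:
        raise ValueError(f"invalid state identifier: {identifier!r}")
    return IndexState(*(digit == "1" for digit in identifier))
```

For instance, `decode_state("101")` describes an index library that has been reconstructed before and is being reconstructed again, so it must not be selected to answer a data retrieval request.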
Specifically, in an implementation manner, the determining a first index base to be reconstructed from at least one index base currently not subjected to data retrieval may include:
if the number of the index bases which are not subjected to data retrieval is two, randomly selecting one index base as a first index base to be reconstructed;
and if the index base which is not subjected to data retrieval currently is one, taking the index base which is not subjected to data retrieval currently as the first index base to be reconstructed.
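The first implementation manner can be sketched as below. The dictionary `serving`, which maps a hypothetical index library name to whether it is currently performing data retrieval, and the function name are assumptions made for illustration.

```python
import random


def pick_index_to_rebuild(serving):
    """Return the name of an index library that is not currently performing data retrieval.

    serving: dict mapping index library name -> True if it is performing data retrieval.
    """
    candidates = [name for name, busy in serving.items() if not busy]
    if not candidates:
        # The method assumes at least one such index library exists.
        raise RuntimeError("no index library is free of data retrieval")
    # Two candidates: random choice; one candidate: that one is returned.
    return random.choice(candidates)
```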
In a second implementation manner, in order to ensure that both index libraries have higher availability, the two index libraries are reconstructed in turn. Based on this idea, the determining a first index library to be reconstructed from at least one index library on which data retrieval is not currently being performed may include:
if the number of the index libraries which are not subjected to data retrieval at present is two, taking the index library which is not subjected to reconstruction operation according to the auxiliary data obtained last time as a first index library to be reconstructed;
if the index base which is not subjected to data retrieval currently is one, taking the index base which is not subjected to data retrieval currently as a first index base to be reconstructed;
it is to be understood that, in the second implementation, if there are two index libraries that are not currently performing data retrieval and both index libraries have not been performed with reconstruction, one index library may be randomly selected as the first index library to be reconstructed.
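A sketch of the second implementation manner follows, under the assumption that the device remembers which index library was reconstructed with the auxiliary data obtained last time. The parameter `last_rebuilt` (or `None` if neither library has been reconstructed yet) and the function name are hypothetical.

```python
import random


def pick_index_to_rebuild_in_turn(serving, last_rebuilt):
    """Alternate reconstruction between the two index libraries where possible.

    serving: dict mapping index library name -> True if it is performing data retrieval.
    last_rebuilt: name of the library reconstructed last time, or None if none was.
    """
    candidates = [name for name, busy in serving.items() if not busy]
    if not candidates:
        raise RuntimeError("no index library is free of data retrieval")
    if len(candidates) == 1:
        return candidates[0]
    if last_rebuilt is None:
        # Neither library has been reconstructed: they are identical, pick either.
        return random.choice(candidates)
    # Both are free: take the one NOT reconstructed with the previous auxiliary data.
    return next(name for name in candidates if name != last_rebuilt)
```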
In addition, there are various ways to obtain the auxiliary data as the retrieval target, and for clarity of the solution, several specific implementations are described below:
the first mode is to obtain auxiliary data as a retrieval object based on a mode of uploading data regularly by a web crawler.
In the implementation mode, a web crawler crawls auxiliary data serving as a retrieval object on a network according to a preset crawling task, and the crawled auxiliary data are uploaded to a data retrieval device at regular time; the method for crawling the auxiliary data on the network by the web crawler can adopt the method for crawling the network data by the web crawler in the prior art.
In the second mode, the auxiliary data to be retrieved is obtained based on the mode of requesting data from the web crawler at regular time.
In this implementation, the web crawler crawls auxiliary data on the network as a retrieval target according to a predetermined crawling task, caches the crawled network data, and the data retrieval device periodically requests the crawled network data from the web crawler, wherein the way in which the web crawler crawls the auxiliary data on the network may be the way in which the web crawler crawls the network data in the prior art.
In the third method, auxiliary data to be retrieved is obtained based on the manual data importing method.
In this implementation, the auxiliary data to be retrieved may be collected manually and then imported manually through a data import portal provided by the data retrieval device.
It should be emphasized that the above-described manner of obtaining auxiliary data as a retrieval target is merely an example, and should not be construed as limiting the embodiments of the present invention.
S103, reconstructing the first index base according to the current corresponding retrieval object of the first index base and the auxiliary data obtained this time;
After the first index library to be reconstructed is determined, the first index library can be reconstructed according to the retrieval object currently corresponding to the first index library and the auxiliary data obtained this time.
It should be noted that the manner of reconstructing the first index library is the same as the manner of constructing the two identical index libraries corresponding to the basic data serving as the retrieval object. For example, both may use the inverted list manner, although they are not limited to it.
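Step S103 can be sketched as merging the retrieval object currently corresponding to the first index library with the newly obtained auxiliary data and building a fresh inverted list over the result. The function names, the whitespace tokenization, and the sample documents are illustrative assumptions.

```python
from collections import defaultdict


def build_inverted_index(documents):
    """Map each lower-cased, whitespace-separated term to the ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def rebuild_index(current_objects, auxiliary_data):
    """Return (new retrieval object, new index library) without mutating the inputs."""
    merged = dict(current_objects)
    merged.update(auxiliary_data)
    # Same construction manner as for the initial index libraries.
    return merged, build_inverted_index(merged)


# Hypothetical data: the library currently covers one document; one more arrives.
objects_a = {1: "cloud note service"}
new_objects, new_index = rebuild_index(objects_a, {2: "notebook rank"})
```

Building the new index library as a separate object, rather than editing the old one in place, is a design choice of this sketch; the embodiment only requires that the library being reconstructed is not the one answering requests.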
S104, when a data retrieval request is obtained, determining a second index library to be utilized from at least one index library which is not currently executed with reconstruction operation;
when a data retrieval request is obtained, in order to respond to the data retrieval request, a second index library to be utilized can be determined from at least one index library which is not currently executed with reconstruction operation; the second index base is an index base which is not currently executed with reconstruction operation, so as to ensure that another index base can carry out reconstruction operation based on the obtained auxiliary data in the data retrieval process.
In one implementation, corresponding to the first implementation manner of determining the first index library to be reconstructed, the determining a second index library to be utilized from at least one index library on which the reconstruction operation is not currently being performed may include:
if the number of the index libraries which are not currently subjected to the reconstruction operation is two, one index library is randomly selected as a second index library to be utilized;
and if the index base which is not currently subjected to the reconstruction operation is one, taking the index base which is not currently subjected to the reconstruction operation as a second index base to be utilized.
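The corresponding selection for answering a request can be sketched in the same way. The dictionary `rebuilding`, which maps a hypothetical index library name to whether a reconstruction operation is currently being performed on it, is an assumption made for illustration.

```python
import random


def pick_index_to_serve(rebuilding):
    """Return the name of an index library that is not currently being reconstructed.

    rebuilding: dict mapping index library name -> True if it is being reconstructed.
    """
    candidates = [name for name, busy in rebuilding.items() if not busy]
    if not candidates:
        # The method assumes at least one such index library exists.
        raise RuntimeError("every index library is being reconstructed")
    # Two candidates: random choice; one candidate: that one is returned.
    return random.choice(candidates)
```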
In another implementation manner, corresponding to the second implementation manner of determining the first index library to be reconstructed, the determining a second index library to be utilized from at least one index library on which the reconstruction operation is not currently being performed may include:
if the number of the index libraries which are not subjected to the reconstruction operation is two, judging whether the two index libraries which are not subjected to the reconstruction operation are not reconstructed, if so, randomly selecting one index library as a second index library to be utilized, and otherwise, taking the index library which is subjected to the reconstruction operation according to the auxiliary data obtained last time as the second index library to be utilized;
and if the index base which is not currently subjected to the reconstruction operation is one, taking the index base which is not currently subjected to the reconstruction operation as a second index base to be utilized.
It can be understood that, in this implementation manner, if there are two index libraries on which the reconstruction operation is not currently being performed, it may be determined whether neither of them has ever been reconstructed. If so, the two index libraries are completely the same, and one of them may be randomly selected as the second index library to be utilized. Otherwise, so that the index libraries continue to be reconstructed in turn, the index library that was reconstructed according to the auxiliary data obtained last time may be used as the second index library to be utilized, which leaves the other index library free of data retrieval for the next reconstruction.
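The selection rule of this second implementation manner can be sketched as follows. The bookkeeping parameters `ever_rebuilt` and `last_rebuilt`, and the function name, are hypothetical; the embodiment does not prescribe how this history is stored.

```python
import random


def pick_index_to_serve_in_turn(rebuilding, ever_rebuilt, last_rebuilt):
    """Prefer the index library reconstructed with the most recent auxiliary data.

    rebuilding: dict name -> True if the library is currently being reconstructed.
    ever_rebuilt: dict name -> True if the library has been reconstructed at least once.
    last_rebuilt: name of the library reconstructed last time, or None.
    """
    candidates = [name for name, busy in rebuilding.items() if not busy]
    if not candidates:
        raise RuntimeError("every index library is being reconstructed")
    if len(candidates) == 1:
        return candidates[0]
    if not any(ever_rebuilt[name] for name in candidates):
        # Neither has been reconstructed: the two libraries are identical.
        return random.choice(candidates)
    # Serving from the last-reconstructed library keeps the other one free
    # of data retrieval, so it can be chosen for the next reconstruction.
    return last_rebuilt
```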
S105, based on the second index database and the corresponding retrieval object, determining the retrieval result corresponding to the data retrieval request.
After the second index library to be utilized is determined, a retrieval result corresponding to the data retrieval request may be determined based on the second index library and the corresponding retrieval object. Of course, after the search result corresponding to the data search request is determined, the search result may be output so that the sender of the data search request can know the search result.
Specifically, the determining a search result corresponding to the data search request based on the second index repository and the corresponding search object may include:
and determining whether index information matched with the search word carried by the data search request exists in the second index database, and if so, determining a search result corresponding to the data search request from a search object currently corresponding to the second index database.
It can be understood that the data retrieval request carries a search term, and the second index library may contain at least one piece of index information that matches the search term, or none at all. A search term matches a piece of index information when part of the search term is the same as the index information, when the search term is identical to the index information, and/or when the search term is contained in the index information, and so on. When the second index library contains no index information matching the search term carried by the data retrieval request, the retrieval result corresponding to the data retrieval request is determined to be empty. For example, assume the search term is "notebook". Where matching means that the search term is contained in the index information, the matching index information may include "notebook rank". Where matching means that part of the search term is the same as the index information, the matching index information may include "cloud note". Where matching means that the search term and the index information are identical, the matching index information may include "notebook".
Further, when exactly one piece of index information matching the search term exists in the second index library, the retrieval result corresponding to that index information, taken from the retrieval object corresponding to the second index library, is the retrieval result corresponding to the data retrieval request. When at least two pieces of index information matching the search term exist in the second index library, a preliminary retrieval result is determined for each of them from the retrieval object corresponding to the second index library, and the union of the preliminary retrieval results is taken as the retrieval result corresponding to the data retrieval request.
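The three matching cases described above can be sketched as plain string tests. This is an illustrative reading only: the function names are hypothetical, and the sample strings are English stand-ins rather than the original-language terms, so substring behavior on them is not guaranteed to mirror the original example.

```python
from typing import Iterable, List


def matches(search_term: str, index_info: str) -> bool:
    """Return True if the search term matches a piece of index information."""
    if not search_term or not index_info:
        return False
    if search_term == index_info:      # term and index information are identical
        return True
    if search_term in index_info:      # term is contained in the index information
        return True
    if index_info in search_term:      # part of the term equals the index information
        return True
    return False


def find_matching_entries(search_term: str, entries: Iterable[str]) -> List[str]:
    """Collect every piece of index information that matches the search term."""
    return [entry for entry in entries if matches(search_term, entry)]
```

An empty list from `find_matching_entries` corresponds to the case in which the retrieval result is determined to be empty.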
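A minimal sketch of combining the preliminary results follows, assuming (hypothetically) that an index library is a mapping from index information to the set of identifiers of the retrieval objects it points to. The matching test is the simple string rule described earlier in this section.

```python
from typing import Dict, Set


def retrieve(search_term: str, index_library: Dict[str, Set[int]]) -> Set[int]:
    """Union the preliminary results of every matching piece of index information."""
    result: Set[int] = set()
    for index_info, object_ids in index_library.items():
        matched = (
            search_term == index_info
            or search_term in index_info
            or index_info in search_term
        )
        if matched:
            # Each matching entry contributes one preliminary result set.
            result |= object_ids
    # An empty set means no index information matched the search term.
    return result
```

Using a set union means an object reached through several pieces of index information appears only once in the final result.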
It should be emphasized that, the specific implementation manner for determining the search result corresponding to the data search request from the search object currently corresponding to the second index repository may adopt the prior art, and is not described herein again.
For example, the data retrieval device constructs two identical index libraries corresponding to the basic data to be retrieved: index library A and index library B. When reconstruction is first performed according to auxiliary data, one index library is randomly selected for the reconstruction operation; thereafter, index library A and index library B are reconstructed in turn. When a data retrieval request is obtained and neither index library A nor index library B is currently undergoing a reconstruction operation, it is determined whether neither library has ever been reconstructed. If so, one index library is randomly selected to perform the data retrieval. Otherwise, the index library holding the latest index information, that is, the index library reconstructed according to the most recently obtained auxiliary data, is used as the index library on which the data retrieval is based.
Compared with the prior art, in this solution two index libraries are constructed in advance. After auxiliary data is obtained, at least one index library on which data retrieval is not currently being performed is determined from the two index libraries, the first index library to be reconstructed is determined from it, and the first index library is reconstructed according to its currently corresponding retrieval object and the auxiliary data obtained this time. Further, when a data retrieval request is obtained, at least one index library that is not currently undergoing a reconstruction operation is determined from the two index libraries, the second index library to be utilized is determined from it, and the retrieval result corresponding to the data retrieval request is determined based on the second index library and its corresponding retrieval object. In this way, updating an index library with auxiliary data does not affect the response to data retrieval requests.
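The rotation in this example can be traced with a small simulation. It is a sketch under simplifying assumptions that are not stated in the source: batches of auxiliary data arrive one at a time, each reconstruction finishes before the next batch arrives, and both libraries are free of retrieval work when a batch arrives. The function name and the tuple layout are hypothetical.

```python
import random
from typing import List, Optional, Tuple


def simulate_rotation(num_batches: int, seed: int = 0) -> List[Tuple[int, str, str, str]]:
    """Return (batch, rebuilt, serves_during_rebuild, serves_after_rebuild) rows."""
    rng = random.Random(seed)
    last_rebuilt: Optional[str] = None
    trace = []
    for batch in range(1, num_batches + 1):
        if last_rebuilt is None:
            # First reconstruction: the two libraries are identical, pick one at random.
            target = rng.choice(["A", "B"])
        else:
            # Afterwards the libraries take turns.
            target = "B" if last_rebuilt == "A" else "A"
        # While the target is rebuilt, only the other library can answer requests.
        serving_during = "B" if target == "A" else "A"
        last_rebuilt = target
        # Once the rebuild finishes, the freshly rebuilt library is preferred.
        trace.append((batch, target, serving_during, target))
    return trace


if __name__ == "__main__":
    for row in simulate_rotation(4):
        print(row)
```

Each row shows that the library being reconstructed never serves requests during its own rebuild, and that consecutive batches alternate between A and B.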
Corresponding to the above method embodiment, an embodiment of the present invention further provides a data retrieval device, as shown in fig. 2, the device may include:
an index library constructing module 210, configured to construct two identical index libraries corresponding to basic data serving as a retrieval object;
a to-be-reconstructed index library determining module 220, configured to determine, when auxiliary data serving as a retrieval object is obtained, a first index library to be reconstructed from at least one index library on which data retrieval is not currently being performed;
an index library reconstructing module 230, configured to reconstruct the first index library according to the retrieval object currently corresponding to the first index library and the auxiliary data obtained this time;
the device further comprises:
a to-be-utilized index library determining module 240, configured to determine, when a data retrieval request is obtained, a second index library to be utilized from at least one index library that is not currently undergoing a reconstruction operation;
an index result determining module 250, configured to determine, based on the second index library and its corresponding retrieval object, a retrieval result corresponding to the data retrieval request.
Compared with the prior art, in this solution two index libraries are constructed in advance. After auxiliary data is obtained, at least one index library on which data retrieval is not currently being performed is determined from the two index libraries, the first index library to be reconstructed is determined from it, and the first index library is reconstructed according to its currently corresponding retrieval object and the auxiliary data obtained this time. Further, when a data retrieval request is obtained, at least one index library that is not currently undergoing a reconstruction operation is determined from the two index libraries, the second index library to be utilized is determined from it, and the retrieval result corresponding to the data retrieval request is determined based on the second index library and its corresponding retrieval object. In this way, updating an index library with auxiliary data does not affect the response to data retrieval requests.
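One way to picture how modules 210 to 250 cooperate is the following thread-safe sketch. It is an assumption-laden illustration, not the device as claimed: the class name, the dictionary layout of a library, the whitespace tokenization, the exact-word lookup, the random selection among candidates, and the choice to raise an error when no library is free are all hypothetical.

```python
import random
import threading
from typing import Dict, Set


class DataRetrievalDevice:
    """Hypothetical container for modules 210-250."""

    def __init__(self, basic_data: Dict[int, str]) -> None:
        self._lock = threading.Lock()
        self._batch = 0
        # Module 210: construct two identical index libraries from the basic data.
        self._libs = [self._make_library(basic_data) for _ in range(2)]

    @staticmethod
    def _build_index(objects: Dict[int, str]) -> Dict[str, Set[int]]:
        index: Dict[str, Set[int]] = {}
        for object_id, text in objects.items():
            for word in text.lower().split():
                index.setdefault(word, set()).add(object_id)
        return index

    def _make_library(self, objects: Dict[int, str]) -> dict:
        objects = dict(objects)
        return {"objects": objects, "index": self._build_index(objects),
                "rebuilding": False, "active_queries": 0, "batch": 0}

    def on_auxiliary_data(self, auxiliary_data: Dict[int, str]) -> None:
        with self._lock:
            # Assumption: one reconstruction at a time, so one library stays usable.
            if any(lib["rebuilding"] for lib in self._libs):
                raise RuntimeError("a reconstruction is already running")
            # Module 220: candidates are libraries not currently serving a request.
            idle = [lib for lib in self._libs if lib["active_queries"] == 0]
            if not idle:
                raise RuntimeError("both index libraries are serving requests")
            target = random.choice(idle)
            target["rebuilding"] = True
            self._batch += 1
            batch = self._batch
        try:
            # Module 230: rebuild from the library's own objects plus this batch.
            objects = {**target["objects"], **auxiliary_data}
            index = self._build_index(objects)
            with self._lock:
                target.update(objects=objects, index=index, batch=batch)
        finally:
            with self._lock:
                target["rebuilding"] = False

    def search(self, search_term: str) -> Set[int]:
        with self._lock:
            # Module 240: candidates are libraries not currently being rebuilt.
            usable = [lib for lib in self._libs if not lib["rebuilding"]]
            lib = random.choice(usable)
            lib["active_queries"] += 1
        try:
            # Module 250: exact-word lookup only, to keep the sketch short.
            return set(lib["index"].get(search_term.lower(), set()))
        finally:
            with self._lock:
                lib["active_queries"] -= 1
```

Because selection among candidates is random here, a request issued right after a reconstruction may still be answered from the library that has not yet seen the new auxiliary data; preferring the most recently reconstructed library avoids that.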
In one implementation, the to-be-reconstructed index library determining module 220 may include:
a first to-be-reconstructed index library determining unit, configured to, when auxiliary data serving as a retrieval object is obtained: if two index libraries are not currently performing data retrieval, randomly select one of them as the first index library to be reconstructed; and if only one index library is not currently performing data retrieval, take that index library as the first index library to be reconstructed;
the to-be-utilized index library determining module 240 may include:
a first to-be-utilized index library determining unit, configured to, when a data retrieval request is obtained: if two index libraries are not currently undergoing a reconstruction operation, randomly select one of them as the second index library to be utilized; and if only one index library is not currently undergoing a reconstruction operation, take that index library as the second index library to be utilized.
In a second implementation, the to-be-reconstructed index library determining module 220 may include:
a second to-be-reconstructed index library determining unit, configured to, when auxiliary data serving as a retrieval object is obtained: if two index libraries are not currently performing data retrieval, take the index library that was not reconstructed according to the previously obtained auxiliary data as the first index library to be reconstructed; and if only one index library is not currently performing data retrieval, take that index library as the first index library to be reconstructed;
the to-be-utilized index library determining module 240 may include:
a second to-be-utilized index library determining unit, configured to, when a data retrieval request is obtained: if two index libraries are not currently undergoing a reconstruction operation, determine whether neither of them has ever been reconstructed, and if so, randomly select one of them as the second index library to be utilized, otherwise take the index library that was reconstructed according to the most recently obtained auxiliary data as the second index library to be utilized; and if only one index library is not currently undergoing a reconstruction operation, take that index library as the second index library to be utilized.
Specifically, the index result determining module 250 may include:
an index result determining unit, configured to determine whether index information matching the search term carried by the data retrieval request exists in the second index library, and if so, determine the retrieval result corresponding to the data retrieval request from the retrieval object currently corresponding to the second index library.
Specifically, the manner in which the to-be-reconstructed index library determining module obtains the auxiliary data serving as a retrieval object includes:
obtaining the auxiliary data serving as a retrieval object by having a web crawler upload data at regular intervals;
or,
obtaining the auxiliary data serving as a retrieval object by requesting data from the web crawler at regular intervals.
Alternatively, the manner in which the to-be-reconstructed index library determining module obtains the auxiliary data serving as a retrieval object includes:
obtaining the auxiliary data serving as a retrieval object through manual data import.
Specifically, the manner in which the index library reconstructing module reconstructs the first index library is the same as the manner in which the index library constructing module constructs the two identical index libraries corresponding to the basic data serving as the retrieval object.
Specifically, both the manner in which the index library reconstructing module reconstructs the first index library and the manner in which the index library constructing module constructs the two identical index libraries corresponding to the basic data serving as the retrieval object are the inverted-index (inverted list) manner.
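Reading the construction manner named above as an inverted index (an assumption about the translated wording), the shared build and rebuild step can be sketched as below. The whitespace tokenizer, the function names, and the representation of retrieval objects as an id-to-text mapping are hypothetical simplifications.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def build_inverted_index(objects: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each word to the ids of the retrieval objects that contain it."""
    postings: Dict[str, Set[int]] = defaultdict(set)
    for object_id, text in objects.items():
        for word in text.lower().split():
            postings[word].add(object_id)
    return dict(postings)


def rebuild(
    current_objects: Dict[int, str], auxiliary_data: Dict[int, str]
) -> Tuple[Dict[int, str], Dict[str, Set[int]]]:
    """Rebuild a library from its current objects plus newly obtained data."""
    # The same construction routine is reused, so building and rebuilding
    # are carried out in the same manner.
    merged = {**current_objects, **auxiliary_data}
    return merged, build_inverted_index(merged)
```

Reusing one routine for both steps means a rebuilt library is indistinguishable from one constructed from scratch over the merged data.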
It should be noted that, herein, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. A data retrieval method, applied to a data retrieval device, the method comprising:
constructing two identical index libraries corresponding to basic data serving as retrieval objects;
when auxiliary data serving as a retrieval object is obtained, determining a first index base to be reconstructed from at least one index base which is not currently subjected to data retrieval;
reconstructing the first index base according to the current corresponding retrieval object of the first index base and the auxiliary data obtained this time;
the method further comprises the following steps:
when a data retrieval request is obtained, determining a second index library to be utilized from at least one index library which is not currently executed with reconstruction operation;
and determining a retrieval result corresponding to the data retrieval request based on the second index library and the corresponding retrieval object.
2. The method according to claim 1, wherein determining a first index base to be reconstructed from at least one index base currently not being subjected to data retrieval comprises:
if the number of the index bases which are not subjected to data retrieval is two, randomly selecting one index base as a first index base to be reconstructed;
if the index base which is not subjected to data retrieval currently is one, taking the index base which is not subjected to data retrieval currently as a first index base to be reconstructed;
the determining a second index library to be utilized from at least one index library which is not currently executed with reconstruction operation includes:
if the number of the index libraries which are not currently subjected to the reconstruction operation is two, one index library is randomly selected as a second index library to be utilized;
and if the index base which is not currently subjected to the reconstruction operation is one, taking the index base which is not currently subjected to the reconstruction operation as a second index base to be utilized.
3. The method according to claim 1, wherein determining a first index base to be reconstructed from at least one index base currently not being subjected to data retrieval comprises:
if the number of the index libraries which are not subjected to data retrieval at present is two, taking the index library which is not subjected to reconstruction operation according to the auxiliary data obtained last time as a first index library to be reconstructed;
if the index base which is not subjected to data retrieval currently is one, taking the index base which is not subjected to data retrieval currently as a first index base to be reconstructed;
the determining a second index library to be utilized from at least one index library which is not currently executed with reconstruction operation includes:
if the number of the index libraries which are not subjected to the reconstruction operation is two, judging whether the two index libraries which are not subjected to the reconstruction operation are not reconstructed, if so, randomly selecting one index library as a second index library to be utilized, and otherwise, taking the index library which is subjected to the reconstruction operation according to the auxiliary data obtained last time as the second index library to be utilized;
and if the index base which is not currently subjected to the reconstruction operation is one, taking the index base which is not currently subjected to the reconstruction operation as a second index base to be utilized.
4. The method according to any one of claims 1 to 3, wherein the determining a search result corresponding to the data search request based on the second index repository and the corresponding search object comprises:
and determining whether index information matched with the search word carried by the data search request exists in the second index database, and if so, determining a search result corresponding to the data search request from a search object currently corresponding to the second index database.
5. The method according to any one of claims 1 to 3, wherein the manner of obtaining auxiliary data as a retrieval object includes:
acquiring auxiliary data serving as a retrieval object based on a mode of uploading data by a web crawler at regular time;
or,
auxiliary data as a retrieval object is obtained based on the manner of requesting data from the web crawler at regular time.
6. The method according to any one of claims 1 to 3, wherein the manner of obtaining auxiliary data as a retrieval object includes:
auxiliary data to be retrieved is obtained based on a manual data importing method.
7. The method according to any one of claims 1 to 3, wherein the first index library is reconstructed in the same manner as two identical index libraries corresponding to the basic data as the retrieval object are constructed.
8. The method according to claim 7, wherein both the manner of reconstructing the first index library and the manner of constructing the two identical index libraries corresponding to the basic data serving as the retrieval object are the inverted-index manner.
9. A data retrieval device, comprising:
the index database construction module is used for constructing two same index databases corresponding to basic data serving as a retrieval object;
a to-be-reconstructed index base determination module configured to determine, when auxiliary data serving as a retrieval target is obtained, a first index base to be reconstructed from at least one index base on which data retrieval is not currently performed;
the index database reconstruction module is used for reconstructing the first index database according to the current corresponding retrieval object of the first index database and the auxiliary data obtained this time;
the device further comprises:
the to-be-utilized index base determining module is used for determining a second index base to be utilized from at least one index base which is not currently executed with reconstruction operation when the data retrieval request is obtained;
and the index result determining module is used for determining a retrieval result corresponding to the data retrieval request based on the second index library and the corresponding retrieval object.
10. The apparatus of claim 9, wherein the index repository to be reconstructed determining module comprises:
a first to-be-reconstructed index base determination unit configured to, when the auxiliary data as the retrieval object is obtained, randomly select one index base as a first index base to be reconstructed if there are two index bases on which data retrieval is not currently performed; if the index base which is not subjected to data retrieval currently is one, taking the index base which is not subjected to data retrieval currently as a first index base to be reconstructed;
the to-be-utilized index base determination module comprises:
a first to-be-utilized index base determining unit configured to, when a data retrieval request is obtained, randomly select one index base as a second index base to be utilized if there are two index bases on which a reconfiguration operation is not currently performed; and if the index base which is not currently subjected to the reconstruction operation is one, taking the index base which is not currently subjected to the reconstruction operation as a second index base to be utilized.
CN201510857487.XA 2015-11-30 2015-11-30 Data retrieval method and device Active CN105468758B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510857487.XA CN105468758B (en) 2015-11-30 2015-11-30 Data retrieval method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510857487.XA CN105468758B (en) 2015-11-30 2015-11-30 Data retrieval method and device

Publications (2)

Publication Number Publication Date
CN105468758A true CN105468758A (en) 2016-04-06
CN105468758B CN105468758B (en) 2019-08-09

Family

ID=55606458

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510857487.XA Active CN105468758B (en) 2015-11-30 2015-11-30 Data retrieval method and device

Country Status (1)

Country Link
CN (1) CN105468758B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108229358A (en) * 2017-12-22 2018-06-29 北京市商汤科技开发有限公司 Index establishing method and device, electronic equipment, computer storage media, program
CN108874983A (en) * 2018-06-12 2018-11-23 陕西师范大学 A kind of computerized data retrieval method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080052284A1 (en) * 2006-08-05 2008-02-28 Terry Stokes System and Method for the Capture and Archival of Electronic Communications
CN101246500A (en) * 2008-03-27 2008-08-20 腾讯科技(深圳)有限公司 Retrieval system and method for implementing data fast indexing
CN101882142A (en) * 2009-05-08 2010-11-10 富士通株式会社 Index combining method and index combining device

Also Published As

Publication number Publication date
CN105468758B (en) 2019-08-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant