WO2010037314A1 - A method for searching and the device and system thereof - Google Patents

A method for searching and the device and system thereof

Info

Publication number
WO2010037314A1
WO2010037314A1 PCT/CN2009/073971 CN2009073971W WO2010037314A1 WO 2010037314 A1 WO2010037314 A1 WO 2010037314A1 CN 2009073971 W CN2009073971 W CN 2009073971W WO 2010037314 A1 WO2010037314 A1 WO 2010037314A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
search
interest model
search request
member engine
Prior art date
Application number
PCT/CN2009/073971
Other languages
French (fr)
Chinese (zh)
Inventor
胡汉强
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP09817228A (EP2352102A4)
Publication of WO2010037314A1
Priority to US13/070,265 (US8527509B2)

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Definitions

  • the present invention relates to the field of communication technologies, and in particular, to a search method, system and apparatus.
  • the search server collects the meta index of each member engine, calculates the similarity between the search request and each member engine according to the meta index, and selects the member engines with high similarity to serve the user. The search request is then distributed to the selected member engines for searching.
  • the inventors found that the selected member engine in the meta search scheme may be inaccurate, resulting in low accuracy of the search.
  • a method for searching, comprising: receiving a search request; extracting a user's interest model from the user personalized data according to the search request; acquiring a meta index of each member engine; selecting a member engine according to the meta index of each member engine, the search request, and the user's interest model; and sending the search request to the selected member engine so that the selected member engine completes the search.
  • a system for searching, capable of applying the above search method, the system comprising: a search service subsystem, configured to receive a search request, receive a meta index reported by each member engine, select a member engine according to the meta index of each member engine, the search request, and a user's interest model, and send the search request to the selected member engine;
  • At least one member engine is configured to report the meta index of the member engine to the search service subsystem, and complete the search after receiving the search request sent by the search service subsystem.
  • Each of the foregoing embodiments extracts the user's interest model after receiving the search request, and selects member engines according to the meta indexes of the member engines, the search request, and the user's interest model. The selection therefore takes both the search request and the user's interest model into account. Because the search is completed by member engines selected in this way, the selection is more personalized and the selected engines are related to the user's interests, which improves the efficiency of system scheduling (or selection) and the accuracy of the search.
  • FIG. 1 is a schematic structural diagram of a system according to an embodiment of the present invention
  • FIG. 2 is a schematic structural diagram of a system according to another embodiment of the present invention
  • FIG. 3 is a schematic structural view of a device in the system architecture shown in FIG. 2;
  • FIG. 4 is a schematic structural diagram of a system according to another embodiment of the present invention.
  • Figure 5 is a schematic structural view of the device in the system architecture shown in Figure 4;
  • FIG. 6 is a schematic structural diagram of a system according to another embodiment of the present invention.
  • FIG. 7 is a schematic structural view of a device in the system architecture of FIG. 6;
  • FIG. 8 is a schematic flow chart of an embodiment of a search method according to the present invention.
  • FIG. 9 is a schematic flow chart of another embodiment of a search method according to the present invention.
  • FIG. 10 is a schematic flow chart of another embodiment of a search method according to the present invention.
  • FIG. 11 is a schematic flow chart of extracting a user interest model according to the present invention.
  • FIG. 12 is a schematic flow chart of extracting a static interest model of a user according to the present invention.
  • FIG. 13 is a schematic flow chart of extracting a dynamic interest model of a user according to the present invention.
  • FIG. 15 is a schematic structural diagram of still another search system according to an embodiment of the present invention.
  • FIG. 16 is a structural diagram of a search server and an application server in the search system architecture of FIG. 15;
  • Figure 17 is a flow chart showing the operation of the system of the search shown in Figure 15.
  • a schematic diagram of a system structure of a system embodiment for searching includes:
  • Search client 10 used to send a search request to the search service subsystem
  • the search service subsystem 20 is configured to receive a search request sent by the search client, extract the user's interest model from the user personalized data according to the search request, obtain a meta index reported by each member engine, select a member engine according to the meta index of each member engine, the search request, and the user's interest model, and send the search request to the selected member engine;
  • At least one member engine for providing a meta index of the member engine to the search service subsystem, and completing the search after receiving the search request sent by the search service subsystem.
  • there are generally multiple member engines such as a first member engine 301, a second member engine 302, a third member engine 303, and a fourth member engine 304.
  • the member engine refers to each vertical search engine responsible for a specific search in the architecture of the meta search.
  • the meta index refers to the statistical data used to describe the capabilities of the member engine and used for member engine selection in the metasearch architecture.
  • the meta index of a member engine consists of statistics on the documents or records contained in the database or sub-database corresponding to the member engine, and on the terms contained in those documents or records.
  • the meta-index of the member engine is used as one of the basis for selecting a member engine.
  • the user's interest model is based on the weighted scores of certain dimensions extracted from the relevant data of the user.
  • the search service subsystem 20 extracts the user's interest model after receiving the search request, and selects member engines according to the meta indexes of the member engines, the search request, and the user's interest model. The selection therefore takes both the search request and the user's interest model into account. Because the search is completed by the member engines selected in this way, the selection is more personalized and the selected engines are related to the user's interests. This improves the efficiency of system scheduling (or selection) and the accuracy of the search.
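  • The flow handled by the search service subsystem can be sketched in a few lines. This is a minimal illustration, not the patented selection formula: the `MemberEngine` shape, the two-part meta index (`term_stats`, `dimension_stats`), and the additive score in `select_engines` are all assumptions introduced here for readability.

```python
from dataclasses import dataclass
from typing import Dict, List

Vector = Dict[str, float]


@dataclass
class MemberEngine:
    name: str
    term_stats: Vector       # per-term part of the meta index (assumed shape)
    dimension_stats: Vector  # per-interest-dimension part (assumed shape)


def dot(a: Vector, b: Vector) -> float:
    """Sparse dot product over the keys of a."""
    return sum(weight * b.get(key, 0.0) for key, weight in a.items())


def select_engines(request: Vector, interest: Vector,
                   engines: List[MemberEngine], top_n: int = 1) -> List[MemberEngine]:
    # Assumed score: overlap of the request with the term statistics plus
    # overlap of the interest model with the dimension statistics.
    def score(engine: MemberEngine) -> float:
        return dot(request, engine.term_stats) + dot(interest, engine.dimension_stats)

    return sorted(engines, key=score, reverse=True)[:top_n]


def handle_search(user_id: str, request: Vector,
                  user_db: Dict[str, Vector],
                  engines: List[MemberEngine]) -> List[str]:
    interest = user_db.get(user_id, {})              # extract the user's interest model
    selected = select_engines(request, interest, engines)
    return [engine.name for engine in selected]      # engines that would receive the request
```

  • In this sketch two engines that match the request equally well are separated only by the user's interest model, which is the effect the embodiment describes.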
  • Referring to FIG. 2, a block diagram of another embodiment of a search system is shown. Similar to the system described above with respect to FIG. 1, it includes a search client 10, a search service subsystem 20A, and at least one member engine.
  • the search client 10 is configured to send a search request to the search service subsystem; at least one member engine is configured to provide a meta index of the member engine to the search service subsystem, and to complete the search after receiving the search request sent by the search service subsystem.
  • the search service subsystem 20A is composed of a search server 201A and a user database 202.
  • the user database 202 is configured to store or provide personalized data of the user
  • the search server 201A is configured to receive a search request sent by the search client, extract the user's interest model from the user personalized data according to the search request, obtain a meta index reported by each member engine, select a member engine according to the meta index of each member engine, the search request, and the user's interest model, and send the search request to the selected member engine.
  • FIG. 3 is a detailed structural diagram of each device in the system architecture shown in FIG. 2, wherein the search server 201A includes:
  • the search request receiving module 201A1 is configured to receive a search request sent by the search client;
  • the user's interest model extraction module 201A2 extracts the user's interest model from the user personalized data according to the search request;
  • a meta index collection module 201A3 configured to obtain a meta index of each member engine
  • a member engine selection module 201A4 configured to select a member engine according to the meta index of the respective member engines, the search request, and the interest model of the user;
  • the search request distribution module 201A5 is configured to send the search request to the selected member engine, so that the selected member engine completes the search according to the search request.
  • FIG. 4 is a schematic diagram of the architecture of another search system embodiment which, similar to the system of FIG. 1, includes a search client 10, a search service subsystem 20B, and at least one member engine.
  • the search client 10 is configured to send a search request to the search service subsystem; at least one member engine is configured to provide a meta index of the member engine to the search service subsystem, and the member engines that receive the search request sent by the search service subsystem complete the search.
  • the search service subsystem 20B is composed of a search server 201B, a dispatch server 203A, and a user database 202.
  • the user database 202 is configured to store or provide personalized data of the user
  • the search server 201B is configured to receive a search request sent by the search client, extract the user's interest model from the user personalized data according to the search request, send the user's interest model and the search request to the dispatch server 203A, receive the selected member engine returned by the dispatch server 203A, and send the search request to the selected member engine;
  • the scheduling server 203A is configured to receive the user's interest model and the search request sent by the search server 201B, receive a meta index reported by each member engine, select a member engine according to the meta index of each member engine, the search request, and the user's interest model, and return the selected member engine to the search server 201B.
  • FIG. 5 it is a schematic structural diagram of the search server 201B and the dispatch server 203A in the system architecture shown in FIG.
  • the search server 201B includes:
  • the search request receiving module 201B1 is configured to receive a search request sent by the search client, and send the search request to the member engine selection request sending module 201B3;
  • the user's interest model extraction module 201B2 is configured to extract the user's interest model from the user personalized data according to the search request, and send the user's interest model to the member engine selection request sending module 201B3;
  • a member engine selection request sending module 201B3, configured to send the search request and the user's interest model to a scheduling server, so that the scheduling server selects a member engine according to the meta index of each member engine, the search request, and the user's interest model;
  • a member engine selection result receiving module 201B4, configured to receive the selected member engine returned by the scheduling server;
  • a search request distribution module 201B5, configured to send the search request to the selected member engine, so that the selected member engine completes the search according to the search request.
  • the dispatch server 203A that communicates with the search server 201B described above includes:
  • a member engine selection request receiving module 203A1 configured to receive a search request sent by the search server and an interest model of the user
  • a meta index collection module 203A2 configured to obtain a meta index of each member engine
  • a member engine selection module 203A3 configured to select a member engine according to the meta index of the respective member engines, the search request, and the interest model of the user;
  • a member engine selection result returning module 203A4 configured to send the selected member engine to the search server, so that the search server sends the search request to the selected member engine, the selected member engine The search is completed according to the search request.
  • FIG. 6 is a schematic diagram of the architecture of another search system embodiment which, similar to the system described above with reference to FIG. 1, includes a search client 10, a search service subsystem 20C, and at least one member engine.
  • the search client 10 is configured to send a search request to the search service subsystem; at least one member engine is configured to provide a meta index of the member engine to the search service subsystem, and to complete the search after receiving the search request sent by the search service subsystem.
  • the search service subsystem 20C shown in FIG. 6 is composed of a search server 201C, a dispatch server 203C, and a user database 202, wherein
  • the user database 202 is configured to store or provide personalized data of the user
  • the search server 201C is configured to receive a search request sent by the search client, send the search request to the dispatch server, receive the selected member engine returned by the dispatch server, and send the search request to the selected member engine.
  • the scheduling server 203C is configured to receive the search request sent by the search server, extract the user's interest model from the user personalized data according to the search request, and obtain a meta index of each member engine; select a member engine according to the meta index of each member engine, the search request, and the user's interest model; and return the selected member engine to the search server.
  • the search server 201C as shown in Fig. 7 includes:
  • the search request receiving module 201C1 is configured to receive a search request sent by the search client, and send the search request to the member engine selection request sending module 201C3;
  • a member engine selection request sending module 201C3 configured to send the search request to the scheduling server, so that the scheduling server selects a member engine according to the meta index of the respective member engines, the search request, and the user's interest model;
  • a member engine selection result receiving module 201C4, configured to receive the selected member engine returned by the scheduling server;
  • a search request distribution module 201C5, configured to send the search request to the selected member engine, so that the selected member engine completes the search according to the search request.
  • a dispatch server 203C which can communicate with the aforementioned search server 201C, includes
  • a member engine selection request receiving module 203C1 configured to receive a search request sent by the search server
  • the user's interest model extraction module 203C5 is configured to extract the user's interest model from the user personalized data according to the received search request, and send the user's interest model to the member engine selection module 203C3;
  • a meta index collection module 203C2 configured to obtain a meta index of each member engine
  • a member engine selection module 203C3 configured to select a member engine according to a meta index of each member engine, a search request, and a user's interest model;
  • a member engine selection result returning module 203C4, configured to send the selected member engine to the search server, so that the search server sends the search request to the selected member engine and the selected member engine completes the search according to the search request.
  • the device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the objectives of the embodiments of the present invention. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
  • with the above-mentioned system and device for searching, the member engine is selected according to the meta index of each member engine, the search request, and the user's interest model, and the search is performed by the selected member engines, which further improves both the accuracy and the efficiency of the search.
  • the method includes:
  • the search service subsystem receives a meta index of each member engine.
  • each member engine may report its meta index to the search service subsystem.
  • Each member engine corresponds to a database, and the database may include several sub-databases.
  • the meta-index of the member engine is specifically determined by the data in the database or sub-database corresponding to the member engine.
  • the meta index contains statistics about the terms in the data, or statistics about the interest models of the documents in the data.
  • the interest model of a document is a vector based on the weighted scores of the data extracted from the document relative to certain dimensions. In a specific embodiment of the invention, the dimensions of the interest model of the document and of the interest model of the user should be consistent.
  • the meta index may include one of the following information or any combination thereof:
  • the maximum normalized weight vector $mnw = (mnw_1, mnw_2, \ldots, mnw_i, \ldots, mnw_p)$, where $mnw_i$ is the maximum normalized weight of the term $t_i$ relative to all documents in the database or sub-database corresponding to the member engine.
  • $mnw_i$ can be calculated as follows: first, calculate the weight of each document in the database relative to the term $t_i$, where the weight is the number of occurrences of the term in the document; then take the maximum of these weights over all documents to obtain the maximum weight $mnw_i'$ of the term $t_i$ relative to all documents in the database; finally, normalize the vector $mnw' = (mnw_1', mnw_2', \ldots, mnw_i', \ldots, mnw_p')$ to obtain the normalized vector $mnw = (mnw_1, mnw_2, \ldots, mnw_i, \ldots, mnw_p)$.
  • $anw_i$ can be calculated as follows: first, calculate the weight of each document in the database relative to the term $t_i$, where the weight is the number of occurrences of the term in the document; then average these weights over all documents to obtain the average weight $anw_i'$ of the term $t_i$ relative to all documents in the database; finally, normalize the vector $anw' = (anw_1', anw_2', \ldots, anw_i', \ldots, anw_p')$ to obtain the normalized vector $anw = (anw_1, anw_2, \ldots, anw_i, \ldots, anw_p)$.
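  • The two term-level calculations above can be sketched as follows. The source does not say which normalization is used, so L2 (Euclidean) normalization is an assumption here, as are the function names and the representation of a document as a list of tokens.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def l2_normalize(vec: Vector) -> Vector:
    # Assumption: Euclidean normalization; the source only says "normalized".
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: (v / norm if norm else 0.0) for k, v in vec.items()}


def term_meta_index(documents: List[List[str]],
                    terms: List[str]) -> Tuple[Vector, Vector]:
    """Return (mnw, anw) for the given terms over a member engine's documents."""
    counts = [Counter(doc) for doc in documents]  # occurrences of each term per document
    # Maximum weight of each term over all documents (mnw').
    max_raw = {t: float(max((c[t] for c in counts), default=0)) for t in terms}
    # Average weight of each term over all documents (anw').
    avg_raw = {t: (sum(c[t] for c in counts) / len(counts) if counts else 0.0)
               for t in terms}
    return l2_normalize(max_raw), l2_normalize(avg_raw)
```

  • For two documents `["a", "a", "b"]` and `["a", "b", "b", "b"]`, the raw maxima are 2 and 3 and the raw averages are 1.5 and 2 before normalization.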
  • $mnv_i$ can be calculated as follows: first, calculate the weight of each document in the database relative to the i-th dimension of the interest model, where the weight is the sum of the word frequencies of all words in the document that fall within the range of the i-th dimension of the interest model (such as sports); then take the maximum of these weights over all documents to obtain the maximum weight $mnv_i'$ of the i-th dimension of the interest model relative to all documents in the database D; finally, normalize the vector $mnv' = (mnv_1', mnv_2', \ldots, mnv_i', \ldots, mnv_p')$ to obtain the normalized vector $mnv = (mnv_1, mnv_2, \ldots, mnv_i, \ldots, mnv_p)$.
  • $anv_i$ can be calculated as follows: first, calculate the weight of each document in the database relative to the i-th dimension of the interest model, where the weight is the sum of the word frequencies of all words in the document that fall within the range of the i-th dimension of the interest model (such as sports); then average these weights over all documents to obtain the average weight $anv_i'$ of the i-th dimension of the interest model relative to all documents in the database; finally, normalize the vector $anv' = (anv_1', anv_2', \ldots, anv_i', \ldots, anv_p')$ to obtain the normalized vector $anv = (anv_1, anv_2, \ldots, anv_i, \ldots, anv_p)$.
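  • The dimension-level statistics can be sketched in the same way as the term-level ones. The mapping from each interest dimension to the words in its range (`dimension_terms`) is a hypothetical input, since the source does not describe how that range is defined, and L2 normalization is again an assumption.

```python
import math
from collections import Counter
from typing import Dict, List, Set, Tuple

Vector = Dict[str, float]


def l2_normalize(vec: Vector) -> Vector:
    # Assumption: Euclidean normalization; the source only says "normalized".
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: (v / norm if norm else 0.0) for k, v in vec.items()}


def dimension_meta_index(documents: List[List[str]],
                         dimension_terms: Dict[str, Set[str]]) -> Tuple[Vector, Vector]:
    """Return (mnv, anv) over the interest dimensions for one member engine."""
    per_doc = []
    for doc in documents:
        counts = Counter(doc)
        # Weight of the document for a dimension: summed frequency of the
        # words in the document that fall within that dimension's range.
        per_doc.append({dim: float(sum(counts[w] for w in words))
                        for dim, words in dimension_terms.items()})
    dims = list(dimension_terms)
    max_raw = {d: max((w[d] for w in per_doc), default=0.0) for d in dims}
    avg_raw = {d: (sum(w[d] for w in per_doc) / len(per_doc) if per_doc else 0.0)
               for d in dims}
    return l2_normalize(max_raw), l2_normalize(avg_raw)
```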
  • the search service subsystem receives a search request sent by the search client.
  • the search request carries the ID of the user, and information such as a search keyword composed of one or more terms.
  • the vector $(q_1, q_2, \ldots, q_i, \ldots, q_k)$ can be used to represent the search request, where $q_i$ is the weight of the term $t_i$ in the search request, and i and k are natural numbers.
  • the search service subsystem extracts a user's interest model from the user database.
  • the user database generally stores personalized data of the user, including the user's static user profile, search history, presence information, location information, and the like.
  • the user's interest model can be extracted by different methods for different data, which will be detailed later.
  • the selecting step considers not only the search request and the meta index but also the user's interest model; the specific process is described later.
  • a flow diagram of another search method the method includes:
  • the scheduling server receives the meta index of each member engine.
  • each member engine may report its meta index to the scheduling server.
  • the search server receives a search request sent by the search client.
  • the search request carries the ID of the user, and information such as a search keyword composed of one or more terms. The foregoing steps 901 and 902 may be performed in any order.
  • After receiving the search request, the search server extracts the user's interest model from the user personalized data in the user database.
  • the search server submits the search request and the user's interest model to the scheduling server.
  • the scheduling server selects a member engine according to the search request, the user's interest model, and the meta index.
  • the scheduling server sends the selected member engine to the search server.
  • the search server sends the foregoing search request to the selected member engine, so that the member engine completes the search.
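  • The division of work in this method, where the search server extracts the interest model and the scheduling server holds the meta indexes and makes the selection, can be sketched as two cooperating objects. All class and method names are hypothetical, and the selection rule is injected as a callable because the sketch does not attempt to reproduce the source's own formula.

```python
from typing import Callable, Dict, List

Vector = Dict[str, float]
Selector = Callable[[Vector, Vector, Dict[str, Vector]], List[str]]
Engine = Callable[[Vector], str]


class SchedulingServer:
    """Collects meta indexes and selects member engines."""

    def __init__(self, select: Selector) -> None:
        self.meta_indexes: Dict[str, Vector] = {}
        self._select = select

    def report_meta_index(self, engine_name: str, meta_index: Vector) -> None:
        # Each member engine reports its meta index here.
        self.meta_indexes[engine_name] = meta_index

    def choose(self, request: Vector, interest: Vector) -> List[str]:
        # Select using the request, the interest model, and the meta indexes.
        return self._select(request, interest, self.meta_indexes)


class SearchServer:
    """Receives the request, extracts the interest model, dispatches the search."""

    def __init__(self, scheduler: SchedulingServer,
                 user_db: Dict[str, Vector], engines: Dict[str, Engine]) -> None:
        self._scheduler = scheduler
        self._user_db = user_db
        self._engines = engines

    def handle(self, user_id: str, request: Vector) -> Dict[str, str]:
        interest = self._user_db.get(user_id, {})             # extract interest model
        selected = self._scheduler.choose(request, interest)  # ask scheduler to select
        # Send the request only to the selected member engines.
        return {name: self._engines[name](request) for name in selected}
```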
  • the method includes:
  • the scheduling server receives a meta index of each member engine.
  • each member engine may report its meta index to the scheduling server.
  • the search client sends a search request to the search server.
  • the search request carries the ID of the user, and information such as a search keyword composed of one or more terms. Steps 1001 and 1002 may be performed in any order.
  • the search server sends the search request to the scheduling server after receiving the search request.
  • After receiving the search request, the scheduling server extracts the user's interest model from the user personalized data in the user database.
  • the scheduling server selects a member engine according to the search request, the user's interest model, and the meta index.
  • the scheduling server returns the selected member engine to the search server.
  • After receiving the selected member engine, the search server distributes the search request to the selected member engine.
  • the specific method of extracting the user's interest model from the user personalized data in the user database in the foregoing steps 803, 903 or 1004 is specifically described below.
  • a method for extracting the user's interest model from user personalized data includes:
  • the dimensions of interest such as: news, sports, entertainment, finance, technology, real estate, games, women, forums, weather, merchandise, home appliances, music, reading, blog, mobile, military, education, travel, MMS, ring tones, catering, civil aviation , industry, agriculture, computers, geography, etc.
  • the actual implementation process is not limited to the above dimension of interest.
  • Each score value constitutes a score value vector, and the score value vector is a static interest model of the user.
  • the user's static interest model can be expressed as:
  • the interest model corresponding to the user's static user profile Rl (pl,
  • the user personalized data may also be a user's search history, and the obtained user's interest model may be referred to as a user's dynamic interest model.
  • the method may specifically include:
  • the sum of the above-mentioned score value vectors for the clicked documents in different search histories is the dynamic interest model of the user.
  • di (t1, t2, t3, ..., tj)
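  • The summation that yields the dynamic interest model can be sketched as below. The sketch assumes each clicked document already has a score value vector keyed by interest dimension; how those per-document scores are produced is outside its scope, and the function name is hypothetical.

```python
from typing import Dict, Iterable

Vector = Dict[str, float]


def dynamic_interest_model(clicked_doc_vectors: Iterable[Vector]) -> Vector:
    """Sum the score value vectors of the clicked documents, dimension by dimension."""
    model: Vector = {}
    for vector in clicked_doc_vectors:
        for dimension, score in vector.items():
            model[dimension] = model.get(dimension, 0.0) + score
    return model
```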
  • the user personalized data is the search history
  • the user's interest model may be further modified according to the specific situation, including:
  • the rating value vector for the document is forward weighted.
  • the score value vector for the document is decremented according to the time elapsed after the document was clicked.
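  • The time-based reduction of a clicked document's score value vector might look like the following. The source does not give the decay function, so the exponential half-life form, the 30-day default, and the function name are purely illustrative assumptions.

```python
from typing import Dict

Vector = Dict[str, float]


def decay_score_vector(score_vector: Vector, elapsed_days: float,
                       half_life_days: float = 30.0) -> Vector:
    """Scale a document's score vector down according to time since the click."""
    # Assumed decay: the contribution halves every `half_life_days`.
    factor = 0.5 ** (elapsed_days / half_life_days)
    return {dimension: score * factor for dimension, score in score_vector.items()}
```

  • With these defaults a click made 30 days ago contributes half as much as a click made today.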
  • the user personalized data may include both the static user profile of the user and the search history of the user.
  • the obtained user's interest model may be referred to as the user's comprehensive interest model.
  • the method for extracting the user interest model may further include:
  • the static interest model and the dynamic interest model are first weighted and added; the sum is then normalized, and the normalized result is the comprehensive interest model.
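  • A minimal sketch of this combination follows. The two weights and the choice of L2 normalization are assumptions; the source states only that the models are weighted, added, and normalized.

```python
import math
from typing import Dict

Vector = Dict[str, float]


def comprehensive_interest_model(static: Vector, dynamic: Vector,
                                 static_weight: float = 0.5,
                                 dynamic_weight: float = 0.5) -> Vector:
    """Weight and add the static and dynamic models, then normalize the sum."""
    dimensions = set(static) | set(dynamic)
    combined = {d: static_weight * static.get(d, 0.0)
                   + dynamic_weight * dynamic.get(d, 0.0)
                for d in dimensions}
    # Assumption: Euclidean normalization of the weighted sum.
    norm = math.sqrt(sum(v * v for v in combined.values()))
    return {d: (v / norm if norm else 0.0) for d, v in combined.items()}
```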
  • the specific process of selecting the member engine according to the search request, the user's interest model, and the meta index in the foregoing steps 804, 905 or 1005 will be specifically described below.
  • the selection process specifically includes:
  • the process of obtaining the first similarity and the second similarity in steps 1401 and 1402 above may include different steps (that is, adopt different formulas or algorithms), and several specific examples are described below.
  • the 1401a steps include: Calculation ( ( qi * gidfi * mnwi + ⁇ i /
  • the 1402a step includes: calculating l ⁇ ⁇ "(if (s i m(V (picture ⁇ , ⁇ " ⁇ /' ⁇ , 1 ⁇ ⁇ / ' ⁇ ")) , Q ') > T) then
  • the 1402b step includes: calculating 1 ⁇ ⁇ " (if (sim( V( IM_gidfi*mnvi , IM _gidjj * anvj(j ⁇ i, ⁇ ⁇ j ⁇ n) ) , Q')> T ) resort qi * gidfi * anwi) then( ( ri * mnvi * IM _ gidfi + ⁇ j* anvj * IM _ gidff ) /
  • R + '. 1 /
  • Q' is calculated as follows: if the term ti falls within the range of a dimension of the user's interest model, the value of qi is mapped to the weight of that dimension, and the weights mapped to the same dimension are then added to obtain qi.
  • V is a vector consisting of $IM\_gidf_i \cdot mnv_i$ and $IM\_gidf_j \cdot anv_j$ ($j \neq i$, $1 \le j \le n$); $sim(V, Q')$ is the cosine similarity of vector V and vector Q'; T is a threshold between 0 and 1.
  • the 1401c steps include: a value of /
  • the 1402c step includes: calculating 1 ⁇ ⁇ ' ⁇ " ( if (sim( V ( mnvi, ⁇ i,l ⁇ j ⁇ n)) , Q')> T ) k
  • Q' is calculated as follows: if the term ti falls within the range of a dimension of the user's interest model, the value of qi is mapped to the weight of that dimension, and the weights mapped to the same dimension are then added to obtain qi', which is then normalized; V is a vector consisting of $mnv_i$ and $anv_j$ ($j \neq i$, $1 \le j \le n$); $sim(V, Q')$ is the cosine similarity of vector V and vector Q'; T is a threshold between 0 and 1.
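  • The mapping of the request onto interest dimensions and the threshold test on the cosine similarity can be sketched as follows. This is one reading of the description, not a reconstruction of the full formula: the term-to-dimension lookup is hypothetical, L2 normalization is assumed, and testing every index i in turn (accepting if any V exceeds T) is an interpretation introduced here.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def l2_normalize(vec: Vector) -> Vector:
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {k: (v / norm if norm else 0.0) for k, v in vec.items()}


def map_query_to_dimensions(query: Vector, term_to_dimension: Dict[str, str],
                            dimensions: List[str]) -> Vector:
    """Build Q': add up query weights per interest dimension, then normalize."""
    raw = {d: 0.0 for d in dimensions}
    for term, weight in query.items():
        dimension = term_to_dimension.get(term)
        if dimension in raw:
            raw[dimension] += weight
    return l2_normalize(raw)


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def exceeds_threshold(mnv: Vector, anv: Vector, q_prime: Vector,
                      threshold: float) -> bool:
    """Check whether sim(V, Q') > T for some dimension i."""
    for i in mnv:
        # V takes mnv at position i and anv at every other position j != i.
        v = {j: (mnv[j] if j == i else anv.get(j, 0.0)) for j in mnv}
        if cosine(v, q_prime) > threshold:
            return True
    return False
```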
  • the database of the selected member engine is highly similar to the combination of the search request and the user's interest model, which improves the accuracy of the search, saves the resources of the search system, and improves the efficiency of the search.
  • the selection of the member engine makes full use of the rich user data and is personalized, so that the member engine that best meets the user's personalized interests can be selected to serve the user; the member engine selection and scheduling are accurate, which achieves the purpose of accurate search.
  • FIG. 15 is a schematic structural diagram of still another search system according to an embodiment of the present invention.
  • the system is similar to the system described above with respect to Figure 1, and includes a search client 10, a search service subsystem 20D, and at least one member engine 301, 302 or 303.
  • the search client 10 is configured to send a search request to the search service subsystem 20D; at least one member engine 301, 302 or 303 is configured to provide the search service subsystem 20D with a meta index of the member engine, and to complete the search after receiving the search request sent by the search service subsystem 20D.
  • the search service subsystem 20D shown in Fig. 15 is composed of a search server 201D, an application server 204, and a user database 202D.
  • the user database 202D is used to store or provide personalized data for the user.
  • the application server 204 is configured to receive a search request sent by the search client, extract the interest model of the user from the user personalized data according to the search request, and send the search request and the user interest model to the search server 201D.
  • the search server 201D is configured to receive the search request and the interest model of the user sent by the application server 204, receive the meta index reported by each member engine 301, 302 or 303, select a member engine 301, 302 or 303 according to the meta index of each member engine, the search request and the interest model of the user, and send the search request to the selected member engine 301, 302 or 303.
  • FIG. 16 is a structural diagram of the search server 201D and the application server 204 in the search system architecture of FIG. 15.
  • the application server 204 includes a search request receiving module 2041, configured to receive a search request sent by the search client.
  • the interest model extraction module 2042 is configured to extract a user's interest model from the user's personalized data according to the search request. Similarly, the user's interest model may be extracted from the user's personalized data after the search request is received, or it may be extracted from the user's personalized data in advance, in which case the pre-extracted interest model of the user is obtained directly after the search request is received.
  • the application server 204 is further configured to send the search request and the user's interest model to the search server 201D.
  • the search server 201D includes: a receiving module 201D1, configured to receive the search request and the user's interest model sent by the application server.
  • the meta-index collection module 201D2 is configured to receive a meta-index reported by each member engine.
  • the member engine selection module 201D3 is configured to select a member engine according to the meta index of the respective member engines, the search request, and the interest model of the user.
  • the search request distribution module 201D4 is configured to send the search request to the selected member engine, so that the selected member engine completes the search according to the search request.
  • the search method includes:
  • the search server receives a meta index of each member engine.
  • each member engine may actively report its own meta index to the search server, or the search server may request the meta index from each member engine.
  • the application server receives a search request sent by the search client.
  • the application server extracts the user's interest model from the user's personalized data.
  • the user's interest model may be extracted from the user's personalized data after the search request is received, or it may be extracted from the user's personalized data in advance, in which case the pre-extracted interest model of the user is obtained directly after the search request is received.
  • the similarity between the combination of the search request and the interest model and the database corresponding to each member engine is calculated according to the search request, the interest model, and the meta index, and the member engines with high similarity are selected.
  • the search service subsystem can be composed of a search server, a scheduling server, or an application server, or it can be integrated on one server.
  • User databases can exist independently or on any of the aforementioned servers.
  • the different modules in the aforementioned search server, scheduling server or application server can also be integrated in any combination.
  • the step of extracting the user's interest model from the user personalized data according to the search request may cover different situations: the user's interest model is extracted from the user personalized data after the search request is received; or the user's interest model is extracted from the user personalized data in advance and, after the search request is received, the pre-extracted interest model of the user is obtained directly.
  • the user's interest model in various embodiments of the present invention can be equivalently replaced with personalization data of other various types of users.
  • the user's interest model may be an expression of the user's personalized data, and the scope of protection of the present invention is of course not limited to this form of expression.
  • member engines are selected according to the meta index of each member engine, the search request, and the personalized data of the user. In actual application scenarios, other considerations may also be taken into account when selecting the member engines, or the selected member engines may be further processed, for example integrated or filtered, and the search is then performed by the finally determined member engines.
  • a search server includes:
  • a search request receiving module configured to receive a search request sent by the search client;
  • an interest model extraction module configured to extract the user's interest model from the user personalized data according to the search request;
  • a meta index collection module configured to receive the meta index reported by each member engine;
  • a member engine selection module configured to select a member engine according to the meta index of the respective member engines, the search request, and the interest model of the user;
  • a search request distribution module configured to send the search request to the selected member engine, so that the selected member engine completes the search based on the search request.
  • a search server includes:
  • a search request receiving module configured to receive a search request sent by the search client, and send the search request to the member engine selection request sending module;
  • the interest model extraction module is configured to extract the user's interest model from the user personalized data according to the search request, and send the user's interest model to the member engine selection request sending module;
  • a member engine selection request sending module configured to send the search request and the user's interest model to a scheduling server, so that the scheduling server selects a member engine according to the meta index of the respective member engines, the search request, and the user's interest model;
  • a member engine selection result receiving module configured to receive the selected member engine returned by the scheduling server;
  • a search request distribution module configured to send the search request to the selected member engine, so that the selected member engine completes the search based on the search request.
  • a scheduling server is operative to communicate with the foregoing search server, the scheduling server comprising:
  • a member engine selection request receiving module configured to receive the search request and the interest model of the user sent by the search server;
  • a meta index collection module configured to receive the meta index reported by each member engine;
  • a member engine selection module configured to select a member engine according to the meta index of the respective member engines, the search request, and the interest model of the user;
  • a member engine selection result returning module configured to send the selected member engine to the search server, so that the search server sends the search request to the selected member engine and the selected member engine completes the search according to the search request.
  • a search server includes:
  • a search request receiving module configured to receive a search request sent by the search client, and send the search request to the member engine selection request sending module;
  • a member engine selection request sending module configured to send the search request to the scheduling server, so that the scheduling server selects a member engine according to the meta index of the respective member engines, the search request, and the user's interest model;
  • a member engine selection result receiving module configured to receive the selected member engine returned by the scheduling server;
  • a search request distribution module configured to send the search request to the selected member engine, so that the selected member engine completes the search based on the search request.
  • a scheduling server wherein the scheduling server communicates with the foregoing search server, including:
  • a member engine selection request receiving module configured to receive the search request sent by the search server;
  • an interest model extraction module configured to extract the user's interest model from the user personalized data according to the received search request, and to send the user's interest model to the member engine selection module;
  • a meta index collection module configured to receive the meta index reported by each member engine;
  • a member engine selection module configured to select a member engine according to the meta index of the respective member engines, the search request, and the user's interest model;
  • a member engine returning module configured to send the selected member engine to the search server, so that the search server sends the search request to the selected member engine and the selected member engine completes the search according to the search request.
  • a search server that includes:
  • a receiving module configured to receive the search request and the user's interest model sent by an application server;
  • a meta index collection module configured to receive the meta index reported by each member engine;
  • a member engine selection module configured to select a member engine according to the meta index of the respective member engines, the search request, and the interest model of the user;
  • a search request distribution module configured to send the search request to the selected member engine, so that the selected member engine completes the search based on the search request.
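The selection step described above (computing the similarity between the combination of the search request and the interest model and each member engine's database, then choosing the engines with high similarity) can be illustrated with a minimal Python sketch. The source does not specify a similarity measure or how the request and the interest model are combined; the cosine measure, the mixing factor `alpha`, the `top_k` cut-off and all function names below are assumptions made only for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts (assumed measure)."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def combine(request, interest, alpha=0.5):
    """Blend request and interest weights; alpha is a hypothetical mixing factor."""
    keys = set(request) | set(interest)
    return {k: alpha * request.get(k, 0.0) + (1 - alpha) * interest.get(k, 0.0)
            for k in keys}


def select_engines(request, interest, meta_indexes, top_k=2):
    """Return the ids of the top_k member engines ranked by similarity."""
    profile = combine(request, interest)
    ranked = sorted(meta_indexes.items(),
                    key=lambda item: cosine(profile, item[1]),
                    reverse=True)
    return [engine_id for engine_id, _ in ranked[:top_k]]
```

In this sketch each meta index is reduced to a dictionary of term weights. A request about "music" from a user whose interest model leans toward "travel" would rank an engine covering both topics ahead of one covering music alone.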

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A search method is provided, which includes the following steps: receiving a search request; extracting a user interest model from user personalized data according to the search request; obtaining the meta index of each member engine; selecting a member engine according to the meta index of each member engine, the search request and the user interest model; and sending the search request to the selected member engine, which completes the search. A corresponding system and related devices, such as a search server and a scheduling server, are also provided. The method improves the efficiency of the system and the precision of the search.

Description

A Search Method, System and Apparatus
This application claims priority to Chinese Patent Application No. 200810216521.5, filed with the Chinese Patent Office on September 26, 2008 and entitled "A Search Method, System and Apparatus", and to Chinese Patent Application No. 200810190595.6, filed with the Chinese Patent Office on December 24, 2008 and entitled "A Search Method, System and Apparatus", both of which are incorporated herein by reference in their entireties.
Technical Field
The present invention relates to the field of communication technologies, and in particular to a search method, system and apparatus.
Background of the Invention
With the development and progress of science, communication technology has also developed rapidly, and mobile search technology is a new highlight accompanying that development. Research on mobile search technology has become a focus of industry research and development. An important technical highlight of mobile search is precise search, that is, providing users with a personalized search service so that what the user searches for is what the user gets.
In one prior-art scheme for personalized meta search, the search server collects the meta indexes of the member engines, calculates the similarity between the search request and each member engine according to the meta indexes, selects the member engines with high similarity to serve the user, and distributes the search request to the selected member engines for searching.
In the process of implementing the present invention, the inventor found that the member engines selected in this meta-search scheme may be inaccurate, resulting in low search precision.
Summary of the Invention
To improve search precision, embodiments of the present invention provide a corresponding search method, system and apparatus.
A search method includes: receiving a search request; extracting a user's interest model from user personalized data according to the search request; obtaining the meta index of each member engine; selecting a member engine according to the meta indexes of the member engines, the search request and the user's interest model; and sending the search request to the selected member engine so that the selected member engine completes the search.
Correspondingly, a system for searching, capable of applying the above search method, includes: a search service subsystem, configured to receive a search request, receive the meta index reported by each member engine, select a member engine according to the meta indexes of the member engines, the search request and the user's interest model, and send the search request to the selected member engine; and
at least one member engine, configured to report its meta index to the search service subsystem and to complete the search after receiving the search request sent by the search service subsystem.
In the above embodiments, the user's interest model is extracted after the search request is received, and the member engines are selected according to the meta indexes of the member engines, the search request and the user's interest model. In other words, the selection of member engines takes full account of both the search request and the user's interest model, and the search is then completed by the selected member engines. The selection of member engines is thus more personalized and the selected engines are relevant to the user's interests, which improves the efficiency of system scheduling (or selection) and the precision of the search.
Brief Description of the Drawings
FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a system architecture according to another embodiment of the present invention;
FIG. 3 is a schematic structural diagram of the apparatus in the system architecture shown in FIG. 2;
FIG. 4 is a schematic diagram of a system architecture according to another embodiment of the present invention;
FIG. 5 is a schematic structural diagram of the apparatus in the system architecture shown in FIG. 4;
FIG. 6 is a schematic diagram of a system architecture according to another embodiment of the present invention;
FIG. 7 is a schematic structural diagram of the apparatus in the system architecture of FIG. 6;
FIG. 8 is a schematic flowchart of an embodiment of a search method according to the present invention;
FIG. 9 is a schematic flowchart of another embodiment of a search method according to the present invention;
FIG. 10 is a schematic flowchart of yet another embodiment of a search method according to the present invention;
FIG. 11 is a schematic flowchart of extracting a user's interest model according to the present invention;
FIG. 12 is a schematic flowchart of extracting a user's static interest model according to the present invention;
FIG. 13 is a schematic flowchart of extracting a user's dynamic interest model according to the present invention;
FIG. 14 is a schematic flowchart of selecting member engines according to the present invention;
FIG. 15 is a schematic architecture diagram of still another search system according to an embodiment of the present invention;
FIG. 16 is a structural diagram of the search server and the application server in the search system architecture of FIG. 15;
FIG. 17 is a working flowchart of the search system shown in FIG. 15.
Mode for Carrying Out the Invention
Referring to FIG. 1, which is a schematic structural diagram of an embodiment of a system for searching, the system includes:
a search client 10, configured to send a search request to the search service subsystem;
a search service subsystem 20, configured to receive the search request sent by the search client; extract the user's interest model from user personalized data according to the search request; obtain the meta index reported by each member engine; select a member engine according to the meta indexes of the member engines, the search request and the user's interest model; and send the search request to the selected member engine; and
at least one member engine, configured to provide its meta index to the search service subsystem and to complete the search after receiving the search request sent by the search service subsystem. In practice there are generally multiple member engines, for example a first member engine 301, a second member engine 302, a third member engine 303 and a fourth member engine 304.
It should be noted that, in the embodiments of the present invention, a member engine is one of the vertical search engines responsible for the actual searching in a meta-search architecture. A meta index is the statistical data that, in a meta-search architecture, describes the capability of a member engine and is used for member engine selection. Specifically, the meta index of a member engine is statistical data about the database or sub-databases corresponding to that member engine, the documents or records contained in the database or sub-databases, and the terms contained in those documents or records. In the embodiments of the present invention, the meta index of a member engine serves as one of the bases for selecting member engines.
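As a rough illustration of a meta index in the sense defined above (statistics about an engine's documents or records and the terms they contain), the following Python sketch keeps a document count and per-term document frequencies. The source does not prescribe which statistics are stored; the class `MetaIndex`, its fields and the helper `build_meta_index` are hypothetical names chosen for this example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MetaIndex:
    """Hypothetical summary statistics that one member engine reports."""
    engine_id: str
    doc_count: int = 0
    # term -> number of documents in the engine's database that contain it
    doc_freq: Counter = field(default_factory=Counter)

    def add_document(self, terms):
        """Update the statistics with the terms of one document or record."""
        self.doc_count += 1
        self.doc_freq.update(set(terms))

    def term_weight(self, term):
        """Fraction of the engine's documents that contain the term."""
        if self.doc_count == 0:
            return 0.0
        return self.doc_freq[term] / self.doc_count


def build_meta_index(engine_id, documents):
    """Build a meta index from an iterable of term lists (one per document)."""
    index = MetaIndex(engine_id)
    for terms in documents:
        index.add_document(terms)
    return index
```

A summary of this kind is small compared with the underlying database, which is one plausible reason a selecting server would work from reported meta indexes rather than from the engines' full data.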
An interest model is a vector composed of weight scores with respect to certain dimensions, and can be expressed as $R = (r_1, r_2, \ldots, r_i, \ldots, r_n)$, where $n$ denotes the number of dimensions, $r_i$ is the weight score of the $i$-th dimension of the interest model, and $i$ is a natural number. The user's interest model is a vector of such weight scores, with respect to certain dimensions, extracted from data related to the user.
In the above system for searching, the search service subsystem 20 extracts the user's interest model after receiving the search request and selects the member engines according to the meta indexes of the member engines, the search request and the user's interest model. In other words, the selection of member engines takes full account of both the search request and the user's interest model, and the search is then completed by the selected member engines. The selection of member engines is thus more personalized and the selected engines are relevant to the user's interests, which improves the efficiency of system scheduling (or selection) and the precision of the search.
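The vector form of the interest model can be made concrete with a short Python sketch. The dimension names in `DIMENSIONS` and the `normalized` helper are assumptions for illustration only; the source states only that the model is a vector of weight scores, one per dimension.

```python
from dataclasses import dataclass

# Hypothetical dimensions; the source does not name specific ones.
DIMENSIONS = ("news", "sports", "music", "travel")


@dataclass(frozen=True)
class InterestModel:
    """Interest model R = (r_1, ..., r_n) with one weight score per dimension."""
    weights: tuple

    def __post_init__(self):
        if len(self.weights) != len(DIMENSIONS):
            raise ValueError("one weight score per dimension is required")

    def weight(self, dimension):
        """Return r_i for the named dimension."""
        return self.weights[DIMENSIONS.index(dimension)]

    def normalized(self):
        """Scale the weights to sum to 1 (an assumed convention, not from the source)."""
        total = sum(self.weights)
        if total == 0:
            return self
        return InterestModel(tuple(w / total for w in self.weights))
```

For example, `InterestModel((2.0, 0.0, 6.0, 0.0))` describes a user whose interests lie mostly in the "music" dimension and partly in "news".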
Referring to FIG. 2, which is a schematic architecture diagram of another search system embodiment, the system is similar to the system of FIG. 1 and includes a search client 10, a search service subsystem 20A and at least one member engine. The search client 10 is configured to send a search request to the search service subsystem; the at least one member engine is configured to provide its meta index to the search service subsystem and to complete the search after receiving the search request sent by the search service subsystem. In practice there are generally multiple member engines, for example a first member engine 301, a second member engine 302, a third member engine 303 and a fourth member engine 304.
The search service subsystem 20A is composed of a search server 201A and a user database 202.
The user database 202 is configured to store or provide the user's personalized data.
The search server 201A is configured to receive the search request sent by the search client; extract the user's interest model from the user personalized data according to the search request; obtain the meta index reported by each member engine; select a member engine according to the meta indexes of the member engines, the search request and the user's interest model; and send the search request to the selected member engine.
Referring to FIG. 3, which is a schematic diagram of the specific structure of the apparatus in the system architecture shown in FIG. 2, the search server 201A includes:
a search request receiving module 201A1, configured to receive the search request sent by the search client;
a user interest model extraction module 201A2, configured to extract the user's interest model from the user personalized data according to the search request;
a meta index collection module 201A3, configured to obtain the meta index of each member engine;
a member engine selection module 201A4, configured to select a member engine according to the meta indexes of the member engines, the search request and the user's interest model; and
a search request distribution module 201A5, configured to send the search request to the selected member engine so that the selected member engine completes the search according to the search request.
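One way to picture how the five modules of search server 201A cooperate is the following Python sketch, in which interest extraction and engine selection are injected as plain callables. The class name `SearchServer`, the method names and the callable signatures are assumptions for illustration; the source defines only the responsibilities of the modules, not an API.

```python
class SearchServer:
    """Sketch of search server 201A; collaborators are injected callables."""

    def __init__(self, extract_interest, select, engines):
        self.extract_interest = extract_interest  # stands in for module 201A2
        self.select = select                      # stands in for module 201A4
        self.engines = engines                    # engine_id -> search callable
        self.meta_indexes = {}

    def collect_meta_index(self, engine_id, meta_index):
        """Role of module 201A3: store the meta index of a member engine."""
        self.meta_indexes[engine_id] = meta_index

    def handle(self, user_id, query):
        """Roles of modules 201A1 and 201A5: receive a request and distribute it."""
        interest = self.extract_interest(user_id, query)
        chosen = self.select(self.meta_indexes, query, interest)
        # Only the selected member engines receive the search request.
        return {engine_id: self.engines[engine_id](query) for engine_id in chosen}
```

Keeping extraction and selection behind callables mirrors the later embodiments, in which the same responsibilities move to a scheduling server or an application server without changing the overall flow.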
Referring to FIG. 4, which is a schematic architecture diagram of yet another search system embodiment, the system is similar to the system of FIG. 1 and includes a search client 10, a search service subsystem 20B and at least one member engine. The search client 10 is configured to send a search request to the search service subsystem; the at least one member engine is configured to provide its meta index to the search service subsystem, and some of the member engines complete the search after receiving the search request sent by the search service subsystem. In practice there are generally multiple member engines, for example a first member engine 301, a second member engine 302, a third member engine 303 and a fourth member engine 304.
The search service subsystem 20B is composed of a search server 201B, a scheduling server 203A and a user database 202.
Specifically, the user database 202 is configured to store or provide the user's personalized data.
The search server 201B is configured to receive the search request sent by the search client, extract the user's interest model from the user personalized data according to the search request, send the user's interest model and the search request to the scheduling server 203A, receive the member engine selected and returned by the scheduling server 203A, and send the search request to the selected member engine.
The scheduling server 203A is configured to receive the user's interest model and the search request sent by the search server 201B, and to receive the meta index reported by each member engine; select a member engine according to the meta indexes of the member engines, the search request and the user's interest model; and return the selected member engine to the search server 201B.
Referring to FIG. 5, which is a schematic structural diagram of the search server 201B and the scheduling server 203A in the system architecture shown in FIG. 4.
The search server 201B includes:
a search request receiving module 201B1, configured to receive the search request sent by the search client and send the search request to the member engine selection request sending module 201B3;
a user interest model extraction module 201B2, configured to extract the user's interest model from the user personalized data according to the search request and send the user's interest model to the member engine selection request sending module 201B3;
a member engine selection request sending module 201B3, configured to send the search request and the user's interest model to the scheduling server, so that the scheduling server selects a member engine according to the meta indexes of the member engines, the search request and the user's interest model;
a member engine selection result receiving module 201B4, configured to receive the member engine selected and returned by the scheduling server; and
a search request distribution module 201B5, configured to send the search request to the selected member engine so that the selected member engine completes the search according to the search request.
The scheduling server 203A, which communicates with the above search server 201B, includes:
a member engine selection request receiving module 203A1, configured to receive the search request and the user's interest model sent by the search server;
a meta index collection module 203A2, configured to obtain the meta index of each member engine;
a member engine selection module 203A3, configured to select a member engine according to the meta indexes of the member engines, the search request and the user's interest model; and
a member engine selection result returning module 203A4, configured to send the selected member engine to the search server, so that the search server sends the search request to the selected member engine and the selected member engine completes the search according to the search request.
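The division of labour between search server 201B and scheduling server 203A can be sketched in Python as two cooperating objects exchanging a selection request. This is only an in-process illustration under stated assumptions: the message type `SelectionRequest`, the class names, the injected `score` function and the `threshold` parameter are hypothetical, and the source does not specify the transport or message format between the two servers.

```python
from dataclasses import dataclass


@dataclass
class SelectionRequest:
    """Assumed message sent from search server 201B to scheduling server 203A."""
    query: str
    interest: dict


class SchedulingServer:
    """Sketch of 203A: collects meta indexes and selects member engines."""

    def __init__(self, score):
        self.score = score  # assumed scoring function (query, interest, meta) -> float
        self.meta_indexes = {}

    def collect_meta_index(self, engine_id, meta_index):
        self.meta_indexes[engine_id] = meta_index

    def select(self, request, threshold=0.5):
        """Return the ids of engines whose score reaches the assumed threshold."""
        return [engine_id for engine_id, meta in self.meta_indexes.items()
                if self.score(request.query, request.interest, meta) >= threshold]


class SearchServerB:
    """Sketch of 201B: extracts the interest model, delegates selection, dispatches."""

    def __init__(self, scheduler, extract_interest, engines):
        self.scheduler = scheduler
        self.extract_interest = extract_interest
        self.engines = engines  # engine_id -> search callable

    def handle(self, user_id, query):
        interest = self.extract_interest(user_id, query)
        chosen = self.scheduler.select(SelectionRequest(query, interest))
        return {engine_id: self.engines[engine_id](query) for engine_id in chosen}
```

In this arrangement the search server never sees the meta indexes; it only forwards the request and the interest model and then dispatches to whatever the scheduling server returns.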
Referring to FIG. 6, which is a schematic architecture diagram of yet another search system embodiment, the system is similar to the system of FIG. 1 and includes a search client 10, a search service subsystem 20C and at least one member engine. The search client 10 is configured to send a search request to the search service subsystem; the at least one member engine is configured to provide its meta index to the search service subsystem and to complete the search after receiving the search request sent by the search service subsystem.
The search service subsystem 20C shown in FIG. 6 is composed of a search server 201C, a scheduling server 203C and a user database 202, where:
the user database 202 is configured to store or provide the user's personalized data;
the search server 201C is configured to receive the search request sent by the search client, send the search request to the scheduling server, receive the member engine selected and returned by the scheduling server, and send the search request to the selected member engine; and
the scheduling server 203C is configured to receive the search request sent by the search server, extract the user's interest model from the user personalized data according to the search request, and obtain the meta index of each member engine; select a member engine according to the meta indexes of the member engines, the search request and the user's interest model; and return the selected member engine to the search server.
Referring to FIG. 7, which is a schematic structural diagram of the search server 201C and the scheduling server 203C in the system architecture shown in FIG. 6.
The search server 201C shown in FIG. 7 includes:
a search request receiving module 201C1, configured to receive the search request sent by the search client and send the search request to the member engine selection request sending module 201C3;
a member engine selection request sending module 201C3, configured to send the search request to the scheduling server, so that the scheduling server selects a member engine according to the meta indexes of the member engines, the search request and the user's interest model;
a member engine selection result receiving module 201C4, configured to receive the member engine selected and returned by the scheduling server; and
a search request distribution module 201C5, configured to send the search request to the selected member engine so that the selected member engine completes the search according to the search request.
如图 7中所示的一种调度服务器 203C,该调度服务器 203C可与前述的搜索服务器 201C通信, 包括  As shown in FIG. 7, a dispatch server 203C, which can communicate with the aforementioned search server 201C, includes
成员引擎选择请求接收模块 203C1 , 用于接收搜索服务器发送的搜索请求;  a member engine selection request receiving module 203C1, configured to receive a search request sent by the search server;
用户的兴趣模型提取模块 203C5,用于根据接收的搜索请求从用户个性化数据中提取该用户的 兴趣模型, 并将该用户的兴趣模型发送给成员引擎选择模块 203C3 ;  The user's interest model extraction module 203C5 is configured to extract the user's interest model from the user personalized data according to the received search request, and send the user's interest model to the member engine selection module 203C3;
元索引收集模块 203C2, 用于获取各成员引擎的元索引;  a meta index collection module 203C2, configured to obtain a meta index of each member engine;
成员引擎选择模块 203C3 , 用于根据各个成员引擎的元索引、搜索请求和用户的兴趣模型选择 成员引擎;  a member engine selection module 203C3, configured to select a member engine according to a meta index of each member engine, a search request, and a user's interest model;
成员引擎选择结果返回模块 203C4, 用于根据所述选择的成员引擎发送给所述搜索服务器, 以 便于所述搜索服务器将所述搜索请求发送给所述选择的成员引擎, 所述选择的成员引擎根据所述搜 索请求完成搜索。  a member engine selection result returning module 203C4, configured to send to the search server according to the selected member engine, so that the search server sends the search request to the selected member engine, the selected member engine The search is completed according to the search request.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. A person of ordinary skill in the art can understand and implement it without creative effort.
Because the above search system and apparatus select member engines according to the meta index of each member engine, the search request, and the user's interest model, and the search is carried out by the selected member engines, both the accuracy and the efficiency of the search are improved.
The workflows of the above search system and of the search server, scheduling server, and other apparatus are similar to the search methods described in detail below.
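The division of work between the search server 201C and the scheduling server 203C described above can be sketched roughly as follows. This is an illustrative sketch only: the class names, method names, the dictionary-based request format, and the pluggable `select_fn` policy are all assumptions made for the example, and the user database is simplified to a mapping from user ID to an already extracted interest model.

```python
class SchedulingServer:
    """Plays the role of scheduling server 203C (illustrative sketch)."""

    def __init__(self, interest_models, select_fn):
        self.meta_indexes = {}                  # engine id -> meta index (module 203C2)
        self.interest_models = interest_models  # user id -> interest model (simplified user database)
        self.select_fn = select_fn              # hypothetical selection policy (module 203C3)

    def collect_meta_index(self, engine_id, meta_index):
        # Member engines report their meta index to the scheduling server.
        self.meta_indexes[engine_id] = meta_index

    def select_member_engines(self, request):
        # Look up the user's interest model, then select member engines from
        # the meta indexes, the search request, and the interest model.
        model = self.interest_models.get(request["user_id"], [])
        return self.select_fn(request, model, self.meta_indexes)


class SearchServer:
    """Plays the role of search server 201C (illustrative sketch)."""

    def __init__(self, scheduler, engines):
        self.scheduler = scheduler
        self.engines = engines  # engine id -> callable that performs the search

    def handle(self, request):
        # Forward the request to the scheduling server, then distribute it
        # to the member engines that were selected.
        selected = self.scheduler.select_member_engines(request)
        return {engine_id: self.engines[engine_id](request) for engine_id in selected}
```

Keeping the selection policy behind a single callable mirrors the text's separation between the module that receives requests and the module that actually selects member engines.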
Referring to FIG. 8, which is a schematic flowchart of a search method according to one embodiment, the method includes:
801. The search service subsystem receives the meta index of each member engine.
Specifically, each member engine may report its meta index to the search service subsystem. Each member engine corresponds to one database, which may contain several sub-databases, and the meta index of a member engine is determined by the data in the database or sub-database corresponding to that member engine. The meta index contains statistics about the terms in the data, or statistics about the interest models of the documents in the data. A document's interest model is a vector of weight scores, extracted from the data in the document, relative to a set of dimensions. In a specific embodiment, the document interest model and the user's interest model described above should use the same dimensions.
More specifically, the meta index may include one of the following items or any combination of them:
① The term maximum normalized weight vector $mnw = (mnw_1, mnw_2, \ldots, mnw_i, \ldots, mnw_p)$, where $mnw_i$ is the maximum normalized weight of term $t_i$ relative to all documents in the database or sub-database corresponding to the member engine.
$mnw_i$ can be calculated as follows: first compute the weight of each document in the database relative to term $t_i$, the weight being the number of times the term occurs in the document; then take the maximum of these weights over all documents to obtain the maximum weight $mnw_i'$ of term $t_i$ relative to all documents in the database; finally normalize the vector $mnw' = (mnw_1', mnw_2', \ldots, mnw_i', \ldots, mnw_p')$ to obtain the normalized vector $mnw = (mnw_1, mnw_2, \ldots, mnw_i, \ldots, mnw_p)$.
② The term average normalized weight vector $anw = (anw_1, anw_2, \ldots, anw_i, \ldots, anw_p)$, where $anw_i$ is the average normalized weight of term $t_i$ relative to all documents in the database or sub-database corresponding to the member engine.
$anw_i$ can be calculated as follows: first compute the weight of each document in the database relative to term $t_i$, the weight being the number of times the term occurs in the document; then take the average of these weights over all documents to obtain the average weight $anw_i'$ of term $t_i$ relative to all documents in the database; finally normalize the vector $anw' = (anw_1', anw_2', \ldots, anw_i', \ldots, anw_p')$ to obtain the normalized vector $anw = (anw_1, anw_2, \ldots, anw_i, \ldots, anw_p)$.
③ The maximum normalized weight vector of the document interest model for the documents in the database or sub-database, $mnv = (mnv_1, mnv_2, \ldots, mnv_i, \ldots, mnv_n)$, where $mnv_i$ is the maximum normalized weight of the $i$-th dimension of the document interest model relative to all documents in the database or sub-database corresponding to the member engine.
$mnv_i$ can be calculated as follows: first compute the weight of each document in the database relative to the $i$-th dimension of the interest model, the weight being the sum of the word frequencies of all words in the document that fall within the scope of the $i$-th dimension (for example, sports); then take the maximum of these weights over all documents to obtain the maximum weight $mnv_i'$ of the $i$-th dimension relative to all documents in the database D; finally normalize the vector $mnv' = (mnv_1', mnv_2', \ldots, mnv_i', \ldots, mnv_n')$ to obtain the normalized vector $mnv = (mnv_1, mnv_2, \ldots, mnv_i, \ldots, mnv_n)$.
④ The average normalized weight vector of the document interest model for the documents in the database or sub-database, $anv = (anv_1, anv_2, \ldots, anv_i, \ldots, anv_n)$, where $anv_i$ is the average normalized weight of the $i$-th dimension of the document interest model relative to all documents in the database corresponding to the member engine.
$anv_i$ can be calculated as follows: first compute the weight of each document in the database relative to the $i$-th dimension of the interest model, the weight being the sum of the word frequencies of all words in the document that fall within the scope of the $i$-th dimension (for example, sports); then take the average of these weights over all documents to obtain the average weight $anv_i'$ of the $i$-th dimension relative to all documents in the database; finally normalize the vector $anv' = (anv_1', anv_2', \ldots, anv_i', \ldots, anv_n')$ to obtain the normalized vector $anv = (anv_1, anv_2, \ldots, anv_i, \ldots, anv_n)$.
⑤ The global inverse document frequency $gidf_i$ of term $t_i$ relative to the database, where $gidf_i = 1/df_i$ and $df_i$ is the number of documents containing term $t_i$ in the database corresponding to the meta index.
⑥ The global inverse document frequency $IM\_gidf_i$ corresponding to the $i$-th dimension of the document interest model, where $IM\_gidf_i = 1/IM\_IDF_i$ and $IM\_IDF_i$ is the number of documents in the database or sub-database that contain one or more terms belonging to the $i$-th dimension of the document interest model.
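As a rough illustration of items ①, ② and ⑤, the term statistics of one member engine's database could be computed along the following lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, documents are taken to be lists of already tokenized words, normalization is assumed to be division by the Euclidean length (the text does not say which normalization is used), and a term that occurs in no document is assigned a frequency value of 0 to avoid dividing by zero.

```python
import math
from collections import Counter


def l2_normalize(vec):
    # Assumption: "normalization" means dividing by the Euclidean length.
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec] if length > 0 else list(vec)


def build_term_meta_index(docs, terms):
    """docs: list of token lists; terms: the ordered terms t_1..t_p."""
    counts = [Counter(doc) for doc in docs]
    max_raw, avg_raw, gidf = [], [], []
    for term in terms:
        # Weight of each document for this term = number of occurrences.
        weights = [c[term] for c in counts]
        max_raw.append(max(weights, default=0))
        avg_raw.append(sum(weights) / len(weights) if weights else 0.0)
        df = sum(1 for w in weights if w > 0)   # documents containing the term
        gidf.append(1.0 / df if df else 0.0)    # gidf_i = 1 / df_i
    return {
        "mnw": l2_normalize(max_raw),  # item 1: maximum normalized weights
        "anw": l2_normalize(avg_raw),  # item 2: average normalized weights
        "gidf": gidf,                  # item 5: global inverse document frequency
    }
```

The interest-model vectors of items ③ and ④ follow the same pattern, with the per-document weight replaced by the summed frequency of all words belonging to a dimension.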
802. The search service subsystem receives the search request sent by the search client.
Generally, the search request carries the user's ID together with information such as search keywords consisting of one or more terms. In what follows, $Q = (q_1, q_2, \ldots, q_i, \ldots, q_k)$ denotes the vector corresponding to the search request, where $q_i$ is the weight of term $t_i$ in the search request and $i$, $k$ are natural numbers.
803. The search service subsystem extracts the user's interest model from the user database.
Specifically, the user database generally stores the user's personalized data, including the user's static user profile, search history, presence information, location information, and so on. Different methods can be used to extract the user's interest model from different kinds of data, as detailed later.
804. Member engines are selected according to the search request, the user's interest model, and the meta indexes.
Specifically, the selection step takes into account the search request, the meta indexes, and above all the user's interest model; the specific process is detailed later.
805. The search request is distributed to the selected member engines.
Referring to FIG. 9, which is a schematic flowchart of another search method, the method includes:
901. The scheduling server receives the meta index of each member engine.
Specifically, each member engine may report its meta index to the scheduling server.
902. The search server receives the search request sent by the search client.
Generally, the search request carries the user's ID together with information such as search keywords consisting of one or more terms. Steps 901 and 902 need not occur in any particular order.
903. After receiving the search request, the search server extracts the user's interest model from the user personalized data in the user database.
904. The search server submits the search request and the user's interest model to the scheduling server.
905. The scheduling server selects member engines according to the search request, the user's interest model, and the meta indexes.
906. The scheduling server sends the selected member engines to the search server.
907. The search server sends the search request to the selected member engines, so that the member engines complete the search.
Referring to FIG. 10, which shows another, similar search method, the method includes:
1001. The scheduling server receives the meta index of each member engine.
Specifically, each member engine may report its meta index to the scheduling server.
1002. The search client sends a search request to the search server.
Generally, the search request carries the user's ID together with information such as search keywords consisting of one or more terms. Steps 1001 and 1002 need not occur in any particular order.
1003. After receiving the search request, the search server sends it to the scheduling server.
1004. After receiving the search request, the scheduling server extracts the user's interest model from the user personalized data in the user database.
1005. The scheduling server selects member engines according to the search request, the user's interest model, and the meta indexes.
1006. The scheduling server returns the selected member engines to the search server.
1007. After receiving the selected member engines, the search server distributes the search request to them.
The following describes the specific method used in steps 803, 903, or 1004 to extract the user's interest model from the user personalized data in the user database.
Referring to FIG. 11, a method for extracting the user's interest model from the user personalized data includes:
1101. Representing the user's interests with a number of interest dimensions.
Examples of interest dimensions are: news, sports, entertainment, finance, technology, real estate, games, women, forums, weather, merchandise, home appliances, music, reading, blogs, mobile phones, military, education, travel, MMS, ring-back tones, catering, civil aviation, industry, agriculture, computers, geography, and so on. An actual implementation is of course not limited to these interest dimensions.
1102. Giving a score for each interest dimension.
1103. Forming a vector from the scores for the interest dimensions; this vector is the user's interest model.
The user's interest model can be expressed as $R = (r_1, r_2, \ldots, r_i, \ldots, r_n)$, where $n$ is the number of dimensions, $r_i$ is the weight score of the $i$-th dimension of the user's interest model, and $i$ is a natural number.
Specifically, the user personalized data may be the user's static user profile, in which case the resulting interest model may be called the user's static interest model. Referring to FIG. 12, the method specifically includes:
1201. Obtaining the word frequency of each word in the user's static user profile that belongs to a given interest dimension.
1202. Computing the sum of the word frequencies of all words belonging to that interest dimension, and using it as the score of that interest dimension.
1203. Forming a score vector from the scores; this score vector is the user's static interest model.
The user's static interest model can be expressed as follows: the interest model corresponding to the user's static user profile is $R_1 = (p_1, p_2, p_3, \ldots, p_i)$, where $p_i$ is the sum of the word frequencies of all words in the static user profile that belong to the $i$-th interest dimension, and $i$ is a natural number.
Specifically, the user personalized data may also be the user's search history, in which case the resulting interest model may be called the user's dynamic interest model. Referring to FIG. 13, the method may specifically include:
1301. Obtaining the word frequency of each word belonging to a given interest dimension in a document clicked in the user's search history.
1302. Computing the sum of the word frequencies of all words in the document that belong to that interest dimension, and using this sum as the document's score for that interest dimension.
1303. Forming a score vector for the document from its scores for the interest dimensions.
1304. Taking the sum of the score vectors of the clicked documents across the search history as the user's dynamic interest model.
The user's dynamic interest model can be expressed as follows: the interest model corresponding to the user's search click history is $R_2 = d_1 + d_2 + d_3 + \cdots + d_i$, where $d_i = (t_1, t_2, t_3, \ldots, t_j)$ is the interest model vector corresponding to a document $i$ clicked by the user; when the user has clicked document $i$, $t_j$ equals the sum of the word frequencies of all words in document $i$ that belong to the $j$-th interest dimension, and $i$, $j$ are natural numbers.
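The static and dynamic interest models described above can be sketched as follows. The sketch assumes, purely for illustration, that each interest dimension is defined by a set of words (`dimension_words` is a hypothetical lexicon, not something the text specifies) and that profiles and documents are already tokenized; the function names are likewise hypothetical.

```python
from collections import Counter


def score_text(words, dimension_words):
    """Score vector for one text: per dimension, the summed frequency of
    all words of the text that belong to that dimension."""
    counts = Counter(words)
    return [sum(counts[w] for w in vocab) for vocab in dimension_words]


def static_interest_model(profile_words, dimension_words):
    # R1 = (p_1, ..., p_n), computed from the static user profile.
    return score_text(profile_words, dimension_words)


def dynamic_interest_model(clicked_docs, dimension_words):
    # R2 = d_1 + d_2 + ..., one score vector d_i per clicked document.
    total = [0] * len(dimension_words)
    for doc in clicked_docs:
        d = score_text(doc, dimension_words)
        total = [a + b for a, b in zip(total, d)]
    return total
```

For example, with the two dimensions `[{"football", "goal"}, {"stock"}]`, a profile containing "football" once, "goal" twice and "stock" once yields the static model `[3, 1]`.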
When the user personalized data is the search history, the user's interest model may, in specific examples, be further corrected according to the circumstances, including:
When the user rates the clicked document well, positively weighting the score vector for that document.
This step can be expressed as follows: if the clicked document $i$ is rated well, the vector $d_i$ is multiplied by a positive constant $c$ to indicate that the importance of the document has increased: $d_i = c \cdot d_i = (c \cdot t_1, c \cdot t_2, c \cdot t_3, \ldots, c \cdot t_n)$. Or,
When the user rates the clicked document poorly, inversely weighting the score vector for that document.
This step can be expressed as follows: if the clicked document $i$ is rated poorly, the vector $d_i$ is multiplied by the reciprocal of a positive constant $c$ to indicate that the importance of the document has decreased: $d_i = (1/c) \cdot d_i = (t_1/c, t_2/c, t_3/c, \ldots, t_n/c)$.
Decreasing the score vector for the document according to the time elapsed since the document was clicked.
For example, after a period of time the value of $t_j$ is automatically reduced by a certain percentage, indicating that its importance weakens over time, until after a longer period the value of $t_j$ reaches zero, at which point $d_i$ is deleted from the user's search history. For example, for every month that passes after the click, the value of $t_j$ is reduced by 10%; after 10 consecutive months without a new click, the value of $t_j$ is reduced to zero.
In implementation, the user personalized data may include both the user's static user profile and the user's search history. In that case, the resulting interest model may be called the user's comprehensive interest model, and the method for extracting the user's interest model may further include:
normalizing the static interest model and the dynamic interest model separately, and taking the sum of the two normalized models as the comprehensive interest model; or
first computing a weighted sum of the static interest model and the dynamic interest model, then normalizing that sum, the normalized result being the comprehensive interest model.
The following describes the specific process used in steps 804, 905, or 1005 to select member engines according to the search request, the user's interest model, and the meta indexes.
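The correction and combination rules above can be sketched as follows. Several details here are assumptions made for illustration: the function names and the `"good"`/`"bad"` rating labels are hypothetical, the constant `c` is taken to be greater than 1 so that multiplying by it increases importance, the monthly reduction is read as 10% of the original value per month (which is what makes the score reach zero after 10 months), normalization is assumed to be Euclidean, and the weights `w1`, `w2` are free parameters.

```python
import math


def apply_feedback(d, rating, c=2.0):
    """Scale a document score vector by the user's rating of the document."""
    if rating == "good":
        return [c * t for t in d]      # d_i = c * d_i
    if rating == "bad":
        return [t / c for t in d]      # d_i = (1/c) * d_i
    return list(d)


def apply_decay(d, months_elapsed, monthly_rate=0.10):
    """Linear decay: lose monthly_rate of the original score per month."""
    factor = max(0.0, 1.0 - monthly_rate * months_elapsed)
    return [t * factor for t in d]


def normalize(vec):
    length = math.sqrt(sum(x * x for x in vec))
    return [x / length for x in vec] if length > 0 else list(vec)


def combine_normalize_then_add(r1, r2):
    # First alternative: normalize each model, then add them.
    return [a + b for a, b in zip(normalize(r1), normalize(r2))]


def combine_weighted_then_normalize(r1, r2, w1=0.5, w2=0.5):
    # Second alternative: weighted sum first, then normalize the result.
    return normalize([w1 * a + w2 * b for a, b in zip(r1, r2)])
```

A document whose decayed vector has reached all zeros would then be dropped from the search history, as the text describes.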
Referring to FIG. 14, the selection process specifically includes:
1401. Obtaining a first similarity between a first document in a database and the search request vector $Q$, where the first document is the one with the highest similarity to the search request vector $Q$; here the search request vector is $Q = (q_1, q_2, \ldots, q_i, \ldots, q_k)$ and $q_i$ is the weight of term $t_i$ in the search request.
1402. Obtaining a second similarity between a second document in the database and the user's interest model $R$, where the second document is the one with the highest similarity to the user's interest model vector $R$ among those whose degree of match with the vector $Q' = (q_1', q_2', \ldots, q_m')$ satisfies a specified threshold; here $Q'$ is the form of the search request vector $Q$ converted onto the dimensions of the user's interest model vector $R$, the user's interest model vector is $R = (r_1, r_2, \ldots, r_i, \ldots, r_n)$, and $r_i$ is the weight score of the $i$-th dimension of the user's interest model.
1403. Taking the larger of the first similarity and the second similarity as the similarity between the database and the combination of the search request and the user's interest model.
1404. Repeating the above to obtain the similarity between each database and the combination of the search request and the user's interest model, where each database corresponds to one member engine.
1405. Ranking the databases by this similarity and selecting the member engines corresponding to the one or more top-ranked databases with the greatest similarity.
Specifically, the process of obtaining the first similarity and the second similarity in steps 1401 to 1402 may involve different steps (that is, different formulas or algorithms). Several specific examples are given below.
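The combination and ranking in steps 1403 to 1405 can be sketched as follows, assuming the first and second similarity of each database have already been computed. The function name, the dictionary input format, and the `top_k` parameter are hypothetical choices for this example.

```python
def select_member_engines(similarities, top_k=1):
    """similarities: engine id -> (first_similarity, second_similarity)."""
    # Step 1403: the larger of the two similarities represents the database.
    combined = {engine: max(s1, s2) for engine, (s1, s2) in similarities.items()}
    # Steps 1404 and 1405: rank all databases and keep the best-matching ones.
    ranked = sorted(combined, key=combined.get, reverse=True)
    return ranked[:top_k]
```

With the input `{"news": (0.2, 0.5), "sports": (0.9, 0.1), "music": (0.3, 0.3)}` and `top_k=2`, the combined similarities are 0.5, 0.9 and 0.3, so the engines for "sports" and "news" are selected, in that order.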
In one case:
Step 1401a includes calculating the value of
$$\max_{1 \le i \le k}\left(\frac{q_i \cdot gidf_i \cdot mnw_i + \sum_{j=1,\, j \ne i}^{k} q_j \cdot gidf_j \cdot anw_j}{|Q|} + \frac{\sum_{j=1}^{n} r_j \cdot anv_j}{|R|}\right)$$
where $|Q|$ is the modulus of the search request vector $Q$ and $|R|$ is the modulus of the user's interest model $R$; this value is used as the first similarity.
Step 1402a includes calculating the value of
$$\max_{1 \le i \le n}\left(\text{if } sim\big(V(mnv_i,\ anv_j\,(j \ne i,\ 1 \le j \le n)),\ Q'\big) > T \ \text{ then } \ \frac{r_i \cdot mnv_i + \sum_{j=1,\, j \ne i}^{n} r_j \cdot anv_j}{|R|} + \frac{\sum_{l=1}^{k} q_l \cdot gidf_l \cdot anw_l}{|Q|}\right)$$
and this value is used as the second similarity.
Here $Q'$ is calculated as follows: if term $t_i$ falls within the scope of some dimension of the user's interest model, the value of $q_i$ is mapped to a weight on that dimension; the weights on the same dimension are then added to obtain $q_i'$, and the result is normalized. $V$ is the vector composed of $mnv_i$ and $anv_j\,(j \ne i,\ 1 \le j \le n)$; $sim(V, Q')$ is the cosine similarity of vector $V$ and vector $Q'$; $T$ is a threshold with $0 < T \le 1$; and $i$, $k$, $j$, $n$ are natural numbers.
In another case:
Step 1401b includes calculating the value of
$$\max_{1 \le i \le k}\left(\frac{q_i \cdot gidf_i \cdot mnw_i + \sum_{j=1,\, j \ne i}^{k} q_j \cdot gidf_j \cdot anw_j}{|Q|} + \frac{\sum_{j=1}^{n} r_j \cdot anv_j \cdot IM\_gidf_j}{|R|}\right)$$
where $|Q|$ is the modulus of the search request vector $Q$ and $|R|$ is the modulus of the user's interest model $R$; this value is used as the first similarity.
Step 1402b includes calculating the value of
$$\max_{1 \le i \le n}\left(\text{if } sim\big(V(IM\_gidf_i \cdot mnv_i,\ IM\_gidf_j \cdot anv_j\,(j \ne i,\ 1 \le j \le n)),\ Q'\big) > T \ \text{ then } \ \frac{r_i \cdot mnv_i \cdot IM\_gidf_i + \sum_{j=1,\, j \ne i}^{n} r_j \cdot anv_j \cdot IM\_gidf_j}{|R|} + \frac{\sum_{l=1}^{k} q_l \cdot gidf_l \cdot anw_l}{|Q|}\right)$$
and this value is used as the second similarity.
Here $Q'$ is calculated as follows: if term $t_i$ falls within the scope of some dimension of the user's interest model, the value of $q_i$ is mapped to a weight on that dimension; the weights on the same dimension are then added to obtain $q_i'$, and the result is normalized. $V$ is the vector composed of $IM\_gidf_i \cdot mnv_i$ and $IM\_gidf_j \cdot anv_j\,(j \ne i,\ 1 \le j \le n)$; $sim(V, Q')$ is the cosine similarity of vector $V$ and vector $Q'$; and $T$ is a threshold with $0 < T \le 1$.
In yet another case:
Step 1401c includes calculating the value of the expression given in [formula images imgf000013_0001 and imgf000013_0002], where $|Q|$ is the modulus of the search request vector $Q$ and $|R|$ is the modulus of the user's interest model $R$; this value is used as the first similarity.
Step 1402c includes calculating the value of
$$\max_{1 \le i \le n}\left(\text{if } sim\big(V(mnv_i,\ anv_j\,(j \ne i,\ 1 \le j \le n)),\ Q'\big) > T \ \text{ then } \ \frac{r_i \cdot mnv_i \cdot IM\_gidf_i + \sum_{j=1,\, j \ne i}^{n} r_j \cdot anv_j \cdot IM\_gidf_j}{|R|} + \frac{\sum_{l=1}^{k} q_l \cdot gidf_l \cdot anw_l}{|Q|}\right)$$
(part of the similarity argument appears only as formula image imgf000013_0003), and this value is used as the second similarity.
Here $Q'$ is calculated as follows: if term $t_i$ falls within the scope of some dimension of the user's interest model, the value of $q_i$ is mapped to a weight on that dimension; the weights on the same dimension are then added to obtain $q_i'$, and the result is normalized. $V$ is the vector composed of $mnv_i$ and $anv_j\,(j \ne i,\ 1 \le j \le n)$; $sim(V, Q')$ is the cosine similarity of vector $V$ and vector $Q'$; and $T$ is a threshold with $0 < T \le 1$.
Through the calculation and selection of steps 1401 to 1405, the databases corresponding to the selected member engines have a good similarity to the combination of the search request and the user's interest model. This improves the accuracy of the search while saving the resources of the search system and improving the efficiency of the search.
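The first of the cases above (steps 1401a and 1402a) can be sketched as follows. This is a sketch under stated assumptions rather than a definitive implementation: all function names are hypothetical, `term_dimension` is an assumed lookup giving the interest dimension index of each query term (or `None`), zero-length vectors are guarded against, and when no dimension passes the threshold $T$ the second similarity is taken to be 0, which the text does not specify.

```python
import math


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def cosine(u, v):
    nu, nv = norm(u), norm(v)
    return sum(a * b for a, b in zip(u, v)) / (nu * nv) if nu and nv else 0.0


def map_query_to_dimensions(q, term_dimension, n):
    """Build Q': add each query term's weight onto its dimension, then normalize."""
    q_prime = [0.0] * n
    for weight, dim in zip(q, term_dimension):
        if dim is not None:
            q_prime[dim] += weight
    length = norm(q_prime)
    return [x / length for x in q_prime] if length else q_prime


def first_similarity(q, gidf, mnw, anw, r, anv):
    """Step 1401a: term i uses its maximum weight, all other terms their average."""
    nq, nr = norm(q) or 1.0, norm(r) or 1.0
    interest = sum(rj * aj for rj, aj in zip(r, anv)) / nr
    k = len(q)
    best = 0.0
    for i in range(k):
        s = q[i] * gidf[i] * mnw[i]
        s += sum(q[j] * gidf[j] * anw[j] for j in range(k) if j != i)
        best = max(best, s / nq + interest)
    return best


def second_similarity(q, gidf, anw, r, mnv, anv, q_prime, threshold):
    """Step 1402a: dimension i counts only if V matches Q' above the threshold T."""
    nq, nr = norm(q) or 1.0, norm(r) or 1.0
    query_part = sum(ql * g * a for ql, g, a in zip(q, gidf, anw)) / nq
    n = len(r)
    best = 0.0
    for i in range(n):
        v = [mnv[j] if j == i else anv[j] for j in range(n)]
        if cosine(v, q_prime) > threshold:
            s = r[i] * mnv[i] + sum(r[j] * anv[j] for j in range(n) if j != i)
            best = max(best, s / nr + query_part)
    return best
```

The other two cases differ only in the additional $IM\_gidf$ factors, so they could reuse the same structure with the corresponding products substituted.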
From the description of the above embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software together with a necessary general-purpose hardware platform, or, of course, by hardware. Based on this understanding, the essence of the above technical solutions, or the part that contributes to the prior art, can be embodied as a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.

In the foregoing embodiments, member engine selection makes full use of rich user data and is personalized, so the member engines that best match the user's individual interests can be selected to serve the user. Member engines are selected and scheduled accurately, which achieves precise search.

The embodiments described above do not limit the scope of protection of the technical solution. Any modification, equivalent substitution, or improvement made within the spirit and principles of the above embodiments shall fall within the scope of protection of the technical solution.
Another embodiment of the present invention is similar to the foregoing embodiments, except for the information contained in the meta index:

The global inverse document frequency of the term ti relative to the database may be replaced by gidfi = log(n/(gdfi + 1)), where gdfi is the total number of documents containing the term ti in the databases or sub-databases of all member engines, and n is the total number of documents contained in all member engines.

The global inverse document frequency corresponding to the i-th dimension of the documents' interest model may be replaced by IM_gidfi = log(n/(IM_gdfi + 1)), where IM_gdfi is the total number of documents, in the databases or sub-databases of all member engines, that contain terms belonging to the i-th dimension of the documents' interest model, and n is the total number of documents contained in all member engines.

Figure 15 is a schematic architecture diagram of yet another search system provided by an embodiment of the present invention. The system is similar to the system of Figure 1 and includes a search client 10, a search service subsystem 20D, and at least one member engine 301, 302, or 303. The search client 10 is configured to send a search request to the search service subsystem 20D. The at least one member engine 301, 302, or 303 is configured to provide its meta index to the search service subsystem 20D and to complete the search after receiving the search request sent by the search service subsystem 20D.
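Returning briefly to the replacement global inverse document frequency defined at the start of this embodiment, the formula log(n/(gdf + 1)) can be illustrated with a short Python sketch. The function name and argument layout are hypothetical, and the natural logarithm is an assumption, since the text does not state the base. The same function applies to IM_gidfi when the per-engine counts are those of documents containing terms of the i-th interest dimension.

```python
import math

def global_idf(docs_with_term_per_engine, total_docs_per_engine):
    """gidf = log(n / (gdf + 1)), summed over all member engines.

    docs_with_term_per_engine: per-engine count of documents containing the term
    total_docs_per_engine:     per-engine count of all documents
    """
    gdf = sum(docs_with_term_per_engine)  # documents containing the term, all engines
    n = sum(total_docs_per_engine)        # all documents, all engines
    return math.log(n / (gdf + 1))        # the +1 keeps the ratio defined when gdf == 0
```

For instance, with two engines holding 60 and 40 documents, of which 9 and 0 contain the term, the call `global_idf([9, 0], [60, 40])` evaluates log(100/10).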
The search service subsystem 20D shown in Figure 15 consists of a search server 201D, an application server 204, and a user database 202D.

The user database 202D is configured to store or provide the user's personalized data.

The application server 204 is configured to receive the search request sent by the search client 10, extract the user's interest model from the user's personalized data according to the search request, and send the search request and the user's interest model to the search server 201D.

The search server 201D is configured to receive the search request and the user's interest model sent by the application server 204; receive the meta indexes reported by the member engines 301, 302, or 303; select a member engine 301, 302, or 303 according to the meta indexes of the member engines, the search request, and the user's interest model; and send the search request to the selected member engine 301, 302, or 303.

Figure 16 is a structural diagram of the search server 201D and the application server 204 in the search system architecture of Figure 15.

The application server 204 includes a search request receiving module 2041, configured to receive the search request sent by the search client; an interest model extraction module 2042, configured to extract the user's interest model from the user's personalized data according to the search request; and a sending module 2043, configured to send the search request and the user's interest model to the search server 201D. As before, the interest model may be extracted from the user's personalized data after the search request is received, or it may be extracted in advance and the pre-extracted model retrieved directly once the search request is received.

The search server 201D includes a receiving module 201D1, configured to receive the search request and the user's interest model sent by the application server; a meta index collection module 201D2, configured to receive the meta indexes reported by the member engines; a member engine selection module 201D3, configured to select a member engine according to the meta indexes of the member engines, the search request, and the user's interest model; and a search request distribution module 201D4, configured to send the search request to the selected member engine so that the selected member engine completes the search according to the search request.
Figure 17 is a workflow diagram of the search system shown in Figure 15. The search method includes:

1700. The search server receives the meta index of each member engine.

Specifically, each member engine may actively report its own meta index to the search server, or the search server may request the meta index from each member engine.

1701. The application server receives the search request sent by the search client.

1702. The application server extracts the user's interest model from the user's personalized data.

Specifically, the user's interest model may be extracted from the user's personalized data after the search request is received, or the interest model may be extracted from the user's personalized data in advance and the pre-extracted model retrieved directly once the search request is received.

1703. The application server sends the user's interest model and the search request to the search server.

1704. The search server selects a member engine according to the search request, the user's interest model, and the meta indexes.

Specifically, the similarity between the combination of the search request and the interest model and the database of each member engine is calculated from the search request, the interest model, and the meta index, and the member engines with high similarity are selected. For the specific selection method, refer to the foregoing embodiments.

1705. The search server sends the search request to the selected member engine so that the selected member engine completes the search.

Those skilled in the art will also understand that, in the above system and server embodiments, some or all of the modules may be integrated or may be deployed separately. For example, the functions of the search service subsystem may be made up of a search server, a scheduling server, or an application server, or may be integrated on a single server. The user database may exist independently or may be integrated on any of the foregoing servers. The different modules in the foregoing search server, scheduling server, or application server may also be combined and integrated in any way.
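The flow of steps 1700 to 1705 can be sketched in Python as follows. This is only an illustration of the message flow between the application server, the search server, and the member engines. The class and method names are hypothetical, the user database is reduced to a dictionary of pre-extracted interest models, and the similarity function is passed in as a parameter (`toy_score` is a stand-in, not the similarity defined in the embodiments).

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class MemberEngine:
    name: str
    meta_index: dict

    def search(self, request: str) -> str:
        # Placeholder for the engine's own search over its database.
        return f"{self.name} results for {request!r}"

@dataclass
class SearchServer:
    score: Callable[[str, list, dict], float]  # similarity function (assumed)
    top: int = 1
    engines: Dict[str, MemberEngine] = field(default_factory=dict)
    meta: Dict[str, dict] = field(default_factory=dict)

    def collect_meta_index(self, engine: MemberEngine) -> None:
        # Step 1700: a member engine reports its meta index.
        self.engines[engine.name] = engine
        self.meta[engine.name] = engine.meta_index

    def handle(self, request: str, interest_model: list) -> List[str]:
        # Step 1704: rank engines by similarity and keep the best ones.
        ranked = sorted(
            self.meta,
            key=lambda n: self.score(request, interest_model, self.meta[n]),
            reverse=True,
        )
        # Step 1705: forward the request to the selected engines.
        return [self.engines[n].search(request) for n in ranked[: self.top]]

@dataclass
class ApplicationServer:
    search_server: SearchServer
    interest_models: Dict[str, list]  # stand-in for the user database

    def on_search(self, user_id: str, request: str) -> List[str]:
        # Steps 1701 to 1703: receive the request, look up the user's
        # interest model, and pass both to the search server.
        model = self.interest_models[user_id]
        return self.search_server.handle(request, model)

def toy_score(request, model, meta):
    # Illustrative similarity: dot product of the interest model with the
    # engine's average interest weights.
    return sum(r * a for r, a in zip(model, meta["anv"]))
```

A usage example under these assumptions: register two engines with `collect_meta_index`, build an `ApplicationServer` around the `SearchServer`, and call `on_search("u1", "world cup")`; the request is forwarded only to the engine whose meta index best matches the interest model of user `u1`.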
It should be added that, as those skilled in the art will understand, in the above method, system, and server embodiments the step of extracting the user's interest model from the user personalized data according to the search request may cover different cases: the user's interest model is extracted from the user personalized data after the search request is received; or the user's interest model is extracted from the user personalized data in advance, and the pre-extracted interest model is retrieved directly after the search request is received.
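The two cases can be sketched in Python as follows. The sketch assumes the interest model is built as in claims 6 and 8: a static model from word frequencies in the user's static profile, a dynamic model summed over clicked documents, each normalized and then added. The Euclidean normalization, the representation of dimensions as sets of words, and all names are assumptions made for illustration.

```python
import math
from collections import Counter

def score_vector(words, dimensions):
    """One score per interest dimension: the summed frequency of the words
    that belong to that dimension (dimensions is a list of word sets)."""
    counts = Counter(words)
    return [sum(counts[w] for w in vocab) for vocab in dimensions]

def normalize(v):
    """Scale to unit Euclidean length (assumed); zero vectors are left as is."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v] if length else list(v)

def extract_interest_model(profile_words, clicked_docs, dimensions):
    static = score_vector(profile_words, dimensions)
    dynamic = [0] * len(dimensions)
    for doc in clicked_docs:  # sum the per-document score vectors
        dynamic = [a + b for a, b in zip(dynamic, score_vector(doc, dimensions))]
    # Normalize each model first, then add them.
    return [s + d for s, d in zip(normalize(static), normalize(dynamic))]

class InterestModelStore:
    """Serves interest models either on demand or from a pre-extracted cache."""

    def __init__(self, user_data, dimensions):
        self.user_data = user_data    # user_id -> (profile_words, clicked_docs)
        self.dimensions = dimensions
        self.cache = {}

    def precompute(self, user_id):
        # Case 2: extract in advance, before any search request arrives.
        self.cache[user_id] = extract_interest_model(
            *self.user_data[user_id], self.dimensions)

    def on_search_request(self, user_id):
        if user_id in self.cache:     # case 2: retrieve the pre-extracted model
            return self.cache[user_id]
        # Case 1: extract after the search request is received.
        return extract_interest_model(*self.user_data[user_id], self.dimensions)
```

Pre-extraction trades freshness for latency: a cached model does not reflect clicks made after `precompute` ran, so a deployment following this sketch would need to decide when to refresh it.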
Those skilled in the art will understand that the user's interest model in the embodiments of the present invention may be equivalently replaced by other types of user personalized data. The user's interest model may be one form of expression of the user's personalized data, and the scope of protection of the present invention is of course not limited to this form of expression. Those skilled in the art will also understand that, where the embodiments select a member engine according to the meta indexes of the member engines, the search request, and the user's personalized data, an actual application may take additional factors into account when selecting member engines, or may further process the selected member engines, for example by merging or filtering them, before the finally determined member engines perform the search.
Based on the description of the above embodiments, the present invention includes the following specific embodiments:

A search server, comprising:

a search request receiving module, configured to receive a search request sent by a search client;

an interest model extraction module, configured to extract the user's interest model from user personalized data according to the search request;

a meta index collection module, configured to receive the meta indexes reported by the member engines;

a member engine selection module, configured to select a member engine according to the meta indexes of the member engines, the search request, and the user's interest model; and

a search request distribution module, configured to send the search request to the selected member engine so that the selected member engine completes the search according to the search request.

A search server, comprising:

a search request receiving module, configured to receive a search request sent by a search client and to send the search request to a member engine selection request sending module;

an interest model extraction module, configured to extract the user's interest model from user personalized data according to the search request and to send the user's interest model to the member engine selection request sending module;

the member engine selection request sending module, configured to send the search request and the user's interest model to a scheduling server so that the scheduling server selects a member engine according to the meta indexes of the member engines, the search request, and the user's interest model;

a member engine selection result receiving module, configured to receive the selected member engine returned by the scheduling server; and

a search request distribution module, configured to send the search request to the selected member engine so that the selected member engine completes the search according to the search request.

A scheduling server capable of communicating with the foregoing search server, comprising:

a member engine selection request receiving module, configured to receive the search request and the user's interest model sent by the search server;

a meta index collection module, configured to receive the meta indexes reported by the member engines;

a member engine selection module, configured to select a member engine according to the meta indexes of the member engines, the search request, and the user's interest model; and

a member engine selection result returning module, configured to send the selected member engine to the search server so that the search server sends the search request to the selected member engine and the selected member engine completes the search according to the search request.

A search server, comprising:

a search request receiving module, configured to receive a search request sent by a search client and to send the search request to a member engine selection request sending module;

the member engine selection request sending module, configured to send the search request to a scheduling server so that the scheduling server selects a member engine according to the meta indexes of the member engines, the search request, and the user's interest model;

a member engine selection result receiving module, configured to receive the selected member engine returned by the scheduling server; and

a search request distribution module, configured to send the search request to the selected member engine so that the selected member engine completes the search according to the search request.

A scheduling server that communicates with the foregoing search server, comprising:

a member engine selection request receiving module, configured to receive the search request sent by the search server;

an interest model extraction module, configured to extract the user's interest model from user personalized data according to the received search request and to send the user's interest model to a member engine selection module;

a meta index collection module, configured to receive the meta indexes reported by the member engines;

the member engine selection module, configured to select a member engine according to the meta indexes of the member engines, the search request, and the user's interest model; and

a member engine returning module, configured to send the selected member engine to the search server so that the search server sends the search request to the selected member engine and the selected member engine completes the search according to the search request.

A search server, comprising:

a receiving module, configured to receive a search request and a user's interest model sent by an application server;

a meta index collection module, configured to receive the meta indexes reported by the member engines;

a member engine selection module, configured to select a member engine according to the meta indexes of the member engines, the search request, and the user's interest model; and

a search request distribution module, configured to send the search request to the selected member engine so that the selected member engine completes the search according to the search request.

Claims

1. A search method, characterized in that the method comprises:

receiving a search request;

extracting a user's interest model from user personalized data according to the search request;

obtaining the meta index of each member engine;

selecting a member engine according to the meta indexes of the member engines, the search request, and the user's interest model; and

sending the search request to the selected member engine so that the selected member engine completes the search.
2. The method according to claim 1, characterized in that:

the steps of receiving the search request, extracting the user's interest model, and sending the search request to the selected member engine are performed by a search server; and

the step of selecting a member engine specifically comprises:

the search server sending the search request and the user's interest model to a scheduling server;

the scheduling server selecting a member engine according to the meta indexes of the member engines, the search request, and the user's interest model; and

the scheduling server sending the selected member engine to the search server.

3. The method according to claim 1, characterized in that:

the steps of receiving the search request and sending the search request to the selected member engine are performed by a search server; and

the steps of extracting the user's interest model and selecting a member engine specifically comprise:

the search server sending the search request to a scheduling server;

the scheduling server extracting the user's interest model from the user personalized data according to the search request, and selecting a member engine according to the meta indexes of the member engines, the search request, and the user's interest model; and

the scheduling server sending the selected member engine to the search server.

4. The method according to claim 1, characterized in that:

the steps of receiving the search request and extracting the user's interest model from the user personalized data according to the search request are performed by an application server; and

the step of selecting a member engine comprises:

the application server sending the search request and the user's interest model to a search server; and

the search server selecting a member engine according to the meta indexes of the member engines, the search request, and the user's interest model.

5. The method according to any one of claims 1 to 4, characterized in that extracting the user's interest model from the user personalized data according to the search request comprises:

extracting the user's interest model from the user personalized data after the search request is received; or extracting the user's interest model from the user personalized data in advance and, after the search request is received, directly retrieving the pre-extracted interest model.
6. The method according to any one of claims 1 to 4, characterized in that the user's interest model represents the user's interests by a number of interest dimensions, gives a score value for each interest dimension, and is the vector composed of the score values for the interest dimensions; the user's interest model includes a static interest model and a dynamic interest model;

the static interest model is obtained as follows: the word frequency of each word in the user's static user profile that belongs to an interest dimension is obtained, and the sum of the word frequencies of all words belonging to that interest dimension is calculated as the score value of that interest dimension; the score values form a score value vector, and this score value vector is the user's interest model, referred to here as the static interest model;

the dynamic interest model is obtained as follows: for a clicked document in the user's search history, the word frequency of each word in the document that belongs to an interest dimension is obtained, and the sum of the word frequencies of all words in the document belonging to that interest dimension is calculated as the document's score value for that interest dimension; the score values for the interest dimensions form a score value vector for the document, and the sum of these score value vectors over the different documents is the user's interest model, referred to here as the dynamic interest model.

7. The method according to claim 6, characterized in that the method further comprises:

positively weighting the score value vector for the clicked document when the user rates the document well; or

negatively weighting the score value vector for the clicked document when the user rates the document poorly; or

decreasing the score value vector for the document according to the time elapsed since the document was clicked.

8. The method according to claim 6, characterized in that the step of extracting the user's interest model from the user personalized data further comprises:

first normalizing the static interest model and the dynamic interest model separately, then calculating the sum of the static interest model and the dynamic interest model, and using the result as the user's interest model; or

first computing a weighted sum of the static interest model and the dynamic interest model, then normalizing the sum, and using the result as the interest model.
9. The method according to claim 1, characterized in that the meta index is statistical data about one of the following, or any combination thereof: the database corresponding to the member engine; a sub-database; the documents or records contained in the database; the documents or records contained in the sub-database; or the terms contained in the documents or records.

10. The method according to claim 9, characterized in that the meta index includes one of the following items of information, or any combination thereof:

the term maximum normalized weight vector mnw = (mnw1, mnw2, ..., mnwi, ..., mnwp), where mnwi is the maximum normalized weight of the term ti over all documents in the database or sub-database corresponding to the member engine;

the term average normalized weight vector anw = (anw1, anw2, ..., anwi, ..., anwp), where anwi is the average normalized weight of the term ti over all documents in the database or sub-database corresponding to the member engine;

the interest-model maximum normalized weight vector of the documents in the database or sub-database, mnv = (mnv1, mnv2, ..., mnvi, ..., mnvn), where mnvi is the maximum normalized weight of the i-th dimension of the documents' interest model over all documents in the database or sub-database corresponding to the member engine;

the interest-model average normalized weight vector of the documents in the database or sub-database, anv = (anv1, anv2, ..., anvi, ..., anvn), where anvi is the average normalized weight of the i-th dimension of the documents' interest model over all documents in the database or sub-database corresponding to the member engine;

the global inverse document frequency gidfi of the term ti relative to the database, where gidfi = 1/dfi and dfi is the number of documents containing the term ti in the database corresponding to the meta index;

the global inverse document frequency IM_gidfi corresponding to the i-th dimension of the documents' interest model, where IM_gidfi = 1/IM_IDFi and IM_IDFi is the number of documents in the database or sub-database that contain terms belonging to the i-th dimension of the documents' interest model;

the global inverse document frequency of the term ti relative to the database, gidfi = log(n/(gdfi + 1)), where gdfi is the total number of documents containing the term ti in the databases or sub-databases of all member engines, and n is the total number of documents contained in all member engines; or

the global inverse document frequency corresponding to the i-th dimension of the documents' interest model, IM_gidfi = log(n/(IM_gdfi + 1)), where IM_gdfi is the total number of documents, in the databases or sub-databases of all member engines, that contain terms belonging to the i-th dimension of the documents' interest model, and n is the total number of documents contained in all member engines.
11. The method according to claim 9 or 10, characterized in that the step of selecting a member engine according to the meta indexes of the member engines, the search request, and the user's interest model comprises:

obtaining a first similarity between a first document in a database and the search request vector Q, where the first document is the document with the highest similarity to the search request vector Q; the search request vector is Q = (q1, q2, ..., qi, ..., qk), and qi is the weight of the term ti in the search request;

obtaining a second similarity between a second document in the database and the user's interest model R, where the second document is the document with the highest similarity to the user's interest model vector R among the documents whose match with the vector Q' = (q1', q2', ..., qm') satisfies a specified threshold; the vector Q' is the search request vector Q converted with respect to the user's interest model vector R; the user's interest model vector is R = (r1, r2, ..., ri, ..., rn), and ri is the weight score of the i-th dimension of the user's interest model;

taking the larger of the first similarity and the second similarity as the similarity between the database and the combination of the search request and the user's interest model;

repeating the above to obtain the similarity between each database and the combination of the search request and the user's interest model, where each database corresponds to one member engine; and

sorting the databases by their similarity to the combination of the search request and the user's interest model, and selecting the member engines corresponding to the one or more top-ranked databases with the highest similarity.
12. The method according to claim 11, wherein the step of obtaining the first similarity comprises:
calculating the value of
$$\max_{1 \le i \le k}\left(\frac{q_i \cdot gidf_i \cdot mnw_i + \sum_{j=1,\, j \ne i}^{k} q_j \cdot gidf_j \cdot anw_j}{|Q|} + \frac{\sum_{j=1}^{n} r_j \cdot anv_j}{|R|}\right)$$
and using this value as the first similarity;
the step of obtaining the second similarity comprises:
calculating the value of
$$\max_{1 \le i \le n}\left(\text{if } sim\big(V(mnv_i,\ anv_j\ (j \ne i,\ 1 \le j \le n)),\ Q'\big) > T \text{ then } \frac{r_i \cdot mnv_i + \sum_{j=1,\, j \ne i}^{n} r_j \cdot anv_j}{|R|} + \frac{\sum_{i=1}^{k} q_i \cdot gidf_i \cdot anw_i}{|Q|}\right)$$
and using this value as the second similarity;
where |Q| is the modulus of the search request vector Q; |R| is the modulus of the user's interest model R; Q' is calculated as follows: if a term ti falls within the scope of a dimension of the user's interest model, the value of qi is mapped to a weight on that dimension, the weights mapped to the same dimension are then summed to give qi', and the result is normalized; V is the vector formed by mnvi and anvj (j ≠ i, 1 ≤ j ≤ n); sim(V(mnvi, anvj (j ≠ i, 1 ≤ j ≤ n)), Q') is the cosine similarity between the vector V and the vector Q'; T is a threshold with 0 < T ≤ 1; and i, k, j, n are natural numbers.
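As a non-limiting sketch of how the two similarities of this claim could be evaluated, the following assumes that all vectors are plain Python lists, that Q and R are non-zero and Q is non-empty, that Q' is expressed over the same n dimensions as R, and that the second similarity is 0 when no dimension passes the threshold T. These are assumptions of the sketch rather than statements of the claim, and the function names are illustrative.

```python
import math
from typing import Sequence


def norm(v: Sequence[float]) -> float:
    """Modulus (Euclidean length) of a vector."""
    return math.sqrt(sum(x * x for x in v))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    na, nb = norm(a), norm(b)
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def first_similarity(q, gidf, mnw, anw, r, anv) -> float:
    """Max over query terms i: term i uses mnw, all other terms use anw."""
    k = len(q)
    best_query_sum = max(
        q[i] * gidf[i] * mnw[i]
        + sum(q[j] * gidf[j] * anw[j] for j in range(k) if j != i)
        for i in range(k)
    )
    interest_part = sum(rj * aj for rj, aj in zip(r, anv)) / norm(r)
    return best_query_sum / norm(q) + interest_part


def second_similarity(q, gidf, anw, r, mnv, anv, q_prime, threshold) -> float:
    """Max over interest dimensions i whose vector V is close enough to Q'."""
    n = len(r)
    query_part = sum(qi * g * a for qi, g, a in zip(q, gidf, anw)) / norm(q)
    candidates = []
    for i in range(n):
        # V takes the maximum value on dimension i and averages elsewhere.
        v = list(anv)
        v[i] = mnv[i]
        if cosine(v, q_prime) > threshold:
            interest_sum = r[i] * mnv[i] + sum(
                r[j] * anv[j] for j in range(n) if j != i
            )
            candidates.append(interest_sum / norm(r) + query_part)
    # Assumption: no qualifying dimension means a similarity of 0.
    return max(candidates, default=0.0)
```

The larger of the two returned values would then serve as the similarity for the database concerned.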
13. The method according to claim 11, wherein the step of obtaining the first similarity comprises: calculating the value of
[formula image: Figure imgf000021_0002; only the closing term "/|R|)" survives in the text]
and using this value as the first similarity;
the step of obtaining the second similarity comprises: calculating the value of
$$\max_{1 \le i \le n}\left(\text{if } sim\big(V(IM\_gidf_i \cdot mnv_i,\ IM\_gidf_j \cdot anv_j\ (j \ne i,\ 1 \le j \le n)),\ Q'\big) > T \text{ then } \frac{r_i \cdot mnv_i \cdot IM\_gidf_i + \sum_{j=1,\, j \ne i}^{n} r_j \cdot anv_j \cdot IM\_gidf_j}{|R|} + \frac{\sum_{i=1}^{k} q_i \cdot gidf_i \cdot anw_i}{|Q|}\right)$$
and using this value as the second similarity;
where |Q| is the modulus of the search request vector Q; |R| is the modulus of the user's interest model R; Q' is calculated as follows: if a term ti falls within the scope of a dimension of the user's interest model, the value of qi is mapped to a weight on that dimension, the weights mapped to the same dimension are then summed to give qi', and the result is normalized; V is the vector formed by IM_gidfi * mnvi and IM_gidfj * anvj (j ≠ i, 1 ≤ j ≤ n); sim(V(IM_gidfi * mnvi, IM_gidfj * anvj (j ≠ i, 1 ≤ j ≤ n)), Q') is the cosine similarity between the vector V and the vector Q'; T is a threshold with 0 < T ≤ 1; and i, k, j, n are natural numbers.
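The conversion of the search request vector Q into Q' described above (map each term's weight onto the interest model dimension it belongs to, sum the weights per dimension, then normalize) could, purely as an illustrative sketch, look as follows. The term-to-dimension lookup table, the function name `convert_query`, and the choice of Euclidean (L2) normalization are assumptions of the sketch; the claim states only that a normalization is applied.

```python
import math
from typing import Dict, List


def convert_query(
    query_weights: Dict[str, float],
    term_to_dimension: Dict[str, int],
    num_dimensions: int,
) -> List[float]:
    """Build Q' over the interest model's dimensions from the query terms.

    query_weights maps each query term ti to its weight qi;
    term_to_dimension is a hypothetical lookup giving the interest model
    dimension (0-based) that a term falls within, if any.
    """
    q_prime = [0.0] * num_dimensions
    for term, weight in query_weights.items():
        dimension = term_to_dimension.get(term)
        if dimension is not None:
            # Weights of terms mapped to the same dimension are summed.
            q_prime[dimension] += weight
    # Assumed normalization: scale Q' to unit Euclidean length.
    length = math.sqrt(sum(x * x for x in q_prime))
    if length == 0.0:
        return q_prime
    return [x / length for x in q_prime]
```

Terms that fall within no dimension of the interest model contribute nothing to Q' in this sketch.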
14. The method according to claim 11, wherein the step of obtaining the first similarity comprises: calculating the value of
$$\max_{1 \le i \le k}\left(\frac{q_i \cdot gidf_i \cdot mnw_i + \sum_{j=1,\, j \ne i}^{k} q_j \cdot gidf_j \cdot anw_j}{|Q|} + \frac{\sum_{j=1}^{n} r_j \cdot anv_j \cdot IM\_gidf_j}{|R|}\right)$$
and using this value as the first similarity;
the step of obtaining the second similarity comprises: calculating the value of
$$\max_{1 \le i \le n}\left(\text{if } sim\big(V(mnv_i,\ anv_j\ (j \ne i,\ 1 \le j \le n)),\ Q'\big) > T \text{ then } \frac{r_i \cdot mnv_i \cdot IM\_gidf_i + \sum_{j=1,\, j \ne i}^{n} r_j \cdot anv_j \cdot IM\_gidf_j}{|R|} + \frac{\sum_{i=1}^{k} q_i \cdot gidf_i \cdot anw_i}{|Q|}\right)$$
and using this value as the second similarity;
where |Q| is the modulus of the search request vector Q; |R| is the modulus of the user's interest model R; Q' is calculated as follows: if a term ti falls within the scope of a dimension of the user's interest model, the value of qi is mapped to a weight on that dimension, the weights mapped to the same dimension are then summed to give qi', and the result is normalized; V is the vector formed by mnvi and anvj (j ≠ i, 1 ≤ j ≤ n); sim(V(mnvi, anvj (j ≠ i, 1 ≤ j ≤ n)), Q') is the cosine similarity between the vector V and the vector Q'; T is a threshold with 0 < T ≤ 1; and i, k, j, n are natural numbers.
15. A system for searching, comprising:
a search service subsystem, configured to receive a search request, receive the meta index reported by each member engine, select a member engine according to the meta indexes of the member engines, the search request, and a user's interest model, and send the search request to the selected member engine; and
at least one member engine, configured to report its meta index to the search service subsystem and, after receiving the search request sent by the search service subsystem, perform the search.
16. The system for searching according to claim 15, wherein the search service subsystem is further configured to extract the user's interest model from the user's personalized data according to the search request, so that the member engine can be selected according to the meta indexes of the member engines, the search request, and the user's interest model.
17. The search system according to claim 16, wherein the search service subsystem comprises a search server and a user database;
the user database is configured to store or provide the user's personalized data;
the search server is configured to receive a search request sent by a search client; extract the user's interest model from the user's personalized data according to the search request; obtain the meta index of each member engine; select a member engine according to the meta indexes of the member engines, the search request, and the user's interest model; and send the search request to the selected member engine.
18. The search system according to claim 16, wherein the search service subsystem comprises a search server, a scheduling server, and a user database, wherein
the user database is configured to store or provide the user's personalized data;
the search server is configured to receive a search request sent by a search client, extract the user's interest model from the user's personalized data according to the search request, send the user's interest model and the search request to the scheduling server, receive the selected member engine returned by the scheduling server, and send the search request to the selected member engine;
the scheduling server is configured to receive the user's interest model and the search request sent by the search server; obtain the meta index of each member engine; select a member engine according to the meta indexes of the member engines, the search request, and the user's interest model; and return the selected member engine to the search server.
19. The search system according to claim 16, wherein the search service subsystem comprises a search server, a scheduling server, and a user database, wherein
the user database is configured to store or provide the user's personalized data;
the search server is configured to receive a search request sent by a search client, send the search request to the scheduling server, receive the selected member engine returned by the scheduling server, and send the search request to the selected member engine;
the scheduling server is configured to receive the search request sent by the search server; extract the user's interest model from the user's personalized data according to the search request; obtain the meta index of each member engine; select a member engine according to the meta indexes of the member engines, the search request, and the user's interest model; and return the selected member engine to the search server.
20. The search system according to claim 16, wherein the search service subsystem comprises a search server, an application server, and a user database, wherein
the user database is configured to store or provide the user's personalized data;
the application server is configured to receive a search request sent by a client, extract the user's interest model from the user's personalized data according to the search request, and send the search request and the user's interest model to the search server;
the search server is configured to receive the search request and the user's interest model sent by the application server, receive the meta index reported by each member engine, select a member engine according to the meta indexes of the member engines, the search request, and the user's interest model, and send the search request to the selected member engine.
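For illustration only, the interaction claimed for the system (member engines report their meta indexes, the search service subsystem selects engines using the meta indexes, the search request, and the user's interest model, then forwards the request) might be sketched as below. All class, method, and field names are hypothetical, and `toy_score` is a deliberately simple stand-in for the similarity calculation of the method claims, not that calculation itself.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

MetaIndex = Dict[str, float]
Scorer = Callable[[MetaIndex, str, Dict[str, float]], float]


@dataclass
class MemberEngine:
    name: str
    meta_index: MetaIndex  # summary statistics this engine reports upward

    def search(self, request: str) -> str:
        # Placeholder for the engine's real search over its own database.
        return f"{self.name} results for {request!r}"


@dataclass
class SearchServiceSubsystem:
    score: Scorer
    engines: Dict[str, MemberEngine] = field(default_factory=dict)
    meta_indexes: Dict[str, MetaIndex] = field(default_factory=dict)

    def receive_meta_index(self, engine: MemberEngine) -> None:
        """Record the meta index reported by a member engine."""
        self.engines[engine.name] = engine
        self.meta_indexes[engine.name] = engine.meta_index

    def handle(self, request: str, interest_model: Dict[str, float],
               top_n: int = 1) -> List[str]:
        """Select the best-scoring engines and forward the request to them."""
        ranked = sorted(
            self.meta_indexes,
            key=lambda name: self.score(
                self.meta_indexes[name], request, interest_model
            ),
            reverse=True,
        )
        return [self.engines[name].search(request) for name in ranked[:top_n]]


def toy_score(meta_index: MetaIndex, request: str,
              interest_model: Dict[str, float]) -> float:
    """Toy stand-in: query-term weights plus interest-weighted topic weights."""
    query_part = sum(meta_index.get(word, 0.0) for word in request.split())
    interest_part = sum(
        weight * meta_index.get(topic, 0.0)
        for topic, weight in interest_model.items()
    )
    return query_part + interest_part
```

In the variants that split the subsystem across a search server and a scheduling or application server, the same steps would simply be divided between the components as each claim recites.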
PCT/CN2009/073971 2008-09-26 2009-09-16 A method for searching and the device and system thereof WO2010037314A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP09817228A EP2352102A4 (en) 2008-09-26 2009-09-16 A method for searching and the device and system thereof
US13/070,265 US8527509B2 (en) 2008-09-26 2011-03-23 Search method, system and device

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN200810216521 2008-09-26
CN200810216521.5 2008-09-26
CN200810190595.6 2008-12-24
CN200810190595.6A CN101685456B (en) 2008-09-26 2008-12-24 Search method, system and device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/070,265 Continuation US8527509B2 (en) 2008-09-26 2011-03-23 Search method, system and device

Publications (1)

Publication Number Publication Date
WO2010037314A1 true WO2010037314A1 (en) 2010-04-08

Family

ID=42048620

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2009/073971 WO2010037314A1 (en) 2008-09-26 2009-09-16 A method for searching and the device and system thereof

Country Status (4)

Country Link
US (1) US8527509B2 (en)
EP (1) EP2352102A4 (en)
CN (1) CN101685456B (en)
WO (1) WO2010037314A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101996211B (en) * 2009-08-20 2013-01-23 华为技术有限公司 Method for interconnecting search servers for mobile search, search servers and system
US8869277B2 (en) * 2010-09-30 2014-10-21 Microsoft Corporation Realtime multiple engine selection and combining
CN102819575B (en) * 2012-07-20 2015-06-17 南京大学 Personalized search method for Web service recommendation
US8983976B2 (en) * 2013-03-14 2015-03-17 Microsoft Technology Licensing, Llc Dynamically expiring crowd-sourced content
CN104298785B (en) * 2014-11-12 2017-05-03 中南大学 Searching method for public searching resources
CN106407011B (en) * 2016-09-20 2019-05-10 焦点科技股份有限公司 A kind of method and system of the search system cluster service management based on routing table
CN107748792B (en) * 2017-11-01 2020-11-27 上海数据交易中心有限公司 Data retrieval method and device and terminal
US11422881B2 (en) * 2018-07-19 2022-08-23 Oracle International Corporation System and method for automatic root cause analysis and automatic generation of key metrics in a multidimensional database environment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1158421A2 (en) * 2000-05-16 2001-11-28 LAS21 Co., Ltd. Internet site search service system having a function of building individual meta search engines
CN1811780A (en) * 2006-03-03 2006-08-02 中国移动通信集团公司 Searching system and method based on personalized information
CN1983253A (en) * 2005-12-15 2007-06-20 北京中科信利技术有限公司 Method, apparatus and system for supplying musically searching service

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030220913A1 (en) * 2002-05-24 2003-11-27 International Business Machines Corporation Techniques for personalized and adaptive search services
US20060288001A1 (en) * 2005-06-20 2006-12-21 Costa Rafael Rego P R System and method for dynamically identifying the best search engines and searchable databases for a query, and model of presentation of results - the search assistant
US8548974B2 (en) * 2005-07-25 2013-10-01 The Boeing Company Apparatus and methods for providing geographically oriented internet search results to mobile users
US7558922B2 (en) * 2005-12-28 2009-07-07 Hitachi, Ltd. Apparatus and method for quick retrieval of search data by pre-feteching actual data corresponding to search candidate into cache memory
US7805432B2 (en) * 2006-06-15 2010-09-28 University College Dublin National University Of Ireland, Dublin Meta search engine

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102819529A (en) * 2011-06-10 2012-12-12 阿里巴巴集团控股有限公司 Information publishing method and system for social website
CN102819529B (en) * 2011-06-10 2015-08-19 阿里巴巴集团控股有限公司 Social network sites information issuing method and system

Also Published As

Publication number Publication date
US20110173192A1 (en) 2011-07-14
EP2352102A1 (en) 2011-08-03
EP2352102A4 (en) 2012-10-24
CN101685456A (en) 2010-03-31
US8527509B2 (en) 2013-09-03
CN101685456B (en) 2013-08-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 09817228; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
WWE Wipo information: entry into national phase (Ref document number: 2009817228; Country of ref document: EP)