WO2021139209A1 - Method, apparatus, device and computer storage medium for query auto-completion - Google Patents

Info

Publication number
WO2021139209A1
Authority
WO
WIPO (PCT)
Prior art keywords
poi
query
user
vector representation
history information
Application number
PCT/CN2020/116632
Other languages
English (en)
French (fr)
Inventor
李莹 (Li Ying)
黄际洲 (Huang Jizhou)
范淼 (Fan Miao)
王海峰 (Wang Haifeng)
Original Assignee
Baidu Online Network Technology (Beijing) Co., Ltd. (百度在线网络技术(北京)有限公司)
Application filed by Baidu Online Network Technology (Beijing) Co., Ltd.
Priority to US17/311,793 (US20220342936A1)
Priority to EP20894917.2A (EP3879416A4)
Publication of WO2021139209A1 (Chinese)

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/903: Querying
    • G06F16/9032: Query formulation
    • G06F16/90324: Query formulation using system suggestions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/95: Retrieval from the web
    • G06F16/953: Querying, e.g. by the use of web search engines
    • G06F16/9537: Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Definitions

  • This application relates to the field of computer application technology, in particular to a method, apparatus, device, and computer storage medium for automatic query completion in the field of intelligent search technology.
  • QAC: Query Auto-Completion.
  • As the user types, the search engine can recommend, in real time, a series of candidate POIs in a candidate list for the user to choose from as the query completion result (in this application, a query recommended in the candidate list is called a query completion suggestion).
  • If the user finds the intended POI in the candidate list, the user can complete the query by selecting that POI from the list, thereby initiating a retrieval of the POI.
  • In existing approaches, the suggestions provided for the same query prefix are the same for every user; for example, the candidates are sorted by the search popularity of each POI, which does not meet users' individual needs well.
  • The present application provides a method, apparatus, device, and computer storage medium for automatic query completion, so that the recommended query completion suggestions better meet users' actual needs.
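  • This application provides a method for automatic query completion, which includes:
  • obtaining the query prefix currently input by the user and determining the candidate POIs corresponding to the query prefix;
  • obtaining the vector representation of the user's query history information and the vector representation of each candidate POI;
  • inputting these vector representations into a pre-trained ranking model to obtain a score for each candidate POI; and
  • determining the query completion suggestions recommended to the user according to the score of each candidate POI.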
  • Obtaining the vector representation of the user's query history information includes:
  • obtaining the user's query history information, which includes the POIs queried or clicked by the user within a first time period and the high-frequency POIs queried or clicked by the user within a second time period, the second time period being longer than the first; and
  • using the vector representations of these POIs to obtain the vector representation of the user's query history information.
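  • The vector representation of each POI is obtained in advance as follows:
  • obtaining the POI query logs of large-scale users and arranging the POIs queried or clicked by each user in time order to obtain POI sequences;
  • slicing each POI sequence according to a preset sliding window size, each slice including a center POI and the context POIs of the center POI;
  • training a skip-gram model using the slices; and
  • after training, obtaining the vector representation of each POI from the skip-gram model.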
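  • Training the skip-gram model using the slices includes: encoding the attribute information of each POI with the skip-gram model to obtain the vector representation of each POI, using the vector representation of the center POI in each slice to predict the vector representations of the context POIs in the same slice, and iteratively updating the model parameters of the skip-gram model according to the prediction error.
  • Encoding the attribute information of each POI includes: encoding the name and address of the POI with a convolutional neural network, encoding the other attribute information with a feedforward neural network, concatenating the encoding results of the same POI, and mapping the concatenated vector through a fully connected layer to obtain the vector representation of the POI.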
  • When scoring each candidate POI, the ranking model further uses the user's attribute feature vector representation and the popularity feature vector representation of each candidate POI.
  • This application further provides a method for establishing a ranking model for automatic query completion.
  • The method includes:
  • obtaining, from a POI query log, the user ID, the query prefix entered when the user selected a POI from the query completion suggestions, each POI in the query completion suggestions corresponding to that query prefix, and the POI selected by the user;
  • obtaining the vector representation of the user's query history information before entering the query prefix and the vector representation of each POI in the query completion suggestions; and
  • taking the vector representation of the query history information before the user entered the query prefix together with the vector representation of the POI selected by the user as a positive example, and the same query history vector representation together with a POI not selected by the user in the corresponding query completion suggestions as a negative example, and training a neural network model to obtain the ranking model.
  • The training objective is to maximize the difference between the neural network model's score for the positive-example POI and its score for the negative-example POI.
  • Obtaining the vector representation of the user's query history information before the query prefix was entered includes:
  • obtaining that query history information, which includes the POIs the user queried or clicked within a first time period before entering the query prefix and the high-frequency POIs the user queried or clicked within a second time period, the second time period being longer than the first; and
  • using the vector representations of these POIs to obtain the vector representation of the user's query history information before entering the query prefix.
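  • The vector representation of each POI is obtained in advance as follows:
  • obtaining the POI query logs of large-scale users and arranging the POIs queried or clicked by each user in time order to obtain POI sequences;
  • slicing each POI sequence according to a preset sliding window size, each slice including a center POI and the context POIs of the center POI;
  • training a skip-gram model using the slices; and
  • after training, obtaining the vector representation of each POI from the skip-gram model.
  • Training the skip-gram model using the slices includes: encoding the attribute information of each POI with the skip-gram model to obtain its vector representation, using the vector representation of the center POI in each slice to predict the vector representations of the context POIs in the same slice, and iteratively updating the model parameters according to the prediction error.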
  • The positive example further includes the user's attribute feature vector representation and the popularity feature vector representation of the POI selected by the user;
  • the negative example further includes the user's attribute feature vector representation and the popularity feature vector representation of the POI not selected by the user.
  • This application also provides an apparatus for automatic query completion, which includes:
  • a first obtaining unit, configured to obtain the query prefix currently input by the user and determine the candidate POIs corresponding to the query prefix;
  • a second obtaining unit, configured to obtain the vector representation of the user's query history information and the vector representation of each candidate POI;
  • a scoring unit, configured to input the vector representation of the user's query history information and the vector representation of each candidate POI into a pre-trained ranking model to obtain a score for each candidate POI; and
  • a query completion unit, configured to determine the query completion suggestions recommended to the user according to the score of each candidate POI.
  • The present application further provides an apparatus for establishing a ranking model for automatic query completion, which includes:
  • a first obtaining unit, configured to obtain from the POI query log the user ID, the query prefix entered when the user selected a POI from the query completion suggestions, each POI in the query completion suggestions corresponding to that query prefix, and the POI selected by the user;
  • a second obtaining unit, configured to obtain the vector representation of the user's query history information before entering the query prefix and the vector representation of each POI in the query completion suggestions; and
  • a model training unit, configured to take the vector representation of the query history information before the user entered the query prefix together with the vector representation of the POI selected by the user as a positive example, and the same query history vector representation together with a POI not selected by the user in the corresponding query completion suggestions as a negative example, and to train a neural network model to obtain the ranking model.
  • The training objective is to maximize the difference between the neural network model's score for the positive-example POI and its score for the negative-example POI.
  • This application further provides an electronic device, including:
  • at least one processor; and
  • a memory communicatively connected with the at least one processor, wherein
  • the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method described above.
  • The present application further provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method described above.
  • The user's query history information is integrated into the ranking model to rank candidate POIs, so that the query completion suggestions recommended to the user better match the user's search preferences.
  • FIG. 1 is a sample diagram of a query auto-completion interface;
  • FIG. 2 shows an exemplary system architecture to which embodiments of the present invention can be applied;
  • FIG. 3 is a flowchart of a query completion method provided in Embodiment 1 of this application;
  • FIG. 4 is a flowchart of a method for obtaining POI vector representations provided in Embodiment 1 of this application;
  • FIG. 5 is a schematic diagram of the processing in the method provided by an embodiment of this application;
  • FIG. 6 is a flowchart of a method for establishing a ranking model provided in Embodiment 2 of this application;
  • FIG. 7 is a structural diagram of an apparatus for automatic query completion provided in Embodiment 3 of this application;
  • FIG. 8 is a structural diagram of an apparatus for establishing a ranking model provided by an embodiment of this application;
  • FIG. 9 is a block diagram of an electronic device used to implement the method of the embodiments of this application.
  • FIG. 2 shows an exemplary system architecture to which embodiments of the present invention can be applied.
  • The system architecture may include terminal devices 101 and 102, a network 103, and a server 104.
  • The network 103 provides the medium for communication links between the terminal devices 101 and 102 and the server 104.
  • The network 103 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
  • The user can use the terminal devices 101 and 102 to interact with the server 104 through the network 103.
  • Various applications may be installed on the terminal devices 101 and 102, such as voice interaction applications, web browser applications, and communication applications.
  • The terminal devices 101 and 102 may be various electronic devices, including but not limited to smartphones, tablet computers, PCs, and smart TVs.
  • The apparatus for automatic query completion provided by the present invention can be set up and run on the server 104. It can be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module, which is not specifically limited here.
  • For example, when the user enters a query prefix on the search interface provided by a browser or client on the terminal device 101, the browser or client provides the query prefix to the server 104 in real time, and the server 104 uses the method provided in this application to return to the terminal device 101 the query completion suggestions corresponding to the query prefix currently input by the user.
  • If the user finds the desired POI among the query completion suggestions, the user can initiate a search for that POI by selecting it.
  • If the user does not find the desired POI among the query completion suggestions, the user can continue typing; the browser or client again provides the query prefix to the server 104 in real time, and the server 104 returns the query completion suggestions corresponding to the query prefix now entered.
  • In this way, query completion suggestions are recommended to the user in real time as the user enters the query prefix.
  • The server 104 may be a single server or a server group composed of multiple servers. It should be understood that the numbers of terminal devices, networks, and servers in FIG. 2 are merely illustrative; there can be any number of them according to implementation needs.
  • The essence of the technique of this application is to establish an association between the user and POIs. One usage scenario is POI retrieval on map data: as the user enters a query prefix, query completion suggestions are recommended to the user in real time.
  • The query completion suggestions are obtained by first determining the candidate POIs corresponding to the query prefix input by the user and then ranking the candidate POIs with the ranking model.
  • In existing approaches, the ranking of candidate POIs usually considers the popularity features of each candidate POI and, in some cases, certain user attribute features.
  • This ranking approach cannot meet users' actual needs well.
  • Many users retrieve the same POI repeatedly; for example, about 20% of users retrieve the same POI again within 7 days.
  • The core idea of this application is to integrate the user's personalized query history information, as a feature unique to each user, into the ranking model, so that the model can quickly capture a user's repeated retrieval of the same POI and complete the user's retrieval intent sooner. The method provided in this application is described in detail below in conjunction with embodiments.
  • FIG. 3 is a flowchart of a query completion method provided in Embodiment 1 of this application. As shown in FIG. 3, the method may include the following steps:
  • The query prefix currently input by the user is obtained, and the candidate POIs corresponding to the query prefix are determined.
  • This application applies to various forms of input, such as Chinese characters, pinyin, or abbreviations; in every case the input query prefix can be treated as a character string.
  • The query prefix currently entered by the user is obtained in real time. For example, when a user wants to enter "百度大厦" (Baidu Building), the user successively enters query prefixes such as "百" (bai), "百度" (Baidu), and "百度大" (Baidu Da), and the method provided in this application is executed for each query prefix. That is, when the user enters "百", the current query prefix is "百", and executing the method of this application for that prefix recommends query completion suggestions to the user.
  • When the user enters "百度", the current query prefix is "百度", and executing the method for that prefix recommends query completion suggestions to the user.
  • When the user enters "百度大", the current query prefix is "百度大", and executing the method for that prefix recommends query completion suggestions to the user.
  • The candidate POIs corresponding to the current query prefix may be determined with an existing method; the purpose is to find POIs that are strongly related to the query prefix, or POIs whose text starts with the query prefix.
  • For example, an inverted index that maps the various query prefixes corresponding to each POI to that POI can be built over the POI library in advance.
  • When a query prefix is entered, the POI library is queried with it, and all POIs hit are taken as candidate POIs.
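The prefix lookup described above can be sketched as a small inverted index from every prefix of a POI's index keys to the POI IDs. This is a minimal sketch under stated assumptions: the choice of keys per POI (its name plus, hypothetically, a pinyin spelling), the `max_prefix_len` cap, and the function names are illustrative and not taken from the application.

```python
from collections import defaultdict


def build_prefix_index(poi_keys, max_prefix_len=20):
    """Build an inverted index: every prefix of every key string -> set of POI IDs.

    poi_keys maps a POI ID to the strings it should be findable by
    (hypothetically its name and its pinyin spelling).
    """
    index = defaultdict(set)
    for poi_id, keys in poi_keys.items():
        for key in keys:
            for end in range(1, min(len(key), max_prefix_len) + 1):
                index[key[:end]].add(poi_id)
    return index


def candidate_pois(index, query_prefix):
    """Return every POI hit by the query prefix (sorted for a stable order)."""
    return sorted(index.get(query_prefix, ()))
```

Indexing pinyin strings alongside the names is one way to serve the mixed input forms mentioned earlier (characters, pinyin) from a single lookup; prefix index size grows with key length, which is why the sketch caps it.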
  • The vector representation of the user's query history information and the vector representation of each candidate POI are obtained.
  • To obtain the vector representation of the user's query history, the user's query history information can be obtained first, and the vector representations of the POIs are then used to obtain the vector representation of that query history information.
  • The user's query history information may include the POIs the user queried or clicked within a first time period and the high-frequency POIs the user queried or clicked within a second time period, the second time period being longer than the first.
  • The POIs queried or clicked by the user within the first time period can be regarded as the user's short-term query history.
  • The short-term query history may include the user's earlier behavior in the same retrieval session as the currently entered query prefix, for example, POIs queried before the current query prefix and POIs clicked in the same session.
  • The short-term query history can be regarded as context information of the current query (query prefix) and reflects the user's short-term, immediate interest.
  • The "session" above refers to a retrieval session, which can be determined with a widely used method: if the user had no retrieval behavior during the first time period (for example, 30 minutes) preceding a retrieval, that retrieval is regarded as the beginning of the current session. In other words, retrieval behaviors that follow one another within 30 minutes belong to the same session.
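A minimal sketch of the 30-minute session rule and the short-term history it yields, assuming each user's log is a list of `(timestamp, poi_id)` events. The function names and the reading of the rule as "a gap longer than 30 minutes between consecutive events starts a new session" are assumptions for illustration.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # the "first time period" example from the text


def split_sessions(events, gap=SESSION_GAP):
    """Group (timestamp, poi_id) events into sessions.

    Assumption: a new session starts when more than `gap` has passed
    since the previous event.
    """
    sessions = []
    last_ts = None
    for ts, poi_id in sorted(events):
        if last_ts is None or ts - last_ts > gap:
            sessions.append([])
        sessions[-1].append(poi_id)
        last_ts = ts
    return sessions


def short_term_history(events, now, gap=SESSION_GAP):
    """POIs of the session still open at `now` (empty if the user was idle longer than `gap`)."""
    past = [(ts, p) for ts, p in events if ts <= now]
    if not past:
        return []
    last_ts = max(ts for ts, _ in past)
    if now - last_ts > gap:
        return []
    return split_sessions(past, gap)[-1]
```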
  • The POIs queried or clicked by the user within the second time period can be regarded as the user's long-term query history.
  • The long-term query history refers to all retrieval behaviors of the user within the second time period before the currently entered query (query prefix), from which the high-frequency POIs queried or clicked across all of the user's sessions in that period are taken.
  • A "high-frequency POI" may be a POI whose number of queries or clicks exceeds a preset threshold.
  • The long-term query history reflects the user's long-term, inherent interest preferences.
  • The first time period can be chosen at the minute or hour level, for example, 30 minutes.
  • The second time period can be chosen at the day or month level, for example, 3 months.
  • The vector representation of each POI can be obtained in advance. Assume that the POI vector representation is k-dimensional, where k is a positive integer greater than 1, that the user queried or clicked m POIs within the first time period, and that the user queried or clicked n high-frequency POIs within the second time period. Representing these (m+n) POIs by their vector representations yields an (m+n)×k matrix, which serves as the vector representation of the user's query history information.
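The (m+n)×k history matrix can be assembled as below. This sketch assumes POI vectors are plain Python lists keyed by POI ID, that "high-frequency" means a count strictly above the threshold, and that high-frequency POIs are ordered by descending count; the function names and the default threshold are hypothetical.

```python
from collections import Counter


def high_frequency_pois(long_term_pois, threshold):
    """POIs queried/clicked more than `threshold` times in the long window, most frequent first."""
    counts = Counter(long_term_pois)
    return [poi for poi, c in counts.most_common() if c > threshold]


def history_matrix(short_term_pois, long_term_pois, poi_vectors, threshold=2):
    """Stack the k-dim vectors of m short-term POIs and n high-frequency POIs into an (m+n) x k matrix."""
    rows = list(short_term_pois) + high_frequency_pois(long_term_pois, threshold)
    return [list(poi_vectors[poi]) for poi in rows]
```

Because m and n vary from user to user, a production model would likely pad or truncate this matrix to a fixed number of rows; the application does not say how this is handled.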
  • The method for obtaining the POI vector representations in advance is described in detail below.
  • The POI vector representation here is a semantic representation of the POI, and it can be obtained as shown in FIG. 4 through the following steps:
  • POI query logs of large-scale users are obtained, and the POIs queried or clicked by each user are arranged in time order to obtain POI sequences.
  • Each POI sequence is sliced according to a preset sliding window size, each slice including a center POI and the context POIs of the center POI.
  • For example, each POI sequence can be sliced into slices of at most 3 POIs.
  • Slices such as [POI_ID_2, POI_ID_6, POI_ID_7], [POI_ID_6, POI_ID_7, POI_ID_8], and so on can then be obtained.
  • Each slice includes a center POI and the context POIs of the center POI.
  • The center POI is a POI that is not located at either end of the slice.
  • The context POIs of the center POI can be the other POIs in the slice, or the POIs adjacent to the center POI in the slice.
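The slicing step can be sketched as follows, assuming a stride of 1 and taking the middle POI of each slice as its center (for a window of 3 this is exactly the one POI not at either end). The other POIs in the slice are its context. The function names are hypothetical, and the sequence in the test is an illustrative one chosen to reproduce the example slices above.

```python
def slice_sequence(pois, window=3):
    """Cut a time-ordered POI sequence into overlapping slices of `window` POIs (stride 1)."""
    if len(pois) <= window:
        return [list(pois)]
    return [list(pois[i:i + window]) for i in range(len(pois) - window + 1)]


def center_context_pairs(slices):
    """Turn slices into (center, context) training pairs for a skip-gram model."""
    pairs = []
    for s in slices:
        if len(s) < 3:
            continue  # no POI lies strictly inside the slice, so there is no center
        mid = len(s) // 2
        pairs.extend((s[mid], ctx) for j, ctx in enumerate(s) if j != mid)
    return pairs
```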
  • The skip-gram model is trained using the slices.
  • In natural language processing, the skip-gram model is used to predict the context words of a given center word.
  • Here the skip-gram model is borrowed to obtain the vector representation of each POI.
  • Specifically, the skip-gram model can encode the attribute information of each POI to obtain its vector representation; the vector representation of the center POI in each slice is used to predict the vector representations of the context POIs in the same slice, and the model parameters of the skip-gram model are iteratively updated according to the prediction error.
  • The attribute information may include, but is not limited to, the identifier, name, category, address, and tags of the POI.
  • The name and address of the POI can be encoded with a convolutional neural network, and the other attribute information with a feedforward neural network. The encoding results of the same POI are then concatenated, and the concatenated vector is mapped through a fully connected layer to obtain the vector representation of the POI.
  • After the training of the skip-gram model ends, the vector representation of each POI is obtained from the skip-gram model.
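For orientation, here is a bare-bones skip-gram trainer over (center, context) POI pairs. Two substitutions should be stated plainly: the application encodes POI attributes (CNN, feedforward network, fully connected layer) inside the model, whereas this sketch uses a plain per-ID lookup table; and the text does not say how the context prediction is normalized, so word2vec-style negative sampling is assumed here. Names and hyperparameters are illustrative.

```python
import math
import random


def train_skipgram(pairs, vocab, dim=8, epochs=50, lr=0.05, negatives=2, seed=0):
    """Skip-gram with negative sampling over (center, context) POI pairs.

    Returns (w_in, w_out); the input ("center") vectors w_in serve as POI vectors.
    """
    rng = random.Random(seed)
    vocab = sorted(vocab)
    # word2vec-style init: small random input vectors, zero output vectors
    w_in = {p: [(rng.random() - 0.5) / dim for _ in range(dim)] for p in vocab}
    w_out = {p: [0.0] * dim for p in vocab}

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    for _epoch in range(epochs):
        for center, context in pairs:
            # the true context (label 1) plus random POIs (label 0), skipping the true context
            targets = [(context, 1.0)]
            for _ in range(negatives):
                neg = rng.choice(vocab)
                if neg != context:
                    targets.append((neg, 0.0))
            v = w_in[center]
            grad_v = [0.0] * dim
            for target, label in targets:
                u = w_out[target]
                # prediction error scaled by the learning rate
                g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
                for d in range(dim):
                    grad_v[d] += g * u[d]  # uses u before it is updated
                    u[d] += g * v[d]
            for d in range(dim):
                v[d] += grad_v[d]
    return w_in, w_out
```

In practice a library implementation of skip-gram would normally be used; the sketch only makes the "center predicts context, update on error" loop concrete.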
  • The vector representation of the user's query history information and the vector representation of each candidate POI are input into the pre-trained ranking model to obtain a score for each candidate POI.
  • When scoring each candidate POI, the ranking model may further use the user's attribute feature vector representation and the popularity feature vector representation of each candidate POI. That is, the input of the ranking model includes the vector representation of the user's query history information, the vector representation of each candidate POI, the user's attribute feature vector representation, and the popularity feature vector representation of each candidate POI.
  • The output of the ranking model is the score of each candidate POI.
  • The ranking model may be a neural network model; its training process is described in detail in Embodiment 2.
  • The user's attribute features can include information such as the user's age, gender, occupation, income level, and city.
  • The user's attribute feature vector representation can be obtained by encoding this information.
  • The popularity features of a candidate POI can be characterized by information such as its click frequency, retrieval frequency, and navigation frequency, and its popularity feature vector representation is obtained by encoding this information. Existing methods can be used for this and are not detailed here.
  • $e_{w2v}$ denotes the vector representation of the candidate POI;
  • $U_{per}$ denotes the vector representation of the user's query history information;
  • $U_d$ denotes the user's attribute feature vector;
  • $V_{pop}$ denotes the popularity feature vector of the candidate POI.
  • Multiplying the (m+n)×k matrix $U_{per}$ by the k-dimensional vector $e_{w2v}$ gives the (m+n)-dimensional similarity feature $V_{per} = U_{per}\, e_{w2v}$.
  • $V_{per}$, $U_d$, and $V_{pop}$ are concatenated into a new feature vector, which the neural network transforms to obtain the score of the candidate POI.
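The scoring path ($V_{per} = U_{per} e_{w2v}$, then concatenation with $U_d$ and $V_{pop}$, then a neural transform) can be sketched in plain Python. The network shape is not specified in the text, so a single ReLU hidden layer with a linear output is assumed here, and `w1`, `b1`, `w2`, `b2` are hypothetical parameters; the sketch also assumes m+n is fixed (for example by padding) so the concatenated input has a constant length.

```python
def matvec(matrix, vector):
    """Matrix-vector product with lists."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def score_candidate(u_per, e_w2v, u_d, v_pop, w1, b1, w2, b2):
    """Score one candidate POI.

    u_per: (m+n) x k history matrix; e_w2v: k-dim candidate POI vector;
    u_d: user attribute vector; v_pop: candidate popularity vector;
    w1/b1/w2/b2: parameters of an assumed one-hidden-layer network.
    """
    v_per = matvec(u_per, e_w2v)            # (m+n) similarity features
    x = v_per + list(u_d) + list(v_pop)     # concatenation ("splicing")
    hidden = [max(0.0, h + b) for h, b in zip(matvec(w1, x), b1)]  # ReLU layer
    return sum(w * h for w, h in zip(w2, hidden)) + b2
```

Each row of $U_{per}$ contributes one similarity between a history POI and the candidate, so a candidate the user searched recently or often gets large entries in $V_{per}$ regardless of its global popularity.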
  • The query completion suggestions recommended to the user are determined according to the score of each candidate POI.
  • Candidate POIs whose score is greater than or equal to a preset score threshold can be used as query completion suggestions, or the POIs with the top P scores can be used, where P is a preset positive integer.
  • In the candidate list, the POIs are sorted according to their scores. The suggestions can be presented in the existing drop-down box near the search box or in other forms.
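Both selection rules (score threshold, or top P) fit in one small helper. This is an illustrative sketch; the function name and the option of combining both rules are assumptions.

```python
def select_suggestions(scores, threshold=None, top_p=None):
    """Pick query completion suggestions from {poi_id: score}, sorted by descending score.

    threshold: keep POIs with score >= threshold (if given).
    top_p: keep at most the top P POIs (if given).
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(poi, s) for poi, s in ranked if s >= threshold]
    if top_p is not None:
        ranked = ranked[:top_p]
    return [poi for poi, _ in ranked]
```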
  • Because the user's query history information is incorporated into the ranking model to rank candidate POIs, the query completion suggestions recommended to the user better match the user's search preferences. For example, a user who works in "Baidu Building" often searches for the "Baidu Building" POI for navigation or traffic queries. In the prior art, however, suggestions are sorted by POI search popularity, so unless a large number of users frequently click or search for the "Baidu Building" POI, it will not rank high among the query completion suggestions.
  • FIG. 6 is a flowchart of the method for establishing a ranking model provided in Embodiment 2 of this application. As shown in FIG. 6, the method may include the following steps:
  • The following are obtained from the POI query log: the user ID, the query prefix entered when the user selected a POI from the query completion suggestions, each POI in the query completion suggestions corresponding to that query prefix, and the POI selected by the user.
  • The vector representation of the user's query history information before entering the query prefix and the vector representation of each POI in the query completion suggestions are obtained.
  • The user's query history information before entering the query prefix can be obtained first.
  • This query history information can include the POIs the user queried or clicked within a first time period before entering the query prefix and the high-frequency POIs queried or clicked within a second time period, where the second time period is longer than the first; the POI vector representations are then used to obtain the vector representation of the user's query history information before entering the query prefix.
  • This step is implemented similarly to step 302 in Embodiment 1, and the POI vector representations can likewise be obtained as described for step 302 in Embodiment 1; details are not repeated here.
  • The vector representation of the query history information before the user entered the query prefix, together with the vector representation of the POI selected by the user in the corresponding query completion suggestions, is taken as a positive example; the same query history vector representation, together with a POI not selected by the user in the corresponding query completion suggestions, is taken as a negative example. A neural network model is trained with these examples to obtain the ranking model.
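Turning one log record into pairwise training examples can be sketched as follows: the selected POI is paired with each unselected suggestion. The record fields and function name are hypothetical; the history vectors for the moment before the prefix was entered would be looked up separately, as described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogRecord:
    """Hypothetical shape of one POI query log entry."""
    user_id: str
    query_prefix: str
    suggested_pois: List[str]  # POIs shown in the query completion suggestions
    selected_poi: str          # the POI the user selected


def build_pairs(record):
    """One (user, prefix, positive POI, negative POI) tuple per unselected suggestion."""
    return [
        (record.user_id, record.query_prefix, record.selected_poi, neg)
        for neg in record.suggested_pois
        if neg != record.selected_poi
    ]
```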
  • The ranking model can be trained in a pairwise manner.
  • The positive example may further include the user's attribute feature vector representation and the popularity feature vector representation of the POI selected by the user; the negative example further includes the user's attribute feature vector representation and the popularity feature vector representation of the POI not selected by the user.
  • The processing is similar to that shown in FIG. 5. That is, a positive example includes the vector representation of the query history information before the user entered the query prefix (corresponding to $U_{per}$ in FIG. 5), the vector representation of the POI selected by the user in the query completion suggestions (corresponding to $e_{w2v}$ in FIG. 5), the user's attribute feature vector representation (corresponding to $U_d$ in FIG. 5), and the popularity feature vector representation of the selected POI (corresponding to $V_{pop}$ in FIG. 5), where $U_{per}$ and $e_{w2v}$ are multiplied to obtain the similarity feature $V_{per}$.
  • A negative example includes the same query history vector representation (corresponding to $U_{per}$ in FIG. 5), the vector representation of a POI not selected by the user in the query completion suggestions (corresponding to $e_{w2v}$ in FIG. 5), the user's attribute feature vector representation (corresponding to $U_d$ in FIG. 5), and the popularity feature vector representation of the unselected POI (corresponding to $V_{pop}$ in FIG. 5), where $U_{per}$ and $e_{w2v}$ are multiplied to obtain the similarity feature $V_{per}$.
  • The input vectors are concatenated and transformed by the ranking model to obtain the scores of the positive-example POI and the negative-example POI, and the parameters of the ranking model are updated according to these two scores until the training objective is reached.
  • The training objective can be to maximize the difference between the neural network model's score for the positive-example POI and its score for the negative-example POI.
  • This training objective can be embodied as minimizing a loss function of the neural network model; the loss formula includes a hyperparameter.
  • The i-th training sample can be expressed as $(u^{(i)}, \{v^{(i,1)}, \dots, v^{(i,j)}, \dots, v^{(i,n)}\}, k^{(i)})$, where $i = 1, \dots, m$ and m is the number of training samples.
  • $u$ is the vector representation of the user, which in the embodiments of this application is the user's $U_d$;
  • $\{v^{(i,1)}, \dots, v^{(i,j)}, \dots, v^{(i,n)}\}$ is the set of POIs in the query completion suggestions, and $k^{(i)}$ indexes the POI selected by the user from the query completion suggestions.
  • A vector $v$ can be the concatenation of $V_{pop}$ and $V_{per}$. $(u^{(i)}, v^{(i,k^{(i)})})$ is a positive example, and $(u^{(i)}, v^{(i,j)})$ with $j \neq k^{(i)}$ is a negative example. $h(\cdot)$ is the function the ranking model uses to score a POI; it contains the model parameters updated during training of the ranking model.
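The loss formula itself is not recoverable from this text. One common pairwise objective consistent with the definitions above is a margin hinge loss, $L = \frac{1}{m}\sum_{i=1}^{m}\sum_{j \neq k^{(i)}} \max\bigl(0,\ \gamma - h(u^{(i)}, v^{(i,k^{(i)})}) + h(u^{(i)}, v^{(i,j)})\bigr)$, in which the margin $\gamma$ plays the role of the hyperparameter. This is an assumption, not the application's formula; the sketch below only evaluates it for a given scoring function `h`.

```python
def pairwise_hinge_loss(samples, h, margin=1.0):
    """Average margin-based pairwise loss over m training samples (assumed form).

    samples: list of (u, candidates, k); candidates is the list of POI vectors v
             in the suggestions and k is the index of the POI the user selected.
    h: scoring function h(u, v) -> float.
    margin: the hyperparameter (assumed here to be a margin).
    """
    total = 0.0
    for u, candidates, k in samples:
        pos = h(u, candidates[k])  # score of the positive example
        for j, v in enumerate(candidates):
            if j != k:  # every unselected POI is a negative example
                total += max(0.0, margin - pos + h(u, v))
    return total / len(samples)
```

Minimizing this pushes each positive score above every negative score by at least the margin, which is one concrete reading of "maximize the difference" between the two scores.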
  • FIG. 7 is a structural diagram of the device for automatic query completion provided in the third embodiment of the application.
  • the device may include: a first obtaining unit 01, a second obtaining unit 02, a scoring unit 03, and query completion
  • the unit 04 may further include a third acquiring unit 05.
  • the main functions of each component are as follows:
  • the first obtaining unit 01 is responsible for obtaining the query prefix currently input by the user and determining the candidate POI corresponding to the query prefix.
  • the method of determining the candidate POI corresponding to the currently input query prefix may adopt an existing implementation method, and the purpose is to find a POI that is strongly related to the query prefix, or to find a POI whose text starts with the query prefix.
  • an inverted index can be established in the POI library with various corresponding query prefixes for the POI in advance.
  • the second obtaining unit 02 is responsible for obtaining the vector representation of the user's query history information and the vector representation of each candidate POI.
  • the second obtaining unit 02 can obtain the user's query history information.
  • the query history information includes the POIs the user queried or clicked within a first duration and the high-frequency POIs the user queried or clicked within a second duration.
  • the second duration is longer than the first duration; the POI vector representations are used to obtain the vector representation of the user's query history information.
  • the vector representation of each POI may be determined in advance by the third obtaining unit 05.
  • the third obtaining unit 05 may obtain POI query logs of large-scale users and arrange the POIs queried or clicked by each user in chronological order to obtain POI sequences; slice each POI sequence according to a preset sliding-window size, each slice including a center POI and the context POIs of that center POI; train the skip-gram model with the slices; and, after training, obtain the vector representation of each POI from the skip-gram model.
  • when training the skip-gram model with the slices, the third obtaining unit 05 can use the skip-gram model to encode the attribute information of each POI into its vector representation, use the vector representation of the center POI in each slice to predict the vector representations of the context POIs in the same slice, and iteratively update the skip-gram model's parameters according to the prediction error.
  • the name and address information of the POI can be encoded with a convolutional neural network and the other attribute information with a feedforward neural network; the encoding results of the same POI are concatenated and then mapped through a fully connected layer to obtain the POI's vector representation.
  • the scoring unit 03 is responsible for inputting the vector representation of the user's query history information and the vector representation of each candidate POI into the pre-trained ranking model to obtain a score for each candidate POI.
  • the scoring unit 03 may input the attribute feature vector representation of the user and the popularity feature vector representation of each candidate POI into the ranking model together, for the ranking model to score each candidate POI.
  • for the specific processing, refer to the related description in Embodiment 1; it is not repeated here.
  • the query completion unit 04 is responsible for determining the query completion suggestions recommended to the user according to the score of each candidate POI. For example, candidate POIs whose score is greater than or equal to a preset score threshold can be used as query completion suggestions, or the P highest-scoring POIs can be used, where P is a preset positive integer.
  • when the suggestions are recommended, they are sorted in the candidate list by score. They can be presented in the existing drop-down box near the search box or in another form.
  • FIG. 8 is a structural diagram of an apparatus for establishing a ranking model provided by an embodiment of the application.
  • the apparatus may include a first obtaining unit 11, a second obtaining unit 12, and a model training unit 13, and may further include a third obtaining unit 14.
  • the main functions of each component are as follows:
  • the first obtaining unit 11 is responsible for obtaining the following from the POI query log: the user ID, the query prefix the user had entered when selecting a POI from the query completion suggestions, each POI in the query completion suggestions corresponding to that query prefix, and the POI selected by the user from the query completion suggestions.
  • the second acquiring unit 12 is responsible for acquiring the vector representation of the query history information before the user enters the query prefix and the vector representation of each POI in the query completion suggestion.
  • the second obtaining unit 12 may obtain the query history information of the user before entering the query prefix.
  • the query history information includes the POIs that the user queried or clicked within a first duration before entering the query prefix and the high-frequency POIs queried or clicked within a second duration.
  • the second duration is longer than the first duration; the POI vector representations are used to obtain the vector representation of the query history information before the user enters the query prefix.
  • the vector representation of each POI is obtained in advance by the third obtaining unit 14.
  • the third obtaining unit 14 may obtain POI query logs of large-scale users and arrange the POIs queried or clicked by each user in chronological order to obtain POI sequences; slice each POI sequence according to a preset sliding-window size, each slice including a center POI and the context POIs of that center POI; train the skip-gram model with the slices; and, after training, obtain the vector representation of each POI from the skip-gram model.
  • when training the skip-gram model with the slices, the third obtaining unit 14 can use the skip-gram model to encode the attribute information of each POI into its vector representation, use the vector representation of the center POI in each slice to predict the vector representations of the context POIs in the same slice, and iteratively update the skip-gram model's parameters according to the prediction error.
  • the name and address information of the POI can be encoded with a convolutional neural network and the other attribute information with a feedforward neural network; the encoding results of the same POI are concatenated and then mapped through a fully connected layer to obtain the POI's vector representation.
  • the model training unit 13 is responsible for training the neural network model to obtain the ranking model. A positive example is the vector representation of the user's query history information before entering the query prefix together with the vector representation of the POI the user selected in the query completion suggestions. A negative example is the same query history vector representation together with a POI the user did not select in the corresponding query completion suggestions. The training goal is to maximize the difference between the neural network model's score for the positive-example POI and its score for the negative-example POI.
  • the above positive example may further include the attribute feature vector representation of the user and the popularity feature vector representation of the POI selected by the user;
  • the above negative example may further include the attribute feature vector representation of the user and the popularity feature vector representation of the POI not selected by the user.
  • the present application also provides an electronic device and a readable storage medium.
  • FIG. 9 is a block diagram of an electronic device for the query auto-completion method or the ranking-model establishment method according to an embodiment of the present application.
  • Electronic devices are intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices can also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are merely examples and are not intended to limit the implementations of the application described and/or claimed herein.
  • the electronic device includes one or more processors 901, a memory 902, and interfaces for connecting various components, including a high-speed interface and a low-speed interface.
  • the various components are connected to each other using different buses, and can be installed on a common motherboard or installed in other ways as needed.
  • the processor may process instructions executed in the electronic device, including instructions stored in or on the memory to display graphical information of the GUI on an external input/output device (such as a display device coupled to an interface).
  • in other implementations, multiple processors and/or multiple buses can be used together with multiple memories if needed.
  • multiple electronic devices can be connected, and each device provides part of the necessary operations (for example, as a server array, a group of blade servers, or a multi-processor system).
  • a processor 901 is taken as an example.
  • the memory 902 is a non-transitory computer-readable storage medium provided by this application.
  • the memory stores instructions that can be executed by at least one processor, so that the at least one processor executes the method for automatic query completion or the method for establishing a ranking model provided in this application.
  • the non-transitory computer-readable storage medium of the present application stores computer instructions, and the computer instructions are used to make a computer execute the method for automatic query completion or the method for establishing a ranking model provided by the present application.
  • the memory 902, as a non-transitory computer-readable storage medium, can be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the query auto-completion method or the ranking-model establishment method in the embodiments of the present application.
  • the processor 901 executes the server's various functional applications and data processing by running the non-transitory software programs, instructions, and modules stored in the memory 902, that is, it implements the query auto-completion method or the ranking-model establishment method in the foregoing method embodiments.
  • the memory 902 may include a program storage area and a data storage area, where the program storage area can store an operating system and an application program required by at least one function; the data storage area can store data created according to the use of the electronic device, and the like.
  • the memory 902 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices.
  • the memory 902 may optionally include a memory remotely provided with respect to the processor 901, and these remote memories may be connected to the electronic device through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • the electronic device may further include: an input device 903 and an output device 904.
  • the processor 901, the memory 902, the input device 903, and the output device 904 may be connected by a bus or in other ways. The connection by a bus is taken as an example in FIG. 9.
  • the input device 903 can receive input numeric or character information and generate key-signal input related to user settings and function control of the electronic device; examples include a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, joystick, and other input devices.
  • the output device 904 may include a display device, an auxiliary lighting device (for example, LED), a tactile feedback device (for example, a vibration motor), and the like.
  • the display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some embodiments, the display device may be a touch screen.
  • Various implementations of the systems and techniques described herein can be realized in digital electronic circuitry, integrated circuit systems, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor. That processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
  • the terms “machine-readable medium” and “computer-readable medium” refer to any computer program product, apparatus, and/or device (for example, magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.
  • the term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • to provide interaction with the user, the systems and techniques described here can be implemented on a computer that has a display device for displaying information to the user (for example, a CRT (cathode-ray tube) or LCD (liquid crystal display) monitor), and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer.
  • Other types of devices can also be used to provide interaction with the user. For example, the feedback provided to the user can be any form of sensory feedback (for example, visual, auditory, or tactile feedback), and input from the user can be received in any form (including acoustic, voice, or tactile input).
  • the systems and techniques described herein can be implemented in any of the following:
    a computing system that includes back-end components (for example, a data server);
    a computing system that includes middleware components (for example, an application server);
    a computing system that includes front-end components (for example, a user computer with a graphical user interface or a web browser, through which the user can interact with implementations of the systems and techniques described herein);
    a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system can be connected to each other through any form or medium of digital data communication (for example, a communication network). Examples of communication networks include: local area network (LAN), wide area network (WAN), and the Internet.
  • the computer system can include clients and servers.
  • the client and server are generally remote from each other and usually interact through a communication network.
  • the client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other.
  • the user's query history information is integrated into the ranking model to sort the candidate POIs, so that the query completion suggestions recommended to the user are more in line with the user's search preferences.
  • when integrating the user's query history information, this application considers both the user's short-term immediate interest and long-term interest preferences, so that the recommended query completion suggestions match the user's search preferences as closely as possible.
  • this application uses the skip-gram model to determine POI vector representations, so that these representations better satisfy contextual constraints in textual meaning.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

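This disclosure relates to a method, apparatus, device, and computer storage medium for query auto-completion, in the field of intelligent search technology. The method comprises the following steps:
(301) obtaining the query prefix currently input by a user and determining the candidate POIs corresponding to the query prefix;
(302) obtaining a vector representation of the user's query history information and a vector representation of each candidate POI;
(303) inputting these vector representations into a pre-trained ranking model to obtain a score for each candidate POI;
(304) determining, according to the scores of the candidate POIs, the query completion suggestions to recommend to the user.
The method makes the recommended query completion suggestions better match the user's actual needs.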

Description

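Method, Apparatus, Device, and Computer Storage Medium for Query Auto-Completion
This application claims priority to Chinese patent application No. 2020100104792, filed on January 6, 2020, entitled “Method, Apparatus, Device, and Computer Storage Medium for Query Auto-Completion”.
Technical Field
This application relates to the field of computer application technology, and in particular to a method, apparatus, device, and computer storage medium for query auto-completion in the field of intelligent search technology.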
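Background
QAC (Query Auto-Completion) is now widely adopted by mainstream general-purpose and vertical search engines. For example, in a map application, a user types a query to search for a POI (Point of Interest). In this application, the incomplete query the user has typed so far is called the query prefix. As soon as the user starts typing, the search engine can recommend, in real time, a series of candidate POIs in a candidate list, from which the user can pick one as the completion of the query. In this application, the queries recommended in the candidate list are called query completion suggestions. Once the user finds the intended POI in the candidate list, selecting it completes the query and initiates a search for that POI.
For example, as shown in FIG. 1, when the user enters the query prefix “百度” (Baidu) in the search box of a map application, candidate POIs such as “百度大厦” (Baidu Building), “百度大厦-C座” (Baidu Building, Block C), and “百度科技园” (Baidu Technology Park) can be recommended in a candidate list for the user to choose from. Once the user selects “百度大厦”, the query is completed and a search for “百度大厦” is initiated.
However, in existing query auto-completion schemes, every user receives the same suggestions for the same query prefix. For example, the candidate list is ranked by each POI's search popularity, which does not serve users' personalized needs well.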
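Summary
In view of this, this application provides a method, apparatus, device, and computer storage medium for query auto-completion, so that the recommended query completion suggestions better match users' actual needs.
In a first aspect, this application provides a method for query auto-completion, the method comprising:
obtaining a query prefix currently input by a user, and determining candidate points of interest (POIs) corresponding to the query prefix;
obtaining a vector representation of the user's query history information and a vector representation of each candidate POI;
inputting the vector representation of the user's query history information and the vector representation of each candidate POI into a pre-trained ranking model to obtain a score for each candidate POI;
determining, according to the scores of the candidate POIs, query completion suggestions to recommend to the user.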
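According to a preferred embodiment of this application, obtaining the vector representation of the user's query history information comprises:
obtaining the user's query history information, the query history information comprising POIs queried or clicked by the user within a first duration and high-frequency POIs queried or clicked by the user within a second duration, the second duration being longer than the first duration;
obtaining the vector representation of the user's query history information by using vector representations of POIs.
According to a preferred embodiment of this application, the vector representation of each POI is obtained in advance as follows:
obtaining POI query logs of large-scale users, and arranging the POIs queried or clicked by each user in chronological order to obtain POI sequences;
slicing each POI sequence according to a preset sliding-window size, each slice comprising a center POI and the context POIs of the center POI;
training a skip-gram model with the slices;
after training ends, obtaining the vector representation of each POI from the skip-gram model.
According to a preferred embodiment of this application, training the skip-gram model with the slices comprises:
encoding the attribute information of each POI with the skip-gram model to obtain the vector representation of each POI, predicting the vector representations of the context POIs in each slice from the vector representation of the center POI in the same slice, and iteratively updating the model parameters of the skip-gram model according to the error of the prediction results.
According to a preferred embodiment of this application, encoding the attribute information of each POI comprises:
encoding the name and address information of the POI with a convolutional neural network;
encoding the other attribute information of the POI with a feedforward neural network;
concatenating the encoding results of the same POI and mapping them through a fully connected layer to obtain the vector representation of the POI.
According to a preferred embodiment of this application, when scoring each candidate POI, the ranking model further uses an attribute feature vector representation of the user and a popularity feature vector representation of each candidate POI.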
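In a second aspect, this application provides a method for establishing a ranking model for query auto-completion, the method comprising:
obtaining the following from POI query logs: a user identifier, the query prefix the user had entered when selecting a POI from query completion suggestions, the POIs in the query completion suggestions corresponding to that query prefix, and the POI selected by the user from the query completion suggestions;
obtaining a vector representation of the user's query history information before the user entered the query prefix and a vector representation of each POI in the query completion suggestions;
constructing positive and negative examples and training a neural network model on them to obtain the ranking model, wherein:
a positive example consists of the vector representation of the user's query history information before entering the query prefix together with the vector representation of the POI selected by the user in the corresponding query completion suggestions;
a negative example consists of the vector representation of the user's query history information before entering the query prefix together with a POI not selected by the user in the corresponding query completion suggestions;
the training objective is to maximize the difference between the neural network model's score for the positive-example POI and its score for the negative-example POI.
According to a preferred embodiment of this application, obtaining the vector representation of the user's query history information before entering the query prefix comprises:
obtaining the user's query history information before entering the query prefix, the query history information comprising POIs queried or clicked by the user within a first duration before entering the query prefix and high-frequency POIs queried or clicked within a second duration, the second duration being longer than the first duration;
obtaining, by using vector representations of POIs, the vector representation of the user's query history information before entering the query prefix.
According to a preferred embodiment of this application, the vector representation of each POI is obtained in advance as follows:
obtaining POI query logs of large-scale users, and arranging the POIs queried or clicked by each user in chronological order to obtain POI sequences;
slicing each POI sequence according to a preset sliding-window size, each slice comprising a center POI and the context POIs of the center POI;
training a skip-gram model with the slices;
after training ends, obtaining the vector representation of each POI from the skip-gram model.
According to a preferred embodiment of this application, training the skip-gram model with the slices comprises:
encoding the attribute information of each POI with the skip-gram model to obtain the vector representation of each POI, predicting the vector representations of the context POIs in each slice from the vector representation of the center POI in the same slice, and iteratively updating the model parameters of the skip-gram model according to the error of the prediction results.
According to a preferred embodiment of this application, the positive example further comprises the attribute feature vector representation of the user and the popularity feature vector representation of the POI selected by the user;
the negative example further comprises the attribute feature vector representation of the user and the popularity feature vector representation of the POI not selected by the user.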
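In a third aspect, this application further provides an apparatus for query auto-completion, the apparatus comprising:
a first obtaining unit, configured to obtain a query prefix currently input by a user and determine candidate POIs corresponding to the query prefix;
a second obtaining unit, configured to obtain a vector representation of the user's query history information and a vector representation of each candidate POI;
a scoring unit, configured to input the vector representation of the user's query history information and the vector representation of each candidate POI into a pre-trained ranking model to obtain a score for each candidate POI;
a query completion unit, configured to determine, according to the scores of the candidate POIs, query completion suggestions to recommend to the user.
In a fourth aspect, this application provides an apparatus for establishing a ranking model for query auto-completion, the apparatus comprising:
a first obtaining unit, configured to obtain the following from POI query logs: a user identifier, the query prefix the user had entered when selecting a POI from query completion suggestions, the POIs in the query completion suggestions corresponding to that query prefix, and the POI selected by the user from the query completion suggestions;
a second obtaining unit, configured to obtain a vector representation of the user's query history information before the user entered the query prefix and a vector representation of each POI in the query completion suggestions;
a model training unit, configured to train a neural network model to obtain the ranking model. A positive example consists of the vector representation of the user's query history information before entering the query prefix together with the vector representation of the POI selected by the user in the corresponding query completion suggestions. A negative example consists of the same query history vector representation together with a POI not selected by the user in the corresponding query completion suggestions. The training objective is to maximize the difference between the neural network model's score for the positive-example POI and its score for the negative-example POI.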
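In a fifth aspect, this application provides an electronic device, comprising:
at least one processor; and
a memory communicatively connected to the at least one processor; wherein
the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method described above.
In a sixth aspect, this application provides a non-transitory computer-readable storage medium storing computer instructions, the computer instructions being used to cause a computer to perform the method described above.
As the above technical solutions show, this application integrates the user's query history information into the ranking model that ranks candidate POIs in POI query auto-completion. As a result, the query completion suggestions recommended to the user better match the user's search preferences.
Other effects of the above optional implementations are described below in connection with specific embodiments.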
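Brief Description of the Drawings
The drawings are provided for a better understanding of this solution and do not limit this application. In the drawings:
FIG. 1 is an example of a query auto-completion interface;
FIG. 2 shows an exemplary system architecture to which embodiments of the present invention can be applied;
FIG. 3 is a flowchart of the query completion method provided in Embodiment 1 of this application;
FIG. 4 is a flowchart of the method for obtaining POI vector representations provided in Embodiment 1 of this application;
FIG. 5 is a schematic diagram of the processing in the method provided by an embodiment of this application;
FIG. 6 is a flowchart of the method for establishing a ranking model provided in Embodiment 2 of this application;
FIG. 7 is a structural diagram of the apparatus for query auto-completion provided in Embodiment 3 of this application;
FIG. 8 is a structural diagram of the apparatus for establishing a ranking model provided by an embodiment of this application;
FIG. 9 is a block diagram of an electronic device for implementing the methods of the embodiments of this application.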
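Detailed Description
Exemplary embodiments of this application are described below with reference to the drawings. They include various details of the embodiments to aid understanding and should be regarded as merely exemplary. A person of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described here without departing from the scope and spirit of this application. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.
FIG. 2 shows an exemplary system architecture to which embodiments of the present invention can be applied. As shown in FIG. 2, the system architecture may include terminal devices 101 and 102, a network 103, and a server 104. The network 103 is the medium that provides communication links between the terminal devices 101 and 102 and the server 104. The network 103 may include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user may use the terminal devices 101 and 102 to interact with the server 104 through the network 103. Various applications may be installed on the terminal devices 101 and 102, such as voice interaction applications, web browser applications, and communication applications.
The terminal devices 101 and 102 may be various electronic devices, including but not limited to smartphones, tablets, PCs, and smart TVs. The apparatus for query auto-completion provided by the present invention may be deployed and run on the server 104. It may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module, which is not specifically limited here.
For example, a user may enter a query prefix in the search interface provided by a browser or client on terminal device 101. The browser or client sends the query prefix to the server 104 in real time, and the server uses the method provided in this application to return to terminal device 101 the query completion suggestions corresponding to the current query prefix. If the user finds the desired POI among the suggestions, the user can initiate a search for that POI by selecting it. If not, the user can continue typing; the browser or client then sends the updated query prefix to the server 104 in real time, and the server 104 returns the query completion suggestions corresponding to it. The effect is that, while the user is typing a query, query completion suggestions are recommended in real time as the query prefix grows.
The server 104 may be a single server or a server group composed of multiple servers. It should be understood that the numbers of terminal devices, networks, and servers in FIG. 2 are merely illustrative; there may be any number of terminal devices, networks, and servers as required by the implementation.
The technical essence of this application is to establish an association between users and POIs. A typical usage scenario is a user searching for POIs in map data, where query completion suggestions are recommended in real time as the user types the query prefix. The query completion suggestions are obtained by first determining the candidate POIs corresponding to the query prefix and then ranking them with a ranking model.
In the prior art, candidate POIs are usually ranked by their popularity features, and in some cases certain user attribute features are also considered. However, such ranking does not meet users' actual needs well. Statistics on real POI searches by users in large-scale map data show that a large number of users repeatedly search for the same POIs; for example, about 20% of users search for the same POI again within 7 days. Based on this, the core idea of this application is to integrate each user's personalized query history information into the ranking model as a feature unique to that user. Cases in which a user repeatedly searches for the same POI can then be captured quickly, and the user's search intent can be completed faster. The method provided by this application is described in detail below with reference to embodiments.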
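Embodiment 1
FIG. 3 is a flowchart of the query completion method provided in Embodiment 1 of this application. As shown in FIG. 3, the method may include the following steps:
In 301, obtain the query prefix currently input by the user, and determine the candidate POIs corresponding to the query prefix.
This application is applicable to various forms of input, such as Chinese characters, pinyin, or initial-letter abbreviations; in all cases the query prefix can be treated as a character string. As the user types, the query prefix currently entered is obtained in real time. For example, while entering “百度大厦” (Baidu Building), the user passes through several query prefixes such as “百”, “百度”, and “百度大”, and the method provided by this application is performed for each of them. When the user has entered “百”, the current query prefix is “百”, and the method is performed for that prefix to recommend query completion suggestions. The same happens when the user has entered “百度” and then “百度大”.
Candidate POIs for the current query prefix can be determined with an existing implementation; the goal is to find POIs strongly related to the query prefix, or POIs whose text begins with the query prefix. For example, an inverted index keyed by the various query prefixes of each POI can be built in advance over the POI library. When the user enters a query, the POI library is looked up with the current query prefix, and all POIs hit are taken as candidate POIs.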
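The prefix-keyed inverted index described above can be sketched as follows. This is a minimal illustration only. It assumes POIs carry hypothetical IDs and are indexed only by the leading characters of their names. The pinyin and initial-letter prefixes mentioned in step 301 would need extra index keys and are not covered.

```python
from collections import defaultdict


def build_prefix_index(poi_names):
    """Map every prefix of every POI name to the set of POI IDs it can complete to.

    poi_names: dict of POI ID -> display name (a hypothetical input format).
    """
    index = defaultdict(set)
    for poi_id, name in poi_names.items():
        # Index each non-empty leading substring: "百", "百度", "百度大", ...
        for end in range(1, len(name) + 1):
            index[name[:end]].add(poi_id)
    return index


def candidate_pois(index, prefix):
    """Return every POI hit by the current query prefix, sorted for stable output."""
    return sorted(index.get(prefix, set()))


pois = {"p1": "百度大厦", "p2": "百度大厦-C座", "p3": "百度科技园", "p4": "北京站"}
index = build_prefix_index(pois)
for typed in ["百", "百度", "百度大"]:  # the prefix grows as the user types
    print(typed, candidate_pois(index, typed))
```

Typing “百度大” narrows the candidates to `p1` and `p2`. A production index would also cap the number of POIs stored per key.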
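In 302, obtain the vector representation of the user's query history information and the vector representation of each candidate POI.
To obtain the vector representation of the user's query history, the user's query history information can be obtained first. The vector representations of POIs are then used to obtain the vector representation of that query history information.
Specifically, the user's query history information may include the POIs queried or clicked by the user within a first duration and the high-frequency POIs queried or clicked by the user within a second duration, the second duration being longer than the first.
The POIs queried or clicked within the first duration can be regarded as the user's short-term query history. This may include the user's preceding actions in the same search session as the current query prefix, for example the POIs queried and the POIs clicked in that session before the current prefix was typed. The short-term search history acts as the context of the current query (query prefix) and reflects the user's short-term, immediate interest.
Here, a “session” refers to a search session, and a widely used rule for determining search sessions can be adopted. If the user had no search activity during the first duration (for example, 30 minutes) before a search action, that action is regarded as the start of a new session. In other words, if the user performs consecutive search actions within 30 minutes, those consecutive actions all belong to the same session.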
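The 30-minute session rule can be expressed in a few lines. This is a sketch under stated assumptions: events are already sorted by time, the gap is measured from the previous event, and a gap of exactly 30 minutes still counts as the same session.

```python
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)  # the example "first duration" from the text


def split_sessions(events, gap=SESSION_GAP):
    """Group time-sorted (timestamp, poi_id) events into search sessions.

    A search opens a new session when the user had no search activity during
    the `gap` before it; otherwise it joins the current session.
    """
    sessions = []
    last_time = None
    for ts, poi in events:
        if last_time is None or ts - last_time > gap:
            sessions.append([])  # idle longer than the gap: start a new session
        sessions[-1].append(poi)
        last_time = ts
    return sessions


t0 = datetime(2020, 1, 6, 9, 0)
events = [(t0, "A"), (t0 + timedelta(minutes=10), "B"),
          (t0 + timedelta(minutes=50), "C"), (t0 + timedelta(minutes=70), "D")]
print(split_sessions(events))  # [['A', 'B'], ['C', 'D']]
```

The 40-minute silence between B and C opens a second session. C and D are only 20 minutes apart, so D stays in that second session.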
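The POIs queried or clicked within the second duration can be regarded as the user's long-term query history. This covers all of the user's search actions within the second duration before the current query (query prefix), including the high-frequency POIs queried or clicked in all of the user's sessions in that period. A “high-frequency POI” can be a POI whose query or click count exceeds a preset threshold. The long-term query history reflects the user's long-term, intrinsic interest preferences.
In this application, the first duration can be on the order of minutes or hours, for example 30 minutes, and the second duration can be on the order of days or months, for example 3 months.
In the embodiments of this application, the vector representation of each POI can be obtained in advance. Suppose the POI vector representations are k-dimensional, where k is a positive integer greater than 1. If the user queried or clicked m POIs within the first duration and n high-frequency POIs within the second duration, representing these (m+n) POIs by their vectors yields an (m+n)×k matrix. This matrix serves as the vector representation of the user's query history information.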
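The following sketch shows how the (m+n)×k history matrix could be assembled from a timestamped log. Several parts are assumptions not given in the text:

- the short-term part is approximated by a 30-minute look-back window instead of exact session boundaries;
- 90 days stands in for "3 months";
- the high-frequency threshold `min_count=3` is hypothetical;
- the POI vectors are toy values.

```python
from collections import Counter
from datetime import datetime, timedelta


def query_history_matrix(events, now, poi_vec,
                         short_window=timedelta(minutes=30),
                         long_window=timedelta(days=90),
                         min_count=3):
    """Build U_per: one k-dim row per short-term POI (m rows) and per long-term
    high-frequency POI (n rows), i.e. an (m + n) x k matrix as a list of lists.

    events:    (timestamp, poi_id) queries/clicks by one user.
    poi_vec:   dict poi_id -> pre-trained k-dim vector.
    min_count: hypothetical threshold for calling a POI "high-frequency".
    """
    past = [(ts, poi) for ts, poi in events if ts < now]
    # Short-term history: POIs within the first duration before `now`, in time order.
    short_term = [poi for ts, poi in sorted(past) if now - ts <= short_window]
    # Long-term history: POIs seen at least `min_count` times within the second duration.
    counts = Counter(poi for ts, poi in past if now - ts <= long_window)
    long_term = sorted(poi for poi, c in counts.items() if c >= min_count)
    return [poi_vec[poi] for poi in short_term + long_term]


now = datetime(2020, 1, 6, 18, 0)
vectors = {"home": [1.0, 0.0], "office": [0.0, 1.0], "mall": [0.5, 0.5]}
log = [(now - timedelta(days=d), "office") for d in (1, 2, 3)]
log.append((now - timedelta(minutes=5), "mall"))
U_per = query_history_matrix(log, now, vectors)
print(U_per)  # [[0.5, 0.5], [0.0, 1.0]]: m = 1 short-term row, n = 1 long-term row
```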
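The method for obtaining POI vector representations in advance is described in detail below. The POI vector representation here captures the textual meaning of the POI. It can be obtained as shown in FIG. 4, through the following steps:
In 401, obtain POI query logs of large-scale users, and arrange the POIs queried or clicked by each user in chronological order to obtain POI sequences.
From the POI query logs of large-scale users, the POIs each user queried or clicked are collected per user in chronological order, for example:
user_A:POI_ID_1,POI_ID_2,POI_ID_3,…
user_B:POI_ID_2,POI_ID_6,POI_ID_7,POI_ID_8,…
In 402, slice each POI sequence according to a preset sliding-window size, each slice comprising a center POI and the context POIs of that center POI.
For example, if the sliding-window size is 3, each POI sequence can be cut into slices of at most 3 POIs. Slicing the POI sequence of user_B above yields slices such as [POI_ID_2, POI_ID_6, POI_ID_7] and [POI_ID_6, POI_ID_7, POI_ID_8].
Each slice contains a center POI and the context POIs of that center POI. The center POI is a POI not located at either end of the slice. Its context POIs can be all the other POIs in the slice, or only the POIs adjacent to the center POI in the slice.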
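The sliding-window slicing for the user_B example can be sketched as follows. The sketch makes three assumptions. The window size is odd, so the middle POI is the center. The context is "all other POIs in the slice", the first of the two options above. Sequences shorter than the window produce no slice, although the text allows shorter slices.

```python
def slice_sequence(seq, window=3):
    """Cut a time-ordered POI sequence into overlapping slices of `window` POIs.

    Returns (center, context) pairs: the center is the middle POI of each slice
    and the context is every other POI in that slice. Assumes an odd window;
    sequences shorter than the window yield no slices in this sketch.
    """
    half = window // 2
    pairs = []
    for start in range(len(seq) - window + 1):
        piece = seq[start:start + window]
        pairs.append((piece[half], piece[:half] + piece[half + 1:]))
    return pairs


user_b = ["POI_ID_2", "POI_ID_6", "POI_ID_7", "POI_ID_8"]
for center, context in slice_sequence(user_b):
    print(center, context)
# POI_ID_6 ['POI_ID_2', 'POI_ID_7']
# POI_ID_7 ['POI_ID_6', 'POI_ID_8']
```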
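In 403, train a skip-gram model with the slices.
The skip-gram model is a model used in natural language processing to predict the context words of a given center word. This application borrows the skip-gram model to obtain the vector representation of each POI.
Specifically, the skip-gram model can be used to encode the attribute information of each POI into the POI's vector representation. The vector representation of the center POI in each slice is used to predict the vector representations of the context POIs in the same slice, and the model parameters of the skip-gram model are updated iteratively according to the error of the prediction results.
The attribute information encoded for each POI may include, but is not limited to, the POI's identifier, name, category, address, and tags. The name and address information can be encoded with a convolutional neural network, and the other attribute information with a feedforward neural network. The encoding results of the same POI are then concatenated, and the resulting vector is mapped through a fully connected layer to obtain the POI's vector representation.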
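The encoder structure described above can be illustrated with a forward pass in plain Python. This toy sketch is untrained, and several details are hypothetical:

- the layer sizes;
- the character vocabulary;
- the numeric `attrs` encoding of category and tags;
- one convolution shared by name and address (the text does not say whether they share weights).

In the application this encoder sits inside the skip-gram model and is trained with it. Only the forward computation is shown here.

```python
import random


def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def relu(v):
    return [max(0.0, a) for a in v]


class PoiEncoder:
    """Forward pass only: CNN over name/address characters, feedforward layer over
    other attributes, concatenation, then a fully connected layer to k dimensions."""

    def __init__(self, vocab, n_attr, emb=8, filters=6, width=2, hidden=4, k=5, seed=0):
        rng = random.Random(seed)
        self.vocab = {ch: i for i, ch in enumerate(vocab)}
        self.char_emb = rand_matrix(len(vocab) + 1, emb, rng)  # last row: unknown char
        self.width = width
        self.conv = rand_matrix(filters, emb * width, rng)      # filters x (width * emb)
        self.ffn = rand_matrix(hidden, n_attr, rng)
        self.fc = rand_matrix(k, 2 * filters + hidden, rng)

    def _cnn(self, text):
        """Character embeddings -> 1-D convolution + ReLU -> max-pool over positions."""
        unk = len(self.char_emb) - 1
        rows = [self.char_emb[self.vocab.get(ch, unk)] for ch in text]
        while len(rows) < self.width:                           # pad very short strings
            rows.append([0.0] * len(self.char_emb[0]))
        feats = []
        for s in range(len(rows) - self.width + 1):
            window = [v for row in rows[s:s + self.width] for v in row]
            feats.append(relu(matvec(self.conv, window)))
        return [max(col) for col in zip(*feats)]

    def encode(self, name, address, attrs):
        other = relu(matvec(self.ffn, attrs))                   # feedforward net
        joined = self._cnn(name) + self._cnn(address) + other   # concatenate encodings
        return matvec(self.fc, joined)                          # fully connected mapping


enc = PoiEncoder(vocab="百度大厦科技园北京海淀区上地十街", n_attr=3)
vec = enc.encode("百度大厦", "北京海淀区上地十街10号", attrs=[1.0, 0.0, 1.0])
print(len(vec))  # 5, i.e. k
```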
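The model parameters of the skip-gram model are updated iteratively until a termination condition is met. For example, training ends when the error of the prediction results meets a preset requirement or when the number of iterations reaches a preset threshold.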
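The training loop, with its two stopping conditions, can be sketched as skip-gram with negative sampling over the (center, context) slices. Two simplifications are swapped in and should not be read as the application's method:

- each POI gets a plain embedding-table row instead of the attribute encoder;
- negatives are drawn uniformly instead of from word2vec's usual smoothed unigram distribution.

All hyperparameter values are illustrative.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_skipgram(pairs, pois, dim=8, neg=2, lr=0.1, max_epochs=200, tol=1e-6, seed=0):
    """Skip-gram with negative sampling over (center, [context...]) slices.

    Returns (center embeddings, per-epoch mean losses). Training stops when the
    mean loss changes by less than `tol` or after `max_epochs` epochs.
    """
    rng = random.Random(seed)
    emb_in = {p: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for p in pois}
    emb_out = {p: [0.0] * dim for p in pois}
    losses = []
    for _ in range(max_epochs):
        total, count = 0.0, 0
        for center, context in pairs:
            v = emb_in[center]
            for ctx in context:
                # One observed context POI (label 1) plus `neg` uniform negatives (label 0).
                targets = [(ctx, 1.0)] + [(rng.choice(pois), 0.0) for _ in range(neg)]
                grad_v = [0.0] * dim
                for tgt, label in targets:
                    u = emb_out[tgt]
                    score = sigmoid(sum(a * b for a, b in zip(v, u)))
                    total -= math.log(max(score if label else 1.0 - score, 1e-12))
                    g = lr * (label - score)  # step along the log-likelihood gradient
                    for d in range(dim):
                        grad_v[d] += g * u[d]
                        u[d] += g * v[d]
                for d in range(dim):
                    v[d] += grad_v[d]
                count += 1
        losses.append(total / max(count, 1))
        if len(losses) > 1 and abs(losses[-2] - losses[-1]) < tol:
            break
    return emb_in, losses


pairs = [("B", ["A", "C"]), ("C", ["B", "D"]), ("D", ["C", "E"]), ("E", ["D", "F"])]
emb, losses = train_skipgram(pairs, pois=list("ABCDEF"))
print(len(emb["C"]), round(losses[0], 3), round(losses[-1], 3))
```

After training, `emb` plays the role of the per-POI vectors read out in step 404.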
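In 404, after training ends, obtain the vector representation of each POI from the skip-gram model.
Returning to FIG. 3: in 303, input the vector representation of the user's query history information and the vector representation of each candidate POI into the pre-trained ranking model to obtain a score for each candidate POI.
When scoring each candidate POI, the ranking model further uses the attribute feature vector representation of the user and the popularity feature vector representation of each candidate POI. The inputs of the ranking model are therefore:
the vector representation of the user's query history information;
the vector representation of each candidate POI;
the user's attribute feature vector representation;
each candidate POI's popularity feature vector representation.
The output of the ranking model is a score for each candidate POI. The ranking model can be a neural network model; its training process is described in detail in Embodiment 2.
The user's attribute features may include information such as the user's age, gender, occupation, income level, and city, and their vector representation can be obtained by encoding this information. A candidate POI's popularity features can be characterized by information such as its click frequency, search frequency, and navigation frequency, and their vector representation can likewise be obtained by encoding this information. Details are omitted here; existing techniques can be used.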
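Since the text leaves the encoding of these features to existing techniques, the following is only one common choice, shown as an assumption. Categorical user attributes are one-hot encoded, and frequency counts are log-scaled. The field names and vocabularies are hypothetical. Numeric attributes such as age or income level would need bucketing first, which is not shown.

```python
import math


def one_hot(value, vocabulary):
    """One-hot code for a categorical attribute; unseen values give all zeros."""
    return [1.0 if value == item else 0.0 for item in vocabulary]


def user_attribute_vector(user, schema):
    """U_d: concatenated one-hot codes of categorical user attributes."""
    vec = []
    for field, vocabulary in schema.items():
        vec += one_hot(user.get(field), vocabulary)
    return vec


def popularity_vector(stats, keys=("clicks", "searches", "navigations")):
    """V_pop: log(1 + count) of each frequency, damping very popular POIs."""
    return [math.log1p(stats.get(key, 0)) for key in keys]


schema = {"gender": ["F", "M"], "city": ["北京", "上海"]}  # hypothetical fields
print(user_attribute_vector({"gender": "F", "city": "北京"}, schema))  # [1.0, 0.0, 1.0, 0.0]
print(popularity_vector({"clicks": 0, "searches": 0}))                # [0.0, 0.0, 0.0]
```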
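The embodiments of this application use the following notation:
$e_{w2v}$: the vector representation of a POI;
$U_{per}$: the vector representation of the user's query history information;
$U_d$: the vector representation of the user's attribute features;
$V_{pop}$: the vector representation of a candidate POI's popularity features.
The whole process described above can be as shown in FIG. 5. In one implementation, the (m+n)×k matrix $U_{per}$ is multiplied by the k-dimensional vector $e_{w2v}$ to obtain the (m+n)-dimensional similarity feature matrix $V_{per}$.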
$$V_{per} = U_{per}\, e_{w2v}$$
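$V_{per}$, $U_d$, and $V_{pop}$ are then input into the ranking model. The model can concatenate them into a new feature vector and transform it through a neural network to obtain the score of the candidate POI.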
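The FIG. 5 scoring path can be sketched as a matrix-vector product followed by concatenation and a small network. The untrained one-hidden-layer tanh network stands in for the unspecified ranking model, so it is hypothetical. Padding or truncating $V_{per}$ to a fixed `max_len` is also an assumption: without it, users with different history lengths (m+n) would produce inputs of different sizes.

```python
import math
import random


def similarity_features(U_per, e_w2v, max_len):
    """V_per = U_per · e_w2v: one dot product per history POI.

    Padding/truncating to `max_len` is an assumption of this sketch, so that
    users with different history lengths (m + n) yield fixed-size inputs.
    """
    sims = [sum(a * b for a, b in zip(row, e_w2v)) for row in U_per]
    return (sims + [0.0] * max_len)[:max_len]


class RankingMLP:
    """Hypothetical scorer h(): concatenate [V_per, U_d, V_pop], one tanh hidden layer."""

    def __init__(self, in_dim, hidden=8, seed=0):
        rng = random.Random(seed)
        self.W1 = [[rng.gauss(0.0, 0.3) for _ in range(in_dim)] for _ in range(hidden)]
        self.w2 = [rng.gauss(0.0, 0.3) for _ in range(hidden)]

    def score(self, V_per, U_d, V_pop):
        x = V_per + U_d + V_pop  # concatenate into one feature vector
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in self.W1]
        return sum(w * hi for w, hi in zip(self.w2, h))


U_per = [[0.5, 0.5], [0.0, 1.0]]                            # m + n = 2 history POIs, k = 2
candidates = {"office": [0.0, 1.0], "station": [1.0, 0.0]}  # e_w2v of each candidate
popularity = {"office": [0.2], "station": [0.9]}            # V_pop of each candidate
U_d = [1.0, 0.0]                                            # user attribute vector
model = RankingMLP(in_dim=4 + len(U_d) + 1)
for name, e_w2v in candidates.items():
    V_per = similarity_features(U_per, e_w2v, max_len=4)
    print(name, V_per, model.score(V_per, U_d, popularity[name]))
```

The candidate that matches the user's history ("office") gets the larger similarity entries in $V_{per}$. How much that raises its score depends on the trained weights.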
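In 304, determine the query completion suggestions to recommend to the user according to the scores of the candidate POIs.
In this step, candidate POIs whose scores are greater than or equal to a preset score threshold can be used as query completion suggestions. Alternatively, the POIs with the top P scores can be used, where P is a preset positive integer, and other selection rules are also possible. When the suggestions are recommended to the user, they are sorted in the candidate list by score. They can be presented in the existing drop-down box near the search box or in other forms.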
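Both selection rules, and the final score ordering, fit into one small helper. This is a sketch; the scores in the example are made up.

```python
def completion_suggestions(scores, threshold=None, top_p=None):
    """Pick suggestions from {poi: score}, ordered by descending score.

    threshold: keep POIs whose score >= threshold (if given).
    top_p:     keep at most the P highest-scoring POIs (if given).
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    if threshold is not None:
        ranked = [(poi, s) for poi, s in ranked if s >= threshold]
    if top_p is not None:
        ranked = ranked[:top_p]
    return [poi for poi, _ in ranked]


scores = {"百度大厦": 0.92, "百度大厦-C座": 0.40, "百度科技园": 0.75}  # made-up scores
print(completion_suggestions(scores, top_p=2))  # ['百度大厦', '百度科技园']
```

Passing both arguments applies the threshold first and then caps the list at P entries.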
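In this embodiment, the user's query history information is integrated into the ranking model that ranks the candidate POIs, so the query completion suggestions recommended to the user better match the user's search preferences. For example, suppose a user works at “百度大厦” (Baidu Building) and therefore often searches for the “百度大厦” POI for navigation or traffic queries. The prior art ranks suggestions by the POIs' search popularity. Unless a large number of users click or search for “百度大厦”, it will not rank near the top of the query completion suggestions. In the embodiments of this application, when this user enters a query prefix such as “ba”, “百度大厦” ranks very high in the suggestions because the user frequently searches for it. The user's search preference is thus satisfied quickly.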
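Embodiment 2
FIG. 6 is a flowchart of the method for establishing a ranking model provided in Embodiment 2 of this application. As shown in FIG. 6, the method may specifically include the following steps:
In 601, obtain the following from the POI query logs: the user identifier, the query prefix the user had entered when selecting a POI from the query completion suggestions, the POIs in the query completion suggestions corresponding to that query prefix, and the POI selected by the user from the query completion suggestions.
For example, suppose user_A is typing characters one by one to form query prefixes. After entering “百度大”, the user clicks the POI “百度大厦A座” (Baidu Building, Block A) in the query completion suggestions. Then the user identifier user_A, the query prefix “百度大”, the POIs in the corresponding query completion suggestions, and the selected POI “百度大厦A座” are taken as one data record. In the same way, many records can be extracted from the POI query logs of large-scale users for training the ranking model.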
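In 602, obtain the vector representation of the user's query history information before entering the query prefix and the vector representation of each POI in the query completion suggestions.
In this step, the user's query history information before entering the query prefix can be obtained. It may include the POIs queried or clicked by the user within a first duration before entering the query prefix and the high-frequency POIs queried or clicked within a second duration, where the second duration is longer than the first. The vector representations of POIs are then used to obtain the vector representation of this query history information.
This step is implemented similarly to step 302 in Embodiment 1, and the POI vector representations can also be obtained as described for step 302; details are not repeated here.
In 603, build positive and negative examples and train a neural network model on them to obtain the ranking model. A positive example is the vector representation of the user's query history information before entering the query prefix together with the vector representation of the POI the user selected in the corresponding query completion suggestions. A negative example is the same query history vector representation together with a POI the user did not select in those suggestions.
The ranking model can be trained in a pairwise manner. Furthermore, the positive example can also include the user's attribute feature vector representation and the popularity feature vector representation of the POI selected by the user. Likewise, the negative example can include the user's attribute feature vector representation and the popularity feature vector representation of the POI not selected by the user.
The processing is similar to FIG. 5. A positive example includes:
the vector representation of the user's query history information before entering the query prefix ($U_{per}$ in FIG. 5);
the vector representation of the POI the user selected in the corresponding query completion suggestions ($e_{w2v}$ in FIG. 5);
the user's attribute feature vector representation ($U_d$ in FIG. 5);
the popularity feature vector representation of the selected POI ($V_{pop}$ in FIG. 5).
$U_{per}$ and $e_{w2v}$ can be multiplied to obtain the similarity feature matrix $V_{per}$.
A negative example includes:
the vector representation of the user's query history information before entering the query prefix ($U_{per}$);
the vector representation of a POI the user did not select in the corresponding query completion suggestions ($e_{w2v}$);
the user's attribute feature vector representation ($U_d$);
the popularity feature vector representation of that unselected POI ($V_{pop}$).
Here too, $U_{per}$ and $e_{w2v}$ can be multiplied to obtain the similarity feature matrix $V_{per}$.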
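Turning one log record into pairwise training examples can be sketched as follows. The record layout and its keys are hypothetical, and the suggestion list in the example is invented around the “百度大厦A座” case from step 601. In practice the feature vectors ($U_{per}$, $e_{w2v}$, $U_d$, $V_{pop}$) would be looked up for each POI afterwards.

```python
def pairwise_examples(record):
    """Expand one log record into (user, prefix, positive POI, negative POI) tuples.

    record keys ("user", "prefix", "suggestions", "selected") are hypothetical.
    Both members of each pair share the same user context; the selected POI is
    the positive and every other suggested POI yields one negative.
    """
    chosen = record["selected"]
    return [(record["user"], record["prefix"], chosen, other)
            for other in record["suggestions"] if other != chosen]


record = {"user": "user_A", "prefix": "百度大",
          "suggestions": ["百度大厦", "百度大厦A座", "百度大厦-C座"],
          "selected": "百度大厦A座"}
for pair in pairwise_examples(record):
    print(pair)
```

A record whose suggestion list holds only the selected POI contributes no pairs.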
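The ranking model concatenates and transforms the input vector representations to produce a score for the positive-example POI and a score for the negative-example POI. Its parameters are then updated according to these two scores until the training objective is reached. The training objective can be to maximize the difference between the neural network model's score for the positive-example POI and its score for the negative-example POI.
Specifically, the training objective can be embodied as minimizing the loss $L_\Delta$ of the neural network model, for example using the following formula: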
[Equation image PCTCN2020116632-appb-000002: definition of the loss $L_\Delta$]
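where τ is a hyperparameter and the symbols are defined as follows:
the $i$-th training record is $(u^{(i)}, \{v^{(i,1)}, \ldots, v^{(i,j)}, \ldots, v^{(i,n)}\}, k^{(i)})$, and $m$ is the number of training records;
$u$ is the user's vector representation, which in the embodiments of this application is the user's $U_d$;
$\{v^{(i,1)}, \ldots, v^{(i,j)}, \ldots, v^{(i,n)}\}$ is the set of POIs in the query completion suggestions;
$k^{(i)}$ is the POI selected by the user from the query completion suggestions;
the vector $v$ can be the concatenation of $V_{pop}$ and $V_{per}$.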
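$(u^{(i)}, v^{(i,k^{(i)})})$ is a positive example and $(u^{(i)}, v^{(i,j)})$ with $j \neq k^{(i)}$ is a negative example. $h(\cdot)$ is the function the ranking model uses to score POIs; it contains the model parameters that are updated during training of the ranking model.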
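The exact form of $L_\Delta$ is given only as an image. A natural reading of "maximize the score gap, with hyperparameter τ" is a pairwise margin (hinge) loss, assumed here to be $L_\Delta = \frac{1}{m}\sum_{i=1}^{m}\sum_{j \neq k^{(i)}} \max\bigl(0,\ \tau - h(u^{(i)}, v^{(i,k^{(i)})}) + h(u^{(i)}, v^{(i,j)})\bigr)$. Both the hinge form and the averaging over $m$ records are assumptions, not the application's stated formula. The sketch computes that assumed loss for any scoring function `h`. The dot-product scorer is a hypothetical stand-in for the trained model.

```python
def pairwise_hinge_loss(batch, h, tau=1.0):
    """Assumed form of L_Δ: mean over records of the summed pairwise hinge terms.

    batch: list of (u, candidates, k) records, where candidates = [v_1, ..., v_n]
           and k is the index of the POI the user selected.
    h:     scoring function h(u, v) -> float (the ranking model).
    tau:   margin hyperparameter (treating τ as a margin is an assumption).
    """
    total = 0.0
    for u, candidates, k in batch:
        positive = h(u, candidates[k])
        for j, v in enumerate(candidates):
            if j != k:
                # Zero loss once the positive outscores this negative by at least tau.
                total += max(0.0, tau - positive + h(u, v))
    return total / len(batch)


def dot(u, v):
    """Hypothetical stand-in scorer h(u, v)."""
    return sum(a * b for a, b in zip(u, v))


batch = [([1.0, 0.0], [[2.0, 0.0], [0.5, 0.0], [1.75, 0.0]], 0)]
print(pairwise_hinge_loss(batch, dot))  # 0.75: only the 1.75-scored negative is inside the margin
```

Minimizing this loss pushes every unselected POI at least τ below the selected one, which matches the stated goal of widening the positive-negative score gap.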
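The methods provided by this application have been described in detail above. The apparatuses provided by this application are described in detail below with reference to embodiments.
Embodiment 3
FIG. 7 is a structural diagram of the apparatus for query auto-completion provided in Embodiment 3 of this application. As shown in FIG. 7, the apparatus may include a first obtaining unit 01, a second obtaining unit 02, a scoring unit 03, and a query completion unit 04, and may further include a third obtaining unit 05. The main functions of the units are as follows:
The first obtaining unit 01 is responsible for obtaining the query prefix currently input by the user and determining the candidate POIs corresponding to the query prefix.
Candidate POIs for the current query prefix can be determined with an existing implementation; the goal is to find POIs strongly related to the query prefix, or POIs whose text begins with the query prefix. For example, an inverted index keyed by the various query prefixes of each POI can be built in advance over the POI library. When the user enters a query, the POI library is looked up with the current query prefix, and all POIs hit are taken as candidate POIs.
The second obtaining unit 02 is responsible for obtaining the vector representation of the user's query history information and the vector representation of each candidate POI.
Specifically, the second obtaining unit 02 can obtain the user's query history information. This information includes the POIs the user queried or clicked within a first duration and the high-frequency POIs queried or clicked within a second duration, the second duration being longer than the first. The unit then uses the POI vector representations to obtain the vector representation of the user's query history information.
The vector representation of each POI can be determined in advance by the third obtaining unit 05, through the following steps:
obtain POI query logs of large-scale users and arrange the POIs queried or clicked by each user in chronological order to obtain POI sequences;
slice each POI sequence according to a preset sliding-window size, each slice comprising a center POI and its context POIs;
train the skip-gram model with the slices;
after training, obtain the vector representation of each POI from the skip-gram model.
When training the skip-gram model with the slices, the third obtaining unit 05 can use the skip-gram model to encode the attribute information of each POI into its vector representation. It uses the vector representation of the center POI in each slice to predict the vector representations of the context POIs in the same slice, and iteratively updates the skip-gram model's parameters according to the prediction error.
When encoding the attribute information of each POI, the third obtaining unit 05 can encode the POI's name and address information with a convolutional neural network and its other attribute information with a feedforward neural network. It then concatenates the encoding results of the same POI and maps them through a fully connected layer to obtain the POI's vector representation.
The scoring unit 03 is responsible for inputting the vector representation of the user's query history information and the vector representation of each candidate POI into the pre-trained ranking model to obtain a score for each candidate POI.
Further, the scoring unit 03 can also input the user's attribute feature vector representation and each candidate POI's popularity feature vector representation into the ranking model, which uses them in scoring each candidate POI. For the specific processing, refer to the related description in Embodiment 1; details are not repeated here.
The query completion unit 04 is responsible for determining the query completion suggestions recommended to the user according to the scores of the candidate POIs. For example, candidate POIs whose scores are greater than or equal to a preset score threshold can be used as query completion suggestions. Alternatively, the POIs with the top P scores can be used, where P is a preset positive integer, and other selection rules are also possible. When the suggestions are recommended to the user, they are sorted in the candidate list by score. They can be presented in the existing drop-down box near the search box or in other forms.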
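Embodiment 4
FIG. 8 is a structural diagram of the apparatus for establishing a ranking model provided by an embodiment of this application. As shown in FIG. 8, the apparatus may include a first obtaining unit 11, a second obtaining unit 12, and a model training unit 13, and may further include a third obtaining unit 14. The main functions of the units are as follows:
The first obtaining unit 11 is responsible for obtaining the following from the POI query logs: the user identifier, the query prefix the user had entered when selecting a POI from the query completion suggestions, the POIs in the query completion suggestions corresponding to that query prefix, and the POI selected by the user from the query completion suggestions.
The second obtaining unit 12 is responsible for obtaining the vector representation of the user's query history information before entering the query prefix and the vector representation of each POI in the query completion suggestions.
Specifically, the second obtaining unit 12 can obtain the user's query history information before entering the query prefix. This information includes the POIs the user queried or clicked within a first duration before entering the query prefix and the high-frequency POIs queried or clicked within a second duration, the second duration being longer than the first. The unit then uses the POI vector representations to obtain the vector representation of this query history information.
The vector representation of each POI is obtained in advance by the third obtaining unit 14, through the following steps:
obtain POI query logs of large-scale users and arrange the POIs queried or clicked by each user in chronological order to obtain POI sequences;
slice each POI sequence according to a preset sliding-window size, each slice comprising a center POI and its context POIs;
train the skip-gram model with the slices;
after training, obtain the vector representation of each POI from the skip-gram model.
Specifically, when training the skip-gram model with the slices, the third obtaining unit 14 can use the skip-gram model to encode the attribute information of each POI into its vector representation. It uses the vector representation of the center POI in each slice to predict the vector representations of the context POIs in the same slice, and iteratively updates the skip-gram model's parameters according to the prediction error.
When encoding the attribute information of each POI, the third obtaining unit 14 can encode the POI's name and address information with a convolutional neural network and its other attribute information with a feedforward neural network. It then concatenates the encoding results of the same POI and maps them through a fully connected layer to obtain the POI's vector representation.
The model training unit 13 is responsible for training a neural network model to obtain the ranking model. A positive example is the vector representation of the user's query history information before entering the query prefix together with the vector representation of the POI the user selected in the corresponding query completion suggestions. A negative example is the same query history vector representation together with a POI the user did not select in those suggestions. The training objective is to maximize the difference between the neural network model's score for the positive-example POI and its score for the negative-example POI.
The positive example can further include the user's attribute feature vector representation and the popularity feature vector representation of the POI selected by the user. The negative example can further include the user's attribute feature vector representation and the popularity feature vector representation of the POI not selected by the user.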
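According to embodiments of this application, this application further provides an electronic device and a readable storage medium.
FIG. 9 is a block diagram of an electronic device for the query auto-completion method or the ranking-model establishment method according to an embodiment of this application. The electronic device is intended to represent various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processors, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are merely examples. They are not intended to limit the implementations of this application described and/or claimed herein.
As shown in FIG. 9, the electronic device includes one or more processors 901, a memory 902, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as needed. The processor can process instructions executed within the electronic device, including instructions stored in or on the memory for displaying graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In other implementations, multiple processors and/or multiple buses may be used together with multiple memories if needed. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (for example, as a server array, a group of blade servers, or a multiprocessor system). One processor 901 is taken as an example in FIG. 9.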
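The memory 902 is the non-transitory computer-readable storage medium provided by this application. The memory stores instructions executable by at least one processor, so that the at least one processor performs the query auto-completion method or the ranking-model establishment method provided by this application. The non-transitory computer-readable storage medium of this application stores computer instructions that cause a computer to perform either of these methods.
As a non-transitory computer-readable storage medium, the memory 902 can store non-transitory software programs, non-transitory computer-executable programs, and modules. Examples are the program instructions/modules corresponding to the query auto-completion method or the ranking-model establishment method in the embodiments of this application. By running these programs, instructions, and modules, the processor 901 executes the server's various functional applications and data processing, that is, it implements the query auto-completion method or the ranking-model establishment method in the above method embodiments.
The memory 902 may include a program storage area and a data storage area. The program storage area can store an operating system and an application program required by at least one function; the data storage area can store data created through use of the electronic device, among other things. In addition, the memory 902 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 902 optionally includes memory located remotely from the processor 901, and such remote memory can be connected to the electronic device through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.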
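The electronic device may further include an input device 903 and an output device 904. The processor 901, the memory 902, the input device 903, and the output device 904 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 9.
The input device 903 can receive input numeric or character information and generate key-signal input related to user settings and function control of the electronic device. Examples include a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, joystick, and other input devices. The output device 904 may include a display device, an auxiliary lighting device (for example, an LED), a haptic feedback device (for example, a vibration motor), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.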
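Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuit systems, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that can be executed and/or interpreted on a programmable system including at least one programmable processor. That processor may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computing programs (also called programs, software, software applications, or code) include machine instructions for a programmable processor. They can be implemented in a high-level procedural and/or object-oriented programming language and/or in assembly/machine language. As used herein, the terms “machine-readable medium” and “computer-readable medium” refer to any computer program product, apparatus, and/or device used to provide machine instructions and/or data to a programmable processor. Examples are magnetic disks, optical disks, memory, and programmable logic devices (PLDs), and the terms include a machine-readable medium that receives machine instructions as a machine-readable signal. The term “machine-readable signal” refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (for example, a CRT (cathode-ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user. The computer also has a keyboard and a pointing device (for example, a mouse or trackball) through which the user can provide input. Other kinds of devices can also be used to provide interaction with the user. For example, feedback provided to the user can be any form of sensory feedback (for example, visual, auditory, or tactile feedback), and input from the user can be received in any form (including acoustic, speech, or tactile input).
The systems and techniques described here can be implemented in any of the following:
a computing system that includes a back-end component (for example, a data server);
a computing system that includes a middleware component (for example, an application server);
a computing system that includes a front-end component (for example, a user computer with a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described here);
a computing system that includes any combination of such back-end, middleware, or front-end components.
The components of the system can be interconnected by any form or medium of digital data communication, for example a communication network. Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system can include clients and servers. A client and a server are generally remote from each other and usually interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other.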
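As can be seen from the above description, the method, apparatus, device, and computer storage medium provided by embodiments of the present application may have the following advantages:
1) In POI query auto-completion, the present application incorporates the user's query history information into the ranking model used to rank candidate POIs, so that the query completion suggestions recommended to the user better match the user's search preferences.
2) When incorporating the user's query history information, the present application considers both the user's short-term, immediate interests and long-term interest preferences, so that the recommended query completion suggestions match the user's search preferences as closely as possible.
3) When determining the vector representations of POIs, the present application uses a skip-gram model, so that the POI vector representations better conform to contextual constraints in terms of meaning.
4) When encoding the attribute information of each POI, the present application fully treats a POI as an aggregate of multi-source information, fusing its name, address, category, identifier, and other information, so that the POI is represented more accurately.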
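Advantage 3) relies on the skip-gram training described in claims 3 and 4: time-ordered POI sequences are cut into (center, context) slices with a sliding window, and the model learns to predict context POIs from the center POI. The sketch below is a simplified illustration under assumptions: it learns a free embedding per POI ID instead of encoding POI attributes as claim 4 describes, uses negative sampling as the training signal, and picks the window size, dimension, and learning rate arbitrarily. `slice_sequence` and `train_skip_gram` are hypothetical names.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def slice_sequence(seq: Sequence[str], window: int) -> List[Tuple[str, List[str]]]:
    """Cut a time-ordered POI sequence into (center POI, context POIs) slices."""
    slices = []
    for i, center in enumerate(seq):
        context = list(seq[max(0, i - window):i]) + list(seq[i + 1:i + 1 + window])
        slices.append((center, context))
    return slices


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def train_skip_gram(sequences: List[List[str]], window: int = 2, dim: int = 8,
                    negatives: int = 2, lr: float = 0.05, epochs: int = 100,
                    seed: int = 0) -> Dict[str, List[float]]:
    """Skip-gram with negative sampling over POI IDs; returns the center (input) vectors."""
    rng = random.Random(seed)
    vocab = sorted({poi for seq in sequences for poi in seq})
    center_vec = {p: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for p in vocab}
    context_vec = {p: [0.0] * dim for p in vocab}
    for _ in range(epochs):
        for seq in sequences:
            for center, context in slice_sequence(seq, window):
                v = center_vec[center]
                for ctx in context:
                    # One observed context POI (label 1) plus randomly drawn negatives (label 0).
                    targets = [(ctx, 1.0)] + [(rng.choice(vocab), 0.0) for _ in range(negatives)]
                    grad_v = [0.0] * dim
                    for target, label in targets:
                        u = context_vec[target]
                        g = lr * (label - sigmoid(sum(a * b for a, b in zip(v, u))))
                        for k in range(dim):
                            grad_v[k] += g * u[k]  # uses u before its own update
                            u[k] += g * v[k]
                    for k in range(dim):
                        v[k] += grad_v[k]
    return center_vec
```

After training, `emb[poi_id]` is that POI's vector; with real query logs, POIs that tend to be queried in the same sessions tend to end up with similar vectors.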
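It should be understood that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps recorded in the present application may be executed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above specific implementations do not limit the protection scope of the present application. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions can be made according to design requirements and other factors. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within the protection scope of the present application.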

Claims (20)

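  1. A query auto-completion method, characterized in that the method comprises:
    obtaining a query prefix currently input by a user, and determining candidate points of interest (POIs) corresponding to the query prefix;
    obtaining a vector representation of query history information of the user and a vector representation of each candidate POI;
    inputting the vector representation of the query history information of the user and the vector representation of each candidate POI into a pre-trained ranking model to obtain a score for each candidate POI;
    determining, according to the scores of the candidate POIs, query completion suggestions to be recommended to the user.
  2. The method according to claim 1, characterized in that obtaining the vector representation of the query history information of the user comprises:
    obtaining the query history information of the user, the query history information comprising POIs queried or clicked by the user within a first time period and high-frequency POIs queried or clicked by the user within a second time period, the second time period being longer than the first time period;
    obtaining the vector representation of the query history information of the user by using vector representations of POIs.
  3. The method according to claim 1 or 2, characterized in that the vector representation of each POI is obtained in advance as follows:
    obtaining POI query logs of a large number of users, and arranging the POIs queried or clicked by each user in chronological order to obtain POI sequences;
    slicing each POI sequence according to a preset sliding window size, each slice comprising a center POI and context POIs of the center POI;
    training a skip-gram model using the slices;
    after training ends, obtaining the vector representation of each POI from the skip-gram model.
  4. The method according to claim 3, characterized in that training the skip-gram model using the slices comprises:
    encoding attribute information of each POI by using the skip-gram model to obtain the vector representation of each POI, predicting, from the vector representation of the center POI in each slice, the vector representations of the context POIs in the same slice, and iteratively updating model parameters of the skip-gram model according to errors in the prediction results.
  5. The method according to claim 4, characterized in that encoding the attribute information of each POI comprises:
    encoding name and address information of the POI by using a convolutional neural network;
    encoding other attribute information of the POI by using a feed-forward neural network;
    concatenating the encoding results of the same POI and mapping them through a fully connected layer to obtain the vector representation of the POI.
  6. The method according to claim 1, characterized in that, when scoring each candidate POI, the ranking model further uses a vector representation of attribute features of the user and a vector representation of popularity features of each candidate POI.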
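  7. A method for building a ranking model for query auto-completion, characterized in that the method comprises:
    obtaining, from POI query logs, a user identifier, the query prefix already input when the user selected a POI from query completion suggestions, the POIs in the query completion suggestions corresponding to the query prefix, and the POI selected by the user from the query completion suggestions;
    obtaining a vector representation of the user's query history information before the query prefix was input and a vector representation of each POI in the query completion suggestions;
    taking, as a positive example, the vector representation of the user's query history information before the query prefix was input together with the vector representation of the POI selected by the user from the corresponding query completion suggestions, and taking, as a negative example, the vector representation of the user's query history information before the query prefix was input together with the vector representation of a POI not selected by the user from the corresponding query completion suggestions, and training a neural network model to obtain the ranking model, wherein the training objective is to maximize the difference between the neural network model's score for the positive-example POI and its score for the negative-example POI.
  8. The method according to claim 7, characterized in that obtaining the vector representation of the user's query history information before the query prefix was input comprises:
    obtaining the user's query history information before the query prefix was input, the query history information comprising POIs queried or clicked by the user within a first time period before the query prefix was input and high-frequency POIs queried or clicked within a second time period, the second time period being longer than the first time period;
    obtaining, by using vector representations of POIs, the vector representation of the user's query history information before the query prefix was input.
  9. The method according to claim 7 or 8, characterized in that the vector representation of each POI is obtained in advance as follows:
    obtaining POI query logs of a large number of users, and arranging the POIs queried or clicked by each user in chronological order to obtain POI sequences;
    slicing each POI sequence according to a preset sliding window size, each slice comprising a center POI and context POIs of the center POI;
    training a skip-gram model using the slices;
    after training ends, obtaining the vector representation of each POI from the skip-gram model.
  10. The method according to claim 9, characterized in that training the skip-gram model using the slices comprises:
    encoding attribute information of each POI by using the skip-gram model to obtain the vector representation of each POI, predicting, from the vector representation of the center POI in each slice, the vector representations of the context POIs in the same slice, and iteratively updating model parameters of the skip-gram model according to errors in the prediction results.
  11. The method according to claim 7, characterized in that the positive example further comprises a vector representation of attribute features of the user and a vector representation of popularity features of the POI selected by the user;
    the negative example further comprises the vector representation of the attribute features of the user and a vector representation of popularity features of the POI not selected by the user.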
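  12. A query auto-completion apparatus, characterized in that the apparatus comprises:
    a first acquisition unit, configured to obtain a query prefix currently input by a user and determine candidate POIs corresponding to the query prefix;
    a second acquisition unit, configured to obtain a vector representation of query history information of the user and a vector representation of each candidate POI;
    a scoring unit, configured to input the vector representation of the query history information of the user and the vector representation of each candidate POI into a pre-trained ranking model to obtain a score for each candidate POI;
    a query completion unit, configured to determine, according to the scores of the candidate POIs, query completion suggestions to be recommended to the user.
  13. The apparatus according to claim 12, characterized in that the second acquisition unit is specifically configured to: obtain the query history information of the user, the query history information comprising POIs queried or clicked by the user within a first time period and high-frequency POIs queried or clicked by the user within a second time period, the second time period being longer than the first time period; and obtain the vector representation of the query history information of the user by using vector representations of POIs.
  14. The apparatus according to claim 12 or 13, characterized in that the apparatus further comprises: a third acquisition unit, configured to obtain the vector representation of each POI in advance as follows:
    obtaining POI query logs of a large number of users, and arranging the POIs queried or clicked by each user in chronological order to obtain POI sequences;
    slicing each POI sequence according to a preset sliding window size, each slice comprising a center POI and context POIs of the center POI;
    training a skip-gram model using the slices;
    after training ends, obtaining the vector representation of each POI from the skip-gram model.
  15. The apparatus according to claim 12, characterized in that the scoring unit is further configured to input a vector representation of attribute features of the user and a vector representation of popularity features of each candidate POI into the ranking model, for use by the ranking model in scoring each candidate POI.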
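  16. An apparatus for building a ranking model for query auto-completion, characterized in that the apparatus comprises:
    a first acquisition unit, configured to obtain, from POI query logs, a user identifier, the query prefix already input when the user selected a POI from query completion suggestions, the POIs in the query completion suggestions corresponding to the query prefix, and the POI selected by the user from the query completion suggestions;
    a second acquisition unit, configured to obtain a vector representation of the user's query history information before the query prefix was input and a vector representation of each POI in the query completion suggestions;
    a model training unit, configured to take, as a positive example, the vector representation of the user's query history information before the query prefix was input together with the vector representation of the POI selected by the user from the corresponding query completion suggestions, and, as a negative example, the vector representation of the user's query history information before the query prefix was input together with the vector representation of a POI not selected by the user from the corresponding query completion suggestions, and to train a neural network model to obtain the ranking model, wherein the training objective is to maximize the difference between the neural network model's score for the positive-example POI and its score for the negative-example POI.
  17. The apparatus according to claim 16, characterized in that:
    the second acquisition unit is specifically configured to obtain the user's query history information before the query prefix was input, the query history information comprising POIs queried or clicked by the user within a first time period before the query prefix was input and high-frequency POIs queried or clicked within a second time period, the second time period being longer than the first time period; and to obtain, by using vector representations of POIs, the vector representation of the user's query history information before the query prefix was input.
  18. The apparatus according to claim 16 or 17, characterized in that the apparatus further comprises:
    a third acquisition unit, configured to obtain the vector representation of each POI in advance as follows:
    obtaining POI query logs of a large number of users, and arranging the POIs queried or clicked by each user in chronological order to obtain POI sequences;
    slicing each POI sequence according to a preset sliding window size, each slice comprising a center POI and context POIs of the center POI;
    training a skip-gram model using the slices;
    after training ends, obtaining the vector representation of each POI from the skip-gram model.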
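  19. An electronic device, characterized by comprising:
    at least one processor; and
    a memory communicatively connected to the at least one processor; wherein
    the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the method according to any one of claims 1-11.
  20. A non-transitory computer-readable storage medium storing computer instructions, characterized in that the computer instructions are used to cause a computer to perform the method according to any one of claims 1-11.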
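The inference flow of claims 1 and 2 can be sketched end to end: build a history vector from short-term and long-term POIs, select candidates for the prefix, score them, and sort. The source does not specify how candidates are retrieved or how POI vectors are aggregated, so prefix matching on POI names, mean pooling, the `top_n` frequency cut-off, and the dot-product `dot_scorer` standing in for the trained ranking model are all assumptions; every function name here is hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Vector = List[float]
Scorer = Callable[[Vector, Vector], float]


def mean(vectors: List[Vector], dim: int) -> Vector:
    if not vectors:
        return [0.0] * dim
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]


def history_vector(recent: Sequence[str], long_term: Sequence[str],
                   poi_vecs: Dict[str, Vector], dim: int, top_n: int = 3) -> Vector:
    """Concatenate short-term interest (POIs from the shorter first period) with
    long-term interest (the top_n most frequent POIs from the longer second period)."""
    frequent = [poi for poi, _ in Counter(long_term).most_common(top_n)]
    short = mean([poi_vecs[p] for p in recent if p in poi_vecs], dim)
    long_ = mean([poi_vecs[p] for p in frequent if p in poi_vecs], dim)
    return short + long_


def dot_scorer(history: Vector, poi: Vector) -> float:
    """Hypothetical stand-in for the trained ranking model."""
    d = len(poi)
    return (sum(h * p for h, p in zip(history[:d], poi))
            + sum(h * p for h, p in zip(history[d:], poi)))


def complete(prefix: str, poi_names: Dict[str, str], poi_vecs: Dict[str, Vector],
             history: Vector, scorer: Scorer = dot_scorer, k: int = 3) -> List[str]:
    """Return up to k POI names matching the prefix, highest score first."""
    candidates = [pid for pid, name in poi_names.items()
                  if name.lower().startswith(prefix.lower())]
    ranked = sorted(candidates, key=lambda pid: scorer(history, poi_vecs[pid]), reverse=True)
    return [poi_names[pid] for pid in ranked[:k]]
```

Replacing `dot_scorer` with a trained model only changes the `scorer` argument; candidate retrieval and sorting stay the same.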
PCT/CN2020/116632 2020-01-06 2020-09-21 Query auto-completion method and apparatus, device and computer storage medium WO2021139209A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/311,793 US20220342936A1 (en) 2020-01-06 2020-09-21 Query auto-completion method and apparatus, device and computer storage medium
EP20894917.2A EP3879416A4 (en) 2020-01-06 2020-09-21 METHOD, DEVICE AND DEVICE FOR AUTOMATICALLY COMPLETING A QUERY, AND COMPUTER STORAGE MEDIUM

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010010479.2 2020-01-06
CN202010010479.2A CN111222058B (zh) 2020-01-06 2020-01-06 Query auto-completion method and apparatus, device and computer storage medium

Publications (1)

Publication Number Publication Date
WO2021139209A1 true WO2021139209A1 (zh) 2021-07-15

Family

ID=70832259

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/116632 WO2021139209A1 (zh) 2020-01-06 2020-09-21 Query auto-completion method and apparatus, device and computer storage medium

Country Status (4)

Country Link
US (1) US20220342936A1 (zh)
EP (1) EP3879416A4 (zh)
CN (1) CN111222058B (zh)
WO (1) WO2021139209A1 (zh)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
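CN111222058B (zh) * 2020-01-06 2021-04-16 Baidu Online Network Technology (Beijing) Co., Ltd. Query auto-completion method and apparatus, device and computer storage medium
CN111694919B (zh) * 2020-06-12 2023-07-25 Beijing Baidu Netcom Science and Technology Co., Ltd. Method and apparatus for generating information, electronic device and computer-readable storage medium
CN111767479B (zh) * 2020-06-30 2023-06-27 Beijing Baidu Netcom Science and Technology Co., Ltd. Recommendation model generation method and apparatus, electronic device and storage medium
CN112395044B (zh) * 2020-11-10 2023-04-28 New H3C Technologies Co., Ltd. Hefei Branch Command-line keyword filling method and apparatus, and network device
CN112528156B (zh) * 2020-12-24 2024-03-26 Beijing Baidu Netcom Science and Technology Co., Ltd. Method for building a ranking model, query auto-completion method, and corresponding apparatuses
CN116821271B (zh) * 2023-08-30 2023-11-24 安徽商信政通信息技术股份有限公司 Address recognition and normalization method and system based on sound-shape codes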

Citations (7)

Publication number Priority date Publication date Assignee Title
US20140358879A1 (en) * 2012-05-31 2014-12-04 International Business Machines Corporation Search engine suggestion
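CN104462369A (zh) * 2014-12-08 2015-03-25 Shenyang MXNavi Co., Ltd. Search auto-completion method for a navigation device
CN107832404A (zh) * 2017-11-02 2018-03-23 Wuhan University POI information completion method
CN107862004A (zh) * 2017-10-24 2018-03-30 iFLYTEK Co., Ltd. Intelligent ranking method and apparatus, storage medium, and electronic device
CN110046298A (zh) * 2019-04-24 2019-07-23 National University of Defense Technology Query term recommendation method and apparatus, terminal device and computer-readable medium
CN111222058A (zh) * 2020-01-06 2020-06-02 Baidu Online Network Technology (Beijing) Co., Ltd. Query auto-completion method and apparatus, device and computer storage medium
CN111241427A (zh) * 2020-01-06 2020-06-05 Baidu Online Network Technology (Beijing) Co., Ltd. Query auto-completion method and apparatus, device and computer storage medium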

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
US8438142B2 (en) * 2005-05-04 2013-05-07 Google Inc. Suggesting and refining user input based on original user input
US20160041983A1 (en) * 2014-08-07 2016-02-11 Yahoo! Inc. Local query ranking for search assist method and apparatus
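CN107122469B (zh) * 2017-04-28 2019-12-17 National University of Defense Technology Query recommendation ranking method and apparatus based on semantic similarity and time-sensitive frequency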
WO2018201280A1 (en) * 2017-05-02 2018-11-08 Alibaba Group Holding Limited Method and apparatus for query auto-completion
US11366866B2 (en) * 2017-12-08 2022-06-21 Apple Inc. Geographical knowledge graph
WO2021076606A1 (en) * 2019-10-14 2021-04-22 Stacks LLC Conceptual, contextual, and semantic-based research system and method

Non-Patent Citations (1)

Title
See also references of EP3879416A4

Also Published As

Publication number Publication date
EP3879416A4 (en) 2022-03-09
CN111222058A (zh) 2020-06-02
EP3879416A1 (en) 2021-09-15
US20220342936A1 (en) 2022-10-27
CN111222058B (zh) 2021-04-16

Similar Documents

Publication Publication Date Title
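WO2021139209A1 (zh) Query auto-completion method and apparatus, device and computer storage medium
CN111984689B (zh) Information retrieval method and apparatus, device and storage medium
WO2021128729A1 (zh) Method and apparatus for determining search results, device and computer storage medium
WO2021139222A1 (zh) Method for building a ranking model, query auto-completion method, and corresponding apparatuses
CN112507715A (zh) Method and apparatus for determining association relationships between entities, device and storage medium
WO2021139221A1 (zh) Query auto-completion method and apparatus, device and computer storage medium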
US11907671B2 (en) Role labeling method, electronic device and storage medium
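WO2021212827A1 (zh) Method and apparatus for retrieving geographic locations, device and computer storage medium
US20220129448A1 (en) Intelligent dialogue method and apparatus, and storage medium
WO2021212826A1 (zh) Method and apparatus for building a similarity model for retrieving geographic locations
CN112528001B (zh) Information query method and apparatus, and electronic device
CN110147494B (zh) Information search method and apparatus, storage medium and electronic device
JP7160986B2 (ja) Search model training method, apparatus, device, computer storage medium, and computer program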
EP3876563A1 (en) Method and apparatus for broadcasting configuration information of synchronizing signal block, and method and apparatus for receiving configuration information of synchronizing signal block
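CN111881255B (zh) Synonymous text acquisition method and apparatus, electronic device and storage medium
CN112528157B (zh) Method for building a ranking model, query auto-completion method, and corresponding apparatuses
CN113268618B (zh) Search information scoring method and apparatus, and electronic device
CN117573960A (zh) Search recommendation method and apparatus, electronic device and storage medium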

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2020894917

Country of ref document: EP

Effective date: 20210609

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20894917

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE