WO2021185147A1 - Search intent recognition - Google Patents

Search intent recognition

Info

Publication number
WO2021185147A1
WO2021185147A1 PCT/CN2021/080240 CN2021080240W WO2021185147A1 WO 2021185147 A1 WO2021185147 A1 WO 2021185147A1 CN 2021080240 W CN2021080240 W CN 2021080240W WO 2021185147 A1 WO2021185147 A1 WO 2021185147A1
Authority
WO
WIPO (PCT)
Prior art keywords
search
intent
scene
feature vector
feature
Prior art date
Application number
PCT/CN2021/080240
Other languages
English (en)
French (fr)
Inventor
刘铭
许鑫
汪祖海
王可
吕梅
于志安
Original Assignee
北京三快在线科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京三快在线科技有限公司
Publication of WO2021185147A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Definitions

  • This application relates to the field of search engines, specifically to search intent recognition.
  • Search intent usually refers to the user's real need behind a search. For example, a user who searches for "badminton" may want to buy badminton equipment, find a badminton court, or learn the rules of badminton.
  • "Buying equipment", "finding venues" and "learning rules" are three different search intents related to the search keyword "badminton".
  • Existing approaches to search intent recognition include: 1) determining search intent by text matching between search keywords and rules; 2) predicting search intent based on text classification or clustering; 3) mapping keywords to a high-dimensional semantic vector space through topic models and other methods to express search intent.
  • This application provides a search intention recognition method, device, electronic equipment, and storage medium.
  • a search intent recognition method including: in response to a search request, obtaining search scene information associated with the search request; generating, according to the search scene information and the search request, a composite feature for identifying search intent; inputting the composite feature into the search intent recognition model to obtain the search intent recognition result output by the search intent recognition model.
  • the generating a composite feature for identifying search intent according to the search scene information and the search request includes: encoding the search scene information into a scene feature vector, and encoding the search request to obtain the search request feature vector corresponding to the search request; fusing the scene feature vector and the search request feature vector to obtain a fusion feature vector, and using the fusion feature vector as the composite feature, wherein the proportion of dimensions of the search request feature vector in the fusion feature vector is not less than a preset ratio.
  • the encoding the search scene information into a scene feature vector includes: separately encoding the search scene information according to scene dimensions to obtain feature vectors corresponding to each scene dimension, where the scene dimensions include at least one of the following: location dimension, weather dimension, user behavior dimension, time dimension.
  • the encoding the search scene information according to scene dimensions to obtain feature vectors corresponding to each scene dimension includes: performing GeoHash processing on the latitude and longitude information in the location dimension, and performing one-hot encoding on the processing result to obtain the latitude and longitude feature vector.
  • the encoding the search scene information according to scene dimensions to obtain feature vectors corresponding to each scene dimension includes: performing bucket discretization processing on the continuous-value information in the weather dimension, and performing one-hot encoding on the processing result to obtain the weather feature vector.
  • the encoding the search scene information according to scene dimensions to obtain feature vectors corresponding to each scene dimension includes: for the user behavior sequence in the user behavior dimension, if the number of user behaviors in the user behavior sequence is not greater than the specified number, selecting all user behaviors in the user behavior sequence; if the number of user behaviors in the user behavior sequence is greater than the specified number, selecting the specified number of user behaviors from the user behavior sequence in reverse chronological order; obtaining the search intent of the target corresponding to each selected user behavior; performing feature embedding processing on the obtained search intents to obtain the user behavior feature vector.
  • the specified number is predetermined in the following manner: for each user behavior sequence in the search log that includes an order behavior, counting the length of each continuous click behavior sequence in that user behavior sequence, where continuous click behavior refers to click behaviors that occur between two order behaviors and whose occurrence intervals are not greater than a preset time threshold; the average length of the continuous click behavior sequences is taken as the specified number.
  • the search intention recognition model is obtained by training in the following ways: generating training samples according to the search log; generating composite features according to the training samples; using the composite features to train the search intention recognition model.
  • the generating training samples according to the search log includes: generating a first type of positive sample according to a search log containing an order behavior; generating a second type of positive sample according to a search log containing a click behavior, where the weight of the first type of positive sample is greater than the weight of the second type of positive sample; generating negative samples according to search logs that contain only browsing behavior.
  • the search intent recognition result includes an intent intensity distribution over a plurality of search intents, and the method further includes: obtaining a specified search intent and its intent rank; determining the intent intensity value of the specified search intent according to the intent rank and the intent intensity distribution; and generating an intent intensity distribution that includes the specified search intent according to the intent intensity value of the specified search intent and the intent intensity distribution.
  • the obtaining the specified search intent and its intent rank includes: obtaining a specified search intent that matches the search request and is in an effective state, where the effective state is determined based on the display time of the specified search intent and/or the number of impressions of the specified search intent.
  • a search intent recognition device including: a response unit for obtaining, in response to a search request, search scene information associated with the search request; a composite feature generating unit for generating, according to the search scene information and the search request, a composite feature for identifying search intent; a search intent recognition unit for inputting the composite feature into the search intent recognition model to obtain the search intent recognition result output by the search intent recognition model.
  • the composite feature generating unit is configured to encode the search scene information into a scene feature vector, and encode the search request to obtain a search request feature vector corresponding to the search request; the scene feature vector and the search request feature vector are fused to obtain a fusion feature vector, and the fusion feature vector is used as the composite feature, wherein the proportion of dimensions of the search request feature vector in the fusion feature vector is not less than a preset ratio.
  • the composite feature generating unit is configured to separately encode the search scene information according to scene dimensions to obtain feature vectors corresponding to each scene dimension, where the scene dimensions include at least one of the following: location dimension, weather dimension, user behavior dimension, time dimension.
  • the composite feature generating unit is configured to perform GeoHash processing on the latitude and longitude information in the location dimension, and perform one-hot encoding on the processing result to obtain the latitude and longitude feature vector.
  • the composite feature generating unit is configured to perform bucket discretization processing on the continuous value class information in the weather dimension, and perform one-hot encoding on the processing result to obtain the weather feature vector.
  • the composite feature generating unit is configured to select all user behaviors in the user behavior sequence when the number of user behaviors in the user behavior sequence is not greater than a specified number; when the number of user behaviors in the user behavior sequence is greater than the specified number, select the specified number of user behaviors from the user behavior sequence in reverse chronological order; obtain the search intent of the target corresponding to each selected user behavior; perform feature embedding processing on the acquired search intents to obtain the user behavior feature vector.
  • the search intent recognition device further includes: a preprocessing unit, configured to count, for each user behavior sequence in the search log that includes an order behavior, the length of each continuous click behavior sequence in that user behavior sequence, where continuous click behavior refers to click behaviors that occur between two order behaviors and whose occurrence intervals are not greater than a preset time threshold; the average length of the continuous click behavior sequences is taken as the specified number.
  • the search intent recognition device further includes: a preprocessing unit, configured to generate training samples based on the search log and to generate composite features based on the training samples; and a training unit, configured to use the composite features to train the search intent recognition model.
  • the preprocessing unit is configured to generate a first type of positive sample according to a search log containing an order behavior; generate a second type of positive sample according to a search log containing a click behavior, where the weight of the first type of positive sample is greater than the weight of the second type of positive sample; and generate negative samples according to search logs that contain only browsing behavior.
  • the search intent recognition result includes an intent intensity distribution over multiple search intents, and the search intent recognition device further includes: an intent adjustment unit configured to obtain a specified search intent and its intent rank; determine the intent intensity value of the specified search intent according to the intent rank and the intent intensity distribution; and generate an intent intensity distribution that includes the specified search intent according to the intent intensity value of the specified search intent and the intent intensity distribution.
  • the intent adjustment unit is configured to obtain a specified search intent that matches the search request and is in an effective state, where the effective state is determined based on the display time of the specified search intent and/or the number of times the specified search intent has been displayed.
  • an electronic device comprising: a processor; and a memory arranged to store computer-executable instructions which, when executed, cause the processor to execute any of the search intent recognition methods described above.
  • a computer-readable storage medium that stores one or more programs, where the one or more programs, when executed by a processor, implement any of the search intent recognition methods described above.
  • the embodiment of the present application acquires search scene information in response to a search request, generates a composite feature for identifying search intent according to the search scene information and the search request, and enters the composite feature into the search intent recognition model to obtain The search intent recognition result output by the search intent recognition model.
  • the embodiments of this application not only focus on search requests, but also on search scene information such as weather, location, and user behavior.
  • the search intent recognition model based on composite modeling predicts the real needs of users with reference to these various factors, which mitigates the problem that search intent cannot be accurately identified from the search request alone. The approach is particularly suitable for life service and LBS (Location Based Services) search scenarios.
  • Fig. 1 shows a schematic flowchart of a search intention recognition method according to an embodiment of the present application.
  • Fig. 2 shows a schematic flowchart of a training method for a search intent recognition model according to an embodiment of the present application.
  • Fig. 3 shows a schematic structural diagram of a search intention recognition model according to an embodiment of the present application.
  • Fig. 4 shows a schematic flowchart of a search intention recognition method according to an embodiment of the present application.
  • Fig. 5 shows a schematic structural diagram of a search intention recognition device according to an embodiment of the present application.
  • Fig. 6 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • Fig. 7 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
  • solution 1) also requires manual labeling and rule formulation, which has poor generalization ability and cannot cope with iterative changes in business scenarios; solution 3) is difficult to adapt to scenarios that require high accuracy and consistency. It can be seen that the existing technology cannot meet the business needs, and there is still a lot of room for improvement.
  • This application proposes a solution that incorporates search scene information such as user behavior, weather, and location into the scope of attention, combined with search requests for composite modeling, and achieves more accurate identification of search intentions.
  • Fig. 1 shows a schematic flowchart of a search intention recognition method according to an embodiment of the present application. As shown in Fig. 1, the search intention recognition method includes steps S110-S130.
  • step S110 in response to the search request, the search scene information associated with the search request is acquired.
  • The search engines concerned include, but are not limited to, general search engines such as Baidu and Google (the business names here are only illustrative), specialized search engines in fields such as patents and trademarks, and in-app search engines.
  • The user can generate a search request (query) in various ways, such as text, image, or voice.
  • The text can take the form of search keywords or a search sentence.
  • Step S120 according to the search scene information and the search request, a composite feature for identifying the search intention is generated.
  • search scene information can be regarded as the indirect expression of the user's search intent, and can supplement the potential search intent not reflected in the search request.
  • search scene information can cover multiple scene dimensions, such as time dimension, location dimension, weather dimension, user behavior dimension, and so on.
  • For example, a user who searches for "Kung Pao Chicken" may want to learn how to cook Kung Pao Chicken, to order Kung Pao Chicken for takeout, or to go to a restaurant that serves Kung Pao Chicken.
  • When users search, they do not necessarily express their search intent clearly in the search request. They then have to sift through the search results or search a second time, which degrades the user experience.
  • Step S130 Input the composite feature into the search intent recognition model, and obtain the search intent recognition result output by the search intent recognition model.
  • The search intent recognition model here is trained in advance through composite modeling of search requests and search scene information.
  • Search intents may include take-out, dine-in, recipes, reviews, discounts, etc. These search intents reflect user needs; specifically, the naming and categorization of search intents can be done by business parties or domain experts. In other words, search intent can be understood as generalized user needs.
  • the search intent can correspond to the category of goods or services, and the categories of goods and services can be defined according to business needs.
  • the take-out and dine-in given above are the classification of service provision methods.
  • A search result can correspond to one or more search intents. For example, if a restaurant provides both dine-in and take-out services, the search intents corresponding to that restaurant can include take-out and dine-in; if another restaurant provides only take-out services, the search intents for that restaurant include only take-out. Conversely, a search intent can also correspond to one or more search results, and generally to many; for example, there are many restaurants that provide take-out services. The better the search intent matches the user's real needs, the easier it is for the search results displayed to the user to fulfill the purpose of the search.
  • The search intent recognition method shown in Figure 1 pays attention not only to search requests but also to search scene information such as weather, location, and user behavior.
  • The search intent recognition model based on composite modeling predicts the real needs of users with reference to these various factors. This mitigates the problem that search intent cannot be accurately identified from the search request alone, and is especially suitable for life service and LBS search scenarios.
  • Generating a composite feature for identifying the search intent includes: encoding the search scene information into a scene feature vector, and encoding the search request to obtain the search request feature vector corresponding to the search request; fusing the scene feature vector and the search request feature vector to obtain the fusion feature vector, and using the fusion feature vector as the composite feature. The proportion of dimensions of the search request feature vector in the fusion feature vector is not less than the preset ratio.
  • feature vectors are mathematical expressions of information such as text and images, and are generally high-dimensional vectors.
  • the encoding operation can be implemented using any type or multiple types of feature engineering techniques in the prior art, as long as vectorized data can be obtained.
  • both the search request feature vector and the scene feature vector are continuous vectors obtained by embedding.
  • The search request feature vector can be generated by using NLP (Natural Language Processing) technology to encode the content of a search request in text form, or by using image processing technology to encode the content of a search request in image form, and so on.
  • the search request is information that can directly reflect the user's search intent. Therefore, the feature vector of the search request is relatively important, and the proportion of the dimensionality in the fusion feature vector cannot be too low.
  • the specific fusion operation can be a Concat operation.
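  • The Concat-based fusion and the dimension-ratio constraint described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `fuse_features`, the parameter `min_query_ratio`, and the default value 0.5 for the preset ratio are all assumptions made for the example.

```python
from typing import List, Sequence


def fuse_features(query_vec: Sequence[float],
                  scene_vecs: Sequence[Sequence[float]],
                  min_query_ratio: float = 0.5) -> List[float]:
    """Concatenate the search request vector with the scene vectors.

    min_query_ratio stands in for the "preset ratio"; 0.5 is an assumed value.
    """
    fused = list(query_vec)
    for vec in scene_vecs:
        fused.extend(vec)  # Concat: append each scene vector in turn
    if not fused:
        raise ValueError("nothing to fuse")
    ratio = len(query_vec) / len(fused)
    if ratio < min_query_ratio:
        raise ValueError(
            f"search request dimensions make up {ratio:.2f} of the fused "
            f"vector, below the preset ratio {min_query_ratio}")
    return fused


query = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]   # 6-dimensional request vector
location = [1.0, 0.0]                     # toy scene vectors
weather = [0.0, 1.0]
fused = fuse_features(query, [location, weather])
print(len(fused))  # 10 dimensions, 6 of which come from the search request
```

  • In this sketch the check is done at fusion time; the same constraint could equally be enforced when the vector sizes are chosen at design time.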
  • Encoding the search scene information into a scene feature vector includes: separately encoding the search scene information according to scene dimensions to obtain feature vectors corresponding to each scene dimension. The scene dimensions include at least one of the following: a location dimension, a weather dimension, a user behavior dimension, and a time dimension.
  • In the location dimension, the scene information can include latitude and longitude information, city information, and entity information (points of interest, POI, such as shopping malls and residential areas); in the weather dimension, it can include wind information, temperature information, etc.; in the user behavior dimension, it can include click information, order information, browsing information, etc.; in the time dimension, it can include season information, holiday information, etc.
  • Corresponding feature vectors can be generated for each scene dimension, and these feature vectors can be independently used as scene feature vectors, or all or part of these feature vectors can be fused through the Concat operation and the fused feature vectors can be used as the scene feature vector.
  • Separately encoding the search scene information according to the scene dimensions to obtain the feature vector corresponding to each scene dimension includes: performing GeoHash processing on the latitude and longitude information in the location dimension, and performing one-hot encoding on the processing result to obtain the latitude and longitude feature vector.
  • GeoHash processing is essentially a way of spatial indexing, which can be understood as treating the ground surface as a two-dimensional plane and recursively decomposing the plane into smaller sub-blocks, each of which has the same code within a certain range of latitude and longitude.
  • Establishing a spatial index in the way of GeoHash can improve the efficiency of latitude and longitude retrieval.
  • GeoHash is used to make the two-dimensional latitude and longitude information into one dimension, which is convenient for the training and application of the search intent recognition model.
  • One-hot encoding can be understood as encoding N states with an N-bit status register: each state has its own register bit, and only one of these bits is set at any time. One-hot encoding thus turns a discrete feature into a numeric vector.
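  • The GeoHash-then-one-hot step can be sketched as follows. The encoder below follows the standard public GeoHash algorithm (interleaved longitude/latitude bisection, base-32 alphabet); the cell vocabulary, the precision, and the choice to map unknown cells to an all-zero vector are assumptions made for illustration only.

```python
from typing import List, Sequence

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a latitude/longitude pair as a GeoHash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_lon = True  # bits alternate, starting with longitude
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid  # keep the upper half
        else:
            bits.append(0)
            rng[1] = mid  # keep the lower half
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):  # every 5 bits form one base-32 character
        index = 0
        for bit in bits[i:i + 5]:
            index = (index << 1) | bit
        chars.append(BASE32[index])
    return "".join(chars)


def one_hot_cell(cell: str, vocabulary: Sequence[str]) -> List[int]:
    """One-hot encode a GeoHash cell against a fixed (hypothetical) vocabulary."""
    vec = [0] * len(vocabulary)
    if cell in vocabulary:  # assumption: unknown cells stay all-zero
        vec[list(vocabulary).index(cell)] = 1
    return vec


cells = ["u4", "u5", "wx"]                    # illustrative cell vocabulary
cell = geohash_encode(57.64911, 10.40744, 2)  # coarse 2-character cell
print(cell, one_hot_cell(cell, cells))        # u4 [1, 0, 0]
```

  • Nearby coordinates fall into the same cell and therefore share one code, which is what lets the two-dimensional position be treated as a single categorical feature.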
  • Separately encoding the search scene information according to the scene dimensions to obtain the feature vector corresponding to each scene dimension includes: performing bucket discretization processing on the continuous-value information in the weather dimension, and performing one-hot encoding on the processing result to obtain the weather feature vector.
  • the bucket discretization processing is mainly for continuous values such as wind and temperature, so that the obtained weather feature vector is high-dimensional and sparse, which is convenient for the training and use of the search intent recognition model.
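  • Bucket discretization followed by one-hot encoding can be sketched as below. The bucket boundaries for temperature and wind speed are invented for the example; the application does not specify them.

```python
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical bucket boundaries; real ones would be chosen from data.
TEMP_BOUNDARIES = [0.0, 10.0, 20.0, 30.0]   # degrees Celsius -> 5 buckets
WIND_BOUNDARIES = [2.0, 6.0, 10.0]          # metres per second -> 4 buckets


def bucketize(value: float, boundaries: Sequence[float]) -> int:
    """Return the index of the bucket that a continuous value falls into."""
    return bisect_right(boundaries, value)


def one_hot(index: int, size: int) -> List[int]:
    """Return a vector of length `size` with a single 1 at `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def encode_weather(temp_c: float, wind_ms: float) -> List[int]:
    """Discretize temperature and wind, one-hot each, and concatenate."""
    temp = one_hot(bucketize(temp_c, TEMP_BOUNDARIES), len(TEMP_BOUNDARIES) + 1)
    wind = one_hot(bucketize(wind_ms, WIND_BOUNDARIES), len(WIND_BOUNDARIES) + 1)
    return temp + wind


print(encode_weather(25.0, 3.5))  # [0, 0, 0, 1, 0, 0, 1, 0, 0]
```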
  • Separately encoding the search scene information according to the scene dimensions to obtain the feature vector corresponding to each scene dimension includes: for the user behavior sequence in the user behavior dimension, when the number of user behaviors in the sequence is not greater than the specified number, selecting all user behaviors in the sequence; when the number of user behaviors in the sequence is greater than the specified number, selecting the specified number of user behaviors from the sequence in reverse chronological order; obtaining the search intent of the target corresponding to each selected user behavior; performing feature embedding processing on the obtained search intents to obtain the user behavior feature vector.
  • The log can record the time point of each user behavior, and these user behaviors form a user behavior sequence. If user behavior information containing multiple user behaviors is used as search scene information, the behaviors need to be sufficiently relevant to one another. Therefore, the embodiments of the present application select user behaviors in reverse chronological order, so as to avoid including too many behaviors or unrelated ones.
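  • The selection of the most recent behaviors and the lookup of their search intents can be sketched as follows. The `Behavior` record, the `result_to_intent` mapping, and the sample data are hypothetical; the subsequent feature embedding step is not shown.

```python
from typing import Dict, List, NamedTuple, Sequence


class Behavior(NamedTuple):
    timestamp: float   # time point recorded in the log
    result_id: str     # search result the behavior acted on


def select_recent(behaviors: Sequence[Behavior], n: int) -> List[Behavior]:
    """Return at most n behaviors, newest first.

    If the sequence has n or fewer behaviors, all of them are returned.
    """
    ordered = sorted(behaviors, key=lambda b: b.timestamp, reverse=True)
    return ordered[:n]


def behaviors_to_intents(selected: Sequence[Behavior],
                         result_to_intent: Dict[str, str]) -> List[str]:
    """Map each selected behavior to the intent of its target result."""
    return [result_to_intent[b.result_id]
            for b in selected if b.result_id in result_to_intent]


log = [Behavior(1.0, "a"), Behavior(2.0, "b"),
       Behavior(3.0, "c"), Behavior(4.0, "d")]
# Hypothetical mapping supplied by the business side.
intents = {"a": "recipe", "b": "takeout", "c": "dine_in", "d": "takeout"}

recent = select_recent(log, 3)
print(behaviors_to_intents(recent, intents))  # ['takeout', 'dine_in', 'takeout']
```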
  • User behaviors often correspond to specific search results, and these search results are business-related.
  • The business party can provide the search intents of these search results in advance. This content usually does not need to be generated separately in practice, because business parties typically classify search intents and associate search results with them for their own business needs.
  • Word Embedding is a text processing technology in Natural Language Processing (NLP), which can be used to perform feature embedding processing in the embodiments of the present application.
  • the specific feature embedding method is not limited to this example.
  • Transformer: a type of NLP model proposed by Google.
  • BERT: Bidirectional Encoder Representations from Transformers.
  • GPT: Generative Pre-Training.
  • The specified number is predetermined in the following way: for each user behavior sequence in the search log that contains an order behavior, count the length of each continuous click behavior sequence in that user behavior sequence.
  • Continuous click behavior refers to click behaviors that occur between two order behaviors and whose occurrence intervals are not greater than the preset time threshold; the average length of the continuous click behavior sequences is taken as the specified number.
  • N denotes the specified number.
  • the behavior is counted, and the window is then pushed forward another 30 seconds... This continues until no click or order behavior occurs for more than 30 seconds.
  • This forms a sequence of continuous click behaviors.
  • The lengths of continuous click behavior sequences are counted over a long time interval, and their average is taken as N.
  • The modeling of user behavior predicts the current preference from at most N click preferences preceding the current search request, in order to determine the search intent.
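  • One way to estimate N from logged behavior sequences is sketched below. It is an interpretation, not the application's procedure: it walks backward from each order behavior, counts clicks while consecutive gaps stay within the threshold, and stops at a larger gap or at an earlier order. Treating the start of a sequence as a boundary, ignoring orders with no qualifying clicks, and rounding the average are all assumptions; the 30-second threshold is taken from the example above.

```python
from typing import List, Sequence, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, "click" or "order")


def click_run_lengths(events: Sequence[Event], max_gap: float = 30.0) -> List[int]:
    """Lengths of the continuous click runs that lead up to each order."""
    ordered = sorted(events)
    lengths = []
    for i, (ts, kind) in enumerate(ordered):
        if kind != "order":
            continue
        count = 0
        last_ts = ts
        # Walk backward from the order through the preceding events.
        for prev_ts, prev_kind in reversed(ordered[:i]):
            if prev_kind != "click" or last_ts - prev_ts > max_gap:
                break  # hit an earlier order or a gap above the threshold
            count += 1
            last_ts = prev_ts
        if count:  # assumption: orders without qualifying clicks are ignored
            lengths.append(count)
    return lengths


def estimate_n(sequences: Sequence[Sequence[Event]], max_gap: float = 30.0) -> int:
    """Average run length over all sequences, rounded to an integer."""
    lengths = [n for seq in sequences for n in click_run_lengths(seq, max_gap)]
    if not lengths:
        return 0
    return round(sum(lengths) / len(lengths))


session = [(0, "click"), (10, "click"), (25, "click"), (40, "order"),
           (200, "click"), (210, "order")]
print(click_run_lengths(session))  # [3, 1]
print(estimate_n([session]))       # 2
```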
  • The search intent recognition model is obtained by training as follows: generating training samples based on search logs; generating composite features based on the training samples; using the composite features to train the search intent recognition model.
  • the search log here records the specific content of the search request, such as query text or query image, and records search scene information.
  • The training can be divided into multiple stages. After each training stage, the resulting search intent recognition model is verified. If verification passes, the model is put into use. If verification fails, the parameters of the search intent recognition model can be adjusted to optimize it; alternatively, the way training samples and feature vectors are generated or fused can be adjusted. Training is then repeated with the adjusted data and process until the search intent recognition model passes verification.
  • the search intent recognition model can be pre-trained first, and the feature vector of the search request can be fine-tuned according to the feedback of the pre-training.
  • Generating training samples based on search logs includes: generating first-type positive samples based on search logs containing order behavior; generating second-type positive samples based on search logs containing click behavior, where the weight of the first-type positive samples is greater than the weight of the second-type positive samples; generating negative samples based on search logs containing only browsing behavior.
  • The search log can record information from the time the user initiates a search request until the user places an order, searches again, or leaves the search engine. For example, a user searches for "Kung Pao Chicken", and the search engine displays multiple search results on the page. Some of these search results are only displayed, some are clicked by the user, and the user may eventually select some search results to place an order.
  • The order behavior best reflects the user's true positive search intent, that is, "what is needed"; the click behavior can also reflect positive search intent, but it may be caused by an accidental touch; and if there is only browsing behavior, it can reflect the user's negative search intent, that is, "what is not needed".
  • search logs containing click behaviors can be taken as the second type of positive samples, and search logs containing ordering behaviors can be used as the first type of positive samples, and distinguished by weight.
  • For example, the weight ratio of the second type of positive samples to the first type of positive samples can be 1:10.
  • Negative samples can correspond to the search results that the user browsed before clicking (known in the industry as "Skip above"); search results displayed after the clicked one are not processed.
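  • The labeling and weighting scheme can be sketched as follows. The 1:10 ratio between click and order positives comes from the text; the `Sample` record, the behavior names, and the negative-sample weight of 1.0 are assumptions made for the example, and the "Skip above" selection of individual results is not modeled.

```python
from typing import NamedTuple, Optional, Sequence


class Sample(NamedTuple):
    query: str
    label: int      # 1 = positive, 0 = negative
    weight: float


ORDER_WEIGHT = 10.0     # first-type positive (order behavior)
CLICK_WEIGHT = 1.0      # second-type positive (click behavior), ratio 1:10
NEGATIVE_WEIGHT = 1.0   # assumed; not specified in the text


def make_sample(query: str, behaviors: Sequence[str]) -> Optional[Sample]:
    """Turn one logged search into a weighted training sample."""
    kinds = set(behaviors)
    if "order" in kinds:
        return Sample(query, 1, ORDER_WEIGHT)
    if "click" in kinds:
        return Sample(query, 1, CLICK_WEIGHT)
    if kinds == {"browse"}:
        return Sample(query, 0, NEGATIVE_WEIGHT)
    return None  # nothing usable was logged for this search


print(make_sample("Kung Pao Chicken", ["browse", "click", "order"]))
print(make_sample("Kung Pao Chicken", ["browse", "click"]))
print(make_sample("Kung Pao Chicken", ["browse"]))
```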
  • Fig. 2 shows a schematic flowchart of a training method for a search intent recognition model according to an embodiment of the present application.
  • When the user enters a search keyword and initiates a search request, the search engine returns the search results and records the search log.
  • The search log is stored after cleaning and other processing.
  • From the browsing, click, and order behaviors recorded in the search log, positive and negative training samples and their weights are generated, and the samples are labeled using the search intent categories given by the business side.
  • Feature processing is performed on the training samples to obtain search request feature vectors, latitude and longitude feature vectors, weather feature vectors, user behavior feature vectors, and other extended feature vectors that can be generated as needed.
  • A fusion feature vector is generated and input into the search intent recognition model for model training. If the model verification passes, a usable search intent recognition model is obtained. If the model verification fails, processing such as parameter optimization is performed, and training is repeated until the search intent recognition model passes verification.
  • when a new search intent arises (this does not necessarily mean that users have a new need; more likely the business has introduced a new definition),
  • the search intent recognition model can be updated iteratively after a certain number of search logs have been collected.
  • the search keyword is processed by the coding layer to obtain the search request feature vector, which enters a network layer; the latitude and longitude information enters the coding layer after GeoHash processing to obtain the latitude and longitude feature vector; the weather information enters the coding layer after bucket discretization to obtain the weather feature vector; the user behavior sequence is processed by the coding layer to obtain the user behavior feature vector, which enters a network layer; the latitude and longitude feature vector and the weather feature vector are combined through a Concat operation into the environmental feature vector, which enters a network layer; the outputs of these network layers are combined through a Concat operation into the fusion feature vector, which enters the backbone network layer to output the search intent recognition result and compute the loss.
  • the search intent recognition result includes the intent intensity distribution of multiple search intents
  • the method further includes: obtaining the specified search intent and its intent rank; determining the intent intensity value of the specified search intent according to the intent rank and the intent intensity distribution; and generating, according to the intent intensity value of the specified search intent and the intent intensity distribution, an intent intensity distribution that contains the specified search intent.
  • although modeling based on search logs to obtain the search intent can meet the needs of the user side, it has certain shortcomings for the business side. The reason is that modeling based only on user behavior is prone to the Matthew effect, that is, the strong stay strong and the weak stay weak, so that some search intents are easily overlooked and new search intents are harder to expose.
  • this application designs an integrated solution that incorporates other search intents, such as search intents recommended by the business side, so that the business side can also participate in the search intent identification process.
  • according to the search keyword entered by the user, the search engine has identified four search intents A, B, C, and D.
  • the intent intensities of these four search intents decrease successively and are 0.4, 0.3, 0.2, and 0.1, respectively.
  • this forms the intent intensity distribution of the four search intents, and the search results corresponding to search intent A will be displayed first.
  • the business side, however, wants to display search intent E in the third position, that is, in the order A, B, E, C, D. The intent intensity value of E can then be generated from the current intent intensity distribution,
  • for example, as the arithmetic mean of the intent intensity values of B and C, which is 0.25. Since adding E makes the sum of the intent intensity values exceed 1, a function such as softmax can be used for normalization.
  • each search intent can correspond to a different search result, and the user can switch between search intents on the search result page (for example, each search intent displays its corresponding search result in its own tab).
  • "Takeaway” is an existing search intent, and the business side has launched a new search intent of "Boutique Takeaway” in the course of operations.
  • a search result may correspond to both "Takeaway" and "Boutique Takeaway", and the search result has a higher display priority in "Boutique Takeaway". So for users who like that search result, "Boutique Takeaway" is clearly the better search intent.
  • however, because this search intent is newly created,
  • if search intents are displayed only according to the intent intensity distribution output by the search intent recognition model, "Boutique Takeaway" will hardly ever be displayed, which does not meet the needs of users or the business side.
  • if the intent intensity distribution is adjusted in the manner described above, "Boutique Takeaway" can have a higher display priority, so that the search intent recognition model can then be further adjusted according to the search log.
  • acquiring the specified search intent and its intent rank includes: acquiring a specified search intent that matches the search request and is in an effective state, where the effective state is determined according to the display time of the specified search intent and/or the number of impressions of the specified search intent.
  • the specified search intent can be applied to the cold start scenario: it guarantees the rank of the specified search intent for a period of time or a number of impressions, thereby ensuring that the corresponding search results are displayed and that users become familiar with the new intent.
  • by the time the specified search intent expires, the search intent recognition model has accumulated enough search logs for search intent recognition. This overcomes the Matthew effect problem that often occurs in user behavior modeling scenarios, and meets the needs of the business side while staying close to the needs of users.
  • Fig. 4 shows a schematic flowchart of a search intention recognition method according to an embodiment of the present application.
  • as shown in Fig. 4, after the user enters a search keyword and initiates a search request, the search request feature vector, latitude and longitude feature vector, weather feature vector, user behavior feature vector, and other extended feature vectors that can be generated as needed are generated. These feature vectors are fused and input into the search intent recognition model to obtain the intent intensity distribution of multiple search intents.
  • if the business side has no specified search intent available, the search results are selected for display according to this intent intensity distribution; if the business side has a specified search intent available, the intent intensity distribution is recalculated according to the specified search intent, and the search results are selected for display according to the recalculated distribution.
  • when the business side provides a specified search intent, an optional solution is to provide it in a specified data format.
  • for example, the specified search intent is required to be associated with specific search keywords, to take effect at a specific time and in a specific scene, to have a limit on the number of recommended exposures, and so on.
  • for example, once the effective duration is set, the number of remaining days is automatically reduced by 1 each day until it reaches 0; the number of exposures, that is, the number of impressions, likewise decreases according to the daily count recorded in the search log until it reaches 0, and is updated daily.
  • when the effective duration and the number of exposures of a specified search intent are both non-zero, the search intent is guaranteed its corresponding rank in the intent intensity distribution; conversely, if either the effective duration or the number of exposures is 0, the specified search intent is no longer considered, and the search intent is determined entirely by the search intent recognition model.
  • FIG. 5 shows a schematic structural diagram of a search intention recognition device according to an embodiment of the present application.
  • the search intention recognition device 500 includes a response unit 510, a composite feature generation unit 520 and a search intention recognition unit 530.
  • the response unit 510 is configured to obtain search scene information associated with the search request in response to the search request.
  • the embodiments of the present application can be applied to various scenarios that use search engine technology, including but not limited to general search engines such as Baidu and Google (the business names here are only illustrative), special-purpose search engines in fields such as patents and trademarks, and search engines within apps.
  • the user can generate a search request (query) in various ways such as text, image, voice, etc.
  • the text can be a search keyword or an expression form of a search sentence.
  • the composite feature generating unit 520 is configured to generate composite features for identifying search intent according to the search scene information and the search request.
  • search scene information can be regarded as the indirect expression of the user's search intent, and can supplement the potential search intent not reflected in the search request.
  • search scene information can cover multiple scene dimensions, such as time dimension, location dimension, weather dimension, user behavior dimension, and so on.
  • a user who searches for "Kung Pao Chicken" may want to learn how to cook Kung Pao Chicken, may want to order Kung Pao Chicken for take-out, or may want to dine at a restaurant that sells Kung Pao Chicken.
  • when users search, they do not necessarily express their search intent clearly in the search request. They then have to look through the search results or perform a second search, which degrades the user experience.
  • the search intent recognition unit 530 is configured to input the composite feature into the search intent recognition model, and obtain the search intent recognition result output by the search intent recognition model.
  • the search intent recognition model here is based on composite modeling and pre-training of search requests and search scene information.
  • search intents may include take-out, dine-in, recipes, reviews, discounts, etc. These search intents can reflect user needs, specifically, the name determination and category division of search intents can be performed by business parties or domain experts. In other words, search intent can be understood as generalized user needs.
  • the search intent can correspond to the category of goods or services, and the categories of goods and services can be defined according to business needs.
  • the take-out and dine-in given above are the classification of service provision methods.
  • a search result can correspond to one or more search intents. For example, if a restaurant provides both dine-in and take-out services, the corresponding search intent of the restaurant can include take-out and dine-in; while another restaurant only provides take-out services, then The search intent for this restaurant only includes takeaway. Conversely, a search intent can also correspond to one or more search results, and generally multiple search results, for example, there are many restaurants that provide take-out services. The more the search intent matches the real needs of the user, the easier it is for the search results displayed to the user to achieve the user's search purpose.
  • the search intent recognition device shown in Figure 5 attends not only to search requests but also to search scene information such as weather, location, and user behavior. It uses a search intent recognition model based on composite modeling to predict the real needs of users from multiple factors, which alleviates the problem that search intent cannot be accurately identified from the search request alone, and is especially suitable for life service and LBS search scenarios.
  • the composite feature generating unit 520 is configured to encode the search scene information into a scene feature vector, and encode the search request to obtain the search request feature vector corresponding to the search request;
  • the scene feature vector and the search request feature vector are fused to obtain the fusion feature vector, and the fusion feature vector is used as a composite feature, wherein the dimensionality ratio of the search request feature vector in the fusion feature vector is not less than the preset ratio.
  • the composite feature generating unit 520 is configured to encode the search scene information separately by scene dimension to obtain the feature vector corresponding to each scene dimension.
  • the scene dimensions include at least one of the following: location dimension, weather dimension, user behavior dimension, time dimension.
  • the composite feature generating unit 520 is configured to perform GeoHash processing on the latitude and longitude information in the location dimension, and perform one-hot encoding on the processing result to obtain the latitude and longitude feature vector.
  • the composite feature generating unit 520 is configured to perform bucket discretization on the continuous-valued information in the weather dimension, and perform one-hot encoding on the processing result to obtain the weather feature vector.
  • the composite feature generating unit 520 is configured to, for the user behavior sequence in the user behavior dimension: when the number of user behaviors in the sequence is not greater than the specified number, select all user behaviors in the sequence; when the number of user behaviors in the sequence is greater than the specified number, select the specified number of user behaviors from the sequence in reverse chronological order; obtain the search intent of the target corresponding to each selected user behavior; and perform feature embedding on the obtained search intents to obtain the user behavior feature vector.
  • the search intent recognition device further includes: a preprocessing unit, configured to count, for each user behavior sequence in the search log that includes an ordering behavior, the length of the consecutive click behavior sequences in that user behavior sequence.
  • consecutive click behaviors are click behaviors that occur between two ordering behaviors and whose intervals are not greater than the preset time threshold; the mean length of the consecutive click behavior sequences is taken as the specified number.
  • the search intent recognition device further includes: a preprocessing unit, configured to generate training samples from the search log and to generate composite features from the training samples; and a training unit, configured to train the search intent recognition model using the composite features.
  • the preprocessing unit is configured to generate the first type of positive samples from search logs containing ordering behavior; generate the second type of positive samples from search logs containing click behavior, the weight of the first type of positive samples being greater than the weight of the second type; and generate negative samples from search logs that contain only browsing behavior.
  • the search intent recognition result includes the intent intensity distribution of multiple search intents
  • the device further includes: an intent adjustment unit, configured to obtain the specified search intent and its intent rank; determine the intent intensity value of the specified search intent according to the intent rank and the intent intensity distribution; and generate, according to the intent intensity value of the specified search intent and the intent intensity distribution, an intent intensity distribution that contains the specified search intent.
  • the intent adjustment unit is configured to obtain a specified search intent that matches the search request and is in an effective state.
  • the effective state is determined according to the display time of the specified search intent and/or the number of impressions of the specified search intent.
  • the embodiments of the present application attend not only to search requests but also to search scene information such as weather, location, and user behavior.
  • a search intent recognition model based on composite modeling is used to predict the real needs of users from multiple factors, which alleviates the problem that search intent cannot be accurately identified from the search request alone, and is especially suitable for life service and LBS search scenarios.
  • a specified search intent that matches the search request and is in an effective state can be used to adjust the intent intensity distribution, which further improves the match between the search intent finally given and the user's needs.
  • the modules, units, or components in the embodiments can be combined into one module, unit, or component, and can also be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose.
  • the various component embodiments of the present application may be implemented by hardware, or by software modules running on one or more processors, or by a combination of them.
  • a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the search intention recognition device according to the embodiments of the present application.
  • this application can also be implemented as a device or apparatus program (for example, a computer program or a computer program product) for executing part or all of the methods described herein.
  • Such a program for implementing the present application may be stored on a computer-readable medium, or may have the form of one or more signals.
  • Such a signal can be downloaded from an Internet website, or provided on a carrier signal, or provided in any other form.
  • FIG. 6 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the electronic device 600 includes a processor 610 and a memory 620 arranged to store computer-executable instructions (computer-readable program code).
  • the memory 620 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the memory 620 has a storage space 630 storing a computer-readable program code 631 for executing the aforementioned search intention recognition method.
  • the storage space 630 for storing computer-readable program codes may include various computer-readable program codes 631 respectively used to implement various steps in the above method.
  • the computer-readable program code 631 may be read from or written into one or more computer program products.
  • FIG. 7 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
  • the computer-readable storage medium 700 stores the computer-readable program code 631 for executing the search intention recognition method described above, which can be read by the processor 610 of the electronic device 600.
  • the electronic device 600 is caused to execute each step in the method described above.
  • the computer readable program code 631 stored in the computer readable storage medium can execute the method shown in any of the above embodiments.
  • the computer readable program code 631 may be compressed in a suitable form.


Abstract

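This application discloses a search intent recognition method. The method includes: in response to a search request, obtaining search scene information associated with the search request; generating, according to the search scene information and the search request, a composite feature for recognizing search intent; and inputting the composite feature into a search intent recognition model to obtain the search intent recognition result output by the search intent recognition model.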

Description

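Search Intent Recognition
Technical Field
This application relates to the field of search engines, and specifically to search intent recognition.
Background
Accurately predicting a user's search intent is a crucial capability of a search engine. Search intent usually refers to the user's real need behind a search behavior. For example, a user who searches for "badminton" may want to buy badminton equipment, find a badminton court, or learn the rules of badminton. In this example, "buying equipment", "finding a court", and "learning the rules" are three different types of search intent related to the search keyword "badminton".
The prior art offers several common schemes for recognizing search intent: 1) determining search intent by text matching between search keywords and rules formulated by business experts; 2) predicting search intent based on text classification or clustering; 3) mapping keywords into a high-dimensional semantic vector space, by means such as topic models, to express search intent.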
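Summary
This application provides a search intent recognition method, a device, an electronic device, and a storage medium.
According to a first aspect of this application, a search intent recognition method is provided, including: in response to a search request, obtaining search scene information associated with the search request; generating, according to the search scene information and the search request, a composite feature for recognizing search intent; and inputting the composite feature into a search intent recognition model to obtain the search intent recognition result output by the search intent recognition model.
Optionally, generating the composite feature for recognizing search intent according to the search scene information and the search request includes: encoding the search scene information into a scene feature vector, and encoding the search request to obtain a search request feature vector corresponding to the search request; and fusing the scene feature vector and the search request feature vector to obtain a fusion feature vector, and using the fusion feature vector as the composite feature, wherein the proportion of dimensions occupied by the search request feature vector in the fusion feature vector is not less than a preset ratio.
Optionally, encoding the search scene information into a scene feature vector includes: encoding the search scene information separately by scene dimension to obtain a feature vector corresponding to each scene dimension, the scene dimensions including at least one of the following: a location dimension, a weather dimension, a user behavior dimension, and a time dimension.
Optionally, encoding the search scene information separately by scene dimension to obtain a feature vector corresponding to each scene dimension includes: performing GeoHash processing on the latitude and longitude information in the location dimension, and one-hot encoding the processing result to obtain a latitude and longitude feature vector.
Optionally, encoding the search scene information separately by scene dimension to obtain a feature vector corresponding to each scene dimension includes: performing bucket discretization on the continuous-valued information in the weather dimension, and one-hot encoding the processing result to obtain a weather feature vector.
Optionally, encoding the search scene information separately by scene dimension to obtain a feature vector corresponding to each scene dimension includes: for a user behavior sequence in the user behavior dimension, when the number of user behaviors in the sequence is not greater than a specified number, selecting all user behaviors in the sequence; when the number of user behaviors in the sequence is greater than the specified number, selecting the specified number of user behaviors from the sequence in reverse chronological order; obtaining the search intent of the target corresponding to each selected user behavior; and performing feature embedding on the obtained search intents to obtain a user behavior feature vector.
Optionally, the specified number is predetermined as follows: for each user behavior sequence in the search log that contains an ordering behavior, counting the length of the consecutive click behavior sequences in that user behavior sequence, where consecutive click behaviors are click behaviors that occur between two ordering behaviors and whose intervals are not greater than a preset time threshold; and taking the mean length of the consecutive click behavior sequences as the specified number.
Optionally, the search intent recognition model is trained as follows: generating training samples from the search log; generating composite features from the training samples; and training the search intent recognition model using the composite features.
Optionally, generating training samples from the search log includes: generating a first type of positive samples from search logs containing ordering behavior; generating a second type of positive samples from search logs containing click behavior, the weight of the first type of positive samples being greater than the weight of the second type; and generating negative samples from search logs containing only browsing behavior.
Optionally, the search intent recognition result includes an intent intensity distribution of multiple search intents, and the method further includes: obtaining a specified search intent and its intent rank; determining the intent intensity value of the specified search intent according to the intent rank and the intent intensity distribution; and generating, according to the intent intensity value of the specified search intent and the intent intensity distribution, an intent intensity distribution that contains the specified search intent.
Optionally, obtaining the specified search intent and its intent rank includes: obtaining a specified search intent that matches the search request and is in an effective state, the effective state being determined according to the display time of the specified search intent and/or the number of times the specified search intent has been displayed.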
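According to a second aspect of this application, a search intent recognition device is provided, including: a response unit, configured to obtain, in response to a search request, search scene information associated with the search request; a composite feature generating unit, configured to generate, according to the search scene information and the search request, a composite feature for recognizing search intent; and a search intent recognition unit, configured to input the composite feature into a search intent recognition model and obtain the search intent recognition result output by the search intent recognition model.
Optionally, the composite feature generating unit is configured to encode the search scene information into a scene feature vector, and encode the search request to obtain a search request feature vector corresponding to the search request; and to fuse the scene feature vector and the search request feature vector to obtain a fusion feature vector and use the fusion feature vector as the composite feature, wherein the proportion of dimensions occupied by the search request feature vector in the fusion feature vector is not less than a preset ratio.
Optionally, the composite feature generating unit is configured to encode the search scene information separately by scene dimension to obtain a feature vector corresponding to each scene dimension, the scene dimensions including at least one of the following: a location dimension, a weather dimension, a user behavior dimension, and a time dimension.
Optionally, the composite feature generating unit is configured to perform GeoHash processing on the latitude and longitude information in the location dimension, and one-hot encode the processing result to obtain a latitude and longitude feature vector.
Optionally, the composite feature generating unit is configured to perform bucket discretization on the continuous-valued information in the weather dimension, and one-hot encode the processing result to obtain a weather feature vector.
Optionally, the composite feature generating unit is configured to, for a user behavior sequence in the user behavior dimension: when the number of user behaviors in the sequence is not greater than a specified number, select all user behaviors in the sequence; when the number of user behaviors in the sequence is greater than the specified number, select the specified number of user behaviors from the sequence in reverse chronological order; obtain the search intent of the target corresponding to each selected user behavior; and perform feature embedding on the obtained search intents to obtain a user behavior feature vector.
Optionally, the search intent recognition device further includes: a preprocessing unit, configured to count, for each user behavior sequence in the search log that contains an ordering behavior, the length of the consecutive click behavior sequences in that user behavior sequence, where consecutive click behaviors are click behaviors that occur between two ordering behaviors and whose intervals are not greater than a preset time threshold; and to take the mean length of the consecutive click behavior sequences as the specified number.
Optionally, the search intent recognition device further includes: a preprocessing unit, configured to generate training samples from the search log and generate composite features from the training samples; and a training unit, configured to train the search intent recognition model using the composite features.
Optionally, the preprocessing unit is configured to generate a first type of positive samples from search logs containing ordering behavior; generate a second type of positive samples from search logs containing click behavior, the weight of the first type of positive samples being greater than the weight of the second type; and generate negative samples from search logs containing only browsing behavior.
Optionally, the search intent recognition result includes an intent intensity distribution of multiple search intents, and the search intent recognition device further includes: an intent adjustment unit, configured to obtain a specified search intent and its intent rank; determine the intent intensity value of the specified search intent according to the intent rank and the intent intensity distribution; and generate, according to the intent intensity value of the specified search intent and the intent intensity distribution, an intent intensity distribution that contains the specified search intent.
Optionally, the intent adjustment unit is configured to obtain a specified search intent that matches the search request and is in an effective state, the effective state being determined according to the display time of the specified search intent and/or the number of times the specified search intent has been displayed.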
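According to a third aspect of this application, an electronic device is provided, including: a processor; and a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform the search intent recognition method described in any of the above.
According to a fourth aspect of this application, a computer-readable storage medium is provided, the computer-readable storage medium storing one or more programs which, when executed by a processor, implement the search intent recognition method described in any of the above.
As can be seen from the above, the embodiments of this application obtain search scene information in response to a search request, generate a composite feature for recognizing search intent according to the search scene information and the search request, input the composite feature into a search intent recognition model, and obtain the search intent recognition result output by the model. The embodiments attend not only to the search request but also to search scene information such as weather, location, and user behavior. They use a search intent recognition model based on composite modeling to predict the user's real needs from multiple factors, which alleviates the problem that search intent cannot be accurately identified from the search request alone, and they are especially suitable for life service and LBS (Location Based Services) search scenarios.
The above is only an overview of the embodiments of this application. So that the technical means of this application can be understood more clearly and implemented in accordance with the content of the specification, and so that the above and other objects, features, and advantages of this application are more apparent, specific implementations of this application are set out below.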
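Brief Description of the Drawings
Various other advantages and benefits will become clear to those of ordinary skill in the art upon reading the detailed description of some implementations below. The drawings are only for the purpose of illustrating some implementations and are not to be regarded as limiting this application. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
Fig. 1 shows a schematic flowchart of a search intent recognition method according to an embodiment of this application.
Fig. 2 shows a schematic flowchart of a training method for a search intent recognition model according to an embodiment of this application.
Fig. 3 shows a schematic structural diagram of a search intent recognition model according to an embodiment of this application.
Fig. 4 shows a schematic flowchart of a search intent recognition method according to an embodiment of this application.
Fig. 5 shows a schematic structural diagram of a search intent recognition device according to an embodiment of this application.
Fig. 6 shows a schematic structural diagram of an electronic device according to an embodiment of this application.
Fig. 7 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of this application.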
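Detailed Description
Exemplary embodiments of this application are described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of this application, it should be understood that this application can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that this application can be understood more thoroughly and its scope can be fully conveyed to those skilled in the art.
The schemes introduced in the Background all attend only to text and pay no attention to other factors. In addition, scheme 1) requires manual labeling and rule formulation, generalizes poorly, and cannot cope with iterative changes in business scenarios; scheme 3) is difficult to adapt to scenarios that demand high precision and consistency. The prior art therefore does not meet business needs, and there is considerable room for improvement.
This application proposes a scheme that brings search scene information such as user behavior, weather, and location into consideration and performs composite modeling in combination with the search request, so as to recognize search intent more accurately.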
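Fig. 1 shows a schematic flowchart of a search intent recognition method according to an embodiment of this application. As shown in Fig. 1, the search intent recognition method includes steps S110 to S130.
In step S110, in response to a search request, search scene information associated with the search request is obtained.
The embodiments of this application can be applied to various scenarios that use search engine technology, including but not limited to general search engines such as Baidu and Google (the business names here are only illustrative), special-purpose search engines in fields such as patents and trademarks, and search engines within apps.
A user can generate a search request (query) in various ways such as text, image, or voice. Text, for example, can take the form of a search keyword or a search sentence.
In step S120, a composite feature for recognizing search intent is generated according to the search scene information and the search request.
If the search request is the user's direct expression of search intent, then the search scene information can be regarded as an indirect expression of that intent, and it can supplement potential search intent that the search request does not reflect. For example, search scene information can cover multiple scene dimensions, such as a time dimension, a location dimension, a weather dimension, and a user behavior dimension.
For example, a user who searches for "Kung Pao Chicken" may want to learn how to cook Kung Pao Chicken, may want to order Kung Pao Chicken for take-out, or may want to dine at a restaurant that sells Kung Pao Chicken. When searching, however, users do not necessarily express their search intent clearly in the search request. They then have to look through the search results or perform a second search, which degrades the user experience.
Starting from the search scene information can alleviate this problem. For example, if the user searches for Kung Pao Chicken inside a shopping mall, the user more likely wants to dine at a restaurant that sells Kung Pao Chicken than to look up a recipe or order take-out. Here the role of the environment becomes apparent. And if the user skips several physical restaurants that sell Kung Pao Chicken, clicks into the pages of several take-out restaurants, and places an order with one of them, it can be determined that the user wants take-out rather than anything else. This shows the role of user behavior.
In step S130, the composite feature is input into the search intent recognition model, and the search intent recognition result output by the model is obtained. The search intent recognition model here is implemented through composite modeling of search requests and search scene information together with pre-training.
For example, search intents may include take-out, dine-in, recipes, reviews, discounts, and so on. These search intents reflect user needs; their names and category divisions can be determined by the business side or by domain experts. In other words, search intent can be understood as a generalized user need.
In a concrete business scenario, a search intent can correspond to a category of goods or services, and the categories of goods and services can be defined according to business needs. The take-out and dine-in intents given above, for example, classify the way a service is provided.
A search result can correspond to one or more search intents. For example, if a restaurant offers both dine-in and take-out service, the search intents corresponding to that restaurant can include take-out and dine-in; if another restaurant offers only take-out service, the search intent corresponding to that restaurant includes only take-out. Conversely, a search intent can correspond to one or more search results, and generally to many, since, for instance, many restaurants offer take-out service. The better the search intent matches the user's real need, the more easily the search results shown to the user serve the purpose of the search.
It can be seen that the search intent recognition method shown in Fig. 1 attends not only to the search request but also to search scene information such as weather, location, and user behavior. It uses a search intent recognition model based on composite modeling to predict the user's real needs from multiple factors, which alleviates the problem that search intent cannot be accurately identified from the search request alone, and it is especially suitable for life service and LBS search scenarios.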
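In an embodiment of this application, in the above search intent recognition method, generating the composite feature for recognizing search intent according to the search scene information and the search request includes: encoding the search scene information into a scene feature vector, and encoding the search request to obtain a search request feature vector corresponding to the search request; and fusing the scene feature vector and the search request feature vector to obtain a fusion feature vector, and using the fusion feature vector as the composite feature, wherein the proportion of dimensions occupied by the search request feature vector in the fusion feature vector is not less than a preset ratio.
A feature vector is a mathematical representation of information such as text or images, and is generally a high-dimensional vector. The encoding operation can be implemented with any one or more feature engineering techniques in the prior art, as long as vectorized data is obtained. In a specific embodiment, the search request feature vector and the scene feature vector are both continuous vectors obtained through an embedding operation. The search request feature vector can be generated by encoding search request content in text form using NLP (Natural Language Processing) techniques, by encoding search request content in image form using image processing techniques, and so on.
As mentioned above, the search request is information that directly reflects the user's search intent, so the search request feature vector is relatively important, and the proportion of dimensions it occupies in the fusion feature vector must not be too low. The specific fusion operation can be a concatenation (Concat) operation.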
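The fusion step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `fuse_features`, the default ratio of 0.5, and the choice to raise an error when the ratio is not met are all hypothetical. The text only states that fusion can be a Concat operation and that the search request vector's share of dimensions must not fall below a preset ratio.

```python
from typing import List, Sequence


def fuse_features(query_vec: Sequence[float],
                  scene_vecs: Sequence[Sequence[float]],
                  min_query_ratio: float = 0.5) -> List[float]:
    """Concatenate the search request vector with the scene vectors.

    min_query_ratio is a hypothetical stand-in for the "preset ratio":
    the share of dimensions taken by the search request vector in the
    fused vector must not be lower than it.
    """
    fused = list(query_vec)
    for vec in scene_vecs:
        fused.extend(vec)  # Concat: append each scene vector in turn
    if not fused:
        raise ValueError("cannot fuse empty feature vectors")
    ratio = len(query_vec) / len(fused)
    if ratio < min_query_ratio:
        raise ValueError(
            f"search request vector holds {ratio:.2f} of the dimensions, "
            f"below the preset ratio {min_query_ratio:.2f}")
    return fused
```

For example, a 4-dimensional search request vector fused with two 2-dimensional scene vectors yields an 8-dimensional vector in which the search request holds exactly half of the dimensions. In practice one might instead resize the scene vectors to satisfy the ratio; the source does not say how the constraint is enforced.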
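In an embodiment of this application, in the above search intent recognition method, encoding the search scene information into a scene feature vector includes: encoding the search scene information separately by scene dimension to obtain a feature vector corresponding to each scene dimension, the scene dimensions including at least one of the following: a location dimension, a weather dimension, a user behavior dimension, and a time dimension.
In the location dimension, the scene information can specifically include latitude and longitude information, city information, entity information (points of interest, POI, such as shopping malls and residential areas), and so on; in the weather dimension, it can include wind information, temperature information, and so on; in the user behavior dimension, it can include click information, order information, browsing information, and so on; in the time dimension, it can include season information, holiday information, and so on.
Each scene dimension can produce a corresponding feature vector. Each of these feature vectors can serve independently as a scene feature vector, or all or some of them can be fused through a Concat operation, with the fused feature vector serving as the scene feature vector.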
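In an embodiment of this application, in the above search intent recognition method, encoding the search scene information separately by scene dimension to obtain a feature vector corresponding to each scene dimension includes: performing GeoHash processing on the latitude and longitude information in the location dimension, and one-hot encoding the processing result to obtain a latitude and longitude feature vector.
GeoHash processing is essentially a form of spatial indexing. It can be understood as treating the earth's surface as a two-dimensional plane and recursively dividing the plane into smaller cells, where all points within the latitude and longitude range of a cell share the same code. Building a spatial index with GeoHash can improve the efficiency of latitude and longitude retrieval. In this application, GeoHash is used to reduce two-dimensional latitude and longitude information to one dimension, which facilitates training and use of the search intent recognition model. One-hot encoding can be understood as using an N-bit status register to encode N states, where each state has its own register bit and only one of these bits is active at a time. Through one-hot encoding, discrete features can be converted into a continuous (numeric vector) form.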
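As an illustration of the two steps just described, the sketch below encodes a coordinate with the standard public GeoHash algorithm (interleaved longitude and latitude bits, base-32 alphabet) and then one-hot encodes the resulting cell against a vocabulary of known cells. The precision of 5 characters, the vocabulary, and the all-zero fallback for unseen cells are assumptions made for the example; the source does not specify them.

```python
from typing import List, Sequence

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet


def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    """Map a (latitude, longitude) pair to a one-dimensional GeoHash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars: List[str] = []
    bits, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def one_hot(value: str, vocabulary: Sequence[str]) -> List[int]:
    """One-hot encode value; unseen values map to all zeros (an assumption)."""
    vec = [0] * len(vocabulary)
    if value in vocabulary:
        vec[list(vocabulary).index(value)] = 1
    return vec
```

A shorter GeoHash covers a larger cell, so the chosen precision controls both the spatial resolution and the size of the one-hot vocabulary.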
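In an embodiment of this application, in the above search intent recognition method, encoding the search scene information separately by scene dimension to obtain a feature vector corresponding to each scene dimension includes: performing bucket discretization on the continuous-valued information in the weather dimension, and one-hot encoding the processing result to obtain a weather feature vector. Bucket discretization is mainly applied to continuous values such as wind force and temperature, so that the resulting weather feature vector is high-dimensional and sparse, which facilitates training and use of the search intent recognition model.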
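Bucket discretization followed by one-hot encoding can be sketched as below. The bucket boundaries for temperature and wind level are invented for the example and are not taken from the source; only the overall technique (assign a continuous value to a bucket, then one-hot encode the bucket index) follows the text.

```python
from bisect import bisect_right
from typing import List, Sequence

# Hypothetical bucket boundaries; the source does not give concrete values.
TEMPERATURE_BOUNDS = [0.0, 10.0, 20.0, 30.0]  # degrees Celsius
WIND_BOUNDS = [3.0, 6.0, 9.0]                 # wind force level


def one_hot_bucket(value: float, boundaries: Sequence[float]) -> List[int]:
    """Place value into one of len(boundaries) + 1 buckets and one-hot encode it."""
    vec = [0] * (len(boundaries) + 1)
    vec[bisect_right(boundaries, value)] = 1  # index of the bucket holding value
    return vec


def encode_weather(temperature: float, wind_level: float) -> List[int]:
    """Concatenate the one-hot buckets of each continuous weather value."""
    return (one_hot_bucket(temperature, TEMPERATURE_BOUNDS)
            + one_hot_bucket(wind_level, WIND_BOUNDS))
```

With these boundaries, a temperature of 25 degrees falls into the fourth of five temperature buckets and a wind level of 2 into the first of four wind buckets, giving a sparse 9-dimensional vector with exactly two active positions.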
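In an embodiment of this application, in the above search intent recognition method, encoding the search scene information separately by scene dimension to obtain a feature vector corresponding to each scene dimension includes: for a user behavior sequence in the user behavior dimension, when the number of user behaviors in the sequence is not greater than a specified number, selecting all user behaviors in the sequence; when the number of user behaviors in the sequence is greater than the specified number, selecting the specified number of user behaviors from the sequence in reverse chronological order; obtaining the search intent of the target corresponding to each selected user behavior; and performing feature embedding on the obtained search intents to obtain a user behavior feature vector.
For example, the log can record the time at which each user behavior occurs, and these user behaviors can form a user behavior sequence. If user behavior information containing multiple user behaviors is to serve as search scene information, the behaviors need to be related to one another to some degree. The embodiments of this application therefore select user behaviors in reverse chronological order, which avoids including too many user behaviors or behaviors that are unrelated.
User behaviors usually correspond to specific search results, and these search results are business-related, so the business side can provide the search intents of these search results in advance. In real scenarios this content usually does not need to be generated separately, because the business side, for its own business needs, will normally have already classified the search intents and associated search results with search intents.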
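The selection rule above can be sketched as follows, assuming each behavior is stored as a hypothetical `(timestamp, target_id)` pair and the business side supplies a `target_id -> intent` mapping. The embedding step that turns the selected intents into a user behavior feature vector is left out, because the source leaves the embedding method open.

```python
from typing import Dict, List, Tuple

Behavior = Tuple[float, str]  # (timestamp, target_id); a hypothetical layout


def select_recent_intents(behaviors: List[Behavior],
                          target_intent: Dict[str, str],
                          specified_number: int) -> List[str]:
    """Return the intents of the selected behaviors, most recent first.

    If the sequence holds no more than specified_number behaviors, all are
    selected; otherwise only the most recent specified_number are kept.
    """
    newest_first = sorted(behaviors, key=lambda b: b[0], reverse=True)
    if len(newest_first) > specified_number:
        newest_first = newest_first[:specified_number]
    # Targets without a known intent are skipped (an assumption).
    return [target_intent[target] for _, target in newest_first
            if target in target_intent]
```

The returned intent labels would then be passed to whichever embedding layer the model uses.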
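Word embedding is a text processing technique in natural language processing (NLP), and it can be used for feature embedding in the embodiments of this application. The specific feature embedding method is of course not limited to this example. For instance, Transformer (a type of NLP model proposed by Google), BERT (Bidirectional Encoder Representations from Transformers) models, and GPT (Generative Pre-Training) models can also be used for feature embedding.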
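In an embodiment of this application, in the above search intent recognition method, the specified number is predetermined as follows: for each user behavior sequence in the search log that contains an ordering behavior, counting the length of the consecutive click behavior sequences in that user behavior sequence, where consecutive click behaviors are click behaviors that occur between two ordering behaviors and whose intervals are not greater than a preset time threshold; and taking the mean length of the consecutive click behavior sequences as the specified number.
For example, for a user behavior sequence, the search intents corresponding to at most N click behaviors before the current behavior are taken, where N is the specified number. N can be computed as follows: from each ordering behavior, look back 30 seconds; if there is a click behavior, count it, then look back another 30 seconds from that click, and so on, continuing backward until more than 30 seconds pass without a click behavior or an ordering behavior is encountered. This forms a consecutive click behavior sequence. The lengths of the consecutive click behavior sequences over a relatively long time span are collected, and their average is N. Modeling user behavior thus means using the preferences expressed by at most N clicks before the current search request to predict the current preference, so as to determine the search intent.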
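The backward walk described above can be sketched as follows. The event layout (`(timestamp, kind)` pairs containing only `"click"` and `"order"` events), the function names, and rounding the mean to an integer are assumptions for the example; the 30-second gap follows the example in the text.

```python
from typing import List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, "click" or "order"); hypothetical


def consecutive_click_lengths(events: List[Event], max_gap: float = 30.0) -> List[int]:
    """For each order, count the chain of clicks leading up to it.

    Walking backward from the order, a click is counted if it lies within
    max_gap seconds of the previously counted event; the walk stops at a
    larger gap or at an earlier order.
    """
    ordered = sorted(events, key=lambda e: e[0])
    lengths: List[int] = []
    for i, (ts, kind) in enumerate(ordered):
        if kind != "order":
            continue
        count, reference = 0, ts
        for prev_ts, prev_kind in reversed(ordered[:i]):
            if prev_kind != "click" or reference - prev_ts > max_gap:
                break
            count += 1
            reference = prev_ts  # the next look-back starts from this click
        lengths.append(count)
    return lengths


def specified_number(sequences: List[List[Event]], max_gap: float = 30.0) -> int:
    """Average chain length over all sequences, rounded to an integer (an assumption)."""
    lengths = [n for seq in sequences for n in consecutive_click_lengths(seq, max_gap)]
    return round(sum(lengths) / len(lengths)) if lengths else 0
```

In a production setting the statistics would be computed over a long window of search logs, as the text notes, rather than over a handful of sessions.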
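In an embodiment of this application, in the above search intent recognition method, the search intent recognition model is trained as follows: generating training samples from the search log; generating composite features from the training samples; and training the search intent recognition model using the composite features.
The search log here records the specific content of the search request, such as the query text or query image, as well as the search scene information. Training can be divided into multiple stages. After each training stage, the resulting search intent recognition model is verified; if it passes, it is put into use. If it fails, the parameters of the search intent recognition model can be adjusted, that is, the model can be optimized, and the way training samples and feature vectors are generated or fused can also be adjusted. Training is then repeated with the adjusted data and process until the search intent recognition model passes verification.
In one optional scheme, for example, the search intent recognition model is first pre-trained, and the search request feature vector is then fine-tuned according to the feedback from pre-training.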
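In an embodiment of this application, in the above search intent recognition method, generating training samples from the search log includes: generating a first type of positive samples from search logs containing ordering behavior; generating a second type of positive samples from search logs containing click behavior, the weight of the first type of positive samples being greater than the weight of the second type; and generating negative samples from search logs containing only browsing behavior.
For example, the search log can record information from the time the user initiates a search request until the user places an order, searches again, or leaves the search engine. Suppose a user searches for "Kung Pao Chicken" and the search engine displays multiple search results on the page. Some of these search results are only displayed, some are clicked by the user, and the user may eventually select some search results to place an order.
Among browsing, clicking, and ordering behaviors, the ordering behavior best reflects the user's true positive search intent, that is, "what is needed". The click behavior can also reflect the user's positive search intent, but it may also result from an accidental touch. If there is only browsing behavior, it can reflect the user's negative search intent, that is, "what is not needed".
Therefore, search logs containing click behavior can be used as the second type of positive samples and search logs containing ordering behavior as the first type of positive samples, distinguished by weight. For example, the ratio of the weight of the second type of positive samples to that of the first type can be 1:10. Negative samples can correspond to the search results the user browsed before clicking (known in the industry as "Skip above"), while the search results displayed after the click are not processed.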
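A minimal sketch of this sample generation is given below, assuming each displayed result is logged as a hypothetical `Impression` record in display order. The 1:10 weight ratio follows the example in the text; the negative-sample weight of 1.0 and the decision to emit no samples for sessions without any click are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Impression:          # one displayed search result (hypothetical record)
    result_id: str
    clicked: bool = False
    ordered: bool = False


@dataclass
class Sample:
    result_id: str
    label: int             # 1 = positive, 0 = negative
    weight: float


def build_samples(impressions: List[Impression],
                  order_weight: float = 10.0,
                  click_weight: float = 1.0,
                  negative_weight: float = 1.0) -> List[Sample]:
    """Turn one search session, in display order, into weighted samples."""
    last_click = max((i for i, imp in enumerate(impressions)
                      if imp.clicked or imp.ordered), default=-1)
    samples: List[Sample] = []
    for i, imp in enumerate(impressions):
        if imp.ordered:
            samples.append(Sample(imp.result_id, 1, order_weight))   # first type
        elif imp.clicked:
            samples.append(Sample(imp.result_id, 1, click_weight))   # second type
        elif i < last_click:
            samples.append(Sample(imp.result_id, 0, negative_weight))  # skip-above
        # Results displayed after the last click are left unprocessed.
    return samples
```

Each sample would still need an intent label from the business side's intent categories before training, as described for Fig. 2.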
Of course, the specific way of generating samples is not limited to the above example and can be changed as needed.
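Fig. 2 shows a schematic flowchart of a training method for a search intent recognition model according to an embodiment of this application. Referring to Fig. 2, when the user enters a search keyword and initiates a search request, the search engine returns the search results and records the search log. The search log is stored after cleaning and other processing. Positive and negative training samples and their weights can be generated from the browsing, clicking, and ordering behaviors recorded in the search log, and the samples are labeled using the search intent categories given by the business side. Feature processing is performed on the training samples to obtain the search request feature vector, latitude and longitude feature vector, weather feature vector, user behavior feature vector, and other extended feature vectors that can be generated as needed. A fusion feature vector is generated from these feature vectors and input into the search intent recognition model for training. If the model passes verification, a usable search intent recognition model is obtained; if it fails, processing such as parameter optimization is performed and training is repeated until the model passes verification.
In addition, when a new search intent arises (this does not necessarily mean that users have a new need; more likely the business has introduced a new definition), the search intent recognition model can be updated iteratively after a certain number of search logs have been collected.
For feature processing, refer to the schematic structural diagram of a search intent recognition model according to an embodiment of this application shown in Fig. 3. The search keyword is processed by the coding layer to obtain the search request feature vector, which enters a network layer. The latitude and longitude information enters the coding layer after GeoHash processing to obtain the latitude and longitude feature vector. The weather information enters the coding layer after bucket discretization to obtain the weather feature vector. The user behavior sequence is processed by the coding layer to obtain the user behavior feature vector, which enters a network layer. The latitude and longitude feature vector and the weather feature vector are combined through a Concat operation into the environmental feature vector, which enters a network layer. The outputs of these network layers are combined through a Concat operation into the fusion feature vector, which enters the backbone network layer to output the search intent recognition result and compute the loss.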
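In an embodiment of this application, in the above search intent recognition method, the search intent recognition result includes an intent intensity distribution of multiple search intents, and the method further includes: obtaining a specified search intent and its intent rank; determining the intent intensity value of the specified search intent according to the intent rank and the intent intensity distribution; and generating, according to the intent intensity value of the specified search intent and the intent intensity distribution, an intent intensity distribution that contains the specified search intent.
Although modeling based on search logs to obtain the search intent can meet the needs of the user side, it has certain shortcomings for the business side. The reason is that modeling based only on user behavior is prone to the Matthew effect, that is, the strong stay strong and the weak stay weak, so that some search intents are easily overlooked and new search intents are harder to expose.
Moreover, in a cold start scenario (the first launch of the application within a preset time period), the above search intent recognition sometimes cannot achieve a good business result because user behavior information is missing. This application therefore designs an integrated scheme that incorporates other search intents, such as those recommended by the business side, so that the business side also takes part in the search intent recognition process.
For example, according to the search keyword entered by the user, the search engine identifies four search intents A, B, C, and D, whose intent intensities decrease successively and are 0.4, 0.3, 0.2, and 0.1, respectively. This forms the intent intensity distribution of the four search intents, and the search results corresponding to search intent A are displayed first.
The business side, however, wants to display search intent E in the third position, that is, in the order A, B, E, C, D. The intent intensity value of E can then be generated from the current intent intensity distribution, for example as the arithmetic mean of the intent intensity values of B and C, which is 0.25. Since adding E makes the sum of the intent intensity values exceed 1, a function such as softmax can be used for normalization.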
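The adjustment in this example can be sketched as follows. The function name and the handling of the first and last rank (using 1.0 and 0.0 as the missing neighbor) are assumptions, since the text only covers an intent inserted between two existing ones. Softmax is used for normalization as the text suggests; because the exponential is monotonic, it preserves the order A, B, E, C, D.

```python
import math
from typing import List, Tuple

Distribution = List[Tuple[str, float]]  # (intent name, intent intensity)


def insert_specified_intent(distribution: Distribution,
                            intent: str,
                            rank: int) -> Distribution:
    """Insert intent at the 1-based rank and renormalize with softmax."""
    ordered = sorted(distribution, key=lambda item: item[1], reverse=True)
    idx = min(max(rank - 1, 0), len(ordered))
    # Neighbors of the target position; 1.0 and 0.0 are assumed fallbacks
    # for the first and last rank, which the source does not discuss.
    upper = ordered[idx - 1][1] if idx > 0 else 1.0
    lower = ordered[idx][1] if idx < len(ordered) else 0.0
    strength = (upper + lower) / 2  # arithmetic mean of the two neighbors
    merged = ordered[:idx] + [(intent, strength)] + ordered[idx:]
    exps = [math.exp(value) for _, value in merged]
    total = sum(exps)
    return [(name, e / total) for (name, _), e in zip(merged, exps)]
```

Calling it with the distribution A 0.4, B 0.3, C 0.2, D 0.1 and inserting E at rank 3 assigns E the raw value 0.25 before normalization. Note that softmax flattens the distribution compared with a simple division by the sum; which normalization suits a given ranking pipeline is a design choice the source leaves open.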
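For example, each search intent can correspond to different search results, and the user can switch between search intents on the search result page (for example, each search intent displays its corresponding search results in its own tab). "Takeaway" is an existing search intent, and in the course of operations the business side launches a new search intent, "Boutique Takeaway". A search result may then correspond to both "Takeaway" and "Boutique Takeaway", with a higher display priority in "Boutique Takeaway". For users who like that search result, "Boutique Takeaway" is clearly the better search intent. However, because it is newly created, if search intents are displayed only according to the intent intensity distribution output by the search intent recognition model, "Boutique Takeaway" will hardly ever be displayed, which does not meet the needs of users or the business side. If the intent intensity distribution is adjusted in the manner described above, "Boutique Takeaway" can have a higher display priority, so that the search intent recognition model can then be further adjusted according to the search log.
In an embodiment of this application, in the above search intent recognition method, obtaining the specified search intent and its intent rank includes: obtaining a specified search intent that matches the search request and is in an effective state, the effective state being determined according to the display time of the specified search intent and/or the number of times the specified search intent has been displayed.
As can be seen, the specified search intent can be applied to the cold start scenario. It guarantees the rank of the specified search intent for a period of time or a number of impressions, thereby ensuring that the corresponding search results are displayed and that users become familiar with the new intent. By the time the specified search intent expires, the search intent recognition model has accumulated enough search logs for search intent recognition. This overcomes the Matthew effect problem that often occurs in user behavior modeling scenarios, and meets the needs of the business side while staying close to the needs of users.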
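Fig. 4 shows a schematic flowchart of a search intent recognition method according to an embodiment of this application. As shown in Fig. 4, after the user enters a search keyword and initiates a search request, the search request feature vector, latitude and longitude feature vector, weather feature vector, user behavior feature vector, and other extended feature vectors that can be generated as needed are generated. These feature vectors are fused and input into the search intent recognition model to obtain the intent intensity distribution of multiple search intents. If the business side has no specified search intent available, the search results are selected for display according to this intent intensity distribution; if the business side has a specified search intent available, the intent intensity distribution is recalculated according to the specified search intent, and the search results are selected for display according to the recalculated distribution.
When the business side provides a specified search intent, an optional scheme is to provide it in a specified data format. For example, the specified search intent is required to be associated with specific search keywords, to take effect at a specific time and in a specific scene, to have a limit on the number of recommended exposures, and so on. For example, once the effective duration is set, the number of remaining days is automatically reduced by 1 each day until it reaches 0; the number of exposures, that is, the number of impressions, likewise decreases according to the daily count recorded in the search log until it reaches 0, and is updated daily. When the effective duration and the number of exposures of a specified search intent are both non-zero, the search intent is guaranteed its corresponding rank in the intent intensity distribution; conversely, if either the effective duration or the number of exposures is 0, the specified search intent is no longer considered, and the search intent is determined entirely by the search intent recognition model.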
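The bookkeeping described above can be sketched with a small record type. The field names, the substring match between keyword and query, and the `daily_update` method are hypothetical; the source only states that the remaining days and remaining exposures count down to 0 and that the intent takes effect only while both are non-zero.

```python
from dataclasses import dataclass


@dataclass
class SpecifiedIntent:
    name: str
    keyword: str            # search keyword the intent is associated with
    rank: int               # guaranteed position in the intent distribution
    days_left: int          # remaining effective duration, in days
    impressions_left: int   # remaining exposure budget

    def is_effective(self, query: str) -> bool:
        """Effective only if the query matches and both counters are non-zero."""
        return (self.keyword in query
                and self.days_left > 0
                and self.impressions_left > 0)

    def daily_update(self, impressions_today: int) -> None:
        """Once a day: one day elapses and today's logged impressions are deducted."""
        self.days_left = max(self.days_left - 1, 0)
        self.impressions_left = max(self.impressions_left - impressions_today, 0)
```

When `is_effective` returns False, the intent distribution output by the model would be used unchanged.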
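Fig. 5 shows a schematic structural diagram of a search intent recognition device according to an embodiment of this application. As shown in Fig. 5, the search intent recognition device 500 includes a response unit 510, a composite feature generating unit 520, and a search intent recognition unit 530.
The response unit 510 is configured to obtain, in response to a search request, search scene information associated with the search request.
The embodiments of this application can be applied to various scenarios that use search engine technology, including but not limited to general search engines such as Baidu and Google (the business names here are only illustrative), special-purpose search engines in fields such as patents and trademarks, and search engines within apps.
A user can generate a search request (query) in various ways such as text, image, or voice. Text, for example, can take the form of a search keyword or a search sentence.
The composite feature generating unit 520 is configured to generate, according to the search scene information and the search request, a composite feature for recognizing search intent.
If the search request is the user's direct expression of search intent, then the search scene information can be regarded as an indirect expression of that intent, and it can supplement potential search intent that the search request does not reflect. For example, search scene information can cover multiple scene dimensions, such as a time dimension, a location dimension, a weather dimension, and a user behavior dimension.
For example, a user who searches for "Kung Pao Chicken" may want to learn how to cook Kung Pao Chicken, may want to order Kung Pao Chicken for take-out, or may want to dine at a restaurant that sells Kung Pao Chicken. When searching, however, users do not necessarily express their search intent clearly in the search request. They then have to look through the search results or perform a second search, which degrades the user experience.
Starting from the search scene information can alleviate this problem. For example, if the user searches for Kung Pao Chicken inside a shopping mall, the user more likely wants to dine at a restaurant that sells Kung Pao Chicken than to look up a recipe or order take-out. Here the role of the environment becomes apparent. And if the user skips several physical restaurants that sell Kung Pao Chicken, clicks into the pages of several take-out restaurants, and places an order with one of them, it can be determined that the user wants take-out rather than anything else. This shows the role of user behavior.
The search intent recognition unit 530 is configured to input the composite feature into the search intent recognition model and obtain the search intent recognition result output by the model. The search intent recognition model here is implemented through composite modeling of search requests and search scene information together with pre-training.
For example, search intents may include take-out, dine-in, recipes, reviews, discounts, and so on. These search intents reflect user needs; their names and category divisions can be determined by the business side or by domain experts. In other words, search intent can be understood as a generalized user need.
In a concrete business scenario, a search intent can correspond to a category of goods or services, and the categories of goods and services can be defined according to business needs. The take-out and dine-in intents given above, for example, classify the way a service is provided.
A search result can correspond to one or more search intents. For example, if a restaurant offers both dine-in and take-out service, the search intents corresponding to that restaurant can include take-out and dine-in; if another restaurant offers only take-out service, the search intent corresponding to that restaurant includes only take-out. Conversely, a search intent can correspond to one or more search results, and generally to many, since, for instance, many restaurants offer take-out service. The better the search intent matches the user's real need, the more easily the search results shown to the user serve the purpose of the search.
It can be seen that the search intent recognition device shown in Fig. 5 attends not only to the search request but also to search scene information such as weather, location, and user behavior. It uses a search intent recognition model based on composite modeling to predict the user's real needs from multiple factors, which alleviates the problem that search intent cannot be accurately identified from the search request alone, and it is especially suitable for life service and LBS search scenarios.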
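In one embodiment of the present application, in the search intent recognition apparatus, the composite feature generation unit 520 is configured to encode the search scene information into a scene feature vector, and to encode the search request to obtain a search request feature vector corresponding to the search request; and to fuse the scene feature vector and the search request feature vector into a fused feature vector that serves as the composite feature, where the share of the fused feature vector's dimensions occupied by the search request feature vector is not less than a preset ratio.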
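A minimal sketch of the fusion step with its dimension-share constraint is given below. Plain concatenation is used as the fusion operator, which is an assumption: this passage does not say how the vectors are fused. The function name and the default ratio of 0.5 are likewise hypothetical.

```python
def fuse_features(query_vec, scene_vecs, min_query_ratio=0.5):
    """Concatenate the query vector with the per-dimension scene vectors.

    Raises ValueError if the query vector would occupy less than
    `min_query_ratio` of the fused vector's dimensions.
    """
    fused = list(query_vec)
    for vec in scene_vecs:
        fused.extend(vec)
    ratio = len(query_vec) / len(fused)
    if ratio < min_query_ratio:
        raise ValueError(
            f"query share {ratio:.2f} is below the preset ratio {min_query_ratio}"
        )
    return fused

query_vec = [0.2, 0.7, 0.1, 0.5]         # toy search request feature vector
scene_vecs = [[0, 1], [1, 0]]            # toy location and weather vectors
composite = fuse_features(query_vec, scene_vecs)   # 8 dimensions, query share 0.5
```

Enforcing the ratio at fusion time keeps the directly expressed query from being drowned out as more scene dimensions are added.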
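In one embodiment of the present application, in the search intent recognition apparatus, the composite feature generation unit 520 is configured to encode the search scene information separately for each scene dimension to obtain a feature vector corresponding to each scene dimension, where the scene dimensions include at least one of the following: a location dimension, a weather dimension, a user behavior dimension and a time dimension.
In one embodiment of the present application, in the search intent recognition apparatus, the composite feature generation unit 520 is configured to apply GeoHash processing to the latitude and longitude information in the location dimension, and to one-hot encode the result to obtain a latitude/longitude feature vector.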
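The GeoHash-plus-one-hot step can be illustrated with the self-contained sketch below. The GeoHash routine follows the standard public algorithm (interleaved longitude and latitude bits, base-32 alphabet); the precision, the cell vocabulary and the extra "unknown cell" slot are assumptions chosen for the example.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"   # standard GeoHash alphabet

def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair as a GeoHash string."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    bits = []
    use_lon = True                              # GeoHash starts with a longitude bit
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid                        # keep the upper half
        else:
            bits.append(0)
            rng[1] = mid                        # keep the lower half
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):            # 5 bits per base-32 character
        index = 0
        for bit in bits[i:i + 5]:
            index = (index << 1) | bit
        chars.append(_BASE32[index])
    return "".join(chars)

def one_hot(token, vocabulary):
    """One-hot encode `token`; the final slot is reserved for unseen tokens."""
    vec = [0] * (len(vocabulary) + 1)
    vec[vocabulary.get(token, len(vocabulary))] = 1
    return vec

# Hypothetical vocabulary of cells observed in the search logs.
cell_vocab = {"u4": 0, "wx": 1, "ws": 2}
cell = geohash_encode(57.64911, 10.40744, precision=2)
location_vec = one_hot(cell, cell_vocab)
```

A coarser precision yields fewer, larger cells and a shorter one-hot vector; the right trade-off depends on how fine-grained the location signal needs to be.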
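In one embodiment of the present application, in the search intent recognition apparatus, the composite feature generation unit 520 is configured to apply bucketing discretization to continuous-valued information in the weather dimension, and to one-hot encode the result to obtain a weather feature vector.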
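As an illustration of bucketing followed by one-hot encoding, the sketch below discretizes a temperature reading. The bucket edges are hypothetical values in degrees Celsius; the text does not give any.

```python
from bisect import bisect_right

# Hypothetical bucket edges in degrees Celsius; they define five buckets:
# (-inf, 0), [0, 10), [10, 20), [20, 30), [30, +inf)
TEMPERATURE_EDGES = [0, 10, 20, 30]

def bucketize(value, edges):
    """Return the index of the bucket that `value` falls into."""
    return bisect_right(edges, value)

def one_hot_bucket(value, edges):
    """Discretize a continuous value and one-hot encode its bucket index."""
    vec = [0] * (len(edges) + 1)
    vec[bucketize(value, edges)] = 1
    return vec

weather_vec = one_hot_bucket(25.0, TEMPERATURE_EDGES)   # falls in [20, 30)
```

Other continuous weather readings, such as humidity or wind speed, could be bucketed the same way and concatenated to form the full weather feature vector.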
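In one embodiment of the present application, in the search intent recognition apparatus, the composite feature generation unit 520 is configured to, for a user behavior sequence in the user behavior dimension: select all user behaviors in the sequence when the number of user behaviors in it is not greater than a specified number; select the specified number of user behaviors from the sequence in reverse chronological order when the number of user behaviors in it is greater than the specified number; obtain the search intent of the target corresponding to each selected user behavior; and apply feature embedding to the obtained search intents to obtain a user behavior feature vector.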
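The selection and embedding of user behaviors might be sketched as follows. The embedding table, its two-dimensional vectors and the mean pooling of the selected embeddings are assumptions; the text only says that the obtained search intents undergo feature embedding.

```python
def select_recent(behaviors, limit):
    """Return at most `limit` behaviors, most recent first.

    `behaviors` is a list of (timestamp, intent) pairs. A sequence no longer
    than `limit` is returned in full.
    """
    ordered = sorted(behaviors, key=lambda item: item[0], reverse=True)
    return ordered[:limit]

def embed_behaviors(behaviors, limit, table, dim):
    """Mean-pool the intent embeddings of the selected behaviors (assumed pooling)."""
    selected = select_recent(behaviors, limit)
    if not selected:
        return [0.0] * dim                      # no history, e.g. cold start
    total = [0.0] * dim
    for _, intent in selected:
        vec = table.get(intent, [0.0] * dim)    # unknown intents map to zeros
        total = [t + v for t, v in zip(total, vec)]
    return [t / len(selected) for t in total]

# Hypothetical embedding table; a trained model would learn these vectors.
intent_table = {"takeout": [1.0, 0.0], "dine_in": [0.0, 1.0]}
history = [(1, "dine_in"), (2, "takeout"), (3, "takeout")]
behavior_vec = embed_behaviors(history, limit=2, table=intent_table, dim=2)
```

With `limit=2` only the two most recent behaviors are kept, so the older dine-in click no longer influences the vector.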
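In one embodiment of the present application, the search intent recognition apparatus further includes a preprocessing unit configured to, for each user behavior sequence in the search logs that contains an order placement, count the lengths of the consecutive click sequences in that user behavior sequence, where consecutive clicks are clicks that occur between two order placements and are separated by no more than a preset time threshold; and to take the mean length of the consecutive click sequences as the specified number.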
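One possible reading of this statistic is sketched below. The event format, the function names and the rounding of the mean to an integer are assumptions; in particular, only clicks lying strictly between two order placements are counted, which is a literal interpretation of the wording.

```python
def click_run_lengths(events, max_gap):
    """Lengths of consecutive-click runs found between two order placements.

    `events` is a chronologically sorted list of (timestamp, kind) pairs with
    kind "click", "order" or anything else (ignored). Clicks belong to the
    same run when they are at most `max_gap` apart.
    """
    lengths, pending = [], []
    run, last_click, seen_order = 0, None, False
    for timestamp, kind in events:
        if kind == "order":
            if run:
                pending.append(run)
            if seen_order:                 # runs are bounded by orders on both sides
                lengths.extend(pending)
            pending, run, last_click, seen_order = [], 0, None, True
        elif kind == "click":
            if last_click is not None and timestamp - last_click <= max_gap:
                run += 1
            else:
                if run:
                    pending.append(run)
                run = 1
            last_click = timestamp
    return lengths

def specified_number(sequences, max_gap):
    """Mean run length over all sequences that contain an order placement."""
    lengths = []
    for events in sequences:
        if any(kind == "order" for _, kind in events):
            lengths.extend(click_run_lengths(events, max_gap))
    return round(sum(lengths) / len(lengths)) if lengths else 0

log = [(0, "order"), (10, "click"), (20, "click"), (30, "click"),
       (500, "click"), (600, "order"), (610, "click")]
limit = specified_number([log], max_gap=60)
```

In this toy log the runs between the two orders have lengths 3 and 1; the click after the last order is not bounded by a second order and is left out.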
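In one embodiment of the present application, the search intent recognition apparatus further includes: a preprocessing unit configured to generate training samples from search logs and to generate composite features from the training samples; and a training unit configured to train the search intent recognition model using the composite features.
In one embodiment of the present application, in the search intent recognition apparatus, the preprocessing unit is configured to generate first-type positive samples from search logs that contain an order placement; to generate second-type positive samples from search logs that contain a click, where the weight of the first-type positive samples is greater than that of the second-type positive samples; and to generate negative samples from search logs that contain only browsing.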
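The labelling rule above can be sketched as follows. The log record layout, the action names and the concrete weights (2.0 and 1.0) are hypothetical; the text only requires that order-based positives weigh more than click-based positives.

```python
def build_sample(log, order_weight=2.0, click_weight=1.0):
    """Turn one search log record into a labelled, weighted training sample.

    `log` is a dict with a "query" string and an "actions" list. Returns None
    when the record has no recognised actions.
    """
    actions = set(log["actions"])
    if "order" in actions:
        # First-type positive sample: the search ended in an order placement.
        return {"query": log["query"], "label": 1, "weight": order_weight}
    if "click" in actions:
        # Second-type positive sample: clicked but did not order.
        return {"query": log["query"], "label": 1, "weight": click_weight}
    if actions == {"view"}:
        # Negative sample: the user only browsed.
        return {"query": log["query"], "label": 0, "weight": 1.0}
    return None

logs = [
    {"query": "kung pao chicken", "actions": ["view", "click", "order"]},
    {"query": "kung pao chicken", "actions": ["view", "click"]},
    {"query": "kung pao chicken", "actions": ["view"]},
]
samples = [build_sample(entry) for entry in logs]
```

The weight would typically be passed to the training loss as a per-sample weight, so that an order counts for more than a click.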
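In one embodiment of the present application, in the search intent recognition apparatus, the search intent recognition result includes an intent strength distribution over multiple search intents, and the apparatus further includes an intent adjustment unit configured to obtain a designated search intent and its intent rank; to determine an intent strength value for the designated search intent according to the intent rank and the intent strength distribution; and to generate, according to the intent strength value of the designated search intent and the intent strength distribution, an intent strength distribution that includes the designated search intent.
In one embodiment of the present application, in the search intent recognition apparatus, the intent adjustment unit is configured to obtain a designated search intent that matches the search request and is in an active state, where the active state is determined according to the display time of the designated search intent and/or the number of times the designated search intent has been displayed.
It should be noted that the specific implementation of each of the above apparatus embodiments can follow that of the corresponding method embodiment described earlier, and is not repeated here.
In summary, the embodiments of the present application attend not only to the search request but also to search scene information such as weather, location and user behavior. By using a search intent recognition model based on composite modeling and predicting the user's real need from multiple factors, they mitigate the problem that the search intent cannot be recognized accurately from the search request alone, and they are particularly suitable for local-services and LBS search scenarios. For cold starts and for scenarios in which the business side has designated search intents, the intent strength distribution can be adjusted using a designated search intent that matches the search request and is in an active state, which further improves how well the final search intent matches the user's need.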
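It should be noted that:
The algorithms and displays provided herein are not inherently related to any particular computer, virtual apparatus or other device. Various general-purpose apparatuses may also be used with the teachings herein. From the description above, the structure required to construct such apparatuses is apparent. Moreover, the present application is not directed to any particular programming language. It should be understood that the content of the present application described herein can be implemented in various programming languages, and that the description of a specific language above is given to disclose the best mode of the present application.
The specification provided here sets out numerous specific details. It will be understood, however, that embodiments of the present application can be practiced without these specific details. In some embodiments, well-known methods, structures and techniques are not shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that, in order to streamline the present application and aid the understanding of one or more of the various inventive aspects, the features of the present application are sometimes grouped together into a single embodiment, figure or description thereof in the above description of exemplary embodiments. The disclosed method should not, however, be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all features of a single foregoing disclosed embodiment. The claims following the detailed description are therefore expressly incorporated into that detailed description, with each claim standing on its own as a separate embodiment of the present application.
Those skilled in the art will understand that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units or components of an embodiment can be combined into one module, unit or component, and can furthermore be divided into multiple sub-modules, sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract and drawings) may be replaced by an alternative feature serving the same, an equivalent or a similar purpose.
Furthermore, those skilled in the art will understand that although some embodiments described herein include certain features included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the present application and to form different embodiments. For example, in the following claims, any of the claimed embodiments can be used in any combination.
The various component embodiments of the present application may be implemented in hardware, in software modules running on one or more processors, or in a combination of the two. Those skilled in the art will understand that a microprocessor or a digital signal processor (DSP) can be used in practice to implement some or all of the functions of some or all of the components of the search intent recognition apparatus according to the embodiments of the present application. The present application can also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the present application may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, Figure 6 shows a schematic structural diagram of an electronic device according to one embodiment of the present application. The electronic device 600 includes a processor 610 and a memory 620 arranged to store computer-executable instructions (computer-readable program code). The memory 620 may be an electronic memory such as flash memory, EEPROM (electrically erasable programmable read-only memory), EPROM, a hard disk or ROM. The memory 620 has a storage space 630 that stores computer-readable program code 631 for performing the above search intent recognition method. For example, the storage space 630 for storing computer-readable program code may include individual pieces of computer-readable program code 631 for implementing the various steps of the above method. The computer-readable program code 631 can be read from, or written to, one or more computer program products. These computer program products include program code carriers such as hard disks, compact discs (CDs), memory cards or floppy disks. Such a computer program product is typically a computer-readable storage medium such as the one shown in Figure 7. Figure 7 shows a schematic structural diagram of a computer-readable storage medium according to one embodiment of the present application. The computer-readable storage medium 700 stores the computer-readable program code 631 for performing the above search intent recognition method, which can be read by the processor 610 of the electronic device 600. When run by the electronic device 600, the computer-readable program code 631 causes the electronic device 600 to perform the steps of the method described above; specifically, the computer-readable program code 631 stored on the computer-readable storage medium can perform the method shown in any of the above embodiments. The computer-readable program code 631 may be compressed in a suitable form.
It should be noted that the above embodiments illustrate rather than limit the present application, and that those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference sign placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The present application can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several apparatuses, several of these apparatuses may be embodied by one and the same item of hardware. The use of the words first, second, third and so on does not indicate any order; these words may be interpreted as names.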

Claims (14)

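  1. A search intent recognition method, comprising:
    in response to a search request, obtaining search scene information associated with the search request;
    generating, according to the search scene information and the search request, a composite feature for recognizing a search intent; and
    inputting the composite feature into a search intent recognition model, and obtaining a search intent recognition result output by the search intent recognition model.
  2. The search intent recognition method according to claim 1, wherein generating, according to the search scene information and the search request, the composite feature for recognizing the search intent comprises:
    encoding the search scene information into a scene feature vector, and encoding the search request to obtain a search request feature vector corresponding to the search request; and
    fusing the scene feature vector and the search request feature vector to obtain a fused feature vector, and using the fused feature vector as the composite feature, wherein the share of the fused feature vector's dimensions occupied by the search request feature vector is not less than a preset ratio.
  3. The search intent recognition method according to claim 2, wherein encoding the search scene information into the scene feature vector comprises:
    encoding the search scene information separately for each scene dimension to obtain a feature vector corresponding to each scene dimension, the scene dimensions comprising at least one of the following: a location dimension, a weather dimension, a user behavior dimension and a time dimension.
  4. The search intent recognition method according to claim 3, wherein encoding the search scene information separately for each scene dimension to obtain the feature vector corresponding to each scene dimension comprises:
    applying GeoHash processing and one-hot encoding to latitude and longitude information in the location dimension to obtain a latitude/longitude feature vector.
  5. The search intent recognition method according to claim 3, wherein encoding the search scene information separately for each scene dimension to obtain the feature vector corresponding to each scene dimension comprises:
    applying bucketing discretization and one-hot encoding to continuous-valued information in the weather dimension to obtain a weather feature vector.
  6. The search intent recognition method according to claim 3, wherein encoding the search scene information separately for each scene dimension to obtain the feature vector corresponding to each scene dimension comprises:
    for a user behavior sequence in the user behavior dimension, selecting all user behaviors in the user behavior sequence when the number of user behaviors in the user behavior sequence is not greater than a specified number, and selecting the specified number of user behaviors from the user behavior sequence in reverse chronological order when the number of user behaviors in the user behavior sequence is greater than the specified number;
    obtaining the search intent of the target corresponding to each selected user behavior; and
    applying feature embedding to the obtained search intents to obtain a user behavior feature vector.
  7. The search intent recognition method according to claim 6, wherein the specified number is determined in advance as follows:
    for each user behavior sequence in search logs that contains an order placement, counting the lengths of the consecutive click sequences in that user behavior sequence, the consecutive clicks being clicks that occur between two order placements and are separated by no more than a preset time threshold; and
    taking the mean length of the consecutive click sequences as the specified number.
  8. The search intent recognition method according to claim 1, wherein the search intent recognition model is trained as follows:
    generating training samples from search logs;
    generating composite features from the training samples; and
    training the search intent recognition model using the composite features.
  9. The search intent recognition method according to claim 8, wherein generating the training samples from the search logs comprises:
    generating first-type positive samples from search logs that contain an order placement;
    generating second-type positive samples from search logs that contain a click, the weight of the first-type positive samples being greater than the weight of the second-type positive samples; and
    generating negative samples from search logs that contain only browsing.
  10. The search intent recognition method according to any one of claims 1-9, wherein the search intent recognition result comprises an intent strength distribution over multiple search intents, and the method further comprises:
    obtaining a designated search intent and its intent rank;
    determining an intent strength value for the designated search intent according to the intent rank and the intent strength distribution; and
    generating, according to the intent strength value of the designated search intent and the intent strength distribution, an intent strength distribution that includes the designated search intent.
  11. The search intent recognition method according to claim 10, wherein obtaining the designated search intent and its intent rank comprises:
    obtaining a designated search intent that matches the search request and is in an active state,
    the active state being determined according to the display time of the designated search intent and/or the number of times the designated search intent has been displayed.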
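  12. A search intent recognition apparatus, comprising:
    a response unit configured to obtain, in response to a search request, search scene information associated with the search request;
    a composite feature generation unit configured to generate, according to the search scene information and the search request, a composite feature for recognizing a search intent; and
    a search intent recognition unit configured to input the composite feature into a search intent recognition model and obtain a search intent recognition result output by the search intent recognition model.
  13. An electronic device, comprising: a processor; and a memory arranged to store computer-executable instructions, wherein the computer-executable instructions, when executed, cause the processor to perform the search intent recognition method according to any one of claims 1-11.
  14. A computer-readable storage medium storing one or more programs, wherein the one or more programs, when executed by a processor, implement the search intent recognition method according to any one of claims 1-11.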
PCT/CN2021/080240 2020-03-20 2021-03-11 Search intent recognition WO2021185147A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010204153.3A CN111310008A (zh) 2020-03-20 2020-03-20 Search intent recognition method, apparatus, electronic device and storage medium
CN202010204153.3 2020-03-20

Publications (1)

Publication Number Publication Date
WO2021185147A1 true WO2021185147A1 (zh) 2021-09-23

Family

ID=71157269

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/080240 WO2021185147A1 (zh) 2020-03-20 2021-03-11 Search intent recognition

Country Status (2)

Country Link
CN (1) CN111310008A (zh)
WO (1) WO2021185147A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116805023A (zh) * 2023-08-25 2023-09-26 量子数科科技有限公司 一种基于大语言模型的外卖推荐方法

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310008A (zh) * 2020-03-20 2020-06-19 北京三快在线科技有限公司 搜索意图识别方法、装置、电子设备和存储介质
CN112330215B (zh) * 2020-11-26 2024-02-02 长沙理工大学 一种城市用车需求量预测方法、设备及存储介质
CN112765424B (zh) * 2021-01-29 2023-10-10 抖音视界有限公司 数据查询方法、装置、设备及计算机可读介质
CN113032694B (zh) * 2021-05-26 2021-11-09 浙江口碑网络技术有限公司 基于场景的查询方法及装置、存储介质、计算机设备
CN113255354B (zh) * 2021-06-03 2021-12-07 北京达佳互联信息技术有限公司 搜索意图识别方法、装置、服务器及存储介质
CN113468405B (zh) * 2021-06-25 2024-03-26 北京达佳互联信息技术有限公司 数据搜索方法、装置、电子设备及存储介质
CN113343692B (zh) * 2021-07-15 2023-09-12 杭州网易云音乐科技有限公司 搜索意图的识别方法、模型训练方法、装置、介质及设备
CN113553851A (zh) * 2021-07-15 2021-10-26 杭州网易云音乐科技有限公司 关键词的确定方法、装置、存储介质和计算设备
CN114218259B (zh) * 2022-02-21 2022-05-24 深圳市云初信息科技有限公司 基于大数据SaaS的多维科创信息搜索方法及系统
CN114385933B (zh) * 2022-03-22 2022-06-07 武汉大学 一种顾及语义的地理信息资源检索意图识别方法
CN115099242B (zh) * 2022-08-29 2022-11-15 江西电信信息产业有限公司 意图识别方法、系统、计算机及可读存储介质
CN116881541A (zh) * 2023-05-05 2023-10-13 厦门亚瑟网络科技有限公司 针对在线搜索活动的ai处理方法及在线服务大数据系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049481A (zh) * 2012-11-29 2013-04-17 百度在线网络技术(北京)有限公司 一种搜索方法和搜索设备
CN104636336A (zh) * 2013-11-06 2015-05-20 百度在线网络技术(北京)有限公司 一种视频搜索的方法和装置
CN106326338A (zh) * 2016-08-03 2017-01-11 北京百度网讯科技有限公司 基于搜索引擎的服务提供方法和装置
US20190354555A1 (en) * 2017-01-04 2019-11-21 International Business Machines Corporation Dynamic faceting for personalized search and discovery
CN111310008A (zh) * 2020-03-20 2020-06-19 北京三快在线科技有限公司 搜索意图识别方法、装置、电子设备和存储介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866474B (zh) * 2014-02-20 2018-10-09 阿里巴巴集团控股有限公司 个性化数据搜索方法及装置
CN105930527B (zh) * 2016-06-01 2019-09-20 北京百度网讯科技有限公司 搜索方法及装置
CN110020128B (zh) * 2017-10-26 2023-04-28 阿里巴巴集团控股有限公司 一种搜索结果排序方法及装置
CN107862027B (zh) * 2017-10-31 2019-03-12 北京小度信息科技有限公司 检索意图识别方法、装置、电子设备及可读存储介质
CN108416649A (zh) * 2018-02-05 2018-08-17 北京三快在线科技有限公司 搜索结果排序方法、装置、电子设备及存储介质
CN110309431A (zh) * 2018-03-09 2019-10-08 北京搜狗科技发展有限公司 一种数据处理方法、装置和电子设备
CN109063200B (zh) * 2018-09-11 2022-10-14 优视科技(中国)有限公司 资源搜索方法及其装置、电子设备、计算机可读介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116805023A (zh) * 2023-08-25 2023-09-26 量子数科科技有限公司 一种基于大语言模型的外卖推荐方法
CN116805023B (zh) * 2023-08-25 2023-11-03 量子数科科技有限公司 一种基于大语言模型的外卖推荐方法

Also Published As

Publication number Publication date
CN111310008A (zh) 2020-06-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21770518

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21770518

Country of ref document: EP

Kind code of ref document: A1