CN111310008A - Search intention recognition method and device, electronic equipment and storage medium - Google Patents

Search intention recognition method and device, electronic equipment and storage medium

Info

Publication number
CN111310008A
CN111310008A
Authority
CN
China
Prior art keywords
search
intention
search intention
scene
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010204153.3A
Other languages
Chinese (zh)
Inventor
刘铭
许鑫
汪祖海
王可
吕梅
于志安
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sankuai Online Technology Co Ltd filed Critical Beijing Sankuai Online Technology Co Ltd
Priority to CN202010204153.3A
Publication of CN111310008A
Priority to PCT/CN2021/080240 (WO2021185147A1)
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The application discloses a search intention recognition method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: in response to a search request, acquiring search scene information associated with the search request; generating a composite feature for identifying a search intention according to the search scene information and the search request; and inputting the composite feature into a search intention recognition model and acquiring a search intention recognition result output by the model. The advantage of the technical scheme is that it attends not only to the search request itself but also to weather, location, user behavior, and other search scene information associated with the request, and predicts the user's real needs by referring to these multiple factors with a search intention recognition model built on composite modeling. This alleviates the problem that the search intention cannot be accurately identified from the search request alone, and the scheme is particularly suitable for life-service and LBS search scenarios.

Description

Search intention recognition method and device, electronic equipment and storage medium
Technical Field
The application relates to the field of search engines, in particular to a search intention identification method and device, electronic equipment and a storage medium.
Background
Accurately predicting a user's search intention is a vital capability of a search engine. The search intention generally refers to the real need of the user underlying the search behavior. For example, a user searching for "badminton" may want to buy badminton equipment, to find a badminton court, or to learn the rules of badminton. In this example, "buying equipment", "finding a venue", and "learning rules" are three different types of search intention associated with the search keyword "badminton".
For identifying the search intention, the following schemes are common in the prior art: 1) determining the search intention by matching the search keyword against the text of rules formulated by business experts; 2) predicting the search intention based on text classification or clustering; 3) mapping the keyword into a high-dimensional semantic vector space by means of a topic model or the like, so as to express the search intention.
All of the above schemes share the problem that they attend only to the text and ignore other factors. The prior art therefore cannot meet business requirements and leaves considerable room for improvement.
Disclosure of Invention
In view of the above, the present application is made to provide a search intention identifying method, apparatus, electronic device, and storage medium that overcome or at least partially solve the above problems.
According to a first aspect of the present application, there is provided a search intention identification method including: responding to a search request, and acquiring search scene information associated with the search request; generating a composite feature for identifying a search intention according to the search scene information and the search request; and inputting the composite features into a search intention recognition model, and acquiring a search intention recognition result output by the search intention recognition model.
Optionally, the generating a composite feature for identifying a search intention according to the search scene information and the search request includes: encoding the search scene information into a scene feature vector, and obtaining a search request feature vector corresponding to the search request by encoding the search request; and fusing the scene feature vector and the search request feature vector, and taking the obtained fused feature vector as the composite feature, wherein the dimension proportion of the search request feature vector in the fused feature vector is not less than a preset ratio.
Optionally, the encoding the search scene information into a scene feature vector includes: respectively encoding the scene information into feature vectors corresponding to the scene dimensions according to the scene dimensions; the scene dimensions include at least one of: location dimension, weather dimension, user behavior dimension, time dimension.
Optionally, the encoding the scene information into feature vectors corresponding to the scene dimensions according to the scene dimensions includes: and performing GeoHash processing on the longitude and latitude information under the position dimension, and performing one-hot coding on a processing result to obtain a longitude and latitude characteristic vector.
Optionally, the encoding the scene information into feature vectors corresponding to the scene dimensions according to the scene dimensions includes: performing bucket discretization on the continuous value information under the weather dimension, and performing one-hot encoding on the processing result to obtain a weather feature vector.
Optionally, the encoding the scene information into feature vectors corresponding to the scene dimensions according to the scene dimensions includes: for a user behavior sequence under a user behavior dimension, selecting all user behaviors in the user behavior sequence under the condition that the number of the user behaviors in the user behavior sequence is not more than the specified number; under the condition that the number of the user behaviors in the user behavior sequence is larger than the specified number, the specified number of the user behaviors in the user behavior sequence is selected in a reverse order mode; acquiring a search intention of a target corresponding to each selected user behavior; and performing feature embedding processing on the acquired search intention to obtain a user behavior feature vector.
Optionally, the specified number is predetermined by: counting the length of a continuous click behavior sequence in each user behavior sequence containing the order placing behavior in the search log; the continuous clicking behavior is clicking behavior which occurs between two ordering behaviors and has an occurrence interval not greater than a preset time threshold; and taking the length average value of each continuous click behavior sequence as the specified number.
Optionally, the search intention recognition model is trained by: generating a training sample according to the search log; generating a composite feature according to the training sample; and training the search intention recognition model by using the composite features generated according to the training samples.
Optionally, the generating a training sample according to the search log includes: generating a first type positive sample according to a search log containing a click behavior; generating a second type of positive sample according to the search log containing the ordering behavior; the weight of the first type of positive samples is smaller than that of the second type of positive samples; negative examples are generated from search logs containing only browsing behavior.
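The weighted sample-generation scheme above can be sketched as follows. The log tuple layout and the concrete weights (2.0 for order-derived positives, 1.0 otherwise) are illustrative assumptions; the only constraint stated in the scheme is that click-derived positives weigh less than order-derived ones.

```python
def make_samples(log_entries):
    """log_entries: list of (query, intent, behavior) tuples,
    with behavior in {'click', 'order', 'browse'}.
    Returns (query, intent, label, weight) training samples."""
    samples = []
    for query, intent, behavior in log_entries:
        if behavior == "order":
            samples.append((query, intent, 1, 2.0))  # second-type positive: higher weight
        elif behavior == "click":
            samples.append((query, intent, 1, 1.0))  # first-type positive: lower weight
        else:                                        # browsing only -> negative sample
            samples.append((query, intent, 0, 1.0))
    return samples

logs = [("badminton", "buy", "order"), ("badminton", "venue", "click"),
        ("badminton", "rules", "browse")]
print(make_samples(logs))
```

In practice the per-sample weight would be passed to the model's loss function (e.g. as `sample_weight`), so that order-derived positives pull the model harder than click-derived ones.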
Optionally, the search intention recognition result is an intention intensity distribution of a plurality of search intents, the method further comprising: acquiring a designated search intention and an intention order thereof; determining an intention strength value of the designated search intention according to the intention rank and the intention strength distribution; and generating an intention intensity distribution containing the specified search intention according to the intention intensity value of the specified search intention and the intention intensity distribution.
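A minimal sketch of inserting a designated search intention into an intention strength distribution at a given rank. The text above does not specify how the strength value is derived from the rank, so the midpoint-of-neighbors rule used here is an assumption, as are the intent names.

```python
def inject_intention(distribution, intent, rank):
    """distribution: dict mapping intent -> strength. Place `intent` at 1-based
    `rank` by assigning it a strength between the strengths currently found at
    rank-1 and rank (an assumed rule; the patent leaves the derivation open)."""
    ranked = sorted(distribution.items(), key=lambda kv: kv[1], reverse=True)
    strengths = [s for _, s in ranked]
    upper = strengths[rank - 2] if rank >= 2 else strengths[0] + 0.1
    lower = strengths[rank - 1] if rank - 1 < len(strengths) else 0.0
    new_dist = dict(distribution)
    new_dist[intent] = (upper + lower) / 2
    return new_dist

dist = {"takeout": 0.6, "dine_in": 0.3, "recipe": 0.1}
print(inject_intention(dist, "promotion", 2))  # promotion lands between 0.6 and 0.3
```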
Optionally, the obtaining the specified search intention and the intention rank thereof includes: acquiring a specified search intention which is matched with the search request and is in a valid state; the effective state is determined according to the display time of the specified search intention and/or the displayed times of the specified search intention.
According to a second aspect of the present application, there is provided a search intention recognition apparatus including: a response unit, configured to acquire, in response to a search request, search scene information associated with the search request; a composite feature generating unit, configured to generate a composite feature for identifying a search intention according to the search scene information and the search request; and a search intention identification unit, configured to input the composite feature into a search intention recognition model and acquire the search intention recognition result output by the search intention recognition model.
Optionally, the composite feature generating unit is configured to encode the search scene information into a scene feature vector, and obtain a search request feature vector corresponding to the search request according to the search request encoding; and fusing the scene feature vector and the search request feature vector, and taking the obtained fused feature vector as the composite feature, wherein the dimension proportion of the search request feature vector in the fused feature vector is not less than a preset ratio.
Optionally, the composite feature generating unit is configured to encode the scene information into feature vectors corresponding to scene dimensions according to the scene dimensions; the scene dimensions include at least one of: location dimension, weather dimension, user behavior dimension, time dimension.
Optionally, the composite feature generating unit is configured to perform GeoHash processing on the longitude and latitude information under the position dimension, and perform one-hot encoding on the processing result to obtain a longitude and latitude feature vector.
Optionally, the composite feature generating unit is configured to perform bucket discretization on the continuous value information under the weather dimension, and perform one-hot encoding on the processing result to obtain a weather feature vector.
Optionally, the composite feature generating unit is configured to select, for a user behavior sequence in a user behavior dimension, all user behaviors in the user behavior sequence when the number of the user behaviors in the user behavior sequence is not greater than a specified number; under the condition that the number of the user behaviors in the user behavior sequence is larger than the specified number, the specified number of the user behaviors in the user behavior sequence is selected in a reverse order mode; acquiring a search intention of a target corresponding to each selected user behavior; and performing feature embedding processing on the acquired search intention to obtain a user behavior feature vector.
Optionally, the apparatus further comprises: the preprocessing unit is used for counting the length of a continuous click behavior sequence in each user behavior sequence containing the ordering behavior in the search log; the continuous clicking behavior is clicking behavior which occurs between two ordering behaviors and has an occurrence interval not greater than a preset time threshold; and taking the length average value of each continuous click behavior sequence as the specified number.
Optionally, the apparatus further comprises: the preprocessing unit is used for generating training samples according to the search logs; generating a composite feature according to the training sample;
and the training unit is used for training the search intention recognition model by using the composite features generated according to the training samples.
Optionally, the preprocessing unit is configured to generate a first type of positive sample according to a search log containing a click behavior; generating a second type of positive sample according to the search log containing the ordering behavior; the weight of the first type of positive samples is smaller than that of the second type of positive samples; negative examples are generated from search logs containing only browsing behavior.
Optionally, the search intention identification result is an intention intensity distribution of a plurality of search intents; the device further comprises: an intention adjusting unit for acquiring a specified search intention and an intention rank thereof; determining an intention strength value of the designated search intention according to the intention rank and the intention strength distribution; and generating an intention intensity distribution containing the specified search intention according to the intention intensity value of the specified search intention and the intention intensity distribution.
Optionally, the intention adjusting unit is configured to obtain a specified search intention that matches the search request and is in a valid state; the effective state is determined according to the display time of the specified search intention and/or the displayed times of the specified search intention.
In accordance with a third aspect of the present application, there is provided an electronic device comprising: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform a search intent recognition method as in any above.
According to a fourth aspect of the present application, there is provided a computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the search intention identification method as any one of the above.
According to the technical scheme, search scene information is acquired in response to a search request, a composite feature of the current search scene is generated according to the search scene information and the search request, the composite feature is input into a search intention recognition model, and the search intention recognition result output by the model is acquired. The technical scheme attends not only to the search request but also to weather, location, user behavior, and other search scene information, and predicts the user's real needs by referring to these multiple factors with a search intention recognition model based on composite modeling. This alleviates the problem that the search intention cannot be accurately identified from the search request alone, and the scheme is particularly suitable for life-service and Location Based Service (LBS) search scenarios.
The foregoing description is only an overview of the technical solutions of the present application. To make the technical means of the present application more clearly understood so that it can be implemented according to the content of this description, and to make the above and other objects, features, and advantages of the present application more readily apparent, the detailed description of the present application is given below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the application. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a schematic flow diagram of a search intent recognition method according to one embodiment of the present application;
FIG. 2 illustrates a flow diagram of a method of training a search intent recognition model according to one embodiment of the present application;
FIG. 3 illustrates a structural diagram of a search intent recognition model, according to one embodiment of the present application;
FIG. 4 shows a schematic flow diagram of a search intent recognition method according to one embodiment of the present application;
FIG. 5 shows a schematic structural diagram of a search intention recognition apparatus according to one embodiment of the present application;
FIG. 6 shows a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 7 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
In addition to the problems introduced in the background art, scheme 1) also requires manual labeling and rule making, generalizes poorly, and cannot cope with the iterative evolution of business scenarios; scheme 3) is difficult to adapt to scenarios with high requirements on accuracy and consistency. The present application draws on the technical idea of scheme 2) and proposes a scheme that brings search scene information such as user behavior, weather, and location into consideration and performs composite modeling in combination with the search request, so as to identify the search intention more accurately.
Fig. 1 shows a schematic flow diagram of a search intention identification method according to an embodiment of the present application. As shown in fig. 1, the search intention identifying method includes:
step S110, in response to the search request, obtaining search scenario information associated with the search request.
The embodiments of the present application can be applied to various scenarios using search engine technology, including but not limited to general-purpose search engines such as Baidu and Google (the trade names are merely exemplary), specialized search engines in fields such as patents and trademarks, and in-app search engines.
The user may issue the search request in various forms such as text, image, or voice; for example, the text form may be a search keyword or a search sentence.
Step S120, generating a composite feature for identifying the search intention according to the search scene information and the search request.
Whereas the search request is a direct expression of the user's search intention, the search scene information can be regarded as an indirect expression of that intention, and can supplement potential search intentions not reflected by the search request. Specifically, the search scene information may cover multiple scene dimensions, such as a time dimension, a location dimension, a weather dimension, and so forth.
For example, a user searching for "Kung Pao chicken" may want to learn how to cook Kung Pao chicken, may want to order Kung Pao chicken as take-out, or may wish to dine at a restaurant that serves it. When searching, however, the user does not necessarily express the search intention clearly in the search request, which forces the user to sift through the results or perform a secondary search, degrading the user experience.
Starting from the search scene information, however, this problem can be mitigated. For example, if the user searches for "Kung Pao chicken" while inside a shopping mall, the user more likely wishes to dine at a restaurant serving Kung Pao chicken than to look up a recipe or order take-out; this reflects the effect of the environment. If the user skips over several dine-in restaurants serving Kung Pao chicken, clicks into the pages of several take-out restaurants, and places an order at one of them, it can be determined that the user wishes to order take-out rather than anything else; this reflects the role of user behavior.
Step S130, inputting the composite feature into the search intention recognition model, and acquiring the search intention recognition result output by the search intention recognition model. The search intention recognition model here is implemented based on composite modeling of, and pre-training on, the search request and the search scene information.
For example, the search intentions may include take-out, dine-in food, recipes, reviews, offers, and the like. They reflect the user's needs, and their names and classification may be specified by the business side, domain experts, and so on. In other words, a search intention can be understood as a generalized user requirement.
Specifically, in a business scenario, a search intention may correspond to a category of goods or services, and such categories may be defined according to business requirements; for example, the take-out and food categories mentioned above are categories of the manner in which a service is provided.
A search result may correspond to one or more search intentions. For example, if a restaurant offers both dine-in and take-out services, the search intentions corresponding to that restaurant include both take-out and dine-in; if another restaurant offers only take-out services, the search intentions corresponding to it include only take-out. Conversely, a search intention can also correspond to one or more search results, and typically many, such as the many restaurants that provide take-out services. The better the identified search intention matches the user's real needs, the more easily the search results presented to the user achieve the user's search goal.
It can be seen that the search intention identification method shown in fig. 1 attends not only to the search request but also to weather, location, user behavior, and other search scene information, and predicts the user's real needs with a search intention recognition model that is based on composite modeling and refers to these multiple factors, thereby mitigating the problem that the search intention cannot be accurately identified from the search request alone; it is particularly suitable for life-service and LBS search scenarios.
In one embodiment of the present application, in the above search intention identification method, generating the composite feature for identifying the search intention according to the search scene information and the search request includes: encoding the search scene information into a scene feature vector, and obtaining a search request feature vector corresponding to the search request by encoding the search request; and fusing the scene feature vector and the search request feature vector, and taking the obtained fused feature vector as the composite feature, wherein the dimension proportion of the search request feature vector in the fused feature vector is not less than a preset ratio.
A feature vector is a mathematical representation of information such as text or an image, and is generally a high-dimensional vector. The encoding operation can be implemented by any one or more feature engineering techniques in the prior art, as long as vectorized data is obtained. In one embodiment, the search request feature vector and the scene feature vector are continuous vectors obtained by an Embedding operation. The search request feature vector may be generated by encoding search request content in text form using NLP (Natural Language Processing) techniques, by encoding search request content in image form using image processing techniques, and so on.
As mentioned above, the search request is the information that most directly reflects the user's search intention, so the search request feature vector is relatively important, and its dimension proportion in the fused feature vector cannot be too low. The specific fusion operation may be a concatenation (Concat) operation.
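The Concat fusion with a minimum dimension share for the search request vector might look as follows; the vector contents and the 0.5 ratio are illustrative assumptions, not values from the patent.

```python
def fuse(request_vec, scene_vec, min_request_ratio=0.5):
    """Concatenate the search-request and scene feature vectors, checking that
    the request vector's share of the fused dimensions stays above the preset ratio."""
    fused = list(request_vec) + list(scene_vec)
    ratio = len(request_vec) / len(fused)
    assert ratio >= min_request_ratio, "search request vector share too low"
    return fused

request_vec = [0.12, -0.40, 0.88, 0.05]  # e.g. an embedding of the query text
scene_vec = [1, 0, 0]                    # e.g. a one-hot scene feature
print(len(fuse(request_vec, scene_vec)))  # 7
```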
In an embodiment of the application, in the above search intention identification method, encoding the search scene information into a scene feature vector includes: respectively encoding scene information into feature vectors corresponding to scene dimensions according to the scene dimensions; the scene dimensions include at least one of: location dimension, weather dimension, user behavior dimension, time dimension.
The scene information may specifically include, under the location dimension, longitude and latitude information, city information, entity information (points of interest, POIs, such as a shopping mall or a residential area), and the like; under the weather dimension, wind force information, temperature information, and the like; under the user behavior dimension, click information, order-placing information, browsing information, and the like; and under the time dimension, season information, holiday information, and the like.
Each scene dimension can generate a corresponding feature vector. Such a feature vector can serve as the scene feature vector on its own, or all or some of the feature vectors can be combined via a Concat operation into a fused feature vector that serves as the scene feature vector.
In an embodiment of the present application, in the method for identifying search intention, encoding the scene information into the feature vectors corresponding to the scene dimensions respectively according to the scene dimensions includes: and performing GeoHash processing on the longitude and latitude information under the position dimension, and performing one-hot coding on a processing result to obtain a longitude and latitude characteristic vector.
GeoHash processing is essentially a form of spatial indexing. It can be understood as treating the earth's surface as a two-dimensional plane and recursively decomposing the plane into smaller sub-blocks, where every point within a given latitude and longitude range shares the same code. Building a spatial index in the GeoHash manner improves the efficiency of latitude and longitude retrieval. In the present application, GeoHash is used to represent two-dimensional longitude and latitude information in a one-dimensional form, which facilitates the training and application of the search intention model. One-hot encoding can be understood as encoding N states with an N-bit state register, where each state has its own register bit and only one bit is valid at a time. Discrete features can be vectorized by one-hot encoding.
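For illustration, a from-scratch GeoHash encoder followed by a one-hot step; a production system would more likely use an existing geohash library, and the cell vocabulary passed to `one_hot` is an assumption.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet

def geohash_encode(lat, lon, precision=6):
    """Encode a latitude/longitude pair into a GeoHash string by recursively
    bisecting the lon/lat intervals and interleaving the resulting bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, chars = [], []
    even = True  # GeoHash interleaves bits starting with longitude
    while len(chars) < precision:
        if even:  # refine longitude
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits.append(1); lon_lo = mid
            else:
                bits.append(0); lon_hi = mid
        else:     # refine latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits.append(1); lat_lo = mid
            else:
                bits.append(0); lat_hi = mid
        even = not even
        if len(bits) == 5:  # every 5 bits form one base32 character
            chars.append(BASE32[int("".join(map(str, bits)), 2)])
            bits = []
    return "".join(chars)

def one_hot(code, vocabulary):
    """One-hot encode a GeoHash cell code against a known vocabulary of cells."""
    return [1 if code == v else 0 for v in vocabulary]

# Nearby points share a code prefix, which is what makes GeoHash a useful cell id.
print(geohash_encode(39.9288, 116.3884))  # a cell in central Beijing
```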
In an embodiment of the present application, in the method for identifying search intention, encoding the scene information into the feature vectors corresponding to the scene dimensions respectively according to the scene dimensions includes: performing bucket discretization on the continuous value information under the weather dimension, and performing one-hot encoding on the processing result to obtain a weather feature vector. Bucket discretization is aimed mainly at continuous values such as wind force and temperature, so that the resulting weather feature vector is high-dimensional and sparse, which facilitates the training and use of the search intention recognition model.
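A sketch of bucket discretization plus one-hot encoding for a continuous weather value; the temperature boundaries below are illustrative assumptions, not values from the patent.

```python
import bisect

TEMP_BOUNDARIES = [-10, 0, 10, 20, 30]  # Celsius; defines len(boundaries)+1 = 6 buckets

def bucket_one_hot(value, boundaries):
    """Map a continuous value to a bucket index, then one-hot encode that index."""
    idx = bisect.bisect_right(boundaries, value)  # bucket index in [0, len(boundaries)]
    vec = [0] * (len(boundaries) + 1)
    vec[idx] = 1
    return vec

print(bucket_one_hot(25.5, TEMP_BOUNDARIES))  # falls in the (20, 30] bucket
```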
In an embodiment of the present application, in the method for identifying search intention, encoding the scene information into the feature vectors corresponding to the scene dimensions respectively according to the scene dimensions includes: for a user behavior sequence under a user behavior dimension, selecting all user behaviors in the user behavior sequence under the condition that the number of the user behaviors in the user behavior sequence is not more than the specified number; under the condition that the number of the user behaviors in the user behavior sequence is larger than the specified number, the specified number of the user behaviors in the user behavior sequence is selected in a reverse order mode; acquiring a search intention of a target corresponding to each selected user behavior; and performing feature embedding processing on the acquired search intention to obtain a user behavior feature vector.
Specifically, the log may record the occurrence time of each user behavior, and these behaviors form a user behavior sequence. If the user behavior information includes multiple user behaviors, a certain relevance among them must be ensured before they are used as search scene information. The embodiment of the present application therefore selects user behaviors in reverse time order, which avoids including too many user behaviors or behaviors with no relevance.
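The reverse-order selection described above can be sketched as follows; `behaviors` is assumed to be ordered from oldest to newest, and `n` is the specified number determined from the search logs.

```python
def select_behaviors(behaviors, n):
    """Keep all behaviors if there are at most n of them;
    otherwise keep only the n most recent (reverse time order)."""
    if len(behaviors) <= n:
        return list(behaviors)
    return behaviors[-n:]  # the n most recent behaviors

clicks = ["poi_1", "poi_2", "poi_3", "poi_4", "poi_5"]  # hypothetical clicked POIs
print(select_behaviors(clicks, 3))  # ['poi_3', 'poi_4', 'poi_5']
```

Each selected behavior's target is then mapped to its business-provided search intention and embedded to form the user behavior feature vector.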
A user behavior usually corresponds to a specific search result, and search results are tied to the business; the business side can provide the search intention of each search result in advance, so this content does not need to be generated separately in an actual scenario, because the business side has generally already classified search intentions and associated search results with them for its own business needs.
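The selection and intention lookup described above can be sketched as follows; the result-to-intention mapping is a hypothetical stand-in for the associations supplied in advance by the business side:

```python
def select_recent_behaviors(behavior_seq, n):
    """Select at most n behaviors from a time-ordered sequence (oldest first),
    taking the most recent ones in reverse order when the sequence is longer than n."""
    if len(behavior_seq) <= n:
        return list(behavior_seq)
    return list(behavior_seq[-n:])

# Hypothetical mapping, assumed to come from the business side, from the search
# result targeted by a behavior to its pre-assigned search intention.
RESULT_INTENT = {"shop_1": "takeout", "shop_2": "dine_in", "shop_3": "takeout"}

def behavior_intents(behavior_seq, n):
    """Search intentions of the selected behaviors, ready for feature embedding."""
    return [RESULT_INTENT[b] for b in select_recent_behaviors(behavior_seq, n)]
```

The returned intention list would then be fed to the feature embedding step to produce the user behavior feature vector.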
Word embedding is a text processing technique in Natural Language Processing (NLP) and can be used to perform the feature embedding in the embodiments of the present application. Of course, the specific feature embedding method is not limited to this example; for instance, a Transformer (a type of NLP model proposed by Google), BERT (Bidirectional Encoder Representations from Transformers), or a GPT (Generative Pre-Training) model may also be used to perform the feature embedding.
In one embodiment of the present application, in the above search intention identification method, the specified number is predetermined as follows: counting the lengths of consecutive-click-behavior sequences in those user behavior sequences in the search log that contain an ordering behavior, where a consecutive click behavior is a click behavior occurring between two ordering behaviors with an interval not exceeding a preset time threshold; and taking the average length of the consecutive-click-behavior sequences as the specified number.
For example, for a user behavior sequence, the search intentions corresponding to at most the N click behaviors preceding the current behavior are used, where N may be computed as follows: from each ordering behavior, look back 30 seconds; if a click behavior falls within that window, continue looking back another 30 seconds from it, and so on, until the window is exceeded or another ordering behavior is encountered. This yields a consecutive-click-behavior sequence. The lengths of such sequences over a longer time interval are counted and averaged to obtain N. User behavior modeling then predicts the current search intention from the preferences reflected by at most the N clicks preceding the current search request.
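The walk-back computation of N can be sketched as follows; the event representation as (timestamp, kind) pairs is an illustrative assumption:

```python
def consecutive_click_lengths(events, gap=30.0):
    """Lengths of consecutive-click sequences preceding each ordering behavior.

    events: list of (timestamp_seconds, kind) pairs, kind "click" or "order",
    sorted by time. Walking back from each order, a click joins the sequence
    while the gap to the next-later event is at most `gap` seconds and no
    other order intervenes.
    """
    lengths = []
    for i, (t, kind) in enumerate(events):
        if kind != "order":
            continue
        length, prev_t = 0, t
        for j in range(i - 1, -1, -1):
            tj, kj = events[j]
            if kj == "order" or prev_t - tj > gap:
                break
            length += 1
            prev_t = tj
        lengths.append(length)
    return lengths

def specified_number(events, gap=30.0):
    """Average consecutive-click-sequence length, i.e. the specified number N."""
    lengths = consecutive_click_lengths(events, gap)
    return sum(lengths) / len(lengths) if lengths else 0
```

In a real system the average would be taken over many users' logs within the longer time interval, not a single sequence.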
In an embodiment of the present application, in the above search intention identifying method, the search intention identifying model is trained as follows: generating a training sample according to the search log; generating a composite feature according to the training sample; and training the search intention recognition model by using the composite features generated according to the training samples.
The search log records the specific content of the search request, such as query text or a query image, as well as the search scene information. Training may be divided into several stages; after each stage the resulting search intention recognition model is verified, and if verification passes, the model is put into use. If verification fails, the parameters of the model can be adjusted, i.e., the model is optimized, and the way training samples are generated or feature vectors are fused may also be reconsidered. Training is then repeated with the adjusted data and pipeline until the search intention recognition model passes verification.
For example, in a preferred embodiment, the search intention recognition model may be pre-trained, and the search request feature vector may then be fine-tuned according to feedback from the pre-training.
In an embodiment of the present application, in the above search intention identification method, generating training samples from the search log includes: generating a first type of positive sample from a search log containing a click behavior; generating a second type of positive sample from a search log containing an ordering behavior, the weight of the first type of positive sample being less than the weight of the second type of positive sample; and generating negative samples from search logs containing only browsing behavior.
In particular, the search log may record information from the time a user initiates a search request until the user places an order, starts a new search, or leaves the search engine. For example, a user searches for "Kung Pao chicken", and the search engine presents multiple search results on a page; some are only browsed, and some are clicked by the user. The user may also eventually select some search results and place an order.
Among browsing, clicking, and ordering behaviors, the ordering behavior reflects the user's true positive search intention, i.e., "this is what I need"; a click behavior can also reflect a positive search intention, but may be caused by an accidental touch; browsing alone reflects a negative search intention, i.e., "this is not what I need".
Therefore, the search log with click behavior can be used as the first type of positive sample and the search log with ordering behavior as the second type of positive sample, distinguished by weight; specifically, the ratio of the weight of the first type of positive sample to that of the second type may be 1:10. Negative samples may correspond to the search results a user browsed past before clicking (known in the art as "skip above"), while search results presented after the clicked one are not processed.
Of course, the specific sample generation method may not be limited to the above example, and may be changed as needed.
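As one such variant, the labeling rules above can be sketched as follows; the flag names and the 1:10 click-to-order weight ratio are illustrative assumptions:

```python
# Weighted labeling of search logs as training samples. Order logs are weighted
# ten times heavier than click-only logs, per the assumed 1:10 ratio.
CLICK_WEIGHT, ORDER_WEIGHT = 1.0, 10.0

def label_log(log):
    """log: dict with boolean flags 'clicked' and 'ordered' for one search log."""
    if log.get("ordered"):
        return {"label": 1, "weight": ORDER_WEIGHT}   # second-type positive sample
    if log.get("clicked"):
        return {"label": 1, "weight": CLICK_WEIGHT}   # first-type positive sample
    return {"label": 0, "weight": 1.0}                # browse-only negative sample
```

The weight would typically be passed to the training loss so that ordering behavior dominates what counts as a positive example.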
FIG. 2 shows a flow diagram of a method of training a search intention recognition model according to one embodiment of the present application. Referring to FIG. 2, when a user inputs a search keyword and initiates a search request, the search engine returns search results and records a search log. The search log is stored after processing such as cleaning. Positive and negative training samples and their weights can be generated from the browsing, clicking, and ordering behaviors recorded in the search log, and labeled against the search intention categories given by the business side. After feature processing of the training samples, the search request feature vector, longitude and latitude feature vector, weather feature vector, user behavior feature vector, and any other extension feature vectors generated as needed are obtained; a fused feature vector is generated from these feature vectors and input into the search intention recognition model for training. If verification passes, a usable search intention recognition model is obtained; if not, processing such as parameter optimization is performed and training is repeated until the model passes verification.
In addition, when a new search intention appears (which does not necessarily mean the user has new needs; more likely, the business side has defined a new intention), the search intention recognition model can be updated iteratively after a sufficient number of search logs have been collected.
In terms of feature processing, reference may be made to FIG. 3, a schematic structural diagram of a search intention recognition model according to an embodiment of the present application. The search keyword is processed by an encoding layer to obtain a search request feature vector, which enters a network layer; the longitude and latitude information is processed by GeoHash and then enters the encoding layer to obtain a longitude and latitude feature vector; the weather information undergoes bucket discretization and then enters the encoding layer to obtain a weather feature vector; the user behavior sequence is processed by the encoding layer to obtain a user behavior feature vector, which enters a network layer; the longitude and latitude feature vector and the weather feature vector are combined by a Concat operation into an environment feature vector, which enters a network layer; and the outputs of the network layers are combined by a Concat operation into a fused feature vector, which enters the backbone network layer, after which the search intention recognition result is output and the loss is calculated.
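The Concat fusion in FIG. 3 amounts to simple vector concatenation; a minimal sketch, using plain lists where a real system would use tensors, and with all branch outputs as hypothetical values:

```python
def concat(*vectors):
    """Concatenate branch outputs into one fused vector (the Concat operation)."""
    fused = []
    for v in vectors:
        fused.extend(v)
    return fused

query_vec = [0.2, 0.7]             # search request branch output (hypothetical)
behavior_vec = [0.1, 0.4]          # user behavior branch output (hypothetical)
env_vec = concat([0.3], [0.9])     # longitude-latitude + weather branch outputs
fused_vec = concat(query_vec, behavior_vec, env_vec)  # input to the backbone
```

Concatenation preserves each branch's dimensions, so the backbone sees every scene dimension side by side.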
In one embodiment of the present application, in the above search intention identification method, the search intention recognition result is an intention intensity distribution over a plurality of search intentions, and the method further includes: acquiring a designated search intention and its intention rank; determining an intention intensity value for the designated search intention according to the intention rank and the intention intensity distribution; and generating an intention intensity distribution containing the designated search intention from that intention intensity value and the original intention intensity distribution.
Although modeling from the search log and deriving the search intention in this way can meet the needs of the user side, it has certain shortcomings for the business side. The reason is that modeling based only on user behavior easily produces a Matthew effect, in which the strong get stronger and the weak get weaker, so that some search intentions are easily ignored and new search intentions are difficult to surface.
In addition, during a cold start (the initial period after the application goes live), search intention recognition may not achieve a good service effect due to the absence of user behavior information. Therefore, the present application designs an integration scheme for incorporating other search intentions, such as search intentions recommended by the business side, so that the business side also participates in the search intention identification process.
For example, according to the search keyword input by the user, the search engine identifies four search intentions A, B, C, and D, with decreasing intention intensities of 0.4, 0.3, 0.2, and 0.1 respectively, forming an intention intensity distribution over the four search intentions; when displayed, the search results corresponding to search intention A are shown first.
However, suppose the business side wants search intention E to be shown in third place, i.e., in the order A, B, E, C, D. An intention intensity value for E can then be generated from the current intention intensity distribution, for example as the arithmetic mean of the intensity values of B and C, i.e., 0.25. Since adding E makes the sum of the intensity values exceed 1, normalization can be performed using a softmax function or the like.
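The neighbor-mean insertion and softmax renormalization described above can be sketched as follows; the function name and the interior-rank assumption are illustrative:

```python
import math

def insert_specified_intent(dist, intent, rank):
    """Insert `intent` at 1-based `rank` (assumed interior: 2 <= rank <= len(dist)),
    scoring it as the arithmetic mean of its would-be neighbors, then
    renormalizing all intensities with softmax."""
    ordered = sorted(dist.values(), reverse=True)
    score = (ordered[rank - 2] + ordered[rank - 1]) / 2   # mean of neighbors
    merged = dict(dist)
    merged[intent] = score
    z = sum(math.exp(v) for v in merged.values())
    return {k: math.exp(v) / z for k, v in merged.items()}

dist = {"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1}
new_dist = insert_specified_intent(dist, "E", rank=3)  # E scored (0.3 + 0.2) / 2
```

Softmax is monotonic, so the renormalized distribution sums to 1 while keeping the desired ranking A, B, E, C, D.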
As a specific example, each search intention may correspond to different search results, and the user may switch between search intentions on the search result page (e.g., each search intention shows its corresponding search results in its own tab). Suppose "takeout" is an existing search intention, and the business side introduces a new search intention, "premium takeout". A search result may correspond to both "takeout" and "premium takeout", with the result displayed at a higher priority under "premium takeout". For a user who likes that search result, "premium takeout" is clearly the better search intention. However, because it is newly created, if search intentions were displayed only according to the intention intensity distribution output by the search intention recognition model, "premium takeout" would hardly ever be shown, meeting the needs of neither the user nor the business side. If the intention intensity distribution is adjusted in the manner above, "premium takeout" gains a higher display priority, and the search intention recognition model can then be further adjusted according to the resulting search logs.
In one embodiment of the present application, in the above search intention identification method, acquiring the designated search intention and its intention rank includes: acquiring a designated search intention that matches the search request and is in an effective state, where the effective state is determined according to the display duration of the designated search intention and/or the number of times it has been displayed.
This allows the designated search intention to be applied in cold start scenarios and guarantees its rank for a period of time or a number of displays, thereby ensuring the display of the corresponding search results and cultivating user awareness. By the time the designated search intention expires, the search intention model has accumulated enough search logs for search intention identification. This alleviates the Matthew effect that often arises in user behavior modeling scenarios, meeting the needs of the business side while also meeting those of the user.
FIG. 4 shows a flowchart of a search intention recognition method according to one embodiment of the present application. As shown in FIG. 4, when a user inputs a search keyword and initiates a search request, a search request feature vector, longitude and latitude feature vector, weather feature vector, user behavior feature vector, and any other extension feature vectors generated as needed are produced; these feature vectors are fused and input into the search intention recognition model to obtain an intention intensity distribution over a plurality of search intentions. If the business side has no available designated search intention, search results are selected for presentation according to that intention intensity distribution; if it does, the intention intensity distribution is recalculated according to the designated search intention, and search results are selected for presentation according to the recalculated distribution.
When the business side provides a designated search intention, it preferably does so in a specified data format; for example, the designated search intention may be required to be associated with specific search keywords, to take effect at specific times and in specific scenarios, and to carry a limit on the number of recommended exposures. For instance, once an effective duration is set, the remaining days are decremented by 1 each day until they reach 0; the remaining exposures likewise decrease with the number of displays recorded in the search log each day until they reach 0, updated daily. While both the effective duration and the remaining exposures of a search intention are nonzero, the search intention is guaranteed its corresponding rank in the intention distribution; once either reaches 0, the designated search intention is no longer considered and the search intention is determined entirely by the search intention recognition model.
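The dual-counter validity rule above can be sketched as follows; the class and field names are illustrative, not from this application:

```python
class SpecifiedIntent:
    """A business-designated search intention with its validity counters."""

    def __init__(self, name, days_left, exposures_left):
        self.name = name
        self.days_left = days_left            # remaining effective days
        self.exposures_left = exposures_left  # remaining guaranteed displays

    def is_effective(self):
        # Effective only while BOTH counters are nonzero.
        return self.days_left > 0 and self.exposures_left > 0

    def on_new_day(self):
        self.days_left = max(0, self.days_left - 1)

    def on_displayed(self):
        self.exposures_left = max(0, self.exposures_left - 1)
```

Once `is_effective()` returns False, ranking falls back entirely to the search intention recognition model.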
Fig. 5 is a schematic structural diagram of a search intention recognition apparatus according to an embodiment of the present application, and as shown in fig. 5, the search intention recognition apparatus 500 includes:
a response unit 510, configured to, in response to the search request, obtain search scenario information associated with the search request.
The embodiments of the present application can be applied to various scenarios using search engine technology, including but not limited to general search engines such as Baidu, Google (the trade name is merely exemplary), special search engines in the fields of patent, trademark, etc., and search engines within APP.
The user may generate a search request (query) in various manners such as text, image, voice, etc., for example, the text may be in the form of search keywords or expression of search sentences.
A composite feature generating unit 520, configured to generate a composite feature for identifying a search intention according to the search scene information and the search request.
While the search request is the user's direct expression of their search intention, the search scene information can be regarded as an indirect expression, supplementing latent search intentions not reflected in the search request. Specifically, search scene information may cover multiple scene dimensions, such as a time dimension, a location dimension, a weather dimension, and so on.
For example, a user searching for "Kung Pao chicken" may want to learn how to cook it, may want to order Kung Pao chicken takeout, or may wish to go to a restaurant that serves it. When searching, however, the user does not necessarily express this intention clearly in the search request, forcing the user to refine the search or search again, which degrades the user experience.
Starting from the search scene information, however, this problem can be mitigated. For example, if the user searches for Kung Pao chicken inside a shopping mall, the user more likely wishes to eat at a restaurant serving it than to look up a recipe or order takeout; this reflects the role of the environment. If the user skips several physical restaurants serving Kung Pao chicken, clicks into the pages of several takeout restaurants, and places an order at one of them, it can be determined that the user wants takeout rather than anything else; this reflects the role of user behavior.
The identifying unit 530 is configured to input the composite feature into the search intention recognition model and acquire the search intention recognition result output by the model. The search intention recognition model here is implemented based on composite modeling and pre-training of the search request and the search scene information.
For example, search intentions may include takeout, food, recipes, reviews, offers, and so on; these reflect the user's needs, and their naming and classification may be determined by the business side, domain experts, etc. In other words, a search intention can be understood as a generalized user requirement.
Specifically, in a business scenario, a search intention may correspond to a category of goods or services, and such categories may be defined according to business requirements; for example, the takeout and food intentions mentioned above are categories of service provision manner.
A search result may correspond to one or more search intentions. For example, if a restaurant offers both dine-in and takeout services, the search intentions corresponding to that restaurant include takeout and dine-in, while a restaurant offering only takeout corresponds only to the takeout intention. Conversely, one search intention can obviously correspond to multiple search results, since many restaurants offer takeout. The better the identified search intention matches the user's real needs, the more easily the presented search results achieve the user's search goal.
It can be seen that the search intention recognition apparatus shown in FIG. 5 attends not only to the search request but also to weather, location, user behavior, and other search scene information, and predicts the user's real needs with a search intention recognition model based on composite modeling that weighs multiple factors. This alleviates the problem that the search intention cannot be accurately recognized from the search request alone, and is particularly suitable for life-service and LBS search scenarios.
In an embodiment of the present application, in the search intention identifying apparatus, the composite feature generating unit 520 is configured to encode the search scene information into a scene feature vector and obtain a search request feature vector by encoding the search request; and to fuse the scene feature vector and the search request feature vector, taking the resulting fused feature vector as the composite feature, wherein the dimensional proportion of the search request feature vector in the fused feature vector is not less than a preset ratio.
In an embodiment of the present application, in the search intention identifying apparatus, the composite feature generating unit 520 is configured to encode the scene information into feature vectors corresponding to scene dimensions, respectively, according to the scene dimensions; the scene dimensions include at least one of: location dimension, weather dimension, user behavior dimension, time dimension.
In an embodiment of the present application, in the search intention identification apparatus, the composite feature generation unit 520 is configured to perform GeoHash processing on the longitude and latitude information under the location dimension, and to one-hot encode the result to obtain a longitude and latitude feature vector.
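The GeoHash step can be illustrated with a minimal standard base-32 encoder; this is a sketch of the public GeoHash algorithm, and the precision and the vocabulary used for the subsequent one-hot encoding are deployment choices not specified here:

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision=6):
    """Encode (lat, lon) into a base-32 geohash string by interleaving
    longitude and latitude range-halving bits, five bits per character."""
    lat_rng, lon_rng = [-90.0, 90.0], [-180.0, 180.0]
    bits, bit_count, even, out = 0, 0, True, []
    while len(out) < precision:
        rng, val = (lon_rng, lon) if even else (lat_rng, lat)
        mid = (rng[0] + rng[1]) / 2
        if val >= mid:
            bits = (bits << 1) | 1
            rng[0] = mid
        else:
            bits <<= 1
            rng[1] = mid
        even = not even
        bit_count += 1
        if bit_count == 5:
            out.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(out)
```

Nearby coordinates share a hash prefix, so one-hot encoding the hash cell turns raw coordinates into a discrete location feature.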
In an embodiment of the present application, in the search intention identification apparatus, the composite feature generation unit 520 is configured to perform bucket discretization on continuous-value information under the weather dimension, and to one-hot encode the result to obtain a weather feature vector.
In an embodiment of the present application, in the search intention identifying apparatus, the composite feature generating unit 520 is configured to select, for a user behavior sequence under a user behavior dimension, a specified number of user behaviors in the user behavior sequence in a reverse order; if the number of the user behaviors in the user behavior sequence is smaller than the specified number, all the user behaviors in the user behavior sequence are selected; acquiring a search intention of a target corresponding to each selected user behavior; and performing feature embedding processing on the acquired search intention to obtain a user behavior feature vector.
In one embodiment of the present application, the search intention identifying means further includes: the preprocessing unit is used for counting the length of a continuous click behavior sequence in each user behavior sequence containing the ordering behavior in the search log; the continuous clicking behavior refers to the clicking behavior which occurs between two ordering behaviors and the occurrence interval of which is not more than a preset time threshold; and taking the length average value of each continuous click behavior sequence as a specified number.
In one embodiment of the present application, the search intention identifying means further includes: the preprocessing unit is used for generating training samples according to the search logs; generating a composite feature according to the training sample; and the training unit is used for training the search intention recognition model by using the composite features generated according to the training samples.
In one embodiment of the application, in the search intention identification device, a preprocessing unit is used for generating a first type positive sample according to a search log containing click behaviors; generating a second type of positive sample according to the search log containing the ordering behavior; the weight of the first type of positive samples is less than the weight of the second type of positive samples; negative examples are generated from search logs containing only browsing behavior.
In one embodiment of the present application, in the search intention identifying means, the search intention identifying result is an intention intensity distribution of a plurality of search intentions; the device still includes: an intention adjusting unit for acquiring a specified search intention and an intention rank thereof; determining an intention strength value for specifying a search intention according to the intention rank and the intention strength distribution; an intention intensity distribution containing the specified search intention is generated from the intention intensity value and the intention intensity distribution specifying the search intention.
In one embodiment of the application, in the search intention identification device, an intention adjustment unit is used for acquiring a specified search intention which is matched with a search request and is in a valid state; the effective state is determined according to the display time of the specified search intention and/or the displayed times of the specified search intention.
It should be noted that, for the specific implementation of each apparatus embodiment, reference may be made to the specific implementation of the corresponding method embodiment, which is not described herein again.
In summary, the technical scheme attends not only to the search request but also to weather, location, user behavior, and other search scene information, and predicts the user's real needs with a search intention recognition model based on composite modeling that weighs multiple factors, alleviating the problem that the search intention cannot be accurately recognized from the search request alone; it is particularly suitable for life-service and LBS search scenarios. For cold start scenarios and scenarios where the business side has a designated search intention, the intention intensity distribution can be adjusted using a designated search intention that matches the search request and is in an effective state, further improving the match between the final search intention and the user's needs.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. In addition, this application is not directed to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the present application as described herein, and any descriptions of specific languages are provided above to disclose the best modes of the present application.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the application may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the application, various features of the application are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the application and aiding in the understanding of one or more of the various inventive aspects. This method of disclosure, however, is not to be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the application and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the present application may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. It will be appreciated by those skilled in the art that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the search intention identification apparatus according to embodiments of the present application. The present application may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present application may be stored on a computer readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, fig. 6 shows a schematic structural diagram of an electronic device according to an embodiment of the present application. The electronic device 600 comprises a processor 610 and a memory 620 arranged to store computer executable instructions (computer readable program code). The memory 620 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 620 has a storage space 630 storing computer readable program code 631 for performing any of the method steps described above. For example, the memory space 630 for storing the computer readable program code may comprise respective computer readable program codes 631 for respectively implementing the various steps in the above method. The computer readable program code 631 may be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a Compact Disc (CD), a memory card or a floppy disk. Such a computer program product is typically a computer readable storage medium such as described in fig. 7. FIG. 7 shows a schematic diagram of a computer-readable storage medium according to an embodiment of the present application. The computer readable storage medium 700, in which a computer readable program code 631 for performing the method steps according to the application is stored, is readable by the processor 610 of the electronic device 600, which computer readable program code 631, when executed by the electronic device 600, causes the electronic device 600 to perform the respective steps of the method described above, in particular the computer readable program code 631 stored by the computer readable storage medium may perform the method shown in any of the embodiments described above. The computer readable program code 631 may be compressed in a suitable form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the application, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The application may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second, third, etcetera does not indicate any ordering; these words may be interpreted as names.

Claims (14)

1. A search intention identification method, comprising:
responding to a search request, and acquiring search scene information associated with the search request;
generating a composite feature for identifying a search intention according to the search scene information and the search request;
and inputting the composite features into a search intention recognition model, and acquiring a search intention recognition result output by the search intention recognition model.
2. The search intention recognition method according to claim 1, wherein the generating of the composite feature for recognizing the search intention from the search scene information and the search request includes:
encoding the search scene information into a scene feature vector, and encoding the search request to obtain a search request feature vector corresponding to the search request;
and fusing the scene feature vector and the search request feature vector, and taking the obtained fused feature vector as the composite feature, wherein the proportion of dimensions occupied by the search request feature vector in the fused feature vector is not less than a preset ratio.
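The fusion step of claim 2 can be sketched as a concatenation with a dimension-share check. The function name, the list-concatenation fusion, and the default ratio are illustrative assumptions; the claim fixes only the "not less than a preset ratio" condition.

```python
def fuse_features(scene_vec, request_vec, min_request_ratio=0.5):
    """Concatenate request and scene vectors into one composite feature.

    Raises if the request vector's share of the fused dimensions falls
    below min_request_ratio, mirroring the claim's ratio condition.
    """
    fused = list(request_vec) + list(scene_vec)
    ratio = len(request_vec) / len(fused)
    if ratio < min_request_ratio:
        raise ValueError(
            f"request vector holds {ratio:.2f} of fused dims, "
            f"below the preset ratio {min_request_ratio}"
        )
    return fused

fuse_features(scene_vec=[0.1, 0.4], request_vec=[0.7, 0.2, 0.9])
# request vector occupies 3/5 = 0.6 of the fused dimensions
```

Keeping the query-derived features dominant in the composite prevents the scene signals from overwhelming the text of the search request itself.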
3. The search intention identification method of claim 2, wherein said encoding the search scene information into a scene feature vector comprises:
encoding the search scene information, by scene dimension, into feature vectors corresponding to the respective scene dimensions; the scene dimensions include at least one of: a location dimension, a weather dimension, a user behavior dimension, and a time dimension.
4. The search intention recognition method of claim 3, wherein said encoding the scene information into feature vectors corresponding to scene dimensions, respectively, by scene dimension comprises:
and performing GeoHash processing on the longitude and latitude information under the location dimension, and performing one-hot encoding on the processing result to obtain a longitude and latitude feature vector.
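The location-dimension encoding of claim 4 can be sketched as below. The minimal GeoHash encoder and the cell vocabulary used for one-hot encoding are illustrative assumptions; the claim fixes only the two steps (GeoHash, then one-hot), not any particular implementation.

```python
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet

def geohash(lat, lon, precision=5):
    """Encode (lat, lon) into a GeoHash string of the given precision."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    code, bit, ch = [], 0, 0
    even = True  # GeoHash interleaves bits, starting with longitude
    while len(code) < precision:
        if even:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                ch = (ch << 1) | 1
                lon_lo = mid
            else:
                ch <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                ch = (ch << 1) | 1
                lat_lo = mid
            else:
                ch <<= 1
                lat_hi = mid
        even = not even
        bit += 1
        if bit == 5:  # every 5 bits form one base-32 character
            code.append(_BASE32[ch])
            bit, ch = 0, 0
    return "".join(code)

def one_hot(cell, vocab):
    """One-hot encode a GeoHash cell against a fixed cell vocabulary."""
    vec = [0] * len(vocab)
    vec[vocab.index(cell)] = 1
    return vec

geohash(39.9, 116.4, precision=2)  # → "wx" (a cell covering Beijing)
```

Truncating the GeoHash to a coarse precision groups nearby users into the same cell, which keeps the one-hot vocabulary small.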
5. The search intention recognition method of claim 3, wherein said encoding the scene information into feature vectors corresponding to scene dimensions, respectively, by scene dimension comprises:
and performing bucket-based discretization on the continuous-valued information under the weather dimension, and performing one-hot encoding on the processing result to obtain a weather feature vector.
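The weather-dimension encoding of claim 5 amounts to binning a continuous value and one-hot encoding the bin index. A minimal sketch follows; the choice of temperature as the continuous value and the bucket boundaries are made-up examples, not part of the original text.

```python
import bisect

# Illustrative bucket boundaries (degrees Celsius); 4 boundaries → 5 buckets.
TEMP_BOUNDARIES = [0.0, 10.0, 20.0, 30.0]

def weather_feature(temperature_c):
    """Discretize a temperature into a bucket, then one-hot encode the bucket."""
    bucket = bisect.bisect_right(TEMP_BOUNDARIES, temperature_c)
    vec = [0] * (len(TEMP_BOUNDARIES) + 1)
    vec[bucket] = 1
    return vec

weather_feature(25.0)  # falls in bucket 3 → [0, 0, 0, 1, 0]
```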
6. The search intention recognition method of claim 3, wherein said encoding the scene information into feature vectors corresponding to scene dimensions, respectively, by scene dimension comprises:
for a user behavior sequence under the user behavior dimension, selecting all user behaviors in the user behavior sequence when the number of user behaviors in the sequence is not greater than a specified number, and selecting the specified number of user behaviors from the sequence in reverse order when the number of user behaviors in the sequence is greater than the specified number;
acquiring the search intention of the target corresponding to each selected user behavior;
and performing feature embedding processing on the acquired search intentions to obtain a user behavior feature vector.
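The selection rule in claim 6 can be sketched as follows: the whole sequence is kept when it is short enough, otherwise only the most recent `k` behaviors are taken (the "reverse order" selection). The function name and list-of-strings representation are illustrative assumptions.

```python
def select_behaviors(behavior_seq, k):
    """Return all behaviors if there are at most k, else the last k,
    most recent first (reverse-order selection)."""
    if len(behavior_seq) <= k:
        return list(behavior_seq)
    return list(reversed(behavior_seq))[:k]

select_behaviors(["b1", "b2", "b3", "b4"], k=2)  # → ["b4", "b3"]
select_behaviors(["b1", "b2"], k=3)              # → ["b1", "b2"]
```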
7. The search intention recognition method according to claim 6, wherein the specified number is predetermined by:
counting the lengths of consecutive click behavior sequences in each user behavior sequence in the search log that contains an order-placing behavior; a consecutive click behavior is a click behavior that occurs between two order-placing behaviors with an occurrence interval not greater than a preset time threshold;
and taking the average length of the consecutive click behavior sequences as the specified number.
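A possible reading of claim 7 is sketched below: within each logged session, count runs of clicks that fall between two order-placing events with gaps no larger than a time threshold, and average the run lengths. The `(type, timestamp)` event format and the gap-reset rule are assumptions; the claim does not fix a data layout.

```python
def consecutive_click_lengths(events, max_gap):
    """Lengths of click runs occurring between two order events,
    where consecutive events in a run are at most max_gap apart.
    events: iterable of (type, timestamp) tuples, timestamps ascending."""
    lengths = []
    run, in_run, last_t = 0, False, None
    for etype, t in events:
        if etype == "order":
            if in_run and run > 0:
                lengths.append(run)  # run closed by the second order
            run, in_run, last_t = 0, True, t
        elif etype == "click" and in_run:
            if last_t is not None and t - last_t > max_gap:
                run = 0  # gap too large: restart the run
            run += 1
            last_t = t
    return lengths

def specified_number(sessions, max_gap=300):
    """Average consecutive-click run length over all sessions, rounded."""
    lengths = [l for s in sessions for l in consecutive_click_lengths(s, max_gap)]
    return round(sum(lengths) / len(lengths)) if lengths else 0

events = [("order", 0), ("click", 10), ("click", 20), ("order", 30)]
specified_number([events])  # one run of 2 clicks → 2
```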
8. The search intention recognition method according to claim 1, wherein the search intention recognition model is trained by:
generating a training sample according to the search log;
generating a composite feature according to the training sample;
and training the search intention recognition model by using the composite features generated according to the training samples.
9. The search intention recognition method of claim 8, wherein the generating training samples from the search logs comprises:
generating first-type positive samples according to search logs containing click behaviors;
generating second-type positive samples according to search logs containing order-placing behaviors, wherein the weight of the first-type positive samples is smaller than that of the second-type positive samples;
and generating negative samples according to search logs containing only browsing behaviors.
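The sample-generation rules of claim 9 can be sketched as a labeling pass over the logs. The dict-based log format and the concrete weight values (click-positives weighted below order-positives) are made-up; the claim requires only that click-based positives weigh less than order-based ones.

```python
def build_samples(search_logs):
    """Label each log entry by the strongest behavior it contains."""
    samples = []
    for log in search_logs:
        if log.get("ordered"):       # order placed → strong positive
            samples.append({"query": log["query"], "label": 1, "weight": 1.0})
        elif log.get("clicked"):     # clicked only → weaker positive
            samples.append({"query": log["query"], "label": 1, "weight": 0.3})
        else:                        # browsing only → negative
            samples.append({"query": log["query"], "label": 0, "weight": 1.0})
    return samples

logs = [{"query": "pizza", "ordered": True},
        {"query": "cake", "clicked": True},
        {"query": "tea"}]
build_samples(logs)  # labels: [1, 1, 0]
```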
10. The search intention recognition method of any one of claims 1 to 9, wherein the search intention recognition result is an intention intensity distribution of a plurality of search intents, the method further comprising:
acquiring a specified search intention and an intention rank thereof;
determining an intention intensity value of the specified search intention according to the intention rank and the intention intensity distribution;
and generating an intention intensity distribution containing the specified search intention according to the intention intensity value of the specified search intention and the intention intensity distribution.
11. The search intention recognition method according to claim 10, wherein the acquiring of the specified search intention and the intention rank thereof comprises:
acquiring a specified search intention which is matched with the search request and is in a valid state;
the valid state is determined according to the display time of the specified search intention and/or the number of times the specified search intention has been displayed.
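The validity condition of claim 11 can be sketched as a freshness-and-frequency check. The dict fields, the one-day age limit, and the impression cap are hypothetical values; the claim only says validity depends on display time and/or display count.

```python
import time

def is_valid(intention, now=None, max_age_s=86400, max_impressions=50):
    """A specified intention stays valid while it is within its display-time
    window and below its display-count cap (both limits are assumptions)."""
    now = time.time() if now is None else now
    fresh = now - intention["first_shown_at"] <= max_age_s
    under_cap = intention["times_shown"] < max_impressions
    return fresh and under_cap

intent = {"first_shown_at": 1000.0, "times_shown": 10}
is_valid(intent, now=1000.0 + 3600)  # shown an hour ago, 10 times → True
```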
12. A search intention recognition apparatus comprising:
the response unit is used for responding to a search request and acquiring search scene information related to the search request;
a composite feature generating unit configured to generate a composite feature for identifying a search intention according to the search scene information and the search request;
and the search intention recognition unit is used for inputting the composite features into a search intention recognition model and acquiring a search intention recognition result output by the search intention recognition model.
13. An electronic device, wherein the electronic device comprises: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the search intent recognition method of any of claims 1-11.
14. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement the search intention identification method of any one of claims 1-11.
CN202010204153.3A 2020-03-20 2020-03-20 Search intention recognition method and device, electronic equipment and storage medium Pending CN111310008A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010204153.3A CN111310008A (en) 2020-03-20 2020-03-20 Search intention recognition method and device, electronic equipment and storage medium
PCT/CN2021/080240 WO2021185147A1 (en) 2020-03-20 2021-03-11 Identifying search intention

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010204153.3A CN111310008A (en) 2020-03-20 2020-03-20 Search intention recognition method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN111310008A true CN111310008A (en) 2020-06-19

Family

ID=71157269

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010204153.3A Pending CN111310008A (en) 2020-03-20 2020-03-20 Search intention recognition method and device, electronic equipment and storage medium

Country Status (2)

Country Link
CN (1) CN111310008A (en)
WO (1) WO2021185147A1 (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112330215A (en) * 2020-11-26 2021-02-05 长沙理工大学 Urban vehicle demand prediction method, equipment and storage medium
CN112765424A (en) * 2021-01-29 2021-05-07 北京字节跳动网络技术有限公司 Data query method, device, equipment and computer readable medium
CN113032694A (en) * 2021-05-26 2021-06-25 浙江口碑网络技术有限公司 Scene-based query method and device, storage medium and computer equipment
CN113255354A (en) * 2021-06-03 2021-08-13 北京达佳互联信息技术有限公司 Search intention recognition method, device, server and storage medium
CN113343692A (en) * 2021-07-15 2021-09-03 杭州网易云音乐科技有限公司 Search intention recognition method, model training method, device, medium and equipment
WO2021185147A1 (en) * 2020-03-20 2021-09-23 北京三快在线科技有限公司 Identifying search intention
CN113468405A (en) * 2021-06-25 2021-10-01 北京达佳互联信息技术有限公司 Data searching method and device, electronic equipment and storage medium
CN113553851A (en) * 2021-07-15 2021-10-26 杭州网易云音乐科技有限公司 Keyword determination method and device, storage medium and computing equipment
CN114218259A (en) * 2022-02-21 2022-03-22 深圳市云初信息科技有限公司 Multi-dimensional scientific information search method and system based on big data SaaS
CN114385933A (en) * 2022-03-22 2022-04-22 武汉大学 Semantic-considered geographic information resource retrieval intention identification method
CN115099242A (en) * 2022-08-29 2022-09-23 江西电信信息产业有限公司 Intention recognition method, system, computer and readable storage medium
CN116881541A (en) * 2023-05-05 2023-10-13 厦门亚瑟网络科技有限公司 AI processing method for online searching activity and online service big data system

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116805023B (en) * 2023-08-25 2023-11-03 量子数科科技有限公司 Takeaway recommendation method based on large language model

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103049481A (en) * 2012-11-29 2013-04-17 百度在线网络技术(北京)有限公司 Searching method and searching device
CN104866474A (en) * 2014-02-20 2015-08-26 阿里巴巴集团控股有限公司 Personalized data searching method and device
CN105930527A (en) * 2016-06-01 2016-09-07 北京百度网讯科技有限公司 Searching method and device
CN106326338A (en) * 2016-08-03 2017-01-11 北京百度网讯科技有限公司 Service providing method and device based on search engine
CN107862027A (en) * 2017-10-31 2018-03-30 北京小度信息科技有限公司 Retrieve intension recognizing method, device, electronic equipment and readable storage medium storing program for executing
CN108416649A (en) * 2018-02-05 2018-08-17 北京三快在线科技有限公司 Search result ordering method, device, electronic equipment and storage medium
CN109063200A (en) * 2018-09-11 2018-12-21 广州神马移动信息科技有限公司 Resource search method and its device, electronic equipment, computer-readable medium
CN110020128A (en) * 2017-10-26 2019-07-16 阿里巴巴集团控股有限公司 A kind of search result ordering method and device
CN110309431A (en) * 2018-03-09 2019-10-08 北京搜狗科技发展有限公司 A kind of data processing method, device and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104636336B (en) * 2013-11-06 2019-07-09 北京小度互娱科技有限公司 A kind of method and apparatus of video search
US10430465B2 (en) * 2017-01-04 2019-10-01 International Business Machines Corporation Dynamic faceting for personalized search and discovery
CN111310008A (en) * 2020-03-20 2020-06-19 北京三快在线科技有限公司 Search intention recognition method and device, electronic equipment and storage medium

Cited By (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021185147A1 (en) * 2020-03-20 2021-09-23 北京三快在线科技有限公司 Identifying search intention
CN112330215A (en) * 2020-11-26 2021-02-05 长沙理工大学 Urban vehicle demand prediction method, equipment and storage medium
CN112330215B (en) * 2020-11-26 2024-02-02 长沙理工大学 Urban vehicle demand prediction method, equipment and storage medium
CN112765424A (en) * 2021-01-29 2021-05-07 北京字节跳动网络技术有限公司 Data query method, device, equipment and computer readable medium
CN112765424B (en) * 2021-01-29 2023-10-10 抖音视界有限公司 Data query method, device, equipment and computer readable medium
CN113032694A (en) * 2021-05-26 2021-06-25 浙江口碑网络技术有限公司 Scene-based query method and device, storage medium and computer equipment
CN113032694B (en) * 2021-05-26 2021-11-09 浙江口碑网络技术有限公司 Scene-based query method and device, storage medium and computer equipment
CN113255354A (en) * 2021-06-03 2021-08-13 北京达佳互联信息技术有限公司 Search intention recognition method, device, server and storage medium
CN113255354B (en) * 2021-06-03 2021-12-07 北京达佳互联信息技术有限公司 Search intention recognition method, device, server and storage medium
CN113468405B (en) * 2021-06-25 2024-03-26 北京达佳互联信息技术有限公司 Data searching method, device, electronic equipment and storage medium
CN113468405A (en) * 2021-06-25 2021-10-01 北京达佳互联信息技术有限公司 Data searching method and device, electronic equipment and storage medium
CN113343692B (en) * 2021-07-15 2023-09-12 杭州网易云音乐科技有限公司 Search intention recognition method, model training method, device, medium and equipment
CN113553851A (en) * 2021-07-15 2021-10-26 杭州网易云音乐科技有限公司 Keyword determination method and device, storage medium and computing equipment
CN113343692A (en) * 2021-07-15 2021-09-03 杭州网易云音乐科技有限公司 Search intention recognition method, model training method, device, medium and equipment
CN114218259A (en) * 2022-02-21 2022-03-22 深圳市云初信息科技有限公司 Multi-dimensional scientific information search method and system based on big data SaaS
CN114385933B (en) * 2022-03-22 2022-06-07 武汉大学 Semantic-considered geographic information resource retrieval intention identification method
CN114385933A (en) * 2022-03-22 2022-04-22 武汉大学 Semantic-considered geographic information resource retrieval intention identification method
CN115099242A (en) * 2022-08-29 2022-09-23 江西电信信息产业有限公司 Intention recognition method, system, computer and readable storage medium
CN115099242B (en) * 2022-08-29 2022-11-15 江西电信信息产业有限公司 Intention recognition method, system, computer and readable storage medium
CN116881541A (en) * 2023-05-05 2023-10-13 厦门亚瑟网络科技有限公司 AI processing method for online searching activity and online service big data system

Also Published As

Publication number Publication date
WO2021185147A1 (en) 2021-09-23

Similar Documents

Publication Publication Date Title
CN111310008A (en) Search intention recognition method and device, electronic equipment and storage medium
US10642938B2 (en) Artificial intelligence based method and apparatus for constructing comment graph
Azad et al. Query expansion techniques for information retrieval: a survey
CN106649818B (en) Application search intention identification method and device, application search method and server
Purves et al. The design and implementation of SPIRIT: a spatially aware search engine for information retrieval on the Internet
JP6177871B2 (en) Disclosure of product information
US20100191740A1 (en) System and method for ranking web searches with quantified semantic features
CN109101533B (en) Automated reading comprehension
JP6381775B2 (en) Information processing system and information processing method
US20060230033A1 (en) Searching through content which is accessible through web-based forms
CN112182230B (en) Text data classification method and device based on deep learning
US20080162528A1 (en) Content Management System and Method
CN114820063A (en) Bidding based on buyer defined function
Paul et al. Focused domain contextual AI chatbot framework for resource poor languages
CN111052109A (en) Expert search thread invitation engine
CN115659008A (en) Information pushing system and method for big data information feedback, electronic device and medium
WO2023057988A1 (en) Generation and use of content briefs for network content authoring
JP2006318398A (en) Vector generation method and device, information classifying method and device, and program, and computer readable storage medium with program stored therein
CN109657043B (en) Method, device and equipment for automatically generating article and storage medium
CN115062135B (en) Patent screening method and electronic equipment
Patil et al. Detecting and categorization of click baits
Zuze The crossover point between keyword rich website text and spamdexing
CN112214511A (en) API recommendation method based on WTP-WCD algorithm
JP5832869B2 (en) Keyword extraction system and keyword extraction method using category matching
Abudalfa Comparative study on efficiency of using supervised learning techniques for target-dependent sentiment polarity classification in social media

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200619