CN107832476A - A kind of understanding method of search sequence, device, equipment and storage medium - Google Patents
- Publication number
- CN107832476A CN107832476A CN201711248658.4A CN201711248658A CN107832476A CN 107832476 A CN107832476 A CN 107832476A CN 201711248658 A CN201711248658 A CN 201711248658A CN 107832476 A CN107832476 A CN 107832476A
- Authority
- CN
- China
- Prior art keywords
- search sequence
- word
- model
- search
- initial
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
- G06F16/9566—URL specific, e.g. using aliases, detecting broken or misspelled links
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The embodiment of the invention discloses a method, an apparatus, a device and a storage medium for understanding a search sequence. The method includes: determining the word vector of each word contained in a labeled search sequence; taking the hidden layer parameters, convolutional layer parameters and pooling layer parameters of a search sequence CNN model, trained in advance from URL site names and the clicked and non-clicked search sequences of those site names, as the hidden layer parameters, convolutional layer parameters and pooling layer parameters of an initial domain identification model; and training the initial domain identification model according to the domain labels of the labeled search sequences and the word vectors of the words they contain, so as to determine the fully connected layer parameters of the initial domain identification model and obtain the domain identification model. The scheme can improve model capability and generalization with a small number of samples, optimize the trained model, and improve the effect of search sequence understanding.
Description
Technical Field
The embodiment of the invention relates to the technical field of information processing, in particular to a method, a device, equipment and a storage medium for understanding a search sequence.
Background
With the rapid development of Artificial Intelligence (AI) technology, more and more products and applications, such as intelligent customer service, intelligent assistants, vehicle navigation, and smart home, are beginning to introduce conversational human-machine interaction. In practice, however, developing a dialog system is a difficult task for most developers, and one of the main technical difficulties is understanding the search sequence (Query). The core task of Query understanding is to convert natural language into a machine-processable formal language and to establish connections between the natural language and resources and services.
Query understanding can be decomposed into three tasks: Domain identification (judging whether the Query belongs to the domain; if not, no further analysis is performed), Intent classification (judging the Query's specific intent within the domain), and Slot labeling (marking the parameter information of interest in the Query under that intent). At present, Domain identification is mainly performed with a Convolutional Neural Network (CNN) model structure trained on labeled in-domain samples, and joint Intent/Slot analysis is performed with a Recurrent Neural Network (RNN) or a Recurrent Neural Network-Conditional Random Field (RNN-CRF) model structure.
However, the prior art has the following problems. 1) Labeling data is expensive: developers need to label a large amount of data for model training to obtain an ideal Query understanding effect, and when the amount of labeled data is small, the model's effect is limited. 2) The generalization ability of Query understanding models is weak: a new Query may fail to be resolved if it shares no surface text with the Queries in the training set. For example, suppose a developer builds a Query understanding service for a snack vending machine and labels "give me a bottle of cola", where the intent is "buy", the quantity is "one", and the merchandise is "cola". For a new Query such as "Sprite, two bottles", it is difficult to judge that the intent is also "buy", because none of its words has been learned; unless the developer collects and enters a domain dictionary, it is hard to discover that "Sprite" is a kind of merchandise, just like "cola". 3) Besides labeled corpora, developers generally have a large amount of unlabeled corpora that carry domain knowledge and common grammatical structures, but the existing technology cannot make use of them. 4) Query understanding corpora already exist in many other fields, and corpora from different fields have a certain similarity, yet the current technology cannot migrate labeled corpora from other fields to optimize the Query understanding effect in a brand-new field.
Disclosure of Invention
The invention provides a method, an apparatus, a device and a storage medium for understanding a search sequence, which can improve model capability and generalization with a small number of samples, optimize the trained model, and improve the Query understanding effect.
In a first aspect, an embodiment of the present invention provides a method for understanding a search sequence, including:
determining a word vector of each word contained in the labeled search sequence;
taking hidden layer parameters, convolutional layer parameters and pooling layer parameters of a search sequence convolutional neural network model, trained in advance from Uniform Resource Locator (URL) site names and the clicked and non-clicked search sequences of those site names, as the hidden layer parameters, convolutional layer parameters and pooling layer parameters of an initial domain identification model;
and training the initial domain recognition model according to the domain labels of the labeled search sequences and the word vectors of the words they contain, so as to determine the fully connected layer parameters of the initial domain recognition model and obtain the domain recognition model.
In a second aspect, an embodiment of the present invention further provides an apparatus for understanding a search sequence, including:
the word vector determining module is used for determining the word vectors of all words contained in the labeled search sequence;
the model parameter module is used for taking hidden layer parameters, convolutional layer parameters and pooling layer parameters in a search sequence CNN model which is obtained in advance according to the URL site names and the click search sequence and the click-free search sequence of the URL site names as hidden layer parameters, convolutional layer parameters and pooling layer parameters in the initial domain recognition model;
and the domain identification model module is used for training the initial domain identification model according to the domain labels of the labeled search sequences and the word vectors of the words they contain, so as to determine the fully connected layer parameters of the initial domain identification model and obtain the domain identification model.
In a third aspect, an embodiment of the present invention further provides an apparatus, where the apparatus includes:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method of understanding a search sequence as described above.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method for understanding the search sequence as described above.
According to the embodiments of the invention, the bottom-layer parameters of the CNN-based domain identification model and of the RNN-based intent/slot recognition model are determined from a large number of search Queries and their corresponding click results, and the upper-layer parameters of the models are then determined with a small amount of labeled data. Because the bottom-layer parameters of the CNN and RNN models are large in scale, pre-training them on unsupervised data without labeled results and then training the upper-layer model parameters on a small amount of labeled data makes model training feasible with little labeled data. This improves model capability and generalization with few samples, optimizes the trained model, and improves the Query understanding effect.
Drawings
FIG. 1 is a flowchart of a method for understanding a search sequence according to a first embodiment of the present invention;
FIG. 1a is a schematic diagram of a domain identification model according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a method for understanding a search sequence according to a second embodiment of the present invention;
FIG. 2a is a schematic diagram of domain identification model pre-training in a second embodiment of the present invention;
FIG. 3 is a flowchart of a method for understanding a search sequence according to a third embodiment of the present invention;
FIG. 3a is a schematic overall flow chart of a method for understanding a search sequence according to a third embodiment of the present invention;
FIG. 4 is a flowchart of a method for understanding a search sequence according to a fourth embodiment of the present invention;
FIG. 4a is a schematic diagram of pre-training of an intent/slot recognition model according to a fourth embodiment of the present invention;
FIG. 4b is a diagram illustrating an intent/slot identification model according to a fourth embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an understanding apparatus for a search sequence according to a fifth embodiment of the present invention;
fig. 6 is a schematic structural diagram of an apparatus in a sixth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a method for understanding a search sequence in a first embodiment of the present invention, which is applicable to a case of understanding a search sequence in a specific field, and which can be executed by an apparatus for understanding a search sequence, and specifically includes the following steps:
step 110, determining a word vector of each word included in the labeled search sequence.
In this embodiment, the labeled search sequence refers to a search sequence with labeled results that is labeled manually. Specifically, for a specific domain, the domain label content of the search sequence may be a name of the domain, such as a movie domain, a traffic domain, and the like.
A word vector may be a long vector obtained by one-hot encoding, whose dimension is the vocabulary size: most elements are 0, and only one dimension has the value 1, that dimension representing the current word. In deep learning, word vectors are generally represented by the distributed representation (Distributed Representation) method, which uses a low-dimensional real-valued vector for each word. Its advantage is that similar words are closer in distance, which reflects the correlation and dependency relationships between different words. This embodiment adopts the distributed representation method for word vectors.
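The contrast between the two representations can be sketched as follows. This is a minimal illustration with a made-up toy vocabulary and hand-picked embedding values, not the patent's trained vectors:

```python
import numpy as np

def one_hot(index, vocab_size):
    """Sparse representation: vocab_size dimensions, a single 1."""
    v = np.zeros(vocab_size)
    v[index] = 1.0
    return v

# Hypothetical low-dimensional distributed vectors; real embeddings are learned.
embeddings = {
    "cola":   np.array([0.9, 0.1, 0.3]),
    "sprite": np.array([0.8, 0.2, 0.4]),
    "bottle": np.array([0.1, 0.9, 0.2]),
}

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# One-hot vectors of different words are always orthogonal (similarity 0),
# while distributed vectors let related words ("cola", "sprite") be close.
print(cosine(one_hot(0, 4), one_hot(1, 4)))              # 0.0
print(cosine(embeddings["cola"], embeddings["sprite"]) >
      cosine(embeddings["cola"], embeddings["bottle"]))  # True
```

This is why a distributed representation helps generalization: a query containing "sprite" lands near queries containing "cola" even if the surface strings never co-occur in the labeled data.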
And step 120, taking hidden layer parameters, convolutional layer parameters and pooling layer parameters in a search sequence CNN model obtained in advance according to the URL site names and the click search sequence and the click-free search sequence of the URL site names as hidden layer parameters, convolutional layer parameters and pooling layer parameters in the initial domain identification model.
Click behavior between Queries and URLs is recorded in the search data, and all URLs are traversed to recall their corresponding Queries. If a user searched a Query, the URL was shown in response, and the user clicked the URL, the Query is recorded as a clicked Query of that URL; if the user did not click the URL, the Query is recorded as a non-clicked Query. In addition, other random search sequences from the search log can also be used as non-clicked search sequences.
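The grouping described above can be sketched as follows. The record fields and example URLs are hypothetical; a real search log would be far larger and carry more fields:

```python
from collections import defaultdict

# Hypothetical search-log records: (query, shown_url_site_name, was_clicked).
log = [
    ("cheap flights to beijing", "flights.ctrip.com", True),
    ("train ticket price",       "flights.ctrip.com", False),
    ("book a flight",            "flights.ctrip.com", True),
    ("movie showtimes",          "movie.example.com", True),
]

def group_by_url(records):
    """For each URL site name, collect its clicked and non-clicked queries."""
    clicked, not_clicked = defaultdict(list), defaultdict(list)
    for query, url, was_clicked in records:
        (clicked if was_clicked else not_clicked)[url].append(query)
    return clicked, not_clicked

clicked, not_clicked = group_by_url(log)
print(clicked["flights.ctrip.com"])      # ['cheap flights to beijing', 'book a flight']
print(not_clicked["flights.ctrip.com"])  # ['train ticket price']
```

The resulting (site name, clicked Query, non-clicked Query) triples are what the pre-training step consumes.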
The initial domain identification model is built on a CNN model and comprises, in order, an input layer, a hidden layer, a convolutional layer, a pooling layer, a dropout layer, a fully connected layer and an output layer; the parameters of the hidden, convolutional and pooling layers are determined as above, while the parameters of the fully connected layer are unknown.
Specifically, in this embodiment we consider that two different Queries may be related if the URLs, or URL site names, they click are textually similar. A CNN model is trained on URL site names from other fields together with the clicked and non-clicked search sequences of those site names, yielding the bottom-layer parameters: the hidden layer, convolutional layer and pooling layer parameters, which are then taken as the bottom-layer parameters of the initial domain identification model. The bottom-layer parameters of a CNN model are large in scale: with each word represented by a vector of several hundred dimensions and a vocabulary of hundreds of thousands of words, the bottom-layer parameters number in the hundreds of millions, whereas the upper-layer parameters, i.e. those of the fully connected layer, generally amount to only a matrix of several hundred by several hundred dimensions. The upper-layer parameters are thus far fewer and can be learned from a small amount of labeled data.
Step 130, training the initial domain identification model according to the domain labels of the labeled search sequences and the word vectors of the words contained in the labeled search sequences to determine parameters of the fully connected layer in the initial domain identification model so as to obtain a domain identification model.
Specifically, referring to fig. 1a, the word vector of each word in a labeled search sequence is used as input to the initial domain identification model. The model first processes the word vectors through its bottom layers, a Hidden Layer, a Convolution Layer and a Pooling Layer, then applies a Dropout Layer, i.e. randomly selects N components of the resulting vector, for example half of a 256-dimensional vector, and finally transforms the result through a Fully Connected Layer (FCL). The output of the fully connected layer is compared with the domain label of the search sequence, and the FCL parameters are adjusted according to the comparison result until the iteration condition is satisfied, thereby obtaining the fully connected layer parameters, i.e. completing the training of the domain identification model.
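The layer pipeline can be sketched end to end in NumPy. All dimensions and weight values below are toy assumptions for illustration, with the pretrained bottom layers held fixed and only the fully connected layer left trainable:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: 8-dim word vectors, 16-dim hidden layer, convolution over
# a window of 3 words producing 32 filters, 4 candidate domains.
W_hidden = rng.normal(size=(8, 16))       # pretrained, kept fixed
W_conv   = rng.normal(size=(3 * 16, 32))  # pretrained, kept fixed
W_fc     = rng.normal(size=(32, 4))       # the only part trained on labels

def forward(word_vectors, train=False, keep_prob=0.5):
    h = np.tanh(word_vectors @ W_hidden)                  # hidden layer
    windows = [h[i:i + 3].reshape(-1) for i in range(len(h) - 2)]
    c = np.maximum(0.0, np.stack(windows) @ W_conv)       # convolution layer
    pooled = c.max(axis=0)                                # max pooling layer
    if train:  # dropout: randomly zero units; inference keeps all of them
        pooled = pooled * (rng.random(32) < keep_prob)
    logits = pooled @ W_fc                                # fully connected layer
    e = np.exp(logits - logits.max())
    return e / e.sum()                                    # domain probabilities

query = rng.normal(size=(5, 8))  # a 5-word query as stacked word vectors
probs = forward(query)
print(probs.shape)  # (4,)
```

Training would backpropagate the label-comparison loss into `W_fc` only, which is what lets a small labeled set suffice.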
The bottom layer parameters in the domain identification model can be determined through step 120, the parameters of the fully connected layer in the model can be determined through step 130, that is, all the parameters in the domain identification model are determined, the domain identification model is obtained, and the domain identification can be performed on the search sequence.
In this embodiment, the bottom-layer parameters of the CNN-based domain identification model are determined from a large number of search Queries and their corresponding click results, and the fully connected layer parameters of the model are then determined with a small amount of labeled data. Because the bottom-layer parameters of the CNN model are large in scale, pre-training them on unsupervised data without labeled results and then training the upper-layer model parameters on a small amount of labeled data makes model training feasible with little labeled data. This improves model capability and generalization with few samples, optimizes the trained model, and improves the Query understanding effect.
Example two
Fig. 2 is a flowchart of a method for understanding a search sequence according to a second embodiment of the present invention. The embodiment further optimizes the understanding method of the search sequence on the basis of the embodiment. Correspondingly, as shown in fig. 2, the method of the embodiment specifically includes:
step 210, determining a word vector of each word included in the labeled search sequence.
Step 220, obtaining the URL site names and the click search sequence and the click-free search sequence of the URL site names.
The URL site name is the combination of the server name and the domain name in the URL. For example, for the URL http://flights.ctrip.com/fuzzy/#ctm_ref=ctr_nav_flt_fz_pgs, the server name is flights, the domain name is ctrip.com, and the site name is flights.ctrip.com; alternatively, the page title of flights.ctrip.com may be used as the site name.
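Extracting the site name amounts to taking the host part of the URL, which the standard library already parses; a minimal sketch using the example URL above:

```python
from urllib.parse import urlparse

def site_name(url):
    """Site name = server name + domain name, i.e. the URL's host part."""
    return urlparse(url).netloc

url = "http://flights.ctrip.com/fuzzy/#ctm_ref=ctr_nav_flt_fz_pgs"
host = site_name(url)
server, domain = host.split(".", 1)  # split off the server name
print(host)    # flights.ctrip.com
print(server)  # flights
print(domain)  # ctrip.com
```

The page-title alternative mentioned above would instead require fetching the page and reading its title element, so the host-based form is the cheaper default.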
Specifically, all URLs are traversed to obtain the URL site names and the clicked and non-clicked search sequences of each URL site name.
Step 230, determining word vectors of words included in the clicked search sequence, word vectors of words included in the non-clicked search sequence, and word vectors of words included in the URL site names.
In this embodiment, the specific process of determining the word vector of each word in a search sequence or URL site name may be: segmenting the search sequence or URL site name to obtain the words it contains; then performing word, part-of-speech and named-entity identification on each word to obtain its word vector. The word vector is thus determined by fusing features such as the word itself, its part of speech and its named-entity type.
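The fusion step can be sketched as a concatenation of the word embedding with one-hot part-of-speech and named-entity features. The tag inventories and embedding values below are illustrative assumptions; a real system would run a segmenter, POS tagger and NER tagger first:

```python
import numpy as np

# Hypothetical feature inventories for illustration only.
POS_TAGS = ["noun", "verb", "num", "other"]
NER_TAGS = ["O", "PRODUCT", "PLACE"]

def fuse(word_emb, pos, ner):
    """Concatenate a word embedding with one-hot POS and NER features."""
    pos_vec = np.eye(len(POS_TAGS))[POS_TAGS.index(pos)]
    ner_vec = np.eye(len(NER_TAGS))[NER_TAGS.index(ner)]
    return np.concatenate([word_emb, pos_vec, ner_vec])

emb = np.array([0.2, -0.1, 0.7])      # 3-dim toy embedding for "cola"
vec = fuse(emb, "noun", "PRODUCT")
print(vec.shape)  # (10,) -> 3 embedding + 4 POS + 3 NER dimensions
```

Concatenation keeps each feature family in its own slice of the vector, so downstream layers can weight lexical, syntactic and entity evidence separately.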
Step 240, determining a clicked search vector according to the word vector of each word included in the clicked search sequence by using a first CNN model, determining a non-clicked search vector according to the word vector of each word included in the non-clicked search sequence by using the first CNN model, and determining a site name vector according to the word vector of each word included in the URL site name by using a second CNN model.
Specifically, referring to fig. 2a, the clicked search sequence QueryA and the non-clicked search sequence QueryB may share a first CNN model for training, so as to obtain the clicked search vector and the non-clicked search vector respectively, while the URL site name is trained with a separate second CNN model to obtain the site name vector.
And step 250, optimizing the first CNN model and the second CNN model according to a first similarity between the clicked search vector and the site name vector and a second similarity between the non-clicked search vector and the site name vector, and taking the optimized first CNN model as the search sequence CNN model.
Specifically, referring to fig. 2a, the similarity between the site name vector and the clicked search vector and the similarity between the site name vector and the non-clicked search vector are calculated to obtain the first similarity Similar_Score(QueryA, URL) and the second similarity Similar_Score(QueryB, URL). The first CNN model and the second CNN model are then optimized by minimizing a Loss function with the Back Propagation (BP) algorithm, and the optimized first CNN model is taken as the search sequence CNN model.
Wherein, the Loss function can be expressed as:
wherein Similar(V_clicked-Q, V_T) is the first similarity, Similar(V_non-clicked-Q, V_T) is the second similarity, V_T is the site name vector, and margin is a constant.
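The description names a margin constant and the two similarities; a common loss of this shape is the pairwise hinge ranking loss, sketched below under the assumption that cosine similarity is used (the patent's exact formula is not reproduced in this excerpt, so this form is illustrative):

```python
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def pairwise_hinge_loss(v_clicked_q, v_not_clicked_q, v_site, margin=0.5):
    """max(0, margin - Similar(clicked Q, site) + Similar(non-clicked Q, site)).

    Zero once the clicked query outscores the non-clicked one by `margin`.
    """
    return max(0.0,
               margin
               - cosine(v_clicked_q, v_site)
               + cosine(v_not_clicked_q, v_site))

v_site = np.array([1.0, 0.0])
# Well-ordered pair: clicked query near the site vector, non-clicked far away.
loss_good = pairwise_hinge_loss(np.array([1.0, 0.1]), np.array([0.0, 1.0]), v_site)
# Reversed pair: the non-clicked query is the closer one, so loss is large.
loss_bad  = pairwise_hinge_loss(np.array([0.0, 1.0]), np.array([1.0, 0.1]), v_site)
print(loss_good < loss_bad)  # True
```

Minimizing this loss by backpropagation pushes clicked Queries toward, and non-clicked Queries away from, their site name vector, which is what makes the shared CNN encoder useful as a pretrained bottom layer.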
And step 260, taking hidden layer parameters, convolutional layer parameters and pooling layer parameters in a search sequence CNN model obtained in advance according to the URL site names and the click search sequence and the click-free search sequence of the URL site names as hidden layer parameters, convolutional layer parameters and pooling layer parameters in the initial domain identification model.
Specifically, the hidden layer parameter, the convolutional layer parameter, and the pooling layer parameter of the search sequence CNN model determined in step 250 are used as the hidden layer parameter, the convolutional layer parameter, and the pooling layer parameter in the initial domain identification model, that is, the bottom layer parameter of the initial domain identification model is determined.
Step 270, training the initial domain identification model according to the domain labels of the labeled search sequences and the word vectors of the words contained in the labeled search sequences to determine parameters of the fully connected layer in the initial domain identification model so as to obtain a domain identification model.
Specifically, the bottom-layer parameters of the initial domain identification model obtained in step 250 may be migrated as the bottom-layer parameters of the domain identification model. Virtual adversarial training is then performed on the initial domain recognition model according to the domain labels of the labeled search sequences, the word vectors of the words in the labeled search sequences, and the word vectors of the words in unlabeled search sequences, determining the fully connected layer parameters after the dropout layer and fully connected layer transformations. Both the bottom-layer parameters and the fully connected layer parameters of the domain identification model are thus obtained, i.e. the domain identification model is obtained and can perform domain identification on search sequences.
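The parameter migration itself is a plain copy of the pretrained bottom layers, with a fresh fully connected layer left to be learned. A minimal sketch with toy shapes (the names and dimensions are assumptions, not the patent's):

```python
import numpy as np

rng = np.random.default_rng(1)

# Pretrained search-sequence CNN model parameters (toy shapes).
pretrained = {
    "hidden": rng.normal(size=(8, 16)),
    "conv":   rng.normal(size=(48, 32)),
}

def init_domain_model(pretrained_params, n_domains=4):
    """Copy bottom-layer parameters; only the fully connected layer is new."""
    model = {k: v.copy() for k, v in pretrained_params.items()}
    model["fc"] = np.zeros((32, n_domains))  # learned from labeled data later
    return model

model = init_domain_model(pretrained)
print(np.array_equal(model["hidden"], pretrained["hidden"]))  # True
print(model["fc"].shape)  # (32, 4)
```

Copying (rather than sharing) the arrays lets the fine-tuning stage decide independently whether the migrated layers stay frozen or are updated further.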
In this embodiment, the bottom-layer parameters of the CNN-based domain identification model are determined from a large number of search Queries and their corresponding click results, and the fully connected layer parameters of the model are then determined with a small amount of labeled data. Because the bottom-layer parameters of the CNN model are large in scale, pre-training them on unsupervised data without labeled results and then training the upper-layer model parameters on a small amount of labeled data makes model training feasible with little labeled data. This improves model capability and generalization with few samples, optimizes the trained model, and improves the Query understanding effect.
EXAMPLE III
Fig. 3 is a flowchart of a method for understanding a search sequence according to a third embodiment of the present invention. The present embodiment specifically explains the model determination of the field recognition, the intention recognition, and the slot recognition in the understanding method of the search sequence on the basis of the above-described embodiment. Correspondingly, the method of the embodiment specifically includes:
step 310, determining a word vector of each word included in the labeled search sequence.
And step 320, taking hidden layer parameters, convolutional layer parameters and pooling layer parameters in a search sequence CNN model obtained by training according to the URL site names and the click search sequence and the click-free search sequence of the URL site names in advance as hidden layer parameters, convolutional layer parameters and pooling layer parameters in the initial domain identification model.
Step 321, training the initial domain identification model according to the domain labels of the labeled search sequences and word vectors of words included in the labeled search sequences to determine parameters of a full connected layer in the initial domain identification model, so as to obtain a domain identification model.
And step 322, performing virtual adversarial training on the initial domain recognition model according to the domain labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences and the word vectors of the words contained in unlabeled search sequences, to determine the fully connected layer parameters of the initial domain recognition model and obtain the domain recognition model.
And 330, taking hidden layer parameters in the bidirectional RNN language model obtained by training according to the search sequence in advance as hidden layer parameters in the initial intention recognition model and the initial slot position recognition model.
Step 331, training the initial intention recognition model according to the intention labels of the labeled search sequences to determine the fully connected layer parameters of the initial intention recognition model, so as to obtain the intention recognition model; or training the initial slot position recognition model according to the slot position labels of the labeled search sequences to determine the fully connected layer parameters and conditional random field layer parameters of the initial slot position recognition model, so as to obtain the slot position recognition model.
Step 332, performing virtual adversarial training on the initial intention recognition model according to the intention labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences and the word vectors of the words contained in the unlabeled search sequences, to determine the fully connected layer parameters in the initial intention recognition model, so as to obtain the intention recognition model.
Step 333, performing virtual adversarial training on the initial slot position recognition model according to the slot position labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences and the word vectors of the words contained in the unlabeled search sequences, to determine the fully connected layer parameters and the conditional random field layer parameters in the initial slot position recognition model, so as to obtain the slot position recognition model.
It should be noted that step 320 and step 330 are independent of each other: they may be performed in either order or in parallel.
In this embodiment, if unlabeled search sequences exist in the field, it is preferable to adopt a Virtual Adversarial Training (VAT) technique to introduce the unsupervised data together with the labeled data for semi-supervised training; the virtual adversarial training technique may likewise be applied to the labeled search sequences. Referring to step 322, step 332 and step 333, virtual adversarial training is adopted to obtain the domain recognition model, the intention recognition model and the slot position recognition model respectively. For vertical-domain data without labeling results, the disturbance direction that maximally shifts the predicted probability distribution over the domain, intention and slot position is determined, and the loss function is minimized so that the recognition result on the disturbed sample differs as little as possible from the recognition result on the original sample. The minimized loss function can be expressed as:
L_v-adv = (1/N') Σ_s KL[ p(· | s) ‖ p(· | s + r_v-adv) ],  with  r_v-adv = arg max_{d, ‖d‖ ≤ ε} KL[ p(· | s) ‖ p(· | s + d) ]

wherein s represents a sample, d represents a disturbance direction, p is the probability distribution over the domain, intention or slot position, KL is the KL divergence between two probability distributions, and r_v-adv is the disturbance direction in which the KL divergence changes the most, solved approximately by taking the derivative of the KL divergence with respect to d; N' is the total number of labeled and unlabeled samples.
Fig. 3a is a schematic overall flow chart of a method for understanding a search sequence in the third embodiment of the present invention. In system integration, word vectors of clicked Query and non-clicked Query are obtained from a large amount of search Query data and the corresponding click behavior results; a CNN multi-feature classification model can be obtained from these word vectors, and a CNN Domain model can then be obtained from the labeled data and the CNN multi-feature classification model. Meanwhile, a Bidirectional Recurrent Neural Network (Bi-RNN) multi-feature language model can be trained on the search Query data, and an Intent/Slot model of a Bidirectional Recurrent Neural Network with a Conditional Random Field (Bi-RNN-CRF) can then be obtained from the labeled data and the Bi-RNN multi-feature language model. If unlabeled data exists, the virtual adversarial training technique can be adopted to introduce the unsupervised data together with the labeled data for semi-supervised training, yielding the CNN Domain model and the Bi-RNN-CRF Intent/Slot model. The CNN Domain model and the Bi-RNN-CRF Intent/Slot model are then put into use.
In the embodiment of the invention, the bottom layer parameters of the CNN-based domain recognition model and of the intention/slot position recognition models based on the bidirectional RNN language model are determined from a large amount of search Query data and the corresponding click results, and the upper layer parameters of the domain recognition model and the intention/slot position recognition models are then determined using a small amount of labeled data; in addition, the virtual adversarial training technique can be adopted for semi-supervised training on the unlabeled and labeled search sequences. Because the bottom layer parameters are pre-trained on unsupervised data without labeling results, and only the upper layer model parameters are trained on a small amount of data with labeling results, model training can be achieved with little labeled data, and the model capability and generalization ability under small-sample conditions are improved. By adopting the virtual adversarial training technique, the influence of small feature differences on the result is reduced, smoothness is increased, the trained model is optimized, and the Query understanding effect is improved.
Example four
Fig. 4 is a flowchart of a method for understanding a search sequence in the fourth embodiment of the present invention. On the basis of the above embodiments, this embodiment further refines how the intention recognition model and the slot position recognition model are determined in the above method for understanding a search sequence. Correspondingly, the method of this embodiment specifically includes:
step 410, determining a word vector of each word included in the labeled search sequence.
Step 420, determine the word vector for each word included in the search sequence.
In this embodiment, a specific process of determining the word vector of each word included in the search sequence may be: performing word segmentation on the search sequence to obtain the words it contains; and extracting word, part-of-speech and named-entity features for each word to obtain the word vector of each word contained in the search sequence. That is, the word vector is determined by fusing features such as the word itself, its part of speech and its named-entity tag.
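A common realization of such multi-feature word vectors is to concatenate separate embeddings for the word, its part-of-speech tag and its named-entity tag. The patent does not fix the exact fusion, so the sketch below, with made-up vocabularies and dimensions, is only illustrative.

```python
import numpy as np

rng = np.random.RandomState(0)
# Hypothetical feature vocabularies and embedding tables (toy sizes):
word_emb = {w: rng.randn(8) for w in ["刘德华", "的", "电影"]}
pos_emb = {p: rng.randn(4) for p in ["nr", "u", "n"]}    # part-of-speech tags
ner_emb = {e: rng.randn(4) for e in ["PER", "O"]}        # named-entity tags

def fused_word_vector(word, pos, ner):
    """Concatenate word, part-of-speech and named-entity embeddings into one
    multi-feature word vector, the fusion described in this embodiment."""
    return np.concatenate([word_emb[word], pos_emb[pos], ner_emb[ner]])

# Segmented query "刘德华 的 电影" ("Andy Lau's movies") with POS and NER tags:
tokens = [("刘德华", "nr", "PER"), ("的", "u", "O"), ("电影", "n", "O")]
matrix = np.stack([fused_word_vector(*t) for t in tokens])  # model input
```

In a real system the three embedding tables would be learned jointly with the model rather than drawn at random.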
Step 430, taking the word vector of each word contained in the search sequence as the input of the bidirectional RNN language model, predicting the next word through the forward recurrent neural network in the bidirectional RNN language model and the previous word through the reverse recurrent neural network, and adjusting the hidden layer parameters of the forward recurrent neural network and of the reverse recurrent neural network in the bidirectional RNN language model according to the prediction results.
Specifically, referring to fig. 4a, the word vector of each word contained in the search sequence is used as the input of the bidirectional RNN language model and processed by an embedding layer; then, in the RNN layer, the forward recurrent neural network predicts the next word and the reverse recurrent neural network predicts the previous word, and the hidden layer parameters of the forward and reverse recurrent neural networks are adjusted according to the prediction results. The hidden layer parameters of the forward recurrent neural network and of the reverse recurrent neural network are spliced to obtain the hidden layer parameters of the bidirectional RNN language model. The bidirectional RNN language model can be optimized by the backpropagation (BP) algorithm.
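The forward/reverse prediction scheme can be sketched as follows. This is a toy forward pass only (no training loop), with hypothetical sizes, showing how the two directions' hidden states are spliced into the representation that later serves as the pretrained bottom layer.

```python
import numpy as np

rng = np.random.RandomState(0)
V, D, H = 50, 16, 8             # vocabulary, embedding and hidden sizes (toy)
E = rng.randn(V, D) * 0.1       # embedding layer
Wx_f, Wh_f = rng.randn(H, D) * 0.1, rng.randn(H, H) * 0.1  # forward RNN
Wx_b, Wh_b = rng.randn(H, D) * 0.1, rng.randn(H, H) * 0.1  # reverse RNN
Wo_f = rng.randn(V, H) * 0.1    # forward output layer: scores for the next word

def run(Wx, Wh, tokens):
    """One directional Elman RNN pass; returns the hidden state per position."""
    h, states = np.zeros(H), []
    for t in tokens:
        h = np.tanh(Wx @ E[t] + Wh @ h)
        states.append(h)
    return np.stack(states)

def bi_rnn_states(tokens):
    """Splice forward and reverse hidden states per position, as done when the
    pretrained hidden layer is reused by the intention/slot models."""
    hf = run(Wx_f, Wh_f, tokens)
    hb = run(Wx_b, Wh_b, tokens[::-1])[::-1]  # reverse RNN sees the sequence backwards
    return np.concatenate([hf, hb], axis=1)

query = [3, 17, 42, 9]                         # token ids of a segmented Query (hypothetical)
states = bi_rnn_states(query)                  # one 2*H-dim representation per word
next_logits = run(Wx_f, Wh_f, query) @ Wo_f.T  # forward LM scores for the next word
```

During pretraining, the language-model losses on `next_logits` (and the analogous previous-word scores) would drive the BP updates of the hidden layer parameters.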
Step 440, taking the hidden layer parameters in the bidirectional RNN language model, trained in advance according to the search sequences, as the hidden layer parameters in the initial intention recognition model and the initial slot position recognition model.
The initial intention recognition model comprises an input layer, a hidden layer, a word representation layer, a dropout layer, a sequence representation layer, a fully connected layer and an output layer, wherein the sequence representation layer splices the word representations output by the dropout layer to obtain an overall representation of the sequence. The initial slot position recognition model comprises an input layer, a hidden layer, a word representation layer, a dropout layer, a fully connected layer, a conditional random field layer and an output layer. In both models, the parameters of the hidden layer, the word representation layer and the dropout layer are already determined, while the parameters of the fully connected layer and the conditional random field layer are unknown.
Specifically, the bottom layer parameters, namely the hidden layer parameters, obtained by training the bidirectional RNN language model on the search sequences in step 430, are used as the bottom layer parameters of the initial intention recognition model and the initial slot position recognition model.
Step 450, training the initial intention recognition model according to the intention labels of the labeled search sequences to determine the fully connected layer parameters in the initial intention recognition model, so as to obtain an intention recognition model; or training the initial slot position recognition model according to the slot position labels of the labeled search sequences to determine the fully connected layer parameters and the conditional random field layer parameters in the initial slot position recognition model, so as to obtain a slot position recognition model.
Specifically, referring to fig. 4b, the labeled search sequence together with its intention label is used to train the initial intention recognition model; after the input is processed by the bottom layers, namely the hidden layer, the word representation layer and the dropout layer, it passes through the sequence representation layer, the fully connected layer transformation and a Softmax classification function, whereby the fully connected layer parameters can be determined, that is, the training of the intention recognition model is realized.
Or, referring to fig. 4b, the labeled search sequence together with its slot position label is used to train the initial slot position recognition model; after the input is processed by the bottom layers, namely the hidden layer, the word representation layer and the dropout layer, it passes through the conditional random field layer (CRF layer), which models the start probability (a), the transition probability (w) and the end probability (b) of the slot labels for a labeling result, whereby the CRF parameters are obtained; and the fully connected layer parameters can be determined through the fully connected layer transformation, thereby realizing the training of the slot position recognition model.
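The CRF layer just described scores a labeling with the start probabilities a, transition probabilities w and end probabilities b, combined with the per-word label scores from the fully connected layer. A log-space sketch of that scoring, plus the standard Viterbi decoder that such a layer uses at prediction time (the decoder is not spelled out in the patent), might look like:

```python
import numpy as np

def crf_score(emissions, a, w, b, labels):
    """Log-space score of one labeling: start probability a, per-word emission
    scores from the fully connected layer, transition probabilities w, end
    probability b. emissions has shape (T, K) for T words and K slot labels."""
    s = a[labels[0]] + emissions[0, labels[0]]
    for t in range(1, len(labels)):
        s += w[labels[t - 1], labels[t]] + emissions[t, labels[t]]
    return s + b[labels[-1]]

def viterbi(emissions, a, w, b):
    """Highest-scoring slot label sequence under the CRF parameters."""
    T, K = emissions.shape
    score = a + emissions[0]
    back = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + w + emissions[t][None, :]  # all prev->cur moves
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    score = score + b
    labels = [int(score.argmax())]
    for t in range(T - 1, 0, -1):   # trace the backpointers
        labels.append(int(back[t, labels[-1]]))
    return labels[::-1]
```

Training maximizes `crf_score` of the gold labeling relative to all alternatives; decoding simply returns the Viterbi path.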
In the embodiment of the invention, the bottom layer parameters of the intention/slot position recognition models based on the bidirectional RNN language model are determined from a large amount of search Query data and the corresponding click results, and the upper layer parameters of the models are then determined using a small amount of labeled data. Because the bottom layer parameters of the RNN model are large in scale, they are pre-trained on unsupervised data without labeling results, and only the upper layer model parameters are trained on a small amount of data with labeling results; thus model training can be achieved with little labeled data, the model capability and generalization ability under small-sample conditions are improved, the trained model is optimized, and the Query understanding effect is improved.
Example five
Fig. 5 is a schematic structural diagram of an apparatus for understanding a search sequence in the fifth embodiment of the present invention, where the apparatus may include:
a word vector determining module 510, configured to determine a word vector of each word included in the labeled search sequence;
a model parameter module 520, configured to use hidden layer parameters, convolutional layer parameters, and pooling layer parameters in a search sequence CNN model obtained in advance according to each URL site name and a click search sequence and a click-free search sequence of each URL site name as hidden layer parameters, convolutional layer parameters, and pooling layer parameters in the initial domain identification model;
a domain identification model module 530, configured to train the initial domain identification model according to the domain labels of the labeled search sequences and the word vectors of the words included in the labeled search sequences to determine parameters of the fully connected layer in the initial domain identification model, so as to obtain a domain identification model.
Illustratively, the apparatus may further include a CNN model module, specifically configured to:
acquiring each URL site name and a click search sequence and a click-free search sequence of each URL site name;
determining word vectors of words contained in the clicked search sequence, word vectors of words contained in the non-clicked search sequence and word vectors of words contained in the URL site names;
determining a clicked search vector according to the word vector of each word contained in the clicked search sequence by adopting a first CNN model, determining a non-clicked search vector according to the word vector of each word contained in the non-clicked search sequence by adopting the first CNN model, and determining a site name vector according to the word vector of each word contained in the URL site name by adopting a second CNN model;
and optimizing the first CNN model and the second CNN model according to a first similarity between the clicked search vector and the site name vector and a second similarity between the non-clicked search vector and the site name vector, and taking the optimized first CNN model as the search sequence CNN model.
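The optimization in the module above pulls the clicked search vector toward the site name vector and pushes the non-clicked one away. The patent does not fix the exact objective, so the sketch below, with a toy convolution-and-pooling encoder, cosine similarity, and a pairwise hinge loss (a common choice for click data, assumed here), is only illustrative.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def cnn_encode(word_vecs, conv_W):
    """Toy 1-D convolution over a (T, D) word-vector matrix followed by max
    pooling, yielding a fixed-size vector for a Query or a site name."""
    T, D = word_vecs.shape
    k = conv_W.shape[1] // D                     # convolution window width
    feats = [np.tanh(conv_W @ word_vecs[t:t + k].reshape(-1))
             for t in range(T - k + 1)]
    return np.max(np.stack(feats), axis=0)       # max pooling over positions

def pairwise_click_loss(q_click, q_noclick, site, conv_q, conv_s, margin=0.5):
    """Hinge loss pushing sim(clicked Query, site) above sim(non-clicked Query,
    site); the first CNN (conv_q) encodes queries, the second (conv_s) encodes
    the URL site name, mirroring the two models in the module above."""
    s_click = cosine(cnn_encode(q_click, conv_q), cnn_encode(site, conv_s))
    s_noclick = cosine(cnn_encode(q_noclick, conv_q), cnn_encode(site, conv_s))
    return max(0.0, margin - (s_click - s_noclick))
```

After this similarity training, the query-side encoder (`conv_q` here) plays the role of the search sequence CNN model whose hidden, convolutional and pooling parameters are transferred to the initial domain recognition model.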
Illustratively, the apparatus may further comprise an intent/slot identification model module, specifically to:
after the word vectors of the words contained in the labeled search sequence are determined, the hidden layer parameters in a bidirectional RNN language model, trained in advance according to the search sequences, are used as the hidden layer parameters in the initial intention recognition model and the initial slot position recognition model;
training the initial intention recognition model according to the intention labels of the labeled search sequences to determine the fully connected layer parameters in the initial intention recognition model, so as to obtain an intention recognition model; or training the initial slot position recognition model according to the slot position labels of the labeled search sequences to determine the fully connected layer parameters and the conditional random field layer parameters in the initial slot position recognition model, so as to obtain a slot position recognition model.
Further, the apparatus may further include a bidirectional RNN language model parameter module, specifically configured to:
determining a word vector of each word contained in the search sequence;
and taking the word vector of each word contained in the search sequence as the input of a bidirectional RNN language model, predicting the next word through a forward circulation neural network in the bidirectional RNN language model, predicting the previous word through a reverse circulation neural network, and adjusting the hidden layer parameter of the forward circulation neural network and the hidden layer parameter of the reverse circulation neural network in the bidirectional RNN language model according to the prediction result.
Illustratively, the apparatus may further include a word vector module, specifically configured to:
segmenting words of the search sequence or the URL site name to obtain all words contained in the search sequence or the URL site name;
and carrying out word, part of speech and named entity identification on each word contained in the search sequence or the URL site name to obtain a word vector of each word contained in the search sequence or the URL site name.
Illustratively, the domain identification model module may be specifically configured to:
performing virtual adversarial training on the initial domain recognition model according to the domain labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences and the word vectors of the words contained in the unlabeled search sequences, to determine the fully connected layer parameters in the initial domain recognition model, so as to obtain the domain recognition model.
Illustratively, the intention recognition model module may be specifically configured to:
performing virtual adversarial training on the initial intention recognition model according to the intention labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences and the word vectors of the words contained in the unlabeled search sequences, to determine the fully connected layer parameters in the initial intention recognition model, so as to obtain the intention recognition model.
For example, the slot identification model module may be specifically configured to:
performing virtual adversarial training on the initial slot position recognition model according to the slot position labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences and the word vectors of the words contained in the unlabeled search sequences, to determine the fully connected layer parameters and the conditional random field layer parameters in the initial slot position recognition model, so as to obtain the slot position recognition model.
The device for understanding the search sequence provided by the embodiment of the invention can execute the method for understanding the search sequence provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
Example six
Fig. 6 is a schematic structural diagram of an apparatus in a sixth embodiment of the present invention. Fig. 6 illustrates a block diagram of an exemplary device 612 suitable for use in implementing embodiments of the present invention. The device 612 shown in fig. 6 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present invention.
As shown in FIG. 6, device 612 is in the form of a general purpose computing device. Components of device 612 may include, but are not limited to: one or more processors 616, a system memory 628, and a bus 618 that couples various system components including the system memory 628 and the processors 616.
Bus 618 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor bus, or a local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Device 612 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by device 612 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 628 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)630 and/or cache memory 632. The device 612 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 634 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard disk drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be connected to bus 618 by one or more data media interfaces. Memory 628 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 640 having a set (at least one) of program modules 642 may be stored, for example, in memory 628, such program modules 642 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. The program modules 642 generally perform the functions and/or methods of the described embodiments of the present invention.
Device 612 may also communicate with one or more external devices 614 (e.g., keyboard, pointing device, display 624, etc.), with one or more devices that enable a user to interact with device 612, and/or with any devices (e.g., network card, modem, etc.) that enable device 612 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 622. Also, the device 612 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) through the network adapter 620. As shown, the network adapter 620 communicates with the other modules of the device 612 via the bus 618. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the device 612, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Processor 616 executes programs stored in system memory 628 to perform various functional applications and data processing, such as implementing a method for understanding a search sequence provided by embodiments of the present invention, the method comprising:
determining a word vector of each word contained in the labeled search sequence;
hidden layer parameters, convolutional layer parameters and pooling layer parameters in a search sequence CNN model which is obtained by training according to the URL site names and click search sequences and click-free search sequences of the URL site names in advance are used as hidden layer parameters, convolutional layer parameters and pooling layer parameters in an initial domain recognition model;
and training the initial domain recognition model according to the domain labels of the labeled search sequences and the word vectors of the words contained in the labeled search sequences, to determine the fully connected layer parameters in the initial domain recognition model, so as to obtain the domain recognition model.
Example seven
The seventh embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a method for understanding a search sequence according to the seventh embodiment of the present invention, where the method includes:
determining a word vector of each word contained in the labeled search sequence;
hidden layer parameters, convolutional layer parameters and pooling layer parameters in a search sequence CNN model which is obtained by training according to the URL site names and click search sequences and click-free search sequences of the URL site names in advance are used as hidden layer parameters, convolutional layer parameters and pooling layer parameters in an initial domain recognition model;
and training the initial domain recognition model according to the domain labels of the labeled search sequences and the word vectors of the words contained in the labeled search sequences, to determine the fully connected layer parameters in the initial domain recognition model, so as to obtain the domain recognition model.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (16)
1. A method for understanding a search sequence, comprising:
determining a word vector of each word contained in the labeled search sequence;
hidden layer parameters, convolutional layer parameters and pooling layer parameters in a search sequence CNN model which is obtained by training according to the URL site names and click search sequences and click-free search sequences of the URL site names in advance are used as hidden layer parameters, convolutional layer parameters and pooling layer parameters in an initial domain recognition model;
and training the initial domain recognition model according to the domain labels of the labeled search sequences and the word vectors of the words contained in the labeled search sequences, to determine the fully connected layer parameters in the initial domain recognition model, so as to obtain the domain recognition model.
2. The method of claim 1, wherein training the search sequence CNN model according to the URL site names and the click search sequences and click-free search sequences of the URL site names comprises:
acquiring each URL site name and a click search sequence and a click-free search sequence of each URL site name;
determining word vectors of words contained in the clicked search sequence, word vectors of words contained in the non-clicked search sequence and word vectors of words contained in the URL site names;
determining a clicked search vector according to the word vector of each word contained in the clicked search sequence by adopting a first CNN model, determining a non-clicked search vector according to the word vector of each word contained in the non-clicked search sequence by adopting the first CNN model, and determining a site name vector according to the word vector of each word contained in the URL site name by adopting a second CNN model;
and optimizing the first CNN model and the second CNN model according to a first similarity between the clicked search vector and the site name vector and a second similarity between the non-clicked search vector and the site name vector, and taking the optimized first CNN model as the search sequence CNN model.
3. The method of claim 1, wherein after determining the word vector for each word included in the labeled search sequence, further comprising:
hidden layer parameters in a bidirectional RNN language model obtained by training according to a search sequence are used as hidden layer parameters in an initial intention recognition model and an initial slot position recognition model;
training the initial intention recognition model according to the intention labels of the labeled search sequences to determine the fully connected layer parameters in the initial intention recognition model, so as to obtain an intention recognition model; or training the initial slot position recognition model according to the slot position labels of the labeled search sequences to determine the fully connected layer parameters and the conditional random field layer parameters in the initial slot position recognition model, so as to obtain a slot position recognition model.
4. The method of claim 3, wherein the bi-directional RNN language model trained according to the search sequence comprises:
determining a word vector of each word contained in the search sequence;
and taking the word vector of each word contained in the search sequence as the input of a bidirectional RNN language model, predicting the next word through a forward circulation neural network in the bidirectional RNN language model, predicting the previous word through a reverse circulation neural network, and adjusting the hidden layer parameter of the forward circulation neural network and the hidden layer parameter of the reverse circulation neural network in the bidirectional RNN language model according to the prediction result.
5. The method of claim 2 or claim 4, wherein determining a word vector of each word contained in a search sequence or a URL site name comprises:
performing word segmentation on the search sequence or the URL site name to obtain the words contained in the search sequence or the URL site name;
and performing word, part-of-speech and named entity recognition on each word contained in the search sequence or the URL site name to obtain the word vector of each word contained in the search sequence or the URL site name.
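Claim 5 builds each word vector from three signals: the word itself, its part-of-speech tag and its named-entity tag. One common realization is to embed each signal separately and concatenate; the patent does not fix the combination, so the sketch below (table sizes, tag sets and names all assumed) is only one plausible reading:

```python
import numpy as np

rng = np.random.default_rng(0)
WORD_DIM, POS_DIM, NER_DIM = 8, 3, 3  # illustrative embedding sizes
word_table = {w: rng.normal(size=WORD_DIM) for w in ["baidu", "map", "weather"]}
pos_table = {p: rng.normal(size=POS_DIM) for p in ["NOUN", "VERB", "X"]}
ner_table = {n: rng.normal(size=NER_DIM) for n in ["ORG", "O"]}

def word_vector(word, pos_tag, ner_tag):
    """Concatenate word, part-of-speech and named-entity embeddings
    into the per-word vector used by the recognition models."""
    return np.concatenate([word_table[word], pos_table[pos_tag], ner_table[ner_tag]])

vec = word_vector("baidu", "NOUN", "ORG")
```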
6. The method according to claim 1, wherein the training of the initial domain recognition model according to the domain labels of the labeled search sequences and the word vectors of the words contained in the labeled search sequences, to determine fully connected layer parameters in the initial domain recognition model so as to obtain the domain recognition model, comprises:
and performing virtual supervision training on the initial domain recognition model according to the domain labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences and the word vectors of the words contained in unlabeled search sequences, to determine the fully connected layer parameters in the initial domain recognition model so as to obtain the domain recognition model.
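The "virtual supervision training" of claims 6-8, which mixes labeled and unlabeled search sequences, most plausibly refers to consistency-based semi-supervised training such as virtual adversarial training: the model is penalized for changing its prediction on an *unlabeled* input under a small perturbation. The sketch below shows only that unsupervised consistency term, with a random perturbation direction (true virtual adversarial training picks the worst-case direction) and illustrative names and `eps` value:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def kl_div(p, q):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

def consistency_loss(logits_fn, x, eps=0.1, seed=0):
    """Penalize prediction change under a small perturbation of an
    unlabeled input; no label is needed, only the model itself."""
    rng = np.random.default_rng(seed)
    d = rng.normal(size=x.shape)
    d = eps * d / np.linalg.norm(d)  # unit direction scaled to eps
    return kl_div(softmax(logits_fn(x)), softmax(logits_fn(x + d)))
```

In training, this term over unlabeled search sequences would be added to the ordinary supervised loss over the labeled ones.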
7. The method of claim 4, wherein the training of the initial intention recognition model according to the intention labels of the labeled search sequences, to determine fully connected layer parameters in the initial intention recognition model so as to obtain the intention recognition model, comprises:
and performing virtual supervision training on the initial intention recognition model according to the intention labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences and the word vectors of the words contained in unlabeled search sequences, to determine the fully connected layer parameters in the initial intention recognition model so as to obtain the intention recognition model.
8. The method of claim 4, wherein the training of the initial slot recognition model according to the slot labels of the labeled search sequences, to determine fully connected layer parameters and conditional random field layer parameters in the initial slot recognition model so as to obtain the slot recognition model, comprises:
and performing virtual supervision training on the initial slot recognition model according to the slot labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences and the word vectors of the words contained in unlabeled search sequences, to determine the fully connected layer parameters and the conditional random field layer parameters in the initial slot recognition model so as to obtain the slot recognition model.
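The conditional random field layer in the slot recognition model scores whole tag sequences rather than individual tags; at inference, the best slot tagging is recovered with Viterbi decoding over emission and transition scores. A sketch of that decoding step (the scores are made up; learning the CRF layer parameters is omitted):

```python
import numpy as np

def viterbi(emissions, transitions):
    """Best tag path given per-token emission scores (T, K) and
    tag-to-tag transition scores (K, K)."""
    T, K = emissions.shape
    score = emissions[0].copy()          # best score ending in each tag
    back = np.zeros((T, K), dtype=int)   # backpointers
    for t in range(1, T):
        cand = score[:, None] + transitions + emissions[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):        # follow backpointers
        path.append(int(back[t][path[-1]]))
    return path[::-1]
```

With zero transition scores this reduces to a per-token argmax; non-zero transitions are what let the CRF layer enforce consistent slot label sequences.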
9. An apparatus for understanding a search sequence, comprising:
the word vector determining module is used for determining the word vectors of all words contained in the labeled search sequence;
the model parameter module is used for taking hidden layer parameters, convolutional layer parameters and pooling layer parameters of a search sequence CNN model, obtained in advance by training according to URL site names and the clicked and non-clicked search sequences of the URL site names, as hidden layer parameters, convolutional layer parameters and pooling layer parameters of the initial domain recognition model;
and the domain recognition model module is used for training the initial domain recognition model according to the domain labels of the labeled search sequences and the word vectors of the words contained in the labeled search sequences, to determine fully connected layer parameters in the initial domain recognition model so as to obtain the domain recognition model.
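The parameter-transfer step above is plain layer reuse: the pretrained search sequence CNN donates its hidden, convolutional and pooling layers, and only a fresh fully connected layer is left to be learned from domain labels. A minimal sketch with dictionaries standing in for model layers (all shapes and keys are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
# Pretrained search-sequence CNN: only these layers are carried over.
pretrained = {
    "hidden": rng.normal(size=(8, 8)),
    "conv": rng.normal(size=(3, 8)),
    "pool": {"kind": "max", "width": 2},
}

def init_domain_model(pretrained):
    """Initial domain recognition model: copies the reused layers from the
    search-sequence CNN; the fully connected layer starts fresh and is the
    only part determined by subsequent supervised training."""
    model = {k: np.copy(v) if isinstance(v, np.ndarray) else dict(v)
             for k, v in pretrained.items()}
    model["fc"] = np.zeros((4, 8))  # to be learned from domain labels
    return model

model = init_domain_model(pretrained)
```

Copying (rather than aliasing) the arrays keeps later fine-tuning of the domain model from silently mutating the pretrained CNN.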
10. The apparatus according to claim 9, further comprising a CNN model module, specifically configured to:
acquiring each URL site name and the clicked search sequence and non-clicked search sequence of each URL site name;
determining word vectors of words contained in the clicked search sequence, word vectors of words contained in the non-clicked search sequence and word vectors of words contained in the URL site names;
determining a clicked search vector according to the word vector of each word contained in the clicked search sequence by adopting a first CNN model, determining a non-clicked search vector according to the word vector of each word contained in the non-clicked search sequence by adopting the first CNN model, and determining a site name vector according to the word vector of each word contained in the URL site name by adopting a second CNN model;
and optimizing the first CNN model and the second CNN model according to a first similarity between the clicked search vector and the site name vector and a second similarity between the non-clicked search vector and the site name vector, and taking the optimized first CNN model as the search sequence CNN model.
11. The apparatus of claim 9, further comprising an intention/slot recognition model module, specifically configured to:
after the word vectors of the words contained in the labeled search sequence are determined, take hidden layer parameters of a bidirectional RNN language model, obtained in advance by training on search sequences, as hidden layer parameters of an initial intention recognition model and an initial slot recognition model;
train the initial intention recognition model according to the intention labels of the labeled search sequences to determine fully connected layer parameters in the initial intention recognition model, so as to obtain an intention recognition model; or train the initial slot recognition model according to the slot labels of the labeled search sequences to determine fully connected layer parameters and conditional random field layer parameters in the initial slot recognition model, so as to obtain a slot recognition model.
12. The apparatus according to claim 11, further comprising a bi-directional RNN language model parameter module, specifically configured to:
determining a word vector of each word contained in the search sequence;
and taking the word vector of each word contained in the search sequence as the input of the bidirectional RNN language model, predicting the next word through a forward recurrent neural network in the bidirectional RNN language model, predicting the previous word through a backward recurrent neural network, and adjusting the hidden layer parameters of the forward recurrent neural network and of the backward recurrent neural network in the bidirectional RNN language model according to the prediction results.
13. The apparatus according to claim 10 or claim 12, further comprising a word vector module, specifically configured to:
performing word segmentation on the search sequence or the URL site name to obtain the words contained in the search sequence or the URL site name;
and performing word, part-of-speech and named entity recognition on each word contained in the search sequence or the URL site name to obtain the word vector of each word contained in the search sequence or the URL site name.
14. The apparatus of claim 9, wherein the domain recognition model module is specifically configured to:
perform virtual supervision training on the initial domain recognition model according to the domain labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences and the word vectors of the words contained in unlabeled search sequences, to determine the fully connected layer parameters in the initial domain recognition model so as to obtain the domain recognition model.
15. An apparatus, characterized in that the apparatus comprises:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for understanding a search sequence according to any one of claims 1-8.
16. A computer-readable storage medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of understanding a search sequence according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201711248658.4A CN107832476B (en) | 2017-12-01 | 2017-12-01 | Method, device, equipment and storage medium for understanding search sequence |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107832476A true CN107832476A (en) | 2018-03-23 |
CN107832476B CN107832476B (en) | 2020-06-05 |
Family
ID=61647472
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201711248658.4A Active CN107832476B (en) | 2017-12-01 | 2017-12-01 | Method, device, equipment and storage medium for understanding search sequence |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107832476B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104615767A (en) * | 2015-02-15 | 2015-05-13 | 百度在线网络技术(北京)有限公司 | Searching-ranking model training method and device and search processing method |
CN106354852A (en) * | 2016-09-02 | 2017-01-25 | 北京百度网讯科技有限公司 | Search method and device based on artificial intelligence |
CN106407333A (en) * | 2016-09-05 | 2017-02-15 | 北京百度网讯科技有限公司 | Artificial intelligence-based spoken language query identification method and apparatus |
CN106649786A (en) * | 2016-12-28 | 2017-05-10 | 北京百度网讯科技有限公司 | Deep question answer-based answer retrieval method and device |
CN107256267A (en) * | 2017-06-19 | 2017-10-17 | 北京百度网讯科技有限公司 | Querying method and device |
Non-Patent Citations (2)
Title |
---|
Xi Xuefeng, Zhou Guodong: "Research on Deep Learning for Natural Language Processing", Acta Automatica Sinica * |
Wang Jimin, Li Leimingzi, Zheng Yufeng: "A Review of Mobile Search User Behavior Research Based on Log Mining", Information Studies: Theory & Application * |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108509596B (en) * | 2018-04-02 | 2021-06-04 | 广州市申迪计算机系统有限公司 | Text classification method and device, computer equipment and storage medium |
CN108509596A (en) * | 2018-04-02 | 2018-09-07 | 广州市申迪计算机系统有限公司 | File classification method, device, computer equipment and storage medium |
CN108874941B (en) * | 2018-06-04 | 2021-09-21 | 成都知道创宇信息技术有限公司 | Big data URL duplication removing method based on convolution characteristics and multiple Hash mapping |
CN108874941A (en) * | 2018-06-04 | 2018-11-23 | 成都知道创宇信息技术有限公司 | Big data URL De-weight method based on convolution feature and multiple Hash mapping |
CN108846126A (en) * | 2018-06-29 | 2018-11-20 | 北京百度网讯科技有限公司 | Generation, question and answer mode polymerization, device and the equipment of related question polymerization model |
CN108846126B (en) * | 2018-06-29 | 2021-07-27 | 北京百度网讯科技有限公司 | Generation of associated problem aggregation model, question-answer type aggregation method, device and equipment |
CN109165721A (en) * | 2018-07-02 | 2019-01-08 | 算丰科技(北京)有限公司 | Data processing method, data processing equipment and electronic equipment |
CN109165721B (en) * | 2018-07-02 | 2022-03-01 | 算丰科技(北京)有限公司 | Data processing method, data processing device and electronic equipment |
CN110751285A (en) * | 2018-07-23 | 2020-02-04 | 第四范式(北京)技术有限公司 | Training method and system and prediction method and system of neural network model |
CN110751285B (en) * | 2018-07-23 | 2024-01-23 | 第四范式(北京)技术有限公司 | Training method and system and prediction method and system for neural network model |
CN112602155A (en) * | 2018-08-27 | 2021-04-02 | 皇家飞利浦有限公司 | Generating metadata for a trained model |
CN111046662A (en) * | 2018-09-26 | 2020-04-21 | 阿里巴巴集团控股有限公司 | Training method, device and system of word segmentation model and storage medium |
CN111046662B (en) * | 2018-09-26 | 2023-07-18 | 阿里巴巴集团控股有限公司 | Training method, device and system of word segmentation model and storage medium |
WO2020107765A1 (en) * | 2018-11-30 | 2020-06-04 | 深圳前海微众银行股份有限公司 | Statement analysis processing method, apparatus and device, and computer-readable storage medium |
CN109597993A (en) * | 2018-11-30 | 2019-04-09 | 深圳前海微众银行股份有限公司 | Sentence analysis processing method, device, equipment and computer readable storage medium |
CN109710927B (en) * | 2018-12-12 | 2022-12-20 | 东软集团股份有限公司 | Named entity identification method and device, readable storage medium and electronic equipment |
CN109710927A (en) * | 2018-12-12 | 2019-05-03 | 东软集团股份有限公司 | Name recognition methods, device, readable storage medium storing program for executing and the electronic equipment of entity |
CN112036186A (en) * | 2019-06-04 | 2020-12-04 | 腾讯科技(深圳)有限公司 | Corpus labeling method and device, computer storage medium and electronic equipment |
CN110309514A (en) * | 2019-07-09 | 2019-10-08 | 北京金山数字娱乐科技有限公司 | A kind of method for recognizing semantics and device |
CN110647617A (en) * | 2019-09-29 | 2020-01-03 | 百度在线网络技术(北京)有限公司 | Training sample construction method of dialogue guide model and model generation method |
CN111523169A (en) * | 2020-04-24 | 2020-08-11 | 广东博智林机器人有限公司 | Decoration scheme generation method and device, electronic equipment and storage medium |
CN113378781A (en) * | 2021-06-30 | 2021-09-10 | 北京百度网讯科技有限公司 | Training method and device of video feature extraction model and electronic equipment |
CN117574878A (en) * | 2024-01-15 | 2024-02-20 | 西湖大学 | Component syntactic analysis method, device and medium for mixed field |
CN117574878B (en) * | 2024-01-15 | 2024-05-17 | 西湖大学 | Component syntactic analysis method, device and medium for mixed field |
Also Published As
Publication number | Publication date |
---|---|
CN107832476B (en) | 2020-06-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107832476B (en) | Method, device, equipment and storage medium for understanding search sequence | |
CN107679039B (en) | Method and device for determining statement intention | |
CN107291828B (en) | Spoken language query analysis method and device based on artificial intelligence and storage medium | |
CN110245348B (en) | Intention recognition method and system | |
CN112100356A (en) | Knowledge base question-answer entity linking method and system based on similarity | |
CN108549656B (en) | Statement analysis method and device, computer equipment and readable medium | |
CN112015859A (en) | Text knowledge hierarchy extraction method and device, computer equipment and readable medium | |
CN112836487B (en) | Automatic comment method and device, computer equipment and storage medium | |
CN110334186B (en) | Data query method and device, computer equipment and computer readable storage medium | |
CN113987169A (en) | Text abstract generation method, device and equipment based on semantic block and storage medium | |
CN113128431B (en) | Video clip retrieval method, device, medium and electronic equipment | |
CN116661805B (en) | Code representation generation method and device, storage medium and electronic equipment | |
CN113761868B (en) | Text processing method, text processing device, electronic equipment and readable storage medium | |
CN113656561A (en) | Entity word recognition method, apparatus, device, storage medium and program product | |
CN111078881A (en) | Fine-grained emotion analysis method and system, electronic equipment and storage medium | |
CN115238691A (en) | Knowledge fusion based embedded multi-intention recognition and slot filling model | |
Tarride et al. | A comparative study of information extraction strategies using an attention-based neural network | |
CN116541492A (en) | Data processing method and related equipment | |
CN116663539A (en) | Chinese entity and relationship joint extraction method and system based on Roberta and pointer network | |
CN115374786A (en) | Entity and relationship combined extraction method and device, storage medium and terminal | |
CN117093687A (en) | Question answering method and device, electronic equipment and storage medium | |
CN114742016A (en) | Chapter-level event extraction method and device based on multi-granularity entity differential composition | |
CN111460224B (en) | Comment data quality labeling method, comment data quality labeling device, comment data quality labeling equipment and storage medium | |
CN113468890A (en) | Sedimentology literature mining method based on NLP information extraction and part-of-speech rules | |
CN116680481A (en) | Search ranking method, apparatus, device, storage medium and computer program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||