CN107832476A - Method, device, equipment and storage medium for understanding a search sequence - Google Patents

Publication number: CN107832476A; granted publication: CN107832476B
Application number: CN201711248658.4A
Authority: CN (China); original language: Chinese (zh)
Inventors: 王硕寰, 孙宇, 于佃海
Applicant and current assignee: Beijing Baidu Netcom Science and Technology Co., Ltd.
Legal status: Granted; Active

Classifications

    • G06F16/334 — Query execution (Physics; Computing; Electric digital data processing; Information retrieval of unstructured textual data; Querying; Query processing)
    • G06F16/3329 — Natural language query formulation or dialogue systems (Querying; Query formulation)
    • G06F16/9566 — URL specific, e.g. using aliases, detecting broken or misspelled links (Retrieval from the web using information identifiers, e.g. uniform resource locators [URL])
    • G06N3/045 — Combinations of networks (Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology)
    • G06N3/084 — Backpropagation, e.g. using gradient descent (Neural networks; Learning methods)


Abstract

The embodiment of the invention discloses a method, device, equipment and storage medium for understanding a search sequence. The method includes: determining the word vector of each word contained in a labeled search sequence; taking the hidden layer parameters, convolutional layer parameters and pooling layer parameters in a search sequence CNN model, trained in advance according to each URL site name and the click search sequences and click-free search sequences of each URL site name, as the hidden layer parameters, convolutional layer parameters and pooling layer parameters in an initial domain identification model; and training the initial domain identification model according to the domain labels of the labeled search sequence and the word vectors of the words contained in the labeled search sequence, to determine the fully connected layer parameters in the initial domain identification model, so as to obtain the domain identification model. The scheme can improve the model capability and generalization capability under the condition of a small number of samples, optimize the trained model, and improve the understanding effect of the search sequence.

Description

Method, device, equipment and storage medium for understanding a search sequence
Technical Field
The embodiment of the invention relates to the technical field of information processing, in particular to a method, a device, equipment and a storage medium for understanding a search sequence.
Background
With the rapid development of Artificial Intelligence (AI) technology, more and more products and applications, such as intelligent customer service, intelligent assistants, vehicle navigation and smart homes, are beginning to introduce conversational human-machine interaction. In practice, however, developing a dialog system is a difficult task for most developers, and one of the main technical difficulties is the understanding of the search sequence (Query). The core task of Query understanding is to convert natural language into a machine-processable formal language and to establish a connection between natural language and resources and services.
Query understanding can be decomposed into three tasks: Domain identification (judging whether the Query belongs to the domain; if not, it is not analyzed), Intent classification (judging the subdivided intent of the Query within the domain) and Slot labeling (labeling the parameter information of concern in the Query under that intent). At present, Domain identification is mainly performed on the labeled samples of the field using the model structure of a Convolutional Neural Network (CNN), and joint Intent/Slot analysis is performed using the model structure of a Recurrent Neural Network (RNN) or a Recurrent Neural Network-Conditional Random Field (RNN-CRF).
However, the prior art has the following problems. 1) The cost of labeling data is high: developers need to label a large amount of data to train models that achieve an ideal Query understanding effect, and when the amount of labeled data is small, the effect of the model is limited. 2) The generalization ability of Query understanding models is not strong: a new Query may fail to be parsed if it is literally completely different from the Queries of the training set. For example, suppose a developer builds a Query understanding service for a snack vending machine and labels the Query "give me a bottle of cola", where the intent is "buy", the quantity is "one" and the merchandise is "cola". For a new Query such as "two bottles of Sprite", it is difficult to judge that its intent is also "buy", because none of its words has been seen in training; unless the developer collects and enters a domain dictionary, it is hard to discover that "Sprite" is a kind of merchandise just as "cola" is. 3) Besides labeled corpora, developers generally have a large amount of unlabeled corpora, which imply domain knowledge and common grammatical structures, but the existing technology cannot make use of them. 4) Query understanding corpora already exist in many other fields, and corpora in different fields have a certain similarity, but the current technology cannot migrate labeled corpora from other fields to optimize the Query understanding effect in a brand-new field.
Disclosure of Invention
The invention provides a method, a device, equipment and a storage medium for understanding a search sequence, which can improve the model capability and generalization capability when only a small number of labeled samples are available, optimize the trained model, and improve the Query understanding effect.
In a first aspect, an embodiment of the present invention provides a method for understanding a search sequence, including:
determining a word vector of each word contained in the labeled search sequence;
hidden layer parameters, convolutional layer parameters and pooling layer parameters in a search sequence convolutional neural network (CNN) model, which is trained in advance according to each Uniform Resource Locator (URL) site name and the click search sequences and click-free search sequences of each URL site name, are used as the hidden layer parameters, convolutional layer parameters and pooling layer parameters in an initial domain identification model;
and training the initial domain identification model according to the domain labels of the labeled search sequences and the word vectors of the words contained in the labeled search sequences to determine the fully connected layer parameters in the initial domain identification model, so as to obtain the domain identification model.
In a second aspect, an embodiment of the present invention further provides an apparatus for understanding a search sequence, including:
the word vector determining module is used for determining the word vectors of all words contained in the labeled search sequence;
the model parameter module is used for taking hidden layer parameters, convolutional layer parameters and pooling layer parameters in a search sequence CNN model which is obtained in advance according to the URL site names and the click search sequence and the click-free search sequence of the URL site names as hidden layer parameters, convolutional layer parameters and pooling layer parameters in the initial domain recognition model;
and the domain identification model module is used for training the initial domain identification model according to the domain labels of the labeled search sequences and the word vectors of the words contained in the labeled search sequences to determine the fully connected layer parameters in the initial domain identification model, so as to obtain the domain identification model.
In a third aspect, an embodiment of the present invention further provides an apparatus, where the apparatus includes:
one or more processors;
storage means for storing one or more programs;
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method for understanding a search sequence as described above.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method for understanding the search sequence as described above.
According to the embodiment of the invention, the bottom-layer parameters of the CNN-based domain identification model and of the RNN-based intent/slot recognition model are determined from a large number of search Queries and their corresponding click results, and the upper-layer parameters of the models are then determined using a small amount of labeled data. Because the bottom-layer parameters of the CNN and RNN models are large in scale, they are trained in advance by introducing unsupervised data without labeled results, and the upper-layer model parameters are then trained with a small amount of data with labeled results. Model training can thus be realized with a small amount of labeled data, the model capability and generalization capability under the condition of a small number of samples can be improved, the trained model is optimized, and the Query understanding effect is improved.
Drawings
FIG. 1 is a flowchart of a method for understanding a search sequence according to a first embodiment of the present invention;
FIG. 1a is a schematic diagram of a domain identification model according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a method for understanding a search sequence according to a second embodiment of the present invention;
FIG. 2a is a schematic diagram of domain identification model pre-training in a second embodiment of the present invention;
FIG. 3 is a flowchart of a method for understanding a search sequence according to a third embodiment of the present invention;
FIG. 3a is a schematic overall flow chart of a method for understanding a search sequence according to a third embodiment of the present invention;
FIG. 4 is a flowchart of a method for understanding a search sequence according to a fourth embodiment of the present invention;
FIG. 4a is a schematic diagram of pre-training of an intent/slot recognition model according to a fourth embodiment of the present invention;
FIG. 4b is a diagram illustrating an intent/slot identification model according to a fourth embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an understanding apparatus for a search sequence according to a fifth embodiment of the present invention;
FIG. 6 is a schematic structural diagram of an apparatus in a sixth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a method for understanding a search sequence in a first embodiment of the present invention, which is applicable to a case of understanding a search sequence in a specific field, and which can be executed by an apparatus for understanding a search sequence, and specifically includes the following steps:
step 110, determining a word vector of each word included in the labeled search sequence.
In this embodiment, the labeled search sequence refers to a manually labeled search sequence, i.e. one carrying a labeled result. Specifically, for a given field, the domain label of the search sequence may be the name of that field, such as the movie field, the traffic field, and the like.
The word vector may be a very long vector represented by one-hot encoding, whose dimension is the size of the vocabulary: most elements are 0, and only one dimension has the value 1, this dimension representing the current word. In deep learning, word vectors are generally represented with the distributed representation method, which uses a low-dimensional real-valued vector to represent each word; its advantage is that similar words are closer in distance, which can reflect the correlation between different words and thereby the dependency relationships between words. This embodiment adopts the distributed representation method to represent word vectors.
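The contrast between the two representations can be illustrated with a small sketch (hypothetical vocabulary, dimensions and random values; not part of the patent):

```python
import numpy as np

# Hypothetical 5-word vocabulary for illustration only.
vocab = {"buy": 0, "one": 1, "bottle": 2, "cola": 3, "sprite": 4}

# One-hot: dimension equals vocabulary size, a single element is 1.
def one_hot(word, vocab):
    v = np.zeros(len(vocab))
    v[vocab[word]] = 1.0
    return v

# Distributed representation: a small dense matrix maps each word to a
# low-dimensional real vector; similar words end up close together.
rng = np.random.default_rng(0)
embedding = rng.normal(size=(len(vocab), 8))  # 8 dims instead of |V|

def word_vector(word, vocab, embedding):
    return embedding[vocab[word]]

print(one_hot("cola", vocab))                 # mostly zeros, one 1
print(word_vector("cola", vocab, embedding))  # dense 8-dim vector
```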
And step 120, taking hidden layer parameters, convolutional layer parameters and pooling layer parameters in a search sequence CNN model obtained in advance according to the URL site names and the click search sequence and the click-free search sequence of the URL site names as hidden layer parameters, convolutional layer parameters and pooling layer parameters in the initial domain identification model.
Click behaviors between Queries and URLs are recorded in the search data, and by traversing all URLs the corresponding Queries can be recalled. If a user searches a Query, the URL is shown, and the user clicks the URL, the Query is recorded as a clicked Query of that URL; if the user does not click the URL, the Query is recorded as a no-click Query. In addition, other random search sequences in the search log can be used as click-free search sequences.
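By way of illustration, assembling (site name, clicked Query, no-click Query) training triples from such a log might look like the following sketch (the log records and field layout are hypothetical):

```python
from collections import defaultdict
import random

# Hypothetical search log records: (query, shown_url_site, clicked)
log = [
    ("cheap flights to beijing", "flights.ctrip.com", True),
    ("flight ticket price", "flights.ctrip.com", False),
    ("movie showtimes", "movie.example.com", True),
]

clicked = defaultdict(list)    # site name -> clicked Queries
no_click = defaultdict(list)   # site name -> no-click Queries
for query, site, was_clicked in log:
    (clicked if was_clicked else no_click)[site].append(query)

# Random Queries from the log can also serve as click-free sequences.
all_queries = [q for q, _, _ in log]
site = "flights.ctrip.com"
triple = (site, random.choice(clicked[site]),
          random.choice(no_click[site] or all_queries))
print(triple)
```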
The initial domain identification model is established on the basis of a CNN model and comprises, in order, an input layer, a hidden layer, a convolutional layer, a pooling layer, a dropout layer, a fully connected layer and an output layer; the parameters of the hidden layer, the convolutional layer and the pooling layer are determined in advance, while the parameters of the fully connected layer are unknown.
Specifically, in this embodiment, we consider that two different Queries may be related if the URLs or URL site names they click have textual similarity. Each URL site name in other fields, together with the click search sequence and click-free search sequence of that site name, is used to train a CNN model, which yields the bottom-layer parameters: the hidden layer parameters, convolutional layer parameters and pooling layer parameters. These are then used as the bottom-layer parameters of the initial domain identification model. The bottom-layer parameters of the CNN model are large in scale: each word is represented by a vector of several hundred dimensions, so with hundreds of thousands of words the bottom-layer parameters number in the hundreds of millions. The upper-layer parameters, namely the fully connected layer parameters, generally comprise only a matrix of several hundred by several hundred dimensions; since these are far fewer, they can be learned from a small amount of labeled data.
Step 130, training the initial domain identification model according to the domain labels of the labeled search sequences and the word vectors of the words contained in the labeled search sequences to determine parameters of the fully connected layer in the initial domain identification model so as to obtain a domain identification model.
Specifically, referring to fig. 1a, the word vector of each word included in a labeled search sequence is used as the input of the initial domain identification model. After the initial domain identification model performs bottom-layer processing on the word vectors through the Hidden Layer, the Convolution Layer and the Pooling Layer, it applies a Dropout Layer, that is, N dimensions are randomly selected (for example, half of a 256-dimensional vector), followed by a transformation through the Fully Connected Layer (FCL). The processing result of the fully connected layer is compared with the domain label of the search sequence, and the FCL parameters are adjusted according to the comparison result until an iteration condition is satisfied; the parameters of the fully connected layer are thereby obtained, that is, the training of the domain identification model is realized.
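A minimal PyTorch sketch of this procedure follows. It is illustrative only: the layer sizes, dropout rate, optimizer and dummy batch are assumptions rather than the patent's implementation. The pre-trained bottom layers (hidden, convolution; pooling has no weights) are frozen while the fully connected layer is trained on labeled data:

```python
import torch
import torch.nn as nn

class DomainModel(nn.Module):
    def __init__(self, emb_dim=256, hidden=256, n_domains=10):
        super().__init__()
        # Bottom layers: initialized from the pre-trained search sequence
        # CNN model and frozen during domain training.
        self.hidden = nn.Linear(emb_dim, hidden)
        self.conv = nn.Conv1d(hidden, hidden, kernel_size=3, padding=1)
        self.pool = nn.AdaptiveMaxPool1d(1)
        # Upper layers: trained from the small labeled set.
        self.dropout = nn.Dropout(0.5)
        self.fc = nn.Linear(hidden, n_domains)

    def forward(self, word_vecs):          # (batch, seq_len, emb_dim)
        h = torch.relu(self.hidden(word_vecs))
        h = self.conv(h.transpose(1, 2))   # (batch, hidden, seq_len)
        h = self.pool(h).squeeze(-1)       # (batch, hidden)
        return self.fc(self.dropout(h))

model = DomainModel()
# Freeze pre-trained bottom layers; only the fully connected layer learns.
for layer in (model.hidden, model.conv):
    for p in layer.parameters():
        p.requires_grad = False

opt = torch.optim.Adam(filter(lambda p: p.requires_grad, model.parameters()))
loss_fn = nn.CrossEntropyLoss()
word_vecs = torch.randn(4, 12, 256)        # dummy labeled batch
labels = torch.randint(0, 10, (4,))        # dummy domain labels
loss = loss_fn(model(word_vecs), labels)
loss.backward()
opt.step()
```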
The bottom layer parameters in the domain identification model can be determined through step 120, the parameters of the fully connected layer in the model can be determined through step 130, that is, all the parameters in the domain identification model are determined, the domain identification model is obtained, and the domain identification can be performed on the search sequence.
In this embodiment, a large number of Query and corresponding click results thereof are searched to determine the bottom layer parameters of the CNN-based domain identification model, and then a small amount of labeled data is used to determine the parameters of the fully connected layer of the domain identification model. Because the scale of the bottom layer parameters of the CNN model is large, the bottom layer parameters are trained in advance by introducing unsupervised data without labeling results, and then model parameters on the upper layer are trained by a small amount of data with labeling results, so that model training can be realized by a small amount of labeling data, the model capability and generalization capability under the condition of a small amount of samples can be improved, the training model is optimized, and the Query understanding effect is improved.
Example two
Fig. 2 is a flowchart of a method for understanding a search sequence according to a second embodiment of the present invention. The embodiment further optimizes the understanding method of the search sequence on the basis of the embodiment. Correspondingly, as shown in fig. 2, the method of the embodiment specifically includes:
step 210, determining a word vector of each word included in the labeled search sequence.
Step 220, obtaining the URL site names and the click search sequence and the click-free search sequence of the URL site names.
Wherein the URL site name is the combination of the server name and the domain name in the URL. For example, if the URL is http://flights.ctrip.com/fuzzy/#ctm_ref=ctr_nav_flt_fz_pgs, the server name of this URL is "flights", the domain name is "ctrip.com", and the site name is "flights.ctrip.com"; alternatively, the page title of flights.ctrip.com may be used as the site name.
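By way of illustration, such a site name could be derived from a URL as follows (a minimal sketch; the patent does not prescribe an implementation):

```python
from urllib.parse import urlparse

def site_name(url: str) -> str:
    """Return the server-name-plus-domain-name part of a URL."""
    host = urlparse(url).netloc   # e.g. "flights.ctrip.com"
    return host.split(":")[0]     # drop any port number

print(site_name("http://flights.ctrip.com/fuzzy/#ctm_ref=ctr_nav_flt_fz_pgs"))
# -> flights.ctrip.com
```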
Specifically, all URLs are traversed to obtain the URL site names and the click search sequence and the click-less search sequence of the URL site names.
Step 230, determining word vectors of words included in the clicked search sequence, word vectors of words included in the non-clicked search sequence, and word vectors of words included in the URL site names.
In this embodiment, the specific process of determining the word vector of each word included in a search sequence or a URL site name may be: segmenting the search sequence or the URL site name to obtain the words it contains; and performing word, part-of-speech and named-entity recognition on each word contained in the search sequence or the URL site name to obtain the word vector of each word. The word vector is determined by fusing features such as the word itself, its part of speech and its named-entity type.
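A sketch of this feature fusion is given below (the embedding dimensions, vocabulary sizes and the simple concatenation are assumptions for illustration):

```python
import torch
import torch.nn as nn

class FusedWordVector(nn.Module):
    """Concatenate word, part-of-speech and named-entity embeddings."""
    def __init__(self, n_words=100000, n_pos=30, n_ner=10):
        super().__init__()
        self.word_emb = nn.Embedding(n_words, 200)
        self.pos_emb = nn.Embedding(n_pos, 32)
        self.ner_emb = nn.Embedding(n_ner, 24)

    def forward(self, word_ids, pos_ids, ner_ids):
        return torch.cat([self.word_emb(word_ids),
                          self.pos_emb(pos_ids),
                          self.ner_emb(ner_ids)], dim=-1)  # 256-dim fused vector

fuse = FusedWordVector()
vec = fuse(torch.tensor([42]), torch.tensor([3]), torch.tensor([1]))
print(vec.shape)  # torch.Size([1, 256])
```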
Step 240, determining a clicked search vector according to the word vector of each word included in the clicked search sequence by using a first CNN model, determining a non-clicked search vector according to the word vector of each word included in the non-clicked search sequence by using the first CNN model, and determining a site name vector according to the word vector of each word included in the URL site name by using a second CNN model.
Specifically, referring to fig. 2a, a clicked search sequence QueryA and a click-free search sequence QueryB may share a first CNN model for training, so as to obtain a clicked search vector and a click-free search vector, respectively, while the URL site name is trained with a separate second CNN model to obtain a site name vector.
And step 250, optimizing the first CNN model and the second CNN model according to a first similarity between the clicked search vector and the site name vector and a second similarity between the click-free search vector and the site name vector, and taking the optimized first CNN model as the search sequence CNN model.
Specifically, referring to fig. 2a, the similarity between the site name vector and the clicked search vector and the similarity between the site name vector and the click-free search vector are calculated to obtain a first similarity Similar_Score(QueryA, URL) and a second similarity Similar_Score(QueryB, URL). The first CNN model and the second CNN model are then optimized by minimizing a loss function with the Back Propagation (BP) algorithm, and the optimized first CNN model is taken as the search sequence CNN model.
Wherein, the Loss function can be expressed as:

Loss = max(0, margin − Similar(V_clicked Q, V_T) + Similar(V_click-free Q, V_T))

wherein Similar(V_clicked Q, V_T) is the first similarity, Similar(V_click-free Q, V_T) is the second similarity, and margin is a constant.
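Assuming the reconstructed margin loss above and cosine similarity between vectors (both assumptions where the patent text is non-specific), one training step of the two CNN towers might be sketched as:

```python
import torch
import torch.nn.functional as F

def pairwise_loss(v_click, v_noclick, v_site, margin=0.1):
    """max(0, margin - Similar(clicked Q, T) + Similar(click-free Q, T))."""
    sim_click = F.cosine_similarity(v_click, v_site, dim=-1)      # first similarity
    sim_noclick = F.cosine_similarity(v_noclick, v_site, dim=-1)  # second similarity
    return torch.clamp(margin - sim_click + sim_noclick, min=0).mean()

# Dummy outputs standing in for the first CNN (query tower, shared by
# QueryA and QueryB) and the second CNN (site name tower).
v_click = torch.randn(8, 256, requires_grad=True)
v_noclick = torch.randn(8, 256, requires_grad=True)
v_site = torch.randn(8, 256, requires_grad=True)

loss = pairwise_loss(v_click, v_noclick, v_site)
loss.backward()   # BP would update both CNN towers to minimize the loss
```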
And step 260, taking hidden layer parameters, convolutional layer parameters and pooling layer parameters in a search sequence CNN model obtained in advance according to the URL site names and the click search sequence and the click-free search sequence of the URL site names as hidden layer parameters, convolutional layer parameters and pooling layer parameters in the initial domain identification model.
Specifically, the hidden layer parameter, the convolutional layer parameter, and the pooling layer parameter of the search sequence CNN model determined in step 250 are used as the hidden layer parameter, the convolutional layer parameter, and the pooling layer parameter in the initial domain identification model, that is, the bottom layer parameter of the initial domain identification model is determined.
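By way of illustration, this parameter migration can be sketched as a partial state-dict copy in PyTorch (a minimal sketch; the layer names, sizes and the ModuleDict stand-ins are assumptions, not the patent's implementation):

```python
import torch
import torch.nn as nn

def make_model():
    # Stand-in with the same layer names for both models (names assumed).
    return nn.ModuleDict({
        "hidden": nn.Linear(256, 256),
        "conv": nn.Conv1d(256, 256, 3, padding=1),
        "fc": nn.Linear(256, 10),
    })

pretrained = make_model()       # search sequence CNN model from step 250
domain_model = make_model()     # initial domain identification model

bottom = ("hidden", "conv")     # pooling carries no weights here
transfer = {k: v for k, v in pretrained.state_dict().items()
            if k.split(".")[0] in bottom}
domain_model.load_state_dict(transfer, strict=False)
# strict=False keeps the fully connected layer ("fc") randomly initialized,
# to be trained afterwards on the labeled search sequences.
```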
Step 270, training the initial domain identification model according to the domain labels of the labeled search sequences and the word vectors of the words contained in the labeled search sequences to determine parameters of the fully connected layer in the initial domain identification model so as to obtain a domain identification model.
Specifically, the bottom-layer parameters of the initial domain identification model in step 250 may be migrated as the bottom-layer parameters of the domain identification model. Virtual adversarial training is performed on the initial domain identification model according to the domain labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences and the word vectors of the words contained in the unlabeled search sequences, and the parameters of the fully connected layer are determined after the dropout layer and fully connected layer transformations. The bottom-layer parameters and the fully connected layer parameters of the domain identification model are thus obtained, that is, the domain identification model is obtained, and domain identification can be performed on the search sequence.
In this embodiment, a large number of Query and corresponding click results thereof are searched to determine the bottom layer parameters of the CNN-based domain identification model, and then a small amount of labeled data is used to determine the parameters of the fully connected layer of the domain identification model. Because the scale of the bottom layer parameters of the CNN model is large, the bottom layer parameters are trained in advance by introducing unsupervised data without labeling results, and then model parameters on the upper layer are trained by a small amount of data with labeling results, so that model training can be realized by a small amount of labeling data, the model capability and generalization capability under the condition of a small amount of samples can be improved, the training model is optimized, and the Query understanding effect is improved.
EXAMPLE III
Fig. 3 is a flowchart of a method for understanding a search sequence according to a third embodiment of the present invention. The present embodiment specifically explains the model determination of the field recognition, the intention recognition, and the slot recognition in the understanding method of the search sequence on the basis of the above-described embodiment. Correspondingly, the method of the embodiment specifically includes:
step 310, determining a word vector of each word included in the labeled search sequence.
And step 320, taking hidden layer parameters, convolutional layer parameters and pooling layer parameters in a search sequence CNN model obtained by training according to the URL site names and the click search sequence and the click-free search sequence of the URL site names in advance as hidden layer parameters, convolutional layer parameters and pooling layer parameters in the initial domain identification model.
Step 321, training the initial domain identification model according to the domain labels of the labeled search sequences and the word vectors of the words included in the labeled search sequences to determine the fully connected layer parameters in the initial domain identification model, so as to obtain the domain identification model.
Step 322, performing virtual adversarial training on the initial domain identification model according to the domain labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences and the word vectors of the words contained in the unlabeled search sequences, to determine the fully connected layer parameters in the initial domain identification model, so as to obtain the domain identification model.
And 330, taking hidden layer parameters in the bidirectional RNN language model obtained by training according to the search sequence in advance as hidden layer parameters in the initial intention recognition model and the initial slot position recognition model.
Step 331, training the initial intention recognition model according to the intention labels of the labeled search sequences to determine the fully connected layer parameters in the initial intention recognition model, so as to obtain the intention recognition model; or training the initial slot position recognition model according to the slot position labels of the labeled search sequences to determine the fully connected layer parameters and the conditional random field layer parameters in the initial slot position recognition model, so as to obtain the slot position recognition model.
Step 332, performing virtual adversarial training on the initial intention recognition model according to the intention labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences and the word vectors of the words contained in the unlabeled search sequences, to determine the fully connected layer parameters in the initial intention recognition model, so as to obtain the intention recognition model.
Step 333, performing virtual adversarial training on the initial slot position recognition model according to the slot position labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences and the word vectors of the words contained in the unlabeled search sequences, to determine the fully connected layer parameters and the conditional random field layer parameters in the initial slot position recognition model, so as to obtain the slot position recognition model.
It should be noted that step 320 and step 330 are independent of each other and have no required order; they may also be executed in parallel.
In this embodiment, if there are unlabeled search sequences in the field, it is preferable to adopt the virtual adversarial training technique (Virtual Adversarial Training) to introduce the unsupervised data together with the labeled data for semi-supervised training; the virtual adversarial training technique can also be adopted for semi-supervised training on the labeled search sequences. Referring to step 322, step 332 and step 333, the virtual adversarial training technique is adopted to obtain the domain identification model, the intention recognition model and the slot position recognition model, respectively. For vertical-domain data without labeled results, the perturbation direction that disturbs the predicted domain, intention and slot position probability distributions the most is found, and a loss function is minimized so that the recognition result after perturbation differs as little as possible from the recognition result of the original sample. Wherein the minimized loss function can be expressed as:
Loss_v-adv = (1/N′) Σ_s KL( p(· | s) ‖ p(· | s + r_v-adv) ), with r_v-adv = argmax_{d, ‖d‖ ≤ ε} KL( p(· | s) ‖ p(· | s + d) )

wherein s represents a sample, d represents a perturbation direction, p is the probability distribution of the domain, intention or slot position, KL is the KL divergence between two probability distributions, and r_v-adv is the direction in which the KL divergence changes most, solved approximately by taking derivatives; N′ is the total number of labeled and unlabeled samples.
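A simplified sketch of one virtual adversarial training step follows (illustrative only: the one-step gradient approximation of r_v-adv, the constants xi and eps, and the stand-in linear model are assumptions):

```python
import torch
import torch.nn.functional as F

def vat_loss(model, x, xi=1e-6, eps=1.0):
    """KL( p(.|x) || p(.|x + r_v-adv) ) with a one-step approximation."""
    with torch.no_grad():
        p = F.softmax(model(x), dim=-1)
    d = torch.randn_like(x)
    d = xi * d / d.norm(dim=-1, keepdim=True)
    d.requires_grad_()
    kl = F.kl_div(F.log_softmax(model(x + d), dim=-1), p, reduction="batchmean")
    grad = torch.autograd.grad(kl, d)[0]
    r_vadv = eps * grad / grad.norm(dim=-1, keepdim=True)  # worst-case direction
    return F.kl_div(F.log_softmax(model(x + r_vadv), dim=-1), p,
                    reduction="batchmean")

model = torch.nn.Linear(256, 5)   # stand-in recognition model
x = torch.randn(8, 256)           # unlabeled samples: no labels needed
loss = vat_loss(model, x)
loss.backward()                   # minimized jointly with the supervised loss
```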
Fig. 3a is a schematic overall flow chart of a method for understanding a search sequence in the third embodiment of the present invention. In system integration, word vectors of clicked Queries and no-click Queries are obtained from a large amount of search Query data and the corresponding click behavior results; a CNN multi-feature classification model can be obtained from these word vectors, and a CNN Domain model can then be obtained from the labeled data and the CNN multi-feature classification model. Meanwhile, a Bidirectional Recurrent Neural Network (Bi-RNN) multi-feature language model can be trained from the search Query data, and an Intent/Slot model of a Bidirectional Recurrent Neural Network-Conditional Random Field (Bi-RNN-CRF) can then be obtained from the labeled data and the Bi-RNN multi-feature language model. If unlabeled data also exists, the virtual adversarial training technique can be adopted to introduce the unsupervised data together with the labeled data for semi-supervised training, so as to obtain the CNN Domain model and the Bi-RNN-CRF Intent/Slot model. The CNN Domain model and the Bi-RNN-CRF Intent/Slot model are then used by users.
According to the embodiment of the invention, the bottom-layer parameters of the CNN-based domain identification model and of the intention/slot position recognition model based on the bidirectional RNN language model are determined from a large number of search Queries and their corresponding click results, and the upper-layer parameters of the domain identification model and the intention/slot position recognition model are then determined using a small amount of labeled data; in addition, semi-supervised virtual adversarial training can be performed on the unlabeled and labeled search sequences. Because the bottom-layer parameters are trained in advance by introducing unsupervised data without labeled results, and the upper-layer model parameters are trained with a small amount of data with labeled results, model training can be realized with a small amount of labeled data, and the model capability and generalization capability under the condition of a small number of samples can be improved; by adopting the virtual adversarial training technique, the influence of small feature differences on the result can be reduced and smoothness increased, the trained model is optimized, and the Query understanding effect is improved.
Example four
Fig. 4 is a flowchart of a method for understanding a search sequence in the fourth embodiment of the present invention. On the basis of the above embodiments, the present embodiment further optimizes the model determination of intent recognition and slot position recognition in the above understanding method of the search sequence. Correspondingly, the method of the embodiment specifically includes:
step 410, determining a word vector of each word included in the labeled search sequence.
Step 420, determine the word vector for each word included in the search sequence.
In this embodiment, the specific process of determining the word vector of each word included in the search sequence may be: segmenting the search sequence to obtain the words it contains; and performing word, part-of-speech and named-entity recognition on each word contained in the search sequence to obtain the word vector of each word. The word vector is determined by fusing features such as the word itself, its part of speech and its named-entity type.
And step 430, taking the word vector of each word contained in the search sequence as the input of the bidirectional RNN language model, predicting the next word through a forward recurrent neural network in the bidirectional RNN language model, predicting the previous word through a reverse recurrent neural network, and adjusting hidden layer parameters of the forward recurrent neural network and hidden layer parameters of the reverse recurrent neural network in the bidirectional RNN language model according to the prediction result.
Specifically, referring to fig. 4a, the word vector of each word contained in the search sequence is used as the input of the bidirectional RNN language model and is processed by an Embedding Layer; the next word is then predicted by the forward recurrent neural network in the bidirectional RNN language model layer (RNN Layer) and the previous word by the reverse recurrent neural network, and the hidden layer parameters of the forward recurrent neural network and of the reverse recurrent neural network in the bidirectional RNN language model are adjusted according to the prediction results. The hidden layer parameters of the forward recurrent neural network and the reverse recurrent neural network are spliced to obtain the hidden layer parameters of the bidirectional RNN language model. The bidirectional RNN language model can be optimized by the BP algorithm.
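A hedged sketch of this pre-training objective follows (the use of LSTM cells, the vocabulary size and the shared output projection are assumptions; the patent specifies only forward and reverse recurrent networks predicting the next and previous word):

```python
import torch
import torch.nn as nn

class BiRNNLanguageModel(nn.Module):
    def __init__(self, vocab=50000, emb=256, hidden=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, emb)
        self.fwd = nn.LSTM(emb, hidden, batch_first=True)  # predicts next word
        self.bwd = nn.LSTM(emb, hidden, batch_first=True)  # predicts previous word
        self.out = nn.Linear(hidden, vocab)

    def forward(self, ids):                    # (batch, seq_len)
        e = self.emb(ids)
        h_fwd, _ = self.fwd(e)                 # left-to-right states
        h_bwd, _ = self.bwd(e.flip(1))         # right-to-left states
        # Next-word loss: state at position t predicts token t+1.
        loss_f = nn.functional.cross_entropy(
            self.out(h_fwd[:, :-1]).flatten(0, 1), ids[:, 1:].flatten())
        # Previous-word loss on the reversed sequence.
        rev = ids.flip(1)
        loss_b = nn.functional.cross_entropy(
            self.out(h_bwd[:, :-1]).flatten(0, 1), rev[:, 1:].flatten())
        return loss_f + loss_b

    def hidden_states(self, ids):
        e = self.emb(ids)
        h_fwd, _ = self.fwd(e)
        h_bwd, _ = self.bwd(e.flip(1))
        # Concatenate forward and backward representations per word.
        return torch.cat([h_fwd, h_bwd.flip(1)], dim=-1)  # (batch, len, 512)

lm = BiRNNLanguageModel()
ids = torch.randint(0, 50000, (4, 10))
lm(ids).backward()   # one unsupervised pre-training step on search Queries
```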
And 440, taking hidden layer parameters in the bidirectional RNN language model obtained by training according to the search sequence in advance as hidden layer parameters in the initial intention recognition model and the initial slot position recognition model.
The initial intention recognition model comprises an input layer, a hidden layer, a word representation layer, a dropout layer, a sequence representation layer, a fully connected layer and an output layer, wherein the sequence representation layer splices the word representations output by the dropout layer to obtain an overall representation of the sequence; the initial slot position recognition model comprises an input layer, a hidden layer, a word representation layer, a dropout layer, a fully connected layer, a conditional random field layer and an output layer. The parameters of the hidden layer, the word representation layer and the dropout layer are determined, while the parameters of the fully connected layer and the conditional random field layer are unknown.
Specifically, the bidirectional RNN language model is trained on search sequences in step 430 to obtain the bottom-layer parameters, i.e. the hidden layer parameters, and these bottom-layer parameters are used as the bottom-layer parameters of the initial intention recognition model and the initial slot position recognition model.
Step 450, training the initial intention recognition model according to the intention labels of the labeled search sequences to determine the fully connected layer parameters in the initial intention recognition model, so as to obtain the intention recognition model; or training the initial slot position recognition model according to the slot position labels of the labeled search sequences to determine the fully connected layer parameters and the conditional random field layer parameters in the initial slot position recognition model, so as to obtain the slot position recognition model.
Specifically, referring to fig. 4b, the labeled search sequence is used as the input of the initial intention recognition model, with its intention label as the supervision target. After bottom-layer processing through the hidden layer, the word representation layer and the dropout layer, the initial intention recognition model passes through the sequence representation layer, a fully connected layer transformation and a Softmax classification function, whereby the parameters of the fully connected layer can be determined, that is, the training of the intention recognition model is realized.
Or, referring to fig. 4b, the labeled search sequence is used as the input of the initial slot position recognition model, with its slot position labels as the supervision target. After bottom-layer processing through the hidden layer, the word representation layer and the dropout layer, the initial slot position recognition model passes through the Conditional Random Field layer (CRF Layer), which models the start probability (a), the transition probability (w) and the end probability (b) of the slot labels; for one labeling result y1, …, yn, the score is a_{y1} + Σ_{i=1}^{n−1} w_{y_i, y_{i+1}} + b_{y_n}, from which the CRF parameters are obtained. The parameters of the fully connected layer can be determined through the fully connected layer transformation, thereby realizing the training of the slot position recognition model.
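The scoring of one labeling result by the CRF layer can be illustrated as follows (a minimal sketch; random scores stand in for the learned a, w and b):

```python
import torch

n_labels = 5                          # hypothetical slot label set size
a = torch.randn(n_labels)             # start scores a
w = torch.randn(n_labels, n_labels)   # transition scores w[y_i, y_{i+1}]
b = torch.randn(n_labels)             # end scores b

def crf_label_score(labels):
    """Score of one labeling result: a[y1] + sum of transitions + b[yn]."""
    score = a[labels[0]] + b[labels[-1]]
    for i in range(len(labels) - 1):
        score = score + w[labels[i], labels[i + 1]]
    return score

labels = [1, 2, 2, 0, 3]              # one candidate slot labeling y1..yn
print(crf_label_score(labels))
```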
According to the embodiment of the invention, the bottom-layer parameters of the intention/slot position recognition model based on the bidirectional RNN language model are determined from a large number of search Queries and their corresponding click results, and the upper-layer parameters of the domain identification model and the intention/slot position recognition model are then determined using a small amount of labeled data. Because the bottom-layer parameters of the RNN model are large in scale, they are trained in advance by introducing unsupervised data without labeled results, and the upper-layer model parameters are then trained with a small amount of data with labeled results; model training can thus be realized with a small amount of labeled data, the model capability and generalization capability under the condition of a small number of samples can be improved, the trained model is optimized, and the Query understanding effect is improved.
EXAMPLE five
Fig. 5 is a schematic structural diagram of an apparatus for understanding a search sequence in the fifth embodiment of the present invention, where the apparatus may include:
a word vector determining module 510, configured to determine a word vector of each word included in the labeled search sequence;
a model parameter module 520, configured to use hidden layer parameters, convolutional layer parameters, and pooling layer parameters in a search sequence CNN model obtained in advance according to each URL site name and a click search sequence and a click-free search sequence of each URL site name as hidden layer parameters, convolutional layer parameters, and pooling layer parameters in the initial domain identification model;
a domain identification model module 530, configured to train the initial domain identification model according to the domain labels of the labeled search sequences and the word vectors of the words included in the labeled search sequences to determine parameters of the fully connected layer in the initial domain identification model, so as to obtain a domain identification model.
Illustratively, the apparatus may further include a CNN model module, specifically configured to:
acquiring each URL site name and a click search sequence and a click-free search sequence of each URL site name;
determining word vectors of words contained in the clicked search sequence, word vectors of words contained in the non-clicked search sequence and word vectors of words contained in the URL site names;
determining a clicked search vector according to the word vector of each word contained in the clicked search sequence by adopting a first CNN model, determining a non-clicked search vector according to the word vector of each word contained in the non-clicked search sequence by adopting the first CNN model, and determining a site name vector according to the word vector of each word contained in the URL site name by adopting a second CNN model;
and optimizing the first CNN model and the second CNN model according to a first similarity between the clicked search vector and the site name vector and a second similarity between the non-clicked search vector and the site name vector, and taking the optimized first CNN model as the search sequence CNN model.
Illustratively, the apparatus may further comprise an intent/slot identification model module, specifically to:
after the word vectors of the words contained in the labeled search sequence are determined, taking the hidden layer parameters of a bidirectional RNN language model trained in advance on search sequences as the hidden layer parameters of the initial intention recognition model and the initial slot position recognition model;
training the initial intention recognition model according to the intention labels of the labeled search sequences to determine the fully connected layer parameters in the initial intention recognition model, so as to obtain the intention recognition model; or training the initial slot position recognition model according to the slot position labels of the labeled search sequences to determine the fully connected layer parameters and the conditional random field layer parameters in the initial slot position recognition model, so as to obtain the slot position recognition model.
Further, the apparatus may further include a bidirectional RNN language model parameter module, specifically configured to:
determining a word vector of each word contained in the search sequence;
and taking the word vector of each word contained in the search sequence as the input of the bidirectional RNN language model, predicting the next word through the forward recurrent neural network in the bidirectional RNN language model and the previous word through the reverse recurrent neural network, and adjusting the hidden layer parameters of the forward recurrent neural network and of the reverse recurrent neural network in the bidirectional RNN language model according to the prediction results.
Illustratively, the apparatus may further include a word vector module, specifically configured to:
segmenting words of the search sequence or the URL site name to obtain all words contained in the search sequence or the URL site name;
and performing word, part-of-speech and named-entity recognition on each word contained in the search sequence or the URL site name to obtain the word vector of each word contained in the search sequence or the URL site name.
Illustratively, the domain identification model module may be specifically configured to:
and performing virtual adversarial training on the initial domain identification model according to the domain labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences and the word vectors of the words contained in the unlabeled search sequences, to determine the fully connected layer parameters in the initial domain identification model, so as to obtain the domain identification model.
Illustratively, the intention recognition model module may be specifically configured to:
and performing virtual adversarial training on the initial intention recognition model according to the intention labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences and the word vectors of the words contained in the unlabeled search sequences, to determine the fully connected layer parameters in the initial intention recognition model, so as to obtain the intention recognition model.
For example, the slot identification model module may be specifically configured to:
and performing virtual adversarial training on the initial slot position recognition model according to the slot position labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences and the word vectors of the words contained in the unlabeled search sequences, to determine the fully connected layer parameters and the conditional random field layer parameters in the initial slot position recognition model, so as to obtain the slot position recognition model.
The device for understanding the search sequence provided by the embodiment of the invention can execute the method for understanding the search sequence provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
EXAMPLE six
Fig. 6 is a schematic structural diagram of an apparatus in a sixth embodiment of the present invention. Fig. 6 illustrates a block diagram of an exemplary device 612 suitable for use in implementing embodiments of the present invention. The device 612 shown in fig. 6 is only an example and should not bring any limitations to the functionality or scope of use of the embodiments of the present invention.
As shown in FIG. 6, device 612 is in the form of a general purpose computing device. Components of device 612 may include, but are not limited to: one or more processors 616, a system memory 628, and a bus 618 that couples various system components including the system memory 628 and the processors 616.
Bus 618 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Device 612 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by device 612 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 628 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)630 and/or cache memory 632. The device 612 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 634 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 6, commonly referred to as a "hard disk drive"). Although not shown in FIG. 6, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be connected to bus 618 by one or more data media interfaces. Memory 628 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 640 having a set (at least one) of program modules 642 may be stored, for example, in memory 628, such program modules 642 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. The program modules 642 generally perform the functions and/or methods of the described embodiments of the present invention.
Device 612 may also communicate with one or more external devices 614 (e.g., keyboard, pointing device, display 624, etc.), with one or more devices that enable a user to interact with device 612, and/or with any devices (e.g., network card, modem, etc.) that enable device 612 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 622. Also, the device 612 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) through the network adapter 620. As shown, the network adapter 620 communicates with the other modules of the device 612 via the bus 618. It should be appreciated that although not shown, other hardware and/or software modules may be used in conjunction with the device 612, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Processor 616 executes programs stored in system memory 628 to perform various functional applications and data processing, such as implementing a method for understanding a search sequence provided by embodiments of the present invention, the method comprising:
determining a word vector of each word contained in the labeled search sequence;
hidden layer parameters, convolutional layer parameters and pooling layer parameters in a search sequence CNN model which is obtained by training according to the URL site names and click search sequences and click-free search sequences of the URL site names in advance are used as hidden layer parameters, convolutional layer parameters and pooling layer parameters in an initial domain recognition model;
and training the initial domain recognition model according to the domain labels of the labeled search sequences and the word vectors of the words contained in the labeled search sequences to determine the fully connected layer parameters in the initial domain recognition model, so as to obtain the domain recognition model.
EXAMPLE seven
The seventh embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a method for understanding a search sequence according to the seventh embodiment of the present invention, where the method includes:
determining a word vector of each word contained in the labeled search sequence;
hidden layer parameters, convolutional layer parameters and pooling layer parameters in a search sequence CNN model which is obtained by training according to the URL site names and click search sequences and click-free search sequences of the URL site names in advance are used as hidden layer parameters, convolutional layer parameters and pooling layer parameters in an initial domain recognition model;
and training the initial domain recognition model according to the domain labels of the labeled search sequences and the word vectors of the words contained in the labeled search sequences to determine the fully connected layer parameters in the initial domain recognition model, so as to obtain the domain recognition model.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (16)

1. A method for understanding a search sequence, comprising:
determining a word vector of each word contained in the labeled search sequence;
taking hidden layer parameters, convolutional layer parameters and pooling layer parameters of a search sequence CNN model, obtained by training in advance according to URL site names and the clicked search sequences and non-clicked search sequences of the URL site names, as hidden layer parameters, convolutional layer parameters and pooling layer parameters of an initial domain recognition model;
and training the initial domain recognition model according to the domain labels of the labeled search sequences and the word vectors of the words contained in the labeled search sequences, to determine the parameters of a fully connected layer in the initial domain recognition model and thereby obtain a domain recognition model.
2. The method of claim 1, wherein training the search sequence CNN model according to the URL site names and the clicked search sequences and non-clicked search sequences of the URL site names comprises:
acquiring each URL site name and the clicked search sequences and non-clicked search sequences of each URL site name;
determining word vectors of words contained in the clicked search sequence, word vectors of words contained in the non-clicked search sequence and word vectors of words contained in the URL site names;
determining a clicked search vector from the word vectors of the words contained in the clicked search sequence using a first CNN model, determining a non-clicked search vector from the word vectors of the words contained in the non-clicked search sequence using the first CNN model, and determining a site name vector from the word vectors of the words contained in the URL site name using a second CNN model;
and optimizing the first CNN model and the second CNN model according to a first similarity between the clicked search vector and the site name vector and a second similarity between the non-clicked search vector and the site name vector, and taking the optimized first CNN model as the search sequence CNN model.
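Claim 2 describes a two-tower training scheme over click logs. A minimal sketch of one plausible objective follows, for illustration only: the claim fixes only that the two models are optimized according to the two similarities, so the margin-based loss, the cosine similarity measure, and all names below are assumptions (the encoders are instances of the SearchSequenceCNN sketched earlier).

```python
import torch
import torch.nn.functional as F

def click_pair_loss(first_cnn, second_cnn, clicked_vecs, non_clicked_vecs,
                    site_name_vecs, margin=0.3):
    """Margin loss over the two similarities named in the claim."""
    q_pos = first_cnn(clicked_vecs)           # clicked search vector
    q_neg = first_cnn(non_clicked_vecs)       # non-clicked search vector
    t = second_cnn(site_name_vecs)            # site name vector
    sim_pos = F.cosine_similarity(q_pos, t)   # first similarity
    sim_neg = F.cosine_similarity(q_neg, t)   # second similarity
    # push the clicked similarity above the non-clicked similarity by the margin
    return F.relu(margin - sim_pos + sim_neg).mean()
```

Backpropagating this loss updates both towers jointly; after convergence the first CNN is kept as the search sequence CNN model, as the claim states.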
3. The method of claim 1, further comprising, after determining the word vector of each word contained in the labeled search sequence:
taking hidden layer parameters of a bidirectional RNN language model, obtained by training in advance on search sequences, as hidden layer parameters of an initial intention recognition model and an initial slot position recognition model;
training the initial intention recognition model according to the intention labels of the labeled search sequences to determine the parameters of a fully connected layer in the initial intention recognition model, so as to obtain an intention recognition model; or training the initial slot position recognition model according to the slot position labels of the labeled search sequences to determine the fully connected layer parameters and conditional random field layer parameters in the initial slot position recognition model, so as to obtain a slot position recognition model.
4. The method of claim 3, wherein training the bidirectional RNN language model according to the search sequences comprises:
determining a word vector of each word contained in the search sequence;
and taking the word vectors of the words contained in the search sequence as the input of the bidirectional RNN language model, predicting the next word through a forward recurrent neural network in the bidirectional RNN language model, predicting the previous word through a backward recurrent neural network, and adjusting the hidden layer parameters of the forward recurrent neural network and of the backward recurrent neural network in the bidirectional RNN language model according to the prediction results.
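A compact PyTorch sketch of claim 4's language model follows, for illustration only; the GRU cells, dimensions and loss bookkeeping are assumptions (the claim specifies only a forward network predicting the next word and a backward network predicting the previous word).

```python
import torch
import torch.nn as nn

class BiRNNLanguageModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.fwd = nn.GRU(emb_dim, hidden_dim, batch_first=True)   # forward recurrent network
        self.bwd = nn.GRU(emb_dim, hidden_dim, batch_first=True)   # backward recurrent network
        self.out_fwd = nn.Linear(hidden_dim, vocab_size)
        self.out_bwd = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token_ids):               # token_ids: (batch, seq_len)
        emb = self.embed(token_ids)
        h_fwd, _ = self.fwd(emb)                # left-to-right hidden states
        h_bwd, _ = self.bwd(emb.flip(1))        # right-to-left hidden states
        return self.out_fwd(h_fwd), self.out_bwd(h_bwd)

def language_model_loss(model, token_ids):
    logits_fwd, logits_bwd = model(token_ids)
    ce = nn.CrossEntropyLoss()
    vocab = logits_fwd.size(-1)
    # the forward network at position t predicts the word at t+1 ...
    loss_fwd = ce(logits_fwd[:, :-1].reshape(-1, vocab), token_ids[:, 1:].reshape(-1))
    # ... and the backward network, reading right to left, predicts the word at t-1
    rev = token_ids.flip(1)
    loss_bwd = ce(logits_bwd[:, :-1].reshape(-1, vocab), rev[:, 1:].reshape(-1))
    return loss_fwd + loss_bwd
```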
5. The method of claim 2 or claim 4, wherein determining a word vector for each word contained in a search sequence or URL site name comprises:
performing word segmentation on the search sequence or the URL site name to obtain the words contained in the search sequence or the URL site name;
and performing word, part-of-speech and named entity recognition on each word contained in the search sequence or the URL site name to obtain the word vector of each word contained in the search sequence or the URL site name.
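Claim 5 builds each word vector from the word itself together with its part-of-speech and named-entity features. One plausible reading, concatenating three learned embeddings, is sketched below; the claim does not fix how the features are combined, so the concatenation and all dimensions are assumptions.

```python
import torch
import torch.nn as nn

class FeatureWordEmbedding(nn.Module):
    """Concatenates word, part-of-speech and named-entity embeddings into one word vector."""
    def __init__(self, vocab_size, num_pos_tags, num_ner_tags,
                 word_dim=100, pos_dim=16, ner_dim=16):
        super().__init__()
        self.word = nn.Embedding(vocab_size, word_dim)
        self.pos = nn.Embedding(num_pos_tags, pos_dim)
        self.ner = nn.Embedding(num_ner_tags, ner_dim)

    def forward(self, word_ids, pos_ids, ner_ids):   # each: (batch, seq_len)
        return torch.cat([self.word(word_ids),
                          self.pos(pos_ids),
                          self.ner(ner_ids)], dim=-1)   # (batch, seq_len, 132)
```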
6. The method of claim 1, wherein training the initial domain recognition model according to the domain labels of the labeled search sequences and the word vectors of the words contained in the labeled search sequences to determine the parameters of the fully connected layer in the initial domain recognition model, so as to obtain the domain recognition model, comprises:
performing virtual supervision training on the initial domain recognition model according to the domain labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences, and the word vectors of the words contained in unlabeled search sequences, to determine the parameters of the fully connected layer in the initial domain recognition model and thereby obtain the domain recognition model.
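The "virtual supervision training" of claim 6 mixes labeled and unlabeled search sequences, but the claim does not spell out the objective. The sketch below shows one semi-supervised scheme in that spirit, a consistency loss that penalizes prediction changes under small perturbations of the unlabeled inputs; it should be read as an assumption about the intended procedure, not a statement of it. The model here is the DomainRecognitionModel sketched earlier, operating on word-vector inputs.

```python
import torch
import torch.nn.functional as F

def semi_supervised_step(model, labeled_x, labels, unlabeled_x,
                         noise_scale=0.01, lam=1.0):
    # supervised loss on the labeled search sequences
    sup = F.cross_entropy(model(labeled_x), labels)
    # consistency loss on the unlabeled search sequences: predictions should be
    # stable under a small random perturbation of the input word vectors
    with torch.no_grad():
        target = F.softmax(model(unlabeled_x), dim=-1)
    noisy = unlabeled_x + noise_scale * torch.randn_like(unlabeled_x)
    unsup = F.kl_div(F.log_softmax(model(noisy), dim=-1), target,
                     reduction='batchmean')
    return sup + lam * unsup
```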
7. The method of claim 4, wherein training the initial intention recognition model according to the intention labels of the labeled search sequences to determine the parameters of the fully connected layer in the initial intention recognition model, so as to obtain the intention recognition model, comprises:
performing virtual supervision training on the initial intention recognition model according to the intention labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences, and the word vectors of the words contained in unlabeled search sequences, to determine the parameters of the fully connected layer in the initial intention recognition model and thereby obtain the intention recognition model.
8. The method of claim 4, wherein training the initial slot position recognition model according to the slot position labels of the labeled search sequences to determine the fully connected layer parameters and conditional random field layer parameters in the initial slot position recognition model, so as to obtain the slot position recognition model, comprises:
performing virtual supervision training on the initial slot position recognition model according to the slot position labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences, and the word vectors of the words contained in unlabeled search sequences, to determine the fully connected layer parameters and conditional random field layer parameters in the initial slot position recognition model and thereby obtain the slot position recognition model.
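Claim 8's slot position recognition model stacks a fully connected layer and a conditional random field layer on a bidirectional recurrent encoder, whose hidden layer parameters claim 3 transfers from the pretrained language model. A sketch using the third-party pytorch-crf package follows; all sizes and names are illustrative assumptions.

```python
import torch.nn as nn
from torchcrf import CRF   # third-party pytorch-crf package (assumed installed)

class SlotRecognitionModel(nn.Module):
    def __init__(self, emb_dim=128, hidden_dim=256, num_slots=20):
        super().__init__()
        # bidirectional recurrent encoder; its hidden layer parameters would be
        # initialized from the pretrained bidirectional RNN language model
        self.rnn = nn.GRU(emb_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * hidden_dim, num_slots)    # fully connected layer
        self.crf = CRF(num_slots, batch_first=True)       # conditional random field layer

    def neg_log_likelihood(self, word_vectors, slot_tags, mask):
        emissions = self.fc(self.rnn(word_vectors)[0])    # per-token slot scores
        return -self.crf(emissions, slot_tags, mask=mask) # CRF log-likelihood as training loss

    def decode(self, word_vectors, mask):
        emissions = self.fc(self.rnn(word_vectors)[0])
        return self.crf.decode(emissions, mask=mask)      # best slot tag sequence per input
```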
9. An apparatus for understanding a search sequence, comprising:
a word vector determining module, configured to determine the word vector of each word contained in the labeled search sequence;
a model parameter module, configured to take hidden layer parameters, convolutional layer parameters and pooling layer parameters of a search sequence CNN model, obtained by training in advance according to URL site names and the clicked search sequences and non-clicked search sequences of the URL site names, as hidden layer parameters, convolutional layer parameters and pooling layer parameters of an initial domain recognition model;
and a domain recognition model module, configured to train the initial domain recognition model according to the domain labels of the labeled search sequences and the word vectors of the words contained in the labeled search sequences, to determine the parameters of a fully connected layer in the initial domain recognition model and thereby obtain the domain recognition model.
10. The apparatus according to claim 9, further comprising a CNN model module, specifically configured to:
acquire each URL site name and the clicked search sequences and non-clicked search sequences of each URL site name;
determining word vectors of words contained in the clicked search sequence, word vectors of words contained in the non-clicked search sequence and word vectors of words contained in the URL site names;
determine a clicked search vector from the word vectors of the words contained in the clicked search sequence using a first CNN model, determine a non-clicked search vector from the word vectors of the words contained in the non-clicked search sequence using the first CNN model, and determine a site name vector from the word vectors of the words contained in the URL site name using a second CNN model;
and optimizing the first CNN model and the second CNN model according to a first similarity between the clicked search vector and the site name vector and a second similarity between the non-clicked search vector and the site name vector, and taking the optimized first CNN model as the search sequence CNN model.
11. The apparatus of claim 9, further comprising an intention/slot position recognition model module, specifically configured to:
after the word vector of each word contained in the labeled search sequence is determined, take hidden layer parameters of a bidirectional RNN language model, obtained by training in advance on search sequences, as hidden layer parameters of an initial intention recognition model and an initial slot position recognition model;
train the initial intention recognition model according to the intention labels of the labeled search sequences to determine the parameters of a fully connected layer in the initial intention recognition model, so as to obtain an intention recognition model; or train the initial slot position recognition model according to the slot position labels of the labeled search sequences to determine the fully connected layer parameters and conditional random field layer parameters in the initial slot position recognition model, so as to obtain a slot position recognition model.
12. The apparatus according to claim 11, further comprising a bi-directional RNN language model parameter module, specifically configured to:
determine the word vector of each word contained in the search sequence;
and take the word vectors of the words contained in the search sequence as the input of the bidirectional RNN language model, predict the next word through a forward recurrent neural network in the bidirectional RNN language model, predict the previous word through a backward recurrent neural network, and adjust the hidden layer parameters of the forward recurrent neural network and of the backward recurrent neural network in the bidirectional RNN language model according to the prediction results.
13. The apparatus according to claim 10 or claim 12, further comprising a word vector module, specifically configured to:
perform word segmentation on the search sequence or the URL site name to obtain the words contained in the search sequence or the URL site name;
and perform word, part-of-speech and named entity recognition on each word contained in the search sequence or the URL site name to obtain the word vector of each word contained in the search sequence or the URL site name.
14. The apparatus of claim 9, wherein the domain recognition model module is specifically configured to:
perform virtual supervision training on the initial domain recognition model according to the domain labels of the labeled search sequences, the word vectors of the words contained in the labeled search sequences, and the word vectors of the words contained in unlabeled search sequences, to determine the parameters of the fully connected layer in the initial domain recognition model and thereby obtain the domain recognition model.
15. A device, characterized in that the device comprises:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method for understanding a search sequence according to any one of claims 1-8.
16. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method for understanding a search sequence according to any one of claims 1 to 8.
CN201711248658.4A 2017-12-01 2017-12-01 Method, device, equipment and storage medium for understanding search sequence Active CN107832476B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711248658.4A CN107832476B (en) 2017-12-01 2017-12-01 Method, device, equipment and storage medium for understanding search sequence


Publications (2)

Publication Number Publication Date
CN107832476A true CN107832476A (en) 2018-03-23
CN107832476B CN107832476B (en) 2020-06-05

Family

ID=61647472

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711248658.4A Active CN107832476B (en) 2017-12-01 2017-12-01 Method, device, equipment and storage medium for understanding search sequence

Country Status (1)

Country Link
CN (1) CN107832476B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104615767A (en) * 2015-02-15 2015-05-13 百度在线网络技术(北京)有限公司 Searching-ranking model training method and device and search processing method
CN106354852A (en) * 2016-09-02 2017-01-25 北京百度网讯科技有限公司 Search method and device based on artificial intelligence
CN106407333A (en) * 2016-09-05 2017-02-15 北京百度网讯科技有限公司 Artificial intelligence-based spoken language query identification method and apparatus
CN106649786A (en) * 2016-12-28 2017-05-10 北京百度网讯科技有限公司 Deep question answer-based answer retrieval method and device
CN107256267A (en) * 2017-06-19 2017-10-17 北京百度网讯科技有限公司 Querying method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
奚雪峰, 周国栋: "A Survey on Deep Learning for Natural Language Processing" (面向自然语言处理的深度学习研究), Acta Automatica Sinica (《自动化学报》) *
王继民, 李雷明子, 郑玉凤: "A Review of Mobile Search User Behavior Research Based on Log Mining" (基于日志挖掘的移动搜索用户行为研究综述), Information Studies: Theory & Application (《情报理论与实践》) *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108509596B (en) * 2018-04-02 2021-06-04 广州市申迪计算机系统有限公司 Text classification method and device, computer equipment and storage medium
CN108509596A (en) * 2018-04-02 2018-09-07 广州市申迪计算机系统有限公司 File classification method, device, computer equipment and storage medium
CN108874941B (en) * 2018-06-04 2021-09-21 成都知道创宇信息技术有限公司 Big data URL duplication removing method based on convolution characteristics and multiple Hash mapping
CN108874941A (en) * 2018-06-04 2018-11-23 成都知道创宇信息技术有限公司 Big data URL De-weight method based on convolution feature and multiple Hash mapping
CN108846126A (en) * 2018-06-29 2018-11-20 北京百度网讯科技有限公司 Generation, question and answer mode polymerization, device and the equipment of related question polymerization model
CN108846126B (en) * 2018-06-29 2021-07-27 北京百度网讯科技有限公司 Generation of associated problem aggregation model, question-answer type aggregation method, device and equipment
CN109165721A (en) * 2018-07-02 2019-01-08 算丰科技(北京)有限公司 Data processing method, data processing equipment and electronic equipment
CN109165721B (en) * 2018-07-02 2022-03-01 算丰科技(北京)有限公司 Data processing method, data processing device and electronic equipment
CN110751285A (en) * 2018-07-23 2020-02-04 第四范式(北京)技术有限公司 Training method and system and prediction method and system of neural network model
CN110751285B (en) * 2018-07-23 2024-01-23 第四范式(北京)技术有限公司 Training method and system and prediction method and system for neural network model
CN112602155A (en) * 2018-08-27 2021-04-02 皇家飞利浦有限公司 Generating metadata for a trained model
CN111046662A (en) * 2018-09-26 2020-04-21 阿里巴巴集团控股有限公司 Training method, device and system of word segmentation model and storage medium
CN111046662B (en) * 2018-09-26 2023-07-18 阿里巴巴集团控股有限公司 Training method, device and system of word segmentation model and storage medium
WO2020107765A1 (en) * 2018-11-30 2020-06-04 深圳前海微众银行股份有限公司 Statement analysis processing method, apparatus and device, and computer-readable storage medium
CN109597993A (en) * 2018-11-30 2019-04-09 深圳前海微众银行股份有限公司 Sentence analysis processing method, device, equipment and computer readable storage medium
CN109710927B (en) * 2018-12-12 2022-12-20 东软集团股份有限公司 Named entity identification method and device, readable storage medium and electronic equipment
CN109710927A (en) * 2018-12-12 2019-05-03 东软集团股份有限公司 Name recognition methods, device, readable storage medium storing program for executing and the electronic equipment of entity
CN112036186A (en) * 2019-06-04 2020-12-04 腾讯科技(深圳)有限公司 Corpus labeling method and device, computer storage medium and electronic equipment
CN110309514A (en) * 2019-07-09 2019-10-08 北京金山数字娱乐科技有限公司 A kind of method for recognizing semantics and device
CN110647617A (en) * 2019-09-29 2020-01-03 百度在线网络技术(北京)有限公司 Training sample construction method of dialogue guide model and model generation method
CN111523169A (en) * 2020-04-24 2020-08-11 广东博智林机器人有限公司 Decoration scheme generation method and device, electronic equipment and storage medium
CN113378781A (en) * 2021-06-30 2021-09-10 北京百度网讯科技有限公司 Training method and device of video feature extraction model and electronic equipment
CN117574878A (en) * 2024-01-15 2024-02-20 西湖大学 Component syntactic analysis method, device and medium for mixed field
CN117574878B (en) * 2024-01-15 2024-05-17 西湖大学 Component syntactic analysis method, device and medium for mixed field

Also Published As

Publication number Publication date
CN107832476B (en) 2020-06-05

Similar Documents

Publication Publication Date Title
CN107832476B (en) Method, device, equipment and storage medium for understanding search sequence
CN107679039B (en) Method and device for determining statement intention
CN107291828B (en) Spoken language query analysis method and device based on artificial intelligence and storage medium
CN110245348B (en) Intention recognition method and system
CN112100356A (en) Knowledge base question-answer entity linking method and system based on similarity
CN108549656B (en) Statement analysis method and device, computer equipment and readable medium
CN112015859A (en) Text knowledge hierarchy extraction method and device, computer equipment and readable medium
CN112836487B (en) Automatic comment method and device, computer equipment and storage medium
CN110334186B (en) Data query method and device, computer equipment and computer readable storage medium
CN113987169A (en) Text abstract generation method, device and equipment based on semantic block and storage medium
CN113128431B (en) Video clip retrieval method, device, medium and electronic equipment
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN113761868B (en) Text processing method, text processing device, electronic equipment and readable storage medium
CN113656561A (en) Entity word recognition method, apparatus, device, storage medium and program product
CN111078881A (en) Fine-grained emotion analysis method and system, electronic equipment and storage medium
CN115238691A (en) Knowledge fusion based embedded multi-intention recognition and slot filling model
Tarride et al. A comparative study of information extraction strategies using an attention-based neural network
CN116541492A (en) Data processing method and related equipment
CN116663539A (en) Chinese entity and relationship joint extraction method and system based on Roberta and pointer network
CN115374786A (en) Entity and relationship combined extraction method and device, storage medium and terminal
CN117093687A (en) Question answering method and device, electronic equipment and storage medium
CN114742016A (en) Chapter-level event extraction method and device based on multi-granularity entity differential composition
CN111460224B (en) Comment data quality labeling method, comment data quality labeling device, comment data quality labeling equipment and storage medium
CN113468890A (en) Sedimentology literature mining method based on NLP information extraction and part-of-speech rules
CN116680481A (en) Search ranking method, apparatus, device, storage medium and computer program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant