CN117113987A - Keyword intelligent distinguishing method and system based on user behavior characteristics - Google Patents
Keyword intelligent distinguishing method and system based on user behavior characteristics
- Publication number
- CN117113987A (application CN202310419903.2A)
- Authority
- CN
- China
- Prior art keywords
- search
- user
- word
- sequence
- context
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
A keyword intelligent distinguishing method and system based on user behavior characteristics are disclosed. First, text preprocessing and word segmentation are performed on the user search behavior data, and a user search behavior semantic understanding feature vector is obtained through a text convolutional neural network model. Next, the search query entered by the analyzed user is segmented into words and passed through a word embedding layer to obtain a sequence of search term embedding vectors, and this sequence is passed through a context encoder to obtain a sequence of context search term feature vectors. A semantic association matrix is then calculated between each context search term feature vector and the user search behavior semantic understanding feature vector to obtain a plurality of semantic association matrices. Finally, the semantic association matrices are each passed through a classifier to obtain a plurality of probability values, and the search term corresponding to the largest probability value is taken as the keyword. In this way, keywords can be resolved intelligently.
Description
Technical Field
The application relates to the field of keyword resolution, in particular to an intelligent keyword resolution method and system based on user behavior characteristics.
Background
Keyword resolution refers to screening and classifying the keywords extracted from a piece of text so that they can be better understood, summarized, and used. Keyword recognition based on text content automatically identifies the keywords that best represent the topic or meaning of a text. Although it is widely used in fields such as natural language processing and information retrieval, it suffers from problems such as difficulty in disambiguating ambiguous words, handling of long-tail words, noise interference, and data sparsity.
Search behavior data refers to the data generated when a user searches on a search engine or another platform. It generally includes information such as search keywords, search time, search results, clicked links, and dwell time, and it can reflect the real needs and interests of the user without interference from text information such as titles and abstracts.
Therefore, a keyword intelligent resolution scheme based on user behavior characteristics is desired.
Disclosure of Invention
The present application has been made to solve the above-mentioned technical problems. The embodiments of the application provide a keyword intelligent distinguishing method and system based on user behavior characteristics. First, text preprocessing and word segmentation are performed on the user search behavior data, and a user search behavior semantic understanding feature vector is obtained through a text convolutional neural network model. Next, the search query entered by the analyzed user is segmented into words and passed through a word embedding layer to obtain a sequence of search term embedding vectors, and this sequence is passed through a context encoder to obtain a sequence of context search term feature vectors. A semantic association matrix is then calculated between each context search term feature vector and the user search behavior semantic understanding feature vector to obtain a plurality of semantic association matrices. Finally, the semantic association matrices are each passed through a classifier to obtain a plurality of probability values, and the search term corresponding to the largest probability value is taken as the keyword. In this way, keywords can be resolved intelligently.
According to one aspect of the present application, there is provided a keyword intelligent distinguishing method based on user behavior characteristics, including: acquiring search behavior data of an analyzed user; performing text preprocessing on the user search behavior data to obtain preprocessed user search behavior data; performing word segmentation on the preprocessed user search behavior data and then passing it through a text convolutional neural network model containing a word embedding layer to obtain a user search behavior semantic understanding feature vector; acquiring a search query entered by the analyzed user; performing word segmentation on the search query and then passing it through a word embedding layer to obtain a sequence of search term embedding vectors; passing the sequence of search term embedding vectors through a Transformer-based context encoder to obtain a sequence of context search term feature vectors; calculating a semantic association matrix between each context search term feature vector in the sequence of context search term feature vectors and the user search behavior semantic understanding feature vector to obtain a plurality of semantic association matrices; passing the plurality of semantic association matrices through a classifier respectively to obtain a plurality of probability values; and taking the search term corresponding to the largest of the plurality of probability values as the keyword.
In the above-mentioned keyword intelligent resolution method based on user behavior features, performing word segmentation on the preprocessed user search behavior data and then passing it through a text convolutional neural network model containing a word embedding layer to obtain the user search behavior semantic understanding feature vector includes: performing word segmentation on the preprocessed user search behavior data to convert it into a word sequence composed of a plurality of words; mapping each word in the word sequence to a word vector by using the word embedding layer of the text convolutional neural network model to obtain a sequence of word embedding vectors; and passing the sequence of word embedding vectors through the text convolutional neural network model to obtain the user search behavior semantic understanding feature vector.
In the above-mentioned keyword intelligent resolution method based on user behavior features, passing the sequence of word embedding vectors through the text convolutional neural network model to obtain the user search behavior semantic understanding feature vector includes: using each layer of the text convolutional neural network model to perform, in the forward pass of that layer, convolution processing, mean pooling processing, and nonlinear activation processing on the input data, so that the last layer of the text convolutional neural network model outputs the user search behavior semantic understanding feature vector, wherein the input of the first layer of the text convolutional neural network model is the sequence of word embedding vectors.
In the above-mentioned keyword intelligent resolution method based on user behavior features, passing the sequence of search term embedding vectors through a Transformer-based context encoder to obtain the sequence of context search term feature vectors includes: arranging the sequence of search term embedding vectors one-dimensionally to obtain a global word sequence feature vector; calculating the product between the global word sequence feature vector and the transpose of each search term embedding vector in the sequence of search term embedding vectors to obtain a plurality of self-attention association matrices; standardizing each of the plurality of self-attention association matrices to obtain a plurality of standardized self-attention association matrices; passing each of the plurality of standardized self-attention association matrices through a Softmax classification function to obtain a plurality of probability values; and weighting each search term embedding vector in the sequence of search term embedding vectors with the corresponding probability value as its weight to obtain the sequence of context search term feature vectors.
In the above-mentioned keyword intelligent resolution method based on user behavior features, calculating a semantic association matrix between each context search term feature vector in the sequence of context search term feature vectors and the user search behavior semantic understanding feature vector to obtain a plurality of semantic association matrices includes: calculating the semantic association matrices according to the following association formula:

$$M_i = V_i^{\top} \otimes V_u$$

wherein $V_i$ represents each context search term feature vector in the sequence of context search term feature vectors, $V_i^{\top}$ represents the transpose of that vector, $V_u$ represents the user search behavior semantic understanding feature vector, $M_i$ represents each of the plurality of semantic association matrices, and $\otimes$ represents vector multiplication.
The keyword intelligent distinguishing method based on user behavior characteristics further includes a training step of training the text convolutional neural network model containing the word embedding layer, the Transformer-based context encoder, and the classifier; wherein the training step includes: acquiring training search behavior data of the analyzed user and a true value of the keyword; performing text preprocessing on the training search behavior data to obtain training preprocessed user search behavior data; performing word segmentation on the training preprocessed user search behavior data and then passing it through the text convolutional neural network model containing the word embedding layer to obtain a training user search behavior semantic understanding feature vector; acquiring a training search query entered by the analyzed user; performing word segmentation on the training search query and then passing it through the word embedding layer to obtain a sequence of training search term embedding vectors; passing the sequence of training search term embedding vectors through the Transformer-based context encoder to obtain a sequence of training context search term feature vectors; calculating a semantic association matrix between each training context search term feature vector in the sequence of training context search term feature vectors and the training user search behavior semantic understanding feature vector to obtain a plurality of training semantic association matrices; calculating a stream refinement loss function value between each training context search term feature vector and the training user search behavior semantic understanding feature vector; passing the plurality of training semantic association matrices through the classifier respectively to obtain a plurality of classification loss function values; and training the text convolutional neural network model containing the word embedding layer, the Transformer-based context encoder, and the classifier by using a weighted sum of the stream refinement loss function value and the plurality of classification loss function values as the loss function and propagating in the direction of gradient descent.
In the above-mentioned keyword intelligent resolution method based on user behavior features, calculating the stream refinement loss function value between each training context search term feature vector and the training user search behavior semantic understanding feature vector includes: calculating the stream refinement loss function value according to a loss formula defined over the following quantities and operations: each training context search term feature vector; the training user search behavior semantic understanding feature vector; the square of the two-norm of a vector; position-wise subtraction and position-wise multiplication of vectors; and the exponential operation of a vector, which computes the natural exponential of the feature value at each position of the vector. The result of the formula is the stream refinement loss function value.
According to another aspect of the present application, there is provided a keyword intelligent resolution system based on user behavior characteristics, comprising: a search behavior data acquisition module for acquiring search behavior data of the analyzed user; a text preprocessing module for performing text preprocessing on the user search behavior data to obtain preprocessed user search behavior data; an embedded convolutional encoding module for performing word segmentation on the preprocessed user search behavior data and then passing it through a text convolutional neural network model containing a word embedding layer to obtain a user search behavior semantic understanding feature vector; a user input data acquisition module for acquiring a search query entered by the analyzed user; an embedding module for performing word segmentation on the search query and then passing it through a word embedding layer to obtain a sequence of search term embedding vectors; a context encoding module for passing the sequence of search term embedding vectors through a Transformer-based context encoder to obtain a sequence of context search term feature vectors; a semantic association module for calculating a semantic association matrix between each context search term feature vector in the sequence of context search term feature vectors and the user search behavior semantic understanding feature vector to obtain a plurality of semantic association matrices; a classification module for passing the plurality of semantic association matrices through a classifier respectively to obtain a plurality of probability values; and a keyword acquisition module for taking the search term corresponding to the largest of the plurality of probability values as the keyword.
In the above-mentioned intelligent keyword resolution system based on user behavior features, the embedded convolutional encoding module is configured to: word segmentation processing is carried out on the preprocessed user search behavior data so as to convert the preprocessed user search behavior data into a word sequence composed of a plurality of words; mapping each word in the word sequence to a word vector by using a word embedding layer of the text convolutional neural network model to obtain a sequence of word embedding vectors; and passing the sequence of word embedded vectors through the text convolutional neural network model to obtain the user search behavior semantic understanding feature vector.
In the above-mentioned keyword intelligent resolution system based on user behavior features, passing the sequence of word embedding vectors through the text convolutional neural network model to obtain the user search behavior semantic understanding feature vector includes: using each layer of the text convolutional neural network model to perform, in the forward pass of that layer, convolution processing, mean pooling processing, and nonlinear activation processing on the input data, so that the last layer of the text convolutional neural network model outputs the user search behavior semantic understanding feature vector, wherein the input of the first layer of the text convolutional neural network model is the sequence of word embedding vectors.
Compared with the prior art, the keyword intelligent distinguishing method and system based on user behavior characteristics provided by the present application first perform text preprocessing and word segmentation on the user search behavior data and obtain a user search behavior semantic understanding feature vector through a text convolutional neural network model. Next, the search query entered by the analyzed user is segmented into words and passed through a word embedding layer to obtain a sequence of search term embedding vectors, and this sequence is passed through a context encoder to obtain a sequence of context search term feature vectors. A semantic association matrix is then calculated between each context search term feature vector and the user search behavior semantic understanding feature vector to obtain a plurality of semantic association matrices. Finally, the plurality of semantic association matrices are each passed through a classifier to obtain a plurality of probability values, and the search term corresponding to the largest probability value is taken as the keyword. In this way, keywords can be resolved intelligently.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort to a person of ordinary skill in the art. The following drawings are not intended to be drawn to scale, emphasis instead being placed upon illustrating the principles of the application.
Fig. 1 is an application scenario diagram of a keyword intelligent resolution method based on user behavior characteristics according to an embodiment of the present application.
Fig. 2 is a flowchart of a keyword intelligent resolution method based on user behavior characteristics according to an embodiment of the present application.
Fig. 3 is a schematic architecture diagram of a keyword intelligent resolution method based on user behavior characteristics according to an embodiment of the present application.
Fig. 4 is a flowchart of sub-step S130 of the intelligent keyword resolution method based on user behavior features according to an embodiment of the present application.
Fig. 5 is a flowchart of substep S160 of the intelligent keyword resolution method based on user behavior characteristics according to an embodiment of the present application.
Fig. 6 is a flowchart of training steps further included in the keyword intelligent distinguishing method based on the user behavior characteristics according to the embodiment of the present application.
Fig. 7 is a block diagram of a keyword intelligent resolution system based on user behavior features in accordance with an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some, but not all embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are also within the scope of the application.
As used in the specification and in the claims, the terms "a," "an," and/or "the" do not refer specifically to the singular and may include the plural, unless the context clearly dictates otherwise. In general, the terms "comprises" and "comprising" merely indicate that the explicitly identified steps and elements are included; they do not constitute an exclusive list, as other steps or elements may be included in a method or apparatus.
Although the present application makes various references to certain modules in a system according to embodiments of the present application, any number of different modules may be used and run on a user terminal and/or server. The modules are merely illustrative, and different aspects of the systems and methods may use different modules.
Flowcharts are used in the present application to describe the operations performed by a system according to embodiments of the present application. It should be understood that the preceding or following operations are not necessarily performed in the exact order shown. Rather, the various steps may be processed in reverse order or simultaneously, as desired. Also, other operations may be added to these processes, or one or more operations may be removed from them.
Hereinafter, exemplary embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some embodiments of the present application and not all embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
In view of the above technical problems, the technical concept of the application is as follows: since the searching and browsing behavior of a user can reflect the real needs and interests of the user without interference from text information such as titles and abstracts, it is desirable to introduce user behavior characteristics into the keyword resolution scheme. Specifically, keywords are extracted by acquiring the search behavior data of an analyzed user and a search query entered by the analyzed user, and by combining deep learning and artificial intelligence techniques.
Specifically, in the technical scheme of the application, firstly, search behavior data of an analyzed user is obtained. As described above, the searching and browsing actions of the user can reflect the real demands and interests of the user without being interfered by text information such as titles, abstracts and the like, and compared with the traditional keyword recognition method based on text content, the method based on user actions can reflect the interests, hobbies and behavior habits of the user more accurately. In addition, the method based on the user behavior can also be used for describing and analyzing the user in a finer granularity.
User search behavior data generally contains a lot of useless or repeated information that contributes nothing to keyword resolution or information understanding. Therefore, in the technical scheme of the application, text preprocessing is performed on the user search behavior data to obtain preprocessed user search behavior data. For example, the data can be cleaned and optimized during text preprocessing by removing stop words, stemming, and similar operations, so that useful information is retained and the accuracy and efficiency of the model are improved. Specifically, stop words are words that appear frequently in a language but carry no practical meaning in the text, such as "on" and "off", and they can be removed during data processing. Stemming converts different forms of a word (such as different tenses of a verb or the singular and plural of a noun) into its base form, for example converting "running" and "ran" into "run", so that different word forms do not interfere with the model.
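The preprocessing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word set and the stem lookup table are hypothetical stand-ins, and a real system would more likely use a full stop-word list and a proper stemmer.

```python
import re

# Hypothetical stop-word list and stem table, for illustration only.
STOP_WORDS = {"the", "a", "an", "of", "on", "off", "to", "in"}
STEM_TABLE = {"ran": "run", "running": "run", "shoes": "shoe"}


def preprocess(text):
    """Lowercase, tokenize, drop stop words, then map each word to its base form."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    kept = [t for t in tokens if t not in STOP_WORDS]
    # Words missing from the table are kept unchanged.
    return [STEM_TABLE.get(t, t) for t in kept]
```

Calling `preprocess` on a raw search log entry yields the cleaned word list that the later word segmentation and embedding steps would consume.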
Because the preprocessed user search behavior data is text data, in the technical scheme of the application, after word segmentation is performed on the preprocessed user search behavior data, the text is converted into a numerical representation by a text convolutional neural network model containing a word embedding layer, and semantic feature information is extracted by that model to obtain the user search behavior semantic understanding feature vector. Here, the word embedding layer maps each word to a vector in a high-dimensional space, thereby preserving the semantic relationships between words; the text convolutional neural network model then performs feature extraction on the sequence of word embedding vectors by using operations such as convolutional layers and pooling layers, so that a feature vector capable of representing the intention behind the user's search behavior is obtained.
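A minimal sketch of one text convolutional layer follows, using the order of operations named in the Disclosure (convolution, mean pooling, nonlinear activation). It assumes the word embedding lookup has already produced `embeddings`, a list of equal-length vectors; the function name, the kernel layout, and the choice of ReLU as the activation are assumptions made for illustration, not details given in the source.

```python
def text_cnn_layer(embeddings, filters, bias):
    """One layer: 1-D convolution over word positions, mean pooling, then ReLU.

    embeddings: list of word vectors, each of dimension d
    filters:    list of kernels, each a list of `width` rows of dimension d
    bias:       one bias value per kernel
    """
    features = []
    for kernel, b in zip(filters, bias):
        width = len(kernel)
        if len(embeddings) < width:
            raise ValueError("sequence is shorter than the kernel width")
        # Convolution: dot product of the kernel with each window of `width` words.
        responses = []
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            total = b
            for k_row, vec in zip(kernel, window):
                total += sum(w * x for w, x in zip(k_row, vec))
            responses.append(total)
        # Mean pooling over all window positions.
        pooled = sum(responses) / len(responses)
        # Nonlinear activation (ReLU is an assumption).
        features.append(max(0.0, pooled))
    return features
```

Each kernel contributes one entry to the output, so the length of the resulting feature vector equals the number of kernels.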
In order to acquire the current search requirement of a user and convert the current search requirement into a numerical representation so as to be convenient for comparison and matching with the historical search behavior of the user, in the technical scheme of the application, firstly, the search query input by the analyzed user is acquired; then, the search query is subjected to word segmentation processing and then passes through a word embedding layer to obtain a sequence of search word embedding vectors.
Contextual relationships exist among the search terms, and capturing them can greatly improve the effect of the subsequent classification. Therefore, the sequence of search term embedding vectors is passed through a Transformer-based context encoder to obtain the sequence of context search term feature vectors. Here, the Transformer-based context encoder uses a self-attention mechanism to capture the correlation between each element in the sequence and the global semantic information that serves as the background of that element.
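The context-encoding steps listed in the Disclosure (one-dimensional arrangement, product with each transposed embedding vector, standardization, Softmax, weighting) can be sketched as below. Two points are assumptions rather than details from the source: the standardization is taken to be division by the Frobenius norm, and each standardized matrix is reduced to a single score by summing its entries before the Softmax is applied across search terms.

```python
import math


def context_encode(embeddings):
    """Weight each search term embedding by a Softmax score over the sequence."""
    # Step 1: one-dimensional arrangement (concatenation) into a global vector.
    global_vec = [x for vec in embeddings for x in vec]

    scores = []
    for vec in embeddings:
        # Step 2: product of the global vector with one transposed embedding vector.
        matrix = [[g * e for e in vec] for g in global_vec]
        # Step 3: standardization; dividing by the Frobenius norm is an assumption.
        norm = math.sqrt(sum(v * v for row in matrix for v in row)) or 1.0
        # Assumption: reduce the standardized matrix to a scalar by summing entries.
        scores.append(sum(v / norm for row in matrix for v in row))

    # Step 4: Softmax over the per-term scores, one probability per search term.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Step 5: weight each embedding vector by its probability.
    encoded = [[w * x for x in vec] for w, vec in zip(weights, embeddings)]
    return weights, encoded
```

A production Transformer encoder would use learned query, key, and value projections and multiple heads; this sketch only mirrors the sequence of operations stated in the text.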
And then, calculating semantic association matrixes between each context search term feature vector and the user search behavior semantic understanding feature vector in the sequence of the context search term feature vectors so as to measure semantic similarity between the search term and the user search behavior, thereby obtaining a plurality of semantic association matrixes.
Then, the plurality of semantic association matrices are respectively passed through a classifier to obtain a plurality of probability values, and the search term corresponding to the largest of the plurality of probability values is taken as the keyword. The classifier is a neural network model whose input is a semantic association matrix and whose output is a probability value indicating the likelihood that the search term is a keyword. In the technical scheme of the application, by selecting the search term corresponding to the maximum probability value as the keyword, the keyword that best represents the user's needs and interests can be selected.
Here, calculating the semantic association matrix between each context search term feature vector in the sequence of context search term feature vectors and the user search behavior semantic understanding feature vector essentially maps two representations into a high-dimensional association feature space: the serialized context encoding feature of the search term, expressed by each context search term feature vector, and the text sequence semantic feature representation of the user search behavior, expressed by the user search behavior semantic understanding feature vector. Therefore, if the correlation between the context search term feature vector and the user search behavior semantic understanding feature vector can be improved across the serialized text semantic feature dimension and the high-dimensional association feature space dimension, the expression effect of the semantic association matrix can be improved.
Based on the above, in addition to the classification loss function based on the semantic association matrix, the applicant of the present application further introduces a streaming refinement loss function for each context search term feature vector and the user search behavior semantic understanding feature vector. In the expression of this loss function, $\|\cdot\|_2^2$ represents the square of the two-norm of a vector.
Here, the streaming refinement loss function is based on the conversion of the serialized streaming distribution of each context search term feature vector and the user search behavior semantic understanding feature vector in the text encoding into a spatial distribution within the high-dimensional association feature space. By synchronously interpolating under the sequence distribution of the vectors, super-resolution enhancement of the spatial distribution within the high-dimensional association feature space is achieved. Balancing the inter-class probability relationships under the sequence thus provides finer alignment of the distribution differences within the high-dimensional association feature space, so that cross-dimensional context association is jointly presented in the serialized text semantic feature dimension and the association feature space dimension. This improves the expression effect of the plurality of semantic association matrices and, in turn, the accuracy of the plurality of probability values obtained by passing them through the classifier.
The application has the following technical effects: 1. a keyword intelligent resolution scheme based on user behavior features is provided.
2. According to the intelligent keyword resolution scheme based on the user behavior characteristics, through analysis and understanding of the user search behavior characteristics, the real requirements and interests of the user can be mastered more accurately, and the accuracy of keyword resolution is improved.
Fig. 1 is an application scenario diagram of a keyword intelligent resolution method based on user behavior characteristics according to an embodiment of the present application. As shown in fig. 1, in this application scenario, first, search behavior data of an analyzed user (e.g., D1 illustrated in fig. 1) and a search query input by the analyzed user (e.g., D2 illustrated in fig. 1) are acquired, and then the user search behavior data and the search query are input to a server (e.g., S illustrated in fig. 1) in which a keyword intelligent resolution algorithm based on user behavior characteristics is deployed, wherein the server is capable of using the keyword intelligent resolution algorithm based on user behavior characteristics to acquire a plurality of probability values for the user search behavior data and the search query, and a search term corresponding to a maximum one of the plurality of probability values is used as a keyword.
Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described in detail with reference to the accompanying drawings.
Fig. 2 is a flowchart of a keyword intelligent resolution method based on user behavior characteristics according to an embodiment of the present application. As shown in fig. 2, the keyword intelligent resolution method based on user behavior characteristics according to the embodiment of the application comprises the following steps: S110, acquiring search behavior data of an analyzed user; S120, performing text preprocessing on the user search behavior data to obtain preprocessed user search behavior data; S130, performing word segmentation on the preprocessed user search behavior data and then passing it through a text convolutional neural network model containing a word embedding layer to obtain a user search behavior semantic understanding feature vector; S140, acquiring a search query input by the analyzed user; S150, performing word segmentation on the search query and then passing it through a word embedding layer to obtain a sequence of search term embedding vectors; S160, passing the sequence of search term embedding vectors through a converter-based context encoder to obtain a sequence of context search term feature vectors; S170, calculating a semantic association matrix between each context search term feature vector in the sequence of context search term feature vectors and the user search behavior semantic understanding feature vector to obtain a plurality of semantic association matrices; S180, passing the plurality of semantic association matrices respectively through a classifier to obtain a plurality of probability values; and S190, taking the search term corresponding to the maximum probability value among the plurality of probability values as a keyword.
Fig. 3 is a schematic architecture diagram of a keyword intelligent resolution method based on user behavior characteristics according to an embodiment of the present application. As shown in fig. 3, in the network architecture, first, search behavior data of an analyzed user is acquired; then, text preprocessing is carried out on the user search behavior data to obtain preprocessed user search behavior data; then, word segmentation is carried out on the preprocessed user search behavior data, and then a text convolutional neural network model containing a word embedding layer is used for obtaining a user search behavior semantic understanding feature vector; next, obtaining a search query entered by the analyzed user; then, word segmentation is carried out on the search query, and a sequence of search word embedded vectors is obtained through a word embedding layer; then, the sequence of the embedded vectors of the search terms passes through a context encoder based on a converter to obtain a sequence of feature vectors of the context search terms; then, calculating semantic association matrixes between each context search term feature vector and the user search behavior semantic understanding feature vector in the sequence of the context search term feature vectors to obtain a plurality of semantic association matrixes; then, the semantic association matrixes respectively pass through a classifier to obtain a plurality of probability values; and finally, taking the search word corresponding to the maximum probability value in the plurality of probability values as a keyword.
More specifically, in step S110, search behavior data of the analyzed user is acquired. The searching and browsing behaviors of the user can reflect the real demands and interests of the user without being interfered by text information such as titles, abstracts and the like, and compared with the traditional keyword recognition method based on text content, the method based on the user behaviors can reflect the interests, hobbies and behavior habits of the user more accurately. In addition, the method based on the user behavior can also be used for describing and analyzing the user in a finer granularity.
More specifically, in step S120, the user search behavior data is text-preprocessed to obtain preprocessed user search behavior data. The user search behavior data usually contains a lot of useless or repeated information, which has no practical effect on keyword resolution and information understanding, and in the technical scheme of the application, text preprocessing is performed on the user search behavior data to obtain preprocessed user search behavior data.
More specifically, in step S130, after the word segmentation processing is performed on the preprocessed user search behavior data, a text convolutional neural network model including a word embedding layer is used to obtain a semantic understanding feature vector of the user search behavior. Because the preprocessed user search behavior data is text data, in the technical scheme of the application, after word segmentation processing is carried out on the preprocessed user search behavior data, the text is converted into numerical representation through a text convolutional neural network model containing a word embedding layer, and semantic feature information is extracted from the text convolutional neural network model, so that a user search behavior semantic understanding feature vector is obtained. Here, the word embedding layer may map each word to a vector in a high-dimensional space, thereby preserving semantic relationships between words; the text convolutional neural network model performs feature extraction on the sequence of the word embedding vector by using operations such as a convolutional layer, a pooling layer and the like, so that a feature vector capable of representing the search behavior intention of a user is obtained.
Accordingly, in a specific example, as shown in fig. 4, performing word segmentation on the preprocessed user search behavior data and then passing it through a text convolutional neural network model containing a word embedding layer to obtain the user search behavior semantic understanding feature vector includes: S131, performing word segmentation on the preprocessed user search behavior data to convert it into a word sequence composed of a plurality of words; S132, mapping each word in the word sequence to a word vector by using the word embedding layer of the text convolutional neural network model to obtain a sequence of word embedding vectors; and S133, passing the sequence of word embedding vectors through the text convolutional neural network model to obtain the user search behavior semantic understanding feature vector.
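Word segmentation followed by the embedding lookup of this example can be sketched as below. The whitespace segmenter, the randomly initialized table, and the names `segment`, `build_embedding_table`, and `embed` are assumptions made for illustration; in a trained model the embedding table would be learned rather than sampled.

```python
import random


def segment(text):
    """Whitespace word segmentation (a stand-in for a real word segmenter)."""
    return text.split()


def build_embedding_table(vocab, dim, seed=0):
    """Assign each vocabulary word a vector; random values stand in for learned ones."""
    rng = random.Random(seed)
    return {word: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for word in vocab}


def embed(words, table, dim):
    """Map each word to its vector; unknown words map to a zero vector."""
    unknown = [0.0] * dim
    return [table.get(word, unknown) for word in words]


words = segment("cheap running shoes")
table = build_embedding_table(sorted(set(words)), dim=4)
sequence = embed(words, table, dim=4)
print(len(sequence), len(sequence[0]))
```

The resulting `sequence` holds one vector per word and is the kind of input the text convolutional neural network model would consume.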
Accordingly, in one specific example, passing the sequence of word embedding vectors through the text convolutional neural network model to obtain the user search behavior semantic understanding feature vector includes: and respectively carrying out convolution processing, mean pooling processing and nonlinear activation processing on input data in forward transfer of layers by using each layer of the text convolutional neural network model to output the semantic understanding feature vector of the user search behavior by the last layer of the text convolutional neural network model, wherein the input of the first layer of the text convolutional neural network model is a sequence of the word embedding vectors.
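The per-layer processing named above (convolution, mean pooling, nonlinear activation) can be illustrated with a single-layer sketch. Several choices here are assumptions not stated in the source: the pooling is taken over all positions, the activation is ReLU, and the filter weights are hand-picked toy values rather than trained parameters.

```python
def conv1d(sequence, filters, bias):
    """Slide each filter over the word-embedding sequence.

    sequence: list of T vectors of size d
    filters:  list of F filters, each a list of k vectors of size d
    returns:  list of (T - k + 1) positions, each holding F filter responses
    """
    k = len(filters[0])
    out = []
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        row = []
        for filt, b in zip(filters, bias):
            response = b
            for w_vec, x_vec in zip(filt, window):
                response += sum(w * x for w, x in zip(w_vec, x_vec))
            row.append(response)
        out.append(row)
    return out


def mean_pool(feature_maps):
    """Average each filter's responses over all positions."""
    n = len(feature_maps)
    return [sum(column) / n for column in zip(*feature_maps)]


def relu(vector):
    """Nonlinear activation (ReLU is an assumed choice)."""
    return [max(0.0, v) for v in vector]


def text_cnn_layer(sequence, filters, bias):
    """Convolution, then mean pooling, then nonlinear activation."""
    return relu(mean_pool(conv1d(sequence, filters, bias)))


sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]      # three words, d = 2
filters = [
    [[1.0, 0.0], [0.0, 1.0]],                        # toy filter 0, width k = 2
    [[-1.0, -1.0], [-1.0, -1.0]],                    # toy filter 1, width k = 2
]
bias = [0.0, 0.0]
feature_vector = text_cnn_layer(sequence, filters, bias)
print(feature_vector)  # [1.5, 0.0]
```

The output has one entry per filter, which is how a variable-length word sequence becomes a fixed-length feature vector.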
It should be appreciated that convolutional neural network (Convolutional Neural Network, CNN) is an artificial neural network and has wide application in the fields of image recognition and the like. The convolutional neural network may include an input layer, a hidden layer, and an output layer, where the hidden layer may include a convolutional layer, a pooling layer, an activation layer, a full connection layer, etc., where the previous layer performs a corresponding operation according to input data, outputs an operation result to the next layer, and obtains a final result after the input initial data is subjected to a multi-layer operation.
More specifically, in step S140, a search query entered by the analyzed user is acquired. In order to acquire the current search requirement of a user and convert the current search requirement into a numerical representation so as to be convenient for comparison and matching with the historical search behavior of the user, in the technical scheme of the application, firstly, the search query input by the analyzed user is acquired; then, the search query is subjected to word segmentation processing and then passes through a word embedding layer to obtain a sequence of search word embedding vectors.
More specifically, in step S150, the search query is word-segmented and then passed through a word embedding layer to obtain a sequence of search word embedding vectors.
More specifically, in step S160, the sequence of term embedded vectors is passed through a converter-based context encoder to obtain a sequence of context term feature vectors. In consideration of the existence of the context relation among the search words, if the context relation can be captured, the effect of subsequent classification can be greatly improved. Here, the converter-based context encoder may capture correlations between each element in the sequence and global semantic information of the respective element as a background using a self-attention mechanism.
It should be appreciated that the context encoder analyzes the relationship between a given word segment and the other word segments in the vector representation sequence to obtain the corresponding feature information. The context encoder aims to mine the hidden patterns between contexts in the word sequence. Optionally, the encoder includes: CNN (Convolutional Neural Network), Recursive NN (Recursive Neural Network), language model (Language Model), and the like. CNN-based methods extract local features well but handle the long-term dependency (Long-Term Dependency) problem in sentences poorly, so encoders based on Bi-LSTM (bidirectional Long Short-Term Memory) are widely used. The Recursive NN processes a sentence as a tree structure rather than a sequence and in theory has stronger representation capability, but it suffers from weaknesses such as high sample annotation difficulty, vanishing gradients in deep structures, and difficulty of parallel computation, so it is seldom used in practice. The Transformer is a widely used network structure that combines characteristics of CNN and RNN, extracts global features well, and has a certain advantage over RNN (Recurrent Neural Network) in parallel computation.
Accordingly, in one specific example, as shown in fig. 5, passing the sequence of search term embedding vectors through a converter-based context encoder to obtain a sequence of context search term feature vectors includes: S161, arranging the sequence of search term embedding vectors one-dimensionally to obtain a global word sequence feature vector; S162, calculating the product between the global word sequence feature vector and the transposed vector of each search term embedding vector in the sequence of search term embedding vectors to obtain a plurality of self-attention association matrices; S163, standardizing each of the plurality of self-attention association matrices to obtain a plurality of standardized self-attention association matrices; S164, passing each of the plurality of standardized self-attention association matrices through a Softmax classification function to obtain a plurality of probability values; and S165, weighting each search term embedding vector in the sequence of search term embedding vectors by the corresponding probability value among the plurality of probability values to obtain the sequence of context search term feature vectors.
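Steps S161 to S165 can be sketched as follows. The source does not say how each matrix is standardized or how a matrix is reduced to a single value before Softmax, so two assumptions are made here and marked in the comments: standardization is a scaling by 1/sqrt(d), and each matrix is reduced to the mean of its entries.

```python
import math


def flatten(sequence):
    """S161: arrange the embedding sequence into one global vector."""
    return [x for vector in sequence for x in vector]


def outer(u, v):
    """Product of a vector with the transpose of another (outer product)."""
    return [[a * b for b in v] for a in u]


def scale(matrix, factor):
    return [[x * factor for x in row] for row in matrix]


def matrix_mean(matrix):
    total = sum(sum(row) for row in matrix)
    return total / (len(matrix) * len(matrix[0]))


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def context_encode(embeddings):
    dim = len(embeddings[0])
    global_vec = flatten(embeddings)                              # S161
    matrices = [outer(global_vec, e) for e in embeddings]         # S162
    matrices = [scale(m, 1.0 / math.sqrt(dim)) for m in matrices]  # S163 (assumed form)
    scores = [matrix_mean(m) for m in matrices]                   # assumed reduction
    weights = softmax(scores)                                     # S164
    context = [[w * x for x in e]                                 # S165
               for w, e in zip(weights, embeddings)]
    return weights, context


embeddings = [[0.2, 0.1], [0.9, 0.4], [0.1, 0.3]]
weights, context = context_encode(embeddings)
print(weights)
```

Because the weights come from a Softmax over all search terms, they sum to one, and each output vector is the corresponding embedding scaled by its weight.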
More specifically, in step S170, a semantic association matrix between each context search term feature vector in the sequence of context search term feature vectors and the user search behavior semantic understanding feature vector is calculated to obtain a plurality of semantic association matrices. In this way, the semantic similarity between each search term and the user search behavior can be measured.
Accordingly, in a specific example, calculating a semantic association matrix between each context search term feature vector in the sequence of context search term feature vectors and the user search behavior semantic understanding feature vector to obtain a plurality of semantic association matrices includes: calculating the semantic association matrix between each context search term feature vector in the sequence of context search term feature vectors and the user search behavior semantic understanding feature vector according to the following association formula to obtain the plurality of semantic association matrices; wherein the association formula is: $M_i = V_i^{\top} \otimes V_u$, where $V_i$ represents each context search term feature vector in the sequence of context search term feature vectors, $V_i^{\top}$ represents the transposed vector of $V_i$, $V_u$ represents the user search behavior semantic understanding feature vector, $M_i$ represents each of the plurality of semantic association matrices, and $\otimes$ represents vector multiplication.
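Reading the association formula as the product of a transposed search term vector with the user vector, each semantic association matrix is an outer product. The sketch below shows this under that reading; the function name `association_matrix` and the toy values are illustrative assumptions.

```python
def association_matrix(term_vec, user_vec):
    """Outer product: entry [r][c] equals term_vec[r] * user_vec[c]."""
    return [[t * u for u in user_vec] for t in term_vec]


context_terms = [[1.0, 2.0], [0.5, -1.0]]   # one vector per search term (toy values)
user_vector = [3.0, 4.0]                    # user search behavior vector (toy values)
matrices = [association_matrix(v, user_vector) for v in context_terms]
print(matrices[0])  # [[3.0, 4.0], [6.0, 8.0]]
```

One matrix is produced per search term, which is why the later classification step yields one probability value per search term.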
More specifically, in step S180, the plurality of semantic association matrices are respectively passed through a classifier to obtain a plurality of probability values. The classifier is a neural network model whose input is a semantic association matrix and whose output is a probability value indicating the likelihood that the search term is a keyword. In the technical scheme of the application, by selecting the search term corresponding to the maximum probability value as the keyword, the keyword that best represents the user's needs and interests can be selected.
More specifically, in step S190, the search term corresponding to the largest of the plurality of probability values is taken as the keyword.
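Steps S180 and S190 can be sketched together. The source only states that the classifier is a neural network model, so the single linear layer with a sigmoid output used here is an assumed stand-in, and the weights, bias, and matrices are hypothetical toy values.

```python
import math


def classify(matrix, weights, bias):
    """Flatten the matrix, apply one linear layer, and squash with a sigmoid."""
    flat = [x for row in matrix for x in row]
    logit = sum(w * x for w, x in zip(weights, flat)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def pick_keyword(terms, matrices, weights, bias):
    """Return the search term with the highest probability, plus all probabilities."""
    probs = [classify(m, weights, bias) for m in matrices]
    best = max(range(len(terms)), key=lambda i: probs[i])
    return terms[best], probs


terms = ["cheap", "running", "shoes"]
matrices = [
    [[0.1, 0.0], [0.0, 0.1]],
    [[0.9, 0.2], [0.1, 0.8]],
    [[0.4, 0.3], [0.2, 0.5]],
]
weights = [1.0, 0.0, 0.0, 1.0]   # hypothetical classifier weights
bias = -1.0                      # hypothetical classifier bias
keyword, probs = pick_keyword(terms, matrices, weights, bias)
print(keyword)  # running
```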
Accordingly, in a specific example, the method further includes a training step: training the text convolutional neural network model containing the word embedding layer, the converter-based context encoder, and the classifier; wherein, as shown in fig. 6, the training step includes: S201, acquiring training search behavior data of an analyzed user and a true value of the keyword; S202, performing text preprocessing on the training search behavior data to obtain training preprocessed user search behavior data; S203, performing word segmentation on the training preprocessed user search behavior data and then passing it through the text convolutional neural network model containing the word embedding layer to obtain a training user search behavior semantic understanding feature vector; S204, acquiring a training search query input by the analyzed user; S205, performing word segmentation on the training search query and then passing it through the word embedding layer to obtain a sequence of training search term embedding vectors; S206, passing the sequence of training search term embedding vectors through the converter-based context encoder to obtain a sequence of training context search term feature vectors; S207, calculating a semantic association matrix between each training context search term feature vector in the sequence of training context search term feature vectors and the training user search behavior semantic understanding feature vector to obtain a plurality of training semantic association matrices; S208, calculating a streaming refinement loss function value of the respective training context search term feature vectors and the training user search behavior semantic understanding feature vector; S209, passing the plurality of training semantic association matrices respectively through the classifier to obtain a plurality of classification loss function values; and S210, training the text convolutional neural network model containing the word embedding layer, the converter-based context encoder, and the classifier with a weighted sum of the streaming refinement loss function value and the plurality of classification loss function values as the loss function, by propagation in the direction of gradient descent.
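The combination of loss values used for training can be sketched as below. The streaming refinement loss is treated as an already computed number, because its formula is not reproduced here. Binary cross-entropy as the classification loss, the weights `alpha` and `beta`, and the learning rate are all assumptions for illustration.

```python
import math


def binary_cross_entropy(prob, label, eps=1e-12):
    """Classification loss for one search term (assumed loss form)."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to keep log() finite
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def total_loss(refinement_loss, probs, labels, alpha=0.5, beta=1.0):
    """Weighted sum of the refinement loss and the classification losses."""
    classification = [binary_cross_entropy(p, y) for p, y in zip(probs, labels)]
    return alpha * refinement_loss + beta * sum(classification)


def gradient_step(params, grads, learning_rate=0.1):
    """Move each parameter against its gradient (plain gradient descent)."""
    return [p - learning_rate * g for p, g in zip(params, grads)]


probs = [0.2, 0.9, 0.4]     # classifier outputs for three search terms (toy values)
labels = [0, 1, 0]          # true keyword marked with 1
loss = total_loss(refinement_loss=0.3, probs=probs, labels=labels)
updated = gradient_step([1.0, 2.0], [0.5, -1.0])
print(round(loss, 4))
```

In a real training loop the gradients would come from automatic differentiation of this combined loss with respect to the parameters of all three components.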
Here, when calculating the semantic association matrix between each training context feature vector and the training user search behavior semantic understanding feature vector in the sequence of training context feature vectors to obtain a plurality of training semantic association matrices, the serialized context coding feature of the search word represented by each training context feature vector and the textual sequence semantic feature representation of the user search behavior represented by the training user search behavior semantic understanding feature vector are essentially mapped into a high-dimensional association feature space, so if the correlation between the training context feature vector and the training user search behavior semantic understanding feature vector in the association space dimension between the serialized text semantic feature representation and the high-dimensional feature can be improved, the expression effect of the training semantic association matrix can be improved. Based on the above, the applicant of the present application further introduces a stream refinement loss function for each training context search term feature vector and the training user search behavior semantic understanding feature vector, in addition to the classification loss function based on the training semantic association matrix.
Accordingly, in one specific example, calculating the streaming refinement loss function value of the respective training context search term feature vectors and the training user search behavior semantic understanding feature vector includes: calculating the streaming refinement loss function value of the respective training context search term feature vectors and the training user search behavior semantic understanding feature vector according to a loss formula. The loss formula is defined over the following quantities: the respective training context search term feature vectors; the training user search behavior semantic understanding feature vector; the square of the two-norm of a vector; the position-wise subtraction and multiplication of vectors; the exponential operation of a vector, which computes the natural exponential function raised to the power of the feature value at each position of the vector; and the resulting streaming refinement loss function value.
Here, the streaming refinement loss function is based on the transformation of the serial encoding streaming distribution of each training context search term feature vector and the training user search behavior semantic understanding feature vector in the text to the spatial distribution in the high-dimensional association feature space, and super-resolution promotion of the spatial distribution in the high-dimensional association feature space is realized by synchronously carrying out interpolation under the sequence distribution of the vectors, so that finer alignment is provided for the distribution difference in the high-dimensional association feature space through the inter-class probability relation under the balanced sequence, and cross inter-dimensional context association is jointly presented on the serial text semantic feature dimension and the association feature space dimension, thereby promoting the expression effect of the training semantic association matrices and further promoting the accuracy of the probability values obtained by the training semantic association matrices through the classifier.
In summary, according to the keyword intelligent resolution method based on user behavior features of the embodiment of the application, text preprocessing is performed on user search behavior data, word segmentation is performed on the user search behavior data, then a text convolutional neural network model is used for obtaining user search behavior semantic understanding feature vectors, word segmentation is performed on search queries input by analyzed users, then a word embedding layer is used for obtaining a sequence of search word embedding vectors, the sequence of search word embedding vectors is used for obtaining a sequence of context search word feature vectors through a context encoder, semantic correlation matrixes between each context search word feature vector and the user search behavior semantic understanding feature vectors are calculated to obtain a plurality of semantic correlation matrixes, and finally the plurality of semantic correlation matrixes are respectively used for obtaining a plurality of probability values through a classifier, and search words corresponding to the maximum probability values are used as keywords. In this way, keywords can be resolved intelligently.
Fig. 7 is a block diagram of a keyword intelligent resolution system 100 based on user behavior features in accordance with an embodiment of the present application. As shown in fig. 7, the keyword intelligent resolution system 100 based on user behavior features according to an embodiment of the present application includes: a search behavior data acquisition module 110, configured to acquire search behavior data of the analyzed user; a text preprocessing module 120, configured to perform text preprocessing on the user search behavior data to obtain preprocessed user search behavior data; an embedded convolutional encoding module 130, configured to perform word segmentation on the preprocessed user search behavior data and then pass it through a text convolutional neural network model containing a word embedding layer to obtain a user search behavior semantic understanding feature vector; a user input data acquisition module 140, configured to acquire a search query input by the analyzed user; an embedding module 150, configured to perform word segmentation on the search query and then pass it through a word embedding layer to obtain a sequence of search term embedding vectors; a context encoding module 160, configured to pass the sequence of search term embedding vectors through a converter-based context encoder to obtain a sequence of context search term feature vectors; a semantic association module 170, configured to calculate a semantic association matrix between each context search term feature vector in the sequence of context search term feature vectors and the user search behavior semantic understanding feature vector to obtain a plurality of semantic association matrices; a classification module 180, configured to pass the plurality of semantic association matrices respectively through a classifier to obtain a plurality of probability values; and a keyword obtaining module 190, configured to take the search term corresponding to the largest of the plurality of probability values as a keyword.
In one example, in the above-described intelligent keyword resolution system based on user behavior features 100, the embedded convolutional encoding module 130 is configured to: word segmentation processing is carried out on the preprocessed user search behavior data so as to convert the preprocessed user search behavior data into a word sequence composed of a plurality of words; mapping each word in the word sequence to a word vector by using a word embedding layer of the text convolutional neural network model to obtain a sequence of word embedding vectors; and passing the sequence of word embedding vectors through the text convolutional neural network model to obtain the user search behavior semantic understanding feature vector.
In one example, in the above-mentioned keyword intelligent resolution system 100 based on user behavior features, passing the sequence of word embedding vectors through the text convolutional neural network model to obtain the user search behavior semantic understanding feature vector includes: and respectively carrying out convolution processing, mean pooling processing and nonlinear activation processing on input data in forward transfer of layers by using each layer of the text convolutional neural network model to output the semantic understanding feature vector of the user search behavior by the last layer of the text convolutional neural network model, wherein the input of the first layer of the text convolutional neural network model is a sequence of the word embedding vectors.
In one example, in the above-described intelligent keyword resolution system based on user behavior features 100, the context encoding module 160 is configured to: one-dimensional arrangement is carried out on the sequence of the search word embedded vector to obtain a global word sequence feature vector; calculating the product between the global word sequence feature vector and the transpose vector of each search word embedding vector in the sequence of the search word embedding vectors to obtain a plurality of self-attention association matrixes; respectively carrying out standardization processing on each self-attention correlation matrix in the plurality of self-attention correlation matrices to obtain a plurality of standardized self-attention correlation matrices; obtaining a plurality of probability values by using a Softmax classification function through each normalized self-attention correlation matrix in the normalized self-attention correlation matrices; and weighting each search word embedding vector in the sequence of search word embedding vectors by taking each probability value in the plurality of probability values as a weight to obtain the sequence of the context search word feature vectors.
In one example, in the above-described keyword intelligent resolution system based on user behavior features 100, the semantic association module 170 is configured to: calculating semantic association matrixes between each context search term feature vector and the user search behavior semantic understanding feature vector in the sequence of the context search term feature vectors according to the following association formula to obtain a plurality of semantic association matrixes; wherein, the association formula is: Wherein->Each context-search-term feature vector in the sequence representing said context-search-term feature vector, is->Transposed vector representing vector, ">Representing semantic understanding feature vectors of said user search behavior, < >>Representing each of the plurality of semantic incidence matrices,/a semantic incidence matrix of the plurality of semantic incidence matrices>Representing vector multiplication.
In one example, the above-described keyword intelligent resolution system 100 based on user behavior features further includes a training module for training the text convolutional neural network model including the word embedding layer, the converter-based context encoder, and the classifier; wherein the training module is configured to: acquire training search behavior data of an analyzed user and a true value of the keyword; perform text preprocessing on the training search behavior data to obtain training preprocessed user search behavior data; perform word segmentation on the training preprocessed user search behavior data and pass the result through the text convolutional neural network model including the word embedding layer to obtain a training user search behavior semantic understanding feature vector; acquire a training search query entered by the analyzed user; perform word segmentation on the training search query and pass the result through the word embedding layer to obtain a sequence of training search word embedding vectors; pass the sequence of training search word embedding vectors through the converter-based context encoder to obtain a sequence of training context search term feature vectors; calculate a semantic association matrix between each training context search term feature vector in the sequence of training context search term feature vectors and the training user search behavior semantic understanding feature vector to obtain a plurality of training semantic association matrices; calculate a stream refinement loss function value between each training context search term feature vector and the training user search behavior semantic understanding feature vector; pass the plurality of training semantic association matrices through the classifier respectively to obtain a plurality of classification loss function values; and train the text convolutional neural network model including the word embedding layer, the converter-based context encoder, and the classifier, using a weighted sum of the stream refinement loss function value and the plurality of classification loss function values as the loss function, by back-propagation in the direction of gradient descent.
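The combination of loss terms used for training can be illustrated as follows. This is a sketch under stated assumptions only: the weights `alpha` and `beta`, the helper names, and the use of binary cross-entropy as the per-search-word classification loss are hypothetical choices, since the text states only that a weighted sum of the stream refinement loss function value and the classification loss function values serves as the loss function.

```python
import math

def binary_cross_entropy(probability: float, is_keyword: bool) -> float:
    """Classification loss for one search word (assumed form, not from the text)."""
    eps = 1e-12
    # Clamp the probability so that the logarithm stays finite.
    p = min(max(probability, eps), 1.0 - eps)
    return -math.log(p) if is_keyword else -math.log(1.0 - p)

def total_loss(refinement_loss: float,
               classification_losses: list,
               alpha: float = 1.0,
               beta: float = 1.0) -> float:
    """Weighted sum of the refinement loss and all classification losses.

    alpha and beta are hypothetical weighting hyperparameters.
    """
    return alpha * refinement_loss + beta * sum(classification_losses)
```

In such a setup the true value of the keyword would mark exactly one search word of the training query with `is_keyword=True`, and the resulting scalar would be minimized by gradient descent.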
In one example, in the above-described keyword intelligent resolution system 100 based on user behavior features, calculating the stream refinement loss function value between each training context search term feature vector and the training user search behavior semantic understanding feature vector includes: calculating the stream refinement loss function value between each training context search term feature vector and the training user search behavior semantic understanding feature vector according to the following loss formula; wherein $V_c$ represents each training context search term feature vector, $V_u$ represents the training user search behavior semantic understanding feature vector, $\|\cdot\|_2^2$ represents the square of the two-norm of a vector, $\ominus$ and $\odot$ represent position-wise subtraction and multiplication of vectors, respectively, $\exp(\cdot)$ represents the exponential operation of a vector, that is, computing the natural exponential function raised to the power of the feature value at each position of the vector, and $\mathcal{L}$ represents the stream refinement loss function value.
Here, it will be understood by those skilled in the art that the specific functions and operations of the respective modules in the above-described user behavior feature-based keyword intelligent resolution system 100 have been described in detail in the above description of the user behavior feature-based keyword intelligent resolution method with reference to fig. 1 to 6, and thus, repetitive descriptions thereof will be omitted.
As described above, the keyword intelligent resolution system 100 based on the user behavior feature according to the embodiment of the present application may be implemented in various wireless terminals, for example, a server or the like having a keyword intelligent resolution algorithm based on the user behavior feature. In one example, the keyword intelligent resolution system 100 based on user behavior features according to embodiments of the present application may be integrated into a wireless terminal as a software module and/or hardware module. For example, the user behavior feature-based keyword intelligent resolution system 100 may be a software module in the operating system of the wireless terminal or may be an application developed for the wireless terminal; of course, the intelligent keyword resolution system 100 based on the user behavior characteristics may also be one of a plurality of hardware modules of the wireless terminal.
Alternatively, in another example, the user behavior feature-based keyword intelligent resolution system 100 and the wireless terminal may be separate devices, and the user behavior feature-based keyword intelligent resolution system 100 may be connected to the wireless terminal through a wired and/or wireless network and transmit interaction information in an agreed data format.
According to another aspect of the present application there is also provided a non-volatile computer readable storage medium having stored thereon computer readable instructions which when executed by a computer can perform a method as described above.
Program portions of the technology may be considered to be "products" or "articles of manufacture" in the form of executable code and/or associated data, embodied in or carried by a computer readable medium. A tangible, persistent storage medium may include any memory or storage used by a computer, processor, or similar device or related module, such as various semiconductor memories, tape drives, disk drives, or the like, capable of providing storage functionality for software.
All or a portion of the software may at times be communicated over a network, such as the Internet or another communication network. Such communication may load the software from one computer device or processor to another, for example from a server or host computer to the hardware platform of a computer environment, to another computer environment implementing the system, or to a system with similar functionality related to providing the information required for keyword resolution. Thus, another medium capable of carrying software elements, such as optical, electrical, or electromagnetic waves propagating through cables, optical cables, air, and the like, may also be used as a physical connection between local devices. The physical media used for such carrier waves, such as electrical cables, wireless links, or optical cables, may also be considered to be software-bearing media. Unless limited to a tangible "storage" medium, other terms used herein that refer to a computer or machine "readable medium" mean any medium that participates in the execution of any instructions by a processor.
The application uses specific words to describe embodiments of the application. Reference to "a first/second embodiment," "an embodiment," and/or "some embodiments" means that a particular feature, structure, or characteristic is associated with at least one embodiment of the application. Thus, it should be emphasized and should be appreciated that two or more references to "an embodiment" or "one embodiment" or "an alternative embodiment" in various positions in this specification are not necessarily referring to the same embodiment. Furthermore, certain features, structures, or characteristics of one or more embodiments of the application may be combined as suitable.
Furthermore, those skilled in the art will appreciate that the various aspects of the application are illustrated and described in the context of a number of patentable categories or circumstances, including any novel and useful procedures, machines, products, or materials, or any novel and useful modifications thereof. Accordingly, aspects of the application may be performed entirely by hardware, entirely by software (including firmware, resident software, micro-code, etc.) or by a combination of hardware and software. The above hardware or software may be referred to as a "data block," "module," "engine," "unit," "component," or "system." Furthermore, aspects of the application may take the form of a computer product, comprising computer-readable program code, embodied in one or more computer-readable media.
Unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the relevant art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The foregoing is illustrative of the present invention and is not to be construed as limiting thereof. Although a few exemplary embodiments of this invention have been described, those skilled in the art will readily appreciate that many modifications are possible in the exemplary embodiments without materially departing from the novel teachings and advantages of this invention. Accordingly, all such modifications are intended to be included within the scope of this invention as defined in the following claims. It is to be understood that the foregoing is illustrative of the present invention and is not to be construed as limited to the specific embodiments disclosed, and that modifications to the disclosed embodiments, as well as other embodiments, are intended to be included within the scope of the appended claims. The invention is defined by the claims and their equivalents.
Claims (10)
1. The intelligent keyword distinguishing method based on the user behavior characteristics is characterized by comprising the following steps of: acquiring search behavior data of an analyzed user; text preprocessing is carried out on the user search behavior data to obtain preprocessed user search behavior data; after word segmentation processing is carried out on the preprocessed user search behavior data, a text convolutional neural network model containing a word embedding layer is used for obtaining a user search behavior semantic understanding feature vector; obtaining a search query entered by the analyzed user; after word segmentation processing is carried out on the search query, a word embedding layer is used for obtaining a sequence of the search word embedding vector; the sequence of the embedded vectors of the search terms passes through a context encoder based on a converter to obtain a sequence of feature vectors of the context search terms; calculating semantic association matrixes between each context search term feature vector and the user search behavior semantic understanding feature vector in the sequence of the context search term feature vectors to obtain a plurality of semantic association matrixes; respectively passing the plurality of semantic association matrixes through a classifier to obtain a plurality of probability values; and taking the search word corresponding to the maximum probability value in the plurality of probability values as a keyword.
2. The intelligent keyword resolution method based on user behavior features according to claim 1, wherein the word segmentation processing is performed on the preprocessed user search behavior data to obtain the semantic understanding feature vector of the user search behavior through a text convolutional neural network model including a word embedding layer, and the method comprises the following steps: word segmentation processing is carried out on the preprocessed user search behavior data so as to convert the preprocessed user search behavior data into a word sequence composed of a plurality of words; mapping each word in the word sequence to a word vector by using a word embedding layer of the text convolutional neural network model to obtain a sequence of word embedding vectors; and passing the sequence of word embedded vectors through the text convolutional neural network model to obtain the user search behavior semantic understanding feature vector.
3. The intelligent keyword resolution method based on user behavior features according to claim 2, wherein passing the sequence of word embedding vectors through the text convolutional neural network model to obtain the user search behavior semantic understanding feature vector comprises: performing, by each layer of the text convolutional neural network model in its forward pass, convolution processing, mean pooling processing, and nonlinear activation processing on the input data, so that the last layer of the text convolutional neural network model outputs the user search behavior semantic understanding feature vector, wherein the input of the first layer of the text convolutional neural network model is the sequence of word embedding vectors.
4. The intelligent keyword resolution method based on user behavior features according to claim 3, wherein passing the sequence of search word embedding vectors through the converter-based context encoder to obtain the sequence of context search term feature vectors comprises: arranging the sequence of search word embedding vectors one-dimensionally to obtain a global word sequence feature vector; calculating the product between the global word sequence feature vector and the transposed vector of each search word embedding vector in the sequence of search word embedding vectors to obtain a plurality of self-attention association matrices; normalizing each self-attention association matrix in the plurality of self-attention association matrices to obtain a plurality of normalized self-attention association matrices; passing each normalized self-attention association matrix in the plurality of normalized self-attention association matrices through a Softmax classification function to obtain a plurality of probability values; and weighting each search word embedding vector in the sequence of search word embedding vectors by the corresponding probability value in the plurality of probability values to obtain the sequence of context search term feature vectors.
5. The intelligent keyword resolution method based on user behavior features according to claim 4, wherein calculating the semantic association matrix between each context search term feature vector in the sequence of context search term feature vectors and the user search behavior semantic understanding feature vector to obtain the plurality of semantic association matrices comprises: calculating the semantic association matrix between each context search term feature vector in the sequence of context search term feature vectors and the user search behavior semantic understanding feature vector according to the following association formula, to obtain the plurality of semantic association matrices; wherein the association formula is: $M = V_c^{\top} \otimes V_u$, wherein $V_c$ represents each context search term feature vector in the sequence of context search term feature vectors, $V_c^{\top}$ represents the transposed vector of that vector, $V_u$ represents the user search behavior semantic understanding feature vector, $M$ represents each of the plurality of semantic association matrices, and $\otimes$ represents vector multiplication.
6. The intelligent keyword resolution method based on user behavior features according to claim 5, further comprising a training step of training the text convolutional neural network model including the word embedding layer, the converter-based context encoder, and the classifier; wherein the training step comprises: acquiring training search behavior data of an analyzed user and a true value of the keyword; performing text preprocessing on the training search behavior data to obtain training preprocessed user search behavior data; performing word segmentation on the training preprocessed user search behavior data and passing the result through the text convolutional neural network model including the word embedding layer to obtain a training user search behavior semantic understanding feature vector; acquiring a training search query entered by the analyzed user; performing word segmentation on the training search query and passing the result through the word embedding layer to obtain a sequence of training search word embedding vectors; passing the sequence of training search word embedding vectors through the converter-based context encoder to obtain a sequence of training context search term feature vectors; calculating a semantic association matrix between each training context search term feature vector in the sequence of training context search term feature vectors and the training user search behavior semantic understanding feature vector to obtain a plurality of training semantic association matrices; calculating a stream refinement loss function value between each training context search term feature vector and the training user search behavior semantic understanding feature vector; passing the plurality of training semantic association matrices through the classifier respectively to obtain a plurality of classification loss function values; and training the text convolutional neural network model including the word embedding layer, the converter-based context encoder, and the classifier, using a weighted sum of the stream refinement loss function value and the plurality of classification loss function values as the loss function, by back-propagation in the direction of gradient descent.
7. The intelligent keyword resolution method based on user behavior features according to claim 6, wherein calculating the stream refinement loss function value between each training context search term feature vector and the training user search behavior semantic understanding feature vector comprises: calculating the stream refinement loss function value between each training context search term feature vector and the training user search behavior semantic understanding feature vector according to the following loss formula; wherein $V_c$ represents each training context search term feature vector, $V_u$ represents the training user search behavior semantic understanding feature vector, $\|\cdot\|_2^2$ represents the square of the two-norm of a vector, $\ominus$ and $\odot$ represent position-wise subtraction and multiplication of vectors, respectively, $\exp(\cdot)$ represents the exponential operation of a vector, that is, computing the natural exponential function raised to the power of the feature value at each position of the vector, and $\mathcal{L}$ represents the stream refinement loss function value.
8. A keyword intelligent resolution system based on user behavior features, comprising: a search behavior data acquisition module for acquiring search behavior data of an analyzed user; a text preprocessing module for performing text preprocessing on the user search behavior data to obtain preprocessed user search behavior data; an embedded convolutional encoding module for performing word segmentation on the preprocessed user search behavior data and then obtaining a user search behavior semantic understanding feature vector through a text convolutional neural network model including a word embedding layer; a user input data acquisition module for acquiring a search query entered by the analyzed user; an embedding module for performing word segmentation on the search query and then obtaining a sequence of search word embedding vectors through the word embedding layer; a context encoding module for passing the sequence of search word embedding vectors through a converter-based context encoder to obtain a sequence of context search term feature vectors; a semantic association module for calculating a semantic association matrix between each context search term feature vector in the sequence of context search term feature vectors and the user search behavior semantic understanding feature vector to obtain a plurality of semantic association matrices; a classification module for passing the plurality of semantic association matrices through a classifier respectively to obtain a plurality of probability values; and a keyword acquisition module for taking the search word corresponding to the maximum probability value among the plurality of probability values as the keyword.
9. The intelligent keyword spotting system based on user behavior features of claim 8 wherein the embedded convolutional encoding module is to: word segmentation processing is carried out on the preprocessed user search behavior data so as to convert the preprocessed user search behavior data into a word sequence composed of a plurality of words; mapping each word in the word sequence to a word vector by using a word embedding layer of the text convolutional neural network model to obtain a sequence of word embedding vectors; and passing the sequence of word embedded vectors through the text convolutional neural network model to obtain the user search behavior semantic understanding feature vector.
10. The keyword intelligent resolution system based on user behavior features according to claim 9, wherein passing the sequence of word embedding vectors through the text convolutional neural network model to obtain the user search behavior semantic understanding feature vector comprises: performing, by each layer of the text convolutional neural network model in its forward pass, convolution processing, mean pooling processing, and nonlinear activation processing on the input data, so that the last layer of the text convolutional neural network model outputs the user search behavior semantic understanding feature vector, wherein the input of the first layer of the text convolutional neural network model is the sequence of word embedding vectors.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310419903.2A CN117113987A (en) | 2023-04-19 | 2023-04-19 | Keyword intelligent distinguishing method and system based on user behavior characteristics |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117113987A true CN117113987A (en) | 2023-11-24 |
Family
ID=88797216
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310419903.2A Pending CN117113987A (en) | 2023-04-19 | 2023-04-19 | Keyword intelligent distinguishing method and system based on user behavior characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117113987A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118069920A (en) * | 2024-04-19 | 2024-05-24 | 湖北华中电力科技开发有限责任公司 | Data acquisition system for access of massive multi-network protocol terminal equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020224097A1 (en) * | 2019-05-06 | 2020-11-12 | 平安科技(深圳)有限公司 | Intelligent semantic document recommendation method and device, and computer-readable storage medium |
CN115203380A (en) * | 2022-09-19 | 2022-10-18 | 山东鼹鼠人才知果数据科技有限公司 | Text processing system and method based on multi-mode data fusion |
CN115827939A (en) * | 2022-11-28 | 2023-03-21 | 华东冶金地质勘查局八一五地质队 | Digital archive management system |
CN115880036A (en) * | 2023-02-23 | 2023-03-31 | 山东金潮交通设施有限公司 | Parking stall level dynamic sharing intelligence management and control transaction platform |
CN115982736A (en) * | 2022-12-21 | 2023-04-18 | 南阳理工学院 | Data encryption method and system for computer network information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||