CN112182144A - Search term normalization method, computing device, and computer-readable storage medium

Publication number: CN112182144A (granted as CN112182144B)
Application number: CN202011374977.1A
Authority: CN (China)
Prior art keywords: vector, standard product, self-attention, search term
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventors: 杨涵, 陈广顺
Assignee: Zhenkunxing Network Technology Nanjing Co., Ltd.
Application filed by Zhenkunxing Network Technology Nanjing Co., Ltd.

Classifications

    • G06F16/316 Information retrieval of unstructured textual data; indexing structures
    • G06F16/3344 Information retrieval of unstructured textual data; query execution using natural language analysis
    • G06F18/22 Pattern recognition; matching criteria, e.g. proximity measures
    • G06F40/284 Handling natural language data; lexical analysis, e.g. tokenisation or collocates
    • G06F40/30 Handling natural language data; semantic analysis

Abstract

The present disclosure provides a search term normalization method, a computing device, and a computer-readable storage medium. The method comprises the following steps: constructing a plurality of pieces of training data based on historical data of a plurality of users; constructing a self-attention model of a deep semantic similarity model and training the self-attention model using the plurality of pieces of training data; determining a first similarity between the search term vector and the positive sample vector and a second similarity between the search term vector and the negative sample vector, respectively; determining a similarity loss function based on the first similarity and the second similarity; determining a loss function of the deep semantic similarity model based on the search term vectors, the positive sample vectors, and the negative sample vectors of the plurality of pieces of training data; generating a standard product thesaurus index based on the self-attention model and a standard product database; and determining a standard product name for a target search term input by a specific user based on the self-attention model and the standard product thesaurus index.

Description

Search term normalization method, computing device, and computer-readable storage medium
Technical Field
The present invention relates generally to the field of machine learning, and more particularly, to a search term normalization method, a computing device, and a computer-readable storage medium.
Background
With the continuous development of the Internet, more and more users meet their shopping needs through e-commerce search systems. However, in many cases, the search terms entered by a user are not standard product names, so the search results may contain a large amount of irrelevant product information. In particular, in the field of industrial product sales, product names often have professional, standardized expressions. With conventional search methods, when a non-standard search term is input, it is difficult to accurately hit the desired product, or a large number of redundant products appear in the recall result, so the user experience is poor. For example, when a user wishes to find a product such as a "water hose clamp" but does not know the standardized product name, the user may input a search term such as "clamp for water hose"; after segmenting the search term, the search system may recall products containing "water hose" and products containing "clamp" separately, which may not exactly match the user's needs.
Disclosure of Invention
To address these problems, the present invention provides a search term normalization scheme that rewrites any target search term input by a user into a standard search term by constructing an improved deep semantic similarity model and a standard product thesaurus index, so as to obtain more accurate recall results.
According to one aspect of the present invention, a search term normalization method is provided. The method comprises the following steps: constructing a plurality of pieces of training data based on historical data of a plurality of users, wherein each piece of training data comprises a search term, a positive sample, and a negative sample; constructing a self-attention model of a deep semantic similarity model and training the self-attention model using the plurality of pieces of training data, wherein, for each piece of training data, the self-attention model outputs a search term vector, a positive sample vector, and a negative sample vector; determining, based on the search term vector, the positive sample vector, and the negative sample vector of each piece of training data, a first similarity between the search term vector and the positive sample vector and a second similarity between the search term vector and the negative sample vector, respectively; determining a similarity loss function based on the first similarity and the second similarity; determining a loss function of the deep semantic similarity model based on the search term vectors, the positive sample vectors, and the negative sample vectors of the plurality of pieces of training data; generating a standard product thesaurus index based on the self-attention model and a standard product database; and determining a standard product name for a target search term input by a specific user based on the self-attention model and the standard product thesaurus index.
In one embodiment, constructing the plurality of pieces of training data comprises: determining a search term from the historical data of the plurality of users; determining, as the positive sample, product data on which the user who input the search term performed a predetermined operation in the search results, wherein the predetermined operation comprises at least one of clicking, adding to a shopping cart, and purchasing; and determining the negative sample based on the standard product database.
In one embodiment, the negative sample comprises at least one of: product data randomly selected from the standard product database; and product data in the standard product database having the same parent category as the positive sample.
In one embodiment, training the self-attention model using the plurality of pieces of training data comprises: performing character-level word embedding on the search term, the positive sample, and the negative sample of each piece of training data, respectively, to obtain character-level word embedding vectors of the search term, the positive sample, and the negative sample; performing odd-even position encoding on the search term, the positive sample, and the negative sample of each piece of training data, respectively, to obtain position encoding vectors of the search term, the positive sample, and the negative sample; merging and normalizing the character-level word embedding vectors and the position encoding vectors of the search term, the positive sample, and the negative sample, respectively, to obtain normalized vectors of the search term, the positive sample, and the negative sample; in each of at least one self-attention head, operating on the normalized vectors of the search term, the positive sample, and the negative sample using a self-attention function to obtain self-attention vectors of the search term, the positive sample, and the negative sample, respectively; in each self-attention head, operating on the self-attention vectors of the search term, the positive sample, and the negative sample using a nonlinear activation function to obtain fully-connected vectors of the search term, the positive sample, and the negative sample; and averaging the at least one fully-connected vector of the search term, the positive sample, and the negative sample obtained from the at least one self-attention head to obtain the search term vector, the positive sample vector, and the negative sample vector, respectively.
In one embodiment, determining the loss function of the deep semantic similarity model comprises: determining, based on the search term vector, the positive sample vector, and the negative sample vector of each piece of training data, a first similarity between the search term vector and the positive sample vector and a second similarity between the search term vector and the negative sample vector, respectively; determining a similarity loss function based on the first similarity and the second similarity; determining a category classification loss function based on the categories of the search terms and the search term vectors of the plurality of pieces of training data; and determining the loss function of the deep semantic similarity model based on the similarity loss function and the category classification loss function.
In one embodiment, the similarity loss function comprises a triplet loss function (Triplet Loss).
In one embodiment, generating the standard product thesaurus index comprises: acquiring the standard product database; inputting each standard product name in the standard product database into the self-attention model to generate a name vector for the standard product name; and operating the name vectors of all the standard product names in the standard product database by using a locality sensitive hashing algorithm based on a random hyperplane to generate a hash index tree of the name vectors of all the standard product names in the standard product database, wherein in the hash index tree, hash indexes of adjacent name vectors are adjacent.
In one embodiment, determining a standard product name for a target search term entered by a particular user based on the self-attention model and the standard product thesaurus index comprises: receiving a target search word input by the specific user; inputting the target search term into the self-attention model to generate a target search term vector for the target search term; recalling a target name vector corresponding to the target search word vector in the standard product thesaurus index; and acquiring a standard product name corresponding to the target name vector as the standard product name of the target search term.
According to another aspect of the invention, a computing device is provided. The computing device includes: at least one processor; and at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions when executed by the at least one processor causing the computing device to perform steps according to the above-described method.
According to yet another aspect of the present invention, a computer-readable storage medium is provided, having stored thereon computer program code, which when executed performs the method as described above.
With the scheme of the present invention, any search term input by a user can be rewritten into a standard search term (i.e., a standard product name) by constructing an improved deep semantic similarity model and a standard product thesaurus index, so that a more accurate recall result can be obtained based on the standard search term.
Drawings
The invention will be better understood and other objects, details, features and advantages thereof will become more apparent from the following description of specific embodiments of the invention given with reference to the accompanying drawings.
FIG. 1 shows a schematic diagram of a system for implementing a search term normalization method according to an embodiment of the invention.
FIG. 2 illustrates a flow diagram of a search term normalization method according to some embodiments of the invention.
FIG. 3 shows a flowchart of steps for constructing pieces of training data, according to an embodiment of the invention.
FIG. 4 is a schematic structural diagram of a deep semantic similarity model according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating steps for training a self-attention model using training data according to an embodiment of the present invention.
FIG. 6 shows a flowchart of steps for determining a loss function for a deep semantic similarity model, according to an embodiment of the invention.
FIG. 7 is a flowchart illustrating the steps of generating a standard product thesaurus index according to an embodiment of the present invention.
FIG. 8 shows a flowchart of the steps of recalling a target standard product according to an embodiment of the present invention.
FIG. 9 illustrates a block diagram of a computing device suitable for implementing embodiments of the present invention.
Detailed Description
Preferred embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In the following description, for the purposes of illustrating various inventive embodiments, certain specific details are set forth in order to provide a thorough understanding of the various inventive embodiments. One skilled in the relevant art will recognize, however, that the embodiments may be practiced without one or more of the specific details. In other instances, well-known devices, structures and techniques associated with this application may not be shown or described in detail to avoid unnecessarily obscuring the description of the embodiments.
Throughout the specification and claims, the word "comprise" and variations thereof, such as "comprises" and "comprising," are to be understood as an open, inclusive meaning, i.e., as being interpreted to mean "including, but not limited to," unless the context requires otherwise.
Reference throughout this specification to "one embodiment" or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment. Thus, the appearances of the phrases "in one embodiment" or "in some embodiments" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the terms first, second and the like used in the description and the claims are used for distinguishing objects for clarity, and do not limit the size, other order and the like of the described objects.
Fig. 1 shows a schematic diagram of a system 1 for implementing a search term normalization method according to an embodiment of the invention. As shown in fig. 1, the system 1 includes a user terminal 10, a computing device 20, a server 30, and a network 40. User terminal 10, computing device 20, and server 30 may exchange data via network 40. Here, each user terminal 10 may be a mobile or fixed terminal of an end user, such as a mobile phone, a tablet computer, a desktop computer, or the like. The user terminal 10 may communicate with a server 30 of an e-commerce enterprise, for example through an e-commerce application or a specific search engine installed thereon, to send information to the server 30 and/or receive information from the server 30. The computing device 20 performs corresponding operations based on data from the user terminal 10 and/or the server 30. The computing device 20 may include at least one processor 210 and at least one memory 220 coupled to the at least one processor 210, the memory 220 having stored therein instructions 230 executable by the at least one processor 210, the instructions 230, when executed by the at least one processor 210, performing at least a portion of the method 100 as described below. Note that, herein, computing device 20 may be part of server 30 or may be separate from server 30. The specific structure of computing device 20 or server 30 may be as described below, for example, in connection with FIG. 9.
FIG. 2 illustrates a flow diagram of a search term normalization method 100 according to some embodiments of the invention. The method 100 may be performed, for example, by the computing device 20 or the server 30 in the system 1 shown in fig. 1. The method 100 is described below in conjunction with fig. 1-9, with an example being performed in the computing device 20.
As shown in FIG. 2, method 100 includes step 110, where computing device 20 constructs a plurality of pieces of training data based on historical data of a plurality of users. Each piece of training data may include a search term (query), a positive sample (positive_item), and a negative sample (negative_item). The positive sample positive_item indicates product data on which a user performed a predetermined operation among the search results for the search term query, and the negative sample negative_item indicates product data unrelated to the search results for the search term query.
FIG. 3 shows a flowchart of step 110 for constructing pieces of training data, according to an embodiment of the invention.
As shown in fig. 3, step 110 may include sub-step 112, in which computing device 20 may determine a search term from the historical data of the plurality of users. Here, the historical data of the plurality of users may be stored in the server 30 or an external database associated therewith. In constructing the training data, computing device 20 may retrieve the historical data of these users from the server 30 or the external database. One piece of historical data may include, for example, user information (e.g., a user ID), a search term, a search result corresponding to the search term (e.g., a list of searched products), a user operation, product data corresponding to the user operation (e.g., product data in the search result on which the user performed a specific operation), and other information (e.g., time information).
In sub-step 112, computing device 20 may extract the search term from a piece of historical data.
In sub-step 114, computing device 20 determines (e.g., from the historical data described above), as the positive sample positive_item, product data on which the user who entered the search term performed a predetermined operation in the search results. For example, the positive sample positive_item may be product data on which the user performed at least one of a click operation, an add-to-shopping-cart operation, and a purchase operation. For one search performed by one user, if the user performs any of the above predetermined operations on a plurality of product data items in the search result, one positive sample may be generated for each of the operated product data items, or only one or more of them may be selected to each generate one positive sample. That is, a piece of historical data may produce one or more positive samples (and thus one or more pieces of training data). Of course, a piece of historical data may also produce no positive sample (and thus no training data), for example when the user has not performed any predetermined operation on any product in the search results. In this case, the piece of historical data is not used to construct training data.
In sub-step 116, computing device 20 determines the negative sample negative_item of the piece of training data based on the standard product database. The standard product database may be a product database constructed by the e-commerce enterprise on which the user performs retrieval, and may be stored in advance in the server 30 of the e-commerce enterprise or an external storage connected thereto. In sub-step 116, computing device 20 may send a request to server 30 or the external storage and receive product data back from server 30 or the external storage. The negative sample negative_item should be product data that is substantially unrelated to the search results for the search term.
In one embodiment, the negative sample negative_item may be product data randomly selected from the standard product database. Since a standard product database usually contains a large amount of product data, randomly selected product data is unlikely to be the same as or similar to the product data of the positive sample positive_item, and can therefore be regarded as sample data completely unrelated to the positive sample positive_item.
The negative sample negative_item may also be product data in the standard product database that has the same parent category as the corresponding positive sample positive_item. Typically, in a standard product database, each piece of product data has corresponding category data, and the category data of the products is typically stored in a tree structure in the server 30 or an external storage connected thereto. Each product data item may correspond to multiple category levels, which are, from top to bottom, a first-level category, a second-level category, a third-level category, and so on. The number of levels of the category tree may vary from enterprise to enterprise and from product to product. In some embodiments according to the invention, the number of category levels is 4: the lowest-level category of each product data item is called the final category (also called the fourth-level category when there are 4 levels), the category one level above it is called the parent category (the third-level category), the category one level above the parent category is called the grandparent category (the second-level category), and the category one level above the grandparent category is called the great-grandparent category (the first-level category). Thus, product data having the same parent category as the positive sample positive_item refers to product data under other fourth-level categories that share the same third-level category as the positive sample positive_item. If the number of levels is greater than 4, the naming can be extended upward by analogy; if the number of levels is less than 4, the categories can likewise be called the first-level category, the second-level category, and so on from top to bottom.
Alternatively, the negative samples in the plurality of pieces of training data may be determined using a combination of the two different approaches described above. For example, the negative samples in one portion of the training data (e.g., 80% of the training data) may be determined by random selection, and the negative samples in another portion of the training data (e.g., 20% of the training data) may be determined based on the same parent category.
At step 110, a plurality of pieces of training data may be constructed in the manner described above in sub-steps 112 to 116. For example, an instance of a piece of training data may be represented as:

search term query: infrared thermometer
positive sample positive_item: UNI-T infrared thermometer UT300A+
negative sample negative_item: FLUKE-802CN

It can be seen that, in this piece of training data, the search term query is "infrared thermometer", the positive sample positive_item is "UNI-T infrared thermometer UT300A+", and the negative sample negative_item is "FLUKE-802CN". In the present example, the product data of the positive and negative samples respectively include the brands "UNI-T" and "Fluke", the product names "infrared thermometer" and "vibration tester", and the models "UT300A+" and "FLUKE-802CN", although those skilled in the art will appreciate that the product data of the positive and negative samples may include more or less information.
Continuing with FIG. 2, at step 120, computing device 20 builds a self-attention model of the deep semantic similarity model and trains the self-attention model using the pieces of training data built at step 110.
FIG. 4 shows a structural schematic diagram of a deep semantic similarity model 400 according to an embodiment of the invention. As shown in fig. 4, the deep semantic similarity model 400 may include a self-attention model 410, a similarity calculation module 420, and a fully-connected classification layer 430.
In the model training phase, the self-attention model 410 may take as input each piece of training data constructed as described above (including the search term query, the positive sample positive_item, and the negative sample negative_item) and output a corresponding search term vector query_vec, positive sample vector positive_item_vec, and negative sample vector negative_item_vec, respectively, as shown in fig. 4. That is, although the search term query, the positive sample positive_item, and the negative sample negative_item are input into the self-attention model 410 as one piece of training data, they are processed separately and obtain their respective vectors in the self-attention model 410. In the model use stage, the self-attention model 410 may take as input a target search term query input by the user and output a target search term vector query_vec of the target search term query (not shown in the figure). The self-attention model 410 is described below primarily in terms of the model training phase.
As shown in fig. 4, the self-attention model 410 of the deep semantic similarity model 400 may include a character-level word embedding layer 411, a position encoding layer 412, a merging layer 413, a normalization layer 414, and at least one self-attention head (each including a self-attention layer 415 and a first fully-connected layer 416). For the case where multiple self-attention heads are included in the self-attention model 410 (e.g., "self-attention head x 3" shown in fig. 4 indicates that there are 3 self-attention heads), the self-attention model 410 may also include a multi-head self-attention merging layer 417. The functions of the various components of self-attention model 410 are described in detail below with reference to FIG. 5.
The similarity calculation module 420 of the deep semantic similarity model 400 may include a similarity calculation layer 422 and a similarity loss function determination layer 424. The functions of the various components of the similarity calculation module 420 will be described in detail below with reference to fig. 6.
The fully-connected classification layer 430 of the deep semantic similarity model 400 may include a second fully-connected layer 432 and a category classification loss function determination layer 434. The functions of the various components of the fully-connected classification layer 430 will be described in detail below with reference to fig. 6.
FIG. 5 shows a flowchart of step 120 of training a self-attention model 410 with training data, according to an embodiment of the present invention.
As shown in fig. 5, step 120 may include sub-step 121, where at character-level word embedding layer 411, computing device 20 performs character-level word embedding on the search term query, positive sample positive_item, and negative sample negative_item of each piece of training data to obtain character-level word embedding vectors $C_{query}$, $C_{positive\_item}$, and $C_{negative\_item}$ of the search term, the positive sample, and the negative sample, respectively (hereinafter collectively referred to as $C$).
In one embodiment, each character of the input text may be embedded as a $D$-dimensional vector ($D$ is an adjustable parameter value, which may be equal to 256, for example) to determine the character-level word embedding vector of the input text. For example, the character-level word embedding vector $C$ may be determined according to the following formula (1):

$$C = [c_1, c_2, \dots, c_N], \qquad c_i = \mathrm{Embed}(w_i) \qquad (1)$$

where $C$ represents the character-level word embedding vector of the input text (e.g., the input search term query, positive sample positive_item, or negative sample negative_item), which is a single $D \times N$ matrix; $w_i$ denotes the $i$-th character in the input text of length $N$; $c_i \in \mathbb{R}^{D}$ is the $D$-dimensional real-valued character-level word embedding corresponding to the $i$-th character; and the embedding $\mathrm{Embed}(\cdot)$ is determined by a hash lookup.
In sub-step 122, at position encoding layer 412, computing device 20 performs odd-even position encoding on the search term query, positive sample positive_item, and negative sample negative_item of each piece of training data to obtain position encoding vectors $P_{query}$, $P_{positive\_item}$, and $P_{negative\_item}$ of the search term, the positive sample, and the negative sample, respectively (hereinafter collectively referred to as $P$). For example, the position encoding vector $P$ may be determined according to the following formula (2):

$$P_{i,d} = \begin{cases} \sin\!\left(i/10000^{\,d/D}\right), & d \text{ even} \\ \cos\!\left(i/10000^{\,(d-1)/D}\right), & d \text{ odd} \end{cases} \qquad (2)$$

where $P$ represents the position encoding vector of the input text (e.g., the input search term query, positive sample positive_item, or negative sample negative_item), which is also a single $D \times N$ matrix; $w_i$ is the $i$-th character in the input text of length $N$; $d$ denotes the $d$-th dimension of the $D$-dimensional real vector; and $P_{i,d}$ is the $d$-th dimension value of the $D$-dimensional position encoding vector $P_i$ corresponding to the $i$-th character. When calculating $P_{i,d}$, different position encoding functions are called according to whether the position is odd or even, and relative position encodings between different positions are obtained through the trigonometric relations of the sine (sin) and cosine (cos) functions.
Although sub-step 122 is shown in fig. 5 as being subsequent to sub-step 121, it will be understood by those skilled in the art that sub-step 122 may be performed in parallel with sub-step 121 or before sub-step 121.
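By way of illustration only, the following sketch (in Python with NumPy) shows one possible computation of the odd-even position encoding of formula (2), assuming the sinusoidal form reconstructed above; the function and variable names are illustrative.

    import numpy as np

    def position_encoding(N, D):
        """Return a D x N matrix P following formula (2): sine for even
        dimensions d, cosine for odd dimensions d."""
        P = np.zeros((D, N))
        for i in range(N):
            for d in range(D):
                if d % 2 == 0:
                    P[d, i] = np.sin(i / (10000 ** (d / D)))
                else:
                    P[d, i] = np.cos(i / (10000 ** ((d - 1) / D)))
        return P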
Next, in sub-step 123, at merge layer 413 and normalization layer 414, computing device 20 merges and normalizes the character-level word embedding vector $C$ and the position encoding vector $P$ of the search term query, positive sample positive_item, and negative sample negative_item, respectively, to obtain normalized vectors $Norm_{query}$, $Norm_{positive\_item}$, and $Norm_{negative\_item}$ of the search term, the positive sample, and the negative sample, respectively (hereinafter collectively referred to as $Norm$).
For example, at the merge layer 413, the merged vector $X$ of the character-level word embedding vector $C$ and the position encoding vector $P$ (also a $D \times N$ matrix) may be determined according to the following formula (3):

$$X = C + P \qquad (3)$$

i.e., the character-level word embedding vector $C$ and the position encoding vector $P$ are merged using a matrix addition operation.
Further, at the normalization layer 414, the element in the $i$-th row and $d$-th column of the normalized vector $Norm$ may be determined according to the following formula (4):

$$Norm(x_{i,d}) = g_d \cdot \frac{x_{i,d} - \mu_d}{\sigma_d} + b_d \qquad (4)$$

where $x_{i,d}$ represents the $d$-th dimension value of the $i$-th vector $(P_i + C_i)$ of the input merged vector $X$, $\mu_d$ represents the mean of the $d$-th dimension values of the input merged vector $X$ over the length $N$, and $\sigma_d$ represents the standard deviation of the $d$-th dimension values of the input merged vector $X$ over the length $N$. In the self-attention model 410, $g$ and $b$ are $D$-dimensional trainable parameters.
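By way of illustration only, a minimal sketch of the merging and normalization of formulas (3) and (4), assuming per-dimension statistics over the sequence length N as described above (NumPy; the names merge_and_normalize, g, b, and eps are illustrative):

    import numpy as np

    def merge_and_normalize(C, P, g, b, eps=1e-6):
        """C, P: D x N character-embedding and position-encoding matrices (formula (3)).
        g, b: D-dimensional trainable scale and bias (formula (4))."""
        X = C + P                                  # formula (3): matrix addition
        mu = X.mean(axis=1, keepdims=True)         # per-dimension mean over length N
        sigma = X.std(axis=1, keepdims=True)       # per-dimension standard deviation over length N
        return g[:, None] * (X - mu) / (sigma + eps) + b[:, None]   # formula (4)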
Next, at sub-step 124, at the self-attention layer 415 in each self-attention head, computing device 20 operates on the normalized vectors $Norm$ of the search term query, positive sample positive_item, and negative sample negative_item using a self-attention function to obtain self-attention vectors $Attention_{query}$, $Attention_{positive\_item}$, and $Attention_{negative\_item}$ of the search term, the positive sample, and the negative sample, respectively (hereinafter collectively referred to as $Attention$).
For example, the self-attention vector $Attention$ may be determined according to the following formula (5):

$$Attention(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{D}}\right)V \qquad (5)$$

where $Q$, $K$, and $V$ represent the results of the dot products of the input $x$ (i.e., the normalized vectors $Norm$ of the search term query, positive sample positive_item, and negative sample negative_item, respectively) with the trainable parameters $W_q$, $W_k$, and $W_v$, respectively; $K^{T}$ represents the transpose of $K$; and $\sqrt{D}$ is the square root of the dimension $D$ of the input $x$ (to avoid gradient vanishing). The result of $\mathrm{softmax}(QK^{T}/\sqrt{D})$ is the self-attention weight, and after multiplication with $V$, the self-attention weighting of the input $x$ is obtained.
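By way of illustration only, a minimal sketch of the scaled dot-product self-attention of formula (5) (NumPy; for convenience the sequence is arranged as N x D rows, and the projection with W_q, W_k, W_v is included; all names are illustrative):

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(x, W_q, W_k, W_v):
        """x: N x D normalized input; W_q, W_k, W_v: D x D trainable parameters.
        Returns the N x D self-attention output of formula (5)."""
        Q, K, V = x @ W_q, x @ W_k, x @ W_v
        D = x.shape[-1]
        weights = softmax(Q @ K.T / np.sqrt(D))    # self-attention weights
        return weights @ V                         # self-attention weighting of the input x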
Then, in sub-step 125, at the first fully-connected layer 416 in each self-attention head, computing device 20 operates on the self-attention vectors $Attention$ of the search term query, positive sample positive_item, and negative sample negative_item using a nonlinear activation function to obtain fully-connected vectors $FeedForward_{query}$, $FeedForward_{positive\_item}$, and $FeedForward_{negative\_item}$ of the search term, the positive sample, and the negative sample, respectively (hereinafter collectively referred to as $FeedForward$).
For example, the fully-connected vector $FeedForward$ may be determined according to the following formula (6):

$$FeedForward(x) = \mathrm{ReLU}(Wx + b) \qquad (6)$$

where $W$ and $b$ are trainable parameters and $\mathrm{ReLU}$ is a nonlinear activation function.
By including the first fully-connected layer 416 in the self-attention head, the nonlinearity of the deep semantic similarity model 400 can be increased.
In the case where self-attention model 410 includes only one self-attention head, the fully-connected vectors $FeedForward_{query}$, $FeedForward_{positive\_item}$, and $FeedForward_{negative\_item}$ obtained in sub-step 125 are the search term vector query_vec, positive sample vector positive_item_vec, and negative sample vector negative_item_vec output from the self-attention model 410.
In the case where self-attention model 410 includes multiple self-attention heads, step 120 may further include sub-step 126, where computing device 20, at multi-head self-attention merge layer 417, averages the multiple fully-connected vectors $FeedForward$ of the search term query, positive sample positive_item, and negative sample negative_item obtained from the multiple self-attention heads to obtain the search term vector query_vec, the positive sample vector positive_item_vec, and the negative sample vector negative_item_vec, respectively.
For example, the multiple fully-connected vectors obtained from the multiple self-attention heads may be averaged according to the following formula (7):

$$MultiHead_d = \frac{1}{J}\sum_{j=1}^{J} FeedForward_{j,d} \qquad (7)$$

where $j$ denotes the $j$-th self-attention head among the $J$ self-attention heads, $d$ denotes the $d$-th dimension of the $D$-dimensional real vector, and $FeedForward_{j,d}$ is the $d$-th dimension of the fully-connected vector obtained from the $j$-th head in sub-step 125. The operation averages over the $J$ heads to obtain the $D$-dimensional multi-head self-attention vector $MultiHead$, which serves as the corresponding search term vector query_vec, positive sample vector positive_item_vec, or negative sample vector negative_item_vec.
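By way of illustration only, a minimal sketch combining formulas (6) and (7): each head applies the ReLU fully-connected layer to its self-attention output, and the head outputs are averaged per dimension into a single D-dimensional vector. It reuses the self_attention sketch above; the pooling over the N characters into one vector is an assumption, since the text does not spell out that step, and all names are illustrative.

    import numpy as np

    def encode_with_heads(x, heads):
        """x: N x D normalized input. heads: list of (W_q, W_k, W_v, W_f, b_f) per head."""
        outputs = []
        for W_q, W_k, W_v, W_f, b_f in heads:
            att = self_attention(x, W_q, W_k, W_v)       # formula (5): N x D
            ff = np.maximum(0.0, att @ W_f + b_f)        # formula (6): ReLU fully-connected layer
            outputs.append(ff.mean(axis=0))              # assumed pooling over the N characters -> D
        return np.mean(outputs, axis=0)                  # formula (7): average over the J heads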
Continuing with FIG. 2, at step 130, computing device 20 may determine a loss function for the deep semantic similarity model 400 based on the search term vectors query_vec, positive sample vectors positive_item_vec, and negative sample vectors negative_item_vec of the plurality of pieces of training data. In the deep semantic similarity model 400 according to the present invention, the loss function of the model is considered in terms of both similarity and product category.
FIG. 6 shows a flowchart of step 130 for determining the loss function of the deep semantic similarity model 400 according to an embodiment of the present invention. The implementation of step 130 is further described below in conjunction with fig. 4 and 6.
As shown in fig. 6, step 130 may include sub-step 132, wherein at similarity calculation layer 422, computing device 20 determines, based on the search term vector query_vec, the positive sample vector positive_item_vec, and the negative sample vector negative_item_vec of each piece of training data, a first similarity between the search term vector query_vec and the positive sample vector positive_item_vec and a second similarity between the search term vector query_vec and the negative sample vector negative_item_vec, respectively.
In one embodiment, the first similarity and the second similarity may be determined using cosine similarity. For example, the first similarity $sim(Q, I^{+})$ and the second similarity $sim(Q, I^{-})$ may be determined based on the following formulas (8) and (9), respectively:

$$sim(Q, I^{+}) = \frac{Q \cdot I^{+}}{\lVert Q \rVert \, \lVert I^{+} \rVert} \qquad (8)$$

$$sim(Q, I^{-}) = \frac{Q \cdot I^{-}}{\lVert Q \rVert \, \lVert I^{-} \rVert} \qquad (9)$$

That is, the similarity between the search term query and the positive sample positive_item or the negative sample negative_item is defined as the dot product of the search term vector query_vec ($Q$) and the positive sample vector positive_item_vec ($I^{+}$) or the negative sample vector negative_item_vec ($I^{-}$), divided by the product of their norms.
Next, at sub-step 134, at similarity loss function determination layer 424, computing device 20 determines a similarity loss function based on the first similarity $sim(Q, I^{+})$ and the second similarity $sim(Q, I^{-})$.
In an embodiment of the present invention, the similarity loss function may include a triplet loss function (Triplet Loss). For example, the similarity loss function $L_{triplet}$ may be determined using the following formula (10):

$$L_{triplet} = \max\!\bigl(0,\; sim(Q, I^{-}) - sim(Q, I^{+}) + m\bigr) \qquad (10)$$

where $m$ is a boundary (margin) value, i.e., a minimum difference that can be used to control how much more similar the positive sample must be than the negative sample; its value may be between 0.2 and 0.6 (for example, a preferred value is 0.4), that is, the positive sample similarity (the first similarity $sim(Q, I^{+})$) is required to be at least 0.4 greater than the negative sample similarity (the second similarity $sim(Q, I^{-})$).
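By way of illustration only, a minimal sketch of the cosine similarities of formulas (8) and (9) and the triplet loss of formula (10), with the margin m as described above (NumPy; names are illustrative):

    import numpy as np

    def cosine_sim(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))   # formulas (8)/(9)

    def triplet_loss(query_vec, positive_item_vec, negative_item_vec, m=0.4):
        sim_pos = cosine_sim(query_vec, positive_item_vec)    # first similarity
        sim_neg = cosine_sim(query_vec, negative_item_vec)    # second similarity
        return max(0.0, sim_neg - sim_pos + m)                # formula (10), margin m in [0.2, 0.6]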
At sub-step 136, at fully-connected classification layer 430, computing device 20 determines a category classification loss function $L_{category}$ based on the categories of the search terms query and the search term vectors query_vec of the plurality of pieces of training data.
More specifically, at the second fully-connected layer 432 of fully-connected classification layer 430, computing device 20 may determine a second fully-connected vector $FeedForward2$ of the search term vector query_vec in a manner similar to formula (6) in sub-step 125 above. The difference from sub-step 125 is that, for the second fully-connected layer 432, $W$ is defined with dimensions $[D, M]$, where $D$ is the dimension of the input search term vector query_vec and $M$ is the total number of categories (e.g., parent categories of the final category) of products in the standard product database. During training of the deep semantic similarity model 400, for example when determining the positive sample positive_item in step 110 or sub-step 114 described above, computing device 20 may determine a category of the positive sample positive_item (e.g., the parent category of its final category), set the element corresponding to the determined category (representing the probability that the positive sample positive_item belongs to that category) in the $M$-dimensional category label vector to 1, and set the other elements to 0.
Thus, in the use stage of the trained deep semantic similarity model 400, when a target search term vector query_vec of a target search term is input, the second fully-connected layer 432 may output an $M$-dimensional second fully-connected vector $FeedForward2$, each dimension of which corresponds to the probability that the target search term query belongs to the corresponding category.
Then, at category classification loss function determination layer 434 of fully-connected classification layer 430, computing device 20 may determine the category classification loss function $L_{category}$ based on this second fully-connected vector $FeedForward2$. For example, the category classification loss function $L_{category}$ may be determined using the following formula (11):

$$L_{category} = -\sum_{i=1}^{M}\Bigl[\,y_i \log p(y_i) + (1 - y_i)\log\bigl(1 - p(y_i)\bigr)\Bigr] \qquad (11)$$

This is a binary cross-entropy loss function, where $M$ is the total number of categories (e.g., parent categories of the final category) of products in the standard product database, $y_i$ represents the actual value of the $i$-th category corresponding to the second fully-connected vector $FeedForward2$ output by the second fully-connected layer 432, $p(\cdot)$ represents the Sigmoid activation function, and $p(y_i)$ represents the category classification probability of the $i$-th category.
Step 130 further includes sub-step 138, in which computing device 20 determines a loss function for the entire deep semantic similarity model 400 based on the similarity loss function $L_{triplet}$ and the category classification loss function $L_{category}$.
For example, the loss function $L$ of the deep semantic similarity model 400 may be determined using the following formula (12):

$$L = L_{triplet} + L_{category} \qquad (12)$$

That is, the loss function $L$ of the deep semantic similarity model 400 is defined as the sum of the similarity loss function $L_{triplet}$ and the category classification loss function $L_{category}$.
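By way of illustration only, a minimal sketch of the category classification loss of formula (11) and the combined loss of formula (12), assuming an M-dimensional one-hot category label as described above (NumPy; names are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def category_loss(logits, one_hot_label):
        """logits: M-dimensional output of the second fully-connected layer 432;
        one_hot_label: M-dimensional vector with 1 at the positive sample's parent
        category and 0 elsewhere. Binary cross-entropy over the M categories, formula (11)."""
        p = sigmoid(logits)
        y = one_hot_label
        return float(-np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)))

    def total_loss(l_triplet, l_category):
        return l_triplet + l_category     # formula (12): sum of the two losses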
In the deep semantic similarity model 400, the triplet loss function (Triplet Loss) is used to replace the loss function (SoftMax Loss) of the original model, so that product names with similar characters but different semantics (such as the two product names "safety helmet" and "safety shoes", which have high character similarity but low semantic similarity) can be better distinguished, and the text encoder is also helped to learn a better text encoding vector representation. Meanwhile, a multi-task optimization mode is used during training, and the category classification loss function $L_{category}$ is added to assist the deep semantic similarity model 400 in improving its encoding capability for text information.
During the training of the deep semantic similarity model 400 with the plurality of pieces of training data in steps 120 and 130, stochastic gradient descent may be used to continually optimize the loss function $L$ of the deep semantic similarity model 400 until the model converges.
Using the self-attention model 410 of the deep semantic similarity model 400 constructed as above, an arbitrarily input search word can be converted into an embedded vector. In order to determine the standard product data corresponding to any input search word, a standard product thesaurus index should also be established for all standard products for vector retrieval.
Specifically, as shown in FIG. 2, method 100 further includes step 140, wherein computing device 20 generates a standard product thesaurus index based on self-attention model 410 and a standard product database.
FIG. 7 shows a flowchart of the step 140 of generating a standard product thesaurus index according to an embodiment of the present invention.
As shown in fig. 7, step 140 may include sub-step 142, wherein computing device 20 obtains the standard product database. As previously mentioned, the standard product database may be a product database constructed by an electronic commerce enterprise in which a user performs retrieval, which may be stored in advance in the server 30 of the electronic commerce enterprise or an external storage connected thereto. Thus, in generating the standard product thesaurus index, the computing device 20 should first retrieve (e.g., from the server 30 or an external memory connected thereto) all of the product data in the standard product database.
In sub-step 144, computing device 20 inputs each standard product name (standard_item) in the standard product database into the trained self-attention model 410 to generate a name vector (standard_item_vec) for the standard product name.
Here, the method for generating the name vector standard_item_vec of the standard product name standard_item is similar to the method described in step 120 above for generating the corresponding search term vector query_vec, positive sample vector positive_item_vec, and negative sample vector negative_item_vec for the search term query, positive sample positive_item, and negative sample negative_item, respectively, and is therefore not described again here.
In sub-step 146, computing device 20 operates on the name vectors standard_item_vec of all standard product names in the standard product database using a locality sensitive hashing algorithm based on random hyperplanes to generate a hash index tree of the name vectors standard_item_vec of all standard product names in the standard product database.
In particular, in one embodiment, for the set $V \subset \mathbb{R}^{d}$ of all name vectors standard_item_vec (where $\mathbb{R}^{d}$ denotes the $d$-dimensional real vector space), a random hyperplane $r$ may be sampled from a $d$-dimensional Gaussian distribution, and the vector set $V$ is divided into two spaces using $r$. For the random hyperplane $r$, the following locality sensitive hash (LSH) function $h_r$ may be defined:

$$h_r(v) = \begin{cases} 1, & r \cdot v \ge 0 \\ 0, & r \cdot v < 0 \end{cases}$$

Repeatedly executing the LSH operation yields a hash index tree of the name vectors of the standard product names, where the depth of the hash index tree is equal to the number of repetitions of the LSH operation. The complexity of obtaining the neighboring vectors of a vector through the hash index tree thus generated is $O(\log N)$, where $N$ is the vector set size. That is, in the hash index tree thus generated, the hash indexes of adjacent name vectors standard_item_vec are also adjacent, and thus such a hash index tree can serve as the thesaurus index of the standard product database.
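By way of illustration only, a minimal sketch of random-hyperplane locality sensitive hashing as described above: each level samples a Gaussian random hyperplane and splits vectors by the sign of the dot product, so repeating the split yields a hash code whose length equals the tree depth. The bucket dictionary is a simplification of the hash index tree, and all names are illustrative.

    import numpy as np

    def build_lsh_hyperplanes(dim, depth, seed=0):
        rng = np.random.default_rng(seed)
        return rng.normal(size=(depth, dim))        # one Gaussian random hyperplane per level

    def lsh_code(vec, hyperplanes):
        """Bit string of h_r(v) values: 1 if r . v >= 0 else 0, one bit per hyperplane."""
        return "".join("1" if float(np.dot(r, vec)) >= 0 else "0" for r in hyperplanes)

    def build_thesaurus_index(name_vectors, hyperplanes):
        """Map each LSH code to the standard product names whose vectors hash to it;
        close vectors tend to share long prefixes of the code."""
        index = {}
        for name, vec in name_vectors.items():
            index.setdefault(lsh_code(vec, hyperplanes), []).append(name)
        return index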
After building the standard product thesaurus index, computing device 20 may recall, for any target search term entered by the user, the target standard product corresponding to the target search term based on self-attention model 410 and the standard product thesaurus index, at step 150.
FIG. 8 shows a flowchart of the step 150 of recalling a target standard product according to an embodiment of the present invention.
As shown in FIG. 8, step 150 may include sub-step 152, in which computing device 20 receives a target search term query entered by the particular user.
In sub-step 154, computing device 20 inputs the target search term query into the self-attention model 410 to generate a target search term vector query_vec for the target search term query.
Here, the method for generating the target search term vector query_vec of the target search term query is similar to the method described in step 120 above for generating the corresponding search term vector query_vec, positive sample vector positive_item_vec, and negative sample vector negative_item_vec for the search term query, positive sample positive_item, and negative sample negative_item, respectively, and is therefore not described again here.
Next, in sub-step 156, computing device 20 recalls, in the standard product thesaurus index, the target name vector standard_item_vec corresponding to the target search term vector query_vec.
As mentioned before, the standard product thesaurus index is constructed using LSH, so the nearest vector found in the standard product thesaurus index for the target search term vector query_vec is regarded as the target name vector standard_item_vec corresponding to the target search term vector query_vec, which in turn corresponds to a standard product name.
In sub-step 158, computing device 20 obtains the standard product name standard_item corresponding to the target name vector standard_item_vec as the standard product name of the target search term query (e.g., based on the correspondence between the standard product names standard_item and their name vectors standard_item_vec).
Further, the computing device 20 may perform a search based on the standard product name to obtain more accurate search results.
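By way of illustration only, a minimal end-to-end sketch of the query-time flow of sub-steps 152 to 158: encode the target search term with the trained self-attention model, look up the nearest name vector in the thesaurus index, and return the corresponding standard product name. It reuses lsh_code and cosine_sim from the sketches above; encode_query stands in for the trained self-attention model 410, and all names are illustrative.

    def normalize_search_term(target_query, encode_query, thesaurus_index, name_vectors, hyperplanes):
        """encode_query: callable wrapping the trained self-attention model 410;
        thesaurus_index: LSH bucket index built above; name_vectors: name -> vector dict."""
        query_vec = encode_query(target_query)                        # sub-step 154
        bucket = thesaurus_index.get(lsh_code(query_vec, hyperplanes), list(name_vectors))
        # sub-steps 156/158: nearest name vector in the bucket by cosine similarity
        best_name = max(bucket, key=lambda name: cosine_sim(query_vec, name_vectors[name]))
        return best_name                                              # standard product name for the query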
FIG. 9 illustrates a block diagram of a computing device 900 suitable for implementing embodiments of the present invention. Computing device 900 may be, for example, computing device 20 or server 30 as described above.
As shown in fig. 9, computing device 900 may include one or more Central Processing Units (CPUs) 910 (only one shown schematically) that may perform various suitable actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) 920 or loaded from a storage unit 980 into a Random Access Memory (RAM) 930. In the RAM 930, various programs and data required for operation of the computing device 900 may also be stored. The CPU 910, ROM 920, and RAM 930 are connected to each other via a bus 940. An input/output (I/O) interface 950 is also connected to bus 940.
A number of components in computing device 900 are connected to I/O interface 950, including: an input unit 960 such as a keyboard, a mouse, etc.; an output unit 970 such as various types of displays, speakers, and the like; a storage unit 980 such as a magnetic disk, optical disk, or the like; and a communication unit 990 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 990 allows the computing device 900 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The method 100 described above may be performed, for example, by the CPU 910 of the computing device 900 (e.g., computing device 20 or server 30). For example, in some embodiments, the method 100 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 980. In some embodiments, some or all of the computer program can be loaded and/or installed onto computing device 900 via ROM 920 and/or communication unit 990. When the computer program is loaded into RAM 930 and executed by CPU 910, one or more operations of the method 100 described above may be performed. Further, the communication unit 990 may support wired or wireless communication functions.
Those skilled in the art will appreciate that the computing device 900 shown in FIG. 9 is merely illustrative. In some embodiments, computing device 20 or server 30 may contain more or fewer components than computing device 900.
The search term normalization method 100 and the computing device 900 that may be used as the computing device 20 or the server 30 according to the present invention are described above with reference to the accompanying drawings. However, it will be appreciated by those skilled in the art that the performance of the steps of the method 100 is not limited to the order shown in the figures and described above, but may be performed in any other reasonable order. Further, the computing device 900 need not include all of the components shown in FIG. 9, it may include only some of the components necessary to perform the functions described in the present invention, and the manner in which these components are connected is not limited to the form shown in the figures.
And (3) simulation results:
the following table shows the comparison of simulation results of the deep semantic similarity model 400 according to the present invention with a conventional word embedding model:
[Table: evaluation scores of the deep semantic similarity model 400 and the conventional word embedding model on the ranking, synonym, and lexical tests; reproduced as an image in the original publication.]
The test metrics are defined as follows:
Ranking test: test triples of the form <search term query, positive sample, negative sample> are constructed by manual screening, e.g., <safety helmet, helmet, safety shoes>. Let sim(·, ·) denote the similarity score output by the deep semantic similarity model 400. When sim(query, positive sample) > sim(query, negative sample), the entry is recorded as a correct entry; otherwise it is an incorrect entry. Ranking score = number of correct ranking entries / total number of ranking test samples.
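A minimal sketch of the ranking score computation, assuming a similarity function sim(a, b) exposed by the trained model; the function name and data layout are assumptions for illustration.

```python
# Illustrative ranking-score computation over manually screened triples;
# sim is assumed to be the model's similarity score, as defined above.
def ranking_score(triples, sim):
    """triples: iterable of (query, positive_sample, negative_sample) strings."""
    triples = list(triples)
    # An entry is correct when the positive sample scores above the negative one.
    correct = sum(1 for q, pos, neg in triples if sim(q, pos) > sim(q, neg))
    return correct / len(triples) if triples else 0.0

# Example: ranking_score([("safety helmet", "helmet", "safety shoes")], sim)
```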
Synonym test: test samples of the form <A, A', B> are defined based on an industrial-product synonym library accumulated by the applicant, where A is a product name, A' is a synonym of that product name, and B is a randomly sampled product name, e.g., <pressure regulating valve, pressure reducing valve, breaker lock>. Let sim(·, ·) denote the similarity score output by the deep semantic similarity model 400. When sim(A, A') > sim(A, B), the entry is recorded as a correct entry; otherwise it is an incorrect entry. Synonym score = number of correct synonym entries / total number of synonym test samples.
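The synonym score follows the same preference rule over <A, A', B> samples. The sketch below computes it from the model's output vectors with cosine similarity; the encode() helper and the use of cosine similarity are assumptions for illustration.

```python
# Illustrative synonym-score computation; encode() is an assumed helper that
# returns the model's output vector for an input string.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def synonym_score(samples, encode):
    """samples: iterable of (product_name, synonym, random_product_name) triples."""
    samples = list(samples)
    correct = 0
    for name, synonym, other in samples:
        a, a_syn, b = encode(name), encode(synonym), encode(other)
        # Correct when the synonym is closer to the product name than the random product.
        if cosine(a, a_syn) > cosine(a, b):
            correct += 1
    return correct / len(samples) if samples else 0.0
```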
Lexical test: the lexical test is divided into four modules: A and AB, which are based on reduplication patterns, and Prefix and Suffix, which are based on affix patterns. A lexical test sample is defined as <C, C', D, D'>, where C and D are base vocabulary words and C' and D' are the expanded words constructed from C and D according to the corresponding lexical rule. The set of base vocabulary words is denoted V, and the corresponding set of expanded words is denoted V'. Examples of the A, AB, Prefix, and Suffix modules follow.
A: expanded word pairs constructed from a single base word using the reduplication patterns 'AA', 'A-one-A', and 'A-come-A-go', e.g., <smile, smile-smile>, <guess, guess-one-guess>, <run, run-come-run-go> (literal English renderings of the Chinese patterns).
AB: expanded word pairs constructed from a two-character base word AB using the 'AABB' structure and the structure in which AB is embedded in A, e.g., <happy, happy-happy>, <flustered, flustered-flustered>.
Prefix: expanded word pairs constructed by prepending common Chinese prefix characters (for example the ordinal prefix and the prefix meaning 'big') to a base word, e.g., <second, the second>, <sea, the big sea>.
Suffix: expanded word pairs constructed by appending Chinese suffix characters, such as the noun suffix rendered as 'son' and the agent suffix rendered as 'person', to a base word, e.g., <chair, chair (with the noun suffix)>, <production, producer>.
Let vec(·) denote the vector output by the deep semantic similarity model 400. For each lexical test sample <C, C', D, D'>, first compute the offset vector vec(C') - vec(C), and then the target vector vec(D) + (vec(C') - vec(C)). Based on the expanded word set V', select the expanded word whose vector is closest to this target vector; if the selected word is D' (i.e., vec(D') is the closest), the sample is recorded as a correct item, otherwise as an error item. Lexical score = number of correct lexical items / total number of lexical test samples.
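A sketch of the lexical score as described above: an analogy-style check in which vec(D) + (vec(C') - vec(C)) should land nearest to vec(D') within the expanded word set V'. The encode() helper and the cosine-based nearest-neighbor search are assumptions for illustration.

```python
# Illustrative lexical-score computation (analogy-style evaluation); encode()
# is an assumed helper returning the model's vector for a word.
import numpy as np

def lexical_score(samples, encode, expanded_vocab):
    """samples: list of (C, C_exp, D, D_exp); expanded_vocab: the expanded word set V'."""
    vocab_vecs = np.stack([encode(w) for w in expanded_vocab])
    vocab_vecs = vocab_vecs / (np.linalg.norm(vocab_vecs, axis=1, keepdims=True) + 1e-12)
    correct = 0
    for c, c_exp, d, d_exp in samples:
        # Offset captured by the lexical rule, applied to the base word D.
        target = encode(d) + (encode(c_exp) - encode(c))
        target = target / (np.linalg.norm(target) + 1e-12)
        # Select the expanded word whose vector is closest to the target vector.
        predicted = expanded_vocab[int(np.argmax(vocab_vecs @ target))]
        correct += int(predicted == d_exp)
    return correct / len(samples) if samples else 0.0
```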
From the simulation results, it can be seen that, by using the deep semantic similarity model 400 of the present invention, the evaluation scores for the ranking test, the synonym test, and the lexical test are far higher than those of the conventional word embedding model.
The present invention may be methods, apparatus, systems and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therein for carrying out aspects of the present invention.
In one or more exemplary designs, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. For example, if implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The units of the apparatus disclosed herein may be implemented using discrete hardware components, or may be integrally implemented on a single hardware component, such as a processor. For example, the various illustrative logical blocks, modules, and circuits described in connection with the invention may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein.
Those of skill in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both.
The previous description of the invention is provided to enable any person skilled in the art to make or use the invention. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the present invention is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of search term normalization, comprising:
constructing a plurality of pieces of training data based on historical data of a plurality of users, wherein each piece of training data comprises a search word, a positive sample, and a negative sample, the positive sample indicating product data, in a search result based on the search word, on which the user performed a predetermined operation, and the negative sample indicating product data irrelevant to the search result;
constructing a self-attention model of a depth semantic similarity model and training the self-attention model by utilizing the plurality of pieces of training data, wherein for each piece of training data, the self-attention model outputs a search word vector, a positive sample vector and a negative sample vector;
determining a loss function of the deep semantic similarity model based on the search term vectors, the positive sample vectors, and the negative sample vectors of the plurality of pieces of training data;
generating a standard product thesaurus index based on the self-attention model and a standard product database; and
determining a standard product name for a target search term input by a particular user based on the self-attention model and the standard product thesaurus index.
2. The method of claim 1, wherein constructing a plurality of pieces of training data comprises:
determining a search term from the historical data of the plurality of users;
determining, as the positive sample, product data in a search result on which the user who input the search word performed the predetermined operation, wherein the predetermined operation includes at least one of clicking, adding to a shopping cart, and purchasing; and
determining the negative examples based on the standard product database.
3. The method of claim 2, wherein the negative examples include at least one of:
product data randomly selected from the standard product database; and
product data in the standard product database having the same parent category as the positive sample.
4. The method of claim 1, wherein training the self-attention model with the plurality of pieces of training data comprises:
performing character-level word embedding on a search word, a positive sample and a negative sample of each piece of training data respectively to obtain character-level word embedding vectors of the search word, the positive sample and the negative sample respectively;
respectively performing odd-even position coding on a search word, a positive sample and a negative sample of each piece of training data to respectively obtain position coding vectors of the search word, the positive sample and the negative sample;
merging and normalizing the character-level word embedding vectors and the position coding vectors of the search words, the positive samples and the negative samples respectively to obtain normalized vectors of the search words, the positive samples and the negative samples;
in each of at least one self-attention head, performing an operation on the normalized vectors of the search word, the positive sample, and the negative sample using a self-attention function to obtain self-attention vectors of the search word, the positive sample, and the negative sample, respectively;
in each self-attention head, operating on the self-attention vectors of the search word, the positive sample, and the negative sample by using a nonlinear activation function to obtain fully-connected vectors of the search word, the positive sample, and the negative sample; and
averaging at least one fully-connected vector of the search term, the positive sample, and the negative sample obtained from the at least one self-attention head, respectively, to obtain the search term vector, the positive sample vector, and the negative sample vector.
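To make the flow recited in claim 4 concrete, here is a minimal PyTorch sketch of one forward pass: character-level embedding, positional encoding, merge-and-normalize, per-head self-attention followed by a non-linear fully-connected layer, and averaging over heads. The hyperparameters, the sinusoidal reading of the odd-even position coding, the ReLU activation, and the final pooling over characters are assumptions for illustration, not details fixed by the claim.

```python
# Minimal sketch of the claim-4 encoder flow; layer sizes and several design
# details are assumptions, not values specified in the claim.
import math
import torch
import torch.nn as nn

class SelfAttentionEncoder(nn.Module):
    def __init__(self, vocab_size, dim=128, n_heads=4, max_len=64):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, dim)      # character-level word embedding
        self.norm = nn.LayerNorm(dim)                      # merge-and-normalize step
        self.heads = nn.ModuleList(
            nn.ModuleDict({"q": nn.Linear(dim, dim), "k": nn.Linear(dim, dim),
                           "v": nn.Linear(dim, dim), "fc": nn.Linear(dim, dim)})
            for _ in range(n_heads))
        # Odd-even position coding, read here as sinusoidal encoding:
        # sine on even dimensions, cosine on odd dimensions.
        pe = torch.zeros(max_len, dim)
        pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
        div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32)
                        * (-math.log(10000.0) / dim))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, char_ids):                           # char_ids: (batch, seq_len)
        x = self.char_emb(char_ids) + self.pe[: char_ids.size(1)]
        x = self.norm(x)                                   # normalized vectors
        head_outputs = []
        for head in self.heads:
            q, k, v = head["q"](x), head["k"](x), head["v"](x)
            att = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1)
            sa = att @ v                                    # self-attention vectors
            head_outputs.append(torch.relu(head["fc"](sa))) # fully-connected vectors
        # Average over heads, then pool over characters to get one vector per text.
        return torch.stack(head_outputs).mean(dim=0).mean(dim=1)
```

The same encoder would be applied to the search word, the positive sample, and the negative sample of each piece of training data to obtain the three vectors used by the loss.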
5. The method of claim 1, wherein determining a loss function of the depth semantic similarity model comprises:
determining a first similarity between the search word vector and the positive sample vector and a second similarity between the search word vector and the negative sample vector based on the search word vector, the positive sample vector, and the negative sample vector of each piece of training data, respectively;
determining a similarity loss function based on the first similarity and the second similarity;
determining a category classification loss function based on categories of search terms and search term vectors of the plurality of pieces of training data;
determining a loss function of a depth semantic similarity model based on the similarity loss function and the category classification loss function.
6. The method of claim 5, wherein the similarity loss function comprises a triplet loss function (Triplet Loss).
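As an illustration of claims 5 and 6, the sketch below combines a triplet-style similarity loss over the three vectors with a category classification loss on the search term vector. The use of cosine similarity, the margin, the classifier head, and the weighting factor are assumptions for illustration, not values given in the claims.

```python
# Illustrative combined loss for claims 5-6; margin, weighting, and the use of
# cosine similarity are assumptions, not values given in the claims.
import torch
import torch.nn.functional as F

def model_loss(query_vec, pos_vec, neg_vec, category_logits, category_labels,
               margin=0.3, alpha=1.0):
    # First similarity (query vs. positive) and second similarity (query vs. negative).
    sim_pos = F.cosine_similarity(query_vec, pos_vec, dim=-1)
    sim_neg = F.cosine_similarity(query_vec, neg_vec, dim=-1)
    # Triplet loss: push the first similarity above the second by at least a margin.
    similarity_loss = torch.clamp(margin - (sim_pos - sim_neg), min=0.0).mean()
    # Category classification loss; category_logits would come from a hypothetical
    # classification head applied to the search term vector.
    category_loss = F.cross_entropy(category_logits, category_labels)
    return similarity_loss + alpha * category_loss
```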
7. The method of claim 1, wherein generating a standard product thesaurus index comprises:
acquiring the standard product database;
inputting each standard product name in the standard product database into the self-attention model to generate a name vector for the standard product name; and
operating on the name vectors of all standard product names in the standard product database by using a random hyperplane-based locality-sensitive hashing algorithm to generate a hash index tree of the name vectors of all standard product names in the standard product database,
wherein hash indices of adjacent name vectors are also adjacent in the hash index tree.
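A small sketch in the spirit of claim 7: a random-hyperplane locality-sensitive hash assigns each standard product name vector a bit signature, so nearby vectors tend to receive adjacent hash keys, and sorting on the packed keys approximates the adjacency property of the hash index tree. The class and its internals are assumptions for illustration, not the patent's exact index structure.

```python
# Illustrative random-hyperplane LSH over standard product name vectors; the
# sorted-key "tree" and the candidate re-ranking are simplifying assumptions.
import numpy as np

class RandomHyperplaneIndex:
    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, dim))       # random hyperplane normals

    def _keys(self, vecs):
        # One bit per hyperplane (which side the vector falls on), packed to an integer.
        bits = (vecs @ self.planes.T > 0).astype(np.int64)
        return bits @ (1 << np.arange(bits.shape[1], dtype=np.int64))

    def build(self, name_vectors, names):
        self.names = list(names)
        self.vecs = np.asarray(name_vectors, dtype=np.float32)
        self.keys = self._keys(self.vecs)
        self.order = np.argsort(self.keys)                 # adjacent keys sit together

    def recall(self, query_vec, n_candidates=50):
        key = self._keys(query_vec[None, :])[0]
        pos = int(np.searchsorted(self.keys[self.order], key))
        lo, hi = max(0, pos - n_candidates // 2), pos + n_candidates // 2
        candidates = self.order[lo:hi]
        # Re-rank the candidate bucket by exact cosine similarity.
        sims = self.vecs[candidates] @ query_vec / (
            np.linalg.norm(self.vecs[candidates], axis=1) * np.linalg.norm(query_vec) + 1e-12)
        return self.names[int(candidates[int(np.argmax(sims))])]
```

At query time, recall() would return the standard product name used in sub-step 158; a production index would keep full buckets per key rather than a fixed-width window around the query's position.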
8. The method of claim 1, wherein determining a standard product name for a particular user-entered target search term based on the self-attention model and the standard product thesaurus index comprises:
receiving a target search word input by the specific user;
inputting the target search term into the self-attention model to generate a target search term vector for the target search term;
recalling a target name vector corresponding to the target search word vector in the standard product thesaurus index; and
acquiring a standard product name corresponding to the target name vector as the standard product name of the target search term.
9. A computing device, comprising:
at least one processor; and
at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions when executed by the at least one processor causing the computing device to perform the steps of the method of any of claims 1-8.
10. A computer readable storage medium having stored thereon computer program code which, when executed, performs the method of any of claims 1 to 8.
CN202011374977.1A 2020-12-01 2020-12-01 Search term normalization method, computing device, and computer-readable storage medium Active CN112182144B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011374977.1A CN112182144B (en) 2020-12-01 2020-12-01 Search term normalization method, computing device, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011374977.1A CN112182144B (en) 2020-12-01 2020-12-01 Search term normalization method, computing device, and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN112182144A true CN112182144A (en) 2021-01-05
CN112182144B CN112182144B (en) 2021-03-05

Family

ID=73918289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011374977.1A Active CN112182144B (en) 2020-12-01 2020-12-01 Search term normalization method, computing device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN112182144B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221530A (en) * 2021-04-19 2021-08-06 杭州火石数智科技有限公司 Text similarity matching method and device based on circle loss, computer equipment and storage medium
CN116089729A (en) * 2023-03-31 2023-05-09 浙江口碑网络技术有限公司 Search recommendation method, device and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120173547A1 (en) * 2008-04-22 2012-07-05 Uc4 Software Gmbh Method Of Detecting A Reference Sequence Of Events In A Sample Sequence Of Events
CN110795544A (en) * 2019-09-10 2020-02-14 腾讯科技(深圳)有限公司 Content search method, device, equipment and storage medium
CN111078842A (en) * 2019-12-31 2020-04-28 北京每日优鲜电子商务有限公司 Method, device, server and storage medium for determining query result

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221530A (en) * 2021-04-19 2021-08-06 杭州火石数智科技有限公司 Text similarity matching method and device based on circle loss, computer equipment and storage medium
CN113221530B (en) * 2021-04-19 2024-02-13 杭州火石数智科技有限公司 Text similarity matching method and device, computer equipment and storage medium
CN116089729A (en) * 2023-03-31 2023-05-09 浙江口碑网络技术有限公司 Search recommendation method, device and storage medium
CN116089729B (en) * 2023-03-31 2023-07-18 浙江口碑网络技术有限公司 Search recommendation method, device and storage medium

Also Published As

Publication number Publication date
CN112182144B (en) 2021-03-05

Similar Documents

Publication Publication Date Title
WO2020182019A1 (en) Image search method, apparatus, device, and computer-readable storage medium
CN108959246B (en) Answer selection method and device based on improved attention mechanism and electronic equipment
CN111444320B (en) Text retrieval method and device, computer equipment and storage medium
CN108875074B (en) Answer selection method and device based on cross attention neural network and electronic equipment
US20180158078A1 (en) Computer device and method for predicting market demand of commodities
US8918348B2 (en) Web-scale entity relationship extraction
CN110943981B (en) Cross-architecture vulnerability mining method based on hierarchical learning
CN112395487B (en) Information recommendation method and device, computer readable storage medium and electronic equipment
CN112182144B (en) Search term normalization method, computing device, and computer-readable storage medium
CN112800344B (en) Deep neural network-based movie recommendation method
CN111813930B (en) Similar document retrieval method and device
CN113239071B (en) Retrieval query method and system for scientific and technological resource subject and research topic information
CN110427480B (en) Intelligent personalized text recommendation method and device and computer readable storage medium
CN112231569A (en) News recommendation method and device, computer equipment and storage medium
CN111581923A (en) Method, device and equipment for generating file and computer readable storage medium
CN111400584A (en) Association word recommendation method and device, computer equipment and storage medium
CN112214623A (en) Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method
CN113722512A (en) Text retrieval method, device and equipment based on language model and storage medium
CN112131453A (en) Method, device and storage medium for detecting network bad short text based on BERT
US20220198149A1 (en) Method and system for machine reading comprehension
CN117076636A (en) Information query method, system and equipment for intelligent customer service
CN109902162B (en) Text similarity identification method based on digital fingerprints, storage medium and device
Coviello et al. Multivariate Autoregressive Mixture Models for Music Auto-Tagging.
Bhattacharya et al. Beyond hard negatives in product search: Semantic matching using one-class classification (smocc)
CN110275957B (en) Name disambiguation method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Yang Han

Inventor after: Chen Guangshun

Inventor after: Chen Hongli

Inventor before: Yang Han

Inventor before: Chen Guangshun

CB03 Change of inventor or designer information