CN112182144A - Search term normalization method, computing device, and computer-readable storage medium

Publication number: CN112182144A (granted as CN112182144B)
Application number: CN202011374977.1A
Authority: CN (China)
Prior art keywords: vector, standard product, self-attention, search term
Legal status: Granted; Active
Other languages: Chinese (zh)
Inventors: 杨涵, 陈广顺
Assignee: Zhenkunxing Network Technology Nanjing Co., Ltd.
Application filed by Zhenkunxing Network Technology Nanjing Co., Ltd.

Classifications

    • G06F16/316 Information retrieval of unstructured textual data; indexing structures
    • G06F16/3344 Information retrieval of unstructured textual data; query execution using natural language analysis
    • G06F18/22 Pattern recognition; matching criteria, e.g. proximity measures
    • G06F40/284 Handling natural language data; lexical analysis, e.g. tokenisation or collocates
    • G06F40/30 Handling natural language data; semantic analysis

Abstract

The present disclosure provides a search term normalization method, a computing device, and a computer-readable storage medium. The method comprises the following steps: constructing a plurality of pieces of training data based on historical data of a plurality of users; constructing a self-attention model of a deep semantic similarity model and training the self-attention model using the plurality of pieces of training data; determining a first similarity between the search term vector and the positive sample vector and a second similarity between the search term vector and the negative sample vector, respectively; determining a similarity loss function based on the first similarity and the second similarity; determining a loss function of the deep semantic similarity model based on the search term vectors, the positive sample vectors, and the negative sample vectors of the plurality of pieces of training data; generating a standard product thesaurus index based on the self-attention model and a standard product database; and determining a standard product name for a target search term input by a specific user based on the self-attention model and the standard product thesaurus index.

Description

Search term normalization method, computing device, and computer-readable storage medium
Technical Field
The present invention relates generally to the field of machine learning, and more particularly, to a search term normalization method, a computing device, and a computer-readable storage medium.
Background
With the continuous development of the Internet, more and more users meet their shopping needs through e-commerce search systems. However, in many cases, the search terms entered by a user are not standard product names, so the search results may contain a large amount of irrelevant product information. In particular, in the field of industrial product sales, product names often have professional, standardized expressions. With conventional search methods, when a non-standard search term is input, it is difficult to accurately hit the desired product, or a large number of redundant products appear in the recall result, so the user experience is poor. For example, when a user wishes to find a product such as a "water hose clamp" but does not know the standardized product name, the user may input a search term such as "clamp for water hose"; after segmenting the search term, the search system may recall products containing "water hose" and products containing "clamp" separately, which may not exactly match the user's needs.
Disclosure of Invention
To address these problems, the present invention provides a search term normalization scheme that rewrites any target search term input by a user into a standard search term by constructing an improved deep semantic similarity model and a standard product thesaurus index, so as to obtain more accurate recall results.
According to one aspect of the present invention, a search term normalization method is provided. The method comprises the following steps: constructing a plurality of pieces of training data based on historical data of a plurality of users, wherein each piece of training data comprises a search term, a positive sample, and a negative sample; constructing a self-attention model of a deep semantic similarity model and training the self-attention model using the plurality of pieces of training data, wherein, for each piece of training data, the self-attention model outputs a search term vector, a positive sample vector, and a negative sample vector; determining, based on the search term vector, the positive sample vector, and the negative sample vector of each piece of training data, a first similarity between the search term vector and the positive sample vector and a second similarity between the search term vector and the negative sample vector, respectively; determining a similarity loss function based on the first similarity and the second similarity; determining a loss function of the deep semantic similarity model based on the search term vectors, the positive sample vectors, and the negative sample vectors of the plurality of pieces of training data; generating a standard product thesaurus index based on the self-attention model and a standard product database; and determining a standard product name for a target search term input by a specific user based on the self-attention model and the standard product thesaurus index.
In one embodiment, constructing the plurality of pieces of training data comprises: determining a search term from the historical data of the plurality of users; determining, as the positive sample, product data on which the user who input the search term performed a predetermined operation in the search results, wherein the predetermined operation comprises at least one of clicking, adding to a shopping cart, and purchasing; and determining the negative sample based on the standard product database.
In one embodiment, the negative sample comprises at least one of: product data randomly selected from the standard product database; and product data in the standard product database having the same parent category as the positive sample.
In one embodiment, training the self-attention model using the plurality of pieces of training data comprises: performing character-level word embedding on the search term, the positive sample, and the negative sample of each piece of training data, respectively, to obtain character-level word embedding vectors of the search term, the positive sample, and the negative sample; performing odd-even position encoding on the search term, the positive sample, and the negative sample of each piece of training data, respectively, to obtain position encoding vectors of the search term, the positive sample, and the negative sample; merging and normalizing the character-level word embedding vectors and the position encoding vectors of the search term, the positive sample, and the negative sample, respectively, to obtain normalized vectors of the search term, the positive sample, and the negative sample; in each of at least one self-attention head, operating on the normalized vectors of the search term, the positive sample, and the negative sample using a self-attention function to obtain self-attention vectors of the search term, the positive sample, and the negative sample, respectively; in each self-attention head, operating on the self-attention vectors of the search term, the positive sample, and the negative sample using a nonlinear activation function to obtain fully-connected vectors of the search term, the positive sample, and the negative sample; and averaging the at least one fully-connected vector of the search term, the positive sample, and the negative sample obtained from the at least one self-attention head to obtain the search term vector, the positive sample vector, and the negative sample vector, respectively.
In one embodiment, determining the loss function of the deep semantic similarity model comprises: determining, based on the search term vector, the positive sample vector, and the negative sample vector of each piece of training data, a first similarity between the search term vector and the positive sample vector and a second similarity between the search term vector and the negative sample vector, respectively; determining a similarity loss function based on the first similarity and the second similarity; determining a category classification loss function based on the categories of the search terms and the search term vectors of the plurality of pieces of training data; and determining the loss function of the deep semantic similarity model based on the similarity loss function and the category classification loss function.
In one embodiment, the similarity loss function comprises a triplet loss function (Triplet Loss).
In one embodiment, generating the standard product thesaurus index comprises: acquiring the standard product database; inputting each standard product name in the standard product database into the self-attention model to generate a name vector for the standard product name; and operating the name vectors of all the standard product names in the standard product database by using a locality sensitive hashing algorithm based on a random hyperplane to generate a hash index tree of the name vectors of all the standard product names in the standard product database, wherein in the hash index tree, hash indexes of adjacent name vectors are adjacent.
In one embodiment, determining a standard product name for a target search term entered by a particular user based on the self-attention model and the standard product thesaurus index comprises: receiving a target search word input by the specific user; inputting the target search term into the self-attention model to generate a target search term vector for the target search term; recalling a target name vector corresponding to the target search word vector in the standard product thesaurus index; and acquiring a standard product name corresponding to the target name vector as the standard product name of the target search term.
According to another aspect of the invention, a computing device is provided. The computing device includes: at least one processor; and at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions when executed by the at least one processor causing the computing device to perform steps according to the above-described method.
According to yet another aspect of the present invention, a computer-readable storage medium is provided, having stored thereon computer program code, which when executed performs the method as described above.
With the scheme of the present invention, any search term input by a user can be rewritten into a standard search term (i.e., a standard product name) by constructing an improved deep semantic similarity model and a standard product thesaurus index, so that a more accurate recall result can be obtained based on the standard search term.
Drawings
The invention will be better understood and other objects, details, features and advantages thereof will become more apparent from the following description of specific embodiments of the invention given with reference to the accompanying drawings.
FIG. 1 shows a schematic diagram of a system for implementing a search term normalization method according to an embodiment of the invention.
FIG. 2 illustrates a flow diagram of a search term normalization method according to some embodiments of the invention.
FIG. 3 shows a flowchart of steps for constructing pieces of training data, according to an embodiment of the invention.
FIG. 4 is a schematic structural diagram of a deep semantic similarity model according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating steps for training a self-attention model using training data according to an embodiment of the present invention.
FIG. 6 shows a flowchart of steps for determining a loss function for a deep semantic similarity model, according to an embodiment of the invention.
FIG. 7 is a flowchart illustrating the steps of generating a standard product thesaurus index according to an embodiment of the present invention.
FIG. 8 shows a flowchart of the steps of recalling a target standard product according to an embodiment of the present invention.
FIG. 9 illustrates a block diagram of a computing device suitable for implementing embodiments of the present invention.
Detailed Description
Preferred embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present invention are shown in the drawings, it should be understood that the present invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
In the following description, for the purposes of illustrating various inventive embodiments, certain specific details are set forth in order to provide a thorough understanding of the various inventive embodiments. One skilled in the relevant art will recognize, however, that the embodiments may be practiced without one or more of the specific details. In other instances, well-known devices, structures and techniques associated with this application may not be shown or described in detail to avoid unnecessarily obscuring the description of the embodiments.
Throughout the specification and claims, the word "comprise" and variations thereof, such as "comprises" and "comprising," are to be understood as an open, inclusive meaning, i.e., as being interpreted to mean "including, but not limited to," unless the context requires otherwise.
Reference throughout this specification to "one embodiment" or "some embodiments" means that a particular feature, structure, or characteristic described in connection with the embodiments is included in at least one embodiment. Thus, the appearances of the phrases "in one embodiment" or "in some embodiments" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the terms first, second and the like used in the description and the claims are used for distinguishing objects for clarity, and do not limit the size, other order and the like of the described objects.
Fig. 1 shows a schematic diagram of a system 1 for implementing a search term normalization method according to an embodiment of the invention. As shown in fig. 1, the system 1 includes a user terminal 10, a computing device 20, a server 30, and a network 40. User terminal 10, computing device 20, and server 30 may exchange data via network 40. Here, each user terminal 10 may be a mobile or fixed terminal of an end user, such as a mobile phone, a tablet computer, a desktop computer, or the like. The user terminal 10 may communicate with a server 30 of an e-commerce enterprise, for example through an e-commerce application or a specific search engine installed thereon, to send information to the server 30 and/or receive information from the server 30. The computing device 20 performs corresponding operations based on data from the user terminal 10 and/or the server 30. The computing device 20 may include at least one processor 210 and at least one memory 220 coupled to the at least one processor 210, the memory 220 having stored therein instructions 230 executable by the at least one processor 210, the instructions 230, when executed by the at least one processor 210, performing at least a portion of the method 100 as described below. Note that, herein, computing device 20 may be part of server 30 or may be separate from server 30. The specific structure of computing device 20 or server 30 may be as described below, for example, in connection with FIG. 9.
FIG. 2 illustrates a flow diagram of a search term normalization method 100 according to some embodiments of the invention. The method 100 may be performed, for example, by the computing device 20 or the server 30 in the system 1 shown in fig. 1. The method 100 is described below in conjunction with fig. 1-9, with an example being performed in the computing device 20.
As shown in FIG. 2, method 100 includes step 110, where computing device 20 constructs a plurality of pieces of training data based on historical data of a plurality of users. Each piece of training data may include a search term (query), a positive sample (positive_item), and a negative sample (negative_item). The positive sample positive_item indicates product data on which a user performed a predetermined operation among the search results for the search term query, and the negative sample negative_item indicates product data unrelated to the search results for the search term query.
FIG. 3 shows a flowchart of step 110 for constructing pieces of training data, according to an embodiment of the invention.
As shown in fig. 3, step 110 may include sub-step 112, in which computing device 20 may determine a search term from the historical data of the plurality of users. Here, the historical data of the plurality of users may be stored in the server 30 or an external database associated therewith. In constructing the training data, computing device 20 may retrieve the historical data of these users from the server 30 or the external database. One piece of historical data may include, for example, user information (e.g., a user ID), a search term, a search result corresponding to the search term (e.g., a list of searched products), a user operation, product data corresponding to the user operation (e.g., product data in the search result on which the user performed a specific operation), and other information (e.g., time information).
In sub-step 112, computing device 20 may extract the search term from a piece of historical data.
In sub-step 114, computing device 20 determines (e.g., from the historical data described above), as the positive sample positive_item, product data on which the user who entered the search term performed a predetermined operation in the search results. For example, the positive sample positive_item may be product data on which the user performed at least one of a click operation, an add-to-shopping-cart operation, and a purchase operation. For one search performed by one user, if the user performs any of the above predetermined operations on a plurality of product data items in the search result, one positive sample may be generated for each of the operated product data items, or only one or more of them may be selected to each generate one positive sample. That is, a piece of historical data may produce one or more positive samples (and thus one or more pieces of training data). Of course, a piece of historical data may also produce no positive sample (and thus no training data), for example when the user has not performed any predetermined operation on any product in the search results. In this case, the piece of historical data is not used to construct training data.
In sub-step 116, computing device 20 determines the negative sample negative_item of the piece of training data based on the standard product database. The standard product database may be a product database constructed by the e-commerce enterprise on which the user performs retrieval, and may be stored in advance in the server 30 of the e-commerce enterprise or an external storage connected thereto. In sub-step 116, computing device 20 may send a request to server 30 or the external storage and receive product data back from server 30 or the external storage. The negative sample negative_item should be product data that is substantially unrelated to the search results for the search term.
In one embodiment, the negative sample negative_item may be product data randomly selected from the standard product database. Since a standard product database usually contains a large amount of product data, randomly selected product data is unlikely to be the same as or similar to the product data of the positive sample positive_item, and can therefore be regarded as sample data completely unrelated to the positive sample positive_item.
The negative sample negative_item may also be product data in the standard product database that has the same parent category as the corresponding positive sample positive_item. Typically, in a standard product database, each piece of product data has corresponding category data, and the category data of the products is typically stored in a tree structure in the server 30 or an external storage connected thereto. Each product data item may correspond to multiple category levels, which are, from top to bottom, a first-level category, a second-level category, a third-level category, and so on. The number of levels of the category tree may vary from enterprise to enterprise and from product to product. In some embodiments according to the invention, the number of category levels is 4: the lowest-level category of each product data item is called the final category (also called the fourth-level category when there are 4 levels), the category one level above it is called the parent category (the third-level category), the category one level above the parent category is called the grandparent category (the second-level category), and the category one level above the grandparent category is called the great-grandparent category (the first-level category). Thus, product data having the same parent category as the positive sample positive_item refers to product data under other fourth-level categories that share the same third-level category as the positive sample positive_item. If the number of levels is greater than 4, the naming can be extended upward by analogy; if the number of levels is less than 4, the categories can likewise be called the first-level category, the second-level category, and so on from top to bottom.
Alternatively, the negative samples in the plurality of pieces of training data may be determined using a combination of the two different approaches described above. For example, the negative samples in one portion of the training data (e.g., 80% of the training data) may be determined by random selection, and the negative samples in another portion of the training data (e.g., 20% of the training data) may be determined based on the same parent category.
At step 110, a plurality of pieces of training data may be constructed in the manner described above in sub-steps 112 to 116. For example, an instance of a piece of training data may be represented as:

search term query: infrared thermometer
positive sample positive_item: UNI-T infrared thermometer UT300A+
negative sample negative_item: FLUKE-802CN

It can be seen that, in this piece of training data, the search term query is "infrared thermometer", the positive sample positive_item is "UNI-T infrared thermometer UT300A+", and the negative sample negative_item is "FLUKE-802CN". In the present example, the product data of the positive and negative samples respectively include the brands "UNI-T" and "Fluke", the product names "infrared thermometer" and "vibration tester", and the models "UT300A+" and "FLUKE-802CN", although those skilled in the art will appreciate that the product data of the positive and negative samples may include more or less information.
Continuing with FIG. 2, at step 120, computing device 20 builds a self-attention model of the deep semantic similarity model and trains the self-attention model using the pieces of training data built at step 110.
FIG. 4 shows a structural schematic diagram of a deep semantic similarity model 400 according to an embodiment of the invention. As shown in fig. 4, the deep semantic similarity model 400 may include a self-attention model 410, a similarity calculation module 420, and a fully-connected classification layer 430.
In the model training phase, the self-attention model 410 may take as input each piece of training data constructed as described above (including the search term query, the positive sample positive_item, and the negative sample negative_item) and output a corresponding search term vector query_vec, positive sample vector positive_item_vec, and negative sample vector negative_item_vec, respectively, as shown in fig. 4. That is, although the search term query, the positive sample positive_item, and the negative sample negative_item are input into the self-attention model 410 as one piece of training data, they are processed separately and obtain their respective vectors in the self-attention model 410. In the model use stage, the self-attention model 410 may take as input a target search term query input by the user and output a target search term vector query_vec of the target search term query (not shown in the figure). The self-attention model 410 is described below primarily in terms of the model training phase.
As shown in fig. 4, the self-attention model 410 of the deep semantic similarity model 400 may include a character-level word embedding layer 411, a position encoding layer 412, a merging layer 413, a normalization layer 414, and at least one self-attention head (each including a self-attention layer 415 and a first fully-connected layer 416). For the case where multiple self-attention heads are included in the self-attention model 410 (e.g., "self-attention head x 3" shown in fig. 4 indicates that there are 3 self-attention heads), the self-attention model 410 may also include a multi-head self-attention merging layer 417. The functions of the various components of self-attention model 410 are described in detail below with reference to FIG. 5.
The similarity calculation module 420 of the deep semantic similarity model 400 may include a similarity calculation layer 422 and a similarity loss function determination layer 424. The functions of the various components of the similarity calculation module 420 will be described in detail below with reference to fig. 6.
The fully-connected classification layer 430 of the deep semantic similarity model 400 may include a second fully-connected layer 432 and a category classification loss function determination layer 434. The functions of the various components of the fully-connected classification layer 430 will be described in detail below with reference to fig. 6.
FIG. 5 shows a flowchart of step 120 of training a self-attention model 410 with training data, according to an embodiment of the present invention.
As shown in fig. 5, step 120 may include sub-step 121, where at character-level word embedding layer 411, computing device 20 performs character-level word embedding on the search term query, positive sample positive_item, and negative sample negative_item of each piece of training data to obtain character-level word embedding vectors $C_{query}$, $C_{positive\_item}$, and $C_{negative\_item}$ of the search term, the positive sample, and the negative sample, respectively (hereinafter collectively referred to as $C$).
In one embodiment, each character of the input text may be embedded as a $D$-dimensional vector ($D$ is an adjustable parameter value, which may be equal to 256, for example) to determine the character-level word embedding vector of the input text. For example, the character-level word embedding vector $C$ may be determined according to the following formula (1):

$$C = [c_1, c_2, \dots, c_N], \qquad c_i = \mathrm{Embed}(w_i) \qquad (1)$$

where $C$ represents the character-level word embedding vector of the input text (e.g., the input search term query, positive sample positive_item, or negative sample negative_item), which is a single $D \times N$ matrix; $w_i$ denotes the $i$-th character in the input text of length $N$; $c_i \in \mathbb{R}^{D}$ is the $D$-dimensional real-valued character-level word embedding corresponding to the $i$-th character; and the embedding $\mathrm{Embed}(\cdot)$ is determined by a hash lookup.
In sub-step 122, at position encoding layer 412, computing device 20 performs odd-even position encoding on the search term query, positive sample positive_item, and negative sample negative_item of each piece of training data to obtain position encoding vectors $P_{query}$, $P_{positive\_item}$, and $P_{negative\_item}$ of the search term, the positive sample, and the negative sample, respectively (hereinafter collectively referred to as $P$). For example, the position encoding vector $P$ may be determined according to the following formula (2):

$$P_{i,d} = \begin{cases} \sin\!\left(i/10000^{\,d/D}\right), & d \text{ even} \\ \cos\!\left(i/10000^{\,(d-1)/D}\right), & d \text{ odd} \end{cases} \qquad (2)$$

where $P$ represents the position encoding vector of the input text (e.g., the input search term query, positive sample positive_item, or negative sample negative_item), which is also a single $D \times N$ matrix; $w_i$ is the $i$-th character in the input text of length $N$; $d$ denotes the $d$-th dimension of the $D$-dimensional real vector; and $P_{i,d}$ is the $d$-th dimension value of the $D$-dimensional position encoding vector $P_i$ corresponding to the $i$-th character. When calculating $P_{i,d}$, different position encoding functions are called according to whether the position is odd or even, and relative position encodings between different positions are obtained through the trigonometric relations of the sine (sin) and cosine (cos) functions.
Although sub-step 122 is shown in fig. 5 as being subsequent to sub-step 121, it will be understood by those skilled in the art that sub-step 122 may be performed in parallel with sub-step 121 or before sub-step 121.
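By way of illustration only, the following sketch (in Python with NumPy) shows one possible computation of the odd-even position encoding of formula (2), assuming the sinusoidal form reconstructed above; the function and variable names are illustrative.

    import numpy as np

    def position_encoding(N, D):
        """Return a D x N matrix P following formula (2): sine for even
        dimensions d, cosine for odd dimensions d."""
        P = np.zeros((D, N))
        for i in range(N):
            for d in range(D):
                if d % 2 == 0:
                    P[d, i] = np.sin(i / (10000 ** (d / D)))
                else:
                    P[d, i] = np.cos(i / (10000 ** ((d - 1) / D)))
        return P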
Next, in sub-step 123, at merge layer 413 and normalization layer 414, computing device 20 merges and normalizes the character-level word embedding vector $C$ and the position encoding vector $P$ of the search term query, positive sample positive_item, and negative sample negative_item, respectively, to obtain normalized vectors $Norm_{query}$, $Norm_{positive\_item}$, and $Norm_{negative\_item}$ of the search term, the positive sample, and the negative sample, respectively (hereinafter collectively referred to as $Norm$).
For example, at the merge layer 413, the merged vector $X$ of the character-level word embedding vector $C$ and the position encoding vector $P$ (also a $D \times N$ matrix) may be determined according to the following formula (3):

$$X = C + P \qquad (3)$$

i.e., the character-level word embedding vector $C$ and the position encoding vector $P$ are merged using a matrix addition operation.
Further, at the normalization layer 414, the element in the $i$-th row and $d$-th column of the normalized vector $Norm$ may be determined according to the following formula (4):

$$Norm(x_{i,d}) = g_d \cdot \frac{x_{i,d} - \mu_d}{\sigma_d} + b_d \qquad (4)$$

where $x_{i,d}$ represents the $d$-th dimension value of the $i$-th vector $(P_i + C_i)$ of the input merged vector $X$, $\mu_d$ represents the mean of the $d$-th dimension values of the input merged vector $X$ over the length $N$, and $\sigma_d$ represents the standard deviation of the $d$-th dimension values of the input merged vector $X$ over the length $N$. In the self-attention model 410, $g$ and $b$ are $D$-dimensional trainable parameters.
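By way of illustration only, a minimal sketch of the merging and normalization of formulas (3) and (4), assuming per-dimension statistics over the sequence length N as described above (NumPy; the names merge_and_normalize, g, b, and eps are illustrative):

    import numpy as np

    def merge_and_normalize(C, P, g, b, eps=1e-6):
        """C, P: D x N character-embedding and position-encoding matrices (formula (3)).
        g, b: D-dimensional trainable scale and bias (formula (4))."""
        X = C + P                                  # formula (3): matrix addition
        mu = X.mean(axis=1, keepdims=True)         # per-dimension mean over length N
        sigma = X.std(axis=1, keepdims=True)       # per-dimension standard deviation over length N
        return g[:, None] * (X - mu) / (sigma + eps) + b[:, None]   # formula (4)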
Next, at sub-step 124, at the self-attention layer 415 in each self-attention head, computing device 20 operates on the normalized vectors $Norm$ of the search term query, positive sample positive_item, and negative sample negative_item using a self-attention function to obtain self-attention vectors $Attention_{query}$, $Attention_{positive\_item}$, and $Attention_{negative\_item}$ of the search term, the positive sample, and the negative sample, respectively (hereinafter collectively referred to as $Attention$).
For example, the self-attention vector $Attention$ may be determined according to the following formula (5):

$$Attention(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{D}}\right)V \qquad (5)$$

where $Q$, $K$, and $V$ represent the results of the dot products of the input $x$ (i.e., the normalized vectors $Norm$ of the search term query, positive sample positive_item, and negative sample negative_item, respectively) with the trainable parameters $W_q$, $W_k$, and $W_v$, respectively; $K^{T}$ represents the transpose of $K$; and $\sqrt{D}$ is the square root of the dimension $D$ of the input $x$ (to avoid gradient vanishing). The result of $\mathrm{softmax}(QK^{T}/\sqrt{D})$ is the self-attention weight, and after multiplication with $V$, the self-attention weighting of the input $x$ is obtained.
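By way of illustration only, a minimal sketch of the scaled dot-product self-attention of formula (5) (NumPy; for convenience the sequence is arranged as N x D rows, and the projection with W_q, W_k, W_v is included; all names are illustrative):

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def self_attention(x, W_q, W_k, W_v):
        """x: N x D normalized input; W_q, W_k, W_v: D x D trainable parameters.
        Returns the N x D self-attention output of formula (5)."""
        Q, K, V = x @ W_q, x @ W_k, x @ W_v
        D = x.shape[-1]
        weights = softmax(Q @ K.T / np.sqrt(D))    # self-attention weights
        return weights @ V                         # self-attention weighting of the input x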
Then, in sub-step 125, at the first fully-connected layer 416 in each self-attention head, computing device 20 operates on the self-attention vectors $Attention$ of the search term query, positive sample positive_item, and negative sample negative_item using a nonlinear activation function to obtain fully-connected vectors $FeedForward_{query}$, $FeedForward_{positive\_item}$, and $FeedForward_{negative\_item}$ of the search term, the positive sample, and the negative sample, respectively (hereinafter collectively referred to as $FeedForward$).
For example, the fully-connected vector $FeedForward$ may be determined according to the following formula (6):

$$FeedForward(x) = \mathrm{ReLU}(Wx + b) \qquad (6)$$

where $W$ and $b$ are trainable parameters and $\mathrm{ReLU}$ is a nonlinear activation function.
By including the first fully-connected layer 416 in the self-attention head, the nonlinearity of the deep semantic similarity model 400 can be increased.
In the case where self-attention model 410 includes only one self-attention head, the fully-connected vectors $FeedForward_{query}$, $FeedForward_{positive\_item}$, and $FeedForward_{negative\_item}$ obtained in sub-step 125 are the search term vector query_vec, positive sample vector positive_item_vec, and negative sample vector negative_item_vec output from the self-attention model 410.
In the case where self-attention model 410 includes multiple self-attention heads, step 120 may further include sub-step 126, where computing device 20, at multi-head self-attention merge layer 417, averages the multiple fully-connected vectors $FeedForward$ of the search term query, positive sample positive_item, and negative sample negative_item obtained from the multiple self-attention heads to obtain the search term vector query_vec, the positive sample vector positive_item_vec, and the negative sample vector negative_item_vec, respectively.
For example, the multiple fully-connected vectors obtained from the multiple self-attention heads may be averaged according to the following formula (7):

$$MultiHead_d = \frac{1}{J}\sum_{j=1}^{J} FeedForward_{j,d} \qquad (7)$$

where $j$ denotes the $j$-th self-attention head among the $J$ self-attention heads, $d$ denotes the $d$-th dimension of the $D$-dimensional real vector, and $FeedForward_{j,d}$ is the $d$-th dimension of the fully-connected vector obtained from the $j$-th head in sub-step 125. The operation averages over the $J$ heads to obtain the $D$-dimensional multi-head self-attention vector $MultiHead$, which serves as the corresponding search term vector query_vec, positive sample vector positive_item_vec, or negative sample vector negative_item_vec.
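By way of illustration only, a minimal sketch combining formulas (6) and (7): each head applies the ReLU fully-connected layer to its self-attention output, and the head outputs are averaged per dimension into a single D-dimensional vector. It reuses the self_attention sketch above; the pooling over the N characters into one vector is an assumption, since the text does not spell out that step, and all names are illustrative.

    import numpy as np

    def encode_with_heads(x, heads):
        """x: N x D normalized input. heads: list of (W_q, W_k, W_v, W_f, b_f) per head."""
        outputs = []
        for W_q, W_k, W_v, W_f, b_f in heads:
            att = self_attention(x, W_q, W_k, W_v)       # formula (5): N x D
            ff = np.maximum(0.0, att @ W_f + b_f)        # formula (6): ReLU fully-connected layer
            outputs.append(ff.mean(axis=0))              # assumed pooling over the N characters -> D
        return np.mean(outputs, axis=0)                  # formula (7): average over the J heads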
Continuing with FIG. 2, at step 130, computing device 20 may determine a loss function for the deep semantic similarity model 400 based on the search term vectors query_vec, positive sample vectors positive_item_vec, and negative sample vectors negative_item_vec of the plurality of pieces of training data. In the deep semantic similarity model 400 according to the present invention, the loss function of the model is considered in terms of both similarity and product category.
FIG. 6 shows a flowchart of step 130 for determining the loss function of the deep semantic similarity model 400 according to an embodiment of the present invention. The implementation of step 130 is further described below in conjunction with fig. 4 and 6.
As shown in fig. 6, step 130 may include sub-step 132, wherein at similarity calculation layer 422, computing device 20 determines, based on the search term vector query_vec, the positive sample vector positive_item_vec, and the negative sample vector negative_item_vec of each piece of training data, a first similarity between the search term vector query_vec and the positive sample vector positive_item_vec and a second similarity between the search term vector query_vec and the negative sample vector negative_item_vec, respectively.
In one embodiment, the first similarity and the second similarity may be determined using cosine similarity. For example, the first similarity $sim(Q, I^{+})$ and the second similarity $sim(Q, I^{-})$ may be determined based on the following formulas (8) and (9), respectively:

$$sim(Q, I^{+}) = \frac{Q \cdot I^{+}}{\lVert Q \rVert \, \lVert I^{+} \rVert} \qquad (8)$$

$$sim(Q, I^{-}) = \frac{Q \cdot I^{-}}{\lVert Q \rVert \, \lVert I^{-} \rVert} \qquad (9)$$

That is, the similarity between the search term query and the positive sample positive_item or the negative sample negative_item is defined as the dot product of the search term vector query_vec ($Q$) and the positive sample vector positive_item_vec ($I^{+}$) or the negative sample vector negative_item_vec ($I^{-}$), divided by the product of their norms.
Next, at sub-step 134, at similarity loss function determination layer 424, computing device 20 determines a similarity loss function based on the first similarity $sim(Q, I^{+})$ and the second similarity $sim(Q, I^{-})$.
In an embodiment of the present invention, the similarity loss function may include a triplet loss function (Triplet Loss). For example, the similarity loss function $L_{triplet}$ may be determined using the following formula (10):

$$L_{triplet} = \max\!\bigl(0,\; sim(Q, I^{-}) - sim(Q, I^{+}) + m\bigr) \qquad (10)$$

where $m$ is a boundary (margin) value, i.e., a minimum difference that can be used to control how much more similar the positive sample must be than the negative sample; its value may be between 0.2 and 0.6 (for example, a preferred value is 0.4), that is, the positive sample similarity (the first similarity $sim(Q, I^{+})$) is required to be at least 0.4 greater than the negative sample similarity (the second similarity $sim(Q, I^{-})$).
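By way of illustration only, a minimal sketch of the cosine similarities of formulas (8) and (9) and the triplet loss of formula (10), with the margin m as described above (NumPy; names are illustrative):

    import numpy as np

    def cosine_sim(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))   # formulas (8)/(9)

    def triplet_loss(query_vec, positive_item_vec, negative_item_vec, m=0.4):
        sim_pos = cosine_sim(query_vec, positive_item_vec)    # first similarity
        sim_neg = cosine_sim(query_vec, negative_item_vec)    # second similarity
        return max(0.0, sim_neg - sim_pos + m)                # formula (10), margin m in [0.2, 0.6]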
At sub-step 136, at fully-connected classification layer 430, computing device 20 determines a category classification loss function $L_{category}$ based on the categories of the search terms query and the search term vectors query_vec of the plurality of pieces of training data.
More specifically, at the second fully-connected layer 432 of fully-connected classification layer 430, computing device 20 may determine a second fully-connected vector $FeedForward2$ of the search term vector query_vec in a manner similar to formula (6) in sub-step 125 above. The difference from sub-step 125 is that, for the second fully-connected layer 432, $W$ is defined with dimensions $[D, M]$, where $D$ is the dimension of the input search term vector query_vec and $M$ is the total number of categories (e.g., parent categories of the final category) of products in the standard product database. During training of the deep semantic similarity model 400, for example when determining the positive sample positive_item in step 110 or sub-step 114 described above, computing device 20 may determine a category of the positive sample positive_item (e.g., the parent category of its final category), set the element corresponding to the determined category (representing the probability that the positive sample positive_item belongs to that category) in the $M$-dimensional category label vector to 1, and set the other elements to 0.
Thus, in the use stage of the trained deep semantic similarity model 400, when a target search term vector query_vec of a target search term is input, the second fully-connected layer 432 may output an $M$-dimensional second fully-connected vector $FeedForward2$, each dimension of which corresponds to the probability that the target search term query belongs to the corresponding category.
Then, at category classification loss function determination layer 434 of fully-connected classification layer 430, computing device 20 may determine the category classification loss function $L_{category}$ based on this second fully-connected vector $FeedForward2$. For example, the category classification loss function $L_{category}$ may be determined using the following formula (11):

$$L_{category} = -\sum_{i=1}^{M}\Bigl[\,y_i \log p(y_i) + (1 - y_i)\log\bigl(1 - p(y_i)\bigr)\Bigr] \qquad (11)$$

This is a binary cross-entropy loss function, where $M$ is the total number of categories (e.g., parent categories of the final category) of products in the standard product database, $y_i$ represents the actual value of the $i$-th category corresponding to the second fully-connected vector $FeedForward2$ output by the second fully-connected layer 432, $p(\cdot)$ represents the Sigmoid activation function, and $p(y_i)$ represents the category classification probability of the $i$-th category.
Step 130 further includes sub-step 138, in which computing device 20 determines a loss function for the entire deep semantic similarity model 400 based on the similarity loss function $L_{triplet}$ and the category classification loss function $L_{category}$.
For example, the loss function $L$ of the deep semantic similarity model 400 may be determined using the following formula (12):

$$L = L_{triplet} + L_{category} \qquad (12)$$

That is, the loss function $L$ of the deep semantic similarity model 400 is defined as the sum of the similarity loss function $L_{triplet}$ and the category classification loss function $L_{category}$.
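By way of illustration only, a minimal sketch of the category classification loss of formula (11) and the combined loss of formula (12), assuming an M-dimensional one-hot category label as described above (NumPy; names are illustrative):

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def category_loss(logits, one_hot_label):
        """logits: M-dimensional output of the second fully-connected layer 432;
        one_hot_label: M-dimensional vector with 1 at the positive sample's parent
        category and 0 elsewhere. Binary cross-entropy over the M categories, formula (11)."""
        p = sigmoid(logits)
        y = one_hot_label
        return float(-np.sum(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)))

    def total_loss(l_triplet, l_category):
        return l_triplet + l_category     # formula (12): sum of the two losses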
In the deep semantic similarity model 400, the triplet loss function (Triplet Loss) is used to replace the loss function (SoftMax Loss) of the original model, so that product names with similar characters but different semantics (such as the two product names "safety helmet" and "safety shoes", which have high character similarity but low semantic similarity) can be better distinguished, and the text encoder is also helped to learn a better text encoding vector representation. Meanwhile, a multi-task optimization mode is used during training, and the category classification loss function $L_{category}$ is added to assist the deep semantic similarity model 400 in improving its encoding capability for text information.
During the training of the deep semantic similarity model 400 with the plurality of pieces of training data in steps 120 and 130, stochastic gradient descent may be used to continually optimize the loss function $L$ of the deep semantic similarity model 400 until the model converges.
Using the self-attention model 410 of the deep semantic similarity model 400 constructed as above, an arbitrarily input search word can be converted into an embedded vector. In order to determine the standard product data corresponding to any input search word, a standard product thesaurus index should also be established for all standard products for vector retrieval.
Specifically, as shown in FIG. 2, method 100 further includes step 140, wherein computing device 20 generates a standard product thesaurus index based on self-attention model 410 and a standard product database.
FIG. 7 shows a flowchart of the step 140 of generating a standard product thesaurus index according to an embodiment of the present invention.
As shown in fig. 7, step 140 may include sub-step 142, wherein computing device 20 obtains the standard product database. As previously mentioned, the standard product database may be a product database constructed by an electronic commerce enterprise in which a user performs retrieval, which may be stored in advance in the server 30 of the electronic commerce enterprise or an external storage connected thereto. Thus, in generating the standard product thesaurus index, the computing device 20 should first retrieve (e.g., from the server 30 or an external memory connected thereto) all of the product data in the standard product database.
In sub-step 144, computing device 20 inputs each standard product name (standard_item) in the standard product database into the trained self-attention model 410 to generate a name vector (standard_item_vec) for the standard product name.
Here, the method for generating the name vector standard_item_vec of the standard product name standard_item is similar to the method described in step 120 above for generating the corresponding search term vector query_vec, positive sample vector positive_item_vec, and negative sample vector negative_item_vec for the search term query, positive sample positive_item, and negative sample negative_item, respectively, and is therefore not described again here.
In sub-step 146, computing device 20 operates on the name vectors standard_item_vec of all standard product names in the standard product database using a locality sensitive hashing algorithm based on random hyperplanes to generate a hash index tree of the name vectors standard_item_vec of all standard product names in the standard product database.
In particular, in one embodiment, for the set $V \subset \mathbb{R}^{d}$ of all name vectors standard_item_vec (where $\mathbb{R}^{d}$ denotes the $d$-dimensional real vector space), a random hyperplane $r$ may be sampled from a $d$-dimensional Gaussian distribution, and the vector set $V$ is divided into two spaces using $r$. For the random hyperplane $r$, the following locality sensitive hash (LSH) function $h_r$ may be defined:

$$h_r(v) = \begin{cases} 1, & r \cdot v \ge 0 \\ 0, & r \cdot v < 0 \end{cases}$$

Repeatedly executing the LSH operation yields a hash index tree of the name vectors of the standard product names, where the depth of the hash index tree is equal to the number of repetitions of the LSH operation. The complexity of obtaining the neighboring vectors of a vector through the hash index tree thus generated is $O(\log N)$, where $N$ is the vector set size. That is, in the hash index tree thus generated, the hash indexes of adjacent name vectors standard_item_vec are also adjacent, and thus such a hash index tree can serve as the thesaurus index of the standard product database.
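By way of illustration only, a minimal sketch of random-hyperplane locality sensitive hashing as described above: each level samples a Gaussian random hyperplane and splits vectors by the sign of the dot product, so repeating the split yields a hash code whose length equals the tree depth. The bucket dictionary is a simplification of the hash index tree, and all names are illustrative.

    import numpy as np

    def build_lsh_hyperplanes(dim, depth, seed=0):
        rng = np.random.default_rng(seed)
        return rng.normal(size=(depth, dim))        # one Gaussian random hyperplane per level

    def lsh_code(vec, hyperplanes):
        """Bit string of h_r(v) values: 1 if r . v >= 0 else 0, one bit per hyperplane."""
        return "".join("1" if float(np.dot(r, vec)) >= 0 else "0" for r in hyperplanes)

    def build_thesaurus_index(name_vectors, hyperplanes):
        """Map each LSH code to the standard product names whose vectors hash to it;
        close vectors tend to share long prefixes of the code."""
        index = {}
        for name, vec in name_vectors.items():
            index.setdefault(lsh_code(vec, hyperplanes), []).append(name)
        return index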
After building the standard product thesaurus index, computing device 20 may recall, for any target search term entered by the user, the target standard product corresponding to the target search term based on self-attention model 410 and the standard product thesaurus index, at step 150.
FIG. 8 shows a flowchart of the step 150 of recalling a target standard product according to an embodiment of the present invention.
As shown in FIG. 8, step 150 may include sub-step 152, in which computing device 20 receives a target search term query entered by the particular user.
In sub-step 154, computing device 20 inputs the target search term query into the self-attention model 410 to generate a target search term vector query_vec for the target search term query.
Here, the method for generating the target search term vector query_vec of the target search term query is similar to the method described in step 120 above for generating the corresponding search term vector query_vec, positive sample vector positive_item_vec, and negative sample vector negative_item_vec for the search term query, positive sample positive_item, and negative sample negative_item, respectively, and is therefore not described again here.
Next, in sub-step 156, computing device 20 recalls, in the standard product thesaurus index, the target name vector standard_item_vec corresponding to the target search term vector query_vec.
As mentioned before, the standard product thesaurus index is constructed using LSH, so the nearest vector found in the standard product thesaurus index for the target search term vector query_vec is regarded as the target name vector standard_item_vec corresponding to the target search term vector query_vec, which in turn corresponds to a standard product name.
In sub-step 158, computing device 20 obtains the standard product name standard_item corresponding to the target name vector standard_item_vec as the standard product name of the target search term query (e.g., based on the correspondence between the standard product names standard_item and their name vectors standard_item_vec).
Further, the computing device 20 may perform a search based on the standard product name to obtain more accurate search results.
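By way of illustration only, a minimal end-to-end sketch of the query-time flow of sub-steps 152 to 158: encode the target search term with the trained self-attention model, look up the nearest name vector in the thesaurus index, and return the corresponding standard product name. It reuses lsh_code and cosine_sim from the sketches above; encode_query stands in for the trained self-attention model 410, and all names are illustrative.

    def normalize_search_term(target_query, encode_query, thesaurus_index, name_vectors, hyperplanes):
        """encode_query: callable wrapping the trained self-attention model 410;
        thesaurus_index: LSH bucket index built above; name_vectors: name -> vector dict."""
        query_vec = encode_query(target_query)                        # sub-step 154
        bucket = thesaurus_index.get(lsh_code(query_vec, hyperplanes), list(name_vectors))
        # sub-steps 156/158: nearest name vector in the bucket by cosine similarity
        best_name = max(bucket, key=lambda name: cosine_sim(query_vec, name_vectors[name]))
        return best_name                                              # standard product name for the query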
FIG. 9 illustrates a block diagram of a computing device 900 suitable for implementing embodiments of the present invention. Computing device 900 may be, for example, computing device 20 or server 30 as described above.
As shown in fig. 9, computing device 900 may include one or more Central Processing Units (CPUs) 910 (only one shown schematically) that may perform various suitable actions and processes in accordance with computer program instructions stored in a Read Only Memory (ROM) 920 or loaded from a storage unit 980 into a Random Access Memory (RAM) 930. In the RAM 930, various programs and data required for operation of the computing device 900 may also be stored. The CPU 910, ROM 920, and RAM 930 are connected to each other via a bus 940. An input/output (I/O) interface 950 is also connected to bus 940.
A number of components in computing device 900 are connected to I/O interface 950, including: an input unit 960 such as a keyboard, a mouse, etc.; an output unit 970 such as various types of displays, speakers, and the like; a storage unit 980 such as a magnetic disk, optical disk, or the like; and a communication unit 990 such as a network card, a modem, a wireless communication transceiver, or the like. The communication unit 990 allows the computing device 900 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunications networks.
The method 100 described above may be performed, for example, by the CPU 910 of the computing device 900 (e.g., computing device 20 or server 30). For example, in some embodiments, the method 100 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 980. In some embodiments, some or all of the computer program can be loaded and/or installed onto computing device 900 via ROM 920 and/or communication unit 990. When the computer program is loaded into RAM 930 and executed by CPU 910, one or more operations of the method 100 described above may be performed. Further, the communication unit 990 may support wired or wireless communication functions.
Those skilled in the art will appreciate that the computing device 900 shown in FIG. 9 is merely illustrative. In some embodiments, computing device 20 or server 30 may contain more or fewer components than computing device 900.
The search term normalization method 100 and the computing device 900 that may be used as the computing device 20 or the server 30 according to the present invention are described above with reference to the accompanying drawings. However, it will be appreciated by those skilled in the art that the performance of the steps of the method 100 is not limited to the order shown in the figures and described above, but may be performed in any other reasonable order. Further, the computing device 900 need not include all of the components shown in FIG. 9, it may include only some of the components necessary to perform the functions described in the present invention, and the manner in which these components are connected is not limited to the form shown in the figures.
And (3) simulation results:
the following table shows the comparison of simulation results of the deep semantic similarity model 400 according to the present invention with a conventional word embedding model:
[Table: evaluation scores of the deep semantic similarity model 400 and the conventional word embedding model on the ranking, synonym, and lexical tests; reproduced as an image in the original publication.]
The test metrics are defined as follows:
Ranking test: test triples of the form <search term query, positive sample, negative sample> are constructed by manual screening, e.g., <safety helmet, helmet, safety shoes>. Let sim(·, ·) denote the similarity score output by the deep semantic similarity model 400. When sim(query, positive sample) > sim(query, negative sample), the entry is recorded as a correct entry; otherwise it is an incorrect entry. Ranking score = number of correct ranking entries / total number of ranking test samples.
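A minimal sketch of the ranking score computation, assuming a similarity function sim(a, b) exposed by the trained model; the function name and data layout are assumptions for illustration.

```python
# Illustrative ranking-score computation over manually screened triples;
# sim is assumed to be the model's similarity score, as defined above.
def ranking_score(triples, sim):
    """triples: iterable of (query, positive_sample, negative_sample) strings."""
    triples = list(triples)
    # An entry is correct when the positive sample scores above the negative one.
    correct = sum(1 for q, pos, neg in triples if sim(q, pos) > sim(q, neg))
    return correct / len(triples) if triples else 0.0

# Example: ranking_score([("safety helmet", "helmet", "safety shoes")], sim)
```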
Synonym test: test samples of the form <A, A', B> are defined based on an industrial-product synonym library accumulated by the applicant, where A is a product name, A' is a synonym of that product name, and B is a randomly sampled product name, e.g., <pressure regulating valve, pressure reducing valve, breaker lock>. Let sim(·, ·) denote the similarity score output by the deep semantic similarity model 400. When sim(A, A') > sim(A, B), the entry is recorded as a correct entry; otherwise it is an incorrect entry. Synonym score = number of correct synonym entries / total number of synonym test samples.
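The synonym score follows the same preference rule over <A, A', B> samples. The sketch below computes it from the model's output vectors with cosine similarity; the encode() helper and the use of cosine similarity are assumptions for illustration.

```python
# Illustrative synonym-score computation; encode() is an assumed helper that
# returns the model's output vector for an input string.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def synonym_score(samples, encode):
    """samples: iterable of (product_name, synonym, random_product_name) triples."""
    samples = list(samples)
    correct = 0
    for name, synonym, other in samples:
        a, a_syn, b = encode(name), encode(synonym), encode(other)
        # Correct when the synonym is closer to the product name than the random product.
        if cosine(a, a_syn) > cosine(a, b):
            correct += 1
    return correct / len(samples) if samples else 0.0
```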
Lexical test: the lexical test is divided into four modules: A and AB, which are based on reduplication patterns, and Prefix and Suffix, which are based on affix patterns. A lexical test sample is defined as <C, C', D, D'>, where C and D are base vocabulary words and C' and D' are the expanded words constructed from C and D according to the corresponding lexical rule. The set of base vocabulary words is denoted V, and the corresponding set of expanded words is denoted V'. Examples of the A, AB, Prefix, and Suffix modules follow.
A: expanded word pairs constructed from a single base word using the reduplication patterns 'AA', 'A-one-A', and 'A-come-A-go', e.g., <smile, smile-smile>, <guess, guess-one-guess>, <run, run-come-run-go> (literal English renderings of the Chinese patterns).
AB: expanded word pairs constructed from a two-character base word AB using the 'AABB' structure and the structure in which AB is embedded in A, e.g., <happy, happy-happy>, <flustered, flustered-flustered>.
Prefix: expanded word pairs constructed by prepending common Chinese prefix characters (for example the ordinal prefix and the prefix meaning 'big') to a base word, e.g., <second, the second>, <sea, the big sea>.
Suffix: expanded word pairs constructed by appending Chinese suffix characters, such as the noun suffix rendered as 'son' and the agent suffix rendered as 'person', to a base word, e.g., <chair, chair (with the noun suffix)>, <production, producer>.
Let vec(·) denote the vector output by the deep semantic similarity model 400. For each lexical test sample <C, C', D, D'>, first compute the offset vector vec(C') - vec(C), and then the target vector vec(D) + (vec(C') - vec(C)). Based on the expanded word set V', select the expanded word whose vector is closest to this target vector; if the selected word is D' (i.e., vec(D') is the closest), the sample is recorded as a correct item, otherwise as an error item. Lexical score = number of correct lexical items / total number of lexical test samples.
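A sketch of the lexical score as described above: an analogy-style check in which vec(D) + (vec(C') - vec(C)) should land nearest to vec(D') within the expanded word set V'. The encode() helper and the cosine-based nearest-neighbor search are assumptions for illustration.

```python
# Illustrative lexical-score computation (analogy-style evaluation); encode()
# is an assumed helper returning the model's vector for a word.
import numpy as np

def lexical_score(samples, encode, expanded_vocab):
    """samples: list of (C, C_exp, D, D_exp); expanded_vocab: the expanded word set V'."""
    vocab_vecs = np.stack([encode(w) for w in expanded_vocab])
    vocab_vecs = vocab_vecs / (np.linalg.norm(vocab_vecs, axis=1, keepdims=True) + 1e-12)
    correct = 0
    for c, c_exp, d, d_exp in samples:
        # Offset captured by the lexical rule, applied to the base word D.
        target = encode(d) + (encode(c_exp) - encode(c))
        target = target / (np.linalg.norm(target) + 1e-12)
        # Select the expanded word whose vector is closest to the target vector.
        predicted = expanded_vocab[int(np.argmax(vocab_vecs @ target))]
        correct += int(predicted == d_exp)
    return correct / len(samples) if samples else 0.0
```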
From the simulation results, it can be seen that, by using the deep semantic similarity model 400 of the present invention, the evaluation scores for the ranking test, the synonym test, and the lexical test are far higher than those of the conventional word embedding model.
The present invention may be methods, apparatus, systems and/or computer program products. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therein for carrying out aspects of the present invention.
In one or more exemplary designs, the functions described herein may be implemented in hardware, software, firmware, or any combination thereof. For example, if implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The units of the apparatus disclosed herein may be implemented using discrete hardware components, or may be integrally implemented on a single hardware component, such as a processor. For example, the various illustrative logical blocks, modules, and circuits described in connection with the invention may be implemented or performed with a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein.
Those of skill in the art will further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both.
The previous description of the invention is provided to enable any person skilled in the art to make or use the invention. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the present invention is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of search term normalization, comprising:
constructing a plurality of pieces of training data based on historical data of a plurality of users, wherein each piece of training data comprises a search word, a positive sample, and a negative sample, the positive sample indicating product data, in a search result based on the search word, on which the user performed a predetermined operation, and the negative sample indicating product data irrelevant to the search result;
constructing a self-attention model of a depth semantic similarity model and training the self-attention model by utilizing the plurality of pieces of training data, wherein for each piece of training data, the self-attention model outputs a search word vector, a positive sample vector and a negative sample vector;
determining a loss function of the deep semantic similarity model based on the search term vectors, the positive sample vectors, and the negative sample vectors of the plurality of pieces of training data;
generating a standard product thesaurus index based on the self-attention model and a standard product database; and
determining a standard product name for a target search term input by a particular user based on the self-attention model and the standard product thesaurus index.
2. The method of claim 1, wherein constructing a plurality of pieces of training data comprises:
determining a search term from the historical data of the plurality of users;
determining, as the positive sample, product data in a search result on which the user who input the search word performed the predetermined operation, wherein the predetermined operation includes at least one of clicking, adding to a shopping cart, and purchasing; and
determining the negative examples based on the standard product database.
3. The method of claim 2, wherein the negative examples include at least one of:
product data randomly selected from the standard product database; and
product data in the standard product database having the same parent category as the positive sample.
4. The method of claim 1, wherein training the self-attention model with the plurality of pieces of training data comprises:
performing character-level word embedding on a search word, a positive sample and a negative sample of each piece of training data respectively to obtain character-level word embedding vectors of the search word, the positive sample and the negative sample respectively;
respectively performing odd-even position coding on a search word, a positive sample and a negative sample of each piece of training data to respectively obtain position coding vectors of the search word, the positive sample and the negative sample;
merging and normalizing the character-level word embedding vectors and the position coding vectors of the search words, the positive samples and the negative samples respectively to obtain normalized vectors of the search words, the positive samples and the negative samples;
in each of at least one self-attention head, performing an operation on the normalized vectors of the search word, the positive sample, and the negative sample using a self-attention function to obtain self-attention vectors of the search word, the positive sample, and the negative sample, respectively;
in each self-attention head, operating on the self-attention vectors of the search word, the positive sample, and the negative sample by using a nonlinear activation function to obtain fully-connected vectors of the search word, the positive sample, and the negative sample; and
averaging at least one fully-connected vector of the search term, the positive sample, and the negative sample obtained from the at least one self-attention head, respectively, to obtain the search term vector, the positive sample vector, and the negative sample vector.
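To make the flow recited in claim 4 concrete, here is a minimal PyTorch sketch of one forward pass: character-level embedding, positional encoding, merge-and-normalize, per-head self-attention followed by a non-linear fully-connected layer, and averaging over heads. The hyperparameters, the sinusoidal reading of the odd-even position coding, the ReLU activation, and the final pooling over characters are assumptions for illustration, not details fixed by the claim.

```python
# Minimal sketch of the claim-4 encoder flow; layer sizes and several design
# details are assumptions, not values specified in the claim.
import math
import torch
import torch.nn as nn

class SelfAttentionEncoder(nn.Module):
    def __init__(self, vocab_size, dim=128, n_heads=4, max_len=64):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, dim)      # character-level word embedding
        self.norm = nn.LayerNorm(dim)                      # merge-and-normalize step
        self.heads = nn.ModuleList(
            nn.ModuleDict({"q": nn.Linear(dim, dim), "k": nn.Linear(dim, dim),
                           "v": nn.Linear(dim, dim), "fc": nn.Linear(dim, dim)})
            for _ in range(n_heads))
        # Odd-even position coding, read here as sinusoidal encoding:
        # sine on even dimensions, cosine on odd dimensions.
        pe = torch.zeros(max_len, dim)
        pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)
        div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32)
                        * (-math.log(10000.0) / dim))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)

    def forward(self, char_ids):                           # char_ids: (batch, seq_len)
        x = self.char_emb(char_ids) + self.pe[: char_ids.size(1)]
        x = self.norm(x)                                   # normalized vectors
        head_outputs = []
        for head in self.heads:
            q, k, v = head["q"](x), head["k"](x), head["v"](x)
            att = torch.softmax(q @ k.transpose(-2, -1) / math.sqrt(q.size(-1)), dim=-1)
            sa = att @ v                                    # self-attention vectors
            head_outputs.append(torch.relu(head["fc"](sa))) # fully-connected vectors
        # Average over heads, then pool over characters to get one vector per text.
        return torch.stack(head_outputs).mean(dim=0).mean(dim=1)
```

The same encoder would be applied to the search word, the positive sample, and the negative sample of each piece of training data to obtain the three vectors used by the loss.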
5. The method of claim 1, wherein determining a loss function of the depth semantic similarity model comprises:
determining a first similarity between the search word vector and the positive sample vector and a second similarity between the search word vector and the negative sample vector based on the search word vector, the positive sample vector, and the negative sample vector of each piece of training data, respectively;
determining a similarity loss function based on the first similarity and the second similarity;
determining a category classification loss function based on categories of search terms and search term vectors of the plurality of pieces of training data;
determining a loss function of a depth semantic similarity model based on the similarity loss function and the category classification loss function.
6. The method of claim 5, wherein the similarity loss function comprises a triplet loss function (Triplet Loss).
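As an illustration of claims 5 and 6, the sketch below combines a triplet-style similarity loss over the three vectors with a category classification loss on the search term vector. The use of cosine similarity, the margin, the classifier head, and the weighting factor are assumptions for illustration, not values given in the claims.

```python
# Illustrative combined loss for claims 5-6; margin, weighting, and the use of
# cosine similarity are assumptions, not values given in the claims.
import torch
import torch.nn.functional as F

def model_loss(query_vec, pos_vec, neg_vec, category_logits, category_labels,
               margin=0.3, alpha=1.0):
    # First similarity (query vs. positive) and second similarity (query vs. negative).
    sim_pos = F.cosine_similarity(query_vec, pos_vec, dim=-1)
    sim_neg = F.cosine_similarity(query_vec, neg_vec, dim=-1)
    # Triplet loss: push the first similarity above the second by at least a margin.
    similarity_loss = torch.clamp(margin - (sim_pos - sim_neg), min=0.0).mean()
    # Category classification loss; category_logits would come from a hypothetical
    # classification head applied to the search term vector.
    category_loss = F.cross_entropy(category_logits, category_labels)
    return similarity_loss + alpha * category_loss
```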
7. The method of claim 1, wherein generating a standard product thesaurus index comprises:
acquiring the standard product database;
inputting each standard product name in the standard product database into the self-attention model to generate a name vector for the standard product name; and
operating on the name vectors of all standard product names in the standard product database by using a random hyperplane-based locality-sensitive hashing algorithm to generate a hash index tree of the name vectors of all standard product names in the standard product database,
wherein hash indices of adjacent name vectors are also adjacent in the hash index tree.
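A small sketch in the spirit of claim 7: a random-hyperplane locality-sensitive hash assigns each standard product name vector a bit signature, so nearby vectors tend to receive adjacent hash keys, and sorting on the packed keys approximates the adjacency property of the hash index tree. The class and its internals are assumptions for illustration, not the patent's exact index structure.

```python
# Illustrative random-hyperplane LSH over standard product name vectors; the
# sorted-key "tree" and the candidate re-ranking are simplifying assumptions.
import numpy as np

class RandomHyperplaneIndex:
    def __init__(self, dim, n_bits=16, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.normal(size=(n_bits, dim))       # random hyperplane normals

    def _keys(self, vecs):
        # One bit per hyperplane (which side the vector falls on), packed to an integer.
        bits = (vecs @ self.planes.T > 0).astype(np.int64)
        return bits @ (1 << np.arange(bits.shape[1], dtype=np.int64))

    def build(self, name_vectors, names):
        self.names = list(names)
        self.vecs = np.asarray(name_vectors, dtype=np.float32)
        self.keys = self._keys(self.vecs)
        self.order = np.argsort(self.keys)                 # adjacent keys sit together

    def recall(self, query_vec, n_candidates=50):
        key = self._keys(query_vec[None, :])[0]
        pos = int(np.searchsorted(self.keys[self.order], key))
        lo, hi = max(0, pos - n_candidates // 2), pos + n_candidates // 2
        candidates = self.order[lo:hi]
        # Re-rank the candidate bucket by exact cosine similarity.
        sims = self.vecs[candidates] @ query_vec / (
            np.linalg.norm(self.vecs[candidates], axis=1) * np.linalg.norm(query_vec) + 1e-12)
        return self.names[int(candidates[int(np.argmax(sims))])]
```

At query time, recall() would return the standard product name used in sub-step 158; a production index would keep full buckets per key rather than a fixed-width window around the query's position.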
8. The method of claim 1, wherein determining a standard product name for a particular user-entered target search term based on the self-attention model and the standard product thesaurus index comprises:
receiving a target search word input by the specific user;
inputting the target search term into the self-attention model to generate a target search term vector for the target search term;
recalling a target name vector corresponding to the target search word vector in the standard product thesaurus index; and
acquiring a standard product name corresponding to the target name vector as the standard product name of the target search term.
9. A computing device, comprising:
at least one processor; and
at least one memory coupled to the at least one processor and storing instructions for execution by the at least one processor, the instructions when executed by the at least one processor causing the computing device to perform the steps of the method of any of claims 1-8.
10. A computer readable storage medium having stored thereon computer program code which, when executed, performs the method of any of claims 1 to 8.
CN202011374977.1A 2020-12-01 2020-12-01 Search term normalization method, computing device, and computer-readable storage medium Active CN112182144B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011374977.1A CN112182144B (en) 2020-12-01 2020-12-01 Search term normalization method, computing device, and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011374977.1A CN112182144B (en) 2020-12-01 2020-12-01 Search term normalization method, computing device, and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN112182144A true CN112182144A (en) 2021-01-05
CN112182144B CN112182144B (en) 2021-03-05

Family

ID=73918289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011374977.1A Active CN112182144B (en) 2020-12-01 2020-12-01 Search term normalization method, computing device, and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN112182144B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221530A (en) * 2021-04-19 2021-08-06 杭州火石数智科技有限公司 Text similarity matching method and device based on circle loss, computer equipment and storage medium
CN116089729A (en) * 2023-03-31 2023-05-09 浙江口碑网络技术有限公司 Search recommendation method, device and storage medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120173547A1 (en) * 2008-04-22 2012-07-05 Uc4 Software Gmbh Method Of Detecting A Reference Sequence Of Events In A Sample Sequence Of Events
CN110795544A (en) * 2019-09-10 2020-02-14 腾讯科技(深圳)有限公司 Content search method, device, equipment and storage medium
CN111078842A (en) * 2019-12-31 2020-04-28 北京每日优鲜电子商务有限公司 Method, device, server and storage medium for determining query result

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221530A (en) * 2021-04-19 2021-08-06 杭州火石数智科技有限公司 Text similarity matching method and device based on circle loss, computer equipment and storage medium
CN113221530B (en) * 2021-04-19 2024-02-13 杭州火石数智科技有限公司 Text similarity matching method and device, computer equipment and storage medium
CN116089729A (en) * 2023-03-31 2023-05-09 浙江口碑网络技术有限公司 Search recommendation method, device and storage medium
CN116089729B (en) * 2023-03-31 2023-07-18 浙江口碑网络技术有限公司 Search recommendation method, device and storage medium

Also Published As

Publication number Publication date
CN112182144B (en) 2021-03-05

Similar Documents

Publication Publication Date Title
WO2020182019A1 (en) Image search method, apparatus, device, and computer-readable storage medium
CN108959246B (en) Answer selection method and device based on improved attention mechanism and electronic equipment
CN111444320B (en) Text retrieval method and device, computer equipment and storage medium
CN108875074B (en) Answer selection method and device based on cross attention neural network and electronic equipment
US20180158078A1 (en) Computer device and method for predicting market demand of commodities
US8918348B2 (en) Web-scale entity relationship extraction
CN110943981B (en) Cross-architecture vulnerability mining method based on hierarchical learning
CN112395487B (en) Information recommendation method and device, computer readable storage medium and electronic equipment
CN112182144B (en) Search term normalization method, computing device, and computer-readable storage medium
CN112800344B (en) Deep neural network-based movie recommendation method
CN111813930B (en) Similar document retrieval method and device
CN113239071B (en) Retrieval query method and system for scientific and technological resource subject and research topic information
CN110427480B (en) Intelligent personalized text recommendation method and device and computer readable storage medium
CN112231569A (en) News recommendation method and device, computer equipment and storage medium
CN111581923A (en) Method, device and equipment for generating file and computer readable storage medium
CN111400584A (en) Association word recommendation method and device, computer equipment and storage medium
CN112214623A (en) Image-text sample-oriented efficient supervised image embedding cross-media Hash retrieval method
CN113722512A (en) Text retrieval method, device and equipment based on language model and storage medium
CN112131453A (en) Method, device and storage medium for detecting network bad short text based on BERT
US20220198149A1 (en) Method and system for machine reading comprehension
CN117076636A (en) Information query method, system and equipment for intelligent customer service
CN109902162B (en) Text similarity identification method based on digital fingerprints, storage medium and device
Coviello et al. Multivariate Autoregressive Mixture Models for Music Auto-Tagging.
Bhattacharya et al. Beyond hard negatives in product search: Semantic matching using one-class classification (smocc)
CN110275957B (en) Name disambiguation method and device, electronic equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CB03 Change of inventor or designer information

Inventor after: Yang Han

Inventor after: Chen Guangshun

Inventor after: Chen Hongli

Inventor before: Yang Han

Inventor before: Chen Guangshun

CB03 Change of inventor or designer information