CN112182323A - Category prediction method and device, electronic equipment and storage medium - Google Patents



Publication number
CN112182323A
Authority
CN
China
Prior art keywords
category
target
knowledge
knowledge base
semantic features
Legal status
Pending
Application number
CN202010988807.6A
Other languages
Chinese (zh)
Inventor
任磊
王金刚
刘金宝
杨扬
步佳昊
张富峥
王仲远
Current Assignee
Beijing Sankuai Online Technology Co Ltd
Original Assignee
Beijing Sankuai Online Technology Co Ltd
Application filed by Beijing Sankuai Online Technology Co Ltd
Priority to CN202010988807.6A
Publication of CN112182323A



Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/908 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition


Abstract

The embodiments of the present application provide a category prediction method and apparatus, an electronic device, and a storage medium. The method includes: acquiring a target search term input by a user; extracting semantic features of the target search term; extracting a target entity included in the target search term; querying, in a predetermined category knowledge base, the category to which the target entity belongs; setting a knowledge flag bit feature for each category in the category knowledge base according to the category to which the target entity belongs; and determining the target category to which the target search term belongs based on the semantic features, the knowledge flag bit feature of each category in the category knowledge base, and the category semantic feature of each category in the category knowledge base. Thus, during category prediction, the target entity is identified through an entity recognition task and a high-quality category knowledge base is introduced; the target category is then determined from the semantic features of the target search term, the knowledge flag bit features, and the category semantic features, which improves the accuracy of category prediction.

Description

Category prediction method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of internet technologies, and in particular, to a category prediction method, apparatus, electronic device, and storage medium.
Background
In the internet era, search is a basic and core function. Whether on the traditional internet or the mobile internet, a good search engine understands the user's search intention so that the user can promptly find the desired information. For a search engine, the core of understanding the user's search intention is understanding the search term the user enters; understanding the search term text generally includes category prediction, which analyzes which categories the intention behind the search term is most relevant to.
In the related art, category prediction is generally done statistically: for each category, user behavior data (such as clicks) on the results recalled for the search term under that category are counted, and the search term is determined to belong to the category whose recalled results receive the most user behavior.
However, the category prediction method in the related art suffers from a pronounced Matthew effect. Specifically, the statistical user behavior data is dominated by categories with more exposure, so predicted categories are biased toward those categories, which lowers the accuracy of category prediction.
Disclosure of Invention
To solve the technical problem of low category prediction accuracy in the related art, the present application provides a category prediction method and apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides a category prediction method, the method including:
acquiring a target search term input by a user;
extracting semantic features of the target search term;
extracting a target entity included in the target search term;
querying, in a predetermined category knowledge base, the category to which the target entity belongs;
setting a knowledge flag bit feature for each category in the category knowledge base according to the category to which the target entity belongs;
and determining the target category to which the target search term belongs based on the semantic features, the knowledge flag bit feature corresponding to each category in the category knowledge base, and the category semantic feature corresponding to each category in the category knowledge base.
Optionally, the setting of the knowledge flag bit feature for each category in the category knowledge base according to the category to which the target entity belongs includes:
setting a first knowledge flag bit feature for the category to which the target entity belongs;
and setting a second knowledge flag bit feature for the categories in the category knowledge base other than the category to which the target entity belongs, where the second knowledge flag bit feature is different from the first knowledge flag bit feature.
Optionally, the determining, based on the semantic features, knowledge flag bit features corresponding to each category in the category knowledge base, and category semantic features corresponding to each category in the category knowledge base, a target category to which the target search word belongs includes:
obtaining category semantic features corresponding to each category in the category knowledge base;
for any category in the category knowledge base, splicing the semantic features, the knowledge flag bit features corresponding to the category and the category semantic features corresponding to the category to obtain spliced features;
inputting the spliced features corresponding to each category in the category knowledge base into a pre-trained classifier to obtain a probability value corresponding to each category;
and determining each category whose probability value is greater than a preset probability value as a target category of the target search term.
Optionally, the obtaining of the category semantic features corresponding to each category in the category knowledge base includes:
acquiring first category semantic features of a first dimension corresponding to each category in the category knowledge base;
reducing the dimension of the first category semantic features corresponding to each category to obtain second category semantic features of a second dimension;
and for each category in the category knowledge base, determining the second category semantic features corresponding to the category as the category semantic features corresponding to the category.
Optionally, the extracting semantic features of the target search term includes:
inputting the target search term into a pre-trained semantic feature extraction model to obtain the semantic features of the target search term.
In a second aspect, an embodiment of the present application provides a category prediction apparatus, including:
the search word acquisition module is used for acquiring a target search word input by a user;
the semantic feature extraction module is used for extracting semantic features of the target search words;
the entity extraction module is used for extracting a target entity included by the target search term;
the category query module is used for querying the category to which the target entity belongs in a predetermined category knowledge base;
the knowledge flag bit feature setting module is used for setting knowledge flag bit features for each category in the category knowledge base according to the category to which the target entity belongs;
and the category determining module is used for determining the target category to which the target search word belongs based on the semantic features, the knowledge flag bit features corresponding to the categories in the category knowledge base and the category semantic features corresponding to the categories in the category knowledge base.
Optionally, the knowledge flag bit feature setting module is specifically configured to:
setting a first knowledge flag bit feature for the category to which the target entity belongs;
and setting a second knowledge flag bit feature for the categories in the category knowledge base other than the category to which the target entity belongs, where the second knowledge flag bit feature is different from the first knowledge flag bit feature.
Optionally, the category determining module includes:
the category semantic feature acquisition unit is used for acquiring category semantic features corresponding to various categories in the category knowledge base;
the feature splicing unit is used for splicing the semantic features, the knowledge flag bit features corresponding to the categories and the category semantic features corresponding to the categories to obtain spliced features for any category in the category knowledge base;
the probability value calculation unit is used for inputting the spliced features corresponding to all categories in the category knowledge base into a pre-trained classifier to obtain the probability values corresponding to all categories;
and the category determining unit is used for determining the category with the probability value larger than the preset probability value as the target category corresponding to the target search word.
Optionally, the category semantic feature obtaining unit is specifically configured to:
acquiring first category semantic features of a first dimension corresponding to each category in the category knowledge base;
reducing the dimension of the first category semantic features corresponding to each category to obtain second category semantic features of a second dimension;
and for each category in the category knowledge base, determining the second category semantic features corresponding to the category as the category semantic features corresponding to the category.
Optionally, the semantic feature extraction module is specifically configured to:
inputting the target search term into a pre-trained semantic feature extraction model to obtain the semantic features of the target search term.
In a third aspect, the present application shows an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the category prediction method according to the first aspect when executing the program.
In a fourth aspect, this application embodiment shows a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of the category prediction method according to the first aspect.
In the technical solution provided by the embodiments of the present application, a target search term input by a user is acquired; semantic features of the target search term are extracted; a target entity included in the target search term is extracted; the category to which the target entity belongs is queried in a predetermined category knowledge base; a knowledge flag bit feature is set for each category in the category knowledge base according to the category to which the target entity belongs; and the target category to which the target search term belongs is determined based on the semantic features, the knowledge flag bit feature of each category, and the category semantic feature of each category. Thus, during category prediction, both the entity identified in the target search term by the entity recognition task and a high-quality category knowledge base are introduced, and the target category is finally determined from the semantic features of the target search term, the knowledge flag bit features, and the category semantic features, which improves the accuracy of category prediction.
Drawings
FIG. 1 is a flowchart illustrating the steps of a category prediction method according to an embodiment of the present disclosure;
FIG. 2 is a flowchart of the steps of one embodiment of S160 of FIG. 1;
FIG. 3 is a schematic diagram of category prediction with a specific example according to an embodiment of the present application;
FIG. 4 is a block diagram illustrating a category prediction apparatus according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.
In the internet era, search is a basic and core function. Whether on the traditional internet or the mobile internet, a good search engine understands users' search needs so that they can immediately find the information they are looking for, such as news, videos, restaurants, hotels, and commodities. For a search engine, the most important input for understanding the user's search intention is the query, i.e., the search term text entered by the user; therefore, how well the search engine understands that text directly determines its user experience.
Generally, the understanding of the search word text includes keyword extraction, rule matching, rewriting, error correction, component recognition, intention recognition, semantic matching, feature extraction, category prediction, and the like. Wherein the category prediction is to analyze which categories the search term intention is more relevant to.
After the search engine receives a search term and recalls results, category relevance is generally used as an important, high-priority ranking factor. On one hand, this keeps ranking efficient; on the other hand, it ensures category relevance at the source, which safeguards the user experience. Category relevance can also help optimize the recalled results, improving recall precision and reducing the false-recall rate. Improving category prediction accuracy therefore substantially improves search precision and recall, as well as the user experience.
In the related art, category prediction is generally done statistically: for each category, user behavior data (such as clicks) on the results recalled for the search term under that category are counted, and the search term is determined to belong to the category whose recalled results receive the most user behavior.
However, the category prediction method in the related art suffers from a pronounced Matthew effect. First, the statistical user behavior data is dominated by categories with more exposure, so predicted categories are biased toward those categories. Second, there is a cold-start problem: when a new category is added, even if its user behavior data grows over the following days, it is difficult for it to catch up with existing categories in the statistics. Third, a purely statistical approach ignores the semantic relevance of the text. The related-art method therefore suffers from low category prediction accuracy.
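To make the exposure bias and the cold-start problem concrete, here is a minimal sketch of the related-art statistical baseline, assuming the behavior data is a hypothetical list of (query, category of the clicked result) pairs. The function name `predict_by_clicks` and the click log are illustrative, not part of the patent.

```python
from __future__ import annotations

from collections import Counter
from typing import Optional


def predict_by_clicks(click_log: list[tuple[str, str]], query: str) -> Optional[str]:
    """Related-art baseline: pick the category whose recalled results got the most clicks.

    `click_log` is a hypothetical list of (query, category_of_clicked_result) pairs.
    """
    counts = Counter(category for q, category in click_log if q == query)
    if not counts:
        return None  # cold start: no behavior data for this query at all
    return counts.most_common(1)[0][0]


# Hypothetical log: results from the heavily exposed "Food" category are shown,
# and therefore clicked, far more often than results from "Instant food".
log = [("hot pot", "Food")] * 90 + [("hot pot", "Instant food")] * 10
print(predict_by_clicks(log, "hot pot"))       # Food
print(predict_by_clicks(log, "hot pot base"))  # None
```

Because the counts grow with exposure, a category that is already shown more keeps winning, and a newly added category has no counts to compete with; neither decision looks at the meaning of the query text.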
Therefore, the embodiment of the application provides a category prediction method, a category prediction device, an electronic device and a storage medium.
In a first aspect, a category prediction method provided in an embodiment of the present application is first described in detail.
It should be noted that an execution subject of the category prediction method provided in the embodiment of the present application may be a category prediction apparatus, and the category prediction apparatus may be run in an electronic device, and the electronic device may be a server, and the embodiment of the present application does not specifically limit the electronic device.
Referring to FIG. 1, which shows a flowchart of the steps of a category prediction method according to the present application, the method may specifically include the following steps:
S110, acquiring the target search term input by the user.
The target search term may be any search term input by the user, for example "Haidilao", "Haidilao hot pot base", "self-service hot pot", or "hot pot"; the embodiment of the present application does not specifically limit the target search term.
S120, extracting semantic features of the target search term.
Specifically, after the target search word is obtained, a semantic feature of the target search word may be extracted, where the semantic feature may be a feature vector for representing the semantic meaning of the target search word.
In one embodiment, extracting the semantic features of the target search term may include the following steps:
inputting the target search term into a pre-trained semantic feature extraction model to obtain the semantic features of the target search term.
In this embodiment, the semantic feature extraction model is used to extract deep semantic features of the target search term. It may be any model capable of extracting semantic features, such as BERT, ERNIE, XLNet, or MASS; the embodiments of the present application do not specifically limit the model. The training process of the semantic feature extraction model is well understood by those skilled in the art and is not described in detail here.
For example, when the target search term is "Qiaqia walnut-flavored melon seeds", inputting it into the semantic feature extraction model yields a semantic feature vector representing its meaning.
S130, extracting the target entity included in the target search term.
Specifically, after the target search term is obtained, it may be input into a pre-trained entity extraction model, for example a pre-trained NER (named entity recognition) model, to obtain the target entity included in the target search term. The target entity may be a merchant, a commodity, a brand, or the like. For example, when the target search term is "Qiaqia walnut-flavored melon seeds", the extracted target entity may be "melon seeds".
Extraction with an NER model is described above only as an example; the embodiment of the present application does not specifically limit how the target entity included in the target search term is extracted.
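A production system would call a pre-trained encoder such as BERT here. So that the sketch runs without model weights, it substitutes a much simpler encoder: hashed character-bigram counts, L2-normalized into a fixed-length vector. The function name `encode_query` and the dimension 16 are hypothetical; only the interface (text in, semantic feature vector out) mirrors the step above.

```python
from __future__ import annotations

import hashlib
import math


def encode_query(text: str, dim: int = 16) -> list[float]:
    """Stand-in for a pre-trained encoder: hashed character-bigram counts, L2-normalized."""
    vec = [0.0] * dim
    padded = f"^{text.lower()}$"  # boundary markers so first/last characters form bigrams
    for i in range(len(padded) - 1):
        bigram = padded[i:i + 2]
        # md5 gives a stable hash; Python's built-in hash() is salted per process.
        h = int(hashlib.md5(bigram.encode("utf-8")).hexdigest(), 16)
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


q = encode_query("Qiaqia walnut-flavored melon seeds")
print(len(q))  # 16
```

Unlike BERT, this stand-in captures only surface overlap between strings, not deep semantics; it is only meant to show the shape of the data passed to the later steps.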
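Since the method leaves the extractor open, the sketch below swaps in a dictionary-based greedy longest-match lookup in place of a trained NER model, assuming a hypothetical lexicon of merchant, commodity, and brand names. It matches whitespace tokens for readability; for Chinese queries, a real system would match on characters or use the NER model itself.

```python
from __future__ import annotations


def extract_entities(query: str, lexicon: set[str]) -> list[str]:
    """Greedy longest-match lookup of known entity names (a stand-in for a trained NER model).

    Matching is on lower-cased, whitespace-separated tokens; lexicon entries must be lower-case.
    """
    tokens = query.lower().split()
    max_len = max((len(entry.split()) for entry in lexicon), default=0)
    found: list[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate span starting at token i first.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = " ".join(tokens[i:i + n])
            if span in lexicon:
                found.append(span)
                i += n
                break
        else:
            i += 1  # no entity starts at this token
    return found


# Hypothetical lexicon of merchant/commodity/brand names.
LEXICON = {"haidilao", "haidilao hot pot base", "hot pot", "melon seeds"}
print(extract_entities("Haidilao hot pot base", LEXICON))               # ['haidilao hot pot base']
print(extract_entities("Qiaqia walnut-flavored melon seeds", LEXICON))  # ['melon seeds']
```

Preferring the longest match keeps "Haidilao hot pot base" (a seasoning commodity) from being split into the restaurant "Haidilao" plus "hot pot", which would point to different categories.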
S140, querying the category to which the target entity belongs in a predetermined category knowledge base.
Specifically, search is one of the core entry points through which Meituan and Dianping handle users' search requests. Compared with a general search engine, in the O2O (Online To Offline) search scenario users' search requests are explicit and mainly target structured merchant POIs and commodity SPUs.
For example, Meituan search carries a large share of search traffic, and the semantic relevance between search terms and categories is one of the key factors affecting users' search experience. In the search services of both Meituan and Dianping, the relevance between search terms and categories is an important feature of the ranking model.
In the O2O search scenario, search terms are short; compared with ordinary text they carry less minable information, which makes judging the relevance between a search term and a category harder. However, owing to the nature of O2O services, short search terms often contain structured information such as a POI/SPU/BRAND (merchant/commodity/brand). Moreover, when a merchant or commodity goes online, the merchant operator fills in, in the back end, the multi-level categories to which it belongs, and Meituan operators review and maintain these categories, so the category knowledge base is a high-quality, manually annotated knowledge base.
The storage structure of the category knowledge base may be as shown in Table 1.
Table 1
| POI/SPU | Level-1 category | Level-2 category | Level-3 category |
|---|---|---|---|
| Fuji apple (SPU) | Fresh fruits and vegetables | Fruit | Apple |
| Haidilao (POI) | Food | Hot pot | Sichuan/Chongqing hot pot |
| Haidilao hot pot base (POI) | Grain, oil and seasonings | Seasonings | Hot pot seasoning |
| Self-service hot pot (SPU) | Instant food | Cooked food | Hot pot |
Therefore, after the target entity included in the target search term is obtained, the category to which the target entity belongs can be queried in a predetermined category knowledge base.
For example, when the target entity is Haidilao, it belongs to the Food category; when the target entity is Haidilao hot pot base, it belongs to the Grain, oil and seasonings category; when the target entity is self-service hot pot, it belongs to the Instant food and Cooked food categories; and when the target entity is hot pot, it belongs to the Food category.
For another example, when the target entity is Fuji apple, the categories queried in the category knowledge base are its level-1, level-2, and level-3 categories, namely Fresh fruits and vegetables, Fruit, and Apple.
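The lookup can be sketched with an in-memory dictionary modeled on the knowledge-base table above. The dictionary layout, the name `CATEGORY_KB`, and the decision to identify each category by its path prefix (so that, for example, a level-3 "Hot pot" under Instant food stays distinct from a level-2 "Hot pot" under Food) are assumptions for illustration; the patent does not specify how the knowledge base is stored.

```python
from __future__ import annotations

# Hypothetical in-memory category knowledge base modeled on the table above:
# lower-cased entity name -> (level-1, level-2, level-3) category path.
CATEGORY_KB: dict[str, tuple[str, ...]] = {
    "fuji apple": ("Fresh fruits and vegetables", "Fruit", "Apple"),
    "self-service hot pot": ("Instant food", "Cooked food", "Hot pot"),
}


def query_categories(entities: list[str], kb: dict[str, tuple[str, ...]]) -> set[str]:
    """Return the categories (at every level) to which the extracted entities belong.

    Each category is identified by its path prefix, e.g. "Instant food > Cooked food",
    so that same-named categories at different levels stay distinct.
    """
    hits: set[str] = set()
    for entity in entities:
        path = kb.get(entity.lower())
        if path is None:
            continue  # entity not covered by the knowledge base
        for depth in range(1, len(path) + 1):
            hits.add(" > ".join(path[:depth]))
    return hits


print(sorted(query_categories(["Fuji apple"], CATEGORY_KB)))
# ['Fresh fruits and vegetables', 'Fresh fruits and vegetables > Fruit', 'Fresh fruits and vegetables > Fruit > Apple']
```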
S150, setting knowledge flag bit characteristics for each category in the category knowledge base according to the category to which the target entity belongs.
Specifically, after the category to which the target entity belongs is found in the category knowledge base, a knowledge flag bit feature may be set for each category in the category knowledge base according to that category. The knowledge flag bit feature set for the category to which the target entity belongs differs from the one set for the other categories in the category knowledge base. For example, the knowledge flag bit feature may be 1 for the category to which the target entity belongs and 0 for the other categories.
In one embodiment, the setting of the knowledge flag bit feature for each category in the category knowledge base according to the category to which the target entity belongs may include the following steps:
Step a, setting a first knowledge flag bit feature for the category to which the target entity belongs.
Step b, setting a second knowledge flag bit feature for the categories in the category knowledge base other than the category to which the target entity belongs, where the second knowledge flag bit feature is different from the first knowledge flag bit feature.
In this embodiment, to accurately introduce the high-quality external category knowledge base, after the category to which the target entity belongs is queried in the category knowledge base, different knowledge flag bit features may be set for that category and for the other categories in the category knowledge base. For example, the first knowledge flag bit feature may be 1 and the second knowledge flag bit feature may be 0.
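Steps a and b amount to a per-category indicator. A minimal sketch, assuming the flag is a single scalar per category with the example values 1 and 0; the function name `knowledge_flags` is hypothetical.

```python
from __future__ import annotations


def knowledge_flags(all_categories: list[str], entity_categories: set[str],
                    first: float = 1.0, second: float = 0.0) -> dict[str, float]:
    """Give `first` to categories the target entity belongs to and `second` to all others."""
    if first == second:
        raise ValueError("the first and second knowledge flag bit features must differ")
    return {c: (first if c in entity_categories else second) for c in all_categories}


cats = ["Food", "Instant food", "Grain, oil and seasonings"]
print(knowledge_flags(cats, {"Food"}))
# {'Food': 1.0, 'Instant food': 0.0, 'Grain, oil and seasonings': 0.0}
```

The flag is an explicit, per-category signal from the manually curated knowledge base, so the downstream classifier can learn how much to trust it relative to the purely semantic features.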
S160, determining the target category to which the target search word belongs based on the semantic features, the knowledge flag bit features corresponding to each category in the category knowledge base and the category semantic features corresponding to each category in the category knowledge base.
Specifically, to accurately determine the target category to which the target search term belongs, the category semantic features corresponding to each category in the category knowledge base are also taken into account.
In one embodiment, as shown in FIG. 2, S160 of determining the target category to which the target search term belongs based on the semantic features, the knowledge flag bit features corresponding to each category in the category knowledge base, and the category semantic features corresponding to each category in the category knowledge base may include the following steps:
S161, obtaining the category semantic features corresponding to each category in the category knowledge base.
In particular, a multi-layer deep neural network DNN structure may be employed to encode semantic representations of various categories in a category knowledge base. Of course, other encoding methods may be used to encode the semantic features of each category in the category knowledge base, which is not specifically limited in this application.
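The patent does not give the DNN's input or layer sizes. The sketch assumes each category is fed in as a one-hot ID vector and uses hypothetical layer sizes [3, 32, 64] with ReLU on hidden layers; weights are randomly initialized for illustration, whereas in practice they would be learned. `CategoryEncoder` and `one_hot` are illustrative names.

```python
from __future__ import annotations

import random


def relu(xs: list[float]) -> list[float]:
    return [max(0.0, x) for x in xs]


def dense(x: list[float], weights: list[list[float]], bias: list[float]) -> list[float]:
    """Fully connected layer y = W x + b, with W stored row by row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class CategoryEncoder:
    """Hypothetical multi-layer DNN mapping a category input vector to a category semantic feature.

    Weights are randomly initialized (He-style scaling) for illustration only; in a real
    system they would be learned, e.g. jointly with the downstream classifier.
    """

    def __init__(self, sizes: list[int], seed: int = 0) -> None:
        rng = random.Random(seed)
        self.layers = []
        for n_in, n_out in zip(sizes, sizes[1:]):
            std = (2.0 / n_in) ** 0.5
            weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
            self.layers.append((weights, [0.0] * n_out))

    def __call__(self, x: list[float]) -> list[float]:
        for i, (weights, bias) in enumerate(self.layers):
            x = dense(x, weights, bias)
            if i < len(self.layers) - 1:  # ReLU on hidden layers only
                x = relu(x)
        return x


def one_hot(index: int, size: int) -> list[float]:
    return [1.0 if j == index else 0.0 for j in range(size)]


categories = ["Food", "Instant food", "Grain, oil and seasonings"]
encoder = CategoryEncoder([len(categories), 32, 64])  # hypothetical layer sizes
cat_features = {c: encoder(one_hot(i, len(categories))) for i, c in enumerate(categories)}
print(len(cat_features["Food"]))  # 64
```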
In practical applications, an excessively large dimension of the category semantic features may reduce the efficiency of category prediction. Therefore, as one implementation of the embodiment of the present application, S161 of obtaining the category semantic features corresponding to each category in the category knowledge base may include the following steps b1 to b3:
Step b1, acquiring first category semantic features of a first dimension corresponding to each category in the category knowledge base.
Step b2, performing dimensionality reduction on the first category semantic features corresponding to each category to obtain second category semantic features of a second dimension.
Step b3, for each category in the category knowledge base, determining the second category semantic feature corresponding to the category as the category semantic feature of that category.
In this implementation, the category semantic features are reduced from the first dimension to the second dimension, and for each category the reduced second category semantic feature is used as that category's semantic feature. Because the resulting category semantic features have a lower dimension, the computation required to predict the target category of the target search term is reduced and prediction efficiency is improved.
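The patent does not name a dimensionality-reduction technique. The sketch swaps in a Gaussian random projection (a fixed linear map y = P x, scaled by 1/sqrt(d_out)) purely for illustration; PCA or a learned linear layer would fit the same interface. The dimensions 64 and 8 and the function names are hypothetical.

```python
from __future__ import annotations

import random


def make_projection(d_in: int, d_out: int, seed: int = 0) -> list[list[float]]:
    """Gaussian random projection matrix of shape (d_out, d_in), scaled by 1/sqrt(d_out)."""
    rng = random.Random(seed)
    scale = d_out ** -0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d_in)] for _ in range(d_out)]


def reduce_dim(features: dict[str, list[float]],
               proj: list[list[float]]) -> dict[str, list[float]]:
    """Map each category's first-dimension feature to the second dimension: y = P x."""
    return {cat: [sum(p * x for p, x in zip(row, vec)) for row in proj]
            for cat, vec in features.items()}


first = {"Food": [0.1] * 64, "Instant food": [0.2] * 64}  # first dimension = 64 (hypothetical)
proj = make_projection(64, 8)                             # second dimension = 8 (hypothetical)
second = reduce_dim(first, proj)
print({c: len(v) for c, v in second.items()})  # {'Food': 8, 'Instant food': 8}
```

The same projection matrix must be applied to every category so that the reduced features remain comparable across categories.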
S162, for each category in the category knowledge base, splicing the semantic features, the knowledge flag bit feature corresponding to the category, and the category semantic feature corresponding to the category to obtain a spliced feature.
As described above, the knowledge flag bit feature and the category semantic feature have been determined for every category in the category knowledge base. To accurately determine the target category of the target search term in the subsequent steps, for each category in the category knowledge base, the semantic features of the target search term, the knowledge flag bit feature of the category, and the category semantic feature of the category are spliced (concatenated) to obtain a spliced feature. That is, the number of spliced features equals the number of categories in the category knowledge base.
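Splicing is plain vector concatenation, repeated once per category. A minimal sketch, assuming the knowledge flag bit feature is a single scalar and all vectors are plain Python lists; `splice_features` is a hypothetical name.

```python
from __future__ import annotations


def splice_features(query_vec: list[float], flags: dict[str, float],
                    cat_vecs: dict[str, list[float]]) -> dict[str, list[float]]:
    """Build one spliced vector per category: [query semantics | flag bit | category semantics]."""
    return {cat: list(query_vec) + [flags[cat]] + list(vec) for cat, vec in cat_vecs.items()}


q = [0.5, -0.5]
flags = {"Food": 1.0, "Instant food": 0.0}
cat_vecs = {"Food": [0.1, 0.2, 0.3], "Instant food": [0.4, 0.5, 0.6]}
spliced = splice_features(q, flags, cat_vecs)
print(spliced["Food"])                # [0.5, -0.5, 1.0, 0.1, 0.2, 0.3]
print(len(spliced) == len(cat_vecs))  # True: one spliced feature per category
```

The query part is identical in every spliced vector; only the flag bit and the category semantics vary, which is what lets a single classifier score every category.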
S163, inputting the spliced features corresponding to each category in the category knowledge base into a pre-trained classifier to obtain a probability value corresponding to each category.
Specifically, after the spliced features corresponding to each category in the category knowledge base are obtained, the spliced features corresponding to each category can be input into a pre-trained classifier, and then the probability value corresponding to each category can be obtained.
The probability value corresponding to each category is used for representing the probability that the target search word belongs to the category. If the probability value corresponding to one category is larger, the probability that the target search word belongs to the category is higher; if the probability value corresponding to one category is smaller, the probability that the target search word belongs to the category is lower.
And S164, determining the category with the probability value larger than the preset probability value as a target category corresponding to the target search word.
Specifically, in practical application, the target search term may belong to only one category, or the target search term may belong to multiple categories at the same time, so that the category having the probability value greater than the preset probability value may be determined as the target category corresponding to the target search term. The size of the preset probability value can be determined according to actual conditions, and the size of the preset probability value is not specifically limited in the embodiment of the application.
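Steps S163 and S164 can be illustrated with a single shared logistic classifier that scores the spliced feature of each category. Several details here are assumptions for illustration only:
- the logistic (sigmoid) form;
- the toy weights;
- the 0.5 preset value.

The embodiment only requires a pre-trained classifier that outputs one probability per category.

```python
import math
from typing import Dict, List

Vector = List[float]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def category_probabilities(
    spliced: Dict[str, Vector], weight: Vector, bias: float
) -> Dict[str, float]:
    """Step S163 (sketch): score every category's spliced feature with one shared
    logistic classifier, giving an independent probability per category."""
    probs: Dict[str, float] = {}
    for category, x in spliced.items():
        if len(x) != len(weight):
            raise ValueError("spliced feature length must match the classifier weights")
        probs[category] = sigmoid(sum(w * v for w, v in zip(weight, x)) + bias)
    return probs


def target_categories(probs: Dict[str, float], preset: float) -> List[str]:
    """Step S164: keep every category whose probability exceeds the preset value,
    so zero, one or several target categories may be returned."""
    return [category for category, p in probs.items() if p > preset]


if __name__ == "__main__":
    spliced = {"snacks": [0.2, 0.4, 1.0, 1.5, 0.0], "drinks": [0.2, 0.4, 0.0, 0.0, 2.0]}
    weight, bias = [0.0, 0.0, 2.0, 1.0, -1.0], -1.0  # toy "pre-trained" parameters
    probs = category_probabilities(spliced, weight, bias)
    print(target_categories(probs, preset=0.5))  # ['snacks']
```

Each category is scored independently rather than through a softmax over all categories. As a result, thresholding can return several target categories, or none, as the text above allows.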
The technical solution provided by the embodiments of the present application proceeds as follows:
1. A target search word input by a user is obtained.
2. Semantic features of the target search word are extracted.
3. A target entity included in the target search word is extracted.
4. The category to which the target entity belongs is queried in a predetermined category knowledge base.
5. A knowledge flag bit feature is set for each category in the category knowledge base according to the category to which the target entity belongs.
6. The target category of the target search word is determined based on the semantic features, the knowledge flag bit feature of each category in the category knowledge base, and the category semantic feature of each category in the category knowledge base.

Category prediction thus draws on two sources: the entities that an entity recognition task identifies in the target search word, and a high-quality category knowledge base. The target category is finally determined from the semantic features of the target search word, the category knowledge flag bit features, and the category semantic features, which improves the accuracy of category prediction.
For clarity of description of the scheme, the category prediction method will be described in detail below with reference to specific examples.
Category prediction may be implemented by the model shown in Fig. 3.
As can be seen from Fig. 3, the overall model architecture consists mainly of four parts: (1) a Query semantic representation part, (2) an entity-based knowledge integration part, (3) a category semantic representation part, and (4) a feature fusion and classification part.
(1) Query semantic representation part. The pre-trained model BERT is used to learn deep semantic features of the user Query. In Fig. 3, the user Query is "chachiya flavored melon seeds". The embodiment shown in Fig. 3 uses BERT as the example. In practical applications, other pre-trained models such as ERNIE, XLNet, or MASS may also be selected, which is not specifically limited in the present application.
(2) Entity-based knowledge integration part. The target search word is input into an NER (named entity recognition) model. The NER model identifies the entities in the target search word, such as a merchant (POI), a commodity (SPU), or a brand (BRAND); in Fig. 3, the identified entity is "melon seeds". For each identified entity, the categories to which it belongs are queried in the category knowledge base (Knowledge Base). The corresponding knowledge flag bit features are then set according to the queried categories. Specifically, the knowledge flag bit feature may be set to 1 for each queried category and to 0 for all other categories in the category knowledge base.
The category knowledge base (Knowledge Base) is a high-quality knowledge base. Merchant operators fill it in when merchants or commodities go online, and the company's operators check and maintain the multi-level background category system of merchants and commodities. The storage structure of the category knowledge base is shown in Table 1 of the embodiments above and is not specifically limited herein.
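The flag-bit rule described in part (2) can be sketched as follows. The sketch makes two assumptions:
- the NER output is already available as a list of entity strings;
- the category knowledge base is held in memory as a mapping from entity to the set of its categories.

Both representations, and all names and values below, are hypothetical.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical in-memory view of the category knowledge base: entity -> its categories.
KnowledgeBase = Dict[str, Set[str]]


def knowledge_flag_bits(
    entities: Iterable[str], kb: KnowledgeBase, all_categories: List[str]
) -> Dict[str, int]:
    """Flag 1 for every category that a recognized entity belongs to,
    flag 0 for every other category in the knowledge base."""
    hit: Set[str] = set()
    for entity in entities:
        hit |= kb.get(entity, set())  # entities missing from the knowledge base add nothing
    return {category: 1 if category in hit else 0 for category in all_categories}


if __name__ == "__main__":
    kb = {"melon seeds": {"snacks"}}            # toy knowledge-base entry
    categories = ["snacks", "drinks", "fruit"]  # toy category list
    print(knowledge_flag_bits(["melon seeds"], kb, categories))
    # {'snacks': 1, 'drinks': 0, 'fruit': 0}
```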
(3) Category semantic representation part. Specifically, a multi-layer DNN structure may be used to encode the semantic representation of each category. DNN encoding is used here as an example; other encoding methods may also be used, which is not specifically limited in the embodiments of the present application.
Each category corresponds to one category semantic feature vector. As shown in Fig. 3, each of the categories $C_1$, $C_2$, ..., $C_N$ corresponds to a category semantic feature vector Category N, which represents the semantics of that category. The dimension of Category N is usually high, so Category N can be reduced to a lower dimension, for example 256 dimensions. As shown in Fig. 3, Category N may first be reduced to 768 dimensions.
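Part (3) can be sketched as a small multi-layer perceptron. The sketch rests on the following assumptions:
- Each category is fed to the DNN as a one-hot vector of its index. The embodiment does not state what the DNN takes as input.
- The weights come from seeded random initialization rather than training.

The class and function names are hypothetical. The last entry of `layer_sizes` sets the output dimension, so it could be set to whatever reduced dimension is chosen (for example 768 or 256).

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def dense(x: Vector, weight: Matrix, bias: Vector, relu: bool) -> Vector:
    """One fully connected layer: weight @ x + bias, optionally followed by ReLU."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]
    return [max(0.0, v) for v in out] if relu else out


class CategoryEncoder:
    """Multi-layer DNN that maps a one-hot category index to a category semantic
    vector. Weights are randomly initialized here; a real model would learn them."""

    def __init__(self, num_categories: int, layer_sizes: List[int], seed: int = 0):
        rng = random.Random(seed)
        sizes = [num_categories] + layer_sizes
        self.num_categories = num_categories
        self.layers: List[Tuple[Matrix, Vector]] = [
            ([[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)], [0.0] * n_out)
            for n_in, n_out in zip(sizes[:-1], sizes[1:])
        ]

    def encode(self, category_index: int) -> Vector:
        x = [0.0] * self.num_categories
        x[category_index] = 1.0  # one-hot input (an assumption, see lead-in)
        last = len(self.layers) - 1
        for i, (weight, bias) in enumerate(self.layers):
            # ReLU on hidden layers; the output layer stays linear.
            x = dense(x, weight, bias, relu=i < last)
        return x


if __name__ == "__main__":
    encoder = CategoryEncoder(num_categories=3, layer_sizes=[16, 8], seed=42)
    vectors = [encoder.encode(i) for i in range(3)]
    print([len(v) for v in vectors])  # [8, 8, 8]
```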
(4) Feature fusion and classification part. Specifically, the Query semantic representation, the knowledge flag bit features, and the category semantic features are spliced. A classifier then classifies the user Query to determine the probability that it belongs to each category. As shown in Fig. 3:
- $P(Q, C_1)$ denotes the probability value that the user Query belongs to category $C_1$;
- $P(Q, C_2)$ denotes the probability value that it belongs to category $C_2$;
- ...;
- $P(Q, C_N)$ denotes the probability value that it belongs to category $C_N$.

Finally, each category whose probability value is greater than the preset probability value is determined as a target category of the user Query.
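The fusion and classification step can be written compactly as shown below. The form of the output layer is an assumption: it is taken to be a single logistic unit shared across categories, which the embodiment does not specify. The notation is as follows:
- $q$ is the Query semantic representation.
- $f_i$ is the knowledge flag bit feature of category $C_i$.
- $c_i$ is the category semantic feature of category $C_i$.
- $[\,\cdot\,;\,\cdot\,]$ denotes splicing (concatenation).
- $w$ and $b$ are the classifier parameters.
- $\sigma$ is the logistic function.
- $\tau$ is the preset probability value.

```latex
x_i = \left[\, q \,;\, f_i \,;\, c_i \,\right], \qquad
P(Q, C_i) = \sigma\!\left(w^{\top} x_i + b\right), \qquad i = 1, \dots, N,
\\
\text{target categories of } Q = \{\, C_i : P(Q, C_i) > \tau \,\}
```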
It should be noted that the training data for the model shown in Fig. 3 may consist of triples of the form (user Query, category, entity component). The training data is constructed as follows:
(1) Based on user behavior data: (user Query, category, entity component) data is constructed from the click and order behavior of user Queries under each category over a historical period, combined with online entity recognition results. This construction mainly targets medium- and high-frequency user Queries. Doing so keeps the training data consistent with online data and gives relatively high accuracy.
(2) Based on the category knowledge base: (POI/SPU/BRAND, category, entity component) data is constructed from the manually operated category knowledge base, which records three kinds of relationships:
- between merchant POIs and categories;
- between commodity SPUs and categories;
- between brands (BRAND) and categories.

This makes the category distribution of the training set more uniform and complete, and helps address the long-tail problem.
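The two training-data sources can be sketched as follows. Several details are hypothetical:
- the shape of the click/order log and of the knowledge-base rows;
- the `min_freq` cut-off standing in for "medium- and high-frequency";
- representing the entity component as a plain string (the NER output for behavior data, the entity type for knowledge-base rows).

Treat this as an illustration of the two sources, not the actual pipeline.

```python
from collections import Counter
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (text, category, entity component)


def triples_from_behavior(
    logs: List[Tuple[str, str]],  # (user Query, category clicked or ordered under)
    ner: Dict[str, str],          # user Query -> entity component from online NER
    min_freq: int,                # hypothetical cut-off for "medium/high frequency"
) -> List[Triple]:
    """Source (1): keep only Queries seen at least min_freq times, deduplicated."""
    freq = Counter(query for query, _ in logs)
    seen = set()
    triples: List[Triple] = []
    for query, category in logs:
        if freq[query] >= min_freq and query in ner and (query, category) not in seen:
            seen.add((query, category))
            triples.append((query, category, ner[query]))
    return triples


def triples_from_knowledge_base(rows: List[Tuple[str, str, str]]) -> List[Triple]:
    """Source (2): rows are (POI/SPU/BRAND name, entity type, category) relations
    from the category knowledge base; the entity type stands in for the entity component."""
    return [(name, category, entity_type) for name, entity_type, category in rows]


if __name__ == "__main__":
    logs = [("melon seeds", "snacks"), ("melon seeds", "snacks"), ("rare query", "drinks")]
    ner = {"melon seeds": "melon seeds/SPU", "rare query": "rare/SPU"}
    data = triples_from_behavior(logs, ner, min_freq=2)
    data += triples_from_knowledge_base([("Brand X", "BRAND", "snacks")])
    print(data)
    # [('melon seeds', 'snacks', 'melon seeds/SPU'), ('Brand X', 'snacks', 'BRAND')]
```

Each source covers a different weakness of the other. The behavior source reflects real online traffic, while the knowledge-base source adds examples for categories that rarely appear in logs.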
In the technical solution provided by the embodiments of the present application, category prediction draws on two sources: the entities that an entity recognition task identifies in the target search word, and a high-quality category knowledge base. The target category of the target search word is finally determined from its semantic features, the category knowledge flag bit features, and the category semantic features, which improves the accuracy of category prediction.
It should be noted that, for simplicity of description, the method embodiments are described as a series of combined acts. However, those skilled in the art will appreciate that the present application is not limited by the described order of acts, because according to the present application some steps may be performed in other orders or concurrently. Further, those skilled in the art will also appreciate that the embodiments described in the specification are exemplary, and the acts involved are not necessarily required by the present application.
In a second aspect, an embodiment of the present application provides a category prediction apparatus, and referring to fig. 4, the apparatus includes:
a search word obtaining module 410, configured to obtain a target search word input by a user;
a semantic feature extraction module 420, configured to extract a semantic feature of the target search term;
an entity extraction module 430, configured to extract a target entity included in the target search term;
a category query module 440, configured to query a predetermined category knowledge base for categories to which the target entity belongs;
a knowledge flag bit feature setting module 450, configured to set a knowledge flag bit feature for each category in the category knowledge base according to the category to which the target entity belongs;
a category determining module 460, configured to determine a target category to which the target search word belongs based on the semantic features, knowledge flag bit features corresponding to each category in the category knowledge base, and category semantic features corresponding to each category in the category knowledge base.
Optionally, the knowledge flag bit feature setting module is specifically configured to:
setting a first knowledge flag bit feature for the category to which the target entity belongs;
and setting a second knowledge flag bit feature for the categories in the category knowledge base other than the category to which the target entity belongs, wherein the second knowledge flag bit feature is different from the first knowledge flag bit feature.
Optionally, the category determining module includes:
the category semantic feature acquisition unit is used for acquiring category semantic features corresponding to various categories in the category knowledge base;
the feature splicing unit is configured to, for any category in the category knowledge base, splice the semantic features, the knowledge flag bit feature corresponding to the category, and the category semantic feature corresponding to the category to obtain spliced features;
the probability value calculation unit is used for inputting the spliced features corresponding to all categories in the category knowledge base into a pre-trained classifier to obtain the probability values corresponding to all categories;
and the category determining unit is used for determining the category with the probability value larger than the preset probability value as the target category corresponding to the target search word.
Optionally, the category semantic feature obtaining unit is specifically configured to:
acquiring first category semantic features of a first dimension corresponding to each category in the category knowledge base;
reducing the dimension of the first category semantic features corresponding to each category to obtain second category semantic features of a second dimension;
and for each category in the category knowledge base, determining the second category semantic features corresponding to the category as the category semantic features corresponding to the category.
Optionally, the semantic feature extraction module is specifically configured to:
and inputting the target search word into a pre-trained semantic feature extraction model to obtain the semantic features of the target search word.
The apparatus embodiment is basically similar to the method embodiment, so it is described relatively briefly; for relevant details, refer to the corresponding description of the method embodiment.
In a third aspect, an embodiment of the present application provides an electronic device. The device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the program, the processor implements the steps of the category prediction method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the category prediction method according to the first aspect.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may refer to one another.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
Finally, it should also be noted that, herein, relational terms such as first and second are used solely to distinguish one entity or action from another. They do not necessarily require or imply any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion. A process, method, article, or terminal that comprises a list of elements therefore includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or terminal that comprises the element.
The category prediction method, apparatus, electronic device, and storage medium provided by the present application have been described in detail above. Specific examples are used herein to explain the principles and embodiments of the present application, and the above examples are only intended to help understand the method and core idea of the present application. Meanwhile, those skilled in the art may, based on the idea of the present application, make changes to the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (12)

1. A method for predicting a category, the method comprising:
acquiring a target search word input by a user;
extracting semantic features of the target search word;
extracting a target entity included in the target search word;
querying a category to which the target entity belongs in a predetermined category knowledge base;
setting a knowledge flag bit feature for each category in the category knowledge base according to the category to which the target entity belongs;
and determining the target category to which the target search word belongs based on the semantic features, the knowledge flag bit features corresponding to each category in the category knowledge base and the category semantic features corresponding to each category in the category knowledge base.
2. The method according to claim 1, wherein the setting of the knowledge flag bit feature for each category in the category knowledge base according to the category to which the target entity belongs comprises:
setting a first knowledge flag bit feature for the category to which the target entity belongs;
and setting a second knowledge flag bit feature for the categories in the category knowledge base other than the category to which the target entity belongs, wherein the second knowledge flag bit feature is different from the first knowledge flag bit feature.
3. The method according to claim 1, wherein the determining the target category to which the target search word belongs based on the semantic features, the knowledge flag bit features corresponding to each category in the category knowledge base, and the category semantic features corresponding to each category in the category knowledge base comprises:
obtaining category semantic features corresponding to each category in the category knowledge base;
for any category in the category knowledge base, splicing the semantic features, the knowledge flag bit features corresponding to the category and the category semantic features corresponding to the category to obtain spliced features;
inputting the spliced features corresponding to each category in the category knowledge base into a pre-trained classifier to obtain a probability value corresponding to each category;
and determining the category with the probability value larger than the preset probability value as a target category corresponding to the target search word.
4. The method according to claim 3, wherein the obtaining of category semantic features corresponding to each category in the category knowledge base comprises:
acquiring first category semantic features of a first dimension corresponding to each category in the category knowledge base;
reducing the dimension of the first category semantic features corresponding to each category to obtain second category semantic features of a second dimension;
and for each category in the category knowledge base, determining the second category semantic features corresponding to the category as the category semantic features corresponding to the category.
5. The method according to any one of claims 1 to 4, wherein the extracting semantic features of the target search word comprises:
and inputting the target search word into a pre-trained semantic feature extraction model to obtain the semantic features of the target search word.
6. An apparatus for predicting a category, the apparatus comprising:
the search word acquisition module is used for acquiring a target search word input by a user;
the semantic feature extraction module is used for extracting semantic features of the target search word;
the entity extraction module is used for extracting a target entity included in the target search word;
the category query module is used for querying the category to which the target entity belongs in a predetermined category knowledge base;
the knowledge flag bit feature setting module is used for setting knowledge flag bit features for each category in the category knowledge base according to the category to which the target entity belongs;
and the category determining module is used for determining the target category to which the target search word belongs based on the semantic features, the knowledge flag bit features corresponding to the categories in the category knowledge base and the category semantic features corresponding to the categories in the category knowledge base.
7. The apparatus of claim 6, wherein the knowledge flag bit feature setting module is specifically configured to:
setting a first knowledge flag bit feature for the category to which the target entity belongs;
and setting a second knowledge flag bit feature for the categories in the category knowledge base other than the category to which the target entity belongs, wherein the second knowledge flag bit feature is different from the first knowledge flag bit feature.
8. The apparatus of claim 6, wherein the category determination module comprises:
the category semantic feature acquisition unit is used for acquiring category semantic features corresponding to various categories in the category knowledge base;
the feature splicing unit is configured to, for any category in the category knowledge base, splice the semantic features, the knowledge flag bit feature corresponding to the category, and the category semantic feature corresponding to the category to obtain spliced features;
the probability value calculation unit is used for inputting the spliced features corresponding to all categories in the category knowledge base into a pre-trained classifier to obtain the probability values corresponding to all categories;
and the category determining unit is used for determining the category with the probability value larger than the preset probability value as the target category corresponding to the target search word.
9. The apparatus according to claim 8, wherein the category semantic feature obtaining unit is specifically configured to:
acquiring first category semantic features of a first dimension corresponding to each category in the category knowledge base;
reducing the dimension of the first category semantic features corresponding to each category to obtain second category semantic features of a second dimension;
and for each category in the category knowledge base, determining the second category semantic features corresponding to the category as the category semantic features corresponding to the category.
10. The apparatus according to any one of claims 6 to 9, wherein the semantic feature extraction module is specifically configured to:
and inputting the target search word into a pre-trained semantic feature extraction model to obtain the semantic features of the target search word.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the category prediction method as claimed in any one of claims 1 to 5 are implemented by the processor when executing the program.
12. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the category prediction method as claimed in any one of the claims 1 to 5.
CN202010988807.6A 2020-09-18 2020-09-18 Category prediction method and device, electronic equipment and storage medium Pending CN112182323A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010988807.6A CN112182323A (en) 2020-09-18 2020-09-18 Category prediction method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010988807.6A CN112182323A (en) 2020-09-18 2020-09-18 Category prediction method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112182323A (en) 2021-01-05

Family

ID=73956486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010988807.6A Pending CN112182323A (en) 2020-09-18 2020-09-18 Category prediction method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112182323A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114139041A (en) * 2022-01-28 2022-03-04 浙江口碑网络技术有限公司 Category relevance prediction network training and category relevance prediction method and device
WO2023178965A1 (en) * 2022-03-25 2023-09-28 平安科技(深圳)有限公司 Intent recognition method and apparatus, and electronic device and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114139041A (en) * 2022-01-28 2022-03-04 浙江口碑网络技术有限公司 Category relevance prediction network training and category relevance prediction method and device
CN114139041B (en) * 2022-01-28 2022-05-13 浙江口碑网络技术有限公司 Category relevance prediction network training and category relevance prediction method and device
WO2023178965A1 (en) * 2022-03-25 2023-09-28 平安科技(深圳)有限公司 Intent recognition method and apparatus, and electronic device and storage medium

Similar Documents

Publication Publication Date Title
CN110909182B (en) Multimedia resource searching method, device, computer equipment and storage medium
US8515212B1 (en) Image relevance model
CN106503014B (en) Real-time information recommendation method, device and system
CN111461841B (en) Article recommendation method, device, server and storage medium
CN110334356B (en) Article quality determining method, article screening method and corresponding device
US9117006B2 (en) Recommending keywords
CN109360057B (en) Information pushing method, device, computer equipment and storage medium
CN107341268B (en) Hot searching ranking method and system
CN106557480B (en) Method and device for realizing query rewriting
CN110175895B (en) Article recommendation method and device
WO2013121181A1 (en) Method of machine learning classes of search queries
CN107153656B (en) Information searching method and device
CN109241451B (en) Content combination recommendation method and device and readable storage medium
WO2022095585A1 (en) Content recommendation method and device
KR20160064448A (en) A recommendation method for items by using preference prediction of their similar group
CN112182323A (en) Category prediction method and device, electronic equipment and storage medium
CN107247728B (en) Text processing method and device and computer storage medium
CN112434072A (en) Searching method, searching device, electronic equipment and storage medium
CN108153735B (en) Method and system for acquiring similar meaning words
CN112085058A (en) Object combination recall method and device, electronic equipment and storage medium
CN113792212A (en) Multimedia resource recommendation method, device, equipment and storage medium
CN107133811A (en) The recognition methods of targeted customer a kind of and device
CN109460474B (en) User preference trend mining method
CN113127720A (en) Hot word searching determination method and device
CN108694171B (en) Information pushing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination