CN115329176A - Search request processing method and device, computer equipment and storage medium

Search request processing method and device, computer equipment and storage medium

Info

Publication number
CN115329176A
Authority
CN
China
Prior art keywords
entity
query text
query
category
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210959045.6A
Other languages
Chinese (zh)
Inventor
朱秀红
曹训
张伟
黄泽谦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202210959045.6A
Publication of CN115329176A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/906 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/90 Details of database functions independent of the retrieved data types
    • G06F 16/95 Retrieval from the web
    • G06F 16/953 Querying, e.g. by the use of web search engines
    • G06F 16/9532 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a search request processing method and apparatus, a computer device, and a storage medium, belonging to the field of computer technology. In the method, entity recognition is performed on an original Query to obtain the entity objects it contains, the Query is rewritten on the basis of these entity objects to obtain entity Queries, category prediction is performed on each rewritten entity Query, and the predicted candidate categories are then deduplicated, so that the target category to which the original Query belongs can be identified quickly and efficiently. Because non-entity characters in the original Query are filtered out before category prediction, and because rewriting on the basis of the entity objects avoids missing entity Queries formed by combining different entity objects, the missed-recall phenomenon for long-tail Queries is greatly alleviated, and both the accuracy and the recall rate of category prediction for long-tail Queries are substantially improved.

Description

Search request processing method and device, computer equipment and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and an apparatus for processing a search request, a computer device, and a storage medium.
Background
In a search engine system, the query text (Query) in a search request is the vehicle through which a user expresses a search intention. In the search intention identification task, the category (Label) to which the query text belongs must first be identified; at present, an HMCN (Hierarchical Multi-Label Classification Networks) model is usually used to predict the specific category to which the query text belongs.
When performing category prediction, the HMCN model depends heavily on supervised training data. For head query texts that users search frequently, training data is relatively plentiful and a good recall rate can be achieved; but for long-tail query texts that users search rarely, training data is scarce, the missed-recall problem is severe, and category prediction accuracy is low.
Disclosure of Invention
The embodiments of the present application provide a search request processing method and apparatus, a computer device, and a storage medium, which can alleviate the missed-recall problem of long-tail Queries and improve their category prediction accuracy. The technical solutions are as follows:
in one aspect, a method for processing a search request is provided, where the method includes:
performing entity identification on a query text carried by a search request to obtain at least one entity object contained in the query text;
based on the at least one entity object, acquiring at least one entity query text corresponding to the query text, wherein the entity query text is formed by combining one or more entity objects;
respectively carrying out category prediction on the at least one entity query text to obtain candidate categories associated with the at least one entity query text;
and carrying out duplicate removal on the candidate categories respectively associated with the at least one entity query text to obtain the target categories associated with the query text.
In one aspect, a method for processing a search request is provided, where the method includes:
inputting a query text of a search request into a shared coding model, extracting global category characteristics of the query text through the shared coding model, wherein the global category characteristics represent global semantics of each character in the query text on a category prediction task, and the shared coding model is used for coding the query text to obtain global category characteristics for category prediction and semantic characteristics for entity recognition;
performing full-connection processing on the global category characteristics to obtain full-connection category characteristics of the query text;
carrying out nonlinear mapping on the fully-connected category characteristics to obtain respective prediction scores of the query text belonging to a plurality of leaf categories;
and determining the leaf category with the largest prediction score as the target category associated with the query text.
In one aspect, an apparatus for processing a search request is provided, the apparatus including:
the entity identification module is used for carrying out entity identification on the query text carried by the search request to obtain at least one entity object contained in the query text;
the obtaining module is used for obtaining at least one entity query text corresponding to the query text based on the at least one entity object, and the entity query text is formed by combining one or more entity objects;
the category prediction module is used for respectively carrying out category prediction on the at least one entity query text to obtain candidate categories associated with the at least one entity query text;
and the duplication removing module is used for carrying out duplication removal on the candidate categories associated with the entity query texts respectively to obtain target categories associated with the query texts.
In one possible embodiment, the entity identification module comprises:
a semantic feature extraction unit, configured to extract features of a plurality of characters contained in the query text to obtain semantic features of the query text;
and the entity identification unit is used for carrying out entity identification on the query text based on the semantic features of the query text to obtain the at least one entity object.
In one possible implementation, the semantic feature extraction unit is configured to:
performing word segmentation processing on the query text to obtain a plurality of characters contained in the query text;
extracting the characteristics of the characters to obtain the character characteristics of the query text;
inputting the character features of the query text into a plurality of first coding layers of an entity recognition model, coding the character features of the query text through the plurality of first coding layers, and outputting the semantic features of the query text, wherein the entity recognition model is used for carrying out entity recognition on the query text.
In one possible embodiment, the entity identification unit comprises:
the full-connection subunit is used for inputting the semantic features of the query text into a first full-connection layer of the entity identification model, and performing full-connection processing on the semantic features of the query text through the first full-connection layer to obtain the full-connection semantic features of the query text;
the predicting subunit is used for inputting the fully connected semantic features into a conditional random field CRF layer of the entity recognition model, and obtaining entity boundary position labels of a plurality of characters in the query text through prediction of the CRF layer;
and the dividing subunit is used for dividing the plurality of characters to obtain the at least one entity object based on the entity boundary position labels of the plurality of characters.
In one possible embodiment, the predictor is configured to:
obtaining a plurality of candidate paths formed by a plurality of candidate boundary position labels corresponding to the characters;
respectively scoring the multiple candidate paths through the CRF layer to obtain respective path scores of the multiple candidate paths, wherein the path scores represent the possibility that candidate boundary position labels contained in corresponding candidate paths belong to the entity boundary position labels;
and determining a plurality of candidate boundary position labels contained in the candidate path with the highest path score as entity boundary position labels of the plurality of characters.
In one possible implementation, the partitioning subunit is configured to:
determining a start character and an end character of each of the at least one entity object based on an entity boundary position tag of each of the plurality of characters;
and dividing the at least one entity object from the plurality of characters based on the respective starting character and ending character of the at least one entity object.
In one possible embodiment, the category prediction module comprises:
the character feature extraction unit is used for extracting the character features of any entity query text in the at least one entity query text;
the global feature extraction unit is used for acquiring global entity semantic features of the entity query text based on the character features of the entity query text, wherein the global entity semantic features represent global semantics of all characters in the entity query text on a category prediction task;
and the prediction unit is used for predicting to obtain a candidate category associated with the entity query text based on the global entity semantic features.
In one possible implementation, the global feature extraction unit is configured to:
inputting character features of the target classifier and character features of the entity query text into a plurality of second coding layers of a category prediction model, wherein the category prediction model is used for predicting candidate categories associated with the entity query text;
and coding the character features of the target classifier and the character features of the entity query text through the plurality of second coding layers, and outputting the global entity semantic features corresponding to the target classifier.
In one possible implementation, the prediction unit is configured to:
inputting the global entity semantic features into a plurality of second fully-connected layers of the category prediction model, and performing fully-connected processing on the global entity semantic features through the plurality of second fully-connected layers to obtain fully-connected entity semantic features of the entity query text;
carrying out nonlinear mapping on the semantic features of the fully-connected entity to obtain the respective prediction scores of the entity query text belonging to a plurality of leaf categories;
and determining the leaf category with the largest prediction score as the candidate category associated with the entity query text.
In one possible embodiment, the apparatus further comprises:
and the query module is used for querying one or more hierarchy categories associated with the target category from a preset category table, wherein the hierarchy categories refer to upper-level categories or lower-level categories which respectively have association relations with the target category under different hierarchies.
In one aspect, an apparatus for processing a search request is provided, the apparatus including:
the feature extraction module is used for inputting a query text of a search request into a shared coding model, extracting global category features of the query text through the shared coding model, wherein the global category features represent global semantics of characters in the query text on a category prediction task, and the shared coding model is used for coding the query text to obtain global category features for category prediction and semantic features for entity recognition;
the full-connection module is used for performing full-connection processing on the global category characteristics to obtain full-connection category characteristics of the query text;
the mapping module is used for carrying out nonlinear mapping on the fully-connected category characteristics to obtain respective prediction scores of the query text belonging to a plurality of leaf categories;
and the determining module is used for determining the leaf category with the maximum prediction score as the target category associated with the query text.
In one possible implementation, the feature extraction module is configured to:
performing word segmentation processing on the query text to obtain a plurality of characters contained in the query text;
extracting the characteristics of the characters to obtain the character characteristics of the query text;
inputting the character features of the classification indicator and the character features of the query text into a plurality of third coding layers of the shared coding model, coding the character features of the classification indicator and the character features of the query text through the plurality of third coding layers, and outputting the global class features corresponding to the classification indicator.
In one possible embodiment, the shared coding model further outputs semantic features of the query text, and the apparatus further comprises:
and the entity identification module is used for carrying out entity identification on the query text based on the semantic features of the query text to obtain at least one entity object contained in the query text.
In one possible embodiment, the entity identification module is configured to:
performing full-connection processing on the semantic features of the query text to obtain full-connection identification features of the query text;
based on the full-connection identification features, entity boundary position labels of a plurality of characters in the query text are obtained through prediction;
and dividing the at least one entity object from the plurality of characters based on the entity boundary position labels of the plurality of characters.
In one possible embodiment, the apparatus further comprises:
and the query module is used for querying one or more hierarchy categories associated with the target category from a preset category table, wherein the hierarchy categories refer to upper-level categories or lower-level categories which respectively have association relations with the target category at different hierarchies.
In one aspect, a computer device is provided, the computer device comprising one or more processors and one or more memories, the one or more memories storing therein at least one computer program that is loaded by the one or more processors and executed to implement a method of processing a search request as described in any of the possible implementations.
In one aspect, a storage medium is provided, in which at least one computer program is stored, the at least one computer program being loaded and executed by a processor to implement the processing method of a search request according to any one of the above possible implementations.
In one aspect, a computer program product is provided that includes one or more computer programs stored in a computer readable storage medium. One or more processors of the computer device can read the one or more computer programs from the computer-readable storage medium, and the one or more processors execute the one or more computer programs, so that the computer device can execute the processing method of the search request of any one of the above-mentioned possible embodiments.
The beneficial effects brought by the technical solutions provided by the embodiments of the present application include at least the following:
the method comprises the steps of firstly carrying out entity identification on an original Query to identify entity objects in the original Query, then rewriting the original Query on the basis of the entity objects to obtain one or more entity queries, and eliminating some non-entity characters in the original Query, so that only class prediction needs to be carried out on each entity Query obtained through rewriting, and then carrying out deduplication on each candidate class obtained through the class prediction, and the target class to which the original Query belongs can be quickly and efficiently identified.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic diagram of an implementation environment of a method for processing a search request according to an embodiment of the present application;
fig. 2 is a flowchart of a method for processing a search request according to an embodiment of the present application;
fig. 3 is a flowchart of a method for processing a search request according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an entity recognition BERT model provided by an embodiment of the present application;
FIG. 5 is a flow chart of a method for predicting a boundary position tag of an entity according to an embodiment of the present disclosure;
FIG. 6 is a schematic flow chart of Query rewrite provided by an embodiment of the present application;
FIG. 7 is a flowchart of class prediction of an entity Query according to an embodiment of the present disclosure;
FIG. 8 is a schematic diagram of a class prediction BERT model provided by an embodiment of the present application;
fig. 9 is a schematic flowchart of a processing method of a search request according to an embodiment of the present application;
fig. 10 is a flowchart of a method for processing a search request according to an embodiment of the present application;
fig. 11 is a schematic diagram of a shared BERT coding model provided in an embodiment of the present application;
fig. 12 is a schematic structural diagram of a search request processing apparatus according to an embodiment of the present application;
fig. 13 is a schematic structural diagram of a search request processing apparatus according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the present application more clear, the following detailed description of the embodiments of the present application will be made with reference to the accompanying drawings.
In this application, the terms "first", "second", and the like are used to distinguish between identical or similar items whose functions and effects are substantially the same. It should be understood that "first", "second", and "nth" have no logical or temporal dependency and do not limit the number or execution order.
The term "at least one" in this application means one or more, and "a plurality" means two or more; for example, a plurality of first positions means two or more first positions.
In the present application, the term "comprising at least one of A or B" covers the following cases: comprising only A, comprising only B, and comprising both A and B.
When the methods of the embodiments of this application are applied to a specific product or technology, the user-related information (including but not limited to device information, personal information, behavioral information, etc.), data (including but not limited to data for analysis, stored data, displayed data, etc.), and signals referred to herein are obtained with the approval, authorization, or full consent of the respective parties, and the collection, use, and processing of such information, data, and signals must comply with the relevant laws and regulations of the relevant countries and regions. For example, the query text referred to in this application is obtained with sufficient authorization.
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes audio processing, computer vision, natural language processing, and machine learning/deep learning.
Enabling computers to listen, see, speak, and feel is a development direction of future human-computer interaction, and Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies the theories and methods that enable effective communication between humans and computers in natural language. Natural language processing is a science that integrates linguistics, computer science, and mathematics; research in this field involves natural language, i.e., the language people use every day, so it is closely related to the study of linguistics. Natural language processing techniques typically include entity recognition, machine translation, text preprocessing, semantic understanding, question answering, knowledge graphs, and the like.
In a search engine system, the query text (Query) in a search request is the carrier through which a user expresses a search intention, so machine semantic understanding of the query text is an important link in connecting the user with the search engine system. Usually, the QP (Query Processor) module of a search engine system is responsible for identifying the search intention of a query text, and in the intention identification task the category (Label) to which the query text belongs must be identified first; in other words, category prediction for the query text is an important link in identifying the user's search intention. Only by accurately classifying the query text can the user's search intention be accurately judged, which assists the relevance calculation between the query text and recalled items (Item) in the downstream coarse-ranking or fine-ranking stages for recalled resources, and can also be applied to navigation, related search, and other auxiliary functions of search products.
Generally, in order to better establish the category to which a query text belongs, technicians pre-classify query texts to obtain a preset category table, which records the categories pre-divided at different levels, so that when the category of a query text is predicted, a multi-level, multi-label classification result is predicted. For example, with four levels, the classification result takes the format "level-1 label _ level-2 label _ level-3 label _ level-4 label"; "makeup & skin care _ facial wash _ facial essence _ essence/muscle-base lotion" is an exemplary multi-level, multi-label classification result. A minimal sketch of such a category table is given below.
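The following is a minimal sketch of such a preset category table; the dictionary layout, the helper name, and the category names are illustrative assumptions, since the patent does not prescribe a concrete data format:

```python
# A minimal sketch of a preset category table with four levels.
# The table layout and category names here are illustrative assumptions;
# the patent does not prescribe a concrete data structure.
CATEGORY_TABLE = {
    # leaf (level-4) category -> full hierarchical path, level 1 .. level 4
    "essence/muscle-base lotion": [
        "makeup & skin care", "facial wash", "facial essence",
        "essence/muscle-base lotion",
    ],
    "eye cream": [
        "makeup & skin care", "facial wash", "eye care", "eye cream",
    ],
}

def full_label_path(leaf_category: str) -> str:
    """Join the per-level labels into the 'L1_L2_L3_L4' result format."""
    return "_".join(CATEGORY_TABLE[leaf_category])

print(full_label_path("essence/muscle-base lotion"))
# -> makeup & skin care_facial wash_facial essence_essence/muscle-base lotion
```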
Currently, when a search engine system performs multi-level category prediction on a query text, the query text is usually input into an HMCN (Hierarchical Multi-Label Classification Networks) model. The HMCN model sets an independent classifier for each level; the classifier of each level predicts the specific category of the query text at that level, and the output categories of all classifiers are integrated to obtain the multi-level, multi-label classification result.
The HMCN model is a machine learning model trained with supervised training data, which requires annotating the category that each sample Query belongs to at every level. This causes the following two problems: (1) since each level of the preset category table contains numerous categories (for example, level 1 includes 33 categories, level 2 includes 277 categories, level 3 includes 2453 categories, and level 4 includes 3386 categories), manually labeling sample Queries across so many levels and categories incurs high labor cost and low labeling efficiency; (2) the supervised sample Queries obtained with limited labeling labor can hardly cover the complete Query distribution of an actual search scenario. As a result, for head Queries with high search frequency, the HMCN model can generally predict precise categories because training data is sufficient, but for long-tail Queries with low search frequency, training data is very scarce and the missed-recall problem is severe.
Taking a commodity search scenario as an example, Table 1 shows the category prediction results of the HMCN model for the following head Query and long-tail Queries:
TABLE 1

| Original Query | Query type | Recognition result |
| --- | --- | --- |
| Brand A product 1 | Head Query | √ |
| Method for using brand A product 1 | Long-tail Query | × |
| What is the main efficacy of brand A product 1 | Long-tail Query | × |
| Action and efficacy of brand A product 1 | Long-tail Query | × |
| What age brand A product 1 is suitable for | Long-tail Query | × |
| Brand A product 1 duty-free shop price | Long-tail Query | × |
| Whether pregnant women can use brand A product 1 | Long-tail Query | × |
| Whether dry skin can use brand A product 1 | Long-tail Query | × |
| Why brand A product 1 causes allergy | Long-tail Query | × |
Here, the missed-recall problem means that the category to which a long-tail Query belongs cannot be identified, so resources of the corresponding category cannot be accurately recalled.
As shown in Table 1 above, in a commodity search scenario the head Query is generally distributed in a single, regular manner, directly following the "brand name + product name" format, so the recall rate of the HMCN model for head Queries is relatively high. Long-tail Queries, however, are diverse in distribution and varied in features, so the HMCN model suffers a severe missed-recall problem on them.
In view of this, an embodiment of the present application provides a search request processing method, a scheme for predicting the hierarchical categories of commodity short texts based on entity recognition and Query rewriting, so as to solve the technical problem of severe missed recall of long-tail Queries in commodity search scenarios.
Hereinafter, terms related to the embodiments of the present application will be explained.
Query: query text, usually a short text, refers to search text (which may be a search word, or a search sentence, or a search phrase, etc.) provided by a user in a search scenario.
Query category prediction: classifying the Query input by the user and judging the categories it belongs to at multiple levels; in a commodity search scenario in particular, locating the commodity category intention of the Query. This can assist the relevance calculation between Query and Item in the coarse-ranking and fine-ranking stages of recalled resources, or be applied to navigation, related search, and other scenarios in the auxiliary functions of search products.
Query entity recognition: performing entity recognition (NER, Named Entity Recognition, also called entity chunking or entity extraction) on the Query, i.e., identifying the entity objects with specific meanings contained in the Query by sequence labeling the character sequence of the Query. For example, in a commodity search scenario, the entity objects may be attribute information such as brand names, product names, and category names, which helps match the user's search intention more finely.
Generally, given a Query, entity recognition involves identifying entity boundaries and entity types: the entity boundary indicates from which character an entity object starts and at which character it ends, and the entity type indicates which type of attribute the entity object belongs to.
In the embodiments of the present application, indicators such as B (Begin, start character), I (Inside, middle character), E (End, end character), and O (Outside, non-entity character) are used to label entity boundaries, and labels such as "brand word" and "product word" are used to label entity types.
Under the above labeling scheme for entity boundaries and entity types, assuming the Query is "XY product name using method", the recognition result for the entity objects should be: "XY" is a brand word, "product name" is a product word, and "using method" is a non-entity. Query entity recognition can thus be regarded as a sequence labeling task over the Query character sequence, and the sequence labeling result is shown in Table 2.
TABLE 2
| Character | Entity boundary label |
| --- | --- |
| X | B-brand word |
| Y | E-brand word |
| 产 | B-product word |
| 品 | I-product word |
| 名 | E-product word |
| 使 | O |
| 用 | O |
| 方 | O |
| 法 | O |
Query rewriting: performing normalization, error correction, expansion, word dropping, and similar processing on the Query to achieve better commodity recall and matching. In the embodiments of the present application, Query rewriting is performed using the recognized entity objects.
Query hierarchical category prediction: given the preset category table of Queries, take a Query as input and output the category the Query belongs to at each of the multiple levels. In a commodity search scenario this is, in particular, commodity short-text hierarchical category prediction, and the preset category table of Queries is also called the hierarchical category tree of commodities: given a commodity Query, one or more complete hierarchical paths of the category tree are output, where one or more categories correspond to the multiple labels and the complete hierarchy corresponds to the multiple levels. For example, if the input commodity Query is "XX product name using method", the predicted category is "makeup & skin care _ facial wash _ facial essence _ essence/muscle-base lotion".
Transformer: a neural network structure based on the self-attention mechanism, widely applied in natural language processing, speech technology, computer vision, and other fields.
BERT (Bidirectional Encoder Representations from Transformers): a deep bidirectional encoding representation model based on the Transformer structure; it essentially uses the Transformer structure to construct a multi-layer bidirectional encoding network. The coding layers involved in the BERT model are bidirectional coding layers, each of which performs forward encoding and backward encoding on the input signal. Each bidirectional coding layer comprises two parts: an attention network and a forward fully connected layer. Each hidden layer in the attention network is obtained by a weighted average of the previous hidden layer, so every hidden layer can be directly associated with all hidden layers of the previous layer, and a hidden-layer vector representing global information can be obtained from the input long-sequence information; the forward fully connected layer further processes the global information acquired by the attention network to enhance the learning capability of the whole BERT model. A rough structural sketch of such a bidirectional coding stack follows.
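As a rough structural sketch of such a coding stack, the following uses standard PyTorch modules with illustrative hyperparameters (hidden size, head count, and layer count are assumptions; this is not the patent's implementation):

```python
import torch
import torch.nn as nn

# A rough structural sketch of the stacked bidirectional coding layers
# described above: each layer is self-attention followed by a forward
# fully connected block. All hyperparameters below are illustrative.
layer = nn.TransformerEncoderLayer(
    d_model=768,          # hidden size of each character feature
    nhead=12,             # number of attention heads
    dim_feedforward=3072, # width of the forward fully connected block
    batch_first=True,
)
encoder = nn.TransformerEncoder(layer, num_layers=12)  # stacked coding layers

x = torch.randn(1, 16, 768)  # (batch, sequence length, hidden size)
h = encoder(x)               # hidden vectors carrying global information
print(h.shape)               # torch.Size([1, 16, 768])
```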
Hereinafter, a system architecture according to an embodiment of the present application will be described.
Fig. 1 is a schematic implementation environment of a method for processing a search request according to an embodiment of the present application.
Referring to fig. 1, the implementation environment involves a terminal 101 and a server 102, and the following details are provided:
the terminal 101 is any computer device capable of supporting a search engine, and the terminal 101 is installed and operated with an application program supporting the search engine, and optionally, the application program may be: the application program comprises a browser application, a social application, a content sharing application, an audio and video application, a short video application, a take-out application, a shopping application and the like, and the type of the application program is not specifically limited in the embodiment of the application program.
In some embodiments, a user inputs a query text in an application program supporting a search engine, and in response to a search operation performed by the user, the terminal 101 is triggered to generate a search request carrying the query text, and the terminal 101 sends the search request carrying the query text to the server 102.
The terminal 101 and the server 102 may be directly or indirectly connected through wired or wireless communication, and the present application is not limited thereto.
The server 102 is a server, a plurality of servers, a cloud computing platform, a virtualization center, or the like that can provide a search engine service. The server 102 is used for providing background services for the application programs supporting the search engine on the terminal 101. Optionally, in the resource search process based on the query text, the server 102 undertakes primary computing work, and the terminal 101 undertakes secondary computing work; or, the server 102 undertakes the secondary computing work, and the terminal 101 undertakes the primary computing work; or, the terminal 101 and the server 102 perform cooperative computing by using a distributed computing architecture.
In some embodiments, the server 102 is an independent physical server, or a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, web services, cloud communication, middleware services, domain name services, security services, a CDN (Content Delivery Network), and big data and artificial intelligence platforms.
In some embodiments, the server 102 receives a search request sent by the terminal 101 and parses it to obtain the query text. Then, using the method provided in the embodiments of the present application, the server performs category prediction on the query text to obtain the target category to which it belongs, queries the preset category table for the other hierarchical categories associated with the target category, and integrates the target category and the queried hierarchical categories into a multi-level, multi-label classification result for the query text. This result can be fed into subsequent resource recall and ranking tasks, so that more accurate multimedia resources are recalled for the search request and the recalled multimedia resources are ranked more accurately. Optionally, the identified multi-level, multi-label classification result can also be fed into various resource recommendation tasks such as advertisement recommendation, commodity recommendation, information recommendation, and application recommendation, to improve recommendation accuracy.
In some embodiments, the device types of the terminal 101 include at least one of a smartphone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, a smart watch, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, or an e-book reader, but are not limited thereto.
Those skilled in the art will appreciate that terminal 101 may refer broadly to one of a plurality of terminals, and that the number of terminals 101 may be greater or fewer. For example, the number of the terminals 101 may be only one, or the number of the terminals 101 may be several tens or hundreds, or more. The number and the device type of the terminals 101 are not limited in the embodiment of the present application.
Hereinafter, a process flow of the search request provided in the embodiment of the present application will be briefly described.
Fig. 2 is a flowchart of a method for processing a search request according to an embodiment of the present application. Referring to fig. 2, the embodiment is executed by a computer device, and is described by taking the computer device as a server as an example, and the embodiment includes the following steps:
201. and the server performs entity identification on the query text carried by the search request to obtain at least one entity object contained in the query text.
In some embodiments, the server receives a data request sent by a terminal and parses the header field of the data request. When the header field carries a search request identifier, the data request is determined to be a search request, and the data field of the search request is then parsed to obtain the query text carried in it. The query text is the Query provided by the user, and generally the Query is a short text.
In the above process, the user starts an application program supporting a search engine on the terminal, inputs a query text in the search input box of the search engine, and then clicks the search button, triggering the terminal to send a search request carrying the query text to the server. Optionally, the query text can be input by the user by voice or by text, or by one-click selection of a shortcut search word, a history search word, an associated search word, or the like; the input mode of the query text is not specifically limited in the embodiments of the present application.
In some embodiments, after parsing the search request to obtain the query text, the server performs entity recognition on the query text. For example, the server performs word segmentation on the query text to obtain a character sequence formed by its characters, then performs sequence labeling on the character sequence to identify the entity boundary and entity type of each character, and finally, according to the sequence labeling result, segments at least one entity object contained in the query text out of the character sequence.
Schematically, word segmentation is performed on the query text "XY product name using method" to obtain its character sequence {"X", "Y", "产", "品", "名", "使", "用", "方", "法"}, and sequence labeling is performed on this character sequence to obtain the result shown in Table 2 above. Since the labeling results of the "using method" characters are all O, i.e., non-entity, "using method" is not an entity object in the query text. For "XY product name": the labeling result of "X" is "B-brand word", meaning the character "X" is recognized as the start character of an entity of type "brand word", and the labeling result of "Y" is "E-brand word", meaning "Y" is recognized as the end character of that brand word, so the first entity object "XY" can be segmented out of the query text, with entity type "brand word". Similarly, the first character of "product name" is labeled "B-product word" (the start character of a product word), the middle character is labeled "I-product word" (a middle character of the product word), and the last character is labeled "E-product word" (the end character of the product word), so the second entity object "product name" can be segmented out, with entity type "product word". Finally, the 2 entity objects "XY" and "product name" contained in the query text are obtained. A minimal sketch of this label-decoding step follows.
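The segmentation step, turning per-character boundary labels into entity objects, can be sketched as follows; the tag-string format and the use of the example Query's Chinese characters are assumptions for illustration:

```python
# A minimal sketch of segmenting entity objects out of a character sequence
# from its B/I/E/O boundary labels, as in the "XY product name" example.
# The tag-string format ("B-brand word" etc.) is an illustrative assumption.
def decode_entities(chars, tags):
    entities, start, etype = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):             # start character of an entity
            start, etype = i, tag[2:]
        elif tag.startswith("E-") and start is not None:
            # end character: close the entity opened by the last "B-" tag
            entities.append(("".join(chars[start:i + 1]), etype))
            start, etype = None, None
        elif tag == "O":                     # non-entity character
            start, etype = None, None
    return entities

chars = ["X", "Y", "产", "品", "名", "使", "用", "方", "法"]
tags = ["B-brand word", "E-brand word",
        "B-product word", "I-product word", "E-product word",
        "O", "O", "O", "O"]
print(decode_entities(chars, tags))
# -> [('XY', 'brand word'), ('产品名', 'product word')]
```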
202. And the server acquires at least one entity query text corresponding to the query text based on the at least one entity object, wherein the entity query text is formed by combining one or more entity objects.
In some embodiments, Query rewriting can be performed based on the entity objects recognized in step 201, so that irrelevant non-entity characters are cut from an original query text whose form is complex and diverse and whose features are difficult to align, and the query text is rewritten into regular, easily classified entity Queries, i.e., entity query texts.
In some embodiments, in the process of rewriting the original Query (the query text) on the basis of at least one entity object to obtain at least one entity Query (entity query text), the server may determine the entity type of each entity object and then enumerate the combinations that entity objects of different entity types can form, where each combination constitutes one entity Query. For example, assuming 2 entity objects of entity type "brand word" and 3 entity objects of entity type "product word" are recognized, each brand word is combined with each of the 3 different product words; the 2 brand words and 3 product words form 6 combinations in total, that is, 6 entity Queries can be obtained from these 5 entity objects (the 2 brand words and the 3 product words).
In one example, the original Query is "XY product name using method"; through step 201, 2 entity objects in total are recognized, 1 brand word "XY" and 1 product word "product name". These 2 entity objects of different entity types can only form one combination, namely "XY product name", so one entity Query "XY product name" corresponding to the original Query is obtained from the 2 entity objects "XY" and "product name".
In another example, the original Query is "how XY product 1 and product 2 are distinguished", and 3 entity objects are recognized through step 201: the brand word "XY" and the 2 product words "product 1" and "product 2". Combining the brand word "XY" with the product word "product 1" forms one entity Query "XY product 1", and combining the brand word "XY" with the product word "product 2" forms another entity Query "XY product 2". Finally, the two entity Queries "XY product 1" and "XY product 2" corresponding to the original Query are obtained. A minimal sketch of this rewriting step follows.
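A minimal sketch of the rewriting step, assuming entities are grouped by type and combined brand × product (the function name, the space-joining convention, and the single-type fallback are illustrative assumptions):

```python
from itertools import product as cartesian_product

# A minimal sketch of rewriting an original Query into entity Queries by
# enumerating combinations of entity objects of different entity types,
# as in the "XY product 1 / product 2" example. Names are illustrative.
def rewrite_query(entities):
    brands = [text for text, etype in entities if etype == "brand word"]
    products = [text for text, etype in entities if etype == "product word"]
    if brands and products:
        # one entity Query per (brand word, product word) combination
        return [f"{b} {p}" for b, p in cartesian_product(brands, products)]
    # fallback (an assumption): with a single entity type, each entity
    # object alone constitutes an entity Query
    return [text for text, _ in entities]

entities = [("XY", "brand word"),
            ("product 1", "product word"),
            ("product 2", "product word")]
print(rewrite_query(entities))   # -> ['XY product 1', 'XY product 2']
```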
203. And the server performs category prediction on the at least one entity query text respectively to obtain candidate categories associated with the at least one entity query text respectively.
In some embodiments, the server performs category prediction based on at least one entity Query obtained in step 202, for example, performs category prediction separately on each entity Query to obtain a candidate category associated with each entity Query.
In some embodiments, when performing category prediction on each entity Query, the global entity semantic feature of the entity Query may be extracted first, and full-connection processing is performed on it to extract the fully connected entity semantic feature of the entity Query; nonlinear mapping is then performed on the fully connected entity semantic feature to obtain the prediction score of the entity Query for each leaf category, and the leaf category with the largest prediction score is determined as the candidate category associated with the current entity Query. The detailed category prediction method will be described in the next embodiment and is not repeated here.
It should be noted that a leaf category refers to the finest-grained category label in the preset category table. For example, with 4 levels, the complete label is "level-1 label _ level-2 label _ level-3 label _ level-4 label", and the "level-4 label" is the finest-grained leaf category; in the multi-level, multi-label classification result "makeup & skin care _ facial wash _ facial essence _ essence/muscle-base lotion", "essence/muscle-base lotion" is the finest-grained leaf category. A minimal sketch of the prediction head described above follows.
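The per-entity-Query prediction, fully connected layers over a global semantic feature, a nonlinear mapping to per-leaf-category scores, then an argmax, can be sketched as follows (the layer sizes, two-layer depth, and the choice of sigmoid are assumptions; the patent only specifies "nonlinear mapping"):

```python
import torch
import torch.nn as nn

# A sketch of the category prediction head: fully connected processing of
# the global entity semantic feature, a nonlinear mapping to prediction
# scores over all leaf categories, then picking the highest-scoring leaf.
NUM_LEAF_CATEGORIES = 3386   # e.g., the number of level-4 labels cited above

head = nn.Sequential(
    nn.Linear(768, 768),                  # fully connected processing
    nn.ReLU(),
    nn.Linear(768, NUM_LEAF_CATEGORIES),  # one score per leaf category
)

global_feature = torch.randn(1, 768)          # global entity semantic feature
scores = torch.sigmoid(head(global_feature))  # nonlinear mapping to scores
candidate = scores.argmax(dim=-1)             # leaf category with largest score
```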
204. And the server performs deduplication on the candidate categories associated with the at least one entity query text respectively to obtain the target categories associated with the query text.
In some embodiments, the server predicts one candidate category for each entity Query in step 203, but the candidate categories of different entity Queries may repeat. Therefore, in step 204, all candidate categories obtained by category prediction over all entity Queries in step 203 need to be deduplicated; after the candidate categories of all entity Queries are deduplicated, the target category to which the original Query belongs is obtained.
In one example, the original Query is "how XY product 1 and product 2 are distinguished", and after entity recognition it is rewritten into the 2 entity Queries "XY product 1" and "XY product 2". The category prediction result for "XY product 1" may be the same as that for "XY product 2", for example both predicted candidate categories are "essence/muscle-base lotion"; deduplicating the candidate categories of the two entity Queries then yields the target category "essence/muscle-base lotion" to which the original Query belongs.
In another example, the original Query is again "how XY product 1 and product 2 are distinguished" and is rewritten after entity recognition into the 2 entity Queries "XY product 1" and "XY product 2". Assuming the category prediction result for "XY product 1" differs from that for "XY product 2", for example "XY product 1" belongs to "essence/muscle-base lotion" while "XY product 2" belongs to "eye cream", no candidate category is removed during deduplication, so the target categories to which the original Query belongs are "essence/muscle-base lotion" and "eye cream". The deduplication logic is sketched below.
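The deduplication in step 204 reduces to keeping the first occurrence of each candidate category; a minimal sketch using the two examples above (the function name is illustrative):

```python
# A minimal sketch of step 204: deduplicating the candidate categories
# predicted for each entity Query to obtain the target categories.
def deduplicate(candidates: list[str]) -> list[str]:
    seen, targets = set(), []
    for category in candidates:
        if category not in seen:   # keep the first occurrence only
            seen.add(category)
            targets.append(category)
    return targets

# First example above: both entity Queries predict the same category.
print(deduplicate(["essence/muscle-base lotion", "essence/muscle-base lotion"]))
# -> ['essence/muscle-base lotion']

# Second example: the predictions differ, so both target categories remain.
print(deduplicate(["essence/muscle-base lotion", "eye cream"]))
# -> ['essence/muscle-base lotion', 'eye cream']
```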
All the above optional technical solutions can be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
According to the method provided by the embodiments of the present application, entity recognition is performed on the original Query to identify the entity objects it contains, and the original Query is rewritten on the basis of these entity objects into one or more entity Queries, removing the non-entity characters in the original Query. Category prediction therefore only needs to be performed on each rewritten entity Query, and after deduplicating the resulting candidate categories, the target category to which the original Query belongs can be identified quickly and efficiently.
The above embodiment briefly described the server's processing flow for a search request; this embodiment of the present application describes in detail the implementation of each step in that flow, as follows.
Fig. 3 is a flowchart of a method for processing a search request according to an embodiment of the present application. Referring to fig. 3, the embodiment is executed by a computer device, and is described by taking the computer device as a server as an example, and the embodiment includes the following steps:
301. and the server receives the search request and analyzes the search request to obtain the query text.
In the above process, the user starts an application program supporting a search engine on the terminal, inputs a query text in the search input box of the search engine, and then clicks the search button, triggering the terminal to send a search request carrying the query text to the server. Optionally, the query text can be input by the user by voice or by text, or by one-click selection of a shortcut search word, a history search word, an associated search word, or the like; the input mode of the query text is not specifically limited in the embodiments of the present application.
In some embodiments, the server receives a data request sent by a terminal and parses the header field of the data request. When the header field carries a search request identifier, the data request is determined to be a search request, and the data field of the search request is then parsed to obtain the query text carried in it. The query text is the Query provided by the user, and generally the Query is a short text.
302. And the server carries out word segmentation processing on the query text to obtain a plurality of characters contained in the query text.
In some embodiments, after parsing the query text from the search request, the server performs word segmentation on it, that is, splits the original short query text by character to obtain a character sequence formed by all the characters it contains. Optionally, the server uses a tokenization (Token) tool to disassemble the query text into a character sequence containing all characters in the query text. The characters involved in the embodiments of the present application include but are not limited to: characters (Chinese or English words), digits, special symbols, punctuation marks, and the like.
In one example, word segmentation is performed on the query text "XY product name using method" to obtain the character sequence {"X", "Y", "产", "品", "名", "使", "用", "方", "法"}. A minimal sketch of this character-level segmentation follows.
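Character-level word segmentation of a short Query is straightforward; a minimal sketch, assuming the example Query in its original Chinese form:

```python
# A minimal sketch of character-level word segmentation: the short Query
# is split into its individual characters (words, digits, symbols, etc.).
def split_into_characters(query: str) -> list[str]:
    return list(query)

print(split_into_characters("XY产品名使用方法"))
# -> ['X', 'Y', '产', '品', '名', '使', '用', '方', '法']
```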
303. And the server extracts the characteristics of the characters to obtain the character characteristics of the query text.
In some embodiments, the server performs Embedding (Embedding) processing on a plurality of characters in the character sequence, and maps each character into an Embedding vector of an Embedding space, where the Embedding vector of each character is a character feature of the current character.
In other embodiments, the server performs One-hot (One-hot) encoding on a plurality of characters in the character sequence, and maps each character into One-hot vector, wherein the One-hot vector of each character is the character feature of the current character.
In some embodiments, a character table (or word table) is created in advance in the server, a unique character ID (Identification) is created for each character (e.g., each word) in the character table, the character features of each character are extracted in advance, and the character ID and character features of each character in the character table are stored in a feature library in an associated manner. After the characters contained in the query text are obtained by the word segmentation in step 302, the character features stored in association with each character ID can be queried from the feature library according to the character ID of each character. The character features may be the Embedding vector or the One-hot vector of the character, which is not specifically limited in the embodiments of the present application.
In an example, for the query text "XY product name usage method", word segmentation is performed to obtain the character sequence { "X", "Y", "produce", "article", "name", "make", "use", "square", "method" }; then, each character in the character sequence is looked up in the character table to obtain the character ID corresponding to each character, and the character feature corresponding to each character is found from the feature library according to the character ID. For example, the character features of the above 9 characters are respectively: T1, T2, T3, T4, T5, T6, T7, T8, T9.
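The character table and feature library lookup of step 303 can be sketched as follows; the dictionaries and the random stand-in vectors are hypothetical, serving only to show the character-to-ID-to-feature lookup order:

```python
import numpy as np

# Hypothetical character table (character -> character ID) and feature
# library (character ID -> pre-extracted feature vector); the vectors are
# random stand-ins for real Embedding or One-hot features.
chars = ["X", "Y", "produce", "article", "name", "make", "use", "square", "method"]
char_table = {ch: i for i, ch in enumerate(chars)}
feature_library = {i: np.random.rand(768) for i in char_table.values()}

def character_features(sequence):
    # look up the character ID first, then the stored feature for that ID
    return [feature_library[char_table[ch]] for ch in sequence]

T1_to_T9 = character_features(chars)  # one 768-dim feature per character
```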
304. The server inputs the character features of the query text into a plurality of first coding layers of the entity recognition model, codes the character features of the query text through the plurality of first coding layers and outputs the semantic features of the query text.
The entity identification model is used for performing entity identification on the Query text, an input signal of the entity identification model is character features of the Query text (such as a short text Query), and after the processing of the entity identification model, an entity labeling result for each character in the Query text is output, wherein the entity labeling result is used for indicating an entity boundary and/or an entity type of the corresponding character.
In some embodiments, the entity recognition model includes a plurality of first coding layers, a first fully-connected layer, and a CRF (Conditional Random Field) layer, where the first coding layers are configured to encode an input signal to extract corresponding semantic features, the first fully-connected layer is configured to perform full-connection processing on an input signal to extract corresponding fully-connected features, and the CRF layer is configured to score each possible entity labeling result path formed over the characters so as to predict the candidate path with the highest score.
In some embodiments, the server inputs the character features of the query text into a plurality of first coding layers of the entity recognition model, and the character features of the query text are encoded through the plurality of first coding layers, and the plurality of first coding layers are cascaded, that is, an input signal of each first coding layer is an output signal of a previous first coding layer, and an output signal of each first coding layer is input into a next first coding layer, so that after encoding through the plurality of first coding layers, the semantic features of the query text are output by a last first coding layer.
In some embodiments, the BERT model is taken as an example of the entity recognition model; each first coding layer included in the BERT model is a bidirectional coding layer that performs forward coding and backward coding on an input signal. Each bidirectional coding layer comprises two parts: one part is an attention network and the other part is a forward fully-connected layer. Each hidden layer in the attention network is obtained by a weighted average over the previous hidden layer, so each hidden layer can relate directly to all positions of the previous layer; by utilizing the input long-sequence information, a hidden layer vector representing global information (namely the semantic feature of the classifier) can be obtained. The forward fully-connected layer further processes the global information acquired by the attention network so as to enhance the learning capability of the whole BERT model.
In some embodiments, after the character features of the characters in the query text are extracted, the character feature of the classifier [CLS], the character features of the characters, the character feature of the separator [SEP], and the character features of the filler [PAD] are spliced in this order to obtain a character feature sequence. The character feature sequence is input into the entity recognition BERT model and is forward-coded and backward-coded through the plurality of first coding layers of the entity recognition BERT model, and the semantic features of the characters output by the last first coding layer, which correspond one-to-one to the character features of the characters, are obtained as the semantic features of the query text. It should be noted that the character features of the classifier [CLS], the separator [SEP], and the filler [PAD] may be extracted in the same manner as in step 303: the character IDs of [CLS], [SEP], and [PAD] are found by looking up the character table, and their character features are then found from the feature library according to the character IDs.
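A minimal sketch of this encoding step follows, under the assumption that each first coding layer behaves like a bidirectional self-attention Transformer encoder layer; the layer count, dimensions, and preset length below are illustrative values, not the embodiment's configuration:

```python
import torch
import torch.nn as nn

FIRST_PRESET_LEN, DIM = 30, 768   # illustrative preset length and feature size

# "plurality of first coding layers", cascaded: each layer feeds the next
first_coding_layers = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=DIM, nhead=12,
                               dim_feedforward=3072, batch_first=True),
    num_layers=12)

def encode_query(char_feats, cls_feat, sep_feat, pad_feat):
    """char_feats: [num_chars, DIM]; cls/sep/pad: [DIM] features from the
    feature library. Returns the per-character semantic features E1..En."""
    n_pad = FIRST_PRESET_LEN - 2 - char_feats.size(0)
    seq = torch.cat([cls_feat[None], char_feats, sep_feat[None],
                     pad_feat[None].repeat(n_pad, 1)])  # [CLS] chars [SEP] [PAD]...
    hidden = first_coding_layers(seq[None])[0]          # hidden vector sequence
    return hidden[1:1 + char_feats.size(0)]             # keep only E1..En
```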
Taking the first of the first coding layers as an example, the first coding layer comprises an attention network and a forward fully-connected layer. The character feature sequence is input into the attention network of this first coding layer and weighted through the attention network to extract the attention feature sequence of the character feature sequence; the attention feature sequence is then input into the forward fully-connected layer of this first coding layer, bidirectional semantic coding (including forward coding and backward coding) is performed on the attention feature sequence through the forward fully-connected layer, and a hidden vector sequence is output and input into the second first coding layer, and so on; the processing logic of the subsequent first coding layers is similar and is not repeated here. The semantic features of the plurality of characters in the query text are found in the hidden vector sequence output by the last first coding layer and serve as the semantic features of the query text. Because the attention mechanism is introduced into the first coding layers, during each round of semantic coding each character can focus on the characters most closely related to it, so the finally obtained semantic features have higher accuracy.
In the above process, the server performs forward coding and backward coding on each character feature in the character feature sequence through the first coding layers: through the forward coding, the feature of each character can fuse the related information of the historical characters appearing before that character, and through the backward coding, the feature of each character can fuse the related information of the future characters appearing after that character. Coding in both directions can greatly improve the expression capability of the hidden vector of each character, that is, the expression capability of the semantic features.
Fig. 4 is a schematic diagram of an entity recognition BERT model according to an embodiment of the present application. As shown in fig. 4, the entity recognition BERT model 400 includes a plurality of first coding layers 410, a first fully-connected layer 420, and a CRF layer 430. Assuming that the query text is "XY product name usage method", the character features of the characters in the extracted query text are respectively T1, T2, T3, T4, T5, T6, T7, T8, T9. Then, the character feature of the classifier [CLS] is placed before T1 (i.e., [CLS] is placed first), and the character feature of the separator [SEP] is placed after T9 (i.e., [SEP] is placed after the character feature of the last character). Finally, it is judged whether the sequence length of the current character feature sequence reaches a first preset length; if not, the character feature sequence is filled with the character feature of the filler [PAD] until the length of the character feature sequence reaches the first preset length, where the first preset length is any integer greater than or equal to 1, for example 30 or another value, which can be specified by a technician. The character feature sequence of the first preset length, {[CLS], T1, T2, T3, T4, T5, T6, T7, T8, T9, [SEP], [PAD], [PAD], ..., [PAD]}, is input into the plurality of first coding layers 410 and forward-coded and backward-coded through the plurality of first coding layers 410; the last first coding layer 410 outputs the hidden vector sequence {E[CLS], E1, E2, E3, E4, E5, E6, E7, E8, E9, E[SEP], E[PAD], E[PAD], ..., E[PAD]}, and the semantic features {E1, E2, E3, E4, E5, E6, E7, E8, E9} corresponding to the plurality of characters in the query text are taken from the hidden vector sequence as the semantic features of the query text. In the hidden vector sequence, E[CLS] can represent the global semantics of the whole query text, and its dimension can be set to 768; the dimension of the semantic feature E1~E9 of each character can also be set to 768.
The above steps 302-304 provide a possible implementation of performing feature extraction on the plurality of characters contained in the query text to obtain the semantic features of the query text, namely extracting the semantic features of the query text through the entity recognition BERT model. Because the character features of the query text are forward-coded and backward-coded in the BERT model, coding in both directions can greatly improve the expression capability of the hidden vector of each character, that is, the expression capability of the semantic features of the query text.
In other embodiments, a Transformer coding layer may be used as the first coding layer of the entity recognition model, or other Sequence-To-Sequence (Seq2Seq) models that support an encoding function may be used, such as: SpanBERT (a BERT variant that improves pre-training by representing and predicting spans), LSTM (Long Short-Term Memory network), Bi-LSTM (Bidirectional Long Short-Term Memory network), GRU (Gated Recurrent Unit), and the like.
305. The server inputs the semantic features of the query text into a first full-connection layer of the entity recognition model, and full-connection processing is carried out on the semantic features of the query text through the first full-connection layer to obtain the full-connection semantic features of the query text.
In some embodiments, after the semantic features of the characters in the query text are extracted in step 304, a splicing feature obtained by splicing (Concat) the semantic features of the characters in the query text may be input into a first full-connection layer of the entity identification model, and the splicing feature is subjected to full-connection processing through the first full-connection layer to extract the full-connection semantic features of the query text.
In the first full-connection layer, global information can be effectively extracted through a many-to-many connection mode. It should be noted that the many-to-many connection mode means that neurons between layers (each neuron corresponds to the semantic feature of one character) are completely connected, and each output neuron can acquire information of all input neurons, in other words, the above-mentioned fully-connected semantic feature can acquire information of the semantic features of all input characters constituting the splicing feature, which is beneficial to information summarization.
In one exemplary scenario, as shown in FIG. 4, for the hidden vector sequence {E[CLS], E1, E2, E3, E4, E5, E6, E7, E8, E9, E[SEP], E[PAD], E[PAD], ..., E[PAD]} output by the plurality of first coding layers 410, the hidden vector of the classifier [CLS] is removed, and all hidden vectors in the remaining hidden vector sequence {E1, E2, E3, E4, E5, E6, E7, E8, E9, E[SEP], E[PAD], E[PAD], ..., E[PAD]} are spliced to obtain a spliced hidden vector, namely the splicing feature; the spliced hidden vector is input into the first fully-connected layer 420 for full-connection processing, thereby obtaining the fully-connected semantic features of the query text.
Illustratively, assuming that the first preset length is 30, after a character feature sequence with a length of 30 is input into the entity recognition BERT model, a hidden vector sequence with a length of 30 is output; because the leading classifier [CLS] is removed, a spliced hidden vector E_in with a length of 29 is obtained. Assuming that each hidden vector has 768 dimensions, the spliced hidden vector E_in has the dimensions [29, 768], and E_in is input into the first fully-connected layer 420. Further, if the weight matrix W1 in the first fully-connected layer 420 has the dimensions [768, 7], then after full-connection processing of the spliced hidden vector E_in through the weight matrix W1, the first fully-connected layer 420 outputs a vector E_out with the dimensions [29, 7], and this output vector E_out is the fully-connected semantic feature of the query text.
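A sketch of this fully-connected projection with the example dimensions (the variable names are illustrative):

```python
import torch
import torch.nn as nn

# W1 of the first fully-connected layer with the dimensions from the
# example: [768, 7] maps each of the 29 spliced hidden vectors onto the
# 7 entity boundary position labels.
first_fc = nn.Linear(768, 7)

E_in = torch.randn(29, 768)  # spliced hidden vectors ([CLS] removed); stand-in values
E_out = first_fc(E_in)       # fully-connected semantic feature, shape [29, 7]
```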
306. And the server inputs the full-connection semantic features into a CRF layer of the entity recognition model, and entity boundary position labels of a plurality of characters in the query text are obtained through prediction of the CRF layer.
The entity boundary position label is used for representing the entity boundary label and the entity type label of the current character, in other words, the entity boundary position label is equivalent to an entity boundary labeling result and an entity type label result provided for the current character.
In some embodiments, the server predefines an entity identification category table of entity boundary position labels, and each entity boundary position label in the entity identification category table marks a determined combination of an entity boundary label and an entity type label.
Illustratively, the description takes as an example labeling entity boundaries with the 4 indicators B (Begin), I (Inside), E (End), and O (Outside, non-entity), and labeling entity types with the 2 tags "brand word" and "product word". In this case, the 2 entity type labels and the 4 entity boundary indicators can be combined to form 7 entity boundary position labels. After a tag ID is assigned to each entity boundary position label, the mapping relationship between each entity boundary position label and its tag ID is recorded in an entity identification category table, as shown in Table 3:
TABLE 3
Tag ID    Entity boundary position tag
0         O
1         B-brand word
2         I-brand word
3         E-brand word
4         B-product word
5         I-product word
6         E-product word
By predefining the entity identification category table based on the entity boundary labels and entity type labels, a combination of an entity boundary label and an entity type label can be represented by a single entity boundary position label, and the labeling task for entity boundaries and the labeling task for entity types are converted into a single classification task of deciding which entity boundary position label each character belongs to, so that the entity recognition efficiency can be greatly improved.
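Table 3 can be held, for example, as a pair of lookup dictionaries; this is an illustrative encoding, not the embodiment's storage format:

```python
# The entity identification category table of Table 3 as two lookup dicts;
# the label strings and tag IDs follow the table above.
ID2TAG = {0: "O",
          1: "B-brand word", 2: "I-brand word", 3: "E-brand word",
          4: "B-product word", 5: "I-product word", 6: "E-product word"}
TAG2ID = {tag: i for i, tag in ID2TAG.items()}
```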
In some embodiments, after the fully-connected semantic features of the query text are extracted in the above step 305, the fully-connected semantic features of the query text may be input into the CRF layer of the entity recognition model. Since each character may correspond to any candidate boundary position label in the entity identification category table, different characters corresponding to different candidate boundary position labels form different candidate paths; in other words, the candidate boundary position labels of the characters in each possible case form one candidate path of a possible sequence labeling. Each possible candidate path is scored in the CRF layer so as to select the candidate path with the highest path score, and the candidate boundary position label of each character in that candidate path is the finally output entity boundary position label.
Fig. 5 is a flowchart of predicting entity boundary position tags according to an embodiment of the present application, and as shown in fig. 5, the server may predict entity boundary position tags of each character through the following steps 3061-3063:
3061. the server acquires a plurality of candidate paths formed by a plurality of candidate boundary position labels corresponding to the plurality of characters.
In some embodiments, each character in the query text may take any candidate boundary position label, where a candidate boundary position label refers to any tag ID in the entity identification category table, and different candidate boundary position labels corresponding to different characters can be combined to form different candidate paths. For example, for the query text "XY product name usage method" and the entity identification category table shown in Table 3, each character in the query text has 7 possible candidate boundary position labels, and arranging and combining these 7 possibilities over all characters yields all candidate paths; that is, for this 9-character query text, arranging and combining the 7 candidate boundary position labels yields 7^9 candidate paths in total.
In one exemplary embodiment, for a plurality of characters included in the query text, a candidate path is characterized by using a tag ID sequence formed by tag IDs of candidate boundary position tags of the respective characters in the entity identification category table.
3062. And the server respectively scores the multiple candidate paths through the CRF layer to obtain the path scores of the multiple candidate paths.
The path score characterizes the possibility that the candidate boundary position label included in the corresponding candidate path belongs to the entity boundary position label.
In some embodiments, for each candidate path, the CRF layer obtains, based on the CRF algorithm, the conditional probability that the characters take the candidate boundary position labels determined by the current candidate path, and this conditional probability represents the path score of the current candidate path. The conditional probability also characterizes, among all the candidate paths, the matching degree between the current candidate path and the characters: the higher the conditional probability, the higher the matching degree between the candidate boundary position labels determined by the current candidate path and the corresponding characters, and the lower the conditional probability, the lower that matching degree.
3063. The server determines a plurality of candidate boundary position tags contained in the candidate path with the highest path score as entity boundary position tags of the plurality of characters respectively.
In some embodiments, all possible candidate paths are determined from step 3061, the path score for each candidate path is determined from step 3062, the candidate path with the highest path score is obtained, and each candidate boundary position label included in the candidate path with the highest path score is determined as the entity boundary position label of the corresponding character. Optionally, the candidate paths may be ranked in order of decreasing path scores, and each candidate boundary position label included in the candidate path ranked at the top, that is, the candidate path with the highest path score, is determined as the entity boundary position label of the corresponding character.
In one exemplary scenario, as shown in FIG. 4, the fully-connected semantic feature output by the first fully-connected layer 420 is the vector E_out; from the tag ID sequence output by the CRF layer 430, the entity boundary position label of each character in the candidate path uniquely indicated by that tag ID sequence can be determined.
Assuming that the label ID sequence output by the CRF layer 430 is {1,3,4,5,6,0,0,0,0} for the query text "XY product name usage method", it can be known that the entity boundary position labels represented by this candidate path are "B-brand word, E-brand word, B-product word, I-product word, E-product word, O, O, O, O" in sequence according to the mapping relationship shown in table 3 of the entity identification category table, and therefore, the result of entity labeling of the entity boundary position label of each character by the final candidate path is shown in table 4 below.
TABLE 4

Character    Entity boundary position label    Tag ID
X            B-brand word                      1
Y            E-brand word                      3
produce      B-product word                    4
article      I-product word                    5
name         E-product word                    6
make         O                                 0
use          O                                 0
square       O                                 0
method       O                                 0

(Here "produce", "article", "name", "make", "use", "square", and "method" are per-character renderings of the original-language query "XY product name usage method".)
It should be noted that, for the entity recognition BERT model that scores candidate paths based on the CRF layer, in the training phase the model parameters of the entire entity recognition BERT model may be updated based on the loss function of the CRF layer, where the loss function of the CRF layer is the negative log-likelihood of the correct path (i.e., the reference path labeled for the sample Query), computed under the current transition state matrix of the CRF algorithm from the path score of the correct path and the sum of the scores of all paths.
The above steps 3061 to 3063 provide a possible implementation in which the CRF layer of the entity recognition BERT model predicts the entity boundary position label of each character. The CRF layer scores candidate paths as a whole instead of predicting labels character by character, which constrains the ordering of the entity boundary position labels of different characters, so that the accuracy of predicting the entity boundary position label of each character in the query text can be improved.
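For illustration, the highest-scoring-path selection of steps 3061-3063 can be sketched as a Viterbi decode over per-character emission scores and a label-transition matrix; this is a simplified stand-in for the CRF layer, which also learns the transition matrix during training with the negative log-likelihood loss described above:

```python
import torch

def viterbi_decode(emissions, transitions):
    """emissions: [seq_len, num_tags] per-character label scores;
    transitions: [num_tags, num_tags], transitions[i, j] = score of label i
    followed by label j. Returns the tag IDs of the highest-scoring path."""
    seq_len, _ = emissions.shape
    score = emissions[0]                       # best path score ending in each label
    backptr = []
    for t in range(1, seq_len):
        # score of extending every previous label with every current label
        total = score[:, None] + transitions + emissions[t][None, :]
        backptr.append(total.argmax(dim=0))    # best previous label per label
        score = total.max(dim=0).values
    path = [int(score.argmax())]
    for bp in reversed(backptr):               # follow the back-pointers
        path.append(int(bp[path[-1]]))
    return list(reversed(path))                # e.g. [1, 3, 4, 5, 6, 0, 0, 0, 0]
```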
In other embodiments, the CRF layer in the entity recognition BERT model may be replaced with an exponential normalization layer, i.e., a Softmax layer. In this case, the fully-connected semantic features extracted in step 305 are input into the Softmax layer, the matching probability of each character with respect to each candidate boundary position label in the entity identification category table is calculated through the Softmax layer, and the entity boundary position label of each character is set to the candidate boundary position label with the highest matching probability; this simplifies the calculation of predicting the entity boundary position label of each character and saves the processing resources of the server.
307. And the server divides at least one entity object contained in the query text from the characters based on the entity boundary position labels of the characters.
In some embodiments, the server may determine the start character and the end character of each of the at least one entity object based on the entity boundary position tag of each of the plurality of characters, and since the entity boundary position tag can identify not only the entity boundary but also the entity type, for each entity object included in the query text, the server can find the start character and the end character of the entity object according to the entity boundary position tag. Then, the server divides the at least one entity object from the plurality of characters based on the respective start character and end character of the at least one entity object, that is, for each entity object, the server determines the respective characters from the start character of the entity object to the end character of the entity object as the entity object.
Illustratively, for the query text "XY product name usage method", after the entity boundary position label of each character shown in Table 4 above is obtained, since the labeling results of the four characters "make", "use", "square", and "method" are all O, i.e., non-entity, the segment "usage method" is not an entity object in the query text. Further, entity segmentation is performed on "XY product name": the labeling result of "X" is "B-brand word", that is, the character "X" is identified as the start character of the entity type "brand word"; the labeling result of "Y" is "E-brand word", that is, the character "Y" is identified as the end character of the entity type "brand word"; therefore, the 1st entity object "XY" in the query text can be segmented out, and the entity type of the entity object "XY" is "brand word". Similarly, the labeling result of "produce" is "B-product word", identifying this character as the start character of the entity type "product word"; the labeling result of "article" is "I-product word", identifying this character as a middle character of the entity type "product word"; and the labeling result of "name" is "E-product word", identifying this character as the end character of the entity type "product word"; thus the 2nd entity object "product name" in the query text can be segmented out, and the entity type of the entity object "product name" is "product word". Finally, the 2 entity objects "XY" and "product name" contained in the query text are obtained.
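For illustration, the segmentation of step 307 from entity boundary position labels to entity objects can be sketched as follows; the function and variable names are illustrative, and single-character entities and malformed label sequences are not handled:

```python
def extract_entities(characters, tags):
    """characters: the segmented characters; tags: their entity boundary
    position labels, e.g. from Table 4. Returns (entity text, entity type)."""
    entities, start, etype = [], None, None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):               # start character of an entity
            start, etype = i, tag[2:]
        elif tag.startswith("E-") and start is not None:
            entities.append(("".join(characters[start:i + 1]), etype))
            start, etype = None, None
        elif tag == "O":                       # non-entity character
            start, etype = None, None
    return entities

# For the Table 4 labeling this yields the brand word "XY" and the
# three-character product word "product name".
```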
In the above steps 305-307, a possible implementation of performing entity recognition on the Query text based on the semantic features of the Query text to obtain the at least one entity object is provided: through an integral entity recognition model, the semantic features of the Query text are extracted by the first coding layers, and the semantic features of the Query text are input into the cascaded first fully-connected layer and CRF layer to perform sequence labeling on the Query text and output the entity boundary position label of each character in the Query text, so that each entity object can be segmented out of the Query text. In this way, for any long-tail Query, the entity objects contained in it can be automatically identified through the trained entity recognition model, which is beneficial to subsequently rewriting the Query according to the entity objects to obtain entity Queries and improving the accuracy of subsequent category prediction.
In the foregoing steps 302 to 307, a possible implementation manner of performing entity identification on the query text carried by the search request to obtain at least one entity object included in the query text is further provided, that is, in the embodiment of the present application, it is described by taking an example that the entity identification model and the category prediction model are different models, in the next embodiment, processing logic of a shared encoding module of the entity identification model and the category prediction model will be described in detail, which is not specifically limited in the embodiment of the present application.
308. And the server acquires at least one entity query text corresponding to the query text based on the at least one entity object.
Wherein the entity query text is formed by combining one or more entity objects.
In some embodiments, Query rewrite may be further performed based on the entity objects of the Query text identified in the above step 307, so that from a Query text whose original form is complex and varied and whose features are difficult to align, some irrelevant non-entity characters are cut off, and the Query text is rewritten into regular, easily-classified entity Query texts, that is, entity Queries.
Generally, the entity Query is a combined word or a combined phrase formed by combining one or more entity objects identified in the above step 307, so that the accuracy of performing category prediction on the entity Query is greatly improved compared with the accuracy of performing category prediction on the original Query.
In some embodiments, in the process of rewriting the original Query, that is, the Query text, based on the at least one entity object to obtain at least one entity Query, that is, an entity Query text, the server may determine the entity type of each entity object and then enumerate the combinations that can be formed by entity objects of different entity types, where each combination constitutes one entity Query. For example, assuming that 2 entity objects whose entity type is "brand word" and 3 entity objects whose entity type is "product word" are identified, each brand-word entity object is combined with each of the 3 different product-word entity objects, and the 2 brand words and the 3 product words can form 6 combinations in total; that is, 6 entity Queries can be obtained based on these 5 entity objects (including the 2 brand words and the 3 product words).
In one example, the original Query is "XY product name usage method", and 1 brand word "XY" and 1 product word "product name", 2 entity objects in total, are identified through the above step 307. These 2 entity objects of different entity types can only form one combination, namely "XY product name", so one entity Query "XY product name" corresponding to the original Query is obtained from the 2 entity objects "XY" and "product name".
In another example, the original Query is "how XY product 1 and product 2 are distinguished", 3 entity objects in total are identified by 1 brand word "XY" and 2 product words "product 1" and "product 2" through the above step 307, the brand word "XY" and the product word "product 1" are combined to form one entity Query "XY product 1", the brand word "XY" and the product word "product 2" are combined to form another entity Query "XY product 2", and finally two entities Query "XY product 1" and "XY product 2" corresponding to the original Query can be obtained.
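A sketch of the combination-based rewrite of step 308, assuming the brand-word-first ordering used in the examples above (the function name is illustrative):

```python
from itertools import product

def rewrite_queries(brand_words, product_words):
    """Combine every brand word with every product word, brand first;
    m brand words and n product words yield m*n entity Queries."""
    return [f"{b} {p}" for b, p in product(brand_words, product_words)]

# rewrite_queries(["XY"], ["product 1", "product 2"])
# -> ["XY product 1", "XY product 2"]
```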
Fig. 6 is a schematic flow chart of Query rewrite provided in an embodiment of the present application, and as shown in fig. 6, for each entity object segmented from a Query text based on an entity labeling result of an entity identification model, for example, each brand word 601 and each product word 602 obtained by segmentation, entity objects of any different entity types may be combined to form a new entity Query 603, so that each entity Query 603 obtained by rewriting an original Query based on an entity object can be implemented, and further, each entity Query 603 obtained by rewriting is input to a category prediction model for category prediction.
In this step 308, query rewrite is performed based on entity object combination, so that a long-tail Query with a complex structure and various forms can be rewritten into a regular entity Query easy to classify, and then category prediction is performed only on the entity Query, that is, the entity Query text, without performing category prediction on the original Query, that is, the Query text, so that recall rate and accuracy of category prediction tasks can be greatly improved.
309. And the server performs category prediction on the at least one entity query text respectively to obtain candidate categories associated with the at least one entity query text respectively.
In some embodiments, the server performs category prediction based on the at least one entity Query obtained in step 308, for example, performs category prediction separately on each entity Query to obtain a candidate category associated with each entity Query.
In some embodiments, when class prediction is performed on each entity Query, a global entity semantic feature of the entity Query may be extracted first, full-connection processing is performed on the global entity semantic feature, a full-connection entity semantic feature of the entity Query is extracted, then non-linear mapping is performed on the full-connection entity semantic feature, a prediction score of the entity Query belonging to each leaf class is obtained, and then the leaf class with the largest prediction score is determined as a candidate class associated with the current entity Query.
Next, a class prediction process of a single entity Query will be described by taking any one entity Query of the at least one entity Query as an example, fig. 7 is a class prediction process diagram of an entity Query provided in an embodiment of the present application, and as shown in fig. 7, a server implements class prediction of a single entity Query through the following steps 3091-3093:
3091. for each entity Query, the server extracts character features of the entity Query.
In some embodiments, for each entity Query resulting from overwriting, the character features of the entity Query can be extracted in the same manner as in steps 302-303 above. Namely, word segmentation processing is carried out on the entity Query to obtain a plurality of characters contained in the entity Query, and then feature extraction is carried out on the plurality of characters contained in the entity Query to obtain character features of the entity Query.
In other embodiments, since the entity Query is obtained by combining characters contained in the original Query, the entity Query has no new characters that do not appear in the original Query. Therefore, after the character features of each character in the original Query are extracted in step 303, they can be cached; then, in this step 3091, the character features of the characters belonging to the current entity Query are simply looked up among the cached character features of the original Query, so that the character features extracted in step 303 are reused, word segmentation and character feature extraction on the entity Query are avoided, and the processing resources of the server are greatly saved.
In an example, for the rewritten entity Query "XY product name", word segmentation is performed to obtain the character sequence { "X", "Y", "produce", "article", "name" }; then, each character in the character sequence is looked up in the character table to obtain the character ID corresponding to each character, and the character feature corresponding to each character is found from the feature library according to the character ID. For example, the character features of the above 5 characters are respectively: T1, T2, T3, T4, T5.
3092. And the server acquires the global entity semantic features of the entity Query based on the character features of the entity Query.
The global entity semantic features represent global semantics of all characters in the entity Query on a category prediction task.
In some embodiments, the server may extract global entity semantic features of the entity Query based on character features of the entity Query through a class prediction model. Optionally, the category prediction model is used for predicting a candidate category associated with the entity Query, where the candidate category predicted by the category prediction model is a leaf category with the finest granularity in a preset category table of the entity Query.
In some embodiments, the class prediction model includes a plurality of second coding layers and a plurality of second fully-connected layers, where the second coding layers are used for coding the input signal to extract corresponding global semantic features, and the second fully-connected layers are used for performing fully-connected processing on the input signal to predict corresponding candidate classes.
In some embodiments, the server inputs the character features of the entity Query extracted in step 3091 into a plurality of second coding layers of the class prediction model, and codes the character features of the entity Query through the plurality of second coding layers, where the plurality of second coding layers are cascaded, that is, an input signal of each second coding layer is an output signal of a previous second coding layer, and an output signal of each second coding layer is input into a next second coding layer, so that after coding through the plurality of second coding layers, a global entity semantic feature of the entity Query is output by the last second coding layer.
Schematically, the description is given by taking the category prediction model as the category prediction BERT model as an example, where a plurality of second coding layers included in the BERT model are all bidirectional coding layers, and the bidirectional coding layers are the same as the bidirectional coding layers of the entity identification BERT model introduced in step 304, but generally have different model parameters, which are not described herein again. On the basis of the category prediction BERT model, the server can extract the global entity semantic features of the entity Query through the following steps A1-A2:
and A1, the server inputs the character features of the target classifier and the character features of the entity Query into a plurality of second coding layers of the class prediction model.
In some embodiments, after extracting the character features of a plurality of characters in the entity Query, the character features of the target classifier [ CLS ], the character features of the plurality of characters, the character features of the separator [ SEP ], and the character features of the filler [ PAD ] are spliced according to the sequence to obtain an entity character feature sequence, and the entity character feature sequence is input into a plurality of second coding layers in the class prediction BERT model.
Fig. 8 is a schematic diagram of a category prediction BERT model according to an embodiment of the present application. As shown in fig. 8, the category prediction BERT model 800 includes a plurality of second coding layers 810 and a plurality of second fully-connected layers 820. Assuming that the entity Query is "XY product name", the character features of the characters in the extracted entity Query are respectively T1, T2, T3, T4, T5. Then, the character feature of the target classifier [CLS] is placed before T1 (i.e., [CLS] is placed first), and the character feature of the separator [SEP] is placed after T5 (i.e., [SEP] is placed after the character feature of the last character). Finally, it is judged whether the sequence length of the current entity character feature sequence reaches a second preset length; if not, the entity character feature sequence is filled with the character feature of the filler [PAD] until its length reaches the second preset length, where the second preset length is any integer greater than or equal to 1, for example 30 or another value, which can be specified by a technician. The entity character feature sequence of the second preset length, {[CLS], T1, T2, T3, T4, T5, [SEP], [PAD], [PAD], ..., [PAD]}, is input into the plurality of second coding layers 810.
And A2, the server encodes the character features of the target classifier and the character features of the entity Query through the plurality of second encoding layers and outputs the global entity semantic features corresponding to the target classifier.
In some embodiments, the server performs forward coding and backward coding on the above entity character feature sequence through the plurality of second coding layers, and acquires the hidden vector corresponding to the target classifier [CLS] output by the last second coding layer as the global entity semantic feature of the entity Query. The hidden vector corresponding to the target classifier [CLS] is obtained by comprehensively coding the long-sequence information, i.e., it is a hidden layer vector used for representing global semantic information, and can therefore represent the global entity semantic feature of the current entity Query. The bidirectional coding manner of each second coding layer is the same as that of the first coding layer in step 304, and is not described herein again.
Still taking fig. 8 as an example, after the entity character feature sequence {[CLS], T1, T2, T3, T4, T5, [SEP], [PAD], [PAD], ..., [PAD]} involved in the above step A1 is input into the plurality of second coding layers 810, the entity character feature sequence is forward-coded and backward-coded through the plurality of second coding layers 810, and the last second coding layer 810 outputs the hidden vector sequence {E[CLS], E1, E2, E3, E4, E5, E[SEP], E[PAD], E[PAD], ..., E[PAD]}. The hidden vector E[CLS] corresponding to the target classifier [CLS] in this hidden vector sequence is acquired as the global entity semantic feature of the entity Query; that is, the hidden vector E[CLS] represents the global semantics of the whole entity Query, and its dimension can be set to 768, while the dimension of the hidden vector E1~E5 of each character in the entity Query can also be set to 768.
3093. And the server predicts and obtains a candidate category associated with the entity Query based on the global entity semantic features.
In some embodiments, the server may predict the candidate class of the entity Query based on the global entity semantic features through the class prediction model described above. Optionally, the global entity semantic features extracted in step 3092 are input into a plurality of second fully-connected layers included in the category prediction model, fully-connected entity semantic features of the entity Query are extracted through the plurality of second fully-connected layers, and then the fully-connected entity semantic features are subjected to nonlinear mapping through an activation layer to obtain a prediction score for mapping the entity Query to each leaf category, and further, the leaf category with the maximum prediction score is determined as a candidate category of the current entity Query.
Illustratively, taking the class prediction model as the class prediction BERT model as an example, each second fully-connected layer of the class prediction BERT model is connected with an active layer in series, and on this basis, the server may extract the candidate class of the entity Query through the following steps B1-B3:
B1, the server inputs the global entity semantic features into the plurality of second fully-connected layers of the category prediction model, and performs full-connection processing on the global entity semantic features through the plurality of second fully-connected layers to obtain the fully-connected entity semantic features of the entity Query.
In some embodiments, the server inputs the global entity semantic features into a plurality of second fully-connected layers of the category prediction BERT model, fully connects the global entity semantic features through a first second fully-connected layer to obtain a fully-connected feature, activates the fully-connected feature through an activation layer connected in series with the first second fully-connected layer to obtain a fully-connected activation feature, inputs the fully-connected activation feature into a second fully-connected layer, and so on until the last second fully-connected layer outputs the fully-connected entity semantic features of the entity Query.
In one exemplary scenario, the category prediction BERT model includes two second fully-connected layers, each followed by an activation layer in series: the weight matrix of the first second fully-connected layer can be represented as W21, and after the first second fully-connected layer an activation layer with the activation function tanh is connected in series; the weight matrix of the second second fully-connected layer can be represented as W22, and after the second second fully-connected layer an activation layer with the activation function sigmoid is connected in series.
Based on the category prediction BERT model of this architecture, as shown in FIG. 8, the global entity semantic feature E[CLS] output by the last second coding layer 810 is input into the first second fully-connected layer 820, and full-connection processing is performed on E[CLS] through the weight matrix W21 to obtain a fully-connected feature; the fully-connected feature is activated through the tanh activation layer to obtain a fully-connected activation feature, the fully-connected activation feature is input into the second second fully-connected layer 820, and full-connection processing is performed through the weight matrix W22 to obtain the fully-connected entity semantic features of the entity Query. Then, in the following step B2, the fully-connected entity semantic features are input into the sigmoid activation layer for nonlinear mapping, so as to obtain the prediction score of the entity Query belonging to each leaf category. In one example, the dimension of W21 is [768, 768], and in the case where the number of finest-grained leaf categories in the preset category table is 5363, the dimension of W22 may be set to [768, 5363].
B2, the server performs nonlinear mapping on the semantic features of the fully connected entity to obtain the prediction scores of the entity Query belonging to a plurality of leaf categories.
In some embodiments, for the fully-connected entity semantic features output by the last second fully-connected layer in step B1 above, the fully-connected entity semantic features are input into an active layer, and the fully-connected entity semantic features are non-linearly mapped by a sigmoid activation function in the active layer to output a prediction score for mapping the entity Query to each leaf category.
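Steps B1-B3 can be sketched with the stated dimensions as follows; the leaf category count and all names are illustrative values, not the embodiment's configuration:

```python
import torch
import torch.nn as nn

NUM_LEAF_CATEGORIES = 5363   # number of finest-grained leaf categories (example)

# W21 [768, 768] with tanh, then W22 [768, NUM_LEAF_CATEGORIES] with the
# sigmoid nonlinear mapping yielding one prediction score per leaf category.
second_fc_layers = nn.Sequential(
    nn.Linear(768, 768), nn.Tanh(),
    nn.Linear(768, NUM_LEAF_CATEGORIES), nn.Sigmoid())

def predict_candidate_category(e_cls):
    scores = second_fc_layers(e_cls)   # step B2: score per leaf category
    return int(scores.argmax())        # step B3: leaf category with max score
```

As step B3 below notes, a threshold variant (for example, keeping every leaf category whose score exceeds 0.5) can be used instead of the plain argmax.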
In other embodiments, an exponential normalization (Softmax) layer may be connected in series after the plurality of second fully-connected layers of the category prediction BERT model to obtain the prediction score of each leaf category; the manner of obtaining the prediction scores is not specifically limited in this embodiment.
It should be noted that the leaf category refers to the category label with the finest granularity in the preset category table. For example, in the case of dividing 4 levels, the complete label is "first-level label_second-level label_third-level label_fourth-level label", and the fourth-level label is the finest-grained leaf category; for example, in the multi-level multi-label classification result "makeup and skin care_face care_face essence_essence/muscle base lotion", "essence/muscle base lotion" is the finest-grained leaf category.
And B3, the server determines the leaf category with the maximum prediction score as a candidate category associated with the entity Query.
In some embodiments, since a prediction score is calculated for the entity Query for each leaf category, the leaf category with the largest prediction score may be selected as the candidate category associated with the current entity Query. Optionally, after the prediction score of each leaf category is obtained, the leaf categories whose prediction scores are greater than a specified threshold may be selected as the prediction result of the current entity Query; further, if two or more leaf categories have prediction scores greater than the specified threshold, the leaf category with the highest prediction score is selected as the candidate category of the current entity Query, or one of the leaf categories whose prediction scores are greater than the specified threshold is randomly selected as the candidate category of the current entity Query, or all leaf categories whose prediction scores are greater than the specified threshold may be retained as candidate categories of the current entity Query. The specified threshold may be any value greater than or equal to 0, for example 0.5.
Schematically, as shown in fig. 8, for the entity Query "XY product name", the leaf category with the largest prediction score output by the final category prediction BERT model is "essence/muscle base lotion"; in other words, the candidate category associated with the entity Query "XY product name" is "essence/muscle base lotion".
In the above process, a multi-level multi-label category prediction BERT model is constructed on the basis of the finest-grained leaf categories, and the dimension of the fully-connected entity semantic features output by the last second fully-connected layer is controlled to be equal to the number of finest-grained leaf categories in the preset category table. Therefore, after nonlinear mapping, i.e., activation processing, is performed on the fully-connected entity semantic features, each dimension of the activated vector corresponds one-to-one to a leaf category, and the value of each dimension represents the prediction score of the entity Query belonging to the corresponding leaf category, so that the candidate category to which the entity Query belongs can be screened out.
It should be noted that, in the process of training the category prediction BERT model, the cross-entropy loss between the candidate category predicted for a sample entity Query and the labeled actual category serves as the loss function, and the model parameters of the second coding layers and the second fully-connected layers (which may be regarded as classification layers) of the whole category prediction BERT model may be updated through this loss function.
The above steps B1-B3 show how the server predicts the candidate category associated with the entity Query based on the global entity semantic feature: through the multiple second fully-connected layers and the nonlinear mapping of the activation layers, the leaf category (i.e., the finest-grained category label) to which the entity Query belongs can be predicted in one step, so that only the upper-level labels associated with the leaf category at each level need to be queried in turn in the preset category table to output a multi-level multi-label classification result, without predicting each level individually as in the HMCN model; thus the accuracy of category prediction can be greatly improved.
In other embodiments, after the global entity semantic features are extracted, an HMCN model may be concatenated to predict the classification result of the multi-level and multi-label entity Query, and since the HMCN model has a high recognition accuracy for the regular entity Query, and the entity Query obtained by entity recognition and Query rewriting is in a relatively regular format, a good recognition accuracy can be obtained.
In the above step 3091-3093, a type prediction manner of a single entity Query in at least one entity Query is shown, when a plurality of entity queries are obtained by Query rewriting, the above steps 3091-3093 are performed on each entity Query to obtain a candidate type associated with the current entity Query, and the above operations are repeatedly performed until all candidate types associated with the entity queries are obtained, and then the following step 310 is performed.
310. And the server performs duplicate removal on the candidate categories associated with the at least one entity query text respectively to obtain the target categories associated with the query text.
In some embodiments, the server predicts a candidate category for each entity Query through step 309, but the candidate categories of different entity Queries may be repeated. Therefore, in this step 310, all candidate categories obtained by performing category prediction on all entity Queries in step 309 need to be de-duplicated; after the candidate categories to which all entity Queries belong are de-duplicated, the target category to which the original Query belongs is obtained.
In an example, the original Query is "how XY product 1 and product 2 are distinguished", and 2 entity Queries "XY product 1" and "XY product 2" are obtained by rewriting after entity recognition. It is possible that the category prediction result for the entity Query "XY product 1" is the same as that for the entity Query "XY product 2", for example, both predicted candidate categories are "essence/muscle base lotion"; in this case, the candidate categories of the two entity Queries are de-duplicated, and the target category "essence/muscle base lotion" to which the original Query belongs is obtained.
In another example, the original Query is "how XY product 1 and product 2 are distinguished", and 2 entity Queries "XY product 1" and "XY product 2" are obtained by rewriting after entity recognition. Assuming that the category prediction result for the entity Query "XY product 1" differs from that for the entity Query "XY product 2", for example, the candidate category to which the entity Query "XY product 1" belongs is "essence/muscle base lotion" and the candidate category to which the entity Query "XY product 2" belongs is "eye cream", then no candidate category is removed during de-duplication, and the target categories to which the original Query belongs are "essence/muscle base lotion" and "eye cream".
311. And the server queries one or more hierarchical categories associated with the target category from a preset category table.
The hierarchy categories refer to the upper-level or lower-level categories that have an association relationship with the target category at different levels.
In some embodiments, since step 309 selects leaf categories in the preset category table as the candidate categories of each entity Query, and the preset category table records the association relationships between categories of different levels (for example, the upper-level category to which each leaf category belongs, the still-higher-level category to which that upper-level category belongs, and so on up to the highest-level category, which may be referred to as the root category), each candidate category contained in the target category de-duplicated in step 310 can be traced back from the candidate category (i.e., the leaf category) through its upper-level categories up to the highest-level category (the root category). Summarizing all the associated categories of the candidate categories at different levels yields the one or more hierarchy categories associated with the target category of the whole Query text, that is, the original Query; in other words, a multi-level multi-label classification result of the original Query is obtained.
In one example, for the query text "XY product name usage method", the target category obtained by final de-duplication contains only one candidate category, namely the leaf category "essence/muscle base lotion". The upper-level category "face essence" to which the leaf category "essence/muscle base lotion" belongs is queried in the preset category table, the still-higher-level category "face care" to which it belongs is then queried, and the root category "makeup and skin care" to which that belongs is further queried; the associated categories of the different levels are then summarized to obtain the final multi-level multi-label classification result "makeup and skin care_face care_face essence_essence/muscle base lotion".
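Steps 310 and 311 together can be sketched as de-duplication followed by a walk up a parent table; the parent table fragment below is a hypothetical stand-in for the preset category table:

```python
# Hypothetical preset category table fragment: child category -> parent
# category; the root category ("makeup and skin care") has no parent entry.
PARENT = {"essence/muscle base lotion": "face essence",
          "face essence": "face care",
          "face care": "makeup and skin care"}

def target_categories(candidate_categories):
    """Step 310: de-duplicate; step 311: trace each leaf category back to
    the root and join the levels into one multi-level multi-label result."""
    results = []
    for leaf in dict.fromkeys(candidate_categories):   # order-preserving dedup
        chain, node = [leaf], leaf
        while node in PARENT:                          # walk up to the root
            node = PARENT[node]
            chain.append(node)
        results.append("_".join(reversed(chain)))      # root_..._leaf string
    return results

# target_categories(["essence/muscle base lotion", "essence/muscle base lotion"])
# -> ["makeup and skin care_face care_face essence_essence/muscle base lotion"]
```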
Fig. 9 is a schematic flowchart of a method for processing a search request according to an embodiment of the present application. As shown in fig. 9, for the query text "XY product name usage method" parsed from a search request, the character sequence { "X", "Y", "produce", "article", "name", "make", "use", "square", "method" } is obtained by word segmentation, and the character feature sequence {[CLS], T1, T2, T3, T4, T5, T6, T7, T8, T9, [SEP], [PAD], [PAD], ..., [PAD]} is then constructed and input into an entity recognition model 910. Entity recognition is performed on the Query text through the entity recognition model 910, which outputs the candidate path obtained by entity sequence labeling of the Query text, characterized as "B-brand word, E-brand word, B-product word, I-product word, E-product word, O, O, O, O". Query rewrite is then performed in a Query rewrite module 920 based on the entity objects segmented out of the Query text to obtain one or more recombined entity Queries, and the one or more entity Queries output by the Query rewrite module 920 are input into a category prediction model 930; each entity Query individually calls the category prediction model 930 to independently predict the candidate category of that single entity Query. The candidate categories predicted for all entity Queries are comprehensively de-duplicated to obtain the target category predicted for the Query text (the target category is one or more leaf categories in the preset category table); then, the associated categories of each leaf category at each level, up to the root category, are traced back in the preset category table, and a multi-level multi-label classification result of the Query text is output.
All the above optional technical solutions can be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
According to the method provided by the embodiment of the application, entity recognition is performed on the original Query to identify the entity objects it contains, and the original Query is rewritten on the basis of these entity objects into one or more entity Queries, removing the non-entity characters of the original Query. Category prediction is then performed only on each rewritten entity Query, and the candidate categories obtained by the prediction are deduplicated, so that the target category to which the original Query belongs can be identified quickly and efficiently.
The above embodiment describes in detail the process of performing entity recognition and category prediction on the query text with a separate entity recognition model and a separate category prediction model, which decouples the two models. In some embodiments, since both models need to encode their input Query (the query text and the entity query text, respectively), they can share the underlying coding module in the training stage; this shared underlying coding module is referred to as the shared coding model. The processing flow based on the shared coding model is described in detail in the following embodiment.
Fig. 10 is a flowchart of a method for processing a search request according to an embodiment of the present application. Referring to Fig. 10, the embodiment is executed by a computer device; the description below takes the computer device being a server as an example. The embodiment includes the following steps:
1001. The server receives the search request and parses the search request to obtain the query text.
Step 1001 is the same as step 301 in the previous embodiment, and is not described herein again.
1002. The server performs word segmentation processing on the query text to obtain a plurality of characters contained in the query text.
Step 1002 is similar to step 302 in the previous embodiment, and is not described herein.
1003. The server performs feature extraction on the characters to obtain the character features of the query text.
Step 1003 is the same as step 303 of the previous embodiment, and is not described herein.
1004. The server inputs the character features of the classification indicator and the character features of the query text into a plurality of third coding layers of a shared coding model, encodes them through the plurality of third coding layers, and outputs the global category feature of the query text corresponding to the classification indicator together with the semantic features of the query text.
The shared coding model is used for coding a query text to obtain a global category feature for category prediction and a semantic feature for entity recognition, wherein the global category feature represents global semantics of each character in the query text on a category prediction task.
In other words, the shared coding model is used for extracting global category features for category prediction and semantic features for entity identification from the query text. The shared coding model is equivalent to a coding module which realizes the bottom layer sharing of the entity recognition model and the category prediction model in the previous embodiment.
In some embodiments, the shared coding model includes a plurality of third coding layers, which encode the input signal to extract the corresponding global category features and semantic features. Taking a shared BERT coding model as an example, each third coding layer included in the BERT model is a bidirectional coding layer used for performing forward coding and backward coding on the input signal; the specific bidirectional coding manner is the same as that described in step 304 in the previous embodiment, and is not described herein again.
In some embodiments, after the character features of the plurality of characters in the query text are extracted, the character feature of the classification indicator [CLS], the character features of the plurality of characters, the character feature of the separator [SEP], and the character features of the filler [PAD] are concatenated in order to obtain a target character feature sequence. The target character feature sequence is input into the shared BERT coding model and is forward-coded and backward-coded through the plurality of third coding layers. The hidden vector corresponding to the classification indicator [CLS] output by the last third coding layer is used as the global category feature, and the hidden vectors corresponding to the plurality of characters — one per character feature — are used as the semantic features of the query text.
Fig. 11 is a schematic diagram of a shared BERT coding model provided in an embodiment of the present application. As shown in Fig. 11, the shared BERT coding model 1110 includes a plurality of third coding layers 1111. Assuming the query text is "XY product name using method" and the extracted character features of its characters are T1, T2, T3, T4, T5, T6, T7, T8, T9, the character feature of the classification indicator [CLS] is placed before T1 (i.e., [CLS] is placed first), and the character feature of the separator [SEP] is placed after T9 (i.e., [SEP] follows the character feature of the last character). It is then judged whether the sequence length of the current target character feature sequence reaches a third preset length; if not, the filler [PAD] is appended until the length of the target character feature sequence reaches the third preset length. The third preset length is any integer greater than or equal to 1 — for example, 30, or another value specified by the skilled person. The target character feature sequence of the third preset length, { [CLS], T1, T2, T3, T4, T5, T6, T7, T8, T9, [SEP], [PAD], [PAD], …, [PAD] }, is input into the plurality of third coding layers 1111 and is forward-coded and backward-coded through them. The last third coding layer 1111 outputs a hidden vector sequence { E[CLS], E1, E2, E3, E4, E5, E6, E7, E8, E9, E[SEP], E[PAD], E[PAD], …, E[PAD] }. The semantic features { E1, E2, E3, E4, E5, E6, E7, E8, E9 } corresponding to the plurality of characters in the query text are taken as the semantic features of the query text, and the hidden vector E[CLS] corresponding to the classification indicator [CLS] is taken as the global category feature of the query text. The hidden vector E[CLS] represents the global semantics of the whole query text, and its dimension can be set to 768; the dimension of each character's semantic feature E1~E9 may also be set to 768.
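As a hedged sketch of this sequence construction and encoding step, the PyTorch fragment below pads the character features to the third preset length and splits the encoder output into E[CLS] and E1–En; the `encoder` argument and the zero-vector stand-ins for the [CLS]/[SEP]/[PAD] features are assumptions (in practice these would be learned embeddings and the BERT stack itself):

```python
import torch

MAX_LEN = 30   # the third preset length; any integer >= 1 may be chosen
HIDDEN = 768   # hidden vector dimension, as set in the example above

def encode_query(char_feats: torch.Tensor, encoder) -> tuple[torch.Tensor, torch.Tensor]:
    """char_feats: [n, HIDDEN] character features T1..Tn of the query text.
    encoder: assumed stack of bidirectional third coding layers (e.g. BERT).
    Returns (E_[CLS] global category feature, E1..En semantic features)."""
    n = char_feats.size(0)
    cls = torch.zeros(1, HIDDEN)                # stand-in [CLS] character feature
    sep = torch.zeros(1, HIDDEN)                # stand-in [SEP] character feature
    pad = torch.zeros(MAX_LEN - n - 2, HIDDEN)  # [PAD] fillers up to MAX_LEN
    seq = torch.cat([cls, char_feats, sep, pad])   # target character feature sequence
    hidden = encoder(seq.unsqueeze(0)).squeeze(0)  # [MAX_LEN, HIDDEN] hidden vectors
    return hidden[0], hidden[1 : n + 1]            # E_[CLS], E_1..E_n
```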
The above step 1004 shows a possible implementation in which the query text of the search request is input into the shared coding model and the global category feature of the query text is extracted through it. The shared coding model is the coding module shared at the bottom layer between the entity recognition model and the category prediction model, so it outputs not only the global category feature of the query text but also its semantic features. The global category feature is passed to the following step 1005 to start category prediction on the query text, while the semantic features can be used at the same time for entity recognition on the query text.
In some embodiments, for the semantic features of the query text output by the shared coding model in step 1004, the server may perform entity identification on the query text based on the semantic features of the query text, so as to obtain at least one entity object included in the query text.
In some embodiments, in the entity recognition process, the server may perform full-connection processing on the semantic features of the query text to obtain the full-connection identification feature of the query text; predict, based on the full-connection identification feature, the entity boundary position labels of the plurality of characters in the query text; and divide the at least one entity object from the plurality of characters based on their respective entity boundary position labels. The entity recognition process is the same as the entity recognition method in steps 305 to 307 in the previous embodiment, and is not described herein again.
Continuing with the example of Fig. 11, the hidden vector sequence { E[CLS], E1, E2, E3, E4, E5, E6, E7, E8, E9, E[SEP], E[PAD], E[PAD], …, E[PAD] } output by the shared coding model 1110 is taken, the hidden vector of the classification indicator [CLS] is removed, and the remaining hidden vectors { E1, E2, E3, E4, E5, E6, E7, E8, E9, E[SEP], E[PAD], E[PAD], …, E[PAD] } are spliced to obtain a spliced hidden vector, i.e., a splicing feature. The spliced hidden vector is input into the first fully-connected layer 1121 of the entity recognition model 1120 for full-connection processing to obtain the full-connection identification feature of the query text. The full-connection identification feature is then input into the CRF layer 1122, which scores each candidate path and outputs the label ID sequence of the candidate path with the highest path score. The entity boundary position label of each character is determined from the label ID sequence, so that one or more entity objects can be divided from the query text according to the entity boundary position label of each character.
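The final division of entity objects from the boundary position labels can be sketched as below; the tag scheme follows the B/I/E/O labels of Fig. 9, and the example characters are illustrative stand-ins:

```python
def split_entities(chars: list[str], tags: list[str]) -> list[tuple[str, str]]:
    """Divide entity objects from characters using entity boundary position
    labels such as "B-brand", "I-product", "E-product", "O" (Fig. 9 scheme)."""
    entities, start = [], None
    for i, tag in enumerate(tags):
        if tag.startswith("B-"):
            start = i                    # start character of an entity object
        elif tag.startswith("E-") and start is not None:
            entities.append(("".join(chars[start:i + 1]), tag[2:]))
            start = None                 # end character closes the span
        elif tag == "O":
            start = None                 # non-entity character
    return entities

# Illustrative stand-in characters for "XY product name using method":
chars = ["X", "Y", "prod", "uct", "name", "us", "ing", "meth", "od"]
tags = ["B-brand", "E-brand", "B-product", "I-product", "E-product",
        "O", "O", "O", "O"]
print(split_entities(chars, tags))  # -> [('XY', 'brand'), ('productname', 'product')]
```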
1005. The server performs full-connection processing on the global category feature to obtain the full-connection category feature of the query text.
In other words, the server inputs the global category feature into a plurality of second fully-connected layers of the category prediction model, and performs fully-connected processing on the global category feature through the plurality of second fully-connected layers to obtain a fully-connected category feature of the query text. Step 1005 is similar to sub-step B1 included in step 3093 in the previous embodiment, and therefore is not described herein again.
1006. The server performs nonlinear mapping on the full-connection category feature to obtain the prediction scores of the query text belonging to a plurality of leaf categories.
In other words, the server inputs the full-connection category features into an activation layer of a category prediction model for nonlinear mapping, and obtains prediction scores of the query text belonging to each of a plurality of leaf categories. Step 1006 is the same as sub-step B2 included in step 3093 in the previous embodiment, and is not described herein again.
1007. The server determines the leaf category with the largest prediction score as the target category associated with the query text.
Step 1007 is similar to sub-step B3 included in step 3093 in the previous embodiment, and is not described herein again.
Continuing with the example of Fig. 11, assume that the category prediction model includes two second fully-connected layers, each followed by an activation layer in series. The weight matrix of the first of the two second fully-connected layers can be represented as W21, and it is followed by an activation layer whose activation function is tanh; the weight matrix of the second can be represented as W22, and it is followed by an activation layer whose activation function is sigmoid.
On this basis, the global category feature E[CLS] output by the last third coding layer 1111 in the shared coding model 1110 is input into the first second fully-connected layer 1131 of the category prediction model 1130, where full-connection processing is performed on E[CLS] via the weight matrix W21 to obtain a full-connection feature. The full-connection feature is activated through the tanh activation layer to obtain a full-connection activation feature, which is input into the second fully-connected layer 1132, where full-connection processing is performed via the weight matrix W22 to obtain the full-connection category feature of the query text. The full-connection category feature is then input into the sigmoid activation layer for nonlinear mapping, yielding a prediction score for each leaf category of the query text, and the leaf category with the highest prediction score is determined as the target category to which the query text belongs. In one example, the dimension of W21 is [768, 768]; in the case where the number of finest-grained leaf categories in the preset category table equals 5363, the dimension of W22 may be set to [768, 5363].
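Under the dimensions just given, the category prediction head can be sketched as the following PyTorch module; the class name and defaults are assumptions for illustration, not a definitive implementation:

```python
import torch
import torch.nn as nn

class CategoryPredictionHead(nn.Module):
    """Sketch of the two second fully-connected layers of Fig. 11:
    W21 is [768, 768] with a tanh activation layer, W22 is [768, 5363]
    with a sigmoid activation layer (5363 leaf categories assumed)."""
    def __init__(self, hidden: int = 768, num_leaves: int = 5363):
        super().__init__()
        self.fc1 = nn.Linear(hidden, hidden)      # weight matrix W21
        self.fc2 = nn.Linear(hidden, num_leaves)  # weight matrix W22

    def forward(self, e_cls: torch.Tensor) -> torch.Tensor:
        x = torch.tanh(self.fc1(e_cls))           # full-connection activation feature
        return torch.sigmoid(self.fc2(x))         # per-leaf prediction scores

# Usage: the leaf category with the largest prediction score is the target.
head = CategoryPredictionHead()
scores = head(torch.randn(768))
target_category = scores.argmax().item()
```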
It should be noted that, in the embodiment of the present application, because the entity recognition model and the category prediction model share the underlying coding module — that is, both further process features output by the shared coding model — the three models effectively perform joint multi-task learning through the shared coding model and therefore need to be trained cooperatively in the training stage. Optionally, the three models may use a common loss function, for example, the sum of the cross-entropy loss between the predicted category and the true category in the category prediction model and the negative log-likelihood loss of the CRF in the entity recognition model.
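A minimal sketch of such a common loss, assuming the CRF layer exposes its negative log-likelihood on the labeled tag sequence and that category labels are single-class indices:

```python
import torch
import torch.nn.functional as F

def joint_loss(category_logits: torch.Tensor,
               true_category: torch.Tensor,
               crf_neg_log_likelihood: torch.Tensor) -> torch.Tensor:
    """Common loss for cooperative training of the shared coding model,
    the entity recognition model, and the category prediction model:
    cross-entropy of predicted vs. true category, plus the CRF's negative
    log-likelihood (assumed computed inside the entity recognition model)."""
    ce = F.cross_entropy(category_logits, true_category)
    return ce + crf_neg_log_likelihood
```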
1008. The server queries one or more hierarchical categories associated with the target category from a preset category table.
A hierarchical category refers to an upper-level or lower-level category that has an association relationship with the target category at a different hierarchy level.
Step 1008 is the same as step 311 in the previous embodiment, and is not described herein.
All the above optional technical solutions can be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
According to the method provided by the embodiment of the application, the trained shared coding model can extract both the semantic features of the query text used for entity recognition and the global category features used for category prediction. The global category features extracted by the shared coding model therefore incorporate, on the category prediction task, the global semantic information learned from the entity recognition semantic features, giving them stronger expression capability during category prediction. This alleviates the missed-recall phenomenon on long-tail Queries and greatly improves the accuracy and recall rate of category prediction for long-tail Queries.
Hereinafter, the test effect of the method for processing a search request according to the embodiment of the present application will be described with reference to test data in some product search scenarios.
The HMCN model has a serious missed-recall problem for long-tail Queries. As shown in Table 5, in the commodity search scenario, whether it is a head Query searched at high frequency or a long-tail Query searched at low frequency, the Query is generally composed of entity objects of two entity types, namely a brand word and a product word.
TABLE 5
Original Query                             | Query type      | Brand word | Product word | Entity identification | HMCN
-------------------------------------------|-----------------|------------|--------------|-----------------------|-----
XY product 1                               | Head Query      | XY         | Product 1    | √                     | √
XY product 1 application method            | Long-tail Query | XY         | Product 1    | √                     | ×
What is the main efficacy of XY product 1  | Long-tail Query | XY         | Product 1    | √                     | ×
Effect and efficacy of XY product 1        | Long-tail Query | XY         | Product 1    | √                     | ×
Age to which XY product 1 is suitable      | Long-tail Query | XY         | Product 1    | √                     | ×
XY product 1 tax free shop price           | Long-tail Query | XY         | Product 1    | √                     | ×
XY product 1 for pregnant women            | Long-tail Query | XY         | Product 1    | √                     | ×
Dry skin can be treated with XY product 1  | Long-tail Query | XY         | Product 1    | √                     | ×
Why allergy with XY product 1              | Long-tail Query | XY         | Product 1    | √                     | ×
It can be seen that because the category prediction task focuses more on the global semantics of the original Query, it has a good recall rate only on head Queries and suffers a serious missed-recall problem on long-tail Queries. The entity recognition task, which pays more attention to the entity objects in the original Query, is more sensitive to learning the core entity segments (i.e., entity objects), and therefore achieves a good recall rate on both head Queries and long-tail Queries.
In the commodity search scenario, the training data of the entity recognition model only needs to cover common brand words and product words (i.e., the common entity objects of the corresponding application scenario) for any head Query and/or long-tail Query to be correctly identified. The category prediction model, by contrast, needs to cover the full Query distribution as far as possible, yet long-tail Queries are hard to cover completely: their search frequency is low, sample verification is insufficient, and manual labeling is costly. Covering the full Query distribution is therefore time- and labor-consuming, and the long-tail missed-recall problem is hard to improve simply by expanding the training data of the category prediction model.
The method and the device of the present application extract the core entity objects of the query text by means of the entity recognition model, and thereby construct regular rewritten Queries (i.e., entity Queries obtained by rewriting based on the entity objects) that better suit the multi-level multi-label category prediction task. The trained category prediction model thus achieves a high recall rate and accuracy, improving the various indexes and the research and development efficiency of the whole search engine system.
Schematically, Table 6 shows the test results of performing category prediction on the original Queries shown in Table 5 by applying the entity-recognition-rewriting-based multi-level multi-label category prediction method of the embodiment of the present application.
TABLE 6
Original Query                             | Query type      | Rewriting Query | Recall result
-------------------------------------------|-----------------|-----------------|--------------
XY product 1                               | Head Query      | XY product 1    | √
XY product 1 application method            | Long-tail Query | XY product 1    | √
What is the main efficacy of XY product 1  | Long-tail Query | XY product 1    | √
Effect and efficacy of XY product 1        | Long-tail Query | XY product 1    | √
Age to which XY product 1 is suitable      | Long-tail Query | XY product 1    | √
XY product 1 tax free shop price           | Long-tail Query | XY product 1    | √
XY product 1 for pregnant women            | Long-tail Query | XY product 1    | √
Dry skin can be treated with XY product 1  | Long-tail Query | XY product 1    | √
Why allergy with XY product 1              | Long-tail Query | XY product 1    | √
Obviously, comparing the recall results of the HMCN model in Table 5 with those of the embodiment of the present application in Table 6 shows that the recall rate for long-tail Queries increases greatly, and the missed-recall phenomenon of the conventional HMCN model on long-tail Queries is substantially improved.
Further, in a specific commodity search scenario, "commodity_make-up skin care", a comparison experiment was performed between the conventional HMCN model and the method of the present application; the results are shown in Table 7.
TABLE 7
Training mode    | Accuracy rate | Recall rate | F1 index
-----------------|---------------|-------------|--------------
HMCN             | 54.0%         | 46.0%       | 49.7%
This application | 96.0%         | 86.0%       | 90.7% (+41.0%)
The F1 index, also called the F1 score, refers to the harmonic mean of the accuracy rate and the recall rate.
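Written out, with P the accuracy rate and R the recall rate, the figures in Table 7 can be checked directly:

```latex
F_1 = \frac{2PR}{P + R}, \qquad
\frac{2 \times 0.960 \times 0.860}{0.960 + 0.860} \approx 0.907, \qquad
\frac{2 \times 0.540 \times 0.460}{0.540 + 0.460} \approx 0.497
```

which reproduces the 90.7% and 49.7% entries of Table 7.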
It can be seen that after applying the method of the embodiment of the present application, both the accuracy rate and the recall rate on the category prediction task over the full Query distribution improve significantly, and the F1 index also improves significantly (an increase of as much as 41.0%).
Fig. 12 is a schematic structural diagram of a device for processing a search request according to an embodiment of the present application, please refer to fig. 12, where the device includes:
an entity identification module 1201, configured to perform entity identification on a query text carried by the search request, to obtain at least one entity object included in the query text;
an obtaining module 1202, configured to obtain, based on the at least one entity object, at least one entity query text corresponding to the query text, where the entity query text is formed by combining one or more entity objects;
a category prediction module 1203, configured to perform category prediction on the at least one entity query text respectively to obtain candidate categories associated with the at least one entity query text;
a deduplication module 1204 is configured to perform deduplication on candidate categories associated with the at least one entity query text, respectively, to obtain target categories associated with the query text.
According to the device provided by the embodiment of the application, entity recognition is performed on the original Query to identify the entity objects it contains, and the original Query is rewritten on the basis of these entity objects into one or more entity Queries, removing the non-entity characters of the original Query. Category prediction is then performed only on each rewritten entity Query, and the candidate categories obtained by the prediction are deduplicated, so that the target category to which the original Query belongs can be identified quickly and efficiently.
In a possible implementation, based on the apparatus composition of fig. 12, the entity identifying module 1201 includes:
the semantic feature extraction unit is used for extracting features of a plurality of characters contained in the query text to obtain semantic features of the query text;
and the entity identification unit is used for carrying out entity identification on the query text based on the semantic features of the query text to obtain the at least one entity object.
In one possible implementation, the semantic feature extraction unit is configured to:
performing word segmentation processing on the query text to obtain a plurality of characters contained in the query text;
extracting the characteristics of the characters to obtain the character characteristics of the query text;
inputting the character features of the query text into a plurality of first coding layers of an entity recognition model, coding the character features of the query text through the plurality of first coding layers, and outputting the semantic features of the query text, wherein the entity recognition model is used for carrying out entity recognition on the query text.
In a possible implementation, based on the apparatus composition of fig. 12, the entity identifying unit includes:
the full-connection subunit is used for inputting the semantic features of the query text into a first full-connection layer of the entity recognition model, and performing full-connection processing on the semantic features of the query text through the first full-connection layer to obtain full-connection semantic features of the query text;
the prediction subunit is used for inputting the fully connected semantic features into a conditional random field CRF layer of the entity recognition model, and obtaining entity boundary position labels of a plurality of characters in the query text through the CRF layer prediction;
and the dividing subunit is used for dividing the at least one entity object from the plurality of characters based on the entity boundary position labels of the plurality of characters.
In one possible embodiment, the prediction subunit is configured to:
obtaining a plurality of candidate paths formed by a plurality of candidate boundary position labels corresponding to the characters;
respectively scoring the multiple candidate paths through the CRF layer to obtain respective path scores of the multiple candidate paths, wherein the path scores represent the possibility that candidate boundary position labels contained in corresponding candidate paths belong to the entity boundary position labels;
and determining a plurality of candidate boundary position labels contained in the candidate path with the highest path score as entity boundary position labels of the plurality of characters.
In one possible embodiment, the partitioning subunit is configured to:
determining a start character and an end character of each of the at least one entity object based on the entity boundary position tags of each of the plurality of characters;
the at least one entity object is partitioned from the plurality of characters based on a start character and an end character of each of the at least one entity object.
In one possible implementation, based on the apparatus components of fig. 12, the category prediction module 1203 includes:
the character feature extraction unit is used for extracting the character features of any entity query text in the at least one entity query text;
the global feature extraction unit is used for acquiring global entity semantic features of the entity query text based on the character features of the entity query text, wherein the global entity semantic features represent the global semantics of all characters in the entity query text on the category prediction task;
and the predicting unit is used for predicting to obtain a candidate category associated with the entity query text based on the global entity semantic features.
In one possible implementation, the global feature extraction unit is configured to:
inputting the character features of the target classifier and the character features of the entity query text into a plurality of second coding layers of a category prediction model, wherein the category prediction model is used for predicting candidate categories associated with the entity query text;
and coding the character features of the target classifier and the character features of the entity query text through the plurality of second coding layers, and outputting the global entity semantic features corresponding to the target classifier.
In one possible embodiment, the prediction unit is configured to:
inputting the global entity semantic features into a plurality of second fully-connected layers of the category prediction model, and performing fully-connected processing on the global entity semantic features through the plurality of second fully-connected layers to obtain fully-connected entity semantic features of the entity query text;
carrying out nonlinear mapping on the semantic features of the fully-connected entity to obtain respective prediction scores of the entity query text belonging to a plurality of leaf categories;
and determining the leaf category with the largest prediction score as the candidate category associated with the entity query text.
In a possible embodiment, based on the apparatus composition of fig. 12, the apparatus further comprises:
and the query module is used for querying one or more hierarchy categories associated with the target category from a preset category table, wherein the hierarchy categories refer to upper-level categories or lower-level categories which respectively have association relations with the target category under different hierarchies.
All the above optional technical solutions can be combined arbitrarily to form the optional embodiments of the present disclosure, and are not described herein again.
It should be noted that: in the processing apparatus for a search request according to the above embodiment, when processing a search request, only the division of the above functional modules is taken as an example, and in practical applications, the above function distribution can be completed by different functional modules according to needs, that is, the internal structure of the computer device is divided into different functional modules to complete all or part of the above described functions. In addition, the processing apparatus for a search request and the processing method for a search request provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are detailed in the processing method for a search request, and are not described herein again.
Fig. 13 is a schematic structural diagram of a device for processing a search request according to an embodiment of the present application, please refer to fig. 13, where the device includes:
the feature extraction module 1301 is configured to input a query text of a search request into a shared coding model, extract a global category feature of the query text through the shared coding model, where the global category feature represents global semantics of each character in the query text on a category prediction task, and the shared coding model is configured to encode the query text to obtain a global category feature available for category prediction and a semantic feature available for entity identification;
a full-connection module 1302, configured to perform full-connection processing on the global category feature to obtain a full-connection category feature of the query text;
the mapping module 1303 is configured to perform nonlinear mapping on the fully-connected category features to obtain respective prediction scores of the query text belonging to multiple leaf categories;
a determining module 1304, configured to determine the leaf category with the largest predictive score as the target category associated with the query text.
The device provided by the embodiment of the application trains a shared coding model that can extract both the semantic features of the query text used for entity recognition and the global category features used for category prediction, so that the global category features incorporate, on the category prediction task, the global semantic information learned from the entity recognition semantic features. With this stronger expression capability during category prediction, the missed-recall phenomenon on long-tail Queries is alleviated, and the accuracy and recall rate of category prediction for long-tail Queries are greatly improved.
In one possible implementation, the feature extraction module 1301 is configured to:
performing word segmentation processing on the query text to obtain a plurality of characters contained in the query text;
extracting the characteristics of the characters to obtain the character characteristics of the query text;
inputting the character features of the classification indicator and the character features of the query text into a plurality of third coding layers of the shared coding model, coding the character features of the classification indicator and the character features of the query text through the plurality of third coding layers, and outputting the global class features corresponding to the classification indicator.
In a possible embodiment, the shared coding model further outputs semantic features of the query text, and the apparatus further includes, based on the apparatus composition of fig. 13:
and the entity identification module is used for carrying out entity identification on the query text based on the semantic features of the query text to obtain at least one entity object contained in the query text.
In one possible embodiment, the entity identification module is configured to:
carrying out full-connection processing on the semantic features of the query text to obtain full-connection identification features of the query text;
based on the full-connection identification feature, entity boundary position labels of a plurality of characters in the query text are obtained through prediction;
and dividing the at least one entity object from the plurality of characters based on the entity boundary position labels of the plurality of characters respectively.
In a possible embodiment, based on the apparatus composition of fig. 13, the apparatus further comprises:
and the query module is used for querying one or more hierarchy categories associated with the target category from a preset category table, wherein the hierarchy categories refer to upper-level categories or lower-level categories which respectively have association relations with the target category under different hierarchies.
All the above optional technical solutions can adopt any combination to form optional embodiments of the present disclosure, and are not described in detail herein.
It should be noted that: in the search request processing apparatus provided in the foregoing embodiment, when processing a search request, only the division of each function module is illustrated, and in practical applications, the function distribution can be completed by different function modules as needed, that is, the internal structure of the computer device is divided into different function modules to complete all or part of the functions described above. In addition, the processing apparatus for a search request and the processing method for a search request provided in the foregoing embodiments belong to the same concept, and specific implementation processes thereof are detailed in the processing method for a search request, and are not described herein again.
Fig. 14 is a schematic structural diagram of a computer device according to an embodiment of the present application. The computer device 1400 may vary greatly in configuration or performance, and includes one or more processors (CPUs) 1401 and one or more memories 1402, where the memory 1402 stores at least one computer program that is loaded and executed by the one or more processors 1401 to implement the processing method of the search request provided by the above embodiments. Optionally, the computer device 1400 further has components such as a wired or wireless network interface, a keyboard, and an input/output interface for performing input and output, and further includes other components for implementing device functions, which are not described herein again.
In an exemplary embodiment, a computer-readable storage medium, such as a memory including at least one computer program, which is executable by a processor in a terminal to perform the processing method of the search request in the above embodiments, is also provided. For example, the computer readable storage medium includes a ROM (Read-Only Memory), a RAM (Random-Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided that includes one or more computer programs stored in a computer readable storage medium. One or more processors of the computer device can read the one or more computer programs from the computer-readable storage medium, and the one or more processors execute the one or more computer programs, so that the computer device can execute the processing method for completing the search request in the above-described embodiments.
Those skilled in the art can understand that all or part of the steps for implementing the above embodiments can be implemented by hardware, or by a program instructing the relevant hardware; optionally, the program is stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.
The above description is only exemplary of the present application and should not be taken as limiting, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (20)

1. A method for processing a search request, the method comprising:
performing entity identification on a query text carried by a search request to obtain at least one entity object contained in the query text;
based on the at least one entity object, acquiring at least one entity query text corresponding to the query text, wherein the entity query text is formed by combining one or more entity objects;
respectively carrying out category prediction on the at least one entity query text to obtain candidate categories associated with the at least one entity query text;
and carrying out duplicate removal on the candidate categories respectively associated with the at least one entity query text to obtain the target categories associated with the query text.
2. The method of claim 1, wherein the performing entity identification on the query text carried in the search request to obtain at least one entity object included in the query text comprises:
extracting the features of a plurality of characters contained in the query text to obtain the semantic features of the query text;
and performing entity identification on the query text based on the semantic features of the query text to obtain the at least one entity object.
3. The method according to claim 2, wherein the extracting features of the plurality of characters included in the query text to obtain semantic features of the query text comprises:
performing word segmentation processing on the query text to obtain a plurality of characters contained in the query text;
extracting the characteristics of the characters to obtain the character characteristics of the query text;
inputting the character features of the query text into a plurality of first coding layers of an entity recognition model, coding the character features of the query text through the plurality of first coding layers, and outputting the semantic features of the query text, wherein the entity recognition model is used for carrying out entity recognition on the query text.
4. The method of claim 3, wherein the entity recognizing the query text based on the semantic features of the query text to obtain the at least one entity object comprises:
inputting the semantic features of the query text into a first full-connection layer of the entity recognition model, and performing full-connection processing on the semantic features of the query text through the first full-connection layer to obtain full-connection semantic features of the query text;
inputting the fully connected semantic features into a Conditional Random Field (CRF) layer of the entity recognition model, and predicting by the CRF layer to obtain entity boundary position labels of a plurality of characters in the query text;
and dividing the at least one entity object from the plurality of characters based on the entity boundary position labels of the plurality of characters.
5. The method of claim 4, wherein the predicting entity boundary position labels of each of a plurality of characters in the query text by the CRF layer comprises:
acquiring a plurality of candidate paths formed by a plurality of candidate boundary position labels corresponding to the characters;
respectively scoring the multiple candidate paths through the CRF layer to obtain respective path scores of the multiple candidate paths, wherein the path scores represent the possibility that candidate boundary position labels contained in corresponding candidate paths belong to the entity boundary position labels;
and determining a plurality of candidate boundary position labels contained in the candidate path with the highest path score as entity boundary position labels of the plurality of characters.
6. The method of claim 4, wherein the dividing the at least one entity object from the plurality of characters based on the entity boundary position labels of each of the plurality of characters comprises:
determining a start character and an end character of each of the at least one entity object based on an entity boundary position tag of each of the plurality of characters;
and dividing the at least one entity object from the plurality of characters based on the respective starting character and ending character of the at least one entity object.
7. The method of claim 1, wherein performing category prediction on the at least one entity query text respectively to obtain candidate categories associated with the at least one entity query text respectively comprises:
extracting character features of any entity query text in the at least one entity query text;
acquiring global entity semantic features of the entity query text based on the character features of the entity query text, wherein the global entity semantic features represent global semantics of all characters in the entity query text on a category prediction task;
and predicting to obtain a candidate category associated with the entity query text based on the global entity semantic features.
8. The method of claim 7, wherein the obtaining global entity semantic features of the entity query text based on the character features of the entity query text comprises:
inputting character features of the target classifier and character features of the entity query text into a plurality of second coding layers of a category prediction model, wherein the category prediction model is used for predicting candidate categories associated with the entity query text;
and coding the character features of the target classifier and the character features of the entity query text through the plurality of second coding layers, and outputting the global entity semantic features corresponding to the target classifier.
9. The method of claim 8, wherein predicting the candidate category associated with the entity query text based on the global entity semantic features comprises:
inputting the global entity semantic features into a plurality of second fully-connected layers of the category prediction model, and performing fully-connected processing on the global entity semantic features through the plurality of second fully-connected layers to obtain fully-connected entity semantic features of the entity query text;
carrying out nonlinear mapping on the semantic features of the fully-connected entity to obtain the respective prediction scores of the entity query text belonging to a plurality of leaf categories;
and determining the leaf category with the largest prediction score as the candidate category associated with the entity query text.
10. The method of claim 1, wherein after the candidate categories associated with the at least one entity query text are de-duplicated to obtain the target categories associated with the query text, the method further comprises:
and inquiring one or more hierarchy categories associated with the target category from a preset category table, wherein the hierarchy categories refer to upper-level or lower-level categories which respectively have association relations with the target category under different hierarchies.
11. A method for processing a search request, the method comprising:
inputting a query text of a search request into a shared coding model, extracting global category characteristics of the query text through the shared coding model, wherein the global category characteristics represent global semantics of each character in the query text on a category prediction task, and the shared coding model is used for coding the query text to obtain global category characteristics for category prediction and semantic characteristics for entity recognition;
performing full-connection processing on the global category characteristics to obtain full-connection category characteristics of the query text;
carrying out nonlinear mapping on the fully-connected category characteristics to obtain respective prediction scores of the query text belonging to a plurality of leaf categories;
and determining the leaf category with the largest prediction score as the target category associated with the query text.
12. The method of claim 11, wherein the inputting a query text of a search request into a shared coding model and extracting global category features of the query text through the shared coding model comprises:
performing word segmentation processing on the query text to obtain a plurality of characters contained in the query text;
extracting the characteristics of the characters to obtain the character characteristics of the query text;
inputting the character features of the classification indicator and the character features of the query text into a plurality of third coding layers of the shared coding model, coding the character features of the classification indicator and the character features of the query text through the plurality of third coding layers, and outputting the global class features corresponding to the classification indicator.
13. The method of claim 11, wherein the shared coding model further outputs semantic features of the query text, the method further comprising:
and performing entity identification on the query text based on the semantic features of the query text to obtain at least one entity object contained in the query text.
14. The method of claim 13, wherein the performing entity identification on the query text based on the semantic features of the query text to obtain at least one entity object included in the query text comprises:
performing full-connection processing on the semantic features of the query text to obtain full-connection identification features of the query text;
based on the full-connection identification features, entity boundary position labels of a plurality of characters in the query text are obtained through prediction;
and dividing the at least one entity object from the plurality of characters based on the entity boundary position labels of the plurality of characters respectively.
15. The method of claim 11, wherein after determining the leaf category with the highest predicted score as the target category associated with the query text, the method further comprises:
and inquiring one or more hierarchy categories associated with the target category from a preset category table, wherein the hierarchy categories refer to upper-level or lower-level categories which respectively have association relations with the target category under different hierarchies.
16. An apparatus for processing a search request, the apparatus comprising:
the entity identification module is used for carrying out entity identification on the query text carried by the search request to obtain at least one entity object contained in the query text;
the obtaining module is used for obtaining at least one entity query text corresponding to the query text based on the at least one entity object, and the entity query text is formed by combining one or more entity objects;
the category prediction module is used for performing category prediction on the at least one entity query text respectively to obtain candidate categories associated with the at least one entity query text respectively;
and the duplication removing module is used for carrying out duplication removal on the candidate categories associated with the at least one entity query text respectively to obtain the target categories associated with the query text.
17. An apparatus for processing a search request, the apparatus comprising:
the feature extraction module is used for inputting a query text of a search request into a shared coding model, extracting global category features of the query text through the shared coding model, wherein the global category features represent global semantics of characters in the query text on a category prediction task, and the shared coding model is used for coding the query text to obtain global category features for category prediction and semantic features for entity recognition;
a full connection module for performing full connection processing on the global category features to obtain the full-connection category features of the query text;
the mapping module is used for carrying out nonlinear mapping on the fully-connected category characteristics to obtain respective prediction scores of the query text belonging to a plurality of leaf categories;
and the determining module is used for determining the leaf category with the maximum prediction score as the target category associated with the query text.
18. A computer device, characterized in that the computer device comprises one or more processors and one or more memories in which at least one computer program is stored, the at least one computer program being loaded and executed by the one or more processors to implement the method of processing a search request according to any one of claims 1 to 10 or 11 to 15.
19. A storage medium having stored therein at least one computer program which is loaded and executed by a processor to implement a method of processing a search request according to any one of claims 1 to 10 or 11 to 15.
20. A computer program product, characterized in that it comprises at least one computer program which is loaded and executed by a processor to implement a method of processing a search request according to any one of claims 1 to 10 or 11 to 15.
CN202210959045.6A 2022-08-10 2022-08-10 Search request processing method and device, computer equipment and storage medium Pending CN115329176A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210959045.6A CN115329176A (en) 2022-08-10 2022-08-10 Search request processing method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210959045.6A CN115329176A (en) 2022-08-10 2022-08-10 Search request processing method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115329176A true CN115329176A (en) 2022-11-11

Family

ID=83922237

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210959045.6A Pending CN115329176A (en) 2022-08-10 2022-08-10 Search request processing method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115329176A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115982338A (en) * 2023-02-24 2023-04-18 中国测绘科学研究院 Query path ordering-based domain knowledge graph question-answering method and system
CN117316159A (en) * 2023-11-30 2023-12-29 深圳市天之眼高新科技有限公司 Vehicle voice control method, device, equipment and storage medium
CN117316159B (en) * 2023-11-30 2024-01-26 深圳市天之眼高新科技有限公司 Vehicle voice control method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
WO2021027533A1 (en) Text semantic recognition method and apparatus, computer device, and storage medium
CN110427461B (en) Intelligent question and answer information processing method, electronic equipment and computer readable storage medium
CN107085581B (en) Short text classification method and device
CN112800170A (en) Question matching method and device and question reply method and device
CN115329176A (en) Search request processing method and device, computer equipment and storage medium
CN111221944A (en) Text intention recognition method, device, equipment and storage medium
CN112131876A (en) Method and system for determining standard problem based on similarity
CN111666400B (en) Message acquisition method, device, computer equipment and storage medium
CN111460783B (en) Data processing method and device, computer equipment and storage medium
CN113282711B (en) Internet of vehicles text matching method and device, electronic equipment and storage medium
CN113761868B (en) Text processing method, text processing device, electronic equipment and readable storage medium
CN113553412A (en) Question and answer processing method and device, electronic equipment and storage medium
CN111858898A (en) Text processing method and device based on artificial intelligence and electronic equipment
CN115795038B (en) Intent recognition method and device based on localization deep learning framework
CN113722492A (en) Intention identification method and device
CN112364664A (en) Method and device for training intention recognition model and intention recognition and storage medium
CN115408488A (en) Segmentation method and system for novel scene text
US11361031B2 (en) Dynamic linguistic assessment and measurement
CN114491079A (en) Knowledge graph construction and query method, device, equipment and medium
CN114722198A (en) Method, system and related device for determining product classification code
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN116484105B (en) Service processing method, device, computer equipment, storage medium and program product
WO2023134085A1 (en) Question answer prediction method and prediction apparatus, electronic device, and storage medium
CN117216617A (en) Text classification model training method, device, computer equipment and storage medium
CN115203372A (en) Text intention classification method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40075674

Country of ref document: HK