CN113821588A - Text processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113821588A
Authority
CN
China
Prior art keywords
text
query
matched
matrix
participles
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110614403.5A
Other languages
Chinese (zh)
Inventor
毛铁峥
赵子元
颜强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202110614403.5A
Publication of CN113821588A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Abstract

The application relates to the technical field of artificial intelligence, and discloses a text processing method, a text processing device, electronic equipment and a storage medium, wherein the method comprises the following steps: determining a matching weight matrix of the query text relative to the text to be matched, wherein the matching weight matrix comprises at least one of a first weight matrix and a second weight matrix; according to the matching weight matrix, enhancing the similarity matrix of the query text relative to the text to be matched to obtain a first similarity matrix, wherein the similarity matrix of the query text relative to the text to be matched is obtained by performing similarity calculation according to the word vector of each participle in the query text and the word vector of each participle in the text to be matched; determining a matching degree score between the query text and the text to be matched according to the first similarity matrix; and determining a target matching text according to the matching degree score between the query text and the text to be matched. The text matching accuracy can be effectively improved through the scheme.

Description

Text processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a text processing method and apparatus, an electronic device, and a storage medium.
Background
Text matching is widely applied in resource query scenarios, such as news information query and paper query. In practice, the accuracy of text matching is often low, which in turn lowers the efficiency of resource queries. How to improve the accuracy of text matching is therefore a technical problem to be solved urgently in the prior art.
Disclosure of Invention
The embodiment of the application provides a text processing method and device, electronic equipment and a storage medium, and aims to solve the problem of low text matching accuracy.
According to an aspect of an embodiment of the present application, there is provided a text processing method, including: determining a matching weight matrix of a query text relative to a text to be matched, wherein the matching weight matrix comprises at least one of a first weight matrix and a second weight matrix, the first weight matrix is determined according to the participle category to which each participle in the query text belongs and the participle category to which each participle in the text to be matched belongs, and the second weight matrix is determined according to the association relationship between the participles in the query text and the participles in the text to be matched; enhancing, according to the matching weight matrix, a similarity matrix of the query text relative to the text to be matched to obtain a first similarity matrix, wherein the similarity matrix is obtained by performing similarity calculation on the word vector of each participle in the query text and the word vector of each participle in the text to be matched; determining a matching degree score between the query text and the text to be matched according to the first similarity matrix; and determining a target matching text according to the matching degree score between the query text and the text to be matched.
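The similarity matrix referred to in the method can be illustrated with a minimal sketch. The text only states that similarity is calculated from word vectors; the use of cosine similarity here, the function names, and the toy two-dimensional vectors are assumptions made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def similarity_matrix(query_vecs, doc_vecs):
    """Rows index participles of the query text; columns index participles
    of the text to be matched."""
    return [[cosine(q, d) for d in doc_vecs] for q in query_vecs]


# Toy word vectors (assumed values, not taken from the application).
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[1.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
sim = similarity_matrix(query_vecs, doc_vecs)
```

The result has one row per query participle and one column per participle of the text to be matched, which is the shape the later weighting and pooling steps operate on.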
According to an aspect of an embodiment of the present application, there is provided a text processing apparatus including: a matching weight matrix determining module, used for determining a matching weight matrix of the query text relative to the text to be matched, wherein the matching weight matrix comprises at least one of a first weight matrix and a second weight matrix, the first weight matrix is determined according to the participle category to which each participle in the query text belongs and the participle category to which each participle in the text to be matched belongs, and the second weight matrix is determined according to the association relationship between each participle in the query text and each participle in the text to be matched; an enhancement module, used for enhancing the similarity matrix of the query text relative to the text to be matched according to the matching weight matrix to obtain a first similarity matrix, wherein the similarity matrix is obtained by performing similarity calculation on the word vector of each participle in the query text and the word vector of each participle in the text to be matched; a matching degree score determining module, used for determining a matching degree score between the query text and the text to be matched according to the first similarity matrix; and a target matching text determining module, used for determining a target matching text according to the matching degree score between the query text and the text to be matched.
In some embodiments of the present application, based on the foregoing scheme, the matching weight matrix comprises a first weight matrix, and the matching weight matrix determining module comprises: a participle category identification unit, used for identifying the participle category to which each participle in the query text belongs; a first weight determining unit, used for determining the first weight of each participle in the query text relative to each participle in the text to be matched according to the participle category to which each participle in the query text belongs, the participle category to which each participle in the text to be matched belongs, and weight mapping information, wherein the weight mapping information indicates the first weight associated with any two participle categories; and a first weight matrix determining unit, used for combining the first weights of the participles in the query text relative to the participles in the text to be matched to obtain the first weight matrix.
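As a rough sketch of how weight mapping information could yield a first weight matrix, the snippet below looks up a weight for every pair of participle categories. The category names, the weight values, the neutral default of 1.0, and the symmetric lookup are all hypothetical choices for illustration and do not come from the application.

```python
# Hypothetical weight mapping information:
# (category of query participle, category of matched-text participle) -> first weight.
WEIGHT_MAP = {
    ("brand", "brand"): 2.0,
    ("brand", "product"): 1.5,
    ("product", "product"): 2.0,
}
DEFAULT_WEIGHT = 1.0  # assumed neutral weight for category pairs not in the mapping


def first_weight_matrix(query_cats, doc_cats, weight_map=WEIGHT_MAP,
                        default=DEFAULT_WEIGHT):
    """Build one first weight per (query participle, matched-text participle) pair."""
    def lookup(a, b):
        # Assumption: the mapping is symmetric, so both orders are tried.
        return weight_map.get((a, b), weight_map.get((b, a), default))

    return [[lookup(qc, dc) for dc in doc_cats] for qc in query_cats]


# Categories of the participles in the query text and in the text to be matched.
w1 = first_weight_matrix(["brand", "other"], ["product", "brand", "other"])
```

Using 1.0 as the default keeps unlisted category pairs unchanged if the matrix is later multiplied into the similarity matrix.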
In some embodiments of the present application, based on the foregoing scheme, the participle category identification unit includes: a first entity link information obtaining unit, configured to obtain first entity link information, where the first entity link information is obtained by performing entity linking in a knowledge graph on each participle in the query text; and a participle category determining unit, configured to take the participle category to which the first entity, to which a participle in the query text is linked, belongs in the knowledge graph as the participle category to which that participle belongs.
In other embodiments of the present application, based on the foregoing scheme, the matching weight matrix comprises a second weight matrix, and the matching weight matrix determining module comprises: an association relationship identifying unit, used for identifying the association relationship between the participles in the query text and the participles in the text to be matched according to a knowledge graph; a second weight determining unit, used for performing a weight lookup according to the association relationship to obtain the second weight of each participle in the query text relative to each participle in the text to be matched; and a second weight matrix determining unit, used for combining the second weights of the participles in the query text relative to the participles in the text to be matched to obtain the second weight matrix.
In some embodiments of the present application, based on the foregoing scheme, the association relationship identifying unit includes: a first entity link information obtaining unit, configured to obtain first entity link information, where the first entity link information is used to indicate a first entity to which a participle in the query text is linked on the knowledge graph; a second entity link information obtaining unit, configured to obtain second entity link information, where the second entity link information is used to indicate a second entity to which a participle in the text to be matched is linked on the knowledge graph; and an association relationship determining unit, used for determining the association relationship between the first entity and the second entity in the knowledge graph as the association relationship between the corresponding participle in the query text and the corresponding participle in the text to be matched.
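The second weight matrix described above can be sketched in three steps: entity linking, relationship lookup, and weight lookup. The entity-link table, the toy knowledge-graph edges, the relationship labels, and the weight values below are hypothetical stand-ins; a real system would obtain them from an entity linker and a knowledge graph.

```python
# Hypothetical entity-link table: participle -> knowledge-graph entity ID.
ENTITY_LINKS = {"singer": "E_singer", "song": "E_song", "album": "E_album"}

# Hypothetical knowledge-graph edges: (entity, entity) -> relationship label.
KG_EDGES = {
    ("E_singer", "E_song"): "performs",
    ("E_song", "E_album"): "included_in",
}

# Hypothetical weight lookup table keyed by relationship label.
RELATION_WEIGHTS = {"performs": 2.0, "included_in": 1.5}
NO_RELATION_WEIGHT = 1.0  # assumed neutral weight when no relationship is found


def association(q_token, d_token):
    """Return the relationship label between the two linked entities, or None."""
    e1 = ENTITY_LINKS.get(q_token)
    e2 = ENTITY_LINKS.get(d_token)
    if e1 is None or e2 is None:
        return None
    # Assumption: edges are treated as undirected for the lookup.
    return KG_EDGES.get((e1, e2)) or KG_EDGES.get((e2, e1))


def second_weight_matrix(query_tokens, doc_tokens):
    """One second weight per (query participle, matched-text participle) pair."""
    return [
        [RELATION_WEIGHTS.get(association(q, d), NO_RELATION_WEIGHT)
         for d in doc_tokens]
        for q in query_tokens
    ]


w2 = second_weight_matrix(["singer", "hello"], ["song", "album", "singer"])
```

Participles that cannot be linked to any entity, such as "hello" here, fall back to the neutral weight.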
In some embodiments of the present application, based on the foregoing, the enhancement module is further configured to: and multiplying the matching weight matrix and the similarity matrix to obtain the first similarity matrix.
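A minimal sketch of this enhancement step follows. The text says only that the two matrices are multiplied; reading this as an element-wise (Hadamard) product of two equally shaped matrices is an assumption, as are the function name and the sample values.

```python
def enhance(similarity, weights):
    """Element-wise product of a similarity matrix and a matching weight matrix.

    Assumption: both matrices have shape
    (query participles) x (participles of the text to be matched).
    """
    if len(similarity) != len(weights) or any(
        len(s_row) != len(w_row) for s_row, w_row in zip(similarity, weights)
    ):
        raise ValueError("similarity and weight matrices must have the same shape")
    return [
        [s * w for s, w in zip(s_row, w_row)]
        for s_row, w_row in zip(similarity, weights)
    ]


# Sample values chosen for illustration only.
similarity = [[0.5, 0.25], [0.125, 0.75]]
weights = [[2.0, 1.0], [1.0, 1.5]]
first_similarity = enhance(similarity, weights)
```

Entries with a weight above 1.0 are amplified, while entries with a weight of 1.0 pass through unchanged.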
In some embodiments of the present application, based on the foregoing scheme, the matching score determining module includes: the pooling processing unit is used for pooling the first similarity matrix to obtain a second similarity matrix; and the matching degree score calculating unit is used for calculating the matching degree score between the query text and the text to be matched according to the second similarity matrix.
In some embodiments of the present application, based on the foregoing scheme, the matching degree score calculating unit includes: the attention weighting unit is used for carrying out attention weighting on the second similarity matrix based on an attention mechanism to obtain a third similarity matrix; and the score prediction unit is used for performing score prediction according to the third similarity matrix to obtain a matching degree score between the query text and the text to be matched.
In some embodiments of the present application, based on the foregoing scheme, the attention weighting unit includes: a key matrix determining unit, used for performing linear transformation on the word vector of each participle in the query text according to the key weight vector to obtain the key vector corresponding to each participle in the query text; a query matrix determining unit, used for performing linear transformation on the semantic feature vector corresponding to the query text according to the query weight vector to obtain a query vector; an attention score determining unit, used for calculating the attention score corresponding to each participle in the query text according to the query vector and the key vector corresponding to that participle; a target similarity vector determining unit, configured to weight the value vector corresponding to each participle in the query text by the attention score corresponding to that participle to obtain the target similarity vector corresponding to that participle, wherein the value vector corresponding to a participle in the query text is obtained by performing linear transformation, according to the value weight vector, on the similarity vector corresponding to that participle, and the similarity vector corresponding to a participle is obtained by extracting from the second similarity matrix the elements related to that participle and combining the extracted elements; and a third similarity matrix determining unit, used for combining the target similarity vectors corresponding to the participles in the query text to obtain the third similarity matrix.
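The attention weighting described above can be sketched as follows. Several details are assumptions rather than statements of the application: the key, query, and value weights are modelled as small matrices (`W_k`, `W_q`, `W_v`), the attention score is a dot product of key and query normalized with softmax, and all input values are toy numbers.

```python
import math


def matvec(W, x):
    """Linear transformation: multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(xs):
    """Numerically stable softmax over a list of raw scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weight(word_vecs, semantic_vec, sim_vecs, W_k, W_q, W_v):
    """Return the third similarity matrix, one target similarity vector per
    participle of the query text."""
    keys = [matvec(W_k, w) for w in word_vecs]      # key vector per participle
    query = matvec(W_q, semantic_vec)               # single query vector
    raw = [sum(k * q for k, q in zip(key, query)) for key in keys]
    scores = softmax(raw)                           # attention score per participle
    values = [matvec(W_v, s) for s in sim_vecs]     # value vector per participle
    return [[a * v for v in value] for a, value in zip(scores, values)]


identity = [[1.0, 0.0], [0.0, 1.0]]
word_vecs = [[1.0, 0.0], [0.0, 1.0]]   # word vectors of two query participles
sim_vecs = [[0.5, 0.2], [0.4, 0.8]]    # similarity vectors taken from the pooled matrix
third = attention_weight(word_vecs, [1.0, 1.0], sim_vecs,
                         identity, identity, identity)
```

With identity weights and a semantic feature vector equally aligned with both participles, each participle receives an attention score of 0.5, so every similarity vector is simply halved.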
In some embodiments of the present application, based on the foregoing scheme, the second similarity matrix comprises a first pooling matrix and a second pooling matrix, and the pooling processing unit includes: a first pooling processing unit, used for pooling the first similarity matrix along the transverse direction of the first similarity matrix to obtain the first pooling matrix; and a second pooling processing unit, used for pooling the first similarity matrix along the longitudinal direction of the first similarity matrix to obtain the second pooling matrix.
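The two pooling directions can be illustrated with a short sketch. The pooling operator is not specified in this passage; max pooling is assumed here, "transverse" is read as pooling across each row, and "longitudinal" as pooling down each column. The function names and sample values are hypothetical.

```python
def pool_transverse(matrix):
    """Max-pool across each row: one value per participle of the query text."""
    return [max(row) for row in matrix]


def pool_longitudinal(matrix):
    """Max-pool down each column: one value per participle of the text to be matched."""
    return [max(col) for col in zip(*matrix)]


# Sample first similarity matrix (assumed values).
first_similarity = [[1.0, 0.25, 0.5], [0.125, 1.125, 0.0]]
first_pooled = pool_transverse(first_similarity)
second_pooled = pool_longitudinal(first_similarity)
```

Under these assumptions, the first pooled result summarizes how well each query participle is matched, and the second summarizes how well each participle of the text to be matched is covered by the query.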
In some embodiments of the present application, based on the foregoing solution, the target matching text determining module includes: a sorting unit, used for sorting the texts to be matched in descending order of matching degree score; and a target matching text determining unit, used for determining a preset number of top-ranked texts to be matched in the sorted order as target matching texts.
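The sorting and selection step amounts to a top-N selection, sketched below. The function name, the text identifiers, and the score values are hypothetical.

```python
def top_matches(scores, preset_number):
    """Sort texts to be matched by matching degree score, largest first,
    and keep the first `preset_number` of them."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [text for text, _ in ranked[:preset_number]]


# Matching degree scores per text to be matched (assumed values).
scores = {"text_a": 0.42, "text_b": 0.91, "text_c": 0.67}
targets = top_matches(scores, 2)
```

If the preset number exceeds the number of texts to be matched, the slice simply returns all of them in ranked order.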
In some embodiments of the present application, based on the foregoing solution, the text processing apparatus further includes: a service query request receiving module, configured to receive a service query request sent by a client, where the service query request indicates the query text; and further comprising: the application information acquisition module is used for acquiring application information of a target service application, wherein the target service application refers to a service application corresponding to the target matching text; the query result generation module is used for generating a query result according to the application information; and the query result returning module is used for returning the query result to the client so that the client displays the service entrance of the target service application according to the query result.
According to an aspect of an embodiment of the present application, there is provided an electronic device including: a processor; a memory having computer readable instructions stored thereon which, when executed by the processor, implement a text processing method as described above.
According to an aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon computer-readable instructions which, when executed by a processor, implement a text processing method as described above.
According to the scheme of the present application, the similarity matrix of the query text relative to the text to be matched is enhanced by a matching weight matrix of the query text relative to the text to be matched. The matching weight matrix comprises at least one of a first weight matrix and a second weight matrix, wherein the first weight matrix is determined according to the participle category to which each participle in the query text belongs and the participle category to which each participle in the text to be matched belongs, and the second weight matrix is determined according to the association relationship between the participles in the query text and the participles in the text to be matched. Therefore, in calculating the matching degree score between the query text and the text to be matched, in addition to the similarity calculated from the word vectors, the association relationship between the participles in the query text and the participles in the text to be matched is taken into account, and/or the participle categories to which the participles in the query text and the participles in the text to be matched belong are taken into account.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. It is obvious that the drawings in the following description are only some embodiments of the application, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1A shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied.
FIG. 1B is a schematic diagram illustrating a query interface, according to an embodiment of the present application.
FIG. 1C is a schematic diagram illustrating a secondary query interface for a "music" option, according to an embodiment of the present application.
FIG. 2 is a flow diagram illustrating a text processing method according to one embodiment of the present application.
Fig. 3 is a flow chart illustrating step 210 in the corresponding embodiment of fig. 2 according to an embodiment of the present application.
FIG. 4 is a diagram illustrating recognition, based on a knowledge graph, of participle categories and of the association relationship between two participles, according to an embodiment of the present application.
Fig. 5 is a flow chart illustrating step 210 in the corresponding embodiment of fig. 2 according to another embodiment of the present application.
Fig. 6 is a flowchart illustrating step 230 in the corresponding embodiment of fig. 2 according to an embodiment of the present application.
Fig. 7 is a flowchart illustrating step 620 in the corresponding embodiment of fig. 6 according to an embodiment of the present application.
Fig. 8 is a diagram illustrating attention weighting based on an attention mechanism according to an embodiment of the present application.
Fig. 9 is a flowchart illustrating a text processing method according to another embodiment of the present application.
Fig. 10 is a model diagram illustrating a text matching model according to an embodiment of the application.
Fig. 11 is a block diagram of a text processing apparatus according to an embodiment of the present application.
FIG. 12 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the subject matter of the present application can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the application.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It should be noted that reference herein to "a plurality" means two or more. "And/or" describes the association relationship of the associated objects and indicates that three relationships are possible; for example, A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates that the objects before and after it are in an "or" relationship.
Before the embodiments of the present application are described further, the terms and expressions used in the embodiments are explained.
Vertical search: a specialized search engine for a particular industry, which is a subdivision and extension of the general search engine. It integrates a certain type of specialized information in a library, extracts the required data by targeted, field-by-field extraction, processes the data, and returns it to the user in a certain form. Vertical search can provide valuable information and related services for a particular domain, a particular group of people, or a particular need. The public number search and applet search provided in social application platforms such as WeChat may be regarded as vertical searches.
Service search: a search that takes services as the retrieval target, so that a service matching the query text input by the user can be displayed to the user directly. For example, when a user searches for a caregiver, the service search may directly provide a caregiver service menu; for another example, when a user searches for express delivery, the service search may directly provide a service entry for the express delivery service, and the user may trigger the service entry to go straight to a page for sending a parcel. Service search is a kind of vertical search.
Artificial Intelligence (AI) is a theory, method, technique and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human Intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technique of computer science that attempts to understand the essence of intelligence and produce a new intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence is the research of the design principle and the realization method of various intelligent machines, so that the machines have the functions of perception, reasoning and decision making.
Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly comprises computer vision, speech processing, natural language processing, and machine learning/deep learning.
Natural Language Processing (NLP) is a discipline that uses computer technology to analyze, understand, and process natural language. It treats the computer as a powerful tool for language research, studies language information quantitatively with the support of the computer, and provides language descriptions that can be used by both humans and computers. It comprises two parts: Natural Language Understanding (NLU) and Natural Language Generation (NLG).
Text matching is widely applied to scenes such as resource query (or resource retrieval), content recommendation, intelligent question answering and the like. An important link in text matching is natural language understanding, and text matching is performed on the basis of understanding the semantics of texts by computer equipment through natural language understanding technology.
The related art suffers from low text matching accuracy. In a resource retrieval scenario, if the text matching accuracy is low, a user has to spend more time screening resources from the retrieval result, or cannot obtain the required resources from the retrieval result at all. The scheme of the present application is proposed to solve this problem.
Fig. 1A shows a schematic diagram of an exemplary system architecture to which the technical solutions of the embodiments of the present application can be applied. As shown in fig. 1A, the system architecture may include a terminal device (e.g., one or more of a smartphone 101, a tablet 102, and a portable computer 103 shown in fig. 1A, but may also be a desktop computer, etc.), a network 104, and a server 105. The network 104 serves as a medium for providing communication links between terminal devices and the server 105. Network 104 may include various connection types, such as wired communication links, wireless communication links, and so forth.
It should be understood that the number of terminal devices, networks, and servers in FIG. 1A are merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, server 105 may be a server cluster comprised of multiple servers, or the like.
In some embodiments of the present application, the text processing method may be executed by the server 105. The user inputs the query text through the terminal device and initiates a query request to the server 105. After receiving the query request, the server 105 extracts the query text from it, determines the matching degree score between the query text and each text to be matched according to the method of the present application, selects the texts to be matched that have higher matching degree scores as target matching texts, generates a query result according to the target matching texts, and returns the query result to the terminal device.
It will be appreciated that the terminal device provides a query entry for the user to enter query text, and upon detecting that query text has been entered at the query entry, the terminal device may initiate a query request to the server based on the query text.
In some embodiments of the present application, in a scenario where multiple data sources are provided in a user interface of a terminal device, a user may further perform data source selection, where the selected data source is used as a target data source. After receiving the query request sent by the terminal device, the server 105 determines matching degree scores of each text to be matched and the query text in the target data source selected by the user, and further determines a query result corresponding to the query text based on the text to be matched in the target data source.
In an application scenario of the present application, the query text may be text input for performing a video query, an image query, an audio query, a text content (e.g., news, blogs, papers, articles on the public number, etc.) query, a service application query, a commodity query, an applet query, a public number query, an advertisement query, an emoticon query. One or more words may be included in the query text.
In each query scenario, a resource description text corresponding to each resource is correspondingly provided in the data source, the resource description text is a text to be matched in the scheme of the application, and the resource includes, for example, the video, the image, the audio, the text content, the applet, the service application, the public number, the expression and the like listed above. The resource description text refers to a text for describing a corresponding resource, such as a video brief description corresponding to a video, an image brief description text corresponding to an image, an audio brief description text corresponding to an audio, lyrics and/or a song name corresponding to a song, etc., a service brief description text of a service application, etc.; the resource description text may also be text content of other aspects, for example, one or more items of text labels labeled for the resources, names of the resources, evaluation content of other users for the resources, and the like, which are not specifically limited herein.
Certainly, the types of resources to be queried are different, and after the server 105 determines the target matching text corresponding to the query text, the query results returned to the terminal device are correspondingly different. For example, if the query request is initiated for querying a service application, after determining a target matching text corresponding to the query text, generating a query result according to the service application associated with the target matching text, where the query result is used to indicate the service application associated with the target matching text; if the query request is initiated for querying the video, the query result is used for indicating the video associated with the target matching text; if the query request is initiated for a query for goods, the query result is used to indicate the goods associated with the target matching text.
In some embodiments of the present application, the query entry provided in the query interface may be used for querying one type of resource, and may also be used for querying multiple types of resources. For example, a query for applets, service applications, emoticons, and public numbers may be made based on query text entered in a query portal.
FIG. 1B is a schematic diagram illustrating a query interface, according to a particular embodiment. As shown in fig. 1B, a query entry 111 is provided in the query interface, and query text can be entered at the query entry 111. The query text can be obtained by directly inputting the text, or by converting the input voice into the text, or by performing character recognition on characters in the input image.
The interface shown in FIG. 1B also includes a resource type selection area 112 in which a plurality of resource type options are provided for selection. The resource type options in the resource type selection area 112 shown in FIG. 1B include an "article" option, a "public number" option, an "applet" option, a "music" option, an "emoticon" option, and a "service" option. It should be noted that the resource type options shown in the resource type selection area in fig. 1B are merely examples; in other embodiments, the resource type selection area may include more or fewer resource type options, which may be set according to actual needs. The resource type options allow the user to select a resource type according to the actual query requirement, so that a targeted query is performed in the resource library corresponding to the selected resource type. The query result then does not include resources from the resource libraries corresponding to unselected resource types, so the user can quickly obtain the required resources from the query result, which improves resource query efficiency. A resource query performed in the resource library corresponding to the resource type selected by the user is a vertical search.
Setting a resource library for each resource type option, for example, setting an article resource library for an article option, where the article resource library includes a plurality of articles available for query and description texts corresponding to the articles; for another example, for a "service" option, a service application resource library is set, where the service application resource library includes multiple invocation interfaces of a service application that can be entered by a user and a description text corresponding to the service application, the invocation interface based on the service application can display an application entry of the service application in an interface of a query result, and the user can enter the interface of the service application by triggering the application entry of the service application.
If the user selects a resource type option in the resource type selection area 112, a target matching text corresponding to the query text is determined in the resource library corresponding to the selected resource type option, and then a target resource for the target matching text is determined. For example, if the user selects the option of "expression", the target matching text corresponding to the query text is determined in the resource library corresponding to the option of "expression", and then the expression image corresponding to the target matching text is determined.
In some embodiments of the present application, if the user selects a resource type option in the resource type selection area, the user may also jump from the query interface shown in fig. 1B to a secondary query interface for the selected resource type option. Fig. 1C shows a secondary query interface for the "music" option. If a user inputs a query text in the query entry 111 of the secondary query interface shown in fig. 1C, this corresponds to a request to perform text matching in the resource library corresponding to the "music" option (i.e., the music resource library) based on the query text, determine a target matching text that matches the query text, and further determine the music corresponding to the target matching text. Further, as shown in FIG. 1C, trending search content is also provided in the secondary query interface for selection by the user.
In some embodiments of the present application, if the user does not select a resource type in the query interface shown in fig. 1B, the query text may be matched against multiple resource libraries, so that the obtained query result may include resources of multiple resource types. For example, if the query text is "home appliance maintenance" and the user does not select a resource type in the query interface shown in fig. 1B, the query result may include matched resources such as articles, service applications, applets, and emoticons.
In other application scenarios, the query interface shown in fig. 1B may also be a query entry provided for a resource query of one set resource type or multiple set resource types, so that a user does not need to additionally select a resource type option, and correspondingly, a resource type selection area may not be set.
In other embodiments, the method of the present application may also be executed by a terminal device with computing processing capability, or a system composed of the terminal device and a server, and is not limited in particular herein.
The details of implementation of the technical solution of the embodiments of the present application are set forth in the following.
Fig. 2 is a flowchart illustrating a text processing method according to an embodiment of the present application, which may be executed by a computer device with processing capability, such as a server or a terminal device, and is not limited in detail herein. Referring to fig. 2, the method includes at least steps 210 to 240, which are described in detail as follows:
Step 210, determining a matching weight matrix of the query text relative to the text to be matched, wherein the matching weight matrix comprises at least one of a first weight matrix and a second weight matrix, the first weight matrix is determined according to the participle categories to which the participles (that is, the words obtained by word segmentation) in the query text belong and the participle categories to which the participles in the text to be matched belong, and the second weight matrix is determined according to the association relationship between the participles in the query text and the participles in the text to be matched.
Query text refers to text entered for resource queries. The query text may include one or more words; the query text may be a sentence or a piece of text according to the grammar rule, or may be a text formed by several independent phrases or word combinations. The query text can be obtained by directly inputting a text by a user, can also be obtained by converting voice input by the user into a text, and can also be obtained by performing character recognition on a picture input by the user.
In order to perform resource query for a user, a resource library is correspondingly provided, and the resource library comprises a plurality of resources. The text to be matched refers to resource description text for describing resources in the resource library. Each resource in the resource library corresponds to a resource description text. The resource description text corresponding to the resource may include one or more of a resource name, a resource profile, a resource source, a resource tag, evaluation content of the resource by other users, and the like.
The resource types are different, and the resource description texts corresponding to the resources may have differences. For example, if the resource type is a service type, the service description text corresponding to the service application may include a name of the service application and profile information of the service application; if the resource type is a type of a public article, the resource description text corresponding to the public article may be one or more of a brief description of the public article (e.g., a summary of the public article), an author of the public article, a tag of the public article (e.g., a keyword), a title of the article, and the like.
The matching weight matrix of the query text relative to the text to be matched indicates the matching weights of the participles in the query text relative to the participles in the text to be matched. Each element in the matching weight matrix represents the matching weight of one participle in the query text relative to one participle in the text to be matched.
In some embodiments of the present application, the first weight matrix may be used alone as the matching weight matrix, the second weight matrix may be used alone as the matching weight matrix, and both the first weight matrix and the second weight matrix may be used as the matching weight matrix.
In order to determine the first weight matrix or the second weight matrix of the query text relative to the text to be matched, the query text and the text to be matched need to be segmented, and the segmentation in the query text and the segmentation in the text to be matched are determined.
In some embodiments of the present application, the query text and the text to be matched may be segmented into words according to a dictionary. Resources in some professional fields, such as the medical field, the materials field, and the chemical field, involve many professional terms that differ greatly from everyday general vocabulary. To ensure the accuracy of word segmentation when retrieving resources related to such a field, a dictionary covering the professional terms that the field may involve can be constructed in advance, and the constructed dictionary is then used to segment the query text and the text to be matched that relate to that field.
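The text does not say which dictionary-based segmentation algorithm is used. As a minimal sketch, assuming forward maximum matching (one common dictionary-based technique, chosen here only for illustration), segmentation against a field-specific dictionary could look as follows. The function name `segment` and the sample dictionary are hypothetical.

```python
def segment(text, dictionary, max_len=None):
    """Forward maximum matching: at each position, take the longest
    dictionary word; fall back to a single character if none matches."""
    if max_len is None:
        max_len = max((len(word) for word in dictionary), default=1)
    words, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical dictionary for an unspaced input string.
DICTIONARY = {"home", "homeappliance", "clean", "cleaning"}
segments = segment("homeappliancecleaning", DICTIONARY)
```

Swapping in a different dictionary changes the segmentation without changing the code, which is the point of building a separate dictionary per professional field.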
An element in the first weight matrix represents a matching weight of a word in the query text relative to a word in the text to be matched (for the convenience of distinguishing, the matching weight represented in the first weight matrix is called a first weight); an element in the second weight matrix represents another matching weight of a word in the query text relative to a word in the text to be matched (the matching weight represented by the element in the second weight matrix is referred to as a second weight).
In some embodiments of the present application, a segmentation word pair may be constructed based on a segmentation word in a query text and a segmentation word in a text to be matched. Wherein, one word in the word segmentation pair is from the query text, and the other word is from the text to be matched; on the basis, the elements in the first weight matrix are determined based on the word segmentation categories to which the two word segmentations in the word segmentation pairs respectively belong, and the elements in the second weight matrix are determined based on the incidence relation between the two word segmentations in the word segmentation pairs.
In some embodiments of the present application, a mapping relationship between a word and a word category may be preset to obtain a word-word category mapping dataset; and then, performing word segmentation searching on the word segmentation-word segmentation class mapping data set aiming at the word segmentation in the query text, and taking the word segmentation class corresponding to the searched word segmentation as the word segmentation class to which the word segmentation in the query text belongs. The word segmentation category to which the word segmentation in the text to be matched belongs can also be determined according to the method, and details are not repeated here.
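The lookup described above can be sketched as a plain dictionary lookup. The mapping entries, the category labels, and the fallback category `"other"` are assumptions made for illustration; the text does not specify how words missing from the data set are handled.

```python
# Hypothetical word -> word category mapping data set.
WORD_CATEGORY = {
    "AcmeBrand": "brand",
    "air conditioner": "entity",
    "home appliance": "entity",
    "clean": "behavior",
    "to home": "status",
    "service": "behavior",
}


def categories_of(words, mapping=WORD_CATEGORY, default="other"):
    """Return the category of each segmented word; unknown words get `default`."""
    return [mapping.get(word, default) for word in words]


query_words = ["AcmeBrand", "home appliance", "clean"]
query_categories = categories_of(query_words)
```

The same function can be applied to the segmented words of the text to be matched.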
The word segmentation categories in the word-word category mapping data set can be set according to actual needs. In some embodiments of the present application, the word segmentation categories may include entity words, status words, behavior words, brand words, compound words, and the like. Entity words are words that represent entities, status words are words that represent states, behavior words are words that represent actions and behaviors, brand words are words that represent brand names, and compound words are phrases formed by combining at least two words. For example, under this setting, if the word segmentation result of a text is "Midea / air conditioner / clean / to home / service", then "Midea" is a brand word, "air conditioner" is an entity word, "clean" is a behavior word, "to home" is a status word, and "service" is a behavior word; if "to-home service" is treated as a single word during word segmentation, "to-home service" is a compound word.
The above is merely an exemplary example of the word segmentation categories, and in other embodiments, more or fewer word segmentation categories may be set according to other classification principles, for example, in the chemical field, entity words may be further classified into chemical substance name words, instrument name words, person name words, and the like.
In some embodiments of the present application, once the word segmentation categories are set, a first weight is further set for each category pair formed by any two word segmentation categories. On this basis, for each participle pair, the category pair formed by the categories to which its two participles belong is used to look up the corresponding first weight, and the first weight found is the first weight corresponding to that participle pair. Repeating this process for every participle pair determines each element in the first weight matrix.
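A minimal sketch of this lookup follows. Rows correspond to words of the query text and columns to words of the text to be matched. The weight values, the category labels, and the default weight of 1.0 for unlisted category pairs are all hypothetical.

```python
# Hypothetical first weights per (query category, candidate category) pair.
CATEGORY_PAIR_WEIGHT = {
    ("entity", "entity"): 1.5,
    ("behavior", "behavior"): 1.2,
    ("brand", "brand"): 1.3,
    ("entity", "behavior"): 0.6,
    ("behavior", "entity"): 0.6,
}


def first_weight_matrix(query_cats, candidate_cats,
                        pair_weights=CATEGORY_PAIR_WEIGHT, default=1.0):
    """One row per query word, one column per word of the text to be matched."""
    return [[pair_weights.get((q, c), default) for c in candidate_cats]
            for q in query_cats]


w1 = first_weight_matrix(
    ["brand", "entity", "behavior"],
    ["entity", "behavior", "status", "behavior"],
)
```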
In some embodiments of the present application, a second weight corresponding to each association relationship is preset. After the association relationship between the two participles in each participle pair is determined, the second weight corresponding to that association relationship is taken as the second weight corresponding to the participle pair. The second weights corresponding to all participle pairs constructed from the participles in the query text and the participles in the text to be matched are combined to obtain the second weight matrix.
The association relationship between two participles may be a relationship determined based on the semantics of the two participles, for example a hypernym-hyponym relationship, a synonym relationship, or an antonym relationship. The association relationship may also be determined based on a common collocation of the two participles, such as an expanded word relationship. An expanded word relationship means that two participles are semantically unrelated but are usually used together. For example, in "home appliance cleaning", the two participles "home appliance" and "cleaning" are not semantically related but are usually collocated, so they are in an expanded word relationship. If two participles have no semantic relationship and are not a common collocation, their association relationship can be set to "other". The association relationships listed above are merely examples and do not limit the scope of the application; in other embodiments, more or fewer association relationships may be set.
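The second weight matrix can be sketched in the same way, this time keyed by the association relationship of each word pair. The relation table, the weight values, and the use of an unordered pair as the key are assumptions for illustration; pairs that are not listed fall back to the "other" relationship.

```python
# Hypothetical second weights per association relationship.
RELATION_WEIGHT = {
    "synonym": 1.2,
    "hypernym": 0.7,
    "antonym": 0.2,
    "expanded": 0.5,
    "other": 1.0,
}

# Hypothetical known relationships, keyed by unordered word pair.
KNOWN_RELATIONS = {
    frozenset(("home appliance", "air conditioner")): "hypernym",
    frozenset(("rinse", "clean")): "synonym",
    frozenset(("home appliance", "clean")): "expanded",
}


def relation_of(word_a, word_b, relations=KNOWN_RELATIONS):
    """Return the association relationship of a word pair, or "other"."""
    return relations.get(frozenset((word_a, word_b)), "other")


def second_weight_matrix(query_words, candidate_words):
    """One row per query word, one column per word of the text to be matched."""
    return [[RELATION_WEIGHT[relation_of(q, c)] for c in candidate_words]
            for q in query_words]


w2 = second_weight_matrix(["home appliance", "rinse"],
                          ["air conditioner", "clean"])
```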
In some embodiments of the present application, to facilitate matrix calculation, the elements in the first weight matrix and the elements in the second weight matrix may be arranged in the same order, for example, the horizontal order in the first weight matrix is arranged according to the precedence order of the participles corresponding to the elements in the text to be matched, and the vertical order is arranged according to the precedence order of the participles corresponding to the elements in the query text. The elements in the second weight matrix are also arranged in the same order.
Step 220, enhancing, according to the matching weight matrix, the similarity matrix of the query text relative to the text to be matched to obtain a first similarity matrix, wherein the similarity matrix of the query text relative to the text to be matched is obtained by performing similarity calculation on the word vector of each participle in the query text and the word vector of each participle in the text to be matched.
A word vector (word embedding) maps a word or phrase to a vector of real numbers. Such a mapping may be generated by neural networks, probabilistic models, or interpretable knowledge base methods.
Before step 220, after the query text and the text to be matched are each segmented into words, the participles in the query text and the participles in the text to be matched are each mapped to a real-number vector space, yielding the word vectors of the participles in the query text and the word vectors of the participles in the text to be matched.
In some embodiments of the present application, the similarity matrix of the query text relative to the text to be matched may be determined in units of participle pairs (in each pair, one participle comes from the query text and the other from the text to be matched). Each element in the similarity matrix reflects the similarity between the word vectors of the two participles in one participle pair, and the similarity matrix between the query text and the text to be matched is obtained by combining these similarities over all participle pairs. The similarity between two word vectors may be measured by cosine similarity, Euclidean distance, or the like, and is not limited herein.
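As a minimal sketch, using cosine similarity (one of the measures named above) and small made-up word vectors, the similarity matrix can be computed pair by pair. Returning 0.0 for a zero-length vector is a choice made here, not something the text specifies.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(query_vecs, candidate_vecs):
    """Rows follow the query words; columns follow the words of the text to be matched."""
    return [[cosine(q, c) for c in candidate_vecs] for q in query_vecs]


# Hypothetical 2-dimensional word vectors.
sim = similarity_matrix(
    [[1.0, 0.0], [0.0, 2.0]],
    [[1.0, 0.0], [1.0, 1.0], [0.0, -1.0]],
)
```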
In some embodiments of the present application, step 220 further comprises: multiplying the matching weight matrix by the similarity matrix to obtain the first similarity matrix.
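The text does not state whether this multiplication is element-wise or a matrix product. Because the weight matrices and the similarity matrix are all indexed by the same word pairs and arranged in the same order, the sketch below assumes an element-wise (Hadamard) product, and it assumes that when both weight matrices are used they are applied one after the other. Both points are assumptions, and the function names are hypothetical.

```python
def hadamard(a, b):
    """Element-wise product of two matrices of identical shape."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("matrices must have the same shape")
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def enhance(similarity, *weight_matrices):
    """Apply each matching weight matrix to the similarity matrix in turn."""
    result = similarity
    for weights in weight_matrices:
        result = hadamard(result, weights)
    return result


first_similarity = enhance([[0.5, 1.0]], [[2.0, 0.5]], [[1.0, 4.0]])
```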
Step 230, determining a matching degree score between the query text and the text to be matched according to the first similarity matrix.
Like the similarity matrix, the first similarity matrix is indexed by participle pairs: each of its elements reflects the enhanced similarity of the two participles in one participle pair. After the first similarity matrix is obtained, the enhanced similarities of all participle pairs can be combined to calculate the matching degree score, so that the matching degree score directly and comprehensively reflects the matching degree between the query text and the text to be matched.
In some embodiments of the present application, classification prediction may be performed according to the first similarity matrix through a fully-connected network, and a matching degree score between the query text and the text to be matched is output.
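The architecture of the fully-connected network is not described. The sketch below assumes the first similarity matrix has a fixed size (for example after padding), flattens it, and passes it through one hidden layer with ReLU and a sigmoid output. The layer sizes and all weight values are hypothetical and untrained; they only show the data flow.

```python
import math


def dense(x, weights, bias):
    """One fully-connected layer: weights has one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def match_score(first_similarity, w_hidden, b_hidden, w_out, b_out):
    """Flatten the matrix, apply hidden layer + ReLU, then a sigmoid output."""
    x = [value for row in first_similarity for value in row]
    hidden = [max(0.0, h) for h in dense(x, w_hidden, b_hidden)]
    logit = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical 2x2 first similarity matrix and untrained parameters.
matrix = [[1.0, 0.0], [0.5, 2.0]]
w_hidden = [[0.1, 0.2, 0.3, 0.4], [-0.5, 0.0, 0.5, 0.0]]
score = match_score(matrix, w_hidden, [0.0, 0.0], [1.0, -1.0], 0.0)
```

In practice the parameters would be learned from labeled pairs of query texts and texts to be matched.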
Step 240, determining a target matching text according to the matching degree score between the query text and the text to be matched.
In some embodiments of the present application, in a scenario where there are multiple texts to be matched, the target matching text corresponding to the query text may be determined as follows: sorting the texts to be matched in descending order of matching degree score, and determining the first set number of texts to be matched in the sorted order as target matching texts. The set number may be chosen according to actual needs, for example 5, 10, 20, or 30, and is not particularly limited herein.
In some embodiments of the present application, a matching degree score threshold may also be set, and if the matching degree score between a text to be matched and the query text is not lower than the matching degree score threshold, the text to be matched is determined as the target matching text.
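The two selection strategies above (top set number, or score threshold) can be sketched as follows. The text identifiers and scores are made up for illustration.

```python
def top_k(scores, k):
    """Return the ids of the k texts with the highest matching degree scores."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [text_id for text_id, _ in ranked[:k]]


def above_threshold(scores, threshold):
    """Return the ids of texts whose score is not lower than the threshold."""
    return [text_id for text_id, s in scores.items() if s >= threshold]


# Hypothetical matching degree scores per text to be matched.
scores = {"doc_a": 0.91, "doc_b": 0.40, "doc_c": 0.77, "doc_d": 0.65}
best_two = top_k(scores, 2)
passing = above_threshold(scores, 0.65)
```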
In some embodiments of the present application, after determining the target matching text, a query result may be determined based on the target matching text, where the query result is used to indicate a resource corresponding to the target matching text, and the query result is returned to the user. If the query text is a text input for service application query, the resource corresponding to the target matching text is the service application associated with the target matching text; and if the query text is the text input for performing advertisement query, the resource corresponding to the target matching text is the advertisement associated with the target matching text.
In text matching in the related art, generally, a matching degree score between two texts is calculated directly based on the similarity of each word pair in the two texts (a query text and a text to be matched), that is, the matching degree score only refers to the similarity between word vectors of two words in the word pair.
The inventors of the present application found in practice that if two participles whose similarity is to be calculated co-occur frequently, for example as a conventional collocation, an existing word vector generation model treats them as semantically similar because of that high co-occurrence frequency, so the word vectors generated for them are also highly similar. In this case, if the matching degree score between two texts is calculated only from the similarity between the word vectors of the two participles in each participle pair, the inflated similarity of frequently co-occurring participles may cause the calculated score to deviate from the actual matching degree between the two texts, so that a text that does not actually match is identified as a matching text. The inventors also found that a hypernym-hyponym relationship between two participles can likewise introduce noise into text matching. In summary, the association relationship between the two participles in a participle pair affects their semantic matching degree, and this influence is not reflected in the similarity calculated from their word vectors.
The inventors of the present application further found that, in the related art, the matching degree score between two texts is calculated only from the similarity of each participle pair in the two texts, and no other factor is introduced. In fact, participle pairs belonging to different participle category pairs contribute differently to judging the matching degree of the two texts.
The scheme of the present application is proposed on the basis that both the association relationships between participles and the participle categories affect the noise in text matching.
According to the scheme, the similarity matrix of the query text relative to the text to be matched is enhanced by the matching weight matrix of the query text relative to the text to be matched. The matching weight matrix comprises at least one of a first weight matrix and a second weight matrix; the first weight matrix is determined according to the participle categories to which the participles in the query text belong and the participle categories to which the participles in the text to be matched belong, and the second weight matrix is determined according to the association relationship between the participles in the query text and the participles in the text to be matched. Therefore, when the matching degree score between the query text and the text to be matched is calculated, in addition to the similarity calculated from the word vectors, the association relationship between the participles in the query text and the participles in the text to be matched is taken into account, and/or the participle categories to which the participles in the query text and in the text to be matched belong are taken into account.
In some embodiments of the present application, the matching weight matrix comprises a first weight matrix. In this embodiment, as shown in fig. 3, step 210 includes:
Step 310, identifying the segmentation class to which each segmentation in the query text belongs.
In some embodiments of the present application, the segmentation class to which each segmentation in the query text belongs may be identified according to the segmentation-segmentation class data set constructed above.
In other embodiments of the present application, with continued reference to FIG. 3, step 310 further includes steps 311 and 312. In step 311, first entity link information is obtained, where the first entity link information is obtained by performing entity link on each participle in the query text in the knowledge graph.
The knowledge graph comprises a plurality of graph nodes which are mutually associated, each graph node represents a word, and the relationship between the graph nodes is used for indicating the association relationship between two words positioned at the two graph nodes; in the scheme of the application, the word classification to which the word belongs is further set for the word located in the graph node. Words located at graph nodes may also be referred to as entities.
The step of carrying out entity linkage on a participle in the knowledge graph refers to searching and determining an entity corresponding to the participle in the knowledge graph, wherein the determined entity is the entity linked to the participle on the knowledge graph. For ease of distinction, the entity to which the participles in the query text are linked on the knowledge-graph is referred to as the first entity.
The same name may refer to different entities (the same word has different meanings in different contexts; for example, "apple" may refer to the fruit or to Apple, the electronics company), and different names may refer to the same entity (the same thing may have multiple names; for example, two different words for potato refer to the same object). To avoid such ambiguity, entity linking is performed on the participles in the query text, so as to make clear which entity each participle in the query text refers to.
In some embodiments of the present application, since the knowledge graph records both the association relationships between entities and the participle category to which each entity belongs, the knowledge graph can be used to identify the participle category to which each participle in the query text and in the text to be matched belongs, and to identify the association relationship between a participle in the query text and a participle in the text to be matched.
FIG. 4 is a diagram illustrating the identification of association between a category of participles and two participles through a knowledge graph according to an embodiment of the present application. The knowledge graph shown in fig. 4 is constructed for "household category", and in other embodiments, the knowledge graph may be constructed for each category based on the performed category division, for example, the knowledge graph is constructed for beauty category and health care category.
In the knowledge graph shown in fig. 4, the participle categories include service entity words, service behavior words, service status words, and service brand words. A service entity word is an entity word related to a service; a service behavior word is a behavior word related to a service; a service status word is a status word related to a service; a service brand word is a brand word related to a service.
With continued reference to fig. 4, fig. 4 shows some of the entities in the knowledge graph (circled in fig. 4), including "home appliance", "air conditioner", "rinse", "clean", "Midea", and "to home". "Home appliance" and "air conditioner" are service entity words; "rinse" and "clean" are service behavior words; "to home" is a service status word; "Midea" is a service brand word. Further, as shown in fig. 4, the association relationship between the entity "home appliance" and the entity "air conditioner" is a hypernym-hyponym relationship, and the association relationship between the entity "rinse" and the entity "clean" is a synonym relationship.
Based on the knowledge graph shown in fig. 4, suppose the two texts whose matching degree score is to be calculated are text I and text II, where the word segmentation result of text I is "Midea / home appliance / rinse" and the word segmentation result of text II is "air conditioner / clean / to home / service". Entity linking is performed in the knowledge graph on the participles in text I. As shown in fig. 4, the participle "Midea" in text I is linked to the entity "Midea" in the knowledge graph, the participle "home appliance" in text I is linked to the entity "home appliance", and the participle "rinse" in text I is linked to the entity "rinse".
Entity linking is likewise performed in the knowledge graph on the participles in text II. As shown in fig. 4, the participle "air conditioner" in text II is linked to the entity "air conditioner" in the knowledge graph, the participle "clean" in text II is linked to the entity "clean", and the participle "to home" in text II is linked to the entity "to home". Fig. 4 shows only some of the entities in the knowledge graph, so not every entity linked to by a participle in text II is shown in fig. 4.
After entity linking has been performed in the knowledge graph for the participles in text I and text II, the participle category to which a participle belongs and the association relationship between two participles are determined based on the linked entities. For example, the participle category of the entity to which a participle is linked in the knowledge graph is taken as the participle category to which that participle belongs, and the association relationship between the entities to which two participles are linked in the knowledge graph is taken as the association relationship between the two participles.
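The lookups described for fig. 4 can be sketched with a tiny in-memory graph. The entity linking step is reduced here to an alias table plus exact match, which is an assumption; real entity linking also has to disambiguate by context. The category labels, edges, and aliases are illustrative only.

```python
# Hypothetical fragment of a household-category knowledge graph.
ENTITY_CATEGORY = {
    "home appliance": "service entity word",
    "air conditioner": "service entity word",
    "rinse": "service behavior word",
    "clean": "service behavior word",
    "to home": "service status word",
}
EDGES = {
    ("home appliance", "air conditioner"): "hypernym",
    ("rinse", "clean"): "synonym",
}
# Hypothetical surface forms that map to a graph entity.
ALIASES = {"air conditioning": "air conditioner", "cleaning": "clean"}


def link_entity(word):
    """Return the graph entity a segmented word links to, or None."""
    entity = ALIASES.get(word, word)
    return entity if entity in ENTITY_CATEGORY else None


def category_of(word):
    """Category of the linked entity, or "other" when the word is not linked."""
    return ENTITY_CATEGORY.get(link_entity(word), "other")


def relation_between(word_a, word_b):
    """Relationship between the two linked entities, in either direction."""
    a, b = link_entity(word_a), link_entity(word_b)
    return EDGES.get((a, b)) or EDGES.get((b, a)) or "other"
```

For example, `relation_between("rinse", "cleaning")` resolves "cleaning" to the entity "clean" and returns the synonym relationship stored on that edge.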
In some embodiments of the present application, since the entity to which a participle in the query text is linked on the knowledge graph is semantically the same as or similar to that participle, the linked entity may be used as reference information for generating the word vector of the participle. In other words, the word vector of the participle is determined by combining the participle with the entity to which it is linked on the knowledge graph.
In some embodiments of the present application, in order to facilitate construction of a word vector of each participle in the query text, a word vector of an entity to which a participle in the query text is linked in the knowledge graph may be used as the word vector of the participle.
In some embodiments of the present application, since there is a relatively close relationship between neighboring entities of an entity in the knowledge graph, in order to provide more reference information in generating a word vector of a participle, the neighboring entity of the entity linked to the participle on the knowledge graph may also be used as reference information for generating the word vector of the participle.
In some embodiments of the present application, the closer two entities are in the knowledge graph, the closer the relationship between them (for example, semantic similarity or collocated use), while the more reference information is used, the larger the amount of computation in generating the word vector. To balance the accuracy of the word vector against the amount of computation, the neighbor entity used as a reference for generating the word vector may be an entity directly adjacent to the first entity, which may also be referred to as a one-hop neighbor (1-hop neighbor) of the first entity.
In some embodiments of the present application, after determining a first entity to which a participle in a query text is linked in a knowledge graph, the first entity and a one-hop neighbor of the first entity may be spliced to a corresponding participle in the query text, so that the resulting spliced text (for convenience of distinction, the spliced text is referred to as a first spliced text) is input into a word vector generation model, and a word vector of each participle in the query text is determined by the word vector generation model according to the first spliced text.
In some embodiments of the application, since the participles in the query text are entity-linked on the knowledge graph first, and the linked first entity is identical in semantics or similar in semantics to the corresponding participles in the query text, the first entity to which the participles in the query text are linked on the knowledge graph may be used as a data basis for generating word vectors of the corresponding participles. Specifically, according to the sequence of each participle in the query text, the correspondingly linked first entities are spliced, a neighbor entity (for example, a one-hop neighbor) of the first entity is spliced to the corresponding first entity, the spliced text (for convenience of distinction, the spliced text is referred to as a second spliced text) obtained in the splicing process is input into a neural network, and the neural network outputs a word vector of each participle in the query text according to the second spliced text.
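The two splicing schemes described above can be sketched as follows. This is only an illustrative sketch: the dictionaries `ENTITY_LINKS` and `ONE_HOP_NEIGHBORS`, the neighbor "household appliance", the `/` separator and the function names are assumptions made for the example and are not specified by the application.

```python
from typing import Dict, List

# Hypothetical toy knowledge graph: participle -> linked first entity,
# and entity -> one-hop neighbor entities. All values are illustrative.
ENTITY_LINKS: Dict[str, str] = {
    "air conditioner": "air conditioner",
    "water leakage": "water leakage",
}
ONE_HOP_NEIGHBORS: Dict[str, List[str]] = {
    "air conditioner": ["household appliance"],
    "water leakage": ["water pipe maintenance"],
}


def first_spliced_text(participles: List[str], sep: str = "/") -> str:
    """Splice each participle with its first entity and that entity's one-hop neighbors."""
    parts: List[str] = []
    for word in participles:
        parts.append(word)
        entity = ENTITY_LINKS.get(word)
        if entity is not None:
            parts.append(entity)
            parts.extend(ONE_HOP_NEIGHBORS.get(entity, []))
    return sep.join(parts)


def second_spliced_text(participles: List[str], sep: str = "/") -> str:
    """Splice only the linked first entities, in participle order, with their neighbors."""
    parts: List[str] = []
    for word in participles:
        entity = ENTITY_LINKS.get(word)
        if entity is not None:
            parts.append(entity)
            parts.extend(ONE_HOP_NEIGHBORS.get(entity, []))
    return sep.join(parts)


query = ["air conditioner", "water leakage"]
print(first_spliced_text(query))
print(second_spliced_text(query))
```

In either case the resulting spliced text would then be passed to the word vector generation model; that model is outside the scope of this sketch.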
In the scenario of generating the word vector of each participle in the query text according to the first concatenated text or the second concatenated text as described above, the first entity link information may be the first concatenated text or the second concatenated text obtained based on entity linking. Of course, the word vector of each participle in the text to be matched may be generated according to the above generation method of the word vector of each participle in the query text, and is not described herein again.
Step 312, taking the participle category to which the first entity linked to a participle in the query text belongs in the knowledge graph as the participle category to which that participle in the query text belongs.
After the first entity to which each participle in the query text is linked on the knowledge graph is determined, because the linked first entity is semantically the same as or similar to the participle, and because the knowledge graph records the participle category to which each entity belongs, the participle category of the first entity can be used directly as the participle category to which the corresponding participle in the query text belongs.
Through the above steps 311-312, the participle category to which each participle in the query text belongs is identified based on the knowledge graph; the participle category to which each participle in the text to be matched belongs can be identified by a similar method.
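As a minimal sketch of steps 311-312, the category lookup can be written as two dictionary lookups. The category names ("subject", "intent") and the toy mappings below are hypothetical; the application does not list the categories in this passage.

```python
from typing import Dict, List, Optional

# Hypothetical toy data: participle -> first entity, entity -> participle category.
ENTITY_LINKS: Dict[str, str] = {
    "accumulation fund": "accumulation fund",
    "loan": "loan",
}
ENTITY_CATEGORY: Dict[str, str] = {
    "accumulation fund": "subject",  # assumed category name
    "loan": "intent",                # assumed category name
}


def participle_categories(participles: List[str]) -> List[Optional[str]]:
    """Return the category of each participle, or None when no entity or category is found."""
    categories: List[Optional[str]] = []
    for word in participles:
        entity = ENTITY_LINKS.get(word)  # step 311: entity linking
        categories.append(ENTITY_CATEGORY.get(entity) if entity is not None else None)  # step 312
    return categories


print(participle_categories(["accumulation fund", "loan"]))
```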
Step 320, determining a first weight of each participle in the query text relative to each participle in the text to be matched according to the participle category to which each participle in the query text belongs, the participle category to which each participle in the text to be matched belongs, and the weight mapping information; the weight mapping information indicates the first weight associated with any two participle categories.
After the participle category of each participle in the query text and of each participle in the text to be matched is determined, the participle categories of the two participles in each participle pair (one participle from the query text and the other from the text to be matched) are known, which gives the participle category pair corresponding to each participle pair. The participle category pair is then looked up in the weight mapping information to determine the first weight associated with it, that is, the first weight of the corresponding participle in the query text relative to the corresponding participle in the text to be matched.
Step 330, combining the first weights of all the participles in the query text relative to all the participles in the text to be matched to obtain a first weight matrix.
In some embodiments of the present application, the first weights of the participles in the query text relative to the participles in the text to be matched may be combined according to a set order, for example, the horizontal order illustrated above is ordered according to the positions of the participles corresponding to the elements in the text to be matched, and the vertical order is ordered according to the positions of the participles corresponding to the elements in the query text.
Of course, for convenience of calculation, the arrangement order of the elements in the first weight matrix is the same as that of the elements in the similarity matrix.
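Steps 320 and 330 can be sketched as a lookup keyed by participle category pairs, arranged with one row per query participle and one column per participle of the text to be matched. The category names, the weight values and the default weight for unlisted pairs are assumptions made for illustration only.

```python
from typing import Dict, List, Tuple

# Hypothetical weight mapping information: (query category, doc category) -> first weight.
WEIGHT_MAP: Dict[Tuple[str, str], float] = {
    ("subject", "subject"): 1.0,
    ("intent", "intent"): 1.0,
    ("subject", "intent"): 0.5,
    ("intent", "subject"): 0.5,
}


def first_weight_matrix(
    query_categories: List[str],
    doc_categories: List[str],
    default: float = 1.0,  # assumed fallback when a category pair is not listed
) -> List[List[float]]:
    """Rows follow the query participles; columns follow the participles to be matched."""
    return [
        [WEIGHT_MAP.get((q_cat, d_cat), default) for d_cat in doc_categories]
        for q_cat in query_categories
    ]


matrix = first_weight_matrix(["subject", "intent"], ["subject", "intent", "subject", "intent"])
for row in matrix:
    print(row)
```

Keeping the same row and column order as the similarity matrix means the two matrices can later be combined element by element without any reindexing.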
In other embodiments of the present application, the matching weight matrix includes a second weight matrix; as shown in fig. 5, step 210 includes:
Step 510, identifying the association relationship between the participles in the query text and the participles in the text to be matched according to the knowledge graph.
In some embodiments of the present application, step 510 comprises: acquiring first entity link information, wherein the first entity link information indicates the first entity to which a participle in the query text is linked on the knowledge graph; acquiring second entity link information, wherein the second entity link information indicates the second entity to which a participle in the text to be matched is linked on the knowledge graph; and determining the association relationship between the first entity and the second entity in the knowledge graph as the association relationship between the corresponding participle in the query text and the corresponding participle in the text to be matched.
The second entity refers to an entity to which a participle in the text to be matched is linked on the knowledge graph. In some embodiments of the application, similar to the first entity link information, the second entity link information may be a spliced text obtained by splicing the second entity linked to each participle in the text to be matched to the corresponding participle and splicing the neighbor entities of the second entity to the corresponding second entity, or a spliced text obtained by splicing the second entities according to the positions of the participles in the text to be matched and splicing the neighbor entities of each second entity to the corresponding second entity.
The connection relationship between two entities in the knowledge graph indicates the association relationship between the two entities. Suppose a participle A in the query text is linked to a first entity P1, and a participle B in the text to be matched is linked to a second entity P2. Because participle A closely matches the entity P1 to which it is linked on the knowledge graph (for example, identical semantics, or identical or similar part of speech), and participle B likewise closely matches the entity P2 to which it is linked, participle A can be replaced by entity P1 and participle B by entity P2, and the association relationship between entity P1 and entity P2 on the knowledge graph can represent the association relationship between participle A and participle B. Therefore, the association relationship between entity P1 and entity P2 on the knowledge graph can be determined as the association relationship between participle A in the query text and participle B in the text to be matched.
Step 520, performing a weight lookup according to the association relationship to obtain a second weight of the participles in the query text relative to the participles in the text to be matched.
In some embodiments of the application, the second weight corresponding to each association relationship is preset. After the association relationship between a participle in the query text and a participle in the text to be matched is determined, the second weight corresponding to that association relationship is obtained; the second weight of each participle in the query text relative to each participle in the text to be matched can be determined in this way.
Step 530, combining the second weights of the participles in the query text relative to the participles in the text to be matched to obtain a second weight matrix.
In one embodiment, for convenience of calculation, the arrangement order of the elements in the second weight matrix is the same as the arrangement order of the elements in the similarity matrix.
Through the steps 510-530, the association relationship between the participles in the query text and the participles in the text to be matched is determined based on the knowledge graph, and the second weight corresponding to the determined association relationship is searched for, so as to obtain the second weight of the participles in the query text relative to the participles in the text to be matched.
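Steps 510-530 can be sketched in the same way as a lookup over knowledge graph edges followed by a lookup of preset weights. In this sketch the edge table, the relation names other than the expanded word relationship, the weight values, the undirected edge lookup and the default weight for unrelated entities are all assumptions.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical knowledge graph edges: (entity, entity) -> association relationship.
KG_EDGES: Dict[Tuple[str, str], str] = {
    ("water leakage", "water pipe maintenance"): "expanded word",
}
# Hypothetical preset second weights per association relationship.
RELATION_WEIGHTS: Dict[str, float] = {"same entity": 1.0, "expanded word": 0.8}


def association(entity_a: str, entity_b: str) -> Optional[str]:
    """Step 510: look up the association between two linked entities (treated as undirected)."""
    if entity_a == entity_b:
        return "same entity"
    return KG_EDGES.get((entity_a, entity_b)) or KG_EDGES.get((entity_b, entity_a))


def second_weight_matrix(
    query_entities: List[str], doc_entities: List[str], default: float = 0.5
) -> List[List[float]]:
    """Steps 520-530: map each association to its preset weight and assemble the matrix."""
    matrix: List[List[float]] = []
    for q_entity in query_entities:
        row: List[float] = []
        for d_entity in doc_entities:
            relation = association(q_entity, d_entity)
            row.append(RELATION_WEIGHTS.get(relation, default) if relation else default)
        matrix.append(row)
    return matrix


print(second_weight_matrix(["air conditioner", "water leakage"], ["toilet", "water pipe maintenance"]))
```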
If the matching weight matrix includes the first weight matrix and the second weight matrix, the first weight matrix may be determined according to the embodiment shown in fig. 3, the second weight matrix may be determined according to the embodiment shown in fig. 5, and then the similarity matrix between the query text and the text to be matched is enhanced by using the first weight matrix and the second weight matrix.
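One way to read the enhancement is as an element-wise product of the similarity matrix with each weight matrix, which is why all matrices share the same element order. The sketch below assumes that interpretation; the function name and the numbers are illustrative and are not taken from the tables of the application.

```python
from typing import List

Matrix = List[List[float]]


def enhance_similarity(similarity: Matrix, *weight_matrices: Matrix) -> Matrix:
    """Multiply the similarity matrix element-wise by each weight matrix in turn."""
    rows, cols = len(similarity), len(similarity[0])
    enhanced = [row[:] for row in similarity]
    for weights in weight_matrices:
        if len(weights) != rows or any(len(row) != cols for row in weights):
            raise ValueError("weight matrix shape must match the similarity matrix")
        enhanced = [
            [enhanced[i][j] * weights[i][j] for j in range(cols)] for i in range(rows)
        ]
    return enhanced


similarity = [[0.8, 0.5], [0.4, 1.0]]        # illustrative values
first_weights = [[1.0, 0.5], [0.5, 1.0]]     # illustrative values
second_weights = [[0.5, 1.0], [1.0, 0.5]]    # illustrative values
print(enhance_similarity(similarity, first_weights, second_weights))
```

Passing only one weight matrix covers the embodiments in which the matching weight matrix is just the first or just the second weight matrix.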
In some embodiments of the present application, as shown in fig. 6, step 230, comprises:
Step 610, performing pooling processing on the first similarity matrix to obtain a second similarity matrix.
Pooling the first similarity matrix means extracting important feature information from the first similarity matrix and downsampling the first similarity matrix. The dimensionality of the second similarity matrix obtained by pooling is therefore smaller than that of the first similarity matrix, which reduces the number of parameters and simplifies the subsequent calculation of the matching degree score.
The pooling process may be a maximum pooling process, an average pooling process, a global average pooling, a global adaptive pooling, etc., and is not particularly limited herein.
In some embodiments of the present application, the second similarity matrix comprises a first pooled matrix and a second pooled matrix; step 610 further comprises: pooling the first similarity matrix along the transverse direction of the first similarity matrix to obtain a first pooled matrix; and pooling the first similarity matrix along the longitudinal direction of the first similarity matrix to obtain a second pooled matrix.
Under the condition that one direction dimension of the first similarity matrix represents the participles in the query text and the other direction dimension represents the participles in the text to be matched, pooling is carried out on the first similarity matrix along the two direction dimensions (namely the transverse direction and the longitudinal direction), so that a pooling matrix for reflecting the hit condition of the participles in the query text in the text to be matched and a pooling matrix for reflecting the hit condition of the participles in the text to be matched in the query text can be obtained respectively. Therefore, the obtained first pooling matrix and the second pooling matrix reflect the matching situation between the query text and the text to be matched under the condition that the query text is taken as a reference and the text to be matched is taken as a reference respectively, and the matching situation between the query text and the text to be matched can be reflected from different angles by combining the first pooling matrix and the second pooling matrix.
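The two pooling directions can be sketched with max pooling, assuming rows correspond to query participles and columns to participles of the text to be matched. Here the kernel is assumed to span a whole column while moving transversely (one value per participle to be matched) and a whole row while moving longitudinally (one value per query participle); the kernel size and the numbers are illustrative assumptions.

```python
from typing import List

Matrix = List[List[float]]


def pool_transverse(matrix: Matrix) -> List[float]:
    """Move along the columns: best hit of each participle to be matched in the query text."""
    return [max(row[j] for row in matrix) for j in range(len(matrix[0]))]


def pool_longitudinal(matrix: Matrix) -> List[float]:
    """Move along the rows: best hit of each query participle in the text to be matched."""
    return [max(row) for row in matrix]


# Illustrative 2 x 4 first similarity matrix (2 query participles, 4 participles to be matched).
first_similarity = [
    [0.9, 0.2, 0.9, 0.3],
    [0.1, 0.6, 0.1, 0.7],
]
print(pool_transverse(first_similarity))    # one value per column
print(pool_longitudinal(first_similarity))  # one value per row
```

Average pooling or a smaller kernel could be substituted by replacing `max` with another reduction; the choice is not fixed by the application.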
Step 620, calculating a matching degree score between the query text and the text to be matched according to the second similarity matrix.
In some embodiments of the present application, the matching degree score may be predicted by a fully connected layer: the second similarity matrix is input into the fully connected layer, which transforms it and predicts the matching degree score between the query text and the text to be matched.
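A single fully connected unit over the flattened pooled values gives a rough picture of this prediction step. The sigmoid output, the flattening order and the weight values below are assumptions for illustration; in practice the weights and bias would be learned during training.

```python
import math
from typing import List


def match_score(pooled: List[List[float]], weights: List[float], bias: float) -> float:
    """Flatten the pooled matrices, apply one fully connected unit, squash to (0, 1)."""
    features = [value for row in pooled for value in row]
    if len(features) != len(weights):
        raise ValueError("one weight is required per pooled feature")
    z = sum(f * w for f, w in zip(features, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # assumed sigmoid activation


pooled = [[0.9, 0.6, 0.9, 0.7], [0.9, 0.7]]       # illustrative pooled matrices
weights = [0.5, 0.5, 0.5, 0.5, 1.0, 1.0]          # hypothetical learned weights
print(match_score(pooled, weights, bias=-2.0))
```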
In some embodiments of the present application, as shown in fig. 7, step 620, comprises:
Step 710, performing attention weighting on the second similarity matrix based on an attention mechanism to obtain a third similarity matrix.
In one embodiment of the present application, step 710 comprises: performing linear transformation on the word vectors of all the participles in the query text according to the key weight vectors to obtain key vectors corresponding to all the participles in the query text; performing linear transformation on semantic feature vectors corresponding to the query text according to the query weight vector to obtain a query vector; calculating according to the key vector and the query vector corresponding to each participle in the query text to obtain the attention score corresponding to each participle in the query text; weighting the value vectors corresponding to the participles in the query text according to the attention scores corresponding to the participles in the query text to obtain target similarity vectors corresponding to the participles in the query text; the value vector corresponding to the word in the query text is obtained by carrying out linear transformation on the similarity vector corresponding to the word in the query text according to the value weight vector, and the similarity vector corresponding to the word in the query text is obtained by extracting elements related to the corresponding word in the query text from the second similarity matrix and combining the extracted elements; and combining the target similarity vectors corresponding to the participles in the query text to obtain a third similarity matrix.
For ease of description, the participle in the query text will be referred to as the first participle. In this embodiment, the attention score of each first participle is calculated, and then the similarity vector corresponding to the first participle is weighted according to the attention score of the first participle, so as to obtain the target similarity vector corresponding to the first participle. The similarity vector corresponding to the first segmentation is obtained by extracting elements related to the first segmentation from the second similarity matrix and combining the extracted elements.
Suppose that the query text comprises n participles and the word vector corresponding to the i-th participle is $q_i$. The key vector $K_i$ corresponding to the i-th participle in the query text is:
$$K_i = q_i W_k \quad \text{(formula 1)}$$
where $W_k$ is the key weight vector.
The query vector $Q_i$ corresponding to the query text is:
$$Q_i = Emb_{cls} W_q \quad \text{(formula 2)}$$
where $Emb_{cls}$ is the semantic feature vector corresponding to the query text and $W_q$ is the query weight vector. In a specific embodiment, the semantic feature vector corresponding to the query text may be obtained by fusing the word vectors of all the participles in the query text.
According to the scaled dot-product model, the attention score $e_i$ corresponding to the i-th participle in the query text is:
$$e_i = \frac{Q_i K_i^{T}}{\sqrt{d_k}} \quad \text{(formula 3)}$$
where $K_i^{T}$ denotes the transpose of the key vector $K_i$ corresponding to the i-th participle, and $\sqrt{d_k}$ is a scaling factor, in which $d_k$ may be taken as the dimension of the key vector corresponding to the i-th participle. In other embodiments, the attention score may also be calculated from the key vector and the query vector according to an additive model, a dot-product model, or a bilinear model.
The value vector $V_i$ corresponding to the i-th participle in the query text is:
$$V_i = c_i W_v \quad \text{(formula 4)}$$
where $c_i$ is the similarity vector corresponding to the i-th participle and $W_v$ is the value weight matrix. In the above formulas, $W_k$, $W_q$ and $W_v$ are parameters that the model can learn.
The target similarity vector $\mathrm{Attention}(Q_i, K_i, V_i)$ corresponding to the i-th participle in the query text is:
$$\mathrm{Attention}(Q_i, K_i, V_i) = \mathrm{softmax}(e_i)\, V_i \quad \text{(formula 5)}$$
Or, substituting formula (3) into formula (5), the target similarity vector $\mathrm{Attention}(Q_i, K_i, V_i)$ corresponding to the i-th participle in the query text is:
$$\mathrm{Attention}(Q_i, K_i, V_i) = \mathrm{softmax}\!\left(\frac{Q_i K_i^{T}}{\sqrt{d_k}}\right) V_i \quad \text{(formula 6)}$$
FIG. 8 is a schematic diagram illustrating a method for calculating the target similarity vector corresponding to each participle in the query text according to an embodiment of the present application. As shown in FIG. 8, the semantic feature vector corresponding to the query text is linearly transformed according to the query weight matrix to obtain the query vector $Q_i$; the word vectors ($q_1$, $q_2$, $q_3$, …) of the participles in the query text are linearly transformed according to the key weight vector to obtain the key vectors $K_i$ corresponding to the participles in the query text; and the similarity vectors ($c_1$, $c_2$, $c_3$) corresponding to the participles in the query text are linearly transformed according to the value weight vector to obtain the value vectors $V_i$ corresponding to the participles in the query text.
Steps 810 to 830 are then executed in sequence. In step 810, the attention score corresponding to each participle in the query text is calculated through a MatMul function, which multiplies two matrices; specifically, the attention score is calculated according to formula (3) above. In step 820, the resulting attention scores are normalized by the Softmax function. In step 830, the normalized attention score is multiplied, through a MatMul function, by the value vector corresponding to the participle in the query text to obtain the corresponding target similarity vector.
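Steps 810 to 830 can be sketched in plain Python as below. The sketch assumes that the Softmax normalization runs across the attention scores of all participles in the query text, and it uses small identity matrices in place of learned parameters; the function and variable names are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def vec_mat(v: Vector, m: Matrix) -> Vector:
    """Multiply a row vector by a matrix."""
    return [sum(v[r] * m[r][c] for r in range(len(v))) for c in range(len(m[0]))]


def softmax(xs: Vector) -> Vector:
    peak = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weighting(
    q_vecs: Matrix, emb_cls: Vector, c_vecs: Matrix, w_k: Matrix, w_q: Matrix, w_v: Matrix
) -> Tuple[Vector, Matrix]:
    query = vec_mat(emb_cls, w_q)                # formula 2
    keys = [vec_mat(q, w_k) for q in q_vecs]     # formula 1
    values = [vec_mat(c, w_v) for c in c_vecs]   # formula 4
    d_k = len(keys[0])
    # Step 810: scaled dot product of the query vector with each key vector (formula 3).
    scores = [sum(a * b for a, b in zip(query, k)) / math.sqrt(d_k) for k in keys]
    # Step 820: normalize the scores (assumed to be across participles).
    weights = softmax(scores)
    # Step 830: weight each value vector to get the target similarity vectors.
    targets = [[w * x for x in v] for w, v in zip(weights, values)]
    return weights, targets


identity = [[1.0, 0.0], [0.0, 1.0]]              # stand-in for learned parameters
weights, targets = attention_weighting(
    q_vecs=[[1.0, 0.0], [0.0, 1.0]],
    emb_cls=[1.0, 0.0],
    c_vecs=[[0.5, 0.5], [1.0, 0.0]],
    w_k=identity, w_q=identity, w_v=identity,
)
print(weights)
print(targets)
```

Stacking the rows of `targets` corresponds to combining the target similarity vectors into the third similarity matrix.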
Step 720, performing score prediction according to the third similarity matrix to obtain a matching degree score between the query text and the text to be matched.
In some embodiments of the present application, the matching degree score between the query text and the text to be matched may be calculated from the third similarity matrix through a fully connected layer (FC). The third similarity matrix is input into the trained fully connected layer, which transforms the third similarity matrix and outputs the matching degree score.
The meaning of a word expressed in text is often related to the context of the word. Thus, the context information of a word helps to enhance its semantic representation. Also, different words in context tend to have different contributions to the semantic representation. Therefore, the attention score corresponding to each participle in the query text can be calculated through the attention mechanism. The attention score corresponding to the participle reflects the contribution degree of the participle to the overall semantics of the query text. It can be understood that, if a word in the query text contributes to the overall semantics of the query text to a higher degree, in the text matching process, the importance of the similarity related to the word needs to be paid attention to.
In this embodiment, the attention scores corresponding to the participles in the query text are used to enhance the matching degree related to the participles in the second similarity matrix, and the contribution degree of the participles to the overall semantics of the query text is introduced to enhance the matching degree corresponding to the participles, so that the matching accuracy can be further improved.
Further, the inventors of the present application found through practical analysis that, if the neighbor entities of the entity to which a participle is linked on the knowledge graph are referred to when constructing the word vector of that participle in the query text, the referenced neighbor entities may introduce noise into text matching. For example, suppose the word segmentation result of the query text is "air conditioner/water leakage", and a neighbor entity of the entity to which the participle "water leakage" is linked on the knowledge graph is "water pipe maintenance", where "water leakage" and "water pipe maintenance" are in an expanded word relationship. In this case, if the text to be matched is "toilet water pipe maintenance" and the degree to which each participle in the query text contributes to the semantics of the query text is not considered, the model will consider the query text "air conditioner water leakage" and the text to be matched "toilet water pipe maintenance" to match well during text matching, although the two are in fact unrelated. The reason is that the input neighbor entity "water pipe maintenance" introduces noise, which causes an error in the matching result.
For this problem, according to the scheme of this embodiment, the matching strength of the participle is weighted based on the attention score calculated by the attention mechanism, so as to reduce the influence of noise introduced by the neighboring entity on the text matching result.
In some embodiments of the present application, as shown in fig. 9, prior to step 210, the method further comprises: step 910, receiving a service query request sent by a client, where the service query request indicates a query text; in this embodiment, after step 240, the method further includes:
Step 920, obtaining application information of the target service application, where the target service application refers to the service application corresponding to the target matching text.
Step 930, generating a query result according to the application information.
Step 940, returning the query result to the client, so that the client displays the service entry of the target service application according to the query result.
Wherein the service query request is initiated for a service application query. The determined application information of the target service application may include an application name of the target service application, a calling interface related to a service that the current user needs to call, and the like.
The query result indicates the calling interface of each target service application, so that after the client receives the query result, the client can display the application entry of each target service application according to its calling interface in the query result, and a user can enter the corresponding service page by triggering the application entry. For example, if the user needs to query a service application related to "send an express item", then after receiving the query result, the client displays an application entry for entering a service page that provides "send an express item", and the user may trigger the application entry to enter that service page.
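Steps 920 to 940 can be pictured with a small data structure for the query result. The class name, the field names, the application name and the interface string below are hypothetical placeholders; the application does not define a concrete format for the query result.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ServiceApp:
    """Application information of a target service application (hypothetical fields)."""
    name: str
    calling_interface: str


def build_query_result(target_apps: List[ServiceApp]) -> Dict[str, List[Dict[str, str]]]:
    """Step 930: generate a query result that lists one entry per target service application."""
    return {
        "entries": [
            {"app_name": app.name, "calling_interface": app.calling_interface}
            for app in target_apps
        ]
    }


# Hypothetical target service application matched for the query "send an express item".
apps = [ServiceApp(name="Express Helper", calling_interface="app://express-helper/send")]
query_result = build_query_result(apps)
print(query_result)  # step 940 would return this structure to the client
```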
In some embodiments of the present application, the process of calculating the matching degree score between the query text and the text to be matched in the text processing process may be implemented by a text matching model, and the text matching model may be constructed by a convolutional neural network, a pooled neural network, a fully-connected neural network, a recurrent neural network, and the like.
FIG. 10 is a model diagram illustrating a text matching model according to an embodiment of the present application.
As shown in fig. 10, the text matching model includes an input layer 1010, a similarity matrix layer 1020, a pooling layer (including a first pooling layer 1031 and a second pooling layer 1032), an attention layer 1040, and a full-connection layer 1050. The input layer 1010 is configured to receive input text, and in particular in this application, the input text includes input text for a query text and input text for a text to be matched.
The input text for the query text is the text obtained, after the participles in the query text have been entity-linked on the knowledge graph, by splicing the linked entities in the order of the participles in the query text and splicing the one-hop neighbor entities of each linked entity on the knowledge graph to the corresponding entity.
The input text for the text to be matched is the text obtained, after the participles in the text to be matched have been entity-linked on the knowledge graph, by splicing the linked entities in the order of the participles in the text to be matched and splicing the one-hop neighbor entities of each linked entity on the knowledge graph to the corresponding entity.
In the scheme of the application, the association relationship between the participle category to which each entity belongs and the entity is set in the knowledge graph, so that in the subsequent process, the text matching model can determine the participle category to which each participle in the query text belongs, the participle category to which each participle in the text to be matched belongs, and the association relationship between the participle in the query text and the participle in the text to be matched based on the knowledge graph, the input text for the query text and the input text for the text to be matched.
The similarity matrix layer 1020 calculates a similarity matrix between the query text and the text to be matched based on the word vectors of the participles in the query text and the word vectors of the participles in the text to be matched; determining a matching weight matrix of the query text relative to the text to be matched according to the input text aiming at the query text, the input text aiming at the text to be matched and the knowledge graph; and then enhancing the similarity matrix through the matching weight matrix to obtain a first similarity matrix. The determination of the similarity matrix and the matching weight matrix is described above and will not be described herein. The matching weight matrix may be a matrix obtained by fusing the first weight matrix and the second weight matrix, or may be the first weight matrix or the second weight matrix.
The first pooling layer 1031 and the second pooling layer 1032 in the pooling layer pool the first similarity matrix along two directions respectively. Specifically, in FIG. 10, the first pooling layer 1031 moves a pooling kernel along the transverse direction of the first similarity matrix for pooling to obtain a first pooling matrix; the second pooling layer 1032 moves the pooling kernel along the longitudinal direction of the first similarity matrix for pooling to obtain a second pooling matrix.
The first pooling matrix and the second pooling matrix obtained by pooling in the pooling layer are input to the attention layer 1040, and the attention layer performs attention weighting on the first pooling matrix and the second pooling matrix based on the attention mechanism and outputs a second similarity matrix. As described above, the attention layer also needs to calculate a target similarity vector corresponding to each participle by means of the semantic feature vector of the query text and the vector of each participle in the query text.
The fully connected layer (FC) 1050 acts as a classifier: it performs classification prediction according to the input second similarity matrix and outputs a matching degree score representing the matching degree between the query text and the text to be matched.
The text matching effect of the text matching model shown in fig. 10 is further described below with reference to a specific embodiment. If the word segmentation result of the query text is as follows: accumulation fund/loan; the word segmentation result of the text to be matched is as follows: accumulation fund/inquiry/accumulation fund/payment. The similarity of the participles in the query text with respect to the participles in the text to be matched, which is calculated based on the word vectors of the participles in the query text and the word vectors of the participles in the text to be matched, is shown in table 1 below. In table 1, "Query" represents Query text, and "Doc" represents text to be matched.
TABLE 1
Figure BDA0003097458460000211
Referring to the similarity in table 1, the similarity matrix of the query text with respect to the text to be matched is:
Figure BDA0003097458460000212
If the similarity matrix is pooled directly, pooling along the longitudinal direction of the similarity matrix gives the pooled matrix:
Figure BDA0003097458460000213
Pooling along the transverse direction of the similarity matrix gives the pooled matrix | 0.97 0.73 0.97 0.8 |. On this basis, the fully connected layer performs classification prediction based on the two pooled matrices and outputs a matching degree score indicating that the query text "accumulation fund loan" matches the text to be matched "accumulation fund query accumulation fund payment". Obviously, "accumulation fund loan" and "accumulation fund query accumulation fund payment" do not match semantically, so the matching degree score predicted directly from the similarity matrix does not agree with the actual situation, and the matching result indicated by the output matching degree score is incorrect.
Table 2 shows the first weights associated with pairs of participle categories, as indicated in the weight mapping information.
TABLE 2
Figure BDA0003097458460000221
After the category of the participles to which the participles in the query text belong and the category of the participles to which the participles in the text to be matched belong are determined, a first weight matrix of the query text relative to the text to be matched can be correspondingly determined based on the first weight indicated by the weight mapping information.
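A minimal sketch of this lookup is given below. The category names ("product", "action"), the weight values and the default weight are all hypothetical, since the contents of table 2 are not reproduced here; treating the category pair as unordered is likewise an assumption.

```python
import numpy as np

# Hypothetical weight mapping information:
# unordered pair of participle categories -> first weight.
WEIGHT_MAPPING = {
    frozenset(["product"]): 1.0,            # both participles are "product"
    frozenset(["product", "action"]): 0.8,
    frozenset(["action"]): 0.6,             # both participles are "action"
}
DEFAULT_WEIGHT = 1.0  # assumed fallback for unlisted category pairs

def first_weight_matrix(query_categories, doc_categories,
                        mapping=WEIGHT_MAPPING, default=DEFAULT_WEIGHT):
    """Build the first weight matrix: entry (i, j) is the weight mapped to the
    categories of query participle i and document participle j."""
    w = np.full((len(query_categories), len(doc_categories)), default)
    for i, qc in enumerate(query_categories):
        for j, dc in enumerate(doc_categories):
            w[i, j] = mapping.get(frozenset([qc, dc]), default)
    return w

# Hypothetical categories for "accumulation fund/loan" and
# "accumulation fund/inquiry/accumulation fund/payment".
w1 = first_weight_matrix(["product", "action"],
                         ["product", "action", "product", "action"])
print(w1)
```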
Table 3 shows the second weights corresponding to some of the association relations.
TABLE 3
Figure BDA0003097458460000222
After determining the incidence relation between each participle in the query text and each participle in the text to be matched, based on the second weight corresponding to the preset incidence relation, a second weight matrix of the query text relative to the text to be matched can be determined.
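The same idea can be sketched for the second weight matrix. Everything concrete here is hypothetical: the entity identifiers, the relation names, the preset weights and the fallback weight are illustrative stand-ins for the knowledge graph and table 3, which are not reproduced in this text.

```python
import numpy as np

# Hypothetical knowledge-graph edges: (first entity, second entity) -> relation.
KG_RELATIONS = {
    ("accumulation_fund", "accumulation_fund"): "same_entity",
    ("loan", "payment"): "different_service",
}
# Hypothetical preset second weights per relation.
RELATION_WEIGHTS = {"same_entity": 1.0, "different_service": 0.5}
NO_RELATION_WEIGHT = 0.8  # assumed weight when no relation is found

def second_weight_matrix(query_entities, doc_entities):
    """Entry (i, j) is the preset weight of the relation between the entity
    linked from query participle i and the entity linked from document participle j."""
    w = np.empty((len(query_entities), len(doc_entities)))
    for i, qe in enumerate(query_entities):
        for j, de in enumerate(doc_entities):
            relation = KG_RELATIONS.get((qe, de))
            w[i, j] = RELATION_WEIGHTS.get(relation, NO_RELATION_WEIGHT)
    return w

w2 = second_weight_matrix(
    ["accumulation_fund", "loan"],
    ["accumulation_fund", "inquiry", "accumulation_fund", "payment"],
)
print(w2)
```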
After the first weight matrix and the second weight matrix of the query text relative to the text to be matched are determined, the first weight matrix, the second weight matrix and the similarity matrix of the query text relative to the text to be matched are multiplied to obtain a first similarity matrix. The enhanced similarity of each participle in the query text relative to each participle in the text to be matched, as indicated by the first similarity matrix, is shown in table 4 below:
TABLE 4
Figure BDA0003097458460000223
The obtained first similarity matrix is:
Figure BDA0003097458460000224
Pooling along the longitudinal direction of the first similarity matrix yields the second pooled matrix:
Figure BDA0003097458460000225
Pooling along the transverse direction of the first similarity matrix yields the first pooled matrix | 0.78 0.55 0.78 0.66 |. The fully connected layer performs classification prediction based on the first pooled matrix and the second pooled matrix, and outputs a matching degree score indicating that the query text "accumulation fund loan" does not match the text to be matched "accumulation fund inquiry accumulation fund payment". Thus, after the similarity matrix is enhanced based on the matching weight matrix, the matching result indicated by the matching degree score predicted from the enhanced first similarity matrix is consistent with the actual relationship between "accumulation fund loan" and "accumulation fund inquiry accumulation fund payment". This shows that enhancing the similarity matrix with the matching weight matrix of the query text relative to the text to be matched can improve the accuracy of text matching.
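The enhancement step can be illustrated with a short sketch. It assumes that "multiplying" means an element-wise (Hadamard) product, which is plausible because the weight matrices and the similarity matrix share the same shape, but this is an assumption. All numeric values below are hypothetical and are not the values of tables 1 to 4.

```python
import numpy as np

def enhance_similarity(sim, *weight_matrices):
    """Multiply the similarity matrix element-wise by each weight matrix."""
    enhanced = np.asarray(sim, dtype=float).copy()
    for w in weight_matrices:
        w = np.asarray(w, dtype=float)
        if w.shape != enhanced.shape:
            raise ValueError("weight matrix shape must match the similarity matrix")
        enhanced *= w
    return enhanced

# Hypothetical 2 x 4 matrices (query participles x document participles).
sim = np.array([[0.9, 0.4, 0.9, 0.5],
                [0.3, 0.6, 0.3, 0.8]])
w_first = np.array([[1.0, 0.8, 1.0, 0.8],
                    [0.8, 0.6, 0.8, 0.6]])
w_second = np.array([[1.0, 1.0, 1.0, 1.0],
                     [1.0, 1.0, 1.0, 0.5]])

first_sim = enhance_similarity(sim, w_first, w_second)
print(first_sim.max(axis=0))  # pooled values per document participle
print(first_sim.max(axis=1))  # pooled values per query participle
```

With weights no greater than 1, the pooled values of the enhanced matrix can only stay the same or drop for weakly related participle pairs, which is how the downstream classifier is given room to reject the pair.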
In some embodiments of the present application, the text matching model shown in fig. 10 may be obtained by improving an existing text matching model, so that the improved model predicts the matching degree score between the query text and the text to be matched according to the text matching process of the present application.
In some embodiments of the present application, a K-NRM (Kernel-based Neural Ranking Model) model may be improved to obtain the text matching model shown in fig. 10, so that the improved model performs text matching according to the process of the present application. After the K-NRM model was improved, a comparative test was run on the original K-NRM model and the improved K-NRM model. Specifically, each model was used for text matching in an information query scenario, query results were determined from the matching degree scores obtained by text matching, and the precision (accuracy rate), recall and F1 score of each model's query results were computed. The comparison between the improved and the original K-NRM model is shown in table 5 below.
TABLE 5
Model                   Precision   Recall   F1 score
Original K-NRM model    0.8363      0.8660   0.8509
Improved K-NRM model    0.8730      0.9016   0.8871
Here, the precision (the accuracy rate) is TP/(TP + FP), the proportion of retrieved entries that are correct. The recall is TP/(TP + FN), the proportion of all correct entries that are retrieved.
The F1 score, also known as the balanced F score, is defined as the harmonic mean of precision and recall. It is an index used in statistics to measure the accuracy of a binary classification model, and it takes both the precision and the recall of the model into account. Its maximum value is 1 and its minimum value is 0. F1 score = 2 × precision × recall / (precision + recall).
TP (True Positive) is the number of entries for which a positive determination is made and the determination is correct. FP (False Positive) is the number of entries for which a positive determination is made and the determination is wrong. FN (False Negative) is the number of entries for which a negative determination is made and the determination is wrong. In the resource query scenario, a positive determination may be that the model determines that a text to be matched matches the query text; a negative determination may be that the model determines that a text to be matched does not match the query text.
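As a quick check of these definitions, the sketch below computes the three metrics and recomputes the F1 values of table 5 from the reported TP/(TP + FP) rate (precision) and recall, using the harmonic-mean form of the F1 score. The function names and the zero-denominator convention (returning 0.0) are choices made for this illustration only.

```python
def precision(tp, fp):
    """TP / (TP + FP); returns 0.0 when nothing was retrieved."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """TP / (TP + FN); returns 0.0 when there are no correct entries."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1_score(p, r):
    """Harmonic mean of precision p and recall r."""
    return 2 * p * r / (p + r) if (p + r) else 0.0

# Precision and recall as reported in table 5.
table5 = {
    "Original K-NRM model": (0.8363, 0.8660),
    "Improved K-NRM model": (0.8730, 0.9016),
}
for name, (p, r) in table5.items():
    print(name, round(f1_score(p, r), 4))
```

The recomputed values agree with the F1 column of table 5 to four decimal places.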
As can be seen from table 5 above, when the improved K-NRM model is used for text matching according to the method of the present application, the precision, recall and F1 score are all higher than those of the original K-NRM model, which indicates that the method effectively improves the accuracy of text matching.
Embodiments of the apparatus of the present application are described below, which may be used to perform the methods of the above-described embodiments of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the above-described embodiments of the method of the present application.
Fig. 11 is a block diagram illustrating a text processing apparatus according to an embodiment. As illustrated in fig. 11, the text processing apparatus includes: a matching weight matrix determining module 1110, configured to determine a matching weight matrix of the query text with respect to the text to be matched, where the matching weight matrix includes at least one of a first weight matrix and a second weight matrix, the first weight matrix is determined according to the participle category to which each participle in the query text belongs and the participle category to which each participle in the text to be matched belongs, and the second weight matrix is determined according to the incidence relation between each participle in the query text and each participle in the text to be matched; an enhancing module 1120, configured to enhance the similarity matrix of the query text with respect to the text to be matched according to the matching weight matrix to obtain a first similarity matrix, where the similarity matrix is obtained by performing similarity calculation according to the word vector of each participle in the query text and the word vector of each participle in the text to be matched; a matching degree score determining module 1130, configured to determine a matching degree score between the query text and the text to be matched according to the first similarity matrix; and a target matching text determining module 1140, configured to determine a target matching text according to the matching degree score between the query text and the text to be matched.
In some embodiments of the present application, the matching weight matrix comprises a first weight matrix; the matching weight matrix determining module 1110 includes: a participle category identifying unit, configured to identify the participle category to which each participle in the query text belongs; a first weight determining unit, configured to determine the first weight of each participle in the query text relative to each participle in the text to be matched according to the participle category to which each participle in the query text belongs, the participle category to which each participle in the text to be matched belongs and the weight mapping information, where the weight mapping information indicates the first weight associated with any two participle categories; and a first weight matrix determining unit, configured to combine the first weights of the participles in the query text relative to the participles in the text to be matched to obtain the first weight matrix.
In some embodiments of the present application, the participle category identifying unit includes: a first entity link information acquiring unit, configured to acquire first entity link information, where the first entity link information is obtained by performing entity linking on each participle in the query text in a knowledge graph; and a participle category determining unit, configured to take the category to which the first entity belongs as the participle category to which a participle in the query text belongs, the first entity being the entity to which that participle is linked in the knowledge graph.
In other embodiments of the present application, the matching weight matrix includes a second weight matrix; the matching weight matrix determination module 1110 includes: the incidence relation identification unit is used for identifying the incidence relation between the participles in the query text and the participles in the text to be matched according to the knowledge graph; the second weight determining unit is used for carrying out weight search according to the incidence relation to obtain a second weight of the participles in the query text relative to the participles in the text to be matched; and the second weight matrix determining unit is used for combining the second weights of the participles in the query text relative to the participles in the text to be matched to obtain a second weight matrix.
In some embodiments of the present application, the association relationship identifying unit includes: a first entity link information acquiring unit, configured to acquire first entity link information, where the first entity link information indicates a first entity to which a participle in the query text is linked on a knowledge graph; a second entity link information acquiring unit, configured to acquire second entity link information, where the second entity link information indicates a second entity to which a participle in the text to be matched is linked on the knowledge graph; and an incidence relation determining unit, configured to determine the incidence relation between the first entity and the second entity in the knowledge graph as the incidence relation between the corresponding participle in the query text and the corresponding participle in the text to be matched.
In some embodiments of the present application, the enhancing module 1120 is further configured to multiply the matching weight matrix and the similarity matrix to obtain the first similarity matrix.
In some embodiments of the present application, the matching score determining module 1130 includes: the pooling processing unit is used for pooling the first similarity matrix to obtain a second similarity matrix; and the matching degree score calculating unit is used for calculating the matching degree score between the query text and the text to be matched according to the second similarity matrix.
In some embodiments of the present application, the matching degree score calculating unit includes: the attention weighting unit is used for carrying out attention weighting on the second similarity matrix based on an attention mechanism to obtain a third similarity matrix; and the score prediction unit is used for performing score prediction according to the third similarity matrix to obtain a matching degree score between the query text and the text to be matched.
In some embodiments of the present application, the attention weighting unit includes: the key matrix determining unit is used for performing linear transformation on the word vectors of all the participles in the query text according to the key weight vectors to obtain key vectors corresponding to all the participles in the query text; the query matrix determining unit is used for performing linear transformation on the semantic feature vectors corresponding to the query texts according to the query weight vectors to obtain query vectors; the attention score determining unit is used for calculating the attention score corresponding to each participle in the query text according to the key vector corresponding to each participle in the query text and the query vector; the target similarity vector determining unit is used for weighting the value vectors corresponding to the participles in the query text according to the attention scores corresponding to the participles in the query text to obtain target similarity vectors corresponding to the participles in the query text; the value vector corresponding to the word in the query text is obtained by carrying out linear transformation on the similarity vector corresponding to the word in the query text according to the value weight vector, and the similarity vector corresponding to the word in the query text is obtained by extracting elements related to the corresponding word in the query text from the second similarity matrix and combining the extracted elements; and the third similarity matrix determining unit is used for combining the target similarity vectors corresponding to the participles in the query text to obtain a third similarity matrix.
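The data flow through these units can be sketched as below. This is a sketch under several assumptions that the text does not fix: the linear transformations are modeled as weight matrices, the attention score is a scaled dot product followed by a softmax over the query participles, and all dimensions and random inputs are hypothetical.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - np.max(x))
    return e / e.sum()

def attention_weight(word_vecs, query_feature, sim_vecs, w_key, w_query, w_value):
    """Return (third similarity matrix, attention scores).

    word_vecs:     (n, d_word) word vectors of the n query participles
    query_feature: (d_feat,)   semantic feature vector of the query text
    sim_vecs:      (n, d_sim)  similarity vector per query participle,
                               extracted from the second similarity matrix
    """
    keys = word_vecs @ w_key              # key vector per participle, (n, d_k)
    query = query_feature @ w_query       # single query vector, (d_k,)
    # Assumed scoring rule: scaled dot product, normalized with softmax.
    scores = softmax(keys @ query / np.sqrt(keys.shape[1]))
    values = sim_vecs @ w_value           # value vector per participle, (n, d_v)
    third = scores[:, None] * values      # weighted target similarity vectors
    return third, scores

# Hypothetical dimensions and random inputs, for illustration only.
rng = np.random.default_rng(0)
n, d_word, d_feat, d_sim, d_k, d_v = 2, 4, 6, 3, 5, 3
third, scores = attention_weight(
    rng.normal(size=(n, d_word)),
    rng.normal(size=d_feat),
    rng.normal(size=(n, d_sim)),
    rng.normal(size=(d_word, d_k)),
    rng.normal(size=(d_feat, d_k)),
    rng.normal(size=(d_sim, d_v)),
)
print(third.shape, scores)
```

Each row of the returned matrix is the target similarity vector of one query participle, so stacking the rows corresponds to the combining step performed by the third similarity matrix determining unit.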
In some embodiments of the present application, the second similarity matrix comprises a first pooled matrix and a second pooled matrix; a pooling processing unit comprising: the first pooling processing unit is used for pooling the first similarity matrix along the transverse direction of the first similarity matrix to obtain a first pooling matrix; and the second pooling processing unit is used for pooling the first similarity matrix along the longitudinal direction of the first similarity matrix to obtain a second pooling matrix.
In some embodiments of the present application, the target matching text determining module 1140 comprises: a sorting unit, configured to sort the texts to be matched in descending order of matching degree score; and a target matching text determining unit, configured to determine a preset number of top-ranked texts to be matched as the target matching texts.
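The sorting and selection performed by these two units amounts to a top-k selection, sketched below. The function name, the candidate texts and their scores are hypothetical.

```python
def select_target_texts(scored_texts, preset_number):
    """Sort (text, matching degree score) pairs by score in descending order
    and return the texts of the first `preset_number` pairs."""
    ranked = sorted(scored_texts, key=lambda pair: pair[1], reverse=True)
    return [text for text, _ in ranked[:preset_number]]

# Hypothetical candidate texts with hypothetical matching degree scores.
candidates = [
    ("accumulation fund inquiry", 0.21),
    ("accumulation fund loan application", 0.93),
    ("loan calculator", 0.57),
]
top = select_target_texts(candidates, 2)
print(top)
```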
In some embodiments of the present application, the text processing apparatus further comprises: the service query request receiving module is used for receiving a service query request sent by a client, wherein the service query request indicates a query text; and further comprising: the application information acquisition module is used for acquiring application information of a target service application, wherein the target service application refers to a service application corresponding to a target matching text; the query result generation module is used for generating a query result according to the application information; and the query result returning module is used for returning the query result to the client so that the client displays the service entrance of the target service application according to the query result.
FIG. 12 illustrates a schematic structural diagram of a computer system suitable for use in implementing the electronic device of an embodiment of the present application.
It should be noted that the computer system 1200 of the electronic device shown in fig. 12 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 12, the computer system 1200 includes a Central Processing Unit (CPU) 1201, which can perform various appropriate actions and processes, such as executing the methods in the above-described embodiments, according to a program stored in a Read-Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data necessary for system operation are also stored. The CPU 1201, ROM 1202, and RAM 1203 are connected to each other by a bus 1204. An Input/Output (I/O) interface 1205 is also connected to the bus 1204.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a display device such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 1208 including a hard disk and the like; and a communication section 1209 including a network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 1209 performs communication processing via a network such as the internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is mounted on the drive 1210 as necessary, so that a computer program read out therefrom is installed into the storage section 1208 as necessary.
In particular, according to embodiments of the application, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, embodiments of the present application include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1209, and/or installed from the removable medium 1211. The computer program executes various functions defined in the system of the present application when executed by a Central Processing Unit (CPU) 1201.
It should be noted that the computer readable medium shown in the embodiments of the present application may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM), a flash Memory, an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software, or may be implemented by hardware, and the described units may also be disposed in a processor. The names of the units do not, in some cases, constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable storage medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer-readable storage medium carries computer-readable instructions that, when executed by a processor, implement the text processing method of any of the embodiments described above.
According to an aspect of the present application, there is also provided an electronic device, including: a processor; a memory having computer readable instructions stored thereon, the computer readable instructions, when executed by the processor, implement the text processing method in any of the above embodiments.
According to an aspect of an embodiment of the present application, there is provided a computer program product or a computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to execute the text processing method in any of the above embodiments.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided so as to be embodied by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a touch terminal, or a network device, etc.) to execute the method according to the embodiments of the present application.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (15)

1. A method of text processing, comprising:
determining a matching weight matrix of a query text relative to a text to be matched, wherein the matching weight matrix comprises at least one of a first weight matrix and a second weight matrix, the first weight matrix is determined according to the category of participles to which the participles in the query text belong and the category of participles to which the participles in the text to be matched belong, and the second weight matrix is determined according to the incidence relation between the participles in the query text and the participles in the text to be matched;
according to the matching weight matrix, enhancing a similarity matrix of the query text relative to the text to be matched to obtain a first similarity matrix, wherein the similarity matrix of the query text relative to the text to be matched is obtained by performing similarity calculation according to word vectors of all participles in the query text and word vectors of all participles in the text to be matched;
determining a matching degree score between the query text and the text to be matched according to the first similarity matrix;
and determining a target matching text according to the matching degree score between the query text and the text to be matched.
2. The method of claim 1, wherein the matching weight matrix comprises a first weight matrix; the determining of the matching weight matrix of the query text relative to the text to be matched comprises:
identifying the participle category to which each participle in the query text belongs;
determining a first weight of each participle in the query text relative to each participle in the text to be matched according to the participle category to which each participle in the query text belongs, the participle category to which each participle in the text to be matched belongs and weight mapping information; the weight mapping information indicates a first weight associated with any two participle categories;
and combining the first weights of all the participles in the query text relative to all the participles in the text to be matched to obtain the first weight matrix.
3. The method of claim 2, wherein the identifying the participle category to which each participle in the query text belongs comprises:
acquiring first entity link information, wherein the first entity link information is obtained by carrying out entity link on each participle in the query text in a knowledge graph;
and taking the category to which the first entity belongs as the participle category to which the participle in the query text belongs, the first entity being the entity to which the participle in the query text is linked in the knowledge graph.
4. The method of claim 1, wherein the matching weight matrix comprises a second weight matrix; the determining of the matching weight matrix of the query text relative to the text to be matched comprises:
identifying the incidence relation between the participles in the query text and the participles in the text to be matched according to a knowledge graph;
carrying out weight search according to the incidence relation to obtain a second weight of the participles in the query text relative to the participles in the text to be matched;
and combining the second weight of each participle in the query text relative to each participle in the text to be matched to obtain the second weight matrix.
5. The method according to claim 4, wherein the identifying the association relationship between the participles in the query text and the participles in the text to be matched according to the knowledge graph comprises:
acquiring first entity link information, wherein the first entity link information is used for indicating a first entity to which a participle in the query text is linked on the knowledge graph;
acquiring second entity link information, wherein the second entity link information is used for indicating a second entity to which the participles in the text to be matched are linked on the knowledge graph;
and determining the incidence relation between the first entity and the second entity in the knowledge graph as the incidence relation between the corresponding participle in the query text and the corresponding participle in the text to be matched.
6. The method according to claim 1, wherein the enhancing the similarity matrix of the query text with respect to the text to be matched according to the matching weight matrix to obtain a first similarity matrix comprises:
and multiplying the matching weight matrix and the similarity matrix to obtain the first similarity matrix.
7. The method of claim 1, wherein the determining a matching score between the query text and the text to be matched according to the first similarity matrix comprises:
performing pooling treatment on the first similarity matrix to obtain a second similarity matrix;
and calculating a matching degree score between the query text and the text to be matched according to the second similarity matrix.
8. The method according to claim 7, wherein the calculating a matching score between the query text and the text to be matched according to the second similarity matrix comprises:
performing attention weighting on the second similarity matrix based on an attention mechanism to obtain a third similarity matrix;
and performing score prediction according to the third similarity matrix to obtain a matching degree score between the query text and the text to be matched.
9. The method of claim 8, wherein the attention-weighting the second similarity matrix based on an attention mechanism to obtain a third similarity matrix comprises:
performing linear transformation on the word vector of each participle in the query text according to the key weight vector to obtain a key vector corresponding to each participle in the query text;
performing linear transformation on the semantic feature vector corresponding to the query text according to the query weight vector to obtain a query vector;
calculating according to the key vectors corresponding to the participles in the query text and the query vector to obtain the attention scores corresponding to the participles in the query text;
weighting the value vectors corresponding to the participles in the query text according to the attention scores corresponding to the participles in the query text to obtain target similarity vectors corresponding to the participles in the query text; the value vector corresponding to the word in the query text is obtained by performing linear transformation on the similarity vector corresponding to the word in the query text according to the value weight vector, and the similarity vector corresponding to the word in the query text is obtained by extracting elements related to the corresponding word in the query text from the second similarity matrix and combining the extracted elements;
and combining the target similarity vectors corresponding to the participles in the query text to obtain the third similarity matrix.
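The attention weighting recited in claim 9 can be illustrated with a minimal sketch. Several details are assumptions made only for illustration and are not stated in the claim: the "linear transformation according to a weight vector" is taken to be an element-wise product, the attention score is taken to be a softmax over the dot products of each key vector with the query vector, and the function and parameter names (`attention_weight`, `key_w`, `query_w`, `value_w`) are hypothetical.

```python
import math


def hadamard(weights, vector):
    # Assumed form of the "linear transformation": element-wise scaling.
    return [w * x for w, x in zip(weights, vector)]


def softmax(scores):
    # Numerically stable softmax over a list of raw scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weight(word_vectors, semantic_vector, similarity_vectors,
                     key_w, query_w, value_w):
    """Return (attention scores, third similarity matrix).

    word_vectors: one word vector per participle of the query text.
    semantic_vector: semantic feature vector of the whole query text
        (assumed to have the same dimension as the word vectors).
    similarity_vectors: one vector per participle, built from the
        elements of the second similarity matrix related to it.
    """
    query = hadamard(query_w, semantic_vector)
    keys = [hadamard(key_w, v) for v in word_vectors]
    # Assumed scoring rule: dot product of key and query, then softmax.
    raw = [sum(k * q for k, q in zip(key, query)) for key in keys]
    scores = softmax(raw)
    values = [hadamard(value_w, s) for s in similarity_vectors]
    # Each row is the target similarity vector of one participle;
    # stacking the rows gives the third similarity matrix.
    third = [[a * x for x in value] for a, value in zip(scores, values)]
    return scores, third
```

In this sketch each row of the returned matrix corresponds to one participle of the query text, so participles whose key vectors align with the query vector contribute more to the subsequent score prediction.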
10. The method of claim 7, wherein the second similarity matrix comprises a first pooling matrix and a second pooling matrix;
the pooling processing of the first similarity matrix to obtain a second similarity matrix includes:
pooling the first similarity matrix along the transverse direction of the first similarity matrix to obtain a first pooled matrix;
and pooling the first similarity matrix along the longitudinal direction of the first similarity matrix to obtain a second pooled matrix.
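The two pooling directions in claim 10 can be sketched as follows. The claim does not state the pooling operator or the matrix layout, so this example assumes max pooling, with rows indexed by participles of the query text and columns indexed by participles of the text to be matched; the function names are hypothetical.

```python
def pool_rows(matrix):
    # Pool along the transverse (row) direction:
    # one value per participle of the query text.
    return [max(row) for row in matrix]


def pool_columns(matrix):
    # Pool along the longitudinal (column) direction:
    # one value per participle of the text to be matched.
    return [max(column) for column in zip(*matrix)]


first_similarity = [
    [0.1, 0.9, 0.3],
    [0.4, 0.2, 0.8],
]
first_pooled = pool_rows(first_similarity)
second_pooled = pool_columns(first_similarity)
```

Under these assumptions, `first_pooled` records the best match found for each query participle, while `second_pooled` records the best match found for each participle of the text to be matched.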
11. The method of claim 1, wherein the determining a target matching text according to the matching degree score between the query text and the text to be matched comprises:
sorting the texts to be matched in descending order of their matching degree scores;
and determining a preset number of the top-ranked texts to be matched in the sorted order as target matching texts.
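The selection step of claim 11 amounts to a descending sort followed by truncation. A minimal sketch, in which the function name `select_target_texts` and the `(text, score)` pair representation are assumptions made for illustration:

```python
def select_target_texts(scored_texts, preset_number):
    """scored_texts: list of (text to be matched, matching degree score)."""
    # Sort from the largest matching degree score to the smallest.
    ranked = sorted(scored_texts, key=lambda item: item[1], reverse=True)
    # Keep the first `preset_number` texts as target matching texts.
    return [text for text, _ in ranked[:preset_number]]
```

Python's `sorted` is stable, so texts with equal scores keep their original relative order in this sketch; the claim itself does not specify how ties are resolved.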
12. The method of claim 1, wherein prior to determining the matching weight matrix of the query text with respect to the text to be matched, the method further comprises:
receiving a service query request sent by a client, wherein the service query request indicates the query text;
after determining a target matching text according to the matching degree score between the query text and the text to be matched, the method further comprises the following steps:
acquiring application information of a target service application, wherein the target service application refers to a service application corresponding to the target matching text;
generating a query result according to the application information;
and returning the query result to the client, so that the client displays a service entry of the target service application according to the query result.
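The request flow of claim 12 can be sketched end to end. Everything below is hypothetical scaffolding rather than the claimed implementation: the request is modeled as a dictionary with a `query_text` field, `score_fn` stands in for the matching degree scoring of the preceding claims, and `app_info_by_text` is an assumed lookup from each text to be matched to the application information of its service application.

```python
def handle_service_query(request, candidate_texts, score_fn,
                         app_info_by_text, preset_number=1):
    # The service query request indicates the query text.
    query_text = request["query_text"]
    # Score every text to be matched against the query text.
    scored = [(text, score_fn(query_text, text)) for text in candidate_texts]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    targets = [text for text, _ in scored[:preset_number]]
    # Build the query result from the application information of the
    # service applications corresponding to the target matching texts.
    return {
        "query_text": query_text,
        "results": [app_info_by_text[text] for text in targets],
    }


def overlap_score(query_text, text):
    # Toy stand-in scorer: number of shared whitespace-separated tokens.
    return len(set(query_text.split()) & set(text.split()))
```

A client receiving the returned dictionary could render one service entry per item in `results`; the field names here are illustrative only.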
13. A text processing apparatus, comprising:
the matching weight matrix determining module is used for determining a matching weight matrix of the query text relative to the text to be matched, wherein the matching weight matrix comprises at least one of a first weight matrix and a second weight matrix; the first weight matrix is determined according to the word class to which each word in the query text belongs and the word class to which each word in the text to be matched belongs; the second weight matrix is determined according to the incidence relation between each participle in the query text and each participle in the text to be matched;
the enhancement module is used for enhancing the similarity matrix of the query text relative to the text to be matched according to the matching weight matrix to obtain a first similarity matrix; the similarity matrix is obtained by performing similarity calculation according to the word vector of each participle in the query text and the word vector of each participle in the text to be matched;
the matching degree score determining module is used for determining a matching degree score between the query text and the text to be matched according to the first similarity matrix;
and the target matching text determining module is used for determining a target matching text according to the matching degree score between the query text and the text to be matched.
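The enhancement performed by the enhancement module can be illustrated with a small sketch. The text above states only that the similarity matrix is computed from word vectors and then enhanced according to the matching weight matrix; this example assumes cosine similarity for the former and an element-wise product for the latter, and all function names are hypothetical.

```python
import math


def cosine(u, v):
    # Cosine similarity between two word vectors (0.0 for zero vectors).
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similarity_matrix(query_vectors, text_vectors):
    # Rows: participles of the query text.
    # Columns: participles of the text to be matched.
    return [[cosine(q, t) for t in text_vectors] for q in query_vectors]


def enhance(similarity, weights):
    # Assumed enhancement: scale each similarity by its matching weight.
    return [
        [s * w for s, w in zip(s_row, w_row)]
        for s_row, w_row in zip(similarity, weights)
    ]
```

With this choice, a weight above 1 amplifies the similarity of a participle pair that the weight matrix marks as important, and a weight below 1 suppresses it.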
14. An electronic device, comprising:
a processor;
a memory having computer-readable instructions stored thereon which, when executed by the processor, implement the method of any of claims 1-12.
15. A computer readable storage medium having computer readable instructions stored thereon which, when executed by a processor, implement the method of any one of claims 1-12.
CN202110614403.5A 2021-06-02 2021-06-02 Text processing method and device, electronic equipment and storage medium Pending CN113821588A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110614403.5A CN113821588A (en) 2021-06-02 2021-06-02 Text processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110614403.5A CN113821588A (en) 2021-06-02 2021-06-02 Text processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113821588A true CN113821588A (en) 2021-12-21

Family

ID=78923796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110614403.5A Pending CN113821588A (en) 2021-06-02 2021-06-02 Text processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113821588A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114548047A (en) * 2022-04-25 2022-05-27 阿里巴巴达摩院(杭州)科技有限公司 Data processing method and device, and text processing method and device
CN114596574A (en) * 2022-03-22 2022-06-07 北京百度网讯科技有限公司 Text recognition method and device, electronic equipment and medium

Similar Documents

Publication Publication Date Title
Ray et al. A mixed approach of deep learning method and rule-based method to improve aspect level sentiment analysis
CN109284357B (en) Man-machine conversation method, device, electronic equipment and computer readable medium
KR102288249B1 (en) Information processing method, terminal, and computer storage medium
WO2018151856A1 (en) Intelligent matching system with ontology-aided relation extraction
KR101458004B1 (en) System and method for predicting change of stock price using artificial neural network model
Wu et al. Infusing finetuning with semantic dependencies
Liu et al. Open intent discovery through unsupervised semantic clustering and dependency parsing
CN111274822A (en) Semantic matching method, device, equipment and storage medium
CN113821588A (en) Text processing method and device, electronic equipment and storage medium
CN111368555B (en) Data identification method and device, storage medium and electronic equipment
JP2022035314A (en) Information processing unit and program
Xiong et al. DNCP: An attention-based deep learning approach enhanced with attractiveness and timeliness of News for online news click prediction
Fares et al. Difficulties and improvements to graph-based lexical sentiment analysis using lisa
CN112818091A (en) Object query method, device, medium and equipment based on keyword extraction
CN111126073B (en) Semantic retrieval method and device
CN112182126A (en) Model training method and device for determining matching degree, electronic equipment and readable storage medium
CN116521892A (en) Knowledge graph application method, knowledge graph application device, electronic equipment, medium and program product
CN115062135A (en) Patent screening method and electronic equipment
CN114942981A (en) Question-answer query method and device, electronic equipment and computer readable storage medium
CN115129863A (en) Intention recognition method, device, equipment, storage medium and computer program product
US11734602B2 (en) Methods and systems for automated feature generation utilizing formula semantification
Machado et al. Analysis of unsupervised aspect term identification methods for portuguese reviews
CN111126033A (en) Response prediction device and method for article
Kuzmin Item2vec-based approach to a recommender system
KR102625347B1 (en) A method for extracting food menu nouns using parts of speech such as verbs and adjectives, a method for updating a food dictionary using the same, and a system for the same

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination