CN116028626A - Text matching method and device, storage medium and electronic equipment - Google Patents

Text matching method and device, storage medium and electronic equipment

Info

Publication number
CN116028626A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310066272.0A
Other languages
Chinese (zh)
Inventor
唐伟佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
ICBC Technology Co Ltd
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
ICBC Technology Co Ltd

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a text matching method, a text matching device, a storage medium and electronic equipment, and relates to the field of artificial intelligence. The method comprises the following steps: acquiring initial text content sent by a user, and acquiring any one object text content in a database together with the application scene of that object text content; judging whether the initial text content contains errors in the application scene and, if it does not, inputting the initial text content and the object text content into each model in a model cluster in sequence to obtain the similarity score output by each model; and inputting the multiple similarity scores into a logistic regression model to obtain a similarity determination result for the initial text content and the object text content, and determining from that result whether the two match. The method and device address the low accuracy of similarity judgment in the related art, where the text is input into a single model.

Description

Text matching method and device, storage medium and electronic equipment
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a text matching method, a device, a storage medium, and an electronic apparatus.
Background
Text matching is a basic task in natural language processing and is widely applied in intelligent question-answering systems and information retrieval. Given two texts, the matching model or system must judge whether their semantics are the same, which is a typical binary classification task. Early text matching was mostly based on statistical learning methods such as BM25, but because these methods do not model features at the semantic level of the text, they often perform poorly in practice. With the rapid development of deep learning in recent years, a series of text matching methods based on static word vectors have emerged and are used to determine whether texts match.
In the deep learning era, mainstream text matching methods fall into two main categories: representation-based methods and interaction-based methods. Most representation-based methods adopt a twin (Siamese) network architecture, such as Siamese-LSTM: they extract high-level features of the two texts separately, by means of metric learning and the like, and then determine the similarity of the texts through a distance calculation. Such methods generally perform poorly because they do not consider the semantic relevance between the two sentences. Interaction-based methods generally use an attention mechanism in the model structure to let the features of the two sentences interact, and so obtain a better matching effect. Most popular pre-trained models fall into the second category when applied to text matching tasks, and they often far outperform methods without pre-training.
The text matching methods above, including the conventional ones, share a common feature: the two texts to be compared are input directly into a single text matching model, which directly decides whether the two texts are identical. The texts are not preprocessed beforehand, so the recognition and judgment accuracy of the model decreases once they are input.
No effective solution has yet been proposed for the problem in the related art of low accuracy when similarity is judged by inputting the text into a single model.
Disclosure of Invention
The application provides a text matching method, a device, a storage medium and electronic equipment, which are used for solving the problem of low accuracy of similarity judgment by inputting texts into a single model in the related technology.
According to one aspect of the present application, a text matching method is provided. The method comprises the following steps: acquiring initial text content sent by a user, and acquiring any one object text content and an application scene of the object text content in a database; judging whether the initial text content has errors or not in an application scene, and under the condition that the initial text content does not have errors, sequentially inputting the initial text content and the object text content into each model in a model cluster to obtain a similarity score output by each model, wherein a plurality of machine learning models exist in the model cluster, and each machine learning model is used for determining the similarity scores of the initial text content and the object text content; and inputting the multiple similarity scores into a logistic regression model to obtain a similarity determination result of the initial text content and the object text content, and determining whether the initial text content and the object text content are matched according to the similarity determination result.
Optionally, determining whether the initial text content has an error in the application scene includes: word segmentation is carried out on the initial text content, and a word segmentation result is obtained, wherein the word segmentation result comprises a plurality of words; acquiring a historical word correction record in an application scene from a dictionary database, and judging whether a plurality of words exist in the historical word correction record or not to obtain a judgment result; determining that the initial text content is wrong under the condition that the judging result indicates that at least one word in the plurality of words exists in the historical word correction record; and under the condition that the judging result indicates that a plurality of words are not in the historical word correction record, determining that the initial text content is not wrong.
Optionally, the model cluster includes a word matching model, inputting the initial text content and the object text content into the word matching model, and obtaining a similarity score output by the word matching model includes: word segmentation is carried out on the initial text content according to a first granularity and a second granularity, and a first word segmentation result and a second word segmentation result are obtained, wherein the first granularity is smaller than the second granularity; performing word segmentation on the text content of the object according to the first granularity and the second granularity to obtain a third word segmentation result and a fourth word segmentation result; inputting the first word segmentation result and the third word segmentation result with the same granularity into a word matching model to obtain a first similarity value; inputting the second word segmentation result and the fourth word segmentation result with the same granularity into a word matching model to obtain a second similarity value; and determining a similarity score output by the word matching model through the first similarity value and the second similarity value.
Optionally, the model cluster includes a relationship determination model, and inputting the initial text content and the object text content into the relationship determination model to obtain a similarity score output by the relationship determination model includes: inputting the initial text content and the object text content into a relation judging model to obtain a judging result, wherein the judging result is used for determining whether a containing relation exists between the initial text content and the object text content; determining a similarity score output by a relation judging model as a first score under the condition that the judging result represents that an inclusion relation exists between the initial text content and the object text content; and determining the similarity score output by the relation determination model as a second score when the determination result indicates that the inclusion relation does not exist between the initial text content and the object text content, wherein the second score is smaller than the first score.
Optionally, the model cluster includes a first comparison model, and inputting the initial text content and the object text content into the first comparison model to obtain the similarity score output by the first comparison model includes: segmenting the initial text content and the object text content according to sentence components to obtain two groups of fifth word segmentation results; inputting the two groups of fifth word segmentation results into the first comparison model to obtain a first comparison result, wherein the first comparison model is used for determining whether antonyms exist among words having the same sentence component in the initial text content and the object text content, and the first comparison result includes the number of antonym pairs; and acquiring the number of antonym pairs from the first comparison result, and determining the similarity score output by the first comparison model according to that number.
Optionally, the model cluster includes a second comparison model, and inputting the initial text content and the object text content into the second comparison model to obtain the similarity score output by the second comparison model includes: determining keywords in the initial text content through a dictionary database to obtain first keywords; determining keywords in the object text content through the dictionary database to obtain second keywords; inputting the first keywords and the second keywords into the second comparison model to obtain a second comparison result, wherein the second comparison model is used for determining the similarity of the keywords in the text contents; and determining the second comparison result as the similarity score output by the second comparison model.
Optionally, before determining the keywords in the initial text content through the dictionary database to obtain the first keywords, the method further includes: determining through the dictionary database whether keywords exist in the initial text content and in the object text content; determining the similarity score output by the second comparison model as the second score when keywords are absent from the initial text content or from the object text content; and determining the similarity score output by the second comparison model as the first score when keywords are absent from both the initial text content and the object text content.
According to another aspect of the present application, a text matching device is provided. The device comprises: the acquisition unit is used for acquiring initial text content sent by a user and acquiring any one object text content and application scene of the object text content in the database; the input unit is used for judging whether the initial text content has errors or not under an application scene, and inputting the initial text content and the object text content into each model in the model cluster in sequence under the condition that the initial text content does not have errors to obtain the similarity score output by each model, wherein a plurality of machine learning models exist in the model cluster, and each machine learning model is used for determining the similarity scores of the initial text content and the object text content; and the first determining unit is used for inputting the plurality of similarity scores into the logistic regression model to obtain a similarity determining result of the initial text content and the object text content, and determining whether the initial text content and the object text content are matched according to the similarity determining result.
According to another aspect of the embodiment of the present invention, there is also provided a computer storage medium for storing a program, where the program controls a device in which the computer storage medium is located to execute a text matching method when running.
According to another aspect of embodiments of the present invention, there is also provided an electronic device including one or more processors and a memory; the memory has stored therein computer readable instructions for execution by the processor, wherein the computer readable instructions when executed perform a text matching method.
Through the present application, the following steps are adopted: acquiring initial text content sent by a user, and acquiring any one object text content and an application scene of the object text content in a database; judging whether the initial text content has errors in the application scene and, when it does not, sequentially inputting the initial text content and the object text content into each model in a model cluster to obtain a similarity score output by each model, wherein the model cluster contains a plurality of machine learning models, each used for determining a similarity score of the initial text content and the object text content; and inputting the multiple similarity scores into a logistic regression model to obtain a similarity determination result of the initial text content and the object text content, and determining whether the two match according to the similarity determination result. This solves the problem in the related art of low accuracy when similarity is judged by inputting the text into a single model. The method first determines whether the text content to be compared is correct; when it is, the initial text content and the object text content are judged by a plurality of models in turn to obtain a plurality of similarity results, and a comprehensive judgment over those results achieves the effect of accurately determining whether the two text contents match.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application, illustrate and explain the application and are not to be construed as limiting the application. In the drawings:
FIG. 1 is a flow chart of a text matching method provided according to an embodiment of the present application;
FIG. 2 is a flowchart of steps for determining whether an initial text is correct or incorrect, provided in accordance with an embodiment of the present application;
FIG. 3 is a schematic diagram of a text matching device provided according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an electronic device according to an embodiment of the present application.
Detailed Description
It should be noted that, in the case of no conflict, the embodiments and features in the embodiments may be combined with each other. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described in detail below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art based on the embodiments herein without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate in order to describe the embodiments of the present application described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, related information (including, but not limited to, user equipment information, user personal information, etc.) and data (including, but not limited to, data for presentation, analyzed data, etc.) related to the present disclosure are information and data authorized by a user or sufficiently authorized by each party. For example, an interface is provided between the system and the relevant user or institution, before acquiring the relevant information, the system needs to send an acquisition request to the user or institution through the interface, and acquire the relevant information after receiving the consent information fed back by the user or institution.
It should be noted that the text matching method, the device, the storage medium and the electronic device determined by the present disclosure may be used in the field of artificial intelligence, and may also be used in any field other than the field of artificial intelligence, and the application fields of the text matching method, the device, the storage medium and the electronic device determined by the present disclosure are not limited.
According to an embodiment of the application, a text matching method is provided.
Fig. 1 is a flowchart of a text matching method provided according to an embodiment of the present application. As shown in fig. 1, the method comprises the steps of:
Step S101, acquiring initial text content sent by a user, and acquiring any one object text content and application scene of the object text content in a database.
It should be noted that the object text content may be a preset text with a preset result. After the initial text content sent by the user is obtained, the information the user wants to obtain, or the request the user wants to execute, must be determined from it. The initial text content is therefore compared in turn with each object text content stored in the database to find the text closest to the initial text content, and the preset result corresponding to that text is taken as the result for the user.
Specifically, when text matching is performed, the initial text content sent by the user is acquired first. The initial text content may express the user's requirement and may be input by voice or by typing. After it is acquired, any one object text content can be selected from the existing object text contents, and the similarity, that is, the matching degree, between the initial text content and that object text content is determined. After the similarity between the initial text content and each object text content has been determined in turn, the object text content with the highest similarity can be selected and determined to be the text content corresponding to the initial text content, and the information corresponding to it is fed back to the user.
It should be added that, after the object text content is determined, its application scene is determined first. Error-prone words differ from scene to scene, as does vocabulary in general: word A may be an error in some scenes and correct in others. It is therefore necessary to judge whether the initial text content contains errors in the specific application scene.
Step S102, judging whether the initial text content has errors or not under an application scene, and inputting the initial text content and the object text content into each model in a model cluster in sequence under the condition that the initial text content has no errors to obtain a similarity score output by each model, wherein a plurality of machine learning models exist in the model cluster, and each machine learning model is used for determining the similarity scores of the initial text content and the object text content.
Specifically, in the case that the initial text content has no error, the initial text content and the object text content need to be input into a plurality of pre-trained machine learning models, so that similarity scores between the initial text content and the object text content are determined from a plurality of angles through the plurality of models, and further, the similarity between the initial text content and the object text content can be evaluated from a plurality of angles.
Step S103, inputting a plurality of similarity scores into a logistic regression model to obtain a similarity determination result of the initial text content and the object text content, and determining whether the initial text content and the object text content are matched according to the similarity determination result.
Specifically, after the multiple similarity scores are obtained, a logistic regression model can be used to classify them: the similarity scores serve as features of different dimensions and are input together into a binary logistic regression model, which predicts the final matching result, that is, whether the initial text content and the object text content are consistent. The logistic regression calculation over the multiple similarity scores yields an overall similarity score; among the overall scores obtained for the initial text content and the multiple object text contents, the object text content with the highest score may be determined as the text closest to the initial text content.
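The combination step can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the two toy scorers `exact` and `shared_words` stand in for the trained models of the model cluster, and the weights, bias and 0.5 threshold are hand-picked rather than learned. Only the idea of feeding the per-model scores as features into a binary logistic regression and keeping the highest-scoring object text comes from the description above.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A scorer takes (initial_text, object_text) and returns one similarity score.
Scorer = Callable[[str, str], float]


def logistic(z: float) -> float:
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))


def match_probability(scores: Sequence[float],
                      weights: Sequence[float],
                      bias: float) -> float:
    """Treat each model's score as one feature of a binary logistic regression."""
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return logistic(z)


def best_match(initial: str,
               candidates: List[str],
               cluster: List[Scorer],
               weights: Sequence[float],
               bias: float,
               threshold: float = 0.5) -> Tuple[str, float, bool]:
    """Score every candidate with every model and keep the most probable match."""
    best_text, best_p = "", -1.0
    for obj in candidates:
        # The models of the cluster are applied one after another.
        scores = [model(initial, obj) for model in cluster]
        p = match_probability(scores, weights, bias)
        if p > best_p:
            best_text, best_p = obj, p
    return best_text, best_p, best_p >= threshold


# Hypothetical two-model cluster (stand-ins, not the models of the application).
def exact(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


def shared_words(a: str, b: str) -> float:
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0


text, prob, matched = best_match(
    "reset my card password",
    ["reset card password", "open a new account"],
    [exact, shared_words],
    weights=[2.0, 4.0],   # hand-picked for illustration, not learned
    bias=-2.0,
)
```

In practice the weights and bias would be fitted on labeled matching and non-matching text pairs; the sketch only shows how the scores are combined once those parameters exist.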
According to the text matching method provided in the embodiment of the present application, initial text content sent by a user is acquired, and any one object text content and an application scene of the object text content in a database are acquired; whether the initial text content has errors in the application scene is judged and, when it does not, the initial text content and the object text content are sequentially input into each model in a model cluster to obtain a similarity score output by each model, wherein the model cluster contains a plurality of machine learning models, each used for determining a similarity score of the initial text content and the object text content; and the multiple similarity scores are input into a logistic regression model to obtain a similarity determination result of the initial text content and the object text content, and whether the two match is determined according to the similarity determination result. This solves the problem in the related art of low accuracy when similarity is judged by inputting the text into a single model. The method first determines whether the text content to be compared is correct; when it is, the initial text content and the object text content are judged by a plurality of models in turn to obtain a plurality of similarity results, and a comprehensive judgment over those results achieves the effect of accurately determining whether the two text contents match.
Optionally, FIG. 2 is a flowchart of the steps for determining whether the initial text contains an error. As shown in FIG. 2, in the text matching method provided in the embodiment of the present application, judging in step S102 whether the initial text content has errors in the application scene includes:
Step S201, word segmentation is carried out on the initial text content to obtain word segmentation results, wherein the word segmentation results comprise a plurality of words;
Step S202, acquiring a historical word correction record in an application scene from a dictionary database, and judging whether the plurality of words exist in the historical word correction record or not to obtain a judgment result;
Step S203, determining that the initial text content has errors when the judgment result indicates that at least one word in the plurality of words exists in the historical word correction record;
Step S204, determining that no error exists in the initial text content when the judgment result indicates that none of the plurality of words exists in the historical word correction record.
Specifically, whether a given word is correct depends on the language environment. For example, in a school scene the word "steel" is most likely an error whose correct form is "just", whereas in a building-site scene the word "steel" is correct.
To catch such problems in time, when determining whether the initial text content contains an error, the historical word correction record for the current application scene can be obtained. The record contains multiple pairs of entries, each pairing an error word with its corresponding correct word. After the initial text content is segmented, each resulting word is checked in turn against the error words recorded in the dictionary database. When a word matches a recorded error word, it is determined to be an error word and is replaced in the initial text content by the corresponding correct word, thereby correcting the initial text content.
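A minimal sketch of this scene-dependent check follows. The in-memory `CORRECTION_RECORDS` table, the scene names and the whitespace-based `segment` function are hypothetical stand-ins for the dictionary database and the word segmenter, neither of which is specified in the application.

```python
from typing import Dict, List, Tuple

# Hypothetical dictionary database: scene -> {error word: correct word}.
CORRECTION_RECORDS: Dict[str, Dict[str, str]] = {
    "school": {"steel": "just"},
    "construction_site": {},
}


def segment(text: str) -> List[str]:
    """Stand-in word segmenter: split on whitespace (an assumption)."""
    return text.split()


def check_and_correct(text: str, scene: str) -> Tuple[bool, str]:
    """Return (has_error, corrected_text) for the given application scene."""
    records = CORRECTION_RECORDS.get(scene, {})
    words = segment(text)
    # The text has an error if at least one word is a recorded error word.
    has_error = any(w in records for w in words)
    # Replace each recorded error word with its corresponding correct word.
    corrected = " ".join(records.get(w, w) for w in words)
    return has_error, corrected


school_result = check_and_correct("I steel arrived", "school")
site_result = check_and_correct("the steel beam arrived", "construction_site")
```

The same word is flagged in one scene and accepted in the other because the lookup table is selected by scene before any word is checked.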
Optionally, in the text matching method provided in the embodiment of the present application, the model cluster includes a word matching model, and inputting the initial text content and the object text content into the word matching model, obtaining the similarity score output by the word matching model includes: word segmentation is carried out on the initial text content according to a first granularity and a second granularity, and a first word segmentation result and a second word segmentation result are obtained, wherein the first granularity is smaller than the second granularity; performing word segmentation on the text content of the object according to the first granularity and the second granularity to obtain a third word segmentation result and a fourth word segmentation result; inputting the first word segmentation result and the third word segmentation result with the same granularity into a word matching model to obtain a first similarity value; inputting the second word segmentation result and the fourth word segmentation result with the same granularity into a word matching model to obtain a second similarity value; and determining a similarity score output by the word matching model through the first similarity value and the second similarity value.
Specifically, the word matching model determines the similarity of two text contents from the similarity between their words. To improve its accuracy, both the initial text content and the object text content are segmented at a first granularity and at a second granularity. For example, the first granularity splits the text after every word and the second granularity splits it after every two words. The same pair of text contents is then compared at each segmentation granularity, which improves the accuracy of the similarity score.
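The two-granularity comparison can be sketched as follows. The application does not say how the word matching model computes a similarity value or how the two values are combined, so the Jaccard overlap and the plain average used here are assumptions, as are the function names and the whitespace tokenization.

```python
from typing import List, Set


def chunks(tokens: List[str], n: int) -> Set[str]:
    """Split the token sequence into consecutive units of n tokens each."""
    return {" ".join(tokens[i:i + n]) for i in range(0, len(tokens), n)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Overlap of two unit sets; an assumed stand-in for the word matching model."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def word_match_score(initial: str, obj: str) -> float:
    """Compare the two texts at both granularities and average the two values."""
    t1, t2 = initial.split(), obj.split()
    first = jaccard(chunks(t1, 1), chunks(t2, 1))   # first (finer) granularity
    second = jaccard(chunks(t1, 2), chunks(t2, 2))  # second (coarser) granularity
    return (first + second) / 2


score = word_match_score("check my account balance", "check my card balance")
```

At the finer granularity the two example sentences share three of five distinct units, while at the coarser granularity they share one of three, so the coarser view penalizes the differing phrase more heavily.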
Optionally, in the text matching method provided in the embodiment of the present application, the model cluster includes a relationship determination model, and inputting the initial text content and the object text content into the relationship determination model, obtaining the similarity score output by the relationship determination model includes: inputting the initial text content and the object text content into a relation judging model to obtain a judging result, wherein the judging result is used for determining whether a containing relation exists between the initial text content and the object text content; determining a similarity score output by a relation judging model as a first score under the condition that the judging result represents that an inclusion relation exists between the initial text content and the object text content; and determining the similarity score output by the relation determination model as a second score when the determination result indicates that the inclusion relation does not exist between the initial text content and the object text content, wherein the second score is smaller than the first score.
It should be noted that, the relationship determination model may be trained by the first sample data, where the first sample information may include a plurality of sets of text contents having a containing relationship and a plurality of sets of text contents having no containing relationship, and after the relationship determination model is trained, the relationship determination model may determine whether a containing relationship exists between two text contents.
Specifically, after the initial text content and the object text content are input into the relation determination model and the determination result is obtained, a determination result indicating an inclusion relation means that the two text contents are highly similar, so the similarity score can be determined as the first score. When no inclusion relation exists, the lower second score is determined as the similarity score. The absence of an inclusion relation does not by itself mean that the similarity between the initial text content and the object text content is low, so the other dimensions must still be evaluated to determine the similarity between them.
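The mapping from the determination result to the first or second score can be sketched as below. The trained relation determination model is replaced here by a plain substring check, which is only a stand-in, and the values 1.0 and 0.5 are assumed for illustration; the text requires only that the second score be smaller than the first. All names are hypothetical.

```python
FIRST_SCORE = 1.0   # assumed value for "inclusion relation exists"
SECOND_SCORE = 0.5  # assumed value; only required to be below FIRST_SCORE


def has_inclusion_relation(initial_text: str, object_text: str) -> bool:
    """Stand-in for the trained model: substring containment in either direction."""
    a = initial_text.strip()
    b = object_text.strip()
    if not a or not b:
        return False
    return a in b or b in a


def relation_score(initial_text: str, object_text: str) -> float:
    """Return the first score when an inclusion relation is found, else the second."""
    if has_inclusion_relation(initial_text, object_text):
        return FIRST_SCORE
    return SECOND_SCORE
```

Because the second score is a floor rather than zero, a missing inclusion relation lowers this dimension without ruling out a match on the other dimensions.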
Optionally, in the text matching method provided in the embodiment of the present application, the model cluster includes a first comparison model, and inputting the initial text content and the object text content into the first comparison model to obtain the similarity score output by the first comparison model includes: segmenting the initial text content and the object text content according to sentence components to obtain two groups of fifth word segmentation results; inputting the two groups of fifth word segmentation results into the first comparison model to obtain a first comparison result, wherein the first comparison model is used for determining whether antonyms exist among words having the same sentence component in the initial text content and the object text content, and the first comparison result includes the number of antonym pairs; and acquiring the number of antonym pairs from the first comparison result, and determining the similarity score output by the first comparison model according to that number.
Specifically, because comparing words alone yields low accuracy, the initial text content and the object text content can each be segmented according to sentence components, such as subject and predicate, and the segmentation results are input into the first comparison model. This makes it possible to determine, at the level of sentence components, whether antonyms (words with opposite meanings) are present. For example, the initial text content may be "today the weather is good" and the object text content may be "today the weather is bad". If similarity were determined by word segmentation alone, the two sentences might be judged highly similar because they share most of their words, even though their meanings are opposite. After segmentation by sentence component, antonyms are identified and detected, and when one or more antonym pairs exist, the similarity score is determined according to the number of antonym pairs, which improves the accuracy of the similarity determined between the initial text content and the object text content.
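The antonym check on words that share a sentence component can be sketched as follows. The sketch assumes the sentence components have already been labeled (the segmentation step is not shown), uses a tiny hypothetical antonym list, and maps the number of antonym pairs to a score with `1 / (1 + pairs)`; the text does not specify that mapping.

```python
# Hypothetical antonym list; each entry is an unordered pair.
ANTONYMS = {frozenset({"good", "bad"}), frozenset({"open", "close"})}


def count_antonym_pairs(initial_parts: dict[str, str],
                        object_parts: dict[str, str]) -> int:
    """Count components whose words in the two texts form an antonym pair."""
    count = 0
    for component, word in initial_parts.items():
        other = object_parts.get(component)
        if other is not None and frozenset({word, other}) in ANTONYMS:
            count += 1
    return count


def first_comparison_score(initial_parts: dict[str, str],
                           object_parts: dict[str, str]) -> float:
    pairs = count_antonym_pairs(initial_parts, object_parts)
    # Assumption: more antonym pairs means a lower similarity score.
    return 1.0 / (1 + pairs)


initial = {"time": "today", "subject": "weather", "predicate": "good"}
target = {"time": "today", "subject": "weather", "predicate": "bad"}
```

For the weather example, the two sentences share the time and subject components, but the predicate pair is an antonym pair, so this dimension scores them lower than a pure word-overlap measure would.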
Optionally, in the text matching method provided in the embodiment of the present application, the model cluster includes a second comparison model, and inputting the initial text content and the object text content into the second comparison model, obtaining the similarity score output by the second comparison model includes: determining keywords in the initial text content through a dictionary database to obtain first keywords; determining keywords in the text content of the object through a dictionary database to obtain second keywords; inputting the first keywords and the second keywords into a second comparison model to obtain a second comparison result, wherein the second comparison model is used for determining the similarity of the keywords in the text content; and determining the second comparison result as a similarity score output by the second comparison model.
Specifically, the dictionary database also stores a plurality of keywords, which may be important entity words such as place names, person names, and proper nouns. The keywords of each text content are obtained from the initial text content and the object text content respectively and are input into the second comparison model for comparison, so that the similarity score between the initial text content and the object text content is determined from the similarity between their keywords.
Optionally, in the text matching method provided in the embodiment of the present application, before determining the keywords in the initial text content through the dictionary database to obtain the first keyword, the method further includes: determining, through the dictionary database, whether keywords exist in the initial text content and in the object text content; determining the similarity score output by the second comparison model as the second score in the case that a keyword is absent from only one of the initial text content and the object text content; and determining the similarity score output by the second comparison model as the first score in the case that no keyword exists in either the initial text content or the object text content.
Specifically, a text content may contain no keyword. When only one of the initial text content and the object text content contains no keyword, the similarity score is determined as the second score, for example 0.5. When neither the initial text content nor the object text content contains a keyword, the similarity score is determined as the first score, for example 1. In this way, the second comparison model still yields a similarity score when the keywords cannot be compared, which ensures the accuracy of the subsequent calculation over the similarity scores of the multiple dimensions.
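The keyword comparison with its two fallback scores can be sketched as below. The example scores 1 and 0.5 come from the text; the dictionary entries, the substring lookup, and the Jaccard overlap standing in for the second comparison model are assumptions made for illustration, and all names are hypothetical.

```python
KEYWORD_DICTIONARY = {"Beijing", "Shanghai", "ICBC"}  # hypothetical entries
FIRST_SCORE = 1.0   # neither text contains a keyword
SECOND_SCORE = 0.5  # only one of the texts contains a keyword


def extract_keywords(text: str) -> set[str]:
    """Return the dictionary keywords that occur in the text."""
    return {kw for kw in KEYWORD_DICTIONARY if kw in text}


def second_comparison_score(initial_text: str, object_text: str) -> float:
    first_keywords = extract_keywords(initial_text)
    second_keywords = extract_keywords(object_text)
    if not first_keywords and not second_keywords:
        return FIRST_SCORE
    if not first_keywords or not second_keywords:
        return SECOND_SCORE
    # Stand-in for the second comparison model: overlap of the keyword sets.
    shared = first_keywords & second_keywords
    return len(shared) / len(first_keywords | second_keywords)
```

Handling the fallbacks before the comparison keeps this dimension from reporting a misleading zero when there is simply nothing to compare.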
It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowcharts, in some cases the steps illustrated or described may be performed in an order other than that illustrated herein.
The embodiment of the application also provides a text matching device, and the text matching device can be used for executing the text matching method. The text matching device provided in the embodiment of the present application is described below.
Fig. 3 is a schematic diagram of a text matching device provided according to an embodiment of the present application. As shown in fig. 3, the apparatus includes: an acquisition unit 31, an input unit 32, a first determination unit 33.
An acquisition unit 31, configured to acquire initial text content sent by a user, and to acquire any one object text content in a database and an application scene of the object text content.
An input unit 32, configured to judge whether the initial text content has an error in the application scene and, when the initial text content has no error, to sequentially input the initial text content and the object text content into each model in the model cluster to obtain a similarity score output by each model, where a plurality of machine learning models exist in the model cluster, and each machine learning model is configured to determine a similarity score of the initial text content and the object text content.
The first determining unit 33 is configured to input the plurality of similarity scores into a logistic regression model, obtain a similarity determination result of the initial text content and the object text content, and determine whether the initial text content and the object text content match according to the similarity determination result.
In the text matching device provided by the embodiment of the application, the acquisition unit 31 acquires initial text content sent by a user, and acquires any one object text content in a database and an application scene of the object text content. The input unit 32 judges whether the initial text content has an error in the application scene and, when the initial text content has no error, sequentially inputs the initial text content and the object text content into each model in the model cluster to obtain a similarity score output by each model, wherein a plurality of machine learning models exist in the model cluster, and each machine learning model is used for determining a similarity score of the initial text content and the object text content. The first determining unit 33 inputs the plurality of similarity scores into the logistic regression model, obtains a similarity determination result of the initial text content and the object text content, and determines whether the initial text content and the object text content match based on the similarity determination result. This solves the problem in the related art that similarity judgment is inaccurate when texts are input into a single model. The device first determines whether the text content to be compared is correct; when it is correct, the initial text content and the object text content are evaluated by a plurality of models in turn to obtain a plurality of similarity results, and a comprehensive judgment is made over those results, so that whether the two text contents match is determined accurately.
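The overall flow of the three units can be sketched as follows. This is an illustration under stated assumptions rather than the trained system: the logistic regression is reduced to a hand-written logistic function with caller-supplied weights, the 0.5 decision threshold is assumed, and returning `None` for erroneous input is an assumption, since the text does not say what happens in that case. All names are hypothetical.

```python
import math
from typing import Callable, Optional, Sequence

Scorer = Callable[[str, str], float]


def logistic(z: float) -> float:
    """Standard logistic (sigmoid) function."""
    return 1.0 / (1.0 + math.exp(-z))


def match(initial_text: str,
          object_text: str,
          has_error: Callable[[str], bool],
          scorers: Sequence[Scorer],
          weights: Sequence[float],
          bias: float,
          threshold: float = 0.5) -> Optional[bool]:
    """Return True/False for match/no match, or None if the input has an error."""
    if has_error(initial_text):
        return None  # assumption: erroneous input is not scored
    # Feed the same text pair to each model in the cluster, in order.
    scores = [scorer(initial_text, object_text) for scorer in scorers]
    # Combine the per-model scores with a logistic regression decision.
    z = bias + sum(w * s for w, s in zip(weights, scores))
    return logistic(z) >= threshold
```

In practice the weights and bias would be learned from labeled matching and non-matching text pairs; they are passed in here so that the sketch stays self-contained.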
Optionally, in the text matching device provided in the embodiment of the present application, the input unit 32 includes: the first word segmentation module is used for segmenting the initial text content to obtain a word segmentation result, wherein the word segmentation result comprises a plurality of words; the first judging module is used for acquiring a historical word correction record in an application scene from the dictionary database, judging whether the plurality of words exist in the historical word correction record or not, and obtaining a judging result; the first determining module is used for determining that the initial text content is wrong when the judging result indicates that at least one word in the plurality of words exists in the historical word correction record; and the second determining module is used for determining that the initial text content has no error under the condition that the judging result indicates that the plurality of words do not exist in the historical word correction record.
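The error check performed by these modules can be sketched as below. Whitespace splitting stands in for the word segmentation step, and the contents of the historical word correction record are hypothetical; the rule itself follows the text: the initial text content is treated as erroneous as soon as one of its words appears in the record.

```python
def has_error(initial_text: str, correction_record: set[str]) -> bool:
    """Return True if any segmented word appears in the correction record."""
    words = initial_text.split()  # stand-in for a real word segmenter
    return any(word in correction_record for word in words)


# Hypothetical record of words that were corrected in this application scene.
record = {"acount", "tranfer"}
```

With this record, `has_error("open an acount", record)` flags the text, while `has_error("open an account", record)` lets it proceed to the model cluster.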
Optionally, in the text matching device provided in the embodiment of the present application, the model cluster includes a word matching model, and the input unit 32 includes: the second word segmentation module is used for segmenting the initial text content according to a first granularity and a second granularity to obtain a first word segmentation result and a second word segmentation result, wherein the first granularity is smaller than the second granularity; the third word segmentation module is used for segmenting the text content of the object according to the first granularity and the second granularity to obtain a third word segmentation result and a fourth word segmentation result; the first input module is used for inputting a first word segmentation result and a third word segmentation result with the same granularity into the word matching model to obtain a first similarity value; the second input module is used for inputting a second word segmentation result and a fourth word segmentation result with the same granularity into the word matching model to obtain a second similarity value; and the third determining module is used for determining the similarity score output by the word matching model through the first similarity value and the second similarity value.
Optionally, in the text matching device provided in the embodiment of the present application, the model cluster includes a relationship determination model, and the input unit 32 includes: the third input module is used for inputting the initial text content and the object text content into the relation judging model to obtain a judging result, wherein the judging result is used for determining whether a containing relation exists between the initial text content and the object text content; a fourth determining module, configured to determine, as a first score, a similarity score output by the relationship determination model when the determination result characterizes that there is an inclusion relationship between the initial text content and the object text content; and a fifth determining module, configured to determine, when the determination result indicates that there is no inclusion relationship between the initial text content and the object text content, a similarity score output by the relationship determination model as a second score, where the second score is smaller than the first score.
Optionally, in the text matching device provided in the embodiment of the present application, the model cluster includes a first comparison model, and the input unit 32 includes: the fourth word segmentation module, used for segmenting the initial text content and the object text content according to sentence components to obtain two groups of fifth word segmentation results; the fourth input module, used for inputting the two groups of fifth word segmentation results into the first comparison model to obtain a first comparison result, wherein the first comparison model is used for determining whether antonyms exist among words having the same sentence component in the initial text content and the object text content, and the first comparison result includes the number of antonym pairs; and the sixth determining module, used for acquiring the number of antonym pairs from the first comparison result and determining the similarity score output by the first comparison model according to that number.
Optionally, in the text matching device provided in the embodiment of the present application, the model cluster includes a second comparison model, and the input unit 32 includes: a seventh determining module, configured to determine keywords in the initial text content through a dictionary database, to obtain a first keyword; an eighth determining module, configured to determine keywords in the object text content through the dictionary database, to obtain a second keyword; the fifth input module, used for inputting the first keyword and the second keyword into the second comparison model to obtain a second comparison result, wherein the second comparison model is used for determining the similarity of the keywords in the text contents; and the ninth determining module, used for determining the second comparison result as the similarity score output by the second comparison model.
Optionally, the text matching device provided in the embodiment of the present application further includes the following units, which operate before the keywords in the initial text content are determined through the dictionary database to obtain the first keyword: a second determining unit, configured to determine, through the dictionary database, whether keywords exist in the initial text content and in the object text content; a third determining unit, configured to determine the similarity score output by the second comparison model as the second score in a case where a keyword is absent from only one of the initial text content and the object text content; and a fourth determining unit, configured to determine the similarity score output by the second comparison model as the first score in a case where no keyword exists in either the initial text content or the object text content.
The text matching device includes a processor and a memory, the acquisition unit 31, the input unit 32, the first determination unit 33, and the like are stored as program units in the memory, and the processor executes the program units stored in the memory to realize the corresponding functions.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. One or more kernels may be provided, and the problem in the related art of low accuracy when similarity is judged by inputting texts into a single model is addressed by adjusting kernel parameters.
The memory may include, among other forms of computer-readable media, volatile memory such as random access memory (RAM) and/or nonvolatile memory such as read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
An embodiment of the present invention provides a computer-readable storage medium having stored thereon a program that, when executed by a processor, implements the text matching method.
The embodiment of the invention provides a processor which is used for running a program, wherein the text matching method is executed when the program runs.
As shown in fig. 4, an embodiment of the present invention provides an electronic device, where the electronic device 40 includes a processor, a memory, and a program stored on the memory and executable on the processor, and when the processor executes the program, the following steps are implemented: acquiring initial text content sent by a user, and acquiring any one object text content and an application scene of the object text content in a database; judging whether the initial text content has errors or not in an application scene, and under the condition that the initial text content does not have errors, sequentially inputting the initial text content and the object text content into each model in a model cluster to obtain a similarity score output by each model, wherein a plurality of machine learning models exist in the model cluster, and each machine learning model is used for determining the similarity scores of the initial text content and the object text content; and inputting the multiple similarity scores into a logistic regression model to obtain a similarity determination result of the initial text content and the object text content, and determining whether the initial text content and the object text content are matched according to the similarity determination result. The device herein may be a server, PC, PAD, cell phone, etc.
The present application also provides a computer program product adapted to perform, when executed on a data processing device, a program initialized with the method steps of: acquiring initial text content sent by a user, and acquiring any one object text content and an application scene of the object text content in a database; judging whether the initial text content has errors or not in an application scene, and under the condition that the initial text content does not have errors, sequentially inputting the initial text content and the object text content into each model in a model cluster to obtain a similarity score output by each model, wherein a plurality of machine learning models exist in the model cluster, and each machine learning model is used for determining the similarity scores of the initial text content and the object text content; and inputting the multiple similarity scores into a logistic regression model to obtain a similarity determination result of the initial text content and the object text content, and determining whether the initial text content and the object text content are matched according to the similarity determination result.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or nonvolatile memory, such as read-only memory (ROM) or flash RAM. Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (10)

1. A text matching method, comprising:
acquiring initial text content sent by a user, and acquiring any one object text content and an application scene of the object text content in a database;
judging whether the initial text content has errors in the application scene, and, in the case that the initial text content has no errors, sequentially inputting the initial text content and the object text content into each model in a model cluster to obtain a similarity score output by each model, wherein a plurality of machine learning models exist in the model cluster, and each machine learning model is used for determining a similarity score of the initial text content and the object text content;
and inputting the multiple similarity scores into a logistic regression model to obtain a similarity determination result of the initial text content and the object text content, and determining whether the initial text content and the object text content are matched according to the similarity determination result.
2. The method of claim 1, wherein determining whether the initial text content has an error in the application scenario comprises:
word segmentation is carried out on the initial text content to obtain word segmentation results, wherein the word segmentation results comprise a plurality of words;
acquiring a historical word correction record under the application scene from a dictionary database, and judging whether the plurality of words exist in the historical word correction record or not to obtain a judgment result;
determining that the initial text content has an error when the judgment result indicates that at least one word in the plurality of words exists in the historical word correction record;
and determining that the initial text content has no error when the judgment result indicates that none of the plurality of words exists in the historical word correction record.
3. The method of claim 1, wherein the model cluster includes a word matching model, and wherein inputting the initial text content and the object text content into the word matching model to obtain the similarity score output by the word matching model comprises:
word segmentation is carried out on the initial text content according to a first granularity and a second granularity, and a first word segmentation result and a second word segmentation result are obtained, wherein the first granularity is smaller than the second granularity;
performing word segmentation on the object text content according to the first granularity and the second granularity to obtain a third word segmentation result and a fourth word segmentation result;
inputting a first word segmentation result and a third word segmentation result with the same granularity into the word matching model to obtain a first similarity value;
inputting a second word segmentation result and a fourth word segmentation result with the same granularity into the word matching model to obtain a second similarity value;
and determining the similarity score output by the word matching model through the first similarity value and the second similarity value.
4. The method of claim 1, wherein the model cluster includes a relationship determination model, wherein inputting the initial text content and the object text content into the relationship determination model, and wherein obtaining a similarity score output by the relationship determination model includes:
inputting the initial text content and the object text content into the relation determination model to obtain a determination result, wherein the determination result is used for determining whether an inclusion relation exists between the initial text content and the object text content;
determining the similarity score output by the relation determination model as a first score when the determination result indicates that an inclusion relation exists between the initial text content and the object text content;
and determining the similarity score output by the relation determination model as a second score when the determination result indicates that no inclusion relation exists between the initial text content and the object text content, wherein the second score is smaller than the first score.
5. The method of claim 1, wherein the model cluster includes a first comparison model, and wherein inputting the initial text content and the object text content into the first comparison model to obtain the similarity score output by the first comparison model includes:
segmenting the initial text content and the object text content according to sentence components to obtain two groups of fifth word segmentation results;
inputting the two groups of fifth word segmentation results into the first comparison model to obtain a first comparison result, wherein the first comparison model is used for determining whether antonyms exist among words having the same sentence component in the initial text content and the object text content, and the first comparison result comprises the number of antonym pairs;
and acquiring the number of antonym pairs from the first comparison result, and determining the similarity score output by the first comparison model according to that number.
6. The method of claim 1, wherein the model cluster includes a second comparison model, wherein inputting the initial text content and the object text content into the second comparison model, and wherein obtaining the similarity score output by the second comparison model includes:
determining keywords in the initial text content through a dictionary database to obtain first keywords;
determining keywords in the text content of the object through the dictionary database to obtain second keywords;
inputting the first keywords and the second keywords into the second comparison model to obtain a second comparison result, wherein the second comparison model is used for determining the similarity of the keywords in the text content;
and determining the second comparison result as a similarity score output by the second comparison model.
7. The method of claim 6, wherein prior to determining keywords in the initial text content via a dictionary database to obtain first keywords, the method further comprises:
determining whether the keywords exist in the initial text content and the object text content through the dictionary database;
determining the similarity score output by the second comparison model as a second score in the case that the keywords are absent from only one of the initial text content and the object text content;
and determining the similarity score output by the second comparison model as a first score in the case that the keywords are absent from both the initial text content and the object text content.
8. A text matching apparatus, comprising:
an acquisition unit configured to acquire initial text content sent by a user, and to acquire any one object text content in a database together with an application scene of the object text content;
an input unit configured to judge whether the initial text content contains errors in the application scene and, in the case that the initial text content contains no errors, to input the initial text content and the object text content into each model in a model cluster in sequence to obtain a similarity score output by each model, wherein the model cluster contains a plurality of machine learning models, and each machine learning model is used for determining a similarity score of the initial text content and the object text content;
and a first determining unit configured to input the plurality of similarity scores into a logistic regression model to obtain a similarity determination result of the initial text content and the object text content, and to determine whether the initial text content and the object text content match according to the similarity determination result.
9. A computer storage medium for storing a program, wherein the program, when run, controls the device in which the computer storage medium is located to perform the text matching method of any one of claims 1 to 7.
10. An electronic device comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the text matching method of any of claims 1-7.
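The keyword comparison of claims 6 and 7 and the logistic-regression aggregation of claim 8 can be illustrated with a minimal sketch. It is not the applicant's implementation. The claims do not specify the dictionary contents, the keyword similarity measure, the values of the first and second scores, or the regression coefficients, so the names `DICTIONARY`, `FIRST_SCORE`, `SECOND_SCORE`, `extract_keywords`, `keyword_similarity` and `match`, the use of Jaccard overlap, and all numeric values are assumptions chosen for illustration. The sketch reads claim 7 as assigning the first score when neither text contains a keyword and the second score when only one of them does.

```python
import math

# Hypothetical dictionary database: a set of known domain keywords.
DICTIONARY = {"transfer", "balance", "loan", "card"}

# Assumed placeholder values; the claims do not state the actual scores.
FIRST_SCORE = 0.5   # neither text contains a dictionary keyword
SECOND_SCORE = 0.0  # exactly one text contains a dictionary keyword


def extract_keywords(text, dictionary=DICTIONARY):
    """Return the dictionary keywords that occur as whitespace-separated words."""
    return {word for word in text.lower().split() if word in dictionary}


def keyword_similarity(initial_text, object_text):
    """Sketch of the second comparison model.

    Jaccard overlap of the two keyword sets is an assumption; the claims
    only say the model determines the similarity of the keywords.
    """
    first = extract_keywords(initial_text)
    second = extract_keywords(object_text)
    if not first and not second:
        return FIRST_SCORE
    if not first or not second:
        return SECOND_SCORE
    return len(first & second) / len(first | second)


def match(scores, weights, bias, threshold=0.5):
    """Combine per-model similarity scores with a logistic function.

    The weights and bias stand in for a trained logistic regression model.
    Returns the match probability and the match decision.
    """
    z = bias + sum(w * s for w, s in zip(weights, scores))
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability, probability >= threshold
```

In practice the coefficients passed to `match` would come from a logistic regression model fitted on labelled text pairs, with one input per model in the model cluster.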
CN202310066272.0A 2023-01-12 2023-01-12 Text matching method and device, storage medium and electronic equipment Pending CN116028626A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310066272.0A CN116028626A (en) 2023-01-12 2023-01-12 Text matching method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310066272.0A CN116028626A (en) 2023-01-12 2023-01-12 Text matching method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
CN116028626A (en) 2023-04-28

Family

ID=86070584

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310066272.0A Pending CN116028626A (en) 2023-01-12 2023-01-12 Text matching method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116028626A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116188091A (en) * 2023-05-04 2023-05-30 品茗科技股份有限公司 Method, device, equipment and medium for automatic matching unit price reference of cost list

Similar Documents

Publication Publication Date Title
CN105989040B (en) Intelligent question and answer method, device and system
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
US11016997B1 (en) Generating query results based on domain-specific dynamic word embeddings
CN110597966A (en) Automatic question answering method and device
US20130304471A1 (en) Contextual Voice Query Dilation
CN110162778B (en) Text abstract generation method and device
US11734322B2 (en) Enhanced intent matching using keyword-based word mover's distance
US20220351634A1 (en) Question answering systems
CN110781687B (en) Same intention statement acquisition method and device
US20220138240A1 (en) Source code retrieval
CN109471889B (en) Report accelerating method, system, computer equipment and storage medium
CN110955766A (en) Method and system for automatically expanding intelligent customer service standard problem pairs
EP3832485A1 (en) Question answering systems
CN114661872A (en) Beginner-oriented API self-adaptive recommendation method and system
CN116150306A (en) Training method of question-answering robot, question-answering method and device
CN116028626A (en) Text matching method and device, storage medium and electronic equipment
CN117290481A (en) Question and answer method and device based on deep learning, storage medium and electronic equipment
CN109522920B (en) Training method and device of synonymy discriminant model based on combination of semantic features
CN114691907B (en) Cross-modal retrieval method, device and medium
CN109993190B (en) Ontology matching method and device and computer storage medium
CN110795562A (en) Map optimization method, device, terminal and storage medium
CN112632232B (en) Text matching method, device, equipment and medium
CN110929501B (en) Text analysis method and device
CN112579774A (en) Model training method, model training device and terminal equipment
CN116225770B (en) Patch matching method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination