CN113297354A - Text matching method, device, equipment and storage medium - Google Patents

Text matching method, device, equipment and storage medium

Info

Publication number
CN113297354A
CN113297354A
Authority
CN
China
Prior art keywords
text
matched
vector representation
determining
matching
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110669568.2A
Other languages
Chinese (zh)
Inventor
周楠楠
汤耀华
杨海军
徐倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202110669568.2A priority Critical patent/CN113297354A/en
Publication of CN113297354A publication Critical patent/CN113297354A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3347Query execution using vector based model

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a text matching method, a text matching device, text matching equipment and a storage medium. When text matching is performed, the text to be matched is first input into a pre-training model to obtain a first vector representation corresponding to the text to be matched, and a second vector representation corresponding to the text to be matched is determined according to the dependency relationships among the words in the text to be matched; a target vector representation corresponding to the text to be matched is then determined by combining the first vector representation and the second vector representation, so that the dependency relationships among the words in the text to be matched are fully taken into account and the accuracy of the vector representation describing the text to be matched is improved. The matching result of the text to be matched is accordingly determined from a target vector representation of higher accuracy, which improves the accuracy of the text matching result.

Description

Text matching method, device, equipment and storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a text matching method, apparatus, device, and storage medium.
Background
Text matching is finding application in more and more fields, such as intelligent question answering and other text-based scenarios.
In the prior art, when text matching is performed, a CLS flag is added at the first position of the text to be matched, the augmented text is input into a pre-training model, and the vector of the CLS flag in the last layer of the model's output is taken as the target vector describing the text to be matched; the similarity between this target vector and the vector corresponding to a pre-stored text is then calculated, so as to determine from the similarity whether the text to be matched matches a text in the database. A rough sketch of this conventional pipeline is given below.
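For illustration only, the conventional approach can be sketched as follows, assuming a HuggingFace-style BERT encoder (the model name is an assumption and is not part of the prior-art description):

# Sketch of the conventional approach: take the last-layer [CLS] vector as the text vector.
# Assumes the HuggingFace transformers library; the model name is illustrative only.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")
model.eval()

def cls_vector(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")   # the tokenizer prepends [CLS] itself
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[0, 0]          # last layer, position 0 = [CLS]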
However, with this existing matching method, only the vector of the CLS flag in the last layer of the pre-trained model's output is taken as the target vector describing the text to be matched, which results in low text matching accuracy.
Disclosure of Invention
The present application mainly aims to provide a text matching method, device, equipment and storage medium, with the goal of improving the accuracy of text matching.
In order to achieve the above object, the present application provides a text matching method, including:
Acquiring a text to be matched.
Inputting the text to be matched into a pre-training model to obtain a first vector representation corresponding to the text to be matched, and determining a second vector representation corresponding to the text to be matched according to the dependency relationships among the words in the text to be matched.
Determining a target vector representation corresponding to the text to be matched according to the first vector representation and the second vector representation.
Determining a matching result of the text to be matched according to the target vector representation.
In a possible implementation manner, determining the second vector representation corresponding to the text to be matched according to the dependency relationships among the words in the text to be matched includes:
Inputting the text to be matched into a pre-trained dependency syntax analysis model to obtain the dependency relationships among the words in the text to be matched.
Determining at least one core word in the text to be matched according to the dependency relationships among the words in the text to be matched.
Determining the second vector representation according to the vector representation corresponding to each core word in the at least one core word.
In a possible implementation manner, where the number of core words is at least two, determining the second vector representation according to the vector representation corresponding to each core word in the at least one core word includes:
Performing a weighted average of the vector representations corresponding to the core words, and determining the weighted-average result as the second vector representation.
In a possible implementation manner, determining the target vector representation corresponding to the text to be matched according to the first vector representation and the second vector representation includes:
Splicing the first vector representation and the second vector representation to obtain a spliced vector representation.
Determining the spliced vector representation as the target vector representation.
In a possible implementation manner, inputting the text to be matched into a pre-training model to obtain the first vector representation corresponding to the text to be matched includes:
Inputting the text to be matched into the pre-training model to obtain an output result corresponding to the text to be matched.
Performing a weighted average of the vectors of the last two layers in the output result to obtain the first vector representation.
In a possible implementation manner, determining the matching result of the text to be matched according to the target vector representation includes:
Determining the cosine similarity between the target vector representation and a preset vector representation corresponding to a preset text.
Determining the matching result of the text to be matched according to the cosine similarity, where the matching result is either that the text to be matched matches the preset text or that the text to be matched does not match the preset text.
In one possible implementation, the method further includes:
If the matching result is that the text to be matched matches the preset text, determining response information corresponding to the preset text.
Outputting the response information.
The present application also provides a text matching apparatus, which may include:
and the acquisition unit is used for acquiring the text to be matched.
And the processing unit is used for inputting the text to be matched into a pre-training model to obtain a first vector representation corresponding to the text to be matched, and determining a second vector representation corresponding to the text to be matched according to the dependency relationship among the vocabularies in the text to be matched.
And the determining unit is used for determining a target vector representation corresponding to the text to be matched according to the first vector representation and the second vector representation, and determining a matching result of the text to be matched according to the target vector representation.
The present application further provides an electronic device, which includes: a memory, a processor and a text matching program stored on the memory and executable on the processor, wherein the text matching program, when executed by the processor, implements the steps of the text matching method according to any one of the possible implementations of the first aspect.
The present application further provides a computer-readable storage medium, on which a text matching program is stored, and the text matching program, when executed by a processor, implements the steps of the text matching method according to any one of the possible implementations of the first aspect.
The present application also provides a computer program product comprising a computer program, which when executed by a processor implements the text matching method according to any of the possible implementation manners of the first aspect.
In the method, when text matching is performed, the text to be matched is input into a pre-training model to obtain a first vector representation corresponding to the text to be matched, a second vector representation corresponding to the text to be matched is determined according to the dependency relationships among the words in the text to be matched, and a target vector representation corresponding to the text to be matched is determined by combining the first vector representation and the second vector representation. The dependency relationships among the words in the text to be matched are thus fully taken into account, and the accuracy of the vector representation describing the text to be matched is improved; the matching result of the text to be matched is then determined from a target vector representation of higher accuracy, which improves the accuracy of the text matching result.
Drawings
Fig. 1 is a schematic diagram of a framework of an application scenario provided in an embodiment of the present application;
Fig. 2 is a schematic flowchart of a text matching method according to an embodiment of the present application;
Fig. 3 is a schematic flowchart of a method for determining a second vector representation corresponding to a text to be matched according to an embodiment of the present application;
Fig. 4 is a schematic structural diagram of a text matching apparatus according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
The text matching method provided by the embodiments of the present application can be applied to intelligent voice question-answering scenarios and can also be applied to plain text matching scenarios. Taking application to an intelligent voice question-answering scenario as an example, suppose a user asks by voice: "Hello, may I ask what the weather will be like tomorrow?" If the intelligent device has sufficient computing power, after collecting the voice information input by the user, it converts the voice information into text information, uses the text information as the matching basis, and searches the question-and-answer library for matching preset text information. If text matching determines that the preset text "How is the weather tomorrow" exists in the question-and-answer library, the response message corresponding to that preset text, namely "Tomorrow's weather is clear and suitable for outdoor activities", is determined and output to the user, thereby completing this round of the voice question-and-answer operation.
It will be appreciated that, if the computing power of the intelligent device is limited, after converting the voice information into text information, the text information can be used as the matching basis and sent to an electronic device that performs the text matching and related operations, for example a terminal or a server. Referring to Fig. 1, a schematic diagram of an application scenario provided in an embodiment of the present application, the electronic device searches the question-and-answer library for preset text information matching the voice information. If text matching determines that the preset text "How is the weather tomorrow" exists in the question-and-answer library, the corresponding answer "Tomorrow's weather is clear and suitable for outdoor activities" is sent to the intelligent device, which then outputs the answer to the user, thereby completing the voice question-and-answer operation.
As can be seen from the above description, accurately completing the voice question-answering operation requires improving the accuracy of text matching: only when the corresponding preset text is accurately matched can the answer corresponding to that preset text be output and the voice question-answering operation be completed correctly. In the prior art, however, text matching is performed by adding a CLS flag at the first position of the text to be matched, inputting the augmented text into a pre-training model, and taking the vector of the CLS flag in the last layer of the model's output as the target vector describing the text to be matched, which may result in low text matching accuracy.
In order to improve the accuracy of text matching, the dependency relationships among the words in the text to be matched can be taken into account so that the text is described more accurately. When the target vector describing the text to be matched is determined, the vector representation obtained from the pre-training model is therefore combined with the dependency relationships among the words in the text to be matched, which improves the accuracy of the vector representation describing the text to be matched; the matching result of the text to be matched is then determined from a target vector representation of higher accuracy, which can effectively improve the accuracy of the text matching result.
Based on the above technical concept, the embodiment of the present application provides a text matching method, and the text matching method provided by the present application will be described in detail through specific embodiments. It is to be understood that the following detailed description may be combined with other embodiments, and that the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 2 is a flowchart illustrating a text matching method according to an embodiment of the present disclosure, where the text matching method may be executed by software and/or a hardware device, for example, the hardware device may be a text matching device, and the text matching device may be a terminal or a server. For example, referring to fig. 2, the text matching method may include:
s201, obtaining a text to be matched.
For example, the text to be matched may be "Hello, may I ask what the weather will be like tomorrow?" or "Service A is not only simple in operation, but also fast in crediting the account", and may be set according to actual needs. In this embodiment of the present application, the text to be matched is described by taking "Service A is not only simple in operation, but also fast in crediting the account" as an example, but the embodiment of the present application is not limited thereto.
For example, when the text to be matched is obtained in an intelligent voice question-answering scenario, the voice to be matched may first be collected and converted into the corresponding text to be matched; alternatively, the text to be matched obtained by converting the voice collected by an acquisition device may be received directly, thereby obtaining the text to be matched. In a text scenario, the text to be matched input by the user may be received directly, thereby obtaining the text to be matched; the specific setting may be made according to actual needs.
After the text to be matched is obtained, the following S202 may be executed:
s202, inputting the text to be matched into a pre-training model to obtain a first vector representation corresponding to the text to be matched, and determining a second vector representation corresponding to the text to be matched according to the dependency relationship among words in the text to be matched.
For example, the dependency relationships between words may include 15 kinds of relations, such as subject-verb (SBV), verb-object (VOB, where the object is a direct object), indirect-object (IOB, where the object is an indirect object), fronting-object (FOB), double or pivotal (DBL), attribute or modifier-head (ATT), adverbial (ADV), complement (CMP), coordinate (COO), preposition-object (POB), left adjunct (LAD), right adjunct (RAD), independent structure (IS), head or core (HED), and punctuation (WP).
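For reference, these relation labels (following the LTP-style tag set that the annotation described later also uses) can be kept as a simple mapping; this snippet and its approximate English glosses are illustrative only:

# Dependency relation labels in the LTP style; glosses are approximate.
DEPENDENCY_RELATIONS = {
    "SBV": "subject-verb",
    "VOB": "verb-object (direct object)",
    "IOB": "indirect object",
    "FOB": "fronting object",
    "DBL": "double (pivotal construction)",
    "ATT": "attribute (modifier-head)",
    "ADV": "adverbial",
    "CMP": "complement",
    "COO": "coordinate",
    "POB": "preposition-object",
    "LAD": "left adjunct",
    "RAD": "right adjunct",
    "IS": "independent structure",
    "HED": "head (core of the sentence)",
    "WP": "punctuation",
}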
Taking "Service A is not only simple in operation, but also fast in crediting the account" as the text to be matched, the dependency relationships among the words in the text to be matched are shown in Table 1 below:
TABLE 1
(Table 1 appears as an image in the original publication; its contents are summarized in the following paragraph.)
As can be seen from Table 1, the text to be matched, "Service A is not only simple in operation, but also fast in crediting the account", contains 7 words and one punctuation mark. The 7 words are "Service A", "not only", "operation", "simple", "but also", "account" and "fast". An SBV relationship exists between "Service A" and "simple"; an ADV relationship exists between "operation" and "simple"; "simple" is the core word and is marked 0; a WP relationship exists between the punctuation mark "," and "simple"; an ADV relationship exists between "but also" and "fast"; an ADV relationship exists between "account" and "fast"; and a COO relationship exists between "fast" and "simple".
For example, when the first vector representation corresponding to the text to be matched is determined through the pre-training model, in one possible implementation, following the approach described in the prior art, a CLS flag may be added at the first position of the text to be matched, the augmented text may be input into the pre-training model, and the vector of the CLS flag in the last layer of the model's output may be taken as the first vector representation describing the text to be matched, thereby obtaining the first vector representation of the text to be matched.
In another possible implementation manner, in order to further improve the accuracy of the obtained first vector representation, no CLS flag needs to be added at the first position of the text to be matched. Instead, the text to be matched is input directly into the pre-training model to obtain the corresponding output result; the vectors of the last two layers of the output are then weighted and averaged, and the weighted-average result is taken as the first vector representation describing the text to be matched. Determining the first vector representation from the last two layers of the output improves the accuracy of the obtained first vector representation.
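An illustrative sketch of this second implementation is given below; it assumes a HuggingFace-style encoder, equal weights for the two layers, and mean pooling over tokens (the pooling choice is an assumption, since only the weighted average of the last two layers is specified here):

import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # model name is illustrative
model = BertModel.from_pretrained("bert-base-chinese", output_hidden_states=True)
model.eval()

def first_vector(text: str) -> torch.Tensor:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    hidden = outputs.hidden_states            # tuple: embedding layer + every encoder layer
    last_two = torch.stack(hidden[-2:])       # (2, 1, seq_len, hidden_size)
    layer_avg = last_two.mean(dim=0)          # average of the last two layers (equal weights assumed)
    return layer_avg.mean(dim=1).squeeze(0)   # pool over tokens into one sentence vector (assumption)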
After the first vector representation corresponding to the text to be matched and the second vector representation corresponding to the text to be matched are respectively obtained, the target vector representation corresponding to the text to be matched can be determined by combining the first vector representation and the second vector representation, that is, the following S203 is executed:
s203, determining target vector representation corresponding to the text to be matched according to the first vector representation and the second vector representation.
It can be understood that, in the embodiment of the present application, precisely because the dependency relationship between the vocabularies in the text to be matched is considered, on the basis of the first vector representation, the target vector representation corresponding to the text to be matched is determined in combination with the second vector representation, so that the accuracy of the target vector representation can be improved; therefore, the accuracy of the text matching result can be further improved when the matching result of the text to be matched is determined according to the target vector representation with higher accuracy.
And S204, determining a matching result of the text to be matched according to the target vector representation.
For example, when the matching result of the text to be matched is determined according to the target vector representation, the cosine similarity between the target vector representation and a preset vector representation corresponding to a preset text may be determined first, and the matching result of the text to be matched is then determined according to the cosine similarity; the matching result is either that the text to be matched matches the preset text or that the text to be matched does not match the preset text.
It should be noted that, in the embodiment of the present application, when determining the cosine similarity between the target vector representation and the preset vector representation corresponding to the preset text, the preset vector representation corresponding to the preset text may be predetermined and stored, so that after determining the target vector representation corresponding to the text to be matched, the cosine similarity between the target vector representation and the preset vector representation corresponding to the preset text may be directly determined, so as to improve the efficiency of text matching. Certainly, if the efficiency of text matching is not considered, the preset text may be predetermined and stored, and the preset vector representation corresponding to the preset text is determined while the target vector representation corresponding to the text to be matched is determined, and then the cosine similarity between the target vector representation and the preset vector representation corresponding to the preset text is determined.
It can be understood that, in the embodiment of the present application, the preset vector representation corresponding to a preset text is determined in the same way as the target vector representation corresponding to the text to be matched: the preset text is input into the pre-training model to obtain a first vector representation corresponding to the preset text, a second vector representation corresponding to the preset text is determined according to the dependency relationships among the words in the preset text, and the target vector representation corresponding to the preset text is then determined by combining the first vector representation and the second vector representation. Reference may be made to the description of how the target vector representation of the text to be matched is determined, which is not repeated here.
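An illustrative sketch of this matching step is given below; the precomputed preset-vector matrix, the similarity threshold of 0.8 and the return of the best-scoring preset text are assumptions made for illustration:

import torch
import torch.nn.functional as F

def match(target_vec: torch.Tensor, preset_vecs: torch.Tensor, threshold: float = 0.8):
    # target_vec: (dim,) target vector representation of the text to be matched
    # preset_vecs: (num_preset_texts, dim) preset vector representations, precomputed and stored
    sims = F.cosine_similarity(target_vec.unsqueeze(0), preset_vecs, dim=-1)
    best_score, best_idx = sims.max(dim=0)
    if best_score.item() >= threshold:              # threshold value is an assumption
        return best_idx.item(), best_score.item()   # matched: index of the matching preset text
    return None, best_score.item()                  # not matched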
In the embodiment of the application, when text matching is performed, the text to be matched is input into a pre-training model to obtain a first vector representation corresponding to the text to be matched, a second vector representation corresponding to the text to be matched is determined according to the dependency relationships among the words in the text to be matched, and a target vector representation corresponding to the text to be matched is determined by combining the first vector representation and the second vector representation. The dependency relationships among the words in the text to be matched are thus fully taken into account, and the accuracy of the vector representation describing the text to be matched is improved; the matching result of the text to be matched is then determined from a target vector representation of higher accuracy, which improves the accuracy of the text matching result.
In conjunction with the embodiment shown in Fig. 2, and to facilitate understanding of how the second vector representation corresponding to the text to be matched is determined in S202 according to the dependency relationships among the words in the text to be matched, this determination is described in detail below with reference to the embodiment shown in Fig. 3.
Fig. 3 is a schematic flowchart of a method for determining the second vector representation corresponding to the text to be matched according to an embodiment of the present application. This method may likewise be executed by software and/or a hardware device, for example a text matching device. For example, referring to Fig. 3, the method for determining the second vector representation corresponding to the text to be matched may include:
s301, inputting the text to be matched into a pre-trained dependency syntax analysis model to obtain the dependency relationship among the vocabularies in the text to be matched.
The dependency syntax analysis model is mainly used for analyzing and determining the dependency relationships among words: its input is a text, and its output is the dependency relationships among the words in that text.
For example, when the dependency syntax analysis model is trained in advance, a large training sample data set needs to be acquired; since publicly available text data sets are easy to obtain, part of a published text data set may be used as the training sample data.
After the training sample data set is obtained, word segmentation and part-of-speech tagging may be performed on each sample in the set, for example using a self-trained model or an open-source tool such as the jieba toolkit. Taking the sample "simple in operation" as an example, after word segmentation and part-of-speech tagging the sample becomes "operation (v) / simple (a)", where v denotes a verb and a denotes an adjective.
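For illustration, the word segmentation and part-of-speech tagging step with the open-source jieba toolkit may look as follows (the sample string assumes the original Chinese of the "operation (v) / simple (a)" example):

# Word segmentation + part-of-speech tagging with jieba; the input string is an assumed example.
import jieba.posseg as pseg

for word, flag in pseg.cut("操作简单"):
    print(f"{word}/{flag}")      # expected output along the lines of: 操作/v  简单/a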
Then, the processed training sample data set is annotated, for example following the annotation standard of the Language Technology Platform (LTP), to obtain an annotated training sample data set that contains the dependency relationships among the words in each training sample. Taking the training sample "Service A is not only simple in operation, but also fast in crediting the account" as an example, the dependency relationships among its words are as shown in Table 1: an SBV relationship exists between "Service A" and "simple", an ADV relationship exists between "operation" and "simple", the core word "simple" is marked 0, a WP relationship exists between the punctuation mark "," and "simple", an ADV relationship exists between "but also" and "fast", an ADV relationship exists between "account" and "fast", and a COO relationship exists between "fast" and "simple".
After the annotated training sample data set is obtained, it can be used to train the dependency syntax analysis model. The dependency syntax analysis model may adopt a Transformer-multilayer perceptron-biaffine transformation network structure, and the training process is as follows (an illustrative code sketch is given after these steps):
a) Vectorize each word in each annotated training sample to obtain the vector representation of each word, which consists of the word vector, the part-of-speech vector and the position vector, namely:
E = Ew ⊕ Et ⊕ Ep
where Ew denotes the word vector of the word and may be obtained from models such as word2vec, GloVe, ELMo or BERT; Et denotes the part-of-speech vector of the word and is randomly initialized during training; Ep denotes the position vector of the word and is obtained from sine and cosine functions; and ⊕ denotes the concatenation (concat) operation between vectors.
b) Input the vectorized training sample data into a Transformer model for feature extraction; this step essentially extracts contextual features from the vectorized training sample data.
c) Feed the output of the Transformer model into two multilayer perceptron networks respectively; the two networks output two different vector representations Rh and Rd, where each word vector in Rh represents a word acting as the head of a dependency pair, and each word vector in Rd represents a word acting as the dependent of a dependency pair.
d) Input the two vector representations Rh and Rd into a biaffine transformation network layer, obtain a score matrix S through the biaffine transformation, and predict the dependency relationships among the words in the training sample data using a maximum spanning tree algorithm.
e) Compare the predicted labels of the dependency relationships among the words in the training sample data with the true labels, calculate the error between the two, and update the model parameters through the back-propagation algorithm until the dependency syntax analysis model converges; the converged model is the trained dependency syntax analysis model.
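The following PyTorch sketch illustrates steps a) to d) under simplifying assumptions: the embedding sizes, the number of Transformer layers and the greedy argmax decoding used in place of the maximum spanning tree algorithm are illustrative choices rather than details taken from this description.

import torch
import torch.nn as nn

class BiaffineDependencyParser(nn.Module):
    # Transformer -> two multilayer perceptrons (head / dependent views) -> biaffine score matrix.
    def __init__(self, vocab_size, pos_tag_size, d_word=100, d_pos=32, d_posn=32,
                 d_model=256, n_layers=3):
        super().__init__()
        self.d_posn = d_posn
        self.word_emb = nn.Embedding(vocab_size, d_word)   # could be initialised from word2vec/GloVe/ELMo/BERT
        self.pos_emb = nn.Embedding(pos_tag_size, d_pos)   # part-of-speech vectors, randomly initialised
        self.proj = nn.Linear(d_word + d_pos + d_posn, d_model)   # after E = Ew ⊕ Et ⊕ Ep concatenation
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.mlp_head = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU())   # Rh: word as head
        self.mlp_dep = nn.Sequential(nn.Linear(d_model, d_model), nn.ReLU())    # Rd: word as dependent
        self.biaffine = nn.Parameter(torch.zeros(d_model + 1, d_model))         # biaffine weight (+ bias row)

    def positional(self, seq_len, device):
        # Sine/cosine position vectors Ep, as in the standard Transformer.
        pos = torch.arange(seq_len, device=device, dtype=torch.float).unsqueeze(1)
        i = torch.arange(0, self.d_posn, 2, device=device, dtype=torch.float)
        angles = pos / torch.pow(torch.tensor(10000.0, device=device), i / self.d_posn)
        pe = torch.zeros(seq_len, self.d_posn, device=device)
        pe[:, 0::2] = torch.sin(angles)
        pe[:, 1::2] = torch.cos(angles)
        return pe

    def forward(self, word_ids, pos_ids):
        # word_ids, pos_ids: (batch, seq_len) indices of the words and their POS tags
        batch, seq_len = word_ids.shape
        ep = self.positional(seq_len, word_ids.device).unsqueeze(0).expand(batch, -1, -1)
        x = torch.cat([self.word_emb(word_ids), self.pos_emb(pos_ids), ep], dim=-1)  # step a)
        h = self.encoder(self.proj(x))                      # step b): contextual feature extraction
        rh = self.mlp_head(h)                               # step c): head-view representations Rh
        rd = self.mlp_dep(h)                                #          dependent-view representations Rd
        rd = torch.cat([rd, torch.ones_like(rd[..., :1])], dim=-1)      # append bias column
        scores = rd @ self.biaffine @ rh.transpose(1, 2)    # step d): score matrix S, shape (batch, L, L)
        return scores

# Decoding and training (steps d) and e), greatly simplified):
#   predicted_heads = scores.argmax(dim=-1)   # greedy stand-in for the maximum spanning tree
#   loss = nn.functional.cross_entropy(scores.view(-1, scores.size(-1)), gold_heads.view(-1))
#   loss.backward()                           # back-propagation to update the model parameters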
Therefore, after the dependency syntax analysis model has been trained, the text to be matched can be directly input into the pre-trained dependency syntax analysis model during text matching to obtain the dependency relationships among the words in the text to be matched. Once these dependency relationships are determined, it can be accurately determined which words in the text to be matched are core words and which are non-core words. Since the core words can generally describe the content of the text to be matched more accurately, while the non-core words have less influence on it, at least one core word in the text to be matched is determined according to the dependency relationships among the words, that is, the following S302 is performed:
s302, determining at least one core vocabulary in the text to be matched according to the dependency relationship among the vocabularies in the text to be matched.
With reference to the description in S202 above, continuing with the example in which the text to be matched is "Service A is not only simple in operation, but also fast in crediting the account", the dependency relationships among the words in the text to be matched include: an SBV relationship between "Service A" and "simple", an ADV relationship between "operation" and "simple", the core word "simple" marked 0, a WP relationship between the punctuation mark "," and "simple", an ADV relationship between "but also" and "fast", an ADV relationship between "account" and "fast", and a COO relationship between "fast" and "simple".
According to the dependency relationships among these 7 words and the punctuation mark, it can be determined that the five words "Service A", "operation", "simple", "account" and "fast" in the text to be matched are core words, while the two words "not only" and "but also" are non-core words.
It can be seen that the five core words "Service A", "operation", "simple", "account" and "fast" can accurately describe the content of the text to be matched, while the non-core words "not only" and "but also" have little influence on it. The second vector representation corresponding to the text to be matched can therefore be determined from the vector representations corresponding to the core words, that is, the following S303 is performed:
s303, according to the vector representation corresponding to each core vocabulary in at least one core vocabulary, determining a second vector representation.
For example, when determining the second vector representation according to the vector representation corresponding to each core vocabulary in the at least one core vocabulary, two cases may be included:
in one case, when the number of the core vocabulary is one, the vector representation corresponding to the core vocabulary can be directly determined as the second vector representation corresponding to the text to be matched.
In another case, when the number of the core vocabularies is at least two, the vector representations corresponding to the core vocabularies can be comprehensively considered, and the second vector representations corresponding to the texts to be matched are jointly determined. The specific process is as follows: the vector representations corresponding to the core vocabularies may be weighted and averaged first, and the weighted and averaged result may be determined as the second vector representation corresponding to the text to be matched.
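An illustrative sketch of S303 is given below; equal weights and the source of the per-word vectors (for example, token vectors taken from the pre-training model) are assumptions, since these details are not fixed here:

import torch

def second_vector(core_word_vectors, weights=None):
    # core_word_vectors: list of 1-D tensors, one per core word of the text to be matched
    stacked = torch.stack(core_word_vectors)              # (num_core_words, dim)
    if stacked.size(0) == 1:
        return stacked[0]                                 # a single core word: use its vector directly
    if weights is None:
        weights = torch.ones(stacked.size(0), device=stacked.device)  # equal weights assumed
    weights = weights / weights.sum()
    return (weights.unsqueeze(1) * stacked).sum(dim=0)    # weighted average over the core words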
It can be seen that, in the embodiment of the present application, when determining the second vector representation corresponding to the text to be matched according to the dependency relationship between the words in the text to be matched, the text to be matched may be first input into a pre-trained dependency syntax analysis model to obtain the dependency relationship between the words in the text to be matched; in view of the fact that the core vocabulary in the text can generally describe the content of the text to be matched more accurately, and the influence of the non-core vocabulary on the text to be matched is small, at least one core vocabulary in the text to be matched can be determined according to the dependency relationship among the vocabularies in the text to be matched; and determining the second vector representation according to the vector representation corresponding to each core word in the at least one core word, so that the accuracy of the second vector representation can be improved.
After the second vector representation corresponding to the text to be matched is determined according to the dependency relationship among the words in the text to be matched, the target vector representation corresponding to the text to be matched can be determined jointly by combining the second vector representation on the basis of the first vector representation corresponding to the text to be matched. For example, when the target vector representation corresponding to the text to be matched is determined according to the first vector representation and the second vector representation, the first vector representation and the second vector representation may be subjected to splicing processing to obtain a spliced vector representation; and the vector representation after the splicing processing is determined as target vector representation.
For example, when the first vector representation and the second vector representation are spliced, the first vector representation may be taken as the front half and the second vector representation as the rear half, spliced behind the first vector representation; alternatively, the second vector representation may be taken as the front half and the first vector representation as the rear half, spliced behind the second vector representation. The spliced vector representation is determined as the target vector representation, so that the accuracy of the vector representation describing the text to be matched is improved; the matching result of the text to be matched is thus determined from a target vector representation of higher accuracy, which improves the accuracy of the text matching result.
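The splicing itself is plain vector concatenation; a short sketch is given below (either ordering of the two halves is valid, as noted above):

import torch

def target_vector(first_vec: torch.Tensor, second_vec: torch.Tensor) -> torch.Tensor:
    # Concatenate the pre-training-model vector and the dependency-based vector;
    # here the first vector representation forms the front half, but the reverse order also works.
    return torch.cat([first_vec, second_vec], dim=-1)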
Based on any of the above embodiments, after the matching result of the text to be matched is determined, if the matching result is that the text to be matched matches the preset text, the response information corresponding to the preset text can further be determined, and the corresponding response information is output.
Taking application to an intelligent voice question-answering scenario as an example, a user asks by voice: "Hello, may I ask what the weather will be like tomorrow?" If the intelligent device has sufficient computing power, after collecting the voice information input by the user, it can convert the voice information into the text to be matched and determine whether a matching preset text exists in the question-and-answer library. If text matching determines that the preset text "How is the weather tomorrow" exists in the question-and-answer library, the response message corresponding to that preset text, namely "Tomorrow's weather is clear and suitable for outdoor activities", is determined and output to the user, thereby completing this round of the voice question-and-answer operation.
Fig. 4 is a schematic structural diagram of a text matching apparatus 40 according to an embodiment of the present application. For example, referring to Fig. 4, the text matching apparatus 40 may include:
an obtaining unit 401, configured to obtain a text to be matched.
The processing unit 402 is configured to input the text to be matched into the pre-training model, obtain a first vector representation corresponding to the text to be matched, and determine a second vector representation corresponding to the text to be matched according to a dependency relationship between words in the text to be matched.
A determining unit 403, configured to determine, according to the first vector representation and the second vector representation, a target vector representation corresponding to the text to be matched, and determine, according to the target vector representation, a matching result of the text to be matched.
Optionally, the processing unit 402 is specifically configured to input the text to be matched into a pre-trained dependency syntax analysis model, so as to obtain a dependency relationship between words in the text to be matched; determining at least one core vocabulary in the text to be matched according to the dependency relationship among the vocabularies in the text to be matched; and determining a second vector representation according to the vector representation corresponding to each core word in the at least one core word.
Optionally, the number of the core vocabularies is at least two, and the processing unit 402 is specifically configured to perform weighted average on the vector representations corresponding to the core vocabularies; the weighted average result is determined as the second vector representation.
Optionally, the determining unit 403 is specifically configured to perform splicing processing on the first vector representation and the second vector representation to obtain a vector representation after the splicing processing; and determining the vector representation after the splicing processing as a target vector representation.
Optionally, the processing unit 402 is specifically configured to input the text to be matched into the pre-training model, so as to obtain an output result corresponding to the text to be matched; and carrying out weighted average on vectors of the last two layers in the output result to obtain a first vector representation.
Optionally, the determining unit 403 is specifically configured to determine cosine similarity between the target vector representation and a preset vector representation corresponding to a preset text; determining a matching result of the text to be matched according to the cosine similarity; the matching result comprises matching of the text to be matched and a preset text; or the text to be matched is not matched with the preset text.
Optionally, the text matching apparatus 40 further includes an output unit 404.
The determining unit 403 is further configured to determine response information corresponding to the preset text if the matching result is that the text to be matched is matched with the preset text.
And an output unit 404 for outputting the response information.
The text matching device 40 provided in the embodiment of the present application can execute the technical solution of the text matching method in any one of the above embodiments, and the implementation principle and the beneficial effect thereof are similar to those of the text matching method, and reference may be made to the implementation principle and the beneficial effect of the text matching method, which is not described herein again.
Fig. 5 is a schematic structural diagram of an electronic device 50 according to an embodiment of the present application. For example, referring to Fig. 5, the electronic device may include: a memory 501, a processor 502, and a text matching program stored on the memory 501 and executable on the processor 502, where the text matching program, when executed by the processor 502, implements the steps of the text matching method described in any of the preceding embodiments.
Alternatively, the memory 501 may be separate or integrated with the processor 502.
The electronic device 50 shown in the embodiment of the present application can execute the technical solution of the text matching method in any embodiment, and the implementation principle and the beneficial effect thereof are similar to those of the text matching method, which can be referred to as the implementation principle and the beneficial effect of the text matching method, and are not described herein again.
An embodiment of the present application further provides a computer-readable storage medium, where a text matching program is stored in the computer-readable storage medium, and the text matching program is executed by a processor to implement the steps of the text matching method according to any of the above embodiments, and the implementation principle and the beneficial effects of the text matching method are similar to those of the text matching method, which can be referred to as the implementation principle and the beneficial effects of the text matching method, and are not described herein again.
The embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the technical solution of the text matching method in any of the embodiments is implemented, and the implementation principle and the beneficial effect of the computer program are similar to those of the text matching method, which can be referred to as the implementation principle and the beneficial effect of the text matching method, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules is only one logical division, and other divisions may be realized in practice, for example, a plurality of modules may be combined or integrated into another system, or some features may be omitted, or not executed.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some steps of the methods described in the embodiments of the present application.
It should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile storage NVM, such as at least one disk memory, and may also be a usb disk, a removable hard disk, a read-only memory, a magnetic or optical disk, etc.
The storage medium may be implemented by any type or combination of volatile or non-volatile memory devices, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks. A storage media may be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an Application Specific Integrated Circuit (ASIC). Of course, the processor and the storage medium may also reside as discrete components in an electronic device or host device.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (11)

1. A text matching method, comprising:
acquiring a text to be matched;
inputting the text to be matched into a pre-training model to obtain a first vector representation corresponding to the text to be matched, and determining a second vector representation corresponding to the text to be matched according to the dependency relationship among words in the text to be matched;
determining a target vector representation corresponding to the text to be matched according to the first vector representation and the second vector representation;
and determining a matching result of the text to be matched according to the target vector representation.
2. The method according to claim 1, wherein the determining, according to the dependency relationship between words and phrases in the text to be matched, the second vector representation corresponding to the text to be matched includes:
inputting the text to be matched into a pre-trained dependency syntax analysis model to obtain the dependency relationship among the vocabularies in the text to be matched;
determining at least one core vocabulary in the text to be matched according to the dependency relationship among the vocabularies in the text to be matched;
and determining the second vector representation according to the vector representation corresponding to each core vocabulary in the at least one core vocabulary.
3. The method of claim 2, wherein the number of core words is at least two, and wherein determining the second vector representation based on the vector representations corresponding to each of the at least one core word comprises:
carrying out weighted average on vector representations corresponding to the core vocabularies;
determining a weighted average result as the second vector representation.
4. The method according to any one of claims 1-3, wherein the determining a target vector representation corresponding to the text to be matched according to the first vector representation and the second vector representation comprises:
splicing the first vector representation and the second vector representation to obtain a spliced vector representation;
and determining the vector representation after the splicing processing as the target vector representation.
5. The method according to any one of claims 1 to 3, wherein the inputting the text to be matched into a pre-training model to obtain a first vector representation corresponding to the text to be matched comprises:
inputting the text to be matched into the pre-training model to obtain an output result corresponding to the text to be matched;
and carrying out weighted average on vectors of the last two layers in the output result to obtain the first vector representation.
6. The method according to any one of claims 1-3, wherein the determining the matching result of the text to be matched according to the target vector representation comprises:
determining cosine similarity between the target vector representation and a preset vector representation corresponding to a preset text;
determining a matching result of the text to be matched according to the cosine similarity; the matching result comprises that the text to be matched is matched with the preset text; or the text to be matched is not matched with the preset text.
7. The method of claim 6, further comprising:
if the matching result is that the text to be matched is matched with the preset text, determining response information corresponding to the preset text;
and outputting the response information.
8. A text matching apparatus, comprising:
the acquiring unit is used for acquiring a text to be matched;
the processing unit is used for inputting the text to be matched into a pre-training model to obtain a first vector representation corresponding to the text to be matched, and determining a second vector representation corresponding to the text to be matched according to the dependency relationship among the vocabularies in the text to be matched;
and the determining unit is used for determining a target vector representation corresponding to the text to be matched according to the first vector representation and the second vector representation, and determining a matching result of the text to be matched according to the target vector representation.
9. An electronic device, characterized in that the electronic device comprises: a memory, a processor and a text matching program stored on the memory and executable on the processor, the text matching program when executed by the processor implementing the steps of the text matching method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the computer-readable storage medium has stored thereon a text matching program which, when executed by a processor, implements the steps of the text matching method according to any one of claims 1 to 7.
11. A computer program product comprising a computer program, characterized in that the computer program realizes the text matching method of any of claims 1 to 7 when executed by a processor.
CN202110669568.2A 2021-06-16 2021-06-16 Text matching method, device, equipment and storage medium Pending CN113297354A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110669568.2A CN113297354A (en) 2021-06-16 2021-06-16 Text matching method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110669568.2A CN113297354A (en) 2021-06-16 2021-06-16 Text matching method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113297354A true CN113297354A (en) 2021-08-24

Family

ID=77328582

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110669568.2A Pending CN113297354A (en) 2021-06-16 2021-06-16 Text matching method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113297354A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109710744A (en) * 2018-12-28 2019-05-03 合肥讯飞数码科技有限公司 A kind of data matching method, device, equipment and storage medium
CN110263144A (en) * 2019-06-27 2019-09-20 深圳前海微众银行股份有限公司 A kind of answer acquisition methods and device
CN111125335A (en) * 2019-12-27 2020-05-08 北京百度网讯科技有限公司 Question and answer processing method and device, electronic equipment and storage medium
CN111274358A (en) * 2020-01-20 2020-06-12 腾讯科技(深圳)有限公司 Text processing method and device, electronic equipment and storage medium
US20200279018A1 (en) * 2019-03-01 2020-09-03 Rakuten, Inc. Sentence extraction system, sentence extraction method, and information storage medium
CN111930894A (en) * 2020-08-13 2020-11-13 腾讯科技(深圳)有限公司 Long text matching method and device, storage medium and electronic equipment
CN112182167A (en) * 2020-11-06 2021-01-05 平安科技(深圳)有限公司 Text matching method and device, terminal equipment and storage medium


Similar Documents

Publication Publication Date Title
CN108763510B (en) Intention recognition method, device, equipment and storage medium
CN110263150B (en) Text generation method, device, computer equipment and storage medium
JP6909832B2 (en) Methods, devices, equipment and media for recognizing important words in audio
CN111695352A (en) Grading method and device based on semantic analysis, terminal equipment and storage medium
CN111046133A (en) Question-answering method, question-answering equipment, storage medium and device based on atlas knowledge base
CN110162780B (en) User intention recognition method and device
CN107221328B (en) Method and device for positioning modification source, computer equipment and readable medium
CN111738016A (en) Multi-intention recognition method and related equipment
CN111783471B (en) Semantic recognition method, device, equipment and storage medium for natural language
CN113408287B (en) Entity identification method and device, electronic equipment and storage medium
CN112101003B (en) Sentence text segmentation method, device and equipment and computer readable storage medium
CN114218945A (en) Entity identification method, device, server and storage medium
CN111160026B (en) Model training method and device, and text processing method and device
CN113326702A (en) Semantic recognition method and device, electronic equipment and storage medium
CN115509485A (en) Filling-in method and device of business form, electronic equipment and storage medium
CN110633724A (en) Intention recognition model dynamic training method, device, equipment and storage medium
CN113220854B (en) Intelligent dialogue method and device for machine reading and understanding
CN113535925B (en) Voice broadcasting method, device, equipment and storage medium
CN110750626B (en) Scene-based task-driven multi-turn dialogue method and system
CN116522905B (en) Text error correction method, apparatus, device, readable storage medium, and program product
CN112669850A (en) Voice quality detection method and device, computer equipment and storage medium
CN116186219A (en) Man-machine dialogue interaction method, system and storage medium
CN115691503A (en) Voice recognition method and device, electronic equipment and storage medium
CN115859121A (en) Text processing model training method and device
CN111401069A (en) Intention recognition method and intention recognition device for conversation text and terminal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination