WO2022127041A1 - Similar sentence matching method and apparatus, computer device, and storage medium - Google Patents

Similar sentence matching method and apparatus, computer device, and storage medium

Info

Publication number
WO2022127041A1
WO2022127041A1 (PCT/CN2021/097099)
Authority
WO
WIPO (PCT)
Prior art keywords
vector
sentence
tested
value
layer
Prior art date
Application number
PCT/CN2021/097099
Other languages
English (en)
French (fr)
Inventor
宋青原
王健宗
吴天博
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Publication of WO2022127041A1 (zh)

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3347Query execution using vector based model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3334Selection or weighting of terms from queries, including natural language queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3346Query execution using probabilistic model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular, to a similar sentence matching method, apparatus, computer equipment and storage medium.
  • the embodiments of the present application provide a similar sentence matching method, apparatus, computer equipment and storage medium, which aim to solve the problem of low accuracy of the existing similar sentence matching methods.
  • in a first aspect, an embodiment of the present application provides a similar sentence matching method, where the Siamese network model includes a multi-layer encoder and a multi-layer inference module, and the similar sentence matching method includes:
  • the first sentence to be tested and the second sentence to be tested are respectively converted into a first vector and a second vector by a preset word vector training tool;
  • the first vector is encoded by the multi-layer encoder to obtain a third vector, and the second vector is encoded by the multi-layer encoder to obtain a fourth vector;
  • information interaction processing is performed on the third vector and the fourth vector by the multi-layer inference module to obtain a fifth vector;
  • a global average value of the fifth vector is calculated;
  • the global average value is normalized to obtain a probability value;
  • the matching result between the first sentence to be tested and the second sentence to be tested is determined according to the probability value.
  • in a second aspect, an embodiment of the present application also provides a similar sentence matching device, where the Siamese network model includes a multi-layer encoder and a multi-layer inference module, and the similar sentence matching device includes:
  • a conversion unit, which is used to convert the first sentence to be tested and the second sentence to be tested into a first vector and a second vector through a preset word vector training tool;
  • a first encoding unit, configured to encode the first vector by the multi-layer encoder to obtain a third vector, and to encode the second vector by the multi-layer encoder to obtain a fourth vector;
  • an interaction processing unit, configured to perform information interaction processing on the third vector and the fourth vector through the multi-layer inference module to obtain a fifth vector;
  • a calculation unit, used for calculating the global average value of the fifth vector;
  • a normalization processing unit, configured to normalize the global average value to obtain a probability value;
  • a judgment unit, configured to determine the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value.
  • in a third aspect, an embodiment of the present application further provides a computer device; the computer device includes a memory and a processor, the memory stores a computer program, and the processor is configured to run the computer program to perform the following steps:
  • the first sentence to be tested and the second sentence to be tested are respectively converted into the first vector and the second vector by the preset word vector training tool;
  • the first vector is encoded by the multi-layer encoder to obtain a third vector, and the second vector is encoded by the multi-layer encoder to obtain a fourth vector;
  • information interaction processing is performed on the third vector and the fourth vector by the multi-layer inference module to obtain a fifth vector;
  • the global average value of the fifth vector is calculated;
  • the global average value is normalized to obtain a probability value;
  • the matching result between the first sentence to be tested and the second sentence to be tested is determined according to the probability value, wherein the Siamese network model includes a multi-layer encoder and a multi-layer inference module.
  • in a fourth aspect, embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and when the computer program is executed by a processor, the processor performs the following steps:
  • the first sentence to be tested and the second sentence to be tested are respectively converted into the first vector and the second vector by the preset word vector training tool;
  • the first vector is encoded by the multi-layer encoder to obtain a third vector, and the second vector is encoded by the multi-layer encoder to obtain a fourth vector;
  • information interaction processing is performed on the third vector and the fourth vector by the multi-layer inference module to obtain a fifth vector;
  • the global average value of the fifth vector is calculated;
  • the global average value is normalized to obtain a probability value;
  • the matching result between the first sentence to be tested and the second sentence to be tested is determined according to the probability value, wherein the Siamese network model includes a multi-layer encoder and a multi-layer inference module.
  • the embodiments of the present application provide a similar sentence matching method, apparatus, computer device and storage medium. By performing information interaction processing on the third vector and the fourth vector and calculating the global average value of the fifth vector, the method makes full use of the sentence information, which improves the accuracy of similar sentence matching.
  • FIG. 1 is a schematic block diagram of a Siamese network model provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a similar sentence matching method provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a similar sentence matching method provided by another embodiment of the present application.
  • FIG. 4 is a schematic sub-flow diagram of a similar sentence matching method provided by an embodiment of the present application.
  • FIG. 5 is a schematic block diagram of a similar sentence matching apparatus provided by an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting".
  • the phrases "if it is determined" or "if the [described condition or event] is detected" may be interpreted, depending on the context, to mean "once it is determined", "in response to determining", "once the [described condition or event] is detected" or "in response to detecting the [described condition or event]".
  • FIG. 1 is a schematic block diagram of a Siamese network model 200 provided by an embodiment of the present application.
  • the Siamese network model 200 includes two multi-layer encoders 201 and two multi-layer inference modules 202, where the two multi-layer encoders 201 are in a parallel relationship and information is exchanged between the two multi-layer inference modules 202.
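  • to make the topology concrete, here is a minimal Python/NumPy wiring sketch of such a Siamese arrangement. The encoder and inference internals are placeholder stubs (the actual attention computations are sketched under steps S2 and S3 below), and sharing one parameter set across the two branches is an assumption based on standard Siamese networks rather than something stated by the figure:

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(X, params):
    # placeholder for a multi-layer encoder 201 (self-attention in the patent)
    return X @ params

def inference(A, B, params):
    # placeholder for a multi-layer inference module 202: branch A consumes
    # information from branch B (the cross-branch information exchange)
    return (A @ params) + B.mean(axis=0)

shared = rng.normal(size=(16, 16))  # one parameter set shared by both branches

M = rng.normal(size=(4, 16))        # first sentence: 4 word vectors
N = rng.normal(size=(6, 16))        # second sentence: 6 word vectors

# the two encoders run in parallel and do not affect each other
third, fourth = encoder(M, shared), encoder(N, shared)

# the two inference modules exchange information between the branches
fifth_a = inference(third, fourth, shared)
fifth_b = inference(fourth, third, shared)
print(fifth_a.shape, fifth_b.shape)  # (4, 16) (6, 16)
```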
  • FIG. 2 is a schematic flowchart of a similar sentence matching method provided by an embodiment of the present application.
  • This application can be applied to scenarios of smart government affairs/smart city management/smart community/smart security/smart logistics/smart medical care/smart education/smart environmental protection/smart transportation, so as to promote the construction of smart cities.
  • the method includes the following steps S1-S6.
  • S1: the first sentence to be tested and the second sentence to be tested are respectively converted into a first vector and a second vector by a preset word vector training tool.
  • to turn a natural language understanding problem into a machine learning problem, the first step is to mathematize the sentences. A word vector is a way to represent the words of a language mathematically: as the name suggests, it represents a word as a vector.
  • in a specific implementation, word2vec is used as the word vector training tool, and word vector training is performed on the words of the segmented word sequence through word2vec to obtain the input word vector sequence.
  • word2vec is a word vector training tool whose function is to convert the words of natural language into word vectors that a computer can understand.
  • traditional word vector training tools are easily plagued by the curse of dimensionality, and any two words are isolated from each other, so the relationship between words cannot be reflected. This embodiment therefore uses word2vec to train word vectors, which can reflect the similarity between words through the distance between their vectors.
  • for example, in one embodiment, the first sentence to be tested is: spring/flower/true/beautiful. After word2vec training, the word vector of "spring" is M11, the word vector of "flower" is M12, the word vector of "true" is M13, and the word vector of "beautiful" is M14, so the first vector is (M11, M12, M13, M14).
  • word2vec is only an example of a word vector training tool provided in this application, and those skilled in the art may also use other word vector training tools, which will not go beyond the protection scope of this application.
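  • to make step S1 concrete, the following sketch trains a toy word2vec model and stacks a segmented sentence's word vectors into the "first vector" (one row per word). It assumes the gensim library; the corpus, vector size and other parameters are illustrative stand-ins, not values from the application:

```python
import numpy as np
from gensim.models import Word2Vec

# toy corpus of pre-segmented sentences (stand-ins for real training text)
corpus = [["spring", "flower", "true", "beautiful"],
          ["autumn", "leaf", "very", "pretty"],
          ["spring", "flower", "bloom"]]

# train word vectors; vector_size and the other settings are illustrative
w2v = Word2Vec(sentences=corpus, vector_size=16, window=2, min_count=1, epochs=50)

def sentence_to_matrix(words, model):
    # stack each word's vector as one row: (M11, M12, M13, M14) in the example
    return np.stack([model.wv[w] for w in words])

first_vector = sentence_to_matrix(["spring", "flower", "true", "beautiful"], w2v)
print(first_vector.shape)  # (4, 16): four words, 16-dimensional word vectors
```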
  • S2: the first vector is encoded by the multi-layer encoder to obtain the third vector, and the second vector is encoded by the multi-layer encoder to obtain the fourth vector.
  • in a specific implementation, the third vector is obtained by encoding the first vector with the multi-layer encoder.
  • in one embodiment, each row vector of the first vector has three representations Q (Query Vector), K (Key Vector) and V (Value Vector), where Q is a query vector matrix, K is a key vector matrix, and V is a value vector matrix. Assuming the dimension of each row vector is 8 and the dimension of these representations is 5, all three representations are 8×5 matrices.
  • the Q query vector matrix, the K key vector matrix and the V value vector matrix are randomly initialized with a suitable random distribution and then multiplied with the first vector, giving the three representations M_Q, M_K and M_V of the first vector. Specifically, in one embodiment, each row vector of the first vector is multiplied by the Q query vector matrix to obtain M_Q, the query vector matrix of the first sentence to be tested; each row vector of the first vector is multiplied by the K key vector matrix to obtain M_K, the key vector matrix of the first sentence to be tested; and each row vector of the first vector is multiplied by the V value vector matrix to obtain M_V, the value vector matrix of the first sentence to be tested.
  • the self-attention scores of the first vector are calculated through the self-attention equation \( \mathrm{softmax}\!\left(\frac{M_Q M_K^{\top}}{\sqrt{d_1}}\right) M_V \) to obtain the third vector.
  • it should be noted that the self-attention value of the first vector is \( \mathrm{softmax}\!\left(\frac{M_Q M_K^{\top}}{\sqrt{d_1}}\right) \), and multiplying it by the value vector matrix M_V of the first sentence to be tested gives the third vector, where M_Q is the query vector matrix of the first sentence to be tested, M_K is the key vector matrix of the first sentence to be tested, M_V is the value vector matrix of the first sentence to be tested, M is the first sentence to be tested, and d_1 is the dimension of the multi-layer encoder network layer.
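  • a minimal NumPy sketch of this encoding step follows; treating the Q, K and V matrices as random 8×5 projections and d_1 as the representation dimension used in the scaling are assumptions consistent with the description above:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

d_in, d1 = 8, 5                  # row-vector dimension 8, representation dimension 5
M = rng.normal(size=(4, d_in))   # first vector: 4 word vectors as rows

# randomly initialized projection matrices (the Q, K and V matrices)
W_Q, W_K, W_V = (rng.normal(size=(d_in, d1)) for _ in range(3))

M_Q, M_K, M_V = M @ W_Q, M @ W_K, M @ W_V   # representations of the first vector

attn = softmax(M_Q @ M_K.T / np.sqrt(d1))   # self-attention values
third_vector = attn @ M_V                   # multiply by M_V to get the third vector
print(third_vector.shape)                   # (4, 5)
```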
  • in a specific implementation, the fourth vector is obtained by encoding the second vector with the multi-layer encoder. In this embodiment, each row vector of the second vector has three representations Q, K and V, where Q is a query vector matrix, K is a key vector matrix, and V is a value vector matrix.
  • the Q, K and V matrices are randomly initialized and multiplied with the second vector, giving its three representations N_Q, N_K and N_V. Specifically, each row vector of the second vector is multiplied by the Q query vector matrix to obtain N_Q, the query vector matrix of the second sentence to be tested; each row vector of the second vector is multiplied by the K key vector matrix to obtain N_K, the key vector matrix of the second sentence to be tested; and each row vector of the second vector is multiplied by the V value vector matrix to obtain N_V, the value vector matrix of the second sentence to be tested.
  • the self-attention value of the second vector is calculated through the self-attention equation \( \mathrm{softmax}\!\left(\frac{N_Q N_K^{\top}}{\sqrt{d_1}}\right) N_V \) to obtain the fourth vector. It should be noted that the self-attention value of the second vector is \( \mathrm{softmax}\!\left(\frac{N_Q N_K^{\top}}{\sqrt{d_1}}\right) \), and multiplying it by the value vector matrix N_V of the second sentence to be tested gives the fourth vector, where N_Q is the query vector matrix of the second sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, N is the second sentence to be tested, and d_1 is the dimension of the multi-layer encoder network layer.
  • S3: information interaction processing is performed on the third vector and the fourth vector by the multi-layer inference module to obtain the fifth vector.
  • based on the self-attention equation of step S2, replacing the key vector matrix M_K and the value vector matrix M_V of the first sentence to be tested with the key vector matrix N_K and the value vector matrix N_V of the second sentence to be tested yields the self-attention equation for information interaction.
  • the attention value of the information interaction between the third vector and the fourth vector is calculated through the equation \( \mathrm{softmax}\!\left(\frac{M_Q N_K^{\top}}{\sqrt{d_2}}\right) N_V \) to obtain the fifth vector. It should be noted that the attention value of the information interaction between the third vector and the fourth vector is \( \mathrm{softmax}\!\left(\frac{M_Q N_K^{\top}}{\sqrt{d_2}}\right) \), and multiplying it by the value vector matrix N_V of the second sentence to be tested gives the fifth vector.
  • the self-attention equation for information interaction enables better information interaction between the first sentence to be tested and the second sentence to be tested and provides a more reliable basis for the sentence matching result, thereby improving the accuracy of similar sentence matching.
  • here M_Q is the query vector matrix of the first sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, M is the first sentence to be tested, N is the second sentence to be tested, and d_2 is the dimension of the network layer of the multi-layer inference module.
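  • the interaction step differs from the self-attention sketch only in where the keys and values come from: the queries stay with the first sentence while N_K and N_V come from the second. A minimal sketch with illustrative stand-in matrices:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d2 = 5
M_Q = rng.normal(size=(4, d2))   # queries from the first sentence's side
N_K = rng.normal(size=(6, d2))   # keys from the second sentence
N_V = rng.normal(size=(6, d2))   # values from the second sentence

# information-interaction attention: the first sentence attends over the second
cross_attn = softmax(M_Q @ N_K.T / np.sqrt(d2))
fifth_vector = cross_attn @ N_V  # multiply by N_V to get the fifth vector
print(fifth_vector.shape)        # (4, 5)
```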
  • S4: the global average value of the fifth vector is calculated. The multi-layer inference module in this embodiment includes a multi-layer inference network, and each layer of the inference network calculates the attention value of the information interaction between the third vector and the fourth vector.
  • in the traditional calculation method, only the attention value output by the last layer of the inference network of the multi-layer inference module is normalized in step S5 to obtain the probability value as the result of similar sentence matching. This ignores the influence on the probability value of the attention values output by the other inference network layers in the multi-layer inference module, which reduces the accuracy of similar sentence matching.
  • in this embodiment, the attention value output by every layer of the inference network of the multi-layer inference module participates in the calculation of the probability value, which greatly improves the accuracy of similar sentence matching.
  • referring to FIG. 4, in one embodiment, step S4 specifically includes steps S41-S42.
  • S41: according to the attention values of the information interaction between the third vector and the fourth vector calculated by each layer of the inference network, the sum of the attention values of the information interaction between the third vector and the fourth vector is calculated, so that the output of every layer of the inference network can participate in the calculation of the probability value in the next step.
  • S42: the sum of the attention values of the information interaction between the third vector and the fourth vector is averaged to obtain the global average value of the fifth vector. The sum is divided by the dimension of the multi-layer inference network to obtain the average of the attention values, and this average is then multiplied by the value vector matrix of the second sentence to be tested to obtain the global average value of the fifth vector.
  • the output information of every layer of the inference network is thus fully utilized, ensuring maximal use of the information.
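  • a sketch of this pooling over layers follows; reading "the dimension of the multi-layer inference network" as the number of inference layers is an assumption, and the per-layer attention maps are random stand-ins for the values each inference layer would output:

```python
import numpy as np

rng = np.random.default_rng(2)

num_layers, len1, len2, d2 = 3, 4, 6, 5

# per-layer interaction attention values: one (len1, len2) map per inference layer
layer_attn = [rng.dirichlet(np.ones(len2), size=len1) for _ in range(num_layers)]
N_V = rng.normal(size=(len2, d2))      # value vector matrix of the second sentence

attn_sum = np.sum(layer_attn, axis=0)  # sum over all inference layers
attn_mean = attn_sum / num_layers      # divide by the layer count (the "dimension")
global_average = attn_mean @ N_V       # global average value of the fifth vector
print(global_average.shape)            # (4, 5)
```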
  • S5: the global average value is normalized to obtain a probability value.
  • in a specific implementation, the global average value is normalized using the normalized exponential function (the Softmax function) to obtain the probability value.
  • the normalized exponential function can "compress" a multidimensional vector containing arbitrary real numbers into another multidimensional real vector in which every element lies in the range (0, 1).
  • it should be noted that the normalized exponential function is only an example of normalization provided by this application; those skilled in the art may also use other normalization functions, which will not go beyond the protection scope of this application.
  • S6: the matching result between the first sentence to be tested and the second sentence to be tested is determined according to the probability value.
  • in a specific implementation, whether the probability value is greater than a preset threshold is judged; if the probability value is greater than the preset threshold, the first sentence to be tested is judged to be similar to the second sentence to be tested; if the probability value is less than the preset threshold, the first sentence to be tested is judged to be not similar to the second sentence to be tested.
  • in one embodiment, the preset threshold is 0.5: if the probability value is greater than 0.5, the first sentence to be tested is judged to be similar to the second sentence to be tested, and if the probability value is less than 0.5, the first sentence to be tested is judged to be not similar to the second sentence to be tested.
  • the user can set the preset threshold according to the actual situation, which is not specifically limited in this application.
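  • steps S5 and S6 might look like the sketch below. The application does not spell out how the global average value matrix is reduced before the Softmax, so the mean pooling and the untrained two-way classifier head here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

global_average = rng.normal(size=(4, 5))  # stand-in for the previous step's output

# assumed reduction: mean-pool the matrix, then a linear head to two logits
pooled = global_average.mean(axis=0)      # (5,)
W_out = rng.normal(size=(5, 2))           # illustrative, untrained classifier head
probs = softmax(pooled @ W_out)           # normalized exponential (Softmax)

threshold = 0.5                           # the preset threshold from the text
p_similar = probs[1]                      # probability that the pair is similar
result = "similar" if p_similar > threshold else "not similar"
print(p_similar, result)
```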
  • the similar sentence matching method provided by the embodiments of the present application includes: converting the first sentence to be tested and the second sentence to be tested into a first vector and a second vector respectively by a preset word vector training tool; encoding the first vector by the multi-layer encoder to obtain a third vector, and encoding the second vector by the multi-layer encoder to obtain a fourth vector; performing information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain a fifth vector; calculating the global average value of the fifth vector; normalizing the global average value to obtain a probability value; and determining the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value.
  • by performing information interaction processing on the third vector and the fourth vector and calculating the global average value of the fifth vector, the method makes full use of the sentence information and improves the accuracy of similar sentence matching.
  • FIG. 3 is a schematic flowchart of a similar sentence matching method provided by another embodiment of the present application.
  • the similar sentence matching method of this embodiment includes steps S101-S109.
  • the steps S104-S109 are similar to the steps S1-S6 in the above-mentioned embodiment, and are not repeated here.
  • the steps S101-S103 added in this embodiment will be described in detail below.
  • S101: the multi-layer encoder is trained using a contrastive self-supervised method.
  • to train the multi-layer encoder with the contrastive self-supervised method, the positive and negative training labels are first constructed; the first training sentence and the second training sentence are input into the multi-layer encoder to obtain x, y and x', where x is the shallow output of the first training sentence, y is the deep output of the first training sentence, and x' is the shallow output of the second training sentence.
  • it should be noted that the first training sentence and the second training sentence are two sentences with different meanings.
  • x, the shallow output of the first training sentence, and y, the deep output of the first training sentence, form a positive label (x, y); y, the deep output of the first training sentence, and x', the shallow output of the second training sentence, form a negative label (x', y).
  • the loss value is calculated through the formula JS(x,y)=max(E[log(σ(T(x,y)))]+E[log(1-σ(T(x',y)))]), where T(x,y) and T(x',y) are classifiers, (x,y) is the positive label and (x',y) is the negative label; the parameters of the multi-layer encoder are adjusted according to the loss value, and the training steps are repeated until the parameters of the multi-layer encoder no longer change, at which point training stops. This training yields an encoder with strong expressive ability, and since the contrastive self-supervised method completes training simply by constructing positive and negative labels, no labeled data is needed.
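  • a sketch of this contrastive objective on a random batch follows; choosing a bilinear form for the classifier T is an assumption (the application only says T is a classifier), and minimizing the negated objective corresponds to the "max" in the formula:

```python
import numpy as np

rng = np.random.default_rng(4)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

batch, dim = 32, 16
x  = rng.normal(size=(batch, dim))  # shallow outputs of the first training sentences
y  = rng.normal(size=(batch, dim))  # deep outputs of the first training sentences
xp = rng.normal(size=(batch, dim))  # shallow outputs of the second (negative) sentences

# assumed classifier T: a simple bilinear scorer producing one score per pair
W_T = rng.normal(size=(dim, dim))

def T(a, b):
    return np.einsum("bi,ij,bj->b", a, W_T, b)

# JS-style objective: maximize E[log sigma(T(x,y))] + E[log(1 - sigma(T(x',y)))]
pos = np.log(sigmoid(T(x, y)) + 1e-8).mean()         # positive labels (x, y)
neg = np.log(1.0 - sigmoid(T(xp, y)) + 1e-8).mean()  # negative labels (x', y)
loss = -(pos + neg)  # gradient descent on this loss maximizes the objective
print(loss)
```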
  • S102: the trained multi-layer encoder and the multi-layer inference module are formed into a Siamese network model.
  • specifically, the trained multi-layer encoder and the multi-layer inference module are formed into the Siamese network model shown in FIG. 1, in which the two multi-layer encoders run in parallel without affecting each other, and information is exchanged between the two multi-layer inference modules.
  • S103: the Siamese network model is trained to obtain the trained Siamese network model.
  • the multi-layer encoder is first trained with the contrastive self-supervised method, the trained multi-layer encoder and the multi-layer inference module are then formed into a Siamese network model, and the whole Siamese network model is then trained. Because the multi-layer encoder already has strong encoding ability after step S101, training the Siamese network model does not require training the multi-layer encoder again, which not only improves the convergence speed of the Siamese network model but also reduces the need for labeled data.
  • FIG. 5 is a schematic block diagram of a similar sentence matching apparatus provided by an embodiment of the present application.
  • the present application further provides a similar sentence matching apparatus 100 corresponding to the above similar sentence matching method.
  • the similar sentence matching apparatus 100 includes units for executing the above similar sentence matching method, and the apparatus can be configured in a desktop computer, a tablet computer, a laptop computer and other terminals.
  • the similar sentence matching apparatus 100 includes a conversion unit 101, a first encoding unit 102, an interaction processing unit 103, a calculation unit 104, a normalization processing unit 105 and a judgment unit 106.
  • the conversion unit 101 is used to convert the first sentence to be tested and the second sentence to be tested into a first vector and a second vector respectively by a preset word vector training tool;
  • a first encoding unit 102 configured to encode the first vector by the multi-layer encoder to obtain a third vector, and encode the second vector by the multi-layer encoder to obtain a fourth vector;
  • an interaction processing unit 103 configured to perform information interaction processing on the third vector and the fourth vector through the multi-layer inference module to obtain a fifth vector;
  • a calculation unit 104 configured to calculate the global average value of the fifth vector;
  • a normalization processing unit 105 configured to normalize the global average value to obtain a probability value;
  • the judgment unit 106 is configured to determine the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value.
  • in an embodiment, the encoding of the first vector by the multi-layer encoder to obtain the third vector includes: calculating the self-attention value of the first vector through the equation \( \mathrm{softmax}\!\left(\frac{M_Q M_K^{\top}}{\sqrt{d_1}}\right) M_V \) to obtain the third vector, where M_Q is the query vector matrix of the first sentence to be tested, M_K is the key vector matrix of the first sentence to be tested, M_V is the value vector matrix of the first sentence to be tested, M is the first sentence to be tested, and d_1 is the dimension of the multi-layer encoder network layer.
  • the encoding of the second vector by the multi-layer encoder to obtain the fourth vector includes: calculating the self-attention value of the second vector through the equation \( \mathrm{softmax}\!\left(\frac{N_Q N_K^{\top}}{\sqrt{d_1}}\right) N_V \) to obtain the fourth vector, where N_Q is the query vector matrix of the second sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, N is the second sentence to be tested, and d_1 is the dimension of the multi-layer encoder network layer.
  • the performing of information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain the fifth vector includes: calculating the attention value of the information interaction between the third vector and the fourth vector through the equation \( \mathrm{softmax}\!\left(\frac{M_Q N_K^{\top}}{\sqrt{d_2}}\right) N_V \) to obtain the fifth vector, where M_Q is the query vector matrix of the first sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, M is the first sentence to be tested, N is the second sentence to be tested, and d_2 is the dimension of the network layer of the multi-layer inference module.
  • in an embodiment, the multi-layer inference module includes a multi-layer inference network, each layer of which calculates the attention value of the information interaction between the third vector and the fourth vector, and the calculating of the global average value of the fifth vector includes: calculating, according to the attention values computed by each layer of the inference network, the sum of the attention values of the information interaction between the third vector and the fourth vector; and averaging the sum of the attention values of the information interaction between the third vector and the fourth vector to obtain the global average value of the fifth vector.
  • in an embodiment, the matching result includes similar and not similar, and the determining of the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value includes: judging whether the probability value is greater than a preset threshold; if the probability value is greater than the preset threshold, determining that the first sentence to be tested is similar to the second sentence to be tested; and if the probability value is less than the preset threshold, determining that the first sentence to be tested is not similar to the second sentence to be tested.
  • in an embodiment, before the first sentence to be tested and the second sentence to be tested are respectively converted into the first vector and the second vector by the preset word vector training tool, the similar sentence matching method further includes: training the multi-layer encoder using a contrastive self-supervised method; forming the trained multi-layer encoder and the multi-layer inference module into a Siamese network model; and training the Siamese network model.
  • the training of the multi-layer encoder using the contrastive self-supervised method includes: constructing positive and negative training labels; calculating the loss value through the formula JS(x,y)=max(E[log(σ(T(x,y)))]+E[log(1-σ(T(x',y)))]), where T(x,y) and T(x',y) are classifiers, (x,y) is the positive label and (x',y) is the negative label; and adjusting the parameters of the multi-layer encoder according to the loss value.
  • the above-mentioned similar sentence matching apparatus can be implemented in the form of a computer program, and the computer program can be executed on a computer device as shown in FIG. 6 .
  • FIG. 6 is a schematic block diagram of a computer device provided by an embodiment of the present application.
  • the computer device 300 is a host computer.
  • the host computer may be an electronic device such as a tablet computer, a notebook computer, and a desktop computer.
  • the computer device 300 includes a processor 302 , a memory and a network interface 305 connected through a system bus 301 , wherein the memory may include a non-volatile storage medium 303 and an internal memory 304 .
  • the nonvolatile storage medium 303 can store an operating system 3031 and a computer program 3032 .
  • the computer program 3032 when executed, can cause the processor 302 to perform a similar sentence matching method.
  • the processor 302 is used to provide computing and control capabilities to support the operation of the entire computer device 300 .
  • the internal memory 304 provides an environment for running the computer program 3032 stored in the non-volatile storage medium 303; when the computer program 3032 is executed by the processor 302, the processor 302 can perform a similar sentence matching method.
  • the network interface 305 is used for network communication with other devices.
  • the structure shown in FIG. 6 is only a block diagram of a partial structure related to the solution of the present application, and does not constitute a limitation on the computer device 300 to which the solution of the present application is applied.
  • the specific computer device 300 may include more or fewer components than shown, or combine certain components, or have a different arrangement of components.
  • the processor 302 is used to run the computer program 3032 stored in the memory to implement the following steps:
  • the first sentence to be tested and the second sentence to be tested are respectively converted into the first vector and the second vector by the preset word vector training tool;
  • the first vector is encoded by the multi-layer encoder to obtain a third vector, and the second vector is encoded by the multi-layer encoder to obtain a fourth vector;
  • information interaction processing is performed on the third vector and the fourth vector by the multi-layer inference module to obtain a fifth vector;
  • the global average value of the fifth vector is calculated;
  • the global average value is normalized to obtain a probability value;
  • the matching result between the first sentence to be tested and the second sentence to be tested is determined according to the probability value, wherein the Siamese network model includes a multi-layer encoder and a multi-layer inference module.
  • in an embodiment, the encoding of the first vector by the multi-layer encoder to obtain the third vector includes: calculating the self-attention value of the first vector through the equation \( \mathrm{softmax}\!\left(\frac{M_Q M_K^{\top}}{\sqrt{d_1}}\right) M_V \) to obtain the third vector, where M_Q is the query vector matrix of the first sentence to be tested, M_K is the key vector matrix of the first sentence to be tested, M_V is the value vector matrix of the first sentence to be tested, M is the first sentence to be tested, and d_1 is the dimension of the multi-layer encoder network layer.
  • in an embodiment, the encoding of the second vector by the multi-layer encoder to obtain the fourth vector includes: calculating the self-attention value of the second vector through the equation \( \mathrm{softmax}\!\left(\frac{N_Q N_K^{\top}}{\sqrt{d_1}}\right) N_V \) to obtain the fourth vector, where N_Q is the query vector matrix of the second sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, N is the second sentence to be tested, and d_1 is the dimension of the multi-layer encoder network layer.
  • in an embodiment, the performing of information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain the fifth vector includes: calculating the attention value of the information interaction between the third vector and the fourth vector through the equation \( \mathrm{softmax}\!\left(\frac{M_Q N_K^{\top}}{\sqrt{d_2}}\right) N_V \) to obtain the fifth vector, where M_Q is the query vector matrix of the first sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, M is the first sentence to be tested, N is the second sentence to be tested, and d_2 is the dimension of the network layer of the multi-layer inference module.
  • in an embodiment, the multi-layer inference module includes a multi-layer inference network, each layer of which calculates the attention value of the information interaction between the third vector and the fourth vector, and the calculating of the global average value of the fifth vector includes: calculating, according to the attention values computed by each layer of the inference network, the sum of the attention values of the information interaction between the third vector and the fourth vector; and averaging the sum of the attention values of the information interaction between the third vector and the fourth vector to obtain the global average value of the fifth vector.
  • in an embodiment, the averaging of the sum of the attention values of the information interaction between the third vector and the fourth vector to obtain the global average value of the fifth vector includes: dividing the sum of the attention values by the dimension of the multi-layer inference network to obtain the average of the attention values of the information interaction between the third vector and the fourth vector; and multiplying this average by the value vector matrix of the second sentence to be tested to obtain the global average value of the fifth vector.
  • in an embodiment, the matching result includes similar and not similar, and the determining of the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value includes: judging whether the probability value is greater than a preset threshold; if the probability value is greater than the preset threshold, determining that the first sentence to be tested is similar to the second sentence to be tested; and if the probability value is less than the preset threshold, determining that the first sentence to be tested is not similar to the second sentence to be tested.
  • in an embodiment, before the first sentence to be tested and the second sentence to be tested are respectively converted into the first vector and the second vector by the preset word vector training tool, the similar sentence matching method further includes: training the multi-layer encoder using a contrastive self-supervised method; forming the trained multi-layer encoder and the multi-layer inference module into a Siamese network model; and training the Siamese network model.
  • the training of the multi-layer encoder using the contrastive self-supervised method includes: constructing positive and negative training labels; calculating the loss value through the formula JS(x,y)=max(E[log(σ(T(x,y)))]+E[log(1-σ(T(x',y)))]), where T(x,y) and T(x',y) are classifiers, (x,y) is the positive label and (x',y) is the negative label; and adjusting the parameters of the multi-layer encoder according to the loss value.
  • the processor 302 may be a central processing unit (Central Processing Unit, CPU), and the processor 302 may also be other general-purpose processors, digital signal processors (Digital Signal Processor, DSP), Application Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor can be a microprocessor or the processor can also be any conventional processor or the like.
  • the computer program can be stored in a storage medium, which is a computer-readable storage medium.
  • the computer program is executed by at least one processor in the computer system to implement the flow steps of the above-described method embodiments.
  • the present application also provides a storage medium.
  • the storage medium may be a computer-readable storage medium, and the computer-readable storage medium may be non-volatile or volatile.
  • the storage medium stores a computer program. The computer program, when executed by the processor, causes the processor to perform the following steps:
  • the first sentence to be tested and the second sentence to be tested are respectively converted into the first vector and the second vector by the preset word vector training tool;
  • the first vector is encoded by the multi-layer encoder to obtain a third vector, and the second vector is encoded by the multi-layer encoder to obtain a fourth vector;
  • information interaction processing is performed on the third vector and the fourth vector by the multi-layer inference module to obtain a fifth vector;
  • the global average value of the fifth vector is calculated;
  • the global average value is normalized to obtain a probability value;
  • the matching result between the first sentence to be tested and the second sentence to be tested is determined according to the probability value, wherein the Siamese network model includes a multi-layer encoder and a multi-layer inference module.
  • in an embodiment, the encoding of the first vector by the multi-layer encoder to obtain the third vector includes: calculating the self-attention value of the first vector through the equation \( \mathrm{softmax}\!\left(\frac{M_Q M_K^{\top}}{\sqrt{d_1}}\right) M_V \) to obtain the third vector, where M_Q is the query vector matrix of the first sentence to be tested, M_K is the key vector matrix of the first sentence to be tested, M_V is the value vector matrix of the first sentence to be tested, M is the first sentence to be tested, and d_1 is the dimension of the multi-layer encoder network layer.
  • in an embodiment, the encoding of the second vector by the multi-layer encoder to obtain the fourth vector includes: calculating the self-attention value of the second vector through the equation \( \mathrm{softmax}\!\left(\frac{N_Q N_K^{\top}}{\sqrt{d_1}}\right) N_V \) to obtain the fourth vector, where N_Q is the query vector matrix of the second sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, N is the second sentence to be tested, and d_1 is the dimension of the multi-layer encoder network layer.
  • in an embodiment, the performing of information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain the fifth vector includes: calculating the attention value of the information interaction between the third vector and the fourth vector through the equation \( \mathrm{softmax}\!\left(\frac{M_Q N_K^{\top}}{\sqrt{d_2}}\right) N_V \) to obtain the fifth vector, where M_Q is the query vector matrix of the first sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, M is the first sentence to be tested, N is the second sentence to be tested, and d_2 is the dimension of the network layer of the multi-layer inference module.
  • in an embodiment, the multi-layer inference module includes a multi-layer inference network, each layer of which calculates the attention value of the information interaction between the third vector and the fourth vector, and the calculating of the global average value of the fifth vector includes: calculating, according to the attention values computed by each layer of the inference network, the sum of the attention values of the information interaction between the third vector and the fourth vector; and averaging the sum of the attention values of the information interaction between the third vector and the fourth vector to obtain the global average value of the fifth vector.
  • in an embodiment, the averaging of the sum of the attention values of the information interaction between the third vector and the fourth vector to obtain the global average value of the fifth vector includes: dividing the sum of the attention values by the dimension of the multi-layer inference network to obtain the average of the attention values of the information interaction between the third vector and the fourth vector; and multiplying this average by the value vector matrix of the second sentence to be tested to obtain the global average value of the fifth vector.
  • in an embodiment, the matching result includes similar and not similar, and the determining of the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value includes: judging whether the probability value is greater than a preset threshold; if the probability value is greater than the preset threshold, determining that the first sentence to be tested is similar to the second sentence to be tested; and if the probability value is less than the preset threshold, determining that the first sentence to be tested is not similar to the second sentence to be tested.
  • in an embodiment, before the first sentence to be tested and the second sentence to be tested are respectively converted into the first vector and the second vector by the preset word vector training tool, the similar sentence matching method further includes: training the multi-layer encoder using a contrastive self-supervised method; forming the trained multi-layer encoder and the multi-layer inference module into a Siamese network model; and training the Siamese network model.
  • the training of the multi-layer encoder using the contrastive self-supervised method includes: constructing positive and negative training labels; calculating the loss value through the formula JS(x,y)=max(E[log(σ(T(x,y)))]+E[log(1-σ(T(x',y)))]), where T(x,y) and T(x',y) are classifiers, (x,y) is the positive label and (x',y) is the negative label; and adjusting the parameters of the multi-layer encoder according to the loss value.
  • the storage medium is a physical, non-transitory storage medium, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk or an optical disk, or any other physical storage medium that can store program code.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the integrated unit if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a storage medium.
  • the technical solutions of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a terminal, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application disclose a similar sentence matching method and apparatus, a computer device, and a storage medium, relating to the technical field of artificial intelligence and applicable in smart technology to promote the construction of smart cities. The method includes: converting a first sentence to be tested and a second sentence to be tested into a first vector and a second vector respectively by means of a preset word vector training tool; encoding the first vector by a multi-layer encoder to obtain a third vector, and encoding the second vector by the multi-layer encoder to obtain a fourth vector; performing information interaction processing on the third vector and the fourth vector by a multi-layer inference module to obtain a fifth vector; calculating a global average value of the fifth vector; normalizing the global average value to obtain a probability value; and determining the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value.

Description

Similar sentence matching method and apparatus, computer device, and storage medium
This application claims priority to the Chinese patent application No. 202011483693.6, entitled "Similar sentence matching method and apparatus, computer device and storage medium", filed with the Chinese Patent Office on December 16, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of artificial intelligence, and in particular to a similar sentence matching method and apparatus, a computer device, and a storage medium.
Background
The field of natural language has progressed from natural language processing to natural language understanding, and for natural language understanding it is essential to grasp the deep meaning of a sentence properly. Similar sentence matching plays an important role in many areas, such as question answering and reading comprehension. Linguistic expression varies endlessly, so being able to judge correctly whether two sentences express the same meaning is crucial.
The inventors found that most traditional methods stop at judging the similarity of the text itself, for example using edit distance to judge the similarity of two sentences; the accuracy of such methods is very low, because language expression varies endlessly and two sentences that differ in only one character may express vastly different meanings. In recent years, with the popularity of deep learning, people have begun to use deep learning for similar sentence matching. Although deep learning has solved the earlier problem of inaccurate rules, the accuracy is still relatively low.
Summary
The embodiments of the present application provide a similar sentence matching method and apparatus, a computer device, and a storage medium, aiming to solve the problem that existing similar sentence matching methods have low accuracy.
In a first aspect, an embodiment of the present application provides a similar sentence matching method, wherein a Siamese network model includes a multi-layer encoder and a multi-layer inference module, and the similar sentence matching method includes:
converting a first sentence to be tested and a second sentence to be tested into a first vector and a second vector respectively by means of a preset word vector training tool;
encoding the first vector by the multi-layer encoder to obtain a third vector, and encoding the second vector by the multi-layer encoder to obtain a fourth vector;
performing information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain a fifth vector;
calculating a global average value of the fifth vector;
normalizing the global average value to obtain a probability value;
determining a matching result between the first sentence to be tested and the second sentence to be tested according to the probability value.
In a second aspect, an embodiment of the present application further provides a similar sentence matching apparatus, wherein a Siamese network model includes a multi-layer encoder and a multi-layer inference module, and the similar sentence matching apparatus includes:
a conversion unit configured to convert a first sentence to be tested and a second sentence to be tested into a first vector and a second vector respectively by means of a preset word vector training tool;
a first encoding unit configured to encode the first vector by the multi-layer encoder to obtain a third vector, and to encode the second vector by the multi-layer encoder to obtain a fourth vector;
an interaction processing unit configured to perform information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain a fifth vector;
a calculation unit configured to calculate the global average value of the fifth vector;
a normalization processing unit configured to normalize the global average value to obtain a probability value;
a judgment unit configured to determine the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value.
In a third aspect, an embodiment of the present application further provides a computer device, the computer device including a memory and a processor, the memory storing a computer program and the processor being configured to run the computer program to perform the following steps:
converting a first sentence to be tested and a second sentence to be tested into a first vector and a second vector respectively by means of a preset word vector training tool;
encoding the first vector by the multi-layer encoder to obtain a third vector, and encoding the second vector by the multi-layer encoder to obtain a fourth vector;
performing information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain a fifth vector;
calculating the global average value of the fifth vector;
normalizing the global average value to obtain a probability value;
determining the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value, wherein the Siamese network model includes a multi-layer encoder and a multi-layer inference module.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
converting a first sentence to be tested and a second sentence to be tested into a first vector and a second vector respectively by means of a preset word vector training tool;
encoding the first vector by the multi-layer encoder to obtain a third vector, and encoding the second vector by the multi-layer encoder to obtain a fourth vector;
performing information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain a fifth vector;
calculating the global average value of the fifth vector;
normalizing the global average value to obtain a probability value;
determining the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value, wherein the Siamese network model includes a multi-layer encoder and a multi-layer inference module.
The embodiments of the present application provide a similar sentence matching method and apparatus, a computer device, and a storage medium. By performing information interaction processing on the third vector and the fourth vector and calculating the global average value of the fifth vector, the method makes full use of the sentence information and improves the accuracy of similar sentence matching.
Brief Description of the Drawings
In order to explain the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are some embodiments of the present application, and a person of ordinary skill in the art may obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic block diagram of a Siamese network model provided by an embodiment of the present application;
FIG. 2 is a schematic flowchart of a similar sentence matching method provided by an embodiment of the present application;
FIG. 3 is a schematic flowchart of a similar sentence matching method provided by another embodiment of the present application;
FIG. 4 is a schematic sub-flow diagram of a similar sentence matching method provided by an embodiment of the present application;
FIG. 5 is a schematic block diagram of a similar sentence matching apparatus provided by an embodiment of the present application;
FIG. 6 is a schematic block diagram of a computer device provided by an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some rather than all of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
It should be understood that the terms "include" and "comprise", when used in this specification and the appended claims, indicate the presence of the described features. It should also be understood that the terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used in this specification and the appended claims, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted, depending on the context, as "when", "once", "in response to determining" or "in response to detecting". Similarly, the phrases "if it is determined" or "if the [described condition or event] is detected" may be interpreted, depending on the context, as "once it is determined", "in response to determining", "once the [described condition or event] is detected" or "in response to detecting the [described condition or event]".
The technical solutions proposed in the embodiments of the present application can be applied in smart technology to promote the construction of smart cities.
Referring to FIG. 1, FIG. 1 is a schematic block diagram of a Siamese network model 200 provided by an embodiment of the present application. As shown in the figure, the Siamese network model 200 includes two multi-layer encoders 201 and two multi-layer inference modules 202, where the two multi-layer encoders 201 are in a parallel relationship and information is exchanged between the two multi-layer inference modules 202.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a similar sentence matching method provided by an embodiment of the present application. The present application can be applied in smart government affairs/smart city management/smart community/smart security/smart logistics/smart medical care/smart education/smart environmental protection/smart transportation scenarios, so as to promote the construction of smart cities. As shown in FIG. 2, the method includes the following steps S1-S6.
S1: Convert a first sentence to be tested and a second sentence to be tested into a first vector and a second vector respectively by means of a preset word vector training tool.
To turn a natural language understanding problem into a machine learning problem, the first step is to mathematize the sentences. A word vector is a way to represent the words of a language mathematically: as the name suggests, it represents a word as a vector.
In a specific implementation, word2vec is used as the word vector training tool, and word vector training is performed on the words of the segmented word sequence through word2vec to obtain the input word vector sequence.
word2vec is a word vector training tool whose function is to convert the words of natural language into word vectors that a computer can understand. Traditional word vector training tools are easily plagued by the curse of dimensionality, and any two words are isolated from each other, so the relationship between words cannot be reflected. This embodiment therefore uses word2vec to train word vectors, which can reflect the similarity between words through the distance between their vectors.
For example, in one embodiment, the first sentence to be tested is: spring/flower/true/beautiful. After word2vec training, the word vector of "spring" is M11, the word vector of "flower" is M12, the word vector of "true" is M13, and the word vector of "beautiful" is M14, so the first vector is (M11, M12, M13, M14).
It should be noted that word2vec is merely an example of a word vector training tool provided by this application; those skilled in the art may also use other word vector training tools without going beyond the protection scope of this application.
S2: Encode the first vector by the multi-layer encoder to obtain a third vector, and encode the second vector by the multi-layer encoder to obtain a fourth vector.
In a specific implementation, the first vector is encoded by the multi-layer encoder to obtain the third vector. In one embodiment, each row vector of the first vector has three representations Q (Query Vector), K (Key Vector) and V (Value Vector), where Q is a query vector matrix, K is a key vector matrix, and V is a value vector matrix. Assuming the dimension of each row vector is 8 and the dimension of these representations is 5, all three representations are 8×5 matrices.
The Q query vector matrix, the K key vector matrix and the V value vector matrix are randomly initialized with a suitable random distribution to obtain the initialized Q query vector matrix, K key vector matrix and V value vector matrix.
After initialization, the Q, K and V matrices are multiplied with the first vector to obtain its three representations M_Q, M_K and M_V. Specifically, in one embodiment, each row vector of the first vector is multiplied by the Q query vector matrix to obtain M_Q, the query vector matrix of the first sentence to be tested; each row vector of the first vector is multiplied by the K key vector matrix to obtain M_K, the key vector matrix of the first sentence to be tested; and each row vector of the first vector is multiplied by the V value vector matrix to obtain M_V, the value vector matrix of the first sentence to be tested.
The self-attention scores of the first vector are calculated through the self-attention equation
\( \mathrm{softmax}\!\left(\frac{M_Q M_K^{\top}}{\sqrt{d_1}}\right) M_V \)
to obtain the third vector. It should be noted that the self-attention value of the first vector is
\( \mathrm{softmax}\!\left(\frac{M_Q M_K^{\top}}{\sqrt{d_1}}\right) \),
and multiplying it by the value vector matrix M_V of the first sentence to be tested gives the third vector, where M_Q is the query vector matrix of the first sentence to be tested, M_K is the key vector matrix of the first sentence to be tested, M_V is the value vector matrix of the first sentence to be tested, M is the first sentence to be tested, and d_1 is the dimension of the multi-layer encoder network layer.
In a specific implementation, the second vector is encoded by the multi-layer encoder to obtain the fourth vector. In this embodiment, each row vector of the second vector has three representations Q, K and V, where Q is a query vector matrix, K is a key vector matrix, and V is a value vector matrix.
The Q query vector matrix, the K key vector matrix and the V value vector matrix are randomly initialized with a suitable random distribution to obtain the initialized Q query vector matrix, K key vector matrix and V value vector matrix.
After initialization, the Q, K and V matrices are multiplied with the second vector to obtain its three representations N_Q, N_K and N_V. Specifically, in this embodiment, each row vector of the second vector is multiplied by the Q query vector matrix to obtain N_Q, the query vector matrix of the second sentence to be tested; each row vector of the second vector is multiplied by the K key vector matrix to obtain N_K, the key vector matrix of the second sentence to be tested; and each row vector of the second vector is multiplied by the V value vector matrix to obtain N_V, the value vector matrix of the second sentence to be tested.
The self-attention value of the second vector is calculated through the self-attention equation
\( \mathrm{softmax}\!\left(\frac{N_Q N_K^{\top}}{\sqrt{d_1}}\right) N_V \)
to obtain the fourth vector. It should be noted that the self-attention value of the second vector is
\( \mathrm{softmax}\!\left(\frac{N_Q N_K^{\top}}{\sqrt{d_1}}\right) \),
and multiplying it by the value vector matrix N_V of the second sentence to be tested gives the fourth vector, where N_Q is the query vector matrix of the second sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, N is the second sentence to be tested, and d_1 is the dimension of the multi-layer encoder network layer.
S3: Perform information interaction processing on the third vector and the fourth vector through the multi-layer inference module to obtain a fifth vector.
In a specific implementation, information interaction processing is performed on the third vector and the fourth vector through the multi-layer inference module to obtain the fifth vector. Based on the self-attention equation of step S2, replacing the key vector matrix M_K and the value vector matrix M_V of the first sentence to be tested with the key vector matrix N_K and the value vector matrix N_V of the second sentence to be tested yields the self-attention equation for information interaction.
The attention value of the information interaction between the third vector and the fourth vector is calculated through the self-attention equation
\( \mathrm{softmax}\!\left(\frac{M_Q N_K^{\top}}{\sqrt{d_2}}\right) N_V \)
to obtain the fifth vector. It should be noted that the attention value of the information interaction between the third vector and the fourth vector is
\( \mathrm{softmax}\!\left(\frac{M_Q N_K^{\top}}{\sqrt{d_2}}\right) \),
and multiplying it by the value vector matrix N_V of the second sentence to be tested gives the fifth vector. The self-attention equation for information interaction enables better information interaction between the first sentence to be tested and the second sentence to be tested and provides a more reliable basis for the sentence matching result, thereby improving the accuracy of similar sentence matching.
Here M_Q is the query vector matrix of the first sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, M is the first sentence to be tested, N is the second sentence to be tested, and d_2 is the dimension of the network layer of the multi-layer inference module.
S4: Calculate the global average value of the fifth vector.
In a specific implementation, the global average value of the fifth vector is calculated. The multi-layer inference module in this embodiment includes a multi-layer inference network, and each layer of the inference network calculates the attention value of the information interaction between the third vector and the fourth vector.
In the traditional calculation method, only the attention value output by the last layer of the inference network of the multi-layer inference module is normalized in step S5 to obtain the probability value as the result of similar sentence matching. This ignores the influence on the probability value of the attention values output by the other inference network layers in the multi-layer inference module, which reduces the accuracy of similar sentence matching.
In this embodiment, the attention value output by every layer of the inference network of the multi-layer inference module participates in the calculation of the probability value, which greatly improves the accuracy of similar sentence matching.
Referring to FIG. 4, in one embodiment, step S4 specifically includes steps S41-S42.
S41: According to the attention values of the information interaction between the third vector and the fourth vector calculated by each layer of the inference network, calculate the sum of the attention values of the information interaction between the third vector and the fourth vector.
In a specific implementation, the sum of the attention values of the information interaction between the third vector and the fourth vector is calculated according to the attention values computed by each layer of the inference network, so that the output of every layer of the inference network can participate in the calculation of the probability value in the next step.
S42: Average the sum of the attention values of the information interaction between the third vector and the fourth vector to obtain the global average value of the fifth vector.
In a specific implementation, the sum of the attention values of the information interaction between the third vector and the fourth vector is divided by the dimension of the multi-layer inference network to obtain the average of the attention values of the information interaction between the third vector and the fourth vector; it should be noted that this average is then multiplied by the value vector matrix of the second sentence to be tested to obtain the global average value of the fifth vector. The output information of every layer of the inference network is thus fully utilized, ensuring maximal use of the information.
S5: Normalize the global average value to obtain a probability value.
In a specific implementation, the global average value is normalized using the normalized exponential function (the Softmax function) to obtain the probability value. The normalized exponential function can "compress" a multidimensional vector containing arbitrary real numbers into another multidimensional real vector in which every element lies in the range (0, 1).
It should be noted that the normalized exponential function is merely an example of normalization provided by this application; those skilled in the art may also use other normalization functions without going beyond the protection scope of this application.
S6: Determine the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value.
In a specific implementation, whether the probability value is greater than a preset threshold is judged; if the probability value is greater than the preset threshold, the first sentence to be tested is judged to be similar to the second sentence to be tested; if the probability value is less than the preset threshold, the first sentence to be tested is judged to be not similar to the second sentence to be tested. In one embodiment, the preset threshold is 0.5: if the probability value is greater than 0.5, the two sentences are judged similar, and if it is less than 0.5, they are judged not similar. The user can set the preset threshold according to the actual situation, which is not specifically limited in this application.
The similar sentence matching method provided by the embodiments of the present application includes: converting the first sentence to be tested and the second sentence to be tested into a first vector and a second vector respectively by means of a preset word vector training tool; encoding the first vector by the multi-layer encoder to obtain a third vector, and encoding the second vector by the multi-layer encoder to obtain a fourth vector; performing information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain a fifth vector; calculating the global average value of the fifth vector; normalizing the global average value to obtain a probability value; and determining the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value. By performing information interaction processing on the third vector and the fourth vector and calculating the global average value of the fifth vector, the method makes full use of the sentence information and improves the accuracy of similar sentence matching.
Referring to FIG. 3, FIG. 3 is a schematic flowchart of a similar sentence matching method provided by another embodiment of the present application. As shown in FIG. 3, the similar sentence matching method of this embodiment includes steps S101-S109, of which steps S104-S109 are similar to steps S1-S6 in the above embodiment and are not repeated here. Steps S101-S103 added in this embodiment are described in detail below.
S101: Train the multi-layer encoder using a contrastive self-supervised method.
In a specific implementation, to train the multi-layer encoder with the contrastive self-supervised method, the positive and negative training labels are first constructed; the first training sentence and the second training sentence are input into the multi-layer encoder to obtain x, y and x', where x is the shallow output of the first training sentence, y is the deep output of the first training sentence, and x' is the shallow output of the second training sentence. It should be noted that the first training sentence and the second training sentence are two sentences with different meanings. x, the shallow output of the first training sentence, and y, the deep output of the first training sentence, form a positive label (x, y); y, the deep output of the first training sentence, and x', the shallow output of the second training sentence, form a negative label (x', y).
The loss value is calculated through the formula JS(x,y)=max(E[log(σ(T(x,y)))]+E[log(1-σ(T(x',y)))]), where T(x,y) and T(x',y) are classifiers, (x,y) is the positive label and (x',y) is the negative label; the parameters of the multi-layer encoder are adjusted according to the loss value; and the above training steps are repeated until the parameters of the multi-layer encoder no longer change, at which point training stops. Adjusting the parameters of the multi-layer encoder according to the loss value is a technique well known to those skilled in the art and is not described here. Through this training an encoder with strong expressive ability is obtained. In this training, the contrastive self-supervised method completes training simply by constructing positive and negative labels, without using labeled data.
S102: Form the trained multi-layer encoder and the multi-layer inference module into a Siamese network model.
In a specific implementation, the trained multi-layer encoder and the multi-layer inference module are formed into the Siamese network model shown in FIG. 1, in which the two multi-layer encoders run in parallel without affecting each other, and information is exchanged between the two multi-layer inference modules.
S103: Train the Siamese network model.
In a specific implementation, the Siamese network model is trained to obtain a trained Siamese network model. The multi-layer encoder is first trained with the contrastive self-supervised method, the trained multi-layer encoder and the multi-layer inference module are then formed into a Siamese network model, and the whole Siamese network model is then trained. Because the multi-layer encoder already has strong encoding ability after step S101, training the Siamese network model does not require training the multi-layer encoder again, which not only improves the convergence speed of the Siamese network model but also reduces the need for labeled data.
FIG. 5 is a schematic block diagram of a similar sentence matching apparatus provided by an embodiment of the present application. As shown in FIG. 5, corresponding to the above similar sentence matching method, the present application further provides a similar sentence matching apparatus 100. The similar sentence matching apparatus 100 includes units for executing the above similar sentence matching method, and the apparatus can be configured in a desktop computer, a tablet computer, a laptop computer and other terminals. Specifically, referring to FIG. 5, the similar sentence matching apparatus 100 includes a conversion unit 101, a first encoding unit 102, an interaction processing unit 103, a calculation unit 104, a normalization processing unit 105 and a judgment unit 106.
The conversion unit 101 is configured to convert a first sentence to be tested and a second sentence to be tested into a first vector and a second vector respectively by means of a preset word vector training tool;
the first encoding unit 102 is configured to encode the first vector by the multi-layer encoder to obtain a third vector, and to encode the second vector by the multi-layer encoder to obtain a fourth vector;
the interaction processing unit 103 is configured to perform information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain a fifth vector;
the calculation unit 104 is configured to calculate the global average value of the fifth vector;
the normalization processing unit 105 is configured to normalize the global average value to obtain a probability value;
the judgment unit 106 is configured to determine the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value.
In an embodiment, the encoding of the first vector by the multi-layer encoder to obtain the third vector and the encoding of the second vector by the multi-layer encoder to obtain the fourth vector include:
calculating the self-attention value of the first vector through the equation
\( \mathrm{softmax}\!\left(\frac{M_Q M_K^{\top}}{\sqrt{d_1}}\right) M_V \)
to obtain the third vector, where M_Q is the query vector matrix of the first sentence to be tested, M_K is the key vector matrix of the first sentence to be tested, M_V is the value vector matrix of the first sentence to be tested, M is the first sentence to be tested, and d_1 is the dimension of the multi-layer encoder network layer.
In an embodiment, the encoding of the second vector by the multi-layer encoder to obtain the fourth vector includes:
calculating the self-attention value of the second vector through the equation
\( \mathrm{softmax}\!\left(\frac{N_Q N_K^{\top}}{\sqrt{d_1}}\right) N_V \)
to obtain the fourth vector, where N_Q is the query vector matrix of the second sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, N is the second sentence to be tested, and d_1 is the dimension of the multi-layer encoder network layer.
In an embodiment, the performing of information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain the fifth vector includes:
calculating the attention value of the information interaction between the third vector and the fourth vector through the equation
\( \mathrm{softmax}\!\left(\frac{M_Q N_K^{\top}}{\sqrt{d_2}}\right) N_V \)
to obtain the fifth vector, where M_Q is the query vector matrix of the first sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, M is the first sentence to be tested, N is the second sentence to be tested, and d_2 is the dimension of the network layer of the multi-layer inference module.
In an embodiment, the multi-layer inference module includes a multi-layer inference network, each layer of which calculates the attention value of the information interaction between the third vector and the fourth vector, and the calculating of the global average value of the fifth vector includes:
calculating, according to the attention values of the information interaction between the third vector and the fourth vector computed by each layer of the inference network, the sum of the attention values of the information interaction between the third vector and the fourth vector;
averaging the sum of the attention values of the information interaction between the third vector and the fourth vector to obtain the global average value of the fifth vector.
In an embodiment, the matching result includes similar and not similar, and the determining of the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value includes:
judging whether the probability value is greater than a preset threshold;
if the probability value is greater than the preset threshold, determining that the first sentence to be tested is similar to the second sentence to be tested;
if the probability value is less than the preset threshold, determining that the first sentence to be tested is not similar to the second sentence to be tested.
In an embodiment, before the first sentence to be tested and the second sentence to be tested are respectively converted into the first vector and the second vector by the preset word vector training tool, the similar sentence matching method further includes:
training the multi-layer encoder using a contrastive self-supervised method;
forming the trained multi-layer encoder and the multi-layer inference module into a Siamese network model;
training the Siamese network model.
In an embodiment, the training of the multi-layer encoder using the contrastive self-supervised method includes:
constructing positive and negative training labels;
calculating the loss value through the formula JS(x,y)=max(E[log(σ(T(x,y)))]+E[log(1-σ(T(x',y)))]), where T(x,y) and T(x',y) are classifiers, (x,y) is the positive label and (x',y) is the negative label;
adjusting the parameters of the multi-layer encoder according to the loss value.
It should be noted that those skilled in the art can clearly understand that, for the specific implementation process of the above similar sentence matching apparatus and its units, reference may be made to the corresponding descriptions in the foregoing method embodiments; for convenience and brevity of description, details are not repeated here.
The above similar sentence matching apparatus may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG. 6.
Referring to FIG. 6, FIG. 6 is a schematic block diagram of a computer device provided by an embodiment of the present application. The computer device 300 is a host computer, and the host computer may be an electronic device such as a tablet computer, a notebook computer or a desktop computer.
Referring to FIG. 6, the computer device 300 includes a processor 302, a memory and a network interface 305 connected through a system bus 301, where the memory may include a non-volatile storage medium 303 and an internal memory 304.
The non-volatile storage medium 303 can store an operating system 3031 and a computer program 3032. When executed, the computer program 3032 can cause the processor 302 to perform a similar sentence matching method.
The processor 302 is used to provide computing and control capabilities to support the operation of the entire computer device 300.
The internal memory 304 provides an environment for running the computer program 3032 stored in the non-volatile storage medium 303; when the computer program 3032 is executed by the processor 302, the processor 302 can perform a similar sentence matching method.
The network interface 305 is used for network communication with other devices. Those skilled in the art can understand that the structure shown in FIG. 6 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device 300 to which the solution of the present application is applied; the specific computer device 300 may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
The processor 302 is configured to run the computer program 3032 stored in the memory to implement the following steps:
converting a first sentence to be tested and a second sentence to be tested into a first vector and a second vector respectively by means of a preset word vector training tool;
encoding the first vector by the multi-layer encoder to obtain a third vector, and encoding the second vector by the multi-layer encoder to obtain a fourth vector;
performing information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain a fifth vector;
calculating the global average value of the fifth vector;
normalizing the global average value to obtain a probability value;
determining the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value, wherein the Siamese network model includes a multi-layer encoder and a multi-layer inference module.
In an embodiment, the encoding of the first vector by the multi-layer encoder to obtain the third vector includes:
calculating the self-attention value of the first vector through the equation
\( \mathrm{softmax}\!\left(\frac{M_Q M_K^{\top}}{\sqrt{d_1}}\right) M_V \)
to obtain the third vector, where M_Q is the query vector matrix of the first sentence to be tested, M_K is the key vector matrix of the first sentence to be tested, M_V is the value vector matrix of the first sentence to be tested, M is the first sentence to be tested, and d_1 is the dimension of the multi-layer encoder network layer.
In an embodiment, the encoding of the second vector by the multi-layer encoder to obtain the fourth vector includes:
calculating the self-attention value of the second vector through the equation
\( \mathrm{softmax}\!\left(\frac{N_Q N_K^{\top}}{\sqrt{d_1}}\right) N_V \)
to obtain the fourth vector, where N_Q is the query vector matrix of the second sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, N is the second sentence to be tested, and d_1 is the dimension of the multi-layer encoder network layer.
In an embodiment, the performing of information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain the fifth vector includes:
calculating the attention value of the information interaction between the third vector and the fourth vector through the equation
\( \mathrm{softmax}\!\left(\frac{M_Q N_K^{\top}}{\sqrt{d_2}}\right) N_V \)
to obtain the fifth vector, where M_Q is the query vector matrix of the first sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, M is the first sentence to be tested, N is the second sentence to be tested, and d_2 is the dimension of the network layer of the multi-layer inference module.
In an embodiment, the multi-layer inference module includes a multi-layer inference network, each layer of which calculates the attention value of the information interaction between the third vector and the fourth vector, and the calculating of the global average value of the fifth vector includes:
calculating, according to the attention values of the information interaction between the third vector and the fourth vector computed by each layer of the inference network, the sum of the attention values of the information interaction between the third vector and the fourth vector;
averaging the sum of the attention values of the information interaction between the third vector and the fourth vector to obtain the global average value of the fifth vector.
In an embodiment, the averaging of the sum of the attention values of the information interaction between the third vector and the fourth vector to obtain the global average value of the fifth vector includes:
dividing the sum of the attention values of the information interaction between the third vector and the fourth vector by the dimension of the multi-layer inference network to obtain the average of the attention values of the information interaction between the third vector and the fourth vector;
multiplying the average of the attention values of the information interaction between the third vector and the fourth vector by the value vector matrix of the second sentence to be tested to obtain the global average value of the fifth vector.
In an embodiment, the matching result includes similar and not similar, and the determining of the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value includes:
judging whether the probability value is greater than a preset threshold;
if the probability value is greater than the preset threshold, determining that the first sentence to be tested is similar to the second sentence to be tested;
if the probability value is less than the preset threshold, determining that the first sentence to be tested is not similar to the second sentence to be tested.
In an embodiment, before the first sentence to be tested and the second sentence to be tested are respectively converted into the first vector and the second vector by the preset word vector training tool, the similar sentence matching method further includes:
training the multi-layer encoder using a contrastive self-supervised method;
forming the trained multi-layer encoder and the multi-layer inference module into a Siamese network model;
training the Siamese network model.
In an embodiment, the training of the multi-layer encoder using the contrastive self-supervised method includes:
constructing positive and negative training labels;
calculating the loss value through the formula JS(x,y)=max(E[log(σ(T(x,y)))]+E[log(1-σ(T(x',y)))]), where T(x,y) and T(x',y) are classifiers, (x,y) is the positive label and (x',y) is the negative label;
adjusting the parameters of the multi-layer encoder according to the loss value.
It should be understood that, in the embodiments of the present application, the processor 302 may be a central processing unit (CPU), and the processor 302 may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
A person of ordinary skill in the art can understand that all or part of the flow in the methods of the above embodiments can be completed by instructing the relevant hardware through a computer program. The computer program can be stored in a storage medium, which is a computer-readable storage medium. The computer program is executed by at least one processor in the computer system to implement the flow steps of the above method embodiments.
Therefore, the present application also provides a storage medium. The storage medium may be a computer-readable storage medium, and the computer-readable storage medium may be non-volatile or volatile. The storage medium stores a computer program which, when executed by a processor, causes the processor to perform the following steps:
converting a first sentence to be tested and a second sentence to be tested into a first vector and a second vector respectively by means of a preset word vector training tool;
encoding the first vector by the multi-layer encoder to obtain a third vector, and encoding the second vector by the multi-layer encoder to obtain a fourth vector;
performing information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain a fifth vector;
calculating the global average value of the fifth vector;
normalizing the global average value to obtain a probability value;
determining the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value, wherein the Siamese network model includes a multi-layer encoder and a multi-layer inference module.
In an embodiment, the encoding of the first vector by the multi-layer encoder to obtain the third vector includes:
calculating the self-attention value of the first vector through the equation
\( \mathrm{softmax}\!\left(\frac{M_Q M_K^{\top}}{\sqrt{d_1}}\right) M_V \)
to obtain the third vector, where M_Q is the query vector matrix of the first sentence to be tested, M_K is the key vector matrix of the first sentence to be tested, M_V is the value vector matrix of the first sentence to be tested, M is the first sentence to be tested, and d_1 is the dimension of the multi-layer encoder network layer.
In an embodiment, the encoding of the second vector by the multi-layer encoder to obtain the fourth vector includes:
calculating the self-attention value of the second vector through the equation
\( \mathrm{softmax}\!\left(\frac{N_Q N_K^{\top}}{\sqrt{d_1}}\right) N_V \)
to obtain the fourth vector, where N_Q is the query vector matrix of the second sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, N is the second sentence to be tested, and d_1 is the dimension of the multi-layer encoder network layer.
In an embodiment, the performing of information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain the fifth vector includes:
calculating the attention value of the information interaction between the third vector and the fourth vector through the equation
\( \mathrm{softmax}\!\left(\frac{M_Q N_K^{\top}}{\sqrt{d_2}}\right) N_V \)
to obtain the fifth vector, where M_Q is the query vector matrix of the first sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, M is the first sentence to be tested, N is the second sentence to be tested, and d_2 is the dimension of the network layer of the multi-layer inference module.
In an embodiment, the multi-layer inference module includes a multi-layer inference network, each layer of which calculates the attention value of the information interaction between the third vector and the fourth vector, and the calculating of the global average value of the fifth vector includes:
calculating, according to the attention values of the information interaction between the third vector and the fourth vector computed by each layer of the inference network, the sum of the attention values of the information interaction between the third vector and the fourth vector;
averaging the sum of the attention values of the information interaction between the third vector and the fourth vector to obtain the global average value of the fifth vector.
In an embodiment, the averaging of the sum of the attention values of the information interaction between the third vector and the fourth vector to obtain the global average value of the fifth vector includes:
dividing the sum of the attention values of the information interaction between the third vector and the fourth vector by the dimension of the multi-layer inference network to obtain the average of the attention values of the information interaction between the third vector and the fourth vector;
multiplying the average of the attention values of the information interaction between the third vector and the fourth vector by the value vector matrix of the second sentence to be tested to obtain the global average value of the fifth vector.
In an embodiment, the matching result includes similar and not similar, and the determining of the matching result between the first sentence to be tested and the second sentence to be tested according to the probability value includes:
judging whether the probability value is greater than a preset threshold;
if the probability value is greater than the preset threshold, determining that the first sentence to be tested is similar to the second sentence to be tested;
if the probability value is less than the preset threshold, determining that the first sentence to be tested is not similar to the second sentence to be tested.
In an embodiment, before the first sentence to be tested and the second sentence to be tested are respectively converted into the first vector and the second vector by the preset word vector training tool, the similar sentence matching method further includes:
training the multi-layer encoder using a contrastive self-supervised method;
forming the trained multi-layer encoder and the multi-layer inference module into a Siamese network model;
training the Siamese network model.
In an embodiment, the training of the multi-layer encoder using the contrastive self-supervised method includes:
constructing positive and negative training labels;
calculating the loss value through the formula JS(x,y)=max(E[log(σ(T(x,y)))]+E[log(1-σ(T(x',y)))]), where T(x,y) and T(x',y) are classifiers, (x,y) is the positive label and (x',y) is the negative label;
adjusting the parameters of the multi-layer encoder according to the loss value.
The storage medium is a physical, non-transitory storage medium, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a magnetic disk or an optical disk, or any other physical storage medium that can store program code.
A person of ordinary skill in the art can appreciate that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of the examples have been described above generally in terms of function. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementation should not be considered beyond the scope of the present application.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the units is only a division by logical function, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed.
The steps in the methods of the embodiments of the present application may be reordered, combined and deleted according to actual needs. The units in the apparatuses of the embodiments of the present application may be combined, divided and deleted according to actual needs. In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a storage medium. Based on this understanding, the technical solutions of the present application, in essence, or the part that contributes to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions to cause a computer device (which may be a personal computer, a terminal, a network device, or the like) to execute all or part of the steps of the methods described in the various embodiments of the present application.
In the above embodiments, the description of each embodiment has its own emphasis; for a part not described in detail in one embodiment, reference may be made to the relevant descriptions of other embodiments.
Obviously, those skilled in the art can make various changes and modifications to the present application without departing from its spirit and scope. If these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to include them.
The above are only specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any person skilled in the art can easily conceive of various equivalent modifications or replacements within the technical scope disclosed in the present application, and these modifications or replacements shall all be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (20)

  1. 一种相似句匹配方法,其中,孪生网络模型包括多层编码器和多层推理模块,所述相似句匹配方法包括:
    通过预设的词向量训练工具分别将第一待测句子以及第二待测句子转换为第一向量以及第二向量;
    通过所述多层编码器对所述第一向量进行编码得到第三向量,通过所述多层编码器对所述第二向量进行编码得到第四向量;
    通过所述多层推理模块对所述第三向量及第四向量进行信息交互处理从而得到第五向量;
    计算所述第五向量的全局平局值;
    对所述全局平局值进行归一化处理得到概率值;
    根据所述概率值判断所述第一待测句子与第二待测句子的匹配结果。
  2. 根据权利要求1所述的相似句匹配方法,其中,所述通过所述多层编码器对所述第一向量进行编码得到第三向量,包括:
    通过方程
    Figure PCTCN2021097099-appb-100001
    计算所述第一向量的自注意力值从而得到第三向量,其中M Q为第一待测句子的查询向量矩阵,M K为第一待测句子的键向量矩阵,M V为第一待测句子的值向量矩阵,M为第一待测句子,d 1为多层编码器网络层的维度。
  3. 根据权利要求2所述的相似句匹配方法,其中,所述通过所述多层编码器对所述第二向量进行编码得到第四向量,包括:
    通过方程
    Figure PCTCN2021097099-appb-100002
    计算所述第二向量的自注意力值从而得到第四向量,其中N Q为第二待测句子的查询向量矩阵,N K为第二待测句子的键向量矩阵,N V为第二待测句子的值向量矩阵,N为第二待测句子,d 1为多层编码器网络层的维度。
  4. 根据权利要求3所述的相似句匹配方法,其中,所述通过所述多层推理模块对所述第三向量及第四向量进行信息交互处理从而得到第五向量,包括:
    通过方程
    Figure PCTCN2021097099-appb-100003
    计算所述第三向量与第四向量信息交互的注意力值从而得到第五向量,
    其中M Q为第一待测句子的查询向量矩阵,N K为第二待测句子的键向量矩阵,N V为第二待测句子的值向量矩阵,M为第一待测句子,N为第二待测句子,d 2为多层推理模块网络层的维度。
  5. 根据权利要求4所述的相似句匹配方法,其中,所述多层推理模块包括多层推理网络,各层所述推理网络均计算所述第三向量与第四向量信息交互的注意力值,所述计算所述第五向量的全局平局值,包括:
    根据各层所述推理网络计算的第三向量与第四向量信息交互的注意力值,计算第三向量与第四向量信息交互的注意力值的总和;
    对第三向量与第四向量信息交互的注意力值的总和求取平均值从而得到第五向量的全局平局值。
  6. 根据权利要求5所述的相似句匹配方法,其中,所述对第三向量与第四向量信息交互 的注意力值的总和求取平均值从而得到第五向量的全局平局值,包括:
    将第三向量与第四向量信息交互的注意力值的总和除以多层推理网络的维度得到第三向量与第四向量信息交互的注意力值的平均值;
    将第三向量与第四向量信息交互的注意力值的平均值乘以第二待测句子的值向量矩阵得到第五向量的全局平局值。
  7. 根据权利要求1所述的相似句匹配方法,其中,所述匹配结果包括相似以及不相似,所述根据所述概率值判断所述第一待测句子与第二待测句子的匹配结果,包括:
    判断所述概率值是否大于预设阈值;
    若所述概率值大于所述预设阈值,则判断所述第一待测句子与第二待测句子相似;
    若所述概率值小于所述预设阈值,则判断所述第一待测句子与第二待测句子不相似。
  8. 根据权利要求1所述的相似句匹配方法,其中,所述通过预设的词向量训练工具分别将第一待测句子以及第二待测句子转换为第一向量以及第二向量之前,所述相似句匹配方法还包括:
    使用对比自监督方法对多层编码器进行训练;
    将经过训练的多层编码器与多层推理模块组成孪生网络模型;
    对孪生网络模型进行训练。
  9. The similar sentence matching method according to claim 8, wherein the training of the multi-layer encoder with the contrastive self-supervised method includes:
    constructing positive labels and negative labels for training;
    computing a loss value by the formula JS(x, y) = max(E[log(σ(T(x, y)))] + E[log(1 − σ(T(x', y)))]), where T(x, y) and T(x', y) are classifiers, (x, y) is a positive label, and (x', y) is a negative label;
    adjusting the parameters of the multi-layer encoder according to the loss value.
  10. A similar sentence matching apparatus, wherein a twin network model includes a multi-layer encoder and a multi-layer inference module, and the similar sentence matching apparatus includes:
    a conversion unit configured to convert a first sentence to be tested and a second sentence to be tested into a first vector and a second vector respectively by a preset word vector training tool;
    a first encoding unit configured to encode the first vector by the multi-layer encoder to obtain a third vector, and to encode the second vector by the multi-layer encoder to obtain a fourth vector;
    an interaction processing unit configured to perform information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain a fifth vector;
    a computing unit configured to compute a global average value of the fifth vector;
    a normalization unit configured to normalize the global average value to obtain a probability value;
    a judging unit configured to judge a matching result of the first sentence to be tested and the second sentence to be tested according to the probability value.
  11. A computer device, including a memory and a processor, wherein a computer program is stored on the memory, and the processor is configured to run the computer program to perform the following steps:
    converting a first sentence to be tested and a second sentence to be tested into a first vector and a second vector respectively by a preset word vector training tool;
    encoding the first vector by the multi-layer encoder to obtain a third vector, and encoding the second vector by the multi-layer encoder to obtain a fourth vector;
    performing information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain a fifth vector;
    computing a global average value of the fifth vector;
    normalizing the global average value to obtain a probability value;
    judging a matching result of the first sentence to be tested and the second sentence to be tested according to the probability value, wherein a twin network model includes the multi-layer encoder and the multi-layer inference module.
  12. The computer device according to claim 11, wherein the encoding of the first vector by the multi-layer encoder to obtain the third vector includes:
    computing the self-attention value of the first vector by the equation
    Attention(M, M) = softmax(M_Q M_K^T / √d_1) M_V,
    thereby obtaining the third vector, where M_Q is the query vector matrix of the first sentence to be tested, M_K is the key vector matrix of the first sentence to be tested, M_V is the value vector matrix of the first sentence to be tested, M is the first sentence to be tested, and d_1 is the dimension of the network layers of the multi-layer encoder.
  13. The computer device according to claim 12, wherein the encoding of the second vector by the multi-layer encoder to obtain the fourth vector includes:
    computing the self-attention value of the second vector by the equation
    Attention(N, N) = softmax(N_Q N_K^T / √d_1) N_V,
    thereby obtaining the fourth vector, where N_Q is the query vector matrix of the second sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, N is the second sentence to be tested, and d_1 is the dimension of the network layers of the multi-layer encoder.
  14. The computer device according to claim 13, wherein the performing of information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain the fifth vector includes:
    computing the attention value of the information interaction between the third vector and the fourth vector by the equation
    Attention(M, N) = softmax(M_Q N_K^T / √d_2) N_V,
    thereby obtaining the fifth vector,
    where M_Q is the query vector matrix of the first sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, M is the first sentence to be tested, N is the second sentence to be tested, and d_2 is the dimension of the network layers of the multi-layer inference module.
  15. The computer device according to claim 14, wherein the multi-layer inference module includes a multi-layer inference network, each layer of the inference network computes the attention value of the information interaction between the third vector and the fourth vector, and the computing of the global average value of the fifth vector includes:
    computing the sum of the attention values of the information interaction between the third vector and the fourth vector from the attention values computed by each layer of the inference network;
    averaging the sum of the attention values of the information interaction between the third vector and the fourth vector to obtain the global average value of the fifth vector.
  16. A computer-readable storage medium, wherein the storage medium stores a computer program which, when executed by a processor, performs the following steps:
    converting a first sentence to be tested and a second sentence to be tested into a first vector and a second vector respectively by a preset word vector training tool;
    encoding the first vector by the multi-layer encoder to obtain a third vector, and encoding the second vector by the multi-layer encoder to obtain a fourth vector;
    performing information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain a fifth vector;
    computing a global average value of the fifth vector;
    normalizing the global average value to obtain a probability value;
    judging a matching result of the first sentence to be tested and the second sentence to be tested according to the probability value, wherein a twin network model includes the multi-layer encoder and the multi-layer inference module.
  17. The computer-readable storage medium according to claim 16, wherein the encoding of the first vector by the multi-layer encoder to obtain the third vector includes:
    computing the self-attention value of the first vector by the equation
    Attention(M, M) = softmax(M_Q M_K^T / √d_1) M_V,
    thereby obtaining the third vector, where M_Q is the query vector matrix of the first sentence to be tested, M_K is the key vector matrix of the first sentence to be tested, M_V is the value vector matrix of the first sentence to be tested, M is the first sentence to be tested, and d_1 is the dimension of the network layers of the multi-layer encoder.
  18. The computer-readable storage medium according to claim 17, wherein the encoding of the second vector by the multi-layer encoder to obtain the fourth vector includes:
    computing the self-attention value of the second vector by the equation
    Attention(N, N) = softmax(N_Q N_K^T / √d_1) N_V,
    thereby obtaining the fourth vector, where N_Q is the query vector matrix of the second sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, N is the second sentence to be tested, and d_1 is the dimension of the network layers of the multi-layer encoder.
  19. The computer-readable storage medium according to claim 18, wherein the performing of information interaction processing on the third vector and the fourth vector by the multi-layer inference module to obtain the fifth vector includes:
    computing the attention value of the information interaction between the third vector and the fourth vector by the equation
    Attention(M, N) = softmax(M_Q N_K^T / √d_2) N_V,
    thereby obtaining the fifth vector,
    where M_Q is the query vector matrix of the first sentence to be tested, N_K is the key vector matrix of the second sentence to be tested, N_V is the value vector matrix of the second sentence to be tested, M is the first sentence to be tested, N is the second sentence to be tested, and d_2 is the dimension of the network layers of the multi-layer inference module.
  20. The computer-readable storage medium according to claim 19, wherein the multi-layer inference module includes a multi-layer inference network, each layer of the inference network computes the attention value of the information interaction between the third vector and the fourth vector, and the computing of the global average value of the fifth vector includes:
    computing the sum of the attention values of the information interaction between the third vector and the fourth vector from the attention values computed by each layer of the inference network;
    averaging the sum of the attention values of the information interaction between the third vector and the fourth vector to obtain the global average value of the fifth vector.
PCT/CN2021/097099 2020-12-16 2021-05-31 Similar sentence matching method and apparatus, computer device, and storage medium WO2022127041A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011483693.6A CN112507081B (zh) 2020-12-16 2020-12-16 Similar sentence matching method and apparatus, computer device, and storage medium
CN202011483693.6 2020-12-16

Publications (1)

Publication Number Publication Date
WO2022127041A1 true WO2022127041A1 (zh) 2022-06-23

Family

ID=74972433

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/097099 WO2022127041A1 (zh) 2020-12-16 2021-05-31 Similar sentence matching method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN112507081B (zh)
WO (1) WO2022127041A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112507081B (zh) * 2020-12-16 2023-05-23 平安科技(深圳)有限公司 相似句匹配方法、装置、计算机设备及存储介质

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110238409A1 (en) * 2010-03-26 2011-09-29 Jean-Marie Henri Daniel Larcheveque Semantic Clustering and Conversational Agents
CN110083690A (zh) * 2019-04-10 2019-08-02 华侨大学 Method and system for spoken-Chinese training for foreign learners based on intelligent question answering
CN110688491A (zh) * 2019-09-25 2020-01-14 暨南大学 Deep-learning-based machine reading comprehension method, system, device, and medium
CN110795535A (zh) * 2019-10-28 2020-02-14 桂林电子科技大学 Reading comprehension method using depthwise separable convolution residual blocks
CN111538838A (zh) * 2020-04-28 2020-08-14 中国科学技术大学 Article-based question generation method
CN112507081A (zh) * 2020-12-16 2021-03-16 平安科技(深圳)有限公司 Similar sentence matching method and apparatus, computer device, and storage medium

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9176949B2 (en) * 2011-07-06 2015-11-03 Altamira Technologies Corporation Systems and methods for sentence comparison and sentence-based search
KR102589638B1 (ko) * 2016-10-31 2023-10-16 삼성전자주식회사 Sentence generation apparatus and method
CN108509411B (zh) * 2017-10-10 2021-05-11 腾讯科技(深圳)有限公司 Semantic analysis method and apparatus
WO2019081776A1 (en) * 2017-10-27 2019-05-02 Babylon Partners Limited Computer implemented determination method and system
CN108304390B (zh) * 2017-12-15 2020-10-16 腾讯科技(深圳)有限公司 Translation-model-based training method, training apparatus, translation method, and storage medium
US10860630B2 (en) * 2018-05-31 2020-12-08 Applied Brain Research Inc. Methods and systems for generating and traversing discourse graphs using artificial neural networks
CN110895553A (zh) * 2018-08-23 2020-03-20 国信优易数据有限公司 Semantic matching model training method, semantic matching method, and answer acquisition method
CN110309282B (zh) * 2019-06-14 2021-08-27 北京奇艺世纪科技有限公司 Answer determination method and apparatus
CN111723547A (zh) * 2020-05-25 2020-09-29 河海大学 Automatic text summarization method based on a pre-trained language model
CN111611809B (zh) * 2020-05-26 2023-04-18 西藏大学 Neural-network-based Chinese sentence similarity calculation method
CN111783430A (zh) * 2020-08-04 2020-10-16 腾讯科技(深圳)有限公司 Method and apparatus for determining sentence-pair matching rate, computer device, and storage medium

Also Published As

Publication number Publication date
CN112507081A (zh) 2021-03-16
CN112507081B (zh) 2023-05-23

Similar Documents

Publication Publication Date Title
WO2021114840A1 (zh) Semantic-analysis-based scoring method and apparatus, terminal device, and storage medium
WO2022068195A1 (zh) Cross-modal data processing method and apparatus, storage medium, and electronic apparatus
US20220284321A1 (en) Visual-semantic representation learning via multi-modal contrastive training
WO2022134728A1 (zh) Image retrieval method, system, device, and medium
WO2020244475A1 (zh) Method and apparatus for language sequence labeling, storage medium, and computing device
CN113407660B (zh) Unstructured text event extraction method
WO2021174922A1 (zh) Sentence sentiment classification method and related device
WO2021208727A1 (zh) Artificial-intelligence-based text error detection method and apparatus, and computer device
WO2021120779A1 (zh) Human-machine-dialogue-based user portrait construction method, system, terminal, and storage medium
CN110377733B (zh) Text-based emotion recognition method, terminal device, and medium
WO2020192307A1 (zh) Deep-learning-based answer extraction method and apparatus, computer device, and storage medium
CN112800292A (zh) Cross-modal retrieval method based on modality-specific and shared feature learning
WO2021072863A1 (zh) Text similarity calculation method and apparatus, electronic device, and computer-readable storage medium
CN112183881A (zh) Social-network-based public opinion event prediction method, device, and storage medium
US20220230061A1 (en) Modality adaptive information retrieval
WO2022127041A1 (zh) Similar sentence matching method and apparatus, computer device, and storage medium
CN115146068B (zh) Relation triple extraction method and apparatus, device, and storage medium
WO2022228127A1 (zh) Element text processing method and apparatus, electronic device, and storage medium
WO2023134069A1 (zh) Entity relationship recognition method, device, and readable storage medium
WO2024109597A1 (zh) Training method for text merging judgment model and text merging judgment method
Gan et al. Chinese named entity recognition based on bert-transformer-bilstm-crf model
WO2021147404A1 (zh) Dependency relationship classification method and related device
WO2023116572A1 (zh) Word and sentence generation method and related device
WO2024045318A1 (zh) Natural language pre-training model training method, apparatus, device, and storage medium
CN115033683B (zh) Abstract generation method and apparatus, device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21904976

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21904976

Country of ref document: EP

Kind code of ref document: A1