WO2022116436A1 - Text semantic matching method and apparatus for long and short sentences, computer device and storage medium - Google Patents

Info

Publication number
WO2022116436A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
matched
target sample
semantic
length
Prior art date
Application number
PCT/CN2021/083780
Other languages
French (fr)
Chinese (zh)
Inventor
谢静文
阮晓雯
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022116436A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • G06F16/90344Query processing by using string matching techniques

Definitions

  • the present application relates to the technical field of semantic parsing, and in particular, to a method, apparatus, computer device and storage medium for semantic matching of long and short sentences.
  • Embodiments of the present application provide a method, an apparatus, a computer device, and a storage medium for semantic matching of long and short sentences, so as to solve the problem of low accuracy of semantic matching between long and short sentences.
  • a method for semantic matching of long and short sentences including:
  • when the character length to be matched is less than the target sample character length, recording the character length to be matched as the window length of a sliding window;
  • when the first semantic score exceeds a preset score threshold, recording the target sample sentence as a semantic matching sentence corresponding to the to-be-matched sentence.
  • a long-short sentence text semantic matching device comprising:
  • a sentence obtaining module configured to obtain a sentence to be matched and a target sample sentence, and compare the length of the character to be matched corresponding to the sentence to be matched with the length of the target sample character corresponding to the target sample sentence;
  • a window length recording module configured to record the length of the character to be matched as the window length of the sliding window when the length of the character to be matched is less than the length of the character of the target sample;
  • a first sentence matching module configured to slide the sliding window over the target sample sentence, and match the target sample field of the target sample sentence covered by the sliding window with the sentence to be matched to obtain a first semantic distance result;
  • the first semantic score determination module is used to determine the first semantic score between the sentence to be matched and the target sample sentence according to the first semantic distance result corresponding to the sentence to be matched and the target sample sentence;
  • a matching sentence determination module configured to record the target sample sentence as a semantic matching sentence corresponding to the to-be-matched sentence when the first semantic score exceeds a preset score threshold.
  • a computer device comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
  • when the character length to be matched is less than the target sample character length, recording the character length to be matched as the window length of a sliding window;
  • when the first semantic score exceeds a preset score threshold, recording the target sample sentence as a semantic matching sentence corresponding to the to-be-matched sentence.
  • One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • when the character length to be matched is less than the target sample character length, recording the character length to be matched as the window length of a sliding window;
  • when the first semantic score exceeds a preset score threshold, recording the target sample sentence as a semantic matching sentence corresponding to the to-be-matched sentence.
  • In the method, the sentence to be matched and the target sample sentence are obtained, and the character length of the sentence to be matched is compared with the character length of the target sample sentence. When the character length to be matched is less than the target sample character length, the character length to be matched is recorded as the window length of the sliding window. The sliding window is slid over the target sample sentence, and the target sample field covered by the sliding window is matched with the sentence to be matched to obtain a first semantic distance result. The first semantic score between the sentence to be matched and the target sample sentence is determined according to that result. When the first semantic score exceeds a preset score threshold, the target sample sentence is recorded as the semantic matching sentence corresponding to the sentence to be matched.
  • The present application defines a sliding window indicator: the target sample field of the target sample sentence covered by the sliding window is matched with the sentence to be matched to obtain a first word sense distance result, and the first semantic score between the sentence to be matched and the target sample sentence is then determined from that result. This determines whether part of the semantic information in the target sample sentence matches the sentence to be matched, so that a target sample sentence that would otherwise not be recalled (when the semantic similarity between the target sample sentence and the sentence to be matched is less than the preset similarity threshold, the target sample sentence is directly determined not to match the sentence to be matched) has a chance of being recalled.
  • This provides more sample data for the target scene and improves the semantic matching accuracy between short sentences and long sentences.
  • FIG. 1 is a schematic diagram of an application environment of a method for semantic matching of long and short sentences in an embodiment of the present application
  • FIG. 2 is a flowchart of a method for semantic matching of long and short sentences in an embodiment of the present application
  • FIG. 3 is a flowchart of step S30 in the method for semantic matching of long and short sentences in an embodiment of the present application
  • FIG. 4 is a flowchart of step S40 in the method for semantic matching of long and short sentences in an embodiment of the present application
  • FIG. 5 is another flowchart of a method for semantic matching of long and short sentences in an embodiment of the present application
  • FIG. 6 is a schematic block diagram of a device for semantic matching of long and short sentences in an embodiment of the present application
  • FIG. 7 is a schematic block diagram of a first sentence matching module in a device for semantic matching of long and short sentences in an embodiment of the present application
  • FIG. 8 is a schematic block diagram of a first semantic score determination module in a long-short sentence text semantic matching device according to an embodiment of the present application
  • FIG. 9 is a schematic diagram of a computer device in an embodiment of the present application.
  • the method for semantic matching of long-short sentence text can be applied in the application environment shown in FIG. 1 .
  • the long-short sentence text semantic matching method is applied in a long-short sentence text semantic matching system, which includes a client and a server as shown in FIG. 1 and is used to solve the problem of low accuracy of semantic matching between long and short sentences.
  • The client, also known as the user side, refers to the program that corresponds to the server and provides local services for the user.
  • Clients can be installed on, but not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices.
  • the server can be implemented as an independent server or a server cluster composed of multiple servers.
  • a method for semantic matching of long and short sentences is provided, and the method is applied to the server in FIG. 1 as an example for description, including the following steps:
  • S10 Obtain the sentence to be matched and the target sample sentence, and compare the length of the character to be matched corresponding to the sentence to be matched with the length of the target sample character corresponding to the target sample sentence.
  • the sentences to be matched may be sentences in different application scenarios.
  • the sentences to be matched may be sentences for robot reply.
  • the target sample sentences can also be sentences in different application scenarios.
  • the target sample sentences and the sentences to be matched are sentences in the same application scenario.
  • the length of the characters to be matched refers to the number of characters in the sentence to be matched; the length of the target sample character refers to the number of characters in the target sample sentence.
  • In an embodiment, before step S10, the method further includes:
  • S01 Obtain a sentence to be matched and a target sample text; the target sample text includes multiple sentences.
  • The target sample text is the text to be checked for a sentence that semantically matches the sentence to be matched, and it contains multiple sentences. The target sample text is segmented at periods, that is, each span that ends with a period is split off as one sentence (a complete sentence usually carries one independent piece of semantic information). The sentence to be matched is generally a single sentence; if it contains multiple periods, it can be split in the same way.
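  • The period-based segmentation described above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `split_sentences` is hypothetical, and treating both the ASCII period and the CJK full stop as sentence terminators is an assumption.

```python
import re


def split_sentences(text):
    """Split text into sentences, each ending at a period ('.' or '。').

    Assumption: only these two characters end a sentence. A decimal such as
    "3.5" would therefore be split as well, which a real system must handle.
    """
    # Each match is a run of non-period characters plus an optional period.
    pieces = re.findall(r"[^.。]+[.。]?", text)
    return [piece.strip() for piece in pieces if piece.strip()]


print(split_sentences("今天下雨。我带了伞。"))
```

  • A trailing fragment without a period is kept as its own sentence, so no text is dropped.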
  • S02 Input the to-be-matched sentence and each of the sentences into a preset similarity recognition model, and determine the semantic similarity between the to-be-matched sentence and each of the sentences.
  • the preset similarity recognition model may be a model pre-trained by methods such as machine learning, and the preset similarity recognition model is used to determine the semantic similarity between two sentences.
  • S03 Determine the highest semantic similarity among the semantic similarities corresponding to the sentences to be matched and the sentences.
  • the preset similarity threshold may be set according to the requirements of the actual application scenario.
  • the preset similarity threshold may be set to 0.9, 0.95, or the like.
  • the preset similarity difference value can be any value from 0.1 to 0.5.
  • After the semantic similarity between the to-be-matched sentence and each of the sentences is determined, the sentence that best matches the sentence to be matched, that is, the one with the highest semantic similarity, needs to be selected. The highest semantic similarity is therefore determined among the semantic similarities between the sentence to be matched and each sentence, and compared with the preset similarity threshold. When the highest semantic similarity is less than the preset similarity threshold, the difference between the highest semantic similarity and the preset similarity threshold is determined and compared with the preset similarity difference. When the difference is smaller than the preset similarity difference, the sentence corresponding to the highest semantic similarity is recorded as the target sample sentence.
  • Ordinarily, when the preset similarity recognition model determines that the semantic similarity between two sentences is less than the preset similarity threshold, the two sentences are judged not to be similar. Here, for a sentence whose semantic similarity is below the preset similarity threshold, it is determined whether the difference between that semantic similarity and the preset similarity threshold is less than the preset similarity difference; if so, a further semantic similarity judgment is performed through steps S10-S40.
  • the method further includes:
  • when the highest semantic similarity is greater than or equal to the preset similarity threshold, the sentence corresponding to the highest semantic similarity is recorded as a semantic matching sentence corresponding to the to-be-matched sentence.
  • When the highest semantic similarity is greater than or equal to the preset similarity threshold, the sentence corresponding to the highest semantic similarity semantically matches the sentence to be matched, so it is directly recorded as the semantic matching sentence corresponding to the sentence to be matched.
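  • The three outcomes of the pre-screening described above (direct match, near miss that becomes the target sample sentence, and no match) can be sketched as below. The function name, the status strings and the dictionary input are hypothetical; the default values 0.9 and 0.1 are taken from the example ranges given for the preset similarity threshold and the preset similarity difference. The similarity values themselves are assumed to come from a separate similarity recognition model, which is not shown.

```python
def select_candidate(similarities, threshold=0.9, max_gap=0.1):
    """Pick the best-scoring sentence and classify it.

    similarities: dict mapping each candidate sentence to its semantic
    similarity with the sentence to be matched (assumed precomputed).
    Returns a (status, sentence) tuple.
    """
    if not similarities:
        return ("no_match", None)
    best = max(similarities, key=similarities.get)
    score = similarities[best]
    if score >= threshold:
        # At or above the threshold: recorded directly as the semantic match.
        return ("semantic_match", best)
    if threshold - score < max_gap:
        # Slightly below the threshold: re-checked with the sliding window.
        return ("target_sample", best)
    return ("no_match", None)


print(select_candidate({"sentence A": 0.85, "sentence B": 0.40}))
```

  • Only the `target_sample` outcome proceeds to steps S10-S40; the other two outcomes are final.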
  • In an embodiment, before step S10, that is, before the character length to be matched corresponding to the sentence to be matched and the target sample character length corresponding to the target sample sentence are obtained, the method includes:
  • the preset text recognition model may be a word2vec or bert model trained based on a large number of training samples, and the preset text recognition model is used to perform word vector conversion on sentences.
  • After the sentence to be matched and the target sample sentence are obtained, the preset text recognition model is obtained. The sentence to be matched is input into the preset text recognition model and word embedding processing is performed on it, that is, the sentence is segmented into words and each word is converted into a word vector, which yields the to-be-matched word vectors corresponding to the sentence to be matched. In the same way, the target sample sentence is input into the preset text recognition model and word embedding processing is performed on it to obtain the target sample word vectors corresponding to the target sample sentence.
  • After the to-be-matched word vectors and the target sample word vectors are obtained, the character length to be matched of the to-be-matched sentence is determined according to the number of to-be-matched word vectors, and the target sample character length of the target sample sentence is determined according to the number of target sample word vectors.
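  • The idea of deriving a length from the number of vectors can be illustrated with a toy stand-in for the text recognition model. Everything here is an assumption for illustration: `toy_embed` is a hypothetical hash-based function, not word2vec or BERT, and it produces one vector per character, whereas a real model may segment the sentence differently.

```python
import hashlib


def toy_embed(sentence, dim=4):
    """Hypothetical embedding: one deterministic vector per character."""
    vectors = []
    for ch in sentence:
        digest = hashlib.sha256(ch.encode("utf-8")).digest()
        # Use the first `dim` bytes of the hash as a fake embedding.
        vectors.append([byte / 255.0 for byte in digest[:dim]])
    return vectors


def character_length(vectors):
    """The length is simply the number of vectors produced for the sentence."""
    return len(vectors)


to_match_len = character_length(toy_embed("天气不错"))
target_len = character_length(toy_embed("今天天气不错，适合出门"))
print(to_match_len, target_len)
```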
  • the length of the character to be matched is recorded as the window length of the sliding window.
  • the target sample field refers to the character segment covered by the sliding window in the target sample sentence.
  • the word sense distance result represents whether there is key sense information between the sentence to be matched and the target sample sentence.
  • In an embodiment, step S30 includes:
  • S301 Align the start characters of the target sample sentence with the start characters of the to-be-matched sentence, and record the first target sample field covered by the sliding window as the first intercepted sentence.
  • the starting character refers to the character at the starting position (ie, the first position) in the sentence.
  • the length of the first intercepted sentence is equal to the length of the window and also equal to the length of the characters to be matched.
  • S302 Perform semantic matching on the first intercepted sentence and the to-be-matched sentence to obtain a semantic result of the to-be-matched sentence and the first intercepted sentence.
  • a word sense result can be regarded as a word sense distance value, that is, it represents the word sense distance between each intercepted sentence and the sentence to be matched.
  • The first intercepted sentence and the to-be-matched sentence are semantically matched to obtain a semantic result of the to-be-matched sentence and the first intercepted sentence, where the semantic result represents whether the first intercepted sentence and the to-be-matched sentence are semantically similar.
  • The semantic matching here is judged on the basis of sentence structure information, which represents whether the character composition of the first intercepted sentence and the sentence to be matched is similar, or whether their structure is similar (for example, a subject-predicate-object structure). Sentence structure information can serve as a supplement to semantic information in representing the relationship between the first intercepted sentence and the to-be-matched sentence.
  • S304 Perform semantic matching on the second intercepted sentence and the to-be-matched sentence to obtain a semantic result of the to-be-matched sentence and the second intercepted sentence.
  • The second intercepted sentence and the to-be-matched sentence are semantically matched to obtain a semantic result of the to-be-matched sentence and the second intercepted sentence, where the semantic result represents whether the second intercepted sentence and the to-be-matched sentence are semantically similar.
  • the end character refers to the last character in the sentence.
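  • The window sliding in step S30 can be sketched as below. The sketch assumes a window that moves one character at a time from the start alignment until its end character reaches the end character of the longer sentence. The per-window scorer is a placeholder: `difflib.SequenceMatcher` from the Python standard library measures character overlap only and returns a similarity rather than a distance, so it stands in for, and is not, the semantic matching described in the text. The function names are hypothetical.

```python
from difflib import SequenceMatcher


def char_similarity(a, b):
    """Placeholder scorer: character-overlap ratio in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def slide_and_score(short, long_, scorer=char_similarity):
    """Score the short sentence against every window of the long sentence.

    The window length equals len(short). Returns one score per window
    position, in left-to-right order.
    """
    window = len(short)
    if window == 0 or window > len(long_):
        raise ValueError("short must be non-empty and no longer than long_")
    results = []
    for start in range(len(long_) - window + 1):
        segment = long_[start:start + window]  # the intercepted sentence
        results.append(scorer(short, segment))
    return results


print(slide_and_score("abc", "xxabcxx"))
```

  • A sentence of length n slid over a sentence of length m yields m - n + 1 results, which together play the role of the first word sense distance result.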
  • S40 Determine a first semantic score between the sentence to be matched and the target sample sentence according to the result of the first semantic distance corresponding to the sentence to be matched and the target sample sentence.
  • the first semantic score indicates the semantic similarity between the target sample sentence and the sentence to be matched. A higher first semantic score indicates that the target sample sentence contains more key semantic information matching the sentence to be matched.
  • In an embodiment, step S40, that is, determining the first semantic score between the sentence to be matched and the target sample sentence according to the first word sense distance result corresponding to the sentence to be matched and the target sample sentence, includes:
  • S401 Perform derivation processing on a first word sense distance result corresponding to the sentence to be matched and the target sample sentence, to obtain a word meaning curve corresponding to the first word meaning distance result.
  • the first word sense distance result can be regarded as a continuous density sequence formed by integrating multiple word sense results, and a word sense curve corresponding to the first word sense distance result can be obtained by derivation of the first word sense distance result.
  • S402 Determine whether there is a word-meaning peak in the word-meaning curve through a peak-seeking identification algorithm.
  • the peak-seeking identification algorithm is used to find out whether a word-meaning peak appears in the word-meaning curve, and the word-meaning peak is used to represent the first semantic score between the sentence to be matched and the target sample sentence corresponding to the word-meaning curve.
  • the peak-seeking identification algorithm can perform a global search in the word meaning curve. During the global search process, if the word meaning curve has a point where the curve first rises and then falls, it is a word meaning peak.
  • The peak-seeking identification algorithm is used to determine whether a word-meaning peak exists in the word meaning curve. If a word-meaning peak exists, the first semantic score between the sentence to be matched and the target sample sentence corresponding to the word meaning curve can be determined according to that peak: for example, the larger the peak value of the word-meaning peak, the higher the first semantic score; or the larger the area occupied by the word-meaning peak, the higher the first semantic score.
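  • A possible reading of the peak search and scoring is sketched below. Several choices are assumptions rather than details from the text: the sketch works directly on a sequence of per-window similarity values in [0, 1] and does not reproduce the derivation step; the sequence is padded with zeros so that a best window at either end can still form a peak; the score is the tallest peak scaled to 0-100, in line with the example thresholds of 90 or 95; and a score of 0 is returned when no peak exists. Flat-topped plateaus are not detected by this simple test.

```python
def find_peaks(values):
    """Return (index, height) for each point that rises and then falls."""
    padded = [0.0] + list(values) + [0.0]  # assumption: zero padding at both ends
    peaks = []
    for i in range(1, len(padded) - 1):
        if padded[i - 1] < padded[i] > padded[i + 1]:
            peaks.append((i - 1, padded[i]))
    return peaks


def first_semantic_score(values):
    """Map the tallest peak to a 0-100 score; 0 when there is no peak."""
    peaks = find_peaks(values)
    if not peaks:
        return 0.0
    return 100.0 * max(height for _, height in peaks)


print(find_peaks([0.1, 0.3, 1.0, 0.3, 0.1]))
print(first_semantic_score([0.1, 0.3, 1.0, 0.3, 0.1]))
```

  • Scoring by peak area instead of peak height, the other option mentioned in the text, would replace the `max` with a sum over the values around each peak.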
  • the preset score threshold may be determined according to different application scenarios, and for example, the preset score threshold may be a value such as 90 or 95.
  • In this embodiment, a sliding window indicator is defined: the target sample field of the target sample sentence covered by the sliding window is matched with the to-be-matched sentence to obtain a first word sense distance result, and the first semantic score between the sentence to be matched and the target sample sentence is then determined from that result. This determines whether part of the semantic information in the target sample sentence matches the sentence to be matched, so that a target sample sentence that would otherwise not be recalled (when the semantic similarity between the target sample sentence and the sentence to be matched is less than the preset similarity threshold, the target sample sentence is directly determined not to match the sentence to be matched) has a chance of being recalled.
  • In an embodiment, after step S10, the method further includes:
  • when the character length to be matched is greater than or equal to the target sample character length, recording the target sample character length as the window length of the sliding window.
  • The starting character of the target sample sentence is aligned with the starting character of the sentence to be matched, and the first to-be-matched field covered by the sliding window (that is, the field in the sentence to be matched made up of as many characters as the window length) is recorded as the third intercepted sentence.
  • the second semantic score indicates the semantic similarity between the target sample sentence and the sentence to be matched.
  • a higher second semantic score indicates that the target sample sentence contains more key semantic information matching the sentence to be matched.
  • The second semantic score is compared with the preset score threshold. When the second semantic score exceeds the preset score threshold, the target sample sentence is recorded as the semantic matching sentence corresponding to the sentence to be matched; when the second semantic score does not exceed the preset score threshold, the target sample sentence does not semantically match the sentence to be matched.
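  • The two branches (window taken from the sentence to be matched, or from the target sample sentence) can be combined in one sketch: the shorter sentence sets the window length and the window slides over the longer one. As simplifying assumptions, the scorer is the character-overlap ratio from `difflib.SequenceMatcher` rather than a semantic model, the score is the best window similarity scaled to 0-100 rather than a peak-based value, and the names `window_scores` and `is_semantic_match` are hypothetical.

```python
from difflib import SequenceMatcher


def window_scores(short, long_):
    """Similarity of `short` to each len(short)-character window of `long_`."""
    n = len(short)
    return [SequenceMatcher(None, short, long_[i:i + n]).ratio()
            for i in range(len(long_) - n + 1)]


def is_semantic_match(to_match, target, score_threshold=90.0):
    """Return True when the best window score exceeds the preset threshold."""
    if not to_match or not target:
        return False
    if len(to_match) < len(target):
        # First branch: the window length is the to-be-matched length.
        scores = window_scores(to_match, target)
    else:
        # Second branch: the window length is the target sample length.
        scores = window_scores(target, to_match)
    return 100.0 * max(scores) > score_threshold


print(is_semantic_match("abc", "xxabcxx"))
print(is_semantic_match("xxabcxx", "abc"))
```

  • Both calls take different branches but examine the same windows, so they agree on the outcome.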
  • a long-short sentence text semantic matching device is provided, and the long and short sentence text semantic matching device is in one-to-one correspondence with the long-short sentence text semantic matching method in the above embodiment.
  • the apparatus for semantic matching of long and short sentences includes a sentence acquisition module 10 , a first window length recording module 20 , a first sentence matching module 30 , a first semantic score determination module 40 and a first matched sentence determination module 50 .
  • the detailed description of each functional module is as follows:
  • the sentence obtaining module 10 is configured to obtain a sentence to be matched and a target sample sentence, and compare the length of the character to be matched corresponding to the sentence to be matched with the length of the target sample character corresponding to the target sample sentence;
  • the first window length recording module 20 is configured to record the length of the character to be matched as the window length of the sliding window when the length of the character to be matched is less than the length of the character of the target sample;
  • the first sentence matching module 30 is configured to slide the sliding window over the target sample sentence, and match the target sample field of the target sample sentence covered by the sliding window with the sentence to be matched to obtain a first semantic distance result;
  • the first semantic score determination module 40 is configured to determine the first semantic score between the sentence to be matched and the target sample sentence according to the first semantic distance result corresponding to the sentence to be matched and the target sample sentence;
  • the first matching sentence determination module 50 is configured to record the target sample sentence as a semantic matching sentence corresponding to the to-be-matched sentence when the first semantic score exceeds a preset score threshold.
  • the long-short sentence text semantic matching device further comprises:
  • a sample text obtaining module used to obtain a sentence to be matched and a target sample text;
  • the target sample text contains a plurality of sentences;
  • a semantic similarity determination module configured to input the to-be-matched sentence and each of the sentences into a preset similarity recognition model to determine the semantic similarity between the to-be-matched sentence and each of the sentences;
  • a highest similarity determination module used for determining the highest semantic similarity among the semantic similarities corresponding to the sentences to be matched and the sentences;
  • a target sample sentence determination module configured to record the sentence corresponding to the highest semantic similarity as the target sample sentence when the highest semantic similarity is smaller than the preset similarity threshold and the difference between the highest semantic similarity and the preset similarity threshold is smaller than the preset similarity difference.
  • the long-short sentence text semantic matching device further comprises:
  • the semantic matching sentence recording module is configured to record the sentence corresponding to the highest semantic similarity as the semantic matching sentence corresponding to the to-be-matched sentence when the highest semantic similarity is greater than or equal to a preset similarity threshold.
  • the long-short sentence text semantic matching device further comprises:
  • the text recognition model acquisition module is used to acquire the preset text recognition model
  • a word vector determination module configured to input the sentence to be matched into the preset text recognition model to obtain a word vector to be matched corresponding to the sentence to be matched; at the same time, input the target sample sentence into the In the preset text recognition model, the target sample word vector corresponding to the target sample sentence is obtained;
  • a character length determination module configured to determine the length of the characters to be matched in the sentence to be matched according to each of the word vectors to be matched; at the same time, determine the length of the target sample characters of the target sample sentence according to each of the target sample word vectors.
  • the first sentence matching module 30 includes the following units:
  • Character alignment unit 301 for aligning the starting character of the target sample sentence with the starting character of the sentence to be matched, and recording the first target sample field covered by the sliding window as the first intercepted sentence;
  • a first semantic matching unit 302 configured to perform semantic matching on the first intercepted sentence and the to-be-matched sentence, to obtain a semantic result of the to-be-matched sentence and the first intercepted sentence;
  • Window sliding unit 303 for sliding the sliding window to the right by one character length on the target sample sentence, and recording the second target sample field covered by the sliding window as the second interception sentence;
  • the second semantic matching unit 304 is configured to perform semantic matching between the second intercepted sentence and the to-be-matched sentence to obtain a semantic result of the to-be-matched sentence and the second intercepted sentence;
  • the lexical distance result recording unit 305 is configured to record all semantic results as the first lexical distance result when it is detected that the end character of the sliding window has been aligned with the end character of the target sample sentence.
  • the first semantic score determination module 40 includes:
  • a word meaning curve determination unit 401 configured to perform derivation processing on the first word meaning distance result corresponding to the sentence to be matched and the target sample sentence, to obtain a word meaning curve corresponding to the first word meaning distance result;
  • a word meaning peak determining unit 402 configured to determine whether there is a word meaning peak in the word meaning curve through a peak-seeking identification algorithm
  • a first semantic score determination unit 403, configured to determine, according to the word meaning peak, a first semantic score between the sentence to be matched and the target sample sentence corresponding to the word meaning curve when there is a word meaning peak in the word meaning curve;
  • the second semantic score determining unit 404 is configured to determine that the first semantic score between the sentence to be matched and the target sample sentence corresponding to the semantic curve is 0 when the word meaning curve does not have a word meaning peak.
  • the long-short sentence text semantic matching device further comprises:
  • a second window length recording module configured to record the length of the target sample character as the window length of the sliding window when the length of the character to be matched is greater than or equal to the length of the target sample character;
  • a second sentence matching module configured to slide the sliding window over the sentence to be matched, and match the to-be-matched field of the sentence to be matched covered by the sliding window with the target sample sentence to obtain a second semantic distance result;
  • a second semantic score determination module configured to determine a second semantic score between the to-be-matched sentence and the target sample sentence according to the second semantic distance result corresponding to the to-be-matched sentence and the target sample sentence;
  • the second matching sentence determination module is configured to record the target sample sentence as a semantic matching sentence corresponding to the to-be-matched sentence when the second semantic score exceeds the preset score threshold.
  • Each module in the above-mentioned apparatus for semantic matching of long and short sentences can be implemented in whole or in part by software, hardware and combinations thereof.
  • the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided, and the computer device may be a server, and its internal structure diagram may be as shown in FIG. 9 .
  • the computer device includes a processor, a memory, a network interface, and a database connected by a system bus. The processor of the computer device provides computing and control capabilities.
  • the memory of the computer device includes a readable storage medium and an internal memory.
  • the readable storage medium stores an operating system, computer readable instructions and a database.
  • the internal memory provides an environment for the execution of the operating system and computer-readable instructions in the readable storage medium.
  • the database of the computer device is used to store the data used in the text semantic matching method of long and short sentences in the above embodiment.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer readable instructions when executed by a processor, implement a method for semantic matching of long and short sentences.
  • the readable storage medium provided by this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
  • a computer apparatus comprising a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, wherein the following steps are implemented when the processor executes the computer readable instructions:
  • when the length of the characters to be matched is less than the target sample character length, the length of the characters to be matched is recorded as the window length of the sliding window;
  • the target sample sentence is recorded as a semantic matching sentence corresponding to the to-be-matched sentence.
  • one or more readable storage media having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • when the length of the characters to be matched is less than the target sample character length, the length of the characters to be matched is recorded as the window length of the sliding window;
  • the target sample sentence is recorded as a semantic matching sentence corresponding to the to-be-matched sentence.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.
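
The peak-based scoring attributed above to units 401 to 404 can be illustrated with a minimal sketch. It rests on several assumptions that the text does not confirm: "derivation processing" is read as taking discrete first differences of the distance results, a peak is a strict interior local maximum, and the score is simply the height of the highest peak. All function names are hypothetical.

```python
from typing import List, Optional


def first_differences(values: List[float]) -> List[float]:
    """Discrete derivative: difference between consecutive distance results (assumed reading)."""
    return [b - a for a, b in zip(values, values[1:])]


def highest_peak(curve: List[float]) -> Optional[float]:
    """Return the highest strict interior local maximum of the curve, or None if there is none."""
    peaks = [
        curve[i]
        for i in range(1, len(curve) - 1)
        if curve[i - 1] < curve[i] > curve[i + 1]
    ]
    return max(peaks) if peaks else None


def first_semantic_score(distances: List[float]) -> float:
    """Score is the peak height when a peak exists (assumed mapping), otherwise 0."""
    curve = first_differences(distances)
    peak = highest_peak(curve)
    return peak if peak is not None else 0.0
```

Only the zero score for a curve without a peak comes from the text; how a peak is converted into a score is a placeholder here.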

Abstract

A text semantic matching method and apparatus for long and short sentences, a computer device and a storage medium. The method comprises: obtaining a sentence to be matched and a target sample sentence, and comparing the length of characters to be matched corresponding to the sentence with the length of target sample characters corresponding to the target sample sentence (S10); when the length of the characters is less than the length of the target sample characters, recording the length of the characters as the window length of a sliding window (S20); sliding the sliding window on the target sample sentence, matching a target sample field of the target sample sentence covered by the sliding window with the sentence to obtain a first semantic distance result (S30); determining a first semantic score between the sentence and the target sample sentence according to the first semantic distance result corresponding to the sentence and the target sample sentence (S40); when the first semantic score exceeds a preset score threshold, recording the target sample sentence as a semantic matching sentence corresponding to the sentence (S50). The method improves the accuracy of semantic matching between long and short sentences.

Description

Text semantic matching method and apparatus for long and short sentences, computer device and storage medium
This application claims priority to Chinese patent application No. 202011382663.6, filed with the China Patent Office on December 1, 2020 and entitled "Long and Short Sentence Text Semantic Matching Method, Device, Computer Equipment and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of semantic parsing, and in particular, to a method, apparatus, computer device and storage medium for semantic matching of long and short sentences.
Background
With the development of science and technology, the field of natural language processing has also developed steadily, and natural language processing techniques are now widely used in scenarios such as similar-sentence matching and similar-expression recall.
The inventors realized that, at present, in scenarios such as similar-sentence matching and similar-expression recall, text matching is often performed with an end-to-end deep learning model or with unsupervised semantic matching, which directly outputs the semantic similarity between two sentences for comparison; short sentences are matched with end-to-end models or character matching. However, to match a text against a short sentence, the prior art often has to split the text into fields with the same number of characters as the short sentence before matching the short sentence against each field. Moreover, for matching between a text and a short sentence, an end-to-end model often cannot accurately cover all of the semantic information, while judging by character similarity easily causes misjudgment, leading to poor semantic matching results.
Summary
Embodiments of the present application provide a method, an apparatus, a computer device, and a storage medium for semantic matching of long and short sentences, so as to solve the problem of low accuracy of semantic matching between long and short sentences.
A text semantic matching method for long and short sentences, comprising:
obtaining a sentence to be matched and a target sample sentence, and comparing the to-be-matched character length of the sentence to be matched with the target sample character length of the target sample sentence;
when the to-be-matched character length is less than the target sample character length, recording the to-be-matched character length as the window length of a sliding window;
sliding the sliding window over the target sample sentence, and matching the target sample field of the target sample sentence covered by the sliding window against the sentence to be matched, to obtain a first word meaning distance result;
determining a first semantic score between the sentence to be matched and the target sample sentence according to the first word meaning distance result corresponding to the sentence to be matched and the target sample sentence;
when the first semantic score exceeds a preset score threshold, recording the target sample sentence as the semantic matching sentence corresponding to the sentence to be matched.
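
The five steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the distance between a window field and the sentence to be matched is a placeholder (the share of differing character positions), the score is taken as one minus the smallest distance, and the 0.8 threshold and all function names are hypothetical rather than taken from the application.

```python
from typing import List


def window_fields(target: str, window_length: int) -> List[str]:
    """All fields of the target sample sentence covered by a window of the given length."""
    return [target[i:i + window_length] for i in range(len(target) - window_length + 1)]


def char_distance(a: str, b: str) -> float:
    """Placeholder distance for equal-length strings: share of positions that differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def is_semantic_match(query: str, target: str, score_threshold: float = 0.8) -> bool:
    # Step 1: compare the two character lengths; this sketch covers only the shorter-query case.
    if not query or len(query) >= len(target):
        return False
    # Step 2: the window length is the length of the sentence to be matched.
    window_length = len(query)
    # Step 3: slide the window over the target and collect one distance per field.
    distances = [char_distance(query, field) for field in window_fields(target, window_length)]
    # Step 4: placeholder score derived from the distance results.
    score = 1.0 - min(distances)
    # Step 5: record a match only when the score exceeds the threshold.
    return score > score_threshold
```

For example, `is_semantic_match("cat", "the cat sat")` finds a window that equals the query exactly, whereas `is_semantic_match("dog", "the cat sat")` finds no window sharing any aligned character.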
A text semantic matching apparatus for long and short sentences, comprising:
a sentence obtaining module, configured to obtain a sentence to be matched and a target sample sentence, and compare the to-be-matched character length of the sentence to be matched with the target sample character length of the target sample sentence;
a window length recording module, configured to record the to-be-matched character length as the window length of a sliding window when the to-be-matched character length is less than the target sample character length;
a first sentence matching module, configured to slide the sliding window over the target sample sentence, and match the target sample field of the target sample sentence covered by the sliding window against the sentence to be matched, to obtain a first word meaning distance result;
a first semantic score determination module, configured to determine a first semantic score between the sentence to be matched and the target sample sentence according to the first word meaning distance result corresponding to the sentence to be matched and the target sample sentence;
a matching sentence determination module, configured to record the target sample sentence as the semantic matching sentence corresponding to the sentence to be matched when the first semantic score exceeds a preset score threshold.
A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
obtaining a sentence to be matched and a target sample sentence, and comparing the to-be-matched character length of the sentence to be matched with the target sample character length of the target sample sentence;
when the to-be-matched character length is less than the target sample character length, recording the to-be-matched character length as the window length of a sliding window;
sliding the sliding window over the target sample sentence, and matching the target sample field of the target sample sentence covered by the sliding window against the sentence to be matched, to obtain a first word meaning distance result;
determining a first semantic score between the sentence to be matched and the target sample sentence according to the first word meaning distance result corresponding to the sentence to be matched and the target sample sentence;
when the first semantic score exceeds a preset score threshold, recording the target sample sentence as the semantic matching sentence corresponding to the sentence to be matched.
One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
obtaining a sentence to be matched and a target sample sentence, and comparing the to-be-matched character length of the sentence to be matched with the target sample character length of the target sample sentence;
when the to-be-matched character length is less than the target sample character length, recording the to-be-matched character length as the window length of a sliding window;
sliding the sliding window over the target sample sentence, and matching the target sample field of the target sample sentence covered by the sliding window against the sentence to be matched, to obtain a first word meaning distance result;
determining a first semantic score between the sentence to be matched and the target sample sentence according to the first word meaning distance result corresponding to the sentence to be matched and the target sample sentence;
when the first semantic score exceeds a preset score threshold, recording the target sample sentence as the semantic matching sentence corresponding to the sentence to be matched.
In the above text semantic matching method and apparatus for long and short sentences, computer device and storage medium, the method obtains a sentence to be matched and a target sample sentence, and compares the to-be-matched character length of the sentence to be matched with the target sample character length of the target sample sentence; when the to-be-matched character length is less than the target sample character length, records the to-be-matched character length as the window length of a sliding window; slides the sliding window over the target sample sentence, and matches the target sample field of the target sample sentence covered by the sliding window against the sentence to be matched, to obtain a first word meaning distance result; determines a first semantic score between the sentence to be matched and the target sample sentence according to that first word meaning distance result; and, when the first semantic score exceeds a preset score threshold, records the target sample sentence as the semantic matching sentence corresponding to the sentence to be matched.
By defining a sliding window, the present application matches the target sample field of the target sample sentence covered by the sliding window against the sentence to be matched to obtain a first word meaning distance result, and then determines from that result the first semantic score between the two sentences, so as to determine whether part of the semantic information in the target sample sentence matches the sentence to be matched. A target sample sentence that would otherwise not be recalled (when the semantic similarity between the target sample sentence and the sentence to be matched is below the preset similarity threshold, the two would be directly judged as not matching) can thus still be recalled. The present application can therefore provide more sample data for target scenarios that lack samples, while also improving the accuracy of semantic matching between short and long sentences.
Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below; other features and advantages of the present application will become apparent from the description, the drawings, and the claims.
Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application; those of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic diagram of an application environment of the text semantic matching method for long and short sentences in an embodiment of the present application;
FIG. 2 is a flowchart of the text semantic matching method for long and short sentences in an embodiment of the present application;
FIG. 3 is a flowchart of step S30 of the text semantic matching method for long and short sentences in an embodiment of the present application;
FIG. 4 is a flowchart of step S40 of the text semantic matching method for long and short sentences in an embodiment of the present application;
FIG. 5 is another flowchart of the text semantic matching method for long and short sentences in an embodiment of the present application;
FIG. 6 is a schematic block diagram of the text semantic matching apparatus for long and short sentences in an embodiment of the present application;
FIG. 7 is a schematic block diagram of the first sentence matching module of the text semantic matching apparatus for long and short sentences in an embodiment of the present application;
FIG. 8 is a schematic block diagram of the first semantic score determination module of the text semantic matching apparatus for long and short sentences in an embodiment of the present application;
FIG. 9 is a schematic diagram of a computer device in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.
The text semantic matching method for long and short sentences provided by the embodiments of the present application can be applied in the application environment shown in FIG. 1. Specifically, the method is applied in a text semantic matching system for long and short sentences that includes a client and a server as shown in FIG. 1; the client and the server communicate over a network, and the system is used to solve the problem of low accuracy of semantic matching between long and short sentences. The client, also called the user terminal, is a program that corresponds to the server and provides local services to the user. The client can be installed on, but is not limited to, personal computers, laptops, smartphones, tablets, and portable wearable devices. The server can be implemented as a standalone server or as a cluster of multiple servers.
In one embodiment, as shown in FIG. 2, a text semantic matching method for long and short sentences is provided. Taking the method as applied to the server in FIG. 1 as an example, it includes the following steps:
S10: Obtain a sentence to be matched and a target sample sentence, and compare the to-be-matched character length of the sentence to be matched with the target sample character length of the target sample sentence.
The sentence to be matched may come from any of various application scenarios; for example, in the field of multi-round intelligent interactive robots, it may be a robot reply sentence. The target sample sentence may likewise come from any application scenario; preferably, the target sample sentence and the sentence to be matched come from the same application scenario. The to-be-matched character length is the number of characters in the sentence to be matched; the target sample character length is the number of characters in the target sample sentence. Further, the character lengths of the two sentences differ considerably; for example, the sentence to be matched is 4 to 6 characters long, while the target sample sentence is 12 to 16 characters long.
In a specific embodiment, the following steps are further included before step S10:
S01: Obtain the sentence to be matched and a target sample text; the target sample text contains multiple sentences.
The target sample text is the text to be checked for a sentence that semantically matches the sentence to be matched, and it contains multiple sentences. Understandably, the target sample text is split at periods, that is, each sentence ending with a period is separated out (because a complete sentence usually carries one independent piece of semantic information). Generally, the sentence to be matched is a single sentence; if it contains multiple periods, it can be split as well.
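
The period-based splitting just described might look like the sketch below. Treating both the Chinese full stop and the ASCII period as sentence boundaries is an assumption, and the function name is hypothetical.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Split a text at Chinese or ASCII periods and drop empty fragments."""
    parts = re.split(r"[。.]", text)
    return [part.strip() for part in parts if part.strip()]
```

Splitting on the ASCII period also breaks decimals and abbreviations, so a production splitter would need extra rules that the text does not specify.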
S02: Input the sentence to be matched and each of the sentences into a preset similarity recognition model, and determine the semantic similarity between the sentence to be matched and each of the sentences.
The preset similarity recognition model may be a model trained in advance by a method such as machine learning; it is used to determine the semantic similarity between two sentences.
Specifically, after the sentence to be matched and the target sample text are obtained, the sentence to be matched and each sentence in the target sample text are input into the preset similarity recognition model, and an edit distance calculation or a Jaccard coefficient calculation is performed on the sentence to be matched and each sentence to determine the semantic similarity between them.
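
The two calculations named above can be sketched at the character level as follows. The text does not say how an edit distance is turned into a similarity or which units the Jaccard coefficient is computed over; normalizing by the longer length and using character sets are assumptions made here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 minus the distance divided by the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard coefficient over the sets of characters in each sentence (assumed unit)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)
```

Both measures compare surface characters, which is consistent with the need for a further check on sentences that score just below the similarity threshold.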
S03: Determine the highest semantic similarity among the semantic similarities between the sentence to be matched and each of the sentences.
S04: When the highest semantic similarity is less than a preset similarity threshold and the difference between it and the preset similarity threshold is less than a preset similarity difference, record the sentence corresponding to the highest semantic similarity as the target sample sentence.
The preset similarity threshold can be set according to the needs of the actual application scenario; for example, it may be set to 0.9 or 0.95. The preset similarity difference can be any value between 0.1 and 0.5.
Understandably, after the sentence to be matched and each of the sentences are input into the preset similarity recognition model and the semantic similarity between them is determined, the sentence that is semantically most similar to the sentence to be matched must be selected. The highest of the semantic similarities is therefore determined and compared with the preset similarity threshold. When the highest semantic similarity is less than the preset similarity threshold, the difference between the two is determined and compared with the preset similarity difference; when that difference is less than the preset similarity difference, the sentence corresponding to the highest semantic similarity is recorded as the target sample sentence.
In the prior art, if the preset similarity recognition model determines that the semantic similarity between two sentences is less than the preset similarity threshold, the two sentences are judged to be dissimilar. In this embodiment, for a sentence whose semantic similarity is below the preset similarity threshold, it is determined whether the difference between its semantic similarity and the threshold is less than the preset similarity difference; if so, a further semantic similarity judgment is made through steps S10 to S40.
Further, when the highest semantic similarity is less than the preset similarity threshold and the difference between it and the threshold is greater than or equal to the preset similarity difference, the further semantic similarity judgment of steps S10 to S40 is not performed.
In another specific embodiment, after the highest semantic similarity among the semantic similarities between the sentence to be matched and each of the sentences is determined, the method further includes:
when the highest semantic similarity is greater than or equal to the preset similarity threshold, recording the sentence corresponding to the highest semantic similarity as the semantic matching sentence corresponding to the sentence to be matched.
Understandably, when the highest semantic similarity is greater than or equal to the preset similarity threshold, the corresponding sentence semantically matches the sentence to be matched, so it is recorded directly as the semantic matching sentence corresponding to the sentence to be matched.
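
The three outcomes described for the highest semantic similarity (direct match, candidate for the sliding-window check, or no further processing) can be summarized in a short sketch. The default values 0.9 and 0.1 are taken from the example ranges in the text; the function name and the outcome labels are hypothetical.

```python
from typing import Dict, Optional, Tuple


def select_candidate(
    similarities: Dict[str, float],
    similarity_threshold: float = 0.9,
    similarity_difference: float = 0.1,
) -> Tuple[str, Optional[str]]:
    """Classify the most similar sentence of the target sample text."""
    if not similarities:
        return ("no_match", None)
    # S03: pick the sentence with the highest semantic similarity.
    best_sentence = max(similarities, key=similarities.get)
    best_similarity = similarities[best_sentence]
    # At or above the threshold: recorded directly as the semantic matching sentence.
    if best_similarity >= similarity_threshold:
        return ("semantic_match", best_sentence)
    # S04: just below the threshold: becomes the target sample sentence for steps S10 onward.
    if similarity_threshold - best_similarity < similarity_difference:
        return ("target_sample", best_sentence)
    # Too far below the threshold: no further semantic similarity judgment.
    return ("no_match", None)
```

With the defaults, a best similarity of 0.85 yields a target sample sentence, while 0.6 is discarded without the sliding-window check.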
In a specific embodiment, before step S10, that is, before obtaining the to-be-matched character length corresponding to the sentence to be matched and the target sample character length corresponding to each target sample sentence, the method includes:
(1) Obtain a preset text recognition model. The preset text recognition model may be a word2vec or BERT model trained on a large number of training samples, and it is used to convert sentences into word vectors.
(2) Input the sentence to be matched into the preset text recognition model to obtain the to-be-matched word vectors corresponding to the sentence to be matched; likewise, input the target sample sentence into the preset text recognition model to obtain the target sample word vectors corresponding to the target sample sentence.
Specifically, after the sentence to be matched and the target sample sentence are obtained, the preset text recognition model is obtained and the sentence to be matched is input into it. Word embedding is performed on the sentence to be matched, that is, the sentence is segmented into words and each word is converted into a word vector, yielding the to-be-matched word vectors corresponding to the sentence to be matched. Similarly, the target sample sentence is input into the preset text recognition model and word embedding is performed on it to obtain the target sample word vectors corresponding to the target sample sentence.
(3) Determine the to-be-matched character length of the sentence to be matched according to the to-be-matched word vectors; likewise, determine the target sample character length of the target sample sentence according to the target sample word vectors.
Specifically, after the to-be-matched word vectors and the target sample word vectors have been obtained from the preset text recognition model, the to-be-matched character length of the sentence to be matched is determined from the number of to-be-matched word vectors, and the target sample character length of the target sample sentence is determined from the number of target sample word vectors.
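Steps (1) to (3) can be illustrated with a minimal Python sketch. It is not the implementation of the present application: `toy_embed` is a hypothetical stand-in for the preset text recognition model (a real system would use a trained word2vec or BERT model), whitespace splitting stands in for word segmentation, and the vector values are arbitrary. Only the final step, taking the number of word vectors as the character length, follows the description above.

```python
from typing import List


def toy_embed(sentence: str, dim: int = 4) -> List[List[float]]:
    """Hypothetical stand-in for the preset text recognition model.

    Splits on whitespace (an assumption; real word segmentation differs)
    and returns one arbitrary fixed-size vector per token.
    """
    tokens = sentence.split()
    return [[float((len(tok) + i) % 7) for i in range(dim)] for tok in tokens]


def character_length(sentence: str) -> int:
    """Length of a sentence, taken as the number of its word vectors."""
    return len(toy_embed(sentence))


query_len = character_length("reset my password")
sample_len = character_length("how do I reset my password on the mobile app")
```

Here `query_len` is smaller than `sample_len`, which is the case handled by step S20.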
S20: When the to-be-matched character length is less than the target sample character length, record the to-be-matched character length as the window length of a sliding window.
Specifically, after the sentence to be matched and the target sample sentence are obtained and the to-be-matched character length is compared with the target sample character length, if the to-be-matched character length is less than the target sample character length, the to-be-matched character length is recorded as the window length of the sliding window.
S30: Slide the sliding window over the target sample sentence, and match the target sample field of the target sample sentence covered by the sliding window against the sentence to be matched to obtain a first word-sense distance result.
Here, the target sample field refers to the character segment of the target sample sentence that is covered by the sliding window. The word-sense distance result indicates whether key word-sense information is shared between the sentence to be matched and the target sample sentence.
Specifically, as shown in FIG. 3, step S30 includes:
S301: Align the starting character of the target sample sentence with the starting character of the sentence to be matched, and record the first target sample field covered by the sliding window as a first intercepted sentence.
Here, the starting character refers to the character at the starting position (that is, the first position) of a sentence.
Specifically, after the to-be-matched character length is recorded as the window length of the sliding window, the starting character of the target sample sentence is aligned with the starting character of the sentence to be matched, so that the sliding window begins covering at the starting character of the target sample sentence and no character information is missed. The sliding window is then placed over the target sample sentence, and the first target sample field it covers is recorded as the first intercepted sentence. Understandably, the length of the first intercepted sentence equals the window length, which in turn equals the to-be-matched character length.
S302: Perform semantic matching between the first intercepted sentence and the sentence to be matched to obtain a semantic result for the sentence to be matched and the first intercepted sentence.
Here, one word-sense result can be regarded as one word-sense distance value, that is, it represents the word-sense distance between an intercepted sentence and the sentence to be matched.
Specifically, after step S301, the first intercepted sentence is semantically matched against the sentence to be matched to obtain their semantic result, which indicates whether the two are semantically similar. It should be noted that the semantic matching here is judged on the basis of sentence structure information. This information indicates whether the first intercepted sentence and the sentence to be matched have a similar character composition or a similar structure (for example, a subject-predicate-object structure), and it can serve as a supplement to semantic information.
S303: Slide the sliding window to the right by one character length on the target sample sentence, and record the second target sample field covered by the sliding window as a second intercepted sentence.
Specifically, once the semantic result for the sentence to be matched and the first intercepted sentence has been obtained, semantic matching between the two is complete. The sliding window is then slid to the right by one character length on the target sample sentence, and the second target sample field covered by the sliding window is recorded as the second intercepted sentence. Understandably, the character length of the second intercepted sentence equals the character length of the sentence to be matched.
S304: Perform semantic matching between the second intercepted sentence and the sentence to be matched to obtain a semantic result for the sentence to be matched and the second intercepted sentence.
Specifically, after step S303, the second intercepted sentence is semantically matched against the sentence to be matched to obtain their semantic result, which indicates whether the second intercepted sentence and the sentence to be matched are semantically similar.
S305: When it is detected that the end character of the sliding window is aligned with the end character of the target sample sentence, record all semantic results as the first word-sense distance result.
Here, the end character refers to the last character of a sentence.
Specifically, after steps S301 to S304, if it is detected that the end character of the sliding window is aligned with the end character of the target sample sentence, all semantic results are recorded as the first word-sense distance result. If the end character of the sliding window is not yet aligned with the end character of the target sample sentence, some characters of the target sample sentence have not yet been covered, so the sliding window continues to move until its end character is aligned with the end character of the target sample sentence.
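Steps S301 to S305 amount to a loop over window positions. The following Python sketch shows that loop under stated assumptions: sentences are treated as plain strings indexed by character, and `char_overlap` is a hypothetical placeholder for the semantic matching step (a Jaccard overlap of character sets, where a higher value means a closer match). The present application does not specify this function; any matcher with the same signature could be substituted.

```python
from typing import Callable, List


def char_overlap(a: str, b: str) -> float:
    """Hypothetical matcher: Jaccard overlap of character sets, in [0, 1]."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def slide_and_match(
    query: str,
    sample: str,
    match: Callable[[str, str], float] = char_overlap,
) -> List[float]:
    """Slide a window of len(query) over sample, one character per step.

    Returns one result per window position, from the position aligned with
    the first character of sample to the position aligned with its last.
    """
    window = len(query)
    if window == 0 or window >= len(sample):
        raise ValueError("expects 0 < len(query) < len(sample)")
    results = []
    for start in range(len(sample) - window + 1):
        segment = sample[start:start + window]  # the intercepted sentence
        results.append(match(segment, query))
    return results


results = slide_and_match("abc", "xxabcxx")
# results == [0.25, 0.5, 1.0, 0.5, 0.25]
```

The list returned by `slide_and_match` plays the role of the first word-sense distance result: one value per intercepted sentence, in window order.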
S40: Determine a first semantic score between the sentence to be matched and the target sample sentence according to the first word-sense distance result corresponding to the sentence to be matched and the target sample sentence.
Here, the first semantic score indicates the degree of semantic similarity between the target sample sentence and the sentence to be matched. A higher first semantic score indicates that the target sample sentence contains more key semantic information matching the sentence to be matched.
Specifically, as shown in FIG. 4, step S40, that is, determining the first semantic score between the sentence to be matched and the target sample sentence according to the corresponding first word-sense distance result, includes:
S401: Differentiate the first word-sense distance result corresponding to the sentence to be matched and the target sample sentence to obtain a word-sense curve corresponding to the first word-sense distance result.
Here, the first word-sense distance result can be regarded as a continuous density sequence formed by combining multiple word-sense results, so differentiating the first word-sense distance result yields the corresponding word-sense curve.
S402: Determine, by means of a peak-finding algorithm, whether a word-sense peak exists in the word-sense curve.
Here, the peak-finding algorithm is used to check whether a word-sense peak appears in the word-sense curve, and the word-sense peak is used to characterize the first semantic score between the sentence to be matched and the target sample sentence corresponding to that word-sense curve. In this embodiment, the peak-finding algorithm may perform a global search over the word-sense curve; during the search, a point at which the curve first rises and then falls is a word-sense peak.
Specifically, after the first word-sense distance result is differentiated and the word-sense curve corresponding to the target sample sentence is obtained, the peak-finding algorithm searches the word-sense curve for a word-sense peak. If a word-sense peak exists, the first semantic score between the sentence to be matched and the target sample sentence corresponding to that word-sense curve can be determined from the peak.
S403: When a word-sense peak exists in the word-sense curve, determine the first semantic score between the sentence to be matched and the target sample sentence corresponding to the word-sense curve according to the word-sense peak.
Specifically, when the peak-finding algorithm finds a word-sense peak in the word-sense curve, the first semantic score may be determined from the height of the peak or from the area under the peak. For example, the higher the peak, the higher the first semantic score; alternatively, the larger the area under the peak, the higher the first semantic score.
S404: When no word-sense peak exists in the word-sense curve, determine that the first semantic score between the sentence to be matched and the target sample sentence corresponding to the word-sense curve is 0.
Specifically, when the peak-finding algorithm finds no word-sense peak in the word-sense curve, the target sample sentence corresponding to that curve does not match the sentence to be matched, and the first semantic score between them is determined to be 0.
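Steps S401 to S404 can be sketched as follows. Several choices in this Python sketch are assumptions rather than details given in the present application: the differentiation is taken as a first-order finite difference of the per-window results, the score is the height of the tallest peak, negative heights are clamped to 0, and the `scale` factor of 100 is arbitrary. The peak test itself (a point where the curve rises and then falls) follows the description above.

```python
from typing import List


def word_sense_curve(results: List[float]) -> List[float]:
    """Finite-difference 'derivative' of the per-window results (assumption)."""
    return [b - a for a, b in zip(results, results[1:])]


def find_peaks(curve: List[float]) -> List[int]:
    """Indices at which the curve first rises and then falls."""
    return [
        i
        for i in range(1, len(curve) - 1)
        if curve[i - 1] < curve[i] and curve[i] > curve[i + 1]
    ]


def semantic_score(results: List[float], scale: float = 100.0) -> float:
    """Score from the tallest peak height; 0 when the curve has no peak."""
    curve = word_sense_curve(results)
    peaks = find_peaks(curve)
    if not peaks:
        return 0.0  # S404: no word-sense peak
    # S403: peak height as the score; the scaling and clamping are assumptions.
    return max(0.0, scale * max(curve[i] for i in peaks))


peaked = semantic_score([0.25, 0.5, 1.0, 0.5, 0.25])  # curve: [0.25, 0.5, -0.5, -0.25]
flat = semantic_score([0.2, 0.2, 0.2, 0.2])           # curve: [0.0, 0.0, 0.0]
```

A variant that scores by the area under the peak instead of its height would replace only the last line of `semantic_score`.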
S50: When the first semantic score exceeds a preset score threshold, record the target sample sentence as a semantic matching sentence corresponding to the sentence to be matched.
Here, the preset score threshold may be set according to the application scenario; for example, the preset score threshold may be a value such as 90 or 95.
Specifically, after the first semantic score between the sentence to be matched and the target sample sentence is determined from the corresponding first word-sense distance result, the first semantic score is compared with the preset score threshold. When the first semantic score exceeds the preset score threshold, the target sample sentence is recorded as the semantic matching sentence corresponding to the sentence to be matched; when the first semantic score does not exceed the preset score threshold, the target sample sentence does not semantically match the sentence to be matched.
In this embodiment, a sliding window is defined so that the target sample field of the target sample sentence covered by the sliding window is matched against the sentence to be matched, yielding the first word-sense distance result. The first semantic score between the sentence to be matched and the target sample sentence is then determined from this result, in order to establish whether part of the semantic information in the target sample sentence matches the sentence to be matched. As a result, a target sample sentence that would otherwise not be recalled (when the semantic similarity between the target sample sentence and the sentence to be matched is below the preset similarity threshold, the two are directly judged not to match) now has a chance of being recalled. The present application can therefore provide more sample data for target scenarios that lack samples, and it also improves the accuracy of semantic matching between short and long sentences, so that the semantic matching similarity between long and short sentences is higher.
In an embodiment, as shown in FIG. 5, after step S10 the method further includes:
S60: When the to-be-matched character length is greater than or equal to the target sample character length, record the target sample character length as the window length of the sliding window.
Specifically, after the sentence to be matched and the target sample sentence are obtained and the to-be-matched character length is compared with the target sample character length, if the to-be-matched character length is greater than or equal to the target sample character length, the target sample character length is recorded as the window length of the sliding window.
S70: Slide the sliding window over the sentence to be matched, and match the to-be-matched field of the sentence to be matched covered by the sliding window against the target sample sentence to obtain a second word-sense distance result.
Specifically, after the target sample character length is recorded as the window length of the sliding window, the starting character of the target sample sentence is aligned with the starting character of the sentence to be matched, and the first to-be-matched field covered by the sliding window (that is, the field of the sentence to be matched consisting of as many characters as the window length) is recorded as a third intercepted sentence. The third intercepted sentence is semantically matched against the target sample sentence to obtain a semantic result for the target sample sentence and the third intercepted sentence. The sliding window is then slid to the right by one character length on the sentence to be matched, the second to-be-matched field covered by the sliding window is recorded as a fourth intercepted sentence, and the fourth intercepted sentence is semantically matched against the target sample sentence to obtain a semantic result for the target sample sentence and the fourth intercepted sentence. When it is detected that the end character of the sliding window is aligned with the end character of the sentence to be matched, all semantic results are recorded as the second word-sense distance result.
S80: Determine a second semantic score between the sentence to be matched and the target sample sentence according to the second word-sense distance result corresponding to the sentence to be matched and the target sample sentence.
Here, the second semantic score indicates the degree of semantic similarity between the target sample sentence and the sentence to be matched. A higher second semantic score indicates that the target sample sentence contains more key semantic information matching the sentence to be matched.
Specifically, after the second word-sense distance result is obtained in step S70, the second word-sense distance result is differentiated to obtain the corresponding word-sense curve, and the peak-finding algorithm determines whether a word-sense peak exists in that curve. When a word-sense peak exists, the second semantic score between the sentence to be matched and the target sample sentence corresponding to the word-sense curve is determined according to the word-sense peak. When no word-sense peak exists, the second semantic score between the sentence to be matched and the target sample sentence corresponding to the word-sense curve is determined to be 0.
S90: When the second semantic score exceeds the preset score threshold, record the target sample sentence as the semantic matching sentence corresponding to the sentence to be matched.
Specifically, after the second semantic score between the sentence to be matched and the target sample sentence is determined from the corresponding second word-sense distance result, the second semantic score is compared with the preset score threshold. When the second semantic score exceeds the preset score threshold, the target sample sentence is recorded as the semantic matching sentence corresponding to the sentence to be matched; when the second semantic score does not exceed the preset score threshold, the target sample sentence does not semantically match the sentence to be matched.
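The two branches (steps S20 to S50 when the sentence to be matched is shorter, steps S60 to S90 otherwise) differ only in which sentence supplies the window length and which sentence the window slides over. The Python sketch below shows that symmetry. The matcher and scorer are passed in as parameters because the present application does not fix them; the function names and the string-based character indexing are assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Matcher = Callable[[str, str], float]
Scorer = Callable[[List[float]], float]


def window_results(query: str, sample: str, match: Matcher) -> List[float]:
    """Per-window results; the window always slides over the longer sentence."""
    if len(query) < len(sample):
        # S20/S30: window length from the query, slide over the sample.
        window, slid, fixed = len(query), sample, query
    else:
        # S60/S70: window length from the sample, slide over the query.
        window, slid, fixed = len(sample), query, sample
    if window == 0:
        return []
    return [
        match(slid[i:i + window], fixed)
        for i in range(len(slid) - window + 1)
    ]


def is_semantic_match(
    query: str, sample: str, match: Matcher, score: Scorer, threshold: float
) -> Tuple[bool, float]:
    """S40/S50 or S80/S90: score the results and compare with the threshold."""
    value = score(window_results(query, sample, match))
    return value > threshold, value
```

When the two sentences have the same length there is exactly one window position, so the result list has a single entry; a scorer that requires a rise followed by a fall would then find no peak.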
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the sequence numbers should not constitute any limitation on the implementation of the embodiments of the present application.
In an embodiment, an apparatus for semantic matching of long and short sentence text is provided, and the apparatus corresponds one-to-one to the method for semantic matching of long and short sentence text in the above embodiments. As shown in FIG. 6, the apparatus includes a sentence acquisition module 10, a first window length recording module 20, a first sentence matching module 30, a first semantic score determination module 40, and a first matching sentence determination module 50. The functional modules are described in detail as follows:
the sentence acquisition module 10, configured to obtain a sentence to be matched and a target sample sentence, and to compare the to-be-matched character length corresponding to the sentence to be matched with the target sample character length corresponding to the target sample sentence;
the first window length recording module 20, configured to record the to-be-matched character length as the window length of a sliding window when the to-be-matched character length is less than the target sample character length;
the first sentence matching module 30, configured to slide the sliding window over the target sample sentence, and to match the target sample field of the target sample sentence covered by the sliding window against the sentence to be matched to obtain a first word-sense distance result;
the first semantic score determination module 40, configured to determine a first semantic score between the sentence to be matched and the target sample sentence according to the first word-sense distance result corresponding to the sentence to be matched and the target sample sentence;
the first matching sentence determination module 50, configured to record the target sample sentence as a semantic matching sentence corresponding to the sentence to be matched when the first semantic score exceeds a preset score threshold.
Preferably, the apparatus for semantic matching of long and short sentence text further includes:
a sample text acquisition module, configured to obtain the sentence to be matched and a target sample text, the target sample text containing a plurality of sentences;
a semantic similarity determination module, configured to input the sentence to be matched and each of the sentences into a preset similarity recognition model to determine the semantic similarity between the sentence to be matched and each of the sentences;
a highest similarity determination module, configured to determine the highest semantic similarity among the semantic similarities between the sentence to be matched and each of the sentences;
a target sample sentence determination module, configured to record the sentence corresponding to the highest semantic similarity as the target sample sentence when the highest semantic similarity is less than a preset similarity threshold and the difference between it and the preset similarity threshold is less than a preset similarity difference.
Preferably, the apparatus for semantic matching of long and short sentence text further includes:
a semantic matching sentence recording module, configured to record the sentence corresponding to the highest semantic similarity as the semantic matching sentence corresponding to the sentence to be matched when the highest semantic similarity is greater than or equal to the preset similarity threshold.
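The routing performed by the highest similarity determination module, the target sample sentence determination module, and the semantic matching sentence recording module can be sketched in Python as follows. The function name, the string labels, and the handling of the remaining case (highest similarity too far below the threshold, labeled `"none"` here) are assumptions for illustration; the modules above describe only the first two outcomes.

```python
from typing import List, Optional, Tuple


def route_candidate(
    similarities: List[float], threshold: float, max_gap: float
) -> Tuple[str, Optional[int]]:
    """Pick the most similar sentence and decide how it is handled.

    Returns ("direct_match", index) when the highest similarity reaches the
    threshold, ("target_sample", index) when it falls short by less than
    max_gap, and ("none", None) otherwise (this last outcome is an assumption).
    """
    if not similarities:
        return ("none", None)
    best = max(range(len(similarities)), key=similarities.__getitem__)
    top = similarities[best]
    if top >= threshold:
        return ("direct_match", best)   # recorded as the semantic matching sentence
    if threshold - top < max_gap:
        return ("target_sample", best)  # passed on to sliding-window matching
    return ("none", None)


near_miss = route_candidate([0.4, 0.85, 0.6], threshold=0.9, max_gap=0.1)
direct = route_candidate([0.95, 0.2], threshold=0.9, max_gap=0.1)
```

In this example `near_miss` selects the second sentence as a target sample sentence, while `direct` records the first sentence as a match without any sliding-window step.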
Preferably, the apparatus for semantic matching of long and short sentence text further includes:
a text recognition model acquisition module, configured to obtain a preset text recognition model;
a word vector determination module, configured to input the sentence to be matched into the preset text recognition model to obtain the to-be-matched word vectors corresponding to the sentence to be matched, and to input the target sample sentence into the preset text recognition model to obtain the target sample word vectors corresponding to the target sample sentence;
a character length determination module, configured to determine the to-be-matched character length of the sentence to be matched according to the to-be-matched word vectors, and to determine the target sample character length of the target sample sentence according to the target sample word vectors.
Preferably, as shown in FIG. 7, the first sentence matching module 30 includes the following units:
a character alignment unit 301, configured to align the starting character of the target sample sentence with the starting character of the sentence to be matched, and to record the first target sample field covered by the sliding window as a first intercepted sentence;
a first semantic matching unit 302, configured to perform semantic matching between the first intercepted sentence and the sentence to be matched to obtain a semantic result for the sentence to be matched and the first intercepted sentence;
a window sliding unit 303, configured to slide the sliding window to the right by one character length on the target sample sentence, and to record the second target sample field covered by the sliding window as a second intercepted sentence;
a second semantic matching unit 304, configured to perform semantic matching between the second intercepted sentence and the sentence to be matched to obtain a semantic result for the sentence to be matched and the second intercepted sentence;
a word-sense distance result recording unit 305, configured to record all semantic results as the first word-sense distance result when it is detected that the end character of the sliding window is aligned with the end character of the target sample sentence.
Preferably, as shown in FIG. 8, the first semantic score determination module 40 includes:
A word-sense curve determination unit 401, configured to perform derivation processing on the first word-sense distance result corresponding to the sentence to be matched and the target sample sentence, to obtain a word-sense curve corresponding to the first word-sense distance result;
A word-sense peak determination unit 402, configured to determine, by means of a peak-finding algorithm, whether a word-sense peak exists in the word-sense curve;
A first semantic score determination unit 403, configured to, when a word-sense peak exists in the word-sense curve, determine, according to the word-sense peak, the first semantic score between the sentence to be matched and the target sample sentence corresponding to that word-sense curve;
A second semantic score determination unit 404, configured to, when no word-sense peak exists in the word-sense curve, determine that the first semantic score between the sentence to be matched and the target sample sentence corresponding to that word-sense curve is 0.
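One possible reading of the derivation and peak-finding steps is sketched below. The details are assumptions rather than the applicant's algorithm: the derivative is approximated by first differences, a peak is taken to be an interior point where the difference changes from positive to negative, and the score is taken to be the highest result value at a peak. The function names are hypothetical.

```python
from typing import List


def first_differences(values: List[float]) -> List[float]:
    """Discrete derivative: difference between consecutive results."""
    return [b - a for a, b in zip(values, values[1:])]


def peak_indices(values: List[float]) -> List[int]:
    """Indices where the slope changes from rising to falling.

    Boundary points and flat plateaus are not reported as peaks
    (a simplifying assumption of this sketch).
    """
    diffs = first_differences(values)
    return [
        i + 1
        for i in range(len(diffs) - 1)
        if diffs[i] > 0 and diffs[i + 1] < 0
    ]


def semantic_score(values: List[float]) -> float:
    """Score from the peak if one exists, otherwise 0."""
    peaks = peak_indices(values)
    if not peaks:
        return 0.0
    return max(values[i] for i in peaks)
```

In this sketch a sequence of results that only rises, only falls, or stays flat yields a score of 0, which mirrors the no-peak branch handled by unit 404.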
Preferably, the text semantic matching apparatus for long and short sentences further includes:
A second window length recording module, configured to record the target sample character length as the window length of the sliding window when the to-be-matched character length is greater than or equal to the target sample character length;
A second sentence matching module, configured to slide the sliding window over the sentence to be matched, and to match the to-be-matched field of the sentence to be matched covered by the sliding window against the target sample sentence, to obtain a second word-sense distance result;
A second semantic score determination module, configured to determine a second semantic score between the sentence to be matched and the target sample sentence according to the second word-sense distance result corresponding to the sentence to be matched and the target sample sentence;
A second matching sentence determination module, configured to record the target sample sentence as a semantic matching sentence corresponding to the sentence to be matched when the second semantic score exceeds the preset score threshold.
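The choice between the two branches (window taken from the sentence to be matched, or from the target sample sentence) can be illustrated with a short sketch. The helper names `plan_sliding` and `windows` are hypothetical and only show how the shorter text sets the window length while the longer text is the one slid over.

```python
from typing import List, Tuple


def plan_sliding(query: str, target: str) -> Tuple[int, str, str]:
    """Return (window_length, text_slid_over, fixed_text).

    If the query is shorter, the window slides over the target;
    otherwise (longer or equal) it slides over the query.
    """
    if len(query) < len(target):
        return len(query), target, query
    return len(target), query, target


def windows(text: str, size: int) -> List[str]:
    """All contiguous fields of the given size, moving one character at a time."""
    if size <= 0 or size > len(text):
        raise ValueError("size must be between 1 and len(text)")
    return [text[i:i + size] for i in range(len(text) - size + 1)]
```

When both texts have the same length, the second branch applies and there is exactly one window position, so the comparison reduces to a single whole-sentence match.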
For specific limitations of the text semantic matching apparatus for long and short sentences, refer to the limitations of the text semantic matching method for long and short sentences above, which are not repeated here. Each module of the apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in the form of hardware, or stored in a memory of the computer device in the form of software, so that the processor can call them and perform the operations corresponding to each module.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 9. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a readable storage medium and an internal memory. The readable storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for running the operating system and the computer-readable instructions stored in the readable storage medium. The database of the computer device stores the data used in the text semantic matching method for long and short sentences of the above embodiments. The network interface of the computer device is used to communicate with an external terminal through a network connection. When executed by the processor, the computer-readable instructions implement a text semantic matching method for long and short sentences. The readable storage medium provided by this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
In one embodiment, a computer device is provided, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
Acquiring a sentence to be matched and a target sample sentence, and comparing the to-be-matched character length of the sentence to be matched with the target sample character length of the target sample sentence;
When the to-be-matched character length is less than the target sample character length, recording the to-be-matched character length as the window length of a sliding window;
Sliding the sliding window over the target sample sentence, and matching the target sample field of the target sample sentence covered by the sliding window against the sentence to be matched, to obtain a first word-sense distance result;
Determining a first semantic score between the sentence to be matched and the target sample sentence according to the first word-sense distance result corresponding to the sentence to be matched and the target sample sentence;
When the first semantic score exceeds a preset score threshold, recording the target sample sentence as a semantic matching sentence corresponding to the sentence to be matched.
In one embodiment, one or more readable storage media storing computer-readable instructions are provided, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
Acquiring a sentence to be matched and a target sample sentence, and comparing the to-be-matched character length of the sentence to be matched with the target sample character length of the target sample sentence;
When the to-be-matched character length is less than the target sample character length, recording the to-be-matched character length as the window length of a sliding window;
Sliding the sliding window over the target sample sentence, and matching the target sample field of the target sample sentence covered by the sliding window against the sentence to be matched, to obtain a first word-sense distance result;
Determining a first semantic score between the sentence to be matched and the target sample sentence according to the first word-sense distance result corresponding to the sentence to be matched and the target sample sentence;
When the first semantic score exceeds a preset score threshold, recording the target sample sentence as a semantic matching sentence corresponding to the sentence to be matched.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by computer-readable instructions directing the relevant hardware. The computer-readable instructions may be stored in a non-volatile or a volatile computer-readable storage medium and, when executed, may include the processes of the above method embodiments. Any reference to memory, storage, a database, or another medium used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
Those skilled in the art will clearly understand that, for convenience and brevity of description, only the above division into functional units and modules is used as an example. In practical applications, the above functions may be assigned to different functional units and modules as required; that is, the internal structure of the apparatus may be divided into different functional units or modules to complete all or part of the functions described above.
The above embodiments are intended only to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and all fall within the scope of protection of the present application.

Claims (20)

  1. A text semantic matching method for long and short sentences, comprising:
    acquiring a sentence to be matched and a target sample sentence, and comparing the to-be-matched character length of the sentence to be matched with the target sample character length of the target sample sentence;
    when the to-be-matched character length is less than the target sample character length, recording the to-be-matched character length as the window length of a sliding window;
    sliding the sliding window over the target sample sentence, and matching the target sample field of the target sample sentence covered by the sliding window against the sentence to be matched, to obtain a first word-sense distance result;
    determining a first semantic score between the sentence to be matched and the target sample sentence according to the first word-sense distance result corresponding to the sentence to be matched and the target sample sentence;
    when the first semantic score exceeds a preset score threshold, recording the target sample sentence as a semantic matching sentence corresponding to the sentence to be matched.
  2. The text semantic matching method for long and short sentences according to claim 1, wherein, before the acquiring a sentence to be matched and a target sample sentence, and comparing the to-be-matched character length of the sentence to be matched with the target sample character length of the target sample sentence, the method further comprises:
    acquiring the sentence to be matched and a target sample text, the target sample text containing a plurality of sentences;
    inputting the sentence to be matched and each of the sentences into a preset similarity recognition model, and determining the semantic similarity between the sentence to be matched and each of the sentences;
    determining the highest semantic similarity among the semantic similarities between the sentence to be matched and each of the sentences;
    when the highest semantic similarity is less than a preset similarity threshold and the difference between the highest semantic similarity and the preset similarity threshold is less than a preset similarity difference, recording the sentence corresponding to the highest semantic similarity as the target sample sentence.
  3. The text semantic matching method for long and short sentences according to claim 2, wherein, after the determining the highest semantic similarity among the semantic similarities between the sentence to be matched and each of the sentences, the method further comprises:
    when the highest semantic similarity is greater than or equal to the preset similarity threshold, recording the sentence corresponding to the highest semantic similarity as a semantic matching sentence corresponding to the sentence to be matched.
  4. The text semantic matching method for long and short sentences according to claim 1, wherein, before acquiring the to-be-matched character length corresponding to the sentence to be matched and the target sample character length corresponding to each target sample sentence, the method comprises:
    acquiring a preset text recognition model;
    inputting the sentence to be matched into the preset text recognition model to obtain to-be-matched word vectors corresponding to the sentence to be matched, and inputting the target sample sentence into the preset text recognition model to obtain target sample word vectors corresponding to the target sample sentence;
    determining the to-be-matched character length of the sentence to be matched according to the to-be-matched word vectors, and determining the target sample character length of the target sample sentence according to the target sample word vectors.
  5. The text semantic matching method for long and short sentences according to claim 1, wherein the sliding the sliding window over the target sample sentence, and matching the target sample field of the target sample sentence covered by the sliding window against the sentence to be matched, to obtain a first word-sense distance result, comprises:
    aligning the start character of the target sample sentence with the start character of the sentence to be matched, and recording the first target sample field covered by the sliding window as a first intercepted sentence;
    performing semantic matching between the first intercepted sentence and the sentence to be matched, to obtain a semantic result of the sentence to be matched and the first intercepted sentence;
    sliding the sliding window to the right by one character length on the target sample sentence, and recording the second target sample field covered by the sliding window as a second intercepted sentence;
    performing semantic matching between the second intercepted sentence and the sentence to be matched, to obtain a semantic result of the sentence to be matched and the second intercepted sentence;
    upon detecting that the end character of the sliding window is aligned with the end character of the target sample sentence, recording all semantic results as the first word-sense distance result.
  6. The text semantic matching method for long and short sentences according to claim 1, wherein the determining a first semantic score between the sentence to be matched and the target sample sentence according to the word-sense distance result corresponding to the sentence to be matched and the target sample sentence comprises:
    performing derivation processing on the first word-sense distance result corresponding to the sentence to be matched and the target sample sentence, to obtain a word-sense curve corresponding to the first word-sense distance result;
    determining, by means of a peak-finding algorithm, whether a word-sense peak exists in the word-sense curve;
    when a word-sense peak exists in the word-sense curve, determining, according to the word-sense peak, the first semantic score between the sentence to be matched and the target sample sentence corresponding to that word-sense curve;
    when no word-sense peak exists in the word-sense curve, determining that the first semantic score between the sentence to be matched and the target sample sentence corresponding to that word-sense curve is 0.
  7. The text semantic matching method for long and short sentences according to claim 1, wherein, after the comparing the to-be-matched character length of the sentence to be matched with the target sample character length of the target sample sentence, the method further comprises:
    when the to-be-matched character length is greater than or equal to the target sample character length, recording the target sample character length as the window length of the sliding window;
    sliding the sliding window over the sentence to be matched, and matching the to-be-matched field of the sentence to be matched covered by the sliding window against the target sample sentence, to obtain a second word-sense distance result;
    determining a second semantic score between the sentence to be matched and the target sample sentence according to the second word-sense distance result corresponding to the sentence to be matched and the target sample sentence;
    when the second semantic score exceeds the preset score threshold, recording the target sample sentence as a semantic matching sentence corresponding to the sentence to be matched.
  8. A text semantic matching apparatus for long and short sentences, comprising:
    a sentence acquisition module, configured to acquire a sentence to be matched and a target sample sentence, and to compare the to-be-matched character length of the sentence to be matched with the target sample character length of the target sample sentence;
    a first window length recording module, configured to record the to-be-matched character length as the window length of a sliding window when the to-be-matched character length is less than the target sample character length;
    a first sentence matching module, configured to slide the sliding window over the target sample sentence, and to match the target sample field of the target sample sentence covered by the sliding window against the sentence to be matched, to obtain a first word-sense distance result;
    a first semantic score determination module, configured to determine a first semantic score between the sentence to be matched and the target sample sentence according to the first word-sense distance result corresponding to the sentence to be matched and the target sample sentence;
    a first matching sentence determination module, configured to record the target sample sentence as a semantic matching sentence corresponding to the sentence to be matched when the first semantic score exceeds a preset score threshold.
  9. A computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
    acquiring a sentence to be matched and a target sample sentence, and comparing the to-be-matched character length of the sentence to be matched with the target sample character length of the target sample sentence;
    when the to-be-matched character length is less than the target sample character length, recording the to-be-matched character length as the window length of a sliding window;
    sliding the sliding window over the target sample sentence, and matching the target sample field of the target sample sentence covered by the sliding window against the sentence to be matched, to obtain a first word-sense distance result;
    determining a first semantic score between the sentence to be matched and the target sample sentence according to the first word-sense distance result corresponding to the sentence to be matched and the target sample sentence;
    when the first semantic score exceeds a preset score threshold, recording the target sample sentence as a semantic matching sentence corresponding to the sentence to be matched.
  10. The computer device according to claim 9, wherein, before the acquiring a sentence to be matched and a target sample sentence, and comparing the to-be-matched character length of the sentence to be matched with the target sample character length of the target sample sentence, the processor further implements the following steps when executing the computer-readable instructions:
    acquiring the sentence to be matched and a target sample text, the target sample text containing a plurality of sentences;
    inputting the sentence to be matched and each of the sentences into a preset similarity recognition model, and determining the semantic similarity between the sentence to be matched and each of the sentences;
    determining the highest semantic similarity among the semantic similarities between the sentence to be matched and each of the sentences;
    when the highest semantic similarity is less than a preset similarity threshold and the difference between the highest semantic similarity and the preset similarity threshold is less than a preset similarity difference, recording the sentence corresponding to the highest semantic similarity as the target sample sentence.
  11. The computer device according to claim 10, wherein, after the determining the highest semantic similarity among the semantic similarities between the sentence to be matched and each of the sentences, the processor further implements the following step when executing the computer-readable instructions:
    when the highest semantic similarity is greater than or equal to the preset similarity threshold, recording the sentence corresponding to the highest semantic similarity as a semantic matching sentence corresponding to the sentence to be matched.
  12. The computer device according to claim 9, wherein, before acquiring the to-be-matched character length corresponding to the sentence to be matched and the target sample character length corresponding to each target sample sentence, the processor further implements the following steps when executing the computer-readable instructions:
    acquiring a preset text recognition model;
    inputting the sentence to be matched into the preset text recognition model to obtain to-be-matched word vectors corresponding to the sentence to be matched, and inputting the target sample sentence into the preset text recognition model to obtain target sample word vectors corresponding to the target sample sentence;
    determining the to-be-matched character length of the sentence to be matched according to the to-be-matched word vectors, and determining the target sample character length of the target sample sentence according to the target sample word vectors.
  13. The computer device according to claim 9, wherein the sliding the sliding window over the target sample sentence, and matching the target sample field of the target sample sentence covered by the sliding window against the sentence to be matched, to obtain a first word-sense distance result, comprises:
    aligning the start character of the target sample sentence with the start character of the sentence to be matched, and recording the first target sample field covered by the sliding window as a first intercepted sentence;
    performing semantic matching between the first intercepted sentence and the sentence to be matched, to obtain a semantic result of the sentence to be matched and the first intercepted sentence;
    sliding the sliding window to the right by one character length on the target sample sentence, and recording the second target sample field covered by the sliding window as a second intercepted sentence;
    performing semantic matching between the second intercepted sentence and the sentence to be matched, to obtain a semantic result of the sentence to be matched and the second intercepted sentence;
    upon detecting that the end character of the sliding window is aligned with the end character of the target sample sentence, recording all semantic results as the first word-sense distance result.
  14. The computer device according to claim 9, wherein the determining a first semantic score between the sentence to be matched and the target sample sentence according to the word-sense distance result corresponding to the sentence to be matched and the target sample sentence comprises:
    performing derivation processing on the first word-sense distance result corresponding to the sentence to be matched and the target sample sentence, to obtain a word-sense curve corresponding to the first word-sense distance result;
    determining, by means of a peak-finding algorithm, whether a word-sense peak exists in the word-sense curve;
    when a word-sense peak exists in the word-sense curve, determining, according to the word-sense peak, the first semantic score between the sentence to be matched and the target sample sentence corresponding to that word-sense curve;
    when no word-sense peak exists in the word-sense curve, determining that the first semantic score between the sentence to be matched and the target sample sentence corresponding to that word-sense curve is 0.
  15. One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring a sentence to be matched and a target sample sentence, and comparing the to-be-matched character length of the sentence to be matched with the target sample character length of the target sample sentence;
    when the to-be-matched character length is less than the target sample character length, recording the to-be-matched character length as the window length of a sliding window;
    sliding the sliding window over the target sample sentence, and matching the target sample field of the target sample sentence covered by the sliding window against the sentence to be matched, to obtain a first word-sense distance result;
    determining a first semantic score between the sentence to be matched and the target sample sentence according to the first word-sense distance result corresponding to the sentence to be matched and the target sample sentence;
    when the first semantic score exceeds a preset score threshold, recording the target sample sentence as a semantic matching sentence corresponding to the sentence to be matched.
  16. The readable storage medium according to claim 15, wherein, before the acquiring of the sentence to be matched and the target sample sentence and the comparing of the to-be-matched character length of the sentence to be matched with the target sample character length of the target sample sentence, the computer-readable instructions, when executed by one or more processors, cause the one or more processors to further perform the following steps:
    acquiring the sentence to be matched and a target sample text, the target sample text containing multiple sentences;
    inputting the sentence to be matched and each of the sentences into a preset similarity recognition model, to determine the semantic similarity between the sentence to be matched and each of the sentences;
    determining the highest semantic similarity among the semantic similarities between the sentence to be matched and each of the sentences;
    when the highest semantic similarity is less than a preset similarity threshold and the difference between it and the preset similarity threshold is less than a preset similarity difference, recording the sentence corresponding to the highest semantic similarity as the target sample sentence.
  17. The readable storage medium according to claim 16, wherein, after the determining of the highest semantic similarity among the semantic similarities between the sentence to be matched and each of the sentences, the computer-readable instructions, when executed by one or more processors, cause the one or more processors to further perform the following step:
    when the highest semantic similarity is greater than or equal to the preset similarity threshold, recording the sentence corresponding to the highest semantic similarity as the semantic matching sentence corresponding to the sentence to be matched.
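The routing described in claims 16 and 17 can be illustrated with a short Python sketch. It assumes the semantic similarities have already been produced by some similarity model (not shown here); the function name `route_best_sentence` and the labels it returns are hypothetical and chosen only for illustration.

```python
from typing import List, Optional, Tuple


def route_best_sentence(
    similarities: List[float],
    sim_threshold: float,
    sim_margin: float,
) -> Tuple[str, Optional[int]]:
    """Decide what to do with the most similar sentence of the sample text.

    similarities[i] is the (assumed, precomputed) semantic similarity between
    the sentence to be matched and the i-th sentence of the target sample text.
    """
    if not similarities:
        return ("none", None)

    # Index of the sentence with the highest semantic similarity.
    best_idx = max(range(len(similarities)), key=similarities.__getitem__)
    best = similarities[best_idx]

    # Claim 17: at or above the threshold, the sentence is a direct match.
    if best >= sim_threshold:
        return ("semantic_match", best_idx)

    # Claim 16: below the threshold but within the preset difference,
    # the sentence becomes the target sample sentence for window matching.
    if sim_threshold - best < sim_margin:
        return ("target_sample", best_idx)

    # Neither condition holds; the claims do not say what happens here,
    # so this sketch simply reports no candidate.
    return ("none", None)
```

A near-miss sentence is therefore not discarded outright; it is passed on to the sliding-window comparison of claim 15.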
  18. The readable storage medium according to claim 15, wherein, before the acquiring of the to-be-matched character length corresponding to the sentence to be matched and the target sample character length corresponding to each target sample sentence, the computer-readable instructions, when executed by one or more processors, cause the one or more processors to further perform the following steps:
    acquiring a preset text recognition model;
    inputting the sentence to be matched into the preset text recognition model, to obtain to-be-matched word vectors corresponding to the sentence to be matched; and inputting the target sample sentence into the preset text recognition model, to obtain target sample word vectors corresponding to the target sample sentence;
    determining the to-be-matched character length of the sentence to be matched according to the to-be-matched word vectors; and determining the target sample character length of the target sample sentence according to the target sample word vectors.
  19. The readable storage medium according to claim 15, wherein the sliding of the sliding window over the target sample sentence and the matching of the target sample field of the target sample sentence covered by the sliding window against the sentence to be matched, to obtain the first word-sense distance result, comprises:
    aligning the starting character of the target sample sentence with the starting character of the sentence to be matched, and recording the first target sample field covered by the sliding window as a first intercepted sentence;
    semantically matching the first intercepted sentence against the sentence to be matched, to obtain a semantic result for the sentence to be matched and the first intercepted sentence;
    sliding the sliding window one character length to the right on the target sample sentence, and recording the second target sample field covered by the sliding window as a second intercepted sentence;
    semantically matching the second intercepted sentence against the sentence to be matched, to obtain a semantic result for the sentence to be matched and the second intercepted sentence;
    when it is detected that the end character of the sliding window is aligned with the end character of the target sample sentence, recording all of the semantic results as the first word-sense distance result.
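The window steps recited in claim 19 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the names `sliding_window_results` and `char_mismatch_distance` are hypothetical, and the character-mismatch distance is only a stand-in for the semantic matching model, which this claim does not specify.

```python
from typing import Callable, List


def char_mismatch_distance(a: str, b: str) -> float:
    """Toy stand-in for a semantic distance: fraction of differing positions.

    Assumes len(a) == len(b) > 0, which holds for every window produced below.
    """
    mismatches = sum(1 for x, y in zip(a, b) if x != y)
    return mismatches / len(a)


def sliding_window_results(
    to_match: str,
    target_sample: str,
    match_fn: Callable[[str, str], float] = char_mismatch_distance,
) -> List[float]:
    """Slide a window of len(to_match) over target_sample, one character at a time.

    Returns one result per window position, from the window aligned with the
    first character to the window aligned with the last character.
    """
    window = len(to_match)
    if window == 0 or window >= len(target_sample):
        # Claim 15 only uses the window when the sentence to be matched
        # is strictly shorter than the target sample sentence.
        raise ValueError("sentence to be matched must be non-empty and shorter")

    results = []
    for start in range(len(target_sample) - window + 1):
        intercepted = target_sample[start:start + window]  # covered field
        results.append(match_fn(to_match, intercepted))
    return results
```

For a sentence of length m and a target sample sentence of length n, this produces n - m + 1 results, which together play the role of the first word-sense distance result.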
  20. The readable storage medium according to claim 15, wherein the determining of the first semantic score between the sentence to be matched and the target sample sentence according to the word-sense distance result corresponding to the sentence to be matched and the target sample sentence comprises:
    differentiating the first word-sense distance result corresponding to the sentence to be matched and the target sample sentence, to obtain a word-sense curve corresponding to the first word-sense distance result;
    determining, by a peak-finding algorithm, whether a word-sense peak exists in the word-sense curve;
    when a word-sense peak exists in the word-sense curve, determining, according to the word-sense peak, the first semantic score between the sentence to be matched and the target sample sentence corresponding to that word-sense curve;
    when no word-sense peak exists in the word-sense curve, determining that the first semantic score between the sentence to be matched and the target sample sentence corresponding to that word-sense curve is 0.
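One way to read the scoring steps of claim 20 is sketched below in Python. Several choices here are assumptions rather than the claimed method: the derivative is approximated by a discrete first difference, a peak is taken to be a strict interior local maximum, and the score is taken to be the height of the tallest peak. The claim itself only says the score is determined "according to" the peak. All function names are hypothetical.

```python
from typing import List


def word_sense_curve(distances: List[float]) -> List[float]:
    """Discrete first difference of the distance sequence (assumed derivative)."""
    return [b - a for a, b in zip(distances, distances[1:])]


def find_peaks(curve: List[float]) -> List[int]:
    """Indices of strict interior local maxima (a simple peak-finding rule)."""
    return [
        i
        for i in range(1, len(curve) - 1)
        if curve[i - 1] < curve[i] > curve[i + 1]
    ]


def first_semantic_score(distances: List[float]) -> float:
    """Score is 0 without a peak; otherwise the tallest peak height (assumption)."""
    curve = word_sense_curve(distances)
    peaks = find_peaks(curve)
    if not peaks:
        return 0.0
    return max(curve[i] for i in peaks)
```

With a distance sequence that dips sharply at one window position, such as `[1.0, 1.0, 0.0, 1.0, 1.0]`, the difference curve contains a single peak right after the dip, whereas a flat sequence yields no peak and therefore a score of 0.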
PCT/CN2021/083780 2020-12-01 2021-03-30 Text semantic matching method and apparatus for long and short sentences, computer device and storage medium WO2022116436A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011382663.6A CN112446218A (en) 2020-12-01 2020-12-01 Long and short sentence text semantic matching method and device, computer equipment and storage medium
CN202011382663.6 2020-12-01

Publications (1)

Publication Number Publication Date
WO2022116436A1 (en) 2022-06-09

Family

ID=74739213

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083780 WO2022116436A1 (en) 2020-12-01 2021-03-30 Text semantic matching method and apparatus for long and short sentences, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN112446218A (en)
WO (1) WO2022116436A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446218A (en) * 2020-12-01 2021-03-05 平安科技(深圳)有限公司 Long and short sentence text semantic matching method and device, computer equipment and storage medium
CN113255370B (en) * 2021-06-22 2022-09-20 中国平安财产保险股份有限公司 Industry type recommendation method, device, equipment and medium based on semantic similarity
CN114330232A (en) * 2021-12-29 2022-04-12 北京字节跳动网络技术有限公司 Text display method, device, equipment and storage medium
CN114743012B (en) * 2022-04-08 2024-02-06 北京金堤科技有限公司 Text recognition method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109460455A (en) * 2018-10-25 2019-03-12 第四范式(北京)技术有限公司 A kind of Method for text detection and device
CN110852056A (en) * 2018-07-25 2020-02-28 中兴通讯股份有限公司 Method, device and equipment for acquiring text similarity and readable storage medium
CN111274822A (en) * 2018-11-20 2020-06-12 华为技术有限公司 Semantic matching method, device, equipment and storage medium
US20200257858A1 (en) * 2018-04-18 2020-08-13 Microsoft Technology Licensing, Llc Multi-scale model for semantic matching
CN112446218A (en) * 2020-12-01 2021-03-05 平安科技(深圳)有限公司 Long and short sentence text semantic matching method and device, computer equipment and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115511014A (en) * 2022-11-23 2022-12-23 联仁健康医疗大数据科技股份有限公司 Information matching method, device, equipment and storage medium
CN115511014B (en) * 2022-11-23 2023-04-07 联仁健康医疗大数据科技股份有限公司 Information matching method, device, equipment and storage medium
CN115577092A (en) * 2022-12-09 2023-01-06 深圳市人马互动科技有限公司 User speech processing method and device, electronic equipment and computer storage medium
CN115577092B (en) * 2022-12-09 2023-03-24 深圳市人马互动科技有限公司 User speech processing method and device, electronic equipment and computer storage medium

Also Published As

Publication number Publication date
CN112446218A (en) 2021-03-05

Similar Documents

Publication Publication Date Title
WO2022116436A1 (en) Text semantic matching method and apparatus for long and short sentences, computer device and storage medium
WO2018153265A1 (en) Keyword extraction method, computer device, and storage medium
WO2022142613A1 (en) Training corpus expansion method and apparatus, and intent recognition model training method and apparatus
WO2020220539A1 (en) Data increment method and device, computer device and storage medium
WO2019136993A1 (en) Text similarity calculation method and device, computer apparatus, and storage medium
WO2022227162A1 (en) Question and answer data processing method and apparatus, and computer device and storage medium
CN109446885B (en) Text-based component identification method, system, device and storage medium
CN111191032B (en) Corpus expansion method, corpus expansion device, computer equipment and storage medium
WO2021073119A1 (en) Method and apparatus for entity disambiguation based on intention recognition model, and computer device
CN112380837B (en) Similar sentence matching method, device, equipment and medium based on translation model
WO2022134805A1 (en) Document classification prediction method and apparatus, and computer device and storage medium
KR20210088680A (en) Video cutting method, apparatus, computer equipment and storage medium
WO2020114100A1 (en) Information processing method and apparatus, and computer storage medium
CN113297366B (en) Emotion recognition model training method, device, equipment and medium for multi-round dialogue
CN110427612B (en) Entity disambiguation method, device, equipment and storage medium based on multiple languages
CN110598210B (en) Entity recognition model training, entity recognition method, entity recognition device, entity recognition equipment and medium
CN110309504B (en) Text processing method, device, equipment and storage medium based on word segmentation
CN110597966A (en) Automatic question answering method and device
CN112633423B (en) Training method of text recognition model, text recognition method, device and equipment
WO2022141864A1 (en) Conversation intent recognition model training method, apparatus, computer device, and medium
CN111832581B (en) Lung feature recognition method and device, computer equipment and storage medium
WO2022142108A1 (en) Method and apparatus for training interview entity recognition model, and method and apparatus for extracting interview information entity
CN113449489B (en) Punctuation mark labeling method, punctuation mark labeling device, computer equipment and storage medium
CN110633475A (en) Natural language understanding method, device and system based on computer scene and storage medium
CN113076404B (en) Text similarity calculation method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 21899480

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 21899480

Country of ref document: EP

Kind code of ref document: A1