WO2022116436A1 - Text semantic matching method and apparatus for long and short sentences, computer device, and storage medium

Text semantic matching method and apparatus for long and short sentences, computer device, and storage medium

Info

Publication number
WO2022116436A1
Authority
WO
WIPO (PCT)
Prior art keywords
sentence
matched
target sample
semantic
length
Prior art date
Application number
PCT/CN2021/083780
Other languages
English (en)
Chinese (zh)
Inventor
谢静文
阮晓雯
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022116436A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/30 - Semantic analysis
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/903 - Querying
    • G06F16/90335 - Query processing
    • G06F16/90344 - Query processing by using string matching techniques

Definitions

  • the present application relates to the technical field of semantic parsing, and in particular, to a method, apparatus, computer device and storage medium for semantic matching of long and short sentences.
  • Embodiments of the present application provide a method, an apparatus, a computer device, and a storage medium for semantic matching of long and short sentences, so as to solve the problem of low accuracy of semantic matching between long and short sentences.
  • a method for semantic matching of long and short sentences including:
  • when the length of the characters to be matched is less than the length of the target sample characters, the length of the characters to be matched is recorded as the window length of the sliding window;
  • the target sample sentence is recorded as a semantic matching sentence corresponding to the to-be-matched sentence.
  • a long-short sentence text semantic matching device comprising:
  • a sentence obtaining module configured to obtain a sentence to be matched and a target sample sentence, and compare the length of the character to be matched corresponding to the sentence to be matched with the length of the target sample character corresponding to the target sample sentence;
  • a window length recording module configured to record the length of the character to be matched as the window length of the sliding window when the length of the character to be matched is less than the length of the character of the target sample;
  • the first sentence matching module is used to slide the sliding window over the target sample sentence, and match the target sample field of the target sample sentence covered by the sliding window with the sentence to be matched to obtain a first semantic distance result;
  • the first semantic score determination module is used to determine the first semantic score between the sentence to be matched and the target sample sentence according to the first semantic distance result corresponding to the sentence to be matched and the target sample sentence;
  • a matching sentence determination module configured to record the target sample sentence as a semantic matching sentence corresponding to the to-be-matched sentence when the first semantic score exceeds a preset score threshold.
  • a computer device comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
  • when the length of the characters to be matched is less than the length of the target sample characters, the length of the characters to be matched is recorded as the window length of the sliding window;
  • the target sample sentence is recorded as a semantic matching sentence corresponding to the to-be-matched sentence.
  • One or more readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • when the length of the characters to be matched is less than the length of the target sample characters, the length of the characters to be matched is recorded as the window length of the sliding window;
  • the target sample sentence is recorded as a semantic matching sentence corresponding to the to-be-matched sentence.
  • In the method, the sentence to be matched and the target sample sentence are obtained, and the length of the characters to be matched corresponding to the sentence to be matched is compared with the length of the target sample characters corresponding to the target sample sentence; when the length of the characters to be matched is less than the target sample character length, the length of the characters to be matched is recorded as the window length of a sliding window; the sliding window is slid over the target sample sentence, and the target sample field of the target sample sentence covered by the sliding window is matched with the sentence to be matched to obtain a first semantic distance result; the first semantic score between the sentence to be matched and the target sample sentence is determined according to the first semantic distance result corresponding to the sentence to be matched and the target sample sentence; and when the first semantic score exceeds a preset score threshold, the target sample sentence is recorded as the semantic matching sentence corresponding to the sentence to be matched.
  • The present application defines a sliding window, matches the target sample field of the target sample sentence covered by the sliding window with the sentence to be matched to obtain a first word-sense distance result, and then determines the first semantic score between the sentence to be matched and the target sample sentence according to the first word-sense distance result, so as to determine whether part of the semantic information in the target sample sentence matches the sentence to be matched. In this way, a target sample sentence that would not otherwise be recalled (when the semantic similarity between the target sample sentence and the sentence to be matched is less than the preset similarity threshold, the target sample sentence would be directly determined not to match the sentence to be matched) has a possibility of being recalled, which provides more sample data for the target scene and also improves the semantic matching accuracy between short sentences and long sentences.
  • FIG. 1 is a schematic diagram of an application environment of a method for semantic matching of long and short sentences in an embodiment of the present application
  • FIG. 2 is a flowchart of a method for semantic matching of long and short sentences in an embodiment of the present application
  • FIG. 3 is a flowchart of step S30 in the method for semantic matching of long and short sentences in an embodiment of the present application
  • FIG. 4 is a flowchart of step S40 in the method for semantic matching of long and short sentences in an embodiment of the present application
  • FIG. 5 is another flowchart of a method for semantic matching of long and short sentences in an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of a device for semantic matching of long and short sentences in an embodiment of the present application
  • FIG. 7 is a schematic block diagram of a first sentence matching module in a device for semantic matching of long and short sentences in an embodiment of the present application
  • FIG. 8 is a schematic block diagram of a first semantic score determination module in a long-short sentence text semantic matching device according to an embodiment of the present application
  • FIG. 9 is a schematic diagram of a computer device in an embodiment of the present application.
  • the method for semantic matching of long-short sentence text can be applied in the application environment shown in FIG. 1 .
  • The long and short sentence text semantic matching method is applied in a long and short sentence text semantic matching system, and the long and short sentence text semantic matching system includes a client and a server as shown in FIG. 1. The client communicates with the server through a network, so as to solve the problem of low accuracy of semantic matching between long and short sentences. The client (also referred to as the user terminal) refers to the program that corresponds to the server and provides local services for the user.
  • Clients can be installed on, but not limited to, various personal computers, laptops, smartphones, tablets, and portable wearable devices.
  • the server can be implemented as an independent server or a server cluster composed of multiple servers.
  • A method for semantic matching of long and short sentences is provided; the method is described by taking its application to the server in FIG. 1 as an example, and includes the following steps:
  • S10 Obtain the sentence to be matched and the target sample sentence, and compare the length of the character to be matched corresponding to the sentence to be matched with the length of the target sample character corresponding to the target sample sentence.
  • the sentences to be matched may be sentences in different application scenarios.
  • the sentences to be matched may be sentences for robot reply.
  • the target sample sentences can also be sentences in different application scenarios.
  • the target sample sentences and the sentences to be matched are sentences in the same application scenario.
  • the length of the characters to be matched refers to the number of characters in the sentence to be matched; the length of the target sample character refers to the number of characters in the target sample sentence.
  • In an embodiment, before step S10, the method further includes:
  • S01 Obtain a sentence to be matched and a target sample text; the target sample text includes multiple sentences.
  • The target sample text is the text to be checked for a sentence that semantically matches the sentence to be matched, and it contains multiple sentences. It is understandable that the target sample text is segmented based on periods, that is, each sentence in the target sample text that ends with a period is split out (because a complete sentence usually contains an independent piece of semantic information). Generally, the sentence to be matched is a single sentence; if the sentence to be matched contains multiple periods, it can also be split.
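  • As a concrete illustration of this period-based segmentation, the following minimal sketch (not part of the original disclosure; the helper name split_into_sentences and the treatment of both the Chinese full stop and the Western period are assumptions) splits a target sample text into candidate sentences:

```python
import re

def split_into_sentences(target_sample_text: str) -> list[str]:
    """Split a target sample text on period-style terminators.

    Hypothetical helper: both the Chinese full stop '。' and the Western '.'
    are treated as sentence boundaries, since a complete sentence usually
    carries an independent piece of semantic information.
    """
    parts = re.split(r"[。.]", target_sample_text)
    # Drop empty fragments left by trailing or doubled punctuation.
    return [p.strip() for p in parts if p.strip()]

# Example: a target sample text containing multiple sentences.
print(split_into_sentences("今天天气很好。我们去公园散步。记得带伞。"))
# -> ['今天天气很好', '我们去公园散步', '记得带伞']
```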
  • S02 Input the to-be-matched sentence and each of the sentences into a preset similarity recognition model, and determine the semantic similarity between the to-be-matched sentence and each of the sentences.
  • the preset similarity recognition model may be a model pre-trained by methods such as machine learning, and the preset similarity recognition model is used to determine the semantic similarity between two sentences.
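  • The disclosure does not fix the architecture of the preset similarity recognition model, so the sketch below stands in for it with a generic sentence-embedding encoder and cosine similarity; the encoder object and its encode method are assumptions made only for illustration:

```python
import numpy as np

def semantic_similarity(sentence_a: str, sentence_b: str, encoder) -> float:
    """Stand-in for the preset similarity recognition model.

    `encoder` is assumed to be any pre-trained model exposing an
    encode(text) -> vector method; the cosine similarity of the two
    sentence vectors is returned as the semantic similarity.
    """
    vec_a = np.asarray(encoder.encode(sentence_a), dtype=float)
    vec_b = np.asarray(encoder.encode(sentence_b), dtype=float)
    denom = float(np.linalg.norm(vec_a) * np.linalg.norm(vec_b))
    return float(vec_a @ vec_b) / denom if denom else 0.0
```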
  • S03 Determine the highest semantic similarity among the semantic similarities corresponding to the sentences to be matched and the sentences.
  • the preset similarity threshold may be set according to the requirements of the actual application scenario.
  • the preset similarity threshold may be set to 0.9, 0.95, or the like.
  • the preset similarity difference value can be any value from 0.1 to 0.5.
  • Specifically, after the semantic similarity between the sentence to be matched and each of the sentences is determined, since the goal here is to select the sentence that has the highest semantic similarity to the sentence to be matched and therefore matches it best, the highest semantic similarity among the semantic similarities corresponding to the sentence to be matched and each of the sentences is determined, and the highest semantic similarity is compared with the preset similarity threshold. When the highest semantic similarity is less than the preset similarity threshold, the difference between the highest semantic similarity and the preset similarity threshold is determined and compared with the preset similarity difference; when the difference is smaller than the preset similarity difference, the sentence corresponding to the highest semantic similarity is recorded as the target sample sentence.
  • When the preset similarity recognition model determines that the semantic similarity between two sentences is less than the preset similarity threshold, it judges the two sentences to be dissimilar. For a sentence whose semantic similarity is below the preset similarity threshold, it is therefore checked whether the difference between that semantic similarity and the preset similarity threshold is less than the preset similarity difference, and if so, a further semantic similarity judgment is performed through steps S10-S40.
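  • A minimal sketch of this pre-screening step follows, reusing the hypothetical semantic_similarity helper above; the example threshold of 0.9 and difference of 0.3 are taken from the ranges mentioned in this description, and the prescreen name is an assumption:

```python
def prescreen(sentence_to_match: str, sentences: list[str], encoder,
              sim_threshold: float = 0.9, sim_difference: float = 0.3):
    """Either return a direct semantic match, or flag the best candidate for
    the further sliding-window check of steps S10-S40."""
    scored = [(semantic_similarity(sentence_to_match, s, encoder), s) for s in sentences]
    highest_similarity, best_sentence = max(scored)
    if highest_similarity >= sim_threshold:
        # The best sentence is directly recorded as the semantic matching sentence.
        return "matched", best_sentence
    if sim_threshold - highest_similarity < sim_difference:
        # Borderline case: keep it as the target sample sentence for steps S10-S40.
        return "needs_sliding_window_check", best_sentence
    return "no_match", None
```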
  • the method further includes:
  • the sentence corresponding to the highest semantic similarity is recorded as a semantic matching sentence corresponding to the to-be-matched sentence.
  • When the highest semantic similarity is greater than or equal to the preset similarity threshold, it means that the sentence corresponding to the highest semantic similarity is a sentence that semantically matches the sentence to be matched, so the sentence corresponding to the highest semantic similarity is directly recorded as the semantic matching sentence corresponding to the sentence to be matched.
  • In an embodiment, before step S10, that is, before obtaining the length of the characters to be matched corresponding to the sentence to be matched and the length of the target sample characters corresponding to the target sample sentence, the method further includes:
  • The preset text recognition model may be a word2vec or BERT model trained on a large number of training samples, and it is used to perform word vector conversion on sentences.
  • Specifically, a preset text recognition model is obtained, the sentence to be matched is input into the preset text recognition model, and word embedding processing is performed on the sentence to be matched: the sentence to be matched is segmented into words, and each word is converted into a word vector, so as to obtain the word vectors to be matched corresponding to the sentence to be matched. In the same way, the target sample sentence is input into the preset text recognition model, and word embedding processing is performed on the target sample sentence to obtain the target sample word vectors corresponding to the target sample sentence.
  • After the word vectors to be matched and the target sample word vectors are obtained, the length of the characters to be matched of the sentence to be matched is determined according to the number of word vectors to be matched; at the same time, the target sample character length of the target sample sentence is determined according to the number of target sample word vectors.
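  • The following sketch shows how the character lengths could be derived from the word-embedding step; the tokenize and word_vector calls are assumed stand-ins for the word2vec/BERT preset text recognition model rather than a real API:

```python
def to_word_vectors(sentence: str, text_model) -> list:
    """Hypothetical wrapper around the preset text recognition model: segment
    the sentence into words and convert each word into a word vector."""
    tokens = text_model.tokenize(sentence)                   # word segmentation (assumed API)
    return [text_model.word_vector(tok) for tok in tokens]   # word embedding (assumed API)

def character_lengths(sentence_to_match: str, target_sample_sentence: str, text_model):
    """Take the number of word vectors obtained for each sentence as its length."""
    vectors_to_match = to_word_vectors(sentence_to_match, text_model)
    target_sample_vectors = to_word_vectors(target_sample_sentence, text_model)
    return len(vectors_to_match), len(target_sample_vectors)
```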
  • the length of the character to be matched is recorded as the window length of the sliding window.
  • the target sample field refers to the character segment covered by the sliding window in the target sample sentence.
  • The word-sense distance result represents whether key semantic information is shared between the sentence to be matched and the target sample sentence.
  • In an embodiment, step S30 includes:
  • S301 Align the start characters of the target sample sentence with the start characters of the to-be-matched sentence, and record the first target sample field covered by the sliding window as the first intercepted sentence.
  • the starting character refers to the character at the starting position (ie, the first position) in the sentence.
  • the length of the first intercepted sentence is equal to the length of the window and also equal to the length of the characters to be matched.
  • S302 Perform semantic matching on the first intercepted sentence and the to-be-matched sentence to obtain a semantic result of the to-be-matched sentence and the first intercepted sentence.
  • a word sense result can be regarded as a word sense distance value, that is, it represents the word sense distance between each intercepted sentence and the sentence to be matched.
  • Specifically, the first intercepted sentence and the sentence to be matched are semantically matched to obtain a word-sense result of the sentence to be matched and the first intercepted sentence, where the word-sense result represents whether the first intercepted sentence and the sentence to be matched are semantically similar.
  • The semantic matching here makes its judgment based on sentence structure information, where the sentence structure information represents whether the character composition of the first intercepted sentence and the sentence to be matched is similar, or whether their sentence structures are similar (for example, a subject-predicate-object structure), and this can be used as a supplement to the semantic information.
  • In this way, the word-sense result represents the relationship between the first intercepted sentence and the sentence to be matched.
  • S304 Perform semantic matching on the second intercepted sentence and the to-be-matched sentence to obtain a semantic result of the to-be-matched sentence and the second intercepted sentence.
  • Specifically, the second intercepted sentence and the sentence to be matched are semantically matched to obtain a word-sense result of the sentence to be matched and the second intercepted sentence, where the word-sense result represents whether the second intercepted sentence and the sentence to be matched are semantically similar.
  • the end character refers to the last character in the sentence.
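  • Putting steps S301 to S305 together, the sketch below slides the window one character to the right at a time over the target sample sentence and records one word-sense result per position; word_sense_result is a hypothetical stand-in (a simple character-overlap ratio) for the structure-aware semantic matching described above:

```python
def word_sense_result(intercepted_sentence: str, sentence_to_match: str) -> float:
    """Hypothetical word-sense result between one intercepted sentence and the
    sentence to be matched; a character-overlap ratio is used as a placeholder."""
    if not sentence_to_match:
        return 0.0
    overlap = sum(1 for a, b in zip(intercepted_sentence, sentence_to_match) if a == b)
    return overlap / len(sentence_to_match)

def first_word_sense_distance_result(sentence_to_match: str,
                                     target_sample_sentence: str) -> list[float]:
    """Slide a window whose length equals the characters to be matched over the
    target sample sentence and collect the word-sense results (S301-S305)."""
    window_length = len(sentence_to_match)   # window length = length of characters to be matched
    results = []
    # Start with the window aligned to the starting character, then move right one
    # character at a time until its end character aligns with the sentence end.
    for start in range(len(target_sample_sentence) - window_length + 1):
        intercepted_sentence = target_sample_sentence[start:start + window_length]
        results.append(word_sense_result(intercepted_sentence, sentence_to_match))
    return results
```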
  • S40 Determine a first semantic score between the sentence to be matched and the target sample sentence according to the result of the first semantic distance corresponding to the sentence to be matched and the target sample sentence.
  • The first semantic score indicates the semantic similarity between the target sample sentence and the sentence to be matched. A higher first semantic score indicates that the target sample sentence contains more key semantic information matching the sentence to be matched.
  • In an embodiment, step S40, that is, determining the first semantic score between the sentence to be matched and the target sample sentence according to the first word-sense distance result corresponding to the sentence to be matched and the target sample sentence, includes:
  • S401 Perform derivation processing on a first word sense distance result corresponding to the sentence to be matched and the target sample sentence, to obtain a word meaning curve corresponding to the first word meaning distance result.
  • the first word sense distance result can be regarded as a continuous density sequence formed by integrating multiple word sense results, and a word sense curve corresponding to the first word sense distance result can be obtained by derivation of the first word sense distance result.
  • S402 Determine whether there is a word-meaning peak in the word-meaning curve through a peak-seeking identification algorithm.
  • the peak-seeking identification algorithm is used to find out whether a word-meaning peak appears in the word-meaning curve, and the word-meaning peak is used to represent the first semantic score between the sentence to be matched and the target sample sentence corresponding to the word-meaning curve.
  • the peak-seeking identification algorithm can perform a global search in the word meaning curve. During the global search process, if the word meaning curve has a point where the curve first rises and then falls, it is a word meaning peak.
  • Specifically, the peak-seeking identification algorithm is used to find out whether there is a word-meaning peak in the word-meaning curve. If there is a word-meaning peak in the word-meaning curve, the first semantic score between the sentence to be matched and the target sample sentence corresponding to the word-meaning curve can be determined according to the word-meaning peak. For example, the larger the peak value of the word-meaning peak, the higher the first semantic score; or the larger the area occupied by the word-meaning peak, the higher the first semantic score.
  • the preset score threshold may be determined according to different application scenarios, and for example, the preset score threshold may be a value such as 90 or 95.
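  • A minimal sketch of steps S401 to S404 follows, operating on the distance result produced above; numpy's gradient is used as the derivation step, a simple rise-then-fall test stands in for the peak-seeking identification algorithm, and the scaling of the peak value to a 0-100 score (so that it can be compared with thresholds such as 90 or 95) is an assumption:

```python
import numpy as np

def first_semantic_score(distance_result: list[float]) -> float:
    """Derive the word-meaning curve from the first word-sense distance result and
    score it by its highest word-meaning peak (S401-S404)."""
    if len(distance_result) < 3:
        return 0.0
    curve = np.gradient(np.asarray(distance_result, dtype=float))   # derivation step (S401)
    # Peak-seeking (S402): a point where the curve first rises and then falls.
    peaks = [curve[i] for i in range(1, len(curve) - 1)
             if curve[i - 1] < curve[i] > curve[i + 1]]
    if not peaks:
        return 0.0   # no word-meaning peak, so the first semantic score is 0 (S404)
    # The larger the peak value, the higher the first semantic score (S403);
    # the 0-100 scaling is an illustrative assumption.
    return max(0.0, 100.0 * float(max(peaks)))

def is_semantic_match(distance_result: list[float], score_threshold: float = 90.0) -> bool:
    """S50: the target sample sentence is a semantic match only when the
    first semantic score exceeds the preset score threshold."""
    return first_semantic_score(distance_result) > score_threshold
```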
  • A sliding window is defined to match the target sample field of the target sample sentence covered by the sliding window with the sentence to be matched to obtain a first word-sense distance result, and the first semantic score between the sentence to be matched and the target sample sentence is then determined according to the first word-sense distance result, so as to determine whether part of the semantic information in the target sample sentence matches the sentence to be matched. In this way, a target sample sentence that would not otherwise be recalled (when the semantic similarity between the target sample sentence and the sentence to be matched is less than the preset similarity threshold, the target sample sentence would be directly determined not to match the sentence to be matched) now has a possibility of being recalled.
  • In an embodiment, after step S10, the method further includes:
  • When the length of the characters to be matched is greater than or equal to the target sample character length, the target sample character length is recorded as the window length of the sliding window.
  • Specifically, the starting character of the target sample sentence is aligned with the starting character of the sentence to be matched, and the first field to be matched covered by the sliding window (that is, the field in the sentence to be matched made up of the same number of characters as the window length) is recorded as the third intercepted sentence.
  • the second semantic score indicates the semantic similarity between the target sample sentence and the sentence to be matched.
  • a higher second semantic score indicates that the target sample sentence contains more key semantic information matching the sentence to be matched.
  • The second semantic score is compared with the preset score threshold; when the second semantic score exceeds the preset score threshold, the target sample sentence is recorded as the semantic matching sentence corresponding to the sentence to be matched, and when the second semantic score does not exceed the preset score threshold, it indicates that the target sample sentence does not semantically match the sentence to be matched.
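  • Combining the two branches, the overall flow might be sketched as follows; it reuses the hypothetical helpers above and simply swaps the roles of the two sentences when the sentence to be matched is the longer one:

```python
def match_long_short(sentence_to_match: str, target_sample_sentence: str,
                     score_threshold: float = 90.0) -> bool:
    """Sketch of the overall long/short sentence matching flow: the shorter side
    defines the sliding window, the window slides over the longer side, and the
    resulting word-sense distance result is scored against the threshold."""
    if len(sentence_to_match) < len(target_sample_sentence):
        # Window length = length of the characters to be matched (S20-S50).
        distance_result = first_word_sense_distance_result(sentence_to_match,
                                                           target_sample_sentence)
        score = first_semantic_score(distance_result)   # first semantic score
    else:
        # Window length = target sample character length; the window slides over
        # the sentence to be matched instead (second branch).
        distance_result = first_word_sense_distance_result(target_sample_sentence,
                                                           sentence_to_match)
        score = first_semantic_score(distance_result)   # second semantic score
    # The target sample sentence is recorded as the semantic matching sentence
    # only when the score exceeds the preset score threshold.
    return score > score_threshold
```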
  • a long-short sentence text semantic matching device is provided, and the long and short sentence text semantic matching device is in one-to-one correspondence with the long-short sentence text semantic matching method in the above embodiment.
  • the apparatus for semantic matching of long and short sentences includes a sentence acquisition module 10 , a first window length recording module 20 , a first sentence matching module 30 , a first semantic score determination module 40 and a first matched sentence determination module 50 .
  • the detailed description of each functional module is as follows:
  • the sentence obtaining module 10 is configured to obtain a sentence to be matched and a target sample sentence, and compare the length of the character to be matched corresponding to the sentence to be matched with the length of the target sample character corresponding to the target sample sentence;
  • the first window length recording module 20 is configured to record the length of the character to be matched as the window length of the sliding window when the length of the character to be matched is less than the length of the character of the target sample;
  • the first sentence matching module 30 is used to slide the sliding window over the target sample sentence, and match the target sample field of the target sample sentence covered by the sliding window with the sentence to be matched to obtain a first semantic distance result;
  • the first semantic score determination module 40 is configured to determine the first semantic score between the sentence to be matched and the target sample sentence according to the first semantic distance result corresponding to the sentence to be matched and the target sample sentence;
  • the first matching sentence determination module 50 is configured to record the target sample sentence as a semantic matching sentence corresponding to the to-be-matched sentence when the first semantic score exceeds a preset score threshold.
  • the long-short sentence text semantic matching device further comprises:
  • a sample text obtaining module used to obtain a sentence to be matched and a target sample text;
  • the target sample text contains a plurality of sentences;
  • a semantic similarity determination module configured to input the to-be-matched sentence and each of the sentences into a preset similarity recognition model to determine the semantic similarity between the to-be-matched sentence and each of the sentences;
  • a highest similarity determination module used for determining the highest semantic similarity among the semantic similarities corresponding to the sentences to be matched and the sentences;
  • the target sample sentence determination module is used to record the sentence corresponding to the highest semantic similarity as the target sample sentence when the highest semantic similarity is smaller than the preset similarity threshold and the difference between the highest semantic similarity and the preset similarity threshold is smaller than the preset similarity difference.
  • the long-short sentence text semantic matching device further comprises:
  • the semantic matching sentence recording module is configured to record the sentence corresponding to the highest semantic similarity as the semantic matching sentence corresponding to the to-be-matched sentence when the highest semantic similarity is greater than or equal to a preset similarity threshold.
  • the long-short sentence text semantic matching device further comprises:
  • the text recognition model acquisition module is used to acquire the preset text recognition model
  • a word vector determination module configured to input the sentence to be matched into the preset text recognition model to obtain a word vector to be matched corresponding to the sentence to be matched; at the same time, input the target sample sentence into the In the preset text recognition model, the target sample word vector corresponding to the target sample sentence is obtained;
  • a character length determination module configured to determine the length of the characters to be matched in the sentence to be matched according to each of the word vectors to be matched; at the same time, determine the length of the target sample characters of the target sample sentence according to each of the target sample word vectors.
  • the first sentence matching module 30 includes the following units:
  • Character alignment unit 301 for aligning the starting character of the target sample sentence with the starting character of the sentence to be matched, and recording the first target sample field covered by the sliding window as the first intercepted sentence;
  • a first semantic matching unit 302 configured to perform semantic matching on the first intercepted sentence and the to-be-matched sentence, to obtain a semantic result of the to-be-matched sentence and the first intercepted sentence;
  • Window sliding unit 303 for sliding the sliding window to the right by one character length on the target sample sentence, and recording the second target sample field covered by the sliding window as the second interception sentence;
  • the second semantic matching unit 304 is configured to perform semantic matching between the second intercepted sentence and the to-be-matched sentence to obtain a semantic result of the to-be-matched sentence and the second intercepted sentence;
  • the lexical distance result recording unit 305 is configured to record all semantic results as the first lexical distance result when it is detected that the end character of the sliding window has been aligned with the end character of the target sample sentence.
  • the first semantic score determination module 40 includes:
  • a word meaning curve determination unit 401 configured to perform derivation processing on the first word meaning distance result corresponding to the sentence to be matched and the target sample sentence, to obtain a word meaning curve corresponding to the first word meaning distance result;
  • a word meaning peak determining unit 402 configured to determine whether there is a word meaning peak in the word meaning curve through a peak-seeking identification algorithm
  • a first semantic score determination unit 403, configured to determine, according to the word meaning peak, a first semantic score between the sentence to be matched and the target sample sentence corresponding to the word meaning curve when there is a word meaning peak in the word meaning curve;
  • the second semantic score determining unit 404 is configured to determine that the first semantic score between the sentence to be matched and the target sample sentence corresponding to the semantic curve is 0 when the word meaning curve does not have a word meaning peak.
  • the long-short sentence text semantic matching device further comprises:
  • a second window length recording module configured to record the length of the target sample character as the window length of the sliding window when the length of the character to be matched is greater than or equal to the length of the target sample character;
  • the second sentence matching module is used to slide the sliding window over the sentence to be matched, and match the to-be-matched field of the sentence to be matched covered by the sliding window with the target sample sentence to obtain a second semantic distance result;
  • a second semantic score determination module configured to determine a second semantic score between the to-be-matched sentence and the target sample sentence according to the second semantic distance result corresponding to the to-be-matched sentence and the target sample sentence;
  • the second matching sentence determination module is configured to record the target sample sentence as a semantic matching sentence corresponding to the to-be-matched sentence when the second semantic score exceeds the preset score threshold.
  • Each module in the above-mentioned apparatus for semantic matching of long and short sentences can be implemented in whole or in part by software, hardware and combinations thereof.
  • the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • a computer device is provided, and the computer device may be a server, and its internal structure diagram may be as shown in FIG. 9 .
  • the computer device includes a processor, memory, a network interface, and a database connected by a system bus. Among them, the processor of the computer device is used to provide computing and control capabilities.
  • the memory of the computer device includes a readable storage medium and an internal memory.
  • the readable storage medium stores an operating system, computer readable instructions and a database.
  • the internal memory provides an environment for the execution of the operating system and computer-readable instructions in the readable storage medium.
  • the database of the computer device is used to store the data used in the text semantic matching method of long and short sentences in the above embodiment.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • the computer readable instructions when executed by a processor, implement a method for semantic matching of long and short sentences.
  • the readable storage medium provided by this embodiment includes a non-volatile readable storage medium and a volatile readable storage medium.
  • A computer device is provided, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
  • when the length of the characters to be matched is less than the length of the target sample characters, the length of the characters to be matched is recorded as the window length of the sliding window;
  • the target sample sentence is recorded as a semantic matching sentence corresponding to the to-be-matched sentence.
  • One or more readable storage media are provided, having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
  • when the length of the characters to be matched is less than the length of the target sample characters, the length of the characters to be matched is recorded as the window length of the sliding window;
  • the target sample sentence is recorded as a semantic matching sentence corresponding to the to-be-matched sentence.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Machine Translation (AREA)

Abstract

Disclosed are a text semantic matching method and apparatus for long and short sentences, a computer device, and a storage medium. The method comprises the following steps: obtaining a sentence to be matched and a target sample sentence, and comparing the length of the characters to be matched corresponding to the sentence to be matched with the length of the target sample characters corresponding to the target sample sentence (S10); when the length of the characters to be matched is less than the length of the target sample characters, recording the length of the characters to be matched as the window length of a sliding window (S20); sliding the sliding window over the target sample sentence, and matching a target sample field of the target sample sentence covered by the sliding window with the sentence to be matched to obtain a first semantic distance result (S30); determining a first semantic score between the sentence to be matched and the target sample sentence according to the first semantic distance result corresponding to the sentence to be matched and the target sample sentence (S40); and when the first semantic score exceeds a preset score threshold, recording the target sample sentence as a semantic matching sentence corresponding to the sentence to be matched (S50). The method improves the accuracy of semantic matching between long and short sentences.
PCT/CN2021/083780 2020-12-01 2021-03-30 Text semantic matching method and apparatus for long and short sentences, computer device and storage medium WO2022116436A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011382663.6A CN112446218A (zh) 2020-12-01 2020-12-01 长短句文本语义匹配方法、装置、计算机设备及存储介质
CN202011382663.6 2020-12-01

Publications (1)

Publication Number Publication Date
WO2022116436A1 (fr)

Family

ID=74739213

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083780 WO2022116436A1 (fr) 2020-12-01 2021-03-30 Text semantic matching method and apparatus for long and short sentences, computer device and storage medium

Country Status (2)

Country Link
CN (1) CN112446218A (fr)
WO (1) WO2022116436A1 (fr)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112446218A (zh) * 2020-12-01 2021-03-05 平安科技(深圳)有限公司 长短句文本语义匹配方法、装置、计算机设备及存储介质
CN113255370B (zh) * 2021-06-22 2022-09-20 中国平安财产保险股份有限公司 基于语义相似度的行业类型推荐方法、装置、设备及介质
CN114330232A (zh) * 2021-12-29 2022-04-12 北京字节跳动网络技术有限公司 一种文本显示方法、装置、设备及存储介质
CN114743012B (zh) * 2022-04-08 2024-02-06 北京金堤科技有限公司 一种文本识别方法及装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200257858A1 (en) * 2018-04-18 2020-08-13 Microsoft Technology Licensing, Llc Multi-scale model for semantic matching
CN110852056A (zh) * 2018-07-25 2020-02-28 中兴通讯股份有限公司 一种获取文本相似度的方法、装置、设备及可读存储介质
CN109460455A (zh) * 2018-10-25 2019-03-12 第四范式(北京)技术有限公司 一种文本检测方法及装置
CN111274822A (zh) * 2018-11-20 2020-06-12 华为技术有限公司 语义匹配方法、装置、设备及存储介质
CN112446218A (zh) * 2020-12-01 2021-03-05 平安科技(深圳)有限公司 长短句文本语义匹配方法、装置、计算机设备及存储介质

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115511014A (zh) * 2022-11-23 2022-12-23 联仁健康医疗大数据科技股份有限公司 信息匹配方法、装置、设备及存储介质
CN115511014B (zh) * 2022-11-23 2023-04-07 联仁健康医疗大数据科技股份有限公司 信息匹配方法、装置、设备及存储介质
CN115577092A (zh) * 2022-12-09 2023-01-06 深圳市人马互动科技有限公司 用户话句处理方法、装置、电子设备及计算机存储介质
CN115577092B (zh) * 2022-12-09 2023-03-24 深圳市人马互动科技有限公司 用户话句处理方法、装置、电子设备及计算机存储介质

Also Published As

Publication number Publication date
CN112446218A (zh) 2021-03-05

Legal Events

Date Code Title Description

121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21899480; Country of ref document: EP; Kind code of ref document: A1)

NENP Non-entry into the national phase (Ref country code: DE)

122 Ep: pct application non-entry in european phase (Ref document number: 21899480; Country of ref document: EP; Kind code of ref document: A1)