WO2022121171A1 - Similar text matching method and apparatus, electronic device, and computer storage medium - Google Patents

Similar text matching method and apparatus, electronic device, and computer storage medium

Info

Publication number
WO2022121171A1
Authority
WO
WIPO (PCT)
Prior art keywords
standard
text
semantic representation
target
word
Prior art date
Application number
PCT/CN2021/083714
Other languages
English (en)
Chinese (zh)
Inventor
谢静文
阮晓雯
徐亮
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2022121171A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Definitions

  • the present application relates to the technical field of speech semantics, and in particular, to a similar text matching method, apparatus, electronic device, and computer-readable storage medium.
  • the inventor realized that current similar text matching methods are mostly keyword-based: keywords are extracted from each text, the keywords of different texts are compared to obtain their degree of coincidence, and the similarity between the texts is judged according to that degree of coincidence.
  • however, because the wording of different texts is inconsistent, this method often cannot accurately match texts similar to the target text. Therefore, how to improve the matching accuracy of similar texts has become an urgent problem to be solved.
  • a similar text matching method including:
  • the standard text corresponding to the standard semantic representation of which the matching probability is greater than a preset probability threshold is a text similar to the target text.
  • a similar text matching device includes:
  • a feature word extraction module used to obtain standard text, and extract feature words from the standard text to obtain standard feature words
  • a standard representation building module for constructing a standard semantic representation corresponding to the standard feature word
  • a key-value pair table generating module configured to generate a standard key-value pair table according to the standard feature word and the standard semantic representation
  • the target representation building module is used to obtain the target text, perform feature word extraction on the target text, obtain the target feature word, and construct the target semantic representation corresponding to the target feature word;
  • the similarity calculation module is used to calculate the similarity between the target feature word and the standard feature word in the standard key-value pair table, and determine that the standard semantic representation corresponding to the standard feature word whose similarity is greater than the preset similarity threshold is to be matching semantic representations;
  • a representation matching module configured to perform representation matching between the target semantic representation and the to-be-matched semantic representation to obtain a matching probability between the target semantic representation and the standard semantic representation;
  • a text screening module configured to determine that the standard text corresponding to the standard semantic representation whose matching probability is greater than a preset probability threshold is a similar text to the target text.
  • An electronic device comprising:
  • a processor that executes the instructions stored in the memory to achieve the following steps:
  • the standard text corresponding to the standard semantic representation of which the matching probability is greater than a preset probability threshold is a text similar to the target text.
  • a computer-readable storage medium having at least one instruction stored in the computer-readable storage medium, the at least one instruction being executed by a processor in an electronic device to implement the following steps:
  • the standard text corresponding to the standard semantic representation of which the matching probability is greater than a preset probability threshold is a text similar to the target text.
  • the present application can solve the problem of low matching accuracy of similar texts.
  • FIG. 1 is a schematic flowchart of a similar text matching method provided by an embodiment of the present application.
  • FIG. 2 is a functional block diagram of a similar text matching apparatus provided by an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of an electronic device for implementing the similar text matching method provided by an embodiment of the present application
  • FIG. 4 is an example diagram of a standard key-value pair table in an embodiment of the present application.
  • the embodiment of the present application provides a similar text matching method.
  • the execution body of the similar text matching method includes, but is not limited to, at least one of electronic devices that can be configured to execute the method provided by the embodiments of the present application, such as a server and a terminal.
  • the similar text matching method can be executed by software or hardware installed in a terminal device or a server device, and the software can be a blockchain platform.
  • the server includes but is not limited to: a single server, a server cluster, a cloud server or a cloud server cluster, and the like.
  • the similar text matching method includes:
  • the standard text is any text, for example, news text, novel paragraph text, or paper text, and the like.
  • the standard text can be obtained from a blockchain node used for storing standard text by using a Python statement with a data capture function, and the high data throughput of the blockchain node can be used to improve the efficiency of acquiring the standard text.
  • the feature word extraction is performed on the standard text to obtain standard feature words, including:
  • the plurality of text word segmentations are screened according to the word segmentation index to obtain standard feature words.
  • performing word segmentation processing on the standard text to obtain multiple text segmentations including:
  • the preset stop word lexicon and the preset standard lexicon are lexicons that each contain multiple word segmentations.
  • the preset stop word lexicon stores word segmentations of multiple stop words, for example, "Sur" and "Ruci".
  • the preset standard lexicon contains multiple non-stop word segmentations, for example, "eat" and "sleep".
  • the embodiment of the present application performs word segmentation processing on standard text, and can divide a standard text with a relatively large length into multiple word segmentations.
  • the analysis and processing of multiple word segmentations is more efficient and accurate than processing directly through standard text.
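  • the individual segmentation steps are not reproduced in this text, so the following is only a minimal sketch of one common approach: forward maximum matching against the standard lexicon, followed by removal of stop words. The function name `segment`, the `max_len` parameter, and the sample lexicons are assumptions made for illustration and do not come from the application.

```python
def segment(text, standard_lexicon, stop_words, max_len=4):
    """Split text by forward maximum matching, then drop stop words."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; a single character always matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in standard_lexicon:
                tokens.append(piece)
                i += size
                break
    return [t for t in tokens if t not in stop_words and not t.isspace()]


standard_lexicon = {"喜欢", "吃饭", "睡觉"}  # hypothetical standard lexicon
stop_words = {"我", "和"}                    # hypothetical stop word lexicon
print(segment("我喜欢吃饭和睡觉", standard_lexicon, stop_words))
```

  • in this sketch, characters that match no lexicon entry fall back to single-character segments, which are then kept or discarded by the stop word filter.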
  • the word segmentation index refers to an index that can reflect the importance of word segmentation, for example, a frequency index indicating the frequency of occurrence of word segmentation, a weight index indicating the weight of word segmentation, and the like.
  • the use of the index algorithm to calculate the word segmentation index of each word segmentation in the plurality of text segmentations includes:
  • the following index algorithm is used to calculate the word segmentation index of each word segment in the plurality of text word segments:
  • TF_i is the frequency of occurrence of word segment i in the multiple text segments
  • IDF_i is the inverse of the frequency of word segment i in the multiple text segments.
  • the multiple text word segmentations are screened by comparing the size of the word segmentation indexes, that is, the text segmentations whose word segmentation index is greater than a preset index threshold are selected as standard feature words.
  • feature word extraction is performed on standard text, which can reduce the amount of data in subsequent matching, and is beneficial to improve the matching efficiency of similar texts.
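  • the index formula itself is not reproduced in this text. As a sketch only, the following assumes the conventional TF-IDF product with a smoothed inverse document frequency computed over a hypothetical reference corpus, then applies the threshold screening described above. The names `tfidf_scores` and `select_feature_words`, the smoothing, and the sample data are assumptions.

```python
import math
from collections import Counter


def tfidf_scores(doc_tokens, corpus_tokens):
    """Return an assumed TF-IDF index for every distinct token of doc_tokens."""
    counts = Counter(doc_tokens)
    total = len(doc_tokens)
    n_docs = len(corpus_tokens)
    scores = {}
    for word, count in counts.items():
        tf = count / total  # term frequency inside this text
        df = sum(1 for doc in corpus_tokens if word in doc)  # texts containing the word
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0  # smoothed inverse frequency
        scores[word] = tf * idf
    return scores


def select_feature_words(scores, threshold):
    """Keep the tokens whose index is strictly greater than the threshold."""
    return sorted(word for word, score in scores.items() if score > threshold)


doc = ["patent", "text", "patent", "matching"]
corpus = [doc, ["text", "news"], ["text", "novel"]]
scores = tfidf_scores(doc, corpus)
print(select_feature_words(scores, threshold=0.4))
```

  • here "text" appears in every reference text and therefore receives a low index, while "patent" and "matching" pass the threshold.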
  • the construction of the standard semantic representation corresponding to the standard feature word includes:
  • the text within a preset length range before and after the standard feature word is used as the standard semantic representation corresponding to the standard feature word.
  • in this way, the embodiment of the present application restores the concrete text from which the standard feature word was abstracted, so as to enrich the semantics of the standard feature word, which is beneficial to improving the accuracy of similar text matching.
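  • the window extraction described above can be sketched as follows. The function name `semantic_representation`, the character-based `window` parameter, and the choice to use only the first occurrence of the feature word are assumptions for illustration.

```python
def semantic_representation(text, feature_word, window=10):
    """Return the feature word plus `window` characters of context on each side.

    Only the first occurrence is used; None is returned if the word is absent.
    """
    pos = text.find(feature_word)
    if pos == -1:
        return None
    start = max(0, pos - window)
    end = min(len(text), pos + len(feature_word) + window)
    return text[start:end]


sentence = "the court accepted the patent application last week"
print(semantic_representation(sentence, "patent", window=4))
```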
  • the generating a standard key-value pair table according to the standard feature word and the standard semantic representation includes:
  • the multiple standard feature words are respectively used as primary keys in the standard key-value pair table
  • the standard semantic representation corresponding to the plurality of standard feature words is used as the primary key value of the primary key in the standard key-value pair table to obtain a standard key-value pair table.
  • FIG. 4 is an example diagram of a standard key-value pair table in the embodiment of the present application.
  • different standard feature words are primary keys, and the standard semantic representation corresponding to a standard feature word can be uniquely found according to that standard feature word.
  • standard feature words and standard semantic representations are stored in the standard key-value pair table in the form of key-value pairs, and using the standard key-value pair table can improve the efficiency of subsequent similar text matching.
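  • as a sketch, the standard key-value pair table can be modelled with an ordinary dictionary whose keys are the standard feature words. The function name `build_standard_table` and the rejection of duplicate keys are assumptions chosen to mirror the uniqueness of a primary key.

```python
def build_standard_table(feature_words, representations):
    """Map each standard feature word (primary key) to its semantic representation."""
    if len(feature_words) != len(representations):
        raise ValueError("each feature word needs exactly one representation")
    table = {}
    for word, representation in zip(feature_words, representations):
        if word in table:
            # A primary key must identify exactly one value.
            raise ValueError(f"duplicate primary key: {word!r}")
        table[word] = representation
    return table


table = build_standard_table(
    ["patent", "matching"],
    ["the patent app", "text matching method"],
)
print(table["patent"])
```

  • a dictionary lookup by key takes constant time on average, which is consistent with the efficiency benefit described above.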
  • the target text includes any text for which similar texts need to be matched, and the target text is analyzed to determine whether the standard text is similar to the target text.
  • the target text can be uploaded by the user.
  • the steps of extracting feature words from the target text to obtain target feature words are the same as the steps of extracting feature words from the standard text in step S1 to obtain standard feature words, and are not repeated here.
  • the step of constructing the target semantic representation corresponding to the target feature word is the same as the step of constructing the standard semantic representation corresponding to the standard feature word in step S2, which is not repeated here.
  • the calculation of the similarity between the target feature word and the standard feature word in the standard key-value pair table includes:
  • R is the target feature word
  • S is the standard feature word
  • Pearson is the similarity operation
  • Sim is the similarity between the target feature word and the standard feature word in the standard key-value pair table.
  • the embodiment of the present application determines that the standard semantic representation corresponding to the standard feature word whose similarity is greater than the preset similarity threshold is the semantic representation to be matched.
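  • for example, if the similarity between target feature word A and standard feature word B is 40, the similarity between target feature word A and standard feature word C is 50, the similarity between target feature word A and standard feature word D is 60, and the preset similarity threshold is 55, then only the standard semantic representation corresponding to standard feature word D is determined to be a semantic representation to be matched.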
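  • the similarity formula is not reproduced in this text. The sketch below assumes that each feature word is already available as a numeric vector (for instance, a word embedding) and computes the Pearson correlation coefficient between two such vectors, then keeps the representations whose similarity exceeds the threshold. The names `pearson` and `representations_to_match`, the vector inputs, and the sample values are assumptions.

```python
import math


def pearson(r, s):
    """Pearson correlation coefficient between two equal-length numeric vectors."""
    n = len(r)
    if n != len(s) or n < 2:
        raise ValueError("vectors must have the same length of at least 2")
    mean_r = sum(r) / n
    mean_s = sum(s) / n
    cov = sum((a - mean_r) * (b - mean_s) for a, b in zip(r, s))
    var_r = sum((a - mean_r) ** 2 for a in r)
    var_s = sum((b - mean_s) ** 2 for b in s)
    if var_r == 0 or var_s == 0:
        return 0.0  # a constant vector has no defined correlation
    return cov / math.sqrt(var_r * var_s)


def representations_to_match(target_vec, standard_vecs, table, threshold):
    """Select representations whose feature word similarity exceeds the threshold."""
    return {
        word: table[word]
        for word, vec in standard_vecs.items()
        if pearson(target_vec, vec) > threshold
    }


table = {"B": "rep of B", "C": "rep of C", "D": "rep of D"}
standard_vecs = {"B": [3, 2, 1], "C": [1, 2, 1], "D": [2, 4, 6]}
print(representations_to_match([1, 2, 3], standard_vecs, table, threshold=0.55))
```

  • the coefficient computed here lies between -1 and 1, so the threshold in the sketch is expressed on that scale rather than on the 40/50/60 scale used in the example above.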
  • the performing the representation matching between the target semantic representation and the to-be-matched semantic representation to obtain a matching probability between the target semantic representation and the standard semantic representation includes:
  • a probability operation is performed on the first representation vector and the second representation vector by using a pre-trained matching model to obtain a matching probability between the target semantic representation and the standard semantic representation.
  • performing word vector transformation on the target semantic representation to obtain a first representation vector including:
  • the byte vector set includes a byte vector of each byte in the target semantic representation
  • the byte vectors corresponding to each byte in the target semantic representation are spliced respectively to obtain the first representation vector.
  • for example, byte 1, byte 2, and byte 3 exist in the target semantic representation, where the byte vector corresponding to byte 1 is byte vector a, the byte vector corresponding to byte 2 is byte vector b, and the byte vector corresponding to byte 3 is byte vector c; the byte vectors corresponding to each byte are then spliced in order to obtain the first representation vector abc.
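  • the splicing step can be sketched as a simple concatenation. The sketch treats each character of the representation as one "byte" and looks its vector up in a hypothetical `byte_vectors` mapping; both the function name `splice_vectors` and the sample vectors are assumptions.

```python
def splice_vectors(representation, byte_vectors):
    """Concatenate the vector of each unit of the representation, in order."""
    spliced = []
    for unit in representation:
        spliced.extend(byte_vectors[unit])
    return spliced


byte_vectors = {"a": [1, 0], "b": [0, 1], "c": [1, 1]}  # hypothetical byte vector set
print(splice_vectors("abc", byte_vectors))
```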
  • the steps of converting the standard semantic representation to word vectors to obtain the second representation vector are the same as the steps of converting the target semantic representation to word vectors to obtain the first representation vector, which will not be repeated here.
  • the embodiment of the present application inputs the first representation vector and the second representation vector into a pre-trained matching model, and uses the matching model to calculate the matching probability between the first representation vector and the second representation vector.
  • the matching model adopts a multi-hop model; the multi-hop model includes but is not limited to the CogQA model and the AnsweringTasks model. Using the multi-hop model as the matching model to perform the probability operation on the first representation vector and the second representation vector can improve the efficiency of calculating the matching probability and helps to improve the accuracy of the calculated matching probability.
  • if the matching probability is less than or equal to a preset probability threshold, it is determined that the standard text corresponding to the standard semantic representation is not a text similar to the target text; if the matching probability is greater than the probability threshold, it is determined that the standard text corresponding to the standard semantic representation is a text similar to the target text.
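  • the screening rule can be sketched as below. The matching probabilities are taken as given, since the pre-trained matching model is outside the scope of this sketch; the function name `screen_similar_texts` and the two input mappings are assumptions.

```python
def screen_similar_texts(match_probs, source_text, prob_threshold):
    """Return the standard texts whose matching probability exceeds the threshold.

    match_probs maps a standard semantic representation to its matching probability;
    source_text maps the same representation to the standard text it came from.
    """
    similar = []
    for representation, probability in match_probs.items():
        # Strictly greater: a probability equal to the threshold is not similar.
        if probability > prob_threshold and source_text[representation] not in similar:
            similar.append(source_text[representation])
    return similar


match_probs = {"rep 1": 0.91, "rep 2": 0.80, "rep 3": 0.35}
source_text = {"rep 1": "news text", "rep 2": "novel text", "rep 3": "paper text"}
print(screen_similar_texts(match_probs, source_text, prob_threshold=0.80))
```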
  • the similar text matching method proposed in this application can solve the problem of low matching accuracy of similar texts.
  • FIG. 2 is a functional block diagram of a similar text matching apparatus provided by an embodiment of the present application.
  • the similar text matching apparatus 100 described in this application can be installed in an electronic device.
  • the similar text matching apparatus 100 may include a feature word extraction module 101, a standard representation building module 102, a key-value pair table generating module 103, a target representation building module 104, a similarity calculation module 105, a representation matching module 106, and a text screening module 107.
  • the modules described in this application may also be referred to as units, which refer to a series of computer program segments that can be executed by the processor of the electronic device and can perform fixed functions, and are stored in the memory of the electronic device.
  • each module/unit is as follows:
  • the feature word extraction module 101 is used for obtaining standard text, and extracting feature words from the standard text to obtain standard feature words.
  • the standard text is any text, for example, news text, novel paragraph text, or paper text, and the like.
  • the standard text can be obtained from a blockchain node used for storing standard text by using a Python statement with a data capture function, and the high data throughput of the blockchain node can be used to improve the efficiency of acquiring the standard text.
  • the feature word extraction module 101 is specifically used for:
  • the plurality of text word segmentations are screened according to the word segmentation index to obtain standard feature words.
  • performing word segmentation processing on the standard text to obtain multiple text segmentations including:
  • the preset stop word lexicon and the preset standard lexicon are lexicons that each contain multiple word segmentations.
  • the preset stop word lexicon stores word segmentations of multiple stop words, for example, "Sur" and "Ruci".
  • the preset standard lexicon contains multiple non-stop word segmentations, for example, "eat" and "sleep".
  • the embodiment of the present application performs word segmentation processing on standard text, and can divide a standard text with a relatively large length into multiple word segmentations.
  • the analysis and processing of multiple word segmentations is more efficient and accurate than processing directly through standard text.
  • the word segmentation index refers to an index that can reflect the importance of word segmentation, for example, a frequency index indicating the frequency of occurrence of word segmentation, a weight index indicating the weight of word segmentation, and the like.
  • the use of the index algorithm to calculate the word segmentation index of each word segmentation in the plurality of text segmentations includes:
  • the following index algorithm is used to calculate the word segmentation index of each word segment in the plurality of text word segments:
  • TF_i is the frequency of occurrence of word segment i in the multiple text segments
  • IDF_i is the inverse of the frequency of word segment i in the multiple text segments.
  • the embodiment of the present application screens the multiple text word segmentations by comparing the size of the word segmentation indexes, that is, the text segmentations whose word segmentation index is greater than a preset index threshold are selected as standard feature words.
  • feature word extraction is performed on standard text, which can reduce the amount of data in subsequent matching, and is beneficial to improve the matching efficiency of similar texts.
  • the standard representation building module 102 is configured to construct a standard semantic representation corresponding to the standard feature word.
  • the standard representation building module 102 is specifically used for:
  • the text within a preset length range before and after the standard feature word is used as the standard semantic representation corresponding to the standard feature word.
  • in this way, the embodiment of the present application restores the concrete text from which the standard feature word was abstracted, so as to enrich the semantics of the standard feature word, which is beneficial to improving the accuracy of similar text matching.
  • the key-value pair table generating module 103 is configured to generate a standard key-value pair table according to the standard feature word and the standard semantic representation.
  • the key-value pair table generating module 103 is specifically used for:
  • the multiple standard feature words are respectively used as primary keys in the standard key-value pair table
  • the standard semantic representation corresponding to the plurality of standard feature words is used as the primary key value of the primary key in the standard key-value pair table to obtain a standard key-value pair table.
  • FIG. 4 is an example diagram of a standard key-value pair table in the embodiment of the present application.
  • different standard feature words are primary keys, and the standard semantic representation corresponding to a standard feature word can be uniquely found according to that standard feature word.
  • standard feature words and standard semantic representations are stored in the standard key-value pair table in the form of key-value pairs, and using the standard key-value pair table can improve the efficiency of subsequent similar text matching.
  • the target representation building module 104 is configured to acquire target text, extract feature words from the target text, obtain target feature words, and construct target semantic representations corresponding to the target feature words.
  • the target text includes any text for which similar texts need to be matched, and the target text is analyzed to determine whether the standard text is similar to the target text.
  • the target text can be uploaded by the user.
  • the step of extracting feature words from the target text to obtain target feature words is consistent with the step of extracting feature words from the standard text by the feature word extraction module 101 to obtain standard feature words, and details are not repeated here.
  • the step of constructing the target semantic representation corresponding to the target feature word is consistent with the step of constructing the standard semantic representation corresponding to the standard feature word by the standard representation building module 102, and details are not repeated here.
  • the similarity calculation module 105 is used to calculate the similarity between the target feature word and the standard feature word in the standard key-value pair table, and determine that the standard semantic representation corresponding to the standard feature word whose similarity is greater than a preset similarity threshold is the semantic representation to be matched.
  • the similarity calculation module 105 is specifically used for:
  • R is the target feature word
  • S is the standard feature word
  • Pearson is the similarity operation
  • Sim is the similarity between the target feature word and the standard feature word in the standard key-value pair table.
  • the embodiment of the present application determines that the standard semantic representation corresponding to the standard feature word whose similarity is greater than the preset similarity threshold is the semantic representation to be matched.
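  • for example, if the similarity between target feature word A and standard feature word B is 40, the similarity between target feature word A and standard feature word C is 50, the similarity between target feature word A and standard feature word D is 60, and the preset similarity threshold is 55, then only the standard semantic representation corresponding to standard feature word D is determined to be a semantic representation to be matched.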
  • the representation matching module 106 is configured to perform representation matching between the target semantic representation and the to-be-matched semantic representation to obtain a matching probability between the target semantic representation and the standard semantic representation.
  • the representation matching module 106 is specifically used for:
  • a probability operation is performed on the first representation vector and the second representation vector by using a pre-trained matching model to obtain a matching probability between the target semantic representation and the standard semantic representation.
  • performing word vector transformation on the target semantic representation to obtain a first representation vector including:
  • the byte vector set includes a byte vector of each byte in the target semantic representation
  • the byte vectors corresponding to each byte in the target semantic representation are spliced respectively to obtain the first representation vector.
  • for example, byte 1, byte 2, and byte 3 exist in the target semantic representation, where the byte vector corresponding to byte 1 is byte vector a, the byte vector corresponding to byte 2 is byte vector b, and the byte vector corresponding to byte 3 is byte vector c; the byte vectors corresponding to each byte are then spliced in order to obtain the first representation vector abc.
  • the steps of converting the standard semantic representation to word vectors to obtain the second representation vector are the same as the steps of converting the target semantic representation to word vectors to obtain the first representation vector, which will not be repeated here.
  • the embodiment of the present application inputs the first representation vector and the second representation vector into a pre-trained matching model, and uses the matching model to calculate the matching probability between the first representation vector and the second representation vector.
  • the matching model adopts a multi-hop model; the multi-hop model includes but is not limited to the CogQA model and the AnsweringTasks model. Using the multi-hop model as the matching model to perform the probability operation on the first representation vector and the second representation vector can improve the efficiency of calculating the matching probability and helps to improve the accuracy of the calculated matching probability.
  • the text screening module 107 is configured to determine that the standard text corresponding to the standard semantic representation whose matching probability is greater than a preset probability threshold is a similar text to the target text.
  • if the matching probability is less than or equal to a preset probability threshold, it is determined that the standard text corresponding to the standard semantic representation is not a text similar to the target text; if the matching probability is greater than the probability threshold, it is determined that the standard text corresponding to the standard semantic representation is a text similar to the target text.
  • the similar text matching device proposed in the present application can solve the problem of low matching accuracy of similar texts.
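  • how modules 101 to 107 fit together can be sketched as a small pipeline in which each module is supplied as a callable. This is an illustrative arrangement only: the class name `SimilarTextMatcher`, its field names, and the toy callables in the usage lines are assumptions, and the real similarity and matching models are replaced by stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SimilarTextMatcher:
    extract: Callable[[str], List[str]]      # feature word extraction (101, 104)
    represent: Callable[[str, str], str]     # representation building (102, 104)
    similarity: Callable[[str, str], float]  # similarity calculation (105)
    match_prob: Callable[[str, str], float]  # representation matching (106)
    sim_threshold: float
    prob_threshold: float

    def build_table(self, standard_text: str) -> Dict[str, str]:
        """Key-value pair table generation (103)."""
        return {w: self.represent(standard_text, w) for w in self.extract(standard_text)}

    def is_similar(self, standard_text: str, target_text: str) -> bool:
        """Text screening (107): True if any candidate passes both thresholds."""
        table = self.build_table(standard_text)
        for t_word in self.extract(target_text):
            t_rep = self.represent(target_text, t_word)
            for s_word, s_rep in table.items():
                if (self.similarity(t_word, s_word) > self.sim_threshold
                        and self.match_prob(t_rep, s_rep) > self.prob_threshold):
                    return True
        return False


matcher = SimilarTextMatcher(
    extract=lambda text: text.split(),                 # toy stand-in
    represent=lambda text, word: word,                 # toy stand-in
    similarity=lambda a, b: 1.0 if a == b else 0.0,    # toy stand-in
    match_prob=lambda a, b: 0.9 if a == b else 0.1,    # toy stand-in
    sim_threshold=0.5,
    prob_threshold=0.8,
)
print(matcher.is_similar("patent text", "patent law"))
```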
  • FIG. 3 is a schematic structural diagram of an electronic device for implementing a similar text matching method provided by an embodiment of the present application.
  • the electronic device 1 may include a processor 10, a memory 11 and a bus, and may also include a computer program stored in the memory 11 and executable on the processor 10, such as a similar text matching program 12.
  • the memory 11 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, mobile hard disk, multimedia card, card-type memory (for example: SD or DX memory, etc.), magnetic memory, magnetic disk, CD etc.
  • the memory 11 may be an internal storage unit of the electronic device 1 in some embodiments, such as a mobile hard disk of the electronic device 1 .
  • the memory 11 may also be an external storage device of the electronic device 1, such as a pluggable mobile hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, a flash memory card (Flash Card), etc., equipped on the electronic device 1.
  • the memory 11 may also include both an internal storage unit of the electronic device 1 and an external storage device.
  • the memory 11 can not only be used to store application software installed in the electronic device 1 and various types of data, such as the code of the similar text matching program 12, etc., but also can be used to temporarily store data that has been output or will be output.
  • the processor 10 may be composed of integrated circuits; for example, it may be composed of a single packaged integrated circuit, or of multiple packaged integrated circuits with the same function or different functions, including one or more central processing units (Central Processing Unit, CPU), microprocessors, digital processing chips, graphics processors, and combinations of various control chips, etc.
  • the processor 10 is the control core (Control Unit) of the electronic device; it uses various interfaces and lines to connect the various components of the entire electronic device, and performs the various functions of the electronic device 1 and processes data by running or executing programs or modules stored in the memory 11 (such as the similar text matching program) and calling data stored in the memory 11.
  • the bus may be a peripheral component interconnect (PCI) bus or an extended industry standard architecture (EISA) bus or the like.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the bus is configured to implement connection communication between the memory 11 and at least one processor 10 and the like.
  • FIG. 3 only shows an electronic device with certain components. Those skilled in the art can understand that the structure shown in FIG. 3 does not constitute a limitation on the electronic device 1, which may include fewer or more components than those shown in the figure, a combination of certain components, or a different arrangement of components.
  • the electronic device 1 may also include a power supply (such as a battery) for powering the various components. Preferably, the power supply may be logically connected to the at least one processor 10 through a power management device, so that the power management device implements functions such as charge management, discharge management, and power consumption management.
  • the power source may also include one or more DC or AC power sources, recharging devices, power failure detection circuits, power converters or inverters, power status indicators, and any other components.
  • the electronic device 1 may further include various sensors, Bluetooth modules, Wi-Fi modules, etc., which will not be repeated here.
  • the electronic device 1 may also include a network interface. Optionally, the network interface may include a wired interface and/or a wireless interface (such as a Wi-Fi interface, a Bluetooth interface, etc.), which is usually used to establish a communication connection between the electronic device 1 and other electronic devices.
  • the electronic device 1 may further include a user interface. The user interface may be a display or an input unit (e.g., a keyboard); optionally, the user interface may also be a standard wired interface or a wireless interface.
  • the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (organic light-emitting diode) touch device, and the like.
  • the display may also be appropriately called a display screen or a display unit, which is used for displaying information processed in the electronic device 1 and for displaying a visualized user interface.
  • the similar text matching program 12 stored in the memory 11 in the electronic device 1 is a combination of multiple instructions, and when running in the processor 10, can realize:
  • determining that the standard text corresponding to a standard semantic representation whose matching probability is greater than a preset probability threshold is a text similar to the target text.
  • the modules/units integrated in the electronic device 1 may be stored in a computer-readable storage medium.
  • the computer-readable storage medium may be volatile or non-volatile.
  • the computer-readable medium may include any entity or device capable of carrying the computer program code, such as a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, or a read-only memory (ROM).
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium can be either volatile or non-volatile.
  • the readable storage medium stores a computer program which, when executed by a processor of an electronic device, can implement:
  • determining that the standard text corresponding to a standard semantic representation whose matching probability is greater than a preset probability threshold is a text similar to the target text.
  • modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional module in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of hardware plus software function modules.
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • a blockchain is essentially a decentralized database: a series of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
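
The idea of data blocks linked by cryptographic methods can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the block fields, the use of SHA-256, and the helper names `block_hash`, `append_block`, and `is_valid` are hypothetical and are not taken from the application, which does not specify a block format or a consensus mechanism.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # hypothetical placeholder for the first block


def block_hash(index, records, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "records": records, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, records):
    """Append a block whose hash depends on the block before it."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {
        "index": index,
        "records": records,
        "prev_hash": prev_hash,
        "hash": block_hash(index, records, prev_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check each link to the previous block."""
    prev_hash = GENESIS_PREV_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        expected = block_hash(block["index"], block["records"], block["prev_hash"])
        if block["hash"] != expected:
            return False
        prev_hash = block["hash"]
    return True
```

Because each hash covers the previous hash, changing the records of an earlier block (for example, a stored standard text) makes `is_valid` return `False` unless every later block is recomputed as well.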

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application relates to a similar text matching method, comprising: acquiring a standard text, performing feature word extraction on the acquired standard text, and constructing a standard semantic representation according to the extraction result; generating a standard key-value pair table from the standard feature word and the standard semantic representation (S3); performing feature word extraction on an acquired target text and constructing a target semantic representation; calculating the similarity between the target feature word and the standard feature word, and screening, according to the similarity, a semantic representation to be matched; performing representation matching on the semantic representation to be matched and the standard semantic representation to obtain a matching probability; and determining the standard text corresponding to the standard semantic representation whose matching probability is greater than a preset probability threshold to be a text similar to the target text (S7). The present application also relates to blockchain technology: the standard text may be stored in a node of a blockchain. The present application further provides a similar text matching apparatus, an electronic device, and a computer-readable storage medium. The present application can solve the problem of relatively low accuracy in similar text matching.
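
The sequence of steps in the abstract can be sketched roughly as follows. This is only an illustration under loudly stated assumptions, not the method claimed in the application: feature words are taken to be whitespace tokens minus a hypothetical stopword list, the semantic representation is a simple bag-of-words count, the feature-word similarity is Jaccard similarity, and cosine similarity stands in for the matching probability. The function names and both thresholds are invented for this example.

```python
import math
from collections import Counter

# Hypothetical stopword list; the application does not specify one.
STOPWORDS = {"the", "a", "an", "of", "to", "is", "my", "how", "do", "i"}


def feature_words(text):
    """Assumed feature-word extraction: lowercase tokens minus stopwords."""
    return frozenset(w for w in text.lower().split() if w not in STOPWORDS)


def representation(text):
    """Assumed semantic representation: bag-of-words counts."""
    return Counter(w for w in text.lower().split() if w not in STOPWORDS)


def jaccard(a, b):
    """Similarity between two feature-word sets (assumed measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cosine(u, v):
    """Stand-in for the matching probability between two representations."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def find_similar(target, standards, screen_threshold=0.2, match_threshold=0.5):
    # Key-value table: feature words -> (standard text, representation).
    # Standards with identical feature words would collide in this sketch.
    table = {feature_words(s): (s, representation(s)) for s in standards}
    target_words = feature_words(target)
    target_rep = representation(target)
    results = []
    for words, (text, rep) in table.items():
        # Screen candidates by feature-word similarity before matching.
        if jaccard(target_words, words) < screen_threshold:
            continue
        score = cosine(target_rep, rep)
        # Keep standard texts whose score exceeds the preset threshold.
        if score > match_threshold:
            results.append((text, score))
    return sorted(results, key=lambda item: -item[1])
```

Screening on the cheap feature-word similarity first means the more expensive representation matching only runs on a reduced set of candidates, which appears to be the motivation for the key-value table in the abstract.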
PCT/CN2021/083714 2020-12-10 2021-03-30 Similar text matching method and apparatus, electronic device, and computer storage medium WO2022121171A1

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011435054.2A CN112541338A (zh) 2020-12-10 2020-12-10 相似文本匹配方法、装置、电子设备及计算机存储介质
CN202011435054.2 2020-12-10

Publications (1)

Publication Number Publication Date
WO2022121171A1 2022-06-16

Family

ID=75019869

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/083714 WO2022121171A1 Similar text matching method and apparatus, electronic device, and computer storage medium

Country Status (2)

Country Link
CN (1) CN112541338A (fr)
WO (1) WO2022121171A1 (fr)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541338A (zh) * 2020-12-10 2021-03-23 平安科技(深圳)有限公司 相似文本匹配方法、装置、电子设备及计算机存储介质
CN112883730B (zh) * 2021-03-25 2023-01-17 平安国际智慧城市科技股份有限公司 相似文本匹配方法、装置、电子设备及存储介质
CN113158683A (zh) * 2021-04-15 2021-07-23 平安国际智慧城市科技股份有限公司 重要事项提醒方法、装置、电子设备及计算机存储介质
CN113486266B (zh) * 2021-06-29 2024-05-21 平安银行股份有限公司 页面标签添加方法、装置、设备及存储介质
CN115934880A (zh) * 2022-10-31 2023-04-07 永道工程咨询有限公司 一种工程造价文档数据库构建和工程造价文档检索方法

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165291A (zh) * 2018-06-29 2019-01-08 厦门快商通信息技术有限公司 一种文本匹配方法及电子设备
US20200242304A1 (en) * 2017-11-29 2020-07-30 Tencent Technology (Shenzhen) Company Limited Text recommendation method and apparatus, and electronic device
CN111639502A (zh) * 2020-05-26 2020-09-08 深圳壹账通智能科技有限公司 文本语义匹配方法、装置、计算机设备及存储介质
CN111898643A (zh) * 2020-07-01 2020-11-06 上海依图信息技术有限公司 一种语义匹配方法及装置
CN112541338A (zh) * 2020-12-10 2021-03-23 平安科技(深圳)有限公司 相似文本匹配方法、装置、电子设备及计算机存储介质

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115186775A (zh) * 2022-09-13 2022-10-14 北京远鉴信息技术有限公司 一种图像描述文字的匹配度检测方法、装置及电子设备
CN115186775B (zh) * 2022-09-13 2022-12-16 北京远鉴信息技术有限公司 一种图像描述文字的匹配度检测方法、装置及电子设备
CN115545001A (zh) * 2022-11-29 2022-12-30 支付宝(杭州)信息技术有限公司 一种文本匹配方法及装置
CN115545001B (zh) * 2022-11-29 2023-04-07 支付宝(杭州)信息技术有限公司 一种文本匹配方法及装置
CN115879901A (zh) * 2023-02-22 2023-03-31 陕西湘秦衡兴科技集团股份有限公司 一种智能人事自助服务平台
CN115879901B (zh) * 2023-02-22 2023-07-28 陕西湘秦衡兴科技集团股份有限公司 一种智能人事自助服务平台
CN116932767A (zh) * 2023-09-18 2023-10-24 江西农业大学 基于知识图谱的文本分类方法、系统、存储介质及计算机
CN116932767B (zh) * 2023-09-18 2023-12-12 江西农业大学 基于知识图谱的文本分类方法、系统、存储介质及计算机
CN117371435A (zh) * 2023-10-09 2024-01-09 北京睿企信息科技有限公司 一种获取热度发生波动的热词的数据处理系统
CN117371435B (zh) * 2023-10-09 2024-04-05 北京睿企信息科技有限公司 一种获取热度发生波动的热词的数据处理系统

Also Published As

Publication number Publication date
CN112541338A (zh) 2021-03-23

Similar Documents

Publication Publication Date Title
WO2022121171A1 Similar text matching method and apparatus, electronic device, and computer storage medium
WO2022134759A1 Keyword generation method and apparatus, electronic device, and computer storage medium
WO2022142593A1 Text classification method and apparatus, electronic device, and readable storage medium
WO2022160449A1 Text classification method and apparatus, electronic device, and storage medium
WO2019174132A1 Data processing method, server, and computer storage medium
WO2020108063A1 Method, apparatus, and server for determining feature words
CN110532347B (zh) 一种日志数据处理方法、装置、设备和存储介质
WO2022116435A1 Title generation method and apparatus, electronic device, and storage medium
WO2022160454A1 Medical literature retrieval method and apparatus, electronic device, and storage medium
WO2022222943A1 Department recommendation method and apparatus, electronic device, and storage medium
WO2022222300A1 Open relation extraction method and apparatus, electronic device, and storage medium
WO2022142020A1 Information pushing method and apparatus, electronic device, and computer-readable storage medium
WO2022134355A1 Keyword-prompt-based search method and apparatus, electronic device, and storage medium
US9213759B2 (en) System, apparatus, and method for executing a query including boolean and conditional expressions
WO2022142106A1 Text analysis method and apparatus, electronic device, and readable storage medium
WO2022121172A1 Text error correction method and apparatus, electronic device, and computer-readable storage medium
WO2022179122A1 Big-data-based data storage method and apparatus, electronic device, and storage medium
CN113434542B (zh) 数据关系识别方法、装置、电子设备及存储介质
CN113722600A (zh) 应用于大数据的数据查询方法、装置、设备及产品
CN113282854A (zh) 数据请求响应方法、装置、电子设备及存储介质
WO2022141860A1 Text deduplication method and apparatus, electronic device, and computer-readable storage medium
CN113434413B (zh) 基于数据差异的数据测试方法、装置、设备及存储介质
WO2022141867A1 Speech recognition method and apparatus, electronic device, and readable storage medium
WO2022141838A1 Model confidence analysis method and apparatus, electronic device, and computer storage medium
WO2022134345A1 File access method, apparatus, device, and readable storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21901887

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21901887

Country of ref document: EP

Kind code of ref document: A1