WO2022156180A1 - Method for determining similar text and related device - Google Patents

Method for determining similar text and related device

Info

Publication number
WO2022156180A1
Authority
WO
WIPO (PCT)
Prior art keywords
text
detected
target
word
vector
Prior art date
Application number
PCT/CN2021/109391
Other languages
English (en)
Chinese (zh)
Inventor
李小娟
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2022156180A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/194 Calculation of difference between files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular, to a method for determining similar texts and related devices.
  • the similarity of sentences is determined by the co-occurrence information of the text.
  • however, in this way the similarity between two texts cannot be accurately calculated, which reduces the accuracy of determining similar texts.
  • deep text similarity algorithms were therefore developed, in which sentences are mapped to semantics through the coding layer.
  • the inventor realized that if there are texts with similar text information but opposite meanings, the determination accuracy of similar texts will be low.
  • a first aspect of the present application provides a method for determining similar texts, the method for determining similar texts includes:
  • Calculate the similarity between the to-be-detected feature vector and the target feature vector to obtain the text similarity between the to-be-detected text and the target text, and determine the intersection of the multiple to-be-detected word segmentations and the multiple target word segmentations as the co-occurring words;
  • a second aspect of the present application provides an electronic device, the electronic device includes a processor and a memory, the processor is configured to execute computer-readable instructions stored in the memory to implement the following steps:
  • Calculate the similarity between the to-be-detected feature vector and the target feature vector to obtain the text similarity between the to-be-detected text and the target text, and determine the intersection of the multiple to-be-detected word segmentations and the multiple target word segmentations as the co-occurring words;
  • a third aspect of the present application provides a computer-readable storage medium on which at least one computer-readable instruction is stored, and the at least one computer-readable instruction is executed by a processor to implement the following steps:
  • Calculate the similarity between the to-be-detected feature vector and the target feature vector to obtain the text similarity between the to-be-detected text and the target text, and determine the intersection of the multiple to-be-detected word segmentations and the multiple target word segmentations as the co-occurring words;
  • a fourth aspect of the present application provides an apparatus for determining similar texts, and the apparatus for determining similar texts includes:
  • a determination unit configured to receive a similar text determination request, and determine the text to be detected according to the similar text determination request
  • a generating unit configured to perform word segmentation processing on the text to be detected to obtain a plurality of word segmentations to be detected, and perform word segmentation processing on the target text to obtain a plurality of target word segmentations
  • the generating unit is also used to obtain the union of the plurality of word segmentations to be detected and the plurality of target word segmentations to obtain all word segmentations;
  • the generating unit is further configured to generate a feature vector to be detected according to the plurality of word segmentations to be detected and the plurality of target word segmentations, and to generate a target feature vector according to the plurality of word segmentations to be detected and the plurality of target word segmentations;
  • the determining unit is further configured to calculate the similarity between the to-be-detected feature vector and the target feature vector to obtain the text similarity between the to-be-detected text and the target text, and to determine the intersection of the plurality of to-be-detected word segmentations and the plurality of target word segmentations as the co-occurrence words;
  • the determining unit is also used to calculate the co-occurrence quantity of the co-occurrence words, and calculate the total amount of the word segmentation of all the word segmentations;
  • the determining unit is further configured to divide the co-occurrence number by the total amount of word segmentation to obtain a similarity coefficient
  • the determining unit is further configured to determine the polarity features of the text to be detected and the target text according to the tone of the text to be detected and the tone of the target text;
  • the generating unit is further configured to generate text features of the text to be detected and the target text according to the text similarity, the similarity coefficient and the polarity feature;
  • a conversion unit for converting the text to be detected into a semantic vector to be detected, and converting the target text into a target semantic vector
  • the determining unit is further configured to generate semantic features of the text to be detected and the target text according to the semantic vector to be detected and the target semantic vector, and to determine the similarity result between the text to be detected and the target text according to the text feature and the semantic feature.
  • the present application determines the text similarity, the similarity coefficient and the polarity feature of the text to be detected and the target text. Because the polarity feature characterizes whether the tone of the text to be detected is the same as the tone of the target text, the degree of similarity between the text to be detected and the target text can be accurately determined, which solves the problem of low determination accuracy for texts with similar text information but opposite meanings. The similarity result between the text to be detected and the target text can then be accurately determined through the text feature and the semantic feature.
  • FIG. 1 is a flowchart of a preferred embodiment of the method for determining similar texts in the present application.
  • FIG. 2 is a flowchart of an embodiment of the present application for generating a feature vector to be detected.
  • FIG. 3 is a flow chart of an embodiment of the present application for generating semantic features.
  • FIG. 4 is a functional block diagram of a preferred embodiment of the apparatus for determining similar texts of the present application.
  • FIG. 5 is a schematic structural diagram of an electronic device implementing a preferred embodiment of the method for determining similar texts in the present application.
  • as shown in FIG. 1, it is a flowchart of a preferred embodiment of the method for determining similar texts of the present application. According to different requirements, the order of the steps in this flowchart can be changed, and some steps can be omitted.
  • the similar text determination method is applied to one or more electronic devices. The electronic device is a device that can automatically perform numerical calculation and/or information processing according to preset or stored computer-readable instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, etc.
  • the electronic device can be any electronic product that can interact with the user, such as a personal computer, a tablet computer, a smart phone, a personal digital assistant (PDA), a game console, an Internet Protocol Television (IPTV), smart wearable devices, etc.
  • the electronic equipment may include network equipment and/or user equipment.
  • the network device includes, but is not limited to, a single network electronic device, an electronic device group composed of multiple network electronic devices, or a cloud composed of a large number of hosts or network electronic devices based on cloud computing (Cloud Computing).
  • the network where the electronic device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a virtual private network (VPN), and the like.
  • S10 Receive a similar text determination request, and determine the text to be detected according to the similar text determination request.
  • the information carried in the similar text determination request includes, but is not limited to: target text, storage location, and the like.
  • the similar text determination request can be triggered by any user.
  • the text to be detected refers to the text that needs to be checked for similarity to the target text. There may be multiple texts to be detected.
  • the electronic device determining the text to be detected according to the similar text determination request includes:
  • a to-be-detected text library is determined from the storage location, and any text is extracted from the to-be-detected text library as the to-be-detected text.
  • the target text refers to the reference text in the similar text determination request.
  • the obtaining, by the electronic device, the target text from the similar text determination request includes:
  • Information for indicating text is acquired from the data information as the target text.
  • since the target text is stored in the similar text determination request, the target text can be quickly acquired from the data information obtained by parsing the request.
  • FIG. 2 is a flowchart of an embodiment of the present application for generating a feature vector to be detected.
  • generating, by the electronic device, a feature vector to be detected according to the text to be detected and the target text includes:
  • S120 Perform word segmentation processing on the text to be detected to obtain multiple word segmentations to be detected, and perform word segmentation processing on the target text to obtain multiple target word segmentations.
  • the multiple word segments to be detected may be multiple words, and the multiple target word segments may be multiple words.
  • S121 Acquire the union of the multiple to-be-detected word segments and the multiple target word segments to obtain all the segmented words.
  • S122 Generate the feature vector to be detected according to the mapping relationship between the multiple word segments to be detected and all the word segments.
  • the mapping relationship refers to whether the multiple to-be-detected word segments exist in all the word segments.
  • for example, the multiple word segments to be detected are: I, Immediately, Immediately, Help, You, Apply, Please, Okay, Do
  • the multiple target word segments are: I, No, Do, Fa, Help, You, Apply, Please; therefore, all the word segments are: I, help, apply, please, immediately, immediately, you, ok, ?
  • positions of all the word segments at which the multiple word segments to be detected do not appear are set to 0; therefore, the feature vector to be detected is [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0].
  • the feature vector to be detected can be determined according to the text to be detected and the target text. Since the feature vector to be detected is generated according to the target text as well, the feature vector to be detected can be accurately determined.
  • the electronic device generating a target feature vector according to the text to be detected and the target text includes:
  • the target feature vector is generated according to the mapping relationship between the target word segment and all word segments.
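  • steps S120 to S122 can be sketched as follows. This is a minimal illustration rather than the application's implementation: it assumes word segmentation has already been performed, the function name `build_feature_vectors` and the placeholder tokens are hypothetical, and the mapping relationship is taken to be a binary "present / absent" flag per entry of the union.

```python
def build_feature_vectors(detect_tokens, target_tokens):
    """Map both token lists onto the union of all word segments (1 = present, 0 = absent)."""
    # Union of all word segments, kept in first-seen order so the output is deterministic.
    all_tokens = list(dict.fromkeys(list(detect_tokens) + list(target_tokens)))
    detect_set, target_set = set(detect_tokens), set(target_tokens)
    detect_vec = [1 if token in detect_set else 0 for token in all_tokens]
    target_vec = [1 if token in target_set else 0 for token in all_tokens]
    return all_tokens, detect_vec, target_vec


# Placeholder tokens (hypothetical), standing in for the output of a word segmenter.
all_tokens, detect_vec, target_vec = build_feature_vectors(["a", "b", "c"], ["b", "c", "d"])
print(all_tokens)   # ['a', 'b', 'c', 'd']
print(detect_vec)   # [1, 1, 1, 0]
print(target_vec)   # [0, 1, 1, 1]
```

Because both vectors are indexed by the same union, they always have the same dimension, which is what the subsequent similarity calculation requires.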
  • S13 Calculate the similarity between the feature vector to be detected and the target feature vector, obtain the text similarity between the text to be detected and the target text, and determine a similarity coefficient according to the text to be detected and the target text.
  • the electronic device uses a cosine similarity calculation formula to calculate the similarity between the feature vector to be detected and the target feature vector.
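  • the cosine similarity calculation formula is: $\cos\theta = \dfrac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$
  • where $\cos\theta$ refers to the similarity between the feature vector to be detected and the target feature vector, $n$ refers to the vector dimension of the feature vector to be detected and the target feature vector, $i$ refers to the current vector dimension, $x_i$ refers to the $i$-th component of the feature vector to be detected, and $y_i$ refers to the $i$-th component of the target feature vector.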
  • the text similarity can be quickly determined through the cosine similarity calculation formula.
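  • the cosine similarity calculation can be checked with a short sketch. The feature vector to be detected below is the one given in the example; the target vector is an assumption, chosen only so that it has 8 non-zero entries of which 4 overlap with the vector to be detected, which matches the counts stated in the example (4 co-occurrence words, 13 word segments in total). The function name is hypothetical.

```python
import math


def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # convention for an all-zero vector (an assumption of this sketch)
    return dot / (norm_x * norm_y)


detect_vec = [1] * 9 + [0] * 4            # vector from the example
target_vec = [1] * 4 + [0] * 5 + [1] * 4  # assumed layout: 4 shared entries, 4 target-only entries
text_similarity = cosine_similarity(detect_vec, target_vec)
print(round(text_similarity, 4))  # 0.4714
```

The result, 4 / (3 · √8) ≈ 0.4714, agrees with the text similarity used in the text-feature example.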
  • the electronic device determining the similarity coefficient according to the text to be detected and the target text includes:
  • the similarity coefficient is obtained by dividing the co-occurrence number by the total number of word segmentations.
  • for example, the co-occurrence words are I, help, apply, and please, so the co-occurrence number of the co-occurrence words is calculated to be 4
  • the total number of word segmentations of all the word segments is calculated to be 13, so the similarity coefficient is 4/13 ≈ 0.3077.
  • the similarity coefficient can be accurately determined according to the co-occurrence words of the text to be detected and the target text.
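  • the similarity coefficient is the size of the intersection divided by the size of the union of the two sets of word segments. A minimal sketch follows; the function name and the placeholder tokens `t1` to `t13` are hypothetical and only reproduce the counts of the example (9 word segments to be detected, 8 target word segments, 4 shared).

```python
def similarity_coefficient(detect_tokens, target_tokens):
    """Number of co-occurrence words divided by the total number of distinct word segments."""
    detect_set, target_set = set(detect_tokens), set(target_tokens)
    co_occurrence_words = detect_set & target_set   # intersection
    all_tokens = detect_set | target_set            # union
    if not all_tokens:
        return 0.0  # convention for two empty texts (an assumption of this sketch)
    return len(co_occurrence_words) / len(all_tokens)


# Placeholder tokens reproducing the counts of the example: 4 shared, 13 in total.
detect_tokens = [f"t{i}" for i in range(1, 10)]                      # t1 .. t9
target_tokens = ["t1", "t2", "t3", "t4", "t10", "t11", "t12", "t13"]
coefficient = similarity_coefficient(detect_tokens, target_tokens)
print(round(coefficient, 4))  # 0.3077
```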
  • the polarity feature is 1 or 0.
  • when the tone of the text to be detected is the same as the tone of the target text, the polarity feature is determined to be 1; when the tone of the text to be detected is different from the tone of the target text, the polarity feature is determined to be 0.
  • the electronic device determining the polarity characteristics of the text to be detected and the target text according to the tone of the text to be detected and the tone of the target text includes:
  • Detecting whether the text to be detected contains a preset word to obtain a first detection result, and detecting whether the target text contains the preset word to obtain a second detection result, where the preset word is used to indicate a negative tone;
  • if the first tone is the same as the second tone, determining the polarity feature as a first value;
  • if the first tone is different from the second tone, determining the polarity feature as a second value.
  • the preset words include, but are not limited to: none, none, no.
  • the tone of the text to be detected and the target text can be accurately determined according to the preset words, and then the polarity feature can be accurately determined.
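  • the polarity determination can be sketched as follows. The preset word list, the token lists and the function names are hypothetical; the sketch assumes that a text has a negative tone exactly when it contains at least one preset word, and uses 1 as the first value and 0 as the second value, as in the example.

```python
# Hypothetical preset words indicating a negative tone; the actual list is
# language-specific and is not fully enumerated in the text.
PRESET_NEGATIVE_WORDS = frozenset({"no", "not", "none"})


def has_negative_tone(tokens, preset_words=PRESET_NEGATIVE_WORDS):
    """Detection result: True if any token is a preset (negative) word."""
    return any(token in preset_words for token in tokens)


def polarity_feature(detect_tokens, target_tokens, preset_words=PRESET_NEGATIVE_WORDS):
    """Return 1 when both texts have the same tone, otherwise 0."""
    first_tone = has_negative_tone(detect_tokens, preset_words)
    second_tone = has_negative_tone(target_tokens, preset_words)
    return 1 if first_tone == second_tone else 0


detect_tokens = ["i", "will", "help", "you", "apply"]
target_tokens = ["i", "can", "not", "help", "you", "apply"]
print(polarity_feature(detect_tokens, target_tokens))  # 0
print(polarity_feature(detect_tokens, detect_tokens))  # 1
```

A membership test of this kind only captures explicit negation words; it does not handle double negation or negation expressed by other means.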
  • the text feature is obtained by splicing the text similarity, the similarity coefficient and the polarity feature.
  • for example, the text similarity is 0.4714, the similarity coefficient is 0.3077, and the polarity feature is 0, so the text feature obtained is [0.4714, 0.3077, 0].
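  • splicing the three values into the text feature amounts to placing them in a fixed order in one vector. A minimal sketch, with a hypothetical function name and the values of the example:

```python
def build_text_feature(text_similarity, similarity_coefficient, polarity_feature):
    """Splice the three scalar features into one text feature vector (fixed order)."""
    return [text_similarity, similarity_coefficient, polarity_feature]


text_feature = build_text_feature(0.4714, 0.3077, 0)
print(text_feature)  # [0.4714, 0.3077, 0]
```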
  • S16 Convert the text to be detected into a semantic vector to be detected, and convert the target text into a target semantic vector.
  • the semantic vector to be detected includes the semantics of the text to be detected
  • the target semantic vector includes the semantics of the target text
  • the electronic device converting the text to be detected into a semantic vector to be detected includes:
  • the generated semantic vector to be detected can have the contextual semantics of the text to be detected, and the accuracy of determination of the semantic vector to be detected can be improved.
  • the similarity result is either that the text to be detected is similar to the target text or that the text to be detected is not similar to the target text.
  • FIG. 3 is a flowchart of an embodiment of generating semantic features of the present application.
  • the electronic device generating the semantic features of the text to be detected and the target text according to the semantic vector to be detected and the target semantic vector includes:
  • S172 Perform iterative mapping on the spliced semantic vector by using a pre-built multi-layer hidden layer to obtain the semantic feature.
  • since the semantic feature is obtained by operating on the to-be-detected semantic vector and the target semantic vector, the semantic feature can include the semantics of both the to-be-detected text and the target text, which improves the accuracy of the semantic feature.
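  • the mapping of step S172 can be illustrated with a small pure-Python sketch. Everything specific here is an assumption: splicing is taken to be plain concatenation of the two semantic vectors, each hidden layer is a dense layer followed by a ReLU activation, and the weights, biases and input vectors are arbitrary illustrative numbers rather than a trained, pre-built network.

```python
def dense_relu(vector, weights, bias):
    """One hidden layer: weighted sum per output unit plus bias, followed by ReLU."""
    return [
        max(0.0, sum(w * v for w, v in zip(row, vector)) + b)
        for row, b in zip(weights, bias)
    ]


def semantic_feature(detect_semantic_vec, target_semantic_vec, hidden_layers):
    """Splice the two semantic vectors and map the result through each hidden layer in turn."""
    output = list(detect_semantic_vec) + list(target_semantic_vec)  # spliced semantic vector
    for weights, bias in hidden_layers:
        output = dense_relu(output, weights, bias)
    return output


# Illustrative two-layer network: 4 inputs -> 2 hidden units -> 1 output unit.
hidden_layers = [
    ([[0.5, 0.5, 0.5, 0.5], [1.0, 0.0, -1.0, 0.0]], [0.0, 0.1]),
    ([[1.0, 1.0]], [0.0]),
]
feature = semantic_feature([0.2, 0.4], [0.6, 0.8], hidden_layers)
print(feature)
```

In practice the hidden layers would be trained together with the classifier that consumes the text feature and the semantic feature; the sketch only shows the data flow.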
  • determining, by the electronic device, according to the text feature and the semantic feature, the similarity result between the text to be detected and the target text includes:
  • the present application determines the text similarity, the similarity coefficient and the polarity feature of the text to be detected and the target text. Because the polarity feature characterizes whether the tone of the text to be detected is the same as the tone of the target text, the degree of similarity between the text to be detected and the target text can be accurately determined, which solves the problem of low determination accuracy for texts with similar text information but opposite meanings. The similarity result between the text to be detected and the target text can then be accurately determined through the text feature and the semantic feature.
  • the similar text determination device 11 includes a determination unit 110 , an acquisition unit 111 , a generation unit 112 and a conversion unit 113 .
  • the module/unit referred to in this application refers to a series of computer-readable instruction segments that can be acquired by the processor 13 and can perform fixed functions, and are stored in the memory 12 . In this embodiment, the functions of each module/unit will be described in detail in subsequent embodiments.
  • the determination unit 110 receives the similar text determination request, and determines the text to be detected according to the similar text determination request.
  • the information carried in the similar text determination request includes, but is not limited to: target text, storage location, and the like.
  • the similar text determination request can be triggered by any user.
  • the text to be detected refers to the text that needs to be checked for similarity to the target text. There may be multiple texts to be detected.
  • the determining unit 110 determining the text to be detected according to the similar text determination request includes:
  • a to-be-detected text library is determined from the storage location, and any text is extracted from the to-be-detected text library as the to-be-detected text.
  • the obtaining unit 111 obtains the target text from the similar text determination request.
  • the target text refers to the reference text in the similar text determination request.
  • the acquiring unit 111 acquiring the target text from the similar text determination request includes:
  • Information for indicating text is acquired from the data information as the target text.
  • since the target text is stored in the similar text determination request, the target text can be quickly acquired from the data information obtained by parsing the request.
  • the generating unit 112 generates a feature vector to be detected according to the text to be detected and the target text, and generates a target feature vector according to the text to be detected and the target text.
  • the generating unit 112 generates a feature vector to be detected according to the text to be detected and the target text, including:
  • the multiple word segments to be detected may be multiple words, and the multiple target word segments may be multiple words.
  • the to-be-detected feature vector is generated according to the mapping relationship between the plurality of to-be-detected word segments and all of the word segments.
  • the mapping relationship refers to whether the multiple to-be-detected word segments exist in all the word segments.
  • for example, the multiple word segments to be detected are: I, Immediately, Immediately, Help, You, Apply, Please, Okay, Do
  • the multiple target word segments are: I, No, Do, Fa, Help, You, Apply, Please; therefore, all the word segments are: I, help, apply, please, immediately, immediately, you, ok, ?
  • positions of all the word segments at which the multiple word segments to be detected do not appear are set to 0; therefore, the feature vector to be detected is [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0].
  • the feature vector to be detected can be determined according to the text to be detected and the target text. Since the feature vector to be detected is generated according to the target text as well, the feature vector to be detected can be accurately determined.
  • the generating unit 112 generates a target feature vector according to the text to be detected and the target text, including:
  • the target feature vector is generated according to the mapping relationship between the target word segment and all word segments.
  • the determining unit 110 calculates the similarity between the feature vector to be detected and the target feature vector, obtains the text similarity between the text to be detected and the target text, and determines the similarity coefficient according to the text to be detected and the target text.
  • the determining unit 110 uses a cosine similarity calculation formula to calculate the similarity between the feature vector to be detected and the target feature vector.
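  • the cosine similarity calculation formula is: $\cos\theta = \dfrac{\sum_{i=1}^{n} x_i y_i}{\sqrt{\sum_{i=1}^{n} x_i^2}\,\sqrt{\sum_{i=1}^{n} y_i^2}}$
  • where $\cos\theta$ refers to the similarity between the feature vector to be detected and the target feature vector, $n$ refers to the vector dimension of the feature vector to be detected and the target feature vector, $i$ refers to the current vector dimension, $x_i$ refers to the $i$-th component of the feature vector to be detected, and $y_i$ refers to the $i$-th component of the target feature vector.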
  • the text similarity can be quickly determined through the cosine similarity calculation formula.
  • the determining unit 110 determining the similarity coefficient according to the text to be detected and the target text includes:
  • the similarity coefficient is obtained by dividing the co-occurrence number by the total number of word segmentations.
  • for example, the co-occurrence words are I, help, apply, and please, so the co-occurrence number of the co-occurrence words is calculated to be 4
  • the total number of word segmentations of all the word segments is calculated to be 13, so the similarity coefficient is 4/13 ≈ 0.3077.
  • the similarity coefficient can be accurately determined according to the co-occurrence words of the text to be detected and the target text.
  • the determining unit 110 determines the polarity characteristics of the text to be detected and the target text according to the tone of the text to be detected and the tone of the target text.
  • the polarity feature is 1 or 0.
  • when the tone of the text to be detected is the same as the tone of the target text, the polarity feature is determined to be 1; when the tone of the text to be detected is different from the tone of the target text, the polarity feature is determined to be 0.
  • the determining unit 110 determines the polarity features of the text to be detected and the target text according to the tone of the text to be detected and the tone of the target text, including:
  • Detecting whether the text to be detected contains a preset word to obtain a first detection result, and detecting whether the target text contains the preset word to obtain a second detection result, where the preset word is used to indicate a negative tone;
  • if the first tone is the same as the second tone, determining the polarity feature as a first value;
  • if the first tone is different from the second tone, determining the polarity feature as a second value.
  • the preset words include, but are not limited to: none, none, no.
  • the tone of the text to be detected and the target text can be accurately determined according to the preset words, and then the polarity feature can be accurately determined.
  • the generating unit 112 generates text features of the text to be detected and the target text according to the text similarity, the similarity coefficient and the polarity feature.
  • the text feature is obtained by splicing the text similarity, the similarity coefficient and the polarity feature.
  • for example, the text similarity is 0.4714, the similarity coefficient is 0.3077, and the polarity feature is 0, so the text feature obtained is [0.4714, 0.3077, 0].
  • the converting unit 113 converts the text to be detected into a semantic vector to be detected, and converts the target text into a target semantic vector.
  • the semantic vector to be detected includes the semantics of the text to be detected
  • the target semantic vector includes the semantics of the target text
  • the converting unit 113 converts the text to be detected into a semantic vector to be detected including:
  • the generated semantic vector to be detected can have the contextual semantics of the text to be detected, and the accuracy of determination of the semantic vector to be detected can be improved.
  • the determining unit 110 generates the semantic features of the text to be detected and the target text according to the semantic vector to be detected and the target semantic vector, and determines the similarity result between the text to be detected and the target text according to the text feature and the semantic feature.
  • the similarity result is either that the text to be detected is similar to the target text or that the text to be detected is not similar to the target text.
  • the determining unit 110 generates the semantic features of the text to be detected and the target text according to the semantic vector to be detected and the target semantic vector, including:
  • the spliced semantic vector is iteratively mapped by using a pre-built multi-layer hidden layer to obtain the semantic feature.
  • since the semantic feature is obtained by operating on the to-be-detected semantic vector and the target semantic vector, the semantic feature can include the semantics of both the to-be-detected text and the target text, which improves the accuracy of the semantic feature.
  • the determining unit 110 determines the similarity result between the text to be detected and the target text according to the text feature and the semantic feature, including:
  • the present application determines the text similarity, the similarity coefficient and the polarity feature of the text to be detected and the target text. Because the polarity feature characterizes whether the tone of the text to be detected is the same as the tone of the target text, the degree of similarity between the text to be detected and the target text can be accurately determined, which solves the problem of low determination accuracy for texts with similar text information but opposite meanings. The similarity result between the text to be detected and the target text can then be accurately determined through the text feature and the semantic feature.
  • as shown in FIG. 5, it is a schematic structural diagram of an electronic device implementing a preferred embodiment of the method for determining similar texts of the present application.
  • the electronic device 1 includes, but is not limited to, a memory 12 , a processor 13 , and computer-readable instructions stored in the memory 12 and executable on the processor 13 , such as similar text determination programs.
  • the schematic diagram is only an example of the electronic device 1 and does not constitute a limitation on the electronic device 1; the electronic device 1 may include more or fewer components than shown, combine some components, or have different components. For example, the electronic device 1 may also include input and output devices, network access devices, buses, and the like.
  • the processor 13 may be a central processing unit (Central Processing Unit, CPU), or other general-purpose processors, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), Field-Programmable Gate Array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • the general-purpose processor can be a microprocessor or the processor can also be any conventional processor, etc.
  • the processor 13 is the computing core and control center of the electronic device 1; it uses various interfaces and lines to connect the parts of the entire electronic device 1, and executes the operating system of the electronic device 1 as well as various installed applications, program codes, and the like.
  • the computer-readable instructions may be divided into one or more modules/units, and the one or more modules/units are stored in the memory 12 and executed by the processor 13 to complete the present application.
  • the one or more modules/units may be a series of computer-readable instruction segments capable of accomplishing specific functions, and the computer-readable instruction segments are used to describe the execution process of the computer-readable instructions in the electronic device 1 .
  • the computer readable instructions may be divided into a determining unit 110 , an obtaining unit 111 , a generating unit 112 and a converting unit 113 .
  • the memory 12 can be used to store the computer-readable instructions and/or modules, and the processor 13 realizes various functions of the electronic device 1 by running or executing the computer-readable instructions and/or modules stored in the memory 12 and invoking the data stored in the memory 12.
  • the memory 12 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like; the data storage area may store data and the like created according to the use of the electronic device.
  • the memory 12 may include non-volatile and volatile memory such as: hard disk, internal memory, plug-in hard disk, Smart Media Card (SMC), Secure Digital (SD) card, Flash memory card (Flash) Card), at least one disk storage device, flash memory device, or other storage device.
  • the memory 12 may be an external memory and/or an internal memory of the electronic device 1. Further, the memory 12 may be a memory in physical form, such as a memory stick, a TF card (Trans-flash Card), and the like.
  • if the modules/units integrated in the electronic device 1 are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • the computer-readable storage medium may be a non-volatile storage medium or a volatile storage medium.
  • the present application can implement all or part of the processes in the methods of the above embodiments by instructing relevant hardware through computer-readable instructions, and the computer-readable instructions can be stored in a computer-readable storage medium.
  • the computer-readable instructions when executed by the processor, can implement the steps of the above-mentioned method embodiments.
  • the computer-readable instructions include computer-readable instruction codes, and the computer-readable instruction codes may be in source code form, object code form, an executable file, some intermediate form, and the like.
  • the computer-readable medium may include: any entity or device capable of carrying the computer-readable instruction code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM, Read-Only Memory), and a random access memory (RAM, Random Access Memory).
  • the blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain, essentially a decentralized database, is a series of data blocks associated with one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of its information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.
  • the memory 12 in the electronic device 1 stores computer-readable instructions to implement a method for determining similar text, and the processor 13 can execute the computer-readable instructions to implement:
  • Calculate the similarity between the feature vector to be detected and the target feature vector to obtain the text similarity between the text to be detected and the target text, and determine a similarity coefficient according to the text to be detected and the target text;
  • the computer-readable storage medium stores computer-readable instructions, wherein the computer-readable instructions are used to implement the following steps when executed by the processor 13:
  • Calculate the similarity between the feature vector to be detected and the target feature vector to obtain the text similarity between the text to be detected and the target text, and determine a similarity coefficient according to the text to be detected and the target text;
  • modules described as separate components may or may not be physically separated, and components shown as modules may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional module in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware, or can be implemented in the form of hardware plus software function modules.
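
The step that calculates the similarity between the feature vector to be detected and the target feature vector, and determines a similarity coefficient from the two texts, can be illustrated with a minimal sketch. The excerpt above does not state which measures are used, so cosine similarity (for the vectors) and a Jaccard coefficient over whitespace-separated tokens (for the texts) are assumptions chosen for illustration; the function names and sample inputs are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard_coefficient(text_a: str, text_b: str) -> float:
    """Share of distinct whitespace-separated tokens that both texts have in common."""
    tokens_a, tokens_b = set(text_a.split()), set(text_b.split())
    union = tokens_a | tokens_b
    if not union:
        return 0.0
    return len(tokens_a & tokens_b) / len(union)


# Hypothetical inputs, for illustration only.
vector_to_detect = [1.0, 0.0, 1.0]
target_vector = [1.0, 1.0, 0.0]

# Assumed stand-in for the "text similarity" of the two feature vectors.
text_similarity = cosine_similarity(vector_to_detect, target_vector)

# Assumed stand-in for the "similarity coefficient" of the two texts.
similarity_coefficient = jaccard_coefficient(
    "the loan was approved", "the loan was rejected"
)
```

In this sketch the two values are computed independently, one from the vectors and one from the raw texts, which mirrors how the step describes them as separate quantities.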

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The present application relates to artificial intelligence and provides a method for determining similar text and a related device. The method comprises: determining a text to be detected and a target text; generating a feature vector to be detected and a target feature vector; calculating the similarity between the feature vector to be detected and the target feature vector; determining a similarity coefficient and polarity features; generating text features according to the text similarity, the similarity coefficient, and the polarity features; converting the text to be detected into a semantic vector to be detected and converting the target text into a target semantic vector; generating semantic features of the text to be detected and the target text; and determining a similarity result according to the text features and the semantic features. The present application can improve the accuracy of determining similar text. In addition, the present application further relates to blockchain technology, and the similarity result may be stored in a blockchain.
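
The abstract notes that the similarity result may be stored in a blockchain, and the description characterizes a blockchain as a series of data blocks associated by cryptographic methods. The following is a minimal sketch of that linking idea only: each block records the SHA-256 hash of the previous block, so altering a stored result invalidates the chain. It omits distribution, consensus, and encryption entirely, and the field names (`index`, `data`, `prev_hash`, `hash`) and sample records are hypothetical.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_PREV_HASH = "0" * 64  # placeholder predecessor for the first block


def block_hash(body: Dict[str, Any]) -> str:
    """SHA-256 digest of a block's JSON-serialised contents."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Dict[str, Any]], data: Dict[str, Any]) -> Dict[str, Any]:
    """Append a block whose prev_hash field commits to the preceding block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    body = {"index": len(chain), "data": data, "prev_hash": prev_hash}
    block = dict(body, hash=block_hash(body))
    chain.append(block)
    return block


def is_valid(chain: List[Dict[str, Any]]) -> bool:
    """Recompute every hash and check each link to the previous block."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        body = {k: v for k, v in block.items() if k != "hash"}
        if block["prev_hash"] != prev_hash or block["hash"] != block_hash(body):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical similarity results, for illustration only.
chain: List[Dict[str, Any]] = []
append_block(chain, {"text_pair": "example-001", "similar": True})
append_block(chain, {"text_pair": "example-002", "similar": False})
```

Calling `is_valid(chain)` after editing any stored result returns `False`, which is the tamper-evidence property the description refers to as anti-counterfeiting.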
PCT/CN2021/109391 2021-01-19 2021-07-29 Procédé de détermination de texte similaire et dispositif associé WO2022156180A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110071000.0 2021-01-19
CN202110071000.0A CN112395886B (zh) 2021-01-19 2021-01-19 相似文本确定方法及相关设备

Publications (1)

Publication Number Publication Date
WO2022156180A1 (fr) 2022-07-28

Family

ID=74625659

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109391 WO2022156180A1 (fr) 2021-01-19 2021-07-29 Procédé de détermination de texte similaire et dispositif associé

Country Status (2)

Country Link
CN (1) CN112395886B (fr)
WO (1) WO2022156180A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117195860A (zh) * 2023-11-07 2023-12-08 品茗科技股份有限公司 智能巡检方法、系统、电子设备和计算机可读存储介质

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395886B (zh) * 2021-01-19 2021-04-13 深圳壹账通智能科技有限公司 相似文本确定方法及相关设备
CN113239666B (zh) * 2021-05-13 2023-09-29 深圳市智灵时代科技有限公司 一种文本相似度计算方法及系统
CN113987115A (zh) * 2021-09-26 2022-01-28 润联智慧科技(西安)有限公司 一种文本相似度计算方法、装置、设备及存储介质
CN116957368A (zh) * 2022-03-31 2023-10-27 华为技术有限公司 一种评分方法及相关装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102880600A (zh) * 2012-08-30 2013-01-16 北京航空航天大学 基于通用知识网络的词语语义倾向性预测方法
US20140249799A1 (en) * 2013-03-04 2014-09-04 Microsoft Corporation Relational similarity measurement
CN109145299A (zh) * 2018-08-16 2019-01-04 北京金山安全软件有限公司 一种文本相似度确定方法、装置、设备及存储介质
CN109635077A (zh) * 2018-12-18 2019-04-16 武汉斗鱼网络科技有限公司 文本相似度的计算方法、装置、电子设备及存储介质
CN112395886A (zh) * 2021-01-19 2021-02-23 深圳壹账通智能科技有限公司 相似文本确定方法及相关设备

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070136839A1 (en) * 2003-10-14 2007-06-14 Ceres, Inc. Promoter, promoter control elements, and combinations, and uses thereof
CN108090047B (zh) * 2018-01-10 2022-05-24 华南师范大学 一种文本相似度的确定方法及设备
CN108052509B (zh) * 2018-01-31 2019-06-28 北京神州泰岳软件股份有限公司 一种文本相似度计算方法、装置及服务器
CN108595517B (zh) * 2018-03-26 2021-03-09 南京邮电大学 一种大规模文档相似性检测方法
CN108874174B (zh) * 2018-05-29 2020-04-24 腾讯科技(深圳)有限公司 一种文本纠错方法、装置以及相关设备
WO2020020287A1 (fr) * 2018-07-25 2020-01-30 中兴通讯股份有限公司 Procédé d'acquisition de similarité de texte, appareil, dispositif et support de stockage lisible
CN110781277A (zh) * 2019-09-23 2020-02-11 厦门快商通科技股份有限公司 文本识别模型相似度训练方法、系统、识别方法及终端
CN111949766A (zh) * 2020-08-20 2020-11-17 深圳市卡牛科技有限公司 一种文本相似度的识别方法、系统、设备和存储介质

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117195860A (zh) * 2023-11-07 2023-12-08 品茗科技股份有限公司 智能巡检方法、系统、电子设备和计算机可读存储介质
CN117195860B (zh) * 2023-11-07 2024-03-26 品茗科技股份有限公司 智能巡检方法、系统、电子设备和计算机可读存储介质

Also Published As

Publication number Publication date
CN112395886B (zh) 2021-04-13
CN112395886A (zh) 2021-02-23

Similar Documents

Publication Publication Date Title
WO2022156180A1 (fr) Procédé de détermination de texte similaire et dispositif associé
WO2021114736A1 (fr) Procédé et appareil d'assistance à consultation médicale, dispositif électronique, et support
US11901047B2 (en) Medical visual question answering
WO2022105115A1 (fr) Procédé et appareil d'appariement de paire de question et réponse, dispositif électronique et support de stockage
WO2021120688A1 (fr) Procédé et appareil de détection de mauvais diagnostic, dispositif électronique et support de stockage
CN110134965B (zh) 用于信息处理的方法、装置、设备和计算机可读存储介质
WO2022088671A1 (fr) Procédé et appareil de réponse automatique à des questions, dispositif et support de mémoire
WO2021196825A1 (fr) Procédé et appareil de génération de résumé, dispositif électronique et support
WO2023045184A1 (fr) Procédé et appareil de reconnaissance de catégorie de texte, dispositif informatique et support
CN113268597B (zh) 文本分类方法、装置、设备及存储介质
WO2022134418A1 (fr) Procédé de reconnaissance vidéo et dispositif associé
US11010566B2 (en) Inferring confidence and need for natural language processing of input data
WO2022041889A1 (fr) Procédé et appareil de routage de fonds, dispositif électronique et support d'informations
WO2022160442A1 (fr) Procédé et appareil de génération de réponse, dispositif électronique et support de stockage lisible
WO2022252638A1 (fr) Procédé et appareil de mise en correspondance de textes, dispositif informatique, et support de stockage lisible
WO2022073513A1 (fr) Procédé et appareil d'aide à l'entrée d'informations, dispositif électronique et support de stockage
CN112052409B (zh) 地址解析方法、装置、设备及介质
US20220027612A1 (en) Detecting and processing sections spanning processed document partitions
CN111933241B (zh) 医疗数据解析方法、装置、电子设备及存储介质
WO2024087297A1 (fr) Procédé et appareil d'analyse de sentiments de texte, dispositif électronique et support de stockage
CN113486680B (zh) 文本翻译方法、装置、设备及存储介质
WO2018214956A1 (fr) Procédé et appareil de traduction automatique, et support d'informations
CN113420545B (zh) 摘要生成方法、装置、设备及存储介质
CN113627186B (zh) 基于人工智能的实体关系检测方法及相关设备
TWI777319B (zh) 幹細胞密度確定方法、裝置、電腦裝置及儲存介質

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21920561

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 031123)

122 Ep: pct application non-entry in european phase

Ref document number: 21920561

Country of ref document: EP

Kind code of ref document: A1