CN112395886A - Similar text determination method and related equipment

Info

Publication number: CN112395886A
Application number: CN202110071000.0A
Authority: CN
Inventors: 李小娟
Original assignee: OneConnect Financial Technology Co Ltd Shanghai
Current assignee: OneConnect Smart Technology Co Ltd; OneConnect Financial Technology Co Ltd Shanghai
Priority date: 2021-01-19
Filing date: 2021-01-19
Publication date: 2021-02-23
Anticipated expiration: 2041-01-19
Also published as: WO2022156180A1; CN112395886B

Abstract

The invention relates to artificial intelligence and provides a method for determining similar texts and related equipment. The method can determine a text to be detected and a target text, generate a feature vector to be detected and a target feature vector, calculate the similarity between the feature vector to be detected and the target feature vector, determine a similarity coefficient and a polarity feature, generate a text feature according to the text similarity, the similarity coefficient and the polarity feature, convert the text to be detected into a semantic vector to be detected, convert the target text into the target semantic vector, generate semantic features of the text to be detected and the target text, and determine a similarity result according to the text feature and the semantic features. The method and the device can improve the determination accuracy of the similar text. Furthermore, the invention also relates to block chain techniques, and the similar results can be stored in a block chain.

Description

Similar text determination method and related equipment

Technical Field

The invention relates to the technical field of artificial intelligence, in particular to a similar text determination method and related equipment.

Background

At present, conventional unsupervised text similarity algorithms determine the similarity of sentences through the co-occurrence information of characters. However, if synonyms or near-synonyms appear in the texts, the similarity between two texts cannot be calculated accurately, which reduces the accuracy of determining similar texts.

Disclosure of Invention

In view of the foregoing, there is a need to provide a similar text determination method and related apparatus, which can improve the determination accuracy of similar text.

On one hand, the invention provides a similar text determination method, which comprises the following steps:

receiving a similar text determination request, and determining a text to be detected according to the similar text determination request;

acquiring a target text from the similar text determination request;

generating a feature vector to be detected according to the text to be detected and the target text, and generating a target feature vector according to the text to be detected and the target text;

calculating the similarity between the feature vector to be detected and the target feature vector to obtain the text similarity between the text to be detected and the target text, and determining a similarity coefficient according to the text to be detected and the target text;

determining the polarity characteristics of the text to be detected and the target text according to the tone of the text to be detected and the tone of the target text;

generating text characteristics of the text to be detected and the target text according to the text similarity, the similarity coefficient and the polarity characteristics;

converting the text to be detected into a semantic vector to be detected, and converting the target text into a target semantic vector;

generating semantic features of the text to be detected and the target text according to the semantic vector to be detected and the target semantic vector, and determining a similar result of the text to be detected and the target text according to the text features and the semantic features.

According to the preferred embodiment of the present invention, the determining, according to the similar text determination request, the text to be detected includes:

parsing the message of the similar text determination request to obtain data information carried by the message;

acquiring information for indicating a position from the data information as a storage position;

and determining a text library to be detected from the storage position, and extracting any text from the text library to be detected as the text to be detected.

According to a preferred embodiment of the present invention, the generating a feature vector to be detected according to the text to be detected and the target text includes:

performing word segmentation on the text to be detected to obtain a plurality of word segments to be detected, and performing word segmentation on the target text to obtain a plurality of target word segments;

acquiring a union of the multiple word segments to be detected and the multiple target word segments to obtain all the word segments;

and generating the feature vector to be detected according to the mapping relation between the multiple word segments to be detected and all the word segments.

According to the preferred embodiment of the present invention, the determining the similarity coefficient according to the text to be detected and the target text includes:

determining the intersection of the multiple word segments to be detected and the multiple target word segments as co-occurrence words;

calculating the co-occurrence number of the co-occurrence words, and calculating the total word segmentation amount of all the word segmentation;

and dividing the co-occurrence number by the total word segmentation amount to obtain the similarity coefficient.

According to a preferred embodiment of the present invention, the determining the polarity characteristics of the text to be detected and the target text according to the tone of the text to be detected and the tone of the target text includes:

detecting whether the text to be detected contains preset words or not to obtain a first detection result, and detecting whether the target text contains the preset words or not to obtain a second detection result, wherein the preset words are used for indicating negative tone;

determining a first tone of the text to be detected according to the first detection result, and determining a second tone of the target text according to the second detection result;

if the first tone is the same as the second tone, determining the polarity characteristic as a first numerical value; or

if the first tone is different from the second tone, determining the polarity characteristic as a second numerical value.

According to a preferred embodiment of the present invention, the converting the text to be detected into the semantic vector to be detected includes:

converting the text to be detected into a word vector sequence;

performing feature extraction on the word vector sequence by using a forward long-short term memory network to obtain a first feature vector;

extracting the characteristics of the word vector sequence by using a reverse long-short term memory network to obtain a second characteristic vector;

and splicing the first feature vector and the second feature vector to obtain the semantic vector to be detected.

According to a preferred embodiment of the present invention, the generating semantic features of the text to be detected and the target text according to the semantic vector to be detected and the target semantic vector includes:

subtracting the target semantic vector from the semantic vector to be detected to obtain a difference vector;

splicing the semantic vector to be detected, the target semantic vector and the difference vector to obtain a spliced semantic vector;

and carrying out iterative mapping on the spliced semantic vector by utilizing a plurality of pre-constructed hidden layers to obtain the semantic features.

In another aspect, the present invention further provides a similar text determining apparatus, where the similar text determining apparatus includes:

the determining unit is used for receiving a similar text determining request and determining a text to be detected according to the similar text determining request;

the acquisition unit is used for acquiring a target text from the similar text determination request;

the generating unit is used for generating a feature vector to be detected according to the text to be detected and the target text and generating a target feature vector according to the text to be detected and the target text;

the determining unit is further configured to calculate similarity between the feature vector to be detected and the target feature vector, obtain text similarity between the text to be detected and the target text, and determine a similarity coefficient according to the text to be detected and the target text;

the determining unit is further configured to determine the polarity characteristics of the text to be detected and the target text according to the tone of the text to be detected and the tone of the target text;

the generating unit is further configured to generate text features of the text to be detected and the target text according to the text similarity, the similarity coefficient and the polarity feature;

the conversion unit is used for converting the text to be detected into a semantic vector to be detected and converting the target text into a target semantic vector;

the determining unit is further configured to generate semantic features of the text to be detected and the target text according to the semantic vector to be detected and the target semantic vector, and determine a similar result of the text to be detected and the target text according to the text features and the semantic features.

In another aspect, the present invention further provides an electronic device, including:

a memory storing computer readable instructions; and

a processor executing computer readable instructions stored in the memory to implement the similar text determination method.

In another aspect, the present invention further provides a computer-readable storage medium, in which computer-readable instructions are stored, and the computer-readable instructions are executed by a processor in an electronic device to implement the similar text determination method.

According to the technical scheme, the text similarity, the similarity coefficient and the polarity characteristic of the text to be detected and the target text are determined. Because the polarity characteristic represents whether the tones of the text to be detected and the target text are the same, the degree of similarity between the two texts can be determined accurately. Determining the semantic characteristic avoids the loss of accuracy caused by synonyms or near-synonyms, so the similar result of the text to be detected and the target text can be determined accurately from the text characteristic and the semantic characteristic together.

Drawings

FIG. 1 is a flow chart of a preferred embodiment of the similar text determination method of the present invention.

FIG. 2 is a flow chart of an embodiment of the present invention for generating feature vectors to be detected.

FIG. 3 is a flow diagram of one embodiment of generating semantic features of the present invention.

Fig. 4 is a functional block diagram of a preferred embodiment of the similar text determining apparatus of the present invention.

Fig. 5 is a schematic structural diagram of an electronic device implementing a similar text determination method according to a preferred embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a flow chart of a preferred embodiment of the method for determining similar texts according to the present invention. The order of the steps in the flow chart may be changed and some steps may be omitted according to different needs.

The similar text determination method is applied to one or more electronic devices, which are devices capable of automatically performing numerical calculation and/or information processing according to computer readable instructions set or stored in advance, and the hardware thereof includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.

The electronic device may be any electronic product capable of performing human-computer interaction with a user, for example, a Personal computer, a tablet computer, a smart phone, a Personal Digital Assistant (PDA), a game machine, an interactive Internet Protocol Television (IPTV), a smart wearable device, and the like.

The electronic device may include a network device and/or a user device. Wherein the network device includes, but is not limited to, a single network electronic device, an electronic device group consisting of a plurality of network electronic devices, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of hosts or network electronic devices.

The network in which the electronic device is located includes, but is not limited to: the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), and the like.

And S10, receiving a similar text determination request, and determining the text to be detected according to the similar text determination request.

In at least one embodiment of the present invention, the information carried in the similar text determination request includes, but is not limited to: target text, storage location, etc. The similar text determination request may be triggered by any user.

The text to be detected refers to a text which needs to be detected whether the text is similar to the target text or not. There may be a plurality of texts to be detected.

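In at least one embodiment of the present invention, the determining, by the electronic device, the text to be detected according to the similar text determination request includes:

parsing the message of the similar text determination request to obtain data information carried by the message;

acquiring information for indicating a position from the data information as a storage position;

and determining a text library to be detected from the storage position, and extracting any text from the text library to be detected as the text to be detected.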

By the embodiment, the whole similar text determining request does not need to be analyzed, so that the obtaining efficiency of the storage position can be improved, and the text to be detected can be quickly obtained.

And S11, acquiring the target text from the similar text determination request.

In at least one embodiment of the present invention, the target text refers to a reference text in the similar text determination request.

In at least one embodiment of the present invention, the electronic device obtaining the target text from the similar text determination request includes:

and acquiring information for indicating a text from the data information as the target text.

According to the embodiment, the target text is stored in the similar text determination request, so that the target text can be quickly acquired from the data information obtained through analysis.
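Steps S10 and S11 can be sketched as follows. The patent does not specify the message format, so this sketch assumes a JSON message; the field names `storage_location` and `target_text`, the in-memory `TEXT_LIBRARIES` mapping, and the choice of the first text in the library are all hypothetical stand-ins.

```python
import json

# Hypothetical in-memory stand-in for text libraries, keyed by storage position.
TEXT_LIBRARIES = {
    "library/a": ["text one", "text two"],
}


def parse_request(message: str):
    """Parse a request message and return (text_to_detect, target_text)."""
    data = json.loads(message)                  # data information carried by the message
    storage_location = data["storage_location"]  # assumed field name
    target_text = data["target_text"]            # assumed field name
    library = TEXT_LIBRARIES[storage_location]   # text library to be detected
    text_to_detect = library[0]                  # "any text"; the first one, for determinism
    return text_to_detect, target_text


message = json.dumps({"storage_location": "library/a", "target_text": "reference"})
print(parse_request(message))
```

Because there may be several texts to be detected, a caller could equally iterate over the whole library and compare each text with the target text.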

And S12, generating a feature vector to be detected according to the text to be detected and the target text, and generating a target feature vector according to the text to be detected and the target text.

Referring to fig. 2, fig. 2 is a flow chart of an embodiment of the present invention for generating feature vectors to be detected. In at least one embodiment of the present invention, the generating, by the electronic device, the feature vector to be detected according to the text to be detected and the target text includes:

and S120, performing word segmentation processing on the text to be detected to obtain a plurality of words to be detected, and performing word segmentation processing on the target text to obtain a plurality of target words.

The multiple word segments to be detected can be multiple words, and the multiple target word segments can be multiple words.

And S121, acquiring a union of the multiple word segments to be detected and the multiple target word segments to obtain all the word segments.

And S122, generating the feature vector to be detected according to the mapping relation between the multiple word segments to be detected and all the word segments.

The mapping relation indicates, for each of all the word segments, whether it is present among the multiple word segments to be detected.

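For example, suppose each word segment is a single character of the source text, rendered here by its English gloss; two different source characters are both rendered as "you" and are written "you (1)" and "you (2)". The multiple word segments to be detected are: I, stand, i.e., help, you (1), claim, please, good, do. The multiple target word segments are: I, none, office, law, help, you (2), claim, please. All the word segments are therefore: I, help, claim, please, stand, i.e., you (1), good, do, none, office, law, you (2), which is 13 in total. Because "none, office, law, you (2)" do not appear among the multiple word segments to be detected, the feature vector to be detected is [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0].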

In this way, the feature vector to be detected is determined from both the text to be detected and the target text. Because the target text also contributes to the dimensions of the vector, the feature vector to be detected can be determined accurately.

In at least one embodiment of the present invention, the generating, by the electronic device, the target feature vector according to the text to be detected and the target text includes:

and generating the target feature vector according to the mapping relation between the target word segmentation and all the word segmentations.
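A minimal sketch of steps S120 to S122 and of the target feature vector, assuming word segmentation has already been done. The single-letter segments are placeholders chosen only to mirror the counts in the example (13 word segments in total, 4 of them shared), and the function name is hypothetical.

```python
def build_feature_vectors(detect_segments, target_segments):
    """Return (all_segments, detect_vector, target_vector) as 0/1 presence vectors."""
    # Union of both segment lists, keeping first-seen order so dimensions are stable.
    all_segments = list(dict.fromkeys(list(detect_segments) + list(target_segments)))
    detect_set = set(detect_segments)
    target_set = set(target_segments)
    # Mapping relation: 1 if the segment occurs in the text, otherwise 0.
    detect_vector = [1 if s in detect_set else 0 for s in all_segments]
    target_vector = [1 if s in target_set else 0 for s in all_segments]
    return all_segments, detect_vector, target_vector


detect = list("abcdefghi")   # 9 placeholder segments of the text to be detected
target = list("adfgwxyz")    # 8 placeholder segments of the target text, 4 shared
all_segments, detect_vector, target_vector = build_feature_vectors(detect, target)
print(len(all_segments))     # 13
print(detect_vector)         # [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
print(target_vector)         # [1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1]
```

Ordering the union by first appearance is a design choice of this sketch; any fixed ordering works as long as both vectors use the same one.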

S13, calculating the similarity between the feature vector to be detected and the target feature vector to obtain the text similarity between the text to be detected and the target text, and determining a similarity coefficient according to the text to be detected and the target text.

In at least one embodiment of the present invention, the electronic device calculates the similarity between the feature vector to be detected and the target feature vector by using a cosine similarity calculation formula.

The specific cosine similarity calculation formula is as follows:

$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

where $\cos\theta$ is the similarity between the feature vector to be detected and the target feature vector, $n$ is the vector dimension of the feature vector to be detected and the target feature vector, $i$ is the index of the current dimension, $A$ is the feature vector to be detected, and $B$ is the target feature vector.

The text similarity can be quickly determined through the cosine similarity calculation formula.
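The cosine similarity calculation can be sketched as below. The two input vectors are placeholders with 13 dimensions, 9 and 8 non-zero entries, and 4 shared entries, matching the counts of the example; returning 0.0 for an all-zero vector is an assumption of this sketch, not something the patent states.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: treat an empty text as having no similarity
    return dot / (norm_a * norm_b)


detect_vector = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
target_vector = [1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 1]
print(round(cosine_similarity(detect_vector, target_vector), 4))  # 0.4714
```

Here the dot product is 4 and the norms are 3 and the square root of 8, which gives the text similarity of 0.4714 used later when the text features are spliced.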

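In at least one embodiment of the present invention, the determining, by the electronic device, a similarity coefficient according to the text to be detected and the target text includes:

determining the intersection of the multiple word segments to be detected and the multiple target word segments as co-occurrence words;

calculating the co-occurrence number of the co-occurrence words, and calculating the total word segmentation amount of all the word segments;

and dividing the co-occurrence number by the total word segmentation amount to obtain the similarity coefficient.

Continuing the above example, the co-occurrence words are I, help, claim and please, so the co-occurrence number is 4; the total word segmentation amount of all the word segments is 13; dividing 4 by 13 gives a similarity coefficient of 4/13 ≈ 0.3077.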

Through the implementation mode, the similarity coefficient can be accurately determined according to the co-occurrence words of the text to be detected and the target text.
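The similarity coefficient (intersection size divided by union size) can be sketched as follows. The placeholder segments mirror the counts of the example, and returning 0.0 when both texts are empty is an assumption of this sketch.

```python
def similarity_coefficient(detect_segments, target_segments):
    """Number of co-occurrence words divided by the number of all word segments."""
    detect_set = set(detect_segments)
    target_set = set(target_segments)
    all_segments = detect_set | target_set   # union: all the word segments
    co_occurrence = detect_set & target_set  # intersection: co-occurrence words
    if not all_segments:
        return 0.0  # assumption: two empty texts yield a coefficient of 0
    return len(co_occurrence) / len(all_segments)


detect = list("abcdefghi")  # 9 placeholder segments
target = list("adfgwxyz")   # 8 placeholder segments, 4 shared with `detect`
print(round(similarity_coefficient(detect, target), 4))  # 0.3077
```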

And S14, determining the polarity characteristics of the text to be detected and the target text according to the tone of the text to be detected and the tone of the target text.

In at least one embodiment of the present invention, the polarity feature takes the value 1 or 0: when the tone of the text to be detected is the same as that of the target text, the polarity feature is determined to be 1; when the tones differ, it is determined to be 0.

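In at least one embodiment of the present invention, the determining, by the electronic device, the polarity characteristics of the text to be detected and the target text according to the tone of the text to be detected and the tone of the target text includes:

detecting whether the text to be detected contains preset words to obtain a first detection result, and detecting whether the target text contains the preset words to obtain a second detection result, wherein the preset words are used for indicating a negative tone;

determining a first tone of the text to be detected according to the first detection result, and determining a second tone of the target text according to the second detection result;

if the first tone is the same as the second tone, determining the polarity characteristic as a first numerical value; or

if the first tone is different from the second tone, determining the polarity characteristic as a second numerical value.

The preset words include, but are not limited to, negation words such as "none".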

Through the embodiment, the tone of the text to be detected and the tone of the target text can be accurately determined according to the preset words, and then the polarity characteristics can be accurately determined.
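A minimal sketch of the polarity feature, using the values 1 (same tone) and 0 (different tone) given above. The English negation list `NEGATION_WORDS` and the whitespace tokenization are hypothetical stand-ins for the preset words and the word segmentation of the patent.

```python
# Hypothetical preset words indicating a negative tone (English stand-ins).
NEGATION_WORDS = ("no", "not", "none", "never", "cannot")


def has_negative_tone(text, negation_words=NEGATION_WORDS):
    """Detection result: True if the text contains any preset negation word."""
    tokens = text.lower().split()
    return any(word in tokens for word in negation_words)


def polarity_feature(text_to_detect, target_text):
    """Return 1 if both texts have the same tone, otherwise 0."""
    first_tone = has_negative_tone(text_to_detect)
    second_tone = has_negative_tone(target_text)
    return 1 if first_tone == second_tone else 0


print(polarity_feature("I will help you apply right away", "I cannot help you apply"))  # 0
print(polarity_feature("I will help you apply", "I can help you apply now"))            # 1
```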

And S15, generating text characteristics of the text to be detected and the target text according to the text similarity, the similarity coefficient and the polarity characteristics.

In at least one embodiment of the present invention, the text features are obtained by concatenating the text similarity, the similarity coefficient, and the polarity feature.

For example, the text similarity is 0.4714, the similarity coefficient is 0.3077, the polarity feature is 0, and after concatenation, the text feature is [0.4714, 0.3077, 0 ].

And S16, converting the text to be detected into a semantic vector to be detected, and converting the target text into a target semantic vector.

In at least one embodiment of the present invention, the semantic vector to be detected represents the semantics of the text to be detected, and the target semantic vector represents the semantics of the target text.

In at least one embodiment of the present invention, the electronic device converts the text to be detected into a semantic vector to be detected, including:

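converting the text to be detected into a word vector sequence;

performing feature extraction on the word vector sequence by using a forward long short-term memory network to obtain a first feature vector;

performing feature extraction on the word vector sequence by using a reverse long short-term memory network to obtain a second feature vector;

and splicing the first feature vector and the second feature vector to obtain the semantic vector to be detected.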

By the implementation method, the generated semantic vector to be detected can have the context semantic of the text to be detected, and the determination accuracy of the semantic vector to be detected is improved.
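The forward and reverse long short-term memory passes and the splicing step can be illustrated with a small pure-Python sketch. The weights are random and untrained, the sizes are arbitrary, and taking the final hidden state of each pass as its feature vector is an assumption of this sketch; the patent does not say which state is used. All function names are hypothetical.

```python
import math
import random


def _sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm_params(input_size, hidden_size, rng):
    """Random (untrained) weights for the 4 gates, stacked row-wise."""
    cols = input_size + hidden_size
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
               for _ in range(4 * hidden_size)]
    biases = [0.0] * (4 * hidden_size)
    return weights, biases


def lstm_last_hidden(sequence, params, hidden_size):
    """Run one LSTM pass over the sequence and return the final hidden state."""
    weights, biases = params
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        z = list(x) + h  # current input concatenated with previous hidden state
        pre = [sum(w * v for w, v in zip(row, z)) + b
               for row, b in zip(weights, biases)]
        i_gate = [_sigmoid(p) for p in pre[:hidden_size]]
        f_gate = [_sigmoid(p) for p in pre[hidden_size:2 * hidden_size]]
        g_cand = [math.tanh(p) for p in pre[2 * hidden_size:3 * hidden_size]]
        o_gate = [_sigmoid(p) for p in pre[3 * hidden_size:]]
        c = [f * cp + i * g for f, cp, i, g in zip(f_gate, c, i_gate, g_cand)]
        h = [o * math.tanh(cv) for o, cv in zip(o_gate, c)]
    return h


def bilstm_semantic_vector(word_vectors, fwd_params, bwd_params, hidden_size):
    first = lstm_last_hidden(word_vectors, fwd_params, hidden_size)                   # forward pass
    second = lstm_last_hidden(list(reversed(word_vectors)), bwd_params, hidden_size)  # reverse pass
    return first + second  # splice the two feature vectors


rng = random.Random(0)
word_vectors = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]  # 5 words, dimension 3
fwd = make_lstm_params(3, 4, rng)
bwd = make_lstm_params(3, 4, rng)
semantic_vector = bilstm_semantic_vector(word_vectors, fwd, bwd, 4)
print(len(semantic_vector))  # 8
```

In practice a trained network from a deep learning library would replace these hand-written passes; the sketch only shows how the two directions are combined.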

S17, generating semantic features of the text to be detected and the target text according to the semantic vector to be detected and the target semantic vector, and determining a similar result of the text to be detected and the target text according to the text features and the semantic features.

It is emphasized that the similar results can also be stored in a node of a blockchain in order to further ensure the privacy and security of the similar results.

In at least one embodiment of the present invention, the similar result is either that the text to be detected is similar to the target text or that the text to be detected is not similar to the target text.

Referring to FIG. 3, FIG. 3 is a flow diagram of one embodiment of generating semantic features of the present invention. In at least one embodiment of the present invention, the generating, by the electronic device, the semantic features of the text to be detected and the target text according to the semantic vector to be detected and the target semantic vector includes:

and S170, subtracting the target semantic vector from the to-be-detected semantic vector to obtain a difference vector.

And S171, splicing the semantic vector to be detected, the target semantic vector and the difference vector to obtain a spliced semantic vector.

And S172, carrying out iterative mapping on the spliced semantic vector by utilizing a plurality of layers of hidden layers which are constructed in advance to obtain the semantic features.

By the above embodiment, the semantic features are obtained according to the operation of the semantic vector to be detected and the target semantic vector, so that the semantic features have the semantics in the text to be detected and the target text, and the accuracy of the semantic features is improved.
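Steps S170 to S172 can be sketched as follows. The hidden layers here are fully connected layers with a tanh activation and random, untrained weights; the activation, the layer sizes and the function names are assumptions of this sketch, since the patent only states that a plurality of pre-constructed hidden layers is used.

```python
import math
import random


def difference_vector(detect_vec, target_vec):
    """S170: element-wise (to-be-detected minus target)."""
    return [d - t for d, t in zip(detect_vec, target_vec)]


def random_layer(n_in, n_out, rng):
    """Hypothetical untrained hidden layer: (weights, biases)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense_layer(vector, weights, biases):
    """One hidden layer: affine map followed by tanh (assumed activation)."""
    return [math.tanh(sum(w * v for w, v in zip(row, vector)) + b)
            for row, b in zip(weights, biases)]


def semantic_features(detect_vec, target_vec, layers):
    diff = difference_vector(detect_vec, target_vec)
    spliced = list(detect_vec) + list(target_vec) + diff  # S171: splicing
    out = spliced
    for weights, biases in layers:                        # S172: iterative mapping
        out = dense_layer(out, weights, biases)
    return out


rng = random.Random(0)
detect_vec = [0.2, -0.1]   # placeholder semantic vector to be detected
target_vec = [0.1, 0.3]    # placeholder target semantic vector
layers = [random_layer(6, 4, rng), random_layer(4, 2, rng)]
features = semantic_features(detect_vec, target_vec, layers)
print(len(features))  # 2
```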

In at least one embodiment of the present invention, the determining, by the electronic device, a similarity result between the text to be detected and the target text according to the text feature and the semantic feature includes:

splicing the text features and the semantic features to obtain a target vector;

and inputting the target vector into a pre-constructed two-classification network to obtain the similar result.

With the above embodiment, since the similarity result is determined by using the text feature and the semantic feature, the similarity result can be accurately determined.
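The final decision can be sketched as below. A single logistic unit stands in for the pre-constructed two-classification network; the weights, the bias, the 0.5 threshold and the placeholder semantic features are all hypothetical, so no particular score should be read into the example.

```python
import math


def classify_similar(text_features, semantic_features, weights, bias, threshold=0.5):
    """Splice both feature groups and score them with one logistic unit."""
    target_vector = list(text_features) + list(semantic_features)  # splicing
    logit = sum(w * v for w, v in zip(weights, target_vector)) + bias
    score = 1.0 / (1.0 + math.exp(-logit))
    return target_vector, score, score >= threshold


text_features = [0.4714, 0.3077, 0]   # text similarity, similarity coefficient, polarity
semantic_feats = [0.12, -0.05]        # placeholder semantic features
weights = [1.5, 1.0, 2.0, 0.8, -0.3]  # hypothetical, untrained weights
target_vector, score, is_similar = classify_similar(text_features, semantic_feats, weights, bias=-1.0)
print(target_vector)  # [0.4714, 0.3077, 0, 0.12, -0.05]
print("similar" if is_similar else "not similar")
```

A trained classifier would learn the weights from labeled text pairs; the threshold then controls the trade-off between missed and spurious matches.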

Fig. 4 is a functional block diagram of a preferred embodiment of the similar text determination apparatus according to the present invention. The similar text determination device 11 includes a determination unit 110, an acquisition unit 111, a generation unit 112, and a conversion unit 113. The module/unit referred to herein is a series of computer readable instruction segments that can be accessed by the processor 13 and perform a fixed function and that are stored in the memory 12. In the present embodiment, the functions of the modules/units will be described in detail in the following embodiments.

The determining unit 110 receives the similar text determination request, and determines the text to be detected according to the similar text determination request.

In at least one embodiment of the present invention, the determining unit 110 determines, according to the similar text determination request, that the text to be detected includes:

The acquisition unit 111 acquires the target text from the similar text determination request.

In at least one embodiment of the present invention, the obtaining unit 111 obtains the target text from the similar text determination request, including:

The generating unit 112 generates a feature vector to be detected according to the text to be detected and the target text, and generates a target feature vector according to the text to be detected and the target text.

In at least one embodiment of the present invention, the generating unit 112 generates the feature vector to be detected according to the text to be detected and the target text, including:

performing word segmentation processing on the text to be detected to obtain a plurality of word segments to be detected, and performing word segmentation processing on the target text to obtain a plurality of target word segments.

And acquiring a union of the multiple word segments to be detected and the multiple target word segments to obtain all the word segments.

In at least one embodiment of the present invention, the generating unit 112 generates the target feature vector according to the text to be detected and the target text, including:

The determining unit 110 calculates similarity between the feature vector to be detected and the target feature vector to obtain text similarity between the text to be detected and the target text, and determines a similarity coefficient according to the text to be detected and the target text.

In at least one embodiment of the present invention, the determining unit 110 calculates the similarity between the feature vector to be detected and the target feature vector by using a cosine similarity calculation formula.

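The specific cosine similarity calculation formula is as follows:

$$\cos\theta = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}}\,\sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

where $\cos\theta$ is the similarity between the feature vector to be detected and the target feature vector, $n$ is the vector dimension of the feature vector to be detected and the target feature vector, $i$ is the index of the current dimension, $A$ is the feature vector to be detected, and $B$ is the target feature vector.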

In at least one embodiment of the present invention, the determining unit 110 determines the similarity coefficient according to the text to be detected and the target text, including:

Continuing the above example, the co-occurrence words are I, help, claim and please, so the co-occurrence number is 4; the total word segmentation amount of all the word segments is 13; dividing 4 by 13 gives a similarity coefficient of 4/13 ≈ 0.3077.

The determining unit 110 determines the polarity characteristics of the text to be detected and the target text according to the tone of the text to be detected and the tone of the target text.

In at least one embodiment of the present invention, the determining unit 110 determines the polarity characteristics of the text to be detected and the target text according to the tone of the text to be detected and the tone of the target text, including:

The preset words include, but are not limited to, negation words such as "none".

The generating unit 112 generates text features of the text to be detected and the target text according to the text similarity, the similarity coefficient and the polarity feature.

The conversion unit 113 converts the text to be detected into a semantic vector to be detected, and converts the target text into a target semantic vector.

In at least one embodiment of the present invention, the converting unit 113 converts the text to be detected into the semantic vector to be detected, including:

converting the text to be detected into a word vector sequence;

The determining unit 110 generates semantic features of the text to be detected and the target text according to the semantic vector to be detected and the target semantic vector, and determines a similar result of the text to be detected and the target text according to the text features and the semantic features.

In at least one embodiment of the present invention, the determining unit 110 generates the semantic features of the text to be detected and the target text according to the semantic vector to be detected and the target semantic vector, including:

and subtracting the target semantic vector from the semantic vector to be detected to obtain a difference vector.

And splicing the semantic vector to be detected, the target semantic vector and the difference vector to obtain a spliced semantic vector.

In at least one embodiment of the present invention, the determining unit 110 determines, according to the text feature and the semantic feature, a similarity result between the text to be detected and the target text, including:

splicing the text features and the semantic features to obtain a target vector;

Fig. 5 is a schematic structural diagram of an electronic device according to a preferred embodiment of the present invention for implementing a method for determining similar texts.

In one embodiment of the present invention, the electronic device 1 includes, but is not limited to, a memory 12, a processor 13, and computer readable instructions, such as a similar text determination program, stored in the memory 12 and executable on the processor 13.

It will be appreciated by a person skilled in the art that the schematic diagram is only an example of the electronic device 1 and does not constitute a limitation of the electronic device 1, and that it may comprise more or less components than shown, or some components may be combined, or different components, e.g. the electronic device 1 may further comprise an input output device, a network access device, a bus, etc.

The Processor 13 may be a Central Processing Unit (CPU), other general purpose Processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware component, etc. The processor 13 is an operation core and a control center of the electronic device 1, and is connected to each part of the whole electronic device 1 by various interfaces and lines, and executes an operating system of the electronic device 1 and various installed application programs, program codes, and the like.

Illustratively, the computer readable instructions may be partitioned into one or more modules/units that are stored in the memory 12 and executed by the processor 13 to implement the present invention. The one or more modules/units may be a series of computer readable instruction segments capable of performing specific functions, which are used for describing the execution process of the computer readable instructions in the electronic device 1. For example, the computer readable instructions may be divided into a determination unit 110, an acquisition unit 111, a generation unit 112, and a conversion unit 113.

The memory 12 may be used for storing the computer readable instructions and/or modules, and the processor 13 implements various functions of the electronic device 1 by running or executing the computer readable instructions and/or modules stored in the memory 12 and invoking data stored in the memory 12. The memory 12 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the electronic device, and the like. The memory 12 may include non-volatile and volatile memories, such as: a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other storage device.

The memory 12 may be an external memory and/or an internal memory of the electronic device 1. Further, the memory 12 may be a memory having a physical form, such as a memory stick, a TF Card (Trans-flash Card), or the like.

The integrated modules/units of the electronic device 1 may be stored in a computer-readable storage medium if they are implemented in the form of software functional units and sold or used as separate products. Based on such understanding, all or part of the flow of the method according to the above embodiments may be implemented by computer readable instructions that instruct the relevant hardware. The computer readable instructions may be stored in a computer readable storage medium, and when they are executed by a processor, the steps of the method embodiments may be implemented.

The computer readable instructions comprise computer readable instruction code, which may be in source code form, object code form, an executable file, some intermediate form, and the like. The computer-readable medium may include: any entity or device capable of carrying the computer readable instruction code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), and a Random Access Memory (RAM).

Blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains information on a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
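The idea of data blocks linked by a cryptographic method can be illustrated with a minimal hash chain in Python. This is a sketch under stated assumptions, not the blockchain platform used by the invention: the block layout, the function names `block_hash`, `append_block`, and `is_valid`, and the sample payloads are hypothetical, and no consensus mechanism or point-to-point transmission is modeled.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # assumed placeholder for the first block's predecessor


def block_hash(index, data, prev_hash):
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, data):
    """Append a block whose prev_hash points at the current last block."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS_PREV_HASH
    index = len(chain)
    block = {
        "index": index,
        "data": data,
        "prev_hash": prev_hash,
        "hash": block_hash(index, data, prev_hash),
    }
    chain.append(block)
    return block


def is_valid(chain):
    """Recompute every hash and check that each block links to its predecessor."""
    prev_hash = GENESIS_PREV_HASH
    for i, block in enumerate(chain):
        if block["index"] != i or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["data"], block["prev_hash"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_block(chain, {"similarity_result": "similar"})
append_block(chain, {"similarity_result": "not similar"})
```

Because each block's hash covers the previous block's hash, altering a stored similarity result after the fact causes `is_valid` to fail, which is the anti-counterfeiting property referred to above.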

With reference to fig. 1, the memory 12 of the electronic device 1 stores computer-readable instructions to implement a similar text determination method, and the processor 13 can execute the computer-readable instructions to implement:

acquiring a target text from the similar text determination request;

Specifically, for the way in which the processor 13 implements the computer readable instructions, reference may be made to the description of the relevant steps in the embodiment corresponding to fig. 1, which is not repeated here.

In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative: the division of the modules is only one logical functional division, and other divisions may be used in practice.

The computer readable storage medium has computer readable instructions stored thereon, wherein the computer readable instructions when executed by the processor 13 are configured to implement the steps of:

acquiring a target text from the similar text determination request;

The modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.

In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional module.

The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference signs in the claims shall not be construed as limiting the claim concerned.

Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or devices may also be implemented by one unit or device through software or hardware. The terms first, second, etc. are used to denote names and do not indicate any particular order.

Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention is described in detail with reference to the preferred embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope.

Claims

1. A similar text determination method, characterized in that the similar text determination method comprises:

acquiring a target text from the similar text determination request;

2. The method for determining similar texts according to claim 1, wherein the determining the text to be detected according to the similar text determination request comprises:

3. The method for determining similar texts according to claim 1, wherein the generating feature vectors to be detected according to the texts to be detected and the target text comprises:

4. The method for determining similar texts according to claim 3, wherein the determining the similarity coefficient according to the text to be detected and the target text comprises:

5. The method for determining similar texts according to claim 1, wherein the determining the polarity characteristics of the text to be detected and the target text according to the mood of the text to be detected and the mood of the target text comprises:

6. The method for determining similar texts according to claim 1, wherein the converting the text to be detected into the semantic vector to be detected comprises:

converting the text to be detected into a word vector sequence;

7. The method for determining similar texts according to claim 1, wherein the generating semantic features of the text to be detected and the target text according to the semantic vector to be detected and the target semantic vector comprises:

8. A similar text determination apparatus, characterized in that the similar text determination apparatus comprises:

9. An electronic device, characterized in that the electronic device comprises:

a memory storing computer readable instructions; and

a processor executing computer readable instructions stored in the memory to implement the similar text determination method of any of claims 1 to 7.

10. A computer-readable storage medium characterized by: the computer readable storage medium has stored therein computer readable instructions which are executed by a processor in an electronic device to implement the similar text determination method as claimed in any one of claims 1 to 7.