CN111144104B - Text similarity determination method, device and computer readable storage medium - Google Patents

Text similarity determination method, device and computer readable storage medium

Info

Publication number
CN111144104B
CN111144104B (application number CN201811297685.5A)
Authority
CN
China
Prior art keywords
text
words
similarity
word
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811297685.5A
Other languages
Chinese (zh)
Other versions
CN111144104A (en)
Inventor
路绪海
马怡安
黄挺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN201811297685.5A priority Critical patent/CN111144104B/en
Publication of CN111144104A publication Critical patent/CN111144104A/en
Application granted granted Critical
Publication of CN111144104B publication Critical patent/CN111144104B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The disclosure relates to a method and device for determining text similarity and to a computer-readable storage medium, and belongs to the technical field of artificial intelligence. The method comprises the following steps: calculating, using word vectors, the degree of correlation between each word in a first text and a second text as a first similarity, wherein the number of words in the first text is smaller than the number of words in the second text; selecting a corresponding number of words from the second text as target words according to the number of words in the first text; calculating, using the word vectors, the degree of correlation between each target word and the first text as a second similarity; and calculating a comprehensive similarity of the first text and the second text according to the first similarity, the second similarity, and the length of the second text. This technical solution can improve the accuracy of text similarity.

Description

Text similarity determination method, device and computer readable storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and in particular, to a method for determining text similarity, a device for determining text similarity, and a computer readable storage medium.
Background
In the field of artificial intelligence, text similarity calculation is a typical application of weak artificial intelligence and is the basis of interaction between a robot and a user. How to determine the similarity between texts is one of the actively researched directions in this field.
In the related art, words in a target text are compared with words in a comparison text one by one to determine text similarity.
Disclosure of Invention
The inventors of the present disclosure found that the above related art has the following problem: the accuracy of the determined text similarity is low and is greatly affected by text length, especially when the texts differ in length or one text contains the other.
In view of this, the present disclosure proposes a technical solution for determining text similarity, which can improve accuracy of text similarity.
According to some embodiments of the present disclosure, there is provided a method for determining text similarity, including: calculating, using word vectors, the degree of correlation between each word in a first text and a second text as a first similarity, wherein the number of words in the first text is smaller than the number of words in the second text; selecting a corresponding number of words from the second text as target words according to the number of words in the first text; calculating, using the word vectors, the degree of correlation between each target word and the first text as a second similarity; and calculating a comprehensive similarity of the first text and the second text according to the first similarity, the second similarity, and the length of the second text.
In some embodiments, vector distances between the word vector of each word in the first text and the word vectors of the words in the second text are calculated; the smallest of these vector distances is taken as the correlation coefficient of the word; and the first similarity is determined according to a weighted sum of the correlation coefficients of all words in the first text.
In some embodiments, the difference between the number of words of the first text and the number of words of the second text is taken as a target number, and the target number of words are selected from the second text as the target words.
In some embodiments, the last N words of the second text are selected as the target words, where N is the target number.
In some embodiments, vector distances between the word vector of each target word in the second text and the word vectors of the words in the first text are calculated; the smallest of these vector distances is taken as the correlation coefficient of the target word; and the second similarity is determined according to a weighted sum of the correlation coefficients of all target words in the second text.
In some embodiments, the comprehensive similarity is positively correlated with a weighted sum of the first similarity and the second similarity, and negatively correlated with the length of the second text.
According to other embodiments of the present disclosure, there is provided a device for determining text similarity, including: a calculating unit configured to calculate, using word vectors, the degree of correlation between each word in a first text and a second text as a first similarity, where the number of words in the first text is smaller than the number of words in the second text, to calculate, using the word vectors, the degree of correlation between each target word and the first text as a second similarity, and to calculate a comprehensive similarity of the first text and the second text according to the first similarity, the second similarity, and the length of the second text; and a selecting unit configured to select a corresponding number of words from the second text as the target words according to the number of words in the first text.
In some embodiments, the calculating unit calculates a vector distance between a word vector of a word in the first text and a word vector of each word in the second text, uses a minimum one of all the vector distances as a correlation coefficient of the word, and determines the first similarity according to a weighted sum of correlation coefficients of all the words in the first text.
In some embodiments, the selecting unit uses a difference value between the number of words of the first text and the number of words of the second text as a target number, and selects the target number of words in the second text as the target word.
In some embodiments, the selecting unit selects the last N words in the second text as the target words, where N is the target number.
In some embodiments, the calculating unit calculates a vector distance between a word vector of the target word in the second text and a word vector of each word in the first text, uses a minimum one of all the vector distances as a correlation coefficient of the target word, and determines the second similarity according to a weighted sum of correlation coefficients of all target words in the second text.
In some embodiments, the comprehensive similarity is positively correlated with a weighted sum of the first similarity and the second similarity, and negatively correlated with the length of the second text.
According to still further embodiments of the present disclosure, there is provided a text similarity determining apparatus, including: a memory; and a processor coupled to the memory, the processor configured to perform the method of determining text similarity in any of the embodiments described above based on instructions stored in the memory.
According to still further embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of determining text similarity in any of the above embodiments.
In the above embodiments, not only is the degree of correlation between the words of the shorter text and the longer text considered; according to the length difference between the texts, the degree of correlation between a corresponding number of words of the longer text and the shorter text is also considered, and the resulting comprehensive similarity is adjusted according to the text length. This enhances the method's adaptability to text length and avoids unstable or inaccurate calculation caused by text length differences, thereby improving the accuracy of the text similarity.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The disclosure may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 illustrates a flow chart of some embodiments of a method of determining text similarity of the present disclosure;
FIG. 2 illustrates a flow chart of some embodiments of step 110 of FIG. 1;
FIG. 3 illustrates a flow chart of some embodiments of step 120 of FIG. 1;
FIG. 4 illustrates a block diagram of some embodiments of a determination apparatus of text similarity of the present disclosure;
FIG. 5 illustrates a block diagram of further embodiments of a text similarity determination apparatus of the present disclosure;
fig. 6 shows a block diagram of still further embodiments of a determination device of text similarity of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Fig. 1 illustrates a flow chart of some embodiments of a method of determining text similarity of the present disclosure.
As shown in fig. 1, the method includes: step 110, calculating a first similarity; step 120, selecting a target word; step 130, calculating a second similarity; and step 140, calculating the comprehensive similarity.
In step 110, the degree of correlation between each word in the first text and the second text is calculated as a first similarity by using word vectors, where the number of words in the first text is smaller than the number of words in the second text.
In some embodiments, word segmentation may be performed on the first text and the second text to obtain the words of each text, and a skip-gram model of Word2vec may then be used to calculate the word vector of each word. On the one hand, taking the shorter text as the processing object improves calculation efficiency; on the other hand, the degree of correlation between the words of the longer text and the shorter text can additionally be obtained in a subsequent step according to the text length difference, which improves the accuracy of the similarity.
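For illustration only, the following is a minimal sketch of this step, assuming Chinese text, the jieba segmenter, and gensim 4.x as the Word2vec implementation; the corpus, parameter values, and variable names are assumptions rather than the patented configuration.

```python
# Sketch: segment two texts and obtain skip-gram word vectors (assumed setup).
import jieba
from gensim.models import Word2Vec

corpus = ["今天天气很好", "今天天气不错，适合出门"]   # placeholder training corpus
sentences = [jieba.lcut(s) for s in corpus]            # word segmentation

# sg=1 selects the skip-gram training algorithm (gensim 4.x API)
model = Word2Vec(sentences, sg=1, vector_size=100, window=5, min_count=1)

first_words = jieba.lcut("今天天气很好")                # shorter first text
second_words = jieba.lcut("今天天气不错，适合出门")      # longer second text
first_vecs = [model.wv[w] for w in first_words if w in model.wv]
second_vecs = [model.wv[w] for w in second_words if w in model.wv]
```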
In some embodiments, step 110 may be performed by the embodiment of fig. 2.
Fig. 2 illustrates a flow chart of some embodiments of step 110 of fig. 1.
As shown in fig. 2, step 110 includes: step 1110, calculating the distance of each vector; step 1120, determining a correlation coefficient; and step 1130, determining a first similarity.
In step 1110, vector distances between the word vectors of the words in the first text and the word vectors of the words in the second text are calculated. For example, the first text contains L words in total: w_{1,1}, w_{1,2}, …, w_{1,l}, …, w_{1,L}, with corresponding word vectors v_{1,1}, v_{1,2}, …, v_{1,l}, …, v_{1,L}; the second text contains M words in total: w_{2,1}, w_{2,2}, …, w_{2,m}, …, w_{2,M}, with corresponding word vectors v_{2,1}, v_{2,2}, …, v_{2,m}, …, v_{2,M}.
In some embodiments, the Euclidean distances d_{1,1}, d_{1,2}, …, d_{1,m}, …, d_{1,M} between the word vector v_{1,1} of w_{1,1} and v_{2,1}, v_{2,2}, …, v_{2,m}, …, v_{2,M} may be calculated respectively.
In step 1120, the smallest of all the vector distances is taken as the correlation coefficient of the word. For example, the correlation coefficient of w_{1,1} is d_1 = min(d_{1,1}, d_{1,2}, …, d_{1,m}, …, d_{1,M}). The correlation coefficients of the L words in the first text are thus obtained: d_1, d_2, …, d_l, …, d_L.
In step 1130, a first similarity is determined according to a weighted sum of the correlation coefficients of all words in the first text. For example, the first similarity may be determined as D_1 = d_1 + d_2 + … + d_L; the d_l may also be weighted before summation as required.
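A minimal sketch of steps 1110 to 1130 follows, assuming the word vectors are numpy arrays (as produced above) and using equal weights for the summation; the function name is illustrative.

```python
# Sketch: first similarity D1 = sum over words of the first text of the
# smallest Euclidean distance to any word vector of the second text.
import numpy as np

def first_similarity(first_vecs, second_vecs):
    coeffs = []
    for v1 in first_vecs:
        # distances from this word of the first text to every word of the second text
        dists = [np.linalg.norm(v1 - v2) for v2 in second_vecs]
        coeffs.append(min(dists))          # correlation coefficient of the word
    return float(np.sum(coeffs))           # D1 (equal weights)
```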
After the first similarity is determined, the comprehensive similarity may be calculated through the remaining steps in FIG. 1.
In step 120, according to the number of words in the first text, a corresponding number of words are selected as target words in the second text. For example, step 120 may be performed by the embodiment of fig. 3.
Fig. 3 illustrates a flow chart of some embodiments of step 120 of fig. 1.
As shown in fig. 3, step 120 includes: step 1210, determining a target number; step 1220, select the target word.
In step 1210, the difference between the number of words of the first text and the number of words of the second text is taken as the target number.
In step 1220, a target number of terms are selected as target terms in the second text. For example, the last N words may be selected in the second text as target words, N being the target number.
In some embodiments, if there are 10 words in the first text and 50 words in the second text, the 11th to 50th words of the second text may be selected as the target words. A corresponding number of target words may also be selected at random, as required.
In the case where the longer second text contains the words of the shorter first text but has a completely different meaning from the first text, target words selected in this way express the true meaning of the second text more accurately, which improves the accuracy of the determined text similarity.
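A minimal sketch of step 120 under the last-N selection option described above; the names are illustrative.

```python
# Sketch: select the last N = M - L words of the longer second text as target words.
def select_target_words(first_words, second_words):
    n = len(second_words) - len(first_words)      # target number N
    return second_words[-n:] if n > 0 else []     # last N words of the second text
```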
After the target words are selected, the comprehensive similarity can be calculated through the remaining steps in FIG. 1.
In step 130, the degree of relevance of each target word to the first text is calculated as a second similarity using the word vector. For example, a vector distance between a word vector of the target word in the second text and a word vector of each word in the first text is calculated. The smallest one of all vector distances is taken as the correlation coefficient of the target word. And determining the second similarity according to the weighted sum of the correlation coefficients of all target words in the second text.
In some embodiments, N words are selected from the second text as target words, where N = M − L. The N target words may be the (L+1)-th to (L+N)-th words of the second text: w_{2,L+1}, w_{2,L+2}, …, w_{2,L+N}, with corresponding word vectors v_{2,L+1}, v_{2,L+2}, …, v_{2,L+N}. The Euclidean distances d_{L+1,1}, d_{L+1,2}, …, d_{L+1,L} between the word vector v_{2,L+1} of w_{2,L+1} and v_{1,1}, v_{1,2}, …, v_{1,l}, …, v_{1,L} are calculated respectively.
The smallest of all the vector distances may be taken as the correlation coefficient of the target word. For example, the correlation coefficient of w_{2,L+1} is d_{L+1} = min(d_{L+1,1}, d_{L+1,2}, …, d_{L+1,L}). The correlation coefficients of the N target words in the second text are thus obtained: d_{L+1}, d_{L+2}, …, d_{L+N}.
The second similarity is determined according to a weighted sum of the correlation coefficients of all target words in the second text. For example, the second similarity may be determined as D_2 = d_{L+1} + d_{L+2} + … + d_{L+N}; the d_{L+n} may also be weighted before summation as required.
In step 140, the comprehensive similarity of the first text and the second text is calculated according to the first similarity, the second similarity, and the length of the second text. For example, the comprehensive similarity is positively correlated with a weighted sum of the first similarity and the second similarity, and negatively correlated with the length of the second text.
In some embodiments, where the relatively long second text contains M words in total, the comprehensive similarity S may be determined according to the following formula:
S = (D_1 + D_2) / M
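Combining the sketches above, a minimal end-to-end version of step 140 might look as follows; wv stands for any word-to-vector mapping (e.g. model.wv), and all names are assumptions. Note that, since D1 and D2 are built from distances in this sketch, a smaller S indicates closer texts.

```python
# Sketch: comprehensive similarity S = (D1 + D2) / M, with M the word count
# of the longer second text.
def comprehensive_similarity(first_words, second_words, wv):
    first_vecs = [wv[w] for w in first_words if w in wv]
    second_vecs = [wv[w] for w in second_words if w in wv]
    target_words = select_target_words(first_words, second_words)
    target_vecs = [wv[w] for w in target_words if w in wv]
    d1 = first_similarity(first_vecs, second_vecs)
    d2 = second_similarity(target_vecs, first_vecs)
    return (d1 + d2) / len(second_words)          # divide by M
```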
in some embodiments, a synonym table and a paraphrasing table may be preset, synonyms and paraphrasing in the first text and the second text are determined according to the synonym table and the paraphrasing table, and the distance between the synonym and the paraphrasing vector is set to 0.
In the above embodiments, not only is the degree of correlation between the words of the shorter text and the longer text considered; according to the length difference between the texts, the degree of correlation between a corresponding number of words of the longer text and the shorter text is also considered, and the resulting comprehensive similarity is adjusted according to the text length. This enhances the method's adaptability to text length and avoids unstable or inaccurate calculation caused by text length differences, thereby improving the accuracy of the text similarity.
Fig. 4 illustrates a block diagram of some embodiments of a determination apparatus of text similarity of the present disclosure.
As shown in fig. 4, the text similarity determining device 4 includes a calculating unit 41 and a selecting unit 42.
The calculation unit 41 calculates, as the first similarity, the degree of correlation of each word in the first text with the second text using the word vector, the number of words in the first text being smaller than the number of words in the second text.
In some embodiments, the calculation unit 41 calculates a vector distance between a word vector of words in the first text and a word vector of words in the second text. The calculation unit 41 takes the smallest one of all vector distances as the correlation coefficient of the word. The calculation unit 41 determines the first similarity from a weighted sum of the correlation coefficients of all the words in the first text.
The selecting unit 42 selects a corresponding number of words in the second text as target words according to the number of words in the first text.
In some embodiments, the selection unit 42 takes the difference between the number of words of the first text and the number of words of the second text as the target number N, and selects the target number of words in the second text as the target words. For example, the selection unit 42 selects the last N words in the second text as target words.
The calculation unit 41 calculates the degree of correlation of each target word with the first text as the second degree of similarity using the word vector. For example, the calculation unit 41 calculates a vector distance between a word vector of a target word in the second text and a word vector of each word in the first text, takes the smallest one of all vector distances as a correlation coefficient of the target word, and determines a second similarity from a weighted sum of correlation coefficients of all target words in the second text.
The calculation unit 41 calculates the comprehensive similarity of the first text and the second text according to the first similarity, the second similarity, and the length of the second text. For example, the comprehensive similarity is positively correlated with a weighted sum of the first similarity and the second similarity, and negatively correlated with the length of the second text.
In the above embodiments, not only is the degree of correlation between the words of the shorter text and the longer text considered; according to the length difference between the texts, the degree of correlation between a corresponding number of words of the longer text and the shorter text is also considered, and the resulting comprehensive similarity is adjusted according to the text length. This enhances the device's adaptability to text length and avoids unstable or inaccurate calculation caused by text length differences, thereby improving the accuracy of the text similarity.
Fig. 5 shows a block diagram of further embodiments of a device for determining text similarity of the present disclosure.
As shown in fig. 5, the text similarity determining device 5 of this embodiment includes: a memory 51 and a processor 52 coupled to the memory 51, the processor 52 being configured to perform the method of determining text similarity in any of the embodiments of the present disclosure based on instructions stored in the memory 51.
The memory 51 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, a database, and other programs.
Fig. 6 shows a block diagram of still further embodiments of a determination device of text similarity of the present disclosure.
As shown in fig. 6, the text similarity determining device 6 of this embodiment includes: a memory 610 and a processor 620 coupled to the memory 610, the processor 620 being configured to perform the method of determining text similarity in any of the foregoing embodiments based on instructions stored in the memory 610.
The memory 610 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, application programs, a boot loader, and other programs.
The text similarity determining device 6 may further include an input-output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, 650 and the memory 610 and processor 620 may be connected by, for example, a bus 660. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. Network interface 640 provides a connection interface for various networking devices. The storage interface 650 provides a connection interface for external storage devices such as SD cards, U-discs, and the like.
It will be appreciated by those skilled in the art that embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Heretofore, a determination method of text similarity, a determination apparatus of text similarity, and a computer-readable storage medium according to the present disclosure have been described in detail. In order to avoid obscuring the concepts of the present disclosure, some details known in the art are not described. How to implement the solutions disclosed herein will be fully apparent to those skilled in the art from the above description.
The methods and systems of the present disclosure may be implemented in a number of ways. For example, the methods and systems of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (12)

1. A method for determining text similarity comprises the following steps:
calculating the degree of correlation between each word in a first text and a second text by using word vectors to serve as a first similarity, wherein the number of words in the first text is smaller than that of words in the second text;
selecting a corresponding number of words from the second text as target words according to the number of words of the first text;
calculating the correlation degree of each target word and the first text by using the word vector as a second similarity;
calculating the comprehensive similarity of the first text and the second text according to the first similarity, the second similarity and the length of the second text;
the selecting the corresponding number of words in the second text as the target words according to the number of words in the first text comprises:
taking the difference value of the word number of the first text and the word number of the second text as a target number;
and selecting the words of the target number from the second text as the target words.
2. The determining method according to claim 1, wherein the calculating, using the word vector, a degree of relevance of each word in the first text to the second text as the first similarity includes:
calculating vector distances between word vectors of words in the first text and word vectors of words in the second text;
taking the smallest one of all the vector distances as a correlation coefficient of the word;
and determining the first similarity according to the weighted sum of the correlation coefficients of all words in the first text.
3. The determining method of claim 1, wherein the selecting the target number of words in the second text as the target words comprises:
and selecting the last N words in the second text as the target words, wherein N is the target number.
4. The determining method of claim 1, wherein the calculating, using the word vector, a degree of relevance of each target word to the first text as a second degree of similarity includes:
calculating vector distances between word vectors of target words in the second text and word vectors of words in the first text;
taking the smallest one of all the vector distances as a correlation coefficient of the target word;
and determining the second similarity according to the weighted sum of the correlation coefficients of all target words in the second text.
5. The method for determining according to any one of claims 1 to 4, wherein,
the integrated similarity is positively correlated with a weighted sum of the first similarity and the second similarity, and negatively correlated with the length of the second text.
6. A text similarity determining device includes:
a calculating unit configured to calculate, using word vectors, the degree of correlation between each word in a first text and a second text as a first similarity, wherein the number of words in the first text is smaller than the number of words in the second text, to calculate, using the word vectors, the degree of correlation between each target word and the first text as a second similarity, and to calculate a comprehensive similarity of the first text and the second text according to the first similarity, the second similarity, and the length of the second text; and
a selecting unit configured to select a corresponding number of words from the second text as the target words according to the number of words in the first text, by taking the difference between the number of words of the first text and the number of words of the second text as a target number and selecting the target number of words from the second text as the target words.
7. The determining apparatus according to claim 6, wherein,
the calculating unit calculates a vector distance between a word vector of a word in the first text and a word vector of each word in the second text, takes the smallest one of all the vector distances as a correlation coefficient of the word, and determines the first similarity according to a weighted sum of the correlation coefficients of all the words in the first text.
8. The determining apparatus according to claim 6, wherein,
and the selecting unit selects the last N words in the second text as the target words, wherein N is the target number.
9. The determining apparatus according to claim 6, wherein,
the calculating unit calculates a vector distance between a word vector of the target word in the second text and a word vector of each word in the first text, takes the smallest one of all the vector distances as a correlation coefficient of the target word, and determines the second similarity according to a weighted sum of the correlation coefficients of all the target words in the second text.
10. The determining apparatus according to any one of claims 6 to 9, wherein,
the integrated similarity is positively correlated with a weighted sum of the first similarity and the second similarity, and negatively correlated with the length of the second text.
11. A text similarity determining device includes:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of determining text similarity of any of claims 1-5 based on instructions stored in the memory.
12. A computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method of determining text similarity according to any of claims 1-5.
CN201811297685.5A 2018-11-02 2018-11-02 Text similarity determination method, device and computer readable storage medium Active CN111144104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811297685.5A CN111144104B (en) 2018-11-02 2018-11-02 Text similarity determination method, device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811297685.5A CN111144104B (en) 2018-11-02 2018-11-02 Text similarity determination method, device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111144104A CN111144104A (en) 2020-05-12
CN111144104B true CN111144104B (en) 2023-06-20

Family

ID=70515097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811297685.5A Active CN111144104B (en) 2018-11-02 2018-11-02 Text similarity determination method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111144104B (en)

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8606779B2 (en) * 2006-09-14 2013-12-10 Nec Corporation Search method, similarity calculation method, similarity calculation, same document matching system, and program thereof
US9311390B2 (en) * 2008-01-29 2016-04-12 Educational Testing Service System and method for handling the confounding effect of document length on vector-based similarity scores
JP4735726B2 (en) * 2009-02-18 2011-07-27 ソニー株式会社 Information processing apparatus and method, and program
US10176251B2 (en) * 2015-08-31 2019-01-08 Raytheon Company Systems and methods for identifying similarities using unstructured text analysis
CN105955948B (en) * 2016-04-22 2018-07-24 武汉大学 A kind of short text theme modeling method based on semanteme of word similarity
CN106776559B (en) * 2016-12-14 2020-08-11 东软集团股份有限公司 Text semantic similarity calculation method and device
CN106708804A (en) * 2016-12-27 2017-05-24 努比亚技术有限公司 Method and device for generating word vectors
CN106980870B (en) * 2016-12-30 2020-07-28 中国银联股份有限公司 Method for calculating text matching degree between short texts
CN108009152A (en) * 2017-12-04 2018-05-08 陕西识代运筹信息科技股份有限公司 A kind of data processing method and device of the text similarity analysis based on Spark-Streaming

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Text classification based on the hidden Markov model and semantic fusion; 高知新 (Gao Zhixin); 《计算机应用与软件》 (Computer Applications and Software); full text *

Also Published As

Publication number Publication date
CN111144104A (en) 2020-05-12

Similar Documents

Publication Publication Date Title
WO2018120889A1 (en) Input sentence error correction method and device, electronic device, and medium
CN110036399B (en) Neural network data entry system
CN108710613B (en) Text similarity obtaining method, terminal device and medium
CN108132931B (en) Text semantic matching method and device
JP2019526142A (en) Search term error correction method and apparatus
CN110298035B (en) Word vector definition method, device, equipment and storage medium based on artificial intelligence
TW201835818A (en) Implementing neural networks in fixed point arithmetic computing systems
WO2018121531A1 (en) Method and apparatus for generating test case script
CN112580324B (en) Text error correction method, device, electronic equipment and storage medium
US20170004820A1 (en) Method for building a speech feature library, and method, apparatus, device, and computer readable storage media for speech synthesis
WO2019028990A1 (en) Code element naming method, device, electronic equipment and medium
US20180246856A1 (en) Analysis method and analysis device
GB2575580A (en) Supporting interactive text mining process with natural language dialog
JP6237632B2 (en) Text information monitoring dictionary creation device, text information monitoring dictionary creation method, and text information monitoring dictionary creation program
CN109885831B (en) Keyword extraction method, device, equipment and computer readable storage medium
CN112990625A (en) Method and device for allocating annotation tasks and server
CN114463551A (en) Image processing method, image processing device, storage medium and electronic equipment
CN111144104B (en) Text similarity determination method, device and computer readable storage medium
US11783129B2 (en) Interactive control system, interactive control method, and computer program product
JP2019074982A (en) Information search device, search processing method, and program
JP6427480B2 (en) IMAGE SEARCH DEVICE, METHOD, AND PROGRAM
CN111026879B (en) Multi-dimensional value-oriented intent-oriented object-oriented numerical calculation method
JPWO2019106758A1 (en) Language processing apparatus, language processing system, and language processing method
CN112966513A (en) Method and apparatus for entity linking
CN110633474B (en) Mathematical formula identification method, device, equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20200512

Assignee: Tianyiyun Technology Co.,Ltd.

Assignor: CHINA TELECOM Corp.,Ltd.

Contract record no.: X2024110000020

Denomination of invention: Method, device, and computer-readable storage medium for determining text similarity

Granted publication date: 20230620

License type: Common License

Record date: 20240315