CN111144104A - Text similarity determination method and device and computer readable storage medium

Text similarity determination method and device and computer readable storage medium

Info

Publication number
CN111144104A
CN111144104A
Authority
CN
China
Prior art keywords
text
words
similarity
word
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811297685.5A
Other languages
Chinese (zh)
Other versions
CN111144104B (en)
Inventor
路绪海
马怡安
黄挺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN201811297685.5A priority Critical patent/CN111144104B/en
Publication of CN111144104A publication Critical patent/CN111144104A/en
Application granted granted Critical
Publication of CN111144104B publication Critical patent/CN111144104B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The disclosure relates to a text similarity determination method and device and a computer readable storage medium, and relates to the technical field of artificial intelligence. The method comprises the following steps: calculating, using word vectors, the degree of correlation of each word in a first text with a second text as a first similarity, wherein the number of words in the first text is smaller than the number of words in the second text; selecting a corresponding number of words from the second text as target words according to the number of words in the first text; calculating, using the word vectors, the degree of correlation of each target word with the first text as a second similarity; and calculating a comprehensive similarity of the first text and the second text according to the first similarity, the second similarity, and the length of the second text. The technical solution of the present disclosure can improve the accuracy of text similarity.

Description

Text similarity determination method and device and computer readable storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a method, an apparatus, and a computer-readable storage medium for determining text similarity.
Background
In the field of artificial intelligence, the calculation of text similarity is a typical application of weak artificial intelligence and is the basis for interaction between a robot and a user. How to determine the similarity of texts is one of the popular research directions in the field.
In the related art, words in the target text are compared with words in the comparison text one by one to determine the text similarity.
Disclosure of Invention
The inventors of the present disclosure found that the above related art has the following problem: the method is strongly affected by text length, and in particular when the text lengths differ or one text contains the other, the accuracy of the determined text similarity is low.
In view of this, the present disclosure provides a technical solution for determining text similarity, which can improve the accuracy of text similarity.
According to some embodiments of the present disclosure, there is provided a text similarity determination method including: calculating the correlation degree of each word in the first text and the second text by using a word vector to serve as a first similarity, wherein the number of the words in the first text is smaller than that of the words in the second text; according to the number of the words of the first text, selecting a corresponding number of words from the second text as target words; calculating the degree of correlation between each target word and the first text by using the word vector to serve as a second similarity; and calculating the comprehensive similarity of the first text and the second text according to the first similarity, the second similarity and the length of the second text.
In some embodiments, a vector distance between a word vector of a word in the first text and a word vector of each word in the second text is calculated; taking the minimum one of all the vector distances as a correlation coefficient of the word; and determining the first similarity according to the weighted sum of the correlation coefficients of all the words in the first text.
In some embodiments, a difference between the number of words of the first text and the number of words of the second text is taken as a target number; selecting the target number of words in the second text as the target words.
In some embodiments, the last N words are selected as the target words in the second text, where N is the target number.
In some embodiments, calculating a vector distance between a word vector of a target word in the second text and a word vector of each word in the first text; taking the minimum one of all the vector distances as a correlation coefficient of the target word; and determining the second similarity according to the weighted sum of the correlation coefficients of all target words in the second text.
In some embodiments, the integrated similarity is positively correlated with a weighted sum of the first similarity and the second similarity, and negatively correlated with a length of the second text.
According to other embodiments of the present disclosure, there is provided a text similarity determination device, including: a calculation unit configured to calculate, using word vectors, the degree of correlation of each word in a first text with a second text as a first similarity, wherein the number of words in the first text is smaller than the number of words in the second text, to calculate, using the word vectors, the degree of correlation of each target word with the first text as a second similarity, and to calculate a comprehensive similarity of the first text and the second text according to the first similarity, the second similarity, and the length of the second text; and a selecting unit configured to select a corresponding number of words in the second text as the target words according to the number of words in the first text.
In some embodiments, the calculation unit calculates a vector distance between a word vector of a word in the first text and a word vector of each word in the second text, takes the smallest one of all the vector distances as a correlation coefficient of the word, and determines the first similarity according to a weighted sum of the correlation coefficients of all the words in the first text.
In some embodiments, the selecting unit takes the difference between the number of words in the first text and the number of words in the second text as a target number, and selects the target number of words in the second text as the target words.
In some embodiments, the selecting unit selects the last N words in the second text as the target words, where N is the target number.
In some embodiments, the calculation unit calculates a vector distance between a word vector of a target word in the second text and a word vector of each word in the first text, takes the smallest one of all the vector distances as a correlation coefficient of the target word, and determines the second similarity according to a weighted sum of the correlation coefficients of all the target words in the second text.
In some embodiments, the integrated similarity is positively correlated with a weighted sum of the first similarity and the second similarity, and negatively correlated with a length of the second text.
According to still other embodiments of the present disclosure, there is provided a device for determining text similarity, including: a memory; and a processor coupled to the memory, the processor configured to perform the method for determining text similarity in any of the above embodiments based on instructions stored in the memory.
According to still further embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of determining text similarity in any of the above embodiments.
In the above embodiments, not only is the degree of correlation of the words of the shorter text with the longer text considered, but also, according to the length difference between the texts, the degree of correlation of a corresponding number of words of the longer text with the shorter text, so as to obtain a comprehensive similarity that is then adjusted according to the text length. This enhances the adaptability of the method to text length, avoids unstable or inaccurate results caused by differences in text length, and improves the accuracy of the text similarity.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
FIG. 1 illustrates a flow diagram of some embodiments of a text similarity determination method of the present disclosure;
FIG. 2 illustrates a flow diagram of some embodiments of step 110 of FIG. 1;
FIG. 3 illustrates a flow diagram of some embodiments of step 120 of FIG. 1;
FIG. 4 illustrates a block diagram of some embodiments of a text similarity determination apparatus of the present disclosure;
FIG. 5 shows a block diagram of further embodiments of a text similarity determination apparatus of the present disclosure;
FIG. 6 shows a block diagram of further embodiments of the text similarity determination apparatus of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Fig. 1 illustrates a flow diagram of some embodiments of a text similarity determination method of the present disclosure.
As shown in fig. 1, the method includes: step 110, calculating a first similarity; step 120, selecting a target word; step 130, calculating a second similarity; and step 140, calculating the comprehensive similarity.
In step 110, the word vectors are used to calculate the degree of correlation between each word in the first text and the second text as a first similarity, and the number of words in the first text is smaller than that in the second text.
In some embodiments, word segmentation may be performed on the first text and the second text to obtain the words of each text, and the word vector of each word may then be calculated using the skip-gram model of Word2vec. On the one hand, taking the shorter text as the processing object improves calculation efficiency; on the other hand, the degree of correlation of the words of the longer text with the shorter text can be further obtained in subsequent steps according to the difference in text length, thereby improving the accuracy of the similarity.
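As an illustration of this preprocessing, the following is a minimal sketch assuming the jieba tokenizer and the gensim implementation of Word2vec; the text contents, parameter values, and variable names are hypothetical and not part of the disclosure.

```python
# Minimal preprocessing sketch (assumes jieba and gensim are installed).
# In practice the Word2vec model would be trained on a large corpus or
# loaded from disk; training on two sentences here is for illustration only.
import jieba
from gensim.models import Word2Vec

first_text = "..."   # the shorter text (placeholder)
second_text = "..."  # the longer text (placeholder)

# Word segmentation of both texts.
words_1 = list(jieba.cut(first_text))
words_2 = list(jieba.cut(second_text))

# Skip-gram Word2vec model (sg=1 selects skip-gram).
model = Word2Vec(sentences=[words_1, words_2], vector_size=100,
                 window=5, min_count=1, sg=1)

# Word vectors for each word of the two texts.
vectors_1 = [model.wv[w] for w in words_1]
vectors_2 = [model.wv[w] for w in words_2]
```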
In some embodiments, step 110 may be performed by the embodiment in fig. 2.
FIG. 2 illustrates a flow diagram of some embodiments of step 110 of FIG. 1.
As shown in fig. 2, step 110 includes: step 1110, calculating each vector distance; step 1120, determining a correlation coefficient; and step 1130, determining the first similarity.
In step 1110, the vector distance between the word vector of a word in the first text and the word vector of each word in the second text is calculated. For example, the first text contains $L$ words in total: $w_{1,1}, w_{1,2}, \ldots, w_{1,l}, \ldots, w_{1,L}$, with corresponding word vectors $v_{1,1}, v_{1,2}, \ldots, v_{1,l}, \ldots, v_{1,L}$; the second text contains $M$ words in total: $w_{2,1}, w_{2,2}, \ldots, w_{2,m}, \ldots, w_{2,M}$, with corresponding word vectors $v_{2,1}, v_{2,2}, \ldots, v_{2,m}, \ldots, v_{2,M}$.
In some embodiments, the Euclidean distances $d_{1,1}, d_{1,2}, \ldots, d_{1,m}, \ldots, d_{1,M}$ between the word vector $v_{1,1}$ of $w_{1,1}$ and each of $v_{2,1}, v_{2,2}, \ldots, v_{2,m}, \ldots, v_{2,M}$ may be calculated separately.
In step 1120, the smallest of all the vector distances is taken as the correlation coefficient of the word. For example, the correlation coefficient of $w_{1,1}$ is $d_1 = \min(d_{1,1}, d_{1,2}, \ldots, d_{1,m}, \ldots, d_{1,M})$. In the same way, the correlation coefficients of all $L$ words in the first text can be obtained: $d_1, d_2, \ldots, d_l, \ldots, d_L$.
In step 1130, the first similarity is determined based on a weighted sum of the correlation coefficients of all words in the first text. For example, the first similarity may be determined as

$D_1 = \sum_{l=1}^{L} d_l$

The coefficients $d_l$ may also be weighted as required before being summed.
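A sketch of steps 1110 through 1130 is given below, assuming equal weights (a plain sum of the correlation coefficients); the helper name and the use of NumPy are illustrative and continue the hypothetical variables from the preprocessing sketch above.

```python
import numpy as np

def correlation_sum(query_vectors, reference_vectors):
    """Sum of correlation coefficients: for each query vector, the
    correlation coefficient is the minimum Euclidean distance to any
    reference vector (equal weights assumed)."""
    total = 0.0
    for v in query_vectors:
        distances = [np.linalg.norm(np.asarray(v) - np.asarray(r))
                     for r in reference_vectors]
        total += min(distances)
    return total

# First similarity D1: every word of the shorter first text is compared
# against all words of the longer second text.
D1 = correlation_sum(vectors_1, vectors_2)
```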
After the first similarity is determined, the overall similarity may be calculated by the remaining steps in fig. 1.
In step 120, according to the number of words in the first text, a corresponding number of words are selected as target words in the second text. Step 120 may be performed, for example, by the embodiment in fig. 3.
Fig. 3 illustrates a flow diagram of some embodiments of step 120 of fig. 1.
As shown in fig. 3, step 120 includes: step 1210, determining a target number; step 1220, select the target word.
In step 1210, the difference between the number of words of the first text and the number of words of the second text is taken as a target number.
In step 1220, a target number of words are selected in the second text as target words. For example, the last N words may be selected as target words in the second text, where N is the target number.
In some embodiments, the first text contains 10 words and the second text contains 50 words, and the 11th to 50th words of the second text may be selected as the target words. Alternatively, a corresponding number of target words may be selected randomly as required.
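A minimal sketch of steps 1210 and 1220 (selecting the last N words of the second text), continuing the hypothetical variables above:

```python
# Target number N: difference between the word counts (step 1210).
N = len(words_2) - len(words_1)

# Select the last N words of the second text as target words (step 1220).
target_words = words_2[-N:]
target_vectors = vectors_2[-N:]
```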
In the case where the longer second text contains the words of the shorter first text but its meaning is completely different from that of the first text, the selected target words express the true meaning of the second text more accurately, which improves the accuracy of the determined text similarity.
After the target word is selected, the comprehensive similarity can be calculated through the other steps in fig. 1.
In step 130, the degree of correlation of each target word with the first text is calculated as the second similarity using the word vectors. For example, the vector distance between the word vector of a target word in the second text and the word vector of each word in the first text is calculated, the smallest of all these vector distances is taken as the correlation coefficient of the target word, and the second similarity is determined according to a weighted sum of the correlation coefficients of all target words in the second text.
In some embodiments, $N$ words in the second text are selected as target words, where $N = M - L$. The $N$ target words may be the $(L+1)$-th to $(L+N)$-th words of the second text: $w_{2,L+1}, w_{2,L+2}, \ldots, w_{2,L+N}$, with corresponding word vectors $v_{2,L+1}, v_{2,L+2}, \ldots, v_{2,L+N}$. The Euclidean distances $d_{L+1,1}, d_{L+1,2}, \ldots, d_{L+1,L}$ between the word vector $v_{2,L+1}$ of $w_{2,L+1}$ and each of $v_{1,1}, v_{1,2}, \ldots, v_{1,l}, \ldots, v_{1,L}$ are calculated separately.
The smallest of all the vector distances may be taken as the correlation coefficient of the target word. For example, the correlation coefficient of $w_{2,L+1}$ is $d_{L+1} = \min(d_{L+1,1}, d_{L+1,2}, \ldots, d_{L+1,L})$. In the same way, the correlation coefficients of all $N$ target words in the second text can be obtained: $d_{L+1}, d_{L+2}, \ldots, d_{L+N}$.
The second similarity is then determined according to a weighted sum of the correlation coefficients of all target words in the second text. For example, the second similarity may be determined as

$D_2 = \sum_{n=1}^{N} d_{L+n}$

The coefficients $d_{L+n}$ may also be weighted as required before being summed.
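Continuing the sketch, the second similarity can be computed with the same hypothetical helper, this time comparing each target word of the second text against the first text:

```python
# Second similarity D2: every target word of the longer second text is
# compared against all words of the shorter first text.
D2 = correlation_sum(target_vectors, vectors_1)
```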
In step 140, a comprehensive similarity of the first text and the second text is calculated according to the first similarity, the second similarity and the length of the second text. For example, the integrated similarity is positively correlated with a weighted sum of the first similarity and the second similarity, and negatively correlated with the length of the second text.
In some embodiments, where the relatively long second text contains M words in total, the overall similarity S may be determined according to the following formula:
$S = (D_1 + D_2) / M$
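In the sketch above, this formula reduces to a single line; note that, in this sketch, a smaller value of S indicates texts that are closer, since the underlying correlation coefficients are Euclidean distances.

```python
# Composite similarity S = (D1 + D2) / M, where M is the word count of
# the longer second text; smaller S means the texts are closer, because
# the underlying coefficients are distances.
M = len(words_2)
S = (D1 + D2) / M
```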
in some embodiments, the synonym table and the near-sense table may be preset, the synonyms and the near-senses in the first text and the second text are determined according to the synonym table and the near-sense table, and the word vector distance of the synonyms and the near-senses is set to 0.
In the above embodiments, not only is the degree of correlation of the words of the shorter text with the longer text considered, but also, according to the length difference between the texts, the degree of correlation of a corresponding number of words of the longer text with the shorter text, so as to obtain a comprehensive similarity that is then adjusted according to the text length. This enhances the adaptability of the method to text length, avoids unstable or inaccurate results caused by differences in text length, and improves the accuracy of the text similarity.
Fig. 4 illustrates a block diagram of some embodiments of a text similarity determination apparatus of the present disclosure.
As shown in fig. 4, the text similarity determination device 4 includes a calculation unit 41 and a selection unit 42.
The calculation unit 41 calculates the degree of correlation of each word in the first text with the second text as the first similarity using the word vector, the number of words in the first text being smaller than the number of words in the second text.
In some embodiments, the calculation unit 41 calculates a vector distance between the word vector of the word in the first text and the word vector of each word in the second text. The calculation unit 41 takes the smallest one of all vector distances as the correlation coefficient for the word. The calculation unit 41 determines the first similarity from a weighted sum of the correlation coefficients of all words in the first text.
The selecting unit 42 selects a corresponding number of words in the second text as target words according to the number of words in the first text.
In some embodiments, the selecting unit 42 uses the difference between the number of words in the first text and the number of words in the second text as the target number N, and selects a target number of words in the second text as the target words. For example, the selecting unit 42 selects the last N words in the second text as the target words.
The calculation unit 41 calculates the degree of correlation of each target word with the first text as the second similarity using the word vector. For example, the calculation unit 41 calculates a vector distance between the word vector of the target word in the second text and the word vector of each word in the first text, takes the smallest one of all the vector distances as the correlation coefficient of the target word, and determines the second similarity degree according to a weighted sum of the correlation coefficients of all the target words in the second text.
The calculation unit 41 calculates the integrated similarity of the first text and the second text according to the first similarity, the second similarity, and the length of the second text. For example, the integrated similarity is positively correlated with a weighted sum of the first similarity and the second similarity, and negatively correlated with the length of the second text.
In the above embodiments, not only is the degree of correlation of the words of the shorter text with the longer text considered, but also, according to the length difference between the texts, the degree of correlation of a corresponding number of words of the longer text with the shorter text, so as to obtain a comprehensive similarity that is then adjusted according to the text length. This enhances the adaptability of the method to text length, avoids unstable or inaccurate results caused by differences in text length, and improves the accuracy of the text similarity.
Fig. 5 shows a block diagram of further embodiments of the text similarity determination apparatus of the present disclosure.
As shown in fig. 5, the text similarity determination device 5 of this embodiment includes: a memory 51 and a processor 52 coupled to the memory 51, the processor 52 being configured to execute the text similarity determination method in any one embodiment of the present disclosure based on instructions stored in the memory 51.
The memory 51 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader (Boot Loader), a database, and other programs.
Fig. 6 shows a block diagram of further embodiments of the text similarity determination apparatus of the present disclosure.
As shown in fig. 6, the text similarity determination device 6 of this embodiment includes: a memory 610 and a processor 620 coupled to the memory 610, wherein the processor 620 is configured to execute the text similarity determination method in any of the foregoing embodiments based on instructions stored in the memory 610.
The memory 610 may include, for example, a system memory, a fixed nonvolatile storage medium, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader (Boot Loader), and other programs.
The text similarity determination apparatus 6 may further include an input/output interface 630, a network interface 640, a storage interface 650, and the like. These interfaces 630, 640, 650, as well as the memory 610 and the processor 620, may be connected to one another through a bus 660, for example. The input/output interface 630 provides a connection interface for input/output devices such as a display, a mouse, a keyboard, and a touch screen. The network interface 640 provides a connection interface for various networking devices. The storage interface 650 provides a connection interface for external storage devices such as an SD card or a USB flash drive.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable non-transitory storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
So far, the text similarity determination method, the text similarity determination apparatus, and the computer-readable storage medium according to the present disclosure have been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.
The method and system of the present disclosure may be implemented in a number of ways. For example, the methods and systems of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above-described order for the steps of the method is for illustration only, and the steps of the method of the present disclosure are not limited to the order specifically described above unless specifically stated otherwise. Further, in some embodiments, the present disclosure may also be embodied as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the foregoing examples are for purposes of illustration only and are not intended to limit the scope of the present disclosure. It will be appreciated by those skilled in the art that modifications may be made to the above embodiments without departing from the scope and spirit of the present disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (14)

1. A text similarity determination method comprises the following steps:
calculating the correlation degree of each word in the first text and the second text by using a word vector to serve as a first similarity, wherein the number of the words in the first text is smaller than that of the words in the second text;
according to the number of the words of the first text, selecting a corresponding number of words from the second text as target words;
calculating the degree of correlation between each target word and the first text by using the word vector to serve as a second similarity;
and calculating the comprehensive similarity of the first text and the second text according to the first similarity, the second similarity and the length of the second text.
2. The determination method according to claim 1, wherein the calculating, as the first similarity, a degree of correlation of each word in the first text with the second text using the word vector comprises:
calculating a vector distance between a word vector of a word in the first text and a word vector of each word in the second text;
taking the minimum one of all the vector distances as a correlation coefficient of the word;
and determining the first similarity according to the weighted sum of the correlation coefficients of all the words in the first text.
3. The determination method according to claim 1, wherein the selecting a corresponding number of words as target words in the second text according to the number of words in the first text comprises:
taking a difference value between the number of words of the first text and the number of words of the second text as a target number;
selecting the target number of words in the second text as the target words.
4. The determination method of claim 3, wherein the selecting the target number of words as the target words in the second text comprises:
and selecting the last N words in the second text as the target words, wherein N is the target number.
5. The determination method according to claim 1, wherein the calculating, as the second similarity, the degree of correlation of each target word with the first text using the word vector comprises:
calculating a vector distance between a word vector of a target word in the second text and a word vector of each word in the first text;
taking the minimum one of all the vector distances as a correlation coefficient of the target word;
and determining the second similarity according to the weighted sum of the correlation coefficients of all target words in the second text.
6. The determination method according to any one of claims 1 to 5,
the integrated similarity is positively correlated with the weighted sum of the first similarity and the second similarity, and negatively correlated with the length of the second text.
7. A device for determining text similarity, comprising:
the calculation unit is used for calculating the correlation degree of each word in a first text and a second text as a first similarity by using a word vector, wherein the number of the words in the first text is smaller than that of the words in the second text, calculating the correlation degree of each target word and the first text as a second similarity by using the word vector, and calculating the comprehensive similarity of the first text and the second text according to the first similarity, the second similarity and the length of the second text;
and the selecting unit is used for selecting a corresponding number of words in the second text as the target words according to the number of words in the first text.
8. The determination apparatus according to claim 7,
the calculation unit calculates vector distances between word vectors of words in the first text and word vectors of words in the second text, takes the smallest one of all the vector distances as a correlation coefficient of the word, and determines the first similarity according to a weighted sum of the correlation coefficients of all the words in the first text.
9. The determination apparatus according to claim 7,
the selecting unit takes the difference value between the number of the words of the first text and the number of the words of the second text as a target number, and selects the words of the target number as the target words in the second text.
10. The determination apparatus according to claim 9,
the selecting unit selects the last N words in the second text as the target words, wherein N is the target number.
11. The determination apparatus according to claim 7,
the calculation unit calculates vector distances between word vectors of target words in the second text and word vectors of words in the first text, takes the smallest one of all the vector distances as a correlation coefficient of the target words, and determines the second similarity according to a weighted sum of the correlation coefficients of all the target words in the second text.
12. The determination apparatus according to any one of claims 7 to 11,
the integrated similarity is positively correlated with the weighted sum of the first similarity and the second similarity, and negatively correlated with the length of the second text.
13. A device for determining text similarity, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the method of determining text similarity of any of claims 1-6 based on instructions stored in the memory.
14. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of determining text similarity according to any one of claims 1 to 6.
CN201811297685.5A 2018-11-02 2018-11-02 Text similarity determination method, device and computer readable storage medium Active CN111144104B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811297685.5A CN111144104B (en) 2018-11-02 2018-11-02 Text similarity determination method, device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811297685.5A CN111144104B (en) 2018-11-02 2018-11-02 Text similarity determination method, device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111144104A true CN111144104A (en) 2020-05-12
CN111144104B CN111144104B (en) 2023-06-20

Family

ID=70515097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811297685.5A Active CN111144104B (en) 2018-11-02 2018-11-02 Text similarity determination method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111144104B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090190839A1 (en) * 2008-01-29 2009-07-30 Higgins Derrick C System and method for handling the confounding effect of document length on vector-based similarity scores
US20100023505A1 (en) * 2006-09-14 2010-01-28 Nec Corporation Search method, similarity calculation method, similarity calculation, same document matching system, and program thereof
CN101808210A (en) * 2009-02-18 2010-08-18 索尼公司 Messaging device, information processing method and program
CN105955948A (en) * 2016-04-22 2016-09-21 武汉大学 Short text topic modeling method based on word semantic similarity
US20170060995A1 (en) * 2015-08-31 2017-03-02 Raytheon Company Systems and methods for identifying similarities using unstructured text analysis
CN106708804A (en) * 2016-12-27 2017-05-24 努比亚技术有限公司 Method and device for generating word vectors
CN106776559A (en) * 2016-12-14 2017-05-31 东软集团股份有限公司 The method and device of text semantic Similarity Measure
CN106980870A (en) * 2016-12-30 2017-07-25 中国银联股份有限公司 Text matches degree computational methods between short text
CN108009152A (en) * 2017-12-04 2018-05-08 陕西识代运筹信息科技股份有限公司 A kind of data processing method and device of the text similarity analysis based on Spark-Streaming

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100023505A1 (en) * 2006-09-14 2010-01-28 Nec Corporation Search method, similarity calculation method, similarity calculation, same document matching system, and program thereof
US20090190839A1 (en) * 2008-01-29 2009-07-30 Higgins Derrick C System and method for handling the confounding effect of document length on vector-based similarity scores
CN101808210A (en) * 2009-02-18 2010-08-18 索尼公司 Messaging device, information processing method and program
US20170060995A1 (en) * 2015-08-31 2017-03-02 Raytheon Company Systems and methods for identifying similarities using unstructured text analysis
CN105955948A (en) * 2016-04-22 2016-09-21 武汉大学 Short text topic modeling method based on word semantic similarity
CN106776559A (en) * 2016-12-14 2017-05-31 东软集团股份有限公司 The method and device of text semantic Similarity Measure
CN106708804A (en) * 2016-12-27 2017-05-24 努比亚技术有限公司 Method and device for generating word vectors
CN106980870A (en) * 2016-12-30 2017-07-25 中国银联股份有限公司 Text matches degree computational methods between short text
CN108009152A (en) * 2017-12-04 2018-05-08 陕西识代运筹信息科技股份有限公司 A kind of data processing method and device of the text similarity analysis based on Spark-Streaming

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ZHAO-MAN ZHONG: "Event-Based Text Similarity Computing", 《2009 INTERNATIONAL CONFERENCE ON MANAGEMENT AND SERVICE SCIENCE》 *
陈贤武: "Research on an Automatic Grading Model for Subjective Test Questions Based on Sentence Similarity", Engineering Journal of Wuhan University, no. 07 *
高知新: "Text Classification Based on Hidden Markov Model and Semantic Fusion", Computer Applications and Software *

Also Published As

Publication number Publication date
CN111144104B (en) 2023-06-20

Similar Documents

Publication Publication Date Title
TWI664540B (en) Search word error correction method and device, and weighted edit distance calculation method and device
WO2018120889A1 (en) Input sentence error correction method and device, electronic device, and medium
CN108132931B (en) Text semantic matching method and device
CN108345580B (en) Word vector processing method and device
CN106484777B (en) Multimedia data processing method and device
US10585583B2 (en) Method, device, and terminal apparatus for text input
US20110299737A1 (en) Vision-based hand movement recognition system and method thereof
US10311295B2 (en) Heuristic finger detection method based on depth image
CN109522564B (en) Voice translation method and device
WO2018121531A1 (en) Method and apparatus for generating test case script
WO2018166343A1 (en) Data fusion method and device, storage medium and electronic device
WO2019028990A1 (en) Code element naming method, device, electronic equipment and medium
CN108460098B (en) Information recommendation method and device and computer equipment
CN108596079B (en) Gesture recognition method and device and electronic equipment
CN112580324B (en) Text error correction method, device, electronic equipment and storage medium
US20210124976A1 (en) Apparatus and method for calculating similarity of images
GB2575580A (en) Supporting interactive text mining process with natural language dialog
JP6589639B2 (en) Search system, search method and program
US11783129B2 (en) Interactive control system, interactive control method, and computer program product
CN111144104B (en) Text similarity determination method, device and computer readable storage medium
CN112115715A (en) Natural language text processing method and device, storage medium and electronic equipment
JPWO2019106758A1 (en) Language processing apparatus, language processing system, and language processing method
JP6427480B2 (en) IMAGE SEARCH DEVICE, METHOD, AND PROGRAM
KR102433384B1 (en) Apparatus and method for processing texture image
CN110428814B (en) Voice recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20200512

Assignee: Tianyiyun Technology Co.,Ltd.

Assignor: CHINA TELECOM Corp.,Ltd.

Contract record no.: X2024110000020

Denomination of invention: Method, device, and computer-readable storage medium for determining text similarity

Granted publication date: 20230620

License type: Common License

Record date: 20240315

EE01 Entry into force of recordation of patent licensing contract