Text similarity calculation method, apparatus, electronic device and storage medium
Technical field
Embodiments of the present invention relate to the technical field of data processing, and in particular to a text similarity calculation method, apparatus, electronic device and storage medium.
Background
At present, live-streaming room applications on the iOS and Android platforms are developing rapidly and are very popular with users. Bullet comments (barrage) are a popular way of exchanging and sharing information on live-streaming platforms. They enable interaction between viewers and the streamer and help create a good live-streaming atmosphere.
In the field of machine dialogue, one important step is to find the reply with the highest semantic similarity to an input sentence. Likewise, in a live-streaming room a robot frequently needs to find, for a viewer's bullet comment, a reply with high similarity to it and then reply to that comment automatically. Currently, live-streaming rooms generally use the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm to calculate the similarity between two bullet comments. However, the main idea of the TF-IDF algorithm is to determine the keywords of each document from the frequency distribution of words or phrases in the document set, then to construct term-frequency vectors from the number of times the keywords occur in the document set, and to determine the similarity between documents by calculating the similarity between their term-frequency vectors. The TF-IDF algorithm therefore considers only the frequency of words in a document, in other words only the importance of the words in the document.
Therefore, to improve the accuracy of text similarity calculation, existing similarity algorithms need further improvement.
Summary of the invention
Embodiments of the present invention provide a text similarity calculation method, apparatus, electronic device and storage medium, by which the accuracy of text similarity calculation can be improved.
To achieve the above object, embodiments of the present invention adopt the following technical solutions:
In a first aspect, an embodiment of the present invention provides a text similarity calculation method, the method comprising:
calculating a part-of-speech similarity between two texts based on preset part-of-speech weights;
calculating a text similarity between the two texts based on an improved term frequency-inverse document frequency (TF-IDF) algorithm;
determining a comprehensive similarity between the two texts according to the part-of-speech similarity and the text similarity.
Further, calculating the part-of-speech similarity between two texts based on preset part-of-speech weights comprises:
calculating the part-of-speech similarity between the two texts according to the following formula:
where $\mathrm{Sim}_{wordpro}(A,B)$ denotes the part-of-speech similarity between text A and text B, $g_i$ denotes the part-of-speech weight of the i-th word in text A, $g'_i$ denotes the part-of-speech weight of the i-th word in text B, $n$ denotes the total number of words in the set formed by the words in text A and the words in text B, $L_A$ denotes the total number of words in text A, and $L_B$ denotes the total number of words in text B.
Further, calculating the text similarity between the two texts based on the improved TF-IDF algorithm comprises:
calculating the TF-IDF weight of each word in each text according to the following formula:
where $W_{ij}$ denotes the TF-IDF weight of word j in text i, $tf_{ij}$ denotes the number of times word j occurs in text i, $N$ denotes the total number of texts in the text set, $n_j$ denotes the number of texts in the text set that contain word j, i identifies the text, and j identifies the word within the text;
calculating the text similarity between the two texts based on the TF-IDF weights of the words in the two texts.
Further, calculating the text similarity between the two texts based on the TF-IDF weights of the words in the two texts comprises:
calculating the text similarity between the two texts according to the following formula (the cosine similarity of the two weight vectors):
$$\mathrm{Sim}_{tf\text{-}idf}(A,B)=\frac{\sum_{i=1}^{n} W_{ai}\,W_{bi}}{\sqrt{\sum_{i=1}^{n} W_{ai}^{2}}\;\sqrt{\sum_{i=1}^{n} W_{bi}^{2}}}$$
where $\mathrm{Sim}_{tf\text{-}idf}(A,B)$ denotes the text similarity between text A and text B, $W_{ai}$ denotes the TF-IDF weight of the i-th word in text A, $W_{bi}$ denotes the TF-IDF weight of the i-th word in text B, and $n$ denotes the total number of words in the set formed by the words in text A and the words in text B.
Further, determining the comprehensive similarity between the two texts according to the part-of-speech similarity and the text similarity comprises:
determining the comprehensive similarity between the two texts according to the following formula:
$$\mathrm{Sim}(A,B)=\mathrm{Sim}_{wordpro}(A,B)\times\mathrm{Sim}_{tf\text{-}idf}(A,B)$$
where $\mathrm{Sim}(A,B)$ denotes the comprehensive similarity between text A and text B, $\mathrm{Sim}_{wordpro}(A,B)$ denotes the part-of-speech similarity between text A and text B, and $\mathrm{Sim}_{tf\text{-}idf}(A,B)$ denotes the text similarity between text A and text B.
Further, before calculating the part-of-speech similarity between the two texts based on the preset part-of-speech weights, or before calculating the text similarity between the two texts based on the improved TF-IDF algorithm, the method further comprises:
performing word segmentation and part-of-speech tagging on the two texts.
Further, performing word segmentation and part-of-speech tagging on the two texts comprises:
performing word segmentation and part-of-speech tagging on the two texts using the jieba word segmentation tool in Python.
In a second aspect, an embodiment of the present invention provides a text similarity calculation apparatus, the apparatus comprising:
a part-of-speech similarity calculation module, configured to calculate a part-of-speech similarity between two texts based on preset part-of-speech weights;
a text similarity calculation module, configured to calculate a text similarity between the two texts based on an improved TF-IDF algorithm;
a comprehensive similarity calculation module, configured to determine a comprehensive similarity between the two texts according to the part-of-speech similarity and the text similarity.
Further, the part-of-speech similarity calculation module is specifically configured to calculate the part-of-speech similarity between the two texts according to the following formula:
where $\mathrm{Sim}_{wordpro}(A,B)$ denotes the part-of-speech similarity between text A and text B, $g_i$ denotes the part-of-speech weight of the i-th word in text A, $g'_i$ denotes the part-of-speech weight of the i-th word in text B, $n$ denotes the total number of words in the set formed by the words in text A and the words in text B, $L_A$ denotes the total number of words in text A, and $L_B$ denotes the total number of words in text B.
Further, the text similarity calculation module comprises:
a TF-IDF weight calculation unit, configured to calculate the TF-IDF weight of each word in each text according to the following formula:
where $W_{ij}$ denotes the TF-IDF weight of word j in text i, $tf_{ij}$ denotes the number of times word j occurs in text i, $N$ denotes the total number of texts in the text set, $n_j$ denotes the number of texts in the text set that contain word j, i identifies the text, and j identifies the word within the text;
a text similarity calculation unit, configured to calculate the text similarity between the two texts based on the TF-IDF weights of the words in the two texts.
Further, the text similarity calculation unit is specifically configured to:
calculate the text similarity between the two texts according to the following formula (the cosine similarity of the two weight vectors):
$$\mathrm{Sim}_{tf\text{-}idf}(A,B)=\frac{\sum_{i=1}^{n} W_{ai}\,W_{bi}}{\sqrt{\sum_{i=1}^{n} W_{ai}^{2}}\;\sqrt{\sum_{i=1}^{n} W_{bi}^{2}}}$$
where $\mathrm{Sim}_{tf\text{-}idf}(A,B)$ denotes the text similarity between text A and text B, $W_{ai}$ denotes the TF-IDF weight of the i-th word in text A, $W_{bi}$ denotes the TF-IDF weight of the i-th word in text B, and $n$ denotes the total number of words in the set formed by the words in text A and the words in text B.
Further, the comprehensive similarity calculation module is specifically configured to:
determine the comprehensive similarity between the two texts according to the following formula:
$$\mathrm{Sim}(A,B)=\mathrm{Sim}_{wordpro}(A,B)\times\mathrm{Sim}_{tf\text{-}idf}(A,B)$$
where $\mathrm{Sim}(A,B)$ denotes the comprehensive similarity between text A and text B, $\mathrm{Sim}_{wordpro}(A,B)$ denotes the part-of-speech similarity between text A and text B, and $\mathrm{Sim}_{tf\text{-}idf}(A,B)$ denotes the text similarity between text A and text B.
Further, the apparatus further comprises a processing module, configured to perform word segmentation and part-of-speech tagging on the two texts before the part-of-speech similarity between the two texts is calculated based on the preset part-of-speech weights, or before the text similarity between the two texts is calculated based on the improved TF-IDF algorithm.
Further, the processing module is specifically configured to perform word segmentation and part-of-speech tagging on the two texts using the jieba word segmentation tool in Python.
In a third aspect, an embodiment of the present invention provides an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the text similarity calculation method of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a storage medium containing computer-executable instructions which, when executed by a computer processor, implement the text similarity calculation method of the first aspect.
The text similarity calculation method provided by the embodiments of the present invention evaluates the similarity between texts by combining their part-of-speech similarity with their text similarity. This improves the accuracy of text similarity calculation and, in turn, the accuracy of matching similar texts.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from the content of the embodiments and these drawings without creative effort.
Fig. 1 is a schematic flowchart of a text similarity calculation method provided by Embodiment one of the present invention;
Fig. 2 is a schematic structural diagram of a text similarity calculation apparatus provided by Embodiment two of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device provided by Embodiment three of the present invention.
Detailed description
To make the technical problem solved, the technical solutions adopted and the technical effects achieved by the present invention clearer, the technical solutions of the embodiments of the present invention are described in further detail below with reference to the drawings. Evidently, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
Embodiment one
Fig. 1 is a schematic flowchart of a text similarity calculation method provided by Embodiment one of the present invention. The method disclosed in this embodiment is applicable to the field of machine dialogue, where the answer sentence with the highest semantic similarity to an input sentence is matched from a corpus so that the input sentence can be answered automatically. It is particularly suitable for matching, in a live-streaming room, the sentence most similar to a viewer's bullet comment, so that a robot can reply to that comment automatically. The method may be executed by a text similarity calculation apparatus, which may be implemented in software and/or hardware and is typically integrated in a terminal, for example a server. Referring to Fig. 1, the method comprises the following steps:
110. Calculate the part-of-speech similarity between two texts based on preset part-of-speech weights.
The parts of speech include nouns, verbs, interrogatives, adjectives, adverbs and so on. The parts of speech of corresponding words in two texts reflect, to some extent, how similar the two texts are. Taking the part-of-speech similarity between the two texts into account when calculating their text similarity can therefore improve the accuracy of the calculation.
Illustratively, the part-of-speech similarity between the two texts is calculated from the preset part-of-speech weights according to the following formula (1):
where $\mathrm{Sim}_{wordpro}(A,B)$ denotes the part-of-speech similarity between text A and text B, $g_i$ denotes the part-of-speech weight of the i-th word in text A, $g'_i$ denotes the part-of-speech weight of the i-th word in text B, $n$ denotes the total number of words in the set formed by the words in text A and the words in text B, $L_A$ denotes the total number of words in text A, and $L_B$ denotes the total number of words in text B. When i is greater than $L_A$, $g_i = 0$; when i is greater than $L_B$, $g'_i = 0$. The example below illustrates the meaning of these quantities. The denominator of formula (1) is chosen so that it cannot be zero, which broadens the applicability of formula (1).
The preset part-of-speech weights can be derived for a specific business scenario by applying formula (1) to multiple groups of texts whose part-of-speech similarity is known and solving backwards for the weight of each part of speech. In general, the nouns and verbs in a sentence express most of its semantics, that is, they carry relatively more of the sentence's meaning. Based on business experience, the weights of nouns and verbs can therefore be set relatively high and the weights of other parts of speech relatively low. Preferably, when the business scenario is bullet-comment text sent on a live-streaming platform, the part-of-speech weight of nouns may take a value between 0.7 and 0.8, that of verbs a value between 0.6 and 0.7, and that of interrogatives a value between 0.5 and 0.6. This embodiment is illustrated with a noun weight of 0.7, a verb weight of 0.6, an interrogative weight of 0.5, and a weight of 0 for all other words.
Assume that text A is: I want to go to Beijing to study at university;
and that text B is: Beijing's universities are really fun.
After word segmentation and part-of-speech tagging of text A and text B, we obtain:
A = I/n want-to-go/adv Beijing/n study/v university/n
B = Beijing/n 's/adv university/n really/adj fun/adj
The set formed by the words in text A and the words in text B is {I, want-to-go, Beijing, study, university, 's, really, fun}, so $n$ equals 8, and the part-of-speech weights of the words in the set are U = {0.7, 0, 0.7, 0.6, 0.7, 0, 0, 0}. The part-of-speech weights of the words in text A are therefore $g_i$ = {0.7, 0, 0.7, 0.6, 0.7, 0, 0, 0}, and those of the words in text B are $g'_i$ = {0.7, 0, 0.7, 0, 0, 0, 0, 0}. Since text A contains five words and text B contains five words, $L_A = 5$ and $L_B = 5$.
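The construction of the two weight vectors in this example can be sketched in Python as follows. This is only an illustration under stated assumptions: the tag names, the English glosses used as tokens and the helper `pos_weight_vector` are hypothetical, and the sketch stops at the vectors because formula (1) itself is not reproduced here.

```python
# Part-of-speech weights used in this embodiment: noun 0.7, verb 0.6,
# interrogative 0.5, every other part of speech 0.
# The tag names ("n", "v", "interrogative") are illustrative assumptions.
POS_WEIGHTS = {"n": 0.7, "v": 0.6, "interrogative": 0.5}


def pos_weight_vector(tagged_words, n):
    """Return the part-of-speech weights of a text, zero-padded to length n.

    tagged_words: list of (word, tag) pairs in text order.
    n: number of words in the set formed by the words of both texts.
    """
    weights = [POS_WEIGHTS.get(tag, 0.0) for _, tag in tagged_words]
    # Positions beyond the text's own length get weight 0 (i > L_A or i > L_B).
    return weights + [0.0] * (n - len(weights))


# Tokens are English glosses of the segmented example sentences.
text_a = [("I", "n"), ("want-to-go", "adv"), ("Beijing", "n"),
          ("study", "v"), ("university", "n")]
text_b = [("Beijing", "n"), ("'s", "adv"), ("university", "n"),
          ("really", "adj"), ("fun", "adj")]

# Set of words from both texts, keeping first-seen order.
union = list(dict.fromkeys(word for word, _ in text_a + text_b))
n = len(union)

g_a = pos_weight_vector(text_a, n)
g_b = pos_weight_vector(text_b, n)
print(n, g_a, g_b)
```

The weights are indexed by position within each text, not by position in the word set, which matches the convention that $g_i = 0$ for $i > L_A$ and $g'_i = 0$ for $i > L_B$.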
Therefore, the part-of-speech similarity between text A and text B calculated from formula (1) is $\mathrm{Sim}_{wordpro}(A,B) = 0.458$.
120. Calculate the text similarity between the two texts based on the improved term frequency-inverse document frequency (TF-IDF) algorithm.
Specifically, calculating the text similarity between the two texts based on the improved TF-IDF algorithm comprises:
calculating the TF-IDF weight of each word in each text according to the following formula (2):
where $W_{ij}$ denotes the TF-IDF weight of word j in text i, $tf_{ij}$ denotes the number of times word j occurs in text i, $N$ denotes the total number of texts in the text set, $n_j$ denotes the number of texts in the text set that contain word j, i identifies the text, and j identifies the word within the text;
calculating the text similarity between the two texts based on the TF-IDF weights of the words in the two texts.
For a specific business scenario, a corpus related to that scenario can be prepared in advance. Suppose, for example, that the scenario is calculating the text similarity between bullet comments sent in live-streaming room No. 1. Because each room streams different content, different rooms have different associated topics, and the domain words contained in the bullet comments sent in different rooms differ accordingly. For example, streamer a of room No. 1 is especially good at playing games, in particular "Honor of Kings", so room No. 1 frequently streams game video. The associated topic of room No. 1 can then be defined as the name of the game that is frequently streamed, such as "Honor of Kings", or as content related to the game, such as the names of characters, equipment or missions in the game. For a room No. 1 that often streams "Honor of Kings", the associated topic may thus also be "Hua Mulan", "Diaochan" or "Luna". The bullet comments sent in room No. 1 will necessarily contain many domain words related to the streamed game. All bullet comments sent in room No. 1 within a set period can then be taken as the corpus, containing the domain words, for this business scenario. Training on this corpus with formula (2) yields the TF-IDF weight vector space for the scenario, that is, the vector space formed by the TF-IDF weights of the domain words. Each bullet comment of room No. 1 that is to be matched is then a point, or a vector, in this TF-IDF weight vector space, so the mapping of the comment into the space, that is, the TF-IDF weight of each word in the comment, can be obtained.
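As a rough sketch of the corpus statistics this step relies on, the following Python code counts $n_j$ (the number of texts containing each word) over a tokenized corpus and maps a new text onto the resulting vocabulary as term counts $tf_{ij}$. The function names and the tiny corpus are hypothetical; a real corpus would be the segmented bullet comments collected from one live-streaming room.

```python
from collections import Counter


def document_frequencies(corpus):
    """Count, for each word, how many texts of the corpus contain it (n_j)."""
    df = Counter()
    for words in corpus:
        df.update(set(words))  # count each word at most once per text
    return df


def term_counts_on_vocabulary(words, vocabulary):
    """Map a tokenized text to term counts (tf) aligned with a vocabulary."""
    counts = Counter(words)
    return [counts[w] for w in vocabulary]


# Hypothetical, already-segmented bullet comments from one live-streaming room.
corpus = [
    ["hero", "skin", "nice"],
    ["hero", "equipment"],
    ["mission", "hero", "hero"],
]
df = document_frequencies(corpus)
vocabulary = sorted(df)  # fixed word order defines the vector space axes
tf = term_counts_on_vocabulary(["hero", "hero", "skin"], vocabulary)
print(vocabulary, tf)
```

Fixing the vocabulary order once per corpus is what lets every text be treated as a point in the same vector space.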
By improving on the existing TF-IDF weight formula, the TF-IDF weight formula (2) provided in this embodiment makes it possible to calculate TF-IDF weights for new words in a text to be matched, and thus to perform similarity matching on texts that contain new words. A new word is a word that is not contained in the TF-IDF corpus or TF-IDF dictionary, so the number of texts in the text set that contain it is $n_j = 0$. The existing TF-IDF weight formula cannot handle the case where new words appear in the text to be matched.
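The problem with new words can be illustrated with the textbook TF-IDF weight $tf \cdot \log(N / n_j)$, which is undefined when $n_j = 0$. The smoothed variant below is only one common way to avoid this and is an assumption for illustration; it is not claimed to be formula (2) of this embodiment, and both function names are hypothetical.

```python
import math


def classic_weight(tf, n_texts, n_j):
    """Textbook TF-IDF weight tf * log(N / n_j); undefined when n_j == 0."""
    return tf * math.log(n_texts / n_j)


def smoothed_weight(tf, n_texts, n_j):
    """One possible smoothing (an assumption, not the source's formula (2)):
    add 1 to numerator and denominator so that n_j == 0 stays defined."""
    return tf * math.log((n_texts + 1) / (n_j + 1))


# A new word occurs in none of the N = 100 corpus texts (n_j = 0).
try:
    classic_weight(1, 100, 0)
except ZeroDivisionError:
    print("classic weight is undefined for a new word")

print(smoothed_weight(1, 100, 0))  # finite weight for the new word
```

With this particular smoothing, a word that occurs in every text gets weight 0 and a new word gets the largest weight, so rarity is still rewarded.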
Further, calculating the text similarity between the two texts based on the TF-IDF weights of the words in the two texts comprises:
calculating the text similarity between the two texts according to the following formula (3):
$$\mathrm{Sim}_{tf\text{-}idf}(A,B)=\frac{\sum_{i=1}^{n} W_{ai}\,W_{bi}}{\sqrt{\sum_{i=1}^{n} W_{ai}^{2}}\;\sqrt{\sum_{i=1}^{n} W_{bi}^{2}}} \quad (3)$$
where $\mathrm{Sim}_{tf\text{-}idf}(A,B)$ denotes the text similarity between text A and text B, $W_{ai}$ denotes the TF-IDF weight of the i-th word in text A, $W_{bi}$ denotes the TF-IDF weight of the i-th word in text B, and $n$ denotes the total number of words in the set formed by the words in text A and the words in text B.
In the TF-IDF algorithm, if a word or phrase occurs with high frequency tf in one text but rarely in the other texts of the text set, it is considered to have good class-discriminating ability and to be suitable for classification. Such a word or phrase can serve as a keyword and is assigned a higher TF-IDF weight. The TF-IDF weight of a word therefore increases with its term frequency and with its rarity. Once the TF-IDF weight of each word in each text has been calculated with formula (2), each text can be represented as a real-valued vector of TF-IDF weights. The lengths of these vectors are then normalized so that the vectors of all texts have the same length, and finally the cosine similarity of the vectors of each pair of texts is calculated with formula (3). This cosine similarity is the text similarity between the two texts.
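The normalization and cosine steps just described can be sketched as follows. The helper names are hypothetical, and returning a zero vector unchanged (so that its similarity to anything is 0) is a design assumption of this sketch rather than something the embodiment specifies.

```python
import math


def normalize(vector):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vector))
    return list(vector) if norm == 0 else [x / norm for x in vector]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length weight vectors.

    After both vectors are scaled to unit length, their dot product
    equals the cosine similarity.
    """
    ua, ub = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(ua, ub))


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # parallel vectors
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

Normalizing first means the similarity depends only on the direction of the weight vectors, not on how long the texts are.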
It should be noted that no order is imposed between step 110 and step 120: either step may be executed first. This embodiment is described with step 110 executed first, but this does not limit the execution order of step 110 and step 120.
Further, before calculating the part-of-speech similarity between the two texts based on the preset part-of-speech weights, or before calculating the text similarity between the two texts based on the improved TF-IDF algorithm, the method further comprises:
performing word segmentation and part-of-speech tagging on the two texts. Specifically, the jieba word segmentation tool in Python is used for the word segmentation and part-of-speech tagging, which is not described in further detail in this embodiment.
130. Determine the comprehensive similarity between the two texts according to the part-of-speech similarity and the text similarity.
Illustratively, determining the comprehensive similarity between the two texts according to the part-of-speech similarity and the text similarity comprises:
determining the comprehensive similarity between the two texts according to the following formula (4):
$$\mathrm{Sim}(A,B)=\mathrm{Sim}_{wordpro}(A,B)\times\mathrm{Sim}_{tf\text{-}idf}(A,B) \quad (4)$$
where $\mathrm{Sim}(A,B)$ denotes the comprehensive similarity between text A and text B, $\mathrm{Sim}_{wordpro}(A,B)$ denotes the part-of-speech similarity between text A and text B, and $\mathrm{Sim}_{tf\text{-}idf}(A,B)$ denotes the text similarity between text A and text B.
Continuing with the example above, assume that text A is: I want to go to Beijing to study at university;
and that text B is: Beijing's universities are really fun.
After word segmentation and part-of-speech tagging of text A and text B, we obtain:
A = I/n want-to-go/adv Beijing/n study/v university/n
B = Beijing/n 's/adv university/n really/adj fun/adj
The mappings of text A and text B into the TF-IDF vector space are, respectively:
$W_{ai}$ = {0.1, 0.2, 0.3, 0.1, 0.6, 0.1, 0.1, 0.1}
$W_{bi}$ = {0.1, 0.2, 0.5, 0.2, 0.6, 0.3, 0.4, 0.3}
The text similarity between text A and text B is then obtained from formula (3) as:
$$\mathrm{Sim}_{tf\text{-}idf}(A,B)=\frac{0.68}{\sqrt{0.54}\times\sqrt{1.04}}\approx 0.907$$
The text similarity between two texts, $\mathrm{Sim}_{tf\text{-}idf}(A,B) = \cos\theta$, takes values in the range [-1, 1]. The closer the calculated value is to 1, the higher the text similarity between the two texts, that is, the closer their semantics.
The comprehensive similarity between text A and text B is then obtained from formula (4):
$$\mathrm{Sim}(A,B)=\mathrm{Sim}_{wordpro}(A,B)\times\mathrm{Sim}_{tf\text{-}idf}(A,B)=0.458\times 0.907=0.415$$
The text similarity between text A and text B, 0.907, is very high. If the semantic similarity between text A and text B were determined from this text similarity alone, the result would deviate considerably from reality and the accuracy would be low. With the scheme of this embodiment, the comprehensive similarity between text A and text B is not very high, which indicates that the semantics of text A and text B are not very similar and agrees with the actual situation. By combining the part-of-speech similarity and the text similarity of two texts to evaluate their comprehensive similarity, this embodiment improves the accuracy of the semantic similarity calculated between the two texts and, in turn, the accuracy of matching similar texts.
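The numbers in this worked example can be checked with a few lines of Python. The TF-IDF vectors and the part-of-speech similarity 0.458 are taken from the example as given; the variable names are illustrative. The cosine value is rounded to three decimals before the multiplication, as in the text.

```python
import math

# TF-IDF weight vectors of text A and text B from the example.
w_a = [0.1, 0.2, 0.3, 0.1, 0.6, 0.1, 0.1, 0.1]
w_b = [0.1, 0.2, 0.5, 0.2, 0.6, 0.3, 0.4, 0.3]

# Formula (3): cosine similarity of the two weight vectors.
dot = sum(x * y for x, y in zip(w_a, w_b))
norm_a = math.sqrt(sum(x * x for x in w_a))
norm_b = math.sqrt(sum(y * y for y in w_b))
sim_tfidf = dot / (norm_a * norm_b)

# Part-of-speech similarity as stated in the example.
sim_wordpro = 0.458

# Formula (4): product of the two similarities, using the rounded cosine.
sim = sim_wordpro * round(sim_tfidf, 3)

print(round(sim_tfidf, 3), round(sim, 3))  # prints 0.907 0.415
```

Because formula (4) is a product, a low part-of-speech similarity pulls down an otherwise high TF-IDF similarity, which is the effect the example is meant to show.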
The text similarity calculation method provided in this embodiment calculates the part-of-speech similarity between two texts based on preset part-of-speech weights, calculates the text similarity between the two texts based on the improved TF-IDF algorithm, and determines the comprehensive similarity between the two texts from the part-of-speech similarity and the text similarity. It thereby improves the accuracy of the semantic similarity calculated between two texts and, in turn, the accuracy of matching similar texts.
Embodiment two
Fig. 2 is a schematic structural diagram of a text similarity calculation apparatus provided by Embodiment two of the present invention. Referring to Fig. 2, the apparatus comprises a part-of-speech similarity calculation module 210, a text similarity calculation module 220 and a comprehensive similarity calculation module 230.
The part-of-speech similarity calculation module 210 is configured to calculate the part-of-speech similarity between two texts based on preset part-of-speech weights.
The text similarity calculation module 220 is configured to calculate the text similarity between the two texts based on the improved TF-IDF algorithm.
The comprehensive similarity calculation module 230 is configured to determine the comprehensive similarity between the two texts according to the part-of-speech similarity and the text similarity.
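One way to picture how the three modules fit together in software is the following sketch. The class name `TextSimilarityDevice` and the field names are hypothetical; modules 210 and 220 are injected as plain callables, so the sketch makes no assumption about how formulas (1) and (2) are implemented.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

TaggedText = Sequence[tuple]  # (word, part-of-speech tag) pairs


@dataclass
class TextSimilarityDevice:
    # Module 210: part-of-speech similarity between two tagged texts.
    pos_similarity_module: Callable[[TaggedText, TaggedText], float]
    # Module 220: TF-IDF based text similarity between two tagged texts.
    text_similarity_module: Callable[[TaggedText, TaggedText], float]

    def comprehensive_similarity(self, a: TaggedText, b: TaggedText) -> float:
        """Module 230: product of the two similarities, as in formula (4)."""
        return self.pos_similarity_module(a, b) * self.text_similarity_module(a, b)


# Stub modules returning fixed values, only to show the wiring.
device = TextSimilarityDevice(
    pos_similarity_module=lambda a, b: 0.5,
    text_similarity_module=lambda a, b: 0.8,
)
print(device.comprehensive_similarity([], []))
```

Keeping the two similarity modules behind simple callables lets either one be replaced or tested without touching the combination step.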
Further, the part-of-speech similarity calculation module 210 is specifically configured to calculate the part-of-speech similarity between the two texts according to the following formula:
where $\mathrm{Sim}_{wordpro}(A,B)$ denotes the part-of-speech similarity between text A and text B, $g_i$ denotes the part-of-speech weight of the i-th word in text A, $g'_i$ denotes the part-of-speech weight of the i-th word in text B, $n$ denotes the total number of words in the set formed by the words in text A and the words in text B, $L_A$ denotes the total number of words in text A, and $L_B$ denotes the total number of words in text B.
Further, the text similarity calculation module 220 comprises:
a TF-IDF weight calculation unit, configured to calculate the TF-IDF weight of each word in each text according to the following formula:
where $W_{ij}$ denotes the TF-IDF weight of word j in text i, $tf_{ij}$ denotes the number of times word j occurs in text i, $N$ denotes the total number of texts in the text set, $n_j$ denotes the number of texts in the text set that contain word j, i identifies the text, and j identifies the word within the text;
a text similarity calculation unit, configured to calculate the text similarity between the two texts based on the TF-IDF weights of the words in the two texts.
Further, the text similarity calculation unit is specifically configured to:
calculate the text similarity between the two texts according to the following formula (the cosine similarity of the two weight vectors):
$$\mathrm{Sim}_{tf\text{-}idf}(A,B)=\frac{\sum_{i=1}^{n} W_{ai}\,W_{bi}}{\sqrt{\sum_{i=1}^{n} W_{ai}^{2}}\;\sqrt{\sum_{i=1}^{n} W_{bi}^{2}}}$$
where $\mathrm{Sim}_{tf\text{-}idf}(A,B)$ denotes the text similarity between text A and text B, $W_{ai}$ denotes the TF-IDF weight of the i-th word in text A, $W_{bi}$ denotes the TF-IDF weight of the i-th word in text B, and $n$ denotes the total number of words in the set formed by the words in text A and the words in text B.
Further, the comprehensive similarity calculation module 230 is specifically configured to:
determine the comprehensive similarity between the two texts according to the following formula:
$$\mathrm{Sim}(A,B)=\mathrm{Sim}_{wordpro}(A,B)\times\mathrm{Sim}_{tf\text{-}idf}(A,B)$$
where $\mathrm{Sim}(A,B)$ denotes the comprehensive similarity between text A and text B, $\mathrm{Sim}_{wordpro}(A,B)$ denotes the part-of-speech similarity between text A and text B, and $\mathrm{Sim}_{tf\text{-}idf}(A,B)$ denotes the text similarity between text A and text B.
Further, the apparatus further comprises a processing module, configured to perform word segmentation and part-of-speech tagging on the two texts before the part-of-speech similarity between the two texts is calculated based on the preset part-of-speech weights, or before the text similarity between the two texts is calculated based on the improved TF-IDF algorithm.
Further, the processing module is specifically configured to perform word segmentation and part-of-speech tagging on the two texts using the jieba word segmentation tool in Python.
The text similarity calculation apparatus provided in this embodiment calculates the part-of-speech similarity between two texts based on preset part-of-speech weights, calculates the text similarity between the two texts based on the improved TF-IDF algorithm, and determines the comprehensive similarity between the two texts from the part-of-speech similarity and the text similarity. It thereby improves the accuracy of the semantic similarity calculated between two texts and, in turn, the accuracy of matching similar texts.
Embodiment three
Fig. 3 is a schematic structural diagram of an electronic device provided by Embodiment three of the present invention. As shown in Fig. 3, the electronic device comprises a processor 670, a memory 671, and a computer program stored in the memory 671 and executable on the processor 670. There may be one or more processors 670; Fig. 3 takes one processor 670 as an example. When executing the computer program, the processor 670 implements the text similarity calculation method described in Embodiment one. As shown in Fig. 3, the electronic device may further comprise an input device 672 and an output device 673. The processor 670, the memory 671, the input device 672 and the output device 673 may be connected by a bus or in other ways; Fig. 3 takes connection by a bus as an example.
As a computer-readable storage medium, the memory 671 can be used to store software programs, computer-executable programs and modules, such as the text similarity calculation apparatus and its modules in the embodiments of the present invention (for example, the part-of-speech similarity calculation module 210, the text similarity calculation module 220 and the comprehensive similarity calculation module 230 of the text similarity calculation apparatus). By running the software programs, instructions and modules stored in the memory 671, the processor 670 executes the various functional applications and data processing of the electronic device, that is, implements the text similarity calculation method described above.
The memory 671 may mainly comprise a program storage area and a data storage area. The program storage area can store an operating system and the application programs required for at least one function; the data storage area can store data created through use of the terminal, and so on. In addition, the memory 671 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some examples, the memory 671 may further include memory located remotely from the processor 670, and such remote memory may be connected to the electronic device/storage medium through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The input device 672 can be used to receive input numeric or character information and to generate key signal inputs related to user settings and function control of the electronic device. The output device 673 may include a display device such as a display screen.
Embodiment four
Embodiment four of the present invention further provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to perform a text similarity calculation method, the method comprising:
calculating a part-of-speech similarity between two texts based on preset part-of-speech weights;
calculating a text similarity between the two texts based on an improved term frequency-inverse document frequency (TF-IDF) algorithm;
determining a comprehensive similarity between the two texts according to the part-of-speech similarity and the text similarity.
Of course, in the storage medium containing computer-executable instructions provided by this embodiment of the present invention, the computer-executable instructions are not limited to the method operations described above; they can also perform the operations related to text similarity calculation provided by any embodiment of the present invention.
From the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software together with the necessary general-purpose hardware, and of course also by hardware alone, although in many cases the former is the better implementation. On this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), flash memory (FLASH), hard disk or optical disc, and includes instructions that cause a computer device (which may be a personal computer, a storage medium, a network device or the like) to execute the methods described in the embodiments of the present invention.
Note that the above describes only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in some detail through the above embodiments, it is not limited to them; it may include further equivalent embodiments without departing from the inventive concept, and its scope is determined by the appended claims.