CN105320641A - Text checking method and user terminal - Google Patents

Publication number
CN105320641A
Authority
CN
China
Prior art keywords
text
read
character string
fragments
snippet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410370686.3A
Other languages
Chinese (zh)
Other versions
CN105320641B (en
Inventor
芦世先
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201410370686.3A priority Critical patent/CN105320641B/en
Publication of CN105320641A publication Critical patent/CN105320641A/en
Application granted granted Critical
Publication of CN105320641B publication Critical patent/CN105320641B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

An embodiment of the invention discloses a text checking method and a user terminal. The method comprises the following steps: the text summary of a standard text fragment having the same title as a to-be-read text fragment is acquired from a text signing website; matching processing is performed on the to-be-read text fragment using the text summary; when the matching result is greater than a preset threshold, the to-be-read text fragment is output. The accuracy of the text fragment can be further improved, and the quality of text reading can be guaranteed.

Description

Text checking method and user terminal
Technical Field
The present invention relates to the field of Internet technology, and in particular to a text checking method and a user terminal.
Background
With the continuous development and improvement of Internet technology, the network has become an indispensable part of people's lives. Through Internet-connected user terminals such as mobile phones and computers, users can transfer files, browse web text, and play games.
In existing approaches to reading web text, a feature vector is computed from a text fragment, and whether the text fragment is correct is determined from that feature vector, for example by judging whether a piece of novel content really belongs to the novel in question. Because the fragment content is judged only from the web text currently being read, the accuracy of the text fragment cannot be well guaranteed, which degrades the quality of text reading.
Summary of the Invention
Embodiments of the present invention provide a text checking method and a user terminal that can further improve the accuracy of text fragments and guarantee the quality of text reading.
To solve the above technical problem, a first aspect of the embodiments of the present invention provides a text checking method, which may comprise:
obtaining, from a text signing website, the text summary of a standard text fragment whose title is the same as that of a to-be-read text fragment;
performing matching processing on the to-be-read text fragment using the text summary;
when the matching result is greater than a preset threshold, outputting the to-be-read text fragment.
A second aspect of the embodiments of the present invention provides a user terminal, which may comprise:
a summary acquiring unit, configured to obtain, from a text signing website, the text summary of a standard text fragment whose title is the same as that of a to-be-read text fragment;
a fragment matching unit, configured to perform matching processing on the to-be-read text fragment using the text summary;
a fragment output unit, configured to output the to-be-read text fragment when the matching result is greater than a preset threshold.
In the embodiments of the present invention, the text summary of the standard text fragment whose title is the same as that of the to-be-read text fragment is obtained from the text signing website, the text summary is used to perform matching processing on the to-be-read text fragment, and the to-be-read text fragment is output when the matching result is greater than the preset threshold. Matching the to-be-read text fragment against the text summary of the standard text fragment obtained from the text signing website further improves the accuracy of the text fragment and thereby guarantees the quality of text reading.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are clearly only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a text checking method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another text checking method provided by an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a user terminal provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a fragment matching unit provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of another fragment matching unit provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of another user terminal provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of yet another user terminal provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are clearly only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The text checking method provided by the embodiments of the present invention can be applied to reading web novels. For example, when the text of a web novel is being read, the text summary of the standard text fragment whose title is the same as that of the to-be-read text fragment (for example, a chapter) is obtained from the text signing website of the novel's author; the text summary is used to perform matching processing on the to-be-read text fragment; and the to-be-read text fragment is output when the matching result is greater than a preset threshold. Matching the to-be-read text fragment against the text summary of the standard text fragment obtained from the text signing website further improves the accuracy of the text fragment and thereby guarantees the quality of text reading.
The user terminal involved in the embodiments of the present invention may include terminal devices such as computers, tablet computers, smartphones, notebook computers, palmtop computers, and mobile Internet devices (MID). The text signing website is the website with which the author of the text has signed a contract; the copyright of the text belongs to the text signing website, and the texts on the text signing website are all standard texts, that is, accurate texts. A standard text fragment is a part of the standard text, for example a chapter, or a subheading together with the content belonging to that subheading.
The text checking method provided by the embodiments of the present invention is described in detail below with reference to Fig. 1 and Fig. 2.
Refer to Fig. 1, which is a schematic flowchart of a text checking method provided by an embodiment of the present invention. As shown in Fig. 1, the method of this embodiment comprises the following steps S101 to S103.
S101: obtain, from a text signing website, the text summary of the standard text fragment whose title is the same as that of the to-be-read text fragment.
Specifically, when a user opens a to-be-read text on a user terminal, the user terminal obtains from the text signing website the text summary of the standard text fragment whose title is the same as that of the to-be-read text fragment in the to-be-read text. Understandably, the to-be-read text opened by the user terminal comes from a text aggregation website, that is, a website that extracts text from third-party websites in order to offer users free reading. The standard text fragments on the text signing website require payment, but the text summary of a standard text fragment can still be obtained. Matching the to-be-read text fragment against this standard text summary can therefore improve the accuracy of the to-be-read text fragment.
It should be noted that, before the user terminal obtains the text summary of the standard text fragment whose title is the same as that of the to-be-read text fragment, on receiving a browse request carrying the label of the to-be-read text, the user terminal may also obtain from the text signing website the catalog information of the standard text associated with the label. The label is preferably the title of the to-be-read text. The user terminal may match the catalog information of the to-be-read text against the catalog information of the standard text; more specifically, it matches the titles in the catalog of the standard text against the titles in the catalog of the to-be-read text, where the titles may be the titles of the chapters in the catalog. This preliminary catalog match provides a basis for the subsequent text fragment matching. After the catalog match passes, that is, when the catalog information of the standard text is found to be consistent with that of the to-be-read text, the user terminal performs the step of obtaining, from the text signing website, the text summary of the standard text fragment whose title is the same as that of the to-be-read text fragment.
S102: perform matching processing on the to-be-read text fragment using the text summary.
Specifically, the user terminal segments the text summary and the to-be-read text fragment separately according to a preset format, and performs matching processing on the character strings in the segmented text summary and the character strings in the segmented to-be-read text fragment.
S103: when the matching result is greater than a preset threshold, output the to-be-read text fragment.
Specifically, a matching result greater than the preset threshold indicates that the to-be-read text fragment is highly similar to the text summary, and the user terminal can output the to-be-read text fragment.
In this embodiment of the present invention, the text summary of the standard text fragment whose title is the same as that of the to-be-read text fragment is obtained from the text signing website, the text summary is used to perform matching processing on the to-be-read text fragment, and the to-be-read text fragment is output when the matching result is greater than the preset threshold. Matching the to-be-read text fragment against the text summary of the standard text fragment obtained from the text signing website further improves the accuracy of the text fragment and thereby guarantees the quality of text reading.
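The decision made in steps S102 and S103 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names `check_fragment` and `toy_match` are hypothetical, the summary is assumed to have been fetched already (S101 is out of scope here), and `toy_match` is a placeholder scorer rather than the matching process the patent describes.

```python
from typing import Callable, Optional


def check_fragment(
    fragment: str,
    summary: str,
    match: Callable[[str, str], float],
    threshold: float,
) -> Optional[str]:
    """Return the to-be-read fragment if it passes the check, else None.

    `match` is any scoring function that compares the summary with the
    fragment and returns a similarity value (hypothetical interface).
    """
    score = match(summary, fragment)  # S102: matching processing
    if score > threshold:             # S103: strictly greater than the threshold
        return fragment
    return None


def toy_match(summary: str, fragment: str) -> float:
    """Placeholder scorer: share of summary characters found in the fragment."""
    if not summary:
        return 0.0
    return sum(ch in fragment for ch in summary) / len(summary)
```

Passing the scorer in as a parameter keeps the threshold decision separate from the choice of matching process, which the description leaves open.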
Refer to Fig. 2, which is a schematic flowchart of another text checking method provided by an embodiment of the present invention. As shown in Fig. 2, the method of this embodiment comprises the following steps S201 to S208.
S201: on receiving a browse request carrying the label of a to-be-read text, obtain from the text signing website the catalog information of the standard text associated with the label.
S202: match the catalog information of the to-be-read text against the catalog information of the standard text.
Specifically, on receiving a browse request carrying the label of the to-be-read text, the user terminal can obtain from the text signing website the catalog information of the standard text associated with the label. The label is preferably the title of the to-be-read text. The user terminal may match the catalog information of the to-be-read text against the catalog information of the standard text; more specifically, it matches the titles in the catalog of the standard text against the titles in the catalog of the to-be-read text, where the titles may be the titles of the chapters in the catalog. This preliminary catalog match provides a basis for the subsequent text fragment matching.
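A minimal sketch of the catalog check in S202 is given below. It assumes both catalogs are already available as ordered lists of chapter titles, and that "consistent" means the titles agree pairwise after whitespace normalization; the normalization and the name `catalogs_match` are assumptions for illustration, not details stated in the description.

```python
from typing import List


def catalogs_match(standard_titles: List[str], candidate_titles: List[str]) -> bool:
    """Return True when both catalogs list the same chapter titles in order.

    Runs of whitespace are collapsed before comparing (an assumption), so
    purely cosmetic spacing differences do not fail the check.
    """
    def normalize(title: str) -> str:
        return " ".join(title.split())

    return [normalize(t) for t in standard_titles] == [
        normalize(t) for t in candidate_titles
    ]
```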
It should be noted that the to-be-read text opened by the user terminal comes from a novel aggregation website, that is, a website that extracts text from third-party websites in order to offer users free reading.
S203: after the match passes, obtain from the text signing website the text summary of the standard text fragment whose title is the same as that of the to-be-read text fragment.
Specifically, after the catalog match passes, that is, when the catalog information of the standard text is found to be consistent with that of the to-be-read text, the user terminal obtains from the text signing website the text summary of the standard text fragment whose title is the same as that of the to-be-read text fragment in the to-be-read text. Understandably, the to-be-read text opened by the user terminal comes from a text aggregation website, that is, a website that extracts text from third-party websites in order to offer users free reading. The standard text fragments on the text signing website require payment, but the text summary of a standard text fragment can still be obtained. Matching the to-be-read text fragment against this standard text summary can therefore improve the accuracy of the to-be-read text fragment.
S204: segment the text summary and the to-be-read text fragment separately according to a preset format, and perform matching processing on the character strings in the segmented text summary and the character strings in the segmented to-be-read text fragment.
Specifically, the user terminal may first segment the text summary and the to-be-read text fragment separately according to a preset format. For example, it may split both into several character strings using a preset number of characters as the segment boundary, or split both into several character strings by part of speech. The user terminal can then perform matching processing on the character strings in the segmented text summary and the character strings in the segmented to-be-read text fragment.
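Segmentation by a preset number of characters can be sketched as below. The description does not say at this point whether the segments overlap, so both readings are shown: `chunks` produces non-overlapping pieces, and `sliding_windows` produces overlapping ones that advance one character at a time. Both function names are hypothetical, and part-of-speech segmentation is not shown because it would need a word segmenter.

```python
from typing import List


def chunks(text: str, size: int) -> List[str]:
    """Split text into consecutive, non-overlapping pieces of `size` characters.

    The final piece may be shorter than `size`.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(0, len(text), size)]


def sliding_windows(text: str, size: int) -> List[str]:
    """Return every run of `size` consecutive characters, advancing by one.

    Text shorter than `size` yields an empty list.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [text[i:i + size] for i in range(len(text) - size + 1)]
```

For example, `chunks("abcdefg", 3)` gives `["abc", "def", "g"]`, while `sliding_windows("abcde", 3)` gives `["abc", "bcd", "cde"]`.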
Preferably, the matching process may be as follows. The user terminal segments the text summary and the to-be-read text fragment separately according to a first preset format, and obtains a first character string in the segmented text summary and a second character string in the segmented to-be-read text fragment. Understandably, the first character string may comprise at least one character string, and so may the second character string. The user terminal obtains character string information for the first and second character strings, comprising the sum of the string lengths and the Levenshtein distance between the strings. The Levenshtein distance is the minimum number of single-character edits (insertions, deletions, and replacements) required to turn one character string into the other. From the sum of the string lengths and the Levenshtein distance, the user terminal obtains the Levenshtein ratio of the first and second character strings and takes this ratio as the result of matching the to-be-read text fragment against the text summary.
Take a first preset format of part-of-speech segmentation as an example. Suppose the first character string obtained is "sun" and the second character string is "heronsbill"; the sum of the string lengths is then 5, and the Levenshtein distance between the first and second character strings is 2 (one deletion and one replacement). The Levenshtein ratio is given by (Sum - Idist)/Sum, where Sum is the sum of the string lengths and Idist is the Levenshtein distance between the strings, so the Levenshtein ratio of the first and second character strings is (5 - 2)/5 = 0.6. The user terminal computes the Levenshtein ratio in this way for all character strings in the to-be-read text fragment and the text summary, and takes the mean of the resulting ratios as the result of matching the to-be-read text fragment against the text summary.
Preferably, when the Levenshtein ratio of the first and second character strings is computed, the two strings may also each be converted to pinyin, and the Levenshtein ratio between the pinyin of the first character string and the pinyin of the second character string computed in the same way. The user terminal then takes the mean of two values, the Levenshtein ratio of the first and second character strings and the Levenshtein ratio of their pinyin, as the final Levenshtein ratio between the first and second character strings. Additionally matching on pinyin prevents homophones from interfering with the matching process and can further improve matching accuracy.
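The Levenshtein ratio described above can be sketched in Python as follows. The distance is computed with the standard dynamic-programming recurrence in which insertions, deletions, and replacements each cost 1, and the ratio follows the formula (Sum - Idist)/Sum from the text. The function names are hypothetical, and `to_pinyin` is an assumed caller-supplied converter; no particular pinyin library is implied.

```python
from typing import Callable


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    replacements needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # replace, or keep if equal
            ))
        prev = cur
    return prev[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """(Sum - Idist) / Sum, where Sum is len(a) + len(b)."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty strings are treated as identical (assumption)
    return (total - levenshtein(a, b)) / total


def combined_ratio(a: str, b: str, to_pinyin: Callable[[str], str]) -> float:
    """Mean of the ratio on the raw strings and the ratio on their pinyin."""
    raw = levenshtein_ratio(a, b)
    phonetic = levenshtein_ratio(to_pinyin(a), to_pinyin(b))
    return (raw + phonetic) / 2
```

With stand-in strings of lengths 2 and 3, such as `"ab"` and `"xbc"`, the distance is 2 (one replacement and one insertion) and the ratio is (5 - 2)/5 = 0.6, the same figures as the example in the text.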
Understandably, the Levenshtein-ratio approach suits cases in which the string lengths do not differ greatly. A text aggregation website, for example, usually adds some advertising wording before the body text, whereas the text summary contains no such wording. The matching process would then spend excessive effort on these advertising terms and take longer. For this case, the embodiments of the present invention provide another matching process: the user terminal segments the text summary and the to-be-read text fragment separately according to a second preset format, and obtains at least one third character string in the segmented text summary and at least one fourth character string in the segmented to-be-read text fragment. It then obtains the number of fourth character strings in the to-be-read text fragment that are identical to one of the third character strings, and takes the ratio of this number to the number of third character strings as the result of matching the to-be-read text fragment against the text summary.
Take a second preset format of segmentation by a preset number of characters as an example. Suppose a sentence in the text summary is "graduation come back home after land here", a sentence in the to-be-read text fragment is "finish page come back home after land here know out", and the preset number of characters is three. The sentence in the text summary can then be divided into six third character strings, "graduates back, industry is come back home, after coming back home, after state, after land, land here", and the sentence in the to-be-read text fragment can be divided into eight fourth character strings, "complete page returns, page is come back home, after coming back home, after state, after land, land here, here land knows, here knowledge is opened". Four of the third character strings in the summary sentence are identical to four of the character strings in the to-be-read sentence, so the matching result for these two sentences is 4/6 = 0.67. The user terminal computes this count ratio in the same way for all character strings in the to-be-read text fragment and the text summary, and takes the mean of the resulting ratios as the result of matching the to-be-read text fragment against the text summary.
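The count-ratio matching can be sketched as below, using overlapping three-character windows as in the example. The function name `ngram_match_ratio` is hypothetical, and one detail is an assumption: the sketch counts how many windows of the summary also occur in the fragment, which is one reading of "the number of identical fourth character strings". The Latin strings in the usage note are stand-ins, not the sentences from the patent.

```python
def ngram_match_ratio(summary: str, fragment: str, n: int = 3) -> float:
    """Share of the summary's n-character windows that occur in the fragment."""
    if n <= 0:
        raise ValueError("n must be positive")
    # "Third character strings": overlapping windows of the summary sentence.
    summary_grams = [summary[i:i + n] for i in range(len(summary) - n + 1)]
    # "Fourth character strings": overlapping windows of the to-be-read sentence.
    fragment_grams = {fragment[i:i + n] for i in range(len(fragment) - n + 1)}
    if not summary_grams:
        return 0.0  # summary shorter than n: nothing to compare (assumption)
    hits = sum(gram in fragment_grams for gram in summary_grams)
    return hits / len(summary_grams)
```

For instance, `ngram_match_ratio("ABCDEFGH", "AXCDEFGHIJ")` splits the first string into six windows and the second into eight; four windows coincide, giving 4/6, the same counts as the example in the text. Extra material that appears only in the fragment, such as leading advertising wording, adds windows to the fragment side without lowering the ratio.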
Understandably, the first preset format and the second preset format above may be the same preset format; the separate names serve only to distinguish the two matching processes. Likewise, the first and second character strings and the third and fourth character strings above may be the same character strings.
S205: when the matching result is greater than the preset threshold, output the to-be-read text fragment.
Specifically, a matching result greater than the preset threshold indicates that the to-be-read text fragment is highly similar to the text summary, and the user terminal can output the to-be-read text fragment.
S206: when the matching result is less than or equal to the preset threshold, obtain from at least one third-party website at least one third-party text fragment whose title is the same as that of the to-be-read text fragment.
Specifically, a matching result less than or equal to the preset threshold indicates that the similarity between the to-be-read text fragment and the text summary is not high. The user terminal can then obtain, from at least one third-party website, at least one third-party text fragment whose title is the same as that of the to-be-read text fragment. Understandably, a third-party website may be a text aggregation website other than the one currently in use.
S207: use the text summary to compute the similarity of each of the at least one third-party text fragment.
Specifically, the user terminal may obtain third-party text fragments from a preset number of third-party websites, for example from 10 third-party websites. The user terminal uses the text summary to compute the similarity of each of these third-party text fragments; the computation follows the matching process described above and is not repeated here.
S208: obtain and output the third-party text fragment whose similarity is greater than the preset threshold and is the highest.
Specifically, from the preset number of third-party text fragments, the user terminal can obtain and output the third-party text fragment whose similarity is greater than the preset threshold and is the highest.
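The fallback in S206 to S208 can be sketched as a selection over candidate fragments. The sketch assumes the third-party fragments have already been fetched and that `score` is whichever matching process is in use; `pick_best` is a hypothetical name, and returning the first candidate on a tie is an assumption the description does not address.

```python
from typing import Callable, Iterable, Optional


def pick_best(
    summary: str,
    candidates: Iterable[str],
    score: Callable[[str, str], float],
    threshold: float,
) -> Optional[str]:
    """Return the candidate with the highest similarity above the threshold.

    Returns None when no candidate scores strictly above the threshold.
    """
    best: Optional[str] = None
    best_score = threshold
    for candidate in candidates:
        similarity = score(summary, candidate)  # S207
        if similarity > best_score:             # S208: above threshold and highest so far
            best, best_score = candidate, similarity
    return best
```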
In this embodiment of the present invention, the text summary of the standard text fragment whose title is the same as that of the to-be-read text fragment is obtained from the text signing website, the text summary is used to perform matching processing on the to-be-read text fragment, and the to-be-read text fragment is output when the matching result is greater than the preset threshold. Matching the catalog information of the texts provides a basis for the subsequent text fragment matching; matching the to-be-read text fragment against the text summary of the standard text fragment obtained from the text signing website improves the accuracy of the text fragment; matching text fragments by Levenshtein ratio and by count ratio further improves that accuracy; and judging the matching result against a preset threshold and outputting the most accurate text fragment accordingly guarantees the quality of text reading.
The user terminal provided by the embodiments of the present invention is described in detail below with reference to Figs. 3 to 6. It should be noted that the user terminal shown in Figs. 3 to 6 is used to perform the methods of the embodiments shown in Fig. 1 and Fig. 2 of the present invention. For ease of description, only the parts relevant to the embodiments of the present invention are shown; for technical details not disclosed here, refer to the embodiments shown in Fig. 1 and Fig. 2.
Refer to Fig. 3, which is a schematic structural diagram of a user terminal provided by an embodiment of the present invention. As shown in Fig. 3, the user terminal 1 of this embodiment may comprise a summary acquiring unit 11, a fragment matching unit 12, and a fragment output unit 13.
The summary acquiring unit 11 is configured to obtain, from a text signing website, the text summary of the standard text fragment whose title is the same as that of the to-be-read text fragment.
In a specific implementation, when a user opens a to-be-read text on the user terminal 1, the summary acquiring unit 11 obtains from the text signing website the text summary of the standard text fragment whose title is the same as that of the to-be-read text fragment in the to-be-read text. Understandably, the to-be-read text opened by the user terminal comes from a text aggregation website, that is, a website that extracts text from third-party websites in order to offer users free reading. The standard text fragments on the text signing website require payment, but the text summary of a standard text fragment can still be obtained. Matching the to-be-read text fragment against this standard text summary can therefore improve the accuracy of the to-be-read text fragment.
It should be noted that, before the summary acquiring unit 11 obtains the text summary of the standard text fragment whose title is the same as that of the to-be-read text fragment, on receiving a browse request carrying the label of the to-be-read text, the user terminal 1 may also obtain from the text signing website the catalog information of the standard text associated with the label. The label is preferably the title of the to-be-read text. The user terminal 1 may match the catalog information of the to-be-read text against the catalog information of the standard text; more specifically, it matches the titles in the catalog of the standard text against the titles in the catalog of the to-be-read text, where the titles may be the titles of the chapters in the catalog. This preliminary catalog match provides a basis for the subsequent text fragment matching. After the catalog match passes, that is, when the catalog information of the standard text is found to be consistent with that of the to-be-read text, the summary acquiring unit 11 performs the step of obtaining, from the text signing website, the text summary of the standard text fragment whose title is the same as that of the to-be-read text fragment.
The fragment matching unit 12 is configured to perform matching processing on the to-be-read text fragment using the text summary.
In a specific implementation, the fragment matching unit 12 segments the text summary and the to-be-read text fragment separately according to a preset format, and performs matching processing on the character strings in the segmented text summary and the character strings in the segmented to-be-read text fragment.
Concrete, please also refer to Fig. 4, for embodiments providing a kind of structural representation of fragment match unit.As shown in Figure 4, described fragment match unit 12 can comprise:
First obtains subelement 121, for carrying out staging treating to described text snippet and described text fragments to be read respectively according to the first preset format, and the second character string in the first character string obtained in the described text snippet after staging treating and described text fragments to be read;
Acquisition of information subelement 122, for obtaining the character string information of described first character string and described second character string, described character string information comprises the Levenstein distance between string length sum and character string;
First result determination subelement 123, for according to the Levenstein distance between described string length sum and described character string, obtain the Levenstein ratio of described first character string and described second character string, and described Levenstein ratio is defined as the matching result that described text fragments to be read and described text snippet carry out matching treatment;
In specific implementation, described fragment match unit 12 first can carry out staging treating to described text snippet and described text fragments to be read according to preset format respectively, such as: the boundary being segmentation with default number of words, described text snippet and described text fragments to be read are segmented into several character strings; Or according to part of speech, described text snippet and described text fragments to be read are segmented into several character strings etc.Described fragment match unit 12 can carry out matching treatment according to the character string in the character string in the described text snippet after staging treating and described text fragments to be read.
Preferably, the matching process may proceed as follows. The first acquisition subunit 121 segments the text summary and the to-be-read text fragment separately according to the first preset format, and obtains a first character string in the segmented text summary and a second character string in the segmented to-be-read text fragment. It can be understood that the first character string may comprise at least one character string, and the second character string may likewise comprise at least one character string. The information acquisition subunit 122 obtains the character string information of the first character string and the second character string, which comprises the sum of the string lengths and the Levenshtein distance between the character strings. The Levenshtein distance is the minimum number of single-character edits (insertions, deletions, and replacements) required to change one character string into the other. The first result determination subunit 123 obtains the Levenshtein ratio of the first character string and the second character string from the sum of the string lengths and the Levenshtein distance, and determines the Levenshtein ratio as the matching result of matching the to-be-read text fragment against the text summary.
Taking segmentation by part of speech as the first preset format, suppose the first character string obtained is "sun" and the second character string is "heronsbill". The sum of the string lengths is then 5 (counted in characters of the original-language strings), and the Levenshtein distance between the first character string and the second character string is 2 (one deletion and one replacement). According to the Levenshtein ratio formula (Sum - Idist) / Sum, where Sum denotes the sum of the string lengths and Idist denotes the Levenshtein distance between the character strings, the Levenshtein ratio of the first character string and the second character string is (5 - 2) / 5 = 0.6. The fragment matching unit 12 computes the Levenshtein ratio in this way for all character strings in the to-be-read text fragment and the text summary, and determines the mean of the resulting Levenshtein ratios as the matching result of matching the to-be-read text fragment against the text summary.
Preferably, when computing the Levenshtein ratio of the first character string and the second character string, the two character strings may also each be converted to pinyin (phonetic transcription), and the Levenshtein ratio between the pinyin of the first character string and the pinyin of the second character string computed in the same way. The fragment matching unit 12 then takes the mean of the Levenshtein ratio of the two character strings and the Levenshtein ratio of their pinyin, and uses this mean as the final Levenshtein ratio between the first character string and the second character string. Matching on pinyin as well prevents homophones from interfering with the matching process and further improves matching accuracy.
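The ratio described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, and `to_phonetic` stands in for whatever pinyin conversion an implementation would supply. The strings "ab" and "xbc" are ASCII placeholders chosen only because they reproduce the numbers of the worked example (length sum 5, distance 2).

```python
from typing import Callable


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and replacements."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # replacement
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """(Sum - Idist) / Sum, with Sum the sum of the two string lengths."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # assumption: two empty strings count as a perfect match
    return (total - levenshtein_distance(a, b)) / total


def combined_ratio(a: str, b: str, to_phonetic: Callable[[str], str]) -> float:
    """Mean of the plain ratio and the ratio of the phonetic forms."""
    plain = levenshtein_ratio(a, b)
    phonetic = levenshtein_ratio(to_phonetic(a), to_phonetic(b))
    return (plain + phonetic) / 2


print(levenshtein_distance("ab", "xbc"))  # one replacement plus one insertion
print(levenshtein_ratio("ab", "xbc"))
```

Note that some string-matching libraries define a "ratio" with a different replacement cost, so their values need not agree with the formula used here.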
It can be understood that the Levenshtein ratio approach is suitable when the string lengths do not differ greatly. A text aggregation website, for example, usually adds some advertising phrases before the text, whereas the text summary contains no such phrases. The matching process then has to match these advertising phrases as well, which takes a longer time. For this situation, an embodiment of the present invention also provides a schematic structural diagram of another fragment matching unit. As shown in Fig. 5, the fragment matching unit 12 may comprise:
A second acquisition subunit 124, configured to segment the text summary and the to-be-read text fragment separately according to a second preset format, and to obtain at least one third character string in the segmented text summary and at least one fourth character string in the segmented to-be-read text fragment;
A number acquisition subunit 125, configured to obtain the number of fourth character strings in the to-be-read text fragment that are identical to the at least one third character string;
A second result determination subunit 126, configured to determine the ratio of the number of such fourth character strings to the number of the at least one third character string as the matching result of matching the to-be-read text fragment against the text summary;
In a specific implementation, the matching process may also proceed as follows. The second acquisition subunit 124 segments the text summary and the to-be-read text fragment separately according to the second preset format, and obtains at least one third character string in the segmented text summary and at least one fourth character string in the segmented to-be-read text fragment. The number acquisition subunit 125 obtains the number of fourth character strings in the to-be-read text fragment that are identical to the at least one third character string. The second result determination subunit 126 determines the ratio of the number of such fourth character strings to the number of the at least one third character string as the matching result of matching the to-be-read text fragment against the text summary.
Taking segmentation by a preset number of characters as the second preset format, suppose one statement in the text summary is "graduation come back home after land here", one statement in the to-be-read text fragment is "finish page come back home after land here know out", and the preset number of characters is three. The statement in the text summary can then be divided into six third character strings, "graduates back, industry is come back home, after coming back home, after state, after land, land here", and the statement in the to-be-read text fragment can be divided into eight fourth character strings, "complete page returns, page is come back home, after coming back home, after state, after land, land here, here land knows, here knowledge is opened". Four of the third character strings in the text summary statement are identical to four of the fourth character strings in the to-be-read text fragment statement, so the matching result for these two statements is 4/6 = 0.67. The fragment matching unit 12 computes the number ratio in this way for all character strings in the to-be-read text fragment and the text summary, and determines the mean of the resulting number ratios as the matching result of matching the to-be-read text fragment against the text summary.
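The number ratio can be sketched as a sliding-window comparison. The code below is an illustration under assumptions: the function names are hypothetical, and a summary segment counts as matched if it occurs anywhere in the fragment, which is one reading of "identical" that the text does not spell out. The ASCII strings are placeholders that mirror the shape of the worked example: the fragment differs from the summary in its first two characters and has two extra characters appended.

```python
from typing import List


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """All overlapping windows of n characters, in order."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_match_ratio(summary: str, fragment: str, n: int = 3) -> float:
    """Share of the summary's n-character segments that also occur in the fragment."""
    summary_grams = char_ngrams(summary, n)
    if not summary_grams:
        return 0.0  # assumption: a summary shorter than n characters yields no match
    fragment_grams = set(char_ngrams(fragment, n))
    matched = sum(1 for gram in summary_grams if gram in fragment_grams)
    return matched / len(summary_grams)


summary = "ABCDEFGH"     # 6 three-character segments
fragment = "XYCDEFGHZW"  # 8 three-character segments, 4 shared with the summary
print(round(ngram_match_ratio(summary, fragment), 2))
```

Because the denominator is the number of summary segments, text that exists only in the fragment, such as a leading advertising phrase, does not lower the score.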
It can be understood that the first preset format and the second preset format may be the same preset format; the different names are used only to distinguish the two matching processes. Likewise, the first and second character strings and the third and fourth character strings may be the same character strings. In addition, the fragment matching unit 12 may comprise the first acquisition subunit 121, the information acquisition subunit 122, and the first result determination subunit 123 together with the second acquisition subunit 124, the number acquisition subunit 125, and the second result determination subunit 126, so that the matching process can be handled under different circumstances.
A fragment output unit 13, configured to output the to-be-read text fragment when the matching result is greater than a preset threshold;
Specifically, when the matching result is greater than the preset threshold, the similarity between the to-be-read text fragment and the text summary is high, and the fragment output unit 13 may output the to-be-read text fragment.
In the embodiments of the present invention, the text summary of the standard text fragment having the same title as the to-be-read text fragment is obtained from the text signing station, the text summary is used to match the to-be-read text fragment, and the to-be-read text fragment is output when the matching result is greater than the preset threshold. Matching the to-be-read text fragment against the text summary of the standard text fragment obtained from the text signing station improves the accuracy of the text fragment; matching text fragments by means of the Levenshtein ratio and the number ratio further improves that accuracy, which in turn guarantees the quality of text reading.
Refer to Fig. 6, which is a schematic structural diagram of another user terminal provided by an embodiment of the present invention. As shown in Fig. 6, the user terminal 1 of this embodiment may comprise: a summary acquiring unit 11, a fragment matching unit 12, a fragment output unit 13, an information acquisition unit 14, a notification unit 15, a fragment acquiring unit 16, and a computing unit 17. For the specific structure of the summary acquiring unit 11 and the fragment matching unit 12, and part of the structure of the fragment output unit 13, refer to the description of the embodiment shown in Fig. 3; details are not repeated here.
An information acquisition unit 14, configured to obtain, from the text signing station, directory information of the standard text associated with a label when a browse request carrying the label of a to-be-read text is received;
A notification unit 15, configured to match the directory information of the to-be-read text against the directory information of the standard text and, after the match succeeds, to notify the summary acquiring unit 11 to obtain, from the text signing station, the text summary of the standard text fragment having the same title as the to-be-read text fragment;
In a specific implementation, when the user terminal 1 receives a browse request carrying the label of a to-be-read text, the information acquisition unit 14 may obtain, from the text signing station, the directory information of the standard text associated with the label. The label is preferably the text title of the to-be-read text. The notification unit 15 may match the directory information of the to-be-read text against the directory information of the standard text; more specifically, the notification unit 15 matches the titles in the directory of the standard text against the titles in the directory of the to-be-read text, where the titles may be the titles of the individual chapters in the directory. This preliminary matching of directory information provides a basis for the subsequent matching of text fragments.
It should be noted that the to-be-read text opened by the user terminal 1 comes from a novel aggregation website, that is, a website that extracts text from third-party websites in order to offer users free text reading. After the directory information match succeeds, that is, when the directory information of the standard text is found to be consistent with the directory information of the to-be-read text, the notification unit 15 notifies the summary acquiring unit 11 to obtain, from the text signing station, the text summary of the standard text fragment having the same title as the to-be-read text fragment.
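The preliminary directory check can be pictured as a comparison of chapter-title lists. The sketch below is only one possible reading: the function name is hypothetical, "consistent" is taken to mean the same titles in the same order, and ignoring whitespace differences is an added assumption that the text does not mention.

```python
from typing import List


def directories_match(standard_titles: List[str], candidate_titles: List[str]) -> bool:
    """True when both directories list the same chapter titles in the same order."""

    def normalize(title: str) -> str:
        # Assumption: whitespace differences between sites are not significant.
        return "".join(title.split())

    return ([normalize(t) for t in standard_titles]
            == [normalize(t) for t in candidate_titles])


standard = ["Chapter 1 Arrival", "Chapter 2 Departure"]
aggregated = ["Chapter 1  Arrival", "Chapter 2 Departure "]
print(directories_match(standard, aggregated))
```

Only when this check succeeds would the per-fragment summary matching be triggered.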
A fragment acquiring unit 16, configured to obtain, from at least one third-party website, at least one third-party text fragment having the same title as the to-be-read text fragment when the matching result is less than or equal to the preset threshold;
In a specific implementation, when the matching result is less than or equal to the preset threshold, the similarity between the to-be-read text fragment and the text summary is not high, and the fragment acquiring unit 16 may obtain, from at least one third-party website, at least one third-party text fragment having the same title as the to-be-read text fragment. It can be understood that the third-party website may be a text aggregation website other than the one currently in use.
A computing unit 17, configured to use the text summary to compute the similarity of each third-party text fragment among the at least one third-party text fragment;
In a specific implementation, the computing unit 17 may obtain third-party text fragments from a preset number of third-party websites, for example from 10 third-party websites. The computing unit 17 uses the text summary to compute the similarity of each of these third-party text fragments. For the specific computation, refer to the matching process of the embodiment shown in Fig. 3; details are not repeated here.
The fragment output unit 13 is further configured to obtain and output the third-party text fragment whose similarity is greater than the preset threshold and is the largest;
In a specific implementation, the fragment output unit 13 may obtain and output, from among the third-party text fragments of the preset number of third-party websites, the third-party text fragment whose similarity is greater than the preset threshold and is the largest.
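The threshold decision and the fallback to third-party fragments can be sketched as a single selection function. All names here are hypothetical, and `similarity` is a placeholder for either of the matching methods described earlier; returning `None` when no fragment exceeds the threshold is an assumption, since the text does not say what happens in that case.

```python
from typing import Callable, Iterable, Optional


def select_fragment(summary: str,
                    candidate: str,
                    third_party_fragments: Iterable[str],
                    threshold: float,
                    similarity: Callable[[str, str], float]) -> Optional[str]:
    """Return the fragment to output, or None if nothing exceeds the threshold."""
    # Output the to-be-read fragment directly when it matches well enough.
    if similarity(summary, candidate) > threshold:
        return candidate
    # Otherwise keep the third-party fragment with the largest similarity,
    # provided that similarity is strictly greater than the threshold.
    best, best_score = None, threshold
    for fragment in third_party_fragments:
        score = similarity(summary, fragment)
        if score > best_score:
            best, best_score = fragment, score
    return best
```

Passing the similarity measure as a parameter keeps the selection logic independent of whether the Levenshtein ratio or the number ratio is used.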
In the embodiments of the present invention, the text summary of the standard text fragment having the same title as the to-be-read text fragment is obtained from the text signing station, the text summary is used to match the to-be-read text fragment, and the to-be-read text fragment is output when the matching result is greater than the preset threshold. Matching the directory information of the texts provides a basis for the subsequent matching of text fragments; matching the to-be-read text fragment against the text summary of the standard text fragment obtained from the text signing station improves the accuracy of the text fragment; matching text fragments by means of the Levenshtein ratio and the number ratio further improves that accuracy; and judging the matching result against the preset threshold and outputting the most accurate text fragment according to the judgment guarantees the quality of text reading.
Refer to Fig. 7, which is a schematic structural diagram of yet another user terminal provided by an embodiment of the present invention. As shown in Fig. 7, the user terminal 1000 may comprise: at least one processor 1001, such as a CPU; at least one network interface 1004; a user interface 1003; a memory 1005; and at least one communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may comprise a display and a keyboard, and may optionally also comprise a standard wired interface and a standard wireless interface. The network interface 1004 may optionally comprise a standard wired interface and a standard wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM, or a non-volatile memory such as at least one magnetic disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the processor 1001. As shown in Fig. 7, the memory 1005, as a computer storage medium, may contain an operating system, a network communication module, a user interface module, and a text checking application.
In the user terminal 1000 shown in Fig. 7, the network interface 1004 is mainly used to connect to the text signing station and third-party websites and to carry out data communication with the user terminal; the processor 1001 may be used to call the text checking application stored in the memory 1005 and specifically perform the following steps:
Obtaining, from the text signing station, the text summary of the standard text fragment having the same title as the to-be-read text fragment;
Matching the to-be-read text fragment using the text summary;
Outputting the to-be-read text fragment when the matching result is greater than the preset threshold.
In one embodiment, before obtaining, from the text signing station, the text summary of the standard text fragment having the same title as the to-be-read text fragment, the processor 1001 also performs the following steps:
When a browse request carrying the label of a to-be-read text is received, obtaining, from the text signing station, the directory information of the standard text associated with the label;
Matching the directory information of the to-be-read text against the directory information of the standard text and, after the match succeeds, obtaining, from the text signing station, the text summary of the standard text fragment having the same title as the to-be-read text fragment.
In one embodiment, when matching the to-be-read text fragment using the text summary, the processor 1001 specifically performs the following step:
Segmenting the text summary and the to-be-read text fragment separately according to a preset format, and performing matching on the character strings in the segmented text summary and the character strings in the segmented to-be-read text fragment.
In one embodiment, when segmenting the text summary and the to-be-read text fragment separately according to the preset format and performing matching on the character strings in the segmented text summary and the character strings in the segmented to-be-read text fragment, the processor 1001 specifically performs the following steps:
Segmenting the text summary and the to-be-read text fragment separately according to a first preset format, and obtaining a first character string in the segmented text summary and a second character string in the segmented to-be-read text fragment;
Obtaining character string information of the first character string and the second character string, the character string information comprising the sum of the string lengths and the Levenshtein distance between the character strings;
Obtaining the Levenshtein ratio of the first character string and the second character string from the sum of the string lengths and the Levenshtein distance between the character strings, and determining the Levenshtein ratio as the matching result of matching the to-be-read text fragment against the text summary.
In one embodiment, when segmenting the text summary and the to-be-read text fragment separately according to the preset format and performing matching on the character strings in the segmented text summary and the character strings in the segmented to-be-read text fragment, the processor 1001 specifically performs the following steps:
Segmenting the text summary and the to-be-read text fragment separately according to a second preset format, and obtaining at least one third character string in the segmented text summary and at least one fourth character string in the segmented to-be-read text fragment;
Obtaining the number of fourth character strings in the to-be-read text fragment that are identical to the at least one third character string;
Determining the ratio of the number of such fourth character strings to the number of the at least one third character string as the matching result of matching the to-be-read text fragment against the text summary.
In one embodiment, the processor 1001 also performs the following steps:
When the matching result is less than or equal to the preset threshold, obtaining, from at least one third-party website, at least one third-party text fragment having the same title as the to-be-read text fragment;
Using the text summary to compute the similarity of each third-party text fragment among the at least one third-party text fragment;
Obtaining and outputting the third-party text fragment whose similarity is greater than the preset threshold and is the largest.
In the embodiments of the present invention, the text summary of the standard text fragment having the same title as the to-be-read text fragment is obtained from the text signing station, the text summary is used to match the to-be-read text fragment, and the to-be-read text fragment is output when the matching result is greater than the preset threshold. Matching the directory information of the texts provides a basis for the subsequent matching of text fragments; matching the to-be-read text fragment against the text summary of the standard text fragment obtained from the text signing station improves the accuracy of the text fragment; matching text fragments by means of the Levenshtein ratio and the number ratio further improves that accuracy; and judging the matching result against the preset threshold and outputting the most accurate text fragment according to the judgment guarantees the quality of text reading.
A person of ordinary skill in the art will understand that all or part of the processes in the above method embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above discloses only preferred embodiments of the present invention, which of course cannot be used to limit the scope of rights of the present invention; equivalent changes made according to the claims of the present invention therefore still fall within the scope covered by the present invention.

Claims (12)

1. A text checking method, characterized by comprising:
Obtaining, from a text signing station, a text summary of a standard text fragment having the same title as a to-be-read text fragment;
Matching the to-be-read text fragment using the text summary;
Outputting the to-be-read text fragment when the matching result is greater than a preset threshold.
2. The method according to claim 1, characterized in that, before the obtaining, from the text signing station, of the text summary of the standard text fragment having the same title as the to-be-read text fragment, the method further comprises:
When a browse request carrying the label of a to-be-read text is received, obtaining, from the text signing station, directory information of the standard text associated with the label;
Matching the directory information of the to-be-read text against the directory information of the standard text and, after the match succeeds, performing the obtaining, from the text signing station, of the text summary of the standard text fragment having the same title as the to-be-read text fragment.
3. The method according to claim 1, characterized in that the matching of the to-be-read text fragment using the text summary comprises:
Segmenting the text summary and the to-be-read text fragment separately according to a preset format, and performing matching on the character strings in the segmented text summary and the character strings in the segmented to-be-read text fragment.
4. The method according to claim 3, characterized in that the segmenting of the text summary and the to-be-read text fragment separately according to the preset format, and the performing of matching on the character strings in the segmented text summary and the character strings in the segmented to-be-read text fragment, comprise:
Segmenting the text summary and the to-be-read text fragment separately according to a first preset format, and obtaining a first character string in the segmented text summary and a second character string in the segmented to-be-read text fragment;
Obtaining character string information of the first character string and the second character string, the character string information comprising the sum of the string lengths and the Levenshtein distance between the character strings;
Obtaining the Levenshtein ratio of the first character string and the second character string from the sum of the string lengths and the Levenshtein distance between the character strings, and determining the Levenshtein ratio as the matching result of matching the to-be-read text fragment against the text summary.
5. The method according to claim 3, characterized in that the segmenting of the text summary and the to-be-read text fragment separately according to the preset format, and the performing of matching on the character strings in the segmented text summary and the character strings in the segmented to-be-read text fragment, comprise:
Segmenting the text summary and the to-be-read text fragment separately according to a second preset format, and obtaining at least one third character string in the segmented text summary and at least one fourth character string in the segmented to-be-read text fragment;
Obtaining the number of fourth character strings in the to-be-read text fragment that are identical to the at least one third character string;
Determining the ratio of the number of such fourth character strings to the number of the at least one third character string as the matching result of matching the to-be-read text fragment against the text summary.
6. The method according to claim 1, characterized by further comprising:
When the matching result is less than or equal to the preset threshold, obtaining, from at least one third-party website, at least one third-party text fragment having the same title as the to-be-read text fragment;
Using the text summary to compute the similarity of each third-party text fragment among the at least one third-party text fragment;
Obtaining and outputting the third-party text fragment whose similarity is greater than the preset threshold and is the largest.
7. A user terminal, characterized by comprising:
A summary acquiring unit, configured to obtain, from a text signing station, a text summary of a standard text fragment having the same title as a to-be-read text fragment;
A fragment matching unit, configured to match the to-be-read text fragment using the text summary;
A fragment output unit, configured to output the to-be-read text fragment when the matching result is greater than a preset threshold.
8. The terminal according to claim 7, characterized by further comprising:
An information acquisition unit, configured to obtain, from the text signing station, directory information of the standard text associated with a label when a browse request carrying the label of a to-be-read text is received;
A notification unit, configured to match the directory information of the to-be-read text against the directory information of the standard text and, after the match succeeds, to notify the summary acquiring unit to obtain, from the text signing station, the text summary of the standard text fragment having the same title as the to-be-read text fragment.
9. The terminal according to claim 7, characterized in that the fragment matching unit is specifically configured to segment the text summary and the to-be-read text fragment separately according to a preset format, and to perform matching on the character strings in the segmented text summary and the character strings in the segmented to-be-read text fragment.
10. The terminal according to claim 9, characterized in that the fragment matching unit comprises:
A first acquisition subunit, configured to segment the text summary and the to-be-read text fragment separately according to a first preset format, and to obtain a first character string in the segmented text summary and a second character string in the segmented to-be-read text fragment;
An information acquisition subunit, configured to obtain character string information of the first character string and the second character string, the character string information comprising the sum of the string lengths and the Levenshtein distance between the character strings;
A first result determination subunit, configured to obtain the Levenshtein ratio of the first character string and the second character string from the sum of the string lengths and the Levenshtein distance between the character strings, and to determine the Levenshtein ratio as the matching result of matching the to-be-read text fragment against the text summary.
11. The terminal according to claim 9, characterized in that the fragment matching unit comprises:
A second acquisition subunit, configured to segment the text summary and the to-be-read text fragment separately according to a second preset format, and to obtain at least one third character string in the segmented text summary and at least one fourth character string in the segmented to-be-read text fragment;
A number acquisition subunit, configured to obtain the number of fourth character strings in the to-be-read text fragment that are identical to the at least one third character string;
A second result determination subunit, configured to determine the ratio of the number of such fourth character strings to the number of the at least one third character string as the matching result of matching the to-be-read text fragment against the text summary.
12. The terminal according to claim 7, characterized in that it further comprises:
A fragment acquiring unit, configured to, when the matching result is less than or equal to the preset threshold, obtain from at least one third-party website at least one third-party text fragment having the same title as the to-be-read text fragment;
A computing unit, configured to use the text summary to calculate the similarity of each third-party text fragment among the at least one third-party text fragment;
The fragment output unit is further configured to obtain and output the third-party text fragment whose similarity is greater than the preset threshold and is the highest.
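By way of illustration only, the fallback selection recited above can be sketched in Python as follows. Fetching fragments from third-party websites is outside the scope of this sketch, so the candidates are passed in as plain strings. The default similarity uses `difflib.SequenceMatcher` from the Python standard library purely as a stand-in; it is not the measure described in the claims, and any function returning a value between 0 and 1 can be supplied instead. The name `pick_best_candidate` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, Optional


def default_similarity(summary: str, candidate: str) -> float:
    """Stand-in similarity in [0, 1]; not the measure defined in the claims."""
    return SequenceMatcher(None, summary, candidate).ratio()


def pick_best_candidate(
    summary: str,
    candidates: Iterable[str],
    threshold: float,
    similarity: Callable[[str, str], float] = default_similarity,
) -> Optional[str]:
    """Return the candidate with the highest similarity above the threshold."""
    best_text: Optional[str] = None
    best_score = threshold  # a candidate must strictly exceed the threshold
    for candidate in candidates:
        score = similarity(summary, candidate)
        if score > best_score:
            best_text, best_score = candidate, score
    return best_text  # None when no candidate exceeds the threshold
```

When no third-party fragment exceeds the threshold the sketch returns `None`; how the terminal should behave in that case is not addressed by this claim.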
CN201410370686.3A 2014-07-30 2014-07-30 Text verification method and user terminal Active CN105320641B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410370686.3A CN105320641B (en) 2014-07-30 2014-07-30 Text verification method and user terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410370686.3A CN105320641B (en) 2014-07-30 2014-07-30 Text verification method and user terminal

Publications (2)

Publication Number Publication Date
CN105320641A (en) 2016-02-10
CN105320641B (en) 2020-04-03

Family

ID=55248046

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410370686.3A Active CN105320641B (en) 2014-07-30 2014-07-30 Text verification method and user terminal

Country Status (1)

Country Link
CN (1) CN105320641B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319978A (en) * 2018-02-01 2018-07-24 北京捷通华声科技股份有限公司 A kind of semantic similarity calculation method and device
CN110517050A (en) * 2019-08-12 2019-11-29 太平洋医疗健康管理有限公司 A kind of medical insurance, which instead cheats to exchange, encodes digging system and method
CN110532112A (en) * 2019-08-29 2019-12-03 维沃移动通信有限公司 A kind of object extraction method and mobile terminal
CN111930890A (en) * 2020-07-28 2020-11-13 深圳市梦网科技发展有限公司 Information sending method and device, terminal equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1403959A (en) * 2001-09-07 2003-03-19 联想(北京)有限公司 Content filter based on text content characteristic similarity and theme correlation degree comparison
CN101826099A (en) * 2010-02-04 2010-09-08 蓝盾信息安全技术股份有限公司 Method and system for identifying similar documents and determining document diffusance
CN102999483A (en) * 2011-09-16 2013-03-27 北京百度网讯科技有限公司 Method and device for correcting text

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刁兴春 等: "一种融合多种编辑距离的字符串相似度计算方法", 《计算机应用研究》 *
赵胜钢 等: "编辑距离算法在科研基金名称数据分析中的应用", 《数字图书馆论坛》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108319978A (en) * 2018-02-01 2018-07-24 北京捷通华声科技股份有限公司 A kind of semantic similarity calculation method and device
CN108319978B (en) * 2018-02-01 2021-01-22 北京捷通华声科技股份有限公司 Semantic similarity calculation method and device
CN110517050A (en) * 2019-08-12 2019-11-29 太平洋医疗健康管理有限公司 A kind of medical insurance, which instead cheats to exchange, encodes digging system and method
CN110532112A (en) * 2019-08-29 2019-12-03 维沃移动通信有限公司 A kind of object extraction method and mobile terminal
CN110532112B (en) * 2019-08-29 2022-10-04 维沃移动通信有限公司 Object extraction method and mobile terminal
CN111930890A (en) * 2020-07-28 2020-11-13 深圳市梦网科技发展有限公司 Information sending method and device, terminal equipment and storage medium

Also Published As

Publication number Publication date
CN105320641B (en) 2020-04-03

Similar Documents

Publication Publication Date Title
CN107220235B (en) Speech recognition error correction method and device based on artificial intelligence and storage medium
US11080322B2 (en) Search methods, servers, and systems
CN105117380A (en) Paste processing method and device
CN105320641A (en) Text checking method and user terminal
CN104156454A (en) Search term correcting method and device
CN108664471B (en) Character recognition error correction method, device, equipment and computer readable storage medium
CN110718226A (en) Speech recognition result processing method and device, electronic equipment and medium
WO2023116561A1 (en) Entity extraction method and apparatus, and electronic device and storage medium
KR20190000776A (en) Information inputting method
CN107688541A (en) File reviewing method, device, server and computer-readable recording medium
CN102955773B (en) For identifying the method and system of chemical name in Chinese document
WO2022105121A1 (en) Distillation method and apparatus applied to bert model, device, and storage medium
CN107609192A (en) The supplement searching method and device of a kind of search engine
CN107861948B (en) Label extraction method, device, equipment and medium
CN111325031B (en) Resume analysis method and device
CN113239256B (en) Method for generating website signature, method and device for identifying website
CN103747284A (en) Video pushing method and server
CN113360895A (en) Station group detection method and device and electronic equipment
US20170116174A1 (en) Electronic word identification techniques based on input context
US9946762B2 (en) Building a domain knowledge and term identity using crowd sourcing
CN111368693A (en) Identification method and device for identity card information
CN107729347B (en) Method, device and equipment for acquiring synonym label and computer readable storage medium
CN113807091B (en) Word mining method and device, electronic equipment and readable storage medium
CN115496734A (en) Quality evaluation method of video content, network training method and device
CN114882874A (en) End-to-end model training method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant