CN105320641A

CN105320641A - Text checking method and user terminal

Info

Publication number: CN105320641A
Application number: CN201410370686.3A
Authority: CN
Inventors: 芦世先
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2014-07-30
Filing date: 2014-07-30
Publication date: 2016-02-10
Anticipated expiration: 2034-07-30
Also published as: CN105320641B

Abstract

An embodiment of the invention discloses a text checking method and a user terminal. The method comprises the following steps: acquiring, from a text signing website, a text summary of a standard text fragment that has the same title as a to-be-read text fragment; matching the to-be-read text fragment against the text summary; and outputting the to-be-read text fragment when the matching result is greater than a preset threshold. The method can further improve the accuracy of text fragments and safeguard the quality of text reading.

Description

A text checking method and user terminal

Technical field

The present invention relates to the field of Internet technology, and in particular to a text checking method and a user terminal.

Background

With the continuous development and improvement of Internet technology, the network has become an indispensable part of people's lives. Through user terminals such as mobile phones and computers connected to the Internet, users can transfer files, browse network texts, and play games.

In the existing process of reading network text, a feature vector is computed from a text fragment, and whether the text fragment is correct is determined from that feature vector, for example, whether a piece of novel content actually belongs to the novel in question. Because the judgment is made only on the network text currently being read, the accuracy of the text fragment cannot be well guaranteed, which degrades the quality of text reading.

Summary of the invention

Embodiments of the present invention provide a text checking method and a user terminal that can further improve the accuracy of text fragments and safeguard the quality of text reading.

To solve the above technical problem, a first aspect of the embodiments of the present invention provides a text checking method, which may comprise:

acquiring, from a text signing website, a text summary of a standard text fragment having the same title as a to-be-read text fragment;

matching the to-be-read text fragment against the text summary;

outputting the to-be-read text fragment when the matching result is greater than a preset threshold.

A second aspect of the embodiments of the present invention provides a user terminal, which may comprise:

a summary acquiring unit, configured to acquire, from a text signing website, a text summary of a standard text fragment having the same title as a to-be-read text fragment;

a fragment matching unit, configured to match the to-be-read text fragment against the text summary;

a fragment output unit, configured to output the to-be-read text fragment when the matching result is greater than a preset threshold.

In the embodiments of the present invention, a text summary of the standard text fragment having the same title as the to-be-read text fragment is acquired from the text signing website, the to-be-read text fragment is matched against the text summary, and the to-be-read text fragment is output when the matching result is greater than the preset threshold. Matching the to-be-read text fragment against the summary of the standard text fragment acquired from the text signing website further improves the accuracy of text fragments and thereby safeguards the quality of text reading.

Brief description of the drawings

To illustrate the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic flowchart of a text checking method provided by an embodiment of the present invention;

Fig. 2 is a schematic flowchart of another text checking method provided by an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a user terminal provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of a fragment matching unit provided by an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of another fragment matching unit provided by an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of another user terminal provided by an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of yet another user terminal provided by an embodiment of the present invention.

Detailed description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.

The text checking method provided by the embodiments of the present invention can be applied to reading network novels. For example, when a network novel is being read, the text summary of the standard text fragment having the same title as the to-be-read text fragment (for example, a chapter) is acquired from the text signing website of the novel's author; the to-be-read text fragment is matched against the text summary; and the to-be-read text fragment is output when the matching result is greater than a preset threshold. Matching the to-be-read text fragment against the summary of the standard text fragment acquired from the text signing website further improves the accuracy of text fragments and thereby safeguards the quality of text reading.

The user terminal in the embodiments of the present invention may include terminal devices such as a computer, a tablet computer, a smartphone, a notebook computer, a palmtop computer, and a mobile Internet device (MID). The text signing website is the website with which the author of the text has signed a contract; the copyright of the text belongs to the text signing website, and every text on the text signing website is a standard text, that is, an accurate text. A standard text fragment is a part of the standard text, for example, a chapter, or a subheading together with the content belonging to that subheading.

The text checking method provided by the embodiments of the present invention is described in detail below with reference to Fig. 1 and Fig. 2.

Refer to Fig. 1, which is a schematic flowchart of a text checking method provided by an embodiment of the present invention. As shown in Fig. 1, the method of this embodiment comprises the following steps S101 to S103.

S101: acquire, from the text signing website, the text summary of the standard text fragment having the same title as the to-be-read text fragment.

Specifically, when a user opens a to-be-read text on the user terminal, the user terminal acquires, from the text signing website, the text summary of the standard text fragment whose title is the same as that of the to-be-read text fragment in the to-be-read text. It can be understood that the to-be-read text opened on the user terminal comes from a text aggregation website, that is, a website that extracts text from third-party websites and offers it to users to read free of charge. The standard text fragment on the text signing website must be paid for, but the text summary of that standard text fragment can be obtained. Matching the to-be-read text fragment against the standard text summary can therefore improve the accuracy of the to-be-read text fragment.

It should be noted that, before acquiring the text summary of the standard text fragment having the same title as the to-be-read text fragment, the user terminal may also, upon receiving a browse request carrying a label of the to-be-read text, acquire from the text signing website the catalog information of the standard text associated with that label. The label is preferably the text title of the to-be-read text. The user terminal may match the catalog information of the to-be-read text against the catalog information of the standard text; more specifically, it matches the titles in the catalog of the standard text against the titles in the catalog of the to-be-read text, where the titles may be the titles of the individual chapters in the catalog. This initial matching of catalog information provides a basis for the subsequent text fragment matching. Once the catalog matching passes, that is, once the catalog information of the standard text is found to be consistent with that of the to-be-read text, the user terminal performs the step of acquiring, from the text signing website, the text summary of the standard text fragment having the same title as the to-be-read text fragment.

S102: match the to-be-read text fragment against the text summary.

Specifically, the user terminal segments the text summary and the to-be-read text fragment separately according to a preset format, and then performs matching on the character strings of the segmented text summary and the character strings of the segmented to-be-read text fragment.

S103: when the matching result is greater than the preset threshold, output the to-be-read text fragment.

Specifically, a matching result greater than the preset threshold indicates that the to-be-read text fragment is highly similar to the text summary, and the user terminal may output the to-be-read text fragment.
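Steps S101 to S103 can be pictured with the following minimal Python sketch. It is an illustration under stated assumptions, not the implementation of the embodiment: the names `check_fragment`, `fetch_summary` and `match` are hypothetical, the way a summary is fetched from the text signing website is left to a caller-supplied function, and returning `None` when no summary is found is an assumption of the sketch.

```python
from typing import Callable, Optional


def check_fragment(
    title: str,
    fragment: str,
    fetch_summary: Callable[[str], Optional[str]],  # hypothetical: looks up a summary by title
    match: Callable[[str, str], float],             # hypothetical: returns a matching result
    threshold: float,
) -> Optional[str]:
    """Return the to-be-read fragment if it passes the check, otherwise None."""
    # S101: acquire the summary of the standard fragment with the same title.
    summary = fetch_summary(title)
    if summary is None:
        return None  # assumption: no summary available means nothing is output
    # S102: match the to-be-read fragment against the summary.
    result = match(summary, fragment)
    # S103: output the fragment only when the result is greater than the threshold.
    return fragment if result > threshold else None
```

Any matching function that maps a summary and a fragment to a number can be plugged in as `match`, including the two matching processes described for the second embodiment.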

Refer to Fig. 2, which is a schematic flowchart of another text checking method provided by an embodiment of the present invention. As shown in Fig. 2, the method of this embodiment comprises the following steps S201 to S208.

S201: upon receiving a browse request carrying a label of a to-be-read text, acquire from the text signing website the catalog information of the standard text associated with the label.

S202: match the catalog information of the to-be-read text against the catalog information of the standard text.

Specifically, upon receiving a browse request carrying a label of the to-be-read text, the user terminal may acquire from the text signing website the catalog information of the standard text associated with that label. The label is preferably the text title of the to-be-read text. The user terminal may match the catalog information of the to-be-read text against the catalog information of the standard text; more specifically, it matches the titles in the catalog of the standard text against the titles in the catalog of the to-be-read text, where the titles may be the titles of the individual chapters in the catalog. This initial matching of catalog information provides a basis for the subsequent text fragment matching.
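The catalog matching of S202 might look like the sketch below. The description only says that the chapter titles of the two catalogs are matched and must be consistent; treating "consistent" as "the same titles in the same order after whitespace normalization" is an assumption of this sketch, and the function name `catalogs_match` is hypothetical.

```python
from typing import Sequence


def _normalize(title: str) -> str:
    """Collapse runs of whitespace so purely cosmetic differences are ignored."""
    return " ".join(title.split())


def catalogs_match(standard_titles: Sequence[str], reading_titles: Sequence[str]) -> bool:
    """True when both catalogs list the same chapter titles in the same order."""
    standard = [_normalize(t) for t in standard_titles]
    reading = [_normalize(t) for t in reading_titles]
    return standard == reading
```

A stricter or looser notion of consistency (for example, tolerating extra chapters in one catalog) would change only the final comparison.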

It should be noted that the to-be-read text opened on the user terminal comes from a novel aggregation website, that is, a website that extracts text from third-party websites and offers it to users to read free of charge.

S203: after the catalog matching passes, acquire, from the text signing website, the text summary of the standard text fragment having the same title as the to-be-read text fragment.

Specifically, once the catalog matching passes, that is, once the catalog information of the standard text is found to be consistent with that of the to-be-read text, the user terminal acquires, from the text signing website, the text summary of the standard text fragment whose title is the same as that of the to-be-read text fragment in the to-be-read text. It can be understood that the to-be-read text opened on the user terminal comes from a text aggregation website, that is, a website that extracts text from third-party websites and offers it to users to read free of charge. The standard text fragment on the text signing website must be paid for, but the text summary of that standard text fragment can be obtained. Matching the to-be-read text fragment against the standard text summary can therefore improve the accuracy of the to-be-read text fragment.

S204: segment the text summary and the to-be-read text fragment separately according to a preset format, and perform matching on the character strings of the segmented text summary and the character strings of the segmented to-be-read text fragment.

Specifically, the user terminal may first segment the text summary and the to-be-read text fragment separately according to a preset format. For example, it may use a preset character count as the segment boundary and split the text summary and the to-be-read text fragment into several character strings, or it may split them into several character strings by part of speech. The user terminal may then perform matching on the character strings of the segmented text summary and the character strings of the segmented to-be-read text fragment.
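Segmentation by a preset character count can be sketched as follows. The sketch assumes overlapping windows that advance one character at a time, which is how the worked three-character example later in this description divides its sentences; non-overlapping chunks would be another possible reading. The function name `segment_by_length` and the handling of texts shorter than the window are assumptions of the sketch.

```python
from typing import List


def segment_by_length(text: str, n: int) -> List[str]:
    """Split text into overlapping character strings of length n."""
    if n <= 0:
        raise ValueError("n must be positive")
    if len(text) < n:
        # Assumption: a text shorter than the window yields itself as one segment.
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]
```

Segmentation by part of speech would need a word segmenter for the language of the text and is not shown here.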

Preferably, the matching may proceed as follows. The user terminal segments the text summary and the to-be-read text fragment separately according to a first preset format, and obtains a first character string from the segmented text summary and a second character string from the segmented to-be-read text fragment. It can be understood that the first character string may comprise at least one character string, and so may the second character string. The user terminal obtains character string information for the first and second character strings, comprising the sum of the string lengths and the Levenshtein distance between the strings. The Levenshtein distance is the minimum number of single-character edits (insertions, deletions, substitutions, and so on) required to change one character string into the other. From the sum of the string lengths and the Levenshtein distance, the user terminal obtains the Levenshtein ratio of the first and second character strings and takes this ratio as the matching result between the to-be-read text fragment and the text summary. Taking a first preset format that segments by part of speech as an example, suppose the first character string obtained is "sun" and the second character string is "heronsbill" (lengths are counted in characters of the original Chinese strings, which the English renderings do not preserve). The sum of the string lengths is then 5, and the Levenshtein distance between the first and second character strings is 2 (a deletion and a substitution). The Levenshtein ratio is given by (Sum - Idist)/Sum, where Sum is the sum of the string lengths and Idist is the Levenshtein distance between the strings, so the Levenshtein ratio of the first and second character strings is (5 - 2)/5 = 0.6. The user terminal computes the Levenshtein ratio in this way for all character strings in the to-be-read text fragment and the text summary, and takes the mean of the resulting ratios as the matching result between the to-be-read text fragment and the text summary. Preferably, when the Levenshtein ratio of the first and second character strings is computed, the two strings may also each be converted to pinyin, and the Levenshtein ratio between the pinyin of the first character string and the pinyin of the second character string computed in the same way. The user terminal then takes the mean of the Levenshtein ratio of the two character strings and the Levenshtein ratio of their pinyin as the final Levenshtein ratio between the first and second character strings. Matching on pinyin as well prevents homophones from interfering with the matching and further improves its accuracy.
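The Levenshtein ratio described above can be sketched in Python as follows. The distance function is the textbook dynamic-programming algorithm with unit cost for insertion, deletion and substitution; the ratio follows the formula (Sum - Idist)/Sum from the description. The function names, the treatment of two empty strings, and the caller-supplied `to_pinyin` converter are assumptions of the sketch (a real converter would come from a pinyin library, which is not shown).

```python
from typing import Callable, Optional


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """(Sum - Idist) / Sum, where Sum is the sum of the two string lengths."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # assumption: two empty strings count as a perfect match
    return (total - levenshtein_distance(a, b)) / total


def combined_ratio(a: str, b: str,
                   to_pinyin: Optional[Callable[[str], str]] = None) -> float:
    """Mean of the character-level ratio and the pinyin-level ratio."""
    direct = levenshtein_ratio(a, b)
    if to_pinyin is None:
        return direct
    phonetic = levenshtein_ratio(to_pinyin(a), to_pinyin(b))
    return (direct + phonetic) / 2
```

For two strings of lengths 2 and 3 at distance 2, such as the placeholder pair "ab" and "acd", `levenshtein_ratio` gives (5 - 2)/5 = 0.6, the same arithmetic as the example in the description.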

It can be understood that the Levenshtein ratio approach suits cases where the string lengths do not differ greatly. A text aggregation website, for example, usually adds advertising phrases before the body text, whereas the text summary contains none. These advertising phrases would then be matched needlessly, which takes longer. For this situation, the embodiment of the present invention provides another matching process: the user terminal segments the text summary and the to-be-read text fragment separately according to a second preset format, and obtains at least one third character string from the segmented text summary and at least one fourth character string from the segmented to-be-read text fragment. It then counts the fourth character strings in the to-be-read text fragment that are identical to one of the at least one third character string, and takes the ratio of that count to the number of third character strings as the matching result between the to-be-read text fragment and the text summary. Taking a second preset format that segments by a preset character count as an example, suppose a sentence in the text summary is "graduation come back home after land here", a sentence in the to-be-read text fragment is "finish page come back home after land here know out", and the preset character count is three. The sentence in the text summary is then divided into six third character strings, "graduates back, industry is come back home, after coming back home, after state, after land, land here", and the sentence in the to-be-read text fragment is divided into eight fourth character strings, "complete page returns, page is come back home, after coming back home, after state, after land, land here, here land knows, here knowledge is opened". Four third character strings in the sentence of the text summary are identical to four character strings in the sentence of the to-be-read text fragment, so the matching result for these two sentences is 4/6 = 0.67. The user terminal computes this count ratio in the same way for all character strings in the to-be-read text fragment and the text summary, and takes the mean of the resulting count ratios as the matching result between the to-be-read text fragment and the text summary.
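The count-ratio matching can be sketched as below, using overlapping character windows as in the three-character example above. The function names are hypothetical, and the placeholder strings in the usage note stand in for the sentences of the example (eight and ten characters, differing in the second character and in a two-character tail); they are not the original text. Returning 0.0 when the summary yields no strings is an assumption of the sketch.

```python
from typing import List


def char_ngrams(text: str, n: int) -> List[str]:
    """Overlapping character strings of length n, advancing one character at a time."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_match_ratio(summary: str, fragment: str, n: int = 3) -> float:
    """Count of fragment strings found in the summary, over the number of summary strings."""
    third_strings = char_ngrams(summary, n)    # from the text summary
    if not third_strings:
        return 0.0  # assumption: nothing to compare against means no match
    fourth_strings = char_ngrams(fragment, n)  # from the to-be-read fragment
    known = set(third_strings)
    matched = sum(1 for s in fourth_strings if s in known)
    return matched / len(third_strings)
```

With `summary = "ABCDEFGH"` and `fragment = "AXCDEFGHYZ"`, the summary yields six strings, the fragment eight, and four of the fragment's strings occur in the summary, so the ratio is 4/6. Because the count is taken over the fragment's strings, a fragment that repeats a matching string several times can push the ratio above 1; capping it is a design choice the description does not address.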

It can be understood that the first preset format and the second preset format may be the same preset format; the different names serve only to distinguish the two matching processes. Likewise, the first and second character strings may be the same character strings as the third and fourth character strings.

S205: when the matching result is greater than the preset threshold, output the to-be-read text fragment.

S206: when the matching result is less than or equal to the preset threshold, acquire, from at least one third-party website, at least one third-party text fragment having the same title as the to-be-read text fragment.

Specifically, a matching result less than or equal to the preset threshold indicates that the similarity between the to-be-read text fragment and the text summary is not high. The user terminal may then acquire, from at least one third-party website, at least one third-party text fragment having the same title as the to-be-read text fragment. It can be understood that a third-party website may be any text aggregation website other than the one currently in use.

S207: use the text summary to compute the similarity of each third-party text fragment among the at least one third-party text fragment.

Specifically, the user terminal may acquire third-party text fragments from a preset number of third-party websites, for example, from 10 third-party websites. The user terminal uses the text summary to compute the similarity of each of these third-party text fragments. The computation follows the matching process described above and is not repeated here.

S208: acquire and output the third-party text fragment whose similarity is greater than the preset threshold and is the highest.

Specifically, from the third-party text fragments of the preset number of websites, the user terminal may acquire and output the one whose similarity is greater than the preset threshold and is the highest.
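The fallback of S206 to S208 amounts to picking, among the third-party fragments, the one with the highest similarity that also exceeds the preset threshold. A minimal sketch, assuming the similarity function is supplied by the caller and that ties are resolved in favor of the first candidate (neither detail is specified in the description; `best_third_party_fragment` is a hypothetical name):

```python
from typing import Callable, Iterable, Optional


def best_third_party_fragment(
    summary: str,
    candidates: Iterable[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> Optional[str]:
    """Return the candidate with the highest similarity above the threshold, if any."""
    best: Optional[str] = None
    best_score = threshold  # a candidate must be strictly greater than the threshold
    for candidate in candidates:
        score = similarity(summary, candidate)  # S207: similarity against the summary
        if score > best_score:
            best, best_score = candidate, score
    return best  # S208: None means no candidate exceeded the threshold
```

The description does not say what is output when no third-party fragment exceeds the threshold; the sketch returns `None` in that case.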

In the embodiments of the present invention, a text summary of the standard text fragment having the same title as the to-be-read text fragment is acquired from the text signing website, the to-be-read text fragment is matched against the text summary, and the to-be-read text fragment is output when the matching result is greater than the preset threshold. Matching the catalog information of the texts provides a basis for the subsequent text fragment matching. Matching the to-be-read text fragment against the summary of the standard text fragment acquired from the text signing website improves the accuracy of text fragments. Matching text fragments by Levenshtein ratio and by count ratio improves that accuracy further. Judging the matching result against the preset threshold and outputting the most accurate text fragment according to the judgment safeguards the quality of text reading.

The user terminal provided by the embodiments of the present invention is described in detail below with reference to Fig. 3 to Fig. 6. It should be noted that the user terminal shown in Fig. 3 to Fig. 6 is used to perform the methods of the embodiments shown in Fig. 1 and Fig. 2. For ease of description, only the parts relevant to the embodiments of the present invention are shown; for technical details not disclosed here, refer to the embodiments shown in Fig. 1 and Fig. 2.

Refer to Fig. 3, which is a schematic structural diagram of a user terminal provided by an embodiment of the present invention. As shown in Fig. 3, the user terminal 1 of this embodiment may comprise a summary acquiring unit 11, a fragment matching unit 12, and a fragment output unit 13.

The summary acquiring unit 11 is configured to acquire, from a text signing website, a text summary of a standard text fragment having the same title as a to-be-read text fragment.

In a specific implementation, when a user opens a to-be-read text on the user terminal 1, the summary acquiring unit 11 acquires, from the text signing website, the text summary of the standard text fragment whose title is the same as that of the to-be-read text fragment in the to-be-read text. It can be understood that the to-be-read text opened on the user terminal comes from a text aggregation website, that is, a website that extracts text from third-party websites and offers it to users to read free of charge. The standard text fragment on the text signing website must be paid for, but the text summary of that standard text fragment can be obtained. Matching the to-be-read text fragment against the standard text summary can therefore improve the accuracy of the to-be-read text fragment.

It should be noted that, before the summary acquiring unit 11 acquires the text summary of the standard text fragment having the same title as the to-be-read text fragment, the user terminal 1 may also, upon receiving a browse request carrying a label of the to-be-read text, acquire from the text signing website the catalog information of the standard text associated with that label. The label is preferably the text title of the to-be-read text. The user terminal 1 may match the catalog information of the to-be-read text against the catalog information of the standard text; more specifically, it matches the titles in the catalog of the standard text against the titles in the catalog of the to-be-read text, where the titles may be the titles of the individual chapters in the catalog. This initial matching of catalog information provides a basis for the subsequent text fragment matching. Once the catalog matching passes, that is, once the catalog information of the standard text is found to be consistent with that of the to-be-read text, the summary acquiring unit 11 performs the step of acquiring, from the text signing website, the text summary of the standard text fragment having the same title as the to-be-read text fragment.

The fragment matching unit 12 is configured to match the to-be-read text fragment against the text summary.

In a specific implementation, the fragment matching unit 12 segments the text summary and the to-be-read text fragment separately according to a preset format, and then performs matching on the character strings of the segmented text summary and the character strings of the segmented to-be-read text fragment.

Specifically, refer also to Fig. 4, which is a schematic structural diagram of a fragment matching unit provided by an embodiment of the present invention. As shown in Fig. 4, the fragment matching unit 12 may comprise:

a first acquiring subunit 121, configured to segment the text summary and the to-be-read text fragment separately according to a first preset format, and to obtain a first character string from the segmented text summary and a second character string from the segmented to-be-read text fragment;

an information acquiring subunit 122, configured to obtain character string information for the first and second character strings, the character string information comprising the sum of the string lengths and the Levenshtein distance between the strings;

a first result determining subunit 123, configured to obtain the Levenshtein ratio of the first and second character strings from the sum of the string lengths and the Levenshtein distance between the strings, and to take the Levenshtein ratio as the matching result between the to-be-read text fragment and the text summary.

In a specific implementation, the fragment matching unit 12 may first segment the text summary and the to-be-read text fragment separately according to a preset format. For example, it may use a preset character count as the segment boundary and split the text summary and the to-be-read text fragment into several character strings, or it may split them into several character strings by part of speech. The fragment matching unit 12 may then perform matching on the character strings of the segmented text summary and the character strings of the segmented to-be-read text fragment.

Preferably, the matching process may proceed as follows. The first obtaining subunit 121 segments the text summary and the to-be-read text fragment according to a first preset format, and obtains a first character string from the segmented text summary and a second character string from the segmented to-be-read text fragment. It can be understood that the first character string may comprise at least one character string, and so may the second character string. The information obtaining subunit 122 obtains the character string information of the first character string and the second character string, the character string information comprising the sum of the string lengths and the Levenshtein distance between the strings. The Levenshtein distance is the minimum number of single-character edits (including insertion, deletion and replacement) required to change one character string into the other. The first result determination subunit 123 obtains the Levenshtein ratio of the first character string and the second character string from the sum of the string lengths and the Levenshtein distance, and determines the Levenshtein ratio as the matching result between the to-be-read text fragment and the text summary.

Taking part-of-speech segmentation as the first preset format, suppose the first character string obtained is "sun" and the second character string is "heronsbill" (as translated; the lengths below refer to the original Chinese strings). The sum of the string lengths is then 5, and the Levenshtein distance between the first character string and the second character string is 2 (comprising a deletion and a replacement). The Levenshtein ratio is given by the formula (Sum - Idist)/Sum, where Sum denotes the sum of the string lengths and Idist denotes the Levenshtein distance between the strings, so the Levenshtein ratio of the first character string and the second character string is (5 - 2)/5 = 0.6. The fragment matching unit 12 calculates the Levenshtein ratio in this way for all character strings in the to-be-read text fragment and the text summary, and determines the mean of the resulting Levenshtein ratios as the matching result between the to-be-read text fragment and the text summary.

Preferably, when calculating the Levenshtein ratio of the first character string and the second character string, the first character string and the second character string may also each be converted to pinyin, and the Levenshtein ratio between the pinyin of the first character string and the pinyin of the second character string calculated in the manner described above. The fragment matching unit 12 then takes the mean of the Levenshtein ratio of the two character strings and the Levenshtein ratio of their pinyin as the final Levenshtein ratio between the first character string and the second character string. Matching on pinyin as well prevents homophones from interfering with the matching process and further improves the accuracy of the matching.
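The Levenshtein ratio described above can be sketched as follows. This is only an illustrative sketch, not the patented implementation: the function names are hypothetical, unit cost is assumed for each insertion, deletion and replacement, and the way first and second character strings are paired (a list of tuples) is an assumption, since the description does not spell it out.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and replacements."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # replace (or keep when equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """(Sum - distance) / Sum, where Sum is the sum of the two string lengths."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # assumption: two empty strings count as identical
    return (total - levenshtein_distance(a, b)) / total


def mean_ratio(pairs) -> float:
    """Mean Levenshtein ratio over (first string, second string) pairs."""
    ratios = [levenshtein_ratio(a, b) for a, b in pairs]
    return sum(ratios) / len(ratios) if ratios else 0.0
```

For two strings whose lengths sum to 5 and whose distance is 2, such as "ab" and "acd", `levenshtein_ratio` gives (5 - 2)/5 = 0.6, matching the arithmetic of the example in the description. The pinyin variant would apply the same function to romanized strings and average the two ratios; the romanization step is omitted here because it depends on an external library.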

It can be understood that the Levenshtein ratio approach is suited to cases where the string lengths do not differ greatly. A text aggregation site, for example, usually adds some advertising phrases before the text, whereas the text summary contains no such phrases; the matching process then has to match these advertising phrases as well, which takes longer. For this case, the embodiment of the present invention also provides a schematic structural diagram of another fragment matching unit. As shown in Fig. 5, the fragment matching unit 12 may comprise:

A second obtaining subunit 124, configured to segment the text summary and the to-be-read text fragment respectively according to a second preset format, and to obtain at least one third character string from the segmented text summary and at least one fourth character string from the segmented to-be-read text fragment;

A number obtaining subunit 125, configured to obtain the number of fourth character strings in the to-be-read text fragment that are identical to the at least one third character string;

A second result determination subunit 126, configured to determine the ratio of the number of such fourth character strings to the number of the at least one third character string as the matching result between the to-be-read text fragment and the text summary;

In a specific implementation, the matching process may also proceed as follows. The second obtaining subunit 124 segments the text summary and the to-be-read text fragment respectively according to the second preset format, and obtains at least one third character string from the segmented text summary and at least one fourth character string from the segmented to-be-read text fragment. The number obtaining subunit 125 obtains the number of fourth character strings in the to-be-read text fragment that are identical to the at least one third character string. The second result determination subunit 126 determines the ratio of the number of such fourth character strings to the number of the at least one third character string as the matching result between the to-be-read text fragment and the text summary.

Taking segmentation by a preset number of characters as the second preset format, suppose a statement in the text summary is "graduation come back home after land here", a statement in the to-be-read text fragment is "finish page come back home after land here know out", and the preset number of characters is three (the statements are rendered here in translation; each character string below is a run of three characters in the original Chinese). The statement in the text summary can then be divided into six third character strings, "graduates back", "industry is come back home", "after coming back home", "after state", "after land" and "land here", and the statement in the to-be-read text fragment can be divided into eight fourth character strings, "complete page returns", "page is come back home", "after coming back home", "after state", "after land", "land here", "here land knows" and "here knowledge is opened". Four third character strings of the statement in the text summary are identical to four fourth character strings of the statement in the to-be-read text fragment, so the matching result for these two statements is 4/6 = 0.67. The fragment matching unit 12 calculates the number ratio in this way for all character strings in the to-be-read text fragment and the text summary, and determines the mean of the resulting number ratios as the matching result between the to-be-read text fragment and the text summary.
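The number-ratio matching above can be sketched as follows. The function names are hypothetical, and two points are assumptions rather than statements from the description: segmentation is taken to be a sliding window of overlapping three-character strings (consistent with an eight-character statement yielding six strings), and every fourth character string that equals some third character string is counted once.

```python
def char_ngrams(text: str, n: int = 3) -> list[str]:
    """All overlapping runs of n characters; empty when text is shorter than n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def ngram_match_ratio(summary: str, fragment: str, n: int = 3) -> float:
    """Matching fragment strings divided by the number of summary strings."""
    summary_grams = char_ngrams(summary, n)       # "third character strings"
    if not summary_grams:
        return 0.0  # assumption: nothing to match against scores zero
    summary_set = set(summary_grams)
    # Count "fourth character strings" identical to some third character string.
    matched = sum(1 for gram in char_ngrams(fragment, n) if gram in summary_set)
    return matched / len(summary_grams)
```

With the stand-in strings `"abcdefgh"` (six three-character strings) and `"xycdefghij"` (eight), four strings coincide, so `ngram_match_ratio` returns 4/6, mirroring the 0.67 of the example. Because leading advertising text in the fragment only adds non-matching strings, it leaves the numerator and denominator unchanged, which is the situation this variant is meant to handle. Note that under this counting rule repeated strings in the fragment could push the ratio above 1; a real implementation would need to decide how to treat duplicates.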

It can be understood that the first preset format and the second preset format may be the same preset format; they are named separately only to distinguish the different matching processes. Likewise, the first and second character strings may be the same character strings as the third and fourth character strings. In addition, the fragment matching unit 12 may comprise the first obtaining subunit 121, the information obtaining subunit 122 and the first result determination subunit 123 together with the second obtaining subunit 124, the number obtaining subunit 125 and the second result determination subunit 126, so as to handle the matching process in different situations.

A fragment output unit 13, configured to output the to-be-read text fragment when the matching result is greater than a preset threshold;

Specifically, when the matching result obtained from the matching process is greater than the preset threshold, the similarity between the to-be-read text fragment and the text summary is high, and the fragment output unit 13 may output the to-be-read text fragment.

In the embodiment of the present invention, the text summary of a standard text fragment having the same title as the to-be-read text fragment is obtained from the text signing site, the to-be-read text fragment is matched against the text summary, and the to-be-read text fragment is output when the matching result is greater than the preset threshold. Matching the to-be-read text fragment against the text summary of the standard text fragment obtained from the text signing site improves the accuracy of the text fragment; matching by both the Levenshtein ratio and the number ratio further improves that accuracy and thereby safeguards the quality of text reading.

Referring to Fig. 6, a schematic structural diagram of another user terminal is provided by an embodiment of the present invention. As shown in Fig. 6, the user terminal 1 of the embodiment of the present invention may comprise: a summary acquiring unit 11, a fragment matching unit 12, a fragment output unit 13, an information acquisition unit 14, a notification unit 15, a fragment acquiring unit 16 and a computing unit 17. For the structure of the summary acquiring unit 11 and the fragment matching unit 12 and part of the structure of the fragment output unit 13, reference may be made to the detailed description of the embodiment shown in Fig. 3, which is not repeated here.

An information acquisition unit 14, configured to obtain, from the text signing site, the catalog information of the standard text associated with a tag when a browse request carrying the tag of a to-be-read text is received;

A notification unit 15, configured to match the catalog information of the to-be-read text against the catalog information of the standard text and, after the match passes, to notify the summary acquiring unit 11 to obtain from the text signing site the text summary of the standard text fragment having the same title as the to-be-read text fragment;

In a specific implementation, when the user terminal 1 receives a browse request carrying the tag of a to-be-read text, the information acquisition unit 14 may obtain from the text signing site the catalog information of the standard text associated with the tag; the tag is preferably the text title of the to-be-read text. The notification unit 15 may match the catalog information of the to-be-read text against the catalog information of the standard text. More specifically, the notification unit 15 matches the titles in the catalog of the standard text against the titles in the catalog of the to-be-read text, where the titles may be the titles of the individual chapters in the catalog. This preliminary matching of catalog information provides a basis for the subsequent matching of text fragments.

It should be noted that the to-be-read text opened by the user terminal 1 comes from a novel aggregation site, that is, a site that extracts text from third-party sites in order to offer users free text reading. After the catalog information match passes, namely when the catalog information of the standard text is found to be consistent with the catalog information of the to-be-read text, the notification unit 15 notifies the summary acquiring unit 11 to obtain from the text signing site the text summary of the standard text fragment having the same title as the to-be-read text fragment.
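The preliminary catalog check can be illustrated with a short sketch. The description only states that the match passes when the two catalogs are consistent; treating "consistent" as the same chapter titles in the same order, and ignoring whitespace differences, are assumptions made here for illustration, as is the function name.

```python
def catalogs_match(standard_titles, candidate_titles) -> bool:
    """True when both catalogs list the same chapter titles in the same order."""
    def normalize(title: str) -> str:
        # Assumption: whitespace differences between sites are not significant.
        return "".join(title.split())

    return ([normalize(t) for t in standard_titles]
            == [normalize(t) for t in candidate_titles])
```

Only when this check passes would the terminal go on to fetch the text summary for the chapter being read.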

A fragment acquiring unit 16, configured to obtain, from at least one third-party site, at least one third-party text fragment having the same title as the to-be-read text fragment when the matching result is less than or equal to the preset threshold;

In a specific implementation, when the matching result is less than or equal to the preset threshold, the similarity between the to-be-read text fragment and the text summary is not high, and the fragment acquiring unit 16 may obtain, from at least one third-party site, at least one third-party text fragment having the same title as the to-be-read text fragment. It can be understood that the third-party site may be any text aggregation site other than the one currently in use.

A computing unit 17, configured to use the text summary to calculate the similarity of each third-party text fragment in the at least one third-party text fragment;

In a specific implementation, the computing unit 17 may obtain the third-party text fragments of a preset number of third-party sites, for example the third-party text fragments of 10 third-party sites. The computing unit 17 uses the text summary to calculate the similarity of each of the preset number of third-party text fragments; for the calculation itself, reference may be made to the matching process of the embodiment shown in Fig. 3, which is not repeated here.

The fragment output unit 13 is further configured to obtain and output the third-party text fragment whose similarity is greater than the preset threshold and is the largest;

In a specific implementation, the fragment output unit 13 may obtain and output, from among the preset number of third-party text fragments, the third-party text fragment whose similarity is greater than the preset threshold and is the largest.
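The threshold check and the fallback to third-party fragments can be sketched together. The function and parameter names are hypothetical; `similarity` stands for whichever matching function is in use (Levenshtein ratio or number ratio), and returning `None` when no fragment exceeds the threshold is an assumption, since the description does not say what happens in that case.

```python
from typing import Callable, Iterable, Optional


def select_fragment(
    summary: str,
    candidate: str,
    third_party_fragments: Iterable[str],
    threshold: float,
    similarity: Callable[[str, str], float],
) -> Optional[str]:
    """Return the fragment to output, or None when nothing exceeds the threshold."""
    # Output the to-be-read fragment directly when it matches well enough.
    if similarity(summary, candidate) > threshold:
        return candidate

    # Otherwise pick the third-party fragment with the largest similarity,
    # provided that similarity is itself greater than the threshold.
    best, best_score = None, threshold
    for fragment in third_party_fragments:
        score = similarity(summary, fragment)
        if score > best_score:
            best, best_score = fragment, score
    return best
```

Initializing `best_score` to the threshold means a third-party fragment is selected only if it is strictly above the threshold, which matches the "greater than the preset threshold and the largest" wording.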

Referring to Fig. 7, a schematic structural diagram of yet another user terminal is provided by an embodiment of the present invention. As shown in Fig. 7, the user terminal 1000 may comprise: at least one processor 1001, such as a CPU, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may comprise a display screen (Display) and a keyboard (Keyboard), and may optionally also comprise a standard wired interface and wireless interface. The network interface 1004 may optionally comprise a standard wired interface and wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, for example at least one disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the processor 1001. As shown in Fig. 7, the memory 1005, as a computer storage medium, may contain an operating system, a network communication module, a user interface module and a text checking application.

In the user terminal 1000 shown in Fig. 7, the network interface 1004 is mainly used to connect to the text signing site and third-party sites and to carry out data communication with the user terminal; the processor 1001 may be used to call the text checking application stored in the memory 1005 and specifically perform the following steps:

In one embodiment, before obtaining from the text signing site the text summary of the standard text fragment having the same title as the to-be-read text fragment, the processor 1001 also performs the following steps:

When a browse request carrying the tag of a to-be-read text is received, obtaining from the text signing site the catalog information of the standard text associated with the tag;

Matching the catalog information of the to-be-read text against the catalog information of the standard text and, after the match passes, obtaining from the text signing site the text summary of the standard text fragment having the same title as the to-be-read text fragment.

In one embodiment, when matching the to-be-read text fragment against the text summary, the processor 1001 specifically performs the following steps:

Segmenting the text summary and the to-be-read text fragment respectively according to a preset format, and performing matching between the character strings of the segmented text summary and the character strings of the segmented to-be-read text fragment.

In one embodiment, when segmenting the text summary and the to-be-read text fragment respectively according to a preset format and performing matching between the character strings of the segmented text summary and the character strings of the segmented to-be-read text fragment, the processor 1001 specifically performs the following steps:

Segmenting the text summary and the to-be-read text fragment respectively according to a first preset format, and obtaining a first character string from the segmented text summary and a second character string from the segmented to-be-read text fragment;

Obtaining the character string information of the first character string and the second character string, the character string information comprising the sum of the string lengths and the Levenshtein distance between the strings;

Obtaining the Levenshtein ratio of the first character string and the second character string from the sum of the string lengths and the Levenshtein distance, and determining the Levenshtein ratio as the matching result between the to-be-read text fragment and the text summary.

Segmenting the text summary and the to-be-read text fragment respectively according to a second preset format, and obtaining at least one third character string from the segmented text summary and at least one fourth character string from the segmented to-be-read text fragment;

Obtaining the number of fourth character strings in the to-be-read text fragment that are identical to the at least one third character string;

Determining the ratio of the number of such fourth character strings to the number of the at least one third character string as the matching result between the to-be-read text fragment and the text summary.

In one embodiment, the processor 1001 also performs the following steps:

When the matching result is less than or equal to the preset threshold, obtaining from at least one third-party site at least one third-party text fragment having the same title as the to-be-read text fragment;

Using the text summary to calculate the similarity of each third-party text fragment in the at least one third-party text fragment;

Obtaining and outputting the third-party text fragment whose similarity is greater than the preset threshold and is the largest.

A person of ordinary skill in the art will understand that all or part of the flows of the above method embodiments may be carried out by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.

The above discloses only preferred embodiments of the present invention, which certainly cannot be used to limit the scope of rights of the present invention; equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope covered by the present invention.

Claims

1. A text checking method, characterized by comprising:

2. The method according to claim 1, characterized in that, before obtaining from the text signing site the text summary of the standard text fragment having the same title as the to-be-read text fragment, the method further comprises:

3. The method according to claim 1, characterized in that matching the to-be-read text fragment against the text summary comprises:

4. The method according to claim 3, characterized in that segmenting the text summary and the to-be-read text fragment respectively according to a preset format, and performing matching between the character strings of the segmented text summary and the character strings of the segmented to-be-read text fragment, comprises:

5. The method according to claim 3, characterized in that segmenting the text summary and the to-be-read text fragment respectively according to a preset format, and performing matching between the character strings of the segmented text summary and the character strings of the segmented to-be-read text fragment, comprises:

6. The method according to claim 1, characterized by further comprising:

7. A user terminal, characterized by comprising:

8. The terminal according to claim 7, characterized by further comprising:

An information acquisition unit, configured to obtain, from the text signing site, the catalog information of the standard text associated with a tag when a browse request carrying the tag of a to-be-read text is received;

A notification unit, configured to match the catalog information of the to-be-read text against the catalog information of the standard text and, after the match passes, to notify the summary acquiring unit to obtain from the text signing site the text summary of the standard text fragment having the same title as the to-be-read text fragment.

9. The terminal according to claim 7, characterized in that the fragment matching unit is specifically configured to segment the text summary and the to-be-read text fragment respectively according to a preset format, and to perform matching between the character strings of the segmented text summary and the character strings of the segmented to-be-read text fragment.

10. The terminal according to claim 9, characterized in that the fragment matching unit comprises:

A first obtaining subunit, configured to segment the text summary and the to-be-read text fragment respectively according to a first preset format, and to obtain a first character string from the segmented text summary and a second character string from the segmented to-be-read text fragment;

An information obtaining subunit, configured to obtain the character string information of the first character string and the second character string, the character string information comprising the sum of the string lengths and the Levenshtein distance between the strings;

A first result determination subunit, configured to obtain the Levenshtein ratio of the first character string and the second character string from the sum of the string lengths and the Levenshtein distance, and to determine the Levenshtein ratio as the matching result between the to-be-read text fragment and the text summary.

11. The terminal according to claim 9, characterized in that the fragment matching unit comprises:

A second obtaining subunit, configured to segment the text summary and the to-be-read text fragment respectively according to a second preset format, and to obtain at least one third character string from the segmented text summary and at least one fourth character string from the segmented to-be-read text fragment;

A number obtaining subunit, configured to obtain the number of fourth character strings in the to-be-read text fragment that are identical to the at least one third character string;

A second result determination subunit, configured to determine the ratio of the number of such fourth character strings to the number of the at least one third character string as the matching result between the to-be-read text fragment and the text summary.

12. The terminal according to claim 7, characterized by further comprising:

A fragment acquiring unit, configured to obtain, from at least one third-party site, at least one third-party text fragment having the same title as the to-be-read text fragment when the matching result is less than or equal to the preset threshold;

A computing unit, configured to use the text summary to calculate the similarity of each third-party text fragment in the at least one third-party text fragment;

The fragment output unit is further configured to obtain and output the third-party text fragment whose similarity is greater than the preset threshold and is the largest.