CN105320641B - Text verification method and user terminal

Info

Publication number: CN105320641B
Application number: CN201410370686.3A
Authority: CN
Inventors: 芦世先
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2014-07-30
Filing date: 2014-07-30
Publication date: 2020-04-03
Anticipated expiration: 2034-07-30
Also published as: CN105320641A

Abstract

The embodiment of the invention discloses a text verification method and a user terminal, wherein the method comprises the following steps: acquiring a text abstract of a standard text segment with the same title as a text segment to be read in a text signing site; matching the text segments to be read by adopting the text abstract; and when the matching result is larger than a preset threshold value, outputting the text segment to be read. The accuracy of the text fragment can be further improved, and the text reading quality is guaranteed.

Description

Text verification method and user terminal

Technical Field

The invention relates to the technical field of internet, in particular to a text verification method and a user terminal.

Background

With the continuous development and improvement of internet technology, networks have become an indispensable part of people's life, and users can connect with networks through user terminals such as mobile phones and computers to perform file transmission, browse network texts, play games and the like.

In existing approaches to reading web texts, a feature vector is calculated from a text segment, and whether the text segment is correct is determined from that feature vector, for example, judging whether a piece of content actually belongs to a given novel. Because this judgment relies only on the network text currently being read, the accuracy of the text segment cannot be well ensured, and the quality of text reading suffers.

Disclosure of Invention

The embodiment of the invention provides a text verification method and a user terminal, which can further improve the accuracy of text fragments and ensure the text reading quality.

In order to solve the foregoing technical problem, a first aspect of an embodiment of the present invention provides a text verification method, which may include:

acquiring a text abstract of a standard text segment with the same title as a text segment to be read in a text signing site;

matching the text segments to be read by adopting the text abstract;

and when the matching result is larger than a preset threshold value, outputting the text segment to be read.

A second aspect of an embodiment of the present invention provides a user terminal, which may include:

the abstract acquiring unit is used for acquiring a text abstract of a standard text segment with the same title as that of a text segment to be read in a text signing site;

the segment matching unit is used for matching the text segment to be read by adopting the text abstract;

and the segment output unit is used for outputting the text segment to be read when the matching result is greater than a preset threshold value.

In the embodiment of the invention, the text abstract of the standard text segment with the same title as the text segment to be read is acquired from the text signing site, the text abstract is adopted to carry out matching processing on the text segment to be read, and the text segment to be read is output when the matching result is greater than the preset threshold value. The text abstract of the standard text segment acquired by the text signing site is adopted to match the text segment to be read, so that the accuracy of the text segment is further improved, and the quality of text reading is further ensured.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic flowchart of a text verification method according to an embodiment of the present invention;

fig. 2 is a schematic flowchart of another text verification method according to an embodiment of the present invention;

fig. 3 is a schematic structural diagram of a user terminal according to an embodiment of the present invention;

fig. 4 is a schematic structural diagram of a segment matching unit according to an embodiment of the present invention;

fig. 5 is a schematic structural diagram of another segment matching unit provided in the embodiment of the present invention;

fig. 6 is a schematic structural diagram of another user terminal according to an embodiment of the present invention;

fig. 7 is a schematic structural diagram of another user terminal according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

The text verification method provided by the embodiment of the invention can be applied to scenarios such as reading network novels. For example, when the text of a network novel is read, a text abstract of a standard text segment with the same title as the text segment to be read (such as a chapter) is acquired from the text signing site of the novel's author; the text segment to be read is matched using the text abstract; and when the matching result is larger than a preset threshold value, the text segment to be read is output. Because the text segment to be read is matched against the text abstract of the standard text segment acquired from the text signing site, the accuracy of the text segment is further improved, and the quality of text reading is further ensured.

The user terminal related to the embodiment of the invention can comprise: terminal devices such as a computer, a tablet computer, a smart phone, a notebook computer, a palm computer and a Mobile Internet Device (MID); the text signing site is a site signed by a text author, the copyright of the text is owned by the text signing site, and the texts in the text signing site are all standard texts, namely accurate and error-free texts; the standard text segment is a part of content belonging to the standard text, for example: chapter, subtitle, and content attributed to the subtitle, and the like.

The text verification method provided by the embodiment of the invention will be described in detail below with reference to fig. 1 and 2.

Referring to fig. 1, a flow chart of a text verification method according to an embodiment of the present invention is schematically shown. As shown in fig. 1, the method of the embodiment of the present invention includes the following steps S101 to S103.

S101, acquiring a text abstract of a standard text segment with the same title as a text segment to be read in a text signing site;

specifically, when a user opens a text to be read through a user terminal, the user terminal obtains, from a text signing site, a text abstract of a standard text segment with the same title as the text segment to be read in the text to be read. It can be understood that the text to be read opened by the user terminal comes from a text aggregation site, that is, a site that offers users free text reading by extracting texts from third-party sites. The standard text segment in the text signing site must be paid for, but the text abstract of the standard text segment can be obtained, so the text segment to be read is matched against the text abstract of the standard text segment, which can improve the accuracy of the text segment to be read.

It should be noted that, before the user terminal obtains the text abstract of the standard text segment with the same title as the text segment to be read in the text signing site, the user terminal may, on receiving a browsing request carrying a tag of a text to be read, first obtain directory information of a standard text associated with the tag in the text signing site. The tag is preferably the text name of the text to be read. The user terminal may match the directory information of the text to be read against the directory information of the standard text; more specifically, the user terminal matches the titles in the directory of the standard text with the titles in the directory of the text to be read, where the titles may be the titles of the chapters in the directory. This preliminary matching of the directory information provides a matching basis for the subsequent text segment matching. After the directory information matching passes, that is, when the directory information of the standard text is consistent with the directory information of the text to be read, the user terminal obtains the text abstract of the standard text segment with the same title as the text segment to be read in the text signing site.

S102, matching the text segments to be read by adopting the text abstract;

specifically, the user terminal performs segmentation processing on the text abstract and the text segment to be read respectively according to a preset format, and performs matching processing according to the character string in the text abstract and the character string in the text segment to be read after the segmentation processing.

S103, outputting the text segment to be read when the matching result is larger than a preset threshold value;

specifically, when the matching result after the matching process is greater than a preset threshold, it indicates that the similarity between the text segment to be read and the text abstract is high, and the user terminal may output the text segment to be read.

Referring to fig. 2, a schematic flow chart of another text verification method according to an embodiment of the present invention is provided. As shown in fig. 2, the method of the embodiment of the present invention includes the following steps S201 to S208.

S201, when a browsing request carrying a tag of a text to be read is received, acquiring directory information of a standard text associated with the tag in a text signing site;

S202, matching the directory information of the text to be read by adopting the directory information of the standard text;

specifically, when a browsing request carrying a tag of a text to be read is received, the user terminal may obtain, in the text signing site, directory information of a standard text associated with the tag, where the tag is preferably the text name of the text to be read. The user terminal may match the directory information of the text to be read against the directory information of the standard text; more specifically, the user terminal matches the titles in the directory of the standard text with the titles in the directory of the text to be read, where the titles may be the titles of the chapters in the directory. This preliminary matching of the directory information provides a matching basis for the subsequent text segment matching.
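
As a minimal illustrative sketch of this preliminary directory matching, assume that both directories are already available as ordered lists of chapter titles. The function names and the whitespace and case normalization below are assumptions made for illustration; the embodiment only requires that the two directories be consistent.

```python
from typing import List


def normalize_title(title: str) -> str:
    """Collapse runs of whitespace and ignore case (an illustrative assumption)."""
    return " ".join(title.split()).casefold()


def directories_match(standard_titles: List[str], candidate_titles: List[str]) -> bool:
    """Return True when both directories list the same chapter titles in the same order."""
    standard = [normalize_title(t) for t in standard_titles]
    candidate = [normalize_title(t) for t in candidate_titles]
    return standard == candidate
```

Under this sketch, the text abstract would be requested only when `directories_match` returns True, which corresponds to the directory matching having passed.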

It should be noted that the text to be read opened by the user terminal comes from a novel aggregation site, that is, a site that offers users free text reading by extracting texts from third-party sites.

S203, after the matching is passed, acquiring a text abstract of a standard text segment with the same title as the text segment to be read in the text signing site;

specifically, after the matching of the directory information passes, that is, when the directory information of the standard text is consistent with the directory information of the text to be read, the user terminal obtains, from the text signing site, the text abstract of the standard text segment with the same title as the text segment to be read in the text to be read. It can be understood that the text to be read opened by the user terminal comes from a text aggregation site, that is, a site that offers users free text reading by extracting texts from third-party sites. The standard text segment in the text signing site must be paid for, but the text abstract of the standard text segment can be obtained, so the text segment to be read is matched against the text abstract of the standard text segment, which can improve the accuracy of the text segment to be read.

S204, segmenting the text abstract and the text segment to be read respectively according to a preset format, and matching according to the character strings in the text abstract and the character strings in the text segment to be read after segmentation;

specifically, the user terminal may first perform segmentation processing on the text summary and the text segment to be read according to a preset format, for example: segmenting the text abstract and the text segment to be read into a plurality of character strings by taking preset word number as a segmentation boundary; or segmenting the text abstract and the text segment to be read into a plurality of character strings and the like according to the part of speech. The user terminal can perform matching processing according to the character strings in the text abstract after segmentation processing and the character strings in the text segment to be read.

Preferably, the matching process may proceed as follows. The user terminal segments the text abstract and the text segment to be read according to a first preset format, and obtains a first character string in the text abstract and a second character string in the text segment to be read. It is understood that the first character string may include at least one character string, and so may the second character string. The user terminal then obtains character string information of the first character string and the second character string, which includes the sum of the character string lengths and the Levenshtein distance between the character strings. The Levenshtein distance is the minimum number of single-character edits (insertion, deletion, replacement, and the like) required to change one character string into the other. From the sum of the character string lengths and the Levenshtein distance between the character strings, the user terminal obtains the Levenshtein ratio of the first character string and the second character string, and determines the Levenshtein ratio as the matching result of matching the text segment to be read against the text abstract.
Taking part-of-speech segmentation as the first preset format, assume that the acquired first character string is "sun" and the second character string is "sunflower", that the sum of the character string lengths is 5 (counted in the characters of the original Chinese strings), and that the Levenshtein distance between the first character string and the second character string is 2 (a deletion and a replacement). The Levenshtein ratio is given by the formula (Sum - Idist) / Sum, where Sum represents the sum of the character string lengths and Idist represents the Levenshtein distance between the character strings, so the Levenshtein ratio of the first character string and the second character string is (5 - 2) / 5 = 0.6. The user terminal calculates the Levenshtein ratio for all the character strings in the text segment to be read and the text abstract in this way, and determines the average of these Levenshtein ratios as the matching result of matching the text segment to be read against the text abstract. Preferably, when the Levenshtein ratio of the first character string and the second character string is calculated, the first character string and the second character string may also each be converted into pinyin, and the Levenshtein ratio between the pinyin of the first character string and the pinyin of the second character string is calculated in the same manner. The user terminal then takes the average of the Levenshtein ratio of the two character strings and the Levenshtein ratio of their pinyin as the final Levenshtein ratio between the first character string and the second character string. Matching additionally by pinyin avoids interference from homophones in the matching process and further improves the accuracy of the matching.
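
The calculation described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the edit distance uses unit cost for insertion, deletion, and replacement (some libraries weight a replacement as 2), the function names are hypothetical, and `to_pinyin` is a caller-supplied conversion function that the embodiment does not specify. The placeholder strings "ab" and "acd" are chosen only because they reproduce the figures in the example (length sum 5, distance 2).

```python
from typing import Callable, Iterable, Tuple


def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or replacements."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # replacement or match
        previous = current
    return previous[-1]


def levenshtein_ratio(a: str, b: str) -> float:
    """(Sum - distance) / Sum, where Sum is the sum of the two string lengths."""
    total = len(a) + len(b)
    if total == 0:
        return 1.0  # two empty strings are treated as identical (an assumption)
    return (total - levenshtein_distance(a, b)) / total


def combined_ratio(a: str, b: str, to_pinyin: Callable[[str], str]) -> float:
    """Average of the character-level ratio and the ratio of the pinyin forms."""
    return (levenshtein_ratio(a, b)
            + levenshtein_ratio(to_pinyin(a), to_pinyin(b))) / 2


def average_ratio(pairs: Iterable[Tuple[str, str]]) -> float:
    """Mean Levenshtein ratio over aligned (abstract string, fragment string) pairs."""
    ratios = [levenshtein_ratio(a, b) for a, b in pairs]
    return sum(ratios) / len(ratios) if ratios else 0.0
```

For instance, `levenshtein_ratio("ab", "acd")` evaluates (5 - 2) / 5. How the strings of the abstract are paired with the strings of the fragment is not stated in the text; `average_ratio` simply assumes the pairs are given.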

It is understood that the Levenshtein ratio is suitable when the lengths of the character strings do not differ greatly. A text aggregation site, however, usually adds advertising phrases in front of a text, whereas the text abstract contains none. During matching, these advertising phrases must also be matched, which takes a long time. For this case, the embodiment of the present invention further provides another matching process: the user terminal segments the text abstract and the text segment to be read according to a second preset format, obtains at least one third character string in the text abstract and at least one fourth character string in the text segment to be read, obtains the number of fourth character strings in the text segment to be read that are the same as the at least one third character string, and determines the ratio of that number to the number of the at least one third character string as the matching result of matching the text segment to be read against the text abstract.
Taking segmentation by a preset word number as the second preset format, assume that a sentence in the text abstract is "where the language is finished after the graduation is returned to the country", that the corresponding sentence in the text segment to be read is "the land after returning from the country after finishing the page is recognized", and that the preset word number is three. The sentence in the text abstract can then be divided into six third character strings ("graduation return, industry return, after country, after land and so on"), and the sentence in the text segment to be read can be divided into eight fourth character strings ("page back after finishing, page back, land after returning, and so on"). Four of the third character strings in the sentence of the text abstract are the same as four of the fourth character strings in the sentence of the text segment to be read, so the matching result for the two sentences is 4/6 ≈ 0.67. The user terminal calculates the number ratio for all the character strings in the text segment to be read and the text abstract in this way, and determines the average of these number ratios as the matching result of matching the text segment to be read against the text abstract.
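
A minimal sketch of this number-ratio matching is given below. It assumes that "segmentation by a preset word number" means overlapping windows of that many characters; this is inferred from the example, where three-character windows yield six and eight character strings respectively, and is not stated outright. The function names and the placeholder strings are hypothetical, and counting the third character strings that reappear among the fourth character strings is one interpretation that keeps the ratio between 0 and 1.

```python
from typing import List


def char_windows(text: str, size: int = 3) -> List[str]:
    """Split text into overlapping windows of `size` characters."""
    return [text[i:i + size] for i in range(len(text) - size + 1)]


def number_ratio(abstract_sentence: str, fragment_sentence: str, size: int = 3) -> float:
    """Share of the abstract's character strings that also occur in the fragment."""
    third_strings = char_windows(abstract_sentence, size)
    fourth_strings = set(char_windows(fragment_sentence, size))
    if not third_strings:
        return 0.0  # sentence shorter than the window: nothing to compare
    matched = sum(1 for s in third_strings if s in fourth_strings)
    return matched / len(third_strings)
```

With the placeholder sentences "abcdefgh" (six windows) and "abcdefXYgh" (eight windows), four windows coincide, so `number_ratio` evaluates 4/6, mirroring the arithmetic of the example.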

It is to be understood that the first preset format and the second preset format may be the same preset format, and the naming is only used for distinguishing different matching processes, and similarly, the first character string, the second character string, the third character string, and the fourth character string may also be the same character string.

S205, when the matching result is larger than a preset threshold value, outputting the text segment to be read;

S206, when the matching result is smaller than or equal to a preset threshold value, at least one third-party text fragment with the same title as the text fragment to be read is obtained from at least one third-party site;

specifically, when the matching result is less than or equal to the preset threshold, it indicates that the similarity between the text segment to be read and the text abstract is not high, and the user terminal may obtain, from at least one third-party site, at least one third-party text segment having the same title as the text segment to be read.

S207, respectively calculating the similarity of each third-party text fragment in the at least one third-party text fragment by using the text abstract;

specifically, the user terminal may obtain third-party text snippets of a preset number of third-party sites, for example: and acquiring the third-party text fragments of 10 third-party sites. The user terminal calculates the similarity of each of the preset number of third-party text segments by using the text abstract, and the specific calculation process may refer to the matching process described above, which is not described herein again.

S208, acquiring and outputting a third-party text segment with the similarity greater than the preset threshold and the maximum similarity;

specifically, the user terminal may obtain and output the third-party text segments with the similarity greater than the preset threshold and the maximum similarity from among the preset number of third-party text segments.
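
Steps S205 to S208 can be sketched as a single selection routine. The sketch assumes a caller-supplied `similarity` function (for example, one of the two matching processes described above) and an illustrative threshold of 0.8; neither the function name `select_fragment` nor the threshold value comes from the embodiment.

```python
from typing import Callable, Iterable, Optional


def select_fragment(abstract: str,
                    fragment: str,
                    third_party_fragments: Iterable[str],
                    similarity: Callable[[str, str], float],
                    threshold: float = 0.8) -> Optional[str]:
    """Return the fragment to output, or None when no candidate exceeds the threshold."""
    # S205: the fragment to be read is accepted when it matches the abstract well enough.
    if similarity(abstract, fragment) > threshold:
        return fragment
    # S206 to S208: otherwise keep the third-party fragment with the highest
    # similarity, provided that similarity is greater than the threshold.
    best, best_score = None, threshold
    for candidate in third_party_fragments:
        score = similarity(abstract, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best
```

The embodiment does not say what happens when no third-party text segment exceeds the threshold; returning None in that case is an assumption of this sketch.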

In the embodiment of the invention, the text abstract of the standard text segment with the same title as the text segment to be read is acquired from the text signing site, the text abstract is adopted to carry out matching processing on the text segment to be read, and the text segment to be read is output when the matching result is greater than the preset threshold value. Matching the directory information of the text provides a matching basis for the subsequent matching of text segments; matching the text segment to be read against the text abstract of the standard text segment acquired from the text signing site improves the accuracy of the text segment; matching the text segments by the Levenshtein ratio and by the number ratio further improves the accuracy of the text segment; and judging the matching result against a preset threshold value and outputting the most accurate text segment according to the judgment ensures the text reading quality.

The user terminal provided by the embodiment of the present invention will be described in detail below with reference to fig. 3 to 6. It should be noted that the user terminals shown in fig. 3 to fig. 6 are used for executing the methods of the embodiments shown in fig. 1 and fig. 2. For convenience of description, only the parts related to the embodiment of the present invention are shown; for specific technical details not disclosed here, please refer to the embodiments shown in fig. 1 and fig. 2.

Referring to fig. 3, a schematic structural diagram of a user terminal is provided in an embodiment of the present invention. As shown in fig. 3, the user terminal 1 according to the embodiment of the present invention may include: an abstract acquiring unit 11, a segment matching unit 12, and a segment output unit 13.

The abstract acquiring unit 11 is used for acquiring a text abstract of a standard text segment with the same title as that of a text segment to be read in a text signing site;

in the specific implementation, when a user opens a text to be read through the user terminal 1, the abstract acquiring unit 11 acquires a text abstract of a standard text segment with the same title as that of the text segment to be read in the text to be read through a text signing site, it can be understood that the text to be read opened by the user terminal comes from some text aggregation sites, and the text aggregation sites provide a site for free text reading for the user by extracting a text of a third-party site.

It should be noted that, before the abstract acquiring unit 11 obtains the text abstract of the standard text segment with the same title as the text segment to be read in the text signing site, the user terminal 1 may, on receiving a browsing request carrying a tag of a text to be read, first obtain directory information of a standard text associated with the tag in the text signing site. The tag is preferably the text name of the text to be read. The user terminal 1 may match the directory information of the text to be read against the directory information of the standard text; more specifically, the user terminal 1 matches the titles in the directory of the standard text with the titles in the directory of the text to be read, where the titles may be the titles of the chapters in the directory. This preliminary matching of the directory information provides a matching basis for the subsequent text segment matching. After the directory information matching passes, that is, when the directory information of the standard text is consistent with the directory information of the text to be read, the abstract acquiring unit 11 executes the step of acquiring the text abstract of the standard text segment with the same title as the text segment to be read in the text signing site.

The segment matching unit 12 is configured to perform matching processing on the text segment to be read by using the text abstract;

in a specific implementation, the segment matching unit 12 performs segmentation processing on the text abstract and the text segment to be read respectively according to a preset format, and performs matching processing according to a character string in the text abstract and a character string in the text segment to be read after the segmentation processing.

Specifically, please refer to fig. 4, which provides a schematic structural diagram of a segment matching unit according to an embodiment of the present invention. As shown in fig. 4, the segment matching unit 12 may include:

the first obtaining subunit 121 is configured to respectively perform segmentation processing on the text abstract and the text fragment to be read according to a first preset format, and obtain a first character string in the text abstract and a second character string in the text fragment to be read after the segmentation processing;

an information obtaining subunit 122, configured to obtain character string information of the first character string and the second character string, where the character string information includes a sum of character string lengths and a Levenshtein distance between character strings;

a first result determining subunit 123, configured to obtain a Levenshtein ratio of the first character string and the second character string according to the sum of the lengths of the character strings and the Levenshtein distance between the character strings, and determine the Levenshtein ratio as a matching result of matching the text segment to be read and the text abstract;

in a specific implementation, the segment matching unit 12 may first perform segmentation processing on the text abstract and the text segment to be read according to a preset format, for example: segmenting the text abstract and the text segment to be read into a plurality of character strings by taking preset word number as a segmentation boundary; or segmenting the text abstract and the text segment to be read into a plurality of character strings and the like according to the part of speech. The segment matching unit 12 may perform matching processing according to the character strings in the text abstract after the segmentation processing and the character strings in the text segment to be read.

Preferably, the matching process may proceed as follows. The first obtaining subunit 121 segments the text abstract and the text segment to be read according to a first preset format, and obtains a first character string in the text abstract and a second character string in the text segment to be read. It is understood that the first character string may include at least one character string, and so may the second character string. The information obtaining subunit 122 obtains character string information of the first character string and the second character string, which includes the sum of the character string lengths and the Levenshtein distance between the character strings. The Levenshtein distance is the minimum number of single-character edits (insertion, deletion, replacement, and the like) required to change one character string into the other. The first result determining subunit 123 obtains the Levenshtein ratio of the first character string and the second character string according to the sum of the character string lengths and the Levenshtein distance between the character strings, and determines the Levenshtein ratio as the matching result of matching the text segment to be read against the text abstract.
Taking part-of-speech segmentation as the first preset format, assume that the acquired first character string is "sun" and the second character string is "sunflower", that the sum of the character string lengths is 5 (counted in the characters of the original Chinese strings), and that the Levenshtein distance between the first character string and the second character string is 2 (a deletion and a replacement). The Levenshtein ratio is given by the formula (Sum - Idist) / Sum, where Sum represents the sum of the character string lengths and Idist represents the Levenshtein distance between the character strings, so the Levenshtein ratio of the first character string and the second character string is (5 - 2) / 5 = 0.6. The segment matching unit 12 calculates the Levenshtein ratio for all the character strings in the text segment to be read and the text abstract in this way, and determines the average of these Levenshtein ratios as the matching result of matching the text segment to be read against the text abstract. Preferably, when the Levenshtein ratio of the first character string and the second character string is calculated, the first character string and the second character string may also each be converted into pinyin, and the Levenshtein ratio between the pinyin of the first character string and the pinyin of the second character string is calculated in the same manner. The segment matching unit 12 then takes the average of the Levenshtein ratio of the two character strings and the Levenshtein ratio of their pinyin as the final Levenshtein ratio between the first character string and the second character string. Matching additionally by pinyin avoids interference from homophones in the matching process and further improves the accuracy of the matching.

It is understood that the Levenshtein ratio is suitable when the lengths of the character strings do not differ greatly. A text aggregation site, however, usually adds advertising phrases in front of the text, whereas the text abstract contains none. During matching, these advertising phrases must also be matched, which takes a long time. For this case, the embodiment of the present invention further provides another schematic structural diagram of a segment matching unit. As shown in fig. 5, the segment matching unit 12 may include:

a second obtaining subunit 124, configured to respectively perform segmentation processing on the text abstract and the text segment to be read according to a second preset format, and obtain at least one third character string in the text abstract and at least one fourth character string in the text segment to be read after the segmentation processing;

a number obtaining subunit 125, configured to obtain the number of fourth character strings in the text segment to be read, where the fourth character strings are the same as the at least one third character string;

a second result determining subunit 126, configured to determine the ratio of the number of such fourth character strings to the number of the at least one third character string as a matching result of the matching processing between the text segment to be read and the text abstract;

in a specific implementation, during the matching process, the second obtaining subunit 124 may further perform segmentation processing on the text abstract and the text segment to be read according to a second preset format, and obtain at least one third character string in the text abstract and at least one fourth character string in the text segment to be read after the segmentation processing. The number obtaining subunit 125 obtains the number of fourth character strings in the text segment to be read that are the same as the at least one third character string, and the second result determining subunit 126 determines the ratio of that number to the number of the at least one third character string as the matching result of the matching processing of the text segment to be read and the text abstract.

Taking segmentation by a preset word number as the second preset format as an example, assume that a sentence in the text abstract is "where the language is finished after the graduation is returned to the country", that a sentence in the text segment to be read is "the land after returning from the country after finishing the page is recognized", and that the preset word number is three. The sentence in the text abstract can then be divided into six third character strings ("graduation return, industry return, after country, after land and so on"), and the sentence in the text segment to be read can be divided into eight fourth character strings ("page back after finishing, page back, land after returning, and so on"). Four third character strings in the sentence of the text abstract are the same as four fourth character strings in the sentence of the text segment to be read, so the matching result for the two sentences is 4/6 ≈ 0.67. The segment matching unit 12 calculates the number ratio for all the character strings in the text segment to be read and the text abstract in this way, and determines the average value of these number ratios as the matching result of the matching processing of the text segment to be read and the text abstract.
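The number-ratio computation can be sketched as follows. This is only an illustration under stated assumptions: the source does not spell out how the strings are cut, so the sketch assumes an overlapping sliding window of three characters (which is consistent with one sentence yielding six strings and a slightly longer one yielding eight), and it counts each shared string at most as often as it occurs on both sides. The function names and sample strings are hypothetical.

```python
from collections import Counter
from typing import List


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Overlapping windows of n characters, e.g. 8 characters give six 3-grams."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def number_ratio(abstract_sentence: str, fragment_sentence: str, n: int = 3) -> float:
    """Share of the abstract's n-grams that also occur in the fragment."""
    third = char_ngrams(abstract_sentence, n)    # "third character strings"
    fourth = char_ngrams(fragment_sentence, n)   # "fourth character strings"
    if not third:
        return 0.0  # assumption: a sentence too short to segment matches nothing
    # Multiset intersection, so a repeated n-gram is not counted more often
    # than it appears on both sides.
    matched = sum((Counter(third) & Counter(fourth)).values())
    return matched / len(third)


# Hypothetical strings that mirror the counts in the example:
# six abstract 3-grams, eight fragment 3-grams, four of them shared.
print(number_ratio("abcdefgh", "xycdefghzz"))  # 4/6, roughly 0.67
```

Because only the abstract's strings form the denominator, extra leading material in the fragment (such as advertising copy) lowers the cost of matching without lowering the score, which is the case this second format is meant to handle.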

It is to be understood that the first preset format and the second preset format may be the same preset format, the different names serving only to distinguish the two matching processes; similarly, the first character string, the second character string, the third character string, and the fourth character string may also be the same character string. Moreover, the segment matching unit 12 may include the first obtaining subunit 121, the information obtaining subunit 122, the first result determining subunit 123, the second obtaining subunit 124, the number obtaining subunit 125, and the second result determining subunit 126 at the same time, so as to handle the matching processing in different situations.

The segment output unit 13 is configured to output the text segment to be read when the matching result is greater than a preset threshold;

specifically, when the matching result after the matching processing is greater than the preset threshold, it indicates that the similarity between the text segment to be read and the text abstract is high, and the segment output unit 13 may output the text segment to be read.

In the embodiment of the invention, the text abstract of the standard text segment with the same title as the text segment to be read is acquired from the text signing site, the text abstract is used to perform matching processing on the text segment to be read, and the text segment to be read is output when the matching result is greater than the preset threshold. Matching the text segment to be read against the text abstract of the standard text segment acquired from the text signing site improves the accuracy of the text segment; matching the text segments by means of the Levenshtein ratio and the number ratio further improves that accuracy and thereby ensures the quality of text reading.

Referring to fig. 6, a schematic structural diagram of another user terminal is provided in the embodiment of the present invention. As shown in fig. 6, the user terminal 1 according to the embodiment of the present invention may include: an abstract acquiring unit 11, a segment matching unit 12, a segment output unit 13, an information acquiring unit 14, a notifying unit 15, a segment obtaining unit 16, and a calculating unit 17; for the specific structures of the abstract acquiring unit 11 and the segment matching unit 12, and for part of the structure of the segment output unit 13, refer to the specific description of the embodiment shown in fig. 3, which is not repeated here.

The information acquiring unit 14 is configured to acquire, when a browsing request carrying a tag of a text to be read is received, directory information of a standard text associated with the tag in a text signing site;

the notifying unit 15 is configured to match the directory information of the text to be read with the directory information of the standard text, and notify the abstract acquiring unit 11 to acquire a text abstract of a standard text segment having the same title as the text segment to be read in a text signing site after the matching is passed;

in a specific implementation, when the user terminal 1 receives a browsing request carrying a tag of a text to be read, the information acquiring unit 14 may acquire, in the text signing site, directory information of a standard text associated with the tag, where the tag is preferably the text name of the text to be read. The notifying unit 15 may then match the directory information of the text to be read against the directory information of the standard text; more specifically, the notifying unit 15 matches the titles in the directory of the standard text against the titles in the directory of the text to be read, where a title may be the title of each chapter in the directory. This preliminary matching of the directory information provides a basis for the subsequent text segment matching process.

It should be noted that the text to be read opened by the user terminal 1 comes from a novel aggregation site, which offers users free text reading by extracting text from third-party sites. After the matching of the directory information is passed, that is, when the directory information of the matched standard text is consistent with the directory information of the text to be read, the notifying unit 15 notifies the abstract acquiring unit 11 to acquire, in the text signing site, the text abstract of the standard text segment having the same title as the text segment to be read.
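The directory pre-check can be sketched as follows. The source only says the two directories must be "consistent", so this sketch makes the assumption that consistency means the chapter titles are equal, in order, after whitespace is normalized; the function names and the sample titles are hypothetical.

```python
from typing import List


def normalize_title(title: str) -> str:
    """Collapse runs of whitespace and trim the ends of a chapter title."""
    return " ".join(title.split())


def directories_match(standard_titles: List[str], candidate_titles: List[str]) -> bool:
    """True when both directories list the same chapter titles in the same order.

    Assumption: "consistent" is read as exact, ordered equality of titles.
    """
    standard = [normalize_title(t) for t in standard_titles]
    candidate = [normalize_title(t) for t in candidate_titles]
    return standard == candidate


signed_site = ["Chapter 1 Departure", "Chapter 2 Return"]
aggregator = ["Chapter 1  Departure", " Chapter 2 Return "]
if directories_match(signed_site, aggregator):
    print("directory check passed; the text abstract can now be acquired")
```

Only when this check passes would the terminal go on to fetch the text abstract for the chapter being read, so a mismatched or mislabelled book is rejected before any segment-level matching is attempted.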

The segment obtaining unit 16 is configured to obtain, in at least one third-party site, at least one third-party text segment having the same title as the text segment to be read when the matching result is smaller than or equal to a preset threshold;

in a specific implementation, when the matching result is less than or equal to the preset threshold, it indicates that the similarity between the text segment to be read and the text abstract is not high, and the segment obtaining unit 16 may obtain, in at least one third-party site, at least one third-party text segment having the same title as the text segment to be read, where it is understood that the third-party site may be a text aggregation site other than a currently used text aggregation site.

The calculating unit 17 is configured to calculate a similarity of each third-party text segment in the at least one third-party text segment by using the text abstract;

in a specific implementation, the calculating unit 17 may obtain third-party text segments from a preset number of third-party sites, for example, from 10 third-party sites. The calculating unit 17 uses the text abstract to calculate the similarity of each of the preset number of third-party text segments; for the specific calculation process, refer to the matching processing in the embodiment shown in fig. 3, which is not repeated here.

The segment output unit 13 is further configured to acquire and output the third-party text segment whose similarity is greater than the preset threshold and is the maximum;

in a specific implementation, the segment output unit 13 may acquire, from among the preset number of third-party text segments, the third-party text segment whose similarity is greater than the preset threshold and is the maximum, and output it.
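The output and fallback logic of the units above can be sketched as follows. This is a hedged illustration: the similarity measure is passed in as a parameter (it could be either ratio described earlier), the function and parameter names are hypothetical, and returning None when no candidate exceeds the threshold is an assumption, since the source does not say what happens in that case.

```python
from typing import Callable, Iterable, Optional


def select_segment(
    abstract: str,
    segment: str,
    third_party_segments: Iterable[str],
    similarity: Callable[[str, str], float],
    threshold: float,
) -> Optional[str]:
    """Return the segment to output, or None if nothing clears the threshold."""
    # Output the segment to be read directly when it matches the abstract well enough.
    if similarity(abstract, segment) > threshold:
        return segment

    # Otherwise, score same-title segments from third-party sites and keep the
    # best one whose similarity is strictly greater than the threshold.
    best: Optional[str] = None
    best_score = threshold
    for candidate in third_party_segments:
        score = similarity(abstract, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


def shared_chars(abstract: str, text: str) -> float:
    """Toy similarity for the demo: share of the abstract's characters found in text."""
    return len(set(abstract) & set(text)) / len(set(abstract))


print(select_segment("abcd", "axyz", ["abxy", "abcx", "zzzz"], shared_chars, 0.6))
```

Because the third-party segments are only iterated when the first check fails, a lazy iterable (for example, a generator that fetches each site on demand) avoids contacting third-party sites when the original segment is already good enough.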

Referring to fig. 7, a schematic structural diagram of another user terminal is provided in an embodiment of the present invention. As shown in fig. 7, the user terminal 1000 may include: at least one processor 1001, such as a CPU, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a standard wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory, such as at least one disk memory. The memory 1005 may optionally also be at least one storage device located remotely from the processor 1001. As shown in fig. 7, the memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, a user interface module, and a text verification application program.

In the user terminal 1000 shown in fig. 7, the network interface 1004 is mainly used to connect to the text signing site and the third-party sites and to perform data communication between them and the user terminal; the processor 1001 may be configured to call the text verification application stored in the memory 1005, and specifically perform the following steps:

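acquiring, in a text signing site, a text abstract of a standard text segment having the same title as a text segment to be read;

matching the text segment to be read by adopting the text abstract;

and outputting the text segment to be read when the matching result is greater than a preset threshold.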

In one embodiment, before the processor 1001 obtains the text abstract of the standard text segment having the same title as the text segment to be read in the text signing site, the following steps are further performed:

when a browsing request carrying a tag of a text to be read is received, acquiring directory information of a standard text associated with the tag in a text signing site;

and matching the directory information of the text to be read by adopting the directory information of the standard text, and acquiring the text abstract of the standard text segment with the same title as the text segment to be read in the text signing site after the matching is passed.

In an embodiment, when the processor 1001 performs the matching process on the text segment to be read by using the text abstract, the following steps are specifically performed:

and respectively carrying out segmentation processing on the text abstract and the text segment to be read according to a preset format, and carrying out matching processing according to the character strings in the text abstract and the character strings in the text segment to be read after the segmentation processing.

In an embodiment, when the processor 1001 performs segmentation processing on the text abstract and the text fragment to be read respectively according to a preset format, and performs matching processing according to a character string in the text abstract and a character string in the text fragment to be read after the segmentation processing, the following steps are specifically performed:

respectively carrying out segmentation processing on the text abstract and the text segment to be read according to a first preset format, and acquiring a first character string in the text abstract and a second character string in the text segment to be read after the segmentation processing;

acquiring character string information of the first character string and the second character string, wherein the character string information comprises the sum of the lengths of the character strings and the Levenshtein distance between the character strings;

and acquiring the Levenshtein ratio of the first character string and the second character string according to the sum of the lengths of the character strings and the Levenshtein distance between the character strings, and determining the Levenshtein ratio as a matching result of the matching processing of the text segment to be read and the text abstract.

segmenting the text abstract and the text segment to be read according to a second preset format, and acquiring at least one third character string in the text abstract and at least one fourth character string in the text segment to be read after segmentation;

acquiring the number of fourth character strings in the text segment to be read that are the same as the at least one third character string;

and determining the ratio of the number of the fourth character strings to the number of the at least one third character string as a matching result of the matching processing of the text segment to be read and the text abstract.

In one embodiment, the processor 1001 further performs the steps of:

when the matching result is smaller than or equal to a preset threshold value, at least one third-party text segment with the same title as the text segment to be read is obtained from at least one third-party site;

respectively calculating the similarity of each third-party text fragment in the at least one third-party text fragment by adopting the text abstract;

and acquiring and outputting the third-party text segment with the similarity greater than the preset threshold and the maximum similarity.

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.

The above disclosure describes only preferred embodiments of the present invention and does not limit the scope of the claims; equivalent changes made according to the claims of the present invention therefore still fall within the scope of the invention.

Claims

1. A text verification method, comprising:

matching the directory information of the text to be read by adopting the directory information of the standard text;

after the matching is passed, namely when the directory information of the matched standard text is consistent with the directory information of the text to be read, acquiring a text abstract of the standard text segment with the same title as the text segment to be read;

matching the text segment to be read by adopting the text abstract, segmenting the text abstract and the text segment to be read respectively according to a preset format, converting character strings in the text abstract and character strings in the text segment to be read which are segmented into pinyin respectively, and matching according to the pinyin of the character strings in the text abstract and the pinyin of the character strings in the text segment to be read;

2. The method as claimed in claim 1, wherein the segmenting the text abstract and the text segment to be read according to a preset format, converting the character string in the text abstract and the character string in the text segment to be read after the segmenting process into pinyin respectively, and performing matching processing according to the pinyin of the character string in the text abstract and the pinyin of the character string in the text segment to be read comprises:

respectively converting the first character string and the second character string into pinyin, and acquiring the sum of the lengths of the pinyin of the first character string and the pinyin of the second character string and the Levenshtein distance between the pinyin of the first character string and the pinyin of the second character string;

and acquiring a Levenshtein ratio between the pinyin of the first character string and the pinyin of the second character string according to the sum of the lengths of the pinyin of the first character string and the pinyin of the second character string and the Levenshtein distance between the pinyin of the first character string and the pinyin of the second character string, and determining the Levenshtein ratio as a matching result of matching the text segment to be read and the text abstract.

3. The method according to claim 1, wherein the matching the text segment to be read with the text abstract further comprises:

4. The method of claim 1, further comprising:

5. A user terminal, comprising:

an information acquiring unit, configured to acquire, in a text signing site, directory information of a standard text associated with a tag when a browsing request carrying the tag of a text to be read is received;

the notification unit is used for matching the directory information of the text to be read by adopting the directory information of the standard text and notifying the abstract acquisition unit after the matching is passed, namely the directory information of the matched standard text is consistent with the directory information of the text to be read;

the segment matching unit is used for respectively carrying out segmentation processing on the text abstract and the text segment to be read according to a preset format, respectively converting character strings in the text abstract and the text segment to be read after the segmentation processing into pinyin, and carrying out matching processing according to the pinyin of the character strings in the text abstract and the pinyin of the character strings in the text segment to be read;

6. The terminal of claim 5, wherein the segment matching unit comprises:

the first obtaining subunit is configured to respectively perform segmentation processing on the text abstract and the text segment to be read according to a first preset format, and obtain a first character string in the text abstract and a second character string in the text segment to be read after the segmentation processing;

an information obtaining subunit, configured to convert the first character string and the second character string into pinyin respectively, and obtain a sum of lengths of the pinyin of the first character string and the pinyin of the second character string and a Levenshtein distance between the pinyin of the first character string and the pinyin of the second character string;

7. The terminal of claim 5, wherein the segment matching unit comprises:

the second obtaining subunit is configured to respectively perform segmentation processing on the text abstract and the text segment to be read according to a second preset format, and obtain at least one third character string in the text abstract and at least one fourth character string in the text segment to be read after the segmentation processing;

the number obtaining subunit is configured to obtain the number of fourth character strings, which are the same as the at least one third character string, in the text segment to be read;

and the second result determining subunit is configured to determine, as a matching result of the matching processing between the text segment to be read and the text abstract, a ratio of the number of the fourth character string to the number of the at least one third character string.

8. The terminal of claim 5, further comprising:

the segment obtaining unit is used for obtaining at least one third-party text segment with the same title as the text segment to be read from at least one third-party site when the matching result is smaller than or equal to a preset threshold value;

the calculating unit is used for calculating the similarity of each third-party text fragment in the at least one third-party text fragment by adopting the text abstract;

the segment output unit is further configured to acquire and output a third-party text segment with a similarity greater than the preset threshold and a maximum similarity.

9. A user terminal comprising a processor and a memory;

the memory is to store a text verification application, the text verification application including program instructions;

the processor is configured to invoke the text verification application to perform the text verification method of any of claims 1-4.

10. A storage medium, characterized in that the storage medium stores a computer program comprising program instructions; the program instructions, when executed by a processor, cause the processor to perform the text verification method of any of claims 1-4.