CN112015889A

CN112015889A - Reading tutoring system generated by text summarization technology

Info

Publication number: CN112015889A
Application number: CN202010832555.8A
Authority: CN
Inventors: 樊星
Original assignee: Shanghai Squirrel Classroom Artificial Intelligence Technology Co Ltd
Current assignee: Shanghai Squirrel Classroom Artificial Intelligence Technology Co Ltd
Priority date: 2020-08-18
Filing date: 2020-08-18
Publication date: 2020-12-01

Abstract

The invention provides a reading tutoring system generated by text summarization technology. The system scans a target reading text and converts it into editable text, divides the editable text into clauses, extracts the valid words in each clause, determines keywords from all the valid words, recombines the clauses according to the keywords to form text semantic information matching the target reading text, and outputs a text summary consistent with that semantic information.

Description

Reading tutoring system generated by text summarization technology

Technical Field

The invention relates to the technical field of intelligent education, and in particular to a reading tutoring system generated by text summarization technology.

Background

The main purpose of reading comprehension is to extract the main idea of a text. In practice, however, a target reading text contains many different types of words, such as nouns, verbs, adjectives, adverbs, pronouns and auxiliary words; some of these carry substantive meaning in the text while others do not. There is therefore a need in the art for a reading assistance system that can quickly and accurately obtain a text summary matching the subject matter of a target reading text.

Disclosure of Invention

Aiming at the defects in the prior art, the invention provides a reading tutoring system generated by text summarization technology, comprising a text scanning module, a text recognition and conversion module, a text clause and valid word extraction module, a text valid word analysis module, a text semantic recombination determination module and a text summary result output module. The text scanning module scans a target reading text to convert it into an electronic reading text; the text recognition and conversion module determines the text language type of the electronic reading text and converts the electronic reading text into an editable text according to that language type; the text clause and valid word extraction module divides the editable text into a plurality of clauses and extracts the corresponding valid words from each clause; the text valid word analysis module determines the keywords of the target reading text from the valid words extracted from each clause; and the text summary result output module outputs the text summary of the target reading text according to the keywords. The system thus scans the target reading text and converts it into editable text, divides the editable text into clauses and extracts their valid words, determines keywords from all the valid words, recombines the clauses according to the keywords to form text semantic information matching the target reading text, and outputs a text summary consistent with that semantic information. Because the invalid words in the target reading text are screened out, only valid words with substantive meaning are retained and the summary is generated from them, so a text summary consistent with the subject matter of the target reading text can be obtained quickly and accurately.

The invention provides a reading tutoring system generated by text summarization technology, characterized by comprising a text scanning module, a text recognition and conversion module, a text clause and valid word extraction module, a text valid word analysis module, a text semantic recombination determination module and a text summary result output module; wherein:

the text scanning module is used for scanning a target reading text so as to convert the target reading text into an electronic reading text;

the text recognition and conversion module is used for determining the text language type of the electronic reading text and converting the electronic reading text into an editable text according to that language type;

the text clause and valid word extraction module is used for dividing the editable text into a plurality of clauses and extracting the corresponding valid words from each clause;

the text valid word analysis module is used for determining the keywords of the target reading text according to the valid words extracted from each clause;

the text summary result output module is used for outputting the text summary of the target reading text according to the keywords;

further, the text scanning module comprises a structured light scanning projection sub-module, a reflected light receiving sub-module and a reading text electronization sub-module; wherein:

the structured light scanning projection sub-module is used for periodically scanning and projecting structured light with a light-and-dark stripe distribution onto the target reading text;

the reflected light receiving sub-module is used for receiving the reflected light formed when the structured light reaches the target reading text and is reflected by it;

the reading text electronization sub-module is used for generating the electronic reading text according to the light intensity distribution of the reflected light, wherein the electronic reading text is in an electronic image file format;
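The description does not say how the light intensity distribution becomes an image. As a rough sketch, assuming a linear sensor and one common structured-light technique (capturing the page under a stripe pattern and under its complement, then summing the two captures to recover a uniformly lit image), a single image row could be reconstructed as below. All function names are hypothetical; a real scanner would also handle noise, ambient light and calibration.

```python
# Hypothetical model of complementary stripe projection; not the patent's own method.

def stripe_pattern(width, period):
    """Binary light/dark stripe row: 1 = light stripe, 0 = dark stripe."""
    if period < 2 or period % 2:
        raise ValueError("period must be an even number >= 2")
    half = period // 2
    return [1 if (x // half) % 2 == 0 else 0 for x in range(width)]


def simulate_capture(reflectance, pattern):
    """Assumed linear sensor: received intensity = surface reflectance x projected light."""
    return [r * p for r, p in zip(reflectance, pattern)]


def reconstruct_row(capture_pattern, capture_inverse):
    """Summing the captures under a pattern and its complement gives a uniformly lit row."""
    return [a + b for a, b in zip(capture_pattern, capture_inverse)]


def to_grayscale(row, max_value):
    """Scale intensities to 0-255 so the row can be stored in an image file."""
    return [round(255 * v / max_value) for v in row]


# One row of a page: high reflectance for paper, low reflectance for printed strokes.
reflectance = [90, 10, 90, 90, 10, 90]
pattern = stripe_pattern(len(reflectance), 4)
inverse = [1 - p for p in pattern]
row = reconstruct_row(simulate_capture(reflectance, pattern),
                      simulate_capture(reflectance, inverse))
print(row)  # [90, 10, 90, 90, 10, 90]
print(to_grayscale(row, 90))
```

Under the linear-sensor assumption each pixel is lit in exactly one of the two captures, which is why the sum recovers the reflectance row.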

further, the text recognition and conversion module comprises a text language determination sub-module and an editable text generation sub-module; wherein:

the text language determination sub-module is used for determining the text language type of the electronic reading text according to any text fragment in the electronic reading text;

the editable text generation sub-module is used for reconstructing and editing the electronic reading text according to the text language type so as to generate the editable text, wherein the editable text is in an electronic text file format;
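The source does not specify how the text language type is identified from a fragment. A minimal heuristic sketch, assuming only Chinese and English need to be distinguished, counts characters in the CJK Unified Ideographs block against ASCII letters; the function name and the "zh"/"en" labels are illustrative and not part of the patent.

```python
def detect_language(fragment: str) -> str:
    """Guess the text language type of a fragment by counting scripts (heuristic)."""
    # Characters in the CJK Unified Ideographs block (U+4E00 to U+9FFF).
    cjk = sum(1 for ch in fragment if "\u4e00" <= ch <= "\u9fff")
    # ASCII letters as a proxy for Latin-script (English) text.
    latin = sum(1 for ch in fragment if ch.isascii() and ch.isalpha())
    if cjk == 0 and latin == 0:
        return "unknown"
    return "zh" if cjk >= latin else "en"


print(detect_language("阅读理解的主要目的"))  # zh
print(detect_language("The main idea of the text"))  # en
```

A production system would more likely rely on the language models of its OCR engine; the point here is only that a short fragment can be enough to choose the recognition language.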

further, the text clause and valid word extraction module comprises a text clause division sub-module, a text invalid word removal sub-module and a text valid word extraction sub-module; wherein:

the text clause division sub-module is used for dividing the editable text into n clauses according to all the punctuation marks in the editable text;

the text invalid word removal sub-module is used for identifying the invalid words contained in each clause according to a preset invalid word bank and removing them from each clause, wherein each invalid word is an adverb or a pronoun;

the text valid word extraction sub-module is used for extracting the corresponding valid words from each clause, after the invalid words have been removed, according to a preset valid word bank, wherein each valid word is a verb, an adjective or a noun;
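The three sub-modules above can be sketched roughly as follows, assuming whitespace-tokenised (English) text and small hypothetical word banks. Chinese input would first need a word segmenter, and the real invalid and valid word banks are not given in the source.

```python
import re

# Clause delimiters: ASCII and full-width Chinese punctuation (an assumed set).
CLAUSE_DELIMITERS = r"[,.;:!?，。；：！？、]"


def split_clauses(text):
    """Divide the editable text into clauses at punctuation marks."""
    return [part.strip() for part in re.split(CLAUSE_DELIMITERS, text) if part.strip()]


def extract_valid_words(clause, invalid_bank, valid_bank):
    """Remove words in the invalid word bank, then keep only words in the valid word bank.

    Whitespace tokenisation is an assumption; Chinese text would need a segmenter.
    """
    tokens = clause.lower().split()
    remaining = [t for t in tokens if t not in invalid_bank]
    return [t for t in remaining if t in valid_bank]


# Hypothetical word banks: adverbs/pronouns are invalid; nouns/verbs/adjectives are valid.
INVALID = {"quickly", "very", "it", "they"}
VALID = {"fox", "jumps", "dog", "runs", "fast"}

text = "The fox quickly jumps over the dog. It runs away, very fast!"
clauses = split_clauses(text)
print(clauses)  # ['The fox quickly jumps over the dog', 'It runs away', 'very fast']
print([extract_valid_words(c, INVALID, VALID) for c in clauses])
# [['fox', 'jumps', 'dog'], ['runs'], ['fast']]
```

Removing invalid words before the valid-word lookup mirrors the order of the sub-modules; with a closed valid word bank the second filter alone would give the same result, but the first step keeps the two banks independent.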

further, the text valid word analysis module comprises a valid word cluster construction sub-module, a feature weight determination sub-module, a comprehensive weight determination sub-module and a keyword determination sub-module; wherein:

the valid word cluster construction sub-module is used for grouping, for each clause, the valid words whose frequency of occurrence is equal to or greater than a preset frequency into a corresponding valid word cluster;

the feature weight determination sub-module is used for performing an iterative scanning operation on the valid word cluster corresponding to each clause to obtain the feature weight value of each valid word cluster;

the comprehensive weight determination sub-module is used for calculating the comprehensive weight value of each valid word cluster according to its feature weight value;

the keyword determination sub-module is used for determining the keywords of the target reading text according to the comprehensive weight value of each valid word cluster;
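A minimal sketch of the valid word cluster construction step, assuming each clause has already been reduced to its list of valid words. The threshold `min_freq` stands in for the unspecified preset frequency, and the function name is hypothetical.

```python
from collections import Counter


def build_clusters(clause_words, min_freq):
    """One cluster per clause: valid words occurring at least min_freq times in it.

    Empty clusters are kept so that cluster indices stay aligned with clause order,
    since the feature weight step refers to neighbouring clusters f-1 and f+1.
    """
    clusters = []
    for words in clause_words:
        counts = Counter(words)
        clusters.append(Counter({w: n for w, n in counts.items() if n >= min_freq}))
    return clusters


clusters = build_clusters([["fox", "dog", "fox"], ["run"], ["dog", "dog", "fox"]], min_freq=2)
print(clusters)
sizes = [sum(c.values()) for c in clusters]  # total valid words per cluster
print(sizes)  # [2, 0, 2]
```

Storing counts rather than bare word sets keeps the per-cluster word totals (such as R_{f1}) and per-word occurrence counts (such as y_u) available to the later weighting steps.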

further, the feature weight determination sub-module is configured to perform the iterative scanning operation on the valid word cluster corresponding to each clause according to the following formula (1) to obtain the feature weight value of each valid word cluster:

in formula (1), Q_f denotes the feature weight value of the f-th valid word cluster; k denotes the number of iterative scans; P_{f,i} denotes the deviation probability of the f-th valid word cluster in the i-th scan; TF_{f,i} denotes the total number of valid words contained in the f-th valid word cluster in the i-th scan; TF_{f-1,i} and TF_{f+1,i} denote the total numbers of valid words contained in the i-th scan by the (f-1)-th and (f+1)-th valid word clusters adjacent to the f-th valid word cluster; T_{f,0} denotes the average total number of valid words contained in the f-th valid word cluster per iterative scan; and s denotes a preset coefficient factor, set to 0.85;
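Formula (1) itself is not reproduced in this text, so it cannot be implemented here. Its damping-like factor s = 0.85 and its dependence on the neighbouring clusters f-1 and f+1 resemble a TextRank/PageRank-style update, so the sketch below substitutes that standard damped iteration over a chain of clusters. It is explicitly not formula (1): the deviation probability P_{f,i} and the average T_{f,0} are not modelled, and the edge weighting (sum of the two clusters' word counts) is a hypothetical choice.

```python
def feature_weights(sizes, s=0.85, iterations=50):
    """TextRank-style damped iteration over a chain of clusters (stand-in for formula (1)).

    sizes[f] is the total number of valid words in cluster f. Each cluster is linked
    only to its neighbours f-1 and f+1; the edge weight sizes[f] + sizes[g] is assumed.
    """
    n = len(sizes)
    neighbours = [[g for g in (f - 1, f + 1) if 0 <= g < n] for f in range(n)]

    def edge(a, b):
        # Hypothetical edge weight between adjacent clusters a and b.
        return sizes[a] + sizes[b]

    # Total outgoing edge weight of each cluster, used to normalise its contributions.
    out_total = [sum(edge(f, g) for g in neighbours[f]) for f in range(n)]
    q = [1.0] * n
    for _ in range(iterations):
        q = [
            (1 - s) + s * sum(edge(g, f) / out_total[g] * q[g]
                              for g in neighbours[f] if out_total[g] > 0)
            for f in range(n)
        ]
    return q


print(feature_weights([2, 2, 2]))  # the middle cluster, with two neighbours, scores highest
```

As in TextRank, the (1 - s) term keeps every score positive, and clusters that are strongly connected to large neighbours accumulate higher weight.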

further, the comprehensive weight determination sub-module is configured to calculate the comprehensive weight value of the f-th valid word cluster according to the following formula (2):

in formula (2), G_f denotes the comprehensive weight value of the f-th valid word cluster; α_f denotes the weight coefficient of the f-th valid word cluster, a preset value that can be set manually within the range [0.5, 0.8]; R_{f1} denotes the total number of valid words included in the f-th valid word cluster; and R_0 denotes the total number of valid words included in all valid word clusters;
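Formula (2) is likewise not reproduced. Purely for illustration, the sketch below assumes a hypothetical form in which the feature weight is scaled by α_f and by the cluster's share of all valid words, G_f = α_f · Q_f · R_{f1} / R_0. Only the symbols and the [0.5, 0.8] range of α_f come from the description.

```python
def comprehensive_weight(q_f, alpha_f, r_f1, r_0):
    """Hypothetical stand-in for formula (2): G_f = alpha_f * Q_f * (R_f1 / R_0).

    The way the terms are combined is assumed; the symbols and the
    [0.5, 0.8] range of alpha_f are taken from the description.
    """
    if not 0.5 <= alpha_f <= 0.8:
        raise ValueError("alpha_f must lie in [0.5, 0.8]")
    if r_0 <= 0:
        raise ValueError("R_0 must be positive")
    return alpha_f * q_f * r_f1 / r_0


print(comprehensive_weight(q_f=2.0, alpha_f=0.5, r_f1=3, r_0=6))  # 0.5
```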

the valid word semantic association determination sub-module is used for determining the association between different valid words according to the comprehensive weight values, as follows:

first, the comprehensive weight values of all valid word clusters are sorted in descending order, and the top c valid word clusters are taken as target clusters;

second, for each target cluster, the association degree between each pair of the d valid words included in the current target cluster is calculated according to the following formula (3):

where q_{uv} denotes the association degree between the u-th and the v-th valid words in the current target cluster; ω_{uv} denotes the preset semantic relevance between the u-th and the v-th valid words; y_u and y_v denote the numbers of occurrences of the u-th and the v-th valid words, respectively, in the current target cluster; e_{uv} denotes the number of times the u-th and the v-th valid words appear together in the same valid word cluster; and E denotes the total number of valid word clusters;

third, the association degrees calculated in the current target cluster are sorted in descending order, and the valid words corresponding to the top U association degrees are taken as the keywords of the current target cluster;

the keyword determination sub-module is used for performing the following operation:

obtaining the keywords corresponding to all target clusters and taking them together as the keywords of the target reading text;
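The selection procedure above (top c clusters by comprehensive weight, top U word pairs per target cluster, union of the resulting words) can be sketched as follows. Because formula (3) is not reproduced, `association` is a hypothetical PMI-style stand-in that merely combines the symbols ω_{uv}, y_u, y_v, e_{uv} and E; the `omega` callback represents the preset semantic relevance, whose source the description does not specify.

```python
import math
from collections import Counter
from itertools import combinations


def association(omega_uv, y_u, y_v, e_uv, num_clusters):
    """Hypothetical PMI-style stand-in for formula (3)."""
    return omega_uv * math.log1p(num_clusters * e_uv / (y_u * y_v))


def select_keywords(clusters, weights, c, top_u, omega):
    """Top-c clusters by comprehensive weight, top-U word pairs per cluster, union of words."""
    num_clusters = len(clusters)
    targets = sorted(range(num_clusters), key=lambda f: weights[f], reverse=True)[:c]
    keywords = set()
    for f in targets:
        cluster = clusters[f]
        scored = []
        for u, v in combinations(sorted(cluster), 2):
            # e_uv: number of clusters in which u and v appear together.
            e_uv = sum(1 for other in clusters if u in other and v in other)
            q_uv = association(omega(u, v), cluster[u], cluster[v], e_uv, num_clusters)
            scored.append((q_uv, u, v))
        scored.sort(reverse=True)  # descending association degree
        for _, u, v in scored[:top_u]:
            keywords.update((u, v))
    return keywords


clusters = [Counter({"fox": 2, "dog": 2}), Counter({"fox": 3, "run": 1}), Counter({"cat": 1})]
weights = [0.9, 0.5, 0.1]  # comprehensive weight values G_f
print(sorted(select_keywords(clusters, weights, c=2, top_u=1, omega=lambda u, v: 1.0)))
# ['dog', 'fox', 'run']
```

A target cluster with fewer than two words yields no pairs and therefore contributes no keywords in this sketch; the description does not say how such clusters should be handled.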

further, the text summary result output module is used for directly taking the keywords of the target reading text as the summary information of the target reading text;

further, the text summary result output module is configured to extract at least two target sentences from the target reading text, wherein each target sentence includes at least one keyword of the target reading text and each keyword appears in at least two target sentences;

and to take the at least two target sentences as the summary information of the target reading text.
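A minimal sketch of this second output mode, assuming English text, lowercase keywords and sentence boundaries at ASCII or full-width end punctuation. The greedy choice (the first two sentences containing each keyword) is an assumption; the source only states the constraints that the target sentences must satisfy.

```python
import re


def select_target_sentences(text, keywords):
    """Pick sentences so that every keyword appears in at least two of them.

    Greedy choice (assumed): for each keyword, keep the first two sentences that
    contain it. Returns None when the constraints cannot be met, in which case the
    keywords themselves could serve as the summary.
    """
    sentences = [s.strip() for s in re.findall(r"[^.!?。！？]+[.!?。！？]?", text) if s.strip()]
    tokens = [set(re.findall(r"\w+", s.lower())) for s in sentences]
    chosen = set()
    for kw in keywords:
        hits = [i for i, words in enumerate(tokens) if kw in words]
        if len(hits) < 2:
            return None  # this keyword cannot appear in two target sentences
        chosen.update(hits[:2])
    if len(chosen) < 2:
        return None  # fewer than two target sentences (e.g. no keywords)
    return [sentences[i] for i in sorted(chosen)]  # keep original reading order


text = "The fox ran. A dog barked. The fox met the dog. Rain fell."
print(select_target_sentences(text, {"fox", "dog"}))
# ['The fox ran.', 'A dog barked.', 'The fox met the dog.']
print(select_target_sentences(text, {"rain"}))  # None
```

Every chosen sentence contains at least one keyword by construction, and returning the sentences in their original order keeps the summary readable.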

Compared with the prior art, the reading tutoring system generated by text summarization technology, comprising a text scanning module, a text recognition and conversion module, a text clause and valid word extraction module, a text valid word analysis module, a text semantic recombination determination module and a text summary result output module, scans the target reading text and converts it into editable text, divides the editable text into clauses and extracts their valid words, determines keywords from all the valid words, recombines the clauses according to the keywords to form text semantic information matching the target reading text, and outputs a text summary consistent with that semantic information. Because the invalid words in the target reading text are screened out, only valid words with substantive meaning are retained and the summary is generated from them, so a text summary consistent with the subject matter of the target reading text can be obtained quickly and accurately.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.

Fig. 1 is a schematic structural diagram of a reading tutoring system generated by a text summarization technology according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 1 is a schematic structural diagram of a reading tutoring system generated by text summarization technology according to an embodiment of the present invention. The reading tutoring system comprises a text scanning module, a text recognition and conversion module, a text clause and valid word extraction module, a text valid word analysis module, a text semantic recombination determination module and a text summary result output module; wherein:

the text recognition and conversion module is used for determining the text language type of the electronic reading text and converting the electronic reading text into an editable text according to that language type;

the text summary result output module is used for outputting the text summary of the target reading text according to the keywords.

Unlike the prior art, in which every word of the target reading text is recognized and interpreted to determine its subject matter, the reading tutoring system performs clause segmentation, identification of invalid and valid words, removal of the invalid words, determination of keywords from the valid words, and recombination and output of the text summary. This decomposes a long text into several shorter clauses that are easier to understand, and removing the invalid words from the clauses reduces the workload of semantic analysis. The text summary is then recombined accurately according to the keywords among the valid words, so that the main content of the target reading text can be understood quickly and accurately and the workload of understanding and analyzing the text is effectively reduced.

Preferably, the text scanning module comprises a structured light scanning projection sub-module, a reflected light receiving sub-module and a reading text electronization sub-module; wherein:

the reading text electronization sub-module is used for generating the electronic reading text according to the light intensity distribution of the reflected light, wherein the electronic reading text is in an electronic image file format.

Because the target reading text usually exists in paper form rather than in an electronically recognizable file format, the text scanning module converts it into an electronic reading text. This makes subsequent electronic processing of the text convenient and improves the applicability of the reading assistance system to different types of target reading texts.

Preferably, the text recognition and conversion module comprises a text language determination sub-module and an editable text generation sub-module; wherein:

the text language determination sub-module is used for determining the text language type of the electronic reading text according to any text fragment in the electronic reading text;

the editable text generation sub-module is used for reconstructing and editing the electronic reading text according to the text language type so as to generate the editable text, wherein the editable text is in an electronic text file format.

By reconstructing the electronic reading text according to its language type, the text recognition and conversion module converts the read-only electronic reading text into an editable text. This facilitates subsequent adaptive editing of the text and improves its applicability to different reading comprehension scenarios.

Preferably, the text clause and valid word extraction module comprises a text clause division sub-module, a text invalid word removal sub-module and a text valid word extraction sub-module; wherein:

the text valid word extraction sub-module is used for extracting the corresponding valid words from each clause, after the invalid words have been removed, according to a preset valid word bank, wherein each valid word is a verb, an adjective or a noun.

Because the editable text is usually long, dividing it into clauses at its punctuation marks effectively reduces the workload of analyzing and recognizing it and avoids recognition errors caused by its length. In addition, identifying the valid and invalid words in each clause and removing the invalid words reduces the number of words without substantive meaning in the clauses, which lowers the difficulty of analyzing and processing them.

Preferably, the text valid word analysis module comprises a valid word cluster construction sub-module, a feature weight determination sub-module, a comprehensive weight determination sub-module and a keyword determination sub-module; wherein:

the valid word cluster construction sub-module is used for grouping, for each clause, the valid words whose frequency of occurrence is equal to or greater than a preset frequency into a corresponding valid word cluster;

the keyword determination sub-module is used for determining the keywords of the target reading text according to the comprehensive weight value of each valid word cluster.

Preferably, the feature weight determination sub-module is configured to perform the iterative scanning operation on the valid word cluster corresponding to each clause according to the following formula (1) to obtain the feature weight value of each valid word cluster:

In formula (1), Q_f denotes the feature weight value of the f-th valid word cluster; k denotes the number of iterative scans; P_{f,i} denotes the deviation probability of the f-th valid word cluster in the i-th scan; TF_{f,i} denotes the total number of valid words contained in the f-th valid word cluster in the i-th scan; TF_{f-1,i} and TF_{f+1,i} denote the total numbers of valid words contained in the i-th scan by the (f-1)-th and (f+1)-th valid word clusters adjacent to the f-th valid word cluster; T_{f,0} denotes the average total number of valid words contained in the f-th valid word cluster per iterative scan; and s denotes a preset coefficient factor, set to 0.85.

Preferably, the comprehensive weight determination sub-module is configured to calculate the comprehensive weight value of the f-th valid word cluster according to the following formula (2):

the valid word semantic association determination sub-module is used for determining the association between different valid words according to the comprehensive weight values,

The keywords corresponding to all target clusters are then obtained and taken together as the keywords of the target reading text.

Preferably, the text summary result output module is configured to directly use the keyword of the target reading text as the summary information of the target reading text.

Preferably, the text summary result output module is configured to extract at least two target sentences from the target reading text, wherein each target sentence includes at least one keyword of the target reading text and each keyword appears in at least two target sentences, and to take the at least two target sentences as the summary information of the target reading text.

Removing the invalid words from the clauses prevents interfering words from affecting the subsequent keyword extraction, which narrows the extraction range and improves extraction accuracy. Calculating the comprehensive weight values of the valid words allows the keywords to be screened accurately, so that they better reflect the user's actual reading needs.

As can be seen from the above embodiment, the reading tutoring system generated by text summarization technology comprises a text scanning module, a text recognition and conversion module, a text clause and valid word extraction module, a text valid word analysis module, a text semantic recombination determination module and a text summary result output module. Together these modules convert the target reading text into an electronic and then an editable text, divide the editable text into clauses, extract their valid words, determine the keywords of the target reading text from those valid words, recombine the clauses according to the keywords into text semantic information matching the target reading text, and output a text summary consistent with that semantic information. By screening out invalid words and retaining only valid words with substantive meaning, the system quickly and accurately obtains a text summary consistent with the subject matter of the target reading text.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A reading tutoring system generated by text summarization technology, characterized by comprising a text scanning module, a text recognition and conversion module, a text clause and valid word extraction module, a text valid word analysis module, a text semantic recombination determination module and a text summary result output module; wherein:

the text clause and valid word extraction module is used for dividing the editable text into a plurality of clauses and extracting the corresponding valid words from each clause; the text valid word analysis module is used for determining the keywords of the target reading text according to the valid words extracted from each clause;

and the text summary result output module is used for outputting the text summary of the target reading text according to the keywords.

2. The reading tutoring system generated by text summarization technology according to claim 1, wherein:

the text scanning module comprises a structured light scanning projection sub-module, a reflected light receiving sub-module and a reading text electronization sub-module; wherein:

3. The reading tutoring system generated by text summarization technology according to claim 1, wherein:

the text recognition and conversion module comprises a text language determination sub-module and an editable text generation sub-module; wherein:

the editable text generation sub-module is used for reconstructing and editing the electronic reading text according to the text language type so as to generate the editable text, wherein the editable text is in an electronic text file format.

4. The reading tutoring system generated by text summarization technology according to claim 1, wherein:

the text clause and valid word extraction module comprises a text clause division sub-module, a text invalid word removal sub-module and a text valid word extraction sub-module; wherein:

5. The reading tutoring system generated by text summarization technology according to claim 1, wherein:

the text valid word analysis module comprises a valid word cluster construction sub-module, a feature weight determination sub-module, a comprehensive weight determination sub-module and a keyword determination sub-module; wherein:

and the keyword determination sub-module is used for determining the keywords of the target reading text according to the comprehensive weight value of each valid word cluster.

6. The reading tutoring system generated by text summarization technology according to claim 5, wherein:

the feature weight determination sub-module is used for performing the iterative scanning operation on the valid word cluster corresponding to each clause according to the following formula (1) so as to obtain the feature weight value of each valid word cluster:

7. The reading tutoring system generated by text summarization technology according to claim 6, wherein:

the comprehensive weight determination sub-module is used for calculating the comprehensive weight value of the f-th valid word cluster according to the following formula (2):

in formula (2), G_f denotes the comprehensive weight value of the f-th valid word cluster; α_f denotes the weight coefficient of the f-th valid word cluster, a preset value that can be set manually within the range [0.5, 0.8]; R_{f1} denotes the total number of valid words included in the f-th valid word cluster; and R_0 denotes the total number of valid words included in all valid word clusters;

where q_{uv} denotes the association degree between the u-th and the v-th valid words in the current target cluster; ω_{uv} denotes the preset semantic relevance between the u-th and the v-th valid words; y_u and y_v denote the numbers of occurrences of the u-th and the v-th valid words, respectively, in the current target cluster; e_{uv} denotes the number of times the u-th and the v-th valid words appear together in the same valid word cluster across all valid word clusters; and E denotes the total number of valid word clusters;

the association degrees calculated in the current target cluster are sorted in descending order, and the valid words corresponding to the top U association degrees are taken as the keywords of the current target cluster; the keyword determination sub-module is used for performing the following operation: obtaining the keywords corresponding to all target clusters and taking them together as the keywords of the target reading text.

8. The reading tutoring system generated by text summarization technology according to claim 7, wherein:

the text summary result output module is used for directly taking the keywords of the target reading text as the summary information of the target reading text.

9. The reading tutoring system generated by text summarization technology according to claim 7, wherein:

the text summary result output module is used for extracting at least two target sentences from the target reading text, wherein each target sentence includes at least one keyword of the target reading text and each keyword appears in at least two target sentences;