CN112015889A - Reading tutoring system generated by text summarization technology - Google Patents

Publication number
CN112015889A
Authority
CN
China
Prior art keywords
text
effective
module
reading
effective word
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010832555.8A
Other languages
Chinese (zh)
Inventor
樊星
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Squirrel Classroom Artificial Intelligence Technology Co Ltd
Original Assignee
Shanghai Squirrel Classroom Artificial Intelligence Technology Co Ltd
Application filed by Shanghai Squirrel Classroom Artificial Intelligence Technology Co Ltd
Priority to CN202010832555.8A
Publication of CN112015889A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34 Browsing; Visualisation therefor
    • G06F16/345 Summarisation for human users
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/151 Transformation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/263 Language identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition

Abstract

The invention provides a reading tutoring system generated by a text summarization technology. The system scans a target reading text and converts it into an editable text, divides the editable text into a plurality of clauses, extracts the effective words from the clauses, determines keywords from all the effective words, recombines the clauses according to the keywords to form text semantic information matching the target reading text, and outputs a text summarization result consistent with that semantic information.

Description

Reading tutoring system generated by text summarization technology
Technical Field
The invention relates to the technical field of intelligent education, in particular to a reading tutoring system generated through a text summarization technology.
Background
The main purpose of reading comprehension is to extract the main idea of a text. In actual reading, however, the target reading text contains a large number of words of different types, such as nouns, verbs, adjectives, adverbs, pronouns and auxiliary words, some of which carry substantive meaning in the text and some of which do not. Therefore, there is a need in the art for a reading assistance system that can quickly and accurately obtain a text abstract matching the subject matter of a target reading text.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a reading tutoring system generated by a text summarization technology, which comprises a text scanning module, a text recognition conversion module, a text clause and effective word extraction module, a text effective word analysis module, a text semantic recombination determination module and a text summarization result output module. The text scanning module is used for scanning a target reading text so as to convert the target reading text into an electronic reading text. The text recognition conversion module is used for determining the text language type of the electronic reading text and converting the electronic reading text into an editable text according to that language type. The text clause and effective word extraction module is used for dividing the editable text into a plurality of clauses and extracting the corresponding effective words from each clause. The text effective word analysis module is used for determining the keywords of the target reading text according to the effective words extracted from each clause. The text summarization result output module is used for outputting the text abstract of the target reading text according to the keywords. The system thus scans the target reading text and converts it into an editable text, divides the editable text into clauses and extracts their effective words, determines keywords from all the effective words, recombines the clauses according to the keywords to form text semantic information matching the target reading text, and outputs a text summarization result consistent with that semantic information. It can screen out the invalid words in the target reading text, retain the effective words with substantive meaning, and generate the text summarization result from the effective words, thereby quickly and accurately obtaining a text abstract consistent with the subject matter of the target reading text.
The invention provides a reading tutoring system generated by a text abstract technology, which is characterized by comprising a text scanning module, a text recognition and conversion module, a text clause and effective word extraction module, a text effective word analysis module, a text semantic recombination determination module and a text abstract result output module; wherein:
the text scanning module is used for scanning a target reading text so as to convert the target reading text into an electronic reading text;
the text recognition and conversion module is used for determining the text language type of the electronic reading text and converting the electronic reading text into an editable text according to the text language type;
the text clause and effective word extraction module is used for dividing the editable text into a plurality of clauses and extracting the corresponding effective words from each clause;
the text effective word analysis module is used for determining the keywords of the target reading text according to the effective words extracted from each clause;
the text abstract result output module is used for outputting the text abstract of the target reading text according to the keywords;
further, the text scanning module comprises a structured light scanning projection sub-module, a reflected light receiving sub-module and a reading text electronization sub-module; wherein:
the structured light scanning projection sub-module is used for periodically scanning and projecting structured light with a light-and-dark stripe pattern onto the target reading text;
the reflected light receiving sub-module is used for receiving the reflected light formed when the structured light reaches the target reading text and is reflected by it;
the reading text electronization sub-module is used for generating the electronic reading text according to the light intensity distribution of the reflected light, wherein the electronic reading text has an electronic image file format;
further, the text recognition and conversion module comprises a text language determination sub-module and an editable text generation sub-module; wherein:
the text language determination submodule is used for determining the text language type of the electronic reading text according to any text fragment in the electronic reading text;
the editable text generation sub-module is used for reconstructing and editing the electronic reading text according to the language type of the text so as to generate the editable text, wherein the editable text has an electronic text file format;
further, the text clause and effective word extraction module comprises a text clause division sub-module, a text invalid word removing sub-module and a text effective word extraction sub-module; wherein:
the text clause division sub-module is used for dividing the editable text into n clauses according to all punctuation marks in the editable text;
the text invalid word removing sub-module is used for identifying the invalid words contained in each clause according to a preset invalid word bank and removing those invalid words from each clause, wherein the invalid words comprise any one of adverbs and pronouns;
the text effective word extraction sub-module is used for extracting, according to a preset effective word bank, the corresponding effective words from each clause after the invalid words are removed, wherein the effective words comprise any one of verbs, adjectives and nouns;
further, the text effective word analysis module comprises an effective word cluster construction sub-module, a characteristic weight determination sub-module, a comprehensive weight determination sub-module and a keyword determination sub-module; wherein:
the effective word cluster construction submodule is used for forming effective words with the occurrence frequency equal to or greater than the preset frequency in each clause into corresponding effective word clusters;
the characteristic weight determining submodule is used for carrying out iterative scanning operation on the effective word cluster corresponding to each clause so as to obtain a characteristic weight value corresponding to each effective word cluster;
the comprehensive weight determining submodule is used for calculating the comprehensive weight value of each effective word cluster according to the characteristic weight value corresponding to each effective word cluster;
the keyword determining submodule is used for determining keywords of the target reading text according to the comprehensive weight value of each effective word cluster;
further, the effective word cluster feature weight determining submodule is configured to perform iterative scanning operation on the effective word cluster corresponding to each clause according to the following formula (1), so as to obtain a feature weight value of each effective word cluster:
Figure BDA0002638524890000041
in the above formula (1), $Q_f$ represents the characteristic weight value corresponding to the f-th effective word cluster; $k$ denotes the number of iterative scans; $P_{fi}$ represents the deviation probability corresponding to the i-th scan of the f-th effective word cluster; $TF_{f,i}$ denotes the total number of effective words contained in the f-th effective word cluster in the i-th scan; $TF_{f-1,i}$ denotes the total number of effective words contained, in the i-th scan, in the (f-1)-th effective word cluster adjacent to the f-th effective word cluster; $TF_{f+1,i}$ denotes the total number of effective words contained, in the i-th scan, in the (f+1)-th effective word cluster adjacent to the f-th effective word cluster; $T_{f,0}$ represents the average total number of effective words contained in the f-th effective word cluster per iterative scan; and $s$ represents a preset coefficient factor with a value of 0.85;
further, the effective word comprehensive weight determining submodule is configured to calculate and obtain a comprehensive weight value corresponding to the f-th effective word cluster according to the following formula (2):
Figure BDA0002638524890000042
in the above formula (2), $G_f$ represents the comprehensive weight value of the f-th effective word cluster; $\alpha_f$ represents the weight coefficient of the f-th effective word cluster, which is a preset, manually adjustable value taken from the range [0.5, 0.8]; $R_{f1}$ represents the total number of effective words included in the f-th effective word cluster; $R_0$ represents the total number of effective words included in all effective word clusters;
the effective word semantic association determining submodule is used for determining the association among different effective words according to the comprehensive weight value, by the following specific process:
firstly, performing descending arrangement on the comprehensive weight values of all effective word clusters, and determining c effective word clusters ranked at the top c as target clusters;
secondly, for each target cluster, calculating the association degree between d effective words included in the current target cluster according to the following formula (3):
Figure BDA0002638524890000051
wherein $q_{uv}$ is the association degree between the u-th effective word and the v-th effective word in the current target cluster; $\omega_{uv}$ represents the preset semantic relevance between the u-th effective word and the v-th effective word; $y_u$ represents the number of occurrences of the u-th effective word in the current target cluster; $y_v$ represents the number of occurrences of the v-th effective word in the current target cluster; $e_{uv}$ represents the number of times the u-th effective word and the v-th effective word appear together in the same effective word cluster; $e$ represents the number of all effective word clusters;
sorting the association degrees calculated in the current target cluster from largest to smallest to obtain the top U association degrees, and taking the effective words corresponding to the top U association degrees as the keywords of the current target cluster;
the keyword determination submodule is used for executing the following operations:
acquiring keywords corresponding to all target clusters, and taking the keywords corresponding to all the target clusters as keywords of the target reading text;
further, the text abstract result output module is used for directly taking the key words of the target reading text as the abstract information of the target reading text;
further, the text abstract result output module is configured to extract at least two target sentences from the target reading text, where each target sentence includes at least one keyword of the target reading text, and each keyword appears in at least two target sentences;
and taking the at least two target sentences as abstract information of the target reading text.
Compared with the prior art, the reading tutoring system generated by the text summarization technology comprises a text scanning module, a text recognition conversion module, a text clause and effective word extraction module, a text effective word analysis module, a text semantic recombination determination module and a text summarization result output module. The text scanning module is used for scanning a target reading text so as to convert the target reading text into an electronic reading text. The text recognition conversion module is used for determining the text language type of the electronic reading text and converting the electronic reading text into an editable text according to that language type. The text clause and effective word extraction module is used for dividing the editable text into a plurality of clauses and extracting the corresponding effective words from each clause. The text effective word analysis module is used for determining the keywords of the target reading text according to the effective words extracted from each clause. The text summarization result output module is used for outputting the text abstract of the target reading text according to the keywords. The system thus scans the target reading text and converts it into an editable text, divides the editable text into clauses and extracts their effective words, determines keywords from all the effective words, recombines the clauses according to the keywords to form text semantic information matching the target reading text, and outputs a text summarization result consistent with that semantic information. It can screen out the invalid words in the target reading text, retain the effective words with substantive meaning, and generate the text summarization result from the effective words, thereby quickly and accurately obtaining a text abstract consistent with the subject matter of the target reading text.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a reading tutoring system generated by a text summarization technology according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic structural diagram of a reading tutoring system generated by a text summarization technology according to an embodiment of the present invention. The reading tutoring system generated by the text abstract technology comprises a text scanning module, a text recognition conversion module, a text clause and effective word extraction module, a text effective word analysis module, a text semantic recombination determination module and a text abstract result output module; wherein:
the text scanning module is used for scanning a target reading text so as to convert the target reading text into an electronic reading text;
the text recognition conversion module is used for determining the text language type of the electronic reading text and converting the electronic reading text into an editable text according to the text language type;
the text clause and effective word extraction module is used for dividing the editable text into a plurality of clauses and extracting the corresponding effective words from each clause;
the text effective word analysis module is used for determining the keywords of the target reading text according to the effective words extracted from each clause;
the text abstract result output module is used for outputting the text abstract of the target reading text according to the keywords.
Unlike the prior art, in which all words in the target reading text are recognized and understood in order to determine its subject matter, the reading tutoring system generated by the text summarization technology performs clause segmentation, identification of invalid and effective words, removal of the invalid words, determination of keywords from the effective words, and recombination and output of a text abstract result on the target reading text. This decomposes a longer text into several shorter, more easily understood clauses, and removing the invalid words from each clause reduces the workload of semantic analysis on that clause. The corresponding text abstract is then recombined according to the keywords of the effective words, so that the main content of the target reading text can be understood quickly and accurately and the workload of understanding and analyzing the target reading text is effectively reduced.
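The flow of data between the modules described above can be sketched as a simple pipeline. This is only an illustrative sketch, not the patented implementation: the class name `ReadingTutorPipeline`, the field names and the stage signatures are all assumptions made for this example, and the text semantic recombination determination module is left out because the description does not specify its behaviour.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class ReadingTutorPipeline:
    """Chains the stages named in the description; every stage is injected.

    All names and signatures here are hypothetical.
    """
    scan: Callable[[str], bytes]                      # target text source -> electronic image
    recognize: Callable[[bytes], str]                 # electronic image -> editable text
    extract: Callable[[str], List[List[str]]]         # editable text -> effective words per clause
    analyze: Callable[[List[List[str]]], List[str]]   # effective words -> keywords
    output: Callable[[str, List[str]], str]           # editable text + keywords -> abstract

    def run(self, source: str) -> str:
        image = self.scan(source)
        text = self.recognize(image)
        words_per_clause = self.extract(text)
        keywords = self.analyze(words_per_clause)
        return self.output(text, keywords)
```

Each stage can be replaced independently, for example by swapping a stub `scan` for a real scanner driver, without touching the other stages.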
Preferably, the text scanning module comprises a structured light scanning projection sub-module, a reflected light receiving sub-module and a reading text electronization sub-module; wherein:
the structured light scanning projection sub-module is used for periodically scanning and projecting structured light with a light-and-dark stripe pattern onto the target reading text;
the reflected light receiving sub-module is used for receiving the reflected light formed when the structured light reaches the target reading text and is reflected by it;
the reading text electronization sub-module is used for generating the electronic reading text according to the light intensity distribution of the reflected light, wherein the electronic reading text has an electronic image file format.
Because the target reading text usually exists in paper form rather than in an electronically recognizable file format, the text scanning module converts it into an electronic reading text. This allows the target reading text to be processed electronically in subsequent steps and improves the applicability of the reading tutoring system to different types of target reading texts.
Preferably, the text recognition and conversion module comprises a text language determination sub-module and an editable text generation sub-module; wherein:
the text language determination submodule is used for determining the text language type of the electronic reading text according to any text segment in the electronic reading text;
the editable text generation submodule is used for reconstructing and editing the electronic reading text according to the language type of the text so as to generate the editable text, wherein the editable text has an electronic text file format.
The text recognition conversion module reconstructs and edits the electronic reading text according to its language type, converting the read-only electronic reading text into an editable text. This facilitates subsequent editing of the text and improves its applicability to different reading comprehension scenarios.
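Determining the language type from a single text fragment could be approximated as shown below. This is a minimal sketch under strong assumptions: it presumes the fragment has already been recognized into characters, it only distinguishes Chinese from Latin-script text, and the function name `detect_language` and the labels `"zh"`, `"en"` and `"unknown"` are hypothetical. The description does not state how the sub-module actually decides.

```python
def detect_language(fragment: str) -> str:
    """Guess 'zh' or 'en' from the share of CJK vs. Latin letters in a fragment."""
    # Count characters in the CJK Unified Ideographs block.
    cjk = sum(1 for ch in fragment if "\u4e00" <= ch <= "\u9fff")
    # Count ASCII letters as a stand-in for Latin-script text.
    latin = sum(1 for ch in fragment if ch.isascii() and ch.isalpha())
    if cjk == 0 and latin == 0:
        return "unknown"
    return "zh" if cjk >= latin else "en"
```

A production system would more likely rely on a dedicated language identification model; the character-range heuristic is used here only because it is self-contained.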
Preferably, the text clause and effective word extraction module comprises a text clause division sub-module, a text invalid word removing sub-module and a text effective word extraction sub-module; wherein:
the text clause division sub-module is used for dividing the editable text into n clauses according to all punctuation marks in the editable text;
the text invalid word removing sub-module is used for identifying the invalid words contained in each clause according to a preset invalid word bank and removing those invalid words from each clause, wherein the invalid words comprise any one of adverbs and pronouns;
the text effective word extraction sub-module is used for extracting, according to a preset effective word bank, the corresponding effective words from each clause after the invalid words are removed, wherein the effective words comprise any one of verbs, adjectives and nouns.
Because the editable text is usually long, dividing it into a plurality of clauses according to its punctuation marks effectively reduces the workload of analyzing and identifying the text and avoids recognition errors caused by overly long passages. In addition, identifying the invalid and effective words in each clause and removing the invalid words reduces the number of words without substantive meaning in the clauses, which effectively lowers the difficulty of analyzing and processing them.
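The clause division and word filtering steps can be illustrated as follows. This is a hedged sketch: the two word banks are tiny hypothetical stand-ins for the "preset" banks mentioned in the description, tokenization by whitespace assumes English-like text (Chinese text would need a word segmenter), and the function names are invented for this example.

```python
import re
from typing import List, Set

# Hypothetical word banks; the description only says they are preset.
INVALID_WORDS: Set[str] = {"he", "she", "it", "they", "very", "quickly"}
VALID_WORDS: Set[str] = {"student", "reads", "book", "long", "teacher", "explains"}

# Split on runs of common Western and Chinese punctuation marks.
CLAUSE_BREAK = re.compile(r"[,.;:!?，。；：！？]+")


def split_clauses(text: str) -> List[str]:
    """Divide the editable text into clauses at punctuation marks."""
    return [c.strip() for c in CLAUSE_BREAK.split(text) if c.strip()]


def effective_words(clause: str) -> List[str]:
    """Remove invalid words, then keep only words found in the valid word bank."""
    tokens = clause.lower().split()
    kept = [t for t in tokens if t not in INVALID_WORDS]
    return [t for t in kept if t in VALID_WORDS]
```

For the input "The student reads a long book, and she reads very quickly. The teacher explains it.", `split_clauses` yields three clauses and `effective_words` reduces the second one to the single word "reads".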
Preferably, the text effective word analysis module comprises an effective word cluster construction sub-module, a characteristic weight determination sub-module, a comprehensive weight determination sub-module and a keyword determination sub-module; wherein:
the effective word cluster construction submodule is used for forming effective words with the occurrence frequency equal to or more than the preset frequency in each clause into corresponding effective word clusters;
the characteristic weight determining submodule is used for carrying out iterative scanning operation on the effective word cluster corresponding to each clause so as to obtain a characteristic weight value corresponding to each effective word cluster;
the comprehensive weight determining submodule is used for calculating the comprehensive weight value of each effective word cluster according to the characteristic weight value corresponding to each effective word cluster;
the keyword determining submodule is used for determining the keywords of the target reading text according to the comprehensive weight value of each effective word cluster.
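The cluster construction step can be sketched as below. The description leaves open whether the occurrence frequency is counted within one clause or over the whole text; this sketch assumes the latter, and the function name `build_clusters` and the default threshold `min_freq=2` are hypothetical. The weight calculations are not shown because the corresponding formulas are not reproduced in this text.

```python
from collections import Counter
from typing import List


def build_clusters(words_per_clause: List[List[str]], min_freq: int = 2) -> List[List[str]]:
    """Form one effective word cluster per clause from sufficiently frequent words.

    Assumption: frequency is counted across all clauses of the text.
    """
    totals = Counter(w for clause in words_per_clause for w in clause)
    clusters = []
    for clause in words_per_clause:
        # dict.fromkeys removes duplicates while keeping first-seen order.
        cluster = [w for w in dict.fromkeys(clause) if totals[w] >= min_freq]
        if cluster:
            clusters.append(cluster)
    return clusters
```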
Preferably, the effective word cluster feature weight determining submodule is configured to perform iterative scanning operation on the effective word cluster corresponding to each clause according to the following formula (1), so as to obtain the feature weight value of each effective word cluster:
Figure BDA0002638524890000101
in the above formula (1), $Q_f$ represents the characteristic weight value corresponding to the f-th effective word cluster; $k$ denotes the number of iterative scans; $P_{fi}$ represents the deviation probability corresponding to the i-th scan of the f-th effective word cluster; $TF_{f,i}$ denotes the total number of effective words contained in the f-th effective word cluster in the i-th scan; $TF_{f-1,i}$ denotes the total number of effective words contained, in the i-th scan, in the (f-1)-th effective word cluster adjacent to the f-th effective word cluster; $TF_{f+1,i}$ denotes the total number of effective words contained, in the i-th scan, in the (f+1)-th effective word cluster adjacent to the f-th effective word cluster; $T_{f,0}$ represents the average total number of effective words contained in the f-th effective word cluster per iterative scan; and $s$ represents a preset coefficient factor with a value of 0.85.
Preferably, the valid word comprehensive weight determining sub-module is configured to calculate and obtain a comprehensive weight value corresponding to the f-th valid word cluster according to the following formula (2):
Figure BDA0002638524890000102
in the above formula (2), $G_f$ represents the comprehensive weight value of the f-th effective word cluster; $\alpha_f$ represents the weight coefficient of the f-th effective word cluster, which is a preset, manually adjustable value taken from the range [0.5, 0.8]; $R_{f1}$ represents the total number of effective words included in the f-th effective word cluster; $R_0$ represents the total number of effective words included in all effective word clusters;
the effective word semantic association determining submodule is used for determining the association among different effective words according to the comprehensive weight value, by the following specific process:
firstly, performing descending arrangement on the comprehensive weight values of all effective word clusters, and determining c effective word clusters ranked at the top c as target clusters;
secondly, for each target cluster, calculating the association degree between d effective words included in the current target cluster according to the following formula (3):
Figure BDA0002638524890000111
wherein $q_{uv}$ is the association degree between the u-th effective word and the v-th effective word in the current target cluster; $\omega_{uv}$ represents the preset semantic relevance between the u-th effective word and the v-th effective word; $y_u$ represents the number of occurrences of the u-th effective word in the current target cluster; $y_v$ represents the number of occurrences of the v-th effective word in the current target cluster; $e_{uv}$ represents the number of times the u-th effective word and the v-th effective word appear together in the same effective word cluster; $e$ represents the number of all effective word clusters;
sorting the association degrees calculated in the current target cluster from largest to smallest to obtain the top U association degrees, and taking the effective words corresponding to the top U association degrees as the keywords of the current target cluster;
the keyword determination submodule is configured to perform the following operation:
acquiring the keywords corresponding to all target clusters, and taking them together as the keywords of the target reading text.
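The selection steps described above (rank clusters by comprehensive weight, keep the top c, then keep the words behind the top U association degrees in each target cluster) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the comprehensive weights and association degrees are assumed to have been computed already by formulas (2) and (3), which are not reproduced in this text, and word pairs are assumed to be stored as tuples.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def select_target_clusters(weights: Dict[str, float], c: int) -> List[str]:
    """Return the ids of the c clusters with the largest comprehensive weight."""
    ranked = sorted(weights, key=lambda cid: weights[cid], reverse=True)
    return ranked[:c]


def keywords_for_cluster(assoc: Dict[Pair, float], top_u: int) -> List[str]:
    """Keep the top_u word pairs by association degree and return their words."""
    ranked = sorted(assoc.items(), key=lambda kv: kv[1], reverse=True)[:top_u]
    keywords: List[str] = []
    for (u, v), _degree in ranked:
        for word in (u, v):
            if word not in keywords:  # keep each keyword once, in rank order
                keywords.append(word)
    return keywords


def document_keywords(
    weights: Dict[str, float],
    assoc_by_cluster: Dict[str, Dict[Pair, float]],
    c: int,
    top_u: int,
) -> List[str]:
    """Union of the keywords of all target clusters."""
    result: List[str] = []
    for cid in select_target_clusters(weights, c):
        for word in keywords_for_cluster(assoc_by_cluster[cid], top_u):
            if word not in result:
                result.append(word)
    return result


# Hypothetical, hand-made inputs for illustration only.
weights = {"a": 0.9, "b": 0.2, "c": 0.5}
assoc_by_cluster = {
    "a": {("read", "book"): 0.8, ("read", "page"): 0.3},
    "b": {("ink", "pen"): 0.7},
    "c": {("text", "summary"): 0.6},
}
keywords = document_keywords(weights, assoc_by_cluster, c=2, top_u=1)
```

Treating each top-ranked association degree as contributing both of its words is one possible reading of "effective words corresponding to the association degree"; other readings would change only `keywords_for_cluster`.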
Preferably, the text summary result output module is configured to directly use the keywords of the target reading text as the summary information of the target reading text.
Preferably, the text summary result output module is configured to extract at least two target sentences from the target reading text, where each target sentence includes at least one keyword of the target reading text, and each keyword appears in at least two target sentences;
and taking the at least two target sentences as abstract information of the target reading text.
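One way to satisfy the two conditions above (every target sentence contains at least one keyword, and every keyword appears in at least two target sentences) is a simple greedy pass over the sentences. The sketch below is an assumption about how such a selection could work: the function name and the `per_keyword` parameter are hypothetical, keyword matching is plain substring matching, and a keyword that occurs in fewer than two sentences is covered as far as the text allows.

```python
from typing import List


def select_target_sentences(
    sentences: List[str], keywords: List[str], per_keyword: int = 2
) -> List[str]:
    """Pick sentences so that each keyword is covered per_keyword times if possible."""
    chosen = set()  # indices of selected sentences
    for kw in keywords:
        hits = [i for i, s in enumerate(sentences) if kw in s]
        # Sentences already selected for an earlier keyword count toward coverage.
        need = per_keyword - sum(1 for i in hits if i in chosen)
        for i in hits:
            if need <= 0:
                break
            if i not in chosen:
                chosen.add(i)
                need -= 1
    # Return the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


# Hypothetical example input.
sentences = ["the cat sat", "a dog ran", "the cat and dog met", "birds fly"]
summary = select_target_sentences(sentences, ["cat", "dog"])
```

Because a sentence is only ever added when it contains the keyword being processed, the first condition holds by construction; the second holds whenever the text contains enough matching sentences.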
Removing the invalid words from the sentences keeps interfering vocabulary from affecting the subsequent keyword extraction, which narrows the range of candidate keywords and improves the accuracy of the extraction. Calculating the comprehensive weight values of the effective words allows the keywords to be screened accurately, so that they better match the actual reading needs of users.
As can be seen from the above embodiment, the reading tutoring system generated by the text summarization technology includes a text scanning module, a text recognition and conversion module, a text clause and effective word extraction module, a text effective word analysis module, a text semantic recombination determination module, and a text summary result output module. The text scanning module scans a target reading text so as to convert it into an electronic reading text. The text recognition and conversion module determines the text language type of the electronic reading text and converts the electronic reading text into an editable text according to that language type. The text clause and effective word extraction module divides the editable text into a plurality of clauses and extracts the corresponding effective words from each clause. The text effective word analysis module determines the keywords of the target reading text according to the effective words extracted from each clause. The text summary result output module outputs the text summary of the target reading text according to the keywords.
Therefore, the system scans and converts the target reading text into an editable text, divides the editable text into clauses and extracts the effective words in them, determines keywords from all the effective words, recombines the clauses according to the keywords to form text semantic information matching the target reading text, and outputs a text summary result consistent with that semantic information. Invalid words in the target reading text are screened out so that only effective words with substantive meaning are retained, and the text summary result is generated from all the effective words, so that a text summary consistent with the subject matter is obtained quickly and accurately from the target reading text.
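The clause-splitting and effective-word extraction stage summarized above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the system's actual implementation: the function names are hypothetical, the punctuation set is an assumed choice, tokens are separated by whitespace (which suits English but not unsegmented Chinese text), and the invalid and effective word banks are tiny hand-made sets standing in for the preset word banks.

```python
import re
from typing import List, Set

# Assumed clause delimiters: common Western and Chinese punctuation marks.
CLAUSE_BREAK = re.compile(r"[,.;:!?，。；：！？]+")


def split_clauses(text: str) -> List[str]:
    """Split editable text into clauses at punctuation, dropping empty pieces."""
    return [c.strip() for c in CLAUSE_BREAK.split(text) if c.strip()]


def extract_effective_words(
    clause: str, invalid_words: Set[str], effective_words: Set[str]
) -> List[str]:
    """Remove invalid words first, then keep only words found in the effective bank."""
    tokens = clause.lower().split()
    kept = [t for t in tokens if t not in invalid_words]
    return [t for t in kept if t in effective_words]


# Hypothetical word banks and input text for illustration only.
invalid_bank = {"she", "quickly", "daily"}   # e.g. pronouns and adverbs
effective_bank = {"reads", "books", "long"}  # e.g. verbs, nouns, adjectives

text = "She quickly reads books. The books are long, and she reads daily."
clauses = split_clauses(text)
words_per_clause = [
    extract_effective_words(c, invalid_bank, effective_bank) for c in clauses
]
```

The two-step filter mirrors the order given in the description: invalid words are removed before the effective word bank is consulted, so a word listed in both banks would be discarded.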
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (9)

1. A reading tutoring system generated by a text summarization technology, characterized by comprising a text scanning module, a text recognition and conversion module, a text clause and effective word extraction module, a text effective word analysis module, a text semantic recombination determination module and a text abstract result output module; wherein,
the text scanning module is used for scanning a target reading text so as to convert the target reading text into an electronic reading text;
the text recognition and conversion module is used for determining the language type of the text of the electronic reading text and converting the electronic reading text into an editable text according to the language type of the text;
the text clause and effective word extraction module is used for carrying out clause processing on the editable text so as to obtain a plurality of clauses, and extracting corresponding effective words from each clause;
the text effective word analysis module is used for determining the keywords corresponding to the target reading text according to the effective words extracted from each clause;
and the text abstract result output module is used for outputting the text abstract of the target reading text according to the key words.
2. The reading tutoring system generated by the text summarization technology of claim 1, wherein:
the text scanning module comprises a structured light scanning projection sub-module, a reflected light receiving sub-module and a reading text electronization sub-module; wherein,
the structured light scanning projection submodule is used for periodically scanning and projecting structured light with light and shade stripe distribution to the target reading text;
the reflected light receiving submodule is used for receiving the reflected light formed when the structured light reaches the target reading text and is reflected by it;
the reading text electronization submodule is used for generating the electronized reading text according to the light intensity distribution corresponding to the reflected light, wherein the electronized reading text has an electronic image file format.
3. The reading tutoring system generated by the text summarization technology of claim 1, wherein:
the text recognition and conversion module comprises a text language determination sub-module and an editable text generation sub-module; wherein,
the text language determination submodule is used for determining the text language type of the electronic reading text according to any text fragment in the electronic reading text;
the editable text generation sub-module is used for reconstructing and editing the electronic reading text according to the language type of the text so as to generate the editable text, wherein the editable text has an electronic text file format.
4. The reading tutoring system generated by the text summarization technology of claim 1, wherein:
the text clause and effective word extraction module comprises a text clause division sub-module, a text invalid word removing sub-module and a text effective word extraction sub-module; wherein,
the text clause division sub-module is used for dividing the editable text into n clauses according to all punctuation marks in the editable text;
the text invalid word removing sub-module is used for identifying invalid words contained in each clause according to a preset invalid word bank and removing invalid words corresponding to each clause, wherein the invalid words comprise any one of adverbs and pronouns;
the text effective word extraction submodule is used for extracting corresponding effective words from each clause after the invalid words are removed according to a preset effective word bank, wherein the effective words comprise any one of verbs, adjectives and nouns.
5. The reading tutoring system generated by the text summarization technology of claim 1, wherein:
the text effective word analysis module comprises an effective word cluster construction sub-module, a characteristic weight determination sub-module, a comprehensive weight determination sub-module and a keyword determination sub-module; wherein,
the effective word cluster construction submodule is used for forming effective words with the occurrence frequency equal to or greater than the preset frequency in each clause into corresponding effective word clusters;
the characteristic weight determining submodule is used for carrying out iterative scanning operation on the effective word cluster corresponding to each clause so as to obtain a characteristic weight value corresponding to each effective word cluster;
the comprehensive weight determining submodule is used for calculating the comprehensive weight value of each effective word cluster according to the characteristic weight value corresponding to each effective word cluster;
and the keyword determining submodule is used for determining the keywords of the target reading text according to the comprehensive weight value of each effective word cluster.
6. The reading tutoring system generated by the text summarization technology of claim 5, wherein:
the effective word cluster characteristic weight determining submodule is used for carrying out iterative scanning operation on the effective word cluster corresponding to each clause according to the following formula (1) so as to obtain the characteristic weight value of each effective word cluster:
Figure FDA0002638524880000031
in the above formula (1), Q_f represents the characteristic weight value corresponding to the f-th effective word cluster; k represents the number of iterative scans; P_fi represents the deviation probability corresponding to the i-th scan of the f-th effective word cluster; TF_{f,i} represents the total number of effective words contained in the f-th effective word cluster in the i-th scan; TF_{f-1,i} represents the total number of effective words contained, in the i-th scan, in the (f-1)-th effective word cluster adjacent to the f-th effective word cluster; TF_{f+1,i} represents the total number of effective words contained, in the i-th scan, in the (f+1)-th effective word cluster adjacent to the f-th effective word cluster; T_{f,0} represents the average total number of effective words contained in the f-th effective word cluster per iterative scan; and s represents a preset coefficient factor with the value 0.85.
7. The reading tutoring system generated by the text summarization technology of claim 6, wherein:
the effective word comprehensive weight determining submodule is used for calculating the comprehensive weight value corresponding to the f-th effective word cluster according to the following formula (2):
Figure FDA0002638524880000041
in the above formula (2), G_f represents the comprehensive weight value of the f-th effective word cluster; α_f represents the weight coefficient of the f-th effective word cluster, a preset value that can be set manually within the range [0.5, 0.8]; R_f1 represents the total number of effective words included in the f-th effective word cluster; R_0 represents the total number of effective words included in all effective word clusters;
the effective word semantic relevance determining submodule is used for determining the degree of association between different effective words according to the comprehensive weight values, by the following process:
first, sorting the comprehensive weight values of all effective word clusters in descending order, and taking the top c effective word clusters as target clusters;
second, for each target cluster, calculating the association degree between the d effective words included in the current target cluster according to the following formula (3):
Figure FDA0002638524880000042
wherein q_uv is the association degree between the u-th effective word and the v-th effective word in the current target cluster; ω_uv represents the preset semantic relevance between the u-th effective word and the v-th effective word; Y_u represents the number of occurrences of the u-th effective word in the current target cluster; Y_v represents the number of occurrences of the v-th effective word in the current target cluster; E_uv represents the number of times, across all effective word clusters, that the u-th effective word and the v-th effective word appear together in the same effective word cluster; E represents the total number of effective word clusters;
then sorting the association degrees calculated in the current target cluster in descending order, taking the top U association degrees, and using the effective words corresponding to those top U association degrees as the keywords of the current target cluster;
the keyword determination submodule is used for performing the following operation:
acquiring the keywords corresponding to all target clusters, and taking them together as the keywords of the target reading text.
8. The reading tutoring system generated by the text summarization technology of claim 7, wherein:
the text abstract result output module is used for directly taking the key words of the target reading text as the abstract information of the target reading text.
9. The reading tutoring system generated by the text summarization technology of claim 7, wherein:
the text abstract result output module is used for extracting at least two target sentences from the target reading text, wherein each target sentence includes at least one keyword of the target reading text, and each keyword appears in at least two target sentences;
and taking the at least two target sentences as abstract information of the target reading text.
CN202010832555.8A 2020-08-18 2020-08-18 Reading tutoring system generated by text summarization technology Pending CN112015889A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010832555.8A CN112015889A (en) 2020-08-18 2020-08-18 Reading tutoring system generated by text summarization technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010832555.8A CN112015889A (en) 2020-08-18 2020-08-18 Reading tutoring system generated by text summarization technology

Publications (1)

Publication Number Publication Date
CN112015889A true CN112015889A (en) 2020-12-01

Family

ID=73504921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010832555.8A Pending CN112015889A (en) 2020-08-18 2020-08-18 Reading tutoring system generated by text summarization technology

Country Status (1)

Country Link
CN (1) CN112015889A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113378566A (en) * 2021-05-31 2021-09-10 安徽淘云科技股份有限公司 Information content display method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination