CN113360608B

CN113360608B - Man-machine combined Chinese composition correcting system and method

Info

Publication number: CN113360608B
Application number: CN202110774531.6A
Authority: CN
Inventors: 杨林; 雷思东
Original assignee: Beijing One Stroke Two Stroke Technology Co ltd; Beijing Yueshen Intelligent Technology Co ltd
Current assignee: Beijing One Stroke Two Stroke Technology Co ltd; Beijing Yueshen Intelligent Technology Co ltd
Priority date: 2021-07-08
Filing date: 2021-07-08
Publication date: 2023-10-20
Anticipated expiration: 2041-07-08
Also published as: CN113360608A

Abstract

The application relates to a man-machine combined Chinese composition correction system and method. The system comprises a composition acquisition system, a preprocessing system, a correction system and a material recommendation system. The preprocessing system preprocesses the to-be-corrected composition, which the composition acquisition system acquires in picture format; the correction system then corrects it automatically and places the correction information on the original picture of the composition paper, so that teachers and students can see the correction results visually. The correction information is editable, so a teacher can further modify it based on his or her own experience and bring the correction result closer to the actual situation. In addition, the material recommendation system can automatically recommend excellent composition materials according to the deficiencies identified in the correction result, helping students improve their writing ability. The technical scheme of the application thus addresses the problems of the prior art by presenting correction results visually and providing more functions.

Description

Man-machine combined Chinese composition correcting system and method

Technical Field

The application relates to the technical field of computers, in particular to a man-machine combined Chinese composition correction system and method.

Background

Natural language processing (NLP) technology is gradually being applied to Chinese composition and related fields, where a computer can take over part of a teacher's more routine work, such as basic-dimension diagnosis and statistical analysis of compositions.

Most existing automatic composition correction systems require a two-stage operation: optical character recognition (OCR) first converts the uploaded composition picture into text, and the converted text is then analyzed and corrected using NLP technology. The correction result is ultimately displayed as plain text only and cannot be synchronized to the composition paper, so its presentation is not visual. Moreover, most existing systems implement only the correction function, so their functionality is limited.

Disclosure of Invention

The application provides a man-machine combined Chinese composition correction system and method, which aim to solve the problems that existing automatic composition correction systems present correction results in a non-visual way and offer only a single function.

The above object of the present application is achieved by the following technical solutions:

In a first aspect, an embodiment of the present application provides a man-machine combined Chinese composition correction system, including:

a composition acquisition system, used for acquiring a to-be-corrected composition in a picture format uploaded by a user, wherein the picture format includes the PDF format;

a preprocessing system, used for performing layout analysis on the acquired to-be-corrected composition by using an OCR engine so as to extract the actual composition area, obtaining text position coordinate information and text content information, and performing title extraction and paragraph segmentation;

a correction system, used for correcting the text content information obtained by the preprocessing system and adding the correction information to the corresponding position of the to-be-corrected composition in its original picture format, wherein the correction information is in an editable form and the correction system provides correction tools so that a user can modify the correction information; and

a material recommendation system, used for automatically recommending excellent composition materials according to the deficiencies of the composition.

Optionally, the composition acquisition system can acquire a single picture or a plurality of pictures uploaded in a batch; if a plurality of pictures is uploaded in a batch, the pictures are automatically matched with the corresponding names. The matching process includes: performing layout analysis on each picture to extract the name area, obtaining a plurality of name area pictures; recognizing each name area picture with an OCR engine to obtain name information; and matching each picture with the corresponding name according to the obtained name information.

Optionally, the process by which the preprocessing system extracts the actual composition area includes:

extracting the largest connected region along the periphery of the picture, and, when the area of the largest connected region exceeds a set area threshold, determining the area inside the connected region to be the actual composition area;

calculating the distance between each point on the contour of the largest connected region and the four vertices of the uploaded picture, and selecting the four contour points closest to the respective four vertices of the original picture as the four vertices of the actual composition area;

and performing a perspective transformation based on the four selected vertices of the actual composition area so as to rectify the picture.

Optionally, the process by which the preprocessing system performs title extraction and paragraph segmentation includes:

inputting the rectified picture into an OCR engine, and performing title extraction and paragraph segmentation based on the line coordinate information in the returned text position coordinate information; if, for the first two consecutive lines on a sheet of composition paper, the abscissa of the leftmost vertex is greater than that of the next line and greater than a preset first threshold, determining the first line to be the title area; and if the abscissa of the leftmost vertex of the current line is greater than that of the next line and greater than a preset second threshold, determining the current line to be the start of a new paragraph.

Optionally, the Chinese composition correction system is provided with a pre-trained composition genre classification model and a comment library, wherein the composition genre classification model is trained using a deep learning algorithm;

during correction, the correction system uses the composition genre classification model to identify the genre of the composition based on the text content information, and automatically selects related comments from the comment library according to the identified genre and pushes them to the user for selection and modification.

Optionally, during correction, the correction system determines, from a plurality of preset capability points to be detected, the capability points that do not appear in the composition content information, wherein each composition genre is provided with a corresponding plurality of capability points;

and the material recommendation system automatically recommends the corresponding excellent composition materials according to the capability points that do not appear in the composition content information.

Optionally, the correction information includes text comment information and marks, and the marks include lines, graphics and symbols;

when adding the correction information to the corresponding position of the to-be-corrected composition in its original picture format, the correction system adds marks of different forms at the corresponding positions in the picture for different text content information, according to the habits of the user, and adds the text comment information.

Optionally, the system further comprises a general evaluation system for evaluating the composition as a whole according to each piece of correction information, including scoring different aspects of the composition, giving a total score and general comment suggestions, and counting the numbers of characters, words and sentences in the composition.

In a second aspect, an embodiment of the present application further provides a man-machine combined Chinese composition correction method, which is applied to the man-machine combined Chinese composition correction system of any one of the first aspect, the method including:

the composition acquisition system acquires a to-be-corrected composition in a picture format uploaded by a user;

the preprocessing system uses an OCR engine to perform layout analysis on the acquired to-be-corrected composition so as to extract the actual composition area, obtains text position coordinate information and text content information, and performs title extraction and paragraph segmentation;

the correction system corrects the text content information obtained by the preprocessing system and adds the correction information to the corresponding position of the to-be-corrected composition in its original picture format;

the material recommendation system automatically recommends excellent composition materials according to the deficiencies of the composition.

The technical scheme provided by the embodiment of the application can comprise the following beneficial effects:

The man-machine combined Chinese composition correction system provided by the embodiment of the application comprises a composition acquisition system, a preprocessing system, a correction system and a material recommendation system. After the preprocessing system preprocesses the to-be-corrected composition acquired in picture format by the composition acquisition system, the correction system corrects it automatically and places the correction information on the original picture of the composition paper, so that teachers and students can see the correction results visually. The correction information is editable, so a teacher can further modify it based on his or her own experience and bring the correction result closer to the actual situation. In addition, the material recommendation system can automatically recommend excellent composition materials according to the deficiencies identified in the correction result, helping students improve their writing ability. The technical scheme of the application thus addresses the problems of the prior art by presenting correction results visually and providing more functions.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.

FIG. 1 is a schematic workflow diagram of a man-machine combined Chinese composition correction system according to an embodiment of the present application;

FIG. 2 is a schematic diagram of an exemplary correction result according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a name matching process according to an embodiment of the present application;

FIG. 4 is a schematic diagram of an overall evaluation result provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of a material recommendation process according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.

To solve the problems mentioned in the background, the application provides a man-machine combined Chinese composition correction system and method. First, correction information is synchronized onto the picture of the student's composition paper by means of image processing technology, so as to reproduce a teacher's real correction habits as closely as possible. Second, correction tools are provided so that the teacher can modify the system's pre-correction result, bringing the correction result closer to the actual situation. Third, a recommendation function for excellent materials is provided after correction, so that students can better improve their writing level. Specific embodiments are described in detail below by way of examples.

Examples

Referring to FIG. 1, FIG. 1 is a schematic workflow diagram of a man-machine combined Chinese composition correction system according to an embodiment of the present application. As shown in FIG. 1, the man-machine combined Chinese composition correction system mainly comprises the following parts:

the composition acquisition system 1, used for acquiring a to-be-corrected composition in a picture format uploaded by a user, wherein the picture format includes the PDF format;

the preprocessing system 2, used for performing layout analysis on the acquired to-be-corrected composition by using an OCR engine so as to extract the actual composition area, obtaining text position coordinate information and text content information, and performing title extraction and paragraph segmentation;

the correction system 3, used for correcting the text content information obtained by the preprocessing system and adding the correction information to the corresponding position of the to-be-corrected composition in its original picture format, wherein the correction information is in an editable form and the correction system provides correction tools so that a user can modify the correction information;

and the material recommendation system 4, used for automatically recommending excellent composition materials according to the deficiencies of the composition.

It should be noted that, strictly speaking, PDF is not a picture format. However, text in a PDF file, like text in a picture file, cannot be edited directly (unlike directly editable text formats such as Word or txt), and both kinds of file must be processed with OCR technology. In this embodiment the PDF format is therefore regarded as one of the picture formats; that is, the pictures mentioned below all include PDF files.

After the composition picture is uploaded to the system, OCR recognition and AI pre-correction are carried out automatically in the background. When pre-correction is finished, a text correction result is obtained; the correction information is then added to the corresponding position of the original picture, in a text box or a similar form, for visual display, and correction tools are provided so that the teacher can modify the AI's pre-correction information. The correction information includes text comment information and marks, the marks including lines, graphics, symbols and the like. When adding the correction information to the corresponding position of the to-be-corrected composition in its original picture format, the correction system adds marks of different forms at the corresponding positions in the picture for different text content information, according to the habits of the user (teacher), and adds the text comment information.

For example, as shown in FIG. 2, the correction information includes marking good sentences with wavy lines (in a settable color such as red; color is not shown in FIG. 2), with text comment information given on the right side (or below, etc.); marking wrongly written characters with circles (in a settable color); and marking awkward sentences and the like with horizontal lines (in a settable color), so as to match the teacher's correction habits as closely as possible. In addition, the teacher may use the correction tools on the right to edit the AI's correction results, including editing the text comment information in text boxes, modifying line forms or colors, and adding text boxes, symbols, lines and the like.
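The editable correction information described above can be pictured as a small data structure. The following is a minimal sketch only: the class name `Annotation`, the field names, the issue keys and the `(x, y, width, height)` box layout are all assumptions made for illustration, since the source states only that marks include lines, graphics and symbols and that a teacher can edit them.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Hypothetical mapping from issue type to mark style, following the FIG. 2 example.
MARK_STYLES = {
    "good_sentence": "wavy_line",
    "wrong_character": "circle",
    "awkward_sentence": "horizontal_line",
}


@dataclass(frozen=True)
class Annotation:
    issue: str                       # one of the keys of MARK_STYLES
    box: Tuple[int, int, int, int]   # (x, y, width, height) on the original picture
    color: str = "red"               # settable color
    comment: Optional[str] = None    # optional text comment shown beside the mark

    @property
    def mark(self) -> str:
        return MARK_STYLES[self.issue]


def edit_annotation(annotation: Annotation, **changes) -> Annotation:
    """Return a modified copy, as a teacher's edit of an AI annotation would."""
    return replace(annotation, **changes)


ai_note = Annotation("good_sentence", (120, 340, 480, 32), comment="Vivid description.")
teacher_note = edit_annotation(ai_note, color="blue", comment="Vivid, but a little long.")
```

Keeping the annotation immutable and returning a copy on edit makes it straightforward to retain the AI's original suggestion alongside the teacher's revision; that is a design choice of this sketch, not something the source prescribes.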

According to the above technical scheme, in the man-machine combined Chinese composition correction system provided by the embodiment of the application, after the preprocessing system preprocesses the to-be-corrected composition acquired in picture format by the composition acquisition system, the correction system corrects it automatically and places the correction information on the original picture of the composition paper, so that teachers and students can see the correction results visually. The correction information is editable, so a teacher can further modify it based on his or her own experience and bring the correction result closer to the actual situation. In addition, the material recommendation system can automatically recommend excellent composition materials according to the deficiencies identified in the correction result, helping students improve their writing ability. The technical scheme of the application thus addresses the problems of the prior art by presenting correction results visually and providing more functions.

Further, in a specific application, the composition acquisition system can acquire a single picture or a plurality of pictures uploaded in a batch (the pictures can be combined into one PDF file for uploading); if a plurality of pictures is uploaded in a batch, the pictures are automatically matched with the corresponding names. As shown in FIG. 3, the matching process includes: performing layout analysis on each picture to extract the name area, obtaining a plurality of name area pictures; recognizing each name area picture with an OCR engine to obtain name information; and matching each picture with the corresponding name according to the obtained name information to obtain a matching result.

Automatic name matching assigns each composition picture to the corresponding student, which saves the teacher the time of assigning pictures manually and improves efficiency.
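The last step of the matching process, pairing each recognized name with a student, can be sketched as follows. The source does not say how recognized names are compared with the class list; this sketch assumes a known roster and tolerates small OCR errors with the standard-library `difflib.get_close_matches`. The function name `match_names`, the `cutoff` value and the sample names are hypothetical.

```python
import difflib


def match_names(ocr_names, roster, cutoff=0.5):
    """Map each picture id to the closest roster name, or None if nothing is close.

    ocr_names: dict of picture id -> name string returned by OCR for the name area.
    roster:    list of student names known to the system (an assumption of this sketch).
    """
    result = {}
    for picture_id, recognized in ocr_names.items():
        candidates = difflib.get_close_matches(recognized, roster, n=1, cutoff=cutoff)
        result[picture_id] = candidates[0] if candidates else None
    return result


roster = ["王小明", "李华", "张伟"]
ocr_names = {"page1.jpg": "王小朋", "page2.jpg": "李华", "page3.jpg": "赵"}
matches = match_names(ocr_names, roster)
```

Pictures that map to `None` would be left for the teacher to assign by hand.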

Furthermore, in some embodiments, the process by which the preprocessing system extracts the actual composition area includes: extracting the largest connected region along the periphery of the picture, and, when the area of the largest connected region exceeds a set area threshold, determining the area inside the connected region to be the actual composition area; calculating the distance between each point on the contour of the largest connected region and the four vertices of the uploaded picture, and selecting the four contour points closest to the respective four vertices of the original picture as the four vertices of the actual composition area; and performing a perspective transformation based on the four selected vertices of the actual composition area so as to rectify the picture.

It should be noted that the above process is designed for pictures of the composition paper commonly used in Chinese classes (as shown in FIG. 2); the four vertices obtained for the actual composition area are the four vertices of the frame border of the composition paper shown in FIG. 2.
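The vertex-selection step of the process above can be sketched in a few lines. This is an illustration only: it assumes the contour of the largest connected region is already available as a list of `(x, y)` points, and the function name and sample coordinates are hypothetical.

```python
import math


def pick_region_corners(contour, width, height):
    """For each corner of the uploaded picture, return the contour point closest to it.

    contour: list of (x, y) points on the outline of the largest connected region.
    Returns the four points in the order top-left, top-right, bottom-right, bottom-left.
    """
    image_corners = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]
    return [min(contour, key=lambda p: math.dist(p, corner)) for corner in image_corners]


# Outline of a slightly skewed sheet photographed in a 600 x 800 picture.
contour = [(12, 8), (300, 5), (590, 15), (585, 400),
           (595, 790), (300, 795), (8, 780), (5, 400)]
corners = pick_region_corners(contour, 600, 800)
```

The four returned points would then serve as source points for the perspective transformation. In practice the contour and the transform might come from an image library, for example OpenCV's `cv2.findContours`, `cv2.getPerspectiveTransform` and `cv2.warpPerspective`; the source does not name any library, so that choice is an assumption.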

Further, the process by which the preprocessing system performs title extraction and paragraph segmentation includes: inputting the rectified picture into an OCR engine, and performing title extraction and paragraph segmentation based on the line coordinate information in the returned text position coordinate information; if, for the first two consecutive lines on a sheet of composition paper, the abscissa of the leftmost vertex is greater than that of the next line and greater than a preset first threshold, determining the first line to be the title area; and if the abscissa of the leftmost vertex of the current line is greater than that of the next line and greater than a preset second threshold, determining the current line to be the start of a new paragraph.

Since the first line of each paragraph is necessarily indented by two characters (two squares on composition paper), the abscissa of the first character of each paragraph (i.e. the character at the leftmost vertex of the paragraph's first line) is necessarily greater than the abscissa of the first character of the next line of that paragraph. Based on this principle, title extraction and paragraph segmentation can be performed by the above procedure.
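The indentation rule can be illustrated with a short sketch. It reflects one reading of the rule, namely that the abscissa itself (not the difference between lines) is compared with the threshold, and that the first two lines are each compared with the third. The function name, the input format (one leftmost x-coordinate per OCR line, top to bottom) and the threshold values are assumptions.

```python
def find_title_and_paragraphs(line_lefts, title_threshold, indent_threshold):
    """Detect a title line and paragraph starts from per-line left coordinates.

    line_lefts: leftmost x-coordinate of each OCR line, ordered top to bottom.
    Returns (has_title, paragraph_start_indices).
    """
    # Title rule: the first two lines both start further right than the third
    # line and further right than the first threshold.
    has_title = (
        len(line_lefts) >= 3
        and line_lefts[0] > line_lefts[2]
        and line_lefts[1] > line_lefts[2]
        and line_lefts[0] > title_threshold
        and line_lefts[1] > title_threshold
    )
    body_start = 1 if has_title else 0
    starts = []
    # Paragraph rule: a line that starts further right than the next line and
    # further right than the second threshold opens a new paragraph.
    # The last line has no successor, so it is never tested here.
    for i in range(body_start, len(line_lefts) - 1):
        if line_lefts[i] > line_lefts[i + 1] and line_lefts[i] > indent_threshold:
            starts.append(i)
    return has_title, starts


# Centered title, then two paragraphs whose first lines are indented.
lefts = [260, 120, 40, 40, 120, 40, 40]
has_title, paragraph_starts = find_title_and_paragraphs(lefts, 100, 100)
```

A one-line paragraph at the very end of the page is not detected by this sketch, because the rule as stated always compares a line with the one after it.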

In addition, in some embodiments, for the overall comment of the composition, in specific implementation, the Chinese composition correction system is provided with a pre-trained composition genre classification model and a comment library, wherein the composition genre classification model is trained based on a deep learning algorithm; and in the process of correcting by the correcting system, the composition genre classification model is utilized, the composition genres are identified based on the text content information, and related comments are automatically selected from the comment library according to the identified composition genres to be pushed so as to be convenient for a user to select and modify.

More specifically, millions of composition samples can be collected in advance from major composition websites, and a genre classification model (genre classifier) is trained using a deep learning algorithm to identify the genre of the composition to be corrected. Comment labels under each genre are organized in advance; when a teacher needs to set or modify comments, the comment library can be opened through the provided comment assistant tool and a suitable comment selected for quick setting, as shown in FIG. 4.
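The lookup from recognized genre to candidate comments can be sketched as below. The trained deep learning classifier is outside the scope of this sketch and is replaced by a toy stand-in; the genre names, the comments and every identifier here are hypothetical.

```python
from typing import Callable, Dict, List

# Hypothetical comment library keyed by genre; the entries are illustrative only.
COMMENT_LIBRARY: Dict[str, List[str]] = {
    "narrative": ["The events unfold in a clear order.", "Add more detail to the climax."],
    "argumentative": ["The thesis is stated clearly.", "Support the claim with evidence."],
}


def suggest_comments(text: str, classify: Callable[[str], str]) -> List[str]:
    """Identify the genre with the given classifier and return that genre's comments."""
    genre = classify(text)
    return list(COMMENT_LIBRARY.get(genre, []))


def toy_classifier(text: str) -> str:
    """Stand-in for the trained genre classification model (not a real model)."""
    return "argumentative" if "therefore" in text.lower() else "narrative"


suggestions = suggest_comments("Therefore we should read more.", toy_classifier)
```

Passing the classifier in as a callable keeps the comment lookup independent of whichever model is actually trained.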

In addition, in some embodiments, as shown in FIG. 4, the system further includes a general evaluation system for evaluating the composition as a whole according to each piece of correction information. This includes scoring different aspects of the composition, giving a total score and general comment suggestions, and counting the numbers of characters, words and sentences in the composition; the comments may follow templates specified for each dimension in the scoring details. As shown in FIG. 4, the different aspects of the composition include content, expression, structure, and context specifications. The teacher can also modify the scores, general comment suggestions and other results given by the system; when the teacher adjusts a score in the scoring details, the comment given in advance by the system changes accordingly.
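A minimal sketch of the general evaluation follows, showing how template comments can track the scores so that a teacher's score change updates the comment. The full marks per dimension, the score bands and the comment wording are hypothetical; only the four dimension names come from the text. Word counts are omitted because they would need a Chinese word segmenter, which the source does not specify.

```python
import re

# Dimension names follow the text; the full marks are hypothetical.
FULL_MARKS = {"content": 40, "expression": 30, "structure": 20, "context specifications": 10}


def text_statistics(text):
    """Count Chinese characters and sentences in the composition text."""
    characters = len(re.findall(r"[\u4e00-\u9fff]", text))
    sentences = len([s for s in re.split(r"[。！？!?]", text) if s.strip()])
    return {"characters": characters, "sentences": sentences}


def dimension_comment(dimension, score):
    """Pick a template comment from hypothetical score bands."""
    ratio = score / FULL_MARKS[dimension]
    if ratio >= 0.8:
        level = "strong"
    elif ratio >= 0.6:
        level = "adequate"
    else:
        level = "needs improvement"
    return f"{dimension}: {level}"


def evaluate(scores):
    """Return the total score and one template comment per dimension."""
    return {
        "total": sum(scores.values()),
        "comments": [dimension_comment(d, s) for d, s in scores.items()],
    }


scores = {"content": 34, "expression": 20, "structure": 10, "context specifications": 9}
report = evaluate(scores)
scores["structure"] = 17       # the teacher adjusts one score
revised = evaluate(scores)     # the comments are regenerated to match
stats = text_statistics("今天天气很好。我们去公园玩！你去吗？")
```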

In addition, regarding the recommendation of excellent composition materials by the material recommendation system, in a specific implementation a plurality of capability points can be set in advance for each composition genre, so that, during correction, the correction system can determine which of the preset capability points to be detected do not appear in the composition content information; the material recommendation system then automatically recommends the corresponding excellent composition materials according to those missing capability points. Taking a composition describing a person as an example, its capability points include description of the person's appearance, description of the person's psychology, and so on. After the system diagnoses the composition's capability points, it identifies which capability points appear in the composition and which do not, and recommends related excellent materials for the missing capability points in the recommended-learning section. In addition, to facilitate recommendation, a labeled material data set may be prepared in advance, so that corresponding materials can be obtained from it according to the capability point diagnosis result or directly according to the genre label; the specific process is shown in FIG. 5.
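The recommendation logic described above reduces to a set difference followed by a filter over a labeled material set. The sketch below assumes that the capability points detected in the composition are already known; the genre key, the capability point names, the material records and the function names are hypothetical.

```python
# Hypothetical capability points per genre, following the person-description example.
CAPABILITY_POINTS = {
    "person_description": ["appearance description", "psychological description"],
}

# Hypothetical labeled material data set.
MATERIALS = [
    {"title": "Sample A", "genre": "person_description", "capability": "appearance description"},
    {"title": "Sample B", "genre": "person_description", "capability": "psychological description"},
    {"title": "Sample C", "genre": "scenery_description", "capability": "color description"},
]


def missing_capabilities(genre, detected):
    """Capability points preset for the genre that were not detected in the composition."""
    return [p for p in CAPABILITY_POINTS.get(genre, []) if p not in detected]


def recommend_materials(genre, detected, materials=MATERIALS):
    """Recommend materials for missing capability points, else fall back to the genre label."""
    missing = missing_capabilities(genre, detected)
    if missing:
        return [m for m in materials if m["genre"] == genre and m["capability"] in missing]
    return [m for m in materials if m["genre"] == genre]


picked = recommend_materials("person_description", {"appearance description"})
```

When no capability point is missing, the sketch falls back to recommending by genre label alone, matching the second retrieval path mentioned in the text.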

In addition, the specific working process of the man-machine combined Chinese composition correction system includes the following steps:

the composition acquisition system 1 acquires a to-be-corrected composition in a picture format uploaded by a user;

the preprocessing system 2 uses an OCR engine to perform layout analysis on the acquired to-be-corrected composition so as to extract the actual composition area, obtains text position coordinate information and text content information, and performs title extraction and paragraph segmentation;

the correction system 3 corrects the text content information obtained by the preprocessing system and adds the correction information to the corresponding position of the to-be-corrected composition in its original picture format;

the material recommendation system 4 automatically recommends excellent composition materials according to the deficiencies of the composition.

The scheme thus provides a closed-loop Chinese composition learning scheme running from machine evaluation through manual correction to material recommendation. Name matching and AI pre-correction are carried out during uploading; the platform's correction tools let the teacher modify the machine's pre-correction result and dynamically update the machine's pre-correction comments; and the comment library function of the correction assistant can give the teacher ideas when writing comments and makes comments easy to change. The teacher's correction efficiency and quality can therefore be substantially improved, and students can conveniently view the correction results. In addition, the system can recommend personalized learning materials according to the diagnosis of the student's composition, further helping to improve the student's writing level.

It is to be understood that the same or similar parts of the above embodiments may be referred to each other, and that content not described in detail in one embodiment may be found in the same or similar content of other embodiments.

It should be noted that in the description of the present application, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present application, unless otherwise indicated, the meaning of "plurality" means at least two.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.

It is to be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field programmable gate arrays (FPGAs), and the like.

Those of ordinary skill in the art will appreciate that all or a portion of the steps carried out in the method of the above-described embodiments may be implemented by a program to instruct related hardware, where the program may be stored in a computer readable storage medium, and where the program, when executed, includes one or a combination of the steps of the method embodiments.

In addition, each functional unit in the embodiments of the present application may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.

The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.

In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the application, and that variations, modifications, alternatives and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the application.

Claims

1. A man-machine combined Chinese composition correction system, comprising:

the material recommendation system is used for automatically recommending excellent composition materials according to the defects of the composition;

the process of extracting the actual composition area by the preprocessing system comprises the following steps:

based on the four vertexes of the actual composition area obtained by the selection, performing perspective transformation to correct the picture;

the process of the preprocessing system for extracting the title and carrying out segmentation processing comprises the following steps:

2. The system of claim 1, wherein the composition acquisition system is capable of acquiring a single picture or acquiring a plurality of pictures uploaded in a batch, and, if a plurality of pictures are uploaded in a batch, automatically matching the plurality of pictures with the corresponding names; the matching process comprises the following steps: performing layout analysis on each picture to extract the name area, obtaining a plurality of name area pictures, and recognizing each name area picture with an OCR recognition engine to obtain name information; and matching each picture with the corresponding name according to the obtained name information.
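The matching flow recited in claim 2 can be illustrated with a minimal Python sketch. It is not taken from the application: `match_pictures_to_names`, `extract_name_area` and `recognize_text` are hypothetical names, the two callables merely stand in for the layout analysis step and the OCR recognition engine, and grouping several pictures under one name (for example, a multi-page composition) is an assumption.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def match_pictures_to_names(
    pictures: Iterable[str],
    extract_name_area: Callable[[str], object],
    recognize_text: Callable[[object], str],
) -> Dict[str, List[str]]:
    """Group uploaded pictures by the name recognized in each name area.

    extract_name_area and recognize_text are hypothetical stand-ins for the
    layout analysis step and the OCR recognition engine, respectively.
    """
    matched: Dict[str, List[str]] = defaultdict(list)
    for picture in pictures:
        name_area = extract_name_area(picture)    # layout analysis (stand-in)
        name = recognize_text(name_area).strip()  # OCR recognition (stand-in)
        # Assumption: pictures whose name cannot be read are kept aside
        # under a placeholder key so that they can be matched manually.
        matched[name or "<unrecognized>"].append(picture)
    return dict(matched)
```

In use, the two callables would be replaced by whatever layout analysis and OCR components are actually deployed; the sketch only fixes the order of the steps (extract the name area, recognize it, then match).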

3. The system according to claim 1, wherein the Chinese composition correction system is provided with a pre-trained composition genre classification model and a comment library, wherein the composition genre classification model is trained based on a deep learning algorithm;

4. The system according to claim 3, wherein, during the correction process, the correction system determines, from a plurality of preset capability points to be detected, the capability points that do not appear in the composition information; wherein each composition genre is provided with a corresponding plurality of capability points;

5. The system of claim 1, wherein the correction information includes text comment information and indicia, the indicia including lines, graphics, and symbols;

6. The system of claim 1, further comprising a general rating system for giving an overall rating of the composition based on the correction information, including scoring different aspects of the composition, giving a general score and general rating suggestions, and counting the characters, words and sentences of the composition.

7. A man-machine combined Chinese composition correction method, applied to the man-machine combined Chinese composition correction system according to any one of claims 1 to 6, the method comprising:

the preprocessing system uses an OCR recognition engine to perform layout analysis on the acquired composition to be corrected, so as to extract an actual composition area, obtain text position coordinate information and text content information, and perform title extraction and segmentation processing;

the process of extracting the actual composition area comprises the following steps: extracting the maximum connected region at the periphery of the picture, and, when the maximum connected region exceeds a set area threshold, determining the area inside the connected region to be the actual composition area; calculating the distance between each point on the contour of the maximum connected region and the four vertexes of the uploaded picture, and selecting the four points respectively closest to the four vertexes of the original picture as the four vertexes of the actual composition area; based on the four selected vertexes of the actual composition area, performing perspective transformation to correct the picture;

the title extraction and segmentation process comprises the following steps: inputting the corrected picture into the OCR recognition engine, and performing title extraction and segmentation based on the line coordinate information in the returned text position coordinate information; if, for the two consecutive lines at the beginning of a sheet of paper, the abscissa of the leftmost vertex position is greater than that of the next line and greater than a preset first threshold, determining the first line to be the title area; if the abscissa of the leftmost vertex position of the current line is greater than that of the next line and greater than a preset second threshold, considering the current line to be the start of a new paragraph;
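The vertex selection and the indentation rules recited above can be illustrated with a minimal Python sketch. It is not the implementation of the application: the function names and threshold values are hypothetical, Euclidean distance is assumed, and the title rule is read as comparing the first line with the second line of the sheet. The perspective transformation itself is not shown; it could, for example, be carried out with OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective`.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def select_region_vertices(
    contour: Sequence[Point], width: float, height: float
) -> List[Point]:
    """For each corner of the picture, pick the nearest contour point.

    Returned order: top-left, top-right, bottom-right, bottom-left.
    Euclidean distance is an assumption; the claim only says "distance".
    """
    corners = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]
    return [min(contour, key=lambda p: math.dist(p, corner)) for corner in corners]


def find_title_and_paragraph_starts(
    left_xs: Sequence[float],
    title_threshold: float,
    paragraph_threshold: float,
) -> Tuple[Optional[int], List[int]]:
    """Apply the indentation rules to the leftmost x of each line (top to bottom).

    Returns the index of the title line (or None) and the indices of the
    lines that start a new paragraph.
    """
    title_index = None
    # Title rule (assumed reading): the first line is indented further than
    # the second line and further than the first threshold.
    if len(left_xs) >= 2 and left_xs[0] > left_xs[1] and left_xs[0] > title_threshold:
        title_index = 0

    paragraph_starts = []
    first_body_line = 0 if title_index is None else 1
    # Paragraph rule: the current line is indented further than the next line
    # and further than the second threshold. The last line has no next line,
    # so this sketch leaves it untested.
    for i in range(first_body_line, len(left_xs) - 1):
        if left_xs[i] > left_xs[i + 1] and left_xs[i] > paragraph_threshold:
            paragraph_starts.append(i)
    return title_index, paragraph_starts
```

The four points returned by `select_region_vertices` would serve as the source points of the perspective transformation, with the corners of the output picture as destination points. Both thresholds are expressed in the same pixel units as the OCR line coordinates and would need tuning for a given composition paper layout.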