CN113360608A

CN113360608A - Man-machine combined Chinese composition correcting system and method

Info

Publication number: CN113360608A
Application number: CN202110774531.6A
Authority: CN
Inventors: 杨林; 雷思东
Original assignee: Beijing Yueshen Intelligent Technology Co ltd
Current assignee: Beijing One Stroke Two Stroke Technology Co ltd; Beijing Yueshen Intelligent Technology Co ltd
Priority date: 2021-07-08
Filing date: 2021-07-08
Publication date: 2021-09-07
Anticipated expiration: 2041-07-08
Also published as: CN113360608B

Abstract

The system comprises a composition acquisition system, a preprocessing system, a correction system and a material recommendation system. The preprocessing system preprocesses the composition to be corrected, in picture format, acquired by the composition acquisition system; the correction system automatically corrects the composition and presents the correction information on the original picture of the composition paper, so that teachers and students can see the correction results intuitively. In addition, the correction information is editable, so that a teacher can further modify it based on his or her own experience, bringing the correction result closer to the actual situation; and the material recommendation system automatically recommends excellent composition materials according to the weaknesses identified in the correction results, helping students improve their writing ability. In short, the technical solution of this application solves the problems of the prior art: it presents correction results intuitively and provides more functions.

Description

Man-machine combined Chinese composition correcting system and method

Technical Field

The application relates to the field of computer technology, and in particular to a human-computer combined Chinese composition correcting system and method.

Background

NLP (Natural Language Processing) technology is gradually being applied to fields such as Chinese composition, and computers can take over part of teachers' tedious work, such as dimension-based diagnosis and statistical analysis of compositions.

Most existing automatic composition correcting systems work in two stages: first, OCR (Optical Character Recognition) converts the uploaded composition pictures into text; then the converted text is analyzed and corrected using NLP technology. The correction result is finally displayed as separate text and cannot be synchronized onto the composition paper, so it is not presented intuitively; moreover, most existing systems implement only the correction function and therefore offer a single function.

Disclosure of Invention

The application provides a human-computer combined Chinese composition correcting system and method, aiming to solve the problems that existing automatic composition correcting systems present correction results non-intuitively and offer only a single function.

The above object of the present application is achieved by the following technical solutions:

In a first aspect, an embodiment of the present application provides a human-computer combined Chinese composition correcting system, comprising:

a composition acquisition system, configured to acquire the composition to be corrected, in picture format, uploaded by a user, wherein the picture format includes the PDF format;

a preprocessing system, configured to perform layout analysis on the acquired composition to be corrected using an OCR engine, so as to extract the actual composition area, obtain text position coordinate information and text content information, and perform title extraction and paragraph segmentation;

a correction system, configured to correct the text content information obtained by the preprocessing system and add the correction information at the corresponding positions of the original picture of the composition to be corrected, wherein the correction information is editable and the correction system provides a correction tool that allows the user to modify it;

and a material recommendation system, configured to automatically recommend excellent composition materials according to the weaknesses of the composition.

Optionally, the composition acquisition system can acquire either a single picture or multiple pictures uploaded in a batch; for a batch upload, the pictures are automatically matched with the corresponding names. The matching process comprises: performing layout analysis on each picture to extract its name area, yielding a set of name-area pictures; recognizing each name-area picture with the OCR engine to obtain the name information; and matching each picture with the corresponding name according to the recognized name information.

Optionally, the process by which the preprocessing system extracts the actual composition area comprises:

extracting the largest connected region along the periphery of the picture, and, when the area of this largest connected region exceeds a set area threshold, determining the region inside it as the actual composition area;

calculating the distance between each point on the contour of the largest connected region and the four vertices of the uploaded picture, and selecting, for each vertex of the original picture, the closest contour point, so that the four selected points serve as the four vertices of the actual composition area;

and performing a perspective transformation based on the four selected vertices of the actual composition area to rectify the picture.

Optionally, the process by which the preprocessing system performs title extraction and paragraph segmentation comprises:

inputting the rectified picture into the OCR engine, and performing title extraction and paragraph segmentation on the line coordinate information contained in the returned text position coordinate information: if, for the first two consecutive lines on a page, the abscissa of the leftmost vertex of the first line is larger than that of the next line and larger than a preset first threshold, the first line is determined to be the title area; and if the abscissa of the leftmost vertex of the current line is larger than that of the next line and larger than a preset second threshold, the current line is considered to start a new paragraph.

Optionally, the Chinese composition correcting system is provided with a pre-trained composition genre classification model and a comment library, wherein the genre classification model is trained with a deep learning algorithm;

and during correction, the correction system uses the genre classification model to identify the genre of the composition from the text content information, automatically selects relevant comments from the comment library according to the identified genre, and pushes them to the user for selection and modification.

Optionally, during correction, the correction system determines, from a plurality of preset capability points to be detected, those capability points that do not appear in the text content of the composition, wherein each composition genre is provided with its own plurality of capability points;

and the material recommendation system automatically recommends corresponding excellent composition materials according to the capability points that do not appear in the composition.

Optionally, the correction information includes textual comment information and marks, the marks including lines, figures and symbols;

when adding the correction information at the corresponding positions of the original picture of the composition to be corrected, the correction system adds marks of different forms at the corresponding positions for different text content, in accordance with the user's habits, and adds textual comment information.

Optionally, the system further comprises an overall evaluation system, configured to evaluate the composition as a whole according to the correction information, including scoring different aspects of the composition, giving a total score and overall evaluation suggestions, and computing statistics on the characters, words and sentences of the composition.

In a second aspect, an embodiment of the present application further provides a human-computer combined Chinese composition correcting method, applied to the human-computer combined Chinese composition correcting system of any implementation of the first aspect, the method comprising:

the composition acquisition system acquires the composition to be corrected, in picture format, uploaded by a user;

the preprocessing system uses an OCR engine to perform layout analysis on the acquired composition to be corrected, so as to extract the actual composition area, obtain text position coordinate information and text content information, and perform title extraction and paragraph segmentation;

the correction system corrects the text content information obtained by the preprocessing system and adds the correction information at the corresponding positions of the original picture of the composition to be corrected;

the material recommendation system automatically recommends excellent composition materials according to the weaknesses of the composition.

The technical scheme provided by the embodiment of the application can have the following beneficial effects:

the human-computer combined Chinese composition correcting system comprises a composition acquisition system, a preprocessing system, a correction system and a material recommendation system. The preprocessing system preprocesses the composition to be corrected, in picture format, acquired by the composition acquisition system; the correction system automatically corrects the composition and presents the correction information on the original picture of the composition paper, so that teachers and students can see the correction results intuitively. In addition, the correction information is editable, so a teacher can further modify it based on his or her own experience, bringing the correction result closer to the actual situation; and the material recommendation system automatically recommends excellent composition materials according to the weaknesses identified in the correction results, helping students improve their writing ability. In short, the technical solution of this application solves the problems of the prior art, presenting correction results intuitively and providing more functions.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.

FIG. 1 is a schematic diagram of the workflow of a human-computer combined Chinese composition correcting system according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a correction result according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a name matching process according to an embodiment of the present application;

FIG. 4 is a schematic diagram of an overall evaluation result according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a material recommendation process according to an embodiment of the present application.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

To solve the problems described in the background, the application provides a human-computer combined Chinese composition correcting system and method. First, image processing techniques are used to synchronize the correction information onto the picture of the student's composition paper, simulating a teacher's real correcting habits as closely as possible; second, a correction tool is provided so that teachers can modify the system's pre-correction results, bringing them closer to the actual situation; in addition, an excellent-material recommendation function is provided after correction, so that students can better improve their writing. Specific embodiments are described in detail below by way of examples.

Examples

Referring to FIG. 1, FIG. 1 is a schematic diagram of the workflow of a human-computer combined Chinese composition correcting system according to an embodiment of the present application. As shown in FIG. 1, the system mainly includes the following parts:

the composition acquisition system 1, configured to acquire the composition to be corrected, in picture format, uploaded by a user, wherein the picture format includes the PDF format;

the preprocessing system 2, configured to perform layout analysis on the acquired composition to be corrected using an OCR engine, so as to extract the actual composition area, obtain text position coordinate information and text content information, and perform title extraction and paragraph segmentation;

the correction system 3, configured to correct the text content information obtained by the preprocessing system and add the correction information at the corresponding positions of the original picture of the composition to be corrected, wherein the correction information is editable and the correction system provides a correction tool that allows the user to modify it;

and the material recommendation system 4, configured to automatically recommend excellent composition materials according to the weaknesses of the composition.

Strictly speaking, the PDF format is not a picture format. However, as with pictures, the characters in such PDF files cannot be edited directly (unlike text formats such as Word or TXT, which can be edited and modified directly), so when processing both picture and PDF files, OCR technology is needed to convert the characters shown in the image into text. For convenience of description, this embodiment therefore treats the PDF format as one of the picture formats; that is, "pictures" below include PDF files.

In addition, after the composition picture is uploaded to the system, OCR and AI pre-correction are performed automatically in the background. When pre-correction is complete, a textual correction result is obtained; the correction information is then added at the corresponding positions of the original picture, in text boxes or similar forms, for visual display, and a correction tool is provided for teachers to revise the AI pre-correction. The correction information includes textual comment information and marks, the marks including lines, figures, symbols and the like; when adding the correction information at the corresponding positions of the original picture, the correction system adds marks of different forms at the corresponding positions for different text content, in accordance with the user's (teacher's) habits, and adds textual comment information.

For example, as shown in FIG. 2, good sentences are marked with wavy lines (whose color, such as red, can be set; color is not shown in FIG. 2), with textual comments given on the right (or below, etc.); wrongly written characters are marked with circles (color settable); and awkward sentences and the like are marked with horizontal lines (color settable), so as to fit teachers' correcting habits as closely as possible. In addition, the teacher can use the correction tool on the right to perform secondary editing of the AI correction result, including editing the textual comments in the text boxes, changing the form or color of the lines, and adding text boxes, symbols, lines and the like.
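The overlay step can be sketched as follows, assuming the OCR engine returns one bounding box per character in original-picture pixel coordinates. The names `MARK_STYLE`, `Annotation` and `annotate` are hypothetical, and the style table only mirrors the marks described for FIG. 2; the patent does not specify a data format. Keeping each mark as a plain, mutable record is what allows the teacher's secondary editing.

```python
from dataclasses import dataclass

# Hypothetical default mark styles mirroring FIG. 2: wavy line for good sentences,
# circle for wrongly written characters, straight line for awkward sentences.
MARK_STYLE = {
    "good_sentence": {"shape": "wavy_line", "color": "red"},
    "typo": {"shape": "circle", "color": "red"},
    "awkward_sentence": {"shape": "straight_line", "color": "red"},
}


@dataclass
class Annotation:
    """One editable correction item drawn on the original picture."""
    kind: str
    box: tuple          # (x0, y0, x1, y1) in original-picture pixels
    shape: str
    color: str
    comment: str = ""   # text-box content the teacher can rewrite


def annotate(char_boxes, start, end, kind, comment="", style=None):
    """Cover characters [start, end) with one mark by taking the union of their OCR boxes."""
    boxes = char_boxes[start:end]
    if not boxes:
        raise ValueError("empty character span")
    x0 = min(b[0] for b in boxes)
    y0 = min(b[1] for b in boxes)
    x1 = max(b[2] for b in boxes)
    y1 = max(b[3] for b in boxes)
    s = dict(MARK_STYLE[kind])   # copy, so per-teacher overrides never touch the defaults
    if style:
        s.update(style)
    return Annotation(kind, (x0, y0, x1, y1), s["shape"], s["color"], comment)
```

A teacher's habit, such as blue circles instead of red, is passed as `style`. A span that wraps onto the next line would need to be split per line first, because the sketch simply takes the union of all the boxes.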

As can be seen from the above technical solution, in the human-computer combined Chinese composition correcting system provided by this embodiment, after the preprocessing system preprocesses the composition to be corrected, in picture format, acquired by the composition acquisition system, the correction system automatically corrects the composition and presents the correction information on the original picture of the composition paper, so that teachers and students can see the correction results intuitively. In addition, the correction information is editable, so a teacher can further modify it based on his or her own experience, bringing the correction result closer to the actual situation; and the material recommendation system automatically recommends excellent composition materials according to the weaknesses identified in the correction results, helping students improve their writing ability. In short, the technical solution of this application solves the problems of the prior art, presenting correction results intuitively and providing more functions.

Furthermore, in a specific application, the composition acquisition system can acquire either a single picture or multiple pictures uploaded in a batch (the pictures can also be combined into one PDF file for upload); for a batch upload, the pictures are automatically matched with the corresponding names. As shown in FIG. 3, the matching process comprises: performing layout analysis on each picture to extract its name area, yielding a set of name-area pictures; recognizing each name-area picture with the OCR engine to obtain the name information; and matching each picture with the corresponding name according to the recognized name information to obtain the matching result.

Through automatic name matching, composition pictures are assigned to the corresponding students' names, saving teachers the time of manual assignment and improving efficiency.
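A minimal sketch of the final matching step, assuming the teacher's class roster is available as a list of names (the patent does not say where the candidate names come from) and that OCR has already produced a raw name string per picture. It uses the standard library's `difflib.get_close_matches` to tolerate a misrecognized character. The function name `match_names` and the 0.6 cutoff are illustrative choices, and pictures with no confident match are left as `None` for manual assignment.

```python
import difflib


def match_names(ocr_names, roster, cutoff=0.6):
    """Map each picture id to a roster name, or None when no name is similar enough.

    ocr_names: {picture_id: raw OCR string from the name area}
    roster:    list of student names (assumed to be known in advance)
    """
    result = {}
    for pic_id, raw in ocr_names.items():
        cleaned = "".join(raw.split())  # drop spaces OCR may insert between characters
        if cleaned in roster:
            result[pic_id] = cleaned
            continue
        best = difflib.get_close_matches(cleaned, roster, n=1, cutoff=cutoff)
        result[pic_id] = best[0] if best else None
    return result
```

For example, an OCR result of 张小朋 is matched to 张小明, because two of the three characters agree (similarity ratio 2·2/6 ≈ 0.67). The sketch does not prevent two pictures from being assigned the same student; a real system would presumably flag such collisions for the teacher.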

In addition, in some embodiments, the process by which the preprocessing system extracts the actual composition area comprises: extracting the largest connected region along the periphery of the picture, and, when the area of this largest connected region exceeds a set area threshold, determining the region inside it as the actual composition area; calculating the distance between each point on the contour of the largest connected region and the four vertices of the uploaded picture, and selecting, for each vertex of the original picture, the closest contour point, so that the four selected points serve as the four vertices of the actual composition area; and performing a perspective transformation based on the four selected vertices to rectify the picture.

It should be noted that the above process is designed for pictures of the composition paper (as shown in FIG. 2) commonly used for Chinese compositions, and the four vertices of the actual composition area obtained are the four corners of the grid on the composition paper shown in FIG. 2.
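The geometry of these steps can be sketched in pure Python, assuming the contour of the largest connected region is already available as a list of (x, y) points. In practice a contour finder such as OpenCV's `cv2.findContours` would supply it; that step is not reproduced here. `pick_corners` picks the contour point nearest to each picture corner, `quad_area` supports the area-threshold check via the shoelace formula, and `homography` solves the standard eight-unknown linear system of a perspective transform. All function names and the threshold in the usage lines are hypothetical. OpenCV's `cv2.getPerspectiveTransform` and `cv2.warpPerspective` would normally perform the last step and the pixel resampling.

```python
def pick_corners(contour, width, height):
    """Return the contour points nearest to the picture's TL, TR, BR, BL corners."""
    picture_corners = [(0, 0), (width - 1, 0), (width - 1, height - 1), (0, height - 1)]

    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    return [min(contour, key=lambda p: dist2(p, c)) for c in picture_corners]


def quad_area(pts):
    """Shoelace area of a quadrilateral whose vertices are given in order."""
    s = 0.0
    for i in range(4):
        x0, y0 = pts[i]
        x1, y1 = pts[(i + 1) % 4]
        s += x0 * y1 - x1 * y0
    return abs(s) / 2


def _solve(A, b):
    """Gauss-Jordan elimination with partial pivoting for an n x n system."""
    n = len(A)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[piv][col]) < 1e-12:
            raise ValueError("degenerate corners: points are collinear or repeated")
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= f * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def homography(src, dst):
    """3x3 matrix H (with H[2][2] = 1) mapping each src corner onto its dst corner."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = _solve(A, b)
    return [h[0:3], h[3:6], h[6:8] + [1.0]]


def warp_point(H, x, y):
    """Map one picture point into the rectified composition area."""
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)


if __name__ == "__main__":
    contour = [(12, 8), (100, 11), (190, 15), (188, 150),
               (185, 290), (95, 285), (5, 280), (8, 144)]
    corners = pick_corners(contour, 200, 300)   # [(12, 8), (190, 15), (185, 290), (5, 280)]
    if quad_area(corners) > 0.5 * 200 * 300:    # hypothetical area threshold
        H = homography(corners, [(0, 0), (180, 0), (180, 270), (0, 270)])
        print(warp_point(H, 185, 290))          # approximately (180.0, 270.0)
```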

Further, the process by which the preprocessing system performs title extraction and paragraph segmentation comprises: inputting the rectified picture into the OCR engine, and performing title extraction and paragraph segmentation on the line coordinate information contained in the returned text position coordinate information. If, for the first two consecutive lines on a page, the abscissa of the leftmost vertex of the first line is larger than that of the next line and larger than a preset first threshold, the first line is determined to be the title area; and if the abscissa of the leftmost vertex of the current line is larger than that of the next line and larger than a preset second threshold, the current line is considered to start a new paragraph.

Since each paragraph begins with an indentation of two characters (two squares on composition paper), the abscissa of the first character of a paragraph (that is, the character at the leftmost vertex of the paragraph's first line) is necessarily larger than that of the first character of the paragraph's next line. Title extraction and paragraph segmentation can be performed on this principle using the above process.
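The two rules can be sketched as follows, assuming the OCR engine returns, for each line, the abscissa of its leftmost vertex. The text requires a line's abscissa to be larger than the next line's and "larger than a preset threshold". This sketch reads each threshold as applying to the difference between the two abscissas; that is one plausible interpretation, not necessarily the author's. The function name and the example thresholds (in pixels, for a grid with 20-pixel squares) are hypothetical.

```python
def split_title_and_paragraphs(lines, title_gap, para_gap):
    """Split one page of OCR lines into an optional title and a list of paragraphs.

    lines: (left_x, text) pairs in reading order, where left_x is the abscissa of
    the line's leftmost vertex. Both thresholds are read as gaps to the next line.
    """
    title, body = None, lines
    # Title rule: the first line starts much further right than the second line.
    if len(lines) >= 2 and lines[0][0] - lines[1][0] > title_gap:
        title, body = lines[0][1], lines[1:]
    paragraphs = []
    for i, (x, text) in enumerate(body):
        next_x = body[i + 1][0] if i + 1 < len(body) else None
        # Paragraph rule: an indented line followed by a less-indented line.
        indented = next_x is not None and x - next_x > para_gap
        if indented or not paragraphs:
            paragraphs.append([text])
        else:
            paragraphs[-1].append(text)
    return title, paragraphs
```

With `page = [(150, "My Mother"), (60, "My mother is a"), (20, "teacher."), (60, "She is busy"), (20, "every day.")]` and thresholds of 60 and 30, the first line becomes the title and the body splits into two paragraphs. As with the rule itself, a one-line paragraph, or an indented last line on a page, is not recognized as a new paragraph, because no less-indented line follows it.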

In addition, in some embodiments, the overall comment on the composition is generated as follows: the Chinese composition correcting system is provided with a pre-trained composition genre classification model and a comment library, wherein the genre classification model is trained with a deep learning algorithm; during correction, the correction system uses the genre classification model to identify the genre of the composition from the text content information, automatically selects relevant comments from the comment library according to the identified genre, and pushes them to the user for selection and modification.

More specifically, millions of composition samples can be collected in advance from major composition websites and used to train a genre classification model (a genre classifier) with a deep learning algorithm, which then identifies the genre of the composition to be corrected. The comment tags under each genre are organized in advance; when the teacher needs to set or modify comments, the comment library can be opened through the provided review assistant tool, and suitable comments can be selected for quick setting, as shown in FIG. 4.
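The trained genre classifier cannot be reproduced from the text, but the push step that follows it can be sketched. Below, `classify` stands in for the deep-learning genre model (any callable that returns a genre label). `COMMENT_LIBRARY` is a hypothetical nesting of genre, comment tag and comment strings, and `push_comments` is likewise an illustrative name; all three are assumptions.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical comment library: genre -> comment tag -> candidate comments.
COMMENT_LIBRARY: Dict[str, Dict[str, List[str]]] = {
    "narrative": {
        "structure": ["The events are told in a clear order."],
        "detail": ["Describe the key scene in more detail."],
    },
    "argumentative": {
        "thesis": ["State the central argument earlier."],
    },
}


def push_comments(
    text: str,
    classify: Callable[[str], str],
    library: Dict[str, Dict[str, List[str]]] = COMMENT_LIBRARY,
    tags: Optional[List[str]] = None,
) -> Tuple[str, List[str]]:
    """Identify the genre, then collect candidate comments for the teacher to pick from."""
    genre = classify(text)
    by_tag = library.get(genre, {})
    # Without explicit tags, offer every tag under the genre in a stable order.
    selected = tags if tags is not None else sorted(by_tag)
    return genre, [c for tag in selected for c in by_tag.get(tag, [])]
```

Narrowing `tags` corresponds to the teacher browsing one section of the library in the review assistant. A genre that has no library entry simply yields no suggestions.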

In addition, in some embodiments, as shown in FIG. 4, the system further comprises an overall evaluation system, which evaluates the composition as a whole according to the correction information. This includes scoring different aspects of the composition, giving a total score and overall evaluation suggestions, and computing statistics on the characters, words and sentences of the composition; the comments may follow templates specified for each dimension in the scoring details. As shown in FIG. 4, the aspects scored include content, expression, structure and writing conventions. The teacher can also modify the scores, overall comments and other results given by the system, and when the teacher adjusts a score in the scoring details, the comments pre-generated by the system change accordingly.
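A minimal sketch of the overall evaluation, assuming each of the four scored aspects carries a maximum score and two comment templates; the key `conventions` stands for the fourth aspect. The point values, the 80% band, the template wording and the function name are all hypothetical. Re-running the function after a teacher changes a score regenerates that dimension's comment, which is one simple way to achieve the synchronized comment update described above. Word-level statistics are omitted because they require a Chinese word segmenter.

```python
import re

# Hypothetical per-dimension templates: (comment for lower scores, comment for higher scores).
TEMPLATES = {
    "content": ("The content could be fuller.", "The content is full and on topic."),
    "expression": ("The wording could be more precise.", "The language is fluent and vivid."),
    "structure": ("Paragraphs could be organized more clearly.", "The structure is clear and complete."),
    "conventions": ("Watch for typos and punctuation.", "The writing is neat and correct."),
}


def overall_evaluation(scores, max_scores, text):
    """Total score, one templated comment per dimension, and simple text statistics."""
    comments = {}
    for dim, score in scores.items():
        low, high = TEMPLATES[dim]
        # Hypothetical band: use the "high" template when the score reaches 80% of the maximum.
        comments[dim] = high if 5 * score >= 4 * max_scores[dim] else low
    # Character count: CJK Unified Ideographs only, so punctuation is not counted.
    characters = sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")
    # Sentences: non-empty pieces between Chinese or Western sentence-ending marks.
    sentences = [seg for seg in re.split(r"[。！？!?]", text) if seg.strip()]
    return {
        "total": sum(scores.values()),
        "comments": comments,
        "characters": characters,
        "sentences": len(sentences),
    }
```

For the text 我爱我家。家里有三口人！你呢？ the sketch counts 12 characters and 3 sentences.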

In addition, to implement excellent-material recommendation in the material recommendation system, a plurality of capability points can be set in advance for each composition genre. During correction, the correction system then determines, from the preset capability points to be detected, those that do not appear in the text content of the composition, and the material recommendation system automatically recommends corresponding excellent composition materials according to these missing capability points. Taking a composition describing a person as an example, its capability points include appearance description, psychological description and the like. After the system diagnoses the capability points of the composition, identifying which appear in it and which do not, it recommends relevant excellent materials in the recommended-learning position for the capability points that do not appear. In addition, to facilitate material recommendation, a labeled material data set can be prepared in advance, so that corresponding materials can be retrieved from it either according to the capability point diagnosis result or directly according to the genre label; the specific process is shown in FIG. 5.
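The recommendation logic reduces to a set difference followed by a lookup in the labeled material data set. The sketch below assumes that capability-point diagnosis has already produced the set of points detected in the composition; the text does not describe how that detection works. `CAPABILITY_POINTS`, `MATERIALS` and `recommend` are hypothetical, and the "person" genre mirrors the example in the text. The fallback to the genre label when nothing matches is one reading of the "directly according to the genre label" option.

```python
# Hypothetical capability points per genre and a labeled material data set.
CAPABILITY_POINTS = {
    "person": ["appearance description", "psychological description", "action description"],
}
MATERIALS = [
    {"genre": "person", "point": "psychological description", "title": "Inner monologue excerpt"},
    {"genre": "person", "point": "appearance description", "title": "Portrait excerpt"},
    {"genre": "scenery", "point": "sensory description", "title": "Autumn scene excerpt"},
]


def recommend(genre, detected_points, points=CAPABILITY_POINTS, materials=MATERIALS):
    """Return the capability points missing from the composition and the matching materials."""
    missing = [p for p in points.get(genre, []) if p not in detected_points]
    recs = [m for m in materials if m["genre"] == genre and m["point"] in missing]
    if not recs:
        # Fallback (an assumption): recommend by genre label alone.
        recs = [m for m in materials if m["genre"] == genre]
    return missing, recs
```

A composition about a person that shows only appearance description yields two missing points, but only one matching excerpt, because the illustrative data set has no material for action description.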

In addition, the specific working process of the human-computer combined Chinese composition correcting system comprises the following steps:

the composition acquisition system 1 acquires a composition to be corrected in a picture format uploaded by a user;

the preprocessing system 2 uses an OCR engine to perform layout analysis on the acquired composition to be corrected, so as to extract the actual composition area, obtain text position coordinate information and text content information, and perform title extraction and paragraph segmentation;

the correction system 3 corrects the text content information obtained by the preprocessing system and adds the correction information to the corresponding position of the original picture format of the composition to be corrected;

the material recommending system 4 automatically recommends excellent composition materials based on the shortcomings of the composition.

This scheme provides a closed loop for Chinese composition learning, from machine evaluation to manual correction to material recommendation. Name matching and AI pre-correction are performed during upload. The platform's correction tool lets teachers revise the machine's pre-correction results, and the pre-generated comments are updated dynamically. The comment library in the review assistant gives teachers ideas and prompts for writing comments and lets them change comments conveniently. As a result, teachers' correcting efficiency and quality can be greatly improved, and students can view the correction results easily. Moreover, the system can recommend learning materials personalized to the diagnosis of each student's composition, helping students improve their writing.

It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.

It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, the meaning of "a plurality" means at least two unless otherwise specified.

Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.

It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.

It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.

In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.

In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.

Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims

1. A human-computer combined Chinese composition correcting system, characterized by comprising:

2. The system of claim 1, wherein the composition acquisition system can acquire either a single picture or multiple pictures uploaded in a batch, and for a batch upload automatically matches the pictures with the corresponding names; the matching process comprises: performing layout analysis on each picture to extract its name area, yielding a set of name-area pictures; recognizing each name-area picture with an OCR engine to obtain the name information; and matching each picture with the corresponding name according to the recognized name information.

3. The system of claim 1, wherein the preprocessing system extracts the actual composition area by:

4. The system of claim 3, wherein the process by which the preprocessing system performs title extraction and paragraph segmentation comprises:

5. The system of claim 1, wherein the Chinese composition correcting system is provided with a pre-trained composition genre classification model and a comment library, the genre classification model being trained with a deep learning algorithm;

6. The system of claim 5, wherein, during correction, the correction system determines, from a plurality of preset capability points to be detected, the capability points that do not appear in the text content of the composition, each composition genre being provided with its own plurality of capability points;

7. The system of claim 1, wherein the correction information comprises textual comment information and marks, the marks comprising lines, graphics and symbols;

8. The system of claim 1, further comprising an overall evaluation system for evaluating the composition as a whole based on the correction information, including scoring different aspects of the composition, giving a total score and overall evaluation suggestions, and computing statistics on the characters, words and sentences of the composition.

9. A human-computer combined Chinese composition correcting method, applied to the human-computer combined Chinese composition correcting system of any one of claims 1 to 8, the method comprising: