CN112434508A - Research report automatic generation method based on deep learning - Google Patents

Research report automatic generation method based on deep learning Download PDF

Info

Publication number
CN112434508A
CN112434508A (application CN202011441359.4A)
Authority
CN
China
Prior art keywords
data
filled
research
data item
name
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011441359.4A
Other languages
Chinese (zh)
Other versions
CN112434508B (en)
Inventor
黄冬虹
刘谢慧
赵彤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingyan Lingzhi Information Consulting Beijing Co ltd
Original Assignee
Qingyan Lingzhi Information Consulting Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingyan Lingzhi Information Consulting Beijing Co ltd filed Critical Qingyan Lingzhi Information Consulting Beijing Co ltd
Priority to CN202011441359.4A priority Critical patent/CN112434508B/en
Publication of CN112434508A publication Critical patent/CN112434508A/en
Application granted granted Critical
Publication of CN112434508B publication Critical patent/CN112434508B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 - Handling natural language data
    • G06F40/10 - Text processing
    • G06F40/166 - Editing, e.g. inserting or deleting
    • G06F40/186 - Templates
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00 - Image enhancement or restoration
    • G06T5/70 - Denoising; Smoothing
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/22 - Image preprocessing by selection of a specific region containing or referencing a pattern; Locating or processing of specific regions to guide the detection or recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition
    • G06V30/14 - Image acquisition
    • G06V30/148 - Segmentation of character regions
    • G06V30/153 - Segmentation of character regions using recognition of characters or words
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 - Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 - Character recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Character Input (AREA)

Abstract

The invention provides a research report automatic generation method based on deep learning, which comprises the following steps: S1, acquiring a template of a research report to be generated and the corresponding research data; S2, acquiring an item to be filled in the template; S3, based on the item to be filled, acquiring the corresponding deep learning neural network model and acquiring the corresponding calculation data from the research data; S4, inputting the calculation data into the deep learning neural network model to obtain a calculation result; S5, filling the calculation result into the item to be filled; and S6, repeating steps S2-S5 until all items to be filled are filled, thereby obtaining the research report. Compared with the prior art, the research report of the application is generated faster and with correspondingly higher accuracy: manual entry and calculation are prone to input errors, and obtaining the data directly from the database avoids such errors to a great extent.

Description

Research report automatic generation method based on deep learning
Technical Field
The invention relates to the field of report generation, in particular to a research report automatic generation method based on deep learning.
Background
At present, research reports are used in many industries, and within the same industry the framework of a research report is largely the same; the differences lie mainly in the data and in the results calculated from the data. Demand for the various types of research reports keeps increasing, and relying on manually analyzing the research data and then writing the report cannot produce the required reports quickly.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide an automatic generation method of a study report based on deep learning.
The invention provides a research report automatic generation method based on deep learning, which comprises the following steps:
s1, acquiring a template of a research report to be generated and corresponding research data;
s2, acquiring the items to be filled in the template;
s3, acquiring a corresponding deep learning neural network model and acquiring corresponding calculation data from the research data based on the project to be filled;
s4, inputting the calculation data into the deep learning neural network model to obtain a calculation result;
s5, filling the calculation result into the item to be filled;
and S6, repeating the steps S2-S5 until all the items to be filled are filled, thereby obtaining a research report.
Preferably, the research data include the name of a data item and the specific numerical value of the data item, and the research data are entered into the database by scanned input, which specifically includes:
scanning a paper file recording the research data to obtain a scanned image;
performing character recognition on the scanned image to obtain the name of the data item recorded on the paper file and the specific numerical value of the data item;
and transmitting the name of the data item and the specific numerical value of the data item to the database for storage.
Preferably, the obtaining of the corresponding calculation data from the research data comprises:
the items to be filled comprise names and filling areas of the items to be filled;
and matching the name of the item to be filled with the name of the data item, and taking the specific numerical value of the data item corresponding to the name of the successfully matched data item as calculation data.
Preferably, the filling the calculation result into the item to be filled includes: and filling the calculation result into the filling area.
Preferably, the performing character recognition on the scanned image to obtain the name of the data item recorded on the paper document and the specific numerical value of the data item includes:
carrying out graying processing on the scanned image to obtain a grayscale image;
carrying out noise reduction processing on the gray level image to obtain a noise reduction image;
carrying out segmentation processing on the noise reduction image to obtain a foreground image only containing a character part;
and performing character recognition on the foreground image by adopting an OCR character recognition technology, so as to obtain the name of the data item recorded on the paper file and the specific numerical value of the data item.
Compared with the prior art, the invention has the advantages that:
Compared with manual writing of research reports, the research report of the application is generated faster and with correspondingly higher accuracy: manual entry and calculation are prone to input errors, while obtaining the data directly from the database avoids such errors to a great extent. Moreover, manual calculation is costly and inefficient, and cannot produce the required research report quickly.
Drawings
The invention is further illustrated by means of the attached drawings, but the embodiments in the drawings do not constitute any limitation to the invention, and for a person skilled in the art, other drawings can be obtained on the basis of the following drawings without inventive effort.
Fig. 1 is a diagram illustrating an exemplary embodiment of a method for automatically generating a research report based on deep learning according to the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
Referring to fig. 1, the present invention provides an automatic generation method of a research report based on deep learning, which includes:
s1, acquiring a template of a research report to be generated and corresponding research data;
s2, acquiring the items to be filled in the template;
s3, acquiring a corresponding deep learning neural network model and acquiring corresponding calculation data from the research data based on the project to be filled;
s4, inputting the calculation data into the deep learning neural network model to obtain a calculation result;
s5, filling the calculation result into the item to be filled;
and S6, repeating the steps S2-S5 until all the items to be filled are filled, thereby obtaining a research report.
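The loop of steps S1 to S6 can be sketched as follows. The interfaces used here (a template mapping item names to fill areas, and a model-lookup callable named `get_model_for`) are illustrative assumptions; the patent does not prescribe any particular API.

```python
def generate_report(template, research_data, get_model_for):
    """Fill every item of a report template (steps S2-S6).

    `template` maps each item name to its fill area; `research_data`
    maps data-item names to values; `get_model_for` returns the
    deep-learning model (any callable) associated with an item name.
    """
    report = {}
    for name, fill_area in template.items():  # S2: next item to fill
        model = get_model_for(name)           # S3: model for this item
        calc_data = research_data[name]       # S3: matching calculation data
        result = model(calc_data)             # S4: run the model
        report[fill_area] = result            # S5: fill the area
    return report                             # S6: all items filled
```

A usage example would pair a one-item template with a simple averaging "model"; any trained network exposing a `__call__` could be substituted.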
The template of the research report is a template generally used in the industry, i.e. its structure is basically fixed; only the numerical values to be filled into the different items of the template differ.
Research data are collected by researchers, and the researchers can carry out preliminary screening on the research data and remove error data which obviously exceed a normal range so as to enhance the accuracy of generation of subsequent research reports.
Preferably, the research data include the name of a data item and the specific numerical value of the data item, and the research data are entered into the database by scanned input, which specifically includes:
scanning a paper file recording the research data to obtain a scanned image;
performing character recognition on the scanned image to obtain the name of the data item recorded on the paper file and the specific numerical value of the data item;
and transmitting the name of the data item and the specific numerical value of the data item to the database for storage.
Besides scanned input, a manual input mode can also be used; this mode is directed at paper files whose handwriting is too sloppy to be recognized by the character recognition technology. Of course, if the research data are electronic data, researchers can enter them into the database directly, which is much faster.
Preferably, the obtaining of the corresponding calculation data from the research data comprises:
the items to be filled comprise names and filling areas of the items to be filled;
and matching the name of the item to be filled with the name of the data item, and taking the specific numerical value of the data item corresponding to the name of the successfully matched data item as calculation data.
In addition, an automatic matching module can be provided, whose input is the name of the item to be filled and whose output is the name of one or more data items. Researchers can adjust this module according to actual needs, improving the adaptability of automatic research report generation.
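The name-matching step, with the adjustable matching module stood in by a simple alias table, can be sketched as follows; the `aliases` parameter is a hypothetical name introduced for illustration, not part of the patent.

```python
def match_calc_data(item_name, research_data, aliases=None):
    """Return the data-item values matched to an item to be filled.

    An exact name match is tried first; otherwise the alias table
    (item name -> list of data-item names), standing in for the
    adjustable automatic matching module, is consulted.
    """
    if item_name in research_data:
        return [research_data[item_name]]
    candidate_names = (aliases or {}).get(item_name, [])
    return [research_data[n] for n in candidate_names if n in research_data]
```

Returning a list allows the matching module to map one item to several data items, as the text above permits.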
Preferably, the filling the calculation result into the item to be filled includes: and filling the calculation result into the filling area.
Preferably, the performing character recognition on the scanned image to obtain the name of the data item recorded on the paper document and the specific numerical value of the data item includes:
carrying out graying processing on the scanned image to obtain a grayscale image;
carrying out noise reduction processing on the gray level image to obtain a noise reduction image;
carrying out segmentation processing on the noise reduction image to obtain a foreground image only containing a character part;
and performing character recognition on the foreground image by adopting an OCR character recognition technology, so as to obtain the name of the data item recorded on the paper file and the specific numerical value of the data item.
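The four-step recognition pipeline above can be sketched as follows. Simple stand-ins (luminosity grayscale conversion, mean-filter denoising, fixed-threshold segmentation) replace the adaptive variants detailed in the sections that follow; the resulting foreground mask would then be handed to an OCR engine such as Tesseract, which is not invoked here.

```python
import numpy as np

def preprocess_for_ocr(rgb, noise_kernel=3, fg_threshold=128):
    """Grayscale -> denoise -> segment, mirroring the steps above."""
    # graying: weighted luminosity conversion of the H x W x 3 image
    gray = rgb @ np.array([0.299, 0.587, 0.114])
    # noise reduction: k x k mean filter, edge-padded
    k = noise_kernel // 2
    padded = np.pad(gray, k, mode="edge")
    denoised = np.zeros_like(gray)
    h, w = gray.shape
    for i in range(h):
        for j in range(w):
            denoised[i, j] = padded[i:i + noise_kernel, j:j + noise_kernel].mean()
    # segmentation: dark pixels are kept as the character foreground
    foreground = denoised < fg_threshold
    return denoised, foreground
```

The threshold and kernel defaults are assumed example values; the patent's own adaptive graying and noise reduction are described next.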
Preferably, the graying the scanned image to obtain a grayscale image includes:
acquiring a brightness component L of the scanned image in a Lab color model,
processing the luminance component as follows:
(the formula is given only as an image in the source and is not reproduced here)
wherein (x, y) represents a coordinate; L(x, y) represents the luminance component value of the pixel at (x, y); aL(x, y) represents the processed luminance component value of that pixel, and aL the processed luminance component; a1 and a2 represent preset weight parameters with a1 + a2 = 1; aveL(x, y) represents the average luminance component of the pixels in the k × k neighborhood of the pixel at (x, y), and maL(x, y) the minimum luminance component in that neighborhood; neiL represents the average luminance component of all pixels of the scanned image in the Lab color model; and delta represents a control coefficient that keeps L(x, y) within a reasonable value range,
converting the aL back to an RGB color model, thereby obtaining an adjusted scanned image;
and carrying out graying processing on the adjusted scanning image to obtain a grayscale image.
Uneven brightness easily arises during scanning. According to this embodiment of the application, the brightness of each pixel is adjusted adaptively: the currently processed pixel is brightness-corrected accurately according to the specific conditions of the surrounding pixels, dim parts are enhanced, more detail information is retained for the subsequent grayscale image, and the accuracy of character recognition is improved.
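Since the patent's exact luminance formula is reproduced only as an image, the sketch below merely illustrates its stated ingredients: a weighted blend (a1 + a2 = 1) of a pixel's own luminance with its k × k neighborhood mean aveL, scaled against the global mean neiL with a control coefficient delta so that locally dim regions are lifted. The combination rule and all numeric defaults are assumptions, not the patented formula.

```python
import numpy as np

def adjust_luminance(L, a1=0.7, k=3, delta=1.0):
    """Illustrative adaptive luminance adjustment (not the patent's formula)."""
    a2 = 1.0 - a1
    pad = k // 2
    padded = np.pad(np.asarray(L, dtype=float), pad, mode="edge")
    h, w = L.shape
    out = np.empty((h, w))
    neiL = float(np.mean(L))  # global mean luminance
    for i in range(h):
        for j in range(w):
            aveL = padded[i:i + k, j:j + k].mean()  # local mean luminance
            blended = a1 * padded[i + pad, j + pad] + a2 * aveL
            # scale against the global mean: locally dark regions get a
            # multiplier > 1, so dim parts are brightened
            out[i, j] = blended * neiL / (aveL + delta)
    return np.clip(out, 0.0, 255.0)
```

On a uniform image the multiplier is close to 1; a dark pixel inside a bright page is boosted, matching the behavior the paragraph above describes.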
Preferably, the performing noise reduction processing on the grayscale image to obtain a noise-reduced image includes:
carrying out noise point detection on pixel points in the gray level image to determine noise points;
recording the noise point as c, calculating a standard deviation fc (c) of gradients of pixel points in a neighborhood nei (c) with the size of e × e of the noise point c, and if fc (c) is smaller than a preset noise reduction threshold, performing noise reduction processing on the noise point c by adopting the following mode:
ano(c) = (1 / numofnei) · Σ f(g), summed over the pixels g in nei(c)
wherein ano(c) represents the processed pixel value of the noise point c, f(g) represents the pixel value of a pixel g in nei(c), and numofnei represents the total number of pixels in nei(c);
if fc (c) is greater than or equal to a preset noise reduction threshold value, performing noise reduction processing on the noise point c by using the following formula:
(the formula is given only as an image in the source and is not reproduced here)
wherein no(c) represents the pixel value of the noise point c and ano(c) its processed pixel value; nei(c) represents the set of pixels in the e × e neighborhood of c, d represents an element of nei(c), and f(d) the pixel value of d; gs(d) = mod ∗ d, where mod represents the template used for Gaussian filtering of the grayscale image and ∗ the convolution sign; aveno(c) represents the average pixel value of all elements in nei(c); td(c) represents the standard deviation of the gradient amplitudes of all pixels in nei(c), fc(c) the standard deviation of the pixels in nei(c), and di the spatial distance between d and c.
By calculating the standard deviation of the gradients, the degree of variation among the pixels in nei(c) can be judged preliminarily. If the variation is small, the pixel value of c is replaced by the average pixel value in nei(c), achieving the noise reduction. When the variation among the pixels in nei(c) is larger, the surroundings of the noise point are more complex; the relations between the noise point and the surrounding pixels in gradient amplitude, spatial distance and other respects are therefore fully considered, so that different pixels in nei(c) receive different weights and accurate noise reduction is realized. Using a Gaussian filter template as the weight also gives a better noise reduction effect on Gaussian noise points.
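The two-branch correction can be sketched as follows. The smooth-neighborhood branch (mean replacement) follows the text directly; because the weighted formula for the complex branch is given only as an image, a plain Gaussian spatial weighting stands in for it here, and the gradient stand-in and threshold default are likewise assumptions.

```python
import numpy as np

def denoise_point(img, c, e=3, fc_threshold=10.0):
    """Correct one detected noise point c = (row, col), assumed interior."""
    y, x = c
    pad = e // 2
    nei = img[y - pad:y + pad + 1, x - pad:x + pad + 1].astype(float)
    # gradient stand-in: deviations from the neighborhood mean
    grads = np.abs(nei - nei.mean())
    if grads.std() < fc_threshold:
        # smooth neighborhood: replace c with the plain neighborhood mean
        return nei.mean()
    # complex neighborhood: Gaussian spatial weighting, nearer pixels
    # count more; the noise point itself is excluded from the average
    yy, xx = np.mgrid[y - pad:y + pad + 1, x - pad:x + pad + 1]
    w = np.exp(-((yy - y) ** 2 + (xx - x) ** 2) / 2.0)
    w[pad, pad] = 0.0
    return float((w * nei).sum() / w.sum())
```

The branch choice reproduces the logic above: low gradient spread means a flat area where the mean suffices, high spread triggers the distance-weighted replacement.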
Preferably, the noise point detection is performed on the pixel points in the gray image, and determining the noise point includes:
marking the pixel point currently being processed as i, and, for pixel point i, marking the set of pixel points in its r × r neighborhood as Ui;
calculating, for each pixel point in Ui, the absolute value of the difference between its gradient amplitude and that of pixel point i:
abv(i, j) = |td(i) − td(j)|, where j represents one pixel point in Ui; abv(i, j) represents the absolute value of the difference in gradient magnitude between i and j; td(i) and td(j) respectively represent the gradient magnitudes of pixel points i and j;
and selecting the first ma largest absolute values for summation to obtain a judgment parameter:
ad(i) = Σ abv(i, b), summed over the pixel points b in Uma
wherein ad(i) represents the judgment parameter of i; Uma represents the set of pixel points corresponding to the first ma largest absolute values; b represents an element of Uma; and td(b) represents the gradient amplitude of pixel point b;
comparing ad(i) with a preset noise judgment threshold: if ad(i) is greater than the noise judgment threshold, pixel point i is taken as a noise point; otherwise it is not taken as a noise point.
In conventional noise point detection, the average gray value in the neighborhood is calculated and compared with the currently processed pixel point to decide whether that pixel point is a noise point. With this processing mode, when several noise points exist in the neighborhood, the average value is skewed and noise points are easily missed; detection based on the gradient amplitude, as above, avoids this problem well.
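The gradient-amplitude detection described above can be sketched as follows. Central differences stand in for the unspecified gradient operator, and the `ma` and threshold defaults are assumed example values; border pixels are simply skipped.

```python
import numpy as np

def detect_noise(img, r=3, ma=4, threshold=100.0):
    """Flag noise points via sums of gradient-magnitude differences."""
    f = img.astype(float)
    gy, gx = np.gradient(f)
    td = np.hypot(gx, gy)  # gradient magnitude td(i) per pixel
    pad = r // 2
    h, w = img.shape
    noise = np.zeros((h, w), dtype=bool)
    for i in range(pad, h - pad):
        for j in range(pad, w - pad):
            nei = td[i - pad:i + pad + 1, j - pad:j + pad + 1].ravel()
            abv = np.abs(nei - td[i, j])      # |td(i) - td(j)| over U_i
            ad = np.sort(abv)[-ma:].sum()     # sum of the ma largest values
            noise[i, j] = ad > threshold      # judgment parameter vs threshold
    return noise
```

Unlike a neighborhood-mean test, a single bright spike leaves its own gradient near zero but inflates its neighbors' magnitudes, so the difference sum still flags it.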
While embodiments of the invention have been shown and described, it will be understood by those skilled in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (5)

1. A research report automatic generation method based on deep learning is characterized by comprising the following steps:
s1, acquiring a template of a research report to be generated and corresponding research data;
s2, acquiring the items to be filled in the template;
s3, acquiring a corresponding deep learning neural network model and acquiring corresponding calculation data from the research data based on the project to be filled;
s4, inputting the calculation data into the deep learning neural network model to obtain a calculation result;
s5, filling the calculation result into the item to be filled;
and S6, repeating the steps S2-S5 until all the items to be filled are filled, thereby obtaining a research report.
2. The method according to claim 1, wherein the research data include the name of a data item and the specific numerical value of the data item, and the research data are entered into the database by scanned input, which specifically includes:
scanning a paper file recording the research data to obtain a scanned image;
performing character recognition on the scanned image to obtain the name of the data item recorded on the paper file and the specific numerical value of the data item;
and transmitting the name of the data item and the specific numerical value of the data item to the database for storage.
3. The method for automatically generating a research report based on deep learning according to claim 2, wherein the obtaining of corresponding calculation data from the research data comprises:
the items to be filled comprise names and filling areas of the items to be filled;
and matching the name of the item to be filled with the name of the data item, and taking the specific numerical value of the data item corresponding to the name of the successfully matched data item as calculation data.
4. The method as claimed in claim 3, wherein the filling of the calculation result into the item to be filled comprises: and filling the calculation result into the filling area.
5. The method of claim 3, wherein the performing text recognition on the scanned image to obtain the name of the data item recorded on the paper document and the specific value of the data item comprises:
carrying out graying processing on the scanned image to obtain a grayscale image;
carrying out noise reduction processing on the gray level image to obtain a noise reduction image;
carrying out segmentation processing on the noise reduction image to obtain a foreground image only containing a character part;
and performing character recognition on the foreground image by adopting an OCR character recognition technology, so as to obtain the name of the data item recorded on the paper file and the specific numerical value of the data item.
CN202011441359.4A 2020-12-10 2020-12-10 Research report automatic generation method based on deep learning Active CN112434508B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011441359.4A CN112434508B (en) 2020-12-10 2020-12-10 Research report automatic generation method based on deep learning


Publications (2)

Publication Number Publication Date
CN112434508A true CN112434508A (en) 2021-03-02
CN112434508B CN112434508B (en) 2022-02-01

Family

ID=74691542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011441359.4A Active CN112434508B (en) 2020-12-10 2020-12-10 Research report automatic generation method based on deep learning

Country Status (1)

Country Link
CN (1) CN112434508B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108121966A (en) * 2017-12-21 2018-06-05 欧浦智网股份有限公司 A kind of list method for automatically inputting, electronic equipment and storage medium based on OCR technique
CN109446345A (en) * 2018-09-26 2019-03-08 深圳中广核工程设计有限公司 Nuclear power file verification processing method and system
CN109800397A (en) * 2017-11-16 2019-05-24 北大方正集团有限公司 Data analysis report automatic generation method, device, computer equipment and medium
US20190189263A1 (en) * 2017-12-15 2019-06-20 International Business Machines Corporation Automated report generation based on cognitive classification of medical images
CN110503403A (en) * 2019-08-27 2019-11-26 陕西蓝图司法鉴定中心 Analysis and identification intelligent automation system and method for degree of injury
CN111986182A (en) * 2020-08-25 2020-11-24 卫宁健康科技集团股份有限公司 Auxiliary diagnosis method, system, electronic device and storage medium


Also Published As

Publication number Publication date
CN112434508B (en) 2022-02-01


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant