CN114782671A - Data structuring method, device and storage medium for OCR recognition of power report picture - Google Patents

Info

Publication number
CN114782671A
Authority
CN
China
Prior art keywords
picture
template
ocr recognition
ocr
report
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210530637.6A
Other languages
Chinese (zh)
Inventor
黄金钊
李学超
李柏新
田明正
林园敏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Original Assignee
Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Power Supply Bureau of Guangdong Power Grid Co Ltd
Priority to CN202210530637.6A
Publication of CN114782671A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Character Input (AREA)

Abstract

The invention relates to a data structuring method for OCR recognition of power report pictures. An OCR template tree module configures user role permissions so that different power business users use different OCR recognition templates organized as a template tree. Power reports are classified by business type, and each configured OCR recognition template is associated with a template tree node. Labels are initialized according to the OCR recognition dictionary rules, and identical labels are given a uniform recognition type so that they can be provided to templates as generalized labels. A report picture is uploaded as the template annotation picture, positions are annotated on it using the generalized labels, and the position coordinates of the text in the picture are stored. The picture is then cut at the annotated coordinates using the picture cutting technology provided by the technical service center. The cut sub-pictures are recognized using the OCR technology provided by the algorithm service center, and the results are persisted to form structured data. Structuring power report picture data in this way supports power business applications.

Description

Data structuring method, device and storage medium for OCR recognition of power report picture
Technical Field
The invention relates to the field of power production management, and in particular to a data structuring method, device and storage medium for OCR recognition of power report pictures.
Background
As the digital transformation of power grid enterprises accelerates, higher requirements are placed on distribution network production management, and paper power reports need to be stored as structured data. For historical reasons, informatization of distribution network production management is insufficient, and paper reports have been used for archiving. As a result, the actual condition of equipment cannot be accurately reflected during production management, which introduces uncertainty into production decisions.
In power production operations, unstructured report data (pictures, scanned copies and paper documents) remains an indispensable part of production. However, a computer cannot directly read, recognize or retrieve its contents, nor directly analyze or mine the data. The unstructured data therefore needs to be modeled: a structured description model that identifies the whole process of the equipment is constructed, paper reports are converted into structured data, and logical relations are established between the reports and the business, forming production big data that supports production decisions.
Disclosure of Invention
To solve the above technical problem, or at least partially solve it, the invention provides a data structuring method, device and storage medium for OCR recognition of power report pictures.
In a first aspect, the present invention provides a data structuring method for OCR recognition of a power report picture, including:
providing multi-tenant permission management through a technical service center, configuring user role permissions through an OCR template tree module so that different power business users use different OCR recognition templates, and dividing report picture types into different recognition types in order to configure different template tree nodes;
further, performing structured business classification according to the different types of power report, such as test reports, overhaul reports, acceptance reports and other reports, associating each configured OCR recognition template with a preset template tree node, and establishing and storing the correspondence between the OCR recognition templates and the template tree;
further, sorting out the label fields that different types of report have in common, initializing the label structure according to the OCR recognition dictionary rules, and setting a uniform recognition type for identical labels so that they can be provided to templates as generalized labels;
further, uploading a report picture as the template annotation picture, loading the annotation picture at its original size as a base picture, annotating positions on the report picture using the generalized labels, and storing the position coordinates of the text in the picture;
further, using the picture cutting technology provided by the technical service center, computing the image depth and image channels, calculating the index range of the image matrix from the set coordinate information, taking the sub-matrix within that range, and thereby cutting the picture at the annotated coordinates.
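The cutting step can be illustrated with a minimal Python sketch. It is not the technical service center's implementation, which the text does not disclose. It assumes the picture is held as rows of pixels, that the stored coordinates are the X/Y minimum and maximum values of each annotated box, and that the maximum values are exclusive; the names `bit_depth`, `color_count` and `crop` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, ...]    # one value per channel, e.g. (r, g, b)
Image = List[List[Pixel]]  # rows (Y axis) of pixels (X axis)


def bit_depth(levels: int) -> int:
    """Bits needed to encode `levels` distinct values (2 -> 1, 256 -> 8)."""
    return (levels - 1).bit_length()


def color_count(depth: int, channels: int) -> int:
    """Distinct colors representable with `channels` channels of `depth` bits each."""
    return (2 ** depth) ** channels


def crop(image: Image, x_min: int, y_min: int, x_max: int, y_max: int) -> Image:
    """Return the sub-matrix covered by an annotated box, clamped to the image bounds."""
    height = len(image)
    width = len(image[0]) if height else 0
    x0, x1 = max(0, x_min), min(width, x_max)
    y0, y1 = max(0, y_min), min(height, y_max)
    if x0 >= x1 or y0 >= y1:
        raise ValueError("annotated box lies outside the image")
    # Rows are selected by Y and columns by X; every channel of a pixel is kept.
    return [row[x0:x1] for row in image[y0:y1]]


# A blank 200x100 RGB picture stands in for a scanned report (assumed sample data).
report = [[(0, 0, 0)] * 200 for _ in range(100)]
field = crop(report, x_min=20, y_min=10, x_max=80, y_max=40)
print(len(field), len(field[0]))  # height and width of the cut sub-picture
```

Here a binary image (2 levels) gives a depth of 1 and a grayscale image (256 levels, values 0-255) gives a depth of 8, so an 8-bit, 3-channel image can represent `color_count(8, 3)` colors.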
In a second aspect, the present application provides a system for the data structuring method for OCR recognition of power report pictures, including: a technical service center, an application service center, an algorithm service center, a file service center and a data center, which are connected through a network.
In a third aspect, the application provides a storage medium for the data structuring method for OCR recognition of power report pictures. The storage medium stores at least one instruction, and the instruction is read and executed to implement the data structuring method for OCR recognition of power report pictures.
Compared with the prior art, the technical scheme provided by the embodiment of the invention has the following advantages:
In the present application, an OCR template tree module configures user role permissions so that users of different power businesses use different OCR recognition templates organized as a template tree. Power reports are classified by business type, and the configured OCR recognition templates are associated with template tree nodes. Labels are initialized according to the OCR recognition dictionary rules, and a uniform recognition type is set for identical labels so that they can be provided to templates as generalized labels. A report picture is uploaded as the template annotation picture, positions are annotated on it using the generalized labels, and the position coordinates of the text in the picture are stored. The picture is cut at the annotated coordinates using the picture cutting technology provided by the technical service center. The cut sub-pictures are recognized using the OCR technology provided by the algorithm service center, and the results are persisted to form structured data. A structured description model covering the whole report recognition process is thereby constructed, achieving data structuring and data visualization of paper power reports and supporting business applications.
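As an illustration of how role permissions and the template tree could fit together, the following Python sketch resolves which OCR recognition templates a user may use. The record fields follow those listed in claims 2 and 3. The function name `templates_for_user`, the in-memory lists, the reading of "template tree number" as a tree node number, and all sample numbers and names are assumptions made for this example and are not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class TreeNode:
    node_id: int
    parent_id: Optional[int]  # None marks the root node
    name: str
    is_leaf: bool
    order: int


@dataclass(frozen=True)
class Template:
    template_id: int
    name: str
    base_picture: str  # identifier of the annotated base picture


def templates_for_user(
    user_id: int,
    user_roles: List[Tuple[int, int]],      # (user number, role number)
    role_nodes: List[Tuple[int, int]],      # (role number, tree node number)
    node_templates: List[Tuple[int, int]],  # (tree node number, template number)
    nodes: Dict[int, TreeNode],
    templates: Dict[int, Template],
) -> List[Tuple[str, str]]:
    """Return (node name, template name) pairs visible to the user, in node order."""
    roles = {role for user, role in user_roles if user == user_id}
    allowed = {node for role, node in role_nodes if role in roles}
    visible = [(nodes[n], templates[t]) for n, t in node_templates if n in allowed]
    visible.sort(key=lambda pair: (pair[0].order, pair[1].template_id))
    return [(node.name, tpl.name) for node, tpl in visible]


# Hypothetical sample data: a root with two leaf nodes, one template per leaf.
nodes = {
    1: TreeNode(1, None, "Power reports", False, 0),
    2: TreeNode(2, 1, "Test report", True, 1),
    3: TreeNode(3, 1, "Overhaul report", True, 2),
}
templates = {
    10: Template(10, "Transformer test", "pic-10"),
    11: Template(11, "Cable overhaul", "pic-11"),
}
user_roles = [(100, 1), (101, 1), (101, 2)]
role_nodes = [(1, 2), (2, 3)]
node_templates = [(2, 10), (3, 11)]

print(templates_for_user(100, user_roles, role_nodes, node_templates, nodes, templates))
print(templates_for_user(101, user_roles, role_nodes, node_templates, nodes, templates))
```

In this sample, user 100 holds only role 1 and therefore sees only the test report template, while user 101 holds both roles and sees both templates.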
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. A person skilled in the art can obtain other drawings from these without inventive effort.
FIG. 1 is a general design flow diagram for formulating and executing OCR recognition structuring of a paper power report according to an embodiment of the present invention;
FIG. 2 is an implementation flow chart for formulating and executing OCR recognition structuring of a paper power report according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a system for OCR recognition structuring of a paper power report according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
Example 1
The embodiment of the invention provides a data structuring method for OCR recognition of a power report picture.
Referring to FIG. 2, the process of data structuring for OCR recognition of a power report picture includes:
S10, the technical service center provides multi-tenant permission management, which comprises user role permissions, role menu permissions and role template tree permissions; report picture types are divided into different recognition types in order to configure different template tree nodes.
S20, structured business classification is performed according to the different types of power report. Because power reports of the same type may have different templates, the reports are classified according to template rules and an OCR recognition template is configured for each classification. Each configured OCR recognition template is associated with a preset template tree node, and the correspondence between the OCR recognition templates and the template tree is established and stored.
S30, the label fields that different types of report have in common are sorted out, and the label structure is initialized according to the OCR recognition dictionary rules. For these rules, the application service center provides the words and phrases that form the label initialization rules.
S40, the label types in the OCR recognition dictionary information comprise common label content, common table content and XY table content. It is determined whether the label type is common table content or XY table content; if so, S50 is executed, otherwise S60 is executed.
S50, the association between the label information and the table information is established.
S60, it is determined whether the OCR recognition dictionary information is marked as containing the label; if so, S70 is executed, otherwise S80 is executed.
S70, separator information is entered.
S80, the paper report is scanned into a picture, and the report picture is uploaded to the file service center. The picture is loaded at its original size as a base picture, positions are annotated on it using the generalized labels through the visual annotation provided by the application service center, and the position coordinates of the text in the picture are stored.
S90, the index range of the image matrix is calculated from the set coordinate information, the sub-matrix within that range is taken, and the picture is thereby cut at the annotated coordinates.
S100, the cut sub-pictures are recognized using the OCR technology provided by the algorithm service center, and the results are persisted to form structured data. The structured data is the label information of the recognized OCR report and is stored in the data center according to the database dictionary rules.
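The dictionary checks in S40 to S70 can be sketched in Python as follows. This is an assumed reading of those steps rather than the disclosed implementation: a table-type label must be linked to a table, and a label whose recognized text also contains the label name must carry a separator that is later used to split the name from its content. The class and function names, the field names and the sample text are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LabelType(Enum):
    COMMON_LABEL = "common label content"
    COMMON_TABLE = "common table content"
    XY_TABLE = "XY table content"


@dataclass
class DictionaryEntry:
    label_id: int
    name: str
    label_type: LabelType
    order: int
    contains_label: bool = False     # recognized text includes the label name
    separator: Optional[str] = None  # required when contains_label is True
    table_id: Optional[int] = None   # required for table-type labels


def check_entry(entry: DictionaryEntry) -> None:
    """Reject entries that skip the association (S50) or the separator (S70)."""
    is_table = entry.label_type in (LabelType.COMMON_TABLE, LabelType.XY_TABLE)
    if is_table and entry.table_id is None:
        raise ValueError(f"label {entry.label_id}: table label needs a table association")
    if entry.contains_label and not entry.separator:
        raise ValueError(f"label {entry.label_id}: separator is required")


def split_content(entry: DictionaryEntry, recognized_text: str) -> str:
    """Return the content part of the recognized text for this label."""
    check_entry(entry)
    text = recognized_text.strip()
    if not entry.contains_label:
        return text
    _label, found, content = text.partition(entry.separator)
    # If the separator was not recognized, fall back to the whole text.
    return content.strip() if found else text


voltage = DictionaryEntry(1, "Rated voltage", LabelType.COMMON_LABEL, 1,
                          contains_label=True, separator=":")
print(split_content(voltage, "Rated voltage: 10 kV"))
```

Falling back to the whole text when the separator is missing is a design choice of this sketch; a stricter variant could raise an error so that the sub-picture is flagged for manual review.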
Example 2
Referring to FIG. 3, an embodiment of the present application provides a system for the data structuring method for OCR recognition of power report pictures, including: a technical service center, an application service center, an algorithm service center, a file service center and a data center, which are connected through a network.
Example 3
The embodiment of the application provides a storage medium for the data structuring method for OCR recognition of power report pictures. The storage medium stores at least one instruction, and the instruction is read and executed to implement the data structuring method for OCR recognition of power report pictures.
In the present application, an OCR template tree module configures user role permissions so that users of different power businesses use different OCR recognition templates organized as a template tree. Power reports are classified by business type, and the configured OCR recognition templates are associated with template tree nodes. Labels are initialized according to the OCR recognition dictionary rules, and a uniform recognition type is set for identical labels so that they can be provided to templates as generalized labels. A report picture is uploaded as the template annotation picture, positions are annotated on it using the generalized labels, and the position coordinates of the text in the picture are stored. The picture is cut at the annotated coordinates using the picture cutting technology provided by the technical service center. The cut sub-pictures are recognized using the OCR technology provided by the algorithm service center, and the results are persisted to form structured data. A structured description model covering the whole report recognition process is thereby constructed, achieving data structuring and data visualization of paper power reports and supporting business applications.
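The persistence step can be pictured with a small Python sketch that stores recognized label values as rows of report number, label number and label value, the fields named in claim 9. The use of SQLite, the table name `ocr_report_label`, the function `save_labels` and the sample values are assumptions for illustration; the document does not specify the data center's storage technology or its database dictionary rules.

```python
import sqlite3
from typing import Iterable, Tuple


def save_labels(conn: sqlite3.Connection, report_id: int,
                values: Iterable[Tuple[int, str]]) -> int:
    """Store (label number, label value) pairs for one report; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ocr_report_label ("
        " report_id INTEGER NOT NULL,"
        " label_id INTEGER NOT NULL,"
        " label_value TEXT,"
        " PRIMARY KEY (report_id, label_id))"
    )
    rows = [(report_id, label_id, value) for label_id, value in values]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany(
            "INSERT OR REPLACE INTO ocr_report_label VALUES (?, ?, ?)", rows
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
save_labels(conn, 1, [(1, "10 kV"), (2, "qualified")])
stored = conn.execute(
    "SELECT label_id, label_value FROM ocr_report_label"
    " WHERE report_id = ? ORDER BY label_id",
    (1,),
).fetchall()
print(stored)
```

Keying each row on the report number and label number means that recognizing the same report again overwrites earlier values instead of duplicating them, which is one possible policy among several.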
In the embodiments provided by the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in practice: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through interfaces, devices or units, and may be electrical, mechanical or of another form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (11)

1. A data structuring method for OCR recognition of power report pictures, characterized by comprising the following steps:
providing multi-tenant permission management through a technical service center, configuring user role permissions through an OCR template tree module so that different power business users use different OCR recognition templates, and dividing report picture types into different recognition types in order to configure different template tree nodes;
performing structured business classification according to the different types of power report, such as test reports, overhaul reports, acceptance reports and other reports, associating each configured OCR recognition template with a preset template tree node, and establishing and storing the correspondence between the OCR recognition templates and the template tree;
sorting out the label fields that different types of report have in common, initializing the label structure according to the OCR recognition dictionary rules, setting a uniform recognition type for identical labels, and providing them to templates as generalized labels;
uploading a report picture as the template annotation picture, loading the annotation picture at its original size as a base picture, annotating positions on the report picture using the generalized labels, and storing the position coordinates of the text in the picture;
using the picture cutting technology provided by the technical service center, computing the image depth and image channels, calculating the index range of the image matrix from the set coordinate information, taking the sub-matrix within that range, and cutting the picture at the annotated coordinates;
recognizing the cut sub-pictures using the OCR technology provided by an algorithm service center, and finally performing persistence processing to form structured data that is transmitted to a data center.
2. The data structuring method for OCR recognition of power report pictures according to claim 1, characterized in that the technical service center provides multi-tenant permission management, which is divided into user role permissions, role menu permissions and role template tree permissions; the user role permission information comprises a user number and a role number, the role menu permission information comprises a role number and a menu number, and the role template tree permission information comprises a role number and a template tree number; and report picture types are divided into different recognition types in order to configure different template tree nodes.
3. The data structuring method for OCR recognition of power report pictures according to claim 1, characterized in that structured business classification is performed according to the different types of power report, wherein power reports of the same type may have different templates, the power reports are classified according to template rules, an OCR recognition template is configured for each classification, each configured OCR recognition template is associated with a preset template tree node, and the correspondence between the OCR recognition templates and the template tree is established and stored; wherein the OCR recognition template information comprises a template number, a template name and an annotated base picture; the template tree node information comprises a parent node number, a node, whether the node is a leaf node, and a node order; and the association information between OCR templates and the tree comprises an association number, a tree node number and a template number.
4. The data structuring method for OCR recognition of power report pictures according to claim 1, characterized in that the label fields that different types of report have in common are sorted out, and the label structure is initialized according to the OCR recognition dictionary rules; for the OCR recognition dictionary rules, an application service center provides the words and phrases that form the label initialization rules, and the dictionary information comprises a label number, a label name, a label type, a label order, whether the label is included, and a separator.
5. The data structuring method for OCR recognition of power report pictures according to claim 2, characterized in that the label types in the OCR recognition dictionary information comprise common label content, common table content and XY table content; if the label type is common table content or XY table content, table label information needs to be associated, wherein the table label information comprises a table number and a table name, and the label-table association information comprises a table number and a label number; and the association between the label information and the table information is established to support subsequent structuring of table data.
6. The data structuring method for OCR recognition of power report pictures according to claim 2, characterized in that when the OCR recognition dictionary information is marked as containing the label, separator information needs to be entered, and the label and its content are correctly obtained by recognizing the separator, to support subsequent data structuring of content that contains labels.
7. The data structuring method for OCR recognition of power report pictures according to claim 1, characterized in that a paper report is scanned into a picture, the report picture is uploaded to a file service center as the template annotation picture, and the picture information is stored in the template as a picture ID, wherein the uploaded report picture information comprises a picture number, a picture name and a picture path; the picture is loaded at its original size as a base picture, positions are annotated on the report picture using the generalized labels through the visual annotation provided by the application service center, and the position coordinates of the text in the picture are stored; the position coordinate information comprises an X minimum value, a Y minimum value, an X maximum value and a Y maximum value.
8. The data structuring method for OCR recognition of power report pictures according to claim 1, characterized in that the depth of the image to be cut is calculated from the number of bits occupied by each pixel in the image, wherein each pixel of a binary image occupies 1 bit, so the image depth is 1; each pixel of a grayscale image takes a value between 0 and 255, that is 2^8 = 256 levels, so the image depth is 8;
since RGB are the three primary colors and 8 bits are used to represent each color, the maximum value of each color is 255, the color value range of each pixel is (0-255, 0-255, 0-255), and the calculated number of image channels is 3;
an RGB image with 3 channels and a depth of 8 can therefore represent 256^3 colors in total; the image is stored as 3 matrices, and superimposing the 3 matrices gives the color of each pixel of the image; accordingly, the index range of the matrix is calculated from the set coordinate information, the sub-matrix within that range is taken, and the picture is cut at the annotated coordinates.
9. The data structuring method for OCR recognition of power report pictures according to claim 1, characterized in that the cut sub-pictures are recognized using the OCR technology provided by the algorithm service center, and persistence processing is finally performed to form structured data, wherein the structured data is the label information of the recognized OCR report and is saved to the data center according to the database dictionary rules, and the OCR report label information comprises a report number, the corresponding label numbers and the corresponding label values.
10. A system for a data structuring method for OCR recognition of power report pictures, characterized by comprising: a technical service center, an application service center, an algorithm service center, a file service center and a data center, which are connected through a network.
11. A storage medium for a data structuring method for OCR recognition of power report pictures, characterized in that the storage medium stores at least one instruction, and the instruction is read and executed to implement the data structuring method for OCR recognition of power report pictures according to any one of claims 1 to 9.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210530637.6A CN114782671A (en) 2022-05-16 2022-05-16 Data structuring method, device and storage medium for OCR recognition of power report picture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210530637.6A CN114782671A (en) 2022-05-16 2022-05-16 Data structuring method, device and storage medium for OCR recognition of power report picture

Publications (1)

Publication Number Publication Date
CN114782671A true CN114782671A (en) 2022-07-22

Family

ID=82437072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210530637.6A Pending CN114782671A (en) 2022-05-16 2022-05-16 Data structuring method, device and storage medium for OCR recognition of power report picture

Country Status (1)

Country Link
CN (1) CN114782671A (en)

Similar Documents

Publication Publication Date Title
US5889896A (en) System for performing multiple processes on images of scanned documents
EP0561606B1 (en) Method and system for labeling a document for storage, manipulation, and retrieval
CN108108342B (en) Structured text generation method, search method and device
US20150317413A1 (en) Flexible cad format
CN105740931A (en) Multidimensional anti-counterfeiting code as well as manufacture method and recognition method thereof
CN107679208A (en) A kind of searching method of picture, terminal device and storage medium
CN109409452A (en) The method and apparatus that universal tag parses and automatically generates print label
US11670067B2 (en) Information processing apparatus and non-transitory computer readable medium
CN114067335A (en) Electronic archive text recognition method, system, computer equipment and storage medium
US20100195151A1 (en) Image processing apparatus and control method for the same
CN114782671A (en) Data structuring method, device and storage medium for OCR recognition of power report picture
US20200387733A1 (en) Terminal apparatus, character recognition system, and character recognition method
CN116974999A (en) Electronic document signing method and device, electronic device and storage medium
CN105791503A (en) Method of storing business card information in address list and apparatus thereof
JP2005234790A (en) Handwritten slip processing system and method
CN115391567A (en) Fan standard operation knowledge graph construction method and device and operation machine
KR102418541B1 (en) Wire bundle production method, and apparatus therefor
CN113590115A (en) Method and device for automatically generating service system code
CN107861963B (en) Generation method and device of dangerous contract
CN113033169A (en) Service data processing method and device
CN111444751A (en) Information processing apparatus, storage medium, and information processing method
JP2000020640A (en) Classification system, retrieval system, classification method and recording medium
CN115983199B (en) Mobile digital publishing system and method
KR102352726B1 (en) Electronic apparatus that can convert medical expenses receipt printed on paper into an electronic document and operating method thereof
JP4114812B2 (en) How to manage symbol images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination