CN109446345A - Nuclear power file verification processing method and system - Google Patents

Publication number
CN109446345A
Authority
CN
China
Prior art keywords
information
picture
nuclear power
file
verification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811122661.6A
Other languages
Chinese (zh)
Inventor
白鹤
颜斯泰
王云福
涂红兵
侯斌
戴伟琦
马菁
刘婧
吴祥勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China General Nuclear Power Corp
China Nuclear Power Engineering Co Ltd
CGN Power Co Ltd
Shenzhen China Guangdong Nuclear Engineering Design Co Ltd
Original Assignee
China General Nuclear Power Corp
China Nuclear Power Engineering Co Ltd
CGN Power Co Ltd
Shenzhen China Guangdong Nuclear Engineering Design Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China General Nuclear Power Corp, China Nuclear Power Engineering Co Ltd, CGN Power Co Ltd, Shenzhen China Guangdong Nuclear Engineering Design Co Ltd filed Critical China General Nuclear Power Corp
Priority to CN201811122661.6A priority Critical patent/CN109446345A/en
Publication of CN109446345A publication Critical patent/CN109446345A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/24 Aligning, centring, orientation detection or correction of the image
    • G06V10/243 Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Multimedia (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a nuclear power file verification processing method and system. The method includes: obtaining an unstructured nuclear power file and related metadata information from an enterprise content management system; obtaining a verification rule from pre-entered verification rule configuration information, according to the unstructured nuclear power file and related metadata information obtained; based on the verification rule, performing image segmentation on the unstructured nuclear power file and performing text recognition on the segmented information block pictures to extract structured picture information; and performing document data verification in combination with the structured picture information. The invention is suited to verifying unstructured nuclear power files that contain pictures and fills a gap in automated image recognition and verification of nuclear power enterprise content. Configuration information can be entered in advance as required, so that customized verification rules can be defined for different file types, which greatly improves production efficiency and reduces labor cost.

Description

Nuclear power file verification processing method and system
Technical field
The present invention relates to the field of nuclear power, and in particular to a nuclear power file verification processing method and system.
Background
According to statistics, in nuclear power construction projects about 3% to 5% of the total engineering cost is caused by change work and construction errors that result from problems in information transmission. The enterprise content of nuclear power engineering is complex and the number of documents is huge, reaching the millions, especially project files, technical documents, business contracts, correspondence, and the transferred materials of each technology route (such as the third-generation nuclear power technologies AP1000 and EPR). Because technical materials are mostly stored in semi-structured form in an Enterprise Content Management System (ECMS), the volume of information is very large.
Besides being represented in structured form on the information platform, the metadata of a nuclear power file is also embodied in the unstructured physical engineering document file. During project implementation, the metadata stored in the ECM must be presented to field personnel in the form of physical files, so the accuracy of nuclear power document information directly affects the construction and implementation of the project. To guarantee nuclear power engineering quality and nuclear safety, checking document conformity and matching documents against their metadata is important foundational work in nuclear power document management.
Electronic, paperless management of nuclear power documents, together with electronic workflow approval and automated digital signatures, has greatly improved production efficiency, but document review still requires a large amount of manpower and has become the bottleneck of document circulation. Nuclear power document review is complex and tedious routine work: each engineering document needs up to 24 checks, all of which require manual checking, and several hundred project files and engineering letters must be checked every day, which consumes a great deal of manpower and cost on repetitive work.
The patent application with publication number CN106815268A discloses a structured processing method and system for massive unstructured electronic documents. That invention only analyzes and extracts the attributes of the physical electronic file of the technical material (such as file name, size, directory and hash code), and does not further process the specific content of the unstructured document, in particular the data contained in images.
Summary of the invention
The technical problem to be solved by the present invention is to provide a nuclear power file verification processing method and system that address the above defects of the prior art.
The technical solution adopted by the present invention to solve this technical problem is a nuclear power file verification processing method, comprising:
obtaining an unstructured nuclear power file and related metadata information from an enterprise content management system;
obtaining a verification rule from pre-entered verification rule configuration information, according to the unstructured nuclear power file and related metadata information obtained;
based on the verification rule, performing image segmentation on the unstructured nuclear power file, and performing text recognition on the segmented information block pictures to extract structured picture information;
performing document data verification in combination with the structured picture information.
Preferably, the method further comprises:
before image segmentation, preprocessing the unstructured nuclear power file, the preprocessing comprising: performing, in sequence, grayscale processing, binarization, filtering noise reduction and picture skew correction on the unstructured nuclear power file.
Preferably, the method further comprises:
extracting the color information of the unstructured nuclear power file, and obtaining the file attribute information required for verification;
after image segmentation, identifying the sharpness of the information block pictures;
when performing document data verification, comprehensively verifying, based on the verification rule, the extracted structured picture information, the sharpness information, the extracted color information of the file and the obtained file attribute information, outputting the verification results to a result display user interface, and providing an explanation of each verification result.
Preferably, performing image segmentation on the unstructured nuclear power file comprises: performing picture edge recognition on the pictures in the unstructured nuclear power file; performing skew correction on the pictures whose edges have been recognized; segmenting out the skew-corrected pictures; and obtaining position information from the file template in the verification rule, and locating and extracting single information block pictures according to the position information.
Preferably, performing text recognition on the segmented information block pictures to extract structured picture information comprises:
performing row and character segmentation on each segmented information block picture to obtain single character pictures;
analyzing the statistical features of each single character picture to obtain a feature vector;
inputting the feature vector into an artificial neural network to obtain the text information of the single character picture.
The present invention also claims a nuclear power file verification processing system, comprising:
a content management system interface, for obtaining an unstructured nuclear power file and related metadata information from an enterprise content management system;
a verification rule acquisition module, for obtaining a verification rule from pre-entered verification rule configuration information, according to the unstructured nuclear power file and related metadata information obtained;
a structured picture information extraction module, for performing image segmentation on the unstructured nuclear power file based on the verification rule, and performing text recognition on the segmented information block pictures to extract structured picture information;
a verification module, for performing document data verification in combination with the structured picture information.
Preferably, the system further comprises:
a preprocessing module, for preprocessing the unstructured nuclear power file and then sending it to the structured picture information extraction module for processing; the preprocessing comprises performing, in sequence, grayscale processing, binarization, filtering noise reduction and picture skew correction on the unstructured nuclear power file.
Preferably, the system further comprises:
a color information extraction module, for extracting the color information of the unstructured nuclear power file and sending it to the verification module;
a file attribute extraction module, for obtaining the file attribute information required for verification and sending it to the verification module;
a sharpness identification module, for identifying the sharpness of the information block pictures and sending it to the verification module;
the verification module is specifically used for comprehensively verifying, based on the verification rule, the extracted structured picture information, the sharpness information, the extracted color information of the file and the obtained file attribute information, outputting the verification results to a result display user interface, and providing an explanation of each verification result.
Preferably, the structured picture information extraction module comprises:
a picture segmentation and extraction unit, for performing picture edge recognition on the pictures in the unstructured nuclear power file, performing skew correction on the pictures whose edges have been recognized, and segmenting out the skew-corrected pictures; and for obtaining position information from the file template in the verification rule, and locating and extracting single information block pictures according to the position information.
Preferably, the structured picture information extraction module comprises:
a text recognition unit, for performing row and character segmentation on each segmented information block picture to obtain single character pictures, analyzing the statistical features of each single character picture to obtain a feature vector, and inputting the feature vector into an artificial neural network to obtain the text information of the single character picture.
The nuclear power file verification processing method and system of the present invention have the following beneficial effects: the invention is suited to verifying unstructured nuclear power files that contain pictures and fills a gap in automated image recognition and verification of nuclear power enterprise content; configuration information can be entered in advance as required, so that customized verification rules can be defined for different file types, which suits the multi-discipline, multi-unit, multi-technology-route nature of nuclear power development, ensures the completeness and accuracy of nuclear power content, greatly improves production efficiency, and reduces labor cost.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below are only embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from the drawings provided without creative effort:
Fig. 1 is the method flowchart of embodiment one of the present invention;
Fig. 2 is the artificial neural network model;
Fig. 3 is the system structure diagram of embodiment two of the present invention.
Specific embodiments
To facilitate understanding of the present invention, the invention is described more fully below with reference to the relevant drawings, which show exemplary embodiments of the invention. The invention can, however, be realized in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that the disclosure is more thorough and complete.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present invention. The terms used in the specification of the present invention are only for the purpose of describing specific embodiments and are not intended to limit the present invention.
The general idea of the present invention is as follows: first, an unstructured nuclear power file and related metadata information are obtained from an enterprise content management system; then a verification rule is obtained from pre-entered verification rule configuration information; next, based on the verification rule, image segmentation is performed on the unstructured nuclear power file, and text recognition is performed on the segmented information block pictures to extract structured picture information; finally, document data verification is performed in combination with the structured picture information. In this way the content information in pictures can be extracted into structured information and then verified, which fills a gap in automated image recognition and verification of nuclear power enterprise content. By entering different verification rule configuration information, customized verification rules can be defined for different file types, which suits the multi-discipline, multi-unit, multi-technology-route nature of nuclear power development, ensures the completeness and accuracy of nuclear power content, greatly improves production efficiency, and reduces labor cost.
For a better understanding of the above technical solution, it is described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments of the present invention and the specific features in the embodiments are a detailed description of the technical solution of the application rather than a limitation of it; where there is no conflict, the embodiments of the present invention and the technical features in the embodiments can be combined with each other.
Embodiment one
With reference to Fig. 1, embodiment one discloses a nuclear power file verification processing method, comprising:
S101, obtaining an unstructured nuclear power file and related metadata information from the enterprise content management system (ECMS). The metadata information includes the document number, file version, status, title, and so on.
S102, obtaining a verification rule from pre-entered verification rule configuration information, according to the unstructured nuclear power file and related metadata information obtained. The verification rule includes the category of the nuclear power file and the template of the nuclear power file.
The verification rule configuration information is entered in advance as relevant data according to the nuclear power document management rules. In this embodiment, the verification rule configuration information is stored in three databases: a document classification rule base, a verification region rule base, and a metadata verification rule base. The three databases respectively define the structured metadata, the unstructured nuclear power files, and the association between the two, as described below.
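$$\mathrm{Meta} = \{att_1, att_2, att_3, \ldots, att_n\}$$
$$\mathrm{Position} = \{pos_1, pos_2, pos_3, \ldots, pos_j\}$$
$$\mathrm{Category} = \{ca_1, ca_2, ca_3, \ldots, ca_m\}$$
$$\mathrm{Template} = \{tpl_1, tpl_2, tpl_3, \ldots, tpl_i\}$$
$$\mathrm{Validate} = \{(c, t) \mid c \in \mathrm{Categories},\ t \in \mathrm{Templates}\}$$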
Here Meta is the set of metadata of a nuclear power file, such as the document number, file version, status and title. Position is the content position within a nuclear power file, including page size, number of pages, positioning coordinates, and so on.
Category is the category of a nuclear power file, that is, the fine-grained class of the document, such as project file, engineering letter or contract. Classification is always based on a combination of metadata. Categories is the set of nuclear power document categories.
Template is the template of a nuclear power file. For every category of nuclear power file there is a corresponding template, and each template contains the category information and the positions of the corresponding metadata. Templates is the set of nuclear power file templates.
Validate is the set of verification rules; each category of nuclear power file corresponds to one or more nuclear power file templates.
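The set definitions above can be pictured as plain data structures. The following is a minimal sketch under stated assumptions, not the storage layout of the three rule bases: the class names, field names and the lookup helper `rules_for` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Position:
    """Where one metadata item sits in the file (hypothetical fields)."""
    page: int
    start: tuple  # (x, y) of the upper-left corner
    end: tuple    # (x, y) of the lower-right corner


@dataclass
class Template:
    """A file template: a category plus the position of each metadata item."""
    name: str
    category: str
    positions: dict = field(default_factory=dict)  # metadata name -> Position


def rules_for(category, validate):
    """Return every template paired with `category` in the Validate set.

    `validate` is a list of (category, template) pairs, mirroring
    Validate = {(c, t) | c in Categories, t in Templates}.
    """
    return [t for (c, t) in validate if c == category]
```

A category may map to several templates, so the helper returns a list rather than a single template.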
S103, preprocessing the unstructured nuclear power file.
The preprocessing comprises four steps: grayscale processing, binarization, filtering noise reduction and picture skew correction are performed on the unstructured nuclear power file in sequence. The four steps are described in detail below.
1031) Grayscale processing: before grayscale processing, the color attribute of the unstructured nuclear power file needs to be verified. If the verification passes, the color of each pixel in the file can be expressed as $C = xR + yG + zB$, where $x + y + z = 1$. Each pixel in the color document can then be converted to gray by weighting, according to the formula $Gray = (\omega_R R + \omega_G G + \omega_B B) \div 3$. Because the human eye is most sensitive to green, less sensitive to red, and least sensitive to blue, choosing $\omega_G > \omega_R > \omega_B$ yields a gray image that is easier to recognize. In particular, with the weights $\omega_R = 0.299$, $\omega_G = 0.587$, $\omega_B = 0.114$ the resulting gray image is best.
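The weighted conversion can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the default weights are the common 0.299 / 0.587 / 0.114 values for R, G and B (which satisfy the green > red > blue ordering required above), and the sketch applies the weights directly without the further division by 3, on the assumption that weights summing to 1 already keep the result in the 0..255 range.

```python
def to_gray(pixel, weights=(0.299, 0.587, 0.114)):
    """Weighted gray value of one (R, G, B) pixel, rounded to an integer."""
    r, g, b = pixel
    w_r, w_g, w_b = weights
    return round(w_r * r + w_g * g + w_b * b)


def gray_image(rgb_rows):
    """Convert a list of rows of (R, G, B) tuples into rows of gray values."""
    return [[to_gray(p) for p in row] for row in rgb_rows]
```

Because green carries the largest weight, a pure green pixel maps to a brighter gray than a pure red or pure blue pixel of the same intensity.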
1032) Binarization: because the quality of preprocessing directly affects the performance of the subsequent text recognition, and the file content recognized in nuclear power files is mostly letters or digits, the picture needs to be binarized. The processing formula is as follows:
$$B(x, y) = \begin{cases} 1, & Gray(x, y) > T_0 \\ 0, & \text{otherwise} \end{cases}$$
Binarization adjusts the gray levels in the document: values above the threshold $T_0$ become 1 (set to 255 in gray scale), and values below the threshold $T_0$ become 0 (set to 0 in gray scale). $T_0$ can generally be set to the average of the maximum and minimum gray values of all pixels in the whole file:
$$T_0 = \frac{Gray_{\max} + Gray_{\min}}{2}$$
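A minimal sketch of this thresholding, assuming the gray image is a list of rows of integers; the function name `binarize` is hypothetical, and pixels exactly equal to the threshold are mapped to 0 here, which the text leaves open.

```python
def binarize(gray_rows):
    """Binarize with T0 = (max + min) / 2 taken over the whole image.

    Returns the binary image (values 0 or 255) and the threshold used.
    """
    values = [v for row in gray_rows for v in row]
    t0 = (max(values) + min(values)) / 2
    binary = [[255 if v > t0 else 0 for v in row] for row in gray_rows]
    return binary, t0
```

Deriving the threshold from the extremes of the image needs no histogram, but a single very dark or very bright outlier pixel shifts it, which is one reason the noise filtering of the next step matters.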
1033) Filtering noise reduction: to improve the recognition rate, noise must be handled, especially in scanned files; an image-smoothing filter algorithm is used to eliminate picture noise. For example, for an image that has been binarized, a filter window of size m*n (m, n > 1 and odd, generally 3*3) is selected, and the median of all the values in the window is taken as the value of the center point. For example, for the 9 pixels of a 3*3 filter window, suppose the values of these 9 pixels before filtering are:
0 0 0
0 255 0
0 0 0
Clearly the center point is noise. After the filtering noise reduction of this step, the 255 at the center becomes 0, the median of the points in the window; that is, after noise reduction the window becomes:
0 0 0
0 0 0
0 0 0
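The 3*3 example above can be reproduced with a small median filter. This is a sketch under assumptions: the function name is hypothetical, the image is a list of rows, and border pixels that lack a full window are left unchanged, a choice the text does not specify.

```python
from statistics import median


def median_filter(img, size=3):
    """Replace each interior pixel by the median of its size*size window."""
    h, w = len(img), len(img[0])
    r = size // 2
    out = [row[:] for row in img]  # copy so borders keep their values
    for y in range(r, h - r):
        for x in range(r, w - r):
            window = [img[y + dy][x + dx]
                      for dy in range(-r, r + 1)
                      for dx in range(-r, r + 1)]
            out[y][x] = median(window)
    return out
```

Applied to the window shown above, the isolated 255 in the center is replaced by 0 because eight of the nine values in its window are 0.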
1034) Picture skew correction: a projection-based method can be used. The file is projected along given directions to obtain several projection profiles; the tilt angle of the file is then obtained from the projection features (such as the mean square deviation) of the profiles, and the skew of the file is corrected according to that angle.
S104, extracting the color information of the unstructured nuclear power file, and obtaining the file attribute information required for verification. The file attribute information required for verification, such as file name, file size and file format, can be obtained here from the overall information of the nuclear power file.
S105, based on the verification rule, performing image segmentation on the unstructured nuclear power file. Note that the execution order of S105 and S104 is not restricted; it is only necessary that both have been performed before the final verification step S108.
From the verification rule obtained, together with the unstructured nuclear power file obtained in step S101, each item of content to be verified can be determined. The pictures in the nuclear power file can then be segmented; the image segmentation process is as follows:
1051) Performing picture edge recognition on the pictures in the unstructured nuclear power file: an edge detection operator can be convolved with the file preprocessed in step S103, and the Hough algorithm is then used to detect the straight line segments on the picture edges in the nuclear power file;
1052) Performing skew correction on the pictures whose edges have been recognized: the straight line segments obtained in the previous step are sorted by decreasing length, the several longest segments are selected, and their tilt angles relative to the horizontal direction are calculated. The median of these tilt angles is taken as the tilt angle of the whole image, and the image is skew-corrected by rotating it by that angle;
1053) Segmenting out the skew-corrected picture: the straight line segments in the horizontal and vertical directions are kept and the other segments are removed; the distances between the endpoints of the remaining segments are calculated, and segments whose endpoint distance is less than a set threshold are connected, which yields the cell images of the table;
1054) Locating and extracting single information block pictures: first, position information is obtained from the file template in the verification rule, and the information block picture is located according to the position information; second, once located, the information block picture is cropped out by the edge recognition algorithm; finally, the cropped information block picture is temporarily saved according to the rule. The position information includes the page number, start point and end point. In one specific embodiment, the information block picture cropped out by the edge algorithm can be saved in BMP format.
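The angle estimate of step 1052) can be sketched as below. The sketch assumes the Hough step has already produced segments as pairs of endpoints and that the segments passed in are the near-horizontal ones; the function name `skew_angle` and the `keep` parameter (how many of the longest segments to use) are hypothetical.

```python
import math
from statistics import median


def skew_angle(segments, keep=5):
    """Median tilt, in degrees, of the `keep` longest segments.

    Each segment is ((x1, y1), (x2, y2)). Endpoints are ordered left to
    right before measuring, so the angle does not depend on the order in
    which the two endpoints were reported.
    """
    def length(seg):
        (x1, y1), (x2, y2) = seg
        return math.hypot(x2 - x1, y2 - y1)

    def angle(seg):
        (x1, y1), (x2, y2) = seg
        if x2 < x1:
            x1, y1, x2, y2 = x2, y2, x1, y1
        return math.degrees(math.atan2(y2 - y1, x2 - x1))

    longest = sorted(segments, key=length, reverse=True)[:keep]
    return median([angle(s) for s in longest])
```

Taking the median rather than the mean keeps a single misdetected segment from pulling the estimate away from the true tilt.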
S106, performing text recognition on the segmented information block pictures to extract structured picture information, specifically comprising:
1061) performing row and character segmentation on each segmented information block picture to obtain single character pictures;
Row segmentation can use the pixel accumulation method on the binary image, as shown in the following formula.
$$\sum_{j=1}^{L} F(i, j) \odot p$$
Here $F(i, j)$ is the binary text image, $L$ is the row length, and $p$ is an experimental constant greater than zero that depends on the noise in the document. $\odot$ is a wildcard for a comparison operator. When $\odot$ is $\ge$ and the expression holds, row $i$ is a row upper bound; when $\odot$ is $\le$, it is a row lower bound. The region between an upper bound and a lower bound is cut out as one row. Character segmentation is similar: it is equivalent to rotating the picture by 90° after row segmentation and performing row segmentation again, except that $L$ is then no longer the row length but the character height.
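A minimal sketch of the pixel accumulation method, under stated assumptions: text pixels are 1 and background pixels are 0 in F, the function names are hypothetical, and the lower bound is detected with a strict `<` so that one row cannot satisfy both tests at once.

```python
def cut_rows(F, p=1):
    """Return (top, bottom) pairs of text rows; `bottom` is exclusive.

    A row whose pixel sum reaches p opens a text row (upper bound); the
    next row whose sum falls below p closes it (lower bound).
    """
    rows, top = [], None
    for i, line in enumerate(F):
        total = sum(line)  # pixel accumulation along row i
        if total >= p and top is None:
            top = i
        elif total < p and top is not None:
            rows.append((top, i))
            top = None
    if top is not None:  # text runs to the last row of the image
        rows.append((top, len(F)))
    return rows


def cut_columns(F, p=1):
    """Character segmentation: the same scan applied to the transposed image."""
    return cut_rows([list(col) for col in zip(*F)], p)
```

Transposing the image plays the role of the 90° rotation described above, so the same scan serves for both rows and characters.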
1062) analyzing the statistical features of each single character picture, using a local gray-level algorithm to extract grid features, to obtain a feature vector;
1063) inputting the feature vector into an artificial neural network (Artificial Neural Network), as shown in Fig. 2. The weights of the neuron connections in the artificial neural network are set, the nonlinear activation function is evaluated against the threshold, and the classification information $y_k$ is output; the classification information $y_k$ is the text information of the single character picture.
$$y_k = \varphi\Big(\sum_{j} \omega_{kj} x_j - \theta_k\Big)$$
Here $x_j$ is the neuron input, namely the feature vector obtained in step 1062); $\omega_{kj}$ is the weight of the connections of neuron $k$; $\theta_k$ is the threshold; $\varphi$ is the activation function; and $y_k$ is the classification information output by neuron $k$.
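The computation of a single neuron, and the choice of a class from a layer of such neurons, can be sketched as follows. This is an illustration only: it assumes the usual weighted-sum form with the threshold subtracted before the activation, uses a sigmoid as the activation function (the text does not name one), and the function names are hypothetical.

```python
import math


def sigmoid(v):
    """A common nonlinear activation function; an assumption in this sketch."""
    return 1.0 / (1.0 + math.exp(-v))


def neuron_output(x, weights, theta, activation=sigmoid):
    """y_k = activation(sum_j w_kj * x_j - theta_k) for one neuron k."""
    v = sum(w * xj for w, xj in zip(weights, x)) - theta
    return activation(v)


def classify(x, weight_rows, thetas):
    """Index of the output neuron with the largest response to feature vector x."""
    outputs = [neuron_output(x, w, t) for w, t in zip(weight_rows, thetas)]
    return max(range(len(outputs)), key=outputs.__getitem__)
```

In a character recognizer each output index would stand for one character class, and the weights would come from training, which is outside the scope of this sketch.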
S107, identifying the sharpness of the information block pictures.
Sharpness identification uses a gradient algorithm: the Sobel operator is used to extract the gradient values of the picture in the file in the horizontal and vertical directions, and sharpness is judged with the Tenengrad energy gradient function.
$$D(f) = \sum_y \sum_x |G(x, y)|, \quad G(x, y) > T$$
$$G(x, y) = \sqrt{G_x^2(x, y) + G_y^2(x, y)}$$
Here $D(f)$ is the sharpness, $T$ is the given edge detection threshold, and $G_x$ and $G_y$ are the convolutions of the image with the horizontal and vertical Sobel edge detection operators at pixel $(x, y)$. The Sobel operator templates are as follows.
$$g_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \quad g_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}$$
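The sharpness measure can be sketched in pure Python as below. The sketch assumes a gray image stored as a list of rows, takes the gradient magnitude as the square root of Gx² + Gy² (the usual Tenengrad form), uses the standard 3*3 Sobel kernels, and skips border pixels; the function name and the default threshold of 0 are hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def tenengrad(gray, threshold=0.0):
    """Sum of Sobel gradient magnitudes that exceed `threshold`."""
    h, w = len(gray), len(gray[0])
    total = 0.0
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0
            # Kernels are applied without flipping (correlation); the
            # gradient magnitude is the same as for a true convolution.
            for dy in range(3):
                for dx in range(3):
                    v = gray[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * v
                    gy += SOBEL_Y[dy][dx] * v
            g = math.hypot(gx, gy)
            if g > threshold:
                total += g
    return total
```

A picture with an abrupt black-to-white edge scores higher than the same edge smeared into a gradual ramp, which is what lets the score separate sharp scans from blurred ones.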
S108, based on the verification rule, comprehensively verifying the extracted structured picture information, the sharpness information, the extracted color information of the file and the obtained file attribute information, outputting the verification results to the result display user interface, and providing an explanation of each verification result.
Preferably, the method further includes recording a log of the whole process, such as the picture segmentation results and the recognition results for the metadata in the pictures.
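The comprehensive verification of step S108 can be pictured as a comparison of recognized values against the ECMS metadata that yields one explained result per item. The following is a hypothetical sketch, not the rule engine of the embodiment: the function name, the result tuple layout, the whitespace-insensitive comparison and the `min_sharpness` parameter are all assumptions.

```python
def verify(metadata, extracted, sharpness, min_sharpness):
    """Return a list of (item, passed, explanation) tuples.

    metadata  : values stored in the ECMS, keyed by metadata name
    extracted : values recognized from the information block pictures
    """
    results = []
    for name, expected in metadata.items():
        found = extracted.get(name)
        if found is None:
            results.append((name, False, "not found in the picture"))
        elif found.strip() != expected.strip():
            results.append(
                (name, False, f"expected {expected!r}, recognized {found!r}"))
        else:
            results.append((name, True, "matches the ECMS metadata"))
    clear = sharpness >= min_sharpness
    results.append(
        ("sharpness", clear, "clear enough" if clear else "picture too blurred"))
    return results
```

Returning an explanation alongside every pass or fail flag mirrors the requirement that the user interface show a reason for each verification result.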
Embodiment two
With reference to Fig. 3, present embodiment discloses a kind of nuclear power file verification processing systems, comprising:
Content Management System interface 201, for obtaining unstructured nuclear power file and phase from enterprise content management system Close metadata information.Metadata information includes document No., FileVersion, state, title etc..
Rule acquisition module 202 is verified, for according to the unstructured nuclear power file and relevant metadata information that get, Verification rule configuration information based on preparatory typing obtains verification rule.Verification rule includes classification and the nuclear power of nuclear power file The template of file.
The pre-entered verification rule configuration information is the relevant data entered according to the nuclear power document management rules. In this embodiment, the verification rule configuration information is stored in the following three databases: a document classification rule base, a verification region rule base and a metadata verification rule base. The three databases respectively define the structured metadata, the unstructured nuclear power files, and the association between the two. For details, refer to the corresponding content of Embodiment One, which is not repeated here.
A preprocessing module 203, for preprocessing the unstructured nuclear power file and then sending it to the picture structured information extraction module for processing. The preprocessing includes successively performing grayscale processing, binarization, filtering noise reduction and picture skew correction on the unstructured nuclear power file. For details, refer to the corresponding content of Embodiment One, which is not repeated here.
A color information extraction module 204, for extracting the color information of the unstructured nuclear power file and sending it to the verification module;
A file attribute extraction module 205, for obtaining the file attribute information required for verification and sending it to the verification module;
A picture structured information extraction module 206, for performing image segmentation on the unstructured nuclear power file based on the verification rules, and performing text recognition on the segmented information block pictures to extract picture structured information;
A sharpness identification module 207, for identifying the sharpness of the information block pictures and sending it to the verification module. For the specific process, refer to the corresponding content of Embodiment One, which is not repeated here.
A verification module 208, for performing, based on the verification rules, a comprehensive check of the picture structured information, the extracted sharpness information, the extracted color information of the file and the acquired file attribute information, outputting the check results to the result display user interface, and providing an explanation of each check result.
A logging module 209, for recording logs during operation of the whole system, such as the picture segmentation results and the recognition results for metadata in the pictures.
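The first three preprocessing steps of module 203 (grayscale processing, binarization, filtering noise reduction) could be sketched as below. This is an assumption-laden illustration: the BT.601 luma weights, the fixed threshold of 128 and the 3×3 median filter are choices made here, not taken from the text, and skew correction is omitted for brevity. Images are plain lists of rows.

```python
from statistics import median


def to_gray(rgb):
    """Convert rows of (r, g, b) tuples to grey values using BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]


def binarize(gray, threshold=128):
    """Map each grey value to 0 or 255 with a fixed threshold (assumed value)."""
    return [[255 if v >= threshold else 0 for v in row] for row in gray]


def median_filter(img):
    """3x3 median filter on interior pixels; border pixels are copied unchanged."""
    height, width = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median(window)
    return out


def preprocess(rgb, threshold=128):
    """Run the steps in the order given in the text: grey, binarize, denoise."""
    return median_filter(binarize(to_gray(rgb), threshold))
```

A median filter is used here because it removes isolated speckles from a binarized scan without blurring stroke edges; any other noise reduction filter would fit the same pipeline position.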
Specifically, the picture structured information extraction module 206 includes:
A picture segmentation and extraction unit 2061, for performing image edge recognition on the pictures in the unstructured nuclear power file, performing skew correction on the pictures whose edges have been recognized, and segmenting out the skew-corrected pictures; and for obtaining location information according to the file template in the verification rules and locating and extracting individual information block pictures according to the location information. For details, refer to the corresponding content of Embodiment One, which is not repeated here.
A text recognition unit 2062, for performing line and character segmentation on each segmented information block picture to obtain single-character pictures, analyzing the statistical features of each single-character picture to obtain a feature vector, and inputting the feature vector into an artificial neural network to obtain the text information of the single-character picture. For details, refer to the corresponding content of Embodiment One, which is not repeated here.
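The template-based block extraction of unit 2061 and the line and character segmentation of unit 2062 could be sketched as follows. The text does not say how lines and characters are separated; projection profiles are assumed here as one common technique. The function names and the `(top, left, bottom, right)` box format are hypothetical. The input is a binary picture in which 1 marks a text pixel and 0 marks background.

```python
def crop(img, box):
    """Cut one information block out of a picture; box = (top, left, bottom, right)."""
    top, left, bottom, right = box
    return [row[left:right] for row in img[top:bottom]]


def runs(profile):
    """Return (start, end) index pairs for each run of non-zero profile entries."""
    spans, start = [], None
    for i, value in enumerate(profile):
        if value and start is None:
            start = i
        elif not value and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))
    return spans


def split_lines(ink):
    """Find text lines from the horizontal projection (text pixels per row)."""
    return runs([sum(row) for row in ink])


def split_chars(ink, line):
    """Find characters within one line from the vertical projection."""
    top, bottom = line
    width = len(ink[0])
    profile = [sum(ink[y][x] for y in range(top, bottom)) for x in range(width)]
    return runs(profile)
```

Each `(start, end)` pair from `split_chars`, combined with its line span, delimits one single-character picture whose features can then be passed to the recognizer.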
It should be noted that the description above is divided into these modules for clarity. In an actual implementation, however, the boundaries between the modules may be blurred. For example, any or all of the functional modules herein may share various hardware and/or software elements. As another example, any and/or all of the functional modules herein may be implemented wholly or partly by software instructions executed by a shared processor. In addition, software modules executed by one or more processors may be shared among the various functional modules. Accordingly, unless explicitly required, the scope of the present invention is not limited by mandatory boundaries between the various hardware and/or software elements.
In summary, the nuclear power file verification processing method and system of the present invention have the following beneficial effects: the invention is suitable for verifying unstructured nuclear power files that contain pictures, and fills the gap in automated image recognition and verification of nuclear power enterprise content. Configuration information can be entered in advance as required, so that diversified, customized verification rules can be applied to different file types, which suits the multi-discipline, multi-unit and multi-technology-route character of nuclear power development. The invention safeguards the integrity and accuracy of nuclear power content, improves production efficiency and reduces labor costs.
The embodiments of the present invention have been described above with reference to the drawings, but the invention is not limited to the specific embodiments described. These embodiments are merely illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art can devise many other forms without departing from the purpose of the invention and the scope protected by the claims, all of which fall within the protection of the present invention.

Claims (10)

1. A nuclear power file verification processing method, characterized by comprising:
obtaining an unstructured nuclear power file and related metadata information from an enterprise content management system;
obtaining verification rules from pre-entered verification rule configuration information, according to the acquired unstructured nuclear power file and related metadata information;
based on the verification rules, performing image segmentation on the unstructured nuclear power file, and performing text recognition on the segmented information block pictures to extract picture structured information;
performing document data verification in combination with the picture structured information.
2. The nuclear power file verification processing method according to claim 1, characterized in that the method further comprises:
before performing image segmentation, preprocessing the unstructured nuclear power file, the preprocessing comprising: successively performing grayscale processing, binarization, filtering noise reduction and picture skew correction on the unstructured nuclear power file.
3. The nuclear power file verification processing method according to claim 1, characterized in that the method further comprises:
extracting the color information of the unstructured nuclear power file, and obtaining the file attribute information required for verification;
after performing image segmentation, identifying the sharpness of the information block pictures;
when performing document data verification, performing, based on the verification rules, a comprehensive check of the picture structured information, the extracted sharpness information, the extracted color information of the file and the acquired file attribute information, outputting the check results to the result display user interface, and providing an explanation of each check result.
4. The nuclear power file verification processing method according to claim 1, characterized in that performing image segmentation on the unstructured nuclear power file comprises:
performing image edge recognition on the pictures in the unstructured nuclear power file;
performing skew correction on the pictures whose edges have been recognized;
segmenting out the skew-corrected pictures;
obtaining location information according to the file template in the verification rules, and locating and extracting individual information block pictures according to the location information.
5. The nuclear power file verification processing method according to claim 1, characterized in that performing text recognition on the segmented information block pictures to extract picture structured information comprises:
performing line and character segmentation on each segmented information block picture to obtain single-character pictures;
analyzing the statistical features of each single-character picture to obtain a feature vector;
inputting the feature vector into an artificial neural network to obtain the text information of the single-character picture.
6. A nuclear power file verification processing system, characterized by comprising:
a content management system interface, for obtaining an unstructured nuclear power file and related metadata information from an enterprise content management system;
a verification rule acquisition module, for obtaining verification rules from pre-entered verification rule configuration information, according to the acquired unstructured nuclear power file and related metadata information;
a picture structured information extraction module, for performing image segmentation on the unstructured nuclear power file based on the verification rules, and performing text recognition on the segmented information block pictures to extract picture structured information;
a verification module, for performing document data verification in combination with the picture structured information.
7. The nuclear power file verification processing system according to claim 6, characterized in that the system further comprises:
a preprocessing module, for preprocessing the unstructured nuclear power file and then sending it to the picture structured information extraction module for processing;
wherein the preprocessing comprises: successively performing grayscale processing, binarization, filtering noise reduction and picture skew correction on the unstructured nuclear power file.
8. The nuclear power file verification processing system according to claim 6, characterized in that the system further comprises:
a color information extraction module, for extracting the color information of the unstructured nuclear power file and sending it to the verification module;
a file attribute extraction module, for obtaining the file attribute information required for verification and sending it to the verification module;
a sharpness identification module, for identifying the sharpness of the information block pictures and sending it to the verification module;
the verification module being specifically configured to perform, based on the verification rules, a comprehensive check of the picture structured information, the extracted sharpness information, the extracted color information of the file and the acquired file attribute information, to output the check results to the result display user interface, and to provide an explanation of each check result.
9. The nuclear power file verification processing system according to claim 6, characterized in that the picture structured information extraction module comprises:
a picture segmentation and extraction unit, for performing image edge recognition on the pictures in the unstructured nuclear power file, performing skew correction on the pictures whose edges have been recognized, and segmenting out the skew-corrected pictures; and for obtaining location information according to the file template in the verification rules and locating and extracting individual information block pictures according to the location information.
10. The nuclear power file verification processing system according to claim 6, characterized in that the picture structured information extraction module comprises:
a text recognition unit, for performing line and character segmentation on each segmented information block picture to obtain single-character pictures, analyzing the statistical features of each single-character picture to obtain a feature vector, and inputting the feature vector into an artificial neural network to obtain the text information of the single-character picture.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811122661.6A CN109446345A (en) 2018-09-26 2018-09-26 Nuclear power file verification processing method and system

Publications (1)

Publication Number Publication Date
CN109446345A true CN109446345A (en) 2019-03-08

Family

ID=65544449

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811122661.6A Pending CN109446345A (en) 2018-09-26 2018-09-26 Nuclear power file verification processing method and system

Country Status (1)

Country Link
CN (1) CN109446345A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20190308)