CN109446345A - Nuclear power file verification processing method and system - Google Patents
- Publication number: CN109446345A
- Application number: CN201811122661.6A
- Authority: CN (China)
- Prior art keywords: information, picture, nuclear power, file, verification
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/06—Energy or water supply
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/24—Aligning, centring, orientation detection or correction of the image
- G06V10/243—Aligning, centring, orientation detection or correction of the image by compensating for image skew or non-uniform image deformations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
Landscapes
- Engineering & Computer Science (AREA)
- Business, Economics & Management (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Human Resources & Organizations (AREA)
- Physics & Mathematics (AREA)
- Economics (AREA)
- Strategic Management (AREA)
- Multimedia (AREA)
- Entrepreneurship & Innovation (AREA)
- Development Economics (AREA)
- Tourism & Hospitality (AREA)
- Marketing (AREA)
- General Business, Economics & Management (AREA)
- Health & Medical Sciences (AREA)
- Educational Administration (AREA)
- Primary Health Care (AREA)
- Water Supply & Treatment (AREA)
- Public Health (AREA)
- Game Theory and Decision Science (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Health & Medical Sciences (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a nuclear power file verification processing method and system. The method includes: obtaining an unstructured nuclear power file and its related metadata information from an enterprise content management system; obtaining a verification rule from pre-entered verification rule configuration information, according to the obtained unstructured nuclear power file and related metadata information; based on the verification rule, performing image segmentation on the unstructured nuclear power file and performing text recognition on the segmented information block pictures to extract structured picture information; and performing document data verification in combination with the structured picture information. The invention is suited to verifying unstructured nuclear power files that contain pictures and fills a gap in automated image-recognition verification of nuclear power enterprise content. Configuration information can be entered in advance as required, so that diversified, customized verification rules can be defined for different file types, which greatly improves production efficiency and reduces labor cost.
Description
Technical field
The present invention relates to the nuclear power field, and in particular to a nuclear power file verification processing method and system.
Background art
According to statistics, in nuclear power construction projects about 3% to 5% of the total engineering cost results from engineering changes and construction errors caused by problems in information transfer. The enterprise content of nuclear power engineering is complex and the number of documents is huge, reaching the millions, especially project files, technical documents, commercial contracts, correspondence, and technology-transfer materials for each technology route (such as the AP1000 and EPR third-generation nuclear power technologies). Because most technical data is stored in semi-structured form in the Enterprise Content Management System (ECMS), the volume of information is huge.
The metadata of a nuclear power file is reflected not only in structured form on the information platform but also in the unstructured physical engineering document files. During project implementation, the metadata stored in the ECM must be presented to field personnel in the form of physical files, so the accuracy of nuclear power document information directly affects project construction and implementation. To guarantee nuclear power engineering quality and nuclear safety, checking document standardization and metadata matching is an important foundational task of nuclear power document management.
Nuclear power documents are managed electronically and without paper. Electronic workflow approval and automated digital signatures have greatly improved production efficiency, but document review still requires a large investment of manpower and has become the bottleneck of document circulation. Nuclear power document review is complex, tedious routine work: each engineering document must pass up to 24 checks, all of which are performed manually, and several hundred project files and engineering letters must be checked every day, so this repetitive work consumes a great deal of manpower and cost.
Patent application publication No. CN106815268A discloses a structured processing method and system for massive unstructured electronic documents. That invention only analyzes and extracts attributes of the physical electronic files of technical data (such as file name, size, directory, and hash code); it does not further process the specific content of the unstructured documents, in particular the data in images.
Summary of the invention
The technical problem to be solved by the present invention is to provide a nuclear power file verification processing method and system that address the above defects of the prior art.
The technical solution adopted by the present invention to solve this problem is a nuclear power file verification processing method, including:
obtaining an unstructured nuclear power file and related metadata information from an enterprise content management system;
obtaining a verification rule from pre-entered verification rule configuration information, according to the obtained unstructured nuclear power file and related metadata information;
based on the verification rule, performing image segmentation on the unstructured nuclear power file, and performing text recognition on the segmented information block pictures to extract structured picture information; and
performing document data verification in combination with the structured picture information.
Preferably, the method further includes:
before image segmentation, preprocessing the unstructured nuclear power file, the preprocessing including successively performing grayscale processing, binarization, filtering and noise reduction, and picture skew correction on the unstructured nuclear power file.
Preferably, the method further includes:
extracting the color information of the unstructured nuclear power file, and obtaining the file attribute information required for verification;
after image segmentation, identifying the sharpness of the information block pictures; and
when performing document data verification, based on the verification rule, comprehensively verifying the extracted structured picture information, the sharpness information, the extracted color information, and the obtained file attribute information, outputting the verification results to a results display user interface, and providing an explanation of each verification result.
Preferably, performing image segmentation on the unstructured nuclear power file includes: performing picture edge recognition on the pictures in the unstructured nuclear power file; performing skew correction on the pictures whose edges have been recognized; segmenting out the skew-corrected pictures; and obtaining position information according to the file template in the verification rule, and locating and extracting the individual information block pictures according to the position information.
Preferably, performing text recognition on the segmented information block pictures to extract structured picture information includes:
performing row segmentation and character segmentation on each segmented information block picture to obtain single-character pictures;
analyzing the statistical features of each single-character picture to obtain a feature vector; and
inputting the feature vector into an artificial neural network to obtain the text information of the single-character picture.
The present invention also claims a nuclear power file verification processing system, including:
a content management system interface, configured to obtain an unstructured nuclear power file and related metadata information from an enterprise content management system;
a verification rule acquisition module, configured to obtain a verification rule from pre-entered verification rule configuration information, according to the obtained unstructured nuclear power file and related metadata information;
a structured picture information extraction module, configured to perform image segmentation on the unstructured nuclear power file based on the verification rule, and to perform text recognition on the segmented information block pictures to extract structured picture information; and
a verification module, configured to perform document data verification in combination with the structured picture information.
Preferably, the system further includes:
a preprocessing module, configured to preprocess the unstructured nuclear power file and then send it to the structured picture information extraction module for processing, where the preprocessing includes successively performing grayscale processing, binarization, filtering and noise reduction, and picture skew correction on the unstructured nuclear power file.
Preferably, the system further includes:
a color information extraction module, configured to extract the color information of the unstructured nuclear power file and send it to the verification module;
a file attribute extraction module, configured to obtain the file attribute information required for verification and send it to the verification module; and
a sharpness identification module, configured to identify the sharpness of the information block pictures and send it to the verification module;
where the verification module is specifically configured to, based on the verification rule, comprehensively verify the extracted structured picture information, the sharpness information, the extracted color information, and the obtained file attribute information, output the verification results to a results display user interface, and provide an explanation of each verification result.
Preferably, the structured picture information extraction module includes:
a picture segmentation and extraction unit, configured to perform picture edge recognition on the pictures in the unstructured nuclear power file, perform skew correction on the pictures whose edges have been recognized, and segment out the skew-corrected pictures; and to obtain position information according to the file template in the verification rule and locate and extract the individual information block pictures according to the position information.
Preferably, the structured picture information extraction module includes:
a text recognition unit, configured to perform row segmentation and character segmentation on each segmented information block picture to obtain single-character pictures, analyze the statistical features of each single-character picture to obtain a feature vector, and input the feature vector into an artificial neural network to obtain the text information of the single-character picture.
The nuclear power file verification processing method and system of the invention have the following beneficial effects: the invention is suited to verifying unstructured nuclear power files that contain pictures and fills a gap in automated image-recognition verification of nuclear power enterprise content. Configuration information can be entered in advance as required, so that diversified, customized verification rules can be defined for different file types. This accommodates the multi-discipline, multi-unit, and multi-technology-route character of nuclear power development, ensures the integrity and accuracy of nuclear power content, greatly improves production efficiency, and reduces labor cost.
Brief description of the drawings
To explain the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed to describe the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort:
Fig. 1 is the method flow chart of embodiment one of the present invention;
Fig. 2 is the artificial neural network model;
Fig. 3 is the system structure diagram of embodiment two of the present invention.
Detailed description of the embodiments
To facilitate understanding of the present invention, it is described more fully below with reference to the relevant drawings, which show exemplary embodiments of the invention. The invention can, however, be implemented in many different forms and is not limited to the embodiments described herein. Rather, these embodiments are provided so that the disclosure will be thorough and complete.
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the invention. The terms used in this specification are only for describing specific embodiments and are not intended to limit the invention.
The general idea of the invention is as follows: first, obtain an unstructured nuclear power file and related metadata information from the enterprise content management system; then obtain a verification rule from the pre-entered verification rule configuration information; next, based on the verification rule, perform image segmentation on the unstructured nuclear power file and perform text recognition on the segmented information block pictures to extract structured picture information; finally, perform document data verification in combination with the structured picture information. In this way the content of pictures is extracted into structured information and then verified, which fills a gap in automated image-recognition verification of nuclear power enterprise content. By entering different verification rule configuration information, diversified, customized verification rules can be defined for different file types. This accommodates the multi-discipline, multi-unit, and multi-technology-route character of nuclear power development, ensures the integrity and accuracy of nuclear power content, greatly improves production efficiency, and reduces labor cost.
To better understand the above technical solution, it is described in detail below with reference to the drawings and specific embodiments. It should be understood that the embodiments of the invention and the specific features in them are a detailed description of the technical solution of this application rather than a limitation of it, and that, where no conflict arises, the embodiments and the technical features in them can be combined with one another.
Embodiment one
With reference to Fig. 1, embodiment one discloses a nuclear power file verification processing method, which includes:
S101. Obtain an unstructured nuclear power file and related metadata information from the enterprise content management system (ECMS). The metadata information includes the document number, file version, status, title, and so on.
S102. According to the obtained unstructured nuclear power file and related metadata information, obtain a verification rule from the pre-entered verification rule configuration information. The verification rule includes the category of the nuclear power file and the template of the nuclear power file.
The verification rule configuration information is entered in advance from the relevant data according to the nuclear power document management rules. In this embodiment it is stored in three databases: a document classification rule base, a verification region rule base, and a metadata verification rule base. These three databases respectively define the structured metadata, the unstructured nuclear power file, and the association between the two, as described below.
$$\mathrm{Meta} = \{att_1, att_2, att_3, \ldots, att_n\}$$
$$\mathrm{Position} = \{pos_1, pos_2, pos_3, \ldots, pos_j\}$$
$$\mathrm{Category} = \{ca_1, ca_2, ca_3, \ldots, ca_m\}$$
$$\mathrm{Template} = \{tpl_1, tpl_2, tpl_3, \ldots, tpl_i\}$$
$$\mathrm{Validate} = \{(c, t) \mid c \in \mathrm{Categories},\ t \in \mathrm{Templates}\}$$
Here, Meta is the set of metadata of the nuclear power file, such as the document number, file version, status, and title.
Position is the content position in the nuclear power file, including the page size, page count, positioning coordinates, and so on.
Category is the category of the nuclear power file, that is, the fine-grained class of the nuclear power document, such as project file, engineering letter, or contract. Every category is derived from a combination of metadata. Categories is the set of nuclear power document categories.
Template is the template of the nuclear power file. For every nuclear power file category there is a corresponding template, and each template contains the category information and the corresponding metadata positions. Templates is the set of nuclear power file templates.
Validate is the set of verification rules; each nuclear power file category corresponds to one or more nuclear power file templates.
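The rule sets above can be pictured as a small in-memory data model. The following is only a minimal sketch under stated assumptions: the class names, the field names, the `doc_type` metadata key, and the dictionary-based classification rules are all hypothetical, and the patent does not specify how the three rule bases are actually stored or queried.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Position:
    """Where one metadata attribute appears in the file (hypothetical layout)."""
    page: int
    start: tuple  # (x, y) of the top-left corner
    end: tuple    # (x, y) of the bottom-right corner


@dataclass
class Template:
    """A file template: its category plus a map of attribute name -> Position."""
    name: str
    category: str
    fields: dict = field(default_factory=dict)


def classify(meta, category_rules):
    """Return the first category whose required metadata values all match."""
    for category, required in category_rules.items():
        if all(meta.get(key) == value for key, value in required.items()):
            return category
    return None


def lookup_rules(meta, category_rules, templates):
    """Build the (category, template) pairs of Validate that apply to one file."""
    category = classify(meta, category_rules)
    return [(category, t) for t in templates if t.category == category]
```

A category is chosen purely from metadata, matching the statement that every category is derived from a combination of metadata; the templates then supply the positions that later steps use to locate information blocks.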
S103. Preprocess the unstructured nuclear power file.
The preprocessing consists of four steps performed in succession on the unstructured nuclear power file: grayscale processing, binarization, filtering and noise reduction, and picture skew correction. Each step is described in detail below.
1031) Grayscale processing: before grayscale conversion, the color attribute of the unstructured nuclear power file must be verified. If the verification passes, the color of each pixel in the file can be expressed as $C = xR + yG + zB$, where $x + y + z = 1$. Each pixel of the color document can then be converted by weighted grayscale processing according to the formula $\mathrm{Gray} = (\omega_R R + \omega_G G + \omega_B B) \div 3$. Because the human eye is most sensitive to green, less sensitive to red, and least sensitive to blue, choosing $\omega_G > \omega_R > \omega_B$ yields a grayscale image that is easier to recognize. In particular, the weights $\omega_R = 0.299$, $\omega_G = 0.587$, and $\omega_B = 0.114$ give the best grayscale image.
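As an illustration of weighted grayscale conversion, here is a minimal sketch in plain Python. It assumes the commonly used luma weights 0.299 (red), 0.587 (green), and 0.114 (blue); because these already sum to 1, the sketch applies the weighted sum directly and does not divide by 3. The function names are hypothetical.

```python
def to_gray(pixel, weights=(0.299, 0.587, 0.114)):
    """Weighted grayscale value of one (R, G, B) pixel.

    Assumption: the weights sum to 1, so no further normalization is applied.
    """
    r, g, b = pixel
    w_r, w_g, w_b = weights
    return round(w_r * r + w_g * g + w_b * b)


def gray_image(rgb_rows):
    """Convert a list of rows of (R, G, B) tuples into rows of gray values."""
    return [[to_gray(p) for p in row] for row in rgb_rows]
```

With these weights a pure green pixel maps to a brighter gray than pure red, and pure red to a brighter gray than pure blue, which reflects the sensitivity ordering described above.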
1032) Binarization: the quality of preprocessing directly affects the performance of the subsequent text recognition, and the content to be recognized in nuclear power files is mostly letters or digits, so the picture must be binarized. The processing formula is as follows:
$$g(x, y) = \begin{cases} 1, & f(x, y) \ge T_0 \\ 0, & f(x, y) < T_0 \end{cases}$$
where $f(x, y)$ is the gray value of the pixel at $(x, y)$ and $g(x, y)$ is its binarized value. Binarization adjusts the gray levels in the document: values above the threshold $T_0$ become 1 (set to 255 in gray scale), and values below $T_0$ become 0 (set to 0 in gray scale). $T_0$ can generally be set to the average of the maximum and minimum gray values of all pixels in the whole file:
$$T_0 = \frac{f_{\max} + f_{\min}}{2}$$
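The thresholding rule can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, the image is a plain list of rows of gray values, and a pixel exactly equal to the threshold is assigned to the bright class, which is an assumption.

```python
def midrange_threshold(gray):
    """T0: the average of the largest and smallest gray value in the image."""
    values = [v for row in gray for v in row]
    return (max(values) + min(values)) / 2


def binarize(gray, threshold=None):
    """Map each gray value to 255 (at or above the threshold) or 0 (below it)."""
    t = midrange_threshold(gray) if threshold is None else threshold
    return [[255 if v >= t else 0 for v in row] for row in gray]
```

For an image with gray values 10, 200, 120, and 90, the threshold is (200 + 10) / 2 = 105, so 200 and 120 become 255 while 10 and 90 become 0.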
1033) Filtering and noise reduction: to improve the recognition rate, noise must be handled, especially in scanned files, and an image smoothing filter is used to eliminate picture noise. For example, for an image that has been binarized, a filter with a window of size m*n is selected (m, n > 1 and odd, generally 3*3), and the median of all values in the window is taken as the value of the window's center point. For example, suppose the 9 pixels of a 3*3 window chosen by the filter have the following values before filtering:
0 0 0
0 255 0
0 0 0
The center point is clearly noise. After the filtering and noise reduction of this step, the 255 at the center becomes 0, the median of the values in the window, so after noise reduction the window becomes:
0 0 0
0 0 0
0 0 0
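The worked 3*3 example can be reproduced with a short median filter. This is a minimal sketch in plain Python; the function name is hypothetical, and leaving border pixels unchanged is an assumption, since the text does not say how image borders are handled.

```python
from statistics import median


def median_filter(img, m=3, n=3):
    """Replace each interior pixel with the median of its m*n neighbourhood.

    Assumption: pixels too close to the border to fit a full window are copied
    through unchanged.
    """
    height, width = len(img), len(img[0])
    rm, rn = m // 2, n // 2
    out = [row[:] for row in img]
    for i in range(rm, height - rm):
        for j in range(rn, width - rn):
            window = [img[a][b]
                      for a in range(i - rm, i + rm + 1)
                      for b in range(j - rn, j + rn + 1)]
            out[i][j] = median(window)
    return out
```

Applied to the window above, the isolated 255 is replaced by 0 because eight of the nine values in its neighbourhood are 0.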
1034) Picture skew correction: a projection-based method can be used. The file is projected along given directions to obtain several projection profiles; the skew angle of the file is then obtained from the projection features of the profiles (such as the mean square deviation), and the skew of the file is corrected according to that angle.
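One way to realize projection-based skew estimation is sketched below. It is an illustration under stated assumptions rather than the method of the patent: foreground pixels are given as (x, y) coordinates, each candidate angle is scored by the variance of its projection profile, and the candidate with the highest variance is returned. The function names and the candidate range are hypothetical.

```python
import math
from collections import Counter


def projection_variance(points, angle_deg):
    """Variance of the profile obtained by projecting points along a direction."""
    t = math.radians(angle_deg)
    # Each foreground point falls into the profile bin of its rotated row index.
    bins = Counter(round(y * math.cos(t) - x * math.sin(t)) for x, y in points)
    low, high = min(bins), max(bins)
    counts = [bins.get(r, 0) for r in range(low, high + 1)]
    mean = sum(counts) / len(counts)
    return sum((c - mean) ** 2 for c in counts) / len(counts)


def estimate_skew(points, candidates):
    """Return the candidate angle whose projection profile is most peaked."""
    return max(candidates, key=lambda a: projection_variance(points, a))
```

When the projection direction matches the text lines, each line collapses into a single profile bin, which produces tall, narrow peaks and therefore the largest variance. The page would then be rotated by the estimated angle in the opposite direction to correct it.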
S104. Extract the color information of the unstructured nuclear power file, and obtain the file attribute information required for verification.
Here the file attribute information required for verification, such as the file name, file size, and file format, can be obtained from the overall information of the nuclear power file.
S105. Based on the verification rule, perform image segmentation on the unstructured nuclear power file. Note that the execution order of S105 and S104 is not restricted; it is only necessary that both S105 and S104 have been executed before the final verification step S108.
From the obtained verification rule, combined with the unstructured nuclear power file obtained in step S101, each item to be verified can be determined. Image segmentation can then be performed on the images in the nuclear power file, as follows:
1051) Perform picture edge recognition on the pictures in the unstructured nuclear power file: an edge detection operator can be convolved with the file preprocessed in step S103, and the Hough algorithm is then used to detect the straight line segments on the picture edges in the nuclear power file.
1052) Perform skew correction on the pictures whose edges have been recognized: sort the straight line segments obtained in the previous step by decreasing length, select the several longest segments, and compute the tilt angle of each relative to the horizontal direction. The median of these tilt angles is taken as the tilt angle of the whole image, and the image is rotated by that angle to correct its skew.
1053) Segment out the skew-corrected pictures: keep the straight line segments in the horizontal and vertical directions and remove the others. Compute the distances between the endpoints of the different retained segments; if a distance is less than a set threshold, connect the segments. This yields the cell images of the table.
1054) Locate and extract the individual information block pictures: first, obtain the position information from the file template in the verification rule and locate the information block picture according to that position information; second, once the information block picture has been located, extract it with an edge recognition algorithm; finally, temporarily save the extracted information block picture according to the rule. The position information includes the page number, start point, and end point. In one specific embodiment, the information block picture extracted by the edge algorithm can be saved in BMP format.
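Steps 1052) and 1053) can be sketched as follows, assuming the line segments have already been detected (for example by a Hough transform) and are given as pairs of endpoints. The function names, the default of five longest segments, and the one-degree tolerance are hypothetical choices for illustration, and the sketch assumes the longest segments are nominally horizontal.

```python
import math
from statistics import median


def segment_length(seg):
    (x1, y1), (x2, y2) = seg
    return math.hypot(x2 - x1, y2 - y1)


def segment_angle(seg):
    """Tilt of a segment relative to the horizontal, folded into (-90, 90]."""
    (x1, y1), (x2, y2) = seg
    angle = math.degrees(math.atan2(y2 - y1, x2 - x1))
    if angle > 90:
        angle -= 180
    elif angle <= -90:
        angle += 180
    return angle


def page_tilt(segments, k=5):
    """Median tilt of the k longest segments, used as the tilt of the image."""
    longest = sorted(segments, key=segment_length, reverse=True)[:k]
    return median(segment_angle(s) for s in longest)


def keep_axis_aligned(segments, tol=1.0):
    """After deskewing, keep only near-horizontal and near-vertical segments."""
    kept = []
    for seg in segments:
        a = abs(segment_angle(seg))
        if a <= tol or abs(a - 90) <= tol:
            kept.append(seg)
    return kept
```

Taking the median rather than the mean keeps one badly detected segment from distorting the estimated tilt of the whole image.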
S106. Perform text recognition on the segmented information block pictures to extract structured picture information, specifically including:
1061) Perform row segmentation and character segmentation on each segmented information block picture to obtain single-character pictures.
Row segmentation can use the pixel accumulation method on the binary image, as shown in the following formula:
$$\sum_{j=1}^{L} F(i, j) \diamond p$$
where $F(i, j)$ is the binary text image, $L$ is the row length, and $p$ is an experimental constant greater than zero that depends on the noise in the document. $\diamond$ is a wildcard for a comparison operator: when $\diamond$ is $\ge$ and the expression holds, row $i$ is the upper bound of a text row; when $\diamond$ is $\le$, it is the lower bound. The region between an upper and a lower bound is cut out as one row. Character segmentation is similar: it is equivalent to rotating the row picture obtained by row segmentation by 90° and performing row segmentation again, except that $L$ is then no longer the row length but the character height.
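The pixel accumulation idea can be sketched in a few lines. This is a minimal illustration with hypothetical function names; it assumes text pixels are stored as 1 and background as 0, and it treats a row as text when its pixel sum reaches the constant p.

```python
def split_rows(binary, p=1):
    """Return (start, end) index pairs of consecutive rows whose sum is >= p.

    `end` is exclusive, so binary[start:end] is one text row.
    """
    sums = [sum(row) for row in binary]
    rows, start = [], None
    for i, s in enumerate(sums):
        if s >= p and start is None:
            start = i                      # upper bound of a text row
        elif s < p and start is not None:
            rows.append((start, i))        # lower bound reached
            start = None
    if start is not None:
        rows.append((start, len(sums)))
    return rows


def split_chars(row_img, p=1):
    """Character segmentation: the same accumulation applied to columns."""
    columns = [list(col) for col in zip(*row_img)]
    return split_rows(columns, p)
```

Transposing the row picture plays the role of the 90° rotation described above, so the same accumulation routine serves both row and character segmentation.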
1062) Analyze the statistical features of each single-character picture, extracting grid features with a local gray-level algorithm, to obtain a feature vector.
1063) Input the feature vector into an artificial neural network (ANN), as shown in Fig. 2. Using the weights set on the neuron connections in the network, compute whether the nonlinear activation function exceeds the threshold, and then output the classification information $y_k$, which is the text information of the single-character picture:
$$y_k = \varphi\left(\sum_{j} \omega_{kj} x_j - \theta_k\right)$$
where $x_j$ is the neuron input, namely the feature vector obtained in step 1062); $\omega_{kj}$ is the weight of the connection to neuron $k$; $\theta_k$ is the threshold; $\varphi$ is the activation function; and $y_k$ is the classification information output by neuron $k$.
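Steps 1062) and 1063) can be illustrated with a toy example. The sketch below is an assumption-laden simplification: grid features are taken to be the fraction of text pixels in each grid cell, the activation is a sigmoid, and the network is a single layer with one output neuron per character whose weights are supplied by hand. The patent does not specify the feature definition, the activation function, or the network topology, and all names here are hypothetical.

```python
import math


def grid_features(char_img, rows=2, cols=2):
    """Fraction of text pixels (value 1) in each cell of a rows*cols grid."""
    height, width = len(char_img), len(char_img[0])
    feats = []
    for r in range(rows):
        for c in range(cols):
            cell = [char_img[i][j]
                    for i in range(r * height // rows, (r + 1) * height // rows)
                    for j in range(c * width // cols, (c + 1) * width // cols)]
            feats.append(sum(cell) / len(cell) if cell else 0.0)
    return feats


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def neuron_output(x, weights, theta, activation=sigmoid):
    """y_k = activation(sum_j w_kj * x_j - theta_k) for one neuron."""
    return activation(sum(w * xi for w, xi in zip(weights, x)) - theta)


def recognize(x, layer):
    """Pick the label of the output neuron with the strongest response.

    `layer` is a list of (label, weights, theta) triples.
    """
    return max(layer, key=lambda unit: neuron_output(x, unit[1], unit[2]))[0]
```

In practice the weights and thresholds would come from training on labeled character pictures; here they are placeholders that only show how the feature vector flows through the neuron formula.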
S107, identify the clarity of the information block pictures;
Clarity identification uses a gradient algorithm: the Sobel operator extracts the gradient values of the picture in the horizontal and vertical directions, and clarity is judged with the Tenengrad energy gradient function.
D(f) = ∑_y ∑_x |G(x, y)|, for G(x, y) > T
where D(f) is the clarity, T is the given edge detection threshold, G(x, y) = √(Gx² + Gy²) is the gradient magnitude at pixel (x, y), and Gx and Gy are the convolutions of the picture at pixel (x, y) with the horizontal and vertical Sobel edge detection operators, respectively. The standard 3×3 Sobel operator templates are:
Gx template: [−1 0 +1; −2 0 +2; −1 0 +1]
Gy template: [−1 −2 −1; 0 0 0; +1 +2 +1]
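The clarity measure D(f) can be sketched in plain Python as below. This is an illustrative version under stated assumptions: the picture is a list of rows of gray values, border pixels are skipped, and the function name `tenengrad` and the sample pictures are invented for the example. The kernels are applied without flipping, which changes only the sign of Gx and Gy and therefore not the magnitude.

```python
import math
from typing import Sequence

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # horizontal gradient template
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # vertical gradient template


def tenengrad(gray: Sequence[Sequence[float]], threshold: float = 0.0) -> float:
    """Sum the Sobel gradient magnitudes G(x, y) that exceed the threshold T."""
    h, w = len(gray), len(gray[0])
    total = 0.0
    for y in range(1, h - 1):                 # skip the one-pixel border
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    pixel = gray[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * pixel
                    gy += SOBEL_Y[dy][dx] * pixel
            g = math.hypot(gx, gy)            # G = sqrt(Gx^2 + Gy^2)
            if g > threshold:
                total += g
    return total


flat = [[100] * 5 for _ in range(5)]                   # no edges at all
sharp = [[0, 0, 255, 255] for _ in range(4)]           # abrupt vertical edge
blurry = [[0, 85, 170, 255] for _ in range(4)]         # same edge, smeared
```

A sharper picture yields a larger D(f); the pass/fail cut-off applied to D(f) would come from the verification rule and is not specified here.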
S108, based on the verification rule, perform an overall verification of the extracted picture structure information, the clarity information, the extracted file color information, and the acquired file attribute information; output the verification results to the user result display interface, and provide an explanation for each verification result.
Preferably, the method further includes: recording logs throughout the method, such as picture segmentation results and the recognition results for the metadata in the pictures.
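One possible shape for the overall verification and logging of step S108 is sketched below. The source does not define the rule format, so the `Rule` and `CheckResult` classes, the field names, the logger name, and the sample values are all hypothetical. Only recognized metadata fields and clarity are checked here; color and file attribute checks would follow the same pattern.

```python
import logging
from dataclasses import dataclass
from typing import Dict, List

logger = logging.getLogger("nuclear_file_check")  # hypothetical logger name


@dataclass
class Rule:
    expected_fields: Dict[str, str]   # metadata values the picture text must match
    min_clarity: float                # lowest acceptable clarity D(f)


@dataclass
class CheckResult:
    item: str
    passed: bool
    explanation: str


def overall_check(extracted: Dict[str, str], clarity: float, rule: Rule) -> List[CheckResult]:
    """Compare recognized values with the rule and explain every result."""
    results: List[CheckResult] = []
    for name, expected in rule.expected_fields.items():
        actual = extracted.get(name)
        results.append(CheckResult(
            name, actual == expected,
            f"expected {expected!r}, recognized {actual!r}"))
    results.append(CheckResult(
        "clarity", clarity >= rule.min_clarity,
        f"clarity {clarity:.1f}, required at least {rule.min_clarity:.1f}"))
    for r in results:                 # log every item, pass or fail
        logger.info("%s: %s (%s)", r.item, "pass" if r.passed else "fail", r.explanation)
    return results


rule = Rule({"document_number": "ABC-001", "version": "B"}, min_clarity=1000.0)
results = overall_check({"document_number": "ABC-001", "version": "A"}, 4080.0, rule)
```

Returning one explained result per item, rather than a single pass/fail flag, matches the requirement that every verification result be accompanied by an explanation.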
Embodiment two
With reference to Fig. 3, this embodiment discloses a nuclear power file verification processing system, comprising:
Content management system interface 201, for obtaining an unstructured nuclear power file and its related metadata information from an enterprise content management system. The metadata information includes the document number, file version, status, title, and so on.
Verification rule acquisition module 202, for obtaining the verification rule from pre-entered verification rule configuration information, according to the acquired unstructured nuclear power file and related metadata information. The verification rule includes the classification of the nuclear power file and the template of the nuclear power file.
The pre-entered verification rule configuration information consists of data entered according to the nuclear power document management rules. In this embodiment, the verification rule configuration information is stored in three databases: a document classification rule base, a verification region rule base, and a metadata verification rule base. These three databases respectively define the structured metadata, the unstructured nuclear power files, and the association between the two. For details, refer to the corresponding content of Embodiment One, which is not repeated here.
Preprocessing module 203, for preprocessing the unstructured nuclear power file and then sending it to the picture structure information extraction module for processing; the preprocessing includes performing, in sequence, grayscale processing, binarization, filtering noise reduction, and picture skew correction on the unstructured nuclear power file. For details, refer to the corresponding content of Embodiment One, which is not repeated here.
Color information extraction module 204, for extracting the color information of the unstructured nuclear power file and sending it to the verification module;
File attribute extraction module 205, for obtaining the file attribute information required for verification and sending it to the verification module;
Picture structure information extraction module 206, for performing image segmentation on the unstructured nuclear power file based on the verification rule, and performing text recognition on the segmented information block pictures to extract picture structure information;
Clarity identification module 207, for identifying the clarity of the information block pictures and sending it to the verification module; for the specific process, refer to the corresponding content of Embodiment One, which is not repeated here.
Verification module 208, for performing, based on the verification rule, an overall verification of the extracted picture structure information, the clarity information, the extracted file color information, and the acquired file attribute information, outputting the verification results to the user result display interface, and providing an explanation for each verification result.
Logging module 209, for recording logs throughout the operation of the whole system, such as picture segmentation results and the recognition results for the metadata in the pictures.
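The first three preprocessing stages named for module 203 (grayscale processing, binarization, filtering noise reduction) can be sketched as follows. This is only an illustration under assumptions: the grayscale weights are the common ITU-R BT.601 luma coefficients, the fixed threshold of 128 and the 3×3 median filter are choices made for the example (the source does not name a filter), and skew correction is omitted. All function names are hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def to_gray(rgb: Sequence[Sequence[Pixel]]) -> List[List[int]]:
    """Grayscale processing with BT.601 luma weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in rgb]


def binarize(gray: Sequence[Sequence[int]], threshold: int = 128) -> List[List[int]]:
    """Binarization: dark pixels become ink (1), light pixels background (0)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]


def median_filter(binary: Sequence[Sequence[int]]) -> List[List[int]]:
    """3x3 median filter on interior pixels; border pixels are copied unchanged."""
    h, w = len(binary), len(binary[0])
    out = [list(row) for row in binary]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [binary[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out


def preprocess(rgb: Sequence[Sequence[Pixel]]) -> List[List[int]]:
    """Run the three stages in the order given for module 203."""
    return median_filter(binarize(to_gray(rgb)))


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
speck = [[WHITE, WHITE, WHITE], [WHITE, BLACK, WHITE], [WHITE, WHITE, WHITE]]
solid = [[BLACK] * 3 for _ in range(3)]
```

In the example, the isolated dark pixel in `speck` is treated as noise and removed, while the solid dark block in `solid` is preserved.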
Specifically, the picture structure information extraction module 206 includes:
Picture segmentation extraction unit 2061, for performing image edge recognition on the pictures in the unstructured nuclear power file, performing skew correction on the pictures whose edges have been recognized, and segmenting out the skew-corrected pictures; and for obtaining location information from the file template in the verification rule and locating and extracting single information block pictures according to the location information. For details, refer to the corresponding content of Embodiment One, which is not repeated here.
Text recognition unit 2062, for performing row segmentation and character segmentation on each segmented information block picture to obtain single-character pictures, analyzing the statistical features of each single-character picture to obtain a feature vector, and inputting the feature vector into an artificial neural network to obtain the text information of the single-character picture. For details, refer to the corresponding content of Embodiment One, which is not repeated here.
It should be noted that the division into modules in the description above is for clarity. In actual implementation, however, the boundaries between modules may be blurred. For example, any or all of the functional modules herein may share various hardware and/or software elements. As another example, any and/or all of the functional modules herein may be implemented wholly or partly by a shared processor executing software instructions. In addition, software components executed by one or more processors may be shared among the various software modules. Accordingly, unless expressly required, the scope of the present invention is not limited by mandatory boundaries between the various hardware and/or software elements.
In summary, the nuclear power file verification processing method and system of the present invention have the following beneficial effects: the invention is suitable for verifying unstructured nuclear power files that contain pictures, and fills the gap in automated image-recognition verification of nuclear power enterprise content. Configuration information can be entered in advance as required so that customized verification rules are applied to different file types, which suits the multi-discipline, multi-unit, multi-technology-route development of nuclear power, ensures the integrity and accuracy of nuclear power content, greatly improves production efficiency, and reduces labor costs.
The embodiments of the present invention are described above with reference to the drawings, but the invention is not limited to the specific embodiments above, which are merely illustrative rather than restrictive. Inspired by the present invention, those of ordinary skill in the art can devise many other forms without departing from the purpose of the invention and the scope protected by the claims, all of which fall within the protection of the present invention.
Claims (10)
1. A nuclear power file verification processing method, characterized by comprising:
obtaining an unstructured nuclear power file and related metadata information from an enterprise content management system;
obtaining a verification rule from pre-entered verification rule configuration information, according to the acquired unstructured nuclear power file and related metadata information;
based on the verification rule, performing image segmentation on the unstructured nuclear power file, and performing text recognition on the segmented information block pictures to extract picture structure information;
performing document data verification in combination with the picture structure information.
2. The nuclear power file verification processing method according to claim 1, characterized in that the method further comprises:
before performing image segmentation, preprocessing the unstructured nuclear power file, the preprocessing comprising performing, in sequence, grayscale processing, binarization, filtering noise reduction, and picture skew correction on the unstructured nuclear power file.
3. The nuclear power file verification processing method according to claim 1, characterized in that the method further comprises:
extracting the color information of the unstructured nuclear power file, and obtaining the file attribute information required for verification;
after performing image segmentation, identifying the clarity of the information block pictures;
when performing document data verification, performing, based on the verification rule, an overall verification of the extracted picture structure information, the clarity information, the extracted file color information, and the acquired file attribute information, outputting the verification results to the user result display interface, and providing an explanation for each verification result.
4. The nuclear power file verification processing method according to claim 1, characterized in that performing image segmentation on the unstructured nuclear power file comprises:
performing image edge recognition on the pictures in the unstructured nuclear power file;
performing skew correction on the pictures whose edges have been recognized;
segmenting out the skew-corrected pictures;
obtaining location information from the file template in the verification rule, and locating and extracting single information block pictures according to the location information.
5. The nuclear power file verification processing method according to claim 1, characterized in that performing text recognition on the segmented information block pictures to extract picture structure information comprises:
performing row segmentation and character segmentation on each segmented information block picture to obtain single-character pictures;
analyzing the statistical features of each single-character picture to obtain a feature vector;
inputting the feature vector into an artificial neural network to obtain the text information of the single-character picture.
6. A nuclear power file verification processing system, characterized by comprising:
a content management system interface, for obtaining an unstructured nuclear power file and related metadata information from an enterprise content management system;
a verification rule acquisition module, for obtaining a verification rule from pre-entered verification rule configuration information, according to the acquired unstructured nuclear power file and related metadata information;
a picture structure information extraction module, for performing image segmentation on the unstructured nuclear power file based on the verification rule, and performing text recognition on the segmented information block pictures to extract picture structure information;
a verification module, for performing document data verification in combination with the picture structure information.
7. The nuclear power file verification processing system according to claim 6, characterized in that the system further comprises:
a preprocessing module, for preprocessing the unstructured nuclear power file and then sending it to the picture structure information extraction module for processing;
wherein the preprocessing comprises performing, in sequence, grayscale processing, binarization, filtering noise reduction, and picture skew correction on the unstructured nuclear power file.
8. The nuclear power file verification processing system according to claim 6, characterized in that the system further comprises:
a color information extraction module, for extracting the color information of the unstructured nuclear power file and sending it to the verification module;
a file attribute extraction module, for obtaining the file attribute information required for verification and sending it to the verification module;
a clarity identification module, for identifying the clarity of the information block pictures and sending it to the verification module;
wherein the verification module is specifically configured to perform, based on the verification rule, an overall verification of the extracted picture structure information, the clarity information, the extracted file color information, and the acquired file attribute information, to output the verification results to the user result display interface, and to provide an explanation for each verification result.
9. The nuclear power file verification processing system according to claim 6, characterized in that the picture structure information extraction module comprises:
a picture segmentation extraction unit, for performing image edge recognition on the pictures in the unstructured nuclear power file, performing skew correction on the pictures whose edges have been recognized, and segmenting out the skew-corrected pictures; and for obtaining location information from the file template in the verification rule and locating and extracting single information block pictures according to the location information.
10. The nuclear power file verification processing system according to claim 6, characterized in that the picture structure information extraction module comprises:
a text recognition unit, for performing row segmentation and character segmentation on each segmented information block picture to obtain single-character pictures, analyzing the statistical features of each single-character picture to obtain a feature vector, and inputting the feature vector into an artificial neural network to obtain the text information of the single-character picture.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811122661.6A CN109446345A (en) | 2018-09-26 | 2018-09-26 | Nuclear power file verification processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109446345A true CN109446345A (en) | 2019-03-08 |
Family
ID=65544449
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811122661.6A Pending CN109446345A (en) | 2018-09-26 | 2018-09-26 | Nuclear power file verification processing method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109446345A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103258198A (en) * | 2013-04-26 | 2013-08-21 | 四川大学 | Extraction method for characters in form document image |
CN103761290A (en) * | 2014-01-15 | 2014-04-30 | 浪潮(北京)电子信息产业有限公司 | Data management method and system based on content awareness |
US20160063096A1 (en) * | 2014-08-27 | 2016-03-03 | International Business Machines Corporation | Image relevance to search queries based on unstructured data analytics |
CN105678612A (en) * | 2015-12-30 | 2016-06-15 | 远光软件股份有限公司 | Mobile terminal original certificate electronic intelligent filling system and method |
CN106815268A (en) * | 2015-12-01 | 2017-06-09 | 中广核工程有限公司 | Structured processing method and system for massive unstructured electronic files |
CN107491730A (en) * | 2017-07-14 | 2017-12-19 | 浙江大学 | Laboratory test report recognition method based on image processing |
-
2018
- 2018-09-26 CN CN201811122661.6A patent/CN109446345A/en active Pending
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110414517A (en) * | 2019-04-18 | 2019-11-05 | 河北神玥软件科技股份有限公司 | Fast, high-accuracy identity card text recognition algorithm for cooperative photographing scenes |
CN111159997A (en) * | 2019-12-31 | 2020-05-15 | 无锡识凌科技有限公司 | Intelligent verification method for enterprise bid document |
CN111159997B (en) * | 2019-12-31 | 2024-04-05 | 无锡识凌科技有限公司 | Intelligent verification method for enterprise bidding document |
CN111581191A (en) * | 2020-04-10 | 2020-08-25 | 岭东核电有限公司 | Nuclear safety data verification method and device, computer equipment and storage medium |
CN111581191B (en) * | 2020-04-10 | 2023-10-13 | 岭东核电有限公司 | Nuclear safety data verification method, device, computer equipment and storage medium |
CN112434508A (en) * | 2020-12-10 | 2021-03-02 | 清研灵智信息咨询(北京)有限公司 | Research report automatic generation method based on deep learning |
CN113239893A (en) * | 2021-06-10 | 2021-08-10 | 深圳智子系科技有限公司 | Document input rechecking method, system, electronic equipment and medium |
CN113723913A (en) * | 2021-08-05 | 2021-11-30 | 中核武汉核电运行技术股份有限公司 | Nuclear power plant file management method, device, equipment and storage medium |
CN114399774A (en) * | 2022-01-19 | 2022-04-26 | 润申标准化技术服务(上海)有限公司 | File processing method and device and electronic equipment |
CN117391068A (en) * | 2023-10-27 | 2024-01-12 | 中国人寿保险股份有限公司山东省分公司 | Method and system for checking life insurance security business information based on RPA |
CN117391068B (en) * | 2023-10-27 | 2024-04-05 | 中国人寿保险股份有限公司山东省分公司 | Method and system for checking life insurance security business information based on RPA |
Legal Events
Code | Title | Description
---|---|---
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190308