CN109871521A - Method and device for generating an electronic document - Google Patents
- Publication number
- CN109871521A (application CN201910017061.1A)
- Authority
- CN
- China
- Prior art keywords
- character
- entity image
- document
- entity
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
Abstract
The present invention belongs to the technical field of image processing and provides a method and a device for generating an electronic document. The method comprises: obtaining an entity image of a target entity, determining the entity type of the target entity from the entity image, and obtaining a document template matching the entity type; adjusting a preset character recognition algorithm based on the entity type, and outputting character information through the character recognition algorithm; obtaining the center coordinate of each recognized character from its character region image, and determining the document item to which the recognized character belongs from the center coordinate and the effective region of each document item; and importing the recognized character into the document item to which it belongs in the document template, thereby generating an electronic document about the target entity. The invention requires no manual selection by the user: the item into which a character is imported is determined by the position of the character, which reduces import errors, and because no semantic analysis is needed, generation efficiency is improved.
Description
Technical field
The invention belongs to the technical field of image processing, and in particular relates to a method and a device for generating an electronic document.
Background art
With the continuing spread of digitization, electronic documents are widely used in all kinds of applications because they are convenient to store and quick to transmit. How effectively physical documents can be converted into electronic documents therefore directly affects the efficiency of document management. In existing techniques for generating electronic documents, an administrator generally identifies by hand the electronic template that corresponds to a physical document and manually fills the content of the physical document into each item of the electronic template. When there are many physical documents and the amount of text is large, the conversion takes a long time, which lowers the efficiency of electronic document generation.
Summary of the invention
In view of this, the embodiments of the present invention provide a method and a device for generating an electronic document, to solve the problem that existing generation techniques require an administrator to identify by hand the electronic template corresponding to a physical document and to fill the content of the physical document manually into each item of the electronic template, so that document generation efficiency is low.
A first aspect of the embodiments of the present invention provides a method for generating an electronic document, comprising:
obtaining an entity image of a target entity, determining the entity type of the target entity from the entity image, and obtaining a document template matching the entity type, the document template comprising a plurality of document items;
adjusting a preset character recognition algorithm based on the entity type, processing the entity image with the adjusted character recognition algorithm, and outputting character information about the entity image, the character information comprising recognized characters and the character region image of each recognized character;
obtaining the center coordinate of the recognized character from the character region image, and determining the document item to which the recognized character belongs from the center coordinate and the effective region of each document item;
importing the recognized character into the document item to which it belongs in the document template, and generating an electronic document about the target entity.
A second aspect of the embodiments of the present invention provides a device for generating an electronic document, comprising:
an entity image acquiring unit, configured to obtain an entity image of a target entity, determine the entity type of the target entity from the entity image, and obtain a document template matching the entity type, the document template comprising a plurality of document items;
a character information output unit, configured to adjust a preset character recognition algorithm based on the entity type, process the entity image with the adjusted character recognition algorithm, and output character information about the entity image, the character information comprising recognized characters and the character region image of each recognized character;
a document item determination unit, configured to obtain the center coordinate of the recognized character from the character region image, and determine the document item to which the recognized character belongs from the center coordinate and the effective region of each document item;
a character information import unit, configured to import the recognized character into the document item to which it belongs in the document template, and generate an electronic document about the target entity.
A third aspect of the embodiments of the present invention provides a terminal device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements each step of the first aspect when executing the computer program.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program, wherein each step of the first aspect is implemented when the computer program is executed by a processor.
Implementing the method and device for generating an electronic document provided by the embodiments of the present invention has the following advantages:
The embodiments of the present invention obtain an entity image of the target entity to be converted, determine the entity type of the target entity from the entity image, and obtain the document template that matches the entity type. The character recognition algorithm is adjusted according to the entity type and used to extract the character information contained in the entity image; the document item corresponding to each recognized character is determined from its center coordinate, and the characters are then imported in turn into the associated document items of the document template to generate an electronic document about the target entity, so that the electronic document is generated automatically. Compared with existing techniques for generating electronic documents, the embodiments of the present invention determine the corresponding document item from the position of each character and adjust the character recognition algorithm according to the entity type, which improves the accuracy of character recognition. No manual selection by the user is required, import errors are reduced, and because no semantic analysis is needed, generation efficiency is improved.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only some embodiments of the present invention; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an implementation of the method for generating an electronic document provided by the first embodiment of the present invention;
Fig. 2 is a flowchart of a specific implementation of step S102 of the method for generating an electronic document provided by the second embodiment of the present invention;
Fig. 3 is a flowchart of a specific implementation of step S103 of the method for generating an electronic document provided by the third embodiment of the present invention;
Fig. 4 is a flowchart of a specific implementation of the method for generating an electronic document provided by the fourth embodiment of the present invention;
Fig. 5 is a flowchart of a specific implementation of step S101 of the method for generating an electronic document provided by the fifth embodiment of the present invention;
Fig. 6 is a structural block diagram of the device for generating an electronic document provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of a terminal device provided by another embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and are not intended to limit it.
The embodiments of the present invention obtain an entity image of the target entity to be converted, determine the entity type of the target entity from the entity image, and obtain the document template that matches the entity type. The character recognition algorithm is adjusted according to the entity type and used to extract the character information contained in the entity image; the document item corresponding to each recognized character is determined from its center coordinate, and the characters are then imported in turn into the associated document items of the document template to generate an electronic document about the target entity. The electronic document is thus generated automatically, which solves the problem that existing generation techniques require an administrator to identify by hand the electronic template corresponding to a physical document and to fill its content manually into each item of the electronic template, so that document generation efficiency is low.
In the embodiments of the present invention, the process is executed by a terminal device. The terminal device includes, but is not limited to, a server, a computer, a smartphone, a tablet computer or any other device capable of performing the electronic document generation operation. Fig. 1 shows a flowchart of an implementation of the method for generating an electronic document provided by the first embodiment of the present invention, detailed as follows:
In S101, an entity image of a target entity is obtained, the entity type of the target entity is determined from the entity image, and a document template matching the entity type is obtained; the document template comprises a plurality of document items.
In this embodiment, the terminal device may receive an entity image of the target entity sent by a user terminal. In this case, the user captures the entity image with the camera unit built into the user terminal and uploads the captured image to the terminal device through the client installed on the user terminal; on receiving the image upload instruction from the client, the terminal device performs the operations of S101. Optionally, to ensure that uploaded images are legitimate, the terminal device may obtain the program number of the client and use it to judge whether the client is a program file downloaded through a legitimate release channel. If the program number is identified as illegal, the terminal device rejects the entity image and returns image exception information. This ensures the legitimacy of the operation and prevents unauthorized users from performing image recognition, which would overload the device and reduce recognition efficiency and accuracy. Conversely, if the program number is identified as legal, the operations of S101 are performed. Besides receiving entity images sent by other devices, the terminal device may also obtain the entity image of the target entity through a built-in image acquisition unit such as a camera module or a scanning module. In this case, the terminal device starts the image acquisition module when it receives a user-initiated image acquisition instruction or detects that a target entity to be recognized has been placed in the image acquisition area, obtains the image information of the acquisition area at the current moment, and performs the operations of S101.
In this embodiment, the terminal device may preprocess the received entity image to improve the accuracy of entity type identification. Specifically, the preprocessing may be as follows: the terminal device obtains the ambient light intensity at the time the image was captured, determines a highlight adjustment coefficient and a shadow adjustment coefficient from that intensity, and uses the two coefficients to adjust the highlight regions and shadow regions of the entity image; it identifies the boundary contour of the target entity in the entity image and crops the entity image along that contour to remove the invalid background region; and it converts the cropped and adjusted entity image to grayscale, which improves recognition accuracy.
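The preprocessing order described above (tone adjustment, cropping, grayscale conversion) can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the brightness split at 128, the background level of 250, the use of a bounding box in place of a true boundary contour, and the luma weights are all assumptions, and the two coefficients are taken as inputs because the text does not say how they are derived from the ambient light intensity.

```python
import numpy as np

def adjust_tones(image, highlight_coeff, shadow_coeff, split=128.0):
    """Scale bright pixels by highlight_coeff and dark pixels by shadow_coeff.
    The split value between 'highlight' and 'shadow' is an assumed constant."""
    image = image.astype(np.float64)
    scaled = np.where(image >= split, image * highlight_coeff, image * shadow_coeff)
    return np.clip(scaled, 0.0, 255.0)

def crop_to_entity(image, background_level=250.0):
    """Crop to the bounding box of non-background pixels (a stand-in for
    contour detection; assumes a near-white background)."""
    mask = image.mean(axis=2) < background_level
    if not mask.any():
        return image
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return image[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def to_grayscale(image):
    """Collapse the three color layers into one using common luma weights."""
    return image @ np.array([0.299, 0.587, 0.114])

def preprocess(rgb, highlight_coeff, shadow_coeff):
    """Adjust tones, crop away the background, then convert to grayscale."""
    adjusted = adjust_tones(rgb, highlight_coeff, shadow_coeff)
    return to_grayscale(crop_to_entity(adjusted))
```

A real system would likely replace the bounding-box crop with proper contour detection and a perspective correction.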
In this embodiment, the terminal device may determine the entity type of the target entity from the entity image. Specifically, the entity type may be identified as follows: the terminal device identifies the image size of the entity image, compares it with a preset list of file type sizes, and determines the file type corresponding to that image size, thereby obtaining the entity type of the target entity. Besides determining the entity type from the image size, the terminal device may also recognize the file title of the target entity and extract the file keywords contained in the title to determine the entity type of the target entity.
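As a rough sketch of the size-based lookup described above, the function below compares the image's aspect ratio with a table of reference sizes. Comparing aspect ratios rather than raw pixel sizes, the 5% tolerance, and the table entries are assumptions introduced for illustration; the text only states that the image size is compared with a preset size list.

```python
def match_entity_type(width, height, size_table, tolerance=0.05):
    """Return the entity type whose reference aspect ratio is closest to the
    image's aspect ratio, or None if no entry is within the tolerance."""
    ratio = width / height
    best_type, best_err = None, tolerance
    for entity_type, (ref_w, ref_h) in size_table.items():
        ref_ratio = ref_w / ref_h
        err = abs(ratio - ref_ratio) / ref_ratio  # relative deviation
        if err <= best_err:
            best_type, best_err = entity_type, err
    return best_type

# Hypothetical size list (millimetres); a real deployment would configure its own.
SIZE_TABLE = {
    "a4_form": (210.0, 297.0),
    "id_card": (85.6, 54.0),
}
```

When two file types share the same proportions, the title-keyword route mentioned in the text would be needed to tell them apart.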
In this embodiment, the terminal device has a document template configured in advance for each entity type. Because different entity types may contain different document items, and to improve the accuracy of automatic import, the terminal device packages the document items corresponding to each entity type in advance and generates a document template for that entity type, which improves the efficiency of the subsequent character import. A document item represents one type of information contained in the target entity, and each type of information may correspond to one document item. For example, if a target entity records a user name, the user name may correspond to one document item; if the target entity also records a user address, the user address may correspond to another document item. This makes it convenient to import the different types of information in an entity type by category and improves import accuracy.
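One possible in-memory representation of such templates is sketched below. The class names, field names, template key, and region coordinates are all hypothetical; storing each item's effective region as a `(left, top, right, bottom)` rectangle inside the template is an assumption made so that later steps have something to measure distances against.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class DocumentItem:
    """One type of information in a template, e.g. a user name."""
    name: str
    region: tuple  # assumed effective region: (left, top, right, bottom)

@dataclass
class DocumentTemplate:
    """The set of document items packaged for one entity type."""
    entity_type: str
    items: list = field(default_factory=list)

# Hypothetical template registry keyed by entity type.
TEMPLATES = {
    "application_form": DocumentTemplate("application_form", [
        DocumentItem("user_name", (100, 50, 400, 90)),
        DocumentItem("user_address", (100, 120, 700, 160)),
    ]),
}

def get_template(entity_type):
    """Look up the preconfigured template for an entity type."""
    return TEMPLATES[entity_type]
```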
In S102, a preset character recognition algorithm is adjusted based on the entity type, the entity image is processed with the adjusted character recognition algorithm, and character information about the entity image is output; the character information comprises recognized characters and the character region image of each recognized character.
In this embodiment, after determining the entity type of the target entity, the terminal device may obtain the recognition algorithm parameters corresponding to that entity type and use them to adjust the preset character recognition algorithm, so that the algorithm matches the entity type and its accuracy improves. If the character recognition algorithm is a pooling neural network that extracts the character information contained in the entity image through multiple pooling layers and a fully connected layer, the recognition algorithm parameter may be a pooling kernel, and the corresponding pooling kernel is obtained based on the entity type. In particular, if the entity type contains character information in several font sizes and font types, it may correspond to several pooling kernels, which improves the efficiency of character recognition. If the character recognition algorithm is a window-based algorithm that uses a sliding window to judge whether the area covered by the window contains a character, the corresponding sliding window may be obtained from the entity type.
Optionally, in this embodiment, the character recognition algorithm may be an OCR algorithm based on Tesseract. The terminal device can then obtain the font sample library associated with the entity type and use it to quickly match and recognize each character contained in the entity image. Compared with neural-network-based character recognition, the OCR recognition flow is more efficient and places lower hardware requirements on the terminal device. In addition, because the character sample library is built with Tesseract, characters in different font types can also be recognized, which further improves recognition efficiency.
In this embodiment, after recognizing a character, the terminal device may locate the character region in which the character lies, take the coordinate of the center point of that region as the character coordinate of the recognized character, and then establish the correspondence between the character coordinate and the recognized character.
In S103, the center coordinate of the recognized character is obtained from the character region image, and the document item to which the recognized character belongs is determined from the center coordinate and the effective region of each document item.
In this embodiment, after determining a character in the entity image, the terminal device may obtain the character region image in which the character lies. The character region image is defined by four corner coordinates, and the center coordinate of the recognized character is determined from those four corners. Because the character information contained in a given document item is confined to the effective region belonging to that document item, the center coordinate can serve as the characteristic coordinate of the recognized character: the terminal device calculates the distance value between the center coordinate and the effective region of a document item and uses that value to decide whether the recognized character belongs to the document item. Optionally, if the distance value is less than a preset association distance, the recognized character is determined to belong to the document item; conversely, if the distance value is greater than or equal to the preset association distance, the recognized character is determined not to belong to the document item, and the distance values between the recognized character and the other document items are calculated.
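The threshold test described above can be sketched as a loop over document items. The function names are hypothetical, and measuring the distance from the center coordinate to an axis-aligned rectangular effective region (zero when the point lies inside it) is an assumption; the text does not fix how the distance to the effective region is measured at this step.

```python
import math

def distance_to_region(point, region):
    """Distance from a point to an axis-aligned rectangle
    (left, top, right, bottom); 0.0 if the point lies inside it."""
    x, y = point
    left, top, right, bottom = region
    dx = max(left - x, 0.0, x - right)
    dy = max(top - y, 0.0, y - bottom)
    return math.hypot(dx, dy)

def find_document_item(center, regions, association_distance):
    """Check document items in turn and return the first one whose distance
    to the center coordinate is below the preset association distance."""
    for name, region in regions.items():
        if distance_to_region(center, region) < association_distance:
            return name
    return None  # the character matched no document item
```

For instance, with a region `(100, 50, 400, 90)` and an association distance of 10, a character centered at `(250, 95)` lies 5 units below the region and is assigned to it.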
In this embodiment, a physical document is produced by printing a document template and then having a business clerk or customer write in the corresponding information by hand. Every physical document therefore has a corresponding document template, and within that template the distance between each document item and its associated information is small. By calculating the distance value between each document item and each recognized character, the document item corresponding to each recognized character can be identified, so that every recognized character is imported into the document template automatically.
Optionally, in this embodiment, the terminal device may select a point on the character region image and the nearest point of the effective region associated with the document item, and import the coordinates of these two points into a preset Euclidean distance calculation model to determine the Euclidean distance between them. Preferably, the terminal device may use a variant of the Euclidean distance calculation model that raises the weight of the vertical coordinate and lowers the weight of the horizontal coordinate; the variant formula uses two preset coefficients, α and β. Characters that belong to the same document item should lie in the same horizontal band, so the vertical coordinate should carry a larger weight in the distance value. Conversely, if a document item contains a large amount of information, the horizontal offset between its first and last characters and the reference coordinate of the document item is large even though they all belong to the same document item, so the horizontal coordinate should carry a smaller weight. Weighting the coordinates in this way improves recognition accuracy.
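The exact variant formula is not available in this text, so the sketch below assumes the simplest form consistent with the description: d = sqrt(α·Δx² + β·Δy²), where α weights the horizontal offset and β the vertical offset, with β larger than α. The assignment of α to the horizontal term and the default values are assumptions.

```python
import math

def weighted_distance(p, q, alpha=0.5, beta=2.0):
    """Weighted Euclidean distance between points p and q.
    alpha scales the horizontal offset and beta the vertical offset;
    beta > alpha penalizes vertical displacement more heavily (assumed form)."""
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    return math.sqrt(alpha * dx * dx + beta * dy * dy)
```

With these defaults, a 10-unit horizontal offset yields a smaller distance than a 10-unit vertical offset, so a long line of handwriting stays associated with its own row; setting both coefficients to 1 recovers the ordinary Euclidean distance.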
In S104, the recognized character is imported into the document item to which it belongs in the document template, and an electronic document about the target entity is generated.
In this embodiment, after determining the document item corresponding to each recognized character, the terminal device may import each recognized character into its document item, thereby generating the electronic document about the target entity automatically.
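The import step can be sketched as grouping the recognized characters by document item and joining them into field values. The function name and input shape are hypothetical, and ordering the characters of one item by their horizontal center position assumes single-line fields written left to right, which the text does not state.

```python
from collections import defaultdict

def build_document(assignments):
    """Build {item_name: text} from (item_name, x_center, character) triples.
    Characters within one item are ordered by x_center (assumes single-line,
    left-to-right fields)."""
    grouped = defaultdict(list)
    for item_name, x_center, character in assignments:
        grouped[item_name].append((x_center, character))
    return {
        name: "".join(character for _, character in sorted(chars))
        for name, chars in grouped.items()
    }
```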
As can be seen from the above, the method for generating an electronic document provided by the embodiments of the present invention obtains an entity image of the target entity to be converted, determines the entity type of the target entity from the entity image, and obtains the document template that matches the entity type. The character recognition algorithm is adjusted according to the entity type and used to extract the character information contained in the entity image; the document item corresponding to each recognized character is determined from its center coordinate, and the characters are then imported in turn into the associated document items of the document template to generate an electronic document about the target entity, so that the electronic document is generated automatically. Compared with existing techniques for generating electronic documents, the embodiments of the present invention determine the corresponding document item from the position of each character and adjust the character recognition algorithm according to the entity type, which improves the accuracy of character recognition. No manual selection by the user is required, import errors are reduced, and because no semantic analysis is needed, generation efficiency is improved.
Fig. 2 shows a flowchart of a specific implementation of step S102 of the method for generating an electronic document provided by the second embodiment of the present invention. Referring to Fig. 2, relative to the embodiment described in Fig. 1, step S102 of the method provided by this embodiment comprises S1021 to S1025, detailed as follows:
Further, adjusting the preset character recognition algorithm based on the entity type and outputting the character information about the entity image through the adjusted character recognition algorithm comprises:
In S1021, the entity image is imported into a five-layer pooling network for pooling-based dimensionality reduction, and the pooling feature matrix of the entity image is obtained.
In this embodiment, to determine the character characteristics of the entity image, the terminal device may reduce its dimensionality with a preset five-layer pooling network. Dimensionality reduction makes the image features contained in the entity image more pronounced, so that information such as the character size, the font type and the positions of character regions can be determined. Moreover, after pooling, the amount of data the terminal device has to process is greatly reduced, which improves recognition efficiency.
In this embodiment, the terminal device adjusts the size of the entity image to a preset standard size, so that a reference pooling kernel can be used to reduce the dimensionality of the adjusted entity image; alternatively, the terminal device identifies the image size of the entity image and adjusts the pooling kernel of each level in the five-layer pooling network according to that size. Either adjustment ensures the consistency of the output pooling feature matrix.
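A five-layer pooling reduction can be sketched with NumPy as below. Max pooling, non-overlapping blocks, and a 2 x 2 kernel at every level are assumptions; the text only says that the pooling kernel of each level can be adjusted, so `kernels` is left as a parameter.

```python
import numpy as np

def max_pool(matrix, k=2):
    """k x k max pooling with stride k. Trailing rows and columns that do
    not fill a complete block are dropped."""
    rows, cols = matrix.shape
    out_rows, out_cols = rows // k, cols // k
    blocks = matrix[:out_rows * k, :out_cols * k].reshape(out_rows, k, out_cols, k)
    return blocks.max(axis=(1, 3))

def five_layer_pooling(gray, kernels=(2, 2, 2, 2, 2)):
    """Apply one pooling layer per kernel size and return the pooling
    feature matrix. `kernels` would be chosen per image size or entity type."""
    features = gray
    for k in kernels:
        features = max_pool(features, k)
    return features
```

With five 2 x 2 layers each side shrinks by a factor of 32, so a 64 x 96 grayscale image yields a 2 x 3 feature matrix; resizing inputs to a standard size first is what keeps this output shape consistent.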
Optionally, in this embodiment, the terminal device may first convert the entity image to grayscale, which reduces the number of layers the entity image contains and makes character outlines stand out. If the entity image is a color image, it contains image data in three layers, and pooling has to be applied to all three at once, so the amount of computation is large. Converting the entity image to grayscale therefore not only reduces the number of layers, and with it the computation required for pooling, but also increases the contrast between character edges and the background image, which improves the efficiency of character information extraction.
In S1022, a sliding window matching the entity type is obtained, and the sliding window is slid over the pooling feature matrix to select data, yielding a plurality of window feature sequences.
In this embodiment, the terminal device may determine the sliding window associated with the entity type of the target entity. Because the character sizes and font types contained in different entity types can differ, the terminal device may configure for each entity type a sliding window that matches its character information. The sliding window is then slid over the pooling feature matrix, and the data framed at each position is taken as one window feature sequence, so that a plurality of window feature sequences is generated during the sliding selection.
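The sliding selection can be sketched as follows. The function name, the stride parameter, and flattening each framed block into a one-dimensional sequence are assumptions; in practice the window size would be looked up from the entity type.

```python
import numpy as np

def window_feature_sequences(features, window, stride=1):
    """Slide a (height, width) window over the pooling feature matrix and
    return the data framed at each position as a flat feature sequence."""
    win_h, win_w = window
    rows, cols = features.shape
    sequences = []
    for top in range(0, rows - win_h + 1, stride):
        for left in range(0, cols - win_w + 1, stride):
            block = features[top:top + win_h, left:left + win_w]
            sequences.append(block.ravel())
    return sequences
```

A 3 x 4 feature matrix scanned with a 2 x 2 window and stride 1 produces 2 x 3 = 6 sequences of four values each.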
Optionally, for different entity types, the size of the matched sliding window and the parameters the window contains also differ.
In S1023, all window feature sequences are imported into a preset recurrent neural network to generate a character recognition window for the entity image.
In this embodiment, after traversing the pooling feature matrix and obtaining all of its window feature sequences, the terminal device may import each window feature sequence into the preset recurrent neural network to determine the character recognition window matched to the entity image, that is, the target window anchor. The recurrent neural network comprises a recurrent layer and a fully connected layer. The terminal device is configured with a number of iterations; the recurrent layer extracts features from all window feature sequences to form a recurrent feature sequence, which is finally fed into the fully connected layer to output the character recognition window corresponding to the entity image.
In S1024, the convolution value between the region image covered by the character recognition window in the entity image and the character recognition window is calculated, and whether the region image covered by the character recognition window is a character region image is identified based on the convolution value.
In this embodiment, the terminal device may slide the character recognition window over the entity image and, at each position, calculate the convolution value between the region image of the entity image that the window covers and the character recognition window itself, then judge from the convolution value whether the covered area is a character region image. Because the character recognition window is generated from the character feature sequences of the entity image, a region image that matches the character recognition window can be taken to contain character information. The convolution value between the character recognition window and the region image can therefore be used to judge whether the region image covered at a given position is a character region image.
In this embodiment, the terminal device is configured with a matching range. If the convolution value falls within the matching range, the region image covered at that position is identified as a character region image; conversely, if the convolution value falls outside the matching range, the region image covered at that position is identified as not being a character region image.
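The matching-range test can be sketched as below. Treating the "convolution value" as the sum of elementwise products between the covered region and the window (without flipping the window), the inclusive range bounds, and the function names are assumptions made for illustration.

```python
import numpy as np

def convolution_value(region, window):
    """Sum of elementwise products between a region and the recognition
    window (an assumed reading of the 'convolution value')."""
    return float(np.sum(region * window))

def find_character_regions(image, window, match_range, stride=1):
    """Slide the window over the image and return (left, top, right, bottom)
    boxes whose convolution value falls inside the matching range."""
    low, high = match_range
    win_h, win_w = window.shape
    hits = []
    for top in range(0, image.shape[0] - win_h + 1, stride):
        for left in range(0, image.shape[1] - win_w + 1, stride):
            region = image[top:top + win_h, left:left + win_w]
            if low <= convolution_value(region, window) <= high:
                hits.append((left, top, left + win_w, top + win_h))
    return hits
```

Using a two-sided range rather than a single threshold lets the test reject both near-empty regions and regions far denser than a character, such as solid borders.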
In S1025, the characters contained in the character region image are recognized, and the character information is generated.
In this embodiment, the characters contained in the character region image are determined by a character recognition algorithm such as OCR, and the character information is generated from the recognized characters and the position information of the character region image.
In the embodiments of the present invention, the entity image undergoes dimensionality reduction, and the character recognition window is generated from the window feature matrix obtained after the reduction, which improves the accuracy of character recognition.
Fig. 3 shows a flowchart of a specific implementation of step S103 of the method for generating an electronic document provided by the third embodiment of the present invention. Referring to Fig. 3, relative to the embodiment described in Fig. 1, step S103 of the method provided by this embodiment comprises S1031 to S1033, detailed as follows:
Further, obtaining the center coordinate of the recognized character from the character region image and determining the document item to which the recognized character belongs from the center coordinate and the effective region of each document item comprises:
In S1031, the corner coordinates of the character region image are obtained, and the center coordinate is calculated from the corner coordinates and the image size of the entity image.
In this embodiment, the character region image is bounded by several corner coordinates. The terminal device may therefore take any two diagonally opposite corner coordinates, or all four corner coordinates, of the character region image and use them to calculate the geometric center of the character region image. For example, if the two diagonal corner coordinates are $(x_1, y_1)$ and $(x_2, y_2)$, the geometric center of the character region image is $\left(\frac{x_1 + x_2}{2}, \frac{y_1 + y_2}{2}\right)$.
In the present embodiment, terminal device can also obtain the picture size of the solid images, thus according to the geometric center
And picture size calculates the centre coordinate of the character zone image.Assuming that the length and width of the solid images are divided into L and H, then should
The geometric center of character zone is are as follows:It the problem of due to shooting angle and resolution ratio, may be to word
The position of symbol area image has an impact, and determines the character zone by the picture size and geometric center of solid images
Centre coordinate, it is possible to reduce the influence of solid images size suffered by the centre coordinate improves the accuracy of document items identification.
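Step S1031 can be illustrated with the short sketch below. The geometric center is the midpoint of two diagonal corners. How the image size enters the calculation is not recoverable from this text, so dividing the geometric center by the image length and height (giving a size-independent coordinate in the range 0 to 1) is purely an assumption made for illustration; the function names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def geometric_center(corner_a: Point, corner_b: Point) -> Point:
    """Midpoint of two diagonally opposite corners of the character region."""
    (x1, y1), (x2, y2) = corner_a, corner_b
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def normalized_center(
    corner_a: Point, corner_b: Point, image_length: float, image_height: float
) -> Point:
    """Geometric center scaled by the entity image size.

    Assumption: the center coordinate is the geometric center divided by the
    image length L and height H, so that the same region yields the same
    coordinate regardless of the resolution of the captured image.
    """
    if image_length <= 0 or image_height <= 0:
        raise ValueError("image size must be positive")
    cx, cy = geometric_center(corner_a, corner_b)
    return (cx / image_length, cy / image_height)
```

For a region with corners (10, 20) and (30, 60) in a 200 by 100 image, the geometric center is (20, 40); under the normalization assumed here, a capture of the same entity at twice the resolution would map to the same center coordinate.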
In S1032, the distance between the center coordinate and each coordinate point on the contour line of the effective region is calculated, and the smallest distance is chosen as the characteristic distance between the character region image and the document item.
In this embodiment, after determining the center coordinate of the character region image, the terminal device calculates the distance between the center coordinate and each coordinate point on the contour line of the document item, and chooses the smallest distance value as the characteristic distance between the character region image and the document item. The closer the center coordinate is to the contour boundary of a document item, the greater its degree of correlation with that document item; conversely, the farther it is from the contour boundary of the document item, the smaller the degree of correlation. Therefore, to determine the degree of correlation between the two, the terminal device needs to determine the minimum distance between the center coordinate and the effective region of the document item, i.e. the characteristic distance described above.
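The characteristic distance of S1032 can be sketched as a minimum over sampled contour points. The text does not name the distance metric, so Euclidean distance is assumed here, and `characteristic_distance` is a hypothetical name.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def characteristic_distance(center: Point, contour_points: Iterable[Point]) -> float:
    """Smallest distance from the center coordinate to any contour point.

    Assumption: distances are Euclidean and the contour of the effective
    region is given as a finite collection of sampled coordinate points.
    """
    points = list(contour_points)
    if not points:
        raise ValueError("the contour must contain at least one point")
    cx, cy = center
    return min(math.hypot(px - cx, py - cy) for px, py in points)
```

The result depends on how densely the contour is sampled: a coarse sampling can overestimate the true distance to the boundary.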
In S1033, the document item with the smallest characteristic distance is chosen as the document item to which the character region image belongs.
In this embodiment, the terminal device calculates the characteristic distance between the character region and each document item, and chooses the document item with the smallest characteristic distance as the document item of the character region image. Preferably, the terminal device takes the document items whose effective regions are covered by the character region image as the associated document items of the recognized character, and calculates the characteristic distance only for those associated document items. The character region image of a recognized character may fall into the effective regions of several document items, in which case it is necessary to decide which document item it belongs to; a document item whose effective region the character region image does not fall into is necessarily unrelated to the recognized character, so its characteristic distance need not be calculated, which avoids a large amount of useless computation.
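The selection in S1033, including the preferred pre-filtering, might look like the sketch below. It assumes that effective regions and character regions are axis-aligned rectangles given as (x_min, y_min, x_max, y_max), that the distance to a rectangular contour can be computed in closed form instead of point by point, and that the method falls back to comparing all document items when the character region overlaps none of them. All names are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def boxes_overlap(a: Box, b: Box) -> bool:
    """True if two axis-aligned rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def distance_to_contour(point: Tuple[float, float], box: Box) -> float:
    """Shortest Euclidean distance from a point to the rectangle's outline."""
    px, py = point
    x0, y0, x1, y1 = box
    dx = max(x0 - px, 0.0, px - x1)
    dy = max(y0 - py, 0.0, py - y1)
    if dx > 0 or dy > 0:  # the point lies outside the rectangle
        return math.hypot(dx, dy)
    return min(px - x0, x1 - px, py - y0, y1 - py)  # inside: nearest edge


def assign_document_item(char_box: Box, item_regions: Dict[str, Box]) -> Optional[str]:
    """Pick the document item with the smallest characteristic distance."""
    if not item_regions:
        return None
    center = ((char_box[0] + char_box[2]) / 2, (char_box[1] + char_box[3]) / 2)
    # Preferred variant: only items whose effective region is touched.
    candidates = {
        name: region
        for name, region in item_regions.items()
        if boxes_overlap(char_box, region)
    }
    if not candidates:  # assumed fallback: compare against every item
        candidates = item_regions
    return min(candidates, key=lambda name: distance_to_contour(center, candidates[name]))
```

One property worth noting: when two effective regions share an edge, a center lying near that edge is equally distant from both contours, so a real implementation would need a tie-breaking rule that the text does not specify.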
In this embodiment of the present invention, the center coordinate of the character region image is weighted by the image size, which reduces the influence of the image size on the center coordinate calculation and improves the accuracy of identification.
Fig. 4 shows a detailed flowchart of the electronic document generation method provided by the fourth embodiment of the present invention. Referring to Fig. 4, relative to the embodiments described in Fig. 1 to Fig. 3, the electronic document generation method provided by this embodiment further includes S401 to S403 before the step of adjusting the preset character recognition algorithm based on the entity type, processing the entity image with the adjusted character recognition algorithm, and outputting the character information about the entity image, detailed as follows:
In S401, the average pixel value of the entity image is calculated according to the pixel value of each pixel in the entity image.
In this embodiment, to extract the character region image, the terminal device can collect the pixel value of each pixel in the entity image and determine the reference color of the entity image. Because the background region image occupies more area than the character region image, the reference color of the entity image should be close to the color of the background region image. On this basis, the terminal device calculates the average pixel value of the entity image, which makes it easier to identify background pixels.
In S402, if the difference between any pixel in the entity image and the average pixel value is less than a preset background threshold, the pixel is identified as a background pixel.
In this embodiment, the terminal device calculates the difference between the pixel value of each pixel and the average pixel value to determine whether the pixel is close to the reference color of the entity image; if it is close, the pixel can be determined to be a background pixel. All pixels can therefore be classified: pixels whose difference is less than the preset background threshold are identified as background pixels, and pixels whose difference is greater than or equal to the background threshold are identified as character pixels.
In S403, the region covered by the background pixels is identified as the background region image, and the background region image is removed from the entity image to obtain the character region image.
In this embodiment, the terminal device identifies each continuous region formed by background pixels as a background region image. After the terminal device deletes all background region images from the entity image, the character region image containing the character pixels is obtained.
In this embodiment of the present invention, the average pixel value of the entity image is determined and the character region image is identified according to the average pixel value, which improves the efficiency and accuracy of character region recognition.
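Steps S401 to S403 can be sketched as below. This is an illustration under stated assumptions: the entity image is a grayscale grid of numeric pixel values, the "difference" is taken as an absolute difference, and the character region is summarized as the bounding box of the remaining character pixels. The function names are hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[float]]


def split_background(pixels: Image, background_threshold: float) -> Tuple[float, List[List[bool]]]:
    """Return the average pixel value and a mask of character pixels.

    A pixel whose absolute difference from the average is below the
    threshold is treated as background (mask value False); every other
    pixel is treated as a character pixel (mask value True).
    """
    flat = [value for row in pixels for value in row]
    if not flat:
        raise ValueError("the image must contain at least one pixel")
    average = sum(flat) / len(flat)
    mask = [[abs(value - average) >= background_threshold for value in row] for row in pixels]
    return average, mask


def character_bounding_box(mask: List[List[bool]]) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (top, left, bottom, right) of the character pixels.

    Returns None when every pixel was classified as background.
    """
    coords = [(r, c) for r, row in enumerate(mask) for c, flag in enumerate(row) if flag]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return (min(rows), min(cols), max(rows), max(cols))
```

The approach relies on the background dominating the image, as the text notes; when characters cover a large share of the area, the average drifts away from the background color and the threshold has to be chosen accordingly.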
Fig. 5 shows a detailed flowchart of step S101 of the electronic document generation method provided by the fifth embodiment of the present invention. Referring to Fig. 5, relative to the embodiments described in Fig. 1 to Fig. 3, step S101 of the electronic document generation method provided by this embodiment includes S1011 to S1012, detailed as follows:
In S1011, the identifier in a preset region of the entity image is obtained.
In this embodiment, an identifier can be placed in at least one predetermined region of the target entity. The identifier can be a character string, a two-dimensional code, or the like. The terminal device extracts the identifier from the preset region of the entity image and performs symbol recognition on the identifier.
In S1012, the entity type of the target entity is determined based on the identifier.
In this embodiment, the terminal device compares the identifier against an entity type identification list to determine the entity type matched by the identifier, and thereby determines the entity type of the target entity.
In this embodiment of the present invention, the identifier at the preset position is recognized to determine the entity type, which improves the accuracy and efficiency of entity type identification.
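The lookup in S1012 amounts to matching a decoded identifier against a list of known identifiers. The sketch below assumes the identifier has already been decoded to a string (from text or a two-dimensional code); the table contents and the normalization by trimming and upper-casing are invented for illustration only.

```python
from typing import Dict, Optional

# Hypothetical entity type identification list; real identifiers and
# entity types are not given in the text.
ENTITY_TYPE_BY_IDENTIFIER: Dict[str, str] = {
    "INV": "invoice",
    "IDC": "id_card",
    "RCP": "receipt",
}


def determine_entity_type(
    identifier: str, type_list: Optional[Dict[str, str]] = None
) -> Optional[str]:
    """Return the entity type matched by the identifier, or None if unknown."""
    table = ENTITY_TYPE_BY_IDENTIFIER if type_list is None else type_list
    key = identifier.strip().upper()  # assumed normalization of OCR output
    return table.get(key)
```

Returning None for an unknown identifier leaves the caller free to fall back to another way of determining the entity type.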
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present invention.
Fig. 6 shows a structural block diagram of an electronic document generating device provided by an embodiment of the present invention. The units included in the generating device are used to execute the steps in the embodiment corresponding to Fig. 1; for details, refer to Fig. 1 and the related description of the embodiment corresponding to Fig. 1. For ease of description, only the parts related to this embodiment are shown.
Referring to Fig. 6, the electronic document generating device includes:
An entity image acquisition unit 61, configured to obtain the entity image of the target entity, determine the entity type of the target entity according to the entity image, and obtain the document template matching the entity type; the document template includes multiple document items;
A character information output unit 62, configured to adjust the preset character recognition algorithm based on the entity type, process the entity image with the adjusted character recognition algorithm, and output the character information about the entity image; the character information includes a recognized character and the character region image of the recognized character;
A document item determination unit 63, configured to obtain the center coordinate of the recognized character according to the character region image, and determine the document item to which the recognized character belongs from the center coordinate and the effective region of each document item;
A character information import unit 64, configured to import the recognized character into the document item to which it belongs in the document template, generating the electronic document about the target entity.
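The way the four units cooperate can be sketched as a small pipeline. This is only a structural illustration: the three callables stand in for units 61 to 63, the `generate` method plays the role of unit 64, and all class, field and parameter names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class CharacterInfo:
    text: str  # the recognized character
    box: Box   # its character region in the entity image


@dataclass
class ElectronicDocumentGenerator:
    # Unit 61: entity image -> (entity type, template mapping item -> effective region)
    acquire: Callable[[Any], Tuple[str, Dict[str, Box]]]
    # Unit 62: (entity image, entity type) -> recognized characters with regions
    recognize: Callable[[Any, str], List[CharacterInfo]]
    # Unit 63: (character info, template) -> name of the owning document item
    locate: Callable[[CharacterInfo, Dict[str, Box]], str]

    def generate(self, image: Any) -> Dict[str, str]:
        """Unit 64: import each recognized character into its document item."""
        entity_type, template = self.acquire(image)
        document = {item: "" for item in template}
        for info in self.recognize(image, entity_type):
            item = self.locate(info, template)
            if item in document:  # ignore characters mapped to unknown items
                document[item] += info.text
        return document
```

Passing the units in as callables keeps each stage replaceable, which mirrors the statement that the character recognition algorithm is adjusted per entity type.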
Optionally, the character information output unit 62 includes:
A pooling feature matrix generation unit, configured to import the entity image into a five-layer pooling network for pooling-based dimensionality reduction, obtaining the pooling feature matrix of the entity image;
A window feature sequence output unit, configured to obtain the sliding window matching the entity type and slide the sliding window over the pooling feature matrix to obtain multiple window feature sequences;
A character recognition window generation unit, configured to import all the window feature sequences into a preset recurrent neural network, generating the character recognition window for the entity image;
A character region recognition unit, configured to calculate the convolution value between the character recognition window and the area image that the window covers in the entity image, and to identify, based on the convolution value, whether the area image covered by the character recognition window is a character region image;
A character recognition unit, configured to recognize the characters contained in the character region image and generate the character information.
Optionally, the document item determination unit 63 includes:
A center coordinate calculation unit, configured to obtain the corner coordinates of the character region image and calculate the center coordinate according to the corner coordinates and the image size of the entity image;
A characteristic distance output unit, configured to calculate the distance between the center coordinate and each coordinate point on the contour line of the effective region, and choose the smallest distance as the characteristic distance between the character region image and the document item;
A characteristic distance comparison unit, configured to choose the document item with the smallest characteristic distance as the document item to which the character region image belongs.
Optionally, the electronic document generating device further includes:
An average pixel value calculation unit, configured to calculate the average pixel value of the entity image according to the pixel value of each pixel in the entity image;
A background pixel recognition unit, configured to identify a pixel as a background pixel if the difference between that pixel in the entity image and the average pixel value is less than a preset background threshold;
A character region image extraction unit, configured to identify the region covered by the background pixels as the background region image, and remove the background region image from the entity image to obtain the character region image.
Optionally, the entity image acquisition unit 61 includes:
An identifier acquisition unit, configured to obtain the identifier in a preset region of the entity image;
An entity type determination unit, configured to determine the entity type of the target entity based on the identifier.
Therefore, the electronic document generating device provided by the embodiment of the present invention can likewise determine the corresponding document item according to the position of a character, and adjust the character recognition algorithm according to the entity type, which improves the accuracy of the character recognition algorithm. No manual selection by the user is required, import errors are reduced, and no semantic analysis is needed, which improves generation efficiency.
Fig. 7 is a schematic diagram of a terminal device provided by another embodiment of the present invention. As shown in Fig. 7, the terminal device 7 of this embodiment includes a processor 70, a memory 71, and a computer program 72 stored in the memory 71 and runnable on the processor 70, such as an electronic document generation program. When executing the computer program 72, the processor 70 implements the steps in each of the above embodiments of the electronic document generation method, for example S101 to S104 shown in Fig. 1. Alternatively, when executing the computer program 72, the processor 70 implements the functions of the units in each of the above device embodiments, for example the functions of units 61 to 64 shown in Fig. 6.
Illustratively, the computer program 72 can be divided into one or more units, which are stored in the memory 71 and executed by the processor 70 to carry out the present invention. The one or more units can be a series of computer program instruction segments capable of completing specific functions, and the instruction segments describe the execution process of the computer program 72 in the terminal device 7. For example, the computer program 72 can be divided into an entity image acquisition unit, a character information output unit, a document item determination unit, and a character information import unit, whose specific functions are as described above.
The terminal device 7 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will understand that Fig. 7 is only an example of the terminal device 7 and does not constitute a limitation on the terminal device 7, which may include more or fewer components than illustrated, combine certain components, or use different components; for example, the terminal device may also include input/output devices, network access devices, buses, and the like.
The processor 70 may be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor, or any conventional processor.
The memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or internal memory of the terminal device 7. The memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) fitted to the terminal device 7. Further, the memory 71 may include both an internal storage unit of the terminal device 7 and an external storage device. The memory 71 is used to store the computer program and other programs and data required by the terminal device. The memory 71 may also be used to temporarily store data that has been output or is about to be output.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The embodiments described above are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.
Claims (10)
1. A method for generating an electronic document, characterized by comprising:
Obtaining an entity image of a target entity, determining the entity type of the target entity according to the entity image, and obtaining a document template matching the entity type; the document template includes multiple document items;
Adjusting a preset character recognition algorithm based on the entity type, processing the entity image with the adjusted character recognition algorithm, and outputting character information about the entity image; the character information includes a recognized character and the character region image of the recognized character;
Obtaining the center coordinate of the recognized character according to the character region image, and determining the document item to which the recognized character belongs from the center coordinate and the effective region of each document item;
Importing the recognized character into the document item to which it belongs in the document template, generating the electronic document about the target entity.
2. The generation method according to claim 1, characterized in that adjusting the preset character recognition algorithm based on the entity type and outputting the character information about the entity image with the adjusted character recognition algorithm comprises:
Importing the entity image into a five-layer pooling network for pooling-based dimensionality reduction, obtaining the pooling feature matrix of the entity image;
Obtaining the sliding window matching the entity type, and sliding the sliding window over the pooling feature matrix to obtain multiple window feature sequences;
Importing all the window feature sequences into a preset recurrent neural network, generating the character recognition window for the entity image;
Calculating the convolution value between the character recognition window and the area image that the window covers in the entity image, and identifying, based on the convolution value, whether the area image covered by the character recognition window is a character region image;
Recognizing the characters contained in the character region image, generating the character information.
3. The generation method according to claim 1, characterized in that obtaining the center coordinate of the recognized character according to the character region image, and determining the document item to which the recognized character belongs from the center coordinate and the effective region of each document item, comprises:
Obtaining the corner coordinates of the character region image, and calculating the center coordinate according to the corner coordinates and the image size of the entity image;
Calculating the distance between the center coordinate and each coordinate point on the contour line of the effective region, and choosing the smallest distance as the characteristic distance between the character region image and the document item;
Choosing the document item with the smallest characteristic distance as the document item to which the character region image belongs.
4. The generation method according to any one of claims 1 to 3, characterized in that, before adjusting the preset character recognition algorithm based on the entity type, processing the entity image with the adjusted character recognition algorithm, and outputting the character information about the entity image, the method further comprises:
Calculating the average pixel value of the entity image according to the pixel value of each pixel in the entity image;
If the difference between any pixel in the entity image and the average pixel value is less than a preset background threshold, identifying the pixel as a background pixel;
Identifying the region covered by the background pixels as the background region image, and removing the background region image from the entity image to obtain the character region image.
5. The generation method according to any one of claims 1 to 3, characterized in that obtaining the entity image of the target entity and determining the entity type of the target entity according to the entity image comprises:
Obtaining the identifier in a preset region of the entity image;
Determining the entity type of the target entity based on the identifier.
6. A device for generating an electronic document, characterized by comprising:
An entity image acquisition unit, configured to obtain an entity image of a target entity, determine the entity type of the target entity according to the entity image, and obtain a document template matching the entity type; the document template includes multiple document items;
A character information output unit, configured to adjust a preset character recognition algorithm based on the entity type, process the entity image with the adjusted character recognition algorithm, and output character information about the entity image; the character information includes a recognized character and the character region image of the recognized character;
A document item determination unit, configured to obtain the center coordinate of the recognized character according to the character region image, and determine the document item to which the recognized character belongs from the center coordinate and the effective region of each document item;
A character information import unit, configured to import the recognized character into the document item to which it belongs in the document template, generating the electronic document about the target entity.
7. The generating device according to claim 6, characterized in that the character information output unit comprises:
A pooling feature matrix generation unit, configured to import the entity image into a five-layer pooling network for pooling-based dimensionality reduction, obtaining the pooling feature matrix of the entity image;
A window feature sequence output unit, configured to obtain the sliding window matching the entity type and slide the sliding window over the pooling feature matrix to obtain multiple window feature sequences;
A character recognition window generation unit, configured to import all the window feature sequences into a preset recurrent neural network, generating the character recognition window for the entity image;
A character region recognition unit, configured to calculate the convolution value between the character recognition window and the area image that the window covers in the entity image, and to identify, based on the convolution value, whether the area image covered by the character recognition window is a character region image;
A character recognition unit, configured to recognize the characters contained in the character region image and generate the character information.
8. The generating device according to claim 6, characterized in that the document item determination unit comprises:
A center coordinate calculation unit, configured to obtain the corner coordinates of the character region image and calculate the center coordinate according to the corner coordinates and the image size of the entity image;
A characteristic distance output unit, configured to calculate the distance between the center coordinate and each coordinate point on the contour line of the effective region, and choose the smallest distance as the characteristic distance between the character region image and the document item;
A characteristic distance comparison unit, configured to choose the document item with the smallest characteristic distance as the document item to which the character region image belongs.
9. A terminal device, characterized in that the terminal device comprises a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910017061.1A CN109871521A (en) | 2019-01-08 | 2019-01-08 | A kind of generation method and equipment of electronic document |
PCT/CN2019/118554 WO2020143325A1 (en) | 2019-01-08 | 2019-11-14 | Electronic document generation method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910017061.1A CN109871521A (en) | 2019-01-08 | 2019-01-08 | A kind of generation method and equipment of electronic document |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109871521A true CN109871521A (en) | 2019-06-11 |
Family
ID=66917551
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910017061.1A Withdrawn CN109871521A (en) | 2019-01-08 | 2019-01-08 | A kind of generation method and equipment of electronic document |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109871521A (en) |
WO (1) | WO2020143325A1 (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110764721A (en) * | 2019-09-19 | 2020-02-07 | 北京三快在线科技有限公司 | Template generation method and device, electronic equipment and computer readable medium |
CN111144210A (en) * | 2019-11-26 | 2020-05-12 | 泰康保险集团股份有限公司 | Image structuring processing method and device, storage medium and electronic equipment |
WO2020143325A1 (en) * | 2019-01-08 | 2020-07-16 | 平安科技(深圳)有限公司 | Electronic document generation method and device |
CN111444907A (en) * | 2020-03-24 | 2020-07-24 | 上海东普信息科技有限公司 | Character recognition method, device, equipment and storage medium |
CN112926590A (en) * | 2021-03-18 | 2021-06-08 | 上海晨兴希姆通电子科技有限公司 | Method and system for segmenting and identifying characters on cable |
CN115761781A (en) * | 2023-01-06 | 2023-03-07 | 江苏狄诺尼信息技术有限责任公司 | Note image data identification system for engineering electronic archives |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113130023B (en) * | 2021-04-22 | 2023-04-07 | 嘉兴易迪希计算机技术有限公司 | Image-text recognition and entry method and system in EDC system |
CN113435331B (en) * | 2021-06-28 | 2023-06-09 | 平安科技(深圳)有限公司 | Image character recognition method, system, electronic equipment and storage medium |
CN115052132A (en) * | 2022-07-05 | 2022-09-13 | 国网江苏省电力有限公司南通市通州区供电分公司 | Fishing electric shock prevention early warning method and system based on artificial intelligence |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5144683A (en) * | 1989-04-28 | 1992-09-01 | Hitachi, Ltd. | Character recognition equipment |
CN101523413A (en) * | 2006-11-16 | 2009-09-02 | 国际商业机器公司 | Automated generation of form definitions from hard-copy forms |
CN102831416A (en) * | 2012-08-15 | 2012-12-19 | 广州广电运通金融电子股份有限公司 | Character identification method and relevant device |
CN108121984A (en) * | 2016-11-30 | 2018-06-05 | 杭州海康威视数字技术股份有限公司 | A kind of character identifying method and device |
CN108765118A (en) * | 2018-05-18 | 2018-11-06 | 北京大账房网络科技股份有限公司 | Bill is mixed to sweep the method and system for generating voucher |
CN108984578A (en) * | 2017-05-31 | 2018-12-11 | 株式会社日立制作所 | Computer, document recognition methods and system |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105260733A (en) * | 2015-09-11 | 2016-01-20 | 北京百度网讯科技有限公司 | Method and device for processing image information |
CN106407976B (en) * | 2016-08-30 | 2019-11-05 | 百度在线网络技术(北京)有限公司 | The generation of image character identification model and perpendicular column character picture recognition methods and device |
CN108121966A (en) * | 2017-12-21 | 2018-06-05 | 欧浦智网股份有限公司 | A kind of list method for automatically inputting, electronic equipment and storage medium based on OCR technique |
CN109710907A (en) * | 2018-12-20 | 2019-05-03 | 平安科技(深圳)有限公司 | A kind of generation method and equipment of electronic document |
CN109871521A (en) * | 2019-01-08 | 2019-06-11 | 平安科技(深圳)有限公司 | A kind of generation method and equipment of electronic document |
-
2019
- 2019-01-08 CN CN201910017061.1A patent/CN109871521A/en not_active Withdrawn
- 2019-11-14 WO PCT/CN2019/118554 patent/WO2020143325A1/en active Application Filing
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020143325A1 (en) * | 2019-01-08 | 2020-07-16 | 平安科技(深圳)有限公司 | Electronic document generation method and device |
CN110764721A (en) * | 2019-09-19 | 2020-02-07 | 北京三快在线科技有限公司 | Template generation method and device, electronic equipment and computer readable medium |
CN111144210A (en) * | 2019-11-26 | 2020-05-12 | 泰康保险集团股份有限公司 | Image structuring processing method and device, storage medium and electronic equipment |
CN111144210B (en) * | 2019-11-26 | 2023-07-18 | 泰康保险集团股份有限公司 | Image structuring processing method and device, storage medium and electronic equipment |
CN111444907A (en) * | 2020-03-24 | 2020-07-24 | 上海东普信息科技有限公司 | Character recognition method, device, equipment and storage medium |
CN111444907B (en) * | 2020-03-24 | 2023-05-16 | 上海东普信息科技有限公司 | Method, device, equipment and storage medium for character recognition |
CN112926590A (en) * | 2021-03-18 | 2021-06-08 | 上海晨兴希姆通电子科技有限公司 | Method and system for segmenting and identifying characters on cable |
CN112926590B (en) * | 2021-03-18 | 2023-12-01 | 上海晨兴希姆通电子科技有限公司 | Segmentation recognition method and system for characters on cable |
CN115761781A (en) * | 2023-01-06 | 2023-03-07 | 江苏狄诺尼信息技术有限责任公司 | Note image data identification system for engineering electronic archives |
Also Published As
Publication number | Publication date |
---|---|
WO2020143325A1 (en) | 2020-07-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109871521A (en) | | A kind of generation method and equipment of electronic document |
US11238310B2 (en) | | Training data acquisition method and device, server and storage medium |
AU2017421316B2 (en) | | Systems and methods for verifying authenticity of ID photo |
CN104834693B (en) | | Visual pattern search method and system based on deep search |
EP3798917A1 (en) | | Generative adversarial network (GAN) for generating images |
CN108229419A (en) | | Method and apparatus for clustering images |
US20120027252A1 (en) | | Hand gesture detection |
CN104915673B (en) | | Object classification method and system based on a bag-of-visual-words model |
CN110378235A (en) | | Blurred face image recognition method, device and terminal device |
CN110175249A (en) | | Similar picture retrieval method and system |
CN106599925A (en) | | Plant leaf identification system and method based on deep learning |
EP4109332A1 (en) | | Certificate authenticity identification method and apparatus, computer-readable medium, and electronic device |
CN108875540A (en) | | Image processing method, device and system and storage medium |
CN110490238A (en) | | Image processing method, device and storage medium |
CN109829448A (en) | | Face identification method, device and storage medium |
CN110489659A (en) | | Data matching method and device |
CN109522970A (en) | | Image classification method, apparatus and system |
CN109271930A (en) | | Micro-expression recognition method, device and storage medium |
CN108492301A (en) | | Scene segmentation method, terminal and storage medium |
JP2004062605A (en) | | Scene identification method and device, and program |
CN108228684A (en) | | Training method, device, electronic equipment and computer storage medium for a clustering model |
CN109871845A (en) | | Certificate image extraction method and terminal device |
CN109740417A (en) | | Invoice type recognition method, device, storage medium and computer equipment |
CN112115979A (en) | | Fusion method and device of infrared image and visible image |
CN109948401A (en) | | Data processing method and its system for text |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20190611 |