CN109871521A - Method and device for generating an electronic document - Google Patents

Publication number
CN109871521A
Authority
CN
China
Prior art keywords
character
solid images
document
entity
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910017061.1A
Other languages
Chinese (zh)
Inventor
黄泽浩
宋欢儿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910017061.1A
Publication of CN109871521A
Priority to PCT/CN2019/118554 (WO2020143325A1)
Current legal status: Withdrawn

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion

Abstract

The present invention belongs to the technical field of image processing and provides a method and device for generating an electronic document. The method comprises: obtaining an entity image of a target entity (an image of the physical document), determining the entity type of the target entity from the entity image, and obtaining a document template matching the entity type; adjusting a preset character recognition algorithm based on the entity type and outputting character information through the adjusted algorithm; obtaining the center coordinate of each recognized character from its character region image, and determining the document item to which the recognized character belongs from the center coordinate and the effective region of each document item; and importing the recognized character into the document item to which it belongs in the document template, generating an electronic document for the target entity. The invention requires no manual selection by the user: the item to import into is determined from the position of the character, which reduces import errors, and no semantic analysis is needed, which improves generation efficiency.

Description

Method and device for generating an electronic document
Technical field
The invention belongs to the technical field of image processing, and in particular relates to a method and device for generating an electronic document.
Background
As digitization continues to advance, electronic documents are widely used in many fields because they are easy to store and quick to transmit. How effectively physical files are converted into electronic documents directly affects the efficiency of document management. In existing techniques for generating electronic documents, an administrator generally identifies by hand the electronic template corresponding to a physical file and manually fills the content of the physical file into each item of the template. When there are many physical files and a large amount of text, the conversion takes considerable time, which reduces the efficiency of electronic document generation.
Summary of the invention
In view of this, embodiments of the present invention provide a method and device for generating an electronic document, to solve the problem that existing generation techniques are inefficient because an administrator must manually identify the electronic template corresponding to a physical file and manually fill the content of the file into each item of the template.
A first aspect of the embodiments of the present invention provides a method for generating an electronic document, comprising:
obtaining an entity image of a target entity, determining the entity type of the target entity according to the entity image, and obtaining a document template matching the entity type, the document template comprising a plurality of document items;
adjusting a preset character recognition algorithm based on the entity type, processing the entity image with the adjusted character recognition algorithm, and outputting character information about the entity image, the character information comprising recognized characters and the character region image of each recognized character;
obtaining the center coordinate of the recognized character according to the character region image, and determining the document item to which the recognized character belongs from the center coordinate and the effective region of each document item;
importing the recognized character into the document item to which it belongs in the document template, and generating an electronic document about the target entity.
A second aspect of the embodiments of the present invention provides a device for generating an electronic document, comprising:
an entity image acquiring unit, configured to obtain an entity image of a target entity, determine the entity type of the target entity according to the entity image, and obtain a document template matching the entity type, the document template comprising a plurality of document items;
a character information output unit, configured to adjust a preset character recognition algorithm based on the entity type, process the entity image with the adjusted character recognition algorithm, and output character information about the entity image, the character information comprising recognized characters and the character region image of each recognized character;
a document item determination unit, configured to obtain the center coordinate of the recognized character according to the character region image, and determine the document item to which the recognized character belongs from the center coordinate and the effective region of each document item;
a character information import unit, configured to import the recognized character into the document item to which it belongs in the document template, and generate an electronic document about the target entity.
A third aspect of the embodiments of the present invention provides a terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the first aspect when executing the computer program.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program, wherein the steps of the first aspect are implemented when the computer program is executed by a processor.
Implementing the method and device for generating an electronic document provided by the embodiments of the present invention has the following beneficial effects:
The embodiments of the present invention obtain an entity image of the target entity to be converted, determine the entity type of the target entity from the entity image, and obtain the document template matching the entity type. The character recognition algorithm is adjusted according to the entity type and used to extract the character information contained in the entity image; the document item corresponding to each recognized character is determined from its center coordinate; and the recognized characters are then imported in turn into the associated document items of the document template, generating an electronic document about the target entity automatically. Compared with existing techniques for generating electronic documents, the embodiments determine the corresponding document item from the position of each character and adjust the character recognition algorithm according to the entity type, which improves recognition accuracy. No manual selection by the user is required, import errors are reduced, and no semantic analysis is needed, which improves generation efficiency.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is an implementation flowchart of a method for generating an electronic document provided by the first embodiment of the present invention;
Fig. 2 is a detailed implementation flowchart of step S102 of a method for generating an electronic document provided by the second embodiment of the present invention;
Fig. 3 is a detailed implementation flowchart of step S103 of a method for generating an electronic document provided by the third embodiment of the present invention;
Fig. 4 is a detailed implementation flowchart of a method for generating an electronic document provided by the fourth embodiment of the present invention;
Fig. 5 is a detailed implementation flowchart of step S101 of a method for generating an electronic document provided by the fifth embodiment of the present invention;
Fig. 6 is a structural block diagram of a device for generating an electronic document provided by an embodiment of the present invention;
Fig. 7 is a schematic diagram of a terminal device provided by another embodiment of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the present invention and do not limit it.
The embodiments of the present invention obtain an entity image of the target entity to be converted, determine the entity type of the target entity from the entity image, and obtain the document template matching the entity type. The character recognition algorithm is adjusted according to the entity type and used to extract the character information contained in the entity image; the document item corresponding to each recognized character is determined from its center coordinate; and the recognized characters are then imported in turn into the associated document items of the document template, generating an electronic document about the target entity automatically. This solves the problem that existing generation techniques are inefficient because an administrator must manually identify the electronic template corresponding to a physical file and manually fill the content of the file into each item of the template.
In the embodiments of the present invention, the process is executed by a terminal device. The terminal device includes, but is not limited to, a server, a computer, a smartphone, a tablet computer, or any other device capable of generating an electronic document. Fig. 1 shows the implementation flowchart of the method for generating an electronic document provided by the first embodiment of the present invention, detailed as follows:
In S101, an entity image of a target entity is obtained, the entity type of the target entity is determined according to the entity image, and a document template matching the entity type is obtained; the document template comprises a plurality of document items.
In this embodiment, the terminal device may receive an entity image of the target entity sent by a user terminal. In this case, the user captures the entity image with the camera built into the user terminal and uploads it to the terminal device through a client installed on the user terminal; on receiving the client's image upload instruction, the terminal device performs the operations of S101. Optionally, to ensure that uploaded images are legitimate, the terminal device may obtain the program number of the client and use it to judge whether the client is a program file downloaded through a legitimate distribution channel. If the program number is identified as illegitimate, the terminal device rejects the entity image and returns image exception information. This ensures the legitimacy of the operation and prevents unauthorized users from performing image recognition, which would otherwise overload the device and reduce recognition efficiency and accuracy. If the program number is identified as legitimate, the operations of S101 are performed. Besides receiving entity images sent by other devices, the terminal device may also obtain the entity image through a built-in image acquisition unit such as a camera module or scanning module. In this case, the terminal device starts the image acquisition module when it receives a user-initiated image capture instruction or detects that a target entity to be recognized has been placed in the acquisition region, obtains the image information of the acquisition region at the current moment, and performs the operations of S101.
In this embodiment, the terminal device may preprocess the received entity image to improve the accuracy of entity type recognition. The preprocessing may be performed as follows: the terminal device obtains the ambient light intensity at the time the image was captured, determines a highlight adjustment coefficient and a shadow adjustment coefficient from that intensity, and uses the two coefficients to adjust the highlight regions and shadow regions of the entity image; it identifies the boundary contour of the target entity in the entity image and crops the entity image along that contour to remove the invalid background region; and it converts the cropped and adjusted entity image to grayscale, which improves recognition accuracy.
In this embodiment, the terminal device may determine the entity type of the target entity from the entity image. The entity type may be identified as follows: the terminal device identifies the image size of the entity image and compares it with a preset list of file type sizes to determine the file type corresponding to that image size, and thereby obtains the entity type of the target entity. Besides determining the entity type from the image size, the terminal device may also recognize the file title of the target entity and extract the file keywords contained in the title to determine the entity type.
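The size-based lookup described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the table `FILE_TYPE_SIZES`, its entries, the function name `match_entity_type`, and the choice to compare aspect ratios (so the result does not depend on scan resolution) are all assumptions made for the example.

```python
from typing import Optional

# Hypothetical size list: entity type -> (width_mm, height_mm).
# The entries are illustrative and do not come from the source.
FILE_TYPE_SIZES = {
    "a4_form": (210.0, 297.0),
    "id_card": (85.6, 54.0),
    "receipt": (80.0, 200.0),
}


def match_entity_type(width: float, height: float,
                      tolerance: float = 0.05) -> Optional[str]:
    """Return the entity type whose aspect ratio best matches the image.

    Comparing aspect ratios instead of absolute sizes is an assumption of
    this sketch. Returns None when no type is within the relative tolerance.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    ratio = width / height
    best_type, best_error = None, tolerance
    for entity_type, (ref_w, ref_h) in FILE_TYPE_SIZES.items():
        ref_ratio = ref_w / ref_h
        error = abs(ratio - ref_ratio) / ref_ratio
        if error <= best_error:
            best_type, best_error = entity_type, error
    return best_type
```

When the function returns `None`, a caller could fall back to the title-keyword route mentioned in the same paragraph.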
In this embodiment, the terminal device has a document template pre-configured for each entity type. Because different entity types may contain different document items, the terminal device packages the document items of each entity type in advance into a document template for that type, which improves the accuracy of automatic import and the efficiency of the subsequent character import. A document item represents one type of information contained in the target entity, and each type of information corresponds to one document item. For example, if a target entity records a user name, the user name corresponds to one document item; if it also records a user address, the user address corresponds to another document item. This makes it convenient to import the different types of information in the entity by category and improves import accuracy.
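One way to picture a document template is as a named list of items, each with an effective region. The sketch below is only an assumed data layout: the class names, the `region` tuple convention, the registry `TEMPLATES`, and the sample template are hypothetical and are not specified by the source.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DocumentItem:
    """One type of information in the template, e.g. a user name."""
    name: str
    # Effective region as (left, top, right, bottom); the convention is assumed.
    region: Tuple[float, float, float, float]
    text: str = ""


@dataclass
class DocumentTemplate:
    entity_type: str
    items: List[DocumentItem] = field(default_factory=list)


# Hypothetical registry of pre-configured templates, keyed by entity type.
TEMPLATES: Dict[str, DocumentTemplate] = {
    "account_application": DocumentTemplate(
        entity_type="account_application",
        items=[
            DocumentItem("user_name", (100.0, 50.0, 400.0, 80.0)),
            DocumentItem("user_address", (100.0, 100.0, 600.0, 130.0)),
        ],
    ),
}


def get_template(entity_type: str) -> DocumentTemplate:
    """Return a fresh copy so that filling it never alters the registry."""
    return copy.deepcopy(TEMPLATES[entity_type])
```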
In S102, a preset character recognition algorithm is adjusted based on the entity type, the entity image is processed with the adjusted character recognition algorithm, and character information about the entity image is output; the character information comprises recognized characters and the character region image of each recognized character.
In this embodiment, after determining the entity type of the target entity, the terminal device may obtain the recognition algorithm parameters corresponding to that entity type and use them to adjust the preset character recognition algorithm so that it matches the entity type, which improves recognition accuracy. If the character recognition algorithm is a pooling neural network that extracts the character information in the entity image through several pooling layers and a fully connected layer, the recognition algorithm parameters may be pooling kernels, and the corresponding pooling kernels are obtained based on the entity type. In particular, if the entity type contains character information in several font sizes and font types, it may correspond to several pooling kernels, which improves the efficiency of character recognition. If the character recognition algorithm is a window-based algorithm that uses a sliding window to judge whether the region covered by the window contains a character, the corresponding sliding window may be obtained from the entity type.
Optionally, in this embodiment, the character recognition algorithm may be an OCR algorithm based on Tesseract. The terminal device may then obtain the font sample library associated with the entity type and use it to quickly match and recognize each character contained in the entity image. Compared with character recognition based on a neural network, this OCR recognition flow is more efficient and places lower hardware requirements on the terminal device. In addition, a character sample library built with Tesseract can recognize characters in different font types, which further improves recognition efficiency.
In this embodiment, after recognizing a character, the terminal device may locate the character region containing the character, take the coordinate of the center point of the character region as the character coordinate of the recognized character, and then record the correspondence between the character coordinate and the recognized character.
In S103, the center coordinate of the recognized character is obtained according to the character region image, and the document item to which the recognized character belongs is determined from the center coordinate and the effective region of each document item.
In this embodiment, after determining a character in the entity image, the terminal device may obtain the character region image containing that character. The character region image is defined by four corner coordinates, and the center coordinate of the recognized character is determined from them. Because the character information of each document item is confined to the effective region of that item, the center coordinate can serve as the feature coordinate of the recognized character: the terminal device calculates the distance between the center coordinate and the effective region of a document item and uses that distance to judge whether the recognized character belongs to the item. Optionally, if the distance is less than a preset association distance, the recognized character is judged to belong to the document item; if the distance is greater than or equal to the preset association distance, the recognized character is judged not to belong to the document item, and the distance between the recognized character and the other document items is calculated.
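The threshold test above can be sketched as a point-to-rectangle distance followed by a first-match search over the document items. The rectangle convention `(left, top, right, bottom)`, the function names, and the use of the shortest distance to the rectangle as "the distance to the effective region" are assumptions of this illustration.

```python
import math
from typing import Dict, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (left, top, right, bottom), assumed


def distance_to_region(point: Point, region: Rect) -> float:
    """Shortest distance from a point to an axis-aligned rectangle (0 if inside)."""
    x, y = point
    left, top, right, bottom = region
    dx = max(left - x, 0.0, x - right)
    dy = max(top - y, 0.0, y - bottom)
    return math.hypot(dx, dy)


def assign_item(center: Point, regions: Dict[str, Rect],
                max_distance: float) -> Optional[str]:
    """Return the first item whose region lies closer than max_distance.

    Items are tried in turn, mirroring the text: when one item fails the
    test, the distance to the next item is computed. None means no match.
    """
    for name, region in regions.items():
        if distance_to_region(center, region) < max_distance:
            return name
    return None
```

A variant that picks the nearest item instead of the first match would behave differently when regions are close together; the source only describes the threshold test.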
In this embodiment, a physical document is produced by printing a document template and then having a business clerk or customer write in the corresponding information by hand. Each physical document therefore has a corresponding document template, and in that template the distance between each document item and its associated information is small. By calculating the distance between each document item and each recognized character, the terminal device can identify the document item corresponding to every recognized character and thus import the recognized characters into the document template automatically.
Optionally, in this embodiment, the terminal device may select the point on the character region image that is nearest to the effective region associated with a document item, import the coordinates of the two points concerned into a preset Euclidean distance model, and determine the Euclidean distance between the two coordinate points. Preferably, the terminal device may use a variant of the Euclidean distance model that raises the weight of the vertical coordinate and lowers the weight of the horizontal coordinate.
In the variant formula, α and β are preset coefficients. Characters belonging to the same document item should lie in the same horizontal band, so the vertical coordinate should carry a larger weight in the distance value. Conversely, when a document item holds a lot of content, its first and last characters may have a large horizontal offset from the reference coordinate of the document item while still belonging to the same item, so the horizontal coordinate should carry a smaller weight. This improves recognition accuracy.
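The exact variant formula is not reproduced in this text, so the sketch below assumes one plausible form, d = sqrt(α·(x1 − x2)² + β·(y1 − y2)²), with α weighting the horizontal difference and β the vertical one, and α < β. Both the form and the default coefficient values are assumptions for illustration only.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def weighted_distance(p: Point, q: Point,
                      alpha: float = 0.5, beta: float = 2.0) -> float:
    """Weighted Euclidean distance (assumed form, not the source's formula).

    alpha scales the horizontal difference and beta the vertical one, so
    beta > alpha penalizes vertical offsets more than horizontal ones.
    """
    dx = p[0] - q[0]
    dy = p[1] - q[1]
    return math.sqrt(alpha * dx * dx + beta * dy * dy)
```

With α = 0.25 and β = 4, a 4-pixel horizontal offset gives a distance of 2, while the same offset vertically gives 8, which matches the stated intent that characters on the same line stay associated with their item.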
In S104, the recognized character is imported into the document item to which it belongs in the document template, and an electronic document about the target entity is generated.
In this embodiment, after determining the document item corresponding to each recognized character, the terminal device may import each recognized character into its document item, generating the electronic document about the target entity automatically.
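The import step can be sketched as grouping recognized characters by their assigned item and joining them in reading order. The tuple layout `(character, center, item_name)`, the function name `fill_document`, and the assumption that each item holds a single line of text (so sorting by the horizontal coordinate is enough) are choices made for this example.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (character, (center_x, center_y), assigned item name or None); assumed layout.
RecognizedChar = Tuple[str, Tuple[float, float], Optional[str]]


def fill_document(chars: List[RecognizedChar],
                  item_names: List[str]) -> Dict[str, str]:
    """Group characters by document item and join them left to right.

    Characters without an assigned item are skipped. Items that received
    no characters map to an empty string.
    """
    grouped = defaultdict(list)
    for char, center, item in chars:
        if item is not None:
            grouped[item].append((center[0], char))
    document = {}
    for name in item_names:
        ordered = sorted(grouped[name], key=lambda entry: entry[0])
        document[name] = "".join(char for _, char in ordered)
    return document
```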
As can be seen from the above, the method for generating an electronic document provided by the embodiments of the present invention obtains an entity image of the target entity to be converted, determines the entity type of the target entity from the entity image, and obtains the document template matching the entity type. The character recognition algorithm is adjusted according to the entity type and used to extract the character information contained in the entity image; the document item corresponding to each recognized character is determined from its center coordinate; and the recognized characters are then imported in turn into the associated document items of the document template, generating an electronic document about the target entity automatically. Compared with existing techniques for generating electronic documents, the embodiments determine the corresponding document item from the position of each character and adjust the character recognition algorithm according to the entity type, which improves recognition accuracy. No manual selection by the user is required, import errors are reduced, and no semantic analysis is needed, which improves generation efficiency.
Fig. 2 shows a detailed implementation flowchart of step S102 of the method for generating an electronic document provided by the second embodiment of the present invention. Referring to Fig. 2, relative to the embodiment of Fig. 1, step S102 of the method provided by this embodiment comprises S1021 to S1025, detailed as follows:
Further, adjusting the preset character recognition algorithm based on the entity type and outputting the character information about the entity image through the adjusted character recognition algorithm comprises:
In S1021, the entity image is imported into a five-layer pooling network for pooling-based dimensionality reduction, and the pooled feature matrix of the entity image is obtained.
In this embodiment, to determine the character properties of the entity image, the terminal device may reduce the dimensionality of the entity image with a preset five-layer pooling network. Dimensionality reduction makes the image features contained in the entity image more prominent: for example, pooling can reveal the character size, character font type, and character region positions in the entity image. After pooling, the amount of data the terminal device has to process is also greatly reduced, which improves recognition efficiency.
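A minimal sketch of repeated pooling is shown below. The source only states that a five-layer pooling network is used; the choice of max pooling, the 2x2 kernel, and the function names are assumptions for illustration.

```python
from typing import List

Matrix = List[List[float]]


def max_pool(matrix: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; trailing rows/columns that do not fill a
    full size x size block are dropped."""
    rows = len(matrix) // size
    cols = len(matrix[0]) // size
    return [
        [
            max(matrix[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def pool_layers(matrix: Matrix, layers: int = 5, size: int = 2) -> Matrix:
    """Apply max pooling up to `layers` times, stopping early if the matrix
    becomes smaller than one pooling block."""
    for _ in range(layers):
        if len(matrix) < size or len(matrix[0]) < size:
            break
        matrix = max_pool(matrix, size)
    return matrix
```

Each 2x2 layer halves both dimensions, so five layers shrink a 32x32 input to a single value; a real image would be far larger and would leave a usable feature matrix.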
In this embodiment, the terminal device either resizes the entity image to a preset standard size and reduces the resized image with reference pooling kernels, or identifies the image size of the entity image and adjusts the pooling kernel of each layer of the five-layer pooling network according to that size. Either adjustment ensures that the output pooled feature matrices are consistent.
Optionally, in this embodiment, the terminal device may first convert the entity image to grayscale, which reduces the number of layers (channels) in the entity image and makes the character contours stand out. When the entity image is a color image, it contains the image data of three layers, and pooling must be applied to all three, so the computation is large. Converting the entity image to grayscale therefore not only reduces the number of layers and the pooling computation, but also increases the contrast between character boundaries and the background, which improves the efficiency of character information extraction.
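The grayscale conversion can be sketched with the widely used ITU-R BT.601 luma weights. The source does not say which conversion is used, so the weights and the nested-list image layout are assumptions of this example.

```python
from typing import List, Tuple

RGBImage = List[List[Tuple[int, int, int]]]
GrayImage = List[List[int]]


def to_grayscale(rgb_image: RGBImage) -> GrayImage:
    """Collapse three color channels into one using BT.601 luma weights."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]
```

The output has one value per pixel instead of three, which is the reduction in layers that the paragraph above relies on to cut the pooling workload.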
In S1022, a sliding window matching the entity type is obtained, and sliding selection is performed on the pooled feature matrix with the sliding window to obtain a plurality of window feature sequences.
In this embodiment, the terminal device may determine the sliding window associated with the entity type of the target entity. Character sizes and font types differ between entity types, so the terminal device configures for each entity type a sliding window that matches its character information. The terminal device then slides this window over the pooled feature matrix, and the data framed at each position forms one window feature sequence, so the sliding process generates a plurality of window feature sequences.
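The sliding selection can be sketched as follows. The window height, width, and stride would come from the entity type in the described method; here they are plain parameters, and flattening each window in row-major order into a sequence is an assumption of the example.

```python
from typing import List

Matrix = List[List[float]]


def window_sequences(matrix: Matrix, win_h: int, win_w: int,
                     stride: int = 1) -> List[List[float]]:
    """Slide a win_h x win_w window over the matrix and return the framed
    values of each position as one flattened (row-major) sequence."""
    rows, cols = len(matrix), len(matrix[0])
    sequences = []
    for top in range(0, rows - win_h + 1, stride):
        for left in range(0, cols - win_w + 1, stride):
            sequences.append([
                matrix[top + i][left + j]
                for i in range(win_h) for j in range(win_w)
            ])
    return sequences
```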
Optionally, the size of the matched sliding window and the parameters it contains also differ between entity types.
In S1023, all window feature sequences are imported into a preset recurrent neural network to generate a character recognition window for the entity image.
In this embodiment, after traversing the pooled feature matrix and obtaining all of its window feature sequences, the terminal device may import each window feature sequence into a preset recurrent neural network to determine the character recognition window matching the entity image, that is, the target window anchor. The recurrent neural network comprises a recurrent layer and a fully connected layer, and the terminal device is configured with a number of iterations. The recurrent layer extracts features from all window feature sequences to form a recurrent feature sequence, which is finally imported into the fully connected layer to output the character recognition window corresponding to the entity image.
In S1024, the convolution value between the character recognition window and the region image it covers in the entity image is calculated, and whether the covered region image is a character region image is identified based on the convolution value.
In this embodiment, the terminal device may slide the character recognition window over the entity image and, at each position, calculate the convolution value between the character recognition window and the region image it covers, then judge from that value whether the covered region is a character region image. Because the character recognition window is generated from the character feature sequences of the entity image, a region image that matches the window can be taken to contain character information. The convolution value between the character recognition window and a region image can therefore be used to judge whether the covered region image is a character region image.
In this embodiment, the terminal device is configured with a matching range. If the convolution value falls within the matching range, the currently covered region image is identified as a character region image; if the convolution value falls outside the matching range, the currently covered region image is identified as not being a character region image.
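The matching-range test can be sketched as an element-wise product sum between the window and each covered region, followed by a range check. Treating the "convolution value" as this product sum, the inclusive bounds, and the returned box convention `(left, top, right, bottom)` are assumptions of the example.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), assumed


def convolution_value(region: Matrix, window: Matrix) -> float:
    """Sum of element-wise products of two equally sized matrices."""
    return sum(
        r * w
        for region_row, window_row in zip(region, window)
        for r, w in zip(region_row, window_row)
    )


def find_character_regions(image: Matrix, window: Matrix,
                           match_range: Tuple[float, float],
                           stride: int = 1) -> List[Box]:
    """Slide the window over the image and keep every covered region whose
    convolution value lies within match_range (bounds inclusive)."""
    low, high = match_range
    win_h, win_w = len(window), len(window[0])
    hits = []
    for top in range(0, len(image) - win_h + 1, stride):
        for left in range(0, len(image[0]) - win_w + 1, stride):
            region = [row[left:left + win_w]
                      for row in image[top:top + win_h]]
            if low <= convolution_value(region, window) <= high:
                hits.append((left, top, left + win_w, top + win_h))
    return hits
```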
In S1025, the characters contained in the character region image are recognized, and the character information is generated.
In this embodiment, the characters contained in the character region image are determined by a character recognition algorithm such as OCR, and the character information is generated from the recognized characters and the position of the character region image.
In the embodiments of the present invention, the dimensionality of the entity image is reduced and the character recognition window is generated from the window features obtained after the reduction, which improves the accuracy of character recognition.
Fig. 3 shows a specific implementation flowchart of S103 of the electronic document generation method provided by the third embodiment of the present invention. Referring to Fig. 3, relative to the embodiment described in Fig. 1, S103 of the generation method provided in this embodiment includes S1031 to S1033, detailed as follows:
Further, obtaining the center coordinate of the recognized character according to the character region image, and determining the document item to which the recognized character belongs from the center coordinate and the effective area of each document item, comprises:
In S1031, the corner coordinates of the character region image are obtained, and the center coordinate is calculated from the corner coordinates and the image size of the entity image.
In the present embodiment, the character region image can be delimited by several corner coordinates. The terminal device can therefore choose either two diagonal corner coordinates or all four corner coordinates of the character region image and calculate the geometric center of the character region image from them. For example, if the two diagonal corner coordinates are $(x_1, y_1)$ and $(x_2, y_2)$, the geometric center of the character region image is $\left(\frac{x_1 + x_2}{2}, \frac{y_1 + y_2}{2}\right)$.
In the present embodiment, the terminal device can also obtain the image size of the entity image and calculate the center coordinate of the character region image from the geometric center and the image size. Assuming that the length and width of the entity image are L and H respectively, the center coordinate of the character region is expressed in terms of the geometric center, L and H. Because the shooting angle and the resolution may affect the position of the character region image, determining the center coordinate from both the image size of the entity image and the geometric center reduces the influence of the entity image size on the center coordinate and improves the accuracy of document item identification.
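The calculation in S1031 can be illustrated with a short sketch. The midpoint of two diagonal corners follows directly from the text. The size-adjusted form, however, is an assumption: the source does not preserve that formula, so the code below simply divides the geometric center by the image length L and height H, which is one plausible reading of "weighting by image size". The function names are hypothetical.

```python
from typing import Tuple

Point = Tuple[float, float]


def geometric_center(corner_a: Point, corner_b: Point) -> Point:
    """Midpoint of two diagonal corner coordinates (x1, y1) and (x2, y2)."""
    (x1, y1), (x2, y2) = corner_a, corner_b
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def size_adjusted_center(
    corner_a: Point, corner_b: Point, length: float, height: float
) -> Point:
    """ASSUMPTION: normalize the geometric center by the image size (L, H),
    so the result is comparable across images of different resolutions."""
    cx, cy = geometric_center(corner_a, corner_b)
    return (cx / length, cy / height)
```

Under this assumption, a character region with corners (10, 20) and (30, 60) in an image of length 40 and height 80 has geometric center (20, 40) and size-adjusted center (0.5, 0.5).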
In S1032, the distance between the center coordinate and each coordinate point on the contour line of the effective area is calculated, and the smallest distance is selected as the characteristic distance between the character region image and the document item.
In the present embodiment, after determining the center coordinate of the character region image, the terminal device can calculate the distance between the center coordinate and each coordinate point on the contour line of a document item, and select the smallest distance value as the characteristic distance between the character region image and that document item. The closer the center coordinate is to the contour boundary of a document item, the greater its correlation with that document item; conversely, the farther it is from the contour boundary, the smaller the correlation. Therefore, to determine the correlation between the two, the terminal device needs to determine the minimum distance between the center coordinate and the effective area of the document item, that is, the characteristic distance described above.
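The characteristic distance of S1032 reduces to a minimum over point-to-point distances. The sketch below assumes Euclidean distance and a contour supplied as a list of sampled coordinate points; neither detail is specified in the source, and the function name is hypothetical.

```python
import math
from typing import Iterable, Tuple

Point = Tuple[float, float]


def characteristic_distance(center: Point, contour: Iterable[Point]) -> float:
    """Smallest Euclidean distance from the center coordinate to any
    coordinate point on the contour line of an effective area."""
    cx, cy = center
    return min(math.hypot(cx - px, cy - py) for px, py in contour)
```

For example, with the center at (0, 0) and contour points (3, 4), (6, 8) and (0, 10), the distances are 5, 10 and 10, so the characteristic distance is 5.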
In S1033, the document item with the smallest characteristic distance is selected as the document item to which the character region image belongs.
In the present embodiment, the terminal device can calculate the characteristic distance between the character region image and each document item, and select the document item with the smallest characteristic distance as the document item of the character region image. Preferably, the terminal device can treat the document items whose effective areas are covered by the character region image as the associated document items of the recognized character, and calculate the characteristic distance only for those associated document items. Because the character region image of the recognized character may fall within the effective areas of several document items, it is necessary to determine which document item it belongs to. A document item whose effective area the character region image does not fall into is necessarily unrelated to the recognized character, so its characteristic distance need not be calculated, which avoids a large amount of useless computation.
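The selection in S1033, including the preferred filtering to associated document items, might look like the sketch below. It assumes axis-aligned rectangular effective areas given as (left, top, right, bottom) in integer pixel coordinates and samples each contour at integer points; the item names and helper functions are hypothetical and only serve the illustration.

```python
import math
from typing import Dict, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom), assumed layout


def rectangle_contour(box: Box) -> List[Tuple[int, int]]:
    """Integer coordinate points on the four edges of a rectangle."""
    left, top, right, bottom = box
    points = [(x, y) for x in range(left, right + 1) for y in (top, bottom)]
    points += [(x, y) for y in range(top, bottom + 1) for x in (left, right)]
    return points


def overlaps(a: Box, b: Box) -> bool:
    """True if the two rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def assign_document_item(char_box: Box, items: Dict[str, Box]) -> Optional[str]:
    """Pick the associated document item with the smallest characteristic distance."""
    cx = (char_box[0] + char_box[2]) / 2
    cy = (char_box[1] + char_box[3]) / 2
    # Only items whose effective area the character region falls into are candidates.
    candidates = {name: box for name, box in items.items() if overlaps(char_box, box)}
    if not candidates:
        return None

    def distance(box: Box) -> float:
        return min(math.hypot(cx - x, cy - y) for x, y in rectangle_contour(box))

    return min(candidates, key=lambda name: distance(candidates[name]))
```

With items "name" at (0, 0, 100, 20), "date" at (0, 30, 100, 50) and "amount" at (200, 0, 300, 20), a character region (10, 12, 40, 32) overlaps "name" and "date" only; its center (25, 22) lies 2 pixels from the "name" contour and 8 pixels from the "date" contour, so "name" is chosen and "amount" is never evaluated.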
In this embodiment of the present invention, the center coordinate of the character region image is weighted by the image size, which reduces the influence of the image size on the calculation of the center coordinate and improves the accuracy of identification.
Fig. 4 shows a specific implementation flowchart of an electronic document generation method provided by the fourth embodiment of the present invention. Referring to Fig. 4, relative to the embodiments described in Fig. 1 to Fig. 3, the generation method provided in this embodiment further includes S401 to S403 before the step of adjusting the preset character recognition algorithm based on the entity type, processing the entity image with the adjusted character recognition algorithm, and outputting the character information about the entity image, detailed as follows:
In S401, the average pixel value of the entity image is calculated according to the pixel value of each pixel in the entity image.
In the present embodiment, in order to extract the character region image, the terminal device can collect the pixel value of each pixel in the entity image and determine the reference color of the entity image. Because the background region image occupies more area than the character region image, the reference color of the entity image should be close to the color of the background region image. On this basis, the terminal device can calculate the average pixel value of the entity image, which makes it easier to identify background pixels.
In S402, if the difference between the pixel value of any pixel in the entity image and the average pixel value is less than a preset background threshold, the pixel is identified as a background pixel.
In the present embodiment, the terminal device can calculate the difference between the pixel value of each pixel and the average pixel value to determine whether the pixel is close to the reference color of the entity image. If it is close, the pixel can be determined to be a background pixel. All pixels can thus be classified: a pixel whose difference is less than the preset background threshold is identified as a background pixel, and a pixel whose difference is greater than or equal to the background threshold is identified as a character pixel.
In S403, the region covered by the background pixels is identified as a background region image, and the background region image is removed from the entity image to obtain the character region image.
In the present embodiment, the terminal device identifies each continuous region formed by background pixels as a background region image. The terminal device then deletes all background region images from the entity image, which yields the character region image containing the character pixels.
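Steps S401 to S403 can be sketched for a single-channel image. The code assumes grayscale pixel values and uses the absolute difference from the average; the source does not state how the difference is computed for color images, and the function names and threshold are hypothetical.

```python
from typing import List

GrayImage = List[List[int]]


def average_pixel_value(image: GrayImage) -> float:
    """S401: mean of all pixel values in the image."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)


def character_mask(image: GrayImage, background_threshold: float) -> List[List[bool]]:
    """S402/S403: True marks a character pixel, False a background pixel.

    A pixel is background when its absolute difference from the average
    pixel value is below the (assumed) background threshold.
    """
    average = average_pixel_value(image)
    return [
        [abs(pixel - average) >= background_threshold for pixel in row]
        for row in image
    ]
```

Removing the background then amounts to keeping only the positions where the mask is True. This works when the background dominates the image, as the text assumes; an image that is mostly characters would shift the average toward the character color.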
In this embodiment of the present invention, the average pixel value of the entity image is determined and the character region image is identified according to the average pixel value, which improves the efficiency and accuracy of character region identification.
Fig. 5 shows a specific implementation flowchart of S101 of the electronic document generation method provided by the fifth embodiment of the present invention. Referring to Fig. 5, relative to the embodiments described in Fig. 1 to Fig. 3, S101 of the generation method provided in this embodiment includes S1011 to S1012, detailed as follows:
In S1011, the identifier in a preset area of the entity image is obtained.
In the present embodiment, an identifier can be placed in at least one predetermined area of the target entity. The identifier can be a character string, a two-dimensional code, or the like. The terminal device can extract the identifier from the preset area of the entity image and perform symbol recognition on it.
In S1012, the entity type of the target entity is determined based on the identifier.
In the present embodiment, the terminal device compares the identifier with an entity type identification list to determine the entity type that the identifier matches, and thereby determines the entity type of the target entity.
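The lookup in S1012 can be sketched as a table match. The identifiers and entity types in the table below are invented purely for illustration, and the normalization step (trimming and upper-casing) is an assumption rather than something the source describes.

```python
from typing import Dict, Optional

# Hypothetical entity type identification list; real entries are not given in the source.
ENTITY_TYPE_LIST: Dict[str, str] = {
    "INV": "invoice",
    "ID": "identity card",
    "RCPT": "receipt",
}


def entity_type_from_identifier(
    identifier: str, table: Dict[str, str] = ENTITY_TYPE_LIST
) -> Optional[str]:
    """Return the entity type matched by the identifier, or None if unmatched."""
    normalized = identifier.strip().upper()
    return table.get(normalized)
```

A caller would pass the string recognized in the preset area and fall back to another way of determining the entity type when the function returns None.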
In this embodiment of the present invention, the identifier at a preset position is recognized to determine the entity type, which improves the accuracy and efficiency of entity type identification.
It should be understood that the numbering of the steps in the above embodiments does not imply an order of execution. The execution order of each process should be determined by its function and internal logic, and the numbering should not constitute any limitation on the implementation of the embodiments of the present invention.
Fig. 6 shows a structural block diagram of an electronic document generating device provided by an embodiment of the present invention. The units included in the generating device are used to execute the steps in the embodiment corresponding to Fig. 1. For details, refer to Fig. 1 and the related description of the embodiment corresponding to Fig. 1. For ease of description, only the parts related to this embodiment are shown.
Referring to Fig. 6, the generating device of the electronic document includes:
Entity image acquiring unit 61, configured to obtain an entity image of a target entity, determine the entity type of the target entity according to the entity image, and obtain a document template matching the entity type; the document template includes multiple document items;
Character information output unit 62, configured to adjust a preset character recognition algorithm based on the entity type, process the entity image with the adjusted character recognition algorithm, and output character information about the entity image; the character information includes a recognized character and the character region image of the recognized character;
Document item determination unit 63, configured to obtain the center coordinate of the recognized character according to the character region image, and determine the document item to which the recognized character belongs from the center coordinate and the effective area of each document item;
Character information import unit 64, configured to import the recognized character into the document item to which it belongs in the document template, and generate an electronic document about the target entity.
Optionally, the character information output unit 62 includes:
Pooling feature matrix generation unit, configured to import the entity image into a five-layer pooling network for a pooling dimensionality reduction operation, and obtain the pooling feature matrix of the entity image;
Window feature sequence output unit, configured to obtain a sliding window matching the entity type, and slide the sliding window over the pooling feature matrix to obtain multiple window feature sequences;
Character recognition window generation unit, configured to import all the window feature sequences into a preset recurrent neural network, and generate a character recognition window for the entity image;
Character region recognition unit, configured to calculate the convolution value between the character recognition window and the region image it covers in the entity image, and identify, based on the convolution value, whether the region image covered by the character recognition window is a character region image;
Character recognition unit, configured to recognize the characters contained in the character region image and generate the character information.
Optionally, the document item determination unit 63 includes:
Center coordinate computing unit, configured to obtain the corner coordinates of the character region image, and calculate the center coordinate from the corner coordinates and the image size of the entity image;
Characteristic distance output unit, configured to calculate the distance between the center coordinate and each coordinate point on the contour line of the effective area, and select the smallest distance as the characteristic distance between the character region image and the document item;
Characteristic distance comparing unit, configured to select the document item with the smallest characteristic distance as the document item to which the character region image belongs.
Optionally, the generating device of the electronic document further includes:
Average pixel value computing unit, configured to calculate the average pixel value of the entity image according to the pixel value of each pixel in the entity image;
Background pixel recognition unit, configured to identify a pixel as a background pixel if the difference between the pixel value of that pixel in the entity image and the average pixel value is less than a preset background threshold;
Character region image extraction unit, configured to identify the region covered by the background pixels as a background region image, and remove the background region image from the entity image to obtain the character region image.
Optionally, the entity image acquiring unit 61 includes:
Identifier acquiring unit, configured to obtain the identifier in a preset area of the entity image;
Entity type determination unit, configured to determine the entity type of the target entity based on the identifier.
Therefore, the electronic document generating device provided by this embodiment of the present invention can likewise determine the corresponding document item according to the position of a character and adjust the character recognition algorithm according to the entity type, which improves the accuracy of the character recognition algorithm. No manual selection by the user is required, which reduces import errors, and no semantic analysis is required, which improves generation efficiency.
Fig. 7 is a schematic diagram of a terminal device provided by another embodiment of the present invention. As shown in Fig. 7, the terminal device 7 of this embodiment includes a processor 70, a memory 71, and a computer program 72 stored in the memory 71 and executable on the processor 70, such as an electronic document generation program. When executing the computer program 72, the processor 70 implements the steps in each of the above embodiments of the electronic document generation method, such as S101 to S104 shown in Fig. 1. Alternatively, when executing the computer program 72, the processor 70 implements the functions of the units in each of the above device embodiments, such as the functions of units 61 to 64 shown in Fig. 6.
Illustratively, the computer program 72 can be divided into one or more units, which are stored in the memory 71 and executed by the processor 70 to carry out the present invention. The one or more units can be a series of computer program instruction segments capable of completing specific functions, and the instruction segments describe the execution of the computer program 72 in the terminal device 7. For example, the computer program 72 can be divided into an entity image acquiring unit, a character information output unit, a document item determination unit and a character information import unit, whose specific functions are as described above.
The terminal device 7 can be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud server. The terminal device may include, but is not limited to, the processor 70 and the memory 71. Those skilled in the art will understand that Fig. 7 is only an example of the terminal device 7 and does not constitute a limitation on it. The terminal device may include more or fewer components than illustrated, combine certain components, or use different components; for example, it may also include input/output devices, network access devices, buses and so on.
The processor 70 can be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor can be a microprocessor or any conventional processor.
The memory 71 can be an internal storage unit of the terminal device 7, such as a hard disk or internal memory of the terminal device 7. The memory 71 can also be an external storage device of the terminal device 7, such as a plug-in hard disk, a smart media card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card or a flash card (Flash Card) fitted to the terminal device 7. Further, the memory 71 can include both an internal storage unit of the terminal device 7 and an external storage device. The memory 71 is used to store the computer program and other programs and data required by the terminal device. The memory 71 can also be used to temporarily store data that has been output or is about to be output.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The embodiments described above are merely intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and shall all fall within the protection scope of the present invention.

Claims (10)

1. A method for generating an electronic document, characterized by comprising:
obtaining an entity image of a target entity, determining the entity type of the target entity according to the entity image, and obtaining a document template matching the entity type, wherein the document template includes multiple document items;
adjusting a preset character recognition algorithm based on the entity type, processing the entity image with the adjusted character recognition algorithm, and outputting character information about the entity image, wherein the character information includes a recognized character and the character region image of the recognized character;
obtaining the center coordinate of the recognized character according to the character region image, and determining the document item to which the recognized character belongs from the center coordinate and the effective area of each document item;
importing the recognized character into the document item to which it belongs in the document template, and generating an electronic document about the target entity.
2. The generation method according to claim 1, characterized in that adjusting the preset character recognition algorithm based on the entity type, processing the entity image with the adjusted character recognition algorithm, and outputting the character information about the entity image comprises:
importing the entity image into a five-layer pooling network for a pooling dimensionality reduction operation, and obtaining the pooling feature matrix of the entity image;
obtaining a sliding window matching the entity type, and sliding the sliding window over the pooling feature matrix to obtain multiple window feature sequences;
importing all the window feature sequences into a preset recurrent neural network, and generating a character recognition window for the entity image;
calculating the convolution value between the character recognition window and the region image it covers in the entity image, and identifying, based on the convolution value, whether the region image covered by the character recognition window is a character region image;
recognizing the characters contained in the character region image, and generating the character information.
3. The generation method according to claim 1, characterized in that obtaining the center coordinate of the recognized character according to the character region image, and determining the document item to which the recognized character belongs from the center coordinate and the effective area of each document item, comprises:
obtaining the corner coordinates of the character region image, and calculating the center coordinate from the corner coordinates and the image size of the entity image;
calculating the distance between the center coordinate and each coordinate point on the contour line of the effective area, and selecting the smallest distance as the characteristic distance between the character region image and the document item;
selecting the document item with the smallest characteristic distance as the document item to which the character region image belongs.
4. The generation method according to any one of claims 1 to 3, characterized in that, before adjusting the preset character recognition algorithm based on the entity type, processing the entity image with the adjusted character recognition algorithm, and outputting the character information about the entity image, the method further comprises:
calculating the average pixel value of the entity image according to the pixel value of each pixel in the entity image;
if the difference between the pixel value of any pixel in the entity image and the average pixel value is less than a preset background threshold, identifying the pixel as a background pixel;
identifying the region covered by the background pixels as a background region image, and removing the background region image from the entity image to obtain the character region image.
5. The generation method according to any one of claims 1 to 3, characterized in that obtaining the entity image of the target entity and determining the entity type of the target entity according to the entity image comprises:
obtaining the identifier in a preset area of the entity image;
determining the entity type of the target entity based on the identifier.
6. A device for generating an electronic document, characterized by comprising:
an entity image acquiring unit, configured to obtain an entity image of a target entity, determine the entity type of the target entity according to the entity image, and obtain a document template matching the entity type, wherein the document template includes multiple document items;
a character information output unit, configured to adjust a preset character recognition algorithm based on the entity type, process the entity image with the adjusted character recognition algorithm, and output character information about the entity image, wherein the character information includes a recognized character and the character region image of the recognized character;
a document item determination unit, configured to obtain the center coordinate of the recognized character according to the character region image, and determine the document item to which the recognized character belongs from the center coordinate and the effective area of each document item;
a character information import unit, configured to import the recognized character into the document item to which it belongs in the document template, and generate an electronic document about the target entity.
7. The generating device according to claim 6, characterized in that the character information output unit comprises:
a pooling feature matrix generation unit, configured to import the entity image into a five-layer pooling network for a pooling dimensionality reduction operation, and obtain the pooling feature matrix of the entity image;
a window feature sequence output unit, configured to obtain a sliding window matching the entity type, and slide the sliding window over the pooling feature matrix to obtain multiple window feature sequences;
a character recognition window generation unit, configured to import all the window feature sequences into a preset recurrent neural network, and generate a character recognition window for the entity image;
a character region recognition unit, configured to calculate the convolution value between the character recognition window and the region image it covers in the entity image, and identify, based on the convolution value, whether the region image covered by the character recognition window is a character region image;
a character recognition unit, configured to recognize the characters contained in the character region image and generate the character information.
8. The generating device according to claim 6, characterized in that the document item determination unit comprises:
a center coordinate computing unit, configured to obtain the corner coordinates of the character region image, and calculate the center coordinate from the corner coordinates and the image size of the entity image;
a characteristic distance output unit, configured to calculate the distance between the center coordinate and each coordinate point on the contour line of the effective area, and select the smallest distance as the characteristic distance between the character region image and the document item;
a characteristic distance comparing unit, configured to select the document item with the smallest characteristic distance as the document item to which the character region image belongs.
9. A terminal device, characterized in that the terminal device comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
CN201910017061.1A 2019-01-08 2019-01-08 A kind of generation method and equipment of electronic document Withdrawn CN109871521A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910017061.1A CN109871521A (en) 2019-01-08 2019-01-08 A kind of generation method and equipment of electronic document
PCT/CN2019/118554 WO2020143325A1 (en) 2019-01-08 2019-11-14 Electronic document generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910017061.1A CN109871521A (en) 2019-01-08 2019-01-08 A kind of generation method and equipment of electronic document

Publications (1)

Publication Number Publication Date
CN109871521A true CN109871521A (en) 2019-06-11

Family

ID=66917551

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910017061.1A Withdrawn CN109871521A (en) 2019-01-08 2019-01-08 A kind of generation method and equipment of electronic document

Country Status (2)

Country Link
CN (1) CN109871521A (en)
WO (1) WO2020143325A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110764721A (en) * 2019-09-19 2020-02-07 北京三快在线科技有限公司 Template generation method and device, electronic equipment and computer readable medium
CN111144210A (en) * 2019-11-26 2020-05-12 泰康保险集团股份有限公司 Image structuring processing method and device, storage medium and electronic equipment
WO2020143325A1 (en) * 2019-01-08 2020-07-16 平安科技(深圳)有限公司 Electronic document generation method and device
CN111444907A (en) * 2020-03-24 2020-07-24 上海东普信息科技有限公司 Character recognition method, device, equipment and storage medium
CN112926590A (en) * 2021-03-18 2021-06-08 上海晨兴希姆通电子科技有限公司 Method and system for segmenting and identifying characters on cable
CN115761781A (en) * 2023-01-06 2023-03-07 江苏狄诺尼信息技术有限责任公司 Note image data identification system for engineering electronic archives

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113130023B (en) * 2021-04-22 2023-04-07 嘉兴易迪希计算机技术有限公司 Image-text recognition and entry method and system in EDC system
CN113435331B (en) * 2021-06-28 2023-06-09 平安科技(深圳)有限公司 Image character recognition method, system, electronic equipment and storage medium
CN115052132A (en) * 2022-07-05 2022-09-13 国网江苏省电力有限公司南通市通州区供电分公司 Fishing electric shock prevention early warning method and system based on artificial intelligence

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5144683A (en) * 1989-04-28 1992-09-01 Hitachi, Ltd. Character recognition equipment
CN101523413A (en) * 2006-11-16 2009-09-02 国际商业机器公司 Automated generation of form definitions from hard-copy forms
CN102831416A (en) * 2012-08-15 2012-12-19 广州广电运通金融电子股份有限公司 Character identification method and relevant device
CN108121984A (en) * 2016-11-30 2018-06-05 杭州海康威视数字技术股份有限公司 A kind of character identifying method and device
CN108765118A (en) * 2018-05-18 2018-11-06 北京大账房网络科技股份有限公司 Bill is mixed to sweep the method and system for generating voucher
CN108984578A (en) * 2017-05-31 2018-12-11 株式会社日立制作所 Computer, document recognition methods and system

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105260733A (en) * 2015-09-11 2016-01-20 北京百度网讯科技有限公司 Method and device for processing image information
CN106407976B (en) * 2016-08-30 2019-11-05 百度在线网络技术(北京)有限公司 The generation of image character identification model and perpendicular column character picture recognition methods and device
CN108121966A (en) * 2017-12-21 2018-06-05 欧浦智网股份有限公司 A kind of list method for automatically inputting, electronic equipment and storage medium based on OCR technique
CN109710907A (en) * 2018-12-20 2019-05-03 平安科技(深圳)有限公司 A kind of generation method and equipment of electronic document
CN109871521A (en) * 2019-01-08 2019-06-11 平安科技(深圳)有限公司 A kind of generation method and equipment of electronic document

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020143325A1 (en) * 2019-01-08 2020-07-16 平安科技(深圳)有限公司 Electronic document generation method and device
CN110764721A (en) * 2019-09-19 2020-02-07 北京三快在线科技有限公司 Template generation method and device, electronic equipment and computer readable medium
CN111144210A (en) * 2019-11-26 2020-05-12 泰康保险集团股份有限公司 Image structuring processing method and device, storage medium and electronic equipment
CN111144210B (en) * 2019-11-26 2023-07-18 泰康保险集团股份有限公司 Image structuring processing method and device, storage medium and electronic equipment
CN111444907A (en) * 2020-03-24 2020-07-24 上海东普信息科技有限公司 Character recognition method, device, equipment and storage medium
CN111444907B (en) * 2020-03-24 2023-05-16 上海东普信息科技有限公司 Method, device, equipment and storage medium for character recognition
CN112926590A (en) * 2021-03-18 2021-06-08 上海晨兴希姆通电子科技有限公司 Method and system for segmenting and identifying characters on cable
CN112926590B (en) * 2021-03-18 2023-12-01 上海晨兴希姆通电子科技有限公司 Segmentation recognition method and system for characters on cable
CN115761781A (en) * 2023-01-06 2023-03-07 江苏狄诺尼信息技术有限责任公司 Note image data identification system for engineering electronic archives

Also Published As

Publication number Publication date
WO2020143325A1 (en) 2020-07-16

Similar Documents

Publication Publication Date Title
CN109871521A (en) Electronic document generation method and device
US11238310B2 (en) Training data acquisition method and device, server and storage medium
AU2017421316B2 (en) Systems and methods for verifying authenticity of ID photo
CN104834693B (en) Visual pattern search method and system based on deep search
EP3798917A1 (en) Generative adversarial network (gan) for generating images
CN108229419A (en) Method and apparatus for clustering images
US20120027252A1 (en) Hand gesture detection
CN104915673B (en) Object classification method and system based on a visual bag-of-words model
CN110378235A (en) Blurred face image recognition method, device, and terminal device
CN110175249A (en) Similar picture retrieval method and system
CN106599925A (en) Plant leaf identification system and method based on deep learning
EP4109332A1 (en) Certificate authenticity identification method and apparatus, computer-readable medium, and electronic device
CN108875540A (en) Image processing method, device and system and storage medium
CN110490238A (en) Image processing method, device, and storage medium
CN109829448A (en) Face identification method, device and storage medium
CN110489659A (en) Data matching method and device
CN109522970A (en) Image classification method, apparatus and system
CN109271930A (en) Micro-expression recognition method, device, and storage medium
CN108492301A (en) Scene segmentation method, terminal, and storage medium
JP2004062605A (en) Scene identification method and device, and program
CN108228684A (en) Clustering model training method, device, electronic equipment, and computer storage medium
CN109871845A (en) Certificate image extraction method and terminal device
CN109740417A (en) Invoice type recognition method, device, storage medium, and computer equipment
CN112115979A (en) Fusion method and device of infrared image and visible image
CN109948401A (en) Data processing method and system for text

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20190611
