CN110390320A - Method and system for recognizing image information containing multiple documents - Google Patents
Method and system for recognizing image information containing multiple documents
- Publication number
- CN110390320A CN110390320A CN201910729840.4A CN201910729840A CN110390320A CN 110390320 A CN110390320 A CN 110390320A CN 201910729840 A CN201910729840 A CN 201910729840A CN 110390320 A CN110390320 A CN 110390320A
- Authority
- CN
- China
- Prior art keywords
- document
- region
- model
- image
- regions
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
- G06V10/267—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/10—Character recognition
- G06V30/14—Image acquisition
- G06V30/146—Aligning or centring of the image pick-up or image-field
- G06V30/1475—Inclination or skew detection or correction of characters or of image to be recognised
- G06V30/1478—Inclination or skew detection or correction of characters or of image to be recognised of characters or characters lines
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V30/00—Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
- G06V30/40—Document-oriented image-based pattern recognition
- G06V30/41—Analysis of document content
- G06V30/412—Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
Abstract
The present invention proposes a method and system for recognizing image information containing multiple documents. The method comprises: identifying each document region on the image based on the image and a pre-trained first model, wherein the first model is a neural-network-based model; identifying each region on a document based on the image of that document and a pre-trained second model, each region being associated with all or part of the information recorded on the document, wherein the second model is a neural-network-based model; cutting out and obtaining the image of each region, the image of each region being delimited by a rectangle parallel to the horizontal or inclined relative to the horizontal; and identifying the characters in each region based on the image of the region and a pre-trained third model, thereby determining the information recorded on the document, wherein the third model is a neural-network-based model. The invention can efficiently and accurately recognize the information recorded on a wide variety of documents.
Description
Technical field
This disclosure relates to a method and system for recognizing image information containing multiple documents.
Background art
Accurately recognizing the information recorded on various documents is a non-trivial task.
Accordingly, there is a demand for new techniques.
Summary of the invention
One object of the disclosure is to provide a method and system for recognizing image information containing multiple documents.
According to a first aspect of the disclosure, a method for recognizing image information containing multiple documents is provided, comprising: identifying, based on the image and a pre-trained first model, each document region among one or more document regions on the image, wherein the first model is a neural-network-based model; identifying, based on the image of each document and a pre-trained second model, each region among one or more regions on the document, wherein each of the one or more regions is associated with all or part of the information recorded on the document, and the second model is a neural-network-based model; cutting out and obtaining the image of each of the one or more regions, wherein the image of each region is delimited by a rectangle parallel to the horizontal or inclined relative to the horizontal; and identifying, based on the image of each of the one or more regions and a pre-trained third model, the characters in each of the one or more regions, thereby determining the information recorded on the document, wherein the third model is a neural-network-based model.
According to a second aspect of the disclosure, a system for recognizing image information containing multiple documents is provided, comprising: a first model, which is a neural-network-based model; a second model, which is a neural-network-based model; a third model, which is a neural-network-based model; and one or more first devices configured to: identify, based on the image and the pre-trained first model, each document region among one or more document regions on the image; identify, based on the image of each document and the second model, each region among one or more regions on the document, each of the one or more regions being associated with all or part of the information recorded on the document; cut out and obtain the image of each of the one or more regions, the image of each region being delimited by a rectangle parallel to the horizontal or inclined relative to the horizontal; and identify, based on the image of each of the one or more regions and the pre-trained third model, the characters in each of the one or more regions, thereby determining the information recorded on the document.
According to a third aspect of the disclosure, a non-transitory computer-readable storage medium is provided, storing a series of computer-executable instructions that, when executed by one or more computing devices, cause the one or more computing devices to: identify, based on the image and a pre-trained first model, each document region among one or more document regions on the image, wherein the first model is a neural-network-based model; identify, based on the image of the document and a pre-trained second model, each region among one or more regions on the document, each of the one or more regions being associated with all or part of the information recorded on the document, wherein the second model is a neural-network-based model; cut out and obtain the image of each of the one or more regions, the image of each region being delimited by a rectangle parallel to the horizontal or inclined relative to the horizontal; and identify, based on the image of each of the one or more regions and a pre-trained third model, the characters in each of the one or more regions, thereby determining the information recorded on the document, wherein the third model is a neural-network-based model.
Further features and advantages of the disclosure will become apparent from the following detailed description of exemplary embodiments thereof with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which constitute a part of the specification, illustrate embodiments of the disclosure and, together with the description, serve to explain the principles of the disclosure.
The disclosure can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
Fig. 1 is a schematic diagram of an image containing multiple documents, suitable for some embodiments of the present disclosure.
Fig. 2 is a schematic diagram of at least part of an example of a document suitable for some embodiments of the present disclosure.
Figs. 3A and 3B are block diagrams, each schematically showing at least part of a method for recognizing the information recorded on a document according to some embodiments of the present disclosure.
Fig. 4 is a flowchart schematically showing at least part of a method for recognizing the information recorded on a document according to some embodiments of the present disclosure.
Fig. 5 is a structural diagram schematically showing at least part of a system for recognizing the information recorded on a document according to some embodiments of the present disclosure.
Fig. 6 is a structural diagram schematically showing at least part of a system for recognizing the information recorded on a document according to some embodiments of the present disclosure.
Fig. 7 is a schematic diagram of an image containing multiple documents, suitable for some embodiments of the present disclosure.
Fig. 8 is a schematic diagram of at least part of an example of the image of a document suitable for some embodiments of the present disclosure.
Fig. 9A is a schematic diagram of a region identified in the document shown in Fig. 8.
Fig. 9B is a schematic diagram of the region shown in Fig. 9A after slant correction.
Fig. 9C is a schematic diagram of another region identified in the document shown in Fig. 8.
Fig. 9D is a schematic diagram of the region shown in Fig. 9C after slant correction.
Note that in the embodiments described below, the same reference numerals are sometimes used across different drawings to denote the same parts or parts having the same function, and their repeated description is omitted. In this specification, similar reference numerals and letters denote similar items; therefore, once an item is defined in one drawing, it need not be discussed further in subsequent drawings.
Detailed description of the embodiments
Various exemplary embodiments of the disclosure will now be described in detail with reference to the drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the disclosure. In the following description, numerous specific details are set forth in order to better explain the disclosure; it should be understood, however, that the disclosure may also be practiced without these specific details.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended as a limitation of the disclosure or of its application or uses. In all examples shown and discussed herein, any specific value should be interpreted as merely exemplary and not as a limitation.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the specification.
The present disclosure provides a method for recognizing image information containing multiple documents, comprising: identifying, based on the image and a pre-trained first model, each document region among one or more document regions on the image, wherein the first model is a neural-network-based model; identifying, based on the image of each document and a pre-trained second model, each region among one or more regions on the document, each of the one or more regions being associated with all or part of the information recorded on the document, wherein the second model is a neural-network-based model; cutting out and obtaining the image of each of the one or more regions, the image of each region being delimited by a rectangle parallel to the horizontal or inclined relative to the horizontal; and identifying, based on the image of each of the one or more regions and a pre-trained third model, the characters in each of the one or more regions, thereby determining the information recorded on the document, wherein the third model is a neural-network-based model.
In some embodiments, the third model identifies the characters in each of the one or more regions based on the image of each region and its position within the whole document.
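The three-stage flow described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the functions `detect_documents`, `detect_regions`, and `recognize_text` are hypothetical stand-ins for the trained first, second, and third models.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) of an axis-aligned region

def crop(image, box: Box):
    """Cut out a rectangular region from a row-major nested-list 'image'."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]

def recognize_image(image,
                    detect_documents: Callable,  # stand-in for the first model
                    detect_regions: Callable,    # stand-in for the second model
                    recognize_text: Callable     # stand-in for the third model
                    ) -> List[dict]:
    """Run the three-stage pipeline: documents -> regions -> characters."""
    results = []
    for doc_box in detect_documents(image):          # stage 1: find each document
        doc_img = crop(image, doc_box)
        fields = {}
        for region_box in detect_regions(doc_img):   # stage 2: find each region
            region_img = crop(doc_img, region_box)
            # stage 3: recognize characters; the region position may also be
            # passed in, as some embodiments describe
            fields[region_box] = recognize_text(region_img, region_box)
        results.append({"document": doc_box, "fields": fields})
    return results
```

With the three detectors stubbed out, the pipeline reduces to nested cropping plus a per-region recognition call, which is the structure the method claims describe.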
It should be understood that a "document" in this disclosure refers to an entity on which information is recorded; such information is arranged on the document in some manner and is carried in one or more forms, such as Chinese text, foreign-language text, numerals, symbols, and figures. Specific examples of a "document" in this disclosure include invoices, bills, tax receipts, receipts, shopping lists, dining receipts, insurance policies, expense reports, deposit statements, credit card statements, express delivery waybills, itineraries, tickets, boarding passes, patent publication information pages, ballots, questionnaires, evaluation forms, attendance sheets, application forms, and other documents filled in manually and/or by machine. An expense report may be regarded as a document form containing multiple invoices pasted on one sheet of paper. Those skilled in the art will appreciate that a "document" in this disclosure is not limited to the specific examples listed here; it is not limited to financial or commercial bills, nor to documents bearing an official seal; it may be a document with printed type or with handwritten script; and it may or may not have a prescribed and/or general format.
In some embodiments, after the step of identifying each document region among the one or more document regions on the image, each document region is cut out to obtain the image of each document, and the image of each document is then input into the second model for processing. For example, the document image 10 shown in Fig. 1 contains multiple documents 100, 200, 300 distributed at different locations on the document image 10. Based on the document image 10 and the pre-trained first model, each document region 100, 200, 300 on the document image 10 is identified; each document region is cut out to obtain the image of each document, and the image of each document is then input into the second model for processing.
Once the characters in each of the one or more regions on a document have been identified, the information recorded on the document can be determined from the information carried by those characters. For example, for the document 100 shown in Fig. 2, the regions 110, 120, 130, 140 on the document 100 are first identified based on the pre-trained second model, each region being associated with one kind of information recorded on the document 100; the image of each region 110, 120, 130, 140 on the document 100 is cut out and obtained; the characters in each region 110, 120, 130, 140 are then identified based on the pre-trained third model, so that the content of the information recorded in each region on the document 100 can be determined. For example, each region includes at least the area enclosed by the minimum bounding box of the characters contained in that region. In some embodiments, what is input into the pre-trained third model is the image of each of the one or more regions together with its position within the whole document, so that the third model identifies the characters in each of the one or more regions.
Those skilled in the art will understand that the document image 10 and the document 100 shown in Figs. 1 and 2 are merely schematic and cannot be used to limit the disclosure. Although only a few corresponding regions are shown in Figs. 1 and 2, a document of this disclosure may obviously have fewer or more regions. Although the boundaries of the regions 110, 120, 130, 140 shown in Fig. 2 are delimited by rectangles parallel to the horizontal, their boundaries may also be delimited by rectangles inclined relative to the horizontal, parallelograms, arbitrary quadrilaterals, and the like, or by circles, ellipses, other polygons (such as triangles, trapezoids, and arbitrary polygons), irregular shapes, and so on. Any region on a document of this disclosure may be arranged at any position on the document; for example, in Fig. 2, regions 110 and 120 may be close to or even adjacent to each other, region 130 may be located at the edge of the document 100, and region 140 may be smaller than the other regions. Of course, those skilled in the art will also understand that the arrangement, positional relationships, and sizes of the regions on a document of this disclosure are not limited to the manners shown in Figs. 1 and 2; they depend on the specifics of the document, and the documents shown in Figs. 1 and 2 are merely examples.
The image of a document refers to the document presented in a visual form, such as a picture or video of the document. Identifying each of the one or more regions on a document includes identifying the boundary of the region. For example, where the boundary of a region is delimited by a rectangle parallel to the horizontal, the region can be determined by determining at least two vertices of the rectangle. Where the boundary of a region is delimited by a rectangle inclined relative to the horizontal, the region can be determined by determining at least three vertices of the rectangle. Object detection methods based on R-CNN, object detection methods based on YOLO, text detection based on detected primitives (e.g., character-based, word-based, or text-line-based), and text detection based on the shape of the object bounding box (horizontal or near-horizontal text detection, multi-oriented text detection, etc.) may be used.
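Why two vertices suffice for an axis-aligned rectangle but three are needed for an inclined one can be shown with a short geometric sketch (the helper names are illustrative, not from the patent):

```python
def fourth_vertex(a, b, c):
    """Given three consecutive vertices a-b-c of a rectangle (possibly
    inclined relative to the horizontal), recover the fourth vertex d.
    The diagonals of a rectangle share a midpoint, so d = a + c - b."""
    return (a[0] + c[0] - b[0], a[1] + c[1] - b[1])

def axis_aligned_from_two(p, q):
    """An axis-aligned rectangle is fully determined by two opposite
    vertices p and q: the other two corners mix their coordinates."""
    return [(p[0], p[1]), (q[0], p[1]), (q[0], q[1]), (p[0], q[1])]
```

For an axis-aligned box the missing corners are just coordinate recombinations of the two given ones; for an inclined rectangle a third vertex is needed to fix the orientation, after which the fourth follows from the parallelogram identity.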
In some embodiments, the position of each region needs to be input into the third model in order to identify the characters in the region. The position of a region may take any form that can indicate the position of the region within the document; for example, it may be the coordinates (absolute or relative) of one or more vertices of the region within the document, or the coordinates (absolute or relative) of one or more vertices within the document together with one or more side lengths, or the coordinates (absolute or relative) of one or more centers of the region within the document together with one or more radii. The characters in each region may be one or more of Chinese text, foreign-language text, numerals, symbols, figures, and the like.
In some embodiments, the image in each region in one or more regions is input to third model, to know
Character not in the region.The image in each region in one or more of regions be by be parallel to horizontal rectangle or
There is inclined rectangle to define relative to horizontal line.The standard defined above is to be in level in image according to whole document
Or heeling condition determines, and when the states such as inclination or distortion are presented in document, identified by the second model one or more
Each region in a region can also show horizontally or diagonally equal different conditions.
In some cases, for example where the boundary of a region is delimited by a rectangle inclined relative to the horizontal, slant correction may also be applied to the image of the region, so that the image input into the third model is the slant-corrected image. For example, the angle by which the rectangle delimiting the region's boundary is inclined relative to the horizontal can be determined, and the image of the region can then be rotated by that angle so that the rectangle delimiting the region's boundary becomes parallel to the horizontal, thereby performing slant correction. The tilt angle can be calculated from the vertex coordinates of the rectangle delimiting the region's boundary.
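The angle-from-vertices computation can be sketched with standard-library math. This is a minimal illustration of the described correction, operating on corner points; a real implementation would rotate the region's pixels with an image library in the same way.

```python
import math

def tilt_angle(v0, v1):
    """Angle (radians) between the horizontal and the rectangle edge
    running from vertex v0 to vertex v1, computed from the vertex
    coordinates, as the text describes."""
    return math.atan2(v1[1] - v0[1], v1[0] - v0[0])

def rotate_point(p, angle, center=(0.0, 0.0)):
    """Rotate point p about 'center' by 'angle' radians (counterclockwise).
    Rotating a region's vertices by -tilt_angle brings its delimiting
    rectangle parallel to the horizontal."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = p[0] - center[0], p[1] - center[1]
    return (center[0] + c * x - s * y, center[1] + s * x + c * y)
```

Rotating every vertex (or pixel) by the negative of the measured tilt angle leaves the bottom edge of the delimiting rectangle horizontal, which is exactly the slant-corrected input the third model expects.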
Fig. 7 schematically shows an image containing multiple documents, suitable for some embodiments of the present disclosure. As described above, the image containing multiple documents is input into the first model, and the first model identifies one or more document regions 600 therein; the multiple document regions 600 may be arranged in any manner, and each identified document region is delimited by the frame indicated by reference numeral 600. The image of each document region 600 is cut out and obtained, and the image of each document region 600 can then be input into the second model for processing.
Fig. 8 schematically shows an example of the image of one document suitable for the disclosure. As described above, the image of the document is input into the second model, and the second model identifies one or more regions 610 to 690, each identified region being delimited by the frame indicated by reference numerals 610 to 690. Those skilled in the art will understand that the second model may identify more or fewer regions than those outlined in the drawing.
When identifying each region, the second model may also identify the type of information associated with the region. For example, the information associated with region 610 is the merchant's name and number; the information associated with region 620 is the time the bill was generated; the information associated with regions 630 and 640 is the itemized consumption details and the amount of each item; the information associated with region 650 is the subtotal of the spending amount; the information associated with region 660 is the tax; the information associated with region 670 is the total spending amount; the information associated with region 680 is the amount collected; and the information associated with region 690 is the change amount.
The image of each of the regions 610 to 690 is cut out and obtained, and the image of each of the regions 610 to 690 can then be input in turn into the third model to identify the characters in the region. The image of region 610 shown in Fig. 9A, or the image of region 650 shown in Fig. 9C, can be input into the third model to identify the characters in the region. In addition, since the frames delimiting regions 610 to 690 are inclined relative to the horizontal, in some embodiments the image of each region may also be slant-corrected before being input into the third model. For example, for the image of region 610 shown in Fig. 9A and the image of region 650 shown in Fig. 9C, the vertex coordinates of the inclined rectangle delimiting the region's boundary are obtained; the angle by which the rectangle is inclined relative to the horizontal is calculated; and the image of the region is then rotated by that angle so that the rectangle delimiting the region's boundary becomes parallel to the horizontal, thereby performing slant correction. Afterwards, the slant-corrected image of region 610 shown in Fig. 9B, or the slant-corrected image of region 650 shown in Fig. 9D, can be input into the third model to identify the characters in the region.
Afterwards, the information recorded on the document can be determined from the identified characters in each region and the type of information associated with each region. In the present embodiment, this means finally determining, for the relevant document: the text and numeric content of the merchant name in region 610, the numeric content of the bill generation time in region 620, the consumption details and per-item amounts in regions 630 and 640, the numeric content of the spending subtotal in region 650, the numeric content of the tax in region 660, the numeric content of the total spending amount in region 670, the numeric content of the amount collected in region 680, and the numeric content of the change amount in region 690. The information identified above may be displayed directly on the relevant regions of the document image, or may additionally be output and displayed in the form of a list or segmented text.
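Combining the per-region character strings with the information types identified by the second model yields a structured record, which is what the list-style output described above amounts to. The region-id-to-field mapping below is illustrative only; the field names are assumptions, not terms from the patent.

```python
# Map each region's identified information type to a field name. The region
# ids match the figures; the English field names are illustrative.
REGION_TYPES = {
    610: "merchant_name", 620: "bill_time",
    630: "item_details", 640: "item_amounts",
    650: "subtotal", 660: "tax", 670: "total",
    680: "amount_collected", 690: "change",
}

def assemble_record(recognized: dict) -> dict:
    """Combine per-region recognized character strings into one structured
    record, keyed by the type of information associated with each region."""
    return {REGION_TYPES[rid]: text
            for rid, text in recognized.items() if rid in REGION_TYPES}
```

A record like this can be displayed alongside the document image or exported as a list, as the text describes.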
Using neural-network-based models, the present disclosure first identifies one or more bill regions in the image to be recognized, then identifies one or more regions in each bill image, and then identifies the characters in each region, thereby recognizing the information recorded on each document. It is thus possible to efficiently and accurately recognize the information recorded on a wide variety of documents. For example, images of non-standard documents, such as those with low resolution, skew, illegibility, stains, curled paper, or misplaced (manual and/or machine) fill-ins, can be recognized using the disclosed method and the system described below.
The first model is obtained by the following process: labeling each image sample containing multiple documents in a first document-image training set, so as to mark out each document region among the one or more document regions in each image sample; and training a first neural network with the labeled first document-image training set to obtain the first model. The first neural network is a neural network based on an object detection algorithm; in some embodiments, the first neural network is built on models such as convolutional neural networks (CNN), R-CNN, or Mask R-CNN.
The second model can be obtained by the following process: labeling each document-image sample in a second document-image training set, so as to mark out each region among the one or more regions in each document-image sample, each of the one or more regions being associated with all or part of the information in the document-image sample; and training a second neural network with the labeled second document-image training set to obtain the second model. For example, the example shown in Fig. 8 can be the image of a labeled document. When marking out each of the regions 610 to 690, the type of information associated with each of the regions 610 to 690 can also be marked. In some embodiments, the second neural network is built on a deep residual network (ResNet).
Training the second neural network may further include: testing the output accuracy of the trained second neural network on a second document-image test set; if the output accuracy is less than a predetermined first threshold, increasing the number of document-image samples in the second document-image training set, each added document-image sample having been labeled; and training the second neural network again with the enlarged second document-image training set. The output accuracy of the retrained second neural network is then tested again on the second document-image test set, until the output accuracy of the second neural network meets the requirement, that is, is not less than the predetermined first threshold. In this way, a second neural network whose output accuracy meets the requirement can be used as the trained second model in the recognition process described above.
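The test-and-augment loop above can be sketched with stub functions. The names `train_fn`, `eval_fn`, and `grow_fn` are hypothetical stand-ins for the real training, held-out testing, and labeled-data-collection steps; the sketch only captures the control flow the text describes.

```python
def train_until_accurate(train_fn, eval_fn, grow_fn, dataset,
                         threshold=0.95, max_rounds=10):
    """Train, test on a held-out set, and, while accuracy is below the
    predetermined first threshold, add labeled samples and retrain."""
    for _ in range(max_rounds):
        model = train_fn(dataset)
        if eval_fn(model) >= threshold:   # accuracy meets the requirement
            return model
        dataset = grow_fn(dataset)        # add more labeled samples
    raise RuntimeError("accuracy did not reach the threshold")
```

The same loop applies whether the network being trained is the first, second, or combined model, since the text gives them the same training and testing process.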
The first model and the second model use the same training and testing process, and may be combined and completed in a single training or testing process, so as to establish one general recognition model; after an image containing multiple documents undergoes recognition processing, the region of each document and each of the one or more regions on each document can be recognized directly in a single pass.
The third model can be obtained by the following process: annotating each document image sample in a third document image sample training set, so as to mark out each region of one or more regions in each document image sample and the characters in each region, where each region of the one or more regions is associated with all or part of the information in the document image sample; and training the third neural network with the annotated third document image sample training set to obtain the third model. In some embodiments, the third neural network is trained to obtain the third model based on the images of the documents in the third document image sample training set and the positions of each region of the one or more regions on the documents. In some embodiments, the third neural network is trained to obtain the third model based on the images of each region of the one or more regions on the documents in the third document image sample training set. In some cases of these embodiments, for example where the boundary of a region is delimited by a rectangle inclined relative to the horizontal, the image of the region that is input to the third neural network for training is an image after tilt correction. For example, the angle by which the rectangle delimiting the boundary of the region is inclined relative to the horizontal can be determined, and the image of the region is then rotated by that angle so that the rectangle delimiting the boundary of the region becomes parallel to the horizontal, thereby performing the tilt correction. The tilt angle can be calculated from the vertex coordinates of the rectangle delimiting the boundary of the region. In some embodiments, the third neural network is built on a recurrent neural network (RNN).
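As a hedged illustration of the tilt correction described above, the tilt angle can be computed from two adjacent vertex coordinates of the delimiting rectangle, after which the region image would be rotated by the negative of that angle (for example with an image-processing library such as OpenCV or Pillow). The sketch below covers only the angle computation; the vertex ordering is an assumption:

```python
import math

def tilt_angle(p0, p1):
    """Angle, in degrees, of the rectangle edge p0->p1 relative to the
    horizontal, computed from two adjacent vertex coordinates of the
    rectangle delimiting the region boundary. Rotating the region image
    by the negative of this angle makes that edge horizontal."""
    (x0, y0), (x1, y1) = p0, p1
    return math.degrees(math.atan2(y1 - y0, x1 - x0))
```

For example, a rectangle whose top edge runs from (0, 0) to (2, 2) is inclined 45 degrees, so the region image would be rotated by -45 degrees before being input to the third neural network.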
Training the third neural network may further include: testing the output accuracy of the trained third neural network based on a third document image test sample set; if the output accuracy is less than a predetermined threshold, increasing the number of document image samples in the third document image sample training set, where each added document image sample is likewise annotated; and training the third neural network again with the enlarged third document image sample training set. The output accuracy of the retrained third neural network is then tested again based on the third document image test sample set, until the output accuracy of the third neural network meets the requirement, that is, is not less than the predetermined threshold. In this way, a third neural network whose output accuracy meets the requirement may be used as the trained third model in the recognition process described above.
Those skilled in the art will understand that the first document image sample training set used to train the first neural network, the second document image sample training set used to train the second neural network, and the third document image sample training set used to train the third neural network may be the same set or different sets; that is, they may contain the same document image samples or may contain different or not entirely identical document image samples. Likewise, the first document image test sample set used to test the first neural network, the second document image test sample set used to test the second neural network, and the third document image test sample set used to test the third neural network may be the same set or different sets; that is, they may contain the same document image samples or may contain different or not entirely identical document image samples. The predetermined first threshold used in testing to judge whether the output accuracy of the second neural network meets the requirement and the predetermined second threshold used in testing to judge whether the output accuracy of the third neural network meets the requirement may be the same value or different values. The numbers of document image samples in the first and second document image sample training sets and in the first, second, and third document image test sample sets can be selected as needed. A document image that has been recognized may also be added, as a document image sample, to any one or more of the above training sets or test sets, so that the number of document image samples used for training and/or testing keeps growing and the precision of the trained models improves.
Figs. 3A and 3B are block diagrams each schematically showing at least part of a method of recognizing the information recorded on a document according to some embodiments of the present disclosure. Based on the image containing multiple documents and the first model trained in advance, each document region of one or more document regions in the image is recognized; then, based on the image 210 of each document to be recognized and the second model 220 trained in advance, each region 230 of one or more regions on the document is recognized; and based on the image of each region 230 of the one or more regions cut from the document and the third model 250 trained in advance, the characters 260 in each region of the one or more regions on the document are recognized. In some embodiments, based on the image of each region 230 of the one or more regions and the second model 220, the information type 240 of the information associated with each region of the one or more regions is also recognized; and the information recorded on the document is determined based on the recognized information type 240 of the information associated with each region and the recognized characters 260 in each region of the one or more regions. In some embodiments, the characters 260 in each region of the one or more regions are recognized based on the image of each region 230 of the one or more regions and its position in the whole document.
In some embodiments, the characters 260 in each region of the one or more regions on the document are recognized based on the image 210 of each document to be recognized, each region 230 of the one or more regions on the document, and the third model 250 trained in advance. In some embodiments, based on the image 210 of the document and the second model 220, the information type 240 of the information associated with each region of the one or more regions is also recognized; and the information recorded on the document is determined based on the recognized information type 240 of the information associated with each region and the recognized characters 260 in each region of the one or more regions. In some embodiments, the characters 260 in each region of the one or more regions on the document are recognized based on the image of the whole document and the position of each region 230 of the one or more regions.
The information type of the information associated with each region may be one or more types. For example, when the document is some kind of application form, in one case the information type of the information associated with one region on the document may be the applicant's name, and the information type of the information associated with another region on the document may be an ID card number; in another case, the information type of the information associated with some region on the document may be both the applicant's name and the ID card number. For example, when the document is some kind of invoice, in one case the information type of the information associated with one region on the document may be the invoice code, and the information type of the information associated with another region on the document may be the pre-tax amount; in another case, the information type of the information associated with some region on the document may be both the invoice code and the pre-tax amount. The information types of the information associated with different regions of the one or more regions may be the same or different. For example, when the document is a shopping list, in one case the information type of the information associated with multiple different regions may all be the purchased goods.
Fig. 4 is a flowchart schematically showing at least part of a method of recognizing the information recorded on a document according to some embodiments of the present disclosure. In this embodiment, the disclosed method includes: recognizing, based on the image and the first model trained in advance, each document region of one or more document regions in the image (310); recognizing, based on the image of each document and the second model trained in advance, each region of one or more regions on the document and the information type of the information associated with each region of the one or more regions (320); recognizing, based on the image of each region of the one or more regions and the third model trained in advance, the characters in each region of the one or more regions (330); and determining the information recorded on the document based on the recognized information type of the information associated with each region of the one or more regions and the recognized characters in each region of the one or more regions (340).
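Steps 310 to 340 above can be sketched as a pipeline. All model and cropping callables below are placeholders (assumptions), so this shows only how the outputs of one stage feed the next, not the disclosed implementation:

```python
def recognize(image, model1, model2, model3, crop):
    """Sketch of steps 310-340: model1 finds document regions in the image;
    model2 finds the regions on each document together with their information
    types; model3 reads the characters in each cropped region; the pairing of
    information type and characters determines the recorded information."""
    results = []
    for doc_region in model1(image):               # step 310
        doc_image = crop(image, doc_region)
        for region, info_type in model2(doc_image):  # step 320
            chars = model3(crop(doc_image, region))  # step 330
            results.append((info_type, chars))       # step 340
    return results

# Toy stand-ins exercising the data flow only.
out = recognize(
    image="img",
    model1=lambda img: ["doc0"],
    model2=lambda doc: [("region0", "applicant_name")],
    model3=lambda region_img: "ZHANG SAN",
    crop=lambda img, r: (img, r),
)
```

With the toy stand-ins, `out` is `[("applicant_name", "ZHANG SAN")]`, i.e. one (information type, characters) pair per recognized region.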
In these embodiments, the first model is obtained by the following process: annotating each image sample containing multiple documents in a first document image sample training set, so as to mark out each document region of one or more document regions in each image sample; and training the first neural network with the annotated first document image sample training set to obtain the first model. The second model can be obtained by the following process: annotating each document image sample in a second document image sample training set, so as to mark out each region of the one or more regions in each document image sample and the information type of the information associated with each region, where each region of the one or more regions is associated with all or part of the information in the document image sample; and training the second neural network with the annotated second document image sample training set to obtain the second model. The output accuracy of the trained second neural network can be tested based on a second document image test sample set; if the accuracy does not meet the requirement, that is, is less than the predetermined first threshold, the number of document image samples in the second document image sample training set is increased and the second neural network is trained again, until the output accuracy of the second neural network meets the requirement, that is, is not less than the predetermined first threshold. In this way, a second neural network whose output accuracy meets the requirement may be used as the trained second model in the recognition process described above.
In some embodiments, before each region of the one or more regions on each document is recognized, the disclosed method further includes: recognizing the category of the document based on the image of the document and a fourth model trained in advance, where the fourth model is a neural-network-based model; and selecting the second model and/or the third model to be used according to the recognized category. In some embodiments, the category of the document includes at least the language of the document. For example, the type of the language of the information recorded on the document may be recognized based on the image of the document and the fourth model trained in advance, and may be one or more of: languages such as Chinese, English, and Japanese, and languages presented in some coded form, such as Morse code, pictographs, or ASCII. The second model and/or the third model to be used are then selected according to the recognition result. In these cases, different second models and/or third models may be trained in advance for different languages, which helps improve model accuracy.
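The category-based selection described above can be pictured as a lookup keyed by the language that the fourth model recognizes. The registry contents and model names below are illustrative assumptions, not part of the disclosure:

```python
# Hypothetical per-language model registry: the fourth model classifies the
# document's language, and that class selects which second/third models to use.
MODELS_BY_LANGUAGE = {
    "zh": ("second_model_zh", "third_model_zh"),
    "en": ("second_model_en", "third_model_en"),
}

def select_models(document_image, fourth_model, registry=MODELS_BY_LANGUAGE):
    """Return the (second model, third model) pair for the language the
    fourth model recognizes in the document image."""
    language = fourth_model(document_image)
    return registry[language]
```

Training a dedicated second/third model per language, and dispatching like this, is one plausible reading of why per-language models "help improve model accuracy".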
The fourth model can be obtained by the following process: annotating each document image sample in a fourth document image sample training set, so as to mark out the category of each document image sample; and training the fourth neural network with the annotated fourth document image sample training set to obtain the fourth model. In some embodiments, the fourth neural network is built on a deep convolutional neural network (CNN). Training the fourth neural network may further include: testing the output accuracy of the trained fourth neural network based on a fourth document image test sample set; if the output accuracy is less than a predetermined threshold, increasing the number of document image samples in the fourth document image sample training set, where each added document image sample is likewise annotated; and training the fourth neural network again with the enlarged fourth document image sample training set. The output accuracy of the retrained fourth neural network is then tested again based on the fourth document image test sample set, until the output accuracy of the fourth neural network meets the requirement, that is, is not less than a predetermined third threshold. In this way, a fourth neural network whose output accuracy meets the requirement may be used as the trained fourth model in the recognition process described above.
Those skilled in the art will understand that the fourth document image sample training set and the first, second, and third document image sample training sets may be the same set or different sets. The fourth document image test sample set and the first, second, and third document image test sample sets may be the same set or different sets. The third threshold and the first and second thresholds may be the same value or different values. The numbers of document image samples in the fourth document image sample training set and the fourth document image test sample set can be selected as needed. A document image that has been recognized may also be added, as a document image sample, to any one or more of the above training sets or test sets, so that the number of document image samples used for training and/or testing keeps growing and the precision of the trained models improves.
Fig. 5 is a structural diagram schematically showing at least part of a system 400 for recognizing the information recorded on a document according to some embodiments of the present disclosure. Those skilled in the art will understand that system 400 is only an example and should not be regarded as limiting the scope of the present disclosure or of the features described herein. In this example, system 400 may include a first model 410, a second model 420, a third model 430, and one or more first devices 440, where the first model 410 is a neural-network-based model, the second model 420 is a neural-network-based model, and the third model 430 is a neural-network-based model. The one or more first devices 440 are configured to: recognize, based on the image and the first model 410 trained in advance, each document region of one or more document regions in the image; recognize, based on the image of each document and the second model 420, each region of one or more regions on the document, where each region of the one or more regions is associated with all or part of the information recorded on the document; cut out the image of each region of the one or more regions, where the image of each region of the one or more regions is delimited by a rectangle parallel to the horizontal or by a rectangle inclined relative to the horizontal; and recognize, based on the image of each region of the one or more regions and the third model 430, the characters in each region of the one or more regions, thereby determining the information recorded on the document. In some embodiments, the one or more first devices 440 are configured to recognize the characters in each region of the one or more regions on the document based on the image of the document, each region of the one or more regions, and the third model 430.
The image of each region of the one or more regions is delimited by a rectangle parallel to the horizontal or by a rectangle inclined relative to the horizontal, and the one or more first devices 440 are further configured to, in response to a rectangle inclined relative to the horizontal, recognize the characters in each region of the one or more regions by the third model 430 based on the tilt-corrected image of each region.
As can be seen from the above description, the one or more first devices 440 may be further configured to: recognize, based on the image and the first model trained in advance, each document region of the one or more document regions in the image; based on the image of each document and the second model, also recognize the information type of the information associated with each region of the one or more regions; and determine the information recorded on the document based on the recognized information type of the information associated with each region of the one or more regions and the recognized characters in each region of the one or more regions. The system 400 of the present disclosure for recognizing the information recorded on a document may further include a neural-network-based fourth model (not shown). The one or more first devices 440 may be further configured to: before recognizing each region of the one or more regions on the document, recognize the category of the document based on the image of the document and the fourth model; and select the second model and/or the third model to be used according to the recognized category.
Those skilled in the art will understand that the various operations described above with respect to the one or more first devices 440 may be configured to be carried out in a single first device 440 or distributed across multiple first devices 440. Each of the one or more first devices 440 may be a computing device, a storage device, or a device having both computing and storage functions.
Although in the system 400 shown in Fig. 5 the first model 410, the second model 420, the third model 430, and the one or more first devices 440 are each represented by a separate rectangular box, the first model 410, the second model 420, and the third model 430 may also be stored in the one or more first devices 440. For example, the first model 410, the second model 420, and the third model 430 may be stored in the same first device 440; or the first model 410, the second model 420, and the third model 430 may be stored in different first devices 440 respectively; or part of any one of the first model 410, the second model 420, and the third model 430 may be stored in one first device 440 and the other parts stored in other first devices 440. Of course, the first model 410, the second model 420, and the third model 430 may also be stored not in the one or more first devices 440 but in other devices.
The information recorded on a recognized document can be used for further downstream processing. The downstream processing can be carried out by one or more second devices (see 520 shown in Fig. 6). The one or more second devices can be configured to: send the image of a document to the one or more first devices 440; and obtain, from the one or more first devices 440, the digitized information recorded on the recognized document. The digitized information obtained by the one or more second devices can be used for downstream processing. For example, with the information recognized from an attendance sheet, the one or more second devices can compute an attendance rate and the like; with the information recognized from a catering receipt, the one or more second devices can log it into an expense record and the like.
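As a hedged example of such downstream processing on a second device, an attendance rate might be computed from the digitized records returned by the first devices; the record format below is an assumption for illustration:

```python
def attendance_rate(records):
    """Illustrative downstream computation on a second device: the fraction
    of digitized attendance-sheet records whose status is "present"."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get("status") == "present")
    return present / len(records)
```

For instance, two records with statuses "present" and "absent" yield an attendance rate of 0.5; an expense-record logger for receipts would follow the same pattern of consuming the digitized information.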
Those skilled in the art will understand that the second device that sends the image of a document to the one or more first devices 440 and the second device that obtains the digitized information from the one or more first devices 440 may be the same device or different devices.
Fig. 6 is a structural diagram schematically showing at least part of a system 500 for recognizing the information recorded on a document according to some embodiments of the present disclosure. System 500 includes one or more first devices 510 and one or more second devices 520, where the one or more first devices 510 are connected with the one or more second devices 520 through a network 530. Each of the one or more first devices 510 and one or more other first devices 510, or each element of one and one or more elements of the others, and each of the one or more second devices 520 and one or more other second devices 520, or each element of one and one or more elements of the others, may also be connected through the network 530.
Each of the one or more first devices 510 may be a computing device, a storage device, or a device having both computing and storage functions. Each of the one or more first devices 510 may contain one or more processors 511, one or more memories 512, and other components typically found in devices such as computers. Each of the one or more memories 512 in the one or more first devices 510 can store content accessible by the one or more processors 511, including instructions 513 executable by the one or more processors 511 and data 514 that can be retrieved, manipulated, or stored by the one or more processors 511.
The instructions 513 can be any instruction set to be executed directly by the one or more processors 511, such as machine code, or any instruction set to be executed indirectly, such as a script. The terms "instructions", "application", "process", "step", and "program" may be used interchangeably herein. The instructions 513 can be stored in object code format for direct processing by the one or more processors 511, or stored in any other computer language, including scripts or collections of independent source code modules that are interpreted on demand or compiled in advance. The instructions 513 may include instructions that cause one or more computing devices, such as the one or more first devices 510, to act as the first, second, and/or third neural networks. The functions, methods, and routines of the instructions 513 are explained in more detail elsewhere herein.
The one or more memories 512 can be any transitory or non-transitory computer-readable storage medium capable of storing content accessible by the one or more processors 511, such as a hard disk drive, a memory card, ROM, RAM, DVD, CD, USB memory, writable memory, read-only memory, and the like. One or more of the one or more memories 512 may include a distributed storage system, where the instructions 513 and/or the data 514 can be stored on multiple different storage devices that may be physically located at the same or different geographic locations. One or more of the one or more memories 512 can be connected to the one or more first devices 510 via the network 530 shown in Fig. 6, and/or can be directly connected to or incorporated into any of the one or more first devices 510.
The one or more processors 511 can retrieve, store, or modify the data 514 according to the instructions 513. The data 514 stored in the one or more memories 512 may include the image of a document to be recognized, various document image sample sets, parameters for the first, second, and/or third neural networks, and the like. Other data not associated with the image of the document or with the neural networks may also be stored in the one or more memories 512. For example, although the subject matter described herein is not limited by any particular data structure, the data 514 may also be stored in computer registers (not shown), or stored in a relational database as a table or an XML document with many different fields and records. The data 514 can be formatted in any computing-device-readable format, such as, but not limited to, binary values, ASCII, or Unicode. In addition, the data 514 may include any information sufficient to identify related information, such as numbers, descriptive text, proprietary codes, pointers, references to data stored in other memories such as at other network locations, or information used by a function to calculate related data.
The one or more processors 511 can be any conventional processor, such as a commercially available central processing unit (CPU), graphics processing unit (GPU), or the like. Alternatively, the one or more processors 511 can also be dedicated components, such as an application-specific integrated circuit (ASIC) or another hardware-based processor. Although not required, the one or more first devices 510 may include dedicated hardware components to carry out specific computing processes faster or more efficiently, such as performing image processing on the image of a document.
Although in Fig. 6 the one or more first devices 510 are schematically shown in the same box as the one or more processors 511, the one or more memories 512, and other elements, a first device, processor, computing device, or memory may actually include multiple processors, computing devices, or memories that may or may not be located in the same physical housing. For example, one of the one or more memories 512 may be a hard disk drive or other storage medium located in a housing different from the housing of each of the one or more first devices 510. Therefore, references to a processor, computing device, or memory are to be understood as including references to a collection of processors, computing devices, or memories that may or may not operate in parallel. For example, the one or more first devices 510 may include server computing devices operating as a load-balanced server farm. In addition, although some of the functions described above are indicated as occurring on a single computing device with a single processor, the various aspects of the subject matter described herein can be implemented by multiple first devices 510 communicating with each other, for example through the network 530.
Each of the one or more first devices 510 can be located at a different node of the network 530 and can communicate directly or indirectly with other nodes of the network 530. Although only the first and second devices 510 and 520 are shown in Fig. 6, those skilled in the art will understand that system 500 may also include other devices, with each different device located at a different node of the network 530. The network 530 and the component parts of the systems described herein (for example, the first and second devices, the first, second, and third models, and the like) can be interconnected using various protocols and systems, so that the network 530 can be part of the Internet, the World Wide Web, a specific intranet, a wide area network, or a local area network. The network 530 can use standard communication protocols such as Ethernet, WiFi, and HTTP, protocols proprietary to one or more companies, and various combinations of the foregoing. Although certain advantages are obtained when information is transmitted or received as described above, the subject matter described herein is not limited to any specific manner of information transmission.
Each of the one or more second devices 520 can be configured similarly to each of the one or more first devices 510, that is, with one or more processors 521, one or more memories 522, and instructions and data as described above. Each second device 520 can be a personal computing device intended for use by a user or a business computing device used by an enterprise, and have all the components normally used in combination with a personal computing device or a business computing device, such as a central processing unit (CPU), a memory storing data and instructions (for example, RAM and an internal hard disk drive), one or more I/O devices 523 such as a display (for example, a monitor with a screen, a touch screen, a projector, a television, or another device operable to display information), a mouse, a keyboard, a touch screen, a microphone, a speaker, and/or a network interface device, one or more cameras 524 for capturing still images or recording video streams, and all the components used for connecting these elements to each other.
Although the one or more second devices 520 may each include a full-size personal computing device, they may alternatively include mobile computing devices capable of wirelessly exchanging data with a server over a network such as the Internet. For example, the one or more second devices 520 can be mobile phones, or devices such as wireless-enabled PDAs, tablet PCs, or netbooks capable of obtaining information via the Internet. In another example, the one or more second devices 520 can be wearable computing systems.
The phrase "A or B" in the specification and claims includes "A and B" as well as "A or B", and does not exclusively mean only "A" or only "B", unless otherwise specifically stated.
In the present disclosure, a reference to "one embodiment" or "some embodiments" means that a feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment or at least some embodiments of the present disclosure. Therefore, the appearances of the phrases "in one embodiment" and "in some embodiments" in various places in the present disclosure do not necessarily refer to the same embodiment or embodiments. Furthermore, in one or more embodiments, the features, structures, or characteristics may be combined in any suitable combination and/or sub-combination.
As used herein, the word "exemplary" means "serving as an example, instance, or illustration", not as a "model" to be reproduced exactly. Any implementation exemplarily described herein is not necessarily to be construed as preferred or advantageous over other implementations. Moreover, the present disclosure is not limited by any expressed or implied theory given in the above technical field, background, summary, or detailed description.
As used herein, the word "substantially" means including any minor variation caused by defects of design or manufacture, tolerances of a device or element, environmental influences, and/or other factors. The word "substantially" also allows for differences from a perfect or ideal situation caused by parasitic effects, noise, and other practical considerations that may be present in an actual implementation.
The foregoing description may refer to elements or nodes or features being "connected" or "coupled" together. As used herein, unless expressly stated otherwise, "connected" means that one element/node/feature is directly connected to (or directly communicates with) another element/node/feature, electrically, mechanically, logically, or in other ways. Similarly, unless expressly stated otherwise, "coupled" means that one element/node/feature can be linked with another element/node/feature, directly or indirectly, mechanically, electrically, logically, or in other ways, so as to allow interaction, even if the two features may not be directly connected. That is, "coupled" is intended to encompass both direct and indirect connections of elements or other features, including connections using one or more intermediate elements.
In addition, certain terminology may be used in the following description for the purpose of reference only, and is thus not intended to be limiting. For example, unless clearly indicated by the context, words such as "first", "second", and other such numerical terms referring to structures or elements do not imply a sequence or order.
It should also be understood that the term "comprises/comprising", as used herein, specifies the presence of stated features, integers, steps, operations, units, and/or components, but does not preclude the presence or addition of one or more other features, integers, steps, operations, units, and/or components, and/or combinations thereof.
In the disclosure, term " component " and " system ", which are intended that, is related to an entity related with computer, or hard
Part, the combination of hardware and software, software or software in execution.For example, a component can be, but it is not limited to, is locating
Process, object, executable, execution thread, and/or the program etc. run on reason device.It is illustrated with, in a server
Both the application program of upper operation and the server can be a component.One or more components can reside in one
The process of execution and/or the inside of thread, and a component can be located on a computer and/or be distributed on two
Between platform or more.
Those skilled in the art should appreciate that the boundaries between the operations described above are merely illustrative. Multiple operations may be combined into a single operation, a single operation may be distributed among additional operations, and operations may be executed at least partially overlapping in time. Moreover, alternative embodiments may include multiple instances of a particular operation, and the order of operations may be altered in various other embodiments. However, other modifications, variations, and alternatives are also possible. The specification and drawings are, accordingly, to be regarded as illustrative rather than restrictive.
Although some specific embodiments of the disclosure have been described in detail by way of example, those skilled in the art should understand that the above examples are provided for illustration only, and not to limit the scope of the disclosure. The embodiments disclosed herein may be combined arbitrarily without departing from the spirit and scope of the disclosure. Those skilled in the art will appreciate that various modifications can be made to the embodiments without departing from the scope and spirit of the disclosure. The scope of the disclosure is defined by the appended claims.
Claims (17)
1. A recognition method for an image containing multiple documents, characterized by comprising:
identifying, based on the image and a pre-trained first model, each document region among one or more document regions on the image, wherein the first model is a neural-network-based model;
identifying, based on the image of each document and a pre-trained second model, each region among one or more regions on the document, wherein each region of the one or more regions is associated with all or part of the information recorded on the document, and the second model is a neural-network-based model;
cropping and obtaining the image of each region of the one or more regions, wherein the image of each region of the one or more regions is defined by a rectangle parallel to the horizontal or a rectangle inclined relative to the horizontal; and
identifying, based on the image of each region of the one or more regions and a pre-trained third model, the characters in each region of the one or more regions, so as to determine the information recorded on the document, wherein the third model is a neural-network-based model.
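Outside the claim language, the three-stage flow of claim 1 can be sketched roughly as follows. The model objects and their method names (`detect_documents`, `detect_regions`, `read_text`) are hypothetical stand-ins for the trained first, second, and third neural-network models; they are not interfaces taken from the patent.

```python
# Minimal sketch of the three-model pipeline of claim 1, under the assumption
# that each trained model exposes a single detection/recognition call.

def recognize_image(image, first_model, second_model, third_model):
    """Identify documents, their regions, and the characters in each region."""
    results = []
    # Stage 1: locate each document region on the multi-document image.
    for doc_box in first_model.detect_documents(image):
        doc_image = crop(image, doc_box)
        doc_info = {}
        # Stage 2: locate the information-bearing regions on the document.
        for region_box in second_model.detect_regions(doc_image):
            region_image = crop(doc_image, region_box)
            # Stage 3: recognize the characters in the cropped region.
            doc_info[tuple(region_box)] = third_model.read_text(region_image)
        results.append(doc_info)
    return results

def crop(image, box):
    """Crop an axis-aligned box (x, y, w, h) from a nested-list image."""
    x, y, w, h = box
    return [row[x:x + w] for row in image[y:y + h]]
```

In a real system, `crop` would operate on pixel arrays and the inclined-rectangle case of the claim would require the slant correction discussed under claim 3.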
2. The recognition method according to claim 1, characterized in that after the step of identifying each document region among the one or more document regions on the image, each document region is cropped to obtain the image of each document, and the image of each document is then input into the second model separately for processing.
3. The recognition method according to claim 1, characterized in that after the step of cropping and obtaining the image of each region of the one or more regions, in response to a rectangle being inclined relative to the horizontal, slant-correction processing is performed on the image of each region, and the processed image of each region is input into the third model to identify the characters in each region of the one or more regions.
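The slant correction of claim 3 amounts to rotating an inclined region about its centre until its sides align with the horizontal. The sketch below illustrates this on corner coordinates only, under that assumption; a real implementation would apply the same affine transform to the pixels, for example with an image-warping routine from an image-processing library.

```python
import math

def deskew_points(points, angle_deg, center):
    """Rotate points by -angle_deg about center, undoing the measured slant."""
    a = math.radians(-angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    cx, cy = center
    out = []
    for x, y in points:
        # Translate to the centre, rotate, translate back.
        dx, dy = x - cx, y - cy
        out.append((cx + dx * cos_a - dy * sin_a,
                    cy + dx * sin_a + dy * cos_a))
    return out
```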
4. The recognition method according to claim 1, characterized in that the method identifies, by means of the third model, the characters in each region of the one or more regions based on the image of each region of the one or more regions and its position within the whole document.
5. The recognition method according to claim 1, characterized in that the method further comprises:
further identifying, based on the image of the document and the second model, the information type of the information associated with each region of the one or more regions; and
determining the information recorded on the document based on the identified information type of the information associated with each region of the one or more regions, and the identified characters in each region of the one or more regions.
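Claim 5 pairs two per-region outputs: the information type from the second model and the recognized characters from the third model. A minimal sketch of that merge step is shown below; the field names (`invoice_number`, `amount`) are hypothetical examples, not taken from the patent.

```python
# Merge per-region information types and recognized characters into the
# structured information recorded on the document (sketch of claim 5).

def assemble_document_info(region_types, region_texts):
    """Map each region's information type to that region's recognized text."""
    info = {}
    for region_id, info_type in region_types.items():
        info[info_type] = region_texts.get(region_id, "")
    return info
```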
6. The recognition method according to claim 1, characterized in that before identifying each region among the one or more regions on the document, the method further comprises:
identifying, based on the image of the document and a pre-trained fourth model, the category of the document, wherein the fourth model is a neural-network-based model; and
selecting, according to the identified category, the second model and/or the third model to be used.
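The selection step of claim 6 can be pictured as a lookup from the category produced by the fourth model to the second/third models to use. The registry below is a hypothetical illustration (the category keys and model names are invented, though claim 7 states that the category includes at least a language).

```python
# Sketch of claim 6's model selection: document category -> (second, third)
# model identifiers. Keys and names are illustrative only.
MODEL_REGISTRY = {
    "zh": ("region_detector_zh", "ocr_zh"),
    "en": ("region_detector_en", "ocr_en"),
}

def select_models(category, registry=MODEL_REGISTRY, default="en"):
    """Pick the second and third models for the identified document category."""
    return registry.get(category, registry[default])
```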
7. The recognition method according to claim 6, characterized in that the category of the document includes at least a language.
8. The recognition method according to claim 1, characterized in that the first model is obtained by the following process:
performing annotation processing on each image sample containing multiple documents in a first document-image-sample training set, so as to mark each document region among one or more document regions in each image sample; and
training a first neural network with the annotated first document-image-sample training set to obtain the first model.
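The annotation step of claim 8 produces, for each training image, one bounding box per document region; the resulting pairs then train the first (object-detection style) neural network. The sample structure below is a hypothetical illustration of such an annotated set, not a format specified by the patent.

```python
# Sketch of claim 8's annotation processing: each image sample is labelled
# with one axis-aligned box (x, y, w, h) per document region it contains.

def annotate_sample(image_id, document_boxes):
    """Build one annotated sample: an image id plus its document boxes."""
    return {
        "image": image_id,
        "documents": [tuple(box) for box in document_boxes],
    }

def build_training_set(raw_annotations):
    """Turn raw (image_id, boxes) pairs into an annotated training set."""
    return [annotate_sample(i, b) for i, b in raw_annotations]
```

The same pattern extends to claims 9 to 11, with the labels changed to information-bearing regions, per-region characters, or document categories respectively.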
9. The recognition method according to claim 1, characterized in that the second model is obtained by the following process:
performing annotation processing on each document image sample in a second document-image-sample training set, so as to mark each region among one or more regions in each document image sample, each region of the one or more regions being associated with all or part of the information in the document image sample; and
training a second neural network with the annotated second document-image-sample training set to obtain the second model.
10. The recognition method according to claim 1, characterized in that the third model is obtained by the following process:
performing annotation processing on each document image sample in a third document-image-sample training set, so as to mark each region among one or more regions in each document image sample and the characters in each region, each region of the one or more regions being associated with all or part of the information in the document image sample; and
training a third neural network with the annotated third document-image-sample training set to obtain the third model.
11. The recognition method according to claim 6, characterized in that the fourth model is obtained by the following process:
performing annotation processing on each document image sample in a fourth document-image-sample training set, so as to mark the category of each document image sample; and
training a fourth neural network with the annotated fourth document-image-sample training set to obtain the fourth model.
12. The recognition method according to claim 8, characterized in that the first neural network is a neural network based on an object-detection algorithm.
13. The recognition method according to claim 9, characterized in that the second neural network is established based on a deep residual network.
14. The recognition method according to claim 10, characterized in that the third neural network is established based on a recurrent neural network.
15. The recognition method according to claim 11, characterized in that the fourth neural network is based on a deep convolutional neural network.
16. A document information recognition system, characterized by comprising:
a first model, the first model being a neural-network-based model;
a second model, the second model being a neural-network-based model;
a third model, the third model being a neural-network-based model; and
one or more first devices, the one or more first devices being configured to:
identify, based on the image and the pre-trained first model, each document region among one or more document regions on the image;
identify, based on the image of each document and the second model, each region among one or more regions on the document, each region of the one or more regions being associated with all or part of the information recorded on the document;
crop and obtain the image of each region of the one or more regions, the image of each region of the one or more regions being defined by a rectangle parallel to the horizontal or a rectangle inclined relative to the horizontal; and
identify, based on the image of each region of the one or more regions and the pre-trained third model, the characters in each region of the one or more regions, so as to determine the information recorded on the document.
17. A non-transitory computer-readable storage medium, characterized in that a series of computer-executable instructions are stored in the non-transitory computer-readable storage medium, and when executed by one or more computing devices, the series of computer-executable instructions cause the one or more computing devices to:
identify, based on the image and a pre-trained first model, each document region among one or more document regions on the image, wherein the first model is a neural-network-based model;
identify, based on the image of each document and a pre-trained second model, each region among one or more regions on the document, wherein each region of the one or more regions is associated with all or part of the information recorded on the document, and the second model is a neural-network-based model;
crop and obtain the image of each region of the one or more regions, wherein the image of each region of the one or more regions is defined by a rectangle parallel to the horizontal or a rectangle inclined relative to the horizontal; and
identify, based on the image of each region of the one or more regions and a pre-trained third model, the characters in each region of the one or more regions, so as to determine the information recorded on the document, wherein the third model is a neural-network-based model.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810917232.1A CN109241857A (en) | 2018-08-13 | 2018-08-13 | Document information recognition method and system |
CN2018109172321 | 2018-08-13 |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110390320A true CN110390320A (en) | 2019-10-29 |
Family
ID=65070522
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810917232.1A Pending CN109241857A (en) | 2018-08-13 | 2018-08-13 | Document information recognition method and system |
CN201910729840.4A Pending CN110390320A (en) | 2018-08-13 | 2019-08-08 | Recognition method and system for an image containing information of multiple documents |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810917232.1A Pending CN109241857A (en) | 2018-08-13 | 2018-08-13 | Document information recognition method and system |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN109241857A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110866495A (en) * | 2019-11-14 | 2020-03-06 | 杭州睿琪软件有限公司 | Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium |
CN111160188A (en) * | 2019-12-20 | 2020-05-15 | 中国建设银行股份有限公司 | Financial bill identification method, device, equipment and storage medium |
CN112241727A (en) * | 2020-10-30 | 2021-01-19 | 深圳供电局有限公司 | Multi-ticket identification method and system and readable storage medium |
CN112699860A (en) * | 2021-03-24 | 2021-04-23 | 成都新希望金融信息有限公司 | Method for automatically extracting and sorting effective information in personal tax APP operation video |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109919125B (en) * | 2019-03-19 | 2021-03-26 | 厦门商集网络科技有限责任公司 | Travel route restoration method and system based on bill recognition |
CN110956739A (en) * | 2019-05-09 | 2020-04-03 | 杭州睿琪软件有限公司 | Bill identification method and device |
CN111967286A (en) | 2019-05-20 | 2020-11-20 | 京东方科技集团股份有限公司 | Method and device for identifying information bearing medium, computer equipment and medium |
CN110796145B (en) * | 2019-09-19 | 2024-01-19 | 平安科技(深圳)有限公司 | Multi-certificate segmentation association method and related equipment based on intelligent decision |
CN111461133B (en) * | 2020-04-20 | 2023-04-18 | 上海东普信息科技有限公司 | Express waybill item name identification method, device, equipment and storage medium |
CN111768820A (en) * | 2020-06-04 | 2020-10-13 | 上海森亿医疗科技有限公司 | Paper medical record digitization and target detection model training method, device and storage medium |
CN112395995A (en) * | 2020-11-19 | 2021-02-23 | 深圳供电局有限公司 | Method and system for automatically filling and checking bill according to mobile financial bill |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105303363A (en) * | 2015-09-28 | 2016-02-03 | 四川长虹电器股份有限公司 | Data processing method and data processing system |
CN105354177A (en) * | 2015-09-28 | 2016-02-24 | 四川长虹电器股份有限公司 | Data processing system and data processing method |
CN107220648A (en) * | 2017-04-11 | 2017-09-29 | 平安科技(深圳)有限公司 | The character identifying method and server of Claims Resolution document |
- 2018-08-13: CN application CN201810917232.1A, published as CN109241857A, status: active, Pending
- 2019-08-08: CN application CN201910729840.4A, published as CN110390320A, status: active, Pending
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110866495A (en) * | 2019-11-14 | 2020-03-06 | 杭州睿琪软件有限公司 | Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium |
CN110866495B (en) * | 2019-11-14 | 2022-06-28 | 杭州睿琪软件有限公司 | Bill image recognition method, bill image recognition device, bill image recognition equipment, training method and storage medium |
CN111160188A (en) * | 2019-12-20 | 2020-05-15 | 中国建设银行股份有限公司 | Financial bill identification method, device, equipment and storage medium |
CN112241727A (en) * | 2020-10-30 | 2021-01-19 | 深圳供电局有限公司 | Multi-ticket identification method and system and readable storage medium |
CN112699860A (en) * | 2021-03-24 | 2021-04-23 | 成都新希望金融信息有限公司 | Method for automatically extracting and sorting effective information in personal tax APP operation video |
CN112699860B (en) * | 2021-03-24 | 2021-06-22 | 成都新希望金融信息有限公司 | Method for automatically extracting and sorting effective information in personal tax APP operation video |
Also Published As
Publication number | Publication date |
---|---|
CN109241857A (en) | 2019-01-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110390320A (en) | Recognition method and system for an image containing information of multiple documents | |
CN108564035A (en) | Method and system for identifying information recorded on a document | |
Abbott | Applied predictive analytics: Principles and techniques for the professional data analyst | |
CN109101469B (en) | Extracting searchable information from digitized documents | |
CN111428599B (en) | Bill identification method, device and equipment | |
US10957086B1 (en) | Visual and digital content optimization | |
WO2020143377A1 (en) | Industry recognition model determination method and apparatus | |
CN108132887B (en) | User interface method of calibration, device, software testing system, terminal and medium | |
CN110363084A (en) | Class state detection method, device, storage medium and electronic apparatus | |
CN107657056A (en) | Method and apparatus based on artificial intelligence displaying comment information | |
JPWO2019008766A1 (en) | Voucher processing system and voucher processing program | |
US20220292861A1 (en) | Docket Analysis Methods and Systems | |
CN105843818A (en) | Training device, training method, determining device, and recommendation device | |
US20240202433A1 (en) | Standardized form recognition method, associated computer program product, processing and learning systems | |
CN113807066A (en) | Chart generation method and device and electronic equipment | |
US9070010B2 (en) | Image check content estimation and use | |
US9418308B2 (en) | Automatic evaluation of line weights | |
JP6810303B1 (en) | Data processing equipment, data processing method and data processing program | |
Robiolo | How simple is it to measure software size and complexity for an it practitioner? | |
US7676427B1 (en) | System and method of continuous assurance | |
JP6844076B1 (en) | Data processing equipment, data processing methods and programs | |
JP6784788B2 (en) | Information processing equipment, information processing methods and programs | |
CN112766391B (en) | Method, system, equipment and medium for making document | |
WO2022049689A1 (en) | Data processing device, data processing method, and program | |
WO2022054136A1 (en) | Data processing device, data processing method, and program |
Legal Events

Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20191029 |