WO2022142549A1 - Text recognition method, device and storage medium

Text recognition method, device and storage medium

Info

Publication number
WO2022142549A1
Authority
WO
WIPO (PCT)
Prior art keywords: area, field, processed, certificate, target
Application number: PCT/CN2021/121541
Other languages: English (en), French (fr)
Inventors: 詹明捷, 刘学博, 梁鼎
Original Assignee: 北京市商汤科技开发有限公司
Application filed by 北京市商汤科技开发有限公司

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/60: Type of objects
    • G06V 20/62: Text, e.g. of license plates, overlay texts or captions on TV images
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/30: Semantic analysis
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/08: Learning methods
    • G06V 30/00: Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V 30/10: Character recognition

Definitions

  • the present disclosure relates to the technical field of text recognition, and in particular, to a text recognition method, device and storage medium.
  • Optical Character Recognition (OCR) can convert characters in an image into a text format for further editing and processing by word-processing software.
  • the traditional optical character recognition method generally only supports the recognition of images with a fixed layout, that is, the positions of the characters to be recognized in the layout are required to be fixed. For images whose layout is not fixed, the recognition accuracy is low.
  • the present disclosure provides a text recognition method, device and storage medium.
  • a text recognition method, comprising: acquiring a first area in a template certificate; determining a second area corresponding to the first area in a certificate to be processed, where the certificate to be processed is of the same certificate type as the template certificate, and the relative position of the first area in the template certificate is the same as the relative position of the second area in the certificate to be processed; and performing text recognition on the second area according to a recognition mode corresponding to attribute information of a field in the first area.
  • the attribute information of the field includes at least one of a character type of the field and a font type of the field.
  • the method further includes: after performing text recognition on the second area, determining, based on at least one of position information and semantic information of the text in the second area, whether the second area is a target area that needs to be adjusted; adjusting the target area; and performing text recognition on the adjusted target area.
  • determining, based on at least one of the position information and semantic information of the fields in the second area, whether the second area is a target area that needs to be adjusted includes: determining a second area that satisfies at least one of the following conditions as the target area: the position of a field in the second area exceeds the boundary of the second area; the semantics of a field in the second area is incomplete; a field in the second area and a field in the first area belong to different semantic categories.
  • the adjusting the target area includes: when the number of target areas is greater than a preset number threshold and the offset directions of the target areas are the same, determining an overall offset of the plurality of target areas; and adjusting the plurality of target areas based on the overall offset.
  • the adjusting the target area includes: when the number of the target areas is not greater than a preset number threshold, or there are at least two target areas with different offset directions, determining an offset of a first target area in the document to be processed; and adjusting a second target area other than the first target area based on the offset of the first target area.
  • the first target area is detected before the second target area.
  • the adjusting the target area includes: searching the certificate to be processed for a field having the same semantic category as a field of the first area; and adjusting the target area to the second area where the found field is located.
  • the determining the second area corresponding to the first area in the document to be processed includes: determining the second area corresponding to the first area in the document to be processed based on a pre-established transformation matrix;
  • the transformation matrix is determined in the following manner: establishing k first matrices based on k third areas in the template certificate and k fourth areas in the certificate to be processed, where 1 ≤ k < N, k and N are positive integers, N is the total number of groups of third areas and fourth areas, and the third area and the fourth area in each group correspond one-to-one and include the same text information; for each of the k first matrices, matching the remaining N-k groups of third areas and fourth areas based on the first matrix, and determining the number of successfully matched groups; and determining, among the k first matrices, the first matrix with the largest number of successfully matched groups as the transformation matrix.
  • the establishing k first matrices based on the k third areas in the template certificate and the k fourth areas in the certificate to be processed includes: selecting a plurality of point pairs from the i-th third area in the template certificate and the i-th fourth area in the certificate to be processed, where i is a positive integer, and the plurality of point pairs may include a center point pair of the first field, a center point pair of the last field, a midpoint pair of the upper boundary of the area, and a midpoint pair of the lower boundary of the area; and establishing the i-th first matrix among the k first matrices based on the plurality of point pairs in the i-th third area and the i-th fourth area.
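  • As a non-limiting illustration of how such a first matrix could be built from the four point pairs named above, the sketch below fits a 3x3 perspective transform with OpenCV; the use of OpenCV and the exact transform model are assumptions for illustration, not part of the disclosed method.

```python
import numpy as np
import cv2  # OpenCV is used here only as one convenient way to fit a 3x3 perspective matrix


def build_first_matrix(third_area_points, fourth_area_points):
    """Fit the i-th first matrix M_i from the four point pairs of one
    third-area / fourth-area group: the center point of the first field,
    the center point of the last field, the midpoint of the upper boundary
    and the midpoint of the lower boundary of the area.

    third_area_points / fourth_area_points: four (x, y) tuples taken from the
    template certificate and the certificate to be processed, in the same order.
    """
    src = np.float32(third_area_points)   # points in the template certificate
    dst = np.float32(fourth_area_points)  # corresponding points in the certificate to be processed
    return cv2.getPerspectiveTransform(src, dst)
```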
  • the method further includes: after performing text recognition on the second area, splitting the fields in the second area into multiple new second areas based on at least one of semantic information and position information of the fields in the second area; and performing text recognition on each of the new second areas respectively.
  • the splitting the fields in the second area into multiple new second areas based on the semantic information of the fields in the second area includes: dividing the fields in the second area into multiple field groups based on the semantic information of the fields in the second area, where the semantics of the fields in different field groups are unrelated; and splitting each field group into a new second area.
  • the number of the second areas is multiple; the method further includes: after text recognition is performed on the second areas, performing semantic recognition on the recognition results of at least two of the second areas as a whole; and outputting text information based on the semantic recognition result of the at least two second areas as a whole.
  • the performing text recognition on the second area based on the recognition mode corresponding to the attribute information of the field in the first area includes: calling a corresponding neural network based on the attribute information of the field in the first area; and performing text recognition on the second area through the called neural network.
  • the certificate to be processed includes a fixed field and a non-fixed field; after performing text recognition on the second area, the method further includes: sending, to a target device, an association relationship between the identified non-fixed field and the fixed field in the certificate to be processed, so that the target device displays the fixed field and the identified non-fixed field in association based on the association relationship.
  • a text recognition device, the device including: an acquisition module, configured to acquire a first area in a template certificate; a determination module, configured to determine a second area corresponding to the first area in a certificate to be processed, where the certificate to be processed is of the same certificate type as the template certificate, and the relative position of the first area in the template certificate is the same as the relative position of the second area in the certificate to be processed; and a first recognition module, configured to perform text recognition on the second area based on a recognition mode corresponding to attribute information of a field in the first area.
  • the attribute information of the field includes the character type of the field and/or the font type of the field.
  • the apparatus further includes: an adjustment module, configured to determine whether the second area is a target area to be adjusted based on the position information and/or semantic information of the text in the second area, adjust the target area, and perform text recognition on the adjusted target area.
  • the adjustment module is configured to: determine a second area that satisfies at least one of the following conditions as the target area: the position of the field in the second area exceeds the boundary of the second area; the semantics of the fields in the second area are incomplete; the fields in the second area and the fields in the first area belong to different semantic categories.
  • the adjustment module is configured to: in the case that the number of the target areas is greater than a preset number threshold and the offset directions of the respective target areas are the same, determine the overall offset of the multiple target areas, and adjust the multiple target areas based on the overall offset.
  • the adjustment module is configured to: in the case that the number of the target areas is not greater than a preset number threshold, or the offset directions of at least two target areas are different, determine an offset of a first target area in the certificate to be processed, and adjust a second target area other than the first target area based on the offset of the first target area.
  • the first target area is detected before the second target area.
  • the adjustment module is configured to search the certificate to be processed for a field having the same semantic category as the field of the first area, and adjust the target area to the second area where the found field is located.
  • the determining module is configured to: determine a second area corresponding to the first area in the certificate to be processed based on a pre-established transformation matrix; establish k first matrices based on k third areas in the template certificate and k fourth areas in the certificate to be processed, where 1 ≤ k < N, k and N are both positive integers, N is the total number of groups of third areas and fourth areas, and the third area and the fourth area in each group correspond one-to-one and include the same text information; for each of the k first matrices, match the remaining N-k groups of third areas and fourth areas based on the first matrix, and determine the number of successfully matched groups; and determine the first matrix with the largest number of successfully matched groups as the transformation matrix.
  • the determining module is configured to: select a plurality of point pairs from the i-th third area in the template certificate and the i-th fourth area in the certificate to be processed, where i is a positive integer, and the plurality of point pairs may include a center point pair of the first field, a center point pair of the last field, a midpoint pair of the upper boundary of the area, and a midpoint pair of the lower boundary of the area; and determine the i-th first matrix among the k first matrices based on the plurality of point pairs selected from the i-th third area and the i-th fourth area.
  • the apparatus further includes: a splitting module, configured to, after text recognition is performed on the second area, split the fields in the second area into multiple new second areas based on semantic information and/or position information of the fields in the second area; and a second recognition module, configured to perform text recognition on each new second area respectively.
  • the splitting module is configured to: divide the fields in the second area into a plurality of field groups based on the semantic information of the fields in the second area, where the semantics of the fields in different field groups are unrelated; and split each field group into a new second area.
  • the apparatus further includes: a third recognition module, configured to, after text recognition is performed on the second areas, perform semantic recognition on the recognition results of at least two of the second areas as a whole; and an output module, configured to output text information based on the semantic recognition result of the at least two second areas as a whole.
  • the first recognition module is configured to: call a corresponding neural network based on attribute information of a field in the first area; perform text recognition on the second area through the called neural network.
  • the certificate to be processed includes a fixed field and a non-fixed field; after text recognition is performed on the second area, an association relationship between the identified non-fixed field and the fixed field in the certificate to be processed is sent to a target device, so that the target device displays the fixed field and the identified non-fixed field in association based on the association relationship.
  • a text recognition system, including: a client, configured to upload a certificate to be processed and send the certificate to be processed to a server; and the server, configured to execute the method of any embodiment of the present disclosure.
  • the certificate to be processed includes a fixed field and a non-fixed field; the server is further configured to: after performing text recognition on the second area, send to the client an association relationship between the identified non-fixed field and the fixed field in the certificate to be processed; and the client is further configured to: in response to receiving the association relationship sent by the server, display the fixed field and the identified non-fixed field in association based on the association relationship.
  • a computer-readable storage medium on which a computer program is stored, where the program, when executed by a processor, implements the method described in any one of the embodiments.
  • a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the method described in any one of the embodiments when executing the program.
  • a text processing method applied to a client, the method includes: uploading a certificate to be processed to a server, where the certificate to be processed includes fixed fields and non-fixed fields;
  • receiving an association relationship, sent by the server, between the identified non-fixed field and the fixed field in the certificate to be processed, and displaying the fixed field and the identified non-fixed field in association based on the association relationship; where the association relationship between the identified non-fixed field and the fixed field in the certificate to be processed is a recognition result obtained by the server by recognizing the certificate to be processed using the method described in any of the foregoing embodiments of the present disclosure.
  • a computer program including computer-readable codes, when the computer-readable codes are executed by a processor, the methods described in any of the foregoing embodiments of the present disclosure are implemented.
  • the embodiments of the present disclosure perform text recognition on the certificate to be processed based on the template certificate. Since the certificate to be processed is of the same type as the template certificate, the area to be recognized can be accurately located in the certificate to be processed based on the template certificate. In addition, since the attribute information of the field in the first area of the template certificate is the same as that of the field in the second area of the certificate to be processed, using different recognition modes for the second area according to the attribute information of the field in the first area can reduce recognition errors when recognizing fields of different categories but with high similarity, thereby improving the accuracy of text recognition.
  • FIG. 1 is a flowchart of a text recognition method according to an embodiment of the present disclosure.
  • FIG. 2 is a schematic diagram of establishing a conversion matrix according to an embodiment of the present disclosure.
  • FIGS. 3A to 3C are schematic diagrams of situations in which the second area needs to be adjusted according to an embodiment of the present disclosure.
  • FIGS. 4A to 4C are schematic diagrams of an adjustment manner of the second area according to an embodiment of the present disclosure.
  • FIGS. 5A to 5C are schematic diagrams of creating a template certificate according to an embodiment of the present disclosure.
  • FIG. 6 is a schematic diagram of a text recognition result according to an embodiment of the present disclosure.
  • FIG. 7 is a block diagram of a text recognition apparatus according to an embodiment of the present disclosure.
  • FIG. 8 is a schematic structural diagram of a computer device according to an embodiment of the present disclosure.
  • FIG. 9 is a schematic diagram of a text recognition system according to an embodiment of the present disclosure.
  • the terms first, second, third, etc. may be used in this disclosure to describe various pieces of information, but such information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other.
  • first information may also be referred to as the second information, and similarly, the second information may also be referred to as the first information, without departing from the scope of the present disclosure.
  • word "if” as used herein can be interpreted as "at the time of” or "when” or "in response to determining.”
  • an embodiment of the present disclosure provides a text recognition method, and the method may include:
  • Step 101 Obtain the first area in the template certificate
  • Step 102 Determine a second area corresponding to the first area in the certificate to be processed, where the certificate to be processed is of the same certificate type as the template certificate, and the relative position of the first area in the template certificate is the same as the relative position of the second area in the certificate to be processed;
  • Step 103 Perform text recognition on the second area based on the recognition method corresponding to the attribute information of the field in the first area.
  • the methods of the embodiments of the present disclosure may be executed by a server, and the server may be a single server or a server cluster including multiple servers.
  • the server may pre-store the template credential.
  • a number of different categories of template credentials can be stored.
  • the different types of template certificates may be ID cards, driver's licenses, Hong Kong and Macau passes, and the like.
  • the first area may be an area including a non-fixed field in the template certificate, and a non-fixed field refers to a field with different text contents in multiple different certificates of the same category.
  • the template document also includes fixed fields, ie, fields with the same text content in multiple different documents of the same category.
  • for example, the ID cards of different residents all include the fields "name" and "gender"; then "name" and "gender" are fixed fields, also called reference fields.
  • for example, the name on Zhang San's ID card is specifically "Zhang San", and the name on Li Si's ID card is specifically "Li Si"; then "Zhang San" and "Li Si" are non-fixed fields, also known as identification fields.
  • One or more first areas may be included in a template document.
  • each first area may include only one text line (called a group of fields), and each text line may include one or more characters arranged horizontally.
  • the characters may include, but are not limited to, at least one or a combination of at least two of numbers, letters, Chinese characters, and symbols.
  • the first area can be manually selected by the user when creating the template certificate, or obtained through a pre-trained neural network or other methods.
  • a second area corresponding to the first area in the document to be processed may be determined.
  • the certificate to be processed may be in a picture format or a portable document format (Portable Document Format, PDF) and other formats.
  • the document to be processed is of the same document type as the template document.
  • for example, if the certificate to be processed is an identity card, the template certificate is also an identity card.
  • the relative position of the first area in the template document is the same as the relative position of the second area in the document to be processed.
  • the relative position of an area in the document refers to a normalized position obtained by normalizing the position of the area based on the size of the document.
  • the position of an area can be represented by the position of a feature point on the area, and the feature point can be a center point or a corner point of the area.
  • for example, if the coordinates of the feature point of the first area are (x1, y1), the length (the size in the horizontal direction) and the height (the size in the vertical direction) of the template certificate are (X1, Y1), the coordinates of the feature point of the second area are (x2, y2), and the length and the height of the certificate to be processed are (X2, Y2), then the following conditions are met: x1 / X1 = x2 / X2 and y1 / Y1 = y2 / Y2.
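  • A minimal sketch of this normalization, assuming the feature point is the center point of the area and that the size of the area scales with the document size (the helper name and the box representation are illustrative, not defined by the disclosure):

```python
def map_first_area_to_certificate(first_area, template_size, certificate_size):
    """Locate the second area so that its relative (normalized) position equals
    that of the first area, i.e. x1 / X1 = x2 / X2 and y1 / Y1 = y2 / Y2.

    first_area:       (x1, y1, w1, h1) feature point and size in the template certificate
    template_size:    (X1, Y1) length and height of the template certificate
    certificate_size: (X2, Y2) length and height of the certificate to be processed
    """
    x1, y1, w1, h1 = first_area
    X1, Y1 = template_size
    X2, Y2 = certificate_size
    sx, sy = X2 / X1, Y2 / Y1          # horizontal and vertical scale factors
    return (x1 * sx, y1 * sy, w1 * sx, h1 * sy)
```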
  • the third area including the reference field in the template certificate and the fourth area including the reference field in the certificate to be processed may be obtained first. Then, multiple point pairs are selected from the third and fourth areas. Each point pair includes a first point in the third area and a second point in the fourth area, and the relative position of the first point in the third area is the same as the relative position of the second point in the fourth area.
  • the plurality of point pairs may include a point pair consisting of the center point of the first field of the third area and the center point of the first field of the fourth area, a point pair consisting of the center point of the last field of the third area and the center point of the last field of the fourth area, a point pair consisting of the midpoints of the upper boundaries of the two areas, and a point pair consisting of the midpoints of the lower boundaries of the two areas.
  • a transformation matrix may be established from the plurality of point pairs, and then the first region is transformed based on the transformation matrix to determine the second region.
  • the third region and the fourth region may also be filtered, and only the completely matched third region and the fourth region are retained.
  • the exact match means that the fields in the third area and the fourth area are completely identical.
  • the final reserved third and fourth regions can be used to build the transformation matrix.
  • k first matrices may be established based on k third areas in the template certificate and k fourth areas in the certificate to be processed, where 1 ≤ k < N, k and N are both positive integers, N is the total number of groups of third areas and fourth areas, and the third area and the fourth area in each group correspond one-to-one and include the same text information.
  • the remaining N-k groups of the third region and the fourth region are matched based on the first matrix, and the number of successfully matched groups is determined.
  • the first matrix with the largest number of successfully matched groups is determined as the transformation matrix.
  • the numerical values in this embodiment are only for illustration, and the numerical values used in practical applications are not limited thereto.
  • for example, suppose there are 5 groups of third areas and fourth areas; the i-th group includes the i-th third area and the i-th fourth area, that is, the i-th third area corresponds to the i-th fourth area, where i is a positive integer and 1 ≤ i ≤ 5.
  • Multiple point pairs can be selected from the first third area and the first fourth area to establish a first matrix M1, and multiple point pairs can be selected from the second third area and the second fourth area to establish a first matrix M2.
  • then, based on the first matrix M1, the multiple point pairs in the third third area and the third fourth area, in the fourth third area and the fourth fourth area, and in the fifth third area and the fifth fourth area are matched, and the number m1 of successfully matched groups is determined.
  • similarly, based on the first matrix M2, the multiple point pairs in the remaining groups of third and fourth areas are matched, and the number m2 of successfully matched groups is determined.
  • the first matrix corresponding to the larger of m1 and m2 is determined as the transformation matrix.
  • the above-mentioned way of selecting the optimal first matrix from the plurality of first matrices as the transformation matrix improves the accuracy of determining the second area, thereby improving the accuracy of text recognition.
  • the above-mentioned processing of the first matrix M1 and the first matrix M2 may be performed in parallel or in series, which is not limited in the present disclosure.
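  • The selection of the optimal first matrix can be sketched as follows; the matching criterion (projecting the third-area points with a candidate matrix and requiring them to fall within a pixel tolerance of the fourth-area points) is an assumption, since the disclosure does not fix how a successful match is measured:

```python
import numpy as np


def apply_matrix(matrix, points):
    """Apply a 3x3 perspective matrix to an (n, 2) array of points."""
    pts = np.hstack([np.asarray(points, dtype=float), np.ones((len(points), 1))])
    mapped = pts @ matrix.T
    return mapped[:, :2] / mapped[:, 2:3]   # back from homogeneous coordinates


def group_matches(matrix, third_points, fourth_points, tol=5.0):
    """A group counts as successfully matched if every projected third-area
    point lands within `tol` pixels of its fourth-area counterpart."""
    projected = apply_matrix(matrix, third_points)
    dist = np.linalg.norm(projected - np.asarray(fourth_points, dtype=float), axis=1)
    return bool(np.all(dist <= tol))


def select_transformation_matrix(candidate_matrices, remaining_groups):
    """Among the k candidate first matrices (e.g. M1 and M2), return the one
    that successfully matches the largest number of the remaining groups of
    third and fourth areas."""
    best_matrix, best_count = None, -1
    for matrix in candidate_matrices:
        count = sum(group_matches(matrix, third, fourth) for third, fourth in remaining_groups)
        if count > best_count:
            best_matrix, best_count = matrix, count
    return best_matrix
```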
  • a recognition method for performing text recognition on the second area may be determined based on attribute information of the fields in the first area.
  • the attribute information of the field includes the character type of the field and/or the font type of the field.
  • the fields in the first area may include one or more characters. The character types may include, but are not limited to, one of numbers, letters, symbols, Chinese characters, and mixed, where a mixed character type means that the fields in the first area include multiple character types, such as a mix of numbers and letters, or a mix of numbers and Chinese characters.
  • the font type of the field includes but is not limited to one of Song, Kai, Times New Roman, and mixed, and the mixed font type means that the field in the first area includes multiple font types.
  • the attribute information of the fields in the first area can be manually input by the user when creating the template, or can be identified through a neural network model.
  • Performing text recognition on the second area based on the recognition method corresponding to the attribute information of the fields in the first area can reduce recognition errors when the field types are different but the similarity is high, thereby improving the text recognition accuracy.
  • for example, if the fields in the second area include the letter "O", it can easily be confused with the number "0" when a text recognition method common to all fields is used.
  • in this case, if the character type of the field in the corresponding first area is letters, a recognition method dedicated to letter-type text can be used to avoid recognizing the letter "O" as the number "0", thereby improving the recognition accuracy.
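  • One simple way to realize an attribute-dependent recognition mode is to restrict the decoder's output alphabet to the character type recorded for the first area; the sketch below is an illustrative assumption (the candidate lists and character sets are hypothetical), not the disclosed implementation:

```python
import string

# Allowed characters per character type of the field in the first area (illustrative).
CHARSETS = {
    "number": set(string.digits),
    "letter": set(string.ascii_letters),
    "mixed":  set(string.digits) | set(string.ascii_letters),
}


def decode_with_character_type(candidates, char_type):
    """Decode per-position candidate lists [(char, score), ...] produced by any
    generic OCR decoder, keeping only characters legal for the field's type,
    so that e.g. 'O' is never decoded as '0' in a letter-type field."""
    allowed = CHARSETS.get(char_type, CHARSETS["mixed"])
    text = []
    for position in candidates:
        legal = [(c, s) for c, s in position if c in allowed]
        best = max(legal or position, key=lambda cs: cs[1])   # fall back if nothing is legal
        text.append(best[0])
    return "".join(text)


# Example: for a letter-type field the digit '0' is suppressed in favour of 'O'.
# decode_with_character_type([[("0", 0.6), ("O", 0.4)]], "letter")  ->  "O"
```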
  • the photo of the certificate to be processed uploaded by the user may be different from the template certificate due to the shooting angle and other reasons.
  • the size and spacing of the second area in the document to be processed may be different from the size and spacing of the corresponding first area in the template document. Therefore, after the text recognition is performed on the second area, it can also be determined whether the second area is a target area that needs to be adjusted based on the position information and/or semantic information of the text in the second area, and the The target area is adjusted, and the text recognition is performed again on the adjusted target area.
  • the adjusting includes adjusting the orientation of the target area so that the semantic information of the fields in the target area is complete, and may also include adjusting the size of the target area so that only one text line is included in a target area.
  • the second area that satisfies at least one of the following conditions can be used as the target area:
  • Condition 1 The semantics of the fields in the second area are incomplete, that is, the second area only includes part of a sentence or part of a word.
  • as shown in FIG. 3A, the solid-line frame is the second area. It can be seen that the boundary of the second area splits the two characters belonging to the same word "小区" (community), so that the second area only contains the character "小"; that is, the semantics of the field in the second area are incomplete.
  • Condition 2 The position of the field in the second area exceeds the boundary of the second area. As shown in FIG. 3B , the field in the second area exceeds the upper border of the second area.
  • Condition 3 The fields in the second area and the fields in the first area belong to different semantic categories. As shown in Fig. 3C, the semantic class of the field in the first area of the template document is "name”, while the semantic class of the field in the second area of the document to be processed is "age”, and the two belong to different semantic classes.
  • the text recognition results may deviate greatly from the real results. Therefore, it is necessary to adjust the position of the target area that satisfies any of the above conditions, so as to improve the accuracy of text recognition.
  • the target area may be adjusted based on the adjustment mode of the target area around the target area and/or the semantic information of each field in the document to be processed.
  • when the number of the target areas is greater than a preset number threshold and the offset directions of the target areas are the same, the overall offset of the multiple target areas may be determined, and the multiple target areas are adjusted based on the overall offset.
  • the preset number threshold may be determined based on the product of the number of the second areas in the document to be processed and a preset weight, where the preset weight is a positive number less than or equal to 1.
  • the preset number threshold may be equal to 90% of the number of the second areas, or the preset number threshold may be equal to the number of the second areas.
  • the overall offset of the multiple target areas may be equal to the average offset of each of the multiple target areas.
  • the average offset of the target areas in the horizontal direction and the average offset of the target areas in the vertical direction can be calculated separately; the multiple target areas are then adjusted in the horizontal direction according to the calculated average horizontal offset, and adjusted in the vertical direction according to the calculated average vertical offset.
  • the certificate 401 to be processed before adjustment includes three second areas 401a, 401b and 401c, all of which are offset upwards; these three second areas are then adjusted downwards according to their average offset to obtain the adjusted certificate 402 to be processed.
  • the offset of the first target area in the document to be processed may be determined.
  • the second target area other than the first target area is adjusted based on the offset of the first target area. Since there are target areas with different offset directions, it is necessary to adjust the target areas with different offset directions respectively, so as to improve the accuracy of adjusting the target areas.
  • the offset of the first target area is used as a reference amount for adjusting the second target area, and the adjustment amount for adjusting the second target area can be more accurately determined.
  • the adjustment amount for adjusting the target area 403 may be determined according to the offset amount of the target area 404 and the offset amount of the target area 405 .
  • the first target area may be detected before the second target area.
  • the detected second target area may be adjusted based on the offset of the detected first target area.
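  • The two offset-based adjustment cases described above can be sketched as follows; the 90 % weight, the offset representation and the way a shared offset direction is tested are illustrative assumptions:

```python
def _sign(v):
    return (v > 0) - (v < 0)


def adjust_target_areas(target_areas, num_second_areas, weight=0.9):
    """Adjust target areas in place.

    target_areas: list of dicts {'box': (x, y, w, h), 'offset': (dx, dy)},
                  listed in detection order (e.g. top to bottom).
    num_second_areas: total number of second areas in the certificate to be processed.
    """
    if not target_areas:
        return target_areas
    threshold = num_second_areas * weight                      # preset number threshold
    directions = {(_sign(dx), _sign(dy)) for dx, dy in (t["offset"] for t in target_areas)}

    if len(target_areas) > threshold and len(directions) == 1:
        # Many target areas, all shifted the same way: move every one of them
        # back by the overall (average) offset.
        n = len(target_areas)
        mean_dx = sum(t["offset"][0] for t in target_areas) / n
        mean_dy = sum(t["offset"][1] for t in target_areas) / n
        for t in target_areas:
            x, y, w, h = t["box"]
            t["box"] = (x - mean_dx, y - mean_dy, w, h)
    else:
        # Few target areas, or inconsistent directions: use the offset of the
        # earlier-detected first target area as the reference amount for the
        # second target areas.
        ref_dx, ref_dy = target_areas[0]["offset"]
        for t in target_areas[1:]:
            x, y, w, h = t["box"]
            t["box"] = (x - ref_dx, y - ref_dy, w, h)
    return target_areas
```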
  • the detection may be performed along a certain direction of the document to be processed, and the specific direction may be from top to bottom, or from left to right, etc.
  • a field having the same semantic category as the field in the first area may also be searched from the document to be processed, and the target area may be adjusted to the second area where the searched field is located.
  • the target area can be adjusted in the manner of this embodiment, thereby improving the accuracy of adjusting the target area in the above-mentioned situation.
  • the field "21" of the same semantic category of "age” can be searched in the document to be processed ”, thereby adjusting the target area 407 including the field “21” as shown in the right part of FIG. 4C.
  • the fields in the second area may be split into a plurality of new second areas based on semantic information and/or position information of the fields in the second area, and text recognition is performed on each new second area separately.
  • a plurality of text lines may be relatively close, or a second area may include multiple text lines.
  • the text line "Li Si” and the text line "Female” are included in the second area.
  • the second area needs to be split to get two new second areas, one of which only includes the text line "Li Si", and the other new second area only includes the text line "Female".
  • specifically, based on the semantic information of the fields in the second area, the fields in the second area can be divided into multiple field groups whose semantics are unrelated to each other, and each field group is split into a new second area.
  • the semantics of the text line "Li Si” and the text line "Female” are "name” and "gender” respectively, which belong to fields of different semantic categories, and their semantics are irrelevant, so that the text line can be "Li Si” and the text line "Female” are split into two different new second areas.
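  • A sketch of the splitting step, assuming a semantic classifier is available (the classifier and the field representation are hypothetical; the disclosure only requires that fields with unrelated semantics end up in different new second areas):

```python
def split_second_area(fields, semantic_category_of):
    """Split the fields of one second area into new second areas so that each
    new area only contains fields of one semantic category.

    fields: list of (text, box) tuples in reading order, e.g.
            [("Li Si", box1), ("Female", box2)]
    semantic_category_of: callable text -> category, e.g. "name", "gender", "address"
    """
    groups = []
    for text, box in fields:
        category = semantic_category_of(text)
        if groups and groups[-1]["category"] == category:
            groups[-1]["fields"].append((text, box))   # related fields stay together
        else:
            groups.append({"category": category, "fields": [(text, box)]})
    return [g["fields"] for g in groups]                # each field group becomes a new second area
```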
  • semantic recognition may be performed on the recognition results of at least two of the multiple second areas as a whole, and text information is output based on the semantic recognition result of the at least two second areas as a whole. This process may be referred to as joint semantic recognition.
  • the joint semantic recognition may be performed after adjusting the target area in the second area.
  • the text information "Li Si”, “XX Street, XX City, XX province” and “XX Community No. XX” are respectively identified from the three second areas in the document to be processed, then the text information "Li Si” and "XX No. XX Street in XX City, province” performs joint semantic recognition to determine whether the two pieces of text information are related, and if so, combine the two pieces of text information into the same piece.
  • then, joint semantic recognition can be performed on "XX Street, XX City, XX Province" and "XX Community No. XX". Since the semantic categories of the two pieces of text information are both addresses, the two pieces of text information can be combined into the same piece to obtain the text information "XX Community No. XX, XX Street, XX City, XX Province".
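  • A sketch of the joint semantic recognition step, using "same semantic category of adjacent results" as the relatedness test (an assumption; the disclosure does not prescribe how relatedness is judged):

```python
def joint_semantic_merge(recognized_texts, semantic_category_of):
    """Merge recognition results of adjacent second areas whose semantic
    categories are the same (e.g. two halves of one address) into one piece.

    recognized_texts: list of text strings in reading order.
    semantic_category_of: hypothetical classifier, text -> category.
    """
    merged = []
    for text in recognized_texts:
        category = semantic_category_of(text)
        if merged and merged[-1][1] == category:
            merged[-1] = (merged[-1][0] + text, category)   # related: same piece of text
        else:
            merged.append((text, category))
    return [text for text, _ in merged]
```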
  • the text recognition method in the above embodiment can be used to recognize the non-fixed field in the document to be processed.
  • the text recognition result of a non-fixed field can be associated with a fixed field in the certificate to be processed to determine the fixed field to which the recognition result of each non-fixed field belongs. For example, after the text information "XX Community No. XX, XX Street, XX City, XX Province" is obtained, it can be associated with the fixed field "Residential Address". Further, the association result can also be output; for example, the text information of the identified non-fixed field can be output to the tail of the associated fixed field.
  • the second area where a field is located can be determined based on the coordinates of the field. Then, a field Wn is output to the tail of the previous field Wn-1 in the same second area. If the field Wn is the first field in the second area, it is output directly to the tail of the corresponding fixed field. For example, for the field "Li Si" in the second area, the first field "Li" can be output to the tail of the fixed field "Name", and the second field "Si" can be output to the tail of the field "Li".
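  • The tail-output rule for fields W1 ... Wn can be sketched as below (the data layout is an illustrative assumption):

```python
def output_to_fixed_field_tails(second_areas, fixed_field_of):
    """Concatenate the recognized fields of each second area to the tail of the
    associated fixed field: the first field of a second area goes after the
    fixed field, and each following field Wn goes after the previous field Wn-1.

    second_areas:   dict second-area id -> list of recognized fields in order.
    fixed_field_of: dict second-area id -> associated fixed-field name.
    Returns a dict fixed-field name -> concatenated recognized text.
    """
    result = {}
    for area_id, fields in second_areas.items():
        fixed = fixed_field_of[area_id]
        text = result.get(fixed, "")
        for field in fields:        # W1 after the fixed field, Wn after Wn-1
            text += field
        result[fixed] = text
    return result


# Example: output_to_fixed_field_tails({"area1": ["Li", "Si"]}, {"area1": "Name"})
# returns {"Name": "LiSi"}, i.e. "Li" then "Si" appended after the fixed field "Name".
```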
  • an association relationship between the identified non-fixed field and the fixed field in the certificate to be processed may be sent to a target device, so that the target device displays the fixed field and the identified non-fixed field in association based on the association relationship.
  • the above text recognition can be realized by using a neural network.
  • a corresponding neural network may be called based on the attribute information of the fields in the first area, and text recognition is performed on the second area through the called neural network. Text recognition through neural network can obtain high recognition accuracy.
  • a template credential may be pre-created.
  • the photo of the template ID can be collected first and uploaded to the client, and then the corners of the uploaded template ID photo can be adjusted to adjust the size of the template ID photo. Further, perspective transformation can also be performed on the template ID photo to adjust the angle and direction of the text in the template ID photo.
  • the first area (the left area in FIG. 5B) can be selected from the template ID photo, and the field names (for example, date of birth, gender, name, ID number, etc.) and field types (e.g., text, number, etc.) of the identification fields in the first area can also be edited.
  • the field name and field type may be manually input by the user after selecting the first area, or may be automatically identified by the neural network, and may be manually modified by the user if the identification result is incorrect. Then, a fixed field (the field marked with gray as the background color in the left area of FIG. 5C ) can be selected. Likewise, fixed fields can be entered manually by the user or automatically recognized by the neural network and modified manually by the user. The selected fixed fields are distributed around the template certificate as much as possible to improve the accuracy of the final text recognition result. Once created, the template credential can be saved on the server.
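  • One possible (assumed) representation of a saved template certificate is sketched below; the disclosure specifies what a template contains (first areas with field names and field types, and fixed/reference fields) but not a concrete storage format, and the coordinates shown are placeholder values:

```python
TEMPLATE_ID_CARD = {
    "certificate_type": "ID card",
    "size": (856, 540),                                   # length X1 and height Y1 of the template photo
    "first_areas": [                                      # areas containing non-fixed (identification) fields
        {"field_name": "Name",      "field_type": "text",   "box": (200, 60, 220, 48)},
        {"field_name": "ID number", "field_type": "number", "box": (260, 440, 520, 52)},
    ],
    "fixed_fields": [                                     # reference fields, spread around the certificate
        {"text": "Name",   "box": (60, 60, 120, 48)},
        {"text": "Gender", "box": (60, 140, 120, 48)},
    ],
}
```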
  • the server can identify the position, category and orientation of one or more certificates to be processed from the uploaded picture or document and, for each identified certificate to be processed, call the corresponding template certificate for recognition.
  • an identification result output by the server is shown, which includes a fixed field and an identification field, and the identification field can be outputted to the end of the corresponding fixed field.
  • the writing order of the steps does not imply a strict execution order or constitute any limitation on the implementation process; the specific execution order of each step should be determined by its function and possible internal logic.
  • the present disclosure also provides a text recognition device, the device comprising:
  • an obtaining module 701, configured to obtain the first area in the template certificate
  • the determining module 702 is configured to determine a second area corresponding to the first area in the certificate to be processed, where the certificate to be processed is of the same certificate type as the template certificate, and the relative position of the first area in the template certificate is the same as the relative position of the second area in the certificate to be processed;
  • the first recognition module 703 is configured to perform text recognition on the second area based on the recognition mode corresponding to the attribute information of the fields in the first area.
  • the attribute information of the field includes the character type of the field and/or the font type of the field.
  • the apparatus further includes: an adjustment module, configured to determine whether the second area is a target area to be adjusted based on the position information and/or semantic information of the text in the second area, adjust the target area, and perform text recognition on the adjusted target area.
  • the adjustment module is configured to: determine a second area that satisfies at least one of the following conditions as the target area: the position of the field in the second area exceeds the boundary of the second area; the semantics of the fields in the second area are incomplete; the fields in the second area and the fields in the first area belong to different semantic categories.
  • the adjustment module is configured to: in the case that the number of the target areas is greater than a preset number threshold and the offset directions of the respective target areas are the same, determine the overall offset of the multiple target areas, and adjust the multiple target areas based on the overall offset.
  • the adjustment module is configured to: in the case that the number of the target areas is not greater than a preset number threshold, or the offset directions of at least two target areas are different, determine an offset of a first target area in the certificate to be processed, and adjust a second target area other than the first target area based on the offset of the first target area.
  • the first target area is detected before the second target area.
  • the adjustment module is configured to: search the certificate to be processed for a field with the same semantic category as a field in the first area, and adjust the target area to the second area where the found field is located.
  • the determining module 702 is configured to: determine a second area corresponding to the first area in the certificate to be processed based on a pre-established transformation matrix; establish k first matrices based on k third areas in the template certificate and k fourth areas in the certificate to be processed, where 1 ≤ k < N, k and N are both positive integers, N is the total number of groups of third areas and fourth areas, and the third area and the fourth area in each group correspond one-to-one and include the same text information; for each of the k first matrices, match the remaining N-k groups of third areas and fourth areas based on the first matrix, and determine the number of successfully matched groups; and determine the first matrix with the largest number of successfully matched groups as the transformation matrix.
  • the determining module 702 is configured to: select a plurality of point pairs from the i-th third area in the template certificate and the i-th fourth area in the certificate to be processed, where i is a positive integer, and the plurality of point pairs may include a center point pair of the first field, a center point pair of the last field, a midpoint pair of the upper boundary of the area, and a midpoint pair of the lower boundary of the area; and determine the i-th first matrix among the k first matrices based on the plurality of point pairs selected from the i-th third area and the i-th fourth area.
  • the apparatus further includes: a splitting module 705, configured to, after text recognition is performed on the second area, split the fields in the second area into multiple new second areas based on the semantic information and/or position information of the fields in the second area; and a second recognition module 706, configured to perform text recognition on each new second area respectively.
  • the splitting module 705 is configured to: divide the fields in the second area into multiple field groups based on the semantic information of the fields in the second area, where the semantics of the fields in different field groups are unrelated; and split each field group into a new second area.
  • the number of the second areas is multiple; the apparatus further includes: a third recognition module 704, configured to, after text recognition is performed on the second areas, perform semantic recognition on the recognition results of at least two of the second areas as a whole; and an output module 707, configured to output text information based on the semantic recognition result of the at least two second areas as a whole.
  • the first recognition module 703 is configured to: call a corresponding neural network based on the attribute information of the field in the first area; perform text recognition on the second area through the called neural network.
  • the certificate to be processed includes a fixed field and a non-fixed field; after text recognition is performed on the second area, an association relationship between the identified non-fixed field and the fixed field in the certificate to be processed is sent to a target device, so that the target device displays the fixed field and the identified non-fixed field in association based on the association relationship.
  • the functions or modules included in the apparatuses provided in the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments.
  • the embodiments of the present specification further provide a computer device, which at least includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the method described in any of the above-mentioned embodiments when executing the program.
  • FIG. 8 shows a more specific schematic diagram of the hardware structure of a computer device provided by an embodiment of this specification.
  • the device may include: a processor 801 , a memory 802 , an input/output interface 803 , a communication interface 804 and a bus 805 .
  • the processor 801 , the memory 802 , the input/output interface 803 and the communication interface 804 realize the communication connection among each other within the device through the bus 805 .
  • the processor 801 can be implemented by a general-purpose central processing unit (Central Processing Unit, CPU), a microprocessor, an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits, and is configured to execute relevant programs to implement the technical solutions provided by the embodiments of this specification.
  • the memory 802 may be implemented in the form of a read-only memory (Read Only Memory, ROM), a random access memory (Random Access Memory, RAM), a static storage device, a dynamic storage device, and the like.
  • the memory 802 may store an operating system and other application programs. When implementing the technical solutions provided by the embodiments of this specification through software or firmware, relevant program codes are stored in the memory 802 and invoked by the processor 801 for execution.
  • the input/output interface 803 can be used to connect input/output modules to realize information input and output.
  • the input/output module can be configured in the device as a component (not shown in the figure), or can be externally connected to the device to provide corresponding functions.
  • the input device may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc.
  • the output device may include a display, a speaker, a vibrator, an indicator light, and the like.
  • the communication interface 804 can be used to connect a communication module (not shown in the figure), so as to realize the communication interaction between the device and other devices.
  • the communication module may implement communication through wired means (eg, USB, network cable, etc.), or may implement communication through wireless means (eg, mobile network, WIFI, Bluetooth, etc.).
  • Bus 805 may include a path to transfer information between various components of the device (eg, processor 801, memory 802, input/output interface 803, and communication interface 804).
  • although the above-mentioned device only shows the processor 801, the memory 802, the input/output interface 803, the communication interface 804 and the bus 805, in a specific implementation process, the device may also include other components necessary for normal operation.
  • the above-mentioned device may only include components necessary to implement the solutions of the embodiments of the present specification, rather than all the components shown in the figures.
  • an embodiment of the present disclosure further provides a text recognition system, including a client 901 for uploading the certificate to be processed and sending the certificate to be processed to a server 902; and a server 902 for executing The method described in any embodiment of the present disclosure.
  • the client 901 may be installed on smart terminals such as mobile phones, tablet computers, and desktop computers.
  • the intelligent terminal is provided with an interactive component for uploading photos.
  • the interactive component may be a touch screen, a mouse, a key, and the like.
  • the intelligent terminal may also be provided with a display screen for previewing uploaded photos and text recognition results.
  • the intelligent terminal may also include a communication interface for communicating with the server 902, so as to send the photos uploaded by the user and various instructions sent by the user to the server 902, and receive various information including the text recognition result returned by the server 902. information and instructions.
  • the certificate to be processed includes fixed fields and non-fixed fields; the server 902 is further configured to: after performing text recognition on the second area, send to the client 901 an association relationship between the identified non-fixed field and the fixed field in the certificate to be processed; the client 901 is further configured to: in response to receiving the association relationship sent by the server 902, display the fixed field and the identified non-fixed field in association based on the association relationship. For example, a non-fixed field is displayed at the tail of the corresponding fixed field.
  • An embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, implements the method described in any of the foregoing embodiments.
  • Computer-readable media includes both persistent and non-permanent, removable and non-removable media, and storage of information may be implemented by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cassettes, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.
  • computer-readable media does not include transitory computer-readable media, such as modulated data signals and carrier waves.
  • Embodiments of the present disclosure further provide a text processing method, which can be applied to the client 901.
  • when the client 901 executes the text processing method, the client 901 uploads the certificate to be processed to the server 902; the server 902 recognizes the certificate to be processed using the text recognition method of any one of the embodiments of the present disclosure and obtains the association relationship between the identified non-fixed field and the fixed field in the certificate to be processed; and after receiving the association relationship, the client 901 displays the fixed field and the identified non-fixed field in association based on the association relationship.
  • An embodiment of the present disclosure further provides a computer program, including computer-readable code, which implements the method described in any embodiment of the present disclosure when the computer-readable code is executed by a processor.
  • a typical implementing device is a computer, which may be in the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email transceiver device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
  • each embodiment in this specification is described in a progressive manner, and the same and similar parts between the various embodiments may be referred to each other, and each embodiment focuses on the differences from other embodiments.
  • as for the device embodiments, since they basically correspond to the method embodiments, the description is relatively simple, and reference may be made to the relevant parts of the description of the method embodiments.
  • the device embodiments described above are only illustrative, wherein the modules described as separate components may or may not be physically separated.
  • the functions of each module may be integrated into the same module. or multiple software and/or hardware implementations. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution in this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Character Input (AREA)

Abstract

本公开实施例提供一种文本识别方法、装置和存储介质,获取模板证件中的第一区域;确定待处理证件中与所述第一区域对应的第二区域,所述待处理证件与所述模板证件的证件类别相同,所述第一区域在所述模板证件中的相对位置与所述第二区域在所述待处理证件中的相对位置相同;基于与所述第一区域中的字段的属性信息对应的识别方式对所述第二区域进行文本识别。

Description

文本识别方法、装置和存储介质
相关申请的交叉引用
本公开要求于2020年12月31日提交的、申请号为202011617846.1、发明名称为“文本识别方法、装置和系统”的中国专利申请的优先权,该中国专利申请公开的全部内容以引用的方式并入本文中。
技术领域
本公开涉及文本识别技术领域,尤其涉及文本识别方法、装置和存储介质。
背景技术
光学字符识别(Optical Character Recognition,OCR)能够将图像中的文字转换成文本格式,供文字处理软件进一步编辑加工。传统的光学字符识别方式一般只支持对版面固定的图像进行识别,即,要求待识别的字符在版面中的位置是确定的。对于版面不固定的图像,识别准确度较低。
发明内容
本公开提供一种文本识别方法、装置和存储介质。
根据本公开实施例的第一方面,提供一种文本识别方法,所述方法包括:获取模板证件中的第一区域;确定待处理证件中与所述第一区域对应的第二区域,所述待处理证件与所述模板证件的证件类别相同,所述第一区域在所述模板证件中的相对位置与所述第二区域在所述待处理证件中的相对位置相同;基于与所述第一区域中的字段的属性信息对应的识别方式对所述第二区域进行文本识别。
在一些实施例中,所述字段的属性信息包括所述字段的字符类型和所述字段的字体类型中的至少一个。
在一些实施例中,所述方法还包括:在对所述第二区域进行文本识别之后,基于所述第二区域中文本的位置信息和语义信息中的至少一个,确定所述第二区域是否为需要调整的目标区域;对所述目标区域进行调整;并对调整后的所述目标区域进行文本识别。
在一些实施例中,所述基于所述第二区域中字段的位置信息和语义信息中的至少一个,确定所述第二区域是否为需要调整的目标区域,包括:将满足以下至少一项条件的第二区域确定为所述目标区域:所述第二区域中字段的位置超出所述第二区域的边界;所述第二区域中字段的语义不完整;所述第二区域中字段与所述第一区域中字段属于不同语义类别。
在一些实施例中，所述对所述目标区域进行调整，包括：在所述目标区域的数量大于预设数量阈值，且各个所述目标区域的偏移方向相同的情况下，确定多个所述目标区域的整体偏移量；基于所述整体偏移量对所述多个目标区域进行调整。
在一些实施例中,所述对所述目标区域进行调整,包括:在所述目标区域的数量不大于预设数量阈值,或者存在至少两个所述目标区域的偏移方向不同的情况下,确定所述待处理证件中的第一目标区域的偏移量;基于所述第一目标区域的偏移量,对所述第一目标区域以外的第二目标区域进行调整。
在一些实施例中,所述第一目标区域在所述第二目标区域之前检测到。
在一些实施例中,所述对所述目标区域进行调整,包括:从所述待处理证件中查找与所述第一区域的字段具有相同语义类别的字段;将所述目标区域调整为查找到的字段所在的第二区域。
在一些实施例中,所述确定待处理证件中与所述第一区域对应的第二区域,包括:基于预先建立的转换矩阵确定待处理证件中与所述第一区域对应的第二区域;其中,所述转换矩阵基于以下方式确定:基于所述模板证件中的k个第三区域和所述待处理证件中的k个第四区域建立k个第一矩阵,1≤k<N,k和N均为正整数,N为所述第三区域与所述第四区域的总组数,每组中的第三区域与第四区域一一对应且包括的文本信息相同;针对所述k个第一矩阵中的每个第一矩阵,基于所述第一矩阵对其余N-k个第三区域与第四区域的组进行匹配,确定匹配成功的组数;将所述k个第一矩阵中匹配成功的组数最多的第一矩阵确定为所述转换矩阵。
在一些实施例中,所述基于所述模板证件中的k个第三区域和所述待处理证件中的k个第四区域建立k个第一矩阵,包括:从所述模板证件中的第i个第三区域和所述待处理证件中的第i个第四区域中选取多个点对,i为正整数,所述多个点对可以包括首字段的中心点点对、末字段的中心点点对、区域上边界的中点点对以及区域下边界的中点点对;基于所述第i个第三区域和所述第i个第四区域中的多个点对,建立所述k个第一矩阵中的第i个第一矩阵。
在一些实施例中,所述方法还包括:在对所述第二区域进行文本识别之后,基于所述第二区域中字段的语义信息和位置信息中的至少一个,将所述第二区域中的字段拆分到多个新的第二区域中;分别对每个所述新的第二区域进行文本识别。
在一些实施例中,所述基于所述第二区域中字段的语义信息,将所述第二区域中的字段拆分到多个新的第二区域中,包括:基于所述第二区域中字段的语义信息,将所述第二区域中的字段划分为多个字段组,不同所述字段组中的字段的语义不相关;将每个所述字段组拆分到一个新的第二区域中。
在一些实施例中,所述第二区域的数量为多个;所述方法还包括:在对所述第二区域进行文本识别之后,将所述第二区域中至少两个第二区域的识别结果作为整体进行语义识别;基于所述至少两个第二区域整体的语义识别结果,输出文本信息。
在一些实施例中,所述基于与所述第一区域中的字段的属性信息对应的识别方式对所述第二区域进行文本识别,包括:基于所述第一区域中的字段的属性信息调用对应的神经网络;通过调用的神经网络对所述第二区域进行文本识别。
在一些实施例中,所述待处理证件中包括固定字段和非固定字段;在对所述第二区域进行文本识别之后,所述方法还包括:向目标设备发送识别出的非固定字段与所述待处理证件中的固定字段的关联关系,以使所述目标设备基于所述关联关系对所述固定字段与所述识别出的非固定字段进行关联显示。
根据本公开实施例的第二方面,提供一种文本识别装置,所述装置包括:获取模块,用于获取模板证件中的第一区域;确定模块,用于确定待处理证件中与所述第一区域对应的第二区域,所述待处理证件与所述模板证件的证件类别相同,所述第一区域在所述模板证件中的相对位置与所述第二区域在所述待处理证件中的相对位置相同;第一识别模块,用于基于与所述第一区域中的字段的属性信息对应的识别方式对所述第二区域进行文本识别。
在一些实施例中,所述字段的属性信息包括所述字段的字符类型和/或所述字段的字体类型。
在一些实施例中,所述装置还包括:调整模块,用于基于所述第二区域中文本的位置信息和/或语义信息,确定所述第二区域是否为需要调整的目标区域;以及对所述目标区域进行调整,并对调整后的目标区域进行文本识别。
在一些实施例中,所述调整模块用于:将满足以下至少一项条件的第二区域确定为所述目标区域:所述第二区域中字段的位置超出所述第二区域的边界;所述第二区域中字段的语义不完整;所述第二区域中字段与所述第一区域中字段属于不同语义类别。
在一些实施例中,所述调整模块用于:在所述目标区域的数量大于预设数量阈值,且各个目标区域的偏移方向相同的情况下,确定多个目标区域的整体偏移量,基于所述整体偏移量对所述多个目标区域进行调整。
在一些实施例中,所述调整模块用于:在所述目标区域的数量不大于预设数量阈值,或者存在至少两个目标区域的偏移方向不同的情况下,确定所述待处理证件中的第一目标区域的偏移量,基于所述第一目标区域的偏移量,对所述第一目标区域以外的第二目标区域进行调整。
在一些实施例中,所述第一目标区域在所述第二目标区域之前检测到。
在一些实施例中,所述调整模块用于从待处理证件中查找与所述第一区域的字段具有相同语义类别的字段,将所述目标区域调整为查找到的字段所在的第二区域。
在一些实施例中,所述确定模块用于:基于预先建立的转换矩阵确定待处理证件中与所述第一区域对应的第二区域;基于所述模板证件中的k个第三区域和所述待处理证件中的k个第四区域建立k个第一矩阵,1≤k<N,k和N均为正整数,N为所述第三区域与所述第四区域的总组数,每组中的第三区域与第四区域一一对应且包括的文本信息相同;针对所述k个第一矩阵中的每个第一矩阵,基于所述第一矩阵对其余N-k个第三区域与第四区域的组进行匹配,确定匹配成功的组数;将匹配成功的组数最多的第一矩阵确定为所述转换矩阵。
在一些实施例中,所述确定模块用于:从所述模板证件中的第i个第三区域和所述待处理证件中的第i个第四区域中选取多个点对,i为正整数,所述多个点对可以包括首字段的中心点点对、末字段的中心点点对、区域上边界的中点点对以及区域下边界的中点点对;基于所述第i个第三区域和所述第i个第四区域中选取的多个点对,确定所述多个第一矩阵中的第i个第一矩阵。
在一些实施例中,所述装置还包括:拆分模块,用于在对所述第二区域进行文本识别之后,基于所述第二区域中字段的语义信息和/或位置信息,将所述第二区域中的字段拆分到多个新的第二区域中;第二识别模块,用于分别对每个新的第二区域进行文本识别。
在一些实施例中,所述拆分模块用于:基于所述第二区域中字段的语义信息,将所述第二区域中的字段划分为多个字段组,不同字段组中的字段的语义不相关;将每个字段组拆分到一个新的第二区域中。
在一些实施例中,所述第二区域的数量为多个;所述装置还包括:第三识别模块,用于在对所述第二区域进行文本识别之后,将所述第二区域中至少两个第二区域的识别结果作为整体进行语义识别;输出模块,用于基于所述至少两个第二区域整体的语义识别结果,输出文本信息。
在一些实施例中,所述第一识别模块用于:基于所述第一区域中的字段的属性信息调用对应的神经网络;通过调用的神经网络对所述第二区域进行文本识别。
在一些实施例中,所述待处理证件中包括固定字段和非固定字段;在对所述第二区域进行文本识别之后,所述方法还包括:向目标设备发送识别出的非固定字段与所述待处理证件中的固定字段的关联关系,以使所述目标设备基于所述关联关系对所述固定字段与所述识别出的非固定字段进行关联显示。
根据本公开实施例的第三方面,提供一种文本识别系统,包括:客户端,用于上传所述待处理证件,并向服务器发送所述待处理证件;以及服务器,用于执行本公开任一实施例所述的方法。
在一些实施例中,所述待处理证件中包括固定字段和非固定字段;所述服务器还用于:在对所述第二区域进行文本识别之后,向所述客户端发送识别出的非固定字段与所述待处理证件中的固定字段的关联关系;所述客户端还用于:响应于接收到所述服务器发送的所述关联关系,基于所述关联关系对所述固定字段与所述识别出的非固定字段进行关联显示。
根据本公开实施例的第四方面,提供一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现任一实施例所述的方法。
根据本公开实施例的第五方面,提供一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,所述处理器执行所述程序时实现任一实施例所述的方法。
根据本公开实施例的第六方面,提供一种文本处理方法,应用于客户端,所述方法包括:向服务器上传待处理证件,所述待处理证件中包括固定字段和非固定字段;接收所述服务器发送的识别出的所述非固定字段与所述待处理证件中的固定字段的关联关系,并基于所述关联关系对所述固定字段与所述识别出的非固定字段进行关联显示;其中,所述识别出的非固定字段与所述待处理证件中的固定字段的关联关系为所述服务器通过执行本公开前述任一实施例所述的方法对待处理证件进行识别得到的识别结果。
根据本公开实施例的第七方面,提供一种计算机程序,包括计算机可读代码,所述计算机可读代码被处理器执行时实现本公开前述任一实施例所述的方法。
本公开实施例基于模板证件对待处理证件进行文本识别,由于待处理证件与所述模板证件的证件类别相同,从而可以基于模板证件准确地从待处理证件中定位到待识别区域。此外,由于模板证件中第一区域的字段的属性信息与待处理证件中第二区域的字段的属性信息相同,根据第一区域的字段的不同属性信息,采用不同的识别方式对所述第二区域进行文本识别,减少了对类别不同但相似度较高的字段进行识别时的识别错误,从而提高了文本识别准确度。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,而非限制本公开。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,这些附图示出了符合本公开的实施例,并与说明书一起用于说明本公开的技术方案。
图1是本公开实施例的文本识别方法的流程图。
图2是本公开实施例的建立转换矩阵的示意图。
图3A至图3C是本公开实施例的需要调整第二区域的情况的示意图。
图4A至图4C是本公开实施例的第二区域的调整方式的示意图。
图5A至5C是本公开实施例的建立模板证件的示意图。
图6是本公开实施例的文本识别结果的示意图。
图7是本公开实施例的文本识别装置的框图。
图8是本公开实施例的计算机设备的结构示意图。
图9是本公开实施例的文本识别系统的示意图。
具体实施方式
这里将详细地对示例性实施例进行说明，其示例表示在附图中。下面的描述涉及附图时，除非另有表示，不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本公开相一致的所有实施方式。相反，它们仅是与如所附权利要求书中所详述的、本公开的一些方面相一致的装置和方法的例子。
在本公开使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本公开。在本公开和所附权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中使用的术语“和/或”是指并包含一个或多个相关联的列出项目的任何或所有可能组合。另外,本文中术语“至少一种”表示多种中的任意一种或多种中的至少两种的任意组合。
应当理解,尽管在本公开可能采用术语第一、第二、第三等来描述各种信息,但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本公开范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息。取决于语境,如在此所使用的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。
为了使本技术领域的人员更好的理解本公开实施例中的技术方案,并使本公开实施例的上述目的、特征和优点能够更加明显易懂,下面结合附图对本公开实施例中的技术方案作进一步详细的说明。
如图1所示,本公开实施例提供一种文本识别方法,所述方法可包括:
步骤101:获取模板证件中的第一区域;
步骤102:确定待处理证件中与所述第一区域对应的第二区域,所述待处理证件与所述模板证件的证件类别相同,所述第一区域在所述模板证件中的相对位置与所述第二区域在所述待处理证件中的相对位置相同;
步骤103:基于与所述第一区域中的字段的属性信息对应的识别方式对所述第二区域进行文本识别。
本公开实施例的方法可由服务器执行,所述服务器可以是单台服务器,也可以是包括多台服务器的服务器集群。在步骤101中,服务器可以预先存储模板证件。具体地,可以存储多种不同类别的模板证件。所述不同类别的模板证件可以是身份证、驾驶证、港澳通行证等。
第一区域可以是模板证件中包括非固定字段在内的区域,非固定字段是指在同一类别的多张不同证件中文本内容不同的字段。模板证件中还包括固定字段,即,在同一类别的多张不同证件中文本内容相同的字段。例如,不同居民的身份证上都包括字段“姓名”和“性别”等,则“姓名”和“性别”属于固定字段,也称为参照字段或者参考字段。而张三的身份证上的姓名具体为“张三”,李四的身份证上的姓名具体为“李四”,则“张三”和“李四”属于非固定字段,也称为识别字段。
一个模板证件中可以包括一个或多个第一区域。为了提高文本识别的准确度，每个第一区域内可以仅包括一个文本行（称为一组字段），每个文本行中可以包括水平排列的一个或多个字符。所述字符可以包括但不限于数字、字母、汉字、符号等中的至少一种或者至少两种的组合。第一区域可以由用户在创建模板证件时手动框选，也可以通过预先训练的神经网络或者其他方式获取。
在步骤102中,可以确定待处理证件中与所述第一区域对应的第二区域。其中,所述待处理证件可以是图片格式或者便携式文档格式(Portable Document Format,PDF)等格式。所述待处理证件与所述模板证件的证件类别相同。例如,在所述待处理证件为身份证的情况下,所述模板证件也是身份证。所述第一区域在所述模板证件中的相对位置与所述第二区域在所述待处理证件中的相对位置相同。其中,证件中一个区域的相对位置是指基于该证件的尺寸对该区域的位置进行归一化处理后得到的归一化位置。一个区域的位置可以用该区域上的特征点的位置来表示,所述特征点可以是该区域的中心点或者角点。假设所述第一区域的特征点的坐标为(x1,y1),模板证件的长度(水平方向的尺寸)和高度(竖直方向的尺寸)分别为(X1,Y1),所述第二区域的特征点的坐标为(x2,y2),待处理证件的长度(水平方向的尺寸)和高度(竖直方向的尺寸)分别为(X2,Y2),则满足以下条件:
x1/X1=x2/X2;
y1/Y1=y2/Y2。
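A minimal sketch of the relative-position relation above is given below (Python is used purely for illustration; the variable names such as template_size and target_size are not taken from the patent). It maps a feature point of the first area in the template certificate to the corresponding point in the certificate to be processed, assuming the two relative positions are equal:

```python
# A minimal sketch of the relative-position relation above; the variable names
# (template_size, target_size, ...) are illustrative and not taken from the patent.

def map_feature_point(point_in_template, template_size, target_size):
    """Map a region's feature point from the template certificate to the
    certificate to be processed, assuming equal relative positions."""
    x1, y1 = point_in_template
    X1, Y1 = template_size  # width and height of the template certificate
    X2, Y2 = target_size    # width and height of the certificate to be processed
    # x1 / X1 == x2 / X2 and y1 / Y1 == y2 / Y2
    return (x1 / X1 * X2, y1 / Y1 * Y2)

# A point at (120, 80) in a 600x400 template maps to (200.0, ~133.4)
# in a 1000x667 photo of a certificate of the same type.
print(map_feature_point((120, 80), (600, 400), (1000, 667)))
```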
可以先获取模板证件中包括参考字段的第三区域以及待处理证件中包括参考字段的第四区域。然后,从第三区域和第四区域中选取多个点对。每个点对中包括第三区域中的第一点以及第四区域中的第二点,且所述第一点在第三区域中的相对位置与所述第二点在第四区域中的相对位置相同。例如,所述多个点对可以包括所述第三区域的首个字段的中心点与所述第四区域的首个字段的中心点组成的点对,所述第三区域的最后一个字段的中心点与所述第四区域的最后一个字段的中心点组成的点对、所述第三区域的上边界的中点与所述第四区域的上边界的中点组成的点对、以及所述第三区域的下边界的中点与所述第四区域的下边界的中点组成的点对。根据所述多个点对可以建立变换矩阵,然后,基于所述变换矩阵对所述第一区域进行变换,以确定所述第二区域。
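As one possible illustration of building a transformation matrix from the four point pairs described above, the sketch below uses OpenCV's getPerspectiveTransform; the patent does not prescribe any particular library, and all coordinates are made-up example values:

```python
# Illustration only: the patent does not prescribe a library; OpenCV's
# getPerspectiveTransform is used here as one common way of building a 3x3
# transform from the four point pairs described above. All coordinates are
# made-up example values.
import numpy as np
import cv2

# Four corresponding points: centre of the first field, centre of the last
# field, midpoint of the upper boundary, midpoint of the lower boundary.
pts_template = np.float32([[35, 60], [180, 60], [107, 50], [107, 70]])   # third area
pts_target   = np.float32([[52, 91], [270, 93], [161, 76], [161, 108]])  # fourth area

M = cv2.getPerspectiveTransform(pts_template, pts_target)

# Warp the corner points of a first area into the certificate to be processed
# to obtain the corresponding second area.
first_area = np.float32([[[30, 120], [200, 120], [200, 150], [30, 150]]])
second_area = cv2.perspectiveTransform(first_area, M)
print(second_area)
```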
在根据多个点对建立变换矩阵之前,还可以对第三区域和第四区域进行筛选,只保留完全匹配的第三区域和第四区域。其中,所述完全匹配是指第三区域和第四区域中的字段完全相同。最终保留的第三区域和第四区域可用于建立所述变换矩阵。
在一些实施例中,可以基于所述模板证件中的k个第三区域和所述待处理证件中的k个第四区域建立k个第一矩阵,1≤k<N,k和N均为正整数,N为所述第三区域与所述第四区域的总组数,每组中的第三区域与第四区域一一对应且包括的文本信息相同。针对所述k个第一矩阵中的每个第一矩阵,基于所述第一矩阵对其余N-k个第三区域与第四区域的组进行匹配,确定匹配成功的组数。将匹配成功的组数最多的第一矩阵确定为所述转换矩阵。
参见图2，通过一个数值实施例对本公开建立转换矩阵的方式进行说明，其中，k=2，N=5。本领域技术人员可以理解，本实施例中的数值仅为举例说明，实际应用中所采用的数值不限于此。为了便于描述，假设第i个组包括第i个第三区域和第i个第四区域，即第i个第三区域与第i个第四区域相对应，i为正整数，1≤i≤5。可以从第1个第三区域和第1个第四区域中选取多个点对，建立第一矩阵M1，从第2个第三区域和第2个第四区域中选取多个点对，建立第一矩阵M2。然后，通过基于第一矩阵M1分别对第3个第三区域和第3个第四区域、第4个第三区域和第4个第四区域以及第5个第三区域和第5个第四区域中的多个点对进行匹配，确定匹配成功的组数m1。通过基于第一矩阵M2分别对第3个第三区域和第3个第四区域、第4个第三区域和第4个第四区域以及第5个第三区域和第5个第四区域中的多个点对进行匹配，确定匹配成功的组数m2。将组数m1和m2中较大者对应的第一矩阵确定为所述转换矩阵。
相比于将通过一次点对匹配计算得到的第一矩阵直接作为转换矩阵的方式，上述从多个第一矩阵中选取最优的第一矩阵作为转换矩阵的方式，提高了确定第二区域的准确性，从而提高了文本识别的准确性。上述对第一矩阵M1和第一矩阵M2的处理可以并行执行，也可以串行执行，本公开对此不作限制。
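A sketch of the matrix-selection step described above follows. The pixel tolerance used to decide whether a group matches is an assumption introduced for illustration; the patent only requires counting, for each candidate first matrix, how many of the remaining region groups it maps consistently, and keeping the candidate with the highest count:

```python
# A sketch of the matrix-selection step above. The pixel tolerance (tol) is an
# assumption added for illustration.
import numpy as np
import cv2

def count_matched_groups(M, template_groups, target_groups, tol=5.0):
    """Count groups whose template points, transformed by M, land within
    tol pixels of the corresponding points in the certificate to be processed."""
    matched = 0
    for src, dst in zip(template_groups, target_groups):
        projected = cv2.perspectiveTransform(
            np.float32(src).reshape(1, -1, 2), M).reshape(-1, 2)
        if np.all(np.linalg.norm(projected - np.float32(dst), axis=1) < tol):
            matched += 1
    return matched

def select_transform(candidates, template_groups, target_groups):
    """Return the candidate matrix that matches the most remaining groups."""
    return max(candidates,
               key=lambda M: count_matched_groups(M, template_groups, target_groups))

# Toy data: the 2x-scaling candidate matches all three remaining groups.
template_groups = [[(0, 0), (10, 0)], [(0, 20), (10, 20)], [(0, 40), (10, 40)]]
target_groups   = [[(0, 0), (20, 0)], [(0, 40), (20, 40)], [(0, 80), (20, 80)]]
best = select_transform([np.eye(3), np.diag([2.0, 2.0, 1.0])],
                        template_groups, target_groups)
print(best)
```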
在步骤103中,可以基于所述第一区域中的字段的属性信息确定对所述第二区域进行文本识别的识别方式。字段的属性信息包括所述字段的字符类型和/或所述字段的字体类型。所述第一区域中的字段可以包括一个或多个字符,所述字符类型可以包括但不限于数字、字母、符号、汉字、混合中的一种,所述混合字符类型是指所述第一区域中的字段包括多种字符类型,例如,数字与字母的混合字符类型、数字与汉字的混合字符类型等。所述字段的字体类型包括但不限于宋体、楷体、Times New Roman、混合中的一种,所述混合字体类型是指所述第一区域中的字段包括多种字体类型。所述第一区域中字段的属性信息可以在创建模板时由用户手动输入,也可以通过神经网络模型识别得到。基于与所述第一区域中的字段的属性信息对应的识别方式对所述第二区域进行文本识别,能够减少字段类型不同但相似度较高时的识别错误,从而提高文本识别准确度。例如,第二区域中的字段包括字母“O”,如果采用对所有字段通用的文本识别方式,很容易将字母“O”与数字“0”相混淆。但如果先确定了字段的字符类型为字母,则可以采用针对字母类型的文本进行文本识别的方式,避免了将字母“O”识别成数字“0”,从而提高了识别准确度。
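The attribute-dependent recognition described above can be sketched as a simple dispatch table; the recognizer functions below are placeholders standing in for models specialised for digits, letters, Chinese characters, and so on, and the type labels are assumptions:

```python
# A minimal dispatch sketch for the attribute-dependent recognition above. The
# recognizer callables are placeholders; the type labels are assumptions.

def recognize_digits(image):  return "0123456789"     # placeholder
def recognize_letters(image): return "ABCDEFG"        # placeholder
def recognize_chinese(image): return "示例文本"        # placeholder
def recognize_generic(image): return "generic result" # placeholder fallback

RECOGNIZERS = {
    "digit":   recognize_digits,
    "letter":  recognize_letters,
    "chinese": recognize_chinese,
}

def recognize_second_area(image, char_type):
    """Run the recognizer matching the character type recorded for the field in
    the template's first area; fall back to a generic recognizer otherwise."""
    return RECOGNIZERS.get(char_type, recognize_generic)(image)

# Knowing the field is letters helps avoid confusing 'O' with '0'.
print(recognize_second_area(None, "letter"))
```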
在一些实施例中,用户上传的待处理证件的照片由于拍摄角度等原因,可能与模板证件存在一定的差异。例如,待处理证件中第二区域的尺寸和间距可能不同于模板证件中对应第一区域的尺寸和间距。因此,在对所述第二区域进行文本识别之后,还可以基于所述第二区域中文本的位置信息和/或语义信息,确定所述第二区域是否为需要调整的目标区域,对所述目标区域进行调整,并对调整后的目标区域重新进行文本识别。所述调整包括调整所述目标区域的方向,以使所述目标区域内的字段的语义信息是完整的,还可以包括调整所述目标区域的大小,使一个目标区域内仅包括一个文本行。参考图3A至图3C,可以将满足以下至少一项条件的第二区域作为目标区域:
条件一:所述第二区域中字段的语义不完整,即第二区域内仅包括一句话或者一个词语中的一部分。如图3A所示,实线框内为第二区域,可以看出,第二区域的边界将属于同一词语“小区”的两个字分割开了,导致第二区域内仅包括词语“小区”中的“小”字,即第二区域中字段的语义不完整。
条件二:所述第二区域中字段的位置超出所述第二区域的边界,如图3B所示,第二区域中的字段超出了第二区域的上边框。
条件三:所述第二区域中字段与所述第一区域中字段属于不同语义类别。如图3C所示,模板证件中第一区域中的字段的语义类别为“姓名”,而待处理证件中第二区域中的字段的语义类别为“年龄”,二者属于不同的语义类别。
在存在上述任一情况时,可能导致文本识别结果与真实结果偏差较大。因此,需要对满足上述任一条件的目标区域的位置进行调整,从而提高文本识别的准确性。在对一个目标区域进行调整时,可以基于该目标区域周围的目标区域的调整方式和/或待处理证件中各个字段的语义信息,对所述目标区域进行调整。
可选地,在所述目标区域的数量大于预设数量阈值,且各个目标区域的偏移方向相同的情况下,可以确定多个目标区域的整体偏移量,基于所述整体偏移量对所述多个目标区域进行调整。在这种情况下,由于多个目标区域的偏移方向相同,各个目标区域的偏移量常常较为接近,因此,基于多个目标区域的整体偏移量,按照相同的方式对所述多个目标区域进行统一调整,能够提高对目标区域的调整效率,较为准确地实现对多个目标区域的调整。所述预设数量阈值可以基于待处理证件中第二区域的数量与预设权重的乘积来确定,所述预设权重为小于或等于1的正数。例如,所述预设数量阈值可以等于所述第二区域数量的90%,或者所述预设数量阈值可以等于所述第二区域的数量。以所述预设数量阈值等于所述第二区域的数量为例,所述多个目标区域的整体偏移量可以等于所述多个目标区域中各个目标区域的平均偏移量。具体来说,可以分别计算所述各个目标区域在水平方向的平均偏移量和所述各个目标区域在竖直方向的平均偏移量,再根据计算出的水平方向的平均偏移量在水平方向上对所述多个目标区域进行调整,根据计算出的竖直方向的平均偏移量在竖直方向上对所述多个目标区域进行调整。
如图4A所示,调整前的待处理证件401中包括三个第二区域401a、401b和401c,这三个第二区域均存在向上的偏移量,则根据这三个第二区域的平均偏移量,将这三个第二区域均向下调整,得到调整后的待处理证件402。
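A minimal sketch of the uniform adjustment described above (shifting all target areas back by their average measured offset) might look as follows; the (x, y, w, h) box format and the sample offsets are assumptions made for illustration:

```python
# A sketch of the uniform adjustment above: when enough target areas drift in
# the same direction, shift them all back by the average measured offset.

def adjust_by_overall_offset(areas, offsets):
    """Shift every target area back by the mean offset measured over all of them."""
    mean_dx = sum(dx for dx, _ in offsets) / len(offsets)
    mean_dy = sum(dy for _, dy in offsets) / len(offsets)
    return [(x - mean_dx, y - mean_dy, w, h) for (x, y, w, h) in areas]

areas   = [(100, 40, 80, 20), (100, 80, 80, 20), (100, 120, 80, 20)]
offsets = [(0, -6), (0, -5), (0, -7)]  # all three areas drifted upwards
print(adjust_by_overall_offset(areas, offsets))
# -> every area is moved down by the average of 6 pixels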
可选地,在所述目标区域的数量不大于预设数量阈值,或者存在至少两个目标区域的偏移方向不同的情况下,可以确定所述待处理证件中的第一目标区域的偏移量,基于所述第一目标区域的偏移量,对所述第一目标区域以外的第二目标区域进行调整。由于存在偏移方向不同的目标区域,因此需要对偏移方向不同的目标区域分别进行调整,从而提高对目标区域进行调整的准确性。本实施例将第一目标区域的偏移量作为对第二目标区域进行调整的参考量,能够较为准确地确定对第二目标区域进行调整的调整量。如图4B所示,可以根据目标区域404的偏移量和目标区域405的偏移量确定对目标区域403进行调整的调整量。
具体来说,第一目标区域可以是在所述第二目标区域之前检测到的。例如,可以基于检测到的第1个目标区域的偏移量,对检测到的第2个目标区域进行调整。其中,所述检测可以沿着待处理证件的某个特定方向进行,所述特定方向可以是从上到下,或者从左到右等。
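A possible sketch of this reference-based adjustment, where a later-detected target area is shifted by the offset already measured for an earlier-detected one, is shown below; the data layout is an assumption for illustration:

```python
# Sketch of the reference-based adjustment above: when offsets are inconsistent,
# a later-detected target area is adjusted using the offset already measured for
# an earlier-detected one.

def adjust_with_reference(second_target_area, first_target_offset):
    """Shift the later-detected target area by the earlier area's offset."""
    (x, y, w, h), (dx, dy) = second_target_area, first_target_offset
    return (x + dx, y + dy, w, h)

# The first detected area was found to be offset by (3, -5), so the next one
# is shifted by the same amount.
print(adjust_with_reference((100, 200, 80, 20), (3, -5)))  # -> (103, 195, 80, 20)
```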
可选地,还可以从所述待处理证件中查找与所述第一区域的字段具有相同语义类别的字段,将所述目标区域调整为查找到的字段所在的第二区域。在各个目标区域的偏移量比较随机,检测到第一个目标区域的情况下,可以通过本实施例的方式对目标区域进行调整,从而提高上述情况下对目标区域进行调整的准确性。如图4C所示,针对模板证件中第一区域406中的“年龄”这一语义类别的字段“18”,可以在待处理证件中查找与其同为“年龄”这一语义类别的字段“21”,从而将包括字段“21”的目标区域407调整为图4C中右侧部分所示。
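The semantic-category fallback above can be sketched as a simple lookup; the field records (text, category, box) are illustrative and not part of the patent:

```python
# Sketch of the semantic-category fallback above: look for a field in the
# certificate to be processed whose semantic category matches the template
# field, and move the target area onto that field's box.

def relocate_by_category(template_category, detected_fields):
    """Return the box of the first detected field sharing the template field's
    semantic category, or None if no such field exists."""
    for text, category, box in detected_fields:
        if category == template_category:
            return box
    return None

detected = [("张三", "name", (60, 40, 90, 24)), ("21", "age", (60, 80, 40, 24))]
print(relocate_by_category("age", detected))  # -> (60, 80, 40, 24)
```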
在一些实施例中,在对所述第二区域进行文本识别之后,可以基于所述第二区域中字段的语义信息和/或位置信息,将所述第二区域中的字段拆分到多个新的第二区域中;分别对每个新的第二区域进行文本识别。
一般来说,一个第二区域中仅包括一个文本行。但由于拍摄角度等原因可能导致多个文本行距离较近,也可能导致一个第二区域中包括多个文本行。例如,第二区域中包括文本行“李四”和文本行“女”。在这种情况下,需要对第二区域进行拆分,得到两个新的第二区域,其中一个新的第二区域仅包括文本行“李四”,另一个新的第二区域仅包括文本行“女”。通过拆分第二区域,能够减少因拍摄角度等原因导致的识别错误,从而提高文本识别的准确性。
具体来说,可以基于所述第二区域中字段的语义信息,将所述第二区域中的字段划分为多个字段组,不同字段组中的字段的语义不相关;将每个字段组拆分到一个新的第二区域中。例如,在前面的例子中,文本行“李四”和文本行“女”的语义分别是“姓名”和“性别”,二者属于不同语义类别的字段,语义不相关,从而可以将文本行“李四”和文本行“女”拆分到两个不同的新的第二区域中。
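A sketch of the splitting step above is given below: the fields found in one second area are grouped by semantic category, and every group is given its own new second area. The category labels and the bounding-box union are assumptions made for illustration:

```python
# A sketch of the splitting step above.
from collections import defaultdict

def split_area_by_semantics(fields):
    """fields: list of (text, semantic_category, (x, y, w, h)).
    Returns one (texts, category, box) entry per semantic category."""
    groups = defaultdict(list)
    for text, category, box in fields:
        groups[category].append((text, box))
    new_areas = []
    for category, items in groups.items():
        xs  = [b[0] for _, b in items]
        ys  = [b[1] for _, b in items]
        x2s = [b[0] + b[2] for _, b in items]
        y2s = [b[1] + b[3] for _, b in items]
        box = (min(xs), min(ys), max(x2s) - min(xs), max(y2s) - min(ys))
        new_areas.append(([t for t, _ in items], category, box))
    return new_areas

fields = [("李", "name", (10, 10, 20, 20)), ("四", "name", (32, 10, 20, 20)),
          ("女", "gender", (10, 36, 20, 20))]
for area in split_area_by_semantics(fields):
    print(area)  # "李" and "四" stay together; "女" gets its own new second area
```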
在一些实施例中,在对所述第二区域进行文本识别之后,可以将多个第二区域中至少两个第二区域的识别结果作为整体进行语义识别;基于所述至少两个第二区域整体的语义识别结果,输出文本信息。该过程可称为联合语义识别。所述联合语义识别可以在对第二区域中的目标区域进行调整之后进行。
例如,从待处理证件中的三个第二区域中分别识别出文本信息“李四”、“XX省XX市XX街道”和“XX小区XX号”,则可以对“李四”和“XX省XX市XX街道”进行联合语义识别,以判断这两条文本信息是否相关,如果相关,则将这两条文本信息合并为同一条。同理,可以对“XX省XX市XX街道”和“XX小区XX号”进行联合语义识别。由于这两条文本信息的语义类别都是地址,因此,可以将这两条文本信息合并为同一条,得到文本信息“XX省XX市XX街道XX小区XX号”。
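The joint semantic recognition above can be sketched as merging adjacent results that fall into the same semantic category; the keyword-based address check below is a stand-in for a real semantic model:

```python
# A sketch of the joint semantic recognition above: adjacent recognition results
# are merged when both belong to the same semantic category.

ADDRESS_HINTS = ("省", "市", "街道", "小区", "号")

def looks_like_address(text):
    return any(hint in text for hint in ADDRESS_HINTS)

def merge_related(results):
    """Merge consecutive results that are both address-like into one entry."""
    merged = []
    for text in results:
        if merged and looks_like_address(merged[-1]) and looks_like_address(text):
            merged[-1] += text
        else:
            merged.append(text)
    return merged

print(merge_related(["李四", "XX省XX市XX街道", "XX小区XX号"]))
# -> ['李四', 'XX省XX市XX街道XX小区XX号']
```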
上述实施例中的文本识别方法可用于对待处理证件中的非固定字段进行识别。在得到非固定字段的文本识别结果之后，可以将非固定字段的文本识别结果与待处理证件中的固定字段进行关联处理，以确定每一条非固定字段的识别结果所属的固定字段。例如，在得到文本信息“XX省XX市XX街道XX小区XX号”之后，可以将该文本信息与固定字段“居住地址”进行关联。进一步地，还可以对关联结果进行输出，例如，识别出的非固定字段的文本信息可以输出至其关联的固定字段的尾部。具体来说，对于一个字段Wn，可以基于该字段的坐标，确定该字段所在的第二区域。然后，将字段Wn输出至所在第二区域的上一个字段Wn-1的尾部。如果字段Wn是第二区域中的第一个字段，则将其直接输出至对应的固定字段的尾部。例如，对于第二区域中的字段“李四”，可以将其中的第一个字段“李”输出至固定字段“姓名”的尾部，将其中的第二个字段“四”输出至字段“李”的尾部。在一些实施例中，可以向目标设备发送识别出的非固定字段与所述待处理证件中的固定字段的关联关系，以使所述目标设备基于所述关联关系对所述固定字段与所述识别出的非固定字段进行关联显示。
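A minimal sketch of the association output described above, appending each recognised non-fixed field to the tail of its fixed field, might look as follows; the mapping is assumed to come from the recognition step:

```python
# A minimal sketch of the association output above: each recognised non-fixed
# field is appended to the tail of the fixed field it is associated with; a
# separator could be inserted between the two parts if desired.

def render_associations(associations):
    """associations: list of (fixed_field, recognised_non_fixed_text) pairs."""
    return [fixed + value for fixed, value in associations]

associations = [("姓名", "李四"), ("居住地址", "XX省XX市XX街道XX小区XX号")]
for line in render_associations(associations):
    print(line)
# 姓名李四
# 居住地址XX省XX市XX街道XX小区XX号
```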
上述文本识别可采用神经网络实现。可以基于所述第一区域中的字段的属性信息调用对应的神经网络,通过调用的神经网络对所述第二区域进行文本识别。通过神经网络进行文本识别,能够获得较高的识别准确性。
下面结合一个具体示例,对本公开实施例的方案进行说明。如图5A至图5C所示,可以预先创建模板证件。具体来说,可以先采集模板证件的照片并上传至客户端,然后,可以对上传的模板证件照片的角点进行调整,以调整模板证件照片的大小。进一步地,还可以对模板证件照片进行透视变换,以调整模板证件照片中文字的角度和方向。然后,可以从模板证件照片中选取第一区域(图5B中左侧区域),还可以对第一区域中的识别字段的字段名(例如,出生日期、性别、姓名、证件号码等)和字段类型(例如,文字、数字等)进行编辑。其中,字段名和字段类型可以在选取第一区域之后,由用户手动输入,也可以由神经网络自动识别,并在识别结果有误的情况下,可以由用户手动修改。随后,可以选取固定字段(如图5C中左侧区域内用灰色作为底色标记出的字段)。同样地,固定字段可以由用户手动输入,也可以由神经网络自动识别,并可以由用户手动修改。选取的固定字段尽量分布在模板证件的四周,以提高最终的文本识别结果的准确度。创建完成之后,可以将模板证件保存在服务器中。
在用户通过网页、客户端等上传待处理证件的图片或文档后,服务器可以从所述待处理证件的图片或文档中识别出一个或多个待处理证件的位置、类别和方向,并针对识别出的每个待处理证件,调用相应的模板证件来进行识别。如图6所示,示出了服务器输出的一种识别结果,其中包括固定字段和识别字段,可将识别字段输出至对应的固定字段的尾部。
本领域技术人员可以理解,在具体实施方式的上述方法中,各步骤的撰写顺序并不意味着严格的执行顺序而对实施过程构成任何限定,各步骤的具体执行顺序应当以其功能和可能的内在逻辑确定。
如图7所示,本公开还提供一种文本识别装置,所述装置包括:
获取模块701,用于获取模板证件中的第一区域;
确定模块702,用于确定待处理证件中与所述第一区域对应的第二区域,所述待处理证件与所述模板证件的证件类别相同,所述第一区域在所述模板证件中的相对位置与所述第二区域在所述待处理证件中的相对位置相同;
第一识别模块703，用于基于与所述第一区域中的字段的属性信息对应的识别方式对所述第二区域进行文本识别。
在一些实施例中,所述字段的属性信息包括所述字段的字符类型和/或所述字段的字体类型。
在一些实施例中,所述装置还包括:调整模块,用于基于所述第二区域中文本的位置信息和/或语义信息,确定所述第二区域是否为需要调整的目标区域;以及对所述目标区域进行调整,并对调整后的目标区域进行文本识别。
在一些实施例中,所述调整模块用于:将满足以下至少一项条件的第二区域确定为所述目标区域:所述第二区域中字段的位置超出所述第二区域的边界;所述第二区域中字段的语义不完整;所述第二区域中字段与所述第一区域中字段属于不同语义类别。
在一些实施例中,所述调整模块用于:在所述目标区域的数量大于预设数量阈值,且各个目标区域的偏移方向相同的情况下,确定多个目标区域的整体偏移量,基于所述整体偏移量对所述多个目标区域进行调整。
在一些实施例中,所述调整模块用于:在所述目标区域的数量不大于预设数量阈值,或者存在至少两个目标区域的偏移方向不同的情况下,确定所述待处理证件中的第一目标区域的偏移量,基于所述第一目标区域的偏移量,对所述第一目标区域以外的第二目标区域进行调整。
在一些实施例中,所述第一目标区域在所述第二目标区域之前检测到。
在一些实施例中,所述调整模块用于:从所述待处理证件中查找与所述第一区域的字段具有相同语义类别的字段,将所述目标区域调整为查找到的字段所在的第二区域。
在一些实施例中,所述确定模块702用于:基于预先建立的转换矩阵确定待处理证件中与所述第一区域对应的第二区域;基于所述模板证件中的k个第三区域和所述待处理证件中的k个第四区域建立k个第一矩阵,1≤k<N,k和N均为正整数,N为所述第三区域与所述第四区域的总组数,每组中的第三区域与第四区域一一对应且包括的文本信息相同;针对所述k个第一矩阵中的每个第一矩阵,基于所述第一矩阵对其余N-k个第三区域与第四区域的组进行匹配,确定匹配成功的组数;将匹配成功的组数最多的第一矩阵确定为所述转换矩阵。
在一些实施例中,所述确定模块702用于:从所述模板证件中的第i个第三区域和所述待处理证件中的第i个第四区域中选取多个点对,i为正整数,所述多个点对可以包括首字段的中心点点对、末字段的中心点点对、区域上边界的中点点对以及区域下边界的中点点对;基于所述第i个第三区域和所述第i个第四区域中选取的多个点对,确定所述多个第一矩阵中的第i个第一矩阵。
在一些实施例中,所述装置还包括:拆分模块705,用于在对所述第二区域进行文本识别之后,基于所述第二区域中字段的语义信息和/或位置信息,将所述第二区域中的字段拆分到多个新的第二区域中;第二识别模块706,用于分别对每个新的第二区域进行文本识别。
在一些实施例中,所述拆分模块705用于:基于所述第二区域中字段的语义信息,将所述第二区域中的字段划分为多个字段组,不同字段组中的字段的语义不相关;将每个字段组拆分到一个新的第二区域中。
在一些实施例中,所述第二区域的数量为多个;所述装置还包括:第三识别模块704,用于在对所述第二区域进行文本识别之后,将所述第二区域中至少两个第二区域的识别结果作为整体进行语义识别;输出模块707,用于基于所述至少两个第二区域整体的语义识别结果,输出文本信息。
在一些实施例中,所述第一识别模块703用于:基于所述第一区域中的字段的属性信息调用对应的神经网络;通过调用的神经网络对所述第二区域进行文本识别。
在一些实施例中,所述待处理证件中包括固定字段和非固定字段;在对所述第二区域进行文本识别之后,所述方法还包括:向目标设备发送识别出的非固定字段与所述待处理证件中的固定字段的关联关系,以使所述目标设备基于所述关联关系对所述固定字段与所述识别出的非固定字段进行关联显示。
在一些实施例中,本公开实施例提供的装置具有的功能或包含的模块可以用于执行上文方法实施例描述的方法,其具体实现可以参照上文方法实施例的描述,为了简洁,这里不再赘述。
本说明书实施例还提供一种计算机设备,其至少包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,其中,处理器执行所述程序时实现前述任一实施例所述的方法。
图8示出了本说明书实施例所提供的一种更为具体的计算机设备硬件结构示意图,该设备可以包括:处理器801、存储器802、输入/输出接口803、通信接口804和总线805。其中处理器801、存储器802、输入/输出接口803和通信接口804通过总线805实现彼此之间在设备内部的通信连接。
处理器801可以采用通用的中央处理器(Central Processing Unit,CPU)、微处理器、应用专用集成电路(Application Specific Integrated Circuit,ASIC)、或者一个或多个集成电路等方式实现,用于执行相关程序,以实现本说明书实施例所提供的技术方案。
存储器802可以采用只读存储器(Read Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、静态存储设备,动态存储设备等形式实现。存储器802可以存储操作系统和其他应用程序,在通过软件或者固件来实现本说明书实施例所提供的技术方案时,相关的程序代码保存在存储器802中,并由处理器801来调用执行。
输入/输出接口803可以用于连接输入/输出模块,以实现信息输入及输出。输入输出/模块可以作为组件配置在设备中(图中未示出),也可以外接于设备以提供相应功能。其中输入设备可以包括键盘、鼠标、触摸屏、麦克风、各类传感器等,输出设备可以包括显示器、扬声器、振动器、指示灯等。
通信接口804可以用于连接通信模块(图中未示出),以实现本设备与其他设备的通信交互。其中通信模块可以通过有线方式(例如USB、网线等)实现通信,也可以通过无线方式(例如移动网络、WIFI、蓝牙等)实现通信。
总线805可以包括一通路,在设备的各个组件(例如处理器801、存储器802、输入/输出接口803和通信接口804)之间传输信息。
需要说明的是,尽管上述设备仅示出了处理器801、存储器802、输入/输出接口803、通信接口804以及总线805,但是在具体实施过程中,该设备还可以包括实现正常运行所必需的其他组件。此外,本领域的技术人员可以理解的是,上述设备中也可以仅包含实现本说明书实施例方案所必需的组件,而不必包含图中所示的全部组件。
如图9所示,本公开实施例还提供一种文本识别系统,包括客户端901,用于上传所述待处理证件,并向服务器902发送所述待处理证件;以及服务器902,用于执行本公开任一实施例所述的方法。
其中,所述客户端901可以安装在手机、平板电脑、台式电脑等智能终端上。所述智能终端上设有交互组件,用于上传照片。所述交互组件可以是触摸屏、鼠标、按键等。所述智能终端上还可以设有显示屏,用于预览上传的照片以及文本识别结果。所述智能终端还可以包括通信接口,用于与服务器902进行通信,以向服务器902发送用户上传的照片和用户发送的各种指令,并接收服务器902返回的包括文本识别结果在内的各种信息和指令。
在一些实施例中,所述待处理证件中包括固定字段和非固定字段;所述服务器902还用于:在对所述第二区域进行文本识别之后,向所述客户端901发送识别出的非固定字段与所述待处理证件中的固定字段的关联关系;所述客户端901还用于:响应于接收到所述服务器902发送的所述关联关系,基于所述关联关系对所述固定字段与所述识别出的非固定字段进行关联显示。例如,将非固定字段显示在对应的固定字段的末尾。
本公开实施例还提供一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现前述任一实施例所述的方法。
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带,磁带磁磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory media),如调制的数据信号和载波。
本公开实施例还提供一种文本处理方法，可以应用于所述客户端901，所述客户端901在执行该文本处理方法时向服务器902上传待处理证件，并在接收到服务器902通过执行本公开任一实施例所述的文本识别方法对所述待处理证件进行识别得到识别出的非固定字段与所述待处理证件中的固定字段的关联关系后，基于所述关联关系对所述固定字段与所述识别出的非固定字段进行关联显示。
本公开实施例还提供一种计算机程序,包括计算机可读代码,在所述计算机可读代码被处理器执行时实现本公开任一实施例所述的方法。
通过以上的实施方式的描述可知,本领域的技术人员可以清楚地了解到本说明书实施例可借助软件加必需的通用硬件平台的方式来实现。基于这样的理解,本说明书实施例的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品可以存储在存储介质中,如ROM/RAM、磁碟、光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本说明书实施例各个实施例或者实施例的某些部分所述的方法。
上述实施例阐明的系统、装置、模块或单元,具体可以由计算机芯片或实体实现,或者由具有某种功能的产品来实现。一种典型的实现设备为计算机,计算机的具体形式可以是个人计算机、膝上型计算机、蜂窝电话、相机电话、智能电话、个人数字助理、媒体播放器、导航设备、电子邮件收发设备、游戏控制台、平板计算机、可穿戴设备或者这些设备中的任意几种设备的组合。
本说明书中的各个实施例均采用递进的方式描述,各个实施例之间相同相似的部分互相参见即可,每个实施例重点说明的都是与其他实施例的不同之处。尤其,对于装置实施例而言,由于其基本相似于方法实施例,所以描述得比较简单,相关之处参见方法实施例的部分说明即可。以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的模块可以是或者也可以不是物理上分开的,在实施本说明书实施例方案时可以把各模块的功能在同一个或多个软件和/或硬件中实现。也可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性劳动的情况下,即可以理解并实施。
以上所述仅是本说明书实施例的具体实施方式,应当指出,对于本技术领域的普通技术人员来说,在不脱离本说明书实施例原理的前提下,还可以做出若干改进和润饰,这些改进和润饰也应视为本说明书实施例的保护范围。

Claims (20)

  1. 一种文本识别方法,其特征在于,所述方法包括:
    获取模板证件中的第一区域;
    确定待处理证件中与所述第一区域对应的第二区域,所述待处理证件与所述模板证件的证件类别相同,所述第一区域在所述模板证件中的相对位置与所述第二区域在所述待处理证件中的相对位置相同;
    基于与所述第一区域中的字段的属性信息对应的识别方式对所述第二区域进行文本识别。
  2. 根据权利要求1所述的方法,其特征在于,所述字段的属性信息包括所述字段的字符类型和所述字段的字体类型中的至少一个。
  3. 根据权利要求1或2所述的方法,其特征在于,所述方法还包括:
    在对所述第二区域进行文本识别之后,基于所述第二区域中文本的位置信息和语义信息中的至少一个,确定所述第二区域是否为需要调整的目标区域;
    对所述目标区域进行调整;并
    对调整后的所述目标区域进行文本识别。
  4. 根据权利要求3所述的方法,其特征在于,所述基于所述第二区域中字段的位置信息和语义信息中的至少一个,确定所述第二区域是否为需要调整的目标区域,包括:
    将满足以下至少一项条件的第二区域确定为所述目标区域:
    所述第二区域中字段的位置超出所述第二区域的边界;
    所述第二区域中字段的语义不完整;
    所述第二区域中字段与所述第一区域中字段属于不同语义类别。
  5. 根据权利要求3或4所述的方法,其特征在于,所述对所述目标区域进行调整,包括:
    在所述目标区域的数量大于预设数量阈值,且各个所述目标区域的偏移方向相同的情况下,确定多个所述目标区域的整体偏移量;
    基于所述整体偏移量对所述多个目标区域进行调整。
  6. 根据权利要求3至5任意一项所述的方法,其特征在于,所述对所述目标区域进行调整,包括:
    在所述目标区域的数量不大于预设数量阈值,或者存在至少两个所述目标区域的偏移方向不同的情况下,确定所述待处理证件中的第一目标区域的偏移量;
    基于所述第一目标区域的偏移量,对所述第一目标区域以外的第二目标区域进行调整。
  7. 根据权利要求6所述的方法,其特征在于,所述第一目标区域在所述第二目标区域之前检测到。
  8. 根据权利要求3所述的方法,其特征在于,所述对所述目标区域进行调整,包括:
    从所述待处理证件中查找与所述第一区域的字段具有相同语义类别的字段;
    将所述目标区域调整为查找到的字段所在的第二区域。
  9. 根据权利要求1至8任意一项所述的方法，其特征在于，所述确定待处理证件中与所述第一区域对应的第二区域，包括：
    基于预先建立的转换矩阵确定待处理证件中与所述第一区域对应的第二区域;
    其中,所述转换矩阵基于以下方式确定:
    基于所述模板证件中的k个第三区域和所述待处理证件中的k个第四区域建立k个第一矩阵,1≤k<N,k和N均为正整数,N为所述第三区域与所述第四区域的总组数,每组中的第三区域与第四区域一一对应且包括的文本信息相同;
    针对所述k个第一矩阵中的每个第一矩阵,基于所述第一矩阵对其余N-k个第三区域与第四区域的组进行匹配,确定匹配成功的组数;
    将所述k个第一矩阵中匹配成功的组数最多的第一矩阵确定为所述转换矩阵。
  10. 根据权利要求9所述的方法,其特征在于,所述基于所述模板证件中的k个第三区域和所述待处理证件中的k个第四区域建立k个第一矩阵,包括:
    从所述模板证件中的第i个第三区域和所述待处理证件中的第i个第四区域中选取多个点对,i为正整数,所述多个点对包括首字段的中心点点对、末字段的中心点点对、区域上边界的中点点对以及区域下边界的中点点对;
    基于所述第i个第三区域和所述第i个第四区域中的多个点对,建立所述k个第一矩阵中的第i个第一矩阵。
  11. 根据权利要求1至10任意一项所述的方法,其特征在于,所述方法还包括:
    在对所述第二区域进行文本识别之后,基于所述第二区域中字段的语义信息和位置信息中的至少一个,将所述第二区域中的字段拆分到多个新的第二区域中;
    分别对每个所述新的第二区域进行文本识别。
  12. 根据权利要求11所述的方法,其特征在于,所述基于所述第二区域中字段的语义信息,将所述第二区域中的字段拆分到多个新的第二区域中,包括:
    基于所述第二区域中字段的语义信息,将所述第二区域中的字段划分为多个字段组,不同所述字段组中的字段的语义不相关;
    将每个所述字段组拆分到一个新的第二区域中。
  13. 根据权利要求1至12任意一项所述的方法,其特征在于,所述第二区域的数量为多个;所述方法还包括:
    在对所述第二区域进行文本识别之后,将所述第二区域中至少两个第二区域的识别结果作为整体进行语义识别;
    基于所述至少两个第二区域整体的语义识别结果,输出文本信息。
  14. 根据权利要求1至13任意一项所述的方法,其特征在于,所述基于与所述第一区域中的字段的属性信息对应的识别方式对所述第二区域进行文本识别,包括:
    基于所述第一区域中的字段的属性信息调用对应的神经网络;
    通过调用的神经网络对所述第二区域进行文本识别。
  15. 根据权利要求1至14任意一项所述的方法,其特征在于,所述待处理证件中包括固定字段和非固定字段;在对所述第二区域进行文本识别之后,所述方法还包括:
    向目标设备发送识别出的非固定字段与所述待处理证件中的固定字段的关联关系,以使所述目标设备基于所述关联关系对所述固定字段与所述识别出的非固定字段进行关联显示。
  16. 一种文本识别装置,其特征在于,所述装置包括:
    获取模块,用于获取模板证件中的第一区域;
    确定模块,用于确定待处理证件中与所述第一区域对应的第二区域,所述待处理证件与所述模板证件的证件类别相同,所述第一区域在所述模板证件中的相对位置与所述第二区域在所述待处理证件中的相对位置相同;
    第一识别模块,用于基于与所述第一区域中的字段的属性信息对应的识别方式对所述第二区域进行文本识别。
  17. 一种计算机可读存储介质,其上存储有计算机程序,其特征在于,该程序被处理器执行时实现权利要求1至15任意一项所述的方法。
  18. 一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,其特征在于,所述处理器执行所述程序时实现权利要求1至15任意一项所述的方法。
  19. 一种文本处理方法,应用于客户端,其特征在于,所述方法包括:
    向服务器上传待处理证件,所述待处理证件中包括固定字段和非固定字段;
    接收所述服务器发送的识别出的所述非固定字段与所述待处理证件中的固定字段的关联关系,并基于所述关联关系对所述固定字段与所述识别出的非固定字段进行关联显示;
    其中,所述识别出的非固定字段与所述待处理证件中的固定字段的关联关系为所述服务器通过执行权利要求1至15任意一项所述的方法对所述待处理证件进行识别得到的识别结果。
  20. 一种计算机程序,包括计算机可读代码,其特征在于,所述计算机可读代码被处理器执行时实现权利要求1至15任意一项所述的方法。
PCT/CN2021/121541 2020-12-31 2021-09-29 文本识别方法、装置和存储介质 WO2022142549A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011617846.1A CN112633279A (zh) 2020-12-31 2020-12-31 文本识别方法、装置和系统
CN202011617846.1 2020-12-31

Publications (1)

Publication Number Publication Date
WO2022142549A1 true WO2022142549A1 (zh) 2022-07-07

Family

ID=75287196

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/121541 WO2022142549A1 (zh) 2020-12-31 2021-09-29 文本识别方法、装置和存储介质

Country Status (2)

Country Link
CN (1) CN112633279A (zh)
WO (1) WO2022142549A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633279A (zh) * 2020-12-31 2021-04-09 北京市商汤科技开发有限公司 文本识别方法、装置和系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110288755A (zh) * 2019-05-21 2019-09-27 平安银行股份有限公司 基于文本识别的发票检验方法、服务器及存储介质
US10621727B1 (en) * 2016-07-26 2020-04-14 Intuit Inc. Label and field identification without optical character recognition (OCR)
CN111126125A (zh) * 2019-10-15 2020-05-08 平安科技(深圳)有限公司 证件中的目标文本提取方法、装置、设备及可读存储介质
CN111931784A (zh) * 2020-09-17 2020-11-13 深圳壹账通智能科技有限公司 票据识别方法、系统、计算机设备与计算机可读存储介质
CN112633279A (zh) * 2020-12-31 2021-04-09 北京市商汤科技开发有限公司 文本识别方法、装置和系统

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492643B (zh) * 2018-10-11 2023-12-19 平安科技(深圳)有限公司 基于ocr的证件识别方法、装置、计算机设备及存储介质
CN110245674B (zh) * 2018-11-23 2023-09-15 浙江大华技术股份有限公司 模板匹配方法、装置、设备及计算机存储介质
CN110569850B (zh) * 2019-08-20 2022-07-12 北京旷视科技有限公司 字符识别模板匹配方法、装置和文本识别设备
CN110689010B (zh) * 2019-09-27 2021-05-11 支付宝(杭州)信息技术有限公司 一种证件识别方法及装置
CN111444908B (zh) * 2020-03-25 2024-02-02 腾讯科技(深圳)有限公司 图像识别方法、装置、终端和存储介质
CN111914840A (zh) * 2020-07-31 2020-11-10 中国建设银行股份有限公司 一种文本识别方法、模型训练方法、装置及设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10621727B1 (en) * 2016-07-26 2020-04-14 Intuit Inc. Label and field identification without optical character recognition (OCR)
CN110288755A (zh) * 2019-05-21 2019-09-27 平安银行股份有限公司 基于文本识别的发票检验方法、服务器及存储介质
CN111126125A (zh) * 2019-10-15 2020-05-08 平安科技(深圳)有限公司 证件中的目标文本提取方法、装置、设备及可读存储介质
CN111931784A (zh) * 2020-09-17 2020-11-13 深圳壹账通智能科技有限公司 票据识别方法、系统、计算机设备与计算机可读存储介质
CN112633279A (zh) * 2020-12-31 2021-04-09 北京市商汤科技开发有限公司 文本识别方法、装置和系统

Also Published As

Publication number Publication date
CN112633279A (zh) 2021-04-09

Similar Documents

Publication Publication Date Title
WO2022142551A1 (zh) 表单处理方法、装置、介质及计算机设备
WO2022142550A1 (zh) 图像识别方法、装置和存储介质
US11238362B2 (en) Modeling semantic concepts in an embedding space as distributions
WO2020238054A1 (zh) Pdf文档中图表的定位方法、装置及计算机设备
US8600989B2 (en) Method and system for image matching in a mixed media environment
US11461386B2 (en) Visual recognition using user tap locations
US20220253631A1 (en) Image processing method, electronic device and storage medium
US20150339536A1 (en) Collaborative text detection and recognition
US20070047780A1 (en) Shared Document Annotation
US20060285172A1 (en) Method And System For Document Fingerprint Matching In A Mixed Media Environment
US20060262962A1 (en) Method And System For Position-Based Image Matching In A Mixed Media Environment
EP1917636B1 (en) Method and system for image matching in a mixed media environment
US11734341B2 (en) Information processing method, related device, and computer storage medium
WO2022105119A1 (zh) 意图识别模型的训练语料生成方法及其相关设备
WO2022105569A1 (zh) 页面方向识别方法、装置、设备及计算机可读存储介质
CN100552670C (zh) 一种自动识别数字文档版心的方法
WO2022142549A1 (zh) 文本识别方法、装置和存储介质
US10740644B2 (en) Method and system for background removal from documents
WO2021051562A1 (zh) 人脸特征点定位方法、装置、计算设备和存储介质
US10891463B2 (en) Signature match system and method
JP6441142B2 (ja) 検索装置、方法及びプログラム
CN114445833B (zh) 文本识别方法、装置、电子设备和存储介质
WO2022105120A1 (zh) 图片文字检测方法、装置、计算机设备及存储介质
CN113220949A (zh) 一种隐私数据识别系统的构建方法及装置
US8280891B1 (en) System and method for the calibration of a scoring function

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21913294

Country of ref document: EP

Kind code of ref document: A1