WO2021110174A1 - Image recognition method, apparatus, electronic device and storage medium - Google Patents

Image recognition method, apparatus, electronic device and storage medium

Info

Publication number
WO2021110174A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
key block
sub-image
recognition
Prior art date
Application number
PCT/CN2020/134332
Other languages
English (en)
French (fr)
Inventor
周锴
王雷
宋祺
张睿
Original Assignee
北京三快在线科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京三快在线科技有限公司 filed Critical 北京三快在线科技有限公司
Publication of WO2021110174A1 publication Critical patent/WO2021110174A1/zh


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words

Definitions

  • This application relates to the field of image recognition, and in particular to an image recognition method, apparatus, electronic device, and storage medium.
  • Image recognition has a wide range of applications in fields such as identity verification and word processing.
  • An important application scenario is to identify business licenses, ID cards and other certificates to verify identity or qualifications.
  • an image recognition method, including: acquiring an image to be recognized; selecting a key block detection model that matches the recognition category, and performing key block detection on the image to be recognized according to the selected key block detection model; if multiple key blocks are detected, clustering the detected key blocks, and segmenting a number of sub-images from the image to be recognized according to the clustering result, so that each sub-image includes several of the key blocks; and performing character recognition on each sub-image.
  • the method further includes: if no key block can be detected, determining that the category of the image to be recognized does not match the recognition category.
  • the key block detection model is obtained by training in the following manner: obtaining sample images of a specified category, annotated with multiple key blocks, as training data; and performing iterative training with the training data to obtain a key block detection model matching the specified category, where the key block detection model is implemented based on a target detection algorithm.
  • clustering the detected key blocks includes: performing clustering based on the respective vector representations of the key blocks, where the clustering result satisfies the following conditions: the ratio of the area of each sub-image to the area of the image to be recognized is not greater than a first threshold, and the ratio of the accumulated area of the key blocks in each sub-image to the area of that sub-image is not less than a second threshold.
  • the vector representation of the key block includes: the coordinates of the center point of the key block, the width of the key block, and the height of the key block.
  • performing character recognition on a sub-image includes: performing text line detection on the sub-image to obtain detected text lines; matching the detected text lines with the key blocks in the sub-image; and determining the attribute of each matched text line according to the attribute of the corresponding key block in the sub-image.
  • performing text recognition on the sub-picture further includes: performing text content recognition on the detected text line.
  • an image recognition apparatus, including: an image acquisition unit for acquiring an image to be recognized; a key block detection unit for selecting a key block detection model that matches the recognition category and performing key block detection on the image to be recognized according to the selected key block detection model; a clustering unit for, if multiple key blocks are detected, clustering the detected key blocks and segmenting a number of sub-images from the image to be recognized according to the clustering result, so that each sub-image includes several of the key blocks; and a recognition unit for performing character recognition on each sub-image.
  • the recognition unit is further configured to determine that the category of the image to be recognized does not match the recognition category if no key block can be detected.
  • the key block detection model is obtained by training in the following manner: obtaining sample images of a specified category, annotated with multiple key blocks, as training data; and performing iterative training with the training data to obtain a key block detection model matching the specified category, where the key block detection model is implemented based on a target detection algorithm.
  • the clustering unit is configured to perform clustering based on the respective vector representations of the key blocks, where the clustering result satisfies the following conditions: the ratio of the area of each sub-image to the area of the image to be recognized is not greater than the first threshold, and the ratio of the accumulated area of the key blocks in each sub-image to the area of that sub-image is not less than the second threshold.
  • the vector representation of the key block includes: the coordinates of the center point of the key block, the width of the key block, and the height of the key block.
  • the recognition unit is configured to perform text line detection on the sub-image to obtain detected text lines, match the detected text lines with the key blocks in the sub-image, and determine the attribute of each matched text line according to the attribute of the corresponding key block.
  • the recognition unit is configured to perform text content recognition on the detected text line.
  • an electronic device, including: a processor; and a memory storing computer-executable instructions that, when executed, cause the processor to perform any of the foregoing methods.
  • a computer-readable storage medium storing one or more programs that, when executed by a processor, implement any of the methods described above.
  • Fig. 1 shows a schematic flowchart of an image recognition method according to an embodiment of the present application;
  • Fig. 2 shows multiple key blocks detected in a food business license image;
  • Fig. 3 shows an invoice image containing small text content;
  • Fig. 4 shows a schematic diagram of the sub-image division of Fig. 2;
  • Fig. 5 shows the text line detection result for the left half sub-image of Fig. 4;
  • Fig. 6 shows a schematic structural diagram of an image recognition apparatus according to an embodiment of the present application;
  • Fig. 7 shows a schematic structural diagram of an electronic device according to an embodiment of the present application;
  • Fig. 8 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
  • One method is to design a dedicated recognition scheme for each specific license. Because this requires steps such as summarizing prior information and developing the service, it is very labor- and time-intensive, usually requiring at least two man-months.
  • Another method takes advantage of the relatively fixed format of licenses and performs image matching between the image to be recognized and a sample image of the corresponding format before recognition. However, this method only suits ideal situations in which the image to be recognized is clear and undistorted. Once the image exhibits text line drift, deformation, affine transformation, or the like, the recognition results are very unsatisfactory.
  • taking advantage of the relatively fixed format of licenses, a corresponding key block detection model is selected according to the recognition category (such as ID card recognition or business license recognition), and key blocks are determined from the image to be recognized;
  • the key blocks are clustered, and the image to be recognized is segmented according to the clustering result.
  • the segmented image can be appropriately enlarged, so that the result of character recognition on the segmented image is more accurate.
  • the two strongly correlated steps of key block detection and intelligent image segmentation based on the detected key blocks significantly improve recognition accuracy and recall.
  • the recognition and development of the new format can be realized in only about 3 man-days, which greatly reduces the resource cost.
  • the embodiments of the present application can be applied to the recognition of images of relatively fixed-format licenses, including but not limited to identity verification, qualification verification, etc., and can be applied to business fields such as food delivery and financial services. A detailed introduction will be given below in conjunction with each embodiment.
  • Fig. 1 shows a schematic flowchart of an image recognition method according to an embodiment of the present application. As shown in Fig. 1, the method includes step S110 to step S140.
  • Step S110: Obtain an image to be recognized.
  • the image to be recognized may be an image uploaded by a user, understood broadly: photos, screenshots, and video frames extracted from videos all belong to the category of images.
  • For example, for identity verification, the user can be required to upload an ID photo; when registering as a takeaway merchant, the merchant can be required to provide a business license photo, and so on.
  • The upload of an image to be recognized can be associated with category information indicating the image category that needs to be recognized in that upload scenario (referred to as the "recognition category"); this recognition category may not match the actual category of the uploaded image. For example, if the recognition category is a driver's license photo but the image to be recognized is actually a photo of a driving permit, the image obviously does not match the recognition category.
  • Step S120: Select a key block detection model that matches the recognition category, and perform key block detection on the image to be recognized according to the selected key block detection model.
  • For example, for an ID card recognition scenario, select the key block detection model matching ID cards; for an invoice recognition scenario, select the key block detection model matching invoices.
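  • The selection step can be sketched as a simple lookup table. The registry contents, category names, and model placeholders below are illustrative assumptions, not part of the application:

```python
# Hypothetical registry mapping each recognition category to its trained
# key block detection model (model objects stubbed as strings here).
MODEL_REGISTRY = {
    "id_card": "idcard_keyblock_model",
    "invoice": "invoice_keyblock_model",
    "business_license": "license_keyblock_model",
}

def select_model(recognition_category):
    """Select the key block detection model matching the recognition category."""
    return MODEL_REGISTRY[recognition_category]
```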
  • the key block detection model here can be obtained through deep learning training.
  • preprocessing can be performed first, such as image segmentation to remove parts unrelated to the license or document; the image can be beautified and corrected to make the text clearer and bring the shape of the license closer to the ideal state; and the orientation of the image to be recognized can be adjusted to improve recognition accuracy. For example, if the driver's license photo taken by the user is upside down, it can be rotated 180 degrees before detection.
  • key block detection in this application determines both the location and the attributes of key blocks.
  • key blocks can be determined by detection (marking each detected key block with a bounding box) or by segmentation (marking each detected key block with a mask).
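  • As a minimal sketch, a detected key block (bounding-box location plus attribute) could be represented as follows; the `KeyBlock` type, its field names, and the sample values are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class KeyBlock:
    """One detected key block: a bounding box plus an attribute label."""
    x: float        # center-point x coordinate
    y: float        # center-point y coordinate
    w: float        # box width
    h: float        # box height
    attribute: str  # e.g. "key_0" (keyword) or "key_0_content" (content)

# A detector matching the recognition category would return a list of these:
blocks = [KeyBlock(120, 80, 90, 30, "key_0"),
          KeyBlock(260, 80, 200, 30, "key_0_content")]
```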
  • Step S130: if multiple key blocks are detected, cluster the detected key blocks, and segment a number of sub-images from the image to be recognized according to the clustering result, so that each sub-image contains several key blocks.
  • the detected key blocks can contain attributes.
  • Fig. 2 shows multiple key blocks detected in a food business license image. As can be seen, the detected key blocks correspond to the keyword "operator name" (key_0) and its content (key_0_content), the keyword "social credit code" (key_1) and its content (key_1_content), and so on; during storage or computation they can be stored as corresponding key and content pairs, such as key_2, key_2_content, key_3, key_3_content, etc. A key block thus in effect corresponds to a text block, such as an information area that a fixed-format image must contain.
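  • The key/content naming convention above can be paired up with a small helper. The function name, the (attribute, box) tuple layout, and the sample boxes are illustrative assumptions:

```python
def pair_key_blocks(blocks):
    """Group detected key blocks into (keyword box, content box) pairs.

    `blocks` is a list of (attribute, box) tuples; attributes follow the
    key_i / key_i_content naming convention from the example above.
    """
    keys, contents = {}, {}
    for attribute, box in blocks:
        if attribute.endswith("_content"):
            contents[attribute[:-len("_content")]] = box
        else:
            keys[attribute] = box
    return {k: (keys[k], contents.get(k)) for k in keys}

pairs = pair_key_blocks([("key_0", (10, 10, 80, 30)),
                         ("key_0_content", (100, 10, 200, 30)),
                         ("key_1", (10, 60, 80, 30))])
# pairs["key_0"] holds both the keyword box and its content box;
# pairs["key_1"] has no content box detected, so its second element is None.
```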
  • this application proposes key block detection followed by intelligent image segmentation based on the detected key blocks: the segmented sub-images are obtained first, and text lines are then recognized within each sub-image, which greatly improves recognition accuracy and recall; the sub-images can also be appropriately enlarged to make the characteristics of the text more distinct.
  • Step S140: Character recognition is performed on each sub-image. This application does not limit the implementation of character recognition.
  • the method shown in Fig. 1, after obtaining the image to be recognized, first selects a matching key block detection model according to the recognition category, then clusters the detected key blocks, segments a number of sub-images from the image to be recognized according to the clustering result, and finally performs text recognition on each sub-image.
  • the above method further includes: if no key block can be detected, determining that the category of the image to be recognized does not match the recognition category.
  • In a driver's license recognition scenario, if the user mistakenly uploads a driving permit, it is difficult to detect key blocks with the driver's license key block detection model; at this point, it can be determined that the category of the image to be recognized does not match the recognition category.
  • This embodiment can have better practicability in a business scenario. For example, it can prompt "the image is wrong, please re-upload the image" based on this.
  • recognition efficiency is also better: if it is determined that the image to be recognized does not contain the key blocks corresponding to the recognition category, it is very likely that the image does not contain the effective information of that category, so a mismatch between the image's category and the recognition category can be found at lower cost and faster.
  • the key block detection model can be obtained by training in the following manner: obtaining sample images of a specified category, annotated with multiple key blocks, as training data; and performing iterative training with the training data to obtain a key block detection model matching the specified category.
  • the key block detection model is implemented based on the target detection algorithm.
  • the network architecture of the key block detection model here can directly use the existing target detection framework and train based on the labeled training data to obtain a key block detection model that matches the category. It is also possible to build a general basic target detection framework, perform different training based on different training data, and obtain different key block detection models.
  • the target detection framework is used to detect key blocks.
  • the detected key blocks can contain multiple text lines; there is no need to separate the text lines, as only the location and attributes of the key blocks need to be determined.
  • clustering the detected key blocks may include clustering based on the vector representations of the key blocks, where the clustering result satisfies the following conditions: the ratio of the area of each sub-image to the area of the image to be recognized is not greater than the first threshold, and the ratio of the accumulated area of the key blocks in each sub-image to the area of that sub-image is not less than the second threshold.
  • the vector representation of the key block includes: the coordinate of the center point of the key block, the width of the key block, and the height of the key block.
  • the vector of a key block is expressed as (x, y, w, h), where x and y are the horizontal and vertical coordinates of the center point of the key block, w is its width, and h is its height.
  • the embodiment of the present application adopts an evaluation function to control the segmentation result. Denote the area of each sub-image as S_i (0 ≤ i < k), the accumulated area of the key blocks contained in sub-image i as S_boxi (0 ≤ i < k), and the area of the image to be recognized as S. The evaluation function is satisfied when, for every sub-image i, S_i / S ≤ threshold1 and S_boxi / S_i ≥ threshold2, where threshold1 and threshold2 are the first and second thresholds respectively (the two thresholds may be equal).
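  • Under these definitions, the evaluation conditions can be sketched directly; `segmentation_ok` is a hypothetical name and the threshold values are illustrative:

```python
def segmentation_ok(sub_areas, key_areas, S, threshold1, threshold2):
    """Check the evaluation conditions for a candidate segmentation.

    sub_areas[i] = S_i,    area of sub-image i
    key_areas[i] = S_boxi, accumulated area of the key blocks in sub-image i
    S            = area of the whole image to be recognized
    """
    return all(s / S <= threshold1 and b / s >= threshold2
               for s, b in zip(sub_areas, key_areas))

# A segmentation where each sub-image is small relative to the image (<= 0.6)
# and densely covered by key blocks (>= 0.3) passes:
print(segmentation_ok([40.0, 50.0], [20.0, 30.0], S=100.0,
                      threshold1=0.6, threshold2=0.3))  # True
```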
  • clustering and sub-image segmentation can be performed dynamically: for example, first initialize k to 1 and judge whether the resulting sub-images satisfy the evaluation function; if not, increment k by 1 and repeat the clustering and segmentation until the evaluation function is satisfied.
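  • This dynamic procedure can be sketched as follows. The application does not name a specific clustering algorithm, so the minimal k-means below is an illustrative assumption, as are the function names and the bounding-rectangle definition of a candidate sub-image:

```python
def dist2(u, v):
    """Squared Euclidean distance between two (x, y, w, h) vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def kmeans(vectors, k, iters=20):
    """Minimal k-means on key block vectors (illustrative only; a library
    implementation could be substituted)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist2(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels

def segment(vectors, S, threshold1, threshold2, max_k=10):
    """Increase k until every cluster's bounding rectangle (the candidate
    sub-image) satisfies the evaluation conditions from the text."""
    for k in range(1, max_k + 1):
        labels = kmeans(vectors, k)
        ok = True
        for c in range(k):
            boxes = [v for v, lab in zip(vectors, labels) if lab == c]
            if not boxes:
                continue
            x0 = min(x - w / 2 for x, y, w, h in boxes)
            x1 = max(x + w / 2 for x, y, w, h in boxes)
            y0 = min(y - h / 2 for x, y, w, h in boxes)
            y1 = max(y + h / 2 for x, y, w, h in boxes)
            s_i = (x1 - x0) * (y1 - y0)                 # S_i: sub-image area
            s_box = sum(w * h for x, y, w, h in boxes)  # S_boxi: key block area
            if s_i / S > threshold1 or s_box / s_i < threshold2:
                ok = False
        if ok:
            return k, labels
    return max_k, labels
```

For example, two groups of key blocks in opposite corners of a large image would fail the thresholds at k = 1 and succeed at k = 2.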
  • The result of segmenting Fig. 2 is shown in Fig. 4, in which the two sub-images obtained by the segmentation are framed by dashed lines.
  • performing character recognition on each sub-image includes: performing text line detection on each sub-image to obtain detected text lines; matching the detected text lines with the key blocks; and determining the attributes of each matched text line according to the attributes of the key block.
  • the image to be recognized may exhibit rotation (including small-angle rotation; for example, due to the shooting angle there may be a small angle between the vertical axis of the document and that of the image), affine transformation (for example, a rectangular document photographed as an oblique parallelogram), blur, and the like. Directly performing line segmentation detection on each key block in the sub-image therefore yields unsatisfactory recognition results.
  • this application therefore proposes a text line segmentation method: text line detection is performed on the sub-image to obtain multiple text lines; each text line is then matched with the key blocks, and the attributes of a matched text line can be determined according to the attributes of the key block.
  • the matching implementation methods include but are not limited to using IoU (Intersection over Union).
  • IoU is a common concept in target detection. It usually refers to the overlap rate between a generated candidate bound and the ground-truth bound, that is, the ratio of their intersection to their union, also known as the intersection-over-union ratio.
  • if the IoU of a detected text line and a key block is greater than a preset threshold, they can be considered a match.
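  • The IoU-based matching described above can be sketched as follows, assuming axis-aligned boxes in (x0, y0, x1, y1) form; the function names and the 0.5 threshold are illustrative:

```python
def iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes (x0, y0, x1, y1)."""
    ix0 = max(box_a[0], box_b[0])
    iy0 = max(box_a[1], box_b[1])
    ix1 = min(box_a[2], box_b[2])
    iy1 = min(box_a[3], box_b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def matches(text_line, key_block, threshold=0.5):
    """A detected text line matches a key block when their IoU exceeds the threshold."""
    return iou(text_line, key_block) > threshold
```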
  • Fig. 5 shows the text line detection results for the left half sub-image of Fig. 4 (shown as white boxes), in which the text line "xx cold food shop in Xiqing District, Tianjin" has the attribute of being the content item of "operator name".
  • the text line detection algorithm may include, but is not limited to, the CTPN algorithm, the seg-link algorithm, and so on.
  • performing character recognition on each sub-picture separately further includes: performing character content recognition on the detected character line.
  • This application does not limit the implementation of text content recognition.
  • the recognized text content can be applied to the corresponding scenario.
  • the user only needs to provide an image of the business license and, instead of manually filling in the legal representative and other information, can directly use the text content recognition result.
  • Fig. 6 shows a schematic structural diagram of an image recognition device according to an embodiment of the present application.
  • the image recognition device 600 includes:
  • the image acquisition unit 610 is configured to acquire an image to be recognized.
  • the image to be recognized may be an image uploaded by a user, understood broadly: photos, screenshots, and video frames extracted from videos all belong to the category of images.
  • For example, for identity verification, the user can be required to upload an ID photo; when registering as a takeaway merchant, the merchant can be required to provide a business license photo, and so on.
  • The upload of an image to be recognized can be associated with category information indicating the image category that needs to be recognized in that upload scenario (referred to as the "recognition category"); this recognition category may not match the actual category of the uploaded image. For example, if the recognition category is a driver's license photo but the image to be recognized is actually a photo of a driving permit, the image obviously does not match the recognition category.
  • the key block detection unit 620 is configured to select a key block detection model that matches the recognition category, and perform key block detection on the image to be recognized according to the selected key block detection model.
  • For example, for an ID card recognition scenario, select the key block detection model matching ID cards; for an invoice recognition scenario, select the key block detection model matching invoices.
  • the key block detection model here can be obtained through deep learning training.
  • preprocessing can be performed first, such as image segmentation to remove parts unrelated to the license or document; the image can be beautified and corrected to make the text clearer and bring the shape of the license closer to the ideal state; and the orientation of the image to be recognized can be adjusted to improve recognition accuracy. For example, if the driver's license photo taken by the user is upside down, it can be rotated 180 degrees before detection.
  • key block detection in this application determines both the location and the attributes of key blocks.
  • key blocks can be determined by detection (marking each detected key block with a bounding box) or by segmentation (marking each detected key block with a mask).
  • the clustering unit 630 is configured to, if multiple key blocks are detected, cluster the detected key blocks and segment several sub-images from the image to be recognized according to the clustering result, so that each sub-image contains several key blocks.
  • the detected key blocks can contain attributes.
  • Fig. 2 shows multiple key blocks detected in a food business license image. As can be seen, the detected key blocks correspond to the keyword "operator name" (key_0) and its content (key_0_content), the keyword "social credit code" (key_1) and its content (key_1_content), and so on; during storage or computation they can be stored as corresponding key and content pairs, such as key_2, key_2_content, key_3, key_3_content, etc. A key block thus in effect corresponds to a text block, such as an information area that a fixed-format image must contain.
  • this application proposes key block detection followed by intelligent image segmentation based on the detected key blocks: the segmented sub-images are obtained first, and text lines are then recognized within each sub-image, which greatly improves recognition accuracy and recall; the sub-images can also be appropriately enlarged to make the characteristics of the text more distinct.
  • the recognition unit 640 is configured to perform character recognition on each sub-picture. This application does not limit the implementation of character recognition.
  • after obtaining the image to be recognized, the device shown in Fig. 6 first selects a matching key block detection model according to the recognition category, then clusters the detected key blocks, segments a number of sub-images from the image to be recognized according to the clustering result, and finally performs text recognition on each sub-image.
  • the recognition unit 640 is further configured to determine that the category of the image to be recognized does not match the recognition category if no key block can be detected.
  • In a driver's license recognition scenario, if the user mistakenly uploads a driving permit, it is difficult to detect key blocks with the driver's license key block detection model; at this point, it can be determined that the category of the image to be recognized does not match the recognition category.
  • This embodiment can have better practicability in a business scenario. For example, it can prompt "the image is wrong, please re-upload the image" based on this.
  • recognition efficiency is also better: if it is determined that the image to be recognized does not contain the key blocks corresponding to the recognition category, it is very likely that the image does not contain the effective information of that category, so a mismatch between the image's category and the recognition category can be found at lower cost and faster.
  • the key block detection model can be obtained by training in the following manner: obtaining sample images of a specified category, annotated with multiple key blocks, as training data; and performing iterative training with the training data to obtain a key block detection model matching the specified category.
  • the key block detection model is implemented based on the target detection algorithm.
  • the network architecture of the key block detection model here can directly use the existing target detection framework and train based on the labeled training data to obtain a key block detection model that matches the category. It is also possible to build a general basic target detection framework, perform different training based on different training data, and obtain different key block detection models.
  • the target detection framework is used to detect key blocks.
  • the detected key blocks can contain multiple text lines; there is no need to separate the text lines, as only the location and attributes of the key blocks need to be determined.
  • the clustering unit 630 is configured to perform clustering based on the vector representation of the key block.
  • the clustering result satisfies the following conditions: the ratio of the area of each sub-image to the area of the image to be recognized is not greater than the first threshold, and the ratio of the accumulated area of the key blocks in each sub-image to the area of that sub-image is not less than the second threshold.
  • the vector representation of the key block includes: the coordinate of the center point of the key block, the width of the key block, and the height of the key block.
  • the vector of a key block is expressed as (x, y, w, h), where x and y are the horizontal and vertical coordinates of the center point of the key block, w is its width, and h is its height.
  • the embodiment of the present application adopts an evaluation function to control the segmentation result. Denote the area of each sub-image as S_i (0 ≤ i < k), the accumulated area of the key blocks contained in sub-image i as S_boxi (0 ≤ i < k), and the area of the image to be recognized as S. The evaluation function is satisfied when, for every sub-image i, S_i / S ≤ threshold1 and S_boxi / S_i ≥ threshold2, where threshold1 and threshold2 are the first and second thresholds respectively (the two thresholds may be equal).
  • clustering and sub-image segmentation can be performed dynamically: for example, first initialize k to 1 and judge whether the resulting sub-images satisfy the evaluation function; if not, increment k by 1 and repeat the clustering and segmentation until the evaluation function is satisfied.
  • The result of segmenting Fig. 2 is shown in Fig. 4, in which the two sub-images obtained by the segmentation are framed by dashed lines.
  • the recognition unit 640 is configured to perform text line detection on each sub-image to obtain detected text lines, match the detected text lines with the key blocks in the sub-image, and determine the attribute of each matched text line according to the attribute of the key block.
  • the image to be recognized may exhibit rotation (including small-angle rotation; for example, due to the shooting angle there may be a small angle between the vertical axis of the document and that of the image), affine transformation (for example, a rectangular document photographed as an oblique parallelogram), blur, and the like. Directly performing line segmentation detection on each key block in the sub-image therefore yields unsatisfactory recognition results.
  • this application therefore proposes a text line segmentation method: text line detection is performed on the sub-image to obtain multiple text lines; each text line is then matched with the key blocks, and the attributes of a matched text line can be determined according to the attributes of the key block.
  • the matching implementation methods include but are not limited to using IoU (Intersection over Union).
  • IoU is a common concept in target detection. It usually refers to the overlap rate between a generated candidate bound and the ground-truth bound, that is, the ratio of their intersection to their union, also known as the intersection-over-union ratio.
  • if the IoU of a detected text line and a key block is greater than a preset threshold, they can be considered a match.
  • Fig. 5 shows the text line detection results for the left half sub-image of Fig. 4 (shown as white boxes), in which the text line "xx cold food shop in Xiqing District, Tianjin" has the attribute of being the content item of "operator name".
  • the text line detection algorithm may include, but is not limited to, the CTPN algorithm, the seg-link algorithm, and so on.
  • the recognition unit 640 is configured to perform text content recognition on the detected text line. This application does not limit the implementation of text content recognition.
  • the recognized text content can be applied to the corresponding scenario.
  • the user only needs to provide an image of the business license and does not have to manually fill in the legal representative and other information; the text content recognition result can be used directly.
  • after acquiring the image to be recognized, a matching key block detection model is first selected according to the recognition category; the detected key blocks are then clustered, and a number of sub-images are segmented from the image to be recognized according to the clustering result so that each sub-image contains a number of key blocks, thereby completing the intelligent segmentation of the image; finally, text recognition is performed on each sub-image separately.
  • the key block detection model, the clustering algorithm, and the text recognition can all be implemented on the basis of existing techniques.
  • the key point is the use of key block detection and intelligent image segmentation based on the detected key blocks.
  • these two strongly correlated steps solve the problem of producing formatted output for licenses and document images with relatively fixed layouts, and greatly reduce development manpower and time costs.
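The flow summarized above can be sketched end to end. In this minimal Python sketch, every stage (detector, clustering, cropping, OCR, attribute matching) is an injected placeholder rather than the application's actual implementation, since the application does not fix any particular detector, clustering algorithm, or OCR engine:

```python
def recognize(image, detect_key_blocks, cluster_blocks, crop, read_lines, match_attr):
    """Pipeline sketch: key block detection -> clustering ->
    sub-image segmentation -> per-sub-image text line reading.
    All stage names and signatures here are illustrative assumptions."""
    key_blocks = detect_key_blocks(image)
    if not key_blocks:
        # no key block found: the image category does not match the recognition category
        raise ValueError("image category does not match the recognition category")
    results = {}
    for cluster in cluster_blocks(key_blocks):
        sub_image = crop(image, cluster)
        for line_box, text in read_lines(sub_image):
            # assign the matched key block's attribute to the recognized text
            results[match_attr(line_box, cluster)] = text
    return results
```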
  • modules or units or components in the embodiments may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
  • the various component embodiments of the present application may be implemented by hardware, or by software modules running on one or more processors, or by a combination of them.
  • a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in the image recognition device according to the embodiments of the present application.
  • This application can also be implemented as a device or device program (for example, a computer program and a computer program product) for executing part or all of the methods described herein.
  • Such a program for implementing the present application may be stored on a computer-readable medium, or may have the form of one or more signals.
  • Such a signal can be downloaded from an Internet website, or provided on a carrier signal, or provided in any other form.
  • FIG. 7 shows a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • the electronic device 700 includes a processor 710 and a memory 720 arranged to store computer-executable instructions (computer-readable program code).
  • the memory 720 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read Only Memory), EPROM, hard disk, or ROM.
  • the memory 720 has a storage space 730 for storing computer-readable program codes 731 for executing any method steps in the above-mentioned methods.
  • the storage space 730 for storing computer-readable program codes may include various computer-readable program codes 731 respectively used to implement various steps in the above method.
  • the computer-readable program code 731 may be read from or written into one or more computer program products. These computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards, or floppy disks. Such a computer program product is usually a computer-readable storage medium as described in FIG. 8.
  • Fig. 8 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present application.
  • the computer-readable storage medium 800 stores computer-readable program code 731 for executing the steps of the method according to the present application, which can be read by the processor 710 of the electronic device 700; when the computer-readable program code 731 is run by the electronic device 700, the electronic device 700 is caused to execute each step of the method described above.
  • the computer readable program code 731 stored in the computer readable storage medium can execute the method shown in any of the above embodiments.
  • the computer readable program code 731 may be compressed in an appropriate form.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Character Input (AREA)

Abstract

An image recognition method and apparatus, an electronic device, and a storage medium. The method comprises: acquiring an image to be recognized (S110); selecting a key block detection model matching a recognition category, and performing key block detection on the image to be recognized according to the selected key block detection model (S120); if multiple key blocks are detected, clustering the detected multiple key blocks, and segmenting a number of sub-images from the image to be recognized according to the clustering result, so that each sub-image contains a number of the key blocks (S130); and performing text recognition on each sub-image (S140).

Description

Image recognition method and apparatus, electronic device, and storage medium — Technical Field
This application relates to the field of image recognition, and in particular to an image recognition method and apparatus, an electronic device, and a storage medium.
Background
Image recognition is widely used in fields such as identity verification and word processing. One important application scenario is the recognition of licenses and certificates such as business licenses and ID cards in order to verify identity or qualifications.
Summary of the Invention
According to one aspect of this application, an image recognition method is provided, including: acquiring an image to be recognized; selecting a key block detection model matching a recognition category, and performing key block detection on the image to be recognized according to the selected key block detection model; if multiple key blocks are detected, clustering the detected multiple key blocks, and segmenting a number of sub-images from the image to be recognized according to the clustering result, so that each sub-image contains a number of the key blocks; and performing text recognition on each sub-image.
Optionally, the method further includes: if no key block can be detected, determining that the category of the image to be recognized does not match the recognition category.
Optionally, the key block detection model is trained as follows: sample images of a specified category are acquired as training data, the sample images being annotated with multiple key blocks; iterative training is performed using the training data to obtain a key block detection model matching the specified category; the key block detection model is implemented on the basis of an object detection algorithm.
Optionally, clustering the detected multiple key blocks includes: clustering based on the respective vector representations of the multiple key blocks, the clustering result satisfying the following conditions: the ratio of the area of each sub-image to the area of the image to be recognized is not greater than a first threshold, and the ratio of the sum of the areas of the key blocks in each sub-image to the area of that sub-image is not less than a second threshold.
Optionally, the vector representation of a key block includes: the center point coordinates of the key block, the width of the key block, and the height of the key block.
Optionally, performing text recognition on a sub-image includes: performing text line detection on the sub-image to obtain detected text lines; matching the detected text lines with the key blocks in the sub-image, and determining the attributes of the matched text lines according to the attributes of the key blocks in the sub-image.
Optionally, performing text recognition on the sub-image further includes: performing text content recognition on the detected text lines.
According to another aspect of this application, an image recognition apparatus is provided, including: an image acquisition unit, configured to acquire an image to be recognized; a key block detection unit, configured to select a key block detection model matching a recognition category and perform key block detection on the image to be recognized according to the selected key block detection model; a clustering unit, configured to, if multiple key blocks are detected, cluster the detected multiple key blocks and segment a number of sub-images from the image to be recognized according to the clustering result, so that each sub-image contains a number of the key blocks; and a recognition unit, configured to perform text recognition on each sub-image.
Optionally, the recognition unit is further configured to determine, if no key block can be detected, that the category of the image to be recognized does not match the recognition category.
Optionally, the key block detection model is trained as follows: sample images of a specified category are acquired as training data, the sample images being annotated with multiple key blocks; iterative training is performed using the training data to obtain a key block detection model matching the specified category; the key block detection model is implemented on the basis of an object detection algorithm.
Optionally, the clustering unit is configured to perform clustering based on the respective vector representations of the multiple key blocks, the clustering result satisfying the following conditions: the ratio of the area of each sub-image to the area of the image to be recognized is not greater than a first threshold, and the ratio of the sum of the areas of the key blocks in each sub-image to the area of that sub-image is not less than a second threshold.
Optionally, the vector representation of a key block includes: the center point coordinates of the key block, the width of the key block, and the height of the key block.
Optionally, the recognition unit is configured to perform text line detection on the sub-image to obtain detected text lines, match the detected text lines with the key blocks in the sub-image, and determine the attributes of the matched text lines according to the attributes of the key blocks in the sub-image.
Optionally, the recognition unit is configured to perform text content recognition on the detected text lines.
According to yet another aspect of this application, an electronic device is provided, including: a processor; and a memory storing computer-executable instructions which, when executed, cause the processor to perform any of the methods described above.
According to still another aspect of this application, a computer-readable storage medium is provided, the computer-readable storage medium storing one or more programs which, when executed by a processor, implement any of the methods described above.
As can be seen from the above, in the embodiments of this application, after the image to be recognized is acquired, a matching key block detection model is first selected according to the recognition category; the detected key blocks are then clustered, and a number of sub-images are segmented from the image to be recognized according to the clustering result, so that each sub-image contains a number of key blocks, thereby completing the intelligent segmentation of the image; finally, text recognition is performed on each sub-image. By using key block detection and intelligent image segmentation based on the detected key blocks, the problem of producing formatted output for licenses and document images with relatively fixed layouts is solved, greatly reducing development manpower and time costs.
The above description is only an overview of the embodiments of this application. In order to understand the technical means of this application more clearly and implement them according to the contents of the specification, and to make the above and other objectives, features, and advantages of this application more obvious and understandable, specific embodiments of this application are set forth below.
Brief Description of the Drawings
By reading the following detailed description of the preferred embodiments, various other advantages and benefits will become clear to those of ordinary skill in the art. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be considered a limitation of this application. Throughout the drawings, the same reference symbols denote the same components. In the drawings:
Fig. 1 shows a schematic flowchart of an image recognition method according to an embodiment of this application;
Fig. 2 shows multiple key blocks detected in an image of a food business license;
Fig. 3 shows an invoice image containing relatively small text;
Fig. 4 shows a schematic diagram of the sub-image segmentation of Fig. 2;
Fig. 5 shows the text line detection results for the left-half sub-image of Fig. 4;
Fig. 6 shows a schematic structural diagram of an image recognition apparatus according to an embodiment of this application;
Fig. 7 shows a schematic structural diagram of an electronic device according to an embodiment of this application;
Fig. 8 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of this application.
Detailed Description
Exemplary embodiments of this application will be described in more detail below with reference to the drawings. Although the drawings show exemplary embodiments of this application, it should be understood that this application can be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided for a more thorough understanding of this application and to fully convey the scope of this application to those skilled in the art.
Apart from manual approaches, current image recognition methods include the following. One method is to design a dedicated solution for a specific license. Because many steps such as prior-information summarization and service development are required, this is very labor- and time-intensive, usually requiring at least two person-months. Another method exploits the relatively fixed layout of licenses: the image to be recognized is first matched against a sample image of the corresponding layout and then recognized. However, this approach only works in ideal conditions where the image to be recognized is clear and undistorted; once the image exhibits text line drift, deformation, affine transformation, and the like, the recognition results are very unsatisfactory.
According to the embodiments of this application, the relatively fixed layout of licenses is exploited: a key block detection model is selected according to the recognition category (e.g., ID card recognition, business license recognition, and so on) to determine the key blocks in the image to be recognized; the key blocks are then clustered, and the image to be recognized is segmented according to the clustering result; the segmented images can be appropriately enlarged so that text recognition on them is more accurate.
The embodiments of this application use two strongly correlated steps, key block detection and intelligent image segmentation based on the detected key blocks, significantly improving recognition precision and recall. Moreover, recognition development for a new layout can be accomplished in only about three person-days, greatly reducing resource costs.
The embodiments of this application can be applied to the recognition of images of licenses with relatively fixed layouts, including but not limited to identity verification, qualification verification, and the like, and can be applied to business fields such as food delivery and financial services. A detailed description is given below with reference to the embodiments.
Fig. 1 shows a schematic flowchart of an image recognition method according to an embodiment of this application. As shown in Fig. 1, the method includes steps S110 to S140.
Step S110: acquire an image to be recognized. The image to be recognized may be an image uploaded by a user, understood in a broad sense: photos, screenshots, and video frames extracted from videos all fall within the scope of images.
As for the content and upload scenario of the image to be recognized: for example, before purchasing a financial product, a user may be required to upload an ID card photo to verify their identity; when a food delivery merchant registers, the merchant may be required to provide a photo of their business license; and so on.
The uploaded image to be recognized may be associated with category information indicating the image category that needs to be recognized in the upload scenario (referred to as the "recognition category"), and this category information may not match the actual category of the image to be recognized. For example, in a scenario requiring the upload of a driver's license photo, the image category to be recognized is a driver's license photo, while the image uploaded by the user may be a vehicle registration certificate photo; the recognition category is then a driver's license photo, and the category of the image to be recognized is a vehicle registration certificate photo, which obviously does not match the recognition category.
Step S120: select a key block detection model matching the recognition category, and perform key block detection on the image to be recognized according to the selected key block detection model.
For example, for an ID card recognition scenario, a key block detection model matching ID cards is selected; for an invoice recognition scenario, a key block detection model matching invoices is selected. The key block detection model here may be obtained through deep learning training. Preferably, before the image to be recognized is fed into the key block detection model, preprocessing may be performed: for example, the image may be cropped to remove parts unrelated to the license or document; the image may be enhanced and corrected to make the text clearer and the shape of the license closer to ideal; the orientation of the image may be adjusted first to improve recognition accuracy, e.g., if the driver's license photo taken by the user is upside down, it can be rotated 180 degrees before detection; and so on.
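The orientation adjustment mentioned here (rotating an upside-down photo by 180 degrees before detection) can be illustrated on raw pixels; a minimal pure-Python sketch on a row-major pixel grid (a real system would use an image library):

```python
def rotate_180(pixels):
    """Rotate a row-major 2D pixel grid by 180 degrees:
    reverse the row order, then reverse each row."""
    return [list(reversed(row)) for row in reversed(pixels)]
```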
Key block detection herein can determine the position and attribute of each key block. In the embodiments of this application, key blocks may be determined by detection (marking the detected key blocks with bounding boxes) or by segmentation (marking the detected key blocks with masks).
Step S130: if multiple key blocks are detected, cluster the detected multiple key blocks and segment a number of sub-images from the image to be recognized according to the clustering result, so that each sub-image contains a number of key blocks.
A detected key block may carry an attribute. Fig. 2 shows multiple key blocks detected in an image of a food business license. As can be seen from Fig. 2, the detected key blocks correspond respectively to the operator name (keyword key_0), the operator name (content key_0_content), the social credit code (keyword key_1), the social credit code (content key_1_content), and so on; during storage or computation they can be stored as corresponding key and content entries, e.g., key_2, key_2_content, key_3, key_3_content, etc. A key block may thus actually correspond to a text block, such as an information region that a fixed-layout image necessarily contains.
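The key/content pairing above (key_0 with key_0_content, and so on) lends itself to a simple structured-output step; a minimal sketch, in which the input format (a dict mapping attribute names to recognized text) is an assumption for illustration:

```python
def to_structured(block_texts):
    """Pair keyword blocks with their content blocks into one record.
    `block_texts` is assumed to map attribute names such as 'key_0' /
    'key_0_content' (as in the example above) to recognized text."""
    record = {}
    for attr, text in block_texts.items():
        if attr.endswith("_content"):
            continue  # content entries are attached via their keyword entry
        record[text] = block_texts.get(attr + "_content", "")
    return record
```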
For scenarios like Fig. 3, where the text regions are small relative to the overall image, text line detection algorithms based on neural networks such as convolutional neural networks cannot attend to the features of the smaller text (during convolution the small text is over-compressed and no effective features can be extracted), so missed detections may occur. This application therefore proposes key block detection and intelligent image segmentation based on the detected key blocks: segmented sub-images are obtained first, and text line recognition is then performed on the sub-images, which greatly improves recognition precision and recall; moreover, the sub-images can be appropriately enlarged to make the text features more pronounced.
Step S140: perform text recognition on each sub-image. This application does not limit the implementation of text recognition.
It can be seen that, with the method shown in Fig. 1, after the image to be recognized is acquired, a matching key block detection model is first selected according to the recognition category; the detected key blocks are then clustered, and a number of sub-images are segmented from the image to be recognized according to the clustering result; finally, text recognition is performed on each sub-image. By using key block detection and intelligent image segmentation based on the detected key blocks, the problem of producing formatted output for licenses and document images with relatively fixed layouts is solved, greatly reducing development manpower and time costs.
In an embodiment of this application, the above method further includes: if no key block can be detected, determining that the category of the image to be recognized does not match the recognition category.
For example, in a driver's license recognition scenario, if the user mistakenly uploads a vehicle registration certificate, the key block detection model for driver's licenses will have difficulty detecting any key block; it can then be concluded that the image is wrong, specifically that the category of the image to be recognized does not match the recognition category. This embodiment has good practicality in business scenarios; for example, a prompt such as "The image is incorrect, please upload it again" can be shown on this basis.
Because key block detection operates at a coarser granularity than text line recognition, it is more efficient. Moreover, if it is determined that the image to be recognized contains no key block corresponding to the recognition category, it is very likely that the image contains no valid information of that category, so a mismatch between the category of the image and the recognition category can be discovered at lower cost and more quickly.
In an embodiment of this application, in the above method, the key block detection model may be trained as follows: sample images of a specified category are acquired as training data, the sample images being annotated with multiple key blocks; iterative training is performed using the training data to obtain a key block detection model matching the specified category. The key block detection model is implemented on the basis of an object detection algorithm.
The network architecture of the key block detection model here can directly use an existing object detection framework, trained on the annotated training data to obtain a key block detection model matching the category. Alternatively, a general-purpose base object detection framework can be built and trained differently on different training data to obtain different key block detection models.
Because key blocks are relatively coarse-grained, the performance of an object detection framework is sufficient; locating text lines directly with an object detection framework, by contrast, would be difficult. In the embodiments of this application, the object detection framework is used to detect key blocks; a detected key block may contain multiple text lines, which do not need to be separated; only the position and attribute of the key block need to be determined.
In an embodiment of this application, in the above method, clustering the detected multiple key blocks may include: clustering based on the vector representations of the key blocks, the clustering result satisfying the following conditions: the ratio of the area of each sub-image to the area of the image to be recognized is not greater than a first threshold, and the ratio of the sum of the areas of the key blocks in each sub-image to the area of that sub-image is not less than a second threshold.
Specifically, in an embodiment of this application, in the above method, the vector representation of a key block includes: the center point coordinates, the width, and the height of the key block. For example, a key block is represented by the vector (x, y, w, h), where x and y are the horizontal and vertical coordinates of its center point, w is its width, and h is its height.
To avoid over-segmentation of sub-images or an overly coarse segmentation granularity, the embodiments of this application use an evaluation function to control the segmentation result. A specific evaluation function may be the following:
S_i / S ≤ threshold_1  and  S_boxi / S_i ≥ threshold_2
where the area of each sub-image is denoted S_i (0 ≤ i < k), the accumulated area of the key blocks contained in each sub-image is denoted S_boxi (0 ≤ i < k), the area of the image to be recognized is S, and threshold_1 and threshold_2 are the first and second thresholds respectively; the two thresholds may be equal.
Specifically, clustering and sub-image segmentation may be performed dynamically: for example, k is first initialized to 1, and the above evaluation function is used to judge whether the resulting sub-images satisfy it; if not, k is incremented by 1 and clustering and sub-image segmentation are performed again, until the evaluation function is satisfied.
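The dynamic procedure above, increasing k until the evaluation function is satisfied, can be sketched with a minimal k-means over the (x, y, w, h) vectors; the k-means itself, the seed, and the default thresholds here are illustrative choices, not the application's fixed implementation:

```python
import random

def kmeans(vectors, k, iters=20, seed=0):
    """Minimal k-means over (x, y, w, h) key-block vectors; returns non-empty clusters."""
    rng = random.Random(seed)
    centers = rng.sample(vectors, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            i = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centers[c])))
            clusters[i].append(v)
        centers = [tuple(sum(d) / len(cl) for d in zip(*cl)) if cl else centers[i]
                   for i, cl in enumerate(clusters)]
    return [cl for cl in clusters if cl]

def sub_image_box(cluster):
    """Bounding box (x1, y1, x2, y2) covering all key blocks in a cluster."""
    return (min(x - w / 2 for x, y, w, h in cluster),
            min(y - h / 2 for x, y, w, h in cluster),
            max(x + w / 2 for x, y, w, h in cluster),
            max(y + h / 2 for x, y, w, h in cluster))

def acceptable(clusters, image_area, t1=0.5, t2=0.5):
    """Evaluation function: S_i/S <= threshold_1 and S_boxi/S_i >= threshold_2 per sub-image."""
    for cluster in clusters:
        x1, y1, x2, y2 = sub_image_box(cluster)
        s_i = (x2 - x1) * (y2 - y1)
        s_box = sum(w * h for x, y, w, h in cluster)
        if s_i / image_area > t1 or s_box / s_i < t2:
            return False
    return True

def segment(vectors, image_area):
    """Increase k from 1 until the clustering satisfies the evaluation function."""
    for k in range(1, len(vectors) + 1):
        clusters = kmeans(vectors, k)
        if acceptable(clusters, image_area):
            return clusters
    return [[v] for v in vectors]  # fall back to one key block per sub-image
```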
The result of segmenting Fig. 2 is shown in Fig. 4, in which the two sub-images obtained by the segmentation are outlined with dashed boxes.
In an embodiment of this application, in the above method, performing text recognition on each sub-image includes: performing text line detection on each sub-image to obtain detected text lines; matching the detected text lines with the key blocks; and determining the attributes of the matched text lines according to the attributes of the key blocks.
Because the image to be recognized may exhibit rotation (including small-angle rotation: for example, due to the shooting angle there may be a small angle between the vertical axis of the document in the image and the vertical axis of the image), affine distortion (for example, a rectangular document photographed as an oblique parallelogram), blur, and so on, directly performing line segmentation detection on each key block in a sub-image and then recognizing the result is not ideal. This application therefore proposes a text line segmentation approach: text line detection is first performed on the sub-image to obtain multiple text lines; each text line is then matched with a key block, so that the attribute of the matched text line can be determined according to the attribute of the key block.
The matching may be implemented by, but is not limited to, IoU (Intersection over Union). IoU is a common concept in object detection, usually referring to the overlap rate between a generated candidate bounding box and the ground-truth bounding box, i.e., the ratio of their intersection to their union, also known as the intersection-over-union ratio. In the embodiments of this application, if the IoU between a detected text line and a key block is greater than a preset threshold, the two can be considered to match.
For example, Fig. 5 shows the text line detection results for the left-half sub-image of Fig. 4 (shown as white boxes), in which the text line "xx cold food shop in Xiqing District, Tianjin" has the attribute of the content item of "operator name".
The text line detection algorithm may include, but is not limited to, the CTPN algorithm, the seg-link algorithm, and so on.
In an embodiment of this application, in the above method, performing text recognition on each sub-image further includes: performing text content recognition on the detected text lines. This application does not limit the implementation of text content recognition.
The recognized text content can be applied to the corresponding scenario; for example, the user only needs to provide an image of the business license and does not have to manually fill in the legal representative and other information, as the text content recognition result can be used directly.
Fig. 6 shows a schematic structural diagram of an image recognition apparatus according to an embodiment of this application. As shown in Fig. 6, the image recognition apparatus 600 includes:
an image acquisition unit 610, configured to acquire an image to be recognized. The image to be recognized may be an image uploaded by a user, understood in a broad sense: photos, screenshots, and video frames extracted from videos all fall within the scope of images.
As for the content and upload scenario of the image to be recognized: for example, before purchasing a financial product, a user may be required to upload an ID card photo to verify their identity; when a food delivery merchant registers, the merchant may be required to provide a photo of their business license; and so on.
The uploaded image to be recognized may be associated with category information indicating the image category that needs to be recognized in the upload scenario (referred to as the "recognition category"), and this category information may not match the actual category of the image to be recognized. For example, in a scenario requiring the upload of a driver's license photo, the image category to be recognized is a driver's license photo, while the image uploaded by the user may be a vehicle registration certificate photo; the recognition category is then a driver's license photo, and the category of the image to be recognized is a vehicle registration certificate photo, which obviously does not match the recognition category.
a key block detection unit 620, configured to select a key block detection model matching the recognition category and perform key block detection on the image to be recognized according to the selected key block detection model.
For example, for an ID card recognition scenario, a key block detection model matching ID cards is selected; for an invoice recognition scenario, a key block detection model matching invoices is selected. The key block detection model here may be obtained through deep learning training. Preferably, before the image to be recognized is fed into the key block detection model, preprocessing may be performed: for example, the image may be cropped to remove parts unrelated to the license or document; the image may be enhanced and corrected to make the text clearer and the shape of the license closer to ideal; the orientation of the image may be adjusted first to improve recognition accuracy, e.g., if the driver's license photo taken by the user is upside down, it can be rotated 180 degrees before detection; and so on.
Key block detection herein can determine the position and attribute of each key block. In the embodiments of this application, key blocks may be determined by detection (marking the detected key blocks with bounding boxes) or by segmentation (marking the detected key blocks with masks).
a clustering unit 630, configured to, if multiple key blocks are detected, cluster the detected multiple key blocks and segment a number of sub-images from the image to be recognized according to the clustering result, so that each sub-image contains a number of key blocks.
A detected key block may carry an attribute. Fig. 2 shows multiple key blocks detected in an image of a food business license. As can be seen from Fig. 2, the detected key blocks correspond respectively to the operator name (keyword key_0), the operator name (content key_0_content), the social credit code (keyword key_1), the social credit code (content key_1_content), and so on; during storage or computation they can be stored as corresponding key and content entries, e.g., key_2, key_2_content, key_3, key_3_content, etc. A key block may thus actually correspond to a text block, such as an information region that a fixed-layout image necessarily contains.
For scenarios like Fig. 3, where the text regions are small relative to the overall image, text line detection algorithms based on neural networks such as convolutional neural networks cannot attend to the features of the smaller text (during convolution the small text is over-compressed and no effective features can be extracted), so missed detections may occur. This application therefore proposes key block detection and intelligent image segmentation based on the detected key blocks: segmented sub-images are obtained first, and text line recognition is then performed on the sub-images, which greatly improves recognition precision and recall; moreover, the sub-images can be appropriately enlarged to make the text features more pronounced.
a recognition unit 640, configured to perform text recognition on each sub-image. This application does not limit the implementation of text recognition.
It can be seen that, with the apparatus shown in Fig. 6, after the image to be recognized is acquired, a matching key block detection model is first selected according to the recognition category; the detected key blocks are then clustered, and a number of sub-images are segmented from the image to be recognized according to the clustering result; finally, text recognition is performed on each sub-image. By using key block detection and intelligent image segmentation based on the detected key blocks, the problem of producing formatted output for licenses and document images with relatively fixed layouts is solved, greatly reducing development manpower and time costs.
In an embodiment of this application, in the above image recognition apparatus 600, the recognition unit 640 is further configured to determine, if no key block can be detected, that the category of the image to be recognized does not match the recognition category.
For example, in a driver's license recognition scenario, if the user mistakenly uploads a vehicle registration certificate, the key block detection model for driver's licenses will have difficulty detecting any key block; it can then be concluded that the image is wrong, specifically that the category of the image to be recognized does not match the recognition category. This embodiment has good practicality in business scenarios; for example, a prompt such as "The image is incorrect, please upload it again" can be shown on this basis.
Because key block detection operates at a coarser granularity than text line recognition, it is more efficient. Moreover, if it is determined that the image to be recognized contains no key block corresponding to the recognition category, it is very likely that the image contains no valid information of that category, so a mismatch between the category of the image and the recognition category can be discovered at lower cost and more quickly.
In an embodiment of this application, in the above image recognition apparatus 600, the key block detection model may be trained as follows: sample images of a specified category are acquired as training data, the sample images being annotated with multiple key blocks; iterative training is performed using the training data to obtain a key block detection model matching the specified category. The key block detection model is implemented on the basis of an object detection algorithm.
The network architecture of the key block detection model here can directly use an existing object detection framework, trained on the annotated training data to obtain a key block detection model matching the category. Alternatively, a general-purpose base object detection framework can be built and trained differently on different training data to obtain different key block detection models.
Because key blocks are relatively coarse-grained, the performance of an object detection framework is sufficient; locating text lines directly with an object detection framework, by contrast, would be difficult. In the embodiments of this application, the object detection framework is used to detect key blocks; a detected key block may contain multiple text lines, which do not need to be separated; only the position and attribute of the key block need to be determined.
In an embodiment of this application, in the above image recognition apparatus 600, the clustering unit 630 is configured to perform clustering based on the vector representations of the key blocks, the clustering result satisfying the following conditions: the ratio of the area of each sub-image to the area of the image to be recognized is not greater than a first threshold, and the ratio of the sum of the areas of the key blocks in each sub-image to the area of that sub-image is not less than a second threshold.
Specifically, in an embodiment of this application, in the above image recognition apparatus 600, the vector representation of a key block includes: the center point coordinates of the key block, the width of the key block, and the height of the key block.
For example, a key block is represented by the vector (x, y, w, h), where x and y are the horizontal and vertical coordinates of its center point, w is its width, and h is its height.
To avoid over-segmentation of sub-images or an overly coarse segmentation granularity, the embodiments of this application use an evaluation function to control the segmentation result. A specific evaluation function may be the following:
S_i / S ≤ threshold_1  and  S_boxi / S_i ≥ threshold_2
where the area of each sub-image is denoted S_i (0 ≤ i < k), the accumulated area of the key blocks contained in each sub-image is denoted S_boxi (0 ≤ i < k), the area of the image to be recognized is S, and threshold_1 and threshold_2 are the first and second thresholds respectively; the two thresholds may be equal.
Specifically, clustering and sub-image segmentation may be performed dynamically: for example, k is first initialized to 1, and the above evaluation function is used to judge whether the resulting sub-images satisfy it; if not, k is incremented by 1 and clustering and sub-image segmentation are performed again, until the evaluation function is satisfied.
The result of segmenting Fig. 2 is shown in Fig. 4, in which the two sub-images obtained by the segmentation are outlined with dashed boxes.
In an embodiment of this application, in the above image recognition apparatus 600, the recognition unit 640 is configured to perform text line detection on each sub-image to obtain detected text lines, match the detected text lines with the key blocks, and determine the attributes of the matched text lines according to the attributes of the key blocks.
Because the image to be recognized may exhibit rotation (including small-angle rotation: for example, due to the shooting angle there may be a small angle between the vertical axis of the document in the image and the vertical axis of the image), affine distortion (for example, a rectangular document photographed as an oblique parallelogram), blur, and so on, directly performing line segmentation detection on each key block in a sub-image and then recognizing the result is not ideal. This application therefore proposes a text line segmentation approach: text line detection is first performed on the sub-image to obtain multiple text lines; each text line is then matched with a key block, so that the attribute of the matched text line can be determined according to the attribute of the key block.
The matching may be implemented by, but is not limited to, IoU (Intersection over Union). IoU is a common concept in object detection, usually referring to the overlap rate between a generated candidate bounding box and the ground-truth bounding box, i.e., the ratio of their intersection to their union, also known as the intersection-over-union ratio. In the embodiments of this application, if the IoU between a detected text line and a key block is greater than a preset threshold, the two can be considered to match.
For example, Fig. 5 shows the text line detection results for the left-half sub-image of Fig. 4 (shown as white boxes), in which the text line "xx cold food shop in Xiqing District, Tianjin" has the attribute of the content item of "operator name".
The text line detection algorithm may include, but is not limited to, the CTPN algorithm, the seg-link algorithm, and so on.
In an embodiment of this application, in the above image recognition apparatus 600, the recognition unit 640 is configured to perform text content recognition on the detected text lines. This application does not limit the implementation of text content recognition.
The recognized text content can be applied to the corresponding scenario; for example, the user only needs to provide an image of the business license and does not have to manually fill in the legal representative and other information, as the text content recognition result can be used directly.
In summary, in the embodiments of this application, after the image to be recognized is acquired, a matching key block detection model is first selected according to its category; the detected key blocks are then clustered, and a number of sub-images are segmented from the image to be recognized according to the clustering result so that each sub-image contains a number of key blocks, thereby completing the intelligent segmentation of the image; finally, text recognition is performed on each sub-image. The key block detection model, the clustering algorithm, and the text recognition can all be implemented on the basis of existing techniques; the key point is the use of two strongly correlated steps, key block detection and intelligent image segmentation based on the detected key blocks, which solve the problem of producing formatted output for licenses and document images with relatively fixed layouts and greatly reduce development manpower and time costs.
It should be noted that the algorithms and displays provided herein are not inherently related to any particular computer, virtual device, or other equipment. Various general-purpose devices can also be used with the teachings herein. According to the above description, the structure required to construct such devices is obvious. Furthermore, this application is not directed to any particular programming language. It should be understood that the content of this application described herein can be implemented using various programming languages, and the above description of a specific language is intended to disclose the best mode of this application.
In the specification provided here, numerous specific details are described. However, it can be understood that the embodiments of this application can be practiced without these specific details. In some instances, well-known methods, structures, and techniques are not shown in detail so as not to obscure the understanding of this specification.
Similarly, it should be understood that in the above description of exemplary embodiments of this application, in order to streamline the application and aid the understanding of one or more of the various embodiments, various features of this application are sometimes grouped together in a single embodiment, figure, or description thereof. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed application requires more features than are expressly recited in each claim. Rather, as the following claims reflect, this application lies in less than all features of a single previously disclosed embodiment. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of this application.
Those skilled in the art can understand that the modules in the devices of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. Modules or units or components in the embodiments may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.
Furthermore, those skilled in the art can understand that although some embodiments described herein include certain features included in other embodiments but not others, combinations of features of different embodiments are meant to be within the scope of this application and form different embodiments. For example, in the following claims, any one of the claimed embodiments can be used in any combination.
The various component embodiments of this application may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the image recognition apparatus according to the embodiments of this application. This application can also be implemented as a device or apparatus program (for example, a computer program and a computer program product) for executing part or all of the methods described herein. Such a program implementing this application may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
For example, Fig. 7 shows a schematic structural diagram of an electronic device according to an embodiment of this application. The electronic device 700 includes a processor 710 and a memory 720 arranged to store computer-executable instructions (computer-readable program code). The memory 720 may be an electronic memory such as flash memory, EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM, a hard disk, or ROM. The memory 720 has a storage space 730 storing computer-readable program code 731 for executing any of the method steps in the above methods. For example, the storage space 730 for storing computer-readable program code may include individual pieces of computer-readable program code 731 for implementing the various steps of the above methods respectively. The computer-readable program code 731 may be read from or written into one or more computer program products; these computer program products include program code carriers such as hard disks, compact disks (CDs), memory cards, or floppy disks. Such a computer program product is typically a computer-readable storage medium such as that described with reference to Fig. 8. Fig. 8 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of this application. The computer-readable storage medium 800 stores computer-readable program code 731 for executing the method steps according to this application, which can be read by the processor 710 of the electronic device 700; when the computer-readable program code 731 is run by the electronic device 700, the electronic device 700 is caused to execute each step of the method described above. Specifically, the computer-readable program code 731 stored by the computer-readable storage medium can execute the method shown in any of the above embodiments. The computer-readable program code 731 may be compressed in an appropriate form.
It should be noted that the above embodiments illustrate rather than limit this application, and those skilled in the art can design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of multiple such elements. This application can be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any order; these words may be interpreted as names.

Claims (10)

  1. An image recognition method, comprising:
    acquiring an image to be recognized;
    selecting a key block detection model matching a recognition category,
    performing key block detection on the image to be recognized according to the selected key block detection model;
    if multiple key blocks are detected,
    clustering the detected multiple key blocks,
    segmenting a number of sub-images from the image to be recognized according to a clustering result, so that each of the sub-images contains a number of the key blocks;
    performing text recognition on each of the sub-images.
  2. The method according to claim 1, further comprising:
    if no key block can be detected, determining that a category of the image to be recognized does not match the recognition category.
  3. The method according to claim 1, wherein the key block detection model is trained as follows:
    acquiring sample images of a specified category as training data, the sample images being annotated with multiple key blocks;
    performing iterative training using the training data to obtain a key block detection model matching the specified category;
    wherein the key block detection model is implemented on the basis of an object detection algorithm.
  4. The method according to claim 1, wherein clustering the detected multiple key blocks comprises:
    clustering based on respective vector representations of the multiple key blocks, the clustering result satisfying the following conditions:
    a ratio of an area of each of the sub-images to an area of the image to be recognized is not greater than a first threshold, and
    a ratio of a sum of areas of the key blocks in each of the sub-images to the area of that sub-image is not less than a second threshold.
  5. The method according to claim 4, wherein the vector representation of a key block comprises:
    center point coordinates of the key block,
    a width of the key block, and
    a height of the key block.
  6. The method according to claim 1, wherein performing text recognition on a sub-image comprises:
    performing text line detection on the sub-image to obtain detected text lines;
    matching the detected text lines with the key blocks in the sub-image,
    determining attributes of the matched text lines according to attributes of the key blocks in the sub-image.
  7. The method according to claim 6, wherein performing text recognition on the sub-image further comprises:
    performing text content recognition on the detected text lines.
  8. An image recognition apparatus, comprising:
    an image acquisition unit, configured to acquire an image to be recognized;
    a key block detection unit, configured to select a key block detection model matching a recognition category and perform key block detection on the image to be recognized according to the selected key block detection model;
    a clustering unit, configured to, if multiple key blocks are detected, cluster the detected multiple key blocks and segment a number of sub-images from the image to be recognized according to a clustering result, so that each of the sub-images contains a number of the key blocks;
    a recognition unit, configured to perform text recognition on each of the sub-images.
  9. An electronic device, comprising:
    a processor; and
    a memory storing computer-executable instructions which, when executed, cause the processor to perform the method according to any one of claims 1-7.
  10. A computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement the method according to any one of claims 1-7.
PCT/CN2020/134332 2019-12-05 2020-12-07 Image recognition method and apparatus, electronic device and storage medium WO2021110174A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911237612.1A CN111160395A (zh) 2019-12-05 2019-12-05 Image recognition method and apparatus, electronic device and storage medium
CN201911237612.1 2019-12-05

Publications (1)

Publication Number Publication Date
WO2021110174A1 true WO2021110174A1 (zh) 2021-06-10

Family

ID=70556519

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/134332 WO2021110174A1 (zh) 2019-12-05 2020-12-07 Image recognition method and apparatus, electronic device and storage medium

Country Status (2)

Country Link
CN (1) CN111160395A (zh)
WO (1) WO2021110174A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743361A (zh) * 2021-09-16 2021-12-03 上海深杳智能科技有限公司 Document segmentation method based on image object detection

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111160395A (zh) * 2019-12-05 2020-05-15 北京三快在线科技有限公司 图像识别方法、装置、电子设备和存储介质
CN112308046A (zh) * 2020-12-02 2021-02-02 龙马智芯(珠海横琴)科技有限公司 图像的文本区域定位方法、装置、服务器及可读存储介质
CN112597773B (zh) * 2020-12-08 2022-12-13 上海深杳智能科技有限公司 文档结构化方法、系统、终端及介质
CN112686237A (zh) * 2020-12-21 2021-04-20 福建新大陆软件工程有限公司 一种证照ocr识别方法

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107977665A (zh) * 2017-12-15 2018-05-01 北京科摩仕捷科技有限公司 Method for recognizing key information in an invoice, and computing device
CN108171239A (zh) * 2018-02-02 2018-06-15 杭州清本科技有限公司 Method, apparatus and system for extracting text from certificate images, and computer storage medium
CN109034159A (zh) * 2018-05-28 2018-12-18 北京捷通华声科技股份有限公司 Image information extraction method and apparatus
CN109492643A (zh) * 2018-10-11 2019-03-19 平安科技(深圳)有限公司 OCR-based certificate recognition method and apparatus, computer device and storage medium
US20190180154A1 (en) * 2017-12-13 2019-06-13 Abbyy Development Llc Text recognition using artificial intelligence
CN111160395A (zh) * 2019-12-05 2020-05-15 北京三快在线科技有限公司 Image recognition method and apparatus, electronic device and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160307057A1 (en) * 2015-04-20 2016-10-20 3M Innovative Properties Company Fully Automatic Tattoo Image Processing And Retrieval
CN109840520A (zh) * 2017-11-24 2019-06-04 中国移动通信集团广东有限公司 Invoice key information recognition method and system
CN108133212B (zh) * 2018-01-05 2021-06-29 东华大学 Deep learning-based fixed-amount invoice amount recognition system
CN108520254B (zh) * 2018-03-01 2022-05-10 腾讯科技(深圳)有限公司 Text detection method and apparatus based on formatted images, and related device
CN108776970B (zh) * 2018-06-12 2021-01-12 北京字节跳动网络技术有限公司 Image processing method and apparatus
CN110472554B (zh) * 2019-08-12 2022-08-30 南京邮电大学 Table tennis action recognition method and system based on pose segmentation and keypoint features

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190180154A1 (en) * 2017-12-13 2019-06-13 Abbyy Development Llc Text recognition using artificial intelligence
CN107977665A (zh) * 2017-12-15 2018-05-01 北京科摩仕捷科技有限公司 Method for recognizing key information in an invoice, and computing device
CN108171239A (zh) * 2018-02-02 2018-06-15 杭州清本科技有限公司 Method, apparatus and system for extracting text from certificate images, and computer storage medium
CN109034159A (zh) * 2018-05-28 2018-12-18 北京捷通华声科技股份有限公司 Image information extraction method and apparatus
CN109492643A (zh) * 2018-10-11 2019-03-19 平安科技(深圳)有限公司 OCR-based certificate recognition method and apparatus, computer device and storage medium
CN111160395A (zh) * 2019-12-05 2020-05-15 北京三快在线科技有限公司 Image recognition method and apparatus, electronic device and storage medium

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743361A (zh) * 2021-09-16 2021-12-03 上海深杳智能科技有限公司 Document segmentation method based on image object detection

Also Published As

Publication number Publication date
CN111160395A (zh) 2020-05-15

Similar Documents

Publication Publication Date Title
WO2021110174A1 (zh) Image recognition method and apparatus, electronic device and storage medium
US11594053B2 (en) Deep-learning-based identification card authenticity verification apparatus and method
WO2020098250A1 (zh) 字符识别方法、服务器及计算机可读存储介质
US9754164B2 (en) Systems and methods for classifying objects in digital images captured using mobile devices
CN111737522B (zh) 视频匹配方法、基于区块链的侵权存证方法和装置
US9576221B2 (en) Systems, methods, and devices for image matching and object recognition in images using template image classifiers
US8917935B2 (en) Detecting text using stroke width based text detection
CN110097068B (zh) 相似车辆的识别方法和装置
CN110853033B (zh) 基于帧间相似度的视频检测方法和装置
US20210295114A1 (en) Method and apparatus for extracting structured data from image, and device
US20150379371A1 (en) Object Detection Utilizing Geometric Information Fused With Image Data
CN108810619B (zh) 识别视频中水印的方法、装置和电子设备
JP2008217347A (ja) ナンバープレート認識装置、その制御方法、コンピュータプログラム
CN110688524B (zh) 视频检索方法、装置、电子设备及存储介质
CN114663871A (zh) 图像识别方法、训练方法、装置、系统及存储介质
WO2018121414A1 (zh) 电子设备、目标图像识别方法及装置
US20160027050A1 (en) Method of providing advertisement service using cloud album
CN113743378B (zh) 一种基于视频的火情监测方法和装置
WO2019239743A1 (ja) 物体検出装置、方法、及びプログラム
CN112287905A (zh) 车辆损伤识别方法、装置、设备及存储介质
WO2014061222A1 (ja) 情報処理装置、情報処理方法および情報処理用プログラム
WO2019163699A1 (ja) 特徴抽出装置、特徴抽出方法、照合システム、および記憶媒体
CN114821062A (zh) 基于图像分割的商品识别方法及装置
CN114092684A (zh) 一种文本校准方法、装置、设备及存储介质
Zhou et al. Maximum Entropy Threshold Segmentation for Target Matching Using Speeded‐Up Robust Features

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20896376

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20896376

Country of ref document: EP

Kind code of ref document: A1