WO2023284502A1

WO2023284502A1 - Image processing method and apparatus, device, and storage medium

Info

Publication number: WO2023284502A1
Application number: PCT/CN2022/100269
Authority: WO
Inventors: Xu Qingsong (徐青松); Li Qing (李青)
Original assignee: Hangzhou Glority Software Limited (杭州睿胜软件有限公司)
Priority date: 2021-07-13
Filing date: 2022-06-22
Publication date: 2023-01-19
Also published as: CN113486828B; CN113486828A

Abstract

An image processing method, an image processing apparatus, an electronic device, and a computer-readable storage medium. The image processing method comprises: obtaining an initial image, wherein the initial image comprises at least one target object; processing the initial image to obtain an intermediate image; recognizing the intermediate image by using a region detection model to obtain a connected image comprising M object connected regions; determining M bounding boxes in the connected image respectively corresponding to the M object connected regions; capturing N image blocks from the initial image on the basis of the M bounding boxes, each image block comprising at least one target object; and recognizing the N image blocks by using an object recognition model to obtain the target object in the initial image.

Description

Image processing method and apparatus, device, and storage medium

Technical field

Embodiments of the present disclosure relate to an image processing method, an image processing apparatus, an electronic device, and a computer-readable storage medium.

Background art

With the development of digital technology, text recognition technology can be used to recognize text images and obtain the information recorded in them; for example, OCR (Optical Character Recognition) technology can convert the text content of pictures and photos directly into editable text. However, current text recognition algorithms are highly complex and computationally expensive, which restricts where they can be used: they are suitable only for devices with high hardware configurations, such as servers. When they are executed on devices with low hardware configurations, such as terminal devices, recognition is very slow or may fail entirely, so it is difficult to perform text recognition on a terminal device that is offline.

Summary of the invention

At least one embodiment of the present disclosure provides an image processing method, including: obtaining an initial image, where the initial image includes at least one target object; processing the initial image to obtain an intermediate image; recognizing the intermediate image by using a region detection model to obtain a connected image including M object connected regions; determining M bounding boxes in the connected image respectively corresponding to the M object connected regions; cropping N image blocks from the initial image based on the M bounding boxes, where each image block includes at least one target object; and recognizing the N image blocks by using an object recognition model to obtain the target object in the initial image, where M and N are both positive integers.

For example, in the image processing method provided by an embodiment of the present disclosure, recognizing the intermediate image by using the region detection model to obtain the connected image including the M object connected regions includes: processing the intermediate image by using the region detection model to obtain a connected image including a plurality of initial object connected regions; and performing a morphological transformation on the connected image including the plurality of initial object connected regions, so as to obtain from it the connected image including the M object connected regions.
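The disclosure does not say which morphological transformation is applied. A minimal sketch, assuming a binary dilation with a wide, flat rectangular structuring element (one common way to merge fragments lying close together on the same text line), is shown below; `dilate`, `radius_x`, and `radius_y` are hypothetical names, and a practical implementation would more likely call an image-processing library.

```python
def dilate(mask, radius_x=2, radius_y=0):
    """Binary dilation of a 0/1 mask (list of rows) with a flat
    (2*radius_y + 1) x (2*radius_x + 1) rectangular structuring element.

    Assumption: the 'morphological transformation' is a dilation that
    merges initial object connected regions lying close together
    horizontally; the disclosure does not fix the operator or its size.
    """
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                # Paint the whole structuring element around each foreground pixel.
                for yy in range(max(0, y - radius_y), min(h, y + radius_y + 1)):
                    for xx in range(max(0, x - radius_x), min(w, x + radius_x + 1)):
                        out[yy][xx] = 1
    return out
```

For example, `dilate([[1, 0, 0, 0, 1]], radius_x=2)` closes the three-pixel gap and returns `[[1, 1, 1, 1, 1]]`, so two initial regions become one; with `radius_x=1` the gap survives and the regions stay separate.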

For example, in the image processing method provided by an embodiment of the present disclosure, processing the initial image to obtain the intermediate image includes: reducing the size of the initial image from an initial size to a predetermined size; and performing binarization processing on the initial image of the predetermined size to obtain the intermediate image.

For example, in the image processing method provided by an embodiment of the present disclosure, determining the M bounding boxes in the connected image respectively corresponding to the M object connected regions includes: extracting contour information of each of the M object connected regions; and determining, based on the contour information, the bounding box of each of the M object connected regions.

For example, in the image processing method provided by an embodiment of the present disclosure, cropping the N image blocks from the initial image based on the M bounding boxes includes: according to the correspondence between the intermediate image and the initial image, cropping one image block from the initial image for each of the M bounding boxes, in which case M is equal to N; or performing predetermined processing on the M bounding boxes to obtain N processed bounding boxes and, according to the correspondence between the intermediate image and the initial image, cropping one image block from the initial image for each processed bounding box.
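Assuming the only difference between the intermediate image and the initial image is the resize performed in step S120 (that is, no tilt correction or border cropping was applied), the "correspondence" reduces to a per-axis scale factor. The sketch below, with hypothetical names `map_box_to_initial` and `crop` and a half-open (x0, y0, x1, y1) box convention, shows how a bounding box found in the intermediate image could be carried back to crop an image block from the initial image.

```python
import math


def map_box_to_initial(box, inter_size, init_size):
    """Map a half-open box (x0, y0, x1, y1) from intermediate-image
    coordinates to initial-image coordinates.

    inter_size and init_size are (width, height). Assumption: the
    intermediate image is only a rescaled copy of the initial image,
    so the correspondence is a per-axis scale. Rounding outward keeps
    the whole object inside the mapped box.
    """
    inter_w, inter_h = inter_size
    init_w, init_h = init_size
    sx, sy = init_w / inter_w, init_h / inter_h
    x0, y0, x1, y1 = box
    return (max(0, math.floor(x0 * sx)), max(0, math.floor(y0 * sy)),
            min(init_w, math.ceil(x1 * sx)), min(init_h, math.ceil(y1 * sy)))


def crop(image, box):
    """Crop a half-open integer box from an image stored as a list of rows."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in image[y0:y1]]
```

For a 1280 × 960 initial image reduced to 640 × 640, the box (10, 20, 30, 40) maps to (20, 30, 60, 60). If tilt correction or border cropping is applied, that transform would also have to be inverted.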

For example, in the image processing method provided by an embodiment of the present disclosure, performing the predetermined processing on the M bounding boxes includes: scoring the M bounding boxes to obtain quality scores respectively corresponding to the M bounding boxes; and taking bounding boxes whose quality scores are less than a score threshold as invalid bounding boxes and deleting the invalid bounding boxes.

For example, in the image processing method provided by an embodiment of the present disclosure, scoring the M bounding boxes includes performing the following operations on each of the M bounding boxes: determining the area of the bounding box and the area of the pixels corresponding to the target object within the bounding box; and determining the quality score of the bounding box based on the ratio of that pixel area to the area of the bounding box.
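A minimal sketch of the scoring and filtering rule follows. It assumes the pixels "corresponding to the target object" are those of a binary mask equal to a chosen foreground value. Which value marks text depends on the binarization convention (with the convention described for Fig. 4, dark text becomes 0), so `fg` is a parameter. The function names and threshold values are hypothetical.

```python
def quality_score(mask, box, fg=1):
    """Ratio of target-object pixels to bounding-box area.

    mask is a list of rows; pixels equal to fg are assumed to belong to
    the target object. box is half-open (x0, y0, x1, y1) in mask coordinates.
    """
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    if area <= 0:
        return 0.0
    object_pixels = sum(1 for row in mask[y0:y1] for v in row[x0:x1] if v == fg)
    return object_pixels / area


def drop_invalid(mask, boxes, score_threshold, fg=1):
    """Keep boxes scoring at least the threshold; lower-scoring boxes are invalid."""
    return [b for b in boxes if quality_score(mask, b, fg) >= score_threshold]
```

A box that mostly covers background (for example, one produced by noise or a stray line) gets a low ratio and is removed before any image block is cropped.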

For example, in the image processing method provided in an embodiment of the present disclosure, performing predetermined processing on the M bounding boxes includes: enlarging one or more bounding boxes in the M bounding boxes by a first predetermined factor.

For example, in the image processing method provided in an embodiment of the present disclosure, performing the predetermined processing on the M bounding boxes further includes: detecting whether every two adjacent bounding boxes among the M bounding boxes overlap at least partially, and if so, reducing each of the two overlapping bounding boxes based on a second predetermined factor, so that the two reduced bounding boxes do not overlap or their overlapping area decreases.
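The enlargement and overlap-reduction steps can be sketched as follows. The sketch assumes both factors scale a box about its centre (the text does not say about which point the scaling happens), that boxes are sorted in reading order so "adjacent" means consecutive in the list, and illustrative factor values. Clamping to the image borders is omitted. All names are hypothetical. A single shrink by the second factor may only reduce the overlap rather than remove it, which is consistent with the "do not overlap or the overlapping area decreases" wording.

```python
def scale_box(box, factor):
    """Scale a box (x0, y0, x1, y1) about its centre by the given factor."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    half_w, half_h = (x1 - x0) * factor / 2, (y1 - y0) * factor / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def overlap_area(a, b):
    """Area of the intersection of two boxes (0 if they do not overlap)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)


def enlarge_then_separate(boxes, first_factor=1.1, second_factor=0.95):
    """Enlarge every box, then shrink each adjacent overlapping pair once.

    Assumption: boxes are already sorted in reading order, so 'adjacent'
    means consecutive in the list. Factor values are illustrative.
    """
    boxes = [scale_box(b, first_factor) for b in boxes]
    for i in range(len(boxes) - 1):
        if overlap_area(boxes[i], boxes[i + 1]) > 0:
            boxes[i] = scale_box(boxes[i], second_factor)
            boxes[i + 1] = scale_box(boxes[i + 1], second_factor)
    return boxes
```

Enlarging first gives the recognizer some margin around characters whose strokes were clipped by a tight box; shrinking afterwards keeps that margin from pulling parts of a neighbouring line or region into the block.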

For example, in the image processing method provided by an embodiment of the present disclosure, recognizing the N image blocks by using the object recognition model to obtain the target object in the initial image includes: determining, among the N image blocks, P first image blocks whose lengths in a first direction are greater than a recognition length threshold, and dividing each first image block into at least two sub-image blocks to obtain a plurality of sub-image blocks corresponding to the P first image blocks, where the length of each sub-image block in the first direction is equal to or less than the recognition length threshold; and recognizing the plurality of sub-image blocks by using the object recognition model to obtain the target objects in the P first image blocks, where the target object in the initial image includes the target objects in the P first image blocks, and P is a positive integer.

For example, in the image processing method provided by an embodiment of the present disclosure, recognizing the N image blocks by using the object recognition model to obtain the target object in the initial image further includes: determining, among the N image blocks, Q second image blocks whose lengths in the first direction are less than the recognition length threshold, and processing each second image block to obtain Q processed second image blocks, where the length of each processed second image block in the first direction is equal to the recognition length threshold; and recognizing the Q processed second image blocks by using the object recognition model to obtain the target objects in the Q second image blocks, where the target object in the initial image further includes the target objects in the Q second image blocks, and Q is a positive integer.

For example, in the image processing method provided by an embodiment of the present disclosure, dividing each first image block into at least two sub-image blocks includes performing the following operations on the i-th first image block among the P first image blocks: setting, in the first direction, one candidate segmentation point at every interval equal to the recognition length threshold, so as to determine at least one candidate segmentation point corresponding to the i-th first image block; determining, based on the at least one candidate segmentation point, at least one segmentation point corresponding to the i-th first image block; and dividing the i-th first image block into at least two sub-image blocks based on the at least one segmentation point, where i is a positive integer less than or equal to P.

For example, in the image processing method provided in an embodiment of the present disclosure, determining the at least one segmentation point corresponding to the i-th first image block based on the at least one candidate segmentation point includes: if a gap region exists within a distance threshold of any candidate segmentation point of the i-th first image block, taking a point in the gap region as a segmentation point corresponding to the i-th first image block; and if no gap region exists within the distance threshold of that candidate segmentation point, taking the candidate segmentation point itself as a segmentation point corresponding to the i-th first image block.
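The segmentation rule can be sketched using a per-column count of object pixels in the first image block. The sketch treats a column with a count of zero as part of a gap region, which is an assumption. To keep every sub-block no longer than the recognition length threshold, it measures each candidate point from the previously chosen segmentation point and searches for a gap only within the distance threshold before the candidate. The text describes candidates at every threshold interval and a range around them, so this is one possible reading, not the claimed method itself. The names `choose_split_points` and `split_block` are hypothetical.

```python
def choose_split_points(col_counts, length_threshold, distance_threshold):
    """Pick segmentation columns along the first direction of one image block.

    col_counts[x] is the number of object pixels in column x; a column
    with 0 is treated as part of a gap region (assumption).
    """
    width = len(col_counts)
    points = []
    start = 0
    while width - start > length_threshold:
        candidate = start + length_threshold
        lo = max(start + 1, candidate - distance_threshold)
        gaps = [x for x in range(lo, candidate + 1) if col_counts[x] == 0]
        # Prefer the gap column closest to the candidate; else cut at the candidate.
        split = max(gaps) if gaps else candidate
        points.append(split)
        start = split
    return points


def split_block(block, points):
    """Cut a block (list of rows) into sub-blocks at the given columns."""
    edges = [0] + points + [len(block[0])]
    return [[row[a:b] for row in block] for a, b in zip(edges, edges[1:])]
```

Cutting inside a blank column avoids slicing through a character, which would otherwise make the character unrecognizable in both sub-blocks. The candidate itself is the fallback only when no such column is nearby.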

For example, in the image processing method provided in an embodiment of the present disclosure, processing each second image block includes: splicing an end image block onto at least one end of the second image block in the first direction to obtain the processed second image block corresponding to that second image block, where the pixel value of each pixel in the end image block differs from the pixel values of the pixels corresponding to the objects in the second image block.
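A minimal sketch of the end-block splicing follows. It assumes the first direction is horizontal, that is, the width of a row-major block. The second image block is extended with constant-valued columns until its length equals the recognition length threshold. The fill value 255 assumes dark text on a white 8-bit grayscale background, and the helper name `pad_to_length` is hypothetical. The text allows the end block on one end or on both, so that choice is a flag here.

```python
def pad_to_length(block, target_len, fill=255, both_ends=False):
    """Splice constant 'end image blocks' onto a second image block.

    block is a list of rows; its width is its length in the first
    direction. fill must differ from the object pixel values (assumption:
    dark text on a white background, hence 255 for 8-bit grayscale).
    """
    width = len(block[0])
    if width >= target_len:
        return [row[:] for row in block]
    extra = target_len - width
    left = extra // 2 if both_ends else 0
    right = extra - left
    return [[fill] * left + row + [fill] * right for row in block]
```

Padding with background rather than stretching the block keeps character proportions unchanged, so short blocks reach the recognizer at the same scale as long ones.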

For example, in the image processing method provided by an embodiment of the present disclosure, each first image block includes multiple target objects, and the multiple target objects are arranged in sequence along the first direction.

For example, in the image processing method provided by an embodiment of the present disclosure, at least one target object includes characters.

An embodiment of the present disclosure provides an image processing apparatus, including: an image acquisition module configured to obtain an initial image, the initial image including at least one target object; an image processing module configured to process the initial image to obtain an intermediate image; a region recognition module configured to recognize the intermediate image by using a region detection model to obtain a connected image including M object connected regions; a determination module configured to determine M bounding boxes in the connected image respectively corresponding to the M object connected regions; a cropping module configured to crop N image blocks from the initial image based on the M bounding boxes, each image block including at least one target object; and an object recognition module configured to recognize the N image blocks by using an object recognition model to obtain the target object in the initial image, where M and N are both positive integers.

An embodiment of the present disclosure further provides an electronic device, including: a processor; and a memory storing one or more computer program modules, where the one or more computer program modules are configured to be executed by the processor and include instructions for implementing the image processing method according to any one of the above embodiments.

An embodiment of the present disclosure further provides a computer-readable storage medium for non-transitory storage of computer-readable instructions, where the computer-readable instructions, when executed by a computer, implement the image processing method according to any one of the above embodiments.

Description of drawings

In order to illustrate the technical solutions of the embodiments of the present disclosure more clearly, the accompanying drawings of the embodiments are briefly introduced below. Obviously, the accompanying drawings in the following description relate only to some embodiments of the present disclosure and do not limit the present disclosure.

Fig. 1 is a schematic flowchart of an image processing method provided by at least one embodiment of the present disclosure;

Fig. 2 is a schematic diagram of an initial image provided by at least one embodiment of the present disclosure;

Fig. 3 is a schematic diagram of a target object provided by at least one embodiment of the present disclosure;

Fig. 4 is a schematic diagram of a binarized image provided by at least one embodiment of the present disclosure;

Fig. 5 is a schematic diagram of a connected image provided by at least one embodiment of the present disclosure;

Fig. 6 is a schematic diagram of a bounding box provided by at least one embodiment of the present disclosure;

Fig. 7A is a schematic diagram of an image block cropped from an initial image provided by at least one embodiment of the present disclosure;

Fig. 7B is a schematic diagram of an image block provided by at least one embodiment of the present disclosure;

Fig. 8 is a schematic diagram of a connected image including multiple initial object connected regions provided by at least one embodiment of the present disclosure;

Fig. 9 is a schematic flowchart of identifying N image blocks provided by at least one embodiment of the present disclosure;

Fig. 10A is a schematic diagram of segmented image blocks provided by at least one embodiment of the present disclosure;

Fig. 10B is a schematic diagram of splicing end image blocks provided by at least one embodiment of the present disclosure;

Fig. 11 is a schematic diagram of a target object recognition result provided by at least one embodiment of the present disclosure;

Fig. 12 is a schematic block diagram of an image processing device provided by at least one embodiment of the present disclosure;

Fig. 13 is a schematic block diagram of an electronic device provided by at least one embodiment of the present disclosure;

Fig. 14 is a schematic block diagram of another electronic device provided by at least one embodiment of the present disclosure;

Fig. 15 is a schematic diagram of a computer-readable storage medium provided by at least one embodiment of the present disclosure; and

Fig. 16 is a schematic diagram of a hardware environment provided by at least one embodiment of the present disclosure.

Detailed description

In order to make the purpose, technical solutions and advantages of the embodiments of the present disclosure clearer, the technical solutions of the embodiments of the present disclosure will be clearly and completely described below in conjunction with the drawings of the embodiments of the present disclosure. Apparently, the described embodiments are some of the embodiments of the present disclosure, not all of them. Based on the described embodiments of the present disclosure, all other embodiments obtained by persons of ordinary skill in the art without creative effort fall within the protection scope of the present disclosure.

Unless otherwise defined, technical or scientific terms used in the present disclosure shall have the ordinary meanings understood by those of ordinary skill in the art to which the present disclosure belongs. "First", "second", and similar words used in the present disclosure do not indicate any order, quantity, or importance, but are only used to distinguish different components. "Comprise", "include", and similar words mean that the elements or items preceding the word cover the elements or items listed after the word and their equivalents, without excluding other elements or items. "Connect", "couple", and similar words are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "Up", "down", "left", "right", and the like are only used to indicate relative positional relationships; when the absolute position of the described object changes, the relative positional relationship may change accordingly.

At least one embodiment of the present disclosure provides an image processing method, an image processing apparatus, an electronic device, and a computer-readable storage medium. The image processing method includes: obtaining an initial image, where the initial image includes at least one target object; processing the initial image to obtain an intermediate image; recognizing the intermediate image by using a region detection model to obtain a connected image including M object connected regions; determining M bounding boxes in the connected image respectively corresponding to the M object connected regions; cropping N image blocks from the initial image based on the M bounding boxes, each image block including at least one target object; and recognizing the N image blocks by using an object recognition model to obtain the target object in the initial image, where M and N are both positive integers.

The image processing method provided by the embodiments of the present disclosure first converts the initial image into an intermediate image, then uses the region detection model to convert the intermediate image into a connected image containing several object connected regions, determines the bounding boxes corresponding to the object connected regions, and finally returns to the initial image to crop the image blocks corresponding to the bounding boxes. Compared with related-art algorithms that determine the regions where objects are located directly from the initial image, the method of the embodiments of the present disclosure requires less computation and has a simpler processing flow. It therefore addresses the problem of high complexity and heavy computation, allows the object recognition algorithm to run on terminal devices with low hardware configurations, such as mobile phones, and enables such terminal devices to perform object recognition even when offline.

The image processing method of the embodiment of the present disclosure can be applied to the image processing device of the embodiment of the present disclosure, and the image processing device can be configured on an electronic device. The electronic device may be a personal computer, a mobile terminal, etc., and the mobile terminal may be a hardware device such as a mobile phone or a tablet computer.

Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings, but the present disclosure is not limited to these specific embodiments.

Fig. 1 is a schematic flowchart of an image processing method provided by at least one embodiment of the present disclosure.

As shown in Fig. 1, the method includes steps S110-S160.

Step S110: Obtain an initial image, where the initial image includes at least one target object.

Step S120: Process the initial image to obtain an intermediate image.

Step S130: Use the region detection model to identify the intermediate image to obtain a connected image including M object connected regions.

Step S140: Determine M bounding boxes respectively corresponding to the M object connected regions in the connected image.

Step S150: Crop N image blocks from the initial image based on the M bounding boxes, where each image block includes at least one target object.

Step S160: Use the object recognition model to identify N image blocks to obtain the target object in the initial image.

For example, both M and N are positive integers.

For example, in step S110, the initial image may be in various forms, such as electronic files in any image form such as photos, scanned images, screenshots, and PDF image pages. The initial image can be a grayscale image or a color image.

Fig. 2 is a schematic diagram of an initial image 201 provided by at least one embodiment of the present disclosure. As shown in Fig. 2, the initial image 201 includes at least one target object, and the at least one target object may include characters. For example, each character may be a digit, a Chinese character (a single Chinese character, a Chinese word, etc.), a foreign character (for example, a foreign letter or word), a special character (for example, the percent sign "%"), a punctuation mark, a graphic symbol (for example, a triangle or an arrow), and so on. For example, the characters may be in various fonts, either printed or handwritten; printed fonts may include various known typefaces, such as Song, Hei, Kai, Times New Roman, and Arial, and may also include artistic fonts. For example, in the example shown in Fig. 2, the target objects include English letters and digits.

Fig. 3 is a schematic diagram of a target object provided by at least one embodiment of the present disclosure. As shown in Fig. 3, in another example, the target object may also include various patterns, for example, a heart pattern, a smiling-face pattern, a cloud pattern, a sun pattern, a moon pattern, and so on. In addition, the target object may take forms other than characters and patterns. The following description takes characters as an example of the target object; other types of target objects can be processed in a manner corresponding to that described for characters.

For example, the type of target object can be determined according to actual needs: the type of target object to be recognized can be preset, and the corresponding region detection model and object recognition model can be trained for that type, so that the region detection model includes the locations of objects of that type in the object connected regions and the object recognition model can recognize objects of that type. For example, in an application scenario that requires recognizing English words and punctuation marks, sample images containing English words and punctuation marks can be used to train the region detection model and the object recognition model, so that the trained region detection model marks the regions where English words and punctuation marks are located as connected regions, and the trained object recognition model can recognize the English words and punctuation marks.

For example, in step S120, processing the initial image to obtain an intermediate image may include: reducing the size of the initial image from the initial size to a predetermined size; performing binarization processing on the initial image of the predetermined size to obtain an intermediate image.

For example, different initial images may have different sizes. To facilitate processing, each initial image can be uniformly reduced from its original size to a predetermined size, for example, 640 × 640 pixels. On the one hand, this reduces subsequent computation; on the other hand, a uniform size facilitates subsequent processing, such as region recognition by the region detection model.

For example, the image reduced to the predetermined size (that is, the initial image of the predetermined size) can be normalized. In one example, each pixel value (for example, a gray value) of the initial image of the predetermined size can be mapped to the range 0 to 1 by dividing it by 255. In another example, each pixel value of the initial image of the predetermined size can be mapped to the range -1.0 to 1.0.

For example, the normalized image can be binarized to obtain a binarized image, which can serve as the above-mentioned intermediate image. Fig. 4 is a schematic diagram of a binarized image provided by at least one embodiment of the present disclosure; the binarized image shown in Fig. 4 is the binarized version of the initial image shown in Fig. 2. As shown in Fig. 4, a binarization threshold can be preset (for example, 0.3; the threshold can be set according to actual conditions and is not specifically limited in the present disclosure), and each normalized pixel value is compared with the binarization threshold: if the pixel value is greater than or equal to the threshold, it is converted to 1, that is, the corresponding pixel becomes pure white; if the pixel value is less than the threshold, it is converted to 0, that is, the corresponding pixel becomes pure black. In this way a pure black-and-white image, that is, a binarized image, is obtained.
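Step S120 as described (reduce to a predetermined size, normalize by dividing by 255, and threshold at 0.3) can be sketched in plain Python as follows. Nearest-neighbour resampling and the names `resize_nearest` and `to_intermediate` are assumptions made for illustration. The text does not specify the interpolation method, and a production version would normally use an image library.

```python
def resize_nearest(img, out_w, out_h):
    """Nearest-neighbour resize of a grayscale image stored as a list of rows."""
    h, w = len(img), len(img[0])
    return [[img[y * h // out_h][x * w // out_w] for x in range(out_w)]
            for y in range(out_h)]


def to_intermediate(img, size=(640, 640), threshold=0.3):
    """Reduce to a predetermined size, normalize to [0, 1], then binarize.

    Pixels whose normalized value is >= threshold become 1 (white);
    the rest become 0 (black), so dark text ends up as 0.
    """
    out_w, out_h = size
    reduced = resize_nearest(img, out_w, out_h)
    return [[1 if v / 255 >= threshold else 0 for v in row] for row in reduced]
```

For a 2 × 2 image `[[0, 255], [100, 50]]` kept at size (2, 2), the normalized values are about 0, 1, 0.39, and 0.20, so the result is `[[0, 1], [1, 0]]`.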

For example, in some embodiments, before or after any of the above size reduction, normalization, and binarization steps, the image (the initial image, the initial image of the predetermined size, the normalized image, or the binarized image) may be tilt-corrected so that the characters in it are arranged along the horizontal direction (for example, the X direction shown in Fig. 4) or the vertical direction (for example, the Y direction shown in Fig. 4). In addition, the initial image may also be cropped to remove the background around its periphery.

For example, in step S130, the region detection model may be implemented using machine learning technology and run on, for example, a general-purpose or special-purpose computing device. The region detection model is a pre-trained neural network model, which may be implemented, for example, with a deep convolutional neural network (DEEP-CNN). The intermediate image is input into the region detection model, which identifies the region where each object to be recognized in the intermediate image is located and marks each identified object connected region. In a scenario where the target objects are characters, the object connected regions may be character connected regions. For example, the region detection model may be implemented with the DBNet (Differentiable Binarization Network) architecture, whose backbone network may be MobileNetV3 Large, a lightweight network. In some embodiments, the number of parameters of the MobileNetV3 Large network may be reduced relative to the original, for example to r times the original amount, where r is a positive number greater than 0 and less than 1, for example r = 0.75 (r can be set according to the actual situation). In other embodiments of the present disclosure, according to actual needs, the region detection model may adopt a network architecture other than DBNet, and the backbone network may be a network other than MobileNetV3 Large.

It should be noted that the position and type of each object in the initial image are the same as those of the corresponding object in the intermediate image. As shown in Figs. 2 and 4, the initial image includes the object "DECLARATION AND ASSIGNMENT" located at its upper side, and the intermediate image likewise includes the object "DECLARATION AND ASSIGNMENT" located at its upper side.

Fig. 5 is a schematic diagram of a connected image provided by at least one embodiment of the present disclosure. The connected image shown in Fig. 5 is obtained by processing the intermediate image shown in Fig. 4 and includes M object connected regions. For example, the size of the connected image shown in Fig. 5 is the same as that of the intermediate image shown in Fig. 4.

As shown in Figs. 4 and 5, each row of characters may correspond to one or more object connected regions (also called character connected regions). For example, if the characters in a row are arranged continuously, that is, the interval between every two adjacent characters in the row does not exceed a predetermined interval (for example, the width of two or three spaces), the row of characters can form one object connected region. For example, Fig. 4 shows the character line "DECLARATION AND ASSIGNMENT"; since no interval between adjacent characters in this line exceeds the predetermined interval, the line forms one object connected region 501. It should be noted that the predetermined interval may be set according to actual conditions and is not limited in the present disclosure. In addition, for the character line "DECLARATION AND ASSIGNMENT", either a single English letter or an English word may be treated as a character.

For example, if the characters in a row are arranged discontinuously, that is, the interval between some two adjacent characters in the row exceeds the predetermined interval (for example, the width of two or three spaces), several object connected regions can be formed according to those intervals. For example, if the a-th to (a+b)-th characters in a character line are arranged continuously, the interval between the (a+b)-th and (a+b+1)-th characters exceeds the predetermined interval, and the (a+b+1)-th to (a+b+c)-th characters are arranged continuously, then the a-th to (a+b)-th characters form one object connected region and the (a+b+1)-th to (a+b+c)-th characters form another, where a, b, and c are all positive integers. For example, Fig. 4 shows the character line "Signature:_____Date:_____". If, in this embodiment, underlines are not objects to be detected and recognized, the characters in this line are, in order, the first character "Signature", the second character ":", the third character "Date", and the fourth character ":". Because the interval between the second character ":" and the third character "Date" exceeds the predetermined interval, while "Signature" and the following ":" are continuous and "Date" and the following ":" are continuous, the first and second characters form one object connected region 502, and the third and fourth characters form another object connected region 503.
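The region detection model learns this grouping from training data rather than applying an explicit rule. The rule described above can still be illustrated with a short sketch. The character extents and the gap threshold (in pixels, standing in for the "two or three spaces" of the text) are hypothetical values loosely modelled on the "Signature:_____Date:_____" line, and `group_by_gap` is a hypothetical name.

```python
def group_by_gap(spans, max_gap):
    """Merge character spans on one line into object connected regions.

    spans are (start_x, end_x) extents of characters; neighbours belong
    to the same region when the gap between them is <= max_gap.
    """
    regions = []
    for start, end in sorted(spans):
        if regions and start - regions[-1][1] <= max_gap:
            regions[-1][1] = max(regions[-1][1], end)  # extend current region
        else:
            regions.append([start, end])  # gap too large: start a new region
    return [tuple(r) for r in regions]


# "Signature", ":", "Date", ":" with a wide blank before "Date" (hypothetical pixels).
line = [(0, 90), (92, 96), (200, 240), (242, 246)]
print(group_by_gap(line, 20))  # two regions, analogous to 502 and 503
```

With a 20-pixel threshold the line yields `[(0, 96), (200, 246)]`; raising the threshold above the 104-pixel blank merges everything into a single region.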

For example, in step S140, a corresponding bounding box may be determined according to each object connected region in the M object connected regions.

FIG. 6 is a schematic diagram of bounding boxes provided by at least one embodiment of the present disclosure. As shown in FIG. 5 and FIG. 6, in this embodiment the bounding box is, for example, a rectangular box, namely the minimum bounding box that completely encloses the object connected region. The size of the minimum bounding box can be determined from the length and height of the object connected region. For example, as shown in FIG. 5 and FIG. 6, to determine the size in the X direction of the bounding box of the object connected region 501, the X coordinates of the leftmost and rightmost endpoints of region 501 in the X direction can be determined, and the absolute value of the difference between these two X coordinates is taken as the size of the bounding box 601 in the X direction. To determine the size in the Y direction, the Y coordinates of the lowest and highest points of region 501 in the Y direction can be determined, and the absolute value of their difference is taken as the size of the bounding box 601 in the Y direction. In this way, the bounding box 601 enclosing the object connected region 501 is obtained. Similarly, the bounding box of every other object connected region can be determined, for example, the bounding box 602 of the object connected region 502, the bounding box 603 of the object connected region 503, and so on. It is worth noting that the bounding boxes in FIG. 6 are drawn for clarity of illustration; in practice, the dimensions of each bounding box in the X direction and the Y direction may be equal to those determined in the above manner.
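As an illustration of the extreme-coordinate rule, the following sketch labels the 4-connected foreground regions of a binary connected image with a breadth-first flood fill and returns each region's minimal axis-aligned box. The flood-fill labeling, the 4-connectivity, and the name `region_bounding_boxes` are assumptions made for this example, not the disclosed implementation. Boxes are returned as inclusive pixel corners `(x_min, y_min, x_max, y_max)`; the size rule in the text corresponds to `x_max - x_min` and `y_max - y_min`.

```python
from collections import deque
from typing import List, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive pixel coords


def region_bounding_boxes(mask: np.ndarray) -> List[Box]:
    """Label 4-connected foreground regions of a binary mask and return the
    minimal axis-aligned box of each region, in raster order of discovery."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    boxes: List[Box] = []
    for y0 in range(h):
        for x0 in range(w):
            if not mask[y0, x0] or seen[y0, x0]:
                continue
            # Breadth-first flood fill of one connected region.
            queue = deque([(y0, x0)])
            seen[y0, x0] = True
            xs, ys = [], []
            while queue:
                y, x = queue.popleft()
                xs.append(x)
                ys.append(y)
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            # Leftmost/rightmost and highest/lowest points define the box.
            boxes.append((min(xs), min(ys), max(xs), max(ys)))
    return boxes
```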

For example, in other embodiments, the bounding box may have a shape other than a rectangle, such as an ellipse, a triangle, or a trapezoid.

It should be noted that the bounding boxes of the connected regions of the objects may also be determined in other suitable ways.

For example, in step S150, corresponding image blocks may be cropped from the initial image 201 according to one or more (N) of the M bounding boxes.

FIG. 7A is a schematic diagram of image blocks cropped from an initial image provided by at least one embodiment of the present disclosure, and FIG. 7B is a schematic diagram of an image block provided by at least one embodiment of the present disclosure. With reference to FIG. 2, FIG. 7A, and FIG. 7B, if tilt correction was performed while obtaining the intermediate image or the connected image, tilt correction can also be performed on the initial image 201 before image blocks are cropped from it, to obtain a corrected initial image 201', and the image blocks are then cropped from the corrected initial image 201'. In one example, an image block of the corresponding area may be cropped from the initial image 201 for each of the M bounding boxes, in which case M and N are equal. For example, according to the coordinate parameters of the M bounding boxes and the correspondence (for example, a mapping relationship) between the intermediate image and the initial image, all M bounding boxes are mapped onto the initial image 201', and the image block framed by each bounding box is cropped from the initial image 201', yielding M image blocks: the image block 701 is cropped according to the bounding box 601, the image block 702 according to the bounding box 602, the image block 703 according to the bounding box 603, the image block 704 according to the bounding box 604, the image block 705 according to the bounding box 605, and so on. In another example, N may be smaller than M, that is, only some of the M bounding boxes are selected, and only the image blocks defined by the selected bounding boxes are cropped from the initial image 201.
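A minimal sketch of the cropping step, assuming the boxes have already been mapped into the coordinate system of the (corrected) initial image and are given as inclusive `(x_min, y_min, x_max, y_max)` corners. The optional `keep` index list models the N < M case. Tilt correction is not shown, and the name `crop_blocks` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def crop_blocks(image: np.ndarray, boxes: Sequence[Box],
                keep: Optional[Sequence[int]] = None) -> List[np.ndarray]:
    """Crop one image block per selected bounding box.

    With keep=None every box is used (N == M); otherwise only the boxes whose
    indices appear in keep are used (N < M).
    """
    h, w = image.shape[:2]
    indices = range(len(boxes)) if keep is None else keep
    blocks = []
    for i in indices:
        x0, y0, x1, y1 = boxes[i]
        # Clip to the image so that boxes touching the border stay valid.
        x0, y0 = max(x0, 0), max(y0, 0)
        x1, y1 = min(x1, w - 1), min(y1, h - 1)
        blocks.append(image[y0:y1 + 1, x0:x1 + 1].copy())
    return blocks
```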

For example, in step S160, the object recognition model may be used to recognize each image block to obtain the character content in that image block. When the target object includes characters, the object recognition model may include a character recognition model. For example, the character recognition model may be implemented based on technologies such as optical character recognition and run on a general-purpose or special-purpose computing device; the character recognition model may also be a pre-trained neural network model. In some embodiments, the recognized character content may contain semantic errors, logical errors, and the like. The character content recognized by the character recognition model therefore needs to be verified and such errors corrected to obtain accurate character content. For example, the character recognition model can use the CRNN (Convolutional Recurrent Neural Network) + CTC (Connectionist Temporal Classification) architecture, and the backbone network of the CRNN+CTC architecture can be the MobileNetV3 Small network. To suit the recognition of image blocks in the embodiments of the present disclosure, the network can be adaptively adjusted, for example, in the inverted_res_block part of the MobileNetV3 Small network.
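The disclosure names the CRNN+CTC architecture but does not say how the CTC output is decoded. The sketch below shows greedy (best-path) CTC decoding, the simplest standard choice, as an assumption. It takes the argmax class at each time step, merges consecutive repeats, and drops blanks. It assumes class 0 is the blank and class `c > 0` maps to `alphabet[c - 1]`. The network itself (the MobileNetV3 Small backbone and recurrent layers) is not reproduced, and `ctc_greedy_decode` is a hypothetical name.

```python
from typing import List, Sequence

import numpy as np

BLANK = 0  # assumed convention: class 0 is the CTC blank


def ctc_greedy_decode(probs: np.ndarray, alphabet: Sequence[str]) -> str:
    """Best-path CTC decoding of a (T, C) score matrix.

    Class c > 0 maps to alphabet[c - 1]. Repeated labels are merged unless a
    blank separates them, and blanks are removed.
    """
    best = probs.argmax(axis=1)
    chars: List[str] = []
    prev = BLANK
    for k in best:
        # Emit a label only when it is not blank and differs from the previous step.
        if k != BLANK and k != prev:
            chars.append(alphabet[k - 1])
        prev = k
    return "".join(chars)
```

For example, the best path `a a _ a b b _ _ c` (with `_` as blank) decodes to `"aabc"`: the blank between the two runs of `a` keeps both letters.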

For example, at least one character can be recognized from each image block, and each character can be a single Chinese character, a single foreign-language character (for example, a single English letter or a single English word), a single digit, a single symbol, a single graphic, a single punctuation mark, and so on. For example, the character content "DECLARATION AND ASSIGNMENT" can be recognized from the image block 701, the character content "Signature:" from the image block 702, and the character content "Date:" from the image block 703.

For example, in other embodiments, the target object may include objects other than characters, such as patterns. In this case, the object recognition model may also include a pattern recognition model, which, for example, runs on a general-purpose or special-purpose computing device and may also be a pre-trained neural network model. In one example, the pattern recognition model can recognize a pattern as a corresponding English or Chinese word, for example, recognize a sun pattern as the word "sun". In another example, the pattern recognition model can convert a pattern into a corresponding stick figure. For example, a variety of stick figures can be stored in advance in a library; when a sun pattern is recognized, the stick figure corresponding to the sun pattern is selected from the library and used as the recognition result.

For example, when there are multiple types of target objects, different recognition models can be used to recognize the different types, and the recognition results of these models can be combined to obtain the recognition results for all target objects in the initial image.

The image processing method provided by the embodiments of the present disclosure first converts the initial image into an intermediate image. It then uses the region detection model to convert the intermediate image into a connected image containing several object connected regions, determines the bounding boxes corresponding to these regions, and finally returns to the initial image to crop the image blocks corresponding to the bounding boxes. Compared with related-art algorithms that locate objects directly in the initial image, the method of the embodiments of the present disclosure requires less computation and has a simpler processing flow. It therefore at least partially solves the problem of high complexity and heavy computation. As a result, the object recognition algorithm can be applied to terminal devices with low hardware configurations, such as mobile phones, and object recognition can be performed even when the terminal device is offline.

For example, step S130 (using the region detection model to recognize the intermediate image to obtain a connected image including M object connected regions) may include the following: processing the intermediate image with the region detection model to obtain a connected image including a plurality of initial object connected regions; and performing morphological transformation on that connected image to obtain, from it, the connected image including the M object connected regions.

FIG. 8 is a schematic diagram of a connected image including a plurality of initial object connected regions provided by at least one embodiment of the present disclosure. As shown in FIG. 8, such a connected image may have problems such as small white dots 801 and glued rows 802. For example, two adjacent text lines become glued together because some pixels between the lines are connected. In this case, morphological transformation can be applied to the connected image including the plurality of initial object connected regions to obtain the corrected connected image shown in FIG. 5 (that is, the connected image including the M object connected regions). In the corrected connected image, the small white dot 801 has been removed, and the glued row 802 has been split into the rows 504 and 505 shown in FIG. 5. The morphological transformation can include a closing operation and an opening operation. The opening operation smooths contours, breaks narrow necks (such as thin white lines), and eliminates small protrusions; it can be used, for example, to separate glued rows. The closing operation also smooths object contours but, in contrast to the opening operation, bridges narrow discontinuities and slender gullies, eliminates small holes, and fills breaks in contour lines; it can be used, for example, to remove the small white dots.
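To make the two operations concrete, here is a pure-NumPy sketch of binary erosion, dilation, opening, and closing with a k×k square structuring element. The 3×3 default size is an assumption, and object pixels are taken to be 1. With that polarity, opening removes isolated specks and thin bridges, while closing fills small holes and narrow gaps. Which operation addresses which artifact in FIG. 8 depends on the polarity of the connected image. In practice, OpenCV's `cv2.morphologyEx` with `cv2.MORPH_OPEN` or `cv2.MORPH_CLOSE` performs the same operations. As in OpenCV's default border handling, padding here never shrinks or grows regions at the image edge.

```python
import numpy as np


def _shifts(mask: np.ndarray, k: int, pad_value: int) -> np.ndarray:
    """Stack all k*k shifted copies of a binary mask (square structuring element)."""
    r = k // 2
    padded = np.pad(mask, r, mode="constant", constant_values=pad_value)
    h, w = mask.shape
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(k) for dx in range(k)])


def erode(mask: np.ndarray, k: int = 3) -> np.ndarray:
    # A pixel stays 1 only if its whole neighbourhood is 1 (border padded with 1).
    return _shifts(mask, k, 1).min(axis=0)


def dilate(mask: np.ndarray, k: int = 3) -> np.ndarray:
    # A pixel becomes 1 if any neighbour is 1 (border padded with 0).
    return _shifts(mask, k, 0).max(axis=0)


def opening(mask: np.ndarray, k: int = 3) -> np.ndarray:
    # Erosion then dilation: removes specks and thin bridges between regions.
    return dilate(erode(mask, k), k)


def closing(mask: np.ndarray, k: int = 3) -> np.ndarray:
    # Dilation then erosion: fills small holes and narrow gaps inside regions.
    return erode(dilate(mask, k), k)
```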

For example, step S140 (determining the M bounding boxes in the connected image corresponding to the M object connected regions) may include the following: extracting the contour information of each of the M object connected regions; and determining the bounding box of each of the M object connected regions based on the contour information.

For example, the contour information may be contour line information, such as the coordinates of the contour line. The contour line information can be extracted for each object connected region. The boundary points of the region in the X direction and the Y direction can then be determined from that information, and the minimum bounding box of the region, that is, its bounding box, determined from the boundary points. For example, the contour line information can be extracted with the contour extraction algorithms in OpenCV (a computer vision and machine learning software library), including, for example, the Canny edge detection algorithm and the Sobel edge detection algorithm.
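The sketch below, given as an assumption, extracts the contour (boundary) pixels of one binary region mask and derives the minimal box from the extreme contour points. In OpenCV, `cv2.findContours` followed by `cv2.boundingRect` is the usual library route; this NumPy version only illustrates the idea. It treats a foreground pixel as a contour pixel when any 4-neighbour is background. The names `contour_points` and `box_from_contour` are hypothetical.

```python
import numpy as np


def contour_points(region: np.ndarray) -> np.ndarray:
    """Return (x, y) coordinates of the boundary pixels of a binary region mask.

    A foreground pixel lies on the contour if any 4-neighbour is background
    (pixels outside the image count as background).
    """
    m = np.pad(region.astype(bool), 1, mode="constant", constant_values=False)
    core = m[1:-1, 1:-1]
    # A pixel is interior when its up, down, left and right neighbours are all set.
    interior = m[:-2, 1:-1] & m[2:, 1:-1] & m[1:-1, :-2] & m[1:-1, 2:]
    ys, xs = np.nonzero(core & ~interior)
    return np.stack([xs, ys], axis=1)


def box_from_contour(points: np.ndarray):
    """Minimal axis-aligned box (x_min, y_min, x_max, y_max) from contour points."""
    x_min, y_min = points.min(axis=0)
    x_max, y_max = points.max(axis=0)
    return int(x_min), int(y_min), int(x_max), int(y_max)
```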

For example, in step S150 (cropping N image blocks from the initial image based on the M bounding boxes), as mentioned above, M and N are equal in one example. In that case, one image block is cropped from the initial image for each of the M bounding boxes according to the correspondence between the intermediate image and the initial image.

For example, in one example, the connected image can be scaled (for example, enlarged) to the original size of the initial image before the bounding boxes are determined, so that the two sizes are consistent. The bounding box of each object connected region is then determined in the original-size connected image from the region's contour information, and each bounding box is mapped onto the initial image. After the connected image is scaled to the original size, the values of some newly added pixels are obtained by interpolation and therefore lie between 0 and 1. For convenience of processing, the scaled connected image can be binarized to convert it into a pure black-and-white image (for example, with a threshold of 127 when gray values range from 0 to 255, or 0.5 when they range from 0 to 1). The bounding boxes are then determined in the binarized connected image. In another example, the connected image is not scaled. Instead, the bounding boxes are determined in the connected image at its predetermined size and then enlarged according to the proportional relationship between the original size and the predetermined size, and each enlarged bounding box is mapped to the corresponding area of the initial image. It should be noted that other suitable methods may also be used to map the bounding boxes in the connected image onto the initial image, which is not specifically limited in the present disclosure.
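Both mapping routes can be sketched briefly. `binarize` thresholds an interpolated connected image back to 0/1. `scale_box` maps a box found at the predetermined size onto the original size using the per-axis size ratio. The floor/ceil rounding, which keeps the mapped box from shrinking, and both function names are assumptions for illustration; sizes are given as `(width, height)`.

```python
from typing import Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max)


def binarize(img: np.ndarray, threshold: float) -> np.ndarray:
    """Turn interpolated values back into a pure black/white image (0 or 1)."""
    return (img > threshold).astype(np.uint8)


def scale_box(box: Box, src_wh: Tuple[int, int], dst_wh: Tuple[int, int]) -> Box:
    """Map a box found in an image of size src_wh onto an image of size dst_wh
    using the proportional relationship between the two sizes on each axis."""
    sx = dst_wh[0] / src_wh[0]
    sy = dst_wh[1] / src_wh[1]
    x0, y0, x1, y1 = box
    # Floor the start and ceil the end so the mapped box never shrinks.
    return (int(np.floor(x0 * sx)), int(np.floor(y0 * sy)),
            int(np.ceil(x1 * sx)), int(np.ceil(y1 * sy)))
```

For example, a box `(10, 20, 50, 40)` found in a 320×320 connected image maps to `(40, 60, 200, 120)` on a 1280×960 initial image.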

In another example, predetermined processing can be performed on the M bounding boxes to obtain N processed bounding boxes. According to the correspondence between the intermediate image and the initial image, one image block is then cropped from the initial image for each processed bounding box; in this case, M and N may or may not be equal.

For example, performing predetermined processing on the M bounding boxes may include the following: scoring the M bounding boxes to obtain a quality score for each of them; and treating bounding boxes whose quality scores are smaller than a score threshold as invalid bounding boxes and removing them.

For example, scoring the M bounding boxes may include performing the following operations for each of the M bounding boxes: determining the area of the bounding box and the area of the pixels within the bounding box that correspond to the target object; and determining the quality score of the bounding box based on the ratio of that pixel area to the area of the bounding box.

For example, the bounding box can be mapped onto the binarized image shown in FIG. 4, in which the color of the characters differs from the background color. That is, character pixels and background pixels have different values; for example, character pixels have the value 1 and background pixels the value 0. To calculate the ratio of the area of the target object within the bounding box to the area of the bounding box, every pixel in the bounding box can be traversed and the pixels whose values equal the pixel value of the target object counted. Dividing this count by the total number of pixels in the bounding box gives the ratio of the area of the pixels corresponding to the target object to the area of the bounding box. In one example, the ratio can be used directly as the quality score of the bounding box. In another example, the ratio can be divided into several ranges, each corresponding to a score: for example, [0, 0.2) corresponds to a score of 1, [0.2, 0.4) to a score of 2, ..., and [0.8, 1] to a score of 5.

For example, in other embodiments, the quality score can be determined according to the inclination of the bounding box. For example, for the binarized image shown in FIG. 4, the quality score of a bounding box can be determined from the included angle between the bounding box and the X direction (or the Y direction). The included angle can be used directly as the quality score, or it can be divided into several ranges, each corresponding to a score. In addition, those skilled in the art may use other methods to score image blocks.

For example, after the quality scores of the bounding boxes are obtained, the bounding boxes whose quality scores are lower than a predetermined score threshold may be removed, and high-quality bounding boxes are retained. By scoring the bounding boxes and removing invalid bounding boxes, invalid content can be filtered out, subsequent invalid calculations can be avoided, and the accuracy of recognition results can be guaranteed.

For example, the score threshold may be set according to actual conditions. In some examples, the score threshold may be s times the highest predetermined score, where s is, for example, between 0.3 and 0.8. For example, if the quality score is a value between 0 and 1, the highest predetermined score is 1, and the score threshold can be a value between 0.3 and 0.8 (for example, 0.5). A bounding box whose quality score is greater than or equal to the score threshold can be regarded as a high-quality bounding box, and a bounding box whose quality score is smaller than the score threshold as an invalid bounding box.
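The scoring and filtering steps above can be sketched as follows, assuming a binarized image with object pixels equal to 1 and inclusive box corners. `quality_score` computes the object-pixel ratio; `bucket_score` maps it to the 1 to 5 scale using the [0, 0.2), [0.2, 0.4), ... ranges; and `filter_boxes` drops boxes below `s` times the highest predetermined score. All three names and the default `s = 0.5` are illustrative assumptions.

```python
import bisect
from typing import List, Sequence, Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def quality_score(binary: np.ndarray, box: Box, object_value: int = 1) -> float:
    """Ratio of object pixels to all pixels inside the box."""
    x0, y0, x1, y1 = box
    patch = binary[y0:y1 + 1, x0:x1 + 1]
    return float(np.count_nonzero(patch == object_value)) / patch.size


def bucket_score(ratio: float) -> int:
    """Map a ratio in [0, 1] to 1..5: [0, 0.2) -> 1, [0.2, 0.4) -> 2, ..., [0.8, 1] -> 5."""
    # Comparing against explicit bounds avoids floating-point division artifacts.
    return bisect.bisect_right([0.2, 0.4, 0.6, 0.8], ratio) + 1


def filter_boxes(binary: np.ndarray, boxes: Sequence[Box], s: float = 0.5,
                 max_score: float = 1.0) -> List[Box]:
    """Keep boxes whose score is at least s * max_score; the rest are invalid."""
    threshold = s * max_score
    return [b for b in boxes if quality_score(binary, b) >= threshold]
```

If the bucketed scores are used instead of raw ratios, the threshold would be computed with `max_score=5` and `bucket_score` applied to each ratio.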

For example, performing predetermined processing on the M bounding boxes may further include: enlarging one or more bounding boxes in the M bounding boxes by a first predetermined factor.

For example, some bounding boxes may have too small a range to enclose the target object completely; for example, some characters of a text line lie outside the bounding box or are only partly included in it. To solve this problem, these bounding boxes can be enlarged so that they include the parts of the target object they previously missed. For example, a bounding box can be enlarged according to k times (k being the first predetermined multiple) its area-to-perimeter ratio, with the center of the bounding box as the center of enlargement. Here k is, for example, a positive number greater than 1 and less than 2, such as 1.6. For example, the enlarged bounding box corresponding to any bounding box may completely cover that bounding box.
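The text does not spell out how "k times the area-to-perimeter ratio" turns into a new box. One reading, assumed here, is to push every side outward by the distance d = k·A/P, where A is the box area and P its perimeter. This resembles the "unclip" offset used in DBNet-style text detectors. Moving all four sides by the same distance keeps the center fixed and guarantees that the enlarged box covers the original. The function name `enlarge_box` is hypothetical.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def enlarge_box(box: Box, k: float = 1.6) -> Box:
    """Push every side of the box outward by d = k * area / perimeter.

    Moving all four sides by the same distance keeps the box centre fixed,
    so the enlarged box fully covers the original one.
    """
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    d = k * (w * h) / (2 * (w + h))
    return (x0 - d, y0 - d, x1 + d, y1 + d)
```

For a 100×20 box, A/P = 2000/240, so with k = 1.6 each side moves out by about 13.3 pixels. Because the offset grows with box size, large boxes are padded more in absolute terms.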

For example, all M bounding boxes can be enlarged, or several bounding boxes with a smaller range can be selected from the M bounding boxes and enlarged. For example, it can be detected whether any part of the target object lies outside a bounding box, for example, by detecting whether a certain number of pixels of the target object exist within a predetermined peripheral range of the bounding box; if so, the bounding box can be enlarged. For example, the predetermined peripheral range may be the annular area between the bounding box and a virtual bounding box obtained by enlarging the bounding box t times about its center point, where t is, for example, greater than 1 and less than 2. For example, the M bounding boxes include a first bounding box. The first bounding box is enlarged t times about its center point to obtain a first virtual bounding box, and the annular area between the first virtual bounding box and the first bounding box serves as the predetermined peripheral range of the first bounding box. For example, the bounding box includes two first sides extending along the X direction and two second sides extending along the Y direction, and both the first sides and the second sides are enlarged t times to obtain the virtual bounding box. In this case, the distance between the center point of the bounding box and a first side of the virtual bounding box is t times the distance between the center point and the corresponding first side of the bounding box. For example, if the distance between the center point and a first side of the bounding box is 5 mm, the distance between the center point and the first side of the virtual bounding box is 5t mm. Likewise, the distance between the center point of the bounding box and a second side of the virtual bounding box is t times the distance between the center point and the corresponding second side of the bounding box.
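The annular check can be sketched as follows. The virtual box is built by placing each side t times farther from the center, and the object pixels in the ring are counted as (pixels inside the virtual box) minus (pixels inside the box). The defaults `t = 1.5` and `min_pixels = 5` (standing in for "a certain number of pixels"), as well as the name `needs_enlarging`, are illustrative assumptions.

```python
from typing import Tuple

import numpy as np

Box = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max), inclusive


def needs_enlarging(binary: np.ndarray, box: Box, t: float = 1.5,
                    min_pixels: int = 5, object_value: int = 1) -> bool:
    """Count object pixels in the ring between the box and a virtual box
    scaled t times about the box centre; True if there are at least min_pixels."""
    h, w = binary.shape
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    # Each side of the virtual box is t times farther from the centre (clipped to the image).
    vx0 = max(int(np.floor(cx - t * (cx - x0))), 0)
    vy0 = max(int(np.floor(cy - t * (cy - y0))), 0)
    vx1 = min(int(np.ceil(cx + t * (x1 - cx))), w - 1)
    vy1 = min(int(np.ceil(cy + t * (y1 - cy))), h - 1)
    outer = binary[vy0:vy1 + 1, vx0:vx1 + 1] == object_value
    inner = binary[y0:y1 + 1, x0:x1 + 1] == object_value
    # Ring count = pixels in the virtual box minus pixels in the box itself.
    return int(outer.sum()) - int(inner.sum()) >= min_pixels
```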

For example, the enlarging operation can be performed after the invalid bounding boxes are removed. In this case, N bounding boxes remain after the removal. Either all N bounding boxes can be enlarged, or several bounding boxes with a smaller range can be selected from them and enlarged.

For example, performing predetermined processing on the M bounding boxes may further include the following: detecting whether each pair of adjacent bounding boxes among the M bounding boxes at least partially overlaps; and, if so, reducing each of the two overlapping bounding boxes by a second predetermined multiple, so that the two reduced bounding boxes do not overlap or their overlapping area is reduced.

For example, some bounding boxes may enclose too large an area, causing two adjacent bounding boxes to partially overlap. To solve this problem, these bounding boxes can be reduced so that the two reduced bounding boxes do not overlap or their overlapping area is smaller. For example, the degree of overlap between each pair of adjacent bounding boxes can be calculated, for example, as the MIoU (Mean Intersection over Union, an evaluation metric from semantic segmentation) value between them. Each bounding box is then reduced by a multiple of 0.9*(1-MIoU); that is, the second predetermined multiple is, for example, 0.9*(1-MIoU). The second predetermined multiple is, for example, a value between 0.5 and 0.9, meaning the bounding box is reduced to 0.5 to 0.9 times its original size. The reduction can be performed after the enlargement described above. This prevents adjacent bounding boxes from becoming connected or overlapping after enlargement, so that each bounding box has a suitable range and an image block of appropriate size can then be cropped from the initial image.
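A sketch of the overlap handling, under stated assumptions. For a single pair of boxes, the "mean" in MIoU reduces to the plain intersection-over-union, which is computed here. Both boxes are scaled about their centers by 0.9·(1 − IoU). Because that factor can fall below 0.5 for heavily overlapping boxes, this sketch clamps it at a lower bound of 0.5 to stay within the 0.5 to 0.9 range given above; the clamp and all function names are assumptions.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def _area(r: Box) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0


def shrink_box(box: Box, factor: float) -> Box:
    """Scale the box's width and height by factor about its centre."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    hw, hh = (box[2] - box[0]) * factor / 2, (box[3] - box[1]) * factor / 2
    return (cx - hw, cy - hh, cx + hw, cy + hh)


def shrink_if_overlapping(a: Box, b: Box, lower: float = 0.5) -> Tuple[Box, Box]:
    """If a and b overlap, shrink both by 0.9 * (1 - IoU), clamped below at lower."""
    v = iou(a, b)
    if v == 0.0:
        return a, b
    factor = max(0.9 * (1.0 - v), lower)
    return shrink_box(a, factor), shrink_box(b, factor)
```

For two 10×10 boxes offset by half their width, IoU = 1/3, the factor is 0.6, and the shrunken boxes overlap far less than before.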

For example, after the N image blocks are cropped, at least some of them can be scaled so that the processed N image blocks have the same size in the Y direction. For example, the size of every image block in the Y direction can be uniformly scaled to 32 pixels, which facilitates processing by the subsequent object recognition model.
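A minimal sketch of the height normalization using nearest-neighbour sampling in NumPy. The disclosure fixes only the Y size (32 pixels). Scaling the width by the same factor to preserve the aspect ratio, the choice of nearest-neighbour interpolation, and the name `resize_to_height` are assumptions. A production pipeline would more likely call `cv2.resize`.

```python
import numpy as np


def resize_to_height(block: np.ndarray, target_h: int = 32) -> np.ndarray:
    """Nearest-neighbour resize of an image block to target_h rows,
    scaling the width by the same factor to keep the aspect ratio."""
    h, w = block.shape[:2]
    new_w = max(1, round(w * target_h / h))
    # Source row/column index for each destination row/column.
    rows = (np.arange(target_h) * h / target_h).astype(int)
    cols = (np.arange(new_w) * w / new_w).astype(int)
    # Fancy indexing also carries any trailing channel axis through unchanged.
    return block[rows[:, None], cols[None, :]]
```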

Fig. 9 is a schematic flowchart of identifying N image blocks provided by at least one embodiment of the present disclosure. As shown in Fig. 9, for example, in step S160 (using an object recognition model to identify N image blocks to obtain the target object), steps S161 to S164 may be included.

Step S161: Determine, among the N image blocks, P first image blocks whose length in the first direction is greater than a recognition length threshold. Divide each first image block into at least two sub-image blocks, thereby obtaining a plurality of sub-image blocks corresponding to each first image block, where the length of each sub-image block is equal to or smaller than the recognition length threshold. For example, P is a positive integer.

Step S162: Using the object recognition model to identify a plurality of sub-image blocks to obtain target objects in the P first image blocks. For example, the target object in the initial image includes the target object in the P first image blocks.

Step S163: Determine Q second image blocks whose length in the first direction is smaller than the recognition length threshold among the N image blocks, and process each second image block to obtain Q processed second image blocks, The length of each processed second image block in the first direction is the recognition length threshold. For example, Q is a positive integer.

Step S164: Use the object recognition model to identify the Q processed second image blocks, so as to obtain the target objects in the Q second image blocks. For example, the target object in the initial image also includes the target object in the Q second image blocks.

For example, in some embodiments, the N image blocks may only include the first image block and not include the second image block. In this case, in step S160, only step S161 and step S162 may be executed without executing Step S163 and Step S164. In some other embodiments, the N image blocks may only include the second image block but not the first image block. In this case, in step S160, only step S163 and step S164 may be performed without performing step S161 and step S162.

For example, each first image block includes multiple target objects arranged in sequence along the first direction. The first direction can be the length direction of the image block, which can be determined according to the arrangement direction of the target objects in the image block. For example, as shown in FIG. 7B, the characters in the image block are arranged along the X direction, so the first direction may refer to the X direction.

For example, the length of an image block can be expressed as a number of pixels, and a recognition length threshold can be preset, for example, 400 to 1000 pixels, such as 640 pixels. An image block among the N image blocks whose length exceeds the recognition length threshold can be segmented, for example, into several sub-image blocks whose lengths are less than or equal to the threshold. An image block whose length is smaller than the recognition length threshold can be processed so that its length equals the threshold. On the one hand, processing the image blocks into approximately uniform sizes makes them easier for the model to process. On the other hand, dividing large image blocks into small ones reduces the model's computation, so that a simple recognition model can be used and recognition is faster.
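The dispatch implied by steps S161 to S164 can be sketched as a simple routing function: blocks longer than the threshold L go to segmentation (S161, S162) and shorter blocks go to padding (S163, S164). Blocks whose length is exactly L belong to neither group in the text, so this sketch returns them separately for direct recognition. Length is taken as the pixel width along X. The name `route_blocks` and the default L = 640 are illustrative.

```python
from typing import List, Tuple

import numpy as np


def route_blocks(blocks: List[np.ndarray], L: int = 640
                 ) -> Tuple[List[np.ndarray], List[np.ndarray], List[np.ndarray]]:
    """Split image blocks by their length along the first (X) direction.

    Returns (first, second, exact): blocks longer than L (to be segmented),
    blocks shorter than L (to be padded), and blocks already exactly L long.
    """
    first, second, exact = [], [], []
    for b in blocks:
        length = b.shape[1]  # width in pixels = extent along X
        if length > L:
            first.append(b)
        elif length < L:
            second.append(b)
        else:
            exact.append(b)
    return first, second, exact
```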

For example, in step S161, dividing each first image block into at least two sub-image blocks may include performing the following operations on the i-th first image block among the P first image blocks: setting, in the first direction, a candidate segmentation point at every interval equal to the recognition length threshold, to determine at least one candidate segmentation point corresponding to the i-th first image block; determining, based on the at least one candidate segmentation point, at least one segmentation point corresponding to the i-th first image block; and dividing the i-th first image block into at least two sub-image blocks based on the at least one segmentation point, where i is a positive integer less than or equal to P.

FIG. 10A is a schematic diagram of segmenting an image block provided by at least one embodiment of the present disclosure. As shown in FIG. 10A, the segmentation process is described taking the image block 704 as an example. Starting from the starting point 901 of the image block 704, a candidate segmentation point is set at every interval equal to the recognition length threshold L; for example, the candidate segmentation points 902 and 903 are obtained. A segmentation point can be determined from each candidate segmentation point. In one example, the candidate segmentation point is used directly as a segmentation point; for example, the candidate segmentation point 902 can be used as a segmentation point. In another example, a point within a predetermined distance range of the candidate segmentation point is used as the segmentation point. The predetermined distance range can be, for example, the range [pc-lg, pc+lg] in the X direction, where pc is the X coordinate of the candidate segmentation point, lg is the length of g pixels, and g is, for example, between 12 and 60.

For example, a segmentation point 903′ is determined within the predetermined distance range of the candidate segmentation point 903. After the segmentation points are obtained, the image block can be cut along them; for example, cutting yields the sub-image blocks 7041, 7042, and 7043.

For example, determining at least one segmentation point corresponding to the i-th first image block based on the at least one candidate segmentation point may include the following. If the distance-threshold range of any candidate segmentation point of the i-th first image block includes an interval area, a point in that interval area is taken as a segmentation point corresponding to the i-th first image block. If the distance-threshold range of that candidate segmentation point does not include an interval area, the candidate segmentation point itself is taken as a segmentation point corresponding to the i-th first image block.

For example, when the target object includes characters and a candidate segmentation point lies exactly in the interval area between two adjacent characters, the candidate segmentation point can be used as a segmentation point. For example, the candidate segmentation point 902 lies in the interval area between the character "," and the character "Building", so it can be used as a segmentation point. If a candidate segmentation point does not lie in an interval area between two adjacent characters, an interval area near the candidate segmentation point can be found and a point in that interval area used as the segmentation point. For example, the candidate segmentation point 903 lies within the character "Beijing" rather than in an interval between characters. The pixels within a certain distance of the candidate segmentation point 903 can therefore be traversed to find an interval area near "Beijing". For example, the interval area between the character "," (located between the character "Road" and the character "Beijing") and the character "Beijing" lies within that distance. A point in this interval area, for example, its midpoint, can then be taken as the segmentation point. If there is no character interval area within the predetermined distance range of the candidate segmentation point, the candidate segmentation point itself can be used as the segmentation point.
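The segmentation rule can be sketched on a binarized block (object pixels = 1) under these assumptions: a column with no object pixels counts as an interval area, and when a candidate falls on a character, the object-free column run nearest to it within [pc − g, pc + g] is found and its midpoint used. `split_points`, `cut_block`, and the defaults L = 640 and g = 30 are hypothetical. `cut_block` uses the real `np.split` with a list of column indices.

```python
from typing import List

import numpy as np


def split_points(binary: np.ndarray, L: int = 640, g: int = 30,
                 object_value: int = 1) -> List[int]:
    """Choose segmentation X coordinates for a block longer than L.

    A candidate pc is placed every L pixels from the start. If column pc holds
    no object pixels it is used as-is; otherwise the midpoint of the nearest
    object-free column run inside [pc - g, pc + g] is used, falling back to pc.
    """
    width = binary.shape[1]
    empty = ~(binary == object_value).any(axis=0)  # True for gap (interval) columns
    points = []
    for pc in range(L, width, L):
        if empty[pc]:
            points.append(pc)
            continue
        lo, hi = max(pc - g, 0), min(pc + g, width - 1)
        best = None
        x = lo
        while x <= hi:
            if empty[x]:
                start = x
                # Extend to the end of this run of empty columns.
                while x + 1 <= hi and empty[x + 1]:
                    x += 1
                mid = (start + x) // 2
                if best is None or abs(mid - pc) < abs(best - pc):
                    best = mid
            x += 1
        points.append(best if best is not None else pc)
    return points


def cut_block(block: np.ndarray, points: List[int]) -> List[np.ndarray]:
    """Cut a block into sub-image blocks at the given X coordinates."""
    return np.split(block, points, axis=1)
```

Because the search window extends to the right of the candidate, a snapped point can leave a sub-image block slightly longer than L (648 pixels in the test case). If the "equal to or smaller than the threshold" condition must hold strictly, the search can be limited to [pc − g, pc].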

For example, after the segmentation points are determined, the first image block can be cut at their positions to obtain several sub-image blocks.

For example, processing each second image block may include splicing, in the first direction, an end image block onto at least one end of each second image block to obtain the processed second image block corresponding to that second image block. For example, the pixel value of each pixel in the end image block differs from the pixel values of the pixels corresponding to the target objects in the second image block.

FIG. 10B is a schematic diagram of splicing end image blocks provided by at least one embodiment of the present disclosure. As shown in FIG. 10B, the splicing process is described taking image block 702 as an example. For example, because the length of image block 702 is less than the recognition length threshold L, image block 702 can be padded: an end image block can be spliced onto one side or both sides of image block 702 in the X direction. The pixel values of the end image block differ from the pixel values of the target objects and may, for example, match the pixel values of the background of image block 702. The length of the new image block 702' obtained after splicing is, for example, equal to the recognition length threshold L.

For example, a sub-image block obtained by cutting whose length is less than the recognition length threshold L may also be padded by splicing. For example, as shown in FIG. 10A, if the length of sub-image block 7043 obtained after cutting is less than the recognition length threshold L, sub-image block 7043 can be processed in the splicing manner described above, so that the length of the processed sub-image block 7043 is equal to the recognition length threshold L.
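The padding described above can be sketched as a function that splices background-valued columns onto a block until it reaches the recognition length threshold. Treating the most frequent pixel value as the background is an assumption of this sketch, as are the helper names `estimate_background` and `pad_to_length`. The `side` parameter mirrors the "one side or both sides" option in the text.

```python
from collections import Counter


def estimate_background(rows):
    """Most frequent pixel value, assumed here to be the block's background."""
    return Counter(p for row in rows for p in row).most_common(1)[0][0]


def pad_to_length(rows, target_len, background, side="right"):
    """Splice background-valued end columns onto a block until it is target_len wide."""
    missing = target_len - len(rows[0])
    if missing < 0:
        raise ValueError("block is already longer than the recognition length threshold")
    if side == "right":
        left, right = 0, missing
    elif side == "both":
        left = missing // 2
        right = missing - left
    else:
        raise ValueError("side must be 'right' or 'both'")
    return [[background] * left + list(row) + [background] * right for row in rows]


block = [[0, 255], [255, 255]]
print(pad_to_length(block, 5, estimate_background(block)))
# [[0, 255, 255, 255, 255], [255, 255, 255, 255, 255]]
```

For 3-channel blocks the same code works if each pixel is a tuple, because the background value is simply repeated.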

For example, after the sub-image blocks and the processed second image blocks whose lengths match the recognition length threshold are obtained by cutting and/or splicing, the object recognition model can be used to recognize each sub-image block and each processed second image block. Taking the recognition of English letters as an example, if each English letter or punctuation mark occupies, for example, 4 pixels in length, then 640/4 = 160 English letters can be recognized in an image block of 32×640×3. Here, 32 denotes an image block height of 32 pixels, 640 denotes an image block length of 640 pixels, and 3 denotes that the image block has 3 channels.

For example, the object recognition model can be trained to output d possible candidate recognition results for each target object, where d is an integer greater than 0 and less than 5. Taking the recognition of English letters as an example, when d is 2, the object recognition model may output "m" and "n" as the candidate recognition results for the English letter "m" in an image block. For a 32×640×3 image block, 160×d recognition results can be returned: each character occupies 4 pixels, so there are 160 character positions, and d is the number of candidate recognition results the object recognition model produces for each character. The argmax function can then be applied to the 160×d recognition results to return 160 recognition results, which is equivalent to selecting the most likely recognition result from the candidates for each character. In some embodiments, because recognition is performed in steps of 4 pixels, the same character may be recognized more than once, so a deduplication operation can remove the repeated recognition results to obtain the final recognition result of the image block.
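A minimal sketch of the slot arithmetic and the argmax-plus-deduplication step, assuming the model output for one block is a list of slots, each holding d (character, score) pairs. Collapsing consecutive repeats is the greedy decoding commonly used with CTC-style recognizers. The optional `blank` symbol is an assumption borrowed from that scheme and is not mentioned in the disclosure. It allows genuine double letters such as "ll" to survive when the model emits a blank between them.

```python
CHAR_WIDTH = 4                      # assumed width of one character slot, in pixels
BLOCK_WIDTH = 640                   # length of a 32x640x3 image block
SLOTS = BLOCK_WIDTH // CHAR_WIDTH   # 640 / 4 = 160 character positions


def decode(candidates, blank=None):
    """Greedy decoding of one block.

    candidates[t] holds the d (character, score) pairs the model returns for slot t.
    """
    # argmax over the d candidates of each slot: one character per slot
    best = [max(slot, key=lambda pair: pair[1])[0] for slot in candidates]
    # deduplicate: overlapping 4-pixel steps can report one character several times
    collapsed = [ch for i, ch in enumerate(best) if i == 0 or ch != best[i - 1]]
    # optionally drop a blank/separator symbol (an assumption, not from the source)
    return "".join(ch for ch in collapsed if ch != blank)


example = [[("m", 0.9), ("n", 0.1)], [("m", 0.6), ("n", 0.4)],
           [("a", 0.2), ("-", 0.8)], [("a", 0.7), ("o", 0.3)]]
print(decode(example))             # m-a
print(decode(example, blank="-"))  # ma
```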

FIG. 11 is a schematic diagram of a target object recognition result provided by at least one embodiment of the present disclosure. Referring to FIG. 2, FIG. 7A, FIG. 7B and FIG. 11, for example, the recognition results obtained for each sub-image block and each processed second image block are combined and spliced to obtain the object recognition result 1100 corresponding to the initial image. In other words, FIG. 11 shows the result obtained after the initial image 201 is processed by the image processing method provided by the present disclosure. As shown in FIG. 11, all characters (i.e., target objects) in the initial image 201 are recognized in the object recognition result 1100, and the relative positions of the characters in the object recognition result 1100 are the same as their relative positions in the initial image 201.
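One possible way to combine per-block results while keeping their relative positions is to sort the blocks by the top-left corner of their bounding boxes and group them into text lines. This is only a sketch under assumptions: the disclosure does not specify the ordering rule. The function name `assemble`, the `line_tolerance` parameter and the separator choice are all hypothetical. For Chinese text one would typically pass `sep=""`, and sub-image blocks cut from one first image block can be passed with their own x offsets so they are rejoined left to right.

```python
def assemble(results, line_tolerance, sep=" "):
    """Arrange per-block recognition results in reading order.

    results: (x, y, text) tuples, where (x, y) is the top-left corner of the
    block's bounding box in the initial image.
    """
    lines = []  # each entry: [reference_y, [(x, text), ...]]
    for x, y, text in sorted(results, key=lambda r: (r[1], r[0])):
        if lines and abs(y - lines[-1][0]) <= line_tolerance:
            lines[-1][1].append((x, text))   # same text line as the previous block
        else:
            lines.append([y, [(x, text)]])   # start a new text line
    # within a line, order blocks left to right; lines stay top to bottom
    return "\n".join(sep.join(t for _, t in sorted(items)) for _, items in lines)


print(assemble([(100, 12, "world"), (10, 10, "hello"), (10, 50, "second line")], 5))
# hello world
# second line
```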

At least one embodiment of the present disclosure further provides an image processing device, and FIG. 12 is a schematic block diagram of an image processing device provided by at least one embodiment of the present disclosure.

As shown in FIG. 12, the image processing device may include an image acquisition module 1201, an image processing module 1202, a region recognition module 1203, a determination module 1204, an interception module 1205 and an object recognition module 1206.

For example, the image acquisition module 1201 is configured to acquire an initial image, and the initial image includes at least one target object. For example, the image acquisition module 1201 may execute step S110 described in FIG. 1. For details, refer to the related description of step S110, which will not be repeated here.

For example, the image processing module 1202 is configured to process the initial image to obtain an intermediate image. For example, the image processing module 1202 may execute step S120 described in FIG. 1. For details, refer to the related description of step S120, which will not be repeated here.

For example, the region recognition module 1203 is configured to use a region detection model to recognize the intermediate image to obtain a connected image including M object connected regions, where M is a positive integer. For example, the region recognition module 1203 may execute step S130 described in FIG. 1. For details, refer to the related description of step S130, which will not be repeated here.

For example, the determination module 1204 is configured to determine, in the connected image, M bounding boxes respectively corresponding to the M object connected regions. For example, the determination module 1204 may execute step S140 described in FIG. 1. For details, refer to the related description of step S140, which will not be repeated here.
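To make the notions of a "connected image" and its bounding boxes concrete, the sketch below substitutes a classical pipeline for the trained region detection model. It uses threshold binarization, a horizontal morphological dilation that merges neighbouring characters, and breadth-first labeling of 4-connected regions. This substitution is an assumption for illustration only and is not the model used in the disclosure. The resizing step and the threshold value are also omitted or assumed. The box of a labeled region equals the extent of its contour, which is what the bounding box step needs.

```python
from collections import deque


def binarize(gray, threshold=128):
    """1 for dark (ink) pixels, 0 for background; assumes dark text on a light page."""
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def dilate_horizontal(binary, radius):
    """Dilation with a 1 x (2*radius + 1) structuring element, which merges
    horizontally neighbouring characters into one connected region."""
    width = len(binary[0])
    return [[1 if any(row[max(0, x - radius):x + radius + 1]) else 0
             for x in range(width)] for row in binary]


def bounding_boxes(binary):
    """Label 4-connected foreground regions; return inclusive (x0, y0, x1, y1) boxes."""
    height, width = len(binary), len(binary[0])
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for sy in range(height):
        for sx in range(width):
            if not binary[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sx, sy)])
            x0, y0, x1, y1 = sx, sy, sx, sy
            while queue:  # breadth-first flood fill of one region
                x, y = queue.popleft()
                x0, y0, x1, y1 = min(x0, x), min(y0, y), max(x1, x), max(y1, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and binary[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            boxes.append((x0, y0, x1, y1))
    return boxes


gray = [[0, 255, 0, 255, 255, 255],
        [255, 255, 255, 255, 255, 255],
        [255, 255, 255, 255, 0, 0]]
print(bounding_boxes(binarize(gray)))                        # [(0, 0, 0, 0), (2, 0, 2, 0), (4, 2, 5, 2)]
print(bounding_boxes(dilate_horizontal(binarize(gray), 1)))  # [(0, 0, 3, 0), (3, 2, 5, 2)]
```

The dilation is what turns separate characters on one line into a single region, which illustrates how a morphological transformation can reduce many initial regions to M object connected regions.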

For example, the interception module 1205 is configured to intercept N image blocks from the initial image based on the M bounding boxes, where each image block includes at least one target object and N is a positive integer. For example, the interception module 1205 may execute step S150 described in FIG. 1. For details, refer to the related description of step S150, which will not be repeated here.

For example, the object recognition module 1206 is configured to use an object recognition model to recognize the N image blocks, so as to obtain the target object in the initial image. For example, the object recognition module 1206 may execute step S160 described in FIG. 1. For details, refer to the related description of step S160, which will not be repeated here.

In addition, the image processing device can achieve technical effects similar to those of the aforementioned image processing method, which will not be repeated here.
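As a sketch only, the six modules of FIG. 12 can be wired together as a small Python dataclass in which each module is any callable. This matches the software option in which the modules are code executed by a processor. The class and field names are hypothetical. The one structural point it encodes from the text is that blocks are intercepted from the initial image rather than from the intermediate image.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ImageProcessingDevice:
    """Wires the six modules of FIG. 12 together; each module is any callable."""
    acquire: Callable[[], Any]                          # 1201: step S110
    preprocess: Callable[[Any], Any]                    # 1202: step S120
    detect_regions: Callable[[Any], Any]                # 1203: step S130
    find_boxes: Callable[[Any], List[Any]]              # 1204: step S140
    crop_blocks: Callable[[Any, List[Any]], List[Any]]  # 1205: step S150
    recognize: Callable[[List[Any]], Any]               # 1206: step S160

    def run(self) -> Any:
        initial = self.acquire()
        intermediate = self.preprocess(initial)
        connected = self.detect_regions(intermediate)
        boxes = self.find_boxes(connected)
        # Blocks are cut from the initial image, not the intermediate one.
        blocks = self.crop_blocks(initial, boxes)
        return self.recognize(blocks)


device = ImageProcessingDevice(
    acquire=lambda: "page",
    preprocess=lambda img: img + ":small",
    detect_regions=lambda img: img + ":regions",
    find_boxes=lambda conn: [conn],
    crop_blocks=lambda img, boxes: [(img, b) for b in boxes],
    recognize=lambda blocks: len(blocks),
)
print(device.run())  # 1
```

In a real system, `crop_blocks` would also map box coordinates from the intermediate image's size back to the initial image's size.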

For example, the image acquisition module 1201, the image processing module 1202, the region recognition module 1203, the determination module 1204, the interception module 1205 and/or the object recognition module 1206 may include code and programs stored in a memory, and a processor can execute the code and programs to implement some or all of the functions of these modules described above. For example, these modules may instead be dedicated hardware devices that implement some or all of the functions described above. For example, these modules may also be one circuit board or a combination of multiple circuit boards for implementing the functions described above. In the embodiments of the present application, the one circuit board or the combination of multiple circuit boards may include: (1) one or more processors; (2) one or more non-transitory memories connected to the processors; and (3) processor-executable firmware stored in the memories.

At least one embodiment of the present disclosure further provides an electronic device, and FIG. 13 is a schematic block diagram of the electronic device provided by at least one embodiment of the present disclosure.

For example, as shown in FIG. 13, an electronic device 1300 includes a processor 1301, a communication interface 1302, a memory 1303 and a communication bus 1304. The processor 1301, the communication interface 1302 and the memory 1303 communicate with each other through the communication bus 1304, and the processor 1301, the communication interface 1302, the memory 1303 and other components may also communicate through a network connection. The present disclosure does not limit the type or function of the network.

For example, the memory 1303 is used to store computer-readable instructions. When the processor 1301 executes the computer-readable instructions, the image processing method according to any of the foregoing embodiments is implemented. For the specific implementation of each step of the image processing method and related explanations, reference may be made to the above embodiments of the image processing method, and details are not repeated here.

For example, other implementations of the image processing method implemented by the processor 1301 executing the program stored in the memory 1303 are the same as the implementations mentioned in the foregoing method embodiments, and will not be repeated here.

For example, the communication bus 1304 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in the figure, but this does not mean that there is only one bus or one type of bus.

For example, the communication interface 1302 is used to implement communication between the electronic device and other devices.

For example, the processor 1301 and the memory 1303 may be located on the server side (or in the cloud).

For example, the processor 1301 may control other components in the electronic device to perform desired functions. The processor 1301 may be a central processing unit (CPU), a network processor (NP), or the like. It may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The central processing unit (CPU) may have an x86 or ARM architecture, or the like.

For example, the memory 1303 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory. The non-volatile memory may include, for example, read-only memory (ROM), hard disks, erasable programmable read-only memory (EPROM), compact disc read-only memory (CD-ROM), USB memory, flash memory, and the like. One or more computer-readable instructions can be stored on the computer-readable storage medium, and the processor 1301 can execute the computer-readable instructions to implement various functions of the electronic device. Various application programs and various data can also be stored in the storage medium.

For example, for a detailed description of the image processing performed by the electronic device, reference may be made to the relevant descriptions in the embodiments of the image processing method, and details are not repeated here.

Fig. 14 is a schematic block diagram of another electronic device provided by at least one embodiment of the present disclosure.

At least one embodiment of the present disclosure also provides another electronic device. As shown in FIG. 14, an electronic device 1400 may include a memory 1401, a processor 1402 and an image acquisition component 1403. It should be noted that the components of the electronic device 1400 shown in FIG. 14 are exemplary rather than limiting, and the electronic device 1400 may also have other components according to actual application requirements.

For example, the image acquiring component 1403 is used to acquire an initial image. The memory 1401 is used to store initial images and computer readable instructions. Processor 1402 is used to read the initial image and execute computer readable instructions. When the computer-readable instructions are executed by the processor 1402, one or more steps in the image processing method according to any of the above-mentioned embodiments are executed.

For example, the image acquisition component 1403 may be an image acquisition device such as the camera of a smartphone, the camera of a tablet computer, the camera of a personal computer, the lens of a digital camera, a network camera, or another device capable of acquiring images.

For example, the initial image may be an original image directly captured by the image acquisition component 1403, or an image obtained by preprocessing the original image. Preprocessing can remove irrelevant information or noise from the original image so that the image can be processed better. Preprocessing may include, for example, data augmentation, image scaling, gamma correction, image enhancement, or noise-reduction filtering of the original image.
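Of the preprocessing options listed above, gamma correction has a simple closed form. A common convention maps an 8-bit value v to 255·(v/255)^γ, which the sketch below applies through a 256-entry lookup table. Conventions differ: some libraries use the exponent 1/γ instead. Both the convention and the function name `gamma_correct` are assumptions of this sketch. With this form, γ < 1 brightens dark regions and γ > 1 darkens them.

```python
def gamma_correct(gray, gamma):
    """Map each 8-bit value v to round(255 * (v / 255) ** gamma) via a lookup table."""
    lut = [round(255 * (v / 255) ** gamma) for v in range(256)]
    return [[lut[p] for p in row] for row in gray]


print(gamma_correct([[0, 64, 255]], 0.5))  # [[0, 128, 255]]
```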

For example, the processor 1402 may control other components in the electronic device 1400 to perform desired functions. The processor 1402 may be a central processing unit (CPU), a tensor processing unit (TPU), or a graphics processing unit (GPU), etc., which has data processing capabilities and/or program execution capabilities.

For example, the memory 1401 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. One or more computer-readable instructions can be stored on the computer-readable storage medium, and the processor 1402 can execute the computer-readable instructions to implement various functions of the electronic device 1400.

For example, for a detailed description of the image processing performed by the electronic device 1400, reference may be made to the relevant descriptions in the embodiments of the image processing method, and details are not repeated here.

FIG. 15 is a schematic diagram of a computer-readable storage medium provided by at least one embodiment of the present disclosure. For example, as shown in FIG. 15, one or more computer-readable instructions 1501 may be stored non-transitorily on the storage medium 1500. For example, when the computer-readable instructions 1501 are executed by a processor, one or more steps of the image processing method described above may be performed.

For example, the storage medium 1500 may be applied in the electronic device 1300 and/or the electronic device 1400. For example, it may include the memory 1303 in the electronic device 1300 and/or the memory 1401 in the electronic device 1400.

For example, for the description of the storage medium 1500, reference may be made to the description of the memory in the embodiments of the electronic device 1300 and/or the electronic device 1400, and details are not repeated here.

Fig. 16 is a schematic diagram of a hardware environment provided by at least one embodiment of the present disclosure. The electronic device provided by the present disclosure can be applied in the Internet system.

The image processing apparatus, the electronic device 1300 and/or the electronic device 1400 involved in the present disclosure may be implemented using the computer system shown in FIG. 16. Such computer systems may include personal computers, laptops, tablets, mobile phones, personal digital assistants, smart glasses, smart watches, smart rings, smart helmets, and any other smart portable or wearable device. The specific system in this embodiment uses a functional block diagram to illustrate a hardware platform that includes a user interface. Such a computer device may be a general-purpose computer device or a special-purpose computer device, and either can be used to implement the image processing apparatus and the electronic devices in this embodiment. The computer system can implement any of the components that provide the information required for the image processing and recognition described herein. For example, the computer system can be realized by a computer device through its hardware, software programs, firmware, or a combination thereof. For convenience, only one computer device is drawn in FIG. 16, but the computer functions described in this embodiment for providing the information required for image processing may be implemented in a distributed manner by a group of similar platforms, so as to distribute the processing load of the computer system.

As shown in FIG. 16, the computer system may include a communication port 1650 connected to a network for data communication. For example, the computer system can send and receive information and data through the communication port 1650. That is, the communication port 1650 enables the computer system to communicate wirelessly or by wire with other electronic devices to exchange data. The computer system may also include a processor group 1620 (i.e., the processor described above) for executing program instructions. The processor group 1620 may consist of at least one processor (e.g., a CPU). The computer system may include an internal communication bus 1610. The computer system may include different forms of program storage units and data storage units (i.e., the memory or storage medium described above), such as a hard disk 1670, a read-only memory (ROM) 1630 and a random access memory (RAM) 1640. These can be used to store various data files used by the computer for processing and/or communication, as well as program instructions that may be executed by the processor group 1620. The computer system may also include an input/output component 1660 for enabling input/output data flow between the computer system and other components (e.g., a user interface 1680).

Typically, the following devices can be connected to the input/output component 1660: input devices including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer and a gyroscope; output devices including, for example, a liquid crystal display (LCD), a speaker and a vibrator; storage devices including, for example, magnetic tapes and hard disks; and communication interfaces.

While FIG. 16 shows a computer system with various devices, it should be understood that the computer system is not required to have all of the devices shown and, instead, the computer system may have more or fewer devices.

For this disclosure, the following points need to be explained:

(1) The drawings of the embodiments of the present disclosure only relate to the structures involved in the embodiments of the present disclosure, and other structures may refer to general designs.

(2) In the case of no conflict, the embodiments of the present disclosure and the features in the embodiments can be combined with each other to obtain new embodiments.

The above description is only a specific implementation manner of the present disclosure, but the protection scope of the present disclosure is not limited thereto, and the protection scope of the present disclosure should be based on the protection scope of the claims.

Claims

1. An image processing method, comprising:

obtaining an initial image, wherein the initial image includes at least one target object;

processing the initial image to obtain an intermediate image;

using a region detection model to recognize the intermediate image to obtain a connected image including M object connected regions;

determining, in the connected image, M bounding boxes respectively corresponding to the M object connected regions;

intercepting, based on the M bounding boxes, N image blocks from the initial image, wherein each of the image blocks includes at least one target object; and

recognizing the N image blocks by using an object recognition model to obtain the target object in the initial image,

wherein M and N are both positive integers.
2. The method according to claim 1, wherein using the region detection model to recognize the intermediate image to obtain the connected image including the M object connected regions comprises:

processing the intermediate image by using the region detection model to obtain a connected image including a plurality of initial object connected regions; and

performing a morphological transformation on the connected image including the plurality of initial object connected regions, so as to obtain, based thereon, the connected image including the M object connected regions.
3. The method according to claim 2, wherein processing the initial image to obtain the intermediate image comprises:

reducing the size of the initial image from an initial size to a predetermined size; and

performing binarization processing on the initial image of the predetermined size to obtain the intermediate image.
4. The method according to claim 2, wherein determining the M bounding boxes respectively corresponding to the M object connected regions in the connected image comprises:

extracting contour information of each of the M object connected regions; and

determining, based on the contour information, the bounding box of each of the M object connected regions.
5. The method according to claim 1, wherein intercepting the N image blocks from the initial image based on the M bounding boxes comprises:

intercepting, according to the correspondence between the intermediate image and the initial image and based on each of the M bounding boxes, a corresponding image block in the initial image, wherein M is equal to N; or

performing predetermined processing on the M bounding boxes to obtain N processed bounding boxes, and intercepting, according to the correspondence between the intermediate image and the initial image and based on each of the processed bounding boxes, a corresponding image block in the initial image.
6. The method according to claim 5, wherein performing the predetermined processing on the M bounding boxes comprises:

scoring the M bounding boxes to obtain quality scores respectively corresponding to the M bounding boxes; and

regarding a bounding box whose quality score is less than a score threshold as an invalid bounding box, and deleting the invalid bounding box.
7. The method according to claim 6, wherein scoring the M bounding boxes comprises performing the following operations for each of the M bounding boxes:

determining the area of the bounding box and the area of the pixels corresponding to the target object located in the bounding box; and

determining the quality score corresponding to the bounding box based on the ratio of the area of the pixels to the area of the bounding box.
8. The method according to claim 5, wherein performing the predetermined processing on the M bounding boxes comprises:

enlarging one or more of the M bounding boxes by a first predetermined factor.
9. The method according to any one of claims 6-8, wherein performing the predetermined processing on the M bounding boxes further comprises:

detecting whether every two adjacent bounding boxes among the M bounding boxes at least partially overlap; and

if so, shrinking each of the two at least partially overlapping bounding boxes based on a second predetermined factor, so that the two shrunken bounding boxes do not overlap or their overlapping area is reduced.
10. The method according to any one of claims 1-6, wherein recognizing the N image blocks by using the object recognition model to obtain the target object in the initial image comprises:

determining, among the N image blocks, P first image blocks whose length in a first direction is greater than a recognition length threshold, and dividing each of the first image blocks into at least two sub-image blocks, so as to obtain a plurality of sub-image blocks corresponding to the P first image blocks, wherein the length of each of the sub-image blocks is equal to or less than the recognition length threshold; and

recognizing the plurality of sub-image blocks by using the object recognition model to obtain the target objects in the P first image blocks,

wherein the target object in the initial image includes the target objects in the P first image blocks, and P is a positive integer.
11. The method according to claim 10, wherein recognizing the N image blocks by using the object recognition model to obtain the target object in the initial image further comprises:

determining, among the N image blocks, Q second image blocks whose length in the first direction is less than the recognition length threshold, and processing each of the second image blocks to obtain Q processed second image blocks, wherein the length of each processed second image block in the first direction is equal to the recognition length threshold; and

recognizing the Q processed second image blocks by using the object recognition model to obtain the target objects in the Q second image blocks,

wherein the target object in the initial image further includes the target objects in the Q second image blocks, and Q is a positive integer.
12. The method according to claim 10, wherein dividing each of the first image blocks into at least two sub-image blocks comprises:

performing the following operations on the i-th first image block among the P first image blocks:

setting, in the first direction, a candidate segmentation point at every interval of the recognition length threshold, to determine at least one candidate segmentation point corresponding to the i-th first image block;

determining, based on the at least one candidate segmentation point, at least one segmentation point corresponding to the i-th first image block; and

dividing, based on the at least one segmentation point, the i-th first image block into at least two sub-image blocks,

wherein i is a positive integer less than or equal to P.
13. The method according to claim 12, wherein determining, based on the at least one candidate segmentation point, the at least one segmentation point corresponding to the i-th first image block comprises:

if an interval region is included within a distance threshold range of any candidate segmentation point of the at least one candidate segmentation point in the i-th first image block, using a point in the interval region as a segmentation point corresponding to the i-th first image block; and

if no interval region is included within the distance threshold range of that candidate segmentation point in the i-th first image block, using that candidate segmentation point as a segmentation point corresponding to the i-th first image block.
14. The method according to claim 11, wherein processing each of the second image blocks comprises:

splicing, in the first direction, an end image block onto at least one end of each of the second image blocks to obtain a processed second image block corresponding to each of the second image blocks, wherein the pixel value of each pixel in the end image block is different from the pixel values of the pixels corresponding to each target object in the second image block.
15. The method according to claim 10, wherein each of the first image blocks includes a plurality of target objects, and the plurality of target objects are arranged in sequence along the first direction.
16. The method according to any one of claims 1-6, wherein the at least one target object comprises a character.
17. An image processing device, comprising:

an image acquisition module configured to acquire an initial image, wherein the initial image includes at least one target object;

an image processing module configured to process the initial image to obtain an intermediate image;

a region recognition module configured to use a region detection model to recognize the intermediate image, so as to obtain a connected image including M object connected regions;

a determination module configured to determine, in the connected image, M bounding boxes respectively corresponding to the M object connected regions;

an interception module configured to intercept N image blocks from the initial image based on the M bounding boxes, wherein each of the image blocks includes at least one target object; and

an object recognition module configured to use an object recognition model to recognize the N image blocks to obtain the target object in the initial image,

wherein M and N are both positive integers.
18. An electronic device, comprising:

a processor; and

a memory for storing one or more computer program modules,

wherein the one or more computer program modules are configured to be executed by the processor, and the one or more computer program modules include instructions for implementing the image processing method according to any one of claims 1-16.
19. A computer-readable storage medium for non-transitory storage of computer-readable instructions, wherein the computer-readable instructions, when executed by a computer, implement the image processing method according to any one of claims 1-16.
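The bounding-box post-processing recited in the claims (quality scoring, deletion of low-score boxes, enlargement, and shrinking of overlapping neighbours) can be sketched as follows. Several choices here are assumptions, not details from the claims: boxes use half-open (x0, y0, x1, y1) coordinates on a 0/1 binary image, scaling is done about the box centre, and "adjacent" is taken to mean consecutive in the list order (for example, after sorting by position). The threshold and factor values are placeholders. Enlarged boxes can extend past the image border, so a real implementation would clamp them before cropping.

```python
def quality_score(box, binary):
    """Share of ink pixels inside a half-open box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    area = (x1 - x0) * (y1 - y0)
    ink = sum(binary[y][x] for y in range(y0, y1) for x in range(x0, x1))
    return ink / area


def scale_box(box, factor):
    """Scale a box about its centre: factor > 1 enlarges, factor < 1 shrinks."""
    x0, y0, x1, y1 = box
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2
    hw, hh = (x1 - x0) / 2 * factor, (y1 - y0) / 2 * factor
    return (cx - hw, cy - hh, cx + hw, cy + hh)


def overlaps(a, b):
    """True if two half-open boxes share a region of positive area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def postprocess(boxes, binary, score_threshold, enlarge, shrink):
    # Drop boxes whose quality score is below the score threshold.
    kept = [b for b in boxes if quality_score(b, binary) >= score_threshold]
    # Enlarge the remaining boxes by the first predetermined factor.
    kept = [scale_box(b, enlarge) for b in kept]
    # Shrink both boxes of each overlapping adjacent pair by the second factor.
    for i in range(len(kept) - 1):
        if overlaps(kept[i], kept[i + 1]):
            kept[i] = scale_box(kept[i], shrink)
            kept[i + 1] = scale_box(kept[i + 1], shrink)
    return kept


page = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 1]]
print(postprocess([(0, 0, 2, 2), (2, 0, 4, 2), (0, 0, 4, 4)], page, 0.5, 1.0, 0.5))
# [(0.0, 0.0, 2.0, 2.0)]
```

A single shrink does not always remove an overlap, which is consistent with the claim's wording that the overlapping area is at least reduced.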