WO2023284502A1 - Image processing method and apparatus, device and storage medium - Google Patents

Image processing method and apparatus, device and storage medium

Info

Publication number
WO2023284502A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
bounding boxes
initial
blocks
image blocks
Prior art date
Application number
PCT/CN2022/100269
Other languages
English (en)
Chinese (zh)
Inventor
徐青松
李青
Original Assignee
杭州睿胜软件有限公司
Priority date
Filing date
Publication date
Application filed by 杭州睿胜软件有限公司
Publication of WO2023284502A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/136Segmentation; Edge detection involving thresholding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/155Segmentation; Edge detection involving morphological operators
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/187Segmentation; Edge detection involving region growing; involving region merging; involving connected component labelling
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/60Analysis of geometric attributes
    • G06T7/62Analysis of geometric attributes of area, perimeter, diameter or volume
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20036Morphological image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30168Image quality inspection

Definitions

  • Embodiments of the present disclosure relate to an image processing method, an image processing apparatus, electronic equipment, and a computer-readable storage medium.
  • At least one embodiment of the present disclosure provides an image processing method, including: obtaining an initial image, the initial image including at least one target object; processing the initial image to obtain an intermediate image; using a region detection model to identify the intermediate image to obtain a connected image including M object connected regions; determining M bounding boxes corresponding to the M object connected regions in the connected image; based on the M bounding boxes, intercepting N image blocks from the initial image, each image block including at least one target object; and using an object recognition model to identify the N image blocks to obtain the target object in the initial image, where M and N are both positive integers.
  • using a region detection model to identify the intermediate image to obtain a connected image including M object connected regions includes: using the region detection model to process the intermediate image to obtain a connected image including multiple initial object connected regions; and performing morphological transformation on the connected image including the multiple initial object connected regions, so as to obtain the connected image including the M object connected regions.
  • processing the initial image to obtain the intermediate image includes: reducing the size of the initial image from the initial size to a predetermined size; and performing binarization processing on the initial image of the predetermined size to obtain the intermediate image.
  • determining M bounding boxes respectively corresponding to the M object connected regions in the connected image includes: extracting the contour information of each of the M object connected regions; and determining, based on the contour information, the respective bounding boxes of the M object connected regions.
  • intercepting N image blocks from the initial image based on the M bounding boxes includes: according to the correspondence between the intermediate image and the initial image, intercepting one image block from the initial image for each of the M bounding boxes, in which case M is equal to N; or performing predetermined processing on the M bounding boxes to obtain N processed bounding boxes and, according to the correspondence between the intermediate image and the initial image, intercepting one image block from the initial image for each processed bounding box.
  • performing predetermined processing on the M bounding boxes includes: scoring the M bounding boxes to obtain quality scores respectively corresponding to the M bounding boxes; regarding bounding boxes whose quality score is less than a score threshold as invalid bounding boxes; and deleting the invalid bounding boxes.
  • scoring the M bounding boxes includes: performing the following operations on each of the M bounding boxes: determining the area of the bounding box and the area of the pixels corresponding to the target object located in the bounding box; and determining the quality score corresponding to the bounding box based on the ratio of the pixel area to the bounding box area.
  • performing predetermined processing on the M bounding boxes includes: enlarging one or more bounding boxes in the M bounding boxes by a first predetermined factor.
  • performing predetermined processing on the M bounding boxes further includes: detecting whether at least some areas overlap between every two adjacent bounding boxes in the M bounding boxes, and if so, reducing each of the two bounding boxes whose areas at least partially overlap based on a second predetermined multiple, so that the two reduced bounding boxes do not overlap or their overlapping area decreases.
  • using the object recognition model to identify the N image blocks to obtain the target object in the initial image includes: determining, among the N image blocks, P first image blocks whose length in a first direction is greater than a recognition length threshold, and dividing each first image block into at least two sub-image blocks to obtain a plurality of sub-image blocks corresponding to the P first image blocks, the length of each sub-image block being equal to or less than the recognition length threshold; and using the object recognition model to identify the plurality of sub-image blocks to obtain the target objects in the P first image blocks, the target object in the initial image including the target objects in the P first image blocks, where P is a positive integer.
  • using the object recognition model to identify the N image blocks to obtain the target object in the initial image further includes: determining, among the N image blocks, Q second image blocks whose length in the first direction is smaller than the recognition length threshold, and processing each second image block to obtain Q processed second image blocks, the length of each processed second image block in the first direction being the recognition length threshold; and using the object recognition model to identify the Q processed second image blocks to obtain the target objects in the Q second image blocks, the target object in the initial image also including the target objects in the Q second image blocks, where Q is a positive integer.
  • dividing each first image block into at least two sub-image blocks includes: performing the following operations on the i-th first image block among the N image blocks: setting, in the first direction, a candidate segmentation point at every interval of the recognition length threshold to determine at least one candidate segmentation point corresponding to the i-th first image block; determining, based on the at least one candidate segmentation point, at least one segmentation point corresponding to the i-th first image block; and dividing, based on the at least one segmentation point, the i-th first image block into at least two sub-image blocks, where i is a positive integer less than or equal to P.
  • determining at least one segmentation point corresponding to the i-th first image block includes: if a gap area exists within a distance-threshold range of any candidate segmentation point among the at least one candidate segmentation point in the i-th first image block, using a point in the gap area as a segmentation point corresponding to the i-th first image block; if no gap area exists within the distance-threshold range of that candidate segmentation point, using the candidate segmentation point itself as a segmentation point corresponding to the i-th first image block.
  • processing each second image block includes: stitching an end image block to at least one end of each second image block in the first direction, so as to obtain a processed second image block corresponding to each second image block, where the pixel value of each pixel in the end image block is different from the pixel value of the pixels corresponding to the objects in the second image block.
  • each first image block includes multiple target objects, and the multiple target objects are arranged in sequence along the first direction.
  • At least one target object includes characters.
  • An embodiment of the present disclosure provides an image processing device, including: an image acquisition module configured to obtain an initial image, the initial image including at least one target object; an image processing module configured to process the initial image to obtain an intermediate image; a region identification module configured to identify the intermediate image using a region detection model to obtain a connected image including M object connected regions; a determination module configured to determine M bounding boxes corresponding to the M object connected regions in the connected image; an interception module configured to intercept N image blocks from the initial image based on the M bounding boxes, each image block including at least one target object; and an object recognition module configured to use an object recognition model to identify the N image blocks to obtain the target object in the initial image, where M and N are both positive integers.
  • An embodiment of the present disclosure also provides an electronic device, including: a processor; and a memory storing one or more computer program modules; the one or more computer program modules are configured to be executed by the processor and include instructions for implementing the image processing method according to any one of the above embodiments.
  • An embodiment of the present disclosure also provides a computer-readable storage medium for non-transitory storage of computer-readable instructions.
  • When the computer-readable instructions are executed by a computer, the image processing method according to any one of the above embodiments can be implemented.
  • Fig. 1 is a schematic flowchart of an image processing method provided by at least one embodiment of the present disclosure
  • Fig. 2 is a schematic diagram of an initial image provided by at least one embodiment of the present disclosure
  • Fig. 3 is a schematic diagram of a target object provided by at least one embodiment of the present disclosure.
  • Fig. 4 is a schematic diagram of a binarized image provided by at least one embodiment of the present disclosure
  • Fig. 5 is a schematic diagram of a connected image provided by at least one embodiment of the present disclosure.
  • Fig. 6 is a schematic diagram of a bounding box provided by at least one embodiment of the present disclosure.
  • Fig. 7A is a schematic diagram of an image block intercepted from an initial image provided by at least one embodiment of the present disclosure
  • Fig. 7B is a schematic diagram of an image block provided by at least one embodiment of the present disclosure.
  • Fig. 8 is a schematic diagram of a connected image including multiple initial object connected regions provided by at least one embodiment of the present disclosure
  • Fig. 9 is a schematic flowchart of identifying N image blocks provided by at least one embodiment of the present disclosure.
  • Fig. 10A is a schematic diagram of segmented image blocks provided by at least one embodiment of the present disclosure.
  • Fig. 10B is a schematic diagram of splicing end image blocks provided by at least one embodiment of the present disclosure.
  • Fig. 11 is a schematic diagram of a target object recognition result provided by at least one embodiment of the present disclosure.
  • Fig. 12 is a schematic block diagram of an image processing device provided by at least one embodiment of the present disclosure.
  • Fig. 13 is a schematic block diagram of an electronic device provided by at least one embodiment of the present disclosure.
  • Fig. 14 is a schematic block diagram of another electronic device provided by at least one embodiment of the present disclosure.
  • Fig. 15 is a schematic diagram of a computer-readable storage medium provided by at least one embodiment of the present disclosure.
  • Fig. 16 is a schematic diagram of a hardware environment provided by at least one embodiment of the present disclosure.
  • At least one embodiment of the present disclosure provides an image processing method, an image processing device, electronic equipment, and a computer-readable storage medium.
  • the image processing method includes: obtaining an initial image, the initial image includes at least one target object; processing the initial image to obtain an intermediate image; using a region detection model to identify the intermediate image to obtain a connected image including M object connected regions; Determining M bounding boxes respectively corresponding to the connected regions of M objects in the connected image; based on the M bounding boxes, intercepting N image blocks from the initial image, each image block including at least one target object; and using an object recognition model Identify N image blocks to obtain the target object in the initial image, and both M and N are positive integers.
  • the image processing method provided by the embodiments of the present disclosure can first convert the initial image into an intermediate image, then convert the intermediate image into a connected image using a region detection model to obtain several object connected regions, determine the bounding boxes corresponding to the object connected regions, and then go back to the initial image to intercept the image blocks corresponding to the bounding boxes.
  • the method of the embodiments of the present disclosure has a smaller calculation amount and a simpler processing process, thus solving the problem of high complexity and large amount of calculation, and making it possible to apply the object recognition algorithm to terminal devices with low hardware configurations such as mobile phones, so that such terminal devices can perform object recognition even when offline.
  • the image processing method of the embodiment of the present disclosure can be applied to the image processing device of the embodiment of the present disclosure, and the image processing device can be configured on an electronic device.
  • the electronic device may be a personal computer, a mobile terminal, etc.
  • the mobile terminal may be a hardware device such as a mobile phone or a tablet computer.
  • Fig. 1 is a schematic flowchart of an image processing method provided by at least one embodiment of the present disclosure.
  • the method includes steps S110-S160.
  • Step S110 Obtain an initial image, where the initial image includes at least one target object.
  • Step S120 Process the initial image to obtain an intermediate image.
  • Step S130 Use the region detection model to identify the intermediate image to obtain a connected image including M object connected regions.
  • Step S140 Determine M bounding boxes respectively corresponding to the M object connected regions in the connected image.
  • Step S150 Based on the M bounding boxes, intercept N image blocks from the initial image, each image block includes at least one target object.
  • Step S160 Use the object recognition model to identify N image blocks to obtain the target object in the initial image.
  • both M and N are positive integers.
  • the initial image may be in various forms, such as electronic files in any image form such as photos, scanned images, screenshots, and PDF image pages.
  • the initial image can be a grayscale image or a color image.
  • Fig. 2 is a schematic diagram of an initial image 201 provided by at least one embodiment of the present disclosure.
  • the initial image 201 includes at least one target object, and the at least one target object may include characters.
  • each character can be a number, a Chinese character or word, a foreign character (for example, a foreign letter or word), a special character (for example, the percent sign "%"), a punctuation mark, a graphic (e.g., a triangle or arrow), etc.
  • characters can be in multiple fonts, which can be printed or handwritten; printed fonts can include various known typefaces, such as Song, Hei, Kai, Times New Roman, Arial, etc., and can also include artistic fonts.
  • the target object includes English letters and numbers.
  • Fig. 3 is a schematic diagram of a target object provided by at least one embodiment of the present disclosure.
  • the target object may also include a variety of patterns, for example, a heart-shaped pattern, a smiling face pattern, a cloud-shaped pattern, a sun pattern, a moon pattern, and more.
  • the target object can also take forms other than characters and patterns. The following description takes characters as an example of the target object; the processing of other types of target objects can refer to the processing of characters.
  • the type of target object can be determined according to actual needs: the type of target object to be recognized can be preset, and the corresponding region detection model and object recognition model can be trained according to that type, so that the region detection model can include the locations of objects of the corresponding type in the object connected regions, and the object recognition model can recognize objects of the corresponding type.
  • For example, if it is necessary to recognize English words and punctuation marks, sample images containing English words and punctuation marks can be used to train the region detection model and the object recognition model, so that the trained region detection model can connect the regions where English words and punctuation marks are located, and the trained object recognition model can recognize English words and punctuation marks.
  • processing the initial image to obtain an intermediate image may include: reducing the size of the initial image from the initial size to a predetermined size; performing binarization processing on the initial image of the predetermined size to obtain an intermediate image.
  • the sizes of different initial images may be inconsistent.
  • the initial images can be uniformly reduced from their original size to a predetermined size.
  • the predetermined size can be, for example, 640*640 (pixels).
  • the uniform size can facilitate subsequent processing, for example, it can facilitate the region recognition processing of the region detection model.
  • each pixel value (such as a gray value) of the initial image of the predetermined size can be mapped to the range 0 to 1, that is, each pixel value is divided by 255 to convert it to a value between 0 and 1.
  • each pixel value of the initial image of a predetermined size may be mapped to a range between -1.0 and 1.0.
  • the normalized image can be binarized to obtain a binarized image, and the binarized image can be used as the above-mentioned intermediate image.
  • FIG. 4 is a schematic diagram of a binarized image provided by at least one embodiment of the present disclosure.
  • the binarized image shown in FIG. 4 is a binarized image of the initial image shown in FIG. 2.
  • For example, a binarization threshold (for example, 0.3, which can be set according to actual conditions and is not specifically limited in the present disclosure) can be preset, and each normalized pixel value is compared with the binarization threshold: if the pixel value is greater than or equal to the binarization threshold, it is converted to 1, that is, the corresponding pixel becomes pure white; if the pixel value is less than the binarization threshold, it is converted to 0, that is, the corresponding pixel becomes pure black. In this way, a pure black-and-white image is obtained, and this pure black-and-white image is the binarized image.
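  • As a minimal illustrative sketch (not the patented implementation itself), the resizing, normalization, and binarization described above could be expressed in Python with OpenCV as follows; the function name and the 0.3 threshold are taken from the example values in the text:

```python
import cv2
import numpy as np

def preprocess(initial_image_path, predetermined_size=(640, 640), threshold=0.3):
    """Reduce the initial image to a predetermined size, normalize pixel
    values to the range 0-1, and binarize to obtain the intermediate image."""
    image = cv2.imread(initial_image_path, cv2.IMREAD_GRAYSCALE)
    # Reduce from the initial size to the predetermined size (e.g. 640*640).
    resized = cv2.resize(image, predetermined_size, interpolation=cv2.INTER_AREA)
    # Map each pixel value to the range 0-1 by dividing by 255.
    normalized = resized.astype(np.float32) / 255.0
    # Pixels at or above the threshold become 1 (pure white); others become 0 (pure black).
    binarized = (normalized >= threshold).astype(np.uint8)
    return binarized
```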
  • the image (the initial image, the initial image of the predetermined size, the normalized image, or the binarized image) can be tilt-corrected so that the characters in the image are arranged in a horizontal direction (such as the X direction shown in FIG. 4) or a vertical direction (such as the Y direction shown in FIG. 4).
  • the original image can also be cropped to remove the surrounding background area.
  • the region detection model can be implemented using machine learning technology and run on a general-purpose computing device or a special-purpose computing device, for example.
  • the region detection model is a pre-trained neural network model.
  • a region detection model can be implemented using a neural network such as a deep convolutional neural network (DEEP-CNN).
  • The intermediate image is input into the region detection model, which identifies the region where each object to be recognized in the intermediate image is located and marks the connected region of each identified object.
  • the object connected region may be a character connected region.
  • the region detection model can be implemented using the DBNet (Differentiable Binarization Network) architecture
  • the backbone network (Backbone) in the DBNet architecture can use MobileNetV3 Large network
  • MobileNetV3 Large is a lightweight network
  • the region detection model may adopt a network architecture other than the DBNet architecture
  • the backbone network may adopt a network other than the MobileNetV3 Large network.
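  • As a hedged illustration only (the `region_detection_model` object, its `predict()` interface, and the 0.5 probability threshold are assumptions for this sketch, not part of the disclosure), a DBNet-style detector with a MobileNetV3 Large backbone could be applied to the intermediate image and its probability map thresholded into a connected image roughly as follows:

```python
import numpy as np

def detect_connected_regions(intermediate_image, region_detection_model, prob_threshold=0.5):
    """Hypothetical sketch: run a trained region detection model on the
    intermediate image and threshold its per-pixel probability map into a
    binary connected image of the same size."""
    # Add batch and channel dimensions expected by the (assumed) model interface.
    batch = intermediate_image[np.newaxis, ..., np.newaxis].astype(np.float32)
    prob_map = region_detection_model.predict(batch)
    # Pixels with a probability at or above the threshold are marked as object regions.
    connected_image = (prob_map[0, ..., 0] >= prob_threshold).astype(np.uint8)
    return connected_image
```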
  • The position of each object in the initial image corresponds to the position of that object in the intermediate image. As shown in Figures 2 and 4, the initial image includes the object "DECLARATION AND ASSIGNMENT" located on the upper side of the initial image, and the intermediate image also includes the object "DECLARATION AND ASSIGNMENT", which is likewise located on the upper side of the intermediate image.
  • Fig. 5 is a schematic diagram of a connected image provided by at least one embodiment of the present disclosure.
  • the connected image shown in Fig. 5 is a connected image obtained by processing the intermediate image shown in Fig. 4, and the connected image shown in Fig. 5 includes M object connected regions.
  • the size of the connected image shown in FIG. 5 is the same as the size of the intermediate image shown in FIG. 4 .
  • Each row of characters may correspond to one or more object connected regions (also called character connected regions). For example, if the characters in a row are arranged continuously, that is, the interval between every two adjacent characters in the row does not exceed a predetermined interval (for example, the width of two (or three, etc.) spaces), then the row of characters can correspond to one object connected region.
  • For example, Fig. 4 shows the character line "DECLARATION AND ASSIGNMENT", and an object connected region 501 can be formed corresponding to this character line.
  • the predetermined interval may be set according to actual conditions, which is not limited in the present disclosure.
  • the character line "DECLARATION AND ASSIGNMENT” either a single English letter can be used as a character, or an English word can be used as a character.
  • If the characters in a row are not arranged continuously, that is, the interval between two adjacent characters in the row exceeds the predetermined interval (for example, the width of two (or three, etc.) spaces), then several object connected regions can be formed according to the number of such intervals. For example, if the a-th character to the (a+b)-th character in a character line are arranged continuously, the interval between the (a+b)-th character and the (a+b+1)-th character exceeds the predetermined interval, and the (a+b+1)-th character to the (a+b+c)-th character are arranged continuously, then the a-th character to the (a+b)-th character can form one object connected region, and the (a+b+1)-th character to the (a+b+c)-th character can form another object connected region, where a, b and c are all positive integers.
  • Fig. 4 also shows the character line "Signature:_____Date:_____". If, in this embodiment, the underline is not taken as an object to be detected and recognized, then the characters "Signature:" and "Date:" in this line can each form a separate object connected region.
  • a corresponding bounding box may be determined according to each object connected region in the M object connected regions.
  • FIG. 6 is a schematic diagram of a bounding frame provided by at least one embodiment of the present disclosure.
  • the bounding frame is, for example, a rectangular frame.
  • For example, the bounding box may be the minimum bounding box that completely surrounds the object connected region.
  • the size of the smallest bounding box can be determined according to the length and height of the connected region of the object.
  • For example, the X coordinate of the leftmost endpoint and the X coordinate of the rightmost endpoint of the object connected region 501 in the X direction can be determined, and the absolute value of the difference between the two X coordinates is taken as the size of the bounding box 601 of the object connected region 501 in the X direction. Similarly, the Y coordinate of the lowest point and the Y coordinate of the highest point of the object connected region 501 in the Y direction can be determined, and the absolute value of the difference between the two Y coordinates is taken as the size of the bounding box 601 in the Y direction; thus, the bounding box 601 surrounding the object connected region 501 can be obtained.
  • Similarly, the bounding box corresponding to each object connected region can be determined, for example, the bounding box 602 corresponding to the object connected region 502, the bounding box 603 corresponding to the object connected region 503, and the like. It is worth noting that, in order to clearly show the bounding boxes, each bounding box shown in FIG. 6 is drawn slightly larger than its object connected region; in practice, the dimensions in the X direction and the Y direction may be equal to those determined in the above manner.
  • the bounding box may also be in other shapes than rectangle, such as oval, triangle, trapezoid and so on.
  • bounding boxes of the connected regions of the objects may also be determined in other suitable ways.
  • corresponding image blocks may be intercepted from the initial image 201 according to one or more (N) bounding boxes among the M bounding boxes.
  • Fig. 7A is a schematic diagram of an image block intercepted from an initial image provided by at least one embodiment of the present disclosure
  • Fig. 7B is a schematic diagram of an image block provided by at least one embodiment of the present disclosure. Referring to Fig. 2, Fig. 7A and Fig. 7B, if tilt correction was performed in the process of obtaining the intermediate image or the connected image, then before image blocks are intercepted from the initial image, the same tilt correction can be performed on the initial image 201 to obtain a corrected initial image 201', and the image blocks are then intercepted from the corrected initial image 201'.
  • an image block of a corresponding area may be intercepted from the initial image 201 , and in this case, M and N are equal.
  • For example, the M bounding boxes are all mapped to the initial image 201', and the image region framed by each bounding box is intercepted from the initial image 201', thereby obtaining M image blocks; for example, the image block 701 is intercepted according to the bounding box 601, the image block 702 according to the bounding box 602, the image block 703 according to the bounding box 603, the image block 704 according to the bounding box 604, the image block 705 according to the bounding box 605, and so on.
  • N may also be smaller than M, that is, only some of the M bounding boxes may be selected for intercepting image blocks.
  • the object recognition model may be used to identify each image block to obtain the character content in each image block.
  • the object recognition model may include a character recognition model.
  • the character recognition model may be implemented based on technologies such as optical character recognition and run on a general-purpose computing device or a special-purpose computing device.
  • the character recognition model may also be a pre-trained neural network model.
  • the character recognition model can use the CRNN (Convolutional Recurrent Neural Network, convolutional cyclic neural network) + CTC (Connectionist Temporal Classification, connection time series classification) architecture, and the backbone network (Backbone) of the CRNN+CTC architecture can use the MobileNetV3 Small network.
  • Adaptive adjustments can be made to the network, for example, to the inverted_res_block part of the MobileNetV3 Small network.
  • Each character can be recognized in each image block, and each character can be a single Chinese character, a single foreign character (for example, a single English letter or a single English word), a single number, a single symbol, a single graphic, a single punctuation mark, etc.
  • the character content "DECLARATION AND ASSIGNMENT” can be identified according to the image block 701
  • the character content "Signature:” can be identified according to the image block 702
  • the character content "Date:” can be identified according to the image block 702.
  • the target object may include other objects other than characters, such as patterns, etc.
  • the object recognition model may also include a pattern recognition model, etc.
  • The pattern recognition model, for example, runs on a general-purpose computing device or on a dedicated computing device; for example, the pattern recognition model can also be a pre-trained neural network model.
  • the pattern recognition model can recognize the pattern as a corresponding English word or Chinese word, for example, it can recognize the sun pattern as the word "sun".
  • a pattern recognition model can also be used to convert the pattern into a corresponding stick figure.
  • For example, a variety of stick figures can be stored in a library in advance; the stick figure corresponding to the sun pattern can be selected from the library and used as the recognition result.
  • different recognition models can be used to identify different types of target objects, and the recognition results of multiple recognition models can be spliced and combined to obtain the recognition results of all target objects in the initial image .
  • the image processing method provided by the embodiments of the present disclosure can first convert the initial image into an intermediate image, then convert the intermediate image into a connected image using a region detection model to obtain several object connected regions, determine the bounding boxes corresponding to the object connected regions, and then go back to the initial image to intercept the image blocks corresponding to the bounding boxes.
  • the method of the embodiment of the present disclosure has a smaller calculation amount and a simpler processing process, thus at least partially solving the problem of high complexity and large amount of calculation.
  • the object recognition algorithm can be applied to terminal devices with low hardware configuration such as mobile phones, and the terminal device can also perform object recognition when it is offline.
  • For example, the region detection model can be used to process the intermediate image to obtain a connected image including a plurality of initial object connected regions, and a morphological transformation can then be performed on this connected image to obtain the connected image including the M object connected regions.
  • Fig. 8 is a schematic diagram of a connected image including multiple initial object connected regions provided by at least one embodiment of the present disclosure.
  • As shown in Fig. 8, the connected image including the multiple initial object connected regions may have problems such as small white dots 801 and glued lines 802.
  • the connected image including multiple initial object connected regions can be morphologically transformed to obtain the corrected connected image shown in FIG. 5 (that is, the connected image of M object connected regions).
  • the small white dot 801 is removed, and the glued line 802 is split into the lines 504 and 505 shown in FIG. 5.
  • Morphological transformations can include closing operations and opening operations.
  • For example, the opening operation can smooth contours, break off narrow necks (such as thin white lines), and eliminate small protrusions, which helps split glued lines; the closing operation also smooths object contours, but, contrary to the opening operation, it bridges narrow discontinuities and slender gaps, eliminates small holes, and fills breaks in contour lines, which helps remove small white dots.
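  • For illustration, the morphological correction could be sketched with OpenCV's morphologyEx as below; the 3x3 kernel size and the order of the two operations are illustrative assumptions, not values specified by the disclosure:

```python
import cv2
import numpy as np

def refine_connected_image(raw_connected_image):
    """Apply a closing then an opening operation to clean up the raw connected image."""
    kernel = np.ones((3, 3), np.uint8)  # illustrative kernel size
    # Closing: bridge narrow discontinuities and fill small breaks in contours.
    closed = cv2.morphologyEx(raw_connected_image, cv2.MORPH_CLOSE, kernel)
    # Opening: break off narrow necks and remove small protrusions,
    # which helps split glued lines such as the one labelled 802.
    opened = cv2.morphologyEx(closed, cv2.MORPH_OPEN, kernel)
    return opened
```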
  • For example, in step S140 (determining the M bounding boxes respectively corresponding to the M object connected regions in the connected image), the contour information of each of the M object connected regions can be extracted, and based on the contour information, the bounding box of each of the M object connected regions can be determined.
  • the contour information may be contour line information, such as coordinate information of the contour line.
  • For each object connected region, the contour line information of the region can be extracted; the boundary points of the object connected region in the X direction and the Y direction can be determined according to the contour line information, and then the minimum bounding box corresponding to the object connected region, namely the bounding box of the object connected region, can be determined according to the boundary points.
  • For example, various contour extraction algorithms in OpenCV can be used to extract the contour line information, such as the Canny edge detection algorithm, the Sobel edge detection algorithm, etc.
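  • A minimal sketch of this contour-based bounding box extraction with OpenCV, assuming the connected image is a single-channel binary image, might look like this:

```python
import cv2

def extract_bounding_boxes(connected_image):
    """Extract the contour of each object connected region and take the
    minimum upright rectangle enclosing it as that region's bounding box."""
    contours, _ = cv2.findContours(connected_image, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Each box is (x, y, width, height) in connected-image coordinates.
    return [cv2.boundingRect(contour) for contour in contours]
```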
  • For example, in step S150 (intercepting N image blocks from the initial image based on the M bounding boxes), M and N can be equal; according to the correspondence between the intermediate image and the initial image, one image block is intercepted from the initial image for each of the M bounding boxes.
  • For example, the connected image can be scaled (for example, enlarged) to the initial size of the initial image, so that the size of the connected image is consistent with that of the initial image; then, in the connected image of the initial size, the bounding box of each object connected region is determined according to its contour information, and each bounding box is mapped to the initial image.
  • The scaled connected image can be binarized (for example, if the grayscale values of the image range from 0 to 255, the threshold is set to 127, and if they range from 0 to 1, the threshold is set to 0.5) to convert it into a pure black-and-white image, and the bounding boxes are then determined in the binarized connected image.
  • Alternatively, the connected image may not be scaled; instead, the bounding boxes are determined in the connected image of the predetermined size, and then, according to the proportional relationship between the initial size and the predetermined size, each bounding box is enlarged to obtain an enlarged bounding box corresponding to the initial size, which is mapped to the corresponding area of the initial image. It should be noted that other suitable methods may also be used to map the bounding boxes in the connected image to the initial image, which is not specifically limited in the present disclosure.
  • For example, predetermined processing can be performed on the M bounding boxes to obtain N processed bounding boxes, and according to the correspondence between the intermediate image and the initial image, one image block is intercepted from the initial image for each processed bounding box; in this case, M and N may or may not be equal.
  • performing predetermined processing on the M bounding boxes may include: scoring the M bounding boxes to obtain quality scores corresponding to the M bounding boxes respectively; using bounding boxes with quality scores smaller than the score threshold as invalid bounding boxes, And remove invalid bounding boxes.
  • scoring the M bounding boxes may include: performing the following operations for each of the M bounding boxes: determining the area of the bounding box and the area of the pixels corresponding to the target object located in the bounding box; and determining the quality score corresponding to the bounding box based on the ratio of the pixel area to the bounding box area.
  • the bounding box can be mapped to the binarized image shown in Figure 4.
  • the color of the character is different from the background color, that is, the pixel value of the character is different from the pixel value of the background.
  • For example, the pixel value of the character is 1, and the pixel value of the background is 0.
  • Each pixel in the bounding box can be traversed, and the number of pixels whose pixel value equals the pixel value of the target object can be counted to obtain the area of the pixels corresponding to the target object; the ratio of this area to the area of the bounding box can then be obtained.
  • For example, the ratio can be directly used as the quality score of the bounding box; in another example, several ratio ranges can be defined, each corresponding to a score, for example, the range [0, 0.2) corresponds to a score of 1, [0.2, 0.4) to a score of 2, ..., and [0.8, 1] to a score of 5.
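  • A small sketch of this pixel-ratio scoring, assuming the binarized image uses the value 1 for target-object pixels and boxes are given as (x, y, w, h), could be:

```python
import numpy as np

def score_bounding_box(binarized_image, box, object_pixel_value=1):
    """Quality score: ratio of target-object pixels inside the bounding box
    to the bounding box area; box is (x, y, w, h)."""
    x, y, w, h = box
    region = binarized_image[y:y + h, x:x + w]
    object_area = int(np.count_nonzero(region == object_pixel_value))
    box_area = max(w * h, 1)
    return object_area / box_area  # e.g. keep boxes whose score meets the score threshold
```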
  • In other embodiments, the quality score can be determined according to the inclination of the bounding box; for example, for the binarized image shown in FIG. 4, the included angle between the bounding box and the X direction (or Y direction) can be determined, and the quality score of the bounding box can be determined according to this included angle.
  • the included angle can be directly used as the quality score, or several included angle ranges can be divided, and each included angle range corresponds to a score.
  • those skilled in the art may use other methods to score image blocks.
  • the bounding boxes whose quality scores are lower than a predetermined score threshold may be removed, and high-quality bounding boxes are retained.
  • the score threshold may be set according to actual conditions.
  • the score threshold may be s times the highest predetermined score, and s is, for example, between 0.3 and 0.8.
  • the score threshold can be a value between 0.3 and 0.8 (for example, 0.5); a bounding box whose quality score is greater than or equal to the score threshold can be considered a high-quality bounding box, and a bounding box whose quality score is smaller than the score threshold can be considered an invalid bounding box.
  • performing predetermined processing on the M bounding boxes may further include: enlarging one or more bounding boxes in the M bounding boxes by a first predetermined factor.
  • Some bounding boxes may be too small, so that the target object is not completely enclosed in the bounding box; for example, some characters in the text line fall outside the bounding box or are only partially enclosed by it.
  • these bounding boxes can be enlarged to include target objects that are not contained in the bounding boxes.
  • For example, the bounding box can be enlarged according to k (the first predetermined multiple) times the area-to-perimeter ratio, with the center of the bounding box as the center of enlargement; k is, for example, a positive number greater than 1 and less than 2, such as 1.6.
  • the enlarged bounding box corresponding to the bounding box may completely cover the bounding box.
  • All of the M bounding boxes can be enlarged, or several smaller bounding boxes can be selected from the M bounding boxes for enlargement. For example, it can be detected whether a target object is not fully enclosed in a bounding box, for example, by detecting whether a certain number of pixels of the target object lie within a predetermined surrounding range of the bounding box, and if so, that bounding box can be enlarged.
  • the predetermined surrounding range may be an annular area between a virtual bounding box and the bounding box obtained by enlarging the bounding box with its central point as the magnification center by t times, and t is, for example, greater than 1 and less than 2.
  • For example, the M bounding boxes include a first bounding box; the first bounding box is enlarged by t times with its center point as the magnification center to obtain a first virtual bounding box, and the annular area between the first virtual bounding box and the first bounding box serves as the predetermined surrounding range of the first bounding box.
  • For example, the bounding box includes two first sides extending along the X direction and two second sides extending along the Y direction; the distances from the center point to the first sides and to the second sides are both enlarged by t times to obtain the virtual bounding box.
  • The distance between the center point of the bounding box and the first side of the virtual bounding box is t times the distance between the center point and the first side of the bounding box; for example, if the distance between the center point and the first side of the bounding box is 5 (mm), then the distance between the center point and the first side of the virtual bounding box is 5t (mm). Likewise, the distance between the center point and the second side of the virtual bounding box is t times the distance between the center point and the second side of the bounding box.
  • the enlarging operation for the bounding boxes can be performed after the operation of removing the invalid bounding boxes.
  • For example, the enlarging process can be performed on all N bounding boxes, or several smaller bounding boxes can be screened out from the N bounding boxes for enlargement.
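  • As an illustrative sketch, enlarging a bounding box about its center point by a factor could be written as below; the clipping to the image boundary is an added assumption, not something the text specifies:

```python
def enlarge_box(box, factor, image_width, image_height):
    """Enlarge a bounding box (x, y, w, h) about its center point by `factor`
    (e.g. the first predetermined multiple), clipped to the image bounds."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    new_w, new_h = w * factor, h * factor
    new_x = max(0, int(round(cx - new_w / 2.0)))
    new_y = max(0, int(round(cy - new_h / 2.0)))
    new_w = min(int(round(new_w)), image_width - new_x)
    new_h = min(int(round(new_h)), image_height - new_y)
    return new_x, new_y, new_w, new_h
```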
  • performing predetermined processing on the M bounding boxes may also include: detecting whether at least some areas overlap between every two adjacent bounding boxes in the M bounding boxes, and if so, each of the two bounding boxes with at least partially overlapping areas The two bounding boxes are reduced based on the second predetermined multiple, so that the two reduced bounding boxes do not overlap or the overlapping area is reduced.
  • some bounding boxes may enclose a large area and cause some areas of two adjacent bounding boxes to overlap.
  • These bounding boxes can be reduced so that the two reduced bounding boxes do not overlap or their overlapping area is reduced.
  • the intersection between every two adjacent bounding boxes can be calculated.
  • the intersection between two adjacent bounding boxes is, for example, the MIoU value (Mean Intersection over Union, semantic segmentation evaluation index) between two adjacent bounding boxes.
  • The second predetermined multiple is, for example, 0.9*(1-MIoU), which is, for example, a value between 0.5 and 0.9; that is, the bounding box is reduced to 0.5 to 0.9 times its original size.
  • For example, the shrinking process can be performed after the enlarging process, which avoids adjacent bounding boxes becoming connected or overlapped after enlargement, so that each bounding box has a suitable range and an image block of appropriate size can then be intercepted from the initial image.
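  • A hedged sketch of this overlap handling: compute the IoU of two adjacent boxes and, if they overlap, shrink each about its center by 0.9*(1-IoU), following the example multiple given in the text; the (x, y, w, h) box format is an assumption:

```python
def intersection_over_union(box_a, box_b):
    """IoU of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    inter_w = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0

def shrink_box(box, factor):
    """Shrink a bounding box (x, y, w, h) about its center by `factor` (< 1)."""
    x, y, w, h = box
    cx, cy = x + w / 2.0, y + h / 2.0
    new_w, new_h = w * factor, h * factor
    return (int(round(cx - new_w / 2.0)), int(round(cy - new_h / 2.0)),
            int(round(new_w)), int(round(new_h)))

def resolve_overlap(box_a, box_b):
    """If two adjacent boxes overlap, shrink each by 0.9 * (1 - IoU)."""
    iou = intersection_over_union(box_a, box_b)
    if iou > 0:
        factor = 0.9 * (1.0 - iou)
        box_a, box_b = shrink_box(box_a, factor), shrink_box(box_b, factor)
    return box_a, box_b
```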
  • For example, the N image blocks can be scaled so that they have the same size in the Y direction; for example, the size of each image block in the Y direction can be uniformly scaled to 32 pixels to facilitate the processing of the subsequent object recognition model.
  • Fig. 9 is a schematic flowchart of identifying N image blocks provided by at least one embodiment of the present disclosure. As shown in Fig. 9, for example, in step S160 (using an object recognition model to identify N image blocks to obtain the target object), steps S161 to S164 may be included.
  • Step S161 Determine the P first image blocks whose length in the first direction is greater than the recognition length threshold among the N image blocks, and divide each first image block into at least two sub-image blocks to obtain a plurality of sub-image blocks corresponding to the P first image blocks, where the length of each sub-image block is equal to or smaller than the recognition length threshold.
  • P is a positive integer.
  • Step S162 Using the object recognition model to identify a plurality of sub-image blocks to obtain target objects in the P first image blocks.
  • the target object in the initial image includes the target object in the P first image blocks.
  • Step S163 Determine Q second image blocks whose length in the first direction is smaller than the recognition length threshold among the N image blocks, and process each second image block to obtain Q processed second image blocks,
  • the length of each processed second image block in the first direction is the recognition length threshold.
  • Q is a positive integer.
  • Step S164 Use the object recognition model to identify the Q processed second image blocks, so as to obtain the target objects in the Q second image blocks.
  • the target object in the initial image also includes the target object in the Q second image blocks.
  • the N image blocks may only include the first image block and not include the second image block.
  • In this case, in step S160, only step S161 and step S162 may be executed, without executing step S163 and step S164.
  • the N image blocks may only include the second image block but not the first image block. In this case, in step S160, only step S163 and step S164 may be performed without performing step S161 and step S162.
  • each first image block includes multiple target objects, and the multiple target objects are arranged in sequence along the first direction.
  • the first direction can be the length direction of the image block, and the length direction of the image block can be determined according to the arrangement direction of the target objects in the image block.
  • For example, if the characters in the image block are arranged along the X direction, then the first direction may refer to the X direction.
  • the length of an image block can be represented by the number of pixels, and a recognition length threshold can be preset, and the recognition length threshold can be, for example, 400-1000 pixels, such as 640 pixels.
  • For an image block among the N image blocks whose length is greater than the recognition length threshold, the image block may be segmented, for example, divided into several sub-image blocks whose length is less than or equal to the recognition length threshold.
  • For an image block among the N image blocks whose length is smaller than the recognition length threshold, the image block may be processed so that its length equals the recognition length threshold.
  • On the one hand, processing image blocks into approximately uniform sizes facilitates model processing; on the other hand, dividing larger image blocks into smaller ones reduces the amount of calculation of the model and allows a simpler recognition model to be used, which improves the recognition speed.
  • For example, dividing each first image block into at least two sub-image blocks may include performing the following operations on the i-th first image block among the N image blocks: in the first direction, setting a candidate segmentation point at every interval of the recognition length threshold to determine at least one candidate segmentation point corresponding to the i-th first image block; determining, based on the at least one candidate segmentation point, at least one segmentation point corresponding to the i-th first image block; and dividing, based on the at least one segmentation point, the i-th first image block into at least two sub-image blocks, where i is a positive integer less than or equal to P.
  • Fig. 10A is a schematic diagram of a segmented image block provided by at least one embodiment of the present disclosure.
  • The segmentation process is described taking the image block 704 as an example: starting from the starting point 901 of the image block 704, a candidate segmentation point is set at every interval of the recognition length threshold L; for example, the candidate segmentation points 902 and 903 are obtained.
  • a segmentation point can be determined according to each candidate segmentation point.
  • For example, a candidate segmentation point can be directly used as a segmentation point, for example, the candidate segmentation point 902 can be used as a segmentation point; in another example, a point within a predetermined distance range of the candidate segmentation point can be used as the segmentation point. The predetermined distance range can be, for example, the range [pc-lg, pc+lg] in the X direction, where pc is the X coordinate of the candidate segmentation point and lg is the length of g pixels, with g being, for example, between 12 and 60.
  • a segmentation point 903 ′ is determined within a predetermined distance range of the candidate segmentation point 903 .
  • the image block may be cut along the division point, for example, sub-image blocks 7041 , 7042 and 7043 are obtained by cutting.
  • For example, determining the at least one segmentation point corresponding to the i-th first image block may include: if a gap area exists within the distance-threshold range of any candidate segmentation point among the at least one candidate segmentation point in the i-th first image block, taking a point in the gap area as a segmentation point corresponding to the i-th first image block; if no gap area exists within the distance-threshold range of that candidate segmentation point, taking the candidate segmentation point itself as a segmentation point corresponding to the i-th first image block.
  • For example, if a candidate segmentation point is located in the gap area between two adjacent characters, it can be used as the segmentation point; for example, the candidate segmentation point 902 is located in the gap area between the character "," and the character "Building", so it can be used as a segmentation point. If a candidate segmentation point is not located in the gap area between two adjacent characters, a gap area near the candidate segmentation point can be found, and a point in that gap area is used as the segmentation point.
  • For example, the candidate segmentation point 903 is located within the character "Beijing" rather than in a gap between characters; therefore, the pixels within a certain distance range of the candidate segmentation point 903 can be traversed to find a gap near the character "Beijing". For example, the gap between the character "," (located between the character "Road" and the character "Beijing") and the character "Beijing" lies within that distance range, so a point in this gap, for example its midpoint, can be determined as the segmentation point. If there is no character gap area within the predetermined distance range of a candidate segmentation point, the candidate segmentation point itself may be used as the segmentation point.
  • the first image block may be cut according to the position of the cut point to obtain several sub-image blocks.
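  • As a rough sketch of this segmentation-point selection, assuming a binarized image block in which background pixels are 0 and treating a column with no object pixels as a gap area:

```python
import numpy as np

def find_split_points(block, length_threshold, search_range):
    """Place a candidate split point every `length_threshold` pixels along the
    X (length) direction; if a blank column exists within `search_range`
    pixels of a candidate, split there instead, otherwise split at the
    candidate itself."""
    width = block.shape[1]
    column_has_object = (block != 0).any(axis=0)  # True where a column touches a character
    split_points = []
    for candidate in range(length_threshold, width, length_threshold):
        lo = max(0, candidate - search_range)
        hi = min(width, candidate + search_range)
        blank_columns = [x for x in range(lo, hi) if not column_has_object[x]]
        # Use a point in the middle of the gap area when one exists, as in the text.
        split_points.append(blank_columns[len(blank_columns) // 2] if blank_columns else candidate)
    return split_points
```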
  • For example, processing each second image block may include: in the first direction, splicing an end image block to at least one end of each second image block to obtain a processed second image block corresponding to each second image block. For example, the pixel value of each pixel in the end image block is different from the pixel value of the pixels corresponding to the objects in the second image block.
  • FIG. 10B is a schematic diagram of splicing end image blocks provided by at least one embodiment of the present disclosure.
  • the segmentation process is described by taking the image block 702 as an example.
  • the length of the image block 702 can be complemented.
  • the end image block can be spliced on one side or both sides of the X direction of the image block 702.
  • the pixel value of the end image block is different from the pixel value of the target object.
  • the pixel value of the end image block may, for example, be consistent with the pixel value of the background part of the image block 702, and the length of the new image block 702′ obtained after splicing is, for example, equal to the recognition length threshold L.
  • splicing and length-complementing processing may also be performed on the sub-image blocks. For example, as shown in FIG. 10A, if the length of the sub-image block 7043 obtained after cutting is less than the recognition length threshold L, the sub-image block 7043 can be processed in the above splicing manner so that the length of the processed sub-image block 7043 equals the recognition length threshold L; a sketch of this padding step follows.
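  • A corresponding sketch of the length-complementing step, assuming grayscale blocks and estimating the background value as the most frequent pixel value; the function name and defaults are illustrative only.

```python
import numpy as np

def splice_end_block(block, L=640, side="right"):
    """Pad a (sub-)image block to the recognition length threshold L by splicing
    an end image block whose pixels take the background value, so that they
    differ from the pixels of the target objects."""
    h, w = block.shape
    if w >= L:
        return block
    bg = int(np.bincount(block.ravel()).argmax())        # background value (assumed most frequent)
    pad = L - w
    left, right = {"left": (pad, 0), "right": (0, pad),
                   "both": (pad // 2, pad - pad // 2)}[side]
    return np.pad(block, ((0, 0), (left, right)),
                  mode="constant", constant_values=bg)
```

  • Under these assumptions, both the short second image blocks (such as image block 702) and the short sub-image blocks (such as sub-image block 7043) could be passed through such a function before recognition.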
  • the object recognition model can then be used to recognize each sub-image block and each processed second image block.
  • the length of each English letter and punctuation mark is, for example, 4 pixels
  • for an image block of size 32*640*3, 32 represents the height of the image block, i.e., 32 pixels, 640 represents the length of the image block, i.e., 640 pixels, and 3 represents the number of channels.
  • the object recognition model can be trained to output d possible candidate recognition results for each target object, where d is an integer greater than 0 and less than 5. For example, taking the recognition of English letters as an example, when d is 2, for the English letter "m" in the image block the object recognition model may output the candidate recognition results "m" and "n". For a 32*640*3 image block, 160*d recognition results can be returned: since each character occupies 4 pixels, there are 160 character positions, and d is the number of candidate recognition results the object recognition model produces for each character position. Then the argmax function can be applied to the 160*d recognition results to return 160 recognition results, which is equivalent to selecting the most likely result from the candidates at each character position. For example, in some embodiments, recognition proceeds position by position in steps of 4 pixels, so adjacent positions may produce repeated recognition results; these repeats can be removed through a deduplication operation to obtain the final recognition result of the image block, as sketched below.
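  • The decoding step can be sketched as follows, assuming the model returns a score matrix with one row per 4-pixel position; the plain removal of consecutive repeats shown here only illustrates the deduplication described above (practical recognizers usually also reserve a blank symbol so that genuinely doubled characters are not merged).

```python
import numpy as np

def decode(scores, charset, d=2):
    """Turn per-position scores into a character string.

    scores  : (160, C) array, one row per 4-pixel step of a 32*640*3 image block
    charset : list of the C recognizable characters
    d       : number of candidate results kept per position (0 < d < 5)
    """
    top_d = np.argsort(scores, axis=1)[:, ::-1][:, :d]   # the 160*d candidate results
    best = top_d[:, 0]                                    # argmax: most likely result per position
    out = []
    for idx in best:                                      # deduplication of repeated results
        ch = charset[idx]
        if not out or ch != out[-1]:
            out.append(ch)
    return "".join(out)
```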
  • Fig. 11 is a schematic diagram of the target object recognition result provided by at least one embodiment of the present disclosure. With reference to Figs. 2, 7A, 7B and 11, for example, the recognition results obtained for each sub-image block and each processed second image block are combined and spliced to obtain the object recognition result 1100 corresponding to the initial image; that is, the processing result obtained after the initial image 201 is processed by the image processing method provided in the present disclosure is shown in FIG. 11.
  • in the object recognition result 1100, all the characters (i.e., target objects) in the initial image 201 are recognized, and the relative positional relationship between the characters in the object recognition result 1100 is the same as their relative positions in the initial image 201.
  • FIG. 12 is a schematic block diagram of an image processing device provided by at least one embodiment of the present disclosure.
  • the image processing device may include: an image acquisition module 1201, an image processing module 1202, a region recognition module 1203, a determination module 1204, an interception module 1205 and an object recognition module 1206.
  • the image acquiring module 1201 is configured to acquire an initial image, and the initial image includes at least one target object.
  • the image acquiring module 1201 may execute step S110 described in FIG. 1; for a specific introduction, please refer to the related description of step S110, which will not be repeated here.
  • the image processing module 1202 is configured to process the initial image to obtain an intermediate image.
  • the image processing module 1202 may execute step S120 described in FIG. 1; for a specific introduction, please refer to the related description of step S120, which will not be repeated here.
  • the region recognition module 1203 is configured to use a region detection model to recognize the intermediate image to obtain a connected image including M object connected regions, where M is a positive integer.
  • the region recognition module 1203 may execute step S130 described in FIG. 1; for a specific introduction, please refer to the related description of step S130, which will not be repeated here.
  • the determination module 1204 is configured to determine M bounding boxes respectively corresponding to the M object connected regions in the connected image.
  • the determining module 1204 may execute step S140 described in FIG. 1; for a specific introduction, please refer to the related description of step S140, which will not be repeated here.
  • the interception module 1205 is configured to intercept N image blocks from the initial image based on the M bounding boxes, each image block includes at least one target object, and N is a positive integer.
  • the interception module 1205 may execute step S150 described in FIG. 1; for a specific introduction, please refer to the related description of step S150, which will not be repeated here. A combined sketch of the bounding-box determination and interception steps is given below.
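  • As an illustration only, the work of the determination module 1204 and the interception module 1205 could be sketched as follows, assuming the connected image is a single-channel binary mask of the M object connected regions and that one image block is cut out per bounding box (in the disclosure N may differ from M, for example when bounding boxes are merged or split); the function name is illustrative.

```python
import cv2

def intercept_blocks(initial_image, connected_image):
    """Determine the bounding boxes of the object connected regions in the
    connected image and crop the corresponding image blocks from the initial image."""
    # connected_image: 8-bit single-channel binary mask (assumption)
    num, labels, stats, _ = cv2.connectedComponentsWithStats(connected_image, connectivity=8)
    blocks = []
    for x, y, w, h, area in stats[1:]:        # stats[0] describes the background
        blocks.append(initial_image[y:y + h, x:x + w])
    return blocks
```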
  • the object recognition module 1206 is configured to use an object recognition model to recognize N image blocks, so as to obtain the target object in the initial image.
  • the object recognition module 1206 may execute step S160 described in FIG. 1; for a specific introduction, please refer to the related description of step S160, which will not be repeated here.
  • the image processing device can achieve technical effects similar to those of the aforementioned image processing method, which will not be repeated here.
  • the image acquisition module 1201, the image processing module 1202, the region recognition module 1203, the determination module 1204, the interception module 1205 and/or the object recognition module 1206 may include code and programs stored in a memory; a processor can execute the code and programs to realize some or all of the functions of these modules described above.
  • the image acquisition module 1201, the image processing module 1202, the region recognition module 1203, the determination module 1204, the interception module 1205 and/or the object recognition module 1206 may be dedicated hardware devices used to implement some or all of the functions of these modules described above.
  • the image acquisition module 1201, the image processing module 1202, the region recognition module 1203, the determination module 1204, the interception module 1205 and/or the object recognition module 1206 may be a circuit board or a combination of multiple circuit boards for realizing the above-mentioned functions.
  • the circuit board or the combination of multiple circuit boards may include: (1) one or more processors; (2) one or more non-transitory memories connected to the processors; and (3) processor-executable firmware stored in the memories.
  • FIG. 13 is a schematic block diagram of the electronic device provided by at least one embodiment of the present disclosure.
  • an electronic device 1300 includes a processor 1301, a communication interface 1302, a memory 1303 and a communication bus 1304.
  • the processor 1301, the communication interface 1302, and the memory 1303 communicate with each other through the communication bus 1304, and the processor 1301, the communication interface 1302, the memory 1303 and other components may also communicate through a network connection.
  • the present disclosure does not limit the type and function of the network here.
  • memory 1303 is used to store computer readable instructions.
  • the processor 1301 is configured to execute the computer-readable instructions to implement the image processing method according to any of the foregoing embodiments.
  • for the specific implementation of each step of the image processing method and related explanations, reference may be made to the above-mentioned embodiments of the image processing method, and details are not repeated here.
  • communication bus 1304 may be a Peripheral Component Interconnect Standard (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like.
  • the communication bus can be divided into an address bus, a data bus, a control bus, and the like. For ease of representation, only one thick line is used in the figure, but it does not mean that there is only one bus or one type of bus.
  • the communication interface 1302 is used to implement communication between the electronic device and other devices.
  • the processor 1301 and the memory 1303 may be set at the server (or cloud).
  • the processor 1301 may control other components in the electronic device to perform desired functions.
  • the processor 1301 can be a central processing unit (CPU), a network processor (NP), etc.; it can also be a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic devices, discrete gate or transistor logic devices, or discrete hardware components.
  • the central processing unit (CPU) may be an X86 or ARM architecture or the like.
  • memory 1303 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • the volatile memory may include random access memory (RAM) and/or cache memory (cache), etc., for example.
  • Non-volatile memory may include, for example, read only memory (ROM), hard disks, erasable programmable read only memory (EPROM), compact disc read only memory (CD-ROM), USB memory, flash memory, and the like.
  • Fig. 14 is a schematic block diagram of another electronic device provided by at least one embodiment of the present disclosure.
  • an electronic device 1400 may include a memory 1401, a processor 1402 and an image acquisition component 1403. It should be noted that the components of the electronic device 1400 shown in FIG. 14 are exemplary rather than limiting, and the electronic device 1400 may also have other components according to actual application requirements.
  • the image acquiring component 1403 is used to acquire an initial image.
  • the memory 1401 is used to store initial images and computer readable instructions.
  • Processor 1402 is used to read the initial image and execute computer readable instructions. When the computer-readable instructions are executed by the processor 1402, one or more steps in the image processing method according to any of the above-mentioned embodiments are executed.
  • the image acquisition component 1403 can be an image acquisition device; for example, the image acquisition component 1403 can be a camera of a smart phone, a camera of a tablet computer, a camera of a personal computer, a lens of a digital camera, a network camera, or another device used for image acquisition.
  • the initial image may be an original image directly collected by the image acquisition component 1403, or may be an image obtained after preprocessing the original image.
  • Preprocessing can eliminate irrelevant information or noise information in the original image, so as to process the image better.
  • Preprocessing may include, for example, performing data augmentation, image scaling, gamma correction, image enhancement, or noise-reduction filtering on the original image; a sketch of such preprocessing is given below.
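  • As an illustration only, such preprocessing could be sketched with OpenCV as follows; the parameter values are arbitrary and are not taken from the disclosure.

```python
import cv2
import numpy as np

def preprocess(original, target_width=1024, gamma=1.2):
    """Optional preprocessing of the original image before it is used as the
    initial image: scaling, gamma correction and noise-reduction filtering."""
    # scale the image to a fixed width while keeping the aspect ratio
    h, w = original.shape[:2]
    scale = target_width / float(w)
    image = cv2.resize(original, (target_width, int(h * scale)),
                       interpolation=cv2.INTER_LINEAR)
    # gamma correction via a lookup table
    table = ((np.arange(256) / 255.0) ** (1.0 / gamma) * 255).astype("uint8")
    image = cv2.LUT(image, table)
    # simple noise-reduction filtering
    return cv2.GaussianBlur(image, (3, 3), 0)
```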
  • the processor 1402 may control other components in the electronic device 1400 to perform desired functions.
  • the processor 1402 may be a central processing unit (CPU), a tensor processing unit (TPU), or a graphics processing unit (GPU), etc., which has data processing capabilities and/or program execution capabilities.
  • memory 1401 may include any combination of one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • One or more computer-readable instructions can be stored on the computer-readable storage medium, and the processor 1402 can execute the computer-readable instructions to implement various functions of the electronic device 1400 .
  • Fig. 15 is a schematic diagram of a computer-readable storage medium provided by at least one embodiment of the present disclosure.
  • one or more computer-readable instructions 1501 may be stored non-transitorily on the storage medium 1500.
  • when the computer-readable instructions 1501 are executed by a processor, one or more steps of the image processing method described above may be performed.
  • the storage medium 1500 may be applied in the electronic device 1300 and/or the electronic device 1400; for example, it may include the memory 1303 in the electronic device 1300 and/or the memory 1401 in the electronic device 1400.
  • for the description of the storage medium 1500, reference may be made to the description of the memory in the embodiments of the electronic device 1300 and/or the electronic device 1400, and details are not repeated here.
  • Fig. 16 is a schematic diagram of a hardware environment provided by at least one embodiment of the present disclosure.
  • the electronic device provided by the present disclosure can be applied in the Internet system.
  • the image processing apparatus, the electronic device 1300, and/or the electronic device 1400 involved in the present disclosure may be implemented using the computer system provided in FIG. 16 .
  • Such computer systems can include personal computers, laptops, tablets, mobile phones, personal digital assistants, smart glasses, smart watches, smart rings, smart helmets, and any smart portable or wearable device.
  • the specific system in this embodiment uses a functional block diagram to illustrate a hardware platform that includes a user interface.
  • such a computer device may be a general-purpose computer device or a special-purpose computer device, and both can be used to implement the image processing apparatus and the electronic devices in this embodiment.
  • the computer system can implement any of the components described herein that are needed to realize the image processing.
  • a computer system can be realized by a computer device through its hardware devices, software programs, firmware, and combinations thereof.
  • For the sake of convenience, only one computer device is drawn in Fig. 16, but the computer functions for providing the information required for the image processing described in this embodiment can be implemented by a group of similar platforms in a distributed manner, distributing the processing load of the computer system.
  • the computer system can include a communication port 1650, which is connected to a network for data communication.
  • the computer system can send and receive information and data through the communication port 1650; that is, the communication port 1650 enables the computer system to communicate with other electronic devices wirelessly or by wire to exchange data.
  • the computer system may also include a processor group 1620 (ie, the processor described above) for executing program instructions.
  • the processor group 1620 may consist of at least one processor (eg, CPU).
  • the computer system may include an internal communication bus 1610 .
  • a computer system may include different forms of program storage units and data storage units (i.e., the memory or storage media described above), such as a hard disk 1670, a read-only memory (ROM) 1630 and a random-access memory (RAM) 1640, which can be used to store various data files used by the computer for processing and/or communication, as well as program instructions executed by the processor group 1620.
  • the computer system may also include an input/output component 1660 for enabling input/output data flow between the computer system and other components (eg, user interface 1680, etc.).
  • these may include input devices such as a touch screen, touchpad, keyboard, mouse, camera, microphone, accelerometer or gyroscope; output devices such as a liquid crystal display (LCD), speaker or vibrator; storage devices such as magnetic tapes or hard disks; and communication interfaces.
  • Although FIG. 16 shows a computer system with various devices, it should be understood that the computer system is not required to have all of the devices shown; instead, the computer system may have more or fewer devices.


Abstract

Provided are an image processing method, an image processing apparatus, an electronic device and a computer-readable storage medium. The image processing method comprises: acquiring an initial image, the initial image comprising at least one target object; processing the initial image to obtain an intermediate image; recognizing the intermediate image by using a region detection model to obtain a connected image comprising M object connected regions; determining, in the connected image, M bounding boxes respectively corresponding to the M object connected regions; intercepting N image blocks from the initial image on the basis of the M bounding boxes, each image block comprising at least one target object; and recognizing the N image blocks by using an object recognition model so as to obtain the target object in the initial image.
PCT/CN2022/100269 2021-07-13 2022-06-22 Procédé et appareil de traitement d'image, dispositif et support de stockage WO2023284502A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110788327.XA CN113486828B (zh) 2021-07-13 2021-07-13 图像处理方法、装置、设备和存储介质
CN202110788327.X 2021-07-13

Publications (1)

Publication Number Publication Date
WO2023284502A1 true WO2023284502A1 (fr) 2023-01-19

Family

ID=77938189

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/100269 WO2023284502A1 (fr) 2021-07-13 2022-06-22 Procédé et appareil de traitement d'image, dispositif et support de stockage

Country Status (2)

Country Link
CN (1) CN113486828B (fr)
WO (1) WO2023284502A1 (fr)


Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486828B (zh) * 2021-07-13 2024-04-30 杭州睿胜软件有限公司 Image processing method, apparatus, device and storage medium
CN114445825A (zh) * 2022-02-07 2022-05-06 北京百度网讯科技有限公司 Character detection method and apparatus, electronic device and storage medium
CN114745500B (zh) * 2022-03-28 2023-09-19 联想(北京)有限公司 Image processing method and output detection system


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222613A (zh) * 2019-05-28 2019-09-10 绍兴数鸿科技有限公司 Vertical-layout traditional Chinese character recognition method based on a convolutional neural network
CN110390260B (zh) * 2019-06-12 2024-03-22 平安科技(深圳)有限公司 Method and apparatus for processing scanned pictures, computer device and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140072219A1 (en) * 2012-09-08 2014-03-13 Konica Minolta Laboratory U.S.A., Inc. Document image binarization and segmentation using image phase congruency
CN110348449A (zh) * 2019-07-10 2019-10-18 电子科技大学 Identity card character recognition method based on a neural network
CN111860479A (zh) * 2020-06-16 2020-10-30 北京百度网讯科技有限公司 Optical character recognition method and apparatus, electronic device and storage medium
CN112464931A (zh) * 2020-11-06 2021-03-09 马上消费金融股份有限公司 Text detection method, model training method and related device
CN112560847A (zh) * 2020-12-25 2021-03-26 中国建设银行股份有限公司 Method and apparatus for locating text regions in images, storage medium and electronic device
CN113486828A (zh) * 2021-07-13 2021-10-08 杭州睿胜软件有限公司 Image processing method, apparatus, device and storage medium

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116189194A (zh) * 2023-04-27 2023-05-30 北京中昌工程咨询有限公司 Drawing enhancement and segmentation method for engineering modeling
CN116204105A (zh) * 2023-05-05 2023-06-02 北京睿企信息科技有限公司 Processing system for associated image presentation
CN117409428A (zh) * 2023-12-13 2024-01-16 南昌理工学院 Test paper information processing method and system, computer and storage medium
CN117409428B (zh) * 2023-12-13 2024-03-01 南昌理工学院 Test paper information processing method and system, computer device and storage medium

Also Published As

Publication number Publication date
CN113486828A (zh) 2021-10-08
CN113486828B (zh) 2024-04-30


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22841142

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE