CN113297870A - Image processing method, image processing device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN113297870A
CN113297870A (application CN202010109109.4A)
Authority
CN
China
Prior art keywords
code block
code
image
region
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010109109.4A
Other languages
Chinese (zh)
Inventor
赵玉瑶
李伟
张海锋
张兰兰
金哲洙
姜映映
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Samsung Telecom R&D Center
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Original Assignee
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Samsung Telecommunications Technology Research Co Ltd, Samsung Electronics Co Ltd filed Critical Beijing Samsung Telecommunications Technology Research Co Ltd
Priority to CN202010109109.4A
Publication of CN113297870A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06KGRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K7/00Methods or arrangements for sensing record carriers, e.g. for reading patterns
    • G06K7/10Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation
    • G06K7/14Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation using light without selection of wavelength, e.g. sensing reflected white light
    • G06K7/1404Methods for optical code recognition
    • G06K7/1439Methods for optical code recognition including a method step for retrieval of the optical code
    • G06K7/1443Methods for optical code recognition including a method step for retrieval of the optical code locating of the code in an image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06KGRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K7/00Methods or arrangements for sensing record carriers, e.g. for reading patterns
    • G06K7/10Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation
    • G06K7/14Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation using light without selection of wavelength, e.g. sensing reflected white light
    • G06K7/1404Methods for optical code recognition
    • G06K7/1408Methods for optical code recognition the method being specifically adapted for the type of code
    • G06K7/14131D bar codes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06KGRAPHICAL DATA READING; PRESENTATION OF DATA; RECORD CARRIERS; HANDLING RECORD CARRIERS
    • G06K7/00Methods or arrangements for sensing record carriers, e.g. for reading patterns
    • G06K7/10Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation
    • G06K7/14Methods or arrangements for sensing record carriers, e.g. for reading patterns by electromagnetic radiation, e.g. optical sensing; by corpuscular radiation using light without selection of wavelength, e.g. sensing reflected white light
    • G06K7/1404Methods for optical code recognition
    • G06K7/1408Methods for optical code recognition the method being specifically adapted for the type of code
    • G06K7/14172D bar codes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Toxicology (AREA)
  • Electromagnetism (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application provides an image processing method and device, an electronic device, and a computer-readable storage medium. The method determines the position information and code block type of each code block region in an image to be processed, extracts the code region image of each code block region from the image according to that region's position information, and geometrically transforms each code region image to obtain a corresponding code region setting image, thereby avoiding the computing resources that conventional methods spend on line-by-line scanning and on permuting and combining positioning points. Each code region setting image is then decoded according to the code block type of its region, which avoids the computing resources spent on repeated scans with different locators, supports fast recognition of multiple code block regions of various types, and significantly improves the detection effect.

Description

Image processing method, image processing device, electronic equipment and computer readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method, an image processing apparatus, an electronic device, and a computer-readable storage medium.
Background
Barcode technologies such as one-dimensional codes and two-dimensional codes are comprehensive technologies integrating image identification, image processing, information encoding, information encryption, and network communication. They emerged and developed rapidly with the application of computer technology, offer fast input, high accuracy, low cost, and strong reliability, and are widely used across many industries.
In conventional methods, taking detection of a two-dimensional code as an example, the picture must be scanned row by row and column by column to find the locators of any two-dimensional code that may exist in the picture; a standard square (or rectangle) is then obtained from the code block body according to the locators, and the correspondence between the transformed pixel points and the code block information is determined so that decoding can be performed.
However, if the picture is too large, row-by-row and column-by-column scanning consumes substantial computing resources. Moreover, if the picture contains multiple codes, the detected locator points must be permuted and combined to determine which code each locator belongs to, so the amount of computation grows exponentially. If the codes are additionally of different types, each type's locator must be scanned for separately, which further inflates the computation and significantly increases power consumption and time.
Disclosure of Invention
In order to overcome the above technical problems or at least partially solve the above technical problems, the following technical solutions are proposed:
in a first aspect, the present application provides an image processing method, including:
determining the position information and code block type of each code block region in the image to be processed;
extracting code region images of the code block regions from the image to be processed according to the position information of the code block regions;
respectively carrying out geometric transformation on each code region image to obtain a corresponding code region setting image;
and respectively decoding the corresponding code region setting image according to the code block type of each code block region.
In a second aspect, the present application provides an image processing apparatus comprising:
the determining module is used for determining the position information and the code block type of each code block region in the image to be processed;
the extraction module is used for extracting the code region images of the code block regions from the image to be processed according to the position information of the code block regions;
the geometric transformation module is used for respectively carrying out geometric transformation on each code region image to obtain a corresponding code region setting image;
and the decoding module is used for decoding the corresponding code region setting image according to the code block type of each code block region.
In a third aspect, the present application provides an electronic device comprising: a processor and a memory storing at least one instruction, at least one program, set of codes, or set of instructions, which is loaded and executed by the processor to implement the method as set forth in the first aspect of the application.
In a fourth aspect, the present application provides a computer readable storage medium for storing a computer instruction, program, code set or instruction set which, when run on a computer, causes the computer to perform a method as set forth in the first aspect of the present application.
According to the image processing method, apparatus, electronic device, and computer-readable storage medium provided herein, the position information and code block type of each code block region in the image to be processed are determined; the code region image of each code block region is extracted from the image to be processed according to its position information; geometric transformation is performed on each code region image to obtain a corresponding code region setting image; and each code region setting image is decoded according to the code block type of its code block region.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a code block identification and information reading process provided by an embodiment of the present application;
fig. 3 is a schematic diagram for displaying visualized thermodynamic diagram information according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an anchor frame-based object detection provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of four vertex-based frameless detection provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of a reduced infrastructure network according to an embodiment of the present application;
FIG. 7 is a diagram illustrating an infrastructure of a neural network provided by an embodiment of the present application;
FIG. 8 is a schematic diagram illustrating the operation of a dilation convolution according to an embodiment of the present application;
FIG. 9 is a schematic diagram of fusion of feature maps of three stages provided in an embodiment of the present application;
FIG. 10 is a schematic illustration of a three-scale feature map merge provided by an embodiment of the present application;
fig. 11 is a schematic diagram of a detection network according to an embodiment of the present application;
fig. 12 is a schematic diagram of a CornerNet detection algorithm provided in an embodiment of the present application;
FIG. 13 is a diagram illustrating grouping of four vertices provided by an embodiment of the present application;
FIG. 14 is a schematic diagram of a model compression algorithm provided by an embodiment of the present application;
fig. 15 is a schematic diagram of geometric transformation performed on a code region image according to an embodiment of the present application;
fig. 16 is a schematic flowchart of decoding a code block region of a QR two-dimensional code type according to an embodiment of the present application;
fig. 17 is a schematic flowchart of detecting a two-dimensional code by a conventional method according to an embodiment of the present application;
fig. 18 is a schematic flowchart of decoding a code block region of a barcode type according to an embodiment of the present application;
fig. 19 is a schematic flowchart of decoding a code block region of a data matrix code type according to an embodiment of the present application;
fig. 20 is a schematic diagram of an intelligent identification process of a two-dimensional code according to an embodiment of the present application;
FIG. 21 is a diagram illustrating simultaneous recognition of two codes in an image according to an embodiment of the present disclosure;
fig. 22 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any combination of one or more of the associated listed items.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
An embodiment of the present application provides an image processing method, as shown in fig. 1, the method includes:
step S110: determining the position information and code block type of each code block region in the image to be processed;
step S120: extracting code region images of the code block regions from the image to be processed according to the position information of the code block regions;
step S130: respectively carrying out geometric transformation on each code region image to obtain a corresponding code region setting image;
step S140: and respectively decoding the corresponding code region setting image according to the code block type of each code block region.
In the embodiment of the present application, the image to be processed refers to an acquired image including Code blocks (Code blocks) such as one-dimensional codes and/or two-dimensional codes. In practical application, it may be determined whether the acquired image includes the code block, and if it is determined that the acquired image includes the code block, the acquired image may be used as an image to be processed to perform subsequent processing. For the embodiment of the present application, the number of code blocks in the image to be processed may be one or more, and when the number of code blocks is multiple, the types of the multiple code blocks may be the same or different, and all the code blocks may be detected by the embodiment of the present application.
In the embodiment of the present application, a code block region is the region where one code block is located in the image to be processed, and may also be referred to as a Code Area for short. It can be understood that however many code blocks the image to be processed contains, that many code block regions can be detected. In step S110, the position information and code block type corresponding to each code block region in the image to be processed are detected. The position information is the position of the code block region within the image to be processed, and the code block type is the type to which the code block in that region belongs.
In the embodiment of the present application, the code block types include, but are not limited to, Quick Response (QR) two-dimensional codes, Data Matrix codes, PDF417 (Portable Data File 417) two-dimensional barcodes, one-dimensional barcodes (Barcode), mini program codes, and the like.
In step S120, for each code block region in the image to be processed, a corresponding code region image is extracted, and since the position information of each code block region can locate the position of the code block region in the image to be processed, the region located by each position information is extracted from the image to be processed, and thus each code region image can be extracted. In other embodiments, other extraction methods may also be employed.
As will be understood by those skilled in the art, taking code blocks such as one-dimensional and two-dimensional codes, which are normally fixed rectangles (including squares), as an example: due to factors such as the shooting angle and the shot object, each code region image extracted from the image to be processed may be a quadrilateral deformed at an arbitrary angle in the actual scene. Therefore, in step S130, each code region image is geometrically transformed into a standard orthographic projection pattern to obtain each code region setting image. For an originally rectangular code, the resulting standard orthographic projection pattern is a rectangular code region image; for an originally square code, the result is likewise a square code region image. A code that is originally non-rectangular, such as a circular or irregular one, can be mapped to a rectangle (e.g., as the inscribed circle of a square) to obtain its code region setting image. The code can then be decoded from the code region setting image.
As can be seen from the above description, in the embodiment of the present application, different types of codes can be detected simultaneously. In step S140, each code region setting image is decoded according to the code block type of its code block region, so that the information of the different types of code blocks in the image to be processed can be read. As an example, a code block region of the QR two-dimensional code type may be decoded according to the encoding specification of the QR code. The skilled person can associate each code block type with a corresponding decoding manner according to the actual situation in order to decode the detected code blocks.
Fig. 2 illustrates a code block as an example, which shows a code block identification and information reading process according to an embodiment of the present application, specifically, after an image to be processed is subjected to detection (target detection) of a code block region in step S110, a code region image is extracted from the image to be processed in step S120, the code region image is subjected to geometric transformation in step S130 to obtain a normalized code region setting image, and then the code region setting image is decoded and information is extracted in step S140 to obtain a final result. For the case that a plurality of code blocks exist in the picture to be processed, the processing procedure of each code block is similar, and is not described herein again.
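The end-to-end flow of fig. 2 can be summarized in code. The sketch below is a hypothetical illustration only: the helper names (detect_code_blocks, warp_to_canonical) and the decoder registry are not from the patent, and a decoder is assumed to exist for each code type.

```python
import numpy as np

def process_image(image: np.ndarray, detect_code_blocks, warp_to_canonical,
                  decoders: dict):
    """Sketch of steps S110-S140 for all code blocks in one image."""
    results = []
    # S110: one forward pass yields each region's four vertices and its type
    for vertices, code_type in detect_code_blocks(image):
        # S120 + S130: crop the quadrilateral and rectify it in one warp
        canonical = warp_to_canonical(image, vertices)
        # S140: dispatch to the decoder that matches the predicted code type
        decoder = decoders.get(code_type)
        if decoder is not None:
            results.append(decoder(canonical))
    return results
```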
According to the image processing method provided by the embodiment of the application, the position information and code block type of each code block region in the image to be processed are determined; the code region image of each code block region is extracted from the image to be processed according to its position information; geometric transformation is performed on each code region image to obtain a corresponding code region setting image; and each code region setting image is decoded according to the code block type of its code block region. This avoids the computing-resource consumption of line-by-line scanning, of permuting and combining positioning points, and of repeated scans with different locators, supports fast recognition of multiple code block regions of various types, and significantly improves the detection effect.
In the embodiment of the present application, a feasible implementation manner is provided for step S110, and specifically, step S110 may include the steps of:
step S111: performing multi-scale feature extraction on an image to be processed to obtain a feature map;
for the embodiment of the application, the multi-scale feature extraction can realize the extraction of the features of different scales of the image to be processed to obtain the feature map. The large scale corresponds to the global features of the image, and the small scale corresponds to the local features of the image. The sizes of different code blocks in the image to be processed are different, and the detection process of global and local characteristics is combined, so that the method is not only suitable for the image to be processed containing one code block, but also suitable for the image to be processed containing a plurality of code blocks with different sizes and different shapes.
Specifically, this step can be implemented as follows:
step S1111: extracting features of the image to be processed to obtain a basic sub-feature map;
Step S1112: and performing feature extraction on the basic sub-feature graph based on at least one scale to obtain a sub-feature graph of at least one scale, and obtaining a feature graph according to the sub-feature graph of at least one scale.
Further, in step S1112, when the scales are at least two, the process of extracting features from the basic sub-feature map based on at least two scales to obtain sub-feature maps of at least two scales and obtaining a feature map according to the sub-feature maps of at least two scales may specifically include the steps of: extracting the features of the basic sub-feature graph based on a first scale to obtain a sub-feature graph of the first scale; sequentially carrying out feature extraction on the sub-feature graph of the previous scale based on smaller scales from the second scale to obtain at least one sub-feature graph of smaller scales; and obtaining a feature map according to the sub-feature map with the first scale and at least one sub-feature map with a smaller scale.
Step S112: the set vertices in the feature map are detected, respectively, to obtain the position information and code block type of each code block region.
The set vertices are vertices chosen according to the angular characteristics of the code block shape. For example, for a code block region that can be associated with a quadrilateral, the set vertices may be the four vertices: top left, top right, bottom right, and bottom left. In practical applications, the four vertices may not lie at perfectly horizontal or vertical positions, and they can be distinguished according to a predetermined order (such as a clockwise order). In the embodiment of the present application, the position information of each code block region is obtained by determining the position information of the four vertices of the quadrilateral, which makes it possible to detect codes deformed at any angle. Meanwhile, the code block type of each code block region is output by determining the code block types of the four vertices of the quadrilateral.
Specifically, this step can be implemented as follows: the set vertices in the feature map are detected to obtain corresponding thermodynamic diagrams (Heatmaps), connected vectors (Embeddings), and offset information (Offsets); the position information and code block types of the code block regions are then obtained from the thermodynamic diagrams, connected vectors, and offset information.
Heatmaps are used to predict which points may be prediction vertices of a code block region, Embeddings are used to predict which code block region each prediction vertex belongs to, and Offsets are used to fine-tune and correct the positions of the prediction vertices.
Specifically, Heatmaps output information for predicting vertices, as shown in fig. 3, which displays visualized thermodynamic diagram information. The Heatmaps are represented by a feature map of dimension C × H × W comprising C channels. Each channel is a Gaussian mask with values in the range [0, 1]; the value at each point is that position's vertex score, and a non-zero value indicates that the point may be a vertex of some code block region. Once a position is predicted to be a vertex of a code block region, its position information is output; that is, the position information of each predicted vertex can be determined from each thermodynamic diagram.
Where C corresponds to the number of identifiable code block types, those skilled in the art may set C according to actual situations, and as an example, for an embodiment capable of identifying a one-dimensional barcode, a QR two-dimensional code, and a DataMatrix code at the same time, C may be set to 3.
In the embodiment of the present application, the code block type of each prediction vertex is further determined from each thermodynamic diagram. Specifically, a vector may be obtained from the feature map of each channel and mapped through a fully connected layer to predict a score for each code block type; the code block type with the highest score is taken as the code block type of the corresponding prediction vertex.
In the embodiment of the present application, the code block regions are grouped for each prediction vertex according to the connected vector of each prediction vertex. Specifically, taking the set vertices as four vertices, i.e., top left vertex, top right vertex, bottom right vertex, and bottom left vertex, as an example, taking the predicted top left vertex, top right vertex, bottom right vertex, and bottom left vertex as a group, at least one group of four predicted vertices can be obtained. For each group of four prediction vertexes, the correlation between the four prediction vertexes can be compared by using a multidimensional connecting vector, and the probability that the prediction vertexes belong to the same code block region is predicted to be used for grouping the predicted vertexes. If four vertices at the top left, top right, bottom right, and bottom left belong to the same object (code block region), the distance between them will be small, and vertices belonging to different objects (code block regions) should have large distances between them.
In the embodiment of the present application, to reduce the amount of calculation, avoid computing pairwise distances between the four vertices, and optimize the grouping policy, a reference embedding vector is defined. A corresponding reference vector is determined from the connected vectors of each group of set prediction vertices; the reference vector may be the mean of the connected vectors of the set vertices. The prediction vertices are then grouped into code block regions according to the distances between each group's set prediction vertices and the corresponding reference vector. When the distance between each set prediction vertex's connected vector and the corresponding reference vector is shortest, they are classified into the same target (code block region), and the grouped set prediction vertices can be taken as the set vertices of each code block region.
Further, the initial position information and the code block type of the corresponding code block region can be obtained according to the position information and the code block type of each grouped prediction vertex. It is understood that the code block type of each group of grouped prediction vertices should be the same, and therefore the code block type of the corresponding code block region can be directly determined.
In the embodiment of the application, by simultaneously predicting the positions of the set vertices (for example, the four vertices of a rectangular code block), the position information of a code block region can be initially located in a single pass and its code block type determined, which simplifies the prediction step and speeds up detection.
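A minimal sketch of this grouping step follows, assuming Python with NumPy, one scalar connected vector per vertex (as in CornerNet-style models), and a hypothetical max_dist threshold; the candidate enumeration and helper names are illustrative, not the patent's implementation.

```python
import itertools
import numpy as np

def group_quads(tl, tr, br, bl, max_dist=0.5):
    """tl/tr/br/bl: lists of (point, embedding) candidates per vertex class."""
    quads = []
    for a, b, c, d in itertools.product(tl, tr, br, bl):
        emb = np.array([a[1], b[1], c[1], d[1]])
        ref = emb.mean()  # reference embedding vector: mean of the four
        # all four vertices must lie close to the reference vector
        if np.abs(emb - ref).max() < max_dist:
            quads.append((a[0], b[0], c[0], d[0]))
    return quads
```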
Since the process of obtaining the Heatmap from the input image to be processed accumulates errors, fine-tuning is needed to eliminate the error information and improve accuracy. In the embodiment of the present application, the initial position information of the corresponding code block region is fine-tuned, based on the position information of the corresponding prediction vertex, according to the offset information of each prediction vertex, to obtain the position information of each code block region. Specifically, the offset information of each prediction vertex can be determined from its true position on the image to be processed and its mapped position on the feature map. As an example, assuming the down-sampling rate is r, a prediction vertex at true position (x, y) on the image to be processed maps to position $\left(\left\lfloor \frac{x}{r} \right\rfloor, \left\lfloor \frac{y}{r} \right\rfloor\right)$ on the feature map, and the following offset information may be defined:

$$o = \left(\frac{x}{r} - \left\lfloor \frac{x}{r} \right\rfloor,\; \frac{y}{r} - \left\lfloor \frac{y}{r} \right\rfloor\right)$$

Then, according to the offset information of each prediction vertex, the initial position information of the corresponding code block region is fine-tuned based on the position information of the corresponding prediction vertex.
In this embodiment of the present application, step S110 may be implemented based on deep learning, specifically, the position information and the code block type of each code block region in the image to be processed are determined by a neural network.
After the neural network processing, more accurate position information of the region where the code block is located in the image to be processed can be given, and the accuracy and precision of subsequent identification can be effectively improved.
Those skilled in the art will appreciate that this step belongs to the field of object detection in image processing. In practice, current object detection algorithms generally fall into two categories: anchor-based and frameless (anchor-free).
With anchor-based object detection, the specific corner (vertex) information of an arbitrarily rotated quadrilateral target region can be determined, but during training and detection, as shown in fig. 4, prior information such as the size and aspect ratio of the anchor boxes must be preset and fine-tuned for each particular scene before the final trained model detects well. For such models, a long strip-shaped target region requires the box aspect ratio to be enlarged, while a near-square target requires a ratio close to 1; a large number of candidate boxes is usually generated, with the final result extracted by scoring. Such detection models therefore cannot automatically adapt to varied application scenarios and adapt poorly when handling multiple types of codes at once.
In the embodiment of the present application, a one-step method based on keypoint detection, that is, a frameless strategy, is adopted. As shown in fig. 5, through the four vertices of a quadrilateral (corresponding to m1, m2, m3, and m4 in fig. 5), a code block (corresponding to G_m in fig. 5) deformed at any angle in an actual scene can be detected.
Specifically, the neural network may include: a feature extraction network and a detection head network, wherein the feature extraction network can perform the operation of step S111; the detection head network may perform the operation of step S112. Namely:
in step S111: performing multi-scale feature extraction on the image to be processed through a feature extraction network to obtain a feature map;
in step S112: and respectively detecting the set top points in the characteristic diagram through a detection head network to obtain the position information and the code block type of each code block region.
Namely, through the feature extraction and detection head network, the final positioning result and code category can be output from the input of the image to be processed.
In practical application, a plurality of detection networks may be set in the detection head network according to the number of the set vertices, and as an example, when the set vertices are four vertices, i.e., an upper left vertex, an upper right vertex, a lower right vertex, and a lower left vertex, the detection head network may include four detection networks, and the four detection networks respectively correspond to the four vertices, i.e., the upper left vertex, the upper right vertex, the lower right vertex, and the lower left vertex; then, in step S112, four vertices in the feature map may be detected by four detection networks, respectively, to obtain the position information and code block type of each code block region.
In the embodiment of the application, the feature extraction network may include a basic network and at least one additional feature extraction layer which are connected in sequence; wherein the base network may perform the operation of step S1111, and the at least one additional feature extraction layer may perform the operation of step S1112. Namely:
step S1111: extracting the features of the image to be processed through a basic network to obtain a basic sub-feature map;
step S1112: and performing feature extraction on the basic sub-feature map based on at least one scale through at least one additional feature extraction layer to obtain a sub-feature map of at least one scale, and obtaining a feature map according to the sub-feature map of at least one scale.
Wherein each additional feature extraction layer may perform feature extraction based on a scale. Then, in step S1112, when the scales are at least two, i.e. at least through at least two additional feature extraction layers, feature extraction is performed on the basic sub-feature map based on the at least two scales respectively. Specifically, feature extraction is carried out on the basic sub-feature graph based on a first scale through a first additional feature extraction layer to obtain a sub-feature graph of the first scale; and starting from the second additional feature extraction layer, sequentially extracting features of the sub-feature map of the previous scale based on a smaller scale.
In practical applications, a person skilled in the art can select a suitable basic network according to actual needs, for example, Convolutional Neural Networks (ConvNet, CNN) such as ResNet (Residual Neural Network), VGG (Visual Geometry Group Network), or MobileNet (a lightweight network). The embodiment of the application provides a basic network, shown in fig. 6, that is a simplified small network: it reduces the amount of computation, increases speed, and runs faster on a terminal. Each of the convolutional layers Conv1 to Conv5 uses a 3 × 3 convolution with stride 1. The pooling layers have size 2 and stride 2 and use a max pooling strategy.
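A sketch of this reduced base network follows, assuming PyTorch. The patent fixes only the 3 × 3 / stride-1 convolutions and 2 × 2 / stride-2 max pooling; the channel widths and the placement of a pooling layer after every convolution are assumptions for illustration.

```python
import torch.nn as nn

def conv_block(cin, cout):
    # 3x3 convolution with stride 1, followed by 2x2 max pooling with stride 2
    return nn.Sequential(
        nn.Conv2d(cin, cout, kernel_size=3, stride=1, padding=1),
        nn.ReLU(inplace=True),
        nn.MaxPool2d(kernel_size=2, stride=2),
    )

class BaseNet(nn.Module):
    def __init__(self):
        super().__init__()
        widths = [3, 16, 32, 64, 128, 128]  # assumed channel widths
        self.stages = nn.Sequential(        # Conv1 .. Conv5 of fig. 6
            *[conv_block(widths[i], widths[i + 1]) for i in range(5)]
        )

    def forward(self, x):
        return self.stages(x)  # the basic sub-feature map
```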
Furthermore, at least one additional feature extraction layer is connected to the basic network, the low-layer feature map can have higher spatial resolution, and the high-layer feature map has a larger receptive field and thus contains more semantic information.
In the embodiment of the application, as shown in fig. 7, the image to be processed passes through the feature extraction network and the detection head network to obtain a detection result. To achieve a better multi-scale feature extraction effect, the feature extraction network part in fig. 7 connects three additional feature extraction layers (additional feature extraction layers 1, 2, and 3 in fig. 7; for convenience, an additional feature extraction layer is denoted Stage hereinafter) to the base network. Specifically, the features obtained by the base network (i.e., the basic sub-feature map described above) are adjusted into Stage1 (the sub-feature map of the first scale); Stage2 and Stage3 sequentially extract higher-level feature information (the two sub-feature maps of smaller scale); and the features of Stage1, Stage2, and Stage3 are fused (Concatenate) together to obtain the feature map for detection.
In this embodiment of the present application, the process of fusing the features of Stage1, Stage2, and Stage3 together to obtain a feature map for detection, that is, the process of obtaining the feature map according to the sub-feature map of the first scale and the sub-feature map of at least one smaller scale may specifically include the steps of: performing expansion convolution on at least one sub-feature map with smaller scale according to the resolution of the sub-feature map with the first scale; and fusing the at least one expanded and convolved sub-feature map with the sub-feature map of the first scale to obtain a feature map so as to reduce the calculation amount.
Among them, the dilation convolution is an operation commonly used in a convolutional neural network. In the layer-by-layer increasing process of the network, the pooling layer continuously reduces the resolution of the image, and higher-order semantic information can be acquired through the processing of subsequent layers. However, when the network finally performs feature extraction, the feature map size needs to be restored to the original image size or a similar size. At this time, a larger receptive field can be obtained through the operation of dilation convolution, and meanwhile, the loss of information quantity is reduced. As shown in fig. 8, the operation of the dilation convolution is shown.
In the embodiment of the present application, assuming the input image size is H × W, as shown in fig. 9, the feature maps of Stage2 and Stage3 are first raised to the same resolution as Stage1 using dilated convolution, then L2 normalization is applied, and the feature maps of the three stages are fused along the channel dimension; the fused feature map for detection has size (H/r) × (W/r), where r denotes the down-sampling rate. The combination of the feature maps of the three scales is shown in fig. 10.
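A sketch of this fusion is given below, assuming PyTorch and assumed channel widths. Since dilation alone does not change spatial size, an explicit bilinear interpolation to Stage1's resolution is added here as an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class StageFusion(nn.Module):
    def __init__(self, c2=128, c3=128):
        super().__init__()
        # dilated 3x3 convs enlarge the receptive field of Stage2/Stage3
        self.d2 = nn.Conv2d(c2, c2, 3, padding=2, dilation=2)
        self.d3 = nn.Conv2d(c3, c3, 3, padding=4, dilation=4)

    def forward(self, s1, s2, s3):
        size = s1.shape[2:]  # bring everything to Stage1's resolution
        s2 = F.interpolate(self.d2(s2), size=size, mode="bilinear",
                           align_corners=False)
        s3 = F.interpolate(self.d3(s3), size=size, mode="bilinear",
                           align_corners=False)
        # L2-normalize each stage along channels, then fuse by concatenation
        feats = [F.normalize(t, p=2, dim=1) for t in (s1, s2, s3)]
        return torch.cat(feats, dim=1)  # (H/r) x (W/r) feature map
```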
In the embodiment of the present application, as shown in fig. 7, the detection head network portion is taken to include four detection networks (corresponding to m1, m2, m3, and m4 in fig. 7), each of which includes three branch networks; the three branch networks of each detection network detect the corresponding vertex in the feature map to obtain the corresponding thermodynamic diagram, connected vector, and offset information.
The embodiment of the present application provides a detection network; taking the m1 branch as an example (the other three branches are similar), it is divided into three sub-branches, as shown in fig. 11. Specifically, the feature map for detection is first fed through a convolutional layer containing a 3 × 3 convolution, Batch Normalization (BN), and a ReLU activation function (abbreviated Conv3 × 3_BN_ReLU) and split into three branches. Each branch passes through a 3 × 3 convolution with a ReLU activation function (abbreviated Conv3 × 3_ReLU) and then a 1 × 1 convolution (Conv1 × 1), yielding the corresponding Heatmaps, Embeddings, and Offsets. Combining the four detection networks, the position information and code block type of each code block region can be obtained from the predicted thermodynamic diagrams, connected vectors, and offset information.
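The sketch below illustrates one such branch (m1) in PyTorch; the intermediate channel width of 256 and the embedding dimension are assumptions, and the other three vertex branches would be identical.

```python
import torch.nn as nn

class VertexBranch(nn.Module):
    def __init__(self, cin, num_types, embed_dim=1):
        super().__init__()
        self.stem = nn.Sequential(           # Conv3x3_BN_ReLU
            nn.Conv2d(cin, 256, 3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
        )
        def sub(cout):                       # Conv3x3_ReLU -> Conv1x1
            return nn.Sequential(
                nn.Conv2d(256, 256, 3, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(256, cout, 1),
            )
        self.heatmap = sub(num_types)        # Heatmaps: C score channels
        self.embed = sub(embed_dim)          # Embeddings: grouping vector
        self.offset = sub(2)                 # Offsets: sub-pixel (dx, dy)

    def forward(self, x):
        x = self.stem(x)
        return self.heatmap(x).sigmoid(), self.embed(x), self.offset(x)
```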
In the embodiment of the application, a one-step method is adopted, which has a large speed advantage over methods based on Faster R-CNN (Faster Region-based Convolutional Neural Network) with nearly equal performance. Second, instead of an anchor-box-based method, a bottom-up keypoint detection concept is adopted: the detected keypoints are connected together through connected vectors to form a whole, realizing accurate prediction of the entire target position.
One advantage of the embodiment of the present application is that the step of setting box sizes and scales is eliminated, which adapts well to barcodes with very large aspect ratios. Two-dimensional codes are generally close to square, while barcodes are short, and barcodes with longer serial numbers have large length-to-width ratios; detecting such codes with boxes would require adjusting the size and ratio of the target boxes so that code block regions of all ratios can be detected simultaneously, increasing detection overhead and reducing accuracy. Second, the set vertices (for example, four vertices) of a code block region are each assigned a multidimensional connected vector: the vertex positions are found through keypoint detection, and the vertices are divided into different code block regions through the connected vectors. The specific position of a code block region is obtained by combining its vertices, which enables recognition of code blocks of various types, sizes, and aspect ratios.
Compare other frameless target detection algorithms, such as CornerNet, which defines a horizontal rectangle based on an object's top-left and bottom-right corners to determine its position. As shown in fig. 12, the convolutional network (CNN) outputs thermodynamic diagram (Heatmap) information for each top-left/bottom-right corner of the target and a connected vector (Embeddings) for each corner; the network is trained to predict whether two candidate corners (top-left/bottom-right) belong to the same object, finally yielding the object's position. Because there are only two corners, a typical frameless target detection algorithm can only predict axis-aligned targets (objects or people); it cannot locate an object at an inclined or arbitrary rotation angle, and cannot detect the target's true rotation angle and vertex positions.
In the embodiment of the application, set vertices close to the target are detected through the CNN; taking four set vertices as an example, similarity information for the four corner points of the same code block is obtained through training and is combined and fine-tuned to obtain the final position. Since a typical code block can generally be regarded as, or mapped to, a fixed rectangle (square), when it is deformed in an actual scene into a quadrilateral with an arbitrary rotation angle, it can still be captured through its four vertices so that the codeword portion (code block region) can be segmented and its information bits corrected and read.
In the embodiment of the present application, in order to train the detection network to output better Heatmaps, Embeddings, and Offsets, the following loss functions are provided:
(1) loss function of location information (label loss):
To deal with the severe imbalance between positive and negative sample ratios, the embodiment of the present application employs a focal loss function. The focal loss for binary cross entropy is defined as:

$$\mathrm{FL}(y') = \begin{cases} -(1-y')^{\gamma}\log y', & y = 1 \\ -\,y'^{\gamma}\log(1-y'), & y = 0 \end{cases}$$

where y is the label, y' is the output of the activation function, and γ > 0 is a control factor that reduces the loss of easily classified samples and focuses training on hard, easily misclassified samples.
In addition, the importance of the positive and negative samples can be further balanced by adding a balance factor alpha, and the formula is as follows:
$$\mathrm{FL}(y') = \begin{cases} -\alpha(1-y')^{\gamma}\log y', & y = 1 \\ -(1-\alpha)\,y'^{\gamma}\log(1-y'), & y = 0 \end{cases}$$
further, the variables may define a focus loss function on the basis of this with respect to the position information.
As noted above, the class of a vertex (its code block type) can be predicted from the thermodynamic diagram. For a vertex of class c (c ∈ C), there is only one ground-truth reference position; all other locations are negatives. Rather than penalizing all negatives equally, for negatives inside a circle of radius R centered at the reference position, the penalty is reduced. The reduced penalty is realized through an unnormalized two-dimensional Gaussian function:

$$e^{-\frac{x^2 + y^2}{2\sigma^2}}$$

where the point (x, y) = (0, 0) is the reference position with label 1, and σ is 1/3 of the circle radius. In this way, a prediction box composed of negative-sample vertices close to the reference position still has a large overlap (IoU) with the ground truth.
In the embodiment of the present application, let $p_{cij}$ be the score for class c at location (i, j) of the predicted thermodynamic diagram, and $y_{cij}$ the Gaussian-enhanced reference thermodynamic diagram above. The following focal loss is defined:

$$L_{focal} = -\frac{1}{N}\sum_{c=1}^{C}\sum_{i=1}^{H}\sum_{j=1}^{W}\begin{cases}\left(1-p_{cij}\right)^{\gamma}\log p_{cij}, & y_{cij}=1 \\ \left(1-y_{cij}\right)^{\alpha}\,p_{cij}^{\gamma}\log\left(1-p_{cij}\right), & \text{otherwise}\end{cases}$$
where the parameters are defined as follows:
N: the number of code block regions;
C: the number of code block types;
H: the feature map height;
W: the feature map width;
γ: focal-loss hyperparameter controlling the loss weight of hard vs. easy samples, set to 2;
α: focal-loss hyperparameter applied, after Gaussian-mask enhancement, to negative samples close to a positive, set to 4;
$y_{cij} = 1$: the point is a ground-truth reference vertex, where the loss is the plain focal loss term; all other values use the α-modulated focal loss term.
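A sketch of this Gaussian-weighted focal loss follows, assuming PyTorch, a sigmoid heatmap pred = $p_{cij}$ in (0, 1), and a ground-truth tensor gt = $y_{cij}$ with Gaussian bumps around each reference vertex.

```python
import torch

def vertex_focal_loss(pred, gt, gamma=2.0, alpha=4.0, eps=1e-6):
    pos = gt.eq(1).float()                  # ground-truth reference vertices
    neg = 1.0 - pos
    pos_loss = ((1 - pred) ** gamma) * torch.log(pred + eps) * pos
    # negatives near a reference vertex are down-weighted by (1 - y)^alpha
    neg_loss = ((1 - gt) ** alpha) * (pred ** gamma) \
        * torch.log(1 - pred + eps) * neg
    n = pos.sum().clamp(min=1)              # normalize by number of positives
    return -(pos_loss.sum() + neg_loss.sum()) / n
```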
(2) Loss function of packet (group loss):
for each vertex of the prediction, there is a corresponding connected vector (embedding vector). In the embodiment of the present application, a reference vector is further defined, and taking the example that the set vertex is four vertices, the connected vector loss function is composed of the following two loss functions:
1) merging loss function (pull loss):
$$L_{pull} = \frac{1}{N}\sum_{k=1}^{N}\left[\left(e_{tl_k}-e_k\right)^2+\left(e_{tr_k}-e_k\right)^2+\left(e_{br_k}-e_k\right)^2+\left(e_{bl_k}-e_k\right)^2\right]$$

where:
N: the number of code block regions;
$e_{tl_k}$: the connected vector of the top-left vertex belonging to the k-th code block region;
$e_{tr_k}$: the connected vector of the top-right vertex belonging to the k-th code block region;
$e_{br_k}$: the connected vector of the bottom-right vertex belonging to the k-th code block region;
$e_{bl_k}$: the connected vector of the bottom-left vertex belonging to the k-th code block region;
$e_k$: the mean of the connected vectors of the four vertices (top left, top right, bottom right, bottom left) belonging to the k-th code block region.

As can be seen from the above equation, minimizing this loss drives the connected vector of each vertex toward the reference vector of its code block region, ensuring that the vertices are classified into the same code block region.
2) Separation loss function (push loss):
$$L_{push} = \frac{1}{N(N-1)}\sum_{k=1}^{N}\sum_{\substack{j=1 \\ j\neq k}}^{N}\max\left(0,\ \Delta-\left|e_k-e_j\right|\right)$$

where:
N: the number of code block regions;
$e_j$: the mean of the connected vectors of the four vertices belonging to the j-th code block region;
$e_k$: the mean of the connected vectors of the four vertices belonging to the k-th code block region;
Δ: a margin constant, set to 1.

As can be seen from the above equation, minimizing this loss increases the distance between two different targets, thereby partitioning the vertices into different code block regions.
In the embodiment of the present application, the merging loss function reduces the distance between the four predicted vertices belonging to the same code block region, while the separation loss function enlarges the distance between the connected vectors of vertices belonging to different code block regions: $L_{pull}$ groups the vertices of the same code block region, and $L_{push}$ separates vertices belonging to different regions. As shown in fig. 13, the vertices can thus be assigned to their correct code block regions.
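A sketch of both losses follows, assuming PyTorch and an input e of shape (N, 4) holding one scalar connected vector per vertex (tl, tr, br, bl) for each of N code block regions; vector-valued embeddings would work analogously.

```python
import torch

def pull_push_loss(e, delta=1.0):
    ek = e.mean(dim=1, keepdim=True)        # reference vectors e_k, (N, 1)
    pull = ((e - ek) ** 2).mean()           # draw each vertex toward e_k
    ek = ek.squeeze(1)
    n = ek.shape[0]
    if n < 2:
        return pull, e.new_zeros(())
    dist = (ek[:, None] - ek[None, :]).abs()          # |e_k - e_j|, all pairs
    margin = torch.relu(delta - dist)
    margin = margin - torch.diag(torch.diag(margin))  # drop j == k terms
    push = margin.sum() / (n * (n - 1))     # push different regions apart
    return pull, push
```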
(3) Offset loss function (offset loss):
In the embodiment of the present application, the offset loss differs from the offset between the ground-truth box and the predicted box in box-based target detection algorithms: here the offset information represents the precision lost when rounding coordinates from the input image onto the feature map. Because the spatial size is transformed by each neural network layer between the input image and the feature map, rounding necessarily causes a loss of precision, especially for small-scale targets (code block regions).
Specifically, assuming the down-sampling rate is r, a point at true position (x, y) on the input image maps to position $\left(\left\lfloor \frac{x}{r} \right\rfloor, \left\lfloor \frac{y}{r} \right\rfloor\right)$ on the feature map. The following offset information may be defined:

$$o_k = \left(\frac{x_k}{r} - \left\lfloor \frac{x_k}{r} \right\rfloor,\; \frac{y_k}{r} - \left\lfloor \frac{y_k}{r} \right\rfloor\right)$$

and a loss function based on L1 smoothing is defined:

$$L_{offset} = \frac{1}{N}\sum_{k=1}^{N}\mathrm{SmoothL1}\left(o_k, \hat{o}_k\right)$$

where:
N: the number of code block regions;
$o_k$: the offset lost when rounding the corresponding position of the original input image onto the feature map;
$\hat{o}_k$: the offset predicted at that position of the feature map.
(4) Combine all loss functions:
and (3) combining the loss functions by adopting an Adam algorithm to obtain:
L=Lfe+αLpull+βLpush+γLoffset
wherein:
α: the coefficient for PULL loss was set to 0.1;
beta: the coefficient for PUSH loss was set to 0.1;
γ: the coefficient for offset loss is set to 1.
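A sketch of the offset loss and the combined objective, assuming PyTorch and that pred_off/gt_off hold the predicted and ground-truth (dx, dy) offsets at each reference vertex:

```python
import torch.nn.functional as F

def offset_loss(pred_off, gt_off):
    # L1-smooth loss between predicted and ground-truth rounding offsets
    return F.smooth_l1_loss(pred_off, gt_off, reduction="mean")

def total_loss(l_focal, l_pull, l_push, l_offset, a=0.1, b=0.1, g=1.0):
    # weights from the text: alpha = beta = 0.1, gamma = 1
    return l_focal + a * l_pull + b * l_push + g * l_offset
```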
In the embodiment of the application, in order for the neural network to run on smartphones and other terminals, requirements are imposed on the model's running speed, memory, and power consumption. As described above, a smaller basic network can be adopted without sacrificing accuracy; additionally, to improve running speed and reduce memory occupation, the embodiment of the present application compresses the initial neural network according to a preset model compression algorithm to obtain the final neural network.
Wherein, a person skilled in the art can select a suitable model compression algorithm according to actual conditions, and the embodiment of the present application provides an optional model compression algorithm, that is, pruning and quantizing the model.
Illustratively, as shown in fig. 14, a sufficiently accurate VGG16 model is first trained and used to perform supervised learning on the initial neural network of the present solution. The same original training data is fed into both the VGG16 network and the initial neural network, and the probability predictions output by the two networks are compared, guiding the evaluation of the initial neural network's quality after pruning and quantization.
The initial neural network is then searched under the constraint that pruning must not reduce the accuracy of the prediction results too much. Pruning proceeds cyclically through the network hierarchy; a quantization training strategy is then adopted in which the quantized model is used for forward prediction and the floating-point model for backpropagation while training the initial neural network. After several repetitions, the quantized model with the smallest loss of precision is taken as the final result.
In the embodiment of the present application, taking four set vertices as an example, the position information and code block type of any code block region predicted by the neural network may be output as:

(x1, y1, x2, y2, x3, y3, x4, y4, class)

where $(x_i, y_i)$ are the coordinates of the i-th vertex of the code block region (the four quadrilateral vertices: top left, top right, bottom right, bottom left), and class is the code block type to which the code block region belongs.
In practice, the quadrilateral may not be an axis-aligned rectangle or square. In step S130 of the embodiment of the present application, each code region image is therefore geometrically transformed to obtain a final upright pattern for decoding.
Specifically, as shown in fig. 15, for the several deformed images shown in the figure, transformations of at least one of the following forms may be adopted, but are not limited to:
(a) the deformed code region image may be used directly as the geometrically transformed image, i.e., the original image is kept;
(b) the deformed code region image may undergo a rigid transformation, converting it into a standard horizontal rectangular pattern;
(c) the deformed code region image may undergo an affine transformation, converting it into a standard horizontal rectangular pattern;
(d) the deformed code region image may undergo a projective transformation, converting it into a standard horizontal rectangular pattern.
The matrix form incorporating the above several transformations can be expressed as the following coordinate transformation formula:
[x', y', w']^T = A · [x, y, w]^T, where A = (a_ij) is a 3×3 transformation matrix.

Here (x, y) is the coordinate value before transformation and (x', y') is the coordinate value after transformation, with a_ij the parameters of the transformation matrix. The two-dimensional plane coordinates are converted into homogeneous coordinates (N-dimensional coordinates represented by (N+1)-dimensional coordinates) by adding an additional variable w, where w may take 1 or other values for different forms of geometric transformation.
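For illustration, a minimal NumPy sketch of applying such a homogeneous transformation to 2D points; the matrix values are arbitrary examples:

```python
import numpy as np

def warp_points(A, pts):
    # Lift (x, y) to homogeneous coordinates [x, y, w] with w = 1,
    # apply the 3x3 matrix A, then divide by w' to return to 2D.
    pts = np.asarray(pts, dtype=float)              # shape (N, 2)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    out = homog @ A.T                               # rows are [x', y', w']
    return out[:, :2] / out[:, 2:3]

# A projective matrix: the nonzero a31 entry makes it more general than affine
A = np.array([[1.0,  0.2, 10.0],
              [0.1,  1.0,  5.0],
              [1e-4, 0.0,  1.0]])
print(warp_points(A, [(0, 0), (100, 0), (100, 50), (0, 50)]))
```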
In the embodiment of the present application, the code block type has already been determined for each detected code block region, so in step S140 it is only necessary to decode directly according to the corresponding type.
As an example, when detecting that the code block type of a certain code block region is a QR two-dimensional code, the following procedure may be invoked to decode the code block region, as shown in fig. 16:
1. Adaptive threshold binarization of the code block region (code region setting image)
The purpose of binarization is to determine the black/white boundary. After the picture is converted into a gray-scale map, each point is represented by a gray value; a specific threshold gray value can be preset, and a point whose gray value is larger than the threshold is set to white (0) while a point whose gray value is smaller is set to black (1). From the binarized image, a matrix of 0/1 values is obtained. In practice, numerous other means and methods may be involved, such as shadow removal, optimization of the information loss when a color picture is converted into a gray-scale image, handling of dirt, and the like.
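For illustration, a minimal sketch of adaptive-threshold binarization using OpenCV; the block size and the offset constant are tuning assumptions, not values from this document:

```python
import cv2
import numpy as np

def binarize_code_region(gray: np.ndarray) -> np.ndarray:
    # Adaptive threshold: each pixel is compared with its local mean,
    # so uneven lighting across the code region is tolerated.
    # THRESH_BINARY_INV maps dark (ink) pixels to 1 and light pixels to 0.
    return cv2.adaptiveThreshold(
        gray, 1, cv2.ADAPTIVE_THRESH_MEAN_C,
        cv2.THRESH_BINARY_INV, blockSize=31, C=10)

gray = cv2.imread("code_region.png", cv2.IMREAD_GRAYSCALE)
bits = binarize_code_region(gray)  # 0/1 matrix as described above
```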
2. Extracting symbol code
In the conventional method, as shown in fig. 17, after binarization of the image to be processed, the image must be scanned row by row and column by column to determine the locator patterns of the two-dimensional code; affine transformation is then performed according to the locator positions to rectify the image into a square, after which the symbol code is extracted for the corresponding decoding processing.
In the embodiment of the present application, since the position of the code block region has been determined accurately in the previous step and geometric correction has been performed, the direction of the two-dimensional code can be determined directly from the four vertices of the square region; it is sufficient simply to determine which three vertices carry the locator patterns.
Then, according to the matrix after binarization determined in the first step, the original image with the pixel as the unit is converted into a symbol code matrix with the module as the unit.
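The pixel-to-module conversion can be sketched as follows; this is a NumPy illustration in which the majority-sampling rule and the module count are assumptions:

```python
import numpy as np

def pixels_to_modules(bits: np.ndarray, n_modules: int) -> np.ndarray:
    # Divide the binarized pixel matrix into an n x n grid of module
    # cells and take the majority value of each cell as the symbol bit.
    h, w = bits.shape
    ys = np.linspace(0, h, n_modules + 1).astype(int)
    xs = np.linspace(0, w, n_modules + 1).astype(int)
    mods = np.zeros((n_modules, n_modules), dtype=np.uint8)
    for i in range(n_modules):
        for j in range(n_modules):
            cell = bits[ys[i]:ys[i + 1], xs[j]:xs[j + 1]]
            mods[i, j] = 1 if cell.mean() >= 0.5 else 0
    return mods

# e.g. a version-1 QR symbol is 21 x 21 modules:
# symbol = pixels_to_modules(bits, 21)
```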
3. Decoding
With the symbol code matrix, the remaining part is decoded according to the encoding specification of the QR code. First the version information is obtained, then the format information; the error correction level is read from the format information, the mask is removed, the codeword information is read according to the information arrangement of the QR code, RS (Reed-Solomon) error correction is performed to obtain the information bit stream, and finally the real information is parsed out.
In another example, when the code block type of a certain code block region is detected as a barcode, the following procedure may be invoked to decode the code block region, as shown in fig. 18:
First, adaptive-threshold binarization is performed on the code block region (code region setting image). Since the barcode has already been geometrically transformed, the bar information only needs to be read directly in the horizontal or vertical direction, and the symbol code is extracted for decoding.
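Reading the bar information from a horizontal scanline amounts to run-length encoding the alternating bars and spaces; an illustrative sketch, with a made-up sample row:

```python
import numpy as np

def scanline_runs(row: np.ndarray):
    # Run-length encode one binarized scanline: returns (value, width)
    # pairs, i.e. alternating bar (1) and space (0) widths.
    runs, start = [], 0
    for i in range(1, len(row)):
        if row[i] != row[start]:
            runs.append((int(row[start]), i - start))
            start = i
    runs.append((int(row[start]), len(row) - start))
    return runs

row = np.array([0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1], dtype=np.uint8)
print(scanline_runs(row))  # [(0, 2), (1, 3), (0, 1), (1, 1), (0, 2), (1, 2)]
```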
In another example, when the code block type of a certain code block region is detected as a data matrix code, the following procedure may be invoked to decode the code block region, as shown in fig. 19:
First, adaptive-threshold binarization is performed on the code block region (code region setting image), and a positioning frame is determined from the four vertices of the code block region. It is then checked whether the four sides of the positioning frame meet the requirements, i.e. whether two of them can form the L-shaped sides. The symbol code is extracted according to the L-shaped sides, the binarized pixel-level image is converted into a symbol code, the corresponding version information is acquired, error-correction processing is performed, and the final result sequence is extracted.
It is understood that the above decoding processes for different types of code block regions are only exemplary; those skilled in the art can extend the above steps to apply the present solution to the detection of other types of codes, such as the PDF417 two-dimensional barcode, the applet code, etc., which shall also fall within the spirit and scope of the present application.
The image processing method provided by the embodiment of the application adopts a two-step approach: target detection by a neural network first, followed by recognition and information extraction. It can accurately give the position information and code block type of the region where each code block is located in the image, so that the region can be directly geometrically transformed into a pixel matrix of the corresponding type that is convenient for subsequent data processing and information extraction, thereby achieving more universal and efficient code detection.
In the embodiment of the present application, the application scenario of the code block detection method in each of the above embodiments is optimized. Specifically, the code block detection method described above can be applied to a smart Camera function: opening the intelligent camera invokes the code block detection method, and the corresponding functions are executed automatically according to scene requirements.
Illustratively, as shown in fig. 20, in an application scenario, the intelligent identification process of the two-dimensional code is as follows:
Intelligent APPs (applications) such as contacts, phone book, maps, mail and search may invoke the intelligent camera function. After the intelligent camera is started, the camera previews the original image, performs target detection on the original image through a deep learning algorithm, and recognizes the detected two-dimensional code through a two-dimensional code recognition engine to obtain the corresponding information content.
In practical applications, applications in the smart camera that utilize the code block detection method may include, but are not limited to, the following functions: acquiring related information of a contact by detecting a two-dimensional code; opening a corresponding webpage by detecting a two-dimensional code; acquiring phone book information and quickly locating call record information by detecting a two-dimensional code; quickly starting mail services such as sending mail by detecting a two-dimensional code; realizing quick positioning, clocking in and the like by detecting a two-dimensional code; and intelligently searching related contents by detecting a two-dimensional code. The present invention is not limited to the embodiments described herein, and the embodiments may be extended accordingly by those skilled in the art.
Furthermore, in the embodiment of the application, the intelligent camera combining deep learning recognition and code information extraction can simultaneously identify at least two kinds of code information in an image and decode and display them at the same time.
As an example, as shown in fig. 21, for a QR two-dimensional code (corresponding to Type QR in fig. 21) and a DataMatrix code (corresponding to Type DM in fig. 21) contained in the figure, the recognition results of the two codes (shown as Data in fig. 21) can be displayed directly by the method of the embodiment of the present application.
In practical application, the method provided by each embodiment can be applied to the aspects of commodity scanning, supermarket article counting and the like, can support quick identification and reading of information of a plurality of code block regions of various types, and has remarkable advantages.
In addition, the deep learning approach can be applied to the detection and recognition of one-dimensional codes, two-dimensional codes and the like, can further expand the types and range of recognition, and can be combined with other recognition functions to handle detection tasks of various scenes in a unified manner, for example OCR (Optical Character Recognition), character recognition, video action recognition and other artificial intelligence applications. The present invention is not limited to the embodiments described herein, and the embodiments may be extended accordingly by those skilled in the art.
An embodiment of the present application further provides an image processing apparatus, as shown in fig. 22, the image processing apparatus 220 may include: a determination module 2201, an extraction module 2202, a geometric transformation module 2203, and a decoding module 2204, wherein,
the determining module 2201 is configured to determine the location information and code block type of each code block region in the image to be processed;
the extraction module 2202 is configured to extract a code region image of each code block region from the image to be processed according to the position information of each code block region;
the geometric transformation module 2203 is configured to perform geometric transformation on each code region image to obtain a corresponding code region setting image;
the decoding module 2204 is configured to perform decoding processing on the corresponding code region setting image according to the code block type of each code block region.
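The cooperation of the four modules can be illustrated with a small Python sketch; all callables below are placeholders standing in for the module implementations described above:

```python
from typing import Callable, List, Tuple

def process_image(image,
                  determine: Callable,  # image -> [(vertices, code_type)]
                  extract: Callable,    # (image, vertices) -> region image
                  transform: Callable,  # region image -> rectified image
                  decode: Callable      # (rectified, code_type) -> data
                  ) -> List[Tuple[str, object]]:
    results = []
    for vertices, code_type in determine(image):       # determining module
        region = extract(image, vertices)              # extraction module
        rectified = transform(region)                  # geometric transformation module
        results.append((code_type, decode(rectified, code_type)))  # decoding module
    return results
```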
In an alternative implementation manner, the determining module 2201, when configured to determine the code block type and the position information of each code block region in the image to be processed, is specifically configured to:
performing multi-scale feature extraction on the image to be processed through a feature extraction network to obtain a feature map;
and respectively detecting the set vertices in the feature map through a detection head network to obtain the position information and the code block type of each code block region.
In an optional implementation manner, the feature extraction network includes a basic network and at least one additional feature extraction layer, and when the determining module 2201 is configured to perform multi-scale feature extraction on the image to be processed to obtain the feature map, it is specifically configured to:
extracting the features of the image to be processed through a basic network to obtain a basic sub-feature map;
and performing feature extraction on the basic sub-feature map based on at least one scale through at least one additional feature extraction layer to obtain a sub-feature map of at least one scale, and obtaining a feature map according to the sub-feature map of at least one scale.
In an alternative implementation manner, when there are at least two scales, the determining module 2201, when configured to perform feature extraction on the basic sub-feature map based on at least two scales through at least two additional feature extraction layers to obtain sub-feature maps of at least two scales and to obtain the feature map according to the sub-feature maps of at least two scales, is specifically configured to:
performing feature extraction on the basic sub-feature map based on a first scale through a first additional feature extraction layer to obtain a sub-feature map of the first scale;
sequentially carrying out feature extraction on the sub-feature map of the previous scale based on a smaller scale from the second additional feature extraction layer to obtain at least one sub-feature map of a smaller scale;
and obtaining a feature map according to the sub-feature map with the first scale and the sub-feature map with at least one smaller scale.
In an alternative implementation, the determining module 2201, when configured to obtain the feature map according to the sub-feature map of the first scale and the sub-feature map of at least one smaller scale, is specifically configured to:
performing expansion convolution on at least one sub-feature map with smaller scale according to the resolution of the sub-feature map with the first scale;
and fusing the at least one expanded and convolved sub-feature map with the first scale to obtain the feature map.
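Taking "expansion convolution" here to mean a learned upsampling of the smaller-scale sub-feature map to the resolution of the first-scale map before fusion, a minimal PyTorch sketch follows; the channel counts, stride and additive fusion are assumptions:

```python
import torch
import torch.nn as nn

class FuseScales(nn.Module):
    # Upsample a smaller-scale sub-feature map to the resolution of the
    # first-scale map with a transposed convolution, then fuse by addition.
    def __init__(self, c_small, c_first, stride=2):
        super().__init__()
        self.up = nn.ConvTranspose2d(c_small, c_first,
                                     kernel_size=stride * 2,
                                     stride=stride, padding=stride // 2)

    def forward(self, small, first):
        return first + self.up(small)

fuse = FuseScales(c_small=256, c_first=128)
first = torch.randn(1, 128, 40, 40)   # first-scale sub-feature map
small = torch.randn(1, 256, 20, 20)   # smaller-scale sub-feature map
print(fuse(small, first).shape)       # torch.Size([1, 128, 40, 40])
```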
In an alternative implementation manner, the determining module 2201, when configured to detect the set vertices in the feature map respectively to obtain the position information and the code block type of each code block region, is specifically configured to:
respectively detecting set vertexes in the feature map to obtain corresponding thermodynamic diagrams, connection vectors and offset information;
position information and code block types of the code block regions are obtained from the thermodynamic diagrams, the connection vectors, and the offset information.
In an alternative implementation, the determining module 2201, when configured to obtain the position information and the code block type of each code block region according to each thermodynamic diagram, the connecting vector and the offset information, is specifically configured to:
determining each prediction vertex according to each thermodynamic diagram, and determining the position information and the code block type of each prediction vertex;
grouping the code block regions of the prediction vertexes according to the connection vectors of the prediction vertexes, determining each group of grouped prediction vertexes as a set vertex of each code block region, and obtaining initial position information and code block types of the corresponding code block regions according to the position information and the code block types of each group of grouped prediction vertexes respectively;
and finely adjusting the initial position information of the corresponding code block region based on the position information of the corresponding prediction vertex according to the offset information of each prediction vertex to obtain the position information of each code block region.
In an alternative implementation, the determining module 2201, when configured to group the code block regions of the prediction vertices according to the connection vectors of the prediction vertices, is specifically configured to:
determining corresponding reference vectors according to the connection vectors of the set prediction vertexes of each group;
and grouping the code block regions of the prediction vertexes according to the distances between the prediction vertexes set in each group and the corresponding reference vectors.
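The grouping rule can be illustrated as follows: the reference vector is taken as the mean of the candidate vertices' connection vectors, and a group is accepted only if every vertex lies close to it. This is an illustrative sketch; the distance threshold is an assumption:

```python
import numpy as np

def accept_group(connection_vectors, tau=1.0):
    # Reference vector = mean of the group's connection vectors; the
    # four vertices form one code block region only if each vector is
    # within distance tau of the reference.
    e = np.asarray(connection_vectors, dtype=float)
    ref = e.mean(axis=0)
    return bool(np.all(np.linalg.norm(e - ref, axis=-1) < tau))

print(accept_group([[0.9], [1.0], [1.1], [1.05]]))  # True  (same code block)
print(accept_group([[0.9], [1.0], [1.1], [3.0]]))   # False (outlier vertex)
```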
In an optional implementation manner, the determining module 2201, when configured to perform fine adjustment on the initial position information of the corresponding code block region based on the position information of the corresponding prediction vertex according to the offset information of each prediction vertex to obtain the position information of each code block region, is specifically configured to:
determining the offset information of each prediction vertex according to the real position of each prediction vertex on the image to be processed and the mapping position of each prediction vertex on the feature map;
and according to the offset information of each prediction vertex, fine-tuning the initial position information of the corresponding code block region based on the position information of the corresponding prediction vertex.
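The fine-tuning step can be illustrated as mapping a feature-map vertex back to image coordinates and adding the predicted offset; in this sketch the downsampling stride is an assumed value:

```python
def refine_vertex(fx: int, fy: int, ox: float, oy: float, stride: int = 4):
    # (fx, fy): integer vertex position on the feature map
    # (ox, oy): predicted sub-pixel offset lost to downsampling
    # Returns the fine-tuned vertex position on the original image.
    return (fx + ox) * stride, (fy + oy) * stride

print(refine_vertex(30, 17, 0.25, 0.75))  # (121.0, 71.0)
```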
In an alternative implementation, the code region setting image is a code region rectangular image;
the detection head network comprises four detection networks, and the four detection networks respectively correspond to four vertexes, namely upper left vertex, upper right vertex, lower right vertex and lower left vertex;
the determining module 2201 is specifically configured to, when the determining module is configured to detect the set vertices in the feature map through the detection head network to obtain the position information and the code block type of each code block region:
and respectively detecting four vertexes in the characteristic diagram through four detection networks to obtain the position information and the code block type of each code block region.
In an alternative implementation, the geometric transformation module 2203, when used for geometrically transforming any code region image, is specifically configured to at least one of:
taking any code region image as an image after geometric transformation;
carrying out rigid transformation on any code region image;
carrying out affine transformation on any code region image;
and carrying out projective transformation on any code region image.
In an alternative implementation, the code block types include a quick response QR two-dimensional code, a data matrix code, a portable data file PDF417 two-dimensional barcode, a barcode, and an applet code.
It can be clearly understood by those skilled in the art that the image processing apparatus provided in the embodiment of the present application has the same implementation principle and the same technical effect as those of the foregoing method embodiment, and for convenience and brevity of description, corresponding contents in the foregoing method embodiment may be referred to where no part of the apparatus embodiment is mentioned, and are not repeated herein.
An embodiment of the present application further provides an electronic device, including: a processor and a memory, the memory storing at least one instruction, at least one program, set of codes or set of instructions, which is loaded and executed by the processor to implement the respective content of the aforementioned method embodiments.
Optionally, the electronic device may further comprise a transceiver. The processor is coupled to the transceiver, such as via a bus. It should be noted that the transceiver in practical application is not limited to one, and the structure of the electronic device does not constitute a limitation to the embodiments of the present application.
The processor may be a CPU, general purpose processor, DSP, ASIC, FPGA or other programmable logic device, transistor logic device, hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. A processor may also be a combination of computing functions, e.g., comprising one or more microprocessors, a DSP and a microprocessor, or the like.
A bus may include a path that transfers information between the above components. The bus may be a PCI bus or an EISA bus, etc. The bus may be divided into an address bus, a data bus, a control bus, etc. The memory may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disk storage, optical disk storage (including compact disk, laser disk, optical disk, digital versatile disk, blu-ray disk, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The embodiment of the present application also provides a computer-readable storage medium for storing computer instructions, which when run on a computer, enable the computer to execute the corresponding content in the foregoing method embodiments.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not performed in a strict order and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times, and whose execution order is not necessarily sequential but may alternate or interleave with other steps or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present application. It should be noted that, for those skilled in the art, several improvements and refinements can be made without departing from the principle of the present application, and these improvements and refinements shall also be regarded as falling within the protection scope of the present application.

Claims (15)

1. An image processing method, comprising:
determining the position information and code block type of each code block region in the image to be processed;
extracting code region images of the code block regions from the image to be processed according to the position information of the code block regions;
respectively carrying out geometric transformation on each code region image to obtain a corresponding code region setting image;
and decoding the corresponding code region setting image according to the code block type of each code block region.
2. The image processing method according to claim 1, wherein the determining the position information and code block type of each code block region in the image to be processed comprises:
performing multi-scale feature extraction on the image to be processed through a feature extraction network to obtain a feature map;
and respectively detecting the set vertices in the feature map through a detection head network to obtain the position information and the code block type of each code block region.
3. The image processing method according to claim 2, wherein the feature extraction network comprises a basic network and at least one additional feature extraction layer, and performing multi-scale feature extraction on the image to be processed through the feature extraction network to obtain a feature map comprises:
extracting the features of the image to be processed through the basic network to obtain a basic sub-feature map;
and performing feature extraction on the basic sub-feature map based on at least one scale through the at least one additional feature extraction layer to obtain a sub-feature map of at least one scale, and obtaining the feature map according to the sub-feature map of at least one scale.
4. The image processing method according to claim 3, wherein when the scales are at least two, performing feature extraction on the basic sub-feature map based on at least two scales through the at least two additional feature extraction layers to obtain sub-feature maps of at least two scales, and obtaining the feature map according to the sub-feature maps of at least two scales comprises:
performing feature extraction on the basic sub-feature map based on a first scale through a first additional feature extraction layer to obtain a sub-feature map of the first scale;
sequentially carrying out feature extraction on the sub-feature map of the previous scale based on a smaller scale from the second additional feature extraction layer to obtain at least one sub-feature map of a smaller scale;
and obtaining the feature map according to the sub-feature map of the first scale and the sub-feature map of the at least one smaller scale.
5. The image processing method according to claim 4, wherein obtaining the feature map according to the sub-feature map of the first scale and the at least one sub-feature map of a smaller scale comprises:
performing expansion convolution on the sub-feature map of the at least one smaller scale according to the resolution of the sub-feature map of the first scale;
and fusing the at least one expanded and convolved sub-feature map with the sub-feature map of the first scale to obtain the feature map.
6. The image processing method according to any one of claims 2 to 5, wherein the detecting the set vertices in the feature map to obtain the position information and code block type of each code block region includes:
respectively detecting set vertexes in the feature map to obtain corresponding thermodynamic diagrams, connection vectors and offset information;
and obtaining the position information and the code block type of each code block region according to each thermodynamic diagram, the connecting vector and the offset information.
7. The image processing method according to claim 6, wherein the deriving the position information and code block type of each code block region from each thermodynamic diagram, a connecting vector, and offset information comprises:
determining each prediction vertex according to each thermodynamic diagram, and determining the position information and the code block type of each prediction vertex;
grouping the code block regions of the prediction vertexes according to the connection vectors of the prediction vertexes, determining each group of grouped prediction vertexes as a set vertex of each code block region, and obtaining initial position information and code block types of the corresponding code block regions according to the position information and the code block types of each group of grouped prediction vertexes respectively;
and fine-tuning the initial position information of the corresponding code block region based on the position information of the corresponding prediction vertex according to the offset information of each prediction vertex to obtain the position information of each code block region.
8. The image processing method according to claim 7, wherein the grouping of the code block region for each of the prediction vertices according to the connection vectors of the prediction vertices comprises:
determining corresponding reference vectors according to the connection vectors of the set prediction vertexes of each group;
and grouping the code block regions of the prediction vertexes according to the distances between the prediction vertexes set in each group and the corresponding reference vectors.
9. The method according to claim 7, wherein the fine-tuning, based on the position information of the corresponding prediction vertex, initial position information of the corresponding code block region according to the offset information of the respective prediction vertex to obtain the position information of the respective code block region comprises:
determining the offset information of each prediction vertex according to the real position of each prediction vertex on the image to be processed and the mapping position of each prediction vertex on the feature map;
and fine-tuning the initial position information of the corresponding code block region based on the position information of the corresponding prediction vertex according to the offset information of each prediction vertex.
10. The image processing method according to any of claims 2-9, wherein the code region setting image is a code region rectangular image;
the detection head network comprises four detection networks, and the four detection networks respectively correspond to four vertexes, namely upper left vertex, upper right vertex, lower right vertex and lower left vertex;
the detecting the set vertices in the feature map respectively through the detection head network to obtain the position information and the code block type of each code block region includes:
and respectively detecting four vertexes in the feature map through the four detection networks to obtain the position information and the code block type of each code block region.
11. The image processing method according to any of claims 1-10, wherein geometrically transforming any code region image comprises at least one of:
taking any code region image as an image after geometric transformation;
carrying out rigid transformation on any code region image;
carrying out affine transformation on any code region image;
and carrying out projective transformation on any code region image.
12. The image processing method according to any one of claims 1 to 11, wherein the code block types include a quick response QR two-dimensional code, a data matrix code, a portable data file PDF417 two-dimensional barcode, a barcode, and an applet code.
13. An image processing apparatus characterized by comprising:
the determining module is used for determining the position information and the code block type of each code block region in the image to be processed;
an extraction module, configured to extract a code region image of each code block region from the image to be processed according to the position information of each code block region;
the geometric transformation module is used for respectively carrying out geometric transformation on each code region image to obtain a corresponding code region setting image;
and the decoding module is used for decoding the corresponding code region setting image according to the code block type of each code block region.
14. An electronic device, comprising: a processor and a memory, the memory storing at least one instruction, at least one program, a set of codes, or a set of instructions, the at least one instruction, the at least one program, the set of codes, or the set of instructions being loaded and executed by the processor to implement the method of any of claims 1-12.
15. A computer-readable storage medium for storing a computer instruction, a program, a set of codes, or a set of instructions, which, when run on a computer, causes the computer to perform the method of any one of claims 1-12.
CN202010109109.4A 2020-02-21 2020-02-21 Image processing method, image processing device, electronic equipment and computer readable storage medium Pending CN113297870A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010109109.4A CN113297870A (en) 2020-02-21 2020-02-21 Image processing method, image processing device, electronic equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010109109.4A CN113297870A (en) 2020-02-21 2020-02-21 Image processing method, image processing device, electronic equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN113297870A 2021-08-24

Family

ID=77317578

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010109109.4A Pending CN113297870A (en) 2020-02-21 2020-02-21 Image processing method, image processing device, electronic equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN113297870A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113962242A (en) * 2021-11-11 2022-01-21 重庆赛迪奇智人工智能科技有限公司 Two-dimensional code number plate identification method and device, electronic equipment and storage medium
CN114548132A (en) * 2022-02-22 2022-05-27 广东奥普特科技股份有限公司 Bar code detection model training method and device and bar code detection method and device
CN116451720A (en) * 2023-06-09 2023-07-18 陕西西煤云商信息科技有限公司 Warehouse material scanning and identifying method and identifying system thereof

Legal Events

Date Code Title Description
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication (Application publication date: 20210824)