CN110598715A - Image recognition method and device, computer equipment and readable storage medium - Google Patents

Image recognition method and device, computer equipment and readable storage medium

Info

Publication number
CN110598715A
CN110598715A
Authority
CN
China
Prior art keywords
attribute
region
scale
image
area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910851503.2A
Other languages
Chinese (zh)
Inventor
高立钊
孙冲
许海华
賈佳亞
戴宇榮
沈小勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201910851503.2A priority Critical patent/CN110598715A/en
Publication of CN110598715A publication Critical patent/CN110598715A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/46Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V10/464Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations

Abstract

The embodiment of the application discloses an image identification method, an image identification device, computer equipment and a readable storage medium, and relates to the computer vision technology of artificial intelligence; specifically, an image to be recognized can be acquired; detecting attribute areas of the objects in the image to be identified to obtain a plurality of original attribute areas; carrying out scale transformation on the original attribute region to obtain candidate attribute regions with multiple scales; extracting the features of the candidate attribute region to obtain a multi-scale feature group, wherein the multi-scale feature group comprises the region features of the candidate attribute region of each scale; selecting attribute areas of corresponding scales from the candidate attribute areas of multiple scales based on the multi-scale feature group to obtain a target attribute area combination corresponding to the original attribute area; and identifying the object in the image to be identified according to the area characteristics of the attribute areas in the target attribute area combination to obtain an identification result. The scheme can improve the accuracy of image recognition.

Description

Image recognition method and device, computer equipment and readable storage medium
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to an image identification method, an image identification device, computer equipment and a readable storage medium.
Background
The main task of fine-grained image recognition is to identify or divide different subclasses belonging to the same large class of images, for example, to further classify bird pictures.
Currently, a commonly used fine-grained image recognition scheme is to use Artificial Intelligence (AI) to realize fine-grained image recognition; specifically, AI is used to capture local detail information of an object in an image, and the object in the image is recognized based on the local detail information. For example, a convolutional neural network with an image recognition function may be trained by using sample images labeled with attribute regions; an image to be recognized is input to the trained convolutional neural network, the convolutional neural network detects the attribute region of a target object in the image to be recognized, and then classifies the target object in the image to be recognized according to feature information of the attribute region, thereby implementing image recognition.
However, the current AI-based image recognition scheme cannot adapt to the difference between the images, and therefore, the detected attribute region often has the problem of inaccurate positioning of the scale, the position and the like, and the accuracy of image recognition (or classification) is low.
Disclosure of Invention
The embodiment of the application provides an image identification method, an image identification device, computer equipment and a readable storage medium, which can improve the accuracy of image identification.
The embodiment of the application provides an image identification method, which comprises the following steps:
acquiring an image to be identified;
detecting attribute areas of the objects in the image to be identified to obtain a plurality of original attribute areas;
carrying out scale transformation on the original attribute region to obtain candidate attribute regions with multiple scales;
extracting the features of the candidate attribute region to obtain a multi-scale feature group, wherein the multi-scale feature group comprises the region features of the candidate attribute region of each scale;
selecting attribute areas of corresponding scales from the candidate attribute areas of multiple scales based on the multi-scale feature group to obtain a target attribute area combination corresponding to the original attribute area;
and identifying the object in the image to be identified according to the area characteristics of the attribute areas in the target attribute area combination to obtain an identification result.
Correspondingly, an embodiment of the present application further provides an image recognition apparatus, including:
an acquisition unit, configured to acquire an image to be recognized;
the detection unit is used for detecting the attribute areas of the objects in the image to be identified to obtain a plurality of original attribute areas;
the transformation unit is used for carrying out scale transformation on the original attribute region to obtain candidate attribute regions with a plurality of scales;
the characteristic extraction unit is used for extracting the characteristics of the candidate attribute regions to obtain a multi-scale characteristic group, and the multi-scale characteristic group comprises the region characteristics of the candidate attribute regions of each scale;
the region selection unit is used for selecting attribute regions of corresponding scales from the candidate attribute regions of multiple scales based on the multi-scale feature group to obtain a target attribute region combination corresponding to the original attribute region;
and the identification unit is used for identifying the object in the image to be identified according to the area characteristics of the attribute areas in the target attribute area combination to obtain an identification result.
In one embodiment, the region selection unit includes:
the matrix construction subunit is used for constructing a corresponding feature circulant matrix based on the multi-scale feature group;
the response value calculation subunit is used for calculating a scale response value corresponding to the candidate attribute region combination based on the feature circulant matrix, and the scale response value is the response value of the optimal scale combination of the candidate attribute region combination;
and the selecting subunit is used for selecting a target attribute region combination corresponding to the original attribute region from the candidate attribute region combinations based on the scale response value.
In an embodiment, the response value calculation subunit is configured to: obtain a scale selection weight parameter; and calculate a scale response value of the candidate attribute region combination according to the scale selection weight parameter and the feature circulant matrix.
In an embodiment, the response value calculation subunit is configured to: perform Fourier transform processing according to the scale selection weight parameter and the feature circulant matrix to obtain the scale response value of the candidate attribute region combination.
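The Fourier transform step above exploits the fact that a circulant matrix is diagonalized by the discrete Fourier transform, so the product of the feature circulant matrix and a weight vector can be computed in O(n log n) rather than O(n²). The following sketch illustrates this identity with a 1-D feature vector; the function names and the use of NumPy are illustrative assumptions, not the patent's actual implementation.

```python
import numpy as np

def feature_circulant(x):
    # circulant matrix whose rows are cyclic shifts of the feature vector x
    return np.stack([np.roll(x, i) for i in range(len(x))])

def scale_response_naive(x, w):
    # explicit circulant matrix-vector product: O(n^2)
    return feature_circulant(x) @ w

def scale_response_fft(x, w):
    # same product via the FFT: a circulant matrix is diagonalized by the
    # discrete Fourier transform, so C(x) @ w reduces to an element-wise
    # product in the frequency domain (x assumed real-valued)
    return np.real(np.fft.ifft(np.conj(np.fft.fft(x)) * np.fft.fft(w)))
```

Both functions return the same response vector; only the asymptotic cost differs.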
In one embodiment, the identification unit includes:
the fusion subunit is used for fusing the regional characteristics of the attribute regions in the target attribute region combination to obtain the target regional characteristics corresponding to the original attribute region;
and the identification subunit is used for identifying the object in the image to be identified according to the target area characteristic corresponding to the original attribute area to obtain an identification result.
In an embodiment, the fusion subunit is configured to fuse, for each target attribute region combination corresponding to the original attribute region, the region features of the attribute regions in the target attribute region combination to obtain a combined region feature corresponding to each target attribute region combination; and according to the combined area characteristic corresponding to each target attribute area combination, acquiring the target area characteristic corresponding to the original attribute area.
In one embodiment, the identifier subunit is configured to: perform a plurality of spatial directional pooling processes on the target area characteristics corresponding to the original attribute area to obtain a plurality of pooled features; select corresponding features from the plurality of pooled features to obtain target pooled features corresponding to the original attribute region; and identify the object in the image to be identified according to the target pooling characteristic corresponding to the original attribute region to obtain an identification result.
In one embodiment, the identifier subunit is configured to: fusing the target area characteristics corresponding to each original attribute area to obtain fused target area characteristics; and identifying the object in the image to be identified according to the fused target area characteristics to obtain an identification result.
In one embodiment, the identifier subunit is configured to:
dividing the target area feature map corresponding to the original attribute area into a plurality of overlapped feature sub-blocks;
expanding each feature sub-block into a feature vector to obtain a feature vector set corresponding to the feature map;
and performing a plurality of spatial directional pooling processes on the feature vector set to obtain a plurality of pooled features.
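The three steps above (dividing the feature map into overlapping sub-blocks, expanding each sub-block into a vector, and pooling the vector set in several spatial directions) can be sketched as follows; the block size, stride, and the particular pooling directions chosen are illustrative assumptions, not values specified by the patent.

```python
import numpy as np

def overlapping_subblocks(fmap, block=4, stride=2):
    # divide an H x W feature map into overlapping block x block sub-blocks
    h, w = fmap.shape
    blocks = []
    for i in range(0, h - block + 1, stride):
        for j in range(0, w - block + 1, stride):
            blocks.append(fmap[i:i + block, j:j + block])
    return blocks

def directional_pools(fmap, block=4, stride=2):
    # expand each sub-block into a feature vector, then pool the vector
    # set along several spatial directions: max over blocks (per element),
    # max within each block (per block), and mean over blocks
    vecs = np.stack([b.ravel() for b in overlapping_subblocks(fmap, block, stride)])
    return [vecs.max(axis=0), vecs.max(axis=1), vecs.mean(axis=0)]
```

Pooling the same vector set along different axes yields complementary summaries, from which the corresponding features can then be selected.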
In one embodiment, the identifier subunit is configured to:
performing full-connection operation on the characteristic information after each pooling process based on a shared full-connection layer to obtain a plurality of characteristic information with the same dimensionality;
and selecting corresponding features from the feature information with the same dimensions to obtain the target pooling features corresponding to the attribute region combination.
In an embodiment, the detection unit is configured to perform attribute region detection on the object in the image to be recognized by using a trained region detection network.
In an embodiment, the image recognition apparatus may further include: a training unit;
the training unit is configured to:
acquiring a plurality of sample images of the marked attribute regions;
respectively carrying out attribute region detection on the sample objects in the sample images by adopting a preset region detection network to obtain n sample attribute regions of the sample images, wherein n is a positive integer greater than 1;
acquiring cross entropy loss of an s-th sample attribute area in each sample image, wherein s is a positive integer and is less than or equal to n;
acquiring similarity information between the s-th sample attribute areas in each sample image;
and training the preset area detection network according to the cross entropy loss and the similarity information to obtain the trained area detection network.
In an embodiment, the transformation unit is configured to: determining a central point and a plurality of vertexes of the original attribute region; determining a first vertex far from the central point and a second vertex close to the central point from the plurality of vertices; and amplifying the scale of the original attribute region according to the first vertex, and reducing the scale of the original attribute region according to the second vertex to obtain candidate attribute regions with multiple scales.
Correspondingly, the present application also provides a computer device, which includes a memory, a processor and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the steps in any of the image recognition methods provided in the embodiments of the present application.
In addition, the embodiment of the present application further provides a computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps in any one of the image recognition methods provided by the embodiment of the present application.
The method and the device can acquire the image to be identified; detecting attribute areas of the objects in the image to be identified to obtain a plurality of original attribute areas; carrying out scale transformation on the original attribute region to obtain candidate attribute regions with multiple scales; extracting the features of the candidate attribute region to obtain a multi-scale feature group, wherein the multi-scale feature group comprises the region features of the candidate attribute region of each scale; selecting attribute areas of corresponding scales from the candidate attribute areas of multiple scales based on the multi-scale feature group to obtain a target attribute area combination corresponding to the original attribute area; and identifying the object in the image to be identified according to the area characteristics of the attribute areas in the target attribute area combination to obtain an identification result. According to the scheme, multi-scale candidate attribute regions can be generated for a single original attribute region, and the attribute region with the best scale is selected in a self-adaptive mode, so that images with different scales can be self-adapted, accurate positioning or attribute region detection is achieved, and accuracy of image identification is improved.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a scene schematic diagram of an image recognition method provided in an embodiment of the present application;
FIG. 2a is a flowchart of an image recognition method provided in an embodiment of the present application;
FIG. 2b is a block diagram of an image recognition framework provided by an embodiment of the present application;
fig. 2c is a schematic structural diagram of a DSM module provided in an embodiment of the present application;
FIG. 3a is a sample bird image provided by an embodiment of the present application;
FIG. 3b is a schematic flow chart of an image recognition method according to an embodiment of the present disclosure;
fig. 4a is a schematic structural diagram of an image recognition apparatus provided in an embodiment of the present application;
fig. 4b is a schematic structural diagram of an image recognition apparatus provided in the embodiment of the present application;
fig. 4c is another schematic structural diagram of an image recognition apparatus provided in the embodiment of the present application;
fig. 4d is another schematic structural diagram of an image recognition apparatus provided in the embodiment of the present application;
fig. 5 is a schematic structural diagram of a computer device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the application provides an image identification method, an image identification device, computer equipment and a computer readable storage medium. The image recognition device may be integrated in a computer device, and the computer device may be a server or a terminal.
The image recognition scheme provided by the embodiment of the application relates to the Computer Vision technology (CV) of artificial intelligence. Image recognition such as fine-grained image recognition or classification can be realized through an artificial intelligence computer vision technology, and a recognition result is obtained.
Computer Vision technology (CV) is a science that studies how to make machines "see"; more specifically, it uses cameras and computers in place of human eyes to perform machine vision tasks such as recognition, tracking and measurement on a target, and further performs graphics processing so that the result is an image more suitable for human observation or for transmission to an instrument for detection. As a scientific discipline, computer vision studies related theories and techniques in an attempt to build artificial intelligence systems that can capture information from images or multidimensional data. Computer vision technologies generally include image processing, image recognition, image segmentation, image semantic understanding, image retrieval, OCR, video processing, video semantic understanding, video content/behavior recognition, three-dimensional object reconstruction, 3D technology, virtual reality, augmented reality, and simultaneous localization and mapping, and also include common biometric technologies such as face recognition and fingerprint recognition.
In the embodiments of the present application, the term "image recognition" refers to a technique and a process for identifying a region having a unique property from an image and finding an object of interest. In the embodiment of the present application, the identification of the target object in the image is mainly referred to, for example, the identification of the animal type of the animal in the animal image, and the like.
For example, referring to fig. 1, taking as an example that the image recognition apparatus is integrated in a computer device, the computer device may acquire an image to be recognized, for example, by itself through a camera or the like; detect attribute regions of an object in the image to be recognized to obtain a plurality of original attribute regions; carry out scale transformation on the original attribute regions to obtain candidate attribute regions with multiple scales; extract the features of the candidate attribute regions to obtain a multi-scale feature group, where the multi-scale feature group comprises the region features of the candidate attribute regions of each scale; select attribute regions of corresponding scales from the candidate attribute regions of multiple scales based on the multi-scale feature group to obtain a target attribute region combination corresponding to the original attribute region; and recognize the object in the image to be recognized according to the region features of the attribute regions in the target attribute region combination to obtain a recognition result. For example, the type of the object in the image may be identified.
The following are detailed below. It should be noted that the following description of the embodiments is not intended to limit the preferred order of the embodiments.
The embodiment will be described from the perspective of an image recognition apparatus, which may be specifically integrated in a computer device, where the computer device may be a server or a terminal; the terminal may include a tablet Computer, a notebook Computer, a Personal Computer (PC), a micro processing box, or other devices.
As shown in fig. 2a, the specific flow of the image recognition method may be as follows:
201. and acquiring an image to be identified.
The image to be recognized is an image containing an object to be recognized, the object to be recognized may be an object such as an animal, a person, a building, etc., for example, the image to be recognized may be an image of a bird, an image of a circuit board (including a defective wire, or a device).
In the embodiment of the application, the image to be identified can be acquired by the image acquisition equipment and then provided for the image identification device. The image capturing device may include a terminal having an image capturing function, such as a mobile phone.
In addition, in an embodiment, the image to be recognized may also be acquired by an image recognition apparatus, such as a computer device in which the image recognition apparatus is located.
202. And detecting the attribute regions of the object in the image to be identified to obtain a plurality of original attribute regions.
The attribute region may be a local region, such as a local rectangular region, in the image, which can represent a certain attribute (e.g., a head) of the target object. For example, a localized area of the bird's head, foot, or abdomen may be characterized in an image of the bird. For another example, localized areas of defects may be characterized in the circuit board image, etc.
In the embodiment of the present application, the attribute region detection may be implemented by using a deep learning network, for example, a region detection network may be trained, and the trained region detection network is used to detect the attribute region. That is, the step of "performing attribute region detection on an object in an image to be recognized" may include: and adopting the trained area detection network to detect the attribute area of the object in the image to be recognized.
The training method of the area detection network includes a plurality of training methods, for example, a back propagation method is used for training, for example, a sample image labeled with an attribute area may be used for training the area detection network, for example, loss of a prediction attribute area and a labeled attribute area of the area detection network is calculated, and the network is trained based on the loss.
In an embodiment, in order to improve the detection accuracy of the attribute regions and further improve the accuracy of image recognition, morphological constraints may be introduced during the network training process to make the ith attribute regions of different samples have the same semantic information, for example, the first attribute regions are all bird heads, thereby implementing semantic alignment of the attribute regions.
The specific training process is as follows:
acquiring a plurality of sample images of the marked attribute regions;
respectively carrying out attribute region detection on sample objects in the sample images by adopting a preset region detection network to obtain n sample attribute regions of the sample images, wherein n is a positive integer greater than 1;
acquiring cross entropy loss of an s-th sample attribute area in each sample image, wherein s is a positive integer and is less than or equal to n;
acquiring similarity information between the s-th sample attribute areas in each sample image;
and training the preset area detection network according to the cross entropy loss and the similarity information to obtain the trained area detection network.
The cross entropy may be the cross entropy of the predicted s-th sample attribute region and the labeled s-th attribute region, and may be calculated according to the labeled region and the predicted region.
The similarity information may include euclidean distances and the like, and may be calculated based on the region features of the s-th sample attribute region in each sample image.
For example, the embodiment of the present application may introduce a morphological constraint term during the training of the region detection network, so as to achieve semantic alignment of the attribute regions (for example, letting the s-th attribute regions of different samples all correspond to the head). The loss function containing the constraint term can be written as L = L_ce(A_s^i) + L_ce(A_s^j) + ||φ_s^i − φ_s^j||_2, where the first two terms are the cross entropy losses of the s-th attribute region of the two samples i and j, and the last term is the Euclidean distance between the features of the s-th attribute regions of the two samples. By constraining the distance between the s-th attribute regions of different samples, semantic alignment of the attribute regions can be achieved and the accuracy of attribute region detection and positioning is improved, thereby improving the accuracy of image classification (image identification).
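A minimal numeric sketch of such a constrained loss is given below: two cross-entropy terms for the s-th attribute region of samples i and j, plus the Euclidean distance between their region features. The weighting factor `lam` on the morphological term, and all function and variable names, are illustrative assumptions not specified by the patent.

```python
import numpy as np

def cross_entropy(logits, label):
    # standard softmax cross-entropy for a single sample
    z = logits - logits.max()           # stabilize the softmax
    logp = z - np.log(np.exp(z).sum())
    return -logp[label]

def part_aligned_loss(logits_i, logits_j, label_i, label_j,
                      feat_i, feat_j, lam=1.0):
    # first two terms: cross-entropy losses of the s-th attribute region
    # of samples i and j; last term: Euclidean distance between the
    # s-th region features of the two samples (morphological constraint)
    ce = cross_entropy(logits_i, label_i) + cross_entropy(logits_j, label_j)
    dist = np.linalg.norm(feat_i - feat_j)
    return ce + lam * dist
```

When the two samples' region features coincide, the constraint term vanishes and only the classification losses remain; the farther the features drift apart, the larger the penalty.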
Original attribute regions such as a head region of a bird, an abdomen region of a bird, a foot region of a bird, and the like can be detected from an image to be recognized such as a bird image by the above steps.
203. And carrying out scale transformation on the original attribute region to obtain candidate attribute regions with multiple scales.
For example, scale transformation may be performed on each original attribute region to obtain candidate attribute regions (multi-scale candidate attribute regions) of multiple scales corresponding to each original attribute region.
Because different images have different scales, in order to adapt to the scales of various images, accurately position the attribute regions and improve the accuracy of image recognition, multi-scale candidate attribute regions can be generated for a single attribute region; then, the optimal scale is adaptively selected from the multi-scale candidate attribute regions, so that alignment of the multi-scale attribute regions is achieved and the attribute regions are accurately positioned.
There are various ways of scale transformation, such as reduction, enlargement and elongation, which can be selected according to actual requirements. The number of generated candidate attribute regions may also be set according to actual requirements, for example an odd number of regions.
In an embodiment, the original attribute region is subjected to reduction and enlargement processing to obtain a multi-scale candidate attribute region of the original attribute region. Specifically, the scaling process of the original attribute region is as follows:
determining a central point and a plurality of vertexes of an original attribute region;
determining a first vertex far from the central point and a second vertex close to the central point from the plurality of vertices;
and amplifying the scale of the original attribute region according to the first vertex, and reducing the scale of the original attribute region according to the second vertex to obtain candidate attribute regions with multiple scales.
For example, a center point and two vertices of the original attribute region are determined, and then one of the vertices is defined as a first vertex distant from the center point and the other vertex is defined as a second vertex close to the center point according to a preset rule, for example, randomly.
In one embodiment, the first vertex may be fixed, and the scale of the original attribute region is enlarged based on the scale factor; and fixing the second vertex, and reducing the scale of the original attribute region based on the scale scaling factor.
For example, in an embodiment, referring to fig. 2b, which shows an image recognition framework of the embodiment of the present application, a plurality of bird images to be recognized may be obtained; then, the original attribute region in each bird image is detected through the region detection network, and operations such as scale transformation and optimal region selection may then be implemented by a Scale Mining Decision module (DSM), so as to implement alignment of the multi-scale attribute regions and precise positioning of the attribute regions.
The DSM module can generate the multi-scale attribute regions and realize the alignment of the multi-scale attribute regions and the output of response values. Suppose the DSM module uses A_i = [φ_i, ψ_i, η_i, γ_i] to denote the i-th attribute region, where φ_i denotes the attribute region features, ψ_i and η_i denote two vertex coordinates of the attribute region, and γ_i = (w_i, h_i) denotes the width and height of the attribute region.
First, the center coordinates of the selected K attribute regions are calculated; ψ_i is defined as the vertex away from the center and η_i as the vertex near the center. The algorithm generates S (an odd number) multi-scale candidate attribute regions by enlarging and reducing the original attribute region.
To reduce the attribute region, η_i is fixed and the width and height of the original attribute region are scaled down according to the scaling factor, yielding a smaller-scale candidate attribute region.
To enlarge the attribute region, ψ_i is fixed and the width and height of the original attribute region are scaled up according to the scaling factor, yielding a larger-scale candidate attribute region. Here α denotes the scaling factor.
Through the above steps, a multi-scale candidate attribute region of each original attribute region, for example, a multi-scale candidate attribute region corresponding to a head region of a bird, may be obtained.
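The scaling procedure above can be sketched in a few lines. This is an illustrative numpy sketch, not the patented implementation: the box representation (far vertex ψ, near vertex η), the factor α = 1.25, and S = 5 are assumed values.

```python
import numpy as np

def multi_scale_candidates(psi, eta, alpha=1.25, S=5):
    """Generate S (odd) candidate boxes from one original box.

    psi: vertex far from the region-set center (fixed when enlarging)
    eta: vertex near the center (fixed when shrinking)
    """
    psi, eta = np.asarray(psi, float), np.asarray(eta, float)
    candidates = [(tuple(psi), tuple(eta))]          # original scale
    for s in range(1, S // 2 + 1):
        # shrink: fix eta, pull psi toward it by a factor alpha**s
        shrunk_psi = eta + (psi - eta) / alpha**s
        candidates.append((tuple(shrunk_psi), tuple(eta)))
        # enlarge: fix psi, push eta away by a factor alpha**s
        grown_eta = psi + (eta - psi) * alpha**s
        candidates.append((tuple(psi), tuple(grown_eta)))
    return candidates

boxes = multi_scale_candidates((0.0, 0.0), (10.0, 8.0))
print(len(boxes))  # 5
```

Shrinking keeps η fixed and divides the width and height by the factor, while enlarging keeps ψ fixed and multiplies them by it, matching the two cases described above.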
204. And performing feature extraction on the candidate attribute region to obtain a multi-scale feature group, wherein the multi-scale feature group comprises the region features of the candidate attribute region of each scale.
For example, feature extraction may be performed on the candidate attribute regions of each scale of the original attribute region to obtain a multi-scale feature group.
In an embodiment, in order to improve the efficiency and accuracy of feature extraction, a convolutional neural network may be used to extract the features of the candidate attribute regions; for example, an image feature extraction network, such as a residual network, may be used to extract the features.
For example, referring to fig. 2b, a convolutional neural network in the DSM module may be used to extract the region features of each candidate attribute region of the original attribute region, so as to obtain a set of multi-scale features {φ_i^(1), …, φ_i^(S)}.
205. And selecting attribute regions of corresponding scales from the candidate attribute regions of multiple scales based on the multi-scale feature group to obtain a target attribute region combination corresponding to the original attribute region.
For example, for each original attribute region, a corresponding number of attribute regions may be selected from the multi-scale candidate attribute regions corresponding to each original attribute region, so as to obtain an attribute region combination corresponding to each original attribute region.
Wherein the target attribute region combination comprises one or more attribute regions selected from the multi-scale candidate attribute regions. For example, candidate attribute regions of 3 scales may be selected from 5 candidate attribute regions of the bird head region to constitute an attribute region combination.
In an embodiment, the target attribute region combination may be an attribute region combination that is optimal for the current image to be identified. That is, the scale of the region in the target attribute region combination is the best match or best scale for the scale of the currently identified image.
In an embodiment, in order to improve the selection efficiency and accuracy of the attribute regions, multi-scale alignment may be achieved through a circulant matrix expression (Toeplitz matrix) based on multi-scale features, multi-scale response values are obtained, and attribute region combinations are selected based on the scale response values. For example, the step "selecting an attribute region of a corresponding scale from candidate attribute regions of multiple scales based on a multi-scale feature group to obtain a target attribute region combination corresponding to an original attribute region" may include:
constructing a corresponding feature circulant matrix based on the multi-scale feature group;
calculating a scale response value corresponding to a candidate attribute region combination based on the feature circulant matrix, wherein the scale response value is the response value indicating that the scale combination of the candidate attribute region combination is the optimal scale combination;
and selecting a target attribute region combination corresponding to the original attribute region from the candidate attribute region combinations based on the scale response values.
Wherein the candidate attribute region combination may include a region combination composed of a predetermined number of candidate attribute regions in the multi-scale candidate attribute region.
The predetermined number is the number of the regions that need to be selected from the multi-scale candidate attribute region, and may be set according to actual requirements, for example, 1, 3, 4, and so on. For example, the candidate attribute region combination may be a region combination composed of any 3 candidate attribute regions in the multi-scale candidate attribute region.
The scale response value corresponding to a candidate attribute region combination is the response value indicating that the scale combination of the candidate attribute region combination is the optimal scale combination; the scale response value represents the distance between the candidate attribute region combination and the optimal scale combination: a larger response value indicates a smaller distance from the optimal scale combination (i.e., closer to the optimal scale combination); conversely, a smaller response value indicates a greater distance from the optimal scale combination (i.e., farther from the optimal scale combination).
In this embodiment of the present application, there are multiple candidate attribute region combinations, and their number may be equal to the number of ways of selecting a predetermined number of regions from the multi-scale candidate attribute regions. The embodiment of the application can calculate the scale response value of each candidate attribute region combination based on the feature circulant matrix, and then determine the attribute region combination (i.e., the optimal attribute region combination) corresponding to the original attribute region from the multiple candidate attribute region combinations according to the scale response values.
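As a small check of the combination count mentioned above: when a predetermined number m of regions is chosen from S candidates, the number of candidate attribute region combinations equals the binomial coefficient C(S, m). The values S = 5 and m = 3 below are the example values used elsewhere in this document.

```python
from itertools import combinations
from math import comb

S, m = 5, 3
combos = list(combinations(range(S), m))  # every way to pick m of the S candidates
print(len(combos), comb(S, m))  # 10 10
```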
In one embodiment, the attribute region combination may be selected by setting a scale selection weight parameter; specifically, the step of "calculating a scale response value corresponding to the candidate attribute region combination based on the feature circulant matrix" may include:
obtaining a scale selection weight parameter;
and calculating a scale response value of the candidate attribute region combination according to the scale selection weight parameter and the feature circulant matrix.
For example, in an embodiment, a matrix product may be adopted, i.e., matrix multiplication of the scale selection weight parameter with the feature circulant matrix, to obtain the scale response value of the candidate attribute region combination.
For example, referring to fig. 2b, after a convolutional neural network in the DSM module is adopted to extract the region features of each candidate attribute region of the original attribute region, obtaining a set of multi-scale features {φ_i^(1), …, φ_i^(S)}, a circulant matrix of the multi-scale feature set can be constructed, in which each row is a cyclic shift of the previous row:

C(φ_i) = [φ_i^(1) φ_i^(2) … φ_i^(S); φ_i^(S) φ_i^(1) … φ_i^(S-1); …; φ_i^(2) φ_i^(3) … φ_i^(1)]

where φ_i^(k) represents the k-th element of the feature vector φ_i, i.e., the feature of the k-th scale.
The response value of a candidate attribute region combination can be calculated by the following formula:

y_i = C(φ_i) w

where w is the scale selection weight parameter; each entry of y_i is the response value of one cyclic shift of the multi-scale features.
Therefore, the selection of the area combination can be realized through the cyclic shift of the cyclic matrix, the efficiency and the accuracy of the selection of the area combination can be improved, and the positioning accuracy of the attribute area is further improved.
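The cyclic-shift selection described above can be sketched as follows. For illustration the per-scale features are reduced to one scalar per scale, and all numeric values are assumed:

```python
import numpy as np

def circulant(v):
    """Rows are successive cyclic shifts of v."""
    return np.stack([np.roll(v, k) for k in range(len(v))])

# Assumed toy data: one scalar feature per scale, S = 5
phi = np.array([0.2, 0.9, 0.4, 0.1, 0.6])   # per-scale features
w = np.array([0.1, 0.5, 0.2, 0.1, 0.1])     # scale-selection weight parameter

C = circulant(phi)
y = C @ w                    # one response value per cyclic shift
best = int(np.argmax(y))     # shift whose scale combination responds most strongly
print(best)  # 0
```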
In an embodiment, in order to increase the calculation speed of the response value and improve the image recognition efficiency, a Fast Fourier Transform (FFT) method may be used to calculate the response value. That is, the step of "calculating a scale response value of the candidate attribute region combination according to the scale selection weight parameter and the feature circulant matrix" may include: performing Fourier transform processing according to the scale selection weight parameter and the feature circulant matrix to obtain the scale response value of the candidate attribute region combination.
For example, according to the properties of the circulant matrix, the response value can also be calculated by FFT:

y_i = F⁻¹((F φ_i)^C ⊙ (F w))

where F represents the Fourier matrix, (·)^C represents the complex conjugate, and ⊙ represents element-wise multiplication; the corresponding time complexity of the algorithm is O(K S log₂ S).
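The equivalence between the direct circulant product and the FFT route can be checked numerically. The feature and weight vectors below are assumed toy values, and `np.conj` plays the role of the complex conjugate (·)^C in the text:

```python
import numpy as np

phi = np.array([0.2, 0.9, 0.4, 0.1, 0.6])  # per-scale features (assumed)
w = np.array([0.1, 0.5, 0.2, 0.1, 0.1])    # scale-selection weights (assumed)

# Direct circulant product: O(S^2)
C = np.stack([np.roll(phi, k) for k in range(len(phi))])
y_direct = C @ w

# FFT route: O(S log S); the conjugate turns convolution into correlation
y_fft = np.real(np.fft.ifft(np.fft.fft(w) * np.conj(np.fft.fft(phi))))

print(np.allclose(y_direct, y_fft))  # True
```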
In the back propagation stage of the DSM module, if δ_i denotes the error of y_i, then the gradient with respect to the scale selection weight can be expressed as ∂L/∂w = F⁻¹(F δ_i ⊙ F φ_i), and the gradient with respect to the features can be expressed as ∂L/∂φ_i = F⁻¹((F δ_i)^C ⊙ F w), the cyclic structure supplying the corresponding shifted expressions. Therefore, the adaptive selection of the optimal scale can be realized based on this response value prediction mode of the multi-scale feature cyclic shift expression.
Through the steps, the optimal target attribute region combination can be selected from the multi-scale candidate attribute regions of the original attribute region. For example, referring to fig. 2c, as a structure diagram of a DSM module, original attribute regions (i.e. rectangular frames) in an input bird image are detected through a region detection network, and then, a DSM may be used to perform scale transformation on each original attribute region to obtain 5 scale candidate attribute regions of each original attribute region; then, extracting the region characteristics of each scale candidate attribute region based on the convolutional neural network, and selecting 3 optimal attribute regions from the 5 scale candidate attribute regions through a circulant matrix module to obtain an optimal region combination.
In one embodiment, there may be multiple optimal attribute region combinations; therefore, multiple attribute region combinations may first be selected from the candidate attribute region combinations based on the scale response value, and then a target attribute region combination, i.e., the final optimal region combination, may be selected from the multiple attribute region combinations. For example, the combination with the largest corresponding feature value may be selected as the target attribute region combination.
In practical application, after selecting the target attribute region combination of the original attribute regions, any region in the target attribute region combination may be used to represent the combination, or the combination itself may be directly used to represent the combination.
For example, referring to FIG. 2b, after detecting the original attribute regions of the two bird images, the optimal attribute regions (e.g., the optimal attribute region combination) matching the scale of each bird image may be selected by the DSM module.
206. And identifying the object in the image to be identified according to the area characteristics of the attribute areas in the target attribute area combination to obtain an identification result.
Specifically, the object in the image to be recognized may be recognized according to the area features of the attribute areas in the target attribute area combination corresponding to each original attribute area.
For example, as shown in fig. 2b, the bird in the image to be recognized can be recognized by combining the region features in the target attribute region combinations corresponding to the respective attribute regions (head, abdomen, foot, etc.) of the bird.
In an embodiment, in order to improve the efficiency and accuracy of recognition, the region features in each target attribute region combination may be fused, and specifically, the step "recognizing the object in the image to be recognized according to the region features of the attribute regions in the target attribute region combination to obtain the recognition result" may include:
fusing the area characteristics of the attribute areas in the target attribute area combination to obtain the target area characteristics corresponding to the original attribute areas;
and identifying the object in the image to be identified according to the target area characteristic corresponding to the original attribute area to obtain an identification result.
For example, for each target attribute region combination corresponding to the original attribute region, the region features of the attribute regions in the target attribute region combination are fused to obtain a combined region feature corresponding to each target attribute region combination; acquiring target area characteristics corresponding to the original attribute areas according to the combined area characteristics corresponding to each target attribute area combination; and identifying the object in the image to be identified according to the characteristics of the target area to obtain an identification result.
For example, the combined region feature of each target attribute region may be selected based on the feature value size of the combined region feature, such as selecting the combined region feature with the largest feature value as the target region feature of the original attribute region.
For example, referring to fig. 2c, after the DSM module selects multiple optimal attribute region combinations (including 3 optimal attribute regions) of the original attribute region, the region features of the 3 attribute regions in the region combinations may be fused to obtain the combined region feature corresponding to each optimal attribute region combination. Then, the combined region feature with the largest feature value is selected from the combined region features combined for the optimal attribute regions, and is used as the target region feature of the original attribute region for subsequent image classification or identification.
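The "largest feature value" selection can be sketched as follows. Here the feature value of a combined region feature is taken to be its L2 norm, which is an illustrative assumption since the document does not fix how the feature value is measured:

```python
import numpy as np

# Assumed: three fused combination features of dimension 2
combo_feats = np.array([[0.1, 0.2],
                        [0.9, 0.4],
                        [0.3, 0.3]])
values = np.linalg.norm(combo_feats, axis=1)      # "feature value" of each combination
target_feature = combo_feats[int(np.argmax(values))]
print(target_feature)  # [0.9 0.4]
```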
In an embodiment, it is considered that in a typical image identification scheme, the features of the attribute region are extracted by using global pooling in the convolutional neural network, so the spatial directional information of the attribute region is lost and spatial directional alignment cannot be performed on the attribute region, resulting in low accuracy of image classification or identification.
In order to overcome the foregoing problems, in the embodiments of the present application, a directional pooling process may be used to achieve spatial directional alignment of attribute regions; for example, for two bird head regions of the same category, one facing left and one facing right, similar features are extracted, so as to improve the accuracy of image classification or identification.
Specifically, the step of "identifying the object in the image to be identified according to the target region feature corresponding to the original attribute region to obtain the identification result" may include:
performing a plurality of spatial directional pooling processes on the target area features corresponding to the original attribute area to obtain a plurality of pooled features;
selecting corresponding features from the plurality of pooled features to obtain target pooled features corresponding to the original attribute region;
and identifying the object in the image to be identified according to the target pooling characteristic corresponding to the original attribute region to obtain an identification result.
There are various spatial orientations, such as horizontal, vertical, rotated horizontal, rotated vertical, and so on.
In an embodiment, to facilitate implementation of spatial directional pooling operation and improve efficiency, the method may perform division and combination on a region feature map corresponding to an original attribute region, and specifically, the step "performing multiple spatial directional pooling processes on a target region feature corresponding to the original attribute region to obtain multiple post-pooling features" may include:
dividing a target area feature map corresponding to an original attribute area into a plurality of overlapped feature sub-blocks;
expanding each feature sub-block into a feature vector to obtain a feature vector set corresponding to the feature map;
and performing a plurality of spatial directional pooling processes on the feature vector set to obtain a plurality of pooled features.
For example, referring to fig. 2b, the global average pooling operation commonly used in the field of image recognition loses a large amount of spatial information; the present technology therefore proposes an Orientation Pooling (OP) module for implementing spatial orientation alignment of image features, which helps to extract the common features of pictures with symmetry, rotation, translation, and other orientation differences.
After obtaining the feature map of the original attribute region, the OP module first divides the feature map into N_H × N_W overlapping sub-blocks, each of which has height h and width w.
Then the features of each sub-block are unfolded into vectors, and the feature map can be re-expressed as a set of sub-block feature vectors. Next, the OP module applies horizontal pooling, vertical pooling, and rotated horizontal and vertical pooling to the re-expressed feature map, obtaining the pooled features Γ_h, Γ_v, Γ'_h and Γ'_v, respectively.
after obtaining the plurality of pooled features, an optimal pooled feature may be selected from the plurality of pooled features, for example, the step "selecting a corresponding feature from the plurality of pooled features to obtain a target pooled feature corresponding to the original attribute region" may include:
performing a fully-connected operation on the feature information after each pooling process based on a shared fully-connected layer to obtain a plurality of pieces of feature information with the same dimensionality;
and selecting corresponding features from the pieces of feature information with the same dimensionality to obtain the target pooled feature corresponding to the original attribute region.
For example, after the OP module performs the above four directional pooling operations, it selects the best of the four pooling results by taking the maximum value to obtain the final response value output, i.e., y = max(ωᵀΓ_h, ωᵀΓ_v, ωᵀΓ'_h, ωᵀΓ'_v), where ω represents the parameters of the shared fully-connected layer. For pictures with different viewing angles, the directional pooling module can better realize feature alignment, retain more spatial information, and further improve the accuracy of classification or identification.
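A minimal sketch of the four-direction pooling and maximum selection follows. The grid size, feature dimension, choice of mean pooling, and ω values are assumptions for illustration; the grid is kept square so that all four pooled vectors share one length:

```python
import numpy as np

def orientation_pooling(gamma, omega):
    """gamma: (N_H, N_W, d) grid of sub-block feature vectors (N_H == N_W assumed
    here so all pooled vectors share one length); omega: shared FC weights."""
    rot = gamma[::-1, ::-1]                      # 180-degree rotation of the grid
    pooled = [
        gamma.mean(axis=0).ravel(),  # horizontal pooling: collapse rows
        gamma.mean(axis=1).ravel(),  # vertical pooling: collapse columns
        rot.mean(axis=0).ravel(),    # rotated horizontal pooling
        rot.mean(axis=1).ravel(),    # rotated vertical pooling
    ]
    # the maximum of the four shared-FC responses is the module output
    return max(float(omega @ p) for p in pooled)

grid = np.array([[[1.0], [2.0]],
                 [[3.0], [4.0]]])   # 2x2 sub-blocks, d = 1
omega = np.array([1.0, 0.0])
print(orientation_pooling(grid, omega))  # 3.5
```

Note that a 180-degree-rotated grid yields the same set of four pooled vectors as the original (the plain and rotated branches swap roles), so the maximum response is invariant to that rotation, which is what lets the module align orientation differences.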
In the embodiment of the application, after the target region features of each original attribute region are obtained, the object in the image can be identified based on the target region features corresponding to each original attribute region. In order to improve the identification accuracy, in an embodiment, the features of the regions can be fused, and the fused features used to identify the object. Specifically, the step of "identifying the object in the image to be identified according to the target area features corresponding to the original attribute areas to obtain an identification result" may include:
fusing the target area characteristics corresponding to each original attribute area to obtain fused target area characteristics;
and identifying the object in the image to be identified according to the fused target area characteristics to obtain an identification result.
The fusion mode of the target region features may be various, for example, a weighted summation mode may be adopted for fusion; for example, the target region features corresponding to each original attribute region are weighted and summed according to the feature weights corresponding to the target regions.
For example, referring to fig. 2b, the DSM module is used to accurately locate the attribute regions of the bird image, the OP module is then used to achieve spatial feature alignment of the obtained attribute regions, and finally the prediction of the classification result is achieved by a weighted summation of the features of different attribute regions. Specifically, after the region features of each attribute region in the bird image are obtained through the DSM and OP modules, they may be weighted and summed using feature weights such as w1, w2, and w3, and the type of bird in the image, such as a sparrow, may then be identified based on the summed features.
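The weighted summation of region features described above amounts to the following. The region names, feature vectors, and weights w1, w2, w3 are placeholder values for illustration:

```python
import numpy as np

# Assumed per-region features (head, abdomen, foot) and learned weights w1..w3
region_feats = {
    "head":    np.array([0.5, 0.1]),
    "abdomen": np.array([0.2, 0.7]),
    "foot":    np.array([0.3, 0.3]),
}
weights = {"head": 0.5, "abdomen": 0.3, "foot": 0.2}

# weighted sum of the region features, fed to the classifier
fused = sum(weights[k] * region_feats[k] for k in region_feats)
print(fused)  # [0.37 0.32]
```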
The image identification method provided by the embodiment of the application can be applied to any image identification scene. For example, in the current chip and circuit board industry, there is a very large demand for defect detection. Traditional defect detection requires a large number of experienced experts for classification, which places high requirements on personnel, is costly, and is inefficient. Most of these defects appear as substantially similar black areas, differing only in detail: for example, some defects have halos while others approximate water stains. Therefore, the fine-grained image classification algorithm provided by the embodiment of the application can be used for identification. Meanwhile, the defects in such images have different sizes and asymmetric spatial morphology, and are therefore well suited to being identified by a spatial and scale alignment algorithm.
For example, the image to be identified may be a circuit board image containing a defect area, and a conventional circuit defect may be mostly represented as a black area on the image, so that the defect area in the circuit image may be a black area. According to the implementation of the present application, the DSM module is used to accurately locate the attribute region of the circuit image (the attribute region may be a region that characterizes the defect region), then the OP module is used to implement spatial feature alignment on the obtained attribute region, and finally the prediction of the classification result is implemented by the weighted summation of the features of different attribute regions.
Specifically, referring to fig. 2b, the original attribute regions (i.e., rectangular frames) in an input circuit image are detected through the region detection network, and the DSM then performs scale transformation on each original attribute region to obtain multiple scale candidate attribute regions per original attribute region; next, the region features of each scale candidate attribute region are extracted by the convolutional neural network, and a predetermined number of optimal attribute regions are selected from the multiple scale candidate attribute regions through the circulant matrix module to obtain an optimal region combination.
Referring to fig. 2c, after the DSM module selects multiple optimal attribute region combinations (including 3 optimal attribute regions) of the original attribute region, the region features of the attribute regions in the region combinations may be fused to obtain a combined region feature corresponding to each optimal attribute region combination. Then, the combined region feature with the largest feature value is selected from the combined region features combined for the optimal attribute regions, and is used as the target region feature of the original attribute region for subsequent image classification or identification.
The OP module performs multiple spatial directional pooling processes, such as the four described above, on the target area features to obtain the pooled area features of each original attribute area; the pooled area features of each original attribute area are then weighted and summed using the weight parameters, and the type of the defect area in the circuit image, such as burnout or damage, is identified based on the summed features.
In addition, the method of the embodiment of the application can also be applied in various other areas, such as mobile phone model identification in the field of electronic commerce; it has a wide application prospect, and specific applications can be implemented with reference to the introduction of the embodiments above.
As can be seen from the above, the embodiment of the application acquires an image to be identified; detecting attribute areas of an object in an image to be identified to obtain a plurality of original attribute areas; carrying out scale transformation on the original attribute region to obtain candidate attribute regions with multiple scales; extracting the features of the candidate attribute regions to obtain a multi-scale feature group, wherein the multi-scale feature group comprises the region features of the candidate attribute regions of each scale; selecting attribute areas of corresponding scales from the candidate attribute areas of multiple scales based on the multi-scale feature group to obtain a target attribute area combination corresponding to the original attribute area; and identifying the object in the image to be identified according to the area characteristics of the attribute areas in the target attribute area combination to obtain an identification result. According to the scheme, multi-scale candidate attribute regions can be generated for a single original attribute region, and the attribute region with the best scale is selected in a self-adaptive mode, so that images with different scales can be self-adapted, accurate positioning or attribute region detection is achieved, and accuracy of image identification is improved.
In addition, a morphological constraint can be introduced in the scheme, so that the i-th attribute regions of different samples carry the same semantic information (for example, all correspond to the head). After the semantic alignment of the attribute regions is realized, the directional pooling algorithm is used to realize the spatial directional alignment of the attribute regions; for example, for two bird head regions of the same category, one facing left and one facing right, similar features are extracted, further improving the accuracy of classification or image identification.
The method according to the preceding embodiment is illustrated in further detail below by way of example.
In this embodiment, the image recognition apparatus will be described by taking an example in which the image recognition apparatus is specifically integrated in a computer device.
At present, as can be seen from analyzing samples of a public fine-grained image classification data set (refer to fig. 3a), the images exhibit various differences due to different shooting distances and shooting angles. For example, the upper and lower pictures in each column belong to the same category, but the first column on the left shows a scale difference, the second column on the left a symmetry difference, the third column on the right a rotation difference, and the first column on the right a translation difference. Therefore, an algorithm needs to be designed to accurately position the attribute regions of a picture and realize the spatial alignment of attribute features, thereby improving the accuracy of classification or image identification.
In order to improve the accuracy of image recognition or classification, the following scheme can be adopted in the embodiment of the application:
and (I) training the network.
First, a computer device may obtain a sample image set and train an image recognition network from it. Referring to fig. 2b, the image recognition network may include: a region detection network, a scale mining discrimination (DSM) module, and a directional pooling (OP) module. The structure of the DSM module in particular can be referred to the description of the above embodiments.
In the embodiment of the application, the image recognition network may be trained based on the sample image labeled with the object type, for example, the image recognition network may be trained in a back propagation manner. For the training method of the area detection network, the above description may be referred to.
And secondly, the target object in the image to be recognized can be recognized through the trained image recognition network.
As shown in fig. 3b, an image recognition method specifically includes the following steps:
301. the computer device obtains an image to be recognized.
For example, a computer device may receive bird images uploaded by a user.
302. And the computer equipment adopts a region detection network to perform attribute region detection on the object in the image to be identified so as to obtain a plurality of original attribute regions.
For example, the computer device may input a bird image to be recognized into an area detection network in the image recognition network, and detect an original attribute area of the bird image, such as a head area, a foot area, an abdomen area, and the like, through the area detection network.
For another example, the computer device may input the image of the defective circuit board to be identified to an area detection network in the image identification network, and detect the original attribute area of the image of the defective circuit board through the area detection network.
303. And the computer equipment performs scale transformation on each original attribute region through the scale mining discrimination module to obtain the multi-scale candidate attribute regions of each original attribute region.
For example, the DSM module may generate the multi-scale attribute region by scale scaling, and the specific scaling manner may refer to the description of the above embodiments.
304. And the computer equipment performs feature extraction on the candidate attribute regions through the scale mining discrimination module to obtain a multi-scale feature group for each original attribute region.
Wherein the multi-scale feature group comprises the region features of the candidate attribute regions of each scale.
For example, the DSM module may extract region features through a convolutional network.
305. And the computer equipment constructs the feature circulant matrix corresponding to the multi-scale feature group through the scale mining discrimination module.
For example, a convolutional neural network in the DSM module is adopted to extract the region features of each candidate attribute region of the original attribute region to obtain a set of multi-scale features; thereafter, the circulant matrix of the multi-scale feature set can be constructed. The specific circulant matrix is described above.
306. And the computer equipment calculates the scale response values of the candidate attribute region combinations through the scale mining discrimination module based on the scale selection weight parameter and the feature circulant matrix.
The scale response value corresponding to a candidate attribute region combination is the response value indicating that the scale combination of the candidate attribute region combination is the optimal scale combination; the scale response value represents the distance between the candidate attribute region combination and the optimal scale combination: a larger response value indicates a smaller distance from the optimal scale combination (i.e., closer to the optimal scale combination); conversely, a smaller response value indicates a greater distance from the optimal scale combination (i.e., farther from the optimal scale combination). The specific scale response value calculation is described above.
307. And the computer equipment selects a target attribute region combination corresponding to the original attribute region from the candidate attribute region combinations through the scale mining discrimination module based on the scale response value.
Specifically, the attribute region combinations whose response values rank in the top N may be selected, N being a positive integer greater than 1.
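Selecting the combinations whose response values rank in the top N is a simple argsort. The response values below are assumed toy values:

```python
import numpy as np

# Assumed scale response values for four candidate attribute region combinations
responses = np.array([0.2, 0.9, 0.4, 0.7])
N = 2
top_n = np.argsort(responses)[::-1][:N]   # indices of the N highest responses
print(top_n.tolist())  # [1, 3]
```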
308. The computer device fuses the region features of the attribute regions in the target attribute region combination through the scale mining and discrimination module to obtain the target region features corresponding to the original attribute region.
For example, the computer device may fuse, for each target attribute region combination corresponding to the original attribute region, the region features of the attribute regions in the target attribute region combination to obtain a combined region feature corresponding to each target attribute region combination; and acquiring the target area characteristics corresponding to the original attribute areas according to the combined area characteristics corresponding to each target attribute area combination.
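The fusion operation itself is not fixed by the text above; element-wise averaging is one possible choice for both steps. A sketch under that assumption (the shapes are illustrative):

```python
import numpy as np

def fuse_combinations(combos):
    """combos: list of (K, D) arrays -- K region features per target combination.

    Step 1: average the region features inside each target attribute region
            combination to get one combined region feature per combination.
    Step 2: average the combined features over all combinations to get the
            target region feature of the original attribute region.
    (Averaging at both steps is an assumption; concatenation or weighted
    sums would also fit the description.)
    """
    combined = [c.mean(axis=0) for c in combos]   # one (D,) vector per combo
    return np.mean(combined, axis=0)              # final (D,) target feature

combos = [np.ones((3, 8)), 3 * np.ones((2, 8))]
target = fuse_combinations(combos)
```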
309. The computer device performs a plurality of spatial directional pooling processes on the target region features corresponding to the original attribute region through a directional pooling module to obtain a plurality of pooled features.
The specific pooling may be as described with reference to the above examples.
310. The computer device selects corresponding features from the plurality of pooled features through the directional pooling module to obtain the target pooling feature corresponding to the original attribute region.
For example, full-connection operation can be performed on the pooled features to obtain feature information with the same dimensions; and selecting corresponding features from the feature information with the same dimensions to obtain the target pooling features corresponding to the attribute region combination.
311. The computer device inputs the target pooling feature into a classification network, and classifies the object in the image to be recognized through the classification network to obtain a recognition result.
For example, the classification network classifies the target object in the image based on the input target pooling feature to obtain a classification result, and the classification result is used as a recognition result.
For example, when the image to be recognized is a bird image, it is possible to recognize that a bird in the bird image is a yellow bird in the above manner.
For another example, when the image to be recognized is a defective circuit board image, the type of the circuit board defect may be recognized as damage or the like in the above manner.
Therefore, multi-scale candidate attribute regions can be generated for a single original attribute region through the scale mining and discrimination module, and the attribute region with the optimal scale can be selected adaptively, so that images of different scales can be adapted to, accurate positioning or detection of the attribute region can be realized, and the accuracy of image recognition is improved.
In addition, a morphological constraint can be introduced in the scheme, so that the i-th attribute regions of different samples carry the same semantic information. After semantic alignment of the attribute regions is realized, spatial directional alignment of the attribute regions is realized by using a plurality of directional pooling algorithms through the directional pooling module; for example, for two bird head regions of the same category that face left and right respectively, similar features are extracted, further improving the accuracy of classification or image identification.
In order to better implement the above method, the embodiment of the present application further provides an image recognition apparatus, which may be integrated in a computer device, such as a server or a terminal.
For example, as shown in fig. 4a, the image recognition apparatus may include an acquiring unit 401, a detecting unit 402, a transforming unit 403, a feature extraction unit 404, a region selection unit 405, an identifying unit 406, and the like as follows:
an acquiring unit 401, configured to acquire an image to be recognized;
a detecting unit 402, configured to perform attribute region detection on an object in the image to be identified, so as to obtain a plurality of original attribute regions;
a transforming unit 403, configured to perform scale transformation on the original attribute region to obtain candidate attribute regions of multiple scales;
a feature extraction unit 404, configured to perform feature extraction on the candidate attribute region to obtain a multi-scale feature group, where the multi-scale feature group includes a region feature of the candidate attribute region at each scale;
a region selection unit 405, configured to select, based on the multi-scale feature group, an attribute region of a corresponding scale from the candidate attribute regions of multiple scales, so as to obtain a target attribute region combination corresponding to the original attribute region;
and the identifying unit 406 is configured to identify an object in the image to be identified according to the area feature of the attribute area in the target attribute area combination, so as to obtain an identification result.
In an embodiment, referring to fig. 4b, the region selection unit 405 includes:
a matrix construction subunit 4051, configured to construct a corresponding feature circulation matrix based on the multi-scale feature group;
a response value operator unit 4052, configured to calculate a scale response value corresponding to a candidate attribute region combination based on the feature circulation matrix, where the scale response value is the response value of the scale combination of the candidate attribute region combination being the optimal scale combination;
a selecting sub-unit 4053, configured to select, based on the scale response value, a target attribute region combination corresponding to the original attribute region from the candidate attribute region combinations.
In an embodiment, the response value operator unit 4052 is configured to: obtain a scale selection weight parameter; and calculate a scale response value of the candidate attribute region combination according to the scale selection weight parameter and the feature circulation matrix.
In an embodiment, the response value operator unit 4052 is configured to: perform Fourier transform processing according to the scale selection weight parameter and the feature circulation matrix to obtain the scale response value of the candidate attribute region combination.
In an embodiment, referring to fig. 4c, the identifying unit 406 may include:
the fusion subunit 4061 is configured to fuse the region features of the attribute regions in the target attribute region combination to obtain target region features corresponding to the original attribute region;
the identifying subunit 4062 is configured to identify, according to the target area feature corresponding to the original attribute area, an object in the image to be identified, so as to obtain an identification result.
In an embodiment, the fusing subunit 4061 is configured to fuse, for each target attribute region combination corresponding to the original attribute region, the region features of the attribute regions in the target attribute region combination to obtain a combined region feature corresponding to each target attribute region combination; and according to the combined area characteristic corresponding to each target attribute area combination, acquiring the target area characteristic corresponding to the original attribute area.
In an embodiment, the identifying subunit 4062 is configured to: perform a plurality of spatial directional pooling processes on the target area characteristics corresponding to the original attribute area to obtain a plurality of pooled characteristics; select corresponding features from the plurality of pooled features to obtain the target pooling feature corresponding to the original attribute region; and identify the object in the image to be identified according to the target pooling feature corresponding to the original attribute region to obtain an identification result.
In an embodiment, the identifying subunit 4062 is configured to: fuse the target area characteristics corresponding to each original attribute area to obtain fused target area characteristics; and identify the object in the image to be identified according to the fused target area characteristics to obtain an identification result.
In an embodiment, the identifying subunit 4062 is configured to:
dividing the target area feature map corresponding to the original attribute area into a plurality of overlapped feature sub-blocks;
expanding each feature sub-block into a feature vector to obtain a feature vector set corresponding to the feature map;
and performing a plurality of spatial directional pooling processes on the feature vector set to obtain a plurality of pooled features.
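A hedged sketch of the sub-block division step described above (the block size and stride below are assumptions; any values producing overlapping blocks would fit):

```python
import numpy as np

def overlapping_subblocks(fmap, size=2, stride=1):
    """Split an (H, W) feature map into overlapping size x size sub-blocks
    and expand each sub-block into a flat feature vector.

    With stride < size the sub-blocks overlap, as required. The returned
    array stacks one vector per sub-block (the feature vector set).
    """
    H, W = fmap.shape
    vecs = []
    for i in range(0, H - size + 1, stride):
        for j in range(0, W - size + 1, stride):
            vecs.append(fmap[i:i + size, j:j + size].ravel())
    return np.stack(vecs)

V = overlapping_subblocks(np.arange(16.0).reshape(4, 4))
```

On a 4x4 map with 2x2 blocks and stride 1 this yields nine overlapping sub-blocks, each flattened to a 4-dim vector.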
In an embodiment, the identifying subunit 4062 is configured to:
performing full-connection operation on the characteristic information after each pooling process based on a shared full-connection layer to obtain a plurality of characteristic information with the same dimensionality;
and selecting corresponding features from the feature information with the same dimensions to obtain the target pooling features corresponding to the attribute region combination.
In an embodiment, the detecting unit 402 is configured to perform attribute region detection on an object in the image to be recognized by using a trained region detection network.
In an embodiment, referring to fig. 4d, the image recognition apparatus may further include: a training unit 407;
the training unit is configured to:
acquiring a plurality of sample images of the marked attribute regions;
respectively carrying out attribute region detection on the sample objects in the sample images by adopting a preset region detection network to obtain n sample attribute regions of the sample images, wherein n is a positive integer greater than 1;
acquiring cross entropy loss of an s-th sample attribute area in each sample image, wherein s is a positive integer and is less than or equal to n;
acquiring similarity information between the s-th sample attribute areas in each sample image;
and training the preset area detection network according to the cross entropy loss and the similarity information to obtain the trained area detection network.
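The combined training objective can be sketched as follows; the cosine form of the similarity term and the weight `alpha` are assumptions, since the text only states that cross entropy loss and inter-sample similarity information are both used:

```python
import numpy as np

def cross_entropy(probs, label):
    """Cross entropy of a predicted class distribution against the true label."""
    return -np.log(probs[label] + 1e-12)

def similarity_penalty(feat_a, feat_b):
    """Encourage the s-th attribute region of two samples to share semantics:
    penalize low cosine similarity between their region features."""
    cos = feat_a @ feat_b / (np.linalg.norm(feat_a) * np.linalg.norm(feat_b))
    return 1.0 - cos

def region_loss(probs, label, feat_a, feat_b, alpha=0.5):
    """Total loss for one sample attribute region (alpha is an assumed weight)."""
    return cross_entropy(probs, label) + alpha * similarity_penalty(feat_a, feat_b)

loss = region_loss(np.array([0.1, 0.8, 0.1]), 1,
                   np.array([1.0, 0.0]), np.array([1.0, 0.0]))
```

When the two samples' s-th region features are identical, the similarity penalty vanishes and only the cross entropy term remains.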
In an embodiment, the transformation unit 403 is configured to: determining a central point and a plurality of vertexes of the original attribute region; determining a first vertex far from the central point and a second vertex close to the central point from the plurality of vertices; and amplifying the scale of the original attribute region according to the first vertex, and reducing the scale of the original attribute region according to the second vertex to obtain candidate attribute regions with multiple scales.
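A minimal sketch of the scale transformation, assuming axis-aligned regions scaled about their center point (the factor set is illustrative; the patent's vertex-based enlargement and reduction amounts are not specified numerically):

```python
def scale_region(box, factors=(0.75, 1.0, 1.25)):
    """Scale an axis-aligned attribute region about its center point.

    box: (x1, y1, x2, y2). Factors below 1 shrink the region toward the
    center (moving the near vertices inward); factors above 1 enlarge it
    (moving the far vertices outward). Returns one candidate box per factor.
    """
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) / 2, (y2 - y1) / 2
    return [
        (cx - f * half_w, cy - f * half_h, cx + f * half_w, cy + f * half_h)
        for f in factors
    ]

candidates = scale_region((10, 10, 30, 50))
```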
In a specific implementation, the above units may be implemented as independent entities, or may be combined arbitrarily to be implemented as the same or several entities, and the specific implementation of the above units may refer to the foregoing method embodiments, which are not described herein again.
As can be seen from the above, in the embodiment of the present application, the image recognition apparatus acquires an image to be recognized through the acquiring unit 401; detects attribute regions of an object in the image to be identified through the detecting unit 402 to obtain a plurality of original attribute regions; performs scale transformation on the original attribute regions through the transformation unit 403 to obtain candidate attribute regions of multiple scales; performs feature extraction on the candidate attribute regions through the feature extraction unit 404 to obtain a multi-scale feature group, where the multi-scale feature group includes the region features of the candidate attribute regions at each scale; selects attribute regions of corresponding scales from the candidate attribute regions of multiple scales through the region selection unit 405 based on the multi-scale feature group to obtain a target attribute region combination corresponding to the original attribute region; and identifies the object in the image to be identified through the identifying unit 406 according to the region features of the attribute regions in the target attribute region combination to obtain a recognition result. According to this scheme, multi-scale candidate attribute regions can be generated for a single original attribute region, and the attribute region with the optimal scale can be selected adaptively, so that images of different scales can be adapted to, accurate positioning or detection of the attribute region can be realized, and the accuracy of image recognition is improved.
The embodiment of the present application further provides a computer device, as shown in fig. 5, which shows a schematic structural diagram of the computer device according to the embodiment of the present application, specifically:
the computer device may include components such as a processor 501 of one or more processing cores, memory 502 of one or more computer-readable storage media, a power supply 503, and an input unit 504. Those skilled in the art will appreciate that the computer device configuration illustrated in FIG. 5 does not constitute a limitation of computer devices, and may include more or fewer components than those illustrated, or some components may be combined, or a different arrangement of components. Wherein:
the processor 501 is a control center of the computer device, connects various parts of the entire computer device by using various interfaces and lines, and performs various functions of the computer device and processes data by running or executing software programs and/or modules stored in the memory 502 and calling data stored in the memory 502, thereby monitoring the computer device as a whole. Optionally, processor 501 may include one or more processing cores; preferably, the processor 501 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 501.
The memory 502 may be used to store software programs and modules, and the processor 501 executes various functional applications and data processing by running the software programs and modules stored in the memory 502. The memory 502 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the data storage area may store data created according to the use of the computer device, and the like. Further, the memory 502 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. Accordingly, the memory 502 may also include a memory controller to provide the processor 501 with access to the memory 502.
The computer device further comprises a power supply 503 for supplying power to the various components, and preferably, the power supply 503 may be logically connected to the processor 501 through a power management system, so that functions of managing charging, discharging, power consumption, and the like are realized through the power management system. The power supply 503 may also include any component of one or more dc or ac power sources, recharging systems, power failure detection circuitry, power converters or inverters, power status indicators, and the like.
The computer device may also include an input unit 504, and the input unit 504 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the computer device may further include a display unit and the like, which are not described in detail herein. Specifically, in this embodiment, the processor 501 in the computer device loads the executable file corresponding to the process of one or more application programs into the memory 502 according to the following instructions, and the processor 501 runs the application programs stored in the memory 502, so as to implement various functions as follows:
acquiring an image to be identified; detecting attribute areas of the objects in the image to be identified to obtain a plurality of original attribute areas; carrying out scale transformation on the original attribute region to obtain candidate attribute regions with multiple scales; extracting the features of the candidate attribute region to obtain a multi-scale feature group, wherein the multi-scale feature group comprises the region features of the candidate attribute region of each scale; selecting attribute areas of corresponding scales from the candidate attribute areas of multiple scales based on the multi-scale feature group to obtain a target attribute area combination corresponding to the original attribute area; and identifying the object in the image to be identified according to the area characteristics of the attribute areas in the target attribute area combination to obtain an identification result.
The above operations can be referred to the previous embodiments specifically, and are not described herein.
As can be seen from the above, the computer device of the present embodiment acquires the image to be recognized; detecting attribute areas of the objects in the image to be identified to obtain a plurality of original attribute areas; carrying out scale transformation on the original attribute region to obtain candidate attribute regions with multiple scales; extracting the features of the candidate attribute region to obtain a multi-scale feature group, wherein the multi-scale feature group comprises the region features of the candidate attribute region of each scale; selecting attribute areas of corresponding scales from the candidate attribute areas of multiple scales based on the multi-scale feature group to obtain a target attribute area combination corresponding to the original attribute area; and identifying the object in the image to be identified according to the area characteristics of the attribute areas in the target attribute area combination to obtain an identification result. According to the scheme, multi-scale candidate attribute regions can be generated for a single original attribute region, and the attribute region with the best scale is selected in a self-adaptive mode, so that images with different scales can be self-adapted, accurate positioning or attribute region detection is achieved, and accuracy of image identification is improved.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be performed by instructions or by associated hardware controlled by the instructions, which may be stored in a computer readable storage medium and loaded and executed by a processor.
To this end, the present application provides a computer-readable storage medium, in which a computer program is stored, where the computer program can be loaded by a processor to execute the steps in any one of the image recognition methods provided in the present application. For example, the computer program may perform the steps of:
acquiring an image to be identified; detecting attribute areas of the objects in the image to be identified to obtain a plurality of original attribute areas; carrying out scale transformation on the original attribute region to obtain candidate attribute regions with multiple scales; extracting the features of the candidate attribute region to obtain a multi-scale feature group, wherein the multi-scale feature group comprises the region features of the candidate attribute region of each scale; selecting attribute areas of corresponding scales from the candidate attribute areas of multiple scales based on the multi-scale feature group to obtain a target attribute area combination corresponding to the original attribute area; and identifying the object in the image to be identified according to the area characteristics of the attribute areas in the target attribute area combination to obtain an identification result.
The above operations can be implemented in the foregoing embodiments, and are not described in detail herein.
Wherein the computer-readable storage medium may include: read Only Memory (ROM), Random Access Memory (RAM), magnetic or optical disks, and the like.
Since the instructions stored in the computer-readable storage medium can execute the steps in any image recognition method provided in the embodiments of the present application, the beneficial effects that can be achieved by any image recognition method provided in the embodiments of the present application can be achieved, which are detailed in the foregoing embodiments and will not be described again here.
The foregoing describes an image recognition method, an image recognition apparatus, a computer device, and a computer-readable storage medium provided in the embodiments of the present application in detail, and specific examples are applied herein to illustrate the principles and implementations of the present invention, and the descriptions of the foregoing embodiments are only used to help understand the method and the core idea of the present invention; meanwhile, for those skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (15)

1. An image recognition method, comprising:
acquiring an image to be identified;
detecting attribute areas of the objects in the image to be identified to obtain a plurality of original attribute areas;
carrying out scale transformation on the original attribute region to obtain candidate attribute regions with multiple scales;
extracting the features of the candidate attribute region to obtain a multi-scale feature group, wherein the multi-scale feature group comprises the region features of the candidate attribute region of each scale;
selecting attribute areas of corresponding scales from the candidate attribute areas of multiple scales based on the multi-scale feature group to obtain a target attribute area combination corresponding to the original attribute area;
and identifying the object in the image to be identified according to the area characteristics of the attribute areas in the target attribute area combination to obtain an identification result.
2. The image recognition method of claim 1, wherein selecting attribute regions of corresponding scales from the candidate attribute regions of multiple scales based on the multi-scale feature group to obtain a target attribute region combination corresponding to an original attribute region comprises:
constructing a corresponding feature circulation matrix based on the multi-scale feature group;
calculating a scale response value corresponding to a candidate attribute region combination based on the feature circulation matrix, wherein the scale response value is a response value of the scale combination of the candidate attribute region combination being an optimal scale combination;
and selecting a target attribute region combination corresponding to the original attribute region from the candidate attribute region combinations based on the scale response value.
3. The image recognition method of claim 2, wherein calculating the scale response value corresponding to the candidate attribute region combination based on the feature circulant matrix comprises:
obtaining a scale selection weight parameter;
and calculating a scale response value of the candidate attribute region combination according to the scale selection weight parameter and the feature circulation matrix.
4. The image recognition method of claim 3, wherein calculating a scale response value of a candidate attribute region combination according to the scale selection weight parameter and the feature circulant matrix comprises:
and carrying out Fourier transform processing according to the scale selection weight parameters and the characteristic circulation matrix to obtain a scale response value of the candidate attribute region combination.
5. The image recognition method of claim 2, wherein recognizing the object in the image to be recognized according to the area features of the attribute areas in the target attribute area combination to obtain a recognition result comprises:
fusing the area characteristics of the attribute areas in the target attribute area combination to obtain the target area characteristics corresponding to the original attribute areas;
and identifying the object in the image to be identified according to the target area characteristic corresponding to the original attribute area to obtain an identification result.
6. The image recognition method of claim 5, wherein fusing the region features of the attribute regions in the target attribute region combination to obtain the target region feature corresponding to the original attribute region, comprises:
for each target attribute region combination corresponding to the original attribute region, fusing the region characteristics of the attribute regions in the target attribute region combination to obtain the combined region characteristics corresponding to each target attribute region combination;
and according to the combined area characteristic corresponding to each target attribute area combination, acquiring the target area characteristic corresponding to the original attribute area.
7. The image recognition method of claim 5, wherein recognizing the object in the image to be recognized according to the target area feature corresponding to the original attribute area to obtain a recognition result comprises:
performing a plurality of spatial directional pooling processes on the target area characteristics corresponding to the original attribute area to obtain a plurality of pooled characteristics;
selecting corresponding features from the plurality of pooled features to obtain target pooled features corresponding to the original attribute region;
and identifying the object in the image to be identified according to the target pooling characteristic corresponding to the original attribute region to obtain an identification result.
8. The image recognition method of claim 5, wherein recognizing the object in the image to be recognized according to the target area feature corresponding to the original attribute area to obtain a recognition result comprises:
fusing the target area characteristics corresponding to each original attribute area to obtain fused target area characteristics;
and identifying the object in the image to be identified according to the fused target area characteristics to obtain an identification result.
9. The image recognition method of claim 7, wherein performing a plurality of spatially directional pooling processes on the target region feature corresponding to the original attribute region to obtain a plurality of pooled features comprises:
dividing the target area feature map corresponding to the original attribute area into a plurality of overlapped feature sub-blocks;
expanding each feature sub-block into a feature vector to obtain a feature vector set corresponding to the feature map;
and performing a plurality of spatial directional pooling processes on the feature vector set to obtain a plurality of pooled features.
10. The image recognition method of claim 7, wherein selecting corresponding features from the plurality of pooled features to obtain target pooled features corresponding to the original attribute region comprises:
performing full-connection operation on the characteristic information after each pooling process based on a shared full-connection layer to obtain a plurality of characteristic information with the same dimensionality;
and selecting corresponding features from the feature information with the same dimensions to obtain the target pooling features corresponding to the attribute region combination.
11. The image recognition method of claim 1, wherein detecting the attribute region of the object in the image to be recognized comprises: and adopting a trained area detection network to perform attribute area detection on the object in the image to be identified.
12. The image recognition method of claim 11, further comprising:
acquiring a plurality of sample images of the marked attribute regions;
respectively carrying out attribute region detection on the sample objects in the sample images by adopting a preset region detection network to obtain n sample attribute regions of the sample images, wherein n is a positive integer greater than 1;
acquiring cross entropy loss of an s-th sample attribute area in each sample image, wherein s is a positive integer and is less than or equal to n;
acquiring similarity information between the s-th sample attribute areas in each sample image;
and training the preset area detection network according to the cross entropy loss and the similarity information to obtain the trained area detection network.
13. An image recognition apparatus, comprising:
the device comprises an acquisition unit, a recognition unit and a processing unit, wherein the acquisition unit is used for acquiring an image to be recognized;
the detection unit is used for detecting the attribute areas of the objects in the image to be identified to obtain a plurality of original attribute areas;
the transformation unit is used for carrying out scale transformation on the original attribute region to obtain candidate attribute regions with a plurality of scales;
the characteristic extraction unit is used for extracting the characteristics of the candidate attribute regions to obtain a multi-scale characteristic group, and the multi-scale characteristic group comprises the region characteristics of the candidate attribute regions of each scale;
the region selection unit is used for selecting attribute regions of corresponding scales from the candidate attribute regions of multiple scales based on the multi-scale feature group to obtain a target attribute region combination corresponding to the original attribute region;
and the identification unit is used for identifying the object in the image to be identified according to the area characteristics of the attribute areas in the target attribute area combination to obtain an identification result.
14. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, carries out the steps of the method according to any one of claims 1-12.
15. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the steps of the method according to any of claims 1-12 are implemented when the program is executed by the processor.
CN201910851503.2A 2019-09-04 2019-09-04 Image recognition method and device, computer equipment and readable storage medium Pending CN110598715A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910851503.2A CN110598715A (en) 2019-09-04 2019-09-04 Image recognition method and device, computer equipment and readable storage medium


Publications (1)

Publication Number Publication Date
CN110598715A true CN110598715A (en) 2019-12-20

Family

ID=68858550

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910851503.2A Pending CN110598715A (en) 2019-09-04 2019-09-04 Image recognition method and device, computer equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN110598715A (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401376A (en) * 2020-03-12 2020-07-10 腾讯科技(深圳)有限公司 Target detection method, target detection device, electronic equipment and storage medium
CN111401376B (en) * 2020-03-12 2023-06-30 腾讯科技(深圳)有限公司 Target detection method, target detection device, electronic equipment and storage medium
CN111598107A (en) * 2020-04-17 2020-08-28 南开大学 Multi-task joint detection method based on dynamic feature selection
CN111583256B (en) * 2020-05-21 2022-11-04 北京航空航天大学 Dermatoscope image classification method based on rotating mean value operation
CN111583256A (en) * 2020-05-21 2020-08-25 北京航空航天大学 Dermatoscope image classification method based on rotating mean value operation
CN112288724A (en) * 2020-10-30 2021-01-29 北京市商汤科技开发有限公司 Defect detection method and device, electronic equipment and storage medium
CN112288724B (en) * 2020-10-30 2023-10-20 北京市商汤科技开发有限公司 Defect detection method and device, electronic equipment and storage medium
CN112257728A (en) * 2020-11-12 2021-01-22 腾讯科技(深圳)有限公司 Image processing method, image processing apparatus, computer device, and storage medium
CN112257728B (en) * 2020-11-12 2021-08-17 腾讯科技(深圳)有限公司 Image processing method, image processing apparatus, computer device, and storage medium
US11669990B2 (en) 2021-01-21 2023-06-06 Beijing Baidu Netcom Science And Technology Co., Ltd. Object area measurement method, electronic device and storage medium
CN112857268A (en) * 2021-01-21 2021-05-28 北京百度网讯科技有限公司 Object area measuring method, device, electronic device and storage medium
CN112990152B (en) * 2021-05-10 2021-07-30 中国科学院自动化研究所 Vehicle weight identification method based on key point detection and local feature alignment
CN112990152A (en) * 2021-05-10 2021-06-18 中国科学院自动化研究所 Vehicle weight identification method based on key point detection and local feature alignment

Similar Documents

Publication Publication Date Title
CN110598715A (en) Image recognition method and device, computer equipment and readable storage medium
Wang et al. SaliencyGAN: Deep learning semisupervised salient object detection in the fog of IoT
Wang et al. Detect globally, refine locally: A novel approach to saliency detection
US10936911B2 (en) Logo detection
CN110909651B (en) Method, device and equipment for identifying video main body characters and readable storage medium
Wang et al. RGB-D salient object detection via minimum barrier distance transform and saliency fusion
CN109960742B (en) Local information searching method and device
CN107045631B (en) Method, device and equipment for detecting human face characteristic points
CN109658455A (en) Image processing method and processing equipment
CN109816769A (en) Scene based on depth camera ground drawing generating method, device and equipment
Wang et al. Small-object detection based on yolo and dense block via image super-resolution
WO2021114814A1 (en) Human body attribute recognition method and apparatus, electronic device and storage medium
US9626585B2 (en) Composition modeling for photo retrieval through geometric image segmentation
CN109657583A (en) Face's critical point detection method, apparatus, computer equipment and storage medium
EP2697775A1 (en) Method of detecting facial attributes
CN108846404B (en) Image significance detection method and device based on related constraint graph sorting
CN111652974B (en) Method, device, equipment and storage medium for constructing three-dimensional face model
CN108830185B (en) Behavior identification and positioning method based on multi-task joint learning
CN113095106A (en) Human body posture estimation method and device
CN114155365B (en) Model training method, image processing method and related device
CN114445633A (en) Image processing method, apparatus and computer-readable storage medium
CN111382791B (en) Deep learning task processing method, image recognition task processing method and device
CN113822254B (en) Model training method and related device
CN107948586A (en) Trans-regional moving target detecting method and device based on video-splicing
CN109635755A (en) Face extraction method, apparatus and storage medium

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40018726

Country of ref document: HK

SE01 Entry into force of request for substantive examination