CN111340124A - Method and device for identifying entity category in image - Google Patents
Method and device for identifying entity category in image
- Publication number
- CN111340124A (application CN202010139339.5A)
- Authority
- CN
- China
- Prior art keywords
- target
- image
- classification
- features
- entity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/46—Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
- G06V10/462—Salient features, e.g. scale invariant feature transforms [SIFT]
- G06V10/464—Salient features, e.g. scale invariant feature transforms [SIFT] using a plurality of salient features, e.g. bag-of-words [BoW] representations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/254—Fusion techniques of classification results, e.g. of results related to same input data
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Physics & Mathematics (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- Life Sciences & Earth Sciences (AREA)
- Multimedia (AREA)
- Image Analysis (AREA)
Abstract
The application provides a method and a device for identifying entity categories in an image. The method comprises the following steps: extracting image features of a target image according to a preset algorithm, and determining at least one target area containing a target entity in the target image according to the image features; extracting the classification feature of each target area in the at least one target area, and fusing all the classification features to obtain a local classification feature; extracting a global classification feature of the target image, and fusing the local classification feature and the global classification feature to obtain a target classification feature; and determining the entity category of the target entity according to the target classification feature. In this way, the local features of the entity are strengthened, and the entity category in the image can be accurately determined from the combination of the global and local features, which solves the technical problem in the prior art that a specific entity category is difficult to determine because the global features of images are similar.
Description
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for identifying entity categories in an image.
Background
As a basic and important task in image understanding, image object classification has attracted great research interest in recent years, and has been successfully deployed in many application products to solve practical problems. With the rapid development of deep learning in recent years, deep learning has also become the state-of-the-art technique for image object classification, that is, entity categories in an image are learned from the global depth features of the image.
In the related art, however, the global features of images are similar in many scenes, which makes it difficult to identify entity categories from the global features alone, and entities of different categories are often classified into one category. For example, when identifying categories of food, the high similarity between different foods makes it harder to determine the specific food category.
Disclosure of Invention
The present application is directed to solving, at least to some extent, one of the technical problems in the related art.
Therefore, a first objective of the present application is to provide a method for identifying entity categories in an image, so as to strengthen the local features of an entity and accurately determine the entity category in the image from the combination of global and local features, thereby solving the technical problem in the prior art that the global features of images are similar and the specific category of an entity is difficult to determine.
A second object of the present application is to provide an apparatus for identifying a class of an entity in an image.
A third object of the present application is to provide a terminal device.
A fourth object of the present application is to propose a computer readable storage medium.
In order to achieve the above object, an embodiment of the first aspect of the present application provides a method for identifying an entity class in an image, where the method includes: extracting image features of a target image according to a preset algorithm, and determining at least one target area containing a target entity in the target image according to the image features; extracting the classification characteristic of each target area in the at least one target area, and fusing all the classification characteristics to obtain a local classification characteristic; extracting global classification features of the target image, and fusing the local classification features and the global classification features to obtain target classification features; and determining the entity class of the target entity according to the target classification characteristic.
The device for identifying entity categories in images provided by the embodiment of the second aspect of the application comprises: the first determination module is used for extracting image characteristics of a target image according to a preset algorithm and determining at least one target area containing a target entity in the target image according to the image characteristics; the first fusion module is used for extracting the classification characteristic of each target area in the at least one target area and fusing all the classification characteristics to obtain a local classification characteristic; the second fusion module is used for extracting the global classification characteristic of the target image and fusing the local classification characteristic and the global classification characteristic to obtain a target classification characteristic; and the second determining module is used for determining the entity class of the target entity according to the target classification characteristic.
The terminal device provided in the embodiment of the third aspect of the present application includes: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method for identifying entity categories in an image described above.
A computer-readable storage medium is provided in an embodiment of the fourth aspect of the present application, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for identifying the entity class in the image according to the embodiment of the first aspect of the present application.
The technical scheme provided by the application at least comprises the following technical effects:
the method comprises the steps of extracting image features of a target image according to a preset algorithm, determining at least one target area containing a target entity in the target image according to the image features, extracting classification features of each target area in the at least one target area, fusing all the classification features to obtain a local classification feature, further extracting a global classification feature of the target image, fusing the local classification feature and the global classification feature to obtain a target classification feature, and finally determining the entity category of the target entity according to the target classification feature. In this way, the local features of the entity are strengthened, and the entity category can be determined accurately from the combination of global and local features.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic flowchart illustrating an entity class identification method according to an embodiment of the present disclosure;
fig. 2 is a schematic view of an application scenario for identifying entity categories according to an embodiment of the present application;
FIG. 3 is an architectural diagram of a P-net algorithm according to one embodiment of the present application;
FIG. 4 is a schematic diagram of a P-net algorithm training flow according to one embodiment of the present application;
FIG. 5 is an architecture diagram of the R-Net algorithm according to one embodiment of the present application;
FIG. 6 is a flowchart illustrating a target area acquisition method according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a training flow of the R-Net algorithm according to one embodiment of the present application;
FIG. 8 is an architectural diagram of a fine classification model according to one embodiment of the present application;
FIG. 9 is a schematic diagram of the training of a fine classification model according to one embodiment of the present application;
FIG. 10 is a flow diagram of a method for identifying an entity class according to another embodiment of the present application;
FIG. 11 is a schematic diagram of a training flow of a classifier according to one embodiment of the present application;
FIG. 12 is a block diagram of an entity class identification apparatus according to an embodiment of the present application; and
fig. 13 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary and intended to be used for explaining the present application and should not be construed as limiting the present application.
The following describes an entity class identification method and apparatus according to an embodiment of the present application with reference to the drawings.
The entity in the embodiments of the present application may be food, sporting goods, jewelry, and the like, and the category identification referred to in the present application can be understood as identifying the sub-category under such an entity.
Fig. 1 is a flowchart illustrating an entity class identification method according to an embodiment of the present disclosure.
In view of the problem that the entity categories mentioned in the above background are difficult to distinguish, the embodiment of the present application provides an entity category identification method to achieve accurate identification of entity categories, as shown in fig. 1, the method includes the following steps:
Step 101: extracting image features of a target image according to a preset algorithm, and determining at least one target area containing a target entity in the target image according to the image features.

The target image may be actively uploaded by a user, or may be a currently captured image automatically determined from an album.
In this embodiment, the image features of the target image are extracted according to a preset algorithm, and at least one target region containing the target entity is determined in the target image according to the image features; that is, the regions of the target image where an entity exists are selected. For example, as shown in fig. 2, at least one target region may be determined according to the image features, and each target region contains the corresponding entity.
It should be noted that, in different application scenarios, the manner of acquiring at least one target area is different, and the following examples are illustrated:
example one:
In this example, a first one-dimensional feature map of the target image is obtained according to a preset first convolution algorithm, that is, a one-dimensional feature map of the image is extracted so that entities in the image can be determined more efficiently. Further, image features of the first one-dimensional feature map are extracted, at least one candidate region where the target entity is located is determined, and the at least one candidate region is determined as the at least one target region.
As an example, the first convolution algorithm is a P-Net algorithm. Referring to the architecture of the P-Net algorithm shown in fig. 3, in actual execution the target image is downsampled to 12 × 12 and, taking its color channels into account, processed as a 12 × 12 × 3 input.
First, the 12 × 12 × 3 target image is input into the first convolutional layer. After the 10 convolution kernels of 3 × 3 of the first convolutional layer and a 2 × 2 max pooling operation (stride 2), 10 feature maps of 5 × 5 are generated. These feature maps are input into the second convolutional layer, where the 16 convolution kernels of 3 × 3 × 10 generate 16 feature maps of 3 × 3. These are input into the third convolutional layer, where the 32 convolution kernels of 3 × 3 × 16 generate 32 feature maps of 1 × 1. From these 32 feature maps of 1 × 1, 2 feature maps of 1 × 1 used for classification are generated by 2 convolution kernels of 1 × 1, and further 1 × 1 convolution kernels can be used to determine the regression features in this example, so as to optimize the structure of the algorithm.
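For illustration only, the following is a minimal PyTorch sketch of a P-Net-style proposal network consistent with the layer sizes described above. The class name, the PReLU activations, and the choice of 4 bounding-box regression outputs are assumptions, not details taken from this application:

```python
import torch
import torch.nn as nn

class PNetSketch(nn.Module):
    """Illustrative P-Net-style proposal network for 12x12x3 inputs."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 10, kernel_size=3),        # 12x12 -> 10x10, 10 maps
            nn.PReLU(10),
            nn.MaxPool2d(kernel_size=2, stride=2),  # 10x10 -> 5x5
            nn.Conv2d(10, 16, kernel_size=3),       # 5x5 -> 3x3, 16 maps
            nn.PReLU(16),
            nn.Conv2d(16, 32, kernel_size=3),       # 3x3 -> 1x1, 32 maps
            nn.PReLU(32),
        )
        # 1x1 convolutions: 2 maps for entity/background classification,
        # 4 maps for bounding-box regression (4 is an assumed value).
        self.cls_head = nn.Conv2d(32, 2, kernel_size=1)
        self.reg_head = nn.Conv2d(32, 4, kernel_size=1)

    def forward(self, x):
        feat = self.backbone(x)
        return self.cls_head(feat), self.reg_head(feat)

# Usage: scores/boxes for a 12x12 crop; applied fully convolutionally to a larger
# image, each output position corresponds to one candidate window.
scores, boxes = PNetSketch()(torch.randn(1, 3, 12, 12))
```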
Of course, in order to improve the performance of the P-Net algorithm, as shown in fig. 4, the P-Net is trained in advance: after down-sampling a large number of sample images, the P-Net is trained, a loss function is calculated from the difference between the obtained candidate regions and the actually labeled regions, and the P-Net architecture is optimized according to the loss function.
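As a hedged illustration of the training flow in fig. 4, one possible optimization step is sketched below. The particular losses (cross-entropy for the entity/background decision and smooth L1 for the difference between predicted and labeled regions) and the function names are assumptions:

```python
import torch
import torch.nn.functional as F

def pnet_training_step(model, optimizer, images, gt_labels, gt_boxes):
    """One illustrative optimization step: images are pre-downsampled crops,
    gt_labels mark entity/background, gt_boxes are the labeled regions."""
    scores, boxes = model(images)                 # (N,2,1,1), (N,4,1,1)
    scores = scores.flatten(1)                    # (N,2)
    boxes = boxes.flatten(1)                      # (N,4)
    cls_loss = F.cross_entropy(scores, gt_labels)
    pos = gt_labels == 1
    reg_loss = (F.smooth_l1_loss(boxes[pos], gt_boxes[pos])
                if pos.any() else boxes.sum() * 0)  # skip box loss without positives
    loss = cls_loss + reg_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```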
Considering that the P-net algorithm can only roughly determine the area where the corresponding entity is located, in order to further determine the area where the entity is located, the target area can be further refined and determined on the basis of the candidate area.
Specifically, a second one-dimensional feature map of each candidate region in the at least one candidate region is extracted according to a preset second convolution algorithm, image features of the first one-dimensional feature map are further extracted, the at least one candidate region is converged according to these image features, and the converged at least one candidate region is determined as the at least one target region. Convergence here may be understood as deleting a wrong candidate region in which no entity is present, or, for a candidate region in which an entity is present, deleting the sub-region that does not contain the entity.
In some possible examples, when the second convolution algorithm is the R-Net algorithm, referring to the architecture diagram of the R-Net algorithm shown in fig. 5, the candidate region obtained by the P-Net algorithm is processed into a 24 × 24 image and input into the first convolutional layer of the R-Net algorithm. After the 28 convolution kernels of 3 × 3 and a 3 × 3 max pooling operation (stride 2), 28 feature maps of 11 × 11 are generated. These feature maps are input into the second convolutional layer, where the 48 convolution kernels of 3 × 3 × 28 and a 3 × 3 max pooling operation (stride 2) generate 48 feature maps of 4 × 4. These are input into the third convolutional layer of the algorithm, where the 64 convolution kernels of 2 × 2 × 48 generate the 3 × 3 × 64 feature maps, which are then converted by a fully connected layer. This is followed by a fully connected layer for the classification problem of the bounding box and a fully connected layer for the position regression problem of the bounding box, and the target regions among the candidate regions are determined according to the outputs of these fully connected layers.
That is, taking food as the target entity, as shown in fig. 6, the target image is input into the P-Net algorithm to obtain a large number of candidate regions that are likely to contain food, and the R-Net then converges these regions to filter out the unqualified ones. During this filtering, finer and more compact target regions are obtained (since a region produced by the P-Net may be too large or may mistakenly contain no food). The R-Net crops the original target image on the basis of the candidate region (bounding box) information predicted by the P-Net, resizes each candidate region to a fixed scale, and screens the candidate regions with a non-maximum suppression (NMS) algorithm to determine the target regions.
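Non-maximum suppression itself is a standard procedure; the NumPy sketch below shows the usual greedy form and is illustrative rather than the exact implementation used in this application:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """boxes: (N,4) as [x1, y1, x2, y2]; scores: (N,). Returns kept indices."""
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]              # highest-confidence first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of the best remaining box with all the others
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0.0, xx2 - xx1) * np.maximum(0.0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]  # drop heavily overlapping boxes
    return keep
```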
Similarly, referring to fig. 7, the R-Net algorithm also needs to be trained in advance: images with labeled target regions containing entities are downsampled to a fixed size and input into the R-Net, a loss function is determined by comparing the region obtained by the R-Net with the pre-labeled region, and the R-Net architecture is adjusted according to the loss function.
Example two:
In this example, a general image feature of the target entity is preset, and a connected region containing the general image feature is determined as a target region.
Step 102: extracting the classification feature of each target area in the at least one target area, and fusing all the classification features to obtain a local classification feature.
Specifically, the classification features of each target region in at least one target region are extracted, and all the classification features are fused to obtain local classification features, that is, the entity category is determined by mainly considering the local classification features of the entity.
As a possible implementation, a fine classification model is preset, and the fine classification model can extract the classification features of each target area in the at least one target area.
In an embodiment of the present application, the selected target areas are sorted, and the sorting is performed to ensure consistency of the order. For example, suppose there are only three target areas of sizes 4 × 4, 3 × 3, and 2 × 2. The feature vectors extracted by the fine classification model all have the same length, such as 1 × 1028, but the meaning actually represented by each vector differs. Therefore, if, for example, 100 random boxes are used during training and they are sorted in descending order, then when a target image is processed after training is completed the same sorting rule should be followed: 100 random boxes are used and arranged in descending order, and the target areas are then sequentially input into the fine classification network to obtain the candidate classification feature corresponding to each target area. That is, if the vectors corresponding to the n target areas are i ∈ {i1, ..., in}, they are input into the fine classification model to obtain n feature vectors v ∈ {v1, ..., vn}.
In other words, the size of each target area is determined, the at least one target area is sorted according to size, the sorted target areas are sequentially input into a pre-trained fine classification model, and the classification feature of each target area is determined. This ensures the accuracy of the acquired classification features and allows the model to keep improving: after each extraction of classification features, the model is optimized according to a regression mechanism, so that the classification features obtained from subsequent inputs become increasingly accurate.
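A small sketch of this ordering step is given below, with the fine classification model represented by an arbitrary feature extractor; the descending-by-area ordering and the 224 × 224 input size are assumptions used only for illustration:

```python
import torch

def extract_local_features(regions, fine_model, input_size=224):
    """regions: list of cropped region tensors (C,H,W). Sort by area (descending,
    an assumed ordering) and run each through the fine classification model."""
    order = sorted(range(len(regions)),
                   key=lambda k: regions[k].shape[-1] * regions[k].shape[-2],
                   reverse=True)
    feats = []
    for k in order:
        crop = torch.nn.functional.interpolate(
            regions[k].unsqueeze(0), size=(input_size, input_size),
            mode="bilinear", align_corners=False)
        feats.append(fine_model(crop).flatten(1))   # e.g. a 1 x D vector per region
    return feats
```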
In an embodiment of the present application, a confidence level that each target region contains the target entity may also be determined, for example according to the degree of feature matching with the target entity. After the candidate classification feature of each target region is extracted, the corresponding classification feature is determined from the candidate classification feature and the confidence level of that target region; for example, a weight value corresponding to the confidence level is determined, and the classification feature is taken as the product of the weight value and the corresponding candidate classification feature.
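For example, if the weight is simply taken to be the confidence value itself (an assumption; the application only requires that the weight correspond to the confidence), the weighting could look like this:

```python
def weight_by_confidence(candidate_features, confidences):
    """candidate_features: list of 1 x D tensors; confidences: list of floats in [0, 1].
    Each region's classification feature is its candidate feature scaled by a weight
    derived from how confident the detector is that the region contains the entity."""
    return [conf * feat for feat, conf in zip(candidate_features, confidences)]
```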
Step 103: extracting the global classification feature of the target image, and fusing the local classification feature and the global classification feature to obtain the target classification feature.
Specifically, the global feature of the image is also taken into account: the global classification feature of the target image is extracted, and the local classification feature and the global classification feature are fused to obtain the target classification feature. In an embodiment of the present application, the global classification feature and the local classification features may each be processed into a one-dimensional feature and then concatenated into the target classification feature, so as to reduce the amount of computation.
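A minimal sketch of the flatten-and-concatenate fusion described above, assuming the global and local features are already available as tensors:

```python
import torch

def fuse_features(global_feat, local_feats):
    """global_feat: 1 x Dg tensor from the whole image; local_feats: list of 1 x Dl
    tensors from the target regions. Flattening and concatenating keeps the
    computation cheap, as noted above."""
    parts = [global_feat.flatten(1)] + [f.flatten(1) for f in local_feats]
    return torch.cat(parts, dim=1)   # the fused target classification feature
```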
In one embodiment of the present application, referring to fig. 8, a fine classification model based on unsupervised training is assumed. Its architecture is divided into two branches, and it should be noted that the feature extractors in the upper and lower branches share the same network parameters and structure. The network is selected according to task requirements, for example ResNet on a server and MobileNet on mobile phones and other devices, and the model is pre-trained on large data sets such as ImageNet. The upper branch in the figure is a traditional forward network: the whole target image is input into the feature extractor to obtain the global classification feature, which is then sent into the feature fusion network. The guidance network in fig. 8 solves the problem that it is not known in advance which local area of the target image is locally optimal, by selecting candidate boxes of local areas. Referring to fig. 9, the guidance network needs to be trained, and the teacher network plays a supervisory and guiding role: it supervises feature extraction by comparing the information confidence of each target area obtained by the guidance network, and adjusts the network parameters of the guidance network according to the calculated loss. After the target areas are input into the guidance network, i.e., the lower branch in fig. 8, the classification feature of each target area can be obtained (taking 3 target areas as an example in the figure), and the one-dimensional features extracted from each target area are concatenated to form a fused one-dimensional vector. One-dimensional vectors are adopted because they require little computation and are efficient to process, while the obtained result is comparable to that of higher-dimensional feature vectors.
Step 104: determining the entity category of the target entity according to the target classification feature.
Specifically, the target classification feature embodies not only the local features but also the global features, and the entity category of the target entity is determined according to the target classification feature, for example a specific sub-category of food.
Furthermore, after the entity categories are determined, a photo album can be organized according to the categories. When there are multiple entity categories, the multiple categories can be added to the image's file information to serve as a description of the image, so that the user can know what the image contains without viewing it.
In an embodiment of the present application, after the entity category is determined, corresponding beautification parameters are matched according to the entity category and appropriate beautification processing is performed on the different entities in the target image, which greatly improves the beautification experience.
Referring to fig. 10, after the fine classification model obtains the fused target classification feature, the target classification feature may be input into a pre-trained classifier, which may be a non-linear classifier such as a non-linear SVM. For the training of the classifier, as shown in fig. 11, the classifier iterates over the training samples until the value of the optimization function converges to its optimum. A non-linear classifier can effectively expand the classification dimensionality; taking an SVM as an example, the SVM projects the features into a high-dimensional space and then separates them non-linearly. Linear classifiers such as Softmax and fully connected layers only work well for low-dimensional linear classification, so this scheme reduces the shortcomings of Softmax for non-linear classification.
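As an illustration of the classifier stage, an RBF-kernel SVM from scikit-learn could be fitted on the fused target classification features. The file names and hyperparameters below are hypothetical:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: (num_samples, fused_dim) fused target classification features
# y: (num_samples,) entity sub-category labels
X = np.load("fused_features.npy")   # hypothetical file names
y = np.load("labels.npy")

# Standardize, then fit a non-linear (RBF) SVM; fit() iterates over the
# training samples until its optimization converges.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
clf.fit(X, y)
predicted_category = clf.predict(X[:1])
```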
Therefore, the method of the embodiments of the present application solves the problem that the positions of local features are difficult to determine manually, while the extraction of global depth features is retained. By combining global and local features, the pain point of entity classification is addressed: entities of different categories in an image can be identified accurately, and different entities can be beautified in a targeted way when a user takes a picture. Since entities such as food are a major demand in photographing functions, the application prospects are broad.
To sum up, in the method for identifying entity categories in an image according to the embodiments of the present application, image features of a target image are extracted according to a preset algorithm, at least one target region containing a target entity is determined in the target image according to the image features, the classification feature of each target region in the at least one target region is extracted, all the classification features are fused to obtain a local classification feature, the global classification feature of the target image is further extracted, the local classification feature and the global classification feature are fused to obtain a target classification feature, and finally the entity category of the target entity is determined according to the target classification feature. In this way, the local features of the entity are strengthened, and the entity category in the image can be accurately determined from the combination of global and local features.
In order to implement the above embodiments, the present application further provides an apparatus for identifying entity categories in an image.
Fig. 12 is a schematic structural diagram of an apparatus for identifying an entity category in an image according to an embodiment of the present application.
As shown in fig. 12, the apparatus for recognizing the entity category in the image includes: a first determination module 10, a first fusion module 20, a second fusion module 30 and a second determination module 40.
The first determining module 10 is configured to extract image features of a target image according to a preset algorithm, and determine at least one target area containing a target entity in the target image according to the image features;
the first fusion module 20 is configured to extract a classification feature of each target region in at least one target region, and fuse all the classification features to obtain a local classification feature;
the second fusion module 30 is configured to extract global classification features of the target image, and fuse the local classification features and the global classification features to obtain target classification features;
and a second determining module 40, configured to determine an entity class of the target entity according to the target classification characteristic.
Further, in a possible implementation manner of the embodiment of the present application, the first determining module 10 is specifically configured to:
acquiring a first one-dimensional feature map of a target image according to a preset first convolution algorithm;
extracting image features of the first one-dimensional feature map, and determining at least one candidate region where a target entity is located;
at least one candidate region is determined as at least one target region.
In this embodiment, the first determining module 10 is specifically configured to: extracting a second one-dimensional feature map of each candidate region in at least one candidate region according to a preset second convolution algorithm;
extracting image features of the first one-dimensional feature map, and converging at least one candidate region according to the image features of the first one-dimensional feature map;
and determining the converged at least one candidate region as at least one target region.
It should be noted that the foregoing explanation of the embodiment of the method for identifying entity classes in an image is also applicable to the apparatus for identifying entity classes in an image of the embodiment, and is not repeated herein.
To sum up, the device for identifying entity categories in images according to the embodiments of the present application extracts image features of a target image according to a preset algorithm, determines at least one target region containing a target entity in the target image according to the image features, extracts the classification feature of each target region in the at least one target region, fuses all the classification features to obtain a local classification feature, further extracts the global classification feature of the target image, fuses the local classification feature and the global classification feature to obtain a target classification feature, and finally determines the entity category of the target entity according to the target classification feature. In this way, the local features of the entity are strengthened, and the entity category in the image can be accurately determined from the combination of global and local features.
In order to implement the above embodiment, the present application further provides a terminal device.
Fig. 13 is a schematic structural diagram of a terminal device according to an embodiment of the present application. As shown in fig. 13, the terminal device 1300 may include: a memory 1302, a processor 1304, and a computer program 1306 stored in the memory 1302 and executable on the processor 1304. When the processor 1304 executes the computer program 1306, the method for identifying entity categories in an image according to any of the above embodiments of the present application is implemented.
In order to achieve the above embodiments, the present application further proposes a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the method for identifying the entity class in the image according to any of the above embodiments of the present application.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above are not necessarily intended to refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, various embodiments or examples and features of different embodiments or examples described in this specification can be combined and combined by one skilled in the art without contradiction.
Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing steps of a custom logic function or process, and alternate implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.
Claims (10)
1. A method for identifying a category of an entity in an image, the method comprising:
extracting image features of a target image according to a preset algorithm, and determining at least one target area containing a target entity in the target image according to the image features;
extracting the classification characteristic of each target area in the at least one target area, and fusing all the classification characteristics to obtain a local classification characteristic;
extracting global classification features of the target image, and fusing the local classification features and the global classification features to obtain target classification features;
and determining the entity class of the target entity according to the target classification characteristic.
2. The method of claim 1, wherein the extracting image features of the target image according to a preset algorithm, and determining at least one target region containing a target entity in the target image according to the image features comprises:
acquiring a first one-dimensional feature map of the target image according to a preset first convolution algorithm;
extracting image features of the first one-dimensional feature map, and determining at least one candidate region where the target entity is located;
determining the at least one candidate region as the at least one target region.
3. The method of claim 2, wherein said determining the at least one candidate region as the at least one target region comprises:
extracting a second one-dimensional feature map of each candidate region in the at least one candidate region according to a preset second convolution algorithm;
extracting image features of the first one-dimensional feature map, and converging the at least one candidate region according to the image features of the first one-dimensional feature map;
determining the at least one candidate region after convergence as the at least one target region.
4. The method of claim 1, wherein said extracting classification features for each of said at least one target region comprises:
determining a confidence level of each target region containing the target entity;
extracting candidate classification features of each target region;
and determining corresponding classification features according to the candidate classification features and the confidence degrees of each target region.
5. The method of claim 1, wherein said extracting classification features for each of said at least one target region comprises:
determining the size of each target area;
sorting the at least one target area according to the size;
and sequentially inputting the at least one target area into a pre-trained fine classification model according to the sequencing result, and determining the classification characteristic of each target area.
6. An apparatus for identifying a category of an entity in an image, comprising:
the first determination module is used for extracting image characteristics of a target image according to a preset algorithm and determining at least one target area containing a target entity in the target image according to the image characteristics;
the first fusion module is used for extracting the classification characteristic of each target area in the at least one target area and fusing all the classification characteristics to obtain a local classification characteristic;
the second fusion module is used for extracting the global classification characteristic of the target image and fusing the local classification characteristic and the global classification characteristic to obtain a target classification characteristic;
and the second determining module is used for determining the entity class of the target entity according to the target classification characteristic.
7. The apparatus of claim 6, wherein the first determining module is specifically configured to:
acquiring a first one-dimensional feature map of the target image according to a preset first convolution algorithm;
extracting image features of the first one-dimensional feature map, and determining at least one candidate region where the target entity is located;
determining the at least one candidate region as the at least one target region.
8. The apparatus of claim 6, wherein the first determining module is specifically configured to:
extracting a second one-dimensional feature map of each candidate region in the at least one candidate region according to a preset second convolution algorithm;
extracting image features of the first one-dimensional feature map, and converging the at least one candidate region according to the image features of the first one-dimensional feature map;
determining the at least one candidate region after convergence as the at least one target region.
9. A terminal device, comprising: memory, processor and computer program stored on the memory and executable on the processor, the processor implementing a method of identifying a category of an entity in an image according to any one of claims 1 to 5 when executing the computer program.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a method of identifying a category of an entity in an image according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010139339.5A CN111340124A (en) | 2020-03-03 | 2020-03-03 | Method and device for identifying entity category in image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010139339.5A CN111340124A (en) | 2020-03-03 | 2020-03-03 | Method and device for identifying entity category in image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111340124A true CN111340124A (en) | 2020-06-26 |
Family
ID=71182086
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010139339.5A Pending CN111340124A (en) | 2020-03-03 | 2020-03-03 | Method and device for identifying entity category in image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111340124A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112085035A (en) * | 2020-09-14 | 2020-12-15 | 北京字节跳动网络技术有限公司 | Image processing method, image processing device, electronic equipment and computer readable medium |
CN112528897A (en) * | 2020-12-17 | 2021-03-19 | Oppo(重庆)智能科技有限公司 | Portrait age estimation method, Portrait age estimation device, computer equipment and storage medium |
CN112580694A (en) * | 2020-12-01 | 2021-03-30 | 中国船舶重工集团公司第七0九研究所 | Small sample image target identification method and system based on joint attention mechanism |
CN114140613A (en) * | 2021-12-08 | 2022-03-04 | 北京有竹居网络技术有限公司 | Image detection method, image detection device, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170293824A1 (en) * | 2014-12-30 | 2017-10-12 | Baidu Online Network Technology ( Beijing) Co., Ltd. | Method and device for recognizing subject area of image |
CN108961157A (en) * | 2018-06-19 | 2018-12-07 | Oppo广东移动通信有限公司 | Image processing method, picture processing unit and terminal device |
CN109034119A (en) * | 2018-08-27 | 2018-12-18 | 苏州广目信息技术有限公司 | A kind of method for detecting human face of the full convolutional neural networks based on optimization |
CN109784186A (en) * | 2018-12-18 | 2019-05-21 | 深圳云天励飞技术有限公司 | A kind of pedestrian recognition methods, device, electronic equipment and computer readable storage medium again |
CN110555446A (en) * | 2019-08-19 | 2019-12-10 | 北京工业大学 | Remote sensing image scene classification method based on multi-scale depth feature fusion and transfer learning |
CN110705653A (en) * | 2019-10-22 | 2020-01-17 | Oppo广东移动通信有限公司 | Image classification method, image classification device and terminal equipment |
CN110751218A (en) * | 2019-10-22 | 2020-02-04 | Oppo广东移动通信有限公司 | Image classification method, image classification device and terminal equipment |
- 2020-03-03: CN202010139339.5A filed; published as CN111340124A (CN); status: active, pending
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170293824A1 (en) * | 2014-12-30 | 2017-10-12 | Baidu Online Network Technology ( Beijing) Co., Ltd. | Method and device for recognizing subject area of image |
CN108961157A (en) * | 2018-06-19 | 2018-12-07 | Oppo广东移动通信有限公司 | Image processing method, picture processing unit and terminal device |
CN109034119A (en) * | 2018-08-27 | 2018-12-18 | 苏州广目信息技术有限公司 | A kind of method for detecting human face of the full convolutional neural networks based on optimization |
CN109784186A (en) * | 2018-12-18 | 2019-05-21 | 深圳云天励飞技术有限公司 | A kind of pedestrian recognition methods, device, electronic equipment and computer readable storage medium again |
CN110555446A (en) * | 2019-08-19 | 2019-12-10 | 北京工业大学 | Remote sensing image scene classification method based on multi-scale depth feature fusion and transfer learning |
CN110705653A (en) * | 2019-10-22 | 2020-01-17 | Oppo广东移动通信有限公司 | Image classification method, image classification device and terminal equipment |
CN110751218A (en) * | 2019-10-22 | 2020-02-04 | Oppo广东移动通信有限公司 | Image classification method, image classification device and terminal equipment |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112085035A (en) * | 2020-09-14 | 2020-12-15 | 北京字节跳动网络技术有限公司 | Image processing method, image processing device, electronic equipment and computer readable medium |
CN112580694A (en) * | 2020-12-01 | 2021-03-30 | 中国船舶重工集团公司第七0九研究所 | Small sample image target identification method and system based on joint attention mechanism |
CN112580694B (en) * | 2020-12-01 | 2024-04-19 | 中国船舶重工集团公司第七0九研究所 | Small sample image target recognition method and system based on joint attention mechanism |
CN112528897A (en) * | 2020-12-17 | 2021-03-19 | Oppo(重庆)智能科技有限公司 | Portrait age estimation method, Portrait age estimation device, computer equipment and storage medium |
CN112528897B (en) * | 2020-12-17 | 2023-06-13 | Oppo(重庆)智能科技有限公司 | Portrait age estimation method, device, computer equipment and storage medium |
CN114140613A (en) * | 2021-12-08 | 2022-03-04 | 北京有竹居网络技术有限公司 | Image detection method, image detection device, electronic equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109447169B (en) | Image processing method, training method and device of model thereof and electronic system | |
CN109614985B (en) | Target detection method based on densely connected feature pyramid network | |
CN111340124A (en) | Method and device for identifying entity category in image | |
CN111476302B (en) | fast-RCNN target object detection method based on deep reinforcement learning | |
CN112131978B (en) | Video classification method and device, electronic equipment and storage medium | |
CN106682696B (en) | The more example detection networks and its training method refined based on online example classification device | |
CN111274981B (en) | Target detection network construction method and device and target detection method | |
CN114399644A (en) | Target detection method and device based on small sample | |
CN111881849A (en) | Image scene detection method and device, electronic equipment and storage medium | |
CN109753884A (en) | A kind of video behavior recognition methods based on key-frame extraction | |
CN111027347A (en) | Video identification method and device and computer equipment | |
CN113223614A (en) | Chromosome karyotype analysis method, system, terminal device and storage medium | |
CN111432206A (en) | Video definition processing method and device based on artificial intelligence and electronic equipment | |
CN112766170A (en) | Self-adaptive segmentation detection method and device based on cluster unmanned aerial vehicle image | |
CN116311218A (en) | Noise plant point cloud semantic segmentation method and system based on self-attention feature fusion | |
CN115240024A (en) | Method and system for segmenting extraterrestrial pictures by combining self-supervised learning and semi-supervised learning | |
CN112529025A (en) | Data processing method and device | |
CN113139540B (en) | Backboard detection method and equipment | |
CN105102607A (en) | Image processing device, program, storage medium, and image processing method | |
CN115018884B (en) | Visible light infrared visual tracking method based on multi-strategy fusion tree | |
CN111738310A (en) | Material classification method and device, electronic equipment and storage medium | |
CN111144422A (en) | Positioning identification method and system for aircraft component | |
CN115512428A (en) | Human face living body distinguishing method, system, device and storage medium | |
CN115063628A (en) | Fruit picking prediction method based on visual semantic segmentation | |
CN114972965A (en) | Scene recognition method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |