CN111310806B - Classification network, image processing method, device, system and storage medium - Google Patents

Classification network, image processing method, device, system and storage medium

Info

Publication number
CN111310806B
CN111310806B (application CN202010075053.5A)
Authority
CN
China
Prior art keywords
target
classification
network
classified
result
Prior art date
Legal status
Active
Application number
CN202010075053.5A
Other languages
Chinese (zh)
Other versions
CN111310806A (en)
Inventor
李永波
李伯勋
张弛
Current Assignee
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN202010075053.5A
Publication of CN111310806A
Application granted
Publication of CN111310806B
Legal status: Active

Classifications

    • G06F18/24 — Pattern recognition; Analysing; Classification techniques
    • G06N3/045 — Computing arrangements based on biological models; Neural networks; Architecture; Combinations of networks
    • G06N3/08 — Computing arrangements based on biological models; Neural networks; Learning methods
    • G06V10/462 — Image or video recognition or understanding; Extraction of image or video features; Salient features, e.g. scale invariant feature transforms [SIFT]

Abstract

The invention provides a classification network, an image processing method, a device, a system and a storage medium. The classification network comprises a feature extraction sub-network, a target direction classification sub-network and a target category classification sub-network: the feature extraction sub-network is used for extracting features of the input image; the target direction classification sub-network is used for generating a direction classification result of a target to be classified in the input image based on the feature extraction result; and the target category classification sub-network is used for generating a category classification result of the target to be classified based on the feature extraction result and the direction classification result. The classification network of the embodiments of the invention adds, under the general framework of a classification network, a branch structure that classifies the direction of the target to be classified, and bases the category classification of the target on the direction classification result of the added branch, so that classification accuracy can be improved along an additional dimension.

Description

Classification network, image processing method, device, system and storage medium
Technical Field
The present invention relates to the field of object classification technologies, and in particular, to a classification network, an image processing method, an image processing device, an image processing system, and a storage medium.
Background
Neural networks are increasingly common in image processing tasks such as image recognition. In security scenarios, for example, classifying targets such as pedestrians and faces with a classification network is a fundamental problem in scene applications. To improve classification accuracy, existing methods typically supervise the learning of the classification network either by optimizing its feature extraction part to obtain better features or by designing a more suitable loss function.
Disclosure of Invention
The invention provides a classification network and an image processing scheme. The classification network adds, under the general framework of a classification network, a branch structure for classifying the direction of a target to be classified, and bases the category classification of the target on the direction classification result of the added branch, so that classification accuracy can be improved along an additional dimension. The classification network and the image processing scheme according to the present invention are briefly described below; more details are given in the following detailed description with reference to the drawings.
According to an aspect of the present invention, there is provided a classification network comprising a feature extraction sub-network, a target direction classification sub-network and a target category classification sub-network, wherein: the feature extraction sub-network is used for extracting features of the input image; the target direction classification sub-network is used for generating a direction classification result of a target to be classified in the input image based on the feature extraction result; and the target category classification sub-network is used for generating a category classification result of the target to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the present invention, the loss function employed by the classification network in training is the sum of the loss function of the target class classification sub-network and the loss function of the target direction classification sub-network.
In one embodiment of the present invention, the feature extraction sub-network includes a convolution layer and a pooling layer, the target direction classification sub-network includes a convolution layer, a pooling layer, and a fully connected layer, and the target class classification sub-network includes a convolution layer, a pooling layer, and a fully connected layer.
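To make this structure concrete, the following is a minimal PyTorch sketch of a network with this three-sub-network layout. It is an illustrative assumption, not the patented implementation: all layer sizes and channel counts are invented, the binary class/direction setup follows the two-classification example used later, and feeding the direction result into the class branch by concatenation is only one plausible reading of "based on the feature extraction result and the direction classification result".

import torch
import torch.nn as nn

class ClassificationNetwork(nn.Module):
    """Sketch: feature extraction sub-network plus direction and class heads."""

    def __init__(self, num_classes: int = 2, num_directions: int = 2):
        super().__init__()
        # Feature extraction sub-network: convolution + pooling layers.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Target direction classification sub-network: conv + pool + FC.
        self.dir_head = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.dir_fc = nn.Linear(64, num_directions)
        # Target class classification sub-network: conv + pool + FC; its FC
        # stage also consumes the direction classification result.
        self.cls_head = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.cls_fc = nn.Linear(64 + num_directions, num_classes)

    def forward(self, x: torch.Tensor):
        fmap = self.features(x)
        dir_logits = self.dir_fc(self.dir_head(fmap))
        # Condition the class branch on the direction classification result by
        # concatenating the direction probabilities with the class features.
        cls_in = torch.cat([self.cls_head(fmap), dir_logits.softmax(dim=1)], dim=1)
        cls_logits = self.cls_fc(cls_in)
        return cls_logits, dir_logits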
According to another aspect of the present invention, there is provided an image processing method including: acquiring an input image and extracting features of the input image; generating a direction classification result of the target to be classified in the input image based on the result of the feature extraction; and generating a category classification result of the object to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the invention, the image processing method is performed by a trained classification network comprising: a feature extraction sub-network for extracting features of the input image; a target direction classification sub-network for generating a direction classification result of a target to be classified in the input image based on the result of the feature extraction; and a target category classification sub-network for generating a category classification result of the target to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the present invention, the loss function employed by the classification network in training is the sum of the loss function of the target class classification sub-network and the loss function of the target direction classification sub-network.
In one embodiment of the present invention, the feature extraction sub-network includes a convolution layer and a pooling layer, the target direction classification sub-network includes a convolution layer, a pooling layer, and a fully connected layer, and the target class classification sub-network includes a convolution layer, a pooling layer, and a fully connected layer.
According to still another aspect of the present invention, there is provided an image processing apparatus including: the feature extraction module is used for acquiring an input image and extracting features of the input image; the direction classification module is used for generating a direction classification result of the target to be classified in the input image based on the result of the feature extraction; and the category classification module is used for generating a category classification result of the object to be classified based on the feature extraction result and the direction classification result.
According to a further aspect of the present invention there is provided an image processing system comprising a processor and a storage device having stored thereon a computer program which when executed by the processor performs the image processing method of any of the above.
According to still another aspect of the present invention, there is provided a storage medium having stored thereon a computer program which, when run, performs the image processing method of any one of the above.
According to a further aspect of the present invention, there is provided a computer program which, when executed by a computer or processor, executes the image processing method of any one of the above and implements the modules in the image processing apparatus of any one of the above.
According to the classification network provided by the embodiments of the invention, a branch structure for classifying the direction of the target to be classified is added under the general framework of a classification network, and the category classification of the target is based on the direction classification result of the added branch, so that classification accuracy can be improved along an additional dimension; the image processing method, device, and system based on this classification network can accordingly improve the accuracy of target classification in images.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following more particular description of embodiments of the present invention, as illustrated in the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this specification, provide a further understanding of embodiments of the invention and, together with the description, serve to explain the invention without limiting it. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 shows a schematic block diagram of a classification network according to an embodiment of the invention.
Fig. 2 shows a schematic diagram of a training process of a classification network according to an embodiment of the invention.
Fig. 3 shows a schematic flow chart of an image processing method according to an embodiment of the invention.
Fig. 4 shows a schematic block diagram of an image processing apparatus according to an embodiment of the present invention.
Fig. 5 shows a schematic block diagram of an image processing system according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some, not all, embodiments of the present invention, and it should be understood that the present invention is not limited by the example embodiments described herein. All other embodiments that a person skilled in the art would obtain, without inventive effort, based on the embodiments of the invention described in this application shall fall within the scope of the invention.
Fig. 1 shows a schematic block diagram of a classification network 100 according to an embodiment of the invention. As shown in fig. 1, the classification network 100 includes a feature extraction sub-network 110, a target direction classification sub-network 120, and a target class classification sub-network 130. The feature extraction sub-network 110 is used for extracting features of the input image. The target direction classification sub-network 120 is configured to generate a direction classification result of a target to be classified in the input image based on the feature extraction result output by the feature extraction sub-network 110. The target class classification sub-network 130 is configured to generate a category classification result of the target to be classified based on the feature extraction result output by the feature extraction sub-network 110 and the direction classification result output by the target direction classification sub-network 120.
In an embodiment of the present invention, the target to be classified in the input image may be an object such as a pedestrian, a face, a car, or another object. The ultimate purpose of the classification network 100 is to classify the category of the target to be classified in the input image, i.e., to determine the probability that the target is a pedestrian, a face, a car, or another object. For example, the classification network 100 may be a binary classification network, in which case it may be used to determine whether the target to be classified in the input image is a target object of a certain type, such as a face or a pedestrian. As another example, the classification network 100 may be a multi-class network, in which case it may be used to determine which type of target object each target to be classified in the input image belongs to.
In an embodiment of the present invention, the classification network 100 includes a target direction classification sub-network 120, which is a sub-network capable of classifying the direction of a target to be classified in an input image. The direction of the target to be classified may refer to the positional relationship of key parts of the target within the input image. For example, when the target is a pedestrian, the directions may include a first direction and a second direction, where the first direction may mean that, in the input image, the pedestrian's head is above and the feet are below (i.e., the person is upright), and the second direction may mean that the feet are above and the head is below (i.e., the person is upside down). As another example, when the target is a pedestrian, the directions may further include, in addition to the first and second directions, a third direction, in which the pedestrian's head is to the left and the feet to the right, and a fourth direction, in which the feet are to the left and the head to the right. As yet another example, when the target is a face, the directions may include a first direction, in which the eyes are above the mouth in the input image (i.e., the face is upright), and a second direction, in which the mouth is above the eyes (i.e., the face is upside down). In other examples, the direction of the target to be classified may be defined otherwise; the examples here are not exhaustive.
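As a compact summary of these examples, the mapping below spells out one possible label convention for the pedestrian directions; the class indices are illustrative assumptions, not values fixed by the text.

# Hypothetical direction-label convention for the pedestrian examples above.
PEDESTRIAN_DIRECTIONS = {
    0: "head above, feet below (upright)",      # first direction
    1: "feet above, head below (upside down)",  # second direction
    2: "head left, feet right",                 # third direction
    3: "feet left, head right",                 # fourth direction
}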
In general, the target direction classification sub-network 120 may determine the direction of the target to be classified in the input image based on the features extracted by the feature extraction sub-network 110; this direction characteristic helps determine the category of the target. On this basis, the target class classification sub-network 130 may output a category classification result based on both the features extracted by the feature extraction sub-network 110 and the direction classification result output by the target direction classification sub-network 120. Compared with classifying based on the feature extraction result alone, the classification network 100 according to the embodiment of the invention additionally conditions the category classification on the direction classification result of the target, so that a more accurate classification result can be obtained, and the network can be used in various visual tasks to improve performance.
The classification network 100 of the present invention is further described below in conjunction with fig. 2. Fig. 2 shows a schematic diagram of the training process of the classification network 100 according to an embodiment of the invention; binary classification is taken as an example. As shown in fig. 2, the sample image I may be an existing sample, on the basis of which multi-direction samples can be constructed. Since targets to be classified in real security scenes generally have an obvious orientation, the sample image I can, for example, be turned upside down to generate the sample image I_flip. The constructed samples in different directions can then be labeled to obtain sample labels, and the labeled samples of different directions are input to the classification network 100 to train it.
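As an illustration of this sample construction, the sketch below produces the upside-down copies I_flip and the corresponding direction labels y_flip; the tensor layout and the convention that 0 marks the original direction and 1 the flipped one are assumptions for illustration.

import torch

def build_direction_samples(images: torch.Tensor, class_labels: torch.Tensor):
    """images: (N, C, H, W); class_labels: (N,). Returns the augmented batch."""
    flipped = torch.flip(images, dims=[2])  # turn each sample upside down (I_flip)
    all_images = torch.cat([images, flipped], dim=0)
    # The category label is unchanged by flipping (a flipped pedestrian is
    # still a pedestrian); only the direction label y_flip differs.
    all_class_labels = torch.cat([class_labels, class_labels], dim=0)
    direction_labels = torch.cat([
        torch.zeros(len(images), dtype=torch.long),  # y_flip = 0: original direction
        torch.ones(len(images), dtype=torch.long),   # y_flip = 1: upside down
    ], dim=0)
    return all_images, all_class_labels, direction_labels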
As shown in fig. 2, the sample image I and the sample image I_flip are input to the feature extraction sub-network 110, which may include network structures such as convolutional layers and pooling layers. The feature extraction sub-network 110 extracts feature vectors of the sample images, and its outputs are fed to both the target class classification sub-network 130 and the target direction classification sub-network 120.
The target class classification sub-network 130 may include convolutional layers, pooling layers, fully connected layers, and the like. Based on the feature vector output by the feature extraction sub-network 110, it outputs a category classification result for the sample image, e.g., classification probabilities P_cls and 1 - P_cls, where P_cls may represent the probability that the target to be classified in the sample image is a pedestrian, and 1 - P_cls the probability that it is not. The loss function of the target class classification sub-network 130 may be denoted L_cls and may be a common classification loss such as cross-entropy. L_cls can be computed from the classification probability P_cls and the sample label y, i.e., L_cls = Loss(P_cls, y).
The target direction classification sub-network 120 may likewise include convolutional layers, pooling layers, fully connected layers, and the like. Based on the feature vector output by the feature extraction sub-network 110, it outputs a direction classification result for the sample image, e.g., classification probabilities P_flip and 1 - P_flip, where P_flip may represent the probability that the direction of the target to be classified is the first direction (e.g., the forward direction), and 1 - P_flip the probability that it is not (e.g., the reverse direction). The loss function of the target direction classification sub-network 120 may be denoted L_flip and can be computed from the classification probability P_flip and the sample label y_flip, i.e., L_flip = Loss(P_flip, y_flip).
During the training process, the feature extraction sub-network 110, the target class classification sub-network 130, and the target direction classification sub-network 120 may be jointly trained under the supervision of a loss function L of the classification network 100. Illustratively, L may be constructed as the sum of the loss function L_cls of the target class classification sub-network 130 and the loss function L_flip of the target direction classification sub-network 120, i.e., L = L_cls + L_flip. In other examples, the loss function L of the classification network 100 may also be constructed as some other function of L_cls and L_flip.
It should be noted that embodiments of the present invention do not limit the specific forms of the loss function L_cls of the target class classification sub-network 130 and the loss function L_flip of the target direction classification sub-network 120; L_cls and L_flip may each adopt any suitable loss function, whether existing or arising in the future.
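A joint training step under L = L_cls + L_flip might then look as follows. This sketch assumes the ClassificationNetwork and build_direction_samples sketches given above and uses cross-entropy for both Loss(·, ·) terms, which is only one of the suitable choices the text leaves open.

import torch.nn.functional as F

def train_step(model, optimizer, images, y_cls, y_flip):
    cls_logits, dir_logits = model(images)
    loss_cls = F.cross_entropy(cls_logits, y_cls)    # L_cls = Loss(P_cls, y)
    loss_flip = F.cross_entropy(dir_logits, y_flip)  # L_flip = Loss(P_flip, y_flip)
    loss = loss_cls + loss_flip                      # L = L_cls + L_flip
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()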
The output of the classification network 100 is P_cls; that is, the output of the target direction classification sub-network 120 is not exposed as an output of the overall network, but is only used to assist the target class classification sub-network 130 in classifying the category of the target to be classified. During network inference, the output of the classification network 100 is identical to that of a classical classification network (i.e., one comprising only a feature extraction sub-network and a target class classification sub-network), namely the discrimination scores of the targets to be classified. The overall network inference process therefore does not increase forward inference complexity.
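In the same sketch, inference would expose only the class-branch scores, matching the description above; the direction logits are computed internally to condition the class head but are not returned to the caller.

import torch

@torch.no_grad()
def infer(model, image: torch.Tensor) -> torch.Tensor:
    model.eval()
    cls_logits, _ = model(image.unsqueeze(0))  # direction output is discarded
    return cls_logits.softmax(dim=1)[0]        # discrimination scores (P_cls, 1 - P_cls)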
Based on the above description, the classification network according to the embodiment of the invention adds, under the general framework of a classification network, a branch structure for classifying the direction of the target to be classified, and bases the category classification of the target on the direction classification result of the added branch, so that classification accuracy can be improved along an additional dimension without increasing the inference complexity of the network.
An image processing method according to another aspect of the present invention, which may be performed by a classification network according to an embodiment of the present invention, is described below in connection with fig. 3. Fig. 3 shows a schematic flow chart of an image processing method 300 according to an embodiment of the invention. As shown in fig. 3, the image processing method 300 may include the steps of:
in step S310, an input image is acquired, and feature extraction is performed on the input image.
In step S320, a direction classification result of the object to be classified in the input image is generated based on the result of the feature extraction.
In step S330, a classification result of the object to be classified is generated based on the result of the feature extraction and the direction classification result.
In an embodiment of the present invention, the target to be classified in the input image may be an object such as a pedestrian, a face, a car, or another object. The ultimate purpose of the image processing method 300 is to classify the category of the target to be classified in the input image, i.e., to determine the probability that the target is a pedestrian, a face, a car, or another object. Taking binary classification as an example, the image processing method 300 may determine whether the target to be classified in the input image is a target object of a certain type, such as a face or a pedestrian.
In the embodiment of the invention, after feature extraction is performed on the input image, the direction characteristic of the target to be classified is determined from the feature extraction result, and the category classification of the target is then determined based on the feature extraction result together with the direction characteristic. The direction of the target to be classified may refer to the positional relationship of key parts of the target within the input image. For example, when the target is a pedestrian, the directions may include a first direction, in which the pedestrian's head is above and the feet below (i.e., the person is upright), and a second direction, in which the feet are above and the head below (i.e., the person is upside down). As another example, the directions for a pedestrian may further include a third direction, in which the head is to the left and the feet to the right, and a fourth direction, in which the feet are to the left and the head to the right. As yet another example, when the target is a face, the directions may include a first direction, in which the eyes are above the mouth (i.e., the face is upright), and a second direction, in which the mouth is above the eyes (i.e., the face is upside down). In other examples, the direction of the target to be classified may be defined otherwise; the examples here are not exhaustive.
Because the direction characteristic of the target to be classified helps determine its category, the image processing method according to the embodiment of the invention can obtain a more accurate category classification result than classification relying on the feature extraction result alone, and can be used in various visual tasks to improve classification performance.
In an embodiment of the present invention, the image processing method 300 may be performed by a trained classification network, which may be the classification network described above in connection with figs. 1 and 2. Those skilled in the art can understand the structure and operation of the classification network performing the image processing method 300 from the foregoing description; for brevity, they are not described in detail here.
The image processing method according to the embodiment of the present invention has been exemplarily described above. Illustratively, the method may be implemented in a device, apparatus, or system having a memory and a processor. In addition, the image processing method can be conveniently deployed on mobile equipment such as smartphones, tablet computers, and personal computers. Alternatively, the image processing method according to the embodiment of the invention may be deployed at a server (or the cloud), or distributed across the server side (or cloud side) and personal terminals.
An image processing apparatus provided in still another aspect of the present invention is described below with reference to fig. 4. Fig. 4 shows a schematic block diagram of an image processing apparatus 400 according to an embodiment of the invention.
As shown in fig. 4, the image processing apparatus 400 according to an embodiment of the present invention includes a feature extraction module 410, a direction classification module 420, and a category classification module 430. The feature extraction module 410 is configured to obtain an input image, and perform feature extraction on the input image. The direction classification module 420 is configured to generate a direction classification result of the object to be classified in the input image based on the result of the feature extraction. The category classification module 430 is configured to generate a category classification result of the object to be classified based on the feature extraction result and the direction classification result. The respective modules may perform the respective steps/functions of the image processing method described above in connection with fig. 3, respectively.
In an embodiment of the present invention, the target to be classified in the input image may be an object such as a pedestrian, a face, a car, or another object. The ultimate purpose of the image processing apparatus 400 is to classify the category of the target to be classified in the input image, i.e., to determine the probability that the target is a pedestrian, a face, a car, or another object. Taking binary classification as an example, the image processing apparatus 400 may determine whether the target to be classified in the input image is a target object of a certain type, for example, whether it is a face, a pedestrian, or the like.
In an embodiment of the present invention, the feature extraction module 410 performs feature extraction on the input image; after the feature extraction result is obtained, the direction classification module 420 determines the direction characteristic of the target to be classified from the feature extraction result, and the category classification module then determines the category classification of the target based on the feature extraction result together with the direction characteristic. The direction of the target to be classified may refer to the positional relationship of key parts of the target within the input image. For example, when the target is a pedestrian, the directions may include a first direction, in which the pedestrian's head is above and the feet below (i.e., the person is upright), and a second direction, in which the feet are above and the head below (i.e., the person is upside down). As another example, the directions for a pedestrian may further include a third direction, in which the head is to the left and the feet to the right, and a fourth direction, in which the feet are to the left and the head to the right. As yet another example, when the target is a face, the directions may include a first direction, in which the eyes are above the mouth (i.e., the face is upright), and a second direction, in which the mouth is above the eyes (i.e., the face is upside down). In other examples, the direction of the target to be classified may be defined otherwise; the examples here are not exhaustive.
Because the direction characteristic of the target to be classified helps determine its category, the image processing apparatus according to the embodiment of the invention can obtain a more accurate category classification result than classification relying on the feature extraction result alone, and can be used in various visual tasks to improve classification performance.
In an embodiment of the present invention, the modules of the image processing apparatus 400 may be implemented by a trained classification network, which may be the classification network described above in connection with figs. 1 and 2. For example, the feature extraction module 410 may be implemented by the feature extraction sub-network 110 of the classification network 100, the direction classification module 420 by the target direction classification sub-network 120, and the category classification module 430 by the target class classification sub-network 130. Those skilled in the art can understand the structure and operation of the modules of the image processing apparatus 400 from the foregoing description; for brevity, they are not described in detail here.
An image processing system according to still another aspect of the present invention is described below with reference to fig. 5. Fig. 5 shows a schematic block diagram of an image processing system 500 according to an embodiment of the invention. The image processing system 500 includes a storage device 510 and a processor 520.
The storage device 510 stores program code for implementing the respective steps of the image processing method according to an embodiment of the present invention. The processor 520 is adapted to run the program code stored in the storage device 510 to perform the respective steps of the image processing method according to an embodiment of the invention and to implement the respective modules of the image processing apparatus according to an embodiment of the invention.
In one embodiment, the program code, when executed by the processor 520, causes the image processing system 500 to perform the steps of: acquiring an input image and extracting features of the input image; generating a direction classification result of the target to be classified in the input image based on the result of the feature extraction; and generating a category classification result of the object to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the invention, the program code, when executed by the processor 520, causes the image processing system 500 to perform the steps of a trained classification network comprising: a feature extraction sub-network for extracting features of the input image; a target direction classification sub-network for generating a direction classification result of a target to be classified in the input image based on the result of the feature extraction; and a target category classification sub-network for generating a category classification result of the target to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the present invention, the loss function employed by the classification network in training is the sum of the loss function of the target class classification sub-network and the loss function of the target direction classification sub-network.
Furthermore, according to an embodiment of the present invention, there is also provided a storage medium on which program instructions are stored; when executed by a computer or processor, the program instructions perform the respective steps of the image processing method of the embodiment of the present invention and implement the respective modules of the image processing apparatus according to the embodiment of the present invention. The storage medium may include, for example, a memory card of a smartphone, a memory component of a tablet computer, a hard disk of a personal computer, read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, or any combination of the foregoing storage media.
In an embodiment, the computer program instructions, when executed by a computer, may implement the respective functional modules of the image processing apparatus according to the embodiments of the present invention and/or may perform the image processing method according to the embodiments of the present invention.
In one embodiment, the computer program instructions, when executed by a computer or processor, cause the computer or processor to perform the steps of: acquiring an input image and extracting features of the input image; generating a direction classification result of the target to be classified in the input image based on the result of the feature extraction; and generating a category classification result of the object to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the invention, the computer program instructions, when executed by a computer or processor, cause the computer or processor to perform the steps of a trained classification network comprising: a feature extraction sub-network for extracting features of the input image; a target direction classification sub-network for generating a direction classification result of a target to be classified in the input image based on the result of the feature extraction; and a target category classification sub-network for generating a category classification result of the target to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the present invention, the loss function employed by the classification network in training is the sum of the loss function of the target class classification sub-network and the loss function of the target direction classification sub-network.
Furthermore, according to an embodiment of the present invention, there is also provided a computer program, which may be stored on a cloud or local storage medium. When executed by a computer or processor, it carries out the respective steps of the image processing method of an embodiment of the invention and implements the respective modules of the image processing apparatus according to an embodiment of the invention.
Based on the above description, the classification network according to the embodiment of the invention adds, under the general framework of a classification network, a branch structure for classifying the direction of the target to be classified, and bases the category classification of the target on the direction classification result of the added branch, so that classification accuracy can be improved along an additional dimension, and the image processing method, device, and system based on this classification network can improve the accuracy of target classification in images.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the above illustrative embodiments are merely illustrative and are not intended to limit the scope of the present invention thereto. Various changes and modifications may be made therein by one of ordinary skill in the art without departing from the scope and spirit of the invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, e.g., the division of the elements is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple elements or components may be combined or integrated into another device, or some features may be omitted or not performed.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that, in order to streamline the disclosure and aid in understanding one or more of the various inventive aspects, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the description of exemplary embodiments of the invention. This method of disclosure, however, should not be construed as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be combined in any combination, except combinations where the features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
Various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some of the modules in the image processing apparatus according to embodiments of the present invention may be implemented in practice using a microprocessor or digital signal processor (DSP). The present invention can also be implemented as an apparatus program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order; these words may be interpreted as names.
The foregoing description is merely illustrative of specific embodiments of the present invention, and the scope of the present invention is not limited thereto; any person skilled in the art can readily conceive of variations or substitutions within the technical scope disclosed herein, which should all be covered by the protection scope of the present invention. The protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. A classification network comprising a feature extraction sub-network, a target direction classification sub-network, and a target class classification sub-network, wherein:
the feature extraction sub-network is used for extracting features of the input image;
the target direction classification sub-network is used for generating a direction classification result of a target to be classified in the input image based on the feature extraction result; the direction classification result comprises a relative positional relationship between at least two key parts of the target to be classified;
the target category classification sub-network is used for generating a category classification result of the target to be classified based on the feature extraction result and the direction classification result;
for a target category corresponding to a preset direction category, when the direction classification result of the target to be classified is the preset direction category of that target category and the feature extraction result of the target to be classified is the feature extraction result of that target category, the target category classification sub-network classifies the target to be classified as that target category.
2. The classification network of claim 1, wherein the classification network is trained to employ a loss function that is a sum of a loss function of the target class classification sub-network and a loss function of the target direction classification sub-network.
3. The classification network according to claim 1 or 2, wherein the feature extraction sub-network comprises a convolutional layer and a pooling layer, the target direction classification sub-network comprises a convolutional layer, a pooling layer, and a fully-connected layer, and the target class classification sub-network comprises a convolutional layer, a pooling layer, and a fully-connected layer.
4. An image processing method, characterized in that the image processing method comprises:
acquiring an input image and extracting features of the input image;
generating a direction classification result of the target to be classified in the input image based on the result of the feature extraction; and
generating a category classification result of the target to be classified based on the feature extraction result and the direction classification result;
the direction classification result comprises a relative positional relationship between at least two key parts of the target to be classified;
for a target category corresponding to a preset direction category, when the direction classification result of the target to be classified is the preset direction category of that target category and the feature extraction result of the target to be classified is the feature extraction result of that target category, the target category classification sub-network classifies the target to be classified as that target category.
5. The image processing method of claim 4, wherein the image processing method is performed by a trained classification network, the classification network comprising:
a feature extraction sub-network for extracting features of the input image;
a target direction classification sub-network for generating a direction classification result of a target to be classified in the input image based on the result of the feature extraction; and
a target category classification sub-network for generating a category classification result of the target to be classified based on the feature extraction result and the direction classification result.
6. The image processing method according to claim 5, wherein the loss function adopted by the classification network at the time of training is a sum of the loss function of the target class classification sub-network and the loss function of the target direction classification sub-network.
7. The image processing method according to claim 5 or 6, wherein the feature extraction sub-network includes a convolution layer and a pooling layer, the target direction classification sub-network includes a convolution layer, a pooling layer, and a full-connection layer, and the target class classification sub-network includes a convolution layer, a pooling layer, and a full-connection layer.
8. An image processing apparatus, characterized in that the image processing apparatus comprises:
the feature extraction module is used for acquiring an input image and extracting features of the input image;
the direction classification module is used for generating a direction classification result of the target to be classified in the input image based on the result of the feature extraction; and
the category classification module is used for generating a category classification result of the object to be classified based on the feature extraction result and the direction classification result;
the direction classification result comprises the relative position relation between at least two key parts of the target to be classified;
for one target category, the target category corresponds to a preset direction category, and when the direction classification result of the target to be classified is the preset direction category of the one target category and the feature extraction result of the target to be classified is the feature extraction result of the target category, the target category classification sub-network classifies the target to be classified as the target category.
9. An image processing system, characterized in that it comprises a processor and a storage means, on which a computer program is stored, which computer program, when being executed by the processor, performs the image processing method according to any of claims 4-7.
10. A storage medium having stored thereon a computer program which, when run, performs the image processing method according to any of claims 4-7.
CN202010075053.5A 2020-01-22 2020-01-22 Classification network, image processing method, device, system and storage medium Active CN111310806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010075053.5A CN111310806B (en) 2020-01-22 2020-01-22 Classification network, image processing method, device, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010075053.5A CN111310806B (en) 2020-01-22 2020-01-22 Classification network, image processing method, device, system and storage medium

Publications (2)

Publication Number Publication Date
CN111310806A CN111310806A (en) 2020-06-19
CN111310806B true CN111310806B (en) 2024-03-15

Family

ID=71145294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010075053.5A Active CN111310806B (en) 2020-01-22 2020-01-22 Classification network, image processing method, device, system and storage medium

Country Status (1)

Country Link
CN (1) CN111310806B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102289686A (en) * 2011-08-09 2011-12-21 北京航空航天大学 Method for identifying classes of moving targets based on transfer learning
CN103390167A (en) * 2013-07-18 2013-11-13 奇瑞汽车股份有限公司 Multi-characteristic layered traffic sign identification method
CN103854016A (en) * 2014-03-27 2014-06-11 北京大学深圳研究生院 Human body behavior classification and identification method and system based on directional common occurrence characteristics
CN104253944A (en) * 2014-09-11 2014-12-31 陈飞 Sight connection-based voice command issuing device and method
CN105469400A (en) * 2015-11-23 2016-04-06 广州视源电子科技股份有限公司 Rapid identification and marking method of electronic component polarity direction and system thereof
CN105590116A (en) * 2015-12-18 2016-05-18 华南理工大学 Bird image identification method based on head part alignment
CN110210535A (en) * 2019-05-21 2019-09-06 北京市商汤科技开发有限公司 Neural network training method and device and image processing method and device
CN110263868A (en) * 2019-06-24 2019-09-20 北京航空航天大学 Image classification network based on SuperPoint feature

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6563873B2 (en) * 2016-08-02 2019-08-21 トヨタ自動車株式会社 Orientation discrimination device and orientation discrimination method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102289686A (en) * 2011-08-09 2011-12-21 北京航空航天大学 Method for identifying classes of moving targets based on transfer learning
CN103390167A (en) * 2013-07-18 2013-11-13 奇瑞汽车股份有限公司 Multi-characteristic layered traffic sign identification method
CN103854016A (en) * 2014-03-27 2014-06-11 北京大学深圳研究生院 Human body behavior classification and identification method and system based on directional common occurrence characteristics
CN104253944A (en) * 2014-09-11 2014-12-31 陈飞 Sight connection-based voice command issuing device and method
CN105469400A (en) * 2015-11-23 2016-04-06 广州视源电子科技股份有限公司 Rapid identification and marking method of electronic component polarity direction and system thereof
CN105590116A (en) * 2015-12-18 2016-05-18 华南理工大学 Bird image identification method based on head part alignment
CN110210535A (en) * 2019-05-21 2019-09-06 北京市商汤科技开发有限公司 Neural network training method and device and image processing method and device
CN110263868A (en) * 2019-06-24 2019-09-20 北京航空航天大学 Image classification network based on SuperPoint feature

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘晓华; 张弛. Research on a moving-object recognition method based on a BP neural network classifier. 硅谷 (Silicon Valley), 2019, (24): 54-55. *
潘宗序; 安全智; 张冰尘. Research progress on radar image target recognition based on deep learning. 中国科学: 信息科学 (Scientia Sinica Informationis), 2014, (12): 98-111. *

Also Published As

Publication number Publication date
CN111310806A (en) 2020-06-19

Similar Documents

Publication Publication Date Title
US10275688B2 (en) Object detection with neural network
CN109145766B (en) Model training method and device, recognition method, electronic device and storage medium
US10275719B2 (en) Hyper-parameter selection for deep convolutional networks
CN105426356B (en) A kind of target information recognition methods and device
CN112434721A (en) Image classification method, system, storage medium and terminal based on small sample learning
CN109117879B (en) Image classification method, device and system
CN105574550A (en) Vehicle identification method and device
CN112784670A (en) Object detection based on pixel differences
CN109685065B (en) Layout analysis method and system for automatically classifying test paper contents
CN109034086B (en) Vehicle weight identification method, device and system
CN110555428B (en) Pedestrian re-identification method, device, server and storage medium
CN112633159B (en) Human-object interaction relation identification method, model training method and corresponding device
CN111931859B (en) Multi-label image recognition method and device
CN113963147B (en) Key information extraction method and system based on semantic segmentation
KR20170109304A (en) Method for parallel learning of cascade classifier by object recognition
CN114581710A (en) Image recognition method, device, equipment, readable storage medium and program product
KR101545809B1 (en) Method and apparatus for detection license plate
He et al. Aggregating local context for accurate scene text detection
CN112418256A (en) Classification, model training and information searching method, system and equipment
CN112597997A (en) Region-of-interest determining method, image content identifying method and device
CN110490876B (en) Image segmentation method based on lightweight neural network
CN111310806B (en) Classification network, image processing method, device, system and storage medium
CN113255766B (en) Image classification method, device, equipment and storage medium
CN115080745A (en) Multi-scene text classification method, device, equipment and medium based on artificial intelligence
CN115049872A (en) Image point cloud feature fusion classification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant