CN111310806B - Classification network, image processing method, device, system and storage medium - Google Patents

Classification network, image processing method, device, system and storage medium

Info

Publication number
CN111310806B
CN111310806B (application CN202010075053.5A)
Authority
CN
China
Prior art keywords
target
classification
network
classified
result
Prior art date
Legal status
Active
Application number
CN202010075053.5A
Other languages
Chinese (zh)
Other versions
CN111310806A (en)
Inventor
李永波
李伯勋
张弛
Current Assignee
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN202010075053.5A
Publication of CN111310806A
Application granted
Publication of CN111310806B
Legal status: Active

Classifications

    • G06F18/24 — Pattern recognition; Analysing; Classification techniques
    • G06N3/045 — Computing arrangements based on biological models; Neural networks; Architecture; Combinations of networks
    • G06N3/08 — Computing arrangements based on biological models; Neural networks; Learning methods
    • G06V10/462 — Image or video recognition or understanding; Extraction of image or video features; Salient features, e.g. scale invariant feature transforms [SIFT]

Abstract

The invention provides a classification network, an image processing method, a device, a system and a storage medium. The classification network comprises a feature extraction sub-network, a target direction classification sub-network and a target category classification sub-network: the feature extraction sub-network is used for extracting features of the input image; the target direction classification sub-network is used for generating a direction classification result of a target to be classified in the input image based on the feature extraction result; and the target category classification sub-network is used for generating a category classification result of the target to be classified based on the feature extraction result and the direction classification result. The classification network of the embodiments of the invention adds, under the general framework of a classification network, a branch structure that classifies the direction of the target to be classified, and bases the category classification of the target on the direction classification result of the added branch, so that classification accuracy can be improved along an additional dimension.

Description

Classification network, image processing method, device, system and storage medium
Technical Field
The present invention relates to the field of object classification technologies, and in particular, to a classification network, an image processing method, an image processing device, an image processing system, and a storage medium.
Background
Neural networks are increasingly common in image processing tasks such as image recognition. In security scenarios, for example, classifying targets such as pedestrians and faces with a classification network is a fundamental problem in scene applications. To improve classification accuracy, existing methods typically supervise the learning of the classification network either by optimizing its feature extraction part to obtain better features or by designing a more suitable loss function.
Disclosure of Invention
The invention provides a classification network and an image processing scheme. The classification network adds, under the general framework of a classification network, a branch structure for classifying the direction of a target to be classified, and bases the category classification of the target on the direction classification result of the added branch, so that classification accuracy can be improved along an additional dimension. The classification network and the image processing scheme according to the present invention are briefly described below; more details are given in the following detailed description with reference to the drawings.
According to an aspect of the present invention, there is provided a classification network comprising a feature extraction sub-network, a target direction classification sub-network and a target category classification sub-network, wherein: the feature extraction sub-network is used for extracting features of the input image; the target direction classification sub-network is used for generating a direction classification result of a target to be classified in the input image based on the feature extraction result; and the target category classification sub-network is used for generating a category classification result of the target to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the present invention, the loss function employed by the classification network in training is the sum of the loss function of the target class classification sub-network and the loss function of the target direction classification sub-network.
In one embodiment of the present invention, the feature extraction sub-network includes a convolution layer and a pooling layer, the target direction classification sub-network includes a convolution layer, a pooling layer, and a fully connected layer, and the target class classification sub-network includes a convolution layer, a pooling layer, and a fully connected layer.
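To make this structure concrete, the following is a minimal PyTorch sketch of a network with this three-sub-network layout. It is an illustrative assumption, not the patented implementation: all layer sizes and channel counts are invented, the binary class/direction setup follows the two-classification example used later, and feeding the direction result into the class branch by concatenation is only one plausible reading of "based on the feature extraction result and the direction classification result".

import torch
import torch.nn as nn

class ClassificationNetwork(nn.Module):
    """Sketch: feature extraction sub-network plus direction and class heads."""

    def __init__(self, num_classes: int = 2, num_directions: int = 2):
        super().__init__()
        # Feature extraction sub-network: convolution + pooling layers.
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Target direction classification sub-network: conv + pool + FC.
        self.dir_head = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.dir_fc = nn.Linear(64, num_directions)
        # Target class classification sub-network: conv + pool + FC; its FC
        # stage also consumes the direction classification result.
        self.cls_head = nn.Sequential(
            nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.cls_fc = nn.Linear(64 + num_directions, num_classes)

    def forward(self, x: torch.Tensor):
        fmap = self.features(x)
        dir_logits = self.dir_fc(self.dir_head(fmap))
        # Condition the class branch on the direction classification result by
        # concatenating the direction probabilities with the class features.
        cls_in = torch.cat([self.cls_head(fmap), dir_logits.softmax(dim=1)], dim=1)
        cls_logits = self.cls_fc(cls_in)
        return cls_logits, dir_logits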
According to another aspect of the present invention, there is provided an image processing method including: acquiring an input image and extracting features of the input image; generating a direction classification result of the target to be classified in the input image based on the result of the feature extraction; and generating a category classification result of the object to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the invention, the image processing method is performed by a trained classification network comprising: a feature extraction sub-network for extracting features of the input image; a target direction classification sub-network for generating a direction classification result of a target to be classified in the input image based on the result of the feature extraction; and a target category classification sub-network for generating a category classification result of the target to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the present invention, the loss function employed by the classification network in training is the sum of the loss function of the target class classification sub-network and the loss function of the target direction classification sub-network.
In one embodiment of the present invention, the feature extraction sub-network includes a convolution layer and a pooling layer, the target direction classification sub-network includes a convolution layer, a pooling layer, and a fully connected layer, and the target class classification sub-network includes a convolution layer, a pooling layer, and a fully connected layer.
According to still another aspect of the present invention, there is provided an image processing apparatus including: the feature extraction module is used for acquiring an input image and extracting features of the input image; the direction classification module is used for generating a direction classification result of the target to be classified in the input image based on the result of the feature extraction; and the category classification module is used for generating a category classification result of the object to be classified based on the feature extraction result and the direction classification result.
According to a further aspect of the present invention there is provided an image processing system comprising a processor and a storage device having stored thereon a computer program which when executed by the processor performs the image processing method of any of the above.
According to still another aspect of the present invention, there is provided a storage medium having stored thereon a computer program which, when run, performs the image processing method of any one of the above.
According to a further aspect of the present invention, there is provided a computer program which, when executed by a computer or processor, executes the image processing method of any one of the above and implements the modules in the image processing apparatus of any one of the above.
According to the classification network provided by the embodiments of the invention, a branch structure for classifying the direction of the target to be classified is added under the general framework of a classification network, and the category classification of the target is based on the direction classification result of the added branch, so that classification accuracy can be improved along an additional dimension; the image processing method, device, and system based on this classification network can accordingly improve the accuracy of target classification in images.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following more particular description of embodiments of the present invention, as illustrated in the accompanying drawings. The accompanying drawings, which are incorporated in and constitute a part of this specification, provide a further understanding of embodiments of the invention and, together with the description, serve to explain the invention without limiting it. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 shows a schematic block diagram of a classification network according to an embodiment of the invention.
Fig. 2 shows a schematic diagram of a training process of a classification network according to an embodiment of the invention.
Fig. 3 shows a schematic flow chart of an image processing method according to an embodiment of the invention.
Fig. 4 shows a schematic block diagram of an image processing apparatus according to an embodiment of the present invention.
Fig. 5 shows a schematic block diagram of an image processing system according to an embodiment of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, exemplary embodiments according to the present invention will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some, not all, embodiments of the present invention, and it should be understood that the present invention is not limited by the example embodiments described herein. All other embodiments that a person skilled in the art would obtain, without inventive effort, based on the embodiments of the invention described in this application shall fall within the scope of the invention.
Fig. 1 shows a schematic block diagram of a classification network 100 according to an embodiment of the invention. As shown in fig. 1, the classification network 100 includes a feature extraction sub-network 110, a target direction classification sub-network 120, and a target class classification sub-network 130. The feature extraction sub-network 110 is used for extracting features of the input image. The target direction classification sub-network 120 is configured to generate a direction classification result of a target to be classified in the input image based on the feature extraction result output by the feature extraction sub-network 110. The target class classification sub-network 130 is configured to generate a category classification result of the target to be classified based on the feature extraction result output by the feature extraction sub-network 110 and the direction classification result output by the target direction classification sub-network 120.
In an embodiment of the present invention, the target to be classified in the input image may be an object such as a pedestrian, a face, a car, or another object. The ultimate purpose of the classification network 100 is to classify the category of the target to be classified in the input image, i.e., to determine the probability that the target is a pedestrian, a face, a car, or another object. For example, the classification network 100 may be a binary classification network, in which case it may be used to determine whether the target to be classified in the input image is a target object of a certain type, such as a face or a pedestrian. As another example, the classification network 100 may be a multi-class network, in which case it may be used to determine which type of target object each target to be classified in the input image belongs to.
In an embodiment of the present invention, the classification network 100 includes a target direction classification sub-network 120, which is a sub-network capable of classifying the direction of a target to be classified in an input image. The direction of the target to be classified may refer to the positional relationship of key parts of the target within the input image. For example, when the target is a pedestrian, the directions may include a first direction and a second direction, where the first direction may mean that, in the input image, the pedestrian's head is above and the feet are below (i.e., the person is upright), and the second direction may mean that the feet are above and the head is below (i.e., the person is upside down). As another example, when the target is a pedestrian, the directions may further include, in addition to the first and second directions, a third direction, in which the pedestrian's head is to the left and the feet to the right, and a fourth direction, in which the feet are to the left and the head to the right. As yet another example, when the target is a face, the directions may include a first direction, in which the eyes are above the mouth in the input image (i.e., the face is upright), and a second direction, in which the mouth is above the eyes (i.e., the face is upside down). In other examples, the direction of the target to be classified may be defined otherwise; the examples here are not exhaustive.
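As a compact summary of these examples, the mapping below spells out one possible label convention for the pedestrian directions; the class indices are illustrative assumptions, not values fixed by the text.

# Hypothetical direction-label convention for the pedestrian examples above.
PEDESTRIAN_DIRECTIONS = {
    0: "head above, feet below (upright)",      # first direction
    1: "feet above, head below (upside down)",  # second direction
    2: "head left, feet right",                 # third direction
    3: "feet left, head right",                 # fourth direction
}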
In general, the target direction classification sub-network 120 may determine the direction of the target to be classified in the input image based on the features extracted by the feature extraction sub-network 110; this direction characteristic helps determine the category of the target. On this basis, the target class classification sub-network 130 may output a category classification result based on both the features extracted by the feature extraction sub-network 110 and the direction classification result output by the target direction classification sub-network 120. Compared with classifying based on the feature extraction result alone, the classification network 100 according to the embodiment of the invention additionally conditions the category classification on the direction classification result of the target, so that a more accurate classification result can be obtained, and the network can be used in various visual tasks to improve performance.
The classification network 100 of the present invention is further described below in conjunction with fig. 2. Fig. 2 shows a schematic diagram of the training process of the classification network 100 according to an embodiment of the invention; binary classification is taken as an example. As shown in fig. 2, the sample image I may be an existing sample, on the basis of which multi-direction samples can be constructed. Since targets to be classified in real security scenes generally have an obvious orientation, the sample image I can, for example, be turned upside down to generate the sample image I_flip. The constructed samples in different directions can then be labeled to obtain sample labels, and the labeled samples of different directions are input to the classification network 100 to train it.
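As an illustration of this sample construction, the sketch below produces the upside-down copies I_flip and the corresponding direction labels y_flip; the tensor layout and the convention that 0 marks the original direction and 1 the flipped one are assumptions for illustration.

import torch

def build_direction_samples(images: torch.Tensor, class_labels: torch.Tensor):
    """images: (N, C, H, W); class_labels: (N,). Returns the augmented batch."""
    flipped = torch.flip(images, dims=[2])  # turn each sample upside down (I_flip)
    all_images = torch.cat([images, flipped], dim=0)
    # The category label is unchanged by flipping (a flipped pedestrian is
    # still a pedestrian); only the direction label y_flip differs.
    all_class_labels = torch.cat([class_labels, class_labels], dim=0)
    direction_labels = torch.cat([
        torch.zeros(len(images), dtype=torch.long),  # y_flip = 0: original direction
        torch.ones(len(images), dtype=torch.long),   # y_flip = 1: upside down
    ], dim=0)
    return all_images, all_class_labels, direction_labels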
As shown in fig. 2, the sample image I and the sample image I_flip are input to the feature extraction sub-network 110, which may include network structures such as convolutional layers and pooling layers. The feature extraction sub-network 110 extracts feature vectors of the sample images, and its outputs are fed to both the target class classification sub-network 130 and the target direction classification sub-network 120.
The target class classification sub-network 130 may include convolutional layers, pooling layers, fully connected layers, and the like. Based on the feature vector output by the feature extraction sub-network 110, it outputs a category classification result for the sample image, e.g., classification probabilities P_cls and 1 - P_cls, where P_cls may represent the probability that the target to be classified in the sample image is a pedestrian, and 1 - P_cls the probability that it is not. The loss function of the target class classification sub-network 130 may be denoted L_cls and may be a common classification loss such as cross-entropy. L_cls can be computed from the classification probability P_cls and the sample label y, i.e., L_cls = Loss(P_cls, y).
The target direction classification sub-network 120 may likewise include convolutional layers, pooling layers, fully connected layers, and the like. Based on the feature vector output by the feature extraction sub-network 110, it outputs a direction classification result for the sample image, e.g., classification probabilities P_flip and 1 - P_flip, where P_flip may represent the probability that the direction of the target to be classified is the first direction (e.g., the forward direction), and 1 - P_flip the probability that it is not (e.g., the reverse direction). The loss function of the target direction classification sub-network 120 may be denoted L_flip and can be computed from the classification probability P_flip and the sample label y_flip, i.e., L_flip = Loss(P_flip, y_flip).
During the training process, the feature extraction sub-network 110, the target class classification sub-network 130, and the target direction classification sub-network 120 may be jointly trained under the supervision of a loss function L of the classification network 100. Illustratively, L may be constructed as the sum of the loss function L_cls of the target class classification sub-network 130 and the loss function L_flip of the target direction classification sub-network 120, i.e., L = L_cls + L_flip. In other examples, the loss function L of the classification network 100 may also be constructed as some other function of L_cls and L_flip.
It should be noted that embodiments of the present invention do not limit the specific forms of the loss function L_cls of the target class classification sub-network 130 and the loss function L_flip of the target direction classification sub-network 120; L_cls and L_flip may each adopt any suitable loss function, whether existing or arising in the future.
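A joint training step under L = L_cls + L_flip might then look as follows. This sketch assumes the ClassificationNetwork and build_direction_samples sketches given above and uses cross-entropy for both Loss(·, ·) terms, which is only one of the suitable choices the text leaves open.

import torch.nn.functional as F

def train_step(model, optimizer, images, y_cls, y_flip):
    cls_logits, dir_logits = model(images)
    loss_cls = F.cross_entropy(cls_logits, y_cls)    # L_cls = Loss(P_cls, y)
    loss_flip = F.cross_entropy(dir_logits, y_flip)  # L_flip = Loss(P_flip, y_flip)
    loss = loss_cls + loss_flip                      # L = L_cls + L_flip
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()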
The output of the classification network 100 is P_cls; that is, the output of the target direction classification sub-network 120 is not exposed as an output of the overall network, but is only used to assist the target class classification sub-network 130 in classifying the category of the target to be classified. During network inference, the output of the classification network 100 is identical to that of a classical classification network (i.e., one comprising only a feature extraction sub-network and a target class classification sub-network), namely the discrimination scores of the targets to be classified. The overall network inference process therefore does not increase forward inference complexity.
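In the same sketch, inference would expose only the class-branch scores, matching the description above; the direction logits are computed internally to condition the class head but are not returned to the caller.

import torch

@torch.no_grad()
def infer(model, image: torch.Tensor) -> torch.Tensor:
    model.eval()
    cls_logits, _ = model(image.unsqueeze(0))  # direction output is discarded
    return cls_logits.softmax(dim=1)[0]        # discrimination scores (P_cls, 1 - P_cls)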
Based on the above description, the classification network according to the embodiment of the invention adds, under the general framework of a classification network, a branch structure for classifying the direction of the target to be classified, and bases the category classification of the target on the direction classification result of the added branch, so that classification accuracy can be improved along an additional dimension without increasing the inference complexity of the network.
An image processing method according to another aspect of the present invention, which may be performed by a classification network according to an embodiment of the present invention, is described below in connection with fig. 3. Fig. 3 shows a schematic flow chart of an image processing method 300 according to an embodiment of the invention. As shown in fig. 3, the image processing method 300 may include the steps of:
in step S310, an input image is acquired, and feature extraction is performed on the input image.
In step S320, a direction classification result of the object to be classified in the input image is generated based on the result of the feature extraction.
In step S330, a classification result of the object to be classified is generated based on the result of the feature extraction and the direction classification result.
In an embodiment of the present invention, the target to be classified in the input image may be an object such as a pedestrian, a face, a car, or another object. The ultimate purpose of the image processing method 300 is to classify the category of the target to be classified in the input image, i.e., to determine the probability that the target is a pedestrian, a face, a car, or another object. Taking binary classification as an example, the image processing method 300 may determine whether the target to be classified in the input image is a target object of a certain type, such as a face or a pedestrian.
In the embodiment of the invention, after feature extraction is performed on the input image, the direction characteristic of the target to be classified is determined from the feature extraction result, and the category classification of the target is then determined based on the feature extraction result together with the direction characteristic. The direction of the target to be classified may refer to the positional relationship of key parts of the target within the input image. For example, when the target is a pedestrian, the directions may include a first direction, in which the pedestrian's head is above and the feet below (i.e., the person is upright), and a second direction, in which the feet are above and the head below (i.e., the person is upside down). As another example, the directions for a pedestrian may further include a third direction, in which the head is to the left and the feet to the right, and a fourth direction, in which the feet are to the left and the head to the right. As yet another example, when the target is a face, the directions may include a first direction, in which the eyes are above the mouth (i.e., the face is upright), and a second direction, in which the mouth is above the eyes (i.e., the face is upside down). In other examples, the direction of the target to be classified may be defined otherwise; the examples here are not exhaustive.
Because the direction characteristic of the target to be classified helps determine its category, the image processing method according to the embodiment of the invention can obtain a more accurate category classification result than classification relying on the feature extraction result alone, and can be used in various visual tasks to improve classification performance.
In an embodiment of the present invention, the image processing method 300 may be performed by a trained classification network, which may be the classification network described above in connection with figs. 1 and 2. Those skilled in the art can understand the structure and operation of the classification network performing the image processing method 300 from the foregoing description; for brevity, they are not described in detail here.
The image processing method according to the embodiment of the present invention has been exemplarily described above. Illustratively, the method may be implemented in a device, apparatus, or system having a memory and a processor. In addition, the image processing method can be conveniently deployed on mobile equipment such as smartphones, tablet computers, and personal computers. Alternatively, the image processing method according to the embodiment of the invention may be deployed at a server (or the cloud), or distributed across the server side (or cloud side) and personal terminals.
An image processing apparatus provided in still another aspect of the present invention is described below with reference to fig. 4. Fig. 4 shows a schematic block diagram of an image processing apparatus 400 according to an embodiment of the invention.
As shown in fig. 4, the image processing apparatus 400 according to an embodiment of the present invention includes a feature extraction module 410, a direction classification module 420, and a category classification module 430. The feature extraction module 410 is configured to obtain an input image, and perform feature extraction on the input image. The direction classification module 420 is configured to generate a direction classification result of the object to be classified in the input image based on the result of the feature extraction. The category classification module 430 is configured to generate a category classification result of the object to be classified based on the feature extraction result and the direction classification result. The respective modules may perform the respective steps/functions of the image processing method described above in connection with fig. 3, respectively.
In an embodiment of the present invention, the target to be classified in the input image may be an object such as a pedestrian, a face, a car, or another object. The ultimate purpose of the image processing apparatus 400 is to classify the category of the target to be classified in the input image, i.e., to determine the probability that the target is a pedestrian, a face, a car, or another object. Taking binary classification as an example, the image processing apparatus 400 may determine whether the target to be classified in the input image is a target object of a certain type, for example, whether it is a face, a pedestrian, or the like.
In an embodiment of the present invention, the feature extraction module 410 performs feature extraction on the input image; after the feature extraction result is obtained, the direction classification module 420 determines the direction characteristic of the target to be classified from the feature extraction result, and the category classification module then determines the category classification of the target based on the feature extraction result together with the direction characteristic. The direction of the target to be classified may refer to the positional relationship of key parts of the target within the input image. For example, when the target is a pedestrian, the directions may include a first direction, in which the pedestrian's head is above and the feet below (i.e., the person is upright), and a second direction, in which the feet are above and the head below (i.e., the person is upside down). As another example, the directions for a pedestrian may further include a third direction, in which the head is to the left and the feet to the right, and a fourth direction, in which the feet are to the left and the head to the right. As yet another example, when the target is a face, the directions may include a first direction, in which the eyes are above the mouth (i.e., the face is upright), and a second direction, in which the mouth is above the eyes (i.e., the face is upside down). In other examples, the direction of the target to be classified may be defined otherwise; the examples here are not exhaustive.
Because the direction characteristic of the target to be classified helps determine its category, the image processing apparatus according to the embodiment of the invention can obtain a more accurate category classification result than classification relying on the feature extraction result alone, and can be used in various visual tasks to improve classification performance.
In an embodiment of the present invention, the modules of the image processing apparatus 400 may be implemented by a trained classification network, which may be the classification network described above in connection with figs. 1 and 2. For example, the feature extraction module 410 may be implemented by the feature extraction sub-network 110 of the classification network 100, the direction classification module 420 by the target direction classification sub-network 120, and the category classification module 430 by the target class classification sub-network 130. Those skilled in the art can understand the structure and operation of the modules of the image processing apparatus 400 from the foregoing description; for brevity, they are not described in detail here.
An image processing system according to still another aspect of the present invention is described below with reference to fig. 5. Fig. 5 shows a schematic block diagram of an image processing system 500 according to an embodiment of the invention. The image processing system 500 includes a storage device 510 and a processor 520.
The storage device 510 stores program code for implementing the respective steps of the image processing method according to an embodiment of the present invention. The processor 520 is adapted to run the program code stored in the storage device 510 to perform the respective steps of the image processing method according to an embodiment of the invention and to implement the respective modules of the image processing apparatus according to an embodiment of the invention.
In one embodiment, the program code, when executed by the processor 520, causes the image processing system 500 to perform the steps of: acquiring an input image and extracting features of the input image; generating a direction classification result of the target to be classified in the input image based on the result of the feature extraction; and generating a category classification result of the object to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the invention, the program code, when executed by the processor 520, causes the image processing system 500 to perform the steps of a trained classification network comprising: a feature extraction sub-network for extracting features of the input image; a target direction classification sub-network for generating a direction classification result of a target to be classified in the input image based on the result of the feature extraction; and a target category classification sub-network for generating a category classification result of the target to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the present invention, the loss function employed by the classification network in training is the sum of the loss function of the target class classification sub-network and the loss function of the target direction classification sub-network.
Furthermore, according to an embodiment of the present invention, there is also provided a storage medium on which program instructions are stored; when executed by a computer or processor, the program instructions perform the respective steps of the image processing method of the embodiment of the present invention and implement the respective modules of the image processing apparatus according to the embodiment of the present invention. The storage medium may include, for example, a memory card of a smartphone, a memory component of a tablet computer, a hard disk of a personal computer, read-only memory (ROM), erasable programmable read-only memory (EPROM), portable compact disc read-only memory (CD-ROM), USB memory, or any combination of the foregoing storage media.
In an embodiment, the computer program instructions, when executed by a computer, may implement the respective functional modules of the image processing apparatus according to the embodiments of the present invention and/or may perform the image processing method according to the embodiments of the present invention.
In one embodiment, the computer program instructions, when executed by a computer or processor, cause the computer or processor to perform the steps of: acquiring an input image and extracting features of the input image; generating a direction classification result of the target to be classified in the input image based on the result of the feature extraction; and generating a category classification result of the object to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the invention, the computer program instructions, when executed by a computer or processor, cause the computer or processor to perform the steps of a trained classification network comprising: a feature extraction sub-network for extracting features of the input image; a target direction classification sub-network for generating a direction classification result of a target to be classified in the input image based on the result of the feature extraction; and a target category classification sub-network for generating a category classification result of the target to be classified based on the feature extraction result and the direction classification result.
In one embodiment of the present invention, the loss function employed by the classification network in training is the sum of the loss function of the target class classification sub-network and the loss function of the target direction classification sub-network.
Furthermore, according to an embodiment of the present invention, there is also provided a computer program, which may be stored on a cloud or local storage medium. When executed by a computer or processor, it carries out the respective steps of the image processing method of an embodiment of the invention and implements the respective modules of the image processing apparatus according to an embodiment of the invention.
Based on the above description, the classification network according to the embodiment of the invention adds, under the general framework of a classification network, a branch structure for classifying the direction of the target to be classified, and bases the category classification of the target on the direction classification result of the added branch, so that classification accuracy can be improved along an additional dimension, and the image processing method, device, and system based on this classification network can improve the accuracy of target classification in images.
Although the illustrative embodiments have been described herein with reference to the accompanying drawings, it is to be understood that the above illustrative embodiments are merely illustrative and are not intended to limit the scope of the present invention thereto. Various changes and modifications may be made therein by one of ordinary skill in the art without departing from the scope and spirit of the invention. All such changes and modifications are intended to be included within the scope of the present invention as set forth in the appended claims.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, e.g., the division of the elements is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple elements or components may be combined or integrated into another device, or some features may be omitted or not performed.
In the description provided herein, numerous specific details are set forth. However, it is understood that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that, in order to streamline the disclosure and aid in understanding one or more of the various inventive aspects, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof in the description of exemplary embodiments of the invention. This method of disclosure, however, should not be construed as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
It will be understood by those skilled in the art that all of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or units of any method or apparatus so disclosed, may be combined in any combination, except combinations where the features are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings), may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features but not others included in other embodiments, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
Various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that some or all of the functions of some of the modules in the image processing apparatus according to embodiments of the present invention may be implemented in practice using a microprocessor or digital signal processor (DSP). The present invention can also be implemented as an apparatus program (e.g., a computer program and a computer program product) for performing a portion or all of the methods described herein. Such a program embodying the present invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an internet website, provided on a carrier signal, or provided in any other form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not denote any order; these words may be interpreted as names.
The foregoing description is merely illustrative of specific embodiments of the present invention, and the scope of the present invention is not limited thereto; any person skilled in the art can readily conceive of variations or substitutions within the technical scope disclosed herein, which should all be covered by the protection scope of the present invention. The protection scope of the invention is subject to the protection scope of the claims.

Claims (10)

1. A classification network comprising a feature extraction sub-network, a target direction classification sub-network, and a target class classification sub-network, wherein:
the feature extraction sub-network is used for extracting features of the input image;
the target direction classification sub-network is used for generating a direction classification result of a target to be classified in the input image based on the feature extraction result; the direction classification result comprises a relative positional relationship between at least two key parts of the target to be classified;
the target category classification sub-network is used for generating a category classification result of the target to be classified based on the feature extraction result and the direction classification result;
for a target category corresponding to a preset direction category, when the direction classification result of the target to be classified is the preset direction category of that target category and the feature extraction result of the target to be classified is the feature extraction result of that target category, the target category classification sub-network classifies the target to be classified as that target category.
2. The classification network of claim 1, wherein the classification network is trained to employ a loss function that is a sum of a loss function of the target class classification sub-network and a loss function of the target direction classification sub-network.
3. The classification network according to claim 1 or 2, wherein the feature extraction sub-network comprises a convolutional layer and a pooling layer, the target direction classification sub-network comprises a convolutional layer, a pooling layer, and a fully-connected layer, and the target class classification sub-network comprises a convolutional layer, a pooling layer, and a fully-connected layer.
4. An image processing method, characterized in that the image processing method comprises:
acquiring an input image and extracting features of the input image;
generating a direction classification result of the target to be classified in the input image based on the result of the feature extraction; and
generating a category classification result of the target to be classified based on the feature extraction result and the direction classification result;
the direction classification result comprises a relative positional relationship between at least two key parts of the target to be classified;
for a target category corresponding to a preset direction category, when the direction classification result of the target to be classified is the preset direction category of that target category and the feature extraction result of the target to be classified is the feature extraction result of that target category, the target category classification sub-network classifies the target to be classified as that target category.
5. The image processing method of claim 4, wherein the image processing method is performed by a trained classification network, the classification network comprising:
a feature extraction sub-network for extracting features of the input image;
a target direction classification sub-network for generating a direction classification result of a target to be classified in the input image based on the result of the feature extraction; and
a target category classification sub-network for generating a category classification result of the target to be classified based on the feature extraction result and the direction classification result.
6. The image processing method according to claim 5, wherein the loss function adopted by the classification network at the time of training is a sum of the loss function of the target class classification sub-network and the loss function of the target direction classification sub-network.
7. The image processing method according to claim 5 or 6, wherein the feature extraction sub-network includes a convolution layer and a pooling layer, the target direction classification sub-network includes a convolution layer, a pooling layer, and a full-connection layer, and the target class classification sub-network includes a convolution layer, a pooling layer, and a full-connection layer.
8. An image processing apparatus, characterized in that the image processing apparatus comprises:
the feature extraction module is used for acquiring an input image and extracting features of the input image;
the direction classification module is used for generating a direction classification result of the target to be classified in the input image based on the result of the feature extraction; and
the category classification module is used for generating a category classification result of the object to be classified based on the feature extraction result and the direction classification result;
the direction classification result comprises the relative position relation between at least two key parts of the target to be classified;
for one target category, the target category corresponds to a preset direction category, and when the direction classification result of the target to be classified is the preset direction category of the one target category and the feature extraction result of the target to be classified is the feature extraction result of the target category, the target category classification sub-network classifies the target to be classified as the target category.
9. An image processing system, characterized in that it comprises a processor and a storage means, on which a computer program is stored, which computer program, when being executed by the processor, performs the image processing method according to any of claims 4-7.
10. A storage medium having stored thereon a computer program which, when run, performs the image processing method according to any of claims 4-7.
CN202010075053.5A 2020-01-22 2020-01-22 Classification network, image processing method, device, system and storage medium Active CN111310806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010075053.5A CN111310806B (en) 2020-01-22 2020-01-22 Classification network, image processing method, device, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010075053.5A CN111310806B (en) 2020-01-22 2020-01-22 Classification network, image processing method, device, system and storage medium

Publications (2)

Publication Number Publication Date
CN111310806A CN111310806A (en) 2020-06-19
CN111310806B true CN111310806B (en) 2024-03-15

Family

ID=71145294

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010075053.5A Active CN111310806B (en) 2020-01-22 2020-01-22 Classification network, image processing method, device, system and storage medium

Country Status (1)

Country Link
CN (1) CN111310806B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102289686A (en) * 2011-08-09 2011-12-21 北京航空航天大学 Method for identifying classes of moving targets based on transfer learning
CN103390167A (en) * 2013-07-18 2013-11-13 奇瑞汽车股份有限公司 Multi-characteristic layered traffic sign identification method
CN103854016A (en) * 2014-03-27 2014-06-11 北京大学深圳研究生院 Human body behavior classification and identification method and system based on directional common occurrence characteristics
CN104253944A (en) * 2014-09-11 2014-12-31 陈飞 Sight connection-based voice command issuing device and method
CN105469400A (en) * 2015-11-23 2016-04-06 广州视源电子科技股份有限公司 Rapid identification and marking method of electronic component polarity direction and system thereof
CN105590116A (en) * 2015-12-18 2016-05-18 华南理工大学 Bird image identification method based on head part alignment
CN110210535A (en) * 2019-05-21 2019-09-06 北京市商汤科技开发有限公司 Neural network training method and device and image processing method and device
CN110263868A (en) * 2019-06-24 2019-09-20 北京航空航天大学 Image classification network based on SuperPoint feature

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6563873B2 (en) * 2016-08-02 2019-08-21 トヨタ自動車株式会社 Orientation discrimination device and orientation discrimination method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102289686A (en) * 2011-08-09 2011-12-21 北京航空航天大学 Method for identifying classes of moving targets based on transfer learning
CN103390167A (en) * 2013-07-18 2013-11-13 奇瑞汽车股份有限公司 Multi-characteristic layered traffic sign identification method
CN103854016A (en) * 2014-03-27 2014-06-11 北京大学深圳研究生院 Human body behavior classification and identification method and system based on directional common occurrence characteristics
CN104253944A (en) * 2014-09-11 2014-12-31 陈飞 Sight connection-based voice command issuing device and method
CN105469400A (en) * 2015-11-23 2016-04-06 广州视源电子科技股份有限公司 Rapid identification and marking method of electronic component polarity direction and system thereof
CN105590116A (en) * 2015-12-18 2016-05-18 华南理工大学 Bird image identification method based on head part alignment
CN110210535A (en) * 2019-05-21 2019-09-06 北京市商汤科技开发有限公司 Neural network training method and device and image processing method and device
CN110263868A (en) * 2019-06-24 2019-09-20 北京航空航天大学 Image classification network based on SuperPoint feature

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
刘晓华; 张弛. Research on a moving-object recognition method based on a BP neural network classifier. 硅谷 (Silicon Valley), 2019, (24): 54-55. *
潘宗序; 安全智; 张冰尘. Research progress on radar image target recognition based on deep learning. 中国科学: 信息科学 (Scientia Sinica Informationis), 2014, (12): 98-111. *

Also Published As

Publication number Publication date
CN111310806A (en) 2020-06-19

Similar Documents

Publication Publication Date Title
US10275688B2 (en) Object detection with neural network
CN109145766B (en) Model training method and device, recognition method, electronic device and storage medium
US10275719B2 (en) Hyper-parameter selection for deep convolutional networks
CN105426356B (en) A kind of target information recognition methods and device
CN112434721A (en) Image classification method, system, storage medium and terminal based on small sample learning
CN109117879B (en) Image classification method, device and system
CN105574550A (en) Vehicle identification method and device
CN112784670A (en) Object detection based on pixel differences
CN109685065B (en) Layout analysis method and system for automatically classifying test paper contents
CN109034086B (en) Vehicle weight identification method, device and system
CN110555428B (en) Pedestrian re-identification method, device, server and storage medium
CN112633159B (en) Human-object interaction relation identification method, model training method and corresponding device
CN111931859B (en) Multi-label image recognition method and device
CN113963147B (en) Key information extraction method and system based on semantic segmentation
KR20170109304A (en) Method for parallel learning of cascade classifier by object recognition
CN114581710A (en) Image recognition method, device, equipment, readable storage medium and program product
KR101545809B1 (en) Method and apparatus for detection license plate
He et al. Aggregating local context for accurate scene text detection
CN112418256A (en) Classification, model training and information searching method, system and equipment
CN112597997A (en) Region-of-interest determining method, image content identifying method and device
CN110490876B (en) Image segmentation method based on lightweight neural network
CN111310806B (en) Classification network, image processing method, device, system and storage medium
CN113255766B (en) Image classification method, device, equipment and storage medium
CN115080745A (en) Multi-scene text classification method, device, equipment and medium based on artificial intelligence
CN115049872A (en) Image point cloud feature fusion classification method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant