CN106407991B - Image attribute recognition method and system and related network training method and system - Google Patents

Image attribute recognition method and system and related network training method and system Download PDF

Info

Publication number
CN106407991B
CN106407991B CN201610825966.8A
Authority
CN
China
Prior art keywords
attribute
training
neural network
image
recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610825966.8A
Other languages
Chinese (zh)
Other versions
CN106407991A (en)
Inventor
汤晓鸥
石武
吕健勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN201610825966.8A priority Critical patent/CN106407991B/en
Publication of CN106407991A publication Critical patent/CN106407991A/en
Application granted Critical
Publication of CN106407991B publication Critical patent/CN106407991B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G06N3/084: Backpropagation, e.g. using gradient descent
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/56: Extraction of image or video features relating to colour

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image attribute identification method and system and a related network training method and system. The image attribute identification method includes: extracting a feature map from an image, the feature map containing attributes of the image; identifying a plurality of candidate attributes from the feature map; determining a confidence for each of the candidate attributes; and determining, according to the determined confidences, at least one candidate attribute from the plurality of candidate attributes as an attribute of the image. The image attribute identification method and system improve the accuracy of image attribute identification.

Description

Image attribute recognition method and system and related network training method and system
Technical Field
The application relates to the field of deep learning, in particular to an image attribute identification method and system and a related network training method and system.
Background
Image attributes are the actual properties that an image objectively possesses, including but not limited to the type, color, shape, etc. of the image. As a typical representative of deep learning networks, the CNN (Convolutional Neural Network) is increasingly widely used in the field of image attribute recognition.
In a conventional method for performing image attribute recognition by using a CNN, some candidate attributes of an image are predicted by training a plurality of sub-neural networks, and the recognition accuracy is further improved by averaging the candidate attributes.
Disclosure of Invention
The application provides a technical solution for image attribute recognition and a technical solution for related network training.
An aspect of an embodiment of the present application provides an image attribute identification method, which may include: extracting a feature map from the image, wherein the feature map contains attributes of the image; identifying a plurality of candidate attributes from the feature map; respectively determining confidence degrees of a plurality of candidate attributes; and determining at least one candidate attribute from the plurality of candidate attributes as the attribute of the image according to the determined confidence degrees.
Another aspect of the embodiments of the present application provides a training method for an image attribute recognition system, where the image attribute recognition system includes an attribute pre-recognition neural network, and the attribute pre-recognition neural network includes a feature extraction layer and a plurality of pre-recognition sub-neural networks connected thereto, and the training method includes: extracting a training feature map from a training image with reference attributes through the feature extraction layer; respectively identifying a plurality of training candidate attributes from the extracted training feature map through the plurality of pre-recognition sub-neural networks; selecting a pre-recognition sub-neural network with the minimum attribute error between the training candidate attribute and the reference attribute from the plurality of pre-recognition sub-neural networks; and correcting parameters of the selected pre-recognition sub-neural network by back-propagating the attribute error in the attribute pre-recognition neural network until the training result satisfies a predetermined convergence condition.
Another aspect of an embodiment of the present application provides an image attribute identification system, which may include: a feature extraction unit that extracts a feature map from the image, the feature map including an attribute of the image; a pre-recognition unit which recognizes a plurality of candidate attributes from the feature map; and a selecting unit that determines the confidence degrees of the plurality of candidate attributes, and determines at least one candidate attribute from the plurality of candidate attributes as an attribute of the image according to each determined confidence degree.
Another aspect of the embodiments of the present application provides a training system for an image attribute recognition system, where the image attribute recognition system may include an attribute pre-recognition neural network, and the attribute pre-recognition neural network may include a feature extraction layer and a plurality of pre-recognition sub-neural networks connected thereto, wherein the training system: extracts a training feature map from a training image with reference attributes through the feature extraction layer; respectively identifies a plurality of training candidate attributes from the extracted training feature map through the plurality of pre-recognition sub-neural networks; selects a pre-recognition sub-neural network with the minimum attribute error between the training candidate attribute and the reference attribute from the plurality of pre-recognition sub-neural networks; and corrects parameters of the selected pre-recognition sub-neural network by back-propagating the attribute error in the attribute pre-recognition neural network until the training result satisfies a predetermined convergence condition.
Another aspect of an embodiment of the present application provides an image attribute identification system, which may include: a memory storing executable instructions; and a processor in communication with the memory to execute the executable instructions to: extracting a feature map from the image, wherein the feature map contains attributes of the image; identifying a plurality of candidate attributes from the feature map; respectively determining confidence degrees of a plurality of candidate attributes; and determining at least one candidate attribute from the plurality of candidate attributes as the attribute of the image according to the determined confidence degrees.
Another aspect of the embodiments of the present application provides a training system for an image attribute recognition system, which may include: a memory storing executable instructions; and a processor in communication with the memory to execute the executable instructions to: extract a training feature map from a training image with reference attributes through a feature extraction layer; respectively identify a plurality of training candidate attributes from the extracted training feature map through a plurality of pre-recognition sub-neural networks; select a pre-recognition sub-neural network with the minimum attribute error between the training candidate attribute and the reference attribute from the plurality of pre-recognition sub-neural networks; and correct parameters of the selected pre-recognition sub-neural network by back-propagating the attribute error in the attribute pre-recognition neural network until the training result satisfies a predetermined convergence condition.
Another aspect of embodiments of the present application provides a non-transitory computer storage medium that may store computer-readable instructions that, when executed, may cause a processor to: extracting a feature map from the image, wherein the feature map contains attributes of the image; identifying a plurality of candidate attributes from the feature map; respectively determining confidence degrees of a plurality of candidate attributes; and determining at least one candidate attribute from the plurality of candidate attributes as the attribute of the image according to the determined confidence degrees.
Another aspect of embodiments of the present application provides a non-transitory computer storage medium that may store computer-readable instructions that, when executed, may cause a processor to: extract a training feature map from a training image with reference attributes through a feature extraction layer; respectively identify a plurality of training candidate attributes from the extracted training feature map through a plurality of pre-recognition sub-neural networks; select a pre-recognition sub-neural network with the minimum attribute error between the training candidate attribute and the reference attribute from the plurality of pre-recognition sub-neural networks; and correct parameters of the selected pre-recognition sub-neural network by back-propagating the attribute error in the attribute pre-recognition neural network until the training result satisfies a predetermined convergence condition.
Unlike the prior art, in which the candidate attributes are simply averaged, the technical solution of the present application fully accounts for the differences among the candidate attributes by determining a confidence for each of them, thereby enhancing the specificity and relevance of the candidate attributes. On this basis, at least one candidate attribute is selected from the candidate attributes as the attribute of the image according to the confidence of each candidate attribute, which improves the recognition accuracy.
Drawings
Other features, objects, and advantages of the present application will become more apparent upon reading the following detailed description made with reference to the accompanying drawings in which:
FIG. 1 is a flow diagram illustrating image attribute identification according to an embodiment of the present application;
FIG. 2 is an architecture diagram illustrating an attribute pre-recognition neural network according to an embodiment of the present application;
FIG. 3 is a flow chart illustrating a method of training an attribute pre-recognition neural network according to an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating the relationship of a supervisory neural network to an attribute pre-recognition neural network in accordance with an embodiment of the present application;
FIG. 5 is a flow chart illustrating a training method of a supervisory neural network in accordance with an embodiment of the present application;
FIG. 6 is a block diagram illustrating an image attribute identification system according to an embodiment of the present application; and
FIG. 7 is a block diagram illustrating a computer system suitable for implementing embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the accompanying drawings and embodiments. It is to be understood that the specific embodiments described herein are merely illustrative of the application and are not limiting of the application. It should be noted that, for convenience of description, only the portions related to the present application are shown in the drawings. The present application will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
In a conventional method for performing image attribute recognition using a CNN, some candidate attributes of an image are predicted by training a plurality of sub-neural networks, and the recognition accuracy is then improved by averaging the candidate attributes. In this image attribute identification method, the differences among the candidate attributes of the image are ignored to some extent, so the final identification accuracy suffers.
FIG. 1 is a flow chart 1000 illustrating image attribute identification according to an embodiment of the present application. First, in step S1010, a feature map may be extracted from the image, the feature map including attributes of the image. The image may be, for example, an electronic picture stored in the form of RGB values. The feature map may be, for example, a semantic representation of the image in the CNN domain. Attributes of an image include, but are not limited to, the type, color, shape, etc. of the image. In step S1030, a plurality of candidate attributes may be identified from the feature map. The plurality of candidate attributes may be identified, for example, in different modes. For example, in the color attribute identification of an image, one candidate attribute may be identified in each of an overexposure mode, a normal exposure mode, and an underexposure mode. In this case, the candidate attribute may be a color distribution of the image, for example expressed in the form of RGB values. In step S1050, confidences of the plurality of candidate attributes may be determined. Still taking the color attribute identification described above as an example, the confidence of the candidate attribute identified in each mode may be determined according to certain criteria (e.g., various feature representations of the image). Finally, in step S1070, at least one candidate attribute may be determined from the plurality of candidate attributes as an attribute of the image according to the determined confidences.
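For concreteness, the overall flow of FIG. 1 can be sketched as follows. The patent does not prescribe a framework or any function signatures; this is a minimal PyTorch-style sketch in which `feature_extractor`, `pre_recognizers`, `supervisor`, and `top_k` are hypothetical names introduced only for illustration.

```python
import torch

def recognize_attributes(image, feature_extractor, pre_recognizers, supervisor, top_k=1):
    # S1010: extract a feature map that contains the attributes of the image
    feature_map = feature_extractor(image)
    # S1030: identify one candidate attribute per pre-recognition sub-network,
    # each sub-network working in its own mode (e.g. over/normal/under exposure)
    candidates = [branch(feature_map) for branch in pre_recognizers]
    # S1050: determine a confidence for each candidate attribute
    # (the description also allows the candidates themselves to be supplied here)
    confidences = supervisor(image).flatten()
    # S1070: keep the candidate attribute(s) with the highest confidence
    order = torch.argsort(confidences, descending=True)
    return [candidates[i] for i in order[:top_k].tolist()]
```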
Fig. 2 is an architecture diagram illustrating an attribute pre-recognition neural network 2000 according to an embodiment of the present application. Hereinafter, the attribute pre-recognition neural network 2000 is implemented in the form of a CNN. However, it should be understood that the attribute pre-recognition neural network 2000 may be implemented as any cascaded deep learning network. For clarity, some well-known structures, including but not limited to activation layers, pooling layers, and portions of convolutional layers, have been omitted from FIG. 2. As understood by those skilled in the art, the structure of the attribute pre-recognition neural network 2000 is not limited to the specific structure shown in fig. 2. The attribute pre-recognition neural network 2000 may have any specific structure without departing from the inventive concept of the present application, including, but not limited to, the number of layers, the functions of the layers, the connection relationship of the layers, and the like.
As shown in fig. 2, extracting the feature map 2300 from the image 2100 may be accomplished as follows: the feature extraction layer 2200 of the attribute pre-recognition neural network 2000 extracts the feature map 2300 from the image 2100 in a convolution manner. For simplicity, only one feature extraction layer 2200 is shown; however, there may be multiple feature extraction layers, the number of which is determined by actual requirements. The feature extraction layer 2200 may extract features of the image 2100 for subsequent attribute identification operations. In this way, features with different properties can be extracted by setting different convolution kernels (such as weights and bias values), so that a filtering function is realized, the signal-to-noise ratio of the feature map is improved, and the feature representation capability of the feature map is enhanced. The extracted features constitute the feature map 2300, which contains the attributes of the image.
After the extraction of the feature map 2300 is completed, the feature map 2300 may be input to a plurality of pre-recognition sub-neural networks connected to the feature extraction layer 2200, such as a first pre-recognition sub-neural network 2410, a second pre-recognition sub-neural network 2420, a third pre-recognition sub-neural network 2430, and the like, shown in fig. 2. A plurality of candidate attributes, e.g., a first candidate attribute 2510, a second candidate attribute 2520, a third candidate attribute 2530, etc., may be identified from the feature map 2300 by the plurality of pre-recognition sub-neural networks 2410, 2420, 2430, etc., of the attribute pre-recognition neural network 2000. The first, second, and third pre-recognition sub-neural networks 2410, 2420, 2430, etc. may have different parameters to operate in different operating modes and may all be connected to the feature extraction layer 2200. Since each of the pre-recognition sub-neural networks may have different parameters, each of them may operate in a different mode. Taking color attribute identification as an example, the first pre-recognition sub-neural network 2410 may be adapted to operate in an overexposure mode, the second pre-recognition sub-neural network 2420 may be adapted to operate in a normal exposure mode, and the third pre-recognition sub-neural network 2430 may be adapted to operate in an underexposure mode. In this case, if the image 2100 is an overexposed image, the first pre-recognition sub-neural network 2410 may make a more accurate recognition of the color attribute (i.e., the confidence of the first candidate attribute 2510 is higher); if the image 2100 is a normally exposed image, the second pre-recognition sub-neural network 2420 may make a more accurate recognition of the color attribute (i.e., the confidence of the second candidate attribute 2520 is higher); and if the image 2100 is an underexposed image, the third pre-recognition sub-neural network 2430 may make a more accurate recognition of the color attribute (i.e., the confidence of the third candidate attribute 2530 is higher). In this way, each pre-recognition sub-neural network can have its own unique attribute identification capability and can provide more accurate attribute identification in its corresponding working mode, that is, it can identify more accurate candidate attributes in that mode. The training method of the attribute pre-recognition neural network 2000 is given below in conjunction with fig. 2 and 3.
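A minimal sketch of this architecture is given below, continuing the PyTorch-style illustration. The layer sizes, the three-branch color example, and the classification-style output are assumptions made only for illustration; the patent allows any number of feature extraction layers and pre-recognition sub-neural networks.

```python
import torch.nn as nn

class AttributePreRecognitionNet(nn.Module):
    """Sketch of the attribute pre-recognition neural network 2000 of Fig. 2."""

    def __init__(self, num_branches=3, num_attribute_classes=10):
        super().__init__()
        # feature extraction layer(s) 2200: convolution kernels act as learned filters
        self.feature_extraction = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        # pre-recognition sub-neural networks 2410/2420/2430: identical structure
        # but independent parameters, so each branch can specialize to one
        # working mode (e.g. overexposure / normal exposure / underexposure)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(64, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(32, num_attribute_classes),
            )
            for _ in range(num_branches)
        ])

    def forward(self, image):
        feature_map = self.feature_extraction(image)   # feature map 2300
        # one candidate attribute prediction per pre-recognition sub-network
        return [branch(feature_map) for branch in self.branches]
```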
In an embodiment of the present application, the attribute pre-recognition neural network 2000 may be pre-trained before performing the candidate attribute recognition. FIG. 3 is a flowchart 3000 illustrating a training method of the attribute pre-recognition neural network 2000, according to an embodiment of the present application. As shown in fig. 3, in step S3010, a training feature map may be extracted from a training image having reference attributes by the feature extraction layer 2200. The training images may be prepared and labeled with reference attributes in advance. In step S3030, a plurality of training candidate attributes may be identified from the extracted training feature map through the plurality of pre-recognition sub-neural networks (the first pre-recognition sub-neural network 2410, the second pre-recognition sub-neural network 2420, the third pre-recognition sub-neural network 2430, and the like). In step S3050, the pre-recognition sub-neural network with the smallest attribute error between the training candidate attribute and the reference attribute may be selected from the plurality of pre-recognition sub-neural networks. For example, for a certain overexposed training image, the first training candidate attribute identified by the first pre-recognition sub-neural network 2410 may have the smallest attribute error with respect to the reference attribute. In step S3070, parameters of the selected pre-recognition sub-neural network may be modified by back-propagating the attribute error in the attribute pre-recognition neural network 2000 until the training result satisfies a predetermined convergence condition, for example, the training error is less than a certain threshold, the training error falls within a certain tolerance, or the training process has iterated a predetermined number of times. In addition, the parameters of the non-selected pre-recognition sub-neural networks are kept unchanged while the parameters of the selected pre-recognition sub-neural network are corrected by back-propagating the attribute error in the attribute pre-recognition neural network 2000. For example, in the case where the attribute error between the first training candidate attribute identified by the first pre-recognition sub-neural network 2410 and the reference attribute is minimum, the parameters (e.g., convolution kernel parameters, etc.) of the second pre-recognition sub-neural network 2420 and the third pre-recognition sub-neural network 2430, etc. may be locked without parameter update through back propagation. Instead, the attribute error between the first training candidate attribute and the reference attribute may be propagated back only through the first pre-recognition sub-neural network 2410 in the attribute pre-recognition neural network 2000, thus modifying the parameters of the first pre-recognition sub-neural network 2410. A similar training process may be performed a number of times until the attribute error converges, e.g., is less than a predetermined first threshold. The first threshold can be preset according to actual requirements. Through such a training process, each of the pre-recognition sub-neural networks of the attribute pre-recognition neural network 2000 may acquire its own unique attribute identification capability and may provide more accurate attribute identification in its corresponding operating mode.
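A minimal sketch of one iteration of this selective training (steps S3010-S3070) follows, continuing the sketch above. The use of cross-entropy as the attribute error and of a standard optimizer are assumptions; the patent only requires that the attribute error of the selected branch be back-propagated while the non-selected branches remain unchanged.

```python
import torch
import torch.nn.functional as F

def pre_recognition_train_step(model, optimizer, image, reference_attribute):
    # reference_attribute: LongTensor of shape (N,) holding the labeled reference attribute
    candidates = model(image)                       # S3030: one prediction per branch
    errors = torch.stack([
        F.cross_entropy(pred, reference_attribute)  # attribute error of each branch (assumed loss)
        for pred in candidates
    ])
    best = int(torch.argmin(errors))                # S3050: branch with the smallest attribute error
    optimizer.zero_grad()
    errors[best].backward()                         # S3070: back-propagate only that error
    # The non-selected branches receive no gradient from this loss, so their parameters
    # stay unchanged; the selected branch (and, depending on the chosen setup, the shared
    # feature extraction layer) is updated.
    optimizer.step()
    return best, errors[best].item()
```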
In an embodiment of the present application, the confidence of the plurality of candidate attributes 2510, 2520, 2530, etc. may be determined from the image 2100 by a supervisory neural network 4000 pre-trained together with the attribute pre-recognition neural network 2000. The supervisory neural network 4000 may also be implemented by the structure of a CNN. Fig. 4 is a schematic diagram illustrating the relationship of the supervisory neural network 4000 to the attribute pre-recognition neural network 2000 in accordance with an embodiment of the present application. In the co-training of the attribute pre-recognition neural network 2000 and the supervisory neural network 4000, both networks share the same input, i.e., a training image having a reference attribute is simultaneously input to the attribute pre-recognition neural network 2000 and the supervisory neural network 4000. The attribute pre-recognition neural network 2000 is trained on the attribute errors, and the supervisory neural network 4000 is trained on the confidence errors of the candidate attributes. The supervisory neural network 4000 can play the role of an inspector in the training process of the attribute pre-recognition neural network 2000, and by being trained together with the attribute pre-recognition neural network 2000 in advance, the supervisory neural network 4000 can learn to respectively determine the confidences of a plurality of candidate attributes, so that at least one candidate attribute can be determined from the plurality of candidate attributes according to each determined confidence to serve as the real attribute of the image.
Fig. 5 shows a flowchart 5000 of a training method of the supervisory neural network 4000 in accordance with an embodiment of the present application. In step S5010, confidences of a plurality of training candidate attributes may be determined from the training image; and in step S5030, the parameters of the supervisory neural network 4000 may be corrected by back-propagating a confidence error between the determined confidence and the reference confidence in the supervisory neural network 4000 until the training result satisfies a predetermined convergence condition, for example, the training error is less than a certain threshold, the training error falls within a certain tolerance, or the training process has iterated a predetermined number of times. That is, in the training process of the supervisory neural network 4000, the training image and the candidate attributes may be input to the supervisory neural network 4000 together, the supervisory neural network 4000 may determine a confidence for each training candidate attribute according to the training image, and each determined confidence may then be compared with the corresponding reference confidence to determine a confidence error, which is used to modify the parameters of the supervisory neural network 4000. Through this co-training, the parameters of the supervisory neural network 4000 can be adapted to the parameters of the attribute pre-recognition neural network 2000, so that after the training process is finished, even without the reference attribute and the reference confidence, the supervisory neural network 4000 can effectively judge the confidence of each candidate attribute and recognize the real attribute of the image from the multiple candidate attributes according to the confidences.
In an embodiment of the present application, the reference confidence may be determined as follows: the training candidate attribute identified by the pre-recognition sub-neural network with the smallest attribute error in the training of the attribute pre-recognition neural network 2000 has a high reference confidence (e.g., 1), and the other training candidate attributes have a low reference confidence (e.g., 0). For example, for a certain overexposed training image, the first training candidate attribute identified by the first pre-recognition sub-neural network 2410 may have the smallest attribute error with respect to the reference attribute. In this case, the first training candidate attribute identified by the first pre-recognition sub-neural network 2410 has a reference confidence of 1, while the training candidate attributes identified by the remaining pre-recognition sub-neural networks (e.g., 2420 and 2430, etc.) have a reference confidence of 0. Accordingly, the confidence of each training candidate attribute determined by the supervisory neural network 4000 may be compared with the corresponding reference confidence to derive a confidence error. During the training process of the supervisory neural network 4000, the confidence error may be propagated back to modify the parameters of the supervisory neural network 4000 until the error between the determined confidence of the first training candidate attribute and 1, and the errors between the determined confidences of the second and third training candidate attributes and 0, are smaller than respective second thresholds. The second threshold may be different or identical for different training candidate attributes and may be preset according to actual requirements. By setting the reference confidences in this way, the confidences of the plurality of candidate attributes determined by the supervisory neural network 4000 are encouraged to become gradually binarized, thereby improving the accuracy of recognition.
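A minimal sketch of this supervisory-network training step (steps S5010-S5030 with the binarized reference confidences described above) follows. The choice of binary cross-entropy as the confidence error and the exact inputs of the supervisory network are assumptions; `best_branch` is the index of the minimum-error branch returned by the pre-recognition training step sketched earlier.

```python
import torch
import torch.nn.functional as F

def supervisor_train_step(supervisor, optimizer, image, best_branch):
    # S5010: the supervisory network 4000 determines one confidence per training
    # candidate attribute from the training image
    confidences = supervisor(image)                  # assumed shape: (1, num_branches)
    # reference confidences: 1 for the minimum-error branch, 0 for the others
    reference = torch.zeros_like(confidences)
    reference[0, best_branch] = 1.0
    # S5030: back-propagate the confidence error to correct the supervisor's parameters
    confidence_error = F.binary_cross_entropy_with_logits(confidences, reference)
    optimizer.zero_grad()
    confidence_error.backward()
    optimizer.step()
    return confidence_error.item()
```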
In an embodiment of the present application, the supervisory neural network 4000 may determine at least one candidate attribute from the plurality of candidate attributes (e.g., the first candidate attribute 2510, the second candidate attribute 2520, the third candidate attribute 2530, etc.) as an attribute of the image 2100 according to the determined confidences. For example, the candidate attribute with the highest confidence may be identified as the attribute of the image. In an alternative embodiment, the first few candidate attributes, ordered by confidence from high to low, may be selected as the attributes of the image. The image attribute determined in this way is not a simple average of the plurality of candidate attributes; the unique attribute recognition capability of each pre-recognition sub-neural network of the attribute pre-recognition neural network 2000 can thus be utilized, so that more accurate attribute recognition can be given in the operating mode corresponding to the mode of the image.
The image attribute recognition method and the training method of the image attribute recognition system described with reference to fig. 1 to 5, and the like, may be implemented by a computer system. The computer system includes a memory storing executable instructions and a processor. The processor communicates with the memory to execute the executable instructions to implement the methods described with reference to fig. 1-5. Alternatively or additionally, the image attribute recognition method and the training method of the image attribute recognition system, and the like described with reference to fig. 1 to 5 may be implemented by a non-transitory computer storage medium. The medium stores computer readable instructions that, when executed, cause a processor to perform the method described with reference to fig. 1-5.
Fig. 6 is a block diagram illustrating an image attribute identification system 6000 according to an embodiment of the present application. The image attribute identification system 6000 may include: a feature extraction unit 6100 that extracts a feature map from the image, the feature map including an attribute of the image; a pre-recognition unit 6200 that recognizes a plurality of candidate attributes from the feature map; and a selecting unit 6300 that respectively determines the confidences of the plurality of candidate attributes and determines at least one candidate attribute from the plurality of candidate attributes as an attribute of the image according to the determined confidences. The image attribute identification system 6000 may be implemented by the architecture of a CNN; for example, the image attribute identification system 6000 may include the attribute pre-recognition neural network 2000 described with reference to fig. 2. In one embodiment of the present application, the feature extraction unit 6100 may include a feature extraction layer of the attribute pre-recognition neural network, and the feature extraction layer extracts the feature map from the image in a convolution manner. In one embodiment of the application, the pre-recognition unit 6200 may comprise a plurality of pre-recognition sub-neural networks of the attribute pre-recognition neural network, the plurality of pre-recognition sub-neural networks having different parameters to operate in different operation modes and being connected to the feature extraction layer, the plurality of pre-recognition sub-neural networks recognizing a plurality of candidate attributes from the feature map. The above-mentioned operation mode may include at least one of: an overexposure mode, a normal exposure mode and an underexposure mode. In one embodiment of the present application, the image attribute recognition system may further include a first training unit 6400 for training the attribute pre-recognition neural network, the first training unit 6400: extracting a training feature map from a training image with reference attributes through the feature extraction layer; respectively identifying a plurality of training candidate attributes from the extracted training feature map through the plurality of pre-recognition sub-neural networks; selecting a pre-recognition sub-neural network with the minimum attribute error between the training candidate attribute and the reference attribute from the plurality of pre-recognition sub-neural networks; and correcting the parameters of the selected pre-recognition sub-neural network by back-propagating the attribute error in the attribute pre-recognition neural network until the training result meets a predetermined convergence condition, for example, the training error is less than a certain threshold, the training error falls within a certain tolerance, or the training process has iterated a predetermined number of times. In an embodiment of the application, the first training unit 6400 may keep the parameters of the unselected pre-recognition sub-neural networks unchanged during the correction of the parameters of the selected pre-recognition sub-neural network by back-propagating the attribute error in the attribute pre-recognition neural network.
In an embodiment of the present application, the selecting unit 6300 may include a supervisory neural network trained together with the attribute pre-recognition neural network in advance, and the supervisory neural network respectively determines the confidences of the plurality of candidate attributes according to the image. In one embodiment of the present application, the attribute pre-recognition neural network and the supervisory neural network may share as input a training image with reference attributes during the training process. The image attribute recognition system may further comprise a second training unit 6500 for training the supervisory neural network, the second training unit 6500: determining confidences of a plurality of training candidate attributes according to the training image; and correcting the parameters of the supervisory neural network by back-propagating confidence errors between the determined confidences and the reference confidences in the supervisory neural network until the training result satisfies a predetermined convergence condition, for example, the training error is less than a certain threshold, the training error falls within a certain tolerance, or the training process has iterated a predetermined number of times. In one embodiment of the present application, the candidate attribute identified by the pre-recognition sub-neural network with the smallest attribute error in the first training unit has a high reference confidence (e.g., 1), and the other candidate attributes have a low reference confidence (e.g., 0). In an embodiment of the present application, the selecting unit 6300 may identify the candidate attribute with the highest confidence as the attribute of the image. Embodiments of the present application also include a training system of an image attribute recognition system, wherein the image attribute recognition system may include an attribute pre-recognition neural network, the attribute pre-recognition neural network may include a feature extraction layer and a plurality of pre-recognition sub-neural networks connected thereto, and the training system may implement the training process described with reference to fig. 3 to train the attribute pre-recognition neural network. The image attribute recognition system may also include a supervisory neural network trained in conjunction with the attribute pre-recognition neural network, and the training system may implement the training process described above to train the supervisory neural network.
The image attribute recognition system described with reference to FIG. 6 and the briefly described training system of the image attribute recognition system may be implemented by a computer system. The computer system may include a memory storing executable instructions and a processor. The processor is in communication with the memory to execute the executable instructions to implement the image attribute recognition system described with reference to FIG. 6 and the briefly described training system of the image attribute recognition system. Alternatively or additionally, the image attribute recognition system described with reference to FIG. 6 and the briefly described training system of the image attribute recognition system may be implemented by a non-transitory computer storage medium. The medium stores computer readable instructions that, when executed, cause a processor to implement the image attribute recognition system described with reference to FIG. 6 and the briefly described training system of the image attribute recognition system.
Referring now to FIG. 7, FIG. 7 is a block diagram that illustrates a computer system 7000 that is suitable for implementing embodiments of the present application.
As shown in fig. 7, the computer system 7000 may include a processing unit (such as a Central Processing Unit (CPU) 7001, a Graphics Processing Unit (GPU), or the like) that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 7002 or a program loaded from a storage section 7008 into a Random Access Memory (RAM) 7003. In the RAM 7003, various programs and data required for the operation of the system 7000 are also stored. The CPU 7001, ROM 7002, and RAM 7003 are connected to one another through a bus 7004. An input/output (I/O) interface 7005 is also connected to the bus 7004.
The following are components that may be connected to the I/O interface 7005: an input portion 7006 including a keyboard, a mouse, and the like; an output portion 7007 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage portion 7008 including a hard disk and the like; and a communication portion 7009 including a network interface card (such as a LAN card and a modem). The communication portion 7009 can perform communication processing over a network such as the internet. A drive 7010 may also be connected to the I/O interface 7005 as necessary. A removable medium 7011 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like may be mounted on the drive 7010 so that a computer program read out therefrom is installed into the storage portion 7008 as needed.
In particular, according to embodiments of the present disclosure, the methods described above with reference to fig. 1-5 and the system described with reference to fig. 6 may be implemented as computer software programs. For example, embodiments of the disclosure may include a computer program product comprising a computer program tangibly embodied in a machine-readable medium. The computer program comprises program code for performing the methods of fig. 1-5 and implementing the system of fig. 6. In such an embodiment, the computer program can be downloaded from a network through the communication portion 7009 and installed, and/or can be installed from the removable medium 7011.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules referred to in the embodiments of the present application may be implemented by software or hardware. The described units or modules may also be provided in a processor. The names of these units or modules should not be construed as limiting these units or modules.
The above description is only exemplary of the present application and illustrative of the principles of the technology employed. It will be appreciated by a person skilled in the art that the scope of the present application is not limited to embodiments with a specific combination of the above-mentioned features, but also covers other embodiments with any combination of the above-mentioned features or their equivalents without departing from the inventive concept. For example, technical solutions formed by mutually replacing the above features with technical features having similar functions disclosed in the present application are also covered.

Claims (28)

1. An image attribute identification method, comprising:
extracting a feature map from an image, the feature map containing attributes of the image;
identifying a plurality of candidate attributes from the feature map;
determining confidence degrees of the plurality of candidate attributes respectively; and
determining at least one candidate attribute from the plurality of candidate attributes as an attribute of the image according to each of the determined confidence levels,
wherein the extracting the feature map from the image comprises:
extracting the feature map from the image in a convolution manner by a feature extraction layer of an attribute pre-recognition neural network,
wherein the identifying a plurality of candidate attributes from the feature map comprises:
identifying a plurality of candidate attributes from the feature map by a plurality of pre-recognition sub-neural networks of the attribute pre-recognition neural network connected to the feature extraction layer, the plurality of pre-recognition sub-neural networks having different parameters to operate in different operating modes.
2. The image attribute identification method according to claim 1, characterized in that the operation mode includes at least one of: an overexposure mode, a normal exposure mode and an underexposure mode.
3. The image attribute identification method according to claim 1 or 2, further comprising pre-training the attribute pre-recognition neural network, wherein the training of the attribute pre-recognition neural network comprises:
extracting a training feature map from a training image with a reference attribute through the feature extraction layer;
identifying a plurality of training candidate attributes from the extracted training feature map through the plurality of pre-recognition sub-neural networks, respectively;
selecting a pre-recognition sub-neural network with the minimum attribute error between the training candidate attribute and the reference attribute from the plurality of pre-recognition sub-neural networks; and
correcting the parameters of the selected pre-recognition sub-neural network by back-propagating the attribute error in the attribute pre-recognition neural network until the training result meets a preset convergence condition.
4. The image attribute identification method of claim 3, wherein the training of the attribute pre-recognition neural network further comprises:
the parameters of the non-selected pre-identified sub-neural networks are kept unchanged during the correction of the parameters of the selected pre-identified sub-neural network by back-propagating the attribute error in the attribute pre-identified neural network.
5. The image attribute identification method of claim 3, wherein the determining the confidence levels of the plurality of candidate attributes comprises:
respectively determining the confidence degrees of the plurality of candidate attributes according to the image through a supervisory neural network pre-trained together with the attribute pre-recognition neural network.
6. The image attribute identification method according to claim 5, wherein the attribute pre-recognition neural network and the supervisory neural network share a training image with reference attributes as input in a training process.
7. The image attribute identification method of claim 6, wherein the training of the supervisory neural network comprises:
determining confidence degrees of the plurality of training candidate attributes according to the training images respectively; and
modifying parameters of the supervisory neural network by back-propagating confidence errors between the determined confidence and reference confidence in the supervisory neural network until training results satisfy a predetermined convergence condition.
8. The image attribute identification method according to claim 7, wherein the reference confidence of the training candidate attribute identified by the pre-recognition sub-neural network with the smallest attribute error in the training of the attribute pre-recognition neural network is high, and the reference confidence of the other training candidate attributes is low.
9. The image attribute identification method according to claim 1, wherein the determining at least one candidate attribute from the plurality of candidate attributes as the attribute of the image according to the determined confidence degrees comprises:
sorting the plurality of candidate attributes from high to low according to confidence; and
selecting, starting from the candidate attribute with the highest confidence, a preset number of candidate attributes as the attributes of the image.
10. The image attribute identification method according to claim 1, wherein the determining at least one candidate attribute from the plurality of candidate attributes as the attribute of the image according to the determined confidence degrees comprises:
determining the candidate attribute with the highest confidence as the attribute of the image.
11. A training method of an image attribute recognition system, wherein the image attribute recognition system comprises an attribute pre-recognition neural network, and the attribute pre-recognition neural network comprises a feature extraction layer and a plurality of pre-recognition sub-neural networks connected with the feature extraction layer, and the training method comprises the following steps:
extracting a training feature map from a training image with a reference attribute through the feature extraction layer;
identifying a plurality of training candidate attributes from the extracted training feature map through the plurality of pre-recognition sub-neural networks, respectively;
selecting a pre-recognition sub-neural network with the minimum attribute error between the training candidate attribute and the reference attribute from the plurality of pre-recognition sub-neural networks; and
correcting the parameters of the selected pre-recognition sub-neural network by back-propagating the attribute error in the attribute pre-recognition neural network until the training result meets a preset convergence condition.
12. The method for training an image attribute recognition system according to claim 11, wherein the parameters of the unselected pre-recognition sub-neural networks are kept unchanged during the correction of the parameters of the selected pre-recognition sub-neural network by back-propagating the attribute error in the attribute pre-recognition neural network.
13. The training method of the image attribute recognition system according to claim 11 or 12, wherein the image attribute recognition system further comprises a supervisory neural network, the attribute pre-recognition neural network and the supervisory neural network share a training image with a reference attribute as an input during training, and the training method of the supervisory neural network comprises:
determining confidence degrees of the plurality of training candidate attributes according to the training images respectively; and
modifying parameters of the supervisory neural network by back-propagating confidence errors between the determined confidence and reference confidence in the supervisory neural network until training results satisfy a predetermined convergence condition.
14. An image attribute identification system, comprising:
a feature extraction unit that extracts a feature map from an image, the feature map containing an attribute of the image;
a pre-recognition unit which recognizes a plurality of candidate attributes from the feature map; and
a selecting unit that determines the confidence degrees of the plurality of candidate attributes, respectively, and determines at least one candidate attribute from the plurality of candidate attributes as an attribute of the image according to each of the determined confidence degrees,
wherein the feature extraction unit includes a feature extraction layer of an attribute pre-recognition neural network, and the feature extraction layer extracts the feature map from the image in a convolution manner,
wherein the pre-recognition unit recognizes a plurality of candidate attributes from the feature map through a plurality of pre-recognition sub-neural networks of the attribute pre-recognition neural network connected to the feature extraction layer, the plurality of pre-recognition sub-neural networks having different parameters to operate in different operation modes.
15. The image attribute recognition system of claim 14, wherein the operational mode comprises at least one of: an overexposure mode, a normal exposure mode and an underexposure mode.
16. The image attribute recognition system of claim 14 or 15, further comprising a first training unit for training the attribute pre-recognition neural network, the first training unit:
extracting a training feature map from a training image with a reference attribute through the feature extraction layer;
identifying a plurality of training candidate attributes from the extracted training feature map through the plurality of pre-recognition sub-neural networks, respectively;
selecting a pre-recognition sub-neural network with the minimum attribute error between the training candidate attribute and the reference attribute from the plurality of pre-recognition sub-neural networks; and
correcting the parameters of the selected pre-recognition sub-neural network by back-propagating the attribute error in the attribute pre-recognition neural network until the training result meets a preset convergence condition.
17. The image attribute recognition system of claim 16, wherein the first training unit keeps the parameters of the unselected pre-recognition sub-neural networks unchanged during the modification of the parameters of the selected pre-recognition sub-neural network by back-propagating the attribute error in the attribute pre-recognition neural network.
18. The image attribute recognition system of claim 16, wherein the selecting unit comprises a supervisory neural network trained together with the attribute pre-recognition neural network in advance, the supervisory neural network determining the confidence levels of the plurality of candidate attributes from the image, respectively.
19. The image attribute recognition system of claim 18, wherein the attribute pre-recognition neural network and the supervisory neural network share as input a training image having reference attributes during a training process.
20. The image attribute recognition system of claim 19, further comprising a second training unit for training the supervisory neural network, the second training unit:
determining confidence degrees of the plurality of training candidate attributes according to the training images respectively; and
modifying parameters of the supervisory neural network by back-propagating confidence errors between the determined confidence and reference confidence in the supervisory neural network until training results satisfy a predetermined convergence condition.
21. The image attribute recognition system of claim 20, wherein the training candidate attribute recognized by the pre-recognition sub-neural network with the smallest attribute error in the first training unit has a high reference confidence, and the other training candidate attributes have a low reference confidence.
22. The image attribute recognition system of claim 14, wherein the selecting unit:
sorting the plurality of candidate attributes from high to low according to confidence; and
selecting, starting from the candidate attribute with the highest confidence, a preset number of candidate attributes as the attributes of the image.
23. The image attribute recognition system of claim 14, wherein the selecting unit:
identifies the candidate attribute with the highest confidence as the attribute of the image.
24. A training system for an image attribute recognition system, the image attribute recognition system comprising an attribute pre-recognition neural network, the attribute pre-recognition neural network comprising a feature extraction layer and a plurality of pre-recognition sub-neural networks connected thereto, the training system being characterized by:
extracting a training feature map from a training image with a reference attribute through the feature extraction layer;
identifying a plurality of training candidate attributes from the extracted training feature map through the plurality of pre-recognition sub-neural networks, respectively;
selecting a pre-recognition sub-neural network with the minimum attribute error between the training candidate attribute and the reference attribute from the plurality of pre-recognition sub-neural networks; and
correcting the parameters of the selected pre-recognition sub-neural network by back-propagating the attribute error in the attribute pre-recognition neural network until the training result meets a preset convergence condition.
25. The training system of the image attribute recognition system of claim 24, wherein the training system keeps the parameters of the unselected pre-recognition sub-neural networks unchanged while correcting the parameters of the selected pre-recognition sub-neural network by back-propagating the attribute error through the attribute pre-recognition neural network.
26. The training system of the image attribute recognition system of claim 24 or 25, further comprising a supervised neural network, wherein the training system inputs a training image having a reference attribute into both the attribute pre-recognition neural network and the supervised neural network for training, and wherein the training system:
determining confidence levels of the plurality of training candidate attributes from the training image, respectively; and
correcting parameters of the supervised neural network by back-propagating, through the supervised neural network, the confidence errors between the determined confidence levels and the reference confidence levels, until the training result satisfies a predetermined convergence condition.
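A possible outer loop tying claims 24-26 together, reusing the hypothetical training_step and supervised_step functions sketched above; the same training image is fed to both networks, and the loss threshold standing in for the "predetermined convergence condition" is an assumption:

def train_until_convergence(loader, threshold=0.05, max_epochs=100):
    for epoch in range(max_epochs):
        epoch_loss = 0.0
        for image, reference_attribute in loader:
            # Branch-selective update of the attribute pre-recognition network.
            best, loss = training_step(image, reference_attribute)
            # Update of the supervised network from the same training image.
            supervised_step(image, best)
            epoch_loss += loss
        epoch_loss /= max(len(loader), 1)
        if epoch_loss < threshold:   # assumed convergence condition
            break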
27. A non-transitory computer storage medium storing computer readable instructions that, when executed, cause a processor to:
extracting a feature map from the image, wherein the feature map contains attributes of the image;
identifying a plurality of candidate attributes from the feature map;
determining confidence levels of the plurality of candidate attributes, respectively; and
determining at least one candidate attribute from the plurality of candidate attributes as an attribute of the image according to the determined confidence levels,
wherein the extracting the feature map from the image comprises:
extracting the feature map from the image by convolution, through a feature extraction layer of an attribute pre-recognition neural network,
wherein the identifying a plurality of candidate attributes from the feature map comprises:
identifying the plurality of candidate attributes from the feature map through a plurality of pre-recognition sub-neural networks of the attribute pre-recognition neural network connected to the feature extraction layer, the plurality of pre-recognition sub-neural networks having different parameters so as to operate in different operating modes.
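For completeness, an illustrative inference pass matching the steps of claim 27, reusing the hypothetical net and supervised_net objects from the sketches above; it is one way to combine the pieces, not the patent's own implementation:

import torch

@torch.no_grad()
def recognise_attribute(image):
    logits_per_branch = net(image)                 # outputs of the sub-networks
    candidates = [logits.argmax(dim=1) for logits in logits_per_branch]
    confidences = supervised_net(image)            # confidence per branch, (N, K)
    best_branch = confidences.argmax(dim=1)        # highest-confidence candidate
    stacked = torch.stack(candidates, dim=1)       # (N, K) candidate attributes
    # For each image, keep the attribute proposed by its best branch.
    return stacked.gather(1, best_branch.unsqueeze(1)).squeeze(1)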
28. A non-transitory computer storage medium storing computer readable instructions that, when executed, cause a processor to:
extracting a training feature map from a training image with reference attributes through a feature extraction layer;
identifying a plurality of training candidate attributes from the extracted training feature map through a plurality of pre-recognition sub-neural networks, respectively;
selecting, from the plurality of pre-recognition sub-neural networks, the pre-recognition sub-neural network with the smallest attribute error between its training candidate attribute and the reference attribute; and
correcting the parameters of the selected pre-recognition sub-neural network by back-propagating the attribute error through the attribute pre-recognition neural network until the training result satisfies a predetermined convergence condition.
CN201610825966.8A 2016-09-14 2016-09-14 Image attribute recognition method and system and related network training method and system Active CN106407991B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610825966.8A CN106407991B (en) 2016-09-14 2016-09-14 Image attribute recognition method and system and related network training method and system

Publications (2)

Publication Number Publication Date
CN106407991A CN106407991A (en) 2017-02-15
CN106407991B true CN106407991B (en) 2020-02-11

Family

ID=57998004

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610825966.8A Active CN106407991B (en) 2016-09-14 2016-09-14 Image attribute recognition method and system and related network training method and system

Country Status (1)

Country Link
CN (1) CN106407991B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108268904B (en) * 2018-03-01 2022-03-25 中国银行股份有限公司 Picture identification method and device and electronic equipment
CN109171663B (en) * 2018-10-31 2021-06-18 朱昀正 Sleep monitoring bracelet based on skin electricity
CN111652108B (en) * 2020-05-28 2020-12-29 中国人民解放军32802部队 Anti-interference signal identification method and device, computer equipment and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544705A (en) * 2013-10-25 2014-01-29 华南理工大学 Image quality testing method based on deep convolutional neural network
CN105426883A (en) * 2015-12-25 2016-03-23 中国科学院深圳先进技术研究院 Video classified rapid identification method and device
CN105447529A (en) * 2015-12-30 2016-03-30 商汤集团有限公司 Costume detection and attribute value identification method and system
CN105894046A (en) * 2016-06-16 2016-08-24 北京市商汤科技开发有限公司 Convolutional neural network training and image processing method and system and computer equipment

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Flame image segmentation based on a combination of multiple neural network classifiers; Li Shutao et al.; Journal of Data Acquisition and Processing; 2000-12-31; Vol. 15, No. 4; pp. 443-446 *
Image aesthetic classification based on parallel deep convolutional neural networks; Wang Weining et al.; Acta Automatica Sinica; 2016-06-30; Vol. 42, No. 6; pp. 904-914 *

Similar Documents

Publication Publication Date Title
US20200293892A1 (en) Model test methods and apparatuses
CN110795976B (en) Method, device and equipment for training object detection model
CN109784149B (en) Method and system for detecting key points of human skeleton
TWI721510B (en) Method, apparatus and storage medium for binocular image depth estimation
CN109740689B (en) Method and system for screening error labeling data of image semantic segmentation
CN109086654B (en) Handwriting model training method, text recognition method, device, equipment and medium
CN107507153B (en) Image denoising method and device
US11587356B2 (en) Method and device for age estimation
CN109816666B (en) Symmetrical full convolution neural network model construction method, fundus image blood vessel segmentation device, computer equipment and storage medium
CN111275660B (en) Flat panel display defect detection method and device
CN112686218B (en) Training method and device of text detection model, readable storage medium and equipment
CN106407991B (en) Image attribute recognition method and system and related network training method and system
CN111784699B (en) Method and device for carrying out target segmentation on three-dimensional point cloud data and terminal equipment
US20240119584A1 (en) Detection method, electronic device and non-transitory computer-readable storage medium
CN111815563B (en) Retina optic disc segmentation method combining U-Net and region growing PCNN
US20200349416A1 (en) Determining computer-executed ensemble model
CN113643260A (en) Method, apparatus, device, medium and product for detecting image quality
TWI803243B (en) Method for expanding images, computer device and storage medium
CN111862040A (en) Portrait picture quality evaluation method, device, equipment and storage medium
CN114841974A (en) Nondestructive testing method and system for internal structure of fruit, electronic equipment and medium
WO2020106871A1 (en) Image processing neural networks with dynamic filter activation
CN111738272A (en) Target feature extraction method and device and electronic equipment
CN116524206B (en) Target image identification method and device
CN111241993B (en) Seat number determining method and device, electronic equipment and storage medium
CN115018857B (en) Image segmentation method, image segmentation device, computer-readable storage medium and computer equipment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant