CN109858482B - Image key area detection method and system and terminal equipment - Google Patents


Info

Publication number
CN109858482B
CN109858482B (Application CN201910042460.3A)
Authority
CN
China
Prior art keywords: image, gradient, neural network, map, processed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910042460.3A
Other languages
Chinese (zh)
Other versions
CN109858482A (en)
Inventor
张发恩
杨麒弘
赵江华
张祥伟
秦永强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ainnovation Chongqing Technology Co ltd
Original Assignee
Ainnovation Chongqing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ainnovation Chongqing Technology Co ltd filed Critical Ainnovation Chongqing Technology Co ltd
Priority to CN201910042460.3A
Publication of CN109858482A
Application granted; publication of CN109858482B
Legal status: Active

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to an image key-region detection method, together with a corresponding system and terminal device. A classification network for the commodities to be classified is trained by deep learning; forward inference with the neural network yields the commodity class and an activation map; the code matching the commodity class is then back-propagated as the gradient of the neural network to obtain a gradient activation map that reflects the key regions of the image to be processed. Non-important regions with low relevance to the required classification information are thereby removed, background interference in the original image is reduced, and the required key regions are obtained. Unlike existing methods that require manual labeling, the technical scheme of the invention can be obtained through self-training of the neural network, and the key regions obtained by the detection method and system are more robust than manually labeled ones.

Description

Image key area detection method and system and terminal equipment
[ technical field ]
The invention relates to the field of artificial intelligence, in particular to a method, a system and a terminal device for detecting key areas of images.
[ background of the invention ]
With the dramatic increase in the volume of images to be processed, efficient labeling and classification of images has attracted growing attention. In the prior art, key areas and key points in an image are mainly labeled manually on the basis of human prior knowledge, and a neural network then learns from the labeled key content. However, dividing key areas in this way consumes considerable labor, material, and time, and differences in personal experience make the labeling inaccurate. This is especially true for packaged commodities such as bottles, boxes, bags, or cases with similar characteristics, for example similar packaging labels: because the feature differences are small, it is difficult to define suitable key points or key areas from personal experience alone. In some labeling and classification work, for instance, boxes or cases bearing similar packaging labels are often assigned the same key area.
[ summary of the invention ]
In order to solve the technical problem in the prior art that a suitable key area is difficult to define, the invention provides an image key area detection method, together with a corresponding system and terminal device.
In order to solve the above technical problems, the invention provides the following technical scheme. A method for detecting key regions of an image comprises the following steps. Step S1: train to obtain a neural network. Step S2: input an image to be processed and perform forward inference on it with the neural network, so as to obtain the classification result of the commodity in the image to be processed and the required activation map (activation map), where the activation map is the feature map on the last convolutional layer during forward inference. Step S3: convert the classification result into a coding result, and back-propagate the coding result as the gradient in the neural network to obtain the required gradient map. Step S4: calculate the average gradient value of each channel in the gradient map corresponding to the last convolutional layer; multiply each average gradient value by the corresponding channel count to obtain the product value of the corresponding gradient map; and perform a weighted average of the product values with the activation map obtained in step S2 to obtain the required gradient activation map (gradient activation map), which can represent the key regions of the image to be processed.
Preferably, after the above step S4, the following steps are included: step S5, converting the gradient activation map into a thermodynamic map; and step S6, superposing the thermodynamic diagram and the image to be processed to obtain a key area, and optimizing the new classification neural network based on the key area.
Preferably, in the step S6, the optimizing the new classification neural network based on the key area specifically includes the following steps: cutting out the key area to obtain a new cut image so as to continuously train and optimize a new classification neural network; or adding a channel in the image to be processed to store the thermodynamic diagram, and continuing to train and optimize the new classification neural network.
Preferably, after step S1 and before step S2, it is further determined whether the neural network training of step S1 has converged; if so, the process proceeds to step S2, and if not, it returns to step S1 to continue training. The activation map obtained in step S2 is specifically the activation map on the last convolutional layer during forward inference by the neural network.
preferably, in step S3, the converting the classification result into the encoding result specifically includes: and performing one-hot coding (one-bit effective coding) on the classification result, wherein each classification corresponds to one-hot coding.
In order to solve the above technical problems, the present invention provides another technical solution: an image key region detection system, comprising: a training module configured to train to obtain a neural network; the activation map acquisition module is configured to input an image to be processed, and forward reasoning is carried out on the image to be processed based on the neural network so as to obtain a classification result of commodities in the image to be processed and a required activation map; a gradient map acquisition module configured to convert the classification result into an encoding result and perform back propagation on the encoding result as a gradient in the neural network to obtain a required gradient map; and a gradient activation map obtaining module configured to calculate an average gradient value of pixels corresponding to each channel in a gradient map corresponding to the last convolutional layer; multiplying the obtained average gradient value by the corresponding channel number to obtain a product value of the corresponding gradient map; and performing weighted average operation on the product values and the activation maps obtained in the step S2 respectively to obtain a required gradient activation map, wherein the gradient activation map can represent key areas of the image to be processed.
Preferably, the image key region detection system further includes: the judging module is used for judging whether the neural network obtained by training in the training module is trained and converged; the image conversion module is used for converting the gradient activation map into a thermodynamic map; and the key area acquisition module is used for overlapping the thermodynamic diagram and the image to be processed to obtain a key area and optimizing a new classification neural network based on the key area.
Preferably, the image conversion module further comprises: the mean value calculating unit is used for calculating the mean value of the gradient maps corresponding to different convolutional layers; the product calculation unit is used for multiplying the average value of the gradient maps corresponding to the different convolutional layers by the channel number of the different convolutional layers to obtain a product value of the corresponding gradient maps; and the weighted average calculation unit is used for carrying out weighted average operation on the product values and the activation map respectively so as to obtain the required gradient activation map.
In order to solve the above technical problems, the present invention provides another technical solution: a terminal device characterized by: the terminal device comprises a storage unit and a processing unit, wherein the storage unit is used for storing a computer program, and the processing unit is used for executing the steps in the image key area detection method through the computer program stored in the storage unit.
Compared with the prior art, the image key area detection method, the system and the terminal equipment provided by the invention have the following beneficial effects:
the invention provides a method and a system for detecting key regions of images, which train a classification network for commodities to be classified by using a deep learning method, forward reason the commodities to be classified by using the neural network to obtain the classes of the commodities and an activation map, further perform backward propagation by using codes matched with the classes of the commodities as the gradients of the neural network to obtain a gradient activation map capable of reflecting the key regions of the images to be processed, thereby removing non-important regions with low correlation degree with required classification information, effectively reducing background interference of original images to be processed, and further concentrating on the key regions with higher discrimination degree.
Different from the existing method needing manual labeling, the method and the system for detecting the key region of the image can be obtained through self-training of a neural network, and the key region obtained through the method and the system can have better robustness (Robust) compared with the key region manually labeled.
Further, in the invention, the obtained gradient activation map is converted into a thermodynamic map, so that the visualization of the neural network is realized, and the region of the neural network, which is concerned by the commodity, is obtained and used as the key region of the commodity. Therefore, the detection of the key areas of the commodities can be automatically completed, and the performance of the classification network can be further improved by utilizing the key areas.
In the invention, the obtained key area is put into a new classification neural network for training, and the new classification neural network can be further optimized, so that the performance of the neural network can be rapidly improved, and the automatic detection process can save a large amount of time and labor cost.
In order to further improve the robustness of the trained neural network, the image key area detection method described above judges whether the neural network has been trained to convergence; if not, the data set continues to be used to train it. This guarantees the accuracy with which the trained neural network classifies the commodities.
In the invention, one-hot coding is used to encode the classification result of the commodity, directly converting it into a corresponding multidimensional vector. The one-hot code is back-propagated as the gradient of the neural network to obtain the corresponding gradient map, and the required key area is obtained based on the selected one-hot code; the key area is related to the category that the one-hot code represents. Selecting the one-hot code in this way further improves the flexibility and accuracy of detecting key areas in the image to be processed.
In the invention, the activation map is obtained by forward reasoning of the neural network and the gradient map obtained by backward propagation of the activation map is synthesized to obtain the gradient activation map, which can reflect the important region of the selected commodity category in the image to be processed without manual marking, thereby greatly reducing the cost of manpower, material resources and time and avoiding the problem of inaccurate key region of the image due to manual marking.
In the invention, two processing modes of the acquired key area, namely cutting the key area to obtain a new cut image or adding a channel in the image to be processed to store the thermodynamic diagram, can be used for further optimizing a new classification neural network, thereby improving the performance of the commodity classification neural network.
The present invention also provides a terminal device comprising a storage unit for storing a computer program and a processing unit for executing the steps in the image key region detection method by the computer program stored in the storage unit. Therefore, the terminal device also has the same beneficial effects as the image key area detection method, and details are not repeated herein.
[ description of the drawings ]
Fig. 1A is a schematic flow chart illustrating a step of a method for detecting a key region of an image according to a first embodiment of the present invention.
FIG. 1B is a flowchart illustrating steps of another embodiment of the method for detecting a key region of an image shown in FIG. 1A.
FIG. 2 is a flowchart illustrating the step of determining whether the neural network training converges after the step S1 and before the step S2 shown in FIG. 1A.
Fig. 3 is a schematic diagram of the specific flow of step S4 shown in fig. 1A.
Fig. 4A is a schematic diagram of a specific flow of step S6 shown in fig. 2.
Fig. 4B is a schematic diagram of the detailed flow steps of another embodiment of step S6 in fig. 4A.
Fig. 5 is a block diagram of an image key region detection system according to a second embodiment of the present invention.
FIG. 6 is a functional block diagram of another embodiment of the image key region detection system shown in FIG. 5.
Fig. 7 is a detailed block diagram of the image conversion module shown in fig. 6.
Fig. 8 is a block diagram of a terminal device according to a third embodiment of the present invention.
Reference is made to the accompanying drawings in which:
20, an image key area detection system; 21, a training module; 22, acquiring an activation map module; 23, acquiring a gradient map module; 231, an encoding module; 24, acquiring a gradient activation map module; 25, a judging module; 26, an image conversion module; 261, a mean value calculation unit; 262, a product calculation unit; 263, weighted average calculating unit; 27, acquiring a key area module;
30, a terminal device; 31, a storage unit; 32, a processing unit; 33, an input section; 34, an output section; 35, a communication section.
[ detailed description ] embodiments
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1A, a first embodiment of the invention provides a method for detecting a key region of an image, which includes the following steps:
step S1, training to obtain a neural network;
step S2, inputting an image to be processed, and carrying out forward reasoning on the neural network based on the image to be processed to obtain a classification result of the commodity in the image to be processed and a required activation map (activation map);
step S3, converting the classification result into a coding result, and reversely propagating the coding result as the gradient in the neural network to obtain a required gradient map; and
in step S4, the activation map and the gradient map are combined to obtain a gradient activation map (gradient activation map), which may represent a key region of the image to be processed.
Optionally, in order to further obtain a more accurate key area and further utilize the obtained key area, after the step S4, as shown in fig. 1B, the image key area detection method may further include the following steps:
step S5, converting the gradient activation map into a thermodynamic map; and
and step S6, superposing the thermodynamic diagram and the image to be processed to obtain a key area, and optimizing the new classification neural network based on the key area.
In step S1, the neural network may be any neural network usable for classifying the objects to be classified; specifically, the classification network used here may be any convolutional neural network (CNN), including but not limited to an AlexNet, VGG, or ResNet network.
In this embodiment, commodity classification is taken as an example. The images to be processed often contain commodities such as bottles, boxes, bags, or cases; because these commodities have many features and the features of different commodities are often similar, the difficulty of classification increases.
In the embodiment, a proper neural network for classifying the commodities is selected and trained, so that the accuracy of detecting the key regions of the images can be improved.
Specifically, in order to improve the stability of the neural network obtained by training, as shown in fig. 2, after the step S1, before the step S2 is performed, the following steps are further required:
and step S1-2, judging whether the neural network training in the step S1 converges, if yes, entering the step S2, and if not, returning to the step S1 to continue the training.
Specifically, in some embodiments of the present invention, whether the neural network has converged can be determined from the loss function of the trained network, which directly reflects the accuracy of the network's predictions. When the loss function no longer changes significantly, the network can be considered trained to a near-optimal state with good accuracy. In the present invention, the loss function may include, but is not limited to, a square loss function, a logarithmic loss function, a cross-entropy loss function, and the like.
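The convergence judgment of step S1-2 can be sketched as a simple loss-plateau check; the window size and tolerance below are illustrative assumptions, not values given by the patent.

```python
def has_converged(losses, window=5, tol=1e-3):
    """Heuristic convergence test: training is considered converged when the
    average loss over the last `window` epochs differs from the average over
    the preceding `window` epochs by less than `tol`.
    (Illustrative only; the patent does not fix a specific criterion.)"""
    if len(losses) < 2 * window:
        return False
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return abs(previous - recent) < tol
```

In use, training (step S1) would append each epoch's loss and return to step S1 until this check passes, after which the method proceeds to step S2.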
In step S2, the image to be processed is input into the trained neural network and forward inference is performed; suitable convolution kernels (filters) are selected and the corresponding feature maps (feature map) are obtained. During forward inference, each convolutional layer may contain one or more convolution kernels, and each kernel attends to one image feature, such as a vertical edge, a horizontal edge, a color, or a texture of the image to be processed. In the invention, convolutional layers farther from the input layer have more convolution kernels, embody more detailed feature information, and can detect and identify more features. Once forward inference is complete, the classification result of the commodities in the image to be processed is obtained.
In step S2, during forward inference by the neural network, the feature map on the last convolutional layer is the activation map; it represents high-level semantic information (high-level features) of the image, which directly reflects the classification information of the commodity in the image to be processed.
For example, in some specific embodiments, it is desirable to classify an object, such as a bottle, box, bag or box, in the image to be processed, and the characteristic used for classification may be label text, shape or color of the bottle, box, bag or box. The corresponding activation map on the last convolutional layer may represent the classification characteristics of the bottle, box, bag or box corresponding to the classification result.
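The feature maps described above can be illustrated with a minimal single-channel convolution in NumPy; the `vertical_edge` kernel and all names are illustrative, not taken from the patent.

```python
import numpy as np

def feature_map(image, kernel):
    """Valid cross-correlation of a single-channel image with one kernel,
    producing one channel of a feature map (a toy stand-in for the maps a
    CNN's convolutional layers compute)."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            out[y, x] = np.sum(image[y:y + kh, x:x + kw] * kernel)
    return out

# A vertical-edge kernel responds strongly where intensity changes
# left-to-right, illustrating how each kernel attends to one image feature
# (edges, color, texture, ...).
vertical_edge = np.array([[1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0],
                          [1.0, 0.0, -1.0]])
```

A real classification network stacks many such kernels per layer; the activation map of step S2 is simply the stack of these per-kernel maps at the last convolutional layer.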
In step S3, the classification result is converted into an encoding result; specifically, one-hot encoding (one-bit effective encoding) is performed on the classification result obtained in step S2. One-hot encoding represents a categorical variable as a binary vector: each category value is first mapped to an integer, and each integer is then represented as a vector that is all zeros except at the index of that integer, which is marked 1. That is, in the present invention, each class corresponds to one one-hot code.
Specifically, assuming the classification results to be encoded comprise 6 classes, the one-hot code for the first class is (1,0,0,0,0,0), and the one-hot code for the fourth class is (0,0,0,1,0,0).
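The one-hot coding above (first class → (1,0,0,0,0,0), fourth class → (0,0,0,1,0,0)) can be sketched as:

```python
import numpy as np

def one_hot(class_index, num_classes):
    """Return the one-hot vector for `class_index` (0-based) among
    `num_classes` classes: all zeros except a 1 at the class index."""
    vec = np.zeros(num_classes)
    vec[class_index] = 1.0
    return vec
```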
In step S3, the one-hot code is further used as the neural network's gradient for the commodity class and back-propagated to obtain the gradient map on the last convolutional layer; in the neural network, every convolutional layer has a gradient map. Treating the one-hot code that reflects the commodity classification as the gradient of the neural network makes it possible to judge directly, from the corresponding gradient map, whether each convolutional layer raises or lowers the probability that the network classifies the commodity correctly.
For example, if the class predicted in step S2 is the first class, the corresponding one-hot code (1,0,0,0,0,0) is back-propagated as the network's gradient for the first class of commodity, yielding gradient maps for the convolutional layers; the gradient map on the last convolutional layer is taken, as it can reflect the key areas of the first class of commodity. To obtain the key areas of another class of commodity, the one-hot code is simply replaced with the code of the required class.
The number and manner of the above categories are merely examples; in some specific embodiments, the one-hot codes may correspond to tens, hundreds, or thousands of categories, without limitation herein.
In step S4, as shown in fig. 3, the step of synthesizing the activation map and the gradient map to obtain the gradient activation map specifically includes the following steps:
step S41, calculating the average gradient value of the corresponding pixel of each dimension channel in the gradient map corresponding to the last convolutional layer; specifically, in the present invention, the outputs of a convolution layer share P-dimensional channels, which respectively correspond to P key point positions, and the output of the convolution feature is a tensor of dimension W × H × P, where P denotes the number of channels, W denotes the width of the gradient map of the output, and H denotes the height of the gradient map of the output; each dimension of the convolutional layer channel can be represented as a W × H dimension matrix.
And step S42, multiplying the average gradient value obtained in step S41 by the corresponding channel number to obtain the product value of the corresponding gradient map.
Step S43: perform a weighted average of the product values with the activation map obtained in step S2 to obtain the required gradient activation map. Specifically, each element of the activation map obtained in step S2 is a floating-point number; after the weighted average operation, the resulting weighted mean is assigned to the corresponding pixel, yielding the required gradient activation map. In this step, only the feature map of the last convolutional layer in the forward inference is selected as the activation map, since it contains high-level semantic information and correlates strongly with the classification task, which further improves the classification accuracy.
Based on the above steps, the input of each channel can be converted into a gradient activation map with the same size as the original image. The region with a stronger response in the gradient activation map may represent a local region in the original image. For any commodity, a rough local mapping is generated on the last convolutional layer to highlight the regions which have important effect on the prediction data, and the position with the strongest response in the gradient activation map can be regarded as the corresponding key region in the original image.
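Steps S41–S43 can be sketched as follows, assuming the activation map and its gradient map at the last convolutional layer are already available as H × W × P arrays. This is a free reading in the spirit of Grad-CAM, not a literal transcription: the channel-count multiplication of step S42 is folded into the channel-wise average, and the final ReLU is common practice rather than something the patent states.

```python
import numpy as np

def gradient_activation_map(activation, gradient):
    """Combine the last-layer activation map (H, W, P) with its gradient map
    (H, W, P): the mean gradient of each channel weights that activation
    channel, and the weighted channels are averaged into one (H, W) map."""
    weights = gradient.mean(axis=(0, 1))       # S41: one mean gradient per channel
    cam = (activation * weights).mean(axis=2)  # S42/S43: weighted channel average
    return np.maximum(cam, 0)                  # keep positively contributing regions
```

The strongest responses in the returned map correspond to the key regions of the image to be processed, as described above.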
In step S5, converting the gradient activation map into a thermodynamic map specifically includes the following step:
The channel-wise average values of the gradient activation map are averaged; the result is the thermodynamic map (heat map) for the selected class. Converting the gradient activation map into a thermodynamic map visualizes the key area of the image to be processed, so that the extent and position of the key area can be seen more intuitively, which helps improve classification accuracy.
In the step S6, please refer to fig. 4A, the thermodynamic diagram is superimposed on the image to be processed to obtain a key region, and the new classification neural network is optimized based on the key region, which specifically includes the following steps:
step S61, superposing the thermodynamic diagram and the image to be processed, and taking the superposed area of the thermodynamic diagram and the image to be processed as a key area; as an example of the present invention, the superposition of the thermodynamic diagram and the image to be processed may be implemented by using an OpenCV function.
And step S62, using the key area to continuously train and optimize a new classification neural network.
Specifically, as further shown in fig. 4B, the step S62 may be specifically subdivided into the following steps:
step S621, cutting out the key area to obtain a new cut image so as to continuously train and optimize a new classification neural network; or
Step S622: add a channel to the image to be processed to store the thermodynamic diagram, and continue to train and optimize a new classification neural network. This channel is analogous to the three RGB channels or a grayscale image channel: it stores the thermodynamic diagram and can be superimposed on the image to be processed so that the corresponding key area is extracted.
In the above steps S621 and S622, the corresponding steps may further use the obtained key region to train and optimize a new classification neural network, so as to facilitate improving the performance of the neural network.
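The two processing modes of steps S621 and S622 can be sketched as follows; the threshold used to bound the key region, and a single-channel image for the crop, are illustrative assumptions.

```python
import numpy as np

def crop_key_region(image, heatmap, thresh=0.5):
    """Step S621 sketch: crop the bounding box of heat-map responses above
    `thresh` out of the image (threshold value is illustrative)."""
    ys, xs = np.where(heatmap >= thresh)
    if ys.size == 0:          # no response above threshold: keep image as-is
        return image
    return image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def append_heatmap_channel(image, heatmap):
    """Step S622 sketch: store the heat map as an extra channel alongside
    the existing channels (e.g. RGB) of the image to be processed."""
    return np.dstack([image, heatmap])
```

Either output, the cropped image or the channel-augmented image, is then fed to the new classification neural network for continued training and optimization.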
Referring to fig. 5, a second embodiment of the invention provides an image key region detection system 20, which includes:
a training module 21 configured to train to obtain a neural network;
the activation map acquisition module 22 is configured to input an image to be processed, and perform forward reasoning on the image to be processed based on the neural network so as to obtain a classification result of the commodity in the image to be processed and a required activation map;
a gradient map obtaining module 23 configured to convert the classification result into an encoding result, and perform back propagation on the encoding result as a gradient in the neural network to obtain a required gradient map; and
a gradient activation map acquisition module 24 configured to synthesize the activation map and the gradient map to obtain a gradient activation map.
Referring to fig. 5, in the gradient map obtaining module 23, the method further includes:
the encoding module 231 is configured to perform one hot encoding on the classification result, so that each classification corresponds to one onehot encoding. The one hot code may correspond to an arrangement order of the classification results, wherein, assuming that there are 6 classes in the classification results, the one hot code corresponding to the first class is (1,0,0, 0), and the one hot code corresponding to the fourth class is (0,0,0,1,0, 0).
In the present embodiment, one hot encoding that can reflect the commodity classification is equivalent to the gradient of the neural network. Based on the corresponding gradient map, it can be directly judged whether the accurate probability for the commodity classification is improved or reduced after the neural network passes through the convolution layer.
In order to further determine the stability of the neural network obtained by training the training module, please refer to fig. 6, the image key region detection system further includes:
a judging module 25, configured to judge whether the neural network obtained by training in the training module is training converged;
the specific judgment of the judgment module 25 includes:
if the neural network is judged to be trained and converged, the activation map acquisition module 22 is configured to input an image to be processed into the neural network, and perform forward reasoning on the image to be processed based on the neural network to obtain a classification result of a commodity in the image to be processed and a required activation map;
if the neural network is not trained and converged, the training module 21 continues to train the neural network.
The specific related limitations for determining whether the neural network is trained and converged are the same as those described in the first embodiment, and are not repeated herein.
Optionally, in order to find a more suitable key region and use it to optimize a new classification neural network, please continue to refer to fig. 6; the image key region detection system 20 may further include:
an image conversion module 26 for converting the gradient activation map into a thermodynamic map; and
and a key region acquiring module 27, configured to superimpose the thermodynamic diagram and the image to be processed to obtain a key region, and optimize a new classification neural network based on the key region.
Specifically, with continuing reference to fig. 7, the image conversion module 26 may further include:
an average value calculating unit 261, configured to calculate the average value of the gradient map corresponding to the last convolutional layer; specifically, the average gradient value of the pixels in each channel of the gradient map on that convolutional layer is computed, yielding one average value per channel.
A product calculating unit 262, configured to multiply the average value of the gradient map corresponding to the convolutional layer by the number of channels of the corresponding convolutional layer to obtain a product value of the corresponding gradient map; and
and a weighted average calculating unit 263, configured to perform a weighted average operation on the product values and the activation map respectively to obtain a required gradient activation map. Specifically, after weighted average operation is performed, the obtained weighted average is assigned to the corresponding pixel point, so that a required gradient activation map is obtained.
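The three units above can be sketched together as follows. This is an interpretation of the patent's wording (close to Grad-CAM), not verbatim code; the final ReLU is an added assumption borrowed from Grad-CAM:

```python
# Sketch of the gradient activation map computation. `activations` and
# `gradients` are the (C, H, W) feature map and gradient map taken from
# the last convolutional layer.
import numpy as np

def gradient_activation_map(activations, gradients):
    c = activations.shape[0]
    # Average gradient value of the pixels in each channel ...
    channel_means = gradients.mean(axis=(1, 2))
    # ... multiplied by the channel count to give the "product value".
    products = channel_means * c
    # Weighted average of the product values with the activation map,
    # assigned to each pixel. The factor c cancels, leaving the usual
    # Grad-CAM-style weighted sum: sum_i mean_grad_i * A_i.
    cam = (products[:, None, None] * activations).mean(axis=0)
    return np.maximum(cam, 0.0)  # keep positive evidence (an assumption)
```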
Referring to fig. 8, a third embodiment of the present invention provides a terminal device 30 for implementing the image key area detection method, where the terminal device 30 includes a storage unit 31 and a processing unit 32, the storage unit 31 is used for storing a computer program, and the processing unit 32 is used for executing steps in the image key area detection method through the computer program stored in the storage unit 31.
In some specific embodiments of the present invention, the terminal device 30 may be hardware or software. When the terminal device is hardware, it may be any of various electronic devices having a display screen and supporting video playing, including but not limited to a smart phone, a tablet computer, an e-book reader, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a laptop portable computer, a desktop computer, and the like. When the terminal device is software, it can be installed in the electronic devices listed above. It may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module, and is not particularly limited herein.
The storage unit 31 includes a read-only memory (ROM), a random access memory (RAM), a hard disk, and the like. The processing unit 32 may perform various appropriate actions and processes according to a program stored in the ROM or loaded into the RAM. The RAM also stores various programs and data necessary for the operation of the terminal device 30.
As shown in fig. 8, the terminal device 30 may further include an input section 33 including a keyboard, a mouse, and the like; an output section 34 including a cathode ray tube (CRT) or liquid crystal display (LCD), a speaker, and the like; and a communication section 35 including a network interface card such as a LAN card, a modem, or the like. The communication section 35 performs communication processing via a network such as the Internet.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the disclosed embodiments of the invention may include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 35.
The computer program, when executed by the processing unit 32, performs the above-described functions defined in the image key area detection method of the present application. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In the present application, a computer readable storage medium may also be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures of the present application illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will be understood that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present invention may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: the image key area detection system comprises a training module, an activation image acquisition module, a gradient image acquisition module and a gradient activation image acquisition module. Wherein the names of the modules do not in some cases constitute a limitation of the module itself.
As another aspect, the fourth embodiment of the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above-described embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: training to obtain a neural network; inputting an image to be processed, and carrying out forward reasoning on the image to be processed based on the neural network so as to obtain a classification result of commodities in the image to be processed and a required activation map; converting the classification result into a coding result, and performing backward propagation on the coding result as a gradient in the neural network to obtain a required gradient map; and synthesizing the activation map and the gradient map to obtain a gradient activation map, wherein the gradient activation map can represent key areas of the image to be processed.
Compared with the prior art, the image key area detection method, the system and the terminal equipment provided by the invention have the following beneficial effects:
The invention provides an image key region detection method and system. A deep learning method is used to train a classification network for the commodities to be classified; the neural network performs forward reasoning on the commodities to be classified to obtain the commodity class and an activation map; a code matched with the commodity class is then back-propagated as the gradient of the neural network to obtain a gradient activation map that reflects the key region of the image to be processed. Non-important regions with low correlation to the required classification information are thereby removed, effectively reducing background interference in the original image to be processed and allowing the network to concentrate on the key regions with higher discrimination.
Different from existing methods that require manual labeling, the image key region detection method and system obtain the key region through self-training of the neural network, and the key region obtained in this way has better robustness than one labeled manually.
Further, in the invention, the obtained gradient activation map is converted into a thermodynamic map, realizing visualization of the neural network: the region of the commodity that the neural network attends to is obtained and used as the key region of the commodity. Detection of commodity key areas can thus be completed automatically, and these key areas can be used to further improve the performance of the classification network.
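A minimal, dependency-free sketch of the heatmap conversion and superposition is given below. A real implementation would typically use cv2.resize and cv2.applyColorMap; the nearest-neighbour resize and grayscale blend here are illustrative simplifications:

```python
# Convert a gradient activation map into a heatmap and superimpose it
# on the image to be processed.
import numpy as np

def to_heatmap(cam, out_hw):
    """Normalize a gradient activation map to [0, 1] and resize it
    (nearest-neighbour) to the size of the image to be processed."""
    cam = cam - cam.min()
    cam = cam / (cam.max() + 1e-8)
    h, w = out_hw
    rows = np.arange(h) * cam.shape[0] // h
    cols = np.arange(w) * cam.shape[1] // w
    return cam[rows][:, cols]

def overlay(image, heatmap, alpha=0.4):
    """Superimpose the heatmap on a grayscale image with values in [0, 1]."""
    return (1 - alpha) * image + alpha * heatmap
```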
In the invention, the obtained key area is put into a new classification neural network for training, which further optimizes the new classification neural network and rapidly improves its performance; the automatic detection process saves a large amount of time and labor cost.
In order to further improve the robustness of the trained neural network, the image key area detection method judges whether the neural network has converged; if it has not, the data set continues to be used to train the neural network, thereby ensuring the accuracy of the trained neural network in classifying the commodities.
In the invention, one-hot coding is used to encode the commodity classification result, directly converting it into a corresponding multidimensional vector. The one-hot code is back-propagated as the gradient of the neural network to obtain the corresponding gradient map, and the required key region is obtained based on the selected one-hot code; the key region is thus tied to the corresponding category in the one-hot code, which further improves the flexibility and accuracy of detecting key regions in the image to be processed.
In the invention, the activation map obtained by forward reasoning of the neural network and the gradient map obtained by back propagation are synthesized into the gradient activation map, which reflects the important region of the selected commodity category in the image to be processed without manual labeling, greatly reducing the cost in manpower, material resources, and time, and avoiding the inaccurate key regions caused by manual labeling.
In the invention, the acquired key area can be processed in two ways: cutting out the key area to obtain a new cropped image, or adding a channel to the image to be processed to store the thermodynamic diagram. Either result can be used to further optimize a new classification neural network, thereby improving the performance of the commodity classification network.
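The two processing modes can be sketched as follows; the 0.5 threshold used to locate the key area is an illustrative assumption, not a value fixed by the patent:

```python
# Illustrative sketches of the two key-area processing modes.
import numpy as np

def crop_key_region(image, heatmap, thresh=0.5):
    """Mode 1: cut out the bounding box of heatmap pixels above `thresh`
    to obtain a new cropped image for further training."""
    ys, xs = np.where(heatmap >= thresh)
    if ys.size == 0:
        return image  # nothing salient found: fall back to the full image
    return image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]

def add_heatmap_channel(image, heatmap):
    """Mode 2: append the heatmap as an extra channel of the (H, W, C)
    image to be processed, giving an (H, W, C+1) input."""
    return np.concatenate([image, heatmap[..., None]], axis=-1)
```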
The present invention also provides a terminal device comprising a storage unit for storing a computer program and a processing unit for executing the steps in the image key region detection method by the computer program stored in the storage unit. Therefore, the terminal device also has the same beneficial effects as the image key area detection method, and details are not repeated herein.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents and improvements made within the spirit of the present invention are intended to be included within the scope of the present invention.

Claims (9)

1. A method for detecting key regions of an image is characterized in that: the method comprises the following steps:
step S1, training to obtain a neural network;
step S2, inputting an image to be processed, and carrying out forward reasoning on the image to be processed based on the neural network so as to obtain a classification result of commodities in the image to be processed and a required activation map; wherein, the activation map is a corresponding characteristic map on the last convolutional layer in the process of forward reasoning of the neural network;
step S3, converting the classification result into a coding result, and reversely transmitting the coding result as the gradient in the neural network to obtain a required gradient map; and
step S4, calculating the average gradient value of the corresponding pixel of each dimension channel in the gradient map corresponding to the last convolutional layer; multiplying the obtained average gradient value by the corresponding channel number to obtain a product value of the corresponding gradient map; respectively carrying out weighted average operation on the product values and the obtained activation map to obtain a required gradient activation map; the gradient activation map may represent key regions of the image to be processed.
2. The image key region detection method of claim 1, wherein: after the above step S4, the following steps are included:
step S5, converting the gradient activation map into a thermodynamic map; and
and step S6, superposing the thermodynamic diagram and the image to be processed to obtain a key area, and optimizing the new classification neural network based on the key area.
3. The image key region detection method of claim 2, wherein: in the step S6, the optimizing the new classification neural network based on the key area specifically includes the following steps: cutting out the key area to obtain a new cut image so as to continuously train and optimize a new classification neural network; or adding a channel in the image to be processed to store the thermodynamic diagram, and continuing to train and optimize the new classification neural network.
4. The image key region detection method of claim 1, wherein: after the step S1 and before performing the step S2, it is also necessary to determine whether the neural network trained in the step S1 has converged; if yes, the process proceeds to the step S2, and if no, the process returns to the step S1 to continue training; the activation map obtained in the step S2 is specifically the activation map on the last convolutional layer during forward reasoning by the neural network.
5. The image key region detection method of claim 1, wherein: in step S3, converting the classification result into an encoding result specifically includes: and performing one-hot coding on the classification result, wherein each classification corresponds to one-hot coding.
6. An image key region detection system, characterized by: it includes:
a training module configured to train to obtain a neural network;
the activation map acquisition module is configured to input an image to be processed, and forward reasoning is carried out on the image to be processed based on the neural network so as to obtain a classification result of commodities in the image to be processed and a required activation map;
a gradient map acquisition module configured to convert the classification result into an encoding result and perform back propagation on the encoding result as a gradient in the neural network to obtain a required gradient map; and
a gradient activation map obtaining module configured to calculate an average gradient value of pixels corresponding to each channel in a gradient map corresponding to a last convolutional layer; multiplying the obtained average gradient value by the corresponding channel number to obtain a product value of the corresponding gradient map; and respectively carrying out weighted average operation on the product values and the obtained activation maps to obtain a required gradient activation map, wherein the gradient activation map can represent key areas of the image to be processed.
7. The image key region detection system of claim 6, wherein: the image key region detection system further includes:
the judging module is used for judging whether the neural network obtained by training in the training module is trained and converged;
the image conversion module is used for converting the gradient activation map into a thermodynamic map; and
and the key area acquisition module is used for overlapping the thermodynamic diagram and the image to be processed to obtain a key area and optimizing a new classification neural network based on the key area.
8. The image key region detection system of claim 7, wherein: the image conversion module further includes:
the mean value calculating unit is used for calculating the mean value of the gradient maps corresponding to different convolutional layers;
the product calculation unit is used for multiplying the average value of the gradient maps corresponding to the different convolutional layers by the channel number of the different convolutional layers to obtain a product value of the corresponding gradient maps; and
and the weighted average calculation unit is used for carrying out weighted average operation on the product values and the activation map respectively so as to obtain the required gradient activation map.
9. A terminal device characterized by: the terminal device includes a storage unit for storing a computer program and a processing unit for executing the steps in the image key area detection method according to any one of claims 1 to 5 by the computer program stored in the storage unit.
CN201910042460.3A 2019-01-16 2019-01-16 Image key area detection method and system and terminal equipment Active CN109858482B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910042460.3A CN109858482B (en) 2019-01-16 2019-01-16 Image key area detection method and system and terminal equipment

Publications (2)

Publication Number Publication Date
CN109858482A CN109858482A (en) 2019-06-07
CN109858482B true CN109858482B (en) 2020-04-14

Family

ID=66894938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910042460.3A Active CN109858482B (en) 2019-01-16 2019-01-16 Image key area detection method and system and terminal equipment

Country Status (1)

Country Link
CN (1) CN109858482B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110504029B (en) * 2019-08-29 2022-08-19 腾讯医疗健康(深圳)有限公司 Medical image processing method, medical image identification method and medical image identification device
CN110517771B (en) * 2019-08-29 2021-02-09 腾讯医疗健康(深圳)有限公司 Medical image processing method, medical image identification method and device
CN111046939B (en) * 2019-12-06 2023-08-04 中国人民解放军战略支援部队信息工程大学 Attention-based CNN class activation graph generation method
CN111723695A (en) * 2020-06-05 2020-09-29 广东海洋大学 Improved Yolov 3-based driver key sub-area identification and positioning method
CN111883177B (en) * 2020-07-15 2023-08-04 厦门熙重电子科技有限公司 Voice key information separation method based on deep learning
CN113743543B (en) * 2021-11-05 2022-02-08 武汉大学 Image classification training method and device, server and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1443446A1 (en) * 2001-10-05 2004-08-04 Riken Method of presuming domain linker region of protein
CN105718878A (en) * 2016-01-19 2016-06-29 华南理工大学 Egocentric vision in-the-air hand-writing and in-the-air interaction method based on cascade convolution nerve network
CN105718945A (en) * 2016-01-20 2016-06-29 江苏大学 Apple picking robot night image identification method based on watershed and nerve network
WO2016145379A1 (en) * 2015-03-12 2016-09-15 William Marsh Rice University Automated Compilation of Probabilistic Task Description into Executable Neural Network Specification
CN106778590A (en) * 2016-12-09 2017-05-31 厦门大学 It is a kind of that video detecting method is feared based on convolutional neural networks model cruelly
CN108564109A (en) * 2018-03-21 2018-09-21 天津大学 A kind of Remote Sensing Target detection method based on deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2018005420A (en) * 2016-06-29 2018-01-11 富士通株式会社 Information processing unit, learning network learning value calculation program and learning network learning value calculation method

Similar Documents

Publication Publication Date Title
CN109858482B (en) Image key area detection method and system and terminal equipment
CN111476309B (en) Image processing method, model training method, device, equipment and readable medium
US20200193228A1 (en) Image question answering method, apparatus and system, and storage medium
CN110276346B (en) Target area recognition model training method, device and computer readable storage medium
CN109740018B (en) Method and device for generating video label model
CN110929774A (en) Method for classifying target objects in image, method and device for training model
CN109918513B (en) Image processing method, device, server and storage medium
CN111177467A (en) Object recommendation method and device, computer-readable storage medium and electronic equipment
CN116310850B (en) Remote sensing image target detection method based on improved RetinaNet
CN114037985A (en) Information extraction method, device, equipment, medium and product
CN112712036A (en) Traffic sign recognition method and device, electronic equipment and computer storage medium
CN111291715B (en) Vehicle type identification method based on multi-scale convolutional neural network, electronic device and storage medium
CN114972944A (en) Training method and device of visual question-answering model, question-answering method, medium and equipment
CN109492697B (en) Picture detection network training method and picture detection network training device
CN114170233B (en) Image segmentation label generation method and device, electronic equipment and storage medium
CN116824291A (en) Remote sensing image learning method, device and equipment
CN113298822B (en) Point cloud data selection method and device, equipment and storage medium
US11562555B2 (en) Methods, systems, articles of manufacture, and apparatus to extract shape features based on a structural angle template
CN112015936B (en) Method, device, electronic equipment and medium for generating article display diagram
CN111369468B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN111612714B (en) Image restoration method and device and electronic equipment
CN111369429B (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN112801960A (en) Image processing method and device, storage medium and electronic equipment
CN113762260A (en) Method, device and equipment for processing layout picture and storage medium
CN112015999A (en) Risk prompting method, information prompting method, computer device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant