CN112329835A - Image processing method, electronic device, and storage medium - Google Patents
- Publication number: CN112329835A
- Application number: CN202011192053.XA
- Authority
- CN
- China
- Prior art keywords
- feature
- feature extraction
- specified
- fusing
- mode
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
  - G06—COMPUTING; CALCULATING OR COUNTING
    - G06F—ELECTRIC DIGITAL DATA PROCESSING
      - G06F18/00—Pattern recognition
        - G06F18/20—Analysing
          - G06F18/25—Fusion techniques
            - G06F18/253—Fusion techniques of extracted features
    - G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00—Computing arrangements based on biological models
        - G06N3/02—Neural networks
          - G06N3/04—Architecture, e.g. interconnection topology
            - G06N3/045—Combinations of networks
          - G06N3/08—Learning methods
Abstract
The invention provides an image processing method, an electronic device, and a storage medium. The method is applied to a CNN convolutional neural network and comprises: performing feature extraction on a target image based on a plurality of specified feature extraction modes to obtain a feature set corresponding to each mode, wherein the plurality of specified feature extraction modes comprise at least two of a Gabor feature extraction mode, a wavelet transform feature extraction mode, and a CNN convolution feature extraction mode; and fusing the feature sets corresponding to the plurality of specified feature extraction modes to obtain multi-feature fusion information, which is to be transmitted to a hidden layer of the CNN convolutional neural network. This technical solution improves the generalization of CNN convolutional neural network applications and, while improving the model recognition rate, reduces the labor and time costs of the development process.
Description
[ technical field ]
The present invention relates to the field of neural network technologies, and in particular, to an image processing method, an electronic device, and a storage medium.
[ background of the invention ]
In a CNN convolutional neural network used for object recognition in a target image, feature information is typically extracted in a specified mode and input into the hidden layers of the model for further processing. For example, the feature information of the target image may be extracted by a Gabor filter. Each specified mode imposes its own requirements on the model architecture, pooling strategy, and classification mode of the CNN convolutional neural network. As a result, developing such a network often requires extensive manual intervention to set and adjust the architecture, pooling strategy, and classification mode, which consumes substantial labor and time, is error-prone, and negatively affects the recognition rate of the network.
Therefore, reducing manual intervention in the development of CNN convolutional neural networks has become an urgent technical problem.
[ summary of the invention ]
The embodiments of the invention provide an image processing method, an electronic device, and a storage medium, aiming to solve the technical problem in the related art that a specified feature extraction mode consumes a large amount of labor cost during the development of a CNN convolutional neural network.
In a first aspect, an embodiment of the present invention provides an image processing method, including: performing feature extraction on a target image based on a plurality of specified feature extraction modes to obtain feature sets corresponding to the plurality of specified feature extraction modes, wherein the plurality of specified feature extraction modes comprise at least two of a Gabor feature extraction mode, a wavelet transform feature extraction mode, and a CNN convolution feature extraction mode; and fusing the feature sets corresponding to the plurality of specified feature extraction modes to obtain multi-feature fusion information, wherein the multi-feature fusion information is to be transmitted to a hidden layer of the CNN convolutional neural network.
In a second aspect, an embodiment of the present invention provides an electronic device, including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, the instructions being arranged to perform the method of any of the first aspects above.
In a third aspect, an embodiment of the present invention provides a storage medium, which stores computer-executable instructions for executing the method flow described in any one of the first aspect.
According to the above technical solution, during development of the CNN convolutional neural network, the feature sets obtained by the plurality of specified feature extraction modes are combined into multi-feature fusion information for subsequent processing by a hidden layer of the network. Because the multi-feature fusion information is derived from multiple specified feature extraction modes, a preset model architecture, pooling strategy, and classification mode compatible with all of these modes can be applied uniformly during development, and these elements no longer need to be manually adjusted for a single feature extraction mode in each development cycle. This improves the generalization of CNN convolutional neural network applications and, while improving the model recognition rate, reduces the labor and time costs of the development process.
[ description of the drawings ]
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. Obviously, the drawings described below show only some embodiments of the invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 shows a flow diagram of an image processing method according to an embodiment of the invention.
[ detailed description ]
For better understanding of the technical solutions of the present invention, the following detailed descriptions of the embodiments of the present invention are provided with reference to the accompanying drawings.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
Embodiment One
Fig. 1 shows a flow diagram of an image processing method according to an embodiment of the invention.
As shown in fig. 1, a flow of an image processing method according to an embodiment of the present invention includes:
Step 102: perform feature extraction on a target image based on a plurality of specified feature extraction modes to obtain feature sets corresponding to the specified feature extraction modes.
The multiple specified feature extraction modes comprise at least two of a Gabor feature extraction mode, a wavelet transformation feature extraction mode and a CNN convolution feature extraction mode.
The Gabor feature extraction mode employs a Gabor function, a linear filter for edge extraction; in the spatial domain, a two-dimensional Gabor filter is a Gaussian kernel function modulated by a sinusoidal plane wave. The Gabor filter has a real part and an imaginary part: filtering with the real part smooths the image, while filtering with the imaginary part detects edges, and the two together yield the corresponding feature set. A Gabor filter is configured with a filtering angle (the filtering direction) and a filtering scale. Optionally, the filtering angles are set to 0°, 30°, 60°, 90°, 120°, and 150°, so that filtering is performed in 6 different directions to obtain texture information along each of them.
Further, six filtering scales (i.e., wavelengths) are set, optionally 8, 10, 12, 14, 15, and 17 pixels. At each filtering scale, filtering in the 6 directions yields 6 feature maps, which are fused together into one fused feature map for that scale. Across the 6 filtering scales, 6 fused feature maps are thus obtained.
It should be understood that the filtering angle and the filtering scale are not limited to the above values, but may be any other values meeting the actual development requirements.
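As an illustrative sketch of the filter bank described above, the real part of a two-dimensional Gabor filter can be constructed directly from its definition as a Gaussian kernel modulated by a cosine wave. The kernel size, sigma, and gamma values below are assumptions not stated in the text; only the 6 angles and 6 wavelengths are taken from it.

```python
import numpy as np

def gabor_kernel(wavelength, theta, sigma=4.0, gamma=0.5, size=21):
    """Real part of a 2-D Gabor kernel: a Gaussian envelope modulated by a cosine wave."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    # Rotate coordinates by the filtering angle theta
    x_t = x * np.cos(theta) + y * np.sin(theta)
    y_t = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_t**2 + gamma**2 * y_t**2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_t / wavelength)
    return envelope * carrier

# The 6 filtering directions and 6 filtering scales (wavelengths, in pixels) given in the text
angles = np.deg2rad([0, 30, 60, 90, 120, 150])
wavelengths = [8, 10, 12, 14, 15, 17]

bank = [gabor_kernel(w, t) for w in wavelengths for t in angles]
```

Convolving the target image with each kernel of one wavelength and fusing the 6 responses would yield the per-scale fused feature map described above.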
The wavelet transform feature extraction mode may employ a two-dimensional wavelet transform, which decomposes the target image into low-frequency-domain feature information and high-frequency-domain feature information to serve as the obtained feature set. The low-frequency-domain information captures the approximate outline of the target image, while the high-frequency-domain information comprises horizontal, vertical, and diagonal high-frequency components that show the details of the image.
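For illustration, a one-level two-dimensional Haar decomposition (one possible choice of wavelet basis; the text does not fix one) can be sketched as follows, yielding the low-frequency approximation plus the three high-frequency detail components:

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar wavelet transform.

    Returns the low-frequency approximation (LL) plus the horizontal (LH),
    vertical (HL) and diagonal (HH) high-frequency detail components.
    """
    a = img[0::2, 0::2].astype(float)   # even rows, even cols
    b = img[0::2, 1::2].astype(float)   # even rows, odd cols
    c = img[1::2, 0::2].astype(float)   # odd rows, even cols
    d = img[1::2, 1::2].astype(float)   # odd rows, odd cols
    ll = (a + b + c + d) / 4.0          # approximate outline
    lh = (a + b - c - d) / 4.0          # horizontal detail
    hl = (a - b + c - d) / 4.0          # vertical detail
    hh = (a - b - c + d) / 4.0          # diagonal detail
    return ll, lh, hl, hh

img = np.arange(64, dtype=float).reshape(8, 8)
ll, lh, hl, hh = haar_dwt2(img)
```

On a constant image all three detail components are zero, which matches the intuition that high-frequency components carry only the image details.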
In the CNN (Convolutional Neural Network) convolution feature extraction mode, 6 random convolution kernels can be set in a TensorFlow system to convolve the target image and obtain the corresponding feature set.
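The text sets the random kernels in a TensorFlow system; as a framework-independent sketch of the same idea (the 3×3 kernel size and the plain "valid" convolution are assumptions), the feature set can be produced as follows:

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_valid(img, kernel):
    """Plain 'valid' 2-D convolution (no padding, stride 1)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * kernel)
    return out

# 6 random convolution kernels, as in the text; sizes are illustrative
kernels = [rng.standard_normal((3, 3)) for _ in range(6)]
img = rng.standard_normal((16, 16))
feature_set = [conv2d_valid(img, k) for k in kernels]
```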
In one possible design, the multiple specified feature extraction modes include a Gabor feature extraction mode, a wavelet transform feature extraction mode, and a CNN convolution feature extraction mode.
In one possible design, the plurality of specified feature extraction modes include a wavelet transform feature extraction mode and a CNN convolution feature extraction mode.
In one possible design, the multiple specified feature extraction modes include a Gabor feature extraction mode and a CNN convolution feature extraction mode.
In one possible design, the multiple specified feature extraction modes include a Gabor feature extraction mode and a wavelet transform feature extraction mode.
Step 104: fuse the feature sets corresponding to the plurality of specified feature extraction modes to obtain multi-feature fusion information, wherein the multi-feature fusion information is to be transmitted to a hidden layer of the CNN convolutional neural network.
During development of the CNN convolutional neural network, the feature sets obtained by the plurality of specified feature extraction modes are combined into multi-feature fusion information; that is, the fusion information carries the characteristics of all of these extraction modes. Therefore, a preset model architecture, pooling strategy, and classification mode compatible with the various specified feature extraction modes can be applied uniformly during development, and these elements do not need to be manually adjusted for a single feature extraction mode in each development cycle. This improves the generalization of CNN convolutional neural network applications and, while improving the model recognition rate, reduces the labor and time costs of the development process.
In one possible design, fusing the feature sets corresponding to the plurality of specified feature extraction modes includes: fusing the feature sets through a PCA fusion mode.
In PCA fusion, each feature set is projected into a new coordinate space by linear projection, so that the new components are ordered by information content: the first principal component carries the most information, and the basic condition satisfied by the components is that, while remaining mutually uncorrelated, they reflect the original variable information to the maximum extent.
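A minimal sketch of such a PCA projection via singular value decomposition, assuming the feature sets have already been concatenated into one sample-by-dimension matrix (the matrix shapes and component count below are illustrative):

```python
import numpy as np

def pca_fuse(features, n_components=2):
    """Project feature vectors onto their top principal components.

    `features` is an (n_samples, n_dims) matrix formed by concatenating the
    feature sets. The resulting components are mutually uncorrelated and
    ordered by explained variance, so the first principal component carries
    the most information.
    """
    centered = features - features.mean(axis=0)
    # SVD of the centered data: the right singular vectors are the principal axes
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(1)
x = rng.standard_normal((50, 10))
fused = pca_fuse(x, n_components=3)
```

The off-diagonal entries of the covariance of `fused` are (numerically) zero, illustrating the "mutually uncorrelated components" condition stated above.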
In another possible design, the fusing of the feature sets corresponding to the plurality of specified feature extraction modes includes a weighted fusion mode, which is described in detail below through Embodiment Two.
Embodiment Two
The weighted fusion mode comprises the following steps:
Step 202: obtain the recognition rates of the plurality of specified feature extraction modes for the same specified object.
Step 204: determine a weighting parameter corresponding to each specified feature extraction mode based on that mode's recognition rate.
For the same specified object, the feature information extracted by each feature extraction mode differs, so the modes differ in their ability to identify the object from the extracted features; that is, different modes have different recognition rates for the same specified object. The recognition rate reflects the probability that a feature extraction mode accurately identifies the specified object: the higher the recognition rate, the greater that mode's contribution to identifying the object. Each feature extraction mode therefore corresponds to a unique weighting parameter reflecting its level of contribution, and it can be understood that the recognition rate of a mode for the specified object determines its weighting parameter.
Step 206: determine the occupation ratio, within the multi-feature fusion information, of the feature set obtained by each specified feature extraction mode, based on the weighting parameter corresponding to that mode.
Among the plurality of specified feature extraction modes, the larger a mode's weighting parameter relative to the others, the greater its contribution to identifying the specified object; that is, each mode's contribution is positively correlated with its weighting parameter. To preserve each mode's level of contribution, the occupation ratio of each mode's feature set within the multi-feature fusion information is determined from that mode's weighting parameter, ensuring that the differences between the modes' contributions remain positively correlated with the differences between their weighting parameters.
Step 208: fuse the plurality of feature sets in a weighted fusion mode based on the occupation ratios corresponding to the feature sets.
In one possible design, when the plurality of specified feature extraction modes comprises the Gabor, wavelet transform, and CNN convolution feature extraction modes, the target image is processed by a Gabor filter to obtain feature set X1, by a wavelet transform filter to obtain feature set X2, and by CNN convolution to obtain feature set X3. Ratios a, b, and c corresponding to X1, X2, and X3 are then obtained by the method of Embodiment Two, where a + b + c = 1.
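A hedged sketch of the weighted fusion: it assumes the weighting parameter of each mode is simply its recognition rate normalized so that the occupation ratios sum to 1 (the text does not fix this mapping), and the recognition rates below are hypothetical.

```python
import numpy as np

def weighted_fuse(feature_sets, recognition_rates):
    """Fuse same-shaped feature sets with weights derived from recognition rates.

    Each occupation ratio is the mode's recognition rate normalized so that
    the ratios a, b, c, ... sum to 1 (a + b + c = 1 in the text).
    """
    rates = np.asarray(recognition_rates, dtype=float)
    ratios = rates / rates.sum()
    fused = sum(r * f for r, f in zip(ratios, feature_sets))
    return fused, ratios

# Hypothetical recognition rates for the Gabor (X1), wavelet (X2) and CNN (X3) modes
x1, x2, x3 = (np.ones((4, 4)) * k for k in (1.0, 2.0, 3.0))
fused, ratios = weighted_fuse([x1, x2, x3], [0.80, 0.70, 0.90])
```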
In addition, before step 206, the weighting parameters corresponding to the specified feature extraction modes may be corrected based on the type of the object to be recognized in the target image. Different types of objects have different personalized features, and for a given object to be recognized, its personalized features distinguish it from other objects better than its general features do; in other words, the personalized features contribute more to its recognition than the general features. The weighting parameter corresponding to the personalized features should therefore be raised to a higher level, so as to facilitate effective identification of the object. Specifically, based on the type of the object to be recognized in the target image, the weighting parameter of the feature set containing that type's designated personalized features is increased, ensuring that these features make a high recognition contribution in the final model.
In addition, before fusing the feature sets corresponding to the plurality of specified feature extraction modes, the method further comprises: performing normalization processing on all the feature sets. Fusing the feature sets then comprises performing feature fusion on all the normalized feature sets.
Because the feature sets produced by different specified feature extraction modes lie in different dimensions, fusing them directly would let the dimensional differences degrade the accuracy of the multi-feature fusion information and, in turn, the recognition rate of the final model. Normalization converts the feature sets of the different modes into the same dimension, eliminating the differences caused by mismatched dimensions and reducing their influence on the model recognition rate.
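As one possible normalization scheme (the text does not specify which is used), min-max scaling maps every feature set into the same [0, 1] range before fusion:

```python
import numpy as np

def min_max_normalize(feature_set):
    """Scale a feature set into [0, 1] so that sets from different
    extraction modes share the same range (dimension) before fusion."""
    lo, hi = feature_set.min(), feature_set.max()
    if hi == lo:                     # constant set: avoid division by zero
        return np.zeros_like(feature_set, dtype=float)
    return (feature_set - lo) / (hi - lo)

x = np.array([[8.0, 10.0], [12.0, 17.0]])
n = min_max_normalize(x)
```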
Embodiment Three
Building on Embodiments One and Two, fusing the feature sets corresponding to the plurality of specified feature extraction modes comprises: simplifying the feature sets corresponding to the modes and fusing the simplified feature sets. The method further comprises: applying the same simplification to the multi-feature fusion information and passing the simplified fusion information to the hidden layer of the CNN convolutional neural network.
The various specified feature extraction modes excel at recognizing different characteristics: some features contribute greatly to recognizing the object in the target image, while others contribute little. Processing all features during development of the CNN convolutional neural network would consume enormous time and system resources, so the feature sets corresponding to the various modes can be simplified to reduce subsequent consumption.
Similarly, the fused multi-feature fusion information can be simplified, so that the time resource and the system resource consumed subsequently are reduced.
The simplification is mainly used to identify and eliminate features that contribute insufficiently to the recognition rate of the objects identified by the CNN convolutional neural network. First, a number of feature values are selected according to a predetermined target ratio, such that the maximum weight among the selected values is smaller than the minimum weight among the remaining values; that is, all feature values are sorted, and the lowest-weighted fraction of size equal to the predetermined target ratio is selected.
Further, deleting all of the selected feature values outright would reduce the feature values too sharply and impair the validity of the CNN convolutional neural network. Instead, a partial subset of a designated proportion is randomly chosen from the selected lowest-weight values and simplified, so that a portion of the feature values is successfully simplified and subsequent time and system resource consumption is reduced, without the excessive reduction in feature values that would affect the validity of the network.
Specifically, the weights of the chosen partial feature values can be set to zero. Because the contribution of a feature value to the recognition rate is determined by its weight during CNN convolutional neural network development, setting the weight to zero reduces that contribution to zero, so the influence of those feature values on the recognition rate is no longer considered and the development process is simplified.
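The selection-and-zeroing procedure above can be sketched as follows; the target ratio, the kept proportion, and the random seed are illustrative assumptions, not values fixed by the text.

```python
import numpy as np

def simplify_weights(weights, target_ratio=0.2, keep_ratio=0.5, seed=0):
    """Zero out a random subset of the lowest-weight feature values.

    First select the `target_ratio` fraction of feature values with the
    smallest weights, then randomly zero only a `1 - keep_ratio` portion of
    them, so the effective feature count is not reduced too sharply.
    """
    rng = np.random.default_rng(seed)
    w = np.asarray(weights, dtype=float).copy()
    n_low = max(1, int(len(w) * target_ratio))
    low_idx = np.argsort(w)[:n_low]               # lowest-weight candidates
    n_zero = max(1, int(n_low * (1 - keep_ratio)))
    zero_idx = rng.choice(low_idx, size=n_zero, replace=False)
    w[zero_idx] = 0.0                             # contribution drops to zero
    return w

w = simplify_weights(np.arange(1.0, 21.0))        # 20 feature weights
```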
An electronic device according to an embodiment of the invention comprises at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor, the instructions being arranged to perform the scheme of any of the embodiments described above. The electronic device therefore has the same technical effects as those embodiments, which are not repeated here.
In any of the above embodiments, optionally, the CNN convolutional neural network comprises a LeNet5 model.
The electronic device of embodiments of the present invention exists in a variety of forms, including but not limited to:
(1) Mobile communication devices: characterized by mobile communication capability and primarily aimed at providing voice and data communication. Such terminals include smart phones (e.g., iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: belonging to the category of personal computers, with computing and processing functions, and generally also providing mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
(3) Portable entertainment devices: devices that can display and play multimedia content, including audio and video players (e.g., iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
(4) Servers: similar in architecture to general-purpose computers, but with higher requirements on processing capability, stability, reliability, security, scalability, and manageability, because they must provide highly reliable services.
(5) Other electronic devices with data interaction functions.
In addition, an embodiment of the present invention provides a storage medium, which stores computer-executable instructions for executing the method flow described in any of the foregoing embodiments.
The technical solution of the invention has been described in detail with reference to the accompanying drawings. Through this solution, the generalization of CNN convolutional neural network applications is improved, and while the model recognition rate is improved, the labor and time costs of the development process are reduced.
It should be understood that the term "and/or" herein merely describes an association between related objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, both A and B, or B alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following objects.
The word "if" as used herein may be interpreted as "when", "upon", "in response to determining", or "in response to detecting", depending on the context. Similarly, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to determining", "when (a stated condition or event) is detected", or "in response to detecting (a stated condition or event)", depending on the context.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (7)
1. An image processing method for a CNN convolutional neural network, comprising:
performing feature extraction on a target image based on a plurality of specified feature extraction modes to obtain feature sets corresponding to the plurality of specified feature extraction modes, wherein the plurality of specified feature extraction modes comprise at least two of a Gabor feature extraction mode, a wavelet transform feature extraction mode and a CNN convolution feature extraction mode;
and fusing the feature sets corresponding to the plurality of specified feature extraction modes to obtain multi-feature fusion information, wherein the multi-feature fusion information is to be transmitted to a hidden layer of the CNN convolutional neural network.
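The multi-mode extraction and fusion of claim 1 can be sketched as follows. This is an illustrative NumPy sketch and not the patent's implementation: the single real Gabor kernel, the one-level Haar wavelet transform, and the random untrained convolution kernel standing in for a CNN feature map are all assumptions, and plain concatenation is used as the simplest possible fusion.

```python
import numpy as np

def convolve2d(img, k):
    """Valid-mode 2-D correlation, written out explicitly for clarity."""
    kh, kw = k.shape
    h, w = img.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * k)
    return out

def gabor_features(img, theta=0.0, sigma=2.0, lam=4.0, ksize=7):
    """Filter the image with one real Gabor kernel and pool the response."""
    half = ksize // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    kernel = np.exp(-(x**2 + y**2) / (2 * sigma**2)) * np.cos(2 * np.pi * xr / lam)
    resp = convolve2d(img, kernel)
    return np.array([resp.mean(), resp.std()])

def haar_features(img):
    """One level of a 2-D Haar wavelet transform; summarize each sub-band."""
    a = (img[0::2, :] + img[1::2, :]) / 2   # row averages
    d = (img[0::2, :] - img[1::2, :]) / 2   # row details
    ll = (a[:, 0::2] + a[:, 1::2]) / 2
    lh = (a[:, 0::2] - a[:, 1::2]) / 2
    hl = (d[:, 0::2] + d[:, 1::2]) / 2
    hh = (d[:, 0::2] - d[:, 1::2]) / 2
    return np.array([np.abs(b).mean() for b in (ll, lh, hl, hh)])

def cnn_conv_features(img, rng):
    """A random (untrained) conv kernel + ReLU standing in for a CNN feature map."""
    kernel = rng.standard_normal((3, 3))
    resp = np.maximum(convolve2d(img, kernel), 0)
    return np.array([resp.mean(), resp.max()])

def fuse(feature_sets):
    """Concatenation fusion: stack all feature sets into one vector."""
    return np.concatenate(feature_sets)

rng = np.random.default_rng(0)
img = rng.random((16, 16))
fused = fuse([gabor_features(img), haar_features(img), cnn_conv_features(img, rng)])
print(fused.shape)  # (8,)
```

In practice the Gabor and wavelet steps would use a filter bank over several orientations and scales, and the CNN features would come from a trained network; the sketch only shows how the three modes produce feature sets that are then fused into one vector.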
2. The method according to claim 1, wherein fusing the feature sets corresponding to the plurality of specified feature extraction modes comprises:
acquiring recognition rates of the plurality of specified feature extraction modes for a same specified object;
determining a weighting parameter corresponding to each specified feature extraction mode based on the recognition rate of that mode;
determining, based on the weighting parameter corresponding to each specified feature extraction mode, the proportion of the feature set obtained by that mode in the multi-feature fusion information;
and fusing the plurality of feature sets by weighted fusion based on the proportions corresponding to the plurality of feature sets.
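As an illustrative sketch of the weighted fusion in claim 2 (not the patent's implementation: normalizing the recognition rates into weights and concatenating the weighted sets are both assumptions; the claim leaves the exact mapping from recognition rate to weighting parameter open):

```python
import numpy as np

def weighted_fusion(feature_sets, recognition_rates):
    """Weight each mode's feature set by its (normalized) recognition rate.

    The weights are the recognition rates normalized to sum to 1, so a mode
    that identifies the specified object more reliably contributes a larger
    proportion to the fused vector.
    """
    rates = np.asarray(recognition_rates, dtype=float)
    weights = rates / rates.sum()
    return np.concatenate([w * np.asarray(f, dtype=float)
                           for w, f in zip(weights, feature_sets)])

# Hypothetical per-mode feature sets and recognition rates.
gabor = [0.4, 0.1]
wavelet = [0.3, 0.2]
cnn = [0.9, 0.8]
fused = weighted_fusion([gabor, wavelet, cnn], recognition_rates=[0.70, 0.80, 0.95])
```

Here the CNN features end up with the largest share of the fused vector because that mode's recognition rate is highest, which matches the proportioning described in the claim.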
3. The method according to claim 1, wherein fusing the feature sets corresponding to the plurality of specified feature extraction modes comprises:
fusing, by PCA fusion, the feature sets corresponding to the plurality of specified feature extraction modes.
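The PCA fusion of claim 3 could, for instance, be sketched as follows. This is an assumption-laden illustration: PCA is computed here via SVD over per-sample stacked feature vectors, which is only one of several ways a "PCA fusion mode" could be realized.

```python
import numpy as np

def pca_fuse(feature_matrix, n_components=2):
    """Project stacked per-sample feature vectors onto their top principal axes."""
    X = feature_matrix - feature_matrix.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal directions.
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:n_components].T

rng = np.random.default_rng(1)
# Rows: samples; columns: Gabor + wavelet + CNN feature sets concatenated.
stacked = np.hstack([rng.random((10, 4)), rng.random((10, 3)), rng.random((10, 5))])
fused = pca_fuse(stacked, n_components=2)
print(fused.shape)  # (10, 2)
```

The design point is that PCA both fuses and compresses: the 12 concatenated feature dimensions collapse into 2 decorrelated components before reaching the hidden layer.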
4. The method according to claim 2 or 3, wherein before fusing the feature sets corresponding to the plurality of specified feature extraction modes, the method further comprises:
performing normalization processing on all the feature sets;
wherein fusing the feature sets corresponding to the plurality of specified feature extraction modes comprises:
performing feature fusion on all the feature sets subjected to the normalization processing.
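The normalization in claim 4 might, for example, use min-max scaling; this is an assumption, since the claim does not specify the normalization scheme. The sketch shows why the step matters: raw Gabor responses and wavelet coefficients live on very different scales, and normalizing each set first keeps one mode from dominating the fused vector.

```python
import numpy as np

def min_max_normalize(features, eps=1e-12):
    """Scale a feature set to [0, 1] so differently-ranged modes fuse fairly."""
    f = np.asarray(features, dtype=float)
    return (f - f.min()) / (f.max() - f.min() + eps)

gabor = np.array([120.0, 30.0, 75.0])    # hypothetical raw filter responses
wavelet = np.array([0.02, 0.90, 0.45])   # hypothetical coefficients near [0, 1]
fused = np.concatenate([min_max_normalize(gabor), min_max_normalize(wavelet)])
```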
5. The method according to any one of claims 1 to 3, wherein the fusing the feature sets corresponding to the plurality of specified feature extraction methods comprises:
performing simplification processing on the feature sets corresponding to the plurality of specified feature extraction modes respectively, and fusing the simplified feature sets;
the method further comprises: performing simplification processing on the multi-feature fusion information, the simplified multi-feature fusion information being used for a hidden layer of the CNN convolutional neural network;
wherein the simplification processing comprises:
selecting, from all feature values in each feature set and/or in the multi-feature fusion information, a plurality of feature values according to a preset target proportion, wherein the maximum weight corresponding to the selected plurality of feature values is smaller than the minimum weight corresponding to the remaining feature values;
randomly selecting, from the plurality of feature values, a part of the feature values at a specified proportion;
and setting the weights of the part of feature values to zero.
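The simplification processing of claim 5 reads as a weight-thresholded, dropout-like pruning: gather the lowest-weighted fraction of feature values, then randomly zero a sub-fraction of them. A minimal sketch under that reading follows; the selection rule, the two ratios, and the function name are assumptions for illustration.

```python
import numpy as np

def simplify(weights, target_ratio=0.5, zero_ratio=0.5, rng=None):
    """Zero out a random fraction of the lowest-weighted feature values.

    1. Take the `target_ratio` fraction of feature values with the smallest
       weights (their largest weight is below every remaining weight).
    2. Randomly pick a `zero_ratio` fraction of those candidates.
    3. Set the picked weights to zero.
    """
    if rng is None:
        rng = np.random.default_rng()
    w = np.asarray(weights, dtype=float).copy()
    n_low = int(len(w) * target_ratio)
    low_idx = np.argsort(w)[:n_low]  # indices of the smallest weights
    picked = rng.choice(low_idx, size=int(n_low * zero_ratio), replace=False)
    w[picked] = 0.0
    return w

rng = np.random.default_rng(2)
w = simplify(np.array([0.9, 0.1, 0.8, 0.2, 0.7, 0.3, 0.6, 0.4]), rng=rng)
```

Unlike standard dropout, which zeros activations uniformly at random, this scheme restricts the random zeroing to the least-weighted values, so the most informative features always survive.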
6. An electronic device, comprising: at least one processor; and a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor, the instructions, when executed by the at least one processor, causing the at least one processor to perform the method according to any one of claims 1 to 5.
7. A storage medium having computer-executable instructions stored thereon, the computer-executable instructions being configured to perform the method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011192053.XA CN112329835A (en) | 2020-10-30 | 2020-10-30 | Image processing method, electronic device, and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112329835A (en) | 2021-02-05 |
Family
ID=74296840
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011192053.XA Pending CN112329835A (en) | 2020-10-30 | 2020-10-30 | Image processing method, electronic device, and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112329835A (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107683469A (en) * | 2015-12-30 | 2018-02-09 | 中国科学院深圳先进技术研究院 | Product classification method and device based on deep learning |
CN109117826A (en) * | 2018-09-05 | 2019-01-01 | 湖南科技大学 | Vehicle identification method based on multi-feature fusion |
CN109256135A (en) * | 2018-08-28 | 2019-01-22 | 桂林电子科技大学 | End-to-end speaker identification method, apparatus and storage medium |
CN109815967A (en) * | 2019-02-28 | 2019-05-28 | 北京环境特性研究所 | CNN ship target recognition system and method based on feature fusion |
CN109871821A (en) * | 2019-03-04 | 2019-06-11 | 中国科学院重庆绿色智能技术研究院 | Pedestrian re-identification method, apparatus, device and storage medium based on adaptive network |
CN111275054A (en) * | 2020-01-16 | 2020-06-12 | 北京迈格威科技有限公司 | Image processing method and device, electronic device and storage medium |
Non-Patent Citations (4)
Title |
---|
Wang Ning: "Pedestrian Classification Algorithm for Railway Intrusion Based on Feature Fusion", China Master's Theses Full-text Database, Engineering Science and Technology II, 15 January 2020 (2020-01-15), page 4 * |
Chen Jianmei: "Research on Medical Image Recognition Based on Density Clustering and Multi-Feature Fusion", China Doctoral Dissertations Full-text Database, Information Science and Technology, 15 September 2009 (2009-09-15), page 73 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11830230B2 (en) | Living body detection method based on facial recognition, and electronic device and storage medium | |
US11321847B2 (en) | Foreground-aware image inpainting | |
CN110688957B (en) | Living body detection method, device and storage medium applied to face recognition | |
CN109272016B (en) | Target detection method, device, terminal equipment and computer readable storage medium | |
CN111950723A (en) | Neural network model training method, image processing method, device and terminal equipment | |
CN110490203B (en) | Image segmentation method and device, electronic equipment and computer readable storage medium | |
US20100092026A1 (en) | Method, apparatus and computer program product for providing pattern detection with unknown noise levels | |
JP7443647B2 (en) | Keypoint detection and model training method, apparatus, device, storage medium, and computer program | |
WO2017070923A1 (en) | Human face recognition method and apparatus | |
CN111291668A (en) | Living body detection method, living body detection device, electronic equipment and readable storage medium | |
CN110135428A (en) | Image segmentation processing method and device | |
CN111353325A (en) | Key point detection model training method and device | |
CN115294162B (en) | Target identification method, device, equipment and storage medium | |
CN115760641B (en) | Remote sensing image cloud and fog removing method and equipment based on multiscale characteristic attention network | |
CN112329835A (en) | Image processing method, electronic device, and storage medium | |
CN111339973A (en) | Object identification method, device, equipment and storage medium | |
US11410455B2 (en) | Method and device for fingerprint image recognition, and computer-readable medium | |
CN115689947A (en) | Image sharpening method, system, electronic device and storage medium | |
CN115272794A (en) | Model training method, computer device, and storage medium | |
CN115393756A (en) | Visual image-based watermark identification method, device, equipment and medium | |
CN112929348B (en) | Information processing method and device, electronic equipment and computer readable storage medium | |
CN115131384A (en) | Bionic robot 3D printing method, device and medium based on edge preservation | |
CN114677737A (en) | Biological information identification method, apparatus, device and medium | |
CN114299555A (en) | Fingerprint identification method, fingerprint module and electronic equipment | |
CN113657293A (en) | Living body detection method, living body detection device, electronic apparatus, medium, and program product |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |

Application publication date: 20210205 |