CN112069905A - Image processing method, apparatus, device and medium - Google Patents

Image processing method, apparatus, device and medium

Info

Publication number
CN112069905A
Authority
CN
China
Prior art keywords
activation function
differentiable
layer
neural network
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010791058.8A
Other languages
Chinese (zh)
Inventor
马宁宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Megvii Technology Co Ltd
Original Assignee
Beijing Megvii Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Megvii Technology Co Ltd filed Critical Beijing Megvii Technology Co Ltd
Priority to CN202010791058.8A priority Critical patent/CN112069905A/en
Publication of CN112069905A publication Critical patent/CN112069905A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/20Scenes; Scene-specific elements in augmented reality scenes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the invention provides an image processing method, apparatus, device, and medium, aiming to improve the accuracy of image recognition. The method comprises: obtaining an image to be processed; and inputting the image to be processed into a target network to obtain a processing result output by the target network; wherein the activation function of at least one layer in the target network is a differentiable activation function obtained by training an initial differentiable activation function, and the processing result is a classification result of the image to be processed or a recognition result of an object contained in the image to be processed.

Description

Image processing method, apparatus, device and medium
Technical Field
The present invention relates to the field of image processing technologies, and in particular, to an image processing method, an image processing apparatus, an image processing device, and an image processing medium.
Background
With the development of artificial intelligence, the use of neural network models for object recognition has become increasingly widespread. Various neural network models for object recognition exist in the related art, for example the ResNet model and the MobileNet model, and these models can achieve fairly good recognition results.
However, in the course of in-depth practice with these neural network models, the applicant has found that the activation functions they use can create a bottleneck during training: some parameters of the neural network model stop updating, so the model is insufficiently trained, and when the trained model is then used for object recognition, the resulting recognition accuracy is not high.
Disclosure of Invention
In view of the above problems, an image processing method, apparatus, device, and medium of the embodiments of the present invention are proposed to overcome or at least partially solve the above problems.
In order to solve the above problem, a first aspect of the present invention discloses an image processing method, including:
obtaining an image to be processed;
inputting the image to be processed into a target network to obtain a processing result output by the target network;
wherein, the activation function of at least one layer in the target network is a differentiable activation function, and the differentiable activation function is obtained by training an initial differentiable activation function;
the processing result is a classification result of the image to be processed or an identification result of an object contained in the image to be processed.
Optionally, the initial differentiable activation function contains a learnable parameter.
Optionally, the training step of the target network is as follows:
inputting a sample image into a convolutional neural network to obtain a processing result output by the convolutional neural network, wherein an activation function of at least one layer in the convolutional neural network is the initial differentiable activation function, and parameters of the convolutional neural network comprise learnable parameters;
and updating parameters of the convolutional neural network according to a residual error between a processing result output by the convolutional neural network and a standard processing result until a preset training end condition is met to obtain the target network, wherein the standard processing result is a classification label labeled for the sample image in advance or a label of an object contained in the sample image.
Optionally, the learnable parameters in the initial differentiable activation function include a first parameter for making the upper and lower limits of the first derivative of the initial differentiable activation function adjustable, and/or a second parameter for adjusting the smooth differentiable degree of the initial differentiable activation function.
Optionally, the second parameter is at a pixel level, and values of the second parameter corresponding to different pixel points are independent of each other; or
The second parameter is at channel level, and all pixel points of the same channel share the same value of the second parameter; or
The second parameter is hierarchical, and all the pixel points of all the channels of the same layer share the same value of the second parameter.
Optionally, the initial differentiable activation function is:
Sβ(p1x, p2x) = (p1 - p2)x · σ[β(p1 - p2)x] + p2x;
wherein p1 is a parameter that makes the upper limit of the first derivative of the initial differentiable activation function adjustable, p2 is a parameter that makes the lower limit of the first derivative of the initial differentiable activation function adjustable, β is a parameter for adjusting the smooth differentiable degree of the initial differentiable activation function, σ is the Sigmoid function, and x represents the input.
Optionally, the convolutional neural network is constructed by:
determining at least one original activation function layer from an original neural network;
replacing the at least one original activation function layer with a differentiable activation function layer to obtain the convolutional neural network, wherein an activation function in the differentiable activation function layer is the initial differentiable activation function.
Optionally, the convolutional neural network is constructed by:
determining a plurality of original convolutional layers to be processed from an original neural network, wherein an original activation function layer is configured between every two adjacent original convolutional layers to be processed in the plurality of original convolutional layers to be processed;
replacing the original activation function layer configured between one or more groups of adjacent two original convolution layers to be processed with a differentiable activation function layer to obtain the convolutional neural network, wherein an activation function in the differentiable activation function layer is the initial differentiable activation function.
Optionally, the convolutional neural network is constructed by:
determining a feature processing layer to be processed from an original neural network, wherein the output end of the feature processing layer is connected with the input end of an original activation function layer, and the feature processing layer is a residual error layer or a feature splicing layer;
replacing the original activation function layer connected with the output end of the feature processing layer with the differentiable activation function layer to obtain the convolutional neural network, wherein the activation function in the differentiable activation function layer is the initial differentiable activation function.
In a second aspect of the embodiments of the present invention, there is provided an image processing apparatus, including:
the image obtaining module is used for obtaining an image to be processed;
the image input module is used for inputting the image to be processed into a target network to obtain a processing result output by the target network;
wherein, the activation function of at least one layer in the target network is a differentiable activation function, and the differentiable activation function is obtained by training an initial differentiable activation function;
the processing result is a classification result of the image to be processed or an identification result of an object contained in the image to be processed.
In a third aspect of the embodiments of the present invention, an electronic device is further disclosed, including:
one or more processors; and
one or more machine readable media having instructions stored thereon which, when executed by the one or more processors, cause the apparatus to perform an image processing method as described in embodiments of the first aspect of the invention.
In a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is further disclosed, which stores a computer program for causing a processor to execute the image processing method according to the embodiment of the first aspect of the present invention.
In the embodiment of the invention, an image to be processed can be obtained, and the image to be processed is input into a target network to obtain a processing result output by the target network; wherein the activation function of at least one layer in the target network is a differentiable activation function; the processing result is a classification result of the image to be processed or an identification result of an object contained in the image to be processed.
Compared with the prior art, the embodiment of the invention at least comprises the following advantages:
some points in the existing activation functions are not differentiable (for example, in the activation function like relu, the intersection point of y ═ x and y ═ 0 is not differentiable), and the gradient (i.e., derivative) of these differentiable points cannot be returned in the back propagation process, and only the gradient can be approximated. The activation function of the embodiment of the invention is differentiable, namely, each point of the activation function can be differentiable, so that the precise value of the gradient of each point can be transmitted back in the back propagation process. Therefore, the training effect can be improved, the target network has higher accuracy when the object is identified, the accuracy when the neural network model identifies the object is obviously improved, and the higher accuracy can be achieved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is a nonlinear curve of Sβ(a, b) in an embodiment of the present invention;
FIG. 2 is a non-linear plot of a differentiable activation function in an embodiment of the present invention;
FIG. 3A is a schematic diagram of a convolutional neural network construction in an embodiment of the present invention;
FIG. 3B is a schematic diagram of another convolutional neural network construction in an embodiment of the present invention;
FIG. 3C is a schematic diagram of another convolutional neural network construction in an embodiment of the present invention;
FIG. 4 is a flow chart of the steps of a method of image processing in the practice of the present invention;
FIG. 5 is a statistical chart comparing the respective recognition error rates of an original neural network and a target network in the practice of the present invention;
FIG. 6 is a block diagram of an image processing apparatus in an implementation of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below to clearly and completely describe the technical solutions in the embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the related art, when an existing neural network model is used for object recognition, activation functions such as the ReLU (Rectified Linear Unit) are generally used. These activation functions are not differentiable at some points, so during training of a neural network model that uses them, some parameters of the model stop updating, the model is insufficiently trained, and the resulting neural network model has low accuracy in object recognition. To improve the recognition accuracy of the neural network model, the applicant proposes the following technical idea: perform object recognition with a target network whose activation function is differentiable; in practice, an image to be processed can be input into the target network, and the processing result output by the target network is then obtained. Because the activation function in the target network is a differentiable activation function, the parameters of the network keep being updated while the target network is trained, the training is more thorough, the object recognition capability of the obtained target network is strengthened, and the recognition accuracy is greatly improved.
In this embodiment, the target network is obtained by training a convolutional neural network on training samples; therefore, in the process of constructing the target network, the convolutional neural network needs to be obtained first, and the target network is then obtained by training that convolutional neural network.
In order to facilitate understanding of the present invention, the initial differentiable activation function of the present application is introduced first, wherein the initial differentiable activation function can be expressed by the following formula:
Sβ(p1x, p2x) = (p1 - p2)x · σ[β(p1 - p2)x] + p2x;
wherein p1 is a parameter that makes the upper limit of the first derivative of the initial differentiable activation function adjustable, p2 is a parameter that makes the lower limit of the first derivative of the initial differentiable activation function adjustable, β is a parameter for adjusting the smooth differentiable degree of the initial differentiable activation function, σ is the Sigmoid function, and x represents the input.
Here, p1 can be understood as controlling the upper limit of the gradient of the convolutional neural network during training, and p2 as controlling the lower limit. Both can be changed and updated along with training, so that when the target network is obtained, a trained differentiable activation function is obtained as well, and this trained activation function is likewise differentiable. Gradients can therefore be back-propagated effectively while training the target network, and the upper and lower limits of the gradient can be adjusted dynamically.
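For illustration only, here is a minimal PyTorch sketch of such an activation layer under the assumption of a channel-level parameterization; the class name, tensor shapes, and initial values are illustrative and not taken from the patent. Note that with p1 = 1, p2 = 0, and β = 1 the forward pass reduces to x·σ(x), the familiar Swish/SiLU curve.

```python
import torch
import torch.nn as nn

class DifferentiableActivation(nn.Module):
    """Sketch of the initial differentiable activation function:
    S_beta(p1*x, p2*x) = (p1 - p2)*x * sigmoid(beta*(p1 - p2)*x) + p2*x,
    with p1, p2 and beta as learnable channel-level parameters."""

    def __init__(self, channels: int):
        super().__init__()
        self.p1 = nn.Parameter(torch.ones(1, channels, 1, 1))    # upper limit of the first derivative
        self.p2 = nn.Parameter(torch.zeros(1, channels, 1, 1))   # lower limit of the first derivative
        self.beta = nn.Parameter(torch.ones(1, channels, 1, 1))  # smooth differentiable degree

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d = (self.p1 - self.p2) * x
        return d * torch.sigmoid(self.beta * d) + self.p2 * x
```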
The process of obtaining the initial differentiable activation function can be explained as follows:
First, through the common α-softmax formula, a smoothed form Sβ(a, b) of the max(a, b) function can be obtained, derived as follows:
Sβ(a, b) = a · σ[β(a - b)] + b · σ[β(b - a)] = (a - b) · σ[β(a - b)] + b;
wherein β is used to control the degree of smoothing: as β approaches positive infinity, Sβ(a, b) becomes max(a, b). What results is a smooth maximum, whose nonlinear curve is shown in fig. 1.
Then, letting a = p1x and b = p2x, the above initial differentiable activation function is obtained:
Sβ(p1x, p2x) = (p1 - p2)x · σ[β(p1 - p2)x] + p2x.
Taking the derivative of the differentiable activation function yields the nonlinear distribution shown in fig. 2; it can be seen that with the initial differentiable activation function the upper and lower limits of the first derivative are adjustable and the curve is smoother. After training to obtain the target network, the learnable parameters in the initial differentiable activation function are fixed, and since the fixed β is neither 0 nor positive infinity, the differentiable activation function obtained after training remains differentiable.
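As a quick numeric check of this behavior (the values and β schedule below are illustrative), the smoothed maximum climbs toward max(a, b) as β grows, and at β = 0, where the sigmoid equals 1/2, it degenerates to the arithmetic mean (a + b)/2.

```python
import torch

def smooth_max(a: torch.Tensor, b: torch.Tensor, beta: float) -> torch.Tensor:
    # S_beta(a, b) = (a - b) * sigmoid(beta * (a - b)) + b
    return (a - b) * torch.sigmoid(beta * (a - b)) + b

a, b = torch.tensor(2.0), torch.tensor(-1.0)
for beta in (0.0, 0.1, 1.0, 10.0, 100.0):
    print(f"beta={beta:6.1f}  S_beta(a, b)={smooth_max(a, b, beta).item():.4f}")
# Output climbs from (a + b) / 2 = 0.5 at beta = 0 toward max(a, b) = 2.0.
```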
Next, the process of constructing the convolutional neural network is described. Depending on which layers the initial differentiable activation function is placed in, the convolutional neural network can be constructed in the following ways.
Determining at least one original activation function layer from an original neural network; and replacing the at least one original activation function layer with a differentiable activation function layer to obtain the convolutional neural network, wherein an activation function in the differentiable activation function layer is the initial differentiable activation function.
In this embodiment, the original neural network may be any convolutional neural network, for example a neural network with the ShuffleNet v2 structure, a neural network with the MobileNet v2 structure, a ResNet neural network, and the like. The original activation function layer includes an original activation function, which may be a ReLU activation function, a Sigmoid function, or tanh (the hyperbolic tangent function).
In this embodiment, one of the plurality of original activation function layers may be replaced with an initial differentiable activation function layer, or all or some of the plurality of original activation function layers may be replaced; which layers to replace may be determined according to experience or actual needs.
In this embodiment, the ReLU activation function in any of the original activation function layers in the original neural network may be replaced with a differentiable activation function. Referring to fig. 3A, a schematic diagram of a convolutional neural network construction is shown, taking an original neural network as a ResNet network as an example. As shown in fig. 3A, for example, the ReLU activation function in the original activation function layer to which the output of the convolutional layer "1 × 1, 64" is connected may be replaced with a differentiable activation function, so that the original activation function layer is replaced with a differentiable activation function layer. Of course, it is also possible to replace the ReLU activation function in the original activation function layer to which the output of the convolutional layer "3 × 3, 64" is connected with the initial differentiable activation function at the same time, thereby realizing the replacement of two original activation function layers.
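Below is a minimal sketch of this replacement procedure, assuming a PyTorch setting and using torchvision's resnet18 as a stand-in for the original neural network; the function name and the scalar (layer-level) variant of the activation are illustrative assumptions. Replacing only selected layers amounts to filtering on the child name during the same traversal.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18  # stand-in for the original neural network

class DifferentiableActivation(nn.Module):
    # Scalar (layer-level) variant of the activation sketched earlier, so it
    # can replace nn.ReLU without knowing the channel count of each layer.
    def __init__(self):
        super().__init__()
        self.p1 = nn.Parameter(torch.tensor(1.0))
        self.p2 = nn.Parameter(torch.tensor(0.0))
        self.beta = nn.Parameter(torch.tensor(1.0))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        d = (self.p1 - self.p2) * x
        return d * torch.sigmoid(self.beta * d) + self.p2 * x

def replace_activation_layers(module: nn.Module) -> nn.Module:
    """Recursively replace every original activation function layer
    (nn.ReLU here) with a differentiable activation function layer."""
    for name, child in module.named_children():
        if isinstance(child, nn.ReLU):
            setattr(module, name, DifferentiableActivation())
        else:
            replace_activation_layers(child)
    return module

convolutional_neural_network = replace_activation_layers(resnet18())
```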
Alternatively, in one example, the replaced original activation function layer may refer to an original activation function layer configured between every two adjacent original convolution layers to be processed in the original neural network.
Correspondingly, when the convolutional neural network is constructed, a plurality of original convolutional layers to be processed can be determined from the original neural network, and an original activation function layer is configured between every two adjacent original convolutional layers to be processed in the plurality of original convolutional layers to be processed; replacing the original activation function layer configured between one or more groups of adjacent two original convolution layers to be processed with a differentiable activation function layer to obtain the convolutional neural network, wherein an activation function in the differentiable activation function layer is the initial differentiable activation function.
One of the original convolution layers to be processed is understood to be a layer for performing a single convolution process on the image. In this embodiment, the original neural network generally includes a plurality of original convolutional layers, an original activation function layer may be configured between every two adjacent original convolutional layers, that is, an original activation function layer is configured between an output end of a current original convolutional layer and an input end of a next adjacent original convolutional layer, and an output of the current original convolutional layer is input to the next adjacent original convolutional layer after passing through the original activation function layer.
Thus, in this embodiment, the original activation function layer configured between one or more sets of two adjacent original convolution layers to be processed may be replaced with a differentiable activation function layer.
Referring to fig. 3B, a schematic diagram of the target network construction in this example is shown, again taking a ResNet network as the original neural network. As shown in fig. 3B, the convolutional layer "1 × 1, 64" and the convolutional layer "3 × 3, 64" are two adjacent convolutional layers with an original activation function layer disposed between them; the original activation function layer connected between the convolutional layer "1 × 1, 64" and the convolutional layer "3 × 3, 64" may be replaced with a differentiable activation function layer.
Optionally, in an example, the replaced original activation function layer may also refer to an original activation function layer to which an output end of a feature processing layer in the original neural network is connected.
Correspondingly, a feature processing layer to be processed can be determined from the original neural network, the output end of the feature processing layer is connected with the input end of the original activation function layer, and the feature processing layer is a residual error layer or a feature splicing layer; replacing the original activation function layer connected with the output end of the feature processing layer with the differentiable activation function layer to obtain the convolutional neural network, wherein the activation function in the differentiable activation function layer is the initial differentiable activation function.
The feature processing layer may have different types according to the type of the original neural network, for example, when the original neural network is a ResNet network, the feature processing layer is a residual error layer and is configured to fuse the input multiple features and the original features, and when the original neural network is a MasNet neural network, the feature processing layer is a feature splicing layer and is configured to splice the input multiple features.
In this embodiment, the original activation function layer connected to the output end of each feature processing layer in the original neural network may be replaced with a differentiable activation function layer, that is, the original activation function in the original activation function layer is replaced with an initial differentiable activation function.
Referring to fig. 3C, a schematic diagram of the target network construction in this example is shown, again taking a ResNet network as the original neural network. As shown in fig. 3C, the feature processing layer is marked by the "+" sign in the figure; specifically, it is a residual layer, used for fusing the input of the convolutional layer "1 × 1, 64" with the output of the convolutional layer "1 × 1, 256". An original activation function layer is configured after this feature processing layer, and the original activation function in it is a ReLU activation function, which may be replaced with the initial differentiable activation function described in this embodiment.
It should be noted that, in specific implementation, while the original activation function layer connected to the output end of each feature processing layer in the original neural network is replaced by the differentiable activation function layer, the original activation function layer configured between every two adjacent original convolution layers to be processed in the original neural network may also be replaced by the differentiable activation function layer, so as to replace all the original activation function layers in the original neural network.
In this embodiment, after the convolutional neural network is constructed by any one of the above manners or a combination of them, the convolutional neural network may be trained with training samples. In order to dynamically and adaptively control the upper and lower limits of the gradient, the initial differentiable activation function in this embodiment includes learnable parameters, such as p1 and p2 in the above initial differentiable activation function. The training steps of the target network may be as follows:
firstly, inputting a sample image into a convolutional neural network to obtain a processing result output by the convolutional neural network, wherein an activation function of at least one layer in the convolutional neural network is an initial differentiable activation function, and parameters of the convolutional neural network comprise learnable parameters.
And secondly, updating parameters of the convolutional neural network according to a residual error between a processing result output by the convolutional neural network and a standard processing result until a preset training end condition is met to obtain the target network, wherein the standard processing result is a classification label labeled for the sample image in advance or a label of an object contained in the sample image.
The preset training end condition may be that the residual error meets the requirement or that the parameter is updated for a sufficient number of times. Learnable parameters are understood to be parameters that can control the upper and lower bounds of the gradient during training of the convolutional neural network, which can be continuously updated during training.
In this embodiment, the sample image may be acquired according to an actual object identification requirement, for example, if the object identification is face identification, the sample image may be an image acquired for a face, and if the object identification is animal classification identification, the sample image may be an image acquired for various animals.
The standard processing result may be a correct recognition result of the input sample image, or may be understood as a true result. For example, if the object recognition is animal classification recognition, the standard processing result may be the correct class to which the animal in the sample image actually belongs, and the processing result output by the convolutional neural network may be understood as: and (5) a prediction result for identifying the sample image.
The residual error between the output processing result and the standard processing result may refer to a difference between the processing result and the standard processing result, and the loss of the convolutional neural network may be obtained according to the difference, and the parameter of the convolutional neural network may be updated according to the loss. Wherein the parameters of the convolutional neural network may comprise learnable parameters in the initial differentiable activation function.
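The following is a compact training-loop sketch of these two steps; the loss, optimizer, and hyperparameters are illustrative assumptions, and `loader` stands for any source of (sample image, standard label) pairs. The point of the sketch is that the learnable activation parameters are registered as ordinary parameters and are therefore updated by the same back propagation.

```python
import torch
import torch.nn as nn

def train(model: nn.Module, loader, epochs: int = 10, lr: float = 0.1) -> nn.Module:
    criterion = nn.CrossEntropyLoss()  # measures the residual between the
                                       # network output and the standard result
    # model.parameters() already includes the learnable p1/p2/beta of the
    # differentiable activation layers, so one optimizer updates them
    # together with the convolution weights.
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):            # epoch count as the preset end condition
        for sample_images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(sample_images), labels)
            loss.backward()            # exact gradients pass through the
                                       # everywhere-differentiable activation
            optimizer.step()
    return model                       # the trained target network
```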
In an alternative example, the learnable parameter comprises a first parameter for adjusting an upper and a lower limit of a first derivative of the initial differentiable activation function, and/or a second parameter for adjusting a smooth differentiable degree of the initial differentiable activation function.
The first parameter may control the upper and lower limits of the first derivative of the initial differentiable activation function, and includes a parameter that controls the upper limit, such as p1 in the above initial differentiable activation function, and a parameter that controls the lower limit, such as p2.
In yet another optional example, the second parameter is at a pixel level, and values of the second parameter corresponding to different pixel points are independent of each other; or the second parameter is at channel level, and all the pixel points of the same channel share the same value of the second parameter; or the second parameter is hierarchical, and all the pixel points of all the channels of the same layer share the same value of the second parameter.
In this case, the values of the respective second parameters of different pixel points may be the same or different. Since the second parameter is a pixel-level parameter, the degree of smooth differentiability of the initial differentiable activation function can be more finely controlled.
When the second parameter is a channel-level parameter, the values of the respective second parameters of different channels are independent of each other, but the same value of the second parameter is shared by the pixels of the same channel. In this case, since the second parameter is a channel-level parameter, the smooth differentiable degree of the initial differentiable activation function can be controlled regionally.
When the second parameter is a layer-level parameter, the values of the second parameter at different layers are independent of each other, and within the same layer every pixel point shares the same value of the second parameter. Since the second parameter is then a layer-level parameter, the smooth differentiable degree of the initial differentiable activation function can be controlled layer by layer.
Of course, whether the second parameter is set at the pixel level, the channel level, or the layer level can be decided according to actual requirements or experience.
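The three granularities differ only in the shape given to the second parameter; a short sketch (dimensions are illustrative) shows how broadcasting over an (N, C, H, W) feature map produces each sharing pattern.

```python
import torch
import torch.nn as nn

C, H, W = 64, 32, 32  # channels and spatial size of one feature map (illustrative)

beta_pixel   = nn.Parameter(torch.ones(1, C, H, W))  # pixel level: independent value per pixel
beta_channel = nn.Parameter(torch.ones(1, C, 1, 1))  # channel level: shared within each channel
beta_layer   = nn.Parameter(torch.ones(1, 1, 1, 1))  # layer level: shared by the whole layer

x = torch.randn(8, C, H, W)
# Each variant broadcasts against x, e.g. torch.sigmoid(beta_channel * x),
# so the sharing pattern needs no extra indexing logic.
```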
Through the above process, a target network is obtained, wherein the activation function of at least one layer of the target network is a differentiable activation function, so that image processing can be performed by using the target network. Referring to fig. 4, a flowchart illustrating steps of an image processing method according to an embodiment of the present invention is shown, and as shown in fig. 4, the method may specifically include the following steps:
step S401: and obtaining an image to be processed.
The image to be processed may refer to an image that needs to be subject to object recognition, and the image to be processed may include an image of an object to be recognized. The process of obtaining the image to be processed may refer to an image shot in real time for the object to be identified, may refer to a frame of video image extracted from a video stream, or may refer to a pre-stored image searched from an image library.
Step S402: and inputting the image to be processed into a target network to obtain a processing result output by the target network.
Wherein, the activation function of at least one layer in the target network is a differentiable activation function, and the differentiable activation function is obtained by training an initial differentiable activation function; the processing result is a classification result of the image to be processed or an identification result of an object contained in the image to be processed.
In this embodiment, the target network may classify or identify the image to be processed to obtain a classification result or an identification result of the image to be processed.
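Putting steps S401 and S402 together, a hypothetical inference sketch follows; the checkpoint path, image path, and input size are illustrative assumptions, and the load call presumes the whole trained network was saved with torch.save.

```python
import torch
from torchvision import transforms
from PIL import Image

target_network = torch.load("target_network.pt")  # trained target network (illustrative path)
target_network.eval()

preprocess = transforms.Compose([                  # must match the training setup
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

image_to_be_processed = Image.open("example.jpg").convert("RGB")
with torch.no_grad():
    logits = target_network(preprocess(image_to_be_processed).unsqueeze(0))
    predicted_class = logits.argmax(dim=1).item()  # classification result
```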
Compared with the prior art: in the activation functions used by conventional convolutional neural networks (e.g., the ReLU activation function), some points are not differentiable; for example, the intersection of y = x and y = 0 in the ReLU activation function is not differentiable, and the gradient at such points cannot be passed back exactly during back propagation and can only be approximated, which leads to low training quality and insufficient model training. With the technical scheme of the present invention, the activation function of at least one layer in the target network is a differentiable activation function, so every point of the activation function is differentiable and the exact value of the gradient at each point can be passed back during back propagation. The training of the convolutional neural network is thereby strengthened and the network parameters keep being updated, so that the target network recognizes objects with higher accuracy; the accuracy of the neural network model in object recognition is noticeably improved.
To aid intuitive understanding, the applicant replaced the activation functions in a plurality of original neural networks with differentiable activation functions to construct a plurality of different target networks, performed image recognition with each of these target networks, and compared the recognition error rate of each target network with that of the corresponding original neural network; the statistical results are shown in fig. 5.
The original neural networks included in fig. 5 are mainly ShuffleNet v2, MobileNet v2, and ResNet, from which four different target networks are correspondingly constructed. In fig. 5, the last row "ACON-C" gives the error rate of the target network obtained after the activation function is replaced with the differentiable activation function, and the second-to-last row "max(x, 0)" gives the error rate of the corresponding original neural network (i.e., before the replacement) for object recognition.
Therefore, for each original neural network, after the activation function is replaced by the differentiable activation function, the error rate of object identification is greatly reduced, and it can be understood that when the error rate of object identification is greatly reduced, the accuracy of object identification is greatly improved.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 6, a block diagram of an image processing apparatus according to an embodiment of the present invention is shown, and as shown in fig. 6, the apparatus may specifically include the following modules:
an image obtaining module 601, configured to obtain an image to be processed;
an image input module 602, configured to input the image to be processed into a target network, so as to obtain a processing result output by the target network;
wherein, the activation function of at least one layer in the target network is a differentiable activation function, and the differentiable activation function is obtained by training an initial differentiable activation function; the processing result is a classification result of the image to be processed or an identification result of an object contained in the image to be processed.
Optionally, the initial differentiable activation function contains a learnable parameter.
The apparatus may further include the following modules:
the system comprises a sample input module, a convolution neural network and a data processing module, wherein the sample input module is used for inputting a sample image into the convolution neural network to obtain a processing result output by the convolution neural network, an activation function of at least one layer in the convolution neural network is the initial differentiable activation function, and parameters of the convolution neural network comprise learnable parameters;
and the parameter updating module is used for updating the parameters of the convolutional neural network according to the residual error between the processing result output by the convolutional neural network and a standard processing result until a preset training end condition is met to obtain the target network, wherein the standard processing result is a classification label pre-labeled for the sample image or a label of an object contained in the sample image.
Optionally, the learnable parameter comprises a first parameter for adjusting an upper and a lower limit of a first derivative of the initial differentiable activation function, and/or a second parameter for adjusting a smooth differentiable degree of the initial differentiable activation function.
Optionally, the second parameter is at a pixel level, and values of the second parameter corresponding to different pixel points are independent of each other; or
The second parameter is at channel level, and all pixel points of the same channel share the same value of the second parameter; or
The second parameter is hierarchical, and all the pixel points of all the channels of the same layer share the same value of the second parameter.
Optionally, the initial differentiable activation function is:
Sβ(p1x, p2x) = (p1 - p2)x · σ[β(p1 - p2)x] + p2x;
wherein p1 is a parameter that makes the upper limit of the first derivative of the initial differentiable activation function adjustable, p2 is a parameter that makes the lower limit of the first derivative of the initial differentiable activation function adjustable, β is a parameter for adjusting the smooth differentiable degree of the initial differentiable activation function, σ is the Sigmoid function, and x represents the input.
Optionally, the apparatus may further include a first convolutional neural network constructing module, configured to construct the convolutional neural network, and specifically may include the following units:
a first determining unit, configured to determine at least one original activation function layer from an original neural network;
a first replacing unit, configured to replace the at least one original activation function layer with a differentiable activation function layer to obtain the convolutional neural network, where an activation function in the differentiable activation function layer is the initial differentiable activation function.
Optionally, the apparatus may further include a second convolutional neural network constructing module, configured to construct the convolutional neural network, and specifically may include the following units:
a second determining unit, configured to determine a plurality of original convolutional layers to be processed from an original neural network, where an original activation function layer is configured between every two adjacent original convolutional layers to be processed in the plurality of original convolutional layers to be processed;
a second replacing unit, configured to replace the original activation function layer configured between one or more groups of adjacent two original convolution layers to be processed with a differentiable activation function layer, so as to obtain the convolutional neural network, where an activation function in the differentiable activation function layer is the initial differentiable activation function.
Optionally, the apparatus may further include a third convolutional neural network constructing module, configured to construct the convolutional neural network, and specifically, the apparatus may include the following units:
a third determining unit, configured to determine a feature processing layer to be processed from an original neural network, where an output end of the feature processing layer is connected to an input end of an original activation function layer, and the feature processing layer is a residual layer or a feature splicing layer;
a third replacing unit, configured to replace the original activation function layer connected to the output end of the feature processing layer with the differentiable activation function layer to obtain the convolutional neural network, where an activation function in the differentiable activation function layer is the initial differentiable activation function.
It should be noted that the device embodiments are similar to the method embodiments, so that the description is simple, and reference may be made to the method embodiments for relevant points.
Embodiments of the present invention further provide an electronic device, which may be configured to execute an image processing method and may include a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor is configured to execute the image processing method.
Embodiments of the present invention further provide a computer-readable storage medium storing a computer program for causing a processor to execute the image processing method according to the embodiments of the present invention.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual such relationship or order between those entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or terminal that comprises the element.
The foregoing detailed description of an image processing method, an image processing apparatus, an image processing device, and a storage medium according to the present invention has been presented, and the principles and embodiments of the present invention are described herein by using specific examples, and the descriptions of the above examples are only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (12)

1. An image processing method, comprising:
obtaining an image to be processed;
inputting the image to be processed into a target network to obtain a processing result output by the target network;
wherein, the activation function of at least one layer in the target network is a differentiable activation function, and the differentiable activation function is obtained by training an initial differentiable activation function;
the processing result is a classification result of the image to be processed or an identification result of an object contained in the image to be processed.
2. The method of claim 1, wherein the initial differentiable activation function contains a learnable parameter.
3. The method of claim 2, wherein the target network is trained by:
inputting a sample image into a convolutional neural network to obtain a processing result output by the convolutional neural network, wherein an activation function of at least one layer in the convolutional neural network is the initial differentiable activation function, and parameters of the convolutional neural network comprise learnable parameters; and updating parameters of the convolutional neural network according to a residual error between a processing result output by the convolutional neural network and a standard processing result until a preset training end condition is met to obtain the target network, wherein the standard processing result is a classification label labeled for the sample image in advance or a label of an object contained in the sample image.
4. A method according to any of claims 1-3, wherein the learnable parameters in the initial differentiable activation function comprise a first parameter for making the upper and lower limits of the first derivative of the initial differentiable activation function adjustable, and/or a second parameter for adjusting the smooth differentiable degree of the initial differentiable activation function.
5. The method of claim 4, wherein the second parameter is pixel-level, and values of the second parameter corresponding to different pixels are independent of each other; or
The second parameter is at channel level, and all pixel points of the same channel share the same value of the second parameter; or
The second parameter is hierarchical, and all the pixel points of all the channels of the same layer share the same value of the second parameter.
6. The method according to any of claims 1-5, wherein the initial differentiable activation function is:
Sβ(p1x, p2x) = (p1 - p2)x · σ[β(p1 - p2)x] + p2x;
wherein p1 is a parameter that makes the upper limit of the first derivative of the initial differentiable activation function adjustable, p2 is a parameter that makes the lower limit of the first derivative of the initial differentiable activation function adjustable, β is a parameter for adjusting the smooth differentiable degree of the initial differentiable activation function, σ is the Sigmoid function, and x represents the input.
7. The method of any one of claims 2-6, wherein the convolutional neural network is constructed by:
determining at least one original activation function layer from an original neural network;
replacing the at least one original activation function layer with a differentiable activation function layer to obtain the convolutional neural network, wherein an activation function in the differentiable activation function layer is the initial differentiable activation function.
8. The method of any one of claims 2-7, wherein the convolutional neural network is constructed by:
determining a plurality of original convolutional layers to be processed from an original neural network, wherein an original activation function layer is configured between every two adjacent original convolutional layers to be processed in the plurality of original convolutional layers to be processed;
replacing the original activation function layer configured between one or more groups of adjacent two original convolution layers to be processed with a differentiable activation function layer to obtain the convolutional neural network, wherein an activation function in the differentiable activation function layer is the initial differentiable activation function.
9. The method of any one of claims 2-8, wherein the convolutional neural network is constructed by:
determining a feature processing layer to be processed from an original neural network, wherein the output end of the feature processing layer is connected with the input end of an original activation function layer, and the feature processing layer is a residual error layer or a feature splicing layer;
replacing the original activation function layer connected with the output end of the feature processing layer with the differentiable activation function layer to obtain the convolutional neural network, wherein the activation function in the differentiable activation function layer is the initial differentiable activation function.
10. An image processing apparatus, characterized in that the apparatus comprises:
the image obtaining module is used for obtaining an image to be processed;
the image input module is used for inputting the image to be processed into a target network to obtain a processing result output by the target network;
wherein, the activation function of at least one layer in the target network is a differentiable activation function, and the differentiable activation function is obtained by training an initial differentiable activation function;
the processing result is a classification result of the image to be processed or an identification result of an object contained in the image to be processed.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor when executing implementing the image processing method according to any of claims 1-9.
12. A computer-readable storage medium storing a computer program for causing a processor to execute the image processing method according to any one of claims 1 to 9.
CN202010791058.8A 2020-08-07 2020-08-07 Image processing method, apparatus, device and medium Pending CN112069905A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010791058.8A CN112069905A (en) 2020-08-07 2020-08-07 Image processing method, apparatus, device and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010791058.8A CN112069905A (en) 2020-08-07 2020-08-07 Image processing method, apparatus, device and medium

Publications (1)

Publication Number Publication Date
CN112069905A true CN112069905A (en) 2020-12-11

Family

ID=73660876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010791058.8A Pending CN112069905A (en) 2020-08-07 2020-08-07 Image processing method, apparatus, device and medium

Country Status (1)

Country Link
CN (1) CN112069905A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112801266A (en) * 2020-12-24 2021-05-14 武汉旷视金智科技有限公司 Neural network construction method, device, equipment and medium
CN112801266B (en) * 2020-12-24 2023-10-31 武汉旷视金智科技有限公司 Neural network construction method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN112052787B (en) Target detection method and device based on artificial intelligence and electronic equipment
CN110533097B (en) Image definition recognition method and device, electronic equipment and storage medium
US10552737B2 (en) Artificial neural network class-based pruning
Ma et al. Blind image quality assessment by learning from multiple annotators
CN107679466B (en) Information output method and device
CN109242013B (en) Data labeling method and device, electronic equipment and storage medium
CN111160569A (en) Application development method and device based on machine learning model and electronic equipment
CN110856037B (en) Video cover determination method and device, electronic equipment and readable storage medium
US10956716B2 (en) Method for building a computer-implemented tool for assessment of qualitative features from face images
CN107545301B (en) Page display method and device
CN108334878B (en) Video image detection method, device and equipment and readable storage medium
WO2023284465A1 (en) Image detection method and apparatus, computer-readable storage medium, and computer device
WO2020211242A1 (en) Behavior recognition-based method, apparatus and storage medium
CN113128478A (en) Model training method, pedestrian analysis method, device, equipment and storage medium
CN111144567A (en) Training method and device of neural network model
CN112069905A (en) Image processing method, apparatus, device and medium
CN110489435B (en) Data processing method and device based on artificial intelligence and electronic equipment
CN117056595A (en) Interactive project recommendation method and device and computer readable storage medium
CN111783936A (en) Convolutional neural network construction method, device, equipment and medium
CN109783769B (en) Matrix decomposition method and device based on user project scoring
CN116110033A (en) License plate generation method and device, nonvolatile storage medium and computer equipment
CN112287938B (en) Text segmentation method, system, device and medium
CN114170484A (en) Picture attribute prediction method and device, electronic equipment and storage medium
CN113392867A (en) Image identification method and device, computer equipment and storage medium
CN114399901A (en) Method and equipment for controlling traffic system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination