CN115187845A - Image processing method, device and equipment - Google Patents

Image processing method, device and equipment

Info

Publication number
CN115187845A
Authority
CN
China
Prior art keywords: determining, feature, image, characteristic, value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210815529.3A
Other languages
Chinese (zh)
Inventor
张凯
任文奇
李哲暘
谭文明
任烨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd filed Critical Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202210815529.3A
Publication of CN115187845A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The application provides an image processing method, apparatus, and device, wherein the method comprises the following steps: acquiring an input feature corresponding to an image to be processed; determining a shift operation feature based on the input feature, and determining a limited range feature based on the input feature and the shift operation feature; determining a mapping feature corresponding to the image to be processed based on the shift operation feature and the limited range feature; performing a normalization operation based on the mapping feature to obtain a normalized feature, and determining an output feature corresponding to the image to be processed based on the normalized feature; and determining an image processing result corresponding to the image to be processed based on the output feature. By this technical solution, the computational complexity can be reduced, and the computation amount and resource consumption are small.

Description

Image processing method, device and equipment
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an image processing method, apparatus, and device.
Background
Machine learning is one way to realize artificial intelligence. It is a multidisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. Machine learning studies how computers simulate or implement human learning behaviors in order to acquire new knowledge or skills and to reorganize existing knowledge structures so as to improve their performance. Machine learning focuses on algorithm design, enabling a computer to automatically learn rules from data and use those rules to predict unknown data.
Machine learning has found a wide variety of applications, such as deep learning, data mining, computer vision, natural language processing, biometric recognition, search engines, medical diagnostics, speech and handwriting recognition, and the like.
In order to implement artificial intelligence processing by machine learning, the server needs to acquire a large amount of sample data, train a machine learning model based on the sample data, and deploy the machine learning model to the terminal device (such as a camera) so that the terminal device implements artificial intelligence processing based on the machine learning model.
However, since the processing procedure of a machine learning model involves a large number of complex operations, such as the e-exponent (natural exponential) operation, the computation amount of the machine learning model is large and the resource consumption is high.
Disclosure of Invention
The application provides an image processing method, which comprises the following steps:
acquiring input characteristics corresponding to an image to be processed;
determining a shift operation feature based on the input feature, and determining a limited range feature based on the input feature and the shift operation feature; determining a mapping feature corresponding to the image to be processed based on the shift operation feature and the limited range feature; and performing a normalization operation based on the mapping feature to obtain a normalized feature, and determining an output feature corresponding to the image to be processed based on the normalized feature;
and determining an image processing result corresponding to the image to be processed based on the output characteristic.
The present application provides an image processing apparatus, the apparatus including:
the acquisition module is used for acquiring the input characteristics corresponding to the image to be processed;
a determining module, configured to determine a shift operation feature based on the input feature, and determine a limited range feature based on the input feature and the shift operation feature; determine a mapping feature corresponding to the image to be processed based on the shift operation feature and the limited range feature; and perform a normalization operation based on the mapping feature to obtain a normalized feature, and determine an output feature corresponding to the image to be processed based on the normalized feature;
and the processing module is used for determining an image processing result corresponding to the image to be processed based on the output characteristic.
The application provides an image processing apparatus, including: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor; the processor is configured to execute machine-executable instructions to implement the image processing method disclosed in the above example.
According to the above technical solution, the e-exponent operation can be replaced with a shift operation and a linear operation; that is, the e-exponent is approximated by a combination of a shift operation (a power of 2) and a linear operation, which greatly reduces the resource overhead, while the approximation accuracy is improved by introducing the piecewise linear operation. The method reduces computational complexity, keeps the computation amount and resource consumption small, lowers the resource overhead of the chip, maintains high performance of the target network model, and can be widely applied to various artificial intelligence service scenarios, such as image classification, target detection, segmentation, and pose estimation.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings required by the embodiments of the present application or by the prior art are briefly described below. Obviously, the drawings in the following description are only some embodiments described in the present application, and those skilled in the art can obtain other drawings from the drawings of the embodiments of the present application.
FIG. 1 is a schematic flow chart diagram of an image processing method in one embodiment of the present application;
FIG. 2A is a schematic diagram of a classification network in one embodiment of the present application;
FIG. 2B is a schematic diagram of a detection network according to an embodiment of the present application;
FIG. 3 is a schematic diagram of a target network model in one embodiment of the present application;
FIG. 4 is a flow diagram illustrating an image processing method according to an embodiment of the present application;
FIG. 5 is a process diagram of a target network layer in one embodiment of the present application;
FIG. 6 is a schematic diagram of a target network model in one embodiment of the present application;
FIG. 7 is a schematic diagram of an image processing apparatus according to an embodiment of the present application;
fig. 8 is a hardware configuration diagram of an image processing apparatus in an embodiment of the present application.
Detailed Description
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein is meant to encompass any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Moreover, depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
The embodiment of the present application provides an image processing method, which may be applied to any type of device, such as a terminal device (e.g., a camera, etc.) that needs to deploy a target network model, and is not limited thereto. Referring to fig. 1, a flow chart of the image processing method is schematically shown, and the method may include:
step 101, obtaining input characteristics corresponding to an image to be processed.
Step 102, determining a shift operation feature based on the input feature, and determining a limited range feature based on the input feature and the shift operation feature; determining a mapping feature corresponding to the image to be processed based on the shift operation feature and the limited range feature; and performing a normalization operation based on the mapping feature to obtain a normalized feature, and determining an output feature corresponding to the image to be processed based on the normalized feature.
Step 103, determining an image processing result corresponding to the image to be processed based on the output feature.
For example, determining the mapping feature corresponding to the image to be processed based on the shift operation feature and the limited range feature may include, but is not limited to: determining an exponential operation value based on the shift operation feature; determining a linear operation value based on the limited range feature; and determining the mapping feature based on the exponential operation value and the linear operation value.
Illustratively, determining the linear operation value based on the limited range feature may include, but is not limited to: determining the linear operation value by using a K-piecewise linear function based on the limited range feature, a preset threshold, and calibrated target parameter values, where K may be a positive integer greater than 1, such as 2 or 3.
In one possible embodiment, determining the linear operation value by using a K-piecewise linear function based on the limited range feature, the preset threshold, and the calibrated target parameter values may include, but is not limited to:
when K is 2, the linear operation value is determined using the following formula:

s = a1·p + b1, if 0 ≤ p < 0.5·ln2
s = a2·p + b2, if 0.5·ln2 ≤ p < ln2

wherein s represents the linear operation value, a1, b1, a2, b2 represent the target parameter values, p represents the limited range feature, and 0.5·ln2 represents the preset threshold;

alternatively, when K is 2, the linear operation value is determined using the following formula:

s = a1·p + b1, if 0 ≤ p < 0.3·ln2
s = a2·p + b2, if 0.3·ln2 ≤ p < ln2

wherein s represents the linear operation value, a1, b1, a2, b2 represent the target parameter values, p represents the limited range feature, and 0.3·ln2 represents the preset threshold;

alternatively, when K is 3, the linear operation value is determined using the following formula:

s = a1·p + b1, if 0 ≤ p < 0.33·ln2
s = a2·p + b2, if 0.33·ln2 ≤ p < 0.67·ln2
s = a3·p + b3, if 0.67·ln2 ≤ p < ln2

wherein s represents the linear operation value, a1, b1, a2, b2, a3, b3 represent the target parameter values, p represents the limited range feature, and 0.33·ln2, 0.67·ln2, ln2 represent the preset thresholds.
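As a concrete illustration of the 2-piecewise case (a sketch added for readability, not part of the original application text), the following Python snippet evaluates the piecewise linear function; the coefficient values in the example call are rough, uncalibrated placeholders for the target parameter values a1, b1, a2, b2:

```python
import math

LN2 = math.log(2.0)

def piecewise_linear_2(p, a1, b1, a2, b2, threshold=0.5 * LN2):
    """2-piecewise linear approximation of s = e^p on [0, ln2)."""
    return a1 * p + b1 if p < threshold else a2 * p + b2

# Rough placeholder coefficients; real values come from calibration data.
s = piecewise_linear_2(0.2, a1=1.21, b1=1.0, a2=1.68, b2=0.84)
```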
In one possible embodiment, the target network model may include a plurality of network layers including a target network layer for implementing the normalization function, the network layers preceding the target network layer constituting a first sub-network, and the network layers following the target network layer constituting a second sub-network, on the basis of which:
for step 101, an image to be processed may be input to a first sub-network, so as to obtain an input feature corresponding to the image to be processed; for example, after obtaining the image to be processed, the first sub-network may process the image to be processed, and the processing process is not limited to obtain the input feature corresponding to the image to be processed.
For step 102, after obtaining the input feature corresponding to the image to be processed, the input feature may be input to the target network layer, the target network layer determines a shift operation feature based on the input feature, determines a limited range feature based on the input feature and the shift operation feature, determines a mapping feature corresponding to the image to be processed based on the shift operation feature and the limited range feature, performs a normalization operation based on the mapping feature to obtain a normalization feature, and determines an output feature corresponding to the image to be processed based on the normalization feature.
For step 103, after obtaining the output feature corresponding to the image to be processed, the output feature may be input to the second sub-network, so as to obtain the image processing result corresponding to the image to be processed. For example, after obtaining the output feature, the second sub-network may process the output feature to obtain an image processing result.
Illustratively, the target network layer may be a sigmoid network layer in the target network model; alternatively, the target network layer may be a softmax network layer in the target network model. Of course, the sigmoid network layer and the softmax network layer are just two examples of the target network layer, and the target network layer is not limited thereto.
Illustratively, in order to obtain a target network model, a trained network model may be obtained first; after the trained network model is obtained, the e-exponent operator of the target network layer in the trained network model is replaced with a target-type operator to obtain the target network model, wherein the target-type operator is used for realizing the functions of determining a shift operation feature based on the input feature, determining a limited range feature based on the input feature and the shift operation feature, and determining a mapping feature based on the shift operation feature and the limited range feature.
According to the above technical solution, the e-exponent operation can be replaced with a shift operation and a linear operation; that is, the e-exponent is approximated by a combination of a shift operation (a power of 2) and a linear operation, which greatly reduces the resource overhead, while the approximation accuracy is improved by introducing the piecewise linear operation. The method reduces computational complexity, keeps the computation amount and resource consumption small, lowers the resource overhead of the chip, maintains high performance of the target network model, and can be widely applied to various artificial intelligence service scenarios, such as image classification, target detection, segmentation, and pose estimation.
The following describes the technical solution of the embodiment of the present application with reference to a specific application scenario.
Before the technical solutions of the present application are introduced, concepts related to the embodiments of the present application are introduced.
Machine learning: machine learning is a way to implement artificial intelligence, used to study how a computer simulates or implements human learning behaviors to acquire new knowledge or skills and reorganizes an existing knowledge structure to continuously improve its performance. Deep learning, a subclass of machine learning, uses mathematical models to model specific problems in the real world in order to solve similar problems in that field; neural networks also belong to a subclass of machine learning. A neural network is a computational mathematical model that simulates the behavioral characteristics of animal neural networks and performs distributed parallel information processing; depending on the complexity of the system, it processes information by adjusting the interconnection relationships among a large number of internal nodes. For convenience of description, the structure and function of a neural network are taken as an example below; other subclasses of machine learning are similar to the neural network.
A neural network: the neural network may include, but is not limited to, a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), a fully-connected network, etc.; the structural units of the neural network may include, but are not limited to, a convolutional layer (Conv), a pooling layer (Pool), an excitation layer, a fully-connected layer (FC), and the like.
In practical application, one or more convolution layers, one or more pooling layers, one or more excitation layers, and one or more fully-connected layers may be combined to construct a neural network according to different requirements.
In the convolutional layer, the input data features are enhanced by performing a convolution operation on them with a convolution kernel; the convolution kernel may be an m × n matrix. Convolving the input data features of the convolutional layer with the convolution kernel yields the output data features of the convolutional layer; the convolution operation is actually a filtering process.
In the pooling layer, operations such as taking the maximum value, the minimum value, or the average value are performed on the input data features (such as the output of the convolutional layer), so that the input data features are sub-sampled by exploiting the principle of local correlation; this reduces the processing amount while keeping feature invariance. The operation of the pooling layer is actually a down-sampling process.
In the excitation layer, the input data features can be mapped using an activation function (e.g., a nonlinear function), thereby introducing a nonlinear factor such that the neural network enhances expressive power through a combination of nonlinearities.
The activation function may include, but is not limited to, a ReLU (Rectified Linear Unit) function that is used to set features less than 0 to 0, while features greater than 0 remain unchanged.
In the fully-connected layer, the fully-connected layer is configured to perform fully-connected processing on all data features input to the fully-connected layer, so as to obtain a feature vector, where the feature vector may include multiple data features.
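As a toy illustration of the pooling, excitation (ReLU), and fully-connected input operations just described, a few NumPy one-liners follow; these are generic illustrations, not the implementation of this application:

```python
import numpy as np

x = np.array([[-1.0, 2.0],
              [3.0, -4.0]])

relu = np.maximum(x, 0)  # excitation layer: features less than 0 are set to 0
pooled = x.max()         # pooling layer: sub-sampling, here by taking the maximum value
flat = x.reshape(-1)     # fully-connected input: all data features as one feature vector
```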
And (3) network model: a network model may be pre-constructed, which is referred to as an initial network model, the initial network model is an unfinished training network model, and the initial network model may be a machine learning model, such as a deep learning-based machine learning model or a neural network-based machine learning model, which is not limited herein.
For the training process of the initial network model, the initial network model may be trained by using sample data, that is, the process of adjusting and optimizing the model weights (network parameters) in the initial network model. For example, the initial network model may include a plurality of network layers, each network layer includes a model weight, for example, for the initial network model based on the neural network, the model weight may be a convolutional layer parameter (e.g., a convolutional kernel parameter), a pooling layer parameter, an excitation layer parameter, a fully-connected layer parameter, and the like, and for the training process of the initial network model, the model weights in the plurality of network layers of the initial network model may be adjusted and optimized by using the sample data.
After the training of the initial network model is completed, a trained network model can be obtained, and the model weights in the trained network model are the model weights after the adjustment and optimization are completed. After the trained network model is obtained, the trained network model can be deployed to the terminal device, so that the terminal device can realize artificial intelligence processing based on the trained network model. For example, the trained network model may be widely applied to various service scenarios of artificial intelligence, such as image classification, target detection, segmentation, pose estimation, and the like, and for example, the target detection may be implemented based on the trained network model, and for example, for the face detection function, an image including a face is input to the trained network model, and the image is subjected to artificial intelligence processing by the trained network model, and the artificial intelligence processing result is a face detection result. For the vehicle detection function, an image including a vehicle is input to a trained network model, and the image is subjected to artificial intelligence processing through the trained network model, wherein the artificial intelligence processing result is a vehicle detection result.
In a possible implementation manner, the initial network model may include an e-exponent operator, and after the initial network model is trained, the trained network model also includes the e-exponent operator. After the trained network model is deployed to a terminal device, the terminal device needs to perform operations based on the e-exponent operator when implementing artificial intelligence processing with the model. The operation of the e-exponent operator is a complex operation, which brings relatively large resource overhead and requires special resources to support it, so the computation amount of the trained network model is large and the resource consumption is high.
In view of the above findings, the e-exponent approximate calculation method provided in the embodiments of the present application replaces the e-exponent operation with a shift operation and a linear operation; that is, the e-exponent is approximated by a combination of a shift operation (a power of 2) and a linear operation, so that resource overhead is greatly reduced, and the approximation accuracy is improved by introducing a piecewise linear operation. The method reduces computational complexity, keeps the computation amount and resource consumption small, and lowers the resource overhead of the chip. The e-exponent approximate calculation method provided by this embodiment can be widely applied to various artificial intelligence service scenarios, such as image classification, target detection, segmentation, and pose estimation.
In this embodiment, an initial network model may be pre-constructed, and the initial network model may include a plurality of network layers. A training set may also be pre-constructed, containing a large amount of sample data and the calibration data corresponding to each piece of sample data. The initial network model can be trained based on the training set, and the trained network model is obtained after training is completed. After the trained network model is obtained, the e-exponent operator of the target network layer in the trained network model can be replaced with a target-type operator to obtain the target network model, and the target network model is then deployed to the terminal device.
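A minimal, framework-free sketch of this operator replacement step; the SoftmaxLayer class and exp_fn attribute are hypothetical illustrations rather than the API of any actual deep-learning framework:

```python
# Hypothetical stand-in for a normalization layer whose e-exponent operator
# is stored as a replaceable attribute.
class SoftmaxLayer:
    def __init__(self, exp_fn):
        self.exp_fn = exp_fn  # the e-exponent operator used internally

def replace_e_exp_operator(layers, approx_exp_fn):
    """Swap the e-exponent operator of every target network layer (here,
    softmax layers) for the shift + piecewise-linear approximation."""
    for layer in layers:
        if isinstance(layer, SoftmaxLayer):
            layer.exp_fn = approx_exp_fn
    return layers
```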
For example, the plurality of network layers of the trained network model may include a target network layer, where the target network layer is a network layer for implementing a normalization function, that is, for normalizing any value into a certain specified value interval, such as the value interval of 0 to 1. A target network layer implementing the normalization function generally does so by using an e-exponent operator.
For example, regarding the process of implementing the normalization function by the target-type operator: the target-type operator is used to implement the functions of determining a shift operation feature based on the input feature, determining a limited range feature based on the input feature and the shift operation feature, and determining a mapping feature based on the shift operation feature and the limited range feature; this process is described in the following embodiments.
For example, in artificial intelligence fields such as target detection, segmentation, and image classification, the sigmoid function and the softmax function may both implement the normalization function by using an e-exponent operator; therefore, a target network layer for implementing the normalization function may be a sigmoid network layer and/or a softmax network layer.
For convenience of distinction, a network layer adopting the sigmoid function may be referred to as a sigmoid network layer; in the sigmoid network layer, the sigmoid function is used for mapping variables to the value interval of 0 to 1.
For convenience of distinction, a network layer adopting the softmax function may be called a softmax network layer; in the softmax network layer, the softmax function, also called the normalized exponential function, is a generalization of the binary classification function to multi-class classification and is used to present a multi-class classification result in the form of probabilities.
For example, for the image classification task, the penultimate network layer of a classification network often uses the softmax function to implement the classification function, as shown in fig. 2A, a schematic structural diagram of a classification network; therefore, the penultimate network layer can be used as the target network layer. For the image detection task, referring to fig. 2B, a schematic structural diagram of a detection network, the classification network layer often adopts the sigmoid function or the softmax function to realize the classification function, so the classification network layer can be used as the target network layer.
In summary, in this embodiment, the target network model includes a plurality of network layers, where the plurality of network layers include a target network layer for implementing the normalization function, and the target network layer is a sigmoid network layer and/or a softmax network layer in the target network model. The target network layer may further include a target-type operation operator for realizing the normalization function, where the target-type operation operator is used for realizing the functions of determining a shift operation feature based on the input feature, determining a limited range feature based on the input feature and the shift operation feature, and determining a mapping feature based on the shift operation feature and the limited range feature.
In this embodiment, all network layers in front of the target network layer may form a first sub-network, and the network layers behind the target network layer may form a second sub-network, so that the target network model includes the first sub-network, the target network layer, and the second sub-network, with each sub-network including at least one network layer. Referring to fig. 3, a schematic structural diagram of a target network model, the target network model may include network layer 1 through network layer 5 (5 network layers are used only as an example; the number of network layers may be greater than 5). Assuming that network layer 4 is a softmax network layer, network layer 4 may be used as the target network layer; the first sub-network then includes network layer 1, network layer 2, and network layer 3, and the second sub-network includes network layer 5.
After the target network model is obtained, the target network model can be deployed to the terminal device, the terminal device realizes image processing based on the target network model, and the image processing process is an artificial intelligence processing process based on the target network model. An image processing method provided in an embodiment of the present application may be applied to a terminal device, and as shown in fig. 4, is a flowchart of the method, and the method may include:
step 401, obtaining an input feature corresponding to an image to be processed.
For example, after the to-be-processed image is obtained, the to-be-processed image may be input to a first sub-network of the target network model, and the to-be-processed image is processed by the first sub-network, so as to obtain input features corresponding to the to-be-processed image, for example, the network layer 1, the network layer 2, and the network layer 3 sequentially process the to-be-processed image, and the processing process is not limited, so as to obtain input features corresponding to the to-be-processed image.
Step 402, determining a shift operation characteristic based on an input characteristic corresponding to an image to be processed, and determining a limited range characteristic based on the input characteristic and the shift operation characteristic; determining a mapping characteristic corresponding to the image to be processed based on the shifting operation characteristic and the limited range characteristic; and carrying out normalization operation based on the mapping characteristics to obtain normalization characteristics, and determining output characteristics corresponding to the image to be processed based on the normalization characteristics.
For example, after obtaining the input feature corresponding to the image to be processed, the input feature may be input to the target network layer, the target network layer determines a shift operation feature based on the input feature, determines a limited range feature based on the input feature and the shift operation feature, determines a mapping feature corresponding to the image to be processed based on the shift operation feature and the limited range feature, performs a normalization operation based on the mapping feature to obtain a normalization feature, and determines an output feature corresponding to the image to be processed based on the normalization feature.
In one possible embodiment, for any x, solving e^x can be converted into solving e^p within a limited range, 0 ≤ p < ln2, plus one shift operation: e^x = 2^z · e^p = e^(z·ln2) · e^p. Based on the above principle, in this embodiment, step 402 may be implemented by the following steps:
step 4021, determining a shift operation characteristic based on the input characteristic.
For example, based on the input feature, the following formula can be used to determine the shift operation feature:

z = ⌊x / ln2⌋

In the above formula, x represents the input feature, z represents the shift operation feature, and ⌊·⌋ denotes rounding down (so that 0 ≤ x − z·ln2 < ln2).
Step 4022, determining a limited range feature based on the input feature and the shift operation feature.
For example, the limited range feature can be determined using the following formula: p = x − z·ln2, where 0 ≤ p < ln2. In the above formula, x represents the input feature, z represents the shift operation feature, and p represents the limited range feature.
Step 4023, determining an exponential operation value based on the shift operation feature.
For example, based on the shift operation feature, the exponential operation value may be determined using the following formula: w = 2^z. In the above formula, z represents the shift operation feature and w represents the exponential operation value; since z is an integer, 2^z can be implemented as a shift operation.
Step 4024, determining a linear operation value based on the limited range feature.
For example, based on the limited range feature, the following formula may be used to determine the linear operation value: s = e^p. In the above formula, p represents the limited range feature, and s represents the linear operation value.
In one possible embodiment, for e^p, the linear operation value s may be determined by using a K-piecewise linear function based on the limited range feature, the preset threshold, and the target parameter values, where K may be a positive integer greater than 1, such as 2 or 3.
For example, to determine the linear operation value s, any of the following ways may be used:
in the method 1, when the K piecewise linear function is a 2 piecewise linear function, based on the limited range feature, the preset threshold and the target parameter value, the linear operation value s can be determined by using the following formula:
Figure BDA0003737559590000111
in the above formula, s represents a linear operation value, a 1 、b 1 、a 2 、b 2 Representing the target parameter value, p representing the range-defining feature, and 0.5 x ln2 representing the preset threshold. Wherein, with respect to a 1 、b 1 、a 2 、b 2 For example, for a p value, the s = e is obtained by s = e p And calculating an s value corresponding to the p value, wherein the p value and the s value are a set of calibration data. A can be calculated by substituting multiple sets of calibration data into the above formula 1 、b 1 、a 2 、b 2 The value of (a).
Mode 2: when the K-piecewise linear function is a 2-piecewise linear function, the linear operation value s may be determined based on the limited range feature, the preset threshold, and the target parameter values using the following formula:

s = a1·p + b1, if 0 ≤ p < 0.3·ln2
s = a2·p + b2, if 0.3·ln2 ≤ p < ln2

In the above formula, s represents the linear operation value, a1, b1, a2, b2 represent the target parameter values, p represents the limited range feature, and 0.3·ln2 represents the preset threshold. Regarding a1, b1, a2, b2: for a given p value, the s value corresponding to that p value is calculated by s = e^p, and this p value and s value form a set of calibration data. By substituting multiple sets of calibration data into the above formula, the values of a1, b1, a2, b2 can be calculated.
Mode 3: when the K-piecewise linear function is a 3-piecewise linear function, the linear operation value s may be determined based on the limited range feature, the preset thresholds, and the target parameter values using the following formula:

s = a1·p + b1, if 0 ≤ p < 0.33·ln2
s = a2·p + b2, if 0.33·ln2 ≤ p < 0.67·ln2
s = a3·p + b3, if 0.67·ln2 ≤ p < ln2

In the above formula, s represents the linear operation value, a1, b1, a2, b2, a3, b3 represent the target parameter values, p represents the limited range feature, and 0, 0.33·ln2, 0.67·ln2, ln2 each represent a preset threshold. Regarding a1, b1, a2, b2, a3, b3: for a given p value, the s value corresponding to that p value is calculated by s = e^p, and this p value and s value form a set of calibration data. On this basis, by substituting multiple sets of calibration data into the above formula, the values of a1, b1, a2, b2, a3, b3 can be calculated.
Of course, the above 3 ways are only a few examples of determining the linear operation value s. Under a 2-piecewise linear function, different preset thresholds correspond to different determination ways; the preset threshold is not limited here and can be configured according to experience. Under a 3-piecewise linear function, different preset thresholds likewise correspond to different determination ways, and the preset thresholds can also be configured according to experience. Of course, the K-piecewise linear function may also be a 4-piecewise linear function, a 5-piecewise linear function, and so on; the implementation principles are similar, and only different preset thresholds need to be designed. A sketch of the calibration procedure is given below.
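As a sketch of the calibration described in modes 1 to 3 above, the following Python snippet fits the per-segment parameter values a_k, b_k from sampled (p, s = e^p) calibration pairs; the dense sampling and the least-squares fit are illustrative assumptions, since the application only specifies substituting calibration data into the formulas:

```python
import numpy as np

LN2 = np.log(2.0)

def calibrate_segments(thresholds):
    """Fit (a_k, b_k) on each segment of [0, ln2) so that a_k * p + b_k
    approximates e^p, using least squares over sampled calibration pairs."""
    edges = [0.0] + list(thresholds) + [LN2]
    coeffs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        p = np.linspace(lo, hi, 256)           # calibration p values
        s = np.exp(p)                          # corresponding s = e^p values
        A = np.stack([p, np.ones_like(p)], 1)  # design matrix for a*p + b
        (a, b), *_ = np.linalg.lstsq(A, s, rcond=None)
        coeffs.append((a, b))
    return coeffs

coeffs2 = calibrate_segments([0.5 * LN2])               # K = 2, mode 1
coeffs3 = calibrate_segments([0.33 * LN2, 0.67 * LN2])  # K = 3, mode 3
```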
Step 4025, determining the mapping feature based on the exponential operation value and the linear operation value.
For example, based on the exponential operation value and the linear operation value, the following formula may be used to determine the mapping feature: k = w·s = 2^z · e^p. In the above formula, k represents the mapping feature, w represents the exponential operation value, and s represents the linear operation value.
The processing procedure of steps 4021 to 4025 is described below with reference to fig. 5. First, the shift operation feature z and the limited range feature p are determined based on the input feature x; then, the exponential operation value w is determined based on the shift operation feature z, and a piecewise linear function is used to perform fitting calculation based on the limited range feature p to obtain the linear operation value s; finally, the mapping feature can be calculated based on the exponential operation value w and the linear operation value s, as sketched below.
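Putting steps 4021 to 4025 together, a minimal Python sketch of the whole approximation might look as follows; the helper name approx_exp and the floor-based shift feature are illustrative choices consistent with the formulas above, and coeffs/thresholds are assumed to come from a calibration such as the one sketched earlier:

```python
import math

LN2 = math.log(2.0)

def approx_exp(x, coeffs, thresholds):
    """Approximate e^x as 2^z * (piecewise-linear e^p), per steps 4021-4025."""
    z = math.floor(x / LN2)   # step 4021: shift operation feature
    p = x - z * LN2           # step 4022: limited range feature, 0 <= p < ln2
    w = 2.0 ** z              # step 4023: exponential operation value (a shift, z is an integer)
    s = None
    for (a, b), hi in zip(coeffs, list(thresholds) + [LN2]):
        if p < hi:            # step 4024: K-piecewise linear operation value
            s = a * p + b
            break
    return w * s              # step 4025: mapping feature k = w * s
```

For example, with the K = 2 coefficients calibrated above, approx_exp(1.7, coeffs2, [0.5 * LN2]) stays close to math.exp(1.7).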
Step 4026, performing a normalization operation based on the mapping feature to obtain a normalized feature.
For example, after obtaining the mapping feature, a normalization operation may be performed based on the mapping feature to obtain a normalized feature; the normalized feature may be a value in a specified value interval, such as the interval 0 to 1. This embodiment does not limit the process of the normalization operation.
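For the softmax case, a sketch of how step 4026 could use the mapping features: each mapping feature is divided by the sum over all of them, so every normalized feature lands in the 0-1 interval (this relies on the illustrative approx_exp helper above):

```python
def softmax_from_mapping(xs, coeffs, thresholds):
    """Step 4026 for a softmax target layer: normalize mapping features to 0-1."""
    ks = [approx_exp(v, coeffs, thresholds) for v in xs]  # mapping features
    total = sum(ks)
    return [k / total for k in ks]                        # normalized features
```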
Step 4027, determining the output feature corresponding to the image to be processed based on the normalized feature.
For example, after obtaining the normalized feature, the target network layer may perform processing based on the normalized feature, and the processing is not limited, so as to obtain an output feature corresponding to the image to be processed.
Step 403, determining an image processing result corresponding to the image to be processed based on the output feature.
For example, after obtaining the output feature corresponding to the image to be processed, the output feature may be input to the second sub-network, so as to obtain the image processing result corresponding to the image to be processed. For example, after obtaining the output feature, the second sub-network may process the output feature to obtain an image processing result.
Referring to fig. 6, a schematic structural diagram of a target network model is shown, in which the network layer using the softmax function is the target network layer; the processing of the target network model is described below in combination with this structure.
After the image to be processed is obtained, convolution operation can be performed on the image to be processed through the first convolution layer to obtain a first convolution characteristic, convolution operation can be performed on the image to be processed through the second convolution layer to obtain a second convolution characteristic, and convolution operation can be performed on the image to be processed through the third convolution layer to obtain a third convolution characteristic. Based on the first convolution feature and the second convolution feature, the first convolution feature and the second convolution feature can be subjected to mul operation through the first mul network layer, and the input feature of the softmax network layer is obtained.
After the input features are obtained, the input features may be input to the softmax network layer, which is the target network layer of the above embodiments. Based on the input features, the softmax network layer can obtain the output features corresponding to the images to be processed by adopting the processing flows of the steps 4021 to the step 4027.
Based on the output feature and the third convolution feature corresponding to the image to be processed, the output feature and the third convolution feature can be subjected to a mul operation through the second mul network layer to obtain a mul operation feature; the mul operation feature is input to the FFN network layer and processed by the FFN network layer to obtain the image processing result (namely, the artificial intelligence processing result) corresponding to the image to be processed, completing the image processing process. A sketch of this data flow follows.
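A schematic, framework-free sketch of the fig. 6 data flow, under the assumption that the convolution layers, the mul layers, and the FFN layer are available as callables operating on NumPy arrays and reusing the illustrative approx_exp helper from earlier; all names and shapes here are assumptions for illustration:

```python
import numpy as np

def target_model_forward(image, conv1, conv2, conv3, ffn, coeffs, thresholds):
    """Fig. 6 data flow: (conv1 * conv2) -> softmax -> (* conv3) -> FFN."""
    f1, f2, f3 = conv1(image), conv2(image), conv3(image)
    x = f1 * f2                              # first mul network layer: input feature
    k = np.vectorize(lambda v: approx_exp(v, coeffs, thresholds))(x)
    out = k / k.sum(axis=-1, keepdims=True)  # softmax target layer (steps 4021-4027)
    return ffn(out * f3)                     # second mul network layer, then FFN
```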
According to the above technical solution, the e-exponent operation can be replaced with a shift operation and a linear operation; that is, the e-exponent is approximated by a combination of a shift operation (a power of 2) and a linear operation, which greatly reduces the resource overhead, while the approximation accuracy is improved by introducing the piecewise linear operation. The method reduces computational complexity, keeps the computation amount and resource consumption small, lowers the resource overhead of the chip, maintains high performance of the target network model, and can be widely applied to various artificial intelligence service scenarios, such as image classification, target detection, segmentation, and pose estimation.
Based on the same application concept as the method described above, an image processing apparatus is proposed in the embodiment of the present application, and as shown in fig. 7, the image processing apparatus is a schematic structural diagram of the apparatus, and the apparatus may include:
an obtaining module 71, configured to obtain an input feature corresponding to an image to be processed;
a determining module 72, configured to determine a shift operation feature based on the input feature, and determine a limited range feature based on the input feature and the shift operation feature; determine a mapping feature corresponding to the image to be processed based on the shift operation feature and the limited range feature; and perform a normalization operation based on the mapping feature to obtain a normalized feature, and determine an output feature corresponding to the image to be processed based on the normalized feature;
and the processing module 73 is configured to determine an image processing result corresponding to the image to be processed based on the output feature.
In a possible implementation manner, when determining the mapping feature corresponding to the image to be processed based on the shift operation feature and the limited range feature, the determining module 72 is specifically configured to: determine an exponential operation value based on the shift operation feature; determine a linear operation value based on the limited range feature; and determine the mapping feature based on the exponential operation value and the linear operation value.
For example, when determining the linear operation value based on the limited range feature, the determining module 72 is specifically configured to: determine the linear operation value by using a K-piecewise linear function based on the limited range feature, a preset threshold, and calibrated target parameter values, where K is a positive integer greater than 1.
For example, the determining module 72 is specifically configured to determine the linear operation value by using a K-piecewise linear function based on the limited range feature, a preset threshold and a calibrated target parameter value:
when K is 2, the linear operation value is determined using the following formula:

s = a1·p + b1, if 0 ≤ p < 0.5·ln2
s = a2·p + b2, if 0.5·ln2 ≤ p < ln2

wherein s represents the linear operation value, a1, b1, a2, b2 represent the target parameter values, p represents the limited range feature, and 0.5·ln2 represents the preset threshold;

or, when K is 2, the linear operation value is determined using the following formula:

s = a1·p + b1, if 0 ≤ p < 0.3·ln2
s = a2·p + b2, if 0.3·ln2 ≤ p < ln2

wherein s represents the linear operation value, a1, b1, a2, b2 represent the target parameter values, p represents the limited range feature, and 0.3·ln2 represents the preset threshold;

or, when K is 3, the linear operation value is determined using the following formula:

s = a1·p + b1, if 0 ≤ p < 0.33·ln2
s = a2·p + b2, if 0.33·ln2 ≤ p < 0.67·ln2
s = a3·p + b3, if 0.67·ln2 ≤ p < ln2

wherein s represents the linear operation value, a1, b1, a2, b2, a3, b3 represent the target parameter values, p represents the limited range feature, and 0.33·ln2, 0.67·ln2, ln2 represent the preset thresholds.
Illustratively, the target network model includes a plurality of network layers, where the plurality of network layers include a target network layer for implementing the normalization function, a network layer in front of the target network layer forms a first sub-network, and a network layer behind the target network layer forms a second sub-network, and the obtaining module 71 is specifically configured to: inputting an image to be processed to a first sub-network to obtain input characteristics corresponding to the image to be processed; the determining module 72 is specifically configured to: inputting the input features to the target network layer, determining shifting operation features by the target network layer based on the input features, determining limited range features based on the input features and the shifting operation features, determining mapping features corresponding to the image to be processed based on the shifting operation features and the limited range features, performing normalization operation based on the mapping features to obtain normalized features, and determining output features corresponding to the image to be processed based on the normalized features; the processing module 73 is specifically configured to: and inputting the output characteristics to a second sub-network to obtain an image processing result corresponding to the image to be processed.
Illustratively, the target network layer is a sigmoid network layer in the target network model; or, the target network layer is a softmax network layer in the target network model.
Illustratively, the processing module 73 is further configured to, after a trained network model is obtained, replace the e-exponent operator of the target network layer in the trained network model with a target-type operator to obtain the target network model; wherein the target-type operator is used for realizing the functions of determining a shift operation feature based on the input feature, determining a limited range feature based on the input feature and the shift operation feature, and determining a mapping feature based on the shift operation feature and the limited range feature.
Based on the same application concept as the method described above, the embodiment of the present application provides an image processing apparatus, as shown in fig. 8, the image processing apparatus includes a processor 81 and a machine-readable storage medium 82, and the machine-readable storage medium 82 stores machine-executable instructions capable of being executed by the processor 81; the processor 81 is configured to execute machine-executable instructions to implement the image processing methods disclosed in the above examples.
Based on the same application concept as the method, embodiments of the present application further provide a machine-readable storage medium, where several computer instructions are stored, and when the computer instructions are executed by a processor, the image processing method disclosed in the above example of the present application can be implemented.
The machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), a similar storage medium, or a combination thereof.
The systems, apparatuses, modules or units described in the above embodiments may be specifically implemented by a computer chip or an entity, or implemented by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being divided into various units by function, and are described separately. Of course, the functionality of the units may be implemented in one or more software and/or hardware when implementing the present application.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (10)

1. An image processing method, characterized in that the method comprises:
acquiring input characteristics corresponding to an image to be processed;
determining a shift operation feature based on the input feature, and determining a limited range feature based on the input feature and the shift operation feature; determining a mapping feature corresponding to the image to be processed based on the shift operation feature and the limited range feature; and performing a normalization operation based on the mapping feature to obtain a normalized feature, and determining an output feature corresponding to the image to be processed based on the normalized feature;
and determining an image processing result corresponding to the image to be processed based on the output characteristic.
2. The method according to claim 1, wherein the determining the mapping feature corresponding to the image to be processed based on the shift operation feature and the limited range feature comprises:
determining an exponential operation value based on the shift operation feature;
determining a linear operation value based on the limited range feature;
determining the mapping characteristic based on the exponential operation value and the linear operation value.
3. The method of claim 2,
the determining a linear operation value based on the limited range feature comprises:
determining the linear operation value by using a K-piecewise linear function based on the limited range feature, a preset threshold, and calibrated target parameter values, wherein K is a positive integer greater than 1.
4. The method of claim 3,
wherein the determining the linear operation value by using a K-piecewise linear function based on the limited range feature, a preset threshold, and calibrated target parameter values comprises:
when K is 2, the linear operation value is determined using the following formula:
Figure FDA0003737559580000011
wherein s represents the linear operationValue of a 1 、b 1 、a 2 、b 2 Representing the target parameter value, p representing the range-defining feature, 0.5 x ln2 representing the preset threshold;
or, when K is 2, determining the linear operation value by using the following formula:
s = a1 × p + b1, when p < 0.3 × ln2
s = a2 × p + b2, when p ≥ 0.3 × ln2

wherein s represents the linear operation value, a1, b1, a2, b2 represent the target parameter values, p represents the range-defining feature, and 0.3 × ln2 represents the preset threshold;
alternatively, when K is 3, the linear operation value is determined using the following equation:
s = a1 × p + b1, when p < 0.33 × ln2
s = a2 × p + b2, when 0.33 × ln2 ≤ p < 0.67 × ln2
s = a3 × p + b3, when 0.67 × ln2 ≤ p < ln2

wherein s represents the linear operation value, a1, b1, a2, b2, a3, b3 represent the target parameter values, p represents the range-defining feature, and 0.33 × ln2, 0.67 × ln2 and ln2 represent the preset thresholds.
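A sketch of the two-piece variant of claim 4. The calibrated target parameter values are not disclosed in this publication, so the a/b numbers below are illustrative placeholders from a rough straight-line fit of e^p on each piece:

    import numpy as np

    LN2 = float(np.log(2.0))

    def linear_value_k2(p, a1, b1, a2, b2, threshold=0.5 * LN2):
        # Two-piece linear approximation of e^p, switching at the preset threshold.
        return np.where(p < threshold, a1 * p + b1, a2 * p + b2)

    p = np.linspace(0.0, LN2, 8, endpoint=False)
    s = linear_value_k2(p, a1=1.17, b1=0.99, a2=1.62, b2=0.84)
    print(np.abs(s - np.exp(p)).max())   # worst-case error of this sketch

The K=3 variant adds a third piece in the same pattern; more pieces shrink the approximation error at the cost of a few extra comparisons.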
5. The method according to any one of claims 1 to 4,
the target network model comprises a plurality of network layers, the plurality of network layers comprise a target network layer for realizing the normalization function, the network layers in front of the target network layer form a first sub-network, and the network layers behind the target network layer form a second sub-network, and the method specifically comprises the following steps:
inputting an image to be processed to a first sub-network to obtain an input characteristic corresponding to the image to be processed;
inputting the input features to the target network layer, determining shifting operation features by the target network layer based on the input features, determining limited range features based on the input features and the shifting operation features, determining mapping features corresponding to the image to be processed based on the shifting operation features and the limited range features, performing normalization operation based on the mapping features to obtain normalized features, and determining output features corresponding to the image to be processed based on the normalized features;
and inputting the output characteristics to a second sub-network to obtain an image processing result corresponding to the image to be processed.
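A runnable sketch of the three-stage split in claim 5; the stand-in callables are hypothetical and only illustrate the data flow:

    import numpy as np

    def run_model(image, first_subnet, target_layer, second_subnet):
        features = first_subnet(image)        # input features
        normalized = target_layer(features)   # shift, range, mapping, normalization
        return second_subnet(normalized)      # image processing result

    # Hypothetical stand-ins so the sketch executes end to end.
    first_subnet = lambda img: img.reshape(-1)[:4].astype(float)
    target_layer = lambda f: np.exp(f - f.max()) / np.exp(f - f.max()).sum()
    second_subnet = lambda f: int(np.argmax(f))

    print(run_model(np.arange(16).reshape(4, 4), first_subnet, target_layer, second_subnet))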
6. The method of claim 5,
the target network layer is a sigmoid network layer in the target network model; or,
the target network layer is a softmax network layer in the target network model.
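For the sigmoid option, the same approximate exponent can serve as the building block. A hedged sketch, reusing the assumed decomposition from the sketch after claim 1:

    import numpy as np

    LN2 = float(np.log(2.0))

    def approx_exp(x):
        # Assumed shift + range decomposition; np.exp(p) stands in for
        # the piecewise linear part of claim 4.
        n = np.floor(x / LN2)
        return np.exp2(n) * np.exp(x - n * LN2)

    def sigmoid_layer(x):
        # sigmoid(x) = 1 / (1 + e^(-x)), with the e-exponent made shift-friendly.
        return 1.0 / (1.0 + approx_exp(-x))

    print(sigmoid_layer(np.array([-2.0, 0.0, 2.0])))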
7. The method of claim 5, wherein after the trained network model is obtained, an e-exponent operator of a target network layer in the trained network model is replaced with a target type operator to obtain the target network model; wherein the target type operator is used to determine a shift operation feature based on the input feature, determine a limited range feature based on the input feature and the shift operation feature, and determine a mapping feature based on the shift operation feature and the limited range feature.
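Claim 7's post-training operator swap might look like the following; the list-of-operators model representation is purely hypothetical (real frameworks expose analogous graph-rewrite hooks), as is the "softmax_exp" name:

    import numpy as np

    LN2 = float(np.log(2.0))

    def shift_based_exp(x):
        # Target type operator: shift feature, range feature, then mapping.
        n = np.floor(x / LN2)
        p = x - n * LN2
        return np.exp2(n) * (1.0 + p)   # crude one-piece linear stand-in for e^p

    def replace_exp_operator(model):
        # Swap only the e-exponent operator of the target layer after training.
        return [(name, shift_based_exp if name == "softmax_exp" else op)
                for name, op in model]

    trained = [("conv1", np.tanh), ("softmax_exp", np.exp)]
    target_model = replace_exp_operator(trained)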
8. An image processing apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring the input characteristics corresponding to the image to be processed;
a determination module for determining a shift operation feature based on the input feature, and determining a range-defining feature based on the input feature and the shift operation feature; determining a mapping feature corresponding to the image to be processed based on the shift operation feature and the range-defining feature; performing a normalization operation based on the mapping feature to obtain a normalized feature, and determining an output feature corresponding to the image to be processed based on the normalized feature;
and the processing module is used for determining an image processing result corresponding to the image to be processed based on the output characteristic.
9. The apparatus according to claim 8, wherein the determining module, when determining the mapping feature corresponding to the image to be processed based on the shift operation feature and the limited-range feature, is specifically configured to: determining an exponential operation value based on the shift operation characteristic; determining a linear operation value based on the range-defining feature; determining the mapping feature based on the exponential operation value and the linear operation value;
wherein the determining module is specifically configured to, when determining the linear operation value based on the range-defining feature: determining the linear operation value by adopting a K piecewise linear function based on the limited range characteristic, a preset threshold and a calibrated target parameter value, wherein K is a positive integer greater than 1;
the determining module, when determining the linear operation value by using the K piecewise linear function based on the limited range feature, the preset threshold and the calibrated target parameter value, is specifically configured to:
when K is 2, the linear operation value is determined using the following formula:
s = a1 × p + b1, when p < 0.5 × ln2
s = a2 × p + b2, when p ≥ 0.5 × ln2

wherein s represents the linear operation value, a1, b1, a2, b2 represent the target parameter values, p represents the range-defining feature, and 0.5 × ln2 represents the preset threshold;
or, when K is 2, determining the linear operation value by using the following formula:
s = a1 × p + b1, when p < 0.3 × ln2
s = a2 × p + b2, when p ≥ 0.3 × ln2

wherein s represents the linear operation value, a1, b1, a2, b2 represent the target parameter values, p represents the range-defining feature, and 0.3 × ln2 represents the preset threshold;
or, when K is 3, determining the linear operation value by using the following formula:
s = a1 × p + b1, when p < 0.33 × ln2
s = a2 × p + b2, when 0.33 × ln2 ≤ p < 0.67 × ln2
s = a3 × p + b3, when 0.67 × ln2 ≤ p < ln2

wherein s represents the linear operation value, a1, b1, a2, b2, a3, b3 represent the target parameter values, p represents the range-defining feature, and 0.33 × ln2, 0.67 × ln2 and ln2 represent the preset thresholds;
the target network model includes a plurality of network layers, the plurality of network layers include a target network layer for implementing a normalization function, a network layer in front of the target network layer forms a first sub-network, a network layer behind the target network layer forms a second sub-network, and the obtaining module is specifically configured to: inputting an image to be processed to a first sub-network to obtain an input characteristic corresponding to the image to be processed; the determining module is specifically configured to: inputting the input features to the target network layer, determining shifting operation features by the target network layer based on the input features, determining limited range features based on the input features and the shifting operation features, determining mapping features corresponding to the image to be processed based on the shifting operation features and the limited range features, performing normalization operation based on the mapping features to obtain normalized features, and determining output features corresponding to the image to be processed based on the normalized features; the processing module is specifically configured to: inputting the output characteristics to a second sub-network to obtain an image processing result corresponding to the image to be processed;
wherein the target network layer is a sigmoid network layer in the target network model; or, the target network layer is a softmax network layer in the target network model;
the processing module is further configured to replace an e-exponent operator of a target network layer in the trained network model with a target type operator after the trained network model is obtained, so as to obtain the target network model; wherein the target type operator is used to determine a shift operation feature based on the input feature, determine a limited range feature based on the input feature and the shift operation feature, and determine a mapping feature based on the shift operation feature and the limited range feature.
10. An image processing apparatus characterized by comprising: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor; the processor is configured to execute machine executable instructions to perform the method steps of any of claims 1-7.
CN202210815529.3A 2022-07-08 2022-07-08 Image processing method, device and equipment Pending CN115187845A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210815529.3A CN115187845A (en) 2022-07-08 2022-07-08 Image processing method, device and equipment

Publications (1)

Publication Number Publication Date
CN115187845A true CN115187845A (en) 2022-10-14

Family

ID=83517060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210815529.3A Pending CN115187845A (en) 2022-07-08 2022-07-08 Image processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN115187845A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination