CN111178520A - Data processing method and device of low-computing-capacity processing equipment - Google Patents
- Publication number
- CN111178520A (application number CN202010011285.4A)
- Authority
- CN
- China
- Prior art keywords
- neural network
- training
- sparse
- processing
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/061—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Molecular Biology (AREA)
- General Health & Medical Sciences (AREA)
- Mathematical Physics (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Software Systems (AREA)
- Artificial Intelligence (AREA)
- Neurology (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Image Analysis (AREA)
- Feedback Control In General (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a data processing method and device of a low-computing-capacity processing device. The method comprises the following steps: in the real-time computer vision processing process, a processing device with low computing power acquires image data; the processing device uses a preset neural network to perform computer vision processing on the acquired image data to obtain a computer vision processing result. The preset neural network is a target neural network obtained by the following process: constructing an initial neural network, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the outputs of the corresponding specific structures; training the weights of the initial neural network and the sparse scaling operators of the specific structures by adopting preset training sample data to obtain an intermediate neural network; and deleting the specific structures whose sparse scaling operators are zero in the intermediate neural network to obtain the target neural network.
Description
Technical Field
The present invention relates to the field of computers, and in particular, to a data processing method and apparatus for a low-computing-capability processing device.
Background
In recent years, deep neural networks have achieved great success in many areas, such as computer vision and natural language processing. However, a deep neural network model often includes a large number of model parameters, requires a large amount of calculation, and has a low processing speed, so it cannot run in real time on some devices with low power consumption and low computing power (such as embedded devices and integrated devices). In this context, a device is regarded as a low-computing-power device if its computing power is lower than the computing power required by the computing task or computing model deployed on it.
To solve this problem, some solutions are currently proposed:
Solution 2, presented by Jose M. Alvarez in the paper "Learning the Number of Neurons in Deep Networks", is basically the same as solution 1, except that in solution 2 the group sparsity constraints applied to the neurons of each layer are different, i.e. the strength of the group constraint differs between layers.
Solution 3: Wei Wen, in the paper "Learning Structured Sparsity in Deep Neural Networks", points out that deploying such large models requires a large amount of computing and storage resources. The solution proposed by that paper is to use group sparsity constraints to learn, for example, the number of neurons, the shape of neurons, and the depth of cross-layer connected network layers.
However, the neural networks obtained according to these solutions still cannot simultaneously meet the requirements of compact structure, high running speed and high precision, so they cannot run in real time on devices with low computing power.
Disclosure of Invention
In an embodiment of the present application, on the one hand, a data processing method for a low computing power processing device is provided, where the method includes:
in the real-time computer vision processing process, a processing device with low computing power acquires image data;
the processing equipment uses a preset neural network to perform computer vision processing on the acquired image data to obtain a computer vision processing result; the preset neural network is a target neural network obtained by the following processing:
constructing an initial neural network for realizing computer vision processing, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the output of the corresponding specific structures;
training the weight of the initial neural network and the sparse scaling operator with a specific structure by adopting preset training sample data to obtain an intermediate neural network;
and deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing the computer vision processing.
In another aspect, an embodiment of the present application provides a data processing apparatus for a low-computation-power processing device, where the apparatus includes: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring image data in a real-time computer vision processing process;
performing computer vision processing on the acquired image data by using a preset neural network to obtain a computer vision processing result; wherein the preset neural network is a target neural network obtained by a construction device, and the construction device includes:
the first construction unit is used for constructing an initial neural network, and a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators which are used for scaling the output of the corresponding specific structures;
the training unit is used for training the weight of the initial neural network and the sparse scaling operator with the specific structure by adopting preset training sample data to obtain an intermediate neural network;
and the second construction unit is used for deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network.
In another aspect of the present application, a data processing method of a low computing power processing device is provided, including:
in the real-time natural language processing process, processing equipment with low computing power acquires text data;
the processing equipment uses a preset neural network to perform natural language processing on the acquired text data to obtain a natural language processing result; the preset neural network is a target neural network obtained by the following processing:
constructing an initial neural network for realizing natural language processing, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the output of the corresponding specific structures;
training the weight of the initial neural network and the sparse scaling operator with a specific structure by adopting preset training sample data to obtain an intermediate neural network;
and deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing natural language processing.
In another aspect of the present application, a data processing apparatus of a low computing power processing device is provided, including: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring text data in a real-time natural language processing process;
carrying out natural language processing on the acquired text data by using a preset neural network to obtain a natural language processing result; wherein the preset neural network is a target neural network obtained by a construction device, and the construction device includes:
the third construction unit is used for constructing an initial neural network for realizing natural language processing, and a plurality of specific structures preset in the initial neural network are respectively provided with corresponding sparse scaling operators which are used for scaling the output of the corresponding specific structures;
the training unit is used for training the weight of the initial neural network and the sparse scaling operator with the specific structure by adopting preset training sample data to obtain an intermediate neural network;
and the fourth construction unit is used for deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing natural language processing.
According to the data processing method of the low-computing-capacity processing device, on the one hand, the sparse scaling operators are introduced to scale the outputs of different specific structures, so no new constraint needs to be added to the weights; the weights and the sparse scaling operators can be optimized independently, which can improve the precision of the neural network. On the other hand, a specific structure whose sparse scaling operator is zero contributes nothing to the output of the neural network, so deleting such structures does not affect the precision of the neural network while simplifying it and improving its running speed. Thus, a low-computing-power processing device may apply the neural network described above to perform real-time computer vision processing, or to perform real-time natural language processing.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a flow chart of a method for constructing a neural network according to an embodiment of the present invention;
FIG. 2 is a block diagram of a particular architecture of an embodiment of the present invention;
FIG. 3 is a diagram of a residual block in a residual network according to an embodiment of the present invention;
FIG. 4 is a diagram of a specific structure module according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a neuron as a specific structure in accordance with an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an apparatus for constructing a neural network according to an embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiment of the present invention will be clearly and completely described below with reference to the drawings in the embodiment of the present invention, and it is obvious that the described embodiment is only a part of the embodiment of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1, a flowchart of a method for constructing a neural network according to an embodiment of the present invention is shown, where the method includes:
Step 101, constructing an initial neural network, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the outputs of the corresponding specific structures.
Step 102, training the weights of the initial neural network and the sparse scaling operators of the specific structures by adopting preset training sample data to obtain an intermediate neural network.
Step 103, deleting the specific structures whose sparse scaling operators are zero in the intermediate neural network to obtain the target neural network.
Preferably, the foregoing step 101 can be realized by the following steps A1 to A3:
Step A1, selecting a neural network model.
According to the embodiment of the invention, a neural network model corresponding to the function to be realized by the desired target neural network (such as a computer vision processing function, e.g. image segmentation, object detection or face recognition, or a natural language processing function) may be selected from a preset set of neural network models, or a corresponding neural network model may be constructed according to that function. The present application is not strictly limited in this respect.
Step A2, determining the specific structures of the neural network model that need to be provided with sparse scaling operators.
In the embodiment of the present invention, a designer may determine the specific structures in the neural network model. For example, all or a portion of the neurons of one or more network layers in the neural network may be determined as specific structures. And/or, one or more modules in the neural network having the following characteristics may be determined as specific structures: characteristic 1, the module includes more than one network layer (for example, more than two cascaded network layers); characteristic 2, the module is connected in parallel with other modules, or the front and back ends of the module have a cross-layer connection. And/or, one or more modules in the neural network having the following characteristics may be determined as specific structures: characteristic 1, the module includes more than one module (for example, more than two modules connected in parallel); characteristic 2, the front and back ends of the module have a cross-layer connection.
Step A3, setting an initial sparse scaling operator for each specific structure in the neural network model to obtain the initial neural network.
In the embodiment of the invention, the value of the sparse scaling operator of each specific structure is greater than or equal to 0. Preferably, the value of the initial sparse scaling operator is close to 1; for example, it may be set directly to 1.
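As a minimal illustration of steps A1 to A3 (an assumption-laden sketch, not the patent's implementation), the following PyTorch-style code shows one way a specific structure could be wrapped with a learnable sparse scaling operator initialized to 1; the class name ScaledStructure and the use of PyTorch are choices made here for illustration only.

```python
# Illustrative sketch: attach a sparse scaling operator to a "specific structure"
# (e.g. a residual block); the operator scales the structure's output.
import torch
import torch.nn as nn

class ScaledStructure(nn.Module):
    """Wraps a specific structure and scales its output by a sparse scaling operator."""
    def __init__(self, structure: nn.Module, init_value: float = 1.0):
        super().__init__()
        self.structure = structure
        # One scalar sparse scaling operator per specific structure, initialized close to 1.
        self.scale = nn.Parameter(torch.tensor(init_value))

    def forward(self, x):
        # Scale the output of the corresponding specific structure.
        return self.scale * self.structure(x)

# Example usage (hypothetical): block = ScaledStructure(residual_block)
```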
Preferably, in the embodiment of the present invention, the step 102 may be specifically realized by the following steps B1 to B3:
and B1, constructing an objective function corresponding to the initial neural network, wherein the objective function comprises a loss function and a sparse regular function. The objective function is shown in equation (1):
min_{W,λ} (1/N) Σ_{i=1}^{N} L(x_i; W, λ) + R_s(λ)    formula (1)
In formula (1), W is the weight of the neural network, λ is the sparse scaling operator vector of the neural network, N is the number of training sample data, L(x_i; W, λ) is the loss of the neural network on sample data x_i, and R_s(λ) is a sparse regularization function.
Step B2, performing iterative training on the initial neural network by adopting the training sample data.
Step B3, when the number of iterative training times reaches a threshold value or the objective function meets a preset convergence condition, obtaining the intermediate neural network.
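Continuing the ScaledStructure sketch above, the following is a minimal sketch of the objective of step B1, taking the sparse regularization as the l1 penalty γ‖λ‖_1 preferred later in this description; the function name objective and the value of gamma are illustrative assumptions, not taken from the patent.

```python
# Illustrative sketch of formula (1): task loss plus an L1 sparse regularization
# on the vector of sparse scaling operators.
def objective(model, criterion, inputs, targets, gamma=1e-4):
    task_loss = criterion(model(inputs), targets)        # (1/N) * sum_i L(x_i; W, lambda)
    lambdas = [m.scale for m in model.modules()
               if isinstance(m, ScaledStructure)]
    sparse_reg = gamma * sum(l.abs() for l in lambdas)   # R_s(lambda) = gamma * ||lambda||_1
    return task_loss + sparse_reg
```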
Preferably, the step B2 may be implemented by performing the following iterative training on the initial neural network a plurality of times. The process is described by taking an iteration that is neither the first nor the last iteration (hereinafter referred to as the current iterative training) as an example; the iterative training includes the following steps C1 to C3:
Step C1, taking the sparse scaling operator obtained by the previous iterative training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iterative training;
Step C2, taking the weight of the current iterative training as a constant of the objective function, taking the sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the current iterative training;
Step C3, performing the next iterative training based on the weight and the sparse scaling operator of the current iterative training.
The first iterative training process is as follows: taking an initial sparse scaling operator as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the iterative training; taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training; and performing second iterative training based on the weight of the iterative training and the sparse scaling operator.
The last iteration training process is as follows: taking a sparse scaling operator obtained by previous iterative training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iterative training; taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training; and taking the neural network containing the sparse scaling operator and the weight obtained by the iterative training as an intermediate neural network.
Preferably, in the embodiment of the present invention, the first optimization algorithm may be, but is not limited to, any one of the following algorithms: a stochastic gradient descent algorithm, or a variant thereof that introduces momentum.
Preferably, in the embodiment of the present invention, the second optimization algorithm may be, but is not limited to, any one of the following algorithms: an accelerated proximal gradient descent algorithm, a proximal gradient descent algorithm, and an alternating direction multiplier algorithm.
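Putting steps C1 to C3 together, the following is a minimal sketch of one round of the alternating optimization; it assumes a PyTorch-style setup, plain SGD for the weights, and a single proximal soft-threshold step for the operators (as in the formulas further below). All names and hyper-parameters are illustrative assumptions, not the patent's reference implementation.

```python
# Illustrative sketch of one iterative-training round (steps C1-C3).
import torch

def alternating_round(model, criterion, inputs, targets,
                      w_params, lambda_params, gamma=1e-4, lr_w=1e-2, lr_l=1e-2):
    # Step C1: sparse scaling operators fixed, update the weights W by one
    # stochastic gradient descent step on the task loss.
    loss = criterion(model(inputs), targets)
    grads_w = torch.autograd.grad(loss, w_params)
    with torch.no_grad():
        for p, g in zip(w_params, grads_w):
            p -= lr_w * g

    # Step C2: weights fixed, update the operators lambda by one proximal
    # gradient step on loss + gamma * ||lambda||_1 (soft-thresholding).
    loss = criterion(model(inputs), targets)
    grads_l = torch.autograd.grad(loss, lambda_params)
    with torch.no_grad():
        for lam, g in zip(lambda_params, grads_l):
            z = lam - lr_l * g
            lam.copy_(torch.sign(z) * torch.clamp(z.abs() - lr_l * gamma, min=0.0))
    # Step C3: the next round repeats C1 and C2 with the updated W and lambda.
```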
Preferably, in another embodiment, the objective function in the embodiment of the present invention includes a loss function, a weight regularization function and a sparsity regularization function, and the objective function is represented by equation (2):
min_{W,λ} (1/N) Σ_{i=1}^{N} L(x_i; W, λ) + R(W) + R_s(λ)    formula (2)
In formula (2), W is the weight of the neural network, λ is the sparse scaling operator vector of the neural network, N is the number of training sample data, L(x_i; W, λ) is the loss of the neural network on sample data x_i, R(W) is a weight regularization function, and R_s(λ) is a sparse regularization function.
Preferably, in the embodiments of the present invention, R_s(λ) is a sparse l1 regularization with weight γ, i.e. R_s(λ) = γ‖λ‖_1. Of course, those skilled in the art may also set R_s(λ) to a more complex sparse constraint, such as a non-convex sparse constraint.
To describe in further detail how W and λ in the objective function are solved in the embodiment of the present invention, the solution obtained in one iterative training of the optimization of the objective function is described by taking the objective function of formula (2) with R_s(λ) = γ‖λ‖_1 as an example, to obtain W and λ. With W fixed, the data term (1/N) Σ_{i=1}^{N} L(x_i; W, λ) is denoted as G(λ).
With λ as a constant and W as a variable, the objective function is converted into min_W (1/N) Σ_{i=1}^{N} L(x_i; W, λ) + R(W). The value of W can be solved by adopting a stochastic gradient descent algorithm, and the specific process is not described in detail.
With W as a constant and λ as a variable, the objective function is converted into min_λ G(λ) + γ‖λ‖_1. The value of λ is solved by adopting an accelerated proximal gradient descent algorithm, which can specifically be done in any one of the following ways:
Mode 1: λ is obtained by the following formulas (3) to (5):
d_{t-1} = λ_{t-1} + ((t-2)/(t+1)) (λ_{t-1} - λ_{t-2})    formula (3)
z_t = d_{t-1} - η_t ∇G(d_{t-1})    formula (4)
λ_t = S_{η_t γ}(z_t)    formula (5)
wherein η_t represents the step size of the gradient descent at the t-th iterative training, and S_α(·) is the soft-threshold operator, defined component-wise as (S_α(z))_i = sign(z_i) · max(|z_i| - α, 0).
Mode 2: the solution of λ in the aforementioned mode 1 requires an additional forward and backward calculation to obtain ∇G(d_{t-1}), and applying that algorithm directly in an existing deep learning framework is somewhat difficult. Therefore, mode 2 modifies the formulas of mode 1 to obtain formulas (6) to (8), and calculates λ from formulas (6) to (8):
z_t = λ_{t-1} - η_t ∇G(λ_{t-1})    formula (6)
v_t = S_{η_t γ}(z_t) - λ_{t-1} + μ_{t-1} v_{t-1}    formula (7)
λ_t = λ_{t-1} + v_t    formula (8)
wherein v_t is the update quantity of the t-th iterative training and μ_{t-1} is a momentum coefficient.
Mode 3: the embodiment of the present invention further provides a simpler calculation of λ by the following formulas (9) to (11), to further reduce the difficulty:
z_t = λ'_{t-1} - η_t ∇G(λ'_{t-1})    formula (9)
v_t = S_{η_t γ}(z_t) - λ'_{t-1} + μ_{t-1} v_{t-1}    formula (10)
λ_t = λ_{t-1} + v_t    formula (11)
wherein λ'_{t-1} = λ_{t-1} + μ_{t-1} v_{t-1}, μ is a preset fixed value, and W and λ are updated in the form of mini-batch stochastic gradient descent.
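The soft-threshold operator and a single mode-1 update of λ can be sketched as follows; this is an illustration under the assumption that a callable grad_G returns ∇G at a given point, and is not the patent's reference implementation.

```python
# Hedged sketch of the soft-threshold operator S_alpha and one update of lambda
# in the style of formulas (3)-(5).
import torch

def soft_threshold(z: torch.Tensor, alpha: float) -> torch.Tensor:
    # (S_alpha(z))_i = sign(z_i) * max(|z_i| - alpha, 0)
    return torch.sign(z) * torch.clamp(z.abs() - alpha, min=0.0)

def apg_lambda_step(lam_prev, lam_prev2, t, eta, gamma, grad_G):
    # formula (3): momentum point from the two previous iterates
    d = lam_prev + (t - 2.0) / (t + 1.0) * (lam_prev - lam_prev2)
    # formula (4): gradient step on G at the momentum point
    z = d - eta * grad_G(d)
    # formula (5): proximal step for gamma * ||lambda||_1 via soft-thresholding
    return soft_threshold(z, eta * gamma)
```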
Specific structures are described in detail below, taking modules and neurons as examples respectively.
As shown in fig. 2, it is assumed that the neural network includes N modules, each module corresponds to a sparse scaling operator, and the front end and the back end of each module have cross-layer connection.
Taking a specific example, assume that the neural network is a residual network and the specific structures are residual modules. As shown in fig. 3, the front and back ends of each residual module have a cross-layer connection, and the i-th residual module corresponds to a sparse scaling operator λ_i. Then:
if, after the residual network is trained, the sparse scaling operator λ_3 of the third residual module is equal to 0, the third residual module in the residual network is deleted.
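Continuing the residual-network example, the following sketch (the attribute name scale and the helper name are assumptions introduced here) illustrates deleting residual modules whose sparse scaling operator is zero; because such a module's output is scaled to zero, only the cross-layer identity path remains, so removal does not change the network's output.

```python
# Hedged pruning sketch: m.scale is assumed to hold the module's operator lambda_i
# (as in the ScaledStructure sketch above); modules with zero operators are dropped.
import torch.nn as nn

def prune_zero_modules(residual_modules: nn.ModuleList, eps: float = 0.0) -> nn.ModuleList:
    kept = [m for m in residual_modules if abs(float(m.scale)) > eps]
    return nn.ModuleList(kept)
```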
As shown in fig. 4, it is assumed that the neural network includes N modules, each module includes M parallel sub-modules, each sub-module includes a plurality of cascaded network layers, and each sub-module corresponds to one sparse scaling operator.
As shown in fig. 5, assuming that the neural network includes L network layers and the l-th network layer includes k neurons, each of the k neurons corresponds to one sparse scaling operator.
For the neural network obtained through the above processing, on the one hand, the sparse scaling operators are introduced to scale the outputs of different specific structures, so no new constraint needs to be added to the weights; the weights and the sparse scaling operators can be optimized independently, which can improve the precision of the neural network. On the other hand, a specific structure whose sparse scaling operator is zero contributes nothing to the output of the neural network, so deleting such structures does not affect the precision of the neural network while simplifying it and improving its running speed.
The neural network can thus be applied in low computing power processing devices for real-time computer vision processing of image data or real-time natural language processing of text data. Low computing power processing devices include low computing power integrated or embedded devices, low computing power computing platforms, low computing power mobile devices, and the like.
In one example embodiment, a method for data processing by a low computing power processing device includes:
and 2, the processing equipment performs computer vision processing on the acquired image data by using a preset neural network to obtain a computer vision processing result.
Wherein the preset neural network is a target neural network obtained by processing as shown in fig. 1.
Through the processing, the low-computing-capacity processing equipment can efficiently and quickly process the acquired image data in real-time computer vision processing through the pre-configured neural network. Thereby enabling a low computing power processing device to perform real-time computer vision processing through the configured neural network.
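As a rough illustration of this deployment scenario (not a required implementation), the following sketch assumes hypothetical helpers load_target_network and read_frame and applies the pruned target network frame by frame on the low-computing-power device.

```python
# Hedged deployment sketch: load_target_network and read_frame are hypothetical
# helpers (not defined by the patent); the network is assumed to be the pruned
# target neural network obtained by the construction process above.
import torch

def run_realtime_vision(camera, device="cpu"):
    net = load_target_network().to(device).eval()    # pruned target neural network
    with torch.no_grad():
        while True:
            frame = read_frame(camera)               # step 1: acquire image data (tensor)
            result = net(frame.to(device))           # step 2: computer vision processing
            yield result                             # computer vision processing result
```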
In one example embodiment, a method for data processing by a low computing power processing device includes:
Step 1', in the real-time natural language processing process, a processing device with low computing power acquires text data;
Step 2', the processing device performs natural language processing on the acquired text data by using a preset neural network to obtain a natural language processing result.
Wherein the preset neural network is a target neural network obtained by processing as shown in fig. 1.
Through the processing, the low-computing-capacity processing equipment can efficiently and quickly process the acquired text data in real-time natural language processing through the pre-configured neural network. Thereby enabling a low computing power processing device to perform real-time natural language processing through a pre-configured neural network.
Example two
Based on the same inventive concept of the method for constructing a neural network provided in the first embodiment, a second embodiment of the present invention provides an apparatus for constructing a neural network, the apparatus having a structure as shown in fig. 6, and the apparatus includes:
the first construction unit 61 is configured to construct an initial neural network, where a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, where the sparse scaling operators are used to scale outputs of the corresponding specific structures;
the training unit 62 is configured to train the weight of the initial neural network and the sparse scaling operator with the specific structure by using preset training sample data to obtain an intermediate neural network;
and a second constructing unit 63, configured to delete the specific structure in which the sparse scaling operator is zero in the intermediate neural network, so as to obtain the target neural network.
Preferably, the first constructing unit 61 specifically includes a selecting module, a specific structure determining module, and a constructing module, where:
the selecting module is used for selecting a neural network model;
in the embodiment of the invention, the selection module can be specifically realized as follows: a neural network model corresponding to the function (e.g., the function of computer vision processing: image segmentation, object detection, face recognition, or natural language processing) implemented by the desired target neural network may be selected from a preset set of neural network models, or a corresponding neural network model may be constructed according to the function implemented by the desired target neural network. The present application is not strictly limited.
The specific structure determining module is used for determining a specific structure of the neural network model, which needs to be provided with a sparse scaling operator;
and the building module is used for setting an initial sparse scaling operator for a specific structure in the neural network model to obtain the initial neural network.
In the embodiment of the invention, the value of the sparse scaling operator of each specific structure is more than or equal to 0 and less than or equal to 1. Preferably, the value of the initial sparse scaling operator is close to 1, for example, it may be directly 1.
Preferably, the training unit 62 specifically includes an objective function constructing module, a training module, and a determining module, where:
the objective function construction module is used for constructing an objective function corresponding to the initial neural network, wherein the objective function comprises a loss function and a sparse regularization function;
the training module is used for carrying out iterative training on the initial neural network by adopting the training sample data;
and the determining module is used for obtaining the intermediate neural network when the number of iterative training times reaches a threshold value or the objective function meets a preset convergence condition.
Preferably, the training module is specifically configured to: performing the following iterative training on the initial neural network for a plurality of times (the iterative training is not the first iterative training and is not the last iterative training): taking a sparse scaling operator obtained by previous iterative training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iterative training; taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training; and performing next iterative training based on the weight of the iterative training and the sparse scaling operator.
The first iterative training process is as follows: taking an initial sparse scaling operator as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the iterative training; taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training; and performing second iterative training based on the weight of the iterative training and the sparse scaling operator.
The last iteration training process is as follows: taking a sparse scaling operator obtained by previous iterative training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iterative training; taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training; and taking the neural network containing the sparse scaling operator and the weight obtained by the iterative training as an intermediate neural network.
Preferably, the first optimization algorithm may be, but is not limited to, any one of the following algorithms: a stochastic gradient descent algorithm, or a variant thereof that introduces momentum.
Preferably, the second optimization algorithm is an accelerated proximal gradient descent algorithm, a proximal gradient descent algorithm or an alternating direction multiplier algorithm.
Preferably, the objective function is:
min_{W,λ} (1/N) Σ_{i=1}^{N} L(x_i; W, λ) + R_s(λ)
wherein W is the weight of the neural network, λ is the sparse scaling operator vector of the neural network, N is the number of training sample data, L(x_i; W, λ) is the loss of the neural network on sample data x_i, and R_s(λ) is a sparse regularization function.
Preferably, in another embodiment, the objective function in the embodiment of the present invention includes a loss function, a weight regularization function, and a sparsity regularization function, and the objective function is as follows:
min_{W,λ} (1/N) Σ_{i=1}^{N} L(x_i; W, λ) + R(W) + R_s(λ)
wherein W is the weight of the neural network, λ is the sparse scaling operator vector of the neural network, N is the number of training sample data, L(x_i; W, λ) is the loss of the neural network on sample data x_i, R(W) is a weight regularization function, and R_s(λ) is a sparse regularization function.
Preferably, in the embodiments of the present invention, R_s(λ) is a sparse l1 regularization with weight γ, i.e. R_s(λ) = γ‖λ‖_1. Of course, those skilled in the art may also set R_s(λ) to a more complex sparse constraint, such as a non-convex sparse constraint.
Preferably, the specific structure is a neuron; or, the specific structure is a module including more than one network layer (for example, the specific structure includes more than two cascaded network layers), and the module is connected in parallel with other modules; alternatively, the specific structure is a module including more than one parallel module (for example, the specific structure includes more than two parallel modules), and the front end and the rear end of the module have cross-layer connection.
On the one hand, the neural network obtained by the construction device shown in fig. 6 introduces sparse scaling operators to scale the outputs of different specific structures, so no new constraint needs to be added to the weights; the weights and the sparse scaling operators can be optimized independently, which can improve the precision of the neural network. On the other hand, a specific structure whose sparse scaling operator is zero contributes nothing to the output of the neural network, so deleting such structures does not affect the precision of the neural network while simplifying it and improving its running speed.
The neural network can thus be used in low computing power processing devices for real-time computer vision processing of image data or real-time natural language processing of text data.
In one example embodiment, a data processing apparatus of a low computing power processing device is provided that may be used for real-time image data processing. The device includes: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring image data in a real-time computer vision processing process;
and carrying out computer vision processing on the acquired image data by using a preset neural network to obtain a computer vision processing result.
Wherein the preset neural network is a target neural network obtained by the construction apparatus shown in fig. 6. When the low-computing-capacity processing device performs image data processing, the first constructing unit 61 shown in fig. 6 may be used to construct an initial neural network used for computer vision processing, and the second constructing unit 63 may be used to construct a target neural network used for computer vision processing.
The apparatus may be located in, be part of, or be integrated with the low computing power processing device. The device can process the image data acquired in real time in the real-time computer vision processing process through a preset neural network.
In one example embodiment, a data processing apparatus of a low computing power processing device is provided that may be used for real-time text data processing. The device includes: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring text data in a real-time natural language processing process;
and carrying out natural language processing on the acquired text data by using a preset neural network to obtain a natural language processing result.
Wherein the preset neural network is a target neural network obtained by the construction apparatus shown in fig. 6. In order to distinguish from real-time computer vision processing performed on image data when a low-computing-power processing device performs real-time natural language processing on text data, the first building unit 61 shown in fig. 6 may be a third building unit (not shown in the figure), the second building unit 63 may be a fourth building unit (not shown in the figure), the third building unit may be configured to build an initial neural network used for natural language processing, and the fourth building unit may be configured to build a target neural network used for natural language processing.
The apparatus may be located in, be part of, or be integrated with the low computing power processing device. The device can process the text data acquired in real time in the real-time natural language processing process through a preset neural network.
The foregoing is the core idea of the present invention, and in order to make the technical solutions in the embodiments of the present invention better understood and make the above objects, features and advantages of the embodiments of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention are further described in detail with reference to the accompanying drawings.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (16)
1. A data processing method for a low computing power processing device, comprising:
in the real-time computer vision processing process, a processing device with low computing power acquires image data;
the processing equipment uses a preset neural network to perform computer vision processing on the acquired image data to obtain a computer vision processing result; the preset neural network is a target neural network obtained by the following processing:
constructing an initial neural network for realizing computer vision processing, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the output of the corresponding specific structures;
training the weight of the initial neural network and the sparse scaling operator with a specific structure by adopting preset training sample data to obtain an intermediate neural network;
and deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing the computer vision processing.
2. The method according to claim 1, wherein constructing an initial neural network implementing computer vision processing comprises:
selecting a neural network model for realizing computer vision processing;
determining a specific structure of the neural network model needing to be provided with a sparse scaling operator;
and setting an initial sparse scaling operator for a specific structure in the neural network model to obtain the initial neural network.
3. The method according to claim 1, wherein training the weights of the initial neural network and the sparse scaling operator of the specific structure with preset training sample data to obtain an intermediate neural network specifically comprises:
constructing an objective function corresponding to the initial neural network, wherein the objective function comprises a loss function and a sparse regularization function;
performing iterative training on the initial neural network by adopting the training sample data;
and when the number of iterative training times reaches a threshold value or the objective function meets a preset convergence condition, obtaining the intermediate neural network.
4. The method according to claim 3, wherein the iteratively training the initial neural network using the training sample data specifically comprises:
performing the following iterative training on the initial neural network for a plurality of times:
taking a sparse scaling operator obtained by previous iterative training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iterative training;
taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training;
and performing next iterative training based on the weight of the iterative training and the sparse scaling operator.
5. The method of claim 4, wherein the second optimization algorithm is an accelerated proximal gradient descent algorithm, a proximal gradient descent algorithm, or an alternating direction multiplier algorithm.
7. The method according to any one of claims 1 to 6, wherein the specific structure is a neuron;
or the specific structure is a module comprising more than one network layer, and the module is connected with other modules in parallel;
or the specific structure is a module comprising more than one module, and the front end and the rear end of the module are connected in a cross-layer mode.
8. A data processing apparatus of a low computing power processing device, comprising: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring image data in a real-time computer vision processing process;
performing computer vision processing on the acquired image data by using a preset neural network to obtain a computer vision processing result; wherein the preset neural network is a target neural network obtained by a construction device, and the construction device includes:
the computer vision processing system comprises a first construction unit, a second construction unit and a third construction unit, wherein the first construction unit is used for constructing an initial neural network for realizing computer vision processing, a plurality of specific structures preset in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the output of the corresponding specific structures;
the training unit is used for training the weight of the initial neural network and the sparse scaling operator with the specific structure by adopting preset training sample data to obtain an intermediate neural network for realizing computer vision processing;
and the second construction unit is used for deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network.
9. The apparatus according to claim 8, wherein the first building unit comprises:
the selection module is used for selecting a neural network model for realizing computer vision processing;
the specific structure determining module is used for determining a specific structure of the neural network model, which needs to be provided with a sparse scaling operator;
and the building module is used for setting an initial sparse scaling operator for a specific structure in the neural network model to obtain the initial neural network.
10. The apparatus according to claim 8, wherein the training unit specifically comprises:
the objective function construction module is used for constructing an objective function corresponding to the initial neural network, wherein the objective function comprises a loss function and a sparse regularization function;
the training module is used for carrying out iterative training on the initial neural network by adopting the training sample data;
and the determining module is used for obtaining the intermediate neural network when the number of iterative training times reaches a threshold value or the objective function meets a preset convergence condition.
11. The apparatus of claim 10, wherein the training module is specifically configured to:
performing the following iterative training on the initial neural network for a plurality of times:
taking a sparse scaling operator obtained by previous iterative training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iterative training;
taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training;
and performing next iterative training based on the weight of the iterative training and the sparse scaling operator.
12. The apparatus of claim 11, wherein the second optimization algorithm is an accelerated proximal gradient descent algorithm, a proximal gradient descent algorithm, or an alternating direction multiplier algorithm.
14. The device according to any one of claims 8 to 13, wherein the specific structure is a neuron;
or the specific structure is a module comprising more than one network layer, and the module is connected with other modules in parallel;
or the specific structure is a module comprising more than one module, and the front end and the rear end of the module are connected in a cross-layer mode.
15. A data processing method for a low computing power processing device, comprising:
in the real-time natural language processing process, processing equipment with low computing power acquires text data;
the processing equipment uses a preset neural network to perform natural language processing on the acquired text data to obtain a natural language processing result; the preset neural network is a target neural network obtained by the following processing:
constructing an initial neural network for realizing natural language processing, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the output of the corresponding specific structures;
training the weight of the initial neural network and the sparse scaling operator with a specific structure by adopting preset training sample data to obtain an intermediate neural network;
and deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing natural language processing.
16. A data processing apparatus of a low computing power processing device, comprising: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring text data in a real-time natural language processing process;
carrying out natural language processing on the acquired text data by using a preset neural network to obtain a natural language processing result; wherein the preset neural network is a target neural network obtained by a construction device, and the construction device includes:
the third construction unit is used for constructing an initial neural network for realizing natural language processing, and a plurality of specific structures preset in the initial neural network are respectively provided with corresponding sparse scaling operators which are used for scaling the output of the corresponding specific structures;
the training unit is used for training the weight of the initial neural network and the sparse scaling operator with the specific structure by adopting preset training sample data to obtain an intermediate neural network;
and the fourth construction unit is used for deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing natural language processing.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010011285.4A CN111178520B (en) | 2017-06-15 | 2017-06-15 | Method and device for constructing neural network |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710450550.7A CN107247991A (en) | 2017-06-15 | 2017-06-15 | A kind of method and device for building neutral net |
CN202010011285.4A CN111178520B (en) | 2017-06-15 | 2017-06-15 | Method and device for constructing neural network |
Related Parent Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710450550.7A Division CN107247991A (en) | 2017-06-15 | 2017-06-15 | A kind of method and device for building neutral net |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111178520A true CN111178520A (en) | 2020-05-19 |
CN111178520B CN111178520B (en) | 2024-06-07 |
Family
ID=60019020
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010011285.4A Active CN111178520B (en) | 2017-06-15 | 2017-06-15 | Method and device for constructing neural network |
CN201710450550.7A Pending CN107247991A (en) | 2017-06-15 | 2017-06-15 | A kind of method and device for building neutral net |
Family Applications After (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710450550.7A Pending CN107247991A (en) | 2017-06-15 | 2017-06-15 | A kind of method and device for building neutral net |
Country Status (2)
Country | Link |
---|---|
CN (2) | CN111178520B (en) |
WO (1) | WO2018227801A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113673694A (en) * | 2021-05-26 | 2021-11-19 | 阿里巴巴新加坡控股有限公司 | Data processing method and device, electronic equipment and computer readable storage medium |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11651223B2 (en) | 2017-10-27 | 2023-05-16 | Baidu Usa Llc | Systems and methods for block-sparse recurrent neural networks |
US11461628B2 (en) * | 2017-11-03 | 2022-10-04 | Samsung Electronics Co., Ltd. | Method for optimizing neural networks |
CN108805258B (en) * | 2018-05-23 | 2021-10-12 | 北京图森智途科技有限公司 | Neural network training method and device and computer server |
CN109284820A (en) * | 2018-10-26 | 2019-01-29 | 北京图森未来科技有限公司 | A kind of search structure method and device of deep neural network |
CN109840588B (en) * | 2019-01-04 | 2023-09-08 | 平安科技(深圳)有限公司 | Neural network model training method, device, computer equipment and storage medium |
CN112417610A (en) * | 2019-08-22 | 2021-02-26 | 中国电力科学研究院有限公司 | Wear assessment method and system and optimization method and system for aluminum alloy monofilaments |
CN110472400B (en) * | 2019-08-22 | 2021-06-01 | 浪潮集团有限公司 | Trusted computer system based on face recognition and implementation method |
CN110751267B (en) * | 2019-09-30 | 2021-03-30 | 京东城市(北京)数字科技有限公司 | Neural network structure searching method, training method, device and storage medium |
CN111985644B (en) * | 2020-08-28 | 2024-03-08 | 北京市商汤科技开发有限公司 | Neural network generation method and device, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006001121A1 (en) * | 2004-06-25 | 2006-01-05 | Shin Caterpillar Mitsubishi Ltd. | Data compressing device and method, data analyzing device and method, and data managing system |
CN104200224A (en) * | 2014-08-28 | 2014-12-10 | 西北工业大学 | Valueless image removing method based on deep convolutional neural networks |
CN104751842A (en) * | 2013-12-31 | 2015-07-01 | 安徽科大讯飞信息科技股份有限公司 | Method and system for optimizing deep neural network |
CN106295794A (en) * | 2016-07-27 | 2017-01-04 | 中国石油大学(华东) | The neural network modeling approach of fractional order based on smooth Group Lasso penalty term |
CN106503654A (en) * | 2016-10-24 | 2017-03-15 | 中国地质大学(武汉) | A kind of face emotion identification method based on the sparse autoencoder network of depth |
CN106548234A (en) * | 2016-11-17 | 2017-03-29 | 北京图森互联科技有限责任公司 | A kind of neural networks pruning method and device |
CN106650928A (en) * | 2016-10-11 | 2017-05-10 | 广州视源电子科技股份有限公司 | Neural network optimization method and device |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103793694B (en) * | 2014-02-10 | 2017-02-08 | 天津大学 | Human face recognition method based on multiple-feature space sparse classifiers |
CN106548192B (en) * | 2016-09-23 | 2019-08-09 | 北京市商汤科技开发有限公司 | Image processing method, device and electronic equipment neural network based |
-
2017
- 2017-06-15 CN CN202010011285.4A patent/CN111178520B/en active Active
- 2017-06-15 CN CN201710450550.7A patent/CN107247991A/en active Pending
- 2017-09-18 WO PCT/CN2017/102033 patent/WO2018227801A1/en active Application Filing
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2006001121A1 (en) * | 2004-06-25 | 2006-01-05 | Shin Caterpillar Mitsubishi Ltd. | Data compressing device and method, data analyzing device and method, and data managing system |
CN104751842A (en) * | 2013-12-31 | 2015-07-01 | 安徽科大讯飞信息科技股份有限公司 | Method and system for optimizing deep neural network |
CN104200224A (en) * | 2014-08-28 | 2014-12-10 | 西北工业大学 | Valueless image removing method based on deep convolutional neural networks |
CN106295794A (en) * | 2016-07-27 | 2017-01-04 | 中国石油大学(华东) | The neural network modeling approach of fractional order based on smooth Group Lasso penalty term |
CN106650928A (en) * | 2016-10-11 | 2017-05-10 | 广州视源电子科技股份有限公司 | Neural network optimization method and device |
CN106503654A (en) * | 2016-10-24 | 2017-03-15 | 中国地质大学(武汉) | A kind of face emotion identification method based on the sparse autoencoder network of depth |
CN106548234A (en) * | 2016-11-17 | 2017-03-29 | 北京图森互联科技有限责任公司 | A kind of neural networks pruning method and device |
Non-Patent Citations (2)
Title |
---|
JIN-KYU KIM et al.: "An efficient pruning and weight sharing method for neural network", 2016 IEEE INTERNATIONAL CONFERENCE ON CONSUMER ELECTRONICS-ASIA (ICCE-ASIA), 5 January 2017 (2017-01-05) *
LI Ming; ZHANG Hong: "Image classification algorithm based on iterative optimization of convolutional neural network", Computer Engineering and Design, no. 01 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113673694A (en) * | 2021-05-26 | 2021-11-19 | 阿里巴巴新加坡控股有限公司 | Data processing method and device, electronic equipment and computer readable storage medium |
CN113673694B (en) * | 2021-05-26 | 2024-08-27 | 阿里巴巴创新公司 | Data processing method and device, electronic equipment and computer readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN111178520B (en) | 2024-06-07 |
WO2018227801A1 (en) | 2018-12-20 |
CN107247991A (en) | 2017-10-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111178520A (en) | Data processing method and device of low-computing-capacity processing equipment | |
US11870947B2 (en) | Generating images using neural networks | |
US11651259B2 (en) | Neural architecture search for convolutional neural networks | |
CN111414987B (en) | Training method and training device of neural network and electronic equipment | |
KR102318772B1 (en) | Domain Separation Neural Networks | |
WO2020082663A1 (en) | Structural search method and apparatus for deep neural network | |
US10380479B2 (en) | Acceleration of convolutional neural network training using stochastic perforation | |
JP7439151B2 (en) | neural architecture search | |
KR102415506B1 (en) | Device and method to reduce neural network | |
CN114503121A (en) | Resource constrained neural network architecture search | |
US20180018555A1 (en) | System and method for building artificial neural network architectures | |
CN111406267A (en) | Neural architecture search using performance-predictive neural networks | |
US11144782B2 (en) | Generating video frames using neural networks | |
CN110622178A (en) | Learning neural network structure | |
WO2021042857A1 (en) | Processing method and processing apparatus for image segmentation model | |
CN111008631B (en) | Image association method and device, storage medium and electronic device | |
WO2020152233A1 (en) | Action selection using interaction history graphs | |
CN110956655B (en) | Dense depth estimation method based on monocular image | |
CN114282666A (en) | Structured pruning method and device based on local sparse constraint | |
CN118643874A (en) | Method and device for training neural network | |
KR20220134627A (en) | Hardware-optimized neural architecture discovery | |
CN113723603A (en) | Method, device and storage medium for updating parameters | |
Huai et al. | Zerobn: Learning compact neural networks for latency-critical edge systems | |
CN113705724B (en) | Batch learning method of deep neural network based on self-adaptive L-BFGS algorithm | |
CN114021697A (en) | End cloud framework neural network generation method and system based on reinforcement learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |