CN111178520A - Data processing method and device of low-computing-capacity processing equipment - Google Patents

Data processing method and device of low-computing-capacity processing equipment Download PDF

Info

Publication number
CN111178520A
CN111178520A (application CN202010011285.4A); granted publication CN111178520B
Authority
CN
China
Prior art keywords
neural network
training
sparse
processing
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010011285.4A
Other languages
Chinese (zh)
Other versions
CN111178520B (en)
Inventor
王乃岩
黄泽昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tusimple Technology Co Ltd
Original Assignee
Beijing Tusimple Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tusimple Technology Co Ltd filed Critical Beijing Tusimple Technology Co Ltd
Priority to CN202010011285.4A priority Critical patent/CN111178520B/en
Publication of CN111178520A publication Critical patent/CN111178520A/en
Application granted granted Critical
Publication of CN111178520B publication Critical patent/CN111178520B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/061 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Image Analysis (AREA)
  • Feedback Control In General (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data processing method and device for a low-computing-capacity processing device. The method comprises the following steps: in a real-time computer vision processing process, a processing device with low computing power acquires image data; the processing device uses a preset neural network to perform computer vision processing on the acquired image data to obtain a computer vision processing result; wherein the preset neural network is a target neural network obtained by the following process: constructing an initial neural network, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the outputs of the corresponding specific structures; training the weights of the initial neural network and the sparse scaling operators of the specific structures by adopting preset training sample data to obtain an intermediate neural network; and deleting the specific structures whose sparse scaling operators are zero from the intermediate neural network to obtain the target neural network.

Description

Data processing method and device of low-computing-capacity processing equipment
Technical Field
The present invention relates to the field of computers, and in particular, to a data processing method and apparatus for a low-computing-capability processing device.
Background
In recent years, deep neural networks have enjoyed great success in many areas, such as computer vision and natural language processing. However, deep neural network models often contain a large number of parameters, require a large amount of computation, and run slowly, so they cannot perform real-time computation on devices with low power consumption and low computing capability (such as embedded devices and integrated devices). A device is considered a low-computing-power device if its computing power is lower than the computing power required by the computing task or the computing model deployed on it.
To solve this problem, some solutions are currently proposed:
Solution 1, Hao Zhou in the paper "Less is More: Towards Compact CNNs" points out that existing convolutional neural networks include a large number of parameters, and deploying and running such convolutional neural networks on a computing platform requires a large amount of memory and hardware resources, which makes it impossible to use deep convolutional neural networks on mobile computing devices with limited memory resources. The paper proposes a scheme for learning the number of neurons in each layer of a neural network through a group sparsity constraint, which is added to the weights of the convolutional neural network, i.e., the weights of each neuron form a group. Since the group sparsity constraint compresses the weights in each group toward 0 as far as possible, a neuron whose weights are all 0 can be removed, and the number of neurons in the neural network can thus be learned.
Solution 2, presented by Jose M. Alvarez in the paper "Learning the Number of Neurons in Deep Networks", is basically the same as solution 1, except that in solution 2 the group sparsity constraints for the neurons of each layer are different, i.e., the strength of the group constraint differs from layer to layer.
Solution 3, Wei Wen in the paper "Learning Structured Sparsity in Deep Neural Networks" points out that deploying such large models requires a large amount of computing and storage resources. The solution proposed by that paper is to use group sparsity constraints to learn structures such as the number of neurons, the shape of filters, and the depth of network layers connected across layers.
However, the neural networks obtained according to the above solutions still cannot well meet the requirements of compact structure, fast running speed and high precision, and therefore still cannot run in real time on devices with low computing power.
Disclosure of Invention
In an embodiment of the present application, on the one hand, a data processing method for a low computing power processing device is provided, where the method includes:
in the real-time computer vision processing process, a processing device with low computing power acquires image data;
the processing equipment uses a preset neural network to perform computer vision processing on the acquired image data to obtain a computer vision processing result; the preset neural network is a target neural network obtained by the following processing:
constructing an initial neural network for realizing computer vision processing, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the output of the corresponding specific structures;
training the weight of the initial neural network and the sparse scaling operator with a specific structure by adopting preset training sample data to obtain an intermediate neural network;
and deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing the computer vision processing.
In another aspect, an embodiment of the present application provides a data processing apparatus for a low-computation-power processing device, where the apparatus includes: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring image data in a real-time computer vision processing process;
performing computer vision processing on the acquired image data by using a preset neural network to obtain a computer vision processing result; wherein the preset neural network is a target neural network obtained by a construction device, and the construction device includes:
the first construction unit is used for constructing an initial neural network, and a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators which are used for scaling the output of the corresponding specific structures;
the training unit is used for training the weight of the initial neural network and the sparse scaling operator with the specific structure by adopting preset training sample data to obtain an intermediate neural network;
and the second construction unit is used for deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network.
In another aspect of the present application, a data processing method of a low computing power processing device is provided, including:
in the real-time natural language processing process, processing equipment with low computing power acquires text data;
the processing equipment uses a preset neural network to perform natural language processing on the acquired text data to obtain a natural language processing result; the preset neural network is a target neural network obtained by the following processing:
constructing an initial neural network for realizing natural language processing, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the output of the corresponding specific structures;
training the weight of the initial neural network and the sparse scaling operator with a specific structure by adopting preset training sample data to obtain an intermediate neural network;
and deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing natural language processing.
In another aspect of the present application, a data processing apparatus of a low computing power processing device is provided, including: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring text data in a real-time natural language processing process;
carrying out natural language processing on the acquired text data by using a preset neural network to obtain a natural language processing result; wherein the preset neural network is a target neural network obtained by a construction device, and the construction device includes:
the third construction unit is used for constructing an initial neural network for realizing natural language processing, and a plurality of specific structures preset in the initial neural network are respectively provided with corresponding sparse scaling operators which are used for scaling the output of the corresponding specific structures;
the training unit is used for training the weight of the initial neural network and the sparse scaling operator with the specific structure by adopting preset training sample data to obtain an intermediate neural network;
and the fourth construction unit is used for deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing natural language processing.
According to the data processing method of the low-computing-capacity processing device, on the one hand, sparse scaling operators are introduced to scale the outputs of different specific structures, so no new constraint needs to be added to the weights; the weights and the sparse scaling operators can be optimized independently, which can improve the precision of the neural network. On the other hand, a specific structure whose sparse scaling operator is zero contributes nothing to the output of the neural network, so deleting such structures does not affect the precision of the neural network and simplifies the network, thereby improving its running speed. Thus, a low-computing-power processing device may apply the neural network described above to perform real-time computer vision processing or real-time natural language processing.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a flow chart of a method for constructing a neural network according to an embodiment of the present invention;
FIG. 2 is a block diagram of a particular architecture of an embodiment of the present invention;
FIG. 3 is a diagram of a residual block in a residual network according to an embodiment of the present invention;
FIG. 4 is a diagram of a specific structure module according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a neuron as a specific structure in accordance with an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an apparatus for constructing a neural network according to an embodiment of the present invention.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments that can be derived by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1, a flowchart of a method for constructing a neural network according to an embodiment of the present invention is shown, where the method includes:
step 101, constructing an initial neural network, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the output of the corresponding specific structures.
And 102, training the weight of the initial neural network and the sparse scaling operator with the specific structure by adopting preset training sample data to obtain an intermediate neural network.
And 103, deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network.
Preferably, the foregoing step 101 can be realized by the following steps A1 to A3:
and A1, selecting a neural network model.
In the embodiment of the invention, a neural network model corresponding to the function to be realized by the desired target neural network (such as a computer vision processing function, e.g. image segmentation, object detection or face recognition, or a natural language processing function) may be selected from a preset set of neural network models, or a corresponding neural network model may be constructed according to the function to be realized by the desired target neural network. The present application does not strictly limit this.
And A2, determining a specific structure of the neural network model needing to be provided with a sparse scaling operator.
In the embodiment of the present invention, a designer may determine the specific structures in the neural network model. For example, all or some of the neurons of one or more network layers in the neural network may be determined to be specific structures. And/or, one or more modules in the neural network having the following characteristics may be determined to be specific structures: characteristic 1, the module includes more than one network layer (e.g., the specific structure includes more than two cascaded network layers); characteristic 2, the module is connected in parallel with other modules, or the front and back ends of the module have a cross-layer connection. And/or, one or more modules in the neural network having the following properties may be determined to be specific structures: property 1, the module includes more than one sub-module (e.g., the specific structure includes more than two parallel sub-modules); property 2, the front and back ends of the module have a cross-layer connection.
And A3, setting an initial sparse scaling operator for a specific structure in the neural network model to obtain the initial neural network.
In the embodiment of the invention, the value of the sparse scaling operator of each specific structure is greater than or equal to 0. Preferably, the value of the initial sparse scaling operator is close to 1; for example, it may be set directly to 1.
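As an illustration of steps A1 to A3 only, a specific structure and its sparse scaling operator can be wrapped in a single module with the operator initialized to 1, roughly as in the following PyTorch-style sketch; all class and parameter names are illustrative and are not part of the patent's disclosed embodiments:

```python
import torch
import torch.nn as nn

class ScaledStructure(nn.Module):
    """Wraps a 'specific structure' (an arbitrary sub-network) with a sparse
    scaling operator that multiplies the structure's output."""

    def __init__(self, structure: nn.Module, init_scale: float = 1.0):
        super().__init__()
        self.structure = structure
        # Sparse scaling operator, initialized close to 1 as suggested above.
        self.scale = nn.Parameter(torch.tensor(init_scale))

    def forward(self, x):
        # Scale the output of the specific structure by its operator.
        return self.scale * self.structure(x)

# Usage: a small convolutional branch wrapped with an operator initialized to 1.
branch = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
scaled_branch = ScaledStructure(branch, init_scale=1.0)
```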
Preferably, in the embodiment of the present invention, the step 102 may be specifically realized by the following steps B1 to B3:
and B1, constructing an objective function corresponding to the initial neural network, wherein the objective function comprises a loss function and a sparse regular function. The objective function is shown in equation (1):
min_{W,λ} (1/N) Σ_{i=1}^{N} L(y_i, Net(x_i, W, λ)) + R_s(λ)    formula (1)

In formula (1), W is the weight of the neural network, λ is the sparse scaling operator vector of the neural network, N is the number of training sample data, L(y_i, Net(x_i, W, λ)) is the loss of the neural network on sample data x_i with label y_i, and R_s(λ) is a sparse regularization function.
And B2, performing iterative training on the initial neural network by adopting the training sample data.
And step B3, when the iterative training times reach a threshold value or the objective function meets a preset convergence condition, obtaining the intermediate neural network.
Preferably, step B2 may be implemented by performing the following iterative training on the initial neural network a plurality of times. The description below takes an iteration that is neither the first nor the last (hereinafter referred to as the current iterative training) as an example; each such iteration includes the following steps C1 to C3:
step C1, taking the sparse scaling operator obtained by the previous iteration training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iteration training;
step C2, taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training;
and C3, performing next iteration training based on the weight and the sparse scaling operator of the iteration training.
The first iterative training process is as follows: taking an initial sparse scaling operator as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the iterative training; taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training; and performing second iterative training based on the weight of the iterative training and the sparse scaling operator.
The last iteration training process is as follows: taking a sparse scaling operator obtained by previous iterative training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iterative training; taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training; and taking the neural network containing the sparse scaling operator and the weight obtained by the iterative training as an intermediate neural network.
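A minimal sketch of this alternating scheme is given below, assuming a PyTorch-style model whose sparse scaling operators are collected in a list `scales`. The hyper-parameter names are assumptions, and a plain proximal (soft-threshold) step stands in for the second optimization algorithm; the accelerated variants are described later in modes 1 to 3:

```python
import torch
from itertools import cycle

def soft_threshold(z, alpha):
    # S_alpha(z) = sign(z) * max(|z| - alpha, 0)
    return torch.sign(z) * torch.clamp(torch.abs(z) - alpha, min=0.0)

def train_alternating(model, scales, data_loader, loss_fn,
                      num_iters=1000, lr_w=0.01, lr_s=0.01, gamma=1e-3):
    scale_ids = {id(s) for s in scales}
    weights = [p for p in model.parameters() if id(p) not in scale_ids]
    opt_w = torch.optim.SGD(weights, lr=lr_w, momentum=0.9)
    batches = cycle(data_loader)

    for _ in range(num_iters):
        x, y = next(batches)

        # Step C1: fix the scaling operators, update the weights W by SGD.
        for s in scales:
            s.requires_grad_(False)
        opt_w.zero_grad()
        loss_fn(model(x), y).backward()
        opt_w.step()
        for s in scales:
            s.requires_grad_(True)

        # Step C2: fix the weights W, update the scaling operators lambda
        # with a (non-accelerated) proximal gradient / soft-threshold step.
        for p in weights:
            p.requires_grad_(False)
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, scales)
        with torch.no_grad():
            for s, g in zip(scales, grads):
                s.copy_(soft_threshold(s - lr_s * g, lr_s * gamma))
        for p in weights:
            p.requires_grad_(True)
        # Step C3: the updated W and lambda are carried into the next iteration.
```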
Preferably, in the embodiment of the present invention, the first optimization algorithm may be, but is not limited to, any one of the following algorithms: a stochastic gradient descent algorithm, or a variant of stochastic gradient descent that introduces momentum.
Preferably, in the embodiment of the present invention, the second optimization algorithm may be, but is not limited to, any one of the following algorithms: an accelerated proximal gradient descent algorithm, a proximal gradient descent algorithm, or an alternating direction method of multipliers (ADMM) algorithm.
Preferably, in another embodiment, the objective function in the embodiment of the present invention includes a loss function, a weight regularization function and a sparsity regularization function, and the objective function is represented by equation (2):
min_{W,λ} (1/N) Σ_{i=1}^{N} L(y_i, Net(x_i, W, λ)) + R(W) + R_s(λ)    formula (2)

In formula (2), W is the weight of the neural network, λ is the sparse scaling operator vector of the neural network, N is the number of training sample data, L(y_i, Net(x_i, W, λ)) is the loss of the neural network on sample data x_i with label y_i, R(W) is a weight regularization function, and R_s(λ) is a sparse regularization function.

Preferably, in the embodiment of the present invention, R_s(λ) is a sparse regularization with weight γ, i.e. R_s(λ) = γ‖λ‖_1. Of course, those skilled in the art may also set R_s(λ) to a more complex sparse constraint, such as a non-convex sparse constraint.
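Purely as an illustration, the objective of formula (2) with R_s(λ) = γ‖λ‖_1 can be evaluated as in the following sketch; R(W) is taken as an L2 penalty for concreteness, and all function and argument names are assumptions rather than part of the patent:

```python
import torch

def objective(per_sample_losses, scales, weights, gamma=1e-3, weight_decay=1e-4):
    """Formula (2): mean data loss + weight regularization R(W)
    + sparse regularization R_s(lambda) = gamma * ||lambda||_1."""
    data_loss = per_sample_losses.mean()                               # (1/N) * sum_i L(y_i, Net(x_i, W, lambda))
    weight_reg = weight_decay * sum((w ** 2).sum() for w in weights)   # R(W), here an L2 penalty
    sparse_reg = gamma * sum(s.abs().sum() for s in scales)            # R_s(lambda)
    return data_loss + weight_reg + sparse_reg
```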
To describe in further detail how W and λ in the objective function are solved in the embodiment of the present invention, the objective function of formula (2) with R_s(λ) = γ‖λ‖_1 is taken as an example, and the optimization of the objective function within one iterative training to obtain W and λ is described. The data loss term (1/N) Σ_{i=1}^{N} L(y_i, Net(x_i, W, λ)) is denoted G(W) when λ is held constant and G(λ) when W is held constant.

With λ as a constant and W as a variable, the objective function is converted into min_W G(W) + R(W). The value of W can be solved by adopting a stochastic gradient descent algorithm, and the specific process is not described in detail.

With W as a constant and λ as a variable, the objective function is converted into min_λ G(λ) + γ‖λ‖_1. The value of λ is solved by adopting an accelerated proximal gradient descent algorithm, specifically in any of the following modes:
Mode 1, λ is obtained by the following formulas (3) to (5):

d_t = λ_{t-1} + ((t-2)/(t+1)) (λ_{t-1} - λ_{t-2})    formula (3)
z_t = d_t - η_t ∇G(d_t)    formula (4)
λ_t = S_{η_t γ}(z_t)    formula (5)

wherein η_t represents the step size of the gradient descent at the t-th iterative training, ∇G(d_t) is the gradient of G(·) at d_t, and S_{η_t γ}(·) is the soft-threshold operator, defined as S_α(z)_i = sign(z_i) · max(|z_i| - α, 0).
Mode 2, the solution of λ in the aforementioned mode 1 requires an additional forward and backward computation to obtain ∇G(d_t), so applying that algorithm directly to an existing deep learning framework is somewhat difficult. Therefore, mode 2 modifies the formulas of mode 1 to obtain formulas (6) to (8), and calculates λ from formulas (6) to (8):

z_t = λ_{t-1} - η_t ∇G(λ_{t-1})    formula (6)
v_t = S_{η_t γ}(z_t) - λ_{t-1} + μ_{t-1} v_{t-1}    formula (7)
λ_t = λ_{t-1} + v_t    formula (8)

wherein v_t is a momentum term and μ_{t-1} is a momentum coefficient.
Mode 3, the embodiment of the present invention further provides a simpler calculation of λ by the following formulas (9) to (11), to further reduce the difficulty:

z_t = λ'_{t-1} - η_t ∇G(λ'_{t-1})    formula (9)
v_t = S_{η_t γ}(z_t) - λ'_{t-1} + μ_{t-1} v_{t-1}    formula (10)
λ'_t = S_{η_t γ}(z_t) + μ_t v_t    formula (11)

wherein λ'_{t-1} = λ_{t-1} + μ_{t-1} v_{t-1}, μ is a preset fixed value (the momentum coefficient), and W and λ are updated in the form of mini-batch stochastic gradient descent.
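Assuming the reconstructed formulas (9) to (11) above, one λ update of this momentum-style proximal scheme could be sketched as follows; the names and exact form are assumptions, not the patent's reference implementation:

```python
import torch

def soft_threshold(z, alpha):
    # S_alpha(z) = sign(z) * max(|z| - alpha, 0)
    return torch.sign(z) * torch.clamp(torch.abs(z) - alpha, min=0.0)

def apg_nag_step(lam_ahead, v, grad_fn, eta, mu, gamma):
    """One lambda update in the spirit of mode 3.
    lam_ahead : look-ahead variable lambda'_{t-1} = lambda_{t-1} + mu * v_{t-1}
    v         : momentum buffer v_{t-1}
    grad_fn   : returns the gradient of the data loss G(.) at a given lambda
    eta, mu, gamma : step size, momentum coefficient, sparsity weight
    """
    z = lam_ahead - eta * grad_fn(lam_ahead)                       # formula (9)
    v_new = soft_threshold(z, eta * gamma) - lam_ahead + mu * v    # formula (10)
    lam_ahead_new = soft_threshold(z, eta * gamma) + mu * v_new    # formula (11)
    return lam_ahead_new, v_new
```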
Specific structures are described in detail below, taking modules and neurons as examples.
As shown in fig. 2, it is assumed that the neural network includes N modules, each module corresponds to a sparse scaling operator, and the front end and the back end of each module have cross-layer connection.
Taking a specific example, assume that the neural network is a residual network and the specific structure is a residual module. As shown in fig. 3, the front and back ends of each residual module have a cross-layer connection, and the i-th residual module corresponds to a sparse scaling operator λ_i. The output of the i-th residual module is then:

r_i = r_{i-1} + λ_i · F_i(r_{i-1})

wherein r_{i-1} is the input of the i-th residual module and F_i(·) is the transformation performed by the i-th residual module. If, after training, the sparse scaling operator λ_3 of the third residual module is equal to 0, the 3rd residual module F_3 is deleted from the residual network.
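The residual example admits a compact sketch: each residual branch output is multiplied by its operator λ_i, and blocks whose operator is zero after training are dropped, while the cross-layer (identity) connection keeps the network valid. This is an illustrative PyTorch-style sketch; the module names are assumptions:

```python
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Computes r_i = r_{i-1} + lambda_i * F_i(r_{i-1})."""
    def __init__(self, channels):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.scale = nn.Parameter(torch.ones(1))   # sparse scaling operator lambda_i

    def forward(self, x):
        return x + self.scale * self.branch(x)

def prune_zero_blocks(blocks):
    """Delete residual blocks whose sparse scaling operator is zero; the identity
    (cross-layer) connection keeps the network valid after deletion."""
    kept = [b for b in blocks if float(b.scale.abs()) > 0.0]
    return nn.Sequential(*kept)

# Usage: a small stack of blocks; after training, blocks with lambda == 0 are removed.
network = nn.Sequential(*[ScaledResidualBlock(16) for _ in range(4)])
pruned = prune_zero_blocks(list(network))
```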
As shown in fig. 4, it is assumed that the neural network includes N blocks, each block includes M parallel modules, each module includes a plurality of cascaded network layers, and each module corresponds to one sparse scaling operator.
As shown in fig. 5, assuming that the neural network includes L network layers and the l-th network layer includes k neurons, each of the k neurons corresponds to its own sparse scaling operator.
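For the neuron case, the k neurons of a layer can share a k-dimensional operator vector, one component per neuron; neurons whose component reaches zero can be deleted. A hedged sketch with illustrative names:

```python
import torch
import torch.nn as nn

class NeuronScaledLinear(nn.Module):
    """Fully connected layer whose k output neurons are each scaled by their own
    sparse scaling operator; neurons whose operator is 0 contribute nothing
    and can be deleted from the layer."""
    def __init__(self, in_features, k):
        super().__init__()
        self.linear = nn.Linear(in_features, k)
        self.scale = nn.Parameter(torch.ones(k))   # one operator per neuron

    def forward(self, x):
        return self.scale * self.linear(x)

    def prune(self):
        keep = self.scale.abs() > 0                 # neurons to retain
        pruned = nn.Linear(self.linear.in_features, int(keep.sum()))
        with torch.no_grad():
            # Fold the surviving operators into the weights so the output is unchanged.
            pruned.weight.copy_(self.scale[keep].unsqueeze(1) * self.linear.weight[keep])
            pruned.bias.copy_(self.scale[keep] * self.linear.bias[keep])
        return pruned
```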
In the neural network obtained through the above processing, on the one hand, sparse scaling operators are introduced to scale the outputs of different specific structures, so no new constraint needs to be added to the weights; the weights and the sparse scaling operators can be optimized independently, which can improve the precision of the neural network. On the other hand, a specific structure whose sparse scaling operator is zero contributes nothing to the output of the neural network, so deleting such structures does not affect the precision of the neural network and simplifies the network, thereby improving its running speed.
The neural network can thus be applied in low computing power processing devices for real-time computer vision processing of image data or real-time natural language processing of text data. Low computing power processing devices include low computing power integrated or embedded devices, low computing power computing platforms, low computing power mobile devices, and the like.
In one example embodiment, a method for data processing by a low computing power processing device includes:
step 1, in the real-time computer vision processing process, a processing device with low computing power acquires image data;
and 2, the processing equipment performs computer vision processing on the acquired image data by using a preset neural network to obtain a computer vision processing result.
Wherein the preset neural network is a target neural network obtained by processing as shown in fig. 1.
Through the above processing, the low-computing-capacity processing device can efficiently and quickly process the image data acquired during real-time computer vision processing through the pre-configured neural network, thereby enabling the device to perform real-time computer vision processing.
In one example embodiment, a method for data processing by a low computing power processing device includes:
step 1', in the real-time natural language processing process, a processing device with low computing power acquires text data;
and 2', the processing equipment performs natural language processing on the acquired text data by using a preset neural network to obtain a natural language processing result.
Wherein the preset neural network is a target neural network obtained by processing as shown in fig. 1.
Through the above processing, the low-computing-capacity processing device can efficiently and quickly process the text data acquired during real-time natural language processing through the pre-configured neural network, thereby enabling the device to perform real-time natural language processing.
Example two
Based on the same inventive concept of the method for constructing a neural network provided in the first embodiment, a second embodiment of the present invention provides an apparatus for constructing a neural network, the apparatus having a structure as shown in fig. 6, and the apparatus includes:
the first construction unit 61 is configured to construct an initial neural network, where a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, where the sparse scaling operators are used to scale outputs of the corresponding specific structures;
the training unit 62 is configured to train the weight of the initial neural network and the sparse scaling operator with the specific structure by using preset training sample data to obtain an intermediate neural network;
and a second constructing unit 63, configured to delete the specific structure in which the sparse scaling operator is zero in the intermediate neural network, so as to obtain the target neural network.
Preferably, the first constructing unit 61 specifically includes a selecting module, a specific structure determining module, and a constructing module, where:
the selecting module is used for selecting a neural network model;
in the embodiment of the invention, the selection module can be specifically realized as follows: a neural network model corresponding to the function (e.g., the function of computer vision processing: image segmentation, object detection, face recognition, or natural language processing) implemented by the desired target neural network may be selected from a preset set of neural network models, or a corresponding neural network model may be constructed according to the function implemented by the desired target neural network. The present application is not strictly limited.
The specific structure determining module is used for determining a specific structure of the neural network model, which needs to be provided with a sparse scaling operator;
and the building module is used for setting an initial sparse scaling operator for a specific structure in the neural network model to obtain the initial neural network.
In the embodiment of the invention, the value of the sparse scaling operator of each specific structure is greater than or equal to 0 and less than or equal to 1. Preferably, the value of the initial sparse scaling operator is close to 1; for example, it may be set directly to 1.
Preferably, the training unit 62 specifically includes an objective function constructing module, a training module, and a determining module, where:
the target function construction module is used for constructing a target function corresponding to the initial neural network, and the target function comprises a loss function and a sparse regular function;
the training module is used for carrying out iterative training on the initial neural network by adopting the training sample data;
and the determining module is used for obtaining the intermediate neural network when the iterative training times reach a threshold value or the target function meets a preset convergence condition.
Preferably, the training module is specifically configured to: performing the following iterative training on the initial neural network for a plurality of times (the iterative training is not the first iterative training and is not the last iterative training): taking a sparse scaling operator obtained by previous iterative training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iterative training; taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training; and performing next iterative training based on the weight of the iterative training and the sparse scaling operator.
The first iterative training process is as follows: taking an initial sparse scaling operator as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the iterative training; taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training; and performing second iterative training based on the weight of the iterative training and the sparse scaling operator.
The last iteration training process is as follows: taking a sparse scaling operator obtained by previous iterative training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iterative training; taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training; and taking the neural network containing the sparse scaling operator and the weight obtained by the iterative training as an intermediate neural network.
Preferably, the first optimization algorithm may be, but is not limited to, any one of the following algorithms: a stochastic gradient descent algorithm, or a variant of stochastic gradient descent that introduces momentum.
Preferably, the second optimization algorithm is an accelerated proximal gradient descent algorithm, a proximal gradient descent algorithm, or an alternating direction method of multipliers (ADMM) algorithm.
Preferably, the objective function is:
min_{W,λ} (1/N) Σ_{i=1}^{N} L(y_i, Net(x_i, W, λ)) + R_s(λ)

wherein W is the weight of the neural network, λ is the sparse scaling operator vector of the neural network, N is the number of training sample data, L(y_i, Net(x_i, W, λ)) is the loss of the neural network on sample data x_i with label y_i, and R_s(λ) is a sparse regularization function.
Preferably, in another embodiment, the objective function in the embodiment of the present invention includes a loss function, a weight regularization function, and a sparsity regularization function, and the objective function is as follows:
min_{W,λ} (1/N) Σ_{i=1}^{N} L(y_i, Net(x_i, W, λ)) + R(W) + R_s(λ)

wherein W is the weight of the neural network, λ is the sparse scaling operator vector of the neural network, N is the number of training sample data, L(y_i, Net(x_i, W, λ)) is the loss of the neural network on sample data x_i with label y_i, R(W) is a weight regularization function, and R_s(λ) is a sparse regularization function.
Preferably, in the embodiment of the present invention, R_s(λ) is a sparse regularization with weight γ, i.e. R_s(λ) = γ‖λ‖_1. Of course, those skilled in the art may also set R_s(λ) to a more complex sparse constraint, such as a non-convex sparse constraint.
Preferably, the specific structure is a neuron; or, the specific structure is a module including more than one network layer (for example, the specific structure includes more than two cascaded network layers), and the module is connected in parallel with other modules; alternatively, the specific structure is a module including more than one parallel module (for example, the specific structure includes more than two parallel modules), and the front end and the rear end of the module have cross-layer connection.
In the neural network obtained by the construction device shown in fig. 6, on the one hand, sparse scaling operators are introduced to scale the outputs of different specific structures, so no new constraint needs to be added to the weights; the weights and the sparse scaling operators can be optimized independently, which can improve the precision of the neural network. On the other hand, a specific structure whose sparse scaling operator is zero contributes nothing to the output of the neural network, so deleting such structures does not affect the precision of the neural network and simplifies the network, thereby improving its running speed.
The neural network can thus be used in low computing power processing devices for real-time computer vision processing of image data or real-time natural language processing of text data.
In one example embodiment, a data processing apparatus of a low computing power processing device is provided that may be used for real-time image data processing. The device includes: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring image data in a real-time computer vision processing process;
and carrying out computer vision processing on the acquired image data by using a preset neural network to obtain a computer vision processing result.
Wherein the preset neural network is a target neural network obtained by the construction apparatus shown in fig. 6. When the low-computing-capacity processing device performs image data processing, the first constructing unit 61 shown in fig. 6 may be used to construct an initial neural network used for computer vision processing, and the second constructing unit 63 may be used to construct a target neural network used for computer vision processing.
The apparatus may be located in, be part of, or be integrated with the low computing power processing device. The device can process the image data acquired in real time in the real-time computer vision processing process through a preset neural network.
In one example embodiment, a data processing apparatus of a low computing power processing device is provided that may be used for real-time text data processing. The device includes: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring text data in a real-time natural language processing process;
and carrying out natural language processing on the acquired text data by using a preset neural network to obtain a natural language processing result.
Wherein the preset neural network is a target neural network obtained by the construction apparatus shown in fig. 6. When a low-computing-power processing device performs real-time natural language processing on text data, in order to distinguish this case from the real-time computer vision processing performed on image data, the first building unit 61 shown in fig. 6 may be referred to as a third building unit (not shown in the figure) and the second building unit 63 may be referred to as a fourth building unit (not shown in the figure); the third building unit may be configured to build an initial neural network used for natural language processing, and the fourth building unit may be configured to build a target neural network used for natural language processing.
The apparatus may be located in, be part of, or be integrated with the low computing power processing device. The device can process the text data acquired in real time in the real-time natural language processing process through a preset neural network.
The foregoing is the core idea of the present invention, and in order to make the technical solutions in the embodiments of the present invention better understood and make the above objects, features and advantages of the embodiments of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention are further described in detail with reference to the accompanying drawings.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (16)

1. A data processing method for a low computing power processing device, comprising:
in the real-time computer vision processing process, a processing device with low computing power acquires image data;
the processing equipment uses a preset neural network to perform computer vision processing on the acquired image data to obtain a computer vision processing result; the preset neural network is a target neural network obtained by the following processing:
constructing an initial neural network for realizing computer vision processing, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the output of the corresponding specific structures;
training the weight of the initial neural network and the sparse scaling operator with a specific structure by adopting preset training sample data to obtain an intermediate neural network;
and deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing the computer vision processing.
2. The method according to claim 1, wherein constructing an initial neural network implementing computer vision processing comprises:
selecting a neural network model for realizing computer vision processing;
determining a specific structure of the neural network model needing to be provided with a sparse scaling operator;
and setting an initial sparse scaling operator for a specific structure in the neural network model to obtain the initial neural network.
3. The method according to claim 1, wherein training the weights of the initial neural network and the sparse scaling operator of the specific structure with preset training sample data to obtain an intermediate neural network specifically comprises:
constructing an objective function corresponding to an initial neural network, wherein the objective function comprises a loss function and a sparse regular function;
performing iterative training on the initial neural network by adopting the training sample data;
and when the iterative training times reach a threshold value or the target function meets a preset convergence condition, obtaining the intermediate neural network.
4. The method according to claim 3, wherein the iteratively training the initial neural network using the training sample data specifically comprises:
performing the following iterative training on the initial neural network for a plurality of times:
taking a sparse scaling operator obtained by previous iterative training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iterative training;
taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training;
and performing next iterative training based on the weight of the iterative training and the sparse scaling operator.
5. The method of claim 4, wherein the second optimization algorithm is an accelerated proximal gradient descent algorithm, a proximal gradient descent algorithm, or an alternating direction method of multipliers algorithm.
6. The method of claim 3, wherein the objective function is:
min_{W,λ} (1/N) Σ_{i=1}^{N} L(y_i, Net(x_i, W, λ)) + R_s(λ)

wherein W is the weight, λ is the sparse scaling operator vector, N is the number of sample data, L(y_i, Net(x_i, W, λ)) is the loss of the neural network on sample data x_i with label y_i, and R_s(λ) is a sparse regularization function.
7. The method according to any one of claims 1 to 6, wherein the specific structure is a neuron;
or the specific structure is a module comprising more than one network layer, and the module is connected with other modules in parallel;
or the specific structure is a module comprising more than one module, and the front end and the rear end of the module are connected in a cross-layer mode.
8. A data processing apparatus of a low computing power processing device, comprising: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring image data in a real-time computer vision processing process;
performing computer vision processing on the acquired image data by using a preset neural network to obtain a computer vision processing result; wherein the preset neural network is a target neural network obtained by a construction device, and the construction device includes:
the computer vision processing system comprises a first construction unit, a second construction unit and a third construction unit, wherein the first construction unit is used for constructing an initial neural network for realizing computer vision processing, a plurality of specific structures preset in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the output of the corresponding specific structures;
the training unit is used for training the weight of the initial neural network and the sparse scaling operator with the specific structure by adopting preset training sample data to obtain an intermediate neural network for realizing computer vision processing;
and the second construction unit is used for deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network.
9. The apparatus according to claim 8, wherein the first building unit comprises:
the selection module is used for selecting a neural network model for realizing computer vision processing;
the specific structure determining module is used for determining a specific structure of the neural network model, which needs to be provided with a sparse scaling operator;
and the building module is used for setting an initial sparse scaling operator for a specific structure in the neural network model to obtain the initial neural network.
10. The apparatus according to claim 8, wherein the training unit specifically comprises:
the target function construction module is used for constructing a target function corresponding to the initial neural network, and the target function comprises a loss function and a sparse regular function;
the training module is used for carrying out iterative training on the initial neural network by adopting the training sample data;
and the determining module is used for obtaining the intermediate neural network when the iterative training times reach a threshold value or the target function meets a preset convergence condition.
11. The apparatus of claim 10, wherein the training module is specifically configured to:
performing the following iterative training on the initial neural network for a plurality of times:
taking a sparse scaling operator obtained by previous iterative training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iterative training;
taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training;
and performing next iterative training based on the weight of the iterative training and the sparse scaling operator.
12. The apparatus of claim 11, wherein the second optimization algorithm is an accelerated proximal gradient descent algorithm, a proximal gradient descent algorithm, or an alternating direction method of multipliers algorithm.
13. The apparatus of claim 10, wherein the objective function is:
min_{W,λ} (1/N) Σ_{i=1}^{N} L(y_i, Net(x_i, W, λ)) + R_s(λ)

wherein W is the weight, λ is the sparse scaling operator vector, N is the number of sample data, L(y_i, Net(x_i, W, λ)) is the loss of the neural network on sample data x_i with label y_i, and R_s(λ) is a sparse regularization function.
14. The device according to any one of claims 8 to 13, wherein the specific structure is a neuron;
or the specific structure is a module comprising more than one network layer, and the module is connected with other modules in parallel;
or the specific structure is a module comprising more than one module, and the front end and the rear end of the module are connected in a cross-layer mode.
15. A data processing method for a low computing power processing device, comprising:
in the real-time natural language processing process, processing equipment with low computing power acquires text data;
the processing equipment uses a preset neural network to perform natural language processing on the acquired text data to obtain a natural language processing result; the preset neural network is a target neural network obtained by the following processing:
constructing an initial neural network for realizing natural language processing, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the output of the corresponding specific structures;
training the weight of the initial neural network and the sparse scaling operator with a specific structure by adopting preset training sample data to obtain an intermediate neural network;
and deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing natural language processing.
16. A data processing apparatus of a low computing power processing device, comprising: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring text data in a real-time natural language processing process;
carrying out natural language processing on the acquired text data by using a preset neural network to obtain a natural language processing result; wherein the preset neural network is a target neural network obtained by a construction device, and the construction device includes:
the third construction unit is used for constructing an initial neural network for realizing natural language processing, and a plurality of specific structures preset in the initial neural network are respectively provided with corresponding sparse scaling operators which are used for scaling the output of the corresponding specific structures;
the training unit is used for training the weight of the initial neural network and the sparse scaling operator with the specific structure by adopting preset training sample data to obtain an intermediate neural network;
and the fourth construction unit is used for deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing natural language processing.
CN202010011285.4A 2017-06-15 2017-06-15 Method and device for constructing neural network Active CN111178520B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010011285.4A CN111178520B (en) 2017-06-15 2017-06-15 Method and device for constructing neural network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710450550.7A CN107247991A (en) 2017-06-15 2017-06-15 A kind of method and device for building neural network
CN202010011285.4A CN111178520B (en) 2017-06-15 2017-06-15 Method and device for constructing neural network

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201710450550.7A Division CN107247991A (en) 2017-06-15 2017-06-15 A kind of method and device for building neural network

Publications (2)

Publication Number Publication Date
CN111178520A true CN111178520A (en) 2020-05-19
CN111178520B CN111178520B (en) 2024-06-07

Family

ID=60019020

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010011285.4A Active CN111178520B (en) 2017-06-15 2017-06-15 Method and device for constructing neural network
CN201710450550.7A Pending CN107247991A (en) 2017-06-15 2017-06-15 A kind of method and device for building neural network

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201710450550.7A Pending CN107247991A (en) 2017-06-15 2017-06-15 Method and device for constructing a neural network

Country Status (2)

Country Link
CN (2) CN111178520B (en)
WO (1) WO2018227801A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11651223B2 (en) 2017-10-27 2023-05-16 Baidu Usa Llc Systems and methods for block-sparse recurrent neural networks
US11461628B2 (en) * 2017-11-03 2022-10-04 Samsung Electronics Co., Ltd. Method for optimizing neural networks
CN108805258B (en) * 2018-05-23 2021-10-12 北京图森智途科技有限公司 Neural network training method and device and computer server
CN109284820A (en) * 2018-10-26 2019-01-29 北京图森未来科技有限公司 Structure search method and device for a deep neural network
CN109840588B (en) * 2019-01-04 2023-09-08 平安科技(深圳)有限公司 Neural network model training method, device, computer equipment and storage medium
CN112417610A (en) * 2019-08-22 2021-02-26 中国电力科学研究院有限公司 Wear assessment method and system and optimization method and system for aluminum alloy monofilaments
CN110472400B (en) * 2019-08-22 2021-06-01 浪潮集团有限公司 Trusted computer system based on face recognition and implementation method
CN110751267B (en) * 2019-09-30 2021-03-30 京东城市(北京)数字科技有限公司 Neural network structure searching method, training method, device and storage medium
CN111985644B (en) * 2020-08-28 2024-03-08 北京市商汤科技开发有限公司 Neural network generation method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793694B (en) * 2014-02-10 2017-02-08 天津大学 Human face recognition method based on multiple-feature space sparse classifiers
CN106548192B (en) * 2016-09-23 2019-08-09 北京市商汤科技开发有限公司 Image processing method, device and electronic equipment neural network based

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006001121A1 (en) * 2004-06-25 2006-01-05 Shin Caterpillar Mitsubishi Ltd. Data compressing device and method, data analyzing device and method, and data managing system
CN104751842A (en) * 2013-12-31 2015-07-01 安徽科大讯飞信息科技股份有限公司 Method and system for optimizing deep neural network
CN104200224A (en) * 2014-08-28 2014-12-10 西北工业大学 Valueless image removing method based on deep convolutional neural networks
CN106295794A (en) * 2016-07-27 2017-01-04 中国石油大学(华东) Fractional-order neural network modeling approach based on a smooth Group Lasso penalty term
CN106650928A (en) * 2016-10-11 2017-05-10 广州视源电子科技股份有限公司 Neural network optimization method and device
CN106503654A (en) * 2016-10-24 2017-03-15 中国地质大学(武汉) Facial emotion recognition method based on a deep sparse autoencoder network
CN106548234A (en) * 2016-11-17 2017-03-29 北京图森互联科技有限责任公司 Neural network pruning method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIN-KYU KIM et al.: "An efficient pruning and weight sharing method for neural network", 2016 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), 5 January 2017 (2017-01-05) *
李鸣 (Li Ming); 张鸿 (Zhang Hong): "Image classification algorithm based on iterative optimization of convolutional neural networks", 计算机工程与设计 (Computer Engineering and Design), no. 01 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673694A (en) * 2021-05-26 2021-11-19 阿里巴巴新加坡控股有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN113673694B (en) * 2021-05-26 2024-08-27 阿里巴巴创新公司 Data processing method and device, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN111178520B (en) 2024-06-07
WO2018227801A1 (en) 2018-12-20
CN107247991A (en) 2017-10-13

Similar Documents

Publication Publication Date Title
CN111178520A (en) Data processing method and device of low-computing-capacity processing equipment
US11870947B2 (en) Generating images using neural networks
US11651259B2 (en) Neural architecture search for convolutional neural networks
CN111414987B (en) Training method and training device of neural network and electronic equipment
KR102318772B1 (en) Domain Separation Neural Networks
WO2020082663A1 (en) Structural search method and apparatus for deep neural network
US10380479B2 (en) Acceleration of convolutional neural network training using stochastic perforation
JP7439151B2 (en) neural architecture search
KR102415506B1 (en) Device and method to reduce neural network
CN114503121A (en) Resource constrained neural network architecture search
US20180018555A1 (en) System and method for building artificial neural network architectures
CN111406267A (en) Neural architecture search using performance-predictive neural networks
US11144782B2 (en) Generating video frames using neural networks
CN110622178A (en) Learning neural network structure
WO2021042857A1 (en) Processing method and processing apparatus for image segmentation model
CN111008631B (en) Image association method and device, storage medium and electronic device
WO2020152233A1 (en) Action selection using interaction history graphs
CN110956655B (en) Dense depth estimation method based on monocular image
CN114282666A (en) Structured pruning method and device based on local sparse constraint
CN118643874A (en) Method and device for training neural network
KR20220134627A (en) Hardware-optimized neural architecture discovery
CN113723603A (en) Method, device and storage medium for updating parameters
Huai et al. Zerobn: Learning compact neural networks for latency-critical edge systems
CN113705724B (en) Batch learning method of deep neural network based on self-adaptive L-BFGS algorithm
CN114021697A (en) End cloud framework neural network generation method and system based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant