CN111178520A - Data processing method and device of low-computing-capacity processing equipment - Google Patents

Data processing method and device of low-computing-capacity processing equipment Download PDF

Info

Publication number
CN111178520A
CN111178520A (application CN202010011285.4A); granted publication CN111178520B
Authority
CN
China
Prior art keywords
neural network
training
sparse
processing
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010011285.4A
Other languages
Chinese (zh)
Other versions
CN111178520B (en)
Inventor
王乃岩
黄泽昊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tusimple Technology Co Ltd
Original Assignee
Beijing Tusimple Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Tusimple Technology Co Ltd filed Critical Beijing Tusimple Technology Co Ltd
Priority to CN202010011285.4A priority Critical patent/CN111178520B/en
Publication of CN111178520A publication Critical patent/CN111178520A/en
Application granted granted Critical
Publication of CN111178520B publication Critical patent/CN111178520B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N 3/061 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using biological neurons, e.g. biological neurons connected to an integrated circuit
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Image Analysis (AREA)
  • Feedback Control In General (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a data processing method and device for a low-computing-capacity processing device. The method comprises the following steps: in a real-time computer vision processing process, a processing device with low computing power acquires image data; the processing device uses a preset neural network to perform computer vision processing on the acquired image data to obtain a computer vision processing result; wherein the preset neural network is a target neural network obtained by the following process: constructing an initial neural network, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the outputs of the corresponding specific structures; training the weights of the initial neural network and the sparse scaling operators of the specific structures by adopting preset training sample data to obtain an intermediate neural network; and deleting the specific structures whose sparse scaling operators are zero from the intermediate neural network to obtain the target neural network.

Description

Data processing method and device of low-computing-capacity processing equipment
Technical Field
The present invention relates to the field of computers, and in particular, to a data processing method and apparatus for a low-computing-capability processing device.
Background
In recent years, deep neural networks have enjoyed great success in many areas, such as computer vision and natural language processing. However, deep neural network models often contain a large number of parameters, require a large amount of computation, and run slowly, so they cannot perform real-time computation on devices with low power consumption and low computing capability (such as embedded devices and integrated devices). A device is considered a low-computing-power device if its computing power is lower than the computing power required by the computing task or the computing model deployed on it.
To solve this problem, some solutions are currently proposed:
Solution 1, Hao Zhou in the paper "Less is More: Towards Compact CNNs" points out that existing convolutional neural networks include a large number of parameters, and deploying and running such convolutional neural networks on a computing platform requires a large amount of memory and hardware resources, which makes it impossible to use deep convolutional neural networks on mobile computing devices with limited memory resources. The paper proposes a scheme for learning the number of neurons in each layer of a neural network through a group sparsity constraint, which is added to the weights of the convolutional neural network, i.e., the weights of each neuron form a group. Since the group sparsity constraint compresses the weights in each group toward 0 as far as possible, a neuron whose weights are all 0 can be removed, and the number of neurons in the neural network can thus be learned.
Solution 2, presented by Jose M. Alvarez in the paper "Learning the Number of Neurons in Deep Networks", is basically the same as solution 1, except that in solution 2 the group sparsity constraints for the neurons of each layer are different, i.e., the strength of the group constraint differs from layer to layer.
Solution 3, Wei Wen in the paper "Learning Structured Sparsity in Deep Neural Networks" points out that deploying such large models requires a large amount of computing and storage resources. The solution proposed by that paper is to use group sparsity constraints to learn structures such as the number of neurons, the shape of filters, and the depth of network layers connected across layers.
However, the neural networks obtained according to the above solutions still cannot well meet the requirements of compact structure, fast running speed and high precision, and therefore still cannot run in real time on devices with low computing power.
Disclosure of Invention
In an embodiment of the present application, on the one hand, a data processing method for a low computing power processing device is provided, where the method includes:
in the real-time computer vision processing process, a processing device with low computing power acquires image data;
the processing equipment uses a preset neural network to perform computer vision processing on the acquired image data to obtain a computer vision processing result; the preset neural network is a target neural network obtained by the following processing:
constructing an initial neural network for realizing computer vision processing, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the output of the corresponding specific structures;
training the weight of the initial neural network and the sparse scaling operator with a specific structure by adopting preset training sample data to obtain an intermediate neural network;
and deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing the computer vision processing.
In another aspect, an embodiment of the present application provides a data processing apparatus for a low-computation-power processing device, where the apparatus includes: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring image data in a real-time computer vision processing process;
performing computer vision processing on the acquired image data by using a preset neural network to obtain a computer vision processing result; wherein the preset neural network is a target neural network obtained by a construction device, and the construction device includes:
the first construction unit is used for constructing an initial neural network, and a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators which are used for scaling the output of the corresponding specific structures;
the training unit is used for training the weight of the initial neural network and the sparse scaling operator with the specific structure by adopting preset training sample data to obtain an intermediate neural network;
and the second construction unit is used for deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network.
In another aspect of the present application, a data processing method of a low computing power processing device is provided, including:
in the real-time natural language processing process, processing equipment with low computing power acquires text data;
the processing equipment uses a preset neural network to perform natural language processing on the acquired text data to obtain a natural language processing result; the preset neural network is a target neural network obtained by the following processing:
constructing an initial neural network for realizing natural language processing, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the output of the corresponding specific structures;
training the weight of the initial neural network and the sparse scaling operator with a specific structure by adopting preset training sample data to obtain an intermediate neural network;
and deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing natural language processing.
In another aspect of the present application, a data processing apparatus of a low computing power processing device is provided, including: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring text data in a real-time natural language processing process;
carrying out natural language processing on the acquired text data by using a preset neural network to obtain a natural language processing result; wherein the preset neural network is a target neural network obtained by a construction device, and the construction device includes:
the third construction unit is used for constructing an initial neural network for realizing natural language processing, and a plurality of specific structures preset in the initial neural network are respectively provided with corresponding sparse scaling operators which are used for scaling the output of the corresponding specific structures;
the training unit is used for training the weight of the initial neural network and the sparse scaling operator with the specific structure by adopting preset training sample data to obtain an intermediate neural network;
and the fourth construction unit is used for deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing natural language processing.
According to the data processing method of the low-computing-capacity processing device, on the one hand, sparse scaling operators are introduced to scale the outputs of different specific structures, so no new constraint needs to be added to the weights; the weights and the sparse scaling operators can be optimized independently, which can improve the precision of the neural network. On the other hand, a specific structure whose sparse scaling operator is zero contributes nothing to the output of the neural network, so deleting such structures does not affect the precision of the neural network and simplifies the network, thereby improving its running speed. Thus, a low-computing-power processing device may apply the neural network described above to perform real-time computer vision processing or real-time natural language processing.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a flow chart of a method for constructing a neural network according to an embodiment of the present invention;
FIG. 2 is a block diagram of a particular architecture of an embodiment of the present invention;
FIG. 3 is a diagram of a residual block in a residual network according to an embodiment of the present invention;
FIG. 4 is a diagram of a specific structure module according to an embodiment of the present invention;
FIG. 5 is a diagram illustrating a neuron as a specific structure in accordance with an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an apparatus for constructing a neural network according to an embodiment of the present invention.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments that can be derived by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
Example one
Referring to fig. 1, a flowchart of a method for constructing a neural network according to an embodiment of the present invention is shown, where the method includes:
step 101, constructing an initial neural network, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the output of the corresponding specific structures.
And 102, training the weight of the initial neural network and the sparse scaling operator with the specific structure by adopting preset training sample data to obtain an intermediate neural network.
And 103, deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network.
Preferably, the foregoing step 101 can be realized by the following steps A1 to A3:
and A1, selecting a neural network model.
In the embodiment of the invention, a neural network model corresponding to the function to be realized by the desired target neural network (such as a computer vision processing function, e.g. image segmentation, object detection or face recognition, or a natural language processing function) may be selected from a preset set of neural network models, or a corresponding neural network model may be constructed according to the function to be realized by the desired target neural network. The present application does not strictly limit this.
And A2, determining a specific structure of the neural network model needing to be provided with a sparse scaling operator.
In the embodiment of the present invention, a designer may determine the specific structures in the neural network model. For example, all or some of the neurons of one or more network layers in the neural network may be determined to be specific structures. And/or, one or more modules in the neural network having the following characteristics may be determined to be specific structures: characteristic 1, the module includes more than one network layer (e.g., the specific structure includes more than two cascaded network layers); characteristic 2, the module is connected in parallel with other modules, or the front and back ends of the module have a cross-layer connection. And/or, one or more modules in the neural network having the following properties may be determined to be specific structures: property 1, the module includes more than one sub-module (e.g., the specific structure includes more than two parallel sub-modules); property 2, the front and back ends of the module have a cross-layer connection.
And A3, setting an initial sparse scaling operator for a specific structure in the neural network model to obtain the initial neural network.
In the embodiment of the invention, the value of the sparse scaling operator of each specific structure is greater than or equal to 0. Preferably, the value of the initial sparse scaling operator is close to 1; for example, it may be set directly to 1.
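As an illustration of steps A1 to A3 only, a specific structure and its sparse scaling operator can be wrapped in a single module with the operator initialized to 1, roughly as in the following PyTorch-style sketch; all class and parameter names are illustrative and are not part of the patent's disclosed embodiments:

```python
import torch
import torch.nn as nn

class ScaledStructure(nn.Module):
    """Wraps a 'specific structure' (an arbitrary sub-network) with a sparse
    scaling operator that multiplies the structure's output."""

    def __init__(self, structure: nn.Module, init_scale: float = 1.0):
        super().__init__()
        self.structure = structure
        # Sparse scaling operator, initialized close to 1 as suggested above.
        self.scale = nn.Parameter(torch.tensor(init_scale))

    def forward(self, x):
        # Scale the output of the specific structure by its operator.
        return self.scale * self.structure(x)

# Usage: a small convolutional branch wrapped with an operator initialized to 1.
branch = nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.ReLU())
scaled_branch = ScaledStructure(branch, init_scale=1.0)
```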
Preferably, in the embodiment of the present invention, the step 102 may be specifically realized by the following steps B1 to B3:
and B1, constructing an objective function corresponding to the initial neural network, wherein the objective function comprises a loss function and a sparse regular function. The objective function is shown in equation (1):
min_{W,λ} (1/N) Σ_{i=1}^{N} L(y_i, Net(x_i, W, λ)) + R_s(λ)    formula (1)

In formula (1), W is the weight of the neural network, λ is the sparse scaling operator vector of the neural network, N is the number of training sample data, L(y_i, Net(x_i, W, λ)) is the loss of the neural network on sample data x_i with label y_i, and R_s(λ) is a sparse regularization function.
And B2, performing iterative training on the initial neural network by adopting the training sample data.
And step B3, when the iterative training times reach a threshold value or the objective function meets a preset convergence condition, obtaining the intermediate neural network.
Preferably, step B2 may be implemented by performing the following iterative training on the initial neural network a plurality of times. The description below takes an iteration that is neither the first nor the last (hereinafter referred to as the current iterative training) as an example; each such iteration includes the following steps C1 to C3:
step C1, taking the sparse scaling operator obtained by the previous iteration training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iteration training;
step C2, taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training;
and C3, performing next iteration training based on the weight and the sparse scaling operator of the iteration training.
The first iterative training process is as follows: taking an initial sparse scaling operator as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the iterative training; taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training; and performing second iterative training based on the weight of the iterative training and the sparse scaling operator.
The last iteration training process is as follows: taking a sparse scaling operator obtained by previous iterative training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iterative training; taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training; and taking the neural network containing the sparse scaling operator and the weight obtained by the iterative training as an intermediate neural network.
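A minimal sketch of this alternating scheme is given below, assuming a PyTorch-style model whose sparse scaling operators are collected in a list `scales`. The hyper-parameter names are assumptions, and a plain proximal (soft-threshold) step stands in for the second optimization algorithm; the accelerated variants are described later in modes 1 to 3:

```python
import torch
from itertools import cycle

def soft_threshold(z, alpha):
    # S_alpha(z) = sign(z) * max(|z| - alpha, 0)
    return torch.sign(z) * torch.clamp(torch.abs(z) - alpha, min=0.0)

def train_alternating(model, scales, data_loader, loss_fn,
                      num_iters=1000, lr_w=0.01, lr_s=0.01, gamma=1e-3):
    scale_ids = {id(s) for s in scales}
    weights = [p for p in model.parameters() if id(p) not in scale_ids]
    opt_w = torch.optim.SGD(weights, lr=lr_w, momentum=0.9)
    batches = cycle(data_loader)

    for _ in range(num_iters):
        x, y = next(batches)

        # Step C1: fix the scaling operators, update the weights W by SGD.
        for s in scales:
            s.requires_grad_(False)
        opt_w.zero_grad()
        loss_fn(model(x), y).backward()
        opt_w.step()
        for s in scales:
            s.requires_grad_(True)

        # Step C2: fix the weights W, update the scaling operators lambda
        # with a (non-accelerated) proximal gradient / soft-threshold step.
        for p in weights:
            p.requires_grad_(False)
        loss = loss_fn(model(x), y)
        grads = torch.autograd.grad(loss, scales)
        with torch.no_grad():
            for s, g in zip(scales, grads):
                s.copy_(soft_threshold(s - lr_s * g, lr_s * gamma))
        for p in weights:
            p.requires_grad_(True)
        # Step C3: the updated W and lambda are carried into the next iteration.
```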
Preferably, in the embodiment of the present invention, the first optimization algorithm may be, but is not limited to, any one of the following algorithms: a stochastic gradient descent algorithm, or a variant of stochastic gradient descent that introduces momentum.
Preferably, in the embodiment of the present invention, the second optimization algorithm may be, but is not limited to, any one of the following algorithms: an accelerated proximal gradient descent algorithm, a proximal gradient descent algorithm, or an alternating direction method of multipliers (ADMM) algorithm.
Preferably, in another embodiment, the objective function in the embodiment of the present invention includes a loss function, a weight regularization function and a sparsity regularization function, and the objective function is represented by equation (2):
min_{W,λ} (1/N) Σ_{i=1}^{N} L(y_i, Net(x_i, W, λ)) + R(W) + R_s(λ)    formula (2)

In formula (2), W is the weight of the neural network, λ is the sparse scaling operator vector of the neural network, N is the number of training sample data, L(y_i, Net(x_i, W, λ)) is the loss of the neural network on sample data x_i with label y_i, R(W) is a weight regularization function, and R_s(λ) is a sparse regularization function.

Preferably, in the embodiment of the present invention, R_s(λ) is a sparse regularization with weight γ, i.e. R_s(λ) = γ‖λ‖_1. Of course, those skilled in the art may also set R_s(λ) to a more complex sparse constraint, such as a non-convex sparse constraint.
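Purely as an illustration, the objective of formula (2) with R_s(λ) = γ‖λ‖_1 can be evaluated as in the following sketch; R(W) is taken as an L2 penalty for concreteness, and all function and argument names are assumptions rather than part of the patent:

```python
import torch

def objective(per_sample_losses, scales, weights, gamma=1e-3, weight_decay=1e-4):
    """Formula (2): mean data loss + weight regularization R(W)
    + sparse regularization R_s(lambda) = gamma * ||lambda||_1."""
    data_loss = per_sample_losses.mean()                               # (1/N) * sum_i L(y_i, Net(x_i, W, lambda))
    weight_reg = weight_decay * sum((w ** 2).sum() for w in weights)   # R(W), here an L2 penalty
    sparse_reg = gamma * sum(s.abs().sum() for s in scales)            # R_s(lambda)
    return data_loss + weight_reg + sparse_reg
```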
To describe in further detail how W and λ in the objective function are solved in the embodiment of the present invention, the objective function of formula (2) with R_s(λ) = γ‖λ‖_1 is taken as an example, and the optimization of the objective function within one iterative training to obtain W and λ is described. The data loss term (1/N) Σ_{i=1}^{N} L(y_i, Net(x_i, W, λ)) is denoted G(W) when λ is held constant and G(λ) when W is held constant.

With λ as a constant and W as a variable, the objective function is converted into min_W G(W) + R(W). The value of W can be solved by adopting a stochastic gradient descent algorithm, and the specific process is not described in detail.

With W as a constant and λ as a variable, the objective function is converted into min_λ G(λ) + γ‖λ‖_1. The value of λ is solved by adopting an accelerated proximal gradient descent algorithm, specifically in any of the following modes:
Mode 1, λ is obtained by the following formulas (3) to (5):

d_t = λ_{t-1} + ((t-2)/(t+1)) (λ_{t-1} - λ_{t-2})    formula (3)
z_t = d_t - η_t ∇G(d_t)    formula (4)
λ_t = S_{η_t γ}(z_t)    formula (5)

wherein η_t represents the step size of the gradient descent at the t-th iterative training, ∇G(d_t) is the gradient of G(·) at d_t, and S_{η_t γ}(·) is the soft-threshold operator, defined as S_α(z)_i = sign(z_i) · max(|z_i| - α, 0).
Mode 2, the solution of λ in the aforementioned mode 1 requires an additional forward and backward computation to obtain ∇G(d_t), so applying that algorithm directly to an existing deep learning framework is somewhat difficult. Therefore, mode 2 modifies the formulas of mode 1 to obtain formulas (6) to (8), and calculates λ from formulas (6) to (8):

z_t = λ_{t-1} - η_t ∇G(λ_{t-1})    formula (6)
v_t = S_{η_t γ}(z_t) - λ_{t-1} + μ_{t-1} v_{t-1}    formula (7)
λ_t = λ_{t-1} + v_t    formula (8)

wherein v_t is a momentum term and μ_{t-1} is a momentum coefficient.
Mode 3, the embodiment of the present invention further provides a simpler calculation of λ by the following formulas (9) to (11), to further reduce the difficulty:

z_t = λ'_{t-1} - η_t ∇G(λ'_{t-1})    formula (9)
v_t = S_{η_t γ}(z_t) - λ'_{t-1} + μ_{t-1} v_{t-1}    formula (10)
λ'_t = S_{η_t γ}(z_t) + μ_t v_t    formula (11)

wherein λ'_{t-1} = λ_{t-1} + μ_{t-1} v_{t-1}, μ is a preset fixed value (the momentum coefficient), and W and λ are updated in the form of mini-batch stochastic gradient descent.
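Assuming the reconstructed formulas (9) to (11) above, one λ update of this momentum-style proximal scheme could be sketched as follows; the names and exact form are assumptions, not the patent's reference implementation:

```python
import torch

def soft_threshold(z, alpha):
    # S_alpha(z) = sign(z) * max(|z| - alpha, 0)
    return torch.sign(z) * torch.clamp(torch.abs(z) - alpha, min=0.0)

def apg_nag_step(lam_ahead, v, grad_fn, eta, mu, gamma):
    """One lambda update in the spirit of mode 3.
    lam_ahead : look-ahead variable lambda'_{t-1} = lambda_{t-1} + mu * v_{t-1}
    v         : momentum buffer v_{t-1}
    grad_fn   : returns the gradient of the data loss G(.) at a given lambda
    eta, mu, gamma : step size, momentum coefficient, sparsity weight
    """
    z = lam_ahead - eta * grad_fn(lam_ahead)                       # formula (9)
    v_new = soft_threshold(z, eta * gamma) - lam_ahead + mu * v    # formula (10)
    lam_ahead_new = soft_threshold(z, eta * gamma) + mu * v_new    # formula (11)
    return lam_ahead_new, v_new
```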
Specific structures are described in detail below, taking modules and neurons as examples.
As shown in fig. 2, it is assumed that the neural network includes N modules, each module corresponds to a sparse scaling operator, and the front end and the back end of each module have cross-layer connection.
Taking a specific example, assume that the neural network is a residual network and the specific structure is a residual module. As shown in fig. 3, the front and back ends of each residual module have a cross-layer connection, and the i-th residual module corresponds to a sparse scaling operator λ_i. The output of the i-th residual module is then:

r_i = r_{i-1} + λ_i · F_i(r_{i-1})

wherein r_{i-1} is the input of the i-th residual module and F_i(·) is the transformation performed by the i-th residual module. If, after training, the sparse scaling operator λ_3 of the third residual module is equal to 0, the 3rd residual module F_3 is deleted from the residual network.
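The residual example admits a compact sketch: each residual branch output is multiplied by its operator λ_i, and blocks whose operator is zero after training are dropped, while the cross-layer (identity) connection keeps the network valid. This is an illustrative PyTorch-style sketch; the module names are assumptions:

```python
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    """Computes r_i = r_{i-1} + lambda_i * F_i(r_{i-1})."""
    def __init__(self, channels):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.scale = nn.Parameter(torch.ones(1))   # sparse scaling operator lambda_i

    def forward(self, x):
        return x + self.scale * self.branch(x)

def prune_zero_blocks(blocks):
    """Delete residual blocks whose sparse scaling operator is zero; the identity
    (cross-layer) connection keeps the network valid after deletion."""
    kept = [b for b in blocks if float(b.scale.abs()) > 0.0]
    return nn.Sequential(*kept)

# Usage: a small stack of blocks; after training, blocks with lambda == 0 are removed.
network = nn.Sequential(*[ScaledResidualBlock(16) for _ in range(4)])
pruned = prune_zero_blocks(list(network))
```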
As shown in fig. 4, it is assumed that the neural network includes N blocks, each block includes M parallel modules, each module includes a plurality of cascaded network layers, and each module corresponds to one sparse scaling operator.
As shown in fig. 5, assuming that the neural network includes L network layers and the l-th network layer includes k neurons, each of the k neurons corresponds to its own sparse scaling operator.
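For the neuron case, the k neurons of a layer can share a k-dimensional operator vector, one component per neuron; neurons whose component reaches zero can be deleted. A hedged sketch with illustrative names:

```python
import torch
import torch.nn as nn

class NeuronScaledLinear(nn.Module):
    """Fully connected layer whose k output neurons are each scaled by their own
    sparse scaling operator; neurons whose operator is 0 contribute nothing
    and can be deleted from the layer."""
    def __init__(self, in_features, k):
        super().__init__()
        self.linear = nn.Linear(in_features, k)
        self.scale = nn.Parameter(torch.ones(k))   # one operator per neuron

    def forward(self, x):
        return self.scale * self.linear(x)

    def prune(self):
        keep = self.scale.abs() > 0                 # neurons to retain
        pruned = nn.Linear(self.linear.in_features, int(keep.sum()))
        with torch.no_grad():
            # Fold the surviving operators into the weights so the output is unchanged.
            pruned.weight.copy_(self.scale[keep].unsqueeze(1) * self.linear.weight[keep])
            pruned.bias.copy_(self.scale[keep] * self.linear.bias[keep])
        return pruned
```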
In the neural network obtained through the above processing, on the one hand, sparse scaling operators are introduced to scale the outputs of different specific structures, so no new constraint needs to be added to the weights; the weights and the sparse scaling operators can be optimized independently, which can improve the precision of the neural network. On the other hand, a specific structure whose sparse scaling operator is zero contributes nothing to the output of the neural network, so deleting such structures does not affect the precision of the neural network and simplifies the network, thereby improving its running speed.
The neural network can thus be applied in low computing power processing devices for real-time computer vision processing of image data or real-time natural language processing of text data. Low computing power processing devices include low computing power integrated or embedded devices, low computing power computing platforms, low computing power mobile devices, and the like.
In one example embodiment, a method for data processing by a low computing power processing device includes:
step 1, in the real-time computer vision processing process, a processing device with low computing power acquires image data;
and 2, the processing equipment performs computer vision processing on the acquired image data by using a preset neural network to obtain a computer vision processing result.
Wherein the preset neural network is a target neural network obtained by processing as shown in fig. 1.
Through the above processing, the low-computing-capacity processing device can efficiently and quickly process the image data acquired during real-time computer vision processing through the pre-configured neural network, thereby enabling the device to perform real-time computer vision processing.
In one example embodiment, a method for data processing by a low computing power processing device includes:
step 1', in the real-time natural language processing process, a processing device with low computing power acquires text data;
and 2', the processing equipment performs natural language processing on the acquired text data by using a preset neural network to obtain a natural language processing result.
Wherein the preset neural network is a target neural network obtained by processing as shown in fig. 1.
Through the above processing, the low-computing-capacity processing device can efficiently and quickly process the text data acquired during real-time natural language processing through the pre-configured neural network, thereby enabling the device to perform real-time natural language processing.
Example two
Based on the same inventive concept of the method for constructing a neural network provided in the first embodiment, a second embodiment of the present invention provides an apparatus for constructing a neural network, the apparatus having a structure as shown in fig. 6, and the apparatus includes:
the first construction unit 61 is configured to construct an initial neural network, where a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, where the sparse scaling operators are used to scale outputs of the corresponding specific structures;
the training unit 62 is configured to train the weight of the initial neural network and the sparse scaling operator with the specific structure by using preset training sample data to obtain an intermediate neural network;
and a second constructing unit 63, configured to delete the specific structure in which the sparse scaling operator is zero in the intermediate neural network, so as to obtain the target neural network.
Preferably, the first constructing unit 61 specifically includes a selecting module, a specific structure determining module, and a constructing module, where:
the selecting module is used for selecting a neural network model;
in the embodiment of the invention, the selection module can be specifically realized as follows: a neural network model corresponding to the function (e.g., the function of computer vision processing: image segmentation, object detection, face recognition, or natural language processing) implemented by the desired target neural network may be selected from a preset set of neural network models, or a corresponding neural network model may be constructed according to the function implemented by the desired target neural network. The present application is not strictly limited.
The specific structure determining module is used for determining a specific structure of the neural network model, which needs to be provided with a sparse scaling operator;
and the building module is used for setting an initial sparse scaling operator for a specific structure in the neural network model to obtain the initial neural network.
In the embodiment of the invention, the value of the sparse scaling operator of each specific structure is greater than or equal to 0 and less than or equal to 1. Preferably, the value of the initial sparse scaling operator is close to 1; for example, it may be set directly to 1.
Preferably, the training unit 62 specifically includes an objective function constructing module, a training module, and a determining module, where:
the target function construction module is used for constructing a target function corresponding to the initial neural network, and the target function comprises a loss function and a sparse regular function;
the training module is used for carrying out iterative training on the initial neural network by adopting the training sample data;
and the determining module is used for obtaining the intermediate neural network when the iterative training times reach a threshold value or the target function meets a preset convergence condition.
Preferably, the training module is specifically configured to: performing the following iterative training on the initial neural network for a plurality of times (the iterative training is not the first iterative training and is not the last iterative training): taking a sparse scaling operator obtained by previous iterative training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iterative training; taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training; and performing next iterative training based on the weight of the iterative training and the sparse scaling operator.
The first iterative training process is as follows: taking an initial sparse scaling operator as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the iterative training; taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training; and performing second iterative training based on the weight of the iterative training and the sparse scaling operator.
The last iteration training process is as follows: taking a sparse scaling operator obtained by previous iterative training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iterative training; taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training; and taking the neural network containing the sparse scaling operator and the weight obtained by the iterative training as an intermediate neural network.
Preferably, the first optimization algorithm may be, but is not limited to, any one of the following algorithms: a stochastic gradient descent algorithm, or a variant of stochastic gradient descent that introduces momentum.
Preferably, the second optimization algorithm is an accelerated proximal gradient descent algorithm, a proximal gradient descent algorithm, or an alternating direction method of multipliers (ADMM) algorithm.
Preferably, the objective function is:
min_{W,λ} (1/N) Σ_{i=1}^{N} L(y_i, Net(x_i, W, λ)) + R_s(λ)

wherein W is the weight of the neural network, λ is the sparse scaling operator vector of the neural network, N is the number of training sample data, L(y_i, Net(x_i, W, λ)) is the loss of the neural network on sample data x_i with label y_i, and R_s(λ) is a sparse regularization function.
Preferably, in another embodiment, the objective function in the embodiment of the present invention includes a loss function, a weight regularization function, and a sparsity regularization function, and the objective function is as follows:
min_{W,λ} (1/N) Σ_{i=1}^{N} L(y_i, Net(x_i, W, λ)) + R(W) + R_s(λ)

wherein W is the weight of the neural network, λ is the sparse scaling operator vector of the neural network, N is the number of training sample data, L(y_i, Net(x_i, W, λ)) is the loss of the neural network on sample data x_i with label y_i, R(W) is a weight regularization function, and R_s(λ) is a sparse regularization function.
Preferably, in the embodiment of the present invention, R_s(λ) is a sparse regularization with weight γ, i.e. R_s(λ) = γ‖λ‖_1. Of course, those skilled in the art may also set R_s(λ) to a more complex sparse constraint, such as a non-convex sparse constraint.
Preferably, the specific structure is a neuron; or, the specific structure is a module including more than one network layer (for example, the specific structure includes more than two cascaded network layers), and the module is connected in parallel with other modules; alternatively, the specific structure is a module including more than one parallel module (for example, the specific structure includes more than two parallel modules), and the front end and the rear end of the module have cross-layer connection.
In the neural network obtained by the construction device shown in fig. 6, on the one hand, sparse scaling operators are introduced to scale the outputs of different specific structures, so no new constraint needs to be added to the weights; the weights and the sparse scaling operators can be optimized independently, which can improve the precision of the neural network. On the other hand, a specific structure whose sparse scaling operator is zero contributes nothing to the output of the neural network, so deleting such structures does not affect the precision of the neural network and simplifies the network, thereby improving its running speed.
The neural network can thus be used in low computing power processing devices for real-time computer vision processing of image data or real-time natural language processing of text data.
In one example embodiment, a data processing apparatus of a low computing power processing device is provided that may be used for real-time image data processing. The device includes: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring image data in a real-time computer vision processing process;
and carrying out computer vision processing on the acquired image data by using a preset neural network to obtain a computer vision processing result.
Wherein the preset neural network is a target neural network obtained by the construction apparatus shown in fig. 6. When the low-computing-capacity processing device performs image data processing, the first constructing unit 61 shown in fig. 6 may be used to construct an initial neural network used for computer vision processing, and the second constructing unit 63 may be used to construct a target neural network used for computer vision processing.
The apparatus may be located in, be part of, or be integrated with the low computing power processing device. The device can process the image data acquired in real time in the real-time computer vision processing process through a preset neural network.
In one example embodiment, a data processing apparatus of a low computing power processing device is provided that may be used for real-time text data processing. The device includes: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring text data in a real-time natural language processing process;
and carrying out natural language processing on the acquired text data by using a preset neural network to obtain a natural language processing result.
Wherein the preset neural network is a target neural network obtained by the construction apparatus shown in fig. 6. When a low-computing-power processing device performs real-time natural language processing on text data, in order to distinguish this case from the real-time computer vision processing performed on image data, the first building unit 61 shown in fig. 6 may be referred to as a third building unit (not shown in the figure) and the second building unit 63 may be referred to as a fourth building unit (not shown in the figure); the third building unit may be configured to build an initial neural network used for natural language processing, and the fourth building unit may be configured to build a target neural network used for natural language processing.
The apparatus may be located in, be part of, or be integrated with the low computing power processing device. The device can process the text data acquired in real time in the real-time natural language processing process through a preset neural network.
The foregoing is the core idea of the present invention, and in order to make the technical solutions in the embodiments of the present invention better understood and make the above objects, features and advantages of the embodiments of the present invention more obvious and understandable, the technical solutions in the embodiments of the present invention are further described in detail with reference to the accompanying drawings.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (16)

1. A data processing method for a low computing power processing device, comprising:
in the real-time computer vision processing process, a processing device with low computing power acquires image data;
the processing equipment uses a preset neural network to perform computer vision processing on the acquired image data to obtain a computer vision processing result; the preset neural network is a target neural network obtained by the following processing:
constructing an initial neural network for realizing computer vision processing, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the output of the corresponding specific structures;
training the weight of the initial neural network and the sparse scaling operator with a specific structure by adopting preset training sample data to obtain an intermediate neural network;
and deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing the computer vision processing.
2. The method according to claim 1, wherein constructing an initial neural network implementing computer vision processing comprises:
selecting a neural network model for realizing computer vision processing;
determining a specific structure of the neural network model needing to be provided with a sparse scaling operator;
and setting an initial sparse scaling operator for a specific structure in the neural network model to obtain the initial neural network.
3. The method according to claim 1, wherein training the weights of the initial neural network and the sparse scaling operator of the specific structure with preset training sample data to obtain an intermediate neural network specifically comprises:
constructing an objective function corresponding to an initial neural network, wherein the objective function comprises a loss function and a sparse regular function;
performing iterative training on the initial neural network by adopting the training sample data;
and when the iterative training times reach a threshold value or the target function meets a preset convergence condition, obtaining the intermediate neural network.
4. The method according to claim 3, wherein the iteratively training the initial neural network using the training sample data specifically comprises:
performing the following iterative training on the initial neural network for a plurality of times:
taking a sparse scaling operator obtained by previous iterative training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iterative training;
taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training;
and performing next iterative training based on the weight of the iterative training and the sparse scaling operator.
5. The method of claim 4, wherein the second optimization algorithm is an accelerated proximal gradient descent algorithm, a proximal gradient descent algorithm, or an alternating direction method of multipliers algorithm.
6. The method of claim 3, wherein the objective function is:
min_{W,λ} (1/N) Σ_{i=1}^{N} L(y_i, Net(x_i, W, λ)) + R_s(λ)

wherein W is the weight, λ is the sparse scaling operator vector, N is the number of sample data, L(y_i, Net(x_i, W, λ)) is the loss of the neural network on sample data x_i with label y_i, and R_s(λ) is a sparse regularization function.
7. The method according to any one of claims 1 to 6, wherein the specific structure is a neuron;
or the specific structure is a module comprising more than one network layer, and the module is connected with other modules in parallel;
or the specific structure is a module comprising more than one module, and the front end and the rear end of the module are connected in a cross-layer mode.
8. A data processing apparatus of a low computing power processing device, comprising: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring image data in a real-time computer vision processing process;
performing computer vision processing on the acquired image data by using a preset neural network to obtain a computer vision processing result; wherein the preset neural network is a target neural network obtained by a construction device, and the construction device includes:
the computer vision processing system comprises a first construction unit, a second construction unit and a third construction unit, wherein the first construction unit is used for constructing an initial neural network for realizing computer vision processing, a plurality of specific structures preset in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the output of the corresponding specific structures;
the training unit is used for training the weight of the initial neural network and the sparse scaling operator with the specific structure by adopting preset training sample data to obtain an intermediate neural network for realizing computer vision processing;
and the second construction unit is used for deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network.
9. The apparatus according to claim 8, wherein the first building unit comprises:
the selection module is used for selecting a neural network model for realizing computer vision processing;
the specific structure determining module is used for determining a specific structure of the neural network model, which needs to be provided with a sparse scaling operator;
and the building module is used for setting an initial sparse scaling operator for a specific structure in the neural network model to obtain the initial neural network.
10. The apparatus according to claim 8, wherein the training unit specifically comprises:
the target function construction module is used for constructing a target function corresponding to the initial neural network, and the target function comprises a loss function and a sparse regular function;
the training module is used for carrying out iterative training on the initial neural network by adopting the training sample data;
and the determining module is used for obtaining the intermediate neural network when the iterative training times reach a threshold value or the target function meets a preset convergence condition.
11. The apparatus of claim 10, wherein the training module is specifically configured to:
performing the following iterative training on the initial neural network for a plurality of times:
taking a sparse scaling operator obtained by previous iterative training as a constant of the objective function, taking the weight as a variable of the objective function, and optimizing the objective function by adopting a first optimization algorithm to obtain the weight of the current iterative training;
taking the weight of the iterative training as a constant of the objective function, taking a sparse scaling operator as a variable of the objective function, and optimizing the objective function by adopting a second optimization algorithm to obtain the sparse scaling operator of the iterative training;
and performing next iterative training based on the weight of the iterative training and the sparse scaling operator.
12. The apparatus of claim 11, wherein the second optimization algorithm is an accelerated proximal gradient descent algorithm, a proximal gradient descent algorithm, or an alternating direction method of multipliers algorithm.
13. The apparatus of claim 10, wherein the objective function is:
min_{W,λ} (1/N) Σ_{i=1}^{N} L(y_i, Net(x_i, W, λ)) + R_s(λ)

wherein W is the weight, λ is the sparse scaling operator vector, N is the number of sample data, L(y_i, Net(x_i, W, λ)) is the loss of the neural network on sample data x_i with label y_i, and R_s(λ) is a sparse regularization function.
14. The device according to any one of claims 8 to 13, wherein the specific structure is a neuron;
or the specific structure is a module comprising more than one network layer, and the module is connected with other modules in parallel;
or the specific structure is a module comprising more than one module, and the front end and the rear end of the module are connected in a cross-layer mode.
15. A data processing method for a low computing power processing device, comprising:
in the real-time natural language processing process, processing equipment with low computing power acquires text data;
the processing equipment uses a preset neural network to perform natural language processing on the acquired text data to obtain a natural language processing result; the preset neural network is a target neural network obtained by the following processing:
constructing an initial neural network for realizing natural language processing, wherein a plurality of preset specific structures in the initial neural network are respectively provided with corresponding sparse scaling operators, and the sparse scaling operators are used for scaling the output of the corresponding specific structures;
training the weight of the initial neural network and the sparse scaling operator with a specific structure by adopting preset training sample data to obtain an intermediate neural network;
and deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing natural language processing.
16. A data processing apparatus of a low computing power processing device, comprising: at least one processor and at least one memory, at least one machine executable instruction stored in the at least one memory, the at least one processor executing the at least one machine executable instruction to perform the following:
acquiring text data in a real-time natural language processing process;
carrying out natural language processing on the acquired text data by using a preset neural network to obtain a natural language processing result; wherein the preset neural network is a target neural network obtained by a construction device, and the construction device includes:
the third construction unit is used for constructing an initial neural network for realizing natural language processing, and a plurality of specific structures preset in the initial neural network are respectively provided with corresponding sparse scaling operators which are used for scaling the output of the corresponding specific structures;
the training unit is used for training the weight of the initial neural network and the sparse scaling operator with the specific structure by adopting preset training sample data to obtain an intermediate neural network;
and the fourth construction unit is used for deleting the specific structure with the sparse scaling operator being zero in the intermediate neural network to obtain the target neural network for realizing natural language processing.
CN202010011285.4A 2017-06-15 2017-06-15 Method and device for constructing neural network Active CN111178520B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010011285.4A CN111178520B (en) 2017-06-15 2017-06-15 Method and device for constructing neural network

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710450550.7A CN107247991A (en) 2017-06-15 2017-06-15 A kind of method and device for building neural network
CN202010011285.4A CN111178520B (en) 2017-06-15 2017-06-15 Method and device for constructing neural network

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201710450550.7A Division CN107247991A (en) 2017-06-15 2017-06-15 A kind of method and device for building neural network

Publications (2)

Publication Number Publication Date
CN111178520A true CN111178520A (en) 2020-05-19
CN111178520B CN111178520B (en) 2024-06-07

Family

ID=60019020

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202010011285.4A Active CN111178520B (en) 2017-06-15 2017-06-15 Method and device for constructing neural network
CN201710450550.7A Pending CN107247991A (en) 2017-06-15 2017-06-15 A kind of method and device for building neural network

Family Applications After (1)

Application Number Title Priority Date Filing Date
CN201710450550.7A Pending CN107247991A (en) 2017-06-15 2017-06-15 Method and device for constructing a neural network

Country Status (2)

Country Link
CN (2) CN111178520B (en)
WO (1) WO2018227801A1 (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11651223B2 (en) 2017-10-27 2023-05-16 Baidu Usa Llc Systems and methods for block-sparse recurrent neural networks
US11461628B2 (en) * 2017-11-03 2022-10-04 Samsung Electronics Co., Ltd. Method for optimizing neural networks
CN108805258B (en) * 2018-05-23 2021-10-12 北京图森智途科技有限公司 Neural network training method and device and computer server
CN109284820A (en) * 2018-10-26 2019-01-29 北京图森未来科技有限公司 Structure search method and device for a deep neural network
CN109840588B (en) * 2019-01-04 2023-09-08 平安科技(深圳)有限公司 Neural network model training method, device, computer equipment and storage medium
CN112417610A (en) * 2019-08-22 2021-02-26 中国电力科学研究院有限公司 Wear assessment method and system and optimization method and system for aluminum alloy monofilaments
CN110472400B (en) * 2019-08-22 2021-06-01 浪潮集团有限公司 Trusted computer system based on face recognition and implementation method
CN110751267B (en) * 2019-09-30 2021-03-30 京东城市(北京)数字科技有限公司 Neural network structure searching method, training method, device and storage medium
CN111985644B (en) * 2020-08-28 2024-03-08 北京市商汤科技开发有限公司 Neural network generation method and device, electronic equipment and storage medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103793694B (en) * 2014-02-10 2017-02-08 天津大学 Human face recognition method based on multiple-feature space sparse classifiers
CN106548192B (en) * 2016-09-23 2019-08-09 北京市商汤科技开发有限公司 Image processing method, device and electronic equipment neural network based

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006001121A1 (en) * 2004-06-25 2006-01-05 Shin Caterpillar Mitsubishi Ltd. Data compressing device and method, data analyzing device and method, and data managing system
CN104751842A (en) * 2013-12-31 2015-07-01 安徽科大讯飞信息科技股份有限公司 Method and system for optimizing deep neural network
CN104200224A (en) * 2014-08-28 2014-12-10 西北工业大学 Valueless image removing method based on deep convolutional neural networks
CN106295794A (en) * 2016-07-27 2017-01-04 中国石油大学(华东) Fractional-order neural network modeling approach based on a smooth Group Lasso penalty term
CN106650928A (en) * 2016-10-11 2017-05-10 广州视源电子科技股份有限公司 Neural network optimization method and device
CN106503654A (en) * 2016-10-24 2017-03-15 中国地质大学(武汉) Facial emotion recognition method based on a deep sparse autoencoder network
CN106548234A (en) * 2016-11-17 2017-03-29 北京图森互联科技有限责任公司 Neural network pruning method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JIN-KYU KIM et al.: "An efficient pruning and weight sharing method for neural network", 2016 IEEE International Conference on Consumer Electronics-Asia (ICCE-Asia), 5 January 2017 (2017-01-05) *
李鸣 (Li Ming); 张鸿 (Zhang Hong): "Image classification algorithm based on iterative optimization of convolutional neural networks", 计算机工程与设计 (Computer Engineering and Design), no. 01 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673694A (en) * 2021-05-26 2021-11-19 阿里巴巴新加坡控股有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN113673694B (en) * 2021-05-26 2024-08-27 阿里巴巴创新公司 Data processing method and device, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN111178520B (en) 2024-06-07
WO2018227801A1 (en) 2018-12-20
CN107247991A (en) 2017-10-13

Similar Documents

Publication Publication Date Title
CN111178520A (en) Data processing method and device of low-computing-capacity processing equipment
US11870947B2 (en) Generating images using neural networks
US11651259B2 (en) Neural architecture search for convolutional neural networks
CN111414987B (en) Training method and training device of neural network and electronic equipment
KR102318772B1 (en) Domain Separation Neural Networks
WO2020082663A1 (en) Structural search method and apparatus for deep neural network
US10380479B2 (en) Acceleration of convolutional neural network training using stochastic perforation
JP7439151B2 (en) neural architecture search
KR102415506B1 (en) Device and method to reduce neural network
CN114503121A (en) Resource constrained neural network architecture search
US20180018555A1 (en) System and method for building artificial neural network architectures
CN111406267A (en) Neural architecture search using performance-predictive neural networks
US11144782B2 (en) Generating video frames using neural networks
CN110622178A (en) Learning neural network structure
WO2021042857A1 (en) Processing method and processing apparatus for image segmentation model
CN111008631B (en) Image association method and device, storage medium and electronic device
WO2020152233A1 (en) Action selection using interaction history graphs
CN110956655B (en) Dense depth estimation method based on monocular image
CN114282666A (en) Structured pruning method and device based on local sparse constraint
CN118643874A (en) Method and device for training neural network
KR20220134627A (en) Hardware-optimized neural architecture discovery
CN113723603A (en) Method, device and storage medium for updating parameters
Huai et al. Zerobn: Learning compact neural networks for latency-critical edge systems
CN113705724B (en) Batch learning method of deep neural network based on self-adaptive L-BFGS algorithm
CN114021697A (en) End cloud framework neural network generation method and system based on reinforcement learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant