CN117391187A - Neural network lossy transmission optimization method and system based on dynamic hierarchical mask - Google Patents

Neural network lossy transmission optimization method and system based on dynamic hierarchical mask

Info

Publication number
CN117391187A
CN117391187A (application CN202311416736.2A)
Authority
CN
China
Prior art keywords
sub
neural network
neural networks
mask
layering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311416736.2A
Other languages
Chinese (zh)
Inventor
黄志青
余俊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Hengsha Digital Technology Co ltd
Original Assignee
Guangzhou Hengsha Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Hengsha Digital Technology Co ltd filed Critical Guangzhou Hengsha Digital Technology Co ltd
Priority to CN202311416736.2A priority Critical patent/CN117391187A/en
Publication of CN117391187A publication Critical patent/CN117391187A/en
Pending legal-status Critical Current

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/098 - Distributed learning, e.g. federated learning
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/0464 - Convolutional networks [CNN, ConvNet]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/084 - Backpropagation, e.g. using gradient descent
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G06N3/0985 - Hyperparameter optimisation; Meta-learning; Learning-to-learn
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/04 - Inference or reasoning models
    • G06N5/043 - Distributed expert systems; Blackboards

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a neural network lossy transmission optimization method and system based on dynamic hierarchical masks. The method comprises the following steps: obtaining a deep neural network to be used for inference and splitting it to obtain a plurality of sub-deep neural networks; introducing a dynamic hierarchical mask at the output layer of each sub-deep neural network to obtain sub-deep neural networks with hierarchical masks; training the plurality of sub-deep neural networks with hierarchical masks; and deploying the trained sub-deep neural networks with hierarchical masks to terminal devices to perform inference on input data, thereby obtaining output data corresponding to the input data. By introducing the dynamic hierarchical mask, the invention avoids the loss that network transmission would otherwise cause to the intermediate features and improves the efficiency of distributed collaborative inference. The method and system can be widely applied in the technical field of deep neural network optimization.

Description

Neural network lossy transmission optimization method and system based on dynamic hierarchical mask
Technical Field
The invention relates to the technical field of deep neural network optimization, in particular to a neural network lossy transmission optimization method and system based on dynamic hierarchical masks.
Background
With the rapid development of the Internet of Things (IoT) and deep neural networks (DNNs), artificial intelligence (AI) applications such as face recognition, object detection and augmented reality continue to advance. However, the limited computing resources of mobile devices pose significant challenges for mobile machine learning applications. At present, some applications adopt optimized computational models that are several orders of magnitude smaller than the standard models. Although this approach reduces the computational burden, it may also reduce inference accuracy. Severe computational constraints therefore make more computation-intensive AI tasks difficult to implement, further hampering the development of mobile applications. Meanwhile, existing neural network optimization approaches create an artificial bottleneck by modifying the model structure so that fewer intermediate features are transmitted; this improves system efficiency by reducing communication delay but requires modifying the original model structure, and some approaches introduce an auxiliary neural network, for example an auto-encoder that simulates communication loss during training. Introducing an additional neural network, however, brings extra computational power consumption and time delay.
Disclosure of Invention
In order to solve the above technical problems, the invention aims to provide a neural network lossy transmission optimization method and system based on a dynamic hierarchical mask which, by introducing the dynamic hierarchical mask, can avoid the loss of intermediate features caused by network transmission and improve the efficiency of distributed collaborative inference.
The first technical solution adopted by the invention is a neural network lossy transmission optimization method based on a dynamic hierarchical mask, comprising the following steps:
obtaining a deep neural network to be used for inference and splitting it to obtain a plurality of sub-deep neural networks, wherein, for any two adjacent sub-deep neural networks, the output of the preceding sub-deep neural network is the input of the following sub-deep neural network;
introducing a dynamic layering mask at the output layer of each sub-deep neural network to obtain a plurality of sub-deep neural networks with layering masks;
training the plurality of sub-deep neural networks with layering masks to obtain trained sub-deep neural networks with layering masks;
and deploying the trained sub-deep neural networks with layering masks to terminal devices to perform inference on input data, thereby obtaining output data corresponding to the input data.
Further, the step of obtaining the deep neural network to be used for inference and splitting it to obtain a plurality of sub-deep neural networks specifically includes:
constructing the deep neural network to be used for inference based on the distributed deployment of a deep neural network, wherein the deep neural network to be used for inference represents the distributed deep neural network;
setting split points based on the hidden layers of the deep neural network to be used for inference;
and splitting the deep neural network to be used for inference at the split points to obtain a plurality of sub-deep neural networks.
Further, the step of introducing a dynamic layering mask at the output layers of the sub-deep neural networks to obtain a plurality of sub-deep neural networks with layering masks specifically includes:
inserting a dynamic layering mask at the output layer of each sub-deep neural network, wherein the dynamic layering mask comprises an inter-channel mask and an intra-channel mask;
obtaining the weight distribution of the output layers of the sub-deep neural networks based on the dynamic layering mask;
generating dynamic layering mask values according to the weight distribution of the output layers of the sub-deep neural networks, wherein the dynamic layering mask values take into account both the weights of each sub-deep neural network output layer and the weight relations among the output layers of different sub-deep neural networks;
and obtaining a plurality of sub-deep neural networks with layering masks based on the dynamic layering mask values.
Further, the inter-channel mask is used for distinguishing the signals of the output layers of different sub-deep neural networks, controlling information transfer and interference among the output layers of different sub-deep neural networks, and performing feature extraction and information interaction among the output layers of different sub-deep neural networks.
Further, the intra-channel mask is used for suppressing the weights of channels of the sub-deep neural network output layer; a corresponding intra-channel mask value is generated by calculating the weight distribution of each channel, and each element in the intra-channel mask value indicates whether the weight of the corresponding channel is suppressed: if the element value is 0, the weight of the channel is suppressed, and if the element value is 1, the weight of the channel is not suppressed.
Further, the expression of the output layer of the sub-deep neural networks is specifically as follows:

$$y_i^m = f_i^m\left(w_i^m,\, x^{m-1}\right), \qquad x^0 = I_j$$

where $y_i^m$ denotes the output of the neural network at the $i$-th position of the $m$-th layer, $f_i^m$ denotes the feature map at the $i$-th position of the $m$-th layer, $w_i^m$ denotes the neural network weights at the $i$-th position of the $m$-th layer, $x^l$ denotes the input of layer $l$, and $x^0 = I_j$ denotes the input of layer 0, i.e., the original image.
Further, the expression of the inter-channel mask is specifically as follows:

$$M_1(p_1) \sim \mathrm{Bernoulli}(p_1)$$

$$y = M_1(p_1) \odot \sigma(w \cdot x)$$

where $p_1$ denotes the probability of setting the input data to 0, $1 - p_1$ denotes the probability of setting the input data to 1, $\sigma$ denotes the activation function, $w$ denotes the deep neural network weights, $x$ denotes the output of the sub-deep neural network, $M_1(\cdot)$ denotes the Bernoulli-distributed mask, $\odot$ denotes element-wise multiplication, and $\mathrm{Bernoulli}(\cdot)$ denotes sampling from the Bernoulli distribution.
Further, the expression of the intra-channel mask is specifically as follows:

$$M_2(\gamma) \sim \mathrm{Bernoulli}(\gamma)$$

where $M_2(\gamma)$ denotes the Bernoulli distribution parameterized by $\gamma$, $\gamma$ denotes the probability that an activation becomes the center point of a rectangular block, $F(\cdot)$ denotes the fill function, $p_2$ denotes the ratio controlling how much of the mask is set to 0, and $1 - p_2$ denotes the ratio set to 1.
Further, the expression of the probability that an activation becomes the center point of a rectangular block is specifically as follows:

$$\gamma = \frac{p_2}{B^2} \cdot \frac{H \cdot W}{(H - B + 1)(W - B + 1)}$$

where $B$ denotes the side length of the rectangular block and $H$, $W$ denote the spatial dimensions of the output data.
The second technical solution adopted by the invention is a neural network lossy transmission optimization system based on a dynamic hierarchical mask, comprising:
a splitting module for obtaining a deep neural network to be used for inference and splitting it to obtain a plurality of sub-deep neural networks, wherein, for any two adjacent sub-deep neural networks, the output of the preceding sub-deep neural network is the input of the following sub-deep neural network;
an insertion module for introducing a dynamic layering mask at the output layer of each sub-deep neural network to obtain a plurality of sub-deep neural networks with layering masks;
a training module for training the plurality of sub-deep neural networks with layering masks to obtain trained sub-deep neural networks with layering masks;
and an inference module for deploying the trained sub-deep neural networks with layering masks to terminal devices to perform inference on input data, thereby obtaining output data corresponding to the input data.
The method and the system have the following beneficial effects: the invention obtains and splits the deep neural network to be used for inference and introduces a dynamic layering mask at the output layers of the sub-deep neural networks, so as to model the common loss patterns caused by noise, device failures and the like during network communication; the dynamic layering mask can take values dynamically according to the current network communication condition, thereby achieving self-adaptation and avoiding the data loss caused by transmitting intermediate data between network layers, which also reduces communication delay; the sub-networks are further deployed on different devices for inference, which reduces the basic computation cost of distributed collaborative inference and improves its efficiency.
Drawings
FIG. 1 is a flow chart of steps of a neural network lossy transmission optimization method based on dynamic hierarchical masking in accordance with an embodiment of the present invention;
FIG. 2 is a block diagram of a neural network lossy transmission optimization system based on dynamic hierarchical masking in accordance with an embodiment of the present invention;
FIG. 3 is a schematic diagram of a prior art distributed collaborative neural network architecture;
FIG. 4 is a schematic diagram of the overall flow of a hierarchical mask based neural network optimization system in accordance with an embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating an overall process flow of a dynamic hierarchical masking module according to an embodiment of the present invention;
FIG. 6 is a diagram illustrating dynamic hierarchical mask generation in accordance with an embodiment of the present invention.
Detailed Description
The invention will now be described in further detail with reference to the drawings and to specific examples. The step numbers in the following embodiments are set for convenience of illustration only, and the order between the steps is not limited in any way, and the execution order of the steps in the embodiments may be adaptively adjusted according to the understanding of those skilled in the art.
Explanation of technical terms of the present invention:
distributed collaborative reasoning: the distributed collaborative reasoning of the deep neural network is a method for completing the neural network reasoning task by utilizing multi-node and multi-device collaboration. The neural network reasoning task is distributed to different nodes or devices to realize parallel calculation and collaborative reasoning, so that overall reasoning speed and accuracy are improved, in distributed collaborative reasoning, different nodes or devices can play different roles, for example, some nodes can be responsible for processing input data, some nodes can be responsible for executing convolutional layer calculation of the neural network, and other nodes can be responsible for executing full-connection layer or output layer calculation and the like. The nodes perform data interaction and result summarization through a communication protocol, and finally complete the whole neural network reasoning task, and the distributed collaborative reasoning has the advantages that the computing resources and parallel processing capacity of multiple nodes and multiple devices can be fully utilized, and the reasoning speed and the reasoning efficiency are improved. Meanwhile, as different nodes or devices can be located in different geographic positions or network topological structures, geographic diversity and system fault tolerance can be realized, overall reasoning accuracy and reliability are improved, and distributed collaborative reasoning is an important research direction in the future deep learning field and can be applied to various scenes, such as cloud end, edge end, terminal equipment and the like. Meanwhile, with the continuous development of technologies such as the Internet of things, edge computing and 5G communication, the application prospect of distributed collaborative reasoning is also becoming wider and wider.
Inter-channel masking: the Inter-Channel Mask (Inter-Channel Mask) is a special Mask of the present invention. And the middle layer is mainly arranged at the division position of the neural network, and the feature map output by the middle layer is set to 0-1 according to the set mask. Since the output of the neural network intermediate layer typically contains multiple channels, the mask is primarily used to solve the inter-channel set problem, and is therefore referred to as an inter-channel mask. The manner in which the mask 0-1 is set has been described by the expression of the inter-channel mask.
In-channel masking: the Intra-Channel Mask (Intra-Channel Mask) here is a special Mask of the present invention. And the middle layer is mainly arranged at the division position of the neural network, and the feature map output by the middle layer is set to 0-1 according to the set mask. Since the output of the neural network intermediate layer typically contains multiple channels, the mask is primarily used to solve the problem of set in the channels, and is therefore referred to as an in-channel mask. The manner in which the mask 0-1 is set has been described by the expression of the intra-channel mask.
Referring to FIG. 3, distributed collaborative deep learning is one promising direction: it decomposes a neural network into multiple parts and performs inference in parallel on different devices, so that the computing power of the distributed devices can be used effectively. The invention further improves the efficiency of distributed collaborative inference on this basis.
As a further illustration, as shown in FIG. 6, suppose there is an image classification task trained on the neural network ResNet50. Distributed deployment first requires splitting the ResNet50 network into sub-network 1 and sub-network 2, each containing several layers of ResNet50. To prevent signal loss during the intermediate communication of the distributed deployment, the invention adopts a neural network optimization system based on hierarchical masks: a dynamic hierarchical mask is added after the last layer of sub-network 1 before training. After training, the two sub-networks are deployed on device 1 and device 2 respectively, and together they complete the whole image recognition task.
Referring to FIG. 1 and FIG. 4, the present invention provides a neural network lossy transmission optimization method based on a dynamic hierarchical mask, the method comprising the following steps:
S1, obtaining a deep neural network to be used for inference and splitting it to obtain a plurality of sub-deep neural networks, wherein, for any two adjacent sub-deep neural networks, the output of the preceding sub-deep neural network is the input of the following sub-deep neural network;
Specifically, the deep neural network to be used for inference is constructed based on the distributed deployment of a deep neural network, and it represents the distributed deep neural network; split points are set at the hidden layers of this network; and the network is split at these split points to obtain a plurality of sub-deep neural networks.
Starting from an arbitrary original neural network S (a neural network of any structure with any input) that is currently to be deployed in a distributed manner, a distributed neural network containing several hidden layers is constructed. Split points are set at certain layers, and the network is split at these points into several sub-networks S1, S2, .... Each sub-network contains several layers of the original neural network.
In this embodiment, for any two adjacent sub-neural networks, the output of the preceding sub-neural network is the input of the following one. For example, the deep neural network is divided into sub-neural network 1, sub-neural network 2, ..., sub-neural network S, where the sequence numbers follow the order of the positions of the sub-neural networks in the deep neural network; that is, sub-neural network 1 precedes sub-neural network 2, sub-neural network 2 precedes sub-neural network 3, and so on, and sub-neural network S is the last part of the deep neural network.
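As an illustrative sketch only (not part of the original embodiment), the splitting step can be realized in PyTorch by slicing a model's layers at a chosen split point; the use of ResNet50 and the particular split index below are assumptions made for illustration:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

# Hypothetical split of ResNet50 into two sub-deep neural networks.
# The split point (after layer2) is an assumption chosen for illustration.
model = resnet50(weights=None)
layers = [
    model.conv1, model.bn1, model.relu, model.maxpool,
    model.layer1, model.layer2,            # assigned to sub-network 1
    model.layer3, model.layer4,            # assigned to sub-network 2
    model.avgpool, nn.Flatten(1), model.fc,
]
split_point = 6  # number of modules kept in sub-network 1

sub_net1 = nn.Sequential(*layers[:split_point])   # to be deployed on device 1
sub_net2 = nn.Sequential(*layers[split_point:])   # to be deployed on device 2

# The output of the preceding sub-network is the input of the following one.
x = torch.randn(1, 3, 224, 224)
intermediate = sub_net1(x)       # intermediate feature map to be transmitted
output = sub_net2(intermediate)
print(intermediate.shape, output.shape)
```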
S2, introducing a dynamic layering mask at the output layer of each sub-deep neural network to obtain a plurality of sub-deep neural networks with layering masks;
S21, inserting a dynamic layering mask at the output layer of each sub-deep neural network, wherein the dynamic layering mask comprises an inter-channel mask and an intra-channel mask;
Specifically, as shown in FIG. 5, a layering mask, namely M1, M2, ..., is added after the last layer of each sub-network; the invention inserts a dynamic layering mask module at the communication point between the sub-networks to screen the intermediate feature maps transmitted over the network.
S22, obtaining the weight distribution of the output layers of the sub-deep neural networks based on the dynamic layering mask;
S23, generating dynamic layering mask values according to the weight distribution of the output layers of the sub-deep neural networks, wherein the dynamic layering mask values take into account both the weights of each sub-deep neural network output layer and the weight relations among the output layers of different sub-deep neural networks;
S24, obtaining a plurality of sub-deep neural networks with layering masks based on the dynamic layering mask values.
Specifically, in general, the output of layer m of the neural network is expressed by the following formula:

$$y_i^m = f_i^m\left(w_i^m,\, x^{m-1}\right), \qquad x^0 = I_j$$

where $y_i^m$ denotes the output of the neural network at the $i$-th position of the $m$-th layer, $f_i^m$ denotes the feature map at the $i$-th position of the $m$-th layer, $w_i^m$ denotes the neural network weights at the $i$-th position of the $m$-th layer, $x^l$ denotes the input of layer $l$, and $x^0 = I_j$ denotes the input of layer 0, i.e., the original image.
The dynamic hierarchical mask consists of two parts, an inter-channel mask and an intra-channel mask, which together describe the common loss patterns caused by noise, device failures and the like during network communication.
The generation of the inter-channel mask may be based on the following formulas:

$$M_1(p_1) \sim \mathrm{Bernoulli}(p_1)$$

$$y = M_1(p_1) \odot \sigma(w \cdot x)$$

where $p_1$ denotes the probability of setting the input data to 0, $1 - p_1$ denotes the probability of setting the input data to 1, $\sigma$ denotes the activation function, $w$ denotes the deep neural network weights, $x$ denotes the output of the sub-deep neural network, $M_1(\cdot)$ denotes the Bernoulli-distributed mask, $\odot$ denotes element-wise multiplication, and $\mathrm{Bernoulli}(\cdot)$ denotes sampling from the Bernoulli distribution.

The neural network thus sets the feature map of an entire channel to 0 with probability $p_1$ and keeps it (sets the mask to 1) with probability $1 - p_1$.
The inter-channel mask is used for distinguishing the signals of the output layers of different sub-deep neural networks, controlling information transfer and interference between the output layers of different sub-deep neural networks, and performing feature extraction and information interaction between the output layers of different sub-deep neural networks.
In this embodiment, the inter-channel mask is mainly used for processing multi-channel signals. For example, in digital television signal processing, an inter-channel mask is used to distinguish the signals of different channels and avoid interference between channels. In the context of deep neural networks, the inter-channel mask can refer to masking between different convolutional layers, used to control information transfer and interference between the layers. In particular, the inter-channel mask can be used to realize feature extraction and information interaction between different convolutional layers, thereby improving the expressive power and generalization performance of the model. In other words, the inter-channel mask is mainly used for processing multi-channel signals, and its role is to separate the signals of different channels and avoid interference between channels. In digital television transmission, multiple channel signals are typically compressed into one data stream to save bandwidth; here the inter-channel mask helps separate the individual channel signals while avoiding mutual interference, and at the receiving end it also helps expand the received data stream back into the original multiple channel signals.
The generation of the intra-channel mask may be based on the following formula:

$$M_2(\gamma) \sim \mathrm{Bernoulli}(\gamma)$$

where $M_2(\gamma)$ denotes the Bernoulli distribution parameterized by $\gamma$, $\gamma$ denotes the probability that an activation becomes the center point of a rectangular block, $F(\cdot)$ denotes the fill function, $p_2$ denotes the ratio controlling how much of the mask is set to 0, and $1 - p_2$ denotes the ratio set to 1.

Here, $M_2(\gamma)$ is the Bernoulli distribution parameterized by $\gamma$, which represents the probability that a certain activation becomes the center point of a block. $F(\cdot)$ is a fill function that fills a rectangular block of length $B$ centered on a sampled point. $p_2$ controls the proportion of the mask set to 0. With a feature map of spatial dimension $H \times W$, $\gamma$ may be calculated by the following formula:

$$\gamma = \frac{p_2}{B^2} \cdot \frac{H \cdot W}{(H - B + 1)(W - B + 1)}$$

where $B$ denotes the side length of the rectangular block and $H$, $W$ denote the spatial dimensions of the output data.
The intra-channel mask is used to suppress the weights of channels of the sub-deep neural network output layer. A corresponding intra-channel mask value is generated by calculating the weight distribution of each channel; each element in the intra-channel mask value indicates whether the weight of the corresponding channel is suppressed: if the element value is 0, the weight of the channel is suppressed, and if the element value is 1, the weight of the channel is not suppressed.
In this embodiment, each channel learns a weight distribution during the computation of the convolutional layer. To generate an intra-channel mask, the weight distribution of each channel needs to be calculated, which may be achieved by normalizing the weights of each channel; a corresponding intra-channel mask can then be generated from this distribution. Each element in the mask indicates whether the weight of the channel is suppressed. Normally, if the weight of a certain channel is small, the corresponding mask element value is 0, indicating that the weight of the channel is suppressed; otherwise, the corresponding mask element value is 1, indicating that the weight of the channel is not suppressed. The generated intra-channel mask is applied in the computation of the convolutional layer: in the output of the convolutional layer, the weights of the suppressed channels are set to 0, thereby reducing the complexity of the model, while the channels that are not suppressed keep their original weights.
The intra-channel mask is mainly used for processing a single-channel signal; it protects the integrity of the channel signal and suppresses interference from the signals of other channels. The intra-channel mask generates a specific mask signal according to the characteristics of the channel, distinguishes the frequency bands within the channel, and suppresses interference from the signals of other channels, generally using advanced signal processing algorithms and techniques such as the Fourier transform and the wavelet transform. In other words, the intra-channel mask is mainly applied in convolutional neural networks (CNNs), where it is used to suppress the weights of some channels, thereby reducing the complexity of the model and improving its generalization capability. In a deep neural network, each channel learns a weight distribution, and a corresponding intra-channel mask can be generated by calculating the weight distribution of each channel. Each element in the mask indicates whether the weight of the channel is suppressed: if the weight of a certain channel is small, the corresponding mask element value is 0, indicating that the weight of the channel is suppressed; otherwise, the element value is 1, indicating that the weight of the channel is not suppressed. Applying the intra-channel mask can effectively improve the generalization capability and robustness of the model and reduce over-fitting.
The values $p_1$ and $p_2$ in the dynamic layering mask can be taken dynamically according to the current network communication condition (T), so as to achieve self-adaptation.
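As an illustrative sketch (the thresholds, the mapping from the measured condition T to the probabilities, and the module composition are assumptions, not values from the embodiment), the adaptive choice of p1 and p2 and the combined mask module could look as follows, reusing the inter_channel_mask and intra_channel_mask helpers sketched above:

```python
import torch.nn as nn

def adapt_probabilities(packet_loss_rate: float):
    """Map a measured network condition T (here: a packet loss rate in [0, 1])
    to the mask probabilities (p1, p2). The linear mapping and caps are assumptions."""
    p1 = min(0.5, packet_loss_rate)          # channel-level loss probability
    p2 = min(0.3, 0.5 * packet_loss_rate)    # in-channel block suppression ratio
    return p1, p2

class DynamicHierarchicalMask(nn.Module):
    """Hypothetical mask module inserted after the last layer of a sub-network (FIG. 5)."""
    def __init__(self, p1: float = 0.1, p2: float = 0.1, block: int = 3):
        super().__init__()
        self.p1, self.p2, self.block = p1, p2, block

    def update(self, packet_loss_rate: float) -> None:
        self.p1, self.p2 = adapt_probabilities(packet_loss_rate)

    def forward(self, x):
        if not self.training:                 # masking is only applied during training
            return x
        x = inter_channel_mask(x, self.p1)    # helper sketched above
        x = intra_channel_mask(x, self.p2, self.block)
        return x

# Sub-network 1 with the mask appended (sub_net1 as in the splitting sketch above):
# sub_net1_with_mask = nn.Sequential(sub_net1, DynamicHierarchicalMask())
```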
S3, training the plurality of sub-deep neural networks with layering masks to obtain trained sub-deep neural networks with layering masks;
Specifically, the way in which the dynamic layering mask calculates the weight distribution of each channel is generally obtained through the learning process of the convolutional neural network. In a convolutional neural network, the weight distribution of each channel is continuously adjusted and optimized by the back-propagation algorithm and a gradient-descent optimization algorithm. During training, the convolutional neural network continuously adjusts the weight distribution of each channel according to the characteristics of the input data: gradient computation on the loss function yields a gradient value for each channel weight, which is then used to update the weights. Over multiple iterations and optimization steps, the network gradually adapts to different input data, and the weight distribution of each channel gradually approaches a relatively steady state. After training is completed, a corresponding dynamic layering mask can be generated from the weight distribution of each channel: the weights of each channel can be normalized so that the weights of different channels are comparable, and the normalized weight values are then used as the elements of the dynamic layering mask. It should be noted that the calculation method of the dynamic layering mask may vary with the neural network structure and algorithm, and the specific calculation process can be adjusted and optimized according to different application scenarios and requirements.
S4, deploying the trained sub-deep neural networks with layering masks to terminal devices to perform inference on input data, thereby obtaining output data corresponding to the input data.
Specifically, deploying the sub-deep neural networks with layering masks to terminal devices to perform inference on input data makes fast and adaptive inference and analysis of the input data possible on the terminal devices.
First, the sub-deep neural networks with layering masks are trained and packaged into an executable file or a model library, which is then deployed on the terminal devices. When input data arrives at a terminal device, it can be preprocessed (for example normalized and cleaned of outliers) and then fed into the sub-deep neural network for inference. During inference, the sub-deep neural network performs layer-by-layer feature extraction and analysis of the input data according to the preset layering mask, thereby obtaining the inference result. Because the layering mask can adaptively adjust the weight of each channel, the sub-deep neural network can better adapt to different input data, improving inference accuracy and efficiency. Deploying the sub-deep neural networks on terminal devices makes full use of the computing resources and parallel processing capability of the devices, enabling fast inference and analysis. Meanwhile, since the model size of each sub-deep neural network is usually small, transmission overhead and time delay can be reduced, improving overall inference performance.
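A simplified sketch of the deployed inference path is given below; in a real deployment the intermediate features would be serialized and sent over the network between device 1 and device 2, which is only indicated by a placeholder here (the transmission step and device assignment are assumptions):

```python
import torch

@torch.no_grad()
def infer(image: torch.Tensor) -> int:
    """End-to-end inference split across two devices (sketch)."""
    sub_net1.eval(); sub_net2.eval()

    # Device 1: compute the intermediate features of sub-network 1.
    intermediate = sub_net1(image.unsqueeze(0))

    # Placeholder for the (possibly lossy) network transmission to device 2.
    # Training with the dynamic layering mask makes sub_net2 robust to this loss.
    transmitted = intermediate

    # Device 2: complete the inference with sub-network 2.
    logits = sub_net2(transmitted)
    return int(logits.argmax(dim=1))
```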
Referring to FIG. 2, a neural network lossy transmission optimization system based on a dynamic hierarchical mask comprises:
a splitting module for obtaining a deep neural network to be used for inference and splitting it to obtain a plurality of sub-deep neural networks, wherein, for any two adjacent sub-deep neural networks, the output of the preceding sub-deep neural network is the input of the following sub-deep neural network;
an insertion module for introducing a dynamic layering mask at the output layer of each sub-deep neural network to obtain a plurality of sub-deep neural networks with layering masks;
a training module for training the plurality of sub-deep neural networks with layering masks to obtain trained sub-deep neural networks with layering masks;
and an inference module for deploying the trained sub-deep neural networks with layering masks to terminal devices to perform inference on input data, thereby obtaining output data corresponding to the input data.
The content of the method embodiment is applicable to this system embodiment; the functions specifically realized by the system embodiment are the same as those of the method embodiment, and the beneficial effects achieved are the same as those of the method embodiment.
While the preferred embodiment of the present invention has been described in detail, the invention is not limited to this embodiment, and those skilled in the art can make various equivalent modifications and substitutions without departing from the spirit of the invention; these modifications and substitutions are intended to be included within the scope defined by the appended claims.

Claims (10)

1. A neural network lossy transmission optimization method based on a dynamic hierarchical mask, characterized by comprising the following steps:
obtaining a deep neural network to be used for inference and splitting it to obtain a plurality of sub-deep neural networks, wherein, for any two adjacent sub-deep neural networks, the output of the preceding sub-deep neural network is the input of the following sub-deep neural network;
introducing a dynamic layering mask at the output layer of each sub-deep neural network to obtain a plurality of sub-deep neural networks with layering masks;
training the plurality of sub-deep neural networks with layering masks to obtain trained sub-deep neural networks with layering masks;
and deploying the trained sub-deep neural networks with layering masks to terminal devices to perform inference on input data, thereby obtaining output data corresponding to the input data.
2. The neural network lossy transmission optimization method based on a dynamic hierarchical mask according to claim 1, wherein the step of obtaining the deep neural network to be used for inference and splitting it to obtain a plurality of sub-deep neural networks specifically comprises:
constructing the deep neural network to be used for inference based on the distributed deployment of a deep neural network, wherein the deep neural network to be used for inference represents the distributed deep neural network;
setting split points based on the hidden layers of the deep neural network to be used for inference;
and splitting the deep neural network to be used for inference at the split points to obtain a plurality of sub-deep neural networks.
3. The neural network lossy transmission optimization method based on a dynamic hierarchical mask according to claim 1, wherein the step of introducing a dynamic layering mask at the output layers of the sub-deep neural networks to obtain a plurality of sub-deep neural networks with layering masks specifically comprises:
inserting a dynamic layering mask at the output layer of each sub-deep neural network, wherein the dynamic layering mask comprises an inter-channel mask and an intra-channel mask;
obtaining the weight distribution of the output layers of the sub-deep neural networks based on the dynamic layering mask;
generating dynamic layering mask values according to the weight distribution of the output layers of the sub-deep neural networks, wherein the dynamic layering mask values take into account both the weights of each sub-deep neural network output layer and the weight relations among the output layers of different sub-deep neural networks;
and obtaining a plurality of sub-deep neural networks with layering masks based on the dynamic layering mask values.
4. The neural network lossy transmission optimization method based on a dynamic hierarchical mask according to claim 3, wherein the inter-channel mask is used for distinguishing the signals of the output layers of different sub-deep neural networks, controlling information transfer and interference among the output layers of different sub-deep neural networks, and performing feature extraction and information interaction among the output layers of different sub-deep neural networks.
5. The neural network lossy transmission optimization method based on a dynamic hierarchical mask according to claim 3, wherein the intra-channel mask is used for suppressing the weights of channels of the sub-deep neural network output layer; a corresponding intra-channel mask value is generated by calculating the weight distribution of each channel, and each element in the intra-channel mask value indicates whether the weight of the corresponding channel is suppressed: if the element value is 0, the weight of the channel is suppressed, and if the element value is 1, the weight of the channel is not suppressed.
6. The neural network lossy transmission optimization method based on a dynamic hierarchical mask according to claim 3, wherein the expression of the output layer of the sub-deep neural networks is specifically as follows:

$$y_i^m = f_i^m\left(w_i^m,\, x^{m-1}\right), \qquad x^0 = I_j$$

where $y_i^m$ denotes the output of the neural network at the $i$-th position of the $m$-th layer, $f_i^m$ denotes the feature map at the $i$-th position of the $m$-th layer, $w_i^m$ denotes the neural network weights at the $i$-th position of the $m$-th layer, $x^l$ denotes the input of layer $l$, and $x^0 = I_j$ denotes the input of layer 0, i.e., the original image.
7. The neural network lossy transmission optimization method based on a dynamic hierarchical mask according to claim 3, wherein the expression of the inter-channel mask is specifically as follows:

$$M_1(p_1) \sim \mathrm{Bernoulli}(p_1)$$

$$y = M_1(p_1) \odot \sigma(w \cdot x)$$

where $p_1$ denotes the probability of setting the input data to 0, $1 - p_1$ denotes the probability of setting the input data to 1, $\sigma$ denotes the activation function, $w$ denotes the deep neural network weights, $x$ denotes the output of the sub-deep neural network, $M_1(\cdot)$ denotes the Bernoulli-distributed mask, $\odot$ denotes element-wise multiplication, and $\mathrm{Bernoulli}(\cdot)$ denotes sampling from the Bernoulli distribution.
8. The neural network lossy transmission optimization method based on a dynamic hierarchical mask according to claim 3, wherein the expression of the intra-channel mask is specifically as follows:

$$M_2(\gamma) \sim \mathrm{Bernoulli}(\gamma)$$

where $M_2(\gamma)$ denotes the Bernoulli distribution parameterized by $\gamma$, $\gamma$ denotes the probability that an activation becomes the center point of a rectangular block, $F(\cdot)$ denotes the fill function, $p_2$ denotes the ratio controlling how much of the mask is set to 0, and $1 - p_2$ denotes the ratio set to 1.
9. The neural network lossy transmission optimization method based on a dynamic hierarchical mask according to claim 8, wherein the probability that an activation becomes the center point of a rectangular block is expressed as follows:

$$\gamma = \frac{p_2}{B^2} \cdot \frac{H \cdot W}{(H - B + 1)(W - B + 1)}$$

where $B$ denotes the side length of the rectangular block and $H$, $W$ denote the spatial dimensions of the output data.
10. A neural network lossy transmission optimization system based on a dynamic hierarchical mask, characterized by comprising the following modules:
a splitting module for obtaining a deep neural network to be used for inference and splitting it to obtain a plurality of sub-deep neural networks, wherein, for any two adjacent sub-deep neural networks, the output of the preceding sub-deep neural network is the input of the following sub-deep neural network;
an insertion module for introducing a dynamic layering mask at the output layer of each sub-deep neural network to obtain a plurality of sub-deep neural networks with layering masks;
a training module for training the plurality of sub-deep neural networks with layering masks to obtain trained sub-deep neural networks with layering masks;
and an inference module for deploying the trained sub-deep neural networks with layering masks to terminal devices to perform inference on input data, thereby obtaining output data corresponding to the input data.
CN202311416736.2A 2023-10-27 2023-10-27 Neural network lossy transmission optimization method and system based on dynamic hierarchical mask Pending CN117391187A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311416736.2A CN117391187A (en) 2023-10-27 2023-10-27 Neural network lossy transmission optimization method and system based on dynamic hierarchical mask

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311416736.2A CN117391187A (en) 2023-10-27 2023-10-27 Neural network lossy transmission optimization method and system based on dynamic hierarchical mask

Publications (1)

Publication Number Publication Date
CN117391187A true CN117391187A (en) 2024-01-12

Family

ID=89469927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311416736.2A Pending CN117391187A (en) 2023-10-27 2023-10-27 Neural network lossy transmission optimization method and system based on dynamic hierarchical mask

Country Status (1)

Country Link
CN (1) CN117391187A (en)

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109543829A (en) * 2018-10-15 2019-03-29 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Method and system for hybrid deployment of deep learning neural network on terminal and cloud
CN110516065A (en) * 2019-07-12 2019-11-29 杭州电子科技大学 A method of the attention neural network based on multichannel dynamic mask
EP3582142A1 (en) * 2018-06-15 2019-12-18 Université de Liège Image classification using neural networks
CN110892424A (en) * 2017-05-23 2020-03-17 英特尔公司 Method and apparatus for discriminative semantic transfer and physical heuristic optimization of features in deep learning
CN112949840A (en) * 2021-04-20 2021-06-11 中国人民解放军国防科技大学 Channel attention guided convolutional neural network dynamic channel pruning method and device
CN113449839A (en) * 2020-03-25 2021-09-28 阿里巴巴集团控股有限公司 Distributed training method, gradient communication device and computing equipment
CN113743574A (en) * 2020-05-27 2021-12-03 辉达公司 Techniques for modifying and training neural networks
CN114254752A (en) * 2020-09-25 2022-03-29 辉达公司 Knowledge discovery using neural networks
CN115526225A (en) * 2022-07-25 2022-12-27 杭州电子科技大学 Trajectory repairing method based on sequence similarity and dynamic mask training mechanism
CN115829027A (en) * 2022-10-31 2023-03-21 广东工业大学 Comparative learning-based federated learning sparse training method and system
CN116776159A (en) * 2023-08-23 2023-09-19 苏州浪潮智能科技有限公司 Electrocardiogram labeling method, device, equipment and medium
CN116842998A (en) * 2023-05-25 2023-10-03 同济大学 Distributed optimization-based multi-FPGA collaborative training neural network method

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110892424A (en) * 2017-05-23 2020-03-17 英特尔公司 Method and apparatus for discriminative semantic transfer and physical heuristic optimization of features in deep learning
EP3582142A1 (en) * 2018-06-15 2019-12-18 Université de Liège Image classification using neural networks
CN109543829A (en) * 2018-10-15 2019-03-29 华东计算技术研究所(中国电子科技集团公司第三十二研究所) Method and system for hybrid deployment of deep learning neural network on terminal and cloud
CN110516065A (en) * 2019-07-12 2019-11-29 杭州电子科技大学 A method of the attention neural network based on multichannel dynamic mask
CN111625652A (en) * 2019-07-12 2020-09-04 杭州电子科技大学 Attention neural network method based on multi-path dynamic mask
CN113449839A (en) * 2020-03-25 2021-09-28 阿里巴巴集团控股有限公司 Distributed training method, gradient communication device and computing equipment
CN113743574A (en) * 2020-05-27 2021-12-03 辉达公司 Techniques for modifying and training neural networks
CN114254752A (en) * 2020-09-25 2022-03-29 辉达公司 Knowledge discovery using neural networks
CN112949840A (en) * 2021-04-20 2021-06-11 中国人民解放军国防科技大学 Channel attention guided convolutional neural network dynamic channel pruning method and device
CN115526225A (en) * 2022-07-25 2022-12-27 杭州电子科技大学 Trajectory repairing method based on sequence similarity and dynamic mask training mechanism
CN115829027A (en) * 2022-10-31 2023-03-21 广东工业大学 Comparative learning-based federated learning sparse training method and system
CN116842998A (en) * 2023-05-25 2023-10-03 同济大学 Distributed optimization-based multi-FPGA collaborative training neural network method
CN116776159A (en) * 2023-08-23 2023-09-19 苏州浪潮智能科技有限公司 Electrocardiogram labeling method, device, equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MELISSOURGOS, DIMITRIOS ET AL.: "Training Medical-Diagnosis Neural Networks on the Cloud with Privacy-Sensitive Patient Data from Multiple Clients", 《2020 INTERNATIONAL CONFERENCE ON EMERGING TRENDS IN INFORMATION TECHNOLOGY AND ENGINEERING》, 31 August 2022 (2022-08-31) *
JIA Feng; XUE Chanjuan; WANG Xin: "A two-stage deep learning method for lung nodule detection and classification", Journal of Jilin University (Science Edition), no. 02, 26 March 2020 (2020-03-26) *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination