CN114640356A - Big data compression method, system and storage medium based on neural network - Google Patents

Big data compression method, system and storage medium based on neural network

Info

Publication number
CN114640356A
CN114640356A CN202210351881.6A
Authority
CN
China
Prior art keywords
data
network
self
coding
neurons
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210351881.6A
Other languages
Chinese (zh)
Inventor
周杨凡
常小梅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Henan Banlong Electronic Technology Co ltd
Original Assignee
Henan Banlong Electronic Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Henan Banlong Electronic Technology Co ltd
Priority to CN202210351881.6A
Publication of CN114640356A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H03 ELECTRONIC CIRCUITRY
    • H03M CODING; DECODING; CODE CONVERSION IN GENERAL
    • H03M7/00 Conversion of a code where information is represented by a given sequence or number of digits to a code where the same, similar or subset of information is represented by a different sequence or number of digits
    • H03M7/30 Compression; Expansion; Suppression of unnecessary data, e.g. redundancy reduction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/082 Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections

Abstract

The invention discloses a big data compression method, system and storage medium based on a neural network, and relates to the field of artificial intelligence. The method mainly comprises the following steps: performing arithmetic coding on each piece of data to be compressed and obtaining an initial weight for each piece of data; constructing a self-coding network comprising an input layer, an output layer and at least one hidden layer; using the coded data as both the input and the output of the self-coding network, using the initial weight of each piece of data as the initial weight of its corresponding neuron, and training the self-coding network; pruning the self-coding network in order of increasing weight until at least one of its compression rate and accuracy falls outside the corresponding preset threshold range; and using the data corresponding to the hidden layer of the pruned self-coding network as the compressed data. Embodiments of the invention can improve the processing efficiency of big data compression.

Description

Big data compression method, system and storage medium based on neural network
Technical Field
The application relates to the field of data compression, in particular to a big data compression method, a big data compression system and a storage medium based on a neural network.
Background
Big data, also called massive data, refers to data sets so large that current mainstream software tools cannot capture, manage, process and organize them within a reasonable time to help enterprises make better business decisions.
Generally speaking, data contains a large amount of redundant information, and this characteristic is even more pronounced in big data; data compression aims to reduce this redundancy as much as possible, and a self-coding network is currently the usual tool for the task. However, the inventors of the embodiments of the present invention found that when a self-coding network is used to compress big data, the required training time is long, the self-coding network that has to be constructed is complex, which lengthens training further, and both the compression and decompression processes therefore take too long.
Disclosure of Invention
In view of the above technical problems, the present invention provides a method, a system and a storage medium for compressing big data based on a neural network, which compress the neural network model according to the distribution characteristics of the data while the data itself is being compressed, so that the data compression ratio meets the requirement while the time consumed by compression is reduced.
In a first aspect, an embodiment of the present disclosure provides a big data compression method based on a neural network, including:
Performing arithmetic coding on each piece of data to be compressed, and obtaining the initial weight of each piece of data according to the similarity between the coded data and the other coded data and the symbol types contained in the data.
Constructing a self-coding network, wherein the self-coding network comprises an input layer, an output layer and at least one hidden layer, and the input layer and the output layer have the same number of neurons, which is greater than the number of neurons in the hidden layer.
Using the coded data as both the input and the output of the self-coding network, using the initial weight of each piece of data as the initial weight of the neuron corresponding to that data, and training the self-coding network to obtain the weight matrix of the self-coding network.
Performing network pruning on the self-coding network by using the degree of influence of each neuron on the entropy of the weight matrix as the importance value of that neuron, and deleting redundant neurons and connections during pruning according to the importance values of the neurons and the weight values of the connections in the weight matrix, until at least one of the compression rate and the accuracy of the self-coding network falls outside its corresponding preset threshold range.
Using the data corresponding to the hidden layer of the pruned self-coding network as the compressed data.
In one possible embodiment, performing network pruning on the self-coding network and deleting redundant neurons and connections during pruning according to the importance values of the neurons and the weight values of the connections in the weight matrix comprises:
Deleting the connection corresponding to the smallest weight value in the weight matrix of the self-coding network, and retraining the self-coding network after the connection is deleted.
When the input-connection sets of several neurons are all contained in the input-connection set of some neuron, deleting the neuron with the smallest importance value among those neurons, and retraining the self-coding network after the neuron is deleted.
Deleting any neuron in the trained self-coding network that has no input connection or no output connection, and retraining the self-coding network.
In one possible embodiment, using the degree of influence of a neuron in the self-coding network on the entropy of the weight matrix as the importance value of the neuron comprises the following steps:
Taking the entropy of the weight matrix of the self-coding network while the neuron is present as a first entropy value, taking the entropy of the weight matrix obtained after setting the values corresponding to the neuron in the weight matrix to 0 as a second entropy value, and taking the absolute value of the difference between the first entropy value and the second entropy value as the degree of influence of the neuron on the entropy of the weight matrix.
In a possible embodiment, obtaining the initial weight of each piece of data according to the similarity between the coded data and the other coded data and the symbol types contained in the data comprises:
Obtaining the similarity of each piece of coded data as the mean of its similarities to the other coded data.
Counting all the symbol types contained in all the data to obtain the symbol types of all the data, and taking the ratio of the number of symbol types contained in each piece of data to the number of symbol types of all the data as the symbol ratio of that data.
Taking the ratio of the symbol ratio of a piece of data to its similarity as the initial weight of that data.
In a possible embodiment, before arithmetic coding is performed on each piece of data to be compressed, the method further comprises processing the abnormal symbols in each piece of data.
In one possible embodiment, the abnormal symbols in each piece of data are processed by means of a box plot.
In one possible embodiment, the similarity between one piece of coded data and another is obtained as follows:
[Formula image: s_ij expressed in terms of a_i, a_j, b_i and b_j]
wherein s_ij is the similarity between the coded i-th data and the coded j-th data, a_i is the number of decimal places of the coded i-th data, a_j is the number of decimal places of the coded j-th data, b_i is the value of the coded i-th data, b_j is the value of the coded j-th data, i and j are positive integers not greater than n, i ≠ j, and n is the number of pieces of coded data.
In a second aspect, an embodiment of the present invention provides a big data compression system based on a neural network, including:
and the arithmetic coding module is used for carrying out arithmetic coding on each data to be compressed.
And the initial weight acquisition module is used for respectively acquiring the initial weight of each datum according to the similarity between the encoded datum and other encoded data and the symbol type contained in the datum.
The self-coding network construction module is used for constructing a self-coding network, the self-coding network comprises an input layer, an output layer and at least one hidden layer, and the number of neurons of the input layer and the output layer is the same and is greater than that of the neurons of the hidden layer.
And the weight matrix acquisition module is used for simultaneously taking the coded data as the input and the output of the self-coding network, taking the initial weight of each data as the initial weight of the neuron corresponding to each data, and training the self-coding network to obtain the weight matrix of the coding network.
And the network pruning module is used for performing network pruning on the self-coding network by taking the influence degree of the neurons in the self-coding network on the entropy of the weight matrix as the importance values of the neurons, and deleting redundant neurons and connections according to the importance values of the neurons and the weight values connected in the weight matrix in the network pruning process until at least one of the compression rate and the accuracy rate of the self-coding network is out of the corresponding preset threshold range.
And the compressed data acquisition module is used for taking the data corresponding to the hidden layer in the self-coding network after the network pruning as the compressed data.
In a third aspect, an embodiment of the present invention provides a storage medium for big data compression based on a neural network, where the storage medium stores a program that is capable of being loaded and executed by a processor to implement a big data compression method based on a neural network as in the embodiment of the present invention.
Compared with the prior art, the embodiment of the invention has the beneficial effects that at least: the neural network model is compressed by combining the distribution characteristics of the data while the data is compressed, so that the time consumed by compression is reduced while the data compression ratio meets the requirement.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
Fig. 1 is a schematic flow diagram of a neural network-based big data compression method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of a self-coding network in an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a neural network-based big data compression system according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
The terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature; in the description of the present embodiment, "a plurality" means two or more unless otherwise specified.
Generally speaking, a large amount of redundant information exists in data, the redundant information in large data is more, and data compression is to reduce the data redundancy as much as possible. The traditional data compression method usually aims at reducing redundancy in data and focuses on coding design, but the purpose of large data compression is to compress the data to obtain smaller data volume, and decompress the data on the other side through transmission, and if the structure of a neural network is too complicated, namely the quantity of parameters is more, the compression and decompression time is longer.
The embodiment of the invention provides a big data compression method based on a neural network, as shown in fig. 1, comprising the following steps:
step S101, performing arithmetic coding on each data to be compressed, and respectively obtaining initial weight of each data according to similarity between the coded data and other coded data and symbol types contained in the data.
Step S102, a self-coding network is constructed, the self-coding network comprises an input layer, an output layer and at least one hidden layer, and the number of neurons of the input layer and the output layer is the same and is larger than that of the neurons of the hidden layer.
And step S103, taking the coded data as input and output of the self-coding network at the same time, taking the initial weight of each data as the initial weight of the neuron corresponding to each data, and training the self-coding network to obtain a weight matrix of the coding network.
And S104, taking the influence degree of the neurons in the self-coding network on the entropy of the weight matrix as the importance value of the neurons, carrying out network pruning on the self-coding network, and deleting redundant neurons and connections according to the importance value of the neurons and the weight value of the connections in the network pruning process until at least one of the compression rate and the accuracy rate of the self-coding network is out of the corresponding preset threshold range.
And S105, taking the data corresponding to the hidden layer in the self-coding network after the network pruning as compressed data.
The main purposes of the invention are: and constructing a self-coding network and compressing the self-coding network according to the characteristics of the data to obtain the self-coding network with pertinence and the compressed carding, and simultaneously facilitating the follow-up utilization of the self-coding network to perform a more efficient big data compression process.
Further, step S101 performs arithmetic coding on each piece of data to be compressed and obtains the initial weight of each piece of data according to the similarity between the coded data and the other coded data and the symbol types contained in the data. This specifically comprises the following:
Arithmetic coding is a form of entropy coding; as a lossless compression method its compression ratio is only about 2:1 to 5:1, which by itself clearly cannot meet the requirements of big data coding. The result of arithmetic coding is an interval, and a number chosen arbitrarily from this interval serves as the coding result for the symbol sequence.
Each piece of data contains at least one symbol. For example, when there are five different symbols, different pieces of data contain different subsets of these five symbols, and the width assigned to each symbol during coding represents its frequency: the higher the frequency, the wider the coded interval. Each piece of data is coded separately using the arithmetic coding process to obtain the coded data.
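As an illustration of this step, the following is a minimal Python sketch of a float-based arithmetic coder in which each symbol's interval width equals its relative frequency; the fixed frequency model and the lack of precision handling are simplifying assumptions, not features of the embodiment.

from collections import Counter

def arithmetic_encode(symbols):
    # Build per-symbol probability intervals; a more frequent symbol gets a wider interval.
    freq = Counter(symbols)
    total = len(symbols)
    ranges, cum = {}, 0.0
    for sym in sorted(freq):
        p = freq[sym] / total
        ranges[sym] = (cum, cum + p)
        cum += p
    # Narrow the working interval symbol by symbol.
    low, high = 0.0, 1.0
    for sym in symbols:
        span = high - low
        lo, hi = ranges[sym]
        low, high = low + span * lo, low + span * hi
    # Any number inside [low, high) identifies the sequence; the midpoint is returned here.
    return (low + high) / 2.0, ranges

code_value, model = arithmetic_encode("AABAC")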
As noted above, big data refers to data sets so large that current mainstream software tools cannot capture, manage, process and organize them within a reasonable time to help enterprises make better business decisions.
Because the scale of big data is huge, the number of neurons required in the input and output layers of the self-coding network is large; when the number of input neurons is large, training the self-coding network takes a long time, and many layers are needed to achieve a good coding result, which lengthens training further and makes the data compression process inefficient.
Optionally, before arithmetic coding is performed on each piece of data to be compressed, abnormal values may first be removed from each piece of data. It should be noted that abnormal values are unreasonable values present in the data, where unreasonable means deviating from the normal range rather than simply being recorded in error. Outliers in a data set may be caused by sensor failures, human logging errors or abnormal events. Removing them yields more accurate data, effectively reduces the workload of subsequent processing, and further shortens the training time of the self-coding network.
One method for handling abnormal values is to judge whether a specific value lies within a reasonable range according to the maximum and minimum values; screening and handling of abnormal values can be implemented with a box plot.
Specifically, a box plot can be used to observe the overall distribution of the data, described by statistics such as the median, the 25% quantile, the 75% quantile and the upper and lower bounds. From these statistics a box plot is generated: the box contains most of the normal data, and the values or symbols lying outside the upper and lower bounds of the box are the outliers in the data.
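A minimal sketch of this box-plot screening is given below; the 1.5 × IQR whisker rule is a common convention assumed here, as the description does not fix the whisker factor.

import numpy as np

def remove_outliers_boxplot(values, k=1.5):
    # Quartiles define the box; values beyond the whiskers are treated as outliers.
    values = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values >= lower) & (values <= upper)]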
First, the similarity of each piece of coded data is obtained as the mean of its similarities to the other coded data. The similarity of any two different pieces of coded data is calculated as follows:
the similarity of the coding results is calculated from the numbers of decimal places of the coding results and the numerical difference between them,
[Formula image: s_ij expressed in terms of a_i, a_j, b_i and b_j]
wherein s_ij is the similarity between the coded i-th data and the coded j-th data, a_i is the number of decimal places of the coded i-th data, a_j is the number of decimal places of the coded j-th data, b_i is the value of the coded i-th data, b_j is the value of the coded j-th data, i and j are positive integers not greater than n, i ≠ j, and n is the number of pieces of coded data.
It should be noted that the greater the similarity between pieces of coded data, the more likely it is that one can be replaced by another; greater similarity means greater redundancy, because the information in such data can be represented by other data or by a combination of other data, so its importance is lower. Conversely, the lower the similarity, the higher the importance.
Then, all the symbol types contained in all the data are counted to obtain the symbol types of all the data, and the ratio of the number of symbol types contained in each piece of data to the number of symbol types of all the data is taken as the symbol ratio of that data.
It should be noted that the larger the ratio of the number of symbols contained in a piece of data to the number of all symbols, the less replaceable that data is, i.e. the more important it is.
Finally, the ratio of the symbol ratio of a piece of data to its similarity is taken as the initial weight of that data.
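The following sketch combines the two quantities into an initial weight per datum. The pairwise similarity function is passed in as a parameter because its exact expression is given only as an image above; everything else follows the description.

import numpy as np

def initial_weights(symbol_sets, similarity, n):
    # symbol_sets[i]: set of symbol types contained in the i-th datum.
    # similarity(i, j): similarity of the coded i-th and j-th data (assumed given).
    all_symbols = set().union(*symbol_sets)
    weights = []
    for i in range(n):
        symbol_ratio = len(symbol_sets[i]) / len(all_symbols)              # share of all symbol types
        mean_sim = np.mean([similarity(i, j) for j in range(n) if j != i])
        weights.append(symbol_ratio / mean_sim)                            # weight = symbol ratio / similarity
    return np.array(weights)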
Further, step S102 constructs a self-coding network, where the self-coding network comprises an input layer, an output layer and at least one hidden layer, and the input layer and the output layer have the same number of neurons, which is greater than the number of neurons in the hidden layer. This specifically comprises the following:
It should be noted that a self-coding network is a special kind of neural network in which the number of neurons in the hidden layer is smaller than the number of input neurons and the output of the network is the same as its input; in the embodiment of the present invention, the hidden layer of the self-coding network is used as the compressed result.
Self-coding networks are commonly used for data compression and generally need a deep network structure to obtain a good compression effect; however, a deeper structure lengthens the training time when compressing big data. The embodiment of the invention therefore calculates the weights of the neurons in the self-coding network and prunes the network according to the weights of the input neurons, removing redundant neurons and nodes and obtaining a more targeted self-coding network, thereby improving the efficiency of data compression.
Fig. 2 is a schematic structural diagram of a self-coding network according to an embodiment of the present invention. As shown in Fig. 2, the self-coding network comprises an input layer, an output layer and at least one hidden layer (also called an intermediate layer); the input layer and the output layer have the same number of neurons, which is greater than the number of neurons in the hidden layer. This yields the initial self-coding network and facilitates the subsequent training and pruning processes.
Further, step S103 uses the coded data as both the input and the output of the self-coding network, uses the initial weight of each piece of data as the initial weight of the neuron corresponding to that data, and trains the self-coding network to obtain its weight matrix.
The purpose of using the coded data as both the input and the output of the self-coding network is to check the accuracy of the data obtained by decompression after processing by the self-coding network.
Secondly, using the initial weight of each piece of data as the initial weight of the neuron corresponding to that data when training the self-coding network facilitates the subsequent network pruning, simplifies the network structure of the self-coding network and improves processing efficiency.
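For concreteness, a minimal PyTorch sketch of constructing and training such a self-coding network with a single hidden layer follows; the single hidden layer, the ReLU activation, the MSE loss and the Adam optimizer are illustrative choices rather than requirements of the embodiment.

import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    # Input and output layers have the same width n_in, which is larger than the hidden width n_hidden.
    def __init__(self, n_in, n_hidden):
        super().__init__()
        self.encoder = nn.Linear(n_in, n_hidden)
        self.decoder = nn.Linear(n_hidden, n_in)

    def forward(self, x):
        code = torch.relu(self.encoder(x))        # hidden-layer output: the compressed representation
        return self.decoder(code), code

def train_autoencoder(model, x, epochs=200, lr=1e-3):
    # The coded data x serve as both input and training target.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        reconstruction, _ = model(x)
        loss_fn(reconstruction, x).backward()
        opt.step()
    return model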
Further, step S104 uses the degree of influence of each neuron in the self-coding network on the entropy of the weight matrix as the importance value of that neuron, performs network pruning on the self-coding network, and deletes redundant neurons and connections during pruning according to the importance values of the neurons and the weight values of the connections in the weight matrix, until at least one of the compression rate and the accuracy of the self-coding network falls outside its corresponding preset threshold range.
First, the degree of influence of each neuron in the self-coding network on the entropy of the weight matrix is taken as the importance value of that neuron.
Specifically, obtaining the degree of influence of a neuron on the entropy of the weight matrix comprises: taking the entropy of the weight matrix of the self-coding network while the neuron is present as a first entropy value, taking the entropy of the weight matrix obtained after setting the values corresponding to the neuron in the weight matrix to 0 as a second entropy value, and taking the absolute value of the difference between the first entropy value and the second entropy value as the degree of influence of the neuron on the entropy of the weight matrix.
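A sketch of this importance measure is shown below. The description does not state how the entropy of the weight matrix is estimated; a histogram estimate over the weight values is assumed here, and a neuron's "corresponding values" are taken to be its row of the matrix.

import numpy as np

def weight_entropy(w, bins=32):
    # Shannon entropy of the distribution of weight values (histogram estimate).
    hist, _ = np.histogram(w, bins=bins)
    p = hist[hist > 0] / hist.sum()
    return float(-np.sum(p * np.log2(p)))

def neuron_importance(w, neuron_idx):
    first = weight_entropy(w)                 # first entropy value: neuron present
    w_zeroed = w.copy()
    w_zeroed[neuron_idx, :] = 0.0             # set the neuron's values in the matrix to 0
    second = weight_entropy(w_zeroed)         # second entropy value: neuron removed
    return abs(first - second)                # importance = |first entropy - second entropy|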
Secondly, network pruning is performed on the self-coding network, and redundant neurons and connections are deleted during pruning according to the importance values of the neurons and the weight values of the connections in the weight matrix, until at least one of the compression rate and the accuracy of the self-coding network falls outside its corresponding preset threshold range.
Specifically, deleting redundant neurons and connections during pruning according to the importance values of the neurons and the weight values of the connections in the weight matrix includes:
deleting the connection corresponding to the smallest weight value in the weight matrix of the self-coding network, and retraining the self-coding network after the connection is deleted; when the input-connection sets of several neurons are all contained in the input-connection set of some neuron, deleting the neuron with the smallest importance value among those neurons, and retraining the self-coding network after the neuron is deleted; deleting any neuron in the trained self-coding network that has no input connection or no output connection, and retraining the self-coding network. Connection or neuron deletion followed by retraining is repeated until at least one of the compression rate and the accuracy of the self-coding network falls outside its corresponding preset threshold range.
It should be noted that unimportant connections or neurons are pruned in order of increasing weight value; after a connection has been pruned, if some neuron is left without any input connection or output connection, that neuron no longer contributes to the model and can be deleted directly.
Meanwhile, the compression accuracy can be obtained by comparing the output of the self-coding network with its input. When the compression accuracy falls outside the preset accuracy threshold range, the previously deleted input neurons need to be restored to ensure accuracy; similarly, when the compression rate of the self-coding network falls outside the preset compression-rate threshold range, the previously deleted input neurons also need to be restored, and the neural network is then retrained to improve the accuracy of the pruned model. The pruning and retraining process is repeated until at least one of the compression rate and the accuracy of the network no longer satisfies its corresponding threshold range, and the network from before the last neuron deletion is selected as the trained self-coding network.
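The prune-and-retrain loop can be summarised as in the sketch below; the deletion, retraining and evaluation steps are passed in as functions, and the two threshold values (and the direction of the compression-rate test) are illustrative assumptions rather than figures from the embodiment.

def prune_network(network, delete_step, retrain, evaluate,
                  accuracy_threshold=0.95, compression_threshold=0.5):
    # delete_step(net): removes the smallest-weight connection and any neuron
    #                   left without input or output connections.
    # retrain(net):     retrains the pruned network.
    # evaluate(net):    returns (accuracy, compression_rate) of the current network.
    previous = network
    while True:
        candidate = retrain(delete_step(previous))
        accuracy, compression_rate = evaluate(candidate)
        # Stop as soon as accuracy or compression rate leaves its preset range.
        if accuracy < accuracy_threshold or compression_rate < compression_threshold:
            return previous          # keep the network from before the last deletion
        previous = candidate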
Further, step S105 uses the data corresponding to the hidden layer of the pruned self-coding network as the compressed data.
After the pruned self-coding network has been obtained, the data corresponding to its hidden layer is transmitted as the compressed data.
Based on the same inventive concept as the above method, the present embodiment further provides a big data compression system based on a neural network, as shown in fig. 3, including:
and an arithmetic coding module 201, configured to perform arithmetic coding on each data to be compressed.
An initial weight obtaining module 202, configured to obtain initial weights of the data according to similarities between the encoded data and other encoded data and types of symbols included in the data.
The self-coding network construction module 203 is configured to construct a self-coding network, where the self-coding network includes an input layer, an output layer, and at least one hidden layer, and the numbers of neurons in the input layer and the output layer are the same and greater than the number of neurons in the hidden layer.
The weight matrix obtaining module 204 is configured to use the encoded data as input and output of the self-encoding network at the same time, use the initial weight of each data as the initial weight of the neuron corresponding to each data, and train the self-encoding network to obtain a weight matrix of the encoding network.
The network pruning module 205 is configured to perform network pruning on the self-coding network by using the degree of influence of the neurons in the self-coding network on the entropy of the weight matrix as an importance value of the neurons, and delete redundant neurons and connections in the network pruning process according to the importance value of the neurons and a weight value connected in the weight matrix until at least one of the compression rate and the accuracy of the self-coding network is outside a corresponding preset threshold range.
And the compressed data acquisition module 206 is configured to take data corresponding to the hidden layer in the self-coding network after the network pruning as compressed data.
The embodiment of the present invention further provides a storage medium for big data compression based on a neural network, where the storage medium stores computer instructions, and the instructions are executed to satisfy the big data compression method based on a neural network described in any of the above embodiments.
In summary, embodiments of the present invention provide a method, a system, and a storage medium for compressing big data based on a neural network, where a neural network model is also compressed in combination with a distribution characteristic of the data while compressing the data, so that a data compression ratio meets a requirement and a time consumed by compression is reduced.
The words "including", "comprising", "having" and the like used in this disclosure are open-ended terms meaning "including but not limited to", and are used interchangeably with that phrase. The word "or" as used herein means, and is used interchangeably with, "and/or", unless the context clearly dictates otherwise. The words "such as" are used herein to mean, and are used interchangeably with, "such as but not limited to".
It should also be noted that the various components or steps may be broken down and/or re-combined in the methods and systems of the present invention. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.
The above-mentioned embodiments are merely examples for clearly illustrating the present invention and do not limit the scope of the present invention. Other variations and modifications in the above description will occur to those skilled in the art and are not necessarily exhaustive of all embodiments. All designs identical or similar to the present invention are within the scope of the present invention.

Claims (9)

1. A big data compression method based on a neural network is characterized by comprising the following steps:
performing arithmetic coding on each piece of data to be compressed, and obtaining the initial weight of each piece of data according to the similarity between the coded data and the other coded data and the symbol types contained in the data;
constructing a self-coding network, wherein the self-coding network comprises an input layer, an output layer and at least one hidden layer, and the input layer and the output layer have the same number of neurons, which is greater than the number of neurons in the hidden layer;
using the coded data as both the input and the output of the self-coding network, using the initial weight of each piece of data as the initial weight of the neuron corresponding to that data, and training the self-coding network to obtain the weight matrix of the self-coding network;
using the degree of influence of each neuron in the self-coding network on the entropy of the weight matrix as the importance value of that neuron, performing network pruning on the self-coding network, and deleting redundant neurons and connections during pruning according to the importance values of the neurons and the weight values of the connections in the weight matrix, until at least one of the compression rate and the accuracy of the self-coding network falls outside its corresponding preset threshold range;
and using the data corresponding to the hidden layer of the pruned self-coding network as the compressed data.
2. The big data compression method based on the neural network as claimed in claim 1, wherein performing network pruning on the self-coding network and deleting redundant neurons and connections during pruning according to the importance values of the neurons and the weight values of the connections in the weight matrix comprises:
deleting the connection corresponding to the smallest weight value in the weight matrix of the self-coding network, and retraining the self-coding network after the connection is deleted;
when the input-connection sets of several neurons are all contained in the input-connection set of some neuron, deleting the neuron with the smallest importance value among those neurons, and retraining the self-coding network after the neuron is deleted;
deleting any neuron in the trained self-coding network that has no input connection or no output connection, and retraining the self-coding network.
3. The neural network-based big data compression method according to claim 1, wherein using the degree of influence of a neuron in the self-coding network on the entropy of the weight matrix as the importance value of the neuron comprises the following steps:
taking the entropy of the weight matrix of the self-coding network while the neuron is present as a first entropy value, taking the entropy of the weight matrix obtained after setting the values corresponding to the neuron in the weight matrix to 0 as a second entropy value, and taking the absolute value of the difference between the first entropy value and the second entropy value as the degree of influence of the neuron on the entropy of the weight matrix.
4. The big data compression method based on the neural network as claimed in claim 1, wherein obtaining the initial weight of each piece of data according to the similarity between the coded data and the other coded data and the symbol types contained in the data comprises:
obtaining the similarity of each piece of coded data as the mean of its similarities to the other coded data;
counting all the symbol types contained in all the data to obtain the symbol types of all the data, and taking the ratio of the number of symbol types contained in each piece of data to the number of symbol types of all the data as the symbol ratio of that data;
and taking the ratio of the symbol ratio of a piece of data to its similarity as the initial weight of that data.
5. The neural network-based big data compression method according to claim 1, wherein, before arithmetic coding is performed on each piece of data to be compressed, the method further comprises processing the abnormal symbols in each piece of data.
6. The neural network-based big data compression method as claimed in claim 4, wherein the processing of the abnormal symbols in each piece of data is implemented by means of a box plot.
7. The big data compression method based on the neural network as claimed in claim 3, wherein the similarity between one piece of coded data and another is obtained as follows:
[Formula image: s_ij expressed in terms of a_i, a_j, b_i and b_j]
wherein s_ij is the similarity between the coded i-th data and the coded j-th data, a_i is the number of decimal places of the coded i-th data, a_j is the number of decimal places of the coded j-th data, b_i is the value of the coded i-th data, b_j is the value of the coded j-th data, i and j are positive integers not greater than n, i ≠ j, and n is the number of pieces of coded data.
8. A neural network-based big data compression system, comprising:
an arithmetic coding module, configured to perform arithmetic coding on each piece of data to be compressed;
an initial weight acquisition module, configured to obtain the initial weight of each piece of data according to the similarity between the coded data and the other coded data and the symbol types contained in the data;
a self-coding network construction module, configured to construct a self-coding network, wherein the self-coding network comprises an input layer, an output layer and at least one hidden layer, and the input layer and the output layer have the same number of neurons, which is greater than the number of neurons in the hidden layer;
a weight matrix acquisition module, configured to use the coded data as both the input and the output of the self-coding network, use the initial weight of each piece of data as the initial weight of the neuron corresponding to that data, and train the self-coding network to obtain the weight matrix of the self-coding network;
a network pruning module, configured to perform network pruning on the self-coding network by using the degree of influence of each neuron on the entropy of the weight matrix as the importance value of that neuron, and to delete redundant neurons and connections during pruning according to the importance values of the neurons and the weight values of the connections in the weight matrix, until at least one of the compression rate and the accuracy of the self-coding network falls outside its corresponding preset threshold range;
and a compressed data acquisition module, configured to use the data corresponding to the hidden layer of the pruned self-coding network as the compressed data.
9. A computer-readable storage medium storing a program which can be loaded by a processor and which, when executed, implements the neural network-based big data compression method according to any one of claims 1 to 7.
CN202210351881.6A 2022-04-02 2022-04-02 Big data compression method, system and storage medium based on neural network Pending CN114640356A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210351881.6A CN114640356A (en) 2022-04-02 2022-04-02 Big data compression method, system and storage medium based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210351881.6A CN114640356A (en) 2022-04-02 2022-04-02 Big data compression method, system and storage medium based on neural network

Publications (1)

Publication Number Publication Date
CN114640356A (en) 2022-06-17

Family

ID=81952288

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210351881.6A Pending CN114640356A (en) 2022-04-02 2022-04-02 Big data compression method, system and storage medium based on neural network

Country Status (1)

Country Link
CN (1) CN114640356A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114864108A (en) * 2022-07-05 2022-08-05 深圳市圆道妙医科技有限公司 Processing method and processing system for syndrome and prescription matching data
CN115359497A (en) * 2022-10-14 2022-11-18 景臣科技(南通)有限公司 Call center monitoring alarm method and system
CN115359497B (en) * 2022-10-14 2023-03-24 景臣科技(南通)有限公司 Call center monitoring alarm method and system
CN115632660A (en) * 2022-12-22 2023-01-20 山东海量信息技术研究院 Data compression method, device, equipment and medium

Similar Documents

Publication Publication Date Title
CN114640356A (en) Big data compression method, system and storage medium based on neural network
CN112714032B (en) Wireless network protocol knowledge graph construction analysis method, system, equipment and medium
CN108764273A (en) A kind of method, apparatus of data processing, terminal device and storage medium
CN115801901B (en) Enterprise production emission data compression processing method
CN111126595A (en) Method and equipment for model compression of neural network
CN112149797A (en) Neural network structure optimization method and device and electronic equipment
CN113258935A (en) Communication compression method based on model weight distribution in federated learning
CN117078048A (en) Digital twinning-based intelligent city resource management method and system
CN110188877A (en) A kind of neural network compression method and device
CN113076196A (en) Cloud computing host load prediction method combining attention mechanism and gated cycle unit
CN113328755A (en) Compressed data transmission method facing edge calculation
CN113111889A (en) Target detection network processing method for edge computing terminal
CN112215398A (en) Power consumer load prediction model establishing method, device, equipment and storage medium
CN115086301A (en) Data analysis system and method for compression uploading equalization
CN117290364B (en) Intelligent market investigation data storage method
CN113609126B (en) Integrated storage management method and system for multi-source space-time data
CN116307709A (en) Comprehensive assessment method and system for flood control capacity of transformer substation based on information gain fusion
CN116185797A (en) Method, device and storage medium for predicting server resource saturation
CN114202077A (en) Machine learning model compression method based on federal learning and mean value iteration
CN114254828A (en) Power load prediction method based on hybrid convolution feature extractor and GRU
CN114640357B (en) Data encoding method, apparatus and storage medium
CN112488805A (en) Long-renting market early warning method based on multiple regression time series analysis
CN111723420A (en) Structural topology optimization method based on deep learning
JPWO2022166199A5 (en)
CN111275184B (en) Method, system, device and storage medium for realizing neural network compression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination