CN117113383B - Privacy protection method and system for local production data of equipment

Info

Publication number: CN117113383B
Application number: CN202311352321.3A
Authority: CN
Inventors: 朱军; 武茂国; 朱立豪; 张中华; 邢亚
Original assignee: Shandong Wanshi Mechanical Technology Co ltd
Current assignee: Shandong Wanshi Mechanical Technology Co ltd
Priority date: 2023-10-19
Filing date: 2023-10-19
Publication date: 2024-01-26
Anticipated expiration: 2043-10-19
Also published as: CN117113383A

Abstract

The invention relates to the technical field of data encryption, in particular to a method and a system for protecting privacy of local production data of equipment. The method comprises the following steps: acquiring production data, traversing the production data by using windows with different lengths to acquire repeated character strings, and replacing the repeated character strings by using preset characters to acquire data to be tested; determining a character quantity influence coefficient according to the quantity difference of characters in the data to be tested of two windows with adjacent lengths and the quantity of characters in the data to be tested of the window with smaller length; determining the frequency discrete degree and the distribution confusion degree according to the frequency of different characters in any window and the total number of the characters in the window; screening candidate windows from windows of different lengths; and determining an optimal window according to the character quantity influence coefficient, the frequency discrete degree and the distribution confusion degree, and encrypting the data to be tested of the optimal window to obtain encrypted data. The invention improves the operation control effect of the memory while improving the data encryption security.

Description

Privacy protection method and system for local production data of equipment

Technical Field

The invention relates to the technical field of data encryption, in particular to a method and a system for protecting privacy of local production data of equipment.

Background

Controlled data encryption can effectively reduce the time and cost of data transmission and processing. Compressing the data in a memory before encrypting it reduces the risk of data leakage and thereby protects user privacy. Controlling how data is compressed and encrypted is therefore important for saving storage space, improving data transmission and processing efficiency, and increasing data security. The operation control of a memory depends on the amount of stored data: when the amount of data is large, the operation control effect of the memory is poor. Adaptive compression and encryption of data is therefore very important when handling large volumes of local production data.

In the related art, data is compressed and encrypted directly by arithmetic coding control software. Because production data is usually log data, the data volume is huge and contains much repeated data. The compressed and encrypted data therefore still occupies considerable storage resources, and the operation control effect of the memory is poor.

Disclosure of Invention

In order to solve the technical problems of relatively poor data encryption effect, relatively large memory resource occupation and relatively poor operation control effect of a memory, the invention provides a method and a system for protecting privacy of local production data of equipment, which adopts the following technical scheme:

in one aspect, the present invention provides a method for protecting privacy of local production data of a device, where the method includes:

acquiring production data in the local production process of equipment, sequentially traversing the production data by using windows with different lengths, counting repeated character strings corresponding to the windows with different lengths, and replacing the repeated character strings by using preset characters to obtain data to be tested;

taking a window with a larger length of two windows with adjacent lengths as a first window and a window with a smaller length as a second window, and determining a character quantity influence coefficient of the first window according to the quantity difference of characters in data to be detected corresponding to the first window and the second window and the quantity of characters in the data to be detected corresponding to the second window;

determining the frequency discrete degree of the characters in the data to be tested corresponding to the window according to the frequency of different characters in any window, and determining the distribution confusion degree of the characters in the data to be tested according to the frequency of different characters and the total number of the characters in the window; screening candidate windows from the windows with different lengths according to the minimum value of the frequency of each character in the data to be tested corresponding to the production data and the windows with different lengths;

according to the character quantity influence coefficient, the frequency discrete degree and the distribution confusion degree of the same candidate window, determining a preference coefficient of the candidate window, determining an optimal window according to the preference coefficient, performing arithmetic coding on the data to be tested of the optimal window, and encrypting the coded data to obtain encrypted data.

Further, the sequentially traversing the production data by using windows with different lengths, counting repeated character strings corresponding to the windows with different lengths, including:

traversing the production data by using a window with a first length, and counting character strings with identical character arrangement in the production data as repeated character strings with the first length;

taking the first length minus 1 as a second length, traversing by using a window of the second length in the production data except the repeated character strings of the first length, and obtaining the repeated character strings of the second length;

sequentially decreasing the window length, and respectively carrying out iteration of repeated character string recognition in the rest production data until the iteration is completed when the window length is 2;

and taking the repeated character strings with all the counted lengths after the iteration is completed as repeated character strings corresponding to the window with the first length.

Further, the determining the character quantity influence coefficient of the first window according to the difference of the quantity of the characters in the data to be tested corresponding to the first window and the second window and the quantity of the characters in the data to be tested corresponding to the second window includes:

taking the absolute value of the difference between the number of characters in the data to be tested corresponding to the first window and the number of characters in the data to be tested corresponding to the second window as the difference between the number of characters in the first window and the number of characters in the second window;

and taking the ratio normalized value of the character quantity difference and the quantity of characters in the data to be tested corresponding to the second window as a character quantity influence coefficient of the first window.

Further, the determining the frequency discrete degree of the characters in the data to be tested corresponding to the window according to the frequency of different characters in any window includes:

calculating the average value of the character frequency in the same window as a character average value;

based on a standard deviation calculation formula, calculating the standard deviation of the frequency of the character corresponding to the window according to the frequency of the different characters, the character mean value and the type number of the characters, and taking the normalized value of the standard deviation as the frequency discrete degree of the character.

Further, the determining the degree of confusion of the distribution of the characters in the data to be tested according to the frequency of different characters and the total number of the characters in the window includes:

calculating, for each different character, the ratio of its frequency (number of occurrences) to the total number of characters in the window as the relative frequency of the corresponding character;

based on an information entropy formula, calculating according to the frequency of all characters to obtain information entropy of character distribution in the data to be tested, and carrying out normalization processing on the information entropy to obtain the distribution confusion degree.

Further, the screening candidate windows from the windows with different lengths according to the minimum value of the frequency of each character in the data to be tested corresponding to the production data and the windows with different lengths includes:

and taking a window with the minimum value of the character frequency in the data to be tested being greater than or equal to the minimum value of the character frequency in the production data as a candidate window.

Further, the determining the preference coefficient of the candidate window according to the character quantity influence coefficient, the frequency discrete degree and the distribution confusion degree of the same candidate window includes:

determining a character distribution influence coefficient according to the frequency discrete degree and the distribution confusion degree, wherein the frequency discrete degree and the character distribution influence coefficient are in positive correlation, the distribution confusion degree and the character distribution influence coefficient are in positive correlation, and the value of the character distribution influence coefficient is a normalized numerical value;

and calculating the product of the character quantity influence coefficient and the character distribution influence coefficient as the preference coefficient.

Further, the determining the optimal window according to the preference coefficient includes:

and taking the candidate window with the largest preference coefficient as the optimal window.

Further, the encrypted data is obtained by adopting an AES algorithm.

On the other hand, the invention also provides a device local production data privacy protection system, which comprises a memory and a processor, wherein the processor executes a computer program stored in the memory to realize the device local production data privacy protection method.

The invention has the following beneficial effects:

By acquiring the production data generated during the local production process of the equipment, counting the repeated character strings corresponding to windows of different lengths, and replacing the repeated character strings with preset characters, the invention obtains the data to be tested; repeated runs of characters are thereby merged into character strings. It can be understood that different window lengths yield different data to be tested, so the encrypted results also differ, and when the decoding key of the repeated character strings is unknown, the production data cannot be effectively derived in reverse from the data to be tested. The character quantity influence coefficient of a window is determined from the difference between the numbers of characters in different data to be tested and from the number of characters in the data to be tested. A large change in the number of characters indicates that the repeated characters in the production data have been effectively converted into repeated character strings, that is, the corresponding iteration is more effective, and both the encryption efficiency and the encryption effect are improved. Candidate windows are screened from the windows of different lengths according to the minimum values of the character frequencies in the production data and in the data to be tested corresponding to each window; this screening reduces the amount of calculation needed to analyze the windows subsequently, so the window analysis speed is effectively improved. The frequency discrete degree and the distribution confusion degree of the characters in the data to be tested are calculated for each window, and it can be understood that the more discrete and chaotic the character distribution is, the better the corresponding encryption effect is. The invention therefore combines the character quantity influence coefficient, the frequency discrete degree and the distribution confusion degree to determine the preference coefficient, which takes both the change in the number of characters and the character distribution into account, so that the data to be tested of the optimal window keeps the amount of encrypted data as small as possible while guaranteeing the encryption effect, which improves the encryption processing efficiency. In other words, the invention processes production data whose coding distribution is highly consistent so as to guarantee the encryption effect, enhance the security of the production data, improve the encryption efficiency, reduce the occupation of storage resources and improve the operation control effect of the memory.

Drawings

In order to more clearly illustrate the embodiments of the invention or the technical solutions and advantages of the prior art, the following description will briefly explain the drawings used in the embodiments or the description of the prior art, and it is obvious that the drawings in the following description are only some embodiments of the invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a method for protecting privacy of local production data of a device according to an embodiment of the present invention.

Detailed Description

In order to further describe the technical means and effects adopted by the present invention to achieve the preset purpose, the following detailed description refers to the specific implementation, structure, characteristics and effects of a method and a system for protecting the privacy of local production data of a device according to the present invention, with reference to the accompanying drawings and preferred embodiments. In the following description, different "one embodiment" or "another embodiment" means that the embodiments are not necessarily the same. Furthermore, the particular features, structures, or characteristics of one or more embodiments may be combined in any suitable manner.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.

The following specifically describes a specific scheme of a method and a system for protecting privacy of local production data of equipment provided by the invention with reference to the accompanying drawings.

Referring to fig. 1, a flowchart of a method for protecting privacy of local production data of a device according to an embodiment of the present invention is shown, where the method includes:

S101: Acquiring production data in the local production process of the equipment, sequentially traversing the production data by using windows with different lengths, counting repeated character strings corresponding to the windows with different lengths, and replacing the repeated character strings by using preset characters to obtain data to be tested.

It can be understood that most devices generate log data corresponding to the devices in the running process, the log data is the state data of the devices in the production running process, and the log data of the devices plays an important role in aspects of device maintenance, fault removal, performance optimization and the like, so that the privacy and the safety of the log data are critical, and the log data of the devices are usually required to be encoded and encrypted.

The production data is log data generated during the running of the device. It can be understood that log data usually contains many identical character strings, such as device models and user interaction operations. When such data is coded and encrypted directly, the resulting code distribution tends to be highly consistent, so the production data may be reverse-decoded from the highly consistent code, and the encryption effect is poor.

According to the invention, the production data are sequentially traversed by using windows with different lengths, repeated character strings are counted on the production data, so that the character frequency in the production data is changed, the confusion degree of the data is further improved, and the encryption effect is enhanced.

Further, in the embodiment of the present invention, using windows of different lengths to sequentially traverse production data, counting repeated character strings corresponding to the windows of different lengths, including: traversing the production data by using a window with a first length, and counting character strings with identical character arrangement in the production data as repeated character strings with the first length; taking the first length minus 1 as a second length, traversing by using a window of the second length in the production data except the repeated character strings of the first length, and obtaining the repeated character strings of the second length; sequentially decreasing the window length, and respectively carrying out iteration of repeated character string recognition in the rest production data until the iteration is completed when the window length is 2; and taking the repeated character strings with all the counted lengths after the iteration is completed as repeated character strings corresponding to the window with the first length.

The first length is the length of the window under consideration; in the embodiment of the invention, windows of several different lengths can be used to traverse the production data.

For each window, the window length serves as the maximum length of its repeated character strings, and the production data is traversed without overlap so that it is divided into repeated character strings of different lengths.

For example, with a first length of 4, the characters in the production data are traversed sequentially with a window of length 4 to determine the non-overlapping repeated character strings of length 4. Then 3 is taken as the second length, and the characters remaining in the production data, namely those not belonging to any repeated character string of length 4, are traversed with a window of length 3 to obtain the non-overlapping repeated character strings of length 3. The characters still remaining after these two rounds are then traversed with a window of length 2 to obtain the non-overlapping repeated character strings of length 2. Since the window length has reached 2, the iteration stops, and the repeated character strings of lengths 4, 3 and 2 are together counted as the repeated character strings corresponding to the window of length 4. It will be appreciated that the repeated character strings do not overlap each other.

After the repeated character strings are determined, they are replaced with preset characters to obtain the data to be tested. It can be understood that, because the kinds of repeated character strings differ between windows of different lengths, corresponding preset characters can be set for each of them. The preset characters, together with the character strings they replace, serve as the decoding key in the subsequent decoding process, so that the data can be decoded.
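The iteration of S101 can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: the names `replace_repeats`, `restore` and `PRESET_BASE` are hypothetical, one preset character is assigned to each distinct repeated character string, the preset characters are drawn from the Unicode private-use area (assumed not to occur in the log), and occurrences are replaced greedily from left to right.

```python
PRESET_BASE = 0xE000  # assumption: private-use code points never appear in the log


def is_preset(ch):
    """True if ch is one of the preset (replacement) characters."""
    return PRESET_BASE <= ord(ch) <= 0xF8FF


def replace_repeats(data, first_length):
    """Return (data_to_be_tested, key) for a window of length first_length."""
    key = {}      # preset character -> repeated string (the decoding key)
    reverse = {}  # repeated string -> preset character
    text = data
    for w in range(first_length, 1, -1):  # window lengths L, L-1, ..., 2
        # Count occurrences of each length-w string made only of original
        # characters; an occurrence may not overlap the previous occurrence
        # of the same string.
        counts, last_end = {}, {}
        for i in range(len(text) - w + 1):
            win = text[i:i + w]
            if any(is_preset(c) for c in win):
                continue  # skip characters already replaced in earlier rounds
            if last_end.get(win, 0) <= i:
                counts[win] = counts.get(win, 0) + 1
                last_end[win] = i + w
        repeats = {s for s, n in counts.items() if n >= 2}
        # Greedy left-to-right replacement. A string counted as repeated can
        # end up replaced fewer times if an overlapping string is replaced
        # first; decoding is unaffected by this.
        out, i = [], 0
        while i < len(text):
            win = text[i:i + w]
            if win in repeats:
                if win not in reverse:
                    ch = chr(PRESET_BASE + len(key))
                    reverse[win] = ch
                    key[ch] = win
                out.append(reverse[win])
                i += w
            else:
                out.append(text[i])
                i += 1
        text = "".join(out)
    return text, key


def restore(tested, key):
    """Invert the replacement using the decoding key."""
    return "".join(key.get(c, c) for c in tested)


tested, key = replace_repeats("abcdXabcdYabZab", 4)
print(len(tested), sorted(key.values()))  # 7 ['ab', 'abcd']
```

In this made-up input, the window of length 4 finds `abcd` twice, the window of length 3 finds nothing, and the window of length 2 finds `ab` twice in the remaining characters, so 15 characters shrink to 7 and `restore(tested, key)` returns the original text.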

S102: Taking the window with the larger length of two windows with adjacent lengths as a first window and the window with the smaller length as a second window, and determining the character quantity influence coefficient of the first window according to the quantity difference of characters in the data to be tested corresponding to the first window and the second window and the quantity of characters in the data to be tested corresponding to the second window.

In the embodiment of the invention, after the repeated character strings corresponding to the windows with different lengths are determined, in order to analyze the data to be tested under different lengths, the window with larger length in the two windows with adjacent lengths is used as a first window, and the window with smaller length is used as a second window.

Further, in some embodiments of the present invention, determining a character number influence coefficient of the first window according to a difference in number of characters in the data to be measured corresponding to the first window and the second window and a number of characters in the data to be measured corresponding to the second window includes: taking the absolute value of the difference between the number of characters in the data to be tested corresponding to the first window and the number of characters in the data to be tested corresponding to the second window as the difference between the number of characters in the first window and the number of characters in the second window; and taking the ratio normalized value of the character quantity difference and the quantity of characters in the data to be tested corresponding to the second window as a character quantity influence coefficient of the first window.

The character quantity influence coefficient measures how strongly a change of window length, and hence of the repeated character strings found, changes the number of characters. It can be understood that after the window length is increased, both the number of characters and the distribution of the characters in the resulting data to be tested change.

Therefore, the invention takes the absolute value of the difference between the number of characters in the data to be tested corresponding to the first window and the number of characters in the data to be tested corresponding to the second window as the character quantity difference. The larger the character quantity difference, the better the effect of the iteration between the second window and the first window. The normalized value of the ratio of the character quantity difference to the number of characters in the data to be tested corresponding to the second window is taken as the character quantity influence coefficient of the first window.
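As a hedged illustration of S102, the following Python sketch computes the coefficient for every window that has a shorter neighbour. The function names and the character counts are made up for the example, and min-max normalization across windows is assumed for the normalization step, as the description permits. The smallest window has no second window, so this sketch simply leaves it without a coefficient.

```python
def min_max(values):
    """Min-max normalization to [0, 1]; all-equal inputs map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def count_influence(char_counts):
    """char_counts: window length -> number of characters in its data to be tested.

    Returns: first-window length -> character quantity influence coefficient.
    """
    lengths = sorted(char_counts)
    firsts = lengths[1:]  # every window that has a shorter neighbour
    ratios = []
    for second, first in zip(lengths, firsts):
        diff = abs(char_counts[first] - char_counts[second])
        ratios.append(diff / char_counts[second])
    if not ratios:
        return {}
    return dict(zip(firsts, min_max(ratios)))


# Hypothetical character counts for windows of length 2 to 5.
coeffs = count_influence({2: 100, 3: 90, 4: 60, 5: 54})
print(coeffs)  # {3: 0.0, 4: 1.0, 5: 0.0}
```

Here the raw ratios are 0.1, 1/3 and 0.1, so the window of length 4, whose data to be tested shrinks most relative to its shorter neighbour, receives the largest coefficient.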

S103: determining the frequency discrete degree of the characters in the data to be tested corresponding to the window according to the frequency of different characters in any window, and determining the distribution confusion degree of the characters in the data to be tested according to the frequency of different characters and the total number of the characters in the window; and screening candidate windows from the windows with different lengths according to the minimum value of the frequency of each character in the production data and the data to be tested corresponding to the windows with different lengths.

After the character quantity influence coefficients corresponding to the different windows are determined, the distribution of the characters in each data to be tested can be counted. It can be understood that, because the character types and the frequency of each character may differ between the data to be tested corresponding to windows of different lengths, the character distribution must be obtained separately for the data to be tested of each window.

The frequency discrete degree and the distribution confusion degree are coefficients for representing the distribution condition of characters in the data to be tested, and it can be understood that the more discrete and chaotic the character distribution is, the better the corresponding data encryption effect is, so that the frequency discrete degree and the distribution confusion degree are positively correlated with the encryption effect.

Further, in some embodiments of the present invention, determining the frequency discrete degree of the characters in the data to be tested corresponding to the window according to the frequency of different characters in any window includes: calculating the average value of the character frequency in the same window as a character average value; based on a standard deviation calculation formula, calculating the standard deviation of the character frequency corresponding to the window according to the frequency of different characters, the character mean value and the number of character types, and taking the normalized value of the standard deviation as the frequency discrete degree of the characters.

Taking the frequency discrete degree of the character in any data to be tested as an example, the corresponding calculation formula may specifically be, for example:

$$W = G\left(\sqrt{\frac{1}{t}\sum_{j=1}^{t}\left(f_j-\bar{f}\right)^{2}}\right)$$

wherein $W$ represents the frequency discrete degree, $j$ represents the index of the character type in the data to be tested, $t$ represents the total number of character types in the data to be tested, $f_j$ represents the frequency of the $j$-th type of character, $\bar{f}$ represents the average value of the frequency of all characters in the data to be tested, $\sqrt{\frac{1}{t}\sum_{j=1}^{t}(f_j-\bar{f})^{2}}$ represents the standard deviation of the character frequency, and $G()$ represents normalization processing. In one embodiment of the present invention, the normalization processing may specifically be, for example, maximum-minimum normalization, and normalization in subsequent steps may likewise be performed by maximum-minimum normalization; in other embodiments of the present invention, other normalization methods may be selected according to the specific range of values, which will not be described herein.

It can be understood that the standard deviation can effectively represent the dispersion degree of the character frequency distribution in the data to be tested, so that the frequency dispersion degree is obtained through the standard deviation normalization, the reliability of the frequency dispersion degree is higher, meanwhile, the influence caused by dimension can be eliminated through the normalization processing, and the subsequent data analysis according to the frequency dispersion degree is facilitated.
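A minimal Python sketch of the standard deviation step is given below. It assumes the population form of the standard deviation (division by the number of character types), which is one reading of the description; the function name `frequency_std` is hypothetical, and the normalization G() across windows is omitted for brevity.

```python
from collections import Counter
from statistics import pstdev


def frequency_std(tested):
    """Population standard deviation of the per-character occurrence counts."""
    counts = list(Counter(tested).values())  # one count per character type
    return pstdev(counts)


# Counts are a:3, b:2, c:1 with mean 2, so the variance is (1 + 0 + 1) / 3.
print(round(frequency_std("aaabbc"), 4))  # 0.8165
print(frequency_std("abc"))               # 0.0
```

A text whose characters all occur equally often gives a standard deviation of zero, while uneven counts give a larger value.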

Further, in some embodiments of the present invention, determining the distribution confusion degree of the characters in the data to be tested according to the frequency of different characters and the total number of characters in the window includes: calculating, for each different character, the ratio of its frequency (number of occurrences) to the total number of characters in the window as the relative frequency of the corresponding character; based on the information entropy formula, calculating the information entropy of the character distribution in the data to be tested from the relative frequencies of all characters, and carrying out normalization processing on the information entropy to obtain the distribution confusion degree.

The information entropy is characteristic data for representing the degree of disorder of character distribution, and the larger the information entropy is, the more disorder of corresponding data distribution can be represented, so that the degree of disorder of the distribution of characters in the data to be tested is determined through the information entropy. The calculation formula of the degree of confusion of the distribution may specifically be, for example:

$$K = G\left(-\sum_{j=1}^{t}\frac{f_j}{N}\log\frac{f_j}{N}\right)$$

wherein $K$ represents the distribution confusion degree, $j$ represents the index of the character type in the data to be tested, $t$ represents the total number of character types in the data to be tested, $f_j$ represents the frequency of the $j$-th type of character, $N$ represents the total number of characters in the window, and $G()$ represents the normalization processing.

In the embodiment of the invention, the distribution disorder degree is normalization of information entropy, that is, the larger the distribution disorder degree is, the more the distribution of characters in the corresponding data to be detected is disordered.
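The information entropy underlying the distribution confusion degree can be sketched in Python as follows. The base-2 logarithm is an assumption (the description does not state the base), the function name `distribution_entropy` is hypothetical, and the normalization G() across windows is again omitted.

```python
import math
from collections import Counter


def distribution_entropy(tested):
    """Shannon entropy (base 2, assumed) of the character distribution."""
    total = len(tested)
    entropy = 0.0
    for count in Counter(tested).values():
        p = count / total  # relative frequency of one character type
        entropy -= p * math.log2(p)
    return entropy


print(distribution_entropy("aabb"))  # 1.0
print(distribution_entropy("abcd"))  # 2.0
```

Four equally frequent characters give a higher entropy than two, which matches the statement that a larger value indicates a more chaotic character distribution.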

Further, in some embodiments of the present invention, selecting candidate windows from windows of different lengths according to minimum values of frequencies of respective characters in the production data and the data to be tested corresponding to windows of different lengths includes: and taking a window with the minimum value of the character frequency in the data to be tested being greater than or equal to the minimum value of the character frequency in the production data as a candidate window.

In the embodiment of the invention, a character frequency that is too low affects the data encryption efficiency. In each data to be tested, the frequency of the character corresponding to the longest character string of the previous iteration is affected by the newly introduced characters, so the frequency of the longest character string in the current iteration needs to be considered. That is, when the minimum value of the character frequency in the data to be tested is greater than or equal to the minimum value of the character frequency in the production data, the corresponding data to be tested can be regarded as having achieved effective identification of repeated data. In particular, when the window size is 2, if the frequency of the corresponding repeated character string is less than the minimum value of the character frequency in the production data, the production data is not processed.

Therefore, in the embodiment of the invention, the window with the minimum value of the character frequency in the data to be detected being more than or equal to the minimum value of the character frequency in the production data is used as the candidate window, and the windows with different lengths are screened, so that the calculation amount for analyzing all different windows in the follow-up process is reduced, and the window analysis speed can be effectively improved.
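The screening rule can be illustrated with a short Python sketch. The function names and the sample strings are hypothetical, with `X` and `Y` standing in for preset characters; only the comparison of minimum character frequencies comes from the description.

```python
from collections import Counter


def min_frequency(text):
    """Smallest occurrence count over all character types in text."""
    return min(Counter(text).values())


def screen_candidates(production, tested_by_window):
    """Keep windows whose data to be tested has a minimum character
    frequency not below that of the production data."""
    threshold = min_frequency(production)
    return [w for w, tested in sorted(tested_by_window.items())
            if min_frequency(tested) >= threshold]


production = "aabbaabbcc"                         # minimum frequency 2 (c)
tested_by_window = {2: "XXcc", 3: "YaYbcc"}       # made-up data to be tested
print(screen_candidates(production, tested_by_window))  # [2]
```

The window of length 3 is rejected in this example because its data to be tested contains characters that occur only once, which is below the minimum frequency of 2 in the production data.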

S104: Determining the preference coefficient of each candidate window according to the character quantity influence coefficient, the frequency discrete degree and the distribution confusion degree of the same candidate window, determining the optimal window according to the preference coefficient, performing arithmetic coding on the data to be tested of the optimal window, and encrypting the coded data to obtain encrypted data.

Further, in some embodiments of the present invention, determining the preference coefficient of the candidate window according to the character quantity influence coefficient, the frequency discrete degree and the distribution confusion degree of the same candidate window includes: determining a character distribution influence coefficient according to the frequency discrete degree and the distribution confusion degree, wherein the frequency discrete degree and the character distribution influence coefficient are in positive correlation, the distribution confusion degree and the character distribution influence coefficient are in positive correlation, and the value of the character distribution influence coefficient is a normalized numerical value; and calculating the product of the character quantity influence coefficient and the character distribution influence coefficient as the preference coefficient.

A positive correlation indicates that the dependent variable increases as the independent variable increases and decreases as the independent variable decreases; the specific relationship may be a multiplication relationship, an addition relationship, a power of an exponential function, or the like, and is determined by the actual application. A negative correlation indicates that the dependent variable decreases as the independent variable increases and increases as the independent variable decreases; it may be a subtraction relationship, a division relationship, or the like, and is determined by the actual application.

In the embodiment of the invention, the normalized value of the product of the frequency discrete degree and the distribution confusion degree can be calculated as the character distribution influence coefficient, or the normalized value of the sum of the frequency discrete degree and the distribution confusion degree can be calculated as the character distribution influence coefficient.

After the character distribution influence coefficient is determined, the product of the character quantity influence coefficient and the character distribution influence coefficient is calculated as the preference coefficient. It can be understood that the preference coefficient effectively combines the change in character quantity with the character distribution, and thereby indicates how preferable each candidate window is.
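As a rough illustration of the calculation described above, the following Python sketch combines the three quantities into a preference coefficient. The function names are hypothetical, and the normalization x / (1 + x) is an assumption chosen only because it maps non-negative values into [0, 1) and preserves positive correlation; the text does not specify which normalization is used.

```python
def normalize(x: float) -> float:
    """Map a non-negative value into [0, 1).

    Assumption: the source only requires "a normalized value";
    x / (1 + x) is one monotonically increasing choice.
    """
    if x < 0:
        raise ValueError("x must be non-negative")
    return x / (1.0 + x)


def character_distribution_coefficient(dispersion: float,
                                       confusion: float,
                                       combine: str = "product") -> float:
    """Normalized product (or sum) of the frequency discrete degree and
    the distribution confusion degree."""
    if combine == "product":
        raw = dispersion * confusion
    elif combine == "sum":
        raw = dispersion + confusion
    else:
        raise ValueError("combine must be 'product' or 'sum'")
    return normalize(raw)


def preference_coefficient(quantity_coeff: float,
                           dispersion: float,
                           confusion: float,
                           combine: str = "product") -> float:
    """Character quantity influence coefficient multiplied by the
    character distribution influence coefficient."""
    return quantity_coeff * character_distribution_coefficient(
        dispersion, confusion, combine
    )


# Hypothetical values for one candidate window.
coeff = preference_coefficient(quantity_coeff=0.5, dispersion=1.0, confusion=1.0)
print(coeff)  # 0.25
```

Both the product variant and the sum variant increase when either the frequency discrete degree or the distribution confusion degree increases, which is the positive correlation the text requires.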

Further, in some embodiments of the present invention, determining the optimal window according to the preference coefficient includes: taking the candidate window with the largest preference coefficient as the optimal window.

In the embodiment of the invention, the candidate window with the largest preference coefficient is taken as the optimal window. A larger preference coefficient indicates that the character distribution of the data to be tested corresponding to the candidate window is more chaotic, its degree of dispersion is higher, and its character change is larger.
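Selecting the optimal window then reduces to taking the maximum over the candidate windows. A minimal Python sketch follows, assuming the preference coefficients have already been computed; the function name and the sample values are hypothetical.

```python
def select_optimal_window(preference_by_window: dict[int, float]) -> int:
    """Return the candidate window length with the largest preference
    coefficient."""
    if not preference_by_window:
        raise ValueError("no candidate windows to choose from")
    # max() over the dict iterates its keys; on a tie it returns the first
    # key encountered. The source text does not define tie-breaking.
    return max(preference_by_window, key=preference_by_window.get)


# Hypothetical coefficients: candidate window length -> preference coefficient.
optimal = select_optimal_window({2: 0.12, 3: 0.31, 4: 0.27})
print(optimal)  # 3
```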

Arithmetic coding is a lossless compression method well known in the art. Compressing the data by arithmetic coding further reduces the data volume, which reduces the storage occupied by the data and improves the encryption speed. The data is compressed by arithmetic coding and then encrypted; the encryption may use, for example, the Advanced Encryption Standard (AES) algorithm or any of various other data encryption algorithms, which is not limited here.
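For readers unfamiliar with arithmetic coding, the following Python sketch shows the basic idea with exact rational arithmetic. It is a teaching illustration under stated assumptions, not the coder used by the invention: the function names are hypothetical, the model is a static frequency table built from the input itself, and the message length is passed to the decoder explicitly. Practical coders use fixed-precision integer arithmetic and emit bits incrementally. The encryption step (for example AES) is omitted because it requires a cryptographic library outside the Python standard library.

```python
from collections import Counter
from fractions import Fraction


def build_intervals(text: str) -> dict[str, tuple[Fraction, Fraction]]:
    """Assign each character a sub-interval of [0, 1) whose width equals
    its relative frequency in text."""
    counts = Counter(text)
    total = len(text)
    intervals = {}
    low = Fraction(0)
    for ch in sorted(counts):
        width = Fraction(counts[ch], total)
        intervals[ch] = (low, low + width)
        low += width
    return intervals


def arithmetic_encode(text: str, intervals) -> Fraction:
    """Narrow [low, high) once per character; return a number inside the
    final interval."""
    low, high = Fraction(0), Fraction(1)
    for ch in text:
        span = high - low
        ch_low, ch_high = intervals[ch]
        high = low + span * ch_high   # uses the old low
        low = low + span * ch_low
    return (low + high) / 2


def arithmetic_decode(code: Fraction, intervals, length: int) -> str:
    """Replay the same narrowing, picking the character whose interval
    contains the code at each step."""
    out = []
    low, high = Fraction(0), Fraction(1)
    for _ in range(length):
        span = high - low
        scaled = (code - low) / span
        for ch, (ch_low, ch_high) in intervals.items():
            if ch_low <= scaled < ch_high:
                out.append(ch)
                high = low + span * ch_high
                low = low + span * ch_low
                break
    return "".join(out)


# Hypothetical data to be tested, with '#' standing for a preset character.
message = "AB#AB#C"
table = build_intervals(message)
code = arithmetic_encode(message, table)
assert arithmetic_decode(code, table, len(message)) == message
```

The round-trip assertion at the end demonstrates that the coding is lossless, which is the property the text relies on before encryption.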

According to the embodiment of the invention, the data to be tested with a more chaotic character distribution is arithmetically coded and encrypted, which improves the encryption effect, reduces the data volume of the encrypted data, and ensures the security of the encrypted data.

In summary, the invention acquires the production data generated in the local production process of the equipment, counts the repeated character strings corresponding to windows of different lengths, and replaces the repeated character strings with preset characters to obtain the data to be tested, so that arranged repeated characters can be combined into character strings. It can be understood that, because the selected windows differ in length, the corresponding data to be tested differ, and so does their encryption; when the decryption key for the repeated character strings is not known, the production data cannot be effectively derived in reverse from the data to be tested. The character quantity influence coefficient of a window is determined according to the difference in the number of characters between different data to be tested and the number of characters in the data to be tested; when the change in the number of characters is large, the repeated characters in the corresponding production data have been effectively converted into repeated character strings, that is, the corresponding iteration is more effective, and the encryption efficiency and encryption effect are significantly improved. Candidate windows are screened from the windows of different lengths according to the minimum values of the character frequencies in the production data and in the data to be tested corresponding to each window; this screening reduces the amount of computation for the subsequent analysis of the windows and effectively improves the window analysis speed. The frequency discrete degree and the distribution confusion degree of the characters in the data to be tested are calculated for the different windows; it can be understood that the more discrete and chaotic the character distribution is, the better the corresponding encryption effect is. The invention therefore combines the character quantity influence coefficient, the frequency discrete degree and the distribution confusion degree to determine the preference coefficient, which effectively accounts for both character change and character distribution, so that the data to be tested of the optimal window keeps the data volume of the encrypted data as small as possible while guaranteeing the encryption effect, and the encryption processing efficiency is improved. That is, the invention processes the production data whose coding distribution is highly consistent, thereby guaranteeing the encryption effect, enhancing the security of the production data, effectively improving the encryption efficiency, reducing the occupation of storage resources, and improving the operation control effect of the memory.

The invention also provides a device local production data privacy protection system, which comprises a memory and a processor, wherein the processor executes a computer program stored in the memory to realize the device local production data privacy protection method.

It should be noted that: the sequence of the embodiments of the present invention is only for description, and does not represent the advantages and disadvantages of the embodiments. The processes depicted in the accompanying drawings do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.

In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments.

Claims

1. A method of protecting privacy of production data locally at a device, the method comprising:

taking the longer of two windows with adjacent lengths as a first window and the shorter as a second window, and determining a character quantity influence coefficient of the first window according to the quantity difference of characters in data to be tested corresponding to the first window and the second window and the quantity of characters in the data to be tested corresponding to the second window;

determining a preferred coefficient of the candidate window according to the character quantity influence coefficient, the frequency discrete degree and the distribution confusion degree of the same candidate window, determining an optimal window according to the preferred coefficient, performing arithmetic coding on data to be tested of the optimal window, and encrypting the data to be tested to obtain encrypted data;

the step of sequentially traversing the production data by using windows with different lengths, and counting repeated character strings respectively corresponding to the windows with different lengths comprises the following steps:

taking the repeated character strings with all the counted lengths after the iteration is completed as repeated character strings corresponding to windows with the first length;

the determining the character quantity influence coefficient of the first window according to the quantity difference of the characters in the data to be tested corresponding to the first window and the second window and the quantity of the characters in the data to be tested corresponding to the second window comprises the following steps:

taking a ratio normalization value of the character quantity difference and the quantity of characters in the data to be tested corresponding to a second window as a character quantity influence coefficient of the first window;

the determining the frequency discrete degree of the window corresponding to the character in the data to be tested according to the frequency of different characters in any window comprises the following steps:

calculating standard deviation of the frequency of the character corresponding to the window according to the frequency of the different characters, the character mean value and the type number of the characters based on a standard deviation calculation formula, and taking a normalized value of the standard deviation as the frequency discrete degree of the character;

the determining the distribution confusion degree of the characters in the data to be tested according to the frequency of different characters and the total number of the characters in the window comprises the following steps:

based on an information entropy formula, calculating according to the frequency of all characters to obtain information entropy of character distribution in the data to be tested, and carrying out normalization processing on the information entropy to obtain the distribution confusion degree;

the screening candidate windows from the windows with different lengths according to the minimum value of the frequency of each character in the data to be tested corresponding to the production data and the windows with different lengths comprises the following steps:

taking a window with the minimum value of the character frequency in the data to be tested being greater than or equal to the minimum value of the character frequency in the production data as a candidate window;

the determining the preferred coefficient of the candidate window according to the character quantity influence coefficient, the frequency discrete degree and the distribution confusion degree of the same candidate window comprises the following steps:

calculating the product of the character quantity influence coefficient and the character distribution influence coefficient as the preference coefficient;

the determining the optimal window according to the preferred coefficient comprises the following steps:

2. A method of protecting privacy of locally produced data of a device as claimed in claim 1, wherein the encrypted data is obtained using an AES algorithm.

3. A device local production data privacy protection system comprising a memory and a processor, wherein the processor executes a computer program stored in the memory to implement a device local production data privacy protection method as claimed in any one of claims 1-2.