CN113987556A - Data processing method and device, electronic equipment and storage medium - Google Patents
- Publication number
- CN113987556A (application number CN202111593205.1A)
- Authority
- CN
- China
- Prior art keywords
- segment
- code
- coding
- frequency
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/08—Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
- H04L9/0861—Generation of secret information including derivation or calculation of cryptographic keys or passwords
- H04L9/0863—Generation of secret information including derivation or calculation of cryptographic keys or passwords involving passwords or one-time passwords
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Computer Hardware Design (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Bioethics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The application relates to a data processing method and device, electronic equipment and a storage medium. The method comprises the following steps: acquiring a first code segment, wherein the first code segment represents one data segment of the data to be stored; counting a first frequency with which each code in a standard code set appears in the first code segment, wherein the standard code set contains all codes in the first code segment; and generating a second code segment according to the maximum of all the first frequencies, wherein the second code segment carries the first code segment, and the frequency with which each code in the standard code set appears in the second code segment equals the maximum frequency. The application solves the technical problem that, when data is stored, the storage party cannot be prevented from gathering statistics on the data distribution.
Description
Technical Field
The present application relates to the field of data processing, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of the internet, the volume of data generated online is growing geometrically. In the face of massive data, the shortcomings of traditional storage, such as poor scalability and single points of failure, are increasingly obvious, and traditional centralized storage is gradually being replaced by distributed storage. Distributed data storage technology aims to establish a new distributed encrypted storage network and to provide efficient storage services for users.
In distributed data storage, the data owner does not want the storage party to learn the data distribution. However, even when the related art applies encryption or similar methods, the storage party can still infer the likely data distribution through frequency statistics.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The application provides a data processing method and device, electronic equipment and a storage medium, to at least solve the technical problem in the related art that, when data is stored, the storage party cannot be prevented from gathering statistics on the data distribution.
According to an aspect of an embodiment of the present application, there is provided a data processing method, including: acquiring a first code segment, wherein the first code segment represents one data segment of the data to be stored; counting a first frequency with which each code in a standard code set appears in the first code segment, wherein the standard code set contains all codes in the first code segment; and generating a second code segment according to the maximum of all the first frequencies, wherein the second code segment carries the first code segment, and the frequency with which each code in the standard code set appears in the second code segment equals the maximum frequency.
According to another aspect of the embodiments of the present application, there is also provided a data processing apparatus, including: an acquisition module, configured to acquire a first code segment, wherein the first code segment represents one data segment of the data to be stored; a statistics module, configured to count a first frequency with which each code in a standard code set appears in the first code segment, wherein the standard code set contains all codes in the first code segment; and a generating module, configured to generate a second code segment according to the maximum of all the first frequencies, wherein the second code segment carries the first code segment, and the frequency with which each code in the standard code set appears in the second code segment equals the maximum frequency.
According to another aspect of the embodiments of the present application, there is also provided a storage medium including a stored program which, when executed, performs the above-described method.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the above method through the computer program.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the steps of any of the embodiments of the method described above.
In the embodiment of the application, a first code segment is acquired, wherein the first code segment represents one data segment of the data to be stored; a first frequency with which each code in a standard code set appears in the first code segment is counted, wherein the standard code set contains all codes in the first code segment; and a second code segment is generated according to the maximum of all the first frequencies, wherein the second code segment carries the first code segment, and the frequency with which each code in the standard code set appears in the second code segment equals the maximum frequency. The data to be stored is thus represented by codes, code frequency statistics are performed on the code segment representing one data segment, and code frequency supplementation is performed on the code segment while the original code segment is retained, so that every code has the same frequency in the supplemented code segment. As a result, the storage party cannot gather statistics on the distribution of the original data during data storage, which solves the technical problem that the storage party cannot be prevented from gathering such statistics and achieves the technical effect of enhancing data confidentiality.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of a hardware environment for a data processing method according to an embodiment of the present application;
FIG. 2 is a flow chart of an alternative data processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an alternative data processing apparatus according to an embodiment of the present application;
FIG. 4 is a block diagram of a terminal according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the nouns and terms appearing in the description of the embodiments of the present application are explained as follows:
Standard code set: in this application, a standard code set refers to the full set of codes, i.e. the set of all codes that conform to the coding format.
Frequency statistics: in this application, frequency statistics refers to counting the number of times each code in the standard code set appears in a code segment.
Frequency supplementation: in this application, frequency supplementation refers to adding a certain number of codes to a code segment according to a certain frequency, so that each code appears the same number of times in the code segment after supplementation.
According to an aspect of embodiments of the present application, there is provided an embodiment of a method for data processing.
Optionally, in this embodiment, the data processing method described above may be applied to a hardware environment constituted by the terminal 101 and the server 103, as shown in FIG. 1. The server 103 is connected to the terminal 101 through a network and may provide data processing services for the terminal or for a client installed on the terminal. A database 105 may be provided on the server, or separately from the server, to provide data storage services for the server 103. The type of network is not limited, and the terminal 101 is not limited to a PC, a mobile phone, a tablet computer, and the like. The data processing method according to the embodiment of the present application may be executed by the server 103, by the terminal 101, or by the server 103 and the terminal 101 together. When the terminal 101 executes the data processing method, the method may also be executed by a client installed on the terminal. The following description takes a data processing method executed on the server as an example.
Fig. 2 is a flow chart of an alternative data processing method according to an embodiment of the present application, which may include the following steps, as shown in fig. 2:
step S202, a server acquires a first coding segment, wherein the first coding segment is used for representing one data segment in data to be stored;
step S204, the server counts the first frequency of each code in the standard code set appearing in the first code segment, and the standard code set comprises all the codes in the first code segment;
step S206, the server generates a second code segment according to the maximum frequency of all the first frequencies, the second code segment carries the first code segment, and the frequency of each code in the standard code set appearing in the second code segment is the same as the maximum frequency.
Through steps S202 to S206, the data to be stored is represented by codes, code frequency statistics are performed on the code segment representing one data segment, and code frequency supplementation is performed on the code segment while the original code segment is retained, so that every code has the same frequency in the supplemented code segment. The storage party therefore cannot gather statistics on the distribution of the original data during data storage, which solves the technical problem that the storage party cannot be prevented from gathering such statistics and achieves the technical effect of enhancing data confidentiality.
In the technical solution provided in step S202, the server acquires a first encoding segment, where the first encoding segment is used to represent one data segment in the data to be stored.
The first coding segment represents a data segment, and when the data to be stored comprises a plurality of data segments, the plurality of first coding segments corresponding to the data segments are adopted to represent all the data to be stored.
The first code segment may be obtained by taking the character string in the data segment directly, or by re-encoding the character string in the data segment with another coding scheme. The coding format may be one commonly used in the prior art, such as Unicode, ASCII (American Standard Code for Information Interchange) or GBK (a Chinese character encoding character set), or it may be a coding format with a customized code width built on an existing coding format (that is, every N bits are taken as one code).
For example, let [abc] be the character string in one data segment. (1) Character format: the first code segment may be the directly obtained characters [a, b, c], each character being one code. (2) Custom coding format: the first code segment may also be obtained by encoding the character string with equal-length codes in another coding format. The ASCII values of a, b and c are 97, 98 and 99, which as 7-bit binary numbers are 1100001, 1100010 and 1100011, giving the first code segment [1100001, 1100010, 1100011]. If the code width is then customized on the basis of this coding format, with a code width of 3 the concatenated bit string 110000111000101100011 is regrouped and the first code segment becomes [110, 000, 111, 000, 101, 100, 011].
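The re-encoding in the example above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the application: the function name `encode_segment` and the parameters `char_bits` and `code_bits` are hypothetical, the input is assumed to be ASCII, and padding a short trailing group with zeros is an assumption, since the text does not say how a remainder is handled.

```python
def encode_segment(text: str, char_bits: int = 7, code_bits: int = 3) -> list[str]:
    """Re-encode text into binary codes of code_bits bits each (illustrative sketch)."""
    # Concatenate the fixed-width binary value of every character
    # (assumes every code point fits in char_bits bits, e.g. ASCII with 7 bits).
    bits = "".join(format(ord(ch), f"0{char_bits}b") for ch in text)
    # Assumption: pad the tail with zeros so the length is a multiple of code_bits.
    if len(bits) % code_bits:
        bits += "0" * (code_bits - len(bits) % code_bits)
    # Regroup the bit string into codes of code_bits bits.
    return [bits[i:i + code_bits] for i in range(0, len(bits), code_bits)]


codes = encode_segment("abc")
print(codes)  # ['110', '000', '111', '000', '101', '100', '011']
```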
As an alternative embodiment, the server may obtain the first code segment as follows: the server cuts the data to be stored to obtain one or more data segments, wherein the cutting mode can be set according to requirements. If the data needs to be stored in unequal lengths, the data to be stored can be cut into unequal lengths; if the data needs to be stored in equal lengths, the data to be stored can be cut into fixed lengths. Each data segment is then encoded into a first code segment using a preset coding format, wherein the preset coding format includes the custom coding format and the character format.
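The two cutting modes can be sketched as follows. The helper names `cut_fixed` and `cut_unequal` are hypothetical, and drawing each segment length at random between `min_len` and `max_len` is only one possible policy for unequal-length cutting; the text leaves the policy open. The unequal-length variant also records the start and end index of each segment so that the original order can be recovered later.

```python
import random


def cut_fixed(data: str, size: int) -> list[str]:
    """Cut data into segments of a fixed length (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def cut_unequal(data: str, min_len: int, max_len: int,
                rng: random.Random) -> list[tuple[int, int, str]]:
    """Cut data into segments of random length.

    Returns (start, end, segment) tuples with inclusive indices.
    Assumption: lengths are drawn uniformly from [min_len, max_len].
    """
    segments = []
    start = 0
    while start < len(data):
        length = rng.randint(min_len, max_len)
        end = min(start + length, len(data))
        segments.append((start, end - 1, data[start:end]))
        start = end
    return segments


print(cut_fixed("abcdefg", 3))  # ['abc', 'def', 'g']
pieces = cut_unequal("abcdefghij", 1, 4, random.Random(0))
print("".join(seg for _, _, seg in pieces))  # abcdefghij
```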
In the technical solution provided in step S204, the server counts a first frequency of each code in the standard code set appearing in the first code segment, where the standard code set includes all codes in the first code segment.
The standard code set referred to in this application is a code set used for frequency statistics, in which each code occurs only once, and covers all possible codes occurring in the first code segment.
The first frequency refers to the number of occurrences of codes in the standard code set in the first code segment.
As an alternative embodiment, when the first code segment represents the data segment in a character format, the server determines the standard code set according to the character format, and counts the frequency with which each code in the standard code set appears in the first code segment to obtain the first frequency, wherein the first code segment is divided into a plurality of codes according to the character format.
For example, if the first code segment represents the data segment by using lower case english alphabet, the standard code set includes all lower case english alphabet, and the standard code set is { a, b, c, d, e, f, g, h, i, j, k, l, m, n … … x, y, z }, and if the first code segment is [ a, b, c, d ], the statistically obtained frequency table is { a =1, b =1, c =1, d =1, e =0 … … x =0, y =0, z =0}, where a =1 represents that a in the standard code set occurs 1 times in the first code segment, and e =0 represents that e in the standard code set occurs 0 times in the first code segment.
As another optional embodiment, when the first code segment represents the data segment in the custom coding format, the server determines the standard code set according to the custom coding format, wherein if the custom coding format uses a binary string with a specified number of bits as one code, the standard code set contains all binary strings with that number of bits. The server then counts the frequency with which each code in the standard code set appears in the first code segment to obtain the first frequency, wherein the first code segment is divided into a plurality of codes according to the custom coding format. Using a custom coding format with a smaller code width reduces the size of the frequency table and increases the difficulty of statistics.
For example, if the first coding segment represents the data segment by using a 3-bit binary string, the standard coding set includes all 3-bit binary strings, and the standard coding set is {000, 001, 010, 011, 100, 101, 110, 111}, and if the first coding segment is [110, 000, 111, 000, 101, 100, 011], the statistically obtained frequency table is { '000' =2, '001' =0, '010' =0, '011' =1, '100' =1, '101' =1, '110' =1, '111' =1}, where '000' =2 indicates that '000' in the standard coding set occurs 2 times in the first coding segment and '001' =0 indicates that '001' in the standard coding set occurs 0 times in the first coding segment.
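A short sketch of this frequency count is given below. The function names `standard_code_set` and `count_frequencies` are hypothetical; the sketch builds the full set of N-bit binary strings and reports a count for every code in it, including codes that never occur in the first code segment.

```python
from collections import Counter
from itertools import product


def standard_code_set(code_bits: int) -> list[str]:
    """All binary strings with code_bits bits, in ascending order."""
    return ["".join(bits) for bits in product("01", repeat=code_bits)]


def count_frequencies(segment: list[str], code_set: list[str]) -> dict[str, int]:
    """First frequency of every code in code_set within segment."""
    counts = Counter(segment)
    unknown = set(counts) - set(code_set)
    if unknown:
        # The standard code set must contain every code of the segment.
        raise ValueError(f"codes outside the standard code set: {sorted(unknown)}")
    return {code: counts[code] for code in code_set}


first_segment = ["110", "000", "111", "000", "101", "100", "011"]
freq = count_frequencies(first_segment, standard_code_set(3))
print(freq)
# {'000': 2, '001': 0, '010': 0, '011': 1, '100': 1, '101': 1, '110': 1, '111': 1}
```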
In the technical solution provided in step S206, the server generates a second code segment according to a maximum frequency of all the first frequencies, where the second code segment carries the first code segment, and a frequency of each code in the standard code set appearing in the second code segment is the same as the maximum frequency.
The maximum frequency is the frequency of the code in the standard code set that appears most often in the first code segment.
The second code segment is generated to prevent the storage party from gathering statistics on the real data distribution. It is obtained by performing frequency supplementation on the first code segment, so that the frequency with which each code in the standard code set appears in the second code segment equals the maximum frequency. When the storage party gathers statistics on the data distribution, what it counts is the code frequency in the processed second code segment.
As an alternative embodiment, the server subtracts the first frequency of each code in the standard code set from the maximum frequency to obtain the second frequency of the code; then generating a third code segment according to the second frequency of each code in the standard code set, wherein the frequency of each code in the standard code set appearing in the third code segment is the second frequency of the code; and finally, combining the first coding segment and the third coding segment to obtain a second coding segment. For example, the first code segment is [ aab ], the standard code set is { a, b, c, d, e, f, g, h, i, j, k, l, m, n … … x, y, z }, the statistically derived frequency table is { a =2, b =1, c =0, d =0, e =0 … … x =0, y =0, z =0}, the maximum frequency is 2, the generated third code segment is [ b, c, c, d, d, e, e, f, f … … z, z ], and for code b, the first frequency is 1 and the second frequency is 1.
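The subtraction step can be sketched as follows for the [aab] example. The helper name `build_third_segment` is hypothetical; it emits each code as many times as its second frequency (maximum frequency minus first frequency), in code-set order and without any shuffling.

```python
import string


def build_third_segment(freq: dict[str, int]) -> list[str]:
    """Codes needed to raise every code to the maximum frequency."""
    max_freq = max(freq.values())
    third = []
    for code, first_freq in freq.items():
        second_freq = max_freq - first_freq  # second frequency of this code
        third.extend([code] * second_freq)
    return third


first_segment = list("aab")
freq = {ch: first_segment.count(ch) for ch in string.ascii_lowercase}
third_segment = build_third_segment(freq)
second_segment = first_segment + third_segment

print(third_segment[:5])    # ['b', 'c', 'c', 'd', 'd']
print(len(second_segment))  # 52
```

Every lowercase letter then appears exactly twice in `second_segment`, which is 26 × 2 = 52 codes long.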
Optionally, the server may combine the first code segment with the third code segment to obtain the second code segment as follows: all codes in the third code segment are randomly ordered, for which a Random Shuffle algorithm can be used; the first code segment and the randomly ordered third code segment are then spliced to obtain the second code segment. The splicing may place the first code segment in front and the third code segment behind it, or the third code segment in front and the first code segment behind it.
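A minimal sketch of the shuffle-and-splice step follows. The function name `splice` and the `first_in_front` flag are hypothetical. Python's `random.Random.shuffle` is used here for reproducibility of the example; whether a stronger random source (for instance `random.SystemRandom`) is needed is a design decision the text does not address.

```python
import random


def splice(first: list[str], third: list[str], rng: random.Random,
           first_in_front: bool = True) -> list[str]:
    """Shuffle the third code segment and splice it with the first one."""
    shuffled = third[:]    # copy so the caller's list is left unchanged
    rng.shuffle(shuffled)  # random ordering of the supplementary codes
    return first + shuffled if first_in_front else shuffled + first


first = list("aab")
third = list("bccdd")
second = splice(first, third, random.Random(42))
print(second[:3])                         # ['a', 'a', 'b']
print(sorted(second[3:]) == sorted(third))  # True
```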
Optionally, if the data to be stored includes a plurality of data segments, each second encoding segment may be processed in the following manner, so as to increase the difficulty of counting the data distribution condition of each data segment in the data to be stored: the server acquires the maximum value of the maximum frequencies corresponding to all the second coding segments; and performing coding frequency supplement on the second coding section according to the maximum value to obtain an updated second coding section, wherein the frequency of each code in the standard coding set appearing in the updated second coding section is the same as the maximum value.
For example, the data to be stored includes three data segments A, B and C, which correspond to the second code segment A, the second code segment B and the second code segment C respectively. The maximum frequency of the second code segment A is 3, that of the second code segment B is 5, and that of the second code segment C is 4, so the maximum value among them is 5. Code frequency supplementation is performed on each second code segment according to this value to obtain the updated second code segments A, B and C, and the frequency with which each code in the standard code set appears in each of the updated second code segments is 5.
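The cross-segment supplementation can be sketched as below. The names `equalize_to` and `equalize_all` are hypothetical, segments are assumed to be non-empty, and the supplementary codes are simply appended in code-set order; shuffling them as described earlier is omitted to keep the sketch short.

```python
from collections import Counter


def equalize_to(segment: list[str], code_set: list[str], target: int) -> list[str]:
    """Append codes until every code in code_set occurs exactly target times."""
    counts = Counter(segment)
    extra = []
    for code in code_set:
        if counts[code] > target:
            raise ValueError(f"code {code!r} already occurs more than {target} times")
        extra.extend([code] * (target - counts[code]))
    return segment + extra


def equalize_all(segments: list[list[str]], code_set: list[str]) -> list[list[str]]:
    """Raise every segment to the largest maximum frequency among all segments."""
    global_max = max(max(Counter(seg).values()) for seg in segments)
    return [equalize_to(seg, code_set, global_max) for seg in segments]


code_set = ["a", "b", "c"]
segments = [list("aab"), list("abbbc"), list("cc")]  # maximum frequencies 2, 3, 2
updated = equalize_all(segments, code_set)
print([len(seg) for seg in updated])  # [9, 9, 9]
```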
Optionally, to facilitate restoring the data to be stored, the server acquires the position information of the data segment corresponding to the first code segment within the data to be stored, converts the position information into a position code using a preset coding format, wherein the preset coding format is the coding format adopted by the first code segment, and splices the position code into the second code segment. Adding the position code to the code segment also slightly disturbs the frequencies, which further increases the difficulty of gathering statistics on the data distribution.
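One way to express position information in the same coding format is sketched below. Everything specific here is an assumption made for illustration: the position is taken to be a single segment index, it is written as a fixed number of 3-bit codes, and the names `position_codes` and `decode_position` are hypothetical. The text does not prescribe the layout of the position code.

```python
def position_codes(index: int, code_bits: int = 3, num_codes: int = 8) -> list[str]:
    """Encode a segment index as num_codes binary codes of code_bits bits each."""
    width = code_bits * num_codes
    if not 0 <= index < 2 ** width:
        raise ValueError(f"index must fit in {width} bits")
    bits = format(index, f"0{width}b")
    return [bits[i:i + code_bits] for i in range(0, width, code_bits)]


def decode_position(codes: list[str]) -> int:
    """Inverse of position_codes."""
    return int("".join(codes), 2)


codes = position_codes(5, code_bits=3, num_codes=2)
print(codes)                   # ['000', '101']
print(decode_position(codes))  # 5
```

Because the position code uses the same 3-bit alphabet as the code segment, it is indistinguishable in format from the surrounding codes once spliced in.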
As an optional embodiment, the server obtains a second code segment set, which includes the second code segment corresponding to each first code segment in the data to be stored, and performs an exclusive-or (XOR) operation on the second code segment set using a data key to obtain an XOR group, which represents the encrypted data to be stored. The XOR operation can be replaced by other encryption modes, such as RC4 (Rivest Cipher 4, a stream cipher) or 3DES (Triple Data Encryption Algorithm). However, once the weakness to frequency-based cracking is overcome, even the simplest XOR encryption can become an encryption mode that is difficult to crack.
The second code segment set is the data to be stored after data processing. Because frequency statistics and frequency supplementation are carried out while the original data is retained, it is difficult to gather statistics on the data distribution of the processed data. When the second code segment set is encrypted and stored, an encryption algorithm that has a frequency-statistics weakness can therefore be used; if an encryption algorithm without that weakness is used, the encryption adds further confusion.
As an alternative example, the following describes the technical solution of the present application in combination with the specific embodiments:
The data S is cut into segments of unequal length. Each segment is denoted S_i, and the index interval of S_i within S is [x_i, y_i], where x_i ≤ y_i. Each segment is then expanded, wherein the characters used for expansion balance the frequencies of S_i against the character set Σ.
For example, if the character set Σ is all lowercase letters, abcdefghijklmn … … xyz, and S_i = abcd, then the frequency table is { a =1, b =1, c =1, d =1, e =0 … … x =0, y =0, z =0 }.
The character set itself can be encoded in several ways (for example a = 1, b = 2, c = 3). A character string can also be converted to its binary representation and then re-encoded with codes of equal binary width. For example, abc can be encoded as the char array ['a', 'b', 'c']; in binary, the values corresponding to a, b and c are 97, 98 and 99, which as 7-bit binary numbers are 1100001, 1100010 and 1100011, together 110000111000101100011. A new K-bit code is then customized: with K = 3, the binary string is grouped as 110|000|111|000|101|100|011, and frequency statistics are performed on the new grouping. The frequency table of this grouping is {'000': 2, '110': 1, '111': 1, '101': 1, '100': 1, '011': 1}, where K = 3 and the character set consists of 000, 001, 010, 011, 100, 101, 110 and 111, 8 codes in total.
After the frequency table of S_i is obtained, the frequency supplementation step performs randomized supplementation over the whole character set, so that every character of the character set appears with a balanced (equal) frequency. The steps are as follows:
(1) First, obtain the frequency table of S_i and its maximum frequency M; for example, in the binary frequency table of the above example, '000' has the maximum frequency.
A new character string K is then generated according to whether each character i of the full character set Σ appears in the frequency table f. The generation algorithm of K is as follows:
K = ""
for i in Σ:
    if i in f:
        K += i * (M - f[i])  # top up character i to M occurrences
    else:
        K += i * M  # i does not occur in S_i, so add M copies
The algorithm supplements the frequencies of the other characters up to the maximum frequency M so that all frequencies are equalized. The symbols have the following meanings: Σ is the full character set, i is any character in Σ, f[i] is the frequency of character i in the frequency table, and M is the maximum frequency. The algorithm traverses the full character set (for i in Σ). If character i exists in the frequency table f (if i in f), it supplements (M - f[i]) copies of i, i.e. appends (M - f[i]) copies of i to K (K += i * (M - f[i])); otherwise (else) it supplements M copies of i, i.e. appends M copies of i to K (K += i * M).
(2) Perform a Random Shuffle on the character string K to obtain a new character string K'.
(3) Combine K' and S_i to obtain a new string SK_i. They can be combined at any positions, as long as the original order of S_i is preserved.
(4) If necessary, the individual SK_i can be extended into groups of equal length; the algorithm only needs to perform further frequency-supplementing expansion on the generated SK_i.
(5) The grouping information, expressed over the character set Σ, is added to SK_i so that it can be used during restoration. This also slightly disturbs the frequencies and makes statistics more difficult. If there is a stronger requirement, this step may be skipped and the information stored off-chain.
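Step (3) above allows K' to be merged into S_i at arbitrary positions provided the order of S_i is kept. A minimal sketch of such an order-preserving merge is given below. The function name `interleave` is hypothetical, and returning the list of positions occupied by S_i is an assumption added so that the example can show restoration; the text does not say how those positions are recorded.

```python
import random


def interleave(original: list[str], filler: list[str],
               rng: random.Random) -> tuple[list[str], list[int]]:
    """Merge filler into original at random positions, keeping original's order.

    Returns the merged list and the sorted positions that hold original codes.
    """
    total = len(original) + len(filler)
    # Pick which slots of the result will hold the original codes.
    slots = sorted(rng.sample(range(total), len(original)))
    slot_set = set(slots)
    orig_iter, fill_iter = iter(original), iter(filler)
    merged = [next(orig_iter) if pos in slot_set else next(fill_iter)
              for pos in range(total)]
    return merged, slots


s_i = list("aab")
k_prime = list("dcbdc")  # an already shuffled supplementary string
sk_i, slots = interleave(s_i, k_prime, random.Random(3))

# Reading the recorded slots in order recovers S_i unchanged.
print([sk_i[pos] for pos in slots])  # ['a', 'a', 'b']
```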
The above algorithm is repeated to obtain all SK_i. A key C is then generated and XORed with the binary representation of the character strings. The XOR is performed by converting the character string into a char array (character array), then into a byte array, and finally XORing the byte array with C.
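The string-to-bytes-to-XOR procedure can be sketched as follows. The function names `xor_with_key` and `xor_back` are hypothetical, UTF-8 is assumed for the byte conversion, and repeating the key cyclically over the data is an assumption, since the text does not specify how a key shorter than the data is applied. The fixed key is for illustration only; a real key would come from a secure random source.

```python
from itertools import cycle


def xor_with_key(text: str, key: bytes) -> bytes:
    """Convert text to a byte array and XOR it with a cyclically repeated key."""
    if not key:
        raise ValueError("key must not be empty")
    data = text.encode("utf-8")  # character string -> byte array
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def xor_back(cipher: bytes, key: bytes) -> str:
    """Undo xor_with_key: XOR is its own inverse under the same key."""
    if not key:
        raise ValueError("key must not be empty")
    plain = bytes(b ^ k for b, k in zip(cipher, cycle(key)))
    return plain.decode("utf-8")


key_c = b"\x13\x37"  # illustrative key only
x_i = xor_with_key("aabbccdd", key_c)
print(xor_back(x_i, key_c))  # aabbccdd
```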
In this way we obtain XOR groups X_i that are frequency-equalized with respect to the character set Σ. The X_i are then stored in different chains or different transactions, which completes a scattered storage in which neither the content nor the partitioning of the data can be perceived. Only when C is known can the exact data partitioning be known.
The exclusive-or operation can be converted into other encryption modes, but after the defect of frequency cracking is overcome, the simplest exclusive-or encryption mode can also become an encryption mode which is difficult to crack.
In this scheme, frequency statistics are collected after the data to be stored has been re-encoded, so the bit width of each code can be user-defined. Choosing a smaller bit width reduces the size of the frequency table and the frequency supplement workload, and, because the bit width is not tied to that of existing encodings, it also makes statistical analysis harder. The data to be stored is cut into segments of unequal length and each data segment is processed separately, which satisfies the requirement of unequal-length storage. Coding frequency statistics are collected on the coding segment that represents one data segment, and the coding segment is frequency-supplemented while the original coding segment is retained, so that every code occurs with the same frequency in the supplemented coding segment. The supplemented coding segments are then supplemented further so that the coding frequency is equal across all coding segments, which makes the distribution of data among the data segments harder to analyze. The coding information is added to the supplemented coding segment in the same representation as the coding segment itself, which makes it convenient to restore the data to be stored and also perturbs the frequencies slightly, further increasing the statistical difficulty. If the data to be stored has stronger encryption requirements, the coding information can be stored off-chain instead of being added to the coding segment. Once every data segment has been processed, the processed data to be stored can be encrypted with an encryption algorithm that has a frequency-statistics weakness (such as XOR encryption); if an encryption algorithm without that weakness is used, the processing still obfuscates the data to be stored and further increases the difficulty of analysis. In this way the storage party cannot determine the distribution of the original data during data storage, which solves the technical problem that data storage cannot prevent the storage party from analyzing the data distribution, and thereby achieves the technical effect of enhanced data confidentiality.
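The storage cost of the frequency supplement follows directly from this description. As a sketch, with symbols introduced here only for illustration (they do not appear in the original text): let C be the standard code set, f_c the first frequency of code c in the first coding segment, n the length of the first coding segment in codes, and S_2 the second coding segment before any position code is spliced in.

```latex
\begin{aligned}
n &= \sum_{c \in C} f_c, \qquad f_{\max} = \max_{c \in C} f_c, \\
|S_2| &= |C| \cdot f_{\max}, \\
|S_2| - n &= \sum_{c \in C} \left( f_{\max} - f_c \right) = |C| \cdot f_{\max} - n .
\end{aligned}
```

For a custom coding format that takes a k-bit binary string as one code, |C| = 2^k, so the frequency table has 2^k entries; this is the sense in which a smaller code bit width reduces the size of the frequency table.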
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
According to another aspect of the embodiments of the present application, there is also provided a data processing apparatus for implementing the above data processing method. Fig. 3 is a schematic diagram of an alternative data processing apparatus according to an embodiment of the present application, which may include, as shown in fig. 3: an obtaining module 32, configured to obtain a first encoding segment, where the first encoding segment is used to represent one data segment in the data to be stored; a counting module 34, configured to count a first frequency of each code in the standard code set appearing in the first code segment, where the standard code set includes all codes in the first code segment; a generating module 36, configured to generate a second code segment according to a maximum frequency in all the first frequencies, where the second code segment carries the first code segment, and a frequency of each code in the standard code set appearing in the second code segment is the same as the maximum frequency.
It should be noted that the obtaining module 32 in this embodiment may be configured to execute step S202 in this embodiment, the counting module 34 in this embodiment may be configured to execute step S204 in this embodiment, and the generating module 36 in this embodiment may be configured to execute step S206 in this embodiment.
It should be noted here that the examples and application scenarios implemented by the modules described above are the same as those of the corresponding steps, but are not limited to what the above embodiments disclose. It should also be noted that the modules described above, as a part of the apparatus, may operate in a hardware environment as shown in fig. 1, and may be implemented by software or by hardware.
Through the above modules, the technical problem that data storage cannot prevent the storage party from analyzing the data distribution can be solved, and the technical effect of enhancing data confidentiality is achieved.
As an alternative embodiment, the generating module 36 further includes: a calculating unit, configured to subtract the first frequency of each code in the standard code set from the maximum frequency to obtain a second frequency of that code; a generating unit, configured to generate a third coding segment according to the second frequency of each code in the standard code set, wherein the frequency at which each code in the standard code set appears in the third coding segment is the second frequency of that code; and a combination unit, configured to combine the first coding segment and the third coding segment to obtain the second coding segment.
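The three units above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `supplement` and the representation of a coding segment as a Python list of codes are assumptions made here.

```python
from collections import Counter


def supplement(first_segment, standard_set):
    """Return (third_segment, second_segment) for one first coding segment.

    `first_segment` is a list of codes; `standard_set` is an ordered
    collection that must contain every code in `first_segment`.
    Both names are hypothetical, chosen for this sketch.
    """
    first_freq = Counter(first_segment)
    unknown = set(first_freq) - set(standard_set)
    if unknown:
        raise ValueError(f"codes outside the standard code set: {unknown}")

    # Maximum frequency among all first frequencies.
    max_freq = max(first_freq.values(), default=0)

    # Second frequency = maximum frequency - first frequency; the third
    # segment contains each code exactly that many times.
    third_segment = []
    for code in standard_set:
        third_segment.extend([code] * (max_freq - first_freq[code]))

    # The second segment carries the first segment unchanged.
    second_segment = list(first_segment) + third_segment
    return third_segment, second_segment


third, second = supplement(list("aab"), ["a", "b", "c"])
print(third)            # ['b', 'c', 'c']
print(Counter(second))  # every code appears twice
```

The third segment here is generated in standard-set order; any ordering preserves the equal-frequency property.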
Optionally, the combination unit is further configured to: randomly order all codes in the third coding segment; and splice the first coding segment and the randomly ordered third coding segment to obtain the second coding segment.
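A possible sketch of the random ordering and splicing step is shown below. The function name `combine`, the list representation of segments, and the choice of `random.SystemRandom` as the default randomness source are assumptions of this example; the source does not say which random source is used.

```python
import random


def combine(first_segment, third_segment, rng=None):
    """Shuffle the third segment and append it to the first segment.

    `rng` is any object with a `shuffle` method; it defaults to the
    operating system's randomness source (an assumption of this sketch).
    """
    rng = rng or random.SystemRandom()
    shuffled = list(third_segment)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    # The first segment stays at the front, in its original order.
    return list(first_segment) + shuffled


second = combine(["a", "a", "b"], ["b", "c", "c"])
print(second[:3])  # ['a', 'a', 'b']
```

Because the first coding segment is kept intact at the front, it can be recovered from the second coding segment once its length is known.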
As an alternative embodiment, the obtaining module 32 further includes: the cutting unit is used for cutting the data to be stored to obtain one or more data segments; and the coding unit is used for coding each data segment into a first coding segment by adopting a preset coding format, wherein the preset coding format comprises a custom coding format and a character format.
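The cutting and coding units might look like the sketch below. The source does not specify how cut lengths are chosen, so the random length range (`min_len`, `max_len`) is purely an assumption, as are the function names and the restriction to bit widths that divide 8.

```python
import random


def cut_unequal(data: bytes, min_len: int, max_len: int, rng=None):
    """Cut `data` into consecutive segments of randomly chosen lengths."""
    if not 1 <= min_len <= max_len:
        raise ValueError("require 1 <= min_len <= max_len")
    rng = rng or random.Random()
    segments, pos = [], 0
    while pos < len(data):
        length = rng.randint(min_len, max_len)  # inclusive on both ends
        segments.append(data[pos:pos + length])  # last one may be shorter
        pos += length
    return segments


def encode_custom(segment: bytes, bits_per_code: int):
    """Encode one data segment as a list of fixed-width binary codes."""
    if bits_per_code < 1 or 8 % bits_per_code != 0:
        raise ValueError("this sketch only supports widths that divide 8")
    bit_string = "".join(f"{byte:08b}" for byte in segment)
    return [bit_string[i:i + bits_per_code]
            for i in range(0, len(bit_string), bits_per_code)]


segments = cut_unequal(b"data to be stored", 2, 5)
first_segments = [encode_custom(s, 4) for s in segments]
print(encode_custom(b"\x0f", 4))  # ['0000', '1111']
```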
Optionally, the counting module 34 further includes: a determining unit, configured to determine the standard code set according to the custom coding format, wherein, if the custom coding format takes a binary string of a specified number of bits as one code, the standard code set contains all binary strings of that number of bits; and a statistical unit, configured to count the frequency at which each code in the standard code set appears in the first coding segment to obtain the first frequency, wherein the first coding segment is divided into a plurality of codes according to the custom coding format.
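As an illustration of the determining unit and the statistical unit for a custom coding format, the sketch below builds the standard code set for a given bit width and counts first frequencies. The function names and the representation of the first coding segment as a string of '0'/'1' characters are assumptions of this example.

```python
from collections import Counter
from itertools import product


def standard_code_set(bits_per_code: int):
    """All binary strings of exactly `bits_per_code` bits, in order."""
    return ["".join(bits) for bits in product("01", repeat=bits_per_code)]


def first_frequencies(bit_string: str, bits_per_code: int):
    """Split a bit string into codes and count every standard code."""
    if len(bit_string) % bits_per_code != 0:
        raise ValueError("segment length is not a multiple of the code width")
    codes = [bit_string[i:i + bits_per_code]
             for i in range(0, len(bit_string), bits_per_code)]
    counts = Counter(codes)
    # Codes that never occur are reported with frequency 0.
    return {code: counts[code] for code in standard_code_set(bits_per_code)}


print(standard_code_set(2))                # ['00', '01', '10', '11']
print(first_frequencies("0001110000", 2))  # {'00': 3, '01': 1, '10': 0, '11': 1}
```

With a 2-bit code the frequency table has only four entries, whereas an 8-bit code would need 256.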
Optionally, the determining unit is further configured to determine a standard encoding set according to a character format; the statistical unit is further used for counting the frequency of each code in the standard code set appearing in the first code segment to obtain a first frequency, wherein the first code segment is divided into a plurality of codes according to a character format.
Optionally, the generating module 36 further includes: the acquisition unit is used for acquiring the position information of the data segment corresponding to the first coding segment in the data to be stored; the conversion unit is used for converting the position information into position codes by adopting a preset coding format; and the splicing unit is used for splicing the position codes into the second code segment.
Optionally, the obtaining unit is further configured to: if the data to be stored comprises a plurality of data segments, processing each second coding segment according to the following mode: acquiring the maximum value of the maximum frequencies corresponding to all the second coding segments; and performing coding frequency supplement on the second coding section according to the maximum value to obtain an updated second coding section, wherein the frequency of each code in the standard coding set appearing in the updated second coding section is the same as the maximum value.
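The cross-segment supplement can be sketched as follows, assuming each second coding segment is a list of codes; the function name `equalize_segments` is hypothetical.

```python
from collections import Counter


def equalize_segments(second_segments, standard_set):
    """Pad every second segment up to the largest per-segment maximum."""
    # Maximum frequency of each segment, then the maximum of those.
    per_segment_max = [max(Counter(seg).values(), default=0)
                       for seg in second_segments]
    global_max = max(per_segment_max, default=0)

    updated = []
    for seg in second_segments:
        counts = Counter(seg)
        extra = []
        for code in standard_set:
            extra.extend([code] * (global_max - counts[code]))
        # Appended in standard-set order here; the padding could also be
        # shuffled, as is done for the third coding segment.
        updated.append(list(seg) + extra)
    return updated


segments = [list("ab"), list("aabb")]
for seg in equalize_segments(segments, ["a", "b"]):
    print(Counter(seg))  # both segments: 'a' twice, 'b' twice
```

After this step every updated second coding segment has the same length, so segment length no longer reveals how the data is distributed among the data segments.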
As an alternative embodiment, the generating module 36 is further configured to: acquiring a second coding segment set, wherein the second coding segment set comprises second coding segments corresponding to each first coding segment in the data to be stored; and carrying out XOR operation on the second coding segment set by using the data key to obtain an XOR group, wherein the XOR group is used for representing the encrypted data to be stored.
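A minimal sketch of the XOR step is given below. The source does not say how the data key is applied, so the repeating-key scheme, the byte representation of the second coding segments, and the function names are all assumptions of this example.

```python
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR `data` with `key`, repeating the key as often as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def xor_group(second_segments, key: bytes):
    """Apply the XOR operation to every second coding segment."""
    return [xor_with_key(segment, key) for segment in second_segments]


key = b"\x5a\xc3"  # placeholder key for illustration only
group = xor_group([b"aabbcc", b"abcabc"], key)
# XOR is its own inverse, so applying the same key again restores the input.
print(xor_group(group, key))  # [b'aabbcc', b'abcabc']
```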
It should be noted here that the examples and application scenarios implemented by the modules described above are the same as those of the corresponding steps, but are not limited to what the above embodiments disclose. It should also be noted that the modules described above, as a part of the apparatus, may operate in a hardware environment as shown in fig. 1, and may be implemented by software or by hardware, where the hardware environment includes a network environment.
According to another aspect of the embodiments of the present application, there is also provided a server or a terminal for implementing the data processing method.
Fig. 4 is a block diagram of a terminal according to an embodiment of the present application. As shown in fig. 4, the terminal may include: one or more processors 401 (only one is shown in fig. 4), a memory 403, and a transmission device 405. As shown in fig. 4, the terminal may further include an input-output device 407.
The memory 403 may be used to store software programs and modules, such as program instructions/modules corresponding to the data processing method and apparatus in the embodiment of the present application, and the processor 401 executes the software programs and modules stored in the memory 403 to execute various functional applications and data processing, that is, to implement the data processing method. The memory 403 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 403 may further include memory located remotely from processor 401, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 405 is used for receiving or sending data via a network, and may also be used for data transmission between the processor and the memory. Examples of the network may include a wired network and a wireless network. In one example, the transmission device 405 includes a network adapter (Network Interface Controller, NIC) that can be connected to a router and other network devices via a network cable to communicate with the internet or a local area network. In one example, the transmission device 405 is a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
In particular, the memory 403 is used for storing application programs.
The processor 401 may call the application stored in the memory 403 via the transmission device 405 to perform the following steps: acquiring a first coding segment, wherein the first coding segment is used for representing one data segment in data to be stored; counting a first frequency at which each code in a standard code set appears in the first coding segment, wherein the standard code set contains all codes in the first coding segment; and generating a second coding segment according to the maximum frequency in all the first frequencies, wherein the second coding segment carries the first coding segment, and the frequency of each code in the standard code set appearing in the second coding segment is the same as the maximum frequency.
The embodiment of the present application thus provides a data processing scheme. The data to be stored is represented by codes; coding frequency statistics are collected on the coding segment that represents one data segment; and the coding segment is frequency-supplemented while the original coding segment is retained, so that every code occurs with the same frequency in the supplemented coding segment. In this way the storage party cannot determine the distribution of the original data during data storage, which solves the technical problem that data storage cannot prevent the storage party from analyzing the data distribution, and achieves the technical effect of enhancing data confidentiality.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
It can be understood by those skilled in the art that the structure shown in fig. 4 is only an illustration, and the terminal may be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD. Fig. 4 does not limit the structure of the electronic device. For example, the terminal may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 4, or have a different configuration than shown in fig. 4.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Embodiments of the present application also provide a storage medium. Optionally, in this embodiment, the storage medium may be used to store program code for executing the data processing method.
Optionally, in this embodiment, the storage medium may be located on at least one of a plurality of network devices in a network shown in the above embodiment.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
S1, acquiring a first coding segment, wherein the first coding segment is used for representing one data segment in the data to be stored;
S2, counting the first frequency of each code in the standard code set appearing in the first code segment, wherein the standard code set contains all codes in the first code segment;
S3, generating a second code segment according to the maximum frequency in all the first frequencies, wherein the second code segment carries the first code segment, and the frequency of each code in the standard code set appearing in the second code segment is the same as the maximum frequency.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
Optionally, in this embodiment, the storage medium may include, but is not limited to: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence, or the part of it that contributes over the prior art, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to execute all or part of the steps of the method described in the embodiments of the present application.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.
Claims (12)
1. A data processing method, comprising:
acquiring a first coding segment, wherein the first coding segment is used for representing one data segment in data to be stored;
counting a first frequency at which each code in a standard code set occurs in the first code segment, wherein the standard code set contains all codes in the first code segment;
and generating a second coding segment according to the maximum frequency in all the first frequencies, wherein the second coding segment carries the first coding segment, and the frequency of each code in the standard coding set appearing in the second coding segment is the same as the maximum frequency.
2. The method of claim 1, wherein generating the second coding segment according to the maximum frequency in all the first frequencies comprises:
subtracting the first frequency of each code in the standard code set from the maximum frequency to obtain a second frequency of the code;
generating a third code segment according to the second frequency of each code in the standard code set, wherein the frequency of each code in the standard code set occurring in the third code segment is the second frequency of the code;
and combining the first coding segment and the third coding segment to obtain the second coding segment.
3. The method of claim 2, wherein combining the first coded segment with the third coded segment to obtain the second coded segment comprises:
randomly sequencing all codes in the third code segment;
and splicing the first coding segment and the randomly sequenced third coding segment to obtain the second coding segment.
4. The method of claim 1, wherein obtaining the first code segment comprises:
cutting the data to be stored to obtain one or more data segments;
and coding each data segment into one first code segment by adopting a preset code format, wherein the preset code format comprises a custom code format and a character format.
5. The method of claim 4, wherein counting the first frequency at which each code in the standard code set occurs in the first code segment comprises:
determining the standard coding set according to the self-defined coding format, wherein if the self-defined coding format takes the binary character string with the specified digit as one code, the standard coding set comprises all the binary character strings which accord with the specified digit;
and counting the frequency of each code in the standard code set appearing in the first code segment to obtain the first frequency, wherein the first code segment is divided into a plurality of codes according to the custom code format.
6. The method of claim 4, wherein counting the first frequency at which each code in the standard code set occurs in the first code segment comprises:
determining the standard coding set according to the character format;
and counting the frequency of each code in the standard code set appearing in the first code segment to obtain the first frequency, wherein the first code segment is divided into a plurality of codes according to the character format.
7. The method of claim 4, wherein after generating the second code segment according to the maximum frequency in all the first frequencies, the method further comprises:
acquiring the position information of a data segment corresponding to the first coding segment in the data to be stored;
converting the position information into a position code by adopting the preset coding format;
splicing the position code into the second code segment.
8. The method of claim 7, wherein before obtaining the location information of the data segment corresponding to the first encoded segment in the data to be stored, the method further comprises:
if the data to be stored comprises a plurality of data segments, processing each second coding segment according to the following mode:
acquiring the maximum value of the maximum frequencies corresponding to all the second coding segments;
and performing coding frequency supplement on the second coding section according to the maximum value to obtain an updated second coding section, wherein the frequency of each code in the standard coding set appearing in the updated second coding section is the same as the maximum value.
9. The method of any of claims 1 to 8, wherein after generating the second code segment according to the maximum frequency in all the first frequencies, the method further comprises:
acquiring a second coding segment set, wherein the second coding segment set comprises second coding segments corresponding to each first coding segment in the data to be stored;
and carrying out XOR operation on the second coding segment set by using a data key to obtain an XOR group, wherein the XOR group is used for representing the encrypted data to be stored.
10. A data processing apparatus, comprising:
an obtaining module, configured to obtain a first coding segment, where the first coding segment is used for representing one data segment in data to be stored;
a counting module, configured to count a first frequency of each code in a standard code set appearing in the first code segment, where the standard code set includes all codes in the first code segment;
a generating module, configured to generate a second coding segment according to a maximum frequency in all the first frequencies, where the second coding segment carries the first coding segment, and a frequency of each code in the standard coding set appearing in the second coding segment is the same as the maximum frequency.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the method of any of the preceding claims 1 to 9 by means of the computer program.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the data processing method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111593205.1A CN113987556B (en) | 2021-12-24 | 2021-12-24 | Data processing method and device, electronic equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111593205.1A CN113987556B (en) | 2021-12-24 | 2021-12-24 | Data processing method and device, electronic equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113987556A true CN113987556A (en) | 2022-01-28 |
CN113987556B CN113987556B (en) | 2022-05-10 |
Family
ID=79734186
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111593205.1A Active CN113987556B (en) | 2021-12-24 | 2021-12-24 | Data processing method and device, electronic equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113987556B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116738471A (en) * | 2023-08-10 | 2023-09-12 | 陕西昕晟链云信息科技有限公司 | Block chain-based decentralization data analysis method |
Citations (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5045853A (en) * | 1987-06-17 | 1991-09-03 | Intel Corporation | Method and apparatus for statistically encoding digital data |
JP2006165801A (en) * | 2004-12-03 | 2006-06-22 | Canon Inc | Coding mode discrimination apparatus and method |
TW201131369A (en) * | 2010-03-09 | 2011-09-16 | Silicon Motion Inc | Electronic apparatus and method for storing data in a memory |
CN102460404A (en) * | 2009-06-01 | 2012-05-16 | 起元技术有限责任公司 | Generating obfuscated data |
US20130297573A1 (en) * | 2012-05-07 | 2013-11-07 | Sybase Inc. | Character Data Compression for Reducing Storage Requirements in a Database System |
CN106484753A (en) * | 2016-06-07 | 2017-03-08 | 湖南千年华光软件开发有限公司 | Data processing method |
CN107332567A (en) * | 2017-06-09 | 2017-11-07 | 西安万像电子科技有限公司 | Coding method and device |
CN109697277A (en) * | 2017-10-20 | 2019-04-30 | 北京京东尚科信息技术有限公司 | The method and apparatus of Text compression |
CN110266316A (en) * | 2019-05-08 | 2019-09-20 | 阿里巴巴集团控股有限公司 | A kind of data compression, decompressing method, device and equipment |
CN110569487A (en) * | 2019-08-19 | 2019-12-13 | 积成电子股份有限公司 | base64 extension coding method and system based on high-frequency character substitution algorithm |
CN110932822A (en) * | 2019-12-02 | 2020-03-27 | 泰康保险集团股份有限公司 | Data encoding method, data decoding method, device, equipment and storage medium |
CN111130558A (en) * | 2019-12-31 | 2020-05-08 | 世纪恒通科技股份有限公司 | Coding table compression method based on statistical probability |
Non-Patent Citations (1)
Title |
---|
董乾: "《采用哈夫曼编码技术提高硬件无损压缩效率的算法研究》", 《中国博士学位论文全文数据库(信息科技辑)》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116738471A (en) * | 2023-08-10 | 2023-09-12 | 陕西昕晟链云信息科技有限公司 | Block chain-based decentralization data analysis method |
CN116738471B (en) * | 2023-08-10 | 2023-10-20 | 陕西昕晟链云信息科技有限公司 | Block chain-based decentralization data analysis method |
Also Published As
Publication number | Publication date |
---|---|
CN113987556B (en) | 2022-05-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108377183B (en) | XDR data information encryption method, device, equipment and medium | |
CN115865523B (en) | Data encryption transmission method for information analysis system | |
CN110138739B (en) | Data information encryption method and device, computer equipment and storage medium | |
CN111683046A (en) | Method, device, equipment and storage medium for compressing and acquiring file | |
CN105100085B (en) | A kind of method and apparatus that information is encrypted and decrypted | |
US7925012B2 (en) | Method and system for the secure distribution of compressed digital texts | |
CN112235104B (en) | Data encryption transmission method, system, terminal and storage medium | |
CN110365468B (en) | Anonymization processing method, device, equipment and storage medium | |
CN116151740B (en) | Inventory transaction data process safety management system and cloud platform | |
CN107105324B (en) | Method and client for protecting bullet screen information | |
CN110995391A (en) | Data transmission method in isolated network, server and terminal | |
CN113987556B (en) | Data processing method and device, electronic equipment and storage medium | |
CN115150818B (en) | Communication transmission encryption method based on artificial intelligence | |
CN114422134A (en) | Data secure transmission method and equipment | |
CN112437060A (en) | Data transmission method and device, computer equipment and storage medium | |
CN114285575A (en) | Image encryption and decryption method and device, storage medium and electronic device | |
CN115296862A (en) | Network data secure transmission method based on data coding | |
CN114614829A (en) | Satellite data frame processing method and device, electronic equipment and readable storage medium | |
CN116821967B (en) | Intersection computing method and system for privacy protection | |
CN113904832A (en) | Data encryption method, device, equipment and storage medium | |
CN117201797A (en) | Remote sensing image data processing method, device, equipment and storage medium | |
KR101045222B1 (en) | Method of encrypting and synthesizing personal information into order information and contents information, apparatus, server and recording media | |
CN112953716A (en) | Method and device for generating and verifying exchange code | |
CN107092815A (en) | The method and server of a kind of protection module file | |
CN113343269B (en) | Encryption method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
EE01 | Entry into force of recordation of patent licensing contract |
Application publication date: 20220128 Assignee: Hangzhou Quanke Technology Co.,Ltd. Assignor: HANGZHOU HYPERCHAIN TECHNOLOGIES Co.,Ltd. Contract record no.: X2022980029948 Denomination of invention: Data processing method and device, electronic equipment, storage medium Granted publication date: 20220510 License type: Common License Record date: 20230115 |