CN113987556B - Data processing method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN113987556B
Authority
CN
China
Prior art keywords
segment
code
coding
frequency
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111593205.1A
Other languages
Chinese (zh)
Other versions
CN113987556A (en
Inventor
曾磊
邵羽
詹士潇
李伟
匡立中
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Qulian Technology Co Ltd
Original Assignee
Hangzhou Qulian Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Qulian Technology Co Ltd filed Critical Hangzhou Qulian Technology Co Ltd
Priority to CN202111593205.1A priority Critical patent/CN113987556B/en
Publication of CN113987556A publication Critical patent/CN113987556A/en
Application granted granted Critical
Publication of CN113987556B publication Critical patent/CN113987556B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08 Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0861 Generation of secret information including derivation or calculation of cryptographic keys or passwords
    • H04L9/0863 Generation of secret information including derivation or calculation of cryptographic keys or passwords involving passwords or one-time passwords

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The application relates to a data processing method and device, electronic equipment and a storage medium, wherein the method comprises the following steps: acquiring a first coding segment, wherein the first coding segment is used for representing one data segment in data to be stored; counting a first frequency of each code in a standard code set appearing in a first code segment, wherein the standard code set comprises all codes in the first code segment; and generating a second coding segment according to the maximum frequency in all the first frequencies, wherein the second coding segment carries the first coding segment, and the frequency of each code in the standard coding set appearing in the second coding segment is the same as the maximum frequency. The data storage method and device solve the technical problem that data storage cannot avoid data distribution conditions counted by a storage party.

Description

Data processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of data processing, and in particular, to a data processing method and apparatus, an electronic device, and a storage medium.
Background
With the development of the internet, the volume of data generated on it is growing geometrically. In the face of massive data, the defects of traditional storage, such as poor scalability and single points of failure, are becoming more and more obvious, and traditional centralized storage is gradually being replaced by distributed storage. Distributed data storage technology aims to establish a novel distributed encrypted storage network and provide efficient storage services for users.
When data is stored in a distributed manner, the data owner does not want the storage party to learn the distribution of the data. However, when encryption and similar methods are adopted in the related art, the likely data distribution can still be inferred through frequency statistics.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The application provides a data processing method and device, electronic equipment and a storage medium, which are used for at least solving the technical problem that data storage cannot avoid data distribution situation statistics by a storage party in the related technology.
According to an aspect of an embodiment of the present application, there is provided a data processing method, including: acquiring a first coding segment, wherein the first coding segment is used for representing one data segment in data to be stored; counting a first frequency with which each code in a standard code set appears in the first code segment, wherein the standard code set contains all codes in the first code segment; and generating a second coding segment according to the maximum frequency in all the first frequencies, wherein the second coding segment carries the first coding segment, and the frequency of each code in the standard coding set appearing in the second coding segment is the same as the maximum frequency.
According to another aspect of the embodiments of the present application, there is also provided a data processing apparatus, including: the device comprises an acquisition module, a storage module and a processing module, wherein the acquisition module is used for acquiring a first coding segment, and the first coding segment is used for representing one data segment in data to be stored; the statistical module is used for counting the first frequency of each code in the standard code set appearing in the first code segment, wherein the standard code set comprises all codes in the first code segment; and the generating module is used for generating a second coding segment according to the maximum frequency in all the first frequencies, wherein the second coding segment carries the first coding segment, and the frequency of each code in the standard coding set appearing in the second coding segment is the same as the maximum frequency.
According to another aspect of the embodiments of the present application, there is also provided a storage medium including a stored program which, when executed, performs the above-described method.
According to another aspect of the embodiments of the present application, there is also provided an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor executes the above method through the computer program.
According to an aspect of the application, a computer program product or computer program is provided, comprising computer instructions, the computer instructions being stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the steps of any of the embodiments of the method described above.
In the embodiment of the application, a first coding segment is obtained, wherein the first coding segment is used for representing one data segment in data to be stored; a first frequency with which each code in a standard code set appears in the first code segment is counted, wherein the standard code set contains all codes in the first code segment; and a second coding segment is generated according to the maximum frequency in all the first frequencies, wherein the second coding segment carries the first coding segment, and the frequency of each code in the standard coding set appearing in the second coding segment is the same as the maximum frequency. The data to be stored is represented by codes, coding frequency statistics is performed on the coding segment representing one data segment, and coding frequency supplementation is performed on that coding segment while the original coding segment is retained, so that every code has the same frequency in the coding segment after frequency supplementation. This achieves the purpose that the storage party cannot count the distribution of the original data during data storage, solves the technical problem that data storage cannot prevent the storage party from counting the data distribution, and achieves the technical effect of enhancing data confidentiality.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a schematic diagram of a hardware environment for a data processing method according to an embodiment of the present application;
FIG. 2 is a flow chart of an alternative data processing method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of an alternative data processing apparatus according to an embodiment of the present application;
FIG. 4 is a block diagram of a terminal according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some of the nouns and terms appearing in the description of the embodiments of the present application are explained as follows:
Standard coding set: in this application, a standard coding set refers to the full set of codes, i.e. the set of all codes that conform to the coding format.
Frequency statistics: in this application, frequency statistics refers to counting the number of times each code in the standard code set appears in a code segment.
Frequency supplement: in this application, frequency supplementation refers to adding a certain number of codes to a code segment according to a certain frequency, so that the number of times of occurrence of each code in the code segment after frequency supplementation is the same.
According to an aspect of embodiments of the present application, there is provided an embodiment of a method for data processing.
Alternatively, in the present embodiment, the data processing method described above may be applied to a hardware environment constituted by the terminal 101 and the server 103 as shown in fig. 1. As shown in fig. 1, the server 103 is connected to the terminal 101 through a network and may be used to provide data processing services for the terminal or for a client installed on the terminal. A database 105 may be provided on the server or separately from the server to provide data storage services for the server 103. The type of network is not limited, and the terminal 101 is not limited to a PC, a mobile phone, a tablet computer, and the like. The data processing method according to the embodiment of the present application may be executed by the server 103, by the terminal 101, or by both the server 103 and the terminal 101. When the terminal 101 executes the data processing method according to the embodiment of the present application, the method may also be executed by a client installed on the terminal. A data processing method executed on a server according to an embodiment of the present application will be described as an example.
Fig. 2 is a flow chart of an alternative data processing method according to an embodiment of the present application, which may include the following steps, as shown in fig. 2:
step S202, a server acquires a first coding segment, wherein the first coding segment is used for representing one data segment in data to be stored;
step S204, the server counts the first frequency of each code in the standard code set appearing in the first code segment, and the standard code set comprises all the codes in the first code segment;
step S206, the server generates a second code segment according to the maximum frequency of all the first frequencies, the second code segment carries the first code segment, and the frequency of each code in the standard code set appearing in the second code segment is the same as the maximum frequency.
Through the steps S202 to S206, the data to be stored is represented by codes, coding frequency statistics is performed on the coding segment representing one data segment, and coding frequency supplementation is performed on that coding segment while the original coding segment is retained, so that every code has the same frequency in the coding segment after frequency supplementation. This achieves the purpose that the storage party cannot count the distribution of the original data during data storage, solves the technical problem that data storage cannot prevent the storage party from counting the data distribution, and achieves the technical effect of enhancing data confidentiality.
In the technical solution provided in step S202, the server acquires a first encoding segment, where the first encoding segment is used to represent one data segment in the data to be stored.
The first coding segment represents a data segment, and when the data to be stored comprises a plurality of data segments, the plurality of first coding segments corresponding to the data segments are adopted to represent all the data to be stored.
The first coding segment may be obtained by directly taking the character string in the data segment, or by re-encoding that character string into the first coding segment in another coding mode. The coding format may be one commonly used in the prior art, such as Unicode, ASCII (American Standard Code for Information Interchange) or GBK (a Chinese character encoding), or it may be a coding format with a customised code width (that is, N code bits are treated as one code) built on an existing coding format.
For example, [abc] is a character string in one data segment. (1) Character format: the first code segment may be the directly obtained character string [a, b, c], where each character is one code. (2) Custom coding format: the first code segment may also be obtained by equal-length coding of the character string in another coding format. The ASCII values corresponding to a, b and c are 97, 98 and 99, which in 7-bit binary are 1100001, 1100010 and 1100011, giving the first code segment [1100001, 1100010, 1100011]. The code width can then be customised on the basis of this coding format: when the code width is 3 bits, the first code segment is [110, 000, 111, 000, 101, 100, 011].
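The regrouping in the example above can be sketched in Python as follows. This is a minimal sketch, assuming 7-bit ASCII input; the function name `regroup_bits` and the zero-padding of a short trailing group are illustrative assumptions and are not specified in the application.

```python
def regroup_bits(text: str, k: int, width: int = 7) -> list[str]:
    """Encode each character as a fixed-width binary string, then regroup into k-bit codes."""
    # Concatenate the fixed-width binary form of every character (ASCII assumed).
    bits = "".join(format(ord(ch), f"0{width}b") for ch in text)
    # Assumption: pad the tail with zeros when the bit string is not a multiple of k.
    if len(bits) % k:
        bits += "0" * (k - len(bits) % k)
    return [bits[i:i + k] for i in range(0, len(bits), k)]


first_segment = regroup_bits("abc", 3)
print(first_segment)  # ['110', '000', '111', '000', '101', '100', '011']
```

If padding is applied, the original bit length would have to be recorded somewhere for restoration; the application does not say how, so that part is left out here.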
As an alternative embodiment, the server may obtain the first code segment as follows: the server cuts the data to be stored to obtain one or more data segments, where the cutting mode can be set according to requirements. If the data needs to be stored in unequal lengths, the data to be stored can be cut into unequal lengths; if the data needs to be stored in equal lengths, the data to be stored can be cut into fixed lengths. Each data segment is then encoded into a first code segment in a preset coding format, where the preset coding format includes a custom coding format and a character format.
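The two cutting modes can be illustrated with a short Python sketch. The function name `cut_data`, the `max_len` parameter and the choice of uniformly random segment lengths for the unequal-length case are assumptions made for illustration; the application only states that the cutting mode is configurable.

```python
import random


def cut_data(data: str, max_len: int, fixed: bool = False, rng=None):
    """Cut data into segments; return (start_offset, segment) pairs.

    fixed=True  -> every segment has length max_len (the last may be shorter).
    fixed=False -> each segment length is drawn from 1..max_len (assumed policy).
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    if rng is None:
        rng = random.Random()
    segments, start = [], 0
    while start < len(data):
        length = max_len if fixed else rng.randint(1, max_len)
        segments.append((start, data[start:start + length]))
        start += length
    return segments


print(cut_data("hello world", 4, fixed=True))
# [(0, 'hell'), (4, 'o wo'), (8, 'rld')]
```

Returning the start offset with each segment keeps the position information available for later restoration.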
In the technical solution provided in step S204, the server counts a first frequency of each code in the standard code set appearing in the first code segment, where the standard code set includes all codes in the first code segment.
The standard code set referred to in this application is a code set used for frequency statistics, in which each code occurs only once, and covers all possible codes occurring in the first code segment.
The first frequency refers to the number of occurrences of codes in the standard code set in the first code segment.
As an alternative embodiment, when the first coding segment represents the data segment in a character format, the server determines the standard coding set according to the character format, and counts the frequency with which each code in the standard code set appears in the first code segment to obtain the first frequency, where the first code segment is divided into a plurality of codes according to the character format.
For example, if the first code segment represents the data segment using lowercase English letters, the standard code set includes all lowercase English letters, and the standard code set is { a, b, c, d, e, f, g, h, i, j, k, l, m, n … … x, y, z }. If the first code segment is [ a, b, c, d ], the statistically obtained frequency table is { a =1, b =1, c =1, d =1, e =0 … … x =0, y =0, z =0}, where a =1 represents that a in the standard code set occurs 1 time in the first code segment, and e =0 represents that e in the standard code set occurs 0 times in the first code segment.
As another optional embodiment, when the first coding segment represents the data segment in the custom coding format, the server determines the standard coding set according to the custom coding format, where, if the custom coding format uses a binary character string with a specified number of bits as one code, the standard coding set contains all binary character strings with that number of bits. The server then counts the frequency with which each code in the standard code set appears in the first code segment to obtain the first frequency, where the first code segment is divided into a plurality of codes according to the custom coding format. When a custom coding format with a smaller code width is adopted, the size of the frequency table is reduced and the difficulty of statistics is increased.
For example, if the first coding segment represents the data segment by using a 3-bit binary string, the standard coding set includes all 3-bit binary strings, and the standard coding set is {000, 001, 010, 011, 100, 101, 110, 111}, and if the first coding segment is [110, 000, 111, 000, 101, 100, 011], the statistically obtained frequency table is { '000' =2, '001' =0, '010' =0, '011' =1, '100' =1, '101' =1, '110' =1, '111' =1}, where '000' =2 indicates that '000' in the standard coding set occurs 2 times in the first coding segment and '001' =0 indicates that '001' in the standard coding set occurs 0 times in the first coding segment.
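The frequency statistics for the 3-bit example above can be reproduced with the following Python sketch. The helper names `standard_code_set` and `frequency_table` are illustrative and do not come from the application.

```python
from itertools import product


def standard_code_set(k: int) -> list[str]:
    """All k-bit binary strings, i.e. the full code set for a k-bit custom format."""
    return ["".join(bits) for bits in product("01", repeat=k)]


def frequency_table(segment, code_set) -> dict:
    """Count how often each code of the standard set occurs in the segment."""
    table = {code: 0 for code in code_set}  # codes that never occur stay at 0
    for code in segment:
        table[code] += 1  # raises KeyError if the segment contains a code outside the set
    return table


segment = ["110", "000", "111", "000", "101", "100", "011"]
table = frequency_table(segment, standard_code_set(3))
print(table)
# {'000': 2, '001': 0, '010': 0, '011': 1, '100': 1, '101': 1, '110': 1, '111': 1}
print(max(table.values()))  # 2
```

Initialising every code to 0 matters: codes that are absent from the segment must still appear in the table so that they can be supplemented later.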
In the technical solution provided in step S206, the server generates a second code segment according to a maximum frequency of all the first frequencies, where the second code segment carries the first code segment, and a frequency of each code in the standard code set appearing in the second code segment is the same as the maximum frequency.
The maximum frequency refers to the frequency value of the code in the standard code set that occurs most often in the first code segment.
The second coding segment is generated to prevent the storage party from counting the real data distribution. It is obtained by performing frequency supplementation on the first coding segment, so that the frequency of each code in the standard coding set appearing in the second coding segment is the same as the maximum frequency. When the storage party counts the data distribution, what it counts is the coding frequency in the second coding segment after data processing.
As an alternative embodiment, the server subtracts the first frequency of each code in the standard code set from the maximum frequency to obtain the second frequency of the code; then generating a third code segment according to the second frequency of each code in the standard code set, wherein the frequency of each code in the standard code set appearing in the third code segment is the second frequency of the code; and finally, combining the first coding segment and the third coding segment to obtain a second coding segment. For example, the first code segment is [ aab ], the standard code set is { a, b, c, d, e, f, g, h, i, j, k, l, m, n … … x, y, z }, the statistically derived frequency table is { a =2, b =1, c =0, d =0, e =0 … … x =0, y =0, z =0}, the maximum frequency is 2, the generated third code segment is [ b, c, c, d, d, e, e, f, f … … z, z ], and for code b, the first frequency is 1 and the second frequency is 1.
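The construction of the third code segment in the [aab] example can be sketched as below. The function name `third_code_segment` is hypothetical; the logic follows the description (second frequency = maximum frequency minus first frequency).

```python
import string


def third_code_segment(first_segment, code_set):
    """Return (padding codes, maximum frequency) for a first code segment."""
    counts = {code: 0 for code in code_set}
    for code in first_segment:
        counts[code] += 1  # KeyError if a code is not in the standard set
    max_freq = max(counts.values())
    padding = []
    for code in code_set:
        # Second frequency: how many copies are still missing for this code.
        padding.extend([code] * (max_freq - counts[code]))
    return padding, max_freq


padding, max_freq = third_code_segment(list("aab"), string.ascii_lowercase)
print(max_freq)      # 2
print(padding[:5])   # ['b', 'c', 'c', 'd', 'd']
print(len(padding))  # 49
```

The padding has 1 copy of b plus 2 copies of each of the 24 letters c to z, so the combined segment holds every letter exactly twice.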
Optionally, the server may combine the first coding segment with the third coding segment to obtain the second coding segment as follows: all codes in the third code segment are randomly ordered, for which a Random Shuffle algorithm can be used; the first coding segment and the randomly ordered third coding segment are then spliced to obtain the second coding segment. The splicing may place the first coding segment in front and the third coding segment behind, or the third coding segment in front and the first coding segment behind.
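A minimal Python sketch of the shuffle-and-splice step follows. The function name `combine_segments` and the `first_in_front` flag are illustrative; using `random.SystemRandom` as the default randomness source is an assumption, since the application only names a Random Shuffle algorithm.

```python
import random


def combine_segments(first_segment, third_segment, first_in_front=True, rng=None):
    """Shuffle the padding codes and splice them before or after the first segment."""
    if rng is None:
        rng = random.SystemRandom()  # assumed choice: OS-backed randomness
    shuffled = list(third_segment)   # copy, so the caller's list is not modified
    rng.shuffle(shuffled)
    first = list(first_segment)      # the original order of the first segment is kept
    return first + shuffled if first_in_front else shuffled + first


second = combine_segments(list("aab"), list("bccdd"))
print(second[:3])  # ['a', 'a', 'b']
```

Only the padding is shuffled; the first segment stays contiguous and in order, which is what allows it to be recovered later.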
Optionally, if the data to be stored includes a plurality of data segments, each second encoding segment may be processed in the following manner, so as to increase the difficulty of counting the data distribution condition of each data segment in the data to be stored: the server acquires the maximum value of the maximum frequencies corresponding to all the second coding segments; and performing coding frequency supplement on the second coding section according to the maximum value to obtain an updated second coding section, wherein the frequency of each code in the standard coding set appearing in the updated second coding section is the same as the maximum value.
For example, the data to be stored includes three data segments A, B and C, which correspond to the second coding segment A, the second coding segment B and the second coding segment C, respectively. The maximum frequency of the second coding segment A is 3, that of the second coding segment B is 5, and that of the second coding segment C is 4, so the maximum value among them is 5. Coding frequency supplementation is performed on each second coding segment according to this value to obtain the updated second coding segments A, B and C, in each of which every code in the standard coding set appears with frequency 5.
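Equalising frequencies across several second coding segments can be sketched as follows. The function name `equalise_segments` is hypothetical, and the extra codes are simply appended here; in practice they would presumably be shuffled as in the single-segment case, which this sketch omits for brevity.

```python
from collections import Counter


def equalise_segments(second_segments, code_set):
    """Pad every segment so each code occurs as often as the largest per-segment maximum."""
    # Largest of the per-segment maximum frequencies (segments assumed non-empty).
    global_max = max(max(Counter(seg).values()) for seg in second_segments)
    updated = []
    for seg in second_segments:
        counts = Counter(seg)  # missing codes count as 0
        extra = [c for c in code_set for _ in range(global_max - counts[c])]
        updated.append(list(seg) + extra)
    return updated, global_max


segments = [list("ab"), list("aabb"), list("aaabbb")]
updated, global_max = equalise_segments(segments, "ab")
print(global_max)                 # 3
print([len(u) for u in updated])  # [6, 6, 6]
```

After this step all updated segments also have the same length, namely the size of the code set multiplied by the global maximum.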
Optionally, in order to facilitate restoring the data to be stored, the server acquires position information of the data segment corresponding to the first coding segment in the data to be stored, converts the position information into a position code using a preset coding format, where the preset coding format is the coding format adopted by the first coding segment, and splices the position code into the second code segment. Adding the position code to the second coding segment slightly perturbs the frequencies, which further increases the difficulty of counting the data distribution.
As an optional embodiment, the server obtains a second coding segment set, which includes the second coding segment corresponding to each first coding segment in the data to be stored, and performs an exclusive-or operation on the second coding segment set using a data key to obtain an exclusive-or group, which represents the encrypted data to be stored. The exclusive-or operation may be replaced with other encryption modes, such as RC4 (Rivest Cipher 4, a stream cipher) or 3DES (Triple Data Encryption Algorithm); however, once the weakness to frequency analysis is overcome, even the simplest exclusive-or encryption becomes difficult to crack.
The second coding segment set is the data to be stored after data processing. Because frequency statistics and frequency supplementation are performed while the original data is retained, the data distribution of the processed data is difficult to count. When the second coding segment set is encrypted for storage, an encryption algorithm that is vulnerable to frequency statistics may therefore be used; if an encryption algorithm without that vulnerability is used, the result is obfuscated further.
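The exclusive-or step can be sketched in Python as below. This is a sketch under stated assumptions: the key is repeated cyclically over the data, it is generated with `secrets.token_bytes`, and the segments are ASCII strings; the application does not fix the key length or how a short key is applied. The sample segments are made up for illustration.

```python
import secrets
from itertools import cycle


def xor_with_key(data: bytes, key: bytes) -> bytes:
    """XOR every data byte with the key, repeating the key cyclically (assumption)."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


# Hypothetical frequency-supplemented segments (string -> byte array -> XOR).
second_segments = ["aabzcy", "bbaxzc"]
key = secrets.token_bytes(16)  # data key, 16 bytes chosen arbitrarily
xor_group = [xor_with_key(seg.encode("ascii"), key) for seg in second_segments]

# XOR is its own inverse, so applying the same key restores the segments.
restored = [xor_with_key(x, key).decode("ascii") for x in xor_group]
print(restored == second_segments)  # True
```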
As an alternative example, the following describes the technical solution of the present application in combination with the specific embodiments:
The data S is cut into segments of unequal length, and each segment is denoted S_i. The subscript interval in S corresponding to S_i is [x_i, y_i], where x_i ≤ y_i. For each cut segment, the interval is extended, where the characters used for the extension serve to balance the frequencies of S_i over the character set Σ.
For example, the character set Σ is all lowercase letters, i.e. abcdefghijklmn … … xyz. If S_i = abcd, then the frequency table is { a =1, b =1, c =1, d =1, e =0 … … x =0, y =0, z =0 }.
The character set itself can be encoded in several ways (for example a = 1, b = 2, c = 3); alternatively, the character string can be represented in binary and then re-encoded with equal binary widths. For example, the string abc can be encoded as a char array ['a', 'b', 'c'], or in binary: the ASCII values of a, b and c are 97, 98 and 99, which in 7-bit binary are 1100001, 1100010 and 1100011, or 110000111000101100011 when concatenated. A new K-bit encoding is then customised; for example, with K = 3 the binary string above is grouped as 110|000|111|000|101|100|011, and frequency statistics are performed on the new grouping. The frequency table of this grouping is { '000':2, '110':1, '111':1, '101':1, '100':1, '011':1 }, where the binary strings for K = 3 are 000, 001, 010, 011, 100, 101, 110, 111, 8 in total.
After the frequency table of S_i above is obtained, frequency supplementation is performed: characters are randomly supplemented with respect to the complete character set, so that every character of the character set appears with a balanced (equal) frequency. The steps are as follows:
(1) First, obtain the frequency table f of S_i and the maximum frequency M in that table; e.g., in the binary frequency table of the above example, '000' has the maximum frequency, so M = 2.
A new character string K is generated according to how often each character i of the complete character set Σ appears in the frequency table f. The generation algorithm for K is as follows:
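K = ""                     # supplementary string, initially empty
for i in Σ:                # traverse the complete character set
    if i in f:
        K += i*(M-f[i])    # top up character i to M occurrences
    else:
        K += i*M           # i is absent from S_i: add M copies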
The algorithm supplements the frequencies of the other characters up to the maximum frequency M so that they are equalised. The symbols have the following meaning: Σ is the complete character set, i is any character in Σ, f[i] is the frequency of character i in the frequency table, and M is the maximum frequency. The algorithm traverses the complete character set (for i in Σ). If character i exists in the frequency table f (if i in f), it supplements M-f[i] copies of i, i.e. appends M-f[i] characters i to K (K += i*(M-f[i])); otherwise (else) it supplements M copies of i, i.e. appends M characters i to K (K += i*M).
(2) And carrying out Random Shuffle on the character string K to obtain a new character string K'.
(3) Then combine K' and S_i to obtain a new string SK_i. They can be combined at any position, as long as the original order of S_i is preserved.
(4) If necessary, the individual SK_i can be extended to groups of equal length; the algorithm only needs to perform further frequency supplementation on the generated SK_i.
(5) The grouping information, expressed in the Σ character set, is then added to SK_i. It can be used during restoration and also slightly perturbs the frequencies, which makes statistics harder. If there is a stronger confidentiality requirement, this step may be skipped and the information stored off-chain instead.
Repeat the above algorithm to obtain all SK_i. Then generate a key C and XOR it with the binary representation of each character string. The XOR is performed by converting the character string into a char array (character array), then into a byte array, and finally XORing the byte array with C.
In this way we obtain exclusive-or groups X_i that are frequency-equalised with respect to the character set Σ. Each X_i is then stored in a different chain or a different transaction, which completes a scattered storage in which neither the content nor the way the data was split can be perceived. Only someone who knows C can learn the exact data partitioning.
The XOR operation can be replaced with other encryption methods; however, once the weakness to frequency analysis is removed, even the simplest XOR encryption becomes difficult to crack.
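The XOR step can be sketched in Python as follows. The text does not say how the key C is applied when it is shorter than the byte array, so cycling the key over the data is an assumption of this sketch, as are the helper names `xor_bytes` and `xor_groups` and the UTF-8 encoding.

```python
from typing import List


def xor_bytes(data: bytes, key: bytes) -> bytes:
    """XOR every byte of `data` with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


def xor_groups(segments: List[str], key: bytes) -> List[bytes]:
    """Convert each SK_i to bytes and XOR it with the key C to obtain X_i."""
    return [xor_bytes(s.encode("utf-8"), key) for s in segments]


key_c = b"\x5a\xc3"                      # hypothetical key C
groups = xor_groups(["aabbcc", "abcabc"], key_c)
restored = [xor_bytes(x, key_c).decode("utf-8") for x in groups]
print(restored)
```

Because XOR is its own inverse, applying `xor_bytes` again with the same key restores the padded strings.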
In this scheme, frequency statistics are collected after the data to be stored has been re-encoded, so the bit width of each code can be customized. Choosing a smaller bit width reduces the size of the frequency table and the frequency-padding workload, and not using the code bit widths of the prior art makes statistical analysis harder. The data to be stored is cut into segments of unequal length and each data segment is processed separately, which meets the requirement of unequal-length storage. Code frequency statistics are collected on the coding segment representing one data segment, and the coding segment is frequency-padded while the original coding segment is retained, so that every code in the padded coding segment has the same frequency. The padded coding segments are then padded further so that the code frequencies of all coding segments are equal, which makes it harder to infer the data distribution across data segments. The coding information is added to the padded coding segment in the same representation as the coding segment, which facilitates restoring the data to be stored and also slightly perturbs the frequencies, making statistical analysis harder; if the data to be stored has stronger encryption requirements, the coding information can be stored off-chain instead of being added to the coding segment. Processing each data segment yields the processed data to be stored, which can then be encrypted with an algorithm that has a frequency-statistics weakness (such as XOR encryption); if an encryption algorithm without that weakness is used, the processing still obfuscates the data to be stored and further increases the difficulty of analysis. This achieves the purpose that the storing party cannot collect statistics on the original data distribution when the data is stored, solves the technical problem that data storage cannot prevent the storing party from collecting statistics on the data distribution, and thereby achieves the technical effect of enhancing data confidentiality.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may occur in other orders or concurrently depending on the application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present application or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, and an optical disk), and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the method described in the embodiments of the present application.
According to another aspect of the embodiments of the present application, there is also provided a data processing apparatus for implementing the above data processing method. Fig. 3 is a schematic diagram of an alternative data processing apparatus according to an embodiment of the present application, and as shown in fig. 3, the apparatus may include: an obtaining module 32, configured to obtain a first encoding segment, where the first encoding segment is used to represent one data segment in the data to be stored; a counting module 34, configured to count a first frequency of each code in the standard code set appearing in the first code segment, where the standard code set includes all codes in the first code segment; a generating module 36, configured to generate a second code segment according to a maximum frequency in all the first frequencies, where the second code segment carries the first code segment, and a frequency of each code in the standard code set appearing in the second code segment is the same as the maximum frequency.
It should be noted that the obtaining module 32 in this embodiment may be configured to execute step S202 in this embodiment, the counting module 34 in this embodiment may be configured to execute step S204 in this embodiment, and the generating module 36 in this embodiment may be configured to execute step S206 in this embodiment.
It should be noted here that the modules described above are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure of the above embodiments. It should be noted that the modules described above as a part of the apparatus may operate in a hardware environment as shown in fig. 1, and may be implemented by software or hardware.
Through the above modules, the technical problem that data storage cannot prevent the storing party from collecting statistics on the data distribution can be solved, thereby achieving the technical effect of enhancing data confidentiality.
As an alternative embodiment, the generating module 36 further includes: a calculating unit, configured to subtract the first frequency of each code in the standard code set from the maximum frequency to obtain a second frequency of the code; a generating unit, configured to generate a third coding segment according to the second frequency of each code in the standard code set, wherein the frequency of each code in the standard code set appearing in the third coding segment is the second frequency of the code; and a combining unit, configured to combine the first coding segment and the third coding segment to obtain the second coding segment.
Optionally, the combining unit is further configured to: randomly order all codes in the third coding segment; and splice the first coding segment and the randomly ordered third coding segment to obtain the second coding segment.
As an alternative embodiment, the obtaining module 32 further includes: the cutting unit is used for cutting the data to be stored to obtain one or more data segments; and the coding unit is used for coding each data segment into a first coding segment by adopting a preset coding format, wherein the preset coding format comprises a custom coding format and a character format.
Optionally, the counting module 34 further includes: a determining unit, configured to determine the standard code set according to the custom coding format, wherein if the custom coding format takes a binary string with a specified number of bits as one code, the standard code set contains all binary strings with the specified number of bits; and a counting unit, configured to count the frequency at which each code in the standard code set appears in the first code segment to obtain the first frequency, wherein the first code segment is divided into a plurality of codes according to the custom coding format.
Optionally, the determining unit is further configured to determine the standard code set according to the character format; the counting unit is further configured to count the frequency at which each code in the standard code set appears in the first code segment to obtain the first frequency, wherein the first code segment is divided into a plurality of codes according to the character format.
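For the custom coding format, the determining unit and the counting unit can be illustrated with a short Python sketch. It assumes that the first code segment is given as a string of '0' and '1' characters whose length is a multiple of the code width; the names `standard_code_set` and `count_codes` are hypothetical.

```python
from collections import Counter
from itertools import product
from typing import Dict, List


def standard_code_set(bits: int) -> List[str]:
    """All binary strings with the specified number of bits."""
    return ["".join(p) for p in product("01", repeat=bits)]


def count_codes(segment_bits: str, bits: int) -> Dict[str, int]:
    """Split the segment into codes of `bits` bits and count each code.

    Codes of the standard code set that never occur get frequency 0.
    """
    if len(segment_bits) % bits != 0:
        raise ValueError("segment length must be a multiple of the code width")
    codes = [segment_bits[i:i + bits] for i in range(0, len(segment_bits), bits)]
    counts = Counter(codes)
    return {code: counts.get(code, 0) for code in standard_code_set(bits)}


first_freq = count_codes("00011100", 2)
print(first_freq, max(first_freq.values()))
```

With a width of 2 bits the frequency table has only four entries, which shows why a smaller code width keeps the table and the padding workload small.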
Optionally, the generating module 36 further includes: the acquisition unit is used for acquiring the position information of the data segment corresponding to the first coding segment in the data to be stored; the conversion unit is used for converting the position information into position codes by adopting a preset coding format; and the splicing unit is used for splicing the position codes into the second code segment.
Optionally, the obtaining unit is further configured to: if the data to be stored comprises a plurality of data segments, processing each second coding segment according to the following mode: acquiring the maximum value of the maximum frequencies corresponding to all the second coding segments; and performing coding frequency supplement on the second coding section according to the maximum value to obtain an updated second coding section, wherein the frequency of each code in the standard coding set appearing in the updated second coding section is the same as the maximum value.
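The cross-segment padding described above can be sketched in Python as follows. The sketch assumes character-level codes and second coding segments that have already been equalized individually; the function name `extend_to_global_max` and the choice to append the extra shuffled codes at the end are illustrative assumptions.

```python
import random
from collections import Counter
from typing import List


def extend_to_global_max(second_segments: List[str], alphabet: str,
                         rng: random.Random) -> List[str]:
    """Pad every second coding segment up to the largest per-segment maximum."""
    counts = [Counter(seg) for seg in second_segments]
    # Maximum value among the maximum frequencies of all segments.
    global_max = max((c[ch] for c in counts for ch in alphabet), default=0)
    updated = []
    for seg, c in zip(second_segments, counts):
        extra = [ch for ch in alphabet for _ in range(global_max - c[ch])]
        rng.shuffle(extra)
        updated.append(seg + "".join(extra))
    return updated


updated = extend_to_global_max(["ab", "aabb"], "ab", random.Random(0))
print([len(s) for s in updated])
```

After this step every updated segment has the same length and the same frequency for every code, so segment size no longer hints at the distribution of the original data.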
As an alternative embodiment, the generating module 36 is further configured to: acquiring a second coding segment set, wherein the second coding segment set comprises second coding segments corresponding to each first coding segment in the data to be stored; and carrying out XOR operation on the second coding segment set by using the data key to obtain an XOR group, wherein the XOR group is used for representing the encrypted data to be stored.
It should be noted here that the modules described above are the same as the examples and application scenarios implemented by the corresponding steps, but are not limited to the disclosure of the above embodiments. It should be noted that the modules described above as a part of the apparatus may be operated in a hardware environment as shown in fig. 1, and may be implemented by software, or may be implemented by hardware, where the hardware environment includes a network environment.
According to another aspect of the embodiments of the present application, there is also provided a server or a terminal for implementing the data processing method.
Fig. 4 is a block diagram of a terminal according to an embodiment of the present application. As shown in fig. 4, the terminal may include: one or more processors 401 (only one is shown in fig. 4), a memory 403, and a transmission device 405. As shown in fig. 4, the terminal may further include an input/output device 407.
The memory 403 may be used to store software programs and modules, such as program instructions/modules corresponding to the data processing method and apparatus in the embodiment of the present application, and the processor 401 executes the software programs and modules stored in the memory 403 to execute various functional applications and data processing, that is, to implement the data processing method. Memory 403 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 403 may further include memory located remotely from the processor 401, which may be connected to the terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 405 is used for receiving or sending data via a network, and may also be used for data transmission between the processor and the memory. Examples of the network may include a wired network and a wireless network. In one example, the transmission device 405 includes a Network Interface Controller (NIC) that can be connected to other network devices and a router via a network cable to communicate with the internet or a local area network. In one example, the transmission device 405 is a Radio Frequency (RF) module, which is used to communicate with the internet wirelessly.
In particular, the memory 403 is used for storing application programs.
The processor 401 may call the application stored in the memory 403 via the transmission device 405 to perform the following steps: acquiring a first coding segment, wherein the first coding segment is used for representing one data segment in data to be stored; counting a first frequency at which each code in a standard code set appears in the first coding segment, wherein the standard code set contains all codes in the first coding segment; and generating a second coding segment according to the maximum frequency in all the first frequencies, wherein the second coding segment carries the first coding segment, and the frequency of each code in the standard code set appearing in the second coding segment is the same as the maximum frequency.
The embodiments of the present application provide a data processing scheme. The data to be stored is represented by codes, code frequency statistics are collected on the coding segment representing one data segment, and the coding segment is frequency-padded while the original coding segment is retained, so that every code in the padded coding segment has the same frequency. This achieves the purpose that the storing party cannot collect statistics on the original data distribution when the data is stored, solves the technical problem that data storage cannot prevent the storing party from collecting statistics on the data distribution, and thereby achieves the technical effect of enhancing data confidentiality.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
Those skilled in the art will understand that the structure shown in fig. 4 is only illustrative. The terminal may be a terminal device such as a smartphone (e.g., an Android phone or an iOS phone), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), or a PAD. Fig. 4 does not limit the structure of the electronic device. For example, the terminal may include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 4, or have a different configuration from that shown in fig. 4.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Embodiments of the present application also provide a storage medium. Optionally, in this embodiment, the storage medium may be used to store program code for executing the data processing method.
Optionally, in this embodiment, the storage medium may be located on at least one of a plurality of network devices in a network shown in the above embodiment.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
S1, acquiring a first coding segment, wherein the first coding segment is used for representing one data segment in the data to be stored;
S2, counting the first frequency of each code in the standard code set appearing in the first code segment, wherein the standard code set contains all codes in the first code segment;
S3, generating a second code segment according to the maximum frequency in all the first frequencies, wherein the second code segment carries the first code segment, and the frequency of each code in the standard code set appearing in the second code segment is the same as the maximum frequency.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments, and this embodiment is not described herein again.
Optionally, in this embodiment, the storage medium may include, but is not limited to: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The integrated unit in the above embodiments, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in the above computer-readable storage medium. Based on such understanding, the technical solution of the present application, in essence or in the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing one or more computer devices (which may be personal computers, servers, network devices, or the like) to execute all or part of the steps of the method described in the embodiments of the present application.
In the above embodiments of the present application, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed client may be implemented in other manners. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The foregoing is only a preferred embodiment of the present application. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications shall also fall within the protection scope of the present application.

Claims (12)

1. A data processing method, comprising:
acquiring a first coding segment, wherein the first coding segment is used for representing one data segment in data to be stored;
counting a first frequency at which each code in a standard code set occurs in the first code segment, wherein the standard code set contains all codes in the first code segment;
and generating a second coding segment according to the maximum frequency in all the first frequencies, wherein the second coding segment carries the first coding segment, and the frequency of each code in the standard coding set appearing in the second coding segment is the same as the maximum frequency.
2. The method of claim 1, wherein generating the second coding segment according to the maximum frequency in all the first frequencies comprises:
subtracting the first frequency of each code in the standard code set from the maximum frequency to obtain a second frequency of the code;
generating a third code segment according to the second frequency of each code in the standard code set, wherein the frequency of each code in the standard code set occurring in the third code segment is the second frequency of the code;
and combining the first coding segment and the third coding segment to obtain the second coding segment.
3. The method of claim 2, wherein combining the first coded segment with the third coded segment to obtain the second coded segment comprises:
randomly ordering all codes in the third code segment;
and splicing the first coding segment and the randomly ordered third coding segment to obtain the second coding segment.
4. The method of claim 1, wherein obtaining the first code segment comprises:
cutting the data to be stored to obtain one or more data segments;
and coding each data segment into one first code segment by adopting a preset code format, wherein the preset code format comprises a custom code format and a character format.
5. The method of claim 4, wherein counting the first frequency at which each code in the standard code set occurs in the first code segment comprises:
determining the standard code set according to the custom code format, wherein if the custom code format takes a binary string with a specified number of bits as one code, the standard code set comprises all binary strings with the specified number of bits;
and counting the frequency of each code in the standard code set appearing in the first code segment to obtain the first frequency, wherein the first code segment is divided into a plurality of codes according to the custom code format.
6. The method of claim 4, wherein counting the first frequency at which each code in the standard code set occurs in the first code segment comprises:
determining the standard coding set according to the character format;
and counting the frequency of each code in the standard code set appearing in the first code segment to obtain the first frequency, wherein the first code segment is divided into a plurality of codes according to the character format.
7. The method of claim 4, wherein after generating the second code segment according to the maximum frequency in all the first frequencies, the method further comprises:
acquiring the position information of a data segment corresponding to the first coding segment in the data to be stored;
converting the position information into a position code by adopting the preset coding format;
splicing the position code into the second code segment.
8. The method of claim 7, wherein before acquiring the position information of the data segment corresponding to the first coding segment in the data to be stored, the method further comprises:
if the data to be stored comprises a plurality of data segments, processing each second coding segment according to the following mode:
acquiring the maximum value of the maximum frequencies corresponding to all the second coding segments;
and performing coding frequency supplement on the second coding section according to the maximum value to obtain an updated second coding section, wherein the frequency of each code in the standard coding set appearing in the updated second coding section is the same as the maximum value.
9. The method of any of claims 1 to 8, wherein after generating the second code segment according to the maximum frequency in all the first frequencies, the method further comprises:
acquiring a second coding segment set, wherein the second coding segment set comprises second coding segments corresponding to each data segment in the data to be stored;
and carrying out XOR operation on the second coding segment set by using a data key to obtain an XOR group, wherein the XOR group is used for representing the encrypted data to be stored.
10. A data processing apparatus, comprising:
an obtaining module, configured to obtain a first coding segment, where the first coding segment is used for representing one data segment in data to be stored;
a counting module, configured to count a first frequency of each code in a standard code set appearing in the first code segment, where the standard code set includes all codes in the first code segment;
a generating module, configured to generate a second coding segment according to a maximum frequency in all the first frequencies, where the second coding segment carries the first coding segment, and a frequency of each code in the standard coding set appearing in the second coding segment is the same as the maximum frequency.
11. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor executes the method of any of the preceding claims 1 to 9 by means of the computer program.
12. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the data processing method according to any one of claims 1 to 9.
CN202111593205.1A 2021-12-24 2021-12-24 Data processing method and device, electronic equipment and storage medium Active CN113987556B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111593205.1A CN113987556B (en) 2021-12-24 2021-12-24 Data processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111593205.1A CN113987556B (en) 2021-12-24 2021-12-24 Data processing method and device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113987556A CN113987556A (en) 2022-01-28
CN113987556B true CN113987556B (en) 2022-05-10

Family

ID=79734186

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111593205.1A Active CN113987556B (en) 2021-12-24 2021-12-24 Data processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113987556B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116738471B (en) * 2023-08-10 2023-10-20 陕西昕晟链云信息科技有限公司 Block chain-based decentralization data analysis method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5045853A (en) * 1987-06-17 1991-09-03 Intel Corporation Method and apparatus for statistically encoding digital data
JP2006165801A (en) * 2004-12-03 2006-06-22 Canon Inc Coding mode discrimination apparatus and method
TW201131369A (en) * 2010-03-09 2011-09-16 Silicon Motion Inc Electronic apparatus and method for storing data in a memory
CN102460404A (en) * 2009-06-01 2012-05-16 起元技术有限责任公司 Generating obfuscated data
CN106484753A (en) * 2016-06-07 2017-03-08 湖南千年华光软件开发有限公司 Data processing method
CN107332567A (en) * 2017-06-09 2017-11-07 西安万像电子科技有限公司 Coding method and device
CN109697277A (en) * 2017-10-20 2019-04-30 北京京东尚科信息技术有限公司 The method and apparatus of Text compression
CN110569487A (en) * 2019-08-19 2019-12-13 积成电子股份有限公司 base64 extension coding method and system based on high-frequency character substitution algorithm
CN111130558A (en) * 2019-12-31 2020-05-08 世纪恒通科技股份有限公司 Coding table compression method based on statistical probability

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9129001B2 (en) * 2012-05-07 2015-09-08 Sybase, Inc. Character data compression for reducing storage requirements in a database system
CN110266316B (en) * 2019-05-08 2023-02-21 创新先进技术有限公司 Data compression and decompression method, device and equipment
CN110932822B (en) * 2019-12-02 2022-06-17 泰康保险集团股份有限公司 Data encoding method, data decoding method, device, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
《采用哈夫曼编码技术提高硬件无损压缩效率的算法研究》;董乾;《中国博士学位论文全文数据库(信息科技辑)》;20200515;第I138-9页 *

Also Published As

Publication number Publication date
CN113987556A (en) 2022-01-28

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20220128

Assignee: Hangzhou Quanke Technology Co.,Ltd.

Assignor: HANGZHOU HYPERCHAIN TECHNOLOGIES Co.,Ltd.

Contract record no.: X2022980029948

Denomination of invention: Data processing method and device, electronic equipment, storage medium

Granted publication date: 20220510

License type: Common License

Record date: 20230115