CN110472202B - Unicode-based information embedding and extracting method - Google Patents

Info

Publication number
CN110472202B
Authority
CN
China
Prior art keywords
byte
character
information
low
assigned
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910740854.6A
Other languages
Chinese (zh)
Other versions
CN110472202A (en
Inventor
张怡
周诠
黎军
沈俊
刘娟妮
梁薇
李静玲
崔涛
呼延烺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Institute of Space Radio Technology
Original Assignee
Xian Institute of Space Radio Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Institute of Space Radio Technology filed Critical Xian Institute of Space Radio Technology
Priority to CN201910740854.6A priority Critical patent/CN110472202B/en
Publication of CN110472202A publication Critical patent/CN110472202A/en
Application granted granted Critical
Publication of CN110472202B publication Critical patent/CN110472202B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention provides a Unicode-based information embedding and extracting method. The information embedding method comprises the following steps: (1.1) preprocessing the high byte and the low byte of each Unicode-encoded text message character so that the value ranges of the high byte and the low byte are concentrated; (1.2) performing segment-wise encoding on the preprocessed high byte and low byte so that the encoded high byte and low byte occupy only certain data bits; (1.3) using the unused data bits of the encoded high byte and low byte for information embedding. The information extraction method comprises the following steps: (2.1) extracting the embedded information according to the segment in which the high byte and the low byte of the message containing the embedded information fall; (2.2) decoding the high byte and the low byte of the message containing the embedded information to obtain the high byte and the low byte with the embedding encoding removed; and (2.3) applying inverse preprocessing to the decoded high byte and low byte to recover the high byte and the low byte of the original data.

Description

Unicode-based information embedding and extracting method
Technical Field
The invention relates to a data communication method, in particular to a Unicode-based information embedding and extracting method, and belongs to the field of communications (for example, data communication technology).
Background
In satellite communication systems and navigation systems, text messages of a fixed format are transmitted between a transmitting end and a destination end, and these messages consist mainly of Chinese characters, letters and various symbols. During the transmission of satellite data, sensitive information can be embedded into the satellite data so that the two are transmitted together without increasing the transmission capacity.
Common methods for embedding and hiding information in text messages mainly target text formatting, such as font size, color and line spacing, and hide the embedded information in differences of format. Examples include shift coding, which hides information in the relative positions of text (line spacing and word spacing); feature coding, which embeds marks by changing some feature of a character (for example, the different widths occupied by Chinese and English punctuation, or modified fonts); synonym substitution; and text camouflage.
However, with any of the above methods, when the communication channel is occupied, other messages must wait a certain time interval before they can be transmitted. This results in low efficiency and underutilization of channel resources.
Disclosure of Invention
The technical problem addressed by the invention is as follows: to provide a Unicode-based information embedding and extracting method in which sensitive information is embedded into a transmitted message without adding any additional information, both the original information and the embedded information can be recovered losslessly, and the system capacity is increased without adding channel resources.
The technical scheme of the invention is as follows: a Unicode-based information embedding and extracting method, comprising:
The Unicode-based information embedding method comprises the following steps:
(1.1) preprocessing the high byte H and the low byte L of each Unicode-encoded text message character so that the value ranges of the high byte and the low byte are concentrated, which facilitates segment-wise processing according to the preprocessed values;
(1.2) performing segment-wise encoding on the preprocessed high byte H' and low byte L' so that the encoded high byte H" and low byte L" occupy only certain data bits;
(1.3) using the unused data bits of the encoded high byte H" and low byte L" for information embedding.
The Unicode-based information extraction method comprises the following steps:
(2.1) extracting the embedded information M according to the segment in which the high byte H" and the low byte L" of the message containing the embedded information fall;
(2.2) decoding the high byte H" and the low byte L" of the message containing the embedded information to obtain the high byte H' and the low byte L' with the embedding encoding removed;
(2.3) applying inverse preprocessing to the decoded high byte H' and low byte L' to recover the high byte H and the low byte L of the original data.
In step (1.1), the high byte H and the low byte L of each Unicode-encoded character are preprocessed as follows:
(1.1.1) when the character is a Chinese character, the high byte H and the low byte L are processed as:
H' = H - 78
L' = L
where H' is the preprocessed high byte, whose bits from high to low are denoted H'_7, H'_6, H'_5, H'_4, H'_3, H'_2, H'_1, H'_0, and L' is the preprocessed low byte, whose bits from high to low are denoted L'_7, L'_6, L'_5, L'_4, L'_3, L'_2, L'_1, L'_0;
(1.1.2) when the character is a letter or a symbol, the high byte H and the low byte L are processed as follows:
first, the low byte L is written in the form L = 2n + i, where i is 0 or 1;
then the preprocessed high byte H' and low byte L' are computed as:
H' = n + 66;
L'_7 = i, and L'_0 to L'_6 are all set to 0.
When the preprocessed high byte H' lies in [82, 127], the segment-wise encoding of H' and L' in step (1.2) is as follows:
bit 7 of the encoded high byte H" is set to 1 and its remaining bits take the corresponding bits of the preprocessed high byte H'; bit 7 of the encoded low byte L" takes bit 7 of the preprocessed low byte L', and bits 0 to 6 of L" are used for information embedding.
When the preprocessed high byte H' lies in [0, 81] and the preprocessed low byte L' < 64, the segment-wise encoding of H' and L' in step (1.2) is as follows:
bit 7 of the encoded high byte H" is set to 1 and its remaining bits take the corresponding bits of H'; bits 0 to 5 of the encoded low byte L" take the corresponding bits of L', and bits 6 and 7 of L" are used for information embedding.
When the preprocessed high byte H' lies in [0, 81] and the preprocessed low byte L' lies in [128, 255], the segment-wise encoding of H' and L' in step (1.2) is as follows:
bit 7 of the encoded high byte H" is set to 0 and its remaining bits take the corresponding bits of H'; the encoded low byte L" is set to L' minus 128, and bit 7 of L" is used for information embedding.
When the preprocessed high byte H' lies in [0, 81] and the preprocessed low byte L' lies in [64, 109], the segment-wise encoding of H' and L' in step (1.2) is as follows:
bit 7 of the encoded high byte H" is set to 0 and its remaining bits hold L' plus 18; the encoded low byte L" is set to H', and bit 7 of L" is used for information embedding, that is:
H"_7 = 0, H" = L' + 18; L" = H'.
When the preprocessed high byte H' lies in [0, 45] and the preprocessed low byte L' lies in [110, 127], the segment-wise encoding of H' and L' in step (1.2) is as follows:
bit 7 of the encoded high byte H" is set to 0 and its remaining bits hold H' plus 82; the encoded low byte L" is set to L', and bit 7 of L" is used for information embedding.
When the preprocessed high byte H' lies in [46, 81] and the preprocessed low byte L' lies in [110, 127], the segment-wise encoding of H' and L' in step (1.2) is as follows:
bit 7 of the encoded high byte H" is set to 0 and its remaining bits hold H' plus 36; the encoded low byte L" is set to L' minus 28, and bit 7 of L" is used for information embedding.
When the high byte H" of the message containing the embedded information lies in [0, 81]:
the embedded information M is bit 7 of the low byte L" of the message containing the embedded information;
the decoded high byte H' equals the high byte H" containing the embedded information;
the decoded low byte L' is obtained by setting bit 7 of L" to 0 and adding 128;
the high byte H of the original character is the decoded high byte H' plus 78;
the low byte L of the original character is the decoded low byte L'.
When the high byte H" containing the embedded information lies in [82, 127] and the low byte L" containing the embedded information lies in [0, 81] or [128, 209]:
the embedded information M is bit 7 of the low byte L" of the message containing the embedded information;
bit 7 of the decoded high byte H' is set to 0 and its remaining bits take the corresponding bits of the low byte L" containing the embedded information;
bit 7 of the decoded low byte L' is set to 0 and its remaining bits hold the high byte H" containing the embedded information minus 18;
the high byte H of the original character is the decoded high byte H' plus 78;
the low byte L of the original character is the decoded low byte L'.
When the high byte H" containing the embedded information lies in [82, 127] and the low byte L" containing the embedded information lies in [110, 127] or [238, 255]:
the embedded information M is bit 7 of the low byte L" of the message containing the embedded information;
the decoded high byte H' is the high byte H" containing the embedded information minus 82;
bit 7 of the decoded low byte L' is set to 0 and its remaining bits take the corresponding bits of the low byte L" containing the embedded information;
the high byte H of the original character is the decoded high byte H' plus 78;
the low byte L of the original character is the decoded low byte L'.
When the high byte H" containing the embedded information lies in [82, 117] and the low byte L" containing the embedded information lies in [82, 99] or [210, 227]:
the embedded information M is bit 7 of the low byte L" of the message containing the embedded information;
the decoded high byte H' is the high byte H" containing the embedded information minus 36;
bit 7 of the decoded low byte L' is set to 0 and its remaining bits hold the corresponding bits of the low byte L" containing the embedded information plus 28;
the high byte H of the original character is the decoded high byte H' plus 78;
the low byte L of the original character is the decoded low byte L'.
When the high byte H" containing the embedded information lies in [128, 209]:
the embedded information M is bit 7 and bit 6 of the low byte L" of the message containing the embedded information;
the decoded high byte H' is the high byte H" containing the embedded information minus 128;
bits 7 and 6 of the decoded low byte L' are set to 0 and its remaining bits take the corresponding bits of the low byte L" containing the embedded information;
the high byte H of the original character is the decoded high byte H' plus 78;
the low byte L of the original character is the decoded low byte L'.
When the high byte H" containing the embedded information lies in [210, 255]:
the embedded information M is the low 7 bits of the low byte L" containing the embedded information;
bit 7 of the decoded high byte H' is set to 0 and its remaining bits take the corresponding bits of the high byte H" containing the embedded information;
bits 0 to 6 of the decoded low byte L' are set to 0 and its remaining bit takes the corresponding bit of the low byte L" containing the embedded information;
the high byte H of the original character is set to 0;
the low byte L of the original character is: L = (H' - 66) × 2 + L'_7.
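The inverse preprocessing that closes each extraction case can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `inverse_preprocess` is hypothetical, and the sketch assumes the decoded pair (H', L') has already been obtained, telling the two character classes apart by the range of H' ([0, 81] for Chinese characters, [82, 127] for letters and symbols).

```python
def inverse_preprocess(h_pre, l_pre):
    """Recover the original (H, L) from the decoded pair (H', L')."""
    if 0 <= h_pre <= 81:
        # Chinese character: undo H' = H - 78; the low byte was left unchanged.
        return h_pre + 78, l_pre
    if 82 <= h_pre <= 127:
        # Letter or symbol: H is 0 and L = (H' - 66) * 2 + L'_7.
        return 0, (h_pre - 66) * 2 + (l_pre >> 7)
    raise ValueError("H' must lie in [0, 127]")


print(inverse_preprocess(0, 45))    # (78, 45)
print(inverse_preprocess(98, 128))  # (0, 65)
```

The second call corresponds to the letter "A" (L = 65 = 2 × 32 + 1), whose preprocessed form is H' = 98 with L'_7 = 1.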
Compared with the prior art, the invention has the following beneficial effects:
The invention embeds sensitive information into the transmitted message without adding any additional information, leaves both the original information and the embedded information losslessly recoverable, and increases the system capacity without adding channel resources.
Drawings
FIG. 1 is a block diagram of an implementation of information embedding and information extraction based on Unicode encoding;
Detailed Description
The invention is described in detail below with reference to the accompanying drawing and specific embodiments.
Unicode is a widely used character encoding for computers that extends the ASCII character set; its main feature, as used here, is that each character is encoded with two bytes. Because it meets the needs of cross-platform text processing and conversion, the invention studies a lossless information embedding method for Unicode-encoded text messages. The characteristics of the encoding are first analyzed; the message characters (Chinese characters, letters and other symbols) are then preprocessed, segment-encoded and mapped between different regions, so that 1 to 7 bits of information are losslessly embedded in each character without any additional information. The method enables the cooperative transmission of satellite information and sensitive information, achieves an information embedding rate of 1/16 to 7/16, recovers both the original information and the embedded information without distortion, and increases the transmission capacity of the system without adding channel resources.
The carrier for the embedded information studied in the invention is a text message transmitted in satellite communication systems and navigation systems, consisting mainly of Chinese characters, letters and various symbols. In the Unicode code range, [0x4E00, 0x9FBF] (hexadecimal; decimal [19968, 40869]) is the Chinese character range, and [0x0021, 0x007C] (hexadecimal; decimal [33, 124]) is the range of letters, digits and common symbols. The invention therefore analyzes these two regions of the Unicode encoding, [0x4E00, 0x9FBF] (hexadecimal) and [0x0021, 0x007C] (hexadecimal).
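The stated embedding rate of 1/16 to 7/16 follows from dividing the number of embeddable bits by the 16 bits of each character. The sketch below estimates the rate for a short message; `capacity_bits` and `embedding_rate` are hypothetical names, and the per-character bit counts (7 for letters and symbols, 2 for Chinese characters whose low byte is below 64, 1 for all other Chinese characters) are taken from the segment rules given in the detailed description.

```python
def capacity_bits(ch):
    """Bits that one character can carry under the segment rules."""
    high, low = divmod(ord(ch), 256)
    if high == 0 and 33 <= low <= 124:
        return 7                       # letter, digit or common symbol
    if 78 <= high <= 159:
        return 2 if low < 64 else 1    # Chinese character
    raise ValueError("character outside the two supported ranges")


def embedding_rate(message):
    """Embedded bits divided by carrier bits (16 per character)."""
    if not message:
        raise ValueError("message must not be empty")
    return sum(capacity_bits(c) for c in message) / (16 * len(message))


print(embedding_rate("A"))    # 0.4375  (7/16)
print(embedding_rate("国"))   # 0.0625  (1/16, since U+56FD has low byte 253)
print(embedding_rate("中A"))  # 0.28125 (2 + 7 bits over 32 carrier bits)
```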
In Unicode encoding, since each character is encoded with two bytes, the invention divides the code of each character into a high byte and a low byte, where the high byte is the upper 8 bits and the low byte is the lower 8 bits. The characteristics and value ranges of the two regions above are first analyzed, and the high and low bytes of each character are then encoded into different segments according to their values, so that the value range of the message characters becomes more concentrated and positions for embedding information are freed.
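As a small illustration of the high/low byte split, the following sketch separates a character's 16-bit code into H and L and joins them back. The helper names `split_bytes` and `join_bytes` are hypothetical, and the sketch assumes the character fits in two bytes.

```python
def split_bytes(ch):
    """Return (H, L): the upper and lower 8 bits of a two-byte character code."""
    code = ord(ch)
    if code > 0xFFFF:
        raise ValueError("character does not fit in two bytes")
    return code >> 8, code & 0xFF


def join_bytes(high, low):
    """Rebuild the character from its high and low bytes."""
    return chr((high << 8) | low)


print(split_bytes("中"))   # (78, 45), since 中 is U+4E2D
print(split_bytes("A"))    # (0, 65)
print(join_bytes(78, 45))  # 中
```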
For messages transmitted in satellite communication systems and navigation systems that consist of Chinese characters, letters and various symbols, the invention provides a lossless information embedding method that requires no additional information.
As shown in FIG. 1, the Unicode-based information embedding and extraction processes provided by the invention are implemented as follows:
1. The Unicode-based information embedding method comprises the following steps:
(1.1) preprocessing the high byte H and the low byte L of each Unicode-encoded text message character so that the value ranges of the high byte and the low byte are concentrated, which facilitates segment-wise processing according to the preprocessed values;
Assume that each character, after Unicode encoding, is represented as [H, L], where H is the high byte (8 bits, denoted from high to low as H_7, H_6, H_5, H_4, H_3, H_2, H_1, H_0) and L is the low byte (8 bits, denoted from high to low as L_7, L_6, L_5, L_4, L_3, L_2, L_1, L_0). The values of H and L given for each segment are decimal values; H_7 to H_0 are the 8 binary digits of H, and L_7 to L_0 are the 8 binary digits of L.
The preprocessing is as follows:
(1.1.1) when the character is a Chinese character (that is, H lies in [78, 159]), the high byte H and the low byte L are processed as:
H' = H - 78
L' = L
where H' is the preprocessed high byte, whose bits from high to low are denoted H'_7, H'_6, H'_5, H'_4, H'_3, H'_2, H'_1, H'_0, and L' is the preprocessed low byte, whose bits from high to low are denoted L'_7, L'_6, L'_5, L'_4, L'_3, L'_2, L'_1, L'_0. After preprocessing, H' lies in [0, 81] and L' lies in [0, 255];
(1.1.2) when the character is a letter or a symbol (H is 0 and L lies in [33, 124]), the high byte H and the low byte L are processed as follows:
first, the low byte L is written in the form L = 2n + i, where i is 0 or 1;
then the preprocessed high byte H' and low byte L' are computed as:
H' = n + 66;
L'_7 = i, and L'_0 to L'_6 are all set to 0. After preprocessing, H' lies in [82, 127] and L' is 0 or 128.
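The two preprocessing rules can be sketched as a single function. This is an illustrative sketch, not the patented implementation; the name `preprocess` is hypothetical. One caveat, flagged in the code as well: applying H' = n + 66 to L = 124 gives H' = 128, which is outside the stated range [82, 127], so this sketch rejects that value rather than guess how it is meant to be handled.

```python
def preprocess(high, low):
    """Map the original bytes (H, L) to the preprocessed bytes (H', L')."""
    if 78 <= high <= 159:
        # (1.1.1) Chinese character: shift the high byte down into [0, 81].
        return high - 78, low
    if high == 0 and 33 <= low <= 124:
        # (1.1.2) Letter or symbol: write L = 2n + i, then H' = n + 66, L'_7 = i.
        n, i = divmod(low, 2)
        h_pre = n + 66
        if h_pre > 127:
            # L = 124 yields H' = 128, outside the stated [82, 127].
            raise ValueError("low byte maps outside [82, 127]")
        return h_pre, i << 7
    raise ValueError("character outside the two supported ranges")


print(preprocess(78, 45))  # (0, 45)    Chinese character U+4E2D
print(preprocess(0, 65))   # (98, 128)  letter "A": n = 32, i = 1
```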
(1.2) performing segment-wise encoding on the preprocessed high byte H' and low byte L' so that the encoded high byte H" and low byte L" occupy only certain data bits;
The segment-wise encoding of the preprocessed high byte H' and low byte L' is as follows:
(1.2.1) when the preprocessed high byte H' lies in [82, 127], the character is a letter or a symbol, and its low byte L' can only be 0 or 128. Bit 7 of the encoded high byte H" is set to 1 and its remaining bits take the corresponding bits of H'; bit 7 of the encoded low byte L" takes bit 7 of L', and bits 0 to 6 of L" are used for information embedding, that is:
H"_7 = 1, H"_6 = H'_6, H"_5 = H'_5, H"_4 = H'_4, H"_3 = H'_3, H"_2 = H'_2, H"_1 = H'_1, H"_0 = H'_0
L"_7 = L'_7
In this case the character can embed 7 bits of information, and the encoded H" lies in [210, 255].
(1.2.2) when the preprocessed high byte H' lies in [0, 81] and the preprocessed low byte L' < 64 (that is, L' lies in [0, 63]), bit 7 of the encoded high byte H" is set to 1 and its remaining bits take the corresponding bits of H'; bits 0 to 5 of the encoded low byte L" take the corresponding bits of L', and bits 6 and 7 of L" are used for information embedding, that is:
H"_7 = 1, H"_6 = H'_6, H"_5 = H'_5, H"_4 = H'_4, H"_3 = H'_3, H"_2 = H'_2, H"_1 = H'_1, H"_0 = H'_0
L"_5 = L'_5, L"_4 = L'_4, L"_3 = L'_3, L"_2 = L'_2, L"_1 = L'_1, L"_0 = L'_0
In this case the character can embed 2 bits of information, and the encoded H" lies in [128, 209].
(1.2.3) when the preprocessed high byte H' lies in [0, 81] and the preprocessed low byte L' lies in [128, 255], bit 7 of the encoded high byte H" is set to 0 and its remaining bits take the corresponding bits of H'; the encoded low byte L" is set to L' minus 128, and bit 7 of L" is used for information embedding, that is:
H"_7 = 0, H"_6 = H'_6, H"_5 = H'_5, H"_4 = H'_4, H"_3 = H'_3, H"_2 = H'_2, H"_1 = H'_1, H"_0 = H'_0
L" = L' - 128;
In this case the character can embed 1 bit of information, with H" in [0, 81] and L" in [0, 127].
(1.2.4) when the preprocessed high byte H' lies in [0, 81] and the preprocessed low byte L' lies in [64, 109], bit 7 of the encoded high byte H" is set to 0 and its remaining bits hold L' plus 18; the encoded low byte L" is set to H', and bit 7 of L" is used for information embedding, that is:
H"_7 = 0, H" = L' + 18; L" = H';
In this case the character can embed 1 bit of information, with H" in [82, 127] and L" in [0, 81].
(1.2.5) when the preprocessed high byte H' lies in [0, 45] and the preprocessed low byte L' lies in [110, 127], bit 7 of the encoded high byte H" is set to 0 and its remaining bits hold H' plus 82; the encoded low byte L" is set to L', and bit 7 of L" is used for information embedding, that is:
H"_7 = 0, H" = H' + 82; L" = L';
In this case the character can embed 1 bit of information, with H" in [82, 127] and L" in [110, 127].
(1.2.6) when the preprocessed high byte H' lies in [46, 81] and the preprocessed low byte L' lies in [110, 127], bit 7 of the encoded high byte H" is set to 0 and its remaining bits hold H' plus 36; the encoded low byte L" is set to L' minus 28, and bit 7 of L" is used for information embedding, that is:
H"_7 = 0, H" = H' + 36; L" = L' - 28;
In this case the character can embed 1 bit of information, with H" in [82, 117] and L" in [82, 99].
(1.3) using the unused data bits of the encoded high byte H" and low byte L" for information embedding;
The segment-wise encoding and information embedding cases of the invention are summarized in the following table:
Table 1. Numerical ranges of the high and low bytes of characters before and after information embedding
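Cases (1.2.1) to (1.2.6) and the embedding of step (1.3) can be sketched together as follows. The names `segment_encode` and `embed` are hypothetical, and one detail is an assumption of this sketch rather than something the description fixes: the payload is passed as a small integer and written into the free bits with its most significant bit in the highest free position.

```python
def segment_encode(h_pre, l_pre):
    """Return (H'', L'' with free bits cleared, shift, width) for (H', L')."""
    if not 0 <= l_pre <= 255:
        raise ValueError("L' must be a byte")
    if 82 <= h_pre <= 127:               # (1.2.1) letter or symbol, 7 free bits
        return h_pre | 0x80, l_pre & 0x80, 0, 7
    if not 0 <= h_pre <= 81:
        raise ValueError("H' must lie in [0, 127]")
    if l_pre < 64:                       # (1.2.2) 2 free bits (bits 6 and 7)
        return h_pre | 0x80, l_pre, 6, 2
    if l_pre >= 128:                     # (1.2.3) L'' = L' - 128
        return h_pre, l_pre - 128, 7, 1
    if l_pre <= 109:                     # (1.2.4) swap: H'' = L' + 18, L'' = H'
        return l_pre + 18, h_pre, 7, 1
    if h_pre <= 45:                      # (1.2.5) L' in [110, 127], H' in [0, 45]
        return h_pre + 82, l_pre, 7, 1
    return h_pre + 36, l_pre - 28, 7, 1  # (1.2.6) L' in [110, 127], H' in [46, 81]


def embed(h_pre, l_pre, payload):
    """Write `payload` into the free bits of L'' and return (H'', L'')."""
    h_enc, l_enc, shift, width = segment_encode(h_pre, l_pre)
    if not 0 <= payload < (1 << width):
        raise ValueError("payload does not fit in the free bits")
    return h_enc, l_enc | (payload << shift)


print(embed(98, 128, 0b1010101))  # (226, 213)  7 bits in a letter
print(embed(0, 45, 0b11))         # (128, 237)  2 bits in a Chinese character
print(embed(10, 100, 0))          # (118, 10)   1 bit, bytes swapped by (1.2.4)
```

Returning the shift and width alongside the encoded bytes keeps the segment decision in one place, so the caller only has to supply as many payload bits as the character can carry.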
2. The information extraction method based on Unicode coding comprises the following steps:
After receiving the message containing the embedded information, the receiving end divides the received data into a high byte H" and a low byte L". It first analyzes the characteristics of the two bytes, decodes the data according to the result of the analysis, extracts the embedded information M, and at the same time recovers the original data H and L, that is, the high byte H and the low byte L of the Unicode-encoded text message character.
The ranges in which the high byte H" and the low byte L" fall are determined, and H" and L" are transformed according to the corresponding region.
(2.1) extracting the embedded information M according to the segment in which the high byte H" and the low byte L" of the message containing the embedded information fall;
(2.1.1) when H" lies in [0, 81], 1 bit of information is embedded in bit 7 of the low byte of the character. The embedded information M is therefore bit 7 of the low byte L" of the message containing the embedded information.
(2.1.2) when H" lies in [82, 127] and L" lies in [0, 81] or [128, 209], 1 bit of information is embedded in bit 7 of the low byte of the character. The embedded information M is therefore bit 7 of the low byte L" of the message containing the embedded information.
(2.1.3) when H" lies in [82, 127] and L" lies in [110, 127] or [238, 255], the original data is a Chinese character and 1 bit of information is embedded in bit 7 of the low byte of the character. The embedded information M is therefore bit 7 of the low byte L" of the message containing the embedded information.
(2.1.4) when H" lies in [82, 117] and L" lies in [82, 99] or [210, 227], 1 bit of information is embedded in bit 7 of the low byte of the character. The embedded information M is therefore bit 7 of the low byte L" of the message containing the embedded information.
(2.1.5) when H" lies in [128, 209], the original data is a Chinese character and 2 bits of information are embedded in bits 6 and 7 of the low byte of the character. The embedded information M is therefore bit 7 and bit 6 of the low byte L" of the message containing the embedded information.
(2.1.6) when H" lies in [210, 255], the original data is a letter or a symbol and 7 bits of information are embedded in bits 0 to 6 of the low byte of the character. The embedded information M is therefore the low 7 bits of the low byte L" containing the embedded information;
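Because the number and position of the embedded bits depend only on the range of H", the six extraction cases reduce to three branches. The sketch below assumes the payload is read as a small integer with the highest embedded bit as its most significant bit (an assumption of this example, not stated in the description); `extract` is a hypothetical name.

```python
def extract(h_enc, l_enc):
    """Return (payload, width) read from the received bytes (H'', L'')."""
    if not (0 <= h_enc <= 255 and 0 <= l_enc <= 255):
        raise ValueError("H'' and L'' must be bytes")
    if h_enc >= 210:
        return l_enc & 0x7F, 7   # (2.1.6) letter or symbol: bits 0 to 6 of L''
    if h_enc >= 128:
        return l_enc >> 6, 2     # (2.1.5) bits 7 and 6 of L''
    return l_enc >> 7, 1         # (2.1.1) to (2.1.4): bit 7 of L''


print(extract(226, 213))  # (85, 7)
print(extract(128, 237))  # (3, 2)
print(extract(92, 248))   # (1, 1)
```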
(2.2) decoding the high byte H" and the low byte L" of the message containing the embedded information to obtain the high byte H' and the low byte L' with the embedding encoding removed. The decoding method is as follows:
(2.2.1) when H" lies in [0, 81]:
the decoded high byte H' equals the high byte H" containing the embedded information;
the decoded low byte L' is obtained by setting bit 7 of the low byte L" containing the embedded information to 0 and adding 128.
(2.2.2) when H" lies in [82, 127] and L" lies in [0, 81] or [128, 209]:
bit 7 of the decoded high byte H' is set to 0 and its remaining bits take the corresponding bits of the low byte L" containing the embedded information;
bit 7 of the decoded low byte L' is set to 0 and its remaining bits hold the high byte H" containing the embedded information minus 18;
(2.2.3) when H" lies in [82, 127] and L" lies in [110, 127] or [238, 255]:
the decoded high byte H' is the high byte H" containing the embedded information minus 82;
bit 7 of the decoded low byte L' is set to 0 and its remaining bits take the corresponding bits of the low byte L" containing the embedded information;
(2.2.4) when H" lies in [82, 117] and L" lies in [82, 99] or [210, 227]:
the decoded high byte H' is the high byte H" containing the embedded information minus 36;
bit 7 of the decoded low byte L' is set to 0 and its remaining bits hold the corresponding bits of the low byte L" containing the embedded information plus 28;
(2.2.5) H' is located between [128, 209]
Character high byte H for removing coding information ' Assigning a character high byte H "containing embedded information minus 128;
character low byte L for removing coding information ' The 7 th bit and the 6 th bit of the data are assigned to 0, and the rest bits are assigned to the corresponding data bits of the character low byte L' containing embedded information;
(2.2.6) H' is between [210, 255 ]:
character high byte H for removing coding information ' The 7 th bit of the data is assigned to 0, and the other bits are assigned to the corresponding data bit of the character high byte H' containing embedded information;
character low byte L for removing coding information ' The 0 th to 6 th bits of (a) are assigned 0, and the rest bits are assigned as the corresponding data bits of the character low byte L' containing embedded information;
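The six decoding cases of step (2.2) can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `decode_pair`, the return layout, and the convention that the 7th bit is the more significant of the two payload bits in case (2.2.5) are assumptions made for the example.

```python
def decode_pair(h2: int, l2: int) -> tuple[int, int, int, int]:
    """Map the embedded bytes (H", L") to (H', L', payload, nbits).

    Hypothetical helper: the case analysis follows steps (2.2.1)-(2.2.6);
    the packing of the payload into one integer is an assumption.
    """
    if not (0 <= h2 <= 255 and 0 <= l2 <= 255):
        raise ValueError("bytes must be in 0..255")
    low7 = l2 & 0x7F          # L" with the 7th bit cleared
    bit7 = l2 >> 7            # the 7th bit of L"

    if h2 <= 81:                              # (2.2.1)
        return h2, low7 + 128, bit7, 1
    if h2 <= 127:
        if low7 <= 81:                        # (2.2.2): bytes were swapped
            return low7, h2 - 18, bit7, 1
        if low7 >= 110:                       # (2.2.3)
            return h2 - 82, low7, bit7, 1
        if 82 <= low7 <= 99 and h2 <= 117:    # (2.2.4)
            return h2 - 36, low7 + 28, bit7, 1
        raise ValueError("byte pair is not a valid encoder output")
    if h2 <= 209:                             # (2.2.5): two payload bits
        return h2 - 128, l2 & 0x3F, l2 >> 6, 2
    # (2.2.6): letters and symbols, seven payload bits
    return h2 & 0x7F, l2 & 0x80, low7, 7
```

Returning the payload together with its bit count lets a caller concatenate the bits of successive characters without re-deriving the segment.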
(2.3) Perform inverse preprocessing on the high byte H' and the low byte L' of the message with the coding information removed to recover the high byte H and the low byte L of the original data.
(2.3.1) When H" is in [0, 81]:
the high byte H of the original character is the high byte H' of the message with the coding information removed plus 78;
the low byte L of the original character is the low byte L' of the message with the coding information removed;
H is in [78, 159] and L is in [128, 255].
(2.3.2) When H" is in [82, 127] and L" is in [0, 81] or [128, 209]:
the high byte H of the original character is assigned the character high byte H' with the coding information removed plus 78;
the low byte L of the original character is assigned the character low byte L' with the coding information removed;
H is in [78, 159] and L is in [64, 109].
(2.3.3) When H" is in [82, 127] and L" is in [110, 127] or [238, 255]:
the high byte H of the original character is assigned the character high byte H' with the coding information removed plus 78;
the low byte L of the original character is assigned the character low byte L' with the coding information removed;
H is in [78, 123] and L is in [110, 127].
(2.3.4) When H" is in [82, 117] and L" is in [82, 99] or [210, 227]:
the high byte H of the original character is assigned the character high byte H' with the coding information removed plus 78;
the low byte L of the original character is assigned the character low byte L' with the coding information removed;
H is in [124, 159] and L is in [110, 127].
(2.3.5) When H" is in [128, 209]:
the high byte H of the original character is assigned the character high byte H' with the coding information removed plus 78;
the low byte L of the original character is assigned the character low byte L' with the coding information removed;
H is in [78, 159] and L is in [0, 63].
(2.3.6) When H" is in [210, 255]:
the high byte H of the original character is assigned 0;
the low byte L of the original character is assigned as: L = (H' - 66) × 2 + L'7, where L'7 is the 7th bit of L';
H is 0 and L is in [33, 124].
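The inverse preprocessing of step (2.3) can be sketched as below. The sketch relies on an observation rather than on wording from the text: after decoding, every kanji case yields H' in [0, 81] and the letter/symbol case yields H' in [82, 127], so the branch can be chosen from H' alone. The function name `inverse_preprocess` is hypothetical.

```python
def inverse_preprocess(h1: int, l1: int) -> tuple[int, int]:
    """Recover the original (H, L) from the decoded bytes (H', L').

    Hypothetical helper; selecting the branch from the value of H' is an
    assumption that follows from the ranges stated in steps (2.3.1)-(2.3.6).
    """
    if 0 <= h1 <= 81:
        # Kanji cases (2.3.1)-(2.3.5): H = H' + 78, L = L'
        return h1 + 78, l1
    if 82 <= h1 <= 127:
        # Letter or symbol case (2.3.6): H = 0, L = (H' - 66) * 2 + L'7
        return 0, (h1 - 66) * 2 + (l1 >> 7)
    raise ValueError("decoded high byte must be in 0..127")


def to_char(h: int, l: int) -> str:
    """Join a high and a low byte into one 16-bit code point."""
    return chr((h << 8) | l)
```

For example, `to_char(*inverse_preprocess(0, 45))` rebuilds the code point U+4E2D from the decoded pair (0, 45).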
Thus, the extraction of the embedded information based on Unicode coding is completed and the original message is restored.
When information is embedded, the invention separates the high byte and the low byte of the data after Unicode coding, analyzes, processes and codes different ranges of the high byte and the low byte, and increases the embedding amount of character information.
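The embedding side can be sketched in Python as follows, under stated assumptions. The preprocessing and the per-range coding rules are taken from the claims of this document; the function names, the integer `payload` argument, and the placement of the two payload bits with the 7th bit as the more significant one are choices made only for this example. Code points whose preprocessed high byte would fall outside [82, 127] are simply rejected here.

```python
def preprocess(code_point: int) -> tuple[int, int]:
    """Step (1.1): map a 16-bit code point to the preprocessed (H', L')."""
    h, l = code_point >> 8, code_point & 0xFF
    if 78 <= h <= 159:                 # kanji: H' = H - 78, L' = L
        return h - 78, l
    if h == 0:                         # letter or symbol: L = 2n + i
        n, i = divmod(l, 2)
        if 82 <= n + 66 <= 127:
            return n + 66, i << 7      # H' = n + 66, L'7 = i, other bits 0
    raise ValueError("code point not covered by this sketch")


def encode_pair(h1: int, l1: int, payload: int) -> tuple[int, int]:
    """Steps (1.2)-(1.3): code (H', L') and embed the low bits of payload."""
    if 82 <= h1 <= 127:                # letters: 7 payload bits in L"
        return h1 | 0x80, (l1 & 0x80) | (payload & 0x7F)
    if l1 < 64:                        # 2 payload bits in bits 7 and 6 of L"
        return h1 | 0x80, ((payload & 0x3) << 6) | l1
    bit = (payload & 1) << 7           # remaining ranges carry 1 bit
    if l1 >= 128:
        return h1, bit | (l1 - 128)
    if l1 <= 109:                      # L' in [64, 109]: swap the bytes
        return l1 + 18, bit | h1
    if h1 <= 45:                       # L' in [110, 127], H' in [0, 45]
        return h1 + 82, bit | l1
    return h1 + 36, bit | (l1 - 28)    # L' in [110, 127], H' in [46, 81]
```

As a usage example, `encode_pair(*preprocess(0x4E2D), 0b11)` yields the pair `(128, 237)`: the high byte moves into [128, 209] and the two payload bits occupy the top of the low byte.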
In the message based on Unicode coding, assume that the proportion of letters and various other characters is m, the proportion of kanji characters whose low byte is less than 64 is n, and the proportion of other kanji characters is k, so that m + n + k = 1. The information embedding rate R of the whole message is then: R = m × 7/16 + n × 2/16 + k/16
If the proportion of the letters and various symbols in the whole message is m=1, the information embedding amount is the largest in this case, and 43.75% of data can be embedded in the whole message;
if the whole message has no letters or other symbols, and the low byte value in the kanji character is larger than 63, i.e. k=1, the information embedding amount is minimum in this case, and 6.25% of data can be embedded in the whole message;
in other cases, the embeddable data rate in the entire message is between 6.25% and 43.75%.
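The embedding rate can be checked numerically with a short Python sketch. The function names are hypothetical, and `text_rate` assumes that every character is either a code point with high byte 0 (counted as a letter or symbol, 7 bits) or a kanji with high byte in [78, 159] (2 bits if the low byte is below 64, otherwise 1 bit).

```python
from fractions import Fraction


def embedding_rate(m: float, n: float, k: float) -> float:
    """R = m*7/16 + n*2/16 + k/16, with m + n + k = 1."""
    if abs(m + n + k - 1) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return (7 * m + 2 * n + k) / 16


def text_rate(text: str) -> Fraction:
    """Embeddable bits divided by total bits (16 per character)."""
    bits = 0
    for ch in text:
        h, l = ord(ch) >> 8, ord(ch) & 0xFF
        if h == 0:
            bits += 7                      # letter or symbol
        elif 78 <= h <= 159:
            bits += 2 if l < 64 else 1     # kanji
        else:
            raise ValueError("character not covered by this sketch")
    return Fraction(bits, 16 * len(text))
```

The two extremes reproduce the figures given above: `embedding_rate(1, 0, 0)` is 0.4375 and `embedding_rate(0, 0, 1)` is 0.0625.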
In summary, the present invention provides a Unicode-based information embedding and extracting method. Through data analysis, segmented coding and different-domain transformation of the high and low bytes of each character, it embeds and extracts information in Unicode-coded text messages losslessly and without adding any extra data. The method can improve the transmission capacity of a system without increasing channel resources and is suitable for data transmission systems of various types. The invention is therefore suitable for space information networks, in particular satellite communication systems and navigation systems, and has wide application prospects and practicability.
What is not described in detail in the present specification belongs to the known technology of those skilled in the art.

Claims (13)

1. A Unicode-based information embedding and extracting method, characterized by comprising the following steps:
the information embedding method based on Unicode coding comprises the following steps:
(1.1) preprocessing the high byte H and the low byte L of each text message character after Unicode coding so as to concentrate the numerical ranges of the high byte and the low byte of each character, which facilitates segmented processing according to the values of the preprocessed high byte and low byte;
the specific method for preprocessing the high byte H and the low byte L of the information character after Unicode coding in step (1.1) is as follows:
(1.1.1) when the character is a kanji character, the character high byte H and low byte L are processed as follows:
H'=H-78
L'=L
wherein H' is the preprocessed high byte, whose bits from high to low are denoted H'7, H'6, H'5, H'4, H'3, H'2, H'1, H'0; L' is the preprocessed low byte, whose bits from high to low are denoted L'7, L'6, L'5, L'4, L'3, L'2, L'1, L'0;
(1.1.2) when the character is a letter or a symbol, the character high byte H and low byte L are processed as follows:
first, the low byte L is expressed in the form L=2n+i, where i is 0 or 1;
then, the preprocessed high byte H' and low byte L' are calculated according to the following formulas:
H'=n+66;
L'7=i, and L'0 to L'6 are all set to 0;
(1.2) performing segmented coding calculation on the preprocessed high byte H' and low byte L', so that the encoded high byte H" and low byte L" are concentrated in certain data bits;
(1.3) using the unused data bits in the encoded high byte H" and low byte L" for information embedding;
the information extraction method based on Unicode coding comprises the following steps:
(2.1) extracting the embedded information M according to the segment in which the high byte H" and the low byte L" of the message containing the embedded information are located;
(2.2) decoding the high byte H" and the low byte L" of the message containing the embedded information to obtain the high byte H' and the low byte L' of the message from which the coding information is removed;
(2.3) performing inverse preprocessing on the high byte H' and the low byte L' of the message from which the coding information is removed, and recovering the high byte H and the low byte L of the original data.
2. The method for embedding and extracting information based on Unicode according to claim 1, wherein when the preprocessed character high byte H' is in [82, 127], the specific method for performing the segmented coding calculation on the preprocessed high byte H' and low byte L' in step (1.2) is as follows:
the 7th bit of the encoded high byte H" is assigned 1, and the remaining bits are assigned the corresponding data bits of the preprocessed high byte H'; the 7th bit of the encoded low byte L" is assigned the corresponding data bit of the preprocessed low byte L', and the 0th to 6th bits of the low byte L" are used for information embedding.
3. The method for embedding and extracting information based on Unicode according to claim 2, wherein when the preprocessed character high byte H' is in [0, 81] and the preprocessed character low byte L' is less than 64, the specific method for performing the segmented coding calculation on the preprocessed high byte H' and low byte L' in step (1.2) is as follows:
the 7th bit of the encoded high byte H" is assigned 1, and the remaining bits are assigned the corresponding data bits of the preprocessed high byte H'; the 0th to 5th bits of the encoded low byte L" are assigned the corresponding data bits of the preprocessed low byte L', and the 6th and 7th bits of the low byte L" are used for information embedding.
4. The method for embedding and extracting information based on Unicode according to claim 2, wherein when the preprocessed character high byte H' is in [0, 81] and the preprocessed character low byte L' is in [128, 255], the specific method for performing the segmented coding calculation on the preprocessed high byte H' and low byte L' in step (1.2) is as follows:
the 7th bit of the encoded high byte H" is assigned 0, and the remaining bits are assigned the corresponding data bits of the preprocessed high byte H'; the encoded low byte L" is assigned the preprocessed low byte L' minus 128, and the 7th bit of the low byte L" is used for information embedding.
5. The method for embedding and extracting information based on Unicode according to claim 2, wherein when the preprocessed character high byte H' is in [0, 81] and the preprocessed character low byte L' is in [64, 109], the specific method for performing the segmented coding calculation on the preprocessed high byte H' and low byte L' in step (1.2) is as follows:
the 7th bit of the encoded high byte H" is assigned 0, and the remaining bits are assigned the preprocessed low byte L' plus 18; the encoded low byte L" is assigned the preprocessed high byte H', and the 7th bit of the low byte L" is used for information embedding, specifically:
H"7=0, H"=L'+18; L"=H'.
6. The method for embedding and extracting information based on Unicode according to claim 2, wherein when the preprocessed character high byte H' is in [0, 45] and the preprocessed character low byte L' is in [110, 127], the specific method for performing the segmented coding calculation on the preprocessed high byte H' and low byte L' in step (1.2) is as follows:
the 7th bit of the encoded high byte H" is assigned 0, and the remaining bits are assigned the preprocessed high byte H' plus 82; the encoded low byte L" is assigned the preprocessed low byte L', and the 7th bit of the low byte L" is used for information embedding.
7. The method for embedding and extracting information based on Unicode according to claim 2, wherein when the preprocessed character high byte H' is in [46, 81] and the preprocessed character low byte L' is in [110, 127], the specific method for performing the segmented coding calculation on the preprocessed high byte H' and low byte L' in step (1.2) is as follows:
the 7th bit of the encoded high byte H" is assigned 0, and the remaining bits are assigned the preprocessed high byte H' plus 36; the encoded low byte L" is assigned the preprocessed low byte L' minus 28, and the 7th bit of the low byte L" is used for information embedding.
8. The method for embedding and extracting information based on Unicode coding according to claim 1, wherein when the high byte H" of the message containing the embedded information is in [0, 81]:
the embedded information M is the 7th bit of the low byte L" of the message containing the embedded information;
the character high byte H' with the coding information removed is the character high byte H" containing the embedded information;
the character low byte L' with the coding information removed is the low byte L" of the message containing the embedded information with its 7th bit set to 0 and 128 then added;
the high byte H of the original character is the high byte H' of the message with the coding information removed plus 78;
the low byte L of the original character is the low byte L' of the message with the coding information removed.
9. The method for embedding and extracting information based on Unicode coding according to claim 1, wherein when the character high byte H" containing the embedded information is in [82, 127] and the character low byte L" containing the embedded information is in [0, 81] or [128, 209]:
the embedded information M is the 7th bit of the low byte L" of the message containing the embedded information;
the 7th bit of the character high byte H' with the coding information removed is assigned 0, and the remaining bits are assigned the corresponding data bits of the character low byte L" containing the embedded information;
the 7th bit of the character low byte L' with the coding information removed is assigned 0, and the remaining bits are assigned the character high byte H" containing the embedded information minus 18;
the high byte H of the original character is assigned the character high byte H' with the coding information removed plus 78;
the low byte L of the original character is assigned the character low byte L' with the coding information removed.
10. The method for embedding and extracting information based on Unicode coding according to claim 1, wherein when the character high byte H" containing the embedded information is in [82, 127] and the character low byte L" containing the embedded information is in [110, 127] or [238, 255]:
the embedded information M is the 7th bit of the low byte L" of the message containing the embedded information;
the character high byte H' with the coding information removed is assigned the character high byte H" containing the embedded information minus 82;
the 7th bit of the character low byte L' with the coding information removed is assigned 0, and the remaining bits are assigned the corresponding data bits of the character low byte L" containing the embedded information;
the high byte H of the original character is assigned the character high byte H' with the coding information removed plus 78;
the low byte L of the original character is assigned the character low byte L' with the coding information removed.
11. The method for embedding and extracting information based on Unicode coding according to claim 1, wherein when the character high byte H" containing the embedded information is in [82, 117] and the character low byte L" containing the embedded information is in [82, 99] or [210, 227]:
the embedded information M is the 7th bit of the low byte L" of the message containing the embedded information;
the character high byte H' with the coding information removed is assigned the character high byte H" containing the embedded information minus 36;
the 7th bit of the character low byte L' with the coding information removed is assigned 0, and the remaining bits are assigned the corresponding data bits of the character low byte L" containing the embedded information plus 28;
the high byte H of the original character is assigned the character high byte H' with the coding information removed plus 78;
the low byte L of the original character is assigned the character low byte L' with the coding information removed.
12. The method for embedding and extracting information based on Unicode coding according to claim 1, wherein when the character high byte H" containing the embedded information is in [128, 209]:
the embedded information M is the 7th bit and the 6th bit of the low byte L" of the message containing the embedded information;
the character high byte H' with the coding information removed is assigned the character high byte H" containing the embedded information minus 128;
the 7th bit and the 6th bit of the character low byte L' with the coding information removed are assigned 0, and the remaining bits are assigned the corresponding data bits of the character low byte L" containing the embedded information;
the high byte H of the original character is assigned the character high byte H' with the coding information removed plus 78;
the low byte L of the original character is assigned the character low byte L' with the coding information removed.
13. The method for embedding and extracting information based on Unicode coding according to claim 1, wherein when the character high byte H" containing the embedded information is in [210, 255]:
the embedded information M is the lower 7 bits of the character low byte L" containing the embedded information;
the 7th bit of the character high byte H' with the coding information removed is assigned 0, and the remaining bits are assigned the corresponding data bits of the character high byte H" containing the embedded information;
the 0th to 6th bits of the character low byte L' with the coding information removed are assigned 0, and the remaining bit is assigned the corresponding data bit of the character low byte L" containing the embedded information;
the high byte H of the original character is assigned 0;
the low byte L of the original character is assigned as: L = (H' - 66) × 2 + L'7, where L'7 is the 7th bit of L'.
CN201910740854.6A 2019-08-12 2019-08-12 Unicode-based information embedding and extracting method Active CN110472202B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910740854.6A CN110472202B (en) 2019-08-12 2019-08-12 Unicode-based information embedding and extracting method

Publications (2)

Publication Number Publication Date
CN110472202A CN110472202A (en) 2019-11-19
CN110472202B true CN110472202B (en) 2023-08-01

Family

ID=68510464

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910740854.6A Active CN110472202B (en) 2019-08-12 2019-08-12 Unicode-based information embedding and extracting method

Country Status (1)

Country Link
CN (1) CN110472202B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113518079B (en) * 2021-06-17 2022-11-11 西安空间无线电技术研究所 Data feature-based segmented information embedding method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant