CN110472202B - Unicode-based information embedding and extracting method

Info

Publication number: CN110472202B
Application number: CN201910740854.6A
Authority: CN
Inventors: 张怡; 周诠; 黎军; 沈俊; 刘娟妮; 梁薇; 李静玲; 崔涛; 呼延烺
Original assignee: Xian Institute of Space Radio Technology
Current assignee: Xian Institute of Space Radio Technology
Priority date: 2019-08-12
Filing date: 2019-08-12
Publication date: 2023-08-01
Anticipated expiration: 2039-08-12
Also published as: CN110472202A

Abstract

The invention provides an information embedding and extracting method based on Unicode coding. The information embedding method comprises the following steps: (1.1) preprocessing the high byte and the low byte of each Unicode-coded text message character so that the numerical ranges of the high byte and the low byte of each character are concentrated; (1.2) performing segmented coding on the preprocessed high byte and low byte, so that the coded high byte and low byte are concentrated in certain data bits; (1.3) using the unused data bits of the coded high byte and low byte for information embedding. The information extraction method comprises the following steps: (2.1) extracting the embedded information according to the segment in which the high byte and the low byte of the message containing the embedded information are located; (2.2) decoding the high byte and the low byte of the message containing the embedded information to obtain the high byte and the low byte of the message with the coding information removed; and (2.3) performing inverse preprocessing on the high byte and the low byte of the message with the coding information removed, recovering the high byte and the low byte of the original data.

Description

Unicode-based information embedding and extracting method

Technical Field

The invention relates to a data communication method, in particular to a Unicode-based information embedding and extracting method, which belongs to the field of communication (such as data communication technology and the like).

Background

In satellite communication systems and navigation systems, text messages of a certain format are transmitted between a transmitting end and a destination end, and the messages mainly consist of Chinese characters, various characters and various symbols. In the transmission process of satellite data, sensitive information can be embedded into the satellite data, and the satellite data and the sensitive information are cooperatively transmitted on the basis of not increasing transmission capacity.

Common methods for embedding and hiding information in text messages mainly exploit text formatting such as font size, color and line spacing, and hide the embedded information in differences of format. Examples include: shift coding, which hides information in the relative positions of text and includes line-spacing and word-spacing coding; feature coding, which embeds marks by changing some feature of a character, for example distinguishing the character widths occupied by Chinese and English punctuation or modifying fonts; synonym substitution; and text camouflage.

However, with any of the above methods, when the communication channel is occupied, other messages can be transmitted only after waiting for a certain time interval. This results in low efficiency and underutilization of channel resources.

Disclosure of Invention

The technical problem solved by the invention is as follows: a Unicode-based information embedding and extracting method is provided, in which sensitive information is embedded into a transmitted message without adding any additional information, both the original information and the embedded information can be recovered losslessly, and the system capacity is improved without adding channel resources.

The technical scheme of the invention is as follows: an information embedding and extracting method based on Unicode coding, the method comprises:

the information embedding method based on Unicode coding comprises the following steps:

(1.1) preprocessing the high byte H and the low byte L of each Unicode-coded text message character so that the numerical ranges of the high byte and the low byte of each character are concentrated, which facilitates segmentation according to the values of the preprocessed high byte and low byte;

(1.2) performing segmented coding on the preprocessed high byte H' and low byte L', so that the coded high byte H" and low byte L" are concentrated in certain data bits;

(1.3) using the unused data bits of the coded high byte H" and low byte L" for information embedding;

the information extraction method based on Unicode coding comprises the following steps:

(2.1) extracting the embedded information M according to the segment in which the high byte H" and the low byte L" of the message containing the embedded information are located;

(2.2) decoding the high byte H" and the low byte L" of the message containing the embedded information to obtain the high byte H_' and the low byte L_' of the message with the coding information removed;

(2.3) performing inverse preprocessing on the high byte H_' and the low byte L_' of the message with the coding information removed, recovering the high byte H and the low byte L of the original data.

In step (1.1), the high byte H and the low byte L of each Unicode-coded character are preprocessed as follows:

(1.1.1) when the character is a Chinese character, the high byte H and the low byte L of the character are processed as follows:

H'＝H-78

L'＝L

where H' is the preprocessed high byte, whose bits from high to low are denoted H'₇, H'₆, H'₅, H'₄, H'₃, H'₂, H'₁, H'₀; L' is the preprocessed low byte, whose bits from high to low are denoted L'₇, L'₆, L'₅, L'₄, L'₃, L'₂, L'₁, L'₀;

(1.1.2) when the character is a letter or a symbol, the high byte H and the low byte L of the character are processed as follows:

first, the low byte L is expressed in the form L＝2n+i, where i is 0 or 1;

then, the preprocessed high byte H' and low byte L' are calculated as follows:

H'＝n+66;

L'₇＝i, and L'₀ to L'₆ are all set to 0.

When the preprocessed high byte H' lies in [82, 127], the segmented coding of the preprocessed high byte H' and low byte L' in step (1.2) is performed as follows:

bit 7 of the coded high byte H" is set to 1, and its remaining bits are set to the corresponding bits of the preprocessed high byte H'; bit 7 of the coded low byte L" is the corresponding bit of the preprocessed low byte L', and bits 0 to 6 of the low byte L" are used for information embedding.

When the preprocessed high byte H' lies in [0, 81] and the preprocessed low byte L' < 64, the segmented coding in step (1.2) is performed as follows:

bit 7 of the coded high byte H" is set to 1, and its remaining bits are set to the corresponding bits of the preprocessed high byte H'; bits 0 to 5 of the coded low byte L" are set to the corresponding bits of the preprocessed low byte L', and bits 6 and 7 of the low byte L" are used for information embedding.

When the preprocessed high byte H' lies in [0, 81] and the preprocessed low byte L' lies in [128, 255], the segmented coding in step (1.2) is performed as follows:

bit 7 of the coded high byte H" is set to 0, and its remaining bits are set to the corresponding bits of the preprocessed high byte H'; the coded low byte L" is set to the preprocessed low byte L' minus 128, and bit 7 of the low byte L" is used for information embedding.

When the preprocessed high byte H' lies in [0, 81] and the preprocessed low byte L' lies in [64, 109], the segmented coding in step (1.2) is performed as follows:

bit 7 of the coded high byte H" is set to 0, and its remaining bits are set to the preprocessed low byte L' plus 18; the coded low byte L" is set to the preprocessed high byte H', and bit 7 of the low byte L" is used for information embedding, specifically:

H"₇＝0, H"＝L'+18; L"＝H';

When the preprocessed high byte H' lies in [0, 45] and the preprocessed low byte L' lies in [110, 127], the segmented coding in step (1.2) is performed as follows:

bit 7 of the coded high byte H" is set to 0, and its remaining bits are set to the preprocessed high byte H' plus 82; the coded low byte L" is set to the preprocessed low byte L', and bit 7 of the low byte L" is used for information embedding.

When the preprocessed high byte H' lies in [46, 81] and the preprocessed low byte L' lies in [110, 127], the segmented coding in step (1.2) is performed as follows:

bit 7 of the coded high byte H" is set to 0, and its remaining bits are set to the preprocessed high byte H' plus 36; the coded low byte L" is set to the preprocessed low byte L' minus 28, and bit 7 of the low byte L" is used for information embedding.
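The segments above fix how many bits each character can carry: 7 bits for letters and symbols, 2 bits for Chinese characters with L' < 64, and 1 bit for all other Chinese characters. A minimal sketch of that classification follows; the function name `capacity_bits` and the use of plain integers for H' and L' are assumptions made for illustration and do not come from the patent text.

```python
def capacity_bits(h1: int, l1: int) -> int:
    """Return the number of embeddable bits for preprocessed bytes (H', L')."""
    if 82 <= h1 <= 127:
        # Letter or symbol: bits 0 to 6 of the low byte are free.
        return 7
    if 0 <= h1 <= 81 and 0 <= l1 <= 255:
        # Chinese character: bits 6 and 7 are free when L' < 64,
        # otherwise only bit 7 is free after segmented coding.
        return 2 if l1 < 64 else 1
    raise ValueError("(H', L') outside the preprocessed ranges")


print(capacity_bits(98, 128))  # letter segment
print(capacity_bits(0, 45))    # Chinese character with L' < 64
print(capacity_bits(10, 100))  # Chinese character with L' in [64, 109]
```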

When the high byte H" of a character containing embedded information lies in [0, 81]:

the embedded information M is bit 7 of the low byte L" of the character containing the embedded information;

the high byte H_' with the coding information removed is the high byte H" of the character containing the embedded information;

the low byte L_' with the coding information removed is obtained by setting bit 7 of the low byte L" of the character containing the embedded information to 0 and adding 128;

the high byte H of the original character is the high byte H_' with the coding information removed plus 78;

the low byte L of the original character is the low byte L_' with the coding information removed.

When the high byte H" of a character containing embedded information lies in [82, 127] and its low byte L" lies in [0, 81] or [128, 209]:

bit 7 of the high byte H_' with the coding information removed is set to 0, and its remaining bits are set to the corresponding bits of the low byte L" of the character containing the embedded information;

bit 7 of the low byte L_' with the coding information removed is set to 0, and its remaining bits are set to the high byte H" of the character containing the embedded information minus 18;

the high byte H of the original character is set to the high byte H_' with the coding information removed plus 78;

the low byte L of the original character is set to the low byte L_' with the coding information removed;

When the high byte H" of a character containing embedded information lies in [82, 127] and its low byte L" lies in [110, 127] or [238, 255]:

the high byte H_' with the coding information removed is set to the high byte H" of the character containing the embedded information minus 82;

bit 7 of the low byte L_' with the coding information removed is set to 0, and its remaining bits are set to the corresponding bits of the low byte L" of the character containing the embedded information;

When the high byte H" of a character containing embedded information lies in [82, 117] and its low byte L" lies in [82, 99] or [210, 227]:

the high byte H_' with the coding information removed is set to the high byte H" of the character containing the embedded information minus 36;

bit 7 of the low byte L_' with the coding information removed is set to 0, and its remaining bits are set to the low 7 bits of the low byte L" of the character containing the embedded information plus 28;

the high byte H of the original character is set to the high byte H_' with the coding information removed plus 78;

When the high byte H" of a character containing embedded information lies in [128, 209]:

the embedded information M is bit 7 and bit 6 of the low byte L" of the character containing the embedded information;

the high byte H_' with the coding information removed is set to the high byte H" of the character containing the embedded information minus 128;

bits 7 and 6 of the low byte L_' with the coding information removed are set to 0, and its remaining bits are set to the corresponding bits of the low byte L" of the character containing the embedded information;

When the high byte H" of a character containing embedded information lies in [210, 255]:

the embedded information M is the low 7 bits of the low byte L" of the character containing the embedded information;

bit 7 of the high byte H_' with the coding information removed is set to 0, and its remaining bits are set to the corresponding bits of the high byte H" of the character containing the embedded information;

bits 0 to 6 of the low byte L_' with the coding information removed are set to 0, and its remaining bit is set to the corresponding bit of the low byte L" of the character containing the embedded information;

the high byte H of the original character is set to 0;

the low byte L of the original character is set to: L＝(H_'-66)×2+L_'₇.

Compared with the prior art, the invention has the beneficial effects that:

the invention embeds sensitive information into the transmitted message without adding additional information, damages neither the original information nor the embedded information, and improves the system capacity without adding channel resources.

Drawings

FIG. 1 is a block diagram of an implementation of information embedding and information extraction based on Unicode coding.

Detailed Description

The invention is described in detail below with reference to the accompanying drawing and specific embodiments.

Unicode is a widely used character code for computers. It extends the ASCII character set, and its main characteristic is that each character is coded with two bytes. Because it meets the needs of cross-platform text processing and conversion, the invention studies a lossless information embedding method for text messages based on Unicode coding. The characteristics of the code are first analyzed; the message characters (Chinese characters, alphabetic characters and other symbols) are then preprocessed, coded by segment and transformed between different regions, so that 1 to 7 bits of information are embedded losslessly into each character without any additional information. The method enables cooperative transmission of satellite information and sensitive information, achieves an information embedding rate of 1/16 to 7/16, recovers both the original information and the embedded information without distortion, and improves the transmission capacity of the system without increasing channel resources.

The embedded information carrier studied in the invention is a text message transmitted in a satellite communication system and a navigation system, and the message mainly consists of Chinese characters, letters and various symbols. In the Unicode coding range, [0x4E00, 0x9FBF] (hexadecimal), or [19968, 40869] in decimal, is the Chinese character range, and [0x0021, 0x007C] (hexadecimal), or [33, 124] in decimal, is the range of letters, digits and common symbols. The invention therefore analyzes and studies these two regions of Unicode coding, [0x4E00, 0x9FBF] (hexadecimal) and [0x0021, 0x007C] (hexadecimal).

In Unicode coding each character is coded with two bytes, so the invention divides the code of each character into a high byte and a low byte, where the high byte is the upper 8 bits of data and the low byte is the lower 8 bits. The characteristics and numerical ranges of the two regions above are first analyzed; the high and low bytes of the characters are then coded into different segments according to their values, so that the numerical range of the text message characters becomes more concentrated and positions for information embedding are obtained.
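As a minimal sketch of this split, assuming each character has a 16-bit code point (the helper name `split_bytes` is illustrative and does not appear in the patent):

```python
def split_bytes(ch: str) -> tuple[int, int]:
    """Return (H, L): the upper and lower 8 bits of a 16-bit code point."""
    cp = ord(ch)
    if cp > 0xFFFF:
        raise ValueError("only 16-bit code points are handled in this sketch")
    return cp >> 8, cp & 0xFF


print(split_bytes("中"))  # U+4E2D -> (78, 45)
print(split_bytes("A"))   # U+0041 -> (0, 65)
```

The Chinese character falls in the region whose high byte starts at 78 (0x4E), while the letter has a high byte of 0, which is what the later preprocessing step relies on to tell the two regions apart.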

For messages that are transmitted in satellite communication systems and navigation systems and consist of Chinese characters, letters and various symbols, the invention provides a lossless information embedding method that requires no additional information.

As shown in fig. 1, the information embedding process and the extraction process based on Unicode coding provided by the invention are specifically implemented as follows:

1. The information embedding method based on Unicode coding comprises the following steps:

Assume that each character, after Unicode coding, is represented as [H, L], where H is the high byte, 8 bits in total, whose bits from high to low are denoted (H₇, H₆, H₅, H₄, H₃, H₂, H₁, H₀), and L is the low byte, 8 bits in total, whose bits from high to low are denoted (L₇, L₆, L₅, L₄, L₃, L₂, L₁, L₀). The high and low byte values H and L given for the characters in this section are decimal values; H₇, H₆, H₅, H₄, H₃, H₂, H₁, H₀ is the 8-bit binary representation of H, and L₇, L₆, L₅, L₄, L₃, L₂, L₁, L₀ is the 8-bit binary representation of L.

The preprocessing is as follows:

(1.1.1) when the character is a Chinese character (i.e., H lies in [78, 159]), the high byte H and the low byte L of the character are processed as follows:

H'＝H-78

L'＝L

where H' is the preprocessed high byte, whose bits from high to low are denoted H'₇, H'₆, H'₅, H'₄, H'₃, H'₂, H'₁, H'₀; L' is the preprocessed low byte, whose bits from high to low are denoted L'₇, L'₆, L'₅, L'₄, L'₃, L'₂, L'₁, L'₀; after preprocessing, H' lies in [0, 81] and L' lies in [0, 255];

(1.1.2) when the character is a letter or a symbol (H is 0 and L lies in [33, 124]), the high byte H and the low byte L of the character are processed as follows:

first, the low byte L is expressed in the form L＝2n+i, where i is 0 or 1;

then, the preprocessed high byte H' and low byte L' are calculated as follows:

H'＝n+66;

L'₇＝i, and L'₀ to L'₆ are all set to 0; after preprocessing, H' lies in [82, 127] and L' is 0 or 128.
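The two preprocessing rules can be sketched as follows. The function name `preprocess` is hypothetical. One further assumption is labeled in the code: the text gives [33, 124] for L, but L = 124 yields n = 62 and H' = 128, outside the stated [82, 127], so this sketch conservatively accepts L only up to 123.

```python
def preprocess(h: int, l: int) -> tuple[int, int]:
    """Map original bytes (H, L) to preprocessed bytes (H', L')."""
    if 78 <= h <= 159 and 0 <= l <= 255:
        # (1.1.1) Chinese character: H' = H - 78 lies in [0, 81], L' = L.
        return h - 78, l
    # Assumption of this sketch: upper bound 123 rather than 124,
    # so that H' = n + 66 never exceeds 127.
    if h == 0 and 33 <= l <= 123:
        # (1.1.2) letter or symbol: write L = 2n + i.
        n, i = divmod(l, 2)
        # H' = n + 66 lies in [82, 127]; L' keeps only bit 7 = i.
        return n + 66, i << 7
    raise ValueError("character outside the ranges handled by this sketch")


print(preprocess(78, 45))  # Chinese character U+4E2D -> (0, 45)
print(preprocess(0, 65))   # letter 'A': n = 32, i = 1 -> (98, 128)
```

After this step the high byte alone identifies the character class: values in [0, 81] are Chinese characters and values in [82, 127] are letters or symbols.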

The segmented coding of the preprocessed high byte H' and low byte L' is performed as follows:

(1.2.1) when the preprocessed high byte H' lies in [82, 127], the character is a letter or a symbol, and in this case the low byte L' can only be 0 or 128; bit 7 of the coded high byte H" is set to 1, and its remaining bits are set to the corresponding bits of the preprocessed high byte H'; bit 7 of the coded low byte L" is the corresponding bit of the preprocessed low byte L', and bits 0 to 6 of the low byte L" are used for information embedding, specifically:

H"₇＝1, H"₆＝H'₆, H"₅＝H'₅, H"₄＝H'₄, H"₃＝H'₃, H"₂＝H'₂, H"₁＝H'₁, H"₀＝H'₀;

L"₇＝L'₇;

in this case the character can embed 7 bits of information, and the coded H" lies in [210, 255].

(1.2.2) when the preprocessed high byte H' lies in [0, 81] and the preprocessed low byte L' < 64 (i.e., L' lies in [0, 63]), bit 7 of the coded high byte H" is set to 1, and its remaining bits are set to the corresponding bits of the preprocessed high byte H'; bits 0 to 5 of the coded low byte L" are set to the corresponding bits of the preprocessed low byte L', and bits 6 and 7 of the low byte L" are used for information embedding, specifically:

H"₇＝1, H"₆＝H'₆, H"₅＝H'₅, H"₄＝H'₄, H"₃＝H'₃, H"₂＝H'₂, H"₁＝H'₁, H"₀＝H'₀;

L"₅＝L'₅, L"₄＝L'₄, L"₃＝L'₃, L"₂＝L'₂, L"₁＝L'₁, L"₀＝L'₀;

in this case the character can embed 2 bits of information, and the coded H" lies in [128, 209].

(1.2.3) when the preprocessed high byte H' lies in [0, 81] and the preprocessed low byte L' lies in [128, 255], bit 7 of the coded high byte H" is set to 0, and its remaining bits are set to the corresponding bits of the preprocessed high byte H'; the coded low byte L" is set to the preprocessed low byte L' minus 128, and bit 7 of the low byte L" is used for information embedding, specifically:

H"₇＝0, H"₆＝H'₆, H"₅＝H'₅, H"₄＝H'₄, H"₃＝H'₃, H"₂＝H'₂, H"₁＝H'₁, H"₀＝H'₀;

L"＝L'-128;

in this case the character can embed 1 bit of information, with H" in [0, 81] and L" in [0, 127].

(1.2.4) when the preprocessed high byte H' lies in [0, 81] and the preprocessed low byte L' lies in [64, 109], bit 7 of the coded high byte H" is set to 0, and its remaining bits are set to the preprocessed low byte L' plus 18; the coded low byte L" is set to the preprocessed high byte H', and bit 7 of the low byte L" is used for information embedding, specifically:

H"₇＝0, H"＝L'+18; L"＝H';

in this case the character can embed 1 bit of information, with H" in [82, 127] and L" in [0, 81].

(1.2.5) when the preprocessed high byte H' lies in [0, 45] and the preprocessed low byte L' lies in [110, 127], bit 7 of the coded high byte H" is set to 0, and its remaining bits are set to the preprocessed high byte H' plus 82; the coded low byte L" is set to the preprocessed low byte L', and bit 7 of the low byte L" is used for information embedding, specifically:

H"₇＝0, H"＝H'+82; L"＝L';

in this case the character can embed 1 bit of information, with H" in [82, 127] and L" in [110, 127].

(1.2.6) when the preprocessed high byte H' lies in [46, 81] and the preprocessed low byte L' lies in [110, 127], bit 7 of the coded high byte H" is set to 0, and its remaining bits are set to the preprocessed high byte H' plus 36; the coded low byte L" is set to the preprocessed low byte L' minus 28, and bit 7 of the low byte L" is used for information embedding, specifically:

H"₇＝0, H"＝H'+36; L"＝L'-28;

in this case the character can embed 1 bit of information, with H" in [82, 117] and L" in [82, 99].
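The six cases (1.2.1) to (1.2.6) can be sketched as a single function. The name `encode`, and the choice to pass the embedded information M as an unsigned integer `m` that is shifted into the free bits, are assumptions of this sketch rather than details given in the text.

```python
def encode(h1: int, l1: int, m: int) -> tuple[int, int]:
    """Encode preprocessed bytes (H', L') and embed m; return (H'', L'')."""
    if 82 <= h1 <= 127 and l1 in (0, 128):   # (1.2.1) letter or symbol
        nbits, shift, h2, l2 = 7, 0, h1 | 0x80, l1
    elif not (0 <= h1 <= 81 and 0 <= l1 <= 255):
        raise ValueError("(H', L') outside the preprocessed ranges")
    elif l1 < 64:                            # (1.2.2) bits 6 and 7 are free
        nbits, shift, h2, l2 = 2, 6, h1 | 0x80, l1
    elif l1 >= 128:                          # (1.2.3) L'' = L' - 128
        nbits, shift, h2, l2 = 1, 7, h1, l1 - 128
    elif l1 <= 109:                          # (1.2.4) the two bytes swap roles
        nbits, shift, h2, l2 = 1, 7, l1 + 18, h1
    elif h1 <= 45:                           # (1.2.5) H'' = H' + 82
        nbits, shift, h2, l2 = 1, 7, h1 + 82, l1
    else:                                    # (1.2.6) H' in [46, 81]
        nbits, shift, h2, l2 = 1, 7, h1 + 36, l1 - 28
    if not 0 <= m < (1 << nbits):
        raise ValueError(f"m must fit in {nbits} bit(s)")
    # Place the embedded bits in the unused positions of the low byte.
    return h2, l2 | (m << shift)


print(encode(98, 128, 0b1010101))  # letter, 7 bits -> (226, 213)
print(encode(0, 45, 0b11))         # L' < 64, 2 bits -> (128, 237)
print(encode(10, 100, 1))          # L' in [64, 109], 1 bit -> (118, 138)
```

Cases (1.2.4) to (1.2.6) all map into H" in [82, 127], and they stay distinguishable because the low 7 bits of L" fall in the disjoint ranges [0, 81], [110, 127] and [82, 99] respectively.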

The segmented coding and information embedding cases of the invention are summarized in the following table:

Table 1 Numerical ranges of the high and low bytes of characters before and after information embedding

2. The information extraction method based on Unicode coding comprises the following steps:

After receiving a message containing embedded information, the receiving end divides the received data into a high byte H" and a low byte L". It first analyzes the characteristics of the high and low bytes, decodes the data according to the result of the analysis, then extracts the embedded information M and at the same time recovers the original data H and L, that is, the high byte H and the low byte L of the Unicode-coded text message character.

The ranges in which the high byte H" and the low byte L" lie are determined, and H" and L" are transformed according to the region they fall in.

(2.1.1) when H" lies in [0, 81], 1 bit of information is embedded in bit 7 of the low byte of the character. The embedded information M is therefore bit 7 of the low byte L" of the message containing the embedded information.

(2.1.2) when H" lies in [82, 127] and L" lies in [0, 81] or [128, 209], 1 bit of information is embedded in bit 7 of the low byte of the character. The embedded information M is therefore bit 7 of the low byte L" of the message containing the embedded information.

(2.1.3) when H" lies in [82, 127] and L" lies in [110, 127] or [238, 255], the original data is a Chinese character and 1 bit of information is embedded in bit 7 of the low byte of the character. The embedded information M is therefore bit 7 of the low byte L" of the message containing the embedded information.

(2.1.4) when H" lies in [82, 117] and L" lies in [82, 99] or [210, 227], 1 bit of information is embedded in bit 7 of the low byte of the character. The embedded information M is therefore bit 7 of the low byte L" of the message containing the embedded information.

(2.1.5) when H" lies in [128, 209], the original data is a Chinese character and 2 bits of information are embedded in bits 6 and 7 of the low byte of the character. The embedded information M is therefore bit 7 and bit 6 of the low byte L" of the message containing the embedded information.

(2.1.6) when H" lies in [210, 255], the original data is a letter or symbol character and 7 bits of information are embedded in bits 0, 1, 2, 3, 4, 5 and 6 of the low byte of the character. The embedded information M is therefore the low 7 bits of the low byte L" of the character containing the embedded information;
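The extraction rules above depend mainly on the range of the received high byte. A rough sketch follows, assuming the embedded bits are returned as an unsigned integer together with their count; the name `extract_bits` is illustrative, and the sketch does not check that the low byte falls in one of the valid sub-ranges listed in (2.1.2) to (2.1.4).

```python
def extract_bits(h2: int, l2: int) -> tuple[int, int]:
    """Return (m, nbits): the embedded value and its width in bits."""
    if not (0 <= h2 <= 255 and 0 <= l2 <= 255):
        raise ValueError("bytes must lie in [0, 255]")
    if h2 >= 210:            # (2.1.6) letter or symbol: low 7 bits of L''
        return l2 & 0x7F, 7
    if h2 >= 128:            # (2.1.5) bits 7 and 6 of L''
        return l2 >> 6, 2
    # (2.1.1) to (2.1.4): H'' in [0, 127], bit 7 of L''
    return l2 >> 7, 1


print(extract_bits(226, 213))  # -> (85, 7)
print(extract_bits(128, 237))  # -> (3, 2)
print(extract_bits(118, 138))  # -> (1, 1)
```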

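(2.2) decoding the high byte H" and the low byte L" of the message containing the embedded information to obtain the high byte H_' and the low byte L_' of the message with the coding information removed. The decoding method is as follows:

(2.2.1) when H" lies in [0, 81]:

the high byte H_' with the coding information removed is the high byte H" of the character containing the embedded information;

the low byte L_' with the coding information removed is obtained by setting bit 7 of the low byte L" of the message containing the embedded information to 0 and adding 128.

(2.2.2) when H" lies in [82, 127] and L" lies in [0, 81] or [128, 209]:

bit 7 of the high byte H_' with the coding information removed is set to 0, and its remaining bits are set to the corresponding bits of the low byte L" of the character containing the embedded information;

bit 7 of the low byte L_' with the coding information removed is set to 0, and its remaining bits are set to the high byte H" of the character containing the embedded information minus 18.

(2.2.3) when H" lies in [82, 127] and L" lies in [110, 127] or [238, 255]:

the high byte H_' with the coding information removed is set to the high byte H" of the character containing the embedded information minus 82;

bit 7 of the low byte L_' with the coding information removed is set to 0, and its remaining bits are set to the corresponding bits of the low byte L" of the character containing the embedded information.

(2.2.4) when H" lies in [82, 117] and L" lies in [82, 99] or [210, 227]:

the high byte H_' with the coding information removed is set to the high byte H" of the character containing the embedded information minus 36;

bit 7 of the low byte L_' with the coding information removed is set to 0, and its remaining bits are set to the low 7 bits of the low byte L" of the character containing the embedded information plus 28.

(2.2.5) when H" lies in [128, 209]:

the high byte H_' with the coding information removed is set to the high byte H" of the character containing the embedded information minus 128;

bits 7 and 6 of the low byte L_' with the coding information removed are set to 0, and its remaining bits are set to the corresponding bits of the low byte L" of the character containing the embedded information.

(2.2.6) when H" lies in [210, 255]:

bit 7 of the high byte H_' with the coding information removed is set to 0, and its remaining bits are set to the corresponding bits of the high byte H" of the character containing the embedded information;

bits 0 to 6 of the low byte L_' with the coding information removed are set to 0, and its remaining bit is set to the corresponding bit of the low byte L" of the character containing the embedded information.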

(2.3) performing inverse preprocessing on the high byte H_' and the low byte L_' of the message with the coding information removed, recovering the high byte H and the low byte L of the original data:

(2.3.1) when H" lies in [0, 81]:

the high byte H of the original character is the high byte H_' with the coding information removed plus 78;

the low byte L of the original character is the low byte L_' with the coding information removed;

H lies in [78, 159] and L lies in [128, 255].

(2.3.2) when H" lies in [82, 127] and L" lies in [0, 81] or [128, 209]:

the high byte H of the original character is H_' plus 78;

the low byte L of the original character is L_';

H lies in [78, 159] and L lies in [64, 109].

(2.3.3) when H" lies in [82, 127] and L" lies in [110, 127] or [238, 255]:

the high byte H of the original character is H_' plus 78;

the low byte L of the original character is set to the low byte L_' with the coding information removed;

H lies in [78, 123] and L lies in [110, 127].

(2.3.4) when H" lies in [82, 117] and L" lies in [82, 99] or [210, 227]:

the high byte H of the original character is H_' plus 78;

the low byte L of the original character is L_';

H lies in [124, 159] and L lies in [110, 127].

(2.3.5) when H" lies in [128, 209]:

the high byte H of the original character is H_' plus 78;

the low byte L of the original character is L_';

H lies in [78, 159] and L lies in [0, 63].

(2.3.6) when H" lies in [210, 255]:

the high byte H of the original character is set to 0;

the low byte L of the original character is set to: L＝(H_'-66)×2+L_'₇.

H is 0 and L lies in [33, 124].
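Steps (2.2) and (2.3) can be combined into one sketch that maps received bytes straight back to the original character bytes. The name `decode` and the local names `h_` and `l_` (standing for H_' and L_') are illustrative assumptions, as is the choice to raise an error for byte pairs that no coding segment can produce.

```python
def decode(h2: int, l2: int) -> tuple[int, int]:
    """Recover the original bytes (H, L) from received bytes (H'', L'')."""
    if not (0 <= h2 <= 255 and 0 <= l2 <= 255):
        raise ValueError("bytes must lie in [0, 255]")
    low = l2 & 0x7F                          # L'' with bit 7 cleared
    if h2 >= 210:                            # case 6: letter or symbol
        h_, l_ = h2 & 0x7F, l2 & 0x80
        return 0, (h_ - 66) * 2 + (l_ >> 7)  # L = (H_' - 66) * 2 + L_'7
    if h2 >= 128:                            # case 5: clear bits 7 and 6
        h_, l_ = h2 - 128, l2 & 0x3F
    elif h2 <= 81:                           # case 1: restore the 128 offset
        h_, l_ = h2, low + 128
    elif low <= 81:                          # case 2: bytes were swapped
        h_, l_ = low, h2 - 18
    elif low >= 110:                         # case 3
        h_, l_ = h2 - 82, low
    elif 82 <= low <= 99 and h2 <= 117:      # case 4
        h_, l_ = h2 - 36, low + 28
    else:
        raise ValueError("byte pair is not produced by any coding segment")
    # Inverse preprocessing for Chinese characters: H = H_' + 78, L = L_'.
    return h_ + 78, l_


print(decode(226, 213))  # -> (0, 65), the letter 'A'
print(decode(128, 237))  # -> (78, 45), U+4E2D
print(decode(118, 138))  # -> (88, 100)
```

The result does not depend on the embedded bits: changing only the embedding positions of L" (for example bit 7 in the third call) returns the same original bytes.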

Thus, the extraction of the embedded information based on Unicode coding is completed and the original message is restored.

When information is embedded, the invention separates the high byte and the low byte of the data after Unicode coding, analyzes, processes and codes different ranges of the high byte and the low byte, and increases the embedding amount of character information.

In a message based on Unicode coding, let m be the proportion of letters and other symbols, n the proportion of Chinese characters whose low byte is less than 64, and k the proportion of the remaining Chinese characters, so that m+n+k＝1. The information embedding rate R of the whole message is then: R＝m×7/16+n×2/16+k/16

If letters and symbols make up the whole message, i.e. m＝1, the information embedding amount is largest, and data amounting to 43.75% of the message can be embedded;

if the message contains no letters or other symbols and the low byte of every Chinese character is greater than 63, i.e. k＝1, the information embedding amount is smallest, and data amounting to 6.25% of the message can be embedded;

in other cases, the embeddable data rate of the whole message is between 6.25% and 43.75%.
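The rate formula can be checked numerically with a short sketch; the function name `embedding_rate` is hypothetical, and the tolerance used to test m+n+k＝1 is an arbitrary choice of this example.

```python
def embedding_rate(m: float, n: float, k: float) -> float:
    """R = m*7/16 + n*2/16 + k/16 for proportions with m + n + k = 1."""
    if min(m, n, k) < 0 or abs(m + n + k - 1.0) > 1e-9:
        raise ValueError("m, n and k must be non-negative and sum to 1")
    # Each character occupies 16 bits and carries 7, 2 or 1 embedded bits.
    return (7 * m + 2 * n + k) / 16


print(embedding_rate(1, 0, 0))  # only letters and symbols -> 0.4375
print(embedding_rate(0, 0, 1))  # only 1-bit Chinese characters -> 0.0625
```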

In summary, the invention provides a Unicode-based information embedding and extracting method which, through data analysis, segmented coding and transformation between different regions of the high and low bytes of characters, embeds information losslessly in Unicode-coded text messages without any additional information and extracts it again. The method improves the transmission capacity of the system without increasing channel resources and is suitable for data transmission systems of various types. The invention is therefore suitable for space information networks, in particular satellite communication systems and navigation systems, and has wide application prospects and practicability.

What is not described in detail in the present specification belongs to the known technology of those skilled in the art.

Claims

1. The Unicode-based information embedding and extracting method is characterized by comprising the following steps of:

H'＝H-78

L'＝L

first, the low byte L is expressed in the form L＝2n+i, where i is 0 or 1;

H'＝n+66;

L'₇＝i, and L'₀ to L'₆ are all set to 0;

(2.2) decoding the high byte H" and the low byte L" of the message containing the embedded information to obtain a high byte H' and a low byte L' of the message from which the coding information is removed;

(2.3) performing inverse preprocessing on the high byte H' and the low byte L' of the message from which the coding information is removed, and recovering the high byte H and the low byte L of the original data.

2. The Unicode-based information embedding and extracting method according to claim 1, wherein when the high byte H' of the preprocessed character lies in [82, 127], the segmented encoding calculation performed on the preprocessed high byte H' and low byte L' in step (1.2) is as follows:

the 7th bit of the encoded high byte H" is assigned 1 and its remaining bits are assigned the corresponding data bits of the preprocessed high byte H'; the 7th bit of the encoded low byte L" is the corresponding data bit of the preprocessed low byte L'; and the 0th to 6th bits of the encoded low byte L" are used for information embedding.
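The bit assignment of this claim can be pictured with a short Python sketch. It is a minimal illustration that assumes bit 7 is the most significant bit of a byte and that the 7 bits to embed are supplied as an integer in [0, 127]; the function name and the `payload` parameter are hypothetical.

```python
def encode_segment_82_127(h_p: int, l_p: int, payload: int) -> tuple:
    """Encode (H', L') with H' in [82, 127], embedding 7 payload bits."""
    if not 82 <= h_p <= 127:
        raise ValueError("H' must lie in [82, 127]")
    if not 0 <= payload <= 0x7F:
        raise ValueError("payload must fit in 7 bits")
    h_enc = 0x80 | h_p                 # bit 7 set to 1, bits 0..6 copied from H'
    l_enc = (l_p & 0x80) | payload     # bit 7 kept from L', bits 0..6 carry the payload
    return h_enc, l_enc


print(encode_segment_82_127(98, 128, 0b1010101))  # (226, 213)
```

With this assignment the encoded high byte always falls in [210, 255].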

3. The Unicode-based information embedding and extracting method according to claim 2, wherein when the high byte H' of the preprocessed character lies in [0, 81] and the low byte L' of the preprocessed character is less than 64, the segmented encoding calculation performed on the preprocessed high byte H' and low byte L' in step (1.2) is as follows:

4. The Unicode-based information embedding and extracting method according to claim 2, wherein when the high byte H' of the preprocessed character lies in [0, 81] and the low byte L' of the preprocessed character lies in [128, 255], the segmented encoding calculation performed on the preprocessed high byte H' and low byte L' in step (1.2) is as follows:

5. The Unicode-based information embedding and extracting method according to claim 2, wherein when the high byte H' of the preprocessed character lies in [0, 81] and the low byte L' of the preprocessed character lies in [64, 109], the segmented encoding calculation performed on the preprocessed high byte H' and low byte L' in step (1.2) is as follows:

H"₇ = 0, H" = L' + 18; L" = H'.
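The byte swap in the formula of claim 5 can be sketched in Python as follows. The formula itself does not say where the embedded bit is placed; the sketch assumes it occupies bit 7 (taken as the most significant bit) of the encoded low byte, which is an inference from the low-byte ranges [0, 81] and [128, 209] named in claim 9. The function name is hypothetical.

```python
def encode_swap_segment(h_p: int, l_p: int, bit: int) -> tuple:
    """Encode (H', L') with H' in [0, 81] and L' in [64, 109]."""
    if not (0 <= h_p <= 81 and 64 <= l_p <= 109):
        raise ValueError("requires H' in [0, 81] and L' in [64, 109]")
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    h_enc = l_p + 18            # lands in [82, 127], so bit 7 of the result is 0
    l_enc = (bit << 7) | h_p    # H' fits in 7 bits; bit 7 holds the payload (assumption)
    return h_enc, l_enc


print(encode_swap_segment(0, 100, 1))   # (118, 128)
```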

6. The Unicode-based information embedding and extracting method according to claim 2, wherein when the high byte H' of the preprocessed character lies in [0, 45] and the low byte L' of the preprocessed character lies in [110, 127], the segmented encoding calculation performed on the preprocessed high byte H' and low byte L' in step (1.2) is as follows:

7. The Unicode-based information embedding and extracting method according to claim 2, wherein when the high byte H' of the preprocessed character lies in [46, 81] and the low byte L' of the preprocessed character lies in [110, 127], the segmented encoding calculation performed on the preprocessed high byte H' and low byte L' in step (1.2) is as follows:

8. The Unicode-based information embedding and extracting method according to claim 1, wherein when the high byte H" of the message containing the embedded information lies in [0, 81]:

the embedded information M is the 7th bit of the low byte L" of the message containing the embedded information;

the character high byte H' with the encoded information removed is the character high byte H" containing the embedded information;

the character low byte L' with the encoded information removed is the message low byte L" containing the embedded information with its 7th bit set to 0 and 128 then added;

the high byte H of the original character is the high byte H' of the message from which the encoded information was removed plus 78;

the low byte L of the original character is the low byte L' of the message from which the encoded information is removed.
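A minimal Python sketch of the extraction described in this claim is given below. It assumes that bit 7 is the most significant bit of a byte, which is consistent with the addition of 128; the function name is hypothetical.

```python
def decode_segment_0_81(h_enc: int, l_enc: int) -> tuple:
    """Return (M, H, L) for an encoded character whose high byte lies in [0, 81]."""
    if not 0 <= h_enc <= 81:
        raise ValueError("encoded high byte must lie in [0, 81]")
    bit = l_enc >> 7                 # M: bit 7 of the encoded low byte
    h_p = h_enc                      # H' equals the encoded high byte
    l_p = (l_enc & 0x7F) + 128       # clear bit 7, then add 128
    return bit, h_p + 78, l_p        # H = H' + 78, L = L'


print(decode_segment_0_81(0, 44))    # (0, 78, 172)
print(decode_segment_0_81(0, 172))   # (1, 78, 172)
```

Both calls recover the same original bytes and differ only in the extracted bit.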

9. The Unicode-based information embedding and extracting method according to claim 1, wherein when the character high byte H" containing the embedded information lies in [82, 127] and the character low byte L" containing the embedded information lies in [0, 81] or [128, 209]:

the 7th bit of the character high byte H' with the encoded information removed is assigned 0, and the remaining bits are assigned the corresponding data bits of the character low byte L" containing the embedded information;

the 7th bit of the character low byte L' with the encoded information removed is assigned 0, and the remaining bits are assigned the character high byte H" containing the embedded information minus 18;

the high byte H of the original character is assigned as the character high byte H' with the encoded information removed plus 78;

the low byte L of the original character is assigned as the character low byte L' from which the encoded information is removed.
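The decoding steps of this claim can be sketched in Python as follows. The claim text above does not name the embedded bit; the sketch assumes it is bit 7 (taken as the most significant bit) of the encoded low byte, as the two low-byte ranges suggest. The function name is hypothetical.

```python
def decode_swap_segment(h_enc: int, l_enc: int) -> tuple:
    """Return (M, H, L) for H'' in [82, 127] and L'' in [0, 81] or [128, 209]."""
    if not 82 <= h_enc <= 127 or (l_enc & 0x7F) > 81:
        raise ValueError("bytes fall outside the segment handled here")
    bit = l_enc >> 7            # assumed position of the embedded bit
    h_p = l_enc & 0x7F          # H': bit 7 cleared, remaining bits taken from L''
    l_p = h_enc - 18            # L': encoded high byte minus 18
    return bit, h_p + 78, l_p   # H = H' + 78, L = L'


print(decode_swap_segment(118, 128))  # (1, 78, 100)
```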

10. The Unicode-based information embedding and extracting method according to claim 1, wherein when the character high byte H" containing the embedded information lies in [82, 127] and the character low byte L" containing the embedded information lies in [110, 127] or [238, 255]:

the character high byte H' with the encoded information removed is assigned the character high byte H" containing the embedded information minus 82;

the 7th bit of the character low byte L' with the encoded information removed is assigned 0, and the remaining bits are assigned the corresponding data bits of the character low byte L" containing the embedded information;
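As an illustration, the two assignments of this claim can be written in Python as below. The sketch returns only the bytes H' and L' with the encoded information removed, since the claim text above stops there. Treating bit 7 (the most significant bit) of the encoded low byte as the embedded bit is an assumption, and the function name is hypothetical.

```python
def decode_segment_h82_l110(h_enc: int, l_enc: int) -> tuple:
    """Return (M, H', L') for H'' in [82, 127] and L'' in [110, 127] or [238, 255]."""
    if not 82 <= h_enc <= 127 or not 110 <= (l_enc & 0x7F) <= 127:
        raise ValueError("bytes fall outside the segment handled here")
    bit = l_enc >> 7         # assumed position of the embedded bit
    h_p = h_enc - 82         # H': encoded high byte minus 82
    l_p = l_enc & 0x7F       # L': bit 7 cleared, remaining bits unchanged
    return bit, h_p, l_p


print(decode_segment_h82_l110(82, 238))  # (1, 0, 110)
```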

11. The Unicode-based information embedding and extracting method according to claim 1, wherein when the character high byte H" containing the embedded information lies in [82, 117] and the character low byte L" containing the embedded information lies in [82, 99] or [210, 227]:

the 7th bit of the character low byte L' with the encoded information removed is assigned 0, and the remaining bits are assigned the corresponding data bits of the character low byte L" containing the embedded information plus 28;

12. The Unicode-based information embedding and extracting method according to claim 1, wherein when the character high byte H" containing the embedded information lies in [128, 209]:

the embedded information M is the 7th bit and the 6th bit of the low byte L" of the message containing the embedded information;

the character high byte H' with the encoded information removed is assigned the character high byte H" containing the embedded information minus 128;

the 7th bit and the 6th bit of the character low byte L' with the encoded information removed are assigned 0, and the remaining bits are assigned the corresponding data bits of the character low byte L" containing the embedded information;
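The two-bit extraction of this claim can be sketched in Python as follows, assuming bit 7 is the most significant bit of a byte. The function name is hypothetical, and the sketch returns the bytes H' and L' with the encoded information removed rather than the original bytes.

```python
def decode_segment_128_209(h_enc: int, l_enc: int) -> tuple:
    """Return (M, H', L') for an encoded high byte in [128, 209]."""
    if not 128 <= h_enc <= 209:
        raise ValueError("encoded high byte must lie in [128, 209]")
    payload = l_enc >> 6     # M: bit 7 and bit 6 of the encoded low byte
    h_p = h_enc - 128        # H': encoded high byte minus 128
    l_p = l_enc & 0x3F       # L': bits 7 and 6 cleared, bits 0..5 unchanged
    return payload, h_p, l_p


print(decode_segment_128_209(128, 0b11101101))  # (3, 0, 45)
```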

13. The Unicode-based information embedding and extracting method according to claim 1, wherein when the character high byte H" containing the embedded information lies in [210, 255]:

the 7th bit of the character high byte H' with the encoded information removed is assigned 0, and the remaining bits are assigned the corresponding data bits of the character high byte H" containing the embedded information;

the 0th to 6th bits of the character low byte L' with the encoded information removed are assigned 0, and the remaining bit is assigned the corresponding data bit of the character low byte L" containing the embedded information;

the high byte H of the original character is assigned 0;

the low byte L of the original character is assigned as: L = (H' - 66) × 2 + L'₇.
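For illustration, the recovery described in this claim can be sketched in Python as below. The claim text above does not list the embedded information for this segment; the sketch assumes it is the 0th to 6th bits of the encoded low byte, in line with claim 2, and that bit 7 is the most significant bit of a byte. The function name is hypothetical.

```python
def decode_segment_210_255(h_enc: int, l_enc: int) -> tuple:
    """Return (M, H, L) for an encoded high byte in [210, 255]."""
    if not 210 <= h_enc <= 255:
        raise ValueError("encoded high byte must lie in [210, 255]")
    payload = l_enc & 0x7F                 # assumed payload: bits 0..6 of the low byte
    h_p = h_enc & 0x7F                     # H': bit 7 cleared
    l_p = l_enc & 0x80                     # L': bits 0..6 cleared, bit 7 kept
    low = (h_p - 66) * 2 + (l_p >> 7)      # L = (H' - 66) x 2 + L'7
    return payload, 0, low                 # H is assigned 0


print(decode_segment_210_255(226, 213))    # (85, 0, 65)
```

In this example the recovered bytes (0, 65) correspond to U+0041, and the extracted payload is 85.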