CN116645971A - Semantic communication text transmission optimization method based on deep learning - Google Patents
- Publication number
- CN116645971A CN116645971A CN202310512333.1A CN202310512333A CN116645971A CN 116645971 A CN116645971 A CN 116645971A CN 202310512333 A CN202310512333 A CN 202310512333A CN 116645971 A CN116645971 A CN 116645971A
- Authority
- CN
- China
- Prior art keywords
- semantic
- channel
- sequence
- optimization
- information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/3344—Query execution using natural language analysis
- G06F16/35—Clustering; Classification
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/30—Semantic analysis
- G06N3/0455—Auto-encoder networks; Encoder-decoder networks
- G06N3/0464—Convolutional networks [CNN, ConvNet]
- G06N3/08—Learning methods
- G10L19/18—Vocoders using multiple modes
- H04B17/30—Monitoring; Testing of propagation channels
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention provides a semantic communication text transmission optimization method based on deep learning, addressing the limited data compression of traditional communication systems. To further optimize performance when computing resources at the data processing end are limited, channel coding and decoding are treated as a black box and the optimization focuses on maximizing semantic information. This further improves the accuracy of text semantic transmission in the semantic communication system and strikes a good balance between complexity and performance.
Description
Technical Field
The invention relates to semantic communication technology, in particular to a deep-learning-based semantic communication method, and more particularly to a deep-learning-based semantic communication optimization method oriented toward text transmission.
Background
From 1G to 5G, communication has mainly been concerned with accurately and efficiently transmitting bits from a transmitter to a receiver. Under the traditional communication architecture, transmission rates on the order of gigabits per second (Gbps) are achievable and system capacity gradually approaches the Shannon limit. Meanwhile, intelligent application scenarios such as human-computer interaction, autonomous driving, geological monitoring, and remote health today generate incredible volumes of data; for example, current cellular networks must handle exponentially growing traffic, with uplink and downlink data rates of 1 Tb/s, traffic densities of 1-10, latencies of 0.1 ms, and the like, which inevitably pose huge challenges to traditional communication systems. Semantic Communication (SC), an important product of the convergence of Deep Learning (DL) and communication technology, has immeasurable potential in data compression and signaling enhancement.
Compared with mature syntactic (bit-level) communication technology, research on semantic communication is still at a preliminary stage. Scientists have carried out some exploratory work in the field and made progress in the architectural design of semantic communication systems, the training of background knowledge bases at the algorithmic level, receiver-side applications, and so on. However, only a few studies examine semantic communication optimization algorithms in detail; most current optimization algorithms do not fully account for scarce computing resources, offer only theoretical demonstrations, and lack application-level simulation verification. For example, the Universal Transformer is an optimization algorithm based on the Transformer model that dynamically changes the semantic codec network by adding an adaptive network, so as to suit the transmission of texts of differing complexity; however, the reduction in transmission computation does not adequately offset the computation added by the adaptive network.
Therefore, the invention provides a deep-learning-based semantic communication optimization algorithm which, for data- and resource-intensive 6G mobile communication scenarios, takes maximizing semantic information as the optimization target of the semantic communication system and achieves a good balance between complexity and performance.
Disclosure of Invention
Purpose of the invention: to address the problems in the prior art, a deep-learning-based semantic communication optimization algorithm oriented toward text data transmission is provided. The method uses a Transformer to extract and recover semantic information, so that the receiving end can recover semantic information to a greater extent under low signal-to-noise-ratio conditions.
Technical scheme: to address the limited data compression of traditional communication systems, a semantic communication system built on a Transformer extracts and compresses semantic information in a specific scenario; the extracted semantic information then undergoes joint source-channel coding/decoding so that it is retained to a greater extent under limited channel capacity. For further optimization of the text-oriented semantic communication system when computing resources at the data processing end are limited, channel coding and decoding are treated as a black box and optimization focuses on maximizing semantic information, further improving the text semantic transmission accuracy of the semantic communication system. Finally, based on the constructed mathematical model, the scheme is compared and tested against other schemes in different channel environments, and the robustness of the system under limited computing resources is analyzed and verified. The invention is realized by the following technical scheme: a semantic communication text transmission optimization method based on deep learning, comprising the following steps:
(1) At a transmitting end, firstly, preprocessing an input corpus to generate a training set, a testing set and a corresponding dictionary, so that the predicted text can be recovered conveniently;
(2) According to the initial model parameters, perform semantic coding S_α(·) on the input sentence s; the semantic representation sequence generated by the Transformer-based semantic encoder is m = S_α(s);
(3) Apply channel coding C_β(·) to ensure stable transmission of the sequence over the channel; the coded symbol stream is x = C_β[S_α(s)];
(4) Establishing a channel model according to the required signal-to-noise ratio condition and the environment for transmitting information;
(5) At the receiving end, the channel output signal y is first sent to the channel decoding module to recover the semantic representation sequence n;
(6) Taking channel coding and decoding as a black box, extracting a recovered semantic representation sequence n and a semantic representation sequence m generated by a semantic encoder, and sending the semantic representation sequence n and the semantic representation sequence m into a semantic optimization network to obtain a loss value required by optimizing semantic information;
(7) According to the local background knowledge base at the receiving end, perform semantic decoding on the recovered semantic representation sequence to obtain the predicted text sequence s';
(8) Compute the cross-entropy loss between the predicted sequence s' and the target sequence s, and back-propagate the result together with the semantic optimization function to train the system model;
(9) In the performance analysis stage, test the trained system under different channel environments, using BLEU or semantic similarity as the evaluation index, with emphasis on system performance under limited computing resources.
Further, the step (1) includes the following specific steps:
(1a) Data cleaning: removing accent marks in a language, filtering out unnecessary characters such as XML labels, special symbols and the like, and adding a blank in front of punctuation marks at the end of a sentence so as to separate the punctuation marks from text contents;
(1b) Word segmentation: the text is split into corresponding words, phrases or symbols, etc. for easier subsequent processing. The method employed varies for corpora in different languages.
If the input text is English, french, german and the like, the processing mode is simpler, regular expression word segmentation can be used, non-English characters, namely non-a-Z 'and non-A-Z' characters, can be directly deleted, and capital letters are converted into lowercase forms, so that repeated vocabulary is reduced, and a model is simplified.
And for the processing of the Chinese database, the processing is relatively complex. Firstly, a Chinese word segmentation component 'Jieba' library in a Python third party library is required to be called, and a cut function in the library is used for splitting a Chinese text to be processed. In addition, the deletion operation is required for the non-Chinese characters, and the characters to be processed are known to be the characters of non-one- ' according to the initial Chinese character ' one ' and the ending Chinese character ' ' of the ASCII code table. Finally, the operation of removing stop words is also carried out;
(1c) Clause: the long texts are separated according to sentence standards, so that single sentences can be conveniently processed and sentence lengths can be counted. And the sequence start-end marking is carried out, so that the model can be helped to better identify the sentence structure and grammar rule of the processed language, and the performance of the model are improved;
(1d) Vocabulary construction: creating a list containing all words and uniquely encoding the words so as to perform word frequency statistics and vector conversion on the sentences;
(1e) Sequence Padding: so that all sequences have the same length and can be conveniently fed to the training model, filler tokens must be added to sentences of differing lengths; 0 is generally chosen as the filler, and post-padding is applied so that every sentence is padded to the longest sentence length;
(1f) Dataset partitioning: the dataset is split 9:1 into a training set and a test set, so that the experimental model can be evaluated for performance at different stages using the corresponding data.
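The preprocessing steps (1a)-(1f) above can be outlined as a minimal Python sketch for an English corpus (the function names and toy sentences are illustrative, not part of the invention):

```python
import re

PAD, START, END = 0, 1, 2  # reserved token ids for padding and sequence marks

def tokenize(text):
    """(1a)-(1b): keep only letters, lowercase, split on whitespace."""
    return re.sub(r"[^a-zA-Z ]", " ", text).lower().split()

def build_vocab(token_lists):
    """(1d): assign a unique integer id to every word after the reserved ids."""
    vocab = {"<pad>": PAD, "<s>": START, "</s>": END}
    for tokens in token_lists:
        for w in tokens:
            vocab.setdefault(w, len(vocab))
    return vocab

def encode_and_pad(token_lists, vocab):
    """(1c)+(1e): add start/end marks, then post-pad with 0 to the longest length."""
    seqs = [[START] + [vocab[w] for w in t] + [END] for t in token_lists]
    max_len = max(len(s) for s in seqs)
    return [s + [PAD] * (max_len - len(s)) for s in seqs]

def split_dataset(items, ratio=0.9):
    """(1f): 9:1 split into training and test sets."""
    cut = int(len(items) * ratio)
    return items[:cut], items[cut:]
```

With a real corpus the same pipeline runs per sentence after sentence segmentation; Chinese text would first pass through a segmenter such as Jieba.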
Further, the step (2) includes the following specific steps:
(2a) A semantic codec model is built, comprising 4 encoder layers and 4 decoder layers. The input sentences are semantically encoded with the help of a local knowledge base, extracting the semantic information to be sent over the physical channel and achieving effective data compression;
(2b) For an input sentence s, s = [w_1, w_2, ..., w_l], where w_i denotes the i-th word in the sentence and l is the sentence length. First, each input word is expanded to the model dimension 128 by word embedding, and the generated word vectors are given both feature semantics and specific position information through the following positional encodings:
PE(position, 2i) = sin(position / 10000^(2i/d))  expression 1
PE(position, 2i+1) = cos(position / 10000^(2i/d))  expression 2
wherein: d is the dimension of the word vector, i is the index of its dimension, and position is the position index of the word vector.
The sentence fed into the semantic encoder then takes the form of the sum of the word embeddings and the positional encodings, E = Embedding(s) + PE;
(2c) The key to the self-attention calculation over the input word vectors is training three weight matrices in the self-attention network, namely W_Q, W_K, W_V. Multiplying the word embedding vectors by these matrices yields the Query vector (Q), the Key vector (K), and the Value vector (V), and attention is then computed by:
Attention(Q, K, V) = softmax(Q·K^T / sqrt(d_k))·V  expression 3
wherein: d_k is the dimension of the key vectors;
(2d) Since the Transformer uses a Multi-head mechanism, i.e. multiple sets of W_Q, W_K, W_V matrices yielding multiple sets of Q, K, V vectors, the Z matrices computed by each head are integrated through a splicing (Concat) operation so that the model can simultaneously attend to information at different semantic levels:
MultiHead = Concat(Z_1, Z_2, ..., Z_k)  expression 4
(2e) The spliced result is passed through a feedforward network and, after residual connection (Residual Connection) and layer normalization (Layer Normalization, LN), is passed to the channel coding module; the generated semantic representation sequence is S_α(s).
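Expressions 1-3 (the sinusoidal positional encoding and scaled dot-product attention) can be sketched in NumPy as follows; this is an illustrative single-head version, not the invention's trained network:

```python
import numpy as np

def positional_encoding(length, d):
    """Sinusoidal position codes (expressions 1-2): even dims sine, odd dims cosine."""
    pos = np.arange(length)[:, None]               # (length, 1)
    i = np.arange(d // 2)[None, :]                 # (1, d/2)
    angle = pos / np.power(10000.0, 2 * i / d)
    pe = np.zeros((length, d))
    pe[:, 0::2] = np.sin(angle)
    pe[:, 1::2] = np.cos(angle)
    return pe

def attention(Q, K, V):
    """Scaled dot-product attention (expression 3)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V
```

In the multi-head case (expression 4), this computation is repeated with a separate W_Q, W_K, W_V per head and the resulting Z matrices are concatenated.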
Further, the step (4) comprises the following specific steps:
(4a) Most physical channels can be modeled by neural networks. Additive White Gaussian Noise (AWGN) channels, multiplicative Gaussian noise channels, and erasure channels can be modeled by a simple neural network; fading channels, such as the Rayleigh fading channel, require more complex neural networks;
(4b) Let the channel coefficient be g and the channel noise be n. After the transmitted information passes through the channel, the signal received at the receiving end can be represented as:
y = g·x + n  expression 5
(4c) If the channel is AWGN, g = 1 and n follows the Gaussian distribution N(0, σ²). If the channel is a Rayleigh fading channel, g follows the Rayleigh distribution f(g) = (g/σ²)·exp(−g²/(2σ²)), g ≥ 0; under high signal-to-noise-ratio conditions, g and n can be simulated with the circularly symmetric complex Gaussian distribution CN(0, σ²).
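The channel of expression 5 can be simulated with NumPy as below (an illustrative sketch; deriving the noise power from a target SNR in dB is an assumption not spelled out in the text):

```python
import numpy as np

rng = np.random.default_rng(0)

def awgn_channel(x, snr_db):
    """y = g*x + n with g = 1 and Gaussian noise scaled to the target SNR."""
    snr = 10 ** (snr_db / 10)
    sigma = np.sqrt(np.mean(np.abs(x) ** 2) / snr)
    return x + sigma * rng.standard_normal(x.shape)

def rayleigh_channel(x, snr_db):
    """y = g*x + n with a Rayleigh-distributed fading coefficient g >= 0,
    generated as the magnitude of a circularly symmetric complex Gaussian."""
    g = np.abs(rng.standard_normal(x.shape) +
               1j * rng.standard_normal(x.shape)) / np.sqrt(2)
    snr = 10 ** (snr_db / 10)
    sigma = np.sqrt(np.mean(np.abs(x) ** 2) / snr)
    return g * x + sigma * rng.standard_normal(x.shape)
```

At 20 dB SNR and unit signal power, the injected noise variance is 0.01, matching σ² in the text's N(0, σ²) notation.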
Further, the step (6) comprises the following specific steps:
(6a) In the optimization process, the channel coding and decoding parts are first treated as a black box, and the semantic representation sequence m generated by the sender's semantic encoder and the semantic representation sequence n recovered after the receiver's channel decoding serve as the inputs of the semantic optimization network. To correlate the semantic information at the transmitting and receiving ends to the greatest extent, maximizing mutual information (Mutual Information) is adopted as the optimization target of the network. Since mutual information determines, to a certain extent, the information content of the coded data, maximizing it can raise the signaling rate of the system and maximize channel tolerance;
(6b) The mutual information is calculated as:
I(m; n) = Σ_{m∈M} Σ_{n∈N} p(m, n)·log[ p(m, n) / (p(m)·p(n)) ]  expression 6
wherein: M and N are the distribution spaces of the semantic characterization sequences m and n, p(m, n) is the joint probability distribution of m and n, and p(m) and p(n) are the marginal probability distributions of m and n respectively;
(6c) From the definition of KL divergence, the mutual information of m and n is the KL divergence between their joint probability distribution and the product of their marginal probability distributions; therefore, to optimize the mutual information, some properties of the KL divergence are needed below;
(6d) Since KL divergence is a special form of f-divergence, we start from the properties of f-divergence. The general expression of an f-divergence is:
D_f(P‖Q) = ∫ q(x)·f( p(x)/q(x) ) dx  expression 7
wherein: f(·) is a convex function satisfying f(1) = 0.
KL divergence is the f-divergence obtained when f(t) = t·log t:
D_KL(P‖Q) = ∫ p(x)·log( p(x)/q(x) ) dx  expression 8
(6e) From the properties of convex functions (via the convex conjugate f* of f, with f*(t) = exp(t − 1) in the KL case), a lower bound satisfied by the mutual information can be obtained:
I(m; n) ≥ E_{p(m,n)}[X(m, n)] − E_{p(m)p(n)}[exp(X(m, n) − 1)]  expression 9
wherein: X is any function over which the bound can be searched;
(6f) In summary, the loss function of the semantic optimization network can be established as the negative of this bound, to be minimized by gradient descent:
Loss_MI(m, n) = −( E_{p(m,n)}[X(m, n)] − E_{p(m)p(n)}[exp(X(m, n) − 1)] )  expression 10
Training the network finds the function X that maximizes the mutual information lower bound.
Further, the step (8) comprises the following specific steps:
(8a) The distribution difference between the prediction result s' and the target result s is calculated from the KL divergence of their distributions:
D_KL(p‖p') = H(p, p') − H(p)  expression 11
wherein: p represents the probability distribution of the target result, p' represents the probability distribution of the predicted result, H(p) represents the information entropy of the target result, and H(p, p') represents the cross entropy of the target and predicted results;
(8b) Since the target distribution p is fixed, H(p) is a constant, so optimizing the KL divergence is equivalent to optimizing the cross-entropy term. Considering the large amount of computation generated by the deep learning process, the cross-entropy term is selected for optimization and a cross-entropy loss function is established. For a classification problem, the corresponding cross-entropy loss function can be expressed as:
Loss = −(1/N) Σ_{i=1}^{N} log p'(w_i)  expression 12
wherein: N is the number of samples, w_i is the label corresponding to the i-th sample, and p'(w_i) is the predicted probability of the correct class for sample i.
The cross-entropy loss function established from H(p, p') is then:
Loss_CE(s, s') = −Σ_i p(w_i)·log p'(w_i)  expression 13
(8c) Combined with the semantic optimization function, the loss function is updated by:
Loss = Loss_CE(s, s') + σ·Loss_MI(m, n)  expression 14
wherein: σ is the update weight, a decimal in the range 0-1, generally kept between 0 and 0.2;
(8d) If Loss is less than the threshold, save the current model parameters and proceed to (8e); otherwise proceed directly to the next step without saving the model parameters;
(8e) The decoding parameters are updated by back-propagation, using the gradient descent method to approach the optimal solution. The gradient is computed as:
Δ = ∂Loss/∂α  expression 15
wherein: α is the parameter to be updated, and the partial derivative is taken by the chain rule through the modules or networks lying between Loss and α. Since not all parameters need to be computed strictly through every module traversed by back-propagation, the gradient calculation can be simplified; for example, if only the end-to-end input/output of the channel codec is considered, it can be treated as a single mapping:
Δ = (∂Loss/∂y)·(∂y/∂α)  expression 16
In particular, when the channel is an AWGN channel, g = 1, so the channel itself contributes only a unit factor to the chain.
after the gradient is obtained, model parameters such as weight parameters of an attention mechanism, hidden layer parameters and the like can be dynamically adjusted by the following steps:
α n =α n-1 - λΔ expression 17
Wherein: λ is a gradient decreasing weight parameter;
(8f) If the current training count is less than the maximum number of iterations, go to step (2) and continue training the network; otherwise, save the model parameters from the last training round and exit training.
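Steps (8c)-(8e), the combined loss of expression 14 and the update rule of expression 17, can be illustrated on a toy scalar parameter (the quadratic loss and finite-difference gradient below are stand-ins for the real back-propagated gradient):

```python
def combined_loss(loss_ce, loss_mi, sigma=0.1):
    """Expression 14: Loss = Loss_CE + sigma * Loss_MI, sigma in [0, 0.2]."""
    return loss_ce + sigma * loss_mi

def sgd_step(alpha, grad, lam=0.01):
    """Expression 17: alpha_n = alpha_{n-1} - lambda * Delta."""
    return alpha - lam * grad

def numerical_grad(loss_fn, alpha, eps=1e-6):
    """Finite-difference stand-in for back-propagating through the black box."""
    return (loss_fn(alpha + eps) - loss_fn(alpha - eps)) / (2 * eps)

# Toy example: drive alpha toward the minimizer of (alpha - 3)^2.
loss_fn = lambda a: (a - 3.0) ** 2
alpha = 0.0
for _ in range(500):
    alpha = sgd_step(alpha, numerical_grad(loss_fn, alpha), lam=0.05)
```

After 500 updates, alpha converges to the minimizer 3.0, showing the iteration of expression 17 in isolation.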
further, the step (9) comprises the following specific steps:
(9a) Loading trained model parameters and testing a data set;
(9b) Initializing a channel environment of an analysis process;
(9c) Selecting an evaluation index;
(9d) If the evaluation index is BLEU, the step (9 e) is entered; if the evaluation index is semantic similarity, entering a step (9 f); if the evaluation index is other, returning an error;
(9e) BLEU is based on the N-gram model; from the precision value p_n of each n-gram and its weight w_n, the BLEU score of the output result against the target sentence can be calculated:
BLEU = BP·exp( Σ_{n=1}^{N} w_n·log p_n )  expression 18
wherein: BP is the length penalty factor (Brevity Penalty), whose value is a conditional function:
BP = 1 if c > r;  BP = exp(1 − r/c) if c ≤ r  expression 19
where c is the length of the predicted sentence s' and r is the length of the original sentence s.
The BLEU value ranges from 0 to 1; the larger the value, the higher the fidelity of the prediction, and the smaller the value, the more inaccurate the prediction. When the predicted sentence s' is shorter than the original sentence s, the BP factor reduces the BLEU value; if the predicted sentence is not shorter than the original, the penalty mechanism is not triggered;
(9f) The semantic similarity calculation is based on the model parameters BERT-Large Uncased (Whole Word Masking) published by Google. Using the BERT model mapping B(·), the semantic distance between the two sentences is computed by analogy with the cosine of the angle between their embeddings:
sim(s, s') = ( B(s)·B(s')^T ) / ( ‖B(s)‖·‖B(s')‖ )  expression 20
(9g) Calculate the average value of the evaluation index over the sample capacity;
(9h) Return the evaluation result.
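The similarity of expression 20 is the cosine of the angle between two sentence embeddings; a sketch with placeholder vectors standing in for the BERT outputs B(·):

```python
import numpy as np

def semantic_similarity(emb_a, emb_b):
    """Expression 20: cosine similarity between two sentence embeddings."""
    num = float(np.dot(emb_a, emb_b))
    den = float(np.linalg.norm(emb_a) * np.linalg.norm(emb_b))
    return num / den
```

In the invention the embeddings come from BERT-Large Uncased (Whole Word Masking); identical embeddings score 1.0 and orthogonal embeddings score 0.0.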
Beneficial effects: the invention provides a semantic communication text transmission optimization method based on deep learning. For text transmission scenarios with limited computing resources in 6G communication, a Transformer-built semantic communication system extracts and compresses semantic information in a specific scenario; channel coding and decoding are treated as a black box and the optimization focuses on maximizing semantic information, further improving the text transmission semantic accuracy of the semantic communication system and achieving a good balance between complexity and performance.
In summary, when computing resources at the mobile communication data processing end are limited, the deep-learning-based semantic communication optimization algorithm provided by the invention is superior in maximizing semantic information for text data transmission.
Drawings
Fig. 1 is a schematic diagram of a semantic communication text transmission optimization method based on deep learning according to an embodiment of the present invention;
FIG. 2 is a diagram of simulation results of the effect of convergence of a loss value and training times under a semantic optimization algorithm provided by an embodiment of the present invention;
Fig. 3 is a diagram of simulation results of the variation of the BLEU value with the signal-to-noise ratio after 10 training rounds under the optimization algorithm provided by the embodiment of the present invention.
Detailed Description
The core idea of the invention is as follows: to address the limited data compression of traditional communication systems, a deep-learning-based end-to-end semantic communication system model extracts and compresses semantic information in a specific scenario, and joint source-channel coding retains the semantic information to a greater extent. For further optimization when computing resources at the data processing end are limited, channel coding and decoding are treated as a black box and optimization focuses on maximizing semantic information; by updating the decoding parameters through gradient descent, the text semantic accuracy of the semantic communication system is further improved.
The present invention is described in further detail below.
First, at the transmitting end, the input corpus is preprocessed to generate a training set, a test set, and a corresponding dictionary that makes it convenient to recover the predicted text. The specific steps are as follows:
(1a) Data cleaning: removing accent marks in a language, filtering out unnecessary characters such as XML labels, special symbols and the like, and adding a blank in front of punctuation marks at the end of a sentence so as to separate the punctuation marks from text contents;
(1b) Word segmentation: the text is split into corresponding words, phrases or symbols, etc. for easier subsequent processing. The method employed varies for corpora in different languages.
If the input text is English, french, german and the like, the processing mode is simpler, regular expression word segmentation can be used, non-English characters, namely non-a-Z 'and non-A-Z' characters, can be directly deleted, and capital letters are converted into lowercase forms, so that repeated vocabulary is reduced, and a model is simplified.
And for the processing of the Chinese database, the processing is relatively complex. Firstly, a Chinese word segmentation component 'Jieba' library in a Python third party library needs to be called, a cut function in the Chinese word segmentation component 'Jieba' is used for splitting a Chinese text to be processed, and sentences are split into independent words and stored in a list. Words are then combined together in spaces by a join function carried by Python to form a sentence. In addition, the deletion operation is required for the non-Chinese characters, and the characters to be processed are known to be the characters of non-one- ' according to the initial Chinese character ' one ' and the ending Chinese character ' ' of the ASCII code table. Finally, the stop word removing operation is performed, namely, nonsensical words in the text are deleted, because the words do not provide valuable information when semantic analysis is performed, including some connective words 'yes', 'in', and the like, and some Chinese assistances such as 'ya', 'in', and the like;
(1c) Sentence splitting: long texts are divided at sentence boundaries, so that single sentences can be processed and sentence lengths counted. In addition, since this design uses a Transformer for semantic processing, start/end markers must be added to help the machine-learning model better understand and process text sequences: a special START token (usually written "<s>" or "<START>") is added at the beginning of each text sequence, i.e., each single sentence, and a special END token (usually written "</s>" or "<END>") is added at the end. Dividing sentences with these markers helps the model better recognize the sentence structure and grammar rules of the processed language, improving its performance;
(1d) Vocabulary construction: create a list containing all words and encode each word uniquely, so that word-frequency statistics and vector conversion can be performed on sentences;
(1e) Sequence padding: so that all sequences have the same length and can be fed into the training model, padding tokens must be added to sentences of different lengths. Generally 0 is chosen as the padding token, post-padding is applied, and every sentence is padded to the length of the longest sentence;
(1f) Dataset partitioning: the dataset is split 9:1 into a training set and a test set, so that the model's performance can be evaluated with the appropriate data at each stage.
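As a rough illustration, steps (1a)–(1f) can be sketched in plain Python (the function names and toy sentences are ours, not from the patent; a real pipeline would also handle accents and Chinese segmentation):

```python
import re
import random

def clean(text):
    # (1a) strip XML-like tags, space off sentence-final punctuation, lowercase
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"([.!?])", r" \1", text)
    return text.lower()

def tokenize(text):
    # (1b) regex word segmentation for English-like corpora:
    # keep alphabetic runs and end punctuation, drop everything else
    return re.findall(r"[a-z]+|[.!?]", clean(text))

def build_dataset(sentences, max_len=None):
    # (1c) add <START>/<END> markers to each single sentence
    seqs = [["<START>"] + tokenize(s) + ["<END>"] for s in sentences]
    # (1d) vocabulary: unique integer id per word; 0 reserved for padding
    vocab = {"<PAD>": 0}
    for seq in seqs:
        for w in seq:
            vocab.setdefault(w, len(vocab))
    # (1e) post-pad every sequence with 0 up to the longest sentence
    max_len = max_len or max(len(s) for s in seqs)
    encoded = [[vocab[w] for w in s] + [0] * (max_len - len(s)) for s in seqs]
    # (1f) 9:1 train/test split
    random.shuffle(encoded)
    cut = int(0.9 * len(encoded))
    return encoded[:cut], encoded[cut:], vocab

train, test, vocab = build_dataset(["Hello, world!", "Semantic communication works."])
```

With two toy sentences the 9:1 split leaves one sentence per set, and both are padded to the longer length.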
Step (2): semantically encode the input sentence s according to the initial model parameters using the semantic encoder S_α(·), a Transformer, producing the semantic representation sequence m = S_α(s). Specifically:
(2a) A semantic codec model is built, comprising a 4-layer encoder and a 4-layer decoder. The input sentence is semantically encoded against a local knowledge base, so that semantic information is extracted for transmission over the physical channel and effective data compression is achieved;
(2b) For an input sentence s = [w_1, w_2, ..., w_l], where w_i is the i-th word in the sentence and l is the sentence length, each word is first expanded to the model dimension 128 through a word embedding, and the following position codes give the generated word vectors both feature semantics and specific position information:

PE(position, 2i) = sin(position / 10000^(2i/d)), PE(position, 2i+1) = cos(position / 10000^(2i/d))    Expression 1

wherein: d is the dimension of the word vector, i is the index of its dimension, and position is the position index of the word vector.
Then the sentence fed into the semantic encoder takes the form:

x = [x_1, x_2, ..., x_l], with x_i = Embed(w_i) + PE(i)    Expression 2
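The position codes can be sketched as follows (a minimal pure-Python rendering of the standard sinusoidal formula; the function name and sizes are illustrative):

```python
import math

def positional_encoding(seq_len, d):
    """Sinusoidal position codes: even dimensions use sin, odd dimensions cos."""
    pe = [[0.0] * d for _ in range(seq_len)]
    for position in range(seq_len):
        for i in range(0, d, 2):
            angle = position / (10000 ** (i / d))
            pe[position][i] = math.sin(angle)       # PE(position, 2i)
            if i + 1 < d:
                pe[position][i + 1] = math.cos(angle)  # PE(position, 2i+1)
    return pe

# model dimension 128, as in the described encoder
pe = positional_encoding(seq_len=10, d=128)
```

Each word vector is then summed with its row of `pe` before entering the encoder.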
(2c) The key to the self-attention calculation over the input word vectors is training the three weight matrices of the self-attention network, W_Q, W_K, and W_V. Multiplying the word-embedding vectors by these matrices yields the Query vector (Q), Key vector (K), and Value vector (V), and attention is then computed by:

Attention(Q, K, V) = softmax(QK^T / √d_k) · V    Expression 3
(2d) Since the Transformer uses a multi-head mechanism, i.e., multiple sets of W_Q, W_K, W_V matrices yield multiple sets of Q, K, V vectors, the Z matrices computed by each head are merged by a splicing operation, so that the model can attend to information on different semantic levels simultaneously:

MultiHead = Concat(Z_1, Z_2, ..., Z_k)    Expression 4
(2e) The spliced result is passed into a feed-forward network and, after residual connection (Residual Connection) and layer normalization (Layer Normalization, LN), is passed into the channel coding module; the generated semantic representation sequence is S_α(s);
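Steps (2c)–(2d) can be sketched with NumPy (toy dimensions and random matrices stand in for the trained W_Q, W_K, W_V; this illustrates scaled dot-product attention with splicing, not the patent's exact network):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Expression 3: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = softmax(Q @ K.T / np.sqrt(d_k))
    return scores @ V

def multi_head(x, heads):
    # each head h carries its own (W_Q, W_K, W_V); per-head results Z_h
    Z = [attention(x @ W_Q, x @ W_K, x @ W_V) for W_Q, W_K, W_V in heads]
    # Expression 4: MultiHead = Concat(Z_1, ..., Z_k)
    return np.concatenate(Z, axis=-1)

rng = np.random.default_rng(0)
d, l, k = 8, 5, 4  # toy model width, sentence length, head count
heads = [tuple(rng.normal(size=(d, d // k)) for _ in range(3)) for _ in range(k)]
out = multi_head(rng.normal(size=(l, d)), heads)  # shape (l, d)
```

Splicing the k heads of width d/k restores the model width d, so the result can enter the feed-forward network directly.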
Step (3): channel coding C_β(·) is applied to ensure stable transmission of the sequence over the channel; the coded symbol stream is x = C_β[S_α(s)];
Step (4): establish a channel model according to the required signal-to-noise-ratio condition and the information-transmission environment. The specific steps are as follows:
(4a) Most physical channels can be modeled by neural networks. Additive white Gaussian noise (Additive White Gaussian Noise, AWGN) channels, multiplicative Gaussian noise channels, and erasure channels can be modeled by a simple neural network; fading channels, such as the Rayleigh fading channel, require a more complex neural network;
(4b) Let the channel coefficient be g and the channel noise be n. After the transmitting end's information passes through the channel, the signal received at the receiving end can be represented by:

y = g·x + n    Expression 5
(4c) If the channel is AWGN, then g = 1 and n follows the Gaussian distribution N(0, σ²). If the channel is a Rayleigh fading channel, g follows the Rayleigh distribution f(g) = (g/σ²)·exp(−g²/(2σ²)), g ≥ 0; under a high signal-to-noise ratio, g and n can be simulated with the circularly symmetric complex Gaussian distribution CN(0, σ²);
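The channel models of step (4) can be sketched as a small NumPy simulation (the SNR-based noise scaling and real-valued signals are our assumptions for illustration):

```python
import numpy as np

def channel(x, snr_db, kind="awgn", rng=np.random.default_rng(0)):
    # Expression 5: y = g*x + n, with noise power set by the target SNR
    p_signal = np.mean(np.abs(x) ** 2)
    sigma2 = p_signal / (10 ** (snr_db / 10))
    n = rng.normal(scale=np.sqrt(sigma2), size=x.shape)
    if kind == "awgn":
        g = 1.0  # AWGN: g = 1, n ~ N(0, sigma^2)
    elif kind == "rayleigh":
        # |g| Rayleigh-distributed: magnitude of a complex Gaussian draw
        g = np.hypot(rng.normal(size=x.shape), rng.normal(size=x.shape)) / np.sqrt(2)
    else:
        raise ValueError(kind)
    return g * x + n

y = channel(np.ones(1000), snr_db=10, kind="awgn")
```

At 10 dB SNR the received all-ones signal stays close to 1 on average, with Gaussian scatter of variance 0.1 around it.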
Step (5): at the receiving end, the channel output signal y is first fed into the channel decoding module to recover the semantic representation sequence n;
Step (6): treating channel coding and decoding as a black box, the recovered semantic representation sequence n and the semantic representation sequence m generated by the semantic encoder are extracted and fed into the semantic optimization network, obtaining the loss value required to optimize the semantic information. The specific steps are as follows:
(6a) During optimization, the channel coding/decoding part is first treated as a black box, and the semantic representation sequence m generated by the sender's semantic encoder and the sequence n recovered after the receiver's channel decoding serve as inputs to the semantic optimization network. To correlate the semantic information at the two ends as strongly as possible, maximizing mutual information is adopted as the network's optimization objective; since the mutual information determines, to some extent, how much information the encoded data carries, this mechanism has the further advantages of improving the system's signaling rate and maximizing channel tolerance;
(6b) The mutual information is calculated as:

I(m; n) = Σ_{m∈M} Σ_{n∈N} p(m, n) · log( p(m, n) / (p(m)·p(n)) )    Expression 6

wherein: M and N are the distribution spaces of the semantic characterization sequences m and n, p(m, n) is the joint probability distribution of m and n, and p(m) and p(n) are the marginal probability distributions of m and n, respectively;
(6c) From the definition of KL divergence, the mutual information of m and n is the KL divergence between their joint probability distribution and the product of their marginal probability distributions; therefore, to optimize the mutual information, several properties of the KL divergence are needed below;
(6d) Considering that the KL divergence is a special form of f-divergence, we start from the properties of the f-divergence, whose general expression is:

D_f(P‖Q) = ∫ q(x) · f( p(x)/q(x) ) dx    Expression 7

wherein: f(·) is a convex function satisfying f(1) = 0.
The KL divergence is the f-divergence obtained when f(t) = t·log t:

D_KL(P‖Q) = ∫ p(x) · log( p(x)/q(x) ) dx    Expression 8
(6e) From the properties of convex functions (via the convex conjugate f*(t) = e^{t−1} of f(t) = t·log t), a lower bound satisfied by the mutual information can be obtained:

I(m; n) ≥ E_{p(m,n)}[X(m, n)] − E_{p(m)p(n)}[e^{X(m,n)−1}]    Expression 9

wherein: X is any function over which the bound can be optimized;
(6f) In summary, the loss function of the semantic optimization network can be established to enable gradient descent:

Loss(m, n) = −( E_{p(m,n)}[X(m, n)] − E_{p(m)p(n)}[e^{X(m,n)−1}] )    Expression 10

The function X that maximizes the mutual information can then be found by training the network;
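The lower-bound loss of step (6) can be sketched as follows (a toy rendering in which a fixed linear scorer stands in for the trained statistics network X; in practice X would be a small neural network updated by gradient descent, and marginal samples are obtained by shuffling the pairing):

```python
import numpy as np

def mi_lower_bound(X_joint, X_marginal):
    # Expression 9: I(m;n) >= E_p(m,n)[X] - E_p(m)p(n)[exp(X - 1)]
    return X_joint.mean() - np.exp(X_marginal - 1).mean()

def semantic_opt_loss(m, n, stat_net):
    # joint samples: aligned pairs (m_i, n_i); marginal: shuffling n breaks the pairing
    joint = stat_net(np.concatenate([m, n], axis=-1))
    n_shuf = n[np.random.default_rng(0).permutation(len(n))]
    marginal = stat_net(np.concatenate([m, n_shuf], axis=-1))
    # Expression 10: loss = negative bound, so gradient descent maximizes I(m;n)
    return -mi_lower_bound(joint, marginal)

# toy statistics network X(.): a fixed linear scorer standing in for a trained MLP
w = np.ones(8) / 8
loss = semantic_opt_loss(np.ones((16, 4)), np.ones((16, 4)), lambda z: z @ w)
```

With identical constant inputs the scorer gives the same value on joint and shuffled pairs, so the bound, and hence the loss, is exactly zero.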
Step (7): according to the local background knowledge base of the receiving end, semantically decode the semantic representation sequence to obtain the predicted text sequence s';
Step (8): compute the cross-entropy loss between the prediction s' and the target sequence s, back-propagate the result together with the semantic optimization function, and train the system model. The specific steps are as follows:
(8a) Calculate the distribution difference between the prediction result s' and the target result s from their two distributions:

D_KL(p‖p') = H(p, p') − H(p)    Expression 11

wherein: p represents the probability distribution of the target result, p' the probability distribution of the prediction result, H(p) the information entropy of the target result, and H(p, p') the cross entropy of the target result and the prediction result;
(8b) Since H (p') is a certain value, optimizing KL divergence is as effective as optimizing cross entropy terms. In order to reduce the calculation amount of the deep learning, the cross entropy item is selected and optimized, and a cross entropy loss function is established in consideration of the fact that the calculation amount generated by the deep learning process is large. For a classification problem, its corresponding cross entropy loss function can be expressed as:
wherein: n is a sampleNumber, w i For the label corresponding to the ith sample, p (w i ) To predict the probability of correct for sample i.
The cross-entropy loss function established from H(p, p') is then:

Loss_CE(s, s') = −Σ_i p(w_i) · log p'(w_i)    Expression 13
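The cross-entropy term can be illustrated with a one-hot target (a toy example; real training averages this over all words and samples):

```python
import math

def cross_entropy(p_true, p_pred):
    # Expression 13: H(p, p') = -sum_i p(w_i) * log p'(w_i)
    return -sum(p * math.log(q) for p, q in zip(p_true, p_pred) if p > 0)

# one-hot target: reduces to -log of the probability given to the correct word
ce = cross_entropy([0, 1, 0], [0.2, 0.7, 0.1])
```

Here the correct word received probability 0.7, so the loss is −log(0.7) ≈ 0.357; a confident correct prediction drives the loss toward zero.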
(8c) In combination with the semantic optimization function, the loss function is updated by the following formula:
Loss = Loss_CE(s, s') + σ·Loss(m, n)    Expression 14
wherein: σ is the update weight, a decimal in the range [0, 1], generally kept between 0 and 0.2;
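The combined update of Expression 14 is then a simple weighted sum (the numeric values here are illustrative):

```python
def combined_loss(loss_ce, loss_mi, sigma=0.1):
    # Expression 14: Loss = Loss_CE(s, s') + sigma * Loss(m, n)
    assert 0.0 <= sigma <= 0.2, "update weight typically kept in [0, 0.2]"
    return loss_ce + sigma * loss_mi

# a cross-entropy of 2.3 and a (negative) mutual-information loss of -0.5
total = combined_loss(loss_ce=2.3, loss_mi=-0.5, sigma=0.1)
```

Keeping σ small means the cross-entropy term dominates and the semantic-optimization term acts as a regularizer.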
(8d) If the Loss is less than the threshold, save the current model parameters and then proceed to (8e); otherwise proceed directly to the next step without saving the model parameters;
(8e) The decoding parameters are updated by back propagation, during which a gradient descent method is used to approach the optimal solution. The gradient is calculated as:

Δ = ∂Loss/∂α    Expression 15

wherein: α is the parameter to be updated, and ∂Loss/∂α is computed by the chain rule through the modules or networks between Loss and α. Not every parameter needs to be differentiated strictly through each module traversed by back propagation: for example, when training the semantic communication system, the semantic codec and the channel codec are often combined, only the end-to-end input and output of the semantic codec are considered, and the rest is treated as a black box. The gradient calculation can then be simplified to:

Δ = (∂Loss/∂u) · (∂u/∂α)    Expression 16

where u denotes the output of the black-boxed part feeding the module that contains α. In particular, when the channel is an AWGN channel, y = x + n, the channel's Jacobian is the identity and the gradient passes through the channel unchanged.
After the gradient is obtained, model parameters such as the attention-mechanism weight parameters and hidden-layer parameters can be dynamically adjusted by:

α_n = α_{n−1} − λΔ    Expression 17

wherein: λ is the gradient-descent weight parameter (learning rate);
(8f) If the current number of training iterations is smaller than the allowed number of iterations, return to step (2) and continue training the network; otherwise, save the model parameters from the last training round and exit training;
Step (9): in the system-performance analysis stage, test the trained system in different channel environments, taking BLEU or semantic similarity as the evaluation index and focusing on performance under limited computing resources. The specific steps are as follows:
(9a) Load the trained model parameters and the test data set;
(9b) Initialize the channel environment for the analysis process;
(9c) Select an evaluation index;
(9d) If the evaluation index is BLEU, go to step (9e); if the evaluation index is semantic similarity, go to step (9f); if the evaluation index is anything else, return an error;
(9e) BLEU is based on an N-gram model. From the precision value p_n of each n-gram and its weight w_n, the BLEU score of the output result against the target sentence can be calculated:

BLEU = BP · exp( Σ_{n=1}^{N} w_n · log p_n )    Expression 18

wherein: BP is the length penalty factor (Brevity Penalty), whose value is a conditional function:

BP = 1 if c > r; BP = e^{1−r/c} if c ≤ r    Expression 19

where c is the length of the predicted sentence s' and r is the length of the original sentence s.
The larger the BLEU value, the higher the fidelity of the prediction; the smaller the value, the less accurate the prediction. When the predicted sentence s' is shorter than the original sentence s, BP reduces the BLEU value; if the predicted sentence is not shorter than the original, the penalty mechanism does not need to be triggered;
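The BLEU computation of step (9e) can be sketched as follows (a single-reference implementation with uniform weights w_n = 1/N; the smoothing that production BLEU tools apply for zero n-gram matches is omitted):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    # modified n-gram precisions p_n, combined with uniform weights w_n = 1/max_n
    log_p = 0.0
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        match = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        log_p += math.log(match / total) / max_n if match else float("-inf")
    # Expression 19: BP = 1 if c > r, else exp(1 - r/c)
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))
    return bp * math.exp(log_p)  # Expression 18

score = bleu("the cat sat on the mat".split(), "the cat sat on the mat".split())
```

An exact match gives every precision p_n = 1 and BP = 1, so the score is 1.0; shorter or divergent predictions fall below that.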
(9f) The semantic-similarity calculation is based on the model parameters of Google's published BERT-Large Uncased (Whole Word Masking). Using the BERT model, denoted B(·), the semantic distance between the two sentences is calculated by analogy with the angle between vectors:

similarity(s, s') = B(s)·B(s')ᵀ / (‖B(s)‖·‖B(s')‖)    Expression 20
(9g) Calculate the average value of the evaluation index over the sample set;
(9h) Return the evaluation result.
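The similarity of step (9f) reduces to a cosine between sentence embeddings (shown here with stand-in vectors; in the described scheme they would come from BERT-Large):

```python
import math

def cosine_similarity(u, v):
    # semantic distance via the angle between embeddings B(s) and B(s')
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# parallel vectors score 1.0, orthogonal vectors 0.0
sim = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

A score near 1 indicates the recovered sentence preserves the meaning of the original even when the surface wording differs.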
FIG. 1 illustrates the deep-learning-based semantic communication text transmission optimization method: after the channel is treated as a black box, the semantic optimization network further reduces the variation of the semantic representation sequence by calculating the difference between the semantic sequences at the transmitting and receiving ends.
FIG. 2 shows simulation results for the convergence of the loss value against the number of training iterations under the semantic optimization algorithm. The optimization method converges quickly in the initial stage and does not exhibit the wasted resources and time delay caused by being trapped in a local minimum.
FIG. 3 shows simulation results for the variation of the BLEU value with signal-to-noise ratio after 10 training rounds under the optimization algorithm. With limited computing resources and a low signal-to-noise ratio, the semantic optimization algorithm improves accuracy by about 20% over the DeepSC algorithm.
From the above description, it should be apparent to those skilled in the art that the invention provides a deep-learning-based semantic communication text transmission optimization method that effectively avoids wasting computing resources and achieves a good balance between complexity and performance.
What is not described in detail in the present application belongs to the prior art known to those skilled in the art.
Claims (1)
1. A semantic communication text transmission optimization method based on deep learning, characterized by comprising the following steps:
(1) Semantically encode the input sentence s according to the initial model parameters using S_α(·); the generated semantic representation sequence is m = S_α(s);
(2) Apply channel coding C_β(·) to ensure stable transmission of the sequence over the channel; the coded symbol stream is x = C_β[S_α(s)];
(3) Establishing a channel model according to the required signal-to-noise ratio condition and the environment for transmitting information;
(4) At the receiving end, the channel output signal y is first fed into the channel decoding module to recover the semantic representation sequence n;
(5) Taking channel coding and decoding as a black box, extracting a recovered semantic representation sequence n and a semantic representation sequence m generated by a semantic encoder, and sending the semantic representation sequence n and the semantic representation sequence m into a semantic optimization network to obtain a loss value required by optimizing semantic information;
(6) According to a local background knowledge base of a receiving end, carrying out semantic decoding on the semantic representation sequence to obtain a predicted sequence s';
(7) Performing cross entropy loss function calculation on the predicted s' and the target sequence s, and performing back propagation on the obtained result and a semantic optimization function together to train a system model;
Further, step (5) comprises the following specific steps:
(5a) During optimization, the channel coding/decoding part is first treated as a black box, and the semantic characterization sequence m generated by the sender's semantic encoder and the sequence n recovered after the receiver's channel decoding serve as inputs to the semantic optimization network; to correlate the semantic information at the two ends as strongly as possible, maximizing mutual information is adopted as the network's optimization objective, and since the mutual information determines to some extent the amount of information contained in the coded data, a further advantage of this mechanism is that the system's signaling rate can be improved and the channel tolerance maximized;
(5b) The mutual information is calculated as:

I(m; n) = Σ_{m∈M} Σ_{n∈N} p(m, n) · log( p(m, n) / (p(m)·p(n)) )

wherein: M and N are the distribution spaces of the semantic characterization sequences m and n, p(m, n) is the joint probability distribution of m and n, and p(m) and p(n) are the marginal probability distributions of m and n, respectively;
(5c) From the definition of KL divergence, the mutual information of m and n is the KL divergence between their joint probability distribution and the product of their marginal probability distributions. Since the KL divergence is a special form of f-divergence, whose convex function satisfies f(1) = 0, the lower bound satisfied by the mutual information can be obtained:

I(m; n) ≥ E_{p(m,n)}[X(m, n)] − E_{p(m)p(n)}[e^{X(m,n)−1}]

wherein: X is any function over which the bound can be optimized;
(5d) To maximize the semantic information correlating the transmitter and receiver, the loss function of the semantic optimization network is established as:

Loss(m, n) = −( E_{p(m,n)}[X(m, n)] − E_{p(m)p(n)}[e^{X(m,n)−1}] )

The function X that maximizes the mutual information can then be found by training the network;
Further, step (7) comprises the following specific steps:
(7a) Calculate the distribution difference between the prediction result s' and the target result s from their two distributions:

D_KL(p‖p') = Σ_{i=1}^{l} p(w_i) · log( p(w_i)/p'(w_i) ) = H(p, p') − H(p)

wherein: l is the sentence length, p represents the probability distribution of the target result, p' the probability distribution of the prediction result, H(p) the information entropy of the target result, and H(p, p') the cross entropy of the target result and the prediction result;
(7b) To reduce the amount of deep-learning computation, the cross-entropy term is selected for optimization, and the cross-entropy loss function is established:

Loss_CE(s, s') = −(1/N) Σ_{j=1}^{N} Σ_{i=1}^{l} p(w_i) · log p'(w_i)

wherein: N is the number of samples;
(7c) In combination with the semantic optimization function, the loss function is updated by the following formula:
Loss = Loss_CE(s, s') + σ·Loss(m, n)
wherein: σ is the update weight, a decimal in the range [0, 1], generally kept between 0 and 0.2;
(7d) If the Loss is less than the threshold, save the current model parameters and then proceed to (7e); otherwise proceed directly to the next step;
(7e) The decoding parameters are updated by back propagation; the gradient is calculated as:

Δ = ∂Loss/∂α

wherein: α is the parameter to be updated, and ∂Loss/∂α is computed by the chain rule through the modules or networks between Loss and α. Not every parameter needs to be differentiated strictly through each module traversed by back propagation; for example, if only the end-to-end input and output of the codec are considered, the gradient calculation can be simplified to:

Δ = (∂Loss/∂u) · (∂u/∂α)

where u denotes the output of the black-boxed part feeding the module that contains α;
After the gradient is obtained, model parameters such as the attention-mechanism weight parameters and hidden-layer parameters can be dynamically adjusted by:
α_n = α_{n−1} − λΔ
wherein: λ is the gradient-descent weight parameter (learning rate);
(7f) If the current training times are smaller than the iteratable times, returning to the step (1) and continuing to train the network; otherwise, the model parameters after the last training are saved, and the training is exited.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310512333.1A CN116645971A (en) | 2023-05-08 | 2023-05-08 | Semantic communication text transmission optimization method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116645971A true CN116645971A (en) | 2023-08-25 |
Family
ID=87614459
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310512333.1A Pending CN116645971A (en) | 2023-05-08 | 2023-05-08 | Semantic communication text transmission optimization method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116645971A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117014126A (en) * | 2023-09-26 | 2023-11-07 | 深圳市德航智能技术有限公司 | Data transmission method based on channel expansion |
CN117725965A (en) * | 2024-02-06 | 2024-03-19 | 湘江实验室 | Federal edge data communication method based on tensor mask semantic communication |
CN117725965B (en) * | 2024-02-06 | 2024-05-14 | 湘江实验室 | Federal edge data communication method based on tensor mask semantic communication |
Similar Documents
Publication | Title
---|---
CN108829722B (en) | Remote supervision Dual-Attention relation classification method and system
CN110119765B (en) | Keyword extraction method based on Seq2Seq framework
CN111639175B (en) | Self-supervision dialogue text abstract method and system
CN110232439B (en) | Intention identification method based on deep learning network
CN110688862A (en) | Mongolian-Chinese inter-translation method based on transfer learning
CN110569505A (en) | Text input method and device
CN111858932A (en) | Multiple-feature Chinese and English emotion classification method and system based on Transformer
CN110196903B (en) | Method and system for generating abstract for article
CN113300813B (en) | Attention-based combined source and channel method for text
CN111209749A (en) | Method for applying deep learning to Chinese word segmentation
CN115617955B (en) | Hierarchical prediction model training method, punctuation symbol recovery method and device
CN116645971A (en) | Semantic communication text transmission optimization method based on deep learning
CN113065349A (en) | Named entity recognition method based on conditional random field
CN116502628A (en) | Multi-stage fusion text error correction method for government affair field based on knowledge graph
CN115545033A (en) | Chinese field text named entity recognition method fusing vocabulary category representation
CN116436567A (en) | Semantic communication method based on deep neural network
CN115309869A (en) | One-to-many multi-user semantic communication model and communication method
CN113535896A (en) | Searching method, searching device, electronic equipment and storage medium
CN111008277B (en) | Automatic text summarization method
CN110569499B (en) | Generating type dialog system coding method and coder based on multi-mode word vectors
CN115470799B (en) | Text transmission and semantic understanding integrated method for network edge equipment
CN115017924B (en) | Construction of neural machine translation model for cross-language translation and translation method thereof
CN116261176A (en) | Semantic communication method based on information bottleneck
CN115293167A (en) | Dependency syntax analysis-based hierarchical semantic communication method and system
CN115840815A (en) | Automatic abstract generation method based on pointer key information
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||