WO2024045672A1 - Procédé de codage, procédé de décodage, appareil de codage, appareil de décodage et dispositif électronique - Google Patents

Procédé de codage, procédé de décodage, appareil de codage, appareil de décodage et dispositif électronique Download PDF

Info

Publication number
WO2024045672A1
WO2024045672A1 (PCT/CN2023/092061)
Authority
WO
WIPO (PCT)
Prior art keywords
semantic
information
model
decoding
coding
Prior art date
Application number
PCT/CN2023/092061
Other languages
English (en)
Chinese (zh)
Inventor
许晓东
孙梦颖
董辰
韩书君
王碧舳
Original Assignee
北京邮电大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京邮电大学 filed Critical 北京邮电大学
Publication of WO2024045672A1 publication Critical patent/WO2024045672A1/fr

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/048 Activation functions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/70 Labelling scene content, e.g. deriving syntactic or semantic representations
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L1/00 Arrangements for detecting or preventing errors in the information received
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W28/00 Network traffic management; Network resource management
    • H04W28/02 Traffic management, e.g. flow control or congestion control

Definitions

  • the present disclosure relates to the field of communication technology, and in particular to encoding methods, decoding methods, encoding devices, decoding devices and electronic equipment.
  • network nodes tend to become intelligent.
  • the intelligence of network nodes leads to rapid expansion of the information space, a sharp increase in information dimensions, and greater difficulty in representing multi-dimensional information. This makes it difficult to match traditional network service capabilities with high-dimensional information spaces.
  • the volume of data to be transmitted is too large, and the information service system can no longer meet people's needs for complex, diverse and intelligent information transmission.
  • network noise also causes a high transmission error rate.
  • traditional communication systems focus on the transmission process and ignore the context-related meaning.
  • the 5G system is approaching the Shannon limit, and the increase in data volume will bring about a series of problems, such as communication bottlenecks, increased latency, and security issues.
  • Intelligent network is a communication technology that combines artificial intelligence and communication technology to improve communication performance.
  • using artificial intelligence models to encode, disseminate and decode business information can significantly reduce the amount of data transmitted in communication services and greatly improve the efficiency of information transmission without losing important information. Therefore, a joint source-channel encoding and decoding method based on an intelligent simplified network is needed to achieve adaptive encoding and decoding.
  • the encoding and decoding method can be continuously optimized by verifying the transmitted data to ensure the accuracy of communication transmission.
  • the present disclosure provides an encoding method, a decoding method, an encoding device, a decoding device, an electronic device and a storage medium for an intelligent simplified network.
  • an encoding method including:
  • the sending end obtains the first service information and detects the data type of the first service information
  • the sending end selects an adapted semantic encoding model according to the data type of the first service information
  • the sending end inputs the first business information into the semantic coding model to perform semantic extraction to obtain semantic coding information, and performs feature extraction on the first business information to obtain key feature coding information;
  • the sending end selects the corresponding source channel joint coding model based on the type of the semantic coding information and the current channel transmission environment, inputs the semantic coding information, the key feature coding information, the parameters of the semantic decoding model and the channel transmission environment parameters into the source channel joint coding model, encodes second service information according to the coding mode output by the model, and sends it to the receiving end through the physical channel.
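The four encoding steps above can be sketched in Python; every model, type detector and placeholder extractor here is invented for illustration and is not the patent's actual network:

```python
# Hypothetical sketch of the sender-side pipeline: detect data type,
# pick an adapted semantic model, extract semantics + key features,
# then bundle with the channel environment for joint coding.

def detect_data_type(info):
    # Toy type detection; the patent distinguishes text, image and video.
    if isinstance(info, str):
        return "text"
    if isinstance(info, list) and info and isinstance(info[0], list):
        return "image"
    return "unknown"

# Placeholder "semantic coding models", one per data type.
SEMANTIC_MODELS = {
    "text":  lambda x: ("sem:" + x[:16], "key:" + x[:4]),
    "image": lambda x: (("sem", len(x)), ("key", len(x[0]))),
}

def encode(first_service_info, channel_env):
    dtype = detect_data_type(first_service_info)            # step S101
    extract = SEMANTIC_MODELS[dtype]                        # step S102
    semantic_code, key_feature_code = extract(first_service_info)  # step S103
    # step S104: the joint coding model would be chosen from
    # (semantic type, channel environment); here we just bundle the inputs.
    return (semantic_code, key_feature_code, channel_env)

print(encode("hello world, business info", channel_env="C"))
```

The dispatch table stands in for "selecting an adapted semantic encoding model according to the data type".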
  • the encoding method further includes: the sending end periodically samples the second service information received by the receiving end, calculates the mean square error between the second service information output by the decoder of the sending end and the second service information output by the encoder of the sending end, and uses the calculated mean square error to train the encoder and decoder of the sending end and the encoder and decoder of the receiving end.
  • using the calculated mean square error to train the encoder and decoder of the sending end and the encoder and decoder of the receiving end includes:
  • the mean square error is calculated as L_MSE = (1/(w·s·c)) Σ (S − S′)², where:
  • L_MSE represents the mean square error;
  • S represents the second service information output by the encoder of the transmitting end;
  • S′ represents the second service information received by the decoder of the transmitting end;
  • w, s, c represent the length, width and number of channels of the image respectively;
  • L_SE represents the semantic error, computed on the semantic coding information output by the penultimate layer of the semantic coding model;
  • the loss equation used to train the semantic encoding model in the encoder and the semantic decoding model in the decoder is obtained from the mean square error and the semantic error: L = λ·L_SE + (1 − λ)·L_MSE;
  • λ ∈ (0,1) represents a weight factor, which sets the respective proportions of the semantic error L_SE and the mean square error L_MSE.
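A minimal numeric sketch of this weighted loss, with flat Python lists standing in for the image tensors and the penultimate-layer features (the `mse` helper and the example values are illustrative, not from the patent):

```python
# Combined loss L = lam * L_SE + (1 - lam) * L_MSE, lam in (0, 1).

def mse(s, s_prime):
    # Mean square error between encoder output S and decoder output S'.
    return sum((a - b) ** 2 for a, b in zip(s, s_prime)) / len(s)

def combined_loss(s, s_prime, sem, sem_prime, lam=0.5):
    l_mse = mse(s, s_prime)      # pixel-level reconstruction error
    l_se = mse(sem, sem_prime)   # semantic error on penultimate-layer features
    return lam * l_se + (1 - lam) * l_mse

print(combined_loss([1.0, 2.0], [1.0, 2.5], [0.2, 0.4], [0.2, 0.4], lam=0.5))
```

With identical semantic features, only the reconstruction term contributes, so the loss above is 0.5 · 0.125 = 0.0625.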
  • the training of the encoder and decoder of the sending end and the encoder and decoder of the receiving end also includes:
  • the loss equation between the data signal output by the decoder at the sending end and the data signal output by the encoder at the sending end, used to train the source channel joint coding model in the encoder and/or the source channel joint decoding model in the decoder, is calculated as the mean square error L = (1/n) Σ (x − x′)², where:
  • x represents the data signal output by the encoder at the transmitting end;
  • x′ represents the data signal output by the decoder at the transmitting end.
  • the source channel joint encoding and decoding method also includes training the semantic encoding model and/or the semantic decoding model through the following steps:
  • the similarity between the original image output by the encoder of the sending end and the decoded image output by the decoder of the sending end is calculated by the following formula to evaluate the accuracy of the semantic encoding model and the semantic decoding model: SSIM(x, y) = ((2·μx·μy + c1)(2·σxy + c2)) / ((μx² + μy² + c1)(σx² + σy² + c2));
  • μx represents the mean value of x;
  • μy represents the mean value of y;
  • σx² and σy² represent the variances of x and y;
  • σxy represents the covariance of x and y;
  • c1 and c2 represent constant coefficients that stabilize the division;
  • the parameters of the semantic encoding model and/or the semantic decoding model are updated based on a stochastic gradient descent algorithm.
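The similarity measure described here has the structural-similarity (SSIM) form; a small pure-Python sketch, with assumed values for the stabilizing constants c1 and c2 (the patent does not give them):

```python
# SSIM-style similarity between two equal-length signals x and y.

def ssim(x, y, c1=1e-4, c2=9e-4):
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

print(ssim([0.1, 0.5, 0.9], [0.1, 0.5, 0.9]))  # identical signals -> 1.0
```

Identical encoder and decoder outputs yield a similarity of exactly 1.0; the further the decoded image drifts, the lower the score, which is what the training loop thresholds against.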
  • the encoding method also includes training the source channel joint coding model and/or the source channel joint decoding model through the following steps:
  • the similarity between the data signal output by the transmitter encoder and the data signal output by the transmitter decoder is calculated by the same formula to evaluate the accuracy of the source channel joint coding model and the source channel joint decoding model: SSIM(x, y) = ((2·μx·μy + c1)(2·σxy + c2)) / ((μx² + μy² + c1)(σx² + σy² + c2));
  • μx represents the mean value of x;
  • μy represents the mean value of y;
  • σx² and σy² represent the variances of x and y;
  • σxy represents the covariance of x and y;
  • c1 and c2 represent constant coefficients;
  • when the similarity error between the data signal output by the encoder at the transmitting end and the data signal output by the decoder at the transmitting end is greater than the tolerance error threshold ε, the loss equation is calculated;
  • the parameters of the source channel joint encoding model and the source channel joint decoding model are updated based on a stochastic gradient descent algorithm.
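The stochastic gradient descent update can be illustrated on a one-parameter toy model; the model x′ = θ·x and the learning rate are invented for illustration, not the patent's network:

```python
# One SGD step on theta, minimizing (theta*x - target)^2.

def sgd_step(theta, x, target, lr=0.1):
    x_prime = theta * x
    grad = 2 * (x_prime - target) * x   # d/dtheta of (theta*x - target)^2
    return theta - lr * grad            # parameter update: theta -= lr * grad

theta = 0.0
for _ in range(100):
    theta = sgd_step(theta, x=1.0, target=2.0)
print(round(theta, 4))  # converges toward 2.0
```

Each iteration moves θ against the gradient of the loss, which is exactly the update rule applied to the coding/decoding model parameters here.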
  • the step of the sending end processing the first service information to form the second service information specifically includes:
  • the first business information extracted by the semantic coding model is normalized; the normalized first business information is then input into a residual neural network, where a multi-layer residual convolutional neural network and a parameterized activation function encode it; finally, the encoded first business information is regularized to form the second business information.
  • the parameterized activation function includes ReLU or PReLU;
  • when the learnable negative-slope coefficient of PReLU is zero, the activation function PReLU is equivalent to the activation function ReLU.
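A minimal sketch of the two activation functions; the slope parameter `a` is the learnable coefficient of PReLU, and setting it to zero recovers ReLU:

```python
# PReLU keeps a scaled negative branch; ReLU zeroes it.

def prelu(x, a=0.25):
    # a is the learnable slope for negative inputs.
    return x if x > 0 else a * x

def relu(x):
    return max(0.0, x)

print(prelu(-2.0))         # -0.5 with the default slope 0.25
print(prelu(-2.0, a=0.0))  # 0.0, identical to relu(-2.0)
```

The default slope 0.25 is an illustrative choice; in practice `a` is learned alongside the network weights.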
  • the semantic coding model is based on a multi-layer residual network, including two kinds of residual blocks: a bottleneck layer and an extended bottleneck layer.
  • the number of bottleneck layers is 2, and the number of extended bottleneck layers is 2.
  • a decoding method including:
  • the receiving end receives the second service information sent by the transmitting end, and uses the source channel joint decoding model to decode to obtain semantic encoding information, key feature encoding information, and parameters of the semantic decoding model;
  • the receiving end uses the parameters of the semantic decoding model to construct the semantic decoding model, and uses the semantic decoding model to semantically decode the semantic encoding information and the key feature encoding information to obtain the semantic information and key feature information;
  • the receiving end uses the key feature information obtained by decoding to verify the semantic information
  • the receiving end performs recovery processing on the semantic information to obtain third service information
  • the receiving end uses the key feature information to repair the semantic information, or triggers the sending end to resend the second service information; once the semantic information passes verification, recovery processing is performed on it to obtain the third service information.
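The receive-verify-repair-retransmit flow above can be sketched as follows; the verification and repair rules here are placeholders invented for illustration, not the patent's actual checks:

```python
# Receiver-side loop: verify decoded semantics against key features,
# repair locally if possible, otherwise trigger a retransmission.

def verify(semantic_info, key_features):
    # Placeholder check: the key features must appear in the semantic info.
    return key_features in semantic_info

def repair(semantic_info, key_features):
    # Placeholder repair: re-attach the key features.
    return semantic_info + key_features

def receive(second_service_info, resend, max_retransmits=3):
    semantic_info, key_features = second_service_info
    attempts = 0
    while not verify(semantic_info, key_features):
        repaired = repair(semantic_info, key_features)
        if verify(repaired, key_features):
            semantic_info = repaired            # local repair succeeded
        else:
            if attempts >= max_retransmits:
                raise RuntimeError("verification failed after retransmissions")
            semantic_info, key_features = resend()  # ask the sender again
            attempts += 1
    return "recovered:" + semantic_info  # recovery -> third service info

print(receive(("hello", "ell"), resend=lambda: ("hello", "ell")))  # -> recovered:hello
```

The retransmission cap is an assumption; the patent only says resending continues until verification passes.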
  • the step of the receiving end decoding the second service information includes:
  • the second business information is input into multiple deconvolution layers of the semantic decoding model, and output through an activation function, and finally a deregularization process is performed; wherein the activation function includes PReLU, ReLU, and Sigmoid activation functions.
  • an encoding device including:
  • the first acquisition module is configured to acquire the first business information and detect the data type of the first business information
  • a model selection module configured to select an adapted semantic coding model according to the data type of the first business information
  • a semantic encoding module configured to input the first business information into the semantic encoding model to perform semantic extraction to obtain semantic encoding information, and to perform feature extraction on the first business information to obtain key feature encoding information;
  • the source channel joint coding module is configured to select the corresponding source channel joint coding model based on the type of the semantic coding information and the current channel transmission environment, to input the semantic coding information, the key feature coding information, the parameters of the semantic decoding model and the channel transmission environment parameters into the source channel joint coding model, and to encode second service information according to the coding mode output by the model and send it to the receiving end through the physical channel.
  • the encoding device further includes: a model training module configured to periodically sample the second service information received by the receiving end, calculate the mean square error between the second service information output by the decoder of the sending end and the second service information output by the encoder of the sending end, and use the calculated mean square error to train the encoder and decoder of the sending end and the encoder and decoder of the receiving end.
  • a decoding device including:
  • the source channel joint decoding module is configured to receive the second service information, and perform decoding using the source channel joint decoding model to obtain the semantic encoding information, the key feature encoding information and the parameters of the semantic decoding model;
  • a semantic decoding module configured to construct the semantic decoding model using parameters of the semantic decoding model, and use the semantic decoding model to semantically decode the semantic encoding information and the key feature encoding information to obtain semantic information and Key feature information;
  • An information verification module configured to verify the semantic information using the key feature information
  • an information processing module configured to, in response to the semantic information passing verification, perform recovery processing on the semantic information to obtain third business information; and, in response to the semantic information not passing verification, use the key feature information to repair the semantic information, or trigger the sending end to resend the second service information until the semantic information passes verification and is restored to obtain the third service information.
  • the present disclosure also provides an electronic device, including:
  • the memory stores instructions that can be executed by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the encoding described in any one of the above technical solutions. method or decoding method.
  • the present disclosure also provides a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the encoding method or decoding method according to any one of the above embodiments.
  • the present disclosure also provides a computer program product, including a computer program that, when executed by a processor, implements the encoding method or the decoding method according to any one of the above embodiments.
  • the present disclosure provides an encoding method, a decoding method, an encoding device, a decoding device, an electronic device and a storage medium, which combine the semantic extraction of artificial intelligence with the joint source-channel encoding of communication technology, taking into account both the type of source information and the channel transmission environment.
  • Figure 1 is a step diagram of an encoding method in an embodiment of the present disclosure
  • Figure 2 is a step diagram of a decoding method in an embodiment of the present disclosure
  • Figure 3 is an overall flow chart of joint encoding and decoding of source channels in an embodiment of the present disclosure
  • Figure 4 is a schematic diagram of encoder model optimization based on feedback information from the receiving end in an embodiment of the present disclosure
  • Figure 5 is a flow chart of joint semantic source channel encoding and decoding in an embodiment of the present disclosure
  • Figure 6 is a structural diagram of a multi-layer residual network for feature extraction in an embodiment of the present disclosure
  • Figure 7 is a structural diagram of the bottleneck layer of the residual network in the embodiment of the present disclosure.
  • Figure 8 is a structural diagram of the extended bottleneck layer of the residual network in the embodiment of the present disclosure.
  • Figure 9 is a diagram of the key feature extraction and analysis process in the embodiment of the present disclosure.
  • Figure 10 is a functional block diagram of an encoding device in an embodiment of the present disclosure.
  • Figure 11 is a functional block diagram of a decoding device in an embodiment of the present disclosure.
  • the sending end device uses a preconfigured first model to extract the first service information and obtain the second service information to be transmitted; the sending end device transmits the second service information to the receiving end device.
  • the receiving end device receives the second service information and uses a preconfigured second model to restore it to obtain third service information; the amount of data transmitted as second service information is smaller than that of the original service information.
  • the method further includes: an update module determines whether the receiving end device needs to update the second model; when it determines that an update is needed, the update module transmits a preconfigured third model to the receiving end device, and the receiving end device uses the third model to update the second model.
  • processing business information through pre-trained artificial intelligence models can significantly reduce the amount of data transmitted in communication services and greatly improve the efficiency of information transmission. These models are relatively stable, reusable and disseminable. The dissemination and reuse of models helps enhance network intelligence while reducing overhead and resource waste.
  • the model can be divided into several model slices according to different segmentation rules.
  • model slices can also be transmitted between different network nodes, and the model slices can be assembled into models.
  • Model slices can be distributed and stored on multiple network nodes. When a network node finds that it is missing or needs to update a certain model or a certain model slice, it can request it from surrounding nodes that may have the slice.
  • the network nodes passing along the path of transmitting the business information and transmitting the model include intelligent simplified routers.
  • the functions of the intelligent simplified router include but are not limited to business information transmission, model transmission, model absorption and self-update, security protection and other functions.
  • the transmission function of the intelligent router involves transmitting business information or models from the source node to the sink node. There are multiple paths between the source node and the sink node.
  • the model transmission function of the intelligent simplified router can transmit model slices. By rationally routing the model slices over multiple paths, the slices can be transmitted in parallel to improve the model transmission rate.
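Slicing and reassembly can be illustrated with plain byte strings; the slice size and model bytes below are arbitrary placeholders:

```python
# Split a serialized model into fixed-size slices for multi-path
# transmission, then reassemble them at the destination.

def slice_model(model_bytes, slice_size):
    return [model_bytes[i:i + slice_size]
            for i in range(0, len(model_bytes), slice_size)]

def assemble(slices):
    # Slices may arrive over different paths; order is restored first.
    return b"".join(slices)

model = b"weights-of-a-semantic-model"
slices = slice_model(model, 8)
assert assemble(slices) == model   # round trip is lossless
print(len(slices))
```

In the scheme described here each slice could travel a different path between the source and sink nodes, and a node missing a slice requests it from neighbors.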
  • This disclosure provides an encoding method, as shown in Figure 1, including:
  • Step S101 The sending end obtains the first service information and detects the data type of the first service information
  • Step S102 The sending end selects an adapted semantic coding model according to the data type of the first service information
  • Step S103 The sending end inputs the first business information into the semantic coding model to perform semantic extraction to obtain semantic coding information, and performs feature extraction on the first business information to obtain key feature coding information;
  • Step S104: The sender selects the corresponding source channel joint coding model based on the type of semantic coding information and the current channel transmission environment, inputs the semantic coding information, key feature coding information, semantic decoding model parameters and channel transmission environment parameters into the source channel joint coding model, encodes second service information according to the coding mode output by the model, and sends it to the receiving end through the physical channel.
  • the coding method is characterized by jointly considering the semantic extraction of the source and the source channel coding.
  • an intelligent simplified transmitter is constructed at the sending end, which mainly includes two modules, one is the semantic coding module and the other is the source channel joint coding module.
  • in the semantic coding module, an artificial intelligence method is used to perform semantic extraction on the source data (i.e. the first business information) with the trained semantic coding model, and the extracted semantic information and the parameters of the selected semantic coding model are intelligently encoded.
  • in the process of semantic extraction, key feature information is extracted based on the data type of the first business information (text, image or video), and the key feature information is encoded and transmitted.
  • the key features are used to verify the restored information.
  • the verification results can serve as a basis for judging whether the information needs to be retransmitted, or as feedback for training the semantic encoding model and semantic decoding model of the sender and receiver.
  • in the source channel joint coding module, artificial intelligence technology is used to perform perception modeling of the transmission environment and channel estimation of the modulation and coding channel.
  • based on the joint source channel coding model and the current channel transmission information, the data that needs to be transmitted is converted into signals.
  • through the above technical solution, the extracted semantic information is combined with the current channel transmission environment.
  • the specific source information is combined with the specific channel transmission environment, and a joint coding model is pre-trained for each combination.
  • for example, if the mainstream channel conditions are divided into A, B, C and D, then the four source channel joint coding models A-A, A-B, A-C and A-D can be pre-trained for source type A.
  • if the current channel condition is C, the A-C model can be chosen for encoding.
  • Such a model selected for a specific source type and a specific channel transmission environment will help improve coding accuracy and channel transmission efficiency.
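The example above maps naturally onto a lookup keyed by (source type, channel condition); the model names here are placeholders for the pre-trained joint coding models:

```python
# Pre-trained joint source-channel coding models, one per
# (source type, channel condition) pair, following the A-C example.

JOINT_MODELS = {
    ("A", "A"): "model-A-A",
    ("A", "B"): "model-A-B",
    ("A", "C"): "model-A-C",
    ("A", "D"): "model-A-D",
}

def select_model(source_type, channel_condition):
    # Pick the model matched to both the source and the current channel.
    return JOINT_MODELS[(source_type, channel_condition)]

print(select_model("A", "C"))  # -> model-A-C
```

Matching the model to both axes is what the text credits with improving coding accuracy and channel transmission efficiency.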
  • the encoding method also includes: the sending end periodically samples the second service information received by the receiving end, calculates the mean square error between the second service information output by the decoder of the sending end and the second service information output by the encoder of the sending end, and uses the calculated mean square error to train the encoder and decoder at the transmitting end and the encoder and decoder at the receiving end.
  • the encoder at the sending end and the encoder at the receiving end both include a semantic coding model and a source channel joint coding model.
  • the decoder at the sending end and the decoder at the receiving end both include a semantic decoding model and a source channel joint decoding model.
  • this embodiment proposes that the data information (i.e., the second service information) received by the decoder model 2A of the receiving end 2 is periodically sampled and sent back to the original sending end 1 through the physical channel, where it is input into the decoder model 1A of the sending end 1; the decoded symbols, together with the original sending symbols output by the encoder model 1B of the sending end, are input into the model training unit 1C for comparison.
  • the model training unit 1C calculates the mean square error of the two and performs model optimization training of the encoder and decoder, thereby improving the encoding and decoding capabilities of the encoding model and decoding model and ensuring the accuracy of the transmitted data.
  • using the calculated mean square error to train the encoder and decoder at the transmitting end and the encoder and decoder at the receiving end includes:
  • the mean square error is calculated as L_MSE = (1/(w·s·c)) Σ (S − S′)², where:
  • L_MSE represents the mean square error;
  • S represents the second service information output by the encoder at the transmitting end;
  • S′ represents the second service information received by the decoder at the transmitting end;
  • w, s, c represent the length, width and number of channels of the image respectively;
  • L_SE represents the semantic error, computed on the semantic coding information output by the penultimate layer of the semantic coding model;
  • the loss equation used to train the semantic encoding model in the encoder and the semantic decoding model in the decoder is obtained from the mean square error and the semantic error: L = λ·L_SE + (1 − λ)·L_MSE;
  • λ ∈ (0,1) represents the weight factor, which sets the respective proportions of the semantic error L_SE and the mean square error L_MSE.
  • x represents the data signal output by the transmitting end encoder
  • x′ represents the data signal output by the transmitting end decoder
  • the accuracy of the model is evaluated by calculating the similarity between the original image (or data signal) output by the transmitter encoder and the decoded image (or data signal) output by the transmitter decoder: SSIM(x, y) = ((2·μx·μy + c1)(2·σxy + c2)) / ((μx² + μy² + c1)(σx² + σy² + c2));
  • μx represents the mean value of x;
  • μy represents the mean value of y;
  • σx² and σy² represent the variances of x and y;
  • σxy represents the covariance of x and y;
  • c1 and c2 represent constant coefficients.
  • the similarity of the image or of the data signal between the encoder of the sending end and the decoder of the sending end is obtained through the above evaluation formula.
  • when the similarity error of the image is greater than the preset threshold, it indicates that the accuracy of the semantic encoding model or semantic decoding model is insufficient, and the corresponding loss equation can be used to train those models; when the similarity error of the data signal is greater than the threshold, it indicates that the accuracy of the source channel joint encoding model or the source channel joint decoding model is insufficient, resulting in inaccurate encoded or decoded information, and the corresponding loss equation can be used to train those models.
  • a picture source is defined as S ∈ C^(w×h×s), where w, h, s respectively represent the picture's length, width and number of channels.
  • the semantic encoder includes a multi-layer downsampling convolution layer and a multi-layer linear coding layer, which reduces the dimensionality of the multi-dimensional picture source (first business information) into semantic coding information l ∈ R^d, where d represents the dimension of the semantic coding information. θ and φ represent the neural network parameters of the semantic encoder and semantic decoder respectively.
  • C_a and E_θ represent the encoder and decoder respectively used for joint source channel coding; the parameter φ represents the parameters of the semantic decoding model.
  • the bit data stream (second service information) sent by the sending end undergoes multipath channel fading, and the information received by the receiving end is modeled as y = h ⊛ x + σ_noise, where:
  • y represents the received symbol information stream;
  • σ_noise represents additive white Gaussian noise;
  • h is the channel impulse response gain of the multipath channel;
  • ⊛ represents the convolution operation; it is assumed that for a given symbol signal stream the channel response is fixed, while it differs for different symbols over time.
  • channel h changes with time, specifically expressed as h(t) = Σ_{k=1..K} a_k(t)·δ(t − τ_k), where:
  • a_k(t) represents the channel gain of the k-th path;
  • τ_k represents the delay of the k-th path;
  • K indicates the total number of paths.
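Under this model the channel can be simulated as a discrete convolution of the symbol stream with the path gains a_k, plus additive Gaussian noise; the tap values and noise level below are made up for illustration:

```python
# Multipath channel y = h * x + n: each tap k applies gain a_k at delay k.

import random

def multipath_output(x, taps, noise_std=0.01, seed=0):
    rng = random.Random(seed)
    y = [0.0] * (len(x) + len(taps) - 1)
    # Discrete convolution of symbols with the channel impulse response.
    for i, xi in enumerate(x):
        for k, a_k in enumerate(taps):      # a_k: gain of the k-th path
            y[i + k] += a_k * xi
    # Additive white Gaussian noise on every received sample.
    return [v + rng.gauss(0.0, noise_std) for v in y]

x = [1.0, -1.0, 1.0]
y = multipath_output(x, taps=[0.8, 0.3], noise_std=0.0)
print(y)  # noiseless case: plain convolution [0.8, -0.5, 0.5, 0.3]
```

Setting `noise_std=0.0` isolates the fading term, which makes the convolution structure of the model easy to check by hand.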
  • the data obtained by decoding the second service information at the receiving end is expressed as:
  • the mean square error is used as part of the loss equation to train the semantic encoding model and/or the semantic decoding model, that is, the models are trained through the above loss equation.
  • the transmitting end is equipped with a semantic encoder and a semantic decoder.
  • the semantic decoder parameters need to be transmitted to the receiving end together with the semantic encoding information.
  • the semantic decoding model needs to be configured so that the amount of data is as small as possible.
  • the basic framework of AI automatic coding based on convolutional neural network is used to effectively reduce the redundancy of parameters.
  • σ represents the activation function (the activation function can be parameterized ReLU (PReLU) or the Sigmoid function); * represents the 2D convolution operation.
  • the fully connected layer at the receiving end is symmetrical to the fully connected layer at the transmitting end.
  • the receiving end reconstructs the model from the received semantic decoding model parameters, inputs the semantic bit data stream into the reconstructed semantic decoding model, and restores the second business information to obtain third business information (the restored image) that is highly similar to the first business information; the resulting reconstruction is as follows:
  • W represents the flip operation.
  • the similarity between the restored image y and the original image x can be measured from these statistics, consistent with the standard SSIM index SSIM(x, y) = (2μ_x μ_y + c₁)(2σ_xy + c₂) / ((μ_x² + μ_y² + c₁)(σ_x² + σ_y² + c₂)), where:
  • μ_x represents the mean value of x
  • μ_y represents the mean value of y
  • σ_xy represents the covariance of x and y
  • c₁ and c₂ represent stabilization constants.
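These statistics combine into the standard SSIM index; a minimal single-window version, assuming images normalized to [0, 1] and the conventional c₁ = 0.01², c₂ = 0.03² stabilization constants, might look like:

```python
import numpy as np

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Single-window SSIM between two images with values in [0, 1]."""
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )

img = np.linspace(0.0, 1.0, 64).reshape(8, 8)
score = ssim(img, img)          # identical images score 1
```

Production implementations compute SSIM over sliding Gaussian windows; the global version here only illustrates the formula.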
  • the training steps include:
  • Input: a set of images S from the image database; a model selection classifier with a tolerance error threshold ε; the semantic encoding model E and the semantic decoding model D (with their respective parameters); the fading channel h; and the noise ε_noise;
  • the step of the sending end processing the first service information to form the second service information specifically includes:
  • the semantic coding model first normalizes the first service information, then inputs the normalized first service information into the residual neural network; the multi-layer residual convolutional neural network with a parameterized activation function encodes the first service information, which is finally regularized to form the second service information;
  • the parameterized activation function includes ReLU or PReLU.
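The normalize, encode, regularize pipeline above can be sketched as follows. Fully connected layers stand in for the residual convolutions, all shapes and weights are hypothetical, and "regularization" is read here as power normalization of the channel input:

```python
import numpy as np

rng = np.random.default_rng(0)

def prelu(x, a=0.25):
    """Parameterized ReLU used between layers."""
    return np.maximum(0.0, x) + a * np.minimum(0.0, x)

def semantic_encode(image, weights, a=0.25):
    """Normalize -> stacked layers with PReLU -> power regularization."""
    x = (image - image.mean()) / (image.std() + 1e-8)   # input normalization
    x = x.ravel()
    for w in weights:                                   # stand-ins for Res1-Res5
        x = prelu(w @ x, a)
    return x / np.sqrt(np.mean(x ** 2) + 1e-8)          # unit average power

img = rng.random((8, 8))
weights = [0.1 * rng.standard_normal((32, 64)),
           0.1 * rng.standard_normal((16, 32))]
code = semantic_encode(img, weights)                    # channel-ready symbols
```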
  • the steps for the receiving end to decode the second service information include:
  • the second service information is input into multiple deconvolution layers of the semantic decoding model, passed through the activation function, and finally de-regularized;
  • the activation function includes PReLU, ReLU, and Sigmoid activation functions.
  • a neural network is introduced to map the extracted semantic information and the corresponding semantic extraction training model parameters into bit data.
  • a residual network is introduced to parameterize the source encoder and decoder.
  • the specific joint source channel coding process is shown in Figure 5. The parameterization process is described in detail as follows:
  • the sending end normalizes the extracted source data (first business information), and then inputs the normalized data into the residual neural network.
  • the residual neural network uses the parameterized activation function ReLU (PReLU).
  • the source data is encoded by a coding function based on a multi-layer residual convolutional neural network; after passing through the multi-layer convolutional neural network, the data is regularized, input into the physical channel, and sent to the receiving end.
  • the PReLU activation function is specifically expressed as PReLU(x) = max(0, x) + a · min(0, x), where a is a learnable slope for negative inputs.
  • when a = 0, the activation function PReLU is equivalent to the activation function ReLU.
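A direct implementation of this activation (in practice the slope a is learned; 0.25 is just a common initial value):

```python
import numpy as np

def prelu(x, a=0.25):
    """Parameterized ReLU: f(x) = max(0, x) + a * min(0, x)."""
    return np.maximum(0.0, x) + a * np.minimum(0.0, x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
out = prelu(x)                      # negative inputs are scaled by a
```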
  • Res1-Res4 are the residual convolution layers generated by ResNet50
  • Res5 is an additional convolution layer
  • F is the encoding output information.
  • Res1-Res4 have different convolution structures, which usually need to be adjusted based on the output format of F.
  • Res1 is generally initialized with the (7×7 conv, 64, stride 2) parameterization, i.e., a 7×7 convolution with 64 output channels and a stride of 2.
  • the advantages of using the above residual network structure are: (1) low complexity and few parameters required; (2) greater network depth without gradient vanishing; (3) increased classification accuracy; (4) relief of the network degradation problem during training.
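A toy single-channel residual block illustrating the skip connection that keeps gradients flowing; the 3×3 kernel and 8×8 input are illustrative, not the ResNet50 Res1-Res4 layers themselves (and, as in deep learning frameworks, the "convolution" is implemented as cross-correlation):

```python
import numpy as np

def conv2d_same(x, w):
    """'Same'-padded single-channel 2-D convolution, stride 1."""
    kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * w)
    return out

def residual_block(x, w, a=0.25):
    """PReLU(x + conv(x)): the skip path lets gradients bypass the conv."""
    z = x + conv2d_same(x, w)
    return np.maximum(0.0, z) + a * np.minimum(0.0, z)

x = np.random.default_rng(1).standard_normal((8, 8))
w = np.zeros((3, 3))      # zero kernel: the block reduces to PReLU(x)
y = residual_block(x, w)
```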
  • the receiving end inputs the received data into multiple deconvolution layers corresponding to the transmitting convolution layers, passes the output through the activation function, and performs a de-regularization step to decode the semantic bit data, the bit data of the semantic decoding model parameters, and the bit data of the key feature extraction encoding information; based on the semantic decoding model parameters, the semantic information is then decoded.
  • the activation functions mentioned above include but are not limited to PReLU, ReLU, Sigmoid activation function, etc.
  • the deconvolutional neural network of the decoder corresponds to the neural network of the encoder.
  • This disclosure also provides a decoding method, as shown in Figure 2, including:
  • Step S201 the receiving end receives the second service information sent by the transmitting end, and uses the source channel joint decoding model to decode to obtain semantic encoding information, key feature encoding information and parameters of the semantic decoding model;
  • Step S202 the receiving end uses the parameters of the semantic decoding model to construct a semantic decoding model, and uses the semantic decoding model to semantically decode the semantic encoding information and key feature encoding information to obtain the semantic information and key feature information;
  • Step S203 The receiving end uses the key feature information obtained by decoding to verify the semantic information
  • Step S204 In response to the semantic information passing the verification, the receiving end performs recovery processing on the semantic information to obtain the third service information;
  • Step S205 In response to the semantic information not passing the verification, the receiving end uses the key feature information to repair the semantic information, or triggers the sending end to resend the second service information; once the semantic information passes the verification, the flow turns to step S204 to perform recovery processing on the semantic information to obtain the third service information.
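Steps S201-S205 amount to a decode, verify, repair-or-retransmit loop. A toy sketch, with a mod-256 checksum standing in for the key feature information and all helper names hypothetical:

```python
# Toy stand-ins for the AI decoding models: the "semantic information" is a
# list of ints and the "key feature" is its mod-256 checksum.
def semantic_decode(second_info):            # steps S201-S202, collapsed
    return list(second_info["payload"]), second_info["checksum"]

def verify(semantic, key_feature):           # step S203
    return sum(semantic) % 256 == key_feature

def repair(semantic, key_feature):           # step S205: local repair attempt
    fixed = list(semantic)
    fixed[0] -= (sum(fixed) - key_feature) % 256
    return fixed

def receive(second_info, resend, max_retries=3):
    """Control flow of steps S201-S205."""
    for _ in range(max_retries):
        semantic, key = semantic_decode(second_info)
        if verify(semantic, key):            # step S203 -> S204
            return semantic                  # recovery step omitted in this toy
        repaired = repair(semantic, key)     # step S205: repair with key feature
        if verify(repaired, key):
            return repaired
        second_info = resend()               # step S205: trigger retransmission
    raise RuntimeError("verification failed after retries")
```

In the disclosure the verification would instead compare decoded semantic information against decoded key features, or use the feature verification network.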
  • the intelligent simplified receiver may include an AI model-based source channel decoder and an AI model-based semantic decoder.
  • the intelligent receiver decodes the received second service information (the second service information is bit data) according to the AI model parameters to obtain the semantic decoding model parameters used for semantic decoding at the receiving end, the semantic coding information, and the key feature encoding information; it then uses the semantic decoding model parameters to build a semantic decoding model and performs semantic decoding to obtain the semantic information and key feature information corresponding to the first service information. To verify the accuracy of the received information, the decoded semantic information needs to be verified.
  • This disclosure provides two verification methods: (1) analyze the important information in the semantic information and compare it against the received key feature information; (2) input the semantic information into the verification network and determine whether the output elements of the verification network are in the key feature information. If the verification passes, the semantic information decoded by the receiving end is correct; otherwise, the semantic information is incorrect. Key feature information is used to repair semantic information that fails the verification, or, if necessary, the sending end is triggered to resend all or part of the service information to ensure the accuracy of the transmitted service information.
  • the step of decoding the second service information by the receiving end includes: inputting the second service information into multiple deconvolution layers of the semantic decoding model, outputting it through the activation function, and finally performing a deregularization process.
  • the activation functions include PReLU, ReLU, and Sigmoid activation functions.
  • the receiving end inputs the decoded semantic information into the feature verification network to verify the restored semantic information.
  • the processing of the feature verification network includes: pooling the decoded feature information; splicing the channel signal-to-noise ratio with the pooled information; inputting the result into a fully connected layer whose output passes through the activation function PReLU; and then inputting it into the next fully connected layer, whose output passes through the Sigmoid activation function.
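That pool → splice-SNR → FC → PReLU → FC → Sigmoid pipeline can be sketched with random (untrained) weights; all dimensions here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def prelu(x, a=0.25):
    return np.maximum(0.0, x) + a * np.minimum(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feature_verification(features, snr_db, w1, b1, w2, b2):
    """Pool -> append channel SNR -> FC -> PReLU -> FC -> Sigmoid."""
    pooled = features.mean(axis=1)                 # global average pooling
    z = np.concatenate([pooled, [snr_db]])         # splice in channel SNR
    h = prelu(w1 @ z + b1)
    return sigmoid(w2 @ h + b2)                    # per-element scores in (0, 1)

feat = rng.standard_normal((4, 16))                # 4 feature maps, 16 dims each
w1, b1 = rng.standard_normal((8, 5)), np.zeros(8)
w2, b2 = rng.standard_normal((4, 8)), np.zeros(4)
scores = feature_verification(feat, snr_db=10.0, w1=w1, b1=b1, w2=w2, b2=b2)
```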
  • the present disclosure also provides an encoding device, as shown in Figure 10, including:
  • the first acquisition module 101 is configured to acquire the first business information and detect the data type of the first business information
  • the model selection module 102 is configured to select an adapted semantic coding model according to the data type of the first business information
  • the semantic encoding module 103 is configured to input the first business information into the semantic encoding model to perform semantic extraction to obtain semantic encoding information, and to perform feature extraction on the first business information to obtain key feature encoding information;
  • the source channel joint encoding module 104 is configured to select a corresponding source channel joint encoding model based on the type of semantic encoding information and the current channel transmission environment, and to input the semantic encoding information, the key feature encoding information, the parameters of the semantic decoding model, and the channel transmission environment parameters into the source channel joint encoding model;
  • the second service information is encoded according to the coding mode output by the source channel joint coding model and sent to the receiving end through the physical channel.
  • the feature of the joint source channel coding method is that the semantic extraction of the source and the source channel coding are jointly considered.
  • the first acquisition module 101, the model selection module 102, the semantic encoding module 103, and the source channel joint encoding module 104 can be set in the sending end 1.
  • an intelligent simplified transmitter is constructed at the sending end.
  • the intelligent simplified transmitter mainly includes two modules, one is the semantic encoding module and the other is the source channel joint encoding module.
  • Figure 3 shows the overall flow chart of semantic encoding transmission at the sender and semantic restoration at the receiver.
  • the semantic encoding module 103 adopts artificial intelligence methods, using the trained semantic encoding model to perform semantic extraction on the source data (i.e., the first service information), and intelligently encodes the extracted semantic information together with the parameters of the selected semantic encoding model.
  • key feature information is extracted based on the data type of the first business information (text, image, or video), and the key feature information is encoded and transmitted.
  • the key features are used to verify the restored information;
  • the verification results can serve as a basis for judging whether the information needs to be retransmitted, and also as feedback for training the semantic encoding and semantic decoding models at the sender and receiver.
  • in the source channel joint coding module 104, artificial intelligence technology is used to perform perception modeling of the transmission environment and channel estimation for modulation and coding; based on the AI model and the current channel transmission information, joint source-channel coding is performed on the data to be transmitted. Through the above coding device, the extracted semantic information is combined with the current channel transmission environment, and the AI model outputs a better coding mode, which helps improve coding accuracy and channel transmission efficiency.
  • the encoding device further includes: a model training module 105, configured to periodically sample the second service information received by the receiving end, calculate the mean square error between the second service information output by the decoder at the transmitting end and the second service information output by the encoder at the transmitting end,
  • and use the calculated mean square error to train the encoder and decoder at the transmitting end and the encoder and decoder at the receiving end.
  • the model training method in this embodiment is consistent with the model training method in the above encoding method embodiment:
  • the model is trained by computing the loss function, so the details are not repeated here.
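The MSE-driven training described above can be illustrated with a toy linear encoder/decoder pair trained by gradient descent; the dimensions, learning rate, and the linear model itself are illustrative simplifications of the AI models in the disclosure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear encoder/decoder trained with the mean square error loss.
x = rng.standard_normal((200, 8))          # sampled "service information"
enc = 0.1 * rng.standard_normal((4, 8))    # encoder weights (8 -> 4)
dec = 0.1 * rng.standard_normal((8, 4))    # decoder weights (4 -> 8)
lr = 0.05

for _ in range(500):
    z = x @ enc.T                          # encode
    x_hat = z @ dec.T                      # decode
    err = x_hat - x                        # gradient of MSE w.r.t. x_hat
    dec -= lr * (err.T @ z) / len(x)       # decoder gradient step
    enc -= lr * ((err @ dec).T @ x) / len(x)  # encoder step (chain rule)

mse = np.mean((x @ enc.T @ dec.T - x) ** 2)   # falls well below initial ~1.0
```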
  • the present disclosure also provides a decoding device, as shown in Figure 11, including:
  • the source channel joint decoding module 201 is configured to receive the second service information and use the source channel joint decoding model to perform decoding to obtain the semantic encoding information, the key feature encoding information, and the parameters of the semantic decoding model;
  • the semantic decoding module 202 is configured to use the parameters of the semantic decoding model to construct a semantic decoding model, and use the semantic decoding model to semantically decode the semantic encoding information and key feature encoding information to obtain the semantic information and key feature information;
  • the information verification module 203 is configured to verify the semantic information using key feature information
  • the information processing module 204 is configured to, in response to the semantic information passing the verification, restore the semantic information to obtain the third business information; and in response to the semantic information not passing the verification, use the key feature information to repair the semantic information, or The sending end is triggered to resend the second service information until the semantic information passes the verification and is restored to obtain the third service information.
  • the source channel joint decoding module 201, the semantic decoding module 202, the information verification module 203, and the information processing module 204 can be provided in the receiving end 2.
  • an intelligent simplified receiver is built.
  • the intelligent simplified receiver can include a source channel decoder based on the AI model and a semantic decoder based on the AI model.
  • the source channel joint decoding module 201 decodes the received second service information (the second service information is bit data) according to the AI model parameters to obtain semantic decoding model parameters, semantic encoding information and key features for semantic decoding at the receiving end.
  • the semantic decoding module 202 uses the semantic decoding model parameters to construct a semantic decoding model, and performs semantic decoding to obtain semantic information and key feature information corresponding to the first business information.
  • the information verification module 203 needs to verify the decoded semantic information.
  • the present disclosure provides two verification methods: (1) analyze the important information in the semantic information and compare it against the received key feature information; (2) input the semantic information into the verification network and determine whether the output elements of the verification network are in the key feature information. If the verification passes, the semantic information decoded by the receiving end is correct, and the information processing module 204 performs semantic recovery on the correct semantic information to obtain the third service information.
  • the first service information is an image
  • the second service information includes the third service information.
  • the semantic information obtained by semantic extraction of the first business information is restored to obtain an image that is highly similar to the first business information, that is, the third business information.
  • the amount of data transmitted is greatly reduced and communication efficiency is improved; conversely, if the semantic information is incorrect, key feature information is used to repair the semantic information that fails the verification, or, if necessary, the sending end is triggered to resend all or part of the service information until the semantic information decoded by the receiving end passes the verification of the information verification module 203, ensuring the accuracy of the transmitted service information.
  • the present disclosure also provides an electronic device, a readable storage medium, and a computer program product.
  • electronic devices are intended to mean various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers.
  • Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • the device includes a computing unit that can perform various appropriate actions and processes according to a computer program stored in a read-only memory (ROM) or loaded from a storage unit into a random access memory (RAM);
  • the RAM can also store various programs and data required for device operation.
  • the computing unit, ROM and RAM are connected to each other via buses. Input/output (I/O) interfaces are also connected to the bus.
  • multiple components in the device are connected to the I/O interface, including: input units, such as keyboards and mice; output units, such as various types of monitors and speakers; storage units, such as disks and optical discs; and communication units, such as network cards, modems, and wireless communication transceivers.
  • the communication unit allows the device to exchange information/data with other devices through computer networks such as the Internet and/or various telecommunications networks.
  • Computing units may be various general and/or special purpose processing components having processing and computing capabilities. Some examples of computing units include, but are not limited to, central processing units (CPUs), graphics processing units (GPUs), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, digital signal processors (DSP), and any appropriate processor, controller, microcontroller, etc.
  • the computing unit performs each method and processing described above, such as the encoding method or decoding method in the above embodiments.
  • the encoding method or the decoding method may be implemented as a computer software program, which is tangibly embodied in a machine-readable medium, such as a storage unit.
  • part or all of the computer program may be loaded and/or installed onto the device via the ROM and/or communication unit.
  • when the computer program is loaded into RAM and executed by the computing unit, one or more steps of the above-described encoding method or decoding method may be performed.
  • the computing unit may be configured to perform the encoding method or the decoding method in any other suitable manner (eg, by means of firmware).
  • Various implementations of the systems and techniques described above may be implemented in digital electronic circuit systems, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof.
  • These various embodiments may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special purpose or general purpose programmable processor, and which may receive data and instructions from a storage system, at least one input device, and at least one output device, and transmit data and instructions to the storage system, the at least one input device, and the at least one output device.
  • Program code for implementing the encoding method or decoding method of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that the program codes, when executed by the processor or controller, cause the functions and operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer diskettes, hard drives, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above.
  • to provide interaction with users, the systems and techniques described herein may be implemented on a computer having: a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and pointing device (for example, a mouse or trackball) through which the user can provide input to the computer.
  • Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
  • the systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., as a data server), middleware components (e.g., an application server), or front-end components (e.g., a user computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and technologies described herein), or any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication (eg, a communications network). Examples of communication networks include: local area network (LAN), wide area network (WAN), and the Internet.
  • Computer systems may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact over a communications network.
  • the relationship of client and server is created by computer programs running on corresponding computers and having a client-server relationship with each other.
  • the server can be a cloud server, a distributed system server, or a server combined with a blockchain.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The present disclosure relates to the technical field of communications, and in particular to an encoding method, a decoding method, an encoding apparatus, a decoding apparatus, and an electronic device. The method comprises: performing, by a sending end, semantic extraction on first service information to obtain semantic coding information and key feature coding information; selecting, according to the type of the semantic coding information and the current channel transmission environment, a corresponding joint source-channel coding model, the model encoding the semantic coding information, the key feature coding information, the parameters of the semantic decoding model, and the channel transmission environment parameters to form second service information, which is then sent to a receiving end; receiving, by the receiving end, the second service information, then performing source-channel decoding and semantic decoding to obtain semantic information and key feature information; and verifying the semantic information using the key feature information, then recovering the semantic information after successful verification to obtain third service information. The amount of communication transmission is thereby reduced, and the accuracy of communication transmission is ensured.
PCT/CN2023/092061 2022-09-02 2023-05-04 Encoding method, decoding method, encoding apparatus, decoding apparatus and electronic device WO2024045672A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211072342.5A CN117692094A (zh) 2022-09-02 2022-09-02 编码方法、解码方法、编码装置、解码装置及电子设备
CN202211072342.5 2022-09-02

Publications (1)

Publication Number Publication Date
WO2024045672A1 true WO2024045672A1 (fr) 2024-03-07

Family

ID=90100298

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/092061 WO2024045672A1 (fr) 2022-09-02 2023-05-04 Procédé de codage, procédé de décodage, appareil de codage, appareil de décodage et dispositif électronique

Country Status (2)

Country Link
CN (1) CN117692094A (fr)
WO (1) WO2024045672A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113379040A (zh) * 2021-07-07 2021-09-10 东南大学 基于语义编码的混合重传方法
WO2022008311A1 (fr) * 2020-07-10 2022-01-13 Koninklijke Philips N.V. Compression d'informations génomiques par codage arithmétique basé sur un apprentissage machine configurable
CN114448563A (zh) * 2021-12-13 2022-05-06 北京邮电大学 语义编码传输方法及电子设备
CN114885370A (zh) * 2022-04-24 2022-08-09 北京邮电大学 语义通信方法、装置、电子设备及存储介质


Also Published As

Publication number Publication date
CN117692094A (zh) 2024-03-12

Similar Documents

Publication Publication Date Title
Qin et al. Semantic communications: Principles and challenges
Gündüz et al. Beyond transmitting bits: Context, semantics, and task-oriented communications
CN112800247B (zh) 基于知识图谱共享的语义编/解码方法、设备和通信系统
US20220147695A1 (en) Model training method and apparatus, font library establishment method and apparatus, and storage medium
WO2019169996A1 (fr) Procédé et appareil de traitement vidéo, procédé et appareil de récupération vidéo, support d'informations et serveur
US10965948B1 (en) Hierarchical auto-regressive image compression system
CN114445831A (zh) 一种图文预训练方法、装置、设备以及存储介质
US11177823B2 (en) Data compression by local entropy encoding
US20220148239A1 (en) Model training method and apparatus, font library establishment method and apparatus, device and storage medium
US9569285B2 (en) Method and system for message handling
US11538197B2 (en) Channel-wise autoregressive entropy models for image compression
US20230090590A1 (en) Speech recognition and codec method and apparatus, electronic device and storage medium
CN114724168A (zh) 深度学习模型的训练方法、文本识别方法、装置和设备
Getu et al. Making sense of meaning: A survey on metrics for semantic and goal-oriented communication
Lu et al. Semantics-empowered communications: A tutorial-cum-survey
CN111767697B (zh) 文本处理方法、装置、计算机设备以及存储介质
CN113300813A (zh) 基于注意力的针对文本的联合信源信道方法
WO2023179800A1 (fr) Procédé de réception de communication et appareil associé
Huang et al. Joint task and data oriented semantic communications: A deep separate source-channel coding scheme
WO2024045672A1 (fr) Procédé de codage, procédé de décodage, appareil de codage, appareil de décodage et dispositif électronique
US20230154053A1 (en) System and method for scene graph lossless compression by context-based graph convolution
WO2022222767A1 (fr) Procédé et appareil de traitement de données
CN114461816A (zh) 基于知识图谱的信息补充语义通信系统的实现方法
CN116918329A (zh) 一种视频帧的压缩和视频帧的解压缩方法及装置
WO2023138238A1 (fr) Procédé et appareil de transmission d'informations basés sur un réseau commandé par intention, dispositif électronique et support

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23858712

Country of ref document: EP

Kind code of ref document: A1