CN116896651A - Video semantic communication method with adaptive code rate and related device - Google Patents

Video semantic communication method with adaptive code rate and related device

Info

Publication number
CN116896651A
Authority
CN
China
Prior art keywords: semantic, reconstructed, transmitted, code, common
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310641159.0A
Other languages
Chinese (zh)
Inventor
董辰
鲍智成
梁灏泰
许晓东
张平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications filed Critical Beijing University of Posts and Telecommunications
Priority to CN202310641159.0A priority Critical patent/CN116896651A/en
Publication of CN116896651A publication Critical patent/CN116896651A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY · H04 ELECTRIC COMMUNICATION TECHNIQUE · H04N PICTORIAL COMMUNICATION, e.g. TELEVISION · H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/23418 (server side): Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/234309 (server side): Processing of video elementary streams involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo
    • H04N21/44008 (client side): Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/440218 (client side): Processing of video elementary streams involving reformatting operations of video signals for household redistribution, storage or real-time display by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

The disclosure provides a rate-adaptive video semantic communication method and a related device. The method comprises the following steps: a source-channel joint coding step, in which a transmitting end converts a frame set to be transmitted into a semantic feature map; a common feature extraction step, in which the transmitting end converts the semantic feature map of the frame set into a common semantic code and characteristic semantic codes; a dynamic semantic variable-length coding step, in which part of the data of the common semantic code and the characteristic semantic codes is discarded to generate a signal to be transmitted; a transmission step, in which the transmitting end transmits the signal to be transmitted to a receiving end; and a comprehensive decoding step, in which the receiving end decodes the received signal to obtain a reconstructed frame set.

Description

Video semantic communication method with adaptive code rate and related device
Technical Field
The disclosure relates to the technical field of communication, in particular to a video semantic communication method and device with adaptive code rate, electronic equipment and a storage medium.
Background
The existing video communication scheme is based on traditional separate source-channel coding: source coding uses algorithms such as H.264 and H.265/AVS3, and channel coding uses algorithms such as LDPC. Separate source-channel coding has certain advantages at high signal-to-noise ratio and can guarantee accurate reconstruction of the video.
Because source coding algorithms require strict bit-level consistency of the data, once the signal-to-noise ratio drops and bit errors appear in the video stream, the corrupted data cannot be decoded, and repeated retransmission is needed until the code stream is transmitted correctly. Moreover, traditional communication schemes cannot control the bitrate precisely; they can only approximate a specified bitrate, so the whole video code stream fluctuates, and some segments of the stream may exceed the bandwidth allocated by the wireless equipment. Such fluctuation very easily causes video stalling.
Disclosure of Invention
Aiming at the problems in the prior art, a video semantic communication method, a device, equipment and a medium with self-adaptive code rate are provided.
The invention comprises a video semantic communication method with self-adaptive code rate, which comprises the following steps:
a step of joint coding of information source channels, in which a transmitting end converts a frame set to be transmitted into a semantic feature map;
a common feature extraction step, wherein the sending end converts the semantic feature map of the frame set into common semantic codes and characteristic semantic codes;
a dynamic semantic variable length coding step, namely discarding a part of data from the common semantic code and the characteristic semantic code to generate a signal to be transmitted;
a transmission step, wherein the sending end transmits the signal to be transmitted to a receiving end;
and a comprehensive decoding step, wherein the receiving end decodes the received signal to obtain a reconstructed frame set.
The invention also comprises a video semantic communication device with self-adaptive code rate, which is characterized by comprising the following modules:
the information source channel joint coding module is used for converting a frame set to be transmitted into a semantic feature map by a transmitting end;
the common feature extraction module is used for converting the semantic feature map of the frame set into common semantic codes and characteristic semantic codes by the sending end;
the dynamic semantic variable length coding module discards a part of data from the common semantic code and the characteristic semantic code to generate a signal to be transmitted;
the transmission module is used for transmitting the signal to be transmitted to the receiving end by the sending end;
and the comprehensive decoding module is used for decoding the received signals by the receiving end to obtain a reconstructed frame set.
The invention also comprises an electronic device comprising a processor, a communication interface, a memory and a communication bus, wherein the communication bus is used for completing the mutual communication among the processor, the communication interface and the memory;
the memory is used for storing a computer program;
the processor is configured to execute the computer program stored in the memory, so as to implement the video semantic communication method with adaptive code rate according to any one of the above technical schemes.
The invention also comprises a computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and the computer program realizes the video semantic communication method with adaptive code rate according to any one of the technical schemes when being executed by a processor.
The invention also comprises a computer program product which, when executed on a computer, causes the computer to perform the rate adaptive video semantic communication method of any of the above technical solutions.
Drawings
Embodiments of the present invention will now be described more fully with reference to the accompanying drawings. The drawings, however, are for illustration and description only and are not intended as a definition of the limits of the invention.
Fig. 1 is a schematic diagram of a video semantic communication method with adaptive code rate according to an embodiment of the present invention;
fig. 2 is a schematic diagram of a video semantic communication device with adaptive code rate according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a system architecture according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of commonality and feature extraction provided by an embodiment of the present invention;
fig. 5 is a block diagram of an electronic device for implementing a method of an embodiment of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
Frame set: in the MPEG video coding standard, a frame set (Group of Pictures, GOP, also called a picture set or picture group) is a group of consecutive pictures within an MPEG-coded video; each video coded in the MPEG standard consists of consecutive frame sets. The GOP of the present disclosure differs slightly from the GOP in conventional algorithms: a conventional GOP contains reference frames, predicted frames, bi-directional predicted frames and so on, whereas a GOP here simply denotes a 4-dimensional tensor stacked from several video frames.
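The frame-set definition above can be illustrated with a minimal sketch. All dimension values below are placeholder assumptions, not values from the disclosure; a GOP here is just frames stacked into a 4-dimensional array of shape (frames, height, width, channels).

```python
# Illustrative GOP construction: N frames of H x W pixels with C channels,
# stacked into a 4-dimensional tensor (nested lists keep the sketch
# dependency-free; the same placeholder frame is reused for brevity).
N, H, W, C = 4, 2, 2, 3
frame = [[[0.0] * C for _ in range(W)] for _ in range(H)]
gop = [frame for _ in range(N)]  # shape: (N, H, W, C)

assert len(gop) == N
assert len(gop[0]) == H and len(gop[0][0]) == W and len(gop[0][0][0]) == C
```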
Semantic coding: the encoding of the semantic information contained in business information. Depending on the semantic type, it can be subdivided into text semantic coding, image semantic coding, audio semantic coding, video semantic coding, point cloud semantic coding and so on. The possible categories of semantic codes correspond to the possible categories of business information and may cover all types of semantic codes that communication can transmit. The generation of a semantic code also depends on a semantic coding model, which may adopt any model from artificial intelligence, deep learning, pattern recognition and related disciplines.
Joint source channel coding: joint source-channel coding (JSCC) is an important technique in semantic communication; classical JSCC strategies are based on the statistical probability of the source and ignore semantic features.
Transmitting end and receiving end: a channel connects a transmitting end (transmitter), which has an encoder, and a receiving end (receiver), which has a decoder. Since the invention is based on semantic coding and semantic decoding, the transmitting end of the invention has a semantic encoder and the receiving end has a semantic decoder. Both ends work at the semantic layer; the underlying layer is still the Shannon physical layer.
Model: the models of the present disclosure include models from machine learning, artificial intelligence, neural networks and similar disciplines. A model is used to semantically encode and decode the business information, and is accordingly called a semantic coding model or a semantic decoding model. Depending on the business information type, models can be divided into text models, audio models, image models, video models, point cloud models, one-dimensional waveform models, radar data models and so on. The class of the model is related to the class of the semantic code and of the business information, and may be any model type usable in communication transmission.
The invention is further described below with reference to the drawings and specific examples, which are not intended to be limiting.
Example 1
As shown in fig. 1, this embodiment provides a video semantic communication method with adaptive code rate, which is characterized by comprising:
s101, a source channel joint coding step, wherein a transmitting end converts a frame set to be transmitted into a semantic feature map;
step S102 of extracting common features, the sender converts the semantic feature map of the frame set into common semantic codes and characteristic semantic codes;
step S103 of dynamic semantic variable length coding, namely discarding a part of data from the common semantic code and the characteristic semantic code to generate a signal to be transmitted;
a transmission step S104, in which the transmitting end transmits the signal to be transmitted to the receiving end;
and a comprehensive decoding step S105, in which the receiving end decodes the received signal to obtain a reconstructed frame set.
As shown in fig. 3, a system architecture diagram is provided in this embodiment. A frame set X_g of video frames is processed at the transmitting end, and the reconstructed frame set is finally recovered at the receiving end.
At the transmitting end, the frame set X_g is converted into a set Y_g of semantic feature maps by a source-channel joint encoder (JSCC Encoder).
Then a commonality feature extractor (Common Feature Extractor) converts the set Y_g of semantic feature maps into a common semantic vector and characteristic semantic vectors W_g.
Then a dynamic semantic variable-length coder (Variable Length Coding) converts the common semantic vector and the characteristic semantic vectors W_g into a one-dimensional signal S_g to be transmitted. The common semantic vector and the characteristic semantic vectors are input into an entropy model (Entropy Model), which calculates the information entropy of their coded bits; which coded bits are discarded is determined according to the calculated information entropy.
The signal then reaches the receiving end through the transmission step (Wireless Channel).
Because the transmitting end uses an entropy model for variable-length coding, and the coded data loses its original position attributes when flattened into a one-dimensional vector, an entropy model identical to that of the transmitting end is deployed at the receiving end to recover the position attributes of the data during decoding.
The receiving end decodes the received one-dimensional signal in a single pass through a comprehensive decoder (a JSCC Decoder performing joint source-channel decoding) to obtain the reconstructed frame set. The received one-dimensional signal is input into the entropy model to determine the position attributes of the recovered data during decoding. The coded bits at the discarded positions are filled with zero values or with an average value, and the vector is expanded to obtain a reconstructed one-dimensional vector; the average value is calculated by the transmitting end from the data of the discarded coded bits.
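The end-to-end flow just described can be sketched as a pipeline skeleton. Every function body below is a trivial stand-in (the real system uses trained neural encoders and decoders, an entropy model, and a wireless channel); only the data flow X_g to Y_g to (common, W_g) to S_g to reconstructed frame set is illustrated, and all function names are assumptions, not identifiers from the disclosure.

```python
# Skeleton of the transmit/receive pipeline; bodies are placeholders.
def jscc_encode(x_g):                  # source-channel joint encoder
    return list(x_g)                   # Y_g: semantic feature maps

def extract_common(y_g):               # commonality feature extractor
    return y_g[0], y_g                 # (common code, characteristic codes W_g)

def variable_length_code(common, w_g):
    return [common] + w_g              # S_g: one-dimensional signal (no drop here)

def wireless_channel(s_g):
    return s_g                         # ideal, noiseless channel for illustration

def jscc_decode(received):
    return received[1:]                # reconstructed frame set

x_g = [1, 2, 3]
s_g = variable_length_code(*extract_common(jscc_encode(x_g)))
assert jscc_decode(wireless_channel(s_g)) == x_g
```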
The beneficial effects of the source-channel joint coding step S101 include that, during video semantic transmission, code-rate fluctuation is localized within frame sets. The number of frames in each frame set is a positive integer, from a few frames to tens or hundreds of frames, covering at most a few seconds of video, so the local code rate can be better controlled.
Wherein, optionally, the source channel joint coding step converts each image frame in the frame set into a semantic feature map based on a pre-trained source model and a channel model;
the channel model can be replaced according to specific application scenes in the training stage. The channel model can be replaced according to specific application scenes in the whole training stage of the transmitting end encoder, so that the trained information source channel joint encoder can be better adapted to the target scenes.
The beneficial effects of the common feature extraction step S102 include that the compression efficiency of the semantic feature map is greatly improved. Meanwhile, code rate self-adaption can be performed according to channel environment during source channel joint coding.
Wherein, optionally, the step of extracting the common feature converts all semantic feature graphs in the frame set into common semantic codes and characteristic semantic codes based on an embedded convolutional neural network.
For example, an embedded convolutional neural network further computes and compresses all semantic feature maps of a frame set (GOP). Taking a GOP of 3 frames as an example, the 3 original frames generate 1 common semantic code and 3 characteristic semantic codes, and the total amount of data after this computation is reduced.
Assuming the frame set contains N video frames, the input layer of the embedded convolutional neural network accepts the N semantic feature maps Y_g, and the output layer outputs N+1 semantic codes: one common semantic code and N characteristic semantic codes.
In another embodiment, the semantic feature maps Y_g are passed through the embedded convolutional neural network to output the common semantic code, and the characteristic semantic codes are obtained by subtracting the common semantic code from Y_g.
As shown in fig. 4, the embedded convolutional neural network in the commonality feature extractor (Common Feature Extractor) maps two image frames X_g (assuming a frame set of 2 frames) to 1 commonality feature map W_gc and 2 characteristic feature maps W_gi. The commonality feature map represents features common to the two image frames, for example the five-pointed star, hexagon and sun patterns appearing in both. The two characteristic feature maps express the individual information of each image frame, such as differences in position, size, orientation and other attributes of those graphics.
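As a toy illustration of the commonality/characteristic split of fig. 4: the disclosure uses an embedded convolutional neural network, whereas the sketch below simply takes the element-wise mean as the common code and the per-map residual as the characteristic code (both choices are assumptions for illustration only). The split is losslessly invertible: adding a characteristic code back to the common code recovers the original map.

```python
# Toy commonality/characteristic split over flattened feature maps.
# The mean-based common code is an illustrative stand-in for the
# embedded convolutional neural network of the disclosure.
def split_common(feature_maps):
    n = len(feature_maps)
    common = [sum(vals) / n for vals in zip(*feature_maps)]
    characteristics = [[y - c for y, c in zip(fmap, common)]
                       for fmap in feature_maps]
    return common, characteristics  # 1 common + N characteristic codes

def merge(common, characteristic):
    return [c + w for c, w in zip(common, characteristic)]

maps = [[1.0, 2.0], [3.0, 4.0]]
common, chars = split_common(maps)
assert common == [2.0, 3.0]
assert merge(common, chars[0]) == maps[0]  # lossless reconstruction
```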
The common feature extraction step has the beneficial effect that dividing features into commonality and characteristics greatly compresses the semantic feature maps. Each semantic feature map can logically be decomposed into a commonality part and a characteristic part, and all maps in a frame set share a single copy of the common semantic code. Assume the data volume of a semantic feature map is A and the data volume of a characteristic feature map is a, so the common code occupies A - a. The N semantic feature maps originally require N*A of data to transmit; after retaining 1 common part and N characteristic parts, the data volume is N*a + (A - a). The compression saving generally depends on the ratio of a to A; assuming a = 0.5A, the fraction of data saved is (N-1)/(2N).
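The data-volume accounting above can be checked numerically. Writing A for the volume of a semantic feature map and a for the volume of a characteristic code (so the shared common code occupies A - a, matching the subtraction-based construction described earlier), the fraction of data saved with a = 0.5A comes out to (N - 1) / (2N):

```python
# Fraction of data saved by sharing one common code across a frame set:
# original volume N*A versus retained volume N*a + (A - a).
def savings_ratio(N, A, a):
    original = N * A
    retained = N * a + (A - a)
    return (original - retained) / original

# With a = 0.5 * A the saving is (N - 1) / (2 * N): 1/3 for N = 3.
assert abs(savings_ratio(3, 1.0, 0.5) - 1 / 3) < 1e-12
```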
Optionally, the numbers of commonality feature maps and characteristic feature maps are not fixed but are variable parameters; when channel conditions are good, commonality extraction can be skipped and the characteristic features transmitted directly, improving the recovery quality of the video after transmission.
Wherein, optionally, the step of extracting the common features generates a common semantic code and a first number of characteristic semantic codes in each of the frame sets; the first number is the number of image frames in the set of frames.
The dynamic semantic variable-length coding step S103 has the beneficial effect that, when changes in the channel environment or other conditions affect the code rate, there is no need for repeated retransmission until the code stream is transmitted correctly. Instead, when the whole video code stream fluctuates, this step can be combined with the source-channel joint coding step and the common feature extraction step: through the joint action of the channel model, the common semantic coding proportion and the variable-length coding proportion, the invention adapts to the whole video code stream.
Wherein, optionally, the step of dynamic semantic variable length coding comprises:
based on a preset information entropy model, obtaining the information entropy of each coding bit of the common semantic code and the characteristic semantic code;
and the variable-length coder discards part of the coded bits according to the information entropy of each coded bit and a preset threshold, flattens the remaining coded bits into one-dimensional vectors and splices them to generate the signal to be transmitted.
Wherein, optionally, in the dynamic semantic variable length coding step, the random variable of the information entropy model includes at least one of the following: channel conditions, quality of service, latency requirements, bandwidth limitations;
and the information entropy of each coded bit is compared with the preset threshold, and coded bits whose entropy is below the threshold are discarded.
Optionally, in the step of dynamic semantic variable length coding, the discarding a part of data to generate a signal to be transmitted includes:
and discarding the code bits to be discarded to obtain a shortened one-dimensional vector serving as a signal to be transmitted.
Optionally, in the step of dynamic semantic variable length coding, the step of discarding a part of data to generate a signal to be transmitted further includes:
calculating an average value for coding bits to be discarded; the mean value is used for filling the received one-dimensional vector when the receiving end recovers.
By discarding the coded bits with lower information entropy, the useful information in the common and characteristic semantic codes is retained, so that the information in the video transmission is preserved to the greatest extent. The discarded coded bits can be recovered at the receiving end in two ways: zero filling or average filling.
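A minimal sketch of the transmitter side of this step, assuming the entropy model has already produced a per-position entropy score (the scores and threshold below are placeholders): positions scoring below the threshold are dropped, and the mean of the dropped values is recorded for average filling at the receiving end.

```python
# Drop low-entropy positions and record their mean for receiver-side filling.
def variable_length_encode(code, entropies, threshold):
    kept, dropped_positions, dropped_values = [], [], []
    for i, (value, entropy) in enumerate(zip(code, entropies)):
        if entropy >= threshold:
            kept.append(value)
        else:
            dropped_positions.append(i)
            dropped_values.append(value)
    mean = sum(dropped_values) / len(dropped_values) if dropped_values else 0.0
    return kept, dropped_positions, mean

code = [0.9, 0.1, 0.5, 0.2]
entropies = [2.0, 0.3, 1.5, 0.2]   # placeholder entropy scores
kept, positions, mean = variable_length_encode(code, entropies, 1.0)
assert kept == [0.9, 0.5] and positions == [1, 3]
assert abs(mean - 0.15) < 1e-12
```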
Wherein, the step of transmitting S104, optionally, the step of transmitting converts the signal to be transmitted into a floating point number, and then uses analog communication modulation for transmission.
Wherein, optionally, the step of transmitting quantizes the signal to be transmitted into discrete integers, followed by transmission using digital communication modulation.
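For the digital-modulation option just mentioned, a simple uniform quantizer illustrates mapping the floating-point signal to discrete integer levels; the level count and value range below are assumptions, not parameters from the disclosure.

```python
# Uniform quantization of a float signal to integer levels for digital
# modulation, with the matching dequantizer for the receiving end.
def quantize(signal, levels=256, lo=-1.0, hi=1.0):
    step = (hi - lo) / (levels - 1)
    return [round((min(hi, max(lo, s)) - lo) / step) for s in signal]

def dequantize(symbols, levels=256, lo=-1.0, hi=1.0):
    step = (hi - lo) / (levels - 1)
    return [lo + q * step for q in symbols]

assert quantize([-1.0, 1.0]) == [0, 255]
assert dequantize([0]) == [-1.0]
```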
The comprehensive decoding step S105 has the beneficial effect that an asymmetric structure is adopted: more neural networks and algorithms are deployed at the transmitting end, while the receiving end uses only a single source-channel joint decoder. This reduces the computing burden of the receiving end, so that mobile devices such as mobile phones can also carry real-time video semantic transmission.
Optionally, in the comprehensive decoding step, the semantic decoder expands the received signal to obtain a reconstructed one-dimensional vector; restores the reconstructed one-dimensional vector to a multi-dimensional matrix format to obtain a reconstructed common semantic code and reconstructed characteristic semantic codes; fuses the reconstructed common semantic code and the reconstructed characteristic semantic codes to obtain reconstructed semantic feature maps; and obtains the reconstructed frame set based on the reconstructed semantic feature maps.
Optionally, in the step of comprehensive decoding, the expanding the received signal to obtain a reconstructed one-dimensional vector includes:
acquiring the position attribute of the received signal based on a preset information entropy model;
and expanding the received signal based on the position attribute to obtain a reconstructed one-dimensional vector.
Optionally, the expanding the received signal based on the location attribute obtains a reconstructed one-dimensional vector, including:
and zero padding is carried out on the code bits discarded by the corresponding transmitting end in the received signals, and the reconstructed one-dimensional vector is obtained by expansion.
Optionally, the expanding the received signal based on the location attribute obtains a reconstructed one-dimensional vector, including:
filling the average value of the code bits discarded by the corresponding transmitting end in the received signal, and expanding to obtain a reconstructed one-dimensional vector; the average value is calculated by the sending end according to the discarded data of the coding bit.
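A minimal sketch of the expansion described in this step: the shortened vector is expanded back to full length, with zero filling by default or the transmitted mean as the fill value. In the disclosure the dropped positions come from the shared entropy model; here they are passed in directly as an assumption.

```python
# Expand a shortened one-dimensional vector, filling dropped positions
# with zero (default) or with the mean computed by the transmitting end.
def expand(received, dropped_positions, length, fill=0.0):
    out, remaining = [], iter(received)
    for i in range(length):
        out.append(fill if i in dropped_positions else next(remaining))
    return out

assert expand([0.9, 0.5], {1, 3}, 4) == [0.9, 0.0, 0.5, 0.0]               # zero fill
assert expand([0.9, 0.5], {1, 3}, 4, fill=0.15) == [0.9, 0.15, 0.5, 0.15]  # mean fill
```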
The technical solution of this embodiment may be combined with the technical solutions of other embodiments.
Example two
The embodiment provides a video file semantic communication method with self-adaptive code rate, which is characterized by comprising the following steps:
a video dividing step, wherein a transmitting end divides a video file to be transmitted into a second number of frame sets;
s101, a source channel joint coding step, wherein a transmitting end converts a frame set to be transmitted into a semantic feature map;
step S102 of extracting common features, the sender converts the semantic feature map of the frame set into common semantic codes and characteristic semantic codes;
step S103 of dynamic semantic variable length coding, namely discarding a part of data from the common semantic code and the characteristic semantic code to generate a signal to be transmitted;
a transmission step S104, in which the transmitting end transmits the signal to be transmitted to the receiving end;
a comprehensive decoding step S105, in which the receiving end decodes the received signal to obtain a reconstructed frame set;
and a video recovery step, wherein the receiving end acquires the reconstructed video file according to the second number of reconstructed frame sets.
For an extension of each step of this embodiment, refer to embodiment one.
The technical solution of this embodiment may be combined with the technical solutions of other embodiments.
Example III
The embodiment provides a stream media video semantic communication method with self-adaptive code rate, which is characterized by comprising the following steps:
a frame set acquisition step, in which the transmitting end acquires a plurality of video frames from the streaming media video to be transmitted and combines them into a frame set;
s101, a source channel joint coding step, wherein a transmitting end converts a frame set to be transmitted into a semantic feature map;
step S102 of extracting common features, the sender converts the semantic feature map of the frame set into common semantic codes and characteristic semantic codes;
step S103 of dynamic semantic variable length coding, namely discarding a part of data from the common semantic code and the characteristic semantic code to generate a signal to be transmitted;
a transmission step S104, in which the transmitting end transmits the signal to be transmitted to the receiving end;
a comprehensive decoding step S105, in which the receiving end decodes the received signal to obtain a reconstructed frame set;
and a streaming media recovery step, wherein the receiving end adds the reconstructed frame set into streaming media video.
For an extension of each step of this embodiment, refer to embodiment one.
The technical solution of this embodiment may be combined with the technical solutions of other embodiments.
Example IV
As shown in fig. 2, this embodiment provides a video semantic communication device with adaptive code rate, characterized by comprising:
a source channel joint coding module 201, by which the transmitting end converts the frame set to be transmitted into a semantic feature map;
a common feature extraction module 202, by which the transmitting end converts the semantic feature map of the frame set into a common semantic code and characteristic semantic codes;
a dynamic semantic variable-length coding module 203, which discards a part of the data from the common semantic code and the characteristic semantic codes to generate a signal to be transmitted;
a transmission module 204, by which the transmitting end transmits the signal to be transmitted to the receiving end;
and a comprehensive decoding module 205, by which the receiving end decodes the received signal to obtain a reconstructed frame set.
For extensions of the modules in this embodiment, refer to embodiment one; the modules correspond one-to-one to the method steps of embodiment one.
The technical solution of this embodiment may be combined with the technical solutions of other embodiments.
Example five
This embodiment provides a video semantic communication device with adaptive code rate, characterized by comprising:
a video dividing module, by which the transmitting end divides the video file to be transmitted into a second number of frame sets;
a source channel joint coding module 201, by which the transmitting end converts the frame set to be transmitted into a semantic feature map;
a common feature extraction module 202, by which the transmitting end converts the semantic feature map of the frame set into a common semantic code and characteristic semantic codes;
a dynamic semantic variable-length coding module 203, which discards a part of the data from the common semantic code and the characteristic semantic codes to generate a signal to be transmitted;
a transmission module 204, by which the transmitting end transmits the signal to be transmitted to the receiving end;
a comprehensive decoding module 205, by which the receiving end decodes the received signal to obtain a reconstructed frame set;
and a video recovery module, by which the receiving end obtains the reconstructed video file from the second number of reconstructed frame sets.
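The video dividing and video recovery pair can be sketched as a simple split-and-reassemble, assuming frames are kept in order. The helper names and the ceiling-division split are illustrative assumptions, not the patented scheme.

```python
# Sketch of the video dividing / video recovery modules: the transmitting
# end splits the file into a fixed ("second") number of frame sets, and
# the receiving end concatenates the reconstructed sets in order.

def divide_video(frames, num_sets):
    """Video dividing: split the frame list into num_sets frame sets."""
    size = -(-len(frames) // num_sets)  # ceiling division
    return [frames[i:i + size] for i in range(0, len(frames), size)]

def recover_video(frame_sets):
    """Video recovery: reassemble the file from reconstructed frame sets."""
    return [frame for fs in frame_sets for frame in fs]

frames = list(range(9))
sets = divide_video(frames, 3)          # second number = 3
assert len(sets) == 3
assert recover_video(sets) == frames
```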
For extensions of the modules in this embodiment, refer to embodiment two; the modules correspond one-to-one to the method steps there.
The technical solution of this embodiment may be combined with the technical solutions of other embodiments.
Example six
This embodiment provides a video semantic communication device with adaptive code rate, characterized by comprising:
a frame set acquisition module, by which the transmitting end acquires a plurality of video frames from the streaming video to be transmitted and combines them into a frame set;
a source channel joint coding module 201, by which the transmitting end converts the frame set to be transmitted into a semantic feature map;
a common feature extraction module 202, by which the transmitting end converts the semantic feature map of the frame set into a common semantic code and characteristic semantic codes;
a dynamic semantic variable-length coding module 203, which discards a part of the data from the common semantic code and the characteristic semantic codes to generate a signal to be transmitted;
a transmission module 204, by which the transmitting end transmits the signal to be transmitted to the receiving end;
a comprehensive decoding module 205, by which the receiving end decodes the received signal to obtain a reconstructed frame set;
and a streaming media recovery module, by which the receiving end appends the reconstructed frame set to the streaming video.
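The common feature extraction and its inverse fusion at the decoder (claims 3 and 11) can be illustrated with a toy decomposition. This is an assumption-laden sketch: the common code here is simply the element-wise mean of the feature maps and each characteristic code is a per-frame residual, whereas the patent uses an embedded convolutional neural network.

```python
# Toy common/characteristic decomposition and its inverse fusion.
# One common code plus one characteristic code per frame in the set.

def extract_common(feature_maps):
    """Common feature extraction: common code = element-wise mean,
    characteristic code = per-frame residual from the common code."""
    n = len(feature_maps)
    common = [sum(col) / n for col in zip(*feature_maps)]
    characteristic = [[v - c for v, c in zip(fm, common)]
                      for fm in feature_maps]
    return common, characteristic

def fuse(common, characteristic):
    """Comprehensive decoding side: fuse codes back into feature maps."""
    return [[c + r for c, r in zip(common, res)] for res in characteristic]

maps = [[1.0, 2.0], [3.0, 4.0]]  # two frames, two features each
common, chars = extract_common(maps)
assert common == [2.0, 3.0]
assert fuse(common, chars) == maps
```

The point of the decomposition is that the common code is transmitted once per frame set while only the (typically smaller) residuals vary per frame, which is what makes the subsequent variable-length coding effective.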
For extensions of the modules in this embodiment, refer to embodiment three; the modules correspond one-to-one to the method steps there.
The technical solution of this embodiment may be combined with the technical solutions of other embodiments.
Example seven
Embodiments of the present invention further provide an electronic device, a readable storage medium, and a computer program product.
The invention also provides an electronic device comprising a processor, a communication interface, a memory, and a communication bus, where the communication bus enables the processor, the communication interface, and the memory to communicate with one another;
the memory is used for storing a computer program;
the processor is configured to execute the computer program stored in the memory, so as to implement the video semantic communication method with adaptive code rate according to any one of the above technical schemes.
The invention also comprises a computer readable storage medium, wherein a computer program is stored in the computer readable storage medium, and the computer program realizes the video semantic communication method with adaptive code rate according to any one of the technical schemes when being executed by a processor.
The invention also comprises a computer program product which, when executed on a computer, causes the computer to perform the rate adaptive video semantic communication method of any of the above technical solutions.
Fig. 5 shows a schematic block diagram of an example electronic device 500 that may be used to implement embodiments of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 5, the apparatus 500 includes a computing unit 501 that can perform various suitable actions and processes according to a computer program stored in a read-only memory (ROM) 502 or a computer program loaded from a storage unit 508 into a random access memory (RAM) 503. In the RAM 503, various programs and data required for the operation of the device 500 can also be stored. The computing unit 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
Various components in the device 500 are connected to the I/O interface 505, including: an input unit 506 such as a keyboard, a mouse, etc.; an output unit 507 such as various types of displays, speakers, and the like; a storage unit 508 such as a magnetic disk, an optical disk, or the like; and a communication unit 509 such as a network card, modem, wireless communication transceiver, etc. The communication unit 509 allows the device 500 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 501 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 501 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various specialized artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 501 performs the various methods and processes described above, such as the video semantic communication method with adaptive code rate. For example, in some embodiments, the method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 508. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 500 via the ROM 502 and/or the communication unit 509. When the computer program is loaded into the RAM 503 and executed by the computing unit 501, one or more steps of the method described above may be performed. Alternatively, in other embodiments, the computing unit 501 may be configured to perform the method by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuit systems, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs, which may be executed and/or interpreted on a programmable system including at least one programmable processor; the programmable processor may be special purpose or general purpose, and may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present invention may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server incorporating a blockchain.
It should be appreciated that in the various forms of flow shown above, steps may be reordered, added, or deleted. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solution disclosed in the present invention can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (20)

1. A video semantic communication method with adaptive code rate, characterized by comprising the following steps:
a source channel joint coding step, wherein a transmitting end converts a frame set to be transmitted into a semantic feature map;
a common feature extraction step, wherein the transmitting end converts the semantic feature map of the frame set into a common semantic code and characteristic semantic codes;
a dynamic semantic variable-length coding step, wherein a part of the data is discarded from the common semantic code and the characteristic semantic codes to generate a signal to be transmitted;
a transmission step, wherein the transmitting end transmits the signal to be transmitted to a receiving end;
and a comprehensive decoding step, wherein the receiving end decodes the received signal to obtain a reconstructed frame set.
2. The method of claim 1, wherein the source channel joint coding step converts each image frame in the frame set into a semantic feature map based on a pre-trained source model and channel model;
the channel model can be replaced in the training stage according to the specific application scenario.
3. The method of claim 1, wherein the common feature extraction step converts all semantic feature maps in the frame set into a common semantic code and characteristic semantic codes based on an embedded convolutional neural network.
4. The method according to claim 3, wherein the common feature extraction step generates one common semantic code and a first number of characteristic semantic codes for each frame set; the first number is the number of image frames in the frame set.
5. The method of claim 1, wherein the dynamic semantic variable-length coding step comprises:
obtaining, based on a preset information entropy model, the information entropy of each coded bit of the common semantic code and the characteristic semantic codes;
and discarding, by a variable-length coder, a part of the coded bits according to the information entropy of each coded bit and a preset threshold, flattening the remaining coded bits into one-dimensional vectors, and splicing them to generate the signal to be transmitted.
6. The method of claim 5, wherein in the dynamic semantic variable-length coding step, the random variables of the information entropy model include at least one of: channel conditions, quality of service, latency requirements, and bandwidth limitations;
the information entropy of each coded bit is compared with the preset threshold, and coded bits whose information entropy is below the preset threshold are discarded.
7. The method of claim 6, wherein in the step of dynamic semantic variable length coding, the discarding a portion of the data to generate a signal to be transmitted comprises:
and discarding the code bits to be discarded to obtain a shortened one-dimensional vector serving as a signal to be transmitted.
8. The method of claim 7, wherein in the dynamic semantic variable-length coding step, the discarding a portion of the data to generate a signal to be transmitted further comprises:
calculating the mean value of the coded bits to be discarded; the mean value is used to fill the received one-dimensional vector when the receiving end performs recovery.
9. The method of claim 1, wherein the transmitting step converts the signal to be transmitted into floating point numbers, which are then modulated and transmitted using analog communication.
10. The method of claim 1, wherein the transmitting step quantizes the signal to be transmitted into discrete integers, which are then modulated and transmitted using digital communication.
11. The method of claim 1, wherein in the comprehensive decoding step, a semantic decoder expands the received signal to obtain a reconstructed one-dimensional vector, restores the reconstructed one-dimensional vector to a multi-dimensional matrix format to obtain a reconstructed common semantic code and reconstructed characteristic semantic codes, fuses the reconstructed common semantic code and the reconstructed characteristic semantic codes to obtain a reconstructed semantic feature map, and obtains a reconstructed frame set based on the reconstructed semantic feature map.
12. The method of claim 11, wherein in the comprehensive decoding step, the expanding the received signal to obtain a reconstructed one-dimensional vector comprises:
acquiring the position attribute of the received signal based on a preset information entropy model;
and expanding the received signal based on the position attribute to obtain a reconstructed one-dimensional vector.
13. The method of claim 12, wherein the expanding the received signal based on the location attribute results in a reconstructed one-dimensional vector, comprising:
zero-padding, in the received signal, the positions of the code bits discarded at the transmitting end, and expanding to obtain the reconstructed one-dimensional vector.
14. The method of claim 12, wherein the expanding the received signal based on the location attribute results in a reconstructed one-dimensional vector, comprising:
filling, in the received signal, the positions of the code bits discarded at the transmitting end with the mean value, and expanding to obtain the reconstructed one-dimensional vector; the mean value is calculated by the transmitting end from the discarded coded bits.
15. The method of claim 1, applied to video file transmission, further comprising, before the source channel joint coding step:
a video dividing step, wherein the transmitting end divides the video file to be transmitted into a second number of frame sets;
and further comprising, after the comprehensive decoding step:
a video recovery step, wherein the receiving end obtains the reconstructed video file from the second number of reconstructed frame sets.
16. The method of claim 1, applied to streaming media video transmission, further comprising, before the source channel joint coding step:
a frame set acquisition step, wherein the transmitting end acquires a plurality of video frames from the streaming video to be transmitted and combines them into a frame set;
and further comprising, after the comprehensive decoding step:
a streaming media recovery step, wherein the receiving end appends the reconstructed frame set to the streaming video.
17. A video semantic communication device with adaptive code rate, characterized by comprising:
a source channel joint coding module, by which a transmitting end converts a frame set to be transmitted into a semantic feature map;
a common feature extraction module, by which the transmitting end converts the semantic feature map of the frame set into a common semantic code and characteristic semantic codes;
a dynamic semantic variable-length coding module, which discards a part of the data from the common semantic code and the characteristic semantic codes to generate a signal to be transmitted;
a transmission module, by which the transmitting end transmits the signal to be transmitted to a receiving end;
and a comprehensive decoding module, by which the receiving end decodes the received signal to obtain a reconstructed frame set.
18. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-15.
19. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-15.
20. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any of claims 1-15.
CN202310641159.0A 2023-06-01 2023-06-01 Video semantic communication method with adaptive code rate and related device Pending CN116896651A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310641159.0A CN116896651A (en) 2023-06-01 2023-06-01 Video semantic communication method with adaptive code rate and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310641159.0A CN116896651A (en) 2023-06-01 2023-06-01 Video semantic communication method with adaptive code rate and related device

Publications (1)

Publication Number Publication Date
CN116896651A true CN116896651A (en) 2023-10-17

Family

ID=88311380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310641159.0A Pending CN116896651A (en) 2023-06-01 2023-06-01 Video semantic communication method with adaptive code rate and related device

Country Status (1)

Country Link
CN (1) CN116896651A (en)

Similar Documents

Publication Publication Date Title
US20200145692A1 (en) Video processing method and apparatus
CN109600618B (en) Video compression method, decompression method, device, terminal and medium
US9454552B2 (en) Entropy coding and decoding using polar codes
CN107481295B (en) Image compression system of convolutional neural network based on dynamic byte length distribution
CN109903351B (en) Image compression method based on combination of convolutional neural network and traditional coding
WO2021031877A1 (en) Methods and apparatus for image coding and decoding, and chip
CN113538287B (en) Video enhancement network training method, video enhancement method and related devices
CN113676730B (en) Video coding method and device, electronic equipment and storage medium
Bedruz et al. Comparison of Huffman Algorithm and Lempel-Ziv Algorithm for audio, image and text compression
Mahmud An improved data compression method for general data
CN111918071A (en) Data compression method, device, equipment and storage medium
CN102687509B (en) Use the scalable compression of JPEG-LS
WO2023179800A1 (en) Communication receiving method and apparatus thereof
CN114614829A (en) Satellite data frame processing method and device, electronic equipment and readable storage medium
CN116896651A (en) Video semantic communication method with adaptive code rate and related device
CN107205151B (en) Coding and decoding device and method based on mixed distortion measurement criterion
CN111405293A (en) Video transmission method and device
Shin et al. RL-SPIHT: reinforcement learning-based adaptive selection of compression ratios for 1-D SPIHT algorithm
CN115361556A (en) High-efficiency video compression algorithm based on self-adaption and system thereof
CN104113394B (en) The compression of communication modulation signal and decompressing method
CN105306941B (en) A kind of method for video coding
CN113554719A (en) Image encoding method, decoding method, storage medium and terminal equipment
WO2024007843A9 (en) Encoding method and apparatus, decoding method and apparatus, and computer device
CN114827289B (en) Communication compression method, system, electronic device and storage medium
CN110753241B (en) Image coding and decoding method and system based on multiple description networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination