CN113315972B - Video semantic communication method and system based on hierarchical knowledge expression - Google Patents

Publication number
CN113315972B
Authority
CN
China
Prior art keywords
semantic
video
network
knowledge base
level
Legal status
Active
Application number
CN202110543408.3A
Other languages
Chinese (zh)
Other versions
CN113315972A (en)
Inventor
石光明
高大化
杨旻曦
张中强
董宇波
谢雪梅
刘丹华
Current Assignee
Xidian University
Original Assignee
Xidian University
Application filed by Xidian University
Priority to CN202110543408.3A
Publication of CN113315972A
Application granted
Publication of CN113315972B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/169: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the coding unit, i.e. the structural portion or semantic portion of the video signal being the object or the subject of the adaptive coding
    • H04N19/90: Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using coding techniques not provided for in groups H04N19/10-H04N19/85, e.g. fractals
    • H04N19/91: Entropy coding, e.g. variable length coding [VLC] or arithmetic coding
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23: Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H04N21/23418: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44008: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Abstract

The invention provides a video semantic communication method based on hierarchical knowledge expression, which mainly addresses the incomplete semantic extraction, insufficient semantic representation capability and redundant semantic description of the prior art. The implementation is as follows: construct a hierarchical knowledge base consisting of multi-level signal perception, semantic abstraction, semantic reconstruction and video (signal) reconstruction networks; collect the video signal to be transmitted; extract the structured semantic features of the video signal with the signal perception network and the semantic abstraction networks in the hierarchical knowledge base, and transmit these features through an ultra-narrow band channel; and reconstruct the video signal from the received structured semantic features with the semantic reconstruction networks and the signal reconstruction network in the hierarchical knowledge base. By mining semantic features at different scales and representing semantics with a structured data structure, the invention improves the completeness of semantic extraction, the semantic representation capability and the utilization of communication bandwidth. It can be used for online conferences, human-computer interaction and the intelligent Internet of Things.

Description

Video semantic communication method and system based on hierarchical knowledge expression
Technical Field
The invention belongs to the technical field of video communication, and particularly relates to a video semantic communication method and system based on hierarchical knowledge expression, which can be used for online conferences, human-computer interaction and the intelligent Internet of Things.
Background
Video communication is a communication service that delivers video information. Current video communication technology conveys video information by transmitting the complete video signal, and judges communication quality by the integrity of the transmitted signal. With the rapid increase in video resolution and the geometric growth in the number of communication terminals, communication bandwidth, whose growth is approaching its limit, can no longer meet the demands of video communication in the intelligent era, such as large online interactive conferences and the intelligent Internet of Things.
In conventional video communication, the video signal is compressed and encoded with a signal compression algorithm such as the wavelet transform, the resulting code is transmitted over a channel, and the video signal is finally reconstructed at the receiving end. However, because of the limits of the compression algorithm, the compressed bit rate still cannot meet the video communication requirements of the intelligent era. For example, transmitting a single 4K video stream at 30 frames per second with the latest video coding standard, H.265, requires a bandwidth of 40 Mbps, so a 5G terminal cannot transmit the dozens of simultaneous video streams needed for interaction in a large online conference.
In deep-learning-based video communication, a video encoder built on a deep network converts the video signal into feature vectors, the feature vectors are transmitted over a channel, and a video decoder based on a generative adversarial network restores the video signal from the received feature vectors at the receiving end. Compared with conventional methods, a neural network can produce feature vectors of arbitrary length, so deep-learning-based video communication can reach a high compression ratio. However, training a deep network requires large amounts of data and time, and a trained network can only be used in a specific scene: when the scene changes, a new data set must be built and the network retrained, which makes the approach inflexible. In addition, because the extracted features are not targeted, key details are easily lost at high compression ratios, distorting the generated video.
In many application scenarios, video communication does not need to transmit the complete video signal, only the semantic information the signal represents. For example, in an interactive video conference the two parties need the meaning conveyed by facial expressions and body movements, but not information such as the other party's surroundings or clothing texture. A video semantic communication method that extracts and transmits the semantics expressed by the video signal in the target scene, and reproduces the video signal from those semantics, can therefore save communication bandwidth and meet the video communication requirements of the intelligent era. For example, the patent application with publication number CN111246176A, entitled "A segmented video transmission method", discloses a video transmission method based on text semantics. The sender first identifies and extracts a foreground target in the video signal and uses a textual description of the feature-point coordinates of that target as the text semantics; the text semantics are transmitted to the receiver, which reconstructs the target contour from them; finally, a trained generative network reconstructs the video signal from the target contour. By extracting, transmitting and reconstructing the semantics in the video signal, this method improves compression efficiency and conveys the semantic information of the video signal, but it has the following two shortcomings:
first, because the transmitted semantics are the contours of foreground targets, only coordinates can be transmitted; information such as target color and texture cannot be conveyed, so semantic extraction is incomplete and the application scenarios are limited;
second, the method represents semantics with unstructured data such as text, whose representation capability is insufficient: the types of semantics that can be transmitted are limited, their expression is imprecise, and the unstructured description contains redundancy that wastes communication bandwidth.
Disclosure of Invention
The invention aims to provide a video semantic communication method and system based on hierarchical knowledge expression, which are used for solving the problems of incomplete semantic extraction, insufficient semantic representation capability and redundant semantic description in the prior art, expanding application scenes and avoiding waste of communication bandwidth.
In order to achieve the purpose, the video semantic communication method based on hierarchical knowledge expression comprises the following steps:
1) constructing a hierarchical knowledge base K:
1a) establishing a semantic perception knowledge base $K_0$ for storing the signal perception network $W_e^0$, which extracts the primary structured semantic features $G_0$ from the video, and the signal reconstruction network, which reconstructs the video from the primary structured semantic features $G_0$;
1b) establishing L semantic abstraction knowledge bases $K_l$ of increasing level, each storing a semantic abstraction network, which generates the higher-level structured semantic features $G_l$ from the lower-level structured semantic features $G_{l-1}$, and a semantic reconstruction network, which reconstructs the lower-level structured semantic features $G_{l-1}$ from the higher-level structured semantic features $G_l$, where $L \ge 1$ is the number of semantic levels, $l$ is the index of the semantic level, and $1 \le l \le L$;
1c) arranging the semantic perception knowledge base $K_0$ and the L semantic abstraction knowledge bases $K_l$ in level order to form the hierarchical knowledge base $K$;
2) collecting the original video $V$ of $F$ frames to be transmitted, where $F \ge 1$;
3) extracting the semantic features of the original video $V$ with the signal perception network $W_e^0$ and the semantic abstraction networks in the hierarchical knowledge base $K$, to obtain the top-level structured semantic features $G_L$ corresponding to $V$;
4) setting up an ultra-narrow band channel with bandwidth $Q \le 4$ kbps and transmitting over it the top-level structured semantic features $G_L$ corresponding to the original video $V$;
5) restoring the received top-level structured semantic features $G_L$ with the signal reconstruction network and the semantic reconstruction networks in the hierarchical knowledge base $K$, to obtain the reconstructed video $V'$.
In order to achieve the above object, the video semantic communication system based on hierarchical knowledge expression of the present invention includes:
the video acquisition device is used for acquiring an original video;
the semantic encoder is connected with the video acquisition device and used for performing semantic encoding on the original video to obtain semantic features of the original video;
the ultra-narrow band communication device is connected with the semantic encoder and is used for transmitting the semantic features of the video over an ultra-narrow band channel;
the semantic decoder is connected with the ultra-narrow band communication device and is used for reconstructing semantic features to obtain a reconstructed video;
the video display device is connected with the semantic decoder and is used for displaying the reconstructed video;
the system is characterized in that:
the knowledge base query port of the semantic encoder is connected to a source-level knowledge base storing the semantic extraction networks, which the encoder uses to encode the original video level by level into the corresponding structured semantic features;
the knowledge base query port of the semantic decoder is connected to a sink-level knowledge base storing the video reconstruction networks, which the decoder uses to reconstruct the structured semantic features level by level into the reconstructed video.
Compared with the prior art, the invention has the following advantages:
first, when extracting video semantics, the invention extracts semantics level by level based on the hierarchical knowledge base, fully mining semantic features of different scales in the video signal; this avoids the single type of semantic feature used in the prior art and effectively improves the completeness of semantic extraction;
second, when representing video semantics, the invention uses a more flexible structured semantic representation that efficiently expresses the semantic objects in the video signal and the interactions between them through the relations within the structure; this avoids the imprecise and redundant descriptions caused by representing semantics as text in the prior art, and effectively improves both semantic representation capability and communication bandwidth utilization.
Drawings
FIG. 1 is a flow chart of an implementation of a video semantic communication method based on hierarchical knowledge representation according to the present invention;
fig. 2 is a schematic structural diagram of a video semantic communication system based on hierarchical knowledge expression according to the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the specific embodiments.
This embodiment targets the video semantic communication needs of online interactive conferences, and implements the video semantic communication method and system based on hierarchical knowledge expression for single-person body actions and multi-person interactions in a conference scene.
Referring to fig. 1, the video semantic communication method based on hierarchical knowledge expression of the present embodiment includes the following steps:
step 1, constructing a hierarchical knowledge base K.
1.1) establishing a semantic perception knowledge base $K_0$ for storing the signal perception network $W_e^0$, which extracts the primary structured semantic features $G_0$ from the video, and the signal reconstruction network, which reconstructs the video from $G_0$.
In this example, both single-person body actions and multi-person interactions in the scene semantics can be described by semantic objects, such as joint points and persons, and by relations between semantic objects, such as bones and person-to-person interactions. To represent the semantic objects and their relations effectively, the primary structured semantic features $G_0$, the signal perception network $W_e^0$ and the signal reconstruction network are structured as follows:
the primary structured semantic features $G_0$ are represented by a semantic graph of nodes and edges, comprising a primary node set $A_0$ and a primary edge set $B_0$, wherein:
the primary node set $A_0$ consists of primary nodes comprising 14 human joint points: head, neck, left shoulder, right shoulder, left elbow, right elbow, left hand, right hand, left hip, right hip, left knee, right knee, left foot and right foot. $i_0$ indexes the nodes of the primary node set; the $i_0$-th node consists of its semantic class, its semantic feature vector and its child node set, and the semantic feature vector is a two-dimensional vector formed by the planar coordinates of the joint point;
the primary edge set $B_0$ consists of primary edges comprising the bones, such as those of the arms and thighs, that connect the joint points. $j_0$ indexes the edges of the primary edge set; the $j_0$-th edge consists of its semantic class and its semantic feature vector, and the semantic feature vector is a one-dimensional vector formed by a scalar reflecting the degree of occlusion;
the signal perception network $W_e^0$: given the person actions and interactions in a conference scene, a trained human pose detection network is selected as the signal perception network $W_e^0$ that extracts the primary structured semantic features $G_0$ from the video;
the signal reconstruction network: given the person actions and interactions in a conference scene, a trained semantic-graph-based video image generation network is selected as the signal reconstruction network that reconstructs the video from the primary structured semantic features $G_0$.
Commonly used human pose detection networks include OpenPose and AlphaPose, and commonly used semantic-graph-based video image generation networks include graph2image and SPADE. This example preferably, but not exclusively, uses AlphaPose as $W_e^0$ and graph2image as the signal reconstruction network.
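The primary semantic graph $G_0$ described in 1.1) maps naturally onto a few plain data classes. The sketch below is illustrative only: the class and function names are hypothetical, and because the text names arm and thigh bones but gives no complete edge list, the 13-bone skeleton and the `endpoints` field on each edge are assumptions.

```python
from dataclasses import dataclass, field

# The 14 joint points listed for the primary node set A_0.
JOINTS = [
    "head", "neck", "left_shoulder", "right_shoulder", "left_elbow",
    "right_elbow", "left_hand", "right_hand", "left_hip", "right_hip",
    "left_knee", "right_knee", "left_foot", "right_foot",
]

# Hypothetical skeleton: an assumed bone list, not one given in the text.
BONES = [
    ("head", "neck"),
    ("neck", "left_shoulder"), ("left_shoulder", "left_elbow"), ("left_elbow", "left_hand"),
    ("neck", "right_shoulder"), ("right_shoulder", "right_elbow"), ("right_elbow", "right_hand"),
    ("neck", "left_hip"), ("left_hip", "left_knee"), ("left_knee", "left_foot"),
    ("neck", "right_hip"), ("right_hip", "right_knee"), ("right_knee", "right_foot"),
]


@dataclass
class Node:
    semantic_class: str                                 # e.g. "left_elbow"
    feature: list[float]                                # 2-D planar coordinates (x, y)
    children: list[int] = field(default_factory=list)   # empty at the joint level


@dataclass
class Edge:
    semantic_class: str          # e.g. "bone"
    feature: list[float]         # 1-D vector: degree of occlusion
    endpoints: tuple[int, int]   # assumption: indices of the two joints it connects


@dataclass
class SemanticGraph:
    nodes: list[Node]
    edges: list[Edge]


def build_primary_graph(coords: dict[str, tuple[float, float]],
                        occlusion: dict[tuple[str, str], float]) -> SemanticGraph:
    """Build a single-person G_0 from joint coordinates and per-bone occlusion."""
    index = {name: i for i, name in enumerate(JOINTS)}
    nodes = [Node(name, list(coords[name])) for name in JOINTS]
    edges = [
        # Bones with no occlusion entry default to 0.0 (fully visible).
        Edge("bone", [occlusion.get(bone, 0.0)], (index[bone[0]], index[bone[1]]))
        for bone in BONES
    ]
    return SemanticGraph(nodes, edges)
```

In this sketch, a pose detector such as the one chosen for $W_e^0$ would supply `coords`; a single person yields 14 nodes and, under the assumed skeleton, 13 edges.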
1.2) establishing L semantic abstraction knowledge bases $K_l$ of increasing level, each storing a semantic abstraction network, which generates the higher-level structured semantic features $G_l$ from the lower-level structured semantic features $G_{l-1}$, and a semantic reconstruction network, which reconstructs $G_{l-1}$ from $G_l$, where $L \ge 1$ is the number of semantic levels, $l$ is the index of the semantic level, and $1 \le l \le L$;
in this example, person actions and interactions in a conference scene involve three levels of semantics, from low to high: the joint level, the single-person pose level and the multi-person interaction level. A hierarchical semantic graph composed of several semantic graphs is therefore used as the structured representation of the multi-level semantics. The high-level structured semantic features $G_l$, the semantic abstraction network and the semantic reconstruction network are structured as follows:
the high-level structured semantic features $G_l$ are represented by a semantic graph of nodes and edges, comprising a high-level node set $A_l$ and a high-level edge set $B_l$, wherein:
in the high-level node set $A_l$, $i_l$ indexes the nodes, and the $i_l$-th node consists of its semantic class, its semantic feature vector and its child node set;
in the high-level edge set $B_l$, $j_l$ indexes the edges, and the $j_l$-th edge consists of its semantic class and its semantic feature vector;
the semantic abstraction network: for feature extraction from graph-structured data, a trained downsampling network based on a graph convolutional neural network is selected as the semantic abstraction network that generates the high-level structured semantic features $G_l$ from the lower-level structured semantic features $G_{l-1}$;
the semantic reconstruction network: for feature restoration of graph-structured data, a trained upsampling network based on a graph convolutional neural network is selected as the semantic reconstruction network that reconstructs the lower-level structured semantic features $G_{l-1}$ from the higher-level structured semantic features $G_l$.
Commonly used graph neural networks include GCN, GraphSAGE and GAT; this example preferably, but not exclusively, uses a GAT-based downsampling network as the semantic abstraction network and a GAT-based upsampling network as the semantic reconstruction network.
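The text names GAT-based downsampling and upsampling networks but does not give their architecture. The following NumPy sketch is therefore an assumption throughout: it uses a single-head graph attention layer in the standard GAT formulation, mean-pools each child node set for downsampling, and copies each parent's features back to its children for upsampling. Random weights stand in for trained ones, and all function names are hypothetical.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def gat_layer(h, adj, W, a):
    """Single-head graph attention layer.

    h: (N, F_in) node features; adj: (N, N) 0/1 adjacency including self-loops;
    W: (F_in, F_out) shared projection; a: (2 * F_out,) attention vector.
    """
    z = h @ W                                    # projected features W h_i
    f_out = z.shape[1]
    # e[i, j] = LeakyReLU(a^T [W h_i || W h_j]), computed via the two halves of a
    e = leaky_relu((z @ a[:f_out])[:, None] + (z @ a[f_out:])[None, :])
    e = np.where(adj > 0, e, -np.inf)            # attend only to neighbours
    e = e - e.max(axis=1, keepdims=True)         # numerically stable softmax
    alpha = np.exp(e)
    alpha = alpha / alpha.sum(axis=1, keepdims=True)
    return alpha @ z, alpha


def assignment_matrix(child_sets, n_low):
    """S[i, m] = 1 if low-level node i belongs to the child set of high-level node m."""
    S = np.zeros((n_low, len(child_sets)))
    for m, children in enumerate(child_sets):
        S[list(children), m] = 1.0
    return S


def downsample(h, adj, child_sets, W, a):
    """G_{l-1} -> G_l: attend over the low-level graph, then mean-pool each child set."""
    S = assignment_matrix(child_sets, h.shape[0])
    z, _ = gat_layer(h, adj, W, a)
    h_high = (S.T @ z) / S.sum(axis=0)[:, None]
    adj_high = ((S.T @ adj @ S) > 0).astype(float)   # parents linked if any children are
    return h_high, adj_high


def upsample(h_high, adj_low, child_sets, W, a):
    """G_l -> G_{l-1}: copy each parent's features to its children, then attend."""
    S = assignment_matrix(child_sets, adj_low.shape[0])
    z, _ = gat_layer(S @ h_high, adj_low, W, a)
    return z
```

For two people with 14 joints each, `downsample` with child sets `range(0, 14)` and `range(14, 28)` yields two person-level nodes. If the joint graphs are not connected to each other, the coarse adjacency is just the identity. A trained abstraction network would learn richer pooling than a fixed mean.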
1.3) arranging the semantic perception knowledge base $K_0$ and the L semantic abstraction knowledge bases $K_l$ in level order to form the hierarchical knowledge base $K$.
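The hierarchical semantic graph of this example (joint level, single-person pose level, multi-person interaction level) links levels through each node's child node set. The sketch below assumes a hypothetical two-person scene and a hypothetical `validate_hierarchy` helper. It checks that every child index points to a node exactly one level down, and it assumes that edges connect nodes within the same level.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    semantic_class: str
    feature: list[float]
    children: list[int] = field(default_factory=list)  # indices into the level below


@dataclass
class Edge:
    semantic_class: str
    feature: list[float]
    endpoints: tuple[int, int]   # assumption: node indices within the same level


@dataclass
class SemanticGraph:
    nodes: list[Node]
    edges: list[Edge] = field(default_factory=list)


def validate_hierarchy(levels: list[SemanticGraph]) -> bool:
    """levels[0] is G_0 (joints), levels[l] is G_l; child indices must point one level down."""
    for l, graph in enumerate(levels):
        n = len(graph.nodes)
        for edge in graph.edges:
            a, b = edge.endpoints
            if not (0 <= a < n and 0 <= b < n):
                return False
        if l == 0:
            if any(node.children for node in graph.nodes):
                return False            # joint-level nodes have no children in this sketch
            continue
        below = len(levels[l - 1].nodes)
        if any(not 0 <= c < below for node in graph.nodes for c in node.children):
            return False
    return True


# Hypothetical two-person conference scene: joints -> persons -> interaction.
joints = SemanticGraph([Node("joint", [0.0, 0.0]) for _ in range(28)])
persons = SemanticGraph(
    [Node("person", [0.0], list(range(0, 14))), Node("person", [0.0], list(range(14, 28)))],
    [Edge("interacts_with", [1.0], (0, 1))],
)
scene = SemanticGraph([Node("interaction", [0.0], [0, 1])])
hierarchy = [joints, persons, scene]
```

The placeholder feature values only mark where a trained abstraction network would write its outputs.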
Step 2, collect the original video $V$ of $F$ frames to be transmitted; in this example, $F = 1$.
Step 3, extract the semantic features of the original video $V$ with the signal perception network $W_e^0$ and the semantic abstraction networks in the hierarchical knowledge base $K$, to obtain the top-level structured semantic features $G_L$ corresponding to $V$.
3.1) extract the primary semantic features of $V$ with the signal perception network $W_e^0$ in the semantic perception knowledge base $K_0$, to obtain the primary structured semantic features $G_0$;
3.2) extract higher-level semantic features from $G_0$ with the L semantic abstraction knowledge bases $K_l$ in order of increasing level, to obtain the top-level structured semantic features $G_L$:
3.2.1) let $l = 1$;
3.2.2) use the semantic abstraction network in $K_l$ to generate the higher-level structured semantic features $G_l$ from the lower-level structured semantic features $G_{l-1}$;
3.2.3) if $l \ge L$, the top-level structured semantic features $G_L$ have been obtained; otherwise let $l = l + 1$ and return to 3.2.2).
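The control flow of Step 3 is a plain loop over the knowledge base levels. The sketch below is a minimal illustration, assuming each network can be treated as a callable: `perception` stands in for $W_e^0$ and `abstraction_nets[l - 1]` for the abstraction network in $K_l$. The function name and the toy "pairwise merge" networks are hypothetical.

```python
from typing import Any, Callable, Sequence


def extract_top_level(video: Any,
                      perception: Callable[[Any], Any],
                      abstraction_nets: Sequence[Callable[[Any], Any]]) -> Any:
    """Step 3: V -> G_0 -> G_1 -> ... -> G_L."""
    if not abstraction_nets:
        raise ValueError("the hierarchy needs L >= 1 abstraction levels")
    g = perception(video)                         # 3.1) primary features G_0
    level = 1                                     # 3.2.1) l = 1
    while True:
        g = abstraction_nets[level - 1](g)        # 3.2.2) G_{l-1} -> G_l
        if level >= len(abstraction_nets):        # 3.2.3) l >= L: G_L reached
            return g
        level += 1


def pair_sum(g):
    # Toy stand-in for an abstraction network: merge neighbouring entries.
    return [g[i] + g[i + 1] for i in range(0, len(g), 2)]


g_top = extract_top_level([1, 2, 3, 4, 5, 6, 7, 8], list, [pair_sum, pair_sum])
```

With two toy levels, the eight input values are merged to four and then to two (`[10, 26]`), mirroring how each level produces a smaller, more abstract description.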
Step 4, transmit the top-level structured semantic features $G_L$ corresponding to the original video $V$ over the ultra-narrow band channel.
4.1) at the transmitting end, binary-encode $G_L$ to obtain the binary code $S_b$;
Commonly used binary coding methods include arithmetic coding and Huffman coding; this example preferably, but not exclusively, uses arithmetic coding;
4.2) at the transmitting end, modulate $S_b$ to obtain the signal $S$, and transmit $S$ over an ultra-narrow band channel with bandwidth $Q = 3$ kbps;
4.3) at the receiving end, demodulate $S$ to recover the binary code $S_b$, and binary-decode $S_b$ to obtain $G_L$.
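Arithmetic coding, the preferred binary coding in 4.1), narrows the interval [0, 1) once per symbol and emits the shortest binary fraction inside the final interval. The sketch below makes several assumptions: $G_L$ has already been serialized into a sequence of quantized integer symbols (the serialization is not specified in the text), sender and receiver share a static symbol model, and exact `fractions.Fraction` arithmetic is used for clarity. A practical coder would use fixed-precision integer arithmetic with renormalization.

```python
import math
from collections import Counter
from fractions import Fraction


def build_model(symbols):
    """Static model: map each symbol to its cumulative-probability interval [lo, hi)."""
    counts = Counter(symbols)
    total = sum(counts.values())
    model, low = {}, Fraction(0)
    for s in sorted(counts):
        high = low + Fraction(counts[s], total)
        model[s] = (low, high)
        low = high
    return model


def encode(symbols, model):
    """Narrow [0, 1) per symbol, then return the shortest bit string inside the interval."""
    low, high = Fraction(0), Fraction(1)
    for s in symbols:
        width = high - low
        s_low, s_high = model[s]
        low, high = low + width * s_low, low + width * s_high
    k = 1
    while True:
        n = math.ceil(low * 2**k)            # smallest k-bit fraction >= low
        if Fraction(n, 2**k) < high:
            return format(n, f"0{k}b")       # S_b as a string of '0'/'1'
        k += 1


def decode(bits, length, model):
    """Invert encode(): find the symbol interval containing the code value, step by step."""
    value = Fraction(int(bits, 2), 2**len(bits))
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        width = high - low
        target = (value - low) / width
        for s, (s_low, s_high) in model.items():
            if s_low <= target < s_high:
                out.append(s)
                low, high = low + width * s_low, low + width * s_high
                break
    return out
```

Frequent symbols receive wide sub-intervals and therefore cost fewer bits, which is why an entropy coder suits a bandwidth-limited link.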
Step 5, restore the received top-level structured semantic features $G_L$ with the hierarchical knowledge base $K$ to obtain the reconstructed video $V'$.
Restoring $G_L$ is the inverse of extracting $G_L$ from the original video $V$ in Step 3, and is implemented as follows:
5.1) restore $G_L$ with the L semantic abstraction knowledge bases $K_l$ in order of decreasing level, to obtain the primary structured semantic features $G_0$:
5.1.1) let $l = L$;
5.1.2) use the semantic reconstruction network in $K_l$ to restore the lower-level structured semantic features $G_{l-1}$ from the higher-level structured semantic features $G_l$;
5.1.3) if $l \le 1$, the primary structured semantic features $G_0$ have been obtained; go to 5.2). Otherwise let $l = l - 1$ and return to 5.1.2);
5.2) reconstruct the video from $G_0$ with the signal reconstruction network in the semantic perception knowledge base $K_0$, to obtain the reconstructed video $V'$.
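Step 5 runs the same hierarchy in the opposite direction. In the sketch below, which again assumes networks can be treated as callables, `reconstruction_nets[l - 1]` stands in for the semantic reconstruction network in $K_l$ and `signal_reconstruction` for the network in $K_0$. The toy "split in half" networks are hypothetical; they record their level so the descending order can be checked.

```python
from typing import Any, Callable, Sequence


def reconstruct_video(g_top: Any,
                      reconstruction_nets: Sequence[Callable[[Any], Any]],
                      signal_reconstruction: Callable[[Any], Any]) -> Any:
    """Step 5: G_L -> G_{L-1} -> ... -> G_0 -> V'."""
    g = g_top
    # Visits l = L, L-1, ..., 1, matching 5.1.1) to 5.1.3).
    for level in range(len(reconstruction_nets), 0, -1):
        g = reconstruction_nets[level - 1](g)    # 5.1.2) G_l -> G_{l-1}
    return signal_reconstruction(g)              # 5.2) G_0 -> V'


order = []


def make_split(level):
    def split(g):
        order.append(level)                              # record which K_l was used
        return [x / 2 for x in g for _ in range(2)]      # toy reconstruction network
    return split


v_rec = reconstruct_video([10, 26], [make_split(1), make_split(2)], list)
```

Here the level-2 network runs first and the level-1 network second, so `order` ends up as `[2, 1]`.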
Referring to Fig. 2, the video semantic communication system based on hierarchical knowledge expression in this example comprises: a video acquisition device 1, a semantic encoder 2, a source-level knowledge base 6, an ultra-narrow band communication device 3, a semantic decoder 4, a sink-level knowledge base 7 and a video display device 5, wherein:
the video acquisition device 1 is connected with the semantic encoder 2 through a video data receiving port and is used for acquiring an original video;
in this example, a camera with 2K resolution and a frame rate of 30 frames per second is preferably, but not exclusively, used as the video acquisition device;
the semantic encoder 2 is connected with the video acquisition device 1 through a video data receiving port, is connected with the information source level knowledge base 6 through a knowledge base query port, and is connected with the ultra-narrow band communication device 3 through a semantic transmitting port, and is used for encoding the original video level by level to obtain the structural semantic features corresponding to the original video;
the source-level knowledge base 6 is connected with the semantic encoder 2 through a knowledge base query port and stores the semantic extraction networks. In this example, a PC workstation is preferably, but not exclusively, used as the transmitting end of the system and as the hardware platform for the semantic encoder 2 and the source-level knowledge base 6;
the ultra-narrow band communication device 3 is connected with the semantic encoder 2 through a semantic sending port and with the semantic decoder 4 through a semantic receiving port, and transmits the structured semantic features of the video over an ultra-narrow band channel. To show intuitively that the bandwidth used in this example is only $Q = 3$ kbps, this example preferably, but not exclusively, uses an audible 3 kHz sound wave as the carrier: the transmitting end emits the acoustic signal through a loudspeaker, and the receiving end receives it through a microphone;
the semantic decoder 4 is connected with the video display device 5 through a video data sending port, is connected with the information sink level knowledge base 7 through a knowledge base query port, and is connected with the ultra-narrow band communication device 3 through a semantic receiving port, and is used for reconstructing the structural semantic features level by level to obtain a reconstructed video;
the sink-level knowledge base 7 is connected with the semantic decoder 4 through a knowledge base query port and stores the video reconstruction networks. In this example, a PC workstation is preferably, but not exclusively, used as the receiving end of the system and as the hardware platform for the semantic decoder 4 and the sink-level knowledge base 7;
the video display device 5 is connected with the semantic decoder 4 through a video signal sending port and displays the reconstructed video. In this example, a display with 2K resolution and a frame rate of 30 frames per second is preferably, but not exclusively, used as the video display device.
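The figures in this embodiment allow a quick bit-budget check. The sketch rests on several assumptions that are not stated in the text. It treats the quoted bandwidth $Q$ as a bit rate. It assumes continuous streaming at the 30 fps camera rate, although the method example uses $F = 1$. It also assumes that each joint coordinate is quantized to 8 bits.

```python
def per_frame_bit_budget(channel_bps: float, fps: float) -> float:
    """Bits available per frame if the channel carried a continuous stream."""
    return channel_bps / fps


H265_4K30_BPS = 40e6     # H.265 figure quoted in the Background section
CHANNEL_BPS = 3e3        # Q = 3 kbps used in this embodiment
FPS = 30                 # frame rate of the 2K camera and display in this example

reduction = H265_4K30_BPS / CHANNEL_BPS           # how many times narrower the channel is
budget = per_frame_bit_budget(CHANNEL_BPS, FPS)   # bits per frame at 30 fps

# Hypothetical raw size of one person's G_0: 14 joints x 2 coordinates x 8 bits.
raw_g0_bits = 14 * 2 * 8
```

Under these assumptions, the channel is roughly 13,333 times narrower than the quoted H.265 rate and leaves 100 bits per frame. That is less than the 224 bits of an uncompressed single-person $G_0$, which suggests why compact high-level features and entropy coding matter on such a link.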
The working principle of the system of the embodiment is as follows:
the video acquisition device 1 acquires the original video signal and sends it to the semantic encoder 2; the semantic encoder 2 queries the source-level knowledge base 6 to obtain the semantic extraction networks; the semantic encoder 2 encodes the original video level by level with the queried networks to obtain the corresponding structured semantic features and sends them to the ultra-narrow band communication device 3; the ultra-narrow band communication device 3 transmits the structured semantic features over the ultra-narrow band channel to the semantic decoder 4; the semantic decoder 4 queries the sink-level knowledge base 7 to obtain the video reconstruction networks; the semantic decoder 4 reconstructs the structured semantic features level by level with the queried networks to obtain the reconstructed video and sends it to the video display device 5; the video display device 5 displays the reconstructed video.
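The working principle above amounts to wiring five components together, with each codec querying its own knowledge base. The sketch below is a hypothetical stand-in, not the system's actual interfaces. The knowledge bases are modelled as dictionaries of callables keyed by assumed names. The link is modelled as a lossless pass-through that only accounts for airtime at $Q$ bits per second. A toy integer stands in for a video frame.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class KnowledgeBase:
    """Hypothetical stand-in for knowledge bases 6 and 7: networks looked up by name."""
    networks: dict[str, Callable[[Any], Any]]

    def query(self, name: str) -> Callable[[Any], Any]:
        return self.networks[name]


class UltraNarrowBandLink:
    """Stand-in for device 3: passes bits through unchanged and tracks airtime at Q bps."""

    def __init__(self, q_bps: float):
        self.q_bps = q_bps
        self.airtime_s = 0.0

    def send(self, bits: str) -> str:
        self.airtime_s += len(bits) / self.q_bps
        return bits


def run_system(frame, source_kb, sink_kb, link):
    encoder = source_kb.query("semantic_extraction")     # encoder 2 queries KB 6
    decoder = sink_kb.query("video_reconstruction")      # decoder 4 queries KB 7
    bits = encoder(frame)                                # level-by-level encoding
    received = link.send(bits)                           # device 3 carries the bits
    return decoder(received)                             # result goes to display 5


# Toy networks: a 12-bit code for an integer "frame" and its inverse.
source_kb = KnowledgeBase({"semantic_extraction": lambda f: format(f, "012b")})
sink_kb = KnowledgeBase({"video_reconstruction": lambda b: int(b, 2)})
link = UltraNarrowBandLink(q_bps=3000)
shown = run_system(2021, source_kb, sink_kb, link)
```

Keeping the two knowledge bases separate mirrors the figure: the sender only needs the extraction side and the receiver only the reconstruction side.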
The foregoing description is only one example of the present invention and does not limit it. It will be apparent to those skilled in the art that various changes and modifications in form and detail may be made without departing from the spirit and scope of the invention.

Claims (5)

1. A video semantic communication method based on hierarchical knowledge expression, characterized by comprising the following steps:
1) constructing a hierarchical knowledge base K:
1a) establishing a semantic perception knowledge base $K_0$ for storing a signal perception network, which extracts the primary structured semantic features $G_0$ from a video, and a signal reconstruction network, which reconstructs the video from the primary structured semantic features $G_0$.
The primary structured semantic features $G_0$ form a graph structure composed of a primary node set $A_0$ and a primary edge set $B_0$, where:
the primary node set is $A_0=\{a^{0}_{i_0} \mid i_0=1,\dots,|A_0|\}$, in which $i_0$ is the sequence number of a node in the primary node set, $|A_0|$ is the total number of nodes in the primary node set, and $a^{0}_{i_0}=(c^{0}_{i_0},\mathbf{v}^{0}_{i_0},C^{0}_{i_0})$ is the $i_0$-th node, composed of its semantic class $c^{0}_{i_0}$, semantic feature vector $\mathbf{v}^{0}_{i_0}$ and child node set $C^{0}_{i_0}$;
the primary edge set is $B_0=\{b^{0}_{j_0} \mid j_0=1,\dots,|B_0|\}$, in which $j_0$ is the sequence number of an edge in the primary edge set, $|B_0|$ is the total number of edges in the primary edge set, and $b^{0}_{j_0}=(r^{0}_{j_0},\mathbf{w}^{0}_{j_0})$ is the $j_0$-th edge, composed of its semantic class $r^{0}_{j_0}$ and semantic feature vector $\mathbf{w}^{0}_{j_0}$;
1b) establishing L semantic abstraction knowledge bases $K_l$ of successively increasing level, each storing a semantic abstraction network, which generates the higher-level structured semantic features $G_l$ from the lower-level structured semantic features $G_{l-1}$, and a semantic restructuring network, which reconstructs the lower-level structured semantic features $G_{l-1}$ from the higher-level structured semantic features $G_l$, where $L \ge 1$, $l$ is the sequence number of the semantic level, and $1 \le l \le L$. The $l$-th level structured semantic features $G_l$ form a graph structure composed of a high-level node set $A_l$ and a high-level edge set $B_l$, where:
the high-level node set is $A_l=\{a^{l}_{i_l} \mid i_l=1,\dots,|A_l|\}$, in which $i_l$ is the sequence number of a node in the high-level node set, $|A_l|$ is the total number of nodes in the high-level node set, and $a^{l}_{i_l}=(c^{l}_{i_l},\mathbf{v}^{l}_{i_l},C^{l}_{i_l})$ is the $i_l$-th node, composed of its semantic class $c^{l}_{i_l}$, semantic feature vector $\mathbf{v}^{l}_{i_l}$ and child node set $C^{l}_{i_l}$;
the high-level edge set is $B_l=\{b^{l}_{j_l} \mid j_l=1,\dots,|B_l|\}$, in which $j_l$ is the sequence number of an edge in the high-level edge set, $|B_l|$ is the total number of edges in the high-level edge set, and $b^{l}_{j_l}=(r^{l}_{j_l},\mathbf{w}^{l}_{j_l})$ is the $j_l$-th edge, composed of its semantic class $r^{l}_{j_l}$ and semantic feature vector $\mathbf{w}^{l}_{j_l}$;
1c) combining the semantic perception knowledge base $K_0$ and the L semantic abstraction knowledge bases $K_l$ of increasing level, in hierarchical order, into the hierarchical knowledge base $K=\{K_0,K_1,\dots,K_L\}$;
2) collecting an F-frame original video V to be transmitted, where $F \ge 1$;
3) extracting semantic features from the original video V based on the signal perception network and the semantic abstraction networks in the hierarchical knowledge base K, to obtain the top-level structured semantic features $G_L$ corresponding to the original video V, implemented as follows:
3a) extracting primary semantic features from the original video V with the signal perception network in the semantic perception knowledge base $K_0$, to obtain the primary structured semantic features $G_0$;
3b) extracting high-level semantic features from the primary structured semantic features $G_0$ based in turn on the L semantic abstraction knowledge bases $K_l$ of increasing level, to obtain the top-level structured semantic features $G_L$:
3b1) let $l = 1$;
3b2) generating the higher-level structured semantic features $G_l$ from the lower-level structured semantic features $G_{l-1}$ with the semantic abstraction network in the semantic abstraction knowledge base $K_l$;
3b3) judging whether $l \ge L$; if so, the top-level structured semantic features $G_L$ are obtained; otherwise, letting $l = l + 1$ and returning to 3b2);
4) setting up an ultra-narrow band channel with bandwidth $Q \le 4$ kbps, and transmitting over it the top-level structured semantic features $G_L$ corresponding to the original video V;
5) restoring the received top-level structured semantic features $G_L$ based on the signal reconstruction network and the semantic restructuring networks in the hierarchical knowledge base K, to obtain a reconstructed video V'.
2. The method of claim 1, wherein the signal perception network and the signal reconstruction network in 1a) are, respectively, a trained semantic graph generation network and a trained semantic-graph-based video reconstruction network.
3. The method of claim 1, wherein the semantic abstraction networks and the semantic restructuring networks in 1b) are, respectively, a trained graph-convolution-based down-sampling network and a trained graph-convolution-based up-sampling network.
4. The method of claim 1, wherein the transmission in 4) of the top-level structured semantic features $G_L$ corresponding to the original video V over the ultra-narrow band channel is implemented as follows:
4a) at the transmitting end, binary-coding the top-level structured semantic features $G_L$ to obtain a binary code $S_b$;
4b) at the transmitting end, modulating the binary code $S_b$ to obtain a signal S, and transmitting the signal S over the ultra-narrow band channel;
4c) at the receiving end, demodulating the signal S to obtain the binary code $S_b$;
4d) at the receiving end, binary-decoding the binary code $S_b$ to obtain the top-level structured semantic features $G_L$.
5. The method of claim 1, wherein in 5) the reconstructed video V' is restored based on the signal reconstruction network and the semantic restructuring networks in the hierarchical knowledge base K as follows:
5a) restoring the top-level structured semantic features $G_L$ based in turn on the L semantic abstraction knowledge bases $K_l$ in decreasing order of level, to obtain the primary structured semantic features $G_0$:
5a1) let $l = L$;
5a2) restoring the lower-level structured semantic features $G_{l-1}$ from the higher-level structured semantic features $G_l$ with the semantic restructuring network in the semantic abstraction knowledge base $K_l$;
5a3) judging whether $l \le 1$; if so, the primary structured semantic features $G_0$ are obtained and 5b) is executed; otherwise, letting $l = l - 1$ and returning to 5a2);
5b) performing video reconstruction on the primary structured semantic features $G_0$ with the signal reconstruction network in the semantic perception knowledge base $K_0$, to obtain the reconstructed video V'.
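The graph structure of the structured semantic features $G_l$ in claim 1 can be sketched as a small data model. This is an illustration under stated assumptions: the class and function names are hypothetical, child node sets are represented as indices into the node set of level $l-1$, and each edge is given explicit endpoints, which the claim does not mention but which a graph edge needs.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    semantic_class: str                               # semantic class of the node
    feature: List[float]                              # semantic feature vector
    children: Set[int] = field(default_factory=set)   # indices into level l - 1


@dataclass
class Edge:
    semantic_class: str                               # semantic class of the edge
    feature: List[float]                              # semantic feature vector
    endpoints: Tuple[int, int]                        # ASSUMPTION: not stated in the claim


@dataclass
class SemanticGraph:
    level: int                                        # l (0 = primary level)
    nodes: List[Node] = field(default_factory=list)   # node set A_l
    edges: List[Edge] = field(default_factory=list)   # edge set B_l


def is_consistent(graph: SemanticGraph, lower: Optional[SemanticGraph] = None) -> bool:
    """Check edge endpoints and, for l >= 1, the child indices against level l - 1."""
    n = len(graph.nodes)
    for edge in graph.edges:
        a, b = edge.endpoints
        if not (0 <= a < n and 0 <= b < n):
            return False
    if lower is None:
        return True
    if graph.level != lower.level + 1:
        return False
    m = len(lower.nodes)
    return all(0 <= c < m for node in graph.nodes for c in node.children)


# Toy example (assumption): two primary nodes grouped into one level-1 node.
G0 = SemanticGraph(level=0,
                   nodes=[Node("person", [0.9, 0.1]), Node("car", [0.2, 0.8])],
                   edges=[Edge("near", [0.5], (0, 1))])
G1 = SemanticGraph(level=1, nodes=[Node("street scene", [0.55, 0.45], {0, 1})])
BAD = SemanticGraph(level=1, nodes=[Node("street scene", [0.5, 0.5], {0, 5})])
```

`is_consistent(G1, G0)` holds because the single level-1 node groups the two level-0 nodes, whereas `BAD` refers to a child index that does not exist at level 0.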
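Steps 3a)-3b3) and 5a)-5b) are mirror-image loops over the hierarchical knowledge base K. The sketch below assumes K is a Python list whose entry 0 holds the (signal perception, signal reconstruction) pair of $K_0$ and whose entry $l$ holds the (semantic abstraction, semantic restructuring) pair of $K_l$; the toy networks and the names `encode`, `decode`, `DEMO_KB` are hypothetical placeholders, not the trained networks of claims 2 and 3. The `for` loops are equivalent to the claimed counter loops: 3b3) stops as soon as $l \ge L$, and 5a3) stops once $l \le 1$.

```python
from typing import Any, Callable, List, Tuple

Net = Callable[[Any], Any]
# One level of K: (forward network, inverse network).
# kb[0] = (signal perception, signal reconstruction) for K_0;
# kb[l] = (semantic abstraction, semantic restructuring) for K_l, 1 <= l <= L.
Level = Tuple[Net, Net]


def encode(video: Any, kb: List[Level]) -> Any:
    """Steps 3a)-3b3): video V -> top-level structured semantic features G_L."""
    L = len(kb) - 1
    if L < 1:
        raise ValueError("claim 1 requires L >= 1")
    perceive, _ = kb[0]
    g = perceive(video)                   # 3a) G_0
    for l in range(1, L + 1):             # 3b1) l = 1 ... 3b3) stop once l >= L
        abstract, _ = kb[l]
        g = abstract(g)                   # 3b2) G_l from G_{l-1}
    return g


def decode(g_top: Any, kb: List[Level]) -> Any:
    """Steps 5a)-5b): received G_L -> reconstructed video V'."""
    L = len(kb) - 1
    g = g_top
    for l in range(L, 0, -1):             # 5a1) l = L ... 5a3) stop after l = 1
        _, restructure = kb[l]
        g = restructure(g)                # 5a2) G_{l-1} from G_l
    _, reconstruct = kb[0]
    return reconstruct(g)                 # 5b) V' from G_0


def avg_pairs(x: List[float]) -> List[float]:
    # Toy "abstraction": merge adjacent entries by averaging (halves the size).
    return [(a + b) / 2 for a, b in zip(x[0::2], x[1::2])]


def duplicate(x: List[float]) -> List[float]:
    # Toy "restructuring": expand each entry into two (doubles the size).
    return [v for v in x for _ in range(2)]


# Toy K with L = 2; K_0 is the identity in both directions.
DEMO_KB: List[Level] = [(list, list), (avg_pairs, duplicate), (avg_pairs, duplicate)]
DEMO_VIDEO = [1, 1, 3, 3, 5, 5, 7, 7]
```

With the toy networks, `encode(DEMO_VIDEO, DEMO_KB)` returns `[2.0, 6.0]` and decoding it yields `[2.0, 2.0, 2.0, 2.0, 6.0, 6.0, 6.0, 6.0]`, which shows that detail discarded by the abstraction levels is not recovered exactly.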
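Claim 3 only states that the semantic abstraction and restructuring networks are trained graph-convolution-based down-sampling and up-sampling networks. The sketch below is an assumption-laden illustration of that idea in plain Python: a mean-aggregation graph convolution (each node plus its neighbours, then a weight matrix $W$, then ReLU), down-sampling by a fixed hard assignment of each level-$(l-1)$ node to a parent node, and up-sampling by copying each parent feature to its children before another graph convolution. In a trained network the weights, and possibly the assignment (for example a learned soft assignment as in DiffPool), would be learned; all function names here are hypothetical.

```python
from typing import List, Set, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def graph_conv(h: Matrix, adj: List[List[int]], w: Matrix) -> Matrix:
    """Mean over each node and its neighbours, multiply by W, apply ReLU."""
    n, d = len(h), len(h[0])
    agg = []
    for i in range(n):
        idx = [i] + [j for j in range(n) if j != i and adj[i][j]]
        agg.append([sum(h[j][k] for j in idx) / len(idx) for k in range(d)])
    return [[max(0.0, v) for v in row] for row in matmul(agg, w)]


def downsample(h: Matrix, adj: List[List[int]], w: Matrix,
               assign: List[int]) -> Tuple[Matrix, List[List[int]], List[Set[int]]]:
    """G_{l-1} -> G_l: lower-level node i is pooled into parent assign[i].

    Assumes every parent index 0..max(assign) has at least one child.
    """
    z = graph_conv(h, adj, w)
    m = max(assign) + 1
    children = [{i for i, p in enumerate(assign) if p == q} for q in range(m)]
    pooled = [[sum(z[i][k] for i in sorted(c)) / len(c) for k in range(len(z[0]))]
              for c in children]
    coarse = [[0] * m for _ in range(m)]  # parents linked if any children are linked
    for i in range(len(h)):
        for j in range(len(h)):
            if adj[i][j] and assign[i] != assign[j]:
                coarse[assign[i]][assign[j]] = 1
    return pooled, coarse, children


def upsample(h_parent: Matrix, children: List[Set[int]],
             adj_child: List[List[int]], w: Matrix) -> Matrix:
    """G_l -> G_{l-1}: copy each parent feature to its children, then refine."""
    n = sum(len(c) for c in children)
    h: Matrix = [[] for _ in range(n)]
    for p, members in enumerate(children):
        for i in members:
            h[i] = list(h_parent[p])
    return graph_conv(h, adj_child, w)


# Toy level-(l-1) graph (assumption): 4 nodes on a path 0-1-2-3, 2-d features.
DEMO_H = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
DEMO_ADJ = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
DEMO_W = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, untrained
```

Pooling the path with the assignment `[0, 0, 1, 1]` yields two parent nodes joined by one coarse edge, with child sets `{0, 1}` and `{2, 3}`; this parent-child bookkeeping is what the child node sets in claim 1 record.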
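Claim 4 fixes only the order of operations (binary coding, modulation, demodulation, binary decoding), not the code or the modulation scheme. The following sketch assumes a hypothetical byte layout for $G_L$ (a node count, then per node a class id and four 8-bit quantized feature values), ignores edges, uses BPSK with hard-decision demodulation as a stand-in modulation, and reads 4 kbps as 4000 bit/s to estimate airtime on the ultra-narrow band channel. All function names are hypothetical.

```python
import struct
from typing import List, Tuple

# ASSUMPTION: hypothetical layout, not specified in claim 4.
# 1-byte node count, then per node: 1-byte class id + FEATS signed 8-bit features.
FEATS = 4
NODE_FMT = f"<B{FEATS}b"
NodeCode = Tuple[int, List[int]]


def binary_encode(nodes: List[NodeCode]) -> List[int]:
    """4a) G_L -> binary code S_b, as a list of bits (MSB first)."""
    payload = struct.pack("<B", len(nodes))
    for cls, feats in nodes:
        payload += struct.pack(NODE_FMT, cls, *feats)
    return [(byte >> (7 - k)) & 1 for byte in payload for k in range(8)]


def bpsk_modulate(bits: List[int]) -> List[float]:
    """4b) S_b -> signal S: bit 0 -> +1.0, bit 1 -> -1.0 (BPSK, an assumption)."""
    return [1.0 - 2.0 * b for b in bits]


def bpsk_demodulate(symbols: List[float]) -> List[int]:
    """4c) S -> S_b by a hard decision on the sign of each received symbol."""
    return [0 if s >= 0.0 else 1 for s in symbols]


def binary_decode(bits: List[int]) -> List[NodeCode]:
    """4d) S_b -> G_L (inverse of binary_encode)."""
    data = bytes(int("".join(map(str, bits[i:i + 8])), 2) for i in range(0, len(bits), 8))
    size = struct.calcsize(NODE_FMT)
    nodes = []
    for n in range(data[0]):
        cls, *feats = struct.unpack_from(NODE_FMT, data, 1 + n * size)
        nodes.append((cls, list(feats)))
    return nodes


def airtime_seconds(num_bits: int, q_bps: float = 4000.0) -> float:
    """Time to send num_bits over a channel of Q bit/s (4 kbps read as 4000 bit/s)."""
    return num_bits / q_bps


DEMO_NODES: List[NodeCode] = [(3, [10, -5, 0, 127]), (7, [-128, 1, 2, 3]), (1, [0, 0, 0, 0])]
```

For the three-node `DEMO_NODES`, $S_b$ is 128 bits, so at $Q = 4$ kbps it occupies $128 / 4000 = 0.032$ s of airtime; the noiseless round trip returns the original nodes.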

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110543408.3A CN113315972B (en) 2021-05-19 2021-05-19 Video semantic communication method and system based on hierarchical knowledge expression

Publications (2)

Publication Number Publication Date
CN113315972A 2021-08-27
CN113315972B 2022-04-19

Family

ID=77373538

Country Status (1)

Country Link
CN (1) CN113315972B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant