CN116708843B

CN116708843B - User experience quality feedback regulation system in semantic communication process

Info

Publication number: CN116708843B
Application number: CN202310967234.2A
Authority: CN
Inventors: 林嘉铭; 程宝平; 付涛; 陶晓明
Original assignee: Tsinghua University; China Mobile Hangzhou Information Technology Co Ltd
Current assignee: Tsinghua University; China Mobile Hangzhou Information Technology Co Ltd
Priority date: 2023-08-03
Filing date: 2023-08-03
Publication date: 2023-10-31
Anticipated expiration: 2043-08-03
Also published as: CN116708843A

Abstract

The application provides a user experience quality feedback regulation system for the semantic communication process, applied in the technical field of semantic communication. The system comprises at least an encoding end and a decoding end. The encoding end performs semantic encoding on an original video image through a semantic encoder in a semantic encoding and decoding network to obtain an encoding result, and sends the encoding result to the decoding end; the decoding end performs semantic decoding on the encoding result through a semantic decoder in the semantic encoding and decoding network to obtain a reconstructed video image; the decoding end obtains the QoE index value of the reconstructed video image and the adjustment rule mapped to that QoE index value; and at least one of the decoding end and the encoding end executes the adjustment rule. Based on the semantic communication system and QoE indexes, the application designs a QoE evaluation and feedback mechanism for semantic communication, realizes quality evaluation and feedback adjustment of the semantic communication system in different environments, and fills the gap left by existing QoE evaluation and feedback mechanisms with respect to semantic communication.

Description

User experience quality feedback regulation system in semantic communication process

Technical Field

The application relates to the technical field of semantic communication, in particular to a user experience quality feedback regulation system in the semantic communication process.

Background

Semantic communication is a new-generation communication technology that draws on the mechanism by which the human brain achieves ultra-high compression of images and video, integrating human visual perception and cognition mechanisms into the communication process. Through efficient semantic representation, it achieves high-definition, smooth video reconstruction at extremely low bit rates, which makes it well suited to weak network environments.

However, quality of experience (QoE) evaluation methods and feedback adjustment mechanisms in the semantic communication field are still immature: conventional QoE evaluation methods rarely take the semantics-related encoding and decoding process into account, and conventional feedback adjustment mechanisms cannot be reused directly in semantic communication.

Therefore, a feedback adjustment system for user experience quality in the semantic communication process needs to be developed, so as to provide feedback on the user experience quality of semantic communication and to generate adjustment strategies for it.

Disclosure of Invention

In view of the foregoing, embodiments of the present application provide a user quality of experience feedback adjustment system for the semantic communication process that overcomes, or at least partially solves, the foregoing problems.

An embodiment of the present application provides a feedback adjustment system for user experience quality in a semantic communication process, which includes at least an encoding end and a decoding end;

the encoding end performs semantic encoding on the original video image through a semantic encoder in a semantic encoding and decoding network to obtain an encoding result, and sends the encoding result to the decoding end;

the decoding end performs semantic decoding on the encoding result through a semantic decoder in the semantic encoding and decoding network to obtain a reconstructed video image;

the decoding end obtains the QoE index value of the reconstructed video image;

the decoding end and the encoding end acquire the adjustment rule mapped to the QoE index value of the reconstructed video image;

at least one of the decoding end and the encoding end executes the adjustment rule.

In an alternative embodiment, the system further comprises a server communicatively connected to both the encoding end and the decoding end;

the encoding end sends the encoding result to the server, which forwards it to the decoding end;

the server performs semantic decoding on the encoding result through the semantic decoder to obtain the reconstructed video image;

the server performs QoE evaluation on the reconstructed video image according to target QoE indexes that are independent of the video image content and semantic QoE indexes that are strongly related to the video image content, obtains the QoE index value of the reconstructed video image, and sends that QoE index value to the encoding end.

In an alternative embodiment, the system further comprises a server communicatively connected to both the encoding end and the decoding end; the encoding end sends the encoding result to the server, which forwards it to the decoding end; the server performs semantic decoding on the encoding result through the semantic decoder to obtain the reconstructed video image; and the step in which the decoding end obtains the QoE index value of the reconstructed video image comprises:

when the current computing power of the decoding end is lower than a first target computing power threshold, the decoding end acquires the QoE index value obtained by the server through QoE evaluation of the reconstructed video image according to the target QoE indexes independent of the video image content and the semantic QoE indexes strongly related to the video image content;

when the current computing power of the decoding end is between the first target computing power threshold and a second target computing power threshold, the decoding end performs QoE evaluation on the reconstructed video image according to some of the target QoE indexes and semantic QoE indexes to obtain part of the QoE index values of the reconstructed video image, and acquires the remaining QoE index values, which the server calculates according to the remaining target QoE indexes and semantic QoE indexes;

and when the current computing power of the decoding end is higher than the second target computing power threshold, the decoding end performs QoE evaluation on the reconstructed video image according to the target QoE indexes and the semantic QoE indexes to obtain the QoE index value of the reconstructed video image.
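The three computing-power tiers above amount to a simple dispatch decision at the decoding end. The Python sketch below is illustrative only: the function name `plan_qoe_evaluation`, the index names, and the `local_share` split ratio are hypothetical, since the text does not specify which subset of indexes the decoding end computes in the middle tier.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EvaluationPlan:
    """Which QoE indexes are evaluated at the decoding end and which by the server."""
    local: tuple[str, ...]
    server: tuple[str, ...]


def plan_qoe_evaluation(
    compute_power: float,
    first_threshold: float,
    second_threshold: float,
    indexes: list[str],
    local_share: float = 0.5,  # hypothetical: fraction computed locally in the middle tier
) -> EvaluationPlan:
    """Split QoE evaluation between the decoding end and the server by computing power."""
    if first_threshold > second_threshold:
        raise ValueError("first threshold must not exceed second threshold")
    all_indexes = tuple(indexes)
    if compute_power < first_threshold:
        # Low computing power: the server evaluates every index.
        return EvaluationPlan(local=(), server=all_indexes)
    if compute_power > second_threshold:
        # High computing power: the decoding end evaluates every index itself.
        return EvaluationPlan(local=all_indexes, server=())
    # Middle tier: the decoding end computes part of the indexes, the server the rest.
    split = int(len(all_indexes) * local_share)
    return EvaluationPlan(local=all_indexes[:split], server=all_indexes[split:])


indexes = ["resolution", "frame_rate", "akd", "glcd"]  # hypothetical index names
print(plan_qoe_evaluation(50, first_threshold=20, second_threshold=80, indexes=indexes))
# EvaluationPlan(local=('resolution', 'frame_rate'), server=('akd', 'glcd'))
```

A computing power exactly equal to either threshold falls into the middle tier here, following the "between" wording; how boundary values are actually handled is not stated in the text.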

In an optional implementation, the step in which the decoding end obtains the adjustment rule mapped to the QoE index value of the reconstructed video image comprises:

the decoding end compares the QoE index value of the reconstructed video image with a decoding-end adjustment threshold;

when the QoE index value of the reconstructed video image is lower than the decoding-end adjustment threshold, the decoding end queries the first adjustment rule mapped to that QoE index value, based on a first mapping relation between adjustment rules and QoE index values;

the step in which the encoding end obtains the adjustment rule comprises:

the encoding end obtains the first adjustment rule determined by the decoding end.

In an optional implementation that includes the server, the server compares the QoE index value of the reconstructed video image with a server adjustment threshold;

when the QoE index value of the reconstructed video image is lower than the server adjustment threshold, the server queries the second adjustment rule mapped to that QoE index value, based on a second mapping relation between adjustment rules and QoE index values;

the step in which the encoding end obtains the adjustment rule mapped to the QoE index value of the reconstructed video image comprises:

the encoding end obtains the first adjustment rule determined by the decoding end and/or the second adjustment rule determined by the server;

the step in which the decoding end obtains the adjustment rule mapped to the QoE index value of the reconstructed video image comprises:

the decoding end obtains the first adjustment rule determined by the decoding end and/or the second adjustment rule determined by the server.
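The threshold comparison and mapping lookup performed by the decoding end (first mapping relation) and the server (second mapping relation) can be sketched as follows. This is a hedged illustration: it assumes a mapping relation can be represented as ascending QoE bands, each an upper bound paired with a rule, and all threshold values and rule names are hypothetical.

```python
from __future__ import annotations


def lookup_rule(qoe: float, adjust_threshold: float, mapping: list[tuple[float, str]]) -> str | None:
    """Return the rule mapped to `qoe`, or None if no adjustment is needed.

    `mapping` holds ascending (upper_bound, rule) bands: a QoE value below
    upper_bound (and not below the previous band's bound) maps to that rule.
    """
    if qoe >= adjust_threshold:
        return None  # QoE is acceptable: no adjustment is triggered
    for upper_bound, rule in mapping:
        if qoe < upper_bound:
            return rule
    return None


def rules_for_encoding_end(first_rule: str | None, second_rule: str | None) -> list[str]:
    """The encoding end obtains the first and/or the second rule, whichever exist."""
    return [rule for rule in (first_rule, second_rule) if rule is not None]


first_mapping = [(0.3, "adjust_codec_params"), (0.6, "increase_keypoints")]          # hypothetical
second_mapping = [(0.4, "adjust_codec_params"), (0.7, "enable_contour_constraint")]  # hypothetical

qoe = 0.5
first = lookup_rule(qoe, adjust_threshold=0.6, mapping=first_mapping)    # decoding end
second = lookup_rule(qoe, adjust_threshold=0.7, mapping=second_mapping)  # server
print(rules_for_encoding_end(first, second))
# ['increase_keypoints', 'enable_contour_constraint']
```

Giving the decoding end and the server separate thresholds and mappings mirrors the text, which lets each side trigger adjustments independently.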

In an alternative embodiment, there are a plurality of first mapping relations and/or a plurality of second mapping relations, each adapted to a different communication scene;

the decoding end determines a target first mapping relation from the plurality of first mapping relations according to the communication scene represented by the reconstructed video image, and determines the first adjustment rule based on the target first mapping relation;

and the server determines a target second mapping relation from the plurality of second mapping relations according to the communication scene represented by the reconstructed video image, and determines the second adjustment rule based on the target second mapping relation.
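Selecting a scene-specific mapping relation before the lookup can be sketched as below. The scene labels, the fallback to a default mapping, and the band representation of a mapping relation are assumptions made for illustration; the text does not say how the communication scene is recognized from the reconstructed video image, so the scene label is simply passed in.

```python
from __future__ import annotations


def select_mapping(
    scene: str,
    mappings: dict[str, list[tuple[float, str]]],
    default_scene: str = "default",  # hypothetical fallback scene
) -> list[tuple[float, str]]:
    """Pick the target mapping relation adapted to the given communication scene."""
    if scene in mappings:
        return mappings[scene]
    if default_scene in mappings:
        return mappings[default_scene]
    raise KeyError(f"no mapping relation for scene {scene!r}")


def determine_rule(
    scene: str,
    qoe: float,
    adjust_threshold: float,
    mappings: dict[str, list[tuple[float, str]]],
) -> str | None:
    """Select the scene's mapping, then look up the rule for a below-threshold QoE value."""
    if qoe >= adjust_threshold:
        return None
    for upper_bound, rule in select_mapping(scene, mappings):
        if qoe < upper_bound:
            return rule
    return None


mappings = {  # hypothetical scenes and rules
    "face_intercom": [(0.5, "increase_keypoints"), (0.8, "enable_contour_constraint")],
    "default": [(0.5, "adjust_codec_params")],
}
print(determine_rule("face_intercom", 0.6, 0.8, mappings))  # enable_contour_constraint
print(determine_rule("street_view", 0.4, 0.8, mappings))    # adjust_codec_params
```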

In an alternative embodiment, the adjustment rule includes at least adjusting parameters of the semantic encoder and the semantic decoder;

the step in which at least one of the decoding end and the encoding end executes the adjustment rule comprises:

the encoding end adjusts the parameters of the semantic encoder, performs semantic encoding on a new original video image with the adjusted semantic encoder to obtain a new encoding result, and sends the new encoding result to the decoding end;

the decoding end adjusts the parameters of the semantic decoder and performs semantic decoding on the new encoding result with the adjusted semantic decoder to obtain a new reconstructed video image.

In an alternative embodiment, the QoE index value of the reconstructed video image is obtained through the following steps:

extracting, from the reconstructed video image, influence factors unrelated to the video image content, and determining the target QoE index value of the target QoE index that is unrelated to the video image content;

extracting, from the reconstructed video image, influence factors strongly related to the video image content, and determining the semantic QoE index value of the semantic QoE index that is strongly related to the video image content;

and determining the QoE index value of the reconstructed video image from the target QoE index value and the semantic QoE index value according to the respective weights of the target QoE index and the semantic QoE index.

In an alternative embodiment, the semantic QoE index value of the semantic QoE index strongly related to the video image content comprises an average keypoint distance, which characterizes the offset between the keypoints of the original video image and the keypoints of the reconstructed video image;

when the average keypoint distance of the reconstructed video image is higher than a target distance, the adjustment rule is: increasing the number of transmitted keypoints of the original video image and/or adjusting the parameters of the semantic encoder and the semantic decoder.

In an alternative embodiment, the semantic QoE index value of the semantic QoE index strongly related to the video image content comprises a graph Laplacian coordinate difference, which characterizes the offset of a reconstructed target object in the reconstructed video image relative to the original target object in the original video image;

when the graph Laplacian coordinate difference of the reconstructed video image is higher than a set threshold, the adjustment rule is: the semantic decoder enables contour constraints and/or the parameters of the semantic encoder and the semantic decoder are adjusted.
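The two semantic rules (AKD compared with a target distance, GLCD compared with a set threshold) can be expressed as a small rule function. This sketch treats a larger AKD as worse, as the detailed description explains, returns every candidate action, and leaves the and/or choice among them to the caller; the action names and threshold values are hypothetical.

```python
from __future__ import annotations


def semantic_adjustment_actions(
    akd: float,
    glcd: float,
    target_distance: float,
    glcd_threshold: float,
) -> list[str]:
    """Map semantic QoE index values to candidate adjustment actions."""
    actions: list[str] = []

    def add(action: str) -> None:
        if action not in actions:  # keep first-seen order, avoid duplicates
            actions.append(action)

    if akd > target_distance:
        # Keypoint offset: transmit more keypoints and/or retune the codec.
        add("increase_keypoints")
        add("adjust_codec_params")
    if glcd > glcd_threshold:
        # Part-level distortion: enable contour constraints and/or retune the codec.
        add("enable_contour_constraint")
        add("adjust_codec_params")
    return actions


print(semantic_adjustment_actions(akd=3.2, glcd=0.9, target_distance=2.0, glcd_threshold=0.5))
# ['increase_keypoints', 'adjust_codec_params', 'enable_contour_constraint']
```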

A second aspect of the embodiments of the present application further provides a feedback adjustment method for user experience quality in a semantic communication process. The method is applied to the feedback adjustment system for user experience quality provided in the first aspect of the present application, where the system includes an encoding end and a decoding end, and the method includes at least:

the decoding end obtains the QoE index value of the reconstructed video image;

A third aspect of the embodiments of the present application further provides an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the steps of the feedback adjustment method for user experience quality in the semantic communication process according to any implementation of the second aspect.

A fourth aspect of the embodiments of the present application further provides a computer-readable storage medium storing a computer program/instructions which, when executed by a processor, implement the steps of the feedback adjustment method for user experience quality in the semantic communication process according to any implementation of the second aspect.

A fifth aspect of the embodiments of the present application provides a computer program product comprising a computer program/instructions which, when executed by a processor, implement the steps of the feedback adjustment method for user experience quality in the semantic communication process according to any implementation of the second aspect.

The embodiments of the present application provide a user experience quality feedback adjustment system for the semantic communication process, which includes at least an encoding end and a decoding end. The encoding end performs semantic encoding on the original video image through a semantic encoder in a semantic encoding and decoding network to obtain an encoding result, and sends the encoding result to the decoding end; the decoding end performs semantic decoding on the encoding result through a semantic decoder in the semantic encoding and decoding network to obtain a reconstructed video image; the decoding end obtains the QoE index value of the reconstructed video image; the decoding end and the encoding end acquire the adjustment rule mapped to that QoE index value; and at least one of the decoding end and the encoding end executes the adjustment rule.

The embodiments of the present application have the following beneficial effects. By obtaining the QoE index value of the reconstructed video image, the embodiments provide user experience quality feedback for semantic communication, overcome the shortcomings of existing QoE indexes in semantic evaluation, and can effectively measure the semantic reconstruction quality of the reconstructed video image. In addition, the embodiments determine the corresponding adjustment rule from the user experience quality feedback result (the QoE index value) and use it to adjust semantic communication. This improves semantic communication in practical applications, realizes quality evaluation and feedback adjustment of the semantic communication system in different environments, and fills the gap left by existing QoE evaluation and feedback adjustment mechanisms with respect to semantic communication.

Drawings

To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and a person skilled in the art may derive other drawings from them without inventive effort.

Fig. 1 is a schematic structural diagram of a feedback adjustment system for user experience quality in a semantic communication process according to an embodiment of the present application;

Fig. 2 is a schematic diagram of changes in a reconstructed video image according to an embodiment of the present application;

Fig. 3 is a flowchart of the feedback adjustment mechanism of a feedback adjustment system for user experience quality according to an embodiment of the present application;

Fig. 4 is a schematic diagram of a feedback adjustment flow of user experience quality according to an embodiment of the present application;

Fig. 5 is a flowchart of the steps of a feedback adjustment method for user experience quality according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

Exemplary embodiments of the present application will be described in more detail below with reference to the accompanying drawings in the embodiments of the present application. While exemplary embodiments of the present application are shown in the drawings, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the application to those skilled in the art.

High-definition cameras make life more convenient for users, but they also bring challenges such as massive terminal access, continuously rising bit rates, and increasingly complex scenes. The weak networks common in rural areas, limited uplink bandwidth, and the trend toward higher-definition cameras cause problems such as fully occupied bandwidth, poor playback experience, and high cloud storage costs. Typical problems include stuttering face video and blurry faces caused by insufficient bandwidth during camera video intercom; scene video suffers from similar stuttering and blurring. Semantic communication technology is currently the main approach adopted to address these problems and improve the communication quality of video images.

Semantic communication is a communication technology that encodes and decodes using the semantic information of the transmitted content; it removes redundant data, reduces the amount of transmitted data, and achieves higher efficiency. Face semantic communication is a branch of semantic communication in which face images are the transmitted content: the face image is encoded and decoded by analyzing the semantic information in the image, and is then reconstructed.

To ensure the user experience and improve the quality of experience during communication, how to make adjustments based on real-time communication quality and user experience quality is an important research direction in semantic communication technology. Current mainstream technical schemes are designed for QoE calculation and adjustment in traditional communication services, and no QoE evaluation and feedback adjustment method specific to the semantic communication process exists. In summary, existing quality of experience (QoE) technologies for communication systems are evaluation technologies oriented to traditional encoding and decoding methods (such as H.264); they are not applicable to semantic encoding and decoding and cannot directly measure the semantic gap between a video image generated by semantic encoding and decoding and the original video image content. Consequently, directly using existing QoE technology to evaluate the quality of reconstructed images generated by semantic encoding and decoding neither yields accurate evaluation results nor provides an applicable feedback adjustment mechanism.

In view of the above problems, an embodiment of the present application provides a feedback adjustment system for user experience quality in the semantic communication process, which addresses the problems that existing QoE technology is unsuited to semantic communication, cannot provide accurate user experience quality feedback, and cannot be used to adjust semantic communication. The system is described in detail below through several embodiments and application scenarios with reference to the accompanying drawings.

This embodiment proposes a feedback adjustment system for user experience quality in a semantic communication process. Referring to Fig. 1, which shows a schematic structural diagram of the system, the system includes:

an encoding end and a decoding end;

the decoding end obtains the QoE index value of the reconstructed video image.

In this embodiment, the user experience quality feedback adjustment system for semantic communication mainly includes an encoding end and a decoding end, each of which may be any terminal with a communication function, such as a computer, a mobile phone, a tablet, or a PC.

The encoding end performs semantic encoding on the original video image through the semantic encoder in the semantic encoding and decoding network to obtain an encoding result, and sends the encoding result to the decoding end; the decoding end performs semantic decoding on the encoding result through the semantic decoder in the semantic encoding and decoding network to obtain a reconstructed video image. Specifically, during semantic communication the encoding end semantically encodes the original video image, and the resulting encoding result carries semantic information extracted from the video image. The encoding end sends the encoding result to the decoding end, which uses the semantic decoder to reconstruct the image and obtain the reconstructed video image corresponding to the original video image. Because semantic communication does not require the original video image itself to be transmitted, the transmitted bit rate is greatly reduced and communication efficiency is improved.

In practice, the video image obtained by the decoding end is produced by a series of encoding and decoding operations in the semantic encoding and decoding network, so the reconstructed video image may differ semantically in content from the original video image. Taking face semantic communication as an example, referring to Fig. 2, which shows a schematic diagram of changes in a reconstructed video image, the decoding end reconstructs the face only from the face keypoints and a reference face image. Because the keypoints carry limited shape information, the reconstructed face image is prone to global or local deformation. Since traditional QoE technology is not suited to semantic encoding and decoding, it cannot detect or accurately evaluate problems such as deformation in the reconstructed face images generated by semantic encoding and decoding, and therefore cannot support policy adjustment of semantic communication based on the resulting QoE.

The decoding end obtains the QoE index value of the reconstructed video image, and the decoding end and the encoding end acquire the adjustment rule mapped to that QoE index value. The QoE index value is the result of QoE evaluation performed on the reconstructed video image and includes the values of one or more QoE indexes that measure semantics-related properties such as the degree of image distortion and the temporal stability of the content. The adjustment rule is a rule by which the encoding end and the decoding end adjust the relevant parameters of semantic communication. A mapping relation exists between QoE index values and adjustment rules, and different QoE index values correspond to different adjustment rules; that is, reconstructed video images of different quality lead to different adjustment strategies for semantic communication.

Referring to Fig. 4, which shows a schematic flow of feedback adjustment of user experience quality: the encoding end encodes the original video image with the semantic encoder to obtain an encoding result and sends it to the decoding end, and the decoding end reconstructs the image with the semantic decoder to obtain a reconstructed video image (the reconstructed video in Fig. 4). After obtaining the reconstructed video image, the decoding end can extract the relevant influence factors from it and determine the corresponding QoE index value (the semantic QoE index in Fig. 4) from those factors. Based on the determined QoE index value, the semantic encoder and the semantic decoder can then be adjusted according to a preset feedback adjustment mechanism. In practice, the index values are calculated from two aspects. On the one hand, influence factors unrelated to the video image content (the factors used by traditional QoE, such as video resolution and frame rate) are extracted from the reconstructed video image, and the target QoE index value of the target QoE index unrelated to the video image content is determined. On the other hand, influence factors strongly related to the video image content (semantics-related factors, such as face keypoints and the contours of key regions) are extracted from the reconstructed video image, and the semantic QoE index value of the semantic QoE index strongly related to the video image content is determined.

The QoE index value of the reconstructed video image is then determined from the target QoE index value and the semantic QoE index value according to the respective weights of the target QoE index and the semantic QoE index. In implementation, the index values calculated in the two aspects can be combined according to preset weights for the target QoE index and the semantic QoE index, thereby evaluating the quality of the reconstructed video image in semantic communication.
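A minimal sketch of the weighted combination follows. It assumes both index values have already been normalized to a common higher-is-better scale such as [0, 1]; neither that normalization nor the weight values are given in the text, and a distance-type index such as AKD would first have to be converted into a score. The function also divides by the weight sum, so the preset weights need not add up to 1.

```python
def combined_qoe(
    target_value: float,
    semantic_value: float,
    target_weight: float,
    semantic_weight: float,
) -> float:
    """Weighted average of the target (content-independent) and semantic QoE index values."""
    total_weight = target_weight + semantic_weight
    if total_weight <= 0:
        raise ValueError("weights must have a positive sum")
    # Divide by the weight sum so the result stays on the same scale as the inputs.
    return (target_weight * target_value + semantic_weight * semantic_value) / total_weight


# Hypothetical values: good resolution/frame-rate score, weaker semantic score,
# with the semantic index weighted more heavily.
print(combined_qoe(target_value=0.8, semantic_value=0.6, target_weight=0.4, semantic_weight=0.6))
```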

In an alternative embodiment, the semantic QoE index value of the semantic QoE index strongly related to the video image content comprises an average keypoint distance (Average Keypoint Distance, AKD), which characterizes the offset between the keypoints of the original video image and the keypoints of the reconstructed video image.

In this embodiment, the semantic QoE index value of the semantic QoE index strongly related to the video image content includes the average keypoint distance, which characterizes the offset between the keypoints of the original video image and those of the reconstructed video image, i.e., the mean of the L2 norms of the keypoint coordinate differences. In a specific implementation, a keypoint detection model may be used to extract a plurality of keypoints from both the original video image and the reconstructed video image. Keypoints are feature points of the image; for example, in a face semantic communication scene they are contour feature points, facial-feature points (eyebrows, eyes, nose, and mouth), and the like. This embodiment does not limit how the keypoints are selected. In addition, many different algorithms exist for extracting or detecting keypoints, and they yield different numbers of keypoints, for example 5, 68, or 103. The more keypoints there are, the more detailed the semantic information obtained, but the greater the amount of computation.

In this embodiment, the average keypoint distance may be calculated according to the following formula:

$$\mathrm{AKD} = \frac{1}{N}\sum_{i=1}^{N} \left\lVert p_i - \hat{p}_i \right\rVert_2$$

where AKD denotes the average keypoint distance, $N$ the number of keypoints, $p_i$ the i-th keypoint in the original video image, and $\hat{p}_i$ the i-th keypoint in the reconstructed video image, with $1 \le i \le N$. By averaging the Euclidean distances over all keypoints, this embodiment obtains the degree to which the overall position in the reconstructed video image deviates from the original video image. The average keypoint distance thus characterizes how far the shapes in the reconstructed video image deviate from those in the original video image: the larger the average keypoint distance, the greater the deviation and the lower the quality of the reconstructed video image; the smaller the average keypoint distance, the smaller the deviation and the higher the quality of the reconstructed video image.

In this embodiment, if the average keypoint distance of the reconstructed video image is higher than the target distance, the reconstructed video image has a shape offset relative to the original video image, i.e., semantic distortion such as keypoint offset has occurred, and the corresponding adjustment rule is: increasing the number of transmitted keypoints of the original video image and/or adjusting the parameters of the semantic encoder and the semantic decoder. Specifically, depending on the actual network and computing power conditions, more keypoints of the original video image can be transmitted to increase the semantic information carried, or the parameters of the encoder and decoder can be adjusted, so as to improve the quality of the reconstructed video image and achieve a good semantic communication effect.
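The AKD formula translates directly into code. In this sketch keypoints are (x, y) tuples listed in the same order for both images, which assumes a keypoint detector with a fixed point layout (for example, a 68-point face model); the function name is illustrative, not taken from the source.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def average_keypoint_distance(original: Sequence[Point], reconstructed: Sequence[Point]) -> float:
    """Mean Euclidean (L2) distance between corresponding keypoints p_i and p_hat_i."""
    if len(original) != len(reconstructed):
        raise ValueError("both images must have the same number of keypoints")
    if not original:
        raise ValueError("at least one keypoint is required")
    total = sum(math.dist(p, q) for p, q in zip(original, reconstructed))
    return total / len(original)


original = [(0.0, 0.0), (10.0, 0.0)]
reconstructed = [(3.0, 4.0), (10.0, 0.0)]  # first keypoint shifted by a 3-4-5 offset
print(average_keypoint_distance(original, reconstructed))  # 2.5
```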

In an alternative embodiment, the semantic QoE indicator value of the semantic QoE indicator strongly related to the video image content comprises: -map Laplacian coordinate differences (Graph Laplacian Coordinate Difference, GLCD) for characterizing an offset of a reconstructed target object in the reconstructed video image relative to an original target object in the original video image;

In a facial semantic communication scenario, the graph Laplacian coordinate difference represents the degree of distortion of five facial parts in the reconstructed video image: eyebrows, eyes, nose, lips, and facial contour. Specifically, the key points in the face image are first classified by position into these five parts. Each key point is thereby assigned to the eyebrow, eye, nose, lip, or facial-contour key points. The key points of each part are then analyzed separately. The graph Laplacian L is used to compute the offset of each key point from the center point of its part, which represents the key point's position relative to the part center. Finally, the differences between the relative positions of corresponding key points in the original image and the generated image are computed and summed. This value represents the part-level deviation of the generated image and determines the degree of offset of each part (target object).

Specifically, the relative position of the key point relative to the center point of the part to which the key point belongs is calculated according to the following formula:

$$\delta_i = p_i - \frac{1}{\lvert \mathcal{N}(i) \rvert}\sum_{j \in \mathcal{N}(i)} p_j$$

where $p_i$ denotes the i-th key point of the current reconstructed face image or of the current original face image, and $\delta_i$ denotes the relative position of that key point. $\mathcal{N}(i)$ denotes the set of key points belonging to the same part as the i-th key point, and $\lvert \mathcal{N}(i) \rvert$ denotes the number of key points in that part. $p_j$ denotes any key point of that part.

After the relative positions of the key points are calculated, the graph Laplacian coordinate difference of the reconstructed video image is calculated as follows:

$$\mathrm{GLCD} = \sum_{i=1}^{N}\left\lVert \delta_i - \hat{\delta}_i \right\rVert_2$$

where $p_i$ and $\hat{p}_i$ denote the i-th key point of the current original and reconstructed face images respectively, and $\delta_i$ and $\hat{\delta}_i$ denote their relative positions. GLCD denotes the graph Laplacian coordinate difference, N denotes the number of key points, and $1 \le i \le N$.
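A minimal sketch of the per-part relative positions and GLCD follows. It assumes key points are (N, 2) NumPy arrays and that each key point carries a part label such as "eye" or "nose". The part center is taken as the mean of all key points in that part, as in the relative-position formula. The function names are hypothetical. The usage lines show what distinguishes GLCD from AKD: a rigid shift of the whole face changes AKD but leaves GLCD at zero, while stretching one part changes GLCD.

```python
from typing import Sequence

import numpy as np

def relative_positions(kp, part_labels: Sequence[str]) -> np.ndarray:
    """delta_i = p_i minus the mean of all key points in the same part as p_i."""
    kp = np.asarray(kp, dtype=float)
    labels = np.asarray(part_labels)
    if labels.shape[0] != kp.shape[0]:
        raise ValueError("one part label per key point is required")
    delta = np.empty_like(kp)
    for part in np.unique(labels):
        mask = labels == part
        center = kp[mask].mean(axis=0)   # part center = mean of its key points
        delta[mask] = kp[mask] - center
    return delta

def glcd(orig_kp, recon_kp, part_labels: Sequence[str]) -> float:
    """GLCD = sum_i ||delta_i - delta_hat_i||_2 over all key points."""
    d_orig = relative_positions(orig_kp, part_labels)
    d_recon = relative_positions(recon_kp, part_labels)
    return float(np.linalg.norm(d_orig - d_recon, axis=1).sum())

orig = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
labels = ["eye", "eye", "nose", "nose"]
shifted = orig + np.array([5.0, 5.0])                                      # whole face moved rigidly
widened = np.array([[-1.0, 0.0], [3.0, 0.0], [10.0, 10.0], [12.0, 10.0]])  # eye part stretched
print(glcd(orig, shifted, labels))  # 0.0
print(glcd(orig, widened, labels))  # 2.0
```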

In this embodiment, if the graph Laplacian coordinate difference of the reconstructed video image is higher than the set threshold, the reconstructed target object in the reconstructed video image deviates excessively from the original target object in the original video image. The corresponding adjustment rule is that the semantic decoder enables a contour constraint and/or the parameters of the semantic encoder and semantic decoder are adjusted. The choice depends on the network and computing-power conditions. For example, a codec with greater computing capacity can be selected to improve the semantic communication effect. Alternatively, under a contour-constraint strategy, the encoding end sends the first contour of the original video image to the decoding end during semantic communication, which adds semantic information and improves the communication effect.
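The two threshold rules for AKD and GLCD can be sketched as a simple check that returns candidate adjustment actions. The sketch assumes the AKD rule fires when AKD exceeds the target distance, which is consistent with the statement that a larger AKD means lower quality. The action names are hypothetical strings. Choosing among the "and/or" options according to network and computing-power conditions is left to the caller.

```python
from typing import List

# Hypothetical identifiers for the adjustment actions described in the text.
INCREASE_KEYPOINTS = "increase_keypoint_transmission"
ADJUST_CODEC = "adjust_codec_parameters"
ENABLE_CONTOUR = "enable_contour_constraint"

def candidate_adjustments(akd: float, glcd: float,
                          target_distance: float, glcd_threshold: float) -> List[str]:
    """Return candidate adjustment actions, in order and without duplicates."""
    actions: List[str] = []
    if akd > target_distance:          # key point / shape offset too large
        actions += [INCREASE_KEYPOINTS, ADJUST_CODEC]
    if glcd > glcd_threshold:          # part-level distortion too large
        actions.append(ENABLE_CONTOUR)
        if ADJUST_CODEC not in actions:
            actions.append(ADJUST_CODEC)
    return actions

print(candidate_adjustments(akd=3.0, glcd=0.5, target_distance=2.0, glcd_threshold=1.0))
# ['increase_keypoint_transmission', 'adjust_codec_parameters']
```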

In an alternative embodiment, the semantic QoE indicator value of the semantic QoE indicator strongly related to the video image content further includes the maximum mean discrepancy (MMD). The maximum mean discrepancy characterizes the temporal stability across multiple frames of the reconstructed video image. It is obtained by comparing the temporal latent-space features of the reconstructed video image with those of the original video image. Specifically, the feature of each frame (its latent-space representation) is computed for both the original and the reconstructed video image. This yields the set $X = \{x_1, \dots, x_m\}$ of latent-space features of the original video image and the set $Y = \{y_1, \dots, y_n\}$ of latent-space features of the reconstructed video image. A kernel function K measures the similarity between features within and across the two sets, and the maximum mean discrepancy is calculated as follows:

$$\mathrm{MMD}^2(X, Y) = \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}K(x_i, x_j) - \frac{2}{mn}\sum_{i=1}^{m}\sum_{j=1}^{n}K(x_i, y_j) + \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}K(y_i, y_j)$$

where K(x, y) denotes the similarity of features x and y, and MMD denotes the maximum mean discrepancy. m denotes the number of features in set X, and n denotes the number of features in set Y. In this embodiment, the temporal latent-space representation of the reconstructed video image is mined, and the maximum mean discrepancy is used as a semantic QoE index value. The reconstructed video image is thus evaluated from the perspective of temporal consistency, which further improves the accuracy of QoE evaluation.
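The biased MMD² estimate in this formula can be computed with NumPy as sketched below. The Gaussian (RBF) kernel and its bandwidth `sigma` are assumptions made for illustration, since the text does not specify which kernel K is used. Rows of `x` and `y` stand for per-frame latent features of the original and reconstructed video.

```python
import numpy as np

def gaussian_kernel(a: np.ndarray, b: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2)) for rows of a (m, d) and b (n, d)."""
    sq_dist = (np.sum(a**2, axis=1)[:, None]
               + np.sum(b**2, axis=1)[None, :]
               - 2.0 * a @ b.T)
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))

def mmd_squared(x, y, sigma: float = 1.0) -> float:
    """Biased MMD^2 estimate between feature sets x (m, d) and y (n, d)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # .mean() over an (m, m) matrix equals (1/m^2) times the double sum in the formula
    k_xx = gaussian_kernel(x, x, sigma).mean()
    k_yy = gaussian_kernel(y, y, sigma).mean()
    k_xy = gaussian_kernel(x, y, sigma).mean()
    return float(k_xx - 2.0 * k_xy + k_yy)

rng = np.random.default_rng(0)
orig_feats = rng.normal(size=(30, 8))                 # 30 frames, 8-dim latent features
stable = orig_feats + 0.01 * rng.normal(size=(30, 8))
drifting = orig_feats + 2.0                           # systematic shift across frames
print(mmd_squared(orig_feats, stable) < mmd_squared(orig_feats, drifting))  # True
```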

Furthermore, the semantic QoE indicator value of the semantic QoE indicator strongly related to the video image content may include both the graph Laplacian coordinate difference and the average key point distance. In that case, the two can be weighted and summed with preset weights to obtain an integrated semantic QoE index value, calculated as follows:

$$\mathrm{SIRS} = \alpha \cdot \mathrm{AKD} + \beta \cdot \mathrm{GLCD}$$

where SIRS (Semantic Image Reconstruction Score) denotes the QoE index value of the reconstructed video image. $\alpha$ denotes the weight of the average key point distance, and $\beta$ denotes the weight of the graph Laplacian coordinate difference. AKD denotes the average key point distance of the reconstructed video image, and GLCD denotes the graph Laplacian coordinate difference of the reconstructed video image.
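The integrated score is a plain weighted sum, sketched below. The weight values in the example are arbitrary placeholders, because the text only says the weights are preset. Both terms grow with distortion, so under this definition a larger SIRS indicates a worse reconstruction.

```python
def sirs(akd: float, glcd: float, w_akd: float, w_glcd: float) -> float:
    """Semantic Image Reconstruction Score: weighted sum of AKD and GLCD."""
    return w_akd * akd + w_glcd * glcd

# Placeholder weights; the text only states that the weights are preset.
print(sirs(akd=2.5, glcd=2.0, w_akd=0.5, w_glcd=0.5))  # 2.25
```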

As the above embodiments show, the user experience quality feedback adjustment system provided by the embodiments of the application remedies the shortcomings of existing QoE indicators in semantic evaluation. Existing communication technologies do not consider semantics-related factors when designing QoE evaluation methods, and therefore cannot measure the decoding quality of the video image content. The QoE evaluation method in the embodiments of the application is designed for semantic communication and is better suited to semantic communication services. It extracts relevant factors from the reconstructed video image and calculates target QoE index values and semantic QoE index values (such as the average key point distance and the graph Laplacian coordinate difference). In this way, it can effectively measure the semantic reconstruction quality of the reconstructed video image. The embodiments of the application also remedy the shortcomings of existing adjustment methods in semantic communication. Most existing adjustment strategies target content-independent indicators such as video bit rate and cannot respond when errors occur in reconstructing content such as video images. The feedback adjustment strategy designed for semantic communication in the embodiments of the application can effectively address content-related problems in the communication process, such as poor restoration of semantic information and unstable semantic timing.

In practice, the decoding end may have low computing capability, making it difficult for it to perform semantic communication tasks (such as semantic encoding/decoding and QoE evaluation) on its own. The following embodiments provide solutions to this problem based on the user experience quality feedback adjustment system described above.

Embodiment 1

In this embodiment, the server is communicatively connected to both the encoding end and the decoding end. The encoding end sends the encoding result to the server, which forwards it to the decoding end. To address the insufficient computing power of the decoding end, QoE evaluation in this embodiment may be performed by the server. The server therefore also reconstructs the video image, that is, it semantically decodes the encoding result with the semantic decoder to obtain the reconstructed video image. The server then performs QoE evaluation on the reconstructed video image in place of the decoding end and sends the evaluation result (the QoE index value of the reconstructed video image) to the decoding end. This reduces the computing-power requirement on the decoding end and improves semantic communication efficiency.

Specifically, the server performs QoE evaluation based on preset QoE indicators. These include a target QoE indicator independent of the video image content and a semantic QoE indicator strongly related to the video image content. The target QoE indicator denotes conventional QoE indicators such as play-out time, buffering ratio, and average media bit rate. The semantic QoE indicator evaluates semantic information and includes indicators such as single-frame reconstruction quality and temporal consistency. Single-frame reconstruction quality reflects the image quality of each frame of the reconstructed video image. Temporal consistency reflects how consistent the temporal characteristics of multiple reconstructed frames are, for example whether abrupt brightness changes occur. Combining conventional and semantic QoE indicators in the analysis of the reconstructed video image yields a more accurate QoE index value.
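A minimal sketch of two of the conventional (content-independent) indicators is given below. The text does not define these formulas. The definitions used here are common conventions assumed for illustration: buffering ratio as stall time over total session time, and a duration-weighted average of segment bit rates. The function names are hypothetical.

```python
from typing import Sequence

def buffering_ratio(stall_seconds: float, play_seconds: float) -> float:
    """Fraction of session time spent stalled (assumed definition)."""
    total = stall_seconds + play_seconds
    return stall_seconds / total if total > 0 else 0.0

def average_bitrate(bitrates_kbps: Sequence[float], durations_s: Sequence[float]) -> float:
    """Duration-weighted mean bit rate over played segments (assumed definition)."""
    if len(bitrates_kbps) != len(durations_s) or not durations_s:
        raise ValueError("need one duration per segment and at least one segment")
    total_time = sum(durations_s)
    return sum(b * d for b, d in zip(bitrates_kbps, durations_s)) / total_time

print(buffering_ratio(stall_seconds=2.0, play_seconds=18.0))  # 0.1
print(average_bitrate([1000.0, 2000.0], [2.0, 2.0]))          # 1500.0
```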

Embodiment 2

In this embodiment, the server is likewise communicatively connected to both the encoding end and the decoding end. The encoding end sends the encoding result to the server, which forwards it to the decoding end. Depending on its computing power, the decoding end takes on part or all of the QoE evaluation tasks.

In implementation, the decoding end receives the encoding result forwarded by the server and decodes it with the semantic decoder to reconstruct the video image. It then computes whichever QoE indicators its own computing power allows, obtaining QoE index values of the reconstructed video image.

If the current computing power of the decoding end is below the first target computing-power threshold, the decoding end is too weak to perform QoE evaluation. In this case, a third party (e.g., the server in Embodiment 1) must perform the QoE evaluation task, and the decoding end computes no QoE indicators. The server evaluates the reconstructed video image against the relevant indicators (the target QoE indicator independent of the video image content and the semantic QoE indicator strongly related to it), obtains the QoE index values, and sends them to the decoding end.

If the current computing power of the decoding end lies between the first and second target computing-power thresholds, the decoding end has medium computing power. It can hardly perform QoE evaluation independently, or would do so inefficiently. A third party (e.g., the server in Embodiment 1) therefore assists by taking over part of the QoE evaluation, while the decoding end performs the rest. Specifically, the decoding end evaluates the reconstructed video image against the target QoE indicator and part of the semantic QoE indicators to obtain part of the QoE index values. The server computes the remaining QoE index values of the reconstructed video image from the remaining indicators and sends them to the decoding end. Typically, the decoding end handles the simpler tasks, such as computing the values of the target (conventional) QoE indicator and the average key point distance among the semantic QoE indicators. The server handles the harder ones, such as computing the graph Laplacian coordinate difference.

If the current computing power of the decoding end is above the second target computing-power threshold, the decoding end is powerful enough to complete QoE evaluation independently, without assistance from the server. In this case, the decoding end evaluates the reconstructed video image on its own against all the relevant indicators (the target QoE indicator and the semantic QoE indicator) and obtains the corresponding QoE index values.
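The three computing-power cases can be sketched as a function that splits the indicator list between the decoding end and the server. Several details are assumptions: the indicator names, placing MMD on the server side in the medium case, and treating a value exactly equal to the second threshold as "medium". The text itself only assigns the conventional indicators and AKD to the decoding end, and GLCD to the server, in the medium case.

```python
from typing import Tuple

Tasks = Tuple[str, ...]

# Hypothetical indicator identifiers.
CONVENTIONAL: Tasks = ("playout_time", "buffering_ratio", "average_bitrate")
LIGHT_SEMANTIC: Tasks = ("akd",)
HEAVY_SEMANTIC: Tasks = ("glcd", "mmd")   # placing MMD here is an assumption
ALL_TASKS: Tasks = CONVENTIONAL + LIGHT_SEMANTIC + HEAVY_SEMANTIC

def split_qoe_tasks(decoder_power: float, first_threshold: float,
                    second_threshold: float) -> Tuple[Tasks, Tasks]:
    """Return (decoder_tasks, server_tasks) for the three computing-power cases."""
    if decoder_power < first_threshold:      # low power: server computes everything
        return (), ALL_TASKS
    if decoder_power <= second_threshold:    # medium power: split the work
        return CONVENTIONAL + LIGHT_SEMANTIC, HEAVY_SEMANTIC
    return ALL_TASKS, ()                     # high power: decoder computes everything

print(split_qoe_tasks(5.0, first_threshold=2.0, second_threshold=8.0))
# (('playout_time', 'buffering_ratio', 'average_bitrate', 'akd'), ('glcd', 'mmd'))
```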

In summary, Embodiment 1 provides a scheme in which the server independently evaluates the reconstructed video image against the relevant indicators (the target QoE indicator and the semantic QoE indicator) to obtain its QoE index values. This reduces the processing load on the decoding end and improves its semantic communication efficiency. In Embodiment 2, the QoE evaluation tasks executed by the decoding end and by the server are allocated according to the actual computing power of the decoding end. A high-power decoding end computes all QoE indicators, whereas a medium- or low-power decoding end computes only some or none of them. The QoE evaluation workload of the decoding end can thus be arranged flexibly, making full use of its computing resources without affecting semantic communication efficiency.

In this embodiment, after obtaining the QoE index value of the reconstructed video image, the decoding end compares it with the decoding-end adjustment threshold and selects the corresponding adjustment rule according to the result. A QoE index value below the decoding-end adjustment threshold means that the image quality of the reconstructed video image obtained by the decoding end is too low and the semantic communication effect is poor. The relevant semantic communication parameters or strategies then need to be adjusted. In this case, the decoding end queries the first mapping relationship between adjustment rules and QoE index values to obtain the first adjustment rule to which the QoE index value of the reconstructed video image maps. Specifically, the first adjustment rule may be determined as follows:

$$R_1 = F_1(Q_d), \quad Q_d < T_d$$

where $R_1$ denotes the first adjustment rule, and $F_1$ denotes the first mapping relationship between adjustment rules and QoE index values, i.e., a preset adjustment mapping rule. $Q_d$ denotes the QoE index value of the input reconstructed video image, and $T_d$ denotes the decoding-end adjustment threshold.
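A minimal sketch of this threshold-gated lookup follows. It assumes, as the text does, that a higher QoE index value means better quality. The mapping F is modeled as an ordered list of (upper bound, rule) bands; the band values and rule names are hypothetical. The same function can serve the server side by passing the second mapping relationship and the server adjustment threshold.

```python
from typing import List, Optional, Tuple

# Hypothetical mapping F: (upper bound of QoE band, adjustment rule), sorted ascending.
Mapping = List[Tuple[float, str]]

EXAMPLE_MAPPING: Mapping = [
    (0.3, "adjust_codec_parameters"),
    (0.6, "increase_keypoint_transmission"),
    (float("inf"), "enable_contour_constraint"),
]

def adjustment_rule(qoe: float, threshold: float, mapping: Mapping) -> Optional[str]:
    """Return F(qoe) when qoe is below the adjustment threshold, otherwise None."""
    if qoe >= threshold:
        return None                  # quality acceptable: no adjustment needed
    for upper_bound, rule in mapping:
        if qoe < upper_bound:
            return rule
    return None

print(adjustment_rule(0.2, threshold=0.7, mapping=EXAMPLE_MAPPING))  # adjust_codec_parameters
print(adjustment_rule(0.9, threshold=0.7, mapping=EXAMPLE_MAPPING))  # None
```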

In implementation, the encoding end does not have the reconstructed video image or its QoE index value. It therefore cannot obtain the semantic communication quality evaluation result and cannot independently adjust semantic-communication parameters or strategies. For this reason, after the decoding end obtains the first adjustment rule from the QoE index value of the reconstructed video image, it sends the first adjustment rule to the encoding end. The encoding end then adaptively adjusts the corresponding parameters and strategies, improving the quality and efficiency of semantic communication.

In this embodiment, the system further includes a server that forwards the encoding result from the encoding end to the decoding end. The server can also reconstruct the image itself, i.e., semantically decode the encoding result with the semantic decoder to obtain the reconstructed video image, and then obtain the corresponding QoE index value from it. In implementation, the server may compute the QoE index values on its own as in Embodiment 1. Alternatively, it may combine this with the scheme of Embodiment 2, so that the server and each decoding end each compute part of the QoE index values.

Because most decoding ends are low-power devices, this embodiment proposes that the server perform all or part of the adjustment-rule generation. That is, the server determines the specific low-quality reconstruction condition from the QoE evaluation result (QoE index value) and decides the codec-model adjustment strategy. Specifically, the server compares the QoE index value of the reconstructed video image with the server adjustment threshold. If the value is below the threshold, the server queries the second mapping relationship between adjustment rules and QoE index values to obtain the second adjustment rule to which the QoE index value maps. This can be expressed as:

$$R_2 = F_2(Q_s), \quad Q_s < T_s$$

where $R_2$ denotes the second adjustment rule, and $F_2$ denotes the second mapping relationship between adjustment rules and QoE index values. $Q_s$ denotes the QoE index value of the reconstructed video image obtained by the server, and $T_s$ denotes the server adjustment threshold.

In implementation, the encoding end has neither the reconstructed video image nor its QoE index value. It can therefore hardly obtain the semantic communication quality evaluation result or independently adjust semantic-communication parameters and strategies. In this embodiment, the encoding end can instead obtain the first adjustment rule determined by the decoding end and/or the second adjustment rule determined by the server. Likewise, the decoding end, which determines the first adjustment rule itself, can also obtain the second adjustment rule determined by the server. It can then combine the two rules to achieve adaptive semantic communication adjustment.

The feedback adjustment mechanism of the user experience quality feedback adjustment system provided by the embodiments of the application adapts to different network environments and computing devices. The mechanism is split between the server side and the decoding side (e.g., the decoding end determines the first adjustment rule and the server determines the second). The QoE evaluation and feedback adjustment mode can therefore be tuned to the specific network environment and the computing capability of the client, which makes the approach broadly applicable.

In this embodiment, different communication scenarios use different first and second mapping relationships. For example, in a face semantic communication scenario (such as video calls or teleconferencing), the target first and second mapping relationships are selected from those preset for the face semantic communication scenario. The first and second adjustment rules are then obtained from them. In a vehicle semantic communication scenario, the target first and second mapping relationships are likewise selected from those preset for the vehicle semantic communication scenario, yielding the first and second adjustment rules.

The decoding end adjusts the parameters of the semantic decoder and semantically decodes the new encoding result with the adjusted decoder to obtain a new reconstructed video image.

Fig. 3 shows a flowchart of the feedback adjustment mechanism of the user experience quality feedback adjustment system. As shown in Fig. 3, the encoding end encodes the original video image with the semantic encoder and sends the encoding result through the server to the decoding end (the client in Fig. 3). The decoding end decodes the encoding result with the semantic decoder to obtain the reconstructed video image, which is then played by a player. The decoding end obtains the QoE index value of the reconstructed video image (e.g., the semantic QoE in Fig. 3) and determines the corresponding first adjustment rule. The server likewise obtains the QoE index value of the reconstructed video image (e.g., the semantic QoE in Fig. 3) and determines the corresponding second adjustment rule. As indicated by the dashed arrows in Fig. 3, the encoding end obtains the first adjustment rule from the decoding end and the second adjustment rule from the server. It tunes the parameters of the semantic encoder accordingly and semantically encodes new original video images to obtain new encoding results. The decoding end adjusts the parameters of the semantic decoder based on the obtained adjustment rules and semantically decodes the new encoding results with the adjusted decoder to obtain new reconstructed video images. This realizes parameter adjustment and optimization of semantic communication, improving either the quality of the reconstructed video image or the semantic communication efficiency.

The user experience feedback adjustment system designed for the semantic communication process takes semantic information into account. It solves the problem that existing QoE indicators measure semantic information poorly or not at all. Moreover, the QoE indicators designed in the embodiments of the application are based on the semantic communication process. They require only the information transmitted during semantic transmission and the information extracted after video reconstruction. No reference to the original video frames and no extra transmitted information is needed, which improves semantic communication efficiency. The embodiments of the application also provide a feedback adjustment mechanism based on the characteristics of semantic communication and the QoE evaluation method. The adjustment strategies output by this mechanism need to be tailored to the specific scenario. For example, the semantic information in a video call is mainly facial, so strategies that adjust face generation quality are selected. In addition, the embodiments split the feedback adjustment mechanism between the server side and the decoding side. The computation scheme can thus be adapted to the network environment and the computing power of the decoding end, reducing its hardware requirements.

A second aspect of the embodiments of the application provides a user experience quality feedback adjustment method for the semantic communication process. Fig. 5 shows a flowchart of the steps of the method. As shown in Fig. 5, the method is applied to the user experience quality feedback adjustment system provided by the first aspect of the embodiments. The system comprises an encoding end and a decoding end, and the method comprises at least:

step S501, the encoding end performs semantic encoding on an original video image through a semantic encoder in a semantic encoding and decoding network to obtain an encoding result, and sends the encoding result to the decoding end;

step S502, the decoding end performs semantic decoding on the encoding result through a semantic decoder in the semantic encoding and decoding network to obtain a reconstructed video image;

step S503, the decoding end obtains a QoE indicator value of the reconstructed video image;

step S504, the decoding end and the encoding end obtain the adjustment rule to which the QoE index value of the reconstructed video image maps;

step S505, at least one of the decoding end and the encoding end executes the adjustment rule.

In an alternative embodiment, the method further comprises:

In an optional implementation, the user experience quality feedback adjustment system further comprises a server communicatively connected to both the encoding end and the decoding end. The method further comprises:

the encoding end sends the encoding result to the server, which forwards it to the decoding end; the server semantically decodes the encoding result with the semantic decoder to obtain the reconstructed video image; and the decoding end obtaining the QoE index value of the reconstructed video image comprises:

In an alternative embodiment, the method further comprises:

if the graph Laplacian coordinate difference of the reconstructed video image is higher than a set threshold, the adjustment rule is: the semantic decoder enables a contour constraint and/or the parameters of the semantic encoder and the semantic decoder are adjusted.

An embodiment of the application also provides an electronic device. Fig. 6 is a schematic structural diagram of the electronic device according to an embodiment of the application. As shown in Fig. 6, the electronic device 100 includes a memory 110 and a processor 120 communicatively connected via a bus. The memory 110 stores a computer program executable on the processor 120. When executed, the program implements the steps of the user experience quality feedback adjustment method for the semantic communication process disclosed in the second aspect of the embodiments of the application.

The embodiment of the application also provides a computer readable storage medium, on which a computer program/instruction is stored, which when executed by a processor, implements the steps in the user experience quality feedback adjustment method in the semantic communication process disclosed in the second aspect of the embodiment of the application.

The embodiment of the application also provides a computer program product, which comprises a computer program/instruction, wherein the computer program/instruction realizes the steps in the feedback adjustment method of the user experience quality in the semantic communication process disclosed in the second aspect of the embodiment of the application when being executed by a processor.

In this specification, the embodiments are described progressively, and each embodiment focuses on its differences from the others. For identical or similar parts, the embodiments may be referred to one another.

Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus, electronic devices, and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the application.

Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.

The user quality of experience feedback adjustment system in the semantic communication process provided by the present application has been described in detail above. Specific examples have been used to illustrate the principles and embodiments of the application, and these examples are intended only to help the reader understand the method and its core idea. Those skilled in the art may also vary the specific embodiments and the scope of application in accordance with the ideas of the present application. In summary, the content of this description should not be construed as limiting the present application.

Claims

1. A system for feedback adjustment of user quality of experience in a semantic communication process, comprising at least: an encoding end and a decoding end;

the decoding end obtains QoE index values of the reconstructed video images;

at least one of the decoding end and the encoding end performs the adjustment rule;

the system further comprises a server communicatively connected to both the encoding end and the decoding end;

2. The user quality of experience feedback adjustment system of claim 1, further comprising a server communicatively connected to both the encoding end and the decoding end, wherein the encoding end sends the encoding result to the server, and the server forwards the encoding result to the decoding end; the server performs semantic decoding on the encoding result through the semantic decoder to obtain the reconstructed video image; and the decoding end obtains QoE index values of the reconstructed video image, the QoE index values comprising:

3. The system according to claim 1, wherein the decoding end obtaining an adjustment rule mapped from the QoE indicator values of the reconstructed video image comprises:

4. The user quality of experience feedback adjustment system of claim 3, further comprising a server communicatively connected to both the encoding end and the decoding end;

5. The user experience quality feedback adjustment system according to claim 4, wherein there are a plurality of first mapping relations or second mapping relations, each of which is adapted to a different communication scene;

6. The user quality of experience feedback adjustment system according to any one of claims 1-5, wherein the adjustment rules comprise at least: adjusting parameters of the semantic encoder and the semantic decoder;

7. The system according to any of claims 1-5, wherein the QoE indicator value of the reconstructed video image is obtained by:

8. The user quality of experience feedback adjustment system according to claim 7, wherein the semantic QoE indicator values of the semantic QoE indicators strongly related to the video image content comprise an average keypoint distance, which characterizes the offset between the keypoints of the original video image and the corresponding keypoints of the reconstructed video image;

9. The user quality of experience feedback adjustment system according to claim 7, wherein the semantic QoE indicator values of the semantic QoE indicators strongly related to the video image content comprise a Laplacian coordinate difference value, which characterizes the offset of the reconstructed target object in the reconstructed video image relative to the original target object in the original video image;
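Claims 8 and 9 define the two content-related semantic QoE indicators only by what they characterize, and their exact formulas are not reproduced in this section. The Python sketch below illustrates one plausible way to compute them, under stated assumptions. It assumes (i) that the average keypoint distance is the mean Euclidean distance between corresponding keypoints of the original and reconstructed images, and (ii) that the Laplacian coordinate difference uses uniform-weight Laplacian coordinates, delta_i = v_i - mean(v_j for j in N(i)), computed over keypoints of the target object linked as a closed contour. All function names (`average_keypoint_distance`, `laplacian_coordinate_difference`, `ring_neighbors`) are hypothetical, and the keypoint detector and correspondence step are assumed to exist upstream.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def _check_pair(original: Sequence[Point], reconstructed: Sequence[Point]) -> None:
    # Assumption: keypoint i in the original corresponds to keypoint i in the reconstruction.
    if not original or len(original) != len(reconstructed):
        raise ValueError("need two non-empty keypoint lists of equal length")


def average_keypoint_distance(original: Sequence[Point],
                              reconstructed: Sequence[Point]) -> float:
    """Hypothetical AKD: mean Euclidean offset between corresponding keypoints."""
    _check_pair(original, reconstructed)
    total = sum(math.dist(p, q) for p, q in zip(original, reconstructed))
    return total / len(original)


def ring_neighbors(n: int) -> Dict[int, List[int]]:
    """Neighbor lists for n keypoints joined as a closed contour (assumed topology)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


def laplacian_coordinates(points: Sequence[Point],
                          neighbors: Dict[int, List[int]]) -> List[Point]:
    """Uniform-weight Laplacian coordinates: each point minus the mean of its neighbors."""
    deltas = []
    for i, (x, y) in enumerate(points):
        nbrs = neighbors[i]
        mx = sum(points[j][0] for j in nbrs) / len(nbrs)
        my = sum(points[j][1] for j in nbrs) / len(nbrs)
        deltas.append((x - mx, y - my))
    return deltas


def laplacian_coordinate_difference(original: Sequence[Point],
                                    reconstructed: Sequence[Point],
                                    neighbors: Dict[int, List[int]]) -> float:
    """Hypothetical indicator: mean distance between the two shapes' Laplacian coordinates."""
    _check_pair(original, reconstructed)
    d_orig = laplacian_coordinates(original, neighbors)
    d_rec = laplacian_coordinates(reconstructed, neighbors)
    return sum(math.dist(a, b) for a, b in zip(d_orig, d_rec)) / len(d_orig)


if __name__ == "__main__":
    square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    shifted = [(x + 3.0, y + 4.0) for x, y in square]  # rigid translation only
    nb = ring_neighbors(len(square))
    print(average_keypoint_distance(square, shifted))              # 5.0
    print(laplacian_coordinate_difference(square, shifted, nb))    # 0.0
```

Under these assumptions the two indicators respond differently: a rigid shift of the whole object changes the average keypoint distance but leaves the Laplacian coordinate difference at zero, because uniform Laplacian coordinates are translation invariant, whereas moving a single keypoint relative to its neighbors raises both. This is one plausible reading of why a position-based indicator and a local-shape indicator might be used together; it is not asserted to be the method of the application.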