CN110581974A - face picture improving method, user terminal and computer readable storage medium - Google Patents


Info

Publication number
CN110581974A
CN110581974A (application CN201810576986.5A, granted publication CN110581974B)
Authority
CN
China
Prior art keywords
face
image data
video frame
video
missing
Prior art date
Legal status
Granted
Application number
CN201810576986.5A
Other languages
Chinese (zh)
Other versions
CN110581974B (en)
Inventor
李青
史敏锐
李东亮
章波焕
Current Assignee
China Telecom Corp Ltd
Original Assignee
China Telecom Corp Ltd
Priority date
Application filed by China Telecom Corp Ltd filed Critical China Telecom Corp Ltd
Priority to CN201810576986.5A priority Critical patent/CN110581974B/en
Publication of CN110581974A publication Critical patent/CN110581974A/en
Application granted granted Critical
Publication of CN110581974B publication Critical patent/CN110581974B/en
Legal status: Active (granted)

Classifications

    • G06T 5/00: Image enhancement or restoration
    • G06V 40/168: Human faces; feature extraction; face representation
    • G06V 40/172: Human faces; classification, e.g. identification
    • G06V 40/174: Facial expression recognition
    • G06V 40/179: Human faces; metadata-assisted face recognition
    • H04N 21/4788: Supplemental services communicating with other users, e.g. chatting
    • H04N 7/141: Systems for two-way working between two video terminals, e.g. videophone
    • G06T 2207/10016: Video; image sequence
    • G06T 2207/30201: Subject of image: face


Abstract

The invention discloses a face picture improvement method, a user terminal, and a computer-readable storage medium. The method comprises: during a video call, collecting facial feature parameters while the call image is clear; caching the current video frame; and, when face image data is missing, compensating for and restoring the missing face image data according to the cached video frame data and the collected facial feature parameters. The invention solves the problem of mosaic artifacts on the face picture caused by discontinuous frame loss in the video data: the receiving terminal can compensate for and recover image data lost in key areas of the face, thereby improving the quality of the video call service.

Description

Face picture improving method, user terminal and computer readable storage medium
Technical Field
The present invention relates to the field of mobile communication technologies, and in particular, to a method for improving a face image, a user terminal, and a computer-readable storage medium.
Background
VoLTE (Voice over LTE, the IMS-based voice service) is a high-definition audio and video communication service that telecom operators provide to users over 4G+ networks. As a complement to VoLTE, RCS (Rich Communication Services) provides multimedia instant messaging such as text, picture, video, and file sharing, as well as VoIP (Voice over Internet Protocol) and other services; together, the two accelerate the upgrading of operators' basic communication services.
Disclosure of Invention
The applicant found the following: to guarantee audio and video call quality, the GSMA (GSM Association) has introduced semi-persistent scheduling (SPS), TTI bundling, robust header compression (RoHC), discontinuous reception (DRX), and related technologies into the VoLTE standard to improve air-interface transmission quality and save power.
These technologies improve wireless transmission quality, but problems remain: in dense areas such as urban high-rise districts, exhibition halls, and conference venues, and in confined indoor spaces such as subways and tunnels, network factors such as high-frequency radio coverage and network capacity can blur the face picture in a video call and degrade the service experience.
In view of the above technical problems, the present invention provides a face picture improvement method, a user terminal and a computer readable storage medium, which can compensate and recover image data lost in key areas of a face on a picture, and improve the quality of video call service.
According to one aspect of the present invention, a face picture improvement method is provided, comprising:
collecting, during a video call, facial feature parameters while the call image is clear;
caching the current video frame;
and, when face image data is missing, compensating for and restoring the missing face image data according to the cached video frame data and the collected facial feature parameters.
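The three claimed steps above can be sketched as a minimal receive-side loop. All class, method, and field names, and the dict-based frame representation, are illustrative assumptions rather than the patent's implementation:

```python
class FacePictureImprover:
    """Minimal sketch of the three claimed steps; not the patent's algorithm."""

    def __init__(self):
        self.face_features = None  # parameters captured while the image was clear
        self.prev_frame = None     # cache of the most recent video frame

    def on_frame(self, frame, is_clear, face_data_missing):
        # Step 1: collect facial feature parameters while the image is clear.
        if is_clear:
            self.face_features = self.collect_features(frame)
        # Step 3: compensate using the cached frame and collected parameters.
        if face_data_missing and self.prev_frame is not None and self.face_features is not None:
            frame = self.restore_face(frame, self.prev_frame, self.face_features)
        # Step 2: cache the current video frame for the next iteration.
        self.prev_frame = frame
        return frame

    def collect_features(self, frame):
        # Placeholder: a real implementation would extract facial key points.
        return {"landmarks": frame.get("landmarks", [])}

    def restore_face(self, frame, prev_frame, features):
        # Placeholder: fill the missing face region from the cached frame.
        repaired = dict(frame)
        repaired["face"] = prev_frame["face"]
        repaired["restored"] = True
        return repaired
```

Here a clear frame seeds the feature store and the cache; a later frame flagged as missing face data is repaired from both.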
In some embodiments of the present invention, the face picture improvement method further includes: acquiring audio features and determining the emotion features of the other party according to the audio features.
In some embodiments of the present invention, compensating for and restoring the missing face image data according to the cached video frame data and the collected facial feature parameters includes:
compensating for and restoring the missing face image data through a facial feature restoration algorithm according to the collected emotion features of the other party, the cached video frame data, and the collected facial feature parameters.
In some embodiments of the present invention, the face picture improvement method further includes:
receiving image data frames of the video call during the call;
identifying and labeling at least one location area in the video image, wherein the location areas include a face area; and then performing the step of collecting the facial feature parameters while the call image is clear.
In some embodiments of the present invention, acquiring the audio features and determining the emotion features of the other party according to the audio features includes:
collecting changes in the other party's audio signal during the call, and determining the other party's emotion features according to those changes, wherein the audio signal includes at least one of the other party's speech rate, pitch, volume, and frequency.
In some embodiments of the present invention, identifying and labeling at least one location area in the video image comprises:
identifying and labeling at least one location area in the video image, and providing tiered, differentiated protection for the different areas.
In some embodiments of the present invention, identifying and labeling at least one location area in the video image comprises:
identifying a face area and setting it to a first priority;
identifying a change area and setting it to a second priority;
and identifying a background area and setting it to a third priority, wherein the first priority is higher than the second priority, which is higher than the third priority.
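The three-level priority scheme just described can be expressed as a small lookup; the numeric encoding is an assumption for the sketch:

```python
# 1 = highest priority; the numeric encoding is an assumption for this sketch.
REGION_PRIORITY = {"face": 1, "change": 2, "background": 3}

def order_by_protection(regions):
    """Return regions sorted so the most-protected (face) areas come first."""
    return sorted(regions, key=lambda r: REGION_PRIORITY[r["type"]])
```

A receiver could then allocate repair or bandwidth resources in this order.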
In some embodiments of the present invention, collecting the facial feature parameters while the call image is clear includes:
collecting a face picture while the call image is clear;
and acquiring facial feature parameters of the key parts of each facial region, wherein the facial feature parameters include at least one of feature information and proportional relationships.
In some embodiments of the invention, after buffering the current video frame, the method further comprises:
determining whether the current video frame is complete;
determining, when the current video frame is incomplete, whether face image data is missing;
performing, when face image data is missing, the step of compensating for and restoring the missing face image data according to the cached video frame data and the collected facial feature parameters;
and performing the step of receiving the image data frames of the video call when no face image data is missing or when the current video frame is complete.
In some embodiments of the invention, buffering the current video frame comprises: buffering the image data of the previous frame.
According to another aspect of the present invention, a user terminal is provided, comprising:
a clear-feature acquisition module, configured to collect, during a video call, facial feature parameters while the call image is clear;
a data caching module, configured to cache the current video frame;
and a data restoration module, configured to compensate for and restore missing face image data according to the cached video frame data and the collected facial feature parameters when face image data is missing.
In some embodiments of the present invention, the user terminal is configured to perform the operations of the face picture improvement method of any of the above embodiments.
According to another aspect of the present invention, a user terminal is provided, comprising:
a memory, configured to store instructions;
and a processor, configured to execute the instructions so that the terminal performs the face picture improvement method of any of the above embodiments.
According to another aspect of the present invention, a computer-readable storage medium is provided, storing computer instructions which, when executed by a processor, implement the face picture improvement method of any of the above embodiments.
The invention solves the problem of mosaic artifacts on the face picture caused by discontinuous frame loss in the video data: the receiving terminal can compensate for and recover image data lost in key areas of the face, thereby improving the quality of the video call service.
Drawings
In order to illustrate the embodiments of the present invention or the prior-art solutions more clearly, the drawings needed in their description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of some embodiments of the face picture improvement method of the present invention.
Fig. 2 is a schematic diagram of another embodiment of the method for improving a human face picture according to the present invention.
Fig. 3a and 3b are schematic diagrams of some further embodiments of the method for improving a human face picture according to the present invention.
Fig. 4 is a diagram of some embodiments of a user terminal of the present invention.
Fig. 5 is a schematic diagram of another embodiment of a user terminal of the present invention.
Fig. 6 is a diagram of a user terminal according to further embodiments of the present invention.
Fig. 7 is a diagram of a user terminal according to further embodiments of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The following description of at least one exemplary embodiment is merely illustrative and in no way limits the invention, its application, or its uses. All other embodiments derived by a person skilled in the art from these embodiments without creative effort shall fall within the protection scope of the present invention.
The relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective portions shown in the drawings are not drawn in an actual proportional relationship for the convenience of description.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Fig. 1 is a schematic diagram of some embodiments of the face picture improvement method of the present invention. Preferably, this embodiment can be executed by the user terminal of the present invention. The method comprises the following steps:
Step 11: during the video call, collect facial feature parameters while the call image is clear.
In some embodiments of the present invention, step 11 may include:
Step 111: collect a face picture while the call image is clear.
Step 112: acquire facial feature parameters of the key parts of each facial region, wherein the facial feature parameters include at least one of feature information and proportional relationships.
In some embodiments of the present invention, the key parts may include the eyebrows, eyes, nose, mouth, chin, cheekbones, facial contour, hairline, and the like.
In some embodiments of the present invention, step 112 may include collecting information for more than 15 key points.
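One plausible form of the "proportional relationship" parameters derived from such key points is a set of scale-invariant distance ratios; the point names and the choice of ratios are illustrative assumptions:

```python
import math

def feature_ratios(points):
    """points: dict of key-point name -> (x, y) pixel coordinates.

    Returns scale-invariant distance ratios, one plausible form of the
    'proportional relationship' parameters mentioned in step 112.
    """
    def dist(a, b):
        (ax, ay), (bx, by) = points[a], points[b]
        return math.hypot(ax - bx, ay - by)

    eye_span = dist("left_eye", "right_eye")  # normalising baseline
    return {
        "eye_to_nose": dist("left_eye", "nose") / eye_span,
        "eye_to_mouth": dist("left_eye", "mouth") / eye_span,
    }
```

Because the ratios are normalised by the inter-eye distance, they stay stable as the face moves closer to or farther from the camera.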
Step 12: buffer the current video frame.
In some embodiments of the present invention, step 12 may comprise buffering the image data of the previous frame, that is, the image data of the video frame of the previous picture.
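A minimal sketch of the frame cache described in step 12; the `depth` parameter is an added assumption (the text caches only the previous frame, i.e. depth 1):

```python
from collections import deque

class FrameCache:
    """Keeps the last `depth` frames; the method described caches one (depth=1)."""

    def __init__(self, depth=1):
        self._frames = deque(maxlen=depth)  # old frames are evicted automatically

    def push(self, frame):
        self._frames.append(frame)

    def last(self):
        return self._frames[-1] if self._frames else None
```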
Step 13: when face image data is missing, input the cached video frame data and the collected facial feature parameters into a preset facial feature restoration algorithm model, and compensate for and restore the missing face image data.
The face picture improvement method of this embodiment improves the image quality of the key face area in a VoLTE/RCS video call: facial feature data is collected while the image is clear and used as input parameters of the facial feature algorithm model, and missing face pictures are supplemented and restored when needed.
The embodiment of the invention solves the problem of mosaic artifacts on the face picture caused by discontinuous frame loss in the video data: the receiving terminal can compensate for and recover image data lost in key areas of the face, thereby improving the quality of the video call service.
Fig. 2 is a schematic diagram of another embodiment of the face picture improvement method of the present invention. Preferably, this embodiment can be executed by the user terminal of the present invention. The method of the Fig. 2 embodiment may include the following steps:
Step 20: the user terminal starts a video call with another user terminal.
Step 21: receive the image data frames of the video call during the call.
Step 22: identify and label at least one location area in the video image, wherein the location areas include a face area.
In some embodiments of the present invention, step 22 may comprise: identifying and labeling at least one location area in the video image, and providing tiered, differentiated protection for the different areas.
In some embodiments of the present invention, the step of identifying and labeling at least one location area in the video image may specifically include:
Step 221: identify the face area and set it to a first priority.
Step 222: identify a change area and set it to a second priority, wherein a change area is an area in which the image at the same position differs from the previous frame by more than a predetermined proportion, for example, a hand waving back and forth.
Step 223: identify the background area and set it to a third priority, wherein the first priority is higher than the second priority, and the second priority is higher than the third priority.
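The change-area test of step 222 can be sketched as a per-block pixel-difference ratio against a threshold; the 0.2 threshold and the flat pixel representation are assumptions:

```python
def is_change_area(block, prev_block, threshold=0.2):
    """True if the fraction of differing pixels exceeds `threshold`.

    `block` and `prev_block` are flat pixel sequences for the same position
    in the current and previous frame; the 0.2 threshold is an assumption.
    """
    changed = sum(1 for a, b in zip(block, prev_block) if a != b)
    return changed / len(block) > threshold
```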
This embodiment of the invention can identify and label different location areas within the same video picture. Furthermore, the above embodiments can apply a tiered, differentiated quality-of-service assurance mechanism that treats some location areas (especially the face area) differently from others (for example, the background area).
Step 23: collect facial feature parameters while the call image is clear. Step 23 of the Fig. 2 embodiment is the same as or similar to step 11 of the Fig. 1 embodiment and is not described again here.
Step 24: acquire audio features and determine the emotion features of the other party according to the audio features.
In some embodiments of the present invention, step 24 may specifically include: collecting changes in the other party's audio signal during the call, and determining the other party's emotion features according to those changes, wherein the audio signal may include at least one of the other party's speech rate, pitch, volume, and frequency, and the emotion features may include normal, smiling, happy, laughing, angry, and the like.
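A rule-based sketch of mapping audio-signal changes to the emotion labels listed above; the thresholds and decision rules are assumptions, not the patent's method:

```python
def classify_emotion(rate_delta, pitch_delta, volume_delta):
    """Map normalised changes in speech rate, pitch, and volume to a coarse
    emotion label. Thresholds and labels are illustrative assumptions."""
    if pitch_delta > 0.3 and volume_delta > 0.3:
        # A large pitch and volume rise: fast speech suggests anger, otherwise laughter.
        return "angry" if rate_delta > 0.2 else "laughing"
    if pitch_delta > 0.1 or volume_delta > 0.1:
        return "smiling"
    return "normal"
```

The label produced here would serve as one input parameter to the facial feature restoration model.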
According to this embodiment of the invention, the other party's emotion features can be judged from the voice and used as input parameters of the facial feature algorithm model, which improves the similarity of the facial features during restoration.
In some embodiments of the present invention, after step 24, the method may further comprise: entering the other party's emotion features and the facial feature parameters collected while the call image was clear into a facial-feature-and-emotion information base.
Step 25: buffer the current video frame.
In some embodiments of the present invention, step 25 may comprise buffering the image data of the previous frame, that is, the image data of the video frame of the previous picture.
Step 26: determine whether face image data is missing. If face image data is missing, perform step 27; otherwise, perform step 21.
In some embodiments of the present invention, step 26 may include:
Step 261: determine whether the current video frame is complete.
Step 262: when the network quality is poor and a complete image data frame has not been received, determine whether the missing data relates to the face image.
Step 263: if the missing data relates to the face image, determine that face image data is missing and perform step 27.
Step 264: if the missing data does not relate to the face image, or if the current video frame is complete, determine that face image data is not missing and perform step 21.
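The branching of steps 261 to 264 can be summarised as a small decision function; the string return values are illustrative labels:

```python
def next_step(frame_complete, missing_includes_face):
    """Branching of steps 261-264; return values name the step to run next."""
    if frame_complete:
        return "step 21: receive next frame"       # step 261 -> 264
    if missing_includes_face:
        return "step 27: compensate and restore"   # step 262 -> 263
    return "step 21: receive next frame"           # step 262 -> 264
```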
Step 27: extract the collected emotion features of the other party and the collected facial feature parameters from the facial-feature-and-emotion information base, and input them, together with the cached video frame data, into the preset facial feature restoration algorithm model.
Step 28: compensate for and restore the currently missing face image data through the facial feature restoration algorithm according to the collected emotion features of the other party, the cached video frame data, and the collected facial feature parameters.
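A hedged sketch of the compensation in steps 27 and 28, combining the cached frame, the stored feature parameters, and the emotion label; all data structures are illustrative:

```python
def compensate(missing_frame, cached_frame, features, emotion):
    """Combine the cached frame's face pixels, the stored key-point geometry,
    and the current emotion label. All structures are illustrative."""
    restored = dict(missing_frame)
    restored["face"] = {
        "pixels": cached_frame["face"],      # start from the last good face
        "landmarks": features["landmarks"],  # re-impose the key-point geometry
        "expression": emotion,               # bias toward the detected emotion
    }
    return restored
```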
In the above embodiment of the invention, during a video call the terminal can identify the area of the picture where the face is located, collect facial feature-point information, judge emotional fluctuations from the audio signal, and cache the previous frame of picture data. When the network transmission quality during the call is poor (a video-frame-loss condition is detected), the image data of the face portion in the lost data frame can be automatically reconstructed according to the facial feature model algorithm, which solves the problem of face blurring caused by unstable network quality during video calls and improves the video telephony service experience.
Fig. 3a and 3b are schematic diagrams of some further embodiments of the method for improving a human face picture according to the present invention. Wherein fig. 3b gives a specific example of fig. 3 a. Preferably, this embodiment can be executed by the user terminal of the present invention. The method of the embodiment of fig. 3a and 3b may comprise the steps of:
Step 31: collect facial feature data while the image is clear, use it as input parameters of the facial feature algorithm model, and supplement and restore missing face pictures when needed.
Step 32: judge the other party's emotion features from the voice and use them as input parameters of the facial feature algorithm model, which improves the similarity of the facial features during image restoration.
Step 33: buffer the current video frame.
In some embodiments of the present invention, step 33 may comprise buffering the image data of the previous frame, that is, the image data of the video frame of the previous picture.
Step 34: compensate for and restore the currently missing face image data through the facial feature restoration algorithm according to the collected emotion features of the other party, the cached video frame data of the previous frame, and the collected facial feature parameters.
During an audio/video call, the user terminal of this embodiment can capture the key facial features while the face image is clear, record the change features of audio signals such as speech rate, pitch, and frequency, and judge the other party's emotion in real time. Meanwhile, the above embodiment buffers the previous video data frame. When a received video frame is incomplete, the missing face picture in the image can be compensated for and restored in real time from the previously captured key facial features and the current emotion information, combined with the facial feature model algorithm and the cached data frame.
The embodiment of the invention can be applied to point-to-point person-to-person video telephony scenarios such as VoLTE/RCS.
With the above embodiments, when wireless signal coverage is poor or network transmission is interrupted, the user terminal can guarantee the image quality of the face portion of the video picture, improving the fluency and integrity of the video telephony service picture and the service experience. The embodiment of the invention can be applied to various VoLTE and RCS point-to-point mobile video telephony scenarios.
Fig. 4 is a diagram of some embodiments of a user terminal of the present invention. As shown in Fig. 4, the user terminal may include a clear-feature acquisition module 41, a data caching module 42, and a data restoration module 43, where:
the clear-feature acquisition module 41 is configured to collect, during a video call, facial feature parameters while the call image is clear;
the data caching module 42 is configured to cache the current video frame;
and the data restoration module 43 is configured to, when face image data is missing, compensate for and restore the missing face image data through a facial feature restoration algorithm according to the cached video frame data and the collected facial feature parameters.
In some embodiments of the present invention, the user terminal is configured to perform the operations of the face picture improvement method of any of the above embodiments (for example, any of Figs. 1 to 3).
With the user terminal provided by this embodiment of the invention, facial feature data can be collected while the image is clear and used as input parameters of the facial feature algorithm model, and missing face pictures can be supplemented and restored when needed.
The embodiment of the invention solves the problem of mosaic artifacts on the face picture caused by discontinuous frame loss in the video data: the receiving terminal (user terminal) can compensate for and recover image data lost in key areas of the face, thereby improving the quality of the video call service.
Fig. 5 is a schematic diagram of another embodiment of a user terminal of the present invention. Compared with the Fig. 4 embodiment, in the Fig. 5 embodiment the user terminal may further include an emotion feature acquisition module 44, where:
the emotion feature acquisition module 44 is configured to acquire audio features and determine the other party's emotion features according to the audio features.
In some embodiments of the present invention, the emotion feature acquisition module 44 may specifically be configured to collect changes in the other party's audio signal during the call and determine the other party's emotion features according to those changes, wherein the audio signal includes at least one of the other party's speech rate, pitch, volume, and frequency.
The data restoration module 43 may also be configured to compensate for and restore missing face image data through the facial feature restoration algorithm according to the collected emotion features of the other party, the cached video frame data, and the collected facial feature parameters.
According to this embodiment of the invention, the missing face image can be compensated for and restored using the facial feature model algorithm according to the key facial feature parameters collected from the video signal, the other party's emotion feature parameters collected from the audio signal, and the cached image data of the previous frame, which solves the problem of blurred face pictures when network quality is poor during a video call.
Fig. 6 is a diagram of a user terminal according to further embodiments of the present invention. Compared with the embodiment of fig. 5, in the embodiment of fig. 6, the user terminal may further include a data receiving module 45 and an area identifying module 46, where:
and the data receiving module 45 is configured to receive an image data frame of a video call in the video call process.
The region identification module 46 is configured to identify and label at least one position region in the video image, where the position regions include a face region.
The above embodiment of the present invention can identify and label different position regions within the same video picture. Furthermore, it can provide a hierarchical, differentiated quality-of-service guarantee mechanism that treats certain position regions (in particular, the face region) differently from other position regions (for example, background regions).
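A minimal sketch of how labeled regions might be ordered for such differentiated treatment, using the three-level scheme the embodiments describe (face, changed region, background); the `Region` type, the detector output format, and the function names are hypothetical:

```python
from dataclasses import dataclass

# Hypothetical priority scheme: protect face regions first, changed
# regions next, and static background last.
PRIORITY = {"face": 1, "changed": 2, "background": 3}

@dataclass
class Region:
    kind: str        # "face", "changed", or "background"
    box: tuple       # (x, y, width, height) in the video picture
    priority: int    # 1 = highest quality-of-service protection

def label_regions(detections):
    """Turn raw (kind, box) detector output into regions sorted so that
    higher-priority regions (faces) come first for later QoS stages."""
    regions = [Region(kind, box, PRIORITY[kind]) for kind, box in detections]
    return sorted(regions, key=lambda r: r.priority)

regions = label_regions([
    ("background", (0, 0, 640, 480)),
    ("face", (200, 80, 120, 160)),
    ("changed", (100, 300, 80, 60)),
])
print([r.kind for r in regions])  # prints ['face', 'changed', 'background']
```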
Fig. 7 is a diagram of a user terminal according to further embodiments of the present invention. As shown in fig. 7, the user terminal may include a memory 71 and a processor 72, wherein:
The memory 71 is configured to store instructions.
The processor 72 is configured to execute the instructions, so that the user terminal performs operations implementing the face picture improvement method of any of the above embodiments (for example, any of figs. 1 to 3).
By applying the user terminal of the above embodiments under conditions of poor wireless signal coverage and intermittent network transmission, the image quality of the face portion of the picture in video communication can be ensured, the fluency and integrity of the video telephony picture can be improved, and the service experience can be enhanced. The embodiments of the present invention are applicable to various VoLTE and RCS point-to-point mobile video telephony scenarios.
According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions which, when executed by a processor, implement the face picture improvement method of any of the above embodiments (for example, any of figs. 1 to 3).
Based on the computer-readable storage medium provided by the above embodiment of the present invention, during a video call the terminal can identify the region of the video picture where the face is located, collect facial feature point information, judge emotion fluctuations from the audio signal, and cache the previous frame of picture data. When the network transmission quality during the video call is poor (that is, when video frame loss is detected), the image data of the face portion in the lost data frame can be automatically reconstructed according to a face feature model algorithm, alleviating the face blurring caused by intermittent network quality during the video call and improving the video telephony service experience.
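The decision flow this paragraph describes (cache complete frames; run restoration only when an incomplete frame is missing face data) might be sketched as below; the class and method names are hypothetical, not disclosed by the patent:

```python
class FaceFrameGuard:
    """Minimal receiver-side state for the described flow: cache the last
    complete video frame and decide when face restoration should run."""

    def __init__(self):
        self.cached = None  # last complete frame, kept for compensation

    def handle_frame(self, frame, complete, face_missing):
        if complete:
            self.cached = frame          # only complete frames are cached
            return "display"
        if face_missing and self.cached is not None:
            # An incomplete frame lost face data: compensate from the cache
            # (plus the collected feature parameters, not shown here).
            return "restore_face"
        return "display_as_is"           # loss outside the face region

guard = FaceFrameGuard()
print(guard.handle_frame("frame-1", complete=True, face_missing=False))   # display
print(guard.handle_frame("frame-2", complete=False, face_missing=True))   # restore_face
print(guard.handle_frame("frame-3", complete=False, face_missing=False))  # display_as_is
```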
The user terminals described above may be implemented as a general-purpose processor, a programmable logic controller (PLC), a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any suitable combination thereof, for performing the functions described herein.
Thus far, the present invention has been described in detail. Some details well known in the art have not been described in order to avoid obscuring the concepts of the present invention. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The description of the present invention has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to practitioners skilled in this art. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, and to enable others of ordinary skill in the art to understand the invention in its various embodiments with various modifications as are suited to the particular use contemplated.

Claims (13)

1. A face picture improvement method is characterized by comprising the following steps:
in the video call process, collecting facial feature parameters under the condition that the call image is clear;
caching a current video frame; and
under the condition that face image data is missing, compensating for and restoring the missing face image data through a facial feature restoration algorithm according to the cached video frame data and the collected facial feature parameters.
2. The face picture improvement method according to claim 1, further comprising:
collecting audio features, and determining emotion features of the opposite party according to the audio features;
wherein compensating for and restoring the missing face image data through a facial feature restoration algorithm according to the cached video frame data and the collected facial feature parameters comprises:
compensating for and restoring the missing face image data through a facial feature restoration algorithm according to the collected emotion features of the opposite party, the cached video frame data, and the collected facial feature parameters.
3. The face picture improvement method according to claim 2, wherein collecting audio features and determining the emotion features of the opposite party according to the audio features comprises:
collecting changes in the audio signal of the opposite party during the call, and determining the emotion features of the opposite party according to those changes, wherein the audio signal of the opposite party includes at least one of the speech rate, tone, volume, and audio frequency of the opposite party.
4. The face picture improvement method according to any one of claims 1 to 3, further comprising:
receiving image data frames of the video call in the video call process; and
identifying and labeling at least one position region in a video image, wherein the position regions comprise a face region, and then performing the step of collecting facial feature parameters under the condition that the call image is clear.
5. The face picture improvement method according to claim 4, wherein the identifying and labeling at least one position region in the video image comprises:
identifying and labeling at least one position region in the video image, and providing graded guarantees for the different regions.
6. The face picture improvement method according to claim 5, wherein the identifying and labeling at least one position region in the video image comprises:
identifying a face region and setting the face region to a first priority;
identifying a changed region and setting the changed region to a second priority; and
identifying a background region and setting the background region to a third priority, wherein the first priority is higher than the second priority, and the second priority is higher than the third priority.
7. The face picture improvement method according to any one of claims 1 to 3, wherein collecting facial feature parameters under the condition that the call image is clear comprises:
collecting a face picture under the condition that the call image is clear; and
obtaining facial feature parameters of the key parts of each region of the face, wherein the facial feature parameters include at least one of feature information and a proportional relation.
8. The face picture improvement method according to any one of claims 1 to 3, wherein after caching the current video frame, the method further comprises:
judging whether the current video frame is complete;
judging whether face image data is missing in the case that the current video frame is incomplete;
in the case that face image data is missing, performing the step of compensating for and restoring the missing face image data through a facial feature restoration algorithm according to the cached video frame data and the collected facial feature parameters; and
performing the step of receiving image data frames of the video call in the case that face image data is not missing or the current video frame is complete.
9. The face picture improvement method according to any one of claims 1 to 3, wherein caching the current video frame comprises:
caching the image data of the previous frame.
10. A user terminal, comprising:
a clear feature acquisition module configured to collect facial feature parameters under the condition that the call image is clear during a video call;
a data caching module configured to cache a current video frame; and
a data restoration module configured to, under the condition that face image data is missing, compensate for and restore the missing face image data according to the cached video frame data and the collected facial feature parameters.
11. The user terminal according to claim 10, wherein the user terminal is configured to perform operations for implementing the face picture improvement method according to any one of claims 1 to 9.
12. A user terminal, comprising:
a memory configured to store instructions; and
a processor configured to execute the instructions, so that the user terminal performs operations implementing the face picture improvement method according to any one of claims 1 to 9.
13. A computer-readable storage medium storing computer instructions which, when executed by a processor, implement the face picture improvement method according to any one of claims 1 to 9.
CN201810576986.5A 2018-06-07 2018-06-07 Face picture improving method, user terminal and computer readable storage medium Active CN110581974B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810576986.5A CN110581974B (en) 2018-06-07 2018-06-07 Face picture improving method, user terminal and computer readable storage medium


Publications (2)

Publication Number Publication Date
CN110581974A true CN110581974A (en) 2019-12-17
CN110581974B CN110581974B (en) 2021-04-02

Family

ID=68809184

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810576986.5A Active CN110581974B (en) 2018-06-07 2018-06-07 Face picture improving method, user terminal and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN110581974B (en)


Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101360246A (en) * 2008-09-09 2009-02-04 西南交通大学 Video error masking method combined with 3D human face model
US20110141258A1 (en) * 2007-02-16 2011-06-16 Industrial Technology Research Institute Emotion recognition method and system thereof
CN103218842A (en) * 2013-03-12 2013-07-24 西南交通大学 Voice synchronous-drive three-dimensional face mouth shape and face posture animation method
JP5662254B2 (en) * 2011-06-08 2015-01-28 威雄 相野谷 Audio output device
US20160255305A1 (en) * 2006-02-15 2016-09-01 Kurtis John Ritchey Non-Interference Field-of-view Support Apparatus for a Panoramic Sensor
CN106485774A (en) * 2016-12-30 2017-03-08 当家移动绿色互联网技术集团有限公司 Expression based on voice Real Time Drive person model and the method for attitude
CN106663186A (en) * 2014-07-28 2017-05-10 北京市商汤科技开发有限公司 A method for face recognition and a system thereof


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114500912A (en) * 2022-02-23 2022-05-13 联想(北京)有限公司 Call processing method, electronic device and storage medium
CN117474807A (en) * 2023-12-27 2024-01-30 科大讯飞股份有限公司 Image restoration method, device, equipment and storage medium
CN117474807B (en) * 2023-12-27 2024-05-31 科大讯飞股份有限公司 Image restoration method, device, equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant