WO2021042513A1 - Method, apparatus, computer device and storage medium for adding expressions in a video chat - Google Patents

Method, apparatus, computer device and storage medium for adding expressions in a video chat

Info

Publication number
WO2021042513A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
facial
images
client
emotional state
Prior art date
Application number
PCT/CN2019/116756
Other languages
English (en)
French (fr)
Inventor
陈爽
黄秋凤
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021042513A1 publication Critical patent/WO2021042513A1/zh

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/141Systems for two-way working between two video terminals, e.g. videophone
    • H04N7/147Communication arrangements, e.g. identifying the communication as a video-communication, intermediate storage of the signals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/60Network streaming of media packets
    • H04L65/75Media network packet handling

Definitions

  • The embodiments of the present application relate to the financial field, and in particular to a method, an apparatus, a computer device, and a storage medium for adding expressions in a video chat.
  • At present, video calls can support one-to-one calls as well as multi-party calls. When a video call is made, one party initiates a session request, the other party responds, and the two parties establish a video call connection.
  • The video capture modules of the two parties collect their respective images and transmit them to the other party, while the audio capture modules collect their respective voice signals and send them to the other party. In this way, both parties can see each other's image and communicate in real time by voice.
  • The inventor has realized that current video calls merely transmit the video stream and the audio stream to the other party for playback; the content is monotonous and lacks interest.
  • To solve the above technical problem, the embodiments of the present application provide a method, an apparatus, a computer device, and a storage medium for adding expressions in a video chat.
  • One technical solution adopted in the embodiments of the present application is to provide a method for adding expressions in a video chat, which includes the following steps: obtaining a facial video of a first client user during a video call; determining the emotional state of the user according to the facial video; and selecting, from a preset animation (motion-effect) database, an animation design that matches the emotional state, and adding the animation design to the facial video for display on a second client.
  • To solve the above technical problem, an embodiment of the present application also provides an apparatus for adding expressions in a video chat, including: an acquisition module, configured to acquire a facial video of a first client user during a video call; a processing module, configured to determine the emotional state of the user according to the facial video; and an execution module, configured to select, from a preset animation database, an animation design that matches the emotional state, and to add the animation design to the facial video for display on a second client.
  • To solve the above technical problem, an embodiment of the present application further provides a computer device, including a memory and a processor, the memory storing computer-readable instructions. When the computer-readable instructions are executed by the processor, the processor performs the steps of a method for adding expressions in a video chat, the method including: obtaining a facial video of a first client user during a video call; determining the emotional state of the user according to the facial video; and selecting, from a preset animation database, an animation design that matches the emotional state, and adding the animation design to the facial video for display on a second client.
  • To solve the above technical problem, the embodiments of the present application also provide a non-volatile storage medium storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of a method for adding expressions in a video chat, the method including: obtaining a facial video of a first client user during a video call; determining the emotional state of the user according to the facial video; and selecting, from a preset animation database, an animation design that matches the emotional state, and adding the animation design to the facial video for display on a second client.
  • The embodiments of the present application capture facial images during a video call, recognize the emotion shown in the facial images, and match an animation design to that emotion. Because the user's emotion is recognized from facial expressions, the matching is accurate; the approach also avoids the matching errors or failures that occur when the network is slow, the voice is quiet, or speech is unclear.
  • FIG. 1 is a schematic flow diagram of a method for adding expressions in a video chat provided by an embodiment of the present application;
  • FIG. 2 is a schematic flow diagram of a method for obtaining, from a server, the facial video of a first client user during a video call, provided by an embodiment of the present application;
  • FIG. 3 is a schematic flow diagram of a method for determining the facial video of the first client user from multiple video images, provided by an embodiment of the present application;
  • FIG. 4 is a schematic flow diagram of a method for determining a user's emotional state from a facial video, provided by an embodiment of the present application;
  • FIG. 5 is a schematic flow diagram of a method for adding an animation design to a facial video, provided by an embodiment of the present application;
  • FIG. 6 is a schematic flow diagram of a method for adding an animation design, provided by an embodiment of the present application;
  • FIG. 7 is a schematic flow diagram of another method for adding an animation design, provided by an embodiment of the present application;
  • FIG. 8 is a basic structural block diagram of an apparatus for adding expressions in a video chat provided by an embodiment of the present application;
  • FIG. 9 is a basic structural block diagram of a computer device provided by an embodiment of the present application.
  • Referring to FIG. 1, FIG. 1 is a schematic flow diagram of the method for adding expressions in a video chat according to this embodiment.
  • As shown in FIG. 1, the method for adding expressions in a video chat includes the following steps:
  • S1100: Obtain a facial video of the first client user during a video call.
  • In practical applications, users talk face to face during a video call to enhance interaction, but because of communication-signal or conversational issues the picture may at times contain no face. The second client therefore obtains the video stream sent by the first client from the server, captures image frames from the stream at a preset time interval, and identifies each captured frame to determine whether it is a facial image. When a frame is a facial image, video data is captured from that time point onward until a captured frame is a non-facial image, yielding the user's facial video.
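  • As a concrete illustration of this interception step, the sketch below samples frames from a recorded stream at an assumed 0.2 s interval and flags whether each sampled frame contains a face. The patent does not name a face detector, so OpenCV's bundled Haar cascade, the interval value, and the helper name are illustrative assumptions rather than the claimed implementation.

```python
# Sketch only: sample frames from a saved video stream at a preset interval and
# flag whether each sampled frame contains a face. OpenCV's bundled Haar cascade
# stands in for the patent's unspecified face-recognition model.
import cv2

FIRST_PRESET_INTERVAL_S = 0.2  # assumed value of the "first preset time interval"

def sample_face_flags(video_path: str, interval_s: float = FIRST_PRESET_INTERVAL_S):
    """Return a list of (timestamp_seconds, contains_face) for sampled frames."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(1, int(round(fps * interval_s)))  # frames between samples
    flags, frame_idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % step == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            flags.append((frame_idx / fps, len(faces) > 0))
        frame_idx += 1
    cap.release()
    return flags
```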
  • It should be noted that, in this embodiment, the first client is the client that sends the video stream and the second client is the client that receives it. In practice, because both clients send and receive video streams at the same time during a call, each client is simultaneously a first client and a second client.
  • S1200: Determine the emotional state of the user according to the facial video.
  • The image frames captured from the facial video are input, in capture order, into an emotion recognition model pre-trained to convergence; the model outputs a classification value from which the emotional state of each frame is determined. To enhance interest, multiple emotional states can be defined, for example happy, funny, laughing, eye-rolling, pursed-lip smiling, disdainful, contemptuous, sad, calm, and so on.
  • The facial-expression sample images used for training can include various micro-expressions, for example squinting smiles, pursed-lip smiles, eye rolls, and so on. A convolutional neural network model is trained on these sample images until the trained model converges.
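  • A minimal training sketch for such a model is shown below, assuming PyTorch, 64x64 RGB crops, and an illustrative label set; the patent does not specify the network architecture, input size, or convergence criterion, so all of those choices are assumptions.

```python
# Sketch only: train a small CNN on labelled micro-expression sample images until the
# epoch loss plateaus ("trained to convergence"). Architecture and labels are assumed.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader

EMOTIONS = ["happy", "funny", "laughing", "eye_roll", "pursed_smile",
            "disdain", "contempt", "sad", "calm"]

class EmotionCNN(nn.Module):
    def __init__(self, num_classes=len(EMOTIONS)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # expects 64x64 inputs

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

def train_to_convergence(dataset, epochs=50, tol=1e-3):
    """dataset yields (image tensor of shape (3, 64, 64), integer label)."""
    model = EmotionCNN()
    loader = DataLoader(dataset, batch_size=32, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn, prev = nn.CrossEntropyLoss(), float("inf")
    for _ in range(epochs):
        total = 0.0
        for images, labels in loader:
            opt.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()
            opt.step()
            total += loss.item()
        if abs(prev - total) < tol:   # crude convergence test
            break
        prev = total
    return model
```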
  • In some embodiments, several emotional states may occur during a single video call, so the call can be divided into multiple facial videos according to emotional state, each facial video corresponding to one emotional state. The facial video is divided by emotional state, and the emotional state of the image frames within a facial video is taken as the emotional state of that facial video.
  • For example, suppose the captured image frames are a, b, c, d, e, f, g, with corresponding time points 1 s, 1.2 s, 1.4 s, 1.6 s, 1.8 s, 2 s, and 2.2 s. If the emotional states of a, b, c, d are happy and those of e, f, g are calm, then the facial video spanning the time points 1 s to 1.6 s is determined to have the emotional state happy, and the facial video spanning 1.8 s to 2.2 s is determined to have the emotional state calm.
  • S1300: Select, from a preset animation database, an animation design that matches the emotional state, and add the animation design to the facial video for display on the second client.
  • The animation (motion-effect) database is pre-stored in the second client and contains multiple animation designs classified by emotional-state identification code. An animation design can add an expression to the user's face, for example a wide laughing mouth, smiling eyes, or a shiny gold tooth on the mouth.
  • In practical applications, to make selection easier, an identification code can be set for each emotional state; the animation set corresponding to that code is looked up in the animation database, and one design is chosen from the set.
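  • The lookup by identification code might look like the sketch below, where a plain dictionary stands in for the preset animation database; the emotion codes, design names, and the most-used preference rule are illustrative assumptions, not details taken from the patent.

```python
# Sketch only: map an emotional-state identification code to its animation set and pick
# one design, either at random or preferring the design the user has used most often.
import random
from collections import Counter

MOTION_EFFECT_DB = {
    "happy": ["laughing_mouth", "smiling_eyes", "gold_tooth"],
    "calm":  ["soft_glow"],
}

usage_counter = Counter()   # how many times each design has been applied

def pick_motion_design(emotion_code: str, prefer_most_used: bool = True):
    designs = MOTION_EFFECT_DB.get(emotion_code, [])
    if not designs:
        return None
    if prefer_most_used:
        choice = max(designs, key=lambda d: usage_counter[d])
    else:
        choice = random.choice(designs)
    usage_counter[choice] += 1
    return choice
```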
  • In this embodiment, when adding an animation design, the second client obtains the facial contour size in the video, scales the preset contour size of the selected animation design according to the facial contour size, and overlaps the two, thereby adding the animation design at the second client.
  • In some embodiments, when the video call takes place on a relatively formal occasion where an animation design should not be added, the second client can display prompt information asking the user whether to add the animation design. When a user-triggered cancellation message is received, the original facial video of the first client user is displayed; when a user-triggered confirmation message is received, the animation design is added.
  • In one application scenario, to enhance interest, the second client can receive an animation design triggered by the user of the second client and add that design to the face in the call video. The display interface of the second client shows multiple expressions, including various spoof expressions, and the user triggers one by tapping it.
  • In another application scenario, the first client user can also retouch his or her own facial video or add an animation design to it, and send the processed facial video to the second client through the server. To give the second client user a choice, the server obtains the first client user's original facial video at the same time as the processed facial video. In this case, the second client receives prompt information sent by the server indicating that the first client's video is a processed facial video, sends a request to the server for the original facial video, and receives the first client user's original video from the server for display on the second client.
  • The above method for adding expressions in a video chat captures facial images during the video call, recognizes the emotion shown in the facial images, and matches an animation design to that emotion. Because the user's emotion is recognized from facial expressions, the matching is accurate; the method also avoids the matching errors or failures that occur when the network is slow, the voice is quiet, or speech is unclear.
  • An embodiment of the present application provides a method for obtaining, from the server, the facial video of the first client user during a video call. FIG. 2 is a schematic flow diagram of this method.
  • Specifically, as shown in FIG. 2, step S1100 includes the following steps:
  • S1110: Receive the video stream of the first client sent by the server.
  • The first client is the client that sends the video stream, and the video stream is the video data generated by the user during the video call. During the call, the second client captures the video stream from the server; it can capture segments of the stream at a preset time interval, or capture the complete stream.
  • S1120: Capture multiple video images from the video stream in sequence at a first preset time interval.
  • S1130: Determine the facial video of the first client user from the multiple video images.
  • The first preset time interval is a preconfigured interval. Video images, i.e. video frames, are captured at this interval and each is checked for a face; when a frame is a facial image, video data is captured from that time point onward until a captured frame is a non-facial image, yielding the user's facial video. In practical applications a video call is likely to contain segments that do not include the user's face; this method accurately identifies the portions of the video stream that do contain the user's facial video and avoids errors when expressions are added later.
  • An embodiment of the present application provides a method for determining the facial video of the first client user from multiple video images. FIG. 3 is a schematic flow diagram of this method. Specifically, as shown in FIG. 3, step S1130 includes the following steps:
  • S1131: Judge, in the order in which the video images were captured, whether each of the multiple video images is a face image.
  • In practical applications, a pre-trained face recognition model, for example a neural network model, can be used to judge the captured video images one by one and determine whether each is a face image.
  • S1132: Following that order, determine the face image in the first position of a group of multiple consecutive face images as the first target image, and determine the non-face image adjacent to the face image in the last position of that group as the second target image.
  • S1133: Determine the time points at which the first target image and the second target image were captured as the start time and the end time, respectively, and determine the video between the start time and the end time as the facial video.
  • In this embodiment of the application, the first target image contains a face and the second target image contains no face. It should be noted that a facial video is a video in which every captured video image contains a face. Therefore, when determining the facial video, the captured video images are examined in capture order to find groups of consecutive video images that all contain a face. The first video image of such a consecutive group is determined as the first target image, the non-face image adjacent to the last video image of the group, i.e. the first subsequent image that contains no face, is determined as the second target image, and the time points of the first and second target images are used as the start time and end time for capturing the facial video.
  • It should be noted that, with the facial-video determination method of this embodiment, one or more facial videos can be obtained; in practice, an animation design can be added to each facial video separately. A stretch of video made up of consecutive non-face images may also occur and can simply be left unprocessed. In addition, because video images are captured at time intervals, a facial video may contain non-face images between the captured frames; since such stretches are extremely short, they can be handled with the method described above.
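  • The first-target-image / second-target-image logic of steps S1131 to S1133 can be sketched as follows, reusing the (timestamp, contains_face) pairs from the sampling sketch earlier; the function name and the handling of a face still present at the very end of the stream are assumptions.

```python
# Sketch only: turn (timestamp, contains_face) flags into (start_time, end_time)
# facial-video segments. The first face frame of a run is the "first target image";
# the non-face frame that follows the run is the "second target image".
def facial_video_segments(flags):
    """flags: list of (timestamp, contains_face) in capture order."""
    segments, start = [], None
    last_ts = flags[-1][0] if flags else 0.0
    for ts, has_face in flags:
        if has_face and start is None:
            start = ts                      # first target image -> start time
        elif not has_face and start is not None:
            segments.append((start, ts))    # second target image -> end time
            start = None
    if start is not None:                   # stream ended while a face was still present
        segments.append((start, last_ts))
    return segments

# Usage example: faces from 1.0 s to 1.6 s, none at 1.8 s, faces again from 2.0 s on.
print(facial_video_segments([(1.0, True), (1.2, True), (1.4, True), (1.6, True),
                             (1.8, False), (2.0, True), (2.2, True)]))
# -> [(1.0, 1.8), (2.0, 2.2)]
```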
  • An embodiment of the present application provides a method for determining the user's emotional state from the facial video. FIG. 4 is a schematic flow diagram of this method. Specifically, as shown in FIG. 4, step S1200 includes the following steps:
  • S1210: Capture multiple facial images from the facial video in sequence at a second preset time interval.
  • S1220: Recognize the emotional state of each of the multiple facial images.
  • When recognizing the emotional state of the facial images, the multiple image frames can be input, in capture order, into an emotion recognition model pre-trained to convergence to obtain a classification value, and the emotional state of each image frame is determined from that value. Multiple emotional states can be defined, for example happy, funny, laughing, eye-rolling, pursed-lip smiling, disdainful, contemptuous, sad, calm, and so on.
  • It should be noted that the facial-expression sample images used for training can include various micro-expressions, for example squinting smiles, pursed-lip smiles, eye rolls, and so on. A convolutional neural network model is trained on these sample images until the trained model converges.
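  • A per-frame inference pass over the captured facial images might look like the sketch below; it assumes the EmotionCNN model and EMOTIONS label list from the training sketch above, neither of which is specified by the patent.

```python
# Sketch only: run the converged emotion model over facial images in capture order and
# map each classification value back to an emotional-state label.
import torch

def classify_frames(model, frames):
    """frames: list of (timestamp, tensor of shape (3, 64, 64)) in capture order."""
    model.eval()
    states = []
    with torch.no_grad():
        for ts, image in frames:
            logits = model(image.unsqueeze(0))          # add a batch dimension
            class_value = int(logits.argmax(dim=1))     # the "classification value"
            states.append((ts, EMOTIONS[class_value]))
    return states
```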
  • S1230: Judge whether the number of adjacent facial images with the same emotional state is greater than a preset number.
  • S1240: When it is greater than the preset number, determine the emotional state of the facial video composed of those adjacent facial images as the target emotional state.
  • In practical applications, several emotional states will occur over the course of an entire video call. In this embodiment of the application, the call can be divided into multiple facial videos by emotional state, each facial video corresponding to one emotional state, and the emotional state of the image frames within a facial video is taken as the emotional state of that facial video.
  • For example, suppose the captured image frames are a, b, c, d, e, f, g, with corresponding time points 1 s, 1.2 s, 1.4 s, 1.6 s, 1.8 s, 2 s, and 2.2 s. If the emotional states of a, b, c, d are happy and those of e, f, g are calm, then the facial video spanning the time points 1 s to 1.6 s is determined to have the emotional state happy, and the facial video spanning 1.8 s to 2.2 s is determined to have the emotional state calm.
  • It should be noted that, when dividing facial videos by emotional state, video frames can be captured across the entire video and the facial image in each frame judged for its emotional state; when consecutive frames share the same emotional state, the facial-video determination method described above is used to capture the video with that shared emotional state and determine it as a facial video in that emotional state.
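  • Steps S1230 and S1240 amount to run-length grouping of the per-frame labels with a count threshold; the sketch below shows one way to do it, with an assumed preset count of 3.

```python
# Sketch only: adjacent frames sharing one emotional state only define a facial video's
# target emotion when the run is longer than a preset count (S1230/S1240). The threshold
# value is an assumption.
from itertools import groupby

PRESET_COUNT = 3

def emotion_segments(states, preset_count=PRESET_COUNT):
    """states: list of (timestamp, emotion) in capture order."""
    segments = []
    for emotion, run in groupby(states, key=lambda item: item[1]):
        run = list(run)
        if len(run) > preset_count:                 # S1230: more than the preset number?
            segments.append({"emotion": emotion,    # S1240: target emotional state
                             "start": run[0][0],
                             "end": run[-1][0]})
    return segments

# With the example from the text (a-d happy, e-g calm) and preset_count=3, only the
# happy run (1.0 s to 1.6 s) qualifies; the calm run has exactly 3 frames.
frames = [(1.0, "happy"), (1.2, "happy"), (1.4, "happy"), (1.6, "happy"),
          (1.8, "calm"), (2.0, "calm"), (2.2, "calm")]
print(emotion_segments(frames))
```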
  • An embodiment of the present application provides a method for adding an animation design to a facial video. FIG. 5 is a schematic flow diagram of this method. Specifically, as shown in FIG. 5, step S1300 includes the following steps:
  • S1311: Obtain the face size in the facial video.
  • S1312: Scale the size of the animation design according to the face size.
  • An animation design can add an expression to the user's face, for example a wide laughing mouth, smiling eyes, or a shiny gold tooth on the mouth. To make the animation design fit the face, in this embodiment of the application the size of the animation design is scaled to the size of the face and the scaled design is added to the face image.
  • In some embodiments, the animation design is selected according to a user instruction: the animation design library is displayed in the terminal interface, the user taps a design to send a selection instruction, and on receiving the instruction the terminal adds that design to the face image at the size of the face image.
  • In some embodiments, for convenience, the terminal can select an animation design at random from the animation database for a given emotional state, or select according to user preference, for example choosing the design the user has used most often.
  • S1313: Overlap the scaled animation design with the face image.
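  • Steps S1311 to S1313 can be sketched as a resize-and-alpha-blend over the detected face box, as below; the sticker file, its alpha channel, the assumption that the face box lies fully inside the frame, and the exact placement are illustrative choices rather than details taken from the patent.

```python
# Sketch only: scale a transparent PNG sticker to the detected face size and alpha-blend
# it over the frame (S1311 obtain size, S1312 scale, S1313 overlap).
import cv2
import numpy as np

def add_motion_design(frame, face_box, sticker_path="laughing_mouth.png"):
    """face_box: (x, y, w, h) from a face detector; frame: BGR image (modified in place)."""
    x, y, w, h = face_box
    sticker = cv2.imread(sticker_path, cv2.IMREAD_UNCHANGED)   # BGRA, keeps alpha channel
    sticker = cv2.resize(sticker, (w, h))                      # S1312: scale to face size
    overlay = sticker[:, :, :3].astype(np.float32)
    alpha = sticker[:, :, 3:4].astype(np.float32) / 255.0      # per-pixel opacity
    roi = frame[y:y + h, x:x + w].astype(np.float32)
    blended = alpha * overlay + (1.0 - alpha) * roi            # S1313: overlap with face
    frame[y:y + h, x:x + w] = blended.astype(np.uint8)
    return frame
```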
  • An embodiment of the present application also provides a method for adding an animation design. FIG. 6 is a schematic flow diagram of this method. Specifically, as shown in FIG. 6, after step S1300 the method further includes the following steps:
  • S1321: Receive a first animation design triggered by the user of the second client.
  • S1322: Add the first animation design to the facial video.
  • To enhance interest, the first client can receive an animation design triggered by the user of the second client and add it to the face in the call video. The display interface of the second client shows multiple expressions, including various spoof expressions, and the user triggers one by tapping it.
  • It should be noted that this function can be assigned by permission level. For example, if the second client user has the higher permission level, the animation design shown on the first client follows the design selected by the second client: during a video call between the first client user and the second client user, the second client user, having the higher permission level, selects an animation design to be displayed in the first client's video. Designing the permissions in this way can further increase users' engagement with the software.
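  • One way to gate a remotely triggered design by permission level is sketched below; the message fields and permission levels are invented for illustration, since the patent describes the behaviour only at a high level.

```python
# Sketch only: a handler on the first client that applies a sticker chosen by the second
# client only when the sender's permission level is high enough.
PERMISSION_LEVELS = {"viewer": 0, "member": 1, "admin": 2}

def handle_remote_sticker(message, pending_stickers, required_level="admin"):
    """message: {'from_user': ..., 'permission': ..., 'design_id': ...}"""
    sender_level = PERMISSION_LEVELS.get(message.get("permission"), 0)
    if sender_level >= PERMISSION_LEVELS[required_level]:
        # Queue the design so the local render loop overlays it on the outgoing video.
        pending_stickers.append(message["design_id"])
        return True
    return False
```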
  • An embodiment of the present application also provides another method for adding an animation design. FIG. 7 is a schematic flow diagram of this method. Specifically, as shown in FIG. 7, after step S1300 the method further includes the following steps:
  • S1331: Receive prompt information sent by the server indicating that the video stream of the first client is a processed video stream.
  • The prompt information notifies the user of the second client that the first client's video stream has been processed. For example, when the first client's video stream has been beautified, the prompt indicates that the stream has undergone beautification processing.
  • S1332: Send an acquisition request to the server, the acquisition request being used to obtain the original video stream corresponding to the processed video stream.
  • S1333: Receive the original video stream sent by the server for display on the second client.
  • When the user of the second client does not want to see the processed video, the second client sends the server a request for the original video stream corresponding to the processed video stream and asks the server to send the first client's original video.
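  • The request for the original stream could be as simple as the sketch below; the HTTP endpoint, field names, and use of the requests library are assumptions, since the patent does not define the client-server protocol.

```python
# Sketch only: the second client asks the server for the original stream after being told
# the first client's stream was processed. Endpoint and payload are hypothetical.
import requests

def request_original_stream(server_url, call_id, peer_id):
    resp = requests.post(f"{server_url}/streams/original",      # hypothetical endpoint
                         json={"call_id": call_id, "peer": peer_id},
                         timeout=5)
    resp.raise_for_status()
    return resp.json().get("stream_url")    # where the raw (unprocessed) video can be pulled
```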
  • FIG. 8 is a basic structural block diagram of an apparatus for adding emoticons in a video chat in this embodiment.
  • an apparatus for adding emoticons in a video chat includes: an acquisition module 2100, a processing module 2200, and an execution module 2300.
  • the obtaining module 2100 is used to obtain the facial video of the first client user during a video call
  • the processing module 2200 is used to determine the emotional state of the user according to the facial video
  • The execution module 2300 is configured to select, from a preset animation database, an animation design that matches the emotional state, and to add the animation design to the facial video for display on the second client.
  • The apparatus for adding expressions in a video chat captures facial images during the video call, recognizes the emotion shown in the facial images, and matches an animation design to that emotion. Because the user's emotion is recognized from facial expressions, the matching is accurate; the apparatus also avoids the matching errors or failures that occur when the network is slow, the voice is quiet, or speech is unclear.
  • In some embodiments, the acquisition module includes: a first acquisition sub-module, configured to receive the video stream of the first client sent by the server; a first processing sub-module, configured to capture multiple video images from the video stream in sequence at a first preset time interval; and a first execution sub-module, configured to determine the facial video of the first client user from the multiple video images.
  • In some embodiments, the processing module includes: a second processing sub-module, configured to judge, in the order in which the video images were captured, whether each of the multiple video images is a face image; a third processing sub-module, configured to determine, following that order, the face image in the first position of a group of multiple consecutive face images as the first target image, and to determine the non-face image adjacent to the face image in the last position of that group as the second target image; and a second execution sub-module, configured to determine the time points at which the first target image and the second target image were captured as the start time and the end time, respectively, and to determine the video between the start time and the end time as the facial video.
  • In some embodiments, the execution module includes: a second acquisition sub-module, configured to capture multiple facial images from the facial video in sequence at a second preset time interval; a fourth processing sub-module, configured to recognize the emotional state of each of the multiple facial images; a fifth processing sub-module, configured to judge whether the number of adjacent facial images with the same emotional state is greater than a preset number; and a third execution sub-module, configured to, when the number is greater than the preset number, determine the emotional state of the facial video composed of those adjacent facial images as the target emotional state.
  • In some embodiments, the execution module includes: a third acquisition sub-module, configured to obtain the face size in the facial video; a sixth processing sub-module, configured to scale the size of the animation design according to the face size; and a fourth execution sub-module, configured to overlap the scaled animation design with the face image.
  • In some embodiments, the apparatus further includes: a fourth acquisition sub-module, configured to receive a first animation design triggered by the user of the second client; and a fifth execution sub-module, configured to add the first animation design to the facial video.
  • In some embodiments, the apparatus further includes: a fifth acquisition sub-module, configured to receive prompt information sent by the server indicating that the video stream of the first client is a processed video stream; a seventh processing sub-module, configured to send an acquisition request to the server, the acquisition request being used to obtain the original video stream corresponding to the processed video stream; and a sixth execution sub-module, configured to receive the original video stream sent by the server for display on the second client.
  • FIG. 9 is a block diagram of the basic structure of the computer device in this embodiment.
  • the computer device includes a processor, a storage medium, a memory, and a network interface connected through a system bus.
  • The storage medium of the computer device stores an operating system, a database, and computer-readable instructions; the database can store a sequence of control information. When the computer-readable instructions are executed by the processor, the processor can implement a method for adding expressions in a video chat.
  • In some embodiments, the storage medium can be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or a volatile storage medium such as a random access memory (Random Access Memory, RAM).
  • the processor of the computer equipment is used to provide computing and control capabilities and support the operation of the entire computer equipment.
  • a computer readable instruction may be stored in the memory of the computer device, and when the computer readable instruction is executed by the processor, the processor may execute a method for adding emoticons in a video chat.
  • The method for adding expressions in a video chat includes the following steps: obtaining a facial video of the first client user during a video call; determining the emotional state of the user according to the facial video; and selecting, from a preset animation database, an animation design that matches the emotional state, and adding the animation design to the facial video for display on the second client.
  • the network interface of the computer device is used to connect and communicate with the terminal.
  • Those skilled in the art can understand that the structure shown in FIG. 9 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • the processor is used to execute the specific content of the acquisition module 2100, the processing module 2200, and the execution module 2300 in FIG. 8, and the memory stores the program codes and various data required to execute the above modules.
  • the network interface is used for data transmission between user terminals or servers.
  • the memory in this embodiment stores the program code and data required to execute all the sub-modules in the method of adding emoticons in the video chat, and the server can call the program code and data of the server to perform the functions of all the sub-modules.
  • the computer equipment intercepts the facial images during the video call, recognizes the emotions of the facial images, and matches the motion design according to the emotions.
  • This method can accurately recognize the user's emotions through facial expressions and improve the accuracy of matching.
  • it can also solve the problem of matching errors or inability to match when the network speed is slow, the voice is low or the speech is not clear.
  • The present application also provides a storage medium storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of a method for adding expressions in a video chat, the method including: obtaining a facial video of the first client user during a video call; determining the emotional state of the user according to the facial video; and selecting, from a preset animation database, an animation design that matches the emotional state, and adding the animation design to the facial video for display on the second client.
  • Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing the relevant hardware through a computer program; the computer program can be stored in a computer-readable storage medium and, when executed, can include the processes of the above method embodiments. The aforementioned storage medium can be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (Read-Only Memory, ROM), or a volatile storage medium such as a random access memory (Random Access Memory, RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Information Transfer Between Computers (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The embodiments of the present application disclose a method, an apparatus, a computer device, and a storage medium for adding expressions in a video chat. The method includes the following steps: obtaining a facial video of a first client user during a video call; determining the emotional state of the user according to the facial video; and selecting, from a preset animation (motion-effect) database, an animation design that matches the emotional state, and adding the animation design to the facial video for display on a second client. The method captures facial images during the video call, recognizes the emotion shown in the facial images, and matches an animation design to that emotion; because the user's emotion is recognized from facial expressions, the matching is accurate, and the method also avoids the matching errors or failures that occur when the network is slow, the voice is quiet, or speech is unclear.

Description

视频聊天中添加表情的方法、装置、计算机设备及存储介质
本申请要求于2019年9月3日提交中国专利局、申请号为201910828395.7,发明名称为“视频聊天中添加表情的方法、装置、计算机设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请实施例涉及金融领域,尤其是一种视频聊天中添加表情的方法、装置、计算机设备及存储介质。
背景技术
随着互联网技术的发展,移动终端的普及,视频通话技术越来越受到人们的青睐。
目前,视频通话可以支持一对一的视频通话,也可以支持多方视频通话。在进行视频时,一方发起会话请求,另一方回应后,双方建立视频通话连接,双方视频采集模块采集到双方的图像并传递给对方,同时双方的音频采集模块采集各自的语音信号发送给对方,这样双方都能看到对方的图像,并进行语音实时交流。
发明人意识到,目前的视频通话仅仅是将视频流和音频流传递到对方进行播放,内容单调,缺乏趣味性。
发明内容
本申请实施例提供一种视频聊天中添加表情的方法、装置、计算机设备及存储介质。
为解决上述技术问题,本申请创造的实施例采用的一个技术方案是:提供一种视频聊天中添加表情的方法,包括下述步骤:获取第一客户端用户在视频通话时的面部视频;根据所述面部视频确定所述用户的情绪状态;从预设的动效数据库中选取所述情绪状态相匹配的动效设计,并将所述动效设计添加到所述面部视频中,以在第二客户端进行显示。
为解决上述技术问题,本申请实施例还提供一种视频聊天中添加表情的装置,包括:获取模块,用于获取第一客户端用户在视频通话时的面部视频;处理模块,用于根据所述面部视频确定所述用户的情绪状态;执行模块,用于从预设的动效数据库中选取所述情绪状态相匹配的动效设计,并将所述动效设计添加到所述面部视频中,以在第二客户端进行显示。
为解决上述技术问题,本申请实施例还提供一种计算机设备,包括存储器和处理器,所述存储器中存储有计算机可读指令,所述计算机可读指令被所述处理器执行时,使得所述处理器执行一种视频聊天中添加表情的方法的步骤;其中,所述视频聊天中添加表情的方法包括以下步骤:获取第一客户端用户在视频通话时的面部视频;根据所述面部视频确定所述用户的情绪状态;从预设的动效数据库中选取所述情绪状态相匹配的动效设计,并将所述动效设计添加到所述面部视频中,以在第二客户端进行显示。
为解决上述技术问题,本申请实施例还提供一种存储有计算机可读指令的非易失性存储介质,所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行一种视频聊天中添加表情的方法的步骤;其中,所述视频聊天中添加表情的方法包括以下步骤:获取第一客户端用户在视频通话时的面部视频;根据所述面部视频确定所述用户的情绪状态;从预设的动效数据库中选取所述情绪状态相匹配的动效设计,并将所述动效设计添加到所述面部视频中,以在第二客户端进行显示。
本申请实施例通过截取视频通话过程中的面部图像,对面部图像的情绪进行识别,并根据情绪来匹配动效设计,该方法通过面部表情可以准确的识别用户的情绪,提高匹配的准确度。此外,还可以解决网速慢,声音小或者说话不清楚时出现的匹配错误或者无法匹配的问题。
附图说明
图1为本申请实施例提供的一种视频聊天中添加表情的方法的基本流程示意图;
图2为本申请实施例提供的一种获取服务器发送的第一客户端用户在视频通话时的面部视频的方法的基本流程示意图;
图3为本申请实施例提供的一种根据多个视频图像确定第一客户端用户的面部视频的方法的基本流程示意图;
图4为本申请实施例提供的一种根据面部视频确定用户的情绪状态的方法的基本流程示意图;
图5为本申请实施例提供一种将动效设计添加到面部视频中的方法的基本流程示意图;
图6为本申请实施例提供的一种添加动效设计的方法的基本流程示意图;
图7为本申请实施例提供的另一种添加动效设计的方法的基本流程示意图;
图8为本申请实施例提供的一种视频聊天中添加表情的装置基 本结构框图;
图9为本申请实施例提供的计算机设备基本结构框图。
具体实施方式
具体地,请参阅图1,图1为本实施例视频聊天中添加表情的方法的基本流程示意图。
如图1所示,视频聊天中添加表情的方法包括下述步骤:
S1100、获取第一客户端用户在视频通话时的面部视频;
实际应用中,用户在视频通话过程中,采用面对面的方式进行交谈来增强互动性。但是在通常情况,由于通讯信号或者交流中问题可能会出现画面中没有人像的情况,因此,第二客户端从服务器中获取第一客户端发送的视频流,并按照预设的时间间隔截取视频流中的图像帧,以及对图像帧进行识别判断其是否为面部图像。当该图像帧为面部图像时,以该时间点为起始点截取视频数据直到截取的图像帧为非面部图像时为止,得到用户的面部视频。
需要说明的是,第一客户端为本实施例中发送视频流的客户端,第二客户端为接收视频流的客户端。实际上,在视频通话过程中,由于第一客户端和第二客户端同时发送视频流和接收视频流,所以第一客户端同时也是第二客户端,第二客户端同时也是第一客户端。
S1200、根据面部视频确定用户的情绪状态;
获取该面部视频中截取的多个图像帧,按照图像帧的截取顺序依次将多个图像帧输入到预先训练至收敛的情绪识别模型中得到分类值,按照分类值确定每个图像帧的情绪状态。为了增强趣味性,可以设置多个情绪状态,例如,快乐,搞笑、大笑,翻白眼,抿嘴笑、不屑,鄙视,伤心,平静等等。
其中,面部表情样本图像可以选用各种微表情图像,例如,斜眼笑,抿嘴笑,翻白眼等等。通过面部表情样本图像对卷积神经网络模型进行训练,直至训练后的模型可以收敛为止。
在一些实施方式中,在视频通话时可能会存在多个情绪状态,即将视频通话按照情绪状态可以分为多种面部视频,每一种面部视频为一种情绪状态。可以按照情绪状态划分面部视频,并将该面部视频中图像帧的情绪状态作为该面部视频的情绪状态。
举例说明,截取到的多个图像帧分别为a,b,c,d,e,f,g,其对应的时间点分别为1s,1.2s,1.4s,1.6s,1.8s,2s,2.2s。设a,b,c,d的情绪状态为快乐的情绪状态,e,f,g的情绪状态为平静的情绪状态,因此,确定由时间节点1s到1.6s组成的面部视频的情绪状态为快乐,确定由时间节点1.8s到2.2s组成的面部视频的情绪状态为平静。
S1300、从预设的动效数据库中选取所述情绪状态相匹配的动效设计,并将动效设计添加到面部视频中,以在第二客户端进行显示。
动效数据库预存在第二客户端中,包括按照情绪状态识别码进行分类后的多个动效设计。该动效设计可以是在用户脸上添加表情,例如,增加一个哈哈大笑的嘴,一个笑弯了的眼睛,在嘴上露一颗闪闪发光的金牙等等。
在实际应用中为了便于选取,可以对每种情绪状态设置识别码,通过识别码在动效数据库中查找与识别码对应的动效集合,并从动效集合中任选一种。
本实施例在添加动效设计时,第二客户端获取视频中的面部轮廓尺寸,将选取的动效设计的预设的轮廓尺寸按照面部轮廓尺寸进行缩放,并将二者进行重合,进而实现在第二客户端添加动效设计的目的。
在一些实施方式中,当视频通话为比较正式的场合,不应添加动效设计时,可以在第二客户端中显示提示用户是否添加动效设计的提示信息,当接收用户触发的取消添加消息后,显示第一客户端用户的原始面部视频;当接收到用户触发的确认添加的消息后,添加动效设计。
在一个应用场景中,为了增强趣味性,第二客户端可以接收第二客户端用户触发的动效设计,并将该动效设计添加到通话视频的面部。其中,第二客户端的显示界面中显示多个表情,包括各种恶搞表情,用户通过点击表情来触发。
在一个应用场景中,第一客户端用户也可以对自身的面部视频进行修饰或者添加动效设计,并通过服务器将处理后的面部视频发送给第二客户端,为了便于第二客户端用户的选择,服务器在获取第一客户端用户处理过的面部视频时同时获取第一客户端用户的原始面部视频,因此,在这种情况下:第二客户端接收服务器发送的用于提示第一客户端视频为处理的面部视频的提示信息;向服务器发送原始面部视频的获取请求;接收服务器发送的第一客户端用户的原始视频,以在第二客户端中进行显示。
上述视频聊天中添加表情方法,通过截取视频通话过程中的面部图像,对面部图像的情绪进行识别,并根据情绪来匹配动效设计,该方法通过面部表情可以准确的识别用户的情绪,提高匹配的准确度。此外,还可以解决网速慢,声音小或者说话不清楚时出现的匹配错误或者无法匹配的问题。
本申请实施例提供一种获取服务器发送的第一客户端用户在视频通话时的面部视频的方法,如图2所示,图2为本申请实施例提供的一种获取服务器发送的第一客户端用户在视频通话时的面部视频 的方法的基本流程示意图。
具体地,如图2所示,步骤S1100包括下述步骤:
S1110、接收服务器发送的第一客户端的视频流;
第一客户端为发送视频流的客户端,视频流为用户在视频通话过程中产生的视频数据。在视频通话过程中,第二客户端从服务器中截取视频流。可以按照预设的时间间隔来截取视频流片段,也可以截取完整的视频流。
S1120、按照第一预设时间间隔依次从视频流中截取多个视频图像;
S1130、根据多个视频图像确定第一客户端用户的面部视频。
第一预设时间间隔为预设的时间间隔,通过按照预设时间间隔截取视频图像,即视频帧,并判断视频帧中是否包含面部图像,当该图像帧为面部图像时,以该时间点为起始点截取视频数据直到截取的图像帧为非面部图像时为止,得到用户的面部视频。在实际应用中,在视频通话过程中很可能会产生不包含用户面部图像的视频片段,在这种情况下,利用上述方法,可以准确的确定视频流中包含用户面部视频的数据,避免后续表情添加出错的问题。
本申请实施例提供一种根据多个视频图像确定第一客户端用户的面部视频的方法,如图3所示,图3为本申请实施例提供的一种根据多个视频图像确定第一客户端用户的面部视频的方法的基本流程示意图。
具体地,如图3所示,步骤S1130包括下述步骤:
S1131、按照截取视频图像的顺序依次判断多个视频图像是否为人脸图像;
在实际应用中,可以利用预先训练得到的人脸识别模型依次对截取的视频图像进行判断,以确定其是否为人脸图像。例如,可以利用神经网络模型等。
S1132、按照顺序将包含多个连续的人脸图像组中第一顺序位的人脸图像确定为第一目标图像,以及将与多个连续的人脸图像组中最后顺序位的人脸图像相邻的非人脸图像确定为第二目标图像;
S1133、分别将截取第一目标图像和第二目标图像的时间点确定为起始时刻和终止时刻,以及将起始时刻和终止时刻之间的视频确定为面部视频。
本申请实施例中,第一目标图像包含人脸图像,第二目标图像不包含人脸图像只包含非人脸图像。需要说明的是,面部视频为截取的视频图像中均为包含人脸图像的视频。因此,在确定面部视频时,按照视频图像的截取顺序,确定每个连续视频图像包含人脸图像,只有在这种情况下,将该连续视频图像的第一顺序位的视频图像确定为第 一目标图像,将与连续视频图像中最后顺序位的视频图像相邻的非人脸图像,即不包含人脸图像的视频图像确定为第二目标图像,以及将第一目标图像和第二目标图像的时间点作为起始时刻和终止时刻截取面部视频。
需要说明的是,按照本实施例中面部视频的确定方法,可以得到一个或多个面部视频。在实际应用中,对于多个面部视频,可以分别添加动效设计。本实施例中,还可能会出现多个连续的非人脸图像组成的视频,可以对该视频不作处理;本实施例中,由于是按照时间间隔截取的视频图像,因此,还可能会出现未截取的视频图像中存在非人脸图像的面部视频,在此情况下,由于面部视频中出现的非人脸图像的时间段极短,因此,按照上述方法处理即可。
本申请实施例提供一种根据面部视频确定用户的情绪状态的方法,如图4所示,图4为本申请实施例提供的一种根据面部视频确定用户的情绪状态的方法的基本流程示意图。
具体地,如图4所示,步骤S1200包括下述步骤:
S1210、按照第二预设时间间隔依次从面部视频截取多个面部图像;
S1220、分别识别多个面部图像的情绪状态;
在识别面部图像的情绪状态时,可以按照图像帧的截取顺序依次将多个图像帧输入到预先训练至收敛的情绪识别模型中得到分类值,按照分类值确定每个图像帧的情绪状态。其中,可以设置多个情绪状态,例如,快乐,搞笑、大笑,翻白眼,抿嘴笑、不屑,鄙视,伤心,平静等等。
需要说明的是,面部表情样本图像可以选用各种微表情图像,例如,斜眼笑,抿嘴笑,翻白眼等等。通过面部表情样本图像对卷积神经网络模型进行训练,直至训练后的模型可以收敛为止。
S1230、判断具有相同情绪状态且相邻的面部图像的个数是否大于预设个数;
S1240、当大于预设个数时,将由相邻的多个面部图像组成的面部视频的情绪状态确定为目标情绪状态。
在实际应用中,整个视频通话过程中,会存在多个情绪状态,本申请实施例中,按照情绪状态可以分为多种面部视频,每一种面部视频为一种情绪状态。可以按照情绪状态划分面部视频,并将该面部视频中图像帧的情绪状态作为该面部视频的情绪状态。
举例说明,截取到的多个图像帧分别为a,b,c,d,e,f,g,其对应的时间点分别为1s,1.2s,1.4s,1.6s,1.8s,2s,2.2s。设a,b,c,d的情绪状态为快乐的情绪状态,e,f,g的情绪状态为平静的情绪状态,因此,确定由时间节点1s到1.6s组成的面部视频的情绪状态为快乐,确定由时间节点 1.8s到2.2s组成的面部视频的情绪状态为平静。
需要说明的是,当按照情绪状态划分面部视频时,可以对整个视频进行视频帧截取,并判断每个视频帧中的面部图像拥有相同的情绪状态,当具有相同的情绪状态时,按照面部视频的确定方法截取具有相同情绪状态的视频,并将该视频确定为某种情绪状态下的面部视频。
本申请实施例提供一种将动效设计添加到面部视频中的方法,如图5所示,图5为本申请实施例提供一种将动效设计添加到面部视频中的方法的基本流程示意图。
具体地,如图5所示,步骤S1300包括下述步骤:
S1311、获取面部视频中人脸尺寸;
S1312、按照人脸尺寸将动效设计的尺寸进行缩放;
动效设计可以是在用户脸上添加表情,例如,增加一个哈哈大笑的嘴,一个笑弯了的眼睛,在嘴上露一颗闪闪发光的金牙等等。因此,为了将动效设计与人脸尺寸进行匹配,本申请实施例中,将动效设计的尺寸按照人脸的的尺寸进行缩放,并将动效设计添加到人脸图像中。
在一些实施方式中,在选取动效设计时,可以按照用户的指令进行选取,例如在终端界面中显示动效设计库,用户单击动效设计发送选取指令,终端接收到指令后按照人脸图像尺寸大小将动效设计添加到人脸图像中。
在一些实施方式中,为了提高便利性,终端可以从某一情绪状态的动效数据库中随机选取动效设计,还可以根据用户偏好,例如,按照用户使用某种动效设计的次数,选取次数最多的动效设计。
S1313、将缩放后的动效设计与人脸图像重合。
本申请实施例还提供一种添加动效设计的方法,如图6所示,图6为本申请实施例提供的一种添加动效设计的方法的基本流程示意图。
具体地,如图6所示,步骤S1300之后,还包括下述步骤:
S1321、接收第二客户端用户触发的第一动效设计;
S1322、将第一动效设计添加到面部视频中。
为了增强趣味性,第一客户端可以接收第二客户端用户触发的动效设计,并将该动效设计添加到通话视频的面部。其中,第二客户端的显示界面中显示多个表情,包括各种恶搞表情,用户通过点击表情来触发。需要说明的是,该功能可以按照权限进行分配,例如,第二客户端用户的权限较高,则第一客户端中显示的动效设计则按照第二客户端选择的动效设计进行显示,举例说明,第一客户端用户和第二客户端用户在视频通话过程中,第二客户端用户的权限较高,则其选取某中动效设计在第一客户端的视频中进行显示。通过对权限进行设计,还可以进一步增加用户的对该软件的使用率。
本申请实施例还提供另一种添加动效设计的方法,如图7所示, 图7为本申请实施例提供的另一种添加动效设计的方法的基本流程示意图。
具体地,如图7所示,步骤S1300之后,还包括下述步骤:
S1331、接收服务器发送的用于提示第一客户端的视频流为已处理的视频流的提示信息;
提示信息为用于向第二客户端用户提示第一客户端的视频流已经处理。例如,当第一客户端的视频流为已经美颜的视频,则提示信息提示该视频流已经经过美颜处理。
S1332、向服务器发送获取请求,其中,获取请求用于获取所述已处理的视频流对应的原始视频流;
S1333、接收服务器发送的原始视频流,以在第二客户端进行显示。
当第二客户端用户不希望看到处理过的视频时,第二客户端向服务器发送获取所述已处理的视频流对应的原始视频流的请求并请求服务器发送第一客户端的原始视频。
为解决上述技术问题本申请实施例还提供一种视频聊天中添加表情的装置。具体请参阅图8,图8为本实施例视频聊天中添加表情的装置基本结构框图。
如图8所示,一种视频聊天中添加表情的装置,包括:获取模块2100、处理模块2200和执行模块2300。其中,获取模块2100,用于获取第一客户端用户在视频通话时的面部视频;处理模块2200,用于根据所述面部视频确定所述用户的情绪状态;执行模块2300,用于从预设的动效数据库中选取所述情绪状态相匹配的动效设计,并将所述动效设计添加到所述面部视频中,以在第二客户端进行显示。
视频聊天中添加表情的装置通过截取视频通话过程中的面部图像,对面部图像的情绪进行识别,并根据情绪来匹配动效设计,该方法通过面部表情可以准确的识别用户的情绪,提高匹配的准确度。此外,还可以解决网速慢,声音小或者说话不清楚时出现的匹配错误或者无法匹配的问题。
在一些实施方式中,所述获取模块包括:第一获取子模块,用于接收服务器发送的所述第一客户端的视频流;第一处理子模块,用于按照第一预设时间间隔依次从所述视频流中截取多个视频图像;第一执行子模块,用于根据所述多个视频图像确定所述第一客户端用户的面部视频。
在一些实施方式中,所述处理模块包括:第二处理子模块,用于按照截取视频图像的顺序依次判断所述多个视频图像是否为人脸图像;第三处理子模块,用于按照所述顺序将包含多个连续的人脸图像组中第一顺序位的人脸图像确定为第一目标图像,以及将与所述多个 连续的人脸图像组中最后顺序位的人脸图像相邻的非人脸图像确定为第二目标图像;第二执行子模块,用于分别将截取所述第一目标图像和第二目标图像的时间点确定为起始时刻和终止时刻,以及将所述起始时刻和所述终止时刻之间的视频确定为所述面部视频。
在一些实施方式中,所述执行模块包括:第二获取子模块,用于按照第二预设时间间隔依次从所述面部视频截取多个面部图像;第四处理子模块,用于分别识别所述多个面部图像的情绪状态;第五处理子模块,用于判断具有相同情绪状态且相邻的面部图像的个数是否大于预设个数;第三执行子模块,用于当大于所述预设个数时,将由相邻的多个面部图像组成的面部视频的情绪状态确定为目标情绪状态。
在一些实施方式中,所述执行模块包括:第三获取子模块,用于获取所述面部视频中人脸尺寸;第六处理子模块,用于按照所述人脸尺寸将所述动效设计的尺寸进行缩放;第四执行子模块,用于将缩放后的动效设计与人脸图像重合。
在一些实施方式中,还包括:第四获取子模块,用于接收所述第二客户端用户触发的第一动效设计;第五执行子模块将所述第一动效设计添加到所述面部视频中。
在一些实施方式中,还包括:第五获取子模块,用于接收服务器发送的用于提示所述第一客户端的视频流为已处理的视频流的提示信息;第七处理子模块,用于向所述服务器发送获取请求,其中,所述获取请求用于获取所述已处理的视频流对应的原始视频流;第六执行子模块,用于接收所述服务器发送的原始视频流,以在所述第二客户端进行显示。
为解决上述技术问题,本申请实施例还提供计算机设备。具体请参阅图9,图9为本实施例计算机设备基本结构框图。
如图9所示,计算机设备的内部结构示意图。如图9所示,该计算机设备包括通过系统总线连接的处理器、存储介质、存储器和网络接口。其中,该计算机设备的存储介质存储有操作系统、数据库和计算机可读指令,数据库中可存储有控件信息序列,该计算机可读指令被处理器执行时,可使得处理器实现一种视频聊天中添加表情的方法,在一些实施方式中,所述存储介质可以为为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)等非易失性存储介质,或随机存储记忆体(Random Access Memory,RAM)等易失性存储介质。该计算机设备的处理器用于提供计算和控制能力,支撑整个计算机设备的运行。该计算机设备的存储器中可存储有计算机可读指令,该计算机可读指令被处理器执行时,可使得处理器执行一种视频聊天中添加表情的方法。其中,所述视频聊天中添加表情的方法包括以下步骤:获取第一客户端用户在视频通话时的面部视频;根据所述面部视频确 定所述用户的情绪状态;从预设的动效数据库中选取所述情绪状态相匹配的动效设计,并将所述动效设计添加到所述面部视频中,以在第二客户端进行显示。该计算机设备的网络接口用于与终端连接通信。本领域技术人员可以理解,图9中示出的结构,仅仅是与本申请方案相关的部分结构的框图,并不构成对本申请方案所应用于其上的计算机设备的限定,具体的计算机设备可以包括比图中所示更多或更少的部件,或者组合某些部件,或者具有不同的部件布置。
本实施方式中处理器用于执行图8中获取模块2100、处理模块2200和执行模块2300的具体内容,存储器存储有执行上述模块所需的程序代码和各类数据。网络接口用于向用户终端或服务器之间的数据传输。本实施方式中的存储器存储有视频聊天中添加表情的方法中执行所有子模块所需的程序代码及数据,服务器能够调用服务器的程序代码及数据执行所有子模块的功能。
计算机设备通过截取视频通话过程中的面部图像,对面部图像的情绪进行识别,并根据情绪来匹配动效设计,该方法通过面部表情可以准确的识别用户的情绪,提高匹配的准确度。此外,还可以解决网速慢,声音小或者说话不清楚时出现的匹配错误或者无法匹配的问题。
本申请还提供一种存储有计算机可读指令的存储介质,所述计算机可读指令被一个或多个处理器执行时,使得一个或多个处理器执行一种视频聊天中添加表情的方法的步骤;其中,所述视频聊天中添加表情的方法包括以下步骤:获取第一客户端用户在视频通话时的面部视频;根据所述面部视频确定所述用户的情绪状态;从预设的动效数据库中选取所述情绪状态相匹配的动效设计,并将所述动效设计添加到所述面部视频中,以在第二客户端进行显示。
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机程序来指令相关的硬件来完成,该计算机程序可存储于一计算机可读取存储介质中,该程序在执行时,可包括如上述各方法的实施例的流程。其中,前述的存储介质可为磁碟、光盘、只读存储记忆体(Read-Only Memory,ROM)等非易失性存储介质,或随机存储记忆体(Random Access Memory,RAM)等易失性存储介质。

Claims (20)

  1. A method for adding expressions in a video chat, comprising the following steps:
    obtaining a facial video of a first client user during a video call;
    determining an emotional state of the user according to the facial video;
    selecting, from a preset animation database, an animation design matching the emotional state, and adding the animation design to the facial video for display on a second client.
  2. The method for adding expressions in a video chat according to claim 1, wherein the obtaining a facial video of the first client user during a video call comprises:
    receiving the video stream of the first client sent by a server;
    capturing multiple video images from the video stream in sequence at a first preset time interval;
    determining the facial video of the first client user according to the multiple video images.
  3. The method for adding expressions in a video chat according to claim 2, wherein the determining the facial video of the first client user according to the multiple video images comprises:
    judging, in the order in which the video images were captured, whether each of the multiple video images is a face image;
    determining, following that order, the face image in the first position of a group of multiple consecutive face images as a first target image, and determining the non-face image adjacent to the face image in the last position of the group of multiple consecutive face images as a second target image;
    determining the time points of the first target image and the second target image as a start time and an end time, and determining the video between the start time and the end time as the facial video.
  4. The method for adding expressions in a video chat according to claim 1, wherein the determining the emotional state of the user according to the facial video comprises:
    capturing multiple facial images from the facial video in sequence at a second preset time interval;
    recognizing the emotional state of each of the multiple facial images;
    judging whether the number of adjacent facial images with the same emotional state is greater than a preset number;
    when it is greater than the preset number, determining the emotional state of the facial video composed of the adjacent facial images as a target emotional state.
  5. The method for adding expressions in a video chat according to claim 1, wherein the adding the animation design to the facial video comprises:
    obtaining a face size in the facial video;
    scaling the size of the animation design according to the face size;
    overlapping the scaled animation design with the face image.
  6. The method for adding expressions in a video chat according to any one of claims 1 to 5, further comprising, after the adding the animation design to the facial video:
    receiving a first animation design triggered by a user of the second client;
    adding the first animation design to the facial video.
  7. The method for adding expressions in a video chat according to any one of claims 1 to 5, further comprising, after the adding the animation design to the facial video:
    receiving prompt information sent by a server for indicating that the video stream of the first client is a processed video stream;
    sending an acquisition request to the server, wherein the acquisition request is used to obtain an original video stream corresponding to the processed video stream;
    receiving the original video stream sent by the server for display on the second client.
  8. An apparatus for adding expressions in a video chat, comprising:
    an acquisition module, configured to obtain a facial video of a first client user during a video call;
    a processing module, configured to determine an emotional state of the user according to the facial video;
    an execution module, configured to select, from a preset animation database, an animation design matching the emotional state, and to add the animation design to the facial video for display on a second client.
  9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of a method for adding expressions in a video chat;
    wherein the method for adding expressions in a video chat comprises the following steps:
    obtaining a facial video of a first client user during a video call;
    determining an emotional state of the user according to the facial video;
    selecting, from a preset animation database, an animation design matching the emotional state, and adding the animation design to the facial video for display on a second client.
  10. The computer device according to claim 9, wherein the obtaining a facial video of the first client user during a video call comprises:
    receiving the video stream of the first client sent by a server;
    capturing multiple video images from the video stream in sequence at a first preset time interval;
    determining the facial video of the first client user according to the multiple video images.
  11. The computer device according to claim 10, wherein the determining the facial video of the first client user according to the multiple video images comprises:
    judging, in the order in which the video images were captured, whether each of the multiple video images is a face image;
    determining, following that order, the face image in the first position of a group of multiple consecutive face images as a first target image, and determining the non-face image adjacent to the face image in the last position of the group of multiple consecutive face images as a second target image;
    determining the time points of the first target image and the second target image as a start time and an end time, and determining the video between the start time and the end time as the facial video.
  12. The computer device according to claim 9, wherein the determining the emotional state of the user according to the facial video comprises:
    capturing multiple facial images from the facial video in sequence at a second preset time interval;
    recognizing the emotional state of each of the multiple facial images;
    judging whether the number of adjacent facial images with the same emotional state is greater than a preset number;
    when it is greater than the preset number, determining the emotional state of the facial video composed of the adjacent facial images as a target emotional state.
  13. The computer device according to claim 9, wherein the adding the animation design to the facial video comprises:
    obtaining a face size in the facial video;
    scaling the size of the animation design according to the face size;
    overlapping the scaled animation design with the face image.
  14. The computer device according to any one of claims 9 to 13, wherein, after the adding the animation design to the facial video, the method further comprises:
    receiving a first animation design triggered by a user of the second client;
    adding the first animation design to the facial video.
  15. The computer device according to any one of claims 9 to 13, wherein, after the adding the animation design to the facial video, the method further comprises:
    receiving prompt information sent by a server for indicating that the video stream of the first client is a processed video stream;
    sending an acquisition request to the server, wherein the acquisition request is used to obtain an original video stream corresponding to the processed video stream;
    receiving the original video stream sent by the server for display on the second client.
  16. A non-volatile storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of a method for adding expressions in a video chat;
    wherein the method for adding expressions in a video chat comprises the following steps:
    obtaining a facial video of a first client user during a video call;
    determining an emotional state of the user according to the facial video;
    selecting, from a preset animation database, an animation design matching the emotional state, and adding the animation design to the facial video for display on a second client.
  17. The non-volatile storage medium according to claim 16, wherein the obtaining a facial video of the first client user during a video call comprises:
    receiving the video stream of the first client sent by a server;
    capturing multiple video images from the video stream in sequence at a first preset time interval;
    determining the facial video of the first client user according to the multiple video images.
  18. The non-volatile storage medium according to claim 17, wherein the determining the facial video of the first client user according to the multiple video images comprises:
    judging, in the order in which the video images were captured, whether each of the multiple video images is a face image;
    determining, following that order, the face image in the first position of a group of multiple consecutive face images as a first target image, and determining the non-face image adjacent to the face image in the last position of the group of multiple consecutive face images as a second target image;
    determining the time points of the first target image and the second target image as a start time and an end time, and determining the video between the start time and the end time as the facial video.
  19. The non-volatile storage medium according to claim 16, wherein the determining the emotional state of the user according to the facial video comprises:
    capturing multiple facial images from the facial video in sequence at a second preset time interval;
    recognizing the emotional state of each of the multiple facial images;
    judging whether the number of adjacent facial images with the same emotional state is greater than a preset number;
    when it is greater than the preset number, determining the emotional state of the facial video composed of the adjacent facial images as a target emotional state.
  20. The non-volatile storage medium according to claim 16, wherein the adding the animation design to the facial video comprises:
    obtaining a face size in the facial video;
    scaling the size of the animation design according to the face size;
    overlapping the scaled animation design with the face image.
PCT/CN2019/116756 2019-09-03 2019-11-08 视频聊天中添加表情的方法、装置、计算机设备及存储介质 WO2021042513A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910828395.7 2019-09-03
CN201910828395.7A CN110650306B (zh) 2019-09-03 2019-09-03 视频聊天中添加表情的方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2021042513A1 true WO2021042513A1 (zh) 2021-03-11

Family

ID=69010078

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/116756 WO2021042513A1 (zh) 2019-09-03 2019-11-08 视频聊天中添加表情的方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN110650306B (zh)
WO (1) WO2021042513A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111814540A (zh) * 2020-05-28 2020-10-23 维沃移动通信有限公司 信息显示方法、装置、电子设备和可读存储介质
WO2022001706A1 (en) * 2020-06-29 2022-01-06 Guangdong Oppo Mobile Telecommunications Corp., Ltd. A method and system providing user interactive sticker based video call
CN112422844A (zh) * 2020-09-23 2021-02-26 上海哔哩哔哩科技有限公司 在视频中添加特效的方法、装置、设备及可读存储介质
CN112135083B (zh) * 2020-09-27 2022-09-06 广东小天才科技有限公司 一种视频通话过程中脸舞互动的方法及系统
CN112270733A (zh) * 2020-09-29 2021-01-26 北京五八信息技术有限公司 Ar表情包的生成方法、装置、电子设备及存储介质
CN112565913B (zh) * 2020-11-30 2023-06-20 维沃移动通信有限公司 视频通话方法、装置和电子设备
CN117440123A (zh) * 2022-07-15 2024-01-23 中兴通讯股份有限公司 音视频呼叫方法及装置
CN115426505B (zh) * 2022-11-03 2023-03-24 北京蔚领时代科技有限公司 基于面部捕捉的预设表情特效触发方法及相关设备


Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2004289254A (ja) * 2003-03-19 2004-10-14 Matsushita Electric Ind Co Ltd テレビ電話端末
CN103647922A (zh) * 2013-12-20 2014-03-19 百度在线网络技术(北京)有限公司 虚拟视频通话方法和终端
US9576190B2 (en) * 2015-03-18 2017-02-21 Snap Inc. Emotion recognition in video conferencing
CN104780339A (zh) * 2015-04-16 2015-07-15 美国掌赢信息科技有限公司 一种即时视频中的表情特效动画加载方法和电子设备
CN104902212B (zh) * 2015-04-30 2019-05-10 努比亚技术有限公司 一种视频通信方法及装置
CN106778706A (zh) * 2017-02-08 2017-05-31 康梅 一种基于表情识别的实时假面视频展示方法
CN108399358B (zh) * 2018-01-11 2021-11-05 中国地质大学(武汉) 一种在视频聊天的表情显示方法及系统
CN108596140A (zh) * 2018-05-08 2018-09-28 青岛海信移动通信技术股份有限公司 一种移动终端人脸识别方法及系统
CN110020582B (zh) * 2018-12-10 2023-11-24 平安科技(深圳)有限公司 基于深度学习的人脸情绪识别方法、装置、设备及介质
CN109815873A (zh) * 2019-01-17 2019-05-28 深圳壹账通智能科技有限公司 基于图像识别的商品展示方法、装置、设备及介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180068178A1 (en) * 2016-09-05 2018-03-08 Max-Planck-Gesellschaft Zur Förderung D. Wissenschaften E.V. Real-time Expression Transfer for Facial Reenactment
CN106792170A (zh) * 2016-12-14 2017-05-31 合网络技术(北京)有限公司 视频处理方法及装置
CN107835464A (zh) * 2017-09-28 2018-03-23 努比亚技术有限公司 视频通话窗口画面处理方法、终端和计算机可读存储介质
CN109063644A (zh) * 2018-08-01 2018-12-21 长兴创智科技有限公司 基于人脸识别表情涂鸦方法、装置、存储介质及电子设备
CN109147825A (zh) * 2018-08-09 2019-01-04 湖南永爱生物科技有限公司 基于语音识别的人脸表情装饰方法、装置、存储介质及电子设备
CN109508638A (zh) * 2018-10-11 2019-03-22 平安科技(深圳)有限公司 人脸情绪识别方法、装置、计算机设备及存储介质

Also Published As

Publication number Publication date
CN110650306A (zh) 2020-01-03
CN110650306B (zh) 2022-04-15

Similar Documents

Publication Publication Date Title
WO2021042513A1 (zh) 视频聊天中添加表情的方法、装置、计算机设备及存储介质
US11765113B2 (en) Assistance during audio and video calls
CN109726624B (zh) 身份认证方法、终端设备和计算机可读存储介质
US10275672B2 (en) Method and apparatus for authenticating liveness face, and computer program product thereof
US9621851B2 (en) Augmenting web conferences via text extracted from audio content
EP3399467A1 (en) Emotion recognition in video conferencing
CN109829432B (zh) 用于生成信息的方法和装置
CN108920640B (zh) 基于语音交互的上下文获取方法及设备
KR20130022434A (ko) 통신단말장치의 감정 컨텐츠 서비스 장치 및 방법, 이를 위한 감정 인지 장치 및 방법, 이를 이용한 감정 컨텐츠를 생성하고 정합하는 장치 및 방법
US20220214797A1 (en) Virtual image control method, apparatus, electronic device and storage medium
CN117669605A (zh) 解析电子对话用于在替代界面中呈现
CN111476871A (zh) 用于生成视频的方法和装置
WO2021227916A1 (zh) 面部形象生成方法、装置、电子设备及可读存储介质
CN113703579B (zh) 数据处理方法、装置、电子设备及存储介质
US20240048572A1 (en) Digital media authentication
US20230410815A1 (en) Transcription generation technique selection
CN113014857A (zh) 视频会议显示的控制方法、装置、电子设备及存储介质
US20230053277A1 (en) Modified media detection
CN112669846A (zh) 交互系统、方法、装置、电子设备及存储介质
CN111862279A (zh) 交互处理方法和装置
Kumano et al. Collective first-person vision for automatic gaze analysis in multiparty conversations
CN110619602A (zh) 一种图像生成方法、装置、电子设备及存储介质
CN111476903A (zh) 虚拟交互实现控制方法、装置、计算机设备及存储介质
US11792353B2 (en) Systems and methods for displaying users participating in a communication session
JP2024513515A (ja) 映像生成方法及び装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19943977

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19943977

Country of ref document: EP

Kind code of ref document: A1