CN111416955B - Video call method and electronic equipment - Google Patents

Video call method and electronic equipment

Info

Publication number
CN111416955B
CN111416955B (application CN202010183380.2A)
Authority
CN
China
Prior art keywords
video
far
tone
abnormal
video picture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010183380.2A
Other languages
Chinese (zh)
Other versions
CN111416955A (en)
Inventor
汪利民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Co Ltd
Original Assignee
Vivo Mobile Communication Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Co Ltd filed Critical Vivo Mobile Communication Co Ltd
Priority to CN202010183380.2A priority Critical patent/CN111416955B/en
Publication of CN111416955A publication Critical patent/CN111416955A/en
Application granted granted Critical
Publication of CN111416955B publication Critical patent/CN111416955B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/22Interactive procedures; Man-machine interfaces
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/442Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44227Monitoring of local network, e.g. connection or bandwidth variations; Detecting new devices in the local network
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a video call method and an electronic device. The method includes: monitoring, during a video call between a local device and a remote device, whether the remote video picture displayed on the local device is abnormal, and if so, playing a first video at the display position of the remote video picture to replace the abnormal remote video picture. The first video is generated by performing speech recognition on the voice call data transmitted by the remote device and then searching for corresponding target image resources according to the speech recognition result. Thus, when the remote video picture displayed on the local device becomes abnormal, the first video can be played at its display position to replace the abnormal picture. Because the first video is generated from the voice call data of the remote device, and that data reflects the timbre, semantics, scene, and other information of both parties to the call, the effect of simulating a real video call can be achieved, improving the user's video call experience.

Description

Video call method and electronic equipment
Technical Field
The embodiment of the invention relates to the technical field of communication, in particular to a video call method and electronic equipment.
Background
With the rapid development of communication technology and the popularization of smart electronic devices, communication between people has gradually moved away from the traditional voice call and toward the video call. Electronic devices are now generally equipped with communication software, and video calling is basically supported everywhere. A video call can show a user's current expressions and actions in real time, which greatly improves interest and interactivity, so in future development the video call may largely replace the traditional plain voice call function.
In the prior art, a video call is essentially a transmission of data between the two ends of the call: the local end receives the data transmitted by the remote end and displays it in real time. However, when the remote end turns off its camera or its network condition is poor, the local end cannot receive the remote end's image data, so the remote video call picture displayed on the local end becomes abnormal, which degrades the user's video call experience.
Disclosure of Invention
Embodiments of the invention provide a video call method and an electronic device, to solve the prior-art technical problem that an abnormal remote-device video picture displayed on the local device degrades the user's video call experience.
To solve the above technical problem, the embodiment of the present invention is implemented as follows:
in a first aspect, an embodiment of the present invention provides a video call method, which is applied to a local device, and the method includes:
monitoring whether a far-end video picture displayed on the local terminal equipment is abnormal or not in the process of carrying out video call between the local terminal equipment and the far-end equipment;
under the condition that the far-end video picture is abnormal, playing a first video at the display position of the far-end video picture to replace the abnormal far-end video picture;
wherein the first video is generated by performing speech recognition on the voice call data transmitted by the remote device and then searching for a corresponding target image resource according to the speech recognition result.
Optionally, as an embodiment, the monitoring, during the video call between the local device and the remote device, whether a remote video picture displayed on the local device is abnormal includes:
monitoring, during the video call between the local device and the remote device, whether a black screen appears in the remote video picture displayed on the local device, and if so, determining that the remote video picture displayed on the local device is abnormal; or,
monitoring, during the video call between the local device and the remote device, whether the remote video picture displayed on the local device is frozen, and if so, determining that the remote video picture displayed on the local device is abnormal.
Optionally, as an embodiment, before playing the first video at the display position of the far-end video picture to replace the abnormal far-end video picture when the far-end video picture has the abnormality, the method further includes:
performing semantic recognition on the voice call data transmitted by the remote equipment to obtain a semantic recognition result;
searching corresponding target image resources according to the semantic recognition result;
and generating a first video based on the target image resource.
Optionally, as an embodiment, when there are a plurality of remote devices, in a case that there is an abnormality in the remote video frames, before playing the first video at the display position of the remote video frame to replace the abnormal remote video frame, the method further includes:
performing timbre recognition on the voice call data transmitted by the remote devices to obtain a timbre recognition result;
extracting voice data corresponding to different timbres from the voice call data according to the timbre recognition result;
performing semantic recognition on the voice data corresponding to each timbre to obtain a semantic recognition result corresponding to each timbre;
searching for a target image resource corresponding to each timbre according to the semantic recognition result corresponding to each timbre;
and generating a first video based on the target image resource.
Optionally, as an embodiment, the generating a first video based on the target image resource specifically includes:
determining whether a character association relationship exists between the semantic recognition results corresponding to the timbres;
and if so, generating a first video containing multi-person interaction based on the image resources corresponding to the semantic recognition results that have the character association relationship.
In a second aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
the monitoring unit is used for monitoring whether a far-end video picture displayed on the local terminal equipment is abnormal or not in the process of carrying out video call between the local terminal equipment and the far-end equipment;
the playing unit is used for playing a first video at the display position of the far-end video picture to replace the abnormal far-end video picture under the condition that the far-end video picture is abnormal;
wherein the first video is generated by performing speech recognition on the voice call data transmitted by the remote device and then searching for a corresponding target image resource according to the speech recognition result.
Optionally, as an embodiment, the monitoring unit includes:
the first monitoring subunit is configured to monitor, during a video call between the local device and a remote device, whether a black screen appears in the remote video picture displayed on the local device, and if so, determine that the remote video picture displayed on the local device is abnormal; or,
and the second monitoring subunit is configured to monitor, during a video call between the local device and the remote device, whether the remote video picture displayed on the local device is frozen, and if so, determine that the remote video picture displayed on the local device is abnormal.
Optionally, as an embodiment, the electronic device further includes:
the first recognition unit is used for carrying out semantic recognition on the voice call data transmitted by the remote equipment to obtain a semantic recognition result;
the first searching unit is used for searching corresponding target image resources according to the semantic recognition result;
and the first generation unit is used for generating a first video based on the target image resource.
Optionally, as an embodiment, when there are a plurality of the remote devices, the electronic device further includes:
the second recognition unit is configured to perform timbre recognition on the voice call data transmitted by the remote devices to obtain a timbre recognition result;
the extracting unit is configured to extract voice data corresponding to different timbres from the voice call data according to the timbre recognition result;
the third recognition unit is configured to perform semantic recognition on the voice data corresponding to each timbre to obtain a semantic recognition result corresponding to each timbre;
the second searching unit is configured to search for the target image resource corresponding to each timbre according to the semantic recognition result corresponding to each timbre;
and the second generation unit is used for generating a first video based on the target image resource.
Optionally, as an embodiment, the second generating unit includes:
the determining subunit is configured to determine whether a character association relationship exists between the semantic recognition results corresponding to the timbres;
and the generating subunit is configured to, when the determination result of the determining subunit is positive, generate a first video containing multi-person interaction based on the image resources corresponding to the semantic recognition results that have the character association relationship.
In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a processor, a memory, and a computer program stored on the memory and executable on the processor, where the computer program, when executed by the processor, implements the steps of the video call method in the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the steps of the video call method in the first aspect.
In the embodiment of the invention, during a video call between the local device and the remote device, if the remote video picture displayed on the local device becomes abnormal, a first video generated from the voice call data of the remote device can be played at the display position of the remote video picture on the local device to replace the abnormal picture. Compared with the prior art, when the remote video picture displayed on the local device is abnormal, the first video is played at its display position to replace it; because the first video is generated from the voice call data of the remote device, and that data reflects the timbre, semantics, scene, and other information of both parties to the call, the effect of simulating a real video call can be achieved, improving the user's video call experience.
Drawings
Fig. 1 is a flowchart of a video call method according to an embodiment of the present invention;
fig. 2 is a flowchart of a first video generation step according to an embodiment of the present invention;
fig. 3 is a scene diagram of a video call method according to an embodiment of the present invention;
fig. 4 is a flowchart of another first video generation step provided by the embodiment of the present invention;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;
fig. 6 is a schematic diagram of a hardware structure of an electronic device implementing various embodiments of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention provides a video call method and electronic equipment.
First, a video call method provided in an embodiment of the present invention is described below.
It should be noted that the method provided by the embodiment of the present invention is applicable to an electronic device, and in practical application, the electronic device may include: smart phones, tablet computers, personal digital assistants, and the like, which are not limited in this embodiment of the present invention.
Fig. 1 is a flowchart of a video call method according to an embodiment of the present invention, and as shown in fig. 1, the method may include the following steps: step 101 and step 102, wherein,
in step 101, in the process of performing a video call between the local device and the remote device, monitoring whether a remote video picture displayed on the local device is abnormal; if yes, go to step 102, otherwise do not process.
In a video call scene, when the local device and the remote device are in a video call, the local device displays both its own video picture (the local video picture) and the remote device's video picture (the remote video picture); the remote device likewise displays both pictures.
In the embodiment of the present invention, a remote video picture abnormality may include the following situations: the remote video picture shows a black screen, the remote video picture is frozen, and the like.
In one embodiment of the invention, considering that a black remote video picture is usually caused by the remote device turning off its camera, whether the remote video picture displayed on the local device is abnormal may be determined by monitoring the camera of the remote device. Accordingly, step 101 may specifically include the following step:
in the process of carrying out video call between the local terminal equipment and the remote terminal equipment, monitoring whether a black screen image appears on a remote terminal video image displayed on the local terminal equipment, and if so, determining that the remote terminal video image displayed on the local terminal equipment is abnormal.
In another embodiment of the invention, considering that a frozen remote video picture is usually caused by poor network communication quality between the local device and the remote device, whether the remote video picture displayed on the local device is abnormal may be determined by monitoring the network communication quality between the local device and the remote device. Accordingly, step 101 may specifically include the following step:
in the process of carrying out video call between the local terminal equipment and the remote terminal equipment, monitoring whether a remote terminal video picture displayed on the local terminal equipment has a pause picture or not, and if so, determining that the remote terminal video picture displayed on the local terminal equipment has abnormity.
In step 102, playing a first video at the display position of the far-end video picture to replace the abnormal far-end video picture; the first video is generated by searching corresponding target image resources according to a voice recognition result after voice recognition is carried out on voice call data transmitted by the far-end equipment.
In the embodiment of the invention, when the remote video picture displayed on the local device is abnormal, the voice call data of the remote device can usually still be transmitted to the local device normally, and this data carries effective information such as the timbre, semantics, and scene of both parties to the call. Therefore, when the remote video picture is abnormal, speech recognition is performed on the voice call data of the remote device to extract this effective information (the speech recognition result), an image resource associated with the speech recognition result is searched for, and a first video is generated from the found image resource and played to replace the abnormal remote video picture.
In the embodiment of the present invention, when the speech recognition result includes information such as the timbre, semantics, and scene of the two parties to the call, the target image resource is an image resource matching that timbre, semantics, and scene, and may include picture resources and video resources.
In the embodiment of the invention, some picture resources and video resources can be stored in the local terminal equipment in advance, and under the condition, corresponding target image resources can be searched from the local terminal equipment; or the target image resource may also be searched for in other electronic devices, for example, a server, which is not limited in the embodiment of the present invention.
In the embodiment of the invention, after the corresponding target image resources are found, the pictures can be composed into a slideshow-style video using a preset display interval and animation effects, with each picture randomly assigned a different dynamic effect; the generated video is then converted into a format that the electronic device can play normally, yielding the first video.
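The slideshow-style composition step can be illustrated with a small scheduling sketch. This is a hypothetical helper, not the patented implementation: it only plans which picture is shown when and with which randomly chosen effect; encoding the plan into a playable video format would be done separately with a media library.

```python
# Hypothetical slideshow planner: each found picture is shown for a preset
# interval with a randomly chosen transition effect, as described above.
import random

EFFECTS = ["fade", "slide_left", "zoom_in", "pan"]  # illustrative effect names

def plan_slideshow(pictures, interval_s=3.0, seed=None):
    """Return an ordered plan of (picture, start time, duration, effect) entries."""
    rng = random.Random(seed)  # seedable so the plan can be reproduced in tests
    plan = []
    t = 0.0
    for pic in pictures:
        plan.append({
            "picture": pic,
            "start": t,
            "duration": interval_s,
            "effect": rng.choice(EFFECTS),  # each picture gets a random dynamic effect
        })
        t += interval_s
    return plan
```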
In the embodiment of the present invention, after the first video is generated, the first video may be used to replace a far-end video picture with an abnormal picture on the local device, and specifically, the first video is played at a display position of the far-end video picture of the local device.
For ease of understanding, consider a specific application scenario: a video call in which the remote device turns off its camera. Based on the image and video resources stored on the local device, a dynamic video is edited and synthesized and then displayed on the local device, achieving the purpose of simulating a real video call.
As can be seen from the foregoing embodiment, during a video call between the local device and the remote device, when the remote video picture displayed on the local device is abnormal, a first video generated from the voice call data of the remote device may be played at the display position of the remote video picture to replace the abnormal picture. Because the first video is generated from the voice call data of the remote device, and that data reflects the timbre, semantics, scene, and other information of both parties to the call, the effect of simulating a real video call can be achieved, improving the user's video call experience.
In a two-person video call scenario, that is, when there is one remote device, as shown in fig. 2, fig. 2 is a flowchart of a first video generation step provided in an embodiment of the present invention, which may include the following steps: step 201, step 202 and step 203, wherein,
in step 201, semantic recognition is performed on the voice call data transmitted by the remote device to obtain a semantic recognition result.
In the embodiment of the present invention, in a two-person video call scene, when the remote video picture displayed on the local device is abnormal, semantic recognition may be performed on the voice call data transmitted by the remote device to obtain a semantic recognition result. The semantic recognition result may include scene information involved in the voice call data, such as time, location, and person information; the scene information may also include other information, which is not limited in the embodiment of the present invention.
In step 202, according to the semantic recognition result, the corresponding target image resource is searched.
In the embodiment of the invention, according to the semantic recognition result (for example, recognizing during the call that the two parties once visited a certain place), a scene query is automatically performed over the picture and video resources of the local gallery; the query conditions can be based on location, time, person, and so on, and the found picture and video resources are determined as the target image resources. If no corresponding image resource is found, a default picture containing both parties, or a favorited picture, may be used as the target image resource. In addition, since existing galleries generally provide intelligent classification, the search can be further optimized based on that classification.
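The gallery query described above can be sketched as a simple metadata filter. Everything here (the entry fields, function name, and fallback to default pictures) is an illustrative assumption, not part of the disclosure.

```python
# Hypothetical sketch of querying a local gallery by the recognized scene.
# Each gallery entry carries metadata tags (place, people); entries matching
# the semantic recognition result are returned, and default pictures of both
# parties are used when nothing matches, as described above.

def find_target_resources(gallery, scene, default_pictures):
    """gallery: list of {'path', 'place', 'people'} dicts; scene: recognition result."""
    matches = [
        item for item in gallery
        # Match on place (when the scene mentions one) or on overlapping people.
        if (scene.get("place") is not None and item.get("place") == scene.get("place"))
        or set(item.get("people", [])) & set(scene.get("people", []))
    ]
    return matches if matches else default_pictures
```

A real gallery would expose richer criteria (time ranges, smart-classification labels), but the query-then-fallback shape is the same.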
In step 203, a first video is generated based on the target image resource.
In the embodiment of the invention, after the target image resources are found, the pictures can be composed into a slideshow-style video using a preset display interval and animation effects, with each picture randomly assigned a different dynamic effect; the generated video is then converted into a format that the electronic device can play normally, yielding the first video.
In the embodiment of the present invention, after the first video is generated, it may be used to replace the abnormal remote video picture on the local device; for example, as shown in fig. 3, the local video picture 31 and the synthesized first video 32 are displayed on the local device 30, achieving the purpose of simulating a real video call.
As can be seen from the above embodiments, in a two-person video call scene, when the remote video picture displayed on the local device is abnormal, semantic recognition may be performed on the voice call data transmitted by the remote device to obtain a semantic recognition result, the corresponding target image resource is searched for according to that result, a first video is generated based on the target image resource, and the first video is played at the remote video picture position of the local device. Because the first video is generated from the voice call data of the remote device, and that data reflects the timbre, semantics, scene, and other information of both parties to the call, the effect of simulating a real video call can be achieved, improving the user's video call experience.
In a multi-person video call scenario, that is, when there are multiple remote devices, as shown in fig. 4, fig. 4 is a flowchart of another first video generation step provided in the embodiment of the present invention, and may include the following steps: step 401, step 402, step 403, step 404 and step 405, wherein,
in step 401, the voice call data transmitted by the remote device is subjected to tone color recognition to obtain a tone color recognition result.
In the embodiment of the invention, in a multi-person video call scene, when a remote video picture displayed on the local device is abnormal, timbre recognition may be performed on the voice call data transmitted by the remote devices to obtain a timbre recognition result, so as to identify the members of the multi-party video call.
In step 402, voice data corresponding to different timbres are extracted from the voice call data according to the timbre recognition result.
In the embodiment of the invention, timbre serves as an identifier of each video call member, so the voice data of the different members can be extracted from the voice call data according to the distinct timbres of their voices.
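Separating voice data by timbre can be sketched as follows, under the assumption that an upstream model (not shown) has already reduced each audio segment to a speaker-embedding vector; segments with sufficiently similar embeddings are attributed to the same call member. The threshold, data layout, and function names are illustrative assumptions.

```python
# Hypothetical timbre grouping: cluster audio segments by cosine similarity
# of their speaker embeddings, so each group holds one member's voice data.
import math

SIMILARITY_THRESHOLD = 0.9  # illustrative cutoff for "same timbre"

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def group_by_timbre(segments):
    """segments: list of (segment_id, embedding). Returns lists of segment ids."""
    groups = []  # each group: {'centroid': first embedding seen, 'ids': [...]}
    for seg_id, emb in segments:
        for group in groups:
            if cosine(group["centroid"], emb) >= SIMILARITY_THRESHOLD:
                group["ids"].append(seg_id)  # same member as an existing group
                break
        else:
            groups.append({"centroid": emb, "ids": [seg_id]})  # new member
    return [g["ids"] for g in groups]
```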
In step 403, semantic recognition is performed on the voice data corresponding to each timbre, so as to obtain a semantic recognition result corresponding to each timbre.
In the embodiment of the present invention, semantic recognition may be performed separately on the voice data of each video call member to obtain a semantic recognition result for each member. The semantic recognition result may include scene information involved in the voice call data, such as time, location, and person information; the scene information may also include other information, which is not limited in the embodiment of the present invention.
In step 404, the target image resource corresponding to each timbre is searched for according to the semantic recognition result corresponding to each timbre.
In the embodiment of the invention, the image resource corresponding to each semantic recognition result can be searched for, that is, the image resource corresponding to each video call member.
In step 405, a first video is generated based on the target image asset.
In the embodiment of the present invention, when video synthesis is performed on the target image resources, if there is no intersection between the different video call members, their videos are synthesized independently; if there is an intersection, a multi-person interactive video album may further be produced to increase the interest of the video call. Accordingly, step 405 may specifically include the following steps:
determining whether a person association relationship exists between the semantic recognition results corresponding to each tone;
and if so, generating a first video containing multi-person interaction based on the image resources corresponding to the semantic recognition results having the person association relationship.
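The branch in step 405 can be sketched as a small planning pass: members whose scene information intersects get a joint "multi-person interaction" synthesis job, and the rest get independent jobs. The overlap test over time/location/person fields and the job tuples are assumptions for illustration; the actual video synthesis itself is outside this sketch.

```python
def scenes_overlap(a: dict, b: dict) -> bool:
    """Treat two members as associated if their scene info shares any entity."""
    return any(set(a.get(k, [])) & set(b.get(k, []))
               for k in ("time", "location", "people"))

def plan_first_video(member_scenes: dict) -> list:
    """Return synthesis jobs: one joint job per associated pair of members,
    and one solo job for every member with no association."""
    members = list(member_scenes)
    jobs, used = [], set()
    for i, m in enumerate(members):
        for n in members[i + 1:]:
            if scenes_overlap(member_scenes[m], member_scenes[n]):
                jobs.append(("multi_person", (m, n)))   # interactive video album
                used.update((m, n))
    for m in members:
        if m not in used:
            jobs.append(("solo", (m,)))                 # independent synthesis
    return jobs

jobs = plan_first_video({
    "tone_A": {"location": ["park"], "time": ["noon"]},
    "tone_B": {"location": ["park"], "people": ["kids"]},
    "tone_C": {"location": ["office"]},
})
```

In this example, members A and B share the location "park" and are planned into one multi-person job, while member C is synthesized independently.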
As can be seen from the above embodiments, in a multi-user video call scenario, for the case where the far-end video picture displayed on the local device is abnormal, tone recognition may be performed on the voice call data transmitted by the far-end devices to obtain a tone recognition result. Voice data of different tones are extracted from the voice call data according to the tone recognition result, semantic recognition is performed on the voice data of each tone to obtain a plurality of semantic recognition results, a target image resource corresponding to each semantic recognition result is searched for, a first video is generated based on the target image resources, and the first video is played at the far-end video picture position of the local device. Compared with the prior art, when the far-end video picture displayed on the local device is abnormal, the first video can be played at the display position of the far-end video picture to replace the abnormal picture. Because the first video is generated based on the voice call data of the far-end device, and the voice call data can reflect information such as the tone, semantics, and scene of both parties of the call, the effect of simulating a real video call can be achieved, thereby improving the video call experience of the user.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, and as shown in fig. 5, the electronic device 500 may include: a monitoring unit 501 and a playing unit 502, wherein,
the monitoring unit 501 is configured to monitor whether a far-end video picture displayed on a local device is abnormal or not in a process of a video call between the local device and a far-end device;
a playing unit 502, configured to play a first video at the display position of the far-end video picture to replace the abnormal far-end video picture in the case that the far-end video picture is abnormal;
wherein the first video is a video generated by performing voice recognition on the voice call data transmitted by the far-end device and then searching for a corresponding target image resource according to the voice recognition result.
As can be seen from the foregoing embodiment, in the process of the video call between the local device and the far-end device, for the case that the far-end video picture displayed on the local device is abnormal, a first video generated based on the voice call data of the far-end device may be played at the display position of the far-end video picture of the local device to replace the abnormal far-end video picture. Compared with the prior art, when the far-end video picture displayed on the local device is abnormal, the first video can be played at the display position of the far-end video picture to replace the abnormal picture. Because the first video is generated based on the voice call data of the far-end device, and the voice call data can reflect information such as the tone, semantics, and scene of both parties of the call, the effect of simulating a real video call can be achieved, thereby improving the video call experience of the user.
Optionally, as an embodiment, the monitoring unit 501 may include:
the first monitoring subunit is configured to monitor whether a black screen picture appears in the far-end video picture displayed on the local device in the process of the video call between the local device and the far-end device, and if so, determine that the far-end video picture displayed on the local device is abnormal; or,
the second monitoring subunit is configured to monitor whether the far-end video picture displayed on the local device has a frozen picture in the process of the video call between the local device and the far-end device, and if so, determine that the far-end video picture displayed on the local device is abnormal.
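The two monitoring subunits might be approximated by simple per-frame checks: a black screen is a frame whose luma values all fall below a threshold, and a frozen picture is a frame repeated unchanged for too many consecutive updates. Frames are modeled here as flat lists of luma values, and the thresholds are illustrative assumptions, not values from the patent.

```python
def is_black(frame, luma_threshold: int = 16) -> bool:
    """A frame counts as a 'black screen' if every pixel's luma is at or
    below the threshold (frame is a flat list of luma values)."""
    return all(px <= luma_threshold for px in frame)

class FreezeDetector:
    """Flags the far-end picture as frozen once N identical consecutive
    frames have been observed."""
    def __init__(self, max_identical: int = 30):   # ~1 s at 30 fps (assumption)
        self.max_identical = max_identical
        self.prev = None
        self.identical = 0

    def update(self, frame) -> bool:
        if frame == self.prev:
            self.identical += 1
        else:
            self.identical = 0
        self.prev = frame
        return self.identical >= self.max_identical

def far_end_picture_abnormal(frame, freeze: FreezeDetector) -> bool:
    """Combined check corresponding to the two monitoring subunits."""
    return is_black(frame) or freeze.update(frame)

fd = FreezeDetector(max_identical=2)
f = [120, 130, 125]
results = [far_end_picture_abnormal(f, fd) for _ in range(3)]
```

Feeding the same non-black frame three times trips the freeze detector on the third update, at which point the device would switch to playing the first video.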
Optionally, as an embodiment, the electronic device 500 may further include:
the first recognition unit is configured to perform semantic recognition on the voice call data transmitted by the far-end device to obtain a semantic recognition result;
the first searching unit is configured to search for a corresponding target image resource according to the semantic recognition result;
and the first generation unit is configured to generate a first video based on the target image resource.
Optionally, as an embodiment, when there are a plurality of remote devices, the electronic device 500 may further include:
the second recognition unit is configured to perform tone recognition on the voice call data transmitted by the far-end devices to obtain a tone recognition result;
the extraction unit is configured to extract voice data corresponding to different tones from the voice call data according to the tone recognition result;
the third recognition unit is configured to perform semantic recognition on the voice data corresponding to each tone to obtain a semantic recognition result corresponding to each tone;
the second searching unit is configured to search for the target image resource corresponding to each tone according to the semantic recognition result corresponding to each tone;
and the second generation unit is configured to generate a first video based on the target image resources.
Optionally, as an embodiment, the second generating unit may include:
the determining subunit is configured to determine whether a person association relationship exists between the semantic recognition results corresponding to each tone;
and the generating subunit is configured to generate, in a case that the determination result of the determining subunit is positive, a first video containing multi-person interaction based on the image resources corresponding to the semantic recognition results having the person association relationship.
Fig. 6 is a schematic diagram of a hardware structure of an electronic device for implementing various embodiments of the present invention, and as shown in fig. 6, the electronic device 600 includes, but is not limited to: a radio frequency unit 601, a network module 602, an audio output unit 603, an input unit 604, a sensor 605, a display unit 606, a user input unit 607, an interface unit 608, a memory 609, a processor 610, and a power supply 611. Those skilled in the art will appreciate that the electronic device configuration shown in fig. 6 does not constitute a limitation of the electronic device, and that the electronic device may include more or fewer components than shown, or some components may be combined, or a different arrangement of components. In the embodiment of the present invention, the electronic device includes, but is not limited to, a mobile phone, a tablet computer, a notebook computer, a palm computer, a vehicle-mounted terminal, a wearable device, a pedometer, and the like. Wherein:
the processor 610 is configured to monitor whether a far-end video picture displayed on the local device is abnormal or not in a process of performing a video call between the local device and the far-end device;
a display unit 606, configured to play a first video at a display position of the far-end video picture to replace an abnormal far-end video picture when the far-end video picture is abnormal;
wherein the first video is a video generated by performing voice recognition on the voice call data transmitted by the far-end device and then searching for a corresponding target image resource according to the voice recognition result.
Optionally, the processor 610 is further configured to monitor whether a black screen picture appears in the far-end video picture displayed on the local device during the video call between the local device and the far-end device, and if so, determine that the far-end video picture displayed on the local device is abnormal; or,
monitor whether the far-end video picture displayed on the local device has a frozen picture during the video call between the local device and the far-end device, and if so, determine that the far-end video picture displayed on the local device is abnormal.
Optionally, the processor 610 is further configured to perform semantic recognition on the voice call data transmitted by the remote device to obtain a semantic recognition result; searching corresponding target image resources according to the semantic recognition result; and generating a first video based on the target image resource.
Optionally, the processor 610 is further configured to perform tone recognition on the voice call data transmitted by the far-end devices to obtain a tone recognition result; extract voice data corresponding to different tones from the voice call data according to the tone recognition result; perform semantic recognition on the voice data corresponding to each tone to obtain a semantic recognition result corresponding to each tone; search for a target image resource corresponding to each tone according to the semantic recognition result corresponding to each tone; and generate a first video based on the target image resources.
Optionally, the processor 610 is further configured to determine whether a person association relationship exists between the semantic recognition results corresponding to each tone; and if so, generate a first video containing multi-person interaction based on the image resources corresponding to the semantic recognition results having the person association relationship.
In the embodiment of the present invention, in the process of the video call between the local device and the far-end device, for the case that the far-end video picture displayed on the local device is abnormal, the first video generated based on the voice call data of the far-end device may be played at the display position of the far-end video picture of the local device to replace the abnormal far-end video picture. Compared with the prior art, when the far-end video picture displayed on the local device is abnormal, the first video can be played at the display position of the far-end video picture to replace the abnormal picture. Because the first video is generated based on the voice call data of the far-end device, and the voice call data can reflect information such as the tone, semantics, and scene of both parties of the call, the effect of simulating a real video call can be achieved, thereby improving the video call experience of the user.
It should be understood that, in the embodiment of the present invention, the radio frequency unit 601 may be used for receiving and sending signals during a message sending/receiving process or a call process. Specifically, it receives downlink data from a base station and forwards the received downlink data to the processor 610 for processing, and transmits uplink data to the base station. In general, the radio frequency unit 601 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. Furthermore, the radio frequency unit 601 may also communicate with a network and other devices through a wireless communication system.
The electronic device provides wireless broadband internet access to the user via the network module 602, such as assisting the user in sending and receiving e-mails, browsing web pages, and accessing streaming media.
The audio output unit 603 may convert audio data received by the radio frequency unit 601 or the network module 602 or stored in the memory 609 into an audio signal and output as sound. Also, the audio output unit 603 may also provide audio output related to a specific function performed by the electronic apparatus 600 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 603 includes a speaker, a buzzer, a receiver, and the like.
The input unit 604 is configured to receive audio or video signals. The input unit 604 may include a graphics processing unit (GPU) 6041 and a microphone 6042. The graphics processor 6041 processes image data of still pictures or video obtained by an image capturing apparatus (such as a camera) in a video capture mode or an image capture mode. The processed image may be displayed on the display unit 606. The image processed by the graphics processor 6041 may be stored in the memory 609 (or other storage medium) or transmitted via the radio frequency unit 601 or the network module 602. The microphone 6042 can receive sound and process it into audio data. In the phone call mode, the processed audio data may be converted into a format transmittable to a mobile communication base station via the radio frequency unit 601.
The electronic device 600 also includes at least one sensor 605, such as a light sensor, motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 6061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 6061 and/or the backlight when the electronic apparatus 600 is moved to the ear. As one type of motion sensor, an accelerometer sensor can detect the magnitude of acceleration in each direction (generally three axes), detect the magnitude and direction of gravity when stationary, and can be used to identify the posture of an electronic device (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), and vibration identification related functions (such as pedometer, tapping); the sensors 605 may also include fingerprint sensors, pressure sensors, iris sensors, molecular sensors, gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc., which are not described in detail herein.
The display unit 606 is used to display information input by the user or information provided to the user. The Display unit 606 may include a Display panel 6061, and the Display panel 6061 may be configured by a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 607 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device. Specifically, the user input unit 607 includes a touch panel 6071 and other input devices 6072. Touch panel 6071, also referred to as a touch screen, may collect touch operations by a user on or near it (e.g., operations by a user on or near touch panel 6071 using a finger, stylus, or any suitable object or accessory). The touch panel 6071 may include two parts of a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 610, receives a command from the processor 610, and executes the command. In addition, the touch panel 6071 can be implemented by various types such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. The user input unit 607 may include other input devices 6072 in addition to the touch panel 6071. Specifically, the other input devices 6072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a track ball, a mouse, and a joystick, which are not described herein again.
Further, the touch panel 6071 can be overlaid on the display panel 6061, and when the touch panel 6071 detects a touch operation on or near the touch panel 6071, the touch operation is transmitted to the processor 610 to determine the type of the touch event, and then the processor 610 provides a corresponding visual output on the display panel 6061 according to the type of the touch event. Although the touch panel 6071 and the display panel 6061 are shown in fig. 6 as two separate components to implement the input and output functions of the electronic device, in some embodiments, the touch panel 6071 and the display panel 6061 may be integrated to implement the input and output functions of the electronic device, and this is not limited here.
The interface unit 608 is an interface for connecting an external device to the electronic apparatus 600. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 608 may be used to receive input (e.g., data information, power, etc.) from external devices and transmit the received input to one or more elements within the electronic device 600 or may be used to transmit data between the electronic device 600 and external devices.
The memory 609 may be used to store software programs as well as various data. The memory 609 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 609 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The processor 610 is a control center of the electronic device, connects various parts of the whole electronic device by using various interfaces and lines, performs various functions of the electronic device and processes data by running or executing software programs and/or modules stored in the memory 609, and calling data stored in the memory 609, thereby performing overall monitoring of the electronic device. Processor 610 may include one or more processing units; preferably, the processor 610 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 610.
The electronic device 600 may further include a power supply 611 (e.g., a battery) for supplying power to the various components, and preferably, the power supply 611 may be logically connected to the processor 610 via a power management system, such that the power management system may be used to manage charging, discharging, and power consumption.
In addition, the electronic device 600 includes some functional modules that are not shown, and are not described in detail herein.
Preferably, an embodiment of the present invention further provides an electronic device, which includes a processor, a memory, and a computer program stored in the memory and capable of running on the processor, where the computer program, when executed by the processor, implements each process of any of the above embodiments of the video call method, and can achieve the same technical effect, and details are not repeated here to avoid repetition.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the computer program implements each process of any one of the above video call method embodiments, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here. The computer-readable storage medium may be a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk.
It should be noted that, in the present specification, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (11)

1. A video call method, applied to a local device, characterized by comprising the following steps:
monitoring whether a far-end video picture displayed on the local device is abnormal in the process of a video call between the local device and a far-end device;
in the case that the far-end video picture is abnormal, playing a first video at the display position of the far-end video picture to replace the abnormal far-end video picture;
wherein, after voice recognition is performed on the voice call data transmitted by the far-end device, the first video is generated by searching for a corresponding target image resource according to the voice recognition result;
wherein, when there are a plurality of far-end devices, before playing the first video at the display position of the far-end video picture to replace the abnormal far-end video picture in the case that the far-end video picture is abnormal, the method further comprises:
performing tone recognition on the voice call data transmitted by the far-end devices to obtain a tone recognition result; extracting voice data corresponding to different tones from the voice call data according to the tone recognition result; performing semantic recognition on the voice data corresponding to each tone to obtain a semantic recognition result corresponding to each tone; searching for a target image resource corresponding to each tone according to the semantic recognition result corresponding to each tone; and generating a first video based on the target image resources.
2. The method according to claim 1, wherein the monitoring whether the far-end video picture displayed on the local device is abnormal or not in the process of the local device and the far-end device performing the video call comprises:
monitoring whether a black screen picture appears in the far-end video picture displayed on the local device in the process of the video call between the local device and the far-end device, and if so, determining that the far-end video picture displayed on the local device is abnormal; or,
monitoring whether the far-end video picture displayed on the local device has a frozen picture in the process of the video call between the local device and the far-end device, and if so, determining that the far-end video picture displayed on the local device is abnormal.
3. The method according to claim 1, wherein, in the case that there is an abnormality in the far-end video picture, before playing the first video at the display position of the far-end video picture to replace the abnormal far-end video picture, further comprising:
performing semantic recognition on the voice call data transmitted by the remote equipment to obtain a semantic recognition result;
searching corresponding target image resources according to the semantic recognition result;
and generating a first video based on the target image resource.
4. The method according to claim 1, wherein the generating of the first video based on the target image resource is specifically:
determining whether a person association relationship exists between the semantic recognition results corresponding to each tone;
and if so, generating a first video containing multi-person interaction based on the image resources corresponding to the semantic recognition results having the person association relationship.
5. An electronic device, characterized in that the electronic device comprises:
the monitoring unit is used for monitoring whether a far-end video picture displayed on the local terminal equipment is abnormal or not in the process of carrying out video call between the local terminal equipment and the far-end equipment;
the playing unit is used for playing a first video at the display position of the far-end video picture to replace the abnormal far-end video picture under the condition that the far-end video picture is abnormal;
wherein, after voice recognition is performed on the voice call data transmitted by the far-end device, the first video is generated by searching for a corresponding target image resource according to the voice recognition result;
wherein, when there are a plurality of far-end devices, before the first video is played at the display position of the far-end video picture to replace the abnormal far-end video picture in the case that the far-end video picture is abnormal, the electronic device is further configured for:
performing tone recognition on the voice call data transmitted by the far-end devices to obtain a tone recognition result; extracting voice data corresponding to different tones from the voice call data according to the tone recognition result; performing semantic recognition on the voice data corresponding to each tone to obtain a semantic recognition result corresponding to each tone; searching for a target image resource corresponding to each tone according to the semantic recognition result corresponding to each tone; and generating a first video based on the target image resources.
6. The electronic device of claim 5, wherein the monitoring unit comprises:
the first monitoring subunit is configured to monitor whether a black screen picture appears in the far-end video picture displayed on the local device in the process of the video call between the local device and the far-end device, and if so, determine that the far-end video picture displayed on the local device is abnormal; or,
the second monitoring subunit is configured to monitor whether the far-end video picture displayed on the local device has a frozen picture in the process of the video call between the local device and the far-end device, and if so, determine that the far-end video picture displayed on the local device is abnormal.
7. The electronic device of claim 5, further comprising:
the first recognition unit is used for carrying out semantic recognition on the voice call data transmitted by the remote equipment to obtain a semantic recognition result;
the first searching unit is used for searching corresponding target image resources according to the semantic recognition result;
and the first generation unit is used for generating a first video based on the target image resource.
8. The electronic device of claim 5, wherein when the remote device is plural, the electronic device further comprises:
the second recognition unit is configured to perform tone recognition on the voice call data transmitted by the far-end devices to obtain a tone recognition result;
the extraction unit is configured to extract voice data corresponding to different tones from the voice call data according to the tone recognition result;
the third recognition unit is configured to perform semantic recognition on the voice data corresponding to each tone to obtain a semantic recognition result corresponding to each tone;
the second searching unit is configured to search for the target image resource corresponding to each tone according to the semantic recognition result corresponding to each tone;
and the second generation unit is configured to generate a first video based on the target image resources.
9. The electronic device according to claim 8, wherein the second generation unit includes:
the determining subunit is configured to determine whether a person association relationship exists between the semantic recognition results corresponding to each tone;
and the generating subunit is configured to generate, in a case that the determination result of the determining subunit is positive, a first video containing multi-person interaction based on the image resources corresponding to the semantic recognition results having the person association relationship.
10. An electronic device comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps of the video telephony method of any of claims 1 to 4.
11. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the video call method according to any one of claims 1 to 4.
CN202010183380.2A 2020-03-16 2020-03-16 Video call method and electronic equipment Active CN111416955B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010183380.2A CN111416955B (en) 2020-03-16 2020-03-16 Video call method and electronic equipment

Publications (2)

Publication Number Publication Date
CN111416955A CN111416955A (en) 2020-07-14
CN111416955B true CN111416955B (en) 2022-03-04

Family

ID=71494364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010183380.2A Active CN111416955B (en) 2020-03-16 2020-03-16 Video call method and electronic equipment

Country Status (1)

Country Link
CN (1) CN111416955B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112565913B (en) * 2020-11-30 2023-06-20 维沃移动通信有限公司 Video call method and device and electronic equipment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101917585A (en) * 2010-08-13 2010-12-15 宇龙计算机通信科技(深圳)有限公司 Method, device and terminal for regulating video information sent from visual telephone to opposite terminal
CN104469244A (en) * 2013-09-13 2015-03-25 联想(北京)有限公司 A network based video image adjusting method and system
CN104780459A (en) * 2015-04-16 2015-07-15 美国掌赢信息科技有限公司 Method and electronic equipment for loading effects in instant video
CN105872437A (en) * 2015-12-15 2016-08-17 乐视致新电子科技(天津)有限公司 Video call control method, video call control device and terminal
CN107396198A (en) * 2017-07-24 2017-11-24 维沃移动通信有限公司 Video call method and mobile terminal
JP2018006791A (en) * 2016-06-27 2018-01-11 三菱電機株式会社 Navigation device and operation method for navigation device
CN109996026A (en) * 2019-04-23 2019-07-09 广东小天才科技有限公司 Special video effect interactive approach, device, equipment and medium based on wearable device

Also Published As

Publication number Publication date
CN111416955A (en) 2020-07-14

Similar Documents

Publication Publication Date Title
CN109078319B (en) Game interface display method and terminal
CN109240577B (en) Screen capturing method and terminal
CN110109593B (en) Screen capturing method and terminal equipment
CN111666009B (en) Interface display method and electronic equipment
CN107870674B (en) Program starting method and mobile terminal
CN109412932B (en) Screen capturing method and terminal
CN111402866A (en) Semantic recognition method and device and electronic equipment
CN110096203B (en) Screenshot method and mobile terminal
CN110855921B (en) Video recording control method and electronic equipment
CN109922294B (en) Video processing method and mobile terminal
CN109618218B (en) Video processing method and mobile terminal
CN109949809B (en) Voice control method and terminal equipment
CN111405043A (en) Information processing method and device and electronic equipment
CN111698550A (en) Information display method and device, electronic equipment and medium
CN109067979B (en) Prompting method and mobile terminal
CN108270928B (en) Voice recognition method and mobile terminal
CN109166164B (en) Expression picture generation method and terminal
CN111416955B (en) Video call method and electronic equipment
CN108418961B (en) Audio playing method and mobile terminal
CN111443968A (en) Screenshot method and electronic equipment
CN108287644B (en) Information display method of application program and mobile terminal
CN111491058A (en) Method for controlling operation mode, electronic device, and storage medium
CN110888572A (en) Message display method and terminal equipment
CN108471549B (en) Remote control method and terminal
CN110928616A (en) Shortcut icon management method and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant