CN111405234A - Video conference information system and method with integration of cloud computing and edge computing - Google Patents


Info

Publication number
CN111405234A
Authority
CN
China
Prior art keywords: video, conference, conference scene, user, audio
Prior art date: 2020-04-17
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010304820.5A
Other languages
Chinese (zh)
Inventor
徐佳辉
万小贞
万志强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dayi Technology Co ltd
Original Assignee
Hangzhou Dayi Technology Co ltd
Priority date: 2020-04-17 (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date: 2020-04-17
Publication date: 2020-07-10
Application filed by Hangzhou Dayi Technology Co ltd
Priority to CN202010304820.5A
Publication of CN111405234A
Legal status: Pending (current)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 7/00 Television systems
    • H04N 7/14 Systems for two-way working
    • H04N 7/15 Conference systems
    • G06T 5/77
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G06V 10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168 Feature extraction; Face representation
    • G06V 40/172 Classification, e.g. identification
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06T 2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Signal Processing (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a video conference information system and method that integrate cloud computing and edge computing. The system comprises a cloud server and a plurality of user terminals, and each user terminal comprises a function enhancement module and a network detection module. The user terminal acquires the conference scene audio and video of the current user, the network detection module monitors and evaluates the network state in real time, and the function enhancement module performs function enhancement on the conference scene audio and video. The cloud server establishes and trains a corresponding cloud conference scene model from the conference scene audio and video uploaded by each user terminal and transmits the processed conference scene audio and video to the other user terminals of the video conference. With the system and method, whether conference scene construction and function enhancement are executed by the cloud node or by the edge node can be decided according to the real-time network environment, yielding a stable video conference connection, a high-quality and smooth audio and video conference scene, high-fidelity voice restoration, and low environmental noise.

Description

Video conference information system and method with integration of cloud computing and edge computing
Technical Field
The invention relates to the technical field of cloud computing, edge computing and communication, in particular to a video conference information system and method integrating cloud computing and edge computing.
Background
Video conferencing refers to the exchange of voice, video and file data between individuals or groups at two or more locations over transmission lines and multimedia devices, achieving real-time, interactive communication. Beyond voice, a video conference lets participants see the expressions and movements of the people they are talking to, so that people in different places can communicate as if they were in the same conference room. As video conference systems have come into wide use, more and more functions have been added on top of the basic mutual transmission of multi-party voice and video, such as image quality enhancement, voice timbre enhancement and environmental noise removal, which improve the conference experience and facilitate work and communication.
Communication quality is critical during a video conference and is strongly affected by the network environment. Current video conference systems have the following shortcomings: 1. Most video conferences now use mobile devices such as smartphones as conference terminals and the mobile internet as the transmission path, but signal strength varies greatly across regions and spaces. For example, a Wi-Fi connection weakens with distance from the router, and mobile network signals are very poor in enclosed spaces such as garages, subway stations and elevators. These factors make the network speed unstable and the video connection unreliable, which degrades the video conference. 2. Functions such as image quality enhancement, voice quality enhancement and environmental noise removal increase the amount of data transmitted over the mobile network and introduce longer network delays; when the network environment is poor, these functions have to be switched off, and stutter or interruption of the video conference is aggravated.
Disclosure of Invention
In view of the above shortcomings of the prior art, the present invention aims to provide a video conference information system and method that integrate cloud computing and edge computing. The user terminals of the video conference serve as edge nodes and the cloud server serves as the cloud node; audio and video conference scene construction and the various function enhancements of the video conference are realized by fused computation between the user terminals and the cloud server. Whether conference scene construction and function enhancement are executed by the cloud node or by the edge node can be decided according to the real-time network environment, yielding a stable video conference connection, a high-quality and smooth audio and video conference scene, high-fidelity voice restoration, and low environmental noise.
A video conference information system integrating cloud computing and edge computing comprises a cloud server and a plurality of user terminals, wherein each user terminal comprises a function enhancement module and a network detection module. The user terminal acquires the conference scene audio and video of the current user and displays the conference scene audio and video of the other users of the video conference; the network detection module monitors and evaluates the network state in real time, and the function enhancement module performs function enhancement on the conference scene audio and video. The cloud server establishes and trains, in real time, a corresponding cloud conference scene model from the conference scene audio and video uploaded by each user terminal, fits the cloud conference scene model to the corresponding video conference user, and transmits the processed conference scene audio and video to the other user terminals of the video conference.
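For illustration only, the following is a minimal Python sketch of how the components named above could be organized. All class and method names (UserTerminal, CloudServer, FunctionEnhancementModule, NetworkDetectionModule, submit_av and so on) are hypothetical and are not taken from the patent; the thresholds are placeholders.

    from dataclasses import dataclass, field
    from typing import Dict, List

    @dataclass
    class FunctionEnhancementModule:
        """Edge-side enhancement of the conference scene audio and video."""
        def enhance(self, av_frame: bytes) -> bytes:
            # Placeholder: face detection, beautification, background blurring
            # and speech enhancement would run here on the terminal (edge node).
            return av_frame

    @dataclass
    class NetworkDetectionModule:
        """Monitors packet loss rate and feedback delay in real time."""
        loss_rate: float = 0.0
        delay_ms: float = 0.0
        def network_is_good(self, max_loss: float = 0.05, max_delay_ms: float = 150.0) -> bool:
            return self.loss_rate <= max_loss and self.delay_ms <= max_delay_ms

    @dataclass
    class UserTerminal:
        """Edge node: captures the current user's conference scene audio/video."""
        user_id: str
        enhancer: FunctionEnhancementModule = field(default_factory=FunctionEnhancementModule)
        detector: NetworkDetectionModule = field(default_factory=NetworkDetectionModule)

    @dataclass
    class CloudServer:
        """Cloud node: keeps one conference scene model per participating user."""
        scene_models: Dict[str, List[bytes]] = field(default_factory=dict)
        def submit_av(self, user_id: str, av_frame: bytes) -> None:
            # Update the user's cloud conference scene model with the uploaded
            # audio/video, then forward the processed stream to the other
            # terminals (forwarding not shown).
            self.scene_models.setdefault(user_id, []).append(av_frame)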
Further, the function enhancement performed by the function enhancement module on the conference scene audio and video includes face detection by extracting face ROI (region of interest) features, face beautification, image quality enhancement, background blurring by extracting ROI features of the picture foreground region, speech enhancement, high-fidelity voice restoration and environmental noise removal.
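As a rough illustration of two of the listed enhancements (face detection on an ROI and background blurring), the sketch below uses OpenCV's stock Haar cascade and treats the detected face ROI as a stand-in for the picture foreground region. It is an assumption about one possible implementation, not the enhancement algorithm claimed by the patent.

    import cv2
    import numpy as np

    def enhance_frame(frame: np.ndarray) -> np.ndarray:
        """Detect face ROIs and blur everything outside them (background blurring)."""
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

        blurred = cv2.GaussianBlur(frame, (31, 31), 0)   # heavy blur for the background
        mask = np.zeros(frame.shape[:2], dtype=np.uint8)
        for (x, y, w, h) in faces:
            # Keep a slightly enlarged face ROI sharp; everything else stays blurred.
            pad = int(0.3 * h)
            cv2.rectangle(mask, (x - pad, y - pad), (x + w + pad, y + h + pad), 255, -1)
        mask3 = cv2.merge([mask, mask, mask])
        return np.where(mask3 == 255, frame, blurred)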
Furthermore, the tolerance of each cloud conference scene model to the network state is judged from the amount of conference scene audio and video data uploaded by the corresponding user terminal. When the uploaded data volume is below a preset threshold, the fit between the cloud conference scene model and the user is still poor, so the model's tolerance to the network state is judged to be high and the function enhancement module is used to enhance the conference scene audio and video of the current user. When the uploaded data volume exceeds the preset threshold, the fit between the cloud conference scene model and the user is good, so the model's tolerance to the network state is judged to be low and the virtual, complete conference scene audio and video is computed by fusion from the basic data of the current user.
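The rule in the preceding paragraph can be pictured as a small decision function, sketched below. The threshold values and the boolean return convention are illustrative assumptions, not values fixed by the patent.

    def use_edge_enhancement(uploaded_av_bytes: int,
                             fitting_degree: float,
                             data_threshold: int = 500_000_000,
                             fit_threshold: float = 0.8) -> bool:
        """Return True to enhance on the edge (terminal), False to let the cloud
        model fuse a virtual conference scene from the user's basic data.

        Little uploaded data -> model still fits the user poorly -> tolerance to
        the network state judged high -> edge enhancement mode.
        Plenty of uploaded data -> model fits well -> tolerance judged low ->
        cloud-edge fused computation from basic data.
        """
        if uploaded_av_bytes < data_threshold and fitting_degree < fit_threshold:
            return True      # edge enhancement mode
        return False         # cloud-edge fused computation mode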
Further, when the network state is poor, the user terminal uploads only the basic conference data of the current user to the cloud server and training of that user's cloud conference scene model is suspended; the user terminal instead establishes and trains a corresponding user-side conference scene model from the conference scene audio and video it collects. When the network state recovers, training of the corresponding cloud conference scene model resumes, and the user-side conference scene model synchronizes the current user's model training data to the corresponding cloud conference scene model.
Further, the network detection module determines the current network state from the packet loss rate of the data packets and the feedback delay after a data packet is sent.
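A network state check of this kind (packet loss rate plus feedback delay) could look like the sketch below. It assumes a UDP echo endpoint on the cloud side, which the patent does not specify; the probe count and thresholds are likewise illustrative.

    import socket
    import time

    def probe_network(host: str, port: int, probes: int = 20, timeout_s: float = 0.5):
        """Send small UDP probes and measure loss rate and average feedback delay."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        sock.settimeout(timeout_s)
        lost, rtts = 0, []
        for i in range(probes):
            start = time.monotonic()
            try:
                sock.sendto(str(i).encode(), (host, port))
                sock.recvfrom(64)                       # echo reply expected
                rtts.append((time.monotonic() - start) * 1000.0)
            except socket.timeout:
                lost += 1
        sock.close()
        loss_rate = lost / probes
        avg_delay_ms = sum(rtts) / len(rtts) if rtts else float("inf")
        return loss_rate, avg_delay_ms

    def network_is_good(loss_rate: float, avg_delay_ms: float) -> bool:
        # Illustrative thresholds; the patent does not give concrete values.
        return loss_rate <= 0.05 and avg_delay_ms <= 150.0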
A video conference information method with integration of cloud computing and edge computing comprises the following steps:
s1: acquiring a conference scene audio and video of a current user through a user terminal, and displaying conference scene audio and video of other users of the video conference;
s2: the function enhancement module performs function enhancement on the conference scene audio and video, and the enhanced conference scene audio and video is uploaded to the cloud server; the cloud server establishes and trains a corresponding cloud conference scene model in real time from the conference scene audio and video uploaded by the corresponding user terminal, and the cloud conference scene model is fitted to the corresponding video conference user;
s3: monitoring the network state in real time by adopting a network detection module, and judging whether the network state is good or not in real time; if yes, go to step S4, otherwise go to step S5;
s4: the function enhancement module performs function enhancement on the conference scene audio and video of the current user, and the enhanced conference scene audio and video is uploaded to the cloud server; the cloud server transmits the enhanced conference scene audio and video to the other user terminals of the video conference;
s5: the user terminal uploads the basic conference data of the current user to the cloud server, and the corresponding cloud conference scene model computes a virtual, complete conference scene audio and video by fusion from the user's basic data; the cloud server transmits the virtual conference scene audio and video to the other user terminals of the video conference.
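Steps s1 to s5 above amount to a per-frame loop on the terminal. The sketch below strings them together; the stub classes and helper names (capture_av, extract_basic_data, upload_enhanced, upload_basic_data) are hypothetical placeholders, not APIs from the patent.

    class _StubTerminal:
        """Minimal stand-in so the loop below can run; replace with real capture."""
        user_id = "user-1"
        def capture_av(self):                 # s1: capture current user's AV frame
            return b"raw-frame"
        def enhance(self, frame):             # s2/s4: edge-side function enhancement
            return b"enhanced-" + frame
        def extract_basic_data(self, frame):  # voice plus down-sampled thumbnail
            return b"basic-data"

    class _StubCloud:
        """Minimal stand-in for the cloud server endpoints."""
        def upload_enhanced(self, user_id, av):        # cloud relays enhanced AV
            pass
        def upload_basic_data(self, user_id, basics):  # cloud fuses a virtual scene
            pass

    def run_conference_loop(terminal, cloud, probe, frames=10):
        for _ in range(frames):
            av_frame = terminal.capture_av()           # s1
            loss, delay = probe()                      # s3: real-time network check
            if loss <= 0.05 and delay <= 150.0:        # network good
                cloud.upload_enhanced(terminal.user_id,
                                      terminal.enhance(av_frame))            # s2/s4
            else:                                      # network poor
                cloud.upload_basic_data(terminal.user_id,
                                        terminal.extract_basic_data(av_frame))  # s5

    # Example run with a fixed "good network" probe:
    run_conference_loop(_StubTerminal(), _StubCloud(), probe=lambda: (0.01, 40.0))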
Further, the function enhancement performed by the function enhancement module on the conference scene audio and video includes face detection by extracting face ROI features, face beautification, image quality enhancement, background blurring by extracting ROI features of the picture foreground region, speech enhancement, high-fidelity voice restoration and environmental noise removal.
Furthermore, the tolerance of each cloud conference scene model to the network state is judged from the amount of conference scene audio and video data uploaded by the corresponding user terminal. When the uploaded data volume is below a preset threshold, the fit between the cloud conference scene model and the user is still poor, so the model's tolerance to the network state is judged to be high and the function enhancement module is used to enhance the conference scene audio and video of the current user. When the uploaded data volume exceeds the preset threshold, the fit between the cloud conference scene model and the user is good, so the model's tolerance to the network state is judged to be low and the virtual, complete conference scene audio and video is computed by fusion from the basic data of the current user.
Further, when the network state is poor, the user terminal uploads only the basic conference data of the current user to the cloud server and training of that user's cloud conference scene model is suspended; the user terminal instead establishes and trains a corresponding user-side conference scene model from the conference scene audio and video it collects. When the network state recovers, training of the corresponding cloud conference scene model resumes, and the user-side conference scene model synchronizes the current user's model training data to the corresponding cloud conference scene model.
Further, the network detection module determines the current network state from the packet loss rate of the data packets and the feedback delay after a data packet is sent.
Compared with the prior art, the invention has the following advantages:
The invention provides a video conference information system and method integrating cloud and edge computing. The user terminals of the video conference serve as edge nodes and the cloud server of the video conference serves as the cloud node. For each user participating in the video conference, a one-to-one cloud conference scene model is established and trained on the cloud server. When the network state is good, the edge enhancement mode is used and the enhanced conference scene audio and video is transmitted to the other user terminals of the video conference; when the network state is poor, the cloud-edge fused computation mode is used and the virtual, complete conference scene audio and video is transmitted to the other user terminals of the video conference. Whether conference scene construction and function enhancement are executed by the cloud node or by the edge node is thus decided according to the real-time network environment, yielding a stable video conference connection, a high-quality and smooth audio and video conference scene, high-fidelity voice restoration, and low environmental noise.
Drawings
Fig. 1 is a system framework diagram of a video conference information system with a convergence of cloud and edge computing according to an embodiment of the present invention;
fig. 2 is a flowchart of a video conference information method in which cloud and edge computing are fused according to a second embodiment of the present invention;
fig. 3 is a block diagram of a function enhancement module according to one or two embodiments of the present invention;
fig. 4 is a flowchart of determining a degree of fitting between a conference scene model and a user in the first embodiment or the second embodiment of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and therefore are only examples, and the protection scope of the present invention is not limited thereby.
Embodiment one:
referring to fig. 1, a video conference information system with integration of cloud and edge computing includes a cloud server and a plurality of user terminals, each of which includes a function enhancing module and a network detecting module; the user terminal is used for acquiring the conference scene audio and video of the current user and displaying the conference scene audio and video of other users of the video conference; the network detection module is used for monitoring and judging the network state in real time, and the function enhancement module is used for carrying out function enhancement on the audio and video of the conference scene; the cloud server is used for establishing and training a corresponding cloud conference scene model according to the conference scene audio and video uploaded by the corresponding user terminal, fitting the cloud conference scene model with the corresponding video conference user, and respectively transmitting the conference scene audio and video after calculation processing to other user terminals of the video conference. Specifically, the user terminal may be a mobile device such as a smart phone, or may be a desktop computer, a notebook computer, or an ipad; the conference scene audio and video of the current user comprise video information and audio information of the current user shot and recorded by a user terminal in the video conference process; the network detection module can monitor the network state of each user terminal in real time, and judge the current network state according to the packet loss rate of a data packet during video conference data transmission and the feedback time delay after the data packet is sent. Referring to fig. 3, the function enhancement module is used for performing function enhancement on the audio and video of the conference scene, including face detection by extracting face ROI features, face beauty enhancement, image quality enhancement, background blurring by extracting ROI feature detection of a picture foreground region, voice enhancement, voice fidelity restoration and environmental noise removal. And establishing a one-to-one corresponding cloud conference scene model for each user participating in the video conference on a cloud server, and training the model in real time by adopting a plurality of conference scene audios and videos uploaded by each user terminal.
In the video conference information system, the cloud conference scene model comprises a person model, a foreground model and a denoising voice model. The person model is trained as follows: face recognition, expression analysis, posture analysis and the like are performed on the participating user from the video information uploaded by the user terminal, a person model of the user is constructed and trained, and the model data are stored on the cloud server. The foreground model is trained as follows: foreground frames are extracted from the video frames uploaded by the user terminal, the foreground model of the user is constructed and trained from a number of foreground frames, and the model data are stored on the cloud server. The denoising voice model is trained as follows: the audio information uploaded by the user terminal is analyzed to separate the audio of the participating user from the audio of the conference scene, the model data are stored on the cloud server, and the denoising voice model is updated in real time from the audio of the participating user. Preferably, the audio of the participating user can also be used as a weighting parameter to refine the user's expression data.
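To make the three sub-models concrete, here is a hypothetical container showing how a per-user cloud conference scene model could route uploaded video frames and audio chunks to its person, foreground and denoising-voice parts. The class names and the update methods are placeholders, not the patented training procedure.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class PersonModel:
        samples: List[bytes] = field(default_factory=list)
        def update(self, video_frame: bytes) -> None:
            # Placeholder for face recognition, expression and posture analysis.
            self.samples.append(video_frame)

    @dataclass
    class ForegroundModel:
        foreground_frames: List[bytes] = field(default_factory=list)
        def update(self, video_frame: bytes) -> None:
            # Placeholder for foreground extraction from the uploaded frame.
            self.foreground_frames.append(video_frame)

    @dataclass
    class DenoisingVoiceModel:
        user_voice_samples: List[bytes] = field(default_factory=list)
        def update(self, audio_chunk: bytes) -> None:
            # Placeholder for separating the user's voice from scene noise.
            self.user_voice_samples.append(audio_chunk)

    @dataclass
    class CloudConferenceSceneModel:
        """One model per participating user, trained from uploaded audio/video."""
        person: PersonModel = field(default_factory=PersonModel)
        foreground: ForegroundModel = field(default_factory=ForegroundModel)
        voice: DenoisingVoiceModel = field(default_factory=DenoisingVoiceModel)
        def train_step(self, video_frame: bytes, audio_chunk: bytes) -> None:
            self.person.update(video_frame)
            self.foreground.update(video_frame)
            self.voice.update(audio_chunk)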
In the video conference information system, when the network detection module detects that the network state is good, the function enhancement module performs function enhancement on the conference scene audio and video of the current user and the enhanced conference scene audio and video is uploaded to the cloud server; the cloud server transmits the enhanced conference scene audio and video to the other user terminals of the video conference. When the network detection module detects that the network state is poor, no function enhancement is applied to the conference scene audio and video; the user terminal uploads only the basic conference data of the current user to the cloud server, cloud-edge fused computation is then performed, the user's basic data are substituted into the cloud conference scene model corresponding to the user, and a virtual, complete conference scene audio and video is computed by fusion; the cloud server transmits the virtual conference scene audio and video to the other user terminals of the video conference. Specifically, the basic conference data mainly consist of the voice information, and may also include thumbnail image information obtained by extracting and down-sampling the user's body region or face region. When the network detection module detects that the network state has recovered from poor to good, conference scene audio and video generation switches back from the cloud-edge fused computation mode to edge-based generation.
Referring to fig. 4, in the video conference information system, establishing and training each cloud conference scene model requires a certain amount of conference scene audio and video uploaded by the corresponding user terminal; the more sufficient the data, the better the trained model fits the user. Therefore, after the video conference starts, the tolerance of each cloud conference scene model to the network state can be judged from the amount of conference scene audio and video data uploaded by the corresponding user terminal. When the uploaded data volume is below the preset threshold, the fit between the cloud conference scene model and the user is still poor, so the model's tolerance to the network state is judged to be high and the edge enhancement mode is used more, that is, the function enhancement module performs function enhancement on the conference scene audio and video of the current user. As the amount of uploaded conference scene audio and video grows beyond the preset threshold, the fit between the cloud conference scene model and the user becomes good, so the model's tolerance to the network state is judged to be low and the cloud-edge fused computation mode is used more, that is, the virtual, complete conference scene audio and video is computed by fusion from the current user's basic data. Specifically, the value of the preset threshold is derived in reverse from the fitting degree between the cloud conference scene model and the user.
In practice, when the network state is poor the user terminal can upload only the user's basic conference data, and the corresponding cloud conference scene model cannot be trained; the same holds when the user cannot conveniently take part in the video conference directly, for example while driving or running. In these cases a user-side conference scene model can be established for the user on the user terminal, which the terminal builds and trains from the conference scene audio and video it collects. When the network state recovers, training of the corresponding cloud conference scene model resumes, and the user-side conference scene model synchronizes the current user's model training data to the corresponding cloud conference scene model.
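The fallback and re-synchronization just described could be organized like the sketch below; the buffering strategy and the sync call are illustrative assumptions rather than the patent's procedure.

    class UserSideSceneModel:
        """Trained on the terminal while the network is poor; synced afterwards."""
        def __init__(self):
            self.local_training_buffer = []   # AV samples collected edge-side

        def train_step(self, av_sample: bytes) -> None:
            # Placeholder for local (edge) training of the conference scene model.
            self.local_training_buffer.append(av_sample)

        def sync_to_cloud(self, upload) -> None:
            """When the network recovers, push buffered training data to the cloud
            so the corresponding cloud conference scene model can resume training."""
            while self.local_training_buffer:
                upload(self.local_training_buffer.pop(0))

    # Usage sketch: collect while the network is poor, then flush once it recovers.
    local_model = UserSideSceneModel()
    local_model.train_step(b"frame-while-network-was-poor")
    local_model.sync_to_cloud(upload=lambda sample: None)  # replace with a real upload call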
In the video conference information system, the user terminals of the video conference serve as edge nodes and the cloud server serves as the cloud node; conference scene construction and the various function enhancements of the video conference are realized by fused computation between the user terminals and the cloud server. Whether conference scene construction and function enhancement are executed by the cloud node or by the edge node is decided according to the real-time network environment, yielding a stable video conference connection, a high-quality and smooth audio and video conference scene, high-fidelity voice restoration, and low environmental noise.
Embodiment two:
referring to fig. 2, a video conference information method with fusion of cloud and edge computing includes the following steps:
s1: acquiring a conference scene audio and video of a current user through a user terminal, and displaying conference scene audio and video of other users of the video conference;
s2: the function enhancement module performs function enhancement on the conference scene audio and video, and the enhanced conference scene audio and video is uploaded to the cloud server; the cloud server establishes and trains a corresponding cloud conference scene model from the conference scene audio and video uploaded by the corresponding user terminal, and the cloud conference scene model is fitted to the corresponding video conference user;
s3: monitoring the network state in real time by adopting a network detection module, and judging whether the network state is good or not in real time; if yes, go to step S4, otherwise go to step S5;
s4: the function enhancement module performs function enhancement on the conference scene audio and video of the current user, and the enhanced conference scene audio and video is uploaded to the cloud server; the cloud server transmits the enhanced conference scene audio and video to the other user terminals of the video conference;
s5: the user terminal uploads the basic conference data of the current user to the cloud server, and the corresponding cloud conference scene model computes a virtual, complete conference scene audio and video by fusion from the user's basic data; the cloud server transmits the virtual conference scene audio and video to the other user terminals of the video conference.
Specifically, the user terminal may be a mobile device such as a smartphone, or a desktop computer, a notebook computer or a tablet such as an iPad. The conference scene audio and video of the current user comprise the video and audio of the current user captured and recorded by the user terminal during the video conference. The network detection module monitors the network state of each user terminal in real time and judges the current network state from the packet loss rate of the data packets transmitted during the video conference and the feedback delay after a data packet is sent. Referring to fig. 3, the function enhancement performed by the function enhancement module on the conference scene audio and video includes face detection by extracting face ROI features, face beautification, image quality enhancement, background blurring by extracting ROI features of the picture foreground region, speech enhancement, high-fidelity voice restoration and environmental noise removal. On the cloud server, a one-to-one cloud conference scene model is established for each user participating in the video conference and trained in real time on the conference scene audio and video uploaded by that user's terminal.
In the video conference information method, the cloud conference scene model comprises a person model, a foreground model and a denoising voice model. The person model is trained as follows: face recognition, expression analysis, posture analysis and the like are performed on the participating user from the video information uploaded by the user terminal, a person model of the user is constructed and trained, and the model data are stored on the cloud server. The foreground model is trained as follows: foreground frames are extracted from the video frames uploaded by the user terminal, the foreground model of the user is constructed and trained from a number of foreground frames, and the model data are stored on the cloud server. The denoising voice model is trained as follows: the audio information uploaded by the user terminal is analyzed to separate the audio of the participating user from the audio of the conference scene, the model data are stored on the cloud server, and the denoising voice model is updated in real time from the audio of the participating user. Preferably, the audio of the participating user can also be used as a weighting parameter to refine the user's expression data.
In the video conference information method, when the network detection module detects that the network state is good, the function enhancement module performs function enhancement on the conference scene audio and video of the current user and the enhanced conference scene audio and video is uploaded to the cloud server; the cloud server transmits the enhanced conference scene audio and video to the other user terminals of the video conference. When the network detection module detects that the network state is poor, no function enhancement is applied to the conference scene audio and video; the user terminal uploads only the basic conference data of the current user to the cloud server, cloud-edge fused computation is then performed, the user's basic data are substituted into the cloud conference scene model corresponding to the user, and a virtual, complete conference scene audio and video is computed by fusion; the cloud server transmits the virtual conference scene audio and video to the other user terminals of the video conference. Specifically, the basic conference data mainly consist of the voice information, and may also include thumbnail image information obtained by extracting and down-sampling the user's body region or face region. When the network detection module detects that the network state has recovered from poor to good, conference scene audio and video generation switches back from the cloud-edge fused computation mode to edge-based generation.
Referring to fig. 4, in the video conference information method, establishing and training each cloud conference scene model requires a certain amount of conference scene audio and video uploaded by the corresponding user terminal; the more sufficient the data, the better the trained model fits the user. Therefore, after the video conference starts, the tolerance of each cloud conference scene model to the network state can be judged from the amount of conference scene audio and video data uploaded by the corresponding user terminal. When the uploaded data volume is below the preset threshold, the fit between the cloud conference scene model and the user is still poor, so the model's tolerance to the network state is judged to be high and the edge enhancement mode is used more, that is, the function enhancement module performs function enhancement on the conference scene audio and video of the current user. As the amount of uploaded conference scene audio and video grows beyond the preset threshold, the fit between the cloud conference scene model and the user becomes good, so the model's tolerance to the network state is judged to be low and the cloud-edge fused computation mode is used more, that is, the virtual, complete conference scene audio and video is computed by fusion from the current user's basic data. Specifically, the value of the preset threshold is derived in reverse from the fitting degree between the cloud conference scene model and the user.
In practice, when the network state is poor the user terminal can upload only the user's basic conference data, and the corresponding cloud conference scene model cannot be trained; the same holds when the user cannot conveniently take part in the video conference directly, for example while driving or running. In these cases a user-side conference scene model can be established for the user on the user terminal, which the terminal builds and trains from the conference scene audio and video it collects. When the network state recovers, training of the corresponding cloud conference scene model resumes, and the user-side conference scene model synchronizes the current user's model training data to the corresponding cloud conference scene model.
In the video conference information method, the user terminals of the video conference serve as edge nodes and the cloud server serves as the cloud node; conference scene construction and the various function enhancements of the video conference are realized by fused computation between the user terminals and the cloud server. Whether conference scene construction and function enhancement are executed by the cloud node or by the edge node is decided according to the real-time network environment, yielding a stable video conference connection, a high-quality and smooth audio and video conference scene, high-fidelity voice restoration, and low environmental noise.
Finally, the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the embodiments, those skilled in the art should understand that modifications or equivalent substitutions may be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications are covered by the protection scope of the present invention.

Claims (10)

1. A video conference information system integrating cloud computing and edge computing is characterized by comprising a cloud server and a plurality of user terminals, wherein each user terminal comprises a function enhancement module and a network detection module; the user terminal is used for acquiring the conference scene audio and video of the current user and displaying the conference scene audio and video of other users of the video conference; the network detection module is used for monitoring and judging the network state in real time, and the function enhancement module is used for carrying out function enhancement on the audio and video of the conference scene; the cloud server is used for establishing and training a corresponding cloud conference scene model in real time according to the conference scene audio and video uploaded by the corresponding user terminal, fitting the cloud conference scene model with the corresponding video conference user, and respectively transmitting the conference scene audio and video after calculation processing to other user terminals of the video conference.
2. The system of claim 1, wherein the functional enhancement module is configured to functionally enhance the audio/video of the conference scene, and comprises extracting a face ROI feature for face detection, face beauty enhancement, image quality enhancement, extracting a picture foreground region ROI feature for background blurring, speech enhancement, speech fidelity restoration, and ambient noise removal.
3. The video conference information system according to claim 1, wherein the tolerance of the corresponding cloud conference scene model to the network state is judged according to the data volume of the conference scene audio and video uploaded by each user terminal; when the data volume of the conference scene audio and video uploaded by the user terminal is smaller than a preset threshold value and the fitting degree of the corresponding cloud conference scene model and the user is low, judging that the tolerance of the corresponding cloud conference scene model to the network state is high, and performing function enhancement on the conference scene audio and video of the current user by adopting a function enhancement module; when the data volume of the conference scene audio and video uploaded by the user terminal is larger than a preset threshold value, and the fitting degree of the corresponding cloud conference scene model and the user is high, the tolerance of the corresponding cloud conference scene model to the network state is judged to be low, and the basic data of the current user is adopted to perform fusion calculation on the virtual complete conference scene audio and video.
4. The video conference information system according to claim 1, wherein when the network state is poor, the user terminal uploads the basic conference data of the current user to the cloud server, training of the corresponding user's cloud conference scene model is stopped, and the user terminal establishes and trains a corresponding user-side conference scene model from the conference scene audio and video it collects; and when the network state recovers, the corresponding cloud conference scene model resumes training, and the user-side conference scene model synchronizes the current user's model training data to the corresponding cloud conference scene model.
5. The video conference information system according to claim 1, wherein said network detection module determines the current network status according to a packet loss rate of the data packet and a feedback delay after the data packet is transmitted.
6. A method for using a videoconference information system, as defined in any of claims 1 to 5, comprising the steps of:
s1: acquiring a conference scene audio and video of a current user through a user terminal, and displaying conference scene audio and video of other users of the video conference;
s2: the method comprises the steps that a function enhancement module is adopted to enhance the functions of conference scene audios and videos, and the enhanced conference scene audios and videos are uploaded to a cloud server; the cloud server establishes and trains a corresponding cloud conference scene model in real time according to the conference scene audio and video uploaded by the corresponding user terminal, and the cloud conference scene model is fitted with the corresponding video conference user;
s3: monitoring the network state in real time by adopting a network detection module, and judging whether the network state is good or not in real time; if yes, go to step S4, otherwise go to step S5;
s4: the method comprises the steps that a function enhancement module is adopted to enhance the functions of conference scene audios and videos of a current user, and the enhanced conference scene audios and videos are uploaded to a cloud server; the cloud server transmits the enhanced conference scene audio and video to other user terminals of the video conference respectively;
s5: the user terminal uploads the conference basic data information of the current user to the cloud server, and the corresponding cloud conference scene model performs fusion calculation on a virtual complete conference scene audio and video according to the basic data of the user; and the cloud server transmits the virtually complete conference scene audio and video to other user terminals of the video conference respectively.
7. The video conference information method according to claim 6, wherein the function enhancement performed by the function enhancement module on the conference scene audio and video comprises face detection by extracting face ROI features, face beautification, image quality enhancement, background blurring by extracting ROI features of the picture foreground region, voice enhancement, voice fidelity restoration and environmental noise removal.
8. The video conference information method according to claim 6, wherein the tolerance of the corresponding cloud conference scene model to the network state is judged according to the data volume of the conference scene audio and video uploaded by each user terminal; when the data volume of the conference scene audio and video uploaded by the user terminal is smaller than a preset threshold value and the fitting degree of the corresponding cloud conference scene model and the user is low, judging that the tolerance of the corresponding cloud conference scene model to the network state is high, and performing function enhancement on the conference scene audio and video of the current user by adopting a function enhancement module; when the data volume of the conference scene audio and video uploaded by the user terminal is larger than a preset threshold value, and the fitting degree of the corresponding cloud conference scene model and the user is high, the tolerance of the corresponding cloud conference scene model to the network state is judged to be low, and the basic data of the current user is adopted to perform fusion calculation on the virtual complete conference scene audio and video.
9. The video conference information method according to claim 6, wherein when the network state is poor, the user terminal uploads the basic conference data of the current user to the cloud server, training of the corresponding user's cloud conference scene model is stopped, and the user terminal establishes and trains a corresponding user-side conference scene model from the conference scene audio and video it collects; and when the network state recovers, the corresponding cloud conference scene model resumes training, and the user-side conference scene model synchronizes the current user's model training data to the corresponding cloud conference scene model.
10. The video conference information method according to claim 6, wherein said network detection module determines the current network state according to the packet loss rate of the data packet and the feedback delay after the data packet is sent.
CN202010304820.5A 2020-04-17 2020-04-17 Video conference information system and method with integration of cloud computing and edge computing Pending CN111405234A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010304820.5A CN111405234A (en) 2020-04-17 2020-04-17 Video conference information system and method with integration of cloud computing and edge computing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010304820.5A CN111405234A (en) 2020-04-17 2020-04-17 Video conference information system and method with integration of cloud computing and edge computing

Publications (1)

Publication Number Publication Date
CN111405234A true CN111405234A (en) 2020-07-10

Family

ID=71429666

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010304820.5A Pending CN111405234A (en) 2020-04-17 2020-04-17 Video conference information system and method with integration of cloud computing and edge computing

Country Status (1)

Country Link
CN (1) CN111405234A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104704814A (en) * 2012-07-30 2015-06-10 摩托罗拉移动技术公司 Video bandwidth allocation in a video conference
CN106488265A (en) * 2016-10-12 2017-03-08 广州酷狗计算机科技有限公司 A kind of method and apparatus sending Media Stream
CN108574817A (en) * 2017-03-09 2018-09-25 北京达力博信科技有限公司 A kind of video conferencing system and videoconference data transmission method
US10462425B1 (en) * 2018-09-07 2019-10-29 Bank Of America Corporation Processing system for providing a teller assistant experience using enhanced reality interfaces
CN109218759A (en) * 2018-09-27 2019-01-15 广州酷狗计算机科技有限公司 Push method, apparatus, server and the storage medium of Media Stream
CN109769143A (en) * 2019-02-03 2019-05-17 广州视源电子科技股份有限公司 Method of video image processing, device, video system, equipment and storage medium
CN110581976A (en) * 2019-09-16 2019-12-17 平安科技(深圳)有限公司 teleconferencing method, device, computer system and readable storage medium
CN111010529A (en) * 2019-12-25 2020-04-14 杭州席媒科技有限公司 Video conference method and system capable of realizing multi-person real-time annotation

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022099753A1 (en) * 2020-11-11 2022-05-19 苏州知云创宇信息科技有限公司 Conference video information uploading method and system based on cloud computing service
CN112672090B (en) * 2020-12-17 2023-04-18 深圳随锐视听科技有限公司 Method for optimizing audio and video effects in cloud video conference
CN112672090A (en) * 2020-12-17 2021-04-16 深圳随锐云网科技有限公司 Method for optimizing audio and video effects in cloud video conference
CN112291506A (en) * 2020-12-25 2021-01-29 北京电信易通信息技术股份有限公司 Method and system for tracing security vulnerability of streaming data in video conference scene
CN112908353A (en) * 2021-02-03 2021-06-04 天津大学 Voice enhancement method for hearing aid by combining edge computing and cloud computing
CN113327619A (en) * 2021-02-26 2021-08-31 山东大学 Conference recording method and system based on cloud-edge collaborative architecture
CN113327619B (en) * 2021-02-26 2022-11-04 山东大学 Conference recording method and system based on cloud-edge collaborative architecture
CN113362455A (en) * 2021-06-18 2021-09-07 特斯联科技集团有限公司 Video conference background virtualization processing method and device
CN113473068A (en) * 2021-07-14 2021-10-01 中国联合网络通信集团有限公司 Conference access method, device, server and storage medium
CN113965550A (en) * 2021-10-15 2022-01-21 天津大学 Intelligent interactive remote auxiliary video system
CN113965550B (en) * 2021-10-15 2023-08-18 天津大学 Intelligent interactive remote auxiliary video system
CN115914540A (en) * 2022-10-12 2023-04-04 山东美承数码科技有限公司 Cloud video conference system
CN115914540B (en) * 2022-10-12 2023-09-29 山东美承数码科技有限公司 Cloud video conference system
CN117560464A (en) * 2024-01-10 2024-02-13 深圳市云屋科技有限公司 Multi-platform video conference method and system
CN117560464B (en) * 2024-01-10 2024-05-03 深圳市云屋科技有限公司 Multi-platform video conference method and system

Similar Documents

Publication Publication Date Title
CN111405234A (en) Video conference information system and method with integration of cloud computing and edge computing
KR102054173B1 (en) Displaying a presenter during a video conference
US9282284B2 (en) Method and system for facial recognition for a videoconference
CN104521180B (en) Conference call method, apparatus and system based on Unified Communication
CN100459711C (en) Video compression method and video system using the method
CN111402399B (en) Face driving and live broadcasting method and device, electronic equipment and storage medium
CN105376515B (en) Rendering method, the apparatus and system of communication information for video communication
CN104836981A (en) Intelligent meeting collaborative method and meeting terminal
CN104639777A (en) Conference control method, conference control device and conference system
WO2013107184A1 (en) Conference recording method and conference system
JP2009510877A (en) Face annotation in streaming video using face detection
CN112672090B (en) Method for optimizing audio and video effects in cloud video conference
CN101141610A (en) Apparatus and method for video mixing and computer readable medium
CN112839196B (en) Method, device and storage medium for realizing online conference
CN107623830B (en) A kind of video call method and electronic equipment
CN111988555B (en) Data processing method, device, equipment and machine readable medium
CN102984496A (en) Processing method, device and system of video and audio information in video conference
US20110164742A1 (en) Conversation detection in an ambient telephony system
CN110536095A (en) Call method, device, terminal and storage medium
CN103702064A (en) Video conference method, video conference terminal and video conference system
CN111901621A (en) Interactive live broadcast teaching throttling device and method based on live broadcast content recognition
CN113593587B (en) Voice separation method and device, storage medium and electronic device
US11792353B2 (en) Systems and methods for displaying users participating in a communication session
CN114979545A (en) Multi-terminal call method, storage medium and electronic device
CN113992882A (en) Packet processing method and device for multi-person conversation, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200710