CN111541856A - Video and audio processing device and video conference system thereof - Google Patents


Info

Publication number
CN111541856A
Authority
CN
China
Prior art keywords
audio data
data
video
audio
integrated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201810971935.2A
Other languages
Chinese (zh)
Inventor
罗承志
罗志红
汪义臣
钟进先
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Linky View Technology Co ltd
Original Assignee
Shenzhen Linky View Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Linky View Technology Co ltd filed Critical Shenzhen Linky View Technology Co ltd
Priority to CN201810971935.2A priority Critical patent/CN111541856A/en
Publication of CN111541856A publication Critical patent/CN111541856A/en
Withdrawn legal-status Critical Current


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00Television systems
    • H04N7/14Systems for two-way working
    • H04N7/15Conference systems

Abstract

The present invention relates to video and audio processing technology, and more particularly to a video and audio processing apparatus and a video conference system. The system comprises at least one video and audio processing device and a host device connectable to the video and audio processing device and to a remote device. The host device obtains a plurality of candidate video data and a plurality of audio data to be integrated, selects one of the candidate video data according to the audio data to be integrated, integrates the audio data to be integrated, and transmits the integrated audio data and the filtered video data to the remote device, wherein the audio data to be integrated comprise client audio data and host audio data, and the candidate video data comprise a plurality of client video data and host video data. The invention ensures that, even when the local end has a plurality of video conference participants, the attention of the remote device's user remains focused on the local speaker, thereby improving the quality and efficiency of the video conference.

Description

Video and audio processing device and video conference system thereof
Technical Field
The present invention relates to video and audio processing technologies, and in particular, to a video and audio processing apparatus and a video conference system thereof.
Background
With the continuous development of electronic technology and communication networks, the hardware, software, and functionality of video communication have improved greatly. Video communication allows multiple users to connect synchronously at any time through electronic devices such as computers, smart phones, and tablet computers, so that users can see live images of the other parties on screen while speaking, enhancing the realism and sense of presence of the exchange. Video conferencing has therefore been widely adopted for business communication, enabling enterprises to conduct cross-regional internal and external communication.
However, when a video conference involves multiple participants at one end, the limited viewing angle of the camera lens or the insufficient pickup range of the microphone often forces everyone to crowd close to the video equipment, which degrades the quality and efficiency of the video conference.
Disclosure of Invention
In order to solve the above technical problems, a first object of the present invention is to provide a video and audio processing apparatus.
The technical solution adopted by the present invention to achieve this object is as follows:
a video and audio processing device comprises a communication circuit, an image capturing device, a radio device, a memory and a processor, wherein the communication circuit is used for connecting to at least one other video and audio processing device and a remote device; the image capturing device is used for capturing image signals; the radio device is used for capturing sound signals; the memory is used for storing file data; the processor is coupled to the communication circuit, the image capturing device, the radio device and the memory, and is used for executing the following steps: obtaining at least one candidate video data and a plurality of audio data to be integrated, wherein each candidate video data corresponds to one of the audio data to be integrated; selecting one of the candidate video data according to the audio data to be integrated corresponding to each of the candidate video data to generate screened video data; integrating the audio data to be integrated to generate integrated audio data; and transmitting the integrated audio data and the screened video data to the remote device through the communication circuit.
As a further improvement, the processor is further configured to perform the following steps: receiving remote video data and remote audio data from the remote device through the communication circuit; and transmitting the remote video data and the remote audio data to each of the other video and audio processing devices through the communication circuit.
As a further improvement, the device further comprises: a screen for displaying pictures; and a broadcasting device for broadcasting sound. The processor is further coupled to the screen and the broadcasting device, and is configured to play the remote video data and the remote audio data through the screen and the broadcasting device, respectively.
As a further improvement, each of the audio data to be integrated includes a reception volume, and the step of the processor selecting one of the candidate video data according to the audio data to be integrated corresponding to each candidate video data to generate the filtered video data includes: selecting the candidate video data corresponding to the audio data to be integrated with the largest reception volume as the filtered video data. Each of the audio data to be integrated includes a reception noise ratio, and the selecting step includes: selecting the candidate video data corresponding to the audio data to be integrated with the largest reception noise ratio as the filtered video data. Each of the audio data to be integrated includes a reception time, and the selecting step includes: selecting the candidate video data corresponding to the audio data to be integrated with the longest reception time as the filtered video data.
As a further improvement, each of the audio data to be integrated includes a reception time, and the step of the processor selecting one of the candidate video data according to each of the audio data to generate the filtered video data includes: selecting the candidate video data corresponding to the audio data whose reception time is greater than a time threshold as the filtered video data.
as a further improvement, the processor for integrating the audio data to be integrated to generate integrated audio data comprises: performing audio mixing processing or denoising processing on the audio data to be integrated to generate the integrated audio data; the processor is further configured to perform the following steps: receiving a second video signal and a second audio signal from the image capturing device and the audio receiving device, respectively, to generate second video data and second audio data; determining whether the sound parameter of the second audio data meets the transmission standard; if yes, setting the second video data and the second audio data as one of the candidate video data and one of the audio data to be integrated, respectively; and if not, only setting the second audio data as one of the audio data to be integrated.
As a further improvement, the second audio data includes a reception frequency, and the step of the processor determining whether the sound parameters of the second audio data meet the transmission standard includes: determining whether the reception frequency conforms to the human voice frequency. The second audio data further includes the reception volume of the sound receiving device, and the determining step includes: determining whether the reception volume is greater than a volume threshold. The second audio data further includes the reception noise ratio of the sound receiving device, and the determining step further includes: determining whether the reception noise ratio is greater than a noise ratio threshold.
The second objective of the present invention is to provide a video conference system, which comprises at least one client device, wherein each client device generates client video data and client audio data; and a host device connected to each client device and a remote device. The host device obtains at least one candidate video data and a plurality of audio data to be integrated, selects one of the candidate video data according to each audio data to be integrated to generate filtered video data, integrates the audio data to be integrated to generate integrated audio data, and transmits the integrated audio data and the filtered video data to the remote device, wherein the audio data to be integrated comprise the client audio data and host audio data, the candidate video data comprise at least one of the client video data and host video data, and each candidate video data corresponds to one of the audio data to be integrated.
As a further refinement, for each of the client devices: the client device determines whether the sound parameters of the audio data of the client meet the sound receiving standard; if yes, the client device transmits the client audio data and the client video data to the host device; and if not, the client device only transmits the client audio data to the host device.
As a further improvement, the host device further receives remote video data and remote audio data from the remote device and transmits the remote video data and the remote audio data to each of the client devices.
In the video and audio processing device and video conference system of the present invention, the host device integrates all audio data generated at the local end and transmits the integrated audio data to the remote device, and transmits one of the video data to the remote device according to the sound parameters of all the audio data. Thus, even when the local end has a plurality of video conference participants, the attention of the remote device's user remains focused on the main speaker at the local end, improving the quality and efficiency of the video conference.
Drawings
FIG. 1 is a schematic diagram of a video conferencing system of the present invention;
FIG. 2 is a schematic diagram of a video and audio processing apparatus according to the present invention;
FIG. 3 is a schematic diagram of a host device of the present invention;
FIG. 4 is a flow chart of an audio and video processing method of the present invention;
FIG. 5 is a flow chart of an audio and video processing method of the present invention;
FIG. 6 is a flow chart of an audio and video processing method of the present invention;
FIG. 7 is a flow chart of an audio and video processing method of the present invention;
fig. 8 is a schematic diagram of the application scenario of fig. 7 of the present invention.
Detailed Description
The following detailed description of embodiments of the invention refers to the accompanying drawings.
Referring to fig. 1, the video conference system 100 according to the present invention includes video and audio processing devices 110A to 110D and a host device 120, wherein the host device 120 can be connected to each of the video and audio processing devices 110A to 110D and to a remote device 130. The video and audio processing devices 110A-110D and the host device 120 are local devices; the video and audio processing devices 110A-110D can connect to the host device 120 wirelessly over a local area network or directly by wire. In addition, the host device 120 can connect to the remote device 130 via the Internet. It should be noted that the four video and audio processing devices 110A-110D are shown only for convenience of illustration; in other embodiments, the video conferencing system 100 may provide any number of video and audio processing devices for interfacing with the host device 120, and the invention is not limited thereto.
As shown in fig. 2, each of the video and audio processing devices 110A-110D includes a screen 111, a communication circuit 112, an image capturing device 113, a broadcasting device 114, a sound receiving device 115, a memory 116, and a processor 117. In the embodiment, the video and audio processing devices 110A to 110D are, for example, electronic devices with audio and video processing functions, such as a personal computer, a notebook computer, a smart phone, a tablet computer, and a personal digital assistant, but the invention is not limited thereto. For convenience of description, the following description will be made only with respect to the video and audio processing apparatus 110A.
The screen 111 is used for displaying the video and the image outputted by the audio processing device 110A for the user to watch. In the present embodiment, the screen 111 is, for example, a liquid crystal display, a light emitting diode display, a field emission display or other types of displays externally connected to or built in the video and audio processing device 110A.
The communication circuit 112 is used to connect with other devices through a communication network, and may be a component supporting wireless connections such as the WiMAX, Wi-Fi, 3G, or 4G communication protocols, or wired connections such as ADSL broadband or fiber-optic networks, but the invention is not limited thereto.
The image capturing device 113 is used to capture images in front of its lens, and may be a camera using a CCD, CMOS, or other sensor. It may be built into the video and audio processing device 110A, or be an external web camera, digital camera, single-lens camera, or digital video camera.
The broadcasting device 114 is used for broadcasting sound and includes a speaker. The sound receiving device 115 is used for receiving sound and comprises a microphone. The broadcasting device 114 and the sound receiving device 115 may be built in the video and audio processing device 110A or may be externally connected to the video and audio processing device 110A. In addition, when the broadcasting device 114 and the sound receiving device 115 are externally connected to the video and audio processing device 110A, they can be further integrated into a single device such as an earphone microphone.
The memory 116 is used to store file data, and may be any type of fixed or removable random access memory, read only memory, flash memory, hard disk, or other similar device or combination of devices.
The processor 117 is coupled to the screen 111, the communication circuit 112, the image capturing device 113, the broadcasting device 114, the sound receiving device 115 and the memory 116, and is used for controlling and integrating operations among these components. The processor 117 may be, for example, a central processing unit, or other programmable general purpose or special purpose microprocessor, digital signal processor, programmable controller, application specific integrated circuit, programmable logic device, or other similar device or combination of devices.
In addition, in the embodiment, the video and audio processing device 110A may also be an electronic device consisting of only the communication circuit 112, the image capturing device 113, the sound receiving device 115, the memory 116 and the processor 117, and may be externally connected to another electronic device including the screen 111 and the broadcasting device 114.
As shown in fig. 3, the host device 120 includes a screen 121, a communication circuit 122, an image capturing device 123, a broadcasting device 124, a sound receiving device 125, a memory 126, and a processor 127. In this embodiment, the host device 120 may also be an electronic device with audio and video processing functions, such as a personal computer, a notebook computer, a smart phone, a tablet computer, a personal digital assistant, etc., but the invention is not limited thereto. The screen 121, the communication circuit 122, the image capturing device 123, the broadcasting device 124, the sound receiving device 125, the memory 126 and the processor 127 are respectively similar to the screen 111, the communication circuit 112, the image capturing device 113, the broadcasting device 114, the sound receiving device 115, the memory 116 and the processor 117 of the video and audio processing devices 110A to 110D in fig. 1, and related descriptions thereof refer to the foregoing paragraphs, which are not repeated herein.
In this embodiment, when the local devices of the video conference system 100 and the remote device 130 are in a video conference, each of the video and audio processing devices 110A-110D transmits its generated client audio data to the host device 120, and uses that client audio data to decide whether to also transmit its generated client video data to the host device 120. The host device 120 itself also generates host audio data and host video data. The host device 120 then integrates the audio data, filters the video data, and transmits the integrated audio data and the filtered video data to the remote device 130, thereby realizing a video conference that focuses on the speaker. The specific audio and video processing performed by the video and audio processing devices 110A-110D and the host device 120 is described below.
FIG. 4 is a flow chart illustrating an audio and video processing method according to an embodiment of the invention. The method of the present embodiment is suitable for the video and audio processing devices 110A-110D of fig. 2, and the detailed steps of the audio and video processing method will be described below with reference to each component of the video and audio processing device 110A, and so on for the video and audio processing devices 110B-110D. However, in practice, the method of the present embodiment is also applicable to an electronic device that includes only the communication circuit 112, the image capturing device 113, the sound receiving device 115, the memory 116 and the processor 117, and the invention is not limited thereto.
As shown in fig. 2 and 4, first, the processor 117 of the video and audio processing device 110A receives a first video signal and a first audio signal from the image capturing device 113 and the sound receiving device 115, respectively, to generate first video data and first audio data (step S202). Here, the first video signal is the moving image of the user of the video and audio processing device 110A captured by the image capturing device 113, and the first audio signal is the ambient sound of the video and audio processing device 110A captured by the sound receiving device 115. The processor 117 digitizes the first video signal and the first audio signal to generate the first video data and the first audio data. In addition, the processor 117 can optionally compress the first video data to meet different requirements on subsequent network transmission bandwidth, which is not limited herein.
Then, the processor 117 determines whether the first audio data meets at least one transmission condition (step S204). That is, the processor 117 uses the sound parameters of the first audio data to determine whether the first audio signal captured by the sound receiving device 115 is the speaking voice of the user of the video and audio processing device 110A, and whether that user is a possible speaker of the video conference. The sound parameters may include the reception frequency, reception volume, and reception noise ratio.
Specifically, in this embodiment, since the first audio signal is the ambient sound captured by the sound receiving device 115, the processor 117 first determines whether the reception frequency of the first audio data matches the human voice frequency. If so, the processor 117 directly determines that the user of the video and audio processing device 110A is a possible speaker of the video conference. For example, the frequency range of male speech is roughly 85 to 180 Hz and that of female speech is roughly 165 to 255 Hz, so the processor 117 can determine whether the reception frequency of the first audio data falls within these intervals to determine whether the first audio data corresponds to a human voice. When it does, the processor 117 determines that the first audio data meets the transmission condition.
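The frequency test described above can be sketched in Python as follows. Only the 85–180 Hz and 165–255 Hz intervals come from the text; the function and constant names are illustrative assumptions.

```python
# Voice-band test used in step S204 (illustrative sketch, not the
# patent's implementation). Band limits follow the ranges in the text.
MALE_VOICE_HZ = (85.0, 180.0)     # typical male speaking frequency range
FEMALE_VOICE_HZ = (165.0, 255.0)  # typical female speaking frequency range

def is_human_voice(reception_frequency_hz: float) -> bool:
    """Return True if the dominant reception frequency falls within
    either speaking-voice interval."""
    lo_m, hi_m = MALE_VOICE_HZ
    lo_f, hi_f = FEMALE_VOICE_HZ
    return (lo_m <= reception_frequency_hz <= hi_m or
            lo_f <= reception_frequency_hz <= hi_f)
```

For example, a 120 Hz tone would pass the test, while a 300 Hz tone would not.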
In one embodiment, the processor 117 further determines whether the reception volume of the first audio data is greater than a predetermined volume threshold after determining that the first audio data corresponds to a voice to determine whether the user of the video and audio processing device 110A is a possible speaker of the video conference. Generally, the volume of a normal speaker is about 60dB, and the threshold value of the volume can be preset to 55dB, for example, to reserve an allowable detection error range. In addition, the processor 117 may also change the preset volume threshold according to the surrounding environment of the video conference or manual adjustment of the user of the video and audio processing apparatus 110A at any time. When the reception frequency of the first audio data does correspond to the human voice and the reception volume of the first audio data is greater than the volume threshold, the processor 117 determines that it meets the transmission condition.
In one embodiment, similar in concept to the reception volume, after determining that the first audio data corresponds to a human voice, the processor 117 further determines whether the reception noise ratio of the first audio data is greater than a predetermined noise ratio threshold (for example, 55 dB) to decide whether the user of the video and audio processing device 110A is a possible speaker of the video conference. The processor 117 may also change this predetermined threshold at any time according to the surroundings of the video and audio processing device 110A or manual adjustment by its user. When the reception frequency of the first audio data corresponds to a human voice and its reception noise ratio is greater than the noise ratio threshold, the processor 117 determines that it meets the transmission condition.
In one embodiment, the processor 117 can also combine the reception frequency, the reception volume, and the reception noise ratio of the first audio data to determine whether the user of the video and audio processing device 110A is a possible speaker of the video conference. When the reception frequency of the first audio data corresponds to a human voice, its reception volume is greater than the volume threshold, and its reception noise ratio is greater than the noise ratio threshold, the processor 117 determines that it meets the transmission condition.
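The combined check of this last embodiment might look as follows. The `AudioData` structure and function name are illustrative assumptions; the 55 dB thresholds follow the example values given above.

```python
# Combined transmission-condition check (frequency, volume, and
# noise ratio) as described in the embodiments above. Names and the
# dataclass layout are assumptions for illustration.
from dataclasses import dataclass

@dataclass
class AudioData:
    frequency_hz: float   # dominant reception frequency
    volume_db: float      # reception volume
    snr_db: float         # reception noise ratio

VOLUME_THRESHOLD_DB = 55.0  # example preset from the text
SNR_THRESHOLD_DB = 55.0     # example preset from the text

def meets_transmission_condition(audio: AudioData) -> bool:
    """True when the audio plausibly comes from an active speaker."""
    in_voice_band = (85.0 <= audio.frequency_hz <= 180.0 or
                     165.0 <= audio.frequency_hz <= 255.0)
    return (in_voice_band and
            audio.volume_db > VOLUME_THRESHOLD_DB and
            audio.snr_db > SNR_THRESHOLD_DB)
```

A device would then send both audio and video when the check passes (step S206) and audio only when it fails (step S208).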
Then, when the processor 117 determines that the first audio data meets the transmission condition, this indicates that the user of the video and audio processing device 110A is a possible speaker of the video conference, so the processor 117 transmits the first audio data and the first video data to the host device 120 through the communication circuit 112 (step S206). On the other hand, when the processor 117 determines that the first audio data does not satisfy any transmission condition, the user of the video and audio processing device 110A is not a possible speaker, so the processor 117 transmits only the first audio data to the host device 120 through the communication circuit 112 (step S208), reducing the amount of data transmitted from the local end. It should be noted that the first video data transmitted to the host device 120 is only one candidate video data for subsequent transmission to the remote device 130; that is, the host device 120 still selects one of the candidate video data to transmit to the remote device 130. In addition, the first audio data transmitted to the host device 120 is subsequently integrated with the other audio data. The details are described later.
In addition to receiving the audio data and any video data from the video and audio processing devices 110A-110D, the host device 120 itself also generates audio data and video data. In detail, FIG. 5 is a flow chart illustrating an audio and video processing method according to an embodiment of the invention. The method of the present embodiment is suitable for the host device 120 of fig. 3, and the detailed steps of the audio and video processing method are described below with reference to each component of the host device 120. However, in practice, the method of the present embodiment is also applicable to an electronic device that includes only the communication circuit 122, the image capturing device 123, the sound receiving device 125, the memory 126 and the processor 127, which is not limited herein.
As shown in fig. 3 and 5, first, the processor 127 of the host device 120 receives a second video signal and a second audio signal from the image capturing device 123 and the sound receiving device 125 to generate second video data and second audio data, respectively (step S302), and the processor 127 determines whether the second audio data meets at least one transmission condition (step S304). The way the processor 127 of the host device 120 executes steps S302 and S304 is similar to the way the video and audio processing devices 110A-110D execute steps S202 and S204; for related descriptions, refer to the foregoing paragraphs, which are not repeated herein.
Unlike the embodiment of fig. 4, since the host device 120 itself will perform the integration of the audio data and the screening of the video data later, when the processor 127 determines that the second audio data meets the transmission condition, i.e., the user of the host device 120 is a possible speaker of the video conference, the processor 127 sets the second video data as one of the candidate video data and sets the second audio data as one of the audio data to be integrated (step S306). On the other hand, when the processor 127 determines that the second audio data does not meet any transmission condition, the second audio data is set as only one of the audio data to be integrated (step S308), i.e., the user of the host device 120 is not a possible speaker of the video conference.
After obtaining the audio data to be integrated and the candidate video data of the host device 120 itself and all the video and audio processing devices 110A-110D, the host device will perform integration and screening, respectively, as the transmission data to be transmitted to the remote device 130. In detail, fig. 6 is a flow chart illustrating an audio and video processing method according to an embodiment of the invention. The method of the present embodiment is suitable for the host device 120 of fig. 3, and the following describes the detailed steps of the audio and video processing method with each component in the host device 120.
As shown in fig. 3 and fig. 6, first, the processor 127 of the host device 120 obtains at least one candidate video data and a plurality of audio data to be integrated (step S402). The candidate video data and the audio data to be integrated are the data obtained after the video and audio processing devices 110A-110D execute the flow of FIG. 4 and the host device 120 executes the flow of FIG. 5.
Next, the processor 127 selects one of the candidate video data according to each of the audio data to be integrated, to generate filtered video data (step S404). In detail, the candidate video data are only the video data of possible speakers in the video conference, and the processor 127 compares the audio data to be integrated corresponding to all the candidate video data to select the video data to be transmitted to the remote device 130. Since the audio data to be integrated corresponding to the candidate video data already conform to the human voice frequency, the processor 127 selects the video data to be transmitted according to the sound parameters of the corresponding audio data to be integrated. The sound parameters may include the reception time, reception volume, and reception noise ratio.
In one embodiment, the processor 127 selects, among all the candidate video data, the one whose corresponding audio data to be integrated has the longest reception time as the filtered video data. This embodiment directly regards the user who has been speaking the longest as the current speaker of the video conference.
In one embodiment, the processor 127 selects, among all the candidate video data, one whose corresponding audio data to be integrated has a reception time greater than a time threshold, such as 0.5 seconds, as the filtered video data. That is, the processor 127 regards a user as the current speaker of the video conference only when that user's reception time exceeds the time threshold, so as to prevent the remote device 130 from continuously switching between the video frames of different users within a very short time.
In one embodiment, the processor 127 selects, as the filtered video data, the candidate video data whose corresponding audio data to be integrated has the largest reception volume. This embodiment accounts for the possibility that the users corresponding to the other candidate video data are holding a private discussion rather than speaking to the video conference.
In one embodiment, by a concept similar to the reception volume, the processor 127 selects, as the filtered video data, the candidate video data whose corresponding audio data to be integrated has the largest reception signal-to-noise ratio.
In an embodiment, the processor 127 may also use different combinations of the reception time, the reception volume, and the reception signal-to-noise ratio as the selection basis to make the filtered result more accurate.
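One hedged way to realize such a combination is a weighted score over the three sound parameters; the function, weights, and data layout below are assumptions for illustration, and a real deployment would also normalize the raw values:

```python
# Hypothetical weighted-score selection: combine reception time, reception
# volume, and signal-to-noise ratio into a single ranking and pick the
# candidate video whose corresponding audio scores highest.
def select_filtered_video(candidates, w_time=1.0, w_vol=1.0, w_snr=1.0):
    """candidates: dict video_id -> (time_s, volume, snr). Returns best id."""
    def score(vid):
        t, vol, snr = candidates[vid]
        return w_time * t + w_vol * vol + w_snr * snr
    return max(candidates, key=score)
```

Setting one weight much larger than the others recovers the single-parameter embodiments above (e.g., a dominant `w_time` reproduces the longest-reception-time rule).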
In one embodiment, when there is only a single candidate video data, the processor 127 can directly set it as the filtered video data. In yet another embodiment, when there is no candidate video data, i.e., no user at the local end is speaking, the processor 127 can continue to transmit the video data of the speaker at the previous time point as the filtered video data.
On the other hand, the processor 127 integrates all the audio data to be integrated to generate integrated audio data (step S406). In detail, the processor 127 may perform mixing or denoising on all the audio data to be integrated so that the integrated audio data has better quality. The processor 127 then transmits the integrated audio data and the filtered video data to the remote device 130 through the communication circuit 122 (step S408). In other words, the audio data and the video data that the remote device 130 receives from the host device 120 are the locally integrated sound and the video picture of the speaker, thereby achieving a video conference that focuses on the speaker.
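The mixing in step S406 can be sketched with a plain sample average; this is only an illustrative stand-in, since the patent does not specify a particular mixing or denoising algorithm:

```python
# Illustrative mixing step (a simple sample average, not necessarily the
# mixing process the patent has in mind): combine equal-length PCM streams
# so that every local voice is present in the integrated audio.
def mix_audio(streams):
    """streams: list of equal-length lists of PCM samples; returns the mix."""
    if not streams:
        return []
    length = len(streams[0])
    return [sum(s[i] for s in streams) / len(streams) for i in range(length)]
```

Averaging (rather than summing) keeps the mixed samples within the original dynamic range, at the cost of lowering each individual voice's level as more streams are added.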
Incidentally, the host device 120 also receives remote video data and remote audio data from the remote device 130 through the communication circuit 122 and plays them through the screen 121 and the playing device 124, respectively. In addition, the host device 120 forwards the remote video data and the remote audio data to the video and audio processing devices 110A-110D, which play them through their screens 111 and playing devices 114.
Fig. 7 is a flowchart of an audio and video processing method according to an embodiment of the invention to illustrate an application scenario of the video conference system 100 of fig. 1. FIG. 8 is a diagram illustrating an application scenario of FIG. 7.
As shown in fig. 1 and fig. 8, first, the host device 120 of the video conference system 100 obtains at least one candidate video data and a plurality of audio data to be integrated (step S502). The candidate video data are the video pictures of local device users who may be the speaker, and the audio data to be integrated are the speaking voices of the local device users. Then, the host device 120 determines the local speaker according to the audio data to be integrated corresponding to the candidate video data (step S504) to generate the video data of the speaker (step S506). On the other hand, the host device 120 performs mixing and/or denoising on the audio data to be integrated (step S508) to generate integrated audio data (step S510). Then, the host device 120 transmits the video data of the speaker and the integrated audio data to the remote device (step S512). For details of steps S502 to S512, please refer to the description of the foregoing embodiments, which is not repeated herein.
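The sequence of steps S502 to S512 can be sketched end-to-end under hypothetical data shapes; here the loudest-participant embodiment stands in for step S504, and a sample average stands in for the mixing of steps S508 to S510:

```python
# End-to-end sketch of steps S502-S512 (hypothetical data shapes): the host
# picks the loudest participant's video and averages everyone's audio before
# sending both to the remote device.
def host_process(participants):
    """participants: dict user_id -> {"video": obj, "audio": [samples],
    "volume": float}. Returns (speaker_video, integrated_audio)."""
    # S504-S506: determine the local speaker by largest reception volume.
    speaker = max(participants, key=lambda uid: participants[uid]["volume"])
    # S508-S510: integrate all audio streams into one mixed stream.
    streams = [p["audio"] for p in participants.values()]
    length = len(streams[0])
    integrated = [sum(s[i] for s in streams) / len(streams)
                  for i in range(length)]
    # The returned pair is the payload transmitted in step S512.
    return participants[speaker]["video"], integrated
```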
As shown in fig. 8, in the present embodiment, the video and audio processing devices 110A-110B are each connected to the host device 120 via a local area network (LAN), and the host device 120 is connected to the remote device 130 via a wide area network (WAN). Assume that the host device 120 determines that user C1 of the video and audio processing device 110A is the speaker of FIG. 5 and then transmits the video data of user C1 to the remote device 130. In addition, the host device 120 transmits the audio data M, integrated from the audio data VC1, VC2, and VH of users C1, C2, and H, to the remote device 130, so the remote device 130 plays the video of user C1 together with the sound of users C1, C2, and H. Conversely, the host device 120 also receives the video data and audio data of user R of the remote device 130 and forwards them to the video and audio processing devices 110A-110B, and the video and audio processing devices 110A-110B and the host device 120 simultaneously play the video picture and sound of user R.

Claims (10)

1. A video and audio processing apparatus, comprising: the device comprises a communication circuit, an image acquisition device, a radio device, a memory and a processor, wherein the communication circuit is used for connecting to at least one other video and audio processing device and a remote device; the image capturing device is used for capturing image signals; the radio device is used for capturing sound signals; the memory is used for storing file data; the processor is coupled to the communication circuit, the image capturing device, the radio device and the memory, and is used for executing the following steps: obtaining at least one candidate video data and a plurality of audio data to be integrated, wherein each candidate video data corresponds to one of the audio data to be integrated; selecting one of the candidate video data according to the audio data to be integrated corresponding to each of the candidate video data to generate screened video data; integrating the audio data to be integrated to generate integrated audio data; and transmitting the integrated audio data and the screened video data to the remote device through the communication circuit.
2. A video and audio processing apparatus according to claim 1, wherein: the processor is further configured to perform the following steps: receiving remote video data and remote audio data from the remote device through the communication circuit; and transmitting the remote video data and the remote audio data to each of the other video and audio processing devices through the communication circuit.
3. A video and audio processing apparatus according to claim 2, wherein: further comprising: the screen is used for displaying pictures; and a broadcasting device for broadcasting sound; wherein the processor is further coupled to the screen and the broadcasting device, and is configured to perform the following steps: the remote video data and the remote audio data are played through the screen and the broadcasting device, respectively.
4. A video and audio processing apparatus according to claim 1, wherein: each of the audio data to be integrated includes a reception volume, and the step of the processor selecting one of the candidate video data according to the audio data to be integrated corresponding to each of the candidate video data to generate the screened video data includes: selecting the candidate video data corresponding to the audio data to be integrated with the maximum reception volume as the screened video data; each of the audio data to be integrated includes an audio-to-noise ratio, and the step of the processor selecting one of the candidate video data according to the audio data to be integrated corresponding to each of the candidate video data to generate the screened video data includes: selecting the candidate video data corresponding to the audio data to be integrated with the maximum radio noise ratio as the screened video data; each of the audio data to be integrated includes a reception time, and the step of the processor selecting one of the candidate video data according to the audio data to be integrated corresponding to each of the candidate video data to generate the screened video data includes: selecting the candidate video data corresponding to the audio data to be integrated with the longest reception time as the screened video data.
5. A video and audio processing apparatus according to claim 1, wherein: each of the audio data to be integrated includes a reception time, and the step of the processor selecting one of the candidate video data according to each of the audio data to generate the filtered video data includes: selecting the candidate video data corresponding to the audio data with the reception time greater than the time threshold value as the screened video data.
6. A video and audio processing apparatus according to claim 1, wherein: the step of the processor integrating the audio data to be integrated to generate the integrated audio data comprises: performing mixing processing or denoising processing on the audio data to be integrated to generate the integrated audio data; the processor is further configured to perform the following steps: receiving a second video signal and a second audio signal from the image capturing device and the radio device, respectively, to generate second video data and second audio data; determining whether a sound parameter of the second audio data meets a transmission standard; if yes, setting the second video data and the second audio data as one of the candidate video data and one of the audio data to be integrated, respectively; and if not, setting only the second audio data as one of the audio data to be integrated.
7. A video and audio processing apparatus according to claim 6, wherein: the second audio data includes an audio frequency, and the step of the processor determining whether the sound parameter of the second audio data meets the transmission standard includes: determining whether the audio frequency conforms to the human voice frequency; the second audio data further includes the reception volume of the radio device, and the step of the processor determining whether the sound parameter of the second audio data meets the transmission standard further includes: determining whether the reception volume is greater than a volume threshold; the second audio data further includes the reception noise ratio of the radio device, and the step of the processor determining whether the sound parameter of the second audio data meets the transmission standard further includes: determining whether the reception noise ratio is greater than a reception noise ratio threshold.
8. A video conferencing system, comprising: at least one client device, wherein each client device respectively generates client video data and client audio data; and a host device connected to each client device and a remote device, wherein the host device obtains at least one candidate video data and a plurality of audio data to be integrated, selects one of the candidate video data according to each audio data to be integrated to generate screened video data, integrates the audio data to be integrated to generate integrated audio data, and transmits the integrated audio data and the screened video data to the remote device, wherein the audio data to be integrated comprise the client audio data and host audio data, the candidate video data comprise at least one of the client video data and host video data, and each candidate video data corresponds to one of the audio data to be integrated.
9. A video conferencing system as in claim 8, wherein: for each of the client devices: the client device determines whether the sound parameters of the audio data of the client meet the sound receiving standard; if yes, the client device transmits the client audio data and the client video data to the host device; and if not, the client device only transmits the client audio data to the host device.
10. A video conferencing system as in claim 8, wherein: the host device further receives remote video data and remote audio data from the remote device and transmits the remote video data and the remote audio data to each of the client devices.
CN201810971935.2A 2018-08-24 2018-08-24 Video and audio processing device and video conference system thereof Withdrawn CN111541856A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810971935.2A CN111541856A (en) 2018-08-24 2018-08-24 Video and audio processing device and video conference system thereof

Publications (1)

Publication Number Publication Date
CN111541856A true CN111541856A (en) 2020-08-14

Family

ID=71976623




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200814