CN116312510A

CN116312510A - Near-end control method of far-end conference device, remote conference system and related device

Info

Publication number: CN116312510A
Application number: CN202310048331.1A
Authority: CN
Inventors: 王欢良; 张李; 李霄; 唐浩元; 肖佳林
Original assignee: Suzhou Qimengzhe Technology Co ltd
Current assignee: Suzhou Qimengzhe Technology Co ltd
Priority date: 2023-01-31
Filing date: 2023-01-31
Publication date: 2023-06-23

Abstract

The application provides a near-end control method of a far-end conference device, a remote conference system and related devices. In the method, the near-end conference device, in response to a control operation of the user, embeds the corresponding control instruction into the voice data stream as an audio watermark by means of an audio watermark generation algorithm, and transmits it to the far-end conference device through an uplink channel. The far-end conference device receives the voice data stream from the downlink channel and continuously detects whether it contains an audio watermark; if so, it extracts and parses the control instruction contained in the audio watermark by means of an audio watermark extraction algorithm and executes the corresponding adjustment action. Because the control instruction is transmitted by audio watermarking, the sound pickup volume and sound pickup quality of the far-end conference device can be controlled directly from the near-end conference device, which improves the user experience of the remote conference. The audio watermark does not affect the call, and no additional network transmission control protocol is required.

Description

Near-end control method of far-end conference device, remote conference system and related device

Technical Field

The present invention relates to teleconferencing, and more particularly, to a method for controlling a teleconferencing device, a teleconferencing system, and related apparatuses.

Background

Teleconferencing has become an important way of working. Teleconferencing devices come in a wide variety but can generally be divided into two categories: those with an operation interface and those without. The former include mobile phones, personal computers (PCs), and the like; the latter include conventional hands-free phones, conference boxes, and the like.

With any conference device, a common problem arises: when the gain setting of the far-end device is too low, the far-end speaker talks quietly, or the far-end speaker is far from the sound pickup device, the near end has difficulty hearing what the far end is saying, yet the far-end speaker is unaware of this. The usual practice is for the near end to ask the far end to adjust, often repeatedly. This wastes time and makes for a poor user experience. In some situations the near end cannot interrupt the far-end speaker and can only put up with not hearing clearly, which greatly degrades the conference (especially during long speeches at the far end).

In addition, two other situations affect the sound quality received by the near end: one is reverberation or noise in the far-end environment, which makes the far-end speech hard to hear at the near end; the other is insufficient network bandwidth, where packet loss or delay makes the far-end speech intermittent.

Disclosure of Invention

The invention aims to provide a near-end control method of a far-end conference device, a remote conference system and a related device, so as to realize the direct control of the sound pick-up volume and quality of the far-end device from the near end, thereby improving the user experience.

The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.

According to a first aspect of the present invention, there is provided a near-end control method of a far-end conference apparatus, performed by the near-end conference apparatus, comprising: according to the control operation of the user, the corresponding control instruction is embedded into the voice data stream in an audio watermarking mode and is transmitted to the far-end conference device through an uplink channel.

In an embodiment, the control instructions include volume adjustment instructions, noise reduction instructions, and/or compression ratio adjustment instructions.

In an embodiment, the audio watermark further includes ID information of the remote conference device.

In one embodiment, the audio watermark is circularly embedded N times in the voice data stream at equal intervals.

According to a second aspect of the present invention, there is provided a near-end control method of a far-end conference apparatus, performed by the far-end conference apparatus, comprising: receiving a voice data stream from a downlink channel; continuously detecting whether the voice data stream contains an audio watermark; if the voice data stream contains the audio watermark, extracting and parsing a control instruction contained in the audio watermark; and executing a corresponding adjustment action according to the control instruction.

In one embodiment:

a, when the control instruction includes a volume adjustment instruction, the adjustment action includes:

a1, firstly, the recording gain of equipment is increased;

a2, if the recording gain is already at its maximum and the far-end conference device is configured with a microphone array pickup function, attempting to turn on the microphone array pickup function;

b, when the control instruction includes a noise reduction instruction, the adjusting action includes:

b1, firstly, trying to start a microphone array voice enhancement function;

b2, if the remote equipment is not provided with the microphone array voice enhancement function and is provided with the single-channel noise suppression function, attempting to start the single-channel noise suppression function;

b3, if the remote equipment is not provided with a microphone array voice enhancement function and a single-channel noise suppression function, attempting to reduce the recording gain of the equipment;

c, when the control instruction comprises a compression rate adjustment instruction, the adjustment action comprises:

c1, firstly, attempting to adjust the data transmission rate;

c2, if the data transmission rate reaches the limit, attempting to increase the data compression ratio.

In an embodiment, when the ID information is included in the audio watermark, after detecting that the audio watermark is included, before extracting and parsing the control instruction included in the audio watermark, the method further includes:

extracting and parsing the ID information contained in the audio watermark,

judging whether the ID information is identical to the device's own ID,

and if so, performing the subsequent steps.

In one embodiment, the voice data stream is determined to contain the audio watermark only when M audio watermarks are detected consecutively.

According to a third aspect of the present invention, there is provided a near-end conference apparatus comprising:

the control module is used for sending out corresponding control instructions according to control operations of users;

the embedding module is used for embedding the control instruction into the voice data stream in an audio watermarking manner;

and the sending module is used for transmitting the voice data stream to the far-end conference device through an uplink channel.

According to a fourth aspect of the present invention, there is provided a remote conference apparatus comprising:

a receiving module, configured to receive a voice data stream from a downlink channel, and continuously detect whether an audio watermark is included therein;

the analysis module is used for extracting and analyzing a control instruction contained in the audio watermark when the voice data stream is detected to contain the audio watermark;

and the execution module is used for executing corresponding adjusting actions according to the control instructions.

According to a fifth aspect of the present invention there is provided a teleconferencing system comprising at least one near-end conference device and at least one far-end conference device, each of said near-end conference devices being in communicative connection with at least one of said far-end conference devices, said near-end conference device being adapted to perform the method according to any of the first aspects and said far-end conference device being adapted to perform the method according to any of the second aspects.

The embodiment of the invention has the beneficial effects that: if the voice volume transmitted by a far-end conference device is too small or the voice quality is poor, a specific instruction can be transmitted to the selected far-end conference device through the near-end conference device, and after the far-end conference device receives the instruction, the sound pickup configuration parameters of the equipment are adjusted, so that the sound pickup volume and the sound pickup quality are improved. By converting the control instruction into the audio watermark and transmitting the audio watermark together with the call voice through the uplink channel, the call is not affected, and the network transmission control protocol is not required to be additionally added. In addition, the audio watermark generation algorithm adds an audio watermark to audio without affecting the audio hearing, and has certain robustness to various signal processing.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

The above features and advantages of the present invention will be better understood after reading the detailed description of embodiments of the present disclosure in conjunction with the following drawings. In the drawings, the components are not necessarily to scale and components having similar related features or characteristics may have the same or similar reference numerals.

Fig. 1 is a schematic diagram of a teleconferencing system in accordance with an embodiment of the present application;

Fig. 2 is a timing flow diagram of a method embodiment of the present application;

Fig. 3 is a schematic block diagram of an embodiment of the apparatus of the present application.

Detailed Description

The invention is described in detail below with reference to the drawings and the specific embodiments. It is noted that the aspects described below in connection with the drawings and the specific embodiments are merely exemplary and should not be construed as limiting the scope of the invention in any way.

As shown in Fig. 1, an embodiment of the present application provides a teleconference system composed of a near-end conference device and a far-end conference device, which are communicatively connected via a network, a mobile signal, or the like, so as to transmit audio data. The near-end conference device and the far-end conference device may be mobile phones, computers, hands-free phones, conference boxes, and the like, and are each provided with a speaker and a microphone.

Alternatively, the microphone of the far-end conference device may be a microphone array and support microphone signal processing algorithms with far-field pickup, voice enhancement and reverberation suppression functions. The far-end conference device can also support a single-channel signal processing algorithm and has the functions of noise reduction and reverberation suppression.

Based on the above system, the present embodiment provides a near-end control method of a far-end conference device, where when the volume of voice transmitted from a certain far-end conference device is too small or the sound quality is poor, a user can set a control instruction to be sent through a specific key on the near-end conference device.

The method is shown in Fig. 2 and specifically comprises the following steps:

The near-end conference device, in response to the control operation of the user, embeds the corresponding control instruction into the voice data stream as an audio watermark by means of an audio watermark generation algorithm, and transmits it to the far-end conference device through an uplink channel.

The far-end conference device receives the voice data stream from the downlink channel and continuously detects whether it contains an audio watermark; if so, it extracts and parses the control instruction contained in the audio watermark by means of an audio watermark extraction algorithm and executes the corresponding adjustment action according to the control instruction.

Audio watermarking, the technology adopted by the method, is an important branch of digital watermarking. It mainly exploits the characteristics of the human auditory system (HAS) to embed hidden information into a carrier signal, so that the information is not easily perceived and neither the normal use nor the listening quality of the original audio is affected.

Audio watermarking algorithms are by now mature. Existing watermark embedding algorithms mainly include: the least significant bit (LSB) method, echo hiding, phase coding, spread spectrum, and transform-domain methods based on the discrete cosine transform (DCT), the discrete wavelet transform (DWT), and the discrete Fourier transform (DFT), among others.
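As an illustration of the simplest method named above, the following is a minimal sketch of LSB embedding and extraction on integer PCM samples. The function names `embed_lsb` and `extract_lsb` are hypothetical; the application does not prescribe a specific algorithm. Plain LSB embedding is unlikely to survive lossy voice codecs, so this only illustrates the principle.

```python
from typing import List


def embed_lsb(samples: List[int], payload: bytes) -> List[int]:
    """Hide payload bits in the least significant bit of successive samples."""
    # Unpack the payload into bits, most significant bit of each byte first.
    bits = [(byte >> (7 - i)) & 1 for byte in payload for i in range(8)]
    if len(bits) > len(samples):
        raise ValueError("payload does not fit in the given samples")
    marked = list(samples)
    for i, bit in enumerate(bits):
        # Clear the lowest bit, then set it to the payload bit.
        marked[i] = (marked[i] & ~1) | bit
    return marked


def extract_lsb(samples: List[int], num_bytes: int) -> bytes:
    """Read num_bytes back from the least significant bits of the samples."""
    out = bytearray()
    for b in range(num_bytes):
        value = 0
        for i in range(8):
            value = (value << 1) | (samples[b * 8 + i] & 1)
        out.append(value)
    return bytes(out)
```

Each sample changes by at most one quantization step, which is why the mark is hard to hear, and also why any re-quantization of the audio can erase it.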

Existing audio watermarking techniques are commonly used for copyright protection and for tracking and tracing, to prevent audio from being leaked or stolen. The present method instead uses audio watermarking to transmit control instructions. Adding the audio watermark to the audio through the audio watermark generation algorithm does not affect how the audio sounds and offers a certain robustness to various kinds of signal processing. Because the control instruction is converted into an audio watermark and transmitted together with the call voice through the uplink channel, the call is not affected and no additional network transmission control protocol is required.
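To make the robustness point concrete, here is a hedged sketch of a spread-spectrum watermark, one of the method families listed earlier. Each payload bit is added to one frame of audio as a low-amplitude pseudo-noise sequence and recovered by correlation. The names `pn_sequence`, `embed_bits` and `detect_bits`, and the parameters `frame`, `alpha` and `seed`, are illustrative assumptions rather than anything specified in the application; a practical system would additionally shape the watermark psychoacoustically.

```python
import random
from typing import List


def pn_sequence(length: int, seed: int) -> List[int]:
    """Pseudo-noise sequence of +1/-1 values shared by embedder and detector."""
    rng = random.Random(seed)
    return [rng.choice((-1, 1)) for _ in range(length)]


def embed_bits(host: List[float], bits: List[int], frame: int,
               alpha: float, seed: int) -> List[float]:
    """Add one scaled PN sequence per bit; the sign of the sequence carries the bit."""
    if len(bits) * frame > len(host):
        raise ValueError("host signal too short for the payload")
    pn = pn_sequence(frame, seed)
    out = list(host)
    for k, bit in enumerate(bits):
        sign = 1.0 if bit else -1.0
        for n in range(frame):
            out[k * frame + n] += alpha * sign * pn[n]
    return out


def detect_bits(signal: List[float], num_bits: int, frame: int,
                seed: int) -> List[int]:
    """Correlate each frame with the PN sequence; a positive sum decodes as 1."""
    pn = pn_sequence(frame, seed)
    bits = []
    for k in range(num_bits):
        corr = sum(signal[k * frame + n] * pn[n] for n in range(frame))
        bits.append(1 if corr > 0 else 0)
    return bits
```

Because the decision is based on a sum over the whole frame, moderate additive noise tends to average out, at the cost of a low data rate (one bit per frame).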

The control instruction at least comprises a volume adjusting instruction, a noise reduction instruction and/or a compression rate adjusting instruction.

A, when the control instruction comprises a volume adjustment instruction, the adjustment actions of the far-end conference device comprise:

a1, firstly, the recording gain of equipment is increased;

a2, if the recording gain is already at its maximum and the far-end conference device is configured with a microphone array pickup function, attempting to turn on the microphone array pickup function.

B, when the control instruction comprises a noise reduction instruction, the adjustment actions of the far-end conference device comprise:

b1, firstly, trying to start a microphone array voice enhancement function;

b2, if the far-end device is not configured with the microphone array voice enhancement function but is configured with a single-channel noise suppression function, attempting to start the single-channel noise suppression function;

b3, if the far-end device is configured with neither the microphone array voice enhancement function nor the single-channel noise suppression function, attempting to lower the device recording gain.

C, when the control instruction comprises a compression rate adjustment instruction, the adjustment actions of the far-end conference device comprise:

c1, firstly, attempting to adjust the data transmission rate;

c2, if the data transmission rate reaches the limit, attempting to increase the data compression ratio.
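The fallback order of the adjustment actions can be sketched as follows. This is only an illustration under stated assumptions: the `FarEndDevice` fields, the step sizes, and the choice to lower the bitrate in the rate-adjustment step are hypothetical, since the application does not specify in which direction the data transmission rate is adjusted or what its limit is.

```python
from dataclasses import dataclass
from enum import Enum


class Command(Enum):
    VOLUME = 1
    NOISE_REDUCTION = 2
    COMPRESSION = 3


@dataclass
class FarEndDevice:
    # All fields and default values are illustrative assumptions.
    gain: int = 5
    max_gain: int = 10
    min_gain: int = 0
    has_array_pickup: bool = False
    has_array_enhancement: bool = False
    has_single_channel_ns: bool = False
    array_pickup_on: bool = False
    array_enhancement_on: bool = False
    single_channel_ns_on: bool = False
    bitrate_kbps: int = 32
    min_bitrate_kbps: int = 16
    bitrate_step_kbps: int = 16
    compression_ratio: int = 4

    def apply(self, command: Command) -> str:
        """Execute one adjustment step and return a label naming the step taken."""
        if command is Command.VOLUME:
            if self.gain < self.max_gain:
                self.gain += 1
                return "a1: raised recording gain"
            if self.has_array_pickup:
                self.array_pickup_on = True
                return "a2: enabled microphone array pickup"
            return "none: no further volume action available"
        if command is Command.NOISE_REDUCTION:
            if self.has_array_enhancement:
                self.array_enhancement_on = True
                return "b1: enabled microphone array voice enhancement"
            if self.has_single_channel_ns:
                self.single_channel_ns_on = True
                return "b2: enabled single-channel noise suppression"
            if self.gain > self.min_gain:
                self.gain -= 1
                return "b3: lowered recording gain"
            return "none: no further noise reduction action available"
        if command is Command.COMPRESSION:
            # Assumption: "adjusting" the rate means lowering it toward a floor.
            if self.bitrate_kbps > self.min_bitrate_kbps:
                self.bitrate_kbps = max(self.min_bitrate_kbps,
                                        self.bitrate_kbps - self.bitrate_step_kbps)
                return "c1: adjusted data transmission rate"
            self.compression_ratio += 1
            return "c2: increased data compression ratio"
        raise ValueError(f"unknown command: {command!r}")
```

Returning a label for the step taken keeps the sketch easy to test and would let a device log which fallback was actually used.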

In a possible embodiment, since one near-end conference device may be connected to a plurality of far-end conference devices, in order to control a specified far-end conference device, the audio watermark may further include ID information of the specified far-end conference device, and after detecting that the audio watermark is included, the far-end conference device first extracts and parses the ID information included in the audio watermark to determine whether the ID information is the same as the ID of the far-end conference device itself, and if so, performs the subsequent steps.
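A possible watermark payload carrying both the device ID and the control instruction is sketched below. The layout (a sync byte, a 16-bit device ID, a one-byte command code and an XOR checksum) and the names `pack_payload` and `parse_payload` are assumptions made for illustration; the application does not define a payload format.

```python
import struct
from typing import Optional

SYNC = 0xA5  # hypothetical marker byte identifying a control payload


def _xor_checksum(data: bytes) -> int:
    """XOR of all bytes; a simple integrity check for short payloads."""
    checksum = 0
    for byte in data:
        checksum ^= byte
    return checksum


def pack_payload(device_id: int, command: int) -> bytes:
    """Build the 5-byte payload: sync, 16-bit device ID, command, checksum."""
    body = struct.pack(">BHB", SYNC, device_id, command)
    return body + bytes([_xor_checksum(body)])


def parse_payload(data: bytes, own_id: int) -> Optional[int]:
    """Return the command code if the payload is valid and addressed to own_id."""
    if len(data) != 5:
        return None
    sync, device_id, command, checksum = struct.unpack(">BHBB", data)
    if sync != SYNC or checksum != _xor_checksum(data[:4]):
        return None
    # ID check comes before the command is acted on, as the text describes.
    if device_id != own_id:
        return None
    return command
```

A far-end device would call `parse_payload` with its own ID on every extracted payload and ignore any result of `None`, so that devices not addressed by the watermark take no action.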

Further, to increase robustness, the near-end conferencing device may embed the audio watermark into the voice data stream N times (e.g., 3 times) in equally spaced cycles. When M (e.g., 2) audio watermarks are detected consecutively, it is determined that an audio watermark is detected.
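The repeat-and-confirm scheme can be sketched as follows, assuming the far end evaluates the stream frame by frame and treats two detections as consecutive when they carry the same payload and are no more than `max_gap` frames apart. The names `embed_schedule` and `ConsecutiveDetector` and the `max_gap` parameter are hypothetical; the application only states N, M and equal intervals.

```python
from typing import List, Optional


def embed_schedule(start: int, interval: int, n: int) -> List[int]:
    """Frame indices at which the near end embeds the watermark N times."""
    return [start + i * interval for i in range(n)]


class ConsecutiveDetector:
    """Confirms a watermark once M matching detections arrive in a row."""

    def __init__(self, m: int, max_gap: int) -> None:
        self.m = m
        self.max_gap = max_gap
        self._count = 0
        self._last_frame: Optional[int] = None
        self._last_payload: Optional[bytes] = None

    def feed(self, frame_index: int, payload: Optional[bytes]) -> Optional[bytes]:
        """Feed one frame result; payload is None when no watermark was found."""
        if payload is None:
            return None
        contiguous = (
            self._last_frame is not None
            and payload == self._last_payload
            and frame_index - self._last_frame <= self.max_gap
        )
        self._count = self._count + 1 if contiguous else 1
        self._last_frame = frame_index
        self._last_payload = payload
        # Fire exactly once, on the M-th consecutive detection.
        return payload if self._count == self.m else None
```

With N = 3 and M = 2, this sketch still confirms the instruction if the first or the last copy is lost, but not if the middle copy is lost, because the remaining two detections are then no longer adjacent.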

As shown in fig. 3, the embodiment of the present application further provides a near-end conference device and a far-end conference device, where the near-end conference device 310 includes:

the control module 311 is configured to issue a corresponding control instruction according to a control operation of a user;

an embedding module 312 for embedding the control instructions in the voice data stream in the form of an audio watermark;

a sending module 313, configured to transmit the voice data stream to the far-end conference device through an uplink channel.

The far-end conference device 320 includes:

a receiving module 321, configured to receive the voice data stream from the downlink channel and continuously detect whether the voice data stream contains an audio watermark;

the parsing module 322 is configured to extract and parse a control instruction included in the audio watermark when it is detected that the voice data stream includes the audio watermark;

the execution module 323 is configured to execute a corresponding adjustment action according to the control instruction.

The modules can be integrated into the existing conference device in the form of software or a combination of software and hardware.

The control operation of the user can be input into the near-end conference device through a dedicated key or an operation interface. If the near-end conference device has no operation interface and therefore cannot select among multiple far-end devices, only control of a single far-end device is supported. If the near-end conference device has an operation interface, it can hold one-to-many remote conferences with a plurality of far-end conference devices through a network. The control process is described in detail below with two specific embodiments.

Example 1

Two conference devices without an operation interface are connected through a network for a one-to-one remote conference. One is called the near-end conference device and the other the far-end conference device. The near-end conference device is provided with a dedicated key, called the far-end volume adjustment key.

During the conference, the user may press the dedicated key if the volume of the speech received by the near-end device is too low. After the key is pressed, the system embeds the volume adjustment instruction as an audio watermark into the uplink channel of the near-end conference device during the subsequent conversation, cyclically 3 times at equal intervals.

The far-end conference device system continuously detects whether the audio received from the downlink channel contains an audio watermark. If it does, the control instruction contained in the watermark is extracted and parsed. For the volume adjustment instruction, the far-end conference device system first increases the recording gain of the device; if the recording gain is already at its maximum and the far-end device is configured with a microphone array pickup function, the system turns that function on. In this way, the volume of the voice that the near-end conference device receives from the far end increases automatically.

Example 2

The near-end conference device is provided with an operation interface and holds a one-to-many remote conference with a plurality of far-end conference devices through a network. The system of the near-end conference device supports three control functions: far-end volume adjustment, far-end signal noise reduction, and far-end compression rate adjustment. On the control interface, the three functions correspond to three buttons. In addition, all far-end conference devices that have established a connection with the near-end conference device are listed on the control interface.

During a conference, if the volume of the voice that the near-end device receives from a far-end conference device is too low, the user may first select that far-end conference device on the control interface and then press the far-end volume adjustment button. The near-end conference device system then embeds the ID of the far-end conference device together with the volume adjustment instruction as an audio watermark into the audio stream of the uplink channel, cyclically 3 times at equal intervals.

Each far-end conference device system continuously detects whether the audio received from the downlink channel contains an audio watermark. If the watermark is detected twice in succession, the device ID and the control instruction contained in the watermark are extracted and parsed. If the extracted device ID is the same as the device's own ID, the control instruction is executed.

For the volume adjustment instruction, the far-end conference device system first increases the recording gain of the device; if the recording gain is already at its maximum and the far-end device is configured with a microphone array pickup function, the system turns that function on. In this way, the volume of the voice that the near-end conference device receives from the far-end conference device increases automatically.

Likewise, if the voice that the near-end device receives from a far-end conference device is too noisy, the user may first select that far-end conference device on the control interface and then press the far-end signal noise reduction button. The instruction is then embedded together with the device ID as an audio watermark into the audio stream of the uplink channel, cyclically 3 times at equal intervals. Each far-end conference device system continuously detects whether the audio received from the downlink channel contains an audio watermark. If the watermark is detected twice in succession, the device ID and the control instruction contained in the watermark are extracted and parsed. If the extracted device ID is the same as the device's own ID, the signal noise reduction instruction is executed.

Likewise, if the voice that the near-end device receives from a far-end conference device is intermittent, the user may first select that far-end conference device on the control interface and then press the far-end compression rate adjustment button. The instruction is then embedded together with the device ID as an audio watermark into the audio stream of the uplink channel, cyclically 3 times at equal intervals. Each far-end conference device system continuously detects whether the audio received from the downlink channel contains an audio watermark. If the watermark is detected twice in succession, the device ID and the control instruction contained in the watermark are extracted and parsed. If the extracted device ID is the same as the device's own ID, the compression rate adjustment instruction is executed.

In summary, the present application provides a near-end control method, a remote conference system and related devices for a remote conference device, which directly controls sound pickup volume and quality of the remote conference device from the near-end conference device by transmitting a control instruction by adopting an audio watermarking technology, so as to improve user experience of the remote conference. Because the audio watermarking technology is adopted, the communication is not affected, and the network transmission control protocol is not required to be additionally added. Meanwhile, the method and the system are also suitable for conference equipment with an operation interface and without an operation interface, do not depend on specific conference hardware and conference software, and can be integrated into the existing conference system for application.

In the present specification, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for identical or similar parts the embodiments may be referred to one another.

The previous description of the disclosure is provided to enable any person skilled in the art to make or use the disclosure. Various modifications to the disclosure will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other variations without departing from the spirit or scope of the disclosure. Thus, the disclosure is not intended to be limited to the examples and designs described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description is of the preferred embodiment of the present application and is not intended to limit the invention to the particular form disclosed; on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the invention.

Claims

1. A near-end control method of a far-end conference apparatus, which is executed by the near-end conference apparatus, comprising:

according to the control operation of the user, the corresponding control instruction is embedded into the voice data stream in an audio watermarking mode and is transmitted to the far-end conference device through an uplink channel.

2. The method according to claim 1, wherein the control instruction includes a volume adjustment instruction, a noise reduction instruction, and/or a compression rate adjustment instruction.

3. The method of claim 1, wherein the audio watermark further includes ID information of the remote conference device.

4. The method of claim 1, wherein the audio watermark is cyclically embedded N times in the voice data stream at equal intervals.

5. A near-end control method of a far-end conference apparatus, which is executed by the far-end conference apparatus, comprising:

receiving a voice data stream from a downlink channel, continuously detecting whether the voice data stream contains an audio watermark, extracting and parsing a control instruction contained in the audio watermark if the voice data stream contains the audio watermark, and executing a corresponding adjustment action according to the control instruction.

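6. The method for controlling a remote conference device according to claim 5, wherein,

a, when the control instruction includes a volume adjustment instruction, the adjustment action includes:

a1, firstly, the recording gain of equipment is increased;

a2, if the recording gain is already at its maximum and the far-end conference device is configured with a microphone array pickup function, attempting to turn on the microphone array pickup function;

b, when the control instruction includes a noise reduction instruction, the adjustment action includes:

b1, firstly, trying to start a microphone array voice enhancement function;

b2, if the far-end device is not configured with the microphone array voice enhancement function but is configured with a single-channel noise suppression function, attempting to start the single-channel noise suppression function;

b3, if the far-end device is configured with neither the microphone array voice enhancement function nor the single-channel noise suppression function, attempting to reduce the recording gain of the equipment;

c, when the control instruction includes a compression rate adjustment instruction, the adjustment action includes:

c1, firstly, attempting to adjust the data transmission rate;

c2, if the data transmission rate reaches the limit, attempting to increase the data compression ratio.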

7. The method according to claim 5, wherein when the ID information is included in the audio watermark, after detecting that the audio watermark is included and before extracting and parsing the control instruction included in the audio watermark, the method further comprises:

extracting and parsing the ID information contained in the audio watermark,

judging whether the ID information is identical to the device's own ID,

and if so, performing the subsequent steps.

8. The method according to claim 5, wherein the voice data stream is determined to contain the audio watermark only when M audio watermarks are detected consecutively.

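9. A near-end conferencing device, comprising:

the control module is used for sending out corresponding control instructions according to control operations of users;

the embedding module is used for embedding the control instruction into the voice data stream in an audio watermarking manner;

and the sending module is used for transmitting the voice data stream to the far-end conference device through an uplink channel.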

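10. A remote conference device, comprising:

a receiving module, configured to receive a voice data stream from a downlink channel, and continuously detect whether an audio watermark is included therein;

the analysis module is used for extracting and analyzing a control instruction contained in the audio watermark when the voice data stream is detected to contain the audio watermark;

and the execution module is used for executing corresponding adjusting actions according to the control instructions.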

11. A teleconferencing system, characterized by: comprising at least one near-end conference device and at least one far-end conference device, each of said near-end conference devices being communicatively connected to at least one of said far-end conference devices, said near-end conference devices being adapted to perform the method according to any of claims 1-4, said far-end conference devices being adapted to perform the method according to any of claims 5-8.