CN112637543A

CN112637543A - Audio and video conference method and device based on voice control

Info

Publication number: CN112637543A
Application number: CN202011473345.0A
Authority: CN
Inventors: 祝素伟; 苏占峰; 舒骋
Original assignee: Suirui Technology Group Co Ltd
Current assignee: Suirui Technology Group Co Ltd
Priority date: 2020-12-09
Filing date: 2020-12-09
Publication date: 2021-04-09

Abstract

The invention discloses an audio and video conference method and device based on voice control. The method comprises the following steps: when a voice module in a standby state detects voice containing wake-up information, starting the audio/video device; acquiring the voiceprint features in the voice containing the wake-up information; and comparing those voiceprint features with a pre-stored voiceprint feature library and, if they belong to the data in the library, storing them on the audio/video device and setting the voiceprint permission corresponding to them. The method and device allow the audio/video device to be started by voice, make conference control more convenient, and allow a voiceprint permission to be set for the speaker.

Description

Audio and video conference method and device based on voice control

Technical Field

The invention relates to the field of audio and video conferences, in particular to an audio and video conference method and device based on voice control.

Background

Existing intelligent audio and video conference equipment must be operated manually every time it is used, whether to start it, shut it down, or perform other control operations. People unfamiliar with the equipment may need to spend time working out how to operate it, which is inconvenient and likely to delay the conference.

The information disclosed in this background section is only for enhancement of understanding of the general background of the invention and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.

Disclosure of Invention

The invention aims to provide an audio and video conference method and device based on voice control, which can realize the starting of audio and video equipment through voice, enhance the convenience of conference control and set the voiceprint permission of a speaker.

In order to achieve the above object, the present invention provides an audio/video conference method based on voice control, comprising: when a voice module in a standby state detects voice containing wake-up information, starting the audio/video device; acquiring the voiceprint features in the voice containing the wake-up information; and comparing those voiceprint features with a pre-stored voiceprint feature library and, if they belong to the data in the pre-stored voiceprint feature library, storing them on the audio/video device and setting the voiceprint permission corresponding to them.

In an embodiment of the present invention, the audio/video conference method based on voice control further includes: after the audio/video device is started, the voice module detects voice information in real time; when voice containing control information is detected, acquiring the voiceprint features in that voice; comparing those voiceprint features with the voiceprint features stored locally on the audio/video device and, if they match, acquiring the corresponding voiceprint permission, otherwise ignoring the voice containing the control information; and judging, according to that voiceprint permission, whether those voiceprint features have the corresponding control authority and, if so, executing the control operation in the control information on the audio/video device, otherwise ignoring the voice containing the control information.

In an embodiment of the present invention, the audio/video conference method based on voice control further includes: after the audio/video device is started, the voice module detects voice information in real time; when voice containing control information is detected, acquiring the voiceprint features in that voice; comparing those voiceprint features with a pre-stored voiceprint feature library and, if they belong to the data in the library, acquiring the corresponding voiceprint permission from a pre-stored voiceprint permission library, otherwise ignoring the voice containing the control information; and judging, according to that voiceprint permission, whether those voiceprint features have the corresponding control authority and, if so, executing the control operation in the control information on the audio/video device, otherwise ignoring the voice containing the control information.

In an embodiment of the present invention, the audio/video conference method based on voice control further includes: after the control operation of starting the video conference has been executed on the audio/video device, the voice module detects voice during the video conference in real time and acquires the corresponding voiceprint features; and comparing those voiceprint features with a pre-stored voiceprint feature library and, if they belong to the data in the library, converting the voice into text content for storage, otherwise returning prompt information to the audio/video device, the prompt information indicating that the identity of the current speaker needs to be authenticated and confirmed.

In an embodiment of the present invention, the audio/video conference method based on voice control further includes: when the operation of ending the conference is executed on the audio/video device, the stored text content is organized into a conference summary and sent by e-mail to the mailboxes of the conference participants and/or designated persons; the audio/video device then shuts down, and the voice module returns to the standby state.

Based on the same inventive concept, the invention also provides an audio and video conference device based on voice control, which comprises a voice module and an audio/video device. The audio/video device is coupled with the voice module, and when the audio/video device is powered off, the voice module is in a standby state. The voice module is used for starting the audio/video device when voice containing wake-up information is first detected in the standby state; the audio/video device is used for acquiring the voiceprint features in the voice containing the wake-up information; and the audio/video device is also used for comparing those voiceprint features with a pre-stored voiceprint feature library and, if they belong to the data in the library, storing them on the audio/video device and setting the voiceprint permission corresponding to them.

In an embodiment of the present invention, the voice module is further configured to detect voice information in real time after the audio/video device is turned on; the audio/video device is further used for acquiring the voiceprint features in voice containing control information when the voice module detects such voice; the audio/video device is also used for comparing those voiceprint features with the voiceprint features stored locally on the audio/video device and, if they match, acquiring the corresponding voiceprint permission, otherwise ignoring the voice containing the control information; and the audio/video device is further used for judging, according to that voiceprint permission, whether those voiceprint features have the corresponding control authority and, if so, performing the corresponding operation according to the control information, otherwise ignoring the voice containing the control information.

In an embodiment of the present invention, the voice module is further configured to detect voice during the video conference in real time after the control operation of starting the video conference has been executed on the audio/video device; the audio/video device is also used for acquiring the voiceprint features corresponding to the voice during the video conference; and the audio/video device is also used for comparing those voiceprint features with a pre-stored voiceprint feature library and, if they belong to the data in the library, converting the voice into text content for storage, otherwise returning prompt information to the audio/video device, the prompt information indicating that the identity of the current speaker needs to be authenticated and confirmed.

In an embodiment of the present invention, the audio/video device is further configured to, when the operation of ending the current conference is executed, organize the stored text content into a conference summary, send the conference summary by e-mail to the mailboxes of the participants and/or designated persons, and then shut down; the voice module is also used for falling back to the standby state when the audio/video device is powered off.

Based on the same inventive concept, the present invention further provides a computer-readable storage medium, where a computer program is stored, and the computer program is configured to execute the audio and video conference method based on voice control according to any of the above embodiments.

Compared with the prior art, in the audio and video conference method and device based on voice control, the voice module in standby wakes the audio/video conference device directly after detecting the wake-up word, and the device starts up automatically once woken, which saves time and improves convenience. Preferably, a voiceprint permission can be set for the speaker. When the speaker utters voice containing control information, the voiceprint permission is used to judge whether the control operation in the voice is one that the permission allows; if it is, the control operation is executed on the audio/video device. The conference is thus controlled by voice, which can replace a remote controller, save hardware cost, and make control more convenient. Preferably, during the conference the audio/video device converts the voice detected by the voice module into text content in real time; when the operation of ending the conference is executed, the text content is organized into a conference summary and sent automatically by e-mail to the mailboxes of the relevant persons, which improves conference efficiency and effectiveness.

Drawings

Fig. 1 is a block diagram of steps of an audio-video conferencing method based on voice control according to an embodiment of the present invention;

Fig. 2 is a block diagram of steps of an audio-video conferencing method based on voice control according to an embodiment of the present invention;

Fig. 3 is a block diagram of steps of an audio-video conferencing method based on voice control according to an embodiment of the present invention;

Fig. 4 is a block diagram of steps of an audio-video conferencing method based on voice control according to an embodiment of the present invention;

Fig. 5 is a block diagram of an audio-video conference device based on voice control according to an embodiment of the present invention.

Detailed Description

The following detailed description of the present invention is provided in conjunction with the accompanying drawings, but it should be understood that the scope of the present invention is not limited to the specific embodiments.

Throughout the specification and claims, unless explicitly stated otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated element or component but not the exclusion of any other element or component.

In order to overcome the difficulty users have in operating audio/video equipment, this embodiment provides, as shown in Fig. 1, an audio/video conference method based on voice control in which a voice module in standby detects a wake-up word and automatically starts the audio/video device.

Specifically, the voice module of this embodiment consists mainly of a low-power circuit in which microphones capture sound and wake up the main device. Multiple microphones serve as the input source. An analog microphone can be connected to an ADC chip, which in turn is connected to the main chip or a CODEC chip through an interface such as I2S; a digital microphone can be connected directly to the main chip or the CODEC chip. The main chip or CODEC chip can be kept partially powered at all times, so the microphones can always operate in a low-power state. The audio/video device is therefore normally in a low-power state close to shutdown in which only a weak power supply to the intelligent voice module is retained; the voice module stays in standby and continuously monitors the surrounding sound.

The audio and video conference method based on voice control comprises steps S1 to S3.

In step S1, when the voice module in the standby state detects voice containing wake-up information, the audio/video device is turned on. Specifically, when the microphone picks up voice containing the wake-up information, the main chip is woken through a Voice Activity Detection (VAD) module and acts on the decoded voice content, which gives the audio/video conference device low-power monitoring and voice wake-up.
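By way of illustration only, step S1 can be sketched as a small state machine gated by voice activity. The energy threshold, the wake word, and the names `VoiceModule`, `is_speech`, and `on_frame` are assumptions made for this sketch and do not come from the specification; `decoded_text` stands in for the output of whatever decoder the main chip runs.

```python
import math
from enum import Enum


class PowerState(Enum):
    STANDBY = "standby"
    ON = "on"


def frame_energy_db(samples):
    """RMS energy of one frame in dBFS; samples are floats in [-1, 1]."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def is_speech(samples, threshold_db=-35.0):
    """Very simple energy-based VAD (threshold is an assumed value)."""
    return frame_energy_db(samples) > threshold_db


class VoiceModule:
    """Hypothetical standby listener that turns the main device on."""

    def __init__(self, wake_word="hello meeting"):
        self.wake_word = wake_word
        self.state = PowerState.STANDBY

    def on_frame(self, samples, decoded_text):
        # Only a frame that both passes the VAD and contains the wake word
        # moves the device from standby to on.
        if (
            self.state is PowerState.STANDBY
            and is_speech(samples)
            and self.wake_word in decoded_text.lower()
        ):
            self.state = PowerState.ON
        return self.state
```

A production VAD would typically use more than frame energy; the point here is only the ordering of the checks: activity first, wake word second, power-on last.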

In step S2, the voiceprint features in the voice containing the wake-up information are acquired. Acquiring the voiceprint features comprises processing the voice signal and extracting the voiceprint features. Specifically, sound waves travel through the air to the device's microphone along different paths in different spaces, undergoing refraction, diffraction and reflection before finally superimposing. After the microphone captures the voice signal, the speech to be recognized is enhanced and the sound that need not be recognized is suppressed; speech processing algorithms (such as echo cancellation, beamforming, sound source localization, dereverberation, noise suppression, automatic gain control, and blind source separation) are used to isolate the required voice signal and digitize it. Sound features are then extracted. Extraction algorithms include Mel-frequency cepstral coefficients (MFCC), linear prediction cepstral coefficients (LPCC), the MPEG-7 multimedia content description interface, and others. Because MFCC is cepstrum-based and agrees better with the principles of human hearing, it is the most commonly used and effective sound feature extraction algorithm. Before the MFCCs are extracted, the sound must be pre-processed, including analog-to-digital conversion, pre-emphasis, and windowing.
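The pre-processing named above (pre-emphasis and windowing) can be sketched as follows. This is a minimal illustration, not the patented implementation: the pre-emphasis coefficient 0.97, the Hamming window, and the 400-sample frame with a 160-sample hop (25 ms and 10 ms at an assumed 16 kHz sampling rate) are conventional choices that the specification does not state. The later MFCC stages (FFT, mel filter bank, logarithm, DCT) are omitted.

```python
import math


def pre_emphasis(signal, alpha=0.97):
    """Apply y[n] = x[n] - alpha * x[n-1] to boost high frequencies."""
    if not signal:
        return []
    return [signal[0]] + [
        signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))
    ]


def hamming(n_points):
    """Hamming window coefficients of length n_points."""
    if n_points == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
        for n in range(n_points)
    ]


def frame_and_window(signal, frame_len=400, hop=160):
    """Split the signal into overlapping frames and window each frame."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames
```

Each windowed frame would then be passed to the spectral stages of the MFCC computation.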

In step S3, the voiceprint features in the voice containing the wake-up information are compared with a pre-stored voiceprint feature library; if they belong to the data in the library, they are stored on the audio/video device, and the voiceprint permission corresponding to them is set. The voiceprint permission can be set to different levels, and the different levels correspond to different sets of control permissions. For example, the person who wakes the audio/video device on this occasion may by default be treated as the conference host, with the voiceprint permission set to the highest level so that this person receives the broadest control permissions. In this specification, control of the audio/video device includes shutting down, starting up, turning the device volume up, turning the device volume down, admitting a person to the conference, and removing a person from the conference.

Optionally, in order that only the default conference host (i.e., the person who woke the audio/video device on this occasion) is allowed to perform control operations, as shown in Fig. 2, the audio and video conference method based on voice control of an embodiment further includes steps S4 to S6.
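As a hedged sketch of step S3, the comparison with the pre-stored library can be modeled as a nearest-match search over feature vectors. The specification does not say how the comparison is performed; cosine similarity, the 0.85 threshold, the three-level scale, and the names `find_speaker`, `AVDevice`, and `register_waker` are all assumptions made for this example.

```python
import math

HOST_LEVEL = 3  # assumed highest permission level


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def find_speaker(feature, library, threshold=0.85):
    """Return the best-matching enrolled speaker id, or None if no entry
    reaches the similarity threshold."""
    best_id, best_score = None, threshold
    for speaker_id, enrolled in library.items():
        score = cosine_similarity(feature, enrolled)
        if score >= best_score:
            best_id, best_score = speaker_id, score
    return best_id


class AVDevice:
    """Hypothetical device holding the library and local voiceprints."""

    def __init__(self, library):
        self.library = library          # pre-stored voiceprint features
        self.local_voiceprints = {}     # voiceprints stored on the device
        self.permissions = {}           # speaker id -> permission level

    def register_waker(self, feature):
        # Step S3: store the voiceprint and set its permission only when
        # the waker is found in the pre-stored library.
        speaker = find_speaker(feature, self.library)
        if speaker is not None:
            self.local_voiceprints[speaker] = feature
            self.permissions[speaker] = HOST_LEVEL
        return speaker
```

Here the waker is given the highest level by default, mirroring the conference-host example in the text.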

In step S4, after the audio/video device is turned on, the voice module detects voice information in real time.

In step S5, when voice containing control information is detected, the voiceprint features in that voice are acquired and compared with the voiceprint features stored locally on the audio/video device; if they match, the corresponding voiceprint permission is acquired, otherwise the voice containing the control information is ignored.

In step S6, whether those voiceprint features have the corresponding control authority is judged according to the voiceprint permission acquired in step S5. If they do, the control operation in the control information is executed on the audio/video device. For example, when the conference host says "please add Zhang San to the conference" and it is confirmed that the host holds the required voiceprint permission, the audio/video device executes the operation of adding Zhang San to the conference. Otherwise, the voice containing the control information is ignored.
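Steps S5 and S6 amount to two successive checks, which the following sketch makes explicit. The operation names and the minimum levels in `REQUIRED_LEVEL` are hypothetical values chosen for illustration; `speaker_id` is assumed to be the result of the local voiceprint comparison, with `None` meaning no match.

```python
# Minimum permission level required per operation (assumed values).
REQUIRED_LEVEL = {
    "volume_up": 1,
    "volume_down": 1,
    "add_participant": 3,
    "remove_participant": 3,
    "power_off": 3,
}


def handle_control_voice(speaker_id, operation, local_permissions):
    """Return "executed" or "ignored" for one control utterance.

    local_permissions maps the speaker ids stored on the device to their
    permission levels.
    """
    # Step S5: no local voiceprint match means the voice is ignored.
    if speaker_id is None or speaker_id not in local_permissions:
        return "ignored"
    # Step S6: the permission must cover the requested operation.
    required = REQUIRED_LEVEL.get(operation)
    if required is None or local_permissions[speaker_id] < required:
        return "ignored"
    return "executed"
```

With only the host's voiceprint stored locally, every other speaker falls through the first check, which is the host-only behaviour this embodiment describes.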

In addition, in other embodiments, in order to grant control operations to ordinary persons or participants rather than to the conference host alone, the audio/video conference method based on voice control pre-stores a plurality of voiceprint features together with their corresponding voiceprint permissions. Different voiceprint permissions can be granted to different persons as required, each corresponding to different control operations; for example, participants may be allowed to control only the volume. Specifically, the audio and video conference method further includes: after the audio/video device is started, the voice module detects voice information in real time; when voice containing control information is detected, acquiring the voiceprint features in that voice; comparing those voiceprint features with a pre-stored voiceprint feature library and, if they belong to the data in the library, acquiring the corresponding voiceprint permission from a pre-stored voiceprint permission library, otherwise ignoring the voice containing the control information; and judging, according to that voiceprint permission, whether those voiceprint features have the corresponding control authority and, if so, executing the control operation in the control information on the audio/video device, otherwise ignoring the voice containing the control information.

Preferably, in order to keep the participants' identities secure after the conference host starts the conference, as shown in Fig. 3, the audio-video conference method based on voice control of an embodiment further includes steps S7 to S8.

In step S7, after the control operation of starting the video conference has been executed on the audio/video device, the voice module detects voice during the video conference in real time and acquires the corresponding voiceprint features.
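One possible shape for the pre-stored voiceprint permission library is a mapping from enrolled speaker to an explicit set of allowed operations. The speaker ids, roles, and operation names below are invented for the example and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Permission:
    role: str
    allowed: frozenset  # operations this speaker may trigger


# Hypothetical pre-stored voiceprint permission library.
PERMISSION_LIBRARY = {
    "host_zhang": Permission(
        "host",
        frozenset({
            "volume_up", "volume_down",
            "add_participant", "remove_participant", "end_conference",
        }),
    ),
    "guest_li": Permission(
        "participant",
        frozenset({"volume_up", "volume_down"}),
    ),
}


def is_allowed(speaker_id, operation, library=PERMISSION_LIBRARY):
    """True only if the speaker is enrolled and the operation is granted."""
    perm = library.get(speaker_id)
    return perm is not None and operation in perm.allowed
```

A participant entry limited to the two volume operations corresponds to the "only the volume" example in the text.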

In step S8, the voiceprint features corresponding to the voice during the video conference are compared with a pre-stored voiceprint feature library; if they belong to the data in the library, the voice is converted into text for storage, otherwise prompt information is returned to the audio/video device, the prompt information indicating that the identity of the current speaker needs to be authenticated and confirmed. If the identity of the current speaker does not meet the requirements, the conference host can utter voice such as "remove Liquan from the conference", and once it is confirmed that the host holds the required voiceprint permission, Liquan is removed from the conference.

Preferably, in order to automatically record and forward the conference content, as shown in Fig. 4, the audio-video conference method based on voice control of an embodiment further includes steps S9 to S10.

In step S9, when the operation of ending the conference is executed on the audio/video device, the saved text content is organized into a conference summary and sent by e-mail to the mailboxes of the conference participants and/or designated persons.
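The branch in step S8 can be illustrated with a short sketch. It assumes that voiceprint matching and speech-to-text have already run elsewhere: `speaker_id` is `None` for an unrecognized voiceprint, and `text` is the recognized content. The function name and the prompt wording are hypothetical.

```python
def process_utterance(speaker_id, text, transcript):
    """Store the text of a known speaker, or return a prompt for an
    unknown one.

    transcript is a list of (speaker_id, text) pairs kept for the
    conference summary.
    """
    if speaker_id is None:
        # Voiceprint not in the library: nothing is stored and the
        # device is asked to have the speaker authenticated.
        return "The identity of the current speaker needs authentication."
    transcript.append((speaker_id, text))
    return None
```

For example, `process_utterance("alice", "Let us begin.", transcript)` appends one entry and returns `None`, while a call with `speaker_id=None` leaves the transcript unchanged and returns the prompt.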

In step S10, the audio/video device performs a power-off operation, and the voice module falls back to the standby state.
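Steps S9 and S10 can be sketched with Python's standard `email.message.EmailMessage`. The sender address, the subject line, and the one-line-per-utterance summary format are assumptions; the specification does not describe how the summary is composed. Actual delivery (for instance through `smtplib.SMTP.send_message`) is left out so that the sketch has no network dependency.

```python
from email.message import EmailMessage


def build_summary_email(transcript, recipients,
                        sender="conference@example.com"):
    """Assemble the stored (speaker, text) pairs into one summary e-mail."""
    msg = EmailMessage()
    msg["Subject"] = "Conference summary"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    body = "\n".join(f"{speaker}: {text}" for speaker, text in transcript)
    msg.set_content(body)
    return msg


def end_conference(transcript, recipients):
    """Step S9 followed by step S10.

    Returns the summary message and the resulting power states of the
    two components (state labels are hypothetical).
    """
    msg = build_summary_email(transcript, recipients)
    # Sending is omitted here; after it, the device powers off and the
    # voice module falls back to standby.
    states = {"av_device": "off", "voice_module": "standby"}
    return msg, states
```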

Based on the same inventive concept, as shown in Fig. 5, an embodiment further provides an audio/video conference device based on voice control, which includes a voice module 10 and an audio/video device 11.

The audio/video device 11 is coupled with the voice module 10. The voice module 10 may be disposed inside or outside the audio/video device 11, and may be either pluggable into and removable from the audio/video device 11 or fixedly connected to it. When the audio/video device 11 is powered off, the voice module 10 is in a standby state. The voice module 10 consists mainly of a low-power circuit in which microphones capture sound and wake up the main device. Multiple microphones serve as the input source. An analog microphone can be connected to an ADC chip, which in turn is connected to the main chip or a CODEC chip through an interface such as I2S; a digital microphone can be connected directly to the main chip or the CODEC chip. The main chip or CODEC chip can be kept partially powered at all times, so the microphones can always operate in a low-power state. The audio/video device is therefore normally in a low-power state close to shutdown in which only a weak power supply to the intelligent voice module is retained; the voice module stays in standby and continuously monitors the surrounding sound.

The voice module 10 is configured to turn on the audio/video device 11 when voice containing wake-up information is first detected in the standby state; the audio/video device 11 is configured to acquire the voiceprint features in the voice containing the wake-up information; the audio/video device 11 is further configured to compare those voiceprint features with a pre-stored voiceprint feature library and, if they belong to the data in the library, store them on the audio/video device 11 and set the voiceprint permission corresponding to them.

Optionally, in order that only the default conference host (i.e., the person who woke the audio/video device 11 on this occasion) is allowed to perform control operations, in an embodiment the voice module 10 is further configured to detect voice information in real time after the audio/video device 11 is turned on. The audio/video device 11 is further configured to acquire the voiceprint features in voice containing control information when the voice module 10 detects such voice; to compare those voiceprint features with the voiceprint features stored locally on the audio/video device 11 and, if they match, acquire the corresponding voiceprint permission, otherwise ignore the voice containing the control information; and to judge, according to that voiceprint permission, whether those voiceprint features have the corresponding control authority and, if so, perform the corresponding operation according to the control information, otherwise ignore the voice containing the control information.

In addition, in other embodiments, in order to grant control operations to ordinary persons or participants rather than to the conference host alone, the audio/video device 11 is configured to acquire the voiceprint features in voice containing control information when the voice module 10 detects such voice; to compare those voiceprint features with a pre-stored voiceprint feature library and, if they belong to the data in the library, acquire the corresponding voiceprint permission from a pre-stored voiceprint permission library, otherwise ignore the voice containing the control information; and to judge, according to that voiceprint permission, whether those voiceprint features have the corresponding control authority and, if so, execute the control operation in the control information on the audio/video device 11, otherwise ignore the voice containing the control information.

Preferably, in order to keep the participants' identities secure after the conference host starts the conference, in an embodiment the audio/video device 11 is further configured to acquire the voiceprint features corresponding to voice during the video conference, and to compare those voiceprint features with a pre-stored voiceprint feature library; if they belong to the data in the library, the voice is converted into text content for storage, otherwise prompt information is returned to the audio/video device 11, the prompt information indicating that the identity of the current speaker needs to be authenticated and confirmed.

Preferably, in order to automatically record and forward the conference content, in an embodiment the audio/video device 11 is further configured to, when the operation of ending the conference is executed, organize the stored text content into a conference summary, send the conference summary by e-mail to the mailboxes of the participants and/or designated persons, and then shut down. The voice module 10 is further configured to fall back to the standby state when the audio/video device 11 is powered off.

Based on the same inventive concept, an embodiment further provides a computer-readable storage medium, where a computer program is stored, and the computer program is configured to execute the audio and video conference method based on voice control according to any one of the above embodiments.

In summary, in the audio/video conference method and device based on voice control of this embodiment, the voice module in standby wakes the audio/video conference device directly after detecting the wake-up word, and the device starts up automatically once woken, which saves time and improves convenience. Preferably, a voiceprint permission can be set for the speaker. When the speaker utters voice containing control information, the voiceprint permission is used to judge whether the control operation in the voice is one that the permission allows; if it is, the control operation is executed on the audio/video device. The conference is thus controlled by voice, which can replace a remote controller, save hardware cost, and make control more convenient. Preferably, during the conference the audio/video device converts the voice detected by the voice module into text content in real time; when the operation of ending the conference is executed, the text content is organized into a conference summary and sent automatically by e-mail to the mailboxes of the relevant persons, which improves conference efficiency and effectiveness.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The foregoing descriptions of specific exemplary embodiments of the present invention have been presented for purposes of illustration and description. It is not intended to limit the invention to the precise form disclosed, and obviously many modifications and variations are possible in light of the above teaching. The exemplary embodiments were chosen and described in order to explain certain principles of the invention and its practical application to enable one skilled in the art to make and use various exemplary embodiments of the invention and various alternatives and modifications as are suited to the particular use contemplated. It is intended that the scope of the invention be defined by the claims and their equivalents.

Claims

1. An audio and video conference method based on voice control is characterized by comprising the following steps:

when the voice module in the standby state detects voice including awakening information, starting audio and video equipment;

acquiring voiceprint characteristics in the voice including the awakening information;

and comparing the voiceprint features in the voice containing the awakening information with a pre-stored voiceprint feature library, if the voiceprint features in the voice containing the awakening information belong to the data in the pre-stored voiceprint feature library, storing the voiceprint features in the voice containing the awakening information to the audio and video equipment, and setting the voiceprint permission corresponding to the voiceprint features in the voice containing the awakening information.

2. The audio and video conference method based on voice control according to claim 1, further comprising:

after the audio and video equipment is started, the voice module detects voice information in real time;

when the voice comprising the control information is detected, acquiring the voiceprint characteristics in the voice comprising the control information;

comparing the voiceprint features in the voice including the control information with the voiceprint features locally stored in the audio and video equipment, if the voiceprint features in the voice including the control information are matched with the voiceprint features locally stored in the audio and video equipment, acquiring the voiceprint authority corresponding to the voiceprint features in the voice including the control information, and otherwise, ignoring the voice including the control information;

and judging whether the voiceprint features in the voice containing the control information have corresponding control authority according to the voiceprint authority corresponding to the voiceprint features in the voice containing the control information, if so, executing control operation in the control information on the audio and video equipment, otherwise, ignoring the voice containing the control information.

3. The audio and video conference method based on voice control according to claim 1, further comprising:

comparing the voiceprint features in the voice including the control information with a pre-stored voiceprint feature library, if the voiceprint features in the voice including the control information belong to data in the pre-stored voiceprint feature library, acquiring the voiceprint permission corresponding to the voiceprint features in the voice including the control information from the pre-stored voiceprint permission library, and otherwise, ignoring the voice including the control information.

4. The audio and video conference method based on voice control according to claim 2, further comprising:

after the control operation of starting the video conference is executed on the audio and video equipment, the voice module detects the voice in the video conference process in real time and acquires the voiceprint features corresponding to the voice in the video conference process;

and comparing the voiceprint features corresponding to the voice in the video conference process with a pre-stored voiceprint feature library, if the voiceprint features belong to the data in the pre-stored voiceprint feature library, converting the voice in the video conference process into text contents for storage, and otherwise, returning prompt information to the audio and video equipment, wherein the prompt information is used for prompting that the identity of the current speaker needs authentication and confirmation.

5. The audio and video conference method based on voice control according to claim 4, further comprising:

when the operation of ending the conference is executed on the audio and video equipment, the stored text content is compiled into a conference summary and sent by email to a mailbox of a conference participant and/or a designated person;

the audio and video equipment executes a shutdown operation, and the voice module returns to the standby state.

6. An audio-video conference device based on voice control, comprising:

a voice module;

the audio and video equipment is coupled with the voice module, and when the audio and video equipment is in a power-off state, the voice module is in a standby state;

the voice module is used for starting the audio and video equipment when voice including awakening information is detected for the first time in a standby state; the audio and video equipment is used for acquiring voiceprint characteristics in the voice including the awakening information; and the audio and video equipment is also used for comparing the voiceprint features in the voice containing the awakening information with a pre-stored voiceprint feature library, if the voiceprint features in the voice containing the awakening information belong to the data in the pre-stored voiceprint feature library, storing the voiceprint features in the voice containing the awakening information to the audio and video equipment, and setting the voiceprint permission corresponding to the voiceprint features in the voice containing the awakening information.

7. The audio-video conference device based on voice control according to claim 6, wherein:

the voice module is also used for detecting voice information in real time after the audio and video equipment is started;

the audio and video equipment is further used for acquiring the voiceprint features in the voice including the control information when the voice module detects the voice including the control information; is further used for comparing the voiceprint features in the voice including the control information with the voiceprint features locally stored in the audio and video equipment, and, if they match, acquiring the voiceprint authority corresponding to the voiceprint features in the voice including the control information, otherwise ignoring the voice including the control information; and is further used for judging, according to the voiceprint authority corresponding to the voiceprint features in the voice including the control information, whether those voiceprint features have the corresponding control authority, and, if so, performing the corresponding operation according to the control information, otherwise ignoring the voice including the control information.

8. The audio-video conference device based on voice control according to claim 7, wherein:

the voice module is also used for detecting the voice in the video conference process in real time after the control operation of starting the video conference is executed on the audio and video equipment;

the audio and video equipment is also used for acquiring the voiceprint features corresponding to the voice in the video conference process; and is also used for comparing the voiceprint features corresponding to the voice in the video conference process with a pre-stored voiceprint feature library, and, if the voiceprint features belong to the data in the pre-stored voiceprint feature library, converting the voice in the video conference process into text content for storage, otherwise returning prompt information, wherein the prompt information is used for prompting that the identity of the current speaker needs authentication and confirmation.

9. The audio-video conference device based on voice control according to claim 8, wherein:

the audio and video equipment is also used for, when the operation of ending the conference is executed on the audio and video equipment, compiling the stored text content into a conference summary, sending the conference summary by email to a mailbox of a conference participant and/or a designated person, and then executing a shutdown operation;

the voice module is also used for returning to the standby state when the audio and video equipment is powered off.

10. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program for executing the audio-video conference method based on voice control according to any one of claims 1 to 5.