CN116939144A - Voice control method and device for online conference - Google Patents

Info

Publication number
CN116939144A
Authority
CN
China
Prior art keywords
conference
voice
control
content
text content
Legal status
Pending
Application number
CN202310800749.3A
Other languages
Chinese (zh)
Inventor
张峣
陈继辉
李嘉宾
朱道彦
Current Assignee
Visionvera Information Technology Co Ltd
Original Assignee
Visionvera Information Technology Co Ltd
Application filed by Visionvera Information Technology Co Ltd
Priority to CN202310800749.3A
Publication of CN116939144A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/083 Recognition networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/1066 Session management
    • H04L65/1083 In-session procedures
    • H04L65/1093 In-session procedures by adding participants; by removing participants
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H04L65/403 Arrangements for multi-party communication, e.g. for conferences
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L2015/088 Word spotting
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • General Business, Economics & Management (AREA)
  • Business, Economics & Management (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The embodiments of the present application provide a voice control method and apparatus for an online conference, an electronic device, and a computer-readable storage medium. A chairman terminal receives input voice content and, when the voice content meets a voice mode wake-up condition, sends the subsequently received voice content to a conference server as conference control voice. The conference server converts the conference control voice into corresponding semantic text content and sends it back to the chairman terminal, where the operation described by the semantic text content is confirmed. After confirmation, the conference server constructs a conference control instruction corresponding to the conference control voice to control the conference. Because the conference host's management and control requirements for the online conference are obtained directly through speech recognition, submitted to the conference server as a management and control request, and executed directly by the conference server once the conference control voice is confirmed, the dialogue between the conference host and the conference controller is eliminated, the operation time for conference management and control is saved, and the host's conference control is executed more efficiently.

Description

Voice control method and device for online conference
Technical Field
The present application relates to the field of video networking technologies, and in particular, to a voice control method and apparatus for an online conference, an electronic device, and a computer-readable storage medium.
Background
With the continuous development of network technology, and driven by demands such as high-definition video transmission and live network broadcasting, video networking has emerged as an important real-time network technology. Video networking technology has developed and matured rapidly and, by virtue of its technical advantages, plays an irreplaceable role in the Internet era.
Online audio and video conferencing is an important application scenario of video networking. In the related art, an online conference with multiple participating terminals generally has, in addition to the participants, a dedicated conference host who controls the content and pace of the conference.
However, in the prior art, the host's adjustments to the conference content depend on conference control personnel on the conference server side: the host must first inform the conference control personnel of the management requirement, and the conference control personnel then perform the corresponding operation. This inevitably adds communication time, so the host's conference control is executed inefficiently.
Disclosure of Invention
In view of the foregoing problems, embodiments of the present application provide a voice control method and apparatus for an online conference, an electronic device, and a computer-readable storage medium that overcome, or at least partially solve, these problems.
In a first aspect, an embodiment of the present application discloses a voice control method for an online conference, which is applied to a chairman terminal for the online conference, and the method includes:
receiving input voice content;
under the condition that the voice content contains a preset voice wake-up word, the subsequently received voice content is used as conference control voice and is sent to a conference server, so that the conference server converts the conference control voice into semantic text content and then sends the semantic text content to the chairman terminal;
and under the condition that the semantic text content sent by the conference server is received, sending confirmation information for the conference control voice to the conference server in response to the confirmation operation for the semantic text content, so that the conference server can control the conference according to the confirmation information and the semantic text content.
In a second aspect, an embodiment of the present application discloses a voice control method for an online conference, which is applied to a conference server for the online conference, and the method includes:
receiving conference control voice sent by a chairman terminal;
converting the conference control voice into corresponding semantic text content and sending the corresponding semantic text content to the chairman terminal so that the chairman terminal can respond to the confirmation operation of the semantic text content and send confirmation information of the conference control voice to the conference server;
and under the condition that confirmation information sent by the chairman terminal is received, constructing a conference control instruction corresponding to the conference control voice to control the conference.
In a third aspect, the embodiment of the present application further provides a voice control device for an online conference, which is applied to a chairman terminal; the device comprises:
the voice input module is used for receiving input voice content;
the voice content sending module is used for taking the subsequently received voice content as conference control voice and sending the conference control voice to the conference server under the condition that the voice content contains a preset voice wake-up word, so that the conference server converts the conference control voice into semantic text content and then sends the semantic text content to the chairman terminal;
and the control voice confirmation module is used for, under the condition that the semantic text content sent by the conference server is received, sending confirmation information for the conference control voice to the conference server in response to the confirmation operation for the semantic text content, so that the conference server controls the conference according to the confirmation information and the semantic text content.
In a fourth aspect, the embodiment of the application also provides a voice control device for online conference, which is applied to a conference server; the device comprises:
The voice receiving module is used for receiving conference control voice sent by the chairman terminal;
the voice conversion module is used for converting the conference control voice into corresponding semantic text content and sending the corresponding semantic text content to the chairman terminal so that the chairman terminal can respond to the confirmation operation of the semantic text content and send confirmation information of the conference control voice to the conference server;
and the conference control module is used for constructing a conference control instruction corresponding to the conference control voice to control the conference under the condition that the confirmation information sent by the chairman terminal is received.
In a fifth aspect, embodiments of the present application provide an electronic device comprising one or more processors and one or more machine-readable media having instructions stored thereon which, when executed by the one or more processors, cause the electronic device to perform the methods described in the first and second aspects.
In a sixth aspect, embodiments of the present application provide a computer readable storage medium storing a computer program for causing a processor to perform the methods of the first and second aspects.
The embodiment of the application has the following advantages:
In the embodiments of the present application, the chairman terminal receives input voice content and, when the voice content meets the voice mode wake-up condition, sends the subsequently received voice content to the conference server as conference control voice. The conference server converts the conference control voice into corresponding semantic text content and sends it back to the chairman terminal, where the operation is confirmed and confirmation information for the conference control voice is sent to the conference server. When the conference server receives the confirmation information from the chairman terminal, it constructs a conference control instruction corresponding to the conference control voice to control the conference. The conference host's management and control requirements for the online conference are thus obtained directly through speech recognition, converted into control voice, and submitted to the conference server as a management and control request, which the conference server executes directly once the conference control voice is confirmed. This eliminates the dialogue between the conference host and the conference controller on the conference server side, saves operation time for conference management and control, and makes the host's conference control more efficient.
Drawings
FIG. 1 is a flow chart of the steps of an embodiment of a voice control method for an online conference according to the present application, applied to a chairman terminal of the online conference;
FIG. 2 is a flow chart of the steps of an embodiment of a voice control method for an online conference according to the present application, applied to a conference server of the online conference;
FIG. 3 is an interaction diagram of the steps for implementing a voice control method for an online conference according to the present application;
FIG. 4 is a diagram of the voice control logic of an online conference according to an embodiment of the present application;
FIG. 5 is a diagram of the terminal interaction relationships for online conference voice control according to an embodiment of the present application;
FIG. 6 is a block diagram of a voice control apparatus for an online conference according to an embodiment of the present application;
FIG. 7 is a block diagram of a voice control apparatus for an online conference according to another embodiment of the present application.
Detailed Description
To make the above objects, features and advantages of the present application more readily apparent, the application is described in further detail below with reference to the accompanying drawings and specific embodiments.
FIG. 1 shows a flow chart of the steps of an embodiment of a voice control method for an online conference, applied to a chairman terminal of the online conference. The online conference system may include a server and terminals, and a terminal in the system may be an Internet terminal or a video networking terminal; for example, the online conference system may be deployed on the video network or on the Internet. In the embodiments of the present application, the online conference system includes a main conference within which at least one sub-conference can be created. A sub-conference is managed by the main conference and contains at least two terminals of the main conference; it can be understood as a local conference within the main conference, created by pulling at least two terminals of the main conference into it. Every terminal in a sub-conference belongs to the main conference, and any terminal in the main conference may be in the main conference only, or in the main conference and at least one sub-conference.
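The membership rules for main conferences and sub-conferences can be made concrete with a small data model. The following is a minimal sketch under stated assumptions: the class `MainConference` and its methods are hypothetical illustrations, not part of the described system. It enforces the two constraints given above: a sub-conference has at least two terminals, and all of them already belong to the main conference.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MainConference:
    """Hypothetical model of a main conference and the sub-conferences it manages."""

    terminals: set[str] = field(default_factory=set)
    sub_conferences: dict[str, set[str]] = field(default_factory=dict)

    def join(self, terminal_id: str) -> None:
        """Add a terminal to the main conference."""
        self.terminals.add(terminal_id)

    def create_sub_conference(self, name: str, members: set[str]) -> None:
        """Pull at least two main-conference terminals into a new sub-conference."""
        if len(members) < 2:
            raise ValueError("a sub-conference needs at least two terminals")
        outsiders = members - self.terminals
        if outsiders:
            raise ValueError(f"terminals not in the main conference: {sorted(outsiders)}")
        self.sub_conferences[name] = set(members)

    def memberships(self, terminal_id: str) -> list[str]:
        """Names of the sub-conferences a terminal is in; empty means main conference only."""
        return sorted(name for name, members in self.sub_conferences.items()
                      if terminal_id in members)
```

For example, after joining terminals A to D, `create_sub_conference("group-1", {"A", "B"})` places A in both the main conference and group-1, while `memberships("C")` returns an empty list because C is in the main conference only.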
As shown in fig. 1, the method flow may include the steps of:
Step 101, receiving input voice content.
In the embodiment of the present application, the terminal where the conference host is located is set as the chairman terminal by default. The chairman terminal is configured with a voice management unit and a display unit. The voice management unit collects the sound produced by personnel at the chairman terminal and stores it as audio files; it also recognizes part of the collected sound content. Because the voice control method provided by the embodiment of the present application is applied to multi-party online conferences, in which all kinds of sound content are received continuously, the chairman terminal must be able to distinguish the host's conference control voice from the voice of ordinary conference communication. The display unit displays conference information and presents instant information that cannot be predicted in advance throughout the conference.
It should be noted that the voice management unit of the chairman terminal is not always active. The conference in the embodiment of the present application is deployed in an online conference system based on video networking; specifically, the chairman terminal and at least some of the participant terminals are linked in the conference through conference software on the conference server. Therefore, before the conference starts, the host needs to turn on the enable switch for the voice management unit in the conference system software and set the voice control mode to "on", so that the chairman terminal starts monitoring and recognizing voice content.
Step 102, under the condition that the voice content contains a preset voice wake-up word, the subsequently received voice content is used as conference control voice and sent to a conference server, so that the conference server converts the conference control voice into semantic text content and then sends the semantic text content to the chairman terminal.
For the voice content collected by the voice management unit, when the voice content satisfies the voice mode wake-up condition, the chairman terminal switches to the voice wake-up mode and treats all voice content received after the moment the mode is enabled as the host's conference control voice for the whole online conference.
In the embodiment of the present application, a voice wake-up word is used to recognize the voice wake-up mode; that is, the voice mode wake-up condition is met when the voice wake-up word is present in the voice content collected by the voice management unit.
After the voice management unit has collected the complete current conference control voice and packaged it into an audio file, the chairman terminal immediately sends the audio file to the conference server where the conference system is located, so that the conference server converts the conference control voice into corresponding semantic text content and sends it back to the chairman terminal for command confirmation.
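The enable switch and the wake-up word together act as a two-level gate on the chairman terminal. Below is a minimal sketch assuming each utterance arrives already recognized as text. The class `VoiceManagementUnit`, the wake word passed to it, and the `finish_command` step that ends one command and returns to waiting are all hypothetical; the description does not specify how a command ends.

```python
from __future__ import annotations


class VoiceManagementUnit:
    """Hypothetical chairman-terminal gate for conference control voice.

    For simplicity, each utterance is represented by its recognized text.
    """

    def __init__(self, wake_word: str) -> None:
        self.wake_word = wake_word.lower()
        self.enabled = False   # voice control switch set in the conference software
        self.awake = False     # True once the wake-up word has been heard
        self._captured: list[str] = []

    def set_voice_control(self, on: bool) -> None:
        """Mirror the 'on'/'off' voice control mode; switching off also resets wake-up."""
        self.enabled = on
        if not on:
            self.awake = False
            self._captured.clear()

    def feed(self, utterance: str) -> bool:
        """Handle one utterance; return True if it was captured as control voice."""
        if not self.enabled:
            return False                   # unit inactive: nothing is monitored
        if not self.awake:
            if self.wake_word in utterance.lower():
                self.awake = True          # wake-up condition met
            return False                   # ordinary conference speech is not captured
        self._captured.append(utterance)
        return True

    def finish_command(self) -> list[str]:
        """Assumed end of one command: hand over captured voice and wait again."""
        captured, self._captured = self._captured, []
        self.awake = False
        return captured
```

In this sketch the utterance containing the wake-up word is itself discarded and only what follows is captured, matching the description's "subsequently received voice content".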
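The description does not state the audio format or transport used when the chairman terminal uploads the packaged conference control voice. As an illustration only, the sketch below assumes 16-bit mono PCM at 16 kHz, wraps it in a WAV container with Python's standard `wave` module, and places it in a JSON envelope whose field names (`type`, `chairman_terminal`, `request_id`, `audio_wav_b64`) are hypothetical.

```python
import base64
import io
import json
import wave


def package_control_voice(pcm: bytes, sample_rate: int = 16000) -> bytes:
    """Wrap 16-bit mono PCM samples in an in-memory WAV file."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


def build_upload_message(terminal_id: str, request_id: str, wav_bytes: bytes) -> str:
    """Hypothetical JSON envelope sent from the chairman terminal to the server."""
    return json.dumps({
        "type": "conference_control_voice",
        "chairman_terminal": terminal_id,
        "request_id": request_id,
        "audio_wav_b64": base64.b64encode(wav_bytes).decode("ascii"),
    })
```

Base64 inside JSON keeps the sketch self-contained; a real deployment could equally send the audio file as a binary payload over its own signaling channel.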
Step 103, under the condition that the semantic text content sent by the conference server is received, sending confirmation information for the conference control voice to the conference server in response to the confirmation operation for the semantic text content, so that the conference server can control the conference according to the confirmation information and the semantic text content.
After the chairman terminal sends the packaged audio file to the conference server, it receives, as feedback, the semantic text content and an operation confirmation request from the conference server. The chairman terminal then asks the conference host to confirm the operation described by the previously issued conference control voice.
Although the embodiment of the present application uses wake-up words to distinguish ordinary voice content from conference control voice, speech recognition may still be inaccurate. Furthermore, because the conference takes place in a network environment, there is a potential risk of eavesdropping by a malicious third party if the conference content involves commercial secrets or other private information. Therefore, before each conference control voice is executed, the chairman terminal where the host is located must perform a secondary confirmation of the conference control voice, to ensure that the current conference control operation reflects the conference host's intent.
Specifically, for the semantic text content corresponding to the conference control voice sent by the conference server, the chairman terminal displays the semantic text content to the conference host through the display unit together with an operation confirmation prompt. After the conference host decides to execute the operation, the chairman terminal sends confirmation information for the conference control voice to the conference server, so that the conference server controls the online conference according to the confirmation information and the semantic text content.
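The secondary confirmation on the chairman terminal amounts to rendering a prompt and replying with a decision that is bound to one specific command. In the minimal sketch below, the `SemanticText` type, the prompt wording, and the message fields are hypothetical; carrying a `request_id` is an assumed design choice so that a confirmation can never be applied to a different command than the one the host read.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SemanticText:
    """Semantic text returned by the server for one conference control voice."""
    request_id: str
    text: str


def render_prompt(item: SemanticText) -> str:
    """Subtitle-style confirmation prompt for the display unit."""
    return f'Execute "{item.text}"? [Confirm] [Cancel]'


def confirmation_message(item: SemanticText, confirmed: bool) -> dict:
    """Hypothetical reply to the server; request_id ties it to one voice command."""
    return {
        "type": "control_voice_confirmation",
        "request_id": item.request_id,
        "confirmed": confirmed,
    }
```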
In summary, in this embodiment the host's conference control requirements are captured directly through speech recognition and executed by the conference server once confirmed, which eliminates the dialogue between the host and the conference controller on the conference server side, saves operation time for conference management and control, and improves the efficiency of the host's conference control.
FIG. 2 shows a flow chart of the steps of an embodiment of a voice control method for an online conference, applied to a conference server of the online conference. As shown in FIG. 2, the method flow may include the following steps:
Step 201, receiving conference control voice sent by a chairman terminal.
Step 202, converting the conference control voice into corresponding semantic text content and sending the semantic text content to the chairman terminal, so that the chairman terminal, in response to the confirmation operation for the semantic text content, sends confirmation information for the conference control voice to the conference server.
For steps 201 and 202, upon receiving the conference control voice sent by the chairman terminal, the conference server first converts the voice information into corresponding text information through speech recognition, for confirmation by the conference chairman at the chairman terminal. As explained for the chairman terminal, wake-up words do not eliminate the risk of inaccurate speech recognition, and conference content involving commercial secrets or other private information faces a potential risk of eavesdropping by a malicious third party in a network environment. Therefore, before each conference control voice is executed, the chairman at the chairman terminal must perform a secondary confirmation of the conference control voice, to ensure that the current conference control operation reflects the chairman's intent and that conference control is secure.
Step 203, under the condition that confirmation information sent by the chairman terminal is received, constructing a conference control instruction corresponding to the conference control voice to control the conference.
After the semantic text content has been sent to the chairman terminal and a confirmation operation for it is received, meaning that the conference host has approved execution of the previously sent conference control voice, the conference server creates a conference control instruction executable by the system according to the recognized semantic text content and executes it.
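Steps 201 to 203 can be read as a pending-request store on the conference server: transcribe, hold the text until the chairman confirms, then execute exactly once. The sketch below is a hypothetical illustration; `transcribe` and `execute` are injected placeholders for whatever speech-recognition engine and conference-control backend the server actually uses, and the `request_id` scheme is an assumption.

```python
from __future__ import annotations

import uuid
from typing import Callable


class ControlVoiceService:
    """Hypothetical server-side flow for steps 201 to 203."""

    def __init__(self, transcribe: Callable[[bytes], str],
                 execute: Callable[[str], None]) -> None:
        self._transcribe = transcribe      # placeholder speech-recognition engine
        self._execute = execute            # placeholder conference-control backend
        self._pending: dict[str, str] = {}  # request_id -> text awaiting confirmation

    def receive_control_voice(self, audio: bytes) -> tuple[str, str]:
        """Steps 201/202: transcribe and return (request_id, text) for the chairman."""
        text = self._transcribe(audio)
        request_id = uuid.uuid4().hex
        self._pending[request_id] = text
        return request_id, text

    def receive_confirmation(self, request_id: str, confirmed: bool) -> bool:
        """Step 203: execute only a pending, confirmed request; return True if executed."""
        text = self._pending.pop(request_id, None)
        if text is None:
            return False                   # unknown or already-handled request
        if confirmed:
            self._execute(text)
        return confirmed
```

Popping the request on first use means a repeated or forged confirmation for the same `request_id` does nothing, which fits the security motivation given for the secondary confirmation.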
In the embodiment of the present application, common control operation types for online conference control include: removing participant terminal A from the current conference; pulling terminal C into the current conference; calling up the picture of participant terminal C and displaying it as the main picture of the conference; and setting the voice input of all terminals other than the speaking participant terminal D to mute.
Correspondingly, each such operation is converted into a conference control instruction that the system can recognize. For example, if participant terminal A is to be removed from the current conference, the network link between participant terminal A and the current conference is disconnected.
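One simple way to turn confirmed semantic text into the four operation types listed above is rule-based pattern matching. The sketch below is a hypothetical illustration only: the English phrasings, the action names, and `ControlInstruction` are assumptions, since the description states neither the command vocabulary (the system works on Chinese text) nor the instruction format.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ControlInstruction:
    action: str    # "remove" | "pull_in" | "set_main_picture" | "mute_others"
    terminal: str


# Hypothetical command phrasings; a real system would match its own vocabulary.
_PATTERNS = [
    (re.compile(r"\bremove terminal (\w+)\b", re.I), "remove"),
    (re.compile(r"\b(?:pull|bring) terminal (\w+) in(?:to the conference)?\b", re.I), "pull_in"),
    (re.compile(r"\bshow terminal (\w+) as (?:the )?main picture\b", re.I), "set_main_picture"),
    (re.compile(r"\bmute (?:all|everyone) except terminal (\w+)\b", re.I), "mute_others"),
]


def parse_instruction(text: str) -> Optional[ControlInstruction]:
    """Map confirmed semantic text to an instruction, or None if nothing matches."""
    for pattern, action in _PATTERNS:
        match = pattern.search(text)
        if match:
            return ControlInstruction(action, match.group(1).upper())
    return None
```

Returning `None` for unmatched text lets the server refuse to act rather than guess, which is consistent with the confirm-before-execute design.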
FIG. 3 shows an interaction diagram of the steps for implementing a voice control method for an online conference; the interacting parties are the chairman terminal and the conference server of the online conference. As shown in FIG. 3, the method flow may include the following steps:
Step 301, receiving input voice content.
Referring to FIG. 4, which shows the voice control logic of an online conference according to an embodiment of the present application, the chairman terminal first receives external voice input and then exchanges signaling with the conference server on which the Pamir software runs, as follows:
(1) When the input voice content meets the voice mode wake-up condition, the chairman terminal sends the subsequently received voice content to the conference server as conference control voice, so that the conference server converts the conference control voice into semantic text content and sends it to the chairman terminal.
(2) When the chairman terminal receives the semantic text content sent by the conference server, it responds to the confirmation operation for the semantic text content by sending confirmation information for the conference control voice to the conference server, so that the conference server controls the conference according to the confirmation information and the semantic text content.
(3) After receiving the conference control voice sent by the chairman terminal, the conference server converts it into corresponding semantic text content and sends that content to the chairman terminal, so that the chairman terminal, in response to the confirmation operation for the semantic text content, sends confirmation information for the conference control voice to the conference server.
(4) When the conference server receives the confirmation information sent by the chairman terminal, it constructs a conference control instruction corresponding to the conference control voice to control the conference.
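Items (1) to (4) above are written from each side's perspective; in time order, a single request runs (1), (3), (2), (4). The following minimal sketch models that lifecycle as a state machine that rejects out-of-order steps. The stage names and the `CANCELLED` outcome for a rejected confirmation are assumptions made for illustration.

```python
from enum import Enum, auto


class Stage(Enum):
    LISTENING = auto()     # chairman terminal waiting for the wake-up word
    CAPTURING = auto()     # (1) control voice being collected and sent
    TRANSCRIBED = auto()   # (3) server has returned the semantic text
    CONFIRMED = auto()     # (2) host confirmed on the chairman terminal
    EXECUTED = auto()      # (4) server built and executed the control instruction
    CANCELLED = auto()     # host rejected the semantic text


ALLOWED = {
    Stage.LISTENING: {Stage.CAPTURING},
    Stage.CAPTURING: {Stage.TRANSCRIBED},
    Stage.TRANSCRIBED: {Stage.CONFIRMED, Stage.CANCELLED},
    Stage.CONFIRMED: {Stage.EXECUTED},
    Stage.EXECUTED: set(),
    Stage.CANCELLED: set(),
}


def advance(current: Stage, nxt: Stage) -> Stage:
    """Move one request to its next stage, rejecting out-of-order steps."""
    if nxt not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.name} -> {nxt.name}")
    return nxt
```

The key property is that `EXECUTED` is reachable only through `CONFIRMED`, so no control instruction runs without the host's secondary confirmation.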
It should be noted that, in the embodiment of the present invention, the video networking conference scheduling software (parameter) running on the conference server is video networking conference scheduling software with comprehensive functions, high execution efficiency and stable running. The video conference management system has a powerful video conference management function, and supports the scheduling of video sources such as video networking terminals, electronic matrixes, monitoring broadcast, live broadcast sources, media synthesizers, electric walls, video monitoring access, third party conference terminals accessed by XMU, mobile phones, PC soft terminals, computer screen capturing, unmanned aerial vehicles and the like. The system has super flexibility and can meet the video scheduling requirements of tens of thousands of point ultra-large video conference and emergency command.
With continued reference to Fig. 4, when the chairman terminal receives the semantic text content sent by the conference server, it displays the text as subtitles on the display screen, so that the conference moderator can read it and confirm the conference control operation it describes.
Chairman terminal and participant terminals: the chairman terminal and participant terminals A, B and C are interconnected over a network using the V2V protocol. V2V is an independently developed security protocol that incorporates commercial cryptographic techniques, and it mainly provides cryptographic protection for the video network in the following respects. At the network communication layer, it provides authenticity and confidentiality protection through core switching node networking authentication, service management server access authentication, terminal device access authentication, link transmission encryption, and the like. At the business service layer, it protects services and data through business data encryption, control signaling integrity, management instruction integrity, business authentication, and the like. It also strengthens user security management in the video network, ensuring that legitimate users access only the systems and resources within their access rights, and meets the cryptographic application assessment requirements for the relevant information systems.
(1) Cryptographic devices
The scheme takes cryptographic algorithms, cryptographic protocols, and cryptographic interfaces as its cryptographic foundation, and uses software and hardware cryptographic devices such as a key management service system, a certificate management service system, server cipher machines, PCIe encryption cards, and USBKeys to provide cryptographic service support for the video network.
(2) Cryptographic infrastructure
The scheme combines cryptographic technology with the video network and deploys cryptographic infrastructure such as key management and certificate management in an e-government video network environment. Key management provides services such as key generation, storage, use, distribution, import and export, backup, and recovery; certificate management provides services such as certificate issuance, storage, and certificate information queries.
(3) Cryptographic services
Built jointly on the cryptographic devices and the cryptographic infrastructure, the cryptographic services provide data encryption/decryption, signing/verification, identity authentication, and similar services to the upper-layer core service applications of the video network.
(4) Cryptographic applications
Identity authentication for device networking and device access is implemented with cryptographic technology; service security authentication between each service system and the terminals is implemented with cryptographic technology, ensuring the legitimate identity of devices participating in services; cryptographic security design is built into the protocol to resist attacks such as replay, tampering, and corruption, enabling secure transmission of audio and video service data; and integrity checks based on cryptographic technology are applied when key configuration information is read and stored, preventing malicious tampering.
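The text does not specify how V2V achieves control signaling integrity and replay resistance. As a generic illustration only, the sketch below protects a signaling message with a timestamp, a random nonce, and an HMAC tag. HMAC-SHA256 is swapped in as a stand-in for whatever commercial cryptographic algorithms V2V actually uses, and `MAX_SKEW_SECONDS`, `sign_signaling`, and `verify_signaling` are hypothetical names.

```python
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional

MAX_SKEW_SECONDS = 30  # assumed freshness window for a signaling message


def _body(msg: dict) -> bytes:
    # Canonical serialization of every field except the tag itself.
    return json.dumps({k: v for k, v in msg.items() if k != "tag"}, sort_keys=True).encode()


def sign_signaling(key: bytes, payload: dict, now: Optional[float] = None) -> dict:
    """Attach a timestamp, a random nonce and an HMAC-SHA256 tag to a message."""
    msg = {"payload": payload,
           "ts": int(time.time() if now is None else now),
           "nonce": secrets.token_hex(16)}
    msg["tag"] = hmac.new(key, _body(msg), hashlib.sha256).hexdigest()
    return msg


def verify_signaling(key: bytes, msg: dict, seen_nonces: set,
                     now: Optional[float] = None) -> bool:
    """Reject tampered, stale or replayed messages."""
    expected = hmac.new(key, _body(msg), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, msg.get("tag", "")):
        return False  # tampered, or signed with a different key
    current = time.time() if now is None else now
    if abs(current - msg["ts"]) > MAX_SKEW_SECONDS:
        return False  # too old to accept
    if msg["nonce"] in seen_nonces:
        return False  # replayed message
    seen_nonces.add(msg["nonce"])
    return True
```

The timestamp bounds how long the receiver must remember nonces; tampering is caught by the tag, and replay within the window by the nonce set.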
Step 302: when the voice content contains a preset voice wake-up word, treat subsequently received voice content as conference control voice and send it to the conference server.
For this step, refer to step 102; details are not repeated in this embodiment.
Optionally, step 302 may include the following sub-step:
Sub-step 3021: when the voice content contains a preset voice wake-up word and the wake-up word is located at the beginning of the whole voice content, treat subsequently received voice content as conference control voice and send it to the conference server.
Referring to Fig. 5, a terminal interaction diagram for voice control of an online conference according to an embodiment of the present application is shown. As shown in Fig. 5, following the interaction sequence from top to bottom, Pamir first turns on the voice control switch and then notifies the terminal to enable its voice recognition module. When the chairman terminal enables the voice recognition module, it enters voice recognition mode and displays a caption on the screen indicating that voice recognition is on. Note that this prompt is shown as a closed caption on the topmost layer of the display, so other conference content does not obscure it.
In one embodiment of the present application, when the chairman terminal recognizes the voice wake-up word "flat-headed brother" in the received voice content, it determines that the voice mode wake-up condition is satisfied and enters the state in which subsequently received voice content is treated as conference control voice; it then notifies Pamir to prepare to receive voice input commands.
In one embodiment of the present application, the voice content satisfies the voice mode wake-up condition only when it contains the preset voice wake-up word and the wake-up word is located at the beginning of the whole voice content, for example: "Flat-headed brother, change A to speaker 1".
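The prefix-position wake-up condition can be sketched as a check on an already-transcribed utterance. This is an illustrative sketch, assuming the recognizer yields text and using the wake word from the example; `extract_command` and `_normalize` are hypothetical names. An empty string means the terminal has woken and the command is expected in later speech.

```python
import re
from typing import Optional

WAKE_WORD = "flat-headed brother"  # from the example; configurable in practice


def _normalize(text: str) -> str:
    # Lowercase, replace punctuation (except hyphens) with spaces, collapse whitespace.
    return " ".join(re.sub(r"[^\w\s-]", " ", text.lower()).split())


def extract_command(utterance: str, wake_word: str = WAKE_WORD) -> Optional[str]:
    """Return the text after the wake word, or None if the wake condition fails.

    The condition holds only when the wake word is the prefix of the whole utterance.
    """
    norm = _normalize(utterance)
    ww = _normalize(wake_word)
    if norm == ww:
        return ""  # woken; the control voice arrives in subsequent speech
    if norm.startswith(ww + " "):
        return norm[len(ww) + 1:]
    return None  # wake word missing, or not at the beginning
```

Requiring a space after the wake word avoids waking on a longer word that merely starts with it.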
Step 303: convert the conference control voice into corresponding semantic text content and send it to the chairman terminal.
For this step, refer to step 202; details are not repeated in this embodiment.
Optionally, step 303 may further include the steps of:
Sub-step 3031: when it is recognized that the semantic text content does not conform to the preset voice control format specification, send specification warning information for the voice content to the chairman terminal, so that the chairman terminal receives and responds to it by displaying a prompt that the current voice content does not conform to the preset voice specification.
In the embodiment of the application, a standardized voice control template is also defined for conference control voice. The corresponding semantic text content is sent back to the chairman terminal for confirmation if and only if the conference control voice from the chairman terminal conforms to the preset voice control format. The voice control format specification is preset in the parameters of the scheduling software on the conference server; for example, "change A to speaker 1", "set A as chairman viewer", "add A to conference", and "adjust viewing screen of A to left and right split screens" are conference control voices that satisfy the specification.
By contrast, voice content such as "let A speak" or "pull A into the conference" does not conform to the voice control format specification; a voice command that does not follow the specification cannot be accurately recognized and executed by the terminal.
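One way to picture the format check is to encode each template as a regular expression and reject any semantic text that matches none of them. This is a minimal sketch under the assumption that the specification is a fixed list of English patterns taken from the examples; the template names, `match_template`, and `check_control_text` are hypothetical, and the real specification lives in the scheduling software's parameters.

```python
import re
from typing import Dict, Optional, Tuple

# Hypothetical encoding of the preset voice control format specification:
# one regular expression per example template in the text.
TEMPLATES = {
    "change_speaker": re.compile(r"change (?P<who>.+?) to speaker (?P<slot>\d+)"),
    "set_chairman_viewer": re.compile(r"set (?P<who>.+?) as chairman viewer"),
    "add_to_conference": re.compile(r"add (?P<who>.+?) to conference"),
    "split_screen": re.compile(
        r"adjust viewing screen of (?P<who>.+?) to left and right split screens"),
}


def _normalize(text: str) -> str:
    # Lowercase, collapse whitespace, drop a trailing full stop.
    return " ".join(text.lower().split()).rstrip(".")


def match_template(semantic_text: str) -> Optional[Tuple[str, Dict[str, str]]]:
    text = _normalize(semantic_text)
    for name, pattern in TEMPLATES.items():
        m = pattern.fullmatch(text)
        if m:
            return name, m.groupdict()
    return None


def check_control_text(semantic_text: str) -> Dict[str, str]:
    """Server-side check before asking the chairman terminal to confirm (sub-step 3031)."""
    if match_template(semantic_text) is None:
        return {"type": "spec_warning",
                "message": "The current voice content does not conform to the preset voice specification."}
    return {"type": "confirm_request", "text": semantic_text}
```

Using `fullmatch` means extra words before or after a template cause a specification warning instead of a partial match.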
Step 304: receive the semantic text content sent by the conference server, perform a confirmation operation on it, and send confirmation information for the conference control voice to the conference server.
For this step, refer to step 103; details are not repeated in this embodiment.
Optionally, step 304 may include the steps of:
Sub-step 3041: display the semantic text content and a confirmation prompt for the conference control voice.
For example, after receiving the semantic text content from the conference server, the chairman terminal may display the prompt "Is this command correct? Please confirm by voice." on the screen, and then continue collecting the moderator's voice input to confirm whether the conference control instruction should be executed. Note that this prompt is shown as a closed caption on the topmost layer of the display, so other conference content does not obscure it.
Sub-step 3042: receive a reply voice to the confirmation prompt.
Sub-step 3043: when the content of the reply voice expresses confirmation, send confirmation information to the conference server, so that the conference server controls the conference according to the confirmation information and the conference control voice.
For sub-steps 3042 and 3043, after receiving a reply voice to the confirmation prompt that expresses confirmation, the chairman terminal sends confirmation information back to the conference server, indicating that the conference control instruction should be executed, so that the conference server controls the conference according to the confirmation information and the conference control voice.
Optionally, substep 3043 may include the steps of:
Sub-step B1: if no reply voice to the confirmation prompt is received within the preset reply time threshold, return to the step of treating subsequently received voice content as conference control voice and sending it to the conference server.
With continued reference to Fig. 5, if the moderator changes their mind and no longer wants to execute the control corresponding to the previously sent conference control voice, or wants to perform a different conference control, they can simply not respond to the confirmation prompt within the reply time set by the system. The chairman terminal then receives no confirmation reply and re-enters voice command mode to start receiving conference control voice again.
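Sub-steps 3042, 3043 and B1 amount to a wait-with-timeout on the chairman terminal. The sketch below assumes replies arrive as transcribed strings on a `queue.Queue`; the affirmative vocabulary, the 10-second threshold, and the function names are hypothetical. The text only specifies the timeout case, so treating an explicit non-confirming reply the same way is an assumption.

```python
import queue
from typing import Callable

AFFIRMATIVE = {"yes", "confirm", "correct", "ok"}  # assumed replies that "express confirmation"
REPLY_TIMEOUT_S = 10.0                              # assumed preset reply time threshold


def await_confirmation(replies: queue.Queue, timeout: float = REPLY_TIMEOUT_S) -> str:
    """Wait for one transcribed reply; return 'confirmed', 'rejected' or 'timeout'."""
    try:
        reply = replies.get(timeout=timeout)
    except queue.Empty:
        return "timeout"
    # Whole-reply match, so "not correct" is not mistaken for "correct".
    normalized = reply.strip().lower().rstrip(".!")
    return "confirmed" if normalized in AFFIRMATIVE else "rejected"


def confirmation_step(replies: queue.Queue, send_confirmation: Callable[[], None],
                      timeout: float = REPLY_TIMEOUT_S) -> str:
    """Chairman-terminal logic for sub-steps 3042/3043 and B1."""
    if await_confirmation(replies, timeout) == "confirmed":
        send_confirmation()  # the server may now build and run the instruction
        return "sent"
    # Timeout: re-enter command listening (sub-step B1).
    # Handling an explicit non-confirming reply the same way is an assumption.
    return "listen_for_command"
```

Because nothing is sent on timeout, the server never receives confirmation for an abandoned command, which matches the "change of mind" behavior described above.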
Step 305: receive and respond to the confirmation information sent by the chairman terminal, and construct a conference control instruction corresponding to the conference control voice to control the conference.
Optionally, step 305 may include the steps of:
Sub-step 3051: recognize the semantic text content and determine the target terminal and target control parameters to be controlled, where the target terminal includes the chairman terminal or at least some of the participant terminals.
Sub-step 3052: obtain the identity corresponding to the target terminal, and generate a conference control instruction according to the identity and the target control parameters.
For sub-steps 3051 and 3052, after determining the target terminal and target control parameters, the system generates a matching executable instruction to control the conference. For example:
1. Changing speaker 1: the moderator says "change A to speaker 1" (where A is the participant's name). After recognizing the instruction, the terminal displays the caption "Change A to speaker 1, confirm?" on the moderator's viewing screen. After the moderator answers "yes", the terminal sends the command DllChangeSpeakerEx 123-321-12345-0-0 123-321-56789-0-0 to Pamir (123-321-12345-0-0 is the moderator's terminal number and 123-321-56789-0-0 is participant A's terminal number), and Pamir executes it.
2. Setting (changing) the chairman viewing party: the moderator says "set A as the chairman viewing party" (where A is the participant's name). After recognizing the instruction, the terminal displays the caption "Set A as the chairman viewing party, confirm?" on the moderator's viewing screen. After the moderator answers "yes", the terminal sends the command DllChangeMasterViewing 123-321-56789-0-0 to Pamir (123-321-56789-0-0 is participant A's terminal number), and Pamir executes it.
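Sub-steps 3051 and 3052 can be sketched as: match the confirmed text, resolve the participant name to a terminal number, and format the command string. This is an illustrative sketch only. The terminal numbers and `DllChangeMasterViewing` come from the examples; the name-to-number `DIRECTORY` is a hypothetical lookup, and the speaker-change command name `DllChangeSpeakerEx` is an assumed reading of the identifier in example 1, not a confirmed Pamir API.

```python
import re
from typing import Dict

CHAIRMAN_ID = "123-321-12345-0-0"                        # moderator terminal number from the example
DIRECTORY: Dict[str, str] = {"a": "123-321-56789-0-0"}  # hypothetical name -> terminal number lookup
CHANGE_SPEAKER_CMD = "DllChangeSpeakerEx"               # assumed reading of the example-1 command


def _terminal_id(name: str) -> str:
    try:
        return DIRECTORY[name]
    except KeyError:
        raise ValueError(f"unknown participant: {name!r}") from None


def build_instruction(semantic_text: str) -> str:
    """Find the target terminal and parameters, then format the command (3051/3052)."""
    text = " ".join(semantic_text.lower().split()).rstrip(".")
    m = re.fullmatch(r"change (.+?) to speaker \d+", text)
    if m:
        # Example 1 passes the moderator's terminal number, then the target's.
        return f"{CHANGE_SPEAKER_CMD} {CHAIRMAN_ID} {_terminal_id(m.group(1))}"
    m = re.fullmatch(r"set (.+?) as the chairman viewing party", text)
    if m:
        return f"DllChangeMasterViewing {_terminal_id(m.group(1))}"
    raise ValueError(f"unsupported control text: {semantic_text!r}")
```

Raising on unknown names or unsupported text keeps the server from building a partial instruction.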
Sub-step 3053: execute the conference control instruction and send its execution result to the chairman terminal and the target terminal, so that they respond to the execution result.
Because the whole online conference system runs on the conference server and its terminal software, once the system has carried out the control requested by the conference control voice, the change must be reflected on all current terminals. Therefore, after executing the conference control instruction, the conference server sends the execution result to the chairman terminal and the target terminal so that they can respond to it.
For example, for the control operation "change A to speaker 1", the system forwards only the audio input from participant terminal A to each terminal, so that all audio received by terminals in the conference comes from terminal A.
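The speaker-change effect and the result notification of sub-step 3053 can be sketched as two small functions. This is a simplified model, assuming one audio frame per terminal per tick; the function names and the result dictionary shape are hypothetical, and not echoing the speaker's own audio back to it is an assumption rather than something the text states.

```python
from typing import Dict, List


def route_audio(frames: Dict[str, bytes], speaker: str, terminals: List[str]) -> Dict[str, bytes]:
    """Forward only the designated speaker's current audio frame to every other terminal."""
    frame = frames.get(speaker, b"")  # audio from all other terminals is dropped
    # Not echoing the speaker's own audio back to it is an assumption.
    return {t: frame for t in terminals if t != speaker}


def notify_result(instruction: str, ok: bool, chairman: str, target: str) -> Dict[str, Dict[str, object]]:
    """Sub-step 3053: send the same execution result to the chairman and target terminals."""
    return {t: {"instruction": instruction, "ok": ok} for t in (chairman, target)}
```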
In summary, in the embodiment of the application, the chairman terminal receives input voice content and, when the voice content satisfies the voice mode wake-up condition, sends subsequently received voice content to the conference server as conference control voice. The conference server converts the conference control voice into corresponding semantic text content and sends it back to the chairman terminal; the chairman terminal performs a confirmation operation on the semantic text content and sends confirmation information for the conference control voice to the conference server. When the conference server receives the confirmation information, it constructs a conference control instruction corresponding to the conference control voice to control the conference. The moderator's conference management and control requests are thus captured directly through voice recognition as control voice, sent to the conference server as a control request, and executed by the server once confirmed. This removes the need for the moderator to relay requests verbally to a conference operator on the server side, saves time on conference management and control, and makes the moderator's conference control more efficient.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the application.
Fig. 6 shows a block diagram of a voice control device for an online conference, which can be applied to a chairman terminal, and the voice control device 40 specifically includes the following modules:
a voice input module 401 for receiving input voice content;
the voice content sending module 402 is configured to, when the voice content contains a preset voice wake-up word, send subsequently received voice content as conference control voice to a conference server, so that the conference server converts the conference control voice into semantic text content and sends the semantic text content to the chairman terminal;
and the control voice confirmation module 403 is configured to send confirmation information for the conference control voice to the conference server in response to a confirmation operation for the semantic text content when the semantic text content sent by the conference server is received, so that the conference server controls the conference according to the confirmation information and the semantic text content.
In an embodiment of the present application, the voice content sending module 402 may further include:
the wake-up condition judging sub-module, used for determining that the voice content satisfies the voice mode wake-up condition when the voice content contains a preset voice wake-up word.
In the embodiment of the present application, the wake condition judgment sub-module may further include:
the wake-up condition judging unit is used for determining that the voice content meets the voice mode wake-up condition under the condition that the voice content contains a preset voice wake-up word and the voice wake-up word is located at the prefix position of the whole voice content.
In an embodiment of the present application, the control voice confirmation module 403 may further include:
the voice text display sub-module, used for displaying the semantic text content and a confirmation prompt for the conference control voice;
a reply voice receiving sub-module for receiving a reply voice for the confirmation prompt;
and the confirmation information sending sub-module is used for sending confirmation information to the conference server under the condition that the content of the reply voice accords with the confirmation meaning so that the conference server can control the conference according to the confirmation information and the conference control voice.
Optionally, the reply voice receiving sub-module may further include:
and the reply voice updating unit is used for re-entering the step of taking the subsequently received voice content as conference control voice and sending the conference control voice to the conference server under the condition that the reply voice for the confirmation prompt is not received within the preset receiving time threshold.
Fig. 7 shows a block diagram of a voice control device for an online conference, which can be applied to a conference server; the voice control device 50 specifically includes the following modules:
a voice receiving module 501, configured to receive conference control voice sent by a chairman terminal;
the voice conversion module 502 is configured to convert the conference control voice into corresponding semantic text content and send the semantic text content to the chairman terminal, so that the chairman terminal responds to a confirmation operation of the semantic text content, and sends confirmation information of the conference control voice to the conference server;
and the conference control module 503 is configured to construct a conference control instruction corresponding to the conference control voice to control the conference when receiving the confirmation information sent by the chairman terminal.
In an embodiment of the present application, the conference control module 503 may further include:
the control content determining sub-module, used for recognizing the semantic text content and determining the target terminal and target control parameters to be controlled, where the target terminal includes the chairman terminal or at least some of the participant terminals;
the control instruction generation sub-module is used for acquiring an identity corresponding to the target terminal and generating a conference control instruction according to the identity and the target control parameter;
and the control instruction execution sub-module is used for executing the conference control instruction and sending an execution result of the conference control instruction to the chairman terminal and the target terminal so that the chairman terminal and the target terminal can respond to the execution result.
In an embodiment of the present application, the apparatus may further include:
the specification warning module, used for sending specification warning information for the voice content to the chairman terminal when it is recognized that the semantic text content does not conform to the preset voice control format specification, so that the chairman terminal receives and responds to it by displaying a prompt that the current voice content does not conform to the preset voice specification.
In summary, in the voice control device for an online conference provided by the embodiment of the application, the chairman terminal receives input voice content and, when the voice content satisfies the voice mode wake-up condition, sends subsequently received voice content to the conference server as conference control voice. The conference server converts the conference control voice into corresponding semantic text content and sends it back to the chairman terminal; the chairman terminal performs a confirmation operation on the semantic text content and sends confirmation information for the conference control voice to the conference server. When the conference server receives the confirmation information, it constructs a conference control instruction corresponding to the conference control voice to control the conference. The moderator's conference management and control requests are thus captured directly through voice recognition as control voice, sent to the conference server as a control request, and executed by the server once confirmed. This removes the need for the moderator to relay requests verbally to a conference operator on the server side, saves time on conference management and control, and makes the moderator's conference control more efficient.
For the device embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points.
An embodiment of the application discloses an electronic device, comprising: one or more processors; and one or more machine-readable media having instructions stored thereon that, when executed by the one or more processors, cause the device to perform the voice control method for an online conference according to any of the above embodiments.
An embodiment of the application discloses a computer-readable storage medium storing a computer program that causes a processor to execute the voice control method for an online conference according to any of the above embodiments.
Each embodiment in this specification is described progressively, focusing on its differences from the other embodiments; for identical or similar parts, the embodiments may be referred to one another.
It will be apparent to those skilled in the art that embodiments of the present application may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present application are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the application.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The voice control method and device for an online conference provided by the application have been described in detail above. Specific examples are used herein to illustrate the principles and embodiments of the application, and the description of these examples is only intended to help understand the method and core idea of the application. Meanwhile, those skilled in the art may make changes to the specific embodiments and scope of application in accordance with the ideas of the application; in summary, the content of this specification should not be construed as limiting the application.

Claims (10)

1. A voice control method for an online conference, applied to a chairman terminal of the conference, the method comprising:
receiving input voice content;
under the condition that the voice content contains a preset voice wake-up word, the subsequently received voice content is used as conference control voice and is sent to a conference server, so that the conference server converts the conference control voice into semantic text content and then sends the semantic text content to the chairman terminal;
and under the condition that the semantic text content sent by the conference server is received, sending confirmation information for the conference control voice to the conference server in response to the confirmation operation for the semantic text content, so that the conference server can control the conference according to the confirmation information and the semantic text content.
2. The method according to claim 1, wherein in the case that the voice content includes a preset voice wake word, the step of sending the subsequently received voice content to the conference server as conference control voice includes:
and under the condition that the voice content comprises a preset voice wake-up word and the voice wake-up word is positioned at the prefix position of the whole voice content, taking the subsequently received voice content as conference control voice and sending the conference control voice to a conference server.
3. The method of claim 1, wherein the sending acknowledgement information for the conference control voice to the conference server in response to the acknowledgement operation for the semantic text content comprises:
displaying the semantic text content and a confirmation prompt for the conference control voice;
receiving a reply voice for the confirmation prompt;
and sending confirmation information to the conference server under the condition that the content of the reply voice accords with the confirmation meaning, so that the conference server can control the conference according to the confirmation information and the conference control voice.
4. The method of claim 3, wherein after displaying the semantic text content and the confirmation prompt for the conference control voice, the method further comprises:
And re-entering the step of taking the subsequently received voice content as conference control voice and sending the conference control voice to a conference server under the condition that the reply voice for the confirmation prompt is not received within the preset receiving time threshold.
5. A voice control method for a video networking conference, applied to a conference server of the conference, the method comprising:
receiving conference control voice sent by a chairman terminal;
converting the conference control voice into corresponding semantic text content and sending the corresponding semantic text content to the chairman terminal so that the chairman terminal can respond to the confirmation operation of the semantic text content and send confirmation information of the conference control voice to the conference server;
and under the condition that confirmation information sent by the chairman terminal is received, constructing a conference control instruction corresponding to the conference control voice to control the conference.
6. The method of claim 5, wherein constructing the conference control instruction corresponding to the conference control voice to control the conference comprises:
identifying the semantic text content and determining the target terminal and target control parameters to be controlled, wherein the target terminal comprises: the chairman terminal or at least some of the participant terminals;
Acquiring an identity corresponding to the target terminal, and generating a conference control instruction according to the identity and a target control parameter;
and executing the conference control instruction, and sending an execution result of the conference control instruction to the chairman terminal and the target terminal so that the chairman terminal and the target terminal respond to the execution result.
7. The method of claim 5, wherein, after the conference control voice is sent to the conference server, the method further comprises:
when the semantic text content does not conform to a preset voice control format specification, sending specification alarm information for the voice content to the chairman terminal, so that the chairman terminal receives and responds to the specification alarm information by displaying a prompt that the current voice content does not conform to the preset voice specification.
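The format check in this claim can be sketched as a single validation step that either passes the text on or emits an alarm. The pattern `ALLOWED`, the verb list, the alarm dictionary, and the function `check_format` are hypothetical stand-ins for the unspecified "preset voice control format specification".

```python
import re
from typing import Callable

# Hypothetical format specification: a known verb followed by at least one target word.
ALLOWED = re.compile(r"^(mute|unmute|kick|invite|switch to)\s+\S.*$", re.IGNORECASE)


def check_format(text: str, send_alarm: Callable[[dict], None]) -> bool:
    """Return True if the semantic text conforms; otherwise alert the chairman terminal."""
    if ALLOWED.match(text.strip()):
        return True
    send_alarm({
        "type": "spec_alarm",
        "text": text,
        "prompt": "The current voice content does not conform to the preset voice specification.",
    })
    return False
```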
8. A voice control apparatus for an online conference, the apparatus being applied to a chairman terminal of the conference, the apparatus comprising:
a voice input module, configured to receive input voice content;
a voice content sending module, configured to, when the voice content contains a preset voice wake-up word, take subsequently received voice content as conference control voice and send the conference control voice to a conference server, so that the conference server converts the conference control voice into semantic text content and then sends the semantic text content to the chairman terminal;
and a control voice confirmation module, configured to, upon receiving the semantic text content sent by the conference server, send confirmation information of the conference control voice to the conference server in response to a confirmation operation on the semantic text content, so that the conference server controls the conference according to the confirmation information and the semantic text content.
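The three terminal-side modules can be pictured as one small state machine: a wake word arms the terminal, the next utterance is forwarded as control voice, and returned text is confirmed before being acknowledged. A minimal sketch under assumptions: the class `ChairmanVoiceControl`, the wake word `"hello conference"`, and the callables are hypothetical, and voice content is modeled as already-transcribed strings, whereas a real terminal would run keyword spotting on audio.

```python
from typing import Callable

WAKE_WORD = "hello conference"  # hypothetical preset voice wake-up word


class ChairmanVoiceControl:
    """Hypothetical sketch of the claim-8 modules on the chairman terminal."""

    def __init__(self, send_to_server: Callable[[str], None],
                 send_confirmation: Callable[[str], None]):
        self.send_to_server = send_to_server
        self.send_confirmation = send_confirmation
        self.armed = False  # True once the wake word has been heard

    def on_voice(self, utterance: str) -> None:
        """Voice input module: route each utterance."""
        if self.armed:
            self.send_to_server(utterance)  # voice content sending module
            self.armed = False
        elif WAKE_WORD in utterance.lower():
            self.armed = True               # next utterance is the control voice

    def on_semantic_text(self, text: str, user_confirms: Callable[[str], bool]) -> None:
        """Control voice confirmation module: acknowledge only confirmed text."""
        if user_confirms(text):
            self.send_confirmation(text)
```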
9. A voice control apparatus for an online conference, the apparatus comprising:
a voice receiving module, configured to receive conference control voice sent by a chairman terminal;
a voice conversion module, configured to convert the conference control voice into corresponding semantic text content and send the semantic text content to the chairman terminal, so that the chairman terminal, in response to a confirmation operation on the semantic text content, sends confirmation information of the conference control voice to the conference server;
and a conference control module, configured to, upon receiving the confirmation information sent by the chairman terminal, construct a conference control instruction corresponding to the conference control voice to control the conference.
10. A computer-readable storage medium, characterized in that it stores a computer program causing a processor to execute the online conference voice control method according to any one of claims 1 to 7.
CN202310800749.3A 2023-06-30 2023-06-30 Voice control method and device for online conference Pending CN116939144A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202310800749.3A (CN116939144A) | 2023-06-30 | 2023-06-30 | Voice control method and device for online conference

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202310800749.3A (CN116939144A) | 2023-06-30 | 2023-06-30 | Voice control method and device for online conference

Publications (1)

Publication Number | Publication Date
CN116939144A | 2023-10-24

Family

ID=88383496

Family Applications (1)

Application Number | Status | Priority Date | Filing Date | Title
CN202310800749.3A (CN116939144A) | Pending | 2023-06-30 | 2023-06-30 | Voice control method and device for online conference

Country Status (1)

Country | Publications
CN (1) | CN116939144A

Similar Documents

Publication Title
US11880442B2 (en) Authentication of audio-based input signals
EP3453146B1 (en) Communication system
CN110213522B (en) Video data processing method and device and related equipment
US10938870B2 (en) Content management across a multi-party conferencing system by parsing a first and second user engagement stream and facilitating the multi-party conference using a conference engine
CN112738559B (en) Screen projection implementation method, device and system
KR102085383B1 (en) Termial using group chatting service and operating method thereof
WO2016127691A1 (en) Method and apparatus for broadcasting dynamic information in multimedia conference
US10893235B2 (en) Conferencing apparatus and method for switching access terminal thereof
US20120259924A1 (en) Method and apparatus for providing summary information in a live media session
CN107370610A (en) Meeting synchronous method and device
CN105916005B (en) The content model control method and device of multimedia messages
CN111147789B (en) Method, device and equipment for recording audio and video stream and storage medium
CN110418181B (en) Service processing method and device for smart television, smart device and storage medium
CN117715048A (en) Telecommunication fraud recognition method, device, electronic equipment and storage medium
US20180232356A1 (en) Method and system to communicate between devices through natural language using instant messaging applications and interoperable public identifiers
CN110601850B (en) Scenic spot information recording method, related equipment and storage medium
CN116939144A (en) Voice control method and device for online conference
CN111212043A (en) Multimedia file generation method and device
CN113420133B (en) Session processing method, device, equipment and storage medium
CN110635993B (en) Method and apparatus for synthesizing multimedia information
CN111954004B (en) Data processing method, control equipment, remote user terminal and data processing system
KR20190009658A (en) Service provision device, user device and authentication server for executing authentication in voice interface device
CN114840824A (en) Data processing method, device, terminal, cloud service and storage medium
CN111726645A (en) Live broadcast control method and device, electronic equipment and storage medium
US20230308493A1 (en) Access method and device for managing access to a secure communication session between participating communication terminals by a requesting communication terminal

Legal Events

Date Code Title Description
PB01 Publication