WO2023241360A1 - Online class voice interaction methods and apparatus, device and storage medium

Info

Publication number
WO2023241360A1
WO2023241360A1 PCT/CN2023/097411 CN2023097411W WO2023241360A1 WO 2023241360 A1 WO2023241360 A1 WO 2023241360A1 CN 2023097411 W CN2023097411 W CN 2023097411W WO 2023241360 A1 WO2023241360 A1 WO 2023241360A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio
voice
client
sent
online
Prior art date
Application number
PCT/CN2023/097411
Other languages
French (fr)
Chinese (zh)
Inventor
迪力木拉提·都里昆
吴承峰
Original Assignee
北京新唐思创教育科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京新唐思创教育科技有限公司
Publication of WO2023241360A1

Classifications

    • G: PHYSICS
    • G09: EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B: EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00: Electrically-operated educational appliances
    • G09B5/08: Electrically-operated educational appliances providing for individual presentation of information to a plurality of student stations
    • G09B5/14: Electrically-operated educational appliances providing for individual presentation of information to a plurality of student stations with provision for individual teacher-student communication
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/26: Speech to text systems
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00: Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/80: Responding to QoS

Definitions

  • the present disclosure relates to the technical field of human-computer interaction, and in particular to a voice interaction method, device, equipment and storage medium for online classrooms.
  • the teacher starts online live teaching, and when a student needs to speak, the student's voice permissions are enabled through buttons in the live broadcast interface.
  • a voice interaction method for online classes including:
  • sending the first audio to the server, wherein the server receives the first audio sent by multiple clients and adds the first audio sent by the multiple clients to the voice queue in sequence;
  • the second audio that has not been played in the voice queue is acquired in sequence, and the second audio is played.
  • another voice interaction method for online classes including:
  • the unplayed second audio in the voice queue is sent to the multiple clients in sequence, wherein each client plays the second audio.
  • a voice interaction device for online classes including:
  • the recording module is used to obtain the recording in response to the triggering operation of the voice recording control in the online classroom interface.
  • an upload module used to send the first audio to the server, wherein the server receives the first audio sent by multiple clients and adds the first audio sent by the multiple clients to the voice queue in turn;
  • a playback module configured to sequentially obtain unplayed second audio in the voice queue and play the second audio.
  • another voice interaction device for online classes including:
  • the receiving module is used to receive the first audio sent by multiple clients
  • a storage module configured to add multiple first audios to the voice queue in sequence according to the timestamp information of each first audio
  • a sending module configured to send the unplayed second audio in the voice queue to the multiple clients in sequence, where each client plays the second audio.
  • an electronic device including: a processor; and a memory for storing instructions executable by the processor; the processor is configured to read the executable instructions from the memory and execute them to implement the above voice interaction method in the online classroom.
  • a computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the above-mentioned voice interaction method in an online classroom is implemented.
  • the client obtains the recorded first audio and sends it to the server.
  • the server receives the first audio sent by multiple clients and adds it to the voice queue in turn; the client then sequentially obtains the unplayed second audio in the voice queue and plays the second audio.
  • This provides a semi-asynchronous voice discussion function for online classes and enhances the atmosphere of the online classroom. When multiple clients record audio, each audio is acquired and played sequentially based on the voice queue, which achieves semi-asynchronous voice sharing and communication, ensures orderly speech among students, and improves the voice interaction effect of the online classroom.
  • Figure 1 is a schematic flowchart of a voice interaction method in an online classroom provided by an embodiment of the present disclosure
  • Figure 2 is a schematic flowchart of another voice interaction method in an online classroom provided by an embodiment of the present disclosure
  • Figure 3 is a schematic diagram of an online classroom interface provided by an embodiment of the present disclosure.
  • Figure 4 is a schematic flowchart of another voice interaction method in an online classroom provided by an embodiment of the present disclosure
  • Figure 5 is a schematic structural diagram of a voice interaction device for online classrooms provided by an embodiment of the present disclosure
  • Figure 6 is a schematic structural diagram of another online classroom voice interaction device provided by an embodiment of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
  • Figure 1 is a schematic flow chart of a voice interaction method in an online classroom provided by an embodiment of the present disclosure. As shown in Figure 1, the voice interaction method in an online classroom provided by an embodiment of the present disclosure includes:
  • Step 101 Obtain the recorded first audio in response to a triggering operation on the voice recording control in the online classroom interface.
  • the method of the embodiment of the present disclosure is used for voice interaction between users in an online classroom.
  • a virtual teaching scene can be constructed based on the Unity3D engine and displayed in the online classroom interface to realize online classroom teaching.
  • the online classroom interface is configured with a voice recording control, and the user can activate the voice recording function by triggering the voice recording control.
  • the client obtains the first recorded audio in response to the triggering operation of the voice recording control in the online classroom interface.
  • the triggering operations of the voice recording control include but are not limited to key triggering, touch track triggering, gesture triggering, etc.
  • the multiple clients may be multiple student-side clients, or may include a teacher-side client and at least one student-side client.
  • the teacher captures video through a camera, the teacher-side client sends the captured video to the server, the server sends the video to each student-side client, and the student-side clients display the video in a designated area in real time to implement online classroom teaching, where the designated area is, for example, the upper right area of the online classroom interface.
  • the voice recording control is configured in the online classroom interface of the student-side client.
  • the voice recording control can be configured as press-and-hold to speak. Students record their voice through the voice recording control in the online classroom interface, the first audio is generated locally on the student-side client, and the student-side client sends the first audio to the server.
  • Step 102 Send the first audio to the server, where the server receives the first audio sent by multiple clients and adds the first audio sent by the multiple clients to the voice queue in sequence.
  • each client sends the first audio recorded locally to the server.
  • the server is provided with a voice queue, and the voice queue is used to store the first audio received by the server.
  • the server receives multiple first audios sent by multiple clients, and adds the multiple first audios to the voice queue in sequence according to the timestamp information of each first audio.
  • the student-side client obtains the recorded first audio and the recording timestamp corresponding to the first audio.
  • the recording timestamp is used to indicate the recording time of the first audio.
  • the student-side client sends the first audio and its corresponding recording timestamp to the server.
  • when the server receives multiple first audios, it adds them to the voice queue in order of recording time according to their recording timestamps, so that the audios are stored in the voice queue in a defined order.
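  • As an illustration only, the timestamp-ordered voice queue described above could be sketched as follows. The names `Clip`, `VoiceQueue`, `add` and `order` are hypothetical, and breaking timestamp ties by arrival order is an assumption; the text does not specify either.

```python
import bisect
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class Clip:
    recorded_at: float                      # recording timestamp sent by the client
    seq: int                                # arrival order, breaks ties (assumption)
    user_id: str = field(compare=False)
    payload: bytes = field(compare=False)


class VoiceQueue:
    """Hypothetical server-side queue that keeps clips sorted by recording time."""

    def __init__(self) -> None:
        self._clips: List[Clip] = []
        self._arrivals = 0

    def add(self, user_id: str, payload: bytes, recorded_at: float) -> None:
        clip = Clip(recorded_at, self._arrivals, user_id, payload)
        self._arrivals += 1
        bisect.insort(self._clips, clip)    # insert while preserving sorted order

    def order(self) -> List[str]:
        return [clip.user_id for clip in self._clips]


queue = VoiceQueue()
queue.add("student_b", b"...", recorded_at=12.5)
queue.add("student_a", b"...", recorded_at=10.0)
queue.add("student_c", b"...", recorded_at=12.5)
print(queue.order())  # ['student_a', 'student_b', 'student_c']
```

  • The sketch leaves open how a clip that arrives late but carries an early timestamp should be handled; the text does not address that case.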
  • Step 103 Acquire the unplayed second audio in the voice queue in sequence, and play the second audio.
  • each client sequentially obtains and plays the audios in the voice queue according to the order between the audios in the voice queue.
  • the voice queue includes played audio and unplayed second audio, and each time the client obtains and plays an unplayed second audio from the voice queue.
  • the server sends the unplayed second audio in the voice queue to each client in turn according to a preset time interval; or, the client obtains and plays audio one, and when the client detects that audio one has finished playing, it sends an audio retrieval request to the server.
  • according to the audio retrieval request, the server obtains from the voice queue audio two, which is adjacent to audio one and has not been played, and sends audio two to the client.
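  • A minimal sketch of the request-driven mode described above, in which the client asks for the next clip only after the current one has finished. `PullServer`, `next_unplayed` and `play_all` are hypothetical names, and tracking one cursor per client is an assumption rather than something the text states.

```python
from typing import Dict, List, Optional


class PullServer:
    """Hypothetical server side of the pull mode: one cursor per client."""

    def __init__(self, queue: List[str]) -> None:
        self.queue = queue                  # clip ids, already in queue order
        self.cursor: Dict[str, int] = {}    # client id -> index of next unplayed clip

    def next_unplayed(self, client_id: str) -> Optional[str]:
        """Handle one audio retrieval request from a client."""
        i = self.cursor.get(client_id, 0)
        if i >= len(self.queue):
            return None                     # nothing unplayed for this client
        self.cursor[client_id] = i + 1
        return self.queue[i]


def play_all(server: PullServer, client_id: str) -> List[str]:
    """Client loop: request the next clip only after the current one finishes."""
    played: List[str] = []
    while (clip := server.next_unplayed(client_id)) is not None:
        played.append(clip)                 # stand-in for actual audio playback
    return played


server = PullServer(["audio_1", "audio_2", "audio_3"])
print(play_all(server, "client_a"))  # ['audio_1', 'audio_2', 'audio_3']
```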
  • when the online class is set up in group discussion mode, the student-side clients in the same group are set with the same group ID, and each client sends the first audio together with its group ID to the server.
  • the server stores the multiple first audios with their corresponding group IDs, and each client sequentially obtains and plays the corresponding audios from the server according to its own group ID.
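  • One way to picture the group discussion mode is a separate queue per group ID, as in the hedged sketch below. `group_queues`, `upload` and `clips_for` are hypothetical names, and keeping a distinct list per group is an assumption; the text only says that audios are stored with their group identifiers.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

# group id -> list of (recording timestamp, user id), kept in recording order
group_queues: DefaultDict[str, List[Tuple[float, str]]] = defaultdict(list)


def upload(group_id: str, user_id: str, recorded_at: float) -> None:
    """Store a clip under its group and keep that group's queue time-ordered."""
    group_queues[group_id].append((recorded_at, user_id))
    group_queues[group_id].sort()           # adequate for a sketch with small queues


def clips_for(group_id: str) -> List[str]:
    """Return, in playback order, the speakers a client of this group would hear."""
    return [user for _, user in group_queues[group_id]]


upload("g1", "amy", 2.0)
upload("g2", "bob", 1.0)
upload("g1", "cai", 1.5)
print(clips_for("g1"))  # ['cai', 'amy']
print(clips_for("g2"))  # ['bob']
```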
  • the client obtains the recorded first audio and sends it to the server.
  • the server receives the first audio sent by multiple clients and adds the first audio sent by the multiple clients to the voice queue in sequence; the client sequentially obtains the unplayed second audio in the voice queue and plays the second audio.
  • after acquiring the second audio in the voice queue, the client displays the second audio in the form of a voice bar in a preset area in the online classroom interface.
  • FIG. 2 is a schematic flow chart of a voice interaction method in an online classroom provided by an embodiment of the present disclosure.
  • as shown in Figure 2, displaying the second audio in sequence in the form of voice bars in the preset area of the online classroom interface includes:
  • Step 201 Perform speech recognition on the second audio based on the pre-trained speech recognition model, and obtain the text content corresponding to the second audio.
  • the input of the speech recognition model is audio
  • the output is text content corresponding to the audio.
  • the speech recognition model can be implemented based on a deep neural network, using audio labeled with corresponding text content as training samples, and training the speech recognition model based on the training samples.
  • a speech recognition model is set in advance, and the speech recognition model is used to perform speech recognition on each audio in the speech queue, and the text content corresponding to each audio is obtained.
  • when the client sequentially obtains the unplayed second audio in the voice queue and plays it, it also obtains the text content corresponding to the second audio.
  • Step 202 Obtain the user identification of the second audio and the user name corresponding to the user identification.
  • the client obtains the recorded first audio and the user identification corresponding to the first audio, and sends the first audio and its corresponding user identification to the server. Furthermore, the client obtains the second audio in the voice queue together with the user identification corresponding to the second audio, and determines the corresponding user name based on the user identification.
  • the user ID is used to distinguish each user.
  • the user ID can be a user account, and the user name can be input by the user when creating the account.
  • Step 203 Fill in the preset controls according to the user name and text content, and generate a voice bar corresponding to the second audio.
  • the user name and the text content corresponding to the second audio are used as the display content of the voice bar; the preset control is populated with the user name and text content to generate the voice bar.
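  • Steps 201 to 203 can be pictured with the sketch below. `VoiceBar`, `build_voice_bar` and `fake_recognizer` are hypothetical names; the recognizer is a stand-in for the pre-trained speech recognition model, and falling back to the user ID when no user name is found is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class VoiceBar:
    user_name: str
    text: str
    audio_id: str


def build_voice_bar(audio_id: str, audio: bytes, user_id: str,
                    names: Dict[str, str],
                    recognize: Callable[[bytes], str]) -> VoiceBar:
    text = recognize(audio)                     # Step 201: audio -> text content
    user_name = names.get(user_id, user_id)     # Step 202: ID -> name (fallback is an assumption)
    return VoiceBar(user_name, text, audio_id)  # Step 203: fill the preset control


def fake_recognizer(audio: bytes) -> str:
    """Stand-in for a real speech recognition model: decodes bytes as text."""
    return audio.decode("utf-8")


bar = build_voice_bar("a1", b"the answer is 42", "u7", {"u7": "Li Lei"}, fake_recognizer)
print(f"{bar.user_name}: {bar.text}")  # Li Lei: the answer is 42
```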
  • Figure 3 shows a schematic diagram of the online classroom interface.
  • Mark 31 in the figure is the online classroom interface.
  • the virtual teaching scene is displayed in the online classroom interface.
  • the preset area in the online classroom interface can be the lower right corner.
  • the audio content can be displayed more intuitively based on the voice bar.
  • filling the preset control according to the user name and text content to generate a voice bar corresponding to the second audio includes: obtaining user preference information corresponding to the user identification; determining a target control of the display style according to the display style corresponding to the user preference information; and filling the target control according to the user name and text content to generate a voice bar corresponding to the second audio.
  • the user preference information is used to indicate the user's preference for the display style.
  • the user preference information may be set by the user, or may be determined based on the user behavior log.
  • the display style includes the theme of the control, bubble effects, etc.
  • the target control is rendered according to the display style, and the target control is filled with the user name and text content to generate a voice bar.
  • client one sends the recorded audio and the first user identification corresponding to the audio to the server.
  • when client two obtains the audio from the voice queue of the server, it uses the user preference information corresponding to the first user identification to determine the target control of the corresponding display style, fills the target control according to the user name and text content, and generates a voice bar corresponding to the audio.
  • voice bars with different display styles can be displayed according to user preferences, thereby improving the display effect.
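  • The preference-driven styling can be sketched as below. Note that, per the example above, the style is looked up by the speaker's user identification rather than the viewer's. `DEFAULT_STYLE`, `pick_style`, `render_voice_bar` and the style keys are hypothetical; the text only mentions control themes and bubble effects.

```python
from typing import Dict

# Hypothetical style fields; the text only names "theme" and "bubble effects".
DEFAULT_STYLE: Dict[str, str] = {"theme": "light", "bubble": "plain"}


def pick_style(speaker_id: str, preferences: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Merge the speaker's stored preferences over the defaults."""
    return {**DEFAULT_STYLE, **preferences.get(speaker_id, {})}


def render_voice_bar(user_name: str, text: str, style: Dict[str, str]) -> str:
    """Stand-in for filling a styled target control."""
    return f"[{style['theme']}/{style['bubble']}] {user_name}: {text}"


preferences = {"u1": {"theme": "dark"}}
style = pick_style("u1", preferences)       # looked up by the speaker's ID
print(render_voice_bar("Li Lei", "I think the answer is B", style))
# [dark/plain] Li Lei: I think the answer is B
```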
  • in response to a triggering operation on the target voice bar displayed in the preset area, the audio corresponding to the target voice bar is played.
  • the user can click on the voice bar displayed in the online classroom interface.
  • when the click operation on the voice bar is detected, the audio corresponding to the voice bar is obtained and played.
  • the implementation methods of triggering operations include but are not limited to button triggering, touch screen click triggering, gesture triggering, etc.
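  • Replaying a clip from its voice bar amounts to a lookup from the bar to its audio, as in this hedged sketch. `VoiceBarPanel`, `show` and `on_trigger` are hypothetical names, and the playback function is injected so that the example stays independent of any audio library.

```python
from typing import Callable, Dict, List


class VoiceBarPanel:
    """Hypothetical preset area that remembers which audio each voice bar shows."""

    def __init__(self, play: Callable[[bytes], None]) -> None:
        self._audio_by_bar: Dict[str, bytes] = {}
        self._play = play

    def show(self, bar_id: str, audio: bytes) -> None:
        self._audio_by_bar[bar_id] = audio

    def on_trigger(self, bar_id: str) -> bool:
        """Handle a click, tap or gesture on a voice bar; False if the bar is unknown."""
        audio = self._audio_by_bar.get(bar_id)
        if audio is None:
            return False
        self._play(audio)
        return True


played: List[bytes] = []
panel = VoiceBarPanel(played.append)        # list.append stands in for a real player
panel.show("bar_1", b"clip-1")
panel.on_trigger("bar_1")
print(played)  # [b'clip-1']
```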
  • voice bars are used for voice sharing and communication in the online classroom, and the voice interaction during online teaching is retained in the form of voice bars as effective classroom interaction, providing material for subsequent recorded lessons, student highlight reels, and the like.
  • FIG 4 is a schematic flow chart of another voice interaction method in an online classroom provided by an embodiment of the present disclosure.
  • the voice interaction method in an online classroom provided by an embodiment of the present disclosure includes:
  • Step 401 Receive first audio messages sent by multiple clients.
  • the execution subject is the server.
  • the online classroom interface is configured with a voice recording control, and the user can activate the voice recording function by triggering the voice recording control.
  • the client obtains the first recorded audio in response to the triggering operation of the voice recording control in the online classroom interface.
  • the triggering operations of the voice recording control include but are not limited to key triggering, touch track triggering, gesture triggering, etc.
  • the multiple clients may be multiple student-side clients, or may include a teacher-side client and at least one student-side client.
  • Each client can obtain the first recorded audio in response to a triggering operation on the voice recording control, and send the first audio to the server.
  • Step 402 Add multiple first audios to the voice queue in sequence according to the timestamp information of each first audio.
  • the server is provided with a voice queue, and the voice queue is used to store the first audio received by the server.
  • the timestamp information includes a recording timestamp
  • the recording timestamp is used to represent the recording time of the first audio.
  • the student-side client sends the first audio and its corresponding recording timestamp to the server.
  • when the server receives multiple first audios, it adds them to the voice queue in order of recording time according to their recording timestamps, so that the audios are stored in the voice queue in a defined order.
  • Step 403 Send the unplayed second audio in the voice queue to multiple clients in sequence, where each client plays the second audio.
  • there may be multiple unplayed second audios in the voice queue.
  • the server can send the unplayed second audios in the voice queue to each client in turn, so that each client plays the second audios in turn.
  • the server sends the unplayed second audio in the voice queue to each client in sequence according to a preset time interval.
  • the time interval can be set according to actual needs.
  • when the server receives an audio acquisition request sent by the client, it sends the first unplayed second audio in the voice queue to the client.
  • when the client detects that the current audio has finished playing, it sends an audio acquisition request to the server.
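  • For the preset-time-interval mode mentioned above, the send times can be computed up front, as in the sketch below. `push_schedule` is a hypothetical helper, and using a single fixed interval for every clip is an assumption; the text only says that the interval can be set according to actual needs.

```python
from typing import Iterator, List, Tuple


def push_schedule(unplayed: List[str], start: float,
                  interval: float) -> Iterator[Tuple[float, str]]:
    """Yield (send time in seconds, clip id) pairs, one clip per interval."""
    for i, clip in enumerate(unplayed):
        yield start + i * interval, clip


for send_at, clip in push_schedule(["audio_1", "audio_2", "audio_3"], 0.0, 5.0):
    print(send_at, clip)
# 0.0 audio_1
# 5.0 audio_2
# 10.0 audio_3
```

  • With a fixed interval, a clip longer than the interval would still be playing when the next one arrives; the request-driven mode avoids this because the client asks only after playback ends.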
  • the server receives the first audios sent by multiple clients, adds the multiple first audios to the voice queue in sequence according to the timestamp information of each first audio, and sends the unplayed second audios in the voice queue to multiple clients in sequence, so that the clients can play multiple second audios in sequence.
  • This provides a semi-asynchronous voice discussion function for online classes, allowing users to share and communicate by voice, which achieves the effect of a classroom group discussion in the online classroom and enhances its atmosphere. In addition, when multiple clients record audio, each audio is obtained and played in sequence based on the voice queue, which achieves semi-asynchronous voice sharing and communication, ensures orderly speech among students, solves the problem of unintelligible speech caused by multiple audios playing at once when several students speak at the same time, and improves the voice interaction effect of online classes.
  • FIG. 5 is a schematic structural diagram of a voice interaction device for online classes provided by an embodiment of the present disclosure.
  • the voice interaction device for online classes includes: a recording module 51 , an upload module 52 , and a playback module 53 .
  • the recording module 51 is used to obtain the recorded first audio in response to the triggering operation of the voice recording control in the online classroom interface.
  • the upload module 52 is used to send the first audio to the server, wherein the server receives the first audio sent by multiple clients and adds the first audio sent by the multiple clients to the voice queue in sequence.
  • the playback module 53 is used to sequentially obtain the unplayed second audio in the voice queue and play the second audio.
  • the voice interaction device of the online classroom further includes: a display module configured to sequentially display the second audio in the form of a voice bar in a preset area in the online classroom interface.
  • the display module includes: a recognition unit, used to perform speech recognition on the second audio according to a pre-trained speech recognition model and obtain the text content corresponding to the second audio; an acquisition unit, used to obtain the user identification of the second audio and the user name corresponding to the user identification; and a generating unit, used to fill the preset control according to the user name and the text content to generate the voice bar corresponding to the second audio.
  • the generating unit is specifically configured to: obtain user preference information corresponding to the user identification; determine the target control of the display style according to the display style corresponding to the user preference information; and fill the target control according to the user name and the text content to generate a voice bar corresponding to the second audio.
  • the voice interaction device of the online classroom further includes: a triggering module, configured to respond to a triggering operation on a target voice bar displayed in the preset area and play the audio corresponding to the target voice bar.
  • FIG. 6 is a schematic structural diagram of another online classroom voice interaction device provided by an embodiment of the present disclosure.
  • the online classroom voice interaction device includes: a receiving module 61 , a storage module 62 , and a sending module 63 .
  • the receiving module 61 is used to receive the first audio sent by multiple clients.
  • the storage module 62 is configured to add multiple first audios to the voice queue in sequence according to the timestamp information of each first audio.
  • the sending module 63 is configured to send the unplayed second audio in the voice queue to the multiple clients in sequence, where each client plays the second audio.
  • the sending module 63 is specifically configured to: send the unplayed second audios in the voice queue to each client in sequence according to a preset time interval; or, when an audio acquisition request sent by the client is detected, send the first unplayed second audio in the voice queue to the client, where the audio acquisition request is sent by the client when it detects that the current audio has finished playing.
  • the online classroom voice interaction device provided by the embodiment of the present disclosure can execute any online classroom voice interaction method provided by the embodiment of the present disclosure, and has corresponding functional modules and beneficial effects for executing the method.
  • Contents that are not described in detail in the device embodiments of the present disclosure may refer to the descriptions in any method embodiments of the present disclosure.
  • Exemplary embodiments of the present disclosure also provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor.
  • the memory stores a computer program that can be executed by the at least one processor, and when executed by the at least one processor, the computer program is used to cause the electronic device to execute:
  • in response to the triggering operation of the voice recording control in the online classroom interface, obtain the recorded first audio; send the first audio to the server, where the server receives the first audio sent by multiple clients and adds the first audio sent by the multiple clients to the voice queue in turn; obtain the unplayed second audio in the voice queue in turn, and play the second audio.
  • the computer program, when executed by the at least one processor, is also used to cause the electronic device to execute: sequentially displaying the second audio in the form of a voice bar in a preset area in the online classroom interface.
  • sequentially displaying the second audio in the form of voice bars includes: performing speech recognition on the second audio according to a pre-trained speech recognition model, and obtaining the text content corresponding to the second audio; obtaining the user identification of the second audio and the user name corresponding to the user identification; filling the preset control according to the user name and text content, and generating a voice bar corresponding to the second audio.
  • filling the preset control according to the user name and text content, and generating a voice bar corresponding to the second audio includes: obtaining user preference information corresponding to the user identification; determining the target control of the display style according to the display style corresponding to the user preference information; filling the target control according to the user name and text content, and generating a voice bar corresponding to the second audio.
  • the computer program, when executed by the at least one processor, is also used to cause the electronic device to execute: in response to a triggering operation on the target voice bar displayed in the preset area, playing the audio corresponding to the target voice bar.
  • Exemplary embodiments of the present disclosure also provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor.
  • the memory stores a computer program that can be executed by at least one processor.
  • the computer program when executed by at least one processor, is used to cause the electronic device to execute:
  • receive first audios sent by multiple clients; add the multiple first audios to the voice queue in sequence according to the timestamp information of each first audio; send the unplayed second audios in the voice queue to the multiple clients in sequence, where each client plays the second audio.
  • there are multiple second audios, and sending the unplayed second audios in the voice queue to multiple clients in sequence includes: sending the unplayed second audios in the voice queue to each client in turn according to a preset time interval; or, when an audio acquisition request sent by the client is detected, sending the first unplayed second audio in the voice queue to the client, where the audio acquisition request is sent by the client when it detects that the current audio has finished playing.
  • Exemplary embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, causes the computer to execute:
  • in response to the triggering operation of the voice recording control in the online classroom interface, obtain the recorded first audio; send the first audio to the server, where the server receives the first audio sent by multiple clients and adds the first audio sent by the multiple clients to the voice queue in turn; obtain the unplayed second audio in the voice queue in turn, and play the second audio.
  • when the computer program is executed by the processor of the computer, it is also used to cause the computer to execute: sequentially displaying the second audio in the form of a voice bar in a preset area in the online classroom interface.
  • sequentially displaying the second audio in the form of voice bars includes: performing speech recognition on the second audio according to a pre-trained speech recognition model, and obtaining the text content corresponding to the second audio; obtaining the user identification of the second audio and the user name corresponding to the user identification; filling the preset control according to the user name and text content, and generating a voice bar corresponding to the second audio.
  • filling the preset control according to the user name and text content, and generating a voice bar corresponding to the second audio includes: obtaining user preference information corresponding to the user identification; determining the target control of the display style according to the display style corresponding to the user preference information; filling the target control according to the user name and text content, and generating a voice bar corresponding to the second audio.
  • the computer program, when executed by the processor of the computer, is also used to cause the computer to execute: in response to a triggering operation on the target voice bar displayed in the preset area, playing the audio corresponding to the target voice bar.
  • Exemplary embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, causes the computer to execute:
  • receive first audios sent by multiple clients; add the multiple first audios to the voice queue in sequence according to the timestamp information of each first audio; send the unplayed second audios in the voice queue to the multiple clients in sequence, where each client plays the second audio.
  • there are multiple second audios, and sending the unplayed second audios in the voice queue to multiple clients in sequence includes: sending the unplayed second audios in the voice queue to each client in turn according to a preset time interval; or, when an audio acquisition request sent by the client is detected, sending the first unplayed second audio in the voice queue to the client, where the audio acquisition request is sent by the client when it detects that the current audio has finished playing.
  • Exemplary embodiments of the present disclosure also provide a computer program product, including a computer program, wherein the computer program, when executed by a processor of a computer, is used to cause the computer to perform a method according to an embodiment of the present disclosure.
  • Electronic devices are intended to refer to various forms of digital electronic computing equipment, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices.
  • the components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit implementations of the disclosure described and/or claimed herein.
  • the electronic device 700 includes a computing unit 701 that can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 702 or loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 can also store various programs and data required for the operation of the device 700.
  • Computing unit 701, ROM 702 and RAM 703 are connected to each other via bus 704.
  • An input/output (I/O) interface 705 is also connected to bus 704.
  • the input unit 706 may be any device capable of inputting information to the electronic device 700.
  • the input unit 706 may receive inputted numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device.
  • Output unit 707 may be any type of device capable of presenting information, and may include, but is not limited to, a display, speakers, video/audio output terminal, vibrator, and/or printer.
  • the storage unit 708 may include, but is not limited to, magnetic disks and optical disks.
  • the communication unit 709 allows the electronic device 700 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunications networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver and/or a chipset, such as a Bluetooth™ device, a WiFi device, a WiMax device, a cellular communication device and/or the like.
  • Computing unit 701 may be any of a variety of general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, etc.
  • the computing unit 701 performs the various methods and processes described above.
  • the voice interaction method of the online classroom can be implemented as a computer software program, which is tangibly included in a machine-readable medium, such as the storage unit 708.
  • part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709.
  • the computing unit 701 may be configured to perform the voice interaction method of the online classroom through any other suitable means (e.g., by means of firmware).
  • Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, special-purpose computer, or other programmable data processing device, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented.
  • the program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • Machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the foregoing.
  • The terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or means for providing machine instructions and/or data to a programmable processor (e.g., a magnetic disk, an optical disk, a memory, or a programmable logic device (PLD)), including machine-readable media that receive machine instructions as machine-readable signals.
  • The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
  • the systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer.
  • Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
  • the systems and techniques described herein may be implemented in a computing system that includes back-end components (e.g., a data server), a computing system that includes middleware components (e.g., an application server), a computing system that includes front-end components (e.g., a user's computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components.
  • the components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
  • Computer systems may include clients and servers.
  • Clients and servers are generally remote from each other and typically interact over a communications network.
  • the relationship of client and server is created by computer programs running on corresponding computers and having a client-server relationship with each other.

Abstract

The present disclosure relates to online class voice interaction methods and apparatus, a device and a storage medium. One method comprises: in response to a trigger operation on a voice recording control in an online class interface, acquiring a recorded first audio; sending the first audio to a server, wherein the server receives first audios sent by a plurality of clients, and sequentially adds into a voice queue the first audios sent by the plurality of clients; and sequentially acquiring second audios in the voice queue which are not played, and playing the second audios. The technical solution of the present disclosure realizes semi-asynchronous voice sharing and communication, thus improving the atmosphere of an online class, ensuring that students speak in an orderly manner, and improving the voice interaction effect of an online class.

Description

Voice interaction method, apparatus, device, and storage medium for online classrooms
This application claims priority to the invention patent application filed on June 14, 2022, with application number 202210664108.5 and titled "Voice interaction method, apparatus, device, and storage medium for online classrooms", the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of human-computer interaction, and in particular to a voice interaction method, apparatus, device, and storage medium for online classrooms.
Background
With the development of Internet technology, online classrooms are widely used in all kinds of education and teaching scenarios. During classroom teaching, students are usually required to interact by voice in class in order to ensure teaching quality and increase their participation.
In the related art, the teacher starts a live online lesson and, when a student needs to speak, enables that student's voice permission through a button in the live-broadcast interface.
Summary
According to one aspect of the present disclosure, a voice interaction method for online classrooms is provided, including:
in response to a trigger operation on a voice recording control in an online classroom interface, acquiring a recorded first audio;
sending the first audio to a server, wherein the server receives first audios sent by multiple clients and adds the first audios sent by the multiple clients to a voice queue in sequence;
sequentially acquiring unplayed second audios in the voice queue, and playing the second audios.
According to another aspect of the present disclosure, another voice interaction method for online classrooms is provided, including:
receiving first audios sent by multiple clients;
adding the multiple first audios to a voice queue in sequence according to timestamp information of each first audio;
sending unplayed second audios in the voice queue to the multiple clients in sequence, wherein each client plays the second audios.
According to another aspect of the present disclosure, a voice interaction apparatus for online classrooms is provided, including:
a recording module, configured to acquire a recorded first audio in response to a trigger operation on a voice recording control in an online classroom interface;
an upload module, configured to send the first audio to a server, wherein the server receives first audios sent by multiple clients and adds the first audios sent by the multiple clients to a voice queue in sequence;
a playback module, configured to sequentially acquire unplayed second audios in the voice queue and play the second audios.
According to another aspect of the present disclosure, another voice interaction apparatus for online classrooms is provided, including:
a receiving module, configured to receive first audios sent by multiple clients;
a storage module, configured to add the multiple first audios to a voice queue in sequence according to timestamp information of each first audio;
a sending module, configured to send unplayed second audios in the voice queue to the multiple clients in sequence, wherein each client plays the second audios.
According to another aspect of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to read the executable instructions from the memory and execute them to implement the above voice interaction method for online classrooms.
According to another aspect of the present disclosure, a computer-readable storage medium is provided. The storage medium stores a computer program which, when executed by a processor, implements the above voice interaction method for online classrooms.
According to one or more technical solutions provided in the embodiments of the present application, a client acquires a recorded first audio and sends it to the server; the server receives first audios sent by multiple clients and adds them to a voice queue in sequence; and the client sequentially acquires and plays unplayed second audios in the voice queue. This provides a semi-asynchronous voice discussion function for online classrooms and enhances the classroom atmosphere. Moreover, when multiple clients record audio, the audios are acquired and played one by one based on the voice queue, which achieves semi-asynchronous voice sharing and communication, ensures that students speak in an orderly manner, and improves the voice interaction effect of the online classroom.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
In order to explain the technical solutions in the embodiments of the present disclosure or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Figure 1 is a schematic flowchart of a voice interaction method for online classrooms provided by an embodiment of the present disclosure;
Figure 2 is a schematic flowchart of another voice interaction method for online classrooms provided by an embodiment of the present disclosure;
Figure 3 is a schematic diagram of an online classroom interface provided by an embodiment of the present disclosure;
Figure 4 is a schematic flowchart of another voice interaction method for online classrooms provided by an embodiment of the present disclosure;
Figure 5 is a schematic structural diagram of a voice interaction apparatus for online classrooms provided by an embodiment of the present disclosure;
Figure 6 is a schematic structural diagram of another voice interaction apparatus for online classrooms provided by an embodiment of the present disclosure;
Figure 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be executed in a different order and/or in parallel. In addition, method embodiments may include additional steps and/or omit some of the illustrated steps. The scope of the present disclosure is not limited in this regard.
The solutions of the present disclosure are described below with reference to the accompanying drawings.
Figure 1 is a schematic flowchart of a voice interaction method for online classrooms provided by an embodiment of the present disclosure. As shown in Figure 1, the method includes:
Step 101: in response to a trigger operation on the voice recording control in the online classroom interface, acquire the recorded first audio.
The method of the embodiments of the present disclosure is used for voice interaction among users in an online classroom. A virtual teaching scene can be built on the Unity3D engine and displayed in the online classroom interface to deliver the online lesson.
In the embodiments of the present disclosure, a voice recording control is provided in the online classroom interface, and the user can start voice recording by triggering this control. In response to the trigger operation on the voice recording control, the client acquires the recorded first audio. Trigger operations on the voice recording control include, but are not limited to, key presses, touch-track triggers, and gesture triggers.
The multiple clients may be multiple student-side clients, or may include a teacher-side client and at least one student-side client.
As an example, in an online classroom the teacher captures video through a camera, the teacher-side client sends the captured video to the server, and the server forwards it to each student-side client. Each student-side client displays the video in real time in a designated area of the online classroom interface, for example the upper-right area, to deliver the online lesson. In this example, a voice recording control is provided in the online classroom interface of the student-side client, and it can use a press-and-hold-to-speak configuration. The student records speech through this control, the first audio is generated locally on the student-side client, and the student-side client sends the first audio to the server.
Step 102: send the first audio to the server, where the server receives first audios sent by multiple clients and adds them to the voice queue in sequence.
In this embodiment, each client sends its locally recorded first audio to the server. The server maintains a voice queue that stores the first audios it receives. Optionally, the server receives multiple first audios sent by multiple clients and adds them to the voice queue in sequence according to the timestamp information of each first audio.
As an example, the student-side client acquires the recorded first audio together with its recording timestamp, which indicates when the first audio was recorded. The student-side client sends the first audio and its recording timestamp to the server. When the server receives multiple first audios, it adds them to the voice queue in order of recording time according to their recording timestamps, so that the voice queue stores the audios in a defined order.
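The timestamp-ordered queue described above can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the names `Audio`, `VoiceQueue`, `recorded_at`, and `user_id` are hypothetical, and it assumes that audios with equal timestamps keep their arrival order, which the text does not specify.

```python
import bisect
from dataclasses import dataclass, field


@dataclass(order=True)
class Audio:
    """A recorded first audio (hypothetical structure)."""
    # Only the recording timestamp takes part in ordering comparisons.
    recorded_at: float
    user_id: str = field(compare=False)
    payload: bytes = field(compare=False, default=b"")


class VoiceQueue:
    """Keeps received audios sorted by recording time (hypothetical helper)."""

    def __init__(self) -> None:
        self._items: list[Audio] = []

    def add(self, audio: Audio) -> None:
        # insort inserts to the right of equal timestamps, so ties keep
        # arrival order (an assumption, not stated in the source).
        bisect.insort(self._items, audio)

    def items(self) -> list[Audio]:
        return list(self._items)


queue = VoiceQueue()
queue.add(Audio(recorded_at=12.0, user_id="B"))
queue.add(Audio(recorded_at=10.5, user_id="A"))  # arrives later, recorded earlier
queue.add(Audio(recorded_at=12.0, user_id="C"))
order = [audio.user_id for audio in queue.items()]
```

Sorting on insertion means an audio that reaches the server late but was recorded earlier still lands ahead of later recordings.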
Step 103: sequentially acquire unplayed second audios in the voice queue, and play the second audios.
In this embodiment, each client sequentially acquires and plays the audios in the voice queue according to their order in the queue. The voice queue contains played audios and unplayed second audios, and each time the client acquires and plays a second audio from the voice queue. Optionally, the server sends the unplayed second audios in the voice queue to each client in turn at a preset time interval. Alternatively, the client acquires and plays audio 1; when the client detects that audio 1 has finished playing, it sends an audio acquisition request to the server, and the server, according to that request, takes from the voice queue audio 2, which is adjacent to audio 1 and has not been played, and sends it to the client.
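The request-driven variant, in which a client asks for the next audio once the current one finishes, might look like the sketch below. It assumes that "played" is tracked per client through a cursor into the queue; the source does not say how playback state is stored, and `VoiceServer`, `enqueue`, and `next_unplayed` are hypothetical names.

```python
from typing import Optional


class VoiceServer:
    """In-memory stand-in for the server side (hypothetical names)."""

    def __init__(self) -> None:
        self._queue: list[str] = []            # audio IDs in queue order
        self._next_index: dict[str, int] = {}  # per-client cursor (assumption)

    def enqueue(self, audio_id: str) -> None:
        self._queue.append(audio_id)

    def next_unplayed(self, client_id: str) -> Optional[str]:
        """Handle an audio acquisition request sent after playback finished."""
        index = self._next_index.get(client_id, 0)
        if index >= len(self._queue):
            return None  # nothing unplayed yet; the client can ask again later
        self._next_index[client_id] = index + 1
        return self._queue[index]


server = VoiceServer()
for audio_id in ("audio-1", "audio-2"):
    server.enqueue(audio_id)

first = server.next_unplayed("client-x")   # client-x starts playing audio-1
second = server.next_unplayed("client-x")  # requested when audio-1 finished
third = server.next_unplayed("client-x")   # queue exhausted for client-x
other = server.next_unplayed("client-y")   # client-y has its own cursor
```

The interval-based variant could reuse the same cursor by calling `next_unplayed` for every client from a timer instead of waiting for a request.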
For example, the online classroom provides a group discussion mode in which the student-side clients in the same group share the same group identifier, and each client sends its first audio together with the group identifier to the server. The server stores the first audios with their corresponding group identifiers, and each client sequentially acquires and plays the audios that match its own group identifier.
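One way to picture the group discussion mode is a separate queue per group identifier, as in the sketch below. The class `GroupVoiceStore` and its methods are hypothetical, and storing one list per group is an assumption; the source only states that audios are stored with their group identifiers.

```python
from collections import defaultdict


class GroupVoiceStore:
    """Stores audio IDs per group identifier (hypothetical helper)."""

    def __init__(self) -> None:
        self._queues: dict[str, list[str]] = defaultdict(list)

    def add(self, group_id: str, audio_id: str) -> None:
        # Audios are appended in arrival order within their own group.
        self._queues[group_id].append(audio_id)

    def audios_for(self, group_id: str) -> list[str]:
        # .get avoids creating an empty queue for an unknown group.
        return list(self._queues.get(group_id, []))


store = GroupVoiceStore()
store.add("group-1", "a1")
store.add("group-2", "b1")
store.add("group-1", "a2")
group_one = store.audios_for("group-1")
unknown = store.audios_for("group-3")
```

A client in group 1 would then only ever receive `a1` and `a2`, so discussions in different groups do not interleave.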
According to the technical solution of the embodiments of the present disclosure, the client acquires a recorded first audio and sends it to the server; the server receives first audios sent by multiple clients and adds them to the voice queue in sequence; and the client sequentially acquires and plays unplayed second audios in the voice queue. This provides a semi-asynchronous voice discussion function for online classrooms, lets users share and exchange speech during the lesson, reproduces the effect of in-class group discussion, and enhances the classroom atmosphere. Moreover, when multiple clients record audio, the audios are acquired and played one by one based on the voice queue. This achieves semi-asynchronous voice sharing and communication, ensures that students speak in an orderly manner, solves the problem of unintelligible speech caused by playing several audios at once when multiple students speak at the same time, and improves the voice interaction effect of the online classroom.
Based on the above embodiment, in one embodiment of the present disclosure, after acquiring a second audio from the voice queue, the client displays the second audio as a voice bar in a preset area of the online classroom interface.
Figure 2 is a schematic flowchart of a voice interaction method for online classrooms provided by an embodiment of the present disclosure. As shown in Figure 2, in this method, displaying the second audios in sequence as voice bars in the preset area of the online classroom interface includes:
Step 201: perform speech recognition on the second audio using a pre-trained speech recognition model to obtain the text content corresponding to the second audio.
In this embodiment, the input of the speech recognition model is an audio, and its output is the text content corresponding to that audio. The speech recognition model can be implemented with a deep neural network and trained on audios labeled with their corresponding text content.
As an example, a speech recognition model is set up in advance and used to recognize each audio in the voice queue and obtain its text content. In this example, when the client sequentially acquires and plays the unplayed second audios in the voice queue, it also obtains the text content corresponding to each second audio.
Step 202: obtain the user identifier of the second audio and the user name corresponding to the user identifier.
In this embodiment, the client acquires the recorded first audio and the user identifier corresponding to it, and sends both to the server. The client then acquires a second audio from the voice queue together with its user identifier, and determines the corresponding user name from the user identifier. The user identifier distinguishes users from one another; it may be the user account, and the user name may be the name the user entered when creating the account.
Step 203: populate a preset control with the user name and the text content to generate the voice bar corresponding to the second audio.
In this embodiment, the user name and the text content corresponding to the second audio are used as the display content of the voice bar, and the voice bar is generated by populating the preset control with them. As an example, refer to Figure 3, which shows a schematic diagram of the online classroom interface. Reference numeral 31 in the figure denotes the online classroom interface, which displays the virtual teaching scene. The preset area in the online classroom interface may be the dashed area at the lower right. In the voice bar content "A: XXXX" shown in the figure, A is the user name and XXXX is the text content corresponding to the second audio. In this way, the audio content is presented more intuitively on top of the voice bar.
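Steps 202 and 203 can be illustrated with the short sketch below, which resolves a user name from a user identifier and fills a voice bar with the "name: text" content shown in Figure 3. The lookup table `USER_NAMES`, the `VoiceBar` structure, and `build_voice_bar` are hypothetical; the recognized text is passed in as a plain string, since the speech recognition model itself is outside the scope of this sketch.

```python
from dataclasses import dataclass

# Hypothetical account table mapping user identifiers to user names.
USER_NAMES = {"u-001": "A"}


@dataclass(frozen=True)
class VoiceBar:
    """Display data for one voice bar (hypothetical structure)."""
    audio_id: str
    label: str


def build_voice_bar(audio_id: str, user_id: str, text: str) -> VoiceBar:
    # Fall back to the raw identifier if no name is registered (assumption).
    name = USER_NAMES.get(user_id, user_id)
    return VoiceBar(audio_id=audio_id, label=f"{name}: {text}")


bar = build_voice_bar("audio-1", "u-001", "XXXX")
fallback = build_voice_bar("audio-2", "u-999", "hello")
```

Keeping the audio identifier on the voice bar lets the interface find the matching audio later when the bar is triggered.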
In one embodiment of the present disclosure, populating the preset control with the user name and the text content to generate the voice bar corresponding to the second audio includes: obtaining user preference information corresponding to the user identifier; determining a target control of the display style indicated by the user preference information; and populating the target control with the user name and the text content to generate the voice bar corresponding to the second audio.
In this embodiment, the user preference information indicates the user's preferred display style. It may be set by the user or derived from the user's behavior log. The display style includes the control's theme, bubble effect, and so on. The target control is rendered according to the display style and populated with the user name and the text content to generate the voice bar.
As an example, client 1 sends a recorded audio and the corresponding first user identifier to the server. When client 2 acquires this audio from the server's voice queue, it determines a target control of the matching display style according to the user preference information associated with the first user identifier, and populates the target control with the user name and the text content to generate the voice bar for this audio. In this way, voice bars can be shown in different display styles for different users according to their preferences, which improves the display effect.
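The preference-driven styling can be sketched as a lookup that overlays a user's stored preferences on a default style. The style keys `theme` and `bubble` follow the examples in the text, but the concrete values, the `USER_PREFERENCES` table, and `render_voice_bar` are hypothetical, and falling back to a default style for users without preferences is an assumption.

```python
# Hypothetical default style used when a user has no stored preference.
DEFAULT_STYLE = {"theme": "light", "bubble": "plain"}

# Hypothetical preference table keyed by user identifier.
USER_PREFERENCES = {"u-001": {"theme": "dark"}}


def render_voice_bar(user_id: str, user_name: str, text: str) -> dict:
    """Pick the target control style for the speaker and fill in the content."""
    # Preferences override defaults key by key; missing keys keep the default.
    style = {**DEFAULT_STYLE, **USER_PREFERENCES.get(user_id, {})}
    return {"style": style, "label": f"{user_name}: {text}"}


styled = render_voice_bar("u-001", "A", "XXXX")
plain = render_voice_bar("u-002", "B", "YYYY")
```

Here the style is chosen from the identifier of the user who recorded the audio, matching the example in which client 2 looks up the preferences tied to the first user identifier.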
In one embodiment of the present disclosure, in response to a trigger operation on a target voice bar displayed in the preset area, the audio corresponding to the target voice bar is played. In this embodiment, the user can click a voice bar displayed in the online classroom interface; when the click is detected, the audio corresponding to that voice bar is acquired and played. Trigger operations include, but are not limited to, key presses, touch-screen taps, and gesture triggers.
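Replaying an audio when its voice bar is triggered amounts to a lookup from the bar to its audio, as in the minimal sketch below. `VoiceBarPanel` and its methods are hypothetical, and the playback function is injected as a callback so the sketch does not depend on any particular audio library.

```python
from typing import Callable


class VoiceBarPanel:
    """Tracks displayed voice bars and replays their audio (hypothetical)."""

    def __init__(self, play: Callable[[bytes], None]) -> None:
        self._play = play
        self._audio_by_bar: dict[str, bytes] = {}

    def show(self, bar_id: str, audio: bytes) -> None:
        # Remember which audio belongs to the displayed voice bar.
        self._audio_by_bar[bar_id] = audio

    def on_click(self, bar_id: str) -> bool:
        """Play the audio of the clicked bar; return False if it is unknown."""
        audio = self._audio_by_bar.get(bar_id)
        if audio is None:
            return False
        self._play(audio)
        return True


played: list[bytes] = []
panel = VoiceBarPanel(play=played.append)  # stand-in for a real audio player
panel.show("bar-1", b"hello")
clicked = panel.on_click("bar-1")
missing = panel.on_click("bar-404")
```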
本公开实施例中,在在线课堂中采用语音条的形式进行语音分享与交流,并且,通过语音条的形式保留线上教学过程中的语音交互,作为课堂中有效互动,为后续的录播课、学生高光表现等提供素材。In the embodiment of the present disclosure, the form of voice bars is used for voice sharing and communication in the online classroom, and the voice interaction during the online teaching process is retained in the form of voice bars as an effective interaction in the classroom, providing a basis for subsequent recording and broadcasting of lessons. , students’ highlight performance, etc. provide materials.
Building on the above embodiments, the method of the embodiments of the present disclosure is described below from the server side.
FIG. 4 is a schematic flowchart of another voice interaction method for an online classroom provided by an embodiment of the present disclosure. As shown in FIG. 4, the method includes:
Step 401: receive first audio sent by multiple clients.
In this embodiment, the method is executed by the server.
In this embodiment, the online classroom interface is provided with a voice recording control, and a user can start voice recording by triggering it. In response to a trigger operation on the voice recording control in the online classroom interface, the client obtains the recorded first audio. The trigger operation on the voice recording control includes, but is not limited to, a key press, a touch-trajectory trigger, or a gesture.
The multiple clients may all be student-side clients, or may include a teacher-side client and at least one student-side client. Each client can obtain recorded first audio in response to a trigger operation on the voice recording control and send that first audio to the server.
Step 402: add the multiple first audios to a voice queue in sequence according to the timestamp information of each first audio.
In this embodiment, the server maintains a voice queue, which stores the first audio received by the server.
As an example, the timestamp information includes a recording timestamp, which indicates when the first audio was recorded. Each student-side client sends its first audio together with the corresponding recording timestamp to the server. When the server receives multiple first audios, it adds them to the voice queue in order of recording time according to their recording timestamps, so that the voice queue stores the audios in a defined order.
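Step 402 can be sketched as a queue kept sorted by recording timestamp. This is a minimal sketch under stated assumptions: the names `QueuedAudio` and `VoiceQueue` are hypothetical, and breaking ties between equal timestamps by arrival order is an assumption, since the text only requires ordering by recording time.

```python
import bisect
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedAudio:
    recorded_at: float                   # recording timestamp in seconds (sort key)
    seq: int                             # arrival counter; assumed tie-breaker
    user_id: str = field(compare=False)
    payload: bytes = field(compare=False)


class VoiceQueue:
    """Server-side queue that keeps audios ordered by recording time."""

    def __init__(self):
        self._items = []
        self._next_seq = 0

    def add(self, recorded_at, user_id, payload):
        item = QueuedAudio(recorded_at, self._next_seq, user_id, payload)
        self._next_seq += 1
        bisect.insort(self._items, item)  # insert while preserving sorted order

    def order(self):
        return [item.user_id for item in self._items]


queue = VoiceQueue()
queue.add(12.5, "student_b", b"audio-b")
queue.add(10.0, "student_a", b"audio-a")  # arrives later but was recorded earlier
queue.add(12.5, "student_c", b"audio-c")
```

With these three calls, `queue.order()` yields `student_a`, `student_b`, `student_c`: the earlier recording moves ahead even though it reached the server second. The sketch does not handle an audio whose timestamp precedes one that has already been played; the source does not say how that case is treated.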
Step 403: send the unplayed second audio in the voice queue to the multiple clients in sequence, where each client plays the second audio.
In this embodiment, the voice queue contains multiple unplayed second audios. To prevent several audios from playing at the same time and to keep students' speech orderly, the server can send the unplayed second audios in the voice queue to each client one after another, so that each client plays them in sequence.
As an example, the server sends the unplayed second audios in the voice queue to each client one after another at a preset time interval. The time interval can be set as needed.
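The interval-based variant can be sketched as a loop that pauses between audios. The function name `dispatch_at_intervals` and the `send` and `sleep` parameters are hypothetical; the sketch assumes the interval is applied between consecutive audios, and the injectable `sleep` exists only so the example can run without waiting.

```python
import time


def dispatch_at_intervals(unplayed, clients, interval_s, send, sleep=time.sleep):
    """Send each unplayed audio to every client, pausing interval_s between audios."""
    for index, audio in enumerate(unplayed):
        if index > 0:
            sleep(interval_s)        # preset time interval between consecutive audios
        for client in clients:
            send(client, audio)      # every client receives the same audio in turn


sent = []
dispatch_at_intervals(
    ["audio_1", "audio_2"],
    ["client_1", "client_2"],
    0.0,
    lambda client, audio: sent.append((client, audio)),
)
```

A fixed interval is simple but does not adapt to audio length; an interval shorter than an audio's duration could still cause overlap on the client, so in practice the interval would need to be chosen with that in mind.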
As another example, when the server receives an audio acquisition request from a client, it sends the first unplayed second audio in the voice queue to that client. In this example, the client sends the audio acquisition request to the server when it detects that the current audio has finished playing.
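The request-driven variant can be sketched as follows. The class `PullDispatcher` and its methods are hypothetical names. The text does not say how the server tracks which audios are unplayed, so the per-client cursor used here is an assumption.

```python
class PullDispatcher:
    """Serve the first audio a given client has not yet played, on request."""

    def __init__(self):
        self._queue = []    # audios in playback order
        self._cursor = {}   # client_id -> index of that client's next unplayed audio

    def enqueue(self, audio):
        self._queue.append(audio)

    def handle_fetch_request(self, client_id):
        """Called when a client reports that its current audio finished playing."""
        position = self._cursor.get(client_id, 0)
        if position >= len(self._queue):
            return None                      # nothing unplayed for this client yet
        self._cursor[client_id] = position + 1
        return self._queue[position]


dispatcher = PullDispatcher()
dispatcher.enqueue("audio_1")
dispatcher.enqueue("audio_2")
first_for_c1 = dispatcher.handle_fetch_request("client_1")
first_for_c2 = dispatcher.handle_fetch_request("client_2")
second_for_c1 = dispatcher.handle_fetch_request("client_1")
```

Because each client only asks for the next audio after the current one ends, playback on a single client cannot overlap, regardless of how long each audio is.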
According to the technical solution of the embodiments of the present disclosure, the server receives first audio sent by multiple clients, adds the multiple first audios to the voice queue in sequence according to the timestamp information of each first audio, and sends the unplayed second audios in the voice queue to the multiple clients in sequence so that the clients play them one after another. This provides a semi-asynchronous voice discussion function for the online classroom: users can share and exchange speech during class, the effect of small-group classroom discussion is reproduced online, and the atmosphere of the online classroom is enhanced. Moreover, when multiple clients record audio, each audio is obtained from the voice queue and played in turn. This realizes semi-asynchronous voice sharing and communication, keeps students' speech orderly, solves the problem of unintelligible speech caused by several audios playing at once when multiple students speak at the same time, and improves the voice interaction effect of the online classroom.
FIG. 5 is a schematic structural diagram of a voice interaction apparatus for an online classroom provided by an embodiment of the present disclosure. As shown in FIG. 5, the apparatus includes a recording module 51, an upload module 52, and a playback module 53.
The recording module 51 is configured to obtain recorded first audio in response to a trigger operation on the voice recording control in the online classroom interface.
The upload module 52 is configured to send the first audio to the server, where the server receives first audio sent by multiple clients and adds the first audio sent by the multiple clients to a voice queue in sequence.
The playback module 53 is configured to obtain the unplayed second audio in the voice queue in sequence and play the second audio.
In one embodiment of the present disclosure, the voice interaction apparatus for an online classroom further includes a display module configured to display the second audio in sequence, in the form of voice bars, in a preset area of the online classroom interface.
In one embodiment of the present disclosure, the display module includes: a recognition unit configured to perform speech recognition on the second audio using a pre-trained speech recognition model to obtain the text content corresponding to the second audio; an acquisition unit configured to obtain the user identifier of the second audio and the user name corresponding to the user identifier; and a generation unit configured to fill a preset control with the user name and the text content to generate the voice bar corresponding to the second audio.
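The three units of the display module can be sketched as a small pipeline. All names here (`make_display_pipeline`, `recognize`, `lookup_name`) are hypothetical. The recognizer is a stand-in that decodes bytes as text, not a real pre-trained speech recognition model, and the "preset control" is reduced to a string template for illustration.

```python
from typing import Callable


def make_display_pipeline(
    recognize: Callable[[bytes], str],
    lookup_name: Callable[[str], str],
) -> Callable[[bytes, str], str]:
    """Compose recognition, acquisition, and generation into one display step."""

    def show(audio: bytes, user_id: str) -> str:
        text = recognize(audio)          # recognition unit: audio -> text content
        name = lookup_name(user_id)      # acquisition unit: user identifier -> user name
        return f"{name}: {text}"         # generation unit: fill the preset template

    return show


# Stand-in recognizer for illustration only: treats the audio bytes as UTF-8 text.
fake_recognizer = lambda audio: audio.decode("utf-8")
names = {"u1": "Alice"}

show = make_display_pipeline(fake_recognizer, lambda user_id: names[user_id])
voice_bar_label = show(b"hello everyone", "u1")
```

Passing the recognizer in as a parameter keeps the pipeline independent of any particular speech recognition library, which the disclosure does not name.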
In one embodiment of the present disclosure, the generation unit is specifically configured to: obtain user preference information corresponding to the user identifier; determine a target control of the display style corresponding to the user preference information; and fill the target control with the user name and the text content to generate the voice bar corresponding to the second audio.
In one embodiment of the present disclosure, the voice interaction apparatus for an online classroom further includes a trigger module configured to play, in response to a trigger operation on a target voice bar displayed in the preset area, the audio corresponding to the target voice bar.
FIG. 6 is a schematic structural diagram of another voice interaction apparatus for an online classroom provided by an embodiment of the present disclosure. As shown in FIG. 6, the apparatus includes a receiving module 61, a storage module 62, and a sending module 63.
The receiving module 61 is configured to receive first audio sent by multiple clients.
The storage module 62 is configured to add the multiple first audios to a voice queue in sequence according to the timestamp information of each first audio.
The sending module 63 is configured to send the unplayed second audio in the voice queue to the multiple clients in sequence, where each client plays the second audio.
In one embodiment of the present disclosure, there are multiple second audios, and the sending module 63 is specifically configured to: send the unplayed second audios in the voice queue to each client one after another at a preset time interval; or, when an audio acquisition request from a client is detected, send the first second audio in the voice queue to that client, where the audio acquisition request is sent by the client when it detects that the current audio has finished playing.
The voice interaction apparatus for an online classroom provided by the embodiments of the present disclosure can execute any voice interaction method for an online classroom provided by the embodiments of the present disclosure, and has the functional modules and beneficial effects corresponding to that method. For content not described in detail in the apparatus embodiments, refer to the description in any method embodiment of the present disclosure.
Exemplary embodiments of the present disclosure also provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, causes the electronic device to perform:
in response to a trigger operation on a voice recording control in an online classroom interface, obtaining recorded first audio; sending the first audio to a server, where the server receives first audio sent by multiple clients and adds the first audio sent by the multiple clients to a voice queue in sequence; and obtaining, in sequence, unplayed second audio in the voice queue and playing the second audio.
In one embodiment of the present disclosure, the computer program, when executed by the at least one processor, further causes the electronic device to perform: displaying the second audio in sequence, in the form of voice bars, in a preset area of the online classroom interface.
In one embodiment of the present disclosure, displaying the second audio in sequence in the form of voice bars includes: performing speech recognition on the second audio using a pre-trained speech recognition model to obtain the text content corresponding to the second audio; obtaining the user identifier of the second audio and the user name corresponding to the user identifier; and filling a preset control with the user name and the text content to generate the voice bar corresponding to the second audio.
In one embodiment of the present disclosure, filling the preset control with the user name and the text content to generate the voice bar corresponding to the second audio includes: obtaining user preference information corresponding to the user identifier; determining a target control of the display style corresponding to the user preference information; and filling the target control with the user name and the text content to generate the voice bar corresponding to the second audio.
In one embodiment of the present disclosure, the computer program, when executed by the at least one processor, further causes the electronic device to perform: in response to a trigger operation on a target voice bar displayed in the preset area, playing the audio corresponding to the target voice bar.
Exemplary embodiments of the present disclosure also provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores a computer program executable by the at least one processor, and the computer program, when executed by the at least one processor, causes the electronic device to perform:
receiving first audio sent by multiple clients; adding the multiple first audios to a voice queue in sequence according to the timestamp information of each first audio; and sending the unplayed second audio in the voice queue to the multiple clients in sequence, where each client plays the second audio.
In one embodiment of the present disclosure, there are multiple second audios, and sending the unplayed second audio in the voice queue to the multiple clients in sequence includes: sending the unplayed second audios in the voice queue to each client one after another at a preset time interval; or, when an audio acquisition request from a client is detected, sending the first second audio in the voice queue to that client, where the audio acquisition request is sent by the client when it detects that the current audio has finished playing.
Exemplary embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing a computer program, where the computer program, when executed by a processor of a computer, causes the computer to perform:
in response to a trigger operation on a voice recording control in an online classroom interface, obtaining recorded first audio; sending the first audio to a server, where the server receives first audio sent by multiple clients and adds the first audio sent by the multiple clients to a voice queue in sequence; and obtaining, in sequence, unplayed second audio in the voice queue and playing the second audio.
In one embodiment of the present disclosure, the computer program, when executed by the processor of the computer, further causes the computer to perform: displaying the second audio in sequence, in the form of voice bars, in a preset area of the online classroom interface.
In one embodiment of the present disclosure, displaying the second audio in sequence in the form of voice bars includes: performing speech recognition on the second audio using a pre-trained speech recognition model to obtain the text content corresponding to the second audio; obtaining the user identifier of the second audio and the user name corresponding to the user identifier; and filling a preset control with the user name and the text content to generate the voice bar corresponding to the second audio.
In one embodiment of the present disclosure, filling the preset control with the user name and the text content to generate the voice bar corresponding to the second audio includes: obtaining user preference information corresponding to the user identifier; determining a target control of the display style corresponding to the user preference information; and filling the target control with the user name and the text content to generate the voice bar corresponding to the second audio.
In one embodiment of the present disclosure, the computer program, when executed by the processor of the computer, further causes the computer to perform: in response to a trigger operation on a target voice bar displayed in the preset area, playing the audio corresponding to the target voice bar.
Exemplary embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing a computer program, where the computer program, when executed by a processor of a computer, causes the computer to perform:
receiving first audio sent by multiple clients; adding the multiple first audios to a voice queue in sequence according to the timestamp information of each first audio; and sending the unplayed second audio in the voice queue to the multiple clients in sequence, where each client plays the second audio.
In one embodiment of the present disclosure, there are multiple second audios, and sending the unplayed second audio in the voice queue to the multiple clients in sequence includes: sending the unplayed second audios in the voice queue to each client one after another at a preset time interval; or, when an audio acquisition request from a client is detected, sending the first second audio in the voice queue to that client, where the audio acquisition request is sent by the client when it detects that the current audio has finished playing.
Exemplary embodiments of the present disclosure also provide a computer program product including a computer program, where the computer program, when executed by a processor of a computer, causes the computer to perform the method according to the embodiments of the present disclosure.
Referring to FIG. 7, a structural block diagram of an electronic device 700 that can serve as a server or client of the present disclosure will now be described; it is an example of a hardware device to which aspects of the present disclosure can be applied. The term electronic device is intended to cover various forms of digital computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. An electronic device may also represent various forms of mobile apparatus, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing apparatus. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present disclosure described and/or claimed herein.
As shown in FIG. 7, the electronic device 700 includes a computing unit 701, which can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 can also store the various programs and data required for the operation of the device 700. The computing unit 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.
Multiple components of the electronic device 700 are connected to the I/O interface 705, including an input unit 706, an output unit 707, a storage unit 708, and a communication unit 709. The input unit 706 may be any type of device capable of inputting information to the electronic device 700; it can receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output unit 707 may be any type of device capable of presenting information, and may include, but is not limited to, a display, a speaker, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 708 may include, but is not limited to, a magnetic disk or an optical disc. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices over a computer network such as the Internet and/or various telecommunication networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or a chipset, such as a Bluetooth™ device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.
The computing unit 701 may be any of various general-purpose and/or special-purpose processing components with processing and computing capabilities. Examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units that run machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, or microcontroller. The computing unit 701 performs the methods and processing described above. For example, in some embodiments, the voice interaction method for an online classroom can be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. In some embodiments, the computing unit 701 may be configured to perform the voice interaction method for an online classroom in any other suitable manner (for example, by means of firmware).
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that when the program code is executed by the processor or controller, the functions/operations specified in the flowcharts and/or block diagrams are carried out. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
As used in the present disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, device, and/or apparatus (for example, a magnetic disk, an optical disc, a memory, or a programmable logic device (PLD)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide interaction with a user, the systems and techniques described here may be implemented on a computer that has: a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (for example, a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices can also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (for example, visual, auditory, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).
The systems and techniques described here may be implemented in a computing system that includes a back-end component (for example, a data server), or a computing system that includes a middleware component (for example, an application server), or a computing system that includes a front-end component (for example, a user computer with a graphical user interface or a web browser through which the user can interact with implementations of the systems and techniques described here), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.
A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs that run on the respective computers and have a client-server relationship with each other.
It should be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Furthermore, the terms "comprise", "include", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or device. In the absence of further limitations, an element qualified by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
The above are only specific embodiments of the present disclosure, provided so that those skilled in the art can understand or implement the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure is not limited to the embodiments described herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (16)

  1. A voice interaction method for an online class, applied to a client, the method comprising:
    in response to a trigger operation on a voice recording control in an online class interface, acquiring recorded first audio;
    sending the first audio to a server, wherein the server receives first audio sent by a plurality of clients and adds the first audio sent by the plurality of clients to a voice queue in sequence; and
    sequentially acquiring unplayed second audio from the voice queue, and playing the second audio.
  2. The voice interaction method for an online class according to claim 1, further comprising:
    displaying the second audio sequentially, in the form of voice bars, in a preset area of the online class interface.
  3. The voice interaction method for an online class according to claim 2, wherein displaying the second audio sequentially in the form of voice bars comprises:
    performing speech recognition on the second audio using a pre-trained speech recognition model to obtain text content corresponding to the second audio;
    acquiring a user identifier of the second audio and a user name corresponding to the user identifier; and
    populating a preset control with the user name and the text content to generate a voice bar corresponding to the second audio.
  4. The voice interaction method for an online class according to claim 3, wherein populating the preset control with the user name and the text content to generate the voice bar corresponding to the second audio comprises:
    acquiring user preference information corresponding to the user identifier;
    determining, according to a display style corresponding to the user preference information, a target control of the display style; and
    populating the target control with the user name and the text content to generate the voice bar corresponding to the second audio.
  5. The voice interaction method for an online class according to any one of claims 2-4, further comprising:
    in response to a trigger operation on a target voice bar displayed in the preset area, playing audio corresponding to the target voice bar.
  6. A voice interaction method for an online class, applied to a server, the method comprising:
    receiving first audio sent by a plurality of clients;
    adding the plurality of first audio to a voice queue in sequence according to timestamp information of each first audio; and
    sending unplayed second audio in the voice queue to the plurality of clients in sequence, wherein each client is configured to play the second audio.
  7. The voice interaction method for an online class according to claim 6, wherein there are a plurality of second audio, and sending the unplayed second audio in the voice queue to the plurality of clients in sequence comprises:
    sending the unplayed second audio in the voice queue to each client in sequence at a preset time interval; or
    upon detecting an audio acquisition request returned by a client, sending the second audio at the head of the voice queue to the client, wherein the audio acquisition request is sent when the client detects that playback of the current audio has finished.
  8. A voice interaction apparatus for an online class, applied to a client, the apparatus comprising:
    a recording module, configured to acquire recorded first audio in response to a trigger operation on a voice recording control in an online class interface;
    an upload module, configured to send the first audio to a server, wherein the server receives first audio sent by a plurality of clients and adds the first audio sent by the plurality of clients to a voice queue in sequence; and
    a playback module, configured to sequentially acquire unplayed second audio from the voice queue and play the second audio.
  9. The voice interaction apparatus for an online class according to claim 8, further comprising a display module, configured to display the second audio sequentially, in the form of voice bars, in a preset area of the online class interface.
  10. The voice interaction apparatus for an online class according to claim 9, wherein the display module comprises:
    a recognition unit, configured to perform speech recognition on the second audio using a pre-trained speech recognition model to obtain text content corresponding to the second audio;
    an acquisition unit, configured to acquire a user identifier of the second audio and a user name corresponding to the user identifier; and
    a generation unit, configured to populate a preset control with the user name and the text content to generate a voice bar corresponding to the second audio.
  11. The voice interaction apparatus for an online class according to claim 10, wherein the generation unit is configured to:
    acquire user preference information corresponding to the user identifier; determine, according to a display style corresponding to the user preference information, a target control of the display style; and populate the target control with the user name and the text content to generate the voice bar corresponding to the second audio.
  12. The voice interaction apparatus for an online class according to any one of claims 9-11, further comprising a trigger module, configured to play audio corresponding to a target voice bar in response to a trigger operation on the target voice bar displayed in the preset area.
  13. A voice interaction apparatus for an online class, applied to a server, the apparatus comprising:
    a receiving module, configured to receive first audio sent by a plurality of clients;
    a storage module, configured to add the plurality of first audio to a voice queue in sequence according to timestamp information of each first audio; and
    a sending module, configured to send unplayed second audio in the voice queue to the plurality of clients in sequence, wherein each client is configured to play the second audio.
  14. The voice interaction apparatus for an online class according to claim 13, wherein there are a plurality of second audio, and the sending module is configured to:
    send the unplayed second audio in the voice queue to each client in sequence at a preset time interval; or, upon detecting an audio acquisition request returned by a client, send the second audio at the head of the voice queue to the client, wherein the audio acquisition request is sent when the client detects that playback of the current audio has finished.
  15. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method according to any one of claims 1-7.
  16. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the method according to any one of claims 1-7.
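
As an illustration of the server-side method of claims 6 and 7, the following Python sketch keeps received audio in a timestamp-ordered voice queue and, when a client sends an audio acquisition request, hands that client the earliest clip it has not yet received. It is a minimal sketch under stated assumptions, not the applicant's implementation: the names VoiceClip, VoiceQueue, add and next_unplayed, the per-client record of delivered clips, and the arrival-order tie-break for equal timestamps are all hypothetical and do not appear in the disclosure.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass(order=True)
class VoiceClip:
    """One recorded "first audio" (hypothetical structure).

    Clips compare by (timestamp, clip_id), so clips with equal
    timestamps keep their arrival order.
    """
    timestamp: float
    clip_id: int
    user_id: str = field(compare=False)
    payload: bytes = field(compare=False)


class VoiceQueue:
    """Timestamp-ordered voice queue with per-client delivery tracking."""

    def __init__(self) -> None:
        self._clips: List[VoiceClip] = []          # always sorted
        self._delivered: Dict[str, Set[int]] = {}  # client_id -> clip ids sent
        self._next_id = 0

    def add(self, timestamp: float, user_id: str, payload: bytes) -> VoiceClip:
        """Insert a received clip at its timestamp position (claim 6)."""
        clip = VoiceClip(timestamp, self._next_id, user_id, payload)
        self._next_id += 1
        bisect.insort(self._clips, clip)
        return clip

    def next_unplayed(self, client_id: str) -> Optional[VoiceClip]:
        """Answer an audio acquisition request (claim 7, second branch).

        Returns the earliest clip not yet delivered to this client,
        or None when the client has received everything in the queue.
        """
        seen = self._delivered.setdefault(client_id, set())
        for clip in self._clips:
            if clip.clip_id not in seen:
                seen.add(clip.clip_id)
                return clip
        return None


if __name__ == "__main__":
    queue = VoiceQueue()
    # Clips may arrive out of order; the queue orders them by timestamp.
    queue.add(2.0, "student-b", b"...")
    queue.add(1.0, "student-a", b"...")
    clip = queue.next_unplayed("client-1")
    while clip is not None:
        print(clip.timestamp, clip.user_id)
        clip = queue.next_unplayed("client-1")
```

The first branch of claim 7, sending at a preset time interval, could reuse the same queue by calling next_unplayed for every connected client from a timer instead of waiting for a request.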
PCT/CN2023/097411 2022-06-14 2023-05-31 Online class voice interaction methods and apparatus, device and storage medium WO2023241360A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210664108.5 2022-06-14
CN202210664108.5A CN114760274B (en) 2022-06-14 2022-06-14 Voice interaction method, device, equipment and storage medium for online classroom

Publications (1)

Publication Number Publication Date
WO2023241360A1 (en) 2023-12-21

Family

ID=82336872

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/097411 WO2023241360A1 (en) 2022-06-14 2023-05-31 Online class voice interaction methods and apparatus, device and storage medium

Country Status (2)

Country Link
CN (1) CN114760274B (en)
WO (1) WO2023241360A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114760274B (en) * 2022-06-14 2022-09-02 北京新唐思创教育科技有限公司 Voice interaction method, device, equipment and storage medium for online classroom

Citations (5)

Publication number Priority date Publication date Assignee Title
US20160093315A1 (en) * 2014-09-29 2016-03-31 Kabushiki Kaisha Toshiba Electronic device, method and storage medium
CN105867718A (en) * 2015-12-10 2016-08-17 乐视网信息技术(北京)股份有限公司 Multimedia interaction method and apparatus
CN109672610A (en) * 2018-12-26 2019-04-23 深圳市自然门科技有限公司 A kind of multigroup group speech real time communication method and system
CN112312064A (en) * 2020-11-02 2021-02-02 腾讯科技(深圳)有限公司 Voice interaction method and related equipment
CN114760274A (en) * 2022-06-14 2022-07-15 北京新唐思创教育科技有限公司 Voice interaction method, device, equipment and storage medium for online classroom

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US8086457B2 (en) * 2007-05-30 2011-12-27 Cepstral, LLC System and method for client voice building
CN102364952B (en) * 2011-10-25 2013-12-25 浙江万朋网络技术有限公司 Method for processing audio and video synchronization in simultaneous playing of plurality of paths of audio and video
CN109039872B (en) * 2018-09-04 2020-04-17 北京达佳互联信息技术有限公司 Real-time voice information interaction method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114760274B (en) 2022-09-02
CN114760274A (en) 2022-07-15

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23822930

Country of ref document: EP

Kind code of ref document: A1