WO2023241360A1

WO2023241360A1 - Online class voice interaction methods and apparatus, device and storage medium

Info

Publication number: WO2023241360A1
Application number: PCT/CN2023/097411
Authority: WO
Inventors: 迪力木拉提·都里昆; 吴承峰
Original assignee: 北京新唐思创教育科技有限公司
Priority date: 2022-06-14
Filing date: 2023-05-31
Publication date: 2023-12-21
Also published as: CN114760274B; CN114760274A

Abstract

The present disclosure relates to online class voice interaction methods and apparatus, a device and a storage medium. One method comprises: in response to a trigger operation on a voice recording control in an online class interface, acquiring a recorded first audio; sending the first audio to a server, wherein the server receives first audios sent by a plurality of clients and sequentially adds them to a voice queue; and sequentially acquiring unplayed second audios from the voice queue and playing them. The technical solution of the present disclosure realizes semi-asynchronous voice sharing and communication, thereby improving the atmosphere of an online class, ensuring that students speak in an orderly manner, and improving the voice interaction effect of the online class.

Description

Voice interaction method, apparatus, device and storage medium for online classrooms

This application claims priority to the Chinese invention patent application No. 202210664108.5, filed on June 14, 2022 and entitled "Voice interaction method, device, equipment and storage medium for online classroom", the entire content of which is incorporated herein by reference.

Technical field

The present disclosure relates to the technical field of human-computer interaction, and in particular to a voice interaction method, apparatus, device and storage medium for online classrooms.

Background

With the development of Internet technology, online classes are widely used in various educational and teaching scenarios. During the classroom teaching process, in order to ensure the quality of teaching and enhance students' classroom participation, students are usually required to engage in voice interaction in the classroom.

In the related art, the teacher starts an online live lesson, and when a student needs to speak, the student's voice permission is enabled through a button in the live-streaming interface.

Summary

According to one aspect of the present disclosure, a voice interaction method for online classes is provided, including:

In response to a triggering operation on a voice recording control in an online classroom interface, obtaining a recorded first audio;

Sending the first audio to a server, wherein the server receives first audios sent by multiple clients and sequentially adds the first audios sent by the multiple clients to a voice queue;

Sequentially acquiring the unplayed second audio in the voice queue, and playing the second audio.

According to another aspect of the present disclosure, another voice interaction method for online classes is provided, including:

Receiving first audios sent by multiple clients;

Sequentially adding the multiple first audios to a voice queue according to the timestamp information of each first audio;

Sequentially sending the unplayed second audio in the voice queue to the multiple clients, wherein each client plays the second audio.

According to another aspect of the present disclosure, a voice interaction device for online classes is provided, including:

A recording module, configured to obtain a recorded first audio in response to a triggering operation on a voice recording control in an online classroom interface;

An upload module, configured to send the first audio to a server, wherein the server receives first audios sent by multiple clients and sequentially adds the first audios sent by the multiple clients to a voice queue;

A playback module, configured to sequentially obtain the unplayed second audio in the voice queue and play the second audio.

According to another aspect of the present disclosure, another voice interaction device for online classes is provided, including:

A receiving module, configured to receive first audios sent by multiple clients;

A storage module, configured to sequentially add the multiple first audios to a voice queue according to the timestamp information of each first audio;

A sending module, configured to sequentially send the unplayed second audio in the voice queue to the multiple clients, wherein each client plays the second audio.

According to another aspect of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to read the executable instructions from the memory and execute them to implement the above voice interaction method for online classrooms.

According to another aspect of the present disclosure, a computer-readable storage medium is provided. The storage medium stores a computer program. When the computer program is executed by a processor, the above-mentioned voice interaction method in an online classroom is implemented.

According to one or more technical solutions provided in the embodiments of this application, the client obtains a recorded first audio and sends it to the server. The server receives first audios sent by multiple clients and sequentially adds them to a voice queue, and the client sequentially obtains the unplayed second audio in the voice queue and plays it. This provides a semi-asynchronous voice discussion function for online classes and enhances the classroom atmosphere. When multiple clients record audio, the audios are acquired and played one after another based on the voice queue, achieving semi-asynchronous voice sharing and communication, ensuring that students speak in an orderly manner, and improving the voice interaction effect of online classes.

Brief description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.

In order to explain the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.

Figure 1 is a schematic flowchart of a voice interaction method in an online classroom provided by an embodiment of the present disclosure;

Figure 2 is a schematic flowchart of another voice interaction method in an online classroom provided by an embodiment of the present disclosure;

Figure 3 is a schematic diagram of an online classroom interface provided by an embodiment of the present disclosure;

Figure 4 is a schematic flowchart of another voice interaction method in an online classroom provided by an embodiment of the present disclosure;

Figure 5 is a schematic structural diagram of a voice interaction device for online classrooms provided by an embodiment of the present disclosure;

Figure 6 is a schematic structural diagram of another online classroom voice interaction device provided by an embodiment of the present disclosure;

Figure 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure.

Detailed description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. Although certain embodiments of the disclosure are shown in the drawings, it should be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of this disclosure. It should be understood that the drawings and embodiments of the present disclosure are for illustrative purposes only and are not intended to limit the scope of the present disclosure.

It should be understood that various steps described in the method implementations of the present disclosure may be executed in different orders and/or in parallel. Furthermore, method embodiments may include additional steps and/or omit performance of illustrated steps. The scope of the present disclosure is not limited in this regard.

Aspects of the present disclosure are described below with reference to the accompanying drawings.

Figure 1 is a schematic flow chart of a voice interaction method in an online classroom provided by an embodiment of the present disclosure. As shown in Figure 1, the voice interaction method in an online classroom provided by an embodiment of the present disclosure includes:

Step 101: Obtain the recorded first audio in response to a triggering operation on the voice recording control in the online classroom interface.

The method of the embodiments of the present disclosure is used for voice interaction between users in an online classroom. For example, a virtual teaching scene may be constructed based on the Unity3D engine and displayed in the online classroom interface to conduct online classroom teaching.

In this embodiment, the online classroom interface is configured with a voice recording control, and the user can activate the voice recording function by triggering the voice recording control. The client obtains the recorded first audio in response to the triggering operation on the voice recording control in the online classroom interface. The triggering operation on the voice recording control includes, but is not limited to, a key trigger, a touch-track trigger, a gesture trigger, and the like.

The multiple clients may be multiple student-side clients, or may include a teacher-side client and at least one student-side client.

As an example, in an online classroom, the teacher captures video through a camera, the teacher-side client sends the captured video to the server, and the server forwards the video to each student-side client. Each student-side client displays the video in real time in a designated area, for example the upper right area of the online classroom interface, to conduct online classroom teaching. In this example, the voice recording control is configured in the online classroom interface of the student-side client and can be configured as press-and-hold-to-talk. A student records speech through the voice recording control in the online classroom interface, the first audio is generated locally on the student-side client, and the student-side client sends the first audio to the server.
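The press-and-hold recording flow described above can be sketched as a small client-side state machine. This is a minimal illustration under stated assumptions, not the disclosed implementation: the names `FirstAudio` and `HoldToTalkRecorder`, the `send` callback standing in for the upload to the server, and the use of raw byte frames are all hypothetical. The recording timestamp is taken at the moment of pressing, because later steps order audios by recording time.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FirstAudio:
    """Hypothetical payload for one recorded utterance."""
    user_id: str
    recorded_at: float  # recording timestamp: when the control was pressed
    pcm: bytes          # audio frames captured while the control was held


class HoldToTalkRecorder:
    """Press-and-hold recorder: press starts capture, release finishes and uploads."""

    def __init__(self, user_id: str, send: Callable[[FirstAudio], None],
                 clock: Callable[[], float] = time.time) -> None:
        self.user_id = user_id
        self._send = send    # assumed upload hook (e.g. an HTTP or WebSocket call)
        self._clock = clock
        self._frames: Optional[List[bytes]] = None
        self._started_at = 0.0

    def on_press(self) -> None:
        self._frames = []
        self._started_at = self._clock()

    def on_frame(self, frame: bytes) -> None:
        # Frames arriving while the control is not held are ignored.
        if self._frames is not None:
            self._frames.append(frame)

    def on_release(self) -> Optional[FirstAudio]:
        if self._frames is None:
            return None  # release without a matching press
        pcm = b"".join(self._frames)
        self._frames = None
        if not pcm:
            return None  # nothing was captured, so nothing is uploaded
        audio = FirstAudio(self.user_id, self._started_at, pcm)
        self._send(audio)
        return audio
```

Injecting `clock` and `send` keeps the sketch testable without a microphone or network; a real client would wire `on_frame` to its audio capture callback.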

Step 102: Send the first audio to the server, where the server receives the first audio sent by multiple clients and adds the first audio sent by the multiple clients to the voice queue in sequence.

In this embodiment, each client sends the first audio recorded locally to the server. The server is provided with a voice queue, and the voice queue is used to store the first audio received by the server. Optionally, the server receives multiple first audios sent by multiple clients, and adds the multiple first audios to the voice queue in sequence according to the timestamp information of each first audio.

As an example, the student-side client obtains the recorded first audio and the recording timestamp corresponding to the first audio, where the recording timestamp indicates the recording time of the first audio. The student-side client sends the first audio and its recording timestamp to the server. When the server receives multiple first audios, it adds them to the voice queue in order of recording time according to their recording timestamps, so that the audios are stored in the voice queue in a definite order.
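Ordering by recording timestamp, rather than by arrival time at the server, matters when network delays make a later upload arrive first. The following is a minimal sketch of such a queue using Python's `bisect.insort`; the class names and the arrival-order tie-breaker are illustrative assumptions, not details from the disclosure.

```python
import bisect
from dataclasses import dataclass, field
from typing import List


@dataclass(order=True)
class QueuedAudio:
    recorded_at: float                     # primary sort key: recording time
    arrival: int                           # tie-breaker: order of arrival at the server
    client_id: str = field(compare=False)
    audio: bytes = field(compare=False, repr=False)


class VoiceQueue:
    """Server-side voice queue kept sorted by recording timestamp."""

    def __init__(self) -> None:
        self._items: List[QueuedAudio] = []
        self._arrivals = 0

    def add(self, client_id: str, audio: bytes, recorded_at: float) -> QueuedAudio:
        item = QueuedAudio(recorded_at, self._arrivals, client_id, audio)
        self._arrivals += 1
        # insort keeps the list ordered even if an earlier recording arrives late
        bisect.insort(self._items, item)
        return item

    def order(self) -> List[str]:
        return [item.client_id for item in self._items]
```

For example, audios recorded at 10.0 s (client s1), 5.0 s (s2) and 10.0 s (s3), arriving in that order, end up queued as s2, s1, s3. The sketch does not decide what happens when a late arrival would sort ahead of an audio that has already been played; a real system would need a policy for that case.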

Step 103: Acquire the unplayed second audio in the voice queue in sequence, and play the second audio.

In this embodiment, each client sequentially obtains and plays the audios in the voice queue according to their order in the queue. The voice queue contains both played audio and unplayed second audio, and each time the client obtains and plays the next unplayed second audio. Optionally, the server sends the unplayed second audio in the voice queue to each client in turn at a preset time interval. Alternatively, the client obtains and plays audio one, and when the client detects that audio one has finished playing, it sends an audio retrieval request to the server; according to the request, the server obtains from the voice queue audio two, which is adjacent to audio one and has not been played, and sends audio two to the client.
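The second option above (the client asks for the next audio when the current one finishes) can be sketched as a server that tracks a playback position per client. This is a hedged illustration: the class name, string audio IDs, and the assumption that the queue is append-only are all hypothetical simplifications.

```python
from typing import Dict, List, Optional


class PullPlaybackServer:
    """Pull mode: a client asks for the next audio once its current one finishes."""

    def __init__(self) -> None:
        self.queue: List[str] = []          # audio IDs in playback order (append-only here)
        self._cursor: Dict[str, int] = {}   # client ID -> index of its next unplayed audio

    def enqueue(self, audio_id: str) -> None:
        self.queue.append(audio_id)

    def on_retrieval_request(self, client_id: str) -> Optional[str]:
        """Return the audio adjacent to the one the client just played, if any."""
        idx = self._cursor.get(client_id, 0)
        if idx >= len(self.queue):
            return None  # nothing unplayed yet; the client may retry later
        self._cursor[client_id] = idx + 1
        return self.queue[idx]
```

Because "played" is tracked per client, a client that joins late starts from the beginning of the queue, while the others continue from where they are. Playback never overlaps on a given client, since the next request is only sent after the previous audio ends.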

For example, if the online class is set to group discussion mode, the student-side clients in the same group are assigned the same group identifier, and each client sends the first audio together with its group identifier to the server. The server stores the multiple first audios and their corresponding group identifiers, and each client sequentially obtains and plays the audios of its own group from the server according to its group identifier.
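One way to realize the group discussion mode is to keep a separate voice queue per group identifier, so that a client only ever receives audio from its own group. The sketch below assumes string group IDs and audio IDs and a per-client cursor; these names are illustrative and not taken from the disclosure.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional, Tuple


class GroupedVoiceQueues:
    """One voice queue per group ID; clients only receive audio from their own group."""

    def __init__(self) -> None:
        self._queues: DefaultDict[str, List[str]] = defaultdict(list)
        self._cursor: Dict[Tuple[str, str], int] = {}  # (group ID, client ID) -> next index

    def add(self, group_id: str, audio_id: str) -> None:
        self._queues[group_id].append(audio_id)

    def next_for(self, group_id: str, client_id: str) -> Optional[str]:
        # .get() avoids creating an empty queue for an unknown group
        queue = self._queues.get(group_id, [])
        key = (group_id, client_id)
        idx = self._cursor.get(key, 0)
        if idx >= len(queue):
            return None
        self._cursor[key] = idx + 1
        return queue[idx]
```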

According to the technical solution of this embodiment of the present disclosure, the client obtains the recorded first audio and sends it to the server; the server receives first audios sent by multiple clients and sequentially adds them to the voice queue; and the client sequentially obtains the unplayed second audio in the voice queue and plays it. This provides a semi-asynchronous voice discussion function in the online classroom that lets users share and exchange speech, achieving the effect of an in-class group discussion and enhancing the atmosphere of the online classroom. When multiple clients record audio, the audios are obtained and played one after another based on the voice queue, achieving semi-asynchronous voice sharing and communication and ensuring that students speak in an orderly manner. This solves the problem of unintelligible speech caused by playing multiple audios simultaneously when several students speak at once, and improves the voice interaction effect of online classes.

Based on the above embodiments, in one embodiment of the present disclosure, after acquiring the second audio in the voice queue, the client displays the second audio in the form of a voice bar in a preset area in the online classroom interface.

Figure 2 is a schematic flow chart of another voice interaction method in an online classroom provided by an embodiment of the present disclosure. As shown in Figure 2, in the voice interaction method provided by this embodiment, sequentially displaying the second audio in the form of voice bars in the preset area of the online classroom interface includes:

Step 201: Perform speech recognition on the second audio based on the pre-trained speech recognition model, and obtain the text content corresponding to the second audio.

In this embodiment, the input of the speech recognition model is audio and the output is the text content corresponding to the audio. The speech recognition model may be implemented based on a deep neural network and trained on training samples consisting of audio labeled with its corresponding text content.

As an example, a speech recognition model is set up in advance and used to perform speech recognition on each audio in the voice queue to obtain the text content corresponding to each audio. In this example, when the client sequentially obtains and plays the unplayed second audio in the voice queue, it also obtains the text content corresponding to the second audio.
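Since every client plays the same queued audio, it is natural to recognize each audio once and reuse the text. The sketch below treats the pre-trained model as an opaque callable (`Recognizer`, bytes in, text out); that callable and the `TranscriptCache` class are hypothetical placeholders, and no particular speech recognition library is implied.

```python
from typing import Callable, Dict

# Stand-in for the pre-trained speech recognition model: audio bytes in, text out.
Recognizer = Callable[[bytes], str]


class TranscriptCache:
    """Runs speech recognition once per queued audio and reuses the result."""

    def __init__(self, recognize: Recognizer) -> None:
        self._recognize = recognize
        self._texts: Dict[str, str] = {}

    def on_enqueued(self, audio_id: str, pcm: bytes) -> None:
        # Recognize eagerly when the audio enters the voice queue.
        self._texts[audio_id] = self._recognize(pcm)

    def text_for(self, audio_id: str, pcm: bytes) -> str:
        # Fall back to on-demand recognition if the audio was never pre-processed.
        if audio_id not in self._texts:
            self._texts[audio_id] = self._recognize(pcm)
        return self._texts[audio_id]
```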

Step 202: Obtain the user identification of the second audio and the user name corresponding to the user identification.

In this embodiment, the client obtains the recorded first audio and the user identification corresponding to the first audio, and sends both to the server. Accordingly, when the client obtains the second audio in the voice queue, it also obtains the user identification corresponding to the second audio and determines the corresponding user name based on that identification. The user identification is used to distinguish users; it may be a user account, and the user name may be entered by the user when creating the account.

Step 203: Fill in the preset controls according to the user name and text content, and generate a voice bar corresponding to the second audio.

In this embodiment, the user name and the text content corresponding to the second audio are used as the display content of the voice bar: the preset control is populated with the user name and the text content to generate the voice bar. As an example, refer to Figure 3, which is a schematic diagram of the online classroom interface. Mark 31 in the figure denotes the online classroom interface, in which the virtual teaching scene is displayed. The preset area may be the dotted-line area in the lower right corner of the online classroom interface. In the voice bar content "A: XXXX" shown in the figure, A is the user name and XXXX is the text content corresponding to the second audio. The voice bar thus presents the audio content more intuitively.
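Steps 202 and 203 can be sketched as a lookup from user identification to user name followed by filling a voice bar whose label follows the "A: XXXX" layout of Figure 3. The `VoiceBar` type, the `build_voice_bar` function and the fallback to the raw ID when no name is known are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class VoiceBar:
    """Display content of one voice bar in the preset area."""
    audio_id: str
    user_name: str
    text: str

    def label(self) -> str:
        # Matches the "A: XXXX" layout: user name, colon, recognized text.
        return f"{self.user_name}: {self.text}"


def build_voice_bar(audio_id: str, user_id: str, text: str,
                    user_names: Dict[str, str]) -> VoiceBar:
    # Resolve the user ID (e.g. an account) to the name entered at sign-up;
    # fall back to the raw ID if no name is known.
    name = user_names.get(user_id, user_id)
    return VoiceBar(audio_id, name, text)
```

For example, `build_voice_bar("a1", "acct-001", "Hello", {"acct-001": "A"}).label()` gives `"A: Hello"`.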

In one embodiment of the present disclosure, filling the preset control according to the user name and the text content to generate the voice bar corresponding to the second audio includes: obtaining user preference information corresponding to the user identification; determining, according to the display style corresponding to the user preference information, a target control having that display style; and filling the target control with the user name and the text content to generate the voice bar corresponding to the second audio.

In this embodiment, the user preference information is used to indicate the user's preference for the display style. The user preference information may be set by the user, or may be determined based on the user behavior log. The display style includes the theme of the control, bubble effects, etc. The target control is rendered according to the display style, and the target control is filled with the user name and text content to generate a voice bar.

As an example, client one sends the recorded audio and the corresponding first user identification to the server. When client two obtains the audio from the server's voice queue, it determines a target control with the corresponding display style according to the user preference information associated with the first user identification, and fills the target control with the user name and text content to generate the voice bar for that audio. In this way, voice bars for different users can be displayed in different styles according to each user's preferences, improving the display effect.
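The preference lookup can be sketched as follows, under stated assumptions: the style catalogue `STYLES`, the `settings` and `behavior_log` inputs, and modelling the target control as a plain dict are all hypothetical. An explicit user setting wins; otherwise the style the user picked most often in the behavior log is used, and unknown users get a default style. This mirrors the two sources of preference information named above, but the inference rule itself is only one plausible choice.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DisplayStyle:
    theme: str
    bubble: str


# Hypothetical style catalogue; real themes and bubble effects are product-specific.
STYLES: Dict[str, DisplayStyle] = {
    "default": DisplayStyle("light", "plain"),
    "ocean": DisplayStyle("blue", "rounded"),
    "party": DisplayStyle("colorful", "confetti"),
}


def resolve_style(user_id: str, settings: Dict[str, str],
                  behavior_log: Dict[str, List[str]]) -> DisplayStyle:
    """Pick the display style for the speaker identified by user_id."""
    name = settings.get(user_id)  # 1. an explicit user setting wins
    if name is None:
        picks = behavior_log.get(user_id, [])
        if picks:  # 2. otherwise infer from the style chosen most often in the log
            name = Counter(picks).most_common(1)[0][0]
    return STYLES.get(name or "default", STYLES["default"])


def fill_target_control(style: DisplayStyle, user_name: str, text: str) -> Dict[str, str]:
    # The "control" is modelled as a plain dict of render properties.
    return {"theme": style.theme, "bubble": style.bubble, "label": f"{user_name}: {text}"}
```

Note that the style is resolved from the speaker's identification (the first user identification in the example above), not the viewer's, so every client renders a given speaker's bars the same way.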

In one embodiment of the present disclosure, in response to a triggering operation on a target voice bar displayed in the preset area, the audio corresponding to the target voice bar is played. In this embodiment, the user can click a voice bar displayed in the online classroom interface; when the click operation on the voice bar is detected, the audio corresponding to the voice bar is obtained and played. The triggering operation may be implemented by, but is not limited to, a button trigger, a touch-screen tap, a gesture trigger, and the like.
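Replaying from a voice bar only requires remembering which audio each displayed bar refers to. A minimal sketch, assuming a hypothetical `play` hook for the client's audio output and string bar IDs:

```python
from typing import Callable, Dict


class VoiceBarPanel:
    """Keeps displayed voice bars clickable so any past audio can be replayed."""

    def __init__(self, play: Callable[[bytes], None]) -> None:
        self._play = play                       # assumed audio-output hook
        self._audio_by_bar: Dict[str, bytes] = {}

    def show(self, bar_id: str, pcm: bytes) -> None:
        self._audio_by_bar[bar_id] = pcm

    def on_bar_triggered(self, bar_id: str) -> bool:
        """Handle a click/tap on a voice bar; returns False for an unknown bar."""
        pcm = self._audio_by_bar.get(bar_id)
        if pcm is None:
            return False
        self._play(pcm)
        return True
```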

In the embodiments of the present disclosure, voice bars are used for voice sharing and communication in the online classroom, and the voice interactions during online teaching are retained as voice bars, that is, as a record of effective classroom interaction. This provides material for subsequent recorded lessons, compilations of students' highlight moments, and the like.

Based on the above embodiments, the method of the embodiments of the present disclosure will be described below from the server side.

Figure 4 is a schematic flow chart of another voice interaction method in an online classroom provided by an embodiment of the present disclosure. As shown in Figure 4, the voice interaction method in an online classroom provided by an embodiment of the present disclosure includes:

Step 401: Receive first audios sent by multiple clients.

In this embodiment, the execution subject is the server.

In this embodiment, the online classroom interface is configured with a voice recording control, and the user can activate the voice recording function by triggering it. The client obtains the recorded first audio in response to the triggering operation on the voice recording control in the online classroom interface. The triggering operation includes, but is not limited to, a key trigger, a touch-track trigger, a gesture trigger, and the like.

The multiple clients may be multiple student-side clients, or may include a teacher-side client and at least one student-side client. Each client can obtain the first recorded audio in response to a triggering operation on the voice recording control, and send the first audio to the server.

Step 402: Add multiple first audios to the voice queue in sequence according to the timestamp information of each first audio.

In this embodiment, the server is provided with a voice queue, and the voice queue is used to store the first audio received by the server.

As an example, the timestamp information includes a recording timestamp that represents the recording time of the first audio. The student-side client sends the first audio and its recording timestamp to the server. When the server receives multiple first audios, it adds them to the voice queue in order of recording time according to their recording timestamps, so that the audios are stored in the voice queue in a definite order.

Step 403: Send the unplayed second audio in the voice queue to multiple clients in sequence, where each client plays the second audio.

In this embodiment, there are multiple unplayed second audios in the voice queue. To prevent multiple audios from being played at the same time and to ensure that students speak in an orderly manner, the server may send the unplayed second audios in the voice queue to each client one at a time, so that each client plays the second audios in sequence.

As an example, the server sends the unplayed second audio in the voice queue to each client in sequence at a preset time interval, which can be set according to actual needs.
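The interval-based option can be sketched as a pusher that releases at most one queued audio per interval. This is a hedged illustration: the class name, the `broadcast` hook that sends to every client, and driving it with an explicit `tick(now_s)` call instead of a real timer are assumptions made to keep the example deterministic.

```python
from collections import deque
from typing import Callable, Deque, Optional


class IntervalPusher:
    """Push mode: at most one unplayed audio is broadcast per fixed interval."""

    def __init__(self, interval_s: float, broadcast: Callable[[str], None],
                 start_s: float = 0.0) -> None:
        self.interval_s = interval_s
        self._broadcast = broadcast      # assumed hook that sends to every client
        self._next_slot = start_s
        self._pending: Deque[str] = deque()

    def enqueue(self, audio_id: str) -> None:
        self._pending.append(audio_id)

    def tick(self, now_s: float) -> Optional[str]:
        """Call periodically with the current time; sends one audio if its slot is due."""
        if now_s < self._next_slot or not self._pending:
            return None
        audio_id = self._pending.popleft()
        self._next_slot = now_s + self.interval_s
        self._broadcast(audio_id)
        return audio_id
```

With a 5-second interval and two queued audios, ticks at 0 s and 5 s each send one audio, while a tick at 3 s sends nothing. If the interval is shorter than a typical utterance, consecutive audios could still overlap on the clients; the request-driven option avoids this by waiting for playback to finish.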

As another example, when the server receives the audio acquisition request sent by the client, it sends the first unplayed second audio in the voice queue to the client. In this example, when the client detects that the current audio has finished playing, it sends an audio acquisition request to the server.

According to the technical solution of this embodiment of the present disclosure, the server receives first audios sent by multiple clients, sequentially adds them to the voice queue according to the timestamp information of each first audio, and sends the unplayed second audios in the voice queue to the multiple clients one after another, so that the clients play the second audios in sequence. This provides a semi-asynchronous voice discussion function for online classes that lets users share and exchange speech, achieving the effect of an in-class group discussion and enhancing the atmosphere of the online classroom. When multiple clients record audio, the audios are obtained and played one after another based on the voice queue, achieving semi-asynchronous voice sharing and communication and ensuring that students speak in an orderly manner. This solves the problem of unintelligible speech caused by playing multiple audios simultaneously when several students speak at once, and improves the voice interaction effect of online classes.

Figure 5 is a schematic structural diagram of a voice interaction device for online classes provided by an embodiment of the present disclosure. As shown in Figure 5, the voice interaction device for online classes includes: a recording module 51, an upload module 52, and a playback module 53.

The recording module 51 is configured to obtain the recorded first audio in response to the triggering operation on the voice recording control in the online classroom interface.

The upload module 52 is configured to send the first audio to the server, wherein the server receives first audios sent by multiple clients and sequentially adds the first audios sent by the multiple clients to the voice queue.

The playback module 53 is used to sequentially obtain the unplayed second audio in the voice queue and play the second audio.

In one embodiment of the present disclosure, the voice interaction device of the online classroom further includes: a display module configured to sequentially display the second audio in the form of a voice bar in a preset area in the online classroom interface.

In one embodiment of the present disclosure, the display module includes: a recognition unit, configured to perform speech recognition on the second audio according to a pre-trained speech recognition model to obtain the text content corresponding to the second audio; an acquisition unit, configured to obtain the user identification of the second audio and the user name corresponding to the user identification; and a generation unit, configured to fill the preset control according to the user name and the text content to generate the voice bar corresponding to the second audio.

In one embodiment of the present disclosure, the generation unit is specifically configured to: obtain user preference information corresponding to the user identification; determine, according to the display style corresponding to the user preference information, a target control having that display style; and fill the target control with the user name and the text content to generate the voice bar corresponding to the second audio.

In one embodiment of the present disclosure, the voice interaction device of the online classroom further includes: a triggering module, configured to play, in response to a triggering operation on a target voice bar displayed in the preset area, the audio corresponding to the target voice bar.

Figure 6 is a schematic structural diagram of another online classroom voice interaction device provided by an embodiment of the present disclosure. As shown in Figure 6, the online classroom voice interaction device includes: a receiving module 61, a storage module 62, and a sending module 63.

The receiving module 61 is used to receive the first audio sent by multiple clients.

The storage module 62 is configured to add multiple first audios to the voice queue in sequence according to the timestamp information of each first audio.

The sending module 63 is configured to send the unplayed second audio in the voice queue to the multiple clients in sequence, where each client plays the second audio.

In one embodiment of the present disclosure, there are multiple second audios, and the sending module 63 is specifically configured to: send the unplayed second audios in the voice queue to each client in sequence at a preset time interval; or, when an audio acquisition request returned by a client is detected, send the first unplayed second audio in the voice queue to that client, where the client sends the audio acquisition request when it detects that the current audio has finished playing.

The online classroom voice interaction device provided by the embodiment of the present disclosure can execute any online classroom voice interaction method provided by the embodiment of the present disclosure, and has corresponding functional modules and beneficial effects for executing the method. Contents that are not described in detail in the device embodiments of the present disclosure may refer to the descriptions in any method embodiments of the present disclosure.

Exemplary embodiments of the present disclosure also provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores a computer program that can be executed by the at least one processor, and when executed by the at least one processor, the computer program is used to cause the electronic device to execute:

In response to a triggering operation on a voice recording control in an online classroom interface, obtaining a recorded first audio; sending the first audio to a server, wherein the server receives first audios sent by multiple clients and sequentially adds the first audios sent by the multiple clients to a voice queue; and sequentially obtaining the unplayed second audio in the voice queue and playing the second audio.

In one embodiment of the present disclosure, the computer program, when executed by the at least one processor, further causes the electronic device to: sequentially display the second audio in the form of voice bars in a preset area of the online classroom interface.

In one embodiment of the present disclosure, sequentially displaying the second audio in the form of voice bars includes: performing speech recognition on the second audio according to a pre-trained speech recognition model to obtain the text content corresponding to the second audio; obtaining the user identification of the second audio and the user name corresponding to the user identification; and filling the preset control according to the user name and the text content to generate the voice bar corresponding to the second audio.

In one embodiment of the present disclosure, filling the preset control according to the user name and the text content to generate the voice bar corresponding to the second audio includes: obtaining user preference information corresponding to the user identification; determining, according to the display style corresponding to the user preference information, a target control having that display style; and filling the target control with the user name and the text content to generate the voice bar corresponding to the second audio.

In one embodiment of the present disclosure, the computer program, when executed by the at least one processor, further causes the electronic device to: play, in response to a triggering operation on a target voice bar displayed in the preset area, the audio corresponding to the target voice bar.

Exemplary embodiments of the present disclosure also provide an electronic device, including: at least one processor; and a memory communicatively connected to the at least one processor. The memory stores a computer program that can be executed by at least one processor. The computer program, when executed by at least one processor, is used to cause the electronic device to execute:

Receiving first audios sent by multiple clients; sequentially adding the multiple first audios to a voice queue according to the timestamp information of each first audio; and sequentially sending the unplayed second audio in the voice queue to the multiple clients, wherein each client plays the second audio.

In one embodiment of the present disclosure, there are a plurality of second audios, and sequentially sending the unplayed second audios in the voice queue to the plurality of clients includes: sending the unplayed second audios in the voice queue to each client in turn at a preset time interval; or, upon detecting an audio acquisition request returned by a client, sending the first unplayed second audio in the voice queue to that client, wherein the client sends the audio acquisition request when it detects that the current audio has finished playing.
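The two dispatch options above can be sketched as a push loop that waits a preset interval between audios and a pull handler that answers each client's audio acquisition request. The function and class names are hypothetical, the `sleep` parameter is injected so the interval can be simulated, and the per-client read position in `PullDispatcher` is an assumption about how "the first unplayed second audio" is tracked when several clients share one queue.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, List, Optional


def push_at_interval(
    queue: Deque[bytes],
    clients: List[Callable[[bytes], None]],
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Option 1: send each unplayed audio to every client, then wait the preset interval."""
    while queue:
        audio = queue.popleft()
        for deliver in clients:
            deliver(audio)
        if queue:  # no need to wait after the last audio
            sleep(interval_s)


class PullDispatcher:
    """Option 2: a client requests the next audio when its current one finishes.

    Each client keeps its own read position (an assumption), so one client
    finishing early does not consume audio that other clients still need.
    """

    def __init__(self, queue: List[bytes]) -> None:
        self._queue = queue
        self._cursor: Dict[str, int] = {}

    def on_audio_request(self, client_id: str) -> Optional[bytes]:
        pos = self._cursor.get(client_id, 0)
        if pos >= len(self._queue):
            return None  # nothing unplayed for this client yet
        self._cursor[client_id] = pos + 1
        return self._queue[pos]  # first audio this client has not yet played


def client_playback_loop(
    dispatcher: PullDispatcher, client_id: str, play: Callable[[bytes], None]
) -> int:
    """Client side: play an audio and, once it finishes, request the next one."""
    count = 0
    while (audio := dispatcher.on_audio_request(client_id)) is not None:
        play(audio)  # assumed to block until playback has finished
        count += 1
    return count
```

The push option is simpler but can cut off or leave gaps between audios of different lengths; the pull option lets each client advance only when its own playback ends.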

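Exemplary embodiments of the present disclosure also provide a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor of a computer, causes the computer to execute:

in response to a trigger operation on a voice recording control in an online classroom interface, acquiring a recorded first audio; sending the first audio to a server, wherein the server receives first audios sent by a plurality of clients and sequentially adds the first audios sent by the plurality of clients to a voice queue; and sequentially acquiring unplayed second audios in the voice queue and playing the second audios.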

In one embodiment of the present disclosure, the computer program, when executed by the processor of the computer, further causes the computer to execute: sequentially displaying the second audios in the form of voice bars in a preset area of the online classroom interface.

In one embodiment of the present disclosure, sequentially displaying the second audios in the form of voice bars includes: performing speech recognition on the second audio using a pre-trained speech recognition model to obtain text content corresponding to the second audio; acquiring a user identification of the second audio and a user name corresponding to the user identification; and filling a preset control with the user name and the text content to generate a voice bar corresponding to the second audio.

In one embodiment of the present disclosure, the computer program, when executed by the processor of the computer, further causes the computer to perform: in response to a trigger operation on a target voice bar displayed in the preset area, playing the audio corresponding to the target voice bar.

Exemplary embodiments of the present disclosure also provide a computer program product, including a computer program, wherein the computer program, when executed by a processor of a computer, is used to cause the computer to perform a method according to an embodiment of the present disclosure.

Referring to FIG. 7, a structural block diagram of an electronic device 700, which may serve as the server or client of the present disclosure, will now be described; it is an example of a hardware device to which aspects of the present disclosure may be applied. The electronic device is intended to represent various forms of digital electronic computers, such as laptop computers, desktop computers, workstations, personal digital assistants, servers, blade servers, mainframe computers, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are examples only and are not intended to limit the implementations of the present disclosure described and/or claimed herein.

As shown in FIG. 7, the electronic device 700 includes a computing unit 701, which can perform various appropriate actions and processing according to a computer program stored in a read-only memory (ROM) 702 or a computer program loaded from a storage unit 708 into a random access memory (RAM) 703. The RAM 703 can also store various programs and data required for the operation of the device 700. The computing unit 701, the ROM 702, and the RAM 703 are connected to one another via a bus 704. An input/output (I/O) interface 705 is also connected to the bus 704.

Multiple components of the electronic device 700 are connected to the I/O interface 705, including an input unit 706, an output unit 707, a storage unit 708, and a communication unit 709. The input unit 706 may be any type of device capable of inputting information to the electronic device 700; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device. The output unit 707 may be any type of device capable of presenting information, and may include, but is not limited to, a display, a speaker, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 708 may include, but is not limited to, a magnetic disk and an optical disc. The communication unit 709 allows the electronic device 700 to exchange information/data with other devices through a computer network such as the Internet and/or various telecommunications networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or a chipset, such as a Bluetooth™ device, a WiFi device, a WiMax device, a cellular communication device, and/or the like.

The computing unit 701 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 701 include, but are not limited to, a central processing unit (CPU), a graphics processing unit (GPU), various dedicated artificial intelligence (AI) computing chips, various computing units running machine learning model algorithms, a digital signal processor (DSP), and any appropriate processor, controller, microcontroller, and the like. The computing unit 701 performs the various methods and processes described above. For example, in some embodiments, the voice interaction method for the online classroom may be implemented as a computer software program that is tangibly embodied in a machine-readable medium, such as the storage unit 708. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 700 via the ROM 702 and/or the communication unit 709. In some embodiments, the computing unit 701 may be configured to perform the voice interaction method for the online classroom by any other suitable means (e.g., by means of firmware).

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on a machine, partly on a machine as a stand-alone software package and partly on a remote machine, or entirely on a remote machine or server.

In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

As used in the present disclosure, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., a magnetic disk, an optical disc, a memory, or a programmable logic device (PLD)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.

To provide interaction with a user, the systems and techniques described herein may be implemented on a computer having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (e.g., a mouse or a trackball) through which the user can provide input to the computer. Other kinds of devices may also be used to provide interaction with the user; for example, the feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form (including acoustic input, voice input, or tactile input).

The systems and techniques described herein may be implemented in a computing system that includes a back-end component (e.g., a data server), a computing system that includes a middleware component (e.g., an application server), a computing system that includes a front-end component (e.g., a user computer having a graphical user interface or web browser through which the user can interact with implementations of the systems and techniques described herein), or a computing system that includes any combination of such back-end, middleware, or front-end components. The components of the system may be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), and the Internet.

A computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship with each other.

It should be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Furthermore, the terms "comprise," "include," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "comprises a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.

The above descriptions are only specific embodiments of the present disclosure, enabling those skilled in the art to understand or implement the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be practiced in other embodiments without departing from the spirit or scope of the disclosure. Therefore, the present disclosure is not to be limited to the embodiments described herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A voice interaction method for online classes, applied to a client, the method comprising:

in response to a trigger operation on a voice recording control in an online classroom interface, acquiring a recorded first audio;

sending the first audio to a server, wherein the server receives first audios sent by a plurality of clients and sequentially adds the first audios sent by the plurality of clients to a voice queue; and

sequentially acquiring unplayed second audios in the voice queue, and playing the second audios.

2. The voice interaction method for online classes according to claim 1, further comprising:

sequentially displaying the second audios in the form of voice bars in a preset area of the online classroom interface.

3. The voice interaction method for online classes according to claim 2, wherein sequentially displaying the second audios in the form of voice bars comprises:

performing speech recognition on the second audio using a pre-trained speech recognition model to obtain text content corresponding to the second audio;

acquiring a user identification of the second audio and a user name corresponding to the user identification; and

filling a preset control with the user name and the text content to generate a voice bar corresponding to the second audio.

4. The voice interaction method for online classes according to claim 3, wherein filling the preset control with the user name and the text content to generate the voice bar corresponding to the second audio comprises:

acquiring user preference information corresponding to the user identification;

determining, according to a display style corresponding to the user preference information, a target control having that display style; and

filling the target control with the user name and the text content to generate the voice bar corresponding to the second audio.

5. The voice interaction method for online classes according to any one of claims 2-4, further comprising:

in response to a trigger operation on a target voice bar displayed in the preset area, playing the audio corresponding to the target voice bar.

6. A voice interaction method for online classes, applied to a server, the method comprising:

receiving first audios sent by a plurality of clients;

sequentially adding the plurality of first audios to a voice queue according to timestamp information of each first audio; and

sequentially sending unplayed second audios in the voice queue to the plurality of clients, wherein each client is configured to play the second audios.

7. The voice interaction method for online classes according to claim 6, wherein there are a plurality of second audios, and sequentially sending the unplayed second audios in the voice queue to the plurality of clients comprises:

sending the unplayed second audios in the voice queue to each client in turn at a preset time interval; or,

upon detecting an audio acquisition request returned by a client, sending the first unplayed second audio in the voice queue to that client, wherein the client sends the audio acquisition request upon detecting that the current audio has finished playing.

8. A voice interaction device for online classes, applied to a client, the device comprising:

a recording module, configured to acquire a recorded first audio in response to a trigger operation on a voice recording control in an online classroom interface;

an upload module, configured to send the first audio to a server, wherein the server receives first audios sent by a plurality of clients and sequentially adds the first audios sent by the plurality of clients to a voice queue; and

a playback module, configured to sequentially acquire unplayed second audios in the voice queue and play the second audios.

9. The voice interaction device for online classes according to claim 8, further comprising a display module configured to sequentially display the second audios in the form of voice bars in a preset area of the online classroom interface.

10. The voice interaction device for online classes according to claim 9, wherein the display module comprises:

a recognition unit, configured to perform speech recognition on the second audio using a pre-trained speech recognition model to obtain text content corresponding to the second audio;

an acquisition unit, configured to acquire a user identification of the second audio and a user name corresponding to the user identification; and

a generating unit, configured to fill a preset control with the user name and the text content to generate a voice bar corresponding to the second audio.

11. The voice interaction device for online classes according to claim 10, wherein the generating unit is configured to:

acquire user preference information corresponding to the user identification; determine, according to a display style corresponding to the user preference information, a target control having that display style; and fill the target control with the user name and the text content to generate the voice bar corresponding to the second audio.

12. The voice interaction device for online classes according to any one of claims 9 to 11, further comprising a trigger module configured to, in response to a trigger operation on a target voice bar displayed in the preset area, play the audio corresponding to the target voice bar.

13. A voice interaction device for online classes, applied to a server, the device comprising:

a receiving module, configured to receive first audios sent by a plurality of clients;

a storage module, configured to sequentially add the plurality of first audios to a voice queue according to timestamp information of each first audio; and

a sending module, configured to sequentially send unplayed second audios in the voice queue to the plurality of clients, wherein each client is configured to play the second audios.

14. The voice interaction device for online classes according to claim 13, wherein there are a plurality of second audios, and the sending module is configured to:

send the unplayed second audios in the voice queue to each client in turn at a preset time interval; or, upon detecting an audio acquisition request returned by a client, send the first unplayed second audio in the voice queue to that client, wherein the client sends the audio acquisition request upon detecting that the current audio has finished playing.

15. An electronic device, comprising:

a processor; and

a memory for storing instructions executable by the processor;

wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement the method according to any one of claims 1-7.

16. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to any one of claims 1-7.