CN111798848A - Voice synchronous output method and device and electronic equipment

Info

Publication number
CN111798848A
CN111798848A
Authority
CN
China
Prior art keywords
user
voice
users
client
content
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010618903.1A
Other languages
Chinese (zh)
Inventor
张晓平
武亚强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lenovo Beijing Ltd filed Critical Lenovo Beijing Ltd
Priority to CN202010618903.1A priority Critical patent/CN111798848A/en
Publication of CN111798848A publication Critical patent/CN111798848A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/28 Constructional details of speech recognition systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 12/00 Data switching networks
    • H04L 12/02 Details
    • H04L 12/16 Arrangements for providing special services to substations
    • H04L 12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L 12/1813 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
    • H04L 12/1818 Conference organisation arrangements, e.g. handling schedules, setting up parameters needed by nodes to attend a conference, booking network resources, notifying involved parties
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00 Network arrangements or protocols for supporting network services or applications
    • H04L 67/14 Session management
    • H04L 67/141 Setup of application sessions

Abstract

The application discloses a voice synchronous output method and apparatus and an electronic device. The method comprises: obtaining user media data of at least two first users, where the first client of each first user is in communication connection with an interactive system and the user media data of a first user includes that user's voice data; recognizing the voice content contained in the voice data of each first user; determining, based on a first user's voice content, the voice play group to which that user belongs, where a voice play group comprises at least two first users meeting a condition, the condition including that the matching degree between their voice contents satisfies a first condition; and outputting, to the first client of a first user, the voice data of the other first users in that user's voice play group. The scheme reduces the voice noise produced when multiple people read synchronously through an interactive system, and improves the voice synchronism of such multi-person synchronous reading.

Description

Voice synchronous output method and device and electronic equipment
Technical Field
The present application relates to the field of network communication technologies, and in particular, to a method and an apparatus for outputting voice synchronously, and an electronic device.
Background
An interactive system, also called an online interactive system, allows different users to share multimedia data over a network. The online interaction realized by such a system includes online meetings, online classrooms, and other types of online live broadcast. For example, teachers and students can conduct the explanation and discussion of an online course through an online interactive system.
In some online interactive systems, multiple people may be required to input the same content by voice. For example, in an online classroom, students may need to read a text aloud at the same time so that they engage with the lesson, reproducing an effect similar to an offline classroom; as another example, an enterprise may conduct a speech rehearsal through an online interactive system, in which case multiple people may need to read the same content in synchrony. However, in online interaction a user is constrained by factors such as network transmission quality and finds it difficult to adjust the rhythm of voice input in time, so the voices of multiple users cannot stay synchronized and become noisy. How to achieve the effect of multi-user synchronous reading in online interaction while reducing the resulting voice noise is therefore a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The application provides a method and a device for synchronously outputting voice and electronic equipment.
The voice synchronous output method comprises the following steps:
the method comprises the steps that user media data of at least two first users are obtained, a first client of each first user is in communication connection with an interactive system, the user media data of the first users comprise voice data of the first users, and the interactive system is a platform capable of achieving synchronous sharing of multimedia data of the users on the basis of a network;
recognizing voice content contained in voice data of the first user;
determining a voice playing group to which the first user belongs based on the voice content of the first user, wherein the voice playing group comprises at least two first users meeting conditions, and the conditions comprise: the matching degree between the voice contents meets a first condition;
and outputting the voice data of other first users except the first user in the voice playing group to which the first user belongs to the first client of the first user.
Preferably, the conditions further include:
and the association relation between the user attribute features satisfies a second condition.
Preferably, the association relationship between the user attribute features satisfies a second condition, including:
the similarity of sound features exceeds a first threshold, and the sound features are determined based on the voice data of the first user;
and/or the similarity of the user portrait features exceeds a second threshold;
and/or the user relationship between the first users has relevance;
and/or the geographic locations belong to the same geographic area range.
Preferably, the determining, based on the voice content of the first user, the voice play group to which the first user belongs includes:
and determining a voice playing group to which the first user belongs based on the voice content of the first user and the number of people in the group set by the first user, wherein the number of people of the first user in the voice playing group is not more than the number of people in the group.
Preferably, the interactive system further establishes a communication connection with a second client of a second user, and is configured to distribute multimedia data transmitted by the second client to first clients of the at least two first users;
the method further comprises the following steps:
determining, from the at least two first users and based on their voice contents, at least one target first user whose voice content differs from that of the other first users to a degree that meets a condition;
and sending a voice content abnormality prompt to a second client of the second user, wherein the voice content abnormality prompt indicates that the voice content of the target first user is abnormal, so that the target first user is identified in an online interactive interface of the second client.
Preferably, the method further comprises the following steps:
determining a voice difference condition existing between the voice content of the target first user and the voice content of other first users;
and sending the voice difference condition corresponding to the target first user to the second client so as to display the voice difference condition of the target first user in the online interactive interface of the second client.
Preferably, the method further comprises the following steps:
obtaining an independent monitoring request sent by a second client of the second user, wherein the independent monitoring request indicates a first user selected to be monitored by the second user;
and extracting the voice data of the first user selected for monitoring by the second user from the voice data of the at least two first users, and sending that voice data to the second client of the second user, so that the voice data of the selected first user is played independently at the second client of the second user.
In another aspect, the present application further provides a speech synchronization output apparatus, including:
the data acquisition unit is used for acquiring user media data of at least two first users, a first client of each first user is in communication connection with an interactive system, the user media data of the first users comprise voice data of the first users, and the interactive system is a platform capable of realizing synchronous sharing of the multimedia data of the users based on a network;
a content recognition unit configured to recognize a voice content included in the voice data of the first user;
a group determining unit, configured to determine, based on the voice content of the first user, a voice play group to which the first user belongs, where the voice play group includes at least two first users that meet a condition, and the condition includes: the matching degree between the voice contents meets a first condition;
and the voice output unit is used for outputting the voice data of other first users except the first user in the voice playing group to which the first user belongs to the first client of the first user.
Preferably, the interactive system further establishes a communication connection with a second client of a second user, and is configured to distribute multimedia data transmitted by the second client to first clients of the at least two first users;
the device further comprises:
an abnormal user determining unit, configured to determine, based on the voice contents of the at least two first users, at least one target first user whose voice content differs from that of the other first users to a degree that meets a condition;
and the abnormal voice content prompt unit is used for sending a voice content abnormal prompt to a second client of the second user, wherein the voice content abnormal prompt indicates that the voice content of the target first user is abnormal, so that the target first user is identified in an online interactive interface of the second client.
In another aspect, the present application further provides an electronic device, including:
a memory and a processor;
wherein the processor is configured to execute the voice synchronous output method as described in any one of the above;
the memory is used for storing programs needed by the processor to perform operations.
According to the above scheme, after the voice content contained in users' voice data in the interactive system is recognized, users whose voice contents match to a degree that satisfies the condition can be attributed to the same voice play group, and each user's client outputs only the voice data of the other users in that user's group. The client therefore plays only voice that is synchronized with the voice the user utters, which realizes voice synchronism for multi-user simultaneous reading based on the interactive system, reduces the voice noise such reading generates, and improves the overall effect of synchronous reading based on the interactive system.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram of a composition architecture of an online interaction scenario provided in an embodiment of the present application;
fig. 2 is a schematic flowchart of a method for outputting speech synchronously according to an embodiment of the present application;
fig. 3 is a schematic flowchart of another speech synchronous output method according to an embodiment of the present application;
fig. 4 is a schematic flowchart of another speech synchronous output method according to an embodiment of the present application;
fig. 5 is a schematic view of a flow interaction of a method for outputting speech synchronously according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a voice synchronous output device according to an embodiment of the present application;
fig. 7 is a schematic diagram of a composition architecture of an electronic device according to an embodiment of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings described above, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It should be understood that the data so used may be interchanged under appropriate circumstances such that embodiments of the application described herein may be practiced otherwise than as specifically illustrated.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without inventive step, are within the scope of the present disclosure.
The scheme of the present application is suitable for scenarios in which multiple people communicate by voice or video through an interactive system. The interactive system may be an online conference system, an online education system for intelligent education such as an online classroom system, or a live broadcast system involving multi-person interaction.
The scheme of this application can be applied to the server of the interactive system or to a client of the interactive system; it reduces the noise that appears while multiple people produce voice synchronously, and improves the synchronization effect of multi-person synchronous voicing.
For ease of understanding, a scenario applicable to the present application is described below by taking one kind of interactive system as an example; fig. 1 shows a schematic structural diagram of an online interactive system of the present application.
In fig. 1, the interactive system is illustrated as an online classroom.
As can be seen from fig. 1, fig. 1 may include: an online classroom system 101, a first client 102 of a plurality of students and a second client 103 of at least one teacher.
The online classroom system 101 may include at least one server 1011 for implementing an online classroom, for example, the online classroom system may include a server cluster formed by a plurality of servers, or a cloud platform.
The first client and the second client are in communication connection with the online classroom system, so that the first client and the second client are accessed to the online classroom system. On the basis, the teacher can transmit the teaching video needing live broadcasting to the online classroom system through the second client, and the online classroom system can distribute the teaching video to the first clients of all students, so that the online live-broadcasting type online classroom is realized.
It is understood that the teaching video transmitted by the second client through the online classroom system can include one or more of course content of a course taught by the teacher, voice of the teacher, and an image of the teacher.
In addition, during course learning based on the online classroom system, the first client may also send the student's multimedia data, which includes audio and/or video, to the second client on the teacher side, so that the teacher can observe each student's learning state through the second client, or learn of students' questions through their voices.
It is understood that fig. 1 illustrates one online interactive scenario, an online classroom, as an example. Other online interaction scenarios are similar: for an online video conference, multiple users likewise establish communication connections with the server of the online video conferencing system through their clients, so that the clients of the different conference participants can transmit user voice and video, as well as documents or other content related to the conference, through the online video conferencing system. Online interaction scenarios based on other interactive systems follow the same pattern and are not described here again.
Based on the above, the speech synchronous output method of the present application is described below with reference to the flowchart.
As shown in fig. 2, which shows a schematic flow chart of an embodiment of the speech synchronous output method according to the present application, the method of this embodiment may be applied to a server of an interactive system, or to a client that establishes a communication connection with the interactive system.
The process of this embodiment may include the following steps:
s201, user media data of at least two first users are obtained.
The first client of the first user is in communication connection with the interactive system. In this embodiment, the first user may be any user that establishes a communication connection with the interactive system through the client.
The interactive system is a platform which can realize synchronous sharing of multimedia data of a plurality of users among the plurality of users based on a network. For example, the interactive system may be the aforementioned live platform or an online classroom system of an online classroom.
It is understood that the multimedia data users share through the interactive system can be of various types, such as the user's audio and video, or a document or PPT presentation.
The user media data of the first user includes at least voice data of the first user, such as a sound made by the first user. Of course, the user media data of the first user may also include video data of the first user, such as a video image of the first user included in the video data.
It can be understood that, in the case that the present embodiment is applied to an interactive system, the interactive system may obtain the user media data transmitted by the first client of each first user.
However, when the embodiment is applied to the client, for example, the first client of the first user, the user media data of the local first user collected by the first client and the user media data of the other first users transmitted by the interactive system may be acquired.
S202, recognizing the voice content contained in the voice data of the first user.
The voice content in the voice data refers to content information expressed in a voice form in the voice data.
Any voice recognition technology can be adopted for recognizing the voice content in the voice data, and the method is not limited to this.
For example, in one possible case, the recognized voice content may be the text corresponding to the voice data, the text consisting of the characters converted from the speech in the voice data; the content information in the voice data can then be reflected intuitively through this text. For example, if the first user utters "this is a good era" by voice, the voice content contained in the voice data can be recognized as the text "this is a good era".
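Steps S201 and S202 can be sketched as follows. The patent names no concrete recognizer, so the `UserMedia` shape, the `asr_engine` parameter, and the stand-in engine below are illustrative assumptions rather than the patented implementation; any speech recognition technology could fill the recognizer role.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class UserMedia:
    """User media data transmitted by a first client (assumed shape)."""
    user_id: str
    voice_data: bytes        # raw audio of the first user
    video_data: bytes = b""  # optional video image of the first user

def recognize_voice_content(media: UserMedia,
                            asr_engine: Callable[[bytes], str]) -> str:
    """Step S202: extract the content expressed in voice form as text."""
    return asr_engine(media.voice_data)

# Stand-in "recognizer" for illustration only: audio bytes are UTF-8 text.
fake_asr = lambda audio: audio.decode("utf-8")
media = UserMedia("user_a", "this is a good era".encode("utf-8"))
print(recognize_voice_content(media, fake_asr))  # prints: this is a good era
```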
S203, based on the voice content of the first user, determining the voice playing group to which the first user belongs.
Wherein, the voice playing group comprises at least two first users meeting the conditions, and the conditions at least comprise: the degree of matching between the voice contents satisfies a first condition.
For example, the matching degree between the voice contents satisfying the first condition may be that the matching degree between the voice contents exceeds a set threshold. As another example, the matching degree between the voice contents satisfying the first condition may be that the matching degree between the voice contents is the highest.
It can be understood that if the voice contents contained in the voice data obtained from multiple first users at the same moment are identical, the voices of those first users are fully synchronized; the degree of synchronization between the contents spoken by two users can therefore be reflected by the matching degree between their voice contents.
For example, taking an online classroom in which a plurality of students read articles at the same time as an example, if after obtaining voice data of two students at a certain time, it is analyzed that voice contents in the voice data of the two students are completely consistent, it indicates that the two students are reading the articles at the same reading progress and speed, and the voice synchronism of the two students reading the articles is high.
In this embodiment, there may be a plurality of ways to determine the matching degree between different voice contents. For example, text matching is performed on different voice contents, and the text matching degree is used as the matching degree between the voice contents.
For another example, the matching degree between different voice contents may also be analyzed from one or more dimensions, such as the total number of characters contained in the voice contents, the number of identical characters, the number of consecutive identical characters, and the difference between the total number of characters between different voice contents.
For example, it is assumed that the voice content corresponding to the user a is "we plan to visit a museum", the voice content corresponding to the user B is also "we plan to visit the museum", the voice content corresponding to the user C is "we plan to visit a museum", and the voice content corresponding to the user D is "we intend to visit a museum and", it is known that the voice contents sent by the user a and the user B are identical, and therefore, the voice contents between the two users are synchronized.
The voice content of user C is one character shorter than that of user A, so the content uttered by user C lags behind user A. Similarly, although the voice content of user D contains that of user A, it has one extra character "and"; the content uttered by user D runs ahead of user A and cannot stay synchronized with user A's voice. Relative to user B, therefore, the voice synchronization between user D and user A is poor.
As can be seen from this example, user B and user a can be determined to belong to the same voice play group.
It will be appreciated that in practical applications, different matching weights may be set for these dimensions. For example, since the more synchronized the voice contents of different users are, the larger the number (or proportion) of consecutive identical characters they share, the weight of the number of consecutive identical characters may be set relatively high, while the weight of the difference in total character count may be set relatively low.
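As a sketch of one possible score along the dimensions just listed: the 0.6/0.3/0.1 weights and the exact formulas below are assumptions for illustration only; the application does not fix them.

```python
from collections import Counter

def matching_degree(a: str, b: str,
                    w_run: float = 0.6,     # consecutive identical characters
                    w_shared: float = 0.3,  # shared characters overall
                    w_length: float = 0.1) -> float:
    """Score in [0, 1]; higher means the two voice contents are more
    synchronized. Weights favor consecutive identical characters, as
    suggested in the text above."""
    if not a and not b:
        return 1.0
    longest = max(len(a), len(b))
    run = 0  # identical characters counted from the start of both contents
    for x, y in zip(a, b):
        if x != y:
            break
        run += 1
    shared = sum((Counter(a) & Counter(b)).values())  # common character multiset
    length_score = 1 - abs(len(a) - len(b)) / longest
    return (w_run * run / longest
            + w_shared * shared / longest
            + w_length * length_score)

a = "we plan to visit a museum"
print(matching_degree(a, a))                     # ~1.0: fully synchronized
print(matching_degree(a, "we plan to visit a"))  # lower: this user lags behind
```

Under this scoring, the worked example above comes out as expected: user B (identical content) scores near 1.0 against user A, while users C and D score lower.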
As another example, matching of voice content may also be performed in connection with semantics of the voice content. Of course, the semantic content may also be subjected to matching analysis by combining a plurality of matching manners.
S204, outputting the voice data of other first users except the first user in the voice playing group to which the first user belongs to the first client of the first user.
In this embodiment, when the embodiment is applied to an interactive system, for any first user, the interactive system may transmit only the voice data of other users belonging to the same voice play group as the first user to the first client of the first user, so that the first client of the first user only plays the voice data of each user in the voice play group to which the first user belongs.
In this embodiment, when the embodiment is applied to the first client of the first user, the first client may extract, from the voice data of each first user sent by the interactive system, the voice data of the first user belonging to the same voice playing group as the first user of the first client, and play the extracted voice data of each first user.
It can be understood that, for any first user, since only the voice data of the other first users in the same voice play group is output to that user's client, the first user hears only voice content that is synchronized with the content he himself utters. This improves the synchronism perceived on the first user's side, letting the user achieve an effect comparable to reading aloud together with others offline.
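A minimal server-side sketch of step S204, assuming group membership has already been decided; the dictionary shapes below are illustrative assumptions, not a prescribed interface.

```python
from collections import defaultdict

def route_voice_streams(group_of: dict[str, str],
                        voice_of: dict[str, bytes]) -> dict[str, list[bytes]]:
    """For each first user, collect only the voice data of the OTHER first
    users in the same voice play group; that is all the interactive system
    forwards to that user's first client."""
    members = defaultdict(list)
    for user, group in group_of.items():
        members[group].append(user)
    return {user: [voice_of[peer]
                   for peer in members[group] if peer != user]
            for user, group in group_of.items()}

group_of = {"A": "g1", "B": "g1", "C": "g2"}
voice_of = {u: u.encode() for u in group_of}
print(route_voice_streams(group_of, voice_of)["A"])  # [b'B']: only peer B's audio
```

When the scheme runs on the first client instead, the same filter can be applied locally to the streams the interactive system delivers.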
It is understood that, in the case that the multimedia data of any one first user includes a video image captured by the first client side (e.g., a video image of the first user, etc.), for any one first user, video images of other first users belonging to the same voice play group as the first user are also output to the first client.
Of course, in practical applications, the video images output to the first client of any first user may include those of any first user in the entire interactive system, or may be limited to the other first users belonging to the same voice play group as that first user.
Therefore, after the voice content contained in users' voice data in the interactive system is recognized, users whose voice contents match to a degree that satisfies the condition can be attributed to the same voice play group, and each user's client outputs only the voice data of the other users in that group. The client thus plays only voice that is synchronized with the voice the user utters, realizing voice synchronism for multi-user simultaneous reading in the interactive system, reducing the voice noise such reading generates, and improving the effect of synchronous reading based on the interactive system.
It can be understood that, in practical applications, the user attribute characteristics that users belonging to the same voice play group need to satisfy may also be set so as to classify users having the same or similar user attribute characteristics into the same voice play group.
In addition, in practical application, the number of people in the voice playing group can be set by the client of the first user or the interactive system, so that the number of people in each voice playing group does not exceed the set number of people.
For example, referring to fig. 3, which shows a flowchart of another embodiment of the method for outputting synchronized speech according to the present application, the method of the present embodiment may be applied to a client, such as a first client of a first user, and may also be applied to an interactive system. The method of the embodiment can comprise the following steps:
s301, user media data of at least two first users are obtained.
The user media data of the first user comprises at least voice data of the first user.
S302, recognizing the voice content contained in the voice data of the first user.
S303, determining a voice playing group to which the first user belongs based on the voice content of the first user and the number of people in the group set by the first user, wherein the voice playing group comprises at least two first users meeting the condition.
Wherein the conditions include: the matching degree between the voice contents satisfies a first condition, and the incidence relation between the user attribute features satisfies a second condition.
The first condition can refer to the related description above, and is not described herein again.
User attribute features refer to features that the user has or is associated with. By setting the association relation among the user attribute characteristics of different first users belonging to the same voice playing group to meet a second condition, the first users with similar or associated user attribute characteristics can be divided into one voice playing group.
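One way step S303 could be realized is a greedy pass that respects both the content-match threshold and the per-group head-count cap. The greedy strategy, the 0.8 threshold, and the use of `difflib` similarity are all assumptions for illustration; the application fixes no concrete grouping algorithm.

```python
from difflib import SequenceMatcher

def assign_groups(contents: dict[str, str], max_size: int,
                  threshold: float = 0.8) -> dict[str, int]:
    """Map each first user to a voice play group id such that the members'
    voice contents match above `threshold` and no group exceeds `max_size`
    members (the head count set by the first user)."""
    groups: list[list[str]] = []
    for user, text in contents.items():
        for members in groups:
            # join an existing group only if it has room and the content
            # matches every current member closely enough
            if len(members) < max_size and all(
                    SequenceMatcher(None, text, contents[m]).ratio() >= threshold
                    for m in members):
                members.append(user)
                break
        else:  # no compatible group with room: open a new one
            groups.append([user])
    return {u: gid for gid, members in enumerate(groups) for u in members}

contents = {"A": "we plan to visit a museum",
            "B": "we plan to visit a museum",
            "C": "tomorrow is a holiday"}
g = assign_groups(contents, max_size=2)
print(g["A"] == g["B"], g["A"] != g["C"])  # True True
```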
For example, the user attribute feature may be one or more of a voice feature, an identity feature, a geographical location of the user, and a user group to which the user belongs in the interactive system.
Correspondingly, the condition that the association relationship between the user attribute features satisfies the second condition may include any one or more of the following conditions:
the similarity of the sound features exceeds a first threshold, and the sound features are determined based on the voice data of the first user;
the similarity of the user portrait features exceeds a second threshold;
the user relationship between the first users has relevance;
the geographic locations belong to the same geographic area range.
The sound features may be features such as the timbre or tone of the first user. Grouping a plurality of first users whose sound-feature similarity exceeds the first threshold into the same voice playing group helps prevent the harmony and consistency of the voice content collectively produced by the group from being undermined by some users' tones being too high or too low (or some users' timbres being particularly distinctive).
The user portrait features may be features that reflect the user's own attributes or social behavior attributes. For example, the user portrait features may include the user's age, gender, educational background, occupation, and the like. When the ages of different users are similar, or their genders are the same, the voice effects of uttering the same voice content are also similar. By contrast, the reading voices of users whose ages differ greatly differ markedly, so the consistency of their voices when reading a text together is relatively poor, whereas users of similar ages reading the same text sound much the same, making the uniformity of the reading effect more apparent.
The user relationship may be being friends, belonging to the same group, or belonging to the same class, etc. It can be understood that if the user relationships of different first users have an association, the first users are likely familiar with each other, know each other relatively well, or even have good rapport; therefore, taking an associated user relationship as a condition for belonging to the same voice play group makes it more likely that each first user in the voice play group can utter the voice content at the same speaking speed.
As for the geographic locations belonging to the same geographic area range: for example, different first users belong to the geographic area range served by the same server on which the interactive system is deployed; or the geographic locations of different first users belong to the same city or province, etc. It can be understood that when the geographic locations of the users belong to the same geographic area range, the network transmission speeds of those users are also relatively similar, which helps reduce cases where voice content falls out of sync due to network delay, thereby helping to improve the synchronization of the voice content.
Of course, besides the above cases of user attribute features, there may be other possibilities in practical application for the second condition satisfied by the association relation between user attribute features, which is not limited here.
It can be understood that, on the premise of considering the similarity between the voice contents of different users, the present embodiment also considers the user attribute characteristics having an influence on the similarity of the voice contents to comprehensively determine the users belonging to the same voice playing group, thereby facilitating the voice synchronicity of each user in the subsequent voice playing group.
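As one way to picture the second condition in code, the sketch below checks the four alternatives listed above; all names, the cosine similarity measure for sound features, and the crude stand-in for portrait similarity are assumptions for illustration, not part of the disclosure:

```python
# Hedged sketch: any one listed association between user attribute features
# is enough to satisfy the second condition.
from dataclasses import dataclass, field


@dataclass
class UserAttributes:
    sound_feature: list                 # e.g. a timbre/tone embedding from voice data
    portrait: dict                      # e.g. {"age": 20, "gender": "F"}
    related_users: set = field(default_factory=set)  # friends / same group / same class
    region: str = ""                    # geographic area range


def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


SOUND_SIM_THRESHOLD = 0.9  # the "first threshold"; value assumed


def second_condition(uid_a, a: UserAttributes, uid_b, b: UserAttributes) -> bool:
    return (
        cosine(a.sound_feature, b.sound_feature) > SOUND_SIM_THRESHOLD
        or a.portrait.get("age") == b.portrait.get("age")  # crude portrait-similarity stand-in
        or uid_b in a.related_users or uid_a in b.related_users
        or (a.region != "" and a.region == b.region)
    )
```

In practice the portrait comparison would involve a richer similarity score over several fields; a single age equality is only a placeholder here.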
The number of people in the group (group size) set by the first user refers to the maximum number of users, set by the first user, that the voice play group to which the first user belongs may contain. Correspondingly, in the process of determining the voice play group to which the first user belongs, it is necessary to ensure that the number of users in that voice play group does not exceed the group size set by the first user. For example, if the group size set by a first user is 5, the voice play group to which that first user is finally assigned contains a total of 5 first users, and the above conditions are satisfied among those 5 first users.
For example, taking the application of the present embodiment to an interactive system as an example, the interactive system may determine, in combination with the voice content of each first user and the number of people in the group set by each first user, the first user suitable for being classified into one voice play group, so as to obtain the voice play group to which each first user belongs.
If this embodiment is applied to the first client of the first user, then after determining the other first users who meet the condition together with the first user of this first client, the first client can select, according to the group size set by the first user, a corresponding number of other first users to form a voice play group with the first user.
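Putting the pieces together, one hedged greedy strategy (the disclosure does not fix a grouping algorithm, so both the approach and the 0.8 threshold here are assumptions) for assigning first users to voice play groups while respecting the set group size could be:

```python
# Illustrative greedy grouping: users with sufficiently matching voice
# content are placed into one voice play group, capped at group_size.
from difflib import SequenceMatcher

SET_THRESHOLD = 0.8  # assumed matching-degree threshold


def matches(a: str, b: str) -> bool:
    return SequenceMatcher(None, a, b).ratio() > SET_THRESHOLD


def build_groups(contents: dict, group_size: int):
    """contents: first-user id -> recognized voice content.
    Returns a list of voice play groups (lists of user ids)."""
    groups, assigned = [], set()
    for uid, text in contents.items():
        if uid in assigned:
            continue
        group = [uid]
        for other, other_text in contents.items():
            if other in assigned or other == uid or len(group) >= group_size:
                continue
            if matches(text, other_text):
                group.append(other)
        if len(group) >= 2:  # a voice play group needs at least two first users
            groups.append(group)
            assigned.update(group)
    return groups
```

A user who matches no one is left ungrouped by this sketch; a production system would also need to regroup users as their recognized content evolves over time.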
It should be noted that this embodiment takes as an example the case where the condition satisfied by the first users in the voice play group includes both the first condition and the second condition; it can be understood, however, that when the condition only includes the first condition, determining the voice play group in combination with the group size set by the first user is also applicable to this embodiment.
S304, outputting the voice data of other first users except the first user in the voice playing group to which the first user belongs to the first client of the first user.
For this step S304, reference can be made to the related description of the previous embodiment, and details are not repeated here.
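The output step can be pictured as a simple fan-out: each first user's client is sent only the voice data of the other members of its own voice play group. A minimal sketch, with assumed names:

```python
# Hedged sketch of the output step: forward to each first user only the
# voice data of the *other* first users in the same voice play group.
def fan_out(groups, voice_data):
    """groups: list of voice play groups (lists of user ids).
    voice_data: user id -> that user's voice data payload.
    Returns user id -> list of payloads to send to that user's client."""
    outbound = {}
    for group in groups:
        for uid in group:
            outbound[uid] = [voice_data[other] for other in group if other != uid]
    return outbound
```

The effect is that each first user hears only voices synchronized with their own, which is the noise-reduction behavior the embodiment describes.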
It is understood that in a scenario of online interaction based on the interactive system, there may also be a controller of the online interaction, such as a manager, an initiator, or a maintainer of the online interaction may all be a control user in the online interaction process. For example, a teacher in an online classroom acts as a classroom manager and maintainer, and in an online conference, a conference host or initiator may need to manage the overall conference state, maintain the conference progress, and handle problems in the online conference.
Correspondingly, the interactive system can also establish communication connection with a second client of a second user. The interactive system is used for distributing the multimedia data transmitted by the second client to the first client of at least two first users of the interactive system.
In this case, the controller of the online interaction (i.e. the second user) may need to know more accurately or in detail about the users whose voice content cannot be synchronized with other users during the online interaction, or about the specific reason why the voice content of some users cannot be synchronized, and so on.
The second user may belong to at least two first users, or may be another user other than the first user.
For example, taking an online classroom scenario as an example, the first users may be students and the second user may be a teacher. Of course, the first user may also be either a student or a teacher while the second user is only a teacher; for example, when the teacher and the students read synchronously, the clients of both the teacher and the students may, by setting or by default, output only the voice content of users synchronized with their own voice content.
For another example, taking an online conference as an example, the first user and the second user may both be any one participant, and the second user may belong to at least two users among the first users, so that, based on the scheme of the present application, a client of any one participant may only play the voice content of other users synchronized with the voice content thereof, and may also apply for obtaining information such as a reason that the voice synchronization cannot be achieved or an abnormal user that causes abnormal voice synchronization. Of course, the second user may also be a conference initiator separate from the first user, and the first user may be a common participant.
For example, referring to fig. 4, which shows a schematic flow chart of another embodiment of the speech synchronous output method of the present application, the method of the present embodiment may be applied to an interactive system; in a case that the second user belongs to at least two first users, the present application may also be applied to a client of the user, and the method of this embodiment may include:
s401, user media data of at least two first users are obtained.
The user media data of the first user comprises at least voice data of the first user.
S402, identifying the voice content contained in the voice data of the first user.
S403, determining a voice playing group to which the first user belongs based on the voice content of the first user, wherein the voice playing group comprises at least two first users meeting the conditions.
Wherein the conditions at least include: the matching degree between the voice contents satisfies a first condition. Of course, the conditions may also include: the association relation between the user attribute features satisfies a second condition.
S404, outputting the voice data of other first users except the first user in the voice playing group to which the first user belongs to the first client of the first user.
Reference may be made to the related description of the foregoing embodiments for the above S401 to S404, which are not described herein again.
S405, determining at least one target first user with the difference degree meeting the condition with the voice content of other first users from the at least two first users based on the voice content of the at least two first users.
The target first user belongs to the at least two first users, and the first user with the difference degree meeting the condition is called the target first user only for convenience of distinguishing.
A difference degree that meets the condition indicates poor voice content synchronization; if the difference degree between the voice content of a certain first user and that of the other first users meets the condition, it indicates that this first user's voice content may lead or lag relative to the other first users.
Accordingly, there are many possible forms of the condition on the difference degree from the voice content of the other first users. For example, the condition may be that the similarity between the user's voice content and that of the other first users is lower than a third threshold, where the third threshold is smaller than the set threshold that the aforementioned matching degree between voice contents needs to exceed. For another example, the condition may be that the number of characters differing from the voice content of the other first users exceeds a set number.
For example, taking multiple people reading the same document through the interactive system as an example, assume that under normal conditions most of the first users are currently reading "we plan to visit a museum". If the content uttered by a certain first user is "visit the museum, which is significant", that user's reading progress is ahead of the other first users and differs greatly from what the others are saying, so voice synchronization cannot be achieved between this user and the majority of first users, and this first user may be determined as a target user. Similarly, if the voice content uttered by a first user is completely different from that of the other first users, or the reading of the content lags, the difference degree between this user's voice content and the others' is high, and this first user may likewise be regarded as a target user with abnormal voice content.
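A hedged sketch of this target-user determination (S405), assuming the "similarity below a third threshold" variant of the condition with an illustrative 0.5 value:

```python
# Illustrative detection of target first users: a user is a target when
# their recognized voice content is dissimilar to *every* other user's.
from difflib import SequenceMatcher

THIRD_THRESHOLD = 0.5  # assumed; must be below the matching-degree threshold


def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()


def find_target_users(contents: dict):
    """contents: first-user id -> recognized voice content.
    Returns ids whose difference degree from all others meets the condition."""
    targets = []
    for uid, text in contents.items():
        others = [t for u, t in contents.items() if u != uid]
        if others and all(similarity(text, t) < THIRD_THRESHOLD for t in others):
            targets.append(uid)
    return targets
```

Comparing against every other user is quadratic; a real system might instead compare each user against the majority (reference) content of their voice play group.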
As an alternative, in the case that this embodiment is executed by the interactive system, the second client of the second user may further send a synchronization abnormal user identification request to the interactive system, where the request is used to ask the system to identify users whose speech is out of sync. Accordingly, in response to the synchronization abnormal user identification request, the interactive system may determine at least one target first user whose difference degree from the voice content of the other first users meets the condition.
S406, sending a voice content exception prompt to the second client of the second user.
The voice content exception prompt indicates that the voice content of the target first user is abnormal, so that the target first user is identified in the online interactive interface of the second client.
For example, the voice content exception prompt may carry an identification of the target first user and a voice out-of-sync identification. Accordingly, the second client can display the voice out-of-sync identification for the target first user on the online interactive interface.
The specific form of the voice out-of-sync identification can be set as required; its function is to prompt that the voice content of the target first user differs obviously from that of the other first users. For example, the voice out-of-sync identification may be the symbol "!", or it may be a text label, etc.
It is understood that there are many situations where the target first user is not voice synchronized with the other first users, wherein a situation that causes a voice difference (not voice synchronized) in the voice content between the target first user and the other first users is referred to as a voice difference situation, and the voice difference situation may have many possibilities.
For example, if the voice content uttered by the target first user lags relative to that of the other first users, the voice difference condition is an utterance delay, e.g., the article is being read more slowly than by others.
For another example, if the voice content uttered by the target first user is ahead of that of the other first users, the voice difference condition is an utterance advance, e.g., the reading progress exceeds that of others because the speaking speed is too fast.
For another example, if the voice content uttered by the target first user is completely different from that of the other first users, the voice difference condition may be an utterance error, etc.
Of course, there may be other possibilities for the voice difference condition, which is not limited here.
Correspondingly, in order to enable the second user of the second client to know the specific reason causing the voice content of the target first user to be asynchronous with other first users, the embodiment may further determine the voice difference condition existing between the voice content of the target first user and the voice content of other first users, and send the voice difference condition corresponding to the target first user to the second client, so that the voice difference condition of the target first user is displayed in the online interactive interface of the second client.
The speech difference condition may be determined based on a particular difference speech between the speech content of the target first user and the speech content of the other first users. For example, the voice difference condition may be determined by comparing the content difference between the voice content of the target first user and the voice content of the other first users, and determining the voice difference condition corresponding to the content difference.
For example, suppose the voice content of user A differs greatly from that of the other users: the voice content of most users is "good day today", while user A has already gone on to say "good day today, it makes me think of something"; then the voice difference condition of user A is "speaking too fast, voice content ahead".
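As an illustration only, a toy classification of the voice difference condition by comparing the target user's recognized text with the majority (reference) text; a real system would need more robust alignment than these simple prefix checks, and all labels below are assumed:

```python
# Hedged sketch: classify the voice difference condition as lag, advance,
# or utterance error by locating one content relative to the other.
def difference_condition(target_text: str, reference_text: str) -> str:
    """target_text: recognized content of the target first user.
    reference_text: recognized content of the majority of first users."""
    if target_text == reference_text:
        return "in sync"
    if reference_text.startswith(target_text):
        return "speech lags: voice content behind"      # utterance delay
    if target_text.startswith(reference_text):
        return "speech too fast: voice content ahead"   # utterance advance
    return "voice utterance error"
```

The resulting string is what could be carried in the voice content exception prompt and shown in the target user's video window on the second client.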
It is understood that, in this embodiment, the second user may also have a need to check the online interaction condition of a certain first user, for example, a teacher may need to check the online classroom condition of a student, or, in a case where a certain student reads content and has a significant voice difference with other students, inquire the state of the student and listen to the voice uttered by the student separately, etc.
To this end, in the present application, the second user may also request, through the second client, to individually monitor a certain first user. Correspondingly, after the independent monitoring request sent by the second client of the second user is obtained, the voice data of the first user whom the second user has selected to monitor can be extracted from the voice data of the at least two first users and sent to the second terminal of the second user, so that it is played individually at the second terminal of the second user.
Wherein, the independent monitoring request indicates that the second user selects the first user for monitoring. For example, there may be one monitor button on each display window of the first user on the online interactive interface of the second client of the second user, and after the second client detects that the second user clicks the monitor button on the display window of the first user, an independent monitor request for the first user may be generated.
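Handling the independent monitoring request reduces, in sketch form, to extracting the selected first user's voice data and returning it alone (all names here are assumed for illustration):

```python
# Hedged sketch: serve an independent monitoring request by extracting
# only the monitored first user's voice data for individual playback.
def handle_monitor_request(voice_data_by_user: dict, monitored_user_id):
    """voice_data_by_user: first-user id -> that user's voice data payload.
    Returns a mapping containing only the monitored user's voice data."""
    data = voice_data_by_user.get(monitored_user_id)
    if data is None:
        raise KeyError(f"no voice data for user {monitored_user_id}")
    return {monitored_user_id: data}  # sent alone to the second user's terminal
```

In a live system this would select an audio stream rather than a static payload, but the routing decision is the same.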
Therefore, in the embodiments of the present application, the first client of a first user plays only the voice data of other users whose voice content is synchronized with that first user's, thereby reducing the voice noise the first user hears due to unsynchronized voices and improving the voice synchronization effect of users' online interaction. Meanwhile, the present application can also provide the second client of the second user of the interactive system with the user information of first users whose voice synchronization is abnormal and the specific voice difference conditions, so that the second user can learn, in a timely and convenient manner, the voice conditions of each user in the interactive system.
To facilitate understanding of the solution of the present application, a description is given below in conjunction with an application scenario of an online classroom. For ease of understanding, in this application scenario, the online classroom system (i.e., the server of the online classroom system) determines the voice play groups and the voice data output to each first user; the first users in the online classroom system are students in the online classroom, the second user is a teacher in the online classroom, and a scenario in which a plurality of students synchronously read a text in the online classroom under the teacher's instruction is taken as an example.
For example, referring to fig. 5, which shows a schematic flow interaction diagram of a speech synchronous output method according to the present application, the method of this embodiment may include:
s501, the online classroom system obtains user media data of at least two students.
The user media data of the student at least comprises voice data of the student, and the voice data of the student is voice data sent by the student reading the text.
Of course, the user media data of the student may also include image data of the student.
S502, the online classroom system identifies the voice content contained in the voice data of each student.
And S503, the online classroom system determines the voice playing group to which each student belongs based on the voice content of each student and the number of people in the group set by each student.
The voice play group comprises at least two students between whom the matching degree of voice contents satisfies a first condition and the association relation between student attribute features satisfies a second condition. For example, the online classroom system can classify a plurality of students with a relatively high matching degree of voice content and relatively similar student attribute features into one voice play group, so that the voice content of the students in the voice play group is basically consistent.
And S504, aiming at each student, the online classroom system sends the voice data of other students except the student in the voice playing group to which the student belongs to the first client of the student, so that the first client of the student plays the voice data of each student played by the online classroom system.
It is understood that the online classroom system can also send the video images of all students in the online classroom or students belonging to the same voice playing group to the student so as to view the images of other students in the online classroom interface of the first client of the student.
Of course, the online classroom system can also obtain multimedia data of the teacher uploaded by the second client of the teacher end, such as video and audio of the teacher, or classroom data or classroom lecture multimedia released by the teacher, and the like. Accordingly, the online classroom system can also distribute the multimedia data of the teacher side to the first client of each student.
Therefore, the online classroom system groups students with similar voice content into one voice play group and, to each student, plays only the reading voices of students in the same voice play group. In this way, each student hears reading voices that are consistent and synchronized, which reduces the noisy effect that arises in an online classroom when many people read simultaneously and some students' reading progress is ahead, behind, or wrong. This improves the students' sense of immersion in multi-person reading based on the online classroom, and also improves the synchronization effect of multi-person reading.
And S505, the second client of the teacher sends a synchronous abnormal user identification request to the online classroom system.
For example, a key for triggering the identification of the synchronization exception may be displayed in the online classroom interface of the second client of the teacher, and by clicking the key, the second client may be triggered to generate and send the synchronization exception user identification request.
The synchronization abnormal user identification request is used to request identification of students whose reading synchronization is poor, for example, students in the online classroom who read the text incorrectly or whose progress is out of step with the others.
And S506, the online classroom system responds to the synchronous abnormal user identification request, determines at least one target student meeting the condition of the difference degree between the voice contents of other students from the at least two students based on the voice contents of the at least two students, and determines the voice difference condition existing between the voice contents of the target student and the voice contents of the other students.
It should be noted that, in this embodiment, the teacher sends a synchronization abnormal user identification request to the online classroom system through the second client to trigger the online classroom system to determine the target students and their voice difference conditions; in practical applications, however, the online classroom system may also execute step S506 automatically, in which case step S506 can be executed without being triggered by step S505.
And S507, sending a voice abnormity prompt carrying the voice difference condition of the target students to the second client of the teacher so that the target students and the voice difference condition of the target students are identified in the online interactive interface of the second client of the teacher.
For example, the information of the target student and the voice difference condition may be carried in the voice exception prompt, so as to identify the target student in the online interactive interface of the second client, and to indicate the voice difference condition in the video window of the target student.
The voice difference condition can be a condition that the reading progress of the article is obviously advanced or delayed relative to other students, the reading content is wrong, and the like, so that the reading of the article is not synchronous with that of other students.
It should be understood that the order of steps S505 to S507 relative to steps S503 to S504 is not limited to that shown in fig. 5; in practical applications, steps S505 to S507 may be executed while steps S503 to S504 are being executed, which is not limited here.
It can be understood that the teacher may also request to monitor a certain student through the second client, so that the online classroom system can provide the teacher with the voice (which may also have video images) of the student that the teacher chooses to monitor, which may be similar to the foregoing embodiment, and is not described herein again.
It should be noted that, the embodiment is described by taking an online classroom as an example, but other interactive systems are also applicable to the embodiment, and are not described herein again.
The application also provides a voice synchronous output device corresponding to the voice synchronous output method. Referring to fig. 6, which shows a schematic structural diagram of an embodiment of a speech synchronous output device according to the present application, the device of the present embodiment may be applied to a server of an interactive system; or to a client connected to the interactive system, such as the client of the first user. The apparatus of this embodiment may include:
a data obtaining unit 601, configured to obtain user media data of at least two first users, where a first client of the first user establishes a communication connection with an interactive system, the user media data of the first user includes voice data of the first user, and the interactive system is a platform capable of implementing synchronous sharing of multimedia data of multiple users among multiple users based on a network;
a content recognition unit 602 configured to recognize a voice content included in the voice data of the first user;
a group determining unit 603, configured to determine, based on the voice content of the first user, a voice play group to which the first user belongs, where the voice play group includes at least two first users that meet a condition, and the condition includes: the matching degree between the voice contents meets a first condition;
a voice output unit 604, configured to output, to a first client of the first user, voice data of a first user other than the first user in a voice play group to which the first user belongs.
In one possible implementation, the conditions in the group determining unit further include:
the association relation between the user attribute features satisfies a second condition.
Preferably, the association relationship between the user attribute features satisfies a second condition, including:
the similarity of sound features exceeds a first threshold, and the sound features are determined based on the voice data of the first user;
and/or the similarity of the user portrait features exceeds a second threshold;
and/or the user relationship between the first users has relevance;
and/or the geographic locations belong to the same geographic area range.
In another possible implementation manner, the group determining unit is specifically configured to determine, based on the voice content of the first user and the group size set by the first user, the voice play group to which the first user belongs, where the number of first users in the voice play group is not greater than the set group size.
In yet another possible implementation manner, the interactive system further establishes a communication connection with a second client of a second user, and the interactive system is configured to distribute the multimedia data transmitted by the second client to the first clients of the at least two first users;
the device further comprises:
an abnormal user determining unit, configured to determine, from the at least two first users, at least one target first user whose difference degree between the voice contents of the at least two first users and the voice contents of other first users meets a condition, based on the voice contents of the at least two first users;
and the abnormal voice content prompt unit is used for sending a voice content abnormal prompt to a second client of the second user, wherein the voice content abnormal prompt indicates that the voice content of the target first user is abnormal, so that the target first user is identified in an online interactive interface of the second client.
Optionally, the apparatus further comprises:
a condition determining unit, configured to determine a speech difference condition existing between the speech content of the target first user and the speech content of the other first users;
and the status sending unit is used for sending the voice difference status corresponding to the target first user to the second client so as to display the voice difference status of the target first user in the online interactive interface of the second client.
Optionally, the apparatus further comprises:
a monitoring receiving unit, configured to obtain an independent monitoring request sent by a second client of the second user, where the independent monitoring request indicates a first user selected to be monitored by the second user;
and the monitored data returning unit is used for extracting the voice data of the first user selected to be monitored by the second user from the voice data of the at least two first users, and sending the voice data of the first user selected to be monitored by the second user to the second terminal of the second user, so that the voice data of the first user selected to be monitored by the second user is independently played at the second terminal of the second user.
In yet another aspect, the present application further provides an electronic device, as shown in fig. 7, which shows a schematic structural diagram of a component of the electronic device, where the electronic device may be a server of an interactive system or a client of the interactive system, and the electronic device includes at least a memory 701 and a processor 702;
wherein the processor 702 is configured to perform the voice synchronous output method as in any one of the above embodiments.
The memory is used for storing programs required for the processor to perform operations.
It is understood that the electronic device may further comprise a display unit 703, an input unit 704 and a communication bus 705. Of course, the electronic device may have more or less components than those shown in fig. 7, which is not limited thereto.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of synchronized output of speech, comprising:
obtaining user media data of at least two first users, wherein a first client of each first user is in communication connection with an interactive system, the user media data of each first user comprises voice data of that first user, and the interactive system is a platform capable of synchronously sharing users' multimedia data over a network;
recognizing voice content contained in the voice data of the first user;
determining, based on the voice content of the first user, a voice playing group to which the first user belongs, wherein the voice playing group comprises at least two first users meeting a condition, and the condition comprises: the matching degree between the voice contents satisfying a first condition;
and outputting, to the first client of the first user, the voice data of the first users in the voice playing group other than the first user.
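Claim 1 leaves the "matching degree" measure open. As one illustrative reading (not the patent's specified method), a greedy grouping over recognized transcripts could look like the following sketch, where the `SequenceMatcher` similarity and the 0.6 threshold are assumptions:

```python
from difflib import SequenceMatcher


def build_play_groups(user_contents, threshold=0.6):
    """Greedily group first users whose recognized voice contents match
    above a threshold (standing in for the 'first condition')."""
    groups = []  # each group: list of (user_id, content)
    for uid, content in user_contents.items():
        for group in groups:
            rep = group[0][1]  # compare against the group's first member
            if SequenceMatcher(None, content, rep).ratio() >= threshold:
                group.append((uid, content))
                break
        else:
            groups.append([(uid, content)])
    return [[uid for uid, _ in g] for g in groups]
```

Each first user would then receive the voice data of the other members of their group only.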
2. The method of claim 1, wherein the condition further comprises:
an association between user attribute features satisfying a second condition.
3. The method of claim 2, wherein the association between the user attribute features satisfying the second condition comprises:
the similarity of sound features exceeds a first threshold, and the sound features are determined based on the voice data of the first user;
and/or the similarity of the user portrait features exceeds a second threshold;
and/or the user relationship between the first users has relevance;
and/or the geographic locations belong to the same geographic area range.
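The and/or chain of claim 3 can be read as a disjunction of tests. A hypothetical sketch follows, in which the field names, the cosine similarity measure, and the threshold values are all assumptions for illustration, not taken from the patent:

```python
def association_condition_met(u1, u2, voice_thresh=0.8, portrait_thresh=0.7):
    """Check whether any claim-3 association holds between two first users:
    sound-feature similarity, user-portrait similarity, an existing user
    relationship, or a shared geographic area."""
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = (sum(x * x for x in a) ** 0.5) * (sum(y * y for y in b) ** 0.5)
        return dot / norm if norm else 0.0

    return (
        cosine(u1["voice_feature"], u2["voice_feature"]) > voice_thresh  # first threshold
        or cosine(u1["portrait"], u2["portrait"]) > portrait_thresh      # second threshold
        or u2["user_id"] in u1["relations"]                              # user relationship
        or u1["geo_area"] == u2["geo_area"]                              # same geographic area
    )
```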
4. The method of claim 1, wherein the determining a voice playing group to which the first user belongs based on the voice content of the first user comprises:
determining the voice playing group to which the first user belongs based on the voice content of the first user and a group size set by the first user, wherein the number of first users in the voice playing group does not exceed the group size.
5. The method of claim 1, wherein the interactive system further establishes a communication connection with a second client of a second user, and the interactive system is configured to distribute multimedia data transmitted by the second client to a first client of the at least two first users;
the method further comprises the following steps:
determining, from the at least two first users and based on their voice contents, at least one target first user whose degree of difference from the voice contents of the other first users meets a condition;
and sending a voice content abnormality prompt to the second client of the second user, wherein the prompt indicates that the voice content of the target first user is abnormal, so that the target first user is identified in the online interactive interface of the second client.
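Claim 5 does not fix how the difference degree is measured. One hypothetical sketch flags users whose average transcript similarity to everyone else falls below an assumed threshold; the scoring method, threshold, and names are illustrative only:

```python
from difflib import SequenceMatcher


def find_abnormal_users(user_contents, max_avg_similarity=0.4):
    """Flag first users whose recognized voice content differs too much
    from every other first user's content."""
    abnormal = []
    for uid, content in user_contents.items():
        others = [c for other, c in user_contents.items() if other != uid]
        if not others:
            continue
        avg = sum(SequenceMatcher(None, content, o).ratio() for o in others) / len(others)
        if avg < max_avg_similarity:
            abnormal.append(uid)  # candidate for the abnormality prompt
    return abnormal
```

The flagged user ids could then drive the abnormality prompt sent to the second client.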
6. The method of claim 5, further comprising:
determining a voice difference condition existing between the voice content of the target first user and the voice content of other first users;
and sending the voice difference condition corresponding to the target first user to the second client so as to display the voice difference condition of the target first user in the online interactive interface of the second client.
7. The method of claim 5, further comprising:
obtaining an independent monitoring request sent by a second client of the second user, wherein the independent monitoring request indicates a first user selected to be monitored by the second user;
and extracting, from the voice data of the at least two first users, the voice data of the first user selected to be monitored by the second user, and sending that voice data to the second terminal of the second user, so that it is played independently at the second terminal of the second user.
8. A speech synchronous output device comprising:
a data acquisition unit, configured to obtain user media data of at least two first users, wherein a first client of each first user is in communication connection with an interactive system, the user media data of each first user comprises voice data of that first user, and the interactive system is a platform capable of synchronously sharing users' multimedia data over a network;
a content recognition unit configured to recognize a voice content included in the voice data of the first user;
a group determining unit, configured to determine, based on the voice content of the first user, a voice play group to which the first user belongs, where the voice play group includes at least two first users that meet a condition, and the condition includes: the matching degree between the voice contents meets a first condition;
and the voice output unit is used for outputting the voice data of other first users except the first user in the voice playing group to which the first user belongs to the first client of the first user.
9. The apparatus of claim 8, wherein the interactive system further establishes a communication connection with a second client of a second user, and the interactive system is configured to distribute multimedia data transmitted by the second client to a first client of the at least two first users;
the device further comprises:
an abnormal user determining unit, configured to determine, from the at least two first users and based on their voice contents, at least one target first user whose degree of difference from the voice contents of the other first users meets a condition;
and a voice content abnormality prompt unit, configured to send a voice content abnormality prompt to the second client of the second user, wherein the prompt indicates that the voice content of the target first user is abnormal, so that the target first user is identified in the online interactive interface of the second client.
10. An electronic device, comprising:
a memory and a processor;
wherein the processor is configured to perform the voice synchronous output method according to any one of claims 1 to 7;
the memory is used for storing programs needed by the processor to perform operations.
CN202010618903.1A 2020-06-30 2020-06-30 Voice synchronous output method and device and electronic equipment Pending CN111798848A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010618903.1A CN111798848A (en) 2020-06-30 2020-06-30 Voice synchronous output method and device and electronic equipment


Publications (1)

Publication Number Publication Date
CN111798848A true CN111798848A (en) 2020-10-20

Family

ID=72810771

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010618903.1A Pending CN111798848A (en) 2020-06-30 2020-06-30 Voice synchronous output method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111798848A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20080065205A (en) * 2007-01-08 2008-07-11 임영희 Customized learning system, customized learning method, and learning device
JP2016004066A (en) * 2014-06-13 2016-01-12 株式会社Nttドコモ Management device, conversation system, conversation management method, and program
CN108965904A (en) * 2018-09-05 2018-12-07 北京优酷科技有限公司 A kind of volume adjusting method and client of direct broadcasting room
CN109671429A (en) * 2018-12-02 2019-04-23 腾讯科技(深圳)有限公司 Voice interactive method and equipment
CN110491383A (en) * 2019-09-25 2019-11-22 北京声智科技有限公司 A kind of voice interactive method, device, system, storage medium and processor



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination