CN112261337B - Method and equipment for playing voice information in multi-person voice

Info

Publication number
CN112261337B
Authority
CN
China
Prior art keywords: user, virtual, information, target, users
Legal status
Active
Application number
CN202011049085.4A
Other languages
Chinese (zh)
Other versions
CN112261337A (en)
Inventor
程翰
Current Assignee
Shanghai Lianshang Network Technology Co Ltd
Original Assignee
Shanghai Lianshang Network Technology Co Ltd
Application filed by Shanghai Lianshang Network Technology Co Ltd
Priority to CN202011049085.4A
Publication of CN112261337A
Priority to PCT/CN2021/119542 (WO2022068640A1)
Application granted
Publication of CN112261337B

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/14: Systems for two-way working
    • H04N7/141: Systems for two-way working between two video terminals, e.g. videophone
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04M: TELEPHONIC COMMUNICATION
    • H04M3/00: Automatic or semi-automatic exchanges
    • H04M3/42: Systems providing special services or facilities to subscribers
    • H04M3/56: Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities
    • H04M3/568: Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities, with audio processing specific to telephonic conferencing, e.g. spatial distribution, mixing of participants
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00: Television systems
    • H04N7/14: Systems for two-way working
    • H04N7/15: Conference systems
    • H04N7/157: Conference systems defining a virtual conference space and using avatars or agents
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04W: WIRELESS COMMUNICATION NETWORKS
    • H04W4/00: Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02: Services making use of location information

Abstract

The application aims to provide a method and equipment for playing voice information in multi-person voice. The method comprises: for a target user among a plurality of users participating in the multi-person voice, determining virtual position information of the other users in a virtual sound field corresponding to the target user, and generating virtual sound field information corresponding to the target user according to the virtual position information; and sending the virtual sound field information to the user equipment corresponding to the target user, so that the user equipment plays the voice information of each of the other users according to that user's virtual position information in the virtual sound field. In this way, each user can clearly and accurately distinguish each speaker's voice in the multi-person voice, and can intuitively and quickly know which other user is currently speaking, which provides great convenience to users in multi-person voice.

Description

Method and equipment for playing voice information in multi-person voice
Technical Field
The present application relates to the field of communications, and in particular, to a technique for playing voice information in multi-person voice.
Background
With the development of the times, voice communication has become one of the most popular communication modes. In the prior art, multi-person voice communication means that a plurality of users use clients on terminal devices, such as mobile phones and PCs, to communicate with each other in real time by voice over a network. A common multi-person voice scheme is that each client receives the real-time voice information of the other clients, mixes the received real-time voice information locally to obtain locally mixed voice information, and plays the locally mixed voice information.
Disclosure of Invention
An object of the present application is to provide a method and apparatus for playing voice information in multi-person voice.
According to one aspect of the application, a method for playing voice information in multi-person voice, applied at a network device, is provided, and the method includes:
for a target user among a plurality of users participating in the multi-person voice, determining virtual position information of the other users in a virtual sound field corresponding to the target user, and generating virtual sound field information corresponding to the target user according to the virtual position information;
and sending the virtual sound field information to the user equipment corresponding to the target user, so that the user equipment plays the voice information of each of the other users according to that user's virtual position information in the target virtual sound field.
According to an aspect of the present application, there is provided a network device for playing voice information in multi-person voice, the device including:
module 1-1, configured to determine, for a target user among a plurality of users participating in the multi-person voice, virtual position information of the other users in a virtual sound field corresponding to the target user, and to generate virtual sound field information corresponding to the target user according to the virtual position information;
and module 1-2, configured to send the virtual sound field information to the user equipment corresponding to the target user, so that the user equipment plays the voice information of each of the other users according to that user's virtual position information in the target virtual sound field.
According to an aspect of the present application, there is provided an apparatus for playing voice information in multi-person voice, wherein the apparatus comprises:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to:
for a target user among a plurality of users participating in multi-person voice, determine virtual position information of the other users in a virtual sound field corresponding to the target user, and generate virtual sound field information corresponding to the target user according to the virtual position information;
and send the virtual sound field information to the user equipment corresponding to the target user, so that the user equipment plays the voice information of each of the other users according to that user's virtual position information in the target virtual sound field.
According to one aspect of the application, there is provided a computer-readable medium storing instructions that, when executed, cause a system to:
for a target user among a plurality of users participating in multi-person voice, determine virtual position information of the other users in a virtual sound field corresponding to the target user, and generate virtual sound field information corresponding to the target user according to the virtual position information;
and send the virtual sound field information to the user equipment corresponding to the target user, so that the user equipment plays the voice information of each of the other users according to that user's virtual position information in the target virtual sound field.
Compared with the prior art, the present application determines, for each of a plurality of users participating in multi-person voice, the virtual position information of the other users in the virtual sound field corresponding to that user, and then plays the voice information of the other users according to their virtual position information in that virtual sound field. As a result, each user can clearly and accurately distinguish each speaker's voice in the multi-person voice, and can intuitively and quickly know which other user is currently speaking, which provides great convenience to users in multi-person voice.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
Fig. 1 is a flowchart illustrating a method for playing voice information in multi-person voice applied to a network device according to an embodiment of the present application;
Fig. 2 is a diagram illustrating a network device for playing voice information in multi-person voice according to an embodiment of the present application;
Fig. 3 illustrates an exemplary system that can be used to implement the various embodiments described in this application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
The present application is described in further detail below with reference to the attached figures.
In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (e.g., central processing units (CPUs)), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PCM), programmable random access memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The device referred to in this application includes, but is not limited to, a user device, a network device, or a device formed by integrating a user device and a network device through a network. The user device includes, but is not limited to, any mobile electronic product capable of human-computer interaction with a user (e.g., through a touch panel), such as a smartphone or a tablet computer, and the mobile electronic product may employ any operating system, such as the Android operating system or the iOS operating system. The network device includes an electronic device capable of automatically performing numerical calculation and information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a programmable logic device (PLD), a field-programmable gate array (FPGA), a digital signal processor (DSP), an embedded device, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud of multiple servers; here, the cloud is composed of a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a virtual supercomputer consisting of a collection of loosely coupled computers. The network in which the device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN, a wireless ad hoc network, and the like. Preferably, the device may also be a program running on the user device, the network device, or a device formed by integrating the user device and the network device, the touch terminal, or the network device and the touch terminal through a network.
Of course, those skilled in the art will appreciate that the foregoing is by way of example only, and that other existing or future devices, which may be suitable for use in the present application, are also encompassed within the scope of the present application and are hereby incorporated by reference.
In the description of the present application, "a plurality" means two or more unless specifically defined otherwise.
Fig. 1 shows a flowchart of a method for playing voice information in multi-person voice applied to a network device according to an embodiment of the present application, where the method includes step S11 and step S12. In step S11, for a target user among a plurality of users participating in the multi-person voice, the network device determines virtual position information of the other users in a virtual sound field corresponding to the target user, and generates virtual sound field information corresponding to the target user according to the virtual position information; in step S12, the network device sends the virtual sound field information to the user equipment corresponding to the target user, so that the user equipment plays the voice information of each of the other users according to that user's virtual position information in the target virtual sound field.
In step S11, for a target user among the plurality of users participating in the multi-person voice, the network device determines virtual position information of the other users in a virtual sound field corresponding to the target user, and generates virtual sound field information corresponding to the target user according to the virtual position information. In some embodiments, the target user is each of the plurality of users participating in the multi-person voice. In some embodiments, the virtual sound field is a relative coordinate system, which may be a two-dimensional plane coordinate system or a three-dimensional space coordinate system; each user corresponds to one virtual sound field, the virtual position of another user is the coordinate point corresponding to that user in this user's virtual sound field, the virtual position information is the coordinate value of that coordinate point, and the virtual position corresponding to the user himself in his own virtual sound field is the coordinate origin. For example, in the virtual sound field of User1, the virtual position information corresponding to User1 is (0, 0) and that corresponding to User2 is (0, 1); in the virtual sound field of User2, the virtual position information corresponding to User1 is (0, -1) and that corresponding to User2 is (0, 0). In some embodiments, the coordinate axis unit of a user's virtual sound field is a predetermined distance interval, for example, 1 cm, 10 cm, or 1 m, and the coordinate axis directions are predetermined directions relative to the user, for example, the positive direction of the X axis is to the user's right and the positive direction of the Y axis is to the user's front. In some embodiments, the relative distance information and relative direction information between two users can be obtained from one user's virtual position information in the other user's virtual sound field, together with the coordinate axis unit and coordinate axis directions of that virtual sound field. For example, in the virtual sound field of User1, the positive direction of the X axis is to the right of User1, the positive direction of the Y axis is in front of User1, the unit of the X and Y axes is 1 meter, the virtual position information corresponding to User1 is (0, 0), and the virtual position information corresponding to User2 is (0, 1); it follows that User2 is 1 meter directly in front of User1. In some embodiments, for each user in the multi-person voice, the virtual sound field information corresponding to the user includes, but is not limited to, the coordinate axis directions and coordinate axis unit of the user's virtual sound field, and the virtual position information (i.e., the coordinate values of the coordinate points) of each other user in that virtual sound field.
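The computation of relative distance and direction from virtual position information described above can be sketched as follows. This is an illustrative example only, not the patented implementation; the function name, the azimuth convention (degrees clockwise from straight ahead), and the choice of Python are assumptions.

```python
# Illustrative sketch only (not the patented implementation): deriving the
# relative distance and direction of a speaker from a listener's virtual
# sound field, assuming a 2D coordinate system with +X = the listener's
# right, +Y = the listener's front, and a configurable axis unit in meters.
import math

def relative_position(axis_unit_m, listener_xy, speaker_xy):
    """Return (distance_m, azimuth_deg) of the speaker relative to the listener.

    Azimuth is measured clockwise from straight ahead, so 90 degrees means
    directly to the listener's right; this convention is an assumption.
    """
    dx = (speaker_xy[0] - listener_xy[0]) * axis_unit_m
    dy = (speaker_xy[1] - listener_xy[1]) * axis_unit_m
    distance = math.hypot(dx, dy)
    azimuth = math.degrees(math.atan2(dx, dy)) % 360
    return distance, azimuth

# User1 sits at the origin of its own virtual sound field; User2 is at (0, 1):
print(relative_position(1.0, (0, 0), (0, 1)))   # (1.0, 0.0): 1 m straight ahead
print(relative_position(1.0, (0, 0), (0, -2)))  # (2.0, 180.0): 2 m directly behind
```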
In step S12, the network device sends the virtual sound field information to the user equipment corresponding to the target user, so that the user equipment plays the voice information of each of the other users according to that user's virtual position information in the target virtual sound field. In some embodiments, for each user in the multi-person voice, the voice information of another user may be sent from that user's user equipment to this user's user equipment via the network device, or via a P2P connection established between the two user equipments. In some embodiments, for each user in the multi-person voice, when voice information sent by some other user is received, the relative distance information and relative direction information of that other user with respect to this user can be obtained according to the other user's virtual position information in this user's virtual sound field and the coordinate axis directions and coordinate axis unit of that virtual sound field, and the voice information can then be played according to the relative distance information and relative direction information. For example, in the virtual sound field of User1, the positive direction of the X axis is to the right of User1, the positive direction of the Y axis is in front of User1, the unit of the X and Y axes is 1 meter, the virtual position information corresponding to User1 is (0, 0), and the virtual position information corresponding to User2 is (0, -2); it follows that User2 is 2 meters directly behind User1, and User2's voice information is played accordingly. In some embodiments, playing the voice information according to the relative distance information and relative direction information may involve filtering, delaying, and similar processing of the voice information through a head-related transfer function (HRTF) before outputting it to the speaker of the user equipment. In this way, the voice of each user can be clearly and accurately distinguished even when several other users speak simultaneously in the multi-person voice, and the user can intuitively and quickly know which other user is currently speaking, which provides great convenience in multi-person voice.
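As a rough stand-in for the HRTF-based rendering mentioned above, the following sketch spatializes a mono voice frame using inverse-distance attenuation and constant-power stereo panning. This is an approximation for illustration only; a real system would convolve the signal with measured HRTF filter pairs, which (unlike panning) can also distinguish a source in front from one behind.

```python
# Illustrative approximation only: distance attenuation plus constant-power
# panning in place of true HRTF filtering. Panning cannot distinguish a
# source in front from one behind; that is precisely what HRTFs add.
import math

def spatialize(mono_samples, distance_m, azimuth_deg, ref_distance_m=1.0):
    """Return (left, right) sample lists for one mono frame."""
    gain = ref_distance_m / max(distance_m, ref_distance_m)  # inverse-distance falloff
    pan = math.sin(math.radians(azimuth_deg))  # -1 = hard left, +1 = hard right
    theta = (pan + 1.0) * math.pi / 4.0        # constant-power pan law
    left_gain, right_gain = gain * math.cos(theta), gain * math.sin(theta)
    return ([s * left_gain for s in mono_samples],
            [s * right_gain for s in mono_samples])
```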
In some embodiments, the determining, for a target user among a plurality of users participating in multi-person voice, virtual position information of the other users in a virtual sound field corresponding to the target user includes step S13 (not shown), step S14 (not shown), and step S15 (not shown). In step S13, the network device determines virtual scene information corresponding to the multi-person voice; in step S14, the network device determines, according to the virtual scene information, the virtual position corresponding to each of the plurality of users; in step S15, the network device determines, according to the virtual position corresponding to the target user and the virtual positions corresponding to the other users, the virtual position information of the other users in the virtual sound field corresponding to the target user. In some embodiments, the virtual scene may be a virtual two-dimensional scene or a virtual three-dimensional scene, such as a virtual meeting room or a virtual classroom. In some embodiments, the virtual scene information includes, but is not limited to, visualization information of the virtual scene and configuration information of the virtual scene, where the visualization information is used to visually present the virtual scene to a user by means of a 2D scene image or a 3D scene model, so that the user may determine, or browse, his or her own virtual position in the virtual scene, and the configuration information is used to obtain, from the virtual positions of two users in the virtual scene, the relative distance information and relative direction information between the two users and to determine the virtual position information of the two users in each other's virtual sound fields. In some embodiments, the virtual scene may be selected by the voice-initiating user from a plurality of default virtual scenes; or at least one target virtual scene may be selected from the plurality of default virtual scenes by at least one of the plurality of users, and the target virtual scene selected by the most users is then determined from the at least one target virtual scene as the virtual scene corresponding to the multi-person voice; or a target default virtual scene matching the voice topic information of the multi-person voice may be automatically determined from the plurality of default virtual scenes. In some embodiments, the virtual position of each user in the virtual scene may be determined for each user by the voice-initiating user, or determined by each user individually, or determined from a plurality of predetermined virtual positions according to the user information corresponding to each user, where the tag information (e.g., "platform") of a user's virtual position in the virtual scene (e.g., a virtual classroom) matches the user information (e.g., "language teacher") corresponding to that user.
In some embodiments, after determining the virtual scene corresponding to the multi-person voice, the network device sends the virtual scene information corresponding to the virtual scene to each user in the multi-person voice; the virtual scene is then visually presented to each user through a 2D scene image or 3D scene model according to the visualization information in the virtual scene information, and each user determines his or her virtual position in the virtual scene through a predetermined interactive operation (e.g., clicking) in the 2D scene image or 3D scene model. Alternatively, the virtual scene information is sent only to the voice-initiating user, and the voice-initiating user determines the virtual position in the virtual scene of every user in the multi-person voice. In some embodiments, for each user, the relative distance information and relative direction information between the user and each other user may be obtained according to the user's virtual position in the virtual scene, each other user's virtual position in the virtual scene, and the configuration information corresponding to the virtual scene, and the virtual position information of each other user in the virtual sound field corresponding to the user may be determined accordingly. In some embodiments, the network device may send each user's virtual position in the virtual scene to every user and present it on the corresponding user equipment, so that each user may know the virtual positions of himself and every other user in the virtual scene. In some embodiments, each user equipment visually presents, in the 2D scene image or 3D scene model corresponding to the virtual scene information, the user's own virtual position and the virtual position corresponding to each other user, so that each user can intuitively and quickly know the relative distance and relative direction of the other users in the virtual scene with respect to himself; e.g., the user equipment can present the identification information (e.g., user name, user ID) of the corresponding user at each virtual position in the 2D scene image or 3D scene model.
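The scene-to-sound-field conversion above can be read as a per-user coordinate translation. The sketch below is illustrative only and assumes a 2D scene, that every user faces the scene's +Y direction (so no rotation is needed), and a flat dict data model.

```python
# Illustrative sketch: converting shared scene positions into each user's
# egocentric virtual sound field, assuming all users face the scene's +Y axis.
def build_sound_fields(scene_positions):
    """scene_positions: dict user_id -> (x, y) in scene coordinates.

    Returns dict user_id -> {other_id: (dx, dy)}, where (dx, dy) is the other
    user's virtual position in this user's sound field (self sits at (0, 0)).
    """
    return {
        uid: {
            other: (ox - ux, oy - uy)
            for other, (ox, oy) in scene_positions.items()
            if other != uid
        }
        for uid, (ux, uy) in scene_positions.items()
    }

fields = build_sound_fields({"User1": (0, 0), "User2": (0, 1)})
print(fields["User1"]["User2"])  # (0, 1): User2 is in front of User1
print(fields["User2"]["User1"])  # (0, -1): User1 is behind User2
```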
In some embodiments, step S13 includes: the network device obtains the identification information corresponding to the target virtual scene information selected by the voice-initiating user from a plurality of default virtual scene information, and determines the target virtual scene information as the virtual scene information corresponding to the multi-person voice. In some embodiments, the voice-initiating user selects a target virtual scene from the plurality of default virtual scenes and sends the identification information (e.g., scene name, scene ID) corresponding to the target virtual scene to the network device. For example, the plurality of default virtual scenes include virtual meeting room 1, virtual meeting room 2, virtual classroom 1, and virtual classroom 2; the voice-initiating user selects virtual meeting room 1 as the target virtual scene and sends the corresponding identification information "virtual meeting room 1" to the network device.
In some embodiments, step S13 includes: the network device obtains at least one target virtual scene information selected by at least one of the plurality of users from a plurality of default virtual scene information, and determines, from the at least one target virtual scene information, the virtual scene information selected the most times as the virtual scene information corresponding to the multi-person voice. In some embodiments, each user may select one or more target virtual scenes from the plurality of default virtual scenes and send the corresponding identification information to the network device; the network device then determines, from the selected target virtual scenes, the one selected by the most users as the virtual scene corresponding to the multi-person voice. Preferably, each user may select only one target virtual scene from the plurality of default virtual scenes.
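Selecting the most-chosen scene reduces to a frequency count, as in the minimal sketch below; tie-breaking is not specified in the text, so here Counter's internal ordering decides, which is an assumption.

```python
# Illustrative sketch: pick the default scene selected by the most users.
from collections import Counter

def pick_scene(selections):
    """selections: iterable of scene IDs, one per responding user."""
    scene, _ = Counter(selections).most_common(1)[0]
    return scene

print(pick_scene(["meeting_room_1", "classroom_1", "meeting_room_1"]))
# -> "meeting_room_1"
```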
In some embodiments, step S13 includes: the network device determines, according to the voice topic information corresponding to the multi-person voice, target default virtual scene information matching the voice topic information from a plurality of default virtual scene information, and determines the target default virtual scene information as the virtual scene information corresponding to the multi-person voice. In some embodiments, the voice topic information corresponding to the multi-person voice may be input by the voice-initiating user and then sent to the network device, or the voice-initiating user may select, from a plurality of preset default voice topic information, the voice topic information corresponding to the multi-person voice and send the corresponding identification information (e.g., topic name, topic ID) to the network device; the voice topic information represents the topic of the multi-person voice, including but not limited to "meeting", "technical sharing", and the like. In some embodiments, a default virtual scene matching the voice topic information is determined from the plurality of default virtual scenes as the virtual scene corresponding to the multi-person voice. For example, the plurality of default virtual scenes include a virtual meeting room, a virtual classroom, and a virtual coffee shop; according to the voice topic information "meeting" corresponding to the multi-person voice, the matching default virtual scene "virtual meeting room" is determined as the virtual scene corresponding to the multi-person voice.
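The topic-to-scene matching rule is left unspecified; a lookup table is one minimal possibility, sketched below with hypothetical topic and scene names.

```python
# Illustrative sketch only: a table mapping voice topics to default virtual
# scenes. The actual matching rule and all names here are assumptions.
TOPIC_TO_SCENE = {
    "meeting": "virtual meeting room",
    "technical sharing": "virtual classroom",
}

def scene_for_topic(topic, fallback="virtual meeting room"):
    return TOPIC_TO_SCENE.get(topic, fallback)

print(scene_for_topic("meeting"))  # -> "virtual meeting room"
```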
In some embodiments, step S13 includes step S16 (not shown). In step S16, the network device determines, according to the user information corresponding to the plurality of users, target default virtual scene information matching the user information from a plurality of default virtual scene information, and determines the target default virtual scene information as the virtual scene information corresponding to the multi-person voice. In some embodiments, a default virtual scene matching the user information corresponding to each of the plurality of users, or matching the user information corresponding to the voice-initiating user among the plurality of users, is determined from the plurality of default virtual scene information as the virtual scene corresponding to the multi-person voice.
In some embodiments, the determining, according to the user information corresponding to the plurality of users, target default virtual scene information matching the user information from a plurality of default virtual scene information includes: the network device determines, according to the user information corresponding to the voice-initiating user among the plurality of users, the target default virtual scene information matching that user information from the plurality of default virtual scene information. For example, the plurality of default virtual scenes include a virtual meeting room, a virtual classroom, and a virtual coffee shop, and the user information corresponding to the voice-initiating user includes "occupation: teacher"; then, based on the user information "occupation: teacher", the matching default virtual scene "virtual classroom" is determined as the virtual scene corresponding to the multi-person voice.
In some embodiments, the determining, according to the user information corresponding to the plurality of users, target default virtual scene information matching the user information from a plurality of default virtual scene information includes: the network device determines, according to the user information corresponding to each of the plurality of users, at least one default virtual scene information matching that user information from the plurality of default virtual scene information, and determines the target default virtual scene information from the at least one default virtual scene information, where the target default virtual scene information matches the largest number of users. In some embodiments, for each of the plurality of users, a default virtual scene matching the user's information is determined from the plurality of default virtual scenes. In some embodiments, the default virtual scene matching the largest number of users among the at least one matching default virtual scene is determined as the virtual scene corresponding to the multi-person voice. For example, the plurality of users corresponding to the multi-person voice are User1, User2, and User3, and the plurality of default virtual scenes include a virtual meeting room, a virtual classroom, and a virtual coffee shop. The user information of User1 includes "occupation: teacher", so the default virtual scene matching User1 is the virtual classroom; the user information of User2 includes "occupation: student", so the default virtual scene matching User2 is also the virtual classroom; and the user information of User3 includes a hobby matching the virtual coffee shop. The virtual scene matching the largest number of users, the "virtual classroom", is therefore determined as the virtual scene corresponding to the multi-person voice.
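Counting, per user, which default scene matches and then taking the majority can be sketched as follows; the profile fields and the matching predicate are assumptions mirroring the example above.

```python
# Illustrative sketch: choose the default scene matched by the most users.
from collections import Counter

def match_scene(profile):
    # Hypothetical matching rule mirroring the example in the text.
    if profile.get("occupation") in ("teacher", "student"):
        return "virtual classroom"
    if profile.get("hobby") == "coffee":  # assumed hobby value
        return "virtual coffee shop"
    return "virtual meeting room"

def scene_by_user_match(profiles):
    scene, _ = Counter(match_scene(p) for p in profiles).most_common(1)[0]
    return scene

profiles = [{"occupation": "teacher"}, {"occupation": "student"}, {"hobby": "coffee"}]
print(scene_by_user_match(profiles))  # -> "virtual classroom"
```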
In some embodiments, the virtual scene information includes a plurality of predetermined virtual positions, and step S14 includes: for each of the plurality of users, the network device obtains the target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions, and determines the target predetermined virtual position as the user's virtual position in the virtual scene information. In some embodiments, the virtual scene contains a plurality of predetermined virtual positions, each of which is visually presented to the user in the 2D scene image or 3D scene model corresponding to the virtual scene information; each user corresponds to one target predetermined virtual position among them, which is determined as that user's virtual position in the virtual scene, and preferably each user corresponds to a different target predetermined virtual position. In some embodiments, the virtual position of a user in the virtual scene can only be one of the plurality of predetermined virtual positions, not an arbitrary position in the virtual scene. In some embodiments, the target predetermined virtual position may be selected for each user by the voice-initiating user from the plurality of predetermined virtual positions, or selected by each user for himself, or automatically determined for each user, according to that user's user information, as a predetermined virtual position matching the user information.
In some embodiments, the obtaining, for each of the plurality of users, the target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions includes: for each of the plurality of users, obtaining the target predetermined virtual position designated for that user by the voice-initiating user among the plurality of predetermined virtual positions. In some embodiments, the voice-initiating user specifies the target predetermined virtual position corresponding to each user among the plurality of predetermined virtual positions, and sends the identification information of each user's target predetermined virtual position to the network device.
In some embodiments, the obtaining, for each of the plurality of users, the target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions includes: for each of the plurality of users, determining the target predetermined virtual position among the plurality of predetermined virtual positions according to the user information corresponding to the user, where the tag information of the target predetermined virtual position in the virtual scene information matches the user information corresponding to the user. In some embodiments, for each user, the predetermined virtual position whose corresponding tag information in the virtual scene matches the user's information is automatically determined among the plurality of predetermined virtual positions according to the user information corresponding to the user. For example, the user information of User1 includes "occupation: language teacher", the virtual scene is a virtual classroom, and the tag information corresponding to the predetermined virtual position L1 among the plurality of predetermined virtual positions is "platform"; this tag information matches User1's user information "occupation: language teacher", so the predetermined virtual position L1 can be determined as the target predetermined virtual position corresponding to User1.
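Tag-based position assignment can be sketched as a rule lookup; the rule table, data shapes, and all names below are assumptions for illustration.

```python
# Illustrative sketch: assign a user the first free predetermined position
# whose tag matches the user's profile, per an assumed rule table.
def assign_by_tag(user_info, positions, tag_rules):
    """positions: dict pos_id -> {"tag": str, "taken": bool};
    tag_rules: dict tag -> predicate over the user profile."""
    for pos_id, pos in positions.items():
        rule = tag_rules.get(pos["tag"])
        if not pos["taken"] and rule is not None and rule(user_info):
            return pos_id
    return None

tag_rules = {"platform": lambda u: "teacher" in u.get("occupation", "")}
positions = {"L1": {"tag": "platform", "taken": False},
             "L2": {"tag": "seat", "taken": False}}
print(assign_by_tag({"occupation": "language teacher"}, positions, tag_rules))  # -> "L1"
```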
In some embodiments, the obtaining, for each of the plurality of users, the target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions includes step S17 (not shown), step S18 (not shown), and step S19 (not shown). In step S17, the network device generates virtual position request information and sends it to each of the plurality of users, where the virtual position request information includes the virtual scene information; in step S18, the network device receives feedback information about the virtual position request information sent by at least one of the users, where the feedback information sent by each of the at least one user indicates the target predetermined virtual position selected by that user among the predetermined virtual positions; in step S19, for each of these users, the network device determines, according to the feedback information, the target predetermined virtual position corresponding to the user. In some embodiments, the virtual position request information including the virtual scene information is sent to each user; upon receiving it, each user's equipment presents the 2D scene image or 3D scene model corresponding to the virtual scene information together with the plurality of predetermined virtual positions, each user individually selects one target predetermined virtual position, and feedback information including the identification information of the selected target predetermined virtual position is sent to the network device; after receiving the feedback information sent by a user, the network device obtains the target predetermined virtual position selected by that user. Preferably, each user can only select a different target predetermined virtual position; multiple users cannot select the same one. In some embodiments, the virtual position request information corresponds to a feedback deadline; after the deadline is reached, for each user who has not yet fed back, a target predetermined virtual position may be selected by the voice-initiating user from the at least one predetermined virtual position not yet selected, or automatically assigned by the network device from the at least one predetermined virtual position not yet selected.
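On the server side, the constraint that no two users pick the same position amounts to first-come-first-served bookkeeping, sketched below; the class and method names are assumptions.

```python
# Illustrative sketch: first-come-first-served claims so that two users
# cannot select the same predetermined virtual position.
class PositionRegistry:
    def __init__(self, position_ids):
        self.free = set(position_ids)
        self.assigned = {}  # user_id -> position_id

    def claim(self, user_id, position_id):
        """Return True if the claim succeeds, False if the spot is taken."""
        if position_id not in self.free:
            return False
        self.free.remove(position_id)
        self.assigned[user_id] = position_id
        return True

reg = PositionRegistry({"L1", "L2", "L3"})
print(reg.claim("User1", "L1"))  # True
print(reg.claim("User2", "L1"))  # False -> prompt "not selectable"
```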
In some embodiments, the method further includes: after receiving the feedback information sent by a first user among the plurality of users, the network device generates first prompt information corresponding to the feedback information and sends it to the other users who have not yet fed back, to prompt them that the first target predetermined virtual position indicated by the feedback information is no longer selectable. In some embodiments, after receiving the first user's feedback information selecting the first target predetermined virtual position from the plurality of predetermined virtual positions, the network device generates prompt information corresponding to the feedback information (for example, "the first user has selected the first target predetermined virtual position") and sends it to each other user who has not yet fed back. In some embodiments, upon receiving the prompt information, the user equipment corresponding to each of these other users may set the first target predetermined virtual position to a non-selectable state in the 2D scene image or 3D scene model corresponding to the virtual scene information.
In some embodiments, the method further includes: after receiving the feedback information sent by a first user among the plurality of users, the network device generates second prompt information corresponding to the feedback information and sends it to the users other than the first user, to prompt them that the first target predetermined virtual position indicated by the feedback information has been selected by the first user. In some embodiments, the network device sends the prompt information (e.g., "the first user has selected the first target predetermined virtual position") to each of the other users, so that each user can know the other users' virtual positions in the virtual scene. In some embodiments, the identification information (e.g., user name, user ID) of the first user is presented at the first target predetermined virtual position in the 2D scene image or 3D scene model corresponding to the virtual scene information.
In some embodiments, the method further includes: the network device receives invitation request information sent by a second user among the at least one user, where the second user has selected a second target predetermined virtual position among the plurality of predetermined virtual positions, and the invitation request information is used to invite a third user among the plurality of users, who has not yet fed back, to select a predetermined virtual position near the second target predetermined virtual position; the network device then sends the invitation request information to the third user to prompt the third user to select an unselected predetermined virtual position near the second target predetermined virtual position as the third user's target predetermined virtual position. In some embodiments, the second user has selected the second target predetermined virtual position as his own virtual position in the virtual scene, and in response to an invitation-triggering operation performed by the second user with respect to the third user, invitation request information inviting the third user to select a predetermined virtual position near the second target predetermined virtual position is generated and sent to the network device. In some embodiments, it is first detected whether the third user has already selected a corresponding predetermined virtual position, and the invitation-triggering operation may be performed with respect to the third user only if not. In some embodiments, after receiving the invitation request information for the third user, the network device forwards it to the third user. In some embodiments, upon receiving the invitation request information, at least one unselected predetermined virtual position near the second target predetermined virtual position may be set to a special display state (e.g., highlighted) in the 2D scene image or 3D scene model corresponding to the virtual scene information, to guide the third user to select one of them as the target predetermined virtual position.
In some embodiments, the method further includes: after the predetermined feedback time limit corresponding to the virtual position request information is reached, the network device determines, for each user who has not yet fed back, a target predetermined virtual position corresponding to the user among the at least one predetermined virtual position not yet selected, and determines it as that user's virtual position in the virtual scene information. In some embodiments, the virtual position request information corresponds to a predetermined feedback time limit (e.g., 5 minutes), which may be a network device default or set by the voice-initiating user; after it is reached, for each user who has not fed back, a target predetermined virtual position may be selected by the voice-initiating user from the positions not yet selected, or automatically assigned by the network device from the positions not yet selected. In some embodiments, different predetermined virtual positions in the virtual scene correspond to different priorities according to their respective tag information, and target predetermined virtual positions may be automatically assigned to the users who have not fed back in order of priority from high to low. For example, if the virtual scene is a virtual auditorium, the predetermined virtual positions whose tag information is "first row" have higher priority than those whose tag information is "second row", and the currently unselected "first row" positions are automatically assigned first to the users who have not fed back as their corresponding target predetermined virtual positions.
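Priority-ordered auto-assignment can be sketched as a sort over the remaining positions; the priority values and names below are assumptions.

```python
# Illustrative sketch: auto-assign remaining users to unselected positions in
# descending tag priority ("first row" before "second row"); values assumed.
TAG_PRIORITY = {"first row": 2, "second row": 1}

def auto_assign(pending_users, free_positions):
    """free_positions: dict pos_id -> tag. Returns dict user_id -> pos_id."""
    ordered = sorted(free_positions,
                     key=lambda p: TAG_PRIORITY.get(free_positions[p], 0),
                     reverse=True)
    return dict(zip(pending_users, ordered))

print(auto_assign(["User7"], {"A2": "second row", "A1": "first row"}))
# -> {"User7": "A1"}
```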
In some embodiments, the determining, for each user who has not yet fed back, the target predetermined virtual position corresponding to the user among the at least one predetermined virtual position not yet selected includes: determining hot-spot position area information in the virtual scene information according to the virtual positions, in the virtual scene information, of the at least one user who has already fed back; and, for each user who has not yet fed back, determining an unselected predetermined virtual position within the hot-spot position area as that user's virtual position in the virtual scene information. In some embodiments, after the feedback time limit is reached, a hot-spot position area where the users' virtual positions are densely distributed is determined according to the distribution, in the virtual scene, of the virtual positions of the users who have already fed back; a predetermined virtual position is then automatically assigned to each user who has not fed back, preferentially from the one or more unselected predetermined virtual positions within the hot-spot position area, and, if all predetermined virtual positions in the hot-spot position area have been selected, from the other unselected predetermined virtual positions in the virtual scene.
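The text does not define "hot spot" precisely; one simple density heuristic, sketched below, is to place each remaining user at the free position closest to the centroid of the positions already chosen.

```python
# Illustrative heuristic only: pick the free position nearest the centroid of
# the already-selected positions as a stand-in for "hot spot" assignment.
def assign_near_hotspot(chosen_xy, free_positions):
    """chosen_xy: list of (x, y) already selected;
    free_positions: dict pos_id -> (x, y) not yet selected."""
    if not chosen_xy or not free_positions:
        return None
    cx = sum(x for x, _ in chosen_xy) / len(chosen_xy)
    cy = sum(y for _, y in chosen_xy) / len(chosen_xy)
    return min(free_positions,
               key=lambda p: (free_positions[p][0] - cx) ** 2
                           + (free_positions[p][1] - cy) ** 2)

print(assign_near_hotspot([(0, 0), (1, 0)], {"L5": (0.5, 1), "L9": (5, 5)}))  # -> "L5"
```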
In some embodiments, the determining, for each user who has not yet fed back, the target predetermined virtual position corresponding to the user among the at least one predetermined virtual position not yet selected includes: for each such user, determining a target predetermined virtual position whose tag information in the virtual scene information matches the user information corresponding to the user. In some embodiments, for each user who has not fed back, a predetermined virtual position whose corresponding tag information in the virtual scene matches the user's information is determined from the at least one position not yet selected as that user's target predetermined virtual position. For example, the user information of User1, who has not yet fed back, includes "occupation: teacher"; the virtual scene is a virtual classroom, and the tag information corresponding to the predetermined virtual position L1 among the unselected positions is "platform", which matches User1's user information "occupation: teacher"; the predetermined virtual position L1 can therefore be automatically assigned to User1 as its corresponding target predetermined virtual position.
Fig. 2 is a block diagram of a network device for playing voice information in multi-person voice according to an embodiment of the present application, where the network device includes module 1-1 (11) and module 1-2 (12). Module 1-1 (11) is configured to determine, for a target user among a plurality of users participating in the multi-person voice, virtual position information of the other users in a virtual sound field corresponding to the target user, and to generate virtual sound field information corresponding to the target user according to the virtual position information; module 1-2 (12) is configured to send the virtual sound field information to the user equipment corresponding to the target user, so that the user equipment plays the voice information of each of the other users according to that user's virtual position information in the target virtual sound field.
Here, the specific implementations of module 1-1 (11) and module 1-2 (12) are the same as or similar to the embodiments related to step S11 and step S12 in Fig. 1, and are therefore not repeated here and are incorporated herein by reference.
In some embodiments, the means for determining, for a target user among the plurality of users participating in the multi-person voice, the virtual position information of the other users in the virtual sound field corresponding to the target user includes module 13 (not shown), module 14 (not shown), and module 15 (not shown). Module 13 is configured to determine virtual scene information corresponding to the multi-person voice; module 14 is configured to determine, according to the virtual scene information, a virtual position corresponding to each of the plurality of users; module 15 is configured to determine, according to the virtual position corresponding to the target user and the virtual positions corresponding to the other users, the virtual position information of the other users in the virtual sound field corresponding to the target user. Here, the specific implementations of module 13, module 14, and module 15 are the same as or similar to the embodiments of steps S13, S14, and S15 in Fig. 1 and are therefore not repeated here; they are incorporated herein by reference.
In some embodiments, module 13 is configured to: obtain identification information corresponding to target virtual scene information selected by a voice-initiating user from a plurality of pieces of default virtual scene information, and determine the target virtual scene information as the virtual scene information corresponding to the multi-person voice. Here, the related operations are the same as or similar to those of the embodiment shown in Fig. 1 and are therefore not repeated here; they are incorporated herein by reference.
In some embodiments, module 13 is configured to: obtain at least one piece of target virtual scene information selected by at least one of the users from a plurality of pieces of default virtual scene information, and determine the virtual scene information corresponding to the multi-person voice from the at least one piece of target virtual scene information, where the piece determined is the one selected most frequently. Here, the related operations are the same as or similar to those of the embodiment shown in Fig. 1 and are therefore not repeated here; they are incorporated herein by reference.
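A minimal sketch of this majority-vote selection, assuming each user's choice has already been collected into a mapping from user to scene identifier (the identifiers are illustrative):

    from collections import Counter

    def pick_scene(selections):
        """Return the default scene identifier selected most frequently;
        `selections` maps user id -> chosen scene id."""
        if not selections:
            return None
        return Counter(selections.values()).most_common(1)[0][0]

    print(pick_scene({"User1": "meeting_room", "User2": "classroom",
                      "User3": "meeting_room"}))  # -> meeting_room

Counter.most_common breaks ties by insertion order, so a deployment would still need an explicit tie-breaking rule (for example, preferring the voice-initiating user's choice).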
In some embodiments, module 13 is configured to: determine, according to voice theme information corresponding to the multi-person voice, target default virtual scene information matching the voice theme information from a plurality of pieces of default virtual scene information, and determine the target default virtual scene information as the virtual scene information corresponding to the multi-person voice. Here, the related operations are the same as or similar to those of the embodiment shown in Fig. 1 and are therefore not repeated here; they are incorporated herein by reference.
In some embodiments, module 13 includes module 16 (not shown). Module 16 is configured to determine, according to user information corresponding to the plurality of users, target default virtual scene information matching the user information from a plurality of pieces of default virtual scene information, and to determine the target default virtual scene information as the virtual scene information corresponding to the multi-person voice. Here, the specific implementation of module 16 is the same as or similar to the embodiment of step S16 in Fig. 1 and is therefore not repeated here; it is incorporated herein by reference.
In some embodiments, the determining, according to the user information corresponding to the plurality of users, target default virtual scene information matching the user information from a plurality of pieces of default virtual scene information includes: the network device determines, according to the user information corresponding to the voice-initiating user among the plurality of users, target default virtual scene information matching that user information from the plurality of pieces of default virtual scene information. Here, the related operations are the same as or similar to those of the embodiment shown in Fig. 1 and are therefore not repeated here; they are incorporated herein by reference.
In some embodiments, the determining, according to the user information corresponding to the plurality of users, target default virtual scene information matching the user information from a plurality of pieces of default virtual scene information includes: the network device determines, according to the user information corresponding to each of the plurality of users, at least one piece of default virtual scene information matching that user's information from the plurality of pieces of default virtual scene information, and determines target default virtual scene information from the at least one piece of default virtual scene information, where the target default virtual scene information is the piece matched by the largest number of users. Here, the related operations are the same as or similar to those of the embodiment shown in Fig. 1 and are therefore not repeated here; they are incorporated herein by reference.
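One way to realize this most-matched-users rule, sketched under the assumption (ours, not the source's) that both user information and default scene information have been reduced to comparable tag sets:

    def pick_scene_by_users(user_tags, scene_tags):
        """Return the default scene matched by the most users, where a user
        matches a scene whenever their tag sets intersect."""
        def matched_users(scene):
            return sum(1 for tags in user_tags.values()
                       if tags & scene_tags[scene])
        return max(scene_tags, key=matched_users)

    print(pick_scene_by_users(
        {"User1": {"game"}, "User2": {"game"}, "User3": {"work"}},
        {"arena": {"game"}, "office": {"work"}}))  # -> arena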
In some embodiments, the virtual scene information includes a plurality of predetermined virtual positions, and module 14 is configured to: obtain, for each of the plurality of users, a target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions, and determine the target predetermined virtual position as the virtual position of the user in the virtual scene information. Here, the related operations are the same as or similar to those of the embodiment shown in Fig. 1 and are therefore not repeated here; they are incorporated herein by reference.
In some embodiments, the obtaining, for each of the plurality of users, a target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions includes: for each of the plurality of users, obtaining the target predetermined virtual position that a voice-initiating user among the plurality of users has designated for the user among the plurality of predetermined virtual positions. Here, the related operations are the same as or similar to those of the embodiment shown in Fig. 1 and are therefore not repeated here; they are incorporated herein by reference.
In some embodiments, the obtaining, for each of the plurality of users, a target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions includes: for each of the plurality of users, determining a target predetermined virtual position among the plurality of predetermined virtual positions according to the user information corresponding to the user, where the label information of the target predetermined virtual position in the virtual scene information matches the user information corresponding to the user. Here, the related operations are the same as or similar to those of the embodiment shown in Fig. 1 and are therefore not repeated here; they are incorporated herein by reference.
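A sketch of this label matching for a single user, again treating user information and position label information as tag sets (the names are illustrative assumptions):

    def pick_position_by_label(user_tags, position_labels):
        """Return the predetermined virtual position whose label information
        shares the most tags with the user's information; `position_labels`
        maps position id -> set of label tags."""
        return max(position_labels,
                   key=lambda pos: len(position_labels[pos] & user_tags))

    print(pick_position_by_label(
        {"host"}, {"podium": {"host"}, "row1": {"guest"}}))  # -> podium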
In some embodiments, the means for obtaining, for each of the plurality of users, a target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions includes module 17 (not shown), module 18 (not shown), and module 19 (not shown). Module 17 is configured to generate virtual position request information and send it to each of the plurality of users, where the virtual position request information includes the virtual scene information; module 18 is configured to receive feedback information about the virtual position request information sent by at least one of the users, where the feedback information sent by each of the at least one user indicates the target predetermined virtual position selected by that user among the plurality of predetermined virtual positions; module 19 is configured to determine, for each of the plurality of users, the target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions according to the feedback information. Here, the specific implementations of module 17, module 18, and module 19 are the same as or similar to the embodiments of steps S17, S18, and S19 in Fig. 1 and are therefore not repeated here; they are incorporated herein by reference.
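The request/feedback/resolve flow of these three modules can be sketched as follows; the data shapes and the first-come-first-served conflict rule are assumptions of the sketch, since the source does not fix them:

    def resolve_positions(users, predetermined, feedback):
        """Assign each user the predetermined virtual position indicated by
        their feedback, first come first served; users without valid feedback
        stay unassigned (they are handled after the feedback deadline)."""
        assigned, taken = {}, set()
        for user in users:  # iteration order stands in for arrival order
            pos = feedback.get(user)
            if pos in predetermined and pos not in taken:
                assigned[user] = pos
                taken.add(pos)
        return assigned

    print(resolve_positions(["User1", "User2", "User3"], {"A", "B"},
                            {"User1": "A", "User2": "A"}))
    # -> {'User1': 'A'}; User2's duplicate pick and User3's silence stay open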
In some embodiments, the apparatus is further configured to: after receiving feedback information sent by a first user among the plurality of users, generate first prompt information corresponding to the feedback information and send it to the other users who have not yet fed back, to prompt them that the first target predetermined virtual position indicated by the feedback information is no longer selectable. Here, the related operations are the same as or similar to those of the embodiment shown in Fig. 1 and are therefore not repeated here; they are incorporated herein by reference.
In some embodiments, the apparatus is further configured to: after receiving feedback information sent by a first user among the plurality of users, generate second prompt information corresponding to the feedback information and send it to the users other than the first user, to prompt them that the first target predetermined virtual position indicated by the feedback information has been selected by the first user. Here, the related operations are the same as or similar to those of the embodiment shown in Fig. 1 and are therefore not repeated here; they are incorporated herein by reference.
In some embodiments, the apparatus is further configured to: receive invitation request information sent by a second user among the at least one user, where the second user has selected a second target predetermined virtual position among the plurality of predetermined virtual positions, and the invitation request information invites a third user among the plurality of users who has not yet fed back to select a predetermined virtual position near the second target predetermined virtual position; and send the invitation request information to the third user, to prompt the third user to select an unselected predetermined virtual position near the second target predetermined virtual position as the target predetermined virtual position corresponding to the third user. Here, the related operations are the same as or similar to those of the embodiment shown in Fig. 1 and are therefore not repeated here; they are incorporated herein by reference.
In some embodiments, the apparatus is further configured to: after a predetermined feedback time limit corresponding to the virtual position request information is reached, determine, for each of the plurality of users who has not yet fed back, a target predetermined virtual position corresponding to the user among the currently unselected predetermined virtual positions, and determine the target predetermined virtual position as the virtual position of the user in the virtual scene information. Here, the related operations are the same as or similar to those of the embodiment shown in Fig. 1 and are therefore not repeated here; they are incorporated herein by reference.
In some embodiments, the determining, for each of the plurality of users who has not yet fed back, a target predetermined virtual position corresponding to the user among the currently unselected predetermined virtual positions includes: determining hot-spot area information in the virtual scene information according to the virtual positions, in the virtual scene information, of the at least one user who has already fed back; and, for each of the plurality of users who has not yet fed back, determining an unselected predetermined virtual position within the hot-spot area as the virtual position of the user in the virtual scene information. Here, the related operations are the same as or similar to those of the embodiment shown in Fig. 1 and are therefore not repeated here; they are incorporated herein by reference.
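A sketch of this hot-spot fallback, taking the centroid of the already-chosen positions as the hot spot; the centroid rule is an assumption of the sketch, since the source only requires an area derived from the positions of the users who did feed back:

    def assign_near_hotspot(pending_users, free_positions, chosen_positions):
        """Seat each user who missed the feedback deadline at the free
        predetermined position closest to the centroid of the positions
        already chosen by the users who fed back."""
        cx = sum(x for x, _ in chosen_positions) / len(chosen_positions)
        cy = sum(y for _, y in chosen_positions) / len(chosen_positions)
        ranked = sorted(free_positions,
                        key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
        return dict(zip(pending_users, ranked))  # zip stops if seats run out

    print(assign_near_hotspot(["User4"], [(5, 5), (1, 0)], [(0, 0), (0, 2)]))
    # -> {'User4': (1, 0)}, the free seat closest to the (0, 1) centroid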
In some embodiments, the determining, for each of the plurality of users who has not yet fed back, a target predetermined virtual position corresponding to the user among the currently unselected predetermined virtual positions includes: for each such user, determining a target predetermined virtual position among the currently unselected predetermined virtual positions, where the label information of the target predetermined virtual position in the virtual scene information matches the user information corresponding to the user. Here, the related operations are the same as or similar to those of the embodiment shown in Fig. 1 and are therefore not repeated here; they are incorporated herein by reference.
FIG. 3 illustrates an exemplary system that can be used to implement the various embodiments described in this application.
In some embodiments, as illustrated in FIG. 3, the system 300 can be implemented as any of the devices in each of the described embodiments. In some embodiments, system 300 may include one or more computer-readable media (e.g., system memory or NVM/storage 320) having instructions and one or more processors (e.g., processor(s) 305) coupled with the one or more computer-readable media and configured to execute the instructions to implement modules to perform the actions described herein.
For one embodiment, system control module 310 may include any suitable interface controllers to provide any suitable interface to at least one of processor(s) 305 and/or any suitable device or component in communication with system control module 310.
The system control module 310 may include a memory controller module 330 to provide an interface to the system memory 315. Memory controller module 330 may be a hardware module, a software module, and/or a firmware module.
System memory 315 may be used, for example, to load and store data and/or instructions for system 300. For one embodiment, system memory 315 may include any suitable volatile memory, such as suitable DRAM. In some embodiments, system memory 315 may include double data rate type-four synchronous dynamic random access memory (DDR4 SDRAM).
For one embodiment, system control module 310 may include one or more input/output (I/O) controllers to provide an interface to NVM/storage 320 and communication interface(s) 325.
For example, NVM/storage 320 may be used to store data and/or instructions. NVM/storage 320 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more Hard Disk Drives (HDDs), one or more Compact Disc (CD) drives, and/or one or more Digital Versatile Disc (DVD) drives).
NVM/storage 320 may include storage resources that are physically part of the device on which system 300 is installed or may be accessed by the device and not necessarily part of the device. For example, NVM/storage 320 may be accessible over a network via communication interface(s) 325.
Communication interface(s) 325 may provide an interface for system 300 to communicate over one or more networks and/or with any other suitable device. System 300 may wirelessly communicate with one or more components of a wireless network according to any of one or more wireless network standards and/or protocols.
For one embodiment, at least one of the processor(s) 305 may be packaged together with logic for one or more controller(s) of the system control module 310, such as memory controller module 330. For one embodiment, at least one of the processor(s) 305 may be packaged together with logic for one or more controller(s) of the system control module 310 to form a System In Package (SiP). For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic for one or more controller(s) of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic for one or more controller(s) of the system control module 310 to form a system on a chip (SoC).
In various embodiments, system 300 may be, but is not limited to: a server, a workstation, a desktop computing device, or a mobile computing device (e.g., a laptop computing device, a handheld computing device, a tablet, a netbook, etc.). In various embodiments, system 300 may have more or fewer components and/or different architectures. For example, in some embodiments, system 300 includes one or more cameras, a keyboard, a Liquid Crystal Display (LCD) screen (including a touch screen display), a non-volatile memory port, multiple antennas, a graphics chip, an Application Specific Integrated Circuit (ASIC), and speakers.
The present application also provides a computer readable storage medium having stored thereon computer code which, when executed, performs the method of any of the foregoing embodiments.
The present application also provides a computer program product which, when executed by a computer device, performs the method of any of the foregoing embodiments.
The present application further provides a computer device, comprising:
one or more processors;
a memory for storing one or more computer programs;
the one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method as recited in any preceding claim.
It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, implemented using Application Specific Integrated Circuits (ASICs), general purpose computers or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
Additionally, some portions of the present application may be embodied as a computer program product, such as computer program instructions, which, when executed by a computer, may invoke or provide the method and/or technical solution according to the present application through the operation of the computer. Those skilled in the art will appreciate that the forms in which the computer program instructions reside on a computer-readable medium include, but are not limited to, source files, executable files, installation package files, and the like, and that, correspondingly, the ways in which the computer program instructions are executed by a computer include, but are not limited to: the computer directly executes the instructions; or the computer compiles the instructions and then executes the corresponding compiled program; or the computer reads and executes the instructions; or the computer reads and installs the instructions and then executes the corresponding installed program. Computer-readable media here may be any available computer-readable storage media or communication media that can be accessed by a computer.
Communication media includes media by which communication signals, including, for example, computer readable instructions, data structures, program modules, or other data, are transmitted from one system to another. Communication media may include conductive transmission media such as cables and wires (e.g., fiber optics, coaxial, etc.) and wireless (non-conductive transmission) media capable of propagating energy waves, such as acoustic, electromagnetic, RF, microwave, and infrared. Computer readable instructions, data structures, program modules or other data may be embodied in a modulated data signal, such as a carrier wave or similar mechanism that is embodied in a wireless medium, such as part of spread-spectrum techniques, for example. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. The modulation may be analog, digital or hybrid modulation techniques.
By way of example, and not limitation, computer-readable storage media may include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules, or other data. For example, computer-readable storage media include, but are not limited to, volatile memory such as random access memory (RAM, DRAM, SRAM); non-volatile memory such as flash memory, various read-only memories (ROM, PROM, EPROM, EEPROM), and magnetic and ferromagnetic/ferroelectric memories (MRAM, FeRAM); magnetic and optical storage devices (hard disk, tape, CD, DVD); and other media, now known or later developed, capable of storing computer-readable information/data for use by a computer system.
An embodiment according to the present application comprises an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein the computer program instructions, when executed by the processor, trigger the apparatus to perform a method and/or a solution according to the aforementioned embodiments of the present application.
It will be evident to those skilled in the art that the application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it will be obvious that the term "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the apparatus claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.

Claims (18)

1. A method for playing voice information in a multi-person voice, applied to a network device, wherein the method comprises the following steps:
determining virtual scene information corresponding to the multi-person voice; determining, according to the virtual scene information, a virtual position corresponding to each user of a plurality of users participating in the multi-person voice; determining, for a target user among the plurality of users, virtual position information of the other users in a virtual sound field corresponding to the target user according to the virtual position corresponding to the target user and the virtual positions corresponding to the other users, and generating virtual sound field information corresponding to the target user according to the virtual position information, wherein the virtual sound field is a relative coordinate system, the virtual position corresponding to the target user in the virtual sound field is the coordinate origin, the virtual position information comprises the coordinate values of the coordinate points corresponding to the other users in the virtual sound field, and each user corresponds to one virtual sound field;
sending the virtual sound field information to user equipment corresponding to the target user, so that the user equipment plays the voice information of each of the other users according to that user's virtual position information in the virtual sound field;
wherein the determining the virtual scene information corresponding to the multi-person voice comprises:
determining, according to voice theme information corresponding to the multi-person voice, target default virtual scene information matching the voice theme information from a plurality of pieces of default virtual scene information, and determining the target default virtual scene information as the virtual scene information corresponding to the multi-person voice.
2. The method of claim 1, wherein the determining the virtual scene information corresponding to the multi-person voice comprises:
obtaining identification information corresponding to target virtual scene information selected by a voice-initiating user from a plurality of pieces of default virtual scene information, and determining the target virtual scene information as the virtual scene information corresponding to the multi-person voice.
3. The method of claim 1, wherein the determining the virtual scene information corresponding to the multi-person voice comprises:
obtaining at least one piece of target virtual scene information selected by at least one user from a plurality of pieces of default virtual scene information, and determining the virtual scene information corresponding to the multi-person voice from the at least one piece of target virtual scene information, wherein the determined virtual scene information is the piece selected most frequently.
4. The method of claim 1, wherein the determining the virtual scene information corresponding to the multi-person voice comprises:
determining, according to user information corresponding to the plurality of users, target default virtual scene information matching the user information from a plurality of pieces of default virtual scene information, and determining the target default virtual scene information as the virtual scene information corresponding to the multi-person voice.
5. The method of claim 4, wherein the determining, according to the user information corresponding to the plurality of users, target default virtual scene information matching the user information from a plurality of pieces of default virtual scene information comprises:
determining, according to the user information corresponding to a voice-initiating user among the plurality of users, target default virtual scene information matching that user information from the plurality of pieces of default virtual scene information.
6. The method of claim 4, wherein the determining, according to the user information corresponding to the plurality of users, target default virtual scene information matching the user information from a plurality of pieces of default virtual scene information comprises:
determining, according to the user information corresponding to each of the plurality of users, at least one piece of default virtual scene information matching that user's information from the plurality of pieces of default virtual scene information, and determining target default virtual scene information from the at least one piece of default virtual scene information, wherein the target default virtual scene information is the piece matched by the largest number of users.
7. The method of claim 1, wherein the virtual scene information comprises a plurality of predetermined virtual positions;
wherein the determining, according to the virtual scene information, the virtual position corresponding to each of the plurality of users comprises:
for each of the plurality of users, obtaining a target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions, and determining the target predetermined virtual position as the virtual position of the user in the virtual scene information.
8. The method of claim 7, wherein the obtaining, for each of the plurality of users, a target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions comprises:
for each of the plurality of users, obtaining the target predetermined virtual position that a voice-initiating user has designated for the user among the plurality of predetermined virtual positions.
9. The method of claim 7, wherein the obtaining, for each of the plurality of users, a target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions comprises:
for each of the plurality of users, determining a target predetermined virtual position among the plurality of predetermined virtual positions according to the user information corresponding to the user, wherein the label information of the target predetermined virtual position in the virtual scene information matches the user information corresponding to the user.
10. The method of claim 7, wherein the obtaining, for each of the plurality of users, a target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions comprises:
generating virtual position request information and sending it to each of the plurality of users, wherein the virtual position request information comprises the virtual scene information;
receiving feedback information about the virtual position request information sent by at least one user among the plurality of users, wherein the feedback information sent by each of the at least one user indicates a target predetermined virtual position selected by that user among the plurality of predetermined virtual positions;
for each of the plurality of users, determining, according to the feedback information, the target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions.
11. The method of claim 10, wherein the method further comprises:
after receiving feedback information sent by a first user among the plurality of users, generating first prompt information corresponding to the feedback information, and sending the first prompt information to the other users who have not yet fed back, to prompt them that a first target predetermined virtual position indicated by the feedback information is no longer selectable.
12. The method of claim 10, wherein the method further comprises:
after receiving feedback information sent by a first user among the plurality of users, generating second prompt information corresponding to the feedback information, and sending the second prompt information to the users other than the first user, to prompt them that a first target predetermined virtual position indicated by the feedback information has been selected by the first user.
13. The method of claim 10, wherein the method further comprises:
receiving invitation request information sent by a second user among the at least one user, wherein the second user has selected a second target predetermined virtual position among the plurality of predetermined virtual positions, and the invitation request information invites a third user among the plurality of users who has not yet fed back to select a predetermined virtual position near the second target predetermined virtual position;
sending the invitation request information to the third user, to prompt the third user to select an unselected predetermined virtual position near the second target predetermined virtual position as the target predetermined virtual position corresponding to the third user.
14. The method of claim 10, wherein the method further comprises:
after a predetermined feedback time limit corresponding to the virtual position request information is reached, determining, for each of the plurality of users who has not yet fed back, a target predetermined virtual position corresponding to the user among the currently unselected predetermined virtual positions, and determining the target predetermined virtual position as the virtual position of the user in the virtual scene information.
15. The method of claim 14, wherein the determining, for each of the plurality of users who has not yet fed back, a target predetermined virtual position corresponding to the user among the currently unselected predetermined virtual positions comprises:
determining hot-spot area information in the virtual scene information according to the virtual positions, in the virtual scene information, of the at least one user who has already fed back;
for each of the plurality of users who has not yet fed back, determining an unselected predetermined virtual position within the hot-spot area as the virtual position of the user in the virtual scene information.
16. The method of claim 14, wherein the determining, for each of the plurality of users who has not yet fed back, a target predetermined virtual position corresponding to the user among the currently unselected predetermined virtual positions comprises:
for each of the plurality of users who has not yet fed back, determining a target predetermined virtual position among the currently unselected predetermined virtual positions, wherein the label information of the target predetermined virtual position in the virtual scene information matches the user information corresponding to the user.
17. An apparatus for playing voice information in a multi-person voice, the apparatus comprising:
a processor; and
a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the method of any of claims 1 to 16.
18. A computer readable medium storing instructions that, when executed, cause a system to perform the operations of any of claims 1 to 16.
CN202011049085.4A 2020-09-29 2020-09-29 Method and equipment for playing voice information in multi-person voice Active CN112261337B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011049085.4A CN112261337B (en) 2020-09-29 2020-09-29 Method and equipment for playing voice information in multi-person voice
PCT/CN2021/119542 WO2022068640A1 (en) 2020-09-29 2021-09-22 Method and device for broadcasting voice information in multi-user voice call

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011049085.4A CN112261337B (en) 2020-09-29 2020-09-29 Method and equipment for playing voice information in multi-person voice

Publications (2)

Publication Number Publication Date
CN112261337A CN112261337A (en) 2021-01-22
CN112261337B (en) 2023-03-31

Family

ID=74235010

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011049085.4A Active CN112261337B (en) 2020-09-29 2020-09-29 Method and equipment for playing voice information in multi-person voice

Country Status (2)

Country Link
CN (1) CN112261337B (en)
WO (1) WO2022068640A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112261337B (en) * 2020-09-29 2023-03-31 上海连尚网络科技有限公司 Method and equipment for playing voice information in multi-person voice

Citations (9)

Publication number Priority date Publication date Assignee Title
GB9908576D0 (en) * 1999-04-16 1999-06-09 Mitel Corp Virtual meeting rooms with spatial audio
JP2001339799A (en) * 2000-05-29 2001-12-07 Alpine Electronics Inc Virtual meeting apparatus
US6850496B1 (en) * 2000-06-09 2005-02-01 Cisco Technology, Inc. Virtual conference room for voice conferencing
CN102724604A (en) * 2012-06-06 2012-10-10 北京中自科技产业孵化器有限公司 Sound processing method for video meeting
WO2014001478A1 (en) * 2012-06-28 2014-01-03 The Provost, Fellows, Foundation Scholars, & The Other Members Of Board, Of The College Of The Holy & Undiv. Trinity Of Queen Elizabeth Near Dublin Method and apparatus for generating an audio output comprising spatial information
CN106131355A (en) * 2016-07-05 2016-11-16 华为技术有限公司 A kind of sound playing method and device
WO2019121864A1 (en) * 2017-12-19 2019-06-27 Koninklijke Kpn N.V. Enhanced audiovisual multiuser communication
CN110035250A (en) * 2019-03-29 2019-07-19 维沃移动通信有限公司 Audio-frequency processing method, processing equipment, terminal and computer readable storage medium
CN110149332A (en) * 2019-05-22 2019-08-20 北京达佳互联信息技术有限公司 Live broadcasting method, device, equipment and storage medium

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
KR100947027B1 (en) * 2007-12-28 2010-03-11 한국과학기술원 Method of communicating with multi-user simultaneously using virtual sound and computer-readable medium therewith
CN107211061B (en) * 2015-02-03 2020-03-31 杜比实验室特许公司 Optimized virtual scene layout for spatial conference playback
EP3611941A4 (en) * 2017-04-10 2020-12-30 Yamaha Corporation Voice providing device, voice providing method, and program
CN107066102A (en) * 2017-05-09 2017-08-18 北京奇艺世纪科技有限公司 Support the method and device of multiple VR users viewing simultaneously
CN108881784B (en) * 2017-05-12 2020-07-03 腾讯科技(深圳)有限公司 Virtual scene implementation method and device, terminal and server
CN109086029B (en) * 2018-08-01 2021-10-26 北京奇艺世纪科技有限公司 Audio playing method and VR equipment
CN112261337B (en) * 2020-09-29 2023-03-31 上海连尚网络科技有限公司 Method and equipment for playing voice information in multi-person voice

Non-Patent Citations (1)

Title
Research and Implementation of Audio Synthesis Technology in a Virtual Space Conference System; He Baoquan et al.; Journal of Chinese Computer Systems (《小型微型计算机系统》); 2000-06-08 (No. 06); full text *

Also Published As

Publication number Publication date
CN112261337A (en) 2021-01-22
WO2022068640A1 (en) 2022-04-07

Similar Documents

Publication Publication Date Title
CN110336735B (en) Method and equipment for sending reminding message
CN110288997A (en) Equipment awakening method and system for acoustics networking
EP3555822A1 (en) Initiating a conferencing meeting using a conference room device
CN112822431B (en) Method and equipment for private audio and video call
CN110795004B (en) Social method and device
CN110072151B (en) Virtual gift display method, electronic device and computer-readable storage medium
CN112822161B (en) Method and equipment for realizing conference message synchronization
CN112261337B (en) Method and equipment for playing voice information in multi-person voice
CN112751683B (en) Method and equipment for realizing conference message synchronization
CN112822430B (en) Conference group merging method and device
CN108111374A (en) Method, apparatus, equipment and the computer storage media of synchronizer list
CN111445345A (en) Method and equipment for releasing dynamic information
CN112788004B (en) Method, device and computer readable medium for executing instructions by virtual conference robot
CN113329237B (en) Method and equipment for presenting event label information
CN112261236B (en) Method and equipment for mute processing in multi-person voice
CN112272213A (en) Activity registration method and equipment
CN113157162A (en) Method, apparatus, medium and program product for revoking session messages
CN112261569B (en) Method and equipment for playing multiple channels
CN111859009A (en) Method and equipment for providing audio information
CN112533061B (en) Method and equipment for collaboratively shooting and editing video
KR20150108098A (en) Chatting service providing system, apparatus and method thereof
CN115734000A (en) Method, device, medium and program product for concert on live broadcast line
CN115913804A (en) Method, apparatus, medium and program product for joining chat room
CN114338579B (en) Method, equipment and medium for dubbing
CN115544378A (en) Method, device, medium and program product for collaboration

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant