WO2022068640A1

WO2022068640A1 - Method and device for broadcasting voice information in multi-user voice call

Info

Publication number: WO2022068640A1
Application number: PCT/CN2021/119542
Authority: WO
Inventors: 程翰
Original assignee: 上海连尚网络科技有限公司
Priority date: 2020-09-29
Filing date: 2021-09-22
Publication date: 2022-04-07
Also published as: CN112261337A; CN112261337B

Abstract

The aim of the present application is to provide a method and device for broadcasting voice information in a multi-user voice call. The method comprises: for a target user among a plurality of users involved in a multi-user voice call, determining virtual position information of the other users of the plurality of users in a virtual sound field corresponding to the target user, and generating, according to the virtual position information, virtual sound field information corresponding to the target user; and sending the virtual sound field information to a user device corresponding to the target user, such that the user device broadcasts the voice information of each of the other users according to that user's virtual position information in the virtual sound field. By means of the present application, each user can clearly and accurately distinguish the voice of each other user in a multi-user voice call and can intuitively and quickly know which user is currently speaking, which greatly facilitates the users involved in the multi-user voice call.

Description

Method and device for playing voice information in a multi-person voice

This application is based on, and claims priority to, CN application No. 202011049085.4, filed on 2020-09-29. The disclosure of the CN application is hereby incorporated into this application in its entirety.

Technical Field

The present application relates to the field of communications, and in particular, to a technology for playing voice information in a multi-person voice.

Background

With the development of the times, voice communication has become one of the most popular and common communication methods. In the prior art, multi-person voice communication means that multiple users communicate by voice in real time through clients on terminal devices such as mobile phones and PCs. A common multi-person voice communication scheme is that each client receives real-time voice information from multiple other clients, mixes the received real-time voice information locally to obtain locally mixed voice information, and plays it.

Summary of the Invention

An object of the present application is to provide a method and device for playing voice information in a multi-person voice.

According to an aspect of the present application, there is provided a method for playing voice information in a multi-person voice applied to a network device, the method comprising:

for a target user among multiple users participating in a multi-person voice, determining virtual position information of the other users of the multiple users in a virtual sound field corresponding to the target user, and generating, according to the virtual position information, virtual sound field information corresponding to the target user;

sending the virtual sound field information to the user equipment corresponding to the target user, so that the user equipment plays the voice information of each of the other users according to that user's virtual position information in the target virtual sound field.

According to one aspect of the present application, there is provided a network device for playing voice information in a multi-person voice, the device comprising:

a one-one module, configured to, for a target user among multiple users participating in a multi-person voice, determine virtual position information of the other users of the multiple users in a virtual sound field corresponding to the target user, and generate, according to the virtual position information, virtual sound field information corresponding to the target user;

a one-two module, configured to send the virtual sound field information to the user equipment corresponding to the target user, so that the user equipment plays the voice information of each of the other users according to that user's virtual position information in the target virtual sound field.

According to an aspect of the present application, there is provided a device for playing voice information in a multi-person voice, wherein the device includes:

processor; and

memory arranged to store computer-executable instructions which, when executed, cause the processor to:

According to one aspect of the present application, there is provided a computer-readable medium storing instructions that, when executed, cause a system to:

Compared with the prior art, the present application determines, for each of the multiple users participating in a multi-person voice, the virtual position information of the other users in the virtual sound field corresponding to that user, and then plays the voice information of the other users according to their virtual position information in that virtual sound field. Each user can therefore clearly and accurately distinguish each person's voice in the multi-person voice and can intuitively and quickly know which other user is currently speaking, which greatly facilitates users in the multi-person voice.

Description of Drawings

Other features, objects and advantages of the present application will become more apparent by reading the detailed description of non-limiting embodiments made with reference to the following drawings:

FIG. 1 shows a flowchart of a method for playing voice information in a multi-person voice applied to a network device according to an embodiment of the present application;

FIG. 2 shows a structural diagram of a network device for playing voice information in a multi-person voice according to an embodiment of the present application;

FIG. 3 illustrates an exemplary system that may be used to implement various embodiments described in this application.

The same or similar reference numbers in the drawings represent the same or similar parts.

Detailed Description

The present application will be described in further detail below with reference to the accompanying drawings.

In a typical configuration of the present application, the terminal, the device serving the network, and the trusted party each include one or more processors (for example, a central processing unit (CPU)), an input/output interface, a network interface, and memory.

Memory may include volatile memory in computer-readable media, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory. Memory is an example of a computer-readable medium.

Computer-readable media include persistent and non-persistent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PCM), programmable random access memory (PRAM), static random-access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile disc (DVD) or other optical storage, magnetic tape cartridges, magnetic tape or disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.

The devices referred to in this application include, but are not limited to, user equipment, network devices, or devices formed by integrating user equipment and network devices through a network. The user equipment includes, but is not limited to, any mobile electronic product that can perform human-computer interaction with a user (for example, through a touchpad), such as a smartphone or a tablet computer, and the mobile electronic product may run any operating system, such as the Android operating system or the iOS operating system. The network device includes an electronic device that can automatically perform numerical calculation and information processing according to preset or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, and the like. The network device includes, but is not limited to, a computer, a network host, a single network server, a set of multiple network servers, or a cloud formed by multiple servers; here, the cloud is formed by a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a virtual supercomputer composed of a group of loosely coupled computers. The network includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, a wireless ad hoc network, and the like. Preferably, the device may also be a program running on the user equipment, on the network device, or on a device formed by integrating user equipment and a network device, or a network device and a touch terminal, through a network.

Of course, those skilled in the art should understand that the above-mentioned devices are only examples, and other existing or future devices, if applicable to this application, should also be included within the protection scope of this application and are incorporated herein by reference.

In the description of this application, "plurality" means two or more, unless expressly and specifically defined otherwise.

FIG. 1 shows a flowchart of a method for playing voice information in a multi-person voice applied to a network device according to an embodiment of the present application; the method includes step S11 and step S12. In step S11, for a target user among the multiple users participating in the multi-person voice, the network device determines virtual position information of the other users of the multiple users in the virtual sound field corresponding to the target user, and generates, according to the virtual position information, virtual sound field information corresponding to the target user. In step S12, the network device sends the virtual sound field information to the user equipment corresponding to the target user, so that the user equipment plays the voice information of each of the other users according to that user's virtual position information in the target virtual sound field.

In step S11, for a target user among the multiple users participating in the multi-person voice, the network device determines virtual position information of the other users of the multiple users in the virtual sound field corresponding to the target user, and generates, according to the virtual position information, virtual sound field information corresponding to the target user. In some embodiments, the target user is each of the multiple users participating in the multi-person voice. In some embodiments, the virtual sound field is a relative coordinate system, which may be a two-dimensional plane coordinate system or a three-dimensional space coordinate system. Each user corresponds to one virtual sound field; a virtual position is the coordinate point of another user in the user's virtual sound field, the virtual position information is the coordinate value of that coordinate point, and the user's own virtual position in the user's virtual sound field is the coordinate origin. For example, in the virtual sound field of User1, the virtual position information corresponding to User1 is (0,0) and the virtual position information corresponding to User2 is (0,1); in the virtual sound field of User2, the virtual position information corresponding to User1 is (0,-1) and the virtual position information corresponding to User2 is (0,0). In some embodiments, the coordinate axis unit of the virtual sound field corresponding to a user is a predetermined distance interval, for example, 1 cm, 10 cm, or 1 meter, and the coordinate axis direction is a predetermined direction relative to the user; for example, the positive direction of the X-axis is to the right of the user, and the positive direction of the Y-axis is in front of the user. In some embodiments, relative distance information and relative direction information between two users can be obtained from the virtual position information of one user in the virtual sound field of the other user, together with the coordinate axis unit and coordinate axis direction of that virtual sound field. For example, in the virtual sound field of User1, the positive direction of the X-axis is to the right of User1, the positive direction of the Y-axis is in front of User1, the unit of the X-axis and the Y-axis is 1 meter, the virtual position information corresponding to User1 is (0,0), and the virtual position information corresponding to User2 is (1,0), so it can be concluded that User2 is 1 meter to the right of User1. In some embodiments, for each user in the multi-person voice, the virtual sound field information corresponding to the user includes, but is not limited to, the coordinate axis direction and coordinate axis unit of the user's virtual sound field and the virtual position information (that is, the coordinate value of the coordinate point) of each other user in the user's virtual sound field.
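The derivation of relative distance and relative direction from a coordinate value can be sketched as follows. This is a minimal illustration, not the applicant's implementation; it assumes a two-dimensional virtual sound field with the positive X-axis to the listener's right and the positive Y-axis in front of the listener. The function name `relative_distance_and_direction`, the parameter `axis_unit_m`, and the clockwise-from-front azimuth convention are hypothetical choices made for this example.

```python
import math


def relative_distance_and_direction(position, axis_unit_m=1.0):
    """Return (distance in meters, azimuth in degrees) for a 2D virtual position.

    Assumes +X is the listener's right and +Y is the listener's front.
    The azimuth is measured clockwise from straight ahead:
    0 = front, 90 = right, 180 = behind, -90 = left.
    """
    x, y = position
    # Scale the coordinate distance by the axis unit (meters per coordinate step).
    distance = math.hypot(x, y) * axis_unit_m
    # atan2(x, y) instead of atan2(y, x) so that 0 degrees points along +Y (front).
    azimuth = math.degrees(math.atan2(x, y))
    return distance, azimuth
```

Under these assumptions, `relative_distance_and_direction((0, -2))` gives a distance of 2.0 meters and an azimuth of 180.0 degrees, that is, directly behind the listener.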

In step S12, the network device sends the virtual sound field information to the user equipment corresponding to the target user, so that the user equipment plays the voice information of each of the other users according to that user's virtual position information in the target virtual sound field. In some embodiments, for each user in the multi-person voice, the voice information of another user may be sent from the user equipment of that other user to the user equipment of the user via the network device, or may be sent over a P2P connection established between the two user equipments. In some embodiments, for each user in the multi-person voice, when voice information sent by another user is received, the user equipment obtains relative distance information and relative direction information of the other user relative to the user from the other user's virtual position information in the user's virtual sound field, together with the coordinate axis direction and coordinate axis unit of that virtual sound field, and plays the voice information according to the relative distance information and relative direction information. For example, in the virtual sound field of User1, the positive direction of the X-axis is to the right of User1, the positive direction of the Y-axis is in front of User1, the unit of the X-axis and the Y-axis is 1 meter, the virtual position information corresponding to User1 is (0,0), and the virtual position information corresponding to User2 is (0,-2), so it can be concluded that User2 is 2 meters behind User1, and the voice information is played according to this relative distance information and relative direction information. In some embodiments, the voice information may be played according to the relative distance information and relative direction information by filtering and delaying the voice information through a head-related transfer function (HRTF) and then outputting it to the speaker of the user equipment for playback. In this way, when multiple other users speak at the same time in the multi-person voice, the user can clearly and accurately distinguish each person's voice, and when any other user speaks, the user can intuitively and quickly know which other user is currently speaking, which greatly facilitates users in the multi-person voice.
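As a rough illustration of position-dependent playback, the sketch below derives left and right channel gains from the relative distance and direction. It is deliberately not an HRTF: real HRTF rendering filters and delays the signal with measured per-ear impulse responses, whereas this stand-in uses only inverse-distance attenuation and constant-power stereo panning, so a source in front and a source behind sound the same. The names `stereo_gains`, `render`, and `reference_distance_m`, and the azimuth convention (0 = front, 90 = right), are assumptions made for this example.

```python
import math


def stereo_gains(distance_m, azimuth_deg, reference_distance_m=1.0):
    """Return (left_gain, right_gain) for a source at the given distance and azimuth."""
    # Inverse-distance attenuation, clamped so that sources closer than the
    # reference distance are not amplified.
    attenuation = reference_distance_m / max(distance_m, reference_distance_m)
    # Map the azimuth to a pan value in [-1, 1]: left = -1, front/back = 0, right = +1.
    pan = math.sin(math.radians(azimuth_deg))
    # Constant-power panning: the pan value selects an angle between 0 and pi/2.
    angle = (pan + 1.0) * math.pi / 4.0
    return math.cos(angle) * attenuation, math.sin(angle) * attenuation


def render(samples, distance_m, azimuth_deg):
    """Turn mono samples into (left, right) sample pairs using stereo_gains."""
    left, right = stereo_gains(distance_m, azimuth_deg)
    return [(s * left, s * right) for s in samples]
```

A user equipment that follows the HRTF embodiment would replace `stereo_gains` with per-ear filtering; the interface of distance and direction in, two channels out, stays the same.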

In some embodiments, for the target user among the multiple users participating in the multi-person voice, determining the virtual position information of the other users of the multiple users in the virtual sound field corresponding to the target user includes step S13 (not shown), step S14 (not shown), and step S15 (not shown). In step S13, the network device determines virtual scene information corresponding to the multi-person voice; in step S14, the network device determines, according to the virtual scene information, the virtual position corresponding to each of the multiple users; in step S15, the network device determines the virtual position information of the other users in the virtual sound field corresponding to the target user according to the virtual position corresponding to the target user and the virtual positions corresponding to the other users. In some embodiments, the virtual scene may be a virtual two-dimensional scene or a virtual three-dimensional scene, for example, a virtual conference room or a virtual classroom. In some embodiments, the virtual scene information includes, but is not limited to, visualization information of the virtual scene and configuration information of the virtual scene. The visualization information is used to present the virtual scene intuitively to a user by means of a 2D scene image or a 3D scene model, so that the user can determine the virtual position of the user or of others in the virtual scene, or browse those virtual positions. The configuration information is used to obtain, from the virtual positions of two users in the virtual scene, the relative distance information and relative direction information between the two users, and thereby to determine the virtual position information of each of the two users in the other's virtual sound field. In some embodiments, the virtual scene may be selected by the voice-initiating user from multiple default virtual scenes; or at least one user among the multiple users may select at least one target virtual scene from the multiple default virtual scenes, and the target virtual scene selected the most times is then determined as the virtual scene corresponding to the multi-person voice; or the virtual scene may be the target default virtual scene, among the multiple default virtual scenes, that matches the voice theme information. In some embodiments, the virtual position of each user in the virtual scene may be determined for each user by the voice-initiating user, may be determined by each user individually, or may be determined among multiple predetermined virtual positions according to the user information corresponding to each user, where the tag information (for example, "podium") of a user's virtual position in the virtual scene (for example, a virtual classroom) matches the user information corresponding to that user (for example, "language teacher"). In some embodiments, after determining the virtual scene corresponding to the multi-person voice, the network device sends the virtual scene information corresponding to the virtual scene to each user in the multi-person voice; the virtual scene is then presented intuitively to each user by means of a 2D scene image or a 3D scene model according to the visualization information in the virtual scene information, and each user determines the user's own virtual position in the virtual scene. Alternatively, the virtual scene information is sent only to the voice-initiating user, who determines the virtual position of each user in the multi-person voice in the virtual scene. In some embodiments, for each user, the relative distance information and relative direction information between the user and each other user can be obtained according to the user's virtual position in the virtual scene, each other user's virtual position in the virtual scene, and the configuration information corresponding to the virtual scene, and the virtual position information of each other user in the virtual sound field corresponding to the user is determined accordingly. In some embodiments, the network device sends the virtual position of each user in the virtual scene to every user for presentation on the corresponding user equipment, so that each user knows the virtual positions of the user and of each other user in the virtual scene. In some embodiments, each user equipment intuitively presents, in the 2D scene image or 3D scene model corresponding to the virtual scene information, the virtual positions of its own user and of each other user in the virtual scene, so that each user can intuitively and quickly know the relative distance and relative direction of the other users in the virtual scene. For example, the user equipment can present the corresponding user identification information (for example, user name, user ID, etc.) at each virtual position in the 2D scene image or 3D scene model.
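Step S15 can be sketched as a change of origin from the shared virtual scene to each listener. The example below is a minimal sketch under two assumptions that the text does not state: positions in the virtual scene are two-dimensional, and every listener faces the scene's positive Y direction, so the conversion is a pure translation with no rotation. The function name `build_sound_fields` and the dictionary layout of the result are hypothetical.

```python
def build_sound_fields(scene_positions, axis_unit_m=1.0):
    """Map each user to the virtual sound field information seen from that user.

    scene_positions: dict of user id -> (x, y) position in the shared virtual scene.
    Assumes every listener faces the scene's +Y direction, so converting to a
    listener-centred sound field only shifts the origin.
    """
    fields = {}
    for listener, (lx, ly) in scene_positions.items():
        # Position of every other user relative to the listener at the origin.
        others = {
            user: (x - lx, y - ly)
            for user, (x, y) in scene_positions.items()
            if user != listener
        }
        fields[listener] = {
            "axis_unit_m": axis_unit_m,
            "axis_directions": {"+x": "right", "+y": "front"},
            "positions": others,
        }
    return fields
```

With scene positions `{"User1": (2, 3), "User2": (2, 4)}`, User2 appears at (0, 1) in the sound field of User1 and User1 appears at (0, -1) in the sound field of User2, which mirrors the User1 and User2 example given for step S11.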

In some embodiments, step S13 includes: the network device obtains identification information corresponding to target virtual scene information selected by the voice-initiating user from multiple pieces of default virtual scene information, and determines the target virtual scene information as the virtual scene information corresponding to the multi-person voice. In some embodiments, the voice-initiating user selects a target virtual scene from multiple default virtual scenes and sends the identification information (for example, scene name, scene ID, etc.) corresponding to the target virtual scene to the network device. For example, the multiple default virtual scenes include virtual meeting room 1, virtual meeting room 2, virtual classroom 1, and virtual classroom 2; the voice-initiating user selects virtual meeting room 1 as the target virtual scene and sends the corresponding identification information "Virtual Meeting Room 1" to the network device.

In some embodiments, step S13 includes: the network device obtains at least one piece of target virtual scene information selected by at least one user among the multiple users from multiple pieces of default virtual scene information, and determines, from the at least one piece of target virtual scene information, the virtual scene information corresponding to the multi-person voice, where the determined virtual scene information is the one selected the most times. In some embodiments, each user can select one or more target virtual scenes from multiple default virtual scenes and send the identification information corresponding to the one or more target virtual scenes to the network device; the network device then determines, among the selected target virtual scenes, the target virtual scene selected by users the most times as the virtual scene corresponding to the multi-person voice. Preferably, each user can select only one target virtual scene from the multiple default virtual scenes.
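Selecting the scene chosen the most times amounts to a vote count, which might look like the sketch below. The function name `pick_scene_by_votes` and the input layout are hypothetical. The text does not say how ties are resolved; breaking ties by the smallest scene identifier is an assumption made here only so that the result is deterministic.

```python
from collections import Counter


def pick_scene_by_votes(selections):
    """Return the scene id selected by the most users, or None if nobody selected.

    selections: dict of user id -> list of scene ids chosen by that user.
    """
    # Count each scene at most once per user, even if a user lists it twice.
    counts = Counter(
        scene for chosen in selections.values() for scene in set(chosen)
    )
    if not counts:
        return None
    best = max(counts.values())
    # Assumption: ties are broken by the smallest scene id.
    return min(scene for scene, n in counts.items() if n == best)
```

For instance, if User1 and User3 both select "virtual meeting room 1" and User2 selects "virtual classroom 1", the function returns "virtual meeting room 1".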

In some embodiments, step S13 includes: the network device determines, according to voice theme information corresponding to the multi-person voice, target default virtual scene information that matches the voice theme information from multiple pieces of default virtual scene information, and determines the target default virtual scene information as the virtual scene information corresponding to the multi-person voice. In some embodiments, the voice theme information may be entered by the voice-initiating user and then sent to the network device, or the voice-initiating user may select it from multiple preset default voice theme information and send the corresponding identification information (for example, theme name, theme ID, etc.) to the network device. The voice theme information represents the theme of the multi-person voice and includes, but is not limited to, "conference", "class meeting", "technology sharing", etc. In some embodiments, according to the voice theme information corresponding to the multi-person voice, the default virtual scene that matches the voice theme information among the multiple default virtual scenes is determined as the virtual scene corresponding to the multi-person voice. For example, the multiple default virtual scenes include a virtual conference room, a virtual classroom, and a virtual coffee shop; according to the voice theme information "meeting" corresponding to the multi-person voice, the default virtual scene "virtual conference room", which matches the voice theme information "meeting", is determined as the virtual scene corresponding to the multi-person voice.
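The text does not specify how a voice theme is matched to a default virtual scene. One simple possibility, shown here purely as an assumption, is a preconfigured lookup table. The table `THEME_TO_SCENE` and the function `scene_for_theme` are hypothetical; the "meeting" entry follows the example in the text, and the "class meeting" entry is an illustrative guess.

```python
# Hypothetical matching table; a deployment could use any other matching rule.
THEME_TO_SCENE = {
    "meeting": "virtual conference room",  # pairing taken from the example in the text
    "class meeting": "virtual classroom",  # illustrative guess, not from the text
}


def scene_for_theme(theme, default_scenes, theme_to_scene=THEME_TO_SCENE):
    """Return the default scene matching the theme, or None if there is no match."""
    # Normalize case and surrounding whitespace before the lookup.
    scene = theme_to_scene.get(theme.strip().lower())
    # Only return scenes that are actually offered as default scenes.
    return scene if scene in default_scenes else None
```

Called with the theme "meeting" and the default scenes "virtual conference room", "virtual classroom", and "virtual coffee shop", the function returns "virtual conference room".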

In some embodiments, step S13 includes step S16 (not shown). In step S16, the network device determines, according to user information corresponding to the multiple users, target default virtual scene information that matches the user information from multiple pieces of default virtual scene information, and determines the target default virtual scene information as the virtual scene information corresponding to the multi-person voice. In some embodiments, according to the user information corresponding to each of the multiple users, or according to the user information corresponding to the voice-initiating user among the multiple users, the default virtual scene, among the multiple default virtual scenes, that matches the user information of each user or the user information of the voice-initiating user is determined as the virtual scene corresponding to the multi-person voice.

In some embodiments, determining, according to the user information corresponding to the multiple users, the target default virtual scene information that matches the user information from the multiple pieces of default virtual scene information includes: the network device determines, according to the user information corresponding to the voice-initiating user among the multiple users, target default virtual scene information that matches that user information from the multiple pieces of default virtual scene information. For example, the multiple default virtual scenes include a virtual conference room, a virtual classroom, and a virtual coffee shop, and the user information corresponding to the voice-initiating user includes "occupation: teacher"; the default virtual scene "virtual classroom", which matches the user information "occupation: teacher", is then determined as the virtual scene corresponding to the multi-person voice.

In some embodiments, determining, according to the user information corresponding to the multiple users, the target default virtual scene information that matches the user information from the multiple pieces of default virtual scene information includes: the network device determines, according to the user information corresponding to each of the multiple users, at least one piece of default virtual scene information that matches the user information of each user from the multiple pieces of default virtual scene information, and determines the target default virtual scene information from the at least one piece of default virtual scene information, where the target default virtual scene information is the one matched by the largest number of users. In some embodiments, for each of the multiple users, a default virtual scene matching the user's user information is determined from the multiple default virtual scenes. In some embodiments, among the default virtual scenes matched in this way, the default virtual scene with the largest number of matched users is determined as the virtual scene corresponding to the multi-person voice. For example, the multiple users in the multi-person voice include User1, User2, and User3, and the multiple default virtual scenes include a virtual conference room, a virtual classroom, and a virtual coffee shop. The user information of User1 includes "occupation: teacher", and the default virtual scene matching User1 is the virtual classroom; the user information of User2 includes "occupation: student", and the default virtual scene matching User2 is also the virtual classroom; the user information of User3 includes "hobby: drinking coffee", and the default virtual scene matching User3 is the virtual coffee shop. Between the virtual classroom and the virtual coffee shop, the virtual scene "virtual classroom", which has the largest number of matched users, is determined as the virtual scene corresponding to the multi-person voice.
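The selection of the scene matched by the most users can be sketched as follows. How user information is matched to a scene is not specified in the text, so the lookup table `USER_INFO_TO_SCENE` is an assumption whose entries simply restate the pairings from the User1, User2, and User3 example; the function name `scene_by_user_info` and the tie-break by smallest scene name are likewise hypothetical.

```python
from collections import Counter

# Hypothetical matching table built from the pairings in the example above.
USER_INFO_TO_SCENE = {
    "occupation: teacher": "virtual classroom",
    "occupation: student": "virtual classroom",
    "hobby: drinking coffee": "virtual coffee shop",
}


def scene_by_user_info(user_info, default_scenes, table=USER_INFO_TO_SCENE):
    """Return the default scene matched by the most users, or None if none match.

    user_info: dict of user id -> list of user information strings.
    """
    matched = Counter()
    for items in user_info.values():
        # Each user counts at most once per scene.
        scenes = {table[i] for i in items if table.get(i) in default_scenes}
        matched.update(scenes)
    if not matched:
        return None
    best = max(matched.values())
    # Assumption: ties are broken by the smallest scene name.
    return min(scene for scene, n in matched.items() if n == best)
```

With the three users from the example, the virtual classroom is matched by two users and the virtual coffee shop by one, so the function returns "virtual classroom".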

In some embodiments, the virtual scene information includes multiple predetermined virtual positions, and step S14 includes: for each of the multiple users, the network device obtains the target predetermined virtual position corresponding to the user among the multiple predetermined virtual positions, and determines the target predetermined virtual position as the user's virtual position in the virtual scene information. In some embodiments, the virtual scene includes multiple predetermined virtual positions, each of which is presented intuitively to the user in the 2D scene image or 3D scene model corresponding to the virtual scene information; each user corresponds to one target predetermined virtual position among the multiple predetermined virtual positions, and that target predetermined virtual position is determined as the user's virtual position in the virtual scene. Preferably, each user corresponds to a different target predetermined virtual position. In some embodiments, a user's virtual position in the virtual scene can only be one of the multiple predetermined virtual positions and cannot be an arbitrary virtual position in the virtual scene. In some embodiments, the target predetermined virtual position may be selected for each user by the voice-initiating user from the multiple predetermined virtual positions, may be selected by each user for himself or herself from the multiple predetermined virtual positions, or may be determined automatically for each user from the multiple predetermined virtual positions as the predetermined virtual position that matches the user's user information.

In some embodiments, the obtaining, for each of the plurality of users, the target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions includes: for each of the plurality of users, obtaining the target predetermined virtual position designated for the user, among the plurality of predetermined virtual positions, by the voice-initiating user among the plurality of users. In some embodiments, the voice-initiating user specifies the target predetermined virtual position corresponding to each user among the plurality of predetermined virtual positions, and sends the identification information of the target predetermined virtual position corresponding to each user to the network device. Preferably, the voice-initiating user needs to select a different target predetermined virtual position for each user, and cannot select the same target predetermined virtual position for multiple users.

In some embodiments, the obtaining, for each of the plurality of users, the target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions includes: for each of the plurality of users, determining, according to the user information corresponding to the user, a target predetermined virtual position among the plurality of predetermined virtual positions, wherein the tag information of the target predetermined virtual position in the virtual scene information matches the user information corresponding to the user. In some embodiments, for each user, a target predetermined virtual position whose tag information in the virtual scene matches the user information of the user is automatically determined among the plurality of predetermined virtual positions. For example, the user information of User1 includes "occupation: Chinese teacher", the virtual scene is a virtual classroom, and the tag information corresponding to the predetermined virtual position L1 in the virtual scene is "podium"; this tag information matches the user information "occupation: Chinese teacher" of User1, and thus the predetermined virtual position L1 can be determined as the target predetermined virtual position corresponding to User1 among the plurality of predetermined virtual positions.
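The tag matching in this example can be sketched as follows. The application does not specify how tag information is compared with user information, so the keyword table `TAGS_BY_OCCUPATION_KEYWORD`, the "desk" tag and the function `match_position_by_tag` are all assumptions introduced purely for illustration.

```python
# Hypothetical rule table: which position tag suits which occupation keyword.
TAGS_BY_OCCUPATION_KEYWORD = {"teacher": "podium", "student": "desk"}

def match_position_by_tag(user_info, position_tags, taken=()):
    """Return the first free position whose tag matches the user's occupation.

    user_info: dict such as {"occupation": "Chinese teacher"}
    position_tags: dict mapping a position id to its tag in the scene
    taken: position ids that are already selected
    """
    occupation = user_info.get("occupation", "").lower()
    for keyword, tag in TAGS_BY_OCCUPATION_KEYWORD.items():
        if keyword in occupation:
            for position_id, position_tag in position_tags.items():
                if position_tag == tag and position_id not in taken:
                    return position_id
    return None  # no matching position; the caller falls back to another rule

classroom = {"L1": "podium", "L2": "desk", "L3": "desk"}
print(match_position_by_tag({"occupation": "Chinese teacher"}, classroom))  # L1
```

The optional `taken` argument lets the same sketch serve the case in which only currently unselected positions may be assigned.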

In some embodiments, the obtaining, for each user in the plurality of users, the target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions includes step S17 (not shown), step S18 (not shown) and step S19 (not shown). In step S17, the network device generates virtual location request information and sends it to each of the plurality of users, wherein the virtual location request information includes the virtual scene information; in step S18, the network device receives feedback information about the virtual location request information sent by at least one of the multiple users, wherein the feedback information sent by each of the at least one user is used to indicate the target predetermined virtual position selected by that user from the multiple predetermined virtual positions; in step S19, the network device determines, for each user in the plurality of users, according to the feedback information, the target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions. In some embodiments, virtual location request information including the virtual scene information is sent to each user. After receiving the virtual location request information, the user equipment of each user presents the 2D scene image or 3D scene model corresponding to the virtual scene information, with the plurality of predetermined virtual positions presented in it; each user selects a target predetermined virtual position from the plurality of predetermined virtual positions, and the identification information of the selected target predetermined virtual position is included in the feedback information and sent to the network device.
After receiving the feedback information sent by a certain user, the network device can obtain the target predetermined virtual position selected by that user among the plurality of predetermined virtual positions. Preferably, each user can only select a different target predetermined virtual position, and multiple users cannot select the same target predetermined virtual position. In some embodiments, the virtual location request information corresponds to a feedback period. After the feedback period is reached, for each user among the plurality of users who has not yet given feedback, the voice-initiating user may select a corresponding target predetermined virtual position from the at least one predetermined virtual position that is currently unselected, or the network device may automatically assign a corresponding target predetermined virtual position to each such user from the at least one predetermined virtual position that is currently unselected.
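The bookkeeping behind steps S18 and S19 can be sketched as a small in-memory structure on the network device. This is an illustrative sketch only: the class `PositionSelection` and its methods are hypothetical, message transport and the feedback timer are omitted, and the fallback shown (assigning free positions in their listed order) is just one of the options described above.

```python
class PositionSelection:
    """Tracks which predetermined virtual position each user has chosen."""

    def __init__(self, users, positions):
        self.users = list(users)
        self.positions = list(positions)
        self.chosen = {}  # user -> position id

    def handle_feedback(self, user, position):
        """Record a user's choice; reject unknown or already taken positions."""
        if user not in self.users or position not in self.positions:
            return False
        if user in self.chosen or position in self.chosen.values():
            return False
        self.chosen[user] = position
        return True

    def pending_users(self):
        """Users who have not yet given feedback."""
        return [u for u in self.users if u not in self.chosen]

    def free_positions(self):
        """Predetermined positions that nobody has selected yet."""
        taken = set(self.chosen.values())
        return [p for p in self.positions if p not in taken]

    def assign_remaining(self):
        """After the feedback period: give each pending user a free position."""
        for user, position in zip(self.pending_users(), self.free_positions()):
            self.chosen[user] = position
        return dict(self.chosen)

selection = PositionSelection(["User1", "User2", "User3"], ["L1", "L2", "L3"])
selection.handle_feedback("User1", "L2")   # accepted
selection.handle_feedback("User2", "L2")   # rejected, L2 is already taken
print(selection.assign_remaining())
```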

In some embodiments, the method further includes: after receiving the feedback information sent by a first user among the multiple users, the network device generates first prompt information corresponding to the feedback information, and sends the first prompt information to the other users among the plurality of users who have not yet given feedback, so as to prompt that the first target predetermined virtual position indicated by the feedback information is not selectable. In some embodiments, after receiving the feedback information indicating that the first user has selected the first target predetermined virtual position from the multiple predetermined virtual positions, the network device generates prompt information corresponding to the feedback information (for example, "the first user has selected the first target predetermined virtual position"), and sends it to each other user among the multiple users who has not yet given feedback, so as to prompt each such user that the first target predetermined virtual position cannot be selected. In some embodiments, after receiving the prompt information (for example, "the first user has selected the first target predetermined virtual position"), the user equipment corresponding to each other user can set the first target predetermined virtual position to an unselectable state in the 2D scene image or 3D scene model corresponding to the virtual scene information.

In some embodiments, the method further includes: after receiving the feedback information sent by the first user among the multiple users, the network device generates second prompt information corresponding to the feedback information, and sends the second prompt information to the other users among the plurality of users except the first user, to prompt that the first target predetermined virtual position indicated by the feedback information has been selected by the first user. In some embodiments, the network device sends prompt information (e.g., "the first user has selected the first target predetermined virtual position") to each of the plurality of users except the first user, to prompt each of those users that the first target predetermined virtual position has been selected by the first user, so that each user can know the virtual positions of the other users in the virtual scene. In some embodiments, identification information (e.g., user name, user ID, etc.) of the first user is presented at the first target predetermined virtual position in the 2D scene image or 3D scene model corresponding to the virtual scene information.
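The two kinds of prompt information differ mainly in who receives them, which can be sketched as below. The function `prompt_recipients` is a hypothetical helper written for illustration; delivery of the prompts is out of scope.

```python
def prompt_recipients(all_users, users_with_feedback, first_user):
    """Return (first_prompt_recipients, second_prompt_recipients).

    First prompt: other users who have not yet given feedback, telling them
    the position is no longer selectable.
    Second prompt: every user except the first user, telling them who
    selected the position.
    """
    others = [u for u in all_users if u != first_user]
    first_prompt = [u for u in others if u not in users_with_feedback]
    second_prompt = others
    return first_prompt, second_prompt

first, second = prompt_recipients(
    ["User1", "User2", "User3", "User4"],
    users_with_feedback={"User1", "User2"},
    first_user="User1",
)
print(first)   # ['User3', 'User4']
print(second)  # ['User2', 'User3', 'User4']
```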

In some embodiments, the method further includes: the network device receives invitation request information sent by a second user of the at least one user, wherein the second user has selected a second target predetermined virtual position among the plurality of predetermined virtual positions, and the invitation request information is used to invite a third user, among the multiple users, who has not yet given feedback to select a predetermined virtual position near the second target predetermined virtual position; and sends the invitation request information to the third user, to prompt the third user to select an unselected predetermined virtual position near the second target predetermined virtual position as the target predetermined virtual position corresponding to the third user. In some embodiments, the second user has selected the second target predetermined virtual position among the plurality of predetermined virtual positions as his or her own virtual position in the virtual scene; in response to an invitation trigger operation performed by the second user for a third user who has not yet given feedback, invitation request information for inviting the third user to select a predetermined virtual position near the second target predetermined virtual position is generated and sent to the network device. In some embodiments, it is first detected whether the third user has already selected a predetermined virtual position for himself or herself; the invitation trigger operation can be performed for the third user only if the third user has not yet selected one.
In some embodiments, after receiving the invitation request information for the third user, the network device forwards the invitation request information to the third user, so as to prompt the third user to select an unselected predetermined virtual position near the second target predetermined virtual position as the target predetermined virtual position corresponding to the third user. In some embodiments, after the prompt information (for example, "the first user has selected the first target predetermined virtual position") is received, at least one unselected predetermined virtual position near the second target predetermined virtual position may be set to a special display state (e.g., highlighted) in the 2D scene image or 3D scene model corresponding to the virtual scene information, to guide the third user to select one of the at least one predetermined virtual position as the target predetermined virtual position.
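Finding the positions to highlight can be sketched as a distance filter. The application does not define "near", so this sketch assumes that each predetermined virtual position has 2D scene coordinates and that "near" means within a chosen radius; `nearby_free_positions` and its parameters are hypothetical.

```python
import math

def nearby_free_positions(positions, taken, anchor_id, radius):
    """Return ids of unselected positions within `radius` of the anchor position.

    positions: dict mapping a position id to (x, y) scene coordinates
    taken: ids of positions that are already selected
    anchor_id: the second target predetermined virtual position
    """
    anchor = positions[anchor_id]
    return sorted(
        position_id
        for position_id, xy in positions.items()
        if position_id != anchor_id
        and position_id not in taken
        and math.dist(xy, anchor) <= radius
    )

scene = {"L1": (0, 0), "L2": (1, 0), "L3": (0, 1), "L4": (5, 5)}
print(nearby_free_positions(scene, taken={"L1", "L3"}, anchor_id="L1", radius=1.5))  # ['L2']
```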

In some embodiments, the method further includes: after a predetermined feedback period corresponding to the virtual location request information is reached, the network device determines, for each user among the plurality of users who has not yet given feedback, a target predetermined virtual position corresponding to the user among at least one predetermined virtual position that is currently unselected among the plurality of predetermined virtual positions, and determines the target predetermined virtual position as the virtual position of the user in the virtual scene information. In some embodiments, the virtual location request information corresponds to a predetermined feedback period (for example, 5 minutes). After the feedback period is reached, for each user among the multiple users who has not yet given feedback, the voice-initiating user may select a corresponding target predetermined virtual position from the at least one predetermined virtual position that is currently unselected, or the network device may automatically assign a corresponding target predetermined virtual position to each such user from the at least one predetermined virtual position that is currently unselected. In some embodiments, different predetermined virtual positions correspond to different priorities in the virtual scene according to their respective tag information, and target predetermined virtual positions can be automatically assigned to the users who have not yet given feedback in descending order of priority.
For example, if the virtual scene is a virtual auditorium, the priority of the predetermined virtual positions whose tag information is "first row" is higher than the priority of the predetermined virtual positions whose tag information is "second row", so the currently unselected predetermined virtual positions whose tag information is "first row" are assigned first to the users who have not yet given feedback as their corresponding target predetermined virtual positions.
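The priority-ordered assignment can be sketched as a stable sort over the free positions. The numeric priorities and the function `assign_by_priority` are assumptions for illustration; the application only states that priorities derive from the tag information.

```python
def assign_by_priority(pending_users, free_positions, tag_priority):
    """Assign free positions to pending users, highest-priority tag first.

    pending_users: users who have not yet given feedback, in assignment order
    free_positions: dict mapping an unselected position id to its tag
    tag_priority: dict mapping a tag to a number (larger means higher priority)
    """
    ordered = sorted(
        free_positions,
        key=lambda position_id: tag_priority.get(free_positions[position_id], 0),
        reverse=True,  # the sort stays stable, so equal priorities keep their order
    )
    return dict(zip(pending_users, ordered))

free = {"B1": "second row", "A1": "first row", "A2": "first row"}
priority = {"first row": 2, "second row": 1}
print(assign_by_priority(["User4", "User5"], free, priority))
# {'User4': 'A1', 'User5': 'A2'}
```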

In some embodiments, the determining, for each user among the plurality of users who has not yet given feedback, a target predetermined virtual position corresponding to the user among at least one predetermined virtual position that is currently unselected among the plurality of predetermined virtual positions includes: determining hotspot location area information in the virtual scene information according to the virtual position, in the virtual scene information, of at least one user among the multiple users who has already given feedback; and, for each user among the multiple users who has not yet given feedback, determining an unselected predetermined virtual position in the hotspot location area information as the virtual position of the user in the virtual scene information. In some embodiments, after the feedback period is reached, a hotspot location area in which user virtual positions are densely distributed is determined according to the current distribution, in the virtual scene, of the virtual positions of the users who have already given feedback. Each user who has not yet given feedback is then preferentially assigned a predetermined virtual position from the one or more unselected predetermined virtual positions in the hotspot location area as his or her target predetermined virtual position; if all predetermined virtual positions in the hotspot location area have been selected, a predetermined virtual position is automatically assigned to each such user from the other unselected predetermined virtual positions in the virtual scene as his or her target predetermined virtual position.
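One possible reading of the hotspot rule is sketched below. The application does not say how the hotspot location area is computed; this sketch assumes 2D position coordinates and treats the hotspot as a circle of a chosen radius around the centroid of the already selected positions. `assign_near_hotspot` and all of its parameters are hypothetical.

```python
import math

def assign_near_hotspot(positions, chosen, pending_users, radius):
    """Assign free positions, preferring those inside the assumed hotspot area.

    positions: dict mapping a position id to (x, y) scene coordinates
    chosen: dict mapping a user who already gave feedback to a position id
    pending_users: users who have not yet given feedback
    radius: assumed hotspot radius around the centroid of chosen positions
    """
    taken = set(chosen.values())
    free = [pid for pid in positions if pid not in taken]
    if taken:
        cx = sum(positions[p][0] for p in taken) / len(taken)
        cy = sum(positions[p][1] for p in taken) / len(taken)

        def dist(pid):
            return math.dist(positions[pid], (cx, cy))

        hotspot = sorted((p for p in free if dist(p) <= radius), key=dist)
        rest = sorted((p for p in free if dist(p) > radius), key=dist)
        free = hotspot + rest  # fall back to the rest once the hotspot is full
    return dict(zip(pending_users, free))

scene = {"L1": (0, 0), "L2": (1, 0), "L3": (0, 1), "L4": (9, 9), "L5": (2, 0)}
already = {"User1": "L1", "User2": "L2"}
print(assign_near_hotspot(scene, already, ["User3", "User4", "User5"], radius=2))
```

With these inputs, User3 and User4 receive the two free positions inside the assumed hotspot, and User5 falls back to the remaining position outside it.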

In some embodiments, the determining, for each user among the plurality of users who has not yet given feedback, a target predetermined virtual position corresponding to the user among at least one predetermined virtual position that is currently unselected among the plurality of predetermined virtual positions includes: for each user among the plurality of users who has not yet given feedback, determining a target predetermined virtual position corresponding to the user among the at least one predetermined virtual position that is currently unselected among the plurality of predetermined virtual positions, wherein the tag information of the target predetermined virtual position in the virtual scene information matches the user information corresponding to the user. In some embodiments, for each user who has not yet given feedback, a predetermined virtual position whose tag information in the virtual scene matches the user information of the user is determined, from the at least one predetermined virtual position that is currently unselected, as the target predetermined virtual position corresponding to the user. For example, the user information of User1, who has not yet given feedback, includes "occupation: teacher", the virtual scene is a virtual classroom, and the tag information corresponding to the predetermined virtual position L1, among the at least one predetermined virtual position that is currently unselected in the virtual scene, is "podium"; this tag information matches the user information "occupation: teacher" of User1, so that the predetermined virtual position L1 can be automatically assigned to User1 as the corresponding target predetermined virtual position.

FIG. 2 shows a structure diagram of a network device for playing voice information in a multi-person voice according to an embodiment of the present application. The device includes a one-one module 11 and a one-two module 12. The one-one module 11 is configured to, for the target user among the multiple users participating in the multi-person voice, determine the virtual position information of the other users among the multiple users in the virtual sound field corresponding to the target user, and generate, according to the virtual position information, virtual sound field information corresponding to the target user; the one-two module 12 is configured to send the virtual sound field information to the user equipment corresponding to the target user, so that the user equipment plays the voice information of each of the other users according to the virtual position information of that user in the virtual sound field.

The one-one module 11 is configured to, for the target user among the multiple users participating in the multi-person voice, determine the virtual position information of the other users among the multiple users in the virtual sound field corresponding to the target user, and generate, according to the virtual position information, virtual sound field information corresponding to the target user. In some embodiments, the target user is each of the multiple users participating in the multi-person voice. In some embodiments, the virtual sound field is a relative coordinate system, which may be a two-dimensional plane coordinate system or a three-dimensional space coordinate system. Each user corresponds to one virtual sound field; a virtual position is the coordinate point corresponding to another user in the user's virtual sound field, the virtual position information is the coordinate value of that coordinate point, and the virtual position of the user himself or herself in the user's own virtual sound field is the coordinate origin. For example, in the virtual sound field of User1, the virtual position information corresponding to User1 is (0,0) and the virtual position information corresponding to User2 is (0,1); in the virtual sound field of User2, the virtual position information corresponding to User1 is (0,-1) and the virtual position information corresponding to User2 is (0,0). In some embodiments, the coordinate axis unit of the virtual sound field corresponding to a user is a predetermined distance interval, for example, 1 cm, 10 cm, or 1 meter, and the coordinate axis directions are predetermined directions relative to the user, for example, the positive direction of the X-axis is to the right of the user, and the positive direction of the Y-axis is in front of the user.
In some embodiments, relative distance information and relative direction information between two users can be obtained from the virtual position information corresponding to one user in the virtual sound field of the other user, together with the coordinate axis unit and coordinate axis directions of that virtual sound field. For example, in the virtual sound field of User1, the positive direction of the X-axis is to the right of User1, the positive direction of the Y-axis is in front of User1, the unit of the X-axis and the Y-axis is 1 meter, the virtual position information corresponding to User1 is (0,0), and the virtual position information corresponding to User2 is (1,0), so it can be concluded that User2 is 1 meter to the right of User1. In some embodiments, for each user in the multi-person voice, the virtual sound field information corresponding to the user includes, but is not limited to, the coordinate axis directions and coordinate axis unit of the user's virtual sound field, and the virtual position information (that is, the coordinate value of the coordinate point) corresponding to each other user in the user's virtual sound field.
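The per-user virtual sound field can be derived from shared scene positions by re-centring the coordinates on the target user. The sketch below is a minimal illustration under one assumption that the text does not state: all users share the same axis directions, so the conversion is a pure translation with no rotation. The function `sound_field_for` is hypothetical.

```python
def sound_field_for(target, scene_positions):
    """Return the other users' coordinates relative to the target user.

    scene_positions maps each user to (x, y) in the shared virtual scene.
    The target user sits at the origin of his or her own virtual sound field,
    so every other user's coordinates are shifted by the target's position.
    """
    tx, ty = scene_positions[target]
    return {
        user: (x - tx, y - ty)
        for user, (x, y) in scene_positions.items()
        if user != target
    }

scene = {"User1": (0, 0), "User2": (0, 1)}
print(sound_field_for("User1", scene))  # {'User2': (0, 1)}
print(sound_field_for("User2", scene))  # {'User1': (0, -1)}
```

The two printed results reproduce the User1 and User2 coordinate values used in the example of the virtual sound field above.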

The one-two module 12 is configured to send the virtual sound field information to the user equipment corresponding to the target user, so that the user equipment plays the voice information of each of the other users according to the virtual position information of that user in the virtual sound field. In some embodiments, for each user in the multi-person voice, the voice information of the other users may be sent from the user equipment corresponding to the other users to the user equipment corresponding to the user via the network device, or may be sent from the user equipment corresponding to the other users to the user equipment corresponding to the user through a p2p connection established between the two user equipments. In some embodiments, for each user in the multi-person voice, when voice information sent by another user is received, relative distance information and relative direction information of the other user relative to the user are obtained according to the virtual position information of the other user in the user's virtual sound field and the coordinate axis directions and coordinate axis unit of the user's virtual sound field, and the voice information is played according to the relative distance information and relative direction information. For example, in the virtual sound field of User1, the positive direction of the X-axis is to the right of User1, the positive direction of the Y-axis is in front of User1, the unit of the X-axis and the Y-axis is 1 meter, the virtual position information corresponding to User1 is (0,0), and the virtual position information corresponding to User2 is (0,-2); it can be concluded that User2 is 2 meters behind User1, and the voice information is played according to this relative distance information and relative direction information.
In some embodiments, the voice information may be played according to the relative distance information and the relative direction information by filtering and delaying the voice information through a head-related transfer function (HRTF) and then outputting it to the speaker of the user equipment for playback. In this way, in the multi-person voice, the user can clearly and accurately distinguish each person's voice when multiple other users are speaking at the same time, and can intuitively and quickly know which other user is currently speaking, which provides great convenience for users in the multi-person voice.
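Deriving the relative distance and direction from a coordinate value can be sketched as follows. The second function is not an HRTF: it is a much simpler constant-power stereo pan with distance attenuation, substituted here only to show how the derived values could drive playback. Both function names and the attenuation rule are assumptions, and a plain pan cannot distinguish front from back.

```python
import math

def relative_distance_and_azimuth(position, unit_m=1.0):
    """Return (distance in meters, azimuth in degrees) of another user.

    position: (x, y) in the listener's virtual sound field, with +X to the
    listener's right and +Y in front. Azimuth is measured clockwise from
    the front: 0 = front, 90 = right, 180 = behind, 270 = left.
    """
    x, y = position
    distance = math.hypot(x, y) * unit_m
    azimuth = math.degrees(math.atan2(x, y)) % 360.0
    return distance, azimuth

def stereo_gains(position, unit_m=1.0):
    """Return (left_gain, right_gain) using constant-power panning.

    Stand-in for HRTF filtering; gains fall off as 1/distance beyond 1 meter.
    """
    distance, azimuth = relative_distance_and_azimuth(position, unit_m)
    pan = math.sin(math.radians(azimuth))   # -1 = fully left, +1 = fully right
    angle = (pan + 1.0) * math.pi / 4.0     # map the pan to 0..pi/2
    attenuation = 1.0 / max(distance, 1.0)
    return math.cos(angle) * attenuation, math.sin(angle) * attenuation

print(relative_distance_and_azimuth((0, -2)))  # User2 is 2 meters behind User1
print(stereo_gains((1, 0)))                    # a user 1 meter to the right
```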

In some embodiments, the determining, for the target user among the multiple users participating in the multi-person voice, the virtual position information of the other users among the multiple users in the virtual sound field corresponding to the target user includes a one-three module 13 (not shown), a one-four module 14 (not shown) and a one-five module 15 (not shown). The one-three module 13 is configured to determine the virtual scene information corresponding to the multi-person voice; the one-four module 14 is configured to determine, according to the virtual scene information, the virtual position corresponding to each of the multiple users; the one-five module 15 is configured to determine the virtual position information of the other users in the virtual sound field corresponding to the target user according to the virtual position corresponding to the target user and the virtual positions corresponding to the other users. Here, the specific implementations of the one-three module 13, the one-four module 14 and the one-five module 15 are the same as or similar to the embodiments of the steps S13, S14 and S15 in FIG. 1, so they are not repeated here, but are incorporated herein by reference.

In some embodiments, the one-three module 13 is configured to: obtain identification information corresponding to the target virtual scene information selected, from a plurality of default virtual scene information, by the voice-initiating user among the plurality of users, and determine the target virtual scene information as the virtual scene information corresponding to the multi-person voice. Here, the related operations are the same as or similar to the embodiment shown in FIG. 1, so they are not repeated here, but are incorporated herein by reference.

In some embodiments, the one-three module 13 is configured to: obtain at least one target virtual scene information selected by at least one user among the plurality of users from a plurality of default virtual scene information, and determine, from the at least one target virtual scene information, the virtual scene information corresponding to the multi-person voice, wherein the determined virtual scene information is the one selected the most times. Here, the related operations are the same as or similar to the embodiment shown in FIG. 1, so they are not repeated here, but are incorporated herein by reference.

In some embodiments, the one-three module 13 is configured to: determine, according to the voice theme information corresponding to the multi-person voice, target default virtual scene information matching the voice theme information from a plurality of default virtual scene information, and determine the target default virtual scene information as the virtual scene information corresponding to the multi-person voice. Here, the related operations are the same as or similar to the embodiment shown in FIG. 1, so they are not repeated here, but are incorporated herein by reference.

In some embodiments, the one-three module 13 includes a one-six module 16 (not shown). The one-six module 16 is configured to determine, according to the user information corresponding to the plurality of users, target default virtual scene information matching the user information from a plurality of default virtual scene information, and determine the target default virtual scene information as the virtual scene information corresponding to the multi-person voice. Here, the specific implementation of the one-six module 16 is the same as or similar to the embodiment of step S16 in FIG. 1, so it is not repeated here, but is incorporated herein by reference.

In some embodiments, the determining, according to the user information corresponding to the multiple users, the target default virtual scene information matching the user information from the multiple default virtual scene information includes: the network device determines, according to the user information corresponding to the voice-initiating user among the multiple users, target default virtual scene information that matches the user information from the plurality of default virtual scene information. Here, the related operations are the same as or similar to the embodiment shown in FIG. 1, so they are not repeated here, but are incorporated herein by reference.

In some embodiments, the determining, according to the user information corresponding to the multiple users, the target default virtual scene information matching the user information from the multiple default virtual scene information includes: the network device determines, according to the user information corresponding to each of the plurality of users, at least one default virtual scene information matching the user information corresponding to each of the plurality of users from the plurality of default virtual scene information, and determines target default virtual scene information from the at least one default virtual scene information, wherein the number of users matching the target default virtual scene information is the largest. Here, the related operations are the same as or similar to the embodiment shown in FIG. 1, so they are not repeated here, but are incorporated herein by reference.

In some embodiments, the virtual scene information includes a plurality of predetermined virtual positions; wherein, the one-four module 14 is configured to: for each user of the plurality of users, obtain the target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions, and determine the target predetermined virtual position as the virtual position of the user in the virtual scene information. Here, the related operations are the same as or similar to the embodiment shown in FIG. 1, so they are not repeated here, but are incorporated herein by reference.

In some embodiments, the obtaining, for each of the plurality of users, the target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions includes: for each of the plurality of users, obtaining the target predetermined virtual position designated for the user, among the plurality of predetermined virtual positions, by the voice-initiating user among the plurality of users. Here, the related operations are the same as or similar to the embodiment shown in FIG. 1, so they are not repeated here, but are incorporated herein by reference.

In some embodiments, the obtaining, for each of the plurality of users, the target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions includes: for each of the plurality of users, determining, according to the user information corresponding to the user, a target predetermined virtual position among the plurality of predetermined virtual positions, wherein the tag information of the target predetermined virtual position in the virtual scene information matches the user information corresponding to the user. Here, the related operations are the same as or similar to the embodiment shown in FIG. 1, so they are not repeated here, but are incorporated herein by reference.

In some embodiments, the obtaining, for each user in the plurality of users, the target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions includes a one-seven module 17 (not shown), a one-eight module 18 (not shown) and a one-nine module 19 (not shown). The one-seven module 17 is configured to generate virtual location request information and send it to each of the multiple users, wherein the virtual location request information includes the virtual scene information; the one-eight module 18 is configured to receive feedback information about the virtual location request information sent by at least one of the multiple users, wherein the feedback information sent by each of the at least one user is used to indicate the target predetermined virtual position selected by that user from the multiple predetermined virtual positions; the one-nine module 19 is configured to determine, for each user in the plurality of users, according to the feedback information, the target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions. Here, the specific implementations of the one-seven module 17, the one-eight module 18 and the one-nine module 19 are the same as or similar to the embodiments of the steps S17, S18 and S19 in FIG. 1, so they are not repeated here, but are incorporated herein by reference.

In some embodiments, the device is further configured to: after receiving the feedback information sent by the first user among the multiple users, generate first prompt information corresponding to the feedback information, and send the first prompt information to the other users among the plurality of users who have not yet given feedback, to prompt that the first target predetermined virtual position indicated by the feedback information is not selectable. Here, the related operations are the same as or similar to the embodiment shown in FIG. 1, so they are not repeated here, but are incorporated herein by reference.

In some embodiments, the device is further configured to: after receiving the feedback information sent by a first user among the multiple users, generate second prompt information corresponding to the feedback information, and send the second prompt information to the users among the plurality of users other than the first user, so as to prompt them that the first target predetermined virtual position indicated by the feedback information has been selected by the first user. Here, the related operations are the same as or similar to the embodiment shown in FIG. 1, so they are not repeated here, but are incorporated herein by reference.
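
The two prompt variants differ only in who receives them: the first prompt goes to users who have not yet given feedback, the second to every user except the first user. A minimal sketch of that difference follows; the function name `prompt_recipients` and the dictionary keys are hypothetical and serve only as illustration.

```python
from typing import Dict, List, Set


def prompt_recipients(
    all_users: List[str],
    responded: Set[str],
    first_user: str,
) -> Dict[str, List[str]]:
    """Return the recipients of each prompt after `first_user` gives feedback.

    "first_prompt": users who have not yet given feedback
                    (the chosen position is no longer selectable).
    "second_prompt": every user except `first_user`
                     (the chosen position was selected by first_user).
    """
    first = [u for u in all_users if u not in responded and u != first_user]
    second = [u for u in all_users if u != first_user]
    return {"first_prompt": first, "second_prompt": second}


users = ["alice", "bob", "carol", "dave"]
recipients = prompt_recipients(users, responded={"alice", "bob"}, first_user="bob")
print(recipients["first_prompt"])   # ['carol', 'dave']
print(recipients["second_prompt"])  # ['alice', 'carol', 'dave']
```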

In some embodiments, the device is further configured to: receive invitation request information sent by a second user of the at least one user, wherein the second user has selected a second target predetermined virtual position from the plurality of predetermined virtual positions, and the invitation request information is used to invite a third user among the multiple users, who has not yet given feedback, to select a predetermined virtual position near the second target predetermined virtual position; and send the invitation request information to the third user, so as to prompt the third user to select an unselected predetermined virtual position near the second target predetermined virtual position as the target predetermined virtual position corresponding to the third user. Here, the related operations are the same as or similar to the embodiment shown in FIG. 1, so they are not repeated here, but are incorporated herein by reference.
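
One way to present the invited third user with candidate positions is to list the unselected predetermined positions close to the inviter's position. The sketch below assumes that positions are 2D coordinates and that "near" means within a given radius; neither assumption comes from the present application, and the name `nearby_free_positions` is hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def nearby_free_positions(
    positions: Dict[str, Point],
    selected: Set[str],
    anchor_id: str,
    radius: float,
) -> List[str]:
    """List unselected positions within `radius` of the anchor, nearest first."""
    ax, ay = positions[anchor_id]
    candidates = []
    for pid, (x, y) in positions.items():
        if pid == anchor_id or pid in selected:
            continue  # skip the inviter's own position and taken positions
        distance = math.hypot(x - ax, y - ay)
        if distance <= radius:
            candidates.append((distance, pid))
    return [pid for _, pid in sorted(candidates)]


layout = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 2.0), "D": (5.0, 5.0)}
print(nearby_free_positions(layout, selected={"A"}, anchor_id="A", radius=2.5))  # ['B', 'C']
```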

In some embodiments, the device is further configured to: after a predetermined feedback period corresponding to the virtual location request information has elapsed, for each user among the plurality of users who has not yet given feedback, determine a target predetermined virtual position corresponding to the user from at least one currently unselected predetermined virtual position among the plurality of predetermined virtual positions, and determine the target predetermined virtual position as the virtual position of the user in the virtual scene information. Here, the related operations are the same as or similar to the embodiment shown in FIG. 1, so they are not repeated here, but are incorporated herein by reference.
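
The fallback after the feedback period can be sketched as a simple pass over the users who have no position yet. The text only requires that the assigned position is currently unselected; handing out free positions in list order, as well as the name `assign_after_timeout`, are assumptions made for illustration.

```python
from typing import Dict, List


def assign_after_timeout(
    all_users: List[str],
    predetermined_positions: List[str],
    selected: Dict[str, str],
) -> Dict[str, str]:
    """Give each user without feedback one of the still-unselected positions.

    `selected` maps users who already gave feedback to their positions.
    Assumption: free positions are handed out in list order.
    """
    taken = set(selected.values())
    free = [p for p in predetermined_positions if p not in taken]
    result = dict(selected)
    for user in all_users:
        if user in result:
            continue  # this user already chose a position
        if not free:
            raise ValueError("no unselected predetermined position left")
        result[user] = free.pop(0)
    return result


members = ["alice", "bob", "carol", "dave"]
chairs = ["seat-1", "seat-2", "seat-3", "seat-4"]
final = assign_after_timeout(members, chairs, {"alice": "seat-3"})
print(final)  # {'alice': 'seat-3', 'bob': 'seat-1', 'carol': 'seat-2', 'dave': 'seat-4'}
```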

In some embodiments, determining, for each user among the plurality of users who has not yet given feedback, a target predetermined virtual position corresponding to the user from at least one currently unselected predetermined virtual position among the plurality of predetermined virtual positions includes: determining hotspot location area information in the virtual scene information according to the virtual position, in the virtual scene information, of at least one user among the plurality of users who has already given feedback; and, for each user among the plurality of users who has not yet given feedback, determining an unselected predetermined virtual position within the hotspot location area as the virtual position of the user in the virtual scene information. Here, the related operations are the same as or similar to the embodiment shown in FIG. 1, so they are not repeated here, but are incorporated herein by reference.
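
The text does not define how the hotspot location area is computed. The sketch below makes one possible assumption: the hotspot is a circle of a given radius around the centroid of the positions already selected, and users without feedback are placed on the free positions inside it, nearest to the centroid first. The 2D coordinates, the radius parameter and the name `assign_in_hotspot` are all hypothetical.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def assign_in_hotspot(
    positions: Dict[str, Point],
    selected: Dict[str, str],
    pending_users: List[str],
    radius: float,
) -> Dict[str, str]:
    """Place pending users on free positions inside the hotspot area.

    Assumption: the hotspot is the circle of `radius` around the centroid
    of the already selected positions. Users for whom no free position
    remains inside the hotspot are left out of the result.
    """
    if not selected:
        raise ValueError("a hotspot needs at least one user who has given feedback")
    chosen = [positions[pid] for pid in selected.values()]
    cx = sum(x for x, _ in chosen) / len(chosen)
    cy = sum(y for _, y in chosen) / len(chosen)
    taken = set(selected.values())
    in_hotspot = sorted(
        (math.hypot(x - cx, y - cy), pid)
        for pid, (x, y) in positions.items()
        if pid not in taken and math.hypot(x - cx, y - cy) <= radius
    )
    return {user: pid for user, (_, pid) in zip(pending_users, in_hotspot)}


room = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0), "D": (2.0, 0.0), "E": (9.0, 9.0)}
placed = assign_in_hotspot(room, {"alice": "A", "bob": "B"}, ["carol", "dave", "erin"], radius=2.0)
print(placed)  # {'carol': 'C', 'dave': 'D'}
```

Here the hypothetical user "erin" stays unassigned because position E lies outside the hotspot; a real system would need a further fallback for such users.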

In some embodiments, determining, for each user among the plurality of users who has not yet given feedback, a target predetermined virtual position corresponding to the user from at least one currently unselected predetermined virtual position among the plurality of predetermined virtual positions includes: for each user among the plurality of users who has not yet given feedback, determining a target predetermined virtual position corresponding to the user from the at least one currently unselected predetermined virtual position, wherein the tag information of the target predetermined virtual position in the virtual scene information matches the user information corresponding to the user. Here, the related operations are the same as or similar to the embodiment shown in FIG. 1, so they are not repeated here, but are incorporated herein by reference.
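
How tag information is matched against user information is not specified. As an illustration only, the sketch below treats both as sets of strings and picks the unselected position that shares the most tags with the user, requiring at least one shared tag. The set representation, the overlap rule and the name `match_position_by_tags` are assumptions.

```python
from typing import Dict, Optional, Set


def match_position_by_tags(
    position_tags: Dict[str, Set[str]],
    taken: Set[str],
    user_tags: Set[str],
) -> Optional[str]:
    """Return the free position whose tags overlap most with the user's tags.

    Returns None when no free position shares a tag with the user.
    Ties are broken by position identifier so the result is deterministic.
    """
    best_id: Optional[str] = None
    best_overlap = 0
    for pid in sorted(position_tags):
        if pid in taken:
            continue  # already selected by another user
        overlap = len(position_tags[pid] & user_tags)
        if overlap > best_overlap:
            best_id, best_overlap = pid, overlap
    return best_id


tags = {
    "head-of-table": {"host", "manager"},
    "side-1": {"engineer"},
    "side-2": {"engineer", "guest"},
}
print(match_position_by_tags(tags, taken={"side-1"}, user_tags={"engineer", "guest"}))  # side-2
print(match_position_by_tags(tags, taken=set(), user_tags={"intern"}))  # None
```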

In some embodiments, as shown in FIG. 3, system 300 can function as any of the devices in each of the described embodiments. In some embodiments, system 300 may include one or more computer-readable media (e.g., system memory or NVM/storage device 320) having instructions, and one or more processors (e.g., processor(s) 305) coupled to the one or more computer-readable media and configured to execute the instructions to implement modules that perform the actions described herein.

For one embodiment, the system control module 310 may include any suitable interface controller to provide any suitable interface to at least one of the processor(s) 305 and/or to any suitable device or component in communication with the system control module 310.

The system control module 310 may include a memory controller module 330 to provide an interface to the system memory 315 . The memory controller module 330 may be a hardware module, a software module, and/or a firmware module.

System memory 315 may be used, for example, to load and store data and/or instructions for system 300. For one embodiment, system memory 315 may include any suitable volatile memory, e.g., suitable DRAM. In some embodiments, system memory 315 may include double data rate fourth-generation synchronous dynamic random access memory (DDR4 SDRAM).

For one embodiment, system control module 310 may include one or more input/output (I/O) controllers to provide interfaces to NVM/storage device 320 and communication interface(s) 325 .

For example, NVM/storage device 320 may be used to store data and/or instructions. NVM/storage device 320 may include any suitable non-volatile memory (e.g., flash memory) and/or may include any suitable non-volatile storage device(s) (e.g., one or more hard disk drives (HDD), one or more compact disc (CD) drives and/or one or more digital versatile disc (DVD) drives).

NVM/storage device 320 may include storage resources that are physically part of the device on which system 300 is installed, or it may be accessed by the device without necessarily being part of the device. For example, the NVM/storage device 320 is accessible via the communication interface(s) 325 over a network.

Communication interface(s) 325 may provide an interface for system 300 to communicate over one or more networks and/or with any other suitable device. System 300 may wirelessly communicate with one or more components of a wireless network in accordance with any of one or more wireless network standards and/or protocols.

For one embodiment, at least one of the processor(s) 305 may be packaged with the logic of one or more controllers of the system control module 310 (e.g., the memory controller module 330). For one embodiment, at least one of the processor(s) 305 may be packaged with logic of one or more controllers of the system control module 310 to form a system-in-package (SiP). For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with the logic of one or more controllers of the system control module 310. For one embodiment, at least one of the processor(s) 305 may be integrated on the same die with logic of one or more controllers of the system control module 310 to form a system on a chip (SoC).

In various embodiments, system 300 may be, but is not limited to, a server, workstation, desktop computing device, or mobile computing device (e.g., laptop computing device, handheld computing device, tablet computer, netbook, etc.). In various embodiments, system 300 may have more or fewer components and/or different architectures. For example, in some embodiments, system 300 includes one or more cameras, keyboards, liquid crystal display (LCD) screens (including touchscreen displays), non-volatile memory ports, multiple antennas, graphics chips, application-specific integrated circuits (ASICs) and speakers.

The present application also provides a computer-readable storage medium, where the computer-readable storage medium stores computer code, and when the computer code is executed, the method described in any preceding item is executed.

The present application also provides a computer program product which, when executed by a computer device, causes the method according to any one of the preceding items to be performed.

The present application also provides a computer device, the computer device comprising:

one or more processors;

memory for storing one or more computer programs;

The one or more computer programs, when executed by the one or more processors, cause the one or more processors to implement the method of any preceding item.

It should be noted that the present application may be implemented in software and/or a combination of software and hardware, for example, using an application-specific integrated circuit (ASIC), a general-purpose computer, or any other similar hardware device. In one embodiment, the software program of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs of the present application (including associated data structures) may be stored on a computer-readable recording medium, such as RAM, a magnetic or optical drive, a floppy disk, and the like. In addition, some steps or functions of the present application may be implemented in hardware, for example, as a circuit that cooperates with a processor to perform various steps or functions.

In addition, a part of the present application can be applied as a computer program product, such as computer program instructions which, when executed by a computer, can invoke or provide, through the operation of the computer, the methods and/or technical solutions according to the present application. Those skilled in the art should understand that the forms in which computer program instructions exist in computer-readable media include, but are not limited to, source files, executable files, installation package files, etc. Correspondingly, the ways in which computer program instructions are executed by a computer include, but are not limited to: the computer directly executes the instructions; the computer compiles the instructions and then executes the corresponding compiled program; the computer reads and executes the instructions; or the computer reads and installs the instructions and then executes the corresponding installed program. Here, the computer-readable medium can be any available computer-readable storage medium or communication medium that can be accessed by a computer.

Communication media include media by which communication signals containing, for example, computer-readable instructions, data structures, program modules or other data are transmitted from one system to another. Communication media may include conducted transmission media, such as cables and wires (e.g., fiber optic, coaxial, etc.), and wireless (unconducted transmission) media capable of propagating energy waves, such as acoustic, electromagnetic, RF, microwave, and infrared. Computer-readable instructions, data structures, program modules or other data may be embodied, for example, as a modulated data signal in a wireless medium, such as a carrier wave or a similar mechanism, for example as part of spread spectrum technology. The term "modulated data signal" refers to a signal whose one or more characteristics are altered or set in a manner that encodes information in the signal. Modulation can be analog, digital or hybrid modulation techniques.

By way of example and not limitation, computer-readable storage media may include volatile and non-volatile, removable and non-removable media implemented in any method or technology for storage of information such as computer-readable instructions, data structures, program modules or other data. For example, computer-readable storage media include, but are not limited to, volatile memory, such as random access memory (RAM, DRAM, SRAM); non-volatile memory, such as flash memory, various read-only memories (ROM, PROM, EPROM, EEPROM), and magnetic and ferromagnetic/ferroelectric memory (MRAM, FeRAM); magnetic and optical storage devices (hard disks, tapes, CDs, DVDs); and other media, now known or later developed, capable of storing computer-readable information/data for use by computer systems.

Here, an embodiment according to the present application includes an apparatus comprising a memory for storing computer program instructions and a processor for executing the program instructions, wherein, when the computer program instructions are executed by the processor, the apparatus is triggered to operate based on the aforementioned methods and/or technical solutions according to various embodiments of the present application.

It will be apparent to those skilled in the art that the present application is not limited to the details of the above-described exemplary embodiments, and that the present application may be implemented in other specific forms without departing from the spirit or essential characteristics of the present application. Accordingly, the embodiments are to be regarded in all respects as illustrative and not restrictive, and the scope of the application is defined by the appended claims rather than by the foregoing description; all changes that fall within the meaning and range of equivalents of the claims are therefore intended to be included in this application. Any reference signs in the claims shall not be construed as limiting the claim concerned. Furthermore, it is clear that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. Several units or means recited in the device claims can also be realized by one unit or means by means of software or hardware. The terms first, second, etc. are used to denote names and do not denote any particular order.

Claims

A method for playing voice information in a multi-person voice, applied to a network device, wherein the method includes:

for a target user among multiple users participating in the multi-person voice, determining virtual position information of the other users among the multiple users in a virtual sound field corresponding to the target user, and generating, according to the virtual position information, virtual sound field information corresponding to the target user;

sending the virtual sound field information to the user equipment corresponding to the target user, so that the user equipment plays the voice information of each of the other users according to the virtual position information of that user in the virtual sound field.
The method according to claim 1, wherein, for the target user among the multiple users participating in the multi-person voice, determining the virtual position information of the other users among the multiple users in the virtual sound field corresponding to the target user includes:

determining the virtual scene information corresponding to the multi-person voice;

determining a virtual location corresponding to each of the multiple users according to the virtual scene information;

According to the virtual position corresponding to the target user and the virtual positions corresponding to the other users, the virtual position information of the other users in the virtual sound field corresponding to the target user is determined.
The method according to claim 2, wherein the determining the virtual scene information corresponding to the multi-person voice comprises:

obtaining identification information corresponding to target virtual scene information selected, from a plurality of default virtual scene information, by the voice-initiating user among the multiple users, and determining the target virtual scene information as the virtual scene information corresponding to the multi-person voice.
The method according to claim 2, wherein the determining the virtual scene information corresponding to the multi-person voice comprises:

obtaining at least one target virtual scene information selected by at least one of the plurality of users from a plurality of default virtual scene information, and determining the virtual scene information corresponding to the multi-person voice from the at least one target virtual scene information, wherein the determined virtual scene information is the one selected the greatest number of times.
The method according to claim 2, wherein the determining the virtual scene information corresponding to the multi-person voice comprises:

according to voice theme information corresponding to the multi-person voice, determining, from a plurality of default virtual scene information, target default virtual scene information matching the voice theme information, and determining the target default virtual scene information as the virtual scene information corresponding to the multi-person voice.
The method according to claim 2, wherein the determining the virtual scene information corresponding to the multi-person voice comprises:

according to the user information corresponding to the multiple users, determining, from multiple default virtual scene information, target default virtual scene information matching the user information, and determining the target default virtual scene information as the virtual scene information corresponding to the multi-person voice.
The method according to claim 6, wherein, according to the user information corresponding to the multiple users, determining the target default virtual scene information that matches the user information from multiple default virtual scene information includes:

According to the user information corresponding to the voice initiating user among the multiple users, target default virtual scene information matching the user information is determined from multiple default virtual scene information.
The method according to claim 6, wherein, according to the user information corresponding to the multiple users, determining the target default virtual scene information that matches the user information from multiple default virtual scene information includes:

According to user information corresponding to each of the plurality of users, at least one default virtual scene information that matches the user information corresponding to each of the plurality of users is determined from the plurality of default virtual scene information, and determining target default virtual scene information from the at least one default virtual scene information, wherein the number of users matching the target default virtual scene information is the largest.
The method according to claim 2, wherein the virtual scene information includes a plurality of predetermined virtual positions;

Wherein, determining the virtual location corresponding to each of the multiple users according to the virtual scene information includes:

for each user in the plurality of users, obtaining a target predetermined virtual position corresponding to the user among the plurality of predetermined virtual positions, and determining the target predetermined virtual position as the virtual position of the user in the virtual scene information.
The method according to claim 9, wherein, for each of the plurality of users, obtaining the target predetermined virtual position corresponding to the user in the plurality of predetermined virtual positions comprises:

for each user of the plurality of users, obtaining a target predetermined virtual position that the voice-initiating user among the plurality of users has designated for the user from the plurality of predetermined virtual positions.
The method according to claim 9, wherein, for each of the plurality of users, obtaining the target predetermined virtual position corresponding to the user in the plurality of predetermined virtual positions comprises:

for each user in the plurality of users, determining, according to the user information corresponding to the user, a target predetermined virtual position among the plurality of predetermined virtual positions, wherein the tag information of the target predetermined virtual position in the virtual scene information matches the user information corresponding to the user.
The method according to claim 9, wherein, for each of the plurality of users, obtaining the target predetermined virtual position corresponding to the user in the plurality of predetermined virtual positions comprises:

generating and sending virtual location request information to each of the plurality of users, wherein the virtual location request information includes the virtual scene information;

receiving feedback information about the virtual location request information sent by at least one of the multiple users, wherein the feedback information sent by each of the at least one user indicates the target predetermined virtual position that the user has selected from the plurality of predetermined virtual positions;

For each user of the plurality of users, according to the feedback information, a target predetermined virtual position corresponding to the user in the plurality of predetermined virtual positions is determined.
The method of claim 12, wherein the method further comprises:

after receiving the feedback information sent by a first user among the multiple users, generating first prompt information corresponding to the feedback information, and sending the first prompt information to the other users among the multiple users who have not yet given feedback, so as to prompt them that the first target predetermined virtual position indicated by the feedback information is no longer selectable.
The method of claim 12, wherein the method further comprises:

after receiving the feedback information sent by a first user among the multiple users, generating second prompt information corresponding to the feedback information, and sending the second prompt information to the users among the multiple users other than the first user, so as to prompt them that the first target predetermined virtual position indicated by the feedback information has been selected by the first user.
The method of claim 12, wherein the method further comprises:

receiving invitation request information sent by a second user of the at least one user, wherein the second user has selected a second target predetermined virtual position from the plurality of predetermined virtual positions, and the invitation request information is used to invite a third user among the plurality of users, who has not yet given feedback, to select a predetermined virtual position near the second target predetermined virtual position;

sending the invitation request information to the third user, so as to prompt the third user to select an unselected predetermined virtual position near the second target predetermined virtual position as the target predetermined virtual position corresponding to the third user.
The method of claim 12, wherein the method further comprises:

after a predetermined feedback period corresponding to the virtual location request information has elapsed, for each user among the plurality of users who has not yet given feedback, determining a target predetermined virtual position corresponding to the user from at least one currently unselected predetermined virtual position among the plurality of predetermined virtual positions, and determining the target predetermined virtual position as the virtual position of the user in the virtual scene information.
The method according to claim 16, wherein, for each user among the plurality of users who has not yet given feedback, determining the target predetermined virtual position corresponding to the user from the at least one currently unselected predetermined virtual position among the plurality of predetermined virtual positions includes:

determining hotspot location area information in the virtual scene information according to the virtual position, in the virtual scene information, of at least one user among the plurality of users who has already given feedback;

for each user among the plurality of users who has not yet given feedback, determining an unselected predetermined virtual position within the hotspot location area as the virtual position of the user in the virtual scene information.
The method according to claim 16, wherein, for each user among the plurality of users who has not yet given feedback, determining the target predetermined virtual position corresponding to the user from the at least one currently unselected predetermined virtual position among the plurality of predetermined virtual positions includes:

for each user among the plurality of users who has not yet given feedback, determining a target predetermined virtual position corresponding to the user from the at least one currently unselected predetermined virtual position among the plurality of predetermined virtual positions, wherein the tag information of the target predetermined virtual position in the virtual scene information matches the user information corresponding to the user.
A device for playing voice information in a multi-person voice, characterized in that the device comprises:

a processor; and

a memory arranged to store computer-executable instructions which, when executed, cause the processor to perform a method as claimed in any one of claims 1 to 18.
A computer-readable medium storing instructions that, when executed, cause a system to perform the method of any one of claims 1 to 18.