CN108109630B - Audio processing method and device and media server - Google Patents


Info

Publication number: CN108109630B
Application number: CN201611037628.4A
Authority: CN (China)
Prior art keywords: conversation, audio, audio frequency, members, setting
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Other versions: CN108109630A
Other languages: Chinese (zh)
Inventor: 牛超
Current and original assignee: ZTE Corp
Application filed by ZTE Corp
Priority applications: CN201611037628.4A; PCT/CN2017/082884 (WO2018094968A1)
Publication of application: CN108109630A; application granted; publication of grant: CN108109630B

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification
    • G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0316: Speech enhancement by changing the amplitude
    • G10L21/0364: Speech enhancement by changing the amplitude for improving intelligibility
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S1/00: Two-channel systems
    • H04S3/00: Systems employing more than two channels, e.g. quadraphonic

Abstract

The invention discloses an audio processing method and apparatus. The method includes: sampling the voice data of each member of a stereo multi-person conversation to determine an audio frequency value for each member; and setting, according to the audio frequency values, an audio feature for each member's speech, where the audio feature is the volume proportion of each sound channel when that member speaks. During a stereo multi-person conversation, the voice data of each member is sampled and each member's audio feature is set according to his or her audio frequency value. Different audio features present different stereo effects, giving listeners the sense of a virtual seat, as if each member speaks from a distinct direction. The speaking member is therefore easier to identify, and the user experience improves.

Description

Audio processing method and device and media server
Technical Field
The present invention relates to the field of communications, and in particular, to an audio processing method and apparatus, and a media server.
Background
Stereo multi-person conversation technology is mature, and teleconferencing is one of its common application scenarios. In existing teleconferences in the CS (Circuit Switched) and PS (Packet Switched) domains, each user can only hear the speech itself; because a teleconference has many users, a listener cannot tell which user is speaking, and the user experience is poor.
Disclosure of Invention
The present invention provides an audio processing method, an audio processing apparatus, and a media server to solve the following problem in the prior art: in a stereo multi-person conversation with many users, a listener cannot tell which user is speaking, so the user experience is poor.
To solve the above technical problem, in one aspect, the present invention provides an audio processing method, including: sampling the voice data of each member of a stereo multi-person conversation to determine an audio frequency value for each member; and setting, according to the audio frequency values, an audio feature for each member's speech, where the audio feature is the volume proportion of each sound channel when that member speaks.
Optionally, setting the audio feature of each member according to the audio frequency value includes: judging whether any audio frequency values fall within the same preset audio frequency range; if so, setting different audio features for the members within that range; if not, setting the same or different audio features for the members.
Optionally, setting the audio feature of each member according to the audio frequency value includes: sorting the audio frequency values of all members; and setting different audio features for members whose audio frequency values are adjacent in the sorted order.
Optionally, the audio feature is set for a member as follows: acquiring the volume information of each of the member's sound channels; selecting a channel volume proportion from multiple pre-computed channel volume proportions; and adjusting the volume of each of the member's sound channels according to the selected proportion.
Optionally, after the audio features have been set according to the audio frequency values, the method further includes: acquiring, when any member speaks, the audio feature of the speaking member; and adjusting, according to that audio feature, the audio fed to the audio players of all members other than the speaker.
In another aspect, the present invention further provides an audio processing apparatus, including: a sampling module, configured to sample the voice data of each member of a stereo multi-person conversation to determine an audio frequency value for each member; and a setting module, configured to set, according to the audio frequency values, an audio feature for each member's speech, where the audio feature is the volume proportion of each sound channel when that member speaks.
Optionally, the setting module includes: a judging unit, configured to judge whether any audio frequency values fall within the same preset audio frequency range; and a first setting unit, configured to set different audio features for the members within that range when such values exist, and to set the same or different audio features for the members when they do not.
Optionally, the setting module includes: a sorting unit, configured to sort the audio frequency values of all members; and a second setting unit, configured to set different audio features for members whose audio frequency values are adjacent in the sorted order.
Optionally, the setting module is specifically configured to set the audio feature for a member as follows: acquiring the volume information of each of the member's sound channels; selecting a channel volume proportion from multiple pre-computed channel volume proportions; and adjusting the volume of each of the member's sound channels according to the selected proportion.
Optionally, the apparatus further includes: an acquiring module, configured to acquire, when any member speaks, the audio feature of the speaking member; and an input module, configured to adjust, according to that audio feature, the audio fed to the players of all members other than the speaker.
In another aspect, the present invention further provides a media server, including: a collector, configured to sample the voice data of each member of a stereo multi-person conversation; and a processor, configured to determine an audio frequency value for each member from the collected data and to set, according to the audio frequency values, an audio feature for each member's speech, where the audio feature is the volume proportion of each sound channel when that member speaks.
Optionally, the processor is specifically configured to judge whether any audio frequency values fall within the same preset audio frequency range; to set different audio features for the members within that range when such values exist; and to set the same or different audio features for the members when they do not. Alternatively, the processor is further configured to sort the audio frequency values of the members and to set different audio features for members whose audio frequency values are adjacent in the sorted order.
During a stereo multi-person conversation, the voice data of each member is sampled and each member's audio feature is set according to his or her audio frequency value. Different audio features present different stereo effects, giving listeners the sense of a virtual seat, as if each member speaks from a distinct direction, so the speaking member is easier to identify and the user experience improves. This solves the prior-art problem that, in a stereo multi-person conversation with many users, a listener cannot tell which user is speaking.
Drawings
FIG. 1 is a flow chart of a method of processing audio in a first embodiment of the invention;
FIG. 2 is a schematic diagram of a processing apparatus for audio according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of a preferred structure of an audio processing device according to a second embodiment of the present invention;
FIG. 4 is a topological diagram of the relationship between a media server and conference members according to a fourth embodiment of the present invention;
FIG. 5 is a schematic diagram of a virtual space location setting according to a fourth embodiment of the present invention;
FIG. 6 is a schematic diagram of another virtual space position setting in the fourth embodiment of the present invention.
Detailed Description
To solve the prior-art problem that, in a stereo multi-person conversation with many users, a listener cannot tell which user is speaking and the user experience is poor, the present invention provides an audio processing method, an audio processing apparatus, and a media server. The invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein merely illustrate the invention and do not limit it.
The first embodiment of the present invention provides an audio processing method, the flow of which is shown in fig. 1, including steps S102 to S104:
S102: the voice data of each member of the stereo multi-person conversation is sampled to determine an audio frequency value for each member. Here, stereo denotes a sound source with at least two channels, as opposed to mono. A mono source has a single channel, so the listener hears the sound without any sense of direction; a stereo source has multiple channels, so the listener perceives a sense of space, which is why a multi-channel source is called stereo. For example, the sound played in a movie theater is stereo. The subsequent audio adjustment is possible only because the source is stereo.
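The description does not specify how the audio frequency value is computed from the sampled voice data. As an illustrative sketch only (the function name and the zero-crossing approach are assumptions, not part of the patent), a member's dominant voice frequency could be estimated from the zero-crossing rate of a sampled segment:

```python
import math

def estimate_frequency(samples, sample_rate):
    """Estimate the dominant frequency of a signal from its
    zero-crossing rate: each full cycle crosses zero twice."""
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    duration = len(samples) / sample_rate
    return crossings / (2 * duration)

# Synthetic 440 Hz tone sampled at 8 kHz for one second.
tone = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
print(estimate_frequency(tone, 8000))  # approximately 440 Hz
```

A real implementation would use voiceprint analysis, as the fourth embodiment notes; the point here is only that each member's voice reduces to a single comparable frequency value.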
S104: according to the audio frequency values, an audio feature is set for each member's speech, where the audio feature is the volume proportion of each sound channel when that member speaks.
In this embodiment, during a stereo multi-person conversation, the voice data of each member is sampled and each member's audio feature is set according to his or her audio frequency value. Different audio features present different stereo effects and give listeners the sense of a virtual seat, as if each member speaks from a distinct direction, so the speaking member is easier to identify and the user experience improves. This solves the prior-art problem that, in a stereo multi-person conversation with many users, a listener cannot tell which user is speaking.
In practice, the audio feature of each member can be set according to the audio frequency values in several ways; two cases are described below.
In the first case: judge whether any audio frequency values fall within the same preset audio frequency range. If so, those users' voice characteristics are similar and listeners may easily confuse them, so a different audio feature is set for each member within that range.
For example, when the audio frequency values of two members both fall within the same preset range, their audio features must not be adjusted to be the same; otherwise, other members listening to them cannot tell which of the two is speaking. Different audio features must therefore be set for each member within the same preset range, and the more distinct those features are, the stronger the impression that, say, one speaker is on the left and the other on the right.
If no audio frequency values fall within the same preset range, the members are unlikely to be confused, so the same audio feature may be set for every member and listeners can still distinguish users by their audio frequency alone. Of course, different audio features may also be set for different members to separate the speakers even more clearly.
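The first case can be sketched as follows; the preset ranges, member names, and helper function are hypothetical illustrations rather than the patented procedure:

```python
def group_by_range(freqs, ranges):
    """Map each preset (low, high) frequency range to the member ids
    whose audio frequency value falls inside it."""
    groups = {}
    for member, f in freqs.items():
        for lo, hi in ranges:
            if lo <= f < hi:
                groups.setdefault((lo, hi), []).append(member)
                break
    return groups

ranges = [(60, 150), (150, 300)]          # assumed preset ranges, Hz
freqs = {"A": 110, "B": 120, "C": 210}    # sampled frequency values
print(group_by_range(freqs, ranges))
# A and B share a range, so they must receive different audio features
```

Any range that ends up holding more than one member triggers the "set different audio features" branch described above.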
In the second case: when setting the audio features according to the audio frequency values, the audio frequency values of the members may be sorted; different audio features are then set for members whose values are adjacent in the sorted order, so that members with similar audio frequencies can still be distinguished. In a concrete setup, three or four adjacent members may each be given a different audio feature to separate the speakers even better, so that listeners perceive different speakers in different directions and the user experience improves.
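The second case, sorting members by frequency and cycling through a small set of distinct audio features, can be sketched like this (the position labels and member frequencies are assumed values, not from the patent):

```python
def assign_positions(freqs, positions):
    """Sort members by audio frequency value and walk through the
    available positions cyclically, so that frequency-adjacent
    members never share the same audio feature."""
    ordered = sorted(freqs, key=freqs.get)
    return {m: positions[i % len(positions)] for i, m in enumerate(ordered)}

positions = ["left", "center", "right"]   # assumed distinct audio features
freqs = {"A": 180, "B": 95, "C": 170, "D": 240}
print(assign_positions(freqs, positions))
```

With three or more positions in the cycle, any run of adjacent members in the sorted order gets pairwise-distinct features, matching the goal stated above.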
During setup, the audio feature is set for a member as follows: acquire the volume information of each of the member's sound channels; select a channel volume proportion from multiple pre-computed channel volume proportions; and adjust the volume of each of the member's sound channels according to the selected proportion.
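A minimal sketch of the adjustment step, assuming the volume information and the selected proportion are parallel per-channel lists (the function name and sample values are illustrative):

```python
def apply_proportion(channel_volumes, proportion):
    """Scale each channel's volume by the chosen per-channel proportion.

    channel_volumes and proportion are parallel lists, one entry per
    sound channel (e.g. [left, right])."""
    return [v * p for v, p in zip(channel_volumes, proportion)]

# One of several pre-computed channel volume proportions (assumed values):
print(apply_proportion([1.0, 1.0], [0.8, 0.2]))  # [0.8, 0.2]
```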
After the audio features have been set according to the audio frequency values, whenever any member speaks, the audio feature of the speaking member can be acquired, and the audio fed to the audio players of all members other than the speaker is adjusted according to that feature.
A second embodiment of the present invention provides an audio processing apparatus, which is schematically shown in fig. 2 and includes: a sampling module 10, configured to sample voice data of each conversation member of a stereo multi-person conversation to determine an audio frequency value of each conversation member; and the setting module 20 is coupled to the sampling module 10 and configured to set, according to the audio frequency value, an audio characteristic of each conversation member when speaking, where the audio characteristic is a volume proportion of each channel when the conversation member speaks.
The setting module 20 may include: the judging unit is used for judging whether audio frequency values in the same preset audio frequency range exist or not; the first setting unit is used for setting different audio characteristics for each conversation member in the same preset audio frequency range under the condition that the audio frequency values in the same preset audio frequency range exist; and under the condition that the audio frequency values in the same preset audio frequency range do not exist, setting the same or different audio characteristics for each conversation member.
The setting module 20 may further include: the sorting unit, configured to sort the audio frequency values of all members; and the second setting unit, configured to set different audio features for members whose audio frequency values are adjacent in the sorted order.
The setting module is specifically configured to set audio features for the session members as follows: acquiring volume information of each sound channel of the conversation members; acquiring a channel volume proportion from the calculated multiple channel volume proportions; and adjusting the volume information of each sound channel of the conversation members according to the sound channel volume proportion.
Fig. 3 shows a preferred structure of the above processing apparatus for audio in a teleconference, which further includes: an obtaining module 30, coupled to the setting module 20, configured to obtain, when any conversation member speaks, an audio feature of the speaking conversation member; and an input module 40, coupled to the obtaining module 30, for adjusting and inputting the audio of the players of the conversation members except the conversation member who speaks according to the audio characteristics.
A third embodiment of the present invention provides a media server, including: the collector is used for sampling voice data of each conversation member of the stereo multi-person conversation; and the processor determines the collected audio frequency value of each conversation member and sets the audio characteristics of each conversation member when speaking according to the audio frequency value, wherein the audio characteristics are the volume proportion of each sound channel when the conversation member speaks.
The processor is specifically used for judging whether audio frequency values in the same preset audio frequency range exist or not; under the condition that audio frequency values in the same preset audio frequency range exist, different audio features are set for each conversation member in the same preset audio frequency range; under the condition that the audio frequency values in the same preset audio frequency range do not exist, the same or different audio characteristics are set for each conversation member; or, the method is also used for sequencing the audio frequency values of the conversation members and setting different audio characteristics for the conversation members adjacent to the audio frequency values in the sequencing.
A fourth embodiment of the present invention provides a method for processing audio in a teleconference, which mainly improves the mixing function of the media server responsible for mixing audio in CS-domain and PS-domain teleconferences. The embodiment builds a model of virtual spatial positions for the conference members (these are set within virtual positions; one virtual position may contain one or more virtual spatial positions), i.e. each participant is given the feeling of a virtual seat, so that the members' voices in the teleconference are distinguished by a sense of direction combined with the characteristics of each member's voice. The method includes steps (1) to (3).
(1) The media server samples the sound data of each conference member.
As shown in fig. 4, after the teleconference is established, the 7 participating terminals and the media server on the core-network side form a star topology for the voice data streams: each terminal sends its uplink voice data to the media server over RTP, and the media server sends the conference's downlink data to each terminal. The media server can sample a terminal's uplink voice data during the period just after that member joins the conference (for example, the first 10 seconds), because the parties typically exchange simple small talk such as "Hello? Hi!" at that point. Members are sampled in the order in which they join the teleconference.
(2) The media server sorts the members' voices from low frequency to high frequency.
After obtaining the sampled data, the media server invokes its voiceprint-recognition software or hardware, analyzes the uplink voice data of each member, computes each member's frequency value, and produces a low-to-high ordering.
(3) The media server mixes the audio according to the sorted result, using the directional mixing function.
After the sorted result is obtained, the media server completes the mixing process according to a preset mixing rule (mixing here is the process of applying the audio-feature adjustment).
The mixing rule may consider two dimensions: member position (which virtual position each member should occupy) and member voice characteristics. Specifically, the positions can be divided into six: far left, near left, far middle, near middle, far right, and near right. The members' voice characteristics can be divided into a high-frequency group and a low-frequency group, or simply grouped by gender (whose voices tend to be higher or lower in frequency).
A PS-domain voice conference call supports at most 6 remote parties, i.e. up to 6 terminals participate besides the local one. Through voiceprint analysis, 3 members are assigned to the low-frequency group and 3 to the high-frequency group; the low-frequency group is placed on the near side of the virtual space (close to the local terminal) and the high-frequency group on the far side.
Within each frequency group, the members are assigned to the left, middle, and right positions in order of increasing frequency, giving exactly 6 virtual spatial positions, as shown in fig. 5. When mixing, the sound intensity is split at 1/3 each across the left, center, and right positions, while the far and near sides may be split 0.5:0.5, 0.45:0.55, 0.4:0.6, and so on. This maximally separates the members' voices from one another, increases the signal-to-noise ratio, and makes each voice easier to recognize.
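The 1/3 column split combined with a near/far split (here the assumed 0.45:0.55 variant) can be expressed as a per-member mix weight; the function name and default share are illustrative:

```python
def mix_weight(depth, near_share=0.55):
    """Per-member weight in the mix for the 3-column by 2-row seating
    grid: each of the left/center/right columns receives 1/3 of the
    intensity, split between its near and far seats by near_share."""
    depth_share = near_share if depth == "near" else 1.0 - near_share
    return (1.0 / 3.0) * depth_share

print(round(mix_weight("near"), 4))  # 0.1833
print(round(mix_weight("far"), 4))   # 0.15
```

The six weights sum to 1, so the overall loudness of the mixed conference signal stays constant regardless of which seat a speaker occupies.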
For a 3-way teleconference, it suffices to mix at 1/3 each for the left, center, and right positions. If there are more than 6 conference members, the positions can be subdivided: the angle between the directions of two adjacent virtual spatial positions is 180/((N-1)/2-1) degrees, where N is the total number of participants. In this embodiment N is 9, so the angle is 60 degrees; the layout is shown in fig. 6.
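The angle formula can be checked directly; `adjacent_angle` is an illustrative name for the patent's expression 180/((N-1)/2-1):

```python
def adjacent_angle(total_participants):
    """Angle in degrees between the directions of two adjacent virtual
    spatial positions, per the formula 180 / ((N-1)/2 - 1)."""
    n = total_participants
    return 180 / ((n - 1) / 2 - 1)

print(adjacent_angle(9))  # 60.0, matching the embodiment's example
```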
It should be noted that the voice data of a member assigned to the leftmost virtual spatial position is mixed only into the left channel, and that of a member assigned to the rightmost virtual spatial position only into the right channel. The voice data of the other members, at neither the leftmost nor the rightmost position, is mixed into both channels according to the angle value, computed as follows:
First confirm whether the virtual position is offset to the left or to the right. If it is offset to the left, determine the angle A (0 to 90 degrees) between that direction and the left horizontal; then the right-channel weight is Tan(A/2)/2, and the left-channel weight is 1 minus the right-channel weight. If it is offset to the right, determine the angle B (0 to 90 degrees) between that direction and the right horizontal; then the left-channel weight is Tan(B/2)/2, and the right-channel weight is 1 minus the left-channel weight.
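The left/right weighting rule can be sketched as follows, with the tangent taken on half the offset angle as the text specifies (the function name and argument convention are assumptions):

```python
import math

def channel_weights(angle_deg, side):
    """Return (left, right) channel weights for a member whose virtual
    position is offset angle_deg (0-90) from the horizontal on the
    given side, using the rule: minor channel = tan(angle/2) / 2."""
    minor = math.tan(math.radians(angle_deg) / 2) / 2
    major = 1 - minor
    return (major, minor) if side == "left" else (minor, major)

print(channel_weights(0, "left"))  # (1.0, 0.0): fully in the left channel
left, right = channel_weights(90, "left")
print(round(left, 3), round(right, 3))  # 0.5 0.5: straight ahead
```

At the extremes the rule behaves sensibly: an offset of 0 degrees keeps the voice entirely in one channel, and an offset of 90 degrees (straight ahead) splits it evenly.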
In this embodiment, after the sound data of each conference member is acquired, a virtual role can be assigned to each member, i.e. a corresponding virtual position is set for each member, as if user A were seated in chair 1, user B in chair 2, user C in chair 3, and so on. This places users with similar sound data in different directions. Once the desired virtual position of each member is set, each member's sound is tuned, i.e. an audio feature is set for each member, so that each member indeed appears to sit in the corresponding virtual seat.
Overall, this embodiment makes each member of the teleconference distinguishable and provides a better user experience.
Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, and the scope of the invention should not be limited to the embodiments described above.

Claims (10)

1. A method for processing audio, comprising:
sampling voice data of each conversation member of a stereo multi-person conversation to determine an audio frequency value of each conversation member;
setting an audio characteristic of each conversation member when speaking according to the audio frequency value, wherein the setting comprises:
judging whether audio frequency values in the same preset audio frequency range exist or not;
if the conversation members exist, different audio features are set for each conversation member in the same preset audio frequency range;
if not, setting the same or different audio characteristics for each conversation member; wherein, the audio features are the volume proportion of each sound channel when the conversation members speak.
2. The processing method according to claim 1, wherein setting the audio characteristics of the individual conversation members when speaking based on the audio frequency value comprises:
sequencing the audio frequency values of all the conversation members;
and setting different audio characteristics for the conversation members adjacent to the audio frequency value in the sorting.
3. The processing method according to claim 1, wherein audio features are set for the conversation members as follows:
acquiring volume information of each sound channel of the conversation members;
acquiring a channel volume proportion from the calculated multiple channel volume proportions;
and adjusting the volume information of each sound channel of the conversation members according to the sound channel volume proportion.
4. The method of any one of claims 1 to 3, further comprising, after setting the audio characteristics for the conversation members' speech according to the audio frequency values:
when any conversation member speaks, acquiring the audio characteristic of the speaking conversation member; and
adjusting, according to that audio characteristic, the audio fed to the audio players of the conversation members other than the speaking member.
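The playback step in claim 4 can be sketched as a per-listener mix: on a speech event, look up the speaker's stereo proportion and apply it to the frame delivered to every other member. All names and the mono-frame representation are hypothetical.

```python
# Hypothetical sketch of claim 4: when a member speaks, feed the
# proportionally adjusted stereo audio to every other member's player.

def mix_for_listeners(speaker, members, characteristics, frame):
    """speaker: id of the talking member; members: all member ids;
    characteristics: {member_id: (left_p, right_p)};
    frame: mono samples from the speaker.
    Returns {listener_id: (left_samples, right_samples)}."""
    left_p, right_p = characteristics[speaker]
    out = {}
    for listener in members:
        if listener == speaker:
            continue  # the speaker's own player receives no adjusted copy
        out[listener] = ([s * left_p for s in frame],
                         [s * right_p for s in frame])
    return out

chars = {"alice": (1.0, 0.0), "bob": (0.0, 1.0)}
mixed = mix_for_listeners("alice", ["alice", "bob"], chars, [0.5, 0.5])
# bob hears alice fully on the left channel.
```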
5. An apparatus for processing audio, comprising:
a sampling module configured to sample voice data of each conversation member of a stereo multi-person conversation to determine an audio frequency value of each conversation member; and
a setting module configured to set an audio characteristic for each conversation member's speech according to the audio frequency values, wherein the audio characteristic is the volume proportion of each sound channel when the conversation member speaks; the setting module comprising:
a judging unit configured to determine whether any audio frequency values fall within the same preset audio frequency range; and
a first setting unit configured to set a different audio characteristic for each conversation member within the same preset audio frequency range when such audio frequency values exist, and to set the same or different audio characteristics for the conversation members when they do not.
6. The apparatus of claim 5, wherein the setting module comprises:
a sorting unit configured to sort the audio frequency values of all conversation members; and
a second setting unit configured to set different audio characteristics for conversation members whose audio frequency values are adjacent in the sorted order.
7. The apparatus of claim 5, wherein the setting module is specifically configured to set the audio characteristic for a conversation member as follows:
acquiring volume information of each sound channel of the conversation member; selecting a channel volume proportion from a plurality of calculated channel volume proportions; and adjusting the volume information of each sound channel of the conversation member according to the selected channel volume proportion.
8. The apparatus of any one of claims 5 to 7, further comprising:
an acquisition module configured to acquire, when any conversation member speaks, the audio characteristic of the speaking conversation member; and
an input module configured to adjust, according to that audio characteristic, the audio fed to the audio players of the conversation members other than the speaking member.
9. A media server, comprising:
a collector configured to sample voice data of each conversation member of a stereo multi-person conversation; and
a processor configured to determine an audio frequency value of each conversation member from the sampled data and to set an audio characteristic for each conversation member's speech according to the audio frequency values, wherein the audio characteristic is the volume proportion of each sound channel when the conversation member speaks; the processor being specifically configured to: determine whether any audio frequency values fall within the same preset audio frequency range; if so, set a different audio characteristic for each conversation member within that same preset audio frequency range; and if not, set the same or different audio characteristics for the conversation members.
10. The media server of claim 9,
wherein the processor is further configured to sort the audio frequency values of the conversation members and to set different audio characteristics for conversation members whose audio frequency values are adjacent in the sorted order.
CN201611037628.4A 2016-11-23 2016-11-23 Audio processing method and device and media server Active CN108109630B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201611037628.4A CN108109630B (en) 2016-11-23 2016-11-23 Audio processing method and device and media server
PCT/CN2017/082884 WO2018094968A1 (en) 2016-11-23 2017-05-03 Audio processing method and apparatus, and media server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611037628.4A CN108109630B (en) 2016-11-23 2016-11-23 Audio processing method and device and media server

Publications (2)

Publication Number Publication Date
CN108109630A CN108109630A (en) 2018-06-01
CN108109630B true CN108109630B (en) 2022-01-25

Family

ID=62194750

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611037628.4A Active CN108109630B (en) 2016-11-23 2016-11-23 Audio processing method and device and media server

Country Status (2)

Country Link
CN (1) CN108109630B (en)
WO (1) WO2018094968A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633091B (en) * 2020-12-09 2021-11-16 北京博瑞彤芸科技股份有限公司 Method and system for verifying real meeting
CN115361474A (en) * 2022-08-18 2022-11-18 上海复旦通讯股份有限公司 Method for auxiliary recognition of sound source in telephone conference

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101118746A (en) * 2007-09-17 2008-02-06 腾讯科技(深圳)有限公司 Double track based audio data calibration method and multi-people voice talking system thereof
CN102651178A (en) * 2011-02-28 2012-08-29 中兴通讯股份有限公司 Prenatal education auxiliary device and application method thereof and user terminal
CN102969003A (en) * 2012-11-15 2013-03-13 东莞宇龙通信科技有限公司 Image pickup sound extracting method and device
CN105141730A (en) * 2015-08-27 2015-12-09 腾讯科技(深圳)有限公司 Volume control method and device
CN105741833A (en) * 2016-03-14 2016-07-06 腾讯科技(深圳)有限公司 Voice communication data processing method and device
CN105791606A (en) * 2016-04-15 2016-07-20 深圳美亚美科技有限公司 Multi-people session control system

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US4712231A (en) * 1984-04-06 1987-12-08 Shure Brothers, Inc. Teleconference system
US6125115A (en) * 1998-02-12 2000-09-26 Qsound Labs, Inc. Teleconferencing method and apparatus with three-dimensional sound positioning
FI109158B (en) * 2000-06-26 2002-05-31 Nokia Corp Portable device and method for providing a user with information about the functions of the portable device
JP2013097076A (en) * 2011-10-29 2013-05-20 Shimon Gomi Foreign language voice range self-discovery system
US9706302B2 (en) * 2014-02-05 2017-07-11 Sennheiser Communications A/S Loudspeaker system comprising equalization dependent on volume control
CN104410379B (en) * 2014-10-29 2019-05-14 深圳市金立通信设备有限公司 A kind of volume adjusting method
CN104363510B (en) * 2014-10-29 2019-04-30 深圳市金立通信设备有限公司 A kind of playback terminal
CN104811318A (en) * 2015-04-15 2015-07-29 南京农业大学 Method for controlling voice communication through voice
CN104898091B (en) * 2015-05-29 2017-07-25 复旦大学 Microphone array self calibration sonic location system based on iteration optimization algorithms

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101118746A (en) * 2007-09-17 2008-02-06 腾讯科技(深圳)有限公司 Double track based audio data calibration method and multi-people voice talking system thereof
CN102651178A (en) * 2011-02-28 2012-08-29 中兴通讯股份有限公司 Prenatal education auxiliary device and application method thereof and user terminal
CN102969003A (en) * 2012-11-15 2013-03-13 东莞宇龙通信科技有限公司 Image pickup sound extracting method and device
CN105141730A (en) * 2015-08-27 2015-12-09 腾讯科技(深圳)有限公司 Volume control method and device
CN105741833A (en) * 2016-03-14 2016-07-06 腾讯科技(深圳)有限公司 Voice communication data processing method and device
CN105791606A (en) * 2016-04-15 2016-07-20 深圳美亚美科技有限公司 Multi-people session control system

Also Published As

Publication number Publication date
WO2018094968A1 (en) 2018-05-31
CN108109630A (en) 2018-06-01

Similar Documents

Publication Publication Date Title
EP2158752B1 (en) Methods and arrangements for group sound telecommunication
JP4255461B2 (en) Stereo microphone processing for conference calls
US8606249B1 (en) Methods and systems for enhancing audio quality during teleconferencing
US6850496B1 (en) Virtual conference room for voice conferencing
US20080159507A1 (en) Distributed teleconference multichannel architecture, system, method, and computer program product
US7848738B2 (en) Teleconferencing system with multiple channels at each location
US7433716B2 (en) Communication apparatus
WO2000048379A1 (en) Method and system for providing spatialized audio in conference calls
US6813360B2 (en) Audio conferencing with three-dimensional audio encoding
US20080273476A1 (en) Device Method and System For Teleconferencing
EP1973320A1 (en) Conference system with adaptive mixing based on collocation of participants
TW200828867A (en) Communication system
CN108109630B (en) Audio processing method and device and media server
CN102457700B (en) Audio data transmission method and system
US20080232569A1 (en) Teleconferencing System with Multi-channel Imaging
US7068792B1 (en) Enhanced spatial mixing to enable three-dimensional audio deployment
US8526589B2 (en) Multi-channel telephony
Rothbucher et al. Backwards compatible 3d audio conference server using hrtf synthesis and sip
Hyrkas et al. Spatialized Audio and Hybrid Video Conferencing: Where Should Voices be Positioned for People in the Room and Remote Headset Users?
US7697675B2 (en) Multiparty call of portable devices with party positioning identification
JP2017519379A (en) Object-based teleconferencing protocol
JP2004274147A (en) Sound field fixed multi-point talking system
CN114978574A (en) Communication system and corresponding method
Reynolds et al. SPATIALIZED AUDIO CONFERENCES-IMS Integration and Traffic Modelling
WO2022152403A1 (en) Method and system for handling a teleconference

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant