CN113301294B - Call control method and device and intelligent terminal - Google Patents

Publication number
CN113301294B
Authority
CN
China
Prior art keywords
sound source
channel
sound
value
coordinates
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110526170.3A
Other languages
Chinese (zh)
Other versions
CN113301294A (en
Inventor
谢亮洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Konka Electronic Technology Co Ltd
Original Assignee
Shenzhen Konka Electronic Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Konka Electronic Technology Co Ltd
Priority to CN202110526170.3A
Publication of CN113301294A
Application granted
Publication of CN113301294B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/15 Conference systems
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04S7/303 Tracking of listener position or orientation

Abstract

The invention discloses a call control method, a call control device and an intelligent terminal. The call control method comprises: acquiring sound source position information during a call; and controlling channel parameters of a playing device during the call based on the sound source position information, wherein the playing device comprises at least two channels. Whereas prior-art schemes usually focus on the visual experience during a call, the scheme of the invention focuses on the auditory experience. Specifically, it acquires the sound source position information during the call and controls the channel parameters of the playing device accordingly, which helps restore the direction of the sound at playback, makes the call scene more realistic, and improves the listening experience.

Description

Call control method and device and intelligent terminal
Technical Field
The present invention relates to the field of communications technologies, and in particular, to a call control method and apparatus, and an intelligent terminal.
Background
With the rapid development of science and technology, and of communication technology in particular, communication tools are increasingly common in daily work and life, and their call functions bring great convenience to users. Users can hold everyday conversations or work meetings through voice or video calls. As calls become more widely used, users' expectations for the call experience keep rising.
The prior art tends to focus on the visual experience during a video call. For example, the current speaker is identified and the image is locally enlarged on that speaker; or network stability is improved to make the video more stable and closer to real time. The problem is that the prior art attends only to the user's visual experience and ignores the user's auditory needs: it does nothing to restore the direction of sound during the call, which degrades the listening experience.
Accordingly, there is a need for improvement and development in the art.
Disclosure of Invention
The main object of the invention is to provide a call control method, a call control device and an intelligent terminal, so as to solve the problems that the prior art attends only to the visual experience in a call, ignores the auditory experience, and does not restore the direction of sound during the call, which degrades the listening experience.
In order to achieve the above object, a first aspect of the present invention provides a call control method, where the method includes:
acquiring sound source position information in the call process;
and controlling the sound channel parameters of the playing device in the conversation process based on the sound source position information, wherein the playing device at least comprises two sound channels.
Optionally, the sound source position information includes coordinates of a sound source, and the acquiring the sound source position information in the call process includes:
identifying and acquiring a sounding object, and taking the sounding object as a sound source;
and positioning the sound source to obtain the coordinates of the sound source.
Optionally, the sound source position information further includes a width of a target area where the sound source is located.
Optionally, the identifying and acquiring the sounding object includes:
identifying and acquiring the sounding object through an array microphone or a camera.
Optionally, the controlling the channel parameters of the playing device during the call based on the sound source position information, where the playing device includes at least two channels includes:
calculating and acquiring a sound channel balance value based on the coordinates of the sound source and the width of the target area;
and controlling the channel parameters of the playing device based on the channel balance value.
Optionally, the coordinates of the sound source include an abscissa corresponding to a direction of the width of the target area, and the calculating to obtain the channel balance value based on the coordinates of the sound source and the width of the target area includes:
acquiring a preset channel balance threshold;
and calculating the channel balance value based on the channel balance threshold, the abscissa and the target area width, wherein the channel balance value is equal to the ratio of the abscissa to the target area width multiplied by the channel balance threshold.
Optionally, the channel parameters include a left channel gain and a right channel gain, and the controlling the channel parameters of the playback device based on the channel balance value includes:
when the channel balance value is greater than 0, the value of the left channel gain is set to be smaller than the value of the right channel gain.
A second aspect of the present invention provides a call control apparatus, wherein the apparatus includes:
the sound source position information acquisition module is used for acquiring sound source position information in the call process;
and the control module is used for controlling the sound channel parameters of the playing device in the conversation process based on the sound source position information, wherein the playing device at least comprises two sound channels.
Optionally, the sound source position information includes coordinates of a sound source, and the sound source position information acquiring module includes:
the recognition unit is used for recognizing and acquiring a sounding object and taking the sounding object as a sound source;
and the positioning unit is used for positioning the sound source and acquiring the coordinates of the sound source.
A third aspect of the present invention provides an intelligent terminal, where the intelligent terminal includes a memory, a processor, and a call control program stored in the memory and executable on the processor, where the call control program when executed by the processor implements any one of the steps of the call control method.
As can be seen from the above, the scheme of the invention acquires sound source position information during a call, and controls the channel parameters of a playing device during the call based on the sound source position information, wherein the playing device comprises at least two channels. Whereas prior-art schemes usually focus on the visual experience during a call, the scheme of the invention focuses on the auditory experience: controlling the channel parameters according to the sound source position helps restore the direction of the sound at playback, makes the call scene more realistic, and improves the listening experience.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a call control method according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of step S100 in Fig. 1 according to an embodiment of the present invention;
Fig. 3 is a schematic flow chart of step S200 in Fig. 1 according to an embodiment of the present invention;
Fig. 4 is a schematic flow chart of step S201 in Fig. 3 according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a call process according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of a multi-sound-source distribution provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a call control device according to an embodiment of the present invention;
Fig. 8 is a schematic diagram illustrating a specific structure of the sound source position information acquiring module 310 in Fig. 7 according to an embodiment of the present invention;
Fig. 9 is a schematic block diagram of an internal structure of an intelligent terminal according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It is also to be understood that the terminology used in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be further understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in this specification and the appended claims, the term "if" may be interpreted in context as "when", "upon", "in response to a determination" or "in response to detection". Similarly, the phrase "if a condition or event described is determined" or "if a condition or event described is detected" may be interpreted in context as meaning "upon determination", "in response to determination", "upon detection of a condition or event described" or "in response to detection of a condition or event described".
The following description of the embodiments of the present invention will be made more fully hereinafter with reference to the accompanying drawings, in which embodiments of the invention are shown, it being evident that the embodiments described are only some, but not all embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways other than those described herein, and persons skilled in the art will readily appreciate that the present invention is not limited to the specific embodiments disclosed below.
With the rapid development of science and technology, and of communication technology in particular, communication tools are increasingly common in daily work and life, and their call functions bring great convenience to users. Users can hold everyday conversations or work meetings through voice or video calls. As calls become more widely used, users' expectations for the call experience keep rising.
The prior art pursues video stability, real-time performance and call accuracy, and targets the user's visual experience; that is, it lets the two (or more) parties to the call see the current meeting situation. The prior art therefore tends to focus on the visual experience during a video call. For example, the current speaker is identified and the image is locally enlarged on that speaker; or network stability is improved to make the video more stable and closer to real time. The problem is that the prior art attends only to the user's visual experience and ignores the user's auditory needs: it does nothing to restore the direction of sound during the call, which degrades the listening experience.
To solve these problems, the invention provides a call control method. In the embodiment of the invention, sound source position information is acquired during a call, and the channel parameters of a playing device are controlled during the call based on the sound source position information, wherein the playing device comprises at least two channels. Whereas prior-art schemes usually focus on the visual experience during a call, the scheme of the invention focuses on the auditory experience: controlling the channel parameters according to the sound source position helps restore the direction of the sound at playback, makes the call scene more realistic, and improves the listening experience.
Exemplary method
As shown in fig. 1, an embodiment of the present invention provides a call control method, and specifically, the method includes the following steps:
Step S100, acquiring sound source position information during the call.
The call in this embodiment is a conference call: the sound emitted by the sounding object is collected by a microphone set up in advance and transmitted to the playing device of the call receiver for playback. The call may be a video call or a voice call, which is not limited herein; this embodiment takes a video call as an example. Further, the call may be a two-party call or a multi-party call, which is likewise not limited herein. The sound source position information indicates the position of the sounding object (i.e., the user who is speaking) during the call.
Step S200, controlling the channel parameters of the playing device during the call based on the sound source position information, wherein the playing device comprises at least two channels.
The playing device is the device with which the receiving object plays the voice of the sounding object, the receiving object being the user who is in the call with the sounding object. The playing device may be a speaker, a television, etc. In this embodiment the playing device has at least two channels, so the loudness played by each channel can be controlled by adjusting the channel parameters of the playing device, thereby simulating the direction the sound comes from.
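As an illustration of how per-channel gains change the loudness of each channel, the following minimal sketch applies a left and a right gain to a mono signal. It assumes the gains are amplitude gains expressed in decibels relative to the input level, which the embodiment does not state explicitly; the function names `db_to_linear` and `apply_channel_gains` are hypothetical.

```python
from typing import List, Tuple


def db_to_linear(gain_db: float) -> float:
    """Convert an amplitude gain in decibels to a linear factor (20*log10 convention)."""
    return 10.0 ** (gain_db / 20.0)


def apply_channel_gains(mono: List[float], left_db: float,
                        right_db: float) -> List[Tuple[float, float]]:
    """Turn mono samples into (left, right) frames scaled by the two gains."""
    left = db_to_linear(left_db)
    right = db_to_linear(right_db)
    return [(sample * left, sample * right) for sample in mono]
```

With a lower gain on the left channel than on the right, every frame carries more energy on the right, which is the effect the embodiment relies on to suggest a direction.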
This embodiment is described for a two-party call, in which the channel parameters of the playing device of the single listener (i.e., the receiving object) are adjusted according to the sound source position information of the speaking party (i.e., the sounding object). If the call is a multi-party call, the channel parameters of the playing devices of multiple listeners need to be adjusted according to the sound source position information; the specific adjustment method may follow the one provided in this embodiment for the two-party call.
As can be seen from the above, the call control method provided by the embodiment of the invention acquires sound source position information during a call, and controls the channel parameters of a playing device during the call based on the sound source position information, wherein the playing device comprises at least two channels. Whereas prior-art schemes usually focus on the visual experience during a call, the scheme of the invention focuses on the auditory experience: controlling the channel parameters according to the sound source position helps restore the direction of the sound at playback, makes the call scene more realistic, and improves the listening experience.
Specifically, in this embodiment, the sound source position information includes coordinates of the sound source. As shown in Fig. 2, step S100 includes:
Step S101, identifying and acquiring a sounding object, and taking the sounding object as the sound source.
Step S102, positioning the sound source to obtain the coordinates of the sound source.
In this embodiment, the position of the sounding object may not be fixed. For example, a user may move around the conference room while presenting; or a plurality of seats may be preset in the conference room, and the seat of the sounding object cannot be determined in advance. The sounding object therefore needs to be recognized in real time to determine the sound source.
In this embodiment, the sounding object may be identified and acquired through an array microphone or a camera. Specifically, during the call, voice can be collected through a preset array microphone to identify the sounding object. Further, image recognition can be performed on an image of the current area (such as the conference room) captured by the camera to identify the sounding object. In practice there are other methods for identifying the sounding object, which are not specifically limited herein. The sounding object is the user who is speaking during the call and is not fixed, so in this method the sounding object is identified and located in real time, and the channel parameters of the playing device are controlled in real time according to the coordinates of the sound source.
The coordinates of the sound source are obtained by positioning relative to a preset coordinate origin and preset coordinate-axis directions, both of which can be adjusted according to actual requirements and are not specifically limited herein. Either two-dimensional or three-dimensional coordinates may be constructed; this application takes two-dimensional coordinates as an example. In this embodiment, a display device is arranged at the front of both the target area where the sound source is located (such as a conference room) and the area where the receiving object (the listener) is located, to display the video during the call. A two-dimensional coordinate system is constructed in which, facing the display device, the left is the negative direction of the abscissa axis, the right is the positive direction of the abscissa axis, and the front is the positive direction of the ordinate axis, so that the position of the sounding object in the target area can be represented by two-dimensional coordinates. In practice, the coordinate system may be established in other ways, and coordinates can be converted between coordinate systems established in different ways, which is not specifically limited herein.
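The conversion between coordinate systems mentioned above can be as simple as shifting the origin. The sketch below assumes a hypothetical positioning system that reports the abscissa measured from the left edge of the target area, and converts it to an abscissa whose origin is at the center of the area width (negative on the left, positive on the right). The function name and the left-edge convention are illustrative assumptions, not part of the embodiment.

```python
def to_centered_abscissa(x_from_left_edge: float, area_width: float) -> float:
    """Shift an abscissa measured from the left edge so that 0 is the center of the width."""
    if area_width <= 0:
        raise ValueError("area_width must be positive")
    return x_from_left_edge - area_width / 2.0
```

For a target area 1000 units wide, a source standing at the left edge maps to -500, one at the center maps to 0, and one at 750 maps to 250.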
In this embodiment, the sound source position information further includes a width of a target area where the sound source is located.
Specifically, the target area may be a room, such as a conference room. The width of the target area may be a width of the conference room corresponding to the abscissa axis direction. The width of the target area may be measured in advance.
Specifically, in this embodiment, as shown in Fig. 3, step S200 includes:
Step S201, calculating and obtaining a channel balance value based on the coordinates of the sound source and the width of the target area.
Step S202, controlling the channel parameters of the playing device based on the channel balance value.
The channel balance value is a value for adjusting the gain between different channels of the playback device. Specifically, in this embodiment, when the above-mentioned channel balance value is greater than 0, the right channel gain of the playing device may be controlled to be greater than the left channel gain, so that the sound heard by the receiving object deviates to the right; when the channel balance value is smaller than 0, the left channel gain of the playing device can be controlled to be larger than the right channel gain, so that the sound heard by the receiving object is biased to the left; when the channel balance value is equal to 0, controlling the left channel gain of the playing device to be equal to the right channel gain.
Specifically, in this embodiment, the coordinates of the sound source include an abscissa corresponding to the direction of the width of the target area. As shown in Fig. 4, step S201 includes:
Step S2011, obtaining a preset channel balance threshold.
Step S2012, calculating and obtaining the channel balance value based on the channel balance threshold, the abscissa and the target area width, wherein the channel balance value is equal to the ratio of the abscissa to the target area width multiplied by the channel balance threshold.
The preset channel balance threshold is a preset limit on the channel balance value of the playing device, that is, the maximum channel gain of the playing device. The channel balance threshold is greater than 0, depends on the specific playing device, and can be set and adjusted according to actual requirements. In this embodiment the channel balance threshold is 50, that is, the channel gain of the playing device is at most 50 dB; if the channel gain exceeds 50 dB, the playing device may distort the sound, which affects the listening experience.
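Since the threshold bounds the gain the playing device can handle, an implementation may want to guard against out-of-range values, for example when a positioning error places the source outside the target area. The sketch below simply clips the balance value to the range allowed by the threshold; clipping is an assumption made here for illustration, as the embodiment does not say how such values are handled, and the name `clamp_balance` is hypothetical.

```python
def clamp_balance(value: float, threshold: float) -> float:
    """Clip a channel balance value to the range [-threshold, threshold]."""
    if threshold <= 0:
        raise ValueError("threshold must be greater than 0")
    return max(-threshold, min(threshold, value))
```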
Specifically, in this embodiment, the channel parameters include a left channel gain and a right channel gain, and the step S202 includes: when the channel balance value is greater than 0, the value of the left channel gain is set to be smaller than the value of the right channel gain.
Specifically, a channel balance value greater than 0 indicates that the abscissa of the sound source is greater than 0, i.e., the sound source is to the right of the coordinate origin (such as the center of the conference room). In this case the right channel gain may be set equal to the channel balance value and the left channel gain set smaller than the right channel gain. After the left and right channels of the playing device are superimposed, the right channel is louder, so the user (the receiving object) perceives the sound as coming from the right, which helps simulate the direction of the sound and improves the user's listening experience.
Further, a channel balance value smaller than 0 indicates that the abscissa of the sound source is smaller than 0, i.e., the sound source is to the left of the coordinate origin (such as the center of the conference room). In this case the left channel gain may be set equal to the absolute value of the channel balance value and the right channel gain set smaller than the left channel gain. After the left and right channels are superimposed, the left channel is louder, and the user (the receiving object) perceives the sound as coming from the left. When the channel balance value is equal to 0, the sound source is offset neither to the left nor to the right, and the left and right channel gains may both be set equal to the preset channel balance threshold.
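The three cases above can be sketched as a single function that returns the left and right gains. The embodiment only requires the quieter channel to be "smaller" than the louder one without fixing by how much; the fixed 40 dB offset used here (`offset_db`) is a hypothetical choice, and the function name is likewise illustrative.

```python
from typing import Tuple


def channel_gains(balance: float, threshold: float,
                  offset_db: float = 40.0) -> Tuple[float, float]:
    """Return (left_gain, right_gain) in dB for a given channel balance value.

    offset_db is an assumed attenuation of the quieter channel; it must be
    positive so that the quieter channel is strictly below the louder one.
    """
    if offset_db <= 0:
        raise ValueError("offset_db must be positive")
    if balance > 0:
        # Source on the right: the right channel carries the balance value.
        right = balance
        left = max(0.0, right - offset_db)
    elif balance < 0:
        # Source on the left: the left channel carries the magnitude of the balance value.
        left = -balance
        right = max(0.0, left - offset_db)
    else:
        # Centered source: both channels are set to the threshold.
        left = right = threshold
    return left, right
```

With a threshold of 50, a balance value of 50 yields a left gain of 10 and a right gain of 50 under this assumed offset, and a balance value of -50 yields the mirror image.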
In this embodiment, the explanation is given taking the case that the playing device has a left channel and a right channel, so that the sound played by the playing device can be simulated on the left or right of the receiving object. In the actual use process, the three-dimensional coordinates and the azimuth simulation of the played sound can be combined, and a plurality of playing devices (or a plurality of sound channels) placed at different positions can be used for more accurately positioning and playing the sound, so that the user experience is further improved. For example, the values of gains of a plurality of channels of the playing device can be controlled by combining the height coordinates (vertical coordinates) of the sound source, so that the azimuth and the height of the simulated playing sound are realized, and the user experience is further improved. The specific control method may refer to the above call control method, and will not be described herein.
In this embodiment, the call control method is further described based on a specific application scenario. Fig. 5 is a schematic diagram of a call process provided by an embodiment of the invention. For both parties of a video call, the data processing flow from the input of one device to the output of the other is similar, so this embodiment takes the transmission of audio data from device A to device B as an example. As shown in Fig. 5, device A is the conference device on the side of the sounding object, for example a television with a conference function. Device A collects audio data through an array microphone and determines the direction of the sound source through sound source identification and localization, thereby obtaining the xy coordinates of the sound source. Alternatively, only the x coordinate may be obtained; the embodiment of the invention shows only the calculation based on the x coordinate. The real-time xy coordinates, the width X of the current space (the target area) and the audio data are then sent together to playing device B. After receiving the data, playing device B calculates the channel balance value and sets the channel parameters in real time according to it. Playing device B may be a conference device or a television with a video call function. Specifically, a preset channel balance threshold N is obtained, which corresponds to the maximum value of the left and right channel gains of playing device B; for example, when N=50, the maximum value of the left and right channel gains of playing device B is 50 dB.
Corresponding to the channel balance threshold N, the channel balance value S of playing device B ranges over [-N, N]: when S=-N, the left channel gain takes the value N, and when S=N, the right channel gain takes the value N. In this embodiment N=50. When the calculated S is -50, the left channel gain may be set to 50 dB and the right channel gain to 10 dB; once the two channels are superimposed, the sound is shifted to the left and the user perceives it as coming from the left. Conversely, when S is 50, the left channel gain is set to 10 dB and the right channel gain to 50 dB; once the two channels are superimposed, the sound is shifted to the right and the user perceives it as coming from the right. In this embodiment, the channel balance value S may be calculated based on the following formula (1):
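The position data that device A sends alongside the audio could be carried in a small structured message. The sketch below is only one possible encoding: the embodiment does not specify a wire format, so the use of JSON, the class name `PositionPacket`, its field names, and the `audio_chunk_id` field used to pair the position with a chunk of audio are all assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class PositionPacket:
    x: float             # abscissa of the sound source
    y: float             # ordinate of the sound source
    area_width: float    # width X of the target area
    audio_chunk_id: int  # hypothetical key tying this position to an audio chunk


def encode(packet: PositionPacket) -> str:
    """Serialize the packet on device A before sending."""
    return json.dumps(asdict(packet))


def decode(text: str) -> PositionPacket:
    """Rebuild the packet on playing device B after receiving."""
    return PositionPacket(**json.loads(text))
```

Device B would decode each message, compute the channel balance value from `x` and `area_width`, and update its channel parameters before playing the matching audio.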
S=N*(x/X) (1)
where x is the abscissa of the sound source, X is the width of the target area, and N is the preset channel balance threshold. The origin of coordinates for the abscissa of the sound source is located at the center point of the width of the target area. Therefore, the abscissa x determines on which side (left or right) of the target area the sound source lies and its distance as a proportion of the width, so that the calculated channel balance value reflects the direction of the sound source.
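Formula (1) can be illustrated with a minimal sketch. The function name `channel_balance` is hypothetical, and the final clamping of the result to [-N, N] is an added safeguard assumed here for out-of-range inputs; the embodiment itself only states the range of S.

```python
def channel_balance(x: float, area_width: float, threshold: float = 50.0) -> float:
    """Formula (1): S = N * (x / X).

    x          -- abscissa of the sound source, origin at the center of the width
    area_width -- width X of the target area, same unit as x
    threshold  -- preset channel balance threshold N (50 in this embodiment)
    """
    if area_width <= 0:
        raise ValueError("area_width (X) must be positive")
    if threshold <= 0:
        raise ValueError("threshold (N) must be positive")
    s = threshold * (x / area_width)
    # Assumption: keep S inside [-N, N], the range stated for playback device B.
    return max(-threshold, min(threshold, s))
```

For example, a source a quarter of the room width to the left of center (x = -270 with X = 1080) gives a negative value, which indicates that the left channel should be the louder one.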
Further, if more than one sound source is identified and they sound simultaneously, the number of sound sources and the coordinates corresponding to each sound source may be acquired, and the calculation may be performed based on them. For example, the data (n, n_1(x_1, y_1), n_2(x_2, y_2), n_3(x_3, y_3), ···, n_n(x_n, y_n)) is acquired as position information, where n is the number of sound sources, (x_1, y_1) is the coordinates of the first sound source, (x_2, y_2) is the coordinates of the second sound source, and so on. On this basis, the channel balance value S can be calculated based on the following formula (2):
S=N*((x_1+x_2+x_3+···+x_n)/(n*X)) (2)
where n is the number of sound sources, x_1, x_2, x_3, ···, x_n are the abscissas of the sound sources, the corresponding origin of coordinates is the center point of the array microphone (in the abscissa direction, the center point of the width of the target area), X is the width of the target area, and N is the preset channel balance threshold. Fig. 6 is a schematic diagram of a multi-sound-source distribution according to an embodiment of the present invention. Fig. 6 includes 3 sound sources, so n=3, with x_1=-500, x_2=-280, x_3=…. When the calculated S is about -7.716, the left channel gain of playback device B may be set to be greater than the right channel gain; specifically, the left channel gain may be set to 7.716 dB and the right channel gain to less than 7.716 dB. Conference video calls in the prior art are limited by factors such as bandwidth and environment and mainly pursue the clarity and real-time performance of the call. With the arrival of the 5G era, users can obtain higher bandwidth and faster network speeds, so the call control method provided by the present invention can bring a better call experience: because both parties can hear whether the speaker in the video is on the left or on the right, a more realistic dialogue scene is provided.
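The multi-source case can be sketched as follows, assuming that formula (2) applies formula (1) to the mean abscissa of the n sound sources, which reduces to formula (1) when n=1. The function name `multi_source_balance` is hypothetical. In the usage note, x_3 = 280 and X = 1080 are hypothetical values that are not taken from this description; they are chosen only because, together with N=50, they give a result close to -7.716.

```python
from typing import Sequence


def multi_source_balance(
    xs: Sequence[float], area_width: float, threshold: float = 50.0
) -> float:
    """Assumed form of formula (2): S = N * (x_1 + ... + x_n) / (n * X)."""
    if not xs:
        raise ValueError("at least one sound source is required")
    if area_width <= 0:
        raise ValueError("area_width (X) must be positive")
    # Average the abscissas, then scale exactly as in formula (1).
    mean_x = sum(xs) / len(xs)
    return threshold * (mean_x / area_width)
```

With the hypothetical inputs `multi_source_balance([-500, -280, 280], 1080)`, the mean abscissa is negative, so the result is negative and the left channel would be set louder than the right channel.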
Exemplary apparatus
As shown in fig. 7, corresponding to the above-mentioned call control method, an embodiment of the present invention further provides a call control device, where the call control device includes:
the sound source position information obtaining module 310 is configured to obtain sound source position information during a call.
The call in this embodiment is a conference call: the sound emitted by the sounding object is collected by a preset microphone and transmitted to the playing device of the call receiver for playback. The call may be a video call or a voice call, which is not limited here; this embodiment takes a video call as an example. Further, the call may be a two-party call or a multi-party call, which is likewise not limited here. The sound source position information indicates the position of the sounding object (i.e., the user who is speaking) during the call.
And a control module 320, configured to control channel parameters of a playback device during the call based on the sound source position information, where the playback device includes at least two channels.
The playing device is the device used by the receiving object to play the voice of the sounding object, and the receiving object is the user who is in a call with the sounding object. The playing device may be a sound box, a television, etc. In this embodiment, the playing device has at least two channels, so that the volume played by each of the two channels can be controlled by adjusting the channel parameters of the playing device, thereby simulating the direction from which the sound comes.
This embodiment specifically describes the case of a two-party call, in which the channel parameters of the playing device of the listener (i.e., the receiving object) are adjusted according to the sound source position information of the speaking party (i.e., the sounding object). If the call is a multi-party call, the channel parameters of the playing devices of a plurality of listeners need to be adjusted according to the sound source position information; for the specific adjustment method, reference may be made to the method provided in this embodiment for a two-party call.
As can be seen from the above, in the call control device provided by the embodiment of the present invention, the sound source position information in the call process is obtained through the sound source position information obtaining module 310; and controlling, by the control module 320, channel parameters of a playback device during the call based on the sound source position information, where the playback device includes at least two channels. Compared with the scheme which is usually more focused on visual experience in the conversation process in the prior art, the scheme of the invention is more focused on auditory experience in the conversation process. Specifically, the scheme of the invention acquires the sound source position information in the conversation process, and controls the sound channel parameters of the playing device based on the sound source position information, thereby being beneficial to restoring the azimuth of the sound when the sound is played in the conversation process, improving the authenticity of the conversation scene and improving the hearing experience.
Specifically, in this embodiment, the sound source position information includes coordinates of a sound source, and as shown in fig. 8, the sound source position information obtaining module 310 includes:
the identifying unit 311 is configured to identify and acquire a sound object, and take the sound object as a sound source.
And a positioning unit 312, configured to position the sound source and obtain coordinates of the sound source.
In this embodiment, the position of the sounding object may not be fixed. For example, a user may move around the conference room while speaking to give a presentation, or several seats may be preset in the conference room and the specific seat of the sounding object cannot be determined in advance. Therefore, the sounding object needs to be recognized in real time to determine the sound source.
In this embodiment, the sounding object may be identified and acquired through the array microphone or the camera. Specifically, during the call, voice can be collected through the preset array microphone so that the sounding object is identified and acquired. Alternatively, image recognition can be performed on the image of the current area (such as the conference room) captured by the camera to obtain the sounding object. In actual use, other methods of identifying the sounding object may also be used, which are not specifically limited here. Since the sounding object is the user who is currently speaking during the call and is not fixed, this embodiment identifies and locates the sounding object in real time, so that the channel parameters of the playing device are controlled in real time according to the coordinates of the sound source.
The coordinates of the sound source are obtained by positioning relative to a preset origin of coordinates and preset coordinate axis directions, both of which can be adjusted according to actual requirements and are not specifically limited here. Two-dimensional or three-dimensional coordinates may be constructed; this application takes two-dimensional coordinates as an example. In this embodiment, a display device for displaying video during the call is arranged in front of the target area (such as a conference room) where the sound source is located. A two-dimensional coordinate system is constructed in which, when facing the display device, the left is the negative direction of the abscissa axis, the right is the positive direction of the abscissa axis, and the front is the positive direction of the ordinate axis, so that the position of the sounding object in the target area can be represented by two-dimensional coordinates. In actual use, other methods of establishing the coordinate system may be adopted, and coordinates can be converted between coordinate systems established in different ways, which is not specifically limited here.
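As an illustration of converting between coordinate systems, the following sketch converts an abscissa measured from the left edge of the target area (as seen when facing the display device) into the centered abscissa used in this embodiment. The function name `to_centered_abscissa` and the left-edge convention of the input are assumptions made only for this example.

```python
def to_centered_abscissa(x_from_left: float, area_width: float) -> float:
    """Convert an abscissa measured from the left edge of the target area
    into one measured from its center (left negative, right positive)."""
    if area_width <= 0:
        raise ValueError("area_width must be positive")
    if not 0 <= x_from_left <= area_width:
        raise ValueError("x_from_left must lie inside the target area")
    # Shift the origin from the left edge to the midpoint of the width.
    return x_from_left - area_width / 2
```

A source at the left edge then maps to -X/2, a source at the midpoint maps to 0, and a source at the right edge maps to X/2.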
In this embodiment, the sound source position information further includes a width of a target area where the sound source is located.
Specifically, the target area may be a room, such as a conference room. The width of the target area may be a width of the conference room corresponding to the abscissa axis direction. The width of the target area may be measured in advance.
Specifically, in this embodiment, the control module 320 is specifically configured to: calculating and acquiring a sound channel balance value based on the coordinates of the sound source and the width of the target area; and controlling the channel parameters of the playing device based on the channel balance value.
The channel balance value is a value for adjusting the gain between different channels of the playback device. Specifically, in this embodiment, when the above-mentioned channel balance value is greater than 0, the right channel gain of the playing device may be controlled to be greater than the left channel gain, so that the sound heard by the receiving object deviates to the right; when the channel balance value is smaller than 0, the left channel gain of the playing device can be controlled to be larger than the right channel gain, so that the sound heard by the receiving object is biased to the left; when the channel balance value is equal to 0, controlling the left channel gain of the playing device to be equal to the right channel gain.
Specifically, in this embodiment, the coordinates of the sound source include an abscissa corresponding to the direction of the width of the target area, and the control module 320 is specifically configured to: acquiring a preset channel balance threshold; and calculating the channel balance value based on the channel balance threshold, the abscissa and the target area width, wherein the channel balance value is equal to the ratio of the abscissa to the target area width multiplied by the channel balance threshold.
The preset channel balance threshold is a preset limit on the channel balance value of the playing device, that is, the maximum value of the channel gain of the playing device. The channel balance threshold is greater than 0, depends on the specific playing device, and can be set and adjusted according to actual requirements. In this embodiment, the channel balance threshold is 50, that is, the channel gain of the playing device is at most 50 dB; if the channel gain exceeds 50 dB, the playing device may distort the sound, which affects the listening experience.
Specifically, in this embodiment, the channel parameters include a left channel gain and a right channel gain, and the control module 320 is further specifically configured to: when the channel balance value is greater than 0, the value of the left channel gain is set to be smaller than the value of the right channel gain.
Specifically, when the above-mentioned channel balance value is greater than 0, it indicates that the abscissa of the sound source is greater than 0, and the sound source is on the right with respect to the origin of coordinates (such as the center of the conference room), at this time, the value of the right channel gain may be set equal to the above-mentioned channel balance value, and the value of the left channel gain is set smaller than the value of the right channel gain, so that after the sounds of the left channel and the right channel of the playing device are overlapped, the sounds of the right channel are greater, so that the user (the receiving object) feels that the sounds are emitted from the right, which is favorable for simulating the azimuth information of the sounds, and improving the hearing experience of the user.
Further, when the channel balance value is smaller than 0, the abscissa of the sound source is smaller than 0 and the sound source is on the left relative to the origin of coordinates (such as the center of the conference room). In this case, the value of the left channel gain may be set equal to the absolute value of the channel balance value, and the value of the right channel gain may be set smaller than the left channel gain, so that after the sounds of the left and right channels of the playing device are superimposed, the left channel is louder and the user (the receiving object) perceives the sound as coming from the left. When the channel balance value is equal to 0, the azimuth of the sound source has no left or right offset, and the values of the left channel gain and the right channel gain may both be set equal to the preset channel balance threshold.
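The three cases above can be sketched as a single mapping from the channel balance value to the two gains. The function name `channel_gains` and the parameter `quiet_ratio` are hypothetical. The embodiment only requires the quieter channel to be smaller than the louder one; the default ratio of 0.2 is an assumption taken from the 50 dB and 10 dB example given in the method embodiment.

```python
from typing import Tuple


def channel_gains(
    balance: float, threshold: float = 50.0, quiet_ratio: float = 0.2
) -> Tuple[float, float]:
    """Return (left_gain, right_gain) in dB for a channel balance value S."""
    if not 0 <= quiet_ratio < 1:
        raise ValueError("quiet_ratio must be in [0, 1)")
    if abs(balance) > threshold:
        raise ValueError("balance must lie in [-threshold, threshold]")
    if balance == 0:
        # No left or right offset: both channels use the threshold N.
        return threshold, threshold
    loud = abs(balance)          # louder channel equals |S|
    quiet = loud * quiet_ratio   # assumption: fixed fraction of the louder channel
    # S > 0: source on the right, right channel louder; S < 0: left channel louder.
    return (quiet, loud) if balance > 0 else (loud, quiet)
```

With the defaults, `channel_gains(-50)` makes the left channel the louder one and `channel_gains(50)` makes the right channel the louder one, matching the direction of the sound source.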
In this embodiment, the explanation is given taking the case that the playing device has a left channel and a right channel, so that the sound played by the playing device can be simulated on the left or right of the receiving object. In the actual use process, the three-dimensional coordinates and the azimuth simulation of the played sound can be combined, and a plurality of playing devices (or a plurality of sound channels) placed at different positions can be used for more accurately positioning and playing the sound, so that the user experience is further improved. For example, the values of gains of a plurality of channels of the playing device can be controlled by combining the height coordinates (vertical coordinates) of the sound source, so that the azimuth and the height of the simulated playing sound are realized, and the user experience is further improved. Specific control procedures may be referred to the above description and will not be repeated here.
In this embodiment, the manner in which the control module 320 calculates the channel balance value may refer to the specific description in the method embodiment, which is not described herein.
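The division of the device into modules 310 and 320 can be sketched as below. The class names, the injected `identify` and `locate` callables, and the string speaker identifier are all hypothetical placeholders; real identification and positioning through an array microphone or camera are outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Coordinates = Tuple[float, float]


@dataclass
class SoundSourcePositionModule:
    """Module 310: identifying unit 311 plus positioning unit 312."""
    identify: Callable[[], Optional[str]]   # returns an id for the current speaker, or None
    locate: Callable[[str], Coordinates]    # returns (x, y) of that speaker
    area_width: float                       # width X of the target area

    def acquire(self) -> Optional[Tuple[Coordinates, float]]:
        speaker = self.identify()
        if speaker is None:
            return None  # nobody is speaking, so there is no sound source
        return self.locate(speaker), self.area_width


@dataclass
class ControlModule:
    """Module 320: turns position information into a channel balance value."""
    threshold: float = 50.0  # channel balance threshold N

    def balance(self, coords: Coordinates, area_width: float) -> float:
        x, _y = coords  # only the abscissa is used in this embodiment
        return self.threshold * (x / area_width)
```

Injecting the two units as callables keeps the sketch independent of how the sounding object is actually recognized, which the embodiment leaves open.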
Based on the above embodiment, the present invention further provides an intelligent terminal, and a functional block diagram thereof may be shown in fig. 9. The intelligent terminal comprises a processor, a memory, a network interface and a display screen which are connected through a system bus. The processor of the intelligent terminal is used for providing computing and control capabilities. The memory of the intelligent terminal comprises a nonvolatile storage medium and an internal memory. The nonvolatile storage medium stores an operating system and a call control program. The internal memory provides an environment for the operation of the operating system and call control programs in the non-volatile storage medium. The network interface of the intelligent terminal is used for communicating with an external terminal through network connection. The call control program, when executed by the processor, implements the steps of any one of the call control methods described above. The display screen of the intelligent terminal can be a liquid crystal display screen or an electronic ink display screen.
It will be appreciated by those skilled in the art that the schematic block diagram shown in fig. 9 is merely a block diagram of a portion of the structure associated with the present invention and is not limiting of the smart terminal to which the present invention is applied, and that a particular smart terminal may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In one embodiment, there is provided an intelligent terminal including a memory, a processor, and a call control program stored on the memory and executable on the processor, the call control program when executed by the processor performing the following operation instructions:
acquiring sound source position information in the call process;
and controlling the sound channel parameters of the playing device in the conversation process based on the sound source position information, wherein the playing device at least comprises two sound channels.
The embodiment of the invention also provides a computer readable storage medium, and the computer readable storage medium stores a call control program, which when executed by a processor, implements any one of the steps of the call control method provided by the embodiment of the invention.
It should be understood that the sequence number of each step in the above embodiment does not mean the sequence of execution, and the execution sequence of each process should be determined by its function and internal logic, and should not be construed as limiting the implementation process of the embodiment of the present invention.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present invention. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
In the foregoing embodiments, each embodiment is described with its own emphasis; for parts that are not described or illustrated in detail in a particular embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the elements and algorithm steps of the examples described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other manners. For example, the apparatus/terminal device embodiments described above are merely illustrative, e.g., the division of the modules or units described above is merely a logical function division, and may be implemented in other manners, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed.
The integrated modules/units described above, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on such understanding, all or part of the flow of the methods of the above embodiments may also be implemented by a computer program instructing related hardware, where the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each method embodiment. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, and the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. The content of the computer-readable storage medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction.
The above embodiments are only for illustrating the technical solution of the present invention, not for limiting it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments can still be modified, or some of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the invention, and they remain within the protection scope of the invention.

Claims (7)

1. A call control method, the method comprising:
acquiring sound source position information in a call process, wherein the sound source position information comprises coordinates of a sound source, the coordinates of the sound source are two-dimensional coordinates constructed by taking the left side, when directly facing a display device, as the negative direction of an abscissa axis, the right side as the positive direction of the abscissa axis and the front as the positive direction of an ordinate axis, and the sound source is positioned in a target area, wherein the display device is a device capable of performing video display in the call process and arranged in front of the target area where the sound source is positioned;
and controlling channel parameters of a playing device in the call process based on the sound source position information, wherein the playing device comprises at least two channels, the controlling comprising:
calculating and acquiring a sound channel balance value based on the coordinates of the sound source and the width of the target area;
the coordinates of the sound source include an abscissa corresponding to a direction of the width of the target area, and the calculating of the acquired channel balance value based on the coordinates of the sound source and the width of the target area includes: acquiring a preset channel balance threshold; calculating to obtain the channel balance value based on the channel balance threshold, the abscissa and the target area width, wherein the channel balance value is equal to the ratio of the abscissa to the target area width multiplied by the channel balance threshold;
controlling channel parameters of the playing device based on the channel balance value, wherein the channel parameters comprise left channel gain and right channel gain;
the controlling the channel parameters of the playing device based on the channel balance value includes:
when the channel balance value is greater than 0, setting a value of a right channel gain to be equal to the channel balance value, and setting a value of a left channel gain to be smaller than the value of the right channel gain;
When the channel balance value is smaller than 0, setting a value of a left channel gain to be equal to an absolute value of the channel balance value, and setting a value of a right channel gain to be smaller than the value of the left channel gain;
when the channel balance value is equal to 0, setting the values of the left channel gain and the right channel gain to be equal to a preset channel balance threshold, wherein the channel balance threshold is the maximum value of the channel gain of the playing device.
2. The call control method according to claim 1, wherein the acquiring sound source position information in the call process includes:
identifying and acquiring a sound object, and taking the sound object as a sound source;
and positioning the sound source to obtain the coordinates of the sound source.
3. The call control method according to claim 2, wherein the sound source position information further includes a width of a target area where the sound source is located.
4. The call control method according to claim 2, wherein the identifying and acquiring a sound object includes:
and identifying and acquiring the sounding object through an array microphone or a camera.
5. A call control device, the device comprising:
a sound source position information acquisition module, configured to acquire sound source position information in the call process, wherein the sound source position information comprises coordinates of a sound source, the coordinates of the sound source are two-dimensional coordinates constructed by taking the left side, when directly facing a display device, as the negative direction of an abscissa axis, the right side as the positive direction of the abscissa axis and the front as the positive direction of an ordinate axis, the two-dimensional coordinates being used for representing the position of the sound source in a target area, and the display device is a device capable of performing video display in the call process and arranged in front of the target area where the sound source is located;
a control module, configured to control channel parameters of a playing device in the call process based on the sound source position information, wherein the playing device comprises at least two channels, the controlling comprising:
calculating and acquiring a sound channel balance value based on the coordinates of the sound source and the width of the target area;
the coordinates of the sound source include an abscissa corresponding to a direction of the width of the target area, and the calculating of the acquired channel balance value based on the coordinates of the sound source and the width of the target area includes: acquiring a preset channel balance threshold; calculating to obtain the channel balance value based on the channel balance threshold, the abscissa and the target area width, wherein the channel balance value is equal to the ratio of the abscissa to the target area width multiplied by the channel balance threshold;
controlling channel parameters of the playing device based on the channel balance value, wherein the channel parameters comprise left channel gain and right channel gain;
the controlling the channel parameters of the playing device based on the channel balance value includes:
when the channel balance value is greater than 0, setting a value of a right channel gain to be equal to the channel balance value, and setting a value of a left channel gain to be smaller than the value of the right channel gain;
When the channel balance value is smaller than 0, setting a value of a left channel gain to be equal to an absolute value of the channel balance value, and setting a value of a right channel gain to be smaller than the value of the left channel gain;
when the channel balance value is equal to 0, setting the values of the left channel gain and the right channel gain to be equal to a preset channel balance threshold, wherein the channel balance threshold is the maximum value of the channel gain of the playing device.
6. The call control device according to claim 5, wherein the sound source position information includes coordinates of a sound source, and the sound source position information acquisition module includes:
the recognition unit is used for recognizing and acquiring a sound object and taking the sound object as a sound source;
and the positioning unit is used for positioning the sound source and acquiring coordinates of the sound source.
7. An intelligent terminal, characterized in that it comprises a memory, a processor and a call control program stored on the memory and executable on the processor, which call control program, when executed by the processor, implements the steps of the call control method according to any of claims 1-4.
CN202110526170.3A 2021-05-14 2021-05-14 Call control method and device and intelligent terminal Active CN113301294B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110526170.3A CN113301294B (en) 2021-05-14 2021-05-14 Call control method and device and intelligent terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110526170.3A CN113301294B (en) 2021-05-14 2021-05-14 Call control method and device and intelligent terminal

Publications (2)

Publication Number Publication Date
CN113301294A CN113301294A (en) 2021-08-24
CN113301294B true CN113301294B (en) 2023-04-25

Family

ID=77321999

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110526170.3A Active CN113301294B (en) 2021-05-14 2021-05-14 Call control method and device and intelligent terminal

Country Status (1)

Country Link
CN (1) CN113301294B (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant