CN117135557A - Audio processing method, device, electronic equipment, storage medium and program product - Google Patents


Info

Publication number
CN117135557A
CN117135557A (application CN202210940126.1A)
Authority
CN
China
Prior art keywords
played, signal, audio, angle, speaker
Legal status (assumed, not a legal conclusion)
Pending
Application number
CN202210940126.1A
Other languages
Chinese (zh)
Inventor
Qin Yu (秦宇)
Xie Renli (谢仁礼)
Current Assignee
Shenzhen TCL Digital Technology Co Ltd
Original Assignee
Shenzhen TCL Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen TCL Digital Technology Co Ltd
Priority to CN202210940126.1A
Priority to PCT/CN2023/097184
Publication of CN117135557A
Legal status: Pending

Classifications

    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30 Control circuits for electronic adaptation of the sound field
    • H04S7/302 Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04R5/00 Stereophonic arrangements
    • H04S1/00 Two-channel systems
    • H04S2400/00 Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/09 Electronic reduction of distortion of stereophonic sound systems
    • H04S2400/11 Positioning of individual sound objects, e.g. moving airplane, within a sound field


Abstract

The embodiment of the invention discloses an audio processing method, an audio processing device, an electronic device, a storage medium and a program product. The method acquires an audio signal to be played for each of at least two loudspeakers and a target sound source angle corresponding to each audio signal to be played; determines a signal transfer angle between each loudspeaker and the user's head; calculates, based on the signal transfer angle and the target sound source angle, an anti-crosstalk function corresponding to the signal transfer angle, the anti-crosstalk function being used to cancel the crosstalk produced when the at least two loudspeakers play audio; and performs signal transformation on the audio signal to be processed of each loudspeaker based on the target sound source angle, the signal transfer angle and the anti-crosstalk function, to obtain a target playing audio signal corresponding to the audio signal to be played. The scheme can generate audio signals that convey the direction of the sound source and cancel the crosstalk produced when playing audio through loudspeakers, so that the user enjoys audio with a better playing effect.

Description

Audio processing method, device, electronic equipment, storage medium and program product
Technical Field
The present invention relates to the field of audio processing technology, and in particular, to an audio processing method, apparatus, electronic device, storage medium, and program product.
Background
With the rapid development of the economy and of technology, people have begun to pursue the listening experience of surround sound in daily life. Achieving a surround sound effect by installing additional audio equipment, however, consumes time and money.
At present, modern electroacoustic technology can produce a stereo effect without adding audio equipment. However, such schemes generally process only two-channel audio data for playback and cannot fully achieve a true surround stereo effect; moreover, when applied to a loudspeaker (open sound field) scenario, the already limited stereo playing effect is further weakened by factors such as crosstalk.
Disclosure of Invention
The embodiments of the present invention provide an audio processing method, an audio processing apparatus, an electronic device, a storage medium and a program product, which can generate audio signals that convey the sound source direction without adding audio playback equipment, and cancel the crosstalk produced when loudspeakers play audio, so that the user enjoys audio with a better playing effect.
The embodiment of the invention provides an audio processing method, which comprises the following steps:
acquiring an audio signal to be played for each of at least two loudspeakers and a target sound source angle corresponding to each audio signal to be played;
determining a signal transfer angle between each loudspeaker and the user's head;
calculating an anti-crosstalk function corresponding to the signal transfer angle based on the signal transfer angle and the target sound source angle, wherein the anti-crosstalk function is used to cancel the crosstalk produced when the at least two loudspeakers play audio; and
performing signal transformation on the audio signal to be processed of each loudspeaker based on the anti-crosstalk function, to obtain a target playing audio signal corresponding to each audio signal to be played.
Accordingly, an embodiment of the present invention provides an audio processing apparatus, including:
a signal acquisition unit, configured to acquire an audio signal to be played for each of at least two loudspeakers and a target sound source angle corresponding to each audio signal to be played;
an angle determination unit, configured to determine a signal transfer angle between each loudspeaker and the user's head;
a function calculation unit, configured to calculate an anti-crosstalk function corresponding to the signal transfer angle based on the signal transfer angle and the target sound source angle, wherein the anti-crosstalk function is used to cancel the crosstalk produced when the at least two loudspeakers play audio; and
a signal transformation unit, configured to perform signal transformation on the audio signal to be processed of each loudspeaker based on the anti-crosstalk function, to obtain a target playing audio signal corresponding to each audio signal to be played.
Optionally, the function calculation unit is configured to determine a speaker head-related transfer function corresponding to the signal transfer angle based on the signal transfer angle;
determine a sound source head-related transfer function corresponding to the target sound source angle based on the target sound source angle;
and calculate the anti-crosstalk function corresponding to the signal transfer angle from the speaker head-related transfer function and the sound source head-related transfer function.
Optionally, the at least two loudspeakers include a left speaker and a right speaker, and the angle determination unit is configured to determine a left signal transfer angle between the left speaker and the user's head,
and determine a right signal transfer angle between the right speaker and the user's head;
the function calculation unit is configured to determine, based on the left signal transfer angle, a first left-ear head-related transfer function between the left speaker and the user's left ear and a first right-ear head-related transfer function between the left speaker and the user's right ear;
determine, based on the right signal transfer angle, a second left-ear head-related transfer function between the right speaker and the user's left ear and a second right-ear head-related transfer function between the right speaker and the user's right ear;
and take the first left-ear, first right-ear, second left-ear and second right-ear head-related transfer functions together as the speaker head-related transfer functions.
Optionally, the at least two loudspeakers include a left speaker and a right speaker, and the angle determination unit is configured to determine the positional relationship among the left speaker, the right speaker and the user's head;
if the left speaker and the right speaker are bilaterally symmetric about the user's head, the angle between either loudspeaker and the user's head is taken as the signal transfer angle;
the function calculation unit is configured to determine, based on the signal transfer angle, a first head-related transfer function (between the left speaker and the user's left ear, and equally between the right speaker and the right ear) and a second head-related transfer function (between the left speaker and the user's right ear, and equally between the right speaker and the left ear);
and take the first head-related transfer function and the second head-related transfer function as the speaker head-related transfer functions.
Optionally, the function calculation unit is configured to perform matrix combination on the speaker head-related transfer functions to obtain a speaker crosstalk matrix corresponding to the audio signal to be processed;
perform matrix cancellation (inversion) on the speaker crosstalk matrix to calculate a crosstalk cancellation matrix;
and calculate the anti-crosstalk function corresponding to the signal transfer angle based on the crosstalk cancellation matrix and the sound source head-related transfer function.
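For the two-speaker case, the three optional sub-steps above admit a compact per-frequency-bin sketch. The function name, the array shapes and the 2x2 matrix layout below are illustrative assumptions; the patent does not fix a particular representation:

```python
import numpy as np

def anti_crosstalk_function(h_ll, h_rl, h_lr, h_rr, b_left, b_right):
    """Per-frequency-bin sketch of the three optional sub-steps.

    h_xy           : speaker head-related transfer function from speaker x
                     to ear y, shape (n_bins,), complex
    b_left/b_right : sound source head-related transfer functions for the
                     target sound source angle, shape (n_bins,), complex
    Returns shape (n_bins, 2): one anti-crosstalk filter per loudspeaker.
    """
    n_bins = h_ll.shape[0]
    g = np.empty((n_bins, 2), dtype=complex)
    for k in range(n_bins):
        # 1) Matrix combination: speaker crosstalk matrix (rows = ears).
        H = np.array([[h_ll[k], h_rl[k]],
                      [h_lr[k], h_rr[k]]])
        # 2) Matrix cancellation: crosstalk cancellation matrix inv(H).
        C = np.linalg.inv(H)
        # 3) Combine with the sound source HRTFs -> anti-crosstalk function.
        g[k] = C @ np.array([b_left[k], b_right[k]])
    return g
```

Driving the speakers with these filters makes the ear signals H·g equal the desired binaural response b, which is what cancelling crosstalk while re-imposing the target sound source angle requires. A practical implementation would typically add regularization at frequencies where H is nearly singular.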
Optionally, the audio processing apparatus provided by the embodiment of the present invention further includes a function processing unit, configured to acquire preset discrete head-related transfer functions
and perform function approximation on the discrete head-related transfer functions to obtain a target head-related transfer function;
the function calculation unit is configured to determine the speaker head-related transfer function corresponding to the signal transfer angle based on the signal transfer angle and the target head-related transfer function,
and determine the sound source head-related transfer function corresponding to the target sound source angle based on the target sound source angle and the target head-related transfer function.
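The "function approximation" step can be as simple as interpolating between the discrete measurement angles so that any signal transfer angle or target sound source angle can be evaluated. A minimal sketch, using linear interpolation over a circular angle grid (one common choice; the patent does not prescribe the method):

```python
import numpy as np

def hrtf_at_angle(angles_deg, hrtf_table, query_deg):
    """Evaluate a head-related transfer function at an arbitrary angle
    from a preset discrete table, by linear interpolation on a circular
    angle grid.

    angles_deg : sorted (n,) array of measurement angles in [0, 360)
    hrtf_table : (n, n_bins) complex responses, one row per angle
    query_deg  : signal transfer angle or target sound source angle
    """
    q = query_deg % 360.0
    # Close the circle: repeat the first measurement at first_angle + 360.
    a = np.concatenate([angles_deg, [angles_deg[0] + 360.0]])
    h = np.vstack([hrtf_table, hrtf_table[:1]])
    # Interpolate the real and imaginary parts bin by bin.
    re = np.array([np.interp(q, a, h[:, k].real) for k in range(h.shape[1])])
    im = np.array([np.interp(q, a, h[:, k].imag) for k in range(h.shape[1])])
    return re + 1j * im
```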
Optionally, the signal acquisition unit is configured to acquire the audio signal to be played for each of the at least two loudspeakers,
and perform sound source localization on each audio signal to be played to determine the target sound source angle corresponding to each audio signal to be played.
Optionally, the signal acquisition unit is configured to acquire the audio signal to be played for each of the at least two loudspeakers and a video frame to be played corresponding to the audio signals to be played;
determine sounding position information of a sounding object from the video frame to be played;
and calculate the target sound source angle corresponding to each audio signal to be played based on the sounding position information of the sounding object.
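One way to turn the sounding position information into a target sound source angle is to project the object's horizontal position in the frame onto the viewer's line of sight. A minimal sketch; the centred viewer and the 60° horizontal field of view are illustrative assumptions, as the patent leaves the exact mapping open:

```python
import math

def source_angle_from_frame(x_px, frame_width_px, fov_deg=60.0):
    """Map a sounding object's horizontal pixel position to a target
    sound source angle. Assumes the viewer faces the centre of the
    screen and that the frame spans fov_deg of the viewer's horizontal
    field of view.

    Returns a signed angle in degrees: negative for objects left of
    centre, positive for objects right of centre.
    """
    # Normalised horizontal offset from the frame centre, in [-0.5, 0.5].
    offset = x_px / frame_width_px - 0.5
    # Viewing distance (in frame-width units) implied by the field of
    # view: half the frame width subtends half the field of view.
    viewing_distance = 0.5 / math.tan(math.radians(fov_deg / 2.0))
    return math.degrees(math.atan2(offset, viewing_distance))
```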
Optionally, the video frame to be played includes at least one candidate sounding object, and the audio processing apparatus provided by the embodiment of the invention further includes a sounding object determination unit, configured to determine the sounding object corresponding to the audio signal to be played and acquire object identification information of the sounding object;
the signal acquisition unit is configured to perform information matching based on the object identification information among the candidate sounding objects included in the video frame to be played, and thereby determine the sounding object;
and acquire the target display area of the sounding object in the video frame to be played, taking the position information of the target display area as the sounding position information of the sounding object.
Optionally, the video frame to be played includes the display area of at least one candidate sounding object, and the signal acquisition unit is configured to detect sounding actions in each display area of the video frame to be played; if a candidate sounding object in a display area is detected performing a sounding action, that candidate sounding object is taken as the sounding object;
the signal acquisition unit then acquires the target display area of the sounding object in the video frame to be played, taking the position information of the target display area as the sounding position information of the sounding object.
Optionally, the audio processing apparatus provided by the embodiment of the present invention further includes a first position determination unit, configured to determine, in response to a position selection operation by the user, sound receiving position information corresponding to the user in the video frame to be played;
the signal acquisition unit is configured to determine a signal transmission direction between the sounding position and the sound receiving position based on the sounding position information of the sounding object and the sound receiving position information,
and calculate the target sound source angle corresponding to each audio signal to be played from the signal transmission direction.
Optionally, the audio processing apparatus provided by the embodiment of the present invention further includes a second position determination unit, configured to determine, in response to a reference object selection operation by the user, a reference object corresponding to the user in the video frame to be played,
and acquire reference position information of the reference object in the video frame to be played;
the signal acquisition unit is configured to determine a signal transfer angle between the sounding object and the reference object based on the sounding position information of the sounding object and the reference position information, and take that angle as the target sound source angle corresponding to each audio signal to be played.
Optionally, the signal acquisition unit is configured to receive a data packet to be played sent by a signal sending end, where the data packet is encoded from the audio signals to be played of each loudspeaker and the target sound source angles corresponding to the audio signals to be played,
and decode the data packet to obtain the audio signal to be played for each of the at least two loudspeakers and the target sound source angle corresponding to each audio signal to be played.
Optionally, the signal acquisition unit is configured to receive a data packet to be played sent by a cloud, where the data packet is sent to the cloud by a signal sending end and is encoded by the signal sending end from the audio signals to be played of each loudspeaker and the target sound source angles corresponding to each audio signal to be played,
and decode the data packet to obtain the audio signal to be played for each of the at least two loudspeakers and the target sound source angle corresponding to each audio signal to be played.
Optionally, the signal transformation unit is configured to calculate an anti-crosstalk-resonance function based on the anti-crosstalk function, where the anti-crosstalk-resonance function is used to cancel both the crosstalk produced when the at least two loudspeakers play audio and the influence of the resonance of the human external auditory canal,
and perform signal transformation on the audio signal to be processed of each loudspeaker based on the anti-crosstalk-resonance function, to obtain the target playing audio signal corresponding to each audio signal to be played.
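The signal transformation itself can be sketched as per-bin multiplication in the frequency domain, whether the filter is the anti-crosstalk function or the anti-crosstalk-resonance function. A single-block sketch (a real renderer would use an overlap-add scheme across blocks; the names and shapes are illustrative assumptions):

```python
import numpy as np

def transform_block(x, g):
    """Apply per-bin filters to one mono block of the audio signal to be
    processed, producing the target playing audio signals.

    x : (n,) real audio block
    g : (n//2 + 1, 2) complex filter spectra, one column per loudspeaker
        (e.g. the anti-crosstalk or anti-crosstalk-resonance function)
    Returns (2, n) real drive signals for the two loudspeakers.
    """
    n = x.shape[0]
    X = np.fft.rfft(x)            # real FFT: n//2 + 1 bins
    Y = g * X[:, None]            # per-bin, per-speaker multiplication
    return np.fft.irfft(Y.T, n)   # back to time domain, one row per speaker
```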
Optionally, the audio processing apparatus provided by the embodiment of the present invention further includes an adjustment parameter calculation unit, configured to acquire the playing setting parameters of each loudspeaker and the audio playing preference parameters corresponding to the target user;
extract audio parameters from the audio signal to be processed to obtain the parameters to be played corresponding to the audio signal to be processed;
and calculate an audio adjustment function corresponding to the audio signal to be processed from the playing setting parameters, the audio playing preference parameters and the parameters to be played;
the signal transformation unit is configured to perform signal transformation on the audio signal to be processed based on the audio adjustment function, the target sound source angle, the signal transfer angle and the anti-crosstalk function, to obtain the target playing audio signal corresponding to the audio signal to be processed.
Optionally, the audio processing apparatus provided by the embodiment of the present invention further includes an audio playing unit, configured to send the target playing audio signal corresponding to each audio signal to be played to the corresponding loudspeaker, triggering each loudspeaker to play its target playing audio signal.
Correspondingly, an embodiment of the present invention also provides an electronic device, including a memory and a processor; the memory stores an application program, and the processor is configured to run the application program in the memory to execute the steps of any audio processing method provided by the embodiments of the present invention.
Accordingly, embodiments of the present invention also provide a computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of any audio processing method provided by the embodiments of the present invention.
In addition, an embodiment of the present invention also provides a computer program product, including a computer program or instructions which, when executed by a processor, implement the steps of any audio processing method provided by the embodiments of the present invention.
By adopting the scheme of the embodiment of the invention, the audio signal to be played for each of at least two loudspeakers and the target sound source angle corresponding to each audio signal to be played can be acquired; the signal transfer angle between each loudspeaker and the user's head is determined; an anti-crosstalk function corresponding to the signal transfer angle is calculated based on the signal transfer angle and the target sound source angle, the anti-crosstalk function being used to cancel the crosstalk produced when the at least two loudspeakers play audio; and the audio signal to be processed of each loudspeaker is transformed based on the target sound source angle, the signal transfer angle and the anti-crosstalk function, to obtain the target playing audio signal corresponding to each audio signal to be played. In the embodiment of the invention, an anti-crosstalk function capable of cancelling the crosstalk produced in an open sound field is calculated, and the audio signal to be played is transformed based on the target sound source angle and the anti-crosstalk function, so that the target playing audio signal conveys the sound source position information while offsetting the crosstalk produced during playback. Audio signals that convey the sound source direction can therefore be generated without adding audio playback equipment, and the crosstalk produced when playing audio through loudspeakers is cancelled, so that the user enjoys audio with a better playing effect.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1a is a schematic view of a scenario of an audio processing method according to an embodiment of the present invention;
FIG. 1b is a schematic view of another scenario of an audio processing method according to an embodiment of the present invention;
FIG. 2 is a flow chart of an audio processing method provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of a video frame to be played according to an embodiment of the present invention;
FIG. 4 is another schematic diagram of a video frame to be played according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the ideal and actual states of audio signal transmission provided by an embodiment of the present invention;
FIG. 6 is a schematic diagram of a technical implementation of virtual 5.1 surround sound provided by an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an audio processing apparatus according to an embodiment of the present invention;
FIG. 8 is another schematic structural diagram of an audio processing apparatus according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
The embodiments of the present invention provide an audio processing method, an audio processing apparatus, an electronic device and a computer-readable storage medium. Specifically, the embodiments provide an audio processing method suitable for an audio processing apparatus, and the audio processing apparatus may be integrated in an electronic device.
The electronic device may be a terminal, including but not limited to mobile and fixed terminals: mobile terminals include but are not limited to smartphones, smart watches, tablet computers, notebook computers and smart cars, while fixed terminals include but are not limited to desktop computers and smart televisions.
The electronic device may also be a server, which may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network) and big data and artificial intelligence platforms, but is not limited thereto.
The audio processing method of the embodiment of the invention may be implemented by a server alone, or jointly by a terminal and a server.
The audio processing method is described below taking joint implementation by a terminal and a server as an example.
As shown in FIG. 1a, the audio processing system provided in the embodiment of the present invention may include a signal sending end 10, a signal receiving end 20, and the like; the signal sending end 10 and the signal receiving end 20 are connected through a network, for example a wired or wireless network, and the signal sending end 10 may exist as an electronic device that sends the audio signal to be played and the target sound source angle to the signal receiving end 20.
In some examples, the signal sending end 10 may be configured to receive a user's voice input, generate the audio signal to be played, perform sound source analysis on it to obtain the target sound source angle, and send the audio signal to be played and the target sound source angle to the signal receiving end 20.
The signal receiving end 20 may be configured to acquire the audio signal to be played for each of at least two loudspeakers and the target sound source angle corresponding to each audio signal to be played, determine the signal transfer angle between each loudspeaker and the user's head, calculate an anti-crosstalk function corresponding to the signal transfer angle based on the signal transfer angle and the target sound source angle (the anti-crosstalk function being used to cancel the crosstalk produced when the at least two loudspeakers play audio), and perform signal transformation on the audio signal to be processed of each loudspeaker based on the anti-crosstalk function, to obtain the target playing audio signal corresponding to each audio signal to be played.
It should be understood that, in some embodiments, the audio processing steps performed by the signal receiving end 20 may also be performed by the signal sending end 10, in which case the signal sending end 10 may directly send the transformed target playing audio signal to the signal receiving end 20; the embodiments of the present invention are not limited in this respect.
As shown in FIG. 1b, the audio processing system provided in the embodiment of the present invention may include a signal sending end 10, a signal receiving end 20, a cloud 30, and the like; the signal sending end 10, the signal receiving end 20 and the cloud 30 are connected through a network, for example a wired or wireless network, and the signal sending end 10 may exist as an electronic device that sends the audio signal to be played and the target sound source angle to the cloud 30.
In some examples, the signal sending end 10 may be configured to receive a user's voice input, generate the audio signal to be played, perform sound source analysis on it to obtain the target sound source angle, and send the audio signal to be played and the target sound source angle to the cloud 30.
The cloud 30 may be configured to forward the audio signal to be played and the target sound source angle to the signal receiving end 20.
The signal receiving end 20 may be configured to acquire the audio signal to be played for each of at least two loudspeakers and the target sound source angle corresponding to each audio signal to be played, determine the signal transfer angle between each loudspeaker and the user's head, calculate an anti-crosstalk function corresponding to the signal transfer angle based on the signal transfer angle and the target sound source angle (the anti-crosstalk function being used to cancel the crosstalk produced when the at least two loudspeakers play audio), and perform signal transformation on the audio signal to be processed of each loudspeaker based on the anti-crosstalk function, to obtain the target playing audio signal corresponding to each audio signal to be played.
It should be understood that, in some embodiments, the audio processing steps performed by the signal receiving end 20 may also be performed by the signal sending end 10 or the cloud 30; for example, the signal sending end 10 or the cloud 30 may directly send the transformed target playing audio signal to the signal receiving end 20. The embodiments of the present invention are not limited in this respect.
Each of the above aspects is described in detail below. The order of description of the following embodiments is not intended to limit the preferred order of the embodiments.
The embodiments of the present invention will be described from the perspective of an audio processing apparatus, which may in particular be integrated in a server or a terminal.
As shown in fig. 2, the specific flow of the audio processing method of the present embodiment may be as follows:
201. Acquiring an audio signal to be played for each of at least two loudspeakers and a target sound source angle corresponding to each audio signal to be played.
In the embodiment of the invention, a loudspeaker is a speaker used for audio playback in an open sound field, and there are at least two loudspeakers.
For example, the open sound field may be a scenario in which audio and video are played through home appliances such as televisions and smart speakers, or through car audio in an automobile. A loudspeaker may be a speaker provided in an electronic device such as a television, a mobile phone or a vehicle-mounted terminal.
The audio signal to be played is the original audio signal acquired by the audio processing apparatus. For example, in a video-conference scenario, the audio signal to be played may be the speech signal of the speaking user collected by the video-conference client; in an audio-and-video playback scenario, it may be an audio signal derived from an audio file that needs to be played.
It will be appreciated that the audio signals to be played of the loudspeakers may be the same or different. For example, an audio file to be played may itself contain separate left-channel and right-channel audio signals to be played, in which case each loudspeaker corresponds to either the left-channel or the right-channel audio signal.
Specifically, the target sound source angle is the angle at which the sound source is expected to be presented to the user during audio playback. For example, in a video call, if the angle between the position of the speaking user in the video picture and the position of the user of the video-call application in the video picture is 30°, the target sound source angle may be 30°.
The target sound source angles corresponding to the audio signals to be played may likewise be the same or different. For example, if the loudspeakers play the same audio signal, the target sound source angles are the same; in a scenario where multiple users speak at the same time, different audio signals to be played may be obtained from the voice input signals of different users, and the target sound source angles corresponding to these signals may differ.
Moreover, even different audio signals to be played can share the same target sound source angle. For example, the speech of one user may be split into left-channel and right-channel audio signals to be played; the loudspeakers then correspond to the left-channel or right-channel signal respectively, but both channels correspond to the same speaking user and hence to the same target sound source angle.
In some alternative embodiments, the target sound source angle may need to be located for the audio signal to be played, and step 201 may specifically include:
acquiring an audio signal to be played of each of at least two loud speakers;
and performing sound source position positioning on the audio signals to be played, and determining target sound source angles corresponding to the audio signals to be played.
The sound source localization may be implemented using technologies such as direction-of-arrival (DOA) estimation or microphone array positioning. Specifically, the sound source localization may be performed at the signal transmitting end that generates the audio signal to be played. For example, in a video conference, the target sound source angle may be located by a Microphone (MIC) array at the terminal of the speaking user.
Alternatively, the positioning of the sound source position may be performed at a cloud end or a signal receiving end that obtains the audio signal to be played. For example, the signal receiving end may obtain the target sound source angle based on information such as a time delay between a speaking time of a speaking user and a time when an audio signal to be played is received.
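As a hedged illustration of the DOA idea (not the specific algorithm used by the embodiment), the arrival angle for a two-microphone array can be estimated from the time difference of arrival between the channels, found as the peak of their cross-correlation:

```python
import numpy as np

def estimate_doa(sig_a, sig_b, mic_distance_m, fs, c=343.0):
    """Estimate a direction-of-arrival angle (degrees from broadside)
    for a two-microphone array from the time difference of arrival,
    located as the peak of the cross-correlation of the two channels."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_b) - 1)   # delay in samples
    tdoa = lag / fs
    # Far-field model: tdoa = d * sin(angle) / c
    sin_angle = np.clip(tdoa * c / mic_distance_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_angle)))
```

A signal delayed by three samples on one microphone of a 0.2 m pair sampled at 16 kHz yields an angle of roughly 19°, matching the far-field model.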
In other alternative embodiments, the audio signal to be played may correspond to a video frame, and the target sound source angle may be obtained according to the position of the sound object in the video frame. The step of obtaining the audio signal to be played of each of the at least two loud speakers and the target sound source angle corresponding to each of the audio signals to be played may specifically include:
acquiring an audio signal to be played of each of at least two loud speakers and a video frame to be played corresponding to the audio signal to be played;
determining sounding position information of a sounding object from the video frame to be played;
and calculating the target sound source angle corresponding to each audio signal to be played based on the sounding position information of the sounding object.
The video frame to be played corresponding to the audio signal to be played may be a video frame synchronized with the audio signal to be played, or the video frame to be played may be a synchronized video frame synchronized with the audio signal to be played and N video frames sequentially located before and/or after the synchronized video frame.
Specifically, the value of N may be set by a technician according to the practical situation, which is not limited in the embodiment of the present invention.
For example, the video frames to be played may include the synchronous video frame synchronized with the audio signal to be played and the 10 video frames sequentially following it. The cloud or the signal receiving end can then detect sounding objects in advance from the 10 video frames following the synchronous video frame, which improves the speed of audio processing.
The sound emission position information may be a region position of a display region of the sound emission object in a video frame to be played. Alternatively, the sound emission position information may be virtual position information of the sound emission object in a virtual space corresponding to the video frame to be played, and so on.
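When the sound emission position information is a display region in the frame, the target sound source angle can be derived from the horizontal position of that region. A minimal sketch, assuming the listener faces the centre of the screen and the frame is treated as spanning a fixed horizontal field of view (the field-of-view value is an assumption, not given by the text):

```python
import math

def azimuth_from_region(region_center_x, frame_width, h_fov_deg=60.0):
    """Map the horizontal centre of a sound object's display region to a
    target sound source angle in degrees.  0 = screen centre; positive
    values are to the right.  h_fov_deg is an assumed viewing span."""
    offset = region_center_x / frame_width - 0.5      # -0.5 .. 0.5
    half_span = math.tan(math.radians(h_fov_deg / 2.0))
    return math.degrees(math.atan(2.0 * offset * half_span))
```

An object centred on screen maps to 0°, and an object at the right edge maps to half the assumed field of view.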
In some examples, when the audio signal to be played is obtained, it may be determined which object in the video frame to be played corresponds to the audio signal to be played, and after determining the corresponding sound object, the target sound source angle may be calculated according to the position of the sound object in the video frame to be played. That is, the video frame to be played may include at least one candidate sound object, and before the step of determining sound position information of the sound object from the video frame to be played, the audio processing method provided by the embodiment of the present invention may further include:
And determining a sound object corresponding to the audio signal to be played, and acquiring object identification information of the sound object.
The object identification information may be identification information capable of identifying the sound generating object, such as an account nickname, a unique identity ID, etc., and the embodiment of the present invention does not limit the content and form of the object identification information.
Correspondingly, the step of determining the sounding position information of the sounding object from the video frame to be played may specifically include:
information matching is carried out from the candidate sound production objects included in the video frame to be played based on the object identification information, and the sound production objects are determined;
and acquiring a target display area of the sound emission object in the video frame to be played, and taking the position information of the target display area as sound emission position information of the sound emission object.
For example, as shown in fig. 3, 3 different candidate sound objects user 1, user 2, and user 3 are displayed in the smart tv. When the user 1 speaks in the video conference, based on application data of the video conference application, it may be determined that a sound object corresponding to the audio signal to be played is the user 1, and the object identification information may be "user 1".
The smart television can determine the sound generating object corresponding to the user 1 and the target display area of the sound generating object in the video frame to be played from the candidate sound generating objects according to the object identification information of the user 1.
It may be appreciated that the display area of each candidate sound object may be fixed, and the position information of the target display area of the sound object may be obtained after determining the sound object corresponding to the audio signal to be played. Alternatively, the user may set the display area of the candidate sound object by himself, and at this time, the position information of the target display area needs to be determined according to the object identification information of the sound object and the personalized setting information of the user.
In other examples, for the region where each object in the video frame to be played is located, it may be determined which object is sounding by detecting whether each object has a mouth activity, etc. That is, the display area of at least one candidate sound object may be included in the video frame to be played, and the step of determining sound position information of the sound object from the video frame to be played includes:
respectively detecting sounding actions for each display area in the video frame to be played, and taking a candidate sounding object as a sounding object if detecting that the candidate sounding object in one display area executes sounding actions;
And acquiring a target display area of the sound emission object in the video frame to be played, and taking the position information of the target display area as sound emission position information of the sound emission object.
Specifically, the sounding action detection may only detect whether the mouth of each candidate sounding object is active. Alternatively, to avoid misrecognition caused by mouth movements that do not produce speech, such as coughing, facial muscle recognition and the like may be added on the basis of mouth movement recognition, thereby improving the accuracy of sounding action detection.
For example, as shown in fig. 3, when the mouth of the user 1 is active, the candidate sound object user 1 is the sound object. At this time, the target display area is the local area where the user 1 is located.
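A simple, hedged sketch of mouth-activity detection from facial landmarks (the landmark source — any face-landmark detector — and the thresholds are assumptions, not specified by the text):

```python
import math

def mouth_aspect_ratio(top, bottom, left, right):
    """Vertical mouth opening relative to mouth width, computed from
    four (x, y) landmark points around the mouth."""
    return math.dist(top, bottom) / math.dist(left, right)

def is_speaking(mar_history, open_threshold=0.35, min_variation=0.1):
    """Treat a candidate object as sounding only when its mouth both
    opens past a threshold and varies across recent frames, so that a
    static open mouth is not misidentified as speech."""
    return (max(mar_history) > open_threshold
            and max(mar_history) - min(mar_history) > min_variation)
```

Requiring variation over several frames is one way to implement the document's point that mouth movement alone can cause misrecognition.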
In other optional examples, the user may manually select the position of the user, and before the step of calculating the target sound source angle corresponding to each audio signal to be played based on the sound production position information of the sound production object, the audio processing method provided by the embodiment of the present invention may further include:
and responding to the position selection operation of the user, and determining the corresponding sound receiving position information of the user in the video frame to be played.
For example, the user may select his or her position in the video frame to be played, such as the lower left corner of the picture, the center of the picture, etc., at the time of the video conference. The position selection operation of the user may be a drag operation of the user on the screen display frame corresponding to the user, a click/double click trigger operation in the video conference screen, or the like.
Correspondingly, the step of calculating the target sound source angle corresponding to each audio signal to be played based on the sounding position information of the sounding object may specifically include:
determining a signal transmission direction between a sound emission position and a sound reception position based on sound emission position information of the sound emission object and the sound reception position information;
and calculating the target sound source angle corresponding to each audio signal to be played according to the signal transmission direction.
The signal transmission direction refers to a direction between a sound emission position of the sound emission object and a sound receiving position selected by a user.
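The direction between the sound emission position and the user-selected sound receiving position reduces to planar geometry. A minimal sketch (the coordinate convention and the listener's facing direction are illustrative assumptions):

```python
import math

def target_angle(emit_pos, receive_pos, facing_deg=90.0):
    """Target sound source angle of the sounding position as seen from
    the sound receiving position, measured against an assumed facing
    direction.  Positions are (x, y) pairs in the frame or virtual-space
    plane; the result is wrapped into (-180, 180]."""
    dx = emit_pos[0] - receive_pos[0]
    dy = emit_pos[1] - receive_pos[1]
    bearing = math.degrees(math.atan2(dy, dx))
    return (bearing - facing_deg + 180.0) % 360.0 - 180.0
```

A source directly ahead of the listener yields 0°, and a source to the listener's right yields a negative angle under this convention.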
It will be appreciated that the location indicated by the sound receiving location information may not be in the video frame to be played. For example, in the context of an online concert, a user may pre-select his or her virtual location in a virtual concert venue. At this time, the virtual location is the sound receiving position corresponding to the video frames to be played in the user's online concert. The virtual location selected by the user may or may not be displayed in the video frames of the concert.
Correspondingly, the sound production object is the performer in the concert, and the sound production position information is the position information of the performer in the virtual stadium.
Optionally, to enhance the viewing experience of the user when watching the audio and video, the user may be allowed to select an object in a certain video, so as to enhance the feeling of being in the scene of the user. That is, before the step of calculating the target sound source angle corresponding to each audio signal to be played based on the sound production position information of the sound production object, the audio processing method provided by the embodiment of the invention may further include:
responding to a reference object selection operation of a user, and determining a corresponding reference object of the user in the video frame to be played;
and acquiring the reference position information of the reference object in the video frame to be played.
For example, as shown in fig. 4, there are two objects in a video picture displayed by the terminal, and the user can select one of them as a reference object.
Correspondingly, the step of calculating the target sound source angle corresponding to each audio signal to be played based on the sounding position information of the sounding object may specifically include:
and determining a signal transmission angle between the sounding object and the reference object based on the sounding position information of the sounding object and the reference position information, and taking the signal transmission angle as a target sound source angle corresponding to each audio signal to be played.
It will be appreciated that both the position of the reference object and the position of the sound object may change. For example, when a TV drama is playing on the terminal, the reference object selected by the user may be the protagonist of the drama, the sounding objects may change at any time, and both the positions of the sounding objects and the position of the reference object may move.
202. Determining a signal transmission angle between each loud speaker and the head of a user;
the signal transmission angle is the included angle between the loud speaker and the head of the user. Specifically, when there is only one person in the speaker or the human body detection range of the terminal in which the speaker is located, the head position of the person may be defaulted to be the user head position.
If the loud speaker or at least two persons exist in the human body detection range of the terminal where the loud speaker is located, the person closest to the loud speaker can be taken as a user; alternatively, the face of the account logged in on the terminal may be matched with the face of each person, the successfully matched person may be used as the user, and so on.
In some embodiments, when the signal sending end and the signal receiving end cannot directly communicate, the data packet to be played may be forwarded to the signal receiving end through the cloud. That is, the step of "obtaining the audio signal to be played and the target sound source angle corresponding to each audio signal to be played in each of the at least two loud speakers" may include:
Receiving a data packet to be played sent by a cloud end, wherein the data packet to be played is sent to the cloud end by a signal transmitting end, and is obtained by the signal transmitting end through encoding based on the audio signals to be played of the loud speakers and the target sound source angles corresponding to the audio signals to be played;
decoding the data packet to be played to obtain an audio signal to be played of each of at least two loud speakers and a target sound source angle corresponding to each audio signal to be played.
The signal sending end can directly conduct sound source positioning based on the audio signal to be played to obtain a target sound source angle, and then the audio signal to be played and the target sound source angle are encoded to obtain a data packet to be played.
Or, the signal transmitting end may encode only the audio signal to be played to obtain a data packet to be played, send the data packet to be played to the cloud end, and after the cloud end forwards the data packet to be played to the signal receiving end, the signal receiving end may decode the data packet to be played, and determine the target sound source angle according to the obtained audio signal to be played.
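The idea of packaging an audio block together with its target sound source angle can be sketched with a minimal binary framing (this layout is illustrative — an actual system would use a proper codec and container format):

```python
import struct

def pack_playback_packet(angle_deg, audio_bytes):
    """Frame one audio block with its target sound source angle: a
    little-endian header (float32 angle, uint32 payload length)
    followed by the encoded audio payload."""
    return struct.pack("<fI", angle_deg, len(audio_bytes)) + audio_bytes

def unpack_playback_packet(packet):
    """Recover the target sound source angle and audio payload."""
    angle_deg, n = struct.unpack_from("<fI", packet)
    return angle_deg, packet[8:8 + n]
```

The receiving end (or the cloud, when it performs the decoding) recovers the angle and payload from the same fixed header.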
It can be understood that, when the processing capability of the signal transmitting end and the signal receiving end is limited, or when the signal transmitting end and the signal receiving end are too far apart to communicate directly, the audio signal to be played may be obtained and processed at the cloud end.
In order to reduce transmission pressure, the signal transmitting end may perform encoding processing on an audio signal to be played, and the step of "obtaining the audio signal to be played of each of the loud speakers and the target sound source angle corresponding to each of the audio signals to be played in at least two loud speakers" may include:
receiving a data packet to be played sent by a signal sending end, wherein the data packet to be played is obtained by encoding based on audio signals to be played of all the loud speakers and target sound source angles corresponding to the audio signals to be played;
decoding the data packet to be played to obtain an audio signal to be played of each of at least two loud speakers and a target sound source angle corresponding to each audio signal to be played.
The signal sending end can directly conduct sound source positioning based on the audio signal to be played to obtain a target sound source angle, and then the audio signal to be played and the target sound source angle are encoded to obtain a data packet to be played.
Or, the signal transmitting end may encode only the audio signal to be played to obtain a data packet to be played, send the data packet to be played to the cloud, trigger the cloud to decode the data packet to be played, and determine the target sound source angle according to the obtained audio signal to be played.
Further, the cloud may also execute step 203 directly after obtaining the signal transmission angle.
203. And calculating an anti-crosstalk function corresponding to the signal transmission angle based on the signal transmission angle and the target sound source angle, wherein the anti-crosstalk function is used for eliminating cross sound generated when the at least two loud speakers play audio.
As shown in fig. 5, in an ideal state, a sound source E virtually placed at angle θ (i.e., the target sound source angle) is processed by the left-ear HRTF (Head Related Transfer Function) at angle θ, namely α₀, and played to the left ear; after being processed by the right-ear HRTF at angle θ, namely β₀, it is played to the right ear, and the user hears a sound source virtually located at angle θ. That is, in the ideal case the physical paths to the user's left and right ears are fully isolated: the sound intended for the left ear essentially cannot be heard by the right ear, and vice versa.
That is, the ideal virtual azimuth listening signals are:

L_ideal = α₀(θ,f)·E(f)
R_ideal = β₀(θ,f)·E(f)
However, in an open sound field scenario, sound intended only for the user's left ear can actually also be received by the right ear, and sound intended only for the right ear can also be received by the left ear; this phenomenon is crosstalk (cross-talk).
That is, the actual listening signal is the combined result of crosstalk and the positions of the loud speakers. Taking loud speakers consisting of a left speaker and a right speaker as an example, the unprocessed audio signals heard by the user can be expressed as:
L = α₁(θ₁,f)·α₀(θ,f)·E(f) + β₂(θ₂,f)·β₀(θ,f)·E(f)
R = α₂(θ₂,f)·β₀(θ,f)·E(f) + β₁(θ₁,f)·α₀(θ,f)·E(f)
therefore, in order to achieve a better effect of the virtual sound source, it is necessary to cancel crosstalk generated during spatial propagation of the signal. The step of calculating an anti-crosstalk function corresponding to the signal transmission angle based on the signal transmission angle and the target sound source angle may include:
determining a speaker head related transfer function corresponding to the signal transfer angle based on the signal transfer angle;
determining a sound source head related transfer function corresponding to the target sound source angle based on the target sound source angle;
and calculating an anti-crosstalk function corresponding to the signal transfer angle according to the speaker head related transfer function and the sound source head related transfer function.
In the practical application process, each loud speaker has a spatial position; if the signal processed according to the target sound source angle were played directly by the loud speaker, the sound actually heard by the user would additionally be multiplied by the HRTF corresponding to the signal transfer angle between that loud speaker and the user's head.
Therefore, before calculating the anti-crosstalk function, a speaker head related transfer function corresponding to the loud speaker needs to be determined based on the signal transfer angle.
Optionally, taking the example that the at least two loud speakers include a left speaker and a right speaker, the step of determining a signal transmission angle between each loud speaker and a user's head specifically includes:
determining a left signal transfer angle between the left speaker and a user's head;
determining a right side signal transfer angle between the right speaker and the user's head;
the step of determining a speaker head related transfer function corresponding to the signal transfer angle based on the signal transfer angle may include:
determining a first left-ear head related transfer function between the left speaker and a left ear of a user and a first right-ear head related transfer function between the left speaker and a right ear of the user based on the left signal transfer angle;
determining a second left-ear head related transfer function between the right speaker and a left ear of the user and a second right-ear head related transfer function between the right speaker and a right ear of the user based on the right-side signal transfer angle;
And taking the first left-ear head related transfer function, the first right-ear head related transfer function, the second left-ear head related transfer function and the second right-ear head related transfer function as speaker head related transfer functions.
The terms left speaker and right speaker do not strictly limit the positional relationship between the loud speakers and the user. In general, the left speaker refers to the loud speaker positioned on the left of the two loud speakers, and the right speaker refers to the loud speaker positioned on the right.
Specifically, when determining the first left-ear head related transfer function, the first right-ear head related transfer function, the second left-ear head related transfer function and the second right-ear head related transfer function, a pre-established head related transfer function library may be used: after the library is determined, the four head related transfer functions corresponding to the left signal transfer angle and the right signal transfer angle are obtained from the library.
The left signal transfer angle may be only one angle, that is, an angle between the left speaker and a certain position of the user's head, and may be obtained from the head related transfer function library only according to the angle when determining the first left ear head related transfer function and the first right ear head related transfer function.
Alternatively, to improve audio processing accuracy, the left side signal transfer angle may include two angles, namely, an angle between the left speaker and the user's left ear and an angle between the left speaker and the user's right ear. Correspondingly, when the first left-ear head related transfer function is determined, the first left-ear head related transfer function can be obtained from a head related transfer function library according to the angle between the left loudspeaker and the left ear of the user; in determining the first right-ear head related transfer function, it may be obtained from a library of head related transfer functions based on the angle between the left speaker and the user's right ear.
The right signal transmission angle is similar to the left signal transmission angle, and the embodiments of the present invention are not described herein again.
In some alternative embodiments, the speaker head related transfer function may be further simplified if the left and right speakers are left-right symmetric with respect to the user's head direction. That is, taking the example that the at least two loud speakers include a left speaker and a right speaker, the step of determining a signal transmission angle between each of the loud speakers and the head of the user may specifically include:
determining a positional relationship between the left speaker, the right speaker, and a user's head;
And if the left speaker and the right speaker are in bilateral symmetry relation relative to the user head, taking the angle between any loud speaker and the user head as a signal transmission angle.
Accordingly, the step of determining a speaker head related transfer function corresponding to the signal transfer angle based on the signal transfer angle includes:
determining a first head related transfer function between the left speaker and the user's left ear and between the right speaker and the user's right ear, and a second head related transfer function between the left speaker and the user's right ear and between the right speaker and the user's left ear based on the signal transfer angle;
the first head related transfer function and the second head related transfer function are taken as speaker head related transfer functions.
That is, if the HRTFs are left-right symmetric, the head related transfer function from the left speaker to the left ear equals that from the right speaker to the right ear, and likewise for the contralateral paths, i.e.:

α₁ = α₂
β₁ = β₂

wherein, as shown in fig. 5, α₁ and β₁ are the head related transfer functions from the left speaker at angle θ₁ to the user's left and right ears, and α₂ and β₂ are the head related transfer functions from the right speaker at angle θ₂ to the user's right and left ears.
Alternatively, the crosstalk resisting function capable of eliminating the cross-over crosstalk can be determined according to the generation rule of the cross-over crosstalk. The step of calculating an anti-crosstalk function corresponding to the signal transfer angle according to the speaker head related transfer function and the sound source head related transfer function may specifically include:
performing matrix combination processing according to the speaker head related transfer function to obtain a speaker crosstalk matrix corresponding to the audio signal to be processed;
performing matrix cancellation on the speaker crosstalk matrix, and calculating a crosstalk cancellation matrix of the speaker crosstalk matrix;
and calculating an anti-crosstalk function corresponding to the signal transfer angle based on the crosstalk cancellation matrix and the sound source head related transfer function.
Taking loud speakers consisting of a left speaker and a right speaker as an example, performing matrix combination processing on the unprocessed audio signals heard by the user according to the speaker head related transfer functions gives the matrix representation of the audio signals:

| L |   | α₁(θ₁,f)  β₂(θ₂,f) |   | α₀(θ,f)·E(f) |
| R | = | β₁(θ₁,f)  α₂(θ₂,f) | · | β₀(θ,f)·E(f) |

The speaker crosstalk matrix is:

C = | α₁  β₂ |
    | β₁  α₂ |
further, the crosstalk cancellation matrix may be designed for the speaker crosstalk matrix such that the processed audio signal may cancel the speaker crosstalk matrix during spatial propagation.
Specifically, the crosstalk cancellation matrix A may be represented by the following formula:

A = C⁻¹ = 1/(α₁α₂ − β₁β₂) · |  α₂  −β₂ |
                            | −β₁   α₁ |

At this time, the original audio signal to be played is pre-processed by the matrix A; the crosstalk introduced during spatial propagation then cancels the effect of A, and the user finally hears the intended sound.
The audio signals x and y sent to the left speaker and the right speaker respectively after processing by the crosstalk cancellation matrix A can be expressed by the following formula:

| x |       | α₀(θ,f)·E(f) |
| y | = A · | β₀(θ,f)·E(f) |

namely:

x = (α₂·α₀ − β₂·β₀)/(α₁·α₂ − β₁·β₂) · E(f)
y = (α₁·β₀ − β₁·α₀)/(α₁·α₂ − β₁·β₂) · E(f)
thus, the anti-crosstalk function corresponding to the signal transfer angle can be calculated based on the crosstalk cancellation matrix and the acoustic source related transfer function.
The anti-crosstalk functions can be expressed by the following formulas:

G_L(θ,f) = (α₂·α₀ − β₂·β₀)/(α₁·α₂ − β₁·β₂)
G_R(θ,f) = (α₁·β₀ − β₁·α₀)/(α₁·α₂ − β₁·β₂)

In the case of left and right speakers that are symmetric with respect to the user's head (α₁ = α₂ = α, β₁ = β₂ = β), the anti-crosstalk functions reduce to the following form:

G_L(θ,f) = (α·α₀ − β·β₀)/(α² − β²)
G_R(θ,f) = (α·β₀ − β·α₀)/(α² − β²)
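The closed form above can be checked numerically: feeding the two speaker signals back through the crosstalk matrix must reproduce the ideal binaural signals α₀·E and β₀·E at the ears. A minimal per-frequency-bin sketch (variable names are assumptions for illustration; the values work as scalars or NumPy arrays):

```python
def anti_crosstalk_gains(a1, b1, a2, b2, a0, b0):
    """Anti-crosstalk functions G_L, G_R for two loud speakers.
    a1/b1: left speaker to ipsilateral/contralateral ear; a2/b2: right
    speaker likewise; a0/b0: HRTFs of the virtual source at the target
    sound source angle.  All values are complex, per frequency bin."""
    det = a1 * a2 - b1 * b2   # determinant of the speaker crosstalk matrix
    g_left = (a2 * a0 - b2 * b0) / det
    g_right = (a1 * b0 - b1 * a0) / det
    return g_left, g_right

def render(source_spectrum, g_left, g_right):
    """Target playback spectra for the left and right speakers."""
    return g_left * source_spectrum, g_right * source_spectrum
```

Propagating the rendered feeds through the crosstalk matrix recovers exactly the ideal left-ear and right-ear signals, which is the defining property of the cancellation.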
it should be noted that, although the foregoing embodiment is described taking a dual speaker as an example, the audio processing method according to the embodiment of the present invention is also applicable to a multi-speaker scenario. The core problem in a multi-speaker scenario is similar to that of a dual speaker, i.e. the crosstalk between the loud speakers and the additional transfer function of each loud speaker, it will be appreciated that the more loud speakers, the more severe the crosstalk between the loud speakers is, and the more complex the problem is.
In the multi-speaker case, the crosstalk cancellation matrix A may likewise be calculated according to the principle that the audio signals are superimposed at the user's left and right ears; the audio signals to be played are multiplied by the crosstalk cancellation matrix A, from which the anti-crosstalk functions are then calculated. Specifically, in the multi-speaker scenario A is n×m, where n is the number of speakers and m is the number of sound sources that can be virtualized.
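Since a 2×n system of speaker-to-ear transfer functions is no longer square, a least-squares solution via the pseudo-inverse is one common way to obtain the cancellation matrix. A per-frequency-bin sketch (the exact construction used by the embodiment is not specified):

```python
import numpy as np

def multispeaker_feeds(C, desired_binaural):
    """n loud speakers, two ears: C is the 2 x n matrix of speaker-to-ear
    transfer functions at a single frequency bin.  The pseudo-inverse
    yields least-squares speaker feeds whose superposition at the two
    ears approximates the desired binaural pair."""
    A = np.linalg.pinv(C)        # n x 2 cancellation matrix for this bin
    return A @ desired_binaural
```

When C has full row rank, the superposition at the ears reproduces the desired binaural signal exactly.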
It will be appreciated that, in practical applications, an HRTF library obtained from actual measurements contains only discrete angles, so the HRTF corresponding to an exact angle may not be directly available. Therefore, estimation can be performed from the existing angles and their corresponding HRTFs. Before the step of calculating the anti-crosstalk function corresponding to the signal transfer angle according to the speaker head related transfer function and the sound source head related transfer function, the audio processing method provided by the embodiment of the invention may further include:
acquiring a preset discrete head related transfer function;
and performing function approximation processing on the discrete head related transfer function to obtain a target head related transfer function.
Specifically, the function approximation processing may include an interpolation method or a curve fitting method, and a technician may select an appropriate function approximation processing mode according to actual application requirements.
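As a hedged sketch of the interpolation option, a discretely measured HRTF table can be linearly interpolated to an unmeasured angle (real systems interpolate magnitude/phase or time-domain HRIRs more carefully, so this only illustrates the approximation step):

```python
import numpy as np

def interpolate_hrtf(angles_deg, hrtf_table, query_deg):
    """Linearly interpolate a discretely measured HRTF set to an
    unmeasured angle.  hrtf_table has shape (n_angles, n_bins); each
    frequency bin is interpolated independently over angle."""
    angles = np.asarray(angles_deg, dtype=float)
    table = np.asarray(hrtf_table, dtype=float)
    return np.array([np.interp(query_deg, angles, table[:, k])
                     for k in range(table.shape[1])])
```

Querying halfway between two measured angles returns the bin-wise midpoint of the two measured responses.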
Correspondingly, the step of determining the speaker head related transfer function corresponding to the signal transfer angle based on the signal transfer angle may specifically include:
determining a speaker head related transfer function corresponding to the signal transfer angle based on the signal transfer angle and the target head related transfer function;
accordingly, the step of determining, based on the target sound source angle, a sound source head related transfer function corresponding to the target sound source angle may specifically include:
and determining a sound source head related transfer function corresponding to the target sound source angle based on the target sound source angle and the target head related transfer function.
Alternatively, the data in the preset head related transfer function library may be obtained from an existing database, for example, the data in the head related transfer function library in this embodiment may be obtained from a CIPIC database. Furthermore, the embodiment of the invention can perform interpolation processing according to the CIPIC database to obtain the target head related transfer function comprising the HRTF corresponding to more accurate angles.
It will be appreciated that the same library of head related transfer functions may not be suitable for individual users due to differences in head shape, etc. Thus, in the present embodiment, a head-related transfer function library of different users may be established in advance. In application, the target user to receive the target playing audio signal is determined firstly, and the target user can be determined specifically by information of an account logged in a terminal or an application program, face characteristic information of the target user and the like. After the head related transfer function library of the target user is determined, a speaker head related transfer function and an acoustic source related transfer function are obtained from the head related transfer function library of the target user.
204. And carrying out signal transformation on the audio signals to be processed of the loud speakers based on the anti-crosstalk function to obtain target playing audio signals corresponding to the audio signals to be played.
Specifically, the target playback audio signals x, y respectively sent to the left speaker and the right speaker can be expressed by the following formulas:
x = G_L(θ₀, f)·E(f)
y = G_R(θ₀, f)·E(f)
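The two formulas above amount to a per-frequency multiplication of the source spectrum E(f) by the anti-crosstalk filters G_L and G_R. A minimal frequency-domain sketch (the function name is an assumption; a real renderer would process the signal block-by-block with overlap-add):

```python
import numpy as np

def render_binaural(e, g_left, g_right):
    """Apply per-frequency anti-crosstalk filters to a mono source:
        x = G_L(theta0, f) * E(f)
        y = G_R(theta0, f) * E(f)

    e               : mono time-domain source signal
    g_left, g_right : complex frequency responses sampled on the
                      rfft grid of len(e)
    """
    spectrum = np.fft.rfft(e)                       # E(f)
    x = np.fft.irfft(g_left * spectrum, n=len(e))   # left speaker feed
    y = np.fft.irfft(g_right * spectrum, n=len(e))  # right speaker feed
    return x, y
```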
In an alternative embodiment, the audio processing method provided by the embodiment of the invention can be applied to realize the effect of virtually generating 5.1 surround sound using dual loud speakers. As shown in fig. 6, the bass channel is not considered, and it is assumed that the dual loud speakers are at the same angles as the front left and front right speakers in the 5.1-channel scene, i.e., the 30° and 330° positions. The technical implementation of virtual 5.1 surround sound with dual loud speakers can be expressed by the following formula:
L′ = L + 0.707·C + G_L(θ_LS, f)·LS + G_L(θ_RS, f)·RS
R′ = R + 0.707·C + G_R(θ_LS, f)·LS + G_R(θ_RS, f)·RS
wherein,
α is the HRTF from the front left speaker to the same-side ear, and β is the HRTF from the front left speaker to the opposite-side ear; ideally the front left speaker and the front right speaker are symmetric, so α is likewise the HRTF from the front right speaker to its same-side ear, and β the HRTF from the front right speaker to the opposite-side ear.
H_L and H_R denote the HRTFs from a speaker at a given angle to the left ear and the right ear, respectively; here the angles are those of the left rear and right rear speakers, and f denotes frequency.
In the above formula, the two angle parameters needed by H_L and H_R are known in the simulated 5.1 scene, namely the left rear and right rear speaker positions of 120° and 240°; the parameters to be determined are 8 transfer functions, i.e., the functions from the four azimuths to the two ears.
That is, the target playing audio signal to be played by the left speaker may include: the audio signal to be played of the simulated front left channel, the audio signal to be played of the simulated center channel, an audio signal which, after processing by the anti-crosstalk function, can cancel the crosstalk effect of the left speaker on the right ear, and an audio signal which, after processing by the anti-crosstalk function, can cancel the crosstalk effect of the right speaker on the left ear.
The content of the target playback audio signal to be played by the right speaker is similar to that described above, and will not be described again here.
In addition, assuming that a typical human head is symmetric, the head related transfer functions are also left-right symmetric; for example, the head related transfer functions from the 330° and 30° azimuths to the left and right ears are mirror images of each other, so the head related transfer functions can be reduced to 4, which is equivalent to:
H_R(θ_RS, f) = H_L(θ_LS, f) = α₂
H_R(θ_LS, f) = H_L(θ_RS, f) = β₂
At this time, the simplified arrangement shown in fig. 6 is obtained.
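Under the symmetry assumptions above, the virtual 5.1 downmix of fig. 6 can be sketched as follows (a frequency-domain illustration; names are assumptions, and the g_* arguments stand for the rear-channel filters G_L(θ_LS, f), G_L(θ_RS, f), G_R(θ_LS, f), G_R(θ_RS, f) after anti-crosstalk processing):

```python
import numpy as np

def virtual_51_downmix(L, R, C, LS, RS, g_l_ls, g_l_rs, g_r_ls, g_r_rs):
    """Fold a 5.1 mix (bass channel ignored) into two speaker feeds:
        L' = L + 0.707*C + G_L(theta_LS, f)*LS + G_L(theta_RS, f)*RS
        R' = R + 0.707*C + G_R(theta_LS, f)*LS + G_R(theta_RS, f)*RS

    Channel arguments are time-domain signals of equal length; the
    g_* arguments are complex responses on the matching rfft grid.
    """
    n = len(L)
    ls_f, rs_f = np.fft.rfft(LS), np.fft.rfft(RS)
    # Front channels pass through directly; only the virtualized rear
    # channels go through the anti-crosstalk filters.
    l_out = L + 0.707 * C + np.fft.irfft(g_l_ls * ls_f + g_l_rs * rs_f, n=n)
    r_out = R + 0.707 * C + np.fft.irfft(g_r_ls * ls_f + g_r_rs * rs_f, n=n)
    return l_out, r_out
```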
In another alternative embodiment, the audio processing method provided by the embodiment of the invention can be applied to widen the virtual sound field using dual loud speakers. Widening the sound field of the dual loud speakers is equivalent to the 5-channel case in which the left rear and right rear channels are virtualized from the front left and front right channels, while no separate front left, front right or center channel content is present.
In practical application, HRTFs are generally collected by picking up sound inside the ear canal; that is, an HRTF measured in this way contains both the air-path transfer outside the ear canal and the influence of the resonance of the listener's ear canal. If such an HRTF is applied to the signal directly, the user will hear a noticeable sense of space, but the mid frequencies will be distorted, because the signal the user actually hears has then passed through the ear canal resonance effect twice.
Therefore, an anti-crosstalk resonance function can be designed to eliminate the effects of cross-talk and ear canal resonance. The step of performing signal transformation on the audio signal to be processed of each loud speaker based on the anti-crosstalk function to obtain a target playing audio signal corresponding to the audio signal to be played may specifically include:
calculating an anti-crosstalk resonance function based on the anti-crosstalk function, wherein the anti-crosstalk resonance function is used for eliminating cross sound generated when the at least two external speakers externally play audio and the influence of resonance of the external auditory meatus of the human ear;
and carrying out signal transformation on the audio signals to be processed of the loud speakers based on the anti-crosstalk resonance function to obtain target playing audio signals corresponding to the audio signals to be played.
Specifically, for any given frequency, the energy ratio and the phase relation between the left and right channels remain unchanged, so no error in direction perception is introduced. The anti-crosstalk resonance function can be expressed by the following formula:
G′_L(θ₀, f) = G_L(θ₀, f) / √((|G_L(θ₀, f)|² + |G_R(θ₀, f)|²)/2)
G′_R(θ₀, f) = G_R(θ₀, f) / √((|G_L(θ₀, f)|² + |G_R(θ₀, f)|²)/2)
The denominator of the anti-crosstalk resonance function is the frequency-bin energy spectrum averaged over the left and right channels. It does not affect the phase at any bin, and since the left and right channels are divided by the same value at each bin, the energy relation between the two channels at that bin is also unaffected; the total energy of the emitted left and right signals is unchanged after passing through the anti-crosstalk resonance function. Processing by the anti-crosstalk resonance function therefore keeps the signal stable and eliminates the poles at each frequency bin.
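The normalization described here can be sketched directly (a minimal illustration; the function name and the eps guard against division by zero are assumptions):

```python
import numpy as np

def resonance_normalize(gl, gr, eps=1e-12):
    """Divide both channel filters by the bin-wise average magnitude
    sqrt((|G_L|^2 + |G_R|^2) / 2).

    Because the divisor is real and identical for both channels at
    each frequency bin, the inter-channel phase and the per-bin L/R
    energy ratio are preserved, while the shared magnitude
    colouration (e.g. ear-canal resonance) is removed.
    """
    denom = np.sqrt((np.abs(gl) ** 2 + np.abs(gr) ** 2) / 2.0) + eps
    return gl / denom, gr / denom
```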
Optionally, when performing audio processing, the audio signal to be played may be processed accordingly according to the volume setting of the user. Before the step of performing signal transformation on the audio signal to be processed based on the anti-crosstalk function, the audio processing method provided by the embodiment of the invention further includes:
acquiring the playing setting parameters of the loud speakers and the audio playing preference parameters corresponding to the target users;
extracting audio parameters of the audio signal to be processed to obtain parameters to be played corresponding to the audio signal to be processed;
And calculating an audio adjustment function corresponding to the audio signal to be processed according to the play setting parameter, the audio play preference parameter and the parameter to be played.
In the embodiment of the invention, the target user is a user logged in the audio playing device or a user currently using the audio playing device.
The audio playing preference parameters are generated according to the playing effect adjustment operation performed in advance by the target user before the process of determining the audio adjustment parameters. In particular, the audio playback preference parameters may include, but are not limited to, audio frequency (tone) adjustment parameters, audio loudness adjustment parameters, and the like.
The audio parameter extraction may be that the audio signal to be processed is subjected to audio analysis, so as to obtain specific parameters such as frequency, sampling bit number, channel number, loudness, bit rate and the like corresponding to the audio signal to be processed; alternatively, feature extraction may be performed on the audio signal to be processed to obtain a feature vector corresponding to the audio signal to be processed, where the feature vector may represent a parameter feature of the audio signal to be processed, and so on.
Correspondingly, the step of performing signal transformation on the audio signal to be processed based on the anti-crosstalk function to obtain a target playing audio signal corresponding to the audio signal to be processed includes:
And performing signal transformation on the audio signal to be processed based on the audio adjustment function, the target sound source angle, the signal transfer angle and the anti-crosstalk function to obtain a target playing audio signal corresponding to the audio signal to be processed.
The process of calculating the audio adjustment function corresponding to the audio signal to be processed may be performing operations such as parameter amplification or parameter reduction on the audio signal to be played, so that the target audio signal to be played after being processed by the audio adjustment function may satisfy the audio playing preference parameter when being played based on the playing setting parameter. Alternatively, the operations such as convolution or multiplication may be performed on the parameters to be played in the form of a matrix or vector, and so on.
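The audio adjustment function is described only abstractly above; as one hypothetical concrete form (the per-band loudness representation, the function name and the scaling rule are all assumptions, not details of the embodiment), a gain curve could be derived from the extracted parameters to be played, the user's preference parameters and the volume setting:

```python
import numpy as np

def adjustment_gain(band_loudness, preferred_band_loudness, master_volume):
    """Per-band gain that moves the played signal toward the user's
    preferred loudness profile, scaled by the device volume setting.

    band_loudness           : loudness extracted from the signal to
                              be processed ('parameters to be played')
    preferred_band_loudness : target profile from the user's audio
                              play preference parameters
    master_volume           : play setting parameter (0..1)
    """
    band_loudness = np.maximum(np.asarray(band_loudness, float), 1e-9)
    return master_volume * np.asarray(preferred_band_loudness, float) / band_loudness
```

Applying this gain bin-wise before the anti-crosstalk transform amounts to the "parameter amplification or reduction" mentioned above.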
In some optional embodiments, the audio processing method provided by the embodiment of the present invention may further include:
and respectively sending the target playing audio signals corresponding to the audio signals to be played to the loud speakers corresponding to the audio signals to be played, and triggering the loud speakers to play the corresponding target playing audio signals.
That is, after the cloud or the signal receiving end processes the target playing audio signal, the target playing audio signal can be sent to the corresponding loud speaker for playing.
As can be seen from the foregoing, the embodiment of the present invention may obtain, for each of at least two loud speakers, the audio signal to be played and the target sound source angle corresponding to each audio signal to be played; determine the signal transfer angle between each loud speaker and the user's head; calculate, based on the signal transfer angle and the target sound source angle, the anti-crosstalk function corresponding to the signal transfer angle, the anti-crosstalk function being used to eliminate the cross sound generated when the at least two loud speakers play audio; and perform signal transformation on the audio signal to be processed of each loud speaker based on the target sound source angle, the signal transfer angle and the anti-crosstalk function, so as to obtain the target playing audio signal corresponding to the audio signal to be played. In the embodiment of the invention, an anti-crosstalk function capable of eliminating the cross sound generated in an open sound field is calculated, and the audio signal to be played is transformed based on the target sound source angle and the anti-crosstalk function, so that the target audio signal can represent the sound source position information while offsetting the cross sound generated during playing. Therefore, an audio signal capable of representing the sound source direction can be generated without adding audio playing equipment, and the cross sound generated when audio is played through the loud speakers is eliminated, so that the user can enjoy audio with a better playing effect.
In order to better implement the above method, correspondingly, the embodiment of the invention also provides an audio processing device.
Referring to fig. 7, the apparatus includes:
a signal obtaining unit 701, configured to obtain an audio signal to be played by each of the loud speakers in at least two loud speakers and a target sound source angle corresponding to each of the audio signals to be played;
an angle determining unit 702, configured to determine a signal transmission angle between each of the loud speakers and a user's head;
a function calculating unit 703, configured to calculate, based on the signal transfer angle and the target sound source angle, an anti-crosstalk function corresponding to the signal transfer angle, where the anti-crosstalk function is used to eliminate cross-talk generated when the at least two loud speakers play audio;
and a signal conversion unit 704, configured to perform signal conversion on the audio signal to be processed of each of the loud speakers based on the anti-crosstalk function, so as to obtain a target playing audio signal corresponding to the audio signal to be played.
Optionally, the function calculating unit 703 is configured to determine a speaker head related transfer function corresponding to the signal transfer angle based on the signal transfer angle;
Determining a sound source head related transfer function corresponding to the target sound source angle based on the target sound source angle;
and calculating an anti-crosstalk function corresponding to the signal transfer angle according to the speaker head related transfer function and the sound source head related transfer function.
Optionally, the at least two loud speakers include a left speaker and a right speaker, and the angle determining unit 702 is configured to determine a left signal transmission angle between the left speaker and the head of the user;
determining a right side signal transfer angle between the right speaker and the user's head;
the function calculation unit is used for determining a first left-ear head related transfer function between the left loudspeaker and the left ear of the user and a first right-ear head related transfer function between the left loudspeaker and the right ear of the user based on the left-side signal transfer angle;
determining a second left-ear head related transfer function between the right speaker and a left ear of the user and a second right-ear head related transfer function between the right speaker and a right ear of the user based on the right-side signal transfer angle;
and taking the first left-ear head related transfer function, the first right-ear head related transfer function, the second left-ear head related transfer function and the second right-ear head related transfer function as speaker head related transfer functions.
Optionally, the at least two loud speakers include a left speaker and a right speaker, and the angle determining unit 702 is configured to determine a positional relationship among the left speaker, the right speaker and the user's head;
if the left speaker and the right speaker are bilaterally symmetric with respect to the user's head, taking the angle between either loud speaker and the user's head as the signal transmission angle;
the function calculation unit 703 is configured to determine a first head related transfer function between the left speaker and the left ear of the user and between the right speaker and the right ear of the user, and a second head related transfer function between the left speaker and the right ear of the user and between the right speaker and the left ear of the user, based on the signal transfer angle;
the first head related transfer function and the second head related transfer function are taken as speaker head related transfer functions.
Optionally, the function calculating unit 703 is configured to perform matrix combination processing according to the speaker head related transfer function to obtain a speaker crosstalk matrix corresponding to the audio signal to be processed;
performing matrix cancellation on the speaker crosstalk matrix, and calculating a crosstalk cancellation matrix of the speaker crosstalk matrix;
And calculating an anti-crosstalk function corresponding to the signal transfer angle based on the crosstalk cancellation matrix and the sound source head related transfer function.
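A minimal sketch of the matrix cancellation step, for the symmetric two-speaker case described earlier (the stacked array layout and the optional regularization term are assumptions):

```python
import numpy as np

def crosstalk_cancellation_matrix(hrtf_same, hrtf_cross, reg=0.0):
    """Per-frequency crosstalk cancellation matrix for two symmetric
    speakers: same-side response alpha (speaker to near ear) and
    cross-side response beta (speaker to far ear).

    The speaker crosstalk matrix at each frequency bin is
        H(f) = [[alpha, beta],
                [beta,  alpha]]
    and the canceller is its inverse, optionally Tikhonov-regularized
    to stay stable near ill-conditioned bins.
    """
    alpha = np.asarray(hrtf_same, dtype=complex)
    beta = np.asarray(hrtf_cross, dtype=complex)
    n = alpha.shape[0]
    H = np.empty((n, 2, 2), dtype=complex)
    H[:, 0, 0] = H[:, 1, 1] = alpha
    H[:, 0, 1] = H[:, 1, 0] = beta
    if reg:
        H += reg * np.eye(2)  # broadcast over the frequency axis
    return np.linalg.inv(H)   # batched 2x2 inversion per bin
```

Multiplying the canceller by the sound source head related transfer functions then yields the anti-crosstalk function G_L, G_R used in the formulas above.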
Optionally, as shown in fig. 8, the audio processing apparatus provided in the embodiment of the present invention further includes a function processing unit 705, configured to obtain a preset discrete head related transfer function;
performing function approximation processing on the discrete head related transfer function to obtain a target head related transfer function;
the function calculation unit is used for determining a speaker head related transfer function corresponding to the signal transfer angle based on the signal transfer angle and the target head related transfer function;
and determining a sound source head related transfer function corresponding to the target sound source angle based on the target sound source angle and the target head related transfer function.
Optionally, the signal obtaining unit 701 is configured to obtain an audio signal to be played of each of the at least two loud speakers;
and performing sound source position positioning on the audio signals to be played, and determining target sound source angles corresponding to the audio signals to be played.
Optionally, the signal obtaining unit 701 is configured to obtain an audio signal to be played of each of the at least two loud speakers and a video frame to be played corresponding to the audio signal to be played;
Determining sounding position information of a sounding object from the video frame to be played;
and calculating the target sound source angle corresponding to each audio signal to be played based on the sounding position information of the sounding object.
Optionally, the video frame to be played includes at least one candidate sounding object, and the audio processing apparatus provided in the embodiment of the present invention further includes a sounding object determining unit 706, configured to determine the sounding object corresponding to the audio signal to be played and obtain object identification information of the sounding object;
the signal obtaining unit 701 is configured to perform information matching, based on the object identification information, among the candidate sounding objects included in the video frame to be played, and determine the sounding object;
and acquire a target display area of the sounding object in the video frame to be played, and take the position information of the target display area as the sounding position information of the sounding object.
Optionally, the video frame to be played includes the display area of at least one candidate sounding object, and the signal obtaining unit 701 is configured to perform sounding action detection for each display area in the video frame to be played, and if it is detected that a candidate sounding object in a display area performs a sounding action, take that candidate sounding object as the sounding object;
and acquire a target display area of the sounding object in the video frame to be played, and take the position information of the target display area as the sounding position information of the sounding object.
Optionally, the audio processing apparatus provided by the embodiment of the present invention further includes a first position determining unit 707, configured to determine, in response to a position selection operation of a user, sound receiving position information corresponding to the user in the video frame to be played;
the signal acquisition unit 701 is configured to determine a signal transmission direction between a sound emission position and a sound reception position based on sound emission position information of the sound emission object and the sound reception position information;
and calculating the target sound source angle corresponding to each audio signal to be played according to the signal transmission direction.
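As a sketch of deriving the target sound source angle from the sounding position and the sound receiving position (the 2-D coordinates and the clockwise-from-front angle convention are assumptions for illustration):

```python
import math

def source_angle_deg(sound_pos, listen_pos):
    """Azimuth of the sounding position as seen from the listening
    position, measured clockwise from straight ahead (+y), in
    [0, 360).  Positions are (x, y) pairs in screen/room coordinates.
    """
    dx = sound_pos[0] - listen_pos[0]
    dy = sound_pos[1] - listen_pos[1]
    # atan2(dx, dy): 0 deg straight ahead, 90 deg to the right.
    return math.degrees(math.atan2(dx, dy)) % 360.0
```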
Optionally, the audio processing device provided by the embodiment of the present invention further includes a second location determining unit, configured to determine, in response to a reference object selection operation by a user, a reference object corresponding to the user in the video frame to be played;
acquiring reference position information of the reference object in the video frame to be played;
the signal acquisition unit 701 is configured to determine, based on the sounding position information of the sounding object and the reference position information, a signal transmission angle between the sounding object and the reference object, as a target sound source angle corresponding to each audio signal to be played.
Optionally, the signal obtaining unit 701 is configured to receive a data packet to be played sent by a signal sending end, where the data packet to be played is obtained by encoding based on audio signals to be played of each loud speaker and target sound source angles corresponding to the audio signals to be played;
decoding the data packet to be played to obtain an audio signal to be played of each of at least two loud speakers and a target sound source angle corresponding to each audio signal to be played.
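The encoding of the data packet to be played is not specified by the embodiment; one hypothetical container layout, purely for illustration, could pack each speaker's signal together with its target sound source angle:

```python
import io
import struct
import numpy as np

def encode_packet(signals, angles):
    """Pack per-speaker audio (float32 PCM) with each signal's target
    sound source angle into one byte stream.  Layout: entry count,
    then per entry an angle, a byte length, and the samples."""
    buf = io.BytesIO()
    buf.write(struct.pack("<I", len(signals)))
    for sig, ang in zip(signals, angles):
        pcm = np.asarray(sig, dtype="<f4").tobytes()
        buf.write(struct.pack("<fI", float(ang), len(pcm)))
        buf.write(pcm)
    return buf.getvalue()

def decode_packet(data):
    """Inverse of encode_packet: recover the audio signals to be
    played and their target sound source angles."""
    view = io.BytesIO(data)
    (count,) = struct.unpack("<I", view.read(4))
    signals, angles = [], []
    for _ in range(count):
        ang, nbytes = struct.unpack("<fI", view.read(8))
        signals.append(np.frombuffer(view.read(nbytes), dtype="<f4"))
        angles.append(ang)
    return signals, angles
```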
Optionally, the signal obtaining unit 701 is configured to receive a data packet to be played sent by a cloud end, where the data packet to be played is sent to the cloud end by a signal sending end, and the data packet to be played is obtained by encoding by the signal sending end based on audio signals to be played of each external speaker and target sound source angles corresponding to each audio signal to be played;
decoding the data packet to be played to obtain an audio signal to be played of each of at least two loud speakers and a target sound source angle corresponding to each audio signal to be played.
Optionally, the signal transforming unit 704 is configured to calculate an anti-crosstalk resonance function based on the anti-crosstalk function, where the anti-crosstalk resonance function is configured to eliminate the cross sound generated when the at least two loud speakers play audio and the influence of the resonance of the external auditory canal of the human ear;
And carrying out signal transformation on the audio signals to be processed of the loud speakers based on the anti-crosstalk resonance function to obtain target playing audio signals corresponding to the audio signals to be played.
Optionally, the audio processing device provided by the embodiment of the present invention further includes an adjustment parameter calculation unit, configured to obtain a play setting parameter of each of the loud speakers and an audio play preference parameter corresponding to a target user;
extracting audio parameters of the audio signal to be processed to obtain parameters to be played corresponding to the audio signal to be processed;
according to the play setting parameters, the audio play preference parameters and the parameters to be played, calculating an audio adjustment function corresponding to the audio signal to be processed;
the signal conversion unit 704 is configured to perform signal conversion on the audio signal to be processed based on the audio adjustment function, the target sound source angle, the signal transfer angle, and the anti-crosstalk function, so as to obtain a target playing audio signal corresponding to the audio signal to be processed.
Optionally, the audio processing device provided by the embodiment of the present invention further includes an audio playing unit, configured to send target playing audio signals corresponding to the audio signals to be played to the loud speakers corresponding to the audio signals to be played, respectively, to trigger the loud speakers to play the corresponding target playing audio signals.
As can be seen from the above, by the audio processing apparatus, it is possible to generate an audio signal capable of expressing the azimuth of a sound source without adding an audio playing device, and to eliminate cross sound generated when playing audio using a loud speaker, so that a user can enjoy audio with a better playing effect.
In addition, the embodiment of the present invention further provides an electronic device, which may be a terminal or a server, as shown in fig. 9, and shows a schematic structural diagram of the electronic device according to the embodiment of the present invention, specifically:
the electronic device may include Radio Frequency (RF) circuitry 901, memory 902 including one or more computer-readable storage media, input unit 903, display unit 904, sensor 905, audio circuitry 906, wireless fidelity (WiFi, wireless Fidelity) module 907, processor 908 including one or more processing cores, and power supply 909. It will be appreciated by those skilled in the art that the electronic device structure shown in fig. 9 is not limiting of the electronic device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components. Wherein:
The RF circuit 901 may be used for receiving and transmitting signals in the course of sending and receiving information or during a call; in particular, after downlink information of a base station is received, it is delivered to one or more processors 908 for processing; in addition, uplink data is sent to the base station. Typically, the RF circuit 901 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 901 may also communicate with networks and other devices via wireless communication. The wireless communication may use any communication standard or protocol, including, but not limited to, Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), email, Short Messaging Service (SMS), and the like.
The memory 902 may be used to store software programs and modules that the processor 908 performs various functional applications and data processing by executing the software programs and modules stored in the memory 902. The memory 902 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data created according to the use of the electronic device (such as audio data, phonebooks, etc.), and the like. In addition, the memory 902 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 902 may also include a memory controller to provide access to the memory 902 by the processor 908 and the input unit 903.
The input unit 903 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. In particular, in one particular embodiment, the input unit 903 may include a touch sensitive surface as well as other input devices. The touch-sensitive surface, also referred to as a touch display screen or a touch pad, may collect touch operations thereon or thereabout by a user (e.g., operations thereon or thereabout by a user using any suitable object or accessory such as a finger, stylus, etc.), and actuate the corresponding connection means according to a predetermined program. Alternatively, the touch-sensitive surface may comprise two parts, a touch detection device and a touch controller. The touch detection device detects the touch azimuth of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch detection device and converts it into touch point coordinates, which are then sent to the processor 908 and can receive commands from the processor 908 and execute them. In addition, touch sensitive surfaces may be implemented in a variety of types, such as resistive, capacitive, infrared, and surface acoustic waves. The input unit 903 may comprise other input devices besides a touch sensitive surface. In particular, other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), a trackball, mouse, joystick, etc.
The display unit 904 may be used to display information entered by a user or provided to a user as well as various graphical user interfaces of the electronic device, which may be composed of graphics, text, icons, video, and any combination thereof. The display unit 904 may include a display panel, which may alternatively be configured in the form of a liquid crystal display (LCD, liquid Crystal Display), an Organic Light-Emitting Diode (OLED), or the like. Further, the touch-sensitive surface may overlay a display panel, upon detection of a touch operation thereon or thereabout by the touch-sensitive surface, being communicated to the processor 908 to determine the type of touch event, and the processor 908 then provides a corresponding visual output on the display panel based on the type of touch event. Although in fig. 9 the touch sensitive surface and the display panel are implemented as two separate components for input and output functions, in some embodiments the touch sensitive surface may be integrated with the display panel to implement the input and output functions.
The electronic device may also include at least one sensor 905, such as a light sensor, a motion sensor, and other sensors. In particular, the light sensor may include an ambient light sensor that may adjust the brightness of the display panel according to the brightness of ambient light, and a proximity sensor that may turn off the display panel and/or backlight when the electronic device is moved to the ear. As one of the motion sensors, the gravity acceleration sensor can detect the acceleration in all directions (generally three axes), and can detect the gravity and the direction when the mobile phone is stationary, and can be used for applications of recognizing the gesture of the mobile phone (such as horizontal and vertical screen switching, related games, magnetometer gesture calibration), vibration recognition related functions (such as pedometer and knocking), and the like; other sensors such as gyroscopes, barometers, hygrometers, thermometers, infrared sensors, etc. that may also be configured with the electronic device are not described in detail herein.
The audio circuit 906, a speaker, and a microphone may provide an audio interface between the user and the electronic device. The audio circuit 906 may transmit the electrical signal converted from received audio data to the speaker, which converts it into a sound signal for output; on the other hand, the microphone converts collected sound signals into electrical signals, which are received by the audio circuit 906 and converted into audio data; after the audio data is processed by the processor 908, it may be sent via the RF circuit 901 to, for example, another electronic device, or output to the memory 902 for further processing. The audio circuit 906 may also include an earphone jack to provide communication between peripheral headphones and the electronic device.
WiFi is a short-range wireless transmission technology, and the electronic device can help the user send and receive e-mails, browse web pages, access streaming media and the like through the WiFi module 907, which provides the user with wireless broadband Internet access. Although fig. 9 shows the WiFi module 907, it will be understood that it is not an essential component of the electronic device and may be omitted as required without changing the essence of the invention.
The processor 908 is the control center of the electronic device. It connects the various parts of the entire device using various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the software programs and/or modules stored in the memory 902 and invoking the data stored in the memory 902. Optionally, the processor 908 may include one or more processing cores. Preferably, the processor 908 may integrate an application processor, which mainly handles the operating system, user interfaces, and applications, and a modem processor, which mainly handles wireless communication. It will be appreciated that the modem processor need not be integrated into the processor 908.
The electronic device also includes a power supply 909 (e.g., a battery) that supplies power to the various components. Preferably, the power supply is logically connected to the processor 908 through a power management system, so that charging, discharging, and power-consumption management are implemented by the power management system. The power supply 909 may also include one or more of a direct-current or alternating-current power source, a recharging system, a power-failure detection circuit, a power converter or inverter, a power status indicator, and the like.
Although not shown, the electronic device may further include a camera, a Bluetooth module, and the like, which are not described here. Specifically, in this embodiment, the processor 908 in the electronic device loads the executable files corresponding to the processes of one or more application programs into the memory 902 according to the following instructions, and the processor 908 runs the application programs stored in the memory 902, thereby implementing the following functions:
acquiring an audio signal to be played for each of at least two loudspeakers and a target sound source angle corresponding to each audio signal to be played;
determining a signal transfer angle between each loudspeaker and the user's head;
calculating an anti-crosstalk function corresponding to the signal transfer angle based on the signal transfer angle and the target sound source angle, where the anti-crosstalk function is used to eliminate the crosstalk generated when the at least two loudspeakers play audio outward; and
performing signal transformation on the audio signal to be played of each loudspeaker based on the anti-crosstalk function, to obtain a target playing audio signal corresponding to each audio signal to be played.
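As a concrete illustration of the four steps above, the following sketch applies crosstalk cancellation at a single frequency bin for the two-loudspeaker case. All names and values here are illustrative assumptions for exposition, not the patented embodiment's implementation: real systems operate on full head-related transfer functions across all frequency bins, and the pure-Python 2×2 inversion stands in for whatever matrix machinery the embodiment actually uses.

```python
def invert_2x2(h_ll, h_rl, h_lr, h_rr):
    # Inverse of the speaker crosstalk matrix H = [[h_ll, h_rl], [h_lr, h_rr]],
    # where ear signals = H @ speaker feeds. h_rl is the complex gain from the
    # right speaker to the left ear at this frequency bin, and so on.
    det = h_ll * h_rr - h_rl * h_lr
    if abs(det) < 1e-12:
        raise ValueError("crosstalk matrix is (near-)singular at this bin")
    return h_rr / det, -h_rl / det, -h_lr / det, h_ll / det


def speaker_feeds(s, speaker_hrtf, source_hrtf):
    """One frequency bin of the four steps above (illustrative names only).

    s            -- complex spectrum sample of the audio signal to be played
    speaker_hrtf -- (h_ll, h_rl, h_lr, h_rr): speaker-to-ear gains at the
                    measured signal transfer angle (step 2)
    source_hrtf  -- (a_l, a_r): gains from a virtual source at the target
                    sound source angle to the left and right ear (step 1)
    """
    # Step 3: the anti-crosstalk function is, in effect, H^-1 combined with
    # the source HRTF pair.
    c_ll, c_rl, c_lr, c_rr = invert_2x2(*speaker_hrtf)
    a_l, a_r = source_hrtf
    e_l, e_r = a_l * s, a_r * s  # desired ear signals
    # Step 4: transform the signal so that, after the real acoustic paths H,
    # each ear receives only its intended signal.
    return c_ll * e_l + c_rl * e_r, c_lr * e_l + c_rr * e_r
```

Multiplying the desired binaural spectrum by the inverse of the crosstalk matrix per bin is the classical crosstalk-cancellation idea; in practice some regularization is typically needed near bins where the matrix is ill-conditioned.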
Those of ordinary skill in the art will appreciate that all or part of the steps of the methods in the above embodiments may be completed by instructions, or by hardware controlled by instructions, where the instructions may be stored in a computer-readable storage medium and loaded and executed by a processor.
To this end, embodiments of the present invention provide a computer-readable storage medium storing a plurality of instructions that can be loaded by a processor to perform the steps of any audio processing method provided by the embodiments of the present invention. For example, the instructions may perform the following steps:
acquiring an audio signal to be played for each of at least two loudspeakers and a target sound source angle corresponding to each audio signal to be played;
determining a signal transfer angle between each loudspeaker and the user's head;
calculating an anti-crosstalk function corresponding to the signal transfer angle based on the signal transfer angle and the target sound source angle, where the anti-crosstalk function is used to eliminate the crosstalk generated when the at least two loudspeakers play audio outward; and
performing signal transformation on the audio signal to be played of each loudspeaker based on the anti-crosstalk function, to obtain a target playing audio signal corresponding to each audio signal to be played.
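In matrix form, the anti-crosstalk function referenced in these steps can be written compactly for the two-loudspeaker case. The symbols below are illustrative and not taken from the embodiments: $H(\theta)$ collects the four speaker-to-ear head-related transfer functions at the signal transfer angle $\theta$, $A(\varphi)$ the two source-to-ear transfer functions at the target sound source angle $\varphi$, $s$ the audio signal to be played, $\mathbf{x}$ the loudspeaker feeds, and $\mathbf{e}$ the signals reaching the ears.

```latex
% Illustrative two-loudspeaker formulation; symbols are assumed, not the patent's.
\[
H(\theta) =
\begin{pmatrix} H_{LL}(\theta) & H_{RL}(\theta) \\ H_{LR}(\theta) & H_{RR}(\theta) \end{pmatrix},
\qquad
A(\varphi) = \begin{pmatrix} A_{L}(\varphi) \\ A_{R}(\varphi) \end{pmatrix},
\]
\[
K(\theta,\varphi) = H(\theta)^{-1} A(\varphi), \qquad
\mathbf{x} = K(\theta,\varphi)\, s
\;\Longrightarrow\;
\mathbf{e} = H(\theta)\,\mathbf{x} = A(\varphi)\, s .
\]
```

Applying $K = H^{-1}A$ thus makes each ear receive only the signal a source at the target angle would have produced, which is the stated purpose of eliminating crosstalk.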
For the specific implementation of each of the above operations, reference may be made to the previous embodiments, which are not repeated here.
The computer-readable storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
Because the instructions stored in the computer-readable storage medium can execute the steps of any audio processing method provided by the embodiments of the present invention, they can achieve the beneficial effects achievable by any audio processing method provided by those embodiments, as detailed in the previous embodiments and not repeated here.
According to one aspect of the present application, there is also provided a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. A processor of an electronic device reads the computer instructions from the computer-readable storage medium and executes them, causing the electronic device to perform the methods provided in the various alternative implementations of the embodiments described above.
The audio processing method, apparatus, electronic device, storage medium, and program product provided by the embodiments of the present application have been described in detail above. Specific examples have been used herein to illustrate the principles and implementations of the present application, and the above description of the embodiments is intended only to aid understanding of the method and its core idea. Meanwhile, those skilled in the art may make changes to the specific implementations and the scope of application based on the ideas of the present application; in view of this, the content of this description should not be construed as limiting the present application.

Claims (21)

1. An audio processing method, comprising:
acquiring an audio signal to be played for each of at least two loudspeakers and a target sound source angle corresponding to each audio signal to be played;
determining a signal transfer angle between each loudspeaker and a user's head;
calculating an anti-crosstalk function corresponding to the signal transfer angle based on the signal transfer angle and the target sound source angle, wherein the anti-crosstalk function is used for eliminating crosstalk generated when the at least two loudspeakers play audio outward; and
performing signal transformation on the audio signal to be played of each loudspeaker based on the anti-crosstalk function, to obtain a target playing audio signal corresponding to each audio signal to be played.
2. The audio processing method according to claim 1, wherein the calculating an anti-crosstalk function corresponding to the signal transfer angle based on the signal transfer angle and the target sound source angle comprises:
determining a speaker head-related transfer function corresponding to the signal transfer angle based on the signal transfer angle;
determining a sound source head-related transfer function corresponding to the target sound source angle based on the target sound source angle; and
calculating the anti-crosstalk function corresponding to the signal transfer angle according to the speaker head-related transfer function and the sound source head-related transfer function.
3. The audio processing method according to claim 2, wherein the at least two loudspeakers comprise a left speaker and a right speaker, and the determining a signal transfer angle between each loudspeaker and the user's head comprises:
determining a left signal transfer angle between the left speaker and the user's head; and
determining a right signal transfer angle between the right speaker and the user's head;
and the determining, based on the signal transfer angle, a speaker head-related transfer function corresponding to the signal transfer angle comprises:
determining, based on the left signal transfer angle, a first left-ear head-related transfer function between the left speaker and the user's left ear and a first right-ear head-related transfer function between the left speaker and the user's right ear;
determining, based on the right signal transfer angle, a second left-ear head-related transfer function between the right speaker and the user's left ear and a second right-ear head-related transfer function between the right speaker and the user's right ear; and
taking the first left-ear head-related transfer function, the first right-ear head-related transfer function, the second left-ear head-related transfer function, and the second right-ear head-related transfer function as the speaker head-related transfer functions.
4. The audio processing method according to claim 2, wherein the at least two loudspeakers comprise a left speaker and a right speaker, and the determining a signal transfer angle between each loudspeaker and the user's head comprises:
determining a positional relationship between the left speaker, the right speaker, and the user's head; and
if the left speaker and the right speaker are bilaterally symmetric with respect to the user's head, taking the angle between either loudspeaker and the user's head as the signal transfer angle;
and the determining, based on the signal transfer angle, a speaker head-related transfer function corresponding to the signal transfer angle comprises:
determining, based on the signal transfer angle, a first head-related transfer function between the left speaker and the user's left ear and between the right speaker and the user's right ear, and a second head-related transfer function between the left speaker and the user's right ear and between the right speaker and the user's left ear; and
taking the first head-related transfer function and the second head-related transfer function as the speaker head-related transfer functions.
5. The audio processing method according to claim 2, wherein the calculating the anti-crosstalk function corresponding to the signal transfer angle according to the speaker head-related transfer function and the sound source head-related transfer function comprises:
performing matrix combination processing on the speaker head-related transfer functions to obtain a speaker crosstalk matrix corresponding to the audio signal to be played;
performing matrix inversion on the speaker crosstalk matrix to calculate a crosstalk cancellation matrix of the speaker crosstalk matrix; and
calculating the anti-crosstalk function corresponding to the signal transfer angle based on the crosstalk cancellation matrix and the sound source head-related transfer function.
6. The audio processing method according to claim 2, wherein before the calculating the anti-crosstalk function corresponding to the signal transfer angle according to the speaker head-related transfer function and the sound source head-related transfer function, the method further comprises:
acquiring a preset discrete head-related transfer function; and
performing function approximation processing on the discrete head-related transfer function to obtain a target head-related transfer function;
wherein the determining, based on the signal transfer angle, a speaker head-related transfer function corresponding to the signal transfer angle comprises:
determining the speaker head-related transfer function corresponding to the signal transfer angle based on the signal transfer angle and the target head-related transfer function;
and the determining, based on the target sound source angle, a sound source head-related transfer function corresponding to the target sound source angle comprises:
determining the sound source head-related transfer function corresponding to the target sound source angle based on the target sound source angle and the target head-related transfer function.
7. The audio processing method according to claim 1, wherein the acquiring an audio signal to be played for each of at least two loudspeakers and a target sound source angle corresponding to each audio signal to be played comprises:
acquiring the audio signal to be played for each of the at least two loudspeakers; and
performing sound source localization on each audio signal to be played to determine the target sound source angle corresponding to each audio signal to be played.
8. The audio processing method according to claim 1, wherein the acquiring an audio signal to be played for each of at least two loudspeakers and a target sound source angle corresponding to each audio signal to be played comprises:
acquiring the audio signal to be played for each of the at least two loudspeakers and a video frame to be played corresponding to the audio signal to be played;
determining sounding position information of a sounding object from the video frame to be played; and
calculating the target sound source angle corresponding to each audio signal to be played based on the sounding position information of the sounding object.
9. The audio processing method according to claim 8, wherein the video frame to be played includes at least one candidate sounding object, and before the determining sounding position information of a sounding object from the video frame to be played, the method further comprises:
determining the sounding object corresponding to the audio signal to be played, and acquiring object identification information of the sounding object;
and the determining sounding position information of a sounding object from the video frame to be played comprises:
performing information matching among the candidate sounding objects included in the video frame to be played based on the object identification information, to determine the sounding object; and
acquiring a target display area of the sounding object in the video frame to be played, and taking position information of the target display area as the sounding position information of the sounding object.
10. The audio processing method according to claim 8, wherein the video frame to be played includes a display area of at least one candidate sounding object, and the determining sounding position information of a sounding object from the video frame to be played comprises:
detecting a sounding action in each display area of the video frame to be played, and if it is detected that a candidate sounding object in a display area performs a sounding action, taking that candidate sounding object as the sounding object; and
acquiring a target display area of the sounding object in the video frame to be played, and taking position information of the target display area as the sounding position information of the sounding object.
11. The audio processing method according to claim 8, wherein before the calculating the target sound source angle corresponding to each audio signal to be played based on the sounding position information of the sounding object, the method further comprises:
determining, in response to a position selection operation by a user, sound receiving position information corresponding to the user in the video frame to be played;
and the calculating the target sound source angle corresponding to each audio signal to be played based on the sounding position information of the sounding object comprises:
determining a signal transmission direction between the sounding position and the sound receiving position based on the sounding position information of the sounding object and the sound receiving position information; and
calculating the target sound source angle corresponding to each audio signal to be played according to the signal transmission direction.
12. The audio processing method according to claim 8, wherein before the calculating the target sound source angle corresponding to each audio signal to be played based on the sounding position information of the sounding object, the method further comprises:
determining, in response to a reference object selection operation by a user, a reference object corresponding to the user in the video frame to be played; and
acquiring reference position information of the reference object in the video frame to be played;
and the calculating the target sound source angle corresponding to each audio signal to be played based on the sounding position information of the sounding object comprises:
determining a signal transfer angle between the sounding object and the reference object based on the sounding position information of the sounding object and the reference position information, and taking that signal transfer angle as the target sound source angle corresponding to each audio signal to be played.
13. The audio processing method according to claim 1, wherein the acquiring an audio signal to be played for each of at least two loudspeakers and a target sound source angle corresponding to each audio signal to be played comprises:
receiving a data packet to be played sent by a signal sending end, wherein the data packet to be played is obtained by encoding based on the audio signal to be played of each loudspeaker and the target sound source angle corresponding to each audio signal to be played; and
decoding the data packet to be played to obtain the audio signal to be played for each of the at least two loudspeakers and the target sound source angle corresponding to each audio signal to be played.
14. The audio processing method according to claim 1, wherein the acquiring an audio signal to be played for each of at least two loudspeakers and a target sound source angle corresponding to each audio signal to be played comprises:
receiving a data packet to be played sent by a cloud, wherein the data packet to be played is sent to the cloud by a signal sending end, and is obtained by the signal sending end by encoding based on the audio signal to be played of each loudspeaker and the target sound source angle corresponding to each audio signal to be played; and
decoding the data packet to be played to obtain the audio signal to be played for each of the at least two loudspeakers and the target sound source angle corresponding to each audio signal to be played.
15. The audio processing method according to claim 1, wherein the performing signal transformation on the audio signal to be played of each loudspeaker based on the anti-crosstalk function, to obtain a target playing audio signal corresponding to each audio signal to be played, comprises:
calculating an anti-crosstalk resonance function based on the anti-crosstalk function, wherein the anti-crosstalk resonance function is used for eliminating both the crosstalk generated when the at least two loudspeakers play audio outward and the influence of the resonance of the ear canal of the human ear; and
performing signal transformation on the audio signal to be played of each loudspeaker based on the anti-crosstalk resonance function, to obtain the target playing audio signal corresponding to each audio signal to be played.
16. The audio processing method according to claim 1, wherein before the performing signal transformation on the audio signal to be played based on the anti-crosstalk function, the method further comprises:
acquiring a playing setting parameter of each loudspeaker and an audio playing preference parameter corresponding to a target user;
performing audio parameter extraction on the audio signal to be played to obtain a to-be-played parameter corresponding to the audio signal to be played; and
calculating an audio adjustment function corresponding to the audio signal to be played according to the playing setting parameter, the audio playing preference parameter, and the to-be-played parameter;
wherein the performing signal transformation on the audio signal to be played based on the anti-crosstalk function, to obtain the target playing audio signal corresponding to the audio signal to be played, comprises:
performing signal transformation on the audio signal to be played based on the audio adjustment function, the target sound source angle, the signal transfer angle, and the anti-crosstalk function, to obtain the target playing audio signal corresponding to the audio signal to be played.
17. The audio processing method according to any one of claims 1 to 16, further comprising:
sending the target playing audio signal corresponding to each audio signal to be played to the loudspeaker corresponding to that audio signal to be played, and triggering each loudspeaker to play its corresponding target playing audio signal.
18. An audio processing apparatus, comprising:
a signal acquisition unit, configured to acquire an audio signal to be played for each of at least two loudspeakers and a target sound source angle corresponding to each audio signal to be played;
an angle determining unit, configured to determine a signal transfer angle between each loudspeaker and a user's head;
a function calculation unit, configured to calculate an anti-crosstalk function corresponding to the signal transfer angle based on the signal transfer angle and the target sound source angle, wherein the anti-crosstalk function is used for eliminating crosstalk generated when the at least two loudspeakers play audio outward; and
a signal transformation unit, configured to perform signal transformation on the audio signal to be played of each loudspeaker based on the anti-crosstalk function, to obtain a target playing audio signal corresponding to each audio signal to be played.
19. An electronic device, comprising a memory and a processor, wherein the memory stores an application program, and the processor is configured to run the application program in the memory to perform the steps of the audio processing method according to any one of claims 1 to 17.
20. A computer-readable storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the steps of the audio processing method according to any one of claims 1 to 17.
21. A computer program product, comprising a computer program or instructions which, when executed by a processor, implement the steps of the audio processing method according to any one of claims 1 to 17.
CN202210940126.1A 2022-08-05 2022-08-05 Audio processing method, device, electronic equipment, storage medium and program product Pending CN117135557A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210940126.1A CN117135557A (en) 2022-08-05 2022-08-05 Audio processing method, device, electronic equipment, storage medium and program product
PCT/CN2023/097184 WO2024027315A1 (en) 2022-08-05 2023-05-30 Audio processing method and apparatus, electronic device, storage medium, and program product


Publications (1)

Publication Number Publication Date
CN117135557A true CN117135557A (en) 2023-11-28

Family

ID=88855189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210940126.1A Pending CN117135557A (en) 2022-08-05 2022-08-05 Audio processing method, device, electronic equipment, storage medium and program product

Country Status (2)

Country Link
CN (1) CN117135557A (en)
WO (1) WO2024027315A1 (en)


Also Published As

Publication number Publication date
WO2024027315A1 (en) 2024-02-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination