WO2024027315A1 - Audio processing method and apparatus, electronic device, storage medium, and program product - Google Patents


Info

Publication number
WO2024027315A1
Authority
WO
WIPO (PCT)
Prior art keywords
played
audio
signal
speaker
sound source
Prior art date
Application number
PCT/CN2023/097184
Other languages
French (fr)
Chinese (zh)
Inventor
秦宇
谢仁礼
Original Assignee
深圳Tcl数字技术有限公司
Priority date
Filing date
Publication date
Application filed by 深圳Tcl数字技术有限公司
Publication of WO2024027315A1 publication Critical patent/WO2024027315A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04S: STEREOPHONIC SYSTEMS
    • H04S7/00: Indicating arrangements; Control arrangements, e.g. balance control
    • H04S7/30: Control circuits for electronic adaptation of the sound field
    • H04S7/302: Electronic adaptation of stereophonic sound system to listener position or orientation
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R5/00: Stereophonic arrangements
    • H04S1/00: Two-channel systems
    • H04S2400/00: Details of stereophonic systems covered by H04S but not provided for in its groups
    • H04S2400/09: Electronic reduction of distortion of stereophonic sound systems
    • H04S2400/11: Positioning of individual sound objects, e.g. moving airplane, within a sound field

Definitions

  • the present invention relates to the technical field of audio processing, and specifically to audio processing methods, devices, electronic equipment, storage media and program products.
  • Embodiments of the present invention provide audio processing methods, devices, electronic equipment, storage media and program products, which can generate audio signals that express the direction of a sound source without adding audio playback equipment, and can eliminate the crosstalk generated when external speakers are used to play audio, so that users can enjoy audio with better playback effects.
  • An embodiment of the present invention provides an audio processing method, including:
  • an anti-crosstalk function corresponding to the signal transmission angle is calculated.
  • the anti-crosstalk function is used to eliminate the crosstalk generated by the at least two external speakers when the audio is externally played.
  • signal transformation is performed on the audio signals to be processed from each of the external speakers to obtain a target playback audio signal corresponding to the audio signal to be played.
  • an audio processing device including:
  • a signal acquisition unit configured to acquire the audio signal to be played from each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played;
  • an angle determination unit, used to determine the signal transmission angle between each of the external speakers and the user's head;
  • a function calculation unit configured to calculate an anti-crosstalk function corresponding to the signal transmission angle based on the signal transmission angle and the target sound source angle.
  • the anti-crosstalk function is used to eliminate the crosstalk generated by the at least two external speakers when the audio is externally played;
  • a signal conversion unit configured to perform signal conversion on the audio signals to be processed from each of the external speakers based on the anti-crosstalk function, to obtain a target playback audio signal corresponding to the audio signal to be played.
  • the function calculation unit is configured to determine, based on the signal transmission angle, the speaker head-related transfer function corresponding to the signal transmission angle;
  • the anti-crosstalk function corresponding to the signal transmission angle is calculated.
  • the at least two external speakers include a left speaker and a right speaker, and the angle determination unit is used to determine the left signal transmission angle between the left speaker and the user's head and the right signal transmission angle between the right speaker and the user's head;
  • the function calculation unit is configured to determine, based on the left signal transmission angle, a first left ear head-related transfer function between the left speaker and the user's left ear, and a first right ear head-related transfer function between the left speaker and the user's right ear;
  • a second left ear head-related transfer function between the right speaker and the user's left ear, and a second right ear head-related transfer function between the right speaker and the user's right ear are determined ;
  • the first left ear head-related transfer function, the first right ear head-related transfer function, the second left ear head-related transfer function and the second right ear head-related transfer function are used as speaker head-related transfer functions.
  • the at least two external speakers include a left speaker and a right speaker, and the angle determination unit is used to determine the positional relationship between the left speaker, the right speaker and the user's head;
  • the angle between any external speaker and the user's head is used as the signal transmission angle
  • the function calculation unit is configured to determine, based on the signal transmission angle, a first head-related transfer function between the left speaker and the user's left ear and between the right speaker and the user's right ear, and a second head-related transfer function between the left speaker and the user's right ear and between the right speaker and the user's left ear;
  • the first head-related transfer function and the second head-related transfer function are regarded as speaker head-related transfer functions.
  • the function calculation unit is configured to perform matrix merging processing according to the speaker head related transfer function to obtain the speaker crosstalk matrix corresponding to the audio signal to be processed;
  • an anti-crosstalk function corresponding to the signal transmission angle is calculated.
  • the audio processing device provided by the embodiment of the present invention also includes a function processing unit for obtaining a preset discrete head-related transfer function library and performing function approximation processing on it to obtain a target head-related transfer function;
  • the function calculation unit is configured to determine the speaker head-related transfer function corresponding to the signal transmission angle based on the signal transmission angle and the target head-related transfer function;
  • a sound source related transfer function corresponding to the target sound source angle is determined.
  • the signal acquisition unit is used to acquire the audio signal to be played by each of the at least two external speakers;
  • the signal acquisition unit is configured to acquire the audio signal to be played from each of the at least two external speakers and the video frame to be played corresponding to the audio signal to be played;
  • the target sound source angle corresponding to each of the audio signals to be played is calculated.
  • the video frame to be played includes at least one candidate voicing object.
  • the audio processing device provided by the embodiment of the present invention further includes a voicing object determination unit, used to determine the voicing object corresponding to the audio signal to be played and obtain the object identification information of the voicing object;
  • the signal acquisition unit is configured to perform information matching based on the object identification information from the candidate voicing objects included in the video frame to be played, and determine the voicing object;
  • the video frame to be played includes a display area of at least one candidate voicing object
  • the signal acquisition unit is configured to perform voicing action detection for each display area in the video frame to be played, respectively. If it is detected that a candidate voicing object in the display area has performed a voicing action, use the candidate voicing object as the voicing object;
  • the audio processing device provided by the embodiment of the present invention further includes a first position determination unit, configured to determine the corresponding sound receiving position information of the user in the video frame to be played in response to the user's position selection operation;
  • the signal acquisition unit is configured to determine the signal transmission direction between the sounding position and the sound receiving position based on the sounding position information of the sounding object and the sound receiving position information;
  • the target sound source angle corresponding to each audio signal to be played is calculated.
  • the audio processing device provided by the embodiment of the present invention further includes a second position determination unit, configured to determine the reference object corresponding to the user in the video frame to be played in response to the user's reference object selection operation;
  • the signal acquisition unit is configured to determine a signal transmission angle between the sounding object and the reference object based on the sounding position information of the sounding object and the reference position information, as the target sound source angle corresponding to each audio signal to be played.
  • the signal acquisition unit is configured to receive a data packet to be played sent by the signal transmitting end.
  • the data packet to be played is obtained by encoding the audio signal to be played from each external speaker and the target sound source angle corresponding to each of the audio signals to be played;
  • the data packet to be played is decoded to obtain the audio signal to be played from each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played.
  • the signal acquisition unit is used to receive the data packet to be played sent by the cloud.
  • the data packet to be played is sent to the cloud by the signal sending end.
  • the data packet to be played is obtained by the signal sending end by encoding the audio signals to be played by each external speaker and the target sound source angle corresponding to each of the audio signals to be played;
  • the data packet to be played is decoded to obtain the audio signal to be played from each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played.
  • the signal conversion unit is configured to calculate an anti-crosstalk resonance function based on the anti-crosstalk function.
  • the anti-crosstalk resonance function is used to eliminate the crosstalk generated by the at least two external speakers when the audio is externally played and the influence of the resonance of the external auditory canal of the human ear;
  • signal transformation is performed on the audio signals to be processed from each of the external speakers to obtain a target playback audio signal corresponding to the audio signal to be played.
  • the audio processing device provided by the embodiment of the present invention also includes an adjustment parameter calculation unit for obtaining the playback setting parameters of each of the external speakers and the audio playback preference parameters corresponding to the target user;
  • the signal transformation unit is configured to perform signal transformation on the audio signal to be processed based on the audio adjustment function, the target sound source angle, the signal transmission angle and the anti-crosstalk function, to obtain the target playback audio signal corresponding to the audio signal to be played.
  • the audio processing device further includes an audio playback unit, configured to send the target playback audio signal corresponding to each of the audio signals to be played to the external speaker corresponding to each of the audio signals to be played, triggering the external speaker to play the corresponding target playback audio signal.
  • embodiments of the present invention also provide an electronic device, including a memory and a processor; the memory stores application programs, and the processor is used to run the application programs in the memory to execute the steps in any audio processing method provided by the embodiments of the present invention.
  • embodiments of the present invention also provide a computer-readable storage medium that stores a plurality of instructions, and the instructions are suitable for being loaded by the processor to execute the steps in any audio processing method provided by the embodiments of the present invention.
  • embodiments of the present invention also provide a computer program product, including a computer program or instructions; when the computer program or instructions are executed by the processor, the steps in any of the audio processing methods provided by the embodiments of the present invention are implemented.
  • In the embodiment of the present invention, the audio signal to be played by each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played can be obtained, and the signal transmission angle between each external speaker and the user's head can be determined; based on the signal transmission angle and the target sound source angle, the anti-crosstalk function corresponding to the signal transmission angle is calculated, where the anti-crosstalk function is used to eliminate the crosstalk generated by the at least two external speakers when the audio is played externally; based on the target sound source angle, the signal transmission angle and the anti-crosstalk function, the audio signal to be processed of each external speaker is signal transformed to obtain the target playback audio signal corresponding to the audio signal to be played.
  • Because the anti-crosstalk function can eliminate the crosstalk generated in an open sound field, and the audio signal to be played is also transformed based on the target sound source angle and the anti-crosstalk function, the target playback audio signal can express sound source position information and offset the crosstalk generated during playback. Therefore, it is possible to generate audio signals that express the sound source position without adding additional audio playback equipment, and to eliminate the crosstalk generated when using external speakers to play audio, so that users can enjoy audio with better playback effects.
  • Figure 1a is a schematic scene diagram of an audio processing method provided by an embodiment of the present invention.
  • Figure 1b is a schematic diagram of another scenario of the audio processing method provided by an embodiment of the present invention.
  • Figure 2 is a flow chart of an audio processing method provided by an embodiment of the present invention.
  • Figure 3 is a schematic diagram of a video frame to be played according to an embodiment of the present invention.
  • Figure 4 is another schematic diagram of a video frame to be played according to an embodiment of the present invention.
  • Figure 5 is a schematic diagram of the ideal state and actual state of audio signal transmission provided by the embodiment of the present invention.
  • Figure 6 is a schematic diagram of the technical implementation of virtual 5.1 surround sound provided by an embodiment of the present invention.
  • Figure 7 is a schematic structural diagram of an audio processing device provided by an embodiment of the present invention.
  • Figure 8 is another structural schematic diagram of an audio processing device provided by an embodiment of the present invention.
  • Figure 9 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
  • Embodiments of the present invention provide an audio processing method, device, electronic equipment and computer-readable storage medium. Specifically, embodiments of the present invention provide an audio processing method suitable for an audio processing device, and the audio processing device can be integrated in an electronic device.
  • the electronic device may be a terminal or other equipment, including but not limited to mobile terminals and fixed terminals.
  • mobile terminals include but are not limited to smartphones, smart watches, tablets, laptops, smart vehicles, etc.
  • fixed terminals include but are not limited to desktop computers, smart TVs, etc.
  • the electronic device can also be a server or other equipment.
  • the server can be an independent physical server, or a server cluster or distributed system composed of multiple physical servers; it can also be a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), big data and artificial intelligence platforms, but is not limited to this.
  • the audio processing method in the embodiment of the present invention can be implemented by the server, or can be implemented by the terminal and the server together.
  • the method will be described below by taking the audio processing method jointly implemented by the terminal and the server as an example.
  • the audio processing system may include a signal sending end 10 and a signal receiving end 20; the signal sending end 10 and the signal receiving end 20 are connected through a network, for example, through a wired or wireless network connection, etc., wherein the signal sending end 10 can exist as an electronic device that sends the audio signal to be played and the target sound source angle to the signal receiving end 20 .
  • the signal sending end 10 can be used to receive the user's voice input, generate an audio signal to be played, perform sound source analysis on the audio signal to be played to obtain the target sound source angle, and send the audio signal to be played and the target sound source angle to the signal receiving end 20.
  • the signal receiving end 20 can be used to obtain the audio signal to be played from each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played, and determine the signal transmission angle between each external speaker and the user's head; based on the signal transmission angle and the target sound source angle, it calculates the anti-crosstalk function corresponding to the signal transmission angle, where the anti-crosstalk function is used to eliminate the crosstalk generated by the at least two external speakers when playing audio externally; based on the anti-crosstalk function, the audio signal to be processed of each external speaker is signal transformed to obtain the target playback audio signal corresponding to the audio signal to be played.
  • the audio processing steps performed by the signal receiving end 20 can also be performed by the signal transmitting end 10, and the signal transmitting end 10 can directly send the signal-transformed target playback audio signal to the signal receiving end 20, and so on; the embodiment of the present invention does not limit this.
  • the audio processing system may include a signal sending end 10, a signal receiving end 20, a cloud 30, etc.; the signal sending end 10, the signal receiving end 20 and the cloud 30 are connected through a network, for example, through a wired or wireless network connection, wherein the signal sending end 10 exists as an electronic device that sends the audio signal to be played and the target sound source angle to the cloud 30.
  • the signal sending end 10 can be used to receive the user's voice input, generate an audio signal to be played, perform sound source analysis on the audio signal to be played to obtain the target sound source angle, and send the audio signal to be played and the target sound source angle to the cloud 30.
  • the cloud 30 can be used to forward the audio signal to be played and the target sound source angle to the signal receiving end 20 .
  • the signal receiving end 20 can be used to obtain the audio signal to be played by each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played, and determine the signal transmission angle between each external speaker and the user's head; based on the signal transmission angle and the target sound source angle, it calculates the anti-crosstalk function corresponding to the signal transmission angle, where the anti-crosstalk function is used to eliminate the crosstalk generated by the at least two external speakers when playing audio externally; based on the anti-crosstalk function, the audio signal to be processed of each external speaker is signal transformed to obtain the target playback audio signal corresponding to the audio signal to be played.
  • the audio processing steps performed by the signal receiving end 20 can also be performed by the signal transmitting end 10 or the cloud 30.
  • the signal transmitting end 10 or the cloud 30 can directly send the signal-transformed target playback audio signal to the signal receiving end 20, and so on, which is not limited in the embodiment of the present invention.
  • the embodiments of the present invention will be described from the perspective of an audio processing device.
  • the audio processing device may be integrated in a server or a terminal.
  • the specific process of the audio processing method in this embodiment can be as follows:
  • the external speaker is a speaker used for audio playback in an open sound field.
  • the number of external speakers is at least two.
  • the open sound field can be used in scenarios such as using TVs, smart speakers and other home appliances to play audio and video, and using car speakers to play audio and video in cars.
  • the external speakers may be speakers installed in electronic devices such as televisions, mobile phones, and car terminals, etc.
  • the audio signal to be played is the original audio signal obtained by the audio processing device.
  • the audio signal to be played may be the voice signal of the speaking user collected by the video conferencing client.
  • the audio signal to be played can be the audio signal obtained from the audio file that needs to be played, and so on.
  • the audio signals to be played by each of the external speakers may be the same or different.
  • an audio file that needs to be played can naturally include audio signals to be played in the left channel and the right channel respectively.
  • each external speaker can correspond to the audio signal of the left channel or the audio signal of the right channel respectively.
  • the target sound source angle is the angle of the sound source expected to appear to the user during audio playback. For example, during a video call, if the angle between the position of the speaking user in the video screen and the position of the user of the video call application in the video screen is 30°, then the target sound source angle can be 30°.
  • the target sound source angle corresponding to each audio signal to be played may be the same or different. For example, if the audio signals to be played by each external speaker are the same, the target sound source angles are the same at this time; or, in a scenario where multiple users speak at the same time, different audio signals to be played can be obtained based on the voice input of different users, and at this time the target sound source angles corresponding to different audio signals to be played may be different.
  • the target sound source angles corresponding to the different audio signals to be played can be the same.
  • the speech of the same user is divided into audio signals to be played in the left channel and the right channel respectively.
  • each external speaker can correspond to the audio signal of the left channel or the audio signal of the right channel, but the audio signal of the left channel and the audio signal of the right channel correspond to the same speaking user, and the corresponding target sound source angle is also the same.
  • the target sound source angle may need to be obtained by positioning the audio signal to be played.
  • Step 201 may specifically include:
  • sound source location positioning can be achieved using Direction of Arrival (DOA), microphone array system positioning and other technologies.
  • the sound source position positioning can be performed at the signal sending end that generates the audio signal to be played.
  • the target sound source angle can be obtained through MIC (microphone) array positioning by the terminal of the user who is speaking.
  • the sound source position positioning can be performed on the cloud or signal receiving end that obtains the audio signal to be played.
  • the signal receiving end can obtain the target sound source angle based on information such as the time delay between the speaking time of the speaking user and the time when the audio signal to be played is received.
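  • As a concrete illustration of the delay-based localization described above, the sketch below estimates a direction-of-arrival angle from the time difference between two microphone signals using GCC-PHAT; the microphone spacing, function names and sign convention are assumptions for this example rather than details from the patent.

```python
import numpy as np

def gcc_phat_delay(x, y, fs):
    """Estimate the relative delay (seconds) between two microphone signals with GCC-PHAT."""
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = X * np.conj(Y)
    cross /= np.abs(cross) + 1e-12              # phase transform weighting
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs

def doa_angle(x_left, x_right, fs, mic_distance=0.1, c=343.0):
    """Map the inter-microphone delay to a far-field direction-of-arrival angle in degrees."""
    tau = gcc_phat_delay(x_left, x_right, fs)
    sin_theta = np.clip(tau * c / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```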
  • the audio signal to be played may correspond to a video frame
  • the target sound source angle may be obtained based on the position of the sounding object in the video frame.
  • the target sound source angle corresponding to each of the audio signals to be played is calculated.
  • the video frame to be played corresponding to the audio signal to be played can be a video frame synchronized with the audio signal to be played, or the video frame to be played can include a synchronized video frame synchronized with the audio signal to be played and the N video frames sequentially located before and/or after the synchronized video frame.
  • the number of N can be set by technicians according to actual conditions, and this is not limited in the embodiment of the present invention.
  • the video frame to be played may include a synchronized video frame synchronized with the audio signal to be played and 10 video frames sequentially located after the synchronized video frame.
  • the cloud or the signal receiving end can detect the vocal objects in advance based on the 10 video frames that are sequentially located after the synchronized video frame to improve the speed of audio processing.
  • the sounding position information may be the position of the display area of the sounding object in the video frame to be played.
  • the utterance position information may be the virtual position information of the utterance object in the virtual space corresponding to the video frame to be played, and so on.
  • In the audio processing method, when the audio signal to be played is obtained, it can be determined which object in the video frame to be played corresponds to the audio signal to be played; after determining the corresponding sounding object, the target sound source angle can be calculated according to the position of the sounding object in the video frame to be played. That is, the video frame to be played may include at least one candidate voicing object.
  • the audio processing method provided by the embodiment of the present invention may also include :
  • the object identification information may be identification information that can identify the speaking object, such as the account nickname of the speaking object, a unique identity ID, etc.
  • the embodiment of the present invention does not limit the content and form of the object identification information.
  • the step "determine the voicing position information of the voicing object from the video frame to be played" may specifically include:
  • For example, as shown in Figure 3, three different candidate utterance objects, user 1, user 2 and user 3, are displayed on the smart TV.
  • If user 1 speaks in the video conference, it can be determined that the speaking object corresponding to the audio signal to be played is user 1, and its object identification information can be "user 1".
  • the smart TV can determine the voicing object corresponding to "User 1" and the target display area of the voicing object in the video frame to be played from each candidate voicing object based on the object identification information of "User 1".
  • The display area of each candidate sounding object may be fixed; after determining the sounding object corresponding to the audio signal to be played, the position information of the target display area of the sounding object can be obtained. Alternatively, the user can also set the display area of the candidate utterance objects by himself; in this case, the position information of the target display area needs to be determined based on the object identification information of the utterance object and the user's personalized setting information.
  • the video frame to be played may include the display area of at least one candidate voicing object, and the step "determining the voicing position information of the voicing object from the video frame to be played" includes:
  • the utterance action detection may be to only detect whether the mouth of each candidate utterance object moves. Or, in order to avoid misrecognition caused by the mouth movements of the candidate vocal target when they are not speaking, such as pursed lips, coughing, etc., facial muscle recognition can be added on the basis of mouth movement recognition to improve the accuracy of vocal movement detection.
  • For example, if user 1's mouth moves, the candidate utterance object user 1 is used as the voicing object, and the target display area is the local area where user 1 is located.
  • the user can manually select his or her own position.
  • the audio processing method provided by the embodiment of the present invention may also include:
  • the corresponding sound receiving position information of the user in the video frame to be played is determined.
  • users can select their position in the video frame to be played, such as the lower left corner of the screen, the center of the screen, etc.
  • the user's position selection operation can be the user's drag operation on the corresponding screen display frame, a click/double-click or other trigger operation in the video conference screen, and so on.
  • the step "calculate the target sound source angle corresponding to each of the audio signals to be played based on the sounding position information of the sounding object" may specifically include:
  • the target sound source angle corresponding to each audio signal to be played is calculated.
  • the signal transmission direction refers to the direction between the sounding position of the sounding object and the sound receiving position selected by the user.
  • the position indicated by the sound receiving position information may not be in the video frame to be played.
  • users can pre-select their virtual location in the virtual concert venue.
  • the virtual position is the sound receiving position corresponding to the video frame to be played in the user's online concert; the virtual location selected by the user may or may not be displayed in the video screen of the concert.
  • the vocal object is the performer in the concert
  • the vocal location information is the performer's location information in the virtual venue.
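  • To make the geometry of the preceding paragraphs concrete, a minimal sketch of deriving a target sound source angle from a sounding position and a sound receiving position (in video-frame or virtual-space coordinates) is shown below; the coordinate convention, with the angle measured from the listener's assumed facing direction, is chosen purely for illustration.

```python
import math

def target_sound_source_angle(sounding_pos, receiving_pos, facing_deg=0.0):
    """Angle (degrees) of the sounding position as seen from the sound receiving position.

    sounding_pos / receiving_pos: (x, y) coordinates in the video frame or virtual space.
    facing_deg: assumed facing direction of the listener; 0 means facing the +y direction.
    """
    dx = sounding_pos[0] - receiving_pos[0]
    dy = sounding_pos[1] - receiving_pos[1]
    bearing = math.degrees(math.atan2(dx, dy))   # 0 deg straight ahead, positive to the right
    return (bearing - facing_deg) % 360.0
```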
  • the audio processing method may also include:
  • the step "calculate the target sound source angle corresponding to each of the audio signals to be played based on the sounding position information of the sounding object" may specifically include:
  • the signal transmission angle between the sounding object and the reference object is determined as the target sound source angle corresponding to each of the audio signals to be played.
  • the position of the reference object and the position of the sounding object may vary.
  • the reference object selected by the user can be the protagonist of the drama, and the speaking object can change at any time.
  • the position of the sounding object and the position of the reference object can be moved.
  • the signal transmission angle is the angle between the external speaker and the user's head. Specifically, when there is only one person in the human body detection range of the external speaker or the terminal where the external speaker is located, the head position of this person can be the user's head position by default.
  • the nearest person can be used as the user; or, the face of the logged-in account on the terminal can be matched against the faces of each person, and the successfully matched person will be regarded as the user, and so on.
  • the step "obtaining the audio signal to be played from each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played" may include:
  • the data packet to be played is sent to the cloud by the signal sending end.
  • the data packet to be played is obtained by the signal sending end by encoding the audio signal to be played by each external speaker and the target sound source angle corresponding to each of the audio signals to be played;
  • the data packet to be played is decoded to obtain the audio signal to be played from each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played.
  • the signal sending end can directly perform sound source positioning based on the audio signal to be played to obtain the target sound source angle, and then encode the audio signal to be played and the target sound source angle to obtain the data packet to be played.
  • the signal sending end can also only encode the audio signal to be played to obtain the data packet to be played, and send the data packet to be played to the cloud; after the cloud forwards the data packet to be played to the signal receiving end, the signal receiving end can decode the obtained data packet to be played and determine the target sound source angle based on the obtained audio signal to be played.
  • the audio signal to be played can be obtained in the cloud for processing.
  • the signal sending end can perform encoding processing on the audio signal to be played.
  • the step "obtains the audio signal to be played by each of the external speakers in at least two external speakers and the audio signal corresponding to each of the audio signals to be played.
  • “Target sound source angle” which can include:
  • Receive the data packet to be played sent by the signal transmitting end, where the data packet to be played is obtained by encoding the audio signal to be played from each external speaker and the target sound source angle corresponding to each of the audio signals to be played;
  • the signal sending end can directly perform sound source positioning based on the audio signal to be played to obtain the target sound source angle, and then encode the audio signal to be played and the target sound source angle to obtain the data packet to be played.
  • the signal sending end can only encode the audio signal to be played, obtain the data packet to be played, send the data packet to be played to the cloud, trigger the cloud to decode the data packet to be played, and determine the target sound source based on the obtained audio signal to be played. angle.
  • the cloud can also directly perform step 203.
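  • As a rough illustration of carrying the audio signal to be played together with its target sound source angle in one data packet, the sketch below packs a channel identifier, an angle and 16-bit PCM samples into bytes and decodes them again; the field layout is an assumed example, not the encoding actually used by the patent.

```python
import struct
import numpy as np

HEADER_FMT = "<Bfi"   # channel id (uint8), angle in degrees (float32), sample count (int32)

def encode_packet(pcm_int16, angle_deg, channel_id):
    """Pack one channel of PCM samples plus its target sound source angle into bytes."""
    header = struct.pack(HEADER_FMT, channel_id, angle_deg, len(pcm_int16))
    return header + np.asarray(pcm_int16, dtype="<i2").tobytes()

def decode_packet(packet):
    """Recover (channel_id, angle_deg, samples) from a packet built by encode_packet."""
    channel_id, angle_deg, n = struct.unpack_from(HEADER_FMT, packet, 0)
    offset = struct.calcsize(HEADER_FMT)
    samples = np.frombuffer(packet, dtype="<i2", count=n, offset=offset)
    return channel_id, angle_deg, samples
```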
  • the anti-crosstalk function is used to eliminate the crosstalk generated by the at least two external speakers when playing audio externally.
  • For the sound source E to be virtualized at the θ angle (i.e., the target sound source angle), the signal is processed with the left ear HRTF (Head Related Transfer Function) at the θ angle and played to the left ear, and processed with the right ear HRTF at the θ angle and played to the right ear; the user can then hear a virtual sound source at the θ angle.
  • the physical pathways of the user's left and right ears are basically isolated, and it is basically impossible for the sound from the left ear to be heard by the right ear, and vice versa.
  • the actual listening signal is the result of crosstalk and interference from the position of the external speakers.
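  • The ideal and actual cases described above can be sketched directly: in the ideal (headphone-like) case each ear receives only its own HRTF-filtered signal, while with external speakers each ear receives contributions from both speakers, which is the crosstalk to be cancelled. The head-related impulse responses used below are assumed to come from whatever HRTF library is in use.

```python
import numpy as np

def ideal_ear_signals(mono_source, hrir_left, hrir_right):
    """Ideal case: the virtual source is filtered with the left- and right-ear HRIRs for the
    target sound source angle, and each ear hears only its own signal."""
    return np.convolve(mono_source, hrir_left), np.convolve(mono_source, hrir_right)

def actual_ear_signals(x_left_spk, x_right_spk, h_ll, h_lr, h_rl, h_rr):
    """Actual case with external speakers: each ear hears both speakers (crosstalk).
    h_ll: left speaker -> left ear, h_lr: left speaker -> right ear,
    h_rl: right speaker -> left ear, h_rr: right speaker -> right ear."""
    left_ear = np.convolve(x_left_spk, h_ll) + np.convolve(x_right_spk, h_rl)
    right_ear = np.convolve(x_left_spk, h_lr) + np.convolve(x_right_spk, h_rr)
    return left_ear, right_ear
```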
  • the step "calculate the anti-crosstalk function corresponding to the signal transmission angle based on the signal transmission angle and the target sound source angle" may include:
  • the anti-crosstalk function corresponding to the signal transmission angle is calculated.
  • each external speaker has a spatial position; if the signal that has been processed according to the target sound source angle is directly played by the external speaker, the actually played sound is further multiplied by the HRTF corresponding to the signal transmission angle between the external speaker and the user's head, and the result is the sound that the user actually hears.
  • the step "determining the signal transmission angle between each of the external speakers and the user's head" specifically includes:
  • the step "based on the signal transmission angle, determine the speaker head-related transfer function corresponding to the signal transmission angle” may include:
  • a first left ear head-related transfer function between the left speaker and the user's left ear, and a first right ear head-related transfer function between the left speaker and the user's right ear are determined ;
  • a second left ear head-related transfer function between the right speaker and the user's left ear, and a second right ear head-related transfer function between the right speaker and the user's right ear are determined ;
  • the first left ear head-related transfer function, the first right ear head-related transfer function, the second left ear head-related transfer function and the second right ear head-related transfer function are used as speaker head-related transfer functions.
  • the terms left speaker and right speaker do not limit the positional relationship between the external speakers and the user; generally speaking, the left speaker refers to the speaker on the left among the two speakers, and the right speaker refers to the speaker on the right among the two speakers.
  • the pre-established head-related transfer function library can be obtained; after determining the head-related transfer function library, the first left ear head-related transfer function, the first right ear head-related transfer function, the second left ear head-related transfer function and the second right ear head-related transfer function corresponding to each angle are obtained from the head-related transfer function library according to the left signal transmission angle and the right signal transmission angle.
  • the left signal transmission angle can have only one angle, that is, the angle between the left speaker and a certain position on the user's head.
  • when determining the first left ear head-related transfer function and the first right ear head-related transfer function, both can be obtained from the head-related transfer function library based only on this angle.
  • the left signal transmission angle may include two angles, namely the angle between the left speaker and the user's left ear and the angle between the left speaker and the user's right ear.
  • the first left ear head-related transfer function can be obtained from the head-related transfer function library according to the angle between the left speaker and the user's left ear; when determining the first right ear head-related transfer function, it can be obtained from the head-related transfer function library according to the angle between the left speaker and the user's right ear.
  • the signal transmission angle on the right side is similar to the signal transmission angle on the left side, and will not be described again in the embodiment of the present invention.
  • the transfer function related to the speaker head can be further simplified. That is, taking the at least two external speakers including a left speaker and a right speaker as an example, the step "determining the signal transmission angle between each of the external speakers and the user's head" may specifically include:
  • the angle between any external speaker and the user's head is used as the signal transmission angle.
  • the step "based on the signal transmission angle, determine the speaker head-related transfer function corresponding to the signal transmission angle" includes:
  • a first head-related transfer function between the left speaker and the user's left ear and between the right speaker and the user's right ear is determined, and a second head-related transfer function between the left speaker and the user's right ear and between the right speaker and the user's left ear is determined;
  • the first head-related transfer function and the second head-related transfer function are regarded as speaker head-related transfer functions.
  • the HRTF at this time is symmetrical on the same side.
  • the head-related transfer function from the left speaker to the left ear is equal to the head-related transfer function from the right speaker to the right ear, which can be written as Hθ1,L = Hθ2,R and Hθ1,R = Hθ2,L,
  • where Hθ1,L and Hθ1,R are the head-related transfer functions from the left speaker at angle θ1 to the user's left and right ears, and Hθ2,R and Hθ2,L are the head-related transfer functions from the right speaker at angle θ2 to the user's right and left ears.
  • an anti-crosstalk function that can eliminate crosstalk can be determined based on the generation rules of crosstalk.
  • the step "calculate the anti-crosstalk function corresponding to the signal transmission angle based on the related transfer function of the speaker head and the related transfer function of the sound source" may specifically include:
  • an anti-crosstalk function corresponding to the signal transmission angle is calculated.
  • the unprocessed audio signals heard by the user can be expressed in matrix form through matrix merging according to the speaker head-related transfer functions:
  • the speaker crosstalk matrix is:
  • a crosstalk cancellation matrix can be designed for the speaker crosstalk matrix so that the processed audio signal can eliminate the speaker crosstalk matrix during spatial propagation.
  • crosstalk cancellation matrix A can be expressed as follows:
  • pre-cancellation is performed through matrix A, the effect of A is then offset during the spatial propagation process, and finally the user can hear the appropriate sound.
  • the audio signals x and y sent to the left speaker and the right speaker respectively after being processed by the crosstalk cancellation matrix A can be expressed by the following formula:
  • the anti-crosstalk function corresponding to the signal transmission angle can be calculated based on the crosstalk cancellation matrix and the sound source related transfer function.
  • the anti-crosstalk function can be expressed by the following formula:
  • the anti-crosstalk function can be simplified to the following form:
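  • The matrices referred to above are not reproduced in this text, so the following is a standard two-speaker crosstalk-cancellation formulation that is consistent with the surrounding description; the notation (Hs and Ho for the same-side and opposite-side speaker HRTFs, HθL and HθR for the sound source HRTFs at the target angle θ) is chosen for illustration rather than copied from the patent.

```latex
\begin{aligned}
% Ear signals produced by the speaker feeds x, y through the speaker crosstalk matrix C
\begin{pmatrix} e_L \\ e_R \end{pmatrix}
  &= C \begin{pmatrix} x \\ y \end{pmatrix},
  \qquad
  C = \begin{pmatrix} H_s & H_o \\ H_o & H_s \end{pmatrix} \\[4pt]
% Crosstalk cancellation matrix: the inverse of C in the symmetric two-speaker case
A &= C^{-1}
   = \frac{1}{H_s^{2}-H_o^{2}}
     \begin{pmatrix} H_s & -H_o \\ -H_o & H_s \end{pmatrix} \\[4pt]
% Speaker feeds for a virtual sound source E at the target angle \theta
\begin{pmatrix} x \\ y \end{pmatrix}
  &= A \begin{pmatrix} H_{\theta,L}\,E \\ H_{\theta,R}\,E \end{pmatrix}
\end{aligned}
```
  • In this reading, the anti-crosstalk function applied to the source E combines the cancellation matrix A with the sound source head-related transfer functions for the target angle, which matches the statement that it is calculated from the speaker crosstalk matrix and the sound source related transfer function.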
  • the audio processing method of the embodiment of the present invention is also applicable to a multi-speaker scenario.
  • the core problem in a multi-speaker scenario is similar to that of dual speakers, namely the crosstalk between external speakers and the additional transfer function of each external speaker; it can be understood that the more external speakers there are, the more severe the crosstalk between them, and the more complex the problem becomes.
  • the crosstalk cancellation matrix A can be calculated based on the principle of superposition of the audio signals at the user's left and right ears, and the audio signal to be played is multiplied by the crosstalk cancellation matrix A to calculate the anti-crosstalk function.
  • A is the size of n*m, n is the number of speakers, and m is the number of virtual sound sources.
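  • For the multi-speaker case, one common way to obtain a cancellation matrix A of size n*m (n speakers, m virtual sound sources) is a per-frequency regularized least-squares (pseudo-inverse) design; the numpy sketch below is a generic illustration under that assumption, not the patent's exact computation.

```python
import numpy as np

def crosstalk_cancellation_matrix(H_ear_from_spk, H_ear_from_src, reg=1e-3):
    """Per-frequency cancellation matrix A with shape (n_speakers, m_sources).

    H_ear_from_spk: (2, n_speakers) complex HRTFs from each speaker to the two ears.
    H_ear_from_src: (2, m_sources) complex HRTFs from each virtual source to the two ears.
    reg: Tikhonov regularization keeping the inversion stable at ill-conditioned frequencies.
    """
    H = np.asarray(H_ear_from_spk)
    gram = H @ H.conj().T + reg * np.eye(H.shape[0])      # (2, 2)
    H_pinv = H.conj().T @ np.linalg.inv(gram)             # (n_speakers, 2)
    # Speaker feeds A @ sources then reproduce the target binaural response at the ears
    return H_pinv @ np.asarray(H_ear_from_src)            # (n_speakers, m_sources)
```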
  • the audio processing method may also include:
  • function approximation processing can include interpolation methods or curve fitting methods, etc.
  • Technical personnel can choose an appropriate function approximation processing method according to actual application requirements.
  • the step "based on the signal transmission angle, determine the speaker head-related transfer function corresponding to the signal transmission angle” may specifically include:
  • the step "based on the target sound source angle, determine the sound source related transfer function corresponding to the target sound source angle" may specifically include:
  • a sound source related transfer function corresponding to the target sound source angle is determined.
  • the data in the preset head-related transfer function library can be obtained from an existing database.
  • for example, the data in the head-related transfer function library in this embodiment can be obtained from the CIPIC database.
  • embodiments of the present invention can perform interpolation processing according to the CIPIC database to obtain a target head-related transfer function that includes HRTFs corresponding to more precise angles.
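  • One simple form of the function approximation mentioned above is linear interpolation between the two nearest measured angles of a discrete library such as CIPIC; the sketch below assumes the library is a dict mapping azimuth in degrees to a head-related impulse response, and time-domain interpolation is used only as a simplification (magnitude/phase interpolation is often preferred in practice).

```python
import numpy as np

def interpolate_hrir(hrir_by_angle, target_angle_deg):
    """Linearly interpolate an HRIR for an azimuth not present in the discrete library.

    hrir_by_angle: dict {azimuth_deg: np.ndarray impulse response} of measured angles.
    """
    angles = np.array(sorted(hrir_by_angle))
    target = target_angle_deg % 360.0
    if target in hrir_by_angle:
        return hrir_by_angle[target]
    # Nearest measured angles below and above the target, wrapping around 0/360 degrees
    lower = angles[angles <= target].max() if np.any(angles <= target) else angles[-1]
    upper = angles[angles > target].min() if np.any(angles > target) else angles[0]
    span = (upper - lower) % 360.0 or 360.0
    w = ((target - lower) % 360.0) / span
    return (1.0 - w) * hrir_by_angle[lower] + w * hrir_by_angle[upper]
```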
  • head-related transfer function libraries of different users can be established in advance.
  • the target user who is to receive the target playback audio signal can be determined through the information of the account logged in to the terminal or application, the facial feature information of the target user, and so on.
  • the speaker head-related transfer function and the sound source-related transfer function are obtained from the target user's head-related transfer function library.
  • the audio processing method provided by the embodiment of the present invention can be applied to achieve the virtual 5.1 surround sound effect using dual external speakers.
  • the bass path is not considered, and it is assumed that the dual external speakers and the front left and front right speakers in a 5.1-channel scene are at the same angles, that is, the 30° and 330° positions.
  • Hs is the HRTF from the front left speaker to the same-side ear, and Ho is the HRTF from the front left speaker to the opposite-side ear; since the front left speaker and the front right speaker are symmetrical, Hs is also the HRTF from the front right speaker to the same-side ear, and Ho is also the HRTF from the front right speaker to the opposite-side ear.
  • HL and HR respectively represent the HRTF from the left rear and right rear speakers to both ears at f frequency at different angles.
  • the two angle parameters required for HL and HR are known in the simulated 5.1 scenario, that is, the positions of the left and right rear speakers, 120° and 240°.
  • the parameters to be determined are 8 transfer functions, that is, 8 functions from the four directions to both ears.
  • the target playback audio signal of the left speaker can include the audio signal to be played that simulates the front left channel, the audio signal to be played that simulates the front channel, the audio signal that, after being processed by the anti-crosstalk function, can offset the crosstalk effect transmitted from the left speaker to the right ear, and the audio signal that, after being processed by the anti-crosstalk function, can offset the crosstalk effect from the right speaker to the left ear.
  • the content of the target audio signal to be played by the right speaker is similar to the above, and will not be described again here.
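  • One plausible reading of how the two speaker feeds are assembled in this virtual 5.1 case (the exact combination in the patent may differ) is that the rear channels are first rendered binaurally with the rear-position HRTFs HL and HR and then passed through the crosstalk-cancellation matrix A(f) built from Hs and Ho, while the front left and front right channels already sit at the real speaker angles:

```latex
\begin{aligned}
d_L(f) &= H_{120^{\circ},L}(f)\,S_{RL}(f) + H_{240^{\circ},L}(f)\,S_{RR}(f) \\
d_R(f) &= H_{120^{\circ},R}(f)\,S_{RL}(f) + H_{240^{\circ},R}(f)\,S_{RR}(f) \\[4pt]
\begin{pmatrix} x(f) \\ y(f) \end{pmatrix}
  &= \begin{pmatrix} S_{FL}(f) \\ S_{FR}(f) \end{pmatrix}
   + A(f)\begin{pmatrix} d_L(f) \\ d_R(f) \end{pmatrix}
\end{aligned}
```
  • Here S_FL, S_FR, S_RL and S_RR denote the front-left, front-right, rear-left and rear-right channel signals, and x(f), y(f) are the left- and right-speaker feeds.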
  • the audio processing method provided by the embodiment of the present invention can be applied to achieve the virtual sound field broadening effect of using dual external speakers.
  • the sound field of dual external speakers is widened, which is equivalent to using the front left and front right channels to virtualize the rear left and rear right channels in a 5-channel situation.
  • HRTF is generally collected through sound pickup in the inner auditory canal; that is to say, the HRTF measured by this acquisition method not only includes the effect of air transmission outside the ear canal, but also includes the impact of the resonance generated by the human ear canal. If this HRTF is directly applied to the signal, although the user can hear an obvious sense of space, the mid frequencies will be distorted, because the signal actually heard by the user is then the result of two ear canal resonance effects.
  • Therefore, an anti-crosstalk resonance function can be designed to eliminate the effects of crosstalk and ear canal resonance. The step "Based on the anti-crosstalk function, perform signal transformation on the audio signal to be processed from each of the external speakers to obtain the target playback audio signal corresponding to the audio signal to be played" may specifically include:
  • the anti-crosstalk resonance function is used to eliminate the crosstalk generated by the at least two external speakers when the audio is externally played and the impact of the resonance of the external auditory canal of the human ear;
  • signal transformation is performed on the audio signals to be processed from each of the external speakers to obtain a target playback audio signal corresponding to the audio signal to be played.
  • the anti-crosstalk resonance function can be expressed by the following formula:
  • Because the denominator of the anti-crosstalk resonance function is equivalent to the average frequency energy spectrum of the left and right channels, it does not affect the phase between frequency points; the left and right channels are divided by the same number at the same frequency point, which does not affect the energy relationship of the left and right channels at that frequency point, and the energy of the transmitted signals of the left and right channels remains unchanged after passing through the anti-crosstalk resonance function. Therefore, through the processing of the anti-crosstalk resonance function, the signal remains stable, and the poles at each frequency point are also eliminated.
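  • The formula itself is not reproduced here; a form consistent with the description, where the crosstalk-cancelled left and right feeds x(f) and y(f) are both divided per frequency by the average energy spectrum of the two channels (a real-valued scalar that leaves the phase and the left/right energy ratio untouched), would be the following, with the exact quantity in the denominator being an assumption of this sketch:

```latex
\begin{pmatrix} \tilde{x}(f) \\ \tilde{y}(f) \end{pmatrix}
  = \frac{1}{\tfrac{1}{2}\left(\lvert x(f)\rvert^{2} + \lvert y(f)\rvert^{2}\right)}
    \begin{pmatrix} x(f) \\ y(f) \end{pmatrix}
```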
  • the audio signal to be played can be processed accordingly according to the user's volume setting, etc.
  • the audio processing method provided by the embodiment of the present invention also includes:
  • the audio adjustment function corresponding to the to-be-processed audio signal is calculated.
  • the target user is a user logged in to the audio playback device or a user currently using the audio playback device.
  • the audio playback preference parameters are generated based on the playback effect adjustment operation performed by the target user before determining the audio adjustment parameters.
  • the audio playback preference parameters may include but are not limited to audio frequency (pitch) adjustment parameters, audio loudness adjustment parameters, and so on.
  • Extracting audio parameters from the audio signal to be processed may include performing audio analysis on the audio signal to be processed to obtain specific parameters such as frequency, number of sampling bits, number of channels, loudness, bit rate, etc. corresponding to the audio signal to be processed; or, performing feature extraction on the audio signal to be processed to obtain a feature vector corresponding to the audio signal to be processed, where the feature vector can represent the parameter characteristics of the audio signal to be processed, and so on.
  • the step "based on the anti-crosstalk function, perform signal transformation on the audio signal to be processed to obtain the target playback audio signal corresponding to the audio signal to be processed” includes:
  • the audio signal to be processed is signal transformed to obtain the target playback audio signal corresponding to the audio signal to be processed.
  • the process of calculating the audio adjustment function corresponding to the audio signal to be processed may be to perform operations such as amplifying or reducing the parameters to be played, so that when the target playback audio signal processed by the audio adjustment function is played based on the playback setting parameters, the audio playback preference parameters can be met.
  • the parameters to be played in the form of matrices or vectors can also be convolved or multiplied, etc.
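  • A minimal sketch of one such adjustment, assuming the preference is expressed as a target loudness (RMS) and the playback setting as a device output gain (both parameter names are assumptions for this example):

```python
import numpy as np

def apply_loudness_preference(signal, preferred_rms, device_gain=1.0):
    """Scale the target playback audio signal so that, after the device's own output gain,
    its RMS level approaches the user's preferred loudness."""
    current_rms = float(np.sqrt(np.mean(np.square(signal)))) + 1e-12
    gain = preferred_rms / (current_rms * device_gain)
    return np.asarray(signal) * gain
```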
  • the audio processing method provided by the embodiment of the present invention may also include:
  • the target playback audio signal corresponding to each of the audio signals to be played is sent to the external speaker corresponding to each of the audio signals to be played, and the external speaker is triggered to play the corresponding target playback audio signal.
  • after the target playback audio signal is obtained through processing in the cloud or at the signal receiving end, the target playback audio signal can be sent to the corresponding external speaker for playback.
  • In summary, the embodiment of the present invention can obtain the audio signal to be played from each of the at least two external speakers and the target sound source angle corresponding to each audio signal to be played, and determine the signal transmission angle between each external speaker and the user's head; based on the signal transmission angle and the target sound source angle, the anti-crosstalk function corresponding to the signal transmission angle is calculated, where the anti-crosstalk function is used to eliminate the crosstalk generated by the at least two external speakers when the audio is played externally.
  • Based on the target sound source angle, the signal transmission angle and the anti-crosstalk function, signal transformation is performed on the audio signal to be processed of each external speaker to obtain the target playback audio signal corresponding to the audio signal to be played. Because in the embodiment of the present invention the anti-crosstalk function can eliminate the crosstalk generated in the open sound field, and the audio signal to be played is also transformed based on the target sound source angle and the anti-crosstalk function, the target playback audio signal can express the sound source position information and offset the crosstalk generated during playback. Therefore, it is possible to generate an audio signal that expresses the direction of the sound source without adding additional audio playback equipment, and to eliminate the crosstalk generated when using external speakers to play audio, so that users can enjoy audio with better playback effects.
  • an embodiment of the present invention also provides an audio processing device.
  • the device includes:
  • the signal acquisition unit 701 is used to acquire the audio signal to be played by each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played;
  • Angle determination unit 702, used to determine the signal transmission angle between each of the external speakers and the user's head;
  • Function calculation unit 703 configured to calculate an anti-crosstalk function corresponding to the signal transmission angle based on the signal transmission angle and the target sound source angle.
  • the anti-crosstalk function is used to eliminate the cross-talk generated by the at least two external speakers when playing audio externally;
  • the signal conversion unit 704 is configured to perform signal conversion on the audio signals to be processed from each of the external speakers based on the anti-crosstalk function to obtain a target playback audio signal corresponding to the audio signal to be played.
  • the function calculation unit 703 is configured to determine, based on the signal transmission angle, the speaker head-related transfer function corresponding to the signal transmission angle;
  • determine, based on the target sound source angle, the sound source head-related transfer function corresponding to the target sound source angle; and calculate, according to the speaker head-related transfer function and the sound source head-related transfer function, the anti-crosstalk function corresponding to the signal transmission angle.
  • the at least two external speakers include a left speaker and a right speaker, and the angle determination unit 702 is used to determine the left signal transmission angle between the left speaker and the user's head, and to determine the right signal transmission angle between the right speaker and the user's head;
  • the function calculation unit is configured to determine, based on the left signal transmission angle, a first left ear head-related transfer function between the left speaker and the user's left ear, and a first right ear head-related transfer function between the left speaker and the user's right ear;
  • based on the right signal transmission angle, a second left ear head-related transfer function between the right speaker and the user's left ear, and a second right ear head-related transfer function between the right speaker and the user's right ear are determined;
  • the first left ear head-related transfer function, the first right ear head-related transfer function, the second left ear head-related transfer function and the second right ear head-related transfer function are used as the speaker head-related transfer functions.
  • the at least two external speakers include a left speaker and a right speaker
  • the angle determination unit 702 is used to determine the positional relationship between the left speaker, the right speaker and the user's head;
  • if the left speaker and the right speaker are left-right symmetrical with respect to the user's head, the angle between either external speaker and the user's head is used as the signal transmission angle;
  • the function calculation unit 703 is configured to determine, based on the signal transmission angle, a first head-related transfer function between the left speaker and the user's left ear and between the right speaker and the user's right ear, and a second head-related transfer function between the left speaker and the user's right ear and between the right speaker and the user's left ear;
  • the first head-related transfer function and the second head-related transfer function are regarded as speaker head-related transfer functions.
  • the function calculation unit 703 is configured to perform matrix merging processing according to the speaker head related transfer function to obtain the speaker crosstalk matrix corresponding to the audio signal to be processed;
  • matrix cancellation is performed on the speaker crosstalk matrix to calculate the crosstalk cancellation matrix of the speaker crosstalk matrix; and based on the crosstalk cancellation matrix and the sound source head-related transfer function, the anti-crosstalk function corresponding to the signal transmission angle is calculated (a numerical sketch follows).
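  • A minimal numerical sketch (not the patent's reference implementation) of the matrix steps above: the four speaker head-related transfer functions are arranged into a 2×2 speaker crosstalk matrix per frequency bin and inverted to obtain the crosstalk cancellation matrix. The regularisation term `eps` is an implementation detail added for the example, not a value taken from the disclosure.

    import numpy as np

    def crosstalk_cancellation_matrix(H_ll, H_rl, H_lr, H_rr, eps=1e-6):
        # H_xy: complex frequency response from speaker x (l/r) to ear y (l/r),
        # one value per frequency bin (e.g. the rFFT of a measured HRIR).
        n_bins = len(H_ll)
        C_inv = np.empty((n_bins, 2, 2), dtype=complex)
        for k in range(n_bins):
            # Speaker crosstalk matrix for this bin: rows = ears, columns = speakers.
            C = np.array([[H_ll[k], H_rl[k]],
                          [H_lr[k], H_rr[k]]])
            # Regularised inverse so nearly singular bins do not blow up.
            C_inv[k] = np.linalg.inv(C + eps * np.eye(2))
        return C_inv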
  • the audio processing device provided by the embodiment of the present invention also includes a function processing unit 705, used to obtain a preset discrete head-related transfer function, and to perform function approximation processing on the discrete head-related transfer function to obtain a target head-related transfer function;
  • the function calculation unit is configured to determine the speaker head-related transfer function corresponding to the signal transmission angle based on the signal transmission angle and the target head-related transfer function;
  • based on the target sound source angle and the target head-related transfer function, a sound source head-related transfer function corresponding to the target sound source angle is determined.
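  • One possible reading of the "function approximation" step: the preset head-related transfer functions are only available at discrete azimuths, so a response at an arbitrary angle is approximated by interpolating between the two nearest measurements. The measurement grid and the linear interpolation used here are assumptions of this sketch, not details given by the disclosure.

    import numpy as np

    def hrtf_at_angle(angle_deg, measured_angles, measured_hrirs):
        # measured_angles: sorted azimuths in degrees within [0, 360);
        # measured_hrirs: array of shape (n_angles, hrir_length).
        a = angle_deg % 360.0
        idx = np.searchsorted(measured_angles, a)
        lo = (idx - 1) % len(measured_angles)
        hi = idx % len(measured_angles)
        a_lo, a_hi = measured_angles[lo], measured_angles[hi]
        span = (a_hi - a_lo) % 360.0 or 360.0
        w = ((a - a_lo) % 360.0) / span
        # Linear blend of the two nearest measured responses.
        return (1.0 - w) * measured_hrirs[lo] + w * measured_hrirs[hi]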
  • the signal acquisition unit 701 is used to acquire the audio signal to be played by each of the at least two external speakers, and to perform sound source position localization on the audio signals to be played to determine the target sound source angle corresponding to each of the audio signals to be played (one possible localization sketch is shown below);
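  • The disclosure does not fix a particular localization algorithm; as an illustration only, the sketch below estimates a panning angle from the inter-channel level difference of a stereo signal. The ±30° speaker aperture and the constant-power panning model are choices made for this example, not values taken from the patent.

    import numpy as np

    def estimate_pan_angle(left, right, aperture_deg=30.0):
        # Inter-channel level difference of the stereo signal.
        rms_l = np.sqrt(np.mean(np.square(left)) + 1e-12)
        rms_r = np.sqrt(np.mean(np.square(right)) + 1e-12)
        # Constant-power panning model: left = cos(p), right = sin(p), p in [0, pi/2].
        p = np.arctan2(rms_r, rms_l)
        # p = pi/4 (equal levels) maps to 0 degrees; the extremes map to +/-aperture.
        return (p - np.pi / 4.0) / (np.pi / 4.0) * aperture_deg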
  • the signal acquisition unit 701 is configured to acquire the audio signal to be played from each of the at least two external speakers and the video frame to be played corresponding to the audio signal to be played;
  • the sounding position information of the sounding object is determined from the video frame to be played, and based on the sounding position information of the sounding object, the target sound source angle corresponding to each of the audio signals to be played is calculated.
  • the video frame to be played includes at least one candidate voicing object.
  • the audio processing device provided by the embodiment of the present invention also includes a voicing object determination unit 706, used to determine the voicing object corresponding to the audio signal to be played and obtain the object identification information of the voicing object;
  • the signal acquisition unit 701 is configured to perform information matching based on the object identification information from the candidate voicing objects included in the video frame to be played, and determine the voicing object;
  • the video frame to be played includes a display area of at least one candidate voicing object;
  • the signal acquisition unit 701 is configured to perform voicing action detection for each display area in the video frame to be played, and if it is detected that a candidate voicing object in a display area has performed a voicing action, to use the candidate voicing object as the voicing object; the target display area of the voicing object in the video frame to be played is then obtained, and the position information of the target display area is used as the voicing position information of the voicing object (a control-flow sketch follows);
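  • A control-flow sketch of the per-display-area check described above. The detector `detect_mouth_motion` is hypothetical (for instance, a lip-movement classifier); it is not part of the disclosure and is only stubbed in to show how a candidate becomes the voicing object.

    def find_sounding_object(display_areas, detect_mouth_motion):
        # display_areas: list of (candidate_object, region) pairs for one video frame.
        for candidate, region in display_areas:
            if detect_mouth_motion(region):   # hypothetical voicing-action detector
                # The first region in which a voicing action is detected supplies
                # both the voicing object and its voicing position (the region).
                return candidate, region
        return None, None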
  • the audio processing device provided by the embodiment of the present invention also includes a first position determination unit 707, configured to determine, in response to the user's position selection operation, the sound receiving position information corresponding to the user in the video frame to be played;
  • the signal acquisition unit 701 is configured to determine the signal transmission direction between the sounding position and the sound receiving position based on the sounding position information of the sounding object and the sound receiving position information;
  • according to the signal transmission direction, the target sound source angle corresponding to each audio signal to be played is calculated (a geometric sketch is given below).
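  • A geometric sketch of the angle computation, assuming both the sounding position and the sound receiving position are given as 2-D coordinates in the video frame; the coordinate convention below is an assumption made for the example.

    import math

    def target_source_angle(sounding_xy, receiving_xy):
        dx = sounding_xy[0] - receiving_xy[0]    # positive: source to the right
        dy = receiving_xy[1] - sounding_xy[1]    # positive: source further "ahead" (up in the image)
        # 0 degrees = straight ahead, +90 = hard right, -90 = hard left.
        return math.degrees(math.atan2(dx, dy))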
  • the audio processing device provided by the embodiment of the present invention further includes a second position determination unit, configured to determine, in response to the user's reference object selection operation, the reference object corresponding to the user in the video frame to be played, and to obtain the reference position information of the reference object in the video frame to be played;
  • the signal acquisition unit 701 is configured to determine, based on the sounding position information of the sounding object and the reference position information, the signal transmission angle between the sounding object and the reference object, as the target sound source angle corresponding to each of the audio signals to be played.
  • the signal acquisition unit 701 is configured to receive a data packet to be played sent by the signal sending end.
  • the data packet to be played is obtained by encoding based on the audio signal to be played of each external speaker and the target sound source angle corresponding to each audio signal to be played;
  • the data packet to be played is decoded to obtain the audio signal to be played from each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played.
  • the signal acquisition unit 701 is used to receive the data packet to be played sent by the cloud.
  • the data packet to be played is sent to the cloud by the signal sending end.
  • the data packet to be played is obtained by the signal sending end by encoding based on the audio signal to be played of each external speaker and the target sound source angle corresponding to each of the audio signals to be played;
  • the data packet to be played is decoded to obtain the audio signal to be played from each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played.
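  • The packet layout is not defined by the disclosure; the sketch below uses an arbitrary illustrative format (a float32 angle and a uint32 sample count followed by int16 PCM samples) simply to show the encode/decode round trip between the signal sending end and the receiving end.

    import struct
    import numpy as np

    def encode_packet(angle_deg, samples_int16):
        # Header: target sound source angle (float32) + sample count (uint32).
        header = struct.pack("<fI", float(angle_deg), len(samples_int16))
        return header + np.asarray(samples_int16, dtype="<i2").tobytes()

    def decode_packet(packet):
        angle_deg, n = struct.unpack_from("<fI", packet, 0)
        samples = np.frombuffer(packet, dtype="<i2", offset=8, count=n)
        return angle_deg, samples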
  • the signal transformation unit 704 is configured to calculate an anti-crosstalk resonance function based on the anti-crosstalk function.
  • the anti-crosstalk resonance function is used to eliminate the cross-talk generated by the at least two external speakers when playing audio externally, as well as the influence of the resonance of the external auditory canal of the human ear;
  • based on the anti-crosstalk resonance function, signal transformation is performed on the audio signals to be processed of each of the external speakers to obtain the target playback audio signal corresponding to the audio signal to be played (one possible realization is sketched below).
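  • One possible realization of this signal transformation, applying the per-bin 2×2 cancellation matrix from the earlier sketch in the frequency domain. The optional `ear_canal_eq` array stands in for the ear-canal resonance compensation; its actual form is not specified here and is an assumption of the example.

    import numpy as np

    def transform_to_playback(x_left, x_right, C_inv, ear_canal_eq=None):
        # x_left / x_right: the signals to be processed for the two channels.
        # C_inv: per-bin 2x2 cancellation matrices; it must contain one matrix per
        # rFFT bin of the signal length n (i.e. n // 2 + 1 matrices).
        n = len(x_left)
        X = np.stack([np.fft.rfft(x_left, n), np.fft.rfft(x_right, n)])   # (2, bins)
        if ear_canal_eq is not None:
            X = X * ear_canal_eq     # compensate ear-canal resonance (assumed EQ curve)
        # Per-bin matrix multiply: Y[:, k] = C_inv[k] @ X[:, k]
        Y = np.einsum("kij,jk->ik", C_inv, X)
        # Back to the time domain: the two speaker feed signals.
        return np.fft.irfft(Y[0], n), np.fft.irfft(Y[1], n)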
  • the audio processing device provided by the embodiment of the present invention also includes an adjustment parameter calculation unit, configured to obtain the playback setting parameters of each of the external speakers and the audio playback preference parameters corresponding to the target user, to perform audio parameter extraction on the audio signal to be processed to obtain the parameters to be played corresponding to the audio signal to be processed, and to calculate, according to the playback setting parameters, the audio playback preference parameters and the parameters to be played, the audio adjustment function corresponding to the audio signal to be processed;
  • the signal transformation unit 704 is configured to perform signal transformation on the audio signal to be processed based on the audio adjustment function, the target sound source angle, the signal transmission angle and the anti-crosstalk function, to obtain the target playback audio signal corresponding to the audio signal to be processed.
  • the audio processing device further includes an audio playback unit, configured to send the target playback audio signal corresponding to each of the audio signals to be played to the external speaker corresponding to that audio signal to be played, and to trigger the external speaker to play the corresponding target playback audio signal.
  • with the audio processing device, it is possible to generate audio signals that can express the direction of the sound source without adding audio playback equipment, and to eliminate the cross-talk generated when using external speakers to play audio, so that users can enjoy audio with a better playback effect.
  • an embodiment of the present invention also provides an electronic device, which can be a terminal or a server, etc. Figure 9 shows a schematic structural diagram of the electronic device involved in the embodiment of the present invention. Specifically:
  • the electronic device may include a radio frequency (RF, Radio Frequency) circuit 901, a memory 902 including one or more computer-readable storage media, an input unit 903, a display unit 904, a sensor 905, an audio circuit 906, a wireless fidelity (WiFi, Wireless Fidelity) module 907, a processor 908 including one or more processing cores, a power supply 909 and other components.
  • the RF circuit 901 can be used to receive and send information or signals during a call. In particular, after receiving the downlink information of the base station, it hands the information over to one or more processors 908 for processing; in addition, it sends uplink data to the base station.
  • the RF circuit 901 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a Subscriber Identity Module (SIM) card, a transceiver, a coupler, a low noise amplifier (LNA, Low Noise Amplifier), duplexer, etc.
  • RF circuit 901 can communicate with networks and other devices through wireless communications.
  • Wireless communication can use any communication standard or protocol, including but not limited to Global System of Mobile communication (GSM, Global System of Mobile communication), General Packet Radio Service (GPRS, General Packet Radio Service), Code Division Multiple Access (CDMA, Code Division Multiple Access), Wideband Code Division Multiple Access (WCDMA, Wideband Code Division Multiple Access), Long Term Evolution (LTE, Long Term Evolution), email, Short Message Service (SMS, Short Messaging Service), etc.
  • the memory 902 can be used to store software programs and modules, and the processor 908 executes various functional applications and data processing by running the software programs and modules stored in the memory 902 .
  • the memory 902 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function (such as a sound playback function, an image playback function, etc.), and the like; the storage data area may store data created according to the use of the electronic device (such as audio data, a phone book, etc.), and the like.
  • memory 902 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 902 may also include a memory controller to provide the processor 908 and the input unit 903 with access to the memory 902 .
  • the input unit 903 may be used to receive input numeric or character information and generate keyboard, mouse, joystick, optical or trackball signal input related to user settings and function control.
  • the input unit 903 may include a touch-sensitive surface as well as other input devices.
  • a touch-sensitive surface, also known as a touch screen or touchpad, can collect the user's touch operations on or near it (such as operations performed by the user on or near the touch-sensitive surface using a finger, a stylus, or any other suitable object or accessory), and drive the corresponding connection device according to a preset program.
  • the touch-sensitive surface may include two parts: a touch detection device and a touch controller.
  • the touch detection device detects the user's touch orientation, detects the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact point coordinates, sends them to the processor 908, and can receive commands sent by the processor 908 and execute them.
  • touch-sensitive surfaces can be implemented using a variety of types including resistive, capacitive, infrared, and surface acoustic waves.
  • the input unit 903 may also include other input devices. Specifically, other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys, switch keys, etc.), trackball, mouse, joystick, etc.
  • the display unit 904 may be used to display information input by the user or information provided to the user as well as various graphical user interfaces of the electronic device. These graphical user interfaces may be composed of graphics, text, icons, videos, and any combination thereof.
  • the display unit 904 may include a display panel, which may optionally be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED), or the like.
  • the touch-sensitive surface can cover the display panel; when the touch-sensitive surface detects a touch operation on or near it, the operation is sent to the processor 908 to determine the type of the touch event, and the processor 908 then provides a corresponding visual output on the display panel according to the type of the touch event.
  • although the touch-sensitive surface and the display panel are used as two independent components to implement the input and output functions, in some embodiments the touch-sensitive surface and the display panel can be integrated to implement the input and output functions.
  • the electronic device may also include at least one sensor 905, such as a light sensor, a motion sensor, and other sensors.
  • the light sensor may include an ambient light sensor and a proximity sensor.
  • the ambient light sensor may adjust the brightness of the display panel according to the brightness of the ambient light.
  • the proximity sensor may turn off the display panel and/or the backlight when the electronic device moves to the ear.
  • the gravity acceleration sensor can detect the magnitude of acceleration in various directions (usually three axes). It can detect the magnitude and direction of gravity when stationary.
  • the audio circuit 906, speaker, and microphone can provide an audio interface between the user and the electronic device.
  • the audio circuit 906 can transmit the electrical signal converted from the received audio data to the speaker, which converts it into a sound signal for output; on the other hand, the microphone converts the collected sound signal into an electrical signal, which is received by the audio circuit 906 and converted into audio data; after the audio data is processed by the audio data output processor 908, it is sent, for example, to another electronic device through the RF circuit 901, or output to the memory 902 for further processing.
  • Audio circuitry 906 may also include an earphone jack to provide communication between peripheral earphones and electronic devices.
  • WiFi is a short-distance wireless transmission technology. Electronic devices can help users send and receive emails, browse web pages, and access streaming media through the WiFi module 907. It provides users with wireless broadband Internet access.
  • although FIG. 9 shows the WiFi module 907, it can be understood that it is not a necessary component of the electronic device and can be omitted as needed without changing the essence of the invention.
  • the processor 908 is the control center of the electronic device; it uses various interfaces and lines to connect the various parts of the entire electronic device, and performs the various functions of the electronic device and processes data by running or executing the software programs and/or modules stored in the memory 902 and calling the data stored in the memory 902.
  • the processor 908 may include one or more processing cores; preferably, the processor 908 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interfaces, application programs, etc., and the modem processor mainly handles wireless communications. It can be understood that the above-mentioned modem processor may not be integrated into the processor 908.
  • the electronic device also includes a power supply 909 (such as a battery) that supplies power to various components.
  • the power supply can be logically connected to the processor 908 through a power management system, so that functions such as charging, discharging, and power consumption management can be implemented through the power management system.
  • Power supply 909 may also include one or more DC or AC power supplies, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and any other components.
  • the electronic device may also include a camera, a Bluetooth module, etc., which will not be described again here.
  • the processor 908 in the electronic device will load the executable files corresponding to the processes of one or more application programs into the memory 902 according to the following instructions, and the processor 908 will run the application programs stored in the memory 902 to implement various functions, as follows:
  • obtain the audio signal to be played of each of at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played;
  • determine the signal transmission angle between each of the external speakers and the user's head, and based on the signal transmission angle and the target sound source angle, calculate the anti-crosstalk function corresponding to the signal transmission angle;
  • the anti-crosstalk function is used to eliminate the cross-talk generated by the at least two external speakers when playing audio externally;
  • based on the anti-crosstalk function, perform signal transformation on the audio signals to be processed of each of the external speakers to obtain the target playback audio signal corresponding to the audio signal to be played.
  • embodiments of the present invention provide a computer-readable storage medium in which a plurality of instructions are stored, and the instructions can be loaded by a processor to execute the steps in any audio processing method provided by the embodiments of the present invention.
  • for example, the instructions can perform the following steps:
  • obtain the audio signal to be played of each of at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played;
  • determine the signal transmission angle between each of the external speakers and the user's head, and based on the signal transmission angle and the target sound source angle, calculate the anti-crosstalk function corresponding to the signal transmission angle;
  • the anti-crosstalk function is used to eliminate the cross-talk generated by the at least two external speakers when playing audio externally;
  • based on the anti-crosstalk function, perform signal transformation on the audio signals to be processed of each of the external speakers to obtain the target playback audio signal corresponding to the audio signal to be played.
  • the computer-readable storage medium may include: read-only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), magnetic disk or optical disk, etc.
  • a computer program product or computer program is also provided.
  • the computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium.
  • the processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the electronic device performs the methods provided in various optional implementations in the above embodiments.

Landscapes

  • Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Stereophonic System (AREA)

Abstract

The embodiments of the present invention disclose an audio processing method and apparatus, an electronic device, a storage medium, and a program product; the method comprises: obtaining from each of at least two external loudspeakers an audio signal to be played, and a target sound source angle corresponding to each audio signal to be played; determining a signal transmission angle between each external loudspeaker and the head of a user, and based on the signal transmission angle and the target sound source angle, calculating an anti-crosstalk function corresponding to the signal transmission angle, the anti-crosstalk function being used for eliminating crosstalk generated by the at least two external loudspeakers when playing audio externally; based on the target sound source angle, the signal transmission angle, and the anti-crosstalk function, performing signal transformation on the audio signals to be processed of each external loudspeaker, to obtain target playback audio signals corresponding to the audio signals to be played. The invention allows for generating an audio signal capable of representing the orientation of a sound source, and allows for eliminating crosstalk generated when external loudspeakers are used for playing audio, allowing a user to enjoy audio having a better playback effect.

Description

Audio processing method and apparatus, electronic device, storage medium, and program product
This disclosure claims priority to the Chinese patent application No. 202210940126.1, entitled "Audio processing method, apparatus, electronic device, storage medium and program product", filed with the China Patent Office on August 5, 2022, the entire contents of which are incorporated into this disclosure by reference.
Technical Field
The present invention relates to the technical field of audio processing, and specifically to audio processing methods, apparatuses, electronic devices, storage media and program products.
Background
With the current rapid development of the economy and technology, people have begun to pursue surround-sound listening effects in their daily lives. Achieving a surround-sound effect by arranging additional audio equipment requires a certain amount of time and economic resources.
At present, modern electroacoustic technology can also be used to achieve a stereo effect without adding audio equipment; however, with this solution the audio is generally only processed into two-channel audio data for playback.
Technical Problem
Such a solution cannot fully achieve a true surround-sound effect, and if it is applied in an external playback scenario, the already limited stereo playback effect will be further weakened due to factors such as crosstalk.
Technical Solutions
Embodiments of the present invention provide audio processing methods, apparatuses, electronic devices, storage media and program products, which can generate audio signals capable of expressing the direction of a sound source without adding audio playback equipment, and can eliminate the cross-talk generated when external speakers are used to play audio, so that users can enjoy audio with a better playback effect.
An embodiment of the present invention provides an audio processing method, including:
obtaining the audio signal to be played of each of at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played;
determining the signal transmission angle between each of the external speakers and the user's head;
based on the signal transmission angle and the target sound source angle, calculating the anti-crosstalk function corresponding to the signal transmission angle, the anti-crosstalk function being used to eliminate the cross-talk generated by the at least two external speakers when playing audio externally;
based on the anti-crosstalk function, performing signal transformation on the audio signal to be processed of each of the external speakers to obtain the target playback audio signal corresponding to the audio signal to be played.
Correspondingly, an embodiment of the present invention provides an audio processing apparatus, including:
a signal acquisition unit, configured to acquire the audio signal to be played of each of at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played;
an angle determination unit, configured to determine the signal transmission angle between each of the external speakers and the user's head;
a function calculation unit, configured to calculate, based on the signal transmission angle and the target sound source angle, the anti-crosstalk function corresponding to the signal transmission angle, the anti-crosstalk function being used to eliminate the cross-talk generated by the at least two external speakers when playing audio externally;
a signal transformation unit, configured to perform, based on the anti-crosstalk function, signal transformation on the audio signal to be processed of each of the external speakers to obtain the target playback audio signal corresponding to the audio signal to be played.
Optionally, the function calculation unit is configured to determine, based on the signal transmission angle, the speaker head-related transfer function corresponding to the signal transmission angle;
determine, based on the target sound source angle, the sound source head-related transfer function corresponding to the target sound source angle;
and calculate, according to the speaker head-related transfer function and the sound source head-related transfer function, the anti-crosstalk function corresponding to the signal transmission angle.
Optionally, the at least two external speakers include a left speaker and a right speaker, and the angle determination unit is configured to determine the left signal transmission angle between the left speaker and the user's head;
and determine the right signal transmission angle between the right speaker and the user's head;
the function calculation unit is configured to determine, based on the left signal transmission angle, a first left-ear head-related transfer function between the left speaker and the user's left ear, and a first right-ear head-related transfer function between the left speaker and the user's right ear;
determine, based on the right signal transmission angle, a second left-ear head-related transfer function between the right speaker and the user's left ear, and a second right-ear head-related transfer function between the right speaker and the user's right ear;
and use the first left-ear head-related transfer function, the first right-ear head-related transfer function, the second left-ear head-related transfer function and the second right-ear head-related transfer function as the speaker head-related transfer functions.
Optionally, the at least two external speakers include a left speaker and a right speaker, and the angle determination unit is configured to determine the positional relationship between the left speaker, the right speaker and the user's head;
and, if the left speaker and the right speaker are left-right symmetrical with respect to the user's head, use the angle between either external speaker and the user's head as the signal transmission angle;
the function calculation unit is configured to determine, based on the signal transmission angle, a first head-related transfer function between the left speaker and the user's left ear and between the right speaker and the user's right ear, and a second head-related transfer function between the left speaker and the user's right ear and between the right speaker and the user's left ear;
and use the first head-related transfer function and the second head-related transfer function as the speaker head-related transfer functions.
Optionally, the function calculation unit is configured to perform matrix merging processing according to the speaker head-related transfer functions to obtain the speaker crosstalk matrix corresponding to the audio signal to be processed;
perform matrix cancellation on the speaker crosstalk matrix to calculate the crosstalk cancellation matrix of the speaker crosstalk matrix;
and calculate, based on the crosstalk cancellation matrix and the sound source head-related transfer function, the anti-crosstalk function corresponding to the signal transmission angle.
Optionally, the audio processing apparatus provided by the embodiment of the present invention further includes a function processing unit, configured to obtain a preset discrete head-related transfer function;
and perform function approximation processing on the discrete head-related transfer function to obtain a target head-related transfer function;
the function calculation unit is configured to determine, based on the signal transmission angle and the target head-related transfer function, the speaker head-related transfer function corresponding to the signal transmission angle;
and determine, based on the target sound source angle and the target head-related transfer function, the sound source head-related transfer function corresponding to the target sound source angle.
Optionally, the signal acquisition unit is configured to acquire the audio signal to be played of each of the at least two external speakers;
and perform sound source position localization on the audio signals to be played to determine the target sound source angle corresponding to each of the audio signals to be played.
Optionally, the signal acquisition unit is configured to acquire the audio signal to be played of each of the at least two external speakers and the video frame to be played corresponding to the audio signal to be played;
determine, from the video frame to be played, the sounding position information of the sounding object;
and calculate, based on the sounding position information of the sounding object, the target sound source angle corresponding to each of the audio signals to be played.
Optionally, the video frame to be played includes at least one candidate sounding object, and the audio processing apparatus provided by the embodiment of the present invention further includes a sounding object determination unit, configured to determine the sounding object corresponding to the audio signal to be played and obtain the object identification information of the sounding object;
the signal acquisition unit is configured to perform information matching based on the object identification information among the candidate sounding objects included in the video frame to be played, to determine the sounding object;
and obtain the target display area of the sounding object in the video frame to be played, using the position information of the target display area as the sounding position information of the sounding object.
Optionally, the video frame to be played includes a display area of at least one candidate sounding object, and the signal acquisition unit is configured to perform sounding action detection for each of the display areas in the video frame to be played, and if it is detected that a candidate sounding object in a display area has performed a sounding action, use the candidate sounding object as the sounding object;
and obtain the target display area of the sounding object in the video frame to be played, using the position information of the target display area as the sounding position information of the sounding object.
Optionally, the audio processing apparatus provided by the embodiment of the present invention further includes a first position determination unit, configured to determine, in response to a position selection operation of the user, the sound receiving position information corresponding to the user in the video frame to be played;
the signal acquisition unit is configured to determine, based on the sounding position information of the sounding object and the sound receiving position information, the signal transmission direction between the sounding position and the sound receiving position;
and calculate, according to the signal transmission direction, the target sound source angle corresponding to each of the audio signals to be played.
Optionally, the audio processing apparatus provided by the embodiment of the present invention further includes a second position determination unit, configured to determine, in response to a reference object selection operation of the user, the reference object corresponding to the user in the video frame to be played;
and obtain the reference position information of the reference object in the video frame to be played;
the signal acquisition unit is configured to determine, based on the sounding position information of the sounding object and the reference position information, the signal transmission angle between the sounding object and the reference object, as the target sound source angle corresponding to each of the audio signals to be played.
Optionally, the signal acquisition unit is configured to receive a data packet to be played sent by the signal sending end, the data packet to be played being obtained by encoding based on the audio signal to be played of each external speaker and the target sound source angle corresponding to each of the audio signals to be played;
and decode the data packet to be played to obtain the audio signal to be played of each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played.
Optionally, the signal acquisition unit is configured to receive a data packet to be played sent by the cloud, the data packet to be played being sent to the cloud by the signal sending end, and the data packet to be played being obtained by the signal sending end by encoding based on the audio signal to be played of each external speaker and the target sound source angle corresponding to each of the audio signals to be played;
and decode the data packet to be played to obtain the audio signal to be played of each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played.
Optionally, the signal transformation unit is configured to calculate an anti-crosstalk resonance function based on the anti-crosstalk function, the anti-crosstalk resonance function being used to eliminate the cross-talk generated by the at least two external speakers when playing audio externally and the influence of the resonance of the external auditory canal of the human ear;
and perform, based on the anti-crosstalk resonance function, signal transformation on the audio signal to be processed of each of the external speakers to obtain the target playback audio signal corresponding to the audio signal to be played.
Optionally, the audio processing apparatus provided by the embodiment of the present invention further includes an adjustment parameter calculation unit, configured to obtain the playback setting parameters of each of the external speakers and the audio playback preference parameters corresponding to the target user;
perform audio parameter extraction on the audio signal to be processed to obtain the parameters to be played corresponding to the audio signal to be processed;
and calculate, according to the playback setting parameters, the audio playback preference parameters and the parameters to be played, the audio adjustment function corresponding to the audio signal to be processed;
the signal transformation unit is configured to perform, based on the audio adjustment function, the target sound source angle, the signal transmission angle and the anti-crosstalk function, signal transformation on the audio signal to be processed to obtain the target playback audio signal corresponding to the audio signal to be processed.
Optionally, the audio processing apparatus provided by the embodiment of the present invention further includes an audio playback unit, configured to send the target playback audio signal corresponding to each of the audio signals to be played to the external speaker corresponding to each of the audio signals to be played, and trigger the external speaker to play the corresponding target playback audio signal.
Correspondingly, an embodiment of the present invention further provides an electronic device, including a memory and a processor; the memory stores an application program, and the processor is configured to run the application program in the memory to execute the steps in any audio processing method provided by the embodiments of the present invention.
Correspondingly, an embodiment of the present invention further provides a computer-readable storage medium storing a plurality of instructions, the instructions being suitable for being loaded by a processor to execute the steps in any audio processing method provided by the embodiments of the present invention.
In addition, an embodiment of the present invention further provides a computer program product, including a computer program or instructions, which, when executed by a processor, implements the steps in any audio processing method provided by the embodiments of the present invention.
Beneficial Effects
By adopting the solution of the embodiments of the present invention, the audio signal to be played of each of at least two external speakers and the target sound source angle corresponding to each audio signal to be played can be obtained, the signal transmission angle between each external speaker and the user's head can be determined, and the anti-crosstalk function corresponding to the signal transmission angle can be calculated based on the signal transmission angle and the target sound source angle, the anti-crosstalk function being used to eliminate the cross-talk generated by the at least two external speakers when playing audio externally; based on the target sound source angle, the signal transmission angle and the anti-crosstalk function, signal transformation is performed on the audio signal to be processed of each external speaker to obtain the target playback audio signal corresponding to the audio signal to be played. Because in the embodiments of the present invention an anti-crosstalk function that can eliminate the cross-talk generated in an open sound field is calculated, and the audio signal to be played is also transformed based on the target sound source angle and the anti-crosstalk function, the target playback audio signal can express the sound source position information and offset the cross-talk generated during playback. Therefore, it is possible to generate an audio signal that can express the direction of the sound source without adding audio playback equipment, and to eliminate the cross-talk generated when external speakers are used to play audio, so that users can enjoy audio with a better playback effect.
Description of the Drawings
In order to explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments will be briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those skilled in the art, other drawings can also be obtained from these drawings without creative effort.
Figure 1a is a schematic diagram of a scenario of the audio processing method provided by an embodiment of the present invention;
Figure 1b is a schematic diagram of another scenario of the audio processing method provided by an embodiment of the present invention;
Figure 2 is a flow chart of the audio processing method provided by an embodiment of the present invention;
Figure 3 is a schematic diagram of a video frame to be played provided by an embodiment of the present invention;
Figure 4 is another schematic diagram of a video frame to be played provided by an embodiment of the present invention;
Figure 5 is a schematic diagram of the ideal state and the actual state of audio signal transmission provided by an embodiment of the present invention;
Figure 6 is a schematic diagram of the technical implementation of virtual 5.1 surround sound provided by an embodiment of the present invention;
Figure 7 is a schematic structural diagram of the audio processing apparatus provided by an embodiment of the present invention;
Figure 8 is another schematic structural diagram of the audio processing apparatus provided by an embodiment of the present invention;
Figure 9 is a schematic structural diagram of the electronic device provided by an embodiment of the present invention.
Embodiments of the Invention
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, rather than all of them. Based on the embodiments of the present invention, all other embodiments obtained by those skilled in the art without creative effort fall within the scope of protection of the present invention.
Embodiments of the present invention provide an audio processing method and apparatus, an electronic device and a computer-readable storage medium. Specifically, embodiments of the present invention provide an audio processing method suitable for an audio processing apparatus, and the audio processing apparatus can be integrated in an electronic device.
The electronic device may be a terminal or other equipment, including but not limited to mobile terminals and fixed terminals; for example, mobile terminals include but are not limited to smartphones, smart watches, tablets, laptops, smart in-vehicle devices, etc., and fixed terminals include but are not limited to desktop computers, smart TVs, etc.
The electronic device may also be a server or other equipment. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms, but is not limited thereto.
The audio processing method in the embodiments of the present invention can be implemented by a server, or jointly by a terminal and a server.
The method is described below by taking the audio processing method jointly implemented by a terminal and a server as an example.
As shown in Figure 1a, the audio processing system provided by the embodiment of the present invention may include a signal sending end 10 and a signal receiving end 20; the signal sending end 10 and the signal receiving end 20 are connected through a network, for example a wired or wireless network connection, where the signal sending end 10 can exist as an electronic device that sends the audio signal to be played and the target sound source angle to the signal receiving end 20.
In some examples, the signal sending end 10 can be used to receive the user's voice input, generate the audio signal to be played, perform sound source analysis on the audio signal to be played to obtain the target sound source angle, and send the audio signal to be played and the target sound source angle to the signal receiving end 20.
The signal receiving end 20 can be used to obtain the audio signal to be played of each of at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played, determine the signal transmission angle between each of the external speakers and the user's head, calculate, based on the signal transmission angle and the target sound source angle, the anti-crosstalk function corresponding to the signal transmission angle, the anti-crosstalk function being used to eliminate the cross-talk generated by the at least two external speakers when playing audio externally, and perform, based on the anti-crosstalk function, signal transformation on the audio signal to be processed of each of the external speakers to obtain the target playback audio signal corresponding to the audio signal to be played.
It can be understood that, in some embodiments, the audio processing steps performed by the signal receiving end 20 can also be performed by the signal sending end 10, and the signal sending end 10 can directly send the signal-transformed target playback audio signal to the signal receiving end 20; this is not limited in the embodiments of the present invention.
As shown in Figure 1b, the audio processing system provided by the embodiment of the present invention may include a signal sending end 10, a signal receiving end 20 and a cloud 30; the signal sending end 10, the signal receiving end 20 and the cloud 30 are connected through a network, for example a wired or wireless network connection, where the signal sending end 10 can exist as an electronic device that sends the audio signal to be played and the target sound source angle to the cloud 30.
In some examples, the signal sending end 10 can be used to receive the user's voice input, generate the audio signal to be played, perform sound source analysis on the audio signal to be played to obtain the target sound source angle, and send the audio signal to be played and the target sound source angle to the cloud 30.
The cloud 30 can be used to forward the audio signal to be played and the target sound source angle to the signal receiving end 20.
The signal receiving end 20 can be used to obtain the audio signal to be played of each of at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played, determine the signal transmission angle between each of the external speakers and the user's head, calculate, based on the signal transmission angle and the target sound source angle, the anti-crosstalk function corresponding to the signal transmission angle, the anti-crosstalk function being used to eliminate the cross-talk generated by the at least two external speakers when playing audio externally, and perform, based on the anti-crosstalk function, signal transformation on the audio signal to be processed of each of the external speakers to obtain the target playback audio signal corresponding to the audio signal to be played.
It can be understood that, in some embodiments, the audio processing steps performed by the signal receiving end 20 can also be performed by the signal sending end 10 or the cloud 30; for example, the signal sending end 10 or the cloud 30 can directly send the signal-transformed target playback audio signal to the signal receiving end 20; this is not limited in the embodiments of the present invention. A receiving-end flow of this kind is sketched below.
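As a purely illustrative sketch of how the receiving-end steps could be chained together, the function below reuses the hypothetical helpers from the earlier sketches (decode_packet and transform_to_playback, which are not part of the disclosure) and a hypothetical play_on_speaker dispatch callback: the receiving end decodes each incoming packet, applies the cancellation matrices for the current speaker geometry, and hands the result to the corresponding external speakers.

    def process_received_packet(packet, C_inv, play_on_speaker):
        # decode_packet and transform_to_playback are the sketches given earlier;
        # play_on_speaker is a hypothetical callback, e.g. play_on_speaker("left", samples).
        angle_deg, samples = decode_packet(packet)
        x = samples.astype(float) / 32768.0      # int16 PCM -> float in [-1, 1)
        # For brevity the same signal feeds both channels here; a full system would
        # also apply the sound source HRTF for angle_deg before the cancellation.
        y_left, y_right = transform_to_playback(x, x, C_inv)
        play_on_speaker("left", y_left)
        play_on_speaker("right", y_right)
        return angle_deg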
以下分别进行详细说明。需要说明的是,以下实施例的描述顺序不作为对实施例优选顺序的限定。Each is explained in detail below. It should be noted that the order of description of the following embodiments does not limit the preferred order of the embodiments.
本发明实施例将从音频处理装置的角度进行描述,该音频处理装置具体可以集成在服务器或终端中。The embodiments of the present invention will be described from the perspective of an audio processing device. The audio processing device may be integrated in a server or a terminal.
如图2所示,本实施例的音频处理方法的具体流程可以如下:As shown in Figure 2, the specific process of the audio processing method in this embodiment can be as follows:
201. Obtain the audio signal to be played of each of at least two external speakers and the target sound source angle corresponding to each audio signal to be played.

In the embodiment of the present invention, an external speaker is a speaker used for audio playback in an open sound field, and the number of external speakers is at least two.

For example, the open sound field may be a scenario in which audio and video are played out loud through a household appliance such as a television or a smart speaker, or through a car audio system in a vehicle. The external speakers may be speakers provided in electronic devices such as televisions, mobile phones and in-vehicle terminals.

The audio signal to be played is the original audio signal obtained by the audio processing apparatus. For example, in a video conferencing scenario, the audio signal to be played may be the voice signal of the user who is speaking, as collected by the video conferencing client; in an audio and video playback scenario, the audio signal to be played may be an audio signal obtained from an audio file that needs to be played, and so on.

It can be understood that the audio signals to be played of the external speakers may be the same or different. For example, an audio file to be played may naturally include audio signals to be played in the left channel and the right channel respectively, in which case each external speaker corresponds to the audio signal of the left channel or of the right channel.

Specifically, the target sound source angle is the angle of the sound source that is expected to be presented to the user during audio playback. For example, during a video call, if the angle between the position of the speaking user in the video picture and the position of the user of the video call application in the video picture is 30°, the target sound source angle may be 30°.

The target sound source angles corresponding to the audio signals to be played may be the same or different. For example, if the audio signals to be played of the external speakers are the same, the target sound source angles are the same; in a scenario in which multiple users speak at the same time, different audio signals to be played may be obtained from the voice input signals of different users, and the target sound source angles corresponding to the different audio signals to be played may then be different.

Even if the audio signals to be played of the external speakers are different, the target sound source angles corresponding to the different audio signals to be played may be the same. For example, the speech of one user may be divided into audio signals to be played in the left channel and the right channel respectively; each external speaker then corresponds to the left-channel or right-channel audio signal, but both channels correspond to the same speaking user and therefore to the same target sound source angle.
In some optional embodiments, the target sound source angle may need to be obtained by localizing the audio signal to be played. Step 201 may specifically include:

obtaining the audio signal to be played of each of the at least two external speakers;

performing sound source position localization on the audio signals to be played, and determining the target sound source angle corresponding to each audio signal to be played.

The sound source position localization may be implemented using direction-of-arrival (DOA) estimation, microphone array localization and similar techniques. Specifically, the sound source position localization may be performed at the signal sending end that generates the audio signal to be played. For example, during a video conference, the terminal of the user who is speaking may obtain the target sound source angle through microphone (MIC) array localization.

Alternatively, the sound source position localization may be performed at the cloud or at the signal receiving end that obtains the audio signal to be played. For example, the signal receiving end may obtain the target sound source angle based on information such as the time delay between the moment at which the speaking user speaks and the moment at which the audio signal to be played is received.
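As an illustration only, delay-based localization for a two-microphone array can be sketched as follows; the function, the far-field assumption and the parameters (microphone spacing, sampling rate, speed of sound) are assumptions added for illustration and are not part of the claimed method.

```python
import numpy as np

def estimate_doa(mic_a, mic_b, sample_rate, mic_spacing_m, speed_of_sound=343.0):
    """Estimate a source angle from the delay between two microphone signals.

    Far-field sketch: the lag of the cross-correlation peak gives the
    inter-microphone delay, which maps to an angle via sin(theta) = c*tau/d.
    """
    corr = np.correlate(mic_a, mic_b, mode="full")
    lag = int(np.argmax(corr)) - (len(mic_b) - 1)     # delay in samples
    tau = lag / sample_rate                           # delay in seconds
    sin_theta = np.clip(speed_of_sound * tau / mic_spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```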
In other optional embodiments, the audio signal to be played may correspond to a video frame, and the target sound source angle may be obtained according to the position of the sounding object in the video frame. The step of "obtaining the audio signal to be played of each of the at least two external speakers and the target sound source angle corresponding to each audio signal to be played" may specifically include:

obtaining the audio signal to be played of each of the at least two external speakers and the video frame to be played corresponding to the audio signal to be played;

determining, from the video frame to be played, the sounding position information of the sounding object;

calculating, based on the sounding position information of the sounding object, the target sound source angle corresponding to each audio signal to be played.

The video frame to be played corresponding to the audio signal to be played may be the video frame synchronized with the audio signal to be played, or it may be that synchronized video frame together with the N video frames located before and/or after it in sequence.

Specifically, the value of N may be set by a technician according to the actual situation, which is not limited in the embodiments of the present invention.

For example, the video frames to be played may include the synchronized video frame of the audio signal to be played and the 10 video frames following it in sequence. The cloud or the signal receiving end may then detect the sounding object in advance based on those 10 subsequent video frames, improving the speed of audio processing.

The sounding position information may be the position of the display area of the sounding object in the video frame to be played. Alternatively, the sounding position information may be virtual position information of the sounding object in the virtual space corresponding to the video frame to be played, and so on.
In some examples, when the audio signal to be played is obtained, it may be determined which object in the video frame to be played the audio signal corresponds to; after the corresponding sounding object is determined, the target sound source angle may be calculated according to the position of that sounding object in the video frame to be played. That is, the video frame to be played may include at least one candidate sounding object, and before the step of "determining, from the video frame to be played, the sounding position information of the sounding object", the audio processing method provided by the embodiment of the present invention may further include:

determining the sounding object corresponding to the audio signal to be played, and obtaining object identification information of the sounding object.

The object identification information may be identification information capable of identifying the sounding object, such as the account nickname or a unique identity ID of the sounding object; the embodiments of the present invention do not limit the content or form of the object identification information.

Correspondingly, the step of "determining, from the video frame to be played, the sounding position information of the sounding object" may specifically include:

determining the sounding object from the candidate sounding objects included in the video frame to be played by performing information matching based on the object identification information;

obtaining the target display area of the sounding object in the video frame to be played, and using the position information of the target display area as the sounding position information of the sounding object.

For example, as shown in Figure 3, three different candidate sounding objects, user 1, user 2 and user 3, are displayed on a smart television. When user 1 speaks in a video conference, it can be determined, based on the application data of the video conferencing application, that the sounding object corresponding to the audio signal to be played is user 1, and the object identification information may be "user 1".

The smart television can then, based on the object identification information "user 1", determine the corresponding sounding object among the candidate sounding objects and the target display area of that sounding object in the video frame to be played.

It can be understood that the display area of each candidate sounding object may be fixed, in which case the position information of the target display area can be obtained once the sounding object corresponding to the audio signal to be played has been determined. Alternatively, the user may set the display areas of the candidate sounding objects, in which case the position information of the target display area needs to be determined according to the object identification information of the sounding object and the user's personalized setting information, as illustrated in the sketch below.
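As an illustration only, the following sketch maps the horizontal position of a target display area to a target sound source angle for a viewer assumed to sit centred in front of the screen; the screen width and viewing distance are assumed parameters, not values specified by this embodiment.

```python
import math

def angle_from_display_region(center_x_px, frame_width_px,
                              screen_width_m, viewing_distance_m):
    """Rough target sound source angle from where the sounding object is drawn.

    The horizontal pixel offset from the frame centre is converted to metres on
    the screen and then to an angle seen from the assumed viewing position.
    """
    offset_m = (center_x_px / frame_width_px - 0.5) * screen_width_m
    return math.degrees(math.atan2(offset_m, viewing_distance_m))
```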
In other examples, it is possible to determine which object is sounding by detecting, for the area in which each object in the video frame to be played is located, whether mouth movement or the like has occurred. That is, the video frame to be played may include the display area of at least one candidate sounding object, and the step of "determining, from the video frame to be played, the sounding position information of the sounding object" includes:

performing sounding action detection separately for each display area in the video frame to be played, and, if it is detected that a candidate sounding object in a display area has performed a sounding action, taking that candidate sounding object as the sounding object;

obtaining the target display area of the sounding object in the video frame to be played, and using the position information of the target display area as the sounding position information of the sounding object.

Specifically, the sounding action detection may only detect whether the mouth of each candidate sounding object has moved. Alternatively, to avoid misrecognition caused by mouth movements made while a candidate sounding object is not speaking, such as pursing the lips or coughing, facial muscle recognition may be added on top of mouth movement recognition to improve the accuracy of sounding action detection.

For example, as shown in Figure 3, the mouth of user 1 moves; in this case the candidate sounding object user 1 is the sounding object, and the target display area is the local area in which user 1 is located.
In other optional examples, the user may manually select his or her own position. Before the step of "calculating, based on the sounding position information of the sounding object, the target sound source angle corresponding to each audio signal to be played", the audio processing method provided by the embodiment of the present invention may further include:

determining, in response to a position selection operation of the user, the sound receiving position information corresponding to the user in the video frame to be played.

For example, during a video conference the user may select his or her own position in the video frame to be played, such as the lower left corner of the picture or the centre of the picture. The position selection operation may be a drag operation performed by the user on the corresponding picture display frame, a trigger operation such as a click or double-click in the video conference picture, and so on.

Correspondingly, the step of "calculating, based on the sounding position information of the sounding object, the target sound source angle corresponding to each audio signal to be played" may specifically include:

determining, based on the sounding position information of the sounding object and the sound receiving position information, the signal transmission direction between the sounding position and the sound receiving position;

calculating, according to the signal transmission direction, the target sound source angle corresponding to each audio signal to be played; a sketch of this geometry is given after this passage.

The signal transmission direction refers to the direction between the sounding position of the sounding object and the sound receiving position selected by the user.
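A minimal sketch of that calculation, under the assumption that both positions are 2-D coordinates in the plane of the video frame or virtual venue; the listener's facing direction is an added illustrative parameter.

```python
import math

def target_angle(sounding_xy, receiving_xy, facing_deg=90.0):
    """Angle of the sounding position as seen from the selected receiving position."""
    dx = sounding_xy[0] - receiving_xy[0]
    dy = sounding_xy[1] - receiving_xy[1]
    bearing = math.degrees(math.atan2(dy, dx))   # direction of signal transmission
    return (bearing - facing_deg) % 360.0        # angle relative to the listener's facing
```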
It can be understood that the position indicated by the sound receiving position information need not lie within the video frame to be played. For example, in an online concert scenario, the user may pre-select a virtual position in the virtual concert venue; that virtual position is then the sound receiving position corresponding to the video frames to be played of the online concert, and the selected virtual position may or may not be displayed in the video picture of the concert.

Correspondingly, the sounding object is the performer in the concert, and the sounding position information is the position information of the performer in the virtual venue.

Optionally, in order to enhance the viewing experience when the user watches audio and video, the user may be allowed to select an object in the video so as to strengthen the sense of immersion. That is, before the step of "calculating, based on the sounding position information of the sounding object, the target sound source angle corresponding to each audio signal to be played", the audio processing method provided by the embodiment of the present invention may further include:

determining, in response to a reference object selection operation of the user, the reference object corresponding to the user in the video frame to be played;

obtaining reference position information of the reference object in the video frame to be played.

For example, as shown in Figure 4, there are two objects in the video picture displayed by the terminal, and the user may select one of them as the reference object.

Correspondingly, the step of "calculating, based on the sounding position information of the sounding object, the target sound source angle corresponding to each audio signal to be played" may specifically include:

determining, based on the sounding position information of the sounding object and the reference position information, the signal transmission angle between the sounding object and the reference object as the target sound source angle corresponding to each audio signal to be played.

It can be understood that the position of the reference object and the position of the sounding object may change. For example, if a play is shown on the terminal, the reference object selected by the user may be the protagonist of the play, the sounding object may change at any time, and the positions of the sounding objects and of the reference object may all move.
202. Determine the signal transmission angle between each external speaker and the user's head.

The signal transmission angle is the angle between an external speaker and the user's head. Specifically, when there is only one person within the human body detection range of the external speaker or of the terminal in which the external speaker is located, the head position of this person may be taken as the user's head position by default.

If there are at least two persons within the human body detection range of the external speaker or of the terminal in which the external speaker is located, the nearest person may be taken as the user; alternatively, the face of the account logged in on the terminal may be matched against the face of each person, and the successfully matched person taken as the user, and so on.
In some embodiments, when the signal sending end and the signal receiving end cannot communicate directly, the data packet to be played may be forwarded to the signal receiving end through the cloud. That is, the step of "obtaining the audio signal to be played of each of the at least two external speakers and the target sound source angle corresponding to each audio signal to be played" may include:

receiving the data packet to be played sent by the cloud, the data packet to be played being sent to the cloud by the signal sending end, and being obtained by the signal sending end by encoding the audio signal to be played of each external speaker and the target sound source angle corresponding to each audio signal to be played;

decoding the data packet to be played to obtain the audio signal to be played of each of the at least two external speakers and the target sound source angle corresponding to each audio signal to be played.

The signal sending end may perform sound source localization directly based on the audio signal to be played to obtain the target sound source angle, and then encode the audio signal to be played and the target sound source angle to obtain the data packet to be played.

Alternatively, the signal sending end may encode only the audio signal to be played to obtain the data packet to be played and send it to the cloud; after the cloud forwards the data packet to be played to the signal receiving end, the signal receiving end may decode the data packet and determine the target sound source angle from the obtained audio signal to be played.

It can be understood that, for reasons such as the limited processing capability of the signal sending end and the signal receiving end, or the signal sending end and the signal receiving end being too far apart to communicate directly, the audio signal to be played may also be obtained and processed in the cloud.
In order to reduce transmission pressure, the signal sending end may encode the audio signal to be played. The step of "obtaining the audio signal to be played of each of the at least two external speakers and the target sound source angle corresponding to each audio signal to be played" may include:

receiving the data packet to be played sent by the signal sending end, the data packet to be played being obtained by encoding the audio signal to be played of each external speaker and the target sound source angle corresponding to each audio signal to be played;

decoding the data packet to be played to obtain the audio signal to be played of each of the at least two external speakers and the target sound source angle corresponding to each audio signal to be played.

The signal sending end may perform sound source localization directly based on the audio signal to be played to obtain the target sound source angle, and then encode the audio signal to be played and the target sound source angle to obtain the data packet to be played.

Alternatively, the signal sending end may encode only the audio signal to be played to obtain the data packet to be played and send it to the cloud, triggering the cloud to decode the data packet and determine the target sound source angle from the obtained audio signal to be played. A sketch of one possible packet layout is given below.
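Purely as an illustration of encoding the audio signal to be played together with its target sound source angle, the following sketch packs one angle and a block of interleaved 16-bit PCM into a byte string; the framing (header fields, endianness) is an assumption, not the wire format of this embodiment.

```python
import struct

def encode_packet(angle_deg, pcm_left, pcm_right):
    """Pack a target sound source angle with two channels of 16-bit PCM samples."""
    assert len(pcm_left) == len(pcm_right)
    header = struct.pack("<fI", angle_deg, len(pcm_left))     # angle + frame count
    body = b"".join(struct.pack("<hh", l, r) for l, r in zip(pcm_left, pcm_right))
    return header + body

def decode_packet(packet):
    """Recover the angle and the two PCM channels from an encoded packet."""
    angle_deg, n = struct.unpack_from("<fI", packet, 0)
    samples = struct.unpack_from("<" + "hh" * n, packet, 8)
    return angle_deg, list(samples[0::2]), list(samples[1::2])
```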
Further, after obtaining the signal transmission angle, the cloud may also perform step 203 directly in the cloud.
203. Calculate, based on the signal transmission angle and the target sound source angle, the anti-crosstalk function corresponding to the signal transmission angle, the anti-crosstalk function being used to eliminate the crosstalk generated when the at least two external speakers play audio externally.

As shown in Figure 5, in the ideal case, a sound source E virtualized at angle θ (that is, the target sound source angle) is processed by the left-ear HRTF (head related transfer function) for angle θ, namely α0, and played to the left ear, and is processed by the right-ear HRTF for angle θ, namely β0, and played to the right ear; the user then hears a sound source virtualized at angle θ. In other words, in the ideal case the physical paths to the user's left and right ears are essentially isolated: the sound intended for the left ear essentially cannot be heard by the right ear, and vice versa.
That is, the ideal virtual azimuth listening signals are:

L = α0(θ, f)E(f)
R = β0(θ, f)E(f)
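For illustration, the ideal rendering above can be sketched as a time-domain convolution of the mono source with the two HRTF impulse responses for angle θ; the variable names are assumptions added for this sketch.

```python
import numpy as np

def render_virtual_source(source_e, hrtf_alpha0, hrtf_beta0):
    """Ideal binaural rendering: L = alpha0(theta)*E, R = beta0(theta)*E.

    source_e is the mono source signal; hrtf_alpha0 and hrtf_beta0 are the
    left-ear and right-ear impulse responses for the target angle theta.
    """
    left = np.convolve(source_e, hrtf_alpha0)
    right = np.convolve(source_e, hrtf_beta0)
    return left, right
```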
In an open sound field scenario, however, sound intended only for the user's left ear is in fact also received by the user's right ear, and sound intended only for the user's right ear is also received by the left ear; this situation is cross-talk.
That is, the actual listening signal is the result of crosstalk and of the interference introduced by the positions of the external speakers. Taking external speakers consisting of a left speaker and a right speaker as an example, the unprocessed audio signals heard by the user can be expressed as:

L = α1(θ1, f)α0(θ, f)E(f) + β2(θ2, f)β0(θ, f)E(f)
R = α2(θ2, f)β0(θ, f)E(f) + β1(θ1, f)α0(θ, f)E(f)
Therefore, in order to achieve a better virtual sound source effect, the crosstalk generated during the spatial propagation of the signal needs to be eliminated. The step of "calculating, based on the signal transmission angle and the target sound source angle, the anti-crosstalk function corresponding to the signal transmission angle" may include:

determining, based on the signal transmission angle, the speaker head related transfer function corresponding to the signal transmission angle;

determining, based on the target sound source angle, the sound source head related transfer function corresponding to the target sound source angle;

calculating, according to the speaker head related transfer function and the sound source head related transfer function, the anti-crosstalk function corresponding to the signal transmission angle.
In practical applications each external speaker has a spatial position. If a signal that has already been processed according to the target sound source angle is played directly by the external speakers, what the user actually hears is that signal further multiplied by the HRTF corresponding to the signal transmission angle between the external speaker and the user's head.

Therefore, before calculating the anti-crosstalk function, the speaker head related transfer function corresponding to each external speaker needs to be determined based on the signal transmission angle.
Optionally, taking the case in which the at least two external speakers include a left speaker and a right speaker as an example, the step of "determining the signal transmission angle between each external speaker and the user's head" specifically includes:

determining the left signal transmission angle between the left speaker and the user's head;

determining the right signal transmission angle between the right speaker and the user's head.

The step of "determining, based on the signal transmission angle, the speaker head related transfer function corresponding to the signal transmission angle" may include:

determining, based on the left signal transmission angle, a first left-ear head related transfer function between the left speaker and the user's left ear and a first right-ear head related transfer function between the left speaker and the user's right ear;

determining, based on the right signal transmission angle, a second left-ear head related transfer function between the right speaker and the user's left ear and a second right-ear head related transfer function between the right speaker and the user's right ear;

using the first left-ear head related transfer function, the first right-ear head related transfer function, the second left-ear head related transfer function and the second right-ear head related transfer function as the speaker head related transfer functions.
It should be noted that "left speaker" and "right speaker" do not limit the positional relationship between the external speakers and the user. In general, the left speaker is the one of the two external speakers located on the left, and the right speaker is the one located on the right.

Specifically, when determining the first left-ear, first right-ear, second left-ear and second right-ear head related transfer functions, a pre-established head related transfer function library may be used: once the library has been determined, the head related transfer functions corresponding to the left signal transmission angle and the right signal transmission angle are obtained from it.

The left signal transmission angle may be a single angle, namely the angle between the left speaker and a certain position on the user's head, in which case the first left-ear and first right-ear head related transfer functions are obtained from the library based on this one angle alone.

Alternatively, in order to improve the audio processing accuracy, the left signal transmission angle may include two angles, namely the angle between the left speaker and the user's left ear and the angle between the left speaker and the user's right ear. Correspondingly, the first left-ear head related transfer function is obtained from the library based on the angle between the left speaker and the user's left ear, and the first right-ear head related transfer function based on the angle between the left speaker and the user's right ear.

The right signal transmission angle is handled similarly to the left signal transmission angle, and details are not repeated here.
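A minimal sketch of looking up an HRTF pair by angle from such a library; the dictionary-of-impulse-responses layout is an assumption made for illustration.

```python
import numpy as np

def lookup_hrtf(hrtf_db, angle_deg):
    """Return the (left_ir, right_ir) pair measured at the angle nearest angle_deg.

    hrtf_db is assumed to map measured angles in degrees to pairs of
    head related transfer function impulse responses.
    """
    angles = np.array(sorted(hrtf_db.keys()), dtype=float)
    nearest = angles[int(np.argmin(np.abs(angles - angle_deg)))]
    return hrtf_db[float(nearest)]
```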
In some optional embodiments, if the left speaker and the right speaker are left-right symmetric with respect to the direction of the user's head, the speaker head related transfer functions can be further simplified. That is, taking the case in which the at least two external speakers include a left speaker and a right speaker as an example, the step of "determining the signal transmission angle between each external speaker and the user's head" may specifically include:

determining the positional relationship between the left speaker, the right speaker and the user's head;

if the left speaker and the right speaker are in a left-right symmetric relationship with respect to the user's head, using the angle between either external speaker and the user's head as the signal transmission angle.

Correspondingly, the step of "determining, based on the signal transmission angle, the speaker head related transfer function corresponding to the signal transmission angle" includes:

determining, based on the signal transmission angle, a first head related transfer function between the left speaker and the user's left ear and between the right speaker and the user's right ear, and a second head related transfer function between the left speaker and the user's right ear and between the right speaker and the user's left ear;

using the first head related transfer function and the second head related transfer function as the speaker head related transfer functions.
In other words, the HRTFs are then symmetric on the same side: the head related transfer function from the left speaker to the left ear equals that from the right speaker to the right ear, and likewise for the opposite side, i.e.

α1 = α2
β1 = β2

where, as shown in Figure 5, α1 and β1 are the transfer functions from the left speaker at angle θ1 to the user's left ear and right ear, and α2 and β2 are the head related transfer functions from the right speaker at angle θ2 to the user's right ear and left ear.
Optionally, an anti-crosstalk function capable of eliminating the crosstalk may be determined according to the way in which the crosstalk arises. The step of "calculating, according to the speaker head related transfer function and the sound source head related transfer function, the anti-crosstalk function corresponding to the signal transmission angle" may specifically include:

performing matrix combination according to the speaker head related transfer functions to obtain the speaker crosstalk matrix corresponding to the audio signal to be processed;

performing matrix cancellation on the speaker crosstalk matrix to calculate the crosstalk cancellation matrix of the speaker crosstalk matrix;

calculating, based on the crosstalk cancellation matrix and the sound source head related transfer function, the anti-crosstalk function corresponding to the signal transmission angle.
Taking external speakers consisting of a left speaker and a right speaker as an example, the unprocessed audio signals heard by the user are combined, according to the speaker head related transfer functions, into the matrix representation:

[L]   [α1(θ1, f)  β2(θ2, f)] [α0(θ, f)E(f)]
[R] = [β1(θ1, f)  α2(θ2, f)] [β0(θ, f)E(f)]

The speaker crosstalk matrix is then:

[α1(θ1, f)  β2(θ2, f)]
[β1(θ1, f)  α2(θ2, f)]
Further, a crosstalk cancellation matrix can be designed for the speaker crosstalk matrix so that, during spatial propagation, the processed audio signal cancels the effect of the speaker crosstalk matrix.

Specifically, the crosstalk cancellation matrix A can be taken as the inverse of the speaker crosstalk matrix:

A = 1/(α1α2 − β1β2) · [ α2  −β2]
                      [−β1   α1]

On the basis of the original audio signal to be played, the matrix A thus performs a pre-cancellation, the effect of A is then cancelled out during spatial propagation, and the user finally hears the intended sound.
The audio signals x and y that are sent to the left speaker and the right speaker respectively after being processed by the crosstalk cancellation matrix A can be expressed as:

[x]       [α0(θ, f)E(f)]
[y] = A · [β0(θ, f)E(f)]

that is:

x = [(α2α0 − β2β0)/(α1α2 − β1β2)]E(f)
y = [(α1β0 − β1α0)/(α1α2 − β1β2)]E(f)

Therefore, the anti-crosstalk function corresponding to the signal transmission angle can be calculated based on the crosstalk cancellation matrix and the sound source head related transfer function.
The anti-crosstalk function can be expressed as:

GL(θ0, f) = (α2α0 − β2β0)/(α1α2 − β1β2)
GR(θ0, f) = (α1β0 − β1α0)/(α1α2 − β1β2)

When the left and right speakers are symmetric with respect to the user's head (α1 = α2 = α, β1 = β2 = β), the anti-crosstalk function simplifies to:

GL(θ0, f) = (αα0 − ββ0)/(α² − β²)
GR(θ0, f) = (αβ0 − βα0)/(α² − β²)
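As a sketch only, the anti-crosstalk functions above can be evaluated per frequency bin from the complex spectra of the measured transfer functions; the array names and the frequency-domain formulation are assumptions for illustration.

```python
import numpy as np

def anti_crosstalk_filters(a1, b1, a2, b2, a0, b0):
    """Per-bin anti-crosstalk functions GL(theta0, f) and GR(theta0, f).

    a1/b1: left speaker to near-ear/far-ear spectra; a2/b2: right speaker to
    near-ear/far-ear spectra; a0/b0: HRTF spectra of the target source angle.
    """
    det = a1 * a2 - b1 * b2            # determinant of the speaker crosstalk matrix
    g_left = (a2 * a0 - b2 * b0) / det
    g_right = (a1 * b0 - b1 * a0) / det
    return g_left, g_right
```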
It should be noted that, although the foregoing embodiment is described using two speakers as an example, the audio processing method of the embodiments of the present invention is also applicable to multi-speaker scenarios. The core problem in a multi-speaker scenario is similar to that with two speakers, namely the crosstalk between the external speakers and the transfer function added by each external speaker; it can be understood that the more external speakers there are, the more severe the crosstalk between them and the more complex the problem becomes.

In a scenario with multiple external speakers, however, the crosstalk cancellation matrix A can still be computed from the principle that the audio signals are superimposed at the user's left and right ears: the audio signal to be played is multiplied by a crosstalk cancellation matrix A, from which the anti-crosstalk function is then calculated. Specifically, in the multi-speaker scenario A has size n×m, where n is the number of speakers and m is the number of sound sources that can be virtualized.
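One common way to obtain such a matrix, shown here only as an assumed least-squares sketch rather than a prescribed implementation, is a pseudo-inverse per frequency bin.

```python
import numpy as np

def multi_speaker_cancellation(H, D):
    """Least-squares crosstalk cancellation matrix for one frequency bin.

    H: 2 x n matrix of speaker-to-ear transfer functions (n speakers).
    D: 2 x m matrix of desired ear signals for m virtual sources.
    Returns an n x m matrix A such that H @ A approximates D.
    """
    return np.linalg.pinv(H) @ D
```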
It can be understood that, in practical applications, an HRTF corresponding to a precise angle may not be obtainable from a library built only from actual measurements. Estimation can therefore be performed based on the existing angles and their corresponding HRTFs. Before the step of "calculating, according to the speaker head related transfer function and the sound source head related transfer function, the anti-crosstalk function corresponding to the signal transmission angle", the audio processing method provided by the embodiment of the present invention may further include:

obtaining preset discrete head related transfer functions;

performing function approximation on the discrete head related transfer functions to obtain a target head related transfer function.

Specifically, the function approximation may be an interpolation method, a curve fitting method or the like, and a technician may choose a suitable approximation method according to the actual application requirements.
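A minimal sketch of the interpolation variant, assuming the discrete library is stored as a sorted array of measured angles and a matching array of impulse responses; spline or other curve fits would follow the same pattern.

```python
import numpy as np

def interpolate_hrtf(measured_angles_deg, measured_irs, query_angle_deg):
    """Linearly interpolate an HRTF impulse response at an unmeasured angle.

    measured_angles_deg: sorted 1-D array of measured angles.
    measured_irs: array of shape (num_angles, ir_length).
    """
    return np.array([
        np.interp(query_angle_deg, measured_angles_deg, measured_irs[:, k])
        for k in range(measured_irs.shape[1])
    ])
```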
Correspondingly, the step of "determining, based on the signal transmission angle, the speaker head related transfer function corresponding to the signal transmission angle" may specifically include:

determining, based on the signal transmission angle and the target head related transfer function, the speaker head related transfer function corresponding to the signal transmission angle.

Likewise, the step of "determining, based on the target sound source angle, the sound source head related transfer function corresponding to the target sound source angle" may specifically include:

determining, based on the target sound source angle and the target head related transfer function, the sound source head related transfer function corresponding to the target sound source angle.
Optionally, the data in the preset head related transfer function library may be obtained from an existing database; for example, the data in the head related transfer function library of this embodiment may be obtained from the CIPIC database. Further, the embodiment of the present invention may perform interpolation based on the CIPIC database to obtain a target head related transfer function that includes HRTFs for more precise angles.

It can be understood that, because different users differ in head shape and other respects, a single head related transfer function library may not suit every user. Therefore, in this embodiment, head related transfer function libraries for different users may be established in advance. In application, the target user who is to receive the target playback audio signal is first determined, for example from the information of the account logged in on the terminal or in the application, or from the facial feature information of the target user. Once the target user's head related transfer function library has been determined, the speaker head related transfer function and the sound source head related transfer function are obtained from that library.
204. Perform, based on the anti-crosstalk function, signal transformation on the audio signal to be processed of each external speaker to obtain the target playback audio signal corresponding to the audio signal to be played.

Specifically, the target playback audio signals x and y sent to the left speaker and the right speaker respectively can be expressed as:

x = GL(θ0, f)E(f)
y = GR(θ0, f)E(f)
In an optional embodiment, the audio processing method provided by the embodiment of the present invention can be applied to virtualize a 5.1 surround sound effect with two external speakers. As shown in Figure 6, the bass path is ignored, and it is assumed that the two external speakers are at the same angles as the front-left and front-right loudspeakers of the 5.1 layout, namely the 30° and 330° positions. The virtualization of 5.1 surround sound with two external speakers can then be expressed as:

L′ = L + 0.707C + GL(θLS, f)LS + GL(θRS, f)RS
R′ = R + 0.707C + GR(θLS, f)LS + GR(θRS, f)RS
where

GL(θLS, f) = (αHL(θLS, f) − βHR(θLS, f))/(α² − β²)
GR(θLS, f) = (αHR(θLS, f) − βHL(θLS, f))/(α² − β²)

and GL(θRS, f) and GR(θRS, f) are obtained in the same way from HL(θRS, f) and HR(θRS, f).
α is the HRTF from the front-left speaker to the same-side ear and β is the HRTF from the front-left speaker to the opposite-side ear; ideally the front-left and front-right speakers are symmetric, so α is also the HRTF from the front-right speaker to the same-side ear and β is also the HRTF from the front-right speaker to the opposite-side ear.

HL and HR denote the HRTFs from the rear-left and rear-right loudspeaker directions to the user's left ear and right ear respectively, at the corresponding angle and at frequency f.

In the formulas above, the two angle parameters required by HL and HR are known in the simulated 5.1 scenario, namely the rear-left and rear-right loudspeaker positions of 120° and 240°; the parameters to be determined are 8 transfer functions, that is, the 8 functions from the four directions to the two ears.
In other words, the target playback audio signal to be played by the left speaker may include the audio signal to be played that simulates the front-left channel, the audio signal to be played that simulates the centre channel, an audio signal that, after processing by the anti-crosstalk function, cancels the crosstalk of the left speaker towards the right ear, and an audio signal that, after processing by the anti-crosstalk function, cancels the crosstalk of the right speaker towards the left ear.

The content of the target playback audio signal to be played by the right speaker is similar to the above and is not repeated here.
In addition, assuming the typical human head is symmetric, the head related transfer functions are also left-right symmetric; for example, the head related transfer functions from the 330° and 30° directions to the left and right ears are symmetric. The head related transfer functions can therefore be reduced to four, which is equivalent to:

HR(θRS, f) = HL(θLS, f) = α2
HR(θLS, f) = HL(θRS, f) = β2

In this case, after rearrangement, the arrangement of Figure 6 can be expressed as:

L′ = L + 0.707C + [(αα2 − ββ2)LS + (αβ2 − βα2)RS]/(α² − β²)
R′ = R + 0.707C + [(αβ2 − βα2)LS + (αα2 − ββ2)RS]/(α² − β²)
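The 5.1 example can be sketched in the frequency domain as below; all channel arguments are assumed to be complex spectra on a common frequency grid, and the LFE path is ignored as in the text.

```python
import numpy as np

def virtualize_5_1(L, R, C, LS, RS, gl_ls, gl_rs, gr_ls, gr_rs):
    """Fold a 5.1 bed (minus LFE) into two external-speaker feeds.

    gl_* and gr_* are the anti-crosstalk functions GL and GR evaluated for the
    rear-left and rear-right angles, sampled on the same frequency grid.
    """
    left_feed = L + 0.707 * C + gl_ls * LS + gl_rs * RS
    right_feed = R + 0.707 * C + gr_ls * LS + gr_rs * RS
    return left_feed, right_feed
```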
In another optional embodiment, the audio processing method provided by the embodiment of the present invention can be applied to achieve a sound field widening effect virtualized with two external speakers. Widening the sound field with two external speakers is equivalent to the 5-channel case in which the front-left and front-right channels are used to virtualize the rear-left and rear-right channels while there are no front-left, front-right or centre channels, and can therefore be expressed as:

L′ = GL(θLS, f)L + GL(θRS, f)R
R′ = GR(θLS, f)L + GR(θRS, f)R
In practical applications, HRTFs are generally acquired by picking up sound inside the ear canal, which means that an HRTF measured in this way contains not only the effect of air-channel transmission outside the ear canal but also the effect of the resonance of the human ear canal. If such an HRTF is applied directly to the signal, the user will perceive a clear sense of space, but the mid frequencies will be distorted, because the signal the user actually hears is then the result of the ear canal resonance being applied twice.
Therefore, an anti-crosstalk resonance function can be designed to eliminate both the crosstalk and the influence of the ear canal resonance. The step of "performing, based on the anti-crosstalk function, signal transformation on the audio signal to be processed of each external speaker to obtain the target playback audio signal corresponding to the audio signal to be played" may specifically include:

calculating an anti-crosstalk resonance function based on the anti-crosstalk function, the anti-crosstalk resonance function being used to eliminate the crosstalk generated when the at least two external speakers play audio externally and the influence of the resonance of the external auditory canal of the human ear;

performing, based on the anti-crosstalk resonance function, signal transformation on the audio signal to be processed of each external speaker to obtain the target playback audio signal corresponding to the audio signal to be played.
Specifically, as long as the energy ratio and phase relationship between the left and right channels remain unchanged at each frequency, no error in the sense of direction is introduced. The anti-crosstalk resonance function can be expressed as:

G′L(θ0, f) = GL(θ0, f) / sqrt((|GL(θ0, f)|² + |GR(θ0, f)|²)/2)
G′R(θ0, f) = GR(θ0, f) / sqrt((|GL(θ0, f)|² + |GR(θ0, f)|²)/2)

Because the denominator of the anti-crosstalk resonance function is equivalent to the averaged per-frequency energy spectrum of the left and right channels, it does not affect the phase between frequency bins, and the left and right channels are divided by the same value at each frequency bin, so the energy relationship between the left and right channels at each bin is unaffected; moreover, the energy of the signals emitted by the left and right channels remains unchanged after the anti-crosstalk resonance function is applied. Therefore, after processing by the anti-crosstalk resonance function the signal remains stable and the poles at each frequency bin are removed.
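A sketch of that normalization, assuming the anti-crosstalk filters are available as complex spectra; the small epsilon is an added guard against division by zero.

```python
import numpy as np

def apply_resonance_normalization(g_left, g_right, eps=1e-12):
    """Divide both filters by their averaged magnitude spectrum per frequency bin.

    The divisor is identical for both channels, so inter-channel level and phase
    relations (and hence the perceived direction) are preserved while spectral
    peaks caused by the doubled ear canal resonance are flattened.
    """
    denom = np.sqrt((np.abs(g_left) ** 2 + np.abs(g_right) ** 2) / 2.0) + eps
    return g_left / denom, g_right / denom
```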
Optionally, during audio processing the audio signal to be played may be processed according to the user's volume settings and the like. Before the step of "performing signal transformation on the audio signal to be processed based on the anti-crosstalk function", the audio processing method provided by the embodiment of the present invention further includes:

obtaining the playback setting parameters of each external speaker and the audio playback preference parameters corresponding to the target user;

performing audio parameter extraction on the audio signal to be processed to obtain the playback parameters corresponding to the audio signal to be processed;

calculating, according to the playback setting parameters, the audio playback preference parameters and the playback parameters, the audio adjustment function corresponding to the audio signal to be processed.
In the embodiment of the present invention, the target user is the user logged in on the audio playback device or the user currently using the audio playback device.

The audio playback preference parameters are generated from playback effect adjustment operations performed by the target user before the audio adjustment parameters are determined. Specifically, the audio playback preference parameters may include, but are not limited to, audio frequency (pitch) adjustment parameters, audio loudness adjustment parameters, and so on.

Performing audio parameter extraction on the audio signal to be processed may mean performing audio analysis on it to obtain specific parameters such as the frequency, sampling bit depth, number of channels, loudness and bit rate corresponding to the audio signal to be processed; alternatively, feature extraction may be performed on the audio signal to be processed to obtain a feature vector that characterizes its parameter features, and so on.
Correspondingly, the step of "performing, based on the anti-crosstalk function, signal transformation on the audio signal to be processed to obtain the target playback audio signal corresponding to the audio signal to be processed" includes:

performing, based on the audio adjustment function, the target sound source angle, the signal transmission angle and the anti-crosstalk function, signal transformation on the audio signal to be processed to obtain the target playback audio signal corresponding to the audio signal to be processed.

The process of calculating the audio adjustment function corresponding to the audio signal to be processed may involve amplifying or attenuating the playback parameters so that the target playback audio signal processed by the audio adjustment function satisfies the audio playback preference parameters when played with the playback setting parameters; alternatively, it may involve operations such as convolving or multiplying playback parameters in matrix or vector form, and so on.
In some optional embodiments, the audio processing method provided by the embodiment of the present invention may further include:

sending the target playback audio signal corresponding to each audio signal to be played to the external speaker corresponding to that audio signal to be played, triggering the external speaker to play the corresponding target playback audio signal.

That is, after the target playback audio signal is obtained through processing in the cloud or at the signal receiving end, it can be sent to the corresponding external speaker for playback.
It can be seen from the above that the embodiment of the present invention can obtain the audio signal to be played of each of at least two external speakers and the target sound source angle corresponding to each audio signal to be played, determine the signal transmission angle between each external speaker and the user's head, calculate, based on the signal transmission angle and the target sound source angle, the anti-crosstalk function corresponding to the signal transmission angle, the anti-crosstalk function being used to eliminate the crosstalk generated when the at least two external speakers play audio externally, and perform, based on the target sound source angle, the signal transmission angle and the anti-crosstalk function, signal transformation on the audio signal to be processed of each external speaker to obtain the target playback audio signal corresponding to the audio signal to be played. Because the embodiment of the present invention calculates an anti-crosstalk function capable of eliminating the crosstalk generated in an open sound field, and transforms the audio signal to be played based on the target sound source angle and the anti-crosstalk function, the target playback audio signal can express the sound source position information and cancel the crosstalk generated during playback. Therefore, without adding audio playback equipment, an audio signal capable of expressing the direction of the sound source can be generated, and the crosstalk generated when audio is played through external speakers can be eliminated, so that the user can enjoy audio with a better playback effect.
为了更好地实施以上方法,相应的,本发明实施例还提供一种音频处理装置。In order to better implement the above method, correspondingly, an embodiment of the present invention also provides an audio processing device.
参考图7,该装置包括:Referring to Figure 7, the device includes:
信号获取单元701,用于获取至少两个外放扬声器中每个所述外放扬声器的待播放音频信号和每个所述待播放音频信号对应的目标声源角度;The signal acquisition unit 701 is used to acquire the audio signal to be played by each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played;
角度确定单元702,用于确定各所述外放扬声器与用户头部之间的信号传递角度;Angle determination unit 702, used to determine the signal transmission angle between each of the external speakers and the user's head;
A function calculation unit 703, configured to calculate, based on the signal transmission angle and the target sound source angle, an anti-crosstalk function corresponding to the signal transmission angle, where the anti-crosstalk function is used to eliminate the cross-talk generated when the at least two external speakers play audio out loud;
信号变换单元704,用于基于所述抗串扰函数,对各所述外放扬声器的所述待处理音频信号进行信号变换,得到所述待播放音频信号对应的目标播放音频信号。The signal conversion unit 704 is configured to perform signal conversion on the audio signals to be processed from each of the external speakers based on the anti-crosstalk function to obtain a target playback audio signal corresponding to the audio signal to be played.
可选的,所述函数计算单元703,用于基于所述信号传递角度,确定所述信号传递角度对应的扬声器头相关传递函数;Optionally, the function calculation unit 703 is configured to determine, based on the signal transmission angle, the speaker head-related transfer function corresponding to the signal transmission angle;
基于所述目标声源角度,确定所述目标声源角度对应的声源头相关传递函数;Based on the target sound source angle, determine the sound source related transfer function corresponding to the target sound source angle;
根据所述扬声器头相关传递函数和所述声源头相关传递函数,计算所述信号传递角度对应的抗串扰函数。According to the speaker head related transfer function and the sound source related transfer function, the anti-crosstalk function corresponding to the signal transmission angle is calculated.
可选的,所述至少两个外放扬声器包括左扬声器和右扬声器,所述角度确定单元702,用于确定所述左扬声器与用户头部之间的左侧信号传递角度;Optionally, the at least two external speakers include a left speaker and a right speaker, and the angle determination unit 702 is used to determine the left signal transmission angle between the left speaker and the user's head;
确定所述右扬声器与所述用户头部之间的右侧信号传递角度;Determine a right signal transmission angle between the right speaker and the user's head;
The function calculation unit is configured to determine, based on the left signal transmission angle, a first left-ear head-related transfer function between the left speaker and the user's left ear and a first right-ear head-related transfer function between the left speaker and the user's right ear;
Based on the right signal transmission angle, a second left-ear head-related transfer function between the right speaker and the user's left ear and a second right-ear head-related transfer function between the right speaker and the user's right ear are determined;
The first left-ear head-related transfer function, the first right-ear head-related transfer function, the second left-ear head-related transfer function and the second right-ear head-related transfer function are taken as the speaker head-related transfer functions.
可选的,所述至少两个外放扬声器包括左扬声器和右扬声器,所述角度确定单元702,用于确定所述左扬声器、所述右扬声器与用户头部之间的位置关系;Optionally, the at least two external speakers include a left speaker and a right speaker, and the angle determination unit 702 is used to determine the positional relationship between the left speaker, the right speaker and the user's head;
若所述左扬声器和所述右扬声器相对于所述用户头部为左右对称关系,将任一外放扬声器与所述用户头部之间的角度作为信号传递角度;If the left speaker and the right speaker have a left-right symmetrical relationship with respect to the user's head, the angle between any external speaker and the user's head is used as the signal transmission angle;
The function calculation unit 703 is configured to determine, based on the signal transmission angle, a first head-related transfer function between the left speaker and the user's left ear and between the right speaker and the user's right ear, and a second head-related transfer function between the left speaker and the user's right ear and between the right speaker and the user's left ear;
将所述第一头相关传递函数和第二头相关传递函数作为扬声器头相关传递函数。The first head-related transfer function and the second head-related transfer function are regarded as speaker head-related transfer functions.
可选的,所述函数计算单元703,用于根据所述扬声器头相关传递函数进行矩阵合并处理,得到所述待处理音频信号对应的扬声器串扰矩阵;Optionally, the function calculation unit 703 is configured to perform matrix merging processing according to the speaker head related transfer function to obtain the speaker crosstalk matrix corresponding to the audio signal to be processed;
针对所述扬声器串扰矩阵进行矩阵抵消,计算出所述扬声器串扰矩阵的串扰抵消矩阵;Perform matrix cancellation on the speaker crosstalk matrix to calculate the crosstalk cancellation matrix of the speaker crosstalk matrix;
Based on the crosstalk cancellation matrix and the sound-source head-related transfer function, the anti-crosstalk function corresponding to the signal transmission angle is calculated.
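A minimal sketch of the matrix step follows, assuming the four speaker head-related transfer functions are available as per-frequency arrays. The Tikhonov term `beta` is an assumption added here to keep the inverse well behaved; the application itself only specifies that a crosstalk cancellation matrix is computed from the speaker crosstalk matrix.

```python
import numpy as np

def crosstalk_cancellation_matrix(h_l_to_left, h_l_to_right, h_r_to_left, h_r_to_right, beta=1e-3):
    """Assemble the per-frequency speaker crosstalk matrix from the four speaker
    head-related transfer functions and return its regularized inverse."""
    n_bins = len(h_l_to_left)
    cancel = np.empty((n_bins, 2, 2), dtype=complex)
    for k in range(n_bins):
        crosstalk = np.array([[h_l_to_left[k],  h_r_to_left[k]],    # ears x speakers at bin k
                              [h_l_to_right[k], h_r_to_right[k]]])
        # Tikhonov-regularized inverse: (H^H H + beta I)^-1 H^H.
        cancel[k] = np.linalg.solve(
            crosstalk.conj().T @ crosstalk + beta * np.eye(2), crosstalk.conj().T)
    return cancel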
可选的,如图8所示,本发明实施例提供的音频处理装置还包括函数处理单元705,用于获取预设的离散头相关传递函数;Optionally, as shown in Figure 8, the audio processing device provided by the embodiment of the present invention also includes a function processing unit 705, used to obtain a preset discrete head related transfer function;
对所述离散头相关传递函数进行函数逼近处理,得到目标头相关传递函数;Perform function approximation processing on the discrete head related transfer function to obtain the target head related transfer function;
所述函数计算单元,用于基于所述信号传递角度和所述目标头相关传递函数,确定所述信号传递角度对应的扬声器头相关传递函数;The function calculation unit is configured to determine the speaker head-related transfer function corresponding to the signal transmission angle based on the signal transmission angle and the target head-related transfer function;
基于所述目标声源角度和所述目标头相关传递函数,确定所述目标声源角度对应的声源头相关传递函数。Based on the target sound source angle and the target head related transfer function, a sound source related transfer function corresponding to the target sound source angle is determined.
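The function-approximation step can be pictured as interpolating between discretely measured HRTFs so that a transfer function is available at any signal transmission angle or target sound source angle. The sketch below uses simple linear interpolation over azimuth purely as an illustration; the application does not commit to a particular approximation method.

```python
import numpy as np

def interpolate_hrtf(angle_deg, measured_angles_deg, measured_hrtfs):
    """Linearly interpolate between the two nearest discretely measured HRTFs.
    `measured_hrtfs[i]` is the (complex) spectrum measured at `measured_angles_deg[i]`."""
    angles = np.asarray(measured_angles_deg, dtype=float)
    hrtfs = np.asarray(measured_hrtfs)
    order = np.argsort(angles)
    angles, hrtfs = angles[order], hrtfs[order]

    a = angle_deg % 360.0
    hi = int(np.searchsorted(angles, a) % len(angles))   # neighbour at or above a (wraps around)
    lo = (hi - 1) % len(angles)                           # neighbour below a
    span = (angles[hi] - angles[lo]) % 360.0 or 360.0
    w = ((a - angles[lo]) % 360.0) / span                 # fractional position between the two
    return (1.0 - w) * hrtfs[lo] + w * hrtfs[hi]
```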
可选的,所述信号获取单元701,用于获取至少两个外放扬声器中每个所述外放扬声器的待播放音频信号;Optionally, the signal acquisition unit 701 is used to acquire the audio signal to be played by each of the at least two external speakers;
对所述待播放音频信号进行声源位置定位,确定各所述待播放音频信号对应的目标声源角度。Position the sound source of the audio signals to be played, and determine the target sound source angle corresponding to each of the audio signals to be played.
可选的,所述信号获取单元701,用于获取至少两个外放扬声器中每个所述外放扬声器的待播放音频信号以及所述待播放音频信号对应的待播放视频帧;Optionally, the signal acquisition unit 701 is configured to acquire the audio signal to be played from each of the at least two external speakers and the video frame to be played corresponding to the audio signal to be played;
从所述待播放视频帧中,确定发声对象的发声位置信息;Determine the voicing position information of the voicing object from the video frame to be played;
基于所述发声对象的发声位置信息,计算每个所述待播放音频信号对应的目标声源角度。Based on the sounding position information of the sounding object, the target sound source angle corresponding to each of the audio signals to be played is calculated.
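One way to picture the mapping from the sounding object's position in the video frame to a target sound source angle is to spread the frame across an assumed horizontal viewing angle. The helper below is hypothetical: the viewing half-angle and the linear mapping are assumptions, not values taken from the application.

```python
def source_angle_from_frame(region_center_x, frame_width, view_half_angle_deg=30.0):
    """Hypothetical mapping from the sounding object's horizontal position in the video
    frame to a target sound source azimuth: 0 means straight ahead, positive is to the
    right, and the frame is assumed to span +/- view_half_angle_deg from the listener."""
    offset = (region_center_x / frame_width) - 0.5   # -0.5 at the left edge, +0.5 at the right edge
    return 2.0 * offset * view_half_angle_deg
```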
Optionally, the video frame to be played includes at least one candidate sounding object, and the audio processing device provided by the embodiments of the present invention further includes a sounding object determination unit 706, configured to determine the sounding object corresponding to the audio signal to be played and obtain the object identification information of the sounding object;
所述信号获取单元701,用于从所述待播放视频帧包括的所述候选发声对象中,基于所述对象标识信息进行信息匹配,确定所述发声对象;The signal acquisition unit 701 is configured to perform information matching based on the object identification information from the candidate voicing objects included in the video frame to be played, and determine the voicing object;
获取所述发声对象在所述待播放视频帧中的目标显示区域,将所述目标显示区域的位置信息作为所述发声对象的发声位置信息。Obtain the target display area of the utterance object in the video frame to be played, and use the position information of the target display area as the utterance position information of the utterance object.
Optionally, the video frame to be played includes a display area of at least one candidate sounding object, and the signal acquisition unit 701 is configured to perform sounding-action detection for each display area in the video frame to be played; if it is detected that a candidate sounding object in a display area has performed a sounding action, that candidate sounding object is taken as the sounding object;
获取所述发声对象在所述待播放视频帧中的目标显示区域,将所述目标显示区域的位置信息作为所述发声对象的发声位置信息。 Obtain the target display area of the utterance object in the video frame to be played, and use the position information of the target display area as the utterance position information of the utterance object.
Optionally, the audio processing device provided by the embodiments of the present invention further includes a first position determination unit 707, configured to determine, in response to a position selection operation by the user, the sound receiving position information corresponding to the user in the video frame to be played;
所述信号获取单元701,用于基于所述发声对象的发声位置信息和所述声音接收位置信息,确定发声位置与声音接收位置之间的信号传输方向;The signal acquisition unit 701 is configured to determine the signal transmission direction between the sounding position and the sound receiving position based on the sounding position information of the sounding object and the sound receiving position information;
根据所述信号传输方向,计算每个所述待播放音频信号对应的目标声源角度。According to the signal transmission direction, the target sound source angle corresponding to each audio signal to be played is calculated.
可选的,本发明实施例提供的音频处理装置还包括第二位置确定单元,用于响应于用户的参考对象选择操作,确定所述用户在所述待播放视频帧中对应的参考对象;Optionally, the audio processing device provided by the embodiment of the present invention further includes a second position determination unit, configured to determine the reference object corresponding to the user in the video frame to be played in response to the user's reference object selection operation;
获取所述参考对象在所述待播放视频帧中的参考位置信息;Obtain the reference position information of the reference object in the video frame to be played;
The signal acquisition unit 701 is configured to determine, based on the sounding position information of the sounding object and the reference position information, the signal transmission angle between the sounding object and the reference object as the target sound source angle corresponding to each audio signal to be played.
Optionally, the signal acquisition unit 701 is configured to receive a data packet to be played sent by a signal sending end, the data packet to be played being obtained by encoding the audio signal to be played for each external speaker and the target sound source angle corresponding to each audio signal to be played;
对所述待播放数据包进行解码,得到至少两个所述外放扬声器中每个所述外放扬声器的待播放音频信号和每个所述待播放音频信号对应的目标声源角度。The data packet to be played is decoded to obtain the audio signal to be played from each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played.
Optionally, the signal acquisition unit 701 is configured to receive a data packet to be played sent by the cloud, the data packet to be played being sent to the cloud by a signal sending end and being obtained by the signal sending end by encoding the audio signal to be played for each external speaker and the target sound source angle corresponding to each audio signal to be played;
对所述待播放数据包进行解码,得到至少两个所述外放扬声器中每个所述外放扬声器的待播放音频信号和每个所述待播放音频信号对应的目标声源角度。The data packet to be played is decoded to obtain the audio signal to be played from each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played.
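The application does not fix a packet format, so the layout sketched below is purely illustrative: each channel's target sound source angle is written next to its PCM samples, and the receiving end reverses the process before the signal transformation step.

```python
import struct
import numpy as np

def encode_packet(signals, angles):
    """Illustrative layout: channel count, then per channel a float32 source angle,
    a byte-length prefix, and the channel's int16 PCM samples."""
    parts = [struct.pack("<I", len(signals))]
    for sig, angle in zip(signals, angles):
        pcm = np.asarray(sig, dtype=np.int16).tobytes()
        parts.append(struct.pack("<fI", float(angle), len(pcm)))
        parts.append(pcm)
    return b"".join(parts)

def decode_packet(packet):
    (n_channels,) = struct.unpack_from("<I", packet, 0)
    offset, signals, angles = 4, [], []
    for _ in range(n_channels):
        angle, n_bytes = struct.unpack_from("<fI", packet, offset)
        offset += struct.calcsize("<fI")
        signals.append(np.frombuffer(packet, dtype=np.int16, count=n_bytes // 2, offset=offset))
        angles.append(angle)
        offset += n_bytes
    return signals, angles
```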
Optionally, the signal transformation unit 704 is configured to calculate an anti-crosstalk resonance function based on the anti-crosstalk function, where the anti-crosstalk resonance function is used to eliminate both the cross-talk generated when the at least two external speakers play audio out loud and the influence of the resonance of the ear canal of the human ear;
基于所述抗串扰共振函数,对各所述外放扬声器的所述待处理音频信号进行信号变换,得到所述待播放音频信号对应的目标播放音频信号。Based on the anti-crosstalk resonance function, signal transformation is performed on the audio signals to be processed from each of the external speakers to obtain a target playback audio signal corresponding to the audio signal to be played.
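As a rough illustration only: if the ear-canal resonance is approximated by a single broad peak, the anti-crosstalk resonance response can be formed by dividing that peak out of the anti-crosstalk response. The centre frequency, gain and shape used below are assumed values and are not figures from the application.

```python
import numpy as np

def anti_crosstalk_resonance(h_anti_crosstalk, freqs_hz, f0=2700.0, gain_db=10.0, q=1.5):
    """Divide an assumed ear-canal resonance (a single broad peak near f0) out of the
    anti-crosstalk response; f0, gain_db and q are illustrative values only."""
    half_bw = f0 / (2.0 * q)
    peak = 1.0 + (10.0 ** (gain_db / 20.0) - 1.0) / (1.0 + ((freqs_hz - f0) / half_bw) ** 2)
    return h_anti_crosstalk / peak
```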
可选的,本发明实施例提供的音频处理装置还包括调整参数计算单元,用于获取各所述外放扬声器的播放设置参数以及目标用户对应的音频播放偏好参数;Optionally, the audio processing device provided by the embodiment of the present invention also includes an adjustment parameter calculation unit for obtaining the playback setting parameters of each of the external speakers and the audio playback preference parameters corresponding to the target user;
Audio parameter extraction is performed on the audio signal to be processed to obtain the parameters to be played corresponding to the audio signal to be processed;
根据所述播放设置参数、音频播放偏好参数和所述待播放参数,计算所述待处理音频信号对应的音频调整函数;Calculate the audio adjustment function corresponding to the audio signal to be processed according to the playback setting parameters, the audio playback preference parameters and the to-be-played parameters;
The signal transformation unit 704 is configured to perform signal transformation on the audio signal to be processed based on the audio adjustment function, the target sound source angle, the signal transmission angle and the anti-crosstalk function, to obtain the target playback audio signal corresponding to the audio signal to be processed.
Optionally, the audio processing device provided by the embodiments of the present invention further includes an audio playback unit, configured to send the target playback audio signal corresponding to each audio signal to be played to the external speaker corresponding to that audio signal to be played, triggering the external speaker to play the corresponding target playback audio signal.
It can be seen from the above that, with the audio processing device, an audio signal that can express the direction of the sound source can be generated without adding audio playback equipment, and the cross-talk generated when audio is played through external speakers can be eliminated, so that users can enjoy audio with a better playback effect.
In addition, an embodiment of the present invention further provides an electronic device, which may be a terminal, a server, or the like. Figure 9 shows a schematic structural diagram of the electronic device involved in the embodiments of the present invention. Specifically:
The electronic device may include a radio frequency (RF) circuit 901, a memory 902 including one or more computer-readable storage media, an input unit 903, a display unit 904, a sensor 905, an audio circuit 906, a wireless fidelity (WiFi) module 907, a processor 908 including one or more processing cores, a power supply 909, and other components. Those skilled in the art will understand that the structure of the electronic device shown in Figure 9 does not constitute a limitation on the electronic device, which may include more or fewer components than shown, combine certain components, or use a different arrangement of components. Wherein:
The RF circuit 901 may be used to receive and send signals while information is being received and sent or during a call. In particular, after receiving downlink information from a base station, it hands the information over to one or more processors 908 for processing, and it also sends uplink data to the base station. Generally, the RF circuit 901 includes, but is not limited to, an antenna, at least one amplifier, a tuner, one or more oscillators, a subscriber identity module (SIM) card, a transceiver, a coupler, a low noise amplifier (LNA), a duplexer, and the like. In addition, the RF circuit 901 may also communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to Global System for Mobile communication (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Long Term Evolution (LTE), e-mail, Short Messaging Service (SMS), and so on.
The memory 902 may be used to store software programs and modules, and the processor 908 executes various functional applications and performs data processing by running the software programs and modules stored in the memory 902. The memory 902 may mainly include a program storage area and a data storage area, where the program storage area may store the operating system and the application programs required by at least one function (such as a sound playback function and an image playback function), and the data storage area may store data created through the use of the electronic device (such as audio data and a phone book). In addition, the memory 902 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another solid-state storage device. Accordingly, the memory 902 may further include a memory controller to provide the processor 908 and the input unit 903 with access to the memory 902.
The input unit 903 may be used to receive input numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control. Specifically, in one embodiment, the input unit 903 may include a touch-sensitive surface and other input devices. The touch-sensitive surface, also called a touch screen or touchpad, can collect the user's touch operations on or near it (such as operations performed on or near the touch-sensitive surface with a finger, a stylus, or any other suitable object or accessory) and drive the corresponding connection device according to a preset program. Optionally, the touch-sensitive surface may include two parts: a touch detection device and a touch controller. The touch detection device detects the position of the user's touch and the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, sends the coordinates to the processor 908, and receives and executes commands sent by the processor 908. In addition, the touch-sensitive surface may be implemented in various types, such as resistive, capacitive, infrared and surface acoustic wave. Besides the touch-sensitive surface, the input unit 903 may also include other input devices. Specifically, the other input devices may include, but are not limited to, one or more of a physical keyboard, function keys (such as volume control keys and switch keys), a trackball, a mouse, a joystick, and the like.
The display unit 904 may be used to display information input by the user or information provided to the user, as well as the various graphical user interfaces of the electronic device, which may be composed of graphics, text, icons, videos, and any combination thereof. The display unit 904 may include a display panel, which may optionally be configured in the form of a liquid crystal display (LCD), an organic light-emitting diode (OLED) display, or the like. Further, the touch-sensitive surface may cover the display panel; when the touch-sensitive surface detects a touch operation on or near it, it passes the operation to the processor 908 to determine the type of the touch event, and the processor 908 then provides a corresponding visual output on the display panel according to the type of the touch event. Although in Figure 9 the touch-sensitive surface and the display panel are shown as two independent components implementing the input and output functions, in some embodiments the touch-sensitive surface and the display panel may be integrated to implement the input and output functions.
The electronic device may further include at least one sensor 905, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor and a proximity sensor; the ambient light sensor may adjust the brightness of the display panel according to the brightness of the ambient light, and the proximity sensor may turn off the display panel and/or the backlight when the electronic device is moved to the ear. As one type of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in various directions (generally three axes), can detect the magnitude and direction of gravity when stationary, and can be used in applications that recognize the posture of the device (such as switching between landscape and portrait, related games, and magnetometer posture calibration) and in vibration-recognition functions (such as a pedometer or tapping). Other sensors that may also be configured in the electronic device, such as a gyroscope, a barometer, a hygrometer, a thermometer and an infrared sensor, are not described in detail here.
The audio circuit 906, a speaker and a microphone may provide an audio interface between the user and the electronic device. The audio circuit 906 may convert received audio data into an electrical signal and transmit it to the speaker, which converts it into a sound signal for output; on the other hand, the microphone converts collected sound signals into electrical signals, which the audio circuit 906 receives and converts into audio data. After the audio data is processed by the processor 908, it is sent through the RF circuit 901 to, for example, another electronic device, or the audio data is output to the memory 902 for further processing. The audio circuit 906 may also include an earphone jack to provide communication between a peripheral earphone and the electronic device.
WiFi属于短距离无线传输技术,电子设备通过WiFi模块907可以帮助用户收发电子邮件、浏览网页和访问流式媒体等,它为用户提供了无线的宽带互联网访问。虽然图9示出了WiFi模块907,但是可以理解的是,其并不属于电子设备的必须构成,完全可以根据需要在不改变发明的本质的范围内而省略。WiFi is a short-distance wireless transmission technology. Electronic devices can help users send and receive emails, browse web pages, and access streaming media through the WiFi module 907. It provides users with wireless broadband Internet access. Although FIG. 9 shows the WiFi module 907, it can be understood that it is not a necessary component of the electronic device and can be omitted as needed without changing the essence of the invention.
The processor 908 is the control center of the electronic device. It connects the various parts of the entire device using various interfaces and lines, and performs the various functions of the electronic device and processes data by running or executing the software programs and/or modules stored in the memory 902 and calling the data stored in the memory 902. Optionally, the processor 908 may include one or more processing cores; preferably, the processor 908 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, the user interface, application programs and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 908.
电子设备还包括给各个部件供电的电源909(比如电池),优选的,电源可以通过电源管理系统与处理器908逻辑相连,从而通过电源管理系统实现管理充电、放电、以及功耗管理等功能。电源909还可以包括一个或一个以上的直流或交流电源、再充电系统、电源故障检测电路、电源转换器或者逆变器、电源状态指示器等任意组件。The electronic device also includes a power supply 909 (such as a battery) that supplies power to various components. Preferably, the power supply can be logically connected to the processor 908 through a power management system, so that functions such as charging, discharging, and power consumption management can be implemented through the power management system. Power supply 909 may also include one or more DC or AC power supplies, recharging systems, power failure detection circuits, power converters or inverters, power status indicators, and other arbitrary components.
Although not shown, the electronic device may further include a camera, a Bluetooth module, and the like, which are not described in detail here. Specifically, in this embodiment, the processor 908 in the electronic device loads the executable files corresponding to the processes of one or more application programs into the memory 902 according to the following instructions, and the processor 908 runs the application programs stored in the memory 902 so as to implement various functions, as follows:
获取至少两个外放扬声器中每个所述外放扬声器的待播放音频信号和每个所述待播放音频信号对应的目标声源角度;Obtain the audio signal to be played from each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played;
确定各所述外放扬声器与用户头部之间的信号传递角度;Determine the signal transmission angle between each of the external speakers and the user's head;
Based on the signal transmission angle and the target sound source angle, calculate the anti-crosstalk function corresponding to the signal transmission angle, where the anti-crosstalk function is used to eliminate the cross-talk generated when the at least two external speakers play audio out loud;
基于所述抗串扰函数,对各所述外放扬声器的所述待处理音频信号进行信号变换,得到所述待播放音频信号对应的目标播放音频信号。Based on the anti-crosstalk function, signal transformation is performed on the audio signals to be processed from each of the external speakers to obtain a target playback audio signal corresponding to the audio signal to be played.
本领域普通技术人员可以理解,上述实施例的各种方法中的全部或部分步骤可以通过指令来完成,或通过指令控制相关的硬件来完成,该指令可以存储于一计算机可读存储介质中,并由处理器进行加载和执行。Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by instructions, or by controlling relevant hardware through instructions. The instructions can be stored in a computer-readable storage medium, and loaded and executed by the processor.
To this end, an embodiment of the present invention provides a computer-readable storage medium storing a plurality of instructions that can be loaded by a processor to perform the steps of any audio processing method provided by the embodiments of the present invention. For example, the instructions may perform the following steps:
获取至少两个外放扬声器中每个所述外放扬声器的待播放音频信号和每个所述待播放音频信号对应的目标声源角度;Obtain the audio signal to be played from each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played;
确定各所述外放扬声器与用户头部之间的信号传递角度;Determine the signal transmission angle between each of the external speakers and the user's head;
Based on the signal transmission angle and the target sound source angle, calculate the anti-crosstalk function corresponding to the signal transmission angle, where the anti-crosstalk function is used to eliminate the cross-talk generated when the at least two external speakers play audio out loud;
基于所述抗串扰函数,对各所述外放扬声器的所述待处理音频信号进行信号变换,得到所述待播放音频信号对应的目标播放音频信号。Based on the anti-crosstalk function, signal transformation is performed on the audio signals to be processed from each of the external speakers to obtain a target playback audio signal corresponding to the audio signal to be played.
以上各个操作的具体实施可参见前面的实施例,在此不再赘述。For the specific implementation of each of the above operations, please refer to the previous embodiments and will not be described again here.
其中,该计算机可读存储介质可以包括:只读存储器(ROM,Read Only Memory)、随机存取记忆体(RAM,Random Access Memory)、磁盘或光盘等。Among them, the computer-readable storage medium may include: read-only memory (ROM, Read Only Memory), random access memory (RAM, Random Access Memory), magnetic disk or optical disk, etc.
Since the instructions stored in the computer-readable storage medium can perform the steps of any audio processing method provided by the embodiments of the present invention, they can achieve the beneficial effects that can be achieved by any audio processing method provided by the embodiments of the present invention; for details, refer to the preceding embodiments, which are not repeated here.
根据本申请的一个方面,还提供了一种计算机程序产品或计算机程序,该计算机程序产品或计算机程序包括计算机指令,该计算机指令存储在计算机可读存储介质中。电子设备的处理器从计算机可读存储介质读取该计算机指令,处理器执行该计算机指令,使得该电子设备执行上述实施例中的各种可选实现方式中提供的方法。According to one aspect of the present application, a computer program product or computer program is also provided. The computer program product or computer program includes computer instructions, and the computer instructions are stored in a computer-readable storage medium. The processor of the electronic device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the electronic device performs the methods provided in various optional implementations in the above embodiments.
The audio processing method, apparatus, electronic device, storage medium and program product provided by the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principles and implementations of the present invention, and the description of the above embodiments is only intended to help understand the method of the present invention and its core idea. Meanwhile, for those skilled in the art, there will be changes in the specific implementations and the scope of application in accordance with the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (21)

  1. 一种音频处理方法,其特征在于,包括:An audio processing method, characterized by including:
    获取至少两个外放扬声器中每个所述外放扬声器的待播放音频信号和每个所述待播放音频信号对应的目标声源角度;Obtain the audio signal to be played from each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played;
    确定各所述外放扬声器与用户头部之间的信号传递角度;Determine the signal transmission angle between each of the external speakers and the user's head;
Based on the signal transmission angle and the target sound source angle, calculate an anti-crosstalk function corresponding to the signal transmission angle, wherein the anti-crosstalk function is used to eliminate the cross-talk generated when the at least two external speakers play audio out loud;
    基于所述抗串扰函数,对各所述外放扬声器的所述待处理音频信号进行信号变换,得到所述待播放音频信号对应的目标播放音频信号。Based on the anti-crosstalk function, signal transformation is performed on the audio signals to be processed from each of the external speakers to obtain a target playback audio signal corresponding to the audio signal to be played.
  2. 根据权利要求1所述的音频处理方法,其特征在于,所述基于所述信号传递角度和所述目标声源角度,计算所述信号传递角度对应的抗串扰函数,包括:The audio processing method according to claim 1, characterized in that, based on the signal transmission angle and the target sound source angle, calculating the anti-crosstalk function corresponding to the signal transmission angle includes:
    基于所述信号传递角度,确定所述信号传递角度对应的扬声器头相关传递函数;Based on the signal transmission angle, determine the speaker head related transfer function corresponding to the signal transmission angle;
    基于所述目标声源角度,确定所述目标声源角度对应的声源头相关传递函数;Based on the target sound source angle, determine the sound source related transfer function corresponding to the target sound source angle;
    根据所述扬声器头相关传递函数和所述声源头相关传递函数,计算所述信号传递角度对应的抗串扰函数。According to the speaker head related transfer function and the sound source related transfer function, the anti-crosstalk function corresponding to the signal transmission angle is calculated.
3. The audio processing method according to claim 2, wherein the at least two external speakers include a left speaker and a right speaker, and determining the signal transmission angle between each of the external speakers and the user's head includes:
    确定所述左扬声器与用户头部之间的左侧信号传递角度;Determine the left signal transmission angle between the left speaker and the user's head;
    确定所述右扬声器与所述用户头部之间的右侧信号传递角度;Determine a right signal transmission angle between the right speaker and the user's head;
    所述基于所述信号传递角度,确定所述信号传递角度对应的扬声器头相关传递函数,包括:Determining the speaker head-related transfer function corresponding to the signal transmission angle based on the signal transmission angle includes:
    基于所述左侧信号传递角度,确定所述左扬声器与用户左耳之间的第一左耳头相关传递函数,以及所述左扬声器与用户右耳之间的第一右耳头相关传递函数;Based on the left signal transmission angle, a first left ear head-related transfer function between the left speaker and the user's left ear, and a first right ear head-related transfer function between the left speaker and the user's right ear are determined ;
    基于所述右侧信号传递角度,确定所述右扬声器与用户左耳之间的第二左耳头相关传递函数,以及所述右扬声器与用户右耳之间的第二右耳头相关传递函数;Based on the right signal transmission angle, a second left ear head-related transfer function between the right speaker and the user's left ear, and a second right ear head-related transfer function between the right speaker and the user's right ear are determined ;
Take the first left-ear head-related transfer function, the first right-ear head-related transfer function, the second left-ear head-related transfer function and the second right-ear head-related transfer function as the speaker head-related transfer functions.
4. The audio processing method according to claim 2, wherein the at least two external speakers include a left speaker and a right speaker, and determining the signal transmission angle between each of the external speakers and the user's head includes:
    确定所述左扬声器、所述右扬声器与用户头部之间的位置关系;Determine the positional relationship between the left speaker, the right speaker and the user's head;
    若所述左扬声器和所述右扬声器相对于所述用户头部为左右对称关系,将任一外放扬声器与所述用户头部之间的角度作为信号传递角度;If the left speaker and the right speaker have a left-right symmetrical relationship with respect to the user's head, the angle between any external speaker and the user's head is used as the signal transmission angle;
    所述基于所述信号传递角度,确定所述信号传递角度对应的扬声器头相关传递函数,包括:Determining the speaker head-related transfer function corresponding to the signal transmission angle based on the signal transmission angle includes:
Based on the signal transmission angle, determine a first head-related transfer function between the left speaker and the user's left ear and between the right speaker and the user's right ear, and a second head-related transfer function between the left speaker and the user's right ear and between the right speaker and the user's left ear;
    将所述第一头相关传递函数和第二头相关传递函数作为扬声器头相关传递函数。The first head-related transfer function and the second head-related transfer function are regarded as speaker head-related transfer functions.
  5. 根据权利要求2所述的音频处理方法,其特征在于,根据所述扬声器头相关传递函数和所述声源头相关传递函数,计算所述信号传递角度对应的抗串扰函数,包括:The audio processing method according to claim 2, characterized in that, based on the speaker head related transfer function and the sound source related transfer function, calculating the anti-crosstalk function corresponding to the signal transmission angle includes:
    根据所述扬声器头相关传递函数进行矩阵合并处理,得到所述待处理音频信号对应的扬声器串扰矩阵;Perform matrix merging processing according to the speaker head related transfer function to obtain the speaker crosstalk matrix corresponding to the audio signal to be processed;
    针对所述扬声器串扰矩阵进行矩阵抵消,计算出所述扬声器串扰矩阵的串扰抵消矩阵;Perform matrix cancellation on the speaker crosstalk matrix to calculate the crosstalk cancellation matrix of the speaker crosstalk matrix;
    基于所述串扰抵消矩阵和所述声源头相关传递函数,计算所述信号传递角度对应的抗串扰函数。Based on the crosstalk cancellation matrix and the sound source related transfer function, an anti-crosstalk function corresponding to the signal transmission angle is calculated.
6. The audio processing method according to claim 2, wherein before calculating the anti-crosstalk function corresponding to the signal transmission angle according to the speaker head-related transfer function and the sound-source head-related transfer function, the method further includes:
    获取预设的离散头相关传递函数;Get the preset discrete head related transfer function;
    对所述离散头相关传递函数进行函数逼近处理,得到目标头相关传递函数;Perform function approximation processing on the discrete head related transfer function to obtain the target head related transfer function;
    所述基于所述信号传递角度,确定所述信号传递角度对应的扬声器头相关传递函数,包括:Determining the speaker head-related transfer function corresponding to the signal transmission angle based on the signal transmission angle includes:
    基于所述信号传递角度和所述目标头相关传递函数,确定所述信号传递角度对应的扬声器头相关传递函数;Based on the signal transmission angle and the target head-related transfer function, determine the speaker head-related transfer function corresponding to the signal transmission angle;
    所述基于所述目标声源角度,确定所述目标声源角度对应的声源头相关传递函数,包括:Determining the sound source related transfer function corresponding to the target sound source angle based on the target sound source angle includes:
Based on the target sound source angle and the target head-related transfer function, determine the sound-source head-related transfer function corresponding to the target sound source angle.
7. The audio processing method according to claim 1, wherein obtaining the audio signal to be played for each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played includes:
    获取至少两个外放扬声器中每个所述外放扬声器的待播放音频信号;Obtain the audio signal to be played from each of the at least two external speakers;
    对所述待播放音频信号进行声源位置定位,确定各所述待播放音频信号对应的目标声源角度。Position the sound source of the audio signals to be played, and determine the target sound source angle corresponding to each of the audio signals to be played.
8. The audio processing method according to claim 1, wherein obtaining the audio signal to be played for each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played includes:
    获取至少两个外放扬声器中每个所述外放扬声器的待播放音频信号以及所述待播放音频信号对应的待播放视频帧;Obtain the audio signal to be played from each of the at least two external speakers and the video frame to be played corresponding to the audio signal to be played;
    从所述待播放视频帧中,确定发声对象的发声位置信息;Determine the voicing position information of the voicing object from the video frame to be played;
    基于所述发声对象的发声位置信息,计算每个所述待播放音频信号对应的目标声源角度。Based on the sounding position information of the sounding object, the target sound source angle corresponding to each of the audio signals to be played is calculated.
9. The audio processing method according to claim 8, wherein the video frame to be played includes at least one candidate sounding object, and before the determining, from the video frame to be played, the sounding position information of the sounding object, the method further includes:
    确定所述待播放音频信号对应的发声对象,获取所述发声对象的对象标识信息;Determine the sound object corresponding to the audio signal to be played, and obtain the object identification information of the sound object;
    所述从所述待播放视频帧中,确定发声对象的发声位置信息,包括:Determining the voicing position information of the voicing object from the video frame to be played includes:
    从所述待播放视频帧包括的所述候选发声对象中,基于所述对象标识信息进行信息匹配,确定所述发声对象;From the candidate voicing objects included in the video frame to be played, perform information matching based on the object identification information to determine the voicing object;
    获取所述发声对象在所述待播放视频帧中的目标显示区域,将所述目标显示区域的位置信息作为所述发声对象的发声位置信息。Obtain the target display area of the utterance object in the video frame to be played, and use the position information of the target display area as the utterance position information of the utterance object.
10. The audio processing method according to claim 8, wherein the video frame to be played includes a display area of at least one candidate sounding object, and determining, from the video frame to be played, the sounding position information of the sounding object includes:
    针对所述待播放视频帧中的各所述显示区域,分别进行发声动作检测,若检测一所述显示区域中的候选发声对象执行了发声动作,将所述候选发声对象作为发声对象; Perform voice action detection for each display area in the video frame to be played respectively. If it is detected that a candidate voice object in the display area has performed a voice action, use the candidate voice object as the voice object;
    获取所述发声对象在所述待播放视频帧中的目标显示区域,将所述目标显示区域的位置信息作为所述发声对象的发声位置信息。Obtain the target display area of the utterance object in the video frame to be played, and use the position information of the target display area as the utterance position information of the utterance object.
  11. 根据权利要求8所述的音频处理方法,其特征在于,基于所述发声对象的发声位置信息,计算每个所述待播放音频信号对应的目标声源角度之前,所述方法还包括:The audio processing method according to claim 8, characterized in that, before calculating the target sound source angle corresponding to each of the audio signals to be played based on the voicing position information of the voicing object, the method further includes:
    响应于用户的位置选择操作,确定所述用户在所述待播放视频帧中对应的声音接收位置信息;In response to the user's position selection operation, determine the corresponding sound receiving position information of the user in the video frame to be played;
    所述基于所述发声对象的发声位置信息,计算每个所述待播放音频信号对应的目标声源角度,包括:Calculating the target sound source angle corresponding to each of the audio signals to be played based on the sounding position information of the sounding object includes:
    基于所述发声对象的发声位置信息和所述声音接收位置信息,确定发声位置与声音接收位置之间的信号传输方向;Based on the sound-emitting position information of the sound-emitting object and the sound-receiving position information, determine the signal transmission direction between the sound-emitting position and the sound-receiving position;
    根据所述信号传输方向,计算每个所述待播放音频信号对应的目标声源角度。According to the signal transmission direction, the target sound source angle corresponding to each audio signal to be played is calculated.
  12. 根据权利要求8所述的音频处理方法,其特征在于,基于所述发声对象的发声位置信息,计算每个所述待播放音频信号对应的目标声源角度之前,所述方法还包括:The audio processing method according to claim 8, characterized in that, before calculating the target sound source angle corresponding to each of the audio signals to be played based on the voicing position information of the voicing object, the method further includes:
    响应于用户的参考对象选择操作,确定所述用户在所述待播放视频帧中对应的参考对象;In response to the user's reference object selection operation, determine the reference object corresponding to the user in the video frame to be played;
    获取所述参考对象在所述待播放视频帧中的参考位置信息;Obtain the reference position information of the reference object in the video frame to be played;
    所述基于所述发声对象的发声位置信息,计算每个所述待播放音频信号对应的目标声源角度,包括:Calculating the target sound source angle corresponding to each audio signal to be played based on the sounding position information of the sounding object includes:
    基于所述发声对象的发声位置信息和所述参考位置信息,确定所述发声对象与所述参考对象之间的信号传输角度,作为每个所述待播放音频信号对应的目标声源角度。Based on the sounding position information of the sounding object and the reference position information, the signal transmission angle between the sounding object and the reference object is determined as the target sound source angle corresponding to each of the audio signals to be played.
13. The audio processing method according to claim 1, wherein obtaining the audio signal to be played for each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played includes:
Receive a data packet to be played sent by a signal sending end, the data packet to be played being obtained by encoding the audio signal to be played for each external speaker and the target sound source angle corresponding to each of the audio signals to be played;
    对所述待播放数据包进行解码,得到至少两个所述外放扬声器中每个所述外放扬声器的待播放音频信号和每个所述待播放音频信号对应的目标声源角度。The data packet to be played is decoded to obtain the audio signal to be played from each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played.
14. The audio processing method according to claim 1, wherein obtaining the audio signal to be played for each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played includes:
Receive a data packet to be played sent by the cloud, the data packet to be played being sent to the cloud by a signal sending end and being obtained by the signal sending end by encoding the audio signal to be played for each external speaker and the target sound source angle corresponding to each of the audio signals to be played;
    对所述待播放数据包进行解码,得到至少两个所述外放扬声器中每个所述外放扬声器的待播放音频信号和每个所述待播放音频信号对应的目标声源角度。The data packet to be played is decoded to obtain the audio signal to be played from each of the at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played.
15. The audio processing method according to claim 1, wherein performing, based on the anti-crosstalk function, signal transformation on the audio signal to be processed for each of the external speakers to obtain the target playback audio signal corresponding to the audio signal to be played includes:
Calculate an anti-crosstalk resonance function based on the anti-crosstalk function, wherein the anti-crosstalk resonance function is used to eliminate the cross-talk generated when the at least two external speakers play audio out loud and the influence of the resonance of the ear canal of the human ear;
    基于所述抗串扰共振函数,对各所述外放扬声器的所述待处理音频信号进行信号变换,得到所述待播放音频信号对应的目标播放音频信号。Based on the anti-crosstalk resonance function, signal transformation is performed on the audio signals to be processed from each of the external speakers to obtain a target playback audio signal corresponding to the audio signal to be played.
  16. 根据权利要求1所述的音频处理方法,其特征在于,所述基于所述抗串扰函数,对所述待处理音频信号进行信号变换之前,所述方法还包括:The audio processing method according to claim 1, characterized in that, before performing signal transformation on the audio signal to be processed based on the anti-crosstalk function, the method further includes:
    获取各所述外放扬声器的播放设置参数以及目标用户对应的音频播放偏好参数;Obtain the playback setting parameters of each of the external speakers and the audio playback preference parameters corresponding to the target user;
    对所述待处理音频信号进行音频参数提取,得到所述待处理音频信号对应的待播放参数;Perform audio parameter extraction on the audio signal to be processed to obtain the parameters to be played corresponding to the audio signal to be processed;
    根据所述播放设置参数、音频播放偏好参数和所述待播放参数,计算所述待处理音频信号对应的音频调整函数;Calculate the audio adjustment function corresponding to the audio signal to be processed according to the playback setting parameters, the audio playback preference parameters and the to-be-played parameters;
    所述基于所述抗串扰函数,对所述待处理音频信号进行信号变换,得到所述待处理音频信号对应的目标播放音频信号,包括:The step of performing signal transformation on the audio signal to be processed based on the anti-crosstalk function to obtain the target playback audio signal corresponding to the audio signal to be processed includes:
Based on the audio adjustment function, the target sound source angle, the signal transmission angle and the anti-crosstalk function, perform signal transformation on the audio signal to be processed to obtain the target playback audio signal corresponding to the audio signal to be processed.
  17. 根据权利要求1-16任一项所述的音频处理方法,其特征在于,所述方法还包括:The audio processing method according to any one of claims 1-16, characterized in that the method further includes:
    将各所述待播放音频信号对应的目标播放音频信号,分别发送给各所述待播放音频信号对应的所述外放扬声器,触发所述外放扬声器播放对应的所述目标播放音频信号。The target playback audio signal corresponding to each of the audio signals to be played is sent to the external speaker corresponding to each of the audio signals to be played, and the external speaker is triggered to play the corresponding target playback audio signal.
  18. 一种音频处理装置,其特征在于,包括:An audio processing device, characterized by including:
A signal acquisition unit, configured to acquire the audio signal to be played for each of at least two external speakers and the target sound source angle corresponding to each of the audio signals to be played;
    角度确定单元,用于确定各所述外放扬声器与用户头部之间的信号传递角度;An angle determination unit, used to determine the signal transmission angle between each of the external speakers and the user's head;
A function calculation unit, configured to calculate, based on the signal transmission angle and the target sound source angle, an anti-crosstalk function corresponding to the signal transmission angle, wherein the anti-crosstalk function is used to eliminate the cross-talk generated when the at least two external speakers play audio out loud;
    信号变换单元,用于基于所述抗串扰函数,对各所述外放扬声器的所述待处理音频信号进行信号变换,得到所述待播放音频信号对应的目标播放音频信号。A signal conversion unit, configured to perform signal conversion on the audio signals to be processed from each of the external speakers based on the anti-crosstalk function, to obtain a target playback audio signal corresponding to the audio signal to be played.
19. An electronic device, characterized in that it includes a memory and a processor; the memory stores an application program, and the processor is configured to run the application program in the memory to perform the steps of the audio processing method according to any one of claims 1 to 17.
20. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a plurality of instructions, and the instructions are suitable for being loaded by a processor to perform the steps of the audio processing method according to any one of claims 1 to 17.
  21. 一种计算机程序产品,包括计算机程序或指令,其特征在于,所述计算机程序或指令被处理器执行时实现如权利要求1至17中任一项所述的音频处理方法的步骤。 A computer program product, including a computer program or instructions, characterized in that when the computer program or instructions are executed by a processor, the steps of the audio processing method according to any one of claims 1 to 17 are implemented.
PCT/CN2023/097184 2022-08-05 2023-05-30 Audio processing method and apparatus, electronic device, storage medium, and program product WO2024027315A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210940126.1A CN117135557A (en) 2022-08-05 2022-08-05 Audio processing method, device, electronic equipment, storage medium and program product
CN202210940126.1 2022-08-05

Publications (1)

Publication Number Publication Date
WO2024027315A1 true WO2024027315A1 (en) 2024-02-08

Family

ID=88855189

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/097184 WO2024027315A1 (en) 2022-08-05 2023-05-30 Audio processing method and apparatus, electronic device, storage medium, and program product

Country Status (2)

Country Link
CN (1) CN117135557A (en)
WO (1) WO2024027315A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106572425A (en) * 2016-05-05 2017-04-19 王杰 Audio processing device and method
JP2017135669A (en) * 2016-01-29 2017-08-03 沖電気工業株式会社 Acoustic reproduction device and program
CN113889140A (en) * 2021-09-24 2022-01-04 北京有竹居网络技术有限公司 Audio signal playing method and device and electronic equipment
CN114040318A (en) * 2021-11-02 2022-02-11 海信视像科技股份有限公司 Method and equipment for playing spatial audio
US20220070587A1 (en) * 2020-08-28 2022-03-03 Faurecia Clarion Electronics Europe Electronic device and method for reducing crosstalk, related audio system for seat headrests and computer program
CN114143699A (en) * 2021-10-29 2022-03-04 北京奇艺世纪科技有限公司 Audio signal processing method and device and computer readable storage medium

Also Published As

Publication number Publication date
CN117135557A (en) 2023-11-28

Similar Documents

Publication Publication Date Title
EP3440538B1 (en) Spatialized audio output based on predicted position data
US20200186114A1 (en) Audio Signal Adjustment Method, Storage Medium, and Terminal
JP5882964B2 (en) Audio spatialization by camera
CN107071648B (en) Sound playing adjusting system, device and method
EP3039677B1 (en) Multidimensional virtual learning system and method
WO2020253844A1 (en) Method and device for processing multimedia information, and storage medium
WO2021012900A1 (en) Vibration control method and apparatus, mobile terminal, and computer-readable storage medium
US9241215B2 (en) Mobile apparatus and control method thereof
TWI703877B (en) Audio processing device, audio processing method, and computer program product
CN113890932A (en) Audio control method and system and electronic equipment
JP7275375B2 (en) Coordination of audio devices
WO2017215661A1 (en) Scenario-based sound effect control method and electronic device
US20220415333A1 (en) Using audio watermarks to identify co-located terminals in a multi-terminal session
JP2021535632A (en) Methods and equipment for processing audio signals
WO2022062531A1 (en) Multi-channel audio signal acquisition method and apparatus, and system
US20230370774A1 (en) Bluetooth speaker control method and system, storage medium, and mobile terminal
EP4162673A1 (en) Systems, devices, and methods of manipulating audio data based on microphone orientation
WO2024027315A1 (en) Audio processing method and apparatus, electronic device, storage medium, and program product
WO2020063027A1 (en) 3d sound effect processing method and related product
EP4162675A1 (en) Systems, devices, and methods of manipulating audio data based on display orientation
CN115002401B (en) Information processing method, electronic equipment, conference system and medium
US20230300549A1 (en) Audio playing method and device, storage medium, and mobile terminal
WO2023212883A1 (en) Audio output method and apparatus, communication apparatus, and storage medium
CN116347320B (en) Audio playing method and electronic equipment
WO2022002218A1 (en) Audio control method, system, and electronic device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23849025

Country of ref document: EP

Kind code of ref document: A1